## control functionals for Monte Carlo integration

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on June 28, 2016 by xi'an

A paper on control variates by Chris Oates, Mark Girolami (Warwick) and Nicolas Chopin (CREST) appeared in a recent issue of Series B. I had read and discussed the paper with them previously and the following is a set of comments I wrote at some stage, to be taken with enough gains of salt since Chris, Mark and Nicolas answered them either orally or in the paper. Note also that I already discussed an earlier version, with comments that are not necessarily coherent with the following ones! [Thanks to the busy softshop this week, I resorted to publish some older drafts, so mileage can vary in the coming days.]

First, it took me quite a while to get over the paper, mostly because I have never worked with reproducible kernel Hilbert spaces (RKHS) before. I looked at some proofs in the appendix and at the whole paper but could not spot anything amiss. It is obviously a major step to uncover a manageable method with a rate that is lower than √n. When I set my PhD student Anne Philippe on the approach via Riemann sums, we were quickly hindered by the dimension issue and could not find a way out. In the first versions of the nested sampling approach, John Skilling had also thought he could get higher convergence rates before realising the Monte Carlo error had not disappeared and hence was keeping the rate at the same √n speed.

The core proof in the paper leading to the 7/12 convergence rate relies on a mathematical result of Sun and Wu (2009) that a certain rate of regularisation of the function of interest leads to an average variance of order 1/6. I have no reason to mistrust the result (and anyway did not check the original paper), but I am still puzzled by the fact that it almost immediately leads to the control variate estimator having a smaller order variance (or at least variability). On average or in probability. (I am also uncertain on the possibility to interpret the boxplot figures as establishing super-√n speed.)

Another thing I cannot truly grasp is how the control functional estimator of (7) can be both a mere linear recombination of individual unbiased estimators of the target expectation and an improvement in the variance rate. I acknowledge that the coefficients of the matrices are functions of the sample simulated from the target density but still…

Another source of inner puzzlement is the choice of the kernel in the paper, which seems too simple to be able to cover all problems despite being used in every illustration there. I see the kernel as centred at zero, which means a central location must be know, decreasing to zero away from this centre, so possibly missing aspects of the integrand that are too far away, and isotonic in the reference norm, which also seems to preclude some settings where the integrand is not that compatible with the geometry.

I am equally nonplussed by the existence of a deterministic bound on the error, although it is not completely deterministic, depending on the values of the reproducible kernel at the points of the sample. Does it imply anything restrictive on the function to be integrated?

A side remark about the use of intractable in the paper is that, given the development of a whole new branch of computational statistics handling likelihoods that cannot be computed at all, intractable should possibly be reserved for such higher complexity models.

## Rémi Bardenet’s seminar

Posted in Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on April 7, 2016 by xi'an

Next week, Rémi Bardenet is giving a seminar in Paris, Thursday April 14, 2pm, in ENSAE [room 15] on MCMC methods for tall data. Unfortunately, I will miss this opportunity to discuss with Rémi as I will be heading to La Sapienza, Roma, for Clara Grazian‘s PhD defence the next day.  And on Monday afternoon, April 11, Nicolas Chopin will give a talk on quasi-Monte Carlo for sequential problems at Institut Henri Poincaré.

## position opening at ENSAE ParisTech

Posted in Kids, Statistics, Travel, University life with tags , , , , , , , on March 28, 2016 by xi'an

There is an opening for an associate or full professor position in Statistics and Machine Learning at ENSAE, Paris (soon to move to the Paris-Saclay campus, next to École Polytechnique). The details are provided here. The deadline is April 18, 2016, for a hiring in September or October 2016.

## convergence for non-Markovian simulated AAs

Posted in Books, pictures, Statistics with tags , , , on December 24, 2015 by xi'an

Mathieu Gerber (formerly CREST) and Luke Bornn have arXived a paper on the almost sure convergence of simulated annealing algorithms when using a non-Markovian sequence that can be in the limiting case completely deterministic and hence use quasi-Monte Carlo sequences. The paper extends the earlier Gerber and Bornn (2015) that I missed. While the paper is highly technical, it shows that under some conditions a sequence of time-varying kernels can be used to reach the maximum of an objective function. With my limited experience with simulated annealing I find this notion of non-iid or even non-random both worth investigating and somewhat unsurprising from a practitioner’s view in that modifying a standard simulated annealing algorithm with steps depending on the entire past of the sequence usually produces better performances.

## PAC-Bayesians

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , on September 22, 2015 by xi'an

Yesterday, I took part in the thesis defence of James Ridgway [soon to move to the University of Bristol[ at Université Paris-Dauphine. While I have already commented on his joint paper with Nicolas on the Pima Indians, I had not read in any depth another paper in the thesis, “On the properties of variational approximations of Gibbs posteriors” written jointly with Pierre Alquier and Nicolas Chopin.

PAC stands for probably approximately correct and starts with an empirical form of posterior, called the Gibbs posterior, where the log-likelihood is replaced with an empirical error

$\pi(\theta|x_1,\ldots,x_n) \propto \exp\{-\lambda r_n(\theta)\}\pi(\theta)$

that is rescaled by a factor λ. Factor that is called the learning rate, to be optimised as the (Kullback) closest  approximation to the true unknown distribution, by Peter Grünwald (2012) in his SafeBayes approach. In the paper of James, Pierre and Nicolas, there is no visible Bayesian perspective, since the pseudo-posterior is used to define a randomised estimator that achieves optimal oracle bounds. When λ is of order n. The purpose of the paper is rather to produce an efficient approximation to the Gibbs posterior, by using variational Bayes techniques. And to derive point estimators. With the added appeal that the approximation also achieves the oracle bounds. (Surprisingly, the authors do not leave the Pima Indians alone as they use this benchmark for a ranking model.) Since there is no discussion on the choice of the learning rate λ, as opposed to Bissiri et al. (2013) I discussed around Bayes.250, I have difficulties perceiving the possible impact of this representation on Bayesian analysis. Except maybe as an ABC device, as suggested by Christophe Andrieu.

## SMC 2015

Posted in Statistics, Travel, University life with tags , , , , , , , , , , on September 7, 2015 by xi'an

Nicolas Chopin ran a workshop at ENSAE on sequential Monte Carlo the past three days and it was a good opportunity to get a much needed up-to-date on the current trends in the field. Especially given that the meeting was literally downstairs from my office at CREST. And given the top range of researchers presenting their current or past work (in the very amphitheatre where I attended my first statistics lectures, a few dozen years ago!). Since unforeseen events made me miss most of the central day, I will not comment on individual talks, some of which I had already heard in the recent past, but this was a high quality workshop, topped by a superb organisation. (I started wondering why there was no a single female speaker in the program and so few female participants in the audience, then realised this is a field with a massive gender imbalance, which is difficult to explain given the different situation in Bayesian statistics and even in Bayesian computation…)  Some key topics I gathered during the talks I could attend–apologies to the other speakers for missing their talk due to those unforeseen events–are unbiasedness, which sounds central to the SMC methods [at least those presented there] as opposed to MCMC algorithms, and local features, used in different ways like hierarchical decomposition, multiscale, parallelisation, local coupling, &tc., to improve convergence and efficiency…

## Edmond Malinvaud (1923-2015)

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , on March 11, 2015 by xi'an

The statistician, econometrician, macro- and micro-economist, Edmond Malinvaud died on Saturday, March 7. He had been director of my alma mater ENSAE (1962–1966), directeur de la Prévision at the Finance Department (1972–1974), director of INSEE (1974–1987), and Professeur at Collège de France (1988–1993). While primarily an economist, with his theories of disequilibrium and unemployment, reflected in his famous book Théorie macro-économique (1981) that he taught us at ENSAE, he was also instrumental in shaping the French econometrics school, see his equally famous Statistical Methods of Econometrics (1970), and in the reorganisation of INSEE as the post-war State census and economic planning tool. He was also an honorary Fellow of the Royal Statistical Society and the 1981 president of the International Institute of Statistics. Edmond Malinvaud studied under Maurice Allais, Nobel Prize in economics in 1988, and was himself considered as a potential Nobel for several years. My personal memories of him at ENSAE and CREST are of a very clear teacher and of a kind and considerate man, with the reserve and style of a now-bygone era…