A short announcement that the slides of almost all talks at the CRiSM workshop on estimating constants last April 20-22 are now available. Enjoy (and dicuss)!
Archive for Monte Carlo Statistical Methods
Last week, Matias Quiroz, Mattias Villani, and Robert Kohn arXived a paper on exact subsampling MCMC, a paper that contributes to the current literature on approximating MCMC samplers for large datasets, in connection with an earlier paper of Quiroz et al. discussed here last week.
The “exact” in the title is to be understood in the Russian roulette sense. By using Rhee and Glynn debiaising device, the authors achieve an unbiased estimator of the likelihood as in Bardenet et al. (2015). The central tool for the derivation of an unbiased and positive estimator is to find a control variate for each component of the log likelihood that is good enough for the difference between the component and the control to be lower bounded. By the constant a in the screen capture above. When the individual terms d in the product are iid unbiased estimates of the log likelihood difference. And q is the sum of the control variates. Or maybe more accurately of the cheap substitutes to the exact log likelihood components. Thus still of complexity O(n), which makes the application to tall data more difficult to contemplate.
The $64 question is obviously how to produce cheap and efficient control variates that kill the curse of the tall data. (It still irks to resort to this term of control variate, really!) Section 3.2 in the paper suggests clustering the data and building an approximation for each cluster, which seems to imply manipulating the whole dataset at this early stage. At a cost of O(Knd). Furthermore, because finding a correct lower bound a is close to impossible in practice, the authors use a “soft lower bound”, meaning that it is only an approximation and thus that (3.4) above can get negative from time to time, which cancels the validation of the method as a pseudo-marginal approach. The resolution of this difficulty is to resort to the same proxy as in the Russian roulette paper, replacing the unbiased estimator with its absolute value, an answer I already discussed for the Russian roulette paper. An additional step is proposed by Quiroz et al., namely correlating the random numbers between numerator and denominator in their final importance sampling estimator, via a Gaussian copula as in Deligiannidis et al.
This paper made me wonder (idly wonder, mind!) anew how to get rid of the vexing unbiasedness requirement. From a statistical and especially from a Bayesian perspective, unbiasedness is a second order property that cannot be achieved for most transforms of the parameter θ. And that does not keep under reparameterisation. It is thus vexing and perplexing that unbiased is so central to the validation of our Monte Carlo technique and that any divergence from this canon leaves us wandering blindly with no guarantee of ever reaching the target of the simulation experiment…
Next week, Rémi Bardenet is giving a seminar in Paris, Thursday April 14, 2pm, in ENSAE [room 15] on MCMC methods for tall data. Unfortunately, I will miss this opportunity to discuss with Rémi as I will be heading to La Sapienza, Roma, for Clara Grazian‘s PhD defence the next day. And on Monday afternoon, April 11, Nicolas Chopin will give a talk on quasi-Monte Carlo for sequential problems at Institut Henri Poincaré.
Richard Everitt organises an afternoon workshop on Bayesian computation in Reading, UK, on April 19, the day before the Estimating Constant workshop in Warwick, following a successful afternoon last year. Here is the programme:
1230-1315 Antonietta Mira, Università della Svizzera italiana 1315-1345 Ingmar Schuster, Université Paris-Dauphine 1345-1415 Francois-Xavier Briol, University of Warwick 1415-1445 Jack Baker, University of Lancaster 1445-1515 Alexander Mihailov, University of Reading 1515-1545 Coffee break 1545-1630 Arnaud Doucet, University of Oxford 1630-1700 Philip Maybank, University of Reading 1700-1730 Elske van der Vaart, University of Reading 1730-1800 Reham Badawy, Aston University 1815-late Pub and food (SCR, UoR campus)
and the general abstract:
The Bayesian approach to statistical inference has seen major successes in the past twenty years, finding application in many areas of science, engineering, finance and elsewhere. The main drivers of these successes were developments in Monte Carlo methods and the wide availability of desktop computers. More recently, the use of standard Monte Carlo methods has become infeasible due the size and complexity of data now available. This has been countered by the development of next-generation Monte Carlo techniques, which are the topic of this meeting.
The meeting takes place in the Nike Lecture Theatre, Agriculture Building [building number 59].
Statistical Rethinking: A Bayesian Course with Examples in R and Stan is a new book by Richard McElreath that CRC Press sent me for review in CHANCE. While the book was already discussed on Andrew’s blog three months ago, and [rightly so!] enthusiastically recommended by Rasmus Bååth on Amazon, here are the reasons why I am quite impressed by Statistical Rethinking!
“Make no mistake: you will wreck Prague eventually.” (p.10)
While the book has a lot in common with Bayesian Data Analysis, from being in the same CRC series to adopting a pragmatic and weakly informative approach to Bayesian analysis, to supporting the use of STAN, it also nicely develops its own ecosystem and idiosyncrasies, with a noticeable Jaynesian bent. To start with, I like the highly personal style with clear attempts to make the concepts memorable for students by resorting to external concepts. The best example is the call to the myth of the golem in the first chapter, which McElreath uses as an warning for the use of statistical models (which almost are anagrams to golems!). Golems and models [and robots, another concept invented in Prague!] are man-made devices that strive to accomplish the goal set to them without heeding the consequences of their actions. This first chapter of Statistical Rethinking is setting the ground for the rest of the book and gets quite philosophical (albeit in a readable way!) as a result. In particular, there is a most coherent call against hypothesis testing, which by itself justifies the title of the book. Continue reading
One justification for pseudo-marginal Metropolis-Hastings algorithms is the completion or demarginalisation of the initial target with the random variates used to compute the unbiased estimator of the target or likelihood. In a recent arXival, M.-N. Tran, Robert Kohn, M. Quiroz and M. Villani explore the idea of only updating part of those auxiliary random variates, hence the block in the title. The idea is to “reduce the variability in the ratio of the likelihood estimates” but I think it also reduces the moves of the sampler by creating a strong correlation between the likelihood estimates. Of course, a different appeal of the approach is when facing a large product of densities, large enough to prevent the overall approximation at once and requiring blockwise approximations. As in, e.g., consensus Monte Carlo and other “big data” (re)solutions. The convergence results provided in the paper are highly stylised (like assuming the log of the unbiased estimator of the likelihood being normal and simulation run from the prior), but they lead to a characterisation of the inefficiency of the pseudo-marginal algorithm. The inefficiency being defined as the ratio of the variances when using the true likelihood and when using the limiting unbiased estimator. There is however no corresponding result for selecting the number of blocks, G, which is chosen as G=100 in the paper.