Archive for splitting data

ISBA 2016

Posted in Kids, Statistics, Travel, University life, Wines on June 14, 2016 by xi'an

non-tibetan flags in Pula, Sardinia, June 12, 2016

I remember fondly the early Valencia meetings where we did not have to pick between sessions. Then one year there were two sessions and soon more. And we now have to pick among equally tantalising sessions. [Complaint of the super wealthy, I do realise.] After a morning trip to Sant'Antioco and the southern coast of Sardinia, I started my ISBA 2016 with a not [that Bayesian] high-dimension session with Michael Jordan (who gave a talk related to his MCMski lecture), Isa Verdinelli and Larry Wasserman.

Larry gave a [non-Bayesian, what else?!] talk on the problem of data splitting versus double use of the same data. Or rather on using a model index estimated from a given dataset to estimate properties of the mean of that same data, as in model selection. While splitting the data avoids all sorts of problems, not splitting the data but using a different loss function could also avoid the issue, namely by looking only at quantities that do not vary across models. (This would also dodge the infinite regress that, if we keep conducting inference, we may have to split the data further and further.) So it is surprising that prediction gets affected by this.
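To make the splitting idea concrete, here is a minimal sketch (my own illustration, not Larry's construction): the model, that is, the set of active coordinates, is selected on one half of the sample and the mean is then estimated on the other half, so that the selection step does not contaminate the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
theta = np.concatenate([np.full(3, 2.0), np.zeros(p - 3)])  # sparse true mean
X = theta + rng.standard_normal((n, p))

# split the sample into two halves
X1, X2 = X[: n // 2], X[n // 2 :]

# "model selection" on the first half: keep coordinates with large averages
selected = np.abs(X1.mean(axis=0)) > 2 / np.sqrt(n // 2)

# estimation on the second half, restricted to the selected model
theta_split = np.where(selected, X2.mean(axis=0), 0.0)

# naive double use of the data for comparison: select and estimate on the full sample
selected_full = np.abs(X.mean(axis=0)) > 2 / np.sqrt(n)
theta_naive = np.where(selected_full, X.mean(axis=0), 0.0)
```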

In a second session around Bayesian tests and model choice, Sarah Filippi presented the Bayesian non-parametric test she devised with Chris Holmes, using Polya trees. And mentioned our testing-by-mixture approach as a valuable alternative! Veronika Rockova talked about her new approach to efficient variable selection by spike-and-slab priors, through a mix of particle MCMC and EM, plus some variational Bayes motivations. (She also mentioned extensions by repulsive sampling through the pinball sampler, of which her recent AISTATS paper reminded me.)
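For readers less familiar with the terminology, a generic spike-and-slab prior (a textbook version, not necessarily the continuous relaxation used in Veronika's work) puts on each regression coefficient a mixture of a point mass at zero and a diffuse slab,

```latex
\beta_j \mid \gamma_j \;\sim\; (1-\gamma_j)\,\delta_0 \;+\; \gamma_j\,\mathcal{N}(0,\tau^2),
\qquad \gamma_j \sim \mathcal{B}(\pi) \text{ i.i.d.}, \qquad j=1,\ldots,p,
```

so that variable selection amounts to inferring the posterior distribution of the inclusion indicators γ_j.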

Later in the evening, I figured out that the poster sessions that make the ISBA/Valencia meetings so unique are alas out of reach for me, as the level of noise and my reduced hearing capacities (!) make any prolonged discussion of any serious notion impossible. No poster sessions for 'Og's men, then, even though I can hang out at the fringe and chat with friends!

importance sampling with multiple MCMC sequences

Posted in Mountains, pictures, Statistics, Travel, University life on October 2, 2015 by xi'an

Vivek Roy, Aixin Tan and James Flegal arXived a new paper, Estimating standard errors for importance sampling estimators with multiple Markov chains, where they obtain a central limit theorem and hence standard error estimates when using several MCMC chains to simulate from a mixture distribution that serves as an importance sampling function. The paper appeared just before I boarded my plane from Amsterdam to Calgary, which gave me the opportunity to read it completely (along with half a dozen other papers, since it is a long flight!). I first thought it was connected to our AMIS algorithm (on whose convergence Vivek spent a few frustrating weeks when he visited me at the end of his PhD), because of the mixture structure. It is actually altogether different, in that the mixture is made of unnormalised densities that are complex enough to act as an importance sampler and that, because of this complexity, the components can only be simulated via separate MCMC algorithms. Behind this characterisation lurks the challenging problem of estimating multiple normalising constants. The paper adopts the resolution by reverse logistic regression advocated in Charlie Geyer's famous 1994 unpublished technical report. Beside the technical difficulties in establishing a CLT in this convoluted setup, the notion of mixing importance sampling and different Markov chains is quite appealing, especially in the domain of "tall" data and of splitting the likelihood into several or even many bits, since the mixture contains most of the information provided by the true posterior and can be corrected by an importance sampling step. In this very setting, I also think more adaptive schemes could be found to determine (estimate?!) the optimal weights of the mixture components.
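As a toy rendition of the general idea (my own simplified sketch, not the authors' algorithm: I take the component normalising constants as known, whereas the paper estimates them by reverse logistic regression, and I do not compute the standard errors that are the whole point of the paper), two unnormalised components are each explored by a separate random-walk Metropolis chain, and the pooled samples are then reweighted against the target through the mixture:

```python
import numpy as np

rng = np.random.default_rng(1)

# unnormalised target density (posterior stand-in)
def log_target(x):
    return -0.5 * (x - 1.0) ** 2 / 1.5

# two unnormalised mixture components, each only reachable by its own MCMC chain
def log_q1(x):  # component centred at -2
    return -0.5 * (x + 2.0) ** 2

def log_q2(x):  # component centred at +3
    return -0.5 * (x - 3.0) ** 2

def mh_chain(log_q, x0, n_iter, scale=1.0):
    """plain random-walk Metropolis targeting exp(log_q)"""
    x, out = x0, np.empty(n_iter)
    for t in range(n_iter):
        prop = x + scale * rng.standard_normal()
        if np.log(rng.uniform()) < log_q(prop) - log_q(x):
            x = prop
        out[t] = x
    return out

n = 5_000
s1 = mh_chain(log_q1, -2.0, n)
s2 = mh_chain(log_q2, 3.0, n)

# normalising constants of the components, known in closed form here (both are
# N(.,1) kernels); Roy, Tan & Flegal estimate them by reverse logistic regression
c1 = c2 = np.sqrt(2 * np.pi)

def log_mix(x):
    # equally weighted mixture of the two normalised components
    return np.logaddexp(log_q1(x) - np.log(c1), log_q2(x) - np.log(c2)) - np.log(2)

pooled = np.concatenate([s1, s2])
log_w = log_target(pooled) - log_mix(pooled)
w = np.exp(log_w - log_w.max())

# self-normalised importance sampling estimate of E[X] under the target
est = np.sum(w * pooled) / np.sum(w)
```

In the tall-data setting mentioned above, each chain would instead target one piece of the split likelihood, and the resulting mixture would be corrected towards the full posterior by this importance step.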