Archive for high dimensions

A precursor of ABC-Gibbs

Posted in Books, R, Statistics with tags , , , , , , , , , , on June 7, 2019 by xi'an

Following our arXival of ABC-Gibbs, Dennis Prangle pointed out to us a 2016 paper by Athanasios Kousathanas, Christoph Leuenberger, Jonas Helfer, Mathieu Quinodoz, Matthieu Foll, and Daniel Wegmann, Likelihood-Free Inference in High-Dimensional Model, published in Genetics, Vol. 203, 893–904 in June 2016. This paper contains a version of ABC Gibbs where parameters are sequentially simulated from conditionals that depend on the data only through small dimension conditionally sufficient statistics. I had actually blogged about this paper in 2015 but since then completely forgotten about it. (The comments I had made at the time still hold, already pertaining to the coherence or lack thereof of the sampler. I had also forgotten I had run an experiment of an exact Gibbs sampler with incoherent conditionals, which then seemed to converge to something, if not the exact posterior.)

All ABC algorithms, including ABC-PaSS introduced here, require that statistics are sufficient for estimating the parameters of a given model. As mentioned above, parameter-wise sufficient statistics as required by ABC-PaSS are trivial to find for distributions of the exponential family. Since many population genetics models do not follow such distributions, sufficient statistics are known for the most simple models only. For more realistic models involving multiple populations or population size changes, only approximately-sufficient statistics can be found.

While Gibbs sampling is not mentioned in the paper, this is indeed a form of ABC-Gibbs, with the advantage of not facing convergence issues thanks to the sufficiency. The drawback being that this setting is restricted to exponential families and hence difficult to extrapolate to non-exponential distributions, as using almost-sufficient (or not) summary statistics leads to incompatible conditionals and thus jeopardise the convergence of the sampler. When thinking a wee bit more about the case treated by Kousathanas et al., I am actually uncertain about the validation of the sampler. When tolerance is equal to zero, this is not an issue as it reproduces the regular Gibbs sampler. Otherwise, each conditional ABC step amounts to introducing an auxiliary variable represented by the simulated summary statistic. Since the distribution of this summary statistic depends on more than the parameter for which it is sufficient, in general, it should also appear in the conditional distribution of other parameters. At least from this Gibbs perspective, it thus relies on incompatible conditionals, which makes the conditions proposed in our own paper the more relevant.

congrats, Prof Rousseau!

Posted in Statistics with tags , , , , , , , , on April 4, 2019 by xi'an

distributed posteriors

Posted in Books, Statistics, Travel, University life with tags , , , , , , , on February 27, 2019 by xi'an

Another presentation by our OxWaSP students introduced me to the notion of distributed posteriors, following a 2018 paper by Botond Szabó and Harry van Zanten. Which corresponds to the construction of posteriors when conducting a divide & conquer strategy. The authors show that an adaptation of the prior to the division of the sample is necessary to recover the (minimax) convergence rate obtained in the non-distributed case. This is somewhat annoying, except that the adaptation amounts to take the original prior to the power 1/m, when m is the number of divisions. They further show that when the regularity (parameter) of the model is unknown, the optimal rate cannot be recovered unless stronger assumptions are made on the non-zero parameters of the model.

“First of all, we show that depending on the communication budget, it might be advantageous to group local machines and let different groups work on different aspects of the high-dimensional object of interest. Secondly, we show that it is possible to have adaptation in communication restricted distributed settings, i.e. to have data-driven tuning that automatically achieves the correct bias-variance trade-off.”

I find the paper of considerable interest for scalable MCMC methods, even though the setting may happen to sound too formal, because the study incorporates parallel computing constraints. (Although I did not investigate the more theoretical aspects of the paper.)

IMS workshop [day 3]

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on August 30, 2018 by xi'an

I made the “capital” mistake of walking across the entire NUS campus this morning, which is quite green and pretty, but which almost enjoys an additional dimension brought by such an intense humidity that one feels having to get around this humidity!, a feature I have managed to completely erase from my memory of my previous visit there. Anyway, nothing of any relevance. oNE talk in the morning was by Markus Eisenbach on tools used by physicists to speed up Monte Carlo methods, like the Wang-Landau flat histogram, towards computing the partition function, or the distribution of the energy levels, definitely addressing issues close to my interest, but somewhat beyond my reach for using a different language and stress, as often in physics. (I mean, as often in physics talks I attend.) An idea that came out clear to me was to bypass a (flat) histogram target and aim directly at a constant slope cdf for the energy levels. (But got scared away by the Fourier transforms!)

Lawrence Murray then discussed some features of the Birch probabilistic programming language he is currently developing, especially a fairly fascinating concept of delayed sampling, which connects with locally-optimal proposals and Rao Blackwellisation. Which I plan to get back to later [and hopefully sooner than later!].

In the afternoon, Maria de Iorio gave a talk about the construction of nonparametric priors that create dependence between a sequence of functions, a notion I had not thought of before, with an array of possibilities when using the stick breaking construction of Dirichlet processes.

And Christophe Andrieu gave a very smooth and helpful entry to partly deterministic Markov processes (PDMP) in preparation for talks he is giving next week for the continuation of the workshop at IMS. Starting with the guided random walk of Gustafson (1998), which extended a bit later into the non-reversible paper of Diaconis, Holmes, and Neal (2000). Although I had a vague idea of the contents of these papers, the role of the velocity ν became much clearer. And premonitory of the advances made by the more recent PDMP proposals. There is obviously a continuation with the equally pedagogical talk Christophe gave at MCqMC in Rennes two months [and half the globe] ago,  but the focus being somewhat different, it really felt like a new talk [my short term memory may also play some role in this feeling!, as I now remember the discussion of Hilderbrand (2002) for non-reversible processes]. An introduction to the topic I would recommend to anyone interested in this new branch of Monte Carlo simulation! To be followed by the most recently arXived hypocoercivity paper by Christophe and co-authors.

ABCDay [arXivals]

Posted in Books, Statistics, University life with tags , , , , , , on March 2, 2018 by xi'an

A bunch of ABC papers on arXiv yesterday, most of them linked to the incoming Handbook of ABC:

    1. Overview of Approximate Bayesian Computation S. A. Sisson, Y. Fan, M. A. Beaumont
    2. Kernel Recursive ABC: Point Estimation with Intractable Likelihood Takafumi Kajihara, Keisuke Yamazaki, Motonobu Kanagawa, Kenji Fukumizu
    3. High-dimensional ABC D. J. Nott, V. M.-H. Ong, Y. Fan, S. A. Sisson
    4. ABC Samplers Y. Fan, S. A. Sisson


ISBA 2016

Posted in Kids, Statistics, Travel, University life, Wines with tags , , , , , , , , , , on June 14, 2016 by xi'an

non-tibetan flags in Pula, Sardinia, June 12, 2016I remember fondly the early Valencia meetings where we did not have to pick between sessions. Then one year there were two sessions and soon more. And we now have to pick among equally tantalising sessions. [Complaint of the super wealthy, I do realise.] After a morning trip to San’Antioco and the southern coast of Sardinia, I started my ISBA 2016 with an not [that Bayesian] high dimension session with Michael Jordan (who gave a talk related to his MCMski lecture), Isa Verdinelli and Larry Wasserman.

Larry gave a [non-Bayesian, what else?!] talk on the problem of data splitting versus double use of the same data. Or rather using a model index estimated from a given dataset to estimate the properties of the mean of the same data. As in model selection. While splitting the data avoids all sorts of problem, not splitting the data but using a different loss function could avoid the issue. (And the infinite regress that if we keep conducting inference, we may have to split further and further the data.) Namely, if we were looking only at quantities that do not vary across models. So it is surprising that prediction get affected by this.

In a second session around Bayesian tests and model choice, Sarah Filippi presented the Bayesian non-parametric test she devised with Chris Holmes, using Polya trees. And mentioned our testing-by-mixture approach as a valuable alternative! Veronika Rockova talked about her new approach to efficient variable selection by spike-and-slab priors, through a mix of particle MCMC and EM, plus some variational Bayes motivations. (She also mentioned extensions by repulsive sampling through the pinball sampler, of which her recent AISTATS paper reminded me.)

Later in the evening, I figured out that the poster sessions that make the ISBA/Valencia meetings so unique are alas out of reach for me as the level of noise and my reduced hearing capacities (!) make impossible any prolonged discussion on any serious notion. No poster session for ‘Og’s men!, then, even though I can hang out at the fringe and chat with friends!

at CIRM [#3]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on March 4, 2016 by xi'an

Simon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using several times a logistic regression example, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited ad nauseam in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution does the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y’))≤ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem as in Bornn et al. (2015) I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to that simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomial are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of p components made of r Bernoulli products is identifiable when p ≥ 2[log² r] +1, when log² is base 2-logarithm. And [x] the upper rounding. I also find most relevant this distinction between the weights and the remainder of the mixture as weights behave quite differently, hardly parameters in a sense.