ABC in Helsinki [on-board]

Posted in pictures, Statistics, Travel, University life on May 17, 2016 by xi'an

ABC in Helsinki (a.k.a. ABCruise) has started! With terrific weather, most adequate for a cruise on the Baltic. The ship on which the workshop takes place is certainly larger than any I have been on, including the Channel ferries, and the inside alley looks rather like a shopping centre! However, the setting is exceptional, with comfy sea-facing cabins and pleasant breaks (including fancy tea!). Plus, we have a quiet and cosy conference room that makes one forget one is on a boat. Until it starts rocking. Or listing! The cruise boat is definitely large enough to be fairly stable. A unique experience we could consider for future (AB-see) workshops (with the caveat that we benefited from exceptional circumstances that brought the costs down to ridiculous amounts).

Richard Everitt talked about the synthetic likelihood approach and its connection with ABC. Making clear for me a point I had somewhat forgotten, namely that the approximate likelihood is a Gaussian density evaluated at the observed summary statistics, but one centred at empirical moments derived from the simulation of pseudo summaries based on a given value of the parameter θ. So it is not an exact approach in that it does not converge to the true likelihood as the number of simulations grows to infinity. (While a kernel would converge.) That means it may (will) misrepresent the tails unless the distribution of the summary statistic is close to Normal. Richard also introduced bootstrap or bags of little bootstraps in order to speed up the generation of the pseudo-data, which makes sense, although it moves the sampling away from the true model since it is conditional on a single simulation.
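A minimal sketch of the synthetic likelihood construction described above, under assumptions that are not from the talk: a toy model with n draws from N(θ, 1), a single summary statistic (the sample mean), and hypothetical function names. For a given θ, pseudo summaries are simulated, their empirical mean and variance define a Gaussian, and that Gaussian is evaluated at the observed summary.

```python
import math
import random
import statistics


def simulate_summary(theta, n_obs, rng):
    """Toy model (an assumption): n_obs draws from N(theta, 1), summarised by their mean."""
    return statistics.fmean(rng.gauss(theta, 1.0) for _ in range(n_obs))


def synthetic_loglik(theta, s_obs, n_obs, n_sim, rng):
    """Gaussian log-density at s_obs, with moments estimated from n_sim pseudo summaries."""
    sims = [simulate_summary(theta, n_obs, rng) for _ in range(n_sim)]
    mu = statistics.fmean(sims)        # empirical mean of the pseudo summaries
    var = statistics.variance(sims)    # empirical (unbiased) variance
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (s_obs - mu) ** 2 / var


rng = random.Random(0)
s_obs = 1.0  # hypothetical observed summary
ll_near = synthetic_loglik(1.0, s_obs, n_obs=20, n_sim=200, rng=rng)
ll_far = synthetic_loglik(3.0, s_obs, n_obs=20, n_sim=200, rng=rng)
print(ll_near, ll_far)
```

In this toy case the summary is exactly Normal, so the Gaussian approximation is harmless; with a skewed or heavy-tailed summary the same code would still return a value, which is precisely where the tails may be misrepresented.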

Jean-Michel Marin introduced the ABC inference algorithm we are currently working on, using regression random forests that differ from the classification forests we used for model selection. (The paper is close to completion so I hope to be able to tell more in the near future!) Clara Grazian presented her semi-parametric work using ABC with Brunero Liseo. That was part of her thesis. Thomas Schön presented an extension of his particle Gibbs with adaptive sampling to the case of degenerate transitions, using an ABC approximation to get around this central problem. A very interesting entry that I need to study more deeply. And Caroline Colijn talked about ABC for trees, mostly about the selection of summary statistics towards comparing tree topologies, with a specific distance between trees that caters to the topology and only the topology.

at CIRM [#3]

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life on March 4, 2016 by xi'an

Simon Barthelmé gave his mini-course on EP, with loads of details on the implementation of the method. Focussing on the EP-ABC and MCMC-EP versions today. Leaving open the difficulty of assessing to which limit EP is converging. But mentioning the potential for asynchronous EP (on which I would like to hear more). Ironically using several times a logistic regression example, if not on the Pima Indians benchmark! He also talked about approximate EP solutions that relate to consensus MCMC. With a connection to Mark Beaumont’s talk at NIPS [at the same time as mine!] on the comparison with ABC. While we saw several talks on EP during this week, I am still agnostic about the potential of the approach. It certainly produces a fast proxy to the true posterior and hence can be exploited ad nauseam in inference methods based on pseudo-models like indirect inference. In conjunction with other quick and dirty approximations when available. As in ABC, it would be most useful to know how far from the (ideal) posterior distribution the approximation stands. Machine learning approaches presumably allow for an evaluation of the predictive performances, but less so for the modelling accuracy, even with new sampling steps. [But I know nothing, I know!]

Dennis Prangle presented some on-going research on high dimension [data] ABC. Raising the question of what is the true meaning of dimension in ABC algorithms. Or of sample size. Because the inference relies on the event d(s(y),s(y’))≤ξ or on the likelihood l(θ|x). Both one-dimensional. Mentioning Iain Murray’s talk at NIPS [that I also missed]. Re-expressing as well the perspective that ABC can be seen as a missing or estimated normalising constant problem as in Bornn et al. (2015) I discussed earlier. The central idea is to use SMC to simulate a particle cloud evolving as the target tolerance ξ decreases. Which supposes a latent variable structure lurking in the background.
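To make the one-dimensional acceptance event d(s(y),s(y’))≤ξ concrete, here is a minimal sketch using plain ABC rejection rather than the SMC scheme of the talk. Everything specific is an assumption for illustration: a Uniform(-5, 5) prior, a N(θ, 1) model, the sample mean as summary s, the absolute difference as distance d, and the function names.

```python
import random
import statistics


def summary(data):
    """Summary statistic s(.): here simply the sample mean (an assumption)."""
    return statistics.fmean(data)


def abc_rejection(s_obs, tol, n_draws, n_obs, rng):
    """Keep prior draws whose pseudo-data summary falls within tol of s_obs."""
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                        # draw from the prior
        pseudo = [rng.gauss(theta, 1.0) for _ in range(n_obs)]  # simulate y'
        if abs(summary(pseudo) - s_obs) <= tol:               # event d(s(y), s(y')) <= tol
            accepted.append(theta)
    return accepted


rng = random.Random(1)
s_obs = 0.5  # hypothetical observed summary
loose = abc_rejection(s_obs, tol=1.0, n_draws=5000, n_obs=20, rng=rng)
tight = abc_rejection(s_obs, tol=0.2, n_draws=5000, n_obs=20, rng=rng)
print(len(loose), len(tight))
```

Whatever the dimension of the data, the accept/reject decision depends only on a scalar distance. Lowering the tolerance sharpens the accepted sample at the cost of fewer acceptances, which is the trade-off a sequence of decreasing tolerances is meant to manage.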

Judith Rousseau gave her talk on non-parametric mixtures and the possibility to learn parametrically about the component weights. Starting with a rather “magic” result by Allman et al. (2009) that, with three repeated observations per individual, all terms in a mixture are identifiable. Maybe related to the simpler fact that mixtures of Bernoullis are not identifiable while mixtures of Binomials are identifiable, even when n=2. As “shown” in this plot made for X validated. Actually truly related because Allman et al. (2009) prove identifiability through a finite dimensional model. (I am surprised I missed this most interesting paper!) With the side condition that a mixture of p components made of r Bernoulli products is identifiable when p ≥ 2⌈log₂ r⌉ + 1, where log₂ is the base-2 logarithm and ⌈x⌉ the upper rounding of x. I also find most relevant this distinction between the weights and the remainder of the mixture, as weights behave quite differently, hardly parameters in a sense.
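A small numerical illustration of the Bernoulli versus Binomial contrast, with two hypothetical parameter settings chosen to share the same mean. A two-component Bernoulli mixture is itself a Bernoulli with probability w·p1 + (1-w)·p2, so both settings give the same distribution, while the corresponding Binomial(2, ·) mixtures differ. This compares one pair of settings only, so it illustrates the contrast rather than proving identifiability.

```python
from math import comb


def binomial_mixture_pmf(n, w, p1, p2):
    """pmf on {0, ..., n} of w * Bin(n, p1) + (1 - w) * Bin(n, p2)."""
    return [
        w * comb(n, k) * p1**k * (1 - p1) ** (n - k)
        + (1 - w) * comb(n, k) * p2**k * (1 - p2) ** (n - k)
        for k in range(n + 1)
    ]


# Two hypothetical settings (w, p1, p2) with the same mean 0.5.
setting_a = (0.5, 0.2, 0.8)
setting_b = (0.5, 0.4, 0.6)

# n = 1: Bernoulli mixtures, indistinguishable from the data distribution.
bern_a = binomial_mixture_pmf(1, *setting_a)
bern_b = binomial_mixture_pmf(1, *setting_b)

# n = 2: Binomial mixtures, now the two settings give different pmfs.
bin_a = binomial_mixture_pmf(2, *setting_a)
bin_b = binomial_mixture_pmf(2, *setting_b)

print(bern_a, bern_b)
print(bin_a, bin_b)
```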

Combining Particle MCMC with Rao-Blackwellized Monte Carlo Data Association

Posted in Books, Statistics, University life on October 10, 2014 by xi'an

This recently arXived paper by Juho Kokkala and Simo Särkkä mixes a whole lot of interesting topics, from particle MCMC and Rao-Blackwellisation to particle filters, Kalman filters, and even bear population estimation. The starting setup is the state-space hidden process models where particle filters are of use. And where Andrieu, Doucet and Holenstein (2010) introduced their particle MCMC algorithms. Rao-Blackwellisation steps have been proposed in this setup in the original paper, as well as in the ensuing discussion, like recycling rejected parameters and associated particles. The beginning of the paper is a review of the literature in this area, in particular of the Rao-Blackwellized Monte Carlo Data Association algorithm developed by Särkkä et al. (2007), of which I was not aware previously. (I alas have not followed closely enough the filtering literature in the past years.) Targets evolve independently according to Gaussian dynamics.

In the description of the model (Section 3), I feel there are prerequisites on the model I did not have (and did not check in Särkkä et al., 2007), like the meaning of targets and measurements: it seems the model assumes each measurement corresponds to a given target. More details or an example would have helped. The extension relative to the existing algorithm appears to be the (major) step of including unknown parameters. Due to my lack of expertise in the domain, I have no notion of the existence of similar proposals in the literature, but handling unknown parameters is definitely of direct relevance for the statistical analysis of such problems!

The simulation experiment based on an Ornstein-Uhlenbeck model is somewhat anticlimactic in that the posterior on the mean reversion rate is essentially the prior, conveniently centred at the true value, while the others remain quite wide. It may be that the experiment was too ambitious in selecting 30 simultaneous targets with only a total of 150 observations. Without highly informative priors, my Boeotian reaction is to doubt the feasibility of the inference. In the case of the Finnish bear study, the huge discrepancy between priors and posteriors, as well as the significant difference between the forestry expert estimations and the model predictions, should be discussed, if not addressed, possibly via a simulation using the posteriors as priors. Or maybe using a hierarchical Bayes model to gather a time-wise coherence in the number of bear families. (I wonder if this technique would apply to the type of data gathered by Mohan Delampady on the Western Ghats tigers…)

Overall, I am slightly intrigued by the practice of running MCMC chains in parallel and merging the outcomes with no further processing. This assumes a lot in terms of convergence and mixing on all the chains. However, convergence is never directly addressed in the paper.
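One standard way to check the assumption before merging parallel chains is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variances. The sketch below is not from the paper: it uses the basic (non-split) version of the statistic, a hypothetical function name, and i.i.d. Normal draws standing in for MCMC output.

```python
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)  # W
    between = n * statistics.variance(chain_means)                     # B
    pooled = (n - 1) / n * within + between / n                        # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(2)

# Stand-ins for MCMC output (an assumption): four chains targeting the same N(0, 1).
agreeing = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]

# Same, except that one chain is stuck around a different location.
disagreeing = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(3)]
disagreeing.append([rng.gauss(3.0, 1.0) for _ in range(1000)])

rhat_ok = gelman_rubin(agreeing)
rhat_bad = gelman_rubin(disagreeing)
print(rhat_ok, rhat_bad)
```

A value close to 1 is compatible with the chains exploring the same distribution, while a value well above 1 signals that merging them would mix draws from different regions. A value near 1 is of course no proof of convergence.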

Advances in scalable Bayesian computation [day #2]

Posted in Books, Mountains, pictures, R, Statistics, University life on March 5, 2014 by xi'an

And here is the second day of our workshop Advances in Scalable Bayesian Computation gone! This time, it sounded like the “main” theme was about brains… In fact, Simon Barthelmé’s research originated from neurosciences, while Dawn Woodard dissected a brain (via MRI) during her talk! (Note that the BIRS website currently posts Simon’s video as being Dan Simpson’s talk, the late change in schedule being due to Dan most unfortunately losing his passport during a plane transfer and most unfortunately being prevented from attending…) I found Simon’s talk quite inspiring, with this Tibshirani et al.’s trick of using logistic regression to estimate densities as a classification problem central to the method and suggesting a completely different vista for handling normalising constants… Then Raazesh Sainudiin gave a detailed explanation and validation of his approach to density estimation by multidimensional pavings/histograms, with a tree representation allowing for fast merging of different estimators. Raaz had given a preliminary version of the talk at CREST last Fall, which helped with focussing on the statistical aspects of the method. Chris Strickland then exposed an image analysis of flooded Northern Queensland landscapes, using a spatio-temporal model with changepoints and about 18,000 parameters, still managing to get an efficiency of O(np) thanks to two tricks. Then it was time for the group photograph outside in a balmy -18° and an open research time that was quite profitable.

In the afternoon sessions, Paul Fearnhead presented an auxiliary variable approach to particle Gibbs, which again opened new possibilities for handling state-space models, while also reminding me of Xiao-Li Meng’s reparameterisation devices. And making me wonder (out loud) whether or not the SMC algorithm was that essential in a static setting, since the sequence could be explored in any possible order for a fixed time horizon. Then Emily Fox gave a 2-for-1 talk, mostly focussing on the first talk, where she introduced a new technique for approximating the gradient in Hamiltonian (or Hockey!) Monte Carlo, using second order Langevin. She did not have much time for the second talk, which intersected with the one she gave at BNP’ski in Chamonix, but focussed on a notion of sandwiched slice sampling where the target density only needs bounds that can get improved if needed. A cool trick! And the talks ended with Dawn Woodard’s analysis of time varying 3-D brain images towards lesion detection, through an efficient estimation of a spatial mixture of normals.

Advances in scalable Bayesian computation [day #1]

Posted in Books, Mountains, pictures, R, Statistics, University life on March 4, 2014 by xi'an

This was the first day of our workshop Advances in Scalable Bayesian Computation and it sounded like the “main” theme was probabilistic programming, in tune with my book review posted this morning. Indeed, both Vikash Mansinghka and Frank Wood gave talks about this concept, Vikash detailing the specifics of a new programming language called Venture and Frank focussing on his state-space version of the above called Anglican. This is a version of the language Church, developed to handle probabilistic models and inference (hence the joke about Anglican, “a Church of England Venture”! But they could have also added that Frank Wood was also the name of a former archbishop of Melbourne..!) I alas had an involuntary doze during Vikash’s talk, which made it harder for me to assess the fundamentals of those ventures, of how they extended beyond a “mere” new software (and of why I would invest in learning a Lisp-based language!).

The other talks of Day #1 were of a more “classical” nature, with Pierre Jacob explaining why non-negative unbiased estimators were impossible to provide in general, a paper I posted about a little while ago, and including an objective Bayes example that I found quite interesting. Then Sumeet Singh (no video) presented a joint work with Nicolas Chopin on the uniform ergodicity of the particle Gibbs sampler, a paper that I should have commented on here (except that it appeared just prior to The Accident!), with a nice coupling proof. And Maria Lomeli gave us an introduction to the highly general Poisson-Kingman mixture models as random measures, which encompass all of the previously studied non-parametric random measures, with an MCMC implementation that included a latent variable representation for the alpha-stable process behind the scene, a representation that could be (and maybe is) also useful in parametric analyses of alpha-stable processes.

We also had an open discussion in the afternoon that ended up being quite exciting, with a few of us voicing some problems or questions about existing methods and others making suggestions or contradictions. We are still a wee bit short of considering a collective paper on “MCMC under constraints with coherent cross-validated variational Bayes and loss-based pseudo priors, with applications to basketball data” to appear by the end of the week!

Add to this two visits to the Sally Borden Recreation Centre for morning swimming and evening climbing, and it is no wonder I woke up a bit late this morning! Looking forward to Day #2!