Archive for MCMC

delayed acceptance [alternative]

Posted in Books, Kids, Statistics, University life on October 22, 2014 by xi'an

In a comment on our Accelerating Metropolis-Hastings algorithms: Delayed acceptance with prefetching paper, Philip commented that he had experimented with an alternative splitting technique retaining the right stationary measure: the idea behind his alternative acceleration is again (a) to divide the target into bits and (b) to run the acceptance step by parts, towards a major reduction in computing time. The difference with our approach is to represent the overall acceptance probability as

\min_{k=0,\ldots,d}\left\{\prod_{j=1}^k \rho_j(\eta,\theta),1\right\}

and, even more surprisingly than in our case, this representation remains associated with the right (posterior) target, provided the ordering of the terms is random with a symmetric distribution over the permutations! This property can be directly checked via the detailed balance condition.
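In code, this alternative acceptance step amounts to comparing a single uniform with the running product of the ρ_j's, evaluated in a uniformly random order, and rejecting as soon as the running product falls below the uniform. A minimal R sketch (my own rendering, not Philip's code, with rho_funs a hypothetical list of functions returning the individual factors):

alt_delayed_accept <- function(eta, theta, rho_funs) {
  # single uniform compared with the running products prod_{j<=k} rho_j
  u <- runif(1)
  prod_rho <- 1
  for (j in sample(seq_along(rho_funs))) {  # uniformly random (symmetric) ordering
    prod_rho <- prod_rho * rho_funs[[j]](eta, theta)
    if (u >= prod_rho) return(FALSE)        # early rejection: remaining factors never computed
  }
  TRUE                                      # move from eta to theta accepted
}

By contrast, our delayed acceptance compares an independent uniform with each factor ρ_j separately, which is what allows testing the cheap terms first.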

In a toy example, I compared the acceptance rates (acrat) for our delayed solution (letabin.R), for this alternative (letamin.R), and for a non-delayed reference (letabaz.R), when considering more and more fractured decompositions of a Bernoulli likelihood.

> system.time(source("letabin.R"))
user system elapsed
225.918 0.444 227.200
> acrat
[1] 0.3195 0.2424 0.2154 0.1917 0.1305 0.0958
> system.time(source("letamin.R"))
user system elapsed
340.677 0.512 345.389
> acrat
[1] 0.4045 0.4138 0.4194 0.4003 0.3998 0.4145
> system.time(source("letabaz.R"))
user system elapsed
49.271 0.080 49.862
> acrat
[1] 0.6078 0.6068 0.6103 0.6086 0.6040 0.6158

A very interesting outcome, since the acceptance rate does not change with the number of terms in the decomposition for the alternative delayed acceptance method, even though it logically takes longer than our solution. However, the drawback is that detailed balance implies picking the order at random, hence losing the gain of computing the cheap terms first. If reversibility could be bypassed, then this alternative would definitely become very appealing!

Combining Particle MCMC with Rao-Blackwellized Monte Carlo Data Association

Posted in Books, Statistics, University life on October 10, 2014 by xi'an

This recently arXived paper by Juho Kokkala and Simo Särkkä mixes a whole lot of interesting topics, from particle MCMC and Rao-Blackwellisation to particle filters, Kalman filters, and even bear population estimation. The starting setup is that of state-space hidden process models, where particle filters are of use and where Andrieu, Doucet and Holenstein (2010) introduced their particle MCMC algorithms. Rao-Blackwellisation steps have been proposed in this setup in the original paper, as well as in the ensuing discussion, like recycling rejected parameters and associated particles. The beginning of the paper is a review of the literature in this area, in particular of the Rao-Blackwellized Monte Carlo Data Association algorithm developed by Särkkä et al. (2007), of which I was not previously aware. (I alas have not followed the filtering literature closely enough in the past years.) In this model, targets evolve independently according to Gaussian dynamics.

In the description of the model (Section 3), I feel there are prerequisites on the model I did not have (and did not check in Särkkä et al., 2007), like the meaning of targets and measurements: it seems the model assumes each measurement corresponds to a given target. More details or an example would have helped. The extension over the existing literature appears to be the (major) step of including unknown parameters. Due to my lack of expertise in the domain, I have no notion of the existence of similar proposals in the literature, but handling unknown parameters is definitely of direct relevance for the statistical analysis of such problems!

The simulation experiment based on an Ornstein-Uhlenbeck model is somewhat anticlimactic in that the posterior on the mean reversion rate is essentially the prior, conveniently centred at the true value, while the others remain quite wide. It may be that the experiment was too ambitious in selecting 30 simultaneous targets for a total of only 150 observations. Without highly informative priors, my beotian reaction is to doubt the feasibility of the inference. In the case of the Finnish bear study, the huge discrepancy between priors and posteriors, as well as the significant difference between the forestry experts' estimations and the model predictions, should be discussed, if not addressed, possibly via a simulation using the posteriors as priors. Or maybe via a hierarchical Bayes model to impose time-wise coherence on the number of bear families. (I wonder if this technique would apply to the type of data gathered by Mohan Delampady on the West Ghats tigers…)

Overall, I am slightly intrigued by the practice of running MCMC chains in parallel and merging the outcomes with no further processing. This assumes a lot in terms of convergence and mixing of all the chains, yet convergence is never directly addressed in the paper.
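As a side note, and purely as my own illustration rather than anything taken from the paper, checking the chains before pooling them requires little more than a Gelman-Rubin type statistic; a minimal R sketch, assuming the parallel chains are stored as numeric vectors of equal length in a list:

pool_chains <- function(chains, tol = 1.05) {
  n <- length(chains[[1]])
  B <- n * var(sapply(chains, mean))      # between-chain variance
  W <- mean(sapply(chains, var))          # within-chain variance
  Rhat <- sqrt(((n - 1) / n * W + B / n) / W)
  if (Rhat > tol)
    warning("potential lack of convergence, Rhat = ", round(Rhat, 3))
  unlist(chains)                          # pooled sample
}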

hypothesis testing for MCMC

Posted in Books, Statistics, University life on October 6, 2014 by xi'an

A recent arXival by Benjamin Gyori and Daniel Paulin considers sequential testing based on MCMC simulation. The test is about an expectation under the target and stationary distribution of the Markov chain (i.e., the posterior in a Bayesian setting). Hence testing whether or not this posterior expectation is below a certain bound is not directly relevant from a Bayesian perspective: one would rather test whether or not the parameter itself is below the bound… The paper is thus more a study of sequential tests when the data is a Markov chain than a paper with any clear connection to MCMC topics, despite including an example of a Metropolis-Hastings scheme for approximating the posterior on the parameters of an ODE. I am a bit puzzled by the purpose of the test, as I was rather expecting tests connected with the convergence of the Markov chain or of the empirical mean. (But, given the current hour, I may also have missed a crucial point!)
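For the sake of illustration only (this is in no way Gyori and Paulin's procedure), deciding from MCMC output whether a posterior expectation E[h(θ)] sits below a bound could look like the following R sketch, based on a batch-means estimate of the Monte Carlo error, to be applied repeatedly as the chain grows and stopped at the first clear-cut decision:

bound_decision <- function(h_values, bound, level = 0.05, batches = 30) {
  # h_values: h(theta_t) along the MCMC output so far
  n <- length(h_values)
  bsize <- floor(n / batches)
  bmeans <- sapply(seq_len(batches), function(b)
    mean(h_values[((b - 1) * bsize + 1):(b * bsize)]))
  est <- mean(h_values)
  mcse <- sd(bmeans) / sqrt(batches)      # batch-means Monte Carlo standard error
  z <- qnorm(1 - level / 2)
  if (est + z * mcse < bound) "below"
  else if (est - z * mcse > bound) "above"
  else "undecided"
}

with the obvious caveat that repeatedly applying such a rule without correction does not control the overall error, which is precisely what sequential procedures are designed to handle.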

future of computational statistics

Posted in Books, pictures, R, Statistics, University life on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing. The breadth and quality of the talks at MCMski IV are testimony to this. A second trend, not unrelated to the first one, is the development of new techniques and the rehabilitation of older ones to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc., with a corollary being a healthy questioning of the models themselves, as illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain wary of the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision… A third characteristic is the emergence of new languages and meta-languages intended to handle the complexity both of problems and of solutions, towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestions of novel directions in computational statistics, or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics

BAYSM’14 recollection

Posted in Books, Kids, pictures, Statistics, Travel, University life, Wines on September 23, 2014 by xi'an

[Poster for the meeting, found everywhere on the WU campus, Wien business and economics university]

When I got invited to BAYSM’14 last December, I was quite excited to be part of the event. (And to have the opportunity to be in Austria, in Wien, and on the new WU campus!) And most definitely and a posteriori I have not been disappointed, given the high expectations I had for that meeting…! The organisation was seamless, even by Austrian [high] standards, the program diverse and innovative, if somewhat brutal for older Bayesians, and the organising committee (Angela Bitto, Gregor Kastner, and Alexandra Posekany) deserves an ISBA recognition award [yet to be created!] for their hard work and dedication. Thanks also to Sylvia Frühwirth-Schnatter for hosting the meeting in her university. They set the standard very high for the next BAYSM organising team. (To be held in Firenze/Florence, on June 19-21, 2016, just prior to the ISBA World meeting not taking place in Banff. A great idea to associate with a major meeting, in order to save on travel costs. Maybe the following BAYSM will take place in Edinburgh! Young, local, and interested Bayesians just have to contact the board of BAYS with proposals.)

So, very exciting and diverse. A lot of talks in applied domains, esp. economics and finance, in connection with the themes of the guest institution, WU. Among the talks most related to my areas of interest, I was pleased to see Matthew Simpson working on interweaving MCMC with Vivek Roy and Jarad Niemi, Madhura Killedar constructing her own kind of experimental ABC on galaxy clusters, Kathrin Plankensteiner using Gaussian processes on accelerated test data, Julyan Arbel explaining modelling by completely random measures for hazard mixtures [and showing his filiation with me by (a) adapting my pun title to his talk, (b) adding an unrelated mountain picture to the title page, (c) including a picture of a famous probabilist, Paul Lévy, in his introduction of Lévy processes, and (d) using xkcd strips], Ewan Cameron considering future ABC for malaria modelling, Konstantinos Perrakis working on generic importance functions in data augmentation settings, Markus Hainy presenting his likelihood-free design (that I commented on a while ago), and Kees Mulder explaining how to work with the circular von Mises distribution. Not to mention the numerous posters I enjoyed over the first evening. And my student Clara Grazian, who talked about our joint and current work on Jeffreys priors for mixtures of distributions, a talk that led me to think of several extensions…

Besides my trek through past and current works of mine dealing with mixtures, the plenary sessions for mature Bayesians were given by Mike West and Chris Holmes, who gave very different talks but with the similar message that data was catching up with modelling, and with a vengeance, and that we [or rather young Bayesians] needed to deal with this difficulty, using approximate or proxy models. Somewhat in connection with my last part on an alternative to Bayes factors, Mike also mentioned a modification of the Bayes factor in order to attenuate the absorbing impact of long time series. And Chris re-set Bayesian analysis within decision theory, constructing approximate models by incorporating the loss function as a substitute for the likelihood.

Once again, a terrific meeting in a fantastic place with a highly unusual warm spell. Plus enough time to run around Vienna and its castles and churches. And enjoy local wines (great conference evening at a Heuriger, where we did indeed experience Gemütlichkeit.) And museums. Wunderbar!

Bangalore workshop [ಬೆಂಗಳೂರು ಕಾರ್ಯಾಗಾರ] and new book

Posted in Books, pictures, R, Statistics, Travel, University life on August 13, 2014 by xi'an

On the last day of the IFCAM workshop in Bangalore, Marc Lavielle from INRIA presented a talk on mixed effects models where he illustrated his original computer language Monolix. And mentioned that his CRC Press book on Mixed Effects Models for the Population Approach was out! (Appropriately listed as out on a 14th of July on amazon!) He actually demonstrated the abilities of Monolix live, on diabetes data provided by an earlier speaker from Kolkata, which was a perfect way to start initiating a collaboration! Nice cover (which is all I have seen of the book at this stage!) that may induce candidates to write a review for CHANCE. Estimation of those mixed effects models relies on stochastic EM algorithms developed by Marc Lavielle and Éric Moulines in the 90's, as well as on MCMC methods.
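To make that last sentence slightly more concrete, here is a heavily simplified R sketch of a SAEM-type iteration for the toy random effects model y_ij = b_i + ε_ij, with b_i ~ N(μ, ω²) and ε_ij ~ N(0, σ²); this is my own illustration of the stochastic EM principle, not Monolix code, and the exact Gaussian conditional simulation of the b_i's is specific to this toy case:

saem_toy <- function(y, iters = 200) {
  # y: N x n matrix of observations, one row per subject
  N <- nrow(y); n <- ncol(y)
  mu <- mean(y); omega2 <- 1; sigma2 <- 1
  S1 <- S2 <- S3 <- 0                     # stochastic approximations of sufficient statistics
  for (k in seq_len(iters)) {
    gamma <- if (k <= iters / 2) 1 else 1 / (k - iters / 2)  # SAEM step sizes
    # simulation step: random effects drawn from their exact Gaussian conditional
    prec <- n / sigma2 + 1 / omega2
    b <- rnorm(N, (rowSums(y) / sigma2 + mu / omega2) / prec, sqrt(1 / prec))
    # stochastic approximation step on the complete-data sufficient statistics
    S1 <- S1 + gamma * (sum(b) - S1)
    S2 <- S2 + gamma * (sum(b^2) - S2)
    S3 <- S3 + gamma * (sum((y - b)^2) - S3)
    # maximisation step
    mu <- S1 / N
    omega2 <- S2 / N - mu^2
    sigma2 <- S3 / (N * n)
  }
  c(mu = mu, omega2 = omega2, sigma2 = sigma2)
}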

adaptive equi-energy sampling

Posted in Statistics, University life on July 14, 2014 by xi'an

Today, I took part in the thesis defence of Amandine Schreck at Télécom ParisTech. I had commented a while ago on the Langevin algorithm for discontinuous targets she developed with co-authors from that school towards variable selection. The thesis also contains material on the equi-energy sampler that is worth mentioning. The algorithm relates to the Wang-Landau algorithm last discussed here at the time of the seminars of Pierre and Luke in Paris, last month. The algorithm aims at facilitating the moves around the target density by favouring moves from one energy level to the next. As explained to me by Pierre once again after his seminar, the division of the space according to the target values is a way to avoid creating artificial partitions over the sampling space. A sort of Lebesgue version of Monte Carlo integration. The energy bands

\{\theta;\ \underline{\pi}\le \pi(\theta) \le \overline{\pi}\}

require the choice of a sequence of bounds on the density, values that are hardly available prior to the simulation of the target. The paper corresponding to this part of the thesis (published in our special issue of TOMACS last year) thus considers the extension where the bounds are defined on the go, in an adaptive way. This could be achieved based on earlier simulations, using some quantiles of the observed values of the target, but this is a costly solution which requires keeping an ordered sample of the density values. (Is it that costly?!) The authors thus prefer to determine the energy levels in a cheaper adaptive manner, namely through a Robbins-Monro/stochastic approximation type update of the bounds,

\xi_{n+1,\alpha}=\xi_{n,\alpha}+\gamma_n\left(\alpha-\mathbb{I}\{\pi(\theta_n)\le \xi_{n,\alpha}\}\right)\,.
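In R, this update is a one-liner and indeed requires neither storing nor sorting past values of the target; a minimal sketch (my own rendering, not the thesis code):

update_bound <- function(xi, alpha, pi_theta_n, gamma_n)
  xi + gamma_n * (alpha - (pi_theta_n <= xi))

# toy check: tracking the 0.75-quantile of Exp(1) draws standing in for pi(theta_n)
xi <- 0
for (n in 1:1e4) xi <- update_bound(xi, 0.75, rexp(1), 1 / n)
xi  # should end up close to qexp(0.75), about 1.39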

My questions related to this part of the thesis were about the actual gain, if any, in computing time versus efficiency, the limitations in terms of curse of dimensionality and storage, the connections with the Wang-Landau algorithm and pseudo-marginal approximations, and the (degree of) likelihood of a universal and automated adaptive equi-energy sampler.
