Archive for MCMC

zig, zag, and subsampling

Posted in Books, Statistics, University life on December 29, 2016 by xi'an

Today, I alas missed a seminar at BiPS on the Zig-Zag (sub-)sampler of Joris Bierkens, Paul Fearnhead and Gareth Roberts, presented here in Paris by James Ridgway. Fortunately for me, I had some discussions with Murray Pollock in Warwick and then again with Changye Wu in Dauphine that shed some light on this complex but highly innovative approach to simulating in Big Data settings thanks to a correct subsampling mechanism.

The zig-zag process runs a continuous-time trajectory made of linear segments that switch from one diagonal direction to the next at random times, driven by a generator connected with the components of the gradient of the target log-density. Plus a symmetric term. Provided those random times can be generated, this process is exactly available and associated with the intended target distribution. When the components of the parameter are independent (an unlikely setting), those random times can be derived from inhomogeneous Poisson processes. In the general case, one needs to bound the gradient components by more manageable functions, which define a dominating Poisson process that can later be thinned. Next, one needs to simulate events from this upper-bound rate, a task that seems hard to achieve beyond linear and piecewise constant upper bounds. The process has a bit of a slice sampling taste, except that it cannot be used as a slice sampler and instead requires continuous-time integration, given that the length of each segment matters. (Or maybe random-time subsampling?)
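To make the thinning mechanism concrete, here is a minimal sketch (mine, not the authors') of a one-dimensional zig-zag sampler for a standard Normal target: along a segment the true event rate max(0, v(x+vt)) is dominated by the linear bound max(0, vx)+t, the bounding Poisson clock is simulated by inversion, and candidate switching times are thinned.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    """Gradient of the potential U(x) = x²/2, i.e. a standard Normal target."""
    return x

def zigzag(T=1_000.0):
    """1-d zig-zag skeleton via Poisson thinning.

    Along the segment x + v*t the true switching rate is
    lambda(t) = max(0, v * grad_U(x + v*t)) = max(0, v*x + t),
    dominated by the linear bound max(0, v*x) + t."""
    x, v, t = 0.0, 1.0, 0.0
    skeleton = [(t, x, v)]
    while t < T:
        a = max(v * x, 0.0)                    # intercept of the bounding rate a + s
        tau = np.sqrt(a**2 + 2 * rng.exponential()) - a  # inverts a*tau + tau²/2 = E
        x, t = x + v * tau, t + tau            # move along the current segment
        if rng.uniform() * (a + tau) < max(v * grad_U(x), 0.0):  # thinning step
            v = -v                             # accepted event: switch direction
            skeleton.append((t, x, v))
    return skeleton

skel = zigzag()
```

Note that the skeleton points are not draws from the target: expectations must be computed by integrating along the continuous trajectory, which is the continuous-time integration alluded to above.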

A highly innovative part of the paper concentrates on Big Data likelihoods and on the possibility of subsampling the original dataset properly and exactly. The authors propose a Zig-Zag version with subsampling that replaces the full gradients with random parts of those gradients, while remaining unbiased. There may be a cost associated with this reduction from n gradient terms to one, namely that the upper bounds must now dominate every element in the likelihood at once, hence may grow larger and become (even) less efficient. (I am more uncertain about the case of the control variates, as it relies on a Lipschitz assumption.) While I still miss an easy way to implement the approach in a specific model, I remain hopeful for this new approach to make a major dent in the current methodologies!
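As a toy version of the subsampling device (my own illustration, not the paper's), the gradient of the full negative log-likelihood in a Normal mean model can be replaced by n times the gradient of a single uniformly drawn term, which is unbiased but forces the thinning bounds to dominate every datum at once:

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(2.0, 1.0, size=1000)   # toy Normal mean model
n = len(data)

def full_grad(theta):
    """Gradient of the full negative log-likelihood (sum over all n terms)."""
    return np.sum(theta - data)

def sub_grad(theta):
    """Unbiased single-datum substitute: E[n * (theta - x_I)] = full_grad(theta)."""
    i = rng.integers(n)
    return n * (theta - data[i])
```

The catch is visible here: the bound on the switching rate must now dominate n·|θ−xᵢ| uniformly in i, which is why the resulting bounds are (even) looser than in the full-gradient version, unless control variates tighten them.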

anytime!

Posted in Books, Mountains, pictures, Statistics, Travel on December 22, 2016 by xi'an

“An anytime algorithm is an algorithm that can be run continuously, generating progressively better solutions when afforded additional computation time. Traditional particle-based inference algorithms are not anytime in nature; all particles need to be propagated in lock-step to completion in order to compute expectations.”

Following a discussion with Lawrence Murray last week, I read Paige et al.'s NIPS 2014 paper on their anytime sequential Monte Carlo algorithm. As explained in the above quote, an anytime algorithm is interruptible, meaning it can be stopped at any time without biasing its outcome. While MCMC algorithms can qualify as anytime (provided they have reached their stationary regime), this is not the case with sequential and particle Monte Carlo algorithms, which do not have an inbuilt growing mechanism preserving the target. In the case of Paige et al.'s proposal, the interruptible solution returns an unbiased estimator of the marginal likelihood at time n for any number of particles, even when this number is set or increased during the computation. The idea behind the solution is to create a particle cascade by going one particle at a time and creating children of this particle in proportion to the ratio of its weight to the current average weight, an approach that can be run indefinitely. And since memory is not infinite, the authors explain how to cap the number of alive particles without putting the running distribution in jeopardy…
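To convey the branching intuition (a rough sketch of mine that ignores the exact weight bookkeeping of Paige et al.), each particle, visited in turn, spawns a stochastically rounded number of children given by the ratio of its weight to the running average weight:

```python
import numpy as np

rng = np.random.default_rng(2)

def branch(weights):
    """Cascade-style branching sketch: particles are processed one at a time
    and particle k spawns a number of children governed by the ratio of its
    weight to the running average weight, rounded stochastically."""
    offspring = []
    running_sum = 0.0
    for k, w in enumerate(weights, start=1):
        running_sum += w
        ratio = w * k / running_sum            # weight over running average
        floor = int(ratio)
        offspring.append(floor + (rng.uniform() < ratio - floor))
    return offspring

print(branch(rng.exponential(size=10)))
```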

pseudo-marginal MCMC with minimal replicas

Posted in Books, Statistics, University life on November 16, 2016 by xi'an

A week ago, Chris Sherlock, Alexandre Thiery and Anthony Lee (Warwick) arXived a short paper where they show that it is most efficient to use only one random substitute to the likelihood when considering a pseudo-marginal algorithm, thus generalising an earlier paper of Luke Bornn and co-authors on which I commented a while ago. A neat side result in the paper is that the acceptance probability for m replicas [in the pseudo-marginal approximation] is at most m/s times the acceptance probability for s replicas when s<m. The main result is as follows:

[screenshot of the paper's main result]

There is a (superficial?) connection with the fact that, when running Metropolis-within-Gibbs, there is no advantage in doing more than one single Metropolis iteration per conditional, as the goal is to converge to the joint posterior rather than to better approximate each full conditional…
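For concreteness, here is a minimal single-replica pseudo-marginal sketch on a toy latent-variable model (my own illustration, not the authors' setting): the likelihood is replaced by a nonnegative unbiased estimate based on one latent draw, and the chain still targets the exact posterior.

```python
import numpy as np

rng = np.random.default_rng(3)
y = 1.2   # single observation in the toy model y = z + eps

def lik_estimate(theta):
    """Nonnegative unbiased estimate of the marginal likelihood of y given
    theta in the model z ~ N(theta, 1), y | z ~ N(z, 1), using a single
    latent draw (one 'replica')."""
    z = theta + rng.normal()
    return np.exp(-0.5 * (y - z) ** 2) / np.sqrt(2 * np.pi)

def pseudo_marginal_mh(theta0, n_iter=10_000, step=1.0):
    """Pseudo-marginal Metropolis-Hastings with a flat prior: the exact
    likelihood is replaced by its unbiased estimate, refreshed only at
    proposals, which leaves the exact posterior invariant."""
    theta, L = theta0, lik_estimate(theta0)
    chain = np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.normal()
        L_prop = lik_estimate(prop)
        if rng.uniform() * L < L_prop:   # MH acceptance with estimated likelihoods
            theta, L = prop, L_prop
        chain[i] = theta
    return chain

chain = pseudo_marginal_mh(0.0)
```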

Journal of Open Source Software

Posted in Books, R, Statistics, University life on October 4, 2016 by xi'an

A week ago, I received a request to referee a paper for the Journal of Open Source Software, a journal I had never seen (or even heard of) before. The concept is quite interesting, with a scope much broader than statistical computing (indeed, I do not know anyone on the board and no-one there seems affiliated with a Statistics department). Papers are very terse, describing the associated code in a page or two, and the purpose of refereeing is to check the code. (I was asked to evaluate an MCMC R package but declined for lack of time.) Which is a pretty light task if the code is friendly enough to operate right away and provides demos. Best of luck to this endeavour!

Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life on September 13, 2016 by xi'an

[The Wider Image: Bolivia's cholita climbers — Aymara indigenous women Domitila Alana, 42, Bertha Vedia, 48, Lidia Huayllas, 48, and Dora Magueno, 50, posing at the Huayna Potosi mountain, Bolivia, April 6, 2016. (c.) REUTERS/David Mercado]

A. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the null model as the ratio of the posterior to the prior density of the tested parameter (thus eliminating nuisance or common parameters), taken at its null value,

B⁰¹=π(φ⁰|x)/π(φ⁰).

Modulo some continuity constraints on the prior density, and assuming that the conditional prior on the nuisance parameters is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous previous entries on this topic on the 'Og!
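As a toy illustration of this representation (a sketch of mine, not taken from the paper), consider a Normal mean φ with a conjugate N(0,1) prior under the encompassing model and the point null φ⁰=0: the ratio is available in closed form and can also be approximated by a kernel density estimate of posterior draws at zero.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(4)
y = rng.normal(0.3, 1.0, size=50)        # toy data, model y_i ~ N(phi, 1)
n = len(y)

# conjugate posterior of phi under the encompassing model, prior phi ~ N(0,1)
post_var = 1.0 / (1.0 + n)
post_mean = post_var * y.sum()
draws = rng.normal(post_mean, np.sqrt(post_var), size=10_000)

# Savage-Dickey: B01 = posterior density over prior density, both at phi = 0
B01_exact = norm.pdf(0.0, post_mean, np.sqrt(post_var)) / norm.pdf(0.0)
B01_mc = gaussian_kde(draws)(0.0)[0] / norm.pdf(0.0)
print(B01_exact, B01_mc)
```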

The supermodel created by the authors is a mixture of the original models, as in our paper, and… hold the presses!, it is a mixture of the likelihood functions, as in Phil O'Neill's and Theodore Kypraios's paper. Which is not mentioned in the current paper and obviously should be. In this representation, the posterior distribution on the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper, in addition to O'Neill and Kypraios's solution.
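To see how the two evidences enter, here is a small numerical check under a uniform prior on α (with made-up evidence values m¹ and m²): the posterior mean of α equals (2m¹+m²)/3(m¹+m²), from which the Bayes factor B¹²=m¹/m² can be recovered in closed form.

```python
import numpy as np

rng = np.random.default_rng(5)
m1, m2 = 2.0, 1.0                  # made-up evidences of the two models

# posterior on the mixture weight: pi(alpha|x) ∝ alpha*(m1-m2) + m2 on (0,1)
alpha = rng.uniform(size=200_000)            # draws from the uniform prior
w = alpha * (m1 - m2) + m2                   # unnormalised posterior density
r = np.average(alpha, weights=w)             # self-normalised posterior mean
B12 = (1 - 3 * r) / (3 * r - 2)              # inverts r = (2*m1+m2)/(3*(m1+m2))
print(B12)                                   # ≈ m1/m2 = 2
```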

The following quote is inaccurate, since in realistic settings the MCMC algorithm needs to simulate the parameters of the compared models, hence replaces the multidimensional integrals with Monte Carlo versions.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale for running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sounds quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin and Chib's (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible. Which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (naturally) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is plotted as a distribution in B¹², most strangely. The only other place I met this notion is in Murray Aitkin’s book. Something’s unclear there or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective, this does not matter [while it does in our perspective].

Jeff down-under

Posted in Books, Statistics, Travel, University life on September 9, 2016 by xi'an

Jeff Rosenthal is the AMSI-SSA (Australian Mathematical Sciences Institute – Statistical Society of Australia) lecturer this year and, as I did in 2012, will tour Australia giving seminars, including this one at QUT. Enjoy, if you happen to be down-under!

MCqMC 2016 [#4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life on August 21, 2016 by xi'an

In his plenary talk this morning, Arnaud Doucet discussed the application of pseudo-marginal techniques to the latent variable models he has been investigating for many years, and their limiting behaviour in terms of efficiency, with the idea of introducing correlation in the estimation of the likelihood ratio, reducing the complexity from O(T²) to O(T√T). With the very surprising conclusion that the correlation must go to 1 at a precise rate to achieve this reduction, since perfect correlation would induce a bias. A massive piece of work, indeed!
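My understanding of the correlation device (a sketch, not Arnaud's actual scheme) is that the auxiliary Gaussian variates driving the likelihood estimate are moved by an autoregressive step that preserves their N(0, I) distribution while correlating the estimates at the current and proposed values:

```python
import numpy as np

rng = np.random.default_rng(6)
rho = 0.99   # must tend to one at a precise rate for the O(T√T) complexity

def refresh(u):
    """Autoregressive (Crank-Nicolson style) update keeping u ~ N(0, I)
    marginally, so successive likelihood estimates share most of their
    auxiliary randomness and their ratio has much lower variance."""
    return rho * u + np.sqrt(1.0 - rho**2) * rng.normal(size=u.shape)

u = refresh(np.zeros(100))
```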

The next session of the morning was another instance of conflicting talks, and I hopped from one room to the next to listen to Hani Doss's empirical Bayes estimation with intractable constants (where maybe SAME could be of interest), to Youssef Marzouk's transport maps for MCMC, which sounds like an attractive idea provided the construction of the map remains manageable, and to Paul Russel's adaptive importance sampling, which somehow sounded connected with our population Monte Carlo approach. (With the additional step of considering transport maps.)

An interesting item of information I got from the final announcements at MCqMC 2016, just before heading to Monash, Melbourne, is that MCqMC 2018 will take place in the city of Rennes, Brittany, on July 2-6. Not only is it a nice location on its own, but it is most conveniently located in space and time for attending ISBA 2018 in Edinburgh the week after! Just moving from one Celtic city to another. Along with the other planned satellite workshops, this should make ISBA 2018 more attractive [if need be!] for participants from overseas.