## thermodynamic integration plus temperings

Posted in Statistics, Travel, University life with tags , , , , , , , , , , , , on July 30, 2019 by xi'an

Biljana Stojkova and David Campbel recently arXived a paper on the used of parallel simulated tempering for thermodynamic integration towards producing estimates of marginal likelihoods. Resulting into a rather unwieldy acronym of PT-STWNC for “Parallel Tempering – Simulated Tempering Without Normalizing Constants”. Remember that parallel tempering runs T chains in parallel for T different powers of the likelihood (from 0 to 1), potentially swapping chain values at each iteration. Simulated tempering monitors a single chain that explores both the parameter space and the temperature range. Requiring a prior on the temperature. Whose optimal if unrealistic choice was found by Geyer and Thomson (1995) to be proportional to the inverse (and unknown) normalising constant (albeit over a finite set of temperatures). Proposing the new temperature instead via a random walk, the Metropolis within Gibbs update of the temperature τ then involves normalising constants.

“This approach is explored as proof of concept and not in a general sense because the precision of the approximation depends on the quality of the interpolator which in turn will be impacted by smoothness and continuity of the manifold, properties which are difficult to characterize or guarantee given the multi-modal nature of the likelihoods.”

To bypass this issue, the authors pick for their (formal) prior on the temperature τ, a prior such that the profile posterior distribution on τ is constant, i.e. the joint distribution at τ and at the mode [of the conditional posterior distribution of the parameter] is constant. This choice makes for a closed form prior, provided this mode of the tempered posterior can de facto be computed for each value of τ. (However it is unclear to me why the exact mode would need to be used.) The resulting Metropolis ratio becomes independent of the normalising constants. The final version of the algorithm runs an extra exchange step on both this simulated tempering version and the untempered version, i.e., the original unnormalised posterior. For the marginal likelihood, thermodynamic integration is invoked, following Friel and Pettitt (2008), using simulated tempering samples of (θ,τ) pairs (associated instead with the above constant profile posterior) and simple Riemann integration of the expected log posterior. The paper stresses the gain due to a continuous temperature scale, as it “removes the need for optimal temperature discretization schedule.” The method is applied to the Glaxy (mixture) dataset in order to compare it with the earlier approach of Friel and Pettitt (2008), resulting in (a) a selection of the mixture with five components and (b) much more variability between the estimated marginal  likelihoods for different numbers of components than in the earlier approach (where the estimates hardly move with k). And (c) a trimodal distribution on the means [and unimodal on the variances]. This example is however hard to interpret, since there are many contradicting interpretations for the various numbers of components in the model. (I recall Radford Neal giving an impromptu talks at an ICMS workshop in Edinburgh in 2001 to warn us we should not use the dataset without a clear(er) understanding of the astrophysics behind. If I remember well he was excluded all low values for the number of components as being inappropriate…. I also remember taking two days off with Peter Green to go climbing Craigh Meagaidh, as the only authorised climbing place around during the foot-and-mouth epidemics.) In conclusion, after presumably too light a read (I did not referee the paper!), it remains unclear to me why the combination of the various tempering schemes is bringing a noticeable improvement over the existing. At a given computational cost. As the temperature distribution does not seem to favour spending time in the regions where the target is most quickly changing. As such the algorithm rather appears as a special form of exchange algorithm.

## convergence of MCMC

Posted in Statistics with tags , , , , , , , , , on June 16, 2017 by xi'an

Michael Betancourt just posted on arXiv an historical  review piece on the convergence of MCMC, with a physical perspective.

“The success of these of Markov chain Monte Carlo, however, contributed to its own demise.”

The discourse proceeds through augmented [reality!] versions of MCMC algorithms taking advantage of the shape and nature of the target distribution, like Langevin diffusions [which cannot be simulated directly and exactly at the same time] in statistics and molecular dynamics in physics. (Which reminded me of the two parallel threads at the ICMS workshop we had a few years ago.) Merging into hybrid Monte Carlo, morphing into Hamiltonian Monte Carlo under the quills of Radford Neal and David MacKay in the 1990’s. It is a short entry (and so is this post), with some background already well-known to the community, but it nonetheless provides a perspective and references rarely mentioned in statistics.

## Bruce Lindsay (March 7, 1947 — May 5, 2015)

Posted in Books, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on May 22, 2015 by xi'an

## “those” coincidences

Posted in pictures, Travel, University life with tags , , , , , , , , , , , , , , on June 21, 2014 by xi'an

Last Thursday night, after a friendly dinner closing the ICMS workshop, I was rushing back to Pollock Halls to catch some sleep before a very early flight. When crossing North Bridge, on top of Waverley station, I then spotted in the crowd a well-known face of a fellow statistician from Cambridge University, on an academic visit to the University of Edinburgh that was completely unrelated with the workshop. Then, today, on my way back from submitting a visa request at the Indian embassy in Paris, I took the RER train for one stop between Gare du Nord and Chatelet. When I stood up from my seat and looked behind me, a senior (and most famous) mathematician was sitting right there, in deep conversation with a colleague about algorithms… Just two of “those” coincidences. (Edinburgh may be propitious to coincidences: at the last ICMS workshop I attended, I ended up in the same Indian restaurant as Marc Suchard, who also was on an academic visit to the University of Edinburgh that was completely unrelated with the workshop!)

## posterior likelihood ratio is back

Posted in Statistics, University life with tags , , , , , , , , , on June 10, 2014 by xi'an

“The PLR turns out to be a natural Bayesian measure of evidence of the studied hypotheses.”

Isabelle Smith and André Ferrari just arXived a paper on the posterior distribution of the likelihood ratio. This is in line with Murray Aitkin’s notion of considering the likelihood ratio

$f(x|\theta_0) / f(x|\theta)$

as a prior quantity, when contemplating the null hypothesis that θ is equal to θ0. (Also advanced by Alan Birnbaum and Arthur Dempster.) A concept we criticised (rather strongly) in our Statistics and Risk Modelling paper with Andrew Gelman and Judith Rousseau.  The arguments found in the current paper in defence of the posterior likelihood ratio are quite similar to Aitkin’s:

• defined for (some) improper priors;
• invariant under observation or parameter transforms;
• more informative than tthe posterior mean of the posterior likelihood ratio, not-so-incidentally equal to the Bayes factor;
• avoiding using the posterior mean for an asymmetric posterior distribution;
• achieving some degree of reconciliation between Bayesian and frequentist perspectives, e.g. by being equal to some p-values;
• easily computed by MCMC means (if need be).

One generalisation found in the paper handles the case of composite versus composite hypotheses, of the form

$\int\mathbb{I}\left( p(x|\theta_1)

which brings back an earlier criticism I raised (in Edinburgh, at ICMS, where as one-of-those-coincidences, I read this paper!), namely that using the product of the marginals rather than the joint posterior is no more a standard Bayesian practice than using the data in a prior quantity. And leads to multiple uses of the data. Hence, having already delivered my perspective on this approach in the past, I do not feel the urge to “raise the flag” once again about a paper that is otherwise well-documented and mathematically rich.

## Edinburgh snapshot (#5)

Posted in pictures, Running, Travel with tags , , , , on June 9, 2014 by xi'an

## computational methods for statistical mechanics [day #4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on June 7, 2014 by xi'an

My last day at this ICMS workshop on molecular simulation started [with a double loop of Arthur’s Seat thankfully avoiding the heavy rains of the previous night and then] Chris Chipot‘s magistral entry to molecular simulation for proteins with impressive slides and simulation movies, even though I could not follow the details to really understand the simulation challenges therein, just catching a few connections with earlier talks. A typical example of a cross-disciplinary gap, where the other discipline always seems to be stressing the ‘wrong” aspects. Although this is perfectly unrealistic, it would immensely to prepare talks in pairs for such interdisciplinary workshops! Then Gersende Fort presented results about convergence and efficiency for the Wang-Landau algorithm. The idea is to find the optimal rate for updating the weights of the elements of the partition towards reaching the flat histogram in minimal time. Showing massive gains on toy examples. The next talk went back to molecular biology with Jérôme Hénin‘s presentation on improved adaptive biased sampling. With an exciting notion of orthogonality aiming at finding the slowest directions in the target and putting the computational effort. He also discussed the tension between long single simulations and short repeated ones, echoing a long-going debate in the MCMC community. (He also had a slide with a picture of my first 1983 Apple IIe computer!) Then Antonietta Mira gave a broad perspective on delayed rejection and zero variance estimates. With impressive variance reductions (although some physicists then asked for reduction of order 10¹⁰!). Johannes Zimmer gave a beautiful maths talk on the connection between particle and diffusion limits (PDEs) and Wasserstein geometry and large deviations. (I did not get most of the talk, but it was nonetheless beautiful!) Bert Kappen concluded the day (and the workshop for me) by a nice introduction to control theory. Making connection between optimal control and optimal importance sampling. Which made me idly think of the following problem: what if control cannot be completely… controlled and hence involves a stochastic part? Presumably of little interest as the control would then be on the parameters of the distribution of the control.

“The alanine dipeptide is the fruit fly of molecular simulation.”

The example of this alanine dipeptide molecule was so recurrent during the talks that it justified the above quote by Michael Allen. Not that I am more proficient in the point of studying this protein or using it as a benchmark. Or in identifying the specifics of the challenges of molecular dynamics simulation. Not a criticism of the ICMS workshop obviously, but rather of my congenital difficulty with continuous time processes!!! So I do not return from Edinburgh with a new research collaborative project in molecular dynamics (if with more traditional prospects), albeit with the perception that a minimal effort could bring me to breach the vocabulary barrier. And maybe consider ABC ventures in those (new) domains. (Although I fear my talk on ABC did not impact most of the audience!)