## at the centre of Bayes

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on October 14, 2019 by xi'an

## thermodynamic integration plus temperings

Posted in Statistics, Travel, University life with tags , , , , , , , , , , , , on July 30, 2019 by xi'an

Biljana Stojkova and David Campbel recently arXived a paper on the used of parallel simulated tempering for thermodynamic integration towards producing estimates of marginal likelihoods. Resulting into a rather unwieldy acronym of PT-STWNC for “Parallel Tempering – Simulated Tempering Without Normalizing Constants”. Remember that parallel tempering runs T chains in parallel for T different powers of the likelihood (from 0 to 1), potentially swapping chain values at each iteration. Simulated tempering monitors a single chain that explores both the parameter space and the temperature range. Requiring a prior on the temperature. Whose optimal if unrealistic choice was found by Geyer and Thomson (1995) to be proportional to the inverse (and unknown) normalising constant (albeit over a finite set of temperatures). Proposing the new temperature instead via a random walk, the Metropolis within Gibbs update of the temperature τ then involves normalising constants.

“This approach is explored as proof of concept and not in a general sense because the precision of the approximation depends on the quality of the interpolator which in turn will be impacted by smoothness and continuity of the manifold, properties which are difficult to characterize or guarantee given the multi-modal nature of the likelihoods.”

To bypass this issue, the authors pick for their (formal) prior on the temperature τ, a prior such that the profile posterior distribution on τ is constant, i.e. the joint distribution at τ and at the mode [of the conditional posterior distribution of the parameter] is constant. This choice makes for a closed form prior, provided this mode of the tempered posterior can de facto be computed for each value of τ. (However it is unclear to me why the exact mode would need to be used.) The resulting Metropolis ratio becomes independent of the normalising constants. The final version of the algorithm runs an extra exchange step on both this simulated tempering version and the untempered version, i.e., the original unnormalised posterior. For the marginal likelihood, thermodynamic integration is invoked, following Friel and Pettitt (2008), using simulated tempering samples of (θ,τ) pairs (associated instead with the above constant profile posterior) and simple Riemann integration of the expected log posterior. The paper stresses the gain due to a continuous temperature scale, as it “removes the need for optimal temperature discretization schedule.” The method is applied to the Glaxy (mixture) dataset in order to compare it with the earlier approach of Friel and Pettitt (2008), resulting in (a) a selection of the mixture with five components and (b) much more variability between the estimated marginal  likelihoods for different numbers of components than in the earlier approach (where the estimates hardly move with k). And (c) a trimodal distribution on the means [and unimodal on the variances]. This example is however hard to interpret, since there are many contradicting interpretations for the various numbers of components in the model. (I recall Radford Neal giving an impromptu talks at an ICMS workshop in Edinburgh in 2001 to warn us we should not use the dataset without a clear(er) understanding of the astrophysics behind. If I remember well he was excluded all low values for the number of components as being inappropriate…. I also remember taking two days off with Peter Green to go climbing Craigh Meagaidh, as the only authorised climbing place around during the foot-and-mouth epidemics.) In conclusion, after presumably too light a read (I did not referee the paper!), it remains unclear to me why the combination of the various tempering schemes is bringing a noticeable improvement over the existing. At a given computational cost. As the temperature distribution does not seem to favour spending time in the regions where the target is most quickly changing. As such the algorithm rather appears as a special form of exchange algorithm.

## convergence of MCMC

Posted in Statistics with tags , , , , , , , , , on June 16, 2017 by xi'an

Michael Betancourt just posted on arXiv an historical  review piece on the convergence of MCMC, with a physical perspective.

“The success of these of Markov chain Monte Carlo, however, contributed to its own demise.”

The discourse proceeds through augmented [reality!] versions of MCMC algorithms taking advantage of the shape and nature of the target distribution, like Langevin diffusions [which cannot be simulated directly and exactly at the same time] in statistics and molecular dynamics in physics. (Which reminded me of the two parallel threads at the ICMS workshop we had a few years ago.) Merging into hybrid Monte Carlo, morphing into Hamiltonian Monte Carlo under the quills of Radford Neal and David MacKay in the 1990’s. It is a short entry (and so is this post), with some background already well-known to the community, but it nonetheless provides a perspective and references rarely mentioned in statistics.

## Bruce Lindsay (March 7, 1947 — May 5, 2015)

Posted in Books, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on May 22, 2015 by xi'an

## “those” coincidences

Posted in pictures, Travel, University life with tags , , , , , , , , , , , , , , on June 21, 2014 by xi'an

Last Thursday night, after a friendly dinner closing the ICMS workshop, I was rushing back to Pollock Halls to catch some sleep before a very early flight. When crossing North Bridge, on top of Waverley station, I then spotted in the crowd a well-known face of a fellow statistician from Cambridge University, on an academic visit to the University of Edinburgh that was completely unrelated with the workshop. Then, today, on my way back from submitting a visa request at the Indian embassy in Paris, I took the RER train for one stop between Gare du Nord and Chatelet. When I stood up from my seat and looked behind me, a senior (and most famous) mathematician was sitting right there, in deep conversation with a colleague about algorithms… Just two of “those” coincidences. (Edinburgh may be propitious to coincidences: at the last ICMS workshop I attended, I ended up in the same Indian restaurant as Marc Suchard, who also was on an academic visit to the University of Edinburgh that was completely unrelated with the workshop!)

## posterior likelihood ratio is back

Posted in Statistics, University life with tags , , , , , , , , , on June 10, 2014 by xi'an

“The PLR turns out to be a natural Bayesian measure of evidence of the studied hypotheses.”

Isabelle Smith and André Ferrari just arXived a paper on the posterior distribution of the likelihood ratio. This is in line with Murray Aitkin’s notion of considering the likelihood ratio

$f(x|\theta_0) / f(x|\theta)$

as a prior quantity, when contemplating the null hypothesis that θ is equal to θ0. (Also advanced by Alan Birnbaum and Arthur Dempster.) A concept we criticised (rather strongly) in our Statistics and Risk Modelling paper with Andrew Gelman and Judith Rousseau.  The arguments found in the current paper in defence of the posterior likelihood ratio are quite similar to Aitkin’s:

• defined for (some) improper priors;
• invariant under observation or parameter transforms;
• more informative than tthe posterior mean of the posterior likelihood ratio, not-so-incidentally equal to the Bayes factor;
• avoiding using the posterior mean for an asymmetric posterior distribution;
• achieving some degree of reconciliation between Bayesian and frequentist perspectives, e.g. by being equal to some p-values;
• easily computed by MCMC means (if need be).

One generalisation found in the paper handles the case of composite versus composite hypotheses, of the form

$\int\mathbb{I}\left( p(x|\theta_1)

which brings back an earlier criticism I raised (in Edinburgh, at ICMS, where as one-of-those-coincidences, I read this paper!), namely that using the product of the marginals rather than the joint posterior is no more a standard Bayesian practice than using the data in a prior quantity. And leads to multiple uses of the data. Hence, having already delivered my perspective on this approach in the past, I do not feel the urge to “raise the flag” once again about a paper that is otherwise well-documented and mathematically rich.

## Edinburgh snapshot (#5)

Posted in pictures, Running, Travel with tags , , , , on June 9, 2014 by xi'an