Archive for Biometrika

adaptive ABC tolerance

Posted in Books, Statistics, University life on June 2, 2020 by xi'an

“There are three common approaches for selecting the tolerance sequence (…) [they] can lead to inefficient sampling”

Umberto Simola, Jessi Cisewski-Kehe, Michael Gutmann and Jukka Corander recently arXived a paper entitled Adaptive Approximate Bayesian Computation Tolerance Selection. I appreciate that they start from our ABC-PMC paper, i.e., Beaumont et al. (2009) [although the representation that the ABC tolerances are fixed in advance is somewhat incorrect, in that our codes set the tolerances as quantiles of the distances]. This is also the approach advocated for the initialisation step by the current paper, although it remains a wee bit vague. Subsequent steps are based on the proximity between the resulting approximations to the ABC posteriors, more exactly on a quantile derived from the maximum of the ratio between two successive estimated ABC posteriors, mimicking the Accept-Reject step, if always one step too late. The iteration stops when the ratio is almost one, possibly missing the target due to Monte Carlo variability. (Recall that the “optimal” tolerance is not zero for a finite sample size.)
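As a toy illustration of setting tolerances through quantiles of the distances (a hypothetical sketch in the spirit of our ABC-PMC codes, with made-up model and settings, not the authors' implementation), each round below takes the median of the current simulated distances as the next tolerance:

```python
import random
import statistics

def toy_abc(obs=1.5, n=300, rounds=4, q=0.5, seed=0):
    """Toy ABC on a Normal(theta, 1) model with a Uniform(-5, 5) prior.

    The tolerance of each round is the q-quantile (here the median) of
    the current distances, rather than a sequence fixed in advance.
    """
    rng = random.Random(seed)
    thetas = [rng.uniform(-5, 5) for _ in range(n)]  # prior draws
    tolerances = []
    for _ in range(rounds):
        # distance between a pseudo-observation and the actual observation
        dists = [abs(rng.gauss(t, 1.0) - obs) for t in thetas]
        tol = statistics.quantiles(dists, n=100)[int(q * 100) - 1]
        tolerances.append(tol)
        # keep particles within tolerance, refresh with a small jitter
        survivors = [t for t, d in zip(thetas, dists) if d <= tol]
        thetas = [rng.gauss(rng.choice(survivors), 0.5) for _ in range(n)]
    return tolerances, thetas

tols, particles = toy_abc()
# the tolerance sequence shrinks as the particles close in on obs
```

The quantile level q is precisely what the paper seeks to automate: too aggressive a decrease starves the sampler of acceptances, too slow a decrease wastes simulations.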

“…the decrease in the acceptance rate is mitigated by the improvement in the proposed particles.”

A problem is that the procedure depends on the form of the approximation and requires non-parametric, hence imprecise, estimation steps. Maybe variational encoders could help. Interesting approach by Sugiyama et al. (2012), of which I knew nothing, the core idea being that the ratio of two densities is also the solution to minimising a distance between the numerator density and a variable function times the denominator density. However, since only the maximum of the ratio is needed, a more focused approach could be devised, rather than first approximating the ratio and then maximising the estimate. Maybe the solution of Goffinet et al. (1992) for estimating an Accept-Reject constant could work.
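To make the two-step detour concrete, here is a hypothetical sketch (bandwidth, grid, and function names all made up) that estimates the supremum of the ratio of two successive posterior approximations by first building two kernel density estimates and then maximising their ratio over a grid; a direct ratio estimator à la Sugiyama et al. (2012), or a dedicated maximisation as in Goffinet et al. (1992), would target the supremum without forming the two densities separately:

```python
import math

def kde(sample, h=0.3):
    """Gaussian kernel density estimate with bandwidth h."""
    norm = len(sample) * h * math.sqrt(2 * math.pi)
    return lambda x: sum(math.exp(-0.5 * ((x - s) / h) ** 2)
                         for s in sample) / norm

def max_density_ratio(sample_new, sample_old, h=0.3, grid_n=200):
    """Estimate sup_x f_new(x) / f_old(x) on a grid covering both samples."""
    f_new, f_old = kde(sample_new, h), kde(sample_old, h)
    lo, hi = min(sample_new + sample_old), max(sample_new + sample_old)
    grid = [lo + (hi - lo) * i / (grid_n - 1) for i in range(grid_n)]
    return max(f_new(x) / f_old(x) for x in grid)

# identical samples give a ratio of one, the stopping signal;
# shifting one sample pushes the estimated supremum above one
```

Note that neither kernel estimate is exactly normalised in finite samples, which feeds into the worry below about the Accept-Reject analogy.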

A further comment is that the estimated density is not properly normalised, which lessens the Accept-Reject analogy since the optimum may well stand above one. And thus stop “too soon”. (Incidentally, the paper contains the mixture example of Sisson et al. (2007), for which our own graphs were strongly criticised during our Biometrika submission!)

the exponential power of now

Posted in Books, Statistics, University life on March 22, 2020 by xi'an

The New York Times had an interview on 13 March with Britta Jewell (MRC, Imperial College London) and Nick Jewell (London School of Hygiene and Tropical Medicine & UC Berkeley), both epidemiologists. (Nick is also an AE for Biometrika.) They explain quite convincingly the devastating power of exponential growth and the resulting need for immediate reaction, an urgency that Western governments failed to heed, unsurprisingly including the US federal government. Maybe they should have been told afresh about the legend of paal paysam, where the king who lost to Krishna was asked to double rice grains on the successive squares of a chess board. (Although this is presumably too foreign a thought experiment for the agent orange, who presumably prefers the unbelievable ideological rantings of John Ioannidis, who apparently does not mind sacrificing “people with limited life expectancies” for the sake of the economy.) Incidentally, I find the title “The exponential power of now” fabulous!
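The legend's arithmetic, checked in two lines (the three-day doubling time is a made-up illustration, not a figure from the interview):

```python
# one grain on the first square of the chess board, doubling across all 64
total = sum(2 ** k for k in range(64))
print(total)  # 18446744073709551615, i.e. 2**64 - 1, about 1.8e19 grains

# the same arithmetic drives an epidemic: with a doubling time of
# three days, a single case becomes 2**10 = 1024 cases within a month
print(2 ** (30 // 3))  # 1024
```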

séminaire P de S

Posted in Books, pictures, Statistics, University life on February 18, 2020 by xi'an

As I was in Paris and free for the occasion (!), I attended the Paris Statistics seminar this afternoon, in the Latin Quarter. With a first talk by Kweku Abraham on Bayesian inverse problems, setting a prior on the quantity of interest, γ, rather than on its transform G(γ), which is observed with noise. Always perturbed by the juggling of different distances, like L² versus Kullback-Leibler, in non-parametric frameworks. Reminding me of probabilistic numerics, at least in the framework, since the crux of the talk was 100% about convergence. And a second talk by Lénaïc Chizat on convex neural networks corresponding to an infinite number of neurons, with surprising properties, including implicit bias. And a third talk by Anne Sabourin on PCA for extremes, which assumed very little on the model but more on the geometry of the distribution, like extremes being concentrated on a subspace. As I was rather tired from an intense week at Warwick, and after a weekend of reading grant applications and Biometrika submissions (!), my foggy brain kept switching to these proposals, trying to make connections with the talks, not completely inappropriately in two cases out of three. (I am afraid the same may happen tomorrow at our probability seminar on computer-based proofs!)

Hastings at 50, from a Metropolis

Posted in Kids, pictures, Running, Travel on January 4, 2020 by xi'an

A weekend trip to the quaint seaside city of Le Touquet Paris-Plage, facing the city of Hastings on the other side of the Channel, 50 miles away (and invisible on the pictures!), during and after a storm that made for a fantastic watch from our beach-side rental, if less so for running! The town is far from being a metropolis, actually, but it got its added surname “Paris-Plage” from British investors who wanted to attract their countrymen in the late 1800s. The writers H.G. Wells and P.G. Wodehouse lived there for a while. (Another type of tourist, William the Conqueror, left for Hastings in 1066 from a wee bit farther south, near Saint-Valéry-sur-Somme.)

And the coincidental on-line publication in Biometrika of a 50 year anniversary paper, The Hastings algorithm at fifty by David Dunson and James Johndrow. More of a celebration than a comprehensive review, with focus on scalable MCMC, gradient based algorithms, Hamiltonian Monte Carlo, nonreversible Markov chains, and interesting forays into approximate Bayes. Which makes for a great read for graduate students and seasoned researchers alike!

unbiased Hamiltonian Monte Carlo with couplings

Posted in Books, Kids, Statistics, University life on October 25, 2019 by xi'an

In the June issue of Biometrika, which had been sitting for a few weeks on my desk under my teapot!, Jeremy Heng and Pierre Jacob published a paper on unbiased estimators for Hamiltonian Monte Carlo using couplings. (Disclaimer: I was not involved with the review or editing of this paper.) Which extends to HMC environments the earlier paper of Pierre Jacob, John O’Leary and Yves Atchadé, to be discussed soon at the Royal Statistical Society. The fundamentals are the same, namely that an unbiased estimator can be produced from a converging sequence of estimators and that it can be de facto computed if two Markov chains with the same marginal can be coupled. The issue with Hamiltonians is to figure out how to couple their dynamics. In the Gaussian case, it is relatively easy to see that two chains with the same initial momentum meet periodically. In general, there is contraction within a compact set (Lemma 1). The coupling extends to a time discretisation of the Hamiltonian flow by a leap-frog integrator, still using the same momentum, which roughly amounts to using the same random numbers in both chains. When defining a relaxed meeting (!) where both chains are within δ of one another, the authors rely on a drift condition (8) that reminds me of the early days of MCMC convergence and seems to imply the existence of a small set “where the target distribution [density] is strongly log-concave”. And which makes me wonder if this small set could be used instead to create renewal events that would in turn ensure both stationarity and unbiasedness without the recourse to a second coupled chain. When compared on a Gaussian example with couplings on Metropolis-Hastings and MALA (Fig. 1), the coupled HMC sees hardly any impact of the dimension of the target (in the average coupling time), with a much lower value. However, I wonder at the relevance of the meeting time as an assessment of efficiency.
In the sense that the coupling time is not a convergence time but also reflects the initial conditions. I acknowledge that this allows for averaging over parallel implementations, but I remain puzzled by the statement that this leads to “estimators that are consistent in the limit of the number of replicates, rather than in the usual limit of the number of Markov chain iterations”, since a particularly poor initial distribution could in principle lead to a mode of the target never being explored, or to the coupling time being, ever so rarely, too large for the computing abilities at hand.
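For readers new to the construction, here is a minimal sketch of the debiasing device on a toy chain (a five-state birth-death walk with common random numbers, hypothetical settings throughout, standing in for the Hamiltonian dynamics of the paper): one chain runs a step ahead, the pair is advanced together until they meet, and the correction terms telescope into an unbiased estimator of the stationary expectation.

```python
import random

def kernel(x, u):
    """One step of a toy birth-death walk on {0,...,4}.

    Up and down moves each have probability 1/3, so the stationary
    law is uniform and the stationary mean of h(x) = x is 2.0.
    """
    if u < 1 / 3 and x > 0:
        return x - 1
    if u > 2 / 3 and x < 4:
        return x + 1
    return x

def unbiased_estimate(h, rng, max_iter=10_000):
    """h(X_0) plus the sum of h(X_t) - h(Y_{t-1}) until the chains meet."""
    x = y = 0                        # a deliberately poor initial state
    est = h(x)                       # h(X_0)
    x = kernel(x, rng.random())      # X runs one step ahead of Y
    t = 1
    while x != y and t < max_iter:
        est += h(x) - h(y)           # bias-correction term
        u = rng.random()             # common random numbers couple the pair
        x, y = kernel(x, u), kernel(y, u)
        t += 1
    return est

rng = random.Random(42)
reps = [unbiased_estimate(lambda s: s, rng) for _ in range(50_000)]
print(sum(reps) / len(reps))  # averages to the stationary mean, 2.0
```

Averaging such independent replicates in parallel is what replaces the usual ergodic average, and it is exactly where the worry above bites: a rare but enormous meeting time, or a mode the initial distribution never reaches, only shows up in the replicates one fails to complete.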