Archive for Biometrika

the exponential power of now

Posted in Books, Statistics, University life with tags , , , , , , , , , , on March 22, 2020 by xi'an

The New York Times had an interview on 13 March with Britta Jewell (MRC, Imperial College London) and Nick Jewell (London School of Hygiene and Tropical Medicine & U of C Berkeley), both epidemiologists. (Nick is also an AE for Biometrika.) Where they explain quite convincingly that the devastating power of the exponential growth and the resulting need for immediate reaction. An urgency that Western governments failed to heed, unsurprisingly including the US federal government. Maybe they should have been told afresh about the legend of paal paysam, where the king who lost to Krishna was asked to double rice grains on the successive squares of a chess board. (Although this is presumably too foreign a thought experiment for The agent orange. He presumably prefers the unbelievable ideological rantings of John Ioannides. Who apparently does mind sacrificing “people with limited life expectancies” for the sake of the economy.) Incidentally, I find the title “The exponential power of now” fabulous!

séminaire P de S

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on February 18, 2020 by xi'an

As I was in Paris and free for the occasion (!), I attended the Paris Statistics seminar this afternoon, in the Latin Quarter. With a first talk by Kweku Abraham on Bayesian inverse problems set a prior on the quantity of interest, γ, rather than its transform G(γ), observed with noise. Always perturbed by the juggling of different distances, like L² versus Kullback-Leibler, in non-parametric frameworks. Reminding me of probabilistic numerics, at least in the framework, since the crux of the talk was 100% about convergence. And a second talk by Leanaïc Chizat on convex neural networks corresponding to an infinite number of neurons, with surprising properties, including implicit bias. And a third talk by Anne Sabourin on PCA for extremes. Which assumed very little on the model but more on the geometry of the distribution, like extremes being concentrated on a subspace. As I was rather tired from an intense week at Warwick, and after a weekend of reading grant applications and Biometrika submissions (!), my foggy brain kept switching to these proposals, trying to make connections with the talks, not completely inappropriately in two cases out of three. (I am afraid the same may happen tomorrow at our probability seminar on computer-based proofs!)

Hastings at 50, from a Metropolis

Posted in Kids, pictures, Running, Travel with tags , , , , , , , , , , , , , , , , , , , , , , on January 4, 2020 by xi'an

A weekend trip to the quaint seaside city of Le Touquet Paris-Plage, facing the city of Hastings on the other side of the Channel, 50 miles away (and invisible on the pictures!), during and after a storm that made for a fantastic watch from our beach-side rental, if less for running! The town is far from being a metropolis, actually, but it got its added surname “Paris-Plage” from British investors who wanted to attract their countrymen in the late 1800s. The writers H.G. Wells and P.G. Wodehouse lived there for a while. (Another type of tourist, William the Conqueror, left for Hastings in 1066 from a wee farther south, near Saint-Valéry-sur-Somme.)

And the coincidental on-line publication in Biometrika of a 50 year anniversary paper, The Hastings algorithm at fifty by David Dunson and James Johndrow. More of a celebration than a comprehensive review, with focus on scalable MCMC, gradient based algorithms, Hamiltonian Monte Carlo, nonreversible Markov chains, and interesting forays into approximate Bayes. Which makes for a great read for graduate students and seasoned researchers alike!

unbiased Hamiltonian Monte Carlo with couplings

Posted in Books, Kids, Statistics, University life with tags , , , , , , on October 25, 2019 by xi'an

In the June issue of Biometrika, which had been sitting for a few weeks on my desk under my teapot!, Jeremy Heng and Pierre Jacob published a paper on unbiased estimators for Hamiltonian Monte Carlo using couplings. (Disclaimer: I was not involved with the review or editing of this paper.) Which extends to HMC environments the earlier paper of Pierre Jacob, John O’Leary and Yves Atchadé, to be discussed soon at the Royal Statistical Society. The fundamentals are the same, namely that an unbiased estimator can be produced from a converging sequence of estimators and that it can be de facto computed if two Markov chains with the same marginal can be coupled. The issue with Hamiltonians is to figure out how to couple their dynamics. In the Gaussian case, it is relatively easy to see that two chains with the same initial momentum meet periodically. In general, there is contraction within a compact set (Lemma 1). The coupling extends to a time discretisation of the Hamiltonian flow by a leap-frog integrator, still using the same momentum. Which roughly amounts in using the same random numbers in both chains. When defining a relaxed meeting (!) where both chains are within δ of one another, the authors rely on a drift condition (8) that reminds me of the early days of MCMC convergence and seem to imply the existence of a small set “where the target distribution [density] is strongly log-concave”. And which makes me wonder if this small set could be used instead to create renewal events that would in turn ensure both stationarity and unbiasedness without the recourse to a second coupled chain. When compared on a Gaussian example with couplings on Metropolis-Hastings and MALA (Fig. 1), the coupled HMC sees hardly any impact of the dimension of the target (in the average coupling time), with a much lower value. However, I wonder at the relevance of the meeting time as an assessment of efficiency. In the sense that the coupling time is not a convergence time but reflects as well on the initial conditions. I acknowledge that this allows for an averaging over  parallel implementations but I remain puzzled by the statement that this leads to “estimators that are consistent in the limit of the number of replicates, rather than in the usual limit of the number of Markov chain iterations”, since a particularly poor initial distribution could on principle lead to a mode of the target being never explored or on the coupling time being ever so rarely too large for the computing abilities at hand.

down with Galton (and Pearson and Fisher…)

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , on July 22, 2019 by xi'an

In the last issue of Significance, which I read in Warwick prior to the conference, there is a most interesting article on Galton’s eugenics, his heritage at University College London (UCL), and the overall trouble with honouring prominent figures of the past with memorials like named building or lectures… The starting point of this debate is a protest from some UCL students and faculty about UCL having a lecture room named after the late Francis Galton who was a professor there. Who further donated at his death most of his fortune to the university towards creating a professorship in eugenics. The protests are about Galton’s involvement in the eugenics movement of the late 18th and early 19th century. As well as professing racist opinions.

My first reaction after reading about these protests was why not?! Named places or lectures, as well as statues and other memorials, have a limited utility, especially when the named person is long dead and they certainly do not contribute in making a scientific theory [associated with the said individual] more appealing or more valid. And since “humans are [only] humans”, to quote Stephen Stigler speaking in this article, it is unrealistic to expect great scientists to be perfect, the more if one multiplies the codes for ethical or acceptable behaviours across ages and cultures. It is also more rational to use amphitheater MS.02 and lecture room AC.18 rather than associate them with one name chosen out of many alumni’s or former professors’.

Predictably, another reaction of mine was why bother?!, as removing Galton’s name from the items it is attached to is highly unlikely to change current views on eugenism or racism. On the opposite, it seems to detract from opposing the present versions of these ideologies. As some recent proposals linking genes and some form of academic success. Another of my (multiple) reactions was that as stated in the article these views of Galton’s reflected upon the views and prejudices of the time, when the notions of races and inequalities between races (as well as genders and social classes) were almost universally accepted, including in scientific publications like the proceedings of the Royal Society and Nature. When Karl Pearson launched the Annals of Eugenics in 1925 (after he started Biometrika) with the very purpose of establishing a scientific basis for eugenics. (An editorship that Ronald Fisher would later take over, along with his views on the differences between races, believing that “human groups differ profoundly in their innate capacity for intellectual and emotional development”.) Starting from these prejudiced views, Galton set up a scientific and statistical approach to support them, by accumulating data and possibly modifying some of these views. But without much empathy for the consequences, as shown in this terrible quote I found when looking for more material:

“I should feel but little compassion if I saw all the Damaras in the hand of a slave-owner, for they could hardly become more wretched than they are now…”

As it happens, my first exposure to Galton was in my first probability course at ENSAE when a terrific professor was peppering his lectures with historical anecdotes and used to mention Galton’s data-gathering trip to Namibia, literally measure local inhabitants towards his physiognomical views , also reflected in the above attempt of his to superpose photographs to achieve the “ideal” thief…

dynamic nested sampling for stars

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , , , , , , , , , , on April 12, 2019 by xi'an

In the sequel of earlier nested sampling packages, like MultiNest, Joshua Speagle has written a new package called dynesty that manages dynamic nested sampling, primarily intended for astronomical applications. Which is the field where nested sampling is the most popular. One of the first remarks in the paper is that nested sampling can be more easily implemented by using a Uniform reparameterisation of the prior, that is, a reparameterisation that turns the prior into a Uniform over the unit hypercube. Which means in fine that the prior distribution can be generated from a fixed vector of uniforms and known transforms. Maybe not such an issue given that this is the prior after all.  The author considers this makes sampling under the likelihood constraint a much simpler problem but it all depends in the end on the concentration of the likelihood within the unit hypercube. And on the ability to reach the higher likelihood slices. I did not see any special trick when looking at the documentation, but reflected on the fundamental connection between nested sampling and this ability. As in the original proposal by John Skilling (2006), the slice volumes are “estimated” by simulated Beta order statistics, with no connection with the actual sequence of simulation or the problem at hand. We did point out our incomprehension for such a scheme in our Biometrika paper with Nicolas Chopin. As in earlier versions, the algorithm attempts at visualising the slices by different bounding techniques, before proceeding to explore the bounded regions by several exploration algorithms, including HMC.

“As with any sampling method, we strongly advocate that Nested Sampling should not be viewed as being strictly“better” or “worse” than MCMC, but rather as a tool that can be more or less useful in certain problems. There is no “One True Method to Rule Them All”, even though it can be tempting to look for one.”

When introducing the dynamic version, the author lists three drawbacks for the static (original) version. One is the reliance on this transform of a Uniform vector over an hypercube. Another one is that the overall runtime is highly sensitive to the choice the prior. (If simulating from the prior rather than an importance function, as suggested in our paper.) A third one is the issue that nested sampling is impervious to the final goal, evidence approximation versus posterior simulation, i.e., uses a constant rate of prior integration. The dynamic version simply modifies the number of point simulated in each slice. According to the (relative) increase in evidence provided by the current slice, estimated through iterations. This makes nested sampling a sort of inversted Wang-Landau since it sharpens the difference between slices. (The dynamic aspects for estimating the volumes of the slices and the stopping rule may hinder convergence in unclear ways, which is not discussed by the paper.) Among the many examples produced in the paper, a 200 dimension Normal target, which is an interesting object for posterior simulation in that most of the posterior mass rests on a ring away from the maximum of the likelihood. But does not seem to merit a mention in the discussion. Another example of heterogeneous regression favourably compares dynesty with MCMC in terms of ESS (but fails to include an HMC version).

[Breaking News: Although I wrote this post before the exciting first image of the black hole in M87 was made public and hence before I was aware of it, the associated AJL paper points out relying on dynesty for comparing several physical models of the phenomenon by nested sampling.]


revisiting marginalisation paradoxes [Bayesian reads #1]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , on February 8, 2019 by xi'an

As a reading suggestion for my (last) OxWaSP Bayesian course at Oxford, I included the classic 1973 Marginalisation paradoxes by Phil Dawid, Mervyn Stone [whom I met when visiting UCL in 1992 since he was sharing an office with my friend Costas Goutis], and Jim Zidek. Paper that also appears in my (recent) slides as an exercise. And has been discussed many times on this  ‘Og.

Reading the paper in the train to Oxford was quite pleasant, with a few discoveries like an interesting pike at Fraser’s structural (crypto-fiducial?!) distributions that “do not need Bayesian improper priors to fall into the same paradoxes”. And a most fascinating if surprising inclusion of the Box-Müller random generator in an argument, something of a precursor to perfect sampling (?). And a clear declaration that (right-Haar) invariant priors are at the source of the resolution of the paradox. With a much less clear notion of “un-Bayesian priors” as those leading to a paradox. Especially when the authors exhibit a red herring where the paradox cannot disappear, no matter what the prior is. Rich discussion (with none of the current 400 word length constraint), including the suggestion of neutral points, namely those that do identify a posterior, whatever that means. Funny conclusion, as well:

“In Stone and Dawid’s Biometrika paper, B1 promised never to use improper priors again. That resolution was short-lived and let us hope that these two blinkered Bayesians will find a way out of their present confusion and make another comeback.” D.J. Bartholomew (LSE)

and another

“An eminent Oxford statistician with decidedly mathematical inclinations once remarked to me that he was in favour of Bayesian theory because it made statisticians learn about Haar measure.” A.D. McLaren (Glasgow)

and yet another

“The fundamentals of statistical inference lie beneath a sea of mathematics and scientific opinion that is polluted with red herrings, not all spawned by Bayesians of course.” G.N. Wilkinson (Rothamsted Station)

Lindley’s discussion is more serious if not unkind. Dennis Lindley essentially follows the lead of the authors to conclude that “improper priors must go”. To the point of retracting what was written in his book! Although concluding about the consequences for standard statistics, since they allow for admissible procedures that are associated with improper priors. If the later must go, the former must go as well!!! (A bit of sophistry involved in this argument…) Efron’s point is more constructive in this regard since he recalls the dangers of using proper priors with huge variance. And the little hope one can hold about having a prior that is uninformative in every dimension. (A point much more blatantly expressed by Dickey mocking “magic unique prior distributions”.) And Dempster points out even more clearly that the fundamental difficulty with these paradoxes is that the prior marginal does not exist. Don Fraser may be the most brutal discussant of all, stating that the paradoxes are not new and that “the conclusions are erroneous or unfounded”. Also complaining about Lindley’s review of his book [suggesting prior integration could save the day] in Biometrika, where he was not allowed a rejoinder. It reflects on the then intense opposition between Bayesians and fiducialist Fisherians. (Funny enough, given the place of these marginalisation paradoxes in his book, I was mistakenly convinced that Jaynes was one of the discussants of this historical paper. He is mentioned in the reply by the authors.)