## [more than] everything you always wanted to know about marginal likelihood

Posted in Books, Statistics, University life on February 10, 2022 by xi'an

Earlier this year, F. Llorente, L. Martino, D. Delgado, and J. Lopez-Santiago arXived an updated version of their massive survey on marginal likelihood computation. Which I can only warmly recommend to anyone interested in the matter! Or looking for a base camp to initiate a graduate project. They break the methods into four families:

1. Deterministic approximations (e.g., Laplace approximations)
2. Methods based on density estimation (e.g., Chib’s method, aka the candidate’s formula)
3. Importance sampling, including sequential Monte Carlo, with a subsection connecting with MCMC
4. Vertical representations (mostly, nested sampling)

Besides sheer computation, the survey also broaches issues like improper priors and alternatives to Bayes factors. The parts I would have covered in more detail are reversible jump MCMC and the long-lasting impact of Geyer’s reverse logistic regression (with the noise-contrastive extension), even though the link with bridge sampling is briefly mentioned there. There is even a table reporting on the coverage of earlier surveys. Of course, the following postnote of the manuscript

The Christian Robert’s blog deserves a special mention, since Professor C. Robert has devoted several entries of his blog with very interesting comments regarding the marginal likelihood estimation and related topics.

does not in the least make me less objective! Some of the final recommendations

• use of Naive Monte Carlo [simulate from the prior] should be always considered [assuming a proper prior!]
• a multiple-try method is a good choice within the MCMC schemes
• optimal umbrella sampling estimator is difficult and costly to implement, so its best performance may not be achieved in practice
• adaptive importance sampling uses the posterior samples to build a suitable normalized proposal, so it benefits from localizing samples in regions of high posterior probability while preserving the properties of standard importance sampling
• Chib’s method is a good alternative, that provide very good performances [but is not always available]
• the success [of nested sampling] in the literature is surprising.
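To make the first recommendation concrete, here is a minimal sketch of the naive Monte Carlo estimator, which averages the likelihood over draws from the (proper) prior. The toy model is my own choice and not taken from the survey: observations y_i ~ N(θ,1) with prior θ ~ N(0,τ²), for which the marginal likelihood is available in closed form and can serve as a check. The function names, the simulated data, and the values of τ and of the number of draws are all illustrative assumptions.

```python
import numpy as np

def exact_log_marginal(y, tau):
    """Closed-form log marginal likelihood for y_i ~ N(theta, 1), theta ~ N(0, tau^2)."""
    n = y.size
    s, ss = y.sum(), np.sum(y**2)
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * np.log1p(n * tau**2)
            - 0.5 * (ss - tau**2 * s**2 / (1 + n * tau**2)))

def naive_log_marginal(y, tau, n_draws=200_000, seed=1):
    """Naive Monte Carlo: average the likelihood over draws from the prior."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, size=n_draws)  # simulate from the prior
    n = y.size
    # log-likelihood of the whole sample, evaluated at every prior draw
    loglik = (-0.5 * n * np.log(2 * np.pi)
              - 0.5 * (np.sum(y**2) - 2 * theta * y.sum() + n * theta**2))
    # log of the average likelihood, computed stably (log-sum-exp)
    m = loglik.max()
    return m + np.log(np.mean(np.exp(loglik - m)))

# Illustrative (simulated, hence made-up) data
y = np.random.default_rng(0).normal(1.0, 1.0, size=10)
print(naive_log_marginal(y, tau=2.0), exact_log_marginal(y, tau=2.0))
```

The estimator behaves here because the prior is proper and not much wider than the posterior; with a very diffuse prior most draws would land where the likelihood is negligible and the variance would blow up.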

## poor statistics

Posted in Books, pictures, R, Statistics, Travel, Wines on September 24, 2019 by xi'an

I came over the weekend across this graph and the associated news that the county of Saint-Nazaire, on the southern border of Brittany, had a significantly higher rate of cancers than the Loire countries. The complete study, written by Solenne Delacour, Anne Cowppli-Bony, and Florence Molinié, is quite cautious about the reasons for this higher rate, even using a Bayesian Poisson-Gamma smoothing (via the R function empbaysmooth) and citing the 1991 paper by Besag, York and Mollié, but the local and national media are quick to blame the local industries for the difference. The graph above is particularly bad in that it accumulates mortality causes that are neither mutually exclusive nor independent. For instance, the much higher mortality rate due to alcohol is obviously responsible for higher rates of most other entries. And indicates a sociological pattern that may or may not be due to the type of job in the area, but differs from the more rural other parts of the Loire countries (which, like Brittany, are already significantly (50%) above the national reference for alcohol-related health issues), and may not be strongly connected to exposure to chemicals. For instance, the rates of pulmonary cancers are mostly comparable to the national average, if higher than in the rest of the Loire countries, and connect with a high smoking propensity. Lymphomas are not significantly different from the regional reference. The only type of cancer that can be directly attributed to working conditions is mesothelioma, mostly caused by exposure to asbestos, which was used in ship building, a specialty of the area. Among the many possible reasons for the higher mortality of the county, the study mentions a lower exposure to medical testing (connected with the sociological composition of the area). Which would indicate the most effective policies for lowering these higher cancer and mortality rates.

## amazing appendix

Posted in Books, Statistics, Travel, University life on February 13, 2018 by xi'an

In the first appendix of the 1995 Statistical Science paper of Besag, Green, Higdon and Mengersen on MCMC, “Bayesian Computation and Stochastic Systems”, stands a fairly neat result I was not aware of (and which Arnaud Doucet, with his unrivalled knowledge of the literature!, pointed out to me in Oxford, sparing me the tedium of trying to prove it afresh!). I remember well reading a version of the paper in Fort Collins, Colorado, in 1993 (I think!) but nothing about this result.

It goes as follows: when running a Metropolis-within-Gibbs sampler for component x¹ of a collection of variates x¹,x²,…, thus aiming at simulating from the full conditional of x¹ given x⁻¹ by making a proposal q(x|x¹,x⁻¹), it is perfectly acceptable to use a proposal that depends on a parameter α (no surprise so far!), to generate this parameter α anew at each iteration (still unsurprising as α can be taken as an auxiliary variable), and to have the distribution of this parameter α depend on the other variates x²,…, i.e., x⁻¹. This is the surprising part, as adding α as an auxiliary variable would mess up the update of x⁻¹. But the proof as found in the 1995 paper [page 35] does not require treating α as such, as it establishes global balance directly. (Or maybe still detailed balance when writing the whole Gibbs sampler as a cycle of Metropolis steps.) Terrific! And a whiff mysterious..!
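As a numerical illustration (and certainly not the proof of the paper), here is a small sketch of such a sampler. The target, a bivariate Normal with correlation ρ, the random-walk proposal, and the specific law of α, an Exponential variate scaled by 0.5 plus the absolute value of the other component, are all arbitrary choices of mine. Since x⁻¹ is held fixed while x¹ is updated, each update is a mixture over α of Metropolis kernels that all leave the full conditional invariant, so the empirical moments should match those of the target.

```python
import numpy as np

def mwg_random_scale(n_iter=50_000, rho=0.8, seed=0):
    """Metropolis-within-Gibbs on a bivariate Normal with correlation rho.

    The random-walk scale alpha is redrawn at every update from a law
    that depends on the *other* component (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    s2 = 1.0 - rho**2  # variance of each full conditional

    def log_cond(u, v):
        # log full conditional of u given v, N(rho * v, 1 - rho^2), up to a constant
        return -0.5 * (u - rho * v) ** 2 / s2

    x = np.zeros(2)
    chain = np.empty((n_iter, 2))
    for t in range(n_iter):
        for i in (0, 1):
            other = x[1 - i]
            # alpha generated anew, with a distribution depending only on x^{-i}
            alpha = (0.5 + abs(other)) * rng.exponential(1.0)
            prop = x[i] + alpha * rng.normal()  # symmetric proposal given alpha
            log_ratio = log_cond(prop, other) - log_cond(x[i], other)
            if np.log(rng.uniform()) < log_ratio:
                x[i] = prop
        chain[t] = x
    return chain
```

Calling `mwg_random_scale()` and comparing `chain.mean(axis=0)`, `chain.var(axis=0)` and `np.corrcoef(chain.T)[0, 1]` with 0, 1 and ρ gives a quick sanity check.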

## Gibbs for kidds

Posted in Books, Kids, Statistics, University life on February 12, 2018 by xi'an

A chance (?) question on X validated brought me to re-read Gibbs for Kids, 25 years after it was written (by my close friends George and Ed). The originator of the question had difficulties with the implementation, apparently missing the cyclic pattern of the sampler, as in equations (2.3) and (2.4), and with the convergence, which is only processed for a finite support in the American Statistician paper. The paper [which did not appear in American Statistician under this title!, but inspired an animal breeder, Dan Gianola, to write a “Gibbs for pigs” presentation in 1993 at the 44th Annual Meeting of the European Association for Animal Production, Aarhus, Denmark!!!] most appropriately only contains toy examples since those can be processed and compared to known stationary measures. This is for instance the case for the auto-exponential model

$f(x,y) \propto \exp(-xy)$

which is only defined as a probability density for a compact support. (The paper does not identify the model as a special case of auto-exponential model, which apparently made the originator of the model, Julian Besag in 1974, unhappy, as George and I found out when visiting Bath, where Julian was spending the final year of his life, many years later.) I use the limiting case all the time in class to point out that a Gibbs sampler can be devised and operate without a stationary probability distribution. However, being picky!, I would like to point out that, contrary to a comment made in the paper, the Gibbs sampler does not “fail” but on the contrary still “converges” in this case, in the sense that a conditional ergodic theorem applies, i.e., the ratio of the frequencies of visits to two sets A and B with finite measure does converge to the ratio of these measures. For instance, running the Gibbs sampler for 10⁶ steps and checking for the relative frequencies of x’s in (1,2) and (1,3) gives 0.685, versus log(2)/log(3)=0.63, since 1/x is the stationary measure. One important and influential feature of the paper is to stress that proper conditionals do not imply proper joints. George would work much further on that topic, in particular with his PhD student at the time, my friend Jim Hobert.
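For the curious, the experiment can be sketched as follows on the unbounded support, where both full conditionals are Exponential: y given x is Exp(x) and x given y is Exp(y). This is a reconstruction under my own assumptions rather than the code behind the 0.685 figure. In particular, I run the chain on the log scale, since writing y = E₁/x and x′ = E₂/y with E₁, E₂ standard Exponential variates turns log x into a plain random walk, which avoids the overflows and underflows a direct implementation would meet over 10⁶ steps. The starting value x₀ = 1 and the function name are arbitrary.

```python
import numpy as np

def visit_ratio(n_steps=1_000_000, seed=0):
    """Ratio of the numbers of visits of x to (1,2) and to (1,3) for the
    Gibbs sampler on f(x,y) ∝ exp(-xy) over the positive quadrant."""
    rng = np.random.default_rng(seed)
    log_e1 = np.log(rng.exponential(size=n_steps))  # drives y | x ~ Exp(x)
    log_e2 = np.log(rng.exponential(size=n_steps))  # drives x | y ~ Exp(y)
    # y_t = E1_t / x_{t-1} and x_t = E2_t / y_t, hence
    # log x_t = log x_{t-1} + log E2_t - log E1_t, started at x_0 = 1
    log_x = np.cumsum(log_e2 - log_e1)
    in_12 = np.count_nonzero((log_x > 0) & (log_x < np.log(2)))
    in_13 = np.count_nonzero((log_x > 0) & (log_x < np.log(3)))
    return in_12 / in_13

print(visit_ratio(), np.log(2) / np.log(3))
```

As the chain has no stationary probability distribution, it only returns to (1,3) a small fraction of the time, so the ratio remains noisy even after a million iterations and varies noticeably with the seed.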

With regard to the convergence issue, Gibbs for Kids points to Schervish and Carlin (1990), which came quite early, considering that Gelfand and Smith published their initial paper the very same year, but which also adopts a functional approach to convergence, along the paper’s fixed point perspective, somehow complicating the matter. Later papers by Tierney (1994), Besag (1995), and Mengersen and Tweedie (1996) considerably simplified the answer, which is that irreducibility is a necessary and sufficient condition for convergence. (Incidentally, the reference list includes a technical report of mine on latent variable model MCMC implementation that never got published.)

## Wilfred Keith Hastings [1930-2016]

Posted in Books, Mountains, pictures, Statistics, Travel, University life on December 9, 2016 by xi'an

A few days ago I found on the page Jeff Rosenthal has dedicated to Hastings that he passed away peacefully on May 13, 2016 in Victoria, British Columbia, where he lived for 45 years as a professor at the University of Victoria, after holding positions at the University of Toronto, the University of Canterbury (New Zealand), and Bell Labs (New Jersey). As pointed out by Jeff, Hastings’ main paper is his 1970 Biometrika description of Markov chain Monte Carlo methods, “Monte Carlo sampling methods using Markov chains and their applications”. Which would take close to twenty years to become known to the statistics world at large, although you can trace a path through Peskun (his only PhD student), Besag and others. I am sorry it took so long to come to my knowledge and also sorry it apparently went unnoticed by most of the computational statistics community.