Archive for entropy

approximate Bayesian inference [survey]

Posted in Statistics on May 3, 2021 by xi'an

In connection with the special issue of Entropy I mentioned a while ago, Pierre Alquier (formerly of CREST) has written an introduction to the topic of approximate Bayesian inference that is worth advertising (and freely available as well). Its reference list is particularly relevant. (The deadline for submissions is 21 June.)

special issue of Entropy

Posted in Statistics on September 11, 2020 by xi'an

EntropyMCMC [R package]

Posted in Statistics on March 26, 2019 by xi'an

My colleague from the Université d’Orléans, Didier Chauveau, has just published on CRAN a new R package called EntropyMCMC, which contains convergence assessment tools for MCMC algorithms, based on non-parametric estimates of the Kullback-Leibler divergence between the current distribution and the target. (A while ago, quite a while ago!, we actually collaborated with a few others on the Springer-Verlag Lecture Note #135 Discretization and MCMC convergence assessments.) This follows from a series of papers by Didier Chauveau and Pierre Vandekerkhove that started with a nearest neighbour entropy estimate. The evaluation of this entropy is based on N iid (parallel) chains, which involves a parallel implementation. While the normalising constant of the target is most often unknown, the authors argue this is not a major issue “since we are mostly interested in the stabilization” of the entropy distance. Or in the comparison of two MCMC algorithms. [Disclaimer: I have not experimented with the package so far, hence cannot vouch for its performance in large dimensions or with problematic targets, but would as usual welcome comments and feedback on readers’ experiences.]
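To make the idea concrete, here is a minimal Python sketch (not the EntropyMCMC implementation, which I have not inspected) of a nearest neighbour entropy estimate applied to N parallel chains. It assumes a one-dimensional standard normal target known up to a constant, a random walk Metropolis kernel, and the classical Kozachenko-Leonenko form of the estimator; the function names `nn_entropy_1d`, `rw_metropolis_step` and `kl_trace` are mine and purely illustrative.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def nn_entropy_1d(xs):
    """Kozachenko-Leonenko nearest-neighbour entropy estimate, dimension 1."""
    n = len(xs)
    s = sorted(xs)
    total = 0.0
    for i in range(n):
        left = s[i] - s[i - 1] if i > 0 else math.inf
        right = s[i + 1] - s[i] if i < n - 1 else math.inf
        rho = max(min(left, right), 1e-300)  # guard against exact ties
        total += math.log(rho)
    # d = 1, so the unit-ball volume is V_1 = 2
    return total / n + math.log(2.0) + math.log(n - 1) + EULER_GAMMA


def log_target_unnorm(x):
    """Assumed target: standard normal, normalising constant left out."""
    return -0.5 * x * x


def rw_metropolis_step(x, rng, scale=1.0):
    """One random-walk Metropolis move for the assumed target."""
    prop = x + rng.gauss(0.0, scale)
    diff = log_target_unnorm(prop) - log_target_unnorm(x)
    if rng.random() < math.exp(min(0.0, diff)):
        return prop
    return x


def kl_trace(n_chains=500, n_iter=60, seed=1):
    """Estimated KL(current || target), up to the unknown log-constant."""
    rng = random.Random(seed)
    xs = [rng.uniform(5.0, 6.0) for _ in range(n_chains)]  # poor start
    trace = []
    for _ in range(n_iter):
        xs = [rw_metropolis_step(x, rng) for x in xs]
        entropy = nn_entropy_1d(xs)
        mean_log_f = sum(log_target_unnorm(x) for x in xs) / n_chains
        trace.append(-entropy - mean_log_f)
    return trace


trace = kl_trace()
print(trace[0], trace[-1])
```

Since the target is only known up to a constant, the trace estimates the divergence shifted by that constant: for this toy target it should settle around minus half the log of 2π rather than around zero, which is why only the stabilization of the curve is informative.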

let the evidence speak [book review]

Posted in Books, Kids, Statistics on December 17, 2018 by xi'an

This book by Alan Jessop, professor at the Durham University Business School, aims at presenting Bayesian ideas and methods towards decision making “without formula because they are not necessary; the ability to add and multiply is all that is needed.” The trick is in using a Bayes grid, in other words a two by two table. (There are a few formulas that survived the slaughter, see e.g. on p. 91 the formula for the entropy. Contained in the chapter on information that I find definitely unclear.) When leaving the 2×2 world, things become more complicated and the construction of a prior belief as a probability density gets heroic without the availability of maths formulas. The first part of the book is about Likelihood, albeit not the likelihood function, despite having the general rule that (p.73)

belief is proportional to base rate x likelihood

which is the book’s version of Bayes’ (base?!) theorem. It then goes on to discuss the less structured nature of the prior (or prior beliefs) against the likelihood by describing Tony O’Hagan’s way of scaling experts’ beliefs in terms of a Beta distribution. And mentioning Jaynes’ maximum entropy prior without a single formula. What is hard to fathom from the text is how one can derive the likelihood outside surveys. (Using the illustration of Oswald’s 1963 murder by Ruby in the likelihood chapter does not particularly help!) A bit of nitpicking at this stage: the sentence

“The ancient Greeks, and before them the Chinese and the Aztecs…”

is historically incorrect since, while the Chinese empire dates back before the Greek dark ages, the Aztecs only ruled Mexico from the 14th century (AD) until the Spanish invasion. While most of the book sticks with unidimensional parameters, it also discusses more complex structures, for which it relies on Monte Carlo, although the description is rather cryptic (use your spreadsheet!, p.133). The book at this stage turns into a more story-telling mode, by considering for instance the Federalist papers analysis by Mosteller and Wallace. The reader can only follow the process of assessing a document authorship for a single word, as multidimensional cases (for either data or parameters) are out of reach. The same comment applies to the ecology, archeology, and psychology chapters that follow. The intermediary chapter on the “grossly misleading” [Court wording] of the statistical evidence in the Sally Clark prosecution is more accessible in that (again) it relies on a single number. Returning to the ban of Bayes rule in British courts:

In the light of the strong criticism by this court in the 1990s of using Bayes theorem before the jury in cases where there was no reliable statistical evidence, the practice of using a Bayesian approach and likelihood ratios to formulate opinions placed before a jury without that process being disclosed and debated in court is contrary to principles of open justice.

the discussion found in the book is quite moderate and inclusive, in that a Bayesian analysis helps in gathering evidence about a case, but may be misunderstood or misused at the [non-Bayesian] decision level.

In conclusion, Let the Evidence Speak is an interesting introduction to Bayesian thinking, through a simplifying device, the Bayes grid, which seems to come from management, with a large number of examples, if not necessarily all realistic, and some side-stories. I doubt this exposure can produce expert practitioners, but it makes for a worthwhile awakening for someone “likely to have read this book because [one] had heard of Bayes but were uncertain what is was” (p.222). With commendable caution and warnings along the way.
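For readers who prefer code to grids, the rule “belief is proportional to base rate x likelihood” can be sketched in a few lines of Python. This is only my reading of the grid as a two-hypothesis table, not the book's own presentation, and the numbers below are made up for illustration.

```python
def bayes_grid(base_rates, likelihoods):
    """Posterior beliefs: base rate times likelihood, then renormalised."""
    products = {h: base_rates[h] * likelihoods[h] for h in base_rates}
    total = sum(products.values())
    return {h: p / total for h, p in products.items()}


# Hypothetical numbers: a rare condition and an imperfect positive test.
base_rates = {"condition": 0.01, "no condition": 0.99}
likelihoods = {"condition": 0.90, "no condition": 0.05}  # P(positive | row)

beliefs = bayes_grid(base_rates, likelihoods)
print(beliefs)
```

With these made-up inputs the belief in the condition after a positive test is about 0.154, using nothing beyond the additions and multiplications (plus one division) that the book allows itself.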

evaluating stochastic algorithms

Posted in Books, R, Statistics, University life on February 20, 2014 by xi'an

Reinaldo sent me this email a long while ago

Could you recommend me a nice reference about 
measures to evaluate stochastic algorithms (in 
particular focus in approximating posterior 

and I hope he is still reading the ‘Og, despite my lack of prompt reply! I procrastinated and procrastinated in answering this question as I did not have a ready reply… We have indeed seen (almost suffered from!) a flow of MCMC convergence diagnostics in the 1990s. And then it dried out. Maybe because of the impossibility of being “really” sure, unless running one’s MCMC much longer than “necessary to reach” stationarity and convergence. The heat of the dispute between the “single chain school” of Geyer (1992, Statistical Science) and the “multiple chain school” of Gelman and Rubin (1992, Statistical Science) has long since evaporated. My feeling is that people (still) run their MCMC samplers several times and check for coherence between the outcomes. Possibly using different kernels on parallel threads. At best, but rarely, they run (one or another form of) tempering to identify the modal zones of the target. And instances where non-trivial control variates are available are fairly rare. Hence, a non-sequitur reply at the MCMC level. As there is no automated tool available, in my opinion. (Even though I did not check the latest versions of BUGS.)
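As an illustration of what checking for coherence between several runs can look like, here is a minimal Python sketch of the basic (non-split) potential scale reduction factor in the spirit of Gelman and Rubin (1992). The function name `gelman_rubin` and the toy chains are mine, and the iid Gaussian draws merely stand in for actual MCMC output.

```python
import math
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the variance of the chain means
    between = n * statistics.variance(means)
    var_plus = (n - 1) / n * within + between / n
    return math.sqrt(var_plus / within)


rng = random.Random(42)

# Toy stand-ins for MCMC output: four runs agreeing on the same target...
same = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# ...versus two runs stuck in different regions.
stuck = [[rng.gauss(mu, 1.0) for _ in range(1000)] for mu in (0.0, 3.0)]

rhat_same = gelman_rubin(same)
rhat_diff = gelman_rubin(stuck)
print(rhat_same, rhat_diff)
```

A value close to one only says the runs agree with each other, not that they have reached the target, which is the very limitation discussed above.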

As it happened, Didier Chauveau from Orléans gave today a talk at Big’MC on convergence assessment based on entropy estimation, a joint work with Pierre Vandekerkhove. He mentioned SamplerCompare which is an R package that appeared in 2010. Soon to come is their own EntropyMCMC package, using parallel simulation. And k-nearest neighbour estimation.

If I re-interpret the question as focussed on ABC algorithms, it gets both more delicate and easier. Easier because each ABC distribution is different. So there is no reason to look at the unreachable original target. More delicate because there are several parameters to calibrate (tolerance, choice of summary, …) on top of the number of MCMC simulations. In DIYABC, the outcome is always made of the superposition of several runs to check for stability (or lack thereof). But this tells us nothing about the distance to the true original target. The obvious answer is to use some basic bootstrapping, but this is impractical as it is generally much too costly.
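The stability check by superposition of several runs can be mimicked on a toy problem. The Python sketch below is a plain rejection ABC sampler, not DIYABC, and everything in it is an assumption made for illustration: a normal mean with a uniform prior on (-5, 5), the sample mean as summary statistic, and a tolerance of 0.2.

```python
import random
import statistics


def abc_rejection(observed, n_accept, tolerance, rng):
    """Rejection ABC for a normal mean, summary = sample mean (toy setup)."""
    s_obs = statistics.fmean(observed)
    n = len(observed)
    accepted = []
    while len(accepted) < n_accept:
        theta = rng.uniform(-5.0, 5.0)  # assumed prior
        simulated = [rng.gauss(theta, 1.0) for _ in range(n)]
        if abs(statistics.fmean(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


rng = random.Random(2014)
observed = [rng.gauss(1.0, 1.0) for _ in range(20)]  # pseudo-data
s_obs = statistics.fmean(observed)

# Several independent runs with the same tolerance and summary.
runs = [abc_rejection(observed, 100, 0.2, rng) for _ in range(4)]
run_means = [statistics.fmean(r) for r in runs]
print(run_means)
```

Agreement between the run means only reflects the Monte Carlo variability for this choice of tolerance and summary. It says nothing about how far the ABC distribution sits from the original posterior.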