## statistical illiteracy

Posted in Statistics on October 27, 2020 by xi'an

An opinion column in the Guardian today about the importance of statistical literacy in these COVIdays, entitled “Statistical illiteracy isn’t a niche problem. During a pandemic, it can be fatal”, by Carlo Rovelli (a physics professor on the Luminy campus) which, while well-intended, is not particularly helpful. For instance, the column starts with the story of a cluster of a rare disease occurring in a lab, along with the warning that [Poisson] clusters also occur with uniform sampling. But… being knowledgeable about the Poisson process may help in reducing the psychological stress within the lab only if the cluster size is compatible with the prevalence of the disease in the neighbourhood. Obviously, a poor understanding of randomness and statistical tools has not helped with the handling of the pandemic by politicians, decision-makers, civil servants and doctors (although I would have added the fundamental misconception about scientific models which led most people to confuse the map with the territory and later cry wolf…)
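The point about clusters arising under pure randomness is easy to check by simulation. A minimal sketch (my own toy example, not from the column): scatter cases uniformly over equal-probability cells and compare the largest cell count with the average — the maximum over many cells is always well above the mean, without any causal mechanism at work.

```python
import numpy as np

# Toy illustration: 1000 cases scattered uniformly at random over 100
# equally likely cells. Each cell count is Binomial(1000, 1/100), roughly
# Poisson(10), but the *maximum* over the 100 cells looks like a "cluster".
rng = np.random.default_rng(0)
counts = rng.multinomial(1000, np.ones(100) / 100)
print(counts.mean())  # exactly 10.0: total cases / number of cells
print(counts.max())   # largest cell count, well above the mean
```

Under uniform sampling, some window somewhere will contain a surprising-looking excess; the question is whether the observed cluster size is compatible with this baseline variability.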

Rovelli also cites Bruno de Finetti as “the key to understanding probability”, namely as a representation of one’s beliefs rather than a real thing. While I agree with this Bayesian perspective, I am unsure it will percolate well enough through the Guardian audience, or bring more confidence in the statistical statements made by experts…

It was only when I finished reading the column that I realised it was adapted from a book by the author, soon to appear. And felt slightly cheated. [Obviously, I did not read it so this is NOT a book review!]

## Poisson process model for Monte Carlo methods

Posted in Books on February 25, 2016 by xi'an

“Taken together this view of Monte Carlo simulation as a maximization problem is a promising direction, because it connects Monte Carlo research with the literature on optimization.”

Chris Maddison arXived today a paper on the use of Poisson processes in Monte Carlo simulation, based on the so-called Gumbel-max trick, which amounts to adding iid Gumbel variables to the log-probabilities log p(i) of the discrete target and taking the argmax as the result of the simulation. A neat trick, as it does not require the probability distribution to be normalised. And, as indicated in the above quote, it relates simulation and optimisation. The generalisation considered here replaces the iid Gumbel variates with a Gumbel process, which is constructed as an “exponential race”, i.e., a Poisson process with an exponential auxiliary variable. The underlying variates can be generated from a substitute density, à la accept-reject, which means this alternative bounds the true target. As illustrated in the plot above.
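The basic Gumbel-max trick can be sketched in a few lines (a plain NumPy illustration of the trick as described above, not the paper’s code): perturb the unnormalised log-weights with iid standard Gumbel noise and take the argmax; the resulting index is distributed according to the normalised weights.

```python
import numpy as np

# Gumbel-max trick: argmax of log w(i) + Gumbel(0,1) noise is a draw from
# the categorical distribution with probabilities w(i) / sum_j w(j).
# Note the weights w(i) need not be normalised.
rng = np.random.default_rng(1)
log_w = np.log(np.array([1.0, 2.0, 3.0, 4.0]))  # unnormalised log-weights

def gumbel_max_sample(log_w, rng):
    g = rng.gumbel(size=log_w.shape)  # iid standard Gumbel perturbations
    return np.argmax(log_w + g)

draws = np.array([gumbel_max_sample(log_w, rng) for _ in range(100_000)])
freq = np.bincount(draws, minlength=4) / draws.size
print(freq)  # empirical frequencies, close to [0.1, 0.2, 0.3, 0.4]
```

The normalising constant never appears, which is what makes the optimisation view of simulation attractive in the first place.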

The paper discusses two implementations of the principle found in an earlier NIPS 2014 paper [a paper that contains most of the novelty of this method]: one that refines the partition and the associated choice of proposals, and another that exploits a branch-and-bound tree structure to optimise the Gumbel process, with apparently higher performance. Overall, I wonder about the applicability of the approach because of the accept-reject structure: it seems unlikely to apply to high-dimensional problems.

While this is quite exciting, I find it surprising that this paper completely omits references to Brian Ripley’s considerable input on simulation and point processes. As well as the relevant Geyer and Møller (1994). (I am obviously extremely pleased to see that our 2004 paper with George Casella and Marty Wells is quoted there. We had written this paper in Cornell, a few years earlier, right after the 1999 JSM in Baltimore, but it has hardly been mentioned since then!)

## the Poisson transform

Posted in Books, Kids, pictures, Statistics, University life on June 19, 2014 by xi'an

In obvious connection with an earlier post on the “estimation” of normalising constants, Simon Barthelmé and Nicolas Chopin just arXived a paper on The Poisson transform for unnormalised statistical models. Obvious connection because I heard of the Gutmann and Hyvärinen (2012) paper when Simon came to CREST to give a BiP talk on this paper a few weeks ago. (A related talk he gave in Banff is available as a BIRS video.)

Without getting too much into details, the neat idea therein is to turn the observed log-likelihood

$\sum_{i=1}^n f(x_i|\theta) - n \log \int \exp f(x|\theta) \text{d}x$

into the joint log-likelihood

$\sum_{i=1}^n[f(x_i|\theta)+\nu]-n\int\exp[f(x|\theta)+\nu]\text{d}x$

which is the log-likelihood of a Poisson point process with intensity function

$\exp\{ f(x|\theta) + \nu +\log n\}$

This is an alternative model in that the original likelihood does not appear as a marginal of the above. Only the modes coincide, with the conditional mode in ν providing the normalising constant. In practice, the above Poisson process likelihood is unavailable and Gutmann and Hyvärinen (2012) offer an approximation by means of their logistic regression.
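The mode claim is easy to verify numerically. A small sketch (my own toy example, not from the paper), assuming the unnormalised log-density f(x|θ) = −(x−θ)²/2, for which the normalising constant is Z(θ) = √(2π) for every θ: maximising the joint expression in ν for fixed θ should return ν* = −log Z(θ).

```python
import numpy as np

# Toy check of the Poisson transform: for f(x|theta) = -(x - theta)^2 / 2,
# the conditional mode in nu of
#   sum_i [f(x_i|theta) + nu] - n * int exp[f(x|theta) + nu] dx
# should sit at nu* = -log Z(theta) = -log sqrt(2*pi).
theta, n = 0.5, 50
rng = np.random.default_rng(2)
x = rng.normal(theta, 1.0, size=n)      # observations from the model
grid = np.linspace(-10.0, 10.0, 4001)   # grid for the integral in x
dx = grid[1] - grid[0]

def joint(nu):
    f = -(grid - theta) ** 2 / 2                # unnormalised log-density
    integral = np.sum(np.exp(f + nu)) * dx      # int exp[f + nu] dx
    return np.sum(-(x - theta) ** 2 / 2 + nu) - n * integral

nus = np.linspace(-3.0, 1.0, 4001)
nu_star = nus[np.argmax([joint(nu) for nu in nus])]
print(nu_star)                    # ≈ -0.919 = -log sqrt(2*pi)
print(-0.5 * np.log(2 * np.pi))
```

Plugging ν* back in, the profile of the joint log-likelihood equals the original log-likelihood minus n, so the two criteria share their modes in θ as well.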

Unavailable likelihoods inevitably make me think of ABC. Would ABC solutions be of interest there? In particular, could the Poisson point process be simulated with no further approximation? Since the “true” likelihood is not preserved by this representation, similar questions to those found in ABC arise, like a measure of departure from the “true” posterior. Looking forward to the Bayesian version! (Marginalia: Siméon Poisson died in Sceaux, which seems to have attracted many mathematicians at the time, since Cauchy also spent part of his life there…)

## improper priors, incorporated

Posted in Books, Statistics, University life on January 11, 2012 by xi'an

“If a statistical procedure is to be judged by a criterion such as a conventional loss function (…) we should not expect optimal results from a probabilistic theory that demands multiple observations and multiple parameters.” P. McCullagh & H. Han

Peter McCullagh and Han Han have just published in the Annals of Statistics a paper on Bayes’ theorem for improper mixtures. This is a fascinating piece of work, even though some parts do elude me… The authors indeed propose a framework based on Kingman’s Poisson point processes that allows the inclusion of (countable) improper priors in a coherent probabilistic framework. This framework requires the definition of a test set A in the sampling space, the observations being then the events Y ∩ A, Y being an infinite random set when the prior is infinite. It is therefore complicated to perceive this representation in a genuine Bayesian framework, i.e. for a single observation, corresponding to a single parameter value. In that sense it seems closer to the original empirical Bayes, à la Robbins.

“An improper mixture is designed for a generic class of problems, not necessarily related to one another scientifically, but all having the same mathematical structure.” P. McCullagh & H. Han

The paper thus misses, in my opinion, a clear link with the design of improper priors. And it does not offer a resolution of the improper prior Bayes factor conundrum. However, it provides a perfectly valid environment for working with improper priors. For instance, the final section on the marginalisation “paradoxes” is illuminating in this respect, as it does not demand using a limit of proper priors.