## Archive for Bayesian inference

## MaxEnt im Garching

Posted in pictures, Statistics, Travel, University life with tags Bayesian inference, Germany, Le Monde, Max Planck Institute, MaxEnt2023, maximum entropy, München, Munich, Oxford (Mississipi) on December 28, 2022 by xi'an

**T**he next edition of the MaxEnt conferences, or more precisely workshops on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt2023, will take place in Garching (bei München) next 3-7 July, at the Max-Planck-Institut für Plasmaphysik. While the conference is usually of strong interest, it is rather improbable I will attend it this year. (The only time I took part in a MaxEnt conference was in 2009, in Oxford. Oxford, Mississippi!)

## learning optimal summary statistics

Posted in Books, pictures, Statistics with tags ABC, Approximate Bayesian computation, Bayesian inference, Fisher information, Kullback-Leibler divergence, neural density estimator, Normandy, quiz, sufficiency, summary statistics on July 27, 2022 by xi'an

“Despite the pursuit of the holy grail of sufficient statistics, most applications will have to settle for the weakest concept of optimal statistics.”

Quiz #1: How does Bayes sufficiency [which preserves the posterior density] differ from sufficiency [which preserves the likelihood function]?

Quiz #2: How does Fisher-information sufficiency [which preserves the information matrix] differ from standard sufficiency [which preserves the likelihood function]?

Read a recent arXival by Till Hoffmann and Jukka-Pekka Onnela that I frankly found most puzzling… Maybe due to the Norman train where I was traveling being particularly noisy.

The argument in the paper is to find a summary statistic that minimises the [empirical] expected posterior entropy, which equivalently means minimising the expected Kullback-Leibler distance to the full posterior. And maximizing the mutual information between parameters θ and summaries t(.). And maximizing the expected surprise. Which obviously requires breaking the sample into iid components and hence considering the gain brought by a specific transform of a *single* observation. The paper also contains a long comparison with other criteria for choosing summaries.
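The equivalences listed above can be spelled out in the idealised case where the exact posterior given the summary is available. This is only a sketch, in notation that is mine rather than necessarily the paper's: y denotes the data, t = t(y) a deterministic summary, H the Shannon entropy, I the mutual information, and H(θ|t) the expected posterior entropy given the summary.

$$
\begin{aligned}
H(\theta\mid t) &:= \mathbb{E}_{y}\big[H\{p(\theta\mid t(y))\}\big] = -\mathbb{E}_{\theta,y}\big[\log p(\theta\mid t(y))\big] = H(\theta) - I(\theta;t),\\
\mathbb{E}_{y}\big[\mathrm{KL}\{p(\theta\mid y)\,\|\,p(\theta\mid t(y))\}\big] &= \mathbb{E}_{\theta,y}\big[\log p(\theta\mid y)\big] - \mathbb{E}_{\theta,y}\big[\log p(\theta\mid t(y))\big] = H(\theta\mid t) - H(\theta\mid y),\\
I(\theta;t) &= \mathbb{E}_{t}\big[\mathrm{KL}\{p(\theta\mid t)\,\|\,p(\theta)\}\big].
\end{aligned}
$$

Since neither H(θ) nor H(θ|y) depends on the choice of t, minimising H(θ|t) is the same as maximising I(θ;t) and as minimising the expected Kullback-Leibler divergence to the full posterior. The last line matches the "expected surprise" if that term is read as the expected divergence from prior to posterior, which is an assumption on my part.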

“Minimizing the posterior entropy would discard the sufficient statistic t such that the posterior is equal to the prior–we have not learned anything from the data.”

Furthermore, the *expected* aspect of the criterion takes us away from a proper Bayes analysis (and exhibits artifacts such as the one above), which somehow makes me question the relevance of comparing entropies under different distributions. It took me a long while to realise that the collection of summaries was set by the user and quite limited. Like a neural network representation of the posterior mean. And the *intractable* posterior is further *approximated* by a closed-form function of the parameter θ and of the summary t(.), using there a neural density estimator. Or a mixture density network.

## Bayes in Riddler mode

Posted in Books, Kids, R, Statistics with tags Bayesian inference, capture-recapture, hypergeometric distribution, R, riddle, The Riddler on July 7, 2022 by xi'an

**A** very classical (textbook) question on the Riddler on inferring the contents of an urn from a Hypergeometric experiment:

You have an urn with N red and white balls, but you have no information about what N might be. You draw n=19 balls at random, without replacement, and you get 8 red balls and 11 white balls. What is your best guess for the original number of balls (red and white) in the urn?

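With therefore a likelihood given by

$$\mathbb{P}(8\text{ red},\,11\text{ white}\mid R,W)=\binom{R}{8}\binom{W}{11}\bigg/\binom{R+W}{19}\;\propto\;\frac{R!\,W!\,(R+W-19)!}{(R-8)!\,(W-11)!\,(R+W)!}$$

leading to a simple posterior derivation when choosing a 1/RW improper prior. That can be computed for a range of integer values of R and W: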

    # log-likelihood of (R,W), up to an additive constant
    L=function(R,W) lfactorial(R)+lfactorial(W)+lfactorial(R+W-19)-
      lfactorial(R-8)-lfactorial(W-11)-lfactorial(R+W)

and produces a posterior mean of 99.1 for R and of 131.2 for W, or a posterior median of 52 for R and 73 for W. And to the above surface for the log-likelihood. Which is unsurprisingly maximal at (8,11). The dependence on the prior is of course significant!
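As a minimal sketch of how such posterior summaries can be obtained on a grid: the cut-offs `Rmax` and `Wmax` below are hypothetical choices of mine, not the values behind the figures quoted above. Because the prior is improper, the resulting means and medians can be expected to move with these cut-offs.

```r
# log-likelihood of (R,W), up to an additive constant
L <- function(R, W) lfactorial(R) + lfactorial(W) + lfactorial(R + W - 19) -
  lfactorial(R - 8) - lfactorial(W - 11) - lfactorial(R + W)

# hypothetical truncation of the support (assumption, not from the post)
Rmax <- 500
Wmax <- 500
Rs <- 8:Rmax
Ws <- 11:Wmax

# log-likelihood and log-posterior on the grid, under the 1/(RW) prior
llik <- outer(Rs, Ws, L)
lpost <- llik - outer(log(Rs), log(Ws), "+")

# normalise on the truncated grid
post <- exp(lpost - max(lpost))
post <- post / sum(post)

# marginal posteriors of R and W
pR <- rowSums(post)
pW <- colSums(post)

# posterior means and medians, conditional on the truncation
meanR <- sum(Rs * pR)
meanW <- sum(Ws * pW)
medR <- Rs[which(cumsum(pR) >= 0.5)[1]]
medW <- Ws[which(cumsum(pW) >= 0.5)[1]]
```

Rerunning with larger values of `Rmax` and `Wmax` is a quick way to see how much of the answer is driven by the truncation rather than by the data.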

However, silly me missed one word in the riddle, namely that R and W were equal… With a proper prior in 1/R², the posterior mean is 42.2 (unstable) and the posterior median 20, while an improper prior in 1/R leads to a posterior mean of 133.7 and a posterior median of 72. Since the posterior mean increases with the number of values of R for which the posterior is computed, it may well be that this mean does not exist!
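The instability can be checked numerically. When R=W grows, the Hypergeometric probability converges to the Binomial(19,1/2) probability of 8 successes, a positive constant. The posterior tail therefore behaves like the prior tail: the mean diverges under 1/R², and the posterior itself cannot be normalised under 1/R. A minimal sketch, where the cut-offs and the helper name `summ` are illustrative choices of mine:

```r
# log-likelihood when R = W, up to an additive constant
Leq <- function(R) 2 * lfactorial(R) + lfactorial(2 * R - 19) -
  lfactorial(R - 8) - lfactorial(R - 11) - lfactorial(2 * R)

# posterior mean and median of R on 11:Rmax for a prior in 1/R^a
# (Rmax is a hypothetical truncation point, a the prior exponent)
summ <- function(Rmax, a) {
  Rs <- 11:Rmax
  lp <- Leq(Rs) - a * log(Rs)
  p <- exp(lp - max(lp))
  p <- p / sum(p)
  c(mean = sum(Rs * p), median = Rs[which(cumsum(p) >= 0.5)[1]])
}

# how the summaries move as the truncation point grows
cutoffs <- c(1e2, 1e3, 1e4, 1e5)
res2 <- sapply(cutoffs, summ, a = 2)  # proper prior in 1/R^2
res1 <- sapply(cutoffs, summ, a = 1)  # improper prior in 1/R
```

Comparing the columns of `res2` and `res1` shows how the mean (and, under the 1/R prior, the median as well) keeps drifting with the cut-off.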

## confidence in confidence

Posted in Statistics, University life with tags Bayesian inference, confidence distribution, credible intervals, flat prior, Proceedings of the Royal Society on June 8, 2022 by xi'an

*[This is a ghost post that I wrote eons ago and which got lost in the meanwhile.]*

**F**ollowing the false confidence paper, Céline Cunen, Nils Hjort & Tore Schweder wrote a short paper in the same Proceedings A defending confidence distributions. And blaming the phenomenon on Bayesian tools, which “might have unfortunate frequentist properties”. Which comes as no surprise since Tore Schweder and Nils Hjort wrote a book promoting confidence distributions for statistical inference.

*“…there will never be any false confidence, and we can trust the obtained confidence!”*

Their re-analysis of Balch et al (2019) is that using a flat prior on the location (of a satellite) leads to a non-central chi-square distribution as the posterior on the squared distance δ² (between two satellites). Which incidentally happens to be a case pointed out by Jeffreys (1939) against the use of the flat prior as δ² has a constant bias of d (the dimension of the space) plus the non-centrality parameter. And offers a neat contrast between the posterior, with non-central chi-squared cdf with two degrees of freedom

$$B(\delta)=\Gamma_2\left(\delta^2/\sigma^2;\,|y|^2/\sigma^2\right)$$

and the confidence “cumulative distribution”

$$C(\delta)=1-\Gamma_2\left(|y|^2/\sigma^2;\,\delta^2/\sigma^2\right)$$

where Γ₂(·;λ) denotes the non-central chi-squared cdf with two degrees of freedom and non-centrality parameter λ.

Cunen et al (2020) argue that the frequentist properties of the confidence distribution 1-C(R), where R is the impact distance, are robust to an increasing σ when the true value is also R. Which does not seem to demonstrate much. A second illustration of B and C when the distance δ varies and both σ and |y|² are fixed is even more puzzling when the authors criticize the Bayesian credible interval for missing the “true” value of δ, as I find the statement meaningless for a fixed value of |y|²… Looking forward to the third round, i.e., a rebuttal by Balch et al (2019)!
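For readers who want to compare the two curves, here is a minimal sketch in R. It assumes the standard two-dimensional Gaussian setting: under the flat prior, δ²/σ² given y is non-central chi-squared with two degrees of freedom and non-centrality |y|²/σ², while |y|²/σ² given δ is non-central chi-squared with two degrees of freedom and non-centrality δ²/σ². The function names and numerical values are hypothetical, not taken from either paper.

```r
# posterior cdf of the distance delta under the flat prior (assumed form):
# delta^2/sigma^2 | y is non-central chi-squared, 2 df, ncp |y|^2/sigma^2
B <- function(delta, y2, sigma)
  pchisq(delta^2 / sigma^2, df = 2, ncp = y2 / sigma^2)

# confidence distribution (assumed form):
# |y|^2/sigma^2 | delta is non-central chi-squared, 2 df, ncp delta^2/sigma^2
C <- function(delta, y2, sigma)
  1 - pchisq(y2 / sigma^2, df = 2, ncp = delta^2 / sigma^2)

# hypothetical squared observed distance and noise scale
y2 <- 4
sigma <- 1
deltas <- seq(0, 6, by = 0.5)

# both curves evaluated on a grid of distances
curves <- cbind(delta = deltas,
                B = B(deltas, y2, sigma),
                C = C(deltas, y2, sigma))
```

Plotting the `B` and `C` columns of `curves` against `delta`, for a few values of `sigma`, gives a direct view of how the posterior and the confidence distribution part ways as the noise grows.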