## MaxEnt im Garching

Posted in pictures, Statistics, Travel, University life on December 28, 2022 by xi'an

The next edition of the MaxEnt conferences, or more precisely workshops on Bayesian Inference and Maximum Entropy Methods in Science and Engineering, MaxEnt2023, will take place in Garching (bei München) next 3-7 July, at the Max-Planck-Institut für Plasmaphysik. While the conference is usually of strong interest, it is rather improbable I will attend it this year. (The only time I took part in a MaxEnt conference was in 2009, in Oxford. Oxford, Mississippi!)

## inferring the number of components [remotely]

Posted in Statistics on October 14, 2022 by xi'an

## learning optimal summary statistics

Posted in Books, pictures, Statistics on July 27, 2022 by xi'an

“Despite the pursuit of the holy grail of sufficient statistics, most applications will have to settle for the weakest concept of optimal statistics.”

Quiz #1: How does Bayes sufficiency [which preserves the posterior density] differ from sufficiency [which preserves the likelihood function]?

Quiz #2: How does Fisher-information sufficiency [which preserves the information matrix] differ from standard sufficiency [which preserves the likelihood function]?

Read a recent arXival by Till Hoffmann and Jukka-Pekka Onnela that I frankly found most puzzling… Maybe due to the Norman train where I was traveling being particularly noisy.

The argument in the paper is to find a summary statistic that minimises the [empirical] expected posterior entropy, which equivalently means minimising the expected Kullback-Leibler divergence to the full posterior. And maximising the mutual information between parameters θ and summaries t(.). And maximising the expected surprise. Which obviously requires breaking the sample into iid components and hence considering the gain brought by a specific transform of a single observation. The paper also contains a long comparison with other criteria for choosing summaries.
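In symbols, and as a sketch only, assuming x denotes the full data, t(x) the summary, and H and I the usual (differential) entropy and mutual information (my notation, not necessarily the paper's), the equivalences read

$$
\mathbb{E}_x\left[\mathrm{KL}\{\pi(\cdot|x)\,\|\,\pi(\cdot|t(x))\}\right] = H(\theta|t(x)) - H(\theta|x), \qquad I(\theta;t(x)) = H(\theta) - H(\theta|t(x)).
$$

Since neither H(θ|x) nor H(θ) depends on the choice of t, minimising the expected posterior entropy H(θ|t(x)) amounts to minimising the expected Kullback-Leibler divergence and to maximising the mutual information.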

“Minimizing the posterior entropy would discard the sufficient statistic t such that the posterior is equal to the prior–we have not learned anything from the data.”

Furthermore, the expected aspect of the criterion takes us away from a proper Bayes analysis (and exhibits artifacts such as the one above), which somehow makes me question the relevance of comparing entropies under different distributions. It took me a long while to realise that the collection of summaries was set by the user and quite limited. Like a neural network representation of the posterior mean. And the intractable posterior is further approximated by a closed-form function of the parameter θ and of the summary t(.). Using there a neural density estimator. Or a mixture density network.

## Bayes in Riddler mode

Posted in Books, Kids, R, Statistics on July 7, 2022 by xi'an

A very classical (textbook) question on the Riddler on inferring the contents of an urn from a hypergeometric experiment:

You have an urn with N  red and white balls, but you have no information about what N might be. You draw n=19 balls at random, without replacement, and you get 8 red balls and 11 white balls. What is your best guess for the original number of balls (red and white) in the urn?

With therefore a likelihood given by $\frac{R!}{(R-8)!}\frac{W!}{(W-11)!}\frac{(R+W-19)!}{(R+W)!}$

leading to a simple posterior derivation when choosing a 1/RW improper prior. That can be computed for a range of integer values of R and W:

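```r
# log-likelihood (up to an additive constant) of 8 red and 11 white balls in 19 draws
L <- function(R, W) lfactorial(R) + lfactorial(W) +
  lfactorial(R + W - 19) - lfactorial(R - 8) -
  lfactorial(W - 11) - lfactorial(R + W)
```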


and produces a posterior mean of 99.1 for R and of 131.2 for W, or a posterior median of 52 for R and 73 for W. And a surface for the log-likelihood, which is unsurprisingly maximal at (8,11). The dependence on the prior is of course significant!
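For the record, here is a minimal sketch of how such a grid computation could go. The truncation bound `Rmax` is my own (hypothetical) choice and not necessarily the one behind the figures quoted above, so the resulting means and medians need not match them and will move with that bound.

```r
# log-likelihood of the post, defined for R >= 8 and W >= 11
L <- function(R, W) lfactorial(R) + lfactorial(W) + lfactorial(R + W - 19) -
  lfactorial(R - 8) - lfactorial(W - 11) - lfactorial(R + W)

Rmax <- 500                 # hypothetical truncation bound (assumption)
Rs <- 8:Rmax
Ws <- 11:Rmax

# log-posterior on the grid under the improper 1/(RW) prior
logpost <- outer(Rs, Ws, function(R, W) L(R, W) - log(R) - log(W))
post <- exp(logpost - max(logpost))   # subtract the max for numerical stability
post <- post / sum(post)              # normalise over the truncated grid

# marginal posteriors and their summaries
margR <- rowSums(post)
margW <- colSums(post)
meanR <- sum(Rs * margR)
meanW <- sum(Ws * margW)
medR <- Rs[which(cumsum(margR) >= 0.5)[1]]
medW <- Ws[which(cumsum(margW) >= 0.5)[1]]

# location of the maximum of the log-likelihood surface
loglik <- outer(Rs, Ws, L)
idx <- which(loglik == max(loglik), arr.ind = TRUE)
mle <- c(R = Rs[idx[1, 1]], W = Ws[idx[1, 2]])
```

Rerunning with a different `Rmax` is a quick way to check how much the summaries depend on the truncation.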

However, silly me missed one word in the riddle, namely that R and W were equal… With a proper prior in 1/R², the posterior mean is 42.2 (unstable) and the posterior median 20. While an improper prior in 1/R leads to a posterior mean of 133.7 and a posterior median of 72. However, since the posterior mean increases with the number of values of R for which the posterior is computed, it may be that this mean does not exist!
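The instability can be checked by recomputing the summaries for increasing truncation bounds. A minimal sketch, where the function `post_summary`, its arguments, and the bounds are all hypothetical choices of mine:

```r
# log-likelihood of the post, here evaluated at R = W
L <- function(R, W) lfactorial(R) + lfactorial(W) + lfactorial(R + W - 19) -
  lfactorial(R - 8) - lfactorial(W - 11) - lfactorial(R + W)

# posterior mean and median of R (= W) under a prior in 1/R^power,
# with the support truncated at Rmax (hypothetical helper)
post_summary <- function(Rmax, power) {
  Rs <- 11:Rmax                         # at least 11 balls of each colour
  logpost <- L(Rs, Rs) - power * log(Rs)
  p <- exp(logpost - max(logpost))
  p <- p / sum(p)
  c(mean = sum(Rs * p), median = Rs[which(cumsum(p) >= 0.5)[1]])
}

bounds <- c(1e2, 1e3, 1e4, 1e5)         # assumed truncation bounds
res2 <- sapply(bounds, post_summary, power = 2)   # prior in 1/R^2
res1 <- sapply(bounds, post_summary, power = 1)   # prior in 1/R
colnames(res2) <- colnames(res1) <- bounds
res2
res1
```

Since the likelihood term levels off at 2^{-19} for large R, the posterior tail under the 1/R² prior behaves like 1/R², which is consistent with a mean that keeps growing with the truncation bound.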

## confidence in confidence

Posted in Statistics, University life on June 8, 2022 by xi'an

[This is a ghost post that I wrote eons ago and which got lost in the meanwhile.] Following the false confidence paper, Céline Cunen, Nils Hjort & Tore Schweder wrote a short paper in the same Proceedings A defending confidence distributions. And blaming the phenomenon on Bayesian tools, which “might have unfortunate frequentist properties”. Which comes as no surprise since Tore Schweder and Nils Hjort wrote a book promoting confidence distributions for statistical inference.

“…there will never be any false confidence, and we can trust the obtained confidence!”

Their re-analysis of Balch et al (2019) is that using a flat prior on the location (of a satellite) leads to a non-central chi-square distribution as the posterior on the squared distance δ² (between two satellites). Which incidentally happens to be a case pointed out by Jeffreys (1939) against the use of the flat prior as δ² has a constant bias of d (the dimension of the space) plus the non-centrality parameter. And offers a neat contrast between the posterior, with non-central chi-squared cdf with two degrees of freedom $F(\delta)=\Gamma_2(\delta^2/\sigma^2;||y||^2/\sigma^2)$

and the confidence “cumulative distribution” $C(\delta)=1-\Gamma_2(||y||^2/\sigma^2;\delta^2/\sigma^2)$
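Both curves are straightforward to evaluate numerically. A minimal sketch, taking Γ₂(x;λ) to be the non-central chi-squared cdf with two degrees of freedom and non-centrality λ (as stated above), with values of σ and ||y||² that are purely hypothetical:

```r
# Gamma_2(x; lambda): non-central chi-squared cdf with 2 degrees of freedom
Gamma2 <- function(x, lambda) pchisq(x, df = 2, ncp = lambda)

sigma <- 1                               # hypothetical scale (assumption)
y2 <- 4                                  # hypothetical value of ||y||^2 (assumption)
delta <- seq(0, 6, length.out = 121)     # grid of distances

Fd <- Gamma2(delta^2 / sigma^2, y2 / sigma^2)       # posterior cdf F(delta)
Cd <- 1 - Gamma2(y2 / sigma^2, delta^2 / sigma^2)   # confidence curve C(delta)

# a few values of both curves side by side
round(cbind(delta, Fd, Cd)[c(1, 41, 81, 121), ], 3)
```

Note that, while F starts at zero, C(0) equals exp(-||y||²/2σ²) and is thus positive, i.e., the confidence distribution carries a point mass at δ=0.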

Cunen et al (2020) argue that the frequentist properties of the confidence distribution 1-C(R), where R is the impact distance, are robust to an increasing σ when the true value is also R. Which does not seem to demonstrate much. A second illustration of the posterior and confidence cdfs F and C, when the distance δ varies and both σ and ||y||² are fixed, is even more puzzling when the authors criticize the Bayesian credible interval for missing the “true” value of δ, as I find the statement meaningless for a fixed value of ||y||²… Looking forward to the third round, i.e. a rebuttal by Balch et al (2019)!