**A** great Bayesian Analysis webinar this afternoon with well-balanced presentations by Steve MacEachern and John Lewis, and original discussions by Bertrand Clarke and Fabrizio Ruggeri. Which attracted 122 participants. I particularly enjoyed Bertrand’s points that likelihoods were more general than models [made in 6 different wordings!] and that this paper was closer to the M-open perspective. I think I eventually got the reason why the approach could be seen as an ABC with ε=0, since the simulated y’s all get the right statistic, but this presentation does not bring a strong argument in favour of the restricted likelihood approach, when considering the methodological and computational effort. The discussion also made me wonder if tools like VAEs could be used towards approximating the distribution of T(y) conditional on the parameter θ. This is also an opportunity to thank my friend Michele Guindani for his hard work as Editor of Bayesian Analysis and in particular for keeping the discussion tradition thriving!

## Archive for information

## likelihood-free and summary-free?

Posted in Books, Mountains, pictures, Statistics, Travel with tags ABC, arXiv, Australia, Cramèr-von Mises distance, curse of dimensionality, energy, Gaussian mixture, indirect inference, information, kernel density estimator, likelihood-free methods, mean discrepancy, summary statistics, Wasserstein distance on March 30, 2021 by xi'an

**M**y friends and coauthors Chris Drovandi and David Frazier have recently arXived a paper entitled *A comparison of likelihood-free methods with and without summary statistics*. In which they indeed compare these two perspectives on approximate Bayesian methods like ABC and Bayesian synthetic likelihoods.

“A criticism of summary statistic based approaches is that their choice is often *ad hoc* and there will generally be an inherent loss of information.”

In ABC methods, the recourse to a summary statistic is often advocated as a “necessary evil” against the greater evil of the curse of dimension, paradoxically providing a faster convergence of the ABC approximation (Fearnhead & Liu, 2018). The authors propose a somewhat generic selection of summary statistics based on [my undergrad mentors!] Gouriéroux’s and Monfort’s indirect inference, using a mixture of Gaussians as their auxiliary model. Summary-free solutions, as in our Wasserstein papers, rely on distances between distributions, hence are functional distances, that can be seen as dimension-free as well (or criticised as infinite dimensional). Chris and David consider energy distances (which sound very much like standard distances, except for averaging over all permutations), maximum mean discrepancy as in Gretton et al. (2012), Cramér-von Mises distances, and Kullback-Leibler divergences estimated via one-nearest-neighbour formulas, for a univariate sample. I am not aware of any degree of theoretical exploration of these functional approaches towards the precise speed of convergence of the ABC approximation…
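To make the summary-free idea concrete, here is a minimal sketch of rejection ABC where the simulated sample is compared with the observed one through a distance between empirical distributions rather than through a summary statistic. It is not the code or the setting of the paper: the toy normal location model, the N(0, 5²) prior, and the function names `wasserstein_1d`, `energy_distance_sq` and `abc_rejection` are all assumptions made for illustration, and only the standard library is used.

```python
import random

def wasserstein_1d(x, y):
    """Wasserstein-1 distance between two equal-size univariate samples.

    In one dimension it reduces to the mean absolute gap between order statistics.
    """
    if len(x) != len(y):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(x), sorted(y))) / len(x)

def energy_distance_sq(x, y):
    """Squared energy distance, 2 E|X-Y| - E|X-X'| - E|Y-Y'| (V-statistic form)."""
    def mean_abs(u, v):
        return sum(abs(a - b) for a in u for b in v) / (len(u) * len(v))
    return 2 * mean_abs(x, y) - mean_abs(x, x) - mean_abs(y, y)

def abc_rejection(y_obs, distance, n_sims=2000, keep=0.05, seed=0):
    """Rejection ABC keeping the fraction `keep` of prior draws closest to y_obs."""
    rng = random.Random(seed)
    n = len(y_obs)
    draws = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 5.0)                        # toy prior (assumption)
        y_sim = [rng.gauss(theta, 1.0) for _ in range(n)]  # toy model (assumption)
        draws.append((distance(y_obs, y_sim), theta))
    draws.sort()                                           # smallest distances first
    n_keep = max(1, int(keep * n_sims))
    return [theta for _, theta in draws[:n_keep]]

if __name__ == "__main__":
    rng = random.Random(1)
    y_obs = [rng.gauss(2.0, 1.0) for _ in range(50)]
    post = abc_rejection(y_obs, wasserstein_1d)
    print("ABC posterior mean (Wasserstein):", sum(post) / len(post))
```

Swapping `wasserstein_1d` for `energy_distance_sq` in the call changes the discrepancy without touching the sampler, which is the sense in which such comparisons can be run side by side. Note that the pairwise energy distance costs O(n²) per simulation in this naive form.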

“We found that at least one of the full data approaches was competitive with or outperforms ABC with summary statistics across all examples.”

The main part of the paper, besides a survey of the existing solutions, is to compare the performances of these over a few chosen (univariate) examples, with the exact posterior as the gold standard. In the g & k model, the Pima Indian benchmark of ABC studies!, Cramér does somewhat better. While it does much worse in an M/G/1 example (where Wasserstein does better, and similarly for a stereological extremes example of Bortot et al., 2007). An ordering reversed again for a toad movement model I had not seen before. While the usual provision applies, namely that this is a simulation study on unidimensional data and a small number of parameters, the design of the four comparison experiments is very careful, eliminating versions that are either too costly or too divergent, although this could be potentially criticised for being unrealistic (i.e., when the true posterior is unknown). The computing time is roughly the same across methods, which essentially removes the call to kernel based approximations of the likelihood. Another point of interest is that the distance methods are significantly impacted by transforms on the data, which should not be so for intrinsic distances! Demonstrating the distances are not intrinsic…

## Mea Culpa

Posted in Statistics with tags Bayesian Analysis, Bayesian foundations, book review, E.T. Jaynes, improper posteriors, improper prior, information, John Skilling, likelihood function, marginalisation paradoxes, prior information, probability theory on April 10, 2020 by xi'an

*[A quote from Jaynes about improper priors that I had missed in his book, Probability Theory.]*

For many years, the present writer was caught in this error just as badly as anybody else, because Bayesian calculations with improper priors continued to give just the reasonable and clearly correct results that common sense demanded. So warnings about improper priors went unheeded; just that psychological phenomenon. Finally, it was the marginalization paradox that forced recognition that we had only been lucky in our choice of problems. If we wish to consider an improper prior, the only correct way of doing it is to approach it as a well-defined limit of a sequence of proper priors. If the correct limiting procedure should yield an improper posterior pdf for some parameter α, then probability theory is telling us that the prior information and data are too meager to permit any inferences about α. Then the only remedy is to seek more data or more prior information; probability theory does not guarantee in advance that it will lead us to a useful answer to every conceivable question.

Generally, the posterior pdf is better behaved than the prior because of the extra information in the likelihood function, and the correct limiting procedure yields a useful posterior pdf that is analytically simpler than any from a proper prior. The most universally useful results of Bayesian analysis obtained in the past are of this type, because they tended to be rather simple problems, in which the data were indeed so much more informative than the prior information that an improper prior gave a reasonable approximation – good enough for all practical purposes – to the strictly correct results (the two results agreed typically to six or more significant figures).

In the future, however, we cannot expect this to continue because the field is turning to more complex problems in which the prior information is essential and the solution is found by computer. In these cases it would be quite wrong to think of passing to an improper prior. That would lead usually to computer crashes; and, even if a crash is avoided, the conclusions would still be, almost always, quantitatively wrong. But, since likelihood functions are bounded, the analytical solution with proper priors is always guaranteed to converge properly to finite results; therefore it is always possible to write a computer program in such a way (avoid underflow, etc.) that it cannot crash when given proper priors. So, even if the criticisms of improper priors on grounds of marginalization were unjustified, it remains true that in the future we shall be concerned necessarily with proper priors.
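As a standard textbook illustration of the limiting procedure described in the quote (not an example taken from Jaynes's text), consider a normal mean with known variance under a sequence of proper normal priors. The symbols σ², τ² and ȳ are introduced only for this sketch.

```latex
y_1,\dots,y_n \mid \theta \sim \mathcal{N}(\theta,\sigma^2),
\qquad
\theta \sim \mathcal{N}(0,\tau^2),
\\[4pt]
\theta \mid y_{1:n} \sim
\mathcal{N}\!\left(
  \frac{n\bar{y}/\sigma^2}{n/\sigma^2 + 1/\tau^2},\;
  \frac{1}{n/\sigma^2 + 1/\tau^2}
\right)
\;\xrightarrow[\;\tau\to\infty\;]{}\;
\mathcal{N}\!\left(\bar{y},\, \frac{\sigma^2}{n}\right).
```

Here the limit is a proper posterior, simpler than any of the finite-τ posteriors and identical to the flat-prior answer, which is the benign situation the quote refers to.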

## bitcoin and cryptography for statistical inference and AI

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags AI, anonymised data, bitcoin, Britain, cryptography, encryption, Gregynog Hall, Gregynog Statistical Conference, information, Navy, Powys, Tregynon, Wales on April 16, 2018 by xi'an

**A** recent news editorial in Nature (15 March issue) reminded me of the lectures Louis Aslett gave at the Gregynog Statistical Conference last week, on the advanced use of cryptography tools to analyse sensitive and private data. Lectures that reminded me of a graduate course I took on cryptography and coding, in Paris 6, and which led me to visit a lab at the Université de Limoges during my conscripted year in the French Navy. With no research outcome. Now, the notion of using encrypted data towards statistical analysis is fascinating in that it may allow for efficient inference and personal data protection at the same time. As opposed to earlier solutions of anonymisation that introduced noise and data degradation, not always providing sufficient protection of privacy. Encryption that is also the notion at the basis of the Nature editorial. An issue completely missing from the paper, while stressed by Louis, is that this encryption (like Bitcoin) is costly, in order to deter hacking, and hence energy inefficient. Or limiting the amount of data that can be used in such studies, which would turn the idea into a stillborn notion.

## Conditional love [guest post]

Posted in Books, Kids, Statistics, University life with tags Andrei Kolmogorov, axioms of probability, Bayes rule, Bayesian nonparametrics, Bayesian statistics, bootstrap, Bruno de Finetti, Céline Dion, David Draper, Dirichlet process, Edwin Jaynes, exchangeability, extendibility, information, JSM 2015, MCMC, plausibility, Richard Cox, Series B, Stone-Weierstrass, Theory of Probability on August 4, 2015 by xi'an

*[When Dan Simpson told me he was reading Terenin’s and Draper’s latest arXival in a nice Bath pub—and not a nice bath tub!—, I asked him for a blog entry and he agreed. Here is his piece, read at your own risk! If you remember to skip the part about Céline Dion, you should enjoy it very much!!!]*

**P**robability has traditionally been described, as per Kolmogorov and his ardent follower Katy Perry, unconditionally. This is, of course, excellent for those of us who really like measure theory, as the maths is identical. Unfortunately mathematical convenience is not necessarily enough and a large part of the applied statistical community is working with Bayesian methods. These are unavoidably conditional and, as such, it is natural to ask if there is a fundamentally conditional basis for probability.

Bruno de Finetti—and later Richard Cox and Edwin Jaynes—considered conditional bases for Bayesian probability that are, unfortunately, incomplete. The critical problem is that they mainly consider finite state spaces and construct finitely additive systems of conditional probability. For a variety of reasons, neither of these restrictions hold much truck in the modern world of statistics.

In a recently arXiv’d paper, Alexander Terenin and David Draper devise a set of axioms that make the Cox-Jaynes system of conditional probability rigorous. Furthermore, they show that the complete set of Kolmogorov axioms (including countable additivity) can be derived as theorems from their axioms by conditioning on the entire sample space.

This is a deep and fundamental paper, which unfortunately means that I most probably do not grasp its complexities (especially as, for some reason, I keep reading it in pubs!). However I’m going to have a shot at having some thoughts on it, because I feel like it’s the sort of paper one should have thoughts on.