Archive for Bayesian asymptotics

the mysterious disappearance of the Leiden statistics group

Posted in Books, pictures, Statistics, University life with tags , , , , , , , on July 14, 2021 by xi'an

I was forwarded an article from Mare, the journal of the University of Leiden (Universiteit Leiden), a weekly newspaper written by an independent team of professional journalists. Entitled “Fraude, verdwenen evaluaties en een verziekt klimaat: hoe de beste statistiekgroep van Nederland uiteenviel” (Fraud, lost evaluations and a sickening climate: how the best statistics group in the Netherlands fell apart), it tells (through Google translate) the appalling story of how an investigation on mishandled student course evaluations led to the disintegration of the World-renowned Leiden statistics group,  with the departure of a large fraction of its members, including its head, Aad van der Vaart, a giant in mathematical statistics, author of deep, reference, books like Asymptotic Statistics and  Fundamentals of Nonparametric Bayesian Inference, an ERC advanced grant recipient, and now professor at TU Delft… While I am not at all acquainted with the specifics, reading the article makes the chain of events sound like chaos propagation, when the suspicious disappearance of student evaluation forms about a statistics course leads to a re-evaluation round, itself put under scrutiny by the University, then to a recruitment freeze of prospective statistician appointments by the (pure math) successor of Aad, as well as increasing harassment of the statisticians in the  Mathematisch Instituut, and eventually to the exile of most of them. Wat een verspilling!

mathematical theory of Bayesian statistics [book review]

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , on May 6, 2021 by xi'an

I came by chance (and not by CHANCE) upon this 2018 CRC Press book by Sumio Watanabe and ordered it myself to gather which material it really covered. As the back-cover blurb was not particularly clear and the title sounded quite general. After reading it, I found out that this is a mathematical treatise on some aspects of Bayesian information criteria, in particular on the Widely Applicable Information Criterion (WAIC) that was introduced by the author in 2010. The result is a rather technical and highly focussed book with little motivation or intuition surrounding the mathematical results, which may make the reading arduous for readers. Some background on mathematical statistics and Bayesian inference is clearly preferable and the book cannot be used as a textbook for most audiences, as opposed to eg An Introduction to Bayesian Analysis by J.K. Ghosh et al. or even more to Principles of Uncertainty by J. Kadane. In connection with this remark the exercises found in the book are closer to the delivery of additional material than to textbook-style exercises.

“posterior distributions are often far from any normal distribution, showing that Bayesian estimation gives the more accurate inference than other estimation methods.”

The overall setting is one where both the sampling and the prior distributions are different from respective “true” distributions. Requiring a tool to assess the discrepancy when utilising a specific pair of such distributions. Especially when the posterior distribution cannot be approximated by a Normal distribution. (Lindley’s paradox makes an interesting incognito incursion on p.238.) The WAIC is supported for the determination of the “true” model, in opposition to AIC and DIC, incl. on a mixture example that reminded me of our eight versions of DIC paper. In the “Basic Bayesian Theory” chapter (§3), the “basic theorem of Bayesian statistics” (p.85) states that the various losses related with WAIC can be expressed as second-order Taylor expansions of some cumulant generating functions, with order o(n⁻¹), “even if the posterior distribution cannot be approximated by any normal distribution” (p.87). With the intuition that

“if a log density ratio function has a relatively finite variance then the generalization loss, the cross validation loss, the training loss and WAIC have the same asymptotic behaviors.”

Obviously, these “basic” aspects should come as a surprise to a fair percentage of Bayesians (in the sense of not being particularly basic). Myself included. Chapter 4 exposes why, for regular models, the posterior distribution accumulates in an ε neighbourhood of the optimal parameter at a speed O(n2/5). With the normalised partition function being of order n-d/2 in the neighbourhood and exponentially negligible outside. A consequence of this regular asymptotic theory is that all above losses are asymptotically equivalent to the negative log likelihood plus similar order n⁻¹ terms that can be ordered. Chapters 5 and 6 deal with “standard” [the likelihood ratio is a multi-index power of the parameter ω] and general posterior distributions that can be written as mixtures of standard distributions,  with expressions of the above losses in terms of new universal constants. Again, a rather remote concern of mine. The book also includes a chapter (§7) on MCMC, with a rather involved proof that a Metropolis algorithm satisfies detailed balance (p.210). The Gibbs sampling section contains an extensive example on a two-dimensional two-component unit-variance Normal mixture, with an unusual perspective on the posterior, which is considered as “singular” when the true means are close. (Label switching or the absence thereof is not mentioned.) In terms of approximating the normalising constant (or free energy), the only method discussed there is path sampling, with a cryptic remark about harmonic mean estimators (not identified as such). In a final knapsack chapter (§9),  Bayes factors (confusedly denoted as L(x)) are shown to be most powerful tests in a Bayesian sense when comparing hypotheses without prior weights on said hypotheses, while posterior probability ratios are the natural statistics for comparing models with prior weights on said models. (With Lindley’s paradox making another appearance, still incognito!) And a  notion of phase transition for hyperparameters is introduced, with the meaning of a radical change of behaviour at a critical value of said hyperparameter. For instance, for a simple normal- mixture outlier model, the critical value of the Beta hyperparameter is α=2. Which is a wee bit of a surprise when considering Rousseau and Mengersen (2011) since their bound for consistency was α=d/2.

In conclusion, this is quite an original perspective on Bayesian models, covering the somewhat unusual (and potentially controversial) issue of misspecified priors and centered on the use of information criteria. I find the book could have benefited from further editing as I noticed many typos and somewhat unusual sentences (at least unusual to me).

[Disclaimer about potential self-plagiarism: this post or an edited version should eventually appear in my Books Review section in CHANCE.]

Korean trip

Posted in Mountains, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , on November 24, 2019 by xi'an

A fairly short but exciting trip to Seoul and to the Fall meeting of the Korean Statistical Society there. Plus giving a seminar at Seoul National University, where I stayed and enjoyed its beautiful campus surrounded by hills painted in the flamboyant reds and yellows of trees. Running to the top of Gwanaksan in the early morning, with some scrambling moments, was a fantastic beginning for the day! Although it was quite unintentional Sacha Tsybakov from CREST happened to be another invited speaker at the meeting (along with Regina Liu from Rutgers, whom I was also met in Salzburg two months ago) and we had a nice stroll together on the University of Seoul campus during a break in the sessions, gaining another view of the city from the top of the Bukhasan mountain. The talk I gave there on the asymptotics of ABC happened to be more attended than my tutorial lecture delivered at the beginning of JSM in Denver this summer. I am thus quite grateful to the organisers for their invitation and this opportunity to meet Korean statisticians and to get a glimpse of Korean culture and cuisine!

 

probably ABC [and provably robust]

Posted in Books, pictures, Statistics, Travel with tags , , , , , , , , on August 8, 2017 by xi'an

Two weeks ago, James Ridgway (formerly CREST) arXived a paper on misspecification and ABC, a topic on which David Frazier, Judith Rousseau and I have been working for a while now [and soon to be arXived as well].  Paper that I re-read on a flight to Amsterdam [hence the above picture], written as a continuation of our earlier paper with David, Gael, and Judith. One specificity of the paper is to use an exponential distribution on the distance between the observed and simulated sample within the ABC distribution. Which reminds me of the resolution by Bissiri, Holmes, and Walker (2016) of the intractability of the likelihood function. James’ paper contains oracle inequalities between the ABC approximation and the genuine distribution of the summary statistics, like a bound on the distance between the expectations of the summary statistics under both models. Which writes down as a sum of a model bias, of two divergences between empirical and theoretical averages, on smoothness penalties, and on a prior impact term. And a similar bound on the distance between the expected distance to the oracle estimator of θ under the ABC distribution [and a Lipschitz type assumption also found in our paper]. Which first sounded weird [to me] as I would have expected the true posterior, until it dawned on me that the ABC distribution is the one used for the estimation [a passing strike of over-Bayesianism!]. While the oracle bound could have been used directly to discuss the rate of convergence of the exponential rate λ to zero [with the sample size n], James goes into the interesting alternative direction of setting a prior on λ, an idea that dates back to Olivier Catoni and Peter Grünwald. Or rather a pseudo-posterior on λ, a common occurrence in the PAC-Bayesian literature. In one of his results, James obtains a dependence of λ on the dimension m of the summary [as well as the root dependence on the sample size n], which seems to contradict our earlier independence result, until one realises this scale parameter is associated with a distance variable, itself scaled in m.

The paper also contains a non-parametric part, where the parameter θ is the unknown distribution of the data and the summary the data itself. Which is quite surprising as I did not deem it possible to handle non-parametrics with ABC. Especially in a misspecified setting (although I have trouble perceiving what this really means).

“We can use most of the Monte Carlo toolbox available in this context.”

The theoretical parts are a bit heavy on notations and hard to read [as a vacation morning read at least!]. They are followed by a Monte Carlo implementation using SMC-ABC.  And pseudo-marginals [at least formally as I do not see how the specific features of pseudo-marginals are more that an augmented representation here]. And adaptive multiple pseudo-samples that reminded me of the Biometrika paper of Anthony Lee and Krys Latuszynski (Warwick). Therefore using indeed most of the toolbox!

CORE talk at Louvain-la-Neuve

Posted in Statistics with tags , , , , , , , on March 16, 2017 by xi'an

Tomorrow, I will give a talk at the seminar for econometrics and finance of CORE, in Louvain-la-Neuve, Belgium. Here are my slides, recycled from several earlier talks and from Judith’s slides in Banff:

 

%d bloggers like this: