**I** was forwarded an article from Mare, the journal of the University of Leiden (Universiteit Leiden), a weekly newspaper written by an independent team of professional journalists. Entitled *“Fraude, verdwenen evaluaties en een verziekt klimaat: hoe de beste statistiekgroep van Nederland uiteenviel”* (*Fraud, lost evaluations and a sickening climate: how the best statistics group in the Netherlands fell apart*), it tells (through Google translate) the appalling story of how an investigation into mishandled student course evaluations led to the disintegration of the world-renowned Leiden statistics group, with the departure of a large fraction of its members, including its head, Aad van der Vaart, a giant in mathematical statistics, author of deep reference books like *Asymptotic Statistics* and *Fundamentals of Nonparametric Bayesian Inference*, an ERC advanced grant recipient, and now professor at TU Delft… While I am not at all acquainted with the specifics, reading the article makes the chain of events sound like chaos propagation, where the suspicious disappearance of student evaluation forms for a statistics course leads to a re-evaluation round, itself put under scrutiny by the University, then to a freeze on the recruitment of statisticians by the (pure math) successor of Aad, as well as increasing harassment of the statisticians in the Mathematisch Instituut, and eventually to the exile of most of them. Wat een verspilling! [What a waste!]

## Archive for Bayesian asymptotics

## the mysterious disappearance of the Leiden statistics group

Posted in Books, pictures, Statistics, University life with tags Bayesian asymptotics, Bayesian nonparametrics, ERC, Leiden, mathematical statistics, the Netherlands, TU Delft, Universiteit Leiden on July 14, 2021 by xi'an

## Korean trip

Posted in Mountains, Running, Statistics, Travel, University life with tags ABC, Bayesian asymptotics, Bukhasan, campus, CREST, jatp, Korea, Korean Statistical Society, Rutgers University, Salzburg, Seoul, Seoul National University, tutorial, University of Seoul on November 24, 2019 by xi'an

**A** fairly short but exciting trip to Seoul and to the Fall meeting of the Korean Statistical Society there. Plus giving a seminar at Seoul National University, where I stayed and enjoyed its beautiful campus surrounded by hills painted in the flamboyant reds and yellows of trees. Running to the top of Gwanaksan in the early morning, with some scrambling moments, was a fantastic beginning for the day! Although it was quite unintentional, Sacha Tsybakov from CREST happened to be another invited speaker at the meeting (along with Regina Liu from Rutgers, whom I also met in Salzburg two months ago) and we had a nice stroll together on the University of Seoul campus during a break in the sessions, gaining another view of the city from the top of the Bukhasan mountain. The talk I gave there on the asymptotics of ABC happened to be better attended than my tutorial lecture delivered at the beginning of JSM in Denver this summer. I am thus quite grateful to the organisers for their invitation and this opportunity to meet Korean statisticians and to get a glimpse of Korean culture and cuisine!

## probably ABC [and provably robust]

Posted in Books, pictures, Statistics, Travel with tags ABC, ABC-SMC, adaptive Monte Carlo algorithm, Bayesian asymptotics, CREST, Gaussian processes, likelihood-free methods, misspecified model, oracle inequalities on August 8, 2017 by xi'an

**T**wo weeks ago, James Ridgway (formerly CREST) arXived a paper on misspecification and ABC, a topic on which David Frazier, Judith Rousseau and I have been working for a while now [and soon to be arXived as well]. Paper that I re-read on a flight to Amsterdam [hence the above picture], written as a continuation of our earlier paper with David, Gael, and Judith. One specificity of the paper is to use an exponential distribution on the distance between the observed and simulated sample within the ABC distribution. Which reminds me of the resolution by Bissiri, Holmes, and Walker (2016) of the intractability of the likelihood function. James’ paper contains oracle inequalities between the ABC approximation and the genuine distribution of the summary statistics, like a bound on the distance between the expectations of the summary statistics under both models. Which writes down as a sum of a model bias, of two divergences between empirical and theoretical averages, of smoothness penalties, and of a prior impact term. And a similar bound on the expected distance to the oracle estimator of θ under the ABC distribution [and a Lipschitz type assumption also found in our paper]. Which first sounded weird [to me] as I would have expected the true posterior, until it dawned on me that the ABC distribution is the one used for the estimation [a passing strike of over-Bayesianism!]. While the oracle bound could have been used directly to discuss the rate of convergence of the exponential rate λ to zero [with the sample size n], James goes into the interesting alternative direction of setting a prior on λ, an idea that dates back to Olivier Catoni and Peter Grünwald. Or rather a pseudo-posterior on λ, a common occurrence in the PAC-Bayesian literature. In one of his results, James obtains a dependence of λ on the dimension m of the summary [as well as the root dependence on the sample size n], which seems to contradict our earlier independence result, until one realises this scale parameter is associated with a distance variable, itself scaled in m.
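To make the exponential weighting of the distance concrete, here is a minimal sketch of my own, not the algorithm of the paper: simulations from the prior are weighted by exp(-λ × distance) between observed and simulated summaries, rather than accepted or rejected by a hard tolerance. Everything specific below is an assumption made for illustration: the Normal location model, the N(0, 10²) prior, the sample mean as summary, the value of λ, the function name `exp_kernel_abc`, and the Student t data that makes the model misspecified.

```python
import numpy as np

def exp_kernel_abc(y_obs, n_sims=20_000, lam=50.0, seed=0):
    """Prior simulations weighted by exp(-lam * distance between summaries).

    Toy setting (all assumptions): model y_i ~ N(theta, 1), prior
    theta ~ N(0, 10^2), summary statistic = sample mean.
    """
    rng = np.random.default_rng(seed)
    n = y_obs.size
    s_obs = y_obs.mean()
    theta = rng.normal(0.0, 10.0, size=n_sims)   # draws from the prior
    # The mean of n N(theta, 1) variates is N(theta, 1/n), simulated directly.
    s_sim = rng.normal(theta, 1.0 / np.sqrt(n))
    dist = np.abs(s_sim - s_obs)
    log_w = -lam * dist                          # exponential kernel on the distance
    w = np.exp(log_w - log_w.max())              # stabilised before normalising
    return theta, w / w.sum()

# Observed data from a shifted Student t_3: the Normal model is misspecified.
data_rng = np.random.default_rng(1)
y_obs = data_rng.standard_t(3, size=100) + 2.0

theta, w = exp_kernel_abc(y_obs)
post_mean = float(np.sum(w * theta))             # weighted ABC estimate of theta
ess = float(1.0 / np.sum(w ** 2))                # effective sample size of the weights
print(f"ABC mean: {post_mean:.3f}, observed summary: {y_obs.mean():.3f}, ESS: {ess:.1f}")
```

Larger values of `lam` concentrate the weights on simulations closest to the observed summary, at the price of a smaller effective sample size, which is the trade-off that a prior or pseudo-posterior on λ has to negotiate.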

The paper also contains a non-parametric part, where the parameter θ is the unknown distribution of the data and the summary the data itself. Which is quite surprising as I did not deem it possible to handle non-parametrics with ABC. Especially in a misspecified setting (although I have trouble perceiving what this really means).

“We can use most of the Monte Carlo toolbox available in this context.”

The theoretical parts are a bit heavy on notation and hard to read [as a vacation morning read at least!]. They are followed by a Monte Carlo implementation using SMC-ABC. And pseudo-marginals [at least formally as I do not see how the specific features of pseudo-marginals are more than an augmented representation here]. And adaptive multiple pseudo-samples that reminded me of the Biometrika paper of Anthony Lee and Krys Latuszynski (Warwick). Therefore using indeed most of the toolbox!

## CORE talk at Louvain-la-Neuve

Posted in Statistics with tags ABC, ABC convergence, Banff, Bayesian asymptotics, Bayesian econometrics, Belgium, CORE, Louvain-la-Neuve on March 16, 2017 by xi'an

Tomorrow, I will give a talk at the seminar for econometrics and finance of CORE, in Louvain-la-Neuve, Belgium. Here are my slides, recycled from several earlier talks and from Judith’s slides in Banff:

## mixture models with a prior on the number of components

Posted in Books, Statistics, University life with tags Bayesian asymptotics, Bayesian non-parametrics, Chinese restaurant process, consistency, Dirichlet mixture priors, Dirichlet process, mixtures, reversible jump on March 6, 2015 by xi'an

“From a Bayesian perspective, perhaps the most natural approach is to treat the number of components like any other unknown parameter and put a prior on it.”

**A**nother mixture paper on arXiv! Indeed, Jeffrey Miller and Matthew Harrison recently arXived a paper on estimating the number of components in a mixture model, comparing the parametric with the non-parametric Dirichlet prior approaches. Since priors can be chosen towards agreement between those. This is an obviously interesting issue, as they are often opposed in modelling debates. The above graph shows a crystal clear agreement between finite component mixture modelling and Dirichlet process modelling. The same happens for classification. However, Dirichlet process priors do not return an estimate of the number of components, which may be considered a drawback if one considers this is an identifiable quantity in a mixture model… But the paper stresses that the number of estimated clusters under the Dirichlet process modelling tends to be larger than the number of components in the finite case. Hence that the Dirichlet process mixture modelling is not consistent in that respect, producing parasite extra clusters…

In the parametric modelling, the authors assume the same scale is used in all Dirichlet priors, that is, for all values of k, the number of components. Which means an incoherence when marginalising from k to (k-p) components. Mild incoherence, in fact, as the parameters of the different models do not have to share the same priors. And, as shown by Proposition 3.3 in the paper, this does not prevent coherence in the marginal distribution of the latent variables. The authors also draw a comparison between the distribution of the partition in the finite mixture case and the Chinese restaurant process associated with the partition in the infinite case. A further analogy is that the finite case allows for a stick breaking representation. A noteworthy difference between both modellings is about the size of the partitions in the finite (homogeneous partitions) and infinite (extreme partitions) cases.
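As a rough illustration of this contrast in partition sizes, here is a small simulation sketch of my own, not taken from the paper. It draws one partition of n items from a Chinese restaurant process and one from a finite mixture with symmetric Dirichlet weights. The function names and the values of `alpha`, `k` and `gamma` are arbitrary assumptions chosen for the example.

```python
import numpy as np

def crp_partition(n, alpha, rng):
    """Block sizes of one partition of n items drawn from a CRP(alpha)."""
    counts = []
    for _ in range(n):
        # Join an existing block with probability proportional to its size,
        # or open a new block with probability proportional to alpha.
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()
        j = rng.choice(len(probs), p=probs)
        if j == len(counts):
            counts.append(1)
        else:
            counts[j] += 1
    return sorted(counts, reverse=True)

def finite_mixture_partition(n, k, gamma, rng):
    """Block sizes when n items are allocated among k components with
    symmetric Dirichlet(gamma) weights; empty components are dropped."""
    weights = rng.dirichlet(np.full(k, gamma))
    counts = rng.multinomial(n, weights)
    return sorted((int(c) for c in counts if c > 0), reverse=True)

rng = np.random.default_rng(0)
n = 500
crp_sizes = crp_partition(n, alpha=1.0, rng=rng)
fin_sizes = finite_mixture_partition(n, k=5, gamma=1.0, rng=rng)
print("CRP block sizes:           ", crp_sizes)
print("Finite mixture block sizes:", fin_sizes)
```

A single draw proves nothing, but repeating the experiment gives a feel for how often the Chinese restaurant process produces a few dominant blocks plus several tiny ones, against the more even block sizes of the finite case.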

An interesting entry into the connections between “regular” mixture modelling and Dirichlet mixture models. Maybe not ultimately surprising given the past studies by Peter Green and Sylvia Richardson of both approaches (1997 in Series B and 2001 in JASA).