## Archive for map

## posterior distribution missing the MLE

**A**n X validated question as to why the MLE is not necessarily (well) covered by a posterior distribution, even under a flat prior… Which in retrospect highlights the fact that the MLE (and the MAP) are invasive species in a Bayesian ecosystem: since they do not account for the dominating measure, they do not fare well under reparameterisation. (As a very much to the side comment, I also managed to write an almost identical and simultaneous answer to the first answer to the question.)
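The non-invariance of the MAP under reparameterisation can be checked numerically; below is a minimal sketch of mine on a toy binomial model (x=3 successes out of n=10 trials, flat prior on the probability p, logit reparameterisation — toy numbers of my choosing, not from the question):

```python
import numpy as np

# Hypothetical toy data: x successes out of n Bernoulli trials.
x, n = 3, 10

# Flat prior on p: the posterior is Beta(x+1, n-x+1), whose mode is the MLE x/n.
p = np.linspace(1e-6, 1 - 1e-6, 100001)
post_p = p**x * (1 - p)**(n - x)          # unnormalised posterior density in p
map_p = p[np.argmax(post_p)]              # ≈ x/n = 0.3

# Reparameterise as phi = logit(p): the density picks up the Jacobian
# dp/dphi = p(1-p), so the mode moves even though the model is unchanged.
post_phi = post_p * p * (1 - p)           # unnormalised posterior density in phi
map_phi = p[np.argmax(post_phi)]          # phi-scale mode, mapped back to p

print(map_p, map_phi)  # 0.3 versus (x+1)/(n+2) ≈ 0.333: the MAP is not invariant
```

The MLE x/n is untouched by the change of variable, while the MAP shifts with the Jacobian — the dominating-measure point made above.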

Posted in Books, Kids, pictures, Statistics with tags Canadian Rockies, dominating measure, Eiffel Peak, Lake Louise, map, maximum a posteriori, maximum likelihood estimation, measure on April 25, 2019 by xi'an

## Bayesian maps of Africa

Posted in pictures, Statistics with tags Africa, Bayesian geostatistics, childhood growth failure, educational achievement, Kofi Annan, map, Nature on March 21, 2018 by xi'an

**A** rather special issue of Nature this week (1 March 2018), as it addresses Bayesian geo-cartography and the mapping of childhood growth failure and educational achievement (along with sex differences) all across Africa! Including the (nice) cover of the journal, a preface by Kofi Annan, a cover article by Brian Reich and Murali Haran, and the first two major articles of the journal, one of which includes Ewan Cameron as a co-author. As I was reading this issue of Nature in the train back from Brussels, I could not access the supplementary material and so could not look at the specifics of the statistics, but the maps look quite impressive, with a 5×5 km² resolution, and include not only uncertainty maps but also predictive maps of the probability of achieving the WHO 2025 goals, surprisingly close to one in some parts of Africa. In terms of education, there are strong contrasts between regions, with the south of the continent, including Madagascar, showing a positive difference for women in terms of years of education. While there is no reason (from my train seat) to doubt the statistical analyses, I take quite seriously the authors' reservation that the quality of the prediction cannot be better than the quality of the data, which is “determined by the volume and fidelity of nationally representative surveys”. Which relates to an earlier post of mine about a similar concern with the deaths in Congo.

## mixtures of mixtures

Posted in pictures, Statistics, University life with tags arXiv, Austria, clustering, k-mean clustering algorithm, Linkz, map, MCMC, mixture, overfitting, Wien on March 9, 2015 by xi'an

**A**nd yet another arXival of a paper on mixtures! This one is written by Gertraud Malsiner-Walli, Sylvia Frühwirth-Schnatter, and Bettina Grün, from the Johannes Kepler University Linz and the Wirtschaftsuniversität Wien I visited last September, with the exact title being *Identifying mixtures of mixtures using Bayesian estimation*.

So, what *is* a mixture of mixtures if not a mixture?! Or if not *only* a mixture. The upper mixture level is associated with clusters, while the lower mixture level models the distribution within a given cluster. Because each cluster needs to be real enough, the components of the lower-level mixture are assumed to be heavily overlapping. The paper thus spends a large amount of space on detailing the construction of the associated hierarchical prior, which in particular amounts to defining through the prior what a cluster means. The paper also connects with the overfitted mixture idea of Rousseau and Mengersen (2011, Series B). At the cluster level, the Dirichlet hyperparameter is chosen to be very small, 0.001, which empties superfluous clusters but sounds rather arbitrary (which is the reason why we did not go for such small values in our testing/mixture modelling). Conversely, the within-cluster mixture weights have a hyperparameter staying (far) away from zero. The MCMC implementation is based on a standard Gibbs sampler, and the outcome is analysed and sorted by estimating the “true” number of clusters as the MAP and by selecting the MCMC simulations conditional on that value. From there, clusters are identified via the point-process representation of the mixture posterior, using a standard k-means algorithm.
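The emptying effect of a very small Dirichlet hyperparameter can be illustrated in a few lines; this is a sketch of mine (the number of components K and the 1% occupancy threshold are arbitrary choices), not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # hypothetical number of potential clusters

# Weights drawn from a Dirichlet(e0, ..., e0) prior: with e0 = 0.001, as in
# the paper, almost all mass piles on a couple of components, emptying the
# superfluous ones; with e0 = 1 the weights spread over the whole simplex.
sparse = rng.dirichlet([0.001] * K, size=1000)
dense = rng.dirichlet([1.0] * K, size=1000)

# Average number of components carrying non-negligible weight (> 1%).
occupied_sparse = (sparse > 0.01).sum(axis=1).mean()
occupied_dense = (dense > 0.01).sum(axis=1).mean()

print(occupied_sparse, occupied_dense)
```

The sparse prior typically leaves a single occupied component per draw, versus close to all ten under the uniform Dirichlet — which is exactly why a tiny hyperparameter can prune clusters, and also why it is such a drastic (and arguably arbitrary) choice.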

The remainder of the paper illustrates the approach on simulated and real datasets, recovering in these small-dimensional setups the number of clusters used in the simulation or found in other studies. As noted in the conclusion, relying solely on a Gibbs sampler with such a large number of components is rather perilous, since it may get stuck near suboptimal configurations, especially with very small Dirichlet hyperparameters.

## Statistics slides (5)

Posted in Books, Kids, Statistics, University life with tags Bayesian statistics, Don Rubin, HPD region, map, Paris, Université Paris Dauphine on December 7, 2014 by xi'an

**H**ere is the fifth and last set of slides for my third-year statistics course, trying to introduce Bayesian statistics in the most natural way and hence starting with… Rasmus' socks and ABC!!! This is an interesting experiment, as I have no idea how my students will react: either they will see the point beyond the anecdotal story, or they will miss it (being quite unhappy so far about the lack of mathematical rigour in my course and exercises…). We only have two weeks left, so I am afraid the concept will not have time to seep through!
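For readers who have not met the socks example, here is a minimal ABC rejection sketch of mine in that spirit — the priors and the setup (11 socks drawn from the laundry, all distinct, infer the total number of socks) are hypothetical choices, not the ones from the slides:

```python
import numpy as np

rng = np.random.default_rng(1)

# Observed (hypothetical) data: 11 socks drawn, all 11 distinct (no pair).
n_drawn, n_odd_obs = 11, 11

accepted = []
for _ in range(20000):
    n_pairs = rng.poisson(15)   # hypothetical prior on the number of pairs
    n_singles = rng.poisson(3)  # hypothetical prior on unmatched socks
    # Build the laundry: each pair contributes two socks of the same type.
    socks = np.repeat(np.arange(n_pairs + n_singles),
                      [2] * n_pairs + [1] * n_singles)
    if socks.size < n_drawn:
        continue
    draw = rng.choice(socks, size=n_drawn, replace=False)
    # ABC rejection: keep the prior draw iff simulated data match observed.
    if np.unique(draw).size == n_odd_obs:
        accepted.append(2 * n_pairs + n_singles)

print(np.mean(accepted))  # posterior mean of the total number of socks
```

The whole Bayesian machinery reduces to "simulate from the prior, keep what reproduces the data", which is presumably the pedagogical point of opening the course with it.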

## SAME but different

Posted in Statistics, University life with tags data cloning, document analysis, map, Monte Carlo Statistical Methods, parallel MCMC, SAME, simulated annealing, simulation, stochastic optimisation, variational Bayes methods on October 27, 2014 by xi'an

**A**fter several clones of our SAME algorithm appeared in the literature, it is rather fun to see another paper acknowledging the connection. SAME but different was arXived today by Zhao, Jiang and Canny. The point of this short paper is to show that a parallel implementation of SAME performs efficiently when compared with existing standards. Since the duplicated latent variables are independent given θ, they can be simulated in parallel. The authors further assume independence between the components of those latent variables, as well as finite support, as in document analysis, so that the replicated latent variables can all be sampled at once. Parallelism is thus used solely for the components of the latent variable(s). SAME is normally associated with an annealing schedule, but the authors could not detect an improvement over a fixed and large number of replications. They report gains comparable to state-of-the-art variational Bayes on two large datasets. Quite fun to see SAME getting a new life thanks to computer scientists!
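To give the flavour of SAME, here is a minimal sketch of mine on a toy Gaussian latent-variable model — not the document-analysis model of the paper — using a fixed number of replications k rather than an annealing schedule, as the authors advocate:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy model:  x_i | z_i ~ N(z_i, 1),  z_i | theta ~ N(theta, 1), flat prior
# on theta, so that the marginal MAP of theta is simply the sample mean of x.
x = rng.normal(2.0, np.sqrt(2.0), size=50)
n, k = x.size, 100  # k = number of SAME replications of the latent z's

theta = 0.0
draws = []
for it in range(2000):
    # The k replicated latent vectors are conditionally independent given
    # theta, hence (the paper's point) can all be sampled at once / in parallel:
    # z_ij | theta, x_i ~ N((x_i + theta)/2, 1/2).
    z = rng.normal((x[None, :] + theta) / 2.0, np.sqrt(0.5), size=(k, n))
    # With a flat prior, theta | z ~ N(mean of all n*k replicated z's, 1/(n*k)),
    # a conditional that tightens around the MAP as k grows.
    theta = rng.normal(z.mean(), 1.0 / np.sqrt(n * k))
    if it >= 500:  # discard burn-in
        draws.append(theta)

print(np.mean(draws), x.mean())  # SAME draws concentrate near the MAP = x-bar
```

The chain targets the posterior raised to the power k, so the draws pile up around the mode instead of exploring the whole posterior — simulated annealing in Bayesian clothing.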