**T**oday I made a “quick” (10h door to door!) round trip visit to Marseille (by train) to take part in the PhD thesis defense (committee) of Edwin Fourrier-Nicolaï, whose title was *Poverty, inequality and redistribution: an econometric approach*. While this was mainly a thesis in economics, meaning defending some theory on inequalities based on East German data, there were Bayesian components in the thesis that justified (to some extent!) my presence in the jury. Especially around mixture estimation by Gibbs sampling. (On which I started working almost exactly 30 years ago, when I joined Paris 6 and met Gilles Celeux and Jean Diebolt.) One intriguing [for me] question stemmed from this defense, namely the notion of a Bayesian estimation of a *three i’s of poverty* (TIP) curve. The three i’s stand for incidence, intensity, and inequality, as, introduced in Jenkins and Lambert (1997), this curve measures the average income loss from the poverty level for the *100p*% lower incomes, when p varies between 0 and 1. It thus depends on the distribution F of the incomes and when using a mixture distribution its computation requires a numerical cdf inversion to determine the *p*-th income quantile. A related question is thus on how to define a Bayesian estimate of the TIP curve. Using an average over the values of an MCMC sample does not sound absolutely satisfactory since the upper bound in the integral varies for each realisation of the parameter. The use of another estimate would however require a specific loss function, an issue not discussed in the thesis.
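As a rough illustration of the computation described above, here is a minimal sketch under assumptions that are not taken from the thesis: incomes follow a mixture of lognormal distributions, and the curve is taken in its absolute-gap form, TIP(p) = ∫ (z − x)₊ dF(x) over incomes x up to the p-th quantile of F, with z the poverty line. The quantile is found by bisection on the mixture cdf (the numerical inversion), while the integral is available in closed form for lognormal components. All function names, as well as the `draws` format in `tip_posterior_mean`, are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard Normal cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mixture_cdf(x, weights, mus, sigmas):
    """cdf F(x) of a lognormal mixture (an assumed income model)."""
    if x <= 0.0:
        return 0.0
    lx = math.log(x)
    return sum(w * norm_cdf((lx - m) / s)
               for w, m, s in zip(weights, mus, sigmas))


def mixture_quantile(p, weights, mus, sigmas, tol=1e-10):
    """Numerical inversion of F by bisection on the log-income scale."""
    lo = min(m - 12.0 * s for m, s in zip(mus, sigmas))
    hi = max(m + 12.0 * s for m, s in zip(mus, sigmas))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mixture_cdf(math.exp(mid), weights, mus, sigmas) < p:
            lo = mid
        else:
            hi = mid
    return math.exp(0.5 * (lo + hi))


def tip(p, z, weights, mus, sigmas):
    """TIP(p) = integral of (z - x)_+ dF(x) for x up to the p-th quantile."""
    # The gap (z - x)_+ vanishes above z, so the bound is capped at z.
    u = min(mixture_quantile(p, weights, mus, sigmas), z)
    lu = math.log(u)
    # E[X 1{X <= u}] for each lognormal component, in closed form.
    partial_mean = sum(w * math.exp(m + 0.5 * s * s) * norm_cdf((lu - m - s * s) / s)
                       for w, m, s in zip(weights, mus, sigmas))
    return z * mixture_cdf(u, weights, mus, sigmas) - partial_mean


def tip_posterior_mean(p, z, draws):
    """Pointwise average of TIP(p) over parameter draws (weights, mus, sigmas)."""
    return sum(tip(p, z, *d) for d in draws) / len(draws)


if __name__ == "__main__":
    # Hypothetical two-component mixture and poverty line.
    w, mu, sg = [0.3, 0.7], [0.0, 1.0], [0.5, 0.4]
    for p in (0.1, 0.25, 0.5, 1.0):
        print(p, tip(p, 1.5, w, mu, sg))
```

The function `tip_posterior_mean` is the naive pointwise average over an MCMC sample questioned in the post: each draw of the parameter comes with its own quantile, hence its own upper bound in the integral.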

## Archive for mixture of distributions

## the three i’s of poverty

Posted in Books, pictures, Statistics, Travel, University life with tags Gibbs sampling, loss function, Marseille, mixture of distributions, thesis defence, three i's of poverty on September 15, 2019 by xi'an

## a jump back in time

Posted in Books, Kids, Statistics, Travel, University life with tags Bayesian statistics, Fortran, French army, LaTeX, mixture of distributions, noninformative priors, Purdue University, S, software, Spain, Valencia 3, Valencia conferences on October 1, 2018 by xi'an

**A**s the Department of Statistics in Warwick is slowly emptying its shelves and offices for the big migration to the new building that is almost completed, books and documents are abandoned in the corridors and the work spaces. On this occasion, I thus happened to spot a vintage edition of the Valencia 3 proceedings. I had missed this meeting and hence the volume for, during the last year of my PhD, I was drafted in the French Navy and as a result prohibited from travelling abroad. (Although on reflection I could have safely done it with no one in the military the wiser!) Reading through the papers thirty years later is a weird experience, as I do not remember most of the papers, the exception being the mixture modelling paper by José Bernardo and Javier Girón which I studied a few years later when writing the mixture estimation and simulation paper with Jean Diebolt. And then again in our much more recent non-informative paper with Clara Grazian. And Prem Goel’s survey of Bayesian software. That is, 1987 state of the art software. Covering an amazing list of eighteen. Including versions by Zellner, Tierney, Schervish, Smith [but no MCMC], Jaynes, Goldstein, Geweke, van Dijk, Bauwens, which apparently did not survive the ages till now. Most were in Fortran but S was also mentioned. And another version of Tierney, Kass and Kadane on Laplace approximations. And the reference paper of Dennis Lindley [who was already retired from UCL at that time!] on the Hardy-Weinberg equilibrium. And another paper by Don Rubin on using SIR (Rubin, 1983) for simulating from posterior distributions with missing data. Ten years before the particle filter paper, and apparently missing the possibility of weights with infinite variance.

There already were some illustrations of Bayesian analysis in action, including one by Jay Kadane reproduced in his book. And several papers by Jim Berger, Tony O’Hagan, Luis Pericchi and others on imprecise Bayesian modelling, which was in tune with the era, the imprecise probability book by Peter Walley about to appear. And a paper by Shaw on numerical integration that mentioned quasi-random methods. Applied to a 12-component Normal mixture. Overall, a much less theoretical content than I would have expected. And nothing about shrinkage estimators, although a fraction of the speakers had worked on this topic most recently.

At a less fundamental level, this was a time when ~~La~~TeX was becoming a standard, as shown by a few papers in the volume (and as I was to find when visiting Purdue the year after), even though most were still typed on a typewriter, including a manuscript addition by Dennis Lindley. And Warwick appeared as a Bayesian hotpot!, with at least five papers written by people there permanently or on a long term visit. (In case a local is interested in it, I have kept the volume, to be found in my new office!)

## Handbook of Mixture Analysis [cover]

Posted in Books, Statistics, University life with tags Chapman & Hall, classification, clustering, CRC Press, handbook, handbook of mixture analysis, JSM 2018, mixture of distributions, mixtures of experts on August 15, 2018 by xi'an

**O**n the occasion of my talk at JSM2018, CRC Press sent me the cover of our incoming handbook on mixture analysis, courtesy of Rob Calver who managed to get it to me on very short notice! We are about ready to send the manuscript to CRC Press and hopefully the volume will get published pretty soon. It would have been better to have it ready for JSM2018, but we editors got delayed by a few months for the usual reasons.

## off to Vancouver

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags Bayesian Analysis, British Columbia, Canada, default prior, Joint Statistical Meeting, JSM 2018, mixture of distributions, objective Bayes, summer of British conferences, Vancouver Island on July 29, 2018 by xi'an

**I** am off today to Vancouver for JSM2018, eight years after I visited the West Coast for another JSM! And a contender for the Summer of British Conferences, since it is in British Columbia.

And again looking forward to the city, (some of) the meeting, and getting together with long-time-no-see friends. Followed by a fortnight of vacations on Vancouver Island where ‘Og posting may get sparse…

I hope I can take advantage of the ten hours in the plane from Paris to write my talk from scratch about priors for mixtures of distributions. Based on our papers with Clara Grazian and with Kaniav Kamary and Kate Lee. Still having some leeway since my talk is on Thursday morning, on the last day of the meeting…

## the [not so infamous] arithmetic mean estimator

Posted in Books, Statistics with tags arithmetic mean estimator, Bayesian Analysis, Chib's approximation, harmonic mean estimator, HPD region, importance sampling, label switching, mixture of distributions, nested sampling, unbiasedness on June 15, 2018 by xi'an

“Unfortunately, no perfect solution exists.” Anna Pajor

**A**nother paper about harmonic and not-so-harmonic mean estimators that I (also) missed came out last year in Bayesian Analysis. The author is Anna Pajor, whose earlier note with Osiewalski I also spotted on the same day. The idea behind the approach [which belongs to the branch of Monte Carlo methods requiring additional simulations after an MCMC run] is to start, like the corrected harmonic mean estimator, from a restricted set **A** so as to avoid the tails of the distributions and the connected infinite variance issues that plague the harmonic mean estimator (an old ‘Og tune!). The marginal density p(y) then satisfies an identity involving the prior expectation of the likelihood function restricted to **A** divided by the posterior coverage of **A**. Which makes the resulting estimator unbiased only when this posterior coverage of **A** is known, which does not seem realistic or efficient, except if **A** is an HPD region, as suggested in our earlier “safe” harmonic mean paper. And efficient only when **A** is well-chosen in terms of the likelihood function. In practice, the author notes that P(**A**|y) is to be estimated from the MCMC sequence and that the set **A** should be chosen to return large values of the likelihood, p(y|θ), through importance sampling, hence missing somehow the double opportunity of using an HPD region. Hence using the same default choice as in Lenk (2009), an HPD region whose lower bound is derived as the minimum likelihood in the MCMC sample, “range of the posterior sampler output”. Meaning P(**A**|y)=1. (As an aside, the paper does not produce optimality properties or even heuristics towards efficiently choosing the various parameters to be calibrated in the algorithm, like the set **A** itself. As another aside, the paper concludes with a simulation study on an AR(p) model where the marginal may be obtained in closed form if stationarity is not imposed, which I first balked at, before realising that even in this setting both the posterior and the marginal do exist for a finite sample size, and hence the latter can be estimated consistently by Monte Carlo methods.) A last remark is that computing costs are not discussed in the comparison of methods.
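Written out, the identity behind the estimator is p(y) = ∫ over **A** of p(y|θ) π(θ) dθ, divided by P(**A**|y), which follows directly from the definition of the posterior probability of **A**. The sketch below illustrates it on a toy conjugate model chosen here because its marginal is known in closed form (y_i ~ N(θ,1), θ ~ N(0,τ²)); this is not one of the experiments in the paper. Two further assumptions: exact posterior draws stand in for the MCMC output, and the importance density on **A** is uniform, with **A** taken as the range of the posterior sample so that the estimated coverage is one. All names are hypothetical.

```python
import math
import random


def log_lik(theta, y):
    """Log-likelihood of i.i.d. N(theta, 1) observations."""
    return sum(-0.5 * math.log(2.0 * math.pi) - 0.5 * (yi - theta) ** 2 for yi in y)


def log_prior(theta, tau):
    """Log-density of the N(0, tau^2) prior."""
    return -0.5 * math.log(2.0 * math.pi * tau ** 2) - 0.5 * theta ** 2 / tau ** 2


def exact_log_marginal(y, tau):
    """Closed-form log p(y) for the toy conjugate model."""
    n, s, ss = len(y), sum(y), sum(yi * yi for yi in y)
    return (-0.5 * n * math.log(2.0 * math.pi) - 0.5 * math.log(1.0 + n * tau ** 2)
            - 0.5 * (ss - tau ** 2 * s * s / (1.0 + n * tau ** 2)))


def corrected_arithmetic_mean(y, tau, posterior_draws, n_sim, rng):
    """Log of [importance sampling estimate of the integral over A] / P(A|y)."""
    # A = range of the posterior sample, hence an estimated coverage of 1.
    a, b = min(posterior_draws), max(posterior_draws)
    coverage = sum(a <= t <= b for t in posterior_draws) / len(posterior_draws)
    # Uniform importance density on A (an assumption), with density 1/(b - a).
    log_terms = []
    for _ in range(n_sim):
        theta = rng.uniform(a, b)
        log_terms.append(log_lik(theta, y) + log_prior(theta, tau) + math.log(b - a))
    # Log of the arithmetic mean, computed stably.
    top = max(log_terms)
    log_mean = top + math.log(sum(math.exp(t - top) for t in log_terms) / n_sim)
    return log_mean - math.log(coverage)


rng = random.Random(1)
tau = 2.0
y = [rng.gauss(1.0, 1.0) for _ in range(20)]
# Exact posterior N(m, v), standing in for an MCMC sample.
v = 1.0 / (len(y) + 1.0 / tau ** 2)
m = v * sum(y)
posterior_draws = [rng.gauss(m, math.sqrt(v)) for _ in range(5000)]
estimate = corrected_arithmetic_mean(y, tau, posterior_draws, 20000, rng)
exact = exact_log_marginal(y, tau)
print(estimate, exact)
```

Setting the coverage to one by construction leaves a small bias, since the true posterior mass of the sample range is slightly below one, which is one way of seeing why the choice of **A** matters.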

The final experiment in the paper is aiming at the marginal of a mixture model posterior, operating on the galaxy benchmark used by Roeder (1990) and about every other paper on mixtures since then (incl. ours). The prior is pseudo-conjugate, as in Chib (1995). And label-switching is handled by a random permutation of indices at each iteration. Which may not be enough to fight the attraction of the current mode on a Gibbs sampler and hence does not automatically correct Chib’s solution. As shown in Table 7 by the divergence with Radford Neal’s (1999) computations of the marginals, which happen to be quite close to the approximation proposed by the author. (As an aside, the paper mentions poor performances of Chib’s method when centred at the posterior mean, but this is a setting where the posterior mean is meaningless because of the permutation invariance. As another, I do not understand how the RMSE can be computed in this real data situation.) The comparison is limited to Chib’s method and a few versions of arithmetic and harmonic means. Missing nested sampling (Skilling, 2006; Chopin and X, 2011), and attuned importance sampling as in Berkhof et al. (2003), Marin, Mengersen and X (2005), and the most recent Lee and X (2016) in Bayesian Analysis.
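For concreteness, the random permutation step mentioned above can be sketched as follows, assuming the current state of the sampler is stored as three parallel lists of component weights, means, and standard deviations (a hypothetical layout, not the paper's code). The same permutation is applied to all three lists so that each component keeps its own parameters and the mixture density is unchanged.

```python
import random


def permute_labels(weights, mus, sigmas, rng):
    """Apply one uniformly random relabelling of the mixture components."""
    perm = list(range(len(weights)))
    rng.shuffle(perm)
    return ([weights[j] for j in perm],
            [mus[j] for j in perm],
            [sigmas[j] for j in perm])


# Usage: called once at the end of each Gibbs iteration.
rng = random.Random(0)
state = ([0.2, 0.5, 0.3], [-1.0, 0.0, 2.0], [1.0, 0.5, 0.8])
new_state = permute_labels(*state, rng)
```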