**A**n interesting question from a reader of the Bayesian Choice came out on X validated last week. It was about Laplace’s succession rule, which I found somewhat over-used, but it was nonetheless interesting because the question was about the discrepancy of the “non-informative” answers derived from two models applied to the data: an Hypergeometric distribution in the Bayesian Choice and a Binomial on Wikipedia. The originator of the question had trouble with the difference between those two “non-informative” answers as she or he believed that there was a *single* non-informative principle that should lead to a unique answer. This does not hold, even when following a reference prior principle like Jeffreys’ invariant rule or Jaynes’ maximum entropy tenets. For instance, the Jeffreys priors associated with a Binomial and a Negative Binomial distributions differ. And even less when considering that there is no unity in reaching those reference priors. (Not even mentioning the issue of the reference dominating measure for the definition of the entropy.) This led to an informative debate, which is the point of X validated.

## Archive for the Statistics Category

## same data – different models – different answers

Posted in Books, Kids, Statistics, University life with tags Beaumont-en-Auge, Jeffreys priors, Laplace succession rule, non-informative priors, Normandie, Pierre Simon de Laplace, reference priors, statue, The Bayesian Choice, wikipedia on June 1, 2016 by xi'an## inefficiency of data augmentation for large samples

Posted in Books, pictures, Running, Statistics, Travel, University life with tags convergence of Gibbs samplers, Data augmentation, Gibbs sampling, Hamiltonian Monte Carlo, importance sampling, logit model, MCMC, Monte Carlo Statistical Methods, probit model, simulation, spectral gap on May 31, 2016 by xi'an**O**n Monday, James Johndrow, Aaron Smith, Natesh Pillai, and David Dunson arXived a paper on the diminishing benefits of using data augmentation for large and highly imbalanced categorical data. They reconsider the data augmentation scheme of Tanner and Wong (1987), surprisingly not mentioned, used in the first occurrences of the Gibbs sampler like Albert and Chib’s (1993) or our mixture estimation paper with Jean Diebolt (1990). The central difficulty with data augmentation is that the distribution to be simulated operates on a space that is of order O(n), even when the original distribution covers a single parameter. As illustrated by the coalescent in population genetics (and the subsequent intrusion of the ABC methodology), there are well-known cases when the completion is near to impossible and clearly inefficient (as again illustrated by the failure of importance sampling strategies on the coalescent). The paper provides spectral gaps for the logistic and probit regression completions, which are of order a power of log(n) divided by √n, when all observations are equal to one. In a somewhat related paper with Jim Hobert and Vivek Roy, we studied the spectral gap for mixtures with a small number of observations: I wonder at the existence of a similar result in this setting, when all observations stem from one component of the mixture, when all observations are one. The result in this paper is theoretically appealing, the more because the posteriors associated with such models are highly regular and very close to Gaussian (and hence not that challenging as argued by Chopin and Ridgway). And because the data augmentation algorithm is uniformly ergodic in this setting (as we established with Jean Diebolt and later explored with Richard Tweedie). As demonstrated in the experiment produced in the paper, when comparing with HMC and Metropolis-Hastings (same computing times?), which produce much higher effective sample sizes.

## the random variable that was always less than its mean…

Posted in Books, Kids, R, Statistics with tags cross validated, empirical cdf, Gumbel distribution, R, skewed distribution, Stack Exchange on May 30, 2016 by xi'an**A**lthough this is far from a paradox when realising why the phenomenon occurs, it took me a few lines to understand why the empirical average of a log-normal sample is apparently a biased estimator of its mean. And why conversely the biased plug-in estimator does not appear to present a bias. To illustrate this “paradox” consider the picture below which compares both estimators of the mean of a log-normal LN(0,σ²) distribution as σ² increases: *blue* stands for the empirical mean, while *gold* corresponds to the plug-in estimator exp(σ²/2) when σ² is estimated from the log-sample, as in a normal sample. (The sample is of size 10⁶.) The gold sequence remains around one, while the blue one drifts away towards zero…

The question came on X validated and my first reaction was to doubt an implementation which outcome was so counter-intuitive. But then I thought further about the representation of a log-normal variate as exp(σξ) when ξ is a standard Normal variate. When σ grows large enough, it is near impossible for σξ to be larger than σ². More precisely,

P(X>E[X])=P(σξ>σ²/2)=1-Φ(σ/2)

which can be arbitrarily small.

## another riddle with a stopping rule

Posted in Books, Kids, R with tags 538, FiveThirtyEight, stopping rule, The Riddler on May 27, 2016 by xi'an**A** puzzle on The Riddler last week that is rather similar to an earlier one. Given the probability (1/2,1/3,1/6) on {1,2,3}, what is the mean of the number N of draws to see all possible outcomes and what is the average number of 1’s in those draws? The second question is straightforward, as the proportions of 1’s, 2’s and 3’s in the sequence till all values are observed remain 3/6, 2/6 and 1/6. The first question follows from the representation of the average

as the probability to exceed n is the probability that at least one value is not observed by the n-th draw, namely

3+(1/2)^{n}+(2/3)^{n}+(5/6)^{n}-(1/6)^{n}-(1/3)^{n}-(1/2)^{n}

which leads to an easy summation for the expectation, namely

3+(2/3)³/(1/3)+(5/6)³/(1/6)-(1/3)³/(2/3)-(1/6)³/(5/6)=73/10

## Computing the variance of a conditional expectation via non-nested Monte Carlo

Posted in Books, pictures, Statistics, University life with tags conditional probability, debiasing, Monte Carlo approximations, Monte Carlo Statistical Methods, Rao-Blackwellisation on May 26, 2016 by xi'an**T**he recent arXival by Takashi Goda of Computing the variance of a conditional expectation via non-nested Monte Carlo led me to read it as I could not be certain of the contents from only reading the title! The short paper considers the issue of estimating the variance of a conditional expectation when able to simulate the joint distribution behind the quantity of interest. The second moment E(E[f(X)|Y]²) can be written as a triple integral with two versions of x given y and one marginal y, which means that it can approximated in an unbiased manner by simulating a realisation of y then conditionally two realisations of x. The variance requires a third simulation of x, which the author seems to deem too costly and that he hence replaces with another unbiased version based on two conditional generations only. (He notes that a faster biased version is available with bias going down faster than the Monte Carlo error, which makes the alternative somewhat irrelevant, as it is also costly to derive.) An open question after reading the paper stands with the optimal version of the generic estimator (5), although finding the optimum may require more computing time than it is worth spending. Another one is whether or not this version of the expected conditional variance is more interesting (computation-wise) that the difference between the variance and the expected conditional variance as reproduced in (3) given that both quantities can equally be approximated by unbiased Monte Carlo…

## the end of Series B!

Posted in Statistics, University life, Books, pictures with tags Royal Statistical Society, JRSSB, RSS, Electronic Journal of Statistics, academic journals on May 25, 2016 by xi'an**I** received this news from the RSS today that all the RSS journals are turning 100% electronic. No paper version any longer! I deeply regret this move on which, as an RSS member, I would have appreciated to be consulted as I find much easier to browse through the current issue when it arrives in my mailbox, rather than being t best reminded by an email that I will most likely ignore and erase. And as I consider the production of the journals the prime goal of the Royal Statistical Society. And as I read that only 25% of the members had opted so far for the electronic format, which does not sound to me like a majority. In addition, moving to electronic-only journals does not bring the perks one would expect from electronic journals:

- no bonuses like supplementary material, code, open or edited comments
- no reduction in the subscription rate of the journals and penalty fees if one still wants a paper version, which amounts to a massive increase in the subscription price
- no disengagement from the commercial publisher, whose role become even less relevant
- no access to the issues of the years one has paid for, once one stops subscribing.

“The benefits of electronic publishing include: faster publishing speeds; increased content; instant access from a range of electronic devices; additional functionality; and of course, environmental sustainability.”

The move is sold with typical marketing noise. But I do not buy it: publishing speeds will remain the same as driven by the reviewing part, I do not see where the contents are increased, and I cannot seriously read a journal article from my phone, so this range of electronic devices remains a gadget. Not happy!

## an avalanche of new arXivals

Posted in Books, Statistics, University life with tags arXiv on May 25, 2016 by xi'an**I**n the past few days, I spotted the following papers being arXived, far too many for me to comment them all!

- MCMC with Strings and Branes: The Suburban Algorithm (Extended Version) by Jonathan J. Heckman, Jeffrey G. Bernstein, Ben Vigoda
- Methods for Bayesian Variable Selection with Binary Response Data using the EM Algorithm by Patrick McDermott, John Snyder, Rebecca Willison
- Computing the variance of a conditional expectation via non-nested Monte Carlo by Takashi Goda
- Metropolis-Hastings algorithms with autoregressive proposals, and a few examples by Richard A. Norton, Colin Fox
- Multilevel Particle Filters: Normalizing Constant Estimation by Ajay Jasra, Kengo Kamatani, Prince Prepah Osei, Yan Zhou
- Sobol’ indices for problems defined in non-rectangular domains by S. Kucherenko, O.V. Klymenko, N. Shah

Enjoy!