**T**he paper ‘Unbiased Markov chain Monte Carlo methods with couplings’ by Pierre Jacob et al. will be discussed (or Read) tomorrow at the Royal Statistical Society, 12 Errol Street, London, tomorrow night, Wed 11 December, at 5pm London time. With a pre-discussion session at 3pm, involving Chris Sherlock and Pierre Jacob, and chaired by Ioanna Manolopoulou. While I will alas miss this opportunity, due to my trip to Vancouver over the weekend, it is great that that the young tradition of pre-discussion sessions has been rekindled as it helps put the paper into perspective for a wider audience and thus makes the more formal Read Paper session more profitable. As we discussed the paper in Paris Dauphine with our graduate students a few weeks ago, we will for certain send one or several written discussions to Series B!

## Archive for unbiasedness

## unbiased MCMC discussed at the RSS tomorrow night

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags AABI, coupling, discussion paper, Journal of the Royal Statistical Society, Markov chain Monte Carlo algorithm, MCMC, Read paper, Royal Statistical Society, Series B, unbiasedness, Université Paris Dauphine, Vancouver on December 10, 2019 by xi'an## unbiased estimators that do not exist

Posted in Statistics with tags binomial distribution, counterexample, cross validated, exam, existence of unbiased estimators, monomial, teaching, unbiasedness on January 21, 2019 by xi'an**W**hen looking at questions on X validated, I came across this seemingly obvious request for an unbiased estimator of **P**(X=k), when X~**B**(n,p). Except that X is not observed but only Y~**B**(s,p) with s<n. Since **P**(X=k) is a polynomial in p, I was expecting such an unbiased estimator to exist. But it does not, for the reasons that Y only takes s+1 values and that any function of Y, including the MLE of **P**(X=k), has an expectation involving monomials in p of power s at most. It is actually straightforward to establish properly that the unbiased estimator does not exist. But this remains an interesting additional example of the rarity of the existence of unbiased estimators, to be saved until a future mathematical statistics exam!

## the [not so infamous] arithmetic mean estimator

Posted in Books, Statistics with tags arithmetic mean estimator, Bayesian Analysis, Chib's approximation, harmonic mean estimator, HPD region, importance sampling, label switching, mixture of distributions, nested sampling, unbiasedness on June 15, 2018 by xi'an

“Unfortunately, no perfect solution exists.”Anna Pajor

**A**nother paper about harmonic and not-so-harmonic mean estimators that I (also) missed came out last year in Bayesian Analysis. The author is Anna Pajor, whose earlier note with Osiewalski I also spotted on the same day. The idea behind the approach [which belongs to the branch of Monte Carlo methods requiring additional simulations after an MCMC run] is to start as the corrected harmonic mean estimator on a restricted set **A** as to avoid tails of the distributions and the connected infinite variance issues that plague the harmonic mean estimator (an old ‘Og tune!). The marginal density p(y) then satisfies an identity involving the prior expectation of the likelihood function restricted to **A** divided by the posterior coverage of **A**. Which makes the resulting estimator unbiased only when this posterior coverage of **A** is known, which does not seem realist or efficient, except if **A** is an HPD region, as suggested in our earlier “safe” harmonic mean paper. And efficient only when **A** is well-chosen in terms of the likelihood function. In practice, the author notes that P(**A**|y) is to be estimated from the MCMC sequence and that the set **A** should be chosen to return large values of the likelihood, p(y|θ), through importance sampling, hence missing somehow the double opportunity of using an HPD region. Hence using the same default choice as in Lenk (2009), an HPD region which lower bound is derived as the minimum likelihood in the MCMC sample, “range of the posterior sampler output”. Meaning P(**A**|y)=1. (As an aside, the paper does not produce optimality properties or even heuristics towards efficiently choosing the various parameters to be calibrated in the algorithm, like the set **A** itself. As another aside, the paper concludes with a simulation study on an AR(p) model where the marginal may be obtained in closed form if stationarity is not imposed, which I first balked at, before realising that even in this setting both the posterior and the marginal do exist for a finite sample size, and hence the later can be estimated consistently by Monte Carlo methods.) A last remark is that computing costs are not discussed in the comparison of methods.

The final experiment in the paper is aiming at the marginal of a mixture model posterior, operating on the galaxy benchmark used by Roeder (1990) and about every other paper on mixtures since then (incl. ours). The prior is pseudo-conjugate, as in Chib (1995). And label-switching is handled by a random permutation of indices at each iteration. Which may not be enough to fight the attraction of the current mode on a Gibbs sampler and hence does not automatically correct Chib’s solution. As shown in Table 7 by the divergence with Radford Neal’s (1999) computations of the marginals, which happen to be quite close to the approximation proposed by the author. (As an aside, the paper mentions poor performances of Chib’s method when centred at the posterior mean, but this is a setting where the posterior mean is meaningless because of the permutation invariance. As another, I do not understand how the RMSE can be computed in this real data situation.) The comparison is limited to Chib’s method and a few versions of arithmetic and harmonic means. Missing nested sampling (Skilling, 2006; Chopin and X, 2011), and attuned importance sampling as in Berkoff et al. (2003), Marin, Mengersen and X (2005), and the most recent Lee and X (2016) in Bayesian Analysis.

## unbiased consistent nested sampling via sequential Monte Carlo [a reply]

Posted in pictures, Statistics, Travel with tags auxiliary variable, Brisbane, evidence, marginal likelihood, nested sampling, Og, particle filter, QUT, unbiasedness on June 13, 2018 by xi'an*Rob Salomone sent me the following reply on my comments of yesterday about their recently arXived paper.*

“Which never occurred as the number one difficulty there, as the simplest implementation runs a Markov chain from the last removed entry, independently from the remaining entries. Even stationarity is not an issue sinceI believe that the first occurrence within the level set is distributed from the constrained prior.”

“And then, in a twist that is not clearly explained in the paper, the focus moves to an improved nested sampler that moves one likelihood value at a time, with a particle step replacing a singleparticle. (Things get complicated when several particles may take the very same likelihood value, but randomisation helps.) At this stage the algorithm is quite similar to the original nested sampler. Except for the unbiased estimation of the constants, thefinal constant, and the replacement of exponential weights exp(-t/N) by powers of (N-1/N)”

**is**a special case of SMC (with the weights replaced with a suboptimal choice).

## unbiased consistent nested sampling via sequential Monte Carlo

Posted in pictures, Statistics, Travel with tags auxiliary variable, Brisbane, evidence, marginal likelihood, nested sampling, Og, particle filter, QUT, unbiasedness on June 12, 2018 by xi'an

“Moreover, estimates of the marginal likelihood are unbiased.” (p.2)

Rob Salomone, Leah South, Chris Drovandi and Dirk Kroese (from QUT and UQ, Brisbane) recently arXived a paper that frames the nested sampling in such a way that marginal likelihoods can be unbiasedly (and consistently) estimated.

“Why isn’t nested sampling more popular with statisticians?” (p.7)

A most interesting question, especially given its popularity in cosmology and other branches of physics. A first drawback pointed out in the c is the requirement of independence between the elements of the sample produced at each iteration. Which never occurred as the number one difficulty there, as the simplest implementation runs a Markov chain from the last removed entry, independently from the remaining entries. Even stationarity is not an issue since I believe that the first occurrence within the level set is distributed from the constrained prior.

A second difficulty is the use of quadrature which turns integrand into step functions at random slices. Indeed, mixing Monte Carlo with numerical integration makes life much harder, as shown by the early avatars of nested sampling that only accounted for the numerical errors. (And which caused Nicolas and I to write our critical paper in Biometrika.) There are few studies of that kind in the literature, the only one I can think of being [my former PhD student] Anne Philippe‘s thesis twenty years ago.

The third issue stands with the difficulty in parallelising the method. Except by jumping k points at once, rather than going one level at a time. While I agree this makes life more complicated, I am also unsure about the severity of that issue as k nested sampling algorithms can be run in parallel and aggregated in the end, from simple averaging to something more elaborate.

The final blemish is that the nested sampling estimator has a stopping mechanism that induces a truncation error, again maybe a lesser problem given the overall difficulty in assessing the total error.

The paper takes advantage of the ability of SMC to produce unbiased estimates of a sequence of normalising constants (or of the normalising constants of a sequence of targets). For nested sampling, the sequence is made of the prior distribution restricted to an embedded sequence of level sets. With another sequence restricted to bands (likelihood between two likelihood boundaries). If all restricted posteriors of the second kind and their normalising constant are known, the full posterior is known. Apparently up to the main normalising constant, i.e. the marginal likelihood., *ℨ*, except that it is also the sum of all normalising constants. Handling this sequence by SMC addresses the four concerns of the four authors, apart from the truncation issue, since the largest likelihood bound need be set for running the algorithm.

When the sequence of likelihood bounds is chosen based on the observed likelihoods so far, the method becomes adaptive. Requiring again the choice of a stopping rule that may induce bias if stopping occurs too early. And then, in a twist that is not clearly explained in the paper, the focus moves to an improved nested sampler that moves one likelihood value at a time, with a particle step replacing a single particle. (Things get complicated when several particles may take the very same likelihood value, but randomisation helps.) At this stage the algorithm is quite similar to the original nested sampler. Except for the unbiased estimation of the constants, the final constant, and the replacement of exponential weights exp(-t/N) by powers of (N-1/N).

The remainder of this long paper (61 pages!) is dedicated to practical implementation, calibration and running a series of comparisons. A nice final touch is the thanks to the ‘Og for its series of posts on nested sampling, which “helped influence this work, and played a large part in inspiring it.”

In conclusion, this paper is certainly a worthy exploration of the nested sampler, providing further arguments towards a consistent version, with first and foremost an (almost?) unbiased resolution. The comparison with a wide range of alternatives remains open, in particular time-wise, if evidence is the sole target of the simulation. For instance, the choice of this sequence of targets in an SMC may be improved by another sequence, since changing one particle at a time does not sound efficient. The complexity of the implementation and in particular of the simulation from the prior under more and more stringent constraints need to be addressed.

## about paradoxes

Posted in Books, Kids, Statistics, University life with tags bias, book review, email, Jacobian, Mark Chang, MLE, paradoxes, reparameterisation, scientific inference, The Bayesian Choice, unbiasedness on December 5, 2017 by xi'an**A**n email I received earlier today about statistical paradoxes:

I am a PhD student in biostatistics, and an avid reader of your work. I recently came across this blog post, where you review a text on statistical paradoxes, and I was struck by this section:

I found this section provocative, but I am unclear on the nature of these “paradoxes”. I reviewed my stat inference notes and came across the classic example that there is no unbiased estimator for 1/p w.r.t. a binomial distribution, but I believe you are getting at a much more general result. If it’s not too much trouble, I would sincerely appreciate it if you could point me in the direction of a reference or provide a bit more detail for these two “paradoxes”.

The text is Chang’s Paradoxes in Scientific Inference, which I indeed reviewed negatively. To answer about the bias “paradox”, it is indeed a neglected fact that, while the average of *any* transform of a sample obviously is an unbiased estimator of its mean (!), the converse does not hold, namely, an *arbitrary* transform of the model parameter θ is not necessarily enjoying an unbiased estimator. In Lehmann and Casella, Chapter 2, Section 4, this issue is (just slightly) discussed. But essentially, transforms that lead to unbiased estimators are mostly the polynomial transforms of the mean parameters… (This also somewhat connects to a recent X validated question as to why MLEs are not always unbiased. Although the simplest explanation is that the transform of the MLE is the MLE of the transform!) In exponential families, I would deem the range of transforms with unbiased estimators closely related to the collection of functions that allow for inverse Laplace transforms, although I cannot quote a specific result on this hunch.

The other “paradox” is that, if h(X) is the MLE of the model parameter θ for the observable X, the distribution of h(X) has a density different from the density of X and, hence, its maximisation in the parameter θ may differ. An example (my favourite!) is the MLE of ||a||² based on x N(a,I) which is ||x||², a poor estimate, and which (strongly) differs from the MLE of ||a||² based on ||x||², which is close to (1-p/||x||²)²||x||² and (nearly) admissible [as discussed in the Bayesian Choice].

## unbiased HMC

Posted in Books, pictures, Statistics with tags Boston, coupling, Hamiltonian Monte Carlo, Harvard, HMC, Massachusetts, sunrise, unbiasedness on September 25, 2017 by xi'an**J**eremy Heng and Pierre Jacob arXived last week a paper on unbiased Hamiltonian Monte Carlo by coupling, following the earlier paper of Pierre and co-authors on debiasing by coupling a few weeks ago. The coupling within the HMC amounts to running two HMC chains with common random numbers, plus subtleties!

“As with any other MCMC method, HMC estimators are justified in the limit of the number of iterations. Algorithms which rely on such asymptotics face the risk of becoming obsolete if computational power keeps increasing through the number of available processors and not through clock speed.”

The main difficulty here is to have both chains meet (exactly) with large probability, since coupled HMC can only bring these chain close to one another. The trick stands in using both coupled HMC *and* coupled Hastings-Metropolis kernels, since the coupled MH kernel allows for exact meetings when the chains are already close, after which they remain happily and forever together! The algorithm is implemented by choosing between the kernels at random at each iteration. (Unbiasedness follows by the Glynn-Rhee trick, which is eminently well-suited for coupling!) As pointed out from the start of the paper, the appeal of this unbiased version is that the algorithm can be (embarrassingly) parallelised since all processors in use return estimators that are iid copies of one another, hence easily merged into a better estimator.