## non-negative unbiased estimators

Posted in Books, Kids, Statistics, University life on October 3, 2013 by xi'an

Pierre Jacob and Alexandre Thiéry just arXived a highly pertinent paper on the most debated issue of non-negative unbiased estimators (of positive quantities). If you remember that earlier post of mine, I mentioned the issue in connection with the Russian roulette estimator(s) of Mark Girolami et al. And, as Pierre and Alexandre point out in the paper, there is also a clear and direct connection with the Bernoulli factory problem. And with our Vanilla Rao-Blackwellisation technique (sadly overlooked, once more!).

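The first thing I learned from the paper is how to turn a converging sequence into an unbiased estimator. If $(E_n)$ is this converging sequence, with limit μ, and if N is a random integer drawn independently of the sequence, then (with the convention $E_{-1}=0$)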

$\sum_{n=0}^N (E_n-E_{n-1}) / \mathbb{P}(N\ge n)$

is unbiased..! Amazing. Even though the choice of the distribution of N matters for getting a finite variance estimator, this transform is simply amazing. (Of course, once one looks at it, one realises it is the “old” trick of turning a series into a sequence and vice-versa. Still…!) And then you can reuse it to get an unbiased estimator for almost any transform of μ.
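To convince myself, here is a minimal sketch of the trick in Python, under assumptions of my own choosing rather than the paper's: a deterministic toy sequence $E_n = 1 - 2^{-(n+1)}$ with limit μ = 1, and a geometric N with $\mathbb{P}(N\ge n) = r^n$. The names `sample_truncation`, `debiased_estimate` and `partial_sum` are hypothetical helpers, not anything from the paper.

```python
import random


def sample_truncation(r, rng):
    """Draw N on {0, 1, 2, ...} such that P(N >= n) = r**n."""
    n = 0
    while rng.random() < r:
        n += 1
    return n


def debiased_estimate(seq, r, rng):
    """Return sum_{n=0}^{N} (E_n - E_{n-1}) / P(N >= n), with E_{-1} = 0."""
    n_max = sample_truncation(r, rng)
    total = 0.0
    previous = 0.0  # convention E_{-1} = 0
    for n in range(n_max + 1):
        current = seq(n)
        total += (current - previous) / r ** n
        previous = current
    return total


def partial_sum(n):
    """Toy converging sequence (an assumption): E_n = 1 - 2^{-(n+1)}, limit 1."""
    return 1.0 - 0.5 ** (n + 1)


rng = random.Random(2013)
draws = [debiased_estimate(partial_sum, 0.6, rng) for _ in range(100_000)]
mean = sum(draws) / len(draws)
print(f"average of the randomly truncated sums: {mean:.3f} (limit is 1)")
```

Each draw only evaluates a finite (random) number of terms, yet the average settles around the limit. With r = 0.6 the weighted increments shrink geometrically, so the variance is finite here; a lighter-tailed N could ruin this, which is the sense in which the distribution of N matters.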

The second novel thing in the paper is the characterisation of impossible cases for non-negative unbiased estimators. For instance, if the original sequence has an unbounded support, there cannot be such an estimator. If the support is a half-line, the transform must be monotonic. If the support is a bounded interval (a,b), then the transform must be bounded from below by a polynomial bound

$\epsilon\,\min\{(x-a)^m,(b-x)^n\}$

(where the extra parameters obviously relate to the transform). (In this latter case, the authors also show how to derive a Bernoulli estimator from the original unbiased estimator.)

## IS² for Bayesian inference

Posted in Statistics, University life on September 26, 2013 by xi'an

“…the method of Approximate Bayesian Computation (ABC) may be used to estimate unbiasedly an approximation to the likelihood.”

Minh-Ngoc Tran, Marcel Scharth, Michael Pitt and Robert Kohn arXived a paper on using an unbiased estimate of the likelihood in lieu of the genuine thing and still getting convergence to the right thing. While the paper is in the same spirit as the fundamental paper of Andrieu and Roberts (2009, AoS, somewhat surprisingly missing from the references), comparing the asymptotic efficiency of using an estimate versus using the genuine likelihood, my attention was distracted by the above quote. This is the only sentence (besides the abstract) where ABC is mentioned and I was a bit confused: ABC is used to estimate an approximation to the likelihood, for sure, converging to

$\int_{d(x,x^\text{obs})\le\varepsilon} f(x|\theta)\,\text{d}x$

as the number of pseudo-datasets grows to infinity and it is unbiased in this sense, but this is not the reason for using ABC, as the ABC pseudo-likelihood above is the (by)product of the methodology rather than the genuine quantity of interest. Reading the sentence too fast gave me the feeling that ABC did produce an unbiased approximation to the genuine likelihood! Distracted I was, since this is not at all the point of the paper! However, I would be curious to see how it applies to ABC.
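For the record, a small sketch of what "unbiased for the ABC pseudo-likelihood" means, on a toy model that is entirely my assumption (a single observation x ~ N(θ, 1), with the absolute difference as distance), where the tolerance-ball probability is available in closed form. The function names are hypothetical.

```python
import math
import random


def normal_cdf(t):
    """Standard normal cdf, written with math.erf."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def abc_pseudo_likelihood(theta, x_obs, eps, n_sim, rng):
    """Fraction of pseudo-data x ~ N(theta, 1) falling within eps of x_obs."""
    hits = 0
    for _ in range(n_sim):
        x = rng.gauss(theta, 1.0)
        if abs(x - x_obs) <= eps:
            hits += 1
    return hits / n_sim


rng = random.Random(26)
theta, x_obs, eps = 0.0, 0.5, 0.3
estimate = abc_pseudo_likelihood(theta, x_obs, eps, 100_000, rng)
# Closed-form integral of the N(theta, 1) density over the tolerance ball
exact = normal_cdf(x_obs + eps - theta) - normal_cdf(x_obs - eps - theta)
print(f"Monte Carlo: {estimate:.4f}, exact ball probability: {exact:.4f}")
```

The acceptance frequency is an unbiased estimate of the ball probability for any number of pseudo-datasets, but that ball probability is not the likelihood f(x_obs|θ) itself, which is exactly the distinction made above.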

The core result is the convergence of an importance sampling estimator using a likelihood estimated by importance sampling (hence the IS², also inspired by SMC²). The trick in the proof is to turn the computation of the likelihood estimand into the production of an (unobserved or “implicitly generated”) auxiliary variable and then to rewrite the original estimator as a genuine importance estimator. (This seems to imply the derivation of an independent importance sampling estimator of the likelihood at each iteration, right?) Standard convergence results then follow, except that the asymptotic variance has an extra term. And except that the estimator of the likelihood does not have to converge, i.e. can keep a fixed number of terms and a positive variance. The second part of the paper establishes that using an estimate degrades the asymptotic variance.
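As a rough illustration of the nested structure (not the authors' implementation), here is a sketch on a toy latent variable model of my own choosing: θ ~ N(0, 1) a priori, z ~ N(0, 1) latent, and y | z, θ ~ N(θ + z, 1), so that the exact posterior mean is y/3. The outer proposal N(0, 2²), the inner sampler (simulating z from its prior) and all function names are assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_estimate(theta, y, n_inner, rng):
    """Unbiased estimate of p(y | theta), averaging over a FIXED number of z draws."""
    total = 0.0
    for _ in range(n_inner):
        z = rng.gauss(0.0, 1.0)  # inner proposal = prior of z (assumption)
        total += normal_pdf(y, theta + z, 1.0)
    return total / n_inner


def is2_posterior_mean(y, n_outer, n_inner, rng):
    """Self-normalised importance sampling with an estimated likelihood in the weights."""
    num = 0.0
    den = 0.0
    for _ in range(n_outer):
        theta = rng.gauss(0.0, 2.0)  # outer proposal N(0, 2^2) (assumption)
        prior = normal_pdf(theta, 0.0, 1.0)
        proposal = normal_pdf(theta, 0.0, 2.0)
        # A fresh, independent likelihood estimate for every theta
        weight = prior * likelihood_estimate(theta, y, n_inner, rng) / proposal
        num += weight * theta
        den += weight
    return num / den


rng = random.Random(2013)
y = 1.5
post_mean = is2_posterior_mean(y, n_outer=50_000, n_inner=5, rng=rng)
print(f"IS^2 posterior mean: {post_mean:.3f}, exact: {y / 3:.3f}")
```

Note that the inner sample size stays at five throughout: the likelihood estimate never converges, and the outer estimator still does, at the price of noisier weights.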

## Alésia sunset

Posted in pictures, Running, Statistics, University life, Wines on July 12, 2013 by xi'an

Mark Girolami came on Monday for a short visit at CREST this week, to discuss further the Russian roulette with Nicolas and me (and evacuate some of my “worries”), exploit the potential links with vanilla Rao-Blackwellisation, and look at other directions of common interest. In the conversation, we spent a while pondering the “sign problem”, namely the difficulty with signed unbiased estimates of positive normalising constants, quickly bumping into the impossibility of simulating from a negative density. Not that we had high expectations of solving in a single afternoon an NP-hard problem, and one of the major unsolved problems in the physics of many-particle systems… Although Mark had made the “mistake” of picking a Monday for his visit, reducing considerably the potential for wine bars and great restaurants in the area, we undertook to play Russian roulette with sea-shells, at a brasserie in the shadow of Alésia church, without any of us being hit by a bacterial bullet. (Mark then played the Parisian roulette by biking back to the north of Paris and his hotel, again managing to foil the automotive bullet!)

## n-1,n,n+1, who [should] care?!

Posted in Statistics, University life on February 5, 2013 by xi'an

Terry Speed wrote a column in the latest IMS Bulletin (the one I received a week ago) about the choice of the denominator in the variance estimator. That is, should s² involve n (number of observations), n-1 (degrees of freedom), n+1 or anything else in its denominator? I find the question more interesting than the answer (sorry, Terry!) as it demonstrates quite forcibly that there is not a single possible choice for this estimator of the variance but that instead the “optimal” estimator is determined by the choice of the optimality criterion: this makes for a wonderful (if rather formal) playground for a class on decision theoretic statistics. And I often use it on my students. Non-Bayesian mathematical statistics courses often give the impression that there is a natural (single) estimator, when this estimator is based on an implicit choice of an optimality criterion. (This issue is illustrated in the books of Chang and of Vasishth and Broe I discussed earlier. As well as by the Stein effect, of course.) I thus deem it worthwhile to impress upon all users of statistics that there is no such single optimal choice, that unbiasedness is not a compulsory property—just as well since most parameters cannot be estimated in an unbiased manner!—, and that there is room for a subjective choice of a “best” estimator, as paradoxical as it may sound to non-statisticians.
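A quick simulation makes the point concrete, assuming Normal observations (an assumption the ranking depends on) and picking n = 10 and σ = 1 arbitrarily: dividing the sum of squares by n-1 gives the unbiased estimator, while dividing by n+1 gives the smallest mean squared error. The helper `compare_denominators` is a hypothetical name.

```python
import random


def compare_denominators(n, sigma, n_rep, rng):
    """Monte Carlo bias and MSE of sum((x - xbar)^2) / d for d in {n-1, n, n+1}."""
    denominators = (n - 1, n, n + 1)
    bias = {d: 0.0 for d in denominators}
    mse = {d: 0.0 for d in denominators}
    for _ in range(n_rep):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        xbar = sum(xs) / n
        ss = sum((x - xbar) ** 2 for x in xs)
        for d in denominators:
            err = ss / d - sigma ** 2
            bias[d] += err / n_rep
            mse[d] += err * err / n_rep
    return bias, mse


rng = random.Random(5)
n = 10
bias, mse = compare_denominators(n, sigma=1.0, n_rep=40_000, rng=rng)
for d in (n - 1, n, n + 1):
    print(f"denominator {d}: bias {bias[d]:+.3f}, MSE {mse[d]:.3f}")
```

Under squared error loss the ordering is n+1, then n, then n-1, while under the unbiasedness constraint only n-1 survives: same data, same sum of squares, different criterion, different "best" estimator.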