**F**ollowing Victor Elvira‘s visit to Dauphine, one and a half year ago, where we discussed the many defects of ESS as a default measure of efficiency for importance sampling estimators, and then some more efforts (mostly from Victor!) to formalise these criticisms, Victor, Luca Martino and I wrote a paper on this notion, now arXived. (Victor most kindly attributes the origin of the paper to a 2010 ‘Og post on the topic!) The starting thread of the (re?)analysis of this tool introduced by Kong (1992) is that the ESS used in the literature is an *approximation* to the “true” ESS, generally unavailable. Approximation that is pretty crude and hence impacts the relevance of using it as *the* assessment tool for comparing importance sampling methods. In the paper, we re-derive (with the uttermost precision) the resulting approximation and list the many assumptions that [would] validate this approximation. The resulting drawbacks are many, from the absurd property of always being worse than direct sampling, to being independent from the target function and from the sample *per se*. Since only importance weights matter. This list of issues is not exactly brand new, but we think it is worth signaling given the fact that this approximation has been widely used in the last 25 years, due to its simplicity, as a practical rule of thumb [!] in a wide variety of importance sampling methods. In continuation of the directions drafted in Martino et al. (2017), we also indicate some alternative notions of importance efficiency. Note that this paper does not cover the use of ESS for MCMC algorithms, where it is somewhat more legit, if still too rudimentary to really catch convergence or lack thereof! *[Note: I refrained from the post title resinking the ESS…]*

## Archive for simulation

## rethinking the ESS

Posted in Statistics with tags arXiv, delta method, effective sample size, efficiency measures, efficient importance sampling, ESS, importance sampling, MCMC, Monte Carlo Statistical Methods, simulation on September 14, 2018 by xi'an## hitting a wall

Posted in Books, Kids, R, Statistics, University life with tags basic statistics, cross validated, kernel density estimator, non-parametric kernel estimation, simulation, Statistical Forum on July 5, 2018 by xi'an**O**nce in a while, or a wee bit more frequently (!), it proves impossible to communicate with a contributor of a question on X validated. A recent instance was about simulating from a multivariate kernel density estimate where the kernel terms at x¹,x²,… are Gaussian kernels applied to the inverses of the norms |x-x¹|, |x-x²|,… rather than to the norms as in the usual formulation. The reason for using this type of kernel is unclear, as it certainly does not converge to an estimate of the density of the sample x¹,x²,… as the sample size grows, since it excludes a neighbourhood of each point in the sample. Since the kernel term tends to a non-zero constant at infinity, the support of the density estimate is restricted to the hypercube [0,1]x…x[0,1], again with unclear motivations. No mention being made of the bandwidth adopted for this kernel. If one takes this exotic density as a given, the question is rather straightforward as the support is compact, the density bounded and a vanilla accept-reject can be implemented. As illustrated by the massive number of comments on that entry, it did not work as the contributor adopted a fairly bellicose attitude about suggestions from moderators on that site and could not see the point in our requests for clarification, despite plotting a version of the kernel that had its maximum [and not its minimum] at x¹… After a few attempts, including writing a complete answer, from which the above graph is taken (based on an initial understanding of the support being for (x-x¹), …), I gave up and deleted all my entries.On that question.

## best unbiased estimator of θ² for a Poisson model

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags Bahamas, best unbiased estimator, complete statistics, counterexample, cross validated, George Forsythe, John von Neumann, Monte Carlo Statistical Methods, Poisson distribution, Rao-Blackwell theorem, sailing, shark, simulation on May 23, 2018 by xi'an**A** mostly traditional question on X validated about the “best” [minimum variance] unbiased estimator of θ² from a Poisson P(θ) sample leads to the Rao-Blackwell solution

and a similar estimator could be constructed for θ³, θ⁴, … With the interesting limitation that this procedure stops at the power equal to the number of observations (minus one?). But, since the expectation of a power of the sufficient statistics S [with distribution P(nθ)] is a polynomial in θ, there is *de facto* no limitation. More interestingly, there is no unbiased estimator of negative powers of θ in this context, while this neat comparison on Wikipedia (borrowed from the great book of counter-examples by Romano and Siegel, 1986, selling for a mere $180 on amazon!) shows why looking for an unbiased estimator of exp(-2θ) is particularly foolish: the only solution is (-1) to the power S [for a single observation]. (There is however a first way to circumvent the difficulty if having access to an arbitrary number of generations from the Poisson, since the Forsythe – von Neuman algorithm allows for an unbiased estimation of exp(-F(x)). And, as a second way, as remarked by Juho Kokkala below, a sample of at least two Poisson observations leads to a more coherent best unbiased estimator.)

## independent random sampling methods [book review]

Posted in Books, Statistics, University life with tags book review, inverse cdf, MCMC, Monte Carlo methods, Monte Carlo Statistical Methods, multiple try Metropolis, Non-Uniform Random Variate Generation, PRNG, random number generation, ratio of uniform algorithm, simulation, Springer-Verlag, Universidad Carlos III de Madrid, vertical density representation on May 16, 2018 by xi'an**L**ast week, I had the pleasant surprise to receive a copy of this book in the mail. Book that I was not aware had been written or published (meaning that I was not involved in its review!). The three authors, Luca Martino, David Luengo, and Joaquín Míguez, of Independent Random Sampling Methods are from Madrid universities and I have read (and posted on) several of their papers on (population) Monte Carlo simulation in the recent years. Including Luca’s survey of multiple try MCMC which was helpful in writing our WIREs own survey.

The book is a pedagogical coverage of most algorithms used to simulate independent samples from a given distribution, which of course recoups some of the techniques exposed with more details by [another] Luc, namely Luc Devroye’s Non-uniform random variate generation bible, often mentioned here (and studied in uttermost details by a dedicated reading group in Warwick). It includes a whole chapter on accept-reject methods, with in particular a section on Payne-Dagpunar’s band rejection I had not seen previously. And another entire chapter on ratio-of-uniforms techniques. On which the three authors had proposed generalisations [covered by the book], years before I attempted to go the same way, having completely forgotten reading their paper at the time… Or the much earlier 1991 paper by Jon Wakefield, Alan Gelfand and Adrian Smith!

The book also covers the “vertical density representation”, due to Troutt (1991), which consists in considering the distribution of the density p(.) of the random variable X as a random variable, p(X). I remember pondering about this alternative to the cdf transform and giving up on it as the outcome has a distribution depending on p, even when the density is monotonous. Even though I am not certain from reading the section that this is particularly appealing…

Given its title, the book contains very little about MCMC. Except for a last and final chapter that covers adaptive independent Metropolis-Hastings algorithms, in connection with some of the authors’ recent work. Like multiple try Metropolis. Relating to the (unidimensional) ARMS “ancestor” of adaptive MCMC methods. (As noted in a recent blog on Holden et al., 2009 , I have trouble understanding how recycling only rejected proposed values to build a better proposal distribution is enough to guarantee convergence of an adaptive algorithm, but the book does not delve much into this convergence.)

All in all and with the bias induced by me working in the very area, I find the book quite a nice entry on the topic, which can be used in a Monte Carlo course at both undergraduate and graduate levels if one want to avoid going into Markov chains. It is certainly less likely to scare students away than the comprehensive Non-uniform random variate generation and on the opposite may induce some of them to pursue a research career in this domain.

## fiducial simulation

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags book review, conditional density, English train, fiducial statistics, Jeffreys prior, Monte Carlo Statistical Methods, Oxford, pseudo-random generator, simulation, Student's t distribution, sufficient statistics, University of Warwick on April 19, 2018 by xi'an**W**hile reading Confidence, Likelihood, Probability), by Tore Schweder and Nils Hjort, in the train from Oxford to Warwick, I came upon this unexpected property shown by Lindqvist and Taraldsen (Biometrika, 2005) that to simulate a sample **y** conditional on the realisation of a sufficient statistic, T(**y**)=t⁰, it is sufficient (!!!) to simulate the components of **y** as y=G(u,θ), with u a random variable with fixed distribution, e.g., a U(0,1), and to solve in θ the fixed point equation T(**y**)=t⁰. Assuming there exists a single solution. Brilliant (like an aurora borealis)! To borrow a simple example from the authors, take an exponential sample to be simulated given the sum statistics. As it is well-known, the conditional distribution is then a (rescaled) Beta and the proposed algorithm ends up being a standard Beta generator. For the method to work in general, T(**y**) must factorise through a function of the u’s, a so-called pivotal condition which brings us back to my post title. If this condition does not hold, the authors once again brilliantly introduce a pseudo-prior distribution on the parameter θ to make it independent from the u’s conditional on T(**y**)=t⁰. And discuss the choice of the Jeffreys prior as optimal in this setting even when this prior is improper. While the setting is necessarily one of exponential families and of sufficient conditioning statistics, I find it amazing that this property is not more well-known [at least by me!]. And wonder if there is an equivalent outside exponential families, for instance for simulating a ** t** sample conditional on the average of this sample.

## accelerating MCMC

Posted in Books, Statistics, University life with tags acceleration of MCMC algorithms, algorithms, arXiv, cross validated, MCMC, Monte Carlo Statistical Methods, referee, simulation, Telecom Lille, typology, Université Paris Dauphine, University of Warwick, WIREs on April 11, 2018 by xi'an**A**s forecasted a rather long while ago (!), I wrote a short and incomplete survey on some approaches to accelerating MCMC. With the massive help of Victor Elvira (Lille), Nick Tawn (Warwick) and Changye Wu (Dauphine). Survey which current version just got arXived and which has now been accepted by WIREs Computational Statistics. The typology (and even the range of methods) adopted here is certainly mostly arbitrary, with suggestions for different divisions made by a very involved and helpful reviewer. While we achieved a quick conclusion to the review process, suggestions and comments are most welcome! Even if we cannot include every possible suggestion, just like those already made on X validated. (WIREs stands for Wiley Interdisciplinary Reviews and its dozen topics cover several fields, from computational stats to biology, to medicine, to engineering.)