**A** few days ago I came across a short paper in the Central European Journal of Economic Modelling and Econometrics by Pajor and Osiewalski that proposes a correction to the infamous harmonic mean estimator, essentially the one Darren and I made in 2009, namely to restrict the evaluations of the likelihood function to a subset **A** of the simulations from the posterior. The paper relates to an earlier 2009 paper by Peter Lenk, which investigates the same object with the same proposal, and which we had missed all this time. The difference is that, while we examine an arbitrary HPD region at level 50% or 80% as the subset **A**, Lenk proposes to derive a minimum likelihood value from the MCMC run and to use the associated HPD region, which means using all simulations, hence producing the same object as the original harmonic mean estimator, except that it is corrected by a multiplicative factor P(**A**), or rather an approximation thereof. This correction thus maintains the infinite variance of the original, a point apparently missed in the paper.
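As a quick numerical check of the HPD-restricted version (a toy example of my own, in Python rather than R, not code from either paper), the identity behind the estimator can be verified on a conjugate normal model where the marginal likelihood is known in closed form:

```python
import numpy as np

rng = np.random.default_rng(0)

def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# toy conjugate model: y | theta ~ N(theta, 1), theta ~ N(0, 1)
# hence theta | y ~ N(y/2, 1/2) and the marginal likelihood m(y) = N(y; 0, 2)
y, T = 1.3, 100_000
theta = rng.normal(y / 2, np.sqrt(0.5), T)   # stand-in for an MCMC sample

# 50% HPD region A (a central interval, the posterior being normal),
# read off the empirical quantiles of the simulations
lo, hi = np.quantile(theta, [0.25, 0.75])
vol_A = hi - lo

# identity: E_post[ 1_A(theta) / (|A| * prior * likelihood) ] = 1 / m(y)
inA = (theta >= lo) & (theta <= hi)
inv_m = np.mean(inA / (vol_A * npdf(theta, 0.0, 1.0) * npdf(y, theta, 1.0)))
m_hat = 1.0 / inv_m

m_exact = npdf(y, 0.0, np.sqrt(2.0))
print(m_hat, m_exact)
```

Restricting to **A** keeps the integrand bounded on the HPD region, which is precisely what restores a finite variance, while Lenk's all-simulations version does not.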

## Archive for the University life Category

## another version of the corrected harmonic mean estimator

Posted in Books, pictures, Statistics, University life with tags Gibbs sampler, harmonic mean estimator, HPD region, importance sampling, MCMC algorithm, Monte Carlo Statistical Methods on June 11, 2018 by xi'an

## maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life with tags corrplot, extreme value theory, Gumbel distribution, independence, order statistics, R, riddle, spacings, The Riddler on June 8, 2018 by xi'an

**T**he proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximate, in that the distribution of

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of independence between the absolute differences. Which does not hold, as shown by the above correlation matrix (plotted via corrplot) for N=11 and 10⁴ simulations. One could think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima are essentially independent.)
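The displayed formula from the original post is missing from this archive, so the following is a hedged sketch under one reading of the riddle (the nearest-neighbour absolute differences around each of the N order statistics); it reproduces the correlation matrix computation in Python, whereas the post used R and corrplot:

```python
import numpy as np

rng = np.random.default_rng(1)
N, nsim = 11, 10_000

samples = rng.normal(size=(nsim, N))
samples.sort(axis=1)
gaps = np.diff(samples, axis=1)                 # the N-1 spacings per sample

# nearest-neighbour distance around each order statistic: the smaller of the
# spacing to the left and to the right (the two extremes have one neighbour)
left = np.concatenate([np.full((nsim, 1), np.inf), gaps], axis=1)
right = np.concatenate([gaps, np.full((nsim, 1), np.inf)], axis=1)
nn = np.minimum(left, right)                    # nsim x N matrix

corr = np.corrcoef(nn, rowvar=False)            # N x N correlation matrix
print(np.round(corr, 2))
```

Inspecting `corr` (e.g., via a heatmap) is the Python analogue of the corrplot display referred to above.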

## Metropolis-Hastings importance sampling

Posted in Books, Statistics, University life with tags central limit theorem, curse of dimensionality, importance sampling, MCMC algorithms, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, optimal acceptance rate, Pima Indians, Rao-Blackwellisation, sequential Monte Carlo on June 6, 2018 by xi'an

*[Warning: as I first got the paper from the authors and sent them my comments, this post contains their reply as well.]*

**I**n a sort of crazy coincidence, Daniel Rudolf and Björn Sprungk arXived a paper on a Metropolis-Hastings importance sampling estimator that offers similarities with the one by Ingmar Schuster and Ilja Klebanov posted on arXiv the same day. The major difference in the construction of the importance sampler is that Rudolf and Sprungk use the conditional distribution of the proposal in the denominator of their importance weight, while Schuster and Klebanov go for the marginal (or a Rao-Blackwell representation of the marginal), mostly in an independent Metropolis-Hastings setting (for convergence) and for a discretised Langevin version in the applications. The former use a very functional L² approach to convergence (which reminded me of the early Schervish and Carlin, 1990, paper on the convergence of MCMC algorithms), not all of it necessary in my opinion. As for instance the extension of convergence properties to the augmented chain, namely (current, proposed), is rather straightforward since the proposed chain is a random transform of the current chain. An interesting remark at the end of the proof of the CLT is that the asymptotic variance of the importance sampling estimator is the same as with iid realisations from the target. This is a point we also noticed when constructing population Monte Carlo techniques (more than ten years ago), namely that dependence on the past in sequential Monte Carlo does not impact the validation and the moments of the resulting estimators, simply because “everything cancels” in importance ratios. The mean square error bound on the Monte Carlo error (Theorem 20) is not very surprising as the term ρ(y)²/P(x,y) appears naturally in the variance of importance samplers.
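To make the construction concrete, here is a minimal Python sketch (my own, not the authors' code) of the Rudolf and Sprungk estimator on a trivial one-dimensional target: the proposed values of a random walk Metropolis chain are recycled with weights given by the unnormalised target over the conditional proposal density, which also delivers an estimate of the normalising constant:

```python
import numpy as np

rng = np.random.default_rng(2)

def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

target = lambda x: np.exp(-0.5 * x ** 2)   # unnormalised N(0,1), Z = sqrt(2*pi)

s, n = 2.4, 200_000                        # random walk step-size, chain length
x = 0.0
xs = np.empty(n)
ys = np.empty(n)
for k in range(n):
    ys[k] = x + s * rng.normal()           # proposed value, kept in any case
    # standard Metropolis accept/reject step
    if rng.random() < min(1.0, target(ys[k]) / target(x)):
        x = ys[k]
    xs[k] = x

# importance weight: unnormalised target over the *conditional* proposal
# density, evaluated at the state the chain was in when y was proposed
prev = np.concatenate([[0.0], xs[:-1]])
w = target(ys) / npdf(ys, prev, s)

Z_hat = w.mean()                           # estimate of the normalising constant
m2_hat = np.sum(w * ys ** 2) / w.sum()     # self-normalised estimate of E[X^2]
print(Z_hat, m2_hat)
```

The unbiasedness of `Z_hat` follows from integrating the weight against the joint distribution of (current, proposed), since the conditional proposal density cancels.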

The first illustration shows the importance sampler doing worse than the initial MCMC estimator for a wide range of acceptance probabilities (Figures 2 and 3, which is which?), and I do not understand the opposite conclusion drawn by the authors.

*[Here is an answer from Daniel and Björn about this point:]*

Indeed the formulation in our paper is unfortunate. The point we want to stress is that we observed in the numerical experiments certain ranges of step-sizes for which MH importance sampling shows a better performance than the classical MH algorithm with optimal scaling. Meaning that the MH importance sampling with optimal step-size can outperform MH sampling, without using additional computational resources. Surprisingly, the optimal step-size for the MH importance sampling estimator seems to remain constant for an increasing dimension in contrast to the well-known optimal scaling of the MH algorithm (given by a constant optimal acceptance rate).

The second uses the Pima Indian diabetes benchmark, amusingly (?) referring to Chopin and Ridgway (2017) who warn against the recourse to this dataset and to this model! The loss in mean square error due to the importance sampling may again be massive (Figure 5), and settling for an optimisation of the scaling factor in Metropolis-Hastings algorithms sounds unrealistic.

*[And another answer from Daniel and Björn about this point:]*

Indeed, Chopin and Ridgway suggest more complex problems with a larger number of covariates as benchmarks. However, the well-studied PIMA data set is a sufficient example in order to illustrate the possible benefits but also the limitations of the MH importance sampling approach. The latter are clearly (a) the required knowledge about the optimal step-size—otherwise the performance can indeed be dramatically worse than for the MH algorithm—and (b) the restriction to a small or at most moderate number of covariates. As you are indicating, optimizing the scaling factor is a challenging task. However, the hope is to derive some simple rule of thumb for the MH importance sampler similar to the well-known acceptance rate tuning for the standard MCMC estimator.

## ISBA code of conduct [open to discussion]

Posted in pictures, University life with tags ABC in Edinburgh, academia, code of conduct, Edinburgh, ISBA, safeisba@bayesian.org, society on June 1, 2018 by xi'an

**A**fter many electronic meetings and as numerous consultations with lawyers and other legal experts, the safeISBA committee has come up with a code of conduct that covers all ISBA activities and applies to all members. Just in time for the ISBA 2018 meeting next month in Edinburgh. And all satellite and subsequent ISBA sponsored ones. The document is available on-line for everyone to peruse and discuss. There will be round-table sessions to this purpose in Edinburgh. And hopefully no opportunity for having to implement the proposed disciplinary options… (Comments, complaints, testimonies, concerns, and suggestions can also be sent to our dedicated account safeisba at the standard bayesian.org email address.)

## Markov chain importance sampling

Posted in Books, pictures, Running, Statistics, Travel, University life with tags Berlin, Euler discretisation, Freie Universität Berlin, importance sampling, Ingmar Schuster, Langevin MCMC algorithm, marginal, MCMC algorithms, Metropolis-Hastings algorithm, Rao-Blackwellisation, Université Paris Dauphine, variance reduction on May 31, 2018 by xi'an

**I**ngmar Schuster (formerly a postdoc at Dauphine and now at Freie Universität Berlin) and Ilja Klebanov (from Berlin) have recently arXived a paper on recycling proposed values in [a rather large class of] Metropolis-Hastings and unadjusted Langevin algorithms. This means using the proposed variates of one of these algorithms as in an importance sampler, with an importance weight going from the target over the (fully conditional) proposal to the target over the marginal distribution of the proposed values. In the Metropolis-Hastings case, since the latter is not available in most setups, the authors suggest using a Rao-Blackwellised nonparametric estimate based on the entire MCMC chain. Or a subset.
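To contrast with the conditional-weight version discussed in the previous post, here is a hedged Python sketch (my own toy example, not the authors' code) of the marginal-weight variant: the marginal proposal density at each proposed value is replaced by its Rao-Blackwellised estimate, averaging the conditional proposal density over the visited chain states:

```python
import numpy as np

rng = np.random.default_rng(3)

def npdf(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

target = lambda x: np.exp(-0.5 * x ** 2)   # unnormalised N(0,1) target

s, n = 2.4, 2_000
x = 0.0
xs = np.empty(n)
ys = np.empty(n)
for k in range(n):
    ys[k] = x + s * rng.normal()           # proposed value, always stored
    if rng.random() < min(1.0, target(ys[k]) / target(x)):
        x = ys[k]
    xs[k] = x

# Rao-Blackwellised estimate of the marginal proposal density at each y:
# average of the conditional proposal density over all visited chain states
qmarg = npdf(ys[:, None], xs[None, :], s).mean(axis=1)
w = target(ys) / qmarg

m2_hat = np.sum(w * ys ** 2) / w.sum()     # self-normalised estimate of E[X^2]
print(m2_hat)
```

The n×n density evaluation makes this O(n²), which is why restricting the Rao-Blackwell average to a subset of the chain, as the authors mention, matters in practice.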

“Our estimator refutes the folk theorem that it is hard to estimate [the normalising constant] with mainstream Monte Carlo methods such as Metropolis-Hastings.”

The paper thus brings an interesting focus on the proposed values, rather than on the original Markov chain, which naturally brings back to mind the derivation of the joint distribution of these proposed values we made in our (1996) Rao-Blackwellisation paper with George Casella. Where we considered a parametric and non-asymptotic version of this distribution, which brings a guaranteed improvement to MCMC (Metropolis-Hastings) estimates of integrals. In subsequent papers with George, we tried to quantify this improvement and to compare different importance samplers based on some importance sampling corrections, but as far as I remember, we only got partial results along this way, and did not cover the special case of the normalising constant… Normalising constants did not seem such a pressing issue at that time, I figure. (A *Monte Carlo 101* question: how can we be certain the importance sampler offers a finite variance?)

Ingmar’s views about this:

I think this is interesting future work. My intuition is that for Metropolis-Hastings importance sampling with random walk proposals, the variance is guaranteed to be finite because the importance distribution ρ_θ is a convolution of your target ρ with the random walk kernel q. This guarantees that the tails of ρ_θ are no lighter than those of ρ. What other forms of q mean for the tails of ρ_θ I have less intuition about.

When considering the Langevin alternative with transition (4), I was first confused and thought it was incorrect for moving from one value of Y (proposal) to the next. But that’s what unadjusted means in “unadjusted Langevin”! As pointed out in the early Langevin literature, e.g., by Gareth Roberts and Richard Tweedie, using a discretised Langevin diffusion in an MCMC framework means there is a risk of non-stationarity & non-ergodicity. Obviously, the corrected (MALA) version is more delicate to approximate (?) but at the very least it ensures the Markov chain does not diverge. Even when the unadjusted Langevin has a stationary regime, its joint distribution is likely quite far from the joint distribution of a proper discretisation. Now this also made me think about a parameterised version in the 1996 paper spirit, but there is nothing specific about MALA that would prevent the implementation of the general principle. As for the unadjusted version, the joint distribution is directly available. (But not necessarily the marginals.)

Here is an answer from Ingmar about that point:

Personally, I think the most interesting part is the practical performance gain in terms of estimation accuracy for fixed CPU time, combined with the convergence guarantee from the CLT. ULA was particularly important to us because of the papers of Arnak Dalalyan, Alain Durmus & Eric Moulines and recently from Mike Jordan’s group, which all look at an unadjusted Langevin diffusion (and unimodal target distributions). But MALA admits a Metropolis-Hastings importance sampling estimator, just as Random Walk Metropolis does – we didn’t include MALA in the experiments to not get people confused with MALA and ULA. But there is no delicacy involved whatsoever in approximating the marginal MALA proposal distribution. The beauty of our approach is that it works for almost all Metropolis-Hastings algorithms where you can evaluate the proposal density q, there is no constraint to use random walks at all (we will emphasize this more in the paper).

## on the terrible situation of Venezuela universities

Posted in pictures, Travel, University life with tags Caracas, South America, UCV, Universidad Central de Venezuela, Venezuela on May 30, 2018 by xi'an

**S**ome colleagues of mine have started a funding project to provide some degree of support to our desperate Venezuelan colleagues in maths and stats, who are working under unbelievable duress! For instance, at Universidad Central de Venezuela, in Caracas, which I visited in 2007 and where I gave a short course on MCMC to a large group of enthusiastic graduate students, the building of the department of mathematics has been closed for sanitary reasons and now houses vultures that nest in former offices like the one above… Other departments at UCV have already closed. This comes as no surprise given the general state of the country, where basic needs are increasingly going unmet, but the disappearance of what was a great education system in South America is a tragedy on its own!