**A**n ICLR 2019 paper by Neklyudov, Egorov and Vetrov on an optimal choice of the proposal in an independent Metropolis algorithm, which I discovered via an X validated question. Namely whether or not the expected Metropolis-Hastings acceptance ratio is always one (which it is not when the support of the proposal is restricted). The paper mentions the domination of the Accept-Reject algorithm by the associated independent Metropolis-Hastings algorithm, which has actually been stated in our Monte Carlo Statistical Methods (1999, Lemma 6.3.2) and may prove even older. The authors also note that the expected acceptance probability is equal to one minus the total variation distance between the joint defined as target x Metropolis-Hastings proposal distribution and its time-reversed version. Which seems to suffer from the same difficulty as the one mentioned in the X validated question. Namely that it only holds when the support of the Metropolis-Hastings proposal is at least the support of the target (or else when the support of the joint defined as target x Metropolis-Hastings proposal distribution is somewhat symmetric). Replacing total variation with Kullback-Leibler then leads to a manageable optimisation target if the proposal is a parameterised independent distribution. With a GAN version when the proposal is not explicitly available. I find it rather strange that one still seeks independent proposals for running Metropolis-Hastings algorithms as the result will depend on the family of proposals considered and as performances will deteriorate with dimension (the authors mention a 10% acceptance rate, which sounds quite low). [As an aside, ICLR 2020 will take place in Addis Abeba next April.]
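The two facts recalled above can be checked numerically on a toy case. The sketch below is only an illustration, assuming a four-point target and a uniform proposal (both invented here, neither taken from the paper) and a chain at stationarity, that is, a current state drawn from the target and a proposal covering its support. It computes the expected acceptance probability of the independent Metropolis-Hastings sampler, one minus the total variation distance between the joint and its reversed version, and the accept-reject acceptance rate 1/M.

```python
# Toy check on a finite state space; `target` and `proposal` are hypothetical choices.
target = [0.5, 0.3, 0.15, 0.05]
proposal = [0.25, 0.25, 0.25, 0.25]   # full support, so target/proposal is bounded


def expected_acceptance(pi, q):
    """E[min(1, pi(y)q(x) / pi(x)q(y))] when x ~ pi and y ~ q independently."""
    k = len(pi)
    return sum(min(pi[i] * q[j], pi[j] * q[i]) for i in range(k) for j in range(k))


def tv_joint_vs_reversed(pi, q):
    """Total variation distance between pi(x)q(y) and its reversed version pi(y)q(x)."""
    k = len(pi)
    return 0.5 * sum(abs(pi[i] * q[j] - pi[j] * q[i]) for i in range(k) for j in range(k))


acc = expected_acceptance(target, proposal)
tv = tv_joint_vs_reversed(target, proposal)
bound = max(p / w for p, w in zip(target, proposal))   # accept-reject constant M

print(acc, 1.0 - tv, 1.0 / bound)
```

In this setting the first two printed values agree and both are at least as large as the third, which is the accept-reject acceptance rate.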

## Archive for simulation

## an independent sampler that maximizes the acceptance rate of the MH algorithm

Posted in Books, Kids, Statistics, University life with tags accept-reject algorithm, adaptive Monte Carlo algorithm, Addis Abeba, Bayesian GANs, Ethiopia, ICLR 2019, importance sampling, Kullback-Leibler divergence, Monte Carlo Statistical Methods, optimal acceptance rate, optimisation, reversibility, simulation, total variation on September 3, 2019 by xi'an

## off to SimStat2019, Salzburg

Posted in Mountains, Running, Statistics, University life with tags ABC, Alps, approximate Bayesian inference, Austria, Berchtesgaden Alps, climbing, conference, Hohe Tauern, Monte Carlo Statistical Methods, Salzburg, SimStat2019, simulation, simulation and statistics, workshop on September 2, 2019 by xi'an

**T**oday, I am off to Salzburg for the SimStat 2019 workshop, or more formally the 10th International Workshop on Simulation and Statistics, where I give a talk on ABC. The program of the workshop is quite diverse and rich and so I do not think I will have time to take advantage of the Hohe Tauern or the Berchtesgaden Alps to go climbing. Especially since I am also discussing papers in an ABC session.

## simulation fodder for future exams

Posted in Books, Kids, R, Statistics with tags accept-reject algorithm, cross validated, finite mixtures, Monte Carlo Statistical Methods, simulation on February 20, 2019 by xi'an

**H**ere are two nice exercises for a future simulation exam, seen and solved on X validated. The first one is about simulating a Gibbs sampler associated with the joint target

exp{-|x|-|y|-a|y-x|}

defined over IR² for a≥0 (or possibly a>-1). The conditionals are identical and non-standard, but a simple bound on the conditional density is the corresponding standard double exponential density, which makes for a straightforward accept-reject implementation. However it is also feasible to break the full conditional into three parts, depending on the respective positions of x, y, and 0, and to obtain easily invertible cdfs on the three intervals.

The second exercise is about simulating from the cdf

F(x)=1-exp{-ax-bx^(p+1)/(p+1)}

which can be numerically inverted. It is however more fun to call for an accept-reject algorithm by bounding the density with a ½ ½ mixture of an Exponential Exp(a) and of the 1/(p+1)-th power of an Exponential Exp(b/(p+1)). Since no extra constant appears in the solution, I suspect the (p+1) in b/(p+1) was introduced on purpose. As seen in the fit for 10⁶ simulations (and a=1,b=2,p=3), there is no deviation from the target! There is nonetheless an even simpler and rather elegant resolution to the exercise: since the tail function (1-F(x)) appears as the product of two tail functions, exp(-ax) and the other one, the cdf is the distribution of the minimum of two random variates, one with the Exp(a) distribution and the other one being the 1/(p+1)-th power of an Exponential Exp(b/(p+1)) distribution. Which of course returns a very similar histogram fit.
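Both exercises fit in a few lines. The following is a minimal sketch rather than the solutions posted on X validated. It assumes a≥0 for the Gibbs sampler, so that the double exponential bound holds, and takes the second cdf to be F(x)=1-exp{-ax-bx^(p+1)/(p+1)} on x≥0, as implied by the product of tail functions. The function names and the seed are hypothetical.

```python
import math
import random


def gibbs_pair(a, n_iter, rng):
    """Gibbs sampler for the joint density proportional to exp{-|x|-|y|-a|y-x|}, a >= 0."""

    def conditional(other):
        # Conditional is proportional to exp{-|z|-a|z-other|} <= exp{-|z|}:
        # propose from the standard double exponential, accept w.p. exp{-a|z-other|}.
        while True:
            z = rng.expovariate(1.0) * rng.choice((-1.0, 1.0))
            if rng.random() < math.exp(-a * abs(z - other)):
                return z

    x = y = 0.0
    chain = []
    for _ in range(n_iter):
        x = conditional(y)
        y = conditional(x)
        chain.append((x, y))
    return chain


def sample_min(a, b, p, rng):
    """Draw from F as the minimum of an Exp(a) variate and of the
    1/(p+1)-th power of an Exp(b/(p+1)) variate (rates, not means)."""
    first = rng.expovariate(a)
    second = rng.expovariate(b / (p + 1)) ** (1.0 / (p + 1))
    return min(first, second)


def cdf(x, a, b, p):
    return 1.0 - math.exp(-a * x - b * x ** (p + 1) / (p + 1))


rng = random.Random(2019)
chain = gibbs_pair(1.0, 20_000, rng)
draws = [sample_min(1.0, 2.0, 3, rng) for _ in range(100_000)]
```

Comparing the empirical cdf of `draws` with `cdf` at a few points gives a quick check of the minimum representation.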

## a new method to solve the transformation of calculus

Posted in Statistics with tags Book, calculus, mail of the day, mass emailing, noise and no signal, simulation, spam on December 23, 2018 by xi'an

**A** hilariously ridiculous email I just received (warning: book cover unrelated):

Good day! this is very important to the “Mathematics” and the related fields,

“The Simulator”,“Probability theory”,”Statistics”,”Numerical Analysis”,

“Cryptography”,“Data mining”,“The big data analysis”and“Artificial Intelligence”.

The transformation of random variables in Calculus is very difficult and sometimes

is impossible to be done. The simulator can get the accuracy and precise simulated data

and the database could be the probability distributution if the data size is 100,000,000

or more. The probabilistic model can be designed and getting the probability distribution

and the coefficient using the simulator.

(1)“The Simulator” has four methods,

1) the basic method is the inverse function of the distribution function,

2) the transformation method to get the simulated data,

3) the numerical analysis method to build the simulated database,

4) the simulated database and the estimated line of random variable to get the simulated data.

(2) “Probability Theory” can use the simulator to a tool.

(3) ”Statistics”, the sampling distribution of the point estimator and the test statistic

can be seen as the transformation equation and the critical point and p value is from

the sampling distribution.

(4) ”Numerical Analysis”, the simulator data can overcome the limit of numerical analysis,

the number of random variables could be more 10000.

(5) “Cryptography”, the simulator of the probabilistic model will derive the lock code

which cannot be unlocked.

(6) “Data mining”, the data set can be a specific probability distribution using

“goodness of fit” or “Curve-fitting” or “Curvilinear”.

1) “goodness of fit”, there are 45 distributions for the null hypothesis.

2) “Curve-fitting”, the estimated line of random variable and the estimated line

of the distribution function.

3) “Curvilinear”, the data set is not arithmetic series.

(7) “The big data analysis”, the number of random variables could be more 10000

about the simulator of the probabilistic model.

(8) “Artificial Intelligence”, the model after analysis can be the transformation

equation, the simulator of the probabilistic model can get the simulated data.

The first book name is “The simulator” will be public, the context contains

(1) The simulation methods,

(2)“Probability Theory”,

(3) ”Statistics” and how to write the statistical package even the population is not

Normal distribution or a special statistical model.

(4)“Cryptography”,

(5)“Explored the arithmetic data”,

## ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags accept-reject algorithm, Box-Muller algorithm, fixed-point representation, floating-point representation, George Marsaglia, inverse cdf, Luc Devroye, Non-Uniform Random Variate Generation, random number generation, simulation, Sunday morning, ziggurat algorithm on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. [Wikipedia]

**I**n a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.
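For readers unfamiliar with the construction, here is a rough sketch of the classical ziggurat (dominating rectangles of equal area) for a Normal target. It is not the generalised version of the paper, nor any library implementation. It assumes floating-point uniforms, 128 layers, a right-most edge found by bisection rather than taken from published tables, and the usual exponential-based accept-reject step for the tail. All function names are made up for the illustration.

```python
import math
import random


def f(x):
    """Unnormalised standard Normal density."""
    return math.exp(-0.5 * x * x)


def strip_area(r):
    """Common area of the layers: rectangle [0, r] x [0, f(r)] plus the tail beyond r."""
    tail = math.sqrt(math.pi / 2.0) * math.erfc(r / math.sqrt(2.0))
    return r * f(r) + tail


def top_height(r, n):
    """Height reached after stacking n - 1 equal-area rectangles on the base layer."""
    v, x, y = strip_area(r), r, f(r)
    for _ in range(n - 2):
        y += v / x
        if y >= 1.0:              # overshoots the mode: r is too small
            return math.inf
        x = math.sqrt(-2.0 * math.log(y))
    return y + v / x


def build_tables(n=128):
    """Find r by bisection so that the top layer ends at the mode f(0) = 1."""
    lo, hi = 1.0, 10.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if top_height(mid, n) > 1.0:
            lo = mid
        else:
            hi = mid
    r, v = hi, strip_area(hi)
    xs, ys = [v / f(r), r], [0.0, f(r)]   # xs[0] is the pseudo-width of the base layer
    for _ in range(n - 2):
        ys.append(ys[-1] + v / xs[-1])
        xs.append(math.sqrt(-2.0 * math.log(ys[-1])))
    xs.append(0.0)
    ys.append(1.0)
    return xs, ys


def ziggurat_normal(rng, xs, ys):
    n, r = len(xs) - 1, xs[1]
    while True:
        i = rng.randrange(n)              # layers have equal area, hence a uniform pick
        x = rng.random() * xs[i]
        if x < xs[i + 1]:                 # left of the next edge: surely under the curve
            break
        if i == 0:                        # base layer beyond r: simulate from the tail
            while True:
                a = -math.log(1.0 - rng.random()) / r
                b = -math.log(1.0 - rng.random())
                if 2.0 * b > a * a:
                    break
            x = r + a
            break
        y = ys[i] + rng.random() * (ys[i + 1] - ys[i])
        if y < f(x):                      # wedge: explicit accept-reject test
            break
    return x if rng.random() < 0.5 else -x


xs, ys = build_tables(128)
rng = random.Random(42)
sample = [ziggurat_normal(rng, xs, ys) for _ in range(10_000)]
```

The setup cost quoted further down shows in `build_tables`, which has to be rerun whenever the target changes, while most draws only cost one layer pick, one uniform and one comparison.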

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.

## calibrating approximate credible sets

Posted in Books, Statistics with tags ABC, approximate Bayesian inference, calibration, convergence diagnostics, credible intervals, exchangeability, harmonic mean estimator, simulation on October 26, 2018 by xi'an

**E**arlier this week, Jeong Eun Lee, Geoff Nicholls, and Robin Ryder arXived a paper on the calibration of approximate Bayesian credible intervals. *(Warning: all three authors are good friends of mine!)* They start from the core observation that dates back to Monahan and Boos (1992) of exchangeability between θ being generated from the prior and φ being generated from the posterior associated with one observation generated from the prior predictive. (There is no name for this distribution, other than the prior, that is!) A setting amenable to ABC considerations! Actually, Prangle et al. (2014) relies on this property for assessing the ABC error, while pointing out that the test for exchangeability is not fool-proof since it works equally for two generations from the prior.

“The diagnostic tools we have described cannot be “fooled” in quite the same way checks based on the exchangeability can be.”

The paper thus proposes methods for computing the coverage [under the true posterior] of a credible set computed using an approximate posterior. (I had to fire up a few neurons to realise this was the right perspective, rather than the reverse!) A first solution to approximate the exact coverage of the approximate credible set is to use logistic regression, based on some summary statistics [not necessarily in an ABC framework] and on the simulation outcome that the parameter [simulated from the prior] at the source of the simulated data is within the credible set. Another approach is to use importance sampling when simulating from the pseudo-posterior. However this sounds dangerously close to resorting to a harmonic mean estimate, since the importance weight is the inverse of the approximate likelihood function. Not that anything unseemly transpires from the simulations.
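The simulation outcome at the core of this calibration is easy to reproduce on a toy model. The sketch below is not the method of the paper: it assumes a conjugate Normal model (θ from a N(0,1) prior, one observation y from N(θ,1), hence an exact N(y/2,1/2) posterior) and a hypothetical approximate posterior with the right centre but too small a standard deviation. It only averages the coverage indicators, whereas the paper regresses them on summary statistics to obtain a data-dependent coverage.

```python
import math
import random


def coverage(post_sd, n_sim, rng, z=1.96):
    """Monte Carlo frequency with which the parameter behind the simulated data
    falls within the nominal 95% interval y/2 +/- z * post_sd."""
    hits = 0
    for _ in range(n_sim):
        theta = rng.gauss(0.0, 1.0)       # parameter simulated from the prior
        y = rng.gauss(theta, 1.0)         # data simulated given that parameter
        hits += abs(theta - 0.5 * y) < z * post_sd
    return hits / n_sim


rng = random.Random(2018)
exact = coverage(math.sqrt(0.5), 100_000, rng)   # exact posterior standard deviation
approx = coverage(0.5, 100_000, rng)             # assumed under-dispersed approximation
print(exact, approx)
```

With the exact posterior the frequency stays close to the nominal level, while the under-dispersed approximation falls visibly short of it.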

## rethinking the ESS

Posted in Statistics with tags arXiv, delta method, effective sample size, efficiency measures, efficient importance sampling, ESS, importance sampling, MCMC, Monte Carlo Statistical Methods, simulation on September 14, 2018 by xi'an

**F**ollowing Victor Elvira’s visit to Dauphine, one and a half years ago, where we discussed the many defects of ESS as a default measure of efficiency for importance sampling estimators, and then some more efforts (mostly from Victor!) to formalise these criticisms, Victor, Luca Martino and I wrote a paper on this notion, now arXived. (Victor most kindly attributes the origin of the paper to a 2010 ‘Og post on the topic!) The starting thread of the (re?)analysis of this tool introduced by Kong (1992) is that the ESS used in the literature is an *approximation* to the “true” ESS, generally unavailable. Approximation that is pretty crude and hence impacts the relevance of using it as *the* assessment tool for comparing importance sampling methods. In the paper, we re-derive (with the utmost precision) the resulting approximation and list the many assumptions that [would] validate this approximation. The resulting drawbacks are many, from the absurd property of always being worse than direct sampling, to being independent from the target function and from the sample *per se*. Since only importance weights matter. This list of issues is not exactly brand new, but we think it is worth signaling given the fact that this approximation has been widely used in the last 25 years, due to its simplicity, as a practical rule of thumb [!] in a wide variety of importance sampling methods. In continuation of the directions drafted in Martino et al. (2017), we also indicate some alternative notions of importance efficiency. Note that this paper does not cover the use of ESS for MCMC algorithms, where it is somewhat more legit, if still too rudimentary to really catch convergence or lack thereof!
*[Note: I refrained from the post title resinking the ESS…]*
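As a reminder of what the usual approximation computes, here is a short sketch, assuming a N(0,1) target and a N(0,2²) proposal chosen purely for illustration. The approximate ESS is the inverse of the sum of the squared normalised importance weights, so it never exceeds the number of simulations and it ignores both the integrand and the simulated values themselves.

```python
import math
import random


def ess(weights):
    """Usual ESS approximation: 1 / sum of squared normalised weights."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)


def normal_pdf(x, sd):
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


rng = random.Random(0)
n = 10_000
sample = [rng.gauss(0.0, 2.0) for _ in range(n)]                     # proposal draws
weights = [normal_pdf(x, 1.0) / normal_pdf(x, 2.0) for x in sample]  # target / proposal

ess_value = ess(weights)        # a function of the weights alone
ess_uniform = ess([1.0] * n)    # equal weights, as in direct sampling, return n
print(ess_value, ess_uniform)
```

Whatever function is being integrated, `ess_value` stays the same, which is one of the drawbacks listed above.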