## how can a posterior be uniform?

Posted in Books, Statistics with tags , , , , , , on September 1, 2020 by xi'an A bemusing question from X validated:

How can we have a posterior distribution that is a uniform distribution?

With the underlying message that a uniform distribution does not depend on the data, since it is uniform! While it is always possible to pick the parameterisation a posteriori so that the posterior is uniform, by simply using the inverse cdf transform, or to pick the prior a posteriori so that the prior cancels the likelihood function, there exist more authentic discrete examples of a data realisation leading to a uniform distribution, as eg in the Multinomial model. I deem the confusion to stem from the impression either that uniform means non-informative (what we could dub Laplace’s daemon!) or that it could remain uniform for all realisations of the sampled rv.

## ziggurat algorithm

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , on October 30, 2018 by xi'an

A ziggurat (Akkadian: ziqqurat, D-stem of zaqāru “to build on a raised area”) is a type of massive stone structure built in ancient Mesopotamia. It has the form of a terraced compound of successively receding stories or levels. Wikipedia In a recent arXival, Jalalvand and Charsooghi revisit the ziggurat algorithm that simulates from a univariate distribution by finding horizontal strips that pile up on top of the target as in a ziggurat or a pyramid, hence the name. Which George Marsaglia introduced in 1963. When finely tuned the method is quite efficient. Maybe because it designs an accept-reject move for each strip of the ziggurat rather than globally. For instance, versions constructed for a Normal target are more efficient [3½ times faster] than the Box-Muller algorithm. The generalisation found in the paper divides the target into strips of equal area, rather than dominating rectangular strips of equal area, which requires some work when the target density is non-standard. For targets with unbounded support or unbounded values, a function g transforming the tail into (0,1) has to be selected. A further constraint is that the inverse cdf of the transformed g(X) has to be known. And a large part of the paper examines several scenarii towards simulating from the tail region. For unbounded densities, a similarly minute analysis is undertaken, again with requests about the target like its algebraic order.

“…the result of division of a random integer by its range is a fixed-point number which unlike a floating-point number does not enjoy increased precision near 0. When such random numbers are used in the tail algorithm they cause premature termination of the tail and large gaps between produced random numbers near the termination point.”

The paper further discusses the correction of an error common to earlier ziggurat algorithms, due to the  conversion from fixed-point to floating-point numbers, as indicated in the above quote. Although this had already been addressed by George Marsaglia in the early 1990’s.

“Ziggurat algorithm has a high setup time, so it’s not suitable for applications that require variates with frequently changing shape parameters.”

When testing the algorithm against different methods (in STL and Boost), and different distributions, the gains are between two and seven times faster, except for the Exponential target where the original ziggurat algorithm performs better. Interestingly, the gains (and the computing time) increase with the degrees of freedom for the Gamma target, in relation with Devroye’s (1986) remark on the absence of uniformly bounded execution times for this distribution. Same thing for the Weibull variates, obviously. Reflecting upon the usually costly computation of cdfs and inverse cdfs on machines and software, the inverse cdf method is systematically left behind! In conclusion, a good Sunday morning read if not of direct consequences for MCMC implementation, as warned by the authors.

## independent random sampling methods [book review]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , on May 16, 2018 by xi'an Last week, I had the pleasant surprise to receive a copy of this book in the mail. Book that I was not aware had been written or published (meaning that I was not involved in its review!). The three authors, Luca Martino, David Luengo, and Joaquín Míguez, of Independent Random Sampling Methods are from Madrid universities and I have read (and posted on) several of their papers on (population) Monte Carlo simulation in the recent years. Including Luca’s survey of multiple try MCMC which was helpful in writing our WIREs own survey.

The book is a pedagogical coverage of most algorithms used to simulate independent samples from a given distribution, which of course recoups some of the techniques exposed with more details by [another] Luc, namely Luc Devroye’s Non-uniform random variate generation bible, often mentioned here (and studied in uttermost details by a dedicated reading group in Warwick). It includes a whole chapter on accept-reject methods, with in particular a section on Payne-Dagpunar’s band rejection I had not seen previously. And another entire chapter on ratio-of-uniforms techniques. On which the three authors had proposed generalisations [covered by the book], years before I attempted to go the same way, having completely forgotten reading their paper at the time… Or the much earlier 1991 paper by Jon Wakefield, Alan Gelfand and Adrian Smith!

The book also covers the “vertical density representation”, due to Troutt (1991), which consists in considering the distribution of the density p(.) of the random variable X as a random variable, p(X). I remember pondering about this alternative to the cdf transform and giving up on it as the outcome has a distribution depending on p, even when the density is monotonous. Even though I am not certain from reading the section that this is particularly appealing…

Given its title, the book contains very little about MCMC. Except for a last and final chapter that covers adaptive independent Metropolis-Hastings algorithms, in connection with some of the authors’ recent work. Like multiple try Metropolis. Relating to the (unidimensional) ARMS “ancestor” of adaptive MCMC methods. (As noted in a recent blog on Holden et al., 2009 , I have trouble understanding how recycling only rejected proposed values to build a better proposal distribution is enough to guarantee convergence of an adaptive algorithm, but the book does not delve much into this convergence.)

All in all and with the bias induced by me working in the very area, I find the book quite a nice entry on the topic, which can be used in a Monte Carlo course at both undergraduate and graduate levels if one want to avoid going into Markov chains. It is certainly less likely to scare students away than the comprehensive Non-uniform random variate generation and on the opposite may induce some of them to pursue a research career in this domain.

## uniform on the sphere [or not]

Posted in pictures, R, Statistics with tags , , , , , , , , , , , , on March 8, 2018 by xi'an While looking at X validated questions, I came upon this comment that simulating a uniform distribution on a d-dimensional unit sphere does not proceed from generating angles at random on (0,2π) and computing spherical coordinates… Which I must confess would have been my initial suggestion! This is obvious, nonetheless, when computing the Jacobian of the spherical coordinate transform, which involves powers of the sines of the angles, in a decreasing sequence from d-1 to zero. This means that the angles should be simulated according to their respective sine-power densities. However, except for the d=3 case, where simulating from the density sin(φ) is straightforward by inverse cdf, i.e. φ=acos(1-2u), the cdfs for the higher powers are combinations of sines and cosines, and as such are not easily inverted. Take for instance the eighth power:

F⁸(φ)=(840 φ – 672 sin(2 φ) + 168 sin(4 φ) – 32 sin(6 φ) + 3 sin(8 φ))/3072

While the densities are bounded by sin(φ), up to a constant, and hence an accept-reject can be easily derived, the efficiency decreases with the dimension according to the respective ratio of the Wallis’ integrals, unsurprisingly. A quick check for d=4 shows that the Normal simulation+projection-by-division-by-its-norm is faster. Puzzling a bit further about this while running, I wondered at the simultaneous simulations from sin(φ), sin(φ)², sin(φ)³, &tc., but cannot see a faster way to recycle simulations from sin(φ). Points (φ,u) located in-between two adjacent power curves are acceptable simulations from the corresponding upper curve but they need be augmented by points (φ,u) under the lower curve to constitute a representative sample. In the end, this amounts to multiplying simulations from the highest power density as many times as there are powers. No gain in sight… Sigh! However, a few days later, while enjoying the sunset over Mont Blanc(!), I figured out that there exists a direct and efficient way to simulate from these powers of the sine function. Indeed, when looking at the density of cos(φ), it happens to be the signed root of a Beta(½,(d-1)/2), which avoids the accept-reject step. Presumably this is well-known, but I have not seen this proposal associated with the uniform distribution on the sphere.

## postprocessing for ABC

Posted in Books, Statistics with tags , , , , on June 1, 2017 by xi'an Two weeks ago, G.S. Rodrigues, Dennis Prangle and Scott Sisson have recently arXived a paper on recalibrating ABC output to make it correctly calibrated (in the frequentist sense). As in earlier papers, it takes advantage of the fact that the tail posterior probability should be uniformly distributed at the true value of the [simulated] parameter behind the [simulated] data. And as in Prangle et al. (2014), relies on a copula representation. The main notion is that marginals posteriors can be reasonably approximated by non-parametric kernel estimators, which means that an F⁰oF⁻¹ transform can be applied to an ABC reference table in a fully non-parametric extension of Beaumont et al.  (2002). Besides the issue that F is an approximation, I wonder about the computing cost of this approach, given that computing the post-processing transforms comes at a cost of O(pT²) when p is the dimension of the parameter and T the size of the ABC learning set… One question that came to me while discussing the paper with Jean-Michel Marin is why one would use F⁻¹(θ¹|s) instead of directly a uniform U(0,1) since in theory this should be a uniform U(0,1).