## ratio-of-uniforms [#4]

Posted in Books, pictures, R, Statistics, University life with tags , , , , on December 2, 2016 by xi'an

Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniform method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set Λ of (u,v)’s in R⁺xX such that

0≤u≤Φοƒ[φοΦ⁻¹(u)v]

induces the distribution with density proportional to ƒ on φοΦ⁻¹(U)V. This set Λ has a boundary that is parameterised as

u=Φοƒ(x),  v=1/φοƒ(x), x∈Χ

which remains bounded in u since Φ is a cdf and in v if φ has fat enough tails. At both 0 and ∞. When ƒ is the Gamma(a,1) density this can be achieved if φ behaves like log(x)² near zero and like a inverse power at infinity. Without getting into all the gory details, closed form density φ and cdf Φ can be constructed for all a’s, as shown for a=½ by the boundaries in u and v (yellow) below

which leads to a bounded associated set Λ

At this stage, I remain uncertain of the relevance of such derivations, if only because the set A thus derived is ill-suited for uniform draws proposed on the enclosing square box. And also because a Gamma(a,1) simulation can rather simply be derived from a Gamma(a+1,1) simulation. But, who knows?!, there may be alternative usages of this representation, such as innovative slice samplers. Which means the ratio-of-uniform method may reappear on the ‘Og one of those days…

## MAP as Bayes estimators

Posted in Books, Kids, Statistics with tags , , , , on November 30, 2016 by xi'an

Robert Bassett and Julio Deride just arXived a paper discussing the position of MAPs within Bayesian decision theory. A point I have discussed extensively on the ‘Og!

“…we provide a counterexample to the commonly accepted notion of MAP estimators as a limit of Bayes estimators having 0-1 loss.”

The authors mention The Bayesian Choice stating this property without further precautions and I completely agree to being careless in this regard! The difficulty stands with the limit of the maximisers being not necessarily the maximiser of the limit. The paper includes an example to this effect, with a prior as above,  associated with a sampling distribution that does not depend on the parameter. The sufficient conditions proposed therein are that the posterior density is almost surely proper or quasiconcave.

This is a neat mathematical characterisation that cleans this “folk theorem” about MAP estimators. And for which the authors are to be congratulated! However, I am not very excited by the limiting property, whether it holds or not, as I have difficulties conceiving the use of a sequence of losses in a mildly realistic case. I rather prefer the alternate characterisation of MAP estimators by Burger and Lucka as proper Bayes estimators under another type of loss function, albeit a rather artificial one.

## asymptotically exact inference in likelihood-free models

Posted in Books, pictures, Statistics with tags , , , , , , , on November 29, 2016 by xi'an

“We use the intuition that inference corresponds to integrating a density across the manifold corresponding to the set of inputs consistent with the observed outputs.”

Following my earlier post on that paper by Matt Graham and Amos Storkey (University of Edinburgh), I now read through it. The beginning is somewhat unsettling, albeit mildly!, as it starts by mentioning notions like variational auto-encoders, generative adversial nets, and simulator models, by which they mean generative models represented by a (differentiable) function g that essentially turn basic variates with density p into the variates of interest (with intractable density). A setting similar to Meeds’ and Welling’s optimisation Monte Carlo. Another proximity pointed out in the paper is Meeds et al.’s Hamiltonian ABC.

“…the probability of generating simulated data exactly matching the observed data is zero.”

The section on the standard ABC algorithms mentions the fact that ABC MCMC can be (re-)interpreted as a pseudo-marginal MCMC, albeit one targeting the ABC posterior instead of the original posterior. The starting point of the paper is the above quote, which echoes a conversation I had with Gabriel Stolz a few weeks ago, when he presented me his free energy method and when I could not see how to connect it with ABC, because having an exact match seemed to cancel the appeal of ABC, all parameter simulations then producing an exact match under the right constraint. However, the paper maintains this can be done, by looking at the joint distribution of the parameters, latent variables, and observables. Under the implicit restriction imposed by keeping the observables constant. Which defines a manifold. The mathematical validation is achieved by designing the density over this manifold, which looks like

$p(u)\left|\frac{\partial g^0}{\partial u}\frac{\partial g^0}{\partial u}^\text{T}\right|^{-\textonehalf}$

if the constraint can be rewritten as g⁰(u)=0. (This actually follows from a 2013 paper by Diaconis, Holmes, and Shahshahani.) In the paper, the simulation is conducted by Hamiltonian Monte Carlo (HMC), the leapfrog steps consisting of an unconstrained move followed by a projection onto the manifold. This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step. I also find it surprising that this projection step does not jeopardise the stationary distribution of the process, as the argument found therein about the approximation of the approximation is not particularly deep. But the main thing that remains unclear to me after reading the paper is how the constraint that the pseudo-data be equal to the observable data can be turned into a closed form condition like g⁰(u)=0. As mentioned above, the authors assume a generative model based on uniform (or other simple) random inputs but this representation seems impossible to achieve in reasonably complex settings.

## simulation by hand

Posted in Books, Kids, pictures, Statistics, Travel with tags , , , , , , , on November 28, 2016 by xi'an

A rather weird question on X validated this week was about devising a manual way to simulate (a few) normal variates. By manual I presume the author of the question means without resorting to a computer or any other business machine. Now, I do not know of any real phenomenon that is exactly and provably Normal. As analysed in a great philosophy of science paper by Aidan Lyon, the standard explanations for a real phenomenon to be Normal are almost invariably false, even those invoking the Central Limit Theorem. Hence I cannot think of a mechanical device that would directly return Normal generations from a Normal distribution with known parameters. However, since it is possible to simulate by hand Uniform U(0,1) variates [up to a given precision] using a chronometre or a wheel, calls to versions of the Box-Müller algorithm that do not rely on logarithmic or trigonometric functions are feasible, for instance by generating two Exponential variates, x and y, until 2y>(1-x)², x being the output. And generating Exponential variates is easy provided a radioactive material with known half-life is available, along with a Geiger counter. Or, if not, by calling von Neumann’s exponential generator. As detailed in Devroye’s simulation book.

After proposing this solution, I received a comment from the author of the question towards a simpler solution based, e.g., on the Central Limit Theorem. Presumably for simple iid random variables such as coin tosses or dice experiments. While I used the CLT for simulating Normal variables in my very early days [just after programming on punched cards!], I do not think this is a very good or efficient method, as the tails grow very slowly to normality. By comparison, using the same amount of coin tosses to create a sufficient number of binary digits of a Uniform variate produces a computer-precision exact Uniform variate, which can be exploited in Box-Müller-like algorithms to return exact Normal variates… Even by hand if necessary. [For some reason, this question attracted a lot of traffic and an encyclopaedic answer on X validated, despite being borderline to the point of being proposed for closure.]

## “Stein deviates from the statistical norm”

Posted in Books, pictures, Statistics, University life with tags , , , , , , , on November 27, 2016 by xi'an

## Charles M. Stein [1920-2016]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on November 26, 2016 by xi'an

I have just heard that Charles Stein, Professor at Stanford University, passed away last night. Although the following image is definitely over-used, I truly feel this is the departure of a giant of statistics.  He has been deeply influential on the fields of probability and mathematical statistics, primarily in decision theory and approximation techniques. On the first field, he led to considerable changes in the perception of optimality by exhibiting the Stein phenomenon, where the aggregation of several admissible estimators of unrelated quantities may (and will) become inadmissible for the joint estimation of those quantities! Although the result can be explained by mathematical and statistical reasoning, it was still dubbed a paradox due to its counter-intuitive nature. More foundationally, it led to expose the ill-posed nature of frequentist optimality criteria and certainly contributed to the Bayesian renewal of the 1980’s, before the MCMC revolution. (It definitely contributed to my own move, as I started working on the Stein phenomenon during my thesis, before realising the fundamentally Bayesian nature of the domination results.)

“…the Bayesian point of view is often accompanied by an insistence that people ought to agree to a certain doctrine even without really knowing what this doctrine is.” (Statistical Science, 1986)

The second major contribution of Charles Stein was the introduction of a new technique for normal approximation that is now called the Stein method. It relies on a differential operator and produces estimates of approximation error in Central Limit theorems, even in dependent settings. While I am much less familiar with this aspect of Charles Stein’s work, I believe the impact it has had on the field is much more profound and durable than the Stein effect in Normal mean estimation.

(During the Vietnam War, he was quite active in the anti-war movement and the above picture from 2003 shows that his opinions had not shifted over time!) A giant truly has gone.

## sampling by exhaustion

Posted in Books, Kids, R, Statistics with tags , , , , on November 25, 2016 by xi'an

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individual is left. What is the probability that there is zero individual left?

While I cannot see a clean analytical solution to this problem, it reminds me of an enveloppe-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

$N\left\{1-\frac{N-2}{N-1}\right\}^{N-1}$

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

rad=function(N){#next population size
ut=sample(rep(2:N,2),1)
for (i in 2:N)#sampling
ut=c(ut,sample(rep((1:N)[-i],2),1))
return(N-length(unique(ut))}
sal=rep(0,m);sal[1]=1
for (N in 3:M){
prop=0;
for (t in 1:T){#one single step
i=rad(N)
if (i>0) prop=prop+sal[i]}
sal[N]=prop/T}


which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller‘s sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

$p_n(m)={n\choose m}\sum_{v=}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n$

This formula approximates quite well the exact probability, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.