## ratio-of-uniforms [#4]

Posted in Books, pictures, R, Statistics, University life with tags , , , , on December 2, 2016 by xi'an

Possibly the last post on random number generation by Kinderman and Monahan’s (1977) ratio-of-uniform method. After fiddling with the Gamma(a,1) distribution when a<1 for a while, I indeed figured out a way to produce a bounded set with this method: considering an arbitrary cdf Φ with corresponding pdf φ, the uniform distribution on the set Λ of (u,v)’s in R⁺xX such that

0≤u≤Φοƒ[φοΦ⁻¹(u)v]

induces the distribution with density proportional to ƒ on φοΦ⁻¹(U)V. This set Λ has a boundary that is parameterised as

u=Φοƒ(x),  v=1/φοƒ(x), x∈Χ

which remains bounded in u since Φ is a cdf and in v if φ has fat enough tails. At both 0 and ∞. When ƒ is the Gamma(a,1) density this can be achieved if φ behaves like log(x)² near zero and like a inverse power at infinity. Without getting into all the gory details, closed form density φ and cdf Φ can be constructed for all a’s, as shown for a=½ by the boundaries in u and v (yellow) below

which leads to a bounded associated set Λ

At this stage, I remain uncertain of the relevance of such derivations, if only because the set A thus derived is ill-suited for uniform draws proposed on the enclosing square box. And also because a Gamma(a,1) simulation can rather simply be derived from a Gamma(a+1,1) simulation. But, who knows?!, there may be alternative usages of this representation, such as innovative slice samplers. Which means the ratio-of-uniform method may reappear on the ‘Og one of those days…

## asymptotically exact inference in likelihood-free models [a reply from the authors]

Posted in R, Statistics with tags , , , , , , , , , , , , , , , , , on December 1, 2016 by xi'an

[Following my post of lastTuesday, Matt Graham commented on the paper with force détails. Here are those comments. A nicer HTML version of the Markdown reply below is also available on Github.]

Thanks for the comments on the paper!

A few additional replies to augment what Amos wrote:

This however sounds somewhat intense in that it involves a quasi-Newton resolution at each step.

The method is definitely computationally expensive. If the constraint function is of the form of a function from an M-dimensional space to an N-dimensional space, with MN, for large N the dominant costs at each timestep are usually the constraint Jacobian (c/u) evaluation (with reverse-mode automatic differentiation this can be evaluated at a cost of O(N) generator / constraint evaluations) and Cholesky decomposition of the Jacobian product (c/u)(c/u) with O(N³) cost (though in many cases e.g. i.i.d. or Markovian simulated data, structure in the generator Jacobian can be exploited to give a significantly reduced cost). Each inner Quasi-Newton update involves a pair of triangular solve operations which have a O(N²) cost, two matrix-vector multiplications with O(MN) cost, and a single constraint / generator function evaluation; the number of Quasi-Newton updates required for convergence in the numerical experiments tended to be much less than N hence the Quasi-Newton iteration tended not to be the main cost.

The high computation cost per update is traded off however with often being able to make much larger proposed moves in high-dimensional state spaces with a high chance of acceptance compared to ABC MCMC approaches. Even in the relatively small Lotka-Volterra example we provide which has an input dimension of 104 (four inputs which map to ‘parameters’, and 100 inputs which map to ‘noise’ variables), the ABC MCMC chains using the coarse ABC kernel radius ϵ=100 with comparably very cheap updates were significantly less efficient in terms of effective sample size / computation time than the proposed constrained HMC approach. This was in large part due to the elliptical slice sampling updates in the ABC MCMC chains generally collapsing down to very small moves even for this relatively coarse ϵ. Performance was even worse using non-adaptive ABC MCMC methods and for smaller ϵ, and for higher input dimensions (e.g. using a longer sequence with correspondingly more random inputs) the comparison becomes even more favourable for the constrained HMC approach. Continue reading

## sampling by exhaustion

Posted in Books, Kids, R, Statistics with tags , , , , on November 25, 2016 by xi'an

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individual is left. What is the probability that there is zero individual left?

While I cannot see a clean analytical solution to this problem, it reminds me of an enveloppe-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

$N\left\{1-\frac{N-2}{N-1}\right\}^{N-1}$

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

rad=function(N){#next population size
ut=sample(rep(2:N,2),1)
for (i in 2:N)#sampling
ut=c(ut,sample(rep((1:N)[-i],2),1))
return(N-length(unique(ut))}
sal=rep(0,m);sal[1]=1
for (N in 3:M){
prop=0;
for (t in 1:T){#one single step
if (i>0) prop=prop+sal[i]}
sal[N]=prop/T}


which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller‘s sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

$p_n(m)={n\choose m}\sum_{v=}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n$

This formula approximates quite well the exact probability, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.

## Monty Python generator

Posted in Books, Kids, pictures, R, Statistics, University life with tags , , , , , , , on November 23, 2016 by xi'an

By some piece of luck I came across a paper by the late George Marsaglia, genial contributor to the field of simulation, and Wai Wan Tang, entitled The Monty Python method for generating random variables. As shown by the below illustration, the concept is to flip the piece H outside the rectangle back inside the rectangle, exploiting the remaining area above the density. The fantastic part is actually that “since areas of the rectangle add to 1, the slim in-between area is exactly the tail area”! So the tiny bit between G and the flipped H is the remaining tail.In the case of a Gamma Ga(a,1) variate, the authors express this variate as the transform of another variate with a nearly symmetry density, on which the Monty Python method applies. The transform is

$q(x)=(a-1/3)(1 + x/\sqrt{16a})^3$

with -√16a<x. The second nice trick is that the density of x is provided for free by the Gamma Ga(a,1) density and the transform, thanks to the change of variable formula. One lingering question is obviously how to handle the tail part. This is handled separately in the paper, with a rather involved algorithm, but since the area of the tail is tiny, a mere 1.2% in the case of the Gaussian density, this instance occurs rarely. Very clever if highly specialised! (The case of a<1 has to be processed by the indirect of multiplying a Ga(a+1,1) by a uniform variate to the power 1/a.)

I also found out that there exists a Monte Python software, which is an unrelated Monte Carlo code in python [hence the name] for cosmological inference. Including nested sampling, unsurprisingly.

## postdoc on missing data at École Polytechnique

Posted in Kids, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , on November 18, 2016 by xi'an

Julie Josse contacted me for advertising a postdoc position at École Polytechnique, in Palaiseau, south of Paris. “The fellowship is focusing on missing data. Interested graduates should apply as early as possible since the position will be filled when a suitable candidate is found. The Centre for Applied Mathematics (CMAP) is  looking for highly motivated individuals able to develop a general multiple imputation method for multivariate continuous and categorical variables and its implementation in the free R software. The successful candidate will be part of research group in the statistical team on missing values. The postdoc will also have excellent opportunities to collaborate with researcher in public health with partners on the analysis of a large register from the Paris Hospital (APHP) to model the decisions and events when severe trauma patients are handled by emergency doctors. Candidates should contact Julie Josse at polytechnique.edu.”

## simulation under zero measure constraints

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , on November 17, 2016 by xi'an

A theme that comes up fairly regularly on X validated is the production of a sample with given moments, either for calibration motives or from a misunderstanding of the difference between a distribution mean and a sample average. Here are some entries on that topic:

In most of those questions, the constraint in on the sum or mean of the sample, which allows for an easy resolution by a change of variables. It however gets somewhat harder when the constraint involves more moments or, worse, an implicit solution to an equation. A good example of the later is the quest for a sample with a given maximum likelihood estimate in the case this MLE cannot be derived analytically. As for instance with a location-scale t sample…

Actually, even when the constraint is solely on the sum, a relevant question is the production of an efficient simulation mechanism. Using a Gibbs sampler that changes one component of the sample at each iteration does not qualify, even though it eventually produces the proper sample. Except for small samples. As in this example

n=3;T=1e4
s0=.5 #fixed average
sampl=matrix(s0,T,n)
for (t in 2:T){
sampl[t,]=sampl[t-1,]
for (i in 1:(n-1)){
sampl[t,i]=runif(1,
min=max(0,n*s0-sum(sampl[t,c(-i,-n)])-1),
max=min(1,n*s0-sum(sampl[t,c(-i,-n)])))
sampl[t,n]=n*s0-sum(sampl[t,-n])}}

For very large samples, I figure that proposing from the unconstrained density can achieve a sufficient efficiency, but the in-between setting remains an interesting problem.

## analysing the US election result, from Oxford, England

Posted in pictures, R, Statistics, Travel, University life with tags , on November 14, 2016 by xi'an

Seth Flaxman (Oxford), Dougal J. Sutherland (UCL), Yu-Xiang Wang (CMU), and Yee Whye Teh (Oxford), published on arXiv this morning an analysis of the US election, in what they called most appropriately a post-mortem. Using ecological inference already employed after Obama’s re-election. And producing graphs like the following one: