## Bertrand’s tartine

Posted in Books, Kids, pictures, Statistics with tags , , , , , , , , , , on November 25, 2022 by xi'an

A riddle from The Riddler on cutting a square (toast) into two parts and keeping at least 25% of the surface on each part while avoiding Bertrand’s paradox. By defining the random cut as generated by two uniform draws over the periphery of the square. Meaning that ¼ of the draws are on the same side, ½ on adjacent sides and again ¼ on opposite sides. Meaning one has to compute

P(UV>½)= ½(1-log(2))

and

P(½(U+V)∈(¼,¾))= ¾

Resulting in a probability of 0.2642 (checked by simulation)

## sampling, transport, and diffusions

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , on November 18, 2022 by xi'an

This week, I am attending a very cool workshop at the Flatiron Institute (not in the Flatiron building!, but close enough) on Sampling, Transport, and Diffusions, organised by Bob Carpenter and Michael Albergo. It is quite exciting as I do not know most participants or their work! The Flatiron Institute is a private institute focussed on fundamental science funded by the Simons Foundation (in such working conditions universities cannot compete with!).

Eric Vanden-Eijden gave an introductory lecture on using optimal transport notion to improve sampling, with a PDE/ODE approach of continuously turning a base distribution into a target (formalised by the distribution at time one). This amounts to solving a velocity solution to an KL optimisation objective whose target value is zero. Velocity parameterised as a deep neural network density estimator. Using a score function in a reverse SDE inspired by Hyvärinnen (2005), with a surprising occurrence of Stein’s unbiased estimator, there for the same reasons of getting rid of an unknown element. In a lot of environments, simulating from the target is the goal and this can be achieved by MCMC sampling by normalising flows, learning the transform / pushforward map.

At the break, Yuling Yao made a very smart remark that testing between two models could also be seen as an optimal transport, trying to figure an optimal transform from one model to the next, rather than the bland mixture model we used in our mixtestin paper. At this point I have no idea about the practical difficulty of using / inferring the parameters of this continuum but one could start from normalising flows. Because of time continuity, one would need some driving principle.

Esteban Tabak gave another interest talk on simulating from a conditional distribution, which sounds like a no-problem when the conditional density is known but a challenge when only pairs are observed. The problem is seen as a transport problem to a barycentre obtained as a distribution independent from the conditioning z and then inverting. Constructing maps through flows. Very cool, even possibly providing an answer for causality questions.

Many of the transport talks involved normalizing flows. One by [Simons Fellow] Christopher Jazynski about adding to the Hamiltonian (in HMC) an artificial flow field  (Vaikuntanathan and Jarzynski, 2009) to make up for the Hamiltonian moving too fast for the simulation to keep track. Connected with Eric Vanden-Eijden’s talk in the end.

An interesting extension of delayed rejection for HMC by Chirag Modi, with a manageable correction à la Antonietta Mira. Johnatan Niles-Weed provided a nonparametric perspective on optimal transport following Hütter+Rigollet, 21 AoS. With forays into the Sinkhorn algorithm, mentioning Aude Genevay’s (Dauphine graduate) regularisation.

Michael Lindsey gave a great presentation on the estimation of the trace of a matrix by the Hutchinson estimator for sdp matrices using only matrix multiplication. Solution surprisingly relying on Gibbs sampling called thermal sampling.

And while it did not involve optimal transport, I gave a short (lightning) talk on our recent adaptive restore paper: although in retrospect a presentation of Wasserstein ABC could have been more suited to the audience.

## another drawer of socks

Posted in Books, Kids, R, Statistics with tags , , , , , , on November 6, 2022 by xi'an

A socks riddle from the Riddler but with no clear ABC connection! Twenty-eight socks from fourteen pairs of socks are taken from a drawer, one by one, and laid on a surface that only fit nine socks at a time, with complete pairs removed. What is the probability that all pairs are stored without running out of space? No orphan socks then!!

Writing an R code for this experiment is straightforward

for(v in 1:1e6){
S=sample(rep(1:14,2))
x=S[1]
for(t in 2:18){
if(S[t]%in%x){x=x[S[t]!=x]}else{x=c(x,S[t])}
if(sum(!!x)>9){
F=F+1;break()}}}


and it returns a value quite close to 0.7 for the probability of success. I was expecting a less brute-force resolution but the the Riddler only provided the answer of 70.049 based on the above tree of probabilities (which I was too lazy to code).

## why is this algorithm simulating a Normal variate?

Posted in Books, Kids, R, Statistics with tags , , , , , , , on September 15, 2022 by xi'an

A backward question from X validated as to why the above is a valid Normal generator based on exponential generations. Which can be found in most textbooks (if not ours). And in The Bible, albeit as an exercise. The validation proceeds from the (standard) Exponential density dominating the (standard) Normal density and, according to Devroye, may have originated from von Neumann himself. But with a brilliant reverse engineering resolution by W. Huber on X validated. While a neat exercise, it requires on average 2.64 Uniform generations per Normal generation, against a 1/1 ratio for Box-Muller (1958) polar approach, or 1/0.86 for the Marsaglia-Bray (1964) composition-rejection method. The apex of the simulation jungle is however Marsaglia and Tsang (2000) ziggurat algorithm. At least on CPUs since, Note however that “The ziggurat algorithm gives a more efficient method for scalar processors (e.g. old CPUs), while the Box–Muller transform is superior for processors with vector units (e.g. GPUs or modern CPUs)” according to Wikipedia.

To draw a comparison between this Normal generator (that I will consider as von Neumann’s) and the Box-Müller polar generator,

#Box-Müller
bm=function(N){
a=sqrt(-2*log(runif(N/2)))
b=2*pi*runif(N/2)
return(c(a*sin(b),a*cos(b)))
}

#vonNeumann
vn=function(N){
u=-log(runif(2.64*N))
v=-2*log(runif(2.64*N))>(u-1)^2
w=(runif(2.64*N)<.5)-2
return((w*u)[v])
}


here are the relative computing times

> system.time(bm(1e8))
utilisateur     système      écoulé
7.015       0.649       7.674
> system.time(vn(1e8))
utilisateur     système      écoulé
42.483       5.713      48.222


## stuck exchange

Posted in Books, Kids, Statistics with tags , , , , , , , on August 16, 2022 by xi'an

Made an attempt at explaining on X validated why simulating from the joint was equivalent to simulating from the marginal then from the conditional. Unfortunately failed as I could not fathom where the OP’s difficulty was. It seems it started at defining what drawing from a distribution meant… Then someone came by asking why I was writing the exponential in this unusual way (this was a barred E for expectation) and whether or not the “thin hollow rectangle” (a barred I for indicator) was standing for identity, that is

$\mathbb E\quad\text{and}\quad \mathbb I$

Reaching a point of incomprehension from which I could not recover…