## why is this algorithm simulating a Normal variate?

Posted in Books, Kids, R, Statistics on September 15, 2022 by xi'an

A backward question from X validated as to why the above is a valid Normal generator based on exponential generations. Which can be found in most textbooks (if not ours). And in The Bible, albeit as an exercise. The validation proceeds from the (standard) Exponential density dominating the (standard) Normal density and, according to Devroye, may have originated from von Neumann himself. But with a brilliant reverse-engineering resolution by W. Huber on X validated. While a neat exercise, it requires on average 2.64 Uniform generations per Normal generation, against a 1/1 ratio for the Box-Muller (1958) polar approach, or 1/0.86 for the Marsaglia-Bray (1964) composition-rejection method. The apex of the simulation jungle is however Marsaglia and Tsang’s (2000) ziggurat algorithm. At least on CPUs, since “the ziggurat algorithm gives a more efficient method for scalar processors (e.g. old CPUs), while the Box–Muller transform is superior for processors with vector units (e.g. GPUs or modern CPUs)”, according to Wikipedia.

To draw a comparison between this Normal generator (that I will consider as von Neumann’s) and the Box-Müller polar generator,

#Box-Müller
bm=function(N){
  a=sqrt(-2*log(runif(N/2)))   #polar radius, square root of a 2·Exp(1) draw
  b=2*pi*runif(N/2)            #uniform polar angle
  return(c(a*sin(b),a*cos(b))) #two independent N(0,1) batches
}

#vonNeumann
vn=function(N){
  u=-log(runif(2.64*N))            #Exp(1) proposals
  v=-2*log(runif(2.64*N))>(u-1)^2  #accept u when Exp(1)>(u-1)^2/2
  w=2*(runif(2.64*N)<.5)-1         #random sign, +1 or -1
  return((w*u)[v])                 #signed accepted values
}


here are the relative computing times

> system.time(bm(1e8))
   user  system elapsed
  7.015   0.649   7.674
> system.time(vn(1e8))
   user  system elapsed
 42.483   5.713  48.222
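
As a quick sanity check (my addition, not in the original post), both outputs can be tested against the Normal cdf:

#Kolmogorov-Smirnov check that both samples are standard Normal
set.seed(101)
ks.test(bm(1e5),"pnorm")
ks.test(vn(1e5),"pnorm")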


## stuck exchange

Posted in Books, Kids, Statistics on August 16, 2022 by xi'an

Made an attempt at explaining on X validated why simulating from the joint was equivalent to simulating from the marginal then from the conditional. Unfortunately failed as I could not fathom where the OP’s difficulty was. It seems it started with defining what drawing from a distribution meant… Then someone came by asking why I was writing the exponential in this unusual way (it was a barred E, for expectation) and whether or not the “thin hollow rectangle” (a barred I, for indicator) was standing for the identity, that is

$\mathbb E\quad\text{and}\quad \mathbb I$

Reaching a point of incomprehension from which I could not recover…
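
For the record, the equivalence rests on nothing deeper than the chain-rule factorisation

$f(x,y)=f_X(x)\,f_{Y\mid X}(y\mid x)$

meaning that returning $x\sim f_X$ and then $y\sim f_{Y\mid X}(\cdot\mid x)$ is an exact draw from the joint.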

## simulating from the joint cdf

Posted in Books, Kids, pictures, R, Statistics, University life on July 13, 2022 by xi'an

An X validated question (what else?!) brought back (to me) the question of handling a bivariate cdf for simulation purposes. In the specific case of a copula whose marginals are (well-)known… And it led me to an erroneous chain of thought, fortunately rescued by Robin Ryder! When the marginal distributions are set, the simulation setup is indeed equivalent to a joint Uniform simulation from a copula

$\mathbb P[U_1\leq u_1,U_2\leq u_2,\dots,U_d\leq u_d]=C(u_1,u_2,\dots,u_d)$

In specific cases, as for instance the obvious example of Gaussian copulas, there exist customised simulation algorithms. Looking for more generic solutions, I turned to the Bible, where Chapter XI[an] has two entire sections, XI.3.2 and XI.3.3, on the topic (even though Luc Devroye does not use the term copula there, despite copulas having been introduced in 1959 by A. Sklar, in response to a query of M. Fréchet). In addition to a study of copulas, both sections contain many specific solutions (as for instance in the [unnumbered] table on page 585), but I found no generic simulation method. My [non-selected] answer to the question was thus to propose standard solutions such as simulating one coordinate at a time from the successive conditionals, since the marginals are Uniform. Which depends on the tractability of the derivatives of C(·,·), as illustrated below.
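
As a minimal sketch of this conditional route (my illustration, not part of the answer), take a Clayton copula with parameter θ>0, for which the conditional cdf ∂C/∂u₁ inverts in closed form:

#conditional sampler specialised to a Clayton copula (theta>0)
clayton=function(N,theta=2){
  u=runif(N) #first coordinate is marginally Uniform(0,1)
  w=runif(N) #Uniform variate pushed through the inverse conditional
  #solving dC(u,v)/du=w for v
  v=(u^(-theta)*(w^(-theta/(1+theta))-1)+1)^(-1/theta)
  return(cbind(u,v))
}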

However, being dissatisfied with this bland answer, I thought further about the problem and came up with a fallacious scheme, namely to first simulate the value p of C(U,V) by drawing a Uniform, and second to simulate (U,V) conditional on C(U,V)=p. Going as far as running an R code on a simple copula, as shown above. Fallacious reasoning since (as I knew already!!!) C(U,V) is not uniformly distributed, but has instead a case-dependent distribution… As a (connected) aside, I wonder if the generator attached to Archimedean copulas has any magical feature that helps with the generation of the associated copula.
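
And a two-line check of the fallacy, in the simplest case of the independence copula C(u,v)=uv, where C(U,V)=UV has density -log(t) over (0,1) rather than a Uniform one:

#P(UV<1/2) under independence, versus the Uniform value 1/2
u=runif(1e6);v=runif(1e6)
mean(u*v<.5)  #close to 0.847
.5-.5*log(.5) #exact value t-t*log(t) at t=1/2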

## Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine), and sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that both our own book R packages, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue for an upgrade…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix only occurs on page 437). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500⁺ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than about technical matters. In fact, I did not spot a mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), prior information—which makes the negative answer to the quiz “All priors are informative” (page 90) rather confusing—, R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations”, Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package, itself relying on rstan. Learning from example (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis–Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation (as illustrated below). Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7, as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
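
For the record, in the conjugate Normal case behind the above hot tip, with a N(μ₀,τ²) prior on the mean μ of a N(μ,σ²) sample with known variance, the posterior mean

$\mathbb E[\mu\mid x_1,\ldots,x_n]=\frac{\sigma^2}{\sigma^2+n\tau^2}\,\mu_0+\frac{n\tau^2}{\sigma^2+n\tau^2}\,\bar x_n$

is a convex combination of the prior mean and of the sample mean, hence always falls between the two.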

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like X validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect on the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability of summiting is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city(!) but an impressive sandstone monolith in the heart of Australia, a five-hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

## of first importance

Posted in Books, Kids, Statistics, University life on June 14, 2022 by xi'an

My PhD student Charly Andral came to me with the question of the birthdate of importance sampling. I was under the impression that it had been created at the same time as the plain Monte Carlo method, being essentially the same thing since

$\int_{\mathfrak X} h(x)f(x)\,\text dx = \int_{\mathfrak X} h(x)\frac{f(x)}{g(x)}g(x)\,\text dx$

hence due to von Neumann or Ulam, but he could not find a reference earlier than a 1949 proceedings publication by Hermann Kahn in a seminar on scientific computation run by IBM. Despite writing a series of Monte Carlo papers in the late 1940’s and 1950’s, Kahn is not well-known in these circles (although mentioned in Fishman’s book), while being popular to some extent for his theorisation of nuclear war escalation and deterrence. (I wonder if the concept is developed in some of his earlier 1948 papers. In a 1951 paper with Goertzel, a footnote signals that the approach was called quota sampling in their earlier papers. Charly has actually traced the earliest proposal to Kahn, in a 14 June 1949 RAND preprint, beating Goertzel’s Oak Ridge National Laboratory preprint on quota sampling and importance functions by five days.)
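
As a minimal illustration of the identity above (my own toy example, unrelated to Kahn’s papers), importance sampling can estimate a Normal tail probability from a shifted Exponential proposal:

#importance sampling estimate of P(X>4) for X~N(0,1),
#based on the proposal density g(x)=exp(-(x-4)) over (4,Inf)
N=1e6
x=4+rexp(N)            #draws from g
w=dnorm(x)/exp(-(x-4)) #importance weights f(x)/g(x)
mean(w)                #compare with pnorm(-4), about 3.17e-05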

(As a further piece of marginalia, Kahn wrote with T.E. Harris an earlier preprint on Monte Carlo methods in April 1949, the same Harris as in Harris recurrence.)