## uniform spacings

Posted in Books, Kids, R with tags , , , , on June 8, 2023 by xi'an

A riddle on uniform spacings!, namely when considering eight iid Uniform (0,1) variates as visiting times and three further iid Uniform (0,1) variates as server availability times, with unit service time, the question being the probability a server is available for a ninth visiting time, T⁹. Which can be decomposed into four cases:

1. at least one server becomes available between the previous visiting time and T⁹
2. at least two servers become available between the penultimate visiting time and the previous visiting time
3. two servers become available between the antepenultimate visiting time and penultimate visiting time and one server becomes available between the penultimate visiting time and the previous visiting time
4. three servers become available between the antepenultimate visiting time and the penultimate visiting time

with respective probabilities (using Devroye’s Theorem 2.1)

1/4, 8 9!3!/12!, 7 8!3!/12!, 7 8!3!/12!,

resulting in a total probability of 0.2934, compatible with a simulation assessment.

## Galton and Watson voluntarily skipping some generations

Posted in Books, Kids, R with tags , , , , , on June 2, 2023 by xi'an

A riddle on a form of a Galton-Watson process, starting from a single unit, where no one dies but rather, at each of 100 generations, Dog either opts for a Uniform number υ of additional units or increments a counter γ by this number υ, its goal being to optimise γ. The solution proposed by the Riddler does not establish his solution’s is the optimal strategy and considers anyway average gains. Solution that consists in always producing more units until the antepenultimate hour (ie incrementing only at the 99th and 100th generations),  I tried instead various logical (?) rules and compared outputs by bRute foRce, resulting in higher maxima (over numerous repeated calls) for the alternative principle

```s<-function(p=.66){
G=0;K=1 for(t in 1:9){
i=sample(1:K,1)
K=K+i*(i>=K*p)
G=G+i*(i<K*p)}
return(c(G+sample(1:K,1),K))}```

## the alpinist [film review]

Posted in Books, Kids, Mountains, pictures, Running with tags , , , , , , , , , , , , , , , , , , , , on June 1, 2023 by xi'an

Watched (with supplementary oxygen) The Alpinist in the plane to Jeddah. It is a documentary (made by the same filmmakers who filmed the Dawn Wall) about the amazing Canadian alpinist Marc-André Leclerc, who died in 2018 on the Mendenhall Glacier, Alaska, in an avalanche, after achieving extraordinary complex solo climbs as eg on Mount Robbson, Cerro Torre in Patagonia. These are winter climbs, partly ice climbing, where no repetition is possible and where the objective conditions (hence dangers) may vary considerably. In that regard, these achievements could be argued to go even beyond Alex Honnold’s free solo climb of El Capitan, where Honnold practiced the route over and over before making his successful free solo attempt. (Obviously an inhuman achievement when considering the hardest bits are at least 7c!) Watching Marc-André Leclerc when mixed climbing is just as heart stopping as watching Honnold rock climbing. He must have been incredibly strong to master these monstruous icy walls and maintain his absolute vigilance in each crampon move, in each ice-pick placement. Sadly it ended up with an avalanche… I obviously enjoyed X’ing many places I had visited, like the approach walk to Robson (in 1991!), the Three Sisters of Canmore [below], and routes of Squamish [above]. (I wonder who filmed during these non-advertised climbs. For instance, he told no one except his partner when he summited Mount Robson. In some cases he was clearly self-filing at lower intensity points, but in others it could have been an helicopter (or a drone?). In this respect, but by far not only in this respect, his blog is definitely worth the read.)

~

## on control variates

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , on May 27, 2023 by xi'an

A few months ago, I had to write a thesis evaluation of Rémi Leluc’s PhD, which contained several novel Monte Carlo proposals on control variates and importance techniques. For instance, Leluc et al. (Statistics and Computing, 2021) revisits the concept of control variables by adding a perspective of control variable selection using LASSO. This prior selection is relevant since control variables are not necessarily informative about the objective function being integrated and my experience is that the more variables the less reliable the improvement. The remarkable feature of the results is in obtaining explicit and non-asymptotic bounds.

The author obtains a concentration inequality on the error resulting from the use of control variables, under strict assumptions on the variables. The associated numerical experiment illustrates the difficulties of practically implementing these principles due to the number of parameters to calibrate. I found the example of a capture-recapture experiment on ducks (European Dipper) particularly interesting, not only because we had used it in our book but also because it highlights the dependence of estimates on the dominant measure.

Based on a NeurIPS 2022 poster presentation Chapter 3 is devoted to the use of control variables in sequential Monte Carlo, where a sequence of importance functions is constructed based on previous iterations to improve the approximation of the target distribution. Under relatively strong assumptions of importance functions dominating the target distribution (which could generally be achieved by using an increasing fraction of the data in a partial posterior distribution), of sub-Gaussian tails of an intractable distribution’s residual, a concentration inequality is established for the adaptive control variable estimator.

This chapter uses a different family of control variables, based on a Stein operator introduced in Mira et al. (2016). In the case where the target is a mixture in IRd, one of our benchmarks in Cappé et al. (2008), remarkable gains are obtained for relatively high dimensions. While the computational demands of these improvements are not mentioned, the comparison with an MCMC approach (NUTS) based on the same number of particles demonstrates a clear improvement in Bayesian estimation.

Chapter 4 corresponds to a very recent arXival and presents a very original approach to control variate correction by reproducing the interest rate law through an approximation using the closest neighbor (leave-one-out) method. It requires neither control function nor necessarily additional simulation, except for the evaluation of the integral, which is rather remarkable, forming a kind of parallel with the bootstrap. (Any other approximation of the distribution would also be acceptable if available at the same computational cost.) The thesis aims to establish the convergence of the method when integration is performed by a Voronoi tessellation, which leads to an optimal rate of order n-1-2/d for quadratic error (under conditions of integrand regularity). In the alternative where the integral must be evaluated by Monte Carlo, this optimality disappears, unless a massive amount of simulations are used. Numerical illustrations cover SDEs and a Bayesian hierarchical modeling already used in Oates et al. (2017), with massive gain in both cases.

## Bernoulli factory on a budget

Posted in Books, Kids with tags , , on May 23, 2023 by xi'an

A form of Bernoulli factory with limited energy from The Riddler: being given the choice of 0<p<1, what is the minimal number n of throws such that the outcomes of n iid B(p) draws can be partitioned into six groups of equal probability? Running a brute force R code [in the train to Cambridge] and checking random partitions for different values of p and n, I found a near fit for n=6 and p around 0.42. With partition weights

0.1658159 0.1669405 0.1657981 0.1662112 0.1682599 0.1669745

while missing the n=5 case for p≈0.3034.