## séminaire P de S

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on February 18, 2020 by xi'an

As I was in Paris and free for the occasion (!), I attended the Paris Statistics seminar this afternoon, in the Latin Quarter. With a first talk by Kweku Abraham on Bayesian inverse problems set a prior on the quantity of interest, γ, rather than its transform G(γ), observed with noise. Always perturbed by the juggling of different distances, like L² versus Kullback-Leibler, in non-parametric frameworks. Reminding me of probabilistic numerics, at least in the framework, since the crux of the talk was 100% about convergence. And a second talk by Leanaïc Chizat on convex neural networks corresponding to an infinite number of neurons, with surprising properties, including implicit bias. And a third talk by Anne Sabourin on PCA for extremes. Which assumed very little on the model but more on the geometry of the distribution, like extremes being concentrated on a subspace. As I was rather tired from an intense week at Warwick, and after a weekend of reading grant applications and Biometrika submissions (!), my foggy brain kept switching to these proposals, trying to make connections with the talks, not completely inappropriately in two cases out of three. (I am afraid the same may happen tomorrow at our probability seminar on computer-based proofs!)

## maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life with tags , , , , , , , , on June 8, 2018 by xi'an

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximative, in that the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) the distribution of the minimum is based on an assumption of independence between the absolute differences. Which does not hold, as shown by the above correlation matrix (plotted via corrplot) for N=11 and 10⁴ simulations. One could think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima are essentially independent.)

## maximal spacing around order statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , on May 17, 2018 by xi'an

The riddle from the Riddler for the coming weeks is extremely simple to express in mathematical terms, as it summarises into characterising the distribution of

$\Delta_n=\max_i\,\min_j\,|X_{i}-X_{j}|$

when the n-sample is made of iid Normal variates. I however had a hard time finding a result connected with this quantity since most available characterisations are for either Uniform or Exponential variates. I eventually found a 2017 arXival by Nagaraya et al.  covering the issue. Since the Normal distribution belongs to the Gumbel domain of attraction, the extreme spacings, that is the spacings between the most extreme orders statistics [rescaled by nφ(Φ⁻¹{1-n⁻¹})] are asymptotically independent and asymptotically distributed as (Theorem 5, p.15, after correcting a typo):

$(\xi_1,\xi_2/2,...)$

where the ξ’s are Exp(1) variates. A crude approximation is thus to consider that the above Δ is distributed as the maximum of two standard and independent exponential distributions, modulo the rescaling by  nφ(Φ⁻¹{1-n⁻¹})… But a more adequate result was pointed out to me by Gérard Biau, namely a 1986 Annals of Probability paper by Paul Deheuvels, my former head at ISUP, Université Pierre and Marie Curie. In this paper, Paul Deheuvels establishes that the largest spacing in a normal sample, M¹, satisfies

$\mathbb{P}(\sqrt{2\log\,n}\,M^1\le x) \to \prod_{i=1}^{\infty} (1-e^{-ix})^2$

from which a conservative upper bound on the value of n required for a given bound x⁰ can be derived. The simulation below compares the limiting cdf (in red) with the empirical cdf of the above Δ based on 10⁴ samples of size n=10³.The limiting cdf is the cdf of the maximum of an infinite sequence of independent exponentials with scales 1,½,…. Which connects with the above result, in fine. For a practical application, the 99% quantile of this distribution is 4.71. To achieve a maximum spacing of, say 0.1, with probability 0.99, one would need 2 log(n) > 5.29²/0.1², i.e., log(n)>1402, which is a pretty large number…

## truncated Gumbels

Posted in Books, Kids, pictures, Statistics with tags , , , , , , , on April 6, 2018 by xi'an

As I had to wake up pretty early on Easter morning to give my daughter a ride, while waiting I came upon this calculus question on X validated of computing the conditional expectation of a Gumbel variate, conditional on its drifted version being larger than another independent Gumbel variate with the same location-scale parameters. (Just reminding readers that a Gumbel G(0,1) variate is a double log-uniform, i.e., can be generated as X=-log(-log(U)).) And found after a few minutes (and a call to Wolfram Alpha integrator) that

$\mathbb{E}[\epsilon_1|\epsilon_1+c>\epsilon_0]=\gamma+\log(1+e^{-c})$

which is simple enough to make me wonder if there is a simpler derivation than the call to the exponential integral Ei(x) function. (And easy to check by simulation.)

Incidentally, I discovered that Emil Gumbel had applied statistical analysis to the study of four years of political murders in the Weimar Republic, demonstrating the huge bias of the local justice towards right-wing murders. When he signed the urgent call [for the union of the socialist and communist parties] against fascism in 1932, he got expelled from his professor position in Heidelberg and emigrated to France, which he had to leave again for the USA on the Nazi invasion in 1940. Where he became a professor at Columbia.

## an extension of nested sampling

Posted in Books, Statistics, University life with tags , , , , , , , on December 16, 2014 by xi'an

I was reading [in the Paris métro] Hastings-Metropolis algorithm on Markov chains for small-probability estimation, arXived a few weeks ago by François Bachoc, Lionel Lenôtre, and Achref Bachouch, when I came upon their first algorithm that reminded me much of nested sampling: the following was proposed by Guyader et al. in 2011,

To approximate a tail probability P(H(X)>h),

• start from an iid sample of size N from the reference distribution;
• at each iteration m, select the point x with the smallest H(x)=ξ and replace it with a new point y simulated under the constraint H(y)≥ξ;
• stop when all points in the sample are such that H(X)>h;
• take

$\left(1-\dfrac{1}{N}\right)^{m-1}$

as the unbiased estimator of P(H(X)>h).

Hence, except for the stopping rule, this is the same implementation as nested sampling. Furthermore, Guyader et al. (2011) also take advantage of the bested sampling fact that, if direct simulation under the constraint H(y)≥ξ is infeasible, simulating via one single step of a Metropolis-Hastings algorithm is as valid as direct simulation. (I could not access the paper, but the reference list of Guyader et al. (2011) includes both original papers by John Skilling, so the connection must be made in the paper.) What I find most interesting in this algorithm is that it even achieves unbiasedness (even in the MCMC case!).

## checking for finite variance of importance samplers

Posted in R, Statistics, Travel, University life with tags , , , , , , , , on June 11, 2014 by xi'an

Over a welcomed curry yesterday night in Edinburgh I read this 2008 paper by Koopman, Shephard and Creal, testing the assumptions behind importance sampling, which purpose is to check on-line for (in)finite variance in an importance sampler, based on the empirical distribution of the importance weights. To this goal, the authors use the upper tail  of the weights and a limit theorem that provides the limiting distribution as a type of Pareto distribution

$\dfrac{1}{\beta}\left(1+\xi z/\beta \right)^{-1-1/\xi}$

over (0,∞). And then implement a series of asymptotic tests like the likelihood ratio, Wald and score tests to assess whether or not the power ξ of the Pareto distribution is below ½. While there is nothing wrong with this approach, which produces a statistically validated diagnosis, I still wonder at the added value from a practical perspective, as raw graphs of the estimation sequence itself should exhibit similar jumps and a similar lack of stabilisation as the ones seen in the various figures of the paper. Alternatively, a few repeated calls to the importance sampler should disclose the poor convergence properties of the sampler, as in the above graph. Where the blue line indicates the true value of the integral.

## computational methods for statistical mechanics [day #3]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on June 6, 2014 by xi'an

The third day [morn] at our ICMS workshop was dedicated to path sampling. And rare events. Much more into [my taste] Monte Carlo territory. The first talk by Rosalind Allen looked at reweighting trajectories that are not in an equilibrium or are missing the Boltzmann [normalizing] constant. Although the derivation against a calibration parameter looked like the primary goal rather than the tool for constant estimation. Again papers in J. Chem. Phys.! And a potential link with ABC raised by Antonietta Mira… Then Jonathan Weare discussed stratification. With a nice trick of expressing the normalising constants of the different terms in the partition as solution(s) of a Markov system

$v\mathbf{M}=v$

Because the stochastic matrix M is easier (?) to approximate. Valleau’s and Torrie’s umbrella sampling was a constant reference in this morning of talks. Arnaud Guyader’s talk was in the continuation of Toni Lelièvre’s introduction, which helped a lot in my better understanding of the concepts. Rephrasing things in more statistical terms. Like the distinction between equilibrium and paths. Or bias being importance sampling. Frédéric Cérou actually gave a sort of second part to Arnaud’s talk, using importance splitting algorithms. Presenting an algorithm for simulating rare events that sounded like an opposite nested sampling, where the goal is to get down the target, rather than up. Pushing particles away from a current level of the target function with probability ½. Michela Ottobre completed the series with an entry into diffusion limits in the Roberts-Gelman-Gilks spirit when the Markov chain is not yet stationary. In the transient phase thus.