Archive for extreme value theory

statistical aspects of climate change [discuss]

Posted in Books, pictures, Statistics, Travel, University life on August 4, 2022 by xi'an


As part of its annual conference in Aberdeen, Scotland, the RSS is organising a discussion meeting on two papers, to be presented on Wednesday 14 September 2022, 5.00PM – 7.00PM (GMT+1), with free on-line registration.

Two papers will be presented:

‘Assessing present and future risk of water damage using building attributes, meteorology, and topography’ by Heinrich-Mertsching et al.
‘The importance of context in extreme value analysis with application to extreme temperatures in the USA and Greenland’ by Clarkson et al.

“The Discussion Meeting at this year’s RSS conference in Aberdeen will feature two papers on the Statistical Aspects of Climate Change. The Discussion Meetings Committee chose this topic area motivated by the UN Climate Change Conference (COP26) held in Glasgow last year and because climate change and the environment is one of the RSS’s six current campaigning priorities for 2022.

You are welcome to listen to the speakers and join in the discussion of the papers which follows the presentations. All the proceedings will be published in a forthcoming issue of Journal of the Royal Statistical Society, Series C (Applied Statistics).”

Dr Shirley Coleman, Chair and Honorary Officer for Discussion Meetings

around the table

Posted in Books, pictures, R, Statistics on December 2, 2020 by xi'an

[Image: Monty Python and the Holy Grail, the round table in Camelot]

The Riddler has a variant on the classical (discrete) random walk around a circle, where every state (but the starting point) has the same probability 1/(n-1) of being visited last. A surprising result that stems almost immediately from the fact that, leaving from 0, the probability that state a is visited counterclockwise before state b>a is visited clockwise is b/(a+b). The variant includes (or seems to include) the starting state 0 as a candidate for the last visit (counting a return to the origin). In that case, all n states, including the origin, except for the two neighbours of 0, namely 1 and n-1, have the same probability of being last. This can also be checked with R code that approximates (inner loop) the probability that a given state is visited last and records how often this probability is the largest (outer loop):

N=10 #assumed number of states on the circle
w=0*(1:N) #frequency with which each state is the most likely last visit
for(t in 1:1e6){
 o=0*w #estimated probabilities of being visited last
 for(v in 1:1e6) #sample orders of visit along 300-step walks from 0
   o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),300,rep=T))%%N)[N]]+1
 w[j]=w[j<-order(o)[N]]+1}

However, upon (jogging) reflection, the double loop is a waste of energy and

o=0*(1:N) #with N the number of states, as above
for(v in 1:1e8) #one single loop over simulated walks
   o[i]=o[i<-1+unique(cumsum(sample(c(-1,1),500,rep=T))%%N)[N]]+1

should be enough to check that all n positions but the two neighbours of the origin have the same probability of being visited last. Removing the remaining loop should be feasible by considering all subchains starting at one of the visits to 0, since this is a renewal state, but I cannot fathom how to code it succinctly (a tentative sketch follows below). A more detailed coverage of the original problem (that is, the version omitting the starting point) was published on R-bloggers the Monday after the riddle appeared, following a blog post by David Robinson on Variance Explained.
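For what it is worth, here is a minimal sketch of that renewal idea, using the illustrative N=10 of the code above: a single long walk is cut into independent coverage cycles, each new cycle starting, by rotational symmetry, from the state where the previous one was completed. It however still loops over the steps of the walk, hence does not answer the succinctness question:

N=10 #assumed number of states, as above
s=cumsum(sample(c(-1,1),1e7,rep=T)) #one long walk over the integers
o=0*(1:N) #counts of the last visited states over coverage cycles
strt=0 #origin of the current cycle
seen=rep(FALSE,N);cnt=0
for(x in s){
  p=1+(x-strt)%%N #state relative to the cycle origin
  if(!seen[p]){
    seen[p]=TRUE;cnt=cnt+1
    if(cnt==N){ #all N states visited, hence p was visited last
      o[p]=o[p]+1
      strt=x;seen=rep(FALSE,N);cnt=0}}}
o/sum(o) #roughly flat, except for dips at 2 and N, the neighbours of the origin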

R codegolf challenge: is there a way to shorten the above R for loop into a single-line command?!
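As one hedged attempt at a condensed version, with N as above, the explicit loop can at least be hidden inside an apply() call over a matrix of 10⁵ simulated 300-step walks (the rare walks failing to visit all N states produce NAs, which tabulate() silently ignores):

o=tabulate(apply(matrix(sample(c(-1,1),300*1e5,rep=T),300),2,
    function(z)1+unique(cumsum(z)%%N)[N]),N) #counts of the last visited states

A barplot of o then exhibits the two dips at the neighbours of the origin, states 1 and n-1.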

séminaire P de S

Posted in Books, pictures, Statistics, University life on February 18, 2020 by xi'an

As I was in Paris and free for the occasion (!), I attended the Paris Statistics seminar this afternoon, in the Latin Quarter. With a first talk by Kweku Abraham on Bayesian inverse problems, setting a prior on the quantity of interest, γ, rather than on its transform G(γ), which is the quantity observed with noise. Always perturbed by the juggling of different distances, like L² versus Kullback-Leibler, in non-parametric frameworks. Reminding me of probabilistic numerics, at least in the framework, since the crux of the talk was 100% about convergence. And a second talk by Lénaïc Chizat on convex neural networks, corresponding to an infinite number of neurons, with surprising properties, including an implicit bias. And a third talk by Anne Sabourin on PCA for extremes, which assumed very little about the model but more about the geometry of the distribution, like the extremes being concentrated on a subspace. As I was rather tired from an intense week at Warwick, and after a weekend of reading grant applications and Biometrika submissions (!), my foggy brain kept switching to these proposals, trying to make connections with the talks, not completely inappropriately in two cases out of three. (I am afraid the same may happen tomorrow at our probability seminar on computer-based proofs!)

maximal spacing around order statistics [#2]

Posted in Books, R, Statistics, University life on June 8, 2018 by xi'an

The proposed solution of the riddle from the Riddler discussed here a few weeks ago is rather approximate, in that the distribution of

\Delta_n=\max_i\,\min_{j\ne i}\,|X_{i}-X_{j}|

when the n-sample is made of iid Normal variates is (a) replaced with the distribution of one arbitrary minimum and (b) this distribution of the minimum is derived under an assumption of independence between the absolute differences. Which does not hold, as shown by the correlation matrix of these absolute differences (plotted via corrplot) for N=11 and 10⁴ simulations. One might think that this correlation decreases with N, but it remains essentially 0.2 for larger values of N. (On the other hand, the minima themselves are essentially independent.)
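Here is a hedged reconstruction of that experiment (the original plot is not reproduced here), taking i=1 as the arbitrary reference point:

library(corrplot)
N=11;T=1e4
dif=t(replicate(T,{x=rnorm(N);abs(x[1]-x[-1])})) #absolute differences |X_1-X_j|, j=2,...,N
corrplot(cor(dif)) #off-diagonal correlations hovering around 0.2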

maximal spacing around order statistics

Posted in Books, R, Statistics, University life on May 17, 2018 by xi'an

The riddle from the Riddler for the coming weeks is extremely simple to express in mathematical terms, as it boils down to characterising the distribution of

\Delta_n=\max_i\,\min_{j\ne i}\,|X_{i}-X_{j}|

when the n-sample is made of iid Normal variates. I however had a hard time finding a result connected with this quantity, since most available characterisations are for either Uniform or Exponential variates. I eventually found a 2017 arXival by Nagaraja et al. covering the issue. Since the Normal distribution belongs to the Gumbel domain of attraction, the extreme spacings, that is, the spacings between the most extreme order statistics [rescaled by nφ(Φ⁻¹{1-n⁻¹})], are asymptotically independent and asymptotically distributed as (Theorem 5, p.15, after correcting a typo):

(\xi_1,\,\xi_2/2,\,\xi_3/3,\,\ldots)

where the ξ’s are Exp(1) variates. A crude approximation is thus to consider that the above Δ is distributed as the maximum of two independent standard Exponential variates, modulo the rescaling by nφ(Φ⁻¹{1-n⁻¹})… But a more adequate result was pointed out to me by Gérard Biau, namely a 1986 Annals of Probability paper by Paul Deheuvels, my former head at ISUP, Université Pierre et Marie Curie. In this paper, Paul Deheuvels establishes that the largest spacing in a normal sample, M¹, satisfies

\mathbb{P}(\sqrt{2\log\,n}\,M^1\le x) \to \prod_{i=1}^{\infty} (1-e^{-ix})^2

from which a conservative upper bound on the value of n required for a given bound x⁰ can be derived. The simulation below compares the limiting cdf (in red) with the empirical cdf of the above Δ, based on 10⁴ samples of size n=10³. The limiting cdf is the cdf of the maximum of an infinite sequence of independent exponentials with scales 1, ½, … Which connects with the above result, in fine. For a practical application, the 99% quantile of this distribution is approximately 5.30. To achieve a maximum spacing of, say, 0.1 with probability 0.99, one would thus need 2 log(n) > 5.30²/0.1², i.e., log(n) > 1404, which is a pretty large number…
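And here is a hedged reconstruction of that comparison (the original graph is not reproduced here), computing Δ as the largest nearest-neighbour distance within each sample and truncating the infinite product at i=100:

n=1e3;T=1e4
D=replicate(T,{s=diff(sort(rnorm(n)))
  max(pmin(c(Inf,s),c(s,Inf)))}) #distance of each X_i to its nearest neighbour, maximised over i
plot(ecdf(sqrt(2*log(n))*D))
z=seq(.01,8,.01)
F=sapply(z,function(y)prod((1-exp(-(1:100)*y))^2)) #truncated limiting cdf
lines(z,F,col="red")
z[min(which(F>=.99))] #returns the 99% quantile, approximately 5.30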