Archive for sampling

data assimilation and reduced modelling for high-D problems [CIRM]

Posted in Books, Kids, Mountains, pictures, Running, Statistics, University life on February 8, 2021 by xi'an

Next summer, from 19 July till 27 August, there will be a six-week program at CIRM on the above theme, bringing together scientists from both the academic and industrial communities. The program includes a one-week summer school followed by five weeks of research sessions on projects proposed by academic and industrial partners.

Confirmed speakers of the summer school (Jul 19-23) are:

  • Albert Cohen (Sorbonne University)
  • Masoumeh Dashti (University of Sussex)
  • Eric Moulines (Ecole Polytechnique)
  • Anthony Nouy (Ecole Centrale de Nantes)
  • Claudia Schillings (Mannheim University)

Junior participants may apply for fellowships covering part or all of their stay. Registration and fellowship applications will open soon.

state of the art in sampling & clustering [workshop]

Posted in Books, pictures, Statistics, Travel, University life on September 17, 2020 by xi'an

Next month, I am taking part in a workshop on sampling & clustering at the Max-Planck-Institut für Physik in Garching, Germany (near München), by giving a three-hour introduction to ABC, as I did three years ago in Autrans. Being there and talking with local researchers if the sanitary conditions allow, speaking from my office otherwise. Other speakers include Michael Betancourt on HMC and Johannes Buchner on nested sampling. Remote participation in this MPI workshop is both open and free, but participants must register before 18 September, namely tomorrow.

a perfectly normally distributed sample

Posted in R, Statistics on May 9, 2019 by xi'an

When I saw this title on R-bloggers, I was wondering how “more perfect” a Normal sample could be when compared with the outcome of rnorm(n). Hence I went checking the original blog on bayestestR in search of more information. Which was stating nothing more than how to generate a sample that is perfectly normal by using the rnorm_perfect function. Still unsure of the meaning, I contacted one of the contributors, who replied very quickly

…that’s actually a good question. I would say an empirical sample having characteristics as close as possible to a cannonic gaussian distribution.
and again leaving me hungering for more details. I thus downloaded the package bayestestR and opened the rnorm_perfect function. Which is simply the sequence of n-quantiles
stats::qnorm(seq(1/n, 1 - 1/n, length.out = n), mean, sd)
which I would definitely not call a sample as there is nothing random about it. And perfect?! Not really, unless one associates randomness and imperfection.
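To make the contrast concrete, here is a minimal sketch in base R that rebuilds the same quantile sequence without calling bayestestR and sets it next to a genuine rnorm draw. The helper name perfect_seq and the choice n=100 are assumptions made for illustration only.

```r
# quantile sequence copied from the line above; perfect_seq is a hypothetical name
perfect_seq <- function(n, mean = 0, sd = 1)
  stats::qnorm(seq(1/n, 1 - 1/n, length.out = n), mean, sd)

n <- 100                       # arbitrary size
x <- perfect_seq(n)
identical(x, perfect_seq(n))   # deterministic: same values at every call
all(diff(x) > 0)               # already sorted by construction
y <- rnorm(n)                  # an actual random sample, for comparison
c(mean(x), mean(y))            # the first is zero up to rounding error
```

Calling perfect_seq twice returns the very same vector, while two calls to rnorm(n) almost surely differ, which is the sense in which the output is a grid of quantiles rather than a sample.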

approximate Bayesian inference under informative sampling

Posted in Books, Statistics, Travel, University life on March 30, 2018 by xi'an

In the first issue of this year's Biometrika, I spotted a paper with the above title, written by Wang, Kim, and Yang, and thought it was a particular case of ABC. However, when I read it on a rare metro ride to Dauphine, thanks to my hurting knee!, I got increasingly disappointed as the contents had nothing to do with ABC. The purpose of the paper was to derive a consistent and convergent posterior distribution based on an estimator of the parameter θ that is… consistent and convergent under informative sampling. Using for instance a Normal approximation to the sampling distribution of this estimator. Or to the sampling distribution of the pseudo-score function, S(θ) [whose pseudo-normality reminded me of Ron Gallant’s approximations and of my comments on them]. The paper then considers a generalisation to the case of estimating equations, U(θ), which may again enjoy a Normal asymptotic distribution. Involving an object that does not make direct Bayesian sense, namely the posterior of the parameter θ given U(θ)… (The algorithm proposed to generate from this posterior (8) is also a mystery.) Since the approach requires consistent estimators to start with and aims at reproducing frequentist coverage properties, I am at a loss as to why this pseudo-Bayesian framework is adopted.
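As a rough illustration of the simplest version of this idea (and not the algorithm of the paper), the sketch below multiplies a prior by a Normal approximation to the sampling distribution of an estimator. The conjugate Normal prior, the numerical values, and every variable name are assumptions chosen for illustration.

```r
# hypothetical inputs: an estimate, its estimated sampling variance, a N(0, tau2) prior
theta_hat <- 1.3; v <- 0.2^2; tau2 <- 10^2

# pseudo-posterior: prior times the Normal "likelihood" of the estimator
post_var  <- 1 / (1 / tau2 + 1 / v)
post_mean <- post_var * theta_hat / v

# 95% credible interval versus the usual Wald confidence interval
cred <- post_mean + c(-1, 1) * qnorm(0.975) * sqrt(post_var)
wald <- theta_hat + c(-1, 1) * qnorm(0.975) * sqrt(v)
rbind(cred, wald)
```

Under a vague prior the two intervals nearly coincide, which is one way of seeing why the frequentist coverage is reproduced in this toy setting.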

sampling by exhaustion

Posted in Books, Kids, R, Statistics on November 25, 2016 by xi'an

The riddle set by The Riddler of last week sums up as follows:

Within a population of size N, each individual in the population independently selects another individual. All individuals selected at least once are removed and the process iterates until one or zero individuals are left. What is the probability that there are zero individuals left?

While I cannot see a clean analytical solution to this problem, it reminds me of an envelope-versus-letter (matches) problem I saw in graduate school. Indeed, the expected number of removed (or selected) individuals is given by

N\left\{1-\left(\frac{N-2}{N-1}\right)^{N-1}\right\}

which is equivalent to (1-e⁻¹)N for N large, meaning that the population decreases by an average factor of e⁻¹ at each round. And that it takes on average approximately log(N) iterations to reach a single individual. A simulation of the first probabilities of ending with one individual led me to the above curve, which wiggles in an almost periodic way around the probability ½, equal to the average of those probabilities. Using the R code

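rad=function(N){#next population size
  ut=sample(rep(2:N,2),1)#rep() avoids the scalar case of sample()
  for (i in 2:N)#sampling
   ut=c(ut,sample(rep((1:N)[-i],2),1))
  return(N-length(unique(ut)))}
M=100;T=1e3#largest population size and number of simulations, arbitrary values
sal=rep(0,M);sal[1]=1#sal[N]: probability of ending with one individual
for (N in 3:M){
 prop=0;
 for (t in 1:T){#one single step
   i=rad(N)
   if (i>0) prop=prop+sal[i]}
 sal[N]=prop/T}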

which exploits the previously computed probabilities. The variability is most interesting if unexpected, but looking back at Feller's sections and exercises on the classical occupancy problem, I could not find a connection with this problem. If it exists. Still, if N is large enough, the exclusion of one index from the selection becomes negligible and the probability of moving from n to m individuals should be approximately [Feller, eqn (2.4), p.102]

p_n(m)={n\choose m}\sum_{v=0}^{n-m} (-1)^v {n-m\choose v} \left(1-\frac{m+v}{n}\right)^n

This formula approximates the exact probability quite well, but as in a previous post about the birthday problem, it proves quite delicate to compute. As already noticed by Feller.
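A direct transcription of Feller's formula into R may help to see both its use and its fragility. This is only a sketch: the function name pfeller and the sizes n=10 and n=60 are assumptions, and the occupancy setting treats each of the n individuals as choosing uniformly among all n, that is, ignoring the exclusion of one's own index.

```r
# probability that exactly m of n individuals remain unselected
# when each of n individuals picks uniformly among n (occupancy approximation)
pfeller <- function(n, m) {
  v <- 0:(n - m)
  choose(n, m) * sum((-1)^v * choose(n - m, v) * (1 - (m + v) / n)^n)
}

n <- 10
p <- sapply(0:n, function(m) pfeller(n, m))
sum(p)                                  # a probability vector, up to rounding
c(sum((0:n) * p), n * (1 - 1 / n)^n)    # mean number left versus n(1-1/n)^n

# for larger n the alternating sum can lose most of its significant digits:
# compare with the exact value n!/n^n of the probability that no one is left
c(pfeller(60, 0), exp(lfactorial(60) - 60 * log(60)))
```

The terms of the alternating sum are many orders of magnitude larger than the small probabilities they add up to, which is presumably the difficulty hinted at above; computing on a log scale or with extended precision would be the natural workaround.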
