**T**his post is a very preliminary announcement that Jukka Corander, Judith Rousseau and myself are planning an ABC in Svalbard workshop in 2021, on 12-13 April, following the “ABC in…” franchise that started in 2009 in Paris… It would be great to hear expressions of interest from potential participants towards scaling the booking accordingly. (While this is a sequel to the highly productive ABCruise of two years ago, between Helsinki and Stockholm, the meeting will take place in Longyearbyen, Svalbard, and participants will have to fly there from either Oslo or Tromsø, Norway, as boat cruises from Iceland or Greenland start later in the year. Note also that in mid-April, being 80° North, Svalbard enjoys more than 18 hours of sunlight and that the average temperature last April was -3.9°C with a high of 4°C.) The scientific committee should be constituted very soon, but we already welcome proposals for sessions (and sponsoring, quite obviously!).

## Archive for ABCruise

## revisiting the Gelman-Rubin diagnostic

Posted in Books, pictures, Statistics, Travel, University life with tags ABCruise, asymptotic variance, convergence diagnostics, effective sample size, Gelman-Rubin statistic, Gulf of Bothnia, independence, MCMC, MCMC convergence, Monte Carlo Statistical Methods, stopping rule, subsampling, sunset, Titanic on January 23, 2019 by xi'an

**J**ust before Xmas, Dootika Vats (Warwick) and Christina Knudson arXived a paper on a re-evaluation of the ultra-popular 1992 Gelman and Rubin MCMC convergence diagnostic. Which compares within-variance and between-variance on parallel chains started from hopefully dispersed initial values. Or equivalently an under-estimating and an over-estimating estimate of the MCMC average. In this paper, the authors take advantage of the variance estimators developed by Galin Jones, James Flegal, Dootika Vats and co-authors, which are batch mean estimators consistently estimating the asymptotic variance. They also discuss the choice of a cut-off on the ratio R of variance estimates, i.e., how close to one it needs to be, by relating R to the effective sample size (for which we also have reservations), which gives another way of calibrating the cut-off. The main conclusion of the study is that the recommended 1.1 bound is too large for a reasonable proximity to the true value of the Bayes estimator.

*(Disclaimer: The above ABCruise header is unrelated to the paper, apart from its use of the Titanic dataset!)*
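For readers who want to see what the ratio R compares, here is a minimal Python sketch of the textbook form of the diagnostic. It is not the version studied in the paper: it uses plain sample variances rather than batch mean estimators, and it drops the degrees-of-freedom correction of the 1992 proposal. The function names `gelman_rubin` and `ar1_chain` and the toy AR(1) chains are assumptions made for illustration only.

```python
import random
import statistics


def gelman_rubin(chains):
    """Simplified potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    # W: average of the within-chain sample variances
    within = statistics.fmean([statistics.variance(c) for c in chains])
    # B/n: sample variance of the chain means
    between_over_n = statistics.variance([statistics.fmean(c) for c in chains])
    # Pooled variance estimate combining within and between components
    pooled = (n - 1) / n * within + between_over_n
    return (pooled / within) ** 0.5


def ar1_chain(n, rho, start, rng):
    """Toy AR(1) chain x_t = rho * x_{t-1} + N(0, 1), standing in for MCMC output."""
    x, out = start, []
    for _ in range(n):
        x = rho * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


rng = random.Random(0)
starts = (-10.0, 0.0, 10.0, 20.0)
# Chains that forget their dispersed starting values
mixed = [ar1_chain(2000, 0.5, s, rng) for s in starts]
# Chains that each stay around their own starting value
stuck = [[s + rng.gauss(0.0, 1.0) for _ in range(2000)] for s in starts]

r_mixed = gelman_rubin(mixed)
r_stuck = gelman_rubin(stuck)
print(f"R (mixing chains): {r_mixed:.4f}")
print(f"R (stuck chains):  {r_stuck:.4f}")
```

In this toy setting the mixing chains give a ratio very close to one and the stuck chains a ratio far above any sensible cut-off; the question raised by the paper is how close to one is close enough.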

In fact, I have other difficulties than setting the cut-off point with the original scheme as a way to assess MCMC convergence or lack thereof, among which

- its dependence on the parameterisation of the chain and on the estimation of a specific target function
- its dependence on the starting distribution which makes the time to convergence not absolutely meaningful
- the confusion between getting to stationarity and exploring the whole target
- its missing the option to resort to subsampling schemes to attain pseudo-independence or scale time to convergence (albeit see the third point above)
- a potential bias brought by the stopping rule.

## Takaisin Helsinkiin [back to Helsinki]

Posted in pictures, Statistics, Travel with tags ABCruise, conference, EMS 2017, Europe, ferry harbour, Finland, folded Markov chain, Helsinki, North, Randal Douc, Scandinavia on July 23, 2017 by xi'an

**I** am off tomorrow morning to Helsinki for the European Meeting of Statisticians (EMS 2017). Where I will talk on how to handle multiple estimators in Monte Carlo settings (although I have not made enough progress in this direction to include anything truly novel in the talk!) Here are the slides:

I look forward to this meeting, as I remember quite fondly the previous one I attended in Budapest. Which was of the highest quality in terms of talks and interactions. (I also remember working hard with Randal Douc on a yet-unfinished project!)

## ABC random forests for Bayesian parameter inference

Posted in Books, Kids, R, Statistics, Travel, University life, Wines with tags ABC approximation error, ABC in Helsinki, abcrf, ABCruise, arXiv, Baltic Sea, Bayesian inference, Gulf of Bothnia, Helsinki, Lapin Kulta, out-of-bag correction, R, random forests, reference table, sunrise on May 20, 2016 by xi'an

**B**efore leaving Helsinki, we arXived [from the Air France lounge!] the paper Jean-Michel presented on Monday at ABCruise in Helsinki. This paper summarises the experiments Louis conducted over the past months to assess the great performances of a random forest regression approach to ABC parameter inference. Thus validating in this experimental sense the use of this new approach to conducting ABC for Bayesian inference by random forests. (And not ABC model choice as in the Bioinformatics paper with Pierre Pudlo and others.)

I think the major incentives in exploiting the (still mysterious) tool of random forests [against more traditional ABC approaches like Fearnhead and Prangle (2012) on summary selection] are that (a) forests do not require a preliminary selection of the summary statistics, since an arbitrary number of summaries can be used as input for the random forest, even when including a large number of useless white noise variables; (b) there is no longer a tolerance level involved in the process, since the many trees in the random forest define a natural if rudimentary distance that corresponds to being or not being in the same leaf as the observed vector of summary statistics η(y); (c) the size of the reference table simulated from the prior (predictive) distribution does not need to be as large as in usual ABC settings and hence this approach leads to significant gains in computing time since the production of the reference table usually is the costly part! To the point that deriving a different forest for each univariate transform of interest is truly a minor drag in the overall computing cost of the approach.

An intriguing point we uncovered through Louis’ experiments is that an unusual version of the variance estimator is preferable to the standard estimator: we indeed observed better estimation performance when using a weighted version of the out-of-bag residuals (which are computed as the differences between the simulated value of the parameter transforms and their expectation obtained by removing the random trees involving this simulated value). Another intriguing feature [to me] is that the regression weights as proposed by Meinshausen (2006) are obtained as an average of the inverse of the number of terms in the leaf of interest. When estimating the posterior expectation of a transform h(θ) given the observed η(y), this summary statistic η(y) ends up in a given leaf for each tree in the forest and all that matters for computing the weight is the number of points from the reference table ending up in this very leaf. I do find this difficult to explain when comparing the case when many simulated points are in the leaf with the case when a single simulated point makes the leaf. This single point ends up being much more influential than all the points in the other situation… While being an outlier of sorts against the prior simulation. But now that I think more about it (after an expensive Lapin Kulta beer in the Helsinki airport while waiting for a change of tire on our airplane!), it somewhat makes sense that rare simulations that agree with the data should be weighted much more than values that stem from the prior simulations and hence do not carry much of the information brought by the observation. (If this sounds murky, blame the beer.) What I found great about this new approach is that it produces a non-parametric evaluation of the cdf of the quantity of interest h(θ) at no calibration cost or hardly any. (An R package is in the making, to be added to the existing R functions of abcrf we developed for the ABC model choice paper.)
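To make the weighting scheme concrete, here is a small Python sketch of leaf-based weights in the spirit of the description above: for each tree, every reference point sharing the leaf of η(y) receives the inverse of the leaf size, and these contributions are averaged over trees. This is only an illustration, not the abcrf implementation; the names `forest_weights` and `weighted_cdf`, the hand-written leaf assignments, and the toy values of h(θ) are all hypothetical, and bootstrap multiplicities are ignored.

```python
def forest_weights(ref_leaves, obs_leaves):
    """Average over trees of 1/(leaf size) for points sharing the observed leaf.

    ref_leaves[b][i] is the leaf id of reference point i in tree b,
    obs_leaves[b] is the leaf id reached by the observed summary in tree b.
    """
    n_trees = len(ref_leaves)
    weights = [0.0] * len(ref_leaves[0])
    for leaves, obs_leaf in zip(ref_leaves, obs_leaves):
        members = [i for i, leaf in enumerate(leaves) if leaf == obs_leaf]
        for i in members:
            weights[i] += 1.0 / (len(members) * n_trees)
    return weights


def weighted_cdf(values, weights, t):
    """Weighted empirical cdf of the simulated h(theta) values at t."""
    return sum(w for v, w in zip(values, weights) if v <= t)


# Toy reference table of five simulated h(theta) values (hypothetical numbers)
h_theta = [0.1, 0.4, 0.5, 0.9, 0.45]
# Hand-made leaf assignments for two trees (hypothetical, not from a fitted forest)
ref_leaves = [
    [0, 0, 0, 0, 1],  # tree 1: observation shares leaf 0 with four points
    [0, 0, 1, 1, 2],  # tree 2: observation shares leaf 2 with a single point
]
obs_leaves = [0, 2]

weights = forest_weights(ref_leaves, obs_leaves)
post_mean = sum(w * v for w, v in zip(weights, h_theta))
print(weights)
print(post_mean, weighted_cdf(h_theta, weights, 0.45))
```

In this toy example the last reference point, alone in its leaf with the observation in the second tree, carries half of the total weight, which is exactly the behaviour discussed above.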