## insufficient statistics for ABC model choice

Posted in Books, Kids, Statistics, University life on February 12, 2014 by xi'an

Julien Stoehr, Pierre Pudlo, and Lionel Cucala (I3M, Montpellier) arXived yesterday a paper entitled “Geometric summary statistics for ABC model choice between hidden Gibbs random fields”. Julien had presented this work at the MCMski 4 poster session. The move to a hidden Markov random field means that our original approach with Aude Grelaud does not apply: there is no dimension-reducing sufficient statistic in that case… The authors introduce a small collection of (four!) focussed statistics to discriminate between Potts models. They further define a novel misclassification rate, conditional on the observed value and derived from the ABC reference table. It is the predictive error rate

$\mathbb{P}^{\text{ABC}}(\hat{m}(Y)\ne m|S(y^{\text{obs}}))$

integrating over both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather given the transform of the observation by the summary statistic S. In a simulation experiment, the paper shows that the predictive error rate decreases quite a lot by including 2 or 4 geometric summary statistics on top of the no-longer-sufficient concordance statistics. (I did not find how the distance is constructed and how it adapts to a larger number of summary statistics.)
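To fix ideas about such a conditional error rate, here is a minimal Python sketch on a toy problem chosen purely for illustration (Gaussian versus Laplace samples, not the hidden Potts setting of the paper). The summary statistics, the k-nearest-neighbour classifier, the unscaled Euclidean distance, and every function name are assumptions of the sketch, not the authors' construction: the reference table is simulated from the prior predictive, the entries closest to the observed summary stand for draws from the ABC posterior on (m, S(Y)), and the rate is the proportion of those entries that the classifier mislabels.

```python
import math
import random

def simulate(model, n, rng):
    """Draw mu from a N(0, 2^2) prior, then n observations from the chosen model."""
    mu = rng.gauss(0.0, 2.0)
    if model == 0:                      # model 0: Gaussian with unit variance
        return [rng.gauss(mu, 1.0) for _ in range(n)]
    b = 1.0 / math.sqrt(2.0)            # model 1: Laplace with unit variance
    return [mu + b * (rng.expovariate(1.0) - rng.expovariate(1.0)) for _ in range(n)]

def summary(y):
    """Hypothetical summaries: median, mean absolute deviation, standard deviation."""
    ys = sorted(y)
    n = len(ys)
    med = ys[n // 2] if n % 2 else 0.5 * (ys[n // 2 - 1] + ys[n // 2])
    mad = sum(abs(v - med) for v in ys) / n
    mean = sum(ys) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in ys) / n)
    return (med, mad, sd)

def reference_table(size, n, rng):
    """ABC reference table: (model index, summary) pairs from the prior predictive."""
    table = []
    for _ in range(size):
        m = rng.randrange(2)
        table.append((m, summary(simulate(m, n, rng))))
    return table

def nearest(table, s, k, skip=None):
    """Indices of the k table entries closest to s, optionally leaving one entry out."""
    idx = [i for i in range(len(table)) if i != skip]
    idx.sort(key=lambda i: math.dist(table[i][1], s))
    return idx[:k]

def classify(table, s, k, skip=None):
    """k-nearest-neighbour majority vote on the model index."""
    votes = sum(table[i][0] for i in nearest(table, s, k, skip))
    return 1 if 2 * votes > k else 0

def conditional_error(table, s_obs, k_abc, k_knn):
    """Share of the k_abc entries nearest to s_obs that the classifier gets wrong."""
    accepted = nearest(table, s_obs, k_abc)
    errors = sum(classify(table, table[i][1], k_knn, skip=i) != table[i][0]
                 for i in accepted)
    return errors / k_abc

rng = random.Random(1)
table = reference_table(1000, 30, rng)
s_obs = summary(simulate(1, 30, rng))   # pseudo-observed data from model 1
rate = conditional_error(table, s_obs, k_abc=100, k_knn=25)
print("conditional misclassification rate:", rate)
```

Each accepted entry is left out of the table when it is classified, so that it does not vote for itself. The unscaled distance is the crudest possible choice and leaves open the very question raised above about how the distance should adapt to more summary statistics.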

“[the ABC posterior probability of index m] uses the data twice: a first one to calibrate the set of summary statistics, and a second one to compute the ABC posterior.” (p.8)

It took me a while to understand the above quote. If we consider ABC model choice as we did in our original paper, it only and correctly uses the data once. However, if we select the vector of summary statistics based on an empirical performance indicator resulting from the data then indeed the procedure does use the data twice! Is there a generic way or trick to compensate for that, apart from cross-validation?

## convergence speeds

Posted in pictures, Running, Statistics, Travel, University life on December 5, 2013 by xi'an

While waiting for Jean-Michel to leave a thesis defence committee he was part of, I read this recently arXived survey by Novak and Rudolf, Computation of expectations by Markov chain Monte Carlo methods. The first part hinted at a sort of Bernoulli factory problem: when computing the expectation of f against the uniform distribution on G,

For x ∈ G we can compute f (x) and G is given by a membership oracle, i.e. we are able to check whether any x is in G or not.
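As a point of comparison, the setting in this quote can be handled naively by rejection sampling. The sketch below is plain rejection from a bounding cube, not one of the MCMC schemes of the survey. It assumes that G sits inside the cube [-r, r]^d, and the names `uniform_expectation` and `in_G` are made up for the example.

```python
import random

def uniform_expectation(f, in_G, d, r, n, rng):
    """Average f over n uniform draws on G, obtained by rejection from [-r, r]^d.

    Returns the estimate and the observed acceptance rate of the cube proposals.
    """
    total, accepted, proposed = 0.0, 0, 0
    while accepted < n:
        x = [rng.uniform(-r, r) for _ in range(d)]
        proposed += 1
        if in_G(x):                      # the only access to G is this oracle
            total += f(x)
            accepted += 1
    return total / n, accepted / proposed

# Toy check: G is the unit disk and f(x) = |x|^2, whose uniform expectation is 1/2.
rng = random.Random(0)
estimate, acceptance = uniform_expectation(
    f=lambda x: sum(v * v for v in x),
    in_G=lambda x: sum(v * v for v in x) <= 1.0,
    d=2, r=1.0, n=20000, rng=rng,
)
print(estimate, acceptance)
```

The acceptance rate is the ratio of the volume of G to the volume of the cube, which collapses as d grows when G is a ball. This is one reason to turn to Markov chains on G instead.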

However, the remainder of the paper does not go in that direction but recalls instead convergence results for MCMC schemes under various norms. Like spectral gap and Cheeger’s inequalities. So useful for a quick reminder, e.g. for the Master students in my Monte Carlo Statistical Methods class, but altogether well-known. The paper contains some precise bounds on the mean square error of the Monte Carlo approximation to the integral. For instance, for the hit-and-run algorithm, the uniform bound (for functions f bounded by 1) is

$9.5\cdot 10^{7}\dfrac{dr}{\sqrt{n}}+6.4\cdot 10^{15}\dfrac{d^2r^2}{n}$

where d is the dimension of the space and r a scale of the volume of G. For the Metropolis-Hastings algorithm, with (independent) uniform proposal on G, the bound becomes

$\dfrac{2C\alpha_dr^d}{n}+\dfrac{4C^2\alpha_d^2r^{2d}}{n^2}\,,$

where C is an upper bound on the target density (no longer the uniform). [I rephrased Theorem 2 by replacing vol(G) with the containing hyper-ball to connect both results, $\alpha_d$ being the proportionality constant.] The paper also covers the case of the random walk Metropolis-Hastings algorithm, with the deceptively simple bound

$1089\dfrac{(d+1)\max\{\alpha,\sqrt{d+1}\}}{\sqrt{n}}+8.38\cdot 10^5\dfrac{(d+1)\max\{\alpha^2,d+1\}}{n}$

but this is in the special case when G is the ball of radius d. The paper concludes with a list of open problems.
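To get a feel for the size of these constants, the first two bounds can be evaluated numerically. The sketch below simply codes the hit-and-run and independent Metropolis-Hastings expressions as displayed above. The function names are made up, and `alpha_d` is left as an input since its value is not spelled out here. The last function inverts the hit-and-run bound to find the smallest n that brings it below a target ε, by solving the quadratic in u = 1/√n.

```python
import math

def hit_and_run_bound(n, d, r):
    """Hit-and-run bound as displayed above (functions bounded by 1)."""
    return 9.5e7 * d * r / math.sqrt(n) + 6.4e15 * d**2 * r**2 / n

def independent_mh_bound(n, d, r, C, alpha_d):
    """Independent Metropolis-Hastings bound as displayed above."""
    t = C * alpha_d * r**d / n
    return 2.0 * t + 4.0 * t**2

def min_n_hit_and_run(eps, d, r):
    """Smallest n with hit_and_run_bound(n, d, r) <= eps.

    With u = 1/sqrt(n), the bound reads a*u + b*u^2; its positive root in u
    is written in the numerically stable form 2*eps / (a + sqrt(a^2 + 4*b*eps)).
    """
    a = 9.5e7 * d * r
    b = 6.4e15 * d**2 * r**2
    u = 2.0 * eps / (a + math.sqrt(a * a + 4.0 * b * eps))
    return math.ceil(1.0 / (u * u))

n_min = min_n_hit_and_run(eps=0.01, d=10, r=1.0)
print(n_min, hit_and_run_bound(n_min, 10, 1.0))
```

Taken at face value, with d=10, r=1 and a target of 0.01, this calls for an n of the order of 10^22, which says more about the conservativeness of explicit constants than about the practical behaviour of hit-and-run.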

## Importance sampling schemes for evidence approximation in mixture models

Posted in R, Statistics, University life on November 27, 2013 by xi'an

Jeong Eun (Kate) Lee and I completed this paper, “Importance sampling schemes for evidence approximation in mixture models”, now posted on arXiv. (With the customary one-day lag for posting, making me bemoan the days of yore when arXiv would give a definitive arXiv number at the time of submission.) Kate came twice to Paris in the past years to work with me on this evaluation of Chib’s original marginal likelihood estimate (also called the candidate formula by Julian Besag). And on the improvement proposed by Berkhof, van Mechelen, and Gelman (2003), based on averaging over all permutations, an idea that we rediscovered in an earlier paper with Jean-Michel Marin. (And that Andrew seemed to have completely forgotten. Despite being the very first one to publish [in English] a paper on a Gibbs sampler for mixtures.) Given that this averaging can get quite costly, we propose a preliminary step, called dual importance sampling, that reduces the number of relevant permutations to be considered in the averaging by removing far-away modes that do not contribute to the Rao-Blackwell estimate. We also considered modelling the posterior as a product of k-component mixtures on the components, following a vague idea I had in the back of my mind for many years, but it did not help. In the above boxplot comparison of estimators, the marginal likelihood estimators are

1. Chib’s method using T = 5000 samples with a permutation correction by multiplying by k!.
2. Chib’s method (1), using T = 5000 samples which are randomly permuted.
3. Importance sampling estimate (7), using the maximum likelihood estimate (MLE) of the latents as centre.
4. Dual importance sampling using q in (8).
5. Dual importance sampling using an approximate in (14).
6. Bridge sampling (3). Here, label switching is imposed in hyperparameters.

## ABC for design

Posted in Statistics on August 30, 2013 by xi'an

A while ago, on my iPad when travelling to Montpellier, I wrote a comment on this arXived paper on simulation-based design, which starts from Müller (1999) and takes an ABC perspective, and then forgot to download it…

Hainy, [Werner] Müller, and Wagner recently arXived a paper called “Likelihood-free Simulation-based Optimal Design”, a paper which relies on ABC to construct optimal designs. Remember that [Peter] Müller (1999) uses a natural simulated annealing that is quite similar to our MAP [SAME] algorithm with Arnaud Doucet and Simon Godsill, relying on multiple versions of the data set to get to the maximum. The paper also builds upon our 2006 JASA paper with my then PhD student Billy Amzal, Eric Parent, and Frederic Bois, a paper that took advantage of the then emerging particle methods to improve upon a static horizon target. While our method is sequential in that it pursues a moving target, it does not rely on the generic methodology developed by del Moral et al. (2006), where a backward kernel brings more stability to the moves. The paper also implements a version of our population Monte Carlo ABC algorithm (Beaumont et al., 2009), as a first step before an MCMC simulation. Overall, the paper sounds more like a review than like a strongly directive entry into ABC-based design in that it remains quite generic. Not that I have specific suggestions, mind!, but I fear a realistic implementation (as opposed to the linear model used in the paper) would require a certain amount of calibration. There are missing references to recent papers using ABC for design, including some by Michael Stumpf I think.

I did not know about the Kuck et al. reference… Which reproduces our 2006 approach within the del Moral framework. It uses a continuous temperature scale that I find artificial and not that useful, again a maybe superficial comment as I didn’t get very much into the paper… Just that integer powers lead to multiples of the sample and have a nice algorithmic counterpart.

## arXiv recent postings

Posted in Statistics, University life on June 14, 2013 by xi'an

As I glanced thru the recent arXiv postings in statistics, I found an overload of potential goodies, many more than I could handle in a reasonable time (unless I give up the NYT altogether!)…

For instance, Paulo Marques, a regular ‘Og commenter, wrote about an extension of Chib’s formula when the posterior is approximated in a non-parametric manner. (This was an idea I had toyed with at one time without pursuing it anywhere…) I wonder at the stability of the approximation for two reasons: (i) Chib’s or Bayes’ formula does not require an expectation as the ratio is constant in θ; averaging over realisations of θ could have negative effects on the stability of the approximation (or not); (ii) using a non-parametric estimate will see the performances of Chib’s approximation plummet as the dimension of θ increases, most likely.
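Point (i) is easy to check in a case where everything is available in closed form. The sketch below uses a toy conjugate model picked for the illustration (normal observations with unit variance and a N(0, τ²) prior on θ), which has nothing to do with the non-parametric setting of the paper; the function names and the data are made up. It evaluates the candidate's formula, log-likelihood plus log-prior minus log-posterior, at several values of θ.

```python
import math

def log_normal_pdf(x, mean, var):
    """Log-density of a N(mean, var) distribution at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_evidence_candidate(x, theta, tau2=4.0):
    """Candidate's (Chib's) formula in the conjugate model
    x_i ~ N(theta, 1), theta ~ N(0, tau2), where the posterior is exact."""
    n = len(x)
    post_var = 1.0 / (n + 1.0 / tau2)        # conjugate posterior variance
    post_mean = post_var * sum(x)            # conjugate posterior mean
    log_lik = sum(log_normal_pdf(xi, theta, 1.0) for xi in x)
    log_prior = log_normal_pdf(theta, 0.0, tau2)
    log_post = log_normal_pdf(theta, post_mean, post_var)
    return log_lik + log_prior - log_post

data = [0.3, -1.2, 0.8, 1.5, 0.1]            # made-up observations
values = [log_evidence_candidate(data, th) for th in (-1.0, 0.0, 0.5, 2.0)]
print(values)
```

The values agree up to floating-point error, whatever θ is plugged in. Once the exact posterior density is replaced by an estimate, the identity only holds approximately, and the quality of the approximation then depends on where θ is taken.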

Another post is by Nogales, Pérez, and Monfort and deals with a Monte Carlo formula for conditional expectations that leaves me quite puzzled… More than excited about a potential application in ABC. Indeed, the idea is to replace the conditioning on the (hard) equality with a soft ball around the equality with a tolerance level ε. Since the authors do not seem particularly interested in ABC, I do not see the point in this technique. The first example they present does not account for the increasing computational effort as ε decreases. The second part of the paper deals with the best invariant estimator of the position parameter of a general half-normal distribution, using two conditional expectations in Proposition 2. I wonder how the method compares with bridge sampling and MCMC methods. As the paper seems to have gone beyond the Monte Carlo issue at this stage. And to focus on this best invariant estimation…

Then Feroz, Hobson, Cameron and Pettitt posted a work on importance nested sampling, on which I need to spend more time given the connection to our nested sampling paper with Nicolas. The paper builds on the Multinest software developed by Feroz, Hobson and co-authors. (From which the above picture is borrowed.)

A side post on the moments of the partial non-central chi-square distribution function by Gill, Segura and Temme, since I always liked the specifics of this distribution… No comments so far!