## truncated t’s [typo]

Posted in pictures, Statistics with tags , , , , , on March 14, 2014 by xi'an

Last night, I received this email from Piero Foscari (im Hamburg) about my moment derivations for the absolute and the positive t distribution:

There might be two typos in the final second moment formula and its derivation (assuming no silly symmetric mistakes in my validation code): the first ν ought to be -ν, and there should be a corresponding scaling factor also for the boundary μ in Pμ,ν-2 since it arises from a change of variable. Btw in the text reference to Fig. 2 |X| wasn’t updated to X+. I hope that this is of some use.

and I checked that indeed I had forgotten the scale factor ν/(ν-2) in the t distribution with ν-2 degrees of freedom as well as the sign… So I modified the note and rearXived it. Sorry about this lack of attention to the derivation!

## insufficient statistics for ABC model choice

Posted in Books, Kids, Statistics, University life with tags , , , , , , on February 12, 2014 by xi'an

Julien Stoehr, Pierre Pudlo, and Lionel Cucala (I3M, Montpellier) arXived yesterday a paper entitled “Geometric summary statistics for ABC model choice between hidden Gibbs random fields“. Julien had presented this work at the MCMski 4 poster session.  The move to a hidden Markov random field means that our original approach with Aude Grelaud does not apply: there is no dimension-reduction sufficient statistics in that case… The authors introduce a small collection of (four!) focussed statistics to discriminate between Potts models. They further define a novel misclassification rate, conditional on the observed value and derived from the ABC reference table. It is the predictive error rate

$\mathbb{P}^{\text{ABC}}(\hat{m}(Y)\ne m|S(y^{\text{obs}}))$

integrating in both the model index m and the corresponding random variable Y (and the hidden intermediary parameter) given the observation. Or rather the transform of the observation by the summary statistic S. In a simulation experiment, the paper shows that the predictive error rate decreases quite a lot by including 2 or 4 geometric summary statistics on top of the no-longer-sufficient concordance statistics. (I did not find how the distance is constructed and how it adapts to a larger number of summary statistics.)

[the ABC posterior probability of index m] uses the data twice: a first one to calibrate the set of summary statistics, and a second one to compute the ABC posterior.” (p.8)

It took me a while to understand the above quote. If we consider ABC model choice as we did in our original paper, it only and correctly uses the data once. However, if we select the vector of summary statistics based on an empirical performance indicator resulting from the data then indeed the procedure does use the data twice! Is there a generic way or trick to compensate for that, apart from cross-validation?

## convergence speeds

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , on December 5, 2013 by xi'an

While waiting for Jean-Michel to leave a thesis defence committee he was part of, I read this recently arXived survey by Novak and Rudolf, Computation of expectations by Markov chain Monte Carlo methods. The first part hinted at a sort of Bernoulli factory problem: when computing the expectation of f against the uniform distribution on G,

For x ∈ G we can compute f (x) and G is given by a membership oracle, i.e. we are able to check whether any x is in G or not.

However, the remainder of the paper does not get (in) that direction but recalls instead convergence results for MCMC schemes under various norms. Like spectral gap and Cheeger’s inequalities. So useful for a quick reminder, e.g. to my Monte Carlo Statistical Methods class Master students, but altogether well-known. The paper contains some precise bounds on the mean square error of the Monte Carlo approximation to the integral. For instance, for the hit-and-run algorithm, the uniform bound (for functions f bounded by 1) is

$9.5\cdot 10^{7}\dfrac{dr}{\sqrt{n}}+6.4\cdot 10^{15}\dfrac{d^2r^2}{n}$

where d is the dimension of the space and r a scale of the volume of G. For the Metropolis-Hastings algorithm, with (independent) uniform proposal on G, the bound becomes

$\dfrac{2C\alpha_dr^d}{n}+\dfrac{4C^2\alpha_d^2r^{2d}}{n^2}\,,$

where C is an upper bound on the target density (no longer the uniform). [I rephrased Theorem 2 by replacing vol(G) with the containing hyper-ball to connect both results, αd being the proportionality constant.] The paper also covers the case of the random walk Metropolis-Hastings algorithm, with the deceptively simple bound

$1089\dfrac{(d+1)\max\{\alpha,\sqrt{d+1}\}}{\sqrt{n}}+8.38\cdot 10^5\dfrac{(d+1)\max\{\alpha^2,d+1\}}{n}$

but this is in the special case when G is the ball of radius d. The paper concludes with a list of open problems.

## Importance sampling schemes for evidence approximation in mixture models

Posted in R, Statistics, University life with tags , , , , , , , , , on November 27, 2013 by xi'an

Jeong Eun (Kate) Lee and I completed this paper, “Importance sampling schemes for evidence approximation in mixture models“, now posted on arXiv. (With the customary one-day lag for posting, making me bemoan the days of yore when arXiv would give a definitive arXiv number at the time of submission.) Kate came twice to Paris in the past years to work with me on this evaluation of Chib’s original marginal likelihood estimate (also called the candidate formula by Julian Besag). And on the improvement proposed by Berkhof, van Mechelen, and Gelman (2003), based on averaging over all permutations, idea that we rediscovered in an earlier paper with Jean-Michel Marin. (And that Andrew seemed to have completely forgotten. Despite being the very first one to publish [in English] a paper on a Gibbs sampler for mixtures.) Given that this averaging can get quite costly, we propose a preliminary step to reduce the number of relevant permutations to be considered in the averaging, removing far-away modes that do not contribute to the Rao-Blackwell estimate and called dual importance sampling. We also considered modelling the posterior as a product of k-component mixtures on the components, following a vague idea I had in the back of my mind for many years, but it did not help. In the above boxplot comparison of estimators, the marginal likelihood estimators are

1. Chib’s method using T = 5000 samples with a permutation correction by multiplying by k!.
2. Chib’s method (1), using T = 5000 samples which are randomly permuted.
3. Importance sampling estimate (7), using the maximum likelihood estimate (MLE) of the latents as centre.
4. Dual importance sampling using q in (8).
5. Dual importance sampling using an approximate in (14).
6. Bridge sampling (3). Here, label switching is imposed in hyperparameters.

## ABC for design

Posted in Statistics with tags , , , , , , , on August 30, 2013 by xi'an

I wrote a comment on this arXived paper on simulation based design that starts from Müller (1999) and gets an ABC perspective a while ago on my iPad when travelling to Montpellier and then forgot to download it…

Hainy, [Wener] Müller, and Wagner recently arXived a paper called “Likelihood-free Simulation-based Optimal Design“, paper which relies on ABC to construct optimal designs . Remember that [Peter] Müller (1999) uses a natural simulated annealing that is quite similar to our MAP [SAME] algorithm with Arnaud Doucet and Simon Godsill, relying on multiple versions of the data set to get to the maximum. The paper also builds upon our 2006 JASA paper with my then PhD student Billy Amzal, Eric Parent, and Frederic Bois, paper that took advantage of the then emerging particle methods to improve upon a static horizon target. While our method is sequential in that it pursues a moving target, it does not rely on the generic methodology developed by del Moral et al. (2006), where a backward kernel brings more stability to the moves. The paper also implements a version of our population Monte Carlo ABC algorithm (Beaumont et al., 2009), as a first step before an MCMC simulation. Overall, the paper sounds more like a review than like a strongly directive entry into ABC based design in that it remains quite generic. Not that I have specific suggestions, mind!, but I fear a realistic implementation (as opposed to the linear model used in the paper) would require a certain amount of calibration. There are missing references of recent papers using ABC for design, including some by Michael Stumpf I think.

I did not know about the Kuck et al. reference… Which is reproducing our 2006 approach within the del Moral framework. It uses a continuous temperature scale that I find artificial and not that useful, again a maybe superficial comment as I didn’t get very much into the paper … Just that integer powers lead to multiples of the sample and have a nice algorithmic counterpart.