## discussions on Gerber and Chopin

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , , , , , , , on May 29, 2015 by xi'an

As a coincidence, I received my copy of JRSS Series B with the Read Paper by Mathieu Gerber and Nicolas Chopin on sequential quasi Monte Carlo just as I was preparing an arXival of a few discussions on the paper! Among the [numerous and diverse] discussions, a few were of particular interest to me [I highlighted members of the University of Warwick and of Université Paris-Dauphine to suggest potential biases!]:

1. Mike Pitt (Warwick), Murray Pollock et al.  (Warwick) and Finke et al. (Warwick) all suggested combining quasi Monte Carlo with pseudomarginal Metropolis-Hastings, pMCMC (Pitt) and Rao-Bklackwellisation (Finke et al.);
2. Arnaud Doucet pointed out that John Skilling had used the Hilbert (ordering) curve in a 2004 paper;
3. Chris Oates, Dan Simpson and Mark Girolami (Warwick) suggested combining quasi Monte Carlo with their functional control variate idea;
4. Richard Everitt wondered about the dimension barrier of d=6 and about possible slice extensions;
5. Zhijian He and Art Owen pointed out simple solutions to handle a random number of uniforms (for simulating each step in sequential Monte Carlo), namely to start with quasi Monte Carlo and end up with regular Monte Carlo, in an hybrid manner;
6. Hans Künsch points out the connection with systematic resampling à la Carpenter, Clifford and Fearnhead (1999) and wonders about separating the impact of quasi Monte Carlo between resampling and propagating [which vaguely links to one of my comments];
7. Pierre L’Ecuyer points out a possible improvement over the Hilbert curve by a preliminary sorting;
8. Frederik Lindsten and Sumeet Singh propose using ABC to extend the backward smoother to intractable cases [but still with a fixed number of uniforms to use at each step], as well as Mateu and Ryder (Paris-Dauphine) for a more general class of intractable models;
9. Omiros Papaspiliopoulos wonders at the possibility of a quasi Markov chain with “low discrepancy paths”;
10. Daniel Rudolf suggest linking the error rate of sequential quasi Monte Carlo with the bounds of Vapnik and Ĉervonenkis (1977).

The arXiv document also includes the discussions by Julyan Arbel and Igor Prünster (Turino) on the Bayesian nonparametric side of sqMC and by Robin Ryder (Dauphine) on the potential of sqMC for ABC.

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , , , , on May 27, 2015 by xi'an

Another trip in the métro today (to work with Pierre Jacob and Lawrence Murray in a Paris Anticafé!, as the University was closed) led me to infer—warning!, this is not the exact distribution!—the distribution of x, namely

$f(x|N) = \frac{4^p}{4^{\ell+2p}} {\ell+p \choose p}\,\mathbb{I}_{N=\ell+2p}$

since a path x of length l(x) will corresponds to N draws if N-l(x) is an even integer 2p and p undistinguishable annihilations in 4 possible directions have to be distributed over l(x)+1 possible locations, with Feller’s number of distinguishable distributions as a result. With a prior π(N)=1/N on N, hence on p, the posterior on p is given by

$\pi(p|x) \propto 4^{-p} {\ell+p \choose p} \frac{1}{\ell+2p}$

Now, given N and  x, the probability of no annihilation on the last round is 1 when l(x)=N and in general

$\frac{4^p}{4^{\ell+2p}}{\ell-1+p \choose p}\big/\frac{4^p}{4^{\ell+2p}}{\ell+p \choose p}=\frac{\ell}{\ell+p}=\frac{2\ell}{N+\ell}$

which can be integrated against the posterior. The numerical expectation is represented for a range of values of l(x) in the above graph. Interestingly, the posterior probability is constant for l(x) large  and equal to 0.8125 under a flat prior over N.

Getting back to Pierre Druilhet’s approach, he sets a flat prior on the length of the path θ and from there derives that the probability of annihilation is about 3/4. However, “the uniform prior on the paths of lengths lower or equal to M” used for this derivation which gives a probability of length l proportional to 3l is quite different from the distribution of l(θ) given a number of draws N. Which as shown above looks much more like a Binomial B(N,1/2).

However, being not quite certain about the reasoning involving Fieller’s trick, I ran an ABC experiment under a flat prior restricted to (l(x),4l(x)) and got the above, where the histogram is for a posterior sample associated with l(x)=195 and the gold curve is the potential posterior. Since ABC is exact in this case (i.e., I only picked N’s for which l(x)=195), ABC is not to blame for the discrepancy! I asked about the distribution on Stack Exchange maths forum (and a few colleagues here as well) but got no reply so far… Here is the R code that goes with the ABC implementation:

#observation:
elo=195
#ABC version
T=1e6
el=rep(NA,T)
N=sample(elo:(4*elo),T,rep=TRUE)
for (t in 1:T){
#generate a path
paz=sample(c(-(1:2),1:2),N[t],rep=TRUE)
#eliminate U-turns
uturn=paz[-N[t]]==-paz[-1]
while (sum(uturn>0)){
uturn[-1]=uturn[-1]*(1-
uturn[-(length(paz)-1)])
uturn=c((1:(length(paz)-1))[uturn==1],
(2:length(paz))[uturn==1])
paz=paz[-uturn]
uturn=paz[-length(paz)]==-paz[-1]
}
el[t]=length(paz)}
#subsample to get exact posterior
poster=N[abs(el-elo)==0]


## speed seminar-ing

Posted in Books, pictures, Statistics, Travel, University life, Wines with tags , , , , , , , , , , on May 20, 2015 by xi'an

Yesterday, I  made a quick afternoon trip to Montpellier as replacement of a seminar speaker who had cancelled at the last minute. Most obviously, I gave a talk about our “testing as mixture” proposal. And as previously, the talk generated a fair amount of discussion and feedback from the audience. Providing me with additional aspects to include in a revision of the paper. Whether or not the current submission is rejected, new points made and received during those seminars will have to get in a revised version as they definitely add to the appeal to the perspective. In that seminar, most of the discussion concentrated on the connection with decisions based on such a tool as the posterior distribution of the mixture weight(s). My argument for sticking with the posterior rather than providing a hard decision rule was that the message is indeed in arguing hard rules that end up mimicking the p- or b-values. And the catastrophic consequences of fishing for significance and the like. Producing instead a validation by simulating under each model pseudo-samples shows what to expect for each model under comparison. The argument did not really convince Jean-Michel Marin, I am afraid! Another point he raised was that we could instead use a distribution on α with support {0,1}, to avoid the encompassing model he felt was too far from the original models. However, this leads back to the Bayes factor as the weights in 0 and 1 are the marginal likelihoods, nothing more. However, this perspective on the classical approach has at least the appeal of completely validating the use of improper priors on common (nuisance or not) parameters. Pierre Pudlo also wondered why we could not conduct an analysis on the mixture of the likelihoods. Instead of the likelihood of the mixture. My first answer was that there was not enough information in the data for estimating the weight(s). A few more seconds of reflection led me to the further argument that the posterior on α with support (0,1) would then be a mixture of Be(2,1) and Be(1,2) with weights the marginal likelihoods, again (under a uniform prior on α). So indeed not much to gain. A last point we discussed was the case of the evolution trees we analyse with population geneticists from the neighbourhood (and with ABC). Jean-Michel’s argument was that the scenari under comparison were not compatible with a mixture, the models being exclusive. My reply involved an admixture model that contained all scenarios as special cases. After a longer pondering, I think his objection was more about the non iid nature of the data. But the admixture construction remains valid. And makes a very strong case in favour of our approach, I believe.

After the seminar, Christian Lavergne and Jean-Michel had organised a doubly exceptional wine-and-cheese party: first because it is not usually the case there is such a post-seminar party and second because they had chosen a terrific series of wines from the Mas Bruguière (Pic Saint-Loup) vineyards. Ending up with a great 2007 L’Arbouse. Perfect ending for an exciting day. (I am not even mentioning a special Livarot from close to my home-town!)

## ABC and cosmology

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on May 4, 2015 by xi'an

Two papers appeared on arXiv in the past two days with the similar theme of applying ABC-PMC [one version of which we developed with Mark Beaumont, Jean-Marie Cornuet, and Jean-Michel Marin in 2009] to cosmological problems. (As a further coincidence, I had just started refereeing yet another paper on ABC-PMC in another astronomy problem!) The first paper cosmoabc: Likelihood-free inference via Population Monte Carlo Approximate Bayesian Computation by Ishida et al. [“et al” including Ewan Cameron] proposes a Python ABC-PMC sampler with applications to galaxy clusters catalogues. The paper is primarily a description of the cosmoabc package, including code snapshots. Earlier occurrences of ABC in cosmology are found for instance in this earlier workshop, as well as in Cameron and Pettitt earlier paper. The package offers a way to evaluate the impact of a specific distance, with a 2D-graph demonstrating that the minimum [if not the range] of the simulated distances increases with the parameters getting away from the best parameter values.

“We emphasis [sic] that the choice of the distance function is a crucial step in the design of the ABC algorithm and the reader must check its properties carefully before any ABC implementation is attempted.” E.E.O. Ishida et al.

The second [by one day] paper Approximate Bayesian computation for forward modelling in cosmology by Akeret et al. also proposes a Python ABC-PMC sampler, abcpmc. With fairly similar explanations: maybe both samplers should be compared on a reference dataset. While I first thought the description of the algorithm was rather close to our version, including the choice of the empirical covariance matrix with the factor 2, it appears it is adapted from a tutorial in the Journal of Mathematical Psychology by Turner and van Zandt. One out of many tutorials and surveys on the ABC method, of which I was unaware, but which summarises the pre-2012 developments rather nicely. Except for missing Paul Fearnhead’s and Dennis Prangle’s semi-automatic Read Paper. In the abcpmc paper, the update of the covariance matrix is the one proposed by Sarah Filippi and co-authors, which includes an extra bias term for faraway particles.

“For complex data, it can be difficult or computationally expensive to calculate the distance ρ(x; y) using all the information available in x and y.” Akeret et al.

In both papers, the role of the distance is stressed as being quite important. However, the cosmoabc paper uses an L1 distance [see (2) therein] in a toy example without normalising between mean and variance, while the abcpmc paper suggests using a Mahalanobis distance that turns the d-dimensional problem into a comparison of one-dimensional projections.

## extending ABC to high dimensions via Gaussian copula

Posted in Books, pictures, Statistics, Travel, Uncategorized, University life with tags , , , on April 28, 2015 by xi'an

Li, Nott, Fan, and Sisson arXived last week a new paper on ABC methodology that I read on my way to Warwick this morning. The central idea in the paper is (i) to estimate marginal posterior densities for the components of the model parameter by non-parametric means; and (ii) to consider all pairs of components to deduce the correlation matrix R of the Gaussian (pdf) transform of the pairwise rank statistic. From those two low-dimensional estimates, the authors derive a joint Gaussian-copula distribution by using inverse  pdf transforms and the correlation matrix R, to end up with a meta-Gaussian representation

$f(\theta)=\dfrac{1}{|R|^{1/2}}\exp\{\eta^\prime(I-R^{-1})\eta/2\}\prod_{i=1}^p g_i(\theta_i)$

where the η’s are the Gaussian transforms of the inverse-cdf transforms of the θ’s,that is,

$\eta_i=\Phi^{-1}(G_i(\theta_i))$

Or rather

$\eta_i=\Phi^{-1}(\hat{G}_i(\theta_i))$

given that the g’s are estimated.

This is obviously an approximation of the joint in that, even in the most favourable case when the g’s are perfectly estimated, and thus the components perfectly Gaussian, the joint is not necessarily Gaussian… But it sounds quite interesting, provided the cost of running all those transforms is not overwhelming. For instance, if the g’s are kernel density estimators, they involve sums of possibly a large number of terms.

One thing that bothers me in the approach, albeit mostly at a conceptual level for I realise the practical appeal is the use of different summary statistics for approximating different uni- and bi-dimensional marginals. This makes for an incoherent joint distribution, again at a conceptual level as I do not see immediate practical consequences… Those local summaries also have to be identified, component by component, which adds another level of computational cost to the approach, even when using a semi-automatic approach as in Fernhead and Prangle (2012). Although the whole algorithm relies on a single reference table.

The examples in the paper are (i) the banana shaped “Gaussian” distribution of Haario et al. (1999) that we used in our PMC papers, with a twist; and (ii) a g-and-k quantile distribution. The twist in the banana (!) is that the banana distribution is the prior associated with the mean of a Gaussian observation. In that case, the meta-Gaussian representation seems to hold almost perfectly, even in p=50 dimensions. (If I remember correctly, the hard part in analysing the banana distribution was reaching the tails, which are extremely elongated in at least one direction.) For the g-and-k quantile distribution, the same holds, even for a regular ABC. What seems to be of further interest would be to exhibit examples where the meta-Gaussian is clearly an approximation. If such cases exist.

## scalable Bayesian inference for the inverse temperature of a hidden Potts model

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on April 7, 2015 by xi'an

Matt Moores, Tony Pettitt, and Kerrie Mengersen arXived a paper yesterday comparing different computational approaches to the processing of hidden Potts models and of the intractable normalising constant in the Potts model. This is a very interesting paper, first because it provides a comprehensive survey of the main methods used in handling this annoying normalising constant Z(β), namely pseudo-likelihood, the exchange algorithm, path sampling (a.k.a., thermal integration), and ABC. A massive simulation experiment with individual simulation times up to 400 hours leads to select path sampling (what else?!) as the (XL) method of choice. Thanks to a pre-computation of the expectation of the sufficient statistic E[S(Z)|β].  I just wonder why the same was not done for ABC, as in the recent Statistics and Computing paper we wrote with Matt and Kerrie. As it happens, I was actually discussing yesterday in Columbia of potential if huge improvements in processing Ising and Potts models by approximating first the distribution of S(X) for some or all β before launching ABC or the exchange algorithm. (In fact, this is a more generic desiderata for all ABC methods that simulating directly if approximately the summary statistics would being huge gains in computing time, thus possible in final precision.) Simulating the distribution of the summary and sufficient Potts statistic S(X) reduces to simulating this distribution with a null correlation, as exploited in Cucala and Marin (2013, JCGS, Special ICMS issue). However, there does not seem to be an efficient way to do so, i.e. without reverting to simulating the entire grid X…

## MCMskv, Lenzerheide, Jan. 5-7, 2016

Posted in Kids, Mountains, pictures, R, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on March 31, 2015 by xi'an

Following the highly successful [authorised opinion!, from objective sources] MCMski IV, in Chamonix last year, the BayesComp section of ISBA has decided in favour of a two-year period, which means the great item of news that next year we will meet again for MCMski V [or MCMskv for short], this time on the snowy slopes of the Swiss town of Lenzerheide, south of Zürich. The committees are headed by the indefatigable Antonietta Mira and Mark Girolami. The plenary speakers have already been contacted and Steve Scott (Google), Steve Fienberg (CMU), David Dunson (Duke), Krys Latuszynski (Warwick), and Tony Lelièvre (Mines, Paris), have agreed to talk. Similarly, the nine invited sessions have been selected and will include Hamiltonian Monte Carlo,  Algorithms for Intractable Problems (ABC included!), Theory of (Ultra)High-Dimensional Bayesian Computation, Bayesian NonParametrics, Bayesian Econometrics,  Quasi Monte Carlo, Statistics of Deep Learning, Uncertainty Quantification in Mathematical Models, and Biostatistics. There will be afternoon tutorials, including a practical session from the Stan team, tutorials for which call is open, poster sessions, a conference dinner at which we will be entertained by the unstoppable Imposteriors. The Richard Tweedie ski race is back as well, with a pair of Blossom skis for the winner!

As in Chamonix, there will be parallel sessions and hence the scientific committee has issued a call for proposals to organise contributed sessions, tutorials and the presentation of posters on particularly timely and exciting areas of research relevant and of current interest to Bayesian Computation. All proposals should be sent to Mark Girolami directly by May the 4th (be with him!).