## Archive for the University life Category

## advanced computational methods for complex models in Biology [talk]

Posted in Books, pictures, Statistics, Travel, University life with tags ABC, Bayesian computing, Biology, coalescent, computational biology, England, EPSRC, expectation-propagation, London, random forests, UCL, University College London, Wright-Fisher model on September 29, 2016 by xi'an

**H**ere are the slides of the presentation I gave at the EPSRC Advanced Computational methods for complex models in Biology at University College London, last week. Introducing random forests as proper summaries for both model choice and parameter estimation (with considerable overlap with earlier slides, obviously!). The other talks of that highly interesting day on computational Biology were mostly about ancestral graphs, using Wright-Fisher diffusions for coalescents, plus a comparison of expectation-propagation and ABC on a genealogy model by Mark Beaumont and the decision theoretic approach to HMM order estimation by Chris Holmes. In addition, it gave me the opportunity to come back to the Department of Statistics at UCL more than twenty years after my previous visit, at a time when my friend Costas Goutis was still there. And to realise it had moved from its historical premises years ago. (I wonder what happened to the two staircases built to reduce frictions between Fisher and Pearson if I remember correctly…)

## Bayesian model selection without evidence

Posted in Books, Statistics, University life with tags Bayes factor, Bayesian computation, evidence, Metropolis-Hastings algorithm, Monte Carlo Statistical Methods, normalising constant, Peter Green, reversible jump MCMC on September 20, 2016 by xi'an

“The new method circumvents the challenges associated with accurate evidence calculations by computing posterior odds ratios using Bayesian parameter estimation”

**O**ne paper leading to another, I had a look at the Hee et al. (2015) paper on Bayes factor estimation. The “novelty” stands in introducing the model index as an extra parameter in a single model encompassing all models under comparison, the “new” parameterisation being in (θ,n) rather than in θ. With the distinction that the parameter θ is now made of the *union* of all parameters across all models. Which reminds us very much of the Carlin and Chib (1995) approach to the problem. (Peter Green in his Biometrika (1995) paper on reversible jump MCMC uses instead a *direct sum* of parameter spaces.) The authors indeed suggest simulating jointly (θ,n) in an MCMC or nested sampling scheme. Rather than being updated by arbitrary transforms as in Carlin and Chib (1995) the useless parameters from the other models are kept constant… The goal being to estimate P(n|D) the marginal posterior on the model index, aka the posterior probability of model n.

Now, I am not quite certain keeping the other parameters constant is a valid move: given a uniform prior on n and an equally uniform proposal, the acceptance probability simplifies into the regular Metropolis-Hastings ratio for model n. Hence the move is valid within model n. If not, I presume the previous pair (θ⁰,n⁰) is repeated. Wait! Actually, this is slightly more elaborate: if a new value of n, m, is proposed, then the acceptance ratio involves the posteriors for both n⁰ and m, possibly only the likelihoods when the proposal is the prior. So the move will directly depend on the likelihood ratio in this simplified case, which indicates the scheme could be correct after all. Except that this neglects the measure theoretic subtleties that led to reversible jump symmetry and hence makes me wonder. In other words, it follows exactly the same pattern as reversible jump without the constraints of the latter… Free lunch, anyone?!
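To fix ideas, here is a minimal Python sketch of a sampler on the pair (θ,n) for a toy problem of my own choosing, not the authors' code. Model 0 is N(0,1) with no free parameter, model 1 is N(θ,1), and I assume the same N(0,1) prior on θ under both models so that the joint target is well defined. The model move proposes the other index at fixed θ and, with a uniform prior on n, is accepted on the likelihood ratio alone. One deliberate departure from the scheme discussed above: while the chain sits in model 0, I refresh the unused θ from its prior (in the spirit of Carlin and Chib with a pseudo-prior equal to the prior) instead of freezing it, since I would expect a frozen value to mix poorly in this toy case. All function names are hypothetical.

```python
import math
import random


def log_lik(theta, n, data):
    """Gaussian log-likelihood: mean 0 under model 0, mean theta under model 1."""
    mean = theta if n == 1 else 0.0
    return sum(-0.5 * (x - mean) ** 2 - 0.5 * math.log(2 * math.pi) for x in data)


def log_prior(theta):
    """Standard normal log-density, assumed as prior on theta under both models."""
    return -0.5 * theta ** 2 - 0.5 * math.log(2 * math.pi)


def joint_sampler(data, n_iter=20000, step=0.5, seed=1):
    """Estimate P(n=1 | data) by the frequency of visits to model 1."""
    rng = random.Random(seed)
    theta, n, visits = 0.0, 0, 0
    for _ in range(n_iter):
        # Model move: propose the other index, theta unchanged.
        # Uniform prior on n and symmetric proposal: likelihood ratio only.
        m = 1 - n
        log_ratio = log_lik(theta, m, data) - log_lik(theta, n, data)
        if math.log(1.0 - rng.random()) < log_ratio:
            n = m
        if n == 1:
            # Random-walk Metropolis-Hastings on theta within model 1.
            prop = theta + rng.gauss(0.0, step)
            log_ratio = (log_prior(prop) + log_lik(prop, 1, data)
                         - log_prior(theta) - log_lik(theta, 1, data))
            if math.log(1.0 - rng.random()) < log_ratio:
                theta = prop
        else:
            # theta is unused by model 0: refresh it from its prior
            # (an assumption of this sketch, not the "kept constant" rule).
            theta = rng.gauss(0.0, 1.0)
        visits += n
    return visits / n_iter


def exact_posterior_prob(data):
    """Closed-form P(n=1 | data) for this conjugate toy problem."""
    size, total = len(data), sum(data)
    bf10 = math.exp(total ** 2 / (2 * (size + 1))) / math.sqrt(size + 1)
    return bf10 / (1.0 + bf10)
```

Comparing `joint_sampler(data)` with `exact_posterior_prob(data)` on a small dataset gives a quick check that the visit frequency does approximate the posterior probability of the model in this conjugate setting.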

## snapshots from Nature

Posted in Books, Kids, pictures, University life with tags cortex, Cuba, evolution, Frankenstein, honeybird, Mary Shelley, Nature, Pierre Simon de Laplace, Poisson distribution, Wold Cup, Zika on September 19, 2016 by xi'an

**A**mong many interesting things I read from the pile of Nature issues that had accumulated over a month of travelling, with a warning these are mostly “old” news by now!:

- the very special and untouched case of Cuba in terms of the Zika epidemics, thanks to a long term policy fighting mosquitoes at all levels of the society;
- an impressive map of the human cortex, whose statistical analysis would be fascinating;
- an excerpt from Nature 13 August 1966 where the Poisson distribution was said to describe the distribution of scores during the 1966 World Cup;
- an analysis of a genetic experiment on evolution involving 50,000 generations (!) of Escherichia coli;
- a look back at the great novel Flowers for Algernon, a novel I read eons ago;
- a Nature paper on the first soft robot, or octobot, along with some easier introduction, which did not tell which kind of operations could be accomplished by such a robot;
- a vignette on a Science paper about the interaction between honey hunters and hunting birds, which I also heard depicted on the French National Radio, with an experiment comparing the actual hunting (human) song, a basic sentence in the local language, and the imitation of the song of another bird. I could not understand why the experiment did not include hunting songs from other hunting groups, as they are highly different but just as effective. It would have helped in understanding how innate the reaction of the bird is;
- another literary entry at the science behind Mary Shelley’s Frankenstein;
- a study of the Mathematical Genealogy Project in terms of the few mathematicians who started most genealogies of mathematicians, including d’Alembert, advisor to Laplace of whom I am one of the many descendants, although the finding is not that astounding when considering usual genealogies where most branches die off and the highly hierarchical structure of power in universities of old.

## local kernel reduction for ABC

Posted in Books, pictures, Statistics, University life with tags ABC, Approximate Bayesian computation, Bayesian inference, kernel density estimator, reproducing kernel Hilbert space, summary statistics on September 14, 2016 by xi'an

“…construction of low dimensional summary statistics can be performed as in a black box…”

**T**oday Zhou and Fukumizu just arXived a paper that proposes a gradient-based dimension reduction for ABC summary statistics, in the spirit of RKHS kernels as advocated, e.g., by Arthur Gretton. Here the projection is a mere *linear* projection Bs of the vector of summary statistics, s, where B is an estimated Hessian matrix associated with the posterior expectation E[θ|s]. (There is some connection with the latest version of Li’s and Fearnhead’s paper on ABC convergence as they also define a *linear* projection of the summary statistics, based on asymptotic arguments, although their matrix does depend on the true value of the parameter.) The linearity sounds like a strong restriction [to me] especially when the summary statistics have no reason to belong to a vectorial space and thus be open to changes of bases and linear projections. For instance, a specific value taken by a summary statistic, like 0 say, may be more relevant than the range of their values. On a larger scale, I am doubtful about always projecting a vector of summary statistics on a subspace with the smallest possible dimension, i.e., the dimension of θ. In practical settings, it seems impossible to derive the optimal projection and a subvector is almost certain to lose information against a larger vector.
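As a purely illustrative sketch of what a linear projection Bs means in an ABC rejection step, the Python code below compares projected summaries rather than raw ones. Everything in it is an assumption of mine: the toy model (θ~N(0,4), data N(θ,1)), the three raw summaries, and above all the matrix B, which is picked by hand here whereas the paper estimates it from the data. The function names are hypothetical.

```python
import random
import statistics


def summaries(sample):
    """Three raw summary statistics: mean, median and variance."""
    return [statistics.mean(sample),
            statistics.median(sample),
            statistics.variance(sample)]


def project(B, s):
    """Linear projection B s, with B given as a list of rows."""
    return [sum(b * v for b, v in zip(row, s)) for row in B]


def abc_rejection(observed, B, n_sims=5000, n_keep=100, seed=0):
    """Keep the n_keep prior draws whose projected summaries are closest."""
    rng = random.Random(seed)
    target = project(B, summaries(observed))
    draws = []
    for _ in range(n_sims):
        theta = rng.gauss(0.0, 2.0)                      # assumed prior N(0, 4)
        sim = [rng.gauss(theta, 1.0) for _ in observed]  # pseudo-data
        z = project(B, summaries(sim))
        dist = sum((a - b) ** 2 for a, b in zip(z, target))
        draws.append((dist, theta))
    draws.sort()
    return [theta for _, theta in draws[:n_keep]]


# Hand-picked one-row projection that keeps only the sample mean (an assumption).
B = [[1.0, 0.0, 0.0]]
observed = [1.2, 0.4, 1.9, 0.7, 1.1, 0.3, 1.6, 0.9, 1.4, 0.5]
accepted = abc_rejection(observed, B)
```

With this B the projection has the dimension of θ, which is the extreme case I am doubtful about above; replacing B with a matrix of two or three rows keeps more of the raw summaries in the comparison.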

“Another proposal is to use different summary statistics for different parameters.”

Which is exactly what we did in our random forest estimation paper. Using a different forest for each parameter of interest (but no real tree was damaged in the experiment!).

## Savage-Dickey supermodels

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags astrostatistics, Bayes factor, Biometrika, Brad Carlin, bridge sampling, cosmology, encompassing model, MCMC, mixtures of distributions, nested sampling, Péru, Sid Chib on September 13, 2016 by xi'an

**A**. Mootoovaloo, B. Bassett, and M. Kunz just arXived a paper on the computation of Bayes factors by the Savage-Dickey representation through a supermodel (or encompassing model). (I wonder why Savage-Dickey is so popular in astronomy and cosmology statistical papers and not so much elsewhere.) Recall that the trick is to write the Bayes factor in favour of the null model, against the encompassing model, as the ratio of the posterior and of the prior for the tested parameter (thus eliminating nuisance or common parameters) at its null value,

B⁰¹=π(φ⁰|x)/π(φ⁰).

Modulo some continuity constraints on the prior density, and the assumption that the conditional prior on the nuisance parameter is the same under the null model and the encompassing model [given the null value φ⁰]. If this sounds confusing or even shocking from a mathematical perspective, check the numerous previous entries on this topic on the ‘Og!
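For the record, here is the usual one-line argument behind the representation, in my own notation: ψ stands for the nuisance parameter, f for the likelihood of the encompassing model, π₁ for its prior, and m₀ and m₁ for the evidences of the null and encompassing models. Under the assumption that the prior on ψ under the null is π₁(ψ|φ⁰),

$$
\begin{aligned}
m_0(x) &= \int f(x\mid\varphi^0,\psi)\,\pi_1(\psi\mid\varphi^0)\,\mathrm{d}\psi
        = \frac{1}{\pi_1(\varphi^0)}\int f(x\mid\varphi^0,\psi)\,\pi_1(\varphi^0,\psi)\,\mathrm{d}\psi \\
       &= \frac{\pi_1(\varphi^0\mid x)\,m_1(x)}{\pi_1(\varphi^0)}\,,
\end{aligned}
$$

so that the posterior-to-prior ratio at φ⁰ equals m₀(x)/m₁(x), the evidence of the null over the evidence of the encompassing model. The measure-theoretic caveat is that both densities are evaluated at a single point, hence the continuity constraints.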

The supermodel created by the authors is a mixture of the original models, as in our paper, and… *hold the presses!*, it is a mixture of the likelihood functions, as in Phil O’Neill’s and Theodore Kypraios’ paper. Which is not mentioned in the current paper and should obviously be. In the current representation, the posterior distribution on the mixture weight α is a linear function of α involving both evidences, α(m¹-m²)+m², times the artificial prior on α. The resulting estimator of the Bayes factor thus shares features with bridge sampling, reversible jump, and the importance sampling version of nested sampling we developed in our Biometrika paper. In addition to O’Neill and Kypraios’s solution.
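To see how the Bayes factor can be read off the posterior on α, here is a small Python sketch under an assumption of mine, namely a uniform prior on α (the “artificial prior” is otherwise left unspecified here). The posterior is then proportional to αm¹+(1-α)m², its mean is (2m¹+m²)/(3(m¹+m²)), and inverting this relation gives B¹²=(3E[α|x]-1)/(2-3E[α|x]). The function names are hypothetical, and in practice the posterior mean would come from MCMC output rather than from known evidences.

```python
def posterior_mean_alpha(m1, m2, grid=10000):
    """Posterior mean of alpha under a uniform prior, by the midpoint rule.

    The unnormalised posterior density is alpha * m1 + (1 - alpha) * m2.
    """
    num = den = 0.0
    for i in range(grid):
        alpha = (i + 0.5) / grid
        weight = alpha * m1 + (1.0 - alpha) * m2
        num += alpha * weight
        den += weight
    return num / den


def bayes_factor_from_mean(mean_alpha):
    """Invert E[alpha | x] = (2 B + 1) / (3 (B + 1)) to recover B = m1 / m2."""
    return (3.0 * mean_alpha - 1.0) / (2.0 - 3.0 * mean_alpha)
```

Since the posterior mean of α is confined to (1/3, 2/3) under this prior, and the inversion blows up near either end, I would expect the resulting estimate to be quite unstable when one model strongly dominates the other.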

The following quote is inaccurate since the MCMC algorithm needs to simulate the parameters of the compared models in realistic settings, hence representing the multidimensional integrals by Monte Carlo versions.

“Though we have a clever way of avoiding multidimensional integrals to calculate the Bayesian Evidence, this new method requires very efficient sampling and for a small number of dimensions is not faster than individual nested sampling runs.”

I actually wonder at the sheer rationale of running an intensive MCMC sampler in such a setting, when the weight α is completely artificial. It is only used to jump from one model to the next, which sounds quite inefficient when compared with simulating from both models separately and independently. This approach can also be seen as a special case of Carlin’s and Chib’s (1995) alternative to reversible jump. Using instead the Savage-Dickey representation is of course infeasible. Which makes the overall reference to this method rather inappropriate in my opinion. Further, the examples processed in the paper all involve (natural) embedded models where the original Savage-Dickey approach applies. Creating an additional model to apply a pseudo-Savage-Dickey representation does not sound very compelling…

Incidentally, the paper also includes a discussion of a weird notion, the likelihood of the Bayes factor, B¹², which is plotted as a distribution in B¹², most strangely. The only other place I met this notion is in Murray Aitkin’s book. Something’s unclear there or in my head!

“One of the fundamental choices when using the supermodel approach is how to deal with common parameters to the two models.”

This is an interesting question, although maybe not so relevant for the Bayes factor issue where it should not matter. However, as in our paper, multiplying the number of parameters in the encompassing model may hinder convergence of the MCMC chain or reduce the precision of the approximation of the Bayes factor. Again, from a Bayes factor perspective, this does not matter [while it does in our perspective].

## [Royal] Series B’log

Posted in Books, Statistics, University life, Wines with tags associate editor, blogging, guest editors, referee, refereeing, Royal Statistical Society, Series B on September 12, 2016 by xi'an

*[Thanks to Ingmar for suggesting the additional Royal!]*

**L**ast week, I got an email from Piotr Fryzlewicz on behalf of the Publication Committee of the Royal Statistical Society enquiring about my interest in becoming a blog associate editor for Series B! Although it does not come exactly as a surprise, as I had previously heard about this interest in creating a dedicated blog, this is great news as I think a lively blog can only enhance the visibility and impact of papers published in Series B and hence increase the influence of Series B. Being quite excited by this on-line and interactive extension to the journal, I have accepted the proposal and we are now working on designing the new blog (Series B’log!) to get it on track as quickly as possible.

Suggestions towards this experiment are most welcome! I am thinking of involving authors to write blog summaries of their paper, AEs and reviewers to voice their expert opinions about the paper, anonymously or not, and of course anyone interested in commenting on the paper. The idea is to turn (almost) all papers into on-line Read Papers, with hopefully the backup of authors through their interactions with the commentators. I certainly do not intend to launch discussions on each and every paper, betting on the AEs or referees to share their impressions. And if a paper ends up being un-discussed, this may prove enough of an incentive for some. (Someone asked me if we intended to discuss rejected papers as well. This is an interesting concept, but not to be considered at the moment!)