## On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

Posted in R, Statistics, University life on November 20, 2013 by xi'an

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components within each block. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates, the latter being rather costly since one importance weight involves B×L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarios.
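To make the permutation idea concrete, here is a toy sketch (mine, not the authors' implementation, with a made-up bivariate Gaussian target of known normalising constant standing in for an MCMC output): permuting each coordinate independently produces draws from the product of the marginal posteriors, normal fits approximate the marginal densities, and the average importance weight estimates the evidence.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
Z_true = 7.3  # known normalising constant of the toy "unnormalised posterior"
m = np.array([1.0, -0.5])
Sigma = np.array([[1.0, 0.6], [0.6, 1.5]])

# stand-in for an MCMC output from the posterior (two blocks = two coordinates)
theta = rng.multivariate_normal(m, Sigma, size=50_000)

# draws from the product of marginal posteriors: permute each block independently
prod = np.column_stack([rng.permutation(theta[:, b]) for b in range(2)])

# normal approximations to the block marginal posterior densities
q1 = stats.norm(prod[:, 0].mean(), prod[:, 0].std())
q2 = stats.norm(prod[:, 1].mean(), prod[:, 1].std())

# importance weights: unnormalised posterior over product of marginal approximations
log_post = np.log(Z_true) + stats.multivariate_normal(m, Sigma).logpdf(prod)
log_w = log_post - q1.logpdf(prod[:, 0]) - q2.logpdf(prod[:, 1])
Z_hat = np.exp(log_w).mean()  # evidence estimate, should be close to Z_true
```

In a real application the joint density above would be the unnormalised posterior and the evidence Z would of course be unknown; the toy target only serves to check that the average weight recovers it.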

After a standard regression example (that did not include Chib's solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib's solution against the current method and Neal's reference. The study does not include averaging Chib's solution over permutations as in Berkoff et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias, especially for a small number of components. Instead, the authors stick to the log(k!) correction, despite it being known to be quite unreliable (depending on the amount of overlap between modes). The final example is Diggle et al.'s (1995) longitudinal Poisson regression with random effects on epileptic patients. The appeal of this model is the unavailability of the integrated likelihood, which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)

As a side note, among the many references provided by this paper, I did not find any trace of Skilling's nested sampling or of safe harmonic means (as discussed in our own survey on the topic).

## another vanilla Rao-Blackwellisation

Posted in Statistics, University life on September 16, 2013 by xi'an

In the latest issue of Statistics and Computing (2013, Issue 23, pages 577-587), Iliopoulos and Malefaki published a paper that relates to our vanilla Rao-Blackwellisation paper with Randal Douc. The idea is to derive another approximation to the ideal importance sampling weight using the "accepted" Markov chain. (With Randal, we had a Bernoulli factory representation.) Since the density g(x) of the accepted chain is unknown, it is represented as the expectation under π of the function

$\min\left\{q(z|x)/\pi(z),q(x|z)/\pi(x)\right\}$

and hence estimated by a self-normalised average based on the whole Markov chain. This means the resulting importance estimate uses the output of the algorithm twice and that it is biased and of computational order O(n²), thus the same order as our original Rao-Blackwellised estimator (Robert & Casella, 1996)… This also means convergence and the CLT are very hard to establish: the main convergence theorem of the paper holds only for finite state spaces, with a surprisingly smaller asymptotic variance for this self-normalised average than for the ideal importance sampling estimator in the independent Metropolis-Hastings case. (Both are biased by being self-normalised and the paper does not consider the magnitude of those biases.)
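A minimal sketch of this representation, under my own toy setup (standard normal target, independent Metropolis-Hastings with a wider normal proposal, none of which is in the paper): g at each accepted state is estimated by averaging min{q(z)/π(z), q(x)/π(x)} over the whole chain, the O(n²) step, and π/ĝ then serves as a self-normalised importance weight over the accepted states.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
target = stats.norm(0, 1)  # target pi (standard normal)
prop = stats.norm(0, 2)    # independent MH proposal q

# independent Metropolis-Hastings run
n = 3000
z_all = prop.rvs(size=n, random_state=rng)
u = rng.uniform(size=n)
x = 0.0
chain = np.empty(n)
accepted = []
for t in range(n):
    z = z_all[t]
    log_alpha = (target.logpdf(z) - target.logpdf(x)) - (prop.logpdf(z) - prop.logpdf(x))
    if np.log(u[t]) < log_alpha:
        x = z
        accepted.append(x)
    chain[t] = x
accepted = np.array(accepted)

# r(x) = q(x)/pi(x), over the whole chain and at the accepted states
r_chain = np.exp(prop.logpdf(chain) - target.logpdf(chain))
r_acc = np.exp(prop.logpdf(accepted) - target.logpdf(accepted))

# ghat(x) is proportional to pi(x) E_pi[min{r(z), r(x)}]: the O(n^2) average
a_hat = np.minimum.outer(r_acc, r_chain).mean(axis=1)

# importance weights pi/ghat (up to a constant) over the accepted states
w = 1.0 / a_hat
est = np.sum(w * accepted**2) / np.sum(w)  # self-normalised estimate of E[X^2] = 1
```

The O(n²) cost shows up in the `np.minimum.outer` matrix, which pairs every accepted state with every chain value.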

Interestingly, the authors also ran a comparison with our parallelised Rao-Blackwellised version (with Pierre Jacob and Murray Smith), but conclude (P.58) that the parallelisation requires a larger CPU time (it should be GPU time!), which does not really make sense: when compared with the plain Metropolis-Hastings implementation, run on a single processor, the parallel version only requires an extra random permutation per thread or per processor. I thus suspect a faulty implementation that induces this CPU time being linear in the size of the blocks, like maybe only saving one output per block… Also interestingly, the paper re-analyses the Pima Indian probit model Jean-Michel Marin and I (and many others) used as a benchmark in several of our papers. As in the most standard examples, the outcome shows a mild reduction in variance when using this estimated importance sampling version. Maybe a comparison with the ideal importance sampler (i.e., the one that does not divide by the sum of the weights, since it uses normalised versions of the target and importance densities) would have helped in the comparison.

## snapshot from Budapest (EMS 2013 #3)

Posted in pictures, Statistics, University life, Wines on July 25, 2013 by xi'an

Here is the new version of the talk:

And I had a fairly interesting day at the conference, from Randal's talk on hidden Markov models with finite-valued observables to the two Terrys' invited session (Terry Lyons vs. Terry Speed) to the machine learning session organised by a certain Michal Jordan (on the program) who turned out to be Michael Jordan (with a talk on the fusion between statistics and computability). A post-session chat with Terry Lyons also opened (to me) new perspectives on data summarisation. (And we at last managed to get a convergence result using a Rao-Blackwellisation argument!) Plus, we ended the day in a nice bistrot called Zeller, with an awfully friendly staff cooking family produce and serving fruity family wines, not yet spoiled by being ranked #1 on tripadvisor (but visibly attracting a lot of tourists like us).

## Path storage in the particle filter

Posted in Books, Kids, Statistics, Travel, University life on July 24, 2013 by xi'an

In the train to Annecy, I read the recently arXived paper by my former PhD student Pierre Jacob (now at NUS), along with Lawrence Murray (Perth) and Sylvain Rubenthaler (Nice), where they obtain precise degeneracy rates of the regular particle filter applied to hidden Markov models with a compact observation space, precise enough to consider storing the entire paths at a linear occupancy rate. Interestingly, the distance to the most recent common ancestor is of order N log N, if N is the number of particles, and the number of nodes is O(N log N) as well. This indeed means that the whole paths can be stored, which offers a lot of potential in terms of Rao-Blackwellisation and parallelisation. I was first bemused by a silly misunderstanding about the purpose of the algorithm: it is directed at inference at the current time index, not over the whole past and not over the parameters of the model, for else how could we consider that the algorithm has converged when it degenerates to a single path at some finite horizon in the past? Pierre has also written a further comment on the paper on Statisfaction.
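The degeneracy phenomenon is easy to observe in a toy sketch (my own linear-Gaussian model and settings, not the paper's): a bootstrap particle filter that stores the ancestor index of every particle at every step, after which tracing the final particles' genealogies backwards shows them coalescing onto a few roots, which is what makes full-path storage affordable.

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 100, 400              # particles, time steps
phi, sx, sy = 0.9, 1.0, 1.0  # AR(1) state and Gaussian observation noise

# simulate data from the hidden Markov model
x, y = 0.0, np.empty(T)
for t in range(T):
    x = phi * x + sx * rng.normal()
    y[t] = x + sy * rng.normal()

# bootstrap particle filter, storing particles X and ancestor indices A
X = np.empty((T, N))
A = np.empty((T, N), dtype=int)
part = rng.normal(size=N)
for t in range(T):
    logw = -0.5 * ((y[t] - part) / sy) ** 2           # Gaussian likelihood weights
    w = np.exp(logw - logw.max())
    anc = rng.choice(N, size=N, p=w / w.sum())        # multinomial resampling
    part = phi * part[anc] + sx * rng.normal(size=N)  # propagate
    X[t], A[t] = part, anc

# trace the ancestry of the final particles back towards time 0
idx = np.arange(N)
n_ancestors = []
for t in range(T - 1, -1, -1):
    idx = A[t][idx]
    n_ancestors.append(len(np.unique(idx)))
n_roots = n_ancestors[-1]  # distinct time-0 ancestors of the final particles
```

Going backwards, the number of distinct ancestors should shrink rapidly well before time 0, consistent with an O(N log N) distance to the most recent common ancestor; a tree-based storage scheme then only needs to keep the surviving branches.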

## a hectic trip!

Posted in Running, Statistics, Travel, University life on February 22, 2013 by xi'an

The trip to Hamburg had started inauspiciously: a heart attack (someone else's heart) in the métro (RER) had frozen the train traffic completely on Tuesday, and I was lucky to find a taxi that managed to drive me to the airport in the nick of time. As there were warnings of a strike at the Hamburg airport today, I decided to pack early and left DESY long enough in advance to reach the airport by public transportation: it is only once I had cleared security and sat at the gate that I realised I had forgotten my PC power box/cord in my room at DESY. What a drag! Anyway, I managed to buy a universal adapter in Paris on my way back from the airport and still attend Adam Johansen's seminar on Rao-Blackwellisation of particle filters at the Big'MC seminar. An interesting exploitation of missing-variable structures within missing variables in a hidden Markov chain! (I missed the reason for the O(NM) computing time, though.) On the way home, I reflected on how little I had seen of Hamburg: a nice train system, a green and pleasant suburban area south of DESY towards the Elbe (Groß Flottbeck), while running this morning, and… the airport! I wish I had had the opportunity and the time to get a glimpse of downtown Hamburg.