## Sampling latent states for high-dimensional non-linear state space models with the embedded HMM method

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , on March 17, 2016 by xi'an

Previously, I posted a comment on a paper by Alex Shestopaloff and Radford Neal, after my visit to Toronto two years ago, using a particular version of ensemble Monte Carlo. A new paper by the same authors was recently arXived, as an refinement of the embedded HMM paper of Neal (2003), in that the authors propose a new and more efficient way to generate from the (artificial) embedded hidden Markov sampler that is central to their technique of propagating a set of pool states. The method exploits both forward and backward representations of HMMs in an alternating manner. And propagates the pool states from one observation time to the next. The paper also exploits latent Gaussian structures to make autoregressive proposals, as well as flip proposals from x to -x [which seem to only make sense when 0 is a central value for the target, i.e. when the observables y only depend on |x|]. All those modifications bring the proposal quite close to (backward) particle Gibbs, the difference being in using Metropolis rather than importance steps. And in an improvement brought by the embedded HMM approach, even though it is always delicate to generalise those comparisons when some amount of calibration is required by both algorithms under comparison. (Especially delicate when it is rather remote from my area of expertise!) Anyway, I am still intrigued [in a positive way] by the embedded HMM idea as it remains mysterious that a finite length HMM simulation can improve the convergence performances that much. And wonder at a potential connection with an earlier paper of Anthony Lee and Krys Latuszynski using a random number of auxiliary variables. Presumably a wrong impression from a superficial memory…

## ¼th i-like workshop in St. Anne’s College, Oxford

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , on March 27, 2014 by xi'an

Due to my previous travelling to and from Nottingham for the seminar and back home early enough to avoid the dreary evening trains from Roissy airport (no luck there, even at 8pm, the RER train was not operating efficiently!, and no fast lane is planed prior to 2023…), I did not see many talks at the i-like workshop. About ¼th, roughly… I even missed the poster session (and the most attractive title of Lazy ABC by Dennis Prangle) thanks to another dreary train ride from Derby to Oxford.

After a few discussions on and around research topics, it was too soon time to take advantage of the grand finale of a March shower to walk from St. Anne’s College to Oxford Station, in order to start the trip back home. I was lucky enough to find a seat and could start experimenting in R the new idea my trip to Nottingham had raised! While discussing a wee bit with my neighbour, a delightful old lady from the New Forest travelling to Coventry, recovering from a brain seizure, wondering about my LaTeX code syntax despite the tiny fonts, and who most suddenly popped a small screen from her bag to start playing Candy Crush!, apologizing all the same. The overall trip was just long enough for my R code to validate this idea of mine, making this week in England quite a profitable one!!!

## twisted filters

Posted in Statistics with tags , , , , , , , , , , on October 6, 2012 by xi'an

Nick Witheley (Bristol) and Anthony Lee (Warwick) just posted an interesting paper called ‘Twisted particle filters‘ on arXiv. (Presumably unintentionally, the title sounds like Twisted Sister, pictured above, even though I never listened to this [particularly] heavy kind of hard rock! Twisting is customarily used in the large deviation literature.)

The twisted particle paper studies the impact of the choice of something similar to, if subtly different from, an importance function on the approximation of the marginal density (or evidence) for HMMS. (In essence, the standard particle filter is modified for only one particle in the population.) The core of the paper is to compare those importance functions in a fixed N-large n setting. As in simpler importance sampling cases, there exists an optimal if impractical choice of importance function, leading to a zero variance estimator of the evidence. Nick and Anthony produce an upper bound on the general variance as well. One of the most appealing features of the paper is that the authors manage a convergence result in n rather than N. (Although the algorithms are obviously validated in the more standard large N sense.)

The paper is quite theoretical and dense (I was about to write heavy in connection with the above!), with half its length dedicated to proofs. It relies on operator theory, with eigen-functions behind the optimal filter, while not unrelated with Pierre Del Moral’s works. (It took me a while to realise that the notation ω was not the elemental draw from the σ-algebra but rather the realisation of the observed sequence! And I had to keep recalling that θ was the lag operator and not a model parameter [there is no model parameter].)

## ACS 2012 (#2)

Posted in pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , on July 12, 2012 by xi'an

This morning, after a nice and cool run along the river Torrens amidst almost unceasing bird songs, I attended another Bayesian ASC 2012 session with Scott Sisson presenting a simulation method aimed at correcting for biased confidence intervals and Robert Kohn giving the same talk in Kyoto. Scott’s proposal, which is rather similar to parametric bootstrap bias correction, is actually more frequentist than Bayesian as the bias is defined in terms of an correct frequentist coverage of a given confidence (or credible) interval. (Thus making the connection with Roderick Little’s calibrated Bayes talk of yesterday.) This perspective thus perceives ABC as a particular inferential method, instead of a computational approximation to the genuine Bayesian object. (We will certainly discuss the issue with Scott next week in Sydney.)

Then Peter Donnely gave a particularly exciting and well-attended talk on the geographic classification of humans, in particular of the (early 1900’s) population of the British isles, based on a clever clustering idea derived from an earlier paper of Na Li and Matthew Stephens: using genetic sequences from a group of individuals, each individual was paired with the rest of the sample as if it descended from this population. Using an HMM model, this led to clustering the sample into about 50 groups, with a remarkable geographic homogeneity: for instance, Cornwall and Devon made two distinct groups, an English speaking pocket of Wales (Little England) was identified as a specific group and so on, the central, eastern and southern England constituting an homogenous group of its own…

## updated slides for ABC PhD course

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , on February 8, 2012 by xi'an

Over the weekend, I have added a few slides referring to recent papers mentioning the convergence of ABC algorithms, in particular the very relevant paper by Dean et al. I had already discussed in an earlier post. (This coursing is taking a much larger chunk of my time than I expected! I am thus relieved I will use the same slides in Roma next month and presumably in Melbourne next summer…)

## speed of R, C, &tc.

Posted in R, Running, Statistics, University life with tags , , , , , , , , , on February 3, 2012 by xi'an

My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6 if you consider scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using EM and the Baum-Welch algorithms. His conclusions are that

• matlab is a lot faster than R and python, especially when vectorization is important : this is why the difference is spectacular on filtering/smoothing, not so much on the creation of the sample;
• octave is a good matlab emulator, if no special attention is payed to execution speed…;
• scilab appears as a credible, efficient alternative to matlab;
• still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of of your favourite language!

## ABC for missing data

Posted in Statistics, University life with tags , , , , , , on February 2, 2012 by xi'an

I received this email a few days ago:

I am an hard-core reader of your blog and thanks to your posts I have recently started being interested to ABC (and Bayesian methods in general). I am writing you to ask for suggestions on the application of the semi-automatic ABC à la Fearnhead & Prangle. The reason why I am contacting you instead of addressing the authors is because (i) you have been involved in the RSS reading of their paper and (ii) you are an authority on ABC, and therefore you are probably best suited and less biased on such issue. I am applying ABC with the semi-automatic statistics selection provided in Fearnhead and Prangle (2012) to a problem which can be formalized as a hidden Markov model. However I am unsure of whether I am making a huge mistake on the following point: let’s suppose we have an unobserved (latent) system state X (depending on an unknown parameter θ) and a corresponding “corrupted” version which is observed with some measurement error, e.g.

Y = X + ε,

where ε is the measurement error independent of X, ε is N(0, σ²), say. Now their setup does not consider measurement error, so I wonder if the following is correct. Since I can simulate n times some X’ from p(X|θ) am I allowed to use the corresponding “simulated” n corrupted measurements

Y’ = X’ + ε’

(where ε’ is a draw from p(ε|σ)) into their regression approach to determine a (vector of) summary statistic S=(S1,S2) for (θ,σ)? I mean the Y’ are draws from a p(y|X’,θ,σ) which is conditional on X’. Is this allowed? Wilkinson (2008) is the only reference I have found considering ABC with measurement-error (the ones by Dean et al (2011) and Jasra et al (2011) being too technical in my opinion to allow a practical implementation) and since he does not consider a summary statistics-based approach (e.g. Algorithm D, page 10) of course he is not in need to simulate the corrupted measurements but only the latent ones. Therefore I am a bit unsure on whether it is statistically ok to simulate Y’ conditionally on X’ or if there is some theoretical issue that makes this nonsense.

to which I replied

…about your model and question, there is no theoretical difficulty in simulating x’ then y’given x’ and a value of the parameters. The reason is that

$y' \sim \int f(y',x'|\theta) \text{d}x' = f(y'|\theta)$

.the proper marginal as defined by the model. Using the intermediate x’ is a way to bypass the integral but this is 100% correct!…

a reply followed by a further request for precision

Although your equation is clearly true, I am not sure I fully grasp the issue, so I am asking for confirmation. Yes, as you noticed I need a

y’ ~ f(y’|θ,σ)

Now it’s certainly true that I can generate a draw x’ from f(x’|θ,σ) and then plug such x’ into f(y’|x’,θ,σ) to generate y’. By proceeding this way I obtain a draw (y’,x’) from f(y’,x’|θ,σ). I think I understood your reasoning, on how by using the procedure above I am actually skipping the computation of the integral in:

$\int f(y',x'|\theta, \sigma) \text{d}x'.$

Is it basically the case that the mechanism above is just a classic simulation from a bivariate distribution, where since I am interested in the marginal f(y’|θ,σ) I simulate from the joint density f(y’,x’|θ,σ) and then discard the x’ output?

which is indeed a correct interpretation. When simulating from a joint, the marginals are by-products of the simulation.