Archive for HMM

interdependent Gibbs samplers

Posted in Books, Statistics, University life on April 27, 2018 by xi'an

Mark Kozdoba and Shie Mannor just arXived a paper, entitled “interdependent Gibbs samplers”, on an approach to accelerate Gibbs sampling. In fact, it presents rather strong similarities with our SAME algorithm. More of the same, as Adam Johansen (Warwick) entitled one of his papers! The paper indeed suggests multiplying replicas of latent variables (e.g., a hidden path for an HMM) in an artificial model. And as in our 2002 paper with Arnaud Doucet and Simon Godsill, the focus here is on maximum likelihood estimation (of the genuine parameters, not of the latent variables). Along with the argument that the resulting pseudo-posterior is akin to a posterior with a powered likelihood. And a link with the EM algorithm. And an HMM application.

“The generative model consist[s] of simply sampling the parameters […] and then sampling m independent copies of the paths”

If anything, this proposal is less appealing than SAME because it aims directly at the powered likelihood, rather than utilising an annealed sequence of powers that allows for a preliminary exploration of the whole parameter space before entering the trapping vicinity of a mode. This makes me fail to catch the authors’ argument that the approach improves Gibbs sampling, as a more acute mode has, on the contrary, the dangerous feature of preventing visits to other modes. Hence the relevance of resorting to some form of annealing.
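To fix ideas, here is a quick R sketch (mine, not the authors') of the replication idea behind SAME, for a toy two-component normal mixture: the latent allocations are duplicated m times so that the parameter update effectively targets the likelihood raised to the power m, with an annealing schedule on m that is purely illustrative, as are all other settings.

set.seed(1)
y <- c(rnorm(150, -2), rnorm(150, 2))              # toy data from a two-component mixture
p <- 0.5                                           # known mixture weight
n <- length(y)
niter <- 200                                       # number of Gibbs iterations
m_sched <- pmin(1 + floor((1:niter) / 20), 10)     # slowly increasing number of replicas
mu <- c(-1, 1)                                     # initial component means
for (t in 1:niter) {
  m <- m_sched[t]
  ssum <- c(0, 0); scount <- c(0, 0)
  for (r in 1:m) {                                 # m independent copies of the allocations
    w1 <- p * dnorm(y, mu[1])
    w2 <- (1 - p) * dnorm(y, mu[2])
    z <- rbinom(n, 1, w2 / (w1 + w2)) + 1          # latent allocations for replica r
    for (k in 1:2) {
      ssum[k] <- ssum[k] + sum(y[z == k])
      scount[k] <- scount[k] + sum(z == k)
    }
  }
  # full conditional of the means given all m replicas (flat prior, unit variances):
  # the conditional variance shrinks like 1/(m n_k), concentrating on a likelihood mode
  mu <- rnorm(2, ssum / pmax(scount, 1), 1 / sqrt(pmax(scount, 1)))
}
mu                                                 # should end up close to (-2, 2)

With m fixed at one this is a plain Gibbs sampler; letting m grow concentrates the pseudo-posterior on a mode, which is exactly why some annealing of the power is welcome.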

As already mused upon in earlier posts, I find it most amazing that this technique has been re-discovered so many times, both in statistics and in adjacent fields. The idea of powering the likelihood with independent copies of the latent variables is obviously natural (since a version pops up every other year, always under a different name), but earlier versions should eventually saturate the market!

Sampling latent states for high-dimensional non-linear state space models with the embedded HMM method

Posted in Books, pictures, Statistics, University life on March 17, 2016 by xi'an

Previously, I posted a comment on a paper by Alex Shestopaloff and Radford Neal, after my visit to Toronto two years ago, using a particular version of ensemble Monte Carlo. A new paper by the same authors was recently arXived, as a refinement of the embedded HMM paper of Neal (2003), in that the authors propose a new and more efficient way to generate from the (artificial) embedded hidden Markov sampler that is central to their technique of propagating a set of pool states. The method exploits both forward and backward representations of HMMs in an alternating manner, and propagates the pool states from one observation time to the next. The paper also exploits latent Gaussian structures to make autoregressive proposals, as well as flip proposals from x to -x [which seem to only make sense when 0 is a central value for the target, i.e. when the observables y only depend on |x|]. All those modifications bring the proposal quite close to (backward) particle Gibbs, the difference being in using Metropolis rather than importance steps, and in the improvement brought by the embedded HMM approach, even though it is always delicate to generalise such comparisons when some amount of calibration is required by both algorithms under comparison. (Especially delicate when it is rather remote from my area of expertise!) Anyway, I am still intrigued [in a positive way] by the embedded HMM idea, as it remains mysterious that a finite-length HMM simulation can improve the convergence performance that much. And wonder at a potential connection with an earlier paper of Anthony Lee and Krys Łatuszyński using a random number of auxiliary variables. Presumably a wrong impression from a superficial memory…
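For readers unfamiliar with the mechanism, here is a very stripped-down R illustration (again mine, not the authors') of sampling a latent path by forward filtering and backward sampling over a finite pool of states, the pool being taken here as a fixed grid for a toy AR(1)-plus-noise model; the actual embedded HMM method instead builds the pool around the current path and corrects for the way it was generated, none of which is reproduced below.

set.seed(2)
phi <- 0.9; sig <- 0.5; tau <- 1                  # toy model parameters, assumed known
Tn <- 100
x <- numeric(Tn)
x[1] <- rnorm(1, 0, sig / sqrt(1 - phi^2))
for (t in 2:Tn) x[t] <- phi * x[t - 1] + rnorm(1, 0, sig)
y <- x + rnorm(Tn, 0, tau)                        # simulated observations
grid <- seq(-4, 4, length.out = 50)               # fixed finite pool of candidate states
G <- length(grid)
trans <- outer(grid, grid, function(xc, xn) dnorm(xn, phi * xc, sig))  # pool transition kernel
alpha <- matrix(0, Tn, G)                         # forward filtering over the pool
alpha[1, ] <- dnorm(grid, 0, sig / sqrt(1 - phi^2)) * dnorm(y[1], grid, tau)
alpha[1, ] <- alpha[1, ] / sum(alpha[1, ])
for (t in 2:Tn) {
  pred <- as.vector(alpha[t - 1, ] %*% trans)     # predictive weights over the pool
  alpha[t, ] <- pred * dnorm(y[t], grid, tau)
  alpha[t, ] <- alpha[t, ] / sum(alpha[t, ])
}
path <- integer(Tn)                               # backward sampling of one latent path
path[Tn] <- sample(G, 1, prob = alpha[Tn, ])
for (t in (Tn - 1):1) path[t] <- sample(G, 1, prob = alpha[t, ] * trans[, path[t + 1]])
xhat <- grid[path]                                # a path sampled from the pool-restricted smoother

The finite-HMM pass produces a whole new path in one go, which hints at why a well-chosen pool can beat single-site updates of the hidden chain.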

¼th i-like workshop in St. Anne’s College, Oxford

Posted in pictures, Statistics, Travel, University life on March 27, 2014 by xi'an

Due to my travelling to and from Nottingham for the seminar, and to getting back home early enough to avoid the dreary evening trains from Roissy airport (no luck there, even at 8pm the RER train was not operating efficiently!, and no fast lane is planned prior to 2023…), I did not see many talks at the i-like workshop. About ¼th, roughly… I even missed the poster session (and the most attractive title of Lazy ABC by Dennis Prangle) thanks to another dreary train ride from Derby to Oxford.

As it happened, I had already heard or read parts of the talks in the Friday morning session, but this made for a better understanding of them. As in Banff, Paul Fearnhead‘s talk on reparameterisations for pMCMC on hidden Markov models opened a wide door to possible experiments on those algorithms. The examples in the talk were mostly of the parameter-duplication type, somewhat creating unidentifiability to decrease correlation, but I also wondered at the possibility of introducing frequent replicas of the hidden chain in order to fight degeneracy. Then Sumeet Singh gave a talk on the convergence properties of noisy ABC for approximate MLE. Although I had read some of the papers behind the talk, it made me realise how keeping balls around each observation in the ABC acceptance step was not leading to extinction as the number of observations increased. (Sumeet also had a good line with his ABCDE algorithm, standing for ABC done exactly!) Anthony Lee covered his joint work with Krys Łatuszyński on the ergodicity conditions of the ABC-MCMC algorithm, the only positive case being the 1-hit algorithm as discussed in an earlier post. This result will hopefully get more publicity, as I frequently read that increasing the number of pseudo-samples has no clear impact on the ABC approximation. Krys Łatuszyński concluded the morning with an aggregate of the various results he and his co-authors had obtained on the fascinating Bernoulli factory. Including constructive derivations.

After a few discussions on and around research topics, it was all too soon time to take advantage of the grand finale of a March shower to walk from St. Anne’s College to Oxford Station, in order to start the trip back home. I was lucky enough to find a seat and could start experimenting in R with the new idea my trip to Nottingham had raised! I also chatted a wee bit with my neighbour, a delightful old lady from the New Forest travelling to Coventry, recovering from a brain seizure, who wondered about my LaTeX code syntax despite the tiny fonts, and who most suddenly popped a small screen from her bag to start playing Candy Crush!, apologising all the same. The overall trip was just long enough for my R code to validate this idea of mine, making this week in England quite a profitable one!!!

twisted filters

Posted in Statistics on October 6, 2012 by xi'an

Nick Whiteley (Bristol) and Anthony Lee (Warwick) just posted an interesting paper called ‘Twisted particle filters‘ on arXiv. (Presumably unintentionally, the title sounds like Twisted Sister, even though I never listened to this [particularly] heavy kind of hard rock! Twisting is customarily used in the large deviations literature.)

The twisted particle paper studies the impact of the choice of something similar to, if subtly different from, an importance function on the approximation of the marginal density (or evidence) for HMMs. (In essence, the standard particle filter is modified for only one particle in the population.) The core of the paper is to compare those importance functions in a fixed-N, large-n setting. As in simpler importance sampling cases, there exists an optimal if impractical choice of importance function, leading to a zero-variance estimator of the evidence. Nick and Anthony produce an upper bound on the general variance as well. One of the most appealing features of the paper is that the authors manage a convergence result in n rather than N. (Although the algorithms are obviously validated in the more standard large-N sense.)
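As a point of reference (and only that), here is a quick R sketch of the standard bootstrap particle filter estimate of the evidence for a toy AR(1)-plus-noise model; the twisted filter modifies the proposal and weighting of precisely this baseline, a modification not reproduced here, and all settings below are illustrative.

set.seed(3)
phi <- 0.9; sig <- 0.5; tau <- 1                  # illustrative parameter values
Tn <- 100
x <- numeric(Tn)
x[1] <- rnorm(1, 0, sig / sqrt(1 - phi^2))
for (t in 2:Tn) x[t] <- phi * x[t - 1] + rnorm(1, 0, sig)
y <- x + rnorm(Tn, 0, tau)                        # simulated observations
bootstrap_evidence <- function(y, N = 1000) {
  logZ <- 0
  xs <- rnorm(N, 0, sig / sqrt(1 - phi^2))        # particles started from the stationary law
  for (t in seq_along(y)) {
    if (t > 1) xs <- phi * xs + rnorm(N, 0, sig)  # propagate through the dynamics
    w <- dnorm(y[t], xs, tau)                     # weights from the observation density
    logZ <- logZ + log(mean(w))                   # running log-evidence estimate
    xs <- sample(xs, N, replace = TRUE, prob = w) # multinomial resampling
  }
  logZ
}
bootstrap_evidence(y)                             # log of an estimator that is unbiased for the evidence on the natural scale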

The paper is quite theoretical and dense (I was about to write heavy, in connection with the above!), with half its length dedicated to proofs. It relies on operator theory, with eigenfunctions behind the optimal filter, and is not unrelated to Pierre Del Moral’s work. (It took me a while to realise that the notation ω was not the elemental draw from the σ-algebra but rather the realisation of the observed sequence! And I had to keep recalling that θ was the lag operator and not a model parameter [there is no model parameter].)

ASC 2012 (#2)

Posted in pictures, Running, Statistics, Travel, University life on July 12, 2012 by xi'an

This morning, after a nice and cool run along the river Torrens amidst almost unceasing bird songs, I attended another Bayesian ASC 2012 session, with Scott Sisson presenting a simulation method aimed at correcting for biased confidence intervals and Robert Kohn giving the same talk he had given in Kyoto. Scott’s proposal, which is rather similar to parametric bootstrap bias correction, is actually more frequentist than Bayesian, as the bias is defined with respect to correct frequentist coverage of a given confidence (or credible) interval. (Thus making the connection with Roderick Little’s calibrated Bayes talk of yesterday.) This perspective thus perceives ABC as a particular inferential method, instead of a computational approximation to the genuine Bayesian object. (We will certainly discuss the issue with Scott next week in Sydney.)

Then Peter Donnelly gave a particularly exciting and well-attended talk on the geographic classification of humans, in particular of the (early 1900’s) population of the British Isles, based on a clever clustering idea derived from an earlier paper of Na Li and Matthew Stephens: using genetic sequences from a group of individuals, each individual was paired with the rest of the sample as if descended from this population. Using an HMM, this led to clustering the sample into about 50 groups, with a remarkable geographic homogeneity: for instance, Cornwall and Devon made two distinct groups, an English-speaking pocket of Wales (Little England) was identified as a specific group, and so on, with central, eastern and southern England constituting a homogeneous group of its own…

updated slides for ABC PhD course

Posted in pictures, R, Statistics, Travel, University life on February 8, 2012 by xi'an

Over the weekend, I have added a few slides referring to recent papers on the convergence of ABC algorithms, in particular the very relevant paper by Dean et al. I had already discussed in an earlier post. (This course is taking a much larger chunk of my time than I expected! I am thus relieved I will use the same slides in Roma next month and presumably in Melbourne next summer…)

speed of R, C, &tc.

Posted in R, Running, Statistics, University life on February 3, 2012 by xi'an

My Paris colleague (and fellow runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6, if you count scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, via the EM (Baum-Welch) algorithm. His conclusions are that

  • matlab is a lot faster than R and python, especially when vectorization is important: this is why the difference is spectacular on filtering/smoothing, not so much on the creation of the sample;
  • octave is a good matlab emulator, if no special attention is paid to execution speed…;
  • scilab appears as a credible, efficient alternative to matlab;
  • still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.

(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of your favourite language!
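For the record, here is a minimal R version (not Aurélien's code) of the scaled forward recursion at the heart of the filtering step, written with one matrix product per time instant, which is where vectorisation pays off; the transition and emission matrices below are purely illustrative.

forward_hmm <- function(obs, A, B, init) {
  # obs: observations coded in 1..ncol(B); A: K x K transition matrix (rows sum to 1);
  # B: K x M emission matrix (rows sum to 1); init: length-K initial distribution
  Tn <- length(obs); K <- nrow(A)
  alpha <- matrix(0, Tn, K)
  a <- init * B[, obs[1]]
  logL <- log(sum(a)); alpha[1, ] <- a / sum(a)
  for (t in 2:Tn) {
    a <- as.vector(alpha[t - 1, ] %*% A) * B[, obs[t]]  # one matrix product per time step
    logL <- logL + log(sum(a)); alpha[t, ] <- a / sum(a)
  }
  list(alpha = alpha, logL = logL)
}
A <- matrix(c(.9, .1, .2, .8), 2, byrow = TRUE)    # toy transition matrix
B <- matrix(c(.7, .3, .1, .9), 2, byrow = TRUE)    # toy emission matrix
obs <- sample(1:2, 500, replace = TRUE)            # fake observation sequence
forward_hmm(obs, A, B, c(.5, .5))$logL

The smoothing and Baum-Welch updates reuse the same matrix products, which is why the loop-free languages shine on this part of the comparison.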