Dalhin, Villani [Mattias, not Cédric] and Schön arXived a paper this week with the above title. The type of intractable likelihood they consider is a non-linear state-space (HMM) model and the SMC-ABC they propose is based on an optimised Laplace approximation. That is, replacing the posterior distribution on the parameter θ with a normal distribution obtained by a Taylor expansion of the log-likelihood. There is no obvious solution for deriving this approximation in the case of intractable likelihood functions and the authors make use of a Bayesian optimisation technique called Gaussian process optimisation (GPO). Meaning that the Laplace approximation is the Laplace approximation of a surrogate log-posterior. GPO is a Bayesian numerical method in the spirit of the probabilistic numerics discussed on the ‘Og a few weeks ago. In the current setting, this means iterating three steps
- derive an approximation of the log-posterior ξ at the current θ using SMC-ABC
- construct a surrogate log-posterior by a Gaussian process using the past (ξ,θ)’s
- determine the next value of θ
In the first step, a standard particle filter cannot be used to approximate the observed log-posterior at θ because the conditional density of observed given latent is intractable. The solution is to use ABC for the HMM model, in the spirit of many papers by Ajay Jasra and co-authors. However, I find the construction of the substitute model allowing for a particle filter very obscure… (A side effect of the heat wave?!) I can spot a noisy ABC feature in equation (7), but am at a loss as to how the reparameterisation by the transform τ is compatible with the observed-given-latent conditional being unavailable: if the pair (x,v) at time t has a closed form expression, so does (x,y), at least on principle, since y is a deterministic transform of (x,v). Another thing I do not catch is why having a particle filter available prevent the use of a pMCMC approximation.
The second step constructs a Gaussian process posterior on the log-likelihood, with Gaussian errors on the ξ’s. The Gaussian process mean is chosen as zero, while the covariance function is a Matérn function. With hyperparameters that are estimated by maximum likelihood estimators (based on the argument that the marginal likelihood is available in closed form). Turning the approach into an empirical Bayes version.
The next design point in the sequence of θ’s is the argument of the maximum of a certain acquisition function, which is chosen here as a sort of maximum regret associated with the posterior predictive associated with the Gaussian process. With possible jittering. At this stage, it reminded me of the Gaussian process approach proposed by Michael Gutmann in his NIPS poster last year.
Overall, the method is just too convoluted for me to assess its worth and efficiency without a practical implementation to… practice upon, for which I do not have time! Hence I would welcome any comment from readers having attempted such implementations. I also wonder at the lack of link with Simon Wood‘s Gaussian approximation that appeared in Nature (2010) and was well-discussed in the Read Paper of Fearnhead and Prangle (2012).