Archive for intractable likelihood

parallelizable sampling method for parameter inference of large biochemical reaction models

Posted in Books, Statistics on June 18, 2018 by xi'an

I came across this older (2016) arXiv paper by Jan Mikelson and Mustafa Khammash [antedated as of April 25, 2018] as another version of nested sampling. The novelty of the approach is in applying nested sampling for approximating the likelihood function in the case of involved hidden Markov models (although the name itself does not appear in the paper). This is an interesting proposal, even though there is a fairly large and very active literature on computational approaches to such objects, from sequential Monte Carlo (SMC) to particle MCMC (pMCMC), to SMC².

“We found a way to efficiently sample parameter vectors (particles) from the super level set of the likelihood (sets of particles with a likelihood equal to or higher than some threshold) corresponding to an increasing sequence of thresholds” (p.2)

The approach here is an aggregate of nested sampling and particle filters (SMC), filters that are paradoxically employed in approximating the likelihood function itself, and thus called repeatedly as the value of the parameter θ changes. Unless I am confused, it seems to me that, once started with particle filters, the authors could have used them all the way to the upper level (through, again, SMC²). Instead, and that brings a further degree of (uncorrected) approximation to the procedure, a Dirichlet process prior is used to estimate Gaussian mixture approximations to the true posterior distribution(s) on the (super) level sets. Now, approximating a distribution that is zero outside a compact set [the prior restricted to the likelihood being larger than a given threshold] by a distribution with an infinite support does not a priori sound like a particularly enticing idea. Note also that there is no later correction for using the mixture approximation to the restricted prior. (The method also involves an approximation of the (Lebesgue) volume of the level sets that may be poor in higher dimensions.)
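To make the cost of "calling the filter repeatedly as θ changes" concrete, here is a minimal sketch of a bootstrap particle filter estimate of the log-likelihood. The toy linear Gaussian state-space model, the function names and all tuning values are my own assumptions for illustration, not the biochemical reaction models or the implementation of the paper.

```python
import numpy as np

def simulate(theta, T, seed=1):
    """Simulate T observations from the toy model (an assumption for illustration):
    x_t = theta * x_{t-1} + N(0,1),  y_t = x_t + N(0,1),  x_0 ~ N(0,1)."""
    rng = np.random.default_rng(seed)
    x, ys = rng.normal(), []
    for _ in range(T):
        x = theta * x + rng.normal()
        ys.append(x + rng.normal())
    return np.array(ys)

def bootstrap_loglik(theta, y, n_particles=500, seed=0):
    """Bootstrap particle filter estimate of log p(y | theta) for the toy model."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n_particles)                    # particles for x_0
    loglik = 0.0
    for y_t in y:
        x = theta * x + rng.normal(size=n_particles)    # propagate through the state equation
        logw = -0.5 * (y_t - x) ** 2 - 0.5 * np.log(2 * np.pi)  # N(y_t; x_t, 1) log-density
        m = logw.max()
        w = np.exp(logw - m)                            # stabilised weights
        loglik += m + np.log(w.mean())                  # increment: log of average weight
        x = rng.choice(x, size=n_particles, p=w / w.sum())      # multinomial resampling
    return loglik

y = simulate(0.8, 100)
for theta in (-0.8, 0.0, 0.4, 0.8):
    # one full pass of the filter per value of theta
    print(theta, bootstrap_loglik(theta, y))
```

Every new value of θ requires a complete run of the filter over the data, which is the repeated cost alluded to above.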

“DP-GMM estimations work very well in high dimensional spaces and since we use rejection sampling to obtain samples from the level set by sampling from the DP-GMM estimation, the estimation error does not get propagated through iterations.” (p.13)

One aspect of the paper that puzzles me is the use of a rejection sampler to produce new parameter simulations from a given (super) level set, as this involves a lower bound M on the Gaussian mixture approximation over this level set. If a Gaussian mixture approximation is available, there is apparently no need for this as it can be sampled directly and values below the threshold can be disposed of. It is also unclear why the error does not propagate from one level to the next, if only because of the connection between the successive particle approximations.
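The "sample directly and dispose of the values below the threshold" alternative can be sketched as follows. The mixture parameters, the toy log-likelihood and the threshold are all hypothetical stand-ins (in the paper the mixture would come from the DP-GMM fit), so this is an illustration under those assumptions rather than the authors' procedure.

```python
import numpy as np

def sample_mixture(weights, means, sds, n, rng):
    """Draw n points from a mixture of isotropic Gaussians."""
    comp = rng.choice(len(weights), size=n, p=weights)          # component labels
    return means[comp] + sds[comp, None] * rng.normal(size=(n, means.shape[1]))

def log_lik(theta):
    """Toy log-likelihood (assumption): Gaussian shape centred at the origin."""
    return -0.5 * np.sum(theta ** 2, axis=1)

rng = np.random.default_rng(0)
# Hypothetical mixture standing in for the fitted approximation
weights = np.array([0.5, 0.5])
means = np.array([[-0.5, 0.0], [0.5, 0.0]])
sds = np.array([1.0, 1.0])
threshold = -0.5   # level set {theta: log_lik(theta) >= -0.5}, i.e. the unit disc

draws = sample_mixture(weights, means, sds, 10_000, rng)
kept = draws[log_lik(draws) >= threshold]                       # discard values below the threshold
print(f"kept {len(kept)} of {len(draws)} draws")
```

Note that the retained draws are distributed from the mixture restricted to the level set, not from the prior restricted to that set, which is exactly where a correction would be needed.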


ABC’ptotics on-line

Posted in Statistics on June 14, 2018 by xi'an

Our paper on Asymptotic properties of ABC with David Frazier, Gael Martin, and Judith Rousseau, is now on-line on the Biometrika webpage. Coincidentally both papers by Wentao Li and Paul Fearnhead on ABC’ptotics are published in the June issue of the journal.

Approximate Bayesian computation allows for statistical analysis using models with intractable likelihoods. In this paper we consider the asymptotic behaviour of the posterior distribution obtained by this method. We give general results on the rate at which the posterior distribution concentrates on sets containing the true parameter, the limiting shape of the posterior distribution, and the asymptotic distribution of the posterior mean. These results hold under given rates for the tolerance used within the method, mild regularity conditions on the summary statistics, and a condition linked to identification of the true parameters. Implications for practitioners are discussed.
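As a reminder of the role played by the tolerance in the abstract above, here is a minimal sketch of the basic ABC rejection sampler on a toy Normal mean model. The model, the prior, the choice of the sample mean as summary statistic and the tolerance values are my own assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def abc_rejection(y_obs, eps, n_sims=20_000, prior_sd=10.0, seed=0):
    """ABC rejection sampler for the mean of a N(theta, 1) sample.
    Prior theta ~ N(0, prior_sd^2); summary statistic: the sample mean."""
    rng = np.random.default_rng(seed)
    n = len(y_obs)
    s_obs = y_obs.mean()
    theta = rng.normal(0.0, prior_sd, size=n_sims)           # simulate from the prior
    pseudo = rng.normal(theta[:, None], 1.0, size=(n_sims, n))  # pseudo-data, one row per theta
    s_sim = pseudo.mean(axis=1)
    return theta[np.abs(s_sim - s_obs) <= eps]               # keep theta when summaries are close

y_obs = np.random.default_rng(42).normal(2.0, 1.0, size=50)
for eps in (1.0, 0.3, 0.1):
    post = abc_rejection(y_obs, eps)
    print(f"eps={eps}: {len(post)} accepted, mean={post.mean():.3f}, sd={post.std():.3f}")
```

Shrinking the tolerance concentrates the accepted values around the observed summary, at the price of a lower acceptance rate, which is why the rate at which the tolerance decreases matters in the asymptotics.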

ABCDE for approximate Bayesian conditional density estimation

Posted in Books, pictures, Statistics, Travel, University life on February 26, 2018 by xi'an

Another arXived paper I surprisingly (?) missed, by George Papamakarios and Iain Murray, on an ABCDE (my acronym!) substitute to ABC for generative models. The paper was reviewed [with reviews made available!] and accepted by NIPS 2016. (Most obviously, I was not one of the reviewers!)

“Conventional ABC algorithms such as the above suffer from three drawbacks. First, they only represent the parameter posterior as a set of (possibly weighted or correlated) samples [for which] it is not obvious how to perform some other computations using samples, such as combining posteriors from two separate analyses. Second, the parameter samples do not come from the correct Bayesian posterior (…) Third, as the ε-tolerance is reduced, it can become impractical to simulate the model enough times to match the observed data even once [when] simulations are expensive to perform”

The above criticisms are a wee bit overly harsh as, well…, Monte Carlo approximations remain a solution worth considering for all Bayesian purposes!, while the approximation [replacing the data with a ball] in ABC is replaced with an approximation of the true posterior as a mixture, both requiring repeated [and likely expensive] simulations. The alternative is in iteratively simulating from pseudo-predictives towards learning better pseudo-posteriors, then used as new proposals at the next iteration modulo an importance sampling correction. The approximation to the posterior chosen therein is a mixture density network, namely a mixture distribution with parameters obtained as neural networks based on the simulated pseudo-observations. Which the authors claim [p.4] requires no tuning. (Still, there are several aspects to tune, from the number of components to the hyper-parameter λ [p.11, eqn (35)], to the structure of the neural network [20 tanh? 50 tanh?], to the number of iterations, to the amount of X checking. As usual in NIPS papers, it is difficult to assess how arbitrary the choices made in the experiments are. Unless one starts experimenting with the codes provided.) All in all, I find the paper nonetheless exciting enough (!) to now start a summer student project on it in Dauphine and hope to check the performances of ABCDE on different models, as well as comparing this ABC implementation with a synthetic likelihood version.

As an addendum, let me point out the very pertinent analysis of this paper by Dennis Prangle, 18 months ago!

running ABC when the likelihood is available

Posted in Statistics on September 19, 2017 by xi'an

Today I refereed a paper where the authors used ABC to bypass convergence (and implementation) difficulties with their MCMC algorithm. And I am still pondering whether or not this strategy makes sense. If only because ABC needs to handle the same complexity and the same number of parameters as an MCMC algorithm. While shooting “in the dark” by using the prior or a coarse substitute to the posterior. And I wonder at the relevance of simulating new data when the [true] likelihood value [at the observed data] can be computed. This would sound to me like the relevant and unique “statistics” worth considering…

estimating constants [survey]

Posted in Books, pictures, Statistics, University life on February 2, 2017 by xi'an

A new survey on Bayesian inference with intractable normalising constants was posted on arXiv yesterday by Jaewoo Park and Murali Haran. A rather massive work of 58 pages, almost handy for a short course on the topic! In particular, it goes through the most common MCMC methods with a detailed description, followed by comments on components to be calibrated and the potential theoretical backup. This includes for instance the method of Liang et al. (2016) that I reviewed a few months ago. As well as the Wang-Landau technique we proposed with Yves Atchadé and Nicolas Lartillot. And the noisy MCMC of Alquier et al. (2016), also reviewed a few months ago. (The Russian Roulette solution is only mentioned very briefly as “computationally very expensive”. But still used in some illustrations. The whole area of pseudo-marginal MCMC is also missing from the picture.)

“…auxiliary variable approaches tend to be more efficient than likelihood approximation approaches, though efficiencies vary quite a bit…”

The authors distinguish between MCMC methods where the normalizing constant is approximated and those where it is omitted by an auxiliary representation. The survey also distinguishes between asymptotically exact and asymptotically inexact solutions. For instance, using a finite number of MCMC steps instead of the associated target results in an asymptotically inexact method. The question that remains open is what to do with the output, i.e., whether or not there is a way to correct for this error. In the illustration for the Ising model, the double Metropolis-Hastings version of Liang et al. (2010) achieves for instance massive computational gains, but also exhibits a persistent bias that would go undetected were it the sole method implemented. This aspect of approximate inference is not really explored in the paper, but constitutes a major issue for modern statistics (and machine learning as well, when inference is taken into account).
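For readers unfamiliar with the double Metropolis-Hastings idea, here is a minimal sketch on a small Ising model, where the auxiliary dataset is produced by a finite number of Gibbs sweeps (a special case of Metropolis-Hastings moves) started at the observed configuration instead of an exact draw. The lattice size, the uniform prior on [0, 1], the random walk scale and the number of sweeps are all my own assumptions and not the settings of the survey.

```python
import numpy as np

def suff_stat(x):
    """Sum of products of neighbouring spins on a torus, each pair counted once."""
    return np.sum(x * np.roll(x, 1, axis=0)) + np.sum(x * np.roll(x, 1, axis=1))

def gibbs_sweeps(x, theta, n_sweeps, rng):
    """A finite number of single-site Gibbs sweeps targeting p(x|theta) ∝ exp(theta * S(x))."""
    x = x.copy()
    n, m = x.shape
    for _ in range(n_sweeps):
        for i in range(n):
            for j in range(m):
                nb = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                      + x[i, (j - 1) % m] + x[i, (j + 1) % m])
                p_plus = 1.0 / (1.0 + np.exp(-2.0 * theta * nb))  # P(spin = +1 | neighbours)
                x[i, j] = 1 if rng.random() < p_plus else -1
    return x

def double_mh(x_obs, n_iter=300, n_sweeps=2, step=0.1, theta_max=1.0, seed=0):
    """Double MH sketch: exchange-type acceptance ratio with an approximate auxiliary draw."""
    rng = np.random.default_rng(seed)
    theta, s_obs = 0.2, suff_stat(x_obs)
    chain = np.empty(n_iter)
    for t in range(n_iter):
        prop = theta + step * rng.normal()                   # symmetric random walk proposal
        if 0.0 <= prop <= theta_max:                         # uniform prior on [0, theta_max]
            y = gibbs_sweeps(x_obs, prop, n_sweeps, rng)     # auxiliary data from finitely many steps
            # the intractable normalising constants cancel in this ratio
            log_ratio = (prop - theta) * (s_obs - suff_stat(y))
            if np.log(rng.random()) < log_ratio:
                theta = prop
        chain[t] = theta
    return chain

rng = np.random.default_rng(1)
x_obs = gibbs_sweeps(rng.choice([-1, 1], size=(10, 10)), 0.3, 50, rng)  # pseudo-observed lattice
chain = double_mh(x_obs)
print("mean of theta over the second half of the chain:", chain[150:].mean())
```

Since the auxiliary configuration is not an exact draw from the model at the proposed value, the chain does not target the true posterior, which is the source of the persistent bias mentioned above; increasing the number of sweeps trades computing time against this bias.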

In conclusion, this survey provides a serious exploration of recent MCMC methods. It begs for a second part involving particle filters, which have often proven to be faster and more efficient than MCMC methods, at least in state space models. In that regard, Nicolas Chopin and James Ridgway examined further techniques when calling to leave the Pima Indians [dataset] alone.
