Archive for normalising constant

Bayesian model selection without evidence

Posted in Books, Statistics, University life on September 20, 2016 by xi'an

“The new method circumvents the challenges associated with accurate evidence calculations by computing posterior odds ratios using Bayesian parameter estimation”

One paper leading to another, I had a look at the 2015 paper by Hee et al. on Bayes factor estimation. The “novelty” lies in introducing the model index as an extra parameter in a single model encompassing all models under comparison, the “new” parameterisation being in (θ,n) rather than in θ, with the distinction that the parameter θ is now made of the union of all parameters across all models. Which reminds us very much of the Carlin and Chib (1995) approach to the problem. (Peter Green, in his 1995 Biometrika paper on reversible jump MCMC, uses instead a direct sum of parameter spaces.) The authors indeed suggest simulating jointly (θ,n) in an MCMC or nested sampling scheme. But rather than being updated by arbitrary transforms as in Carlin and Chib (1995), the unused parameters from the other models are kept constant… The goal being to estimate P(n|D), the marginal posterior on the model index, aka the posterior probability of model n.

Now, I am not quite certain that keeping the other parameters constant is a valid move: given a uniform prior on n and an equally uniform proposal, the acceptance probability simplifies into the regular Metropolis-Hastings ratio for model n. Hence the move is valid within model n; if rejected, I presume the previous pair (θ⁰,n⁰) is repeated. Wait, actually, this is slightly more elaborate: if a new value of n, m, is proposed, then the acceptance ratio involves the posteriors for both n⁰ and m, and possibly only the likelihoods when the proposal is the prior. So the move directly depends on the likelihood ratio in this simplified case, which indicates the scheme could be correct after all. Except that this neglects the measure-theoretic subtleties that led to the reversible jump symmetry conditions and hence makes me wonder. In other words, it follows exactly the same pattern as reversible jump without the constraints of the latter… Free lunch, anyone?!
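To fix ideas, here is a minimal Python sketch of such a scheme on a toy two-model comparison, under my own reading that the joint target completes each model with the other model's prior acting as pseudo-prior (à la Carlin and Chib); the models, priors, and proposals below are invented for illustration and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(1.0, 1.0, size=20)

# Two toy models for the same data, each with one location parameter
# under a N(0,1) prior:
#   model 0:  data_i ~ N(theta_0, 1)
#   model 1:  data_i ~ Laplace(theta_1, 1)
def log_lik(n, theta):
    if n == 0:
        return -0.5 * np.sum((data - theta[0]) ** 2)
    return -len(data) * np.log(2) - np.sum(np.abs(data - theta[1]))

log_prior = lambda t: -0.5 * t ** 2

def index_mcmc(niter=100_000, scale=0.5):
    theta, n, counts = np.zeros(2), 0, np.zeros(2)
    for _ in range(niter):
        # (a) within-model move: perturb the active parameter only;
        # the other model's parameter stays frozen
        prop = theta.copy()
        prop[n] += scale * rng.normal()
        ratio = (log_prior(prop[n]) + log_lik(n, prop)
                 - log_prior(theta[n]) - log_lik(n, theta))
        if np.log(rng.uniform()) < ratio:
            theta = prop
        # (b) index move with theta entirely frozen: with a uniform
        # prior and proposal on n, only the likelihood ratio remains
        m = 1 - n
        if np.log(rng.uniform()) < log_lik(m, theta) - log_lik(n, theta):
            n = m
        counts[n] += 1
    return counts / niter      # Monte Carlo estimate of P(n | D)

print(index_mcmc())
```

The frozen parameter's prior contribution cancels from both sides of the index move, which is where the free-lunch impression comes from; the real question raised above is what happens once the index proposal also perturbs the parameters.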

CRiSM workshop on estimating constants [slides]

Posted in Books, pictures, Statistics, Travel, University life on May 4, 2016 by xi'an

A short announcement that the slides of almost all talks at the CRiSM workshop on estimating constants held last April 20-22 are now available. Enjoy (and discuss)!

estimating constants [impression soleil levant]

Posted in pictures, Running, Statistics, Travel, University life on April 25, 2016 by xi'an

The CRiSM workshop on estimating constants, which took place here in Warwick from April 20 till April 22, was quite enjoyable [says most objectively one of the organisers!], with all speakers present to deliver their talks (!) and around sixty participants, plus 17 posters. It remains an exciting aspect of the field that so many and such different perspectives are available on the “doubly intractable” problem of estimating a normalising constant. Several talks and posters concentrated on Ising models, which always sound a bit artificial to me, but also make perfect testing grounds for approximations to classical algorithms.

On top of [clearly interesting!] talks associated with papers I had already read [and commented here], I had not previously heard about Pierre Jacob’s coupling SMC sequence, which paper is not yet out [no spoiler then!]. Or about Michael Betancourt’s adiabatic Monte Carlo and its connection with the normalising constant. Nicolas Chopin talked about the unnormalised Poisson process I discussed a while ago, with the feature that the normalising constant itself becomes an additional parameter. And that integration can be replaced with (likelihood) maximisation. The approach, which is based on a reference distribution (and an artificial logistic regression à la Geyer), reminded me of bridge sampling. And indirectly of path sampling, especially when Merrilee Hurn gave us a very cool introduction to power posteriors in the following talk. Also mentioning the controlled thermodynamic integration of Chris Oates and co-authors I discussed a while ago. (Too bad that Chris Oates could not make it to this workshop!) And also pointing out that thermodynamic integration could be a feasible alternative to nested sampling.
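Since power posteriors connect with path sampling through the thermodynamic-integration identity log Z = ∫₀¹ E_{p_t}[log L] dt, here is a minimal Python sketch of that identity on a conjugate Gaussian toy where every tempered posterior can be sampled exactly; the temperature ladder and trapezoidal rule are generic choices of mine, not anything from the talks.

```python
import numpy as np

rng = np.random.default_rng(1)
# conjugate toy: prior theta ~ N(0,1), data y_i ~ N(theta,1)
y = rng.normal(0.5, 1.0, size=10)
n, s = len(y), y.sum()

def log_lik(theta):
    # vectorised log-likelihood over an array of theta draws
    return (-0.5 * n * np.log(2 * np.pi)
            - 0.5 * ((y[:, None] - theta) ** 2).sum(axis=0))

def sample_power_posterior(t, size):
    # p_t(theta) ∝ N(theta; 0, 1) L(theta)^t stays Gaussian by
    # conjugacy: precision 1 + t n, mean t s / (1 + t n)
    prec = 1.0 + t * n
    return rng.normal(t * s / prec, prec ** -0.5, size)

# log Z = ∫_0^1 E_{p_t}[log L] dt, trapezoidal rule on a ladder
# concentrated near t = 0 where the integrand varies most
ts = np.linspace(0, 1, 30) ** 5
m = np.array([log_lik(sample_power_posterior(t, 20_000)).mean() for t in ts])
print("TI estimate :", np.sum(np.diff(ts) * (m[1:] + m[:-1]) / 2))

# closed-form log marginal likelihood for this conjugate model
exact = (-0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n)
         - 0.5 * (y ** 2).sum() + 0.5 * s ** 2 / (1 + n))
print("exact log Z :", exact)
```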

Another novel aspect was found in Yves Atchadé’s talk about sparse high-dimensional matrices with priors made of mutually exclusive measures and quasi-likelihood approximations. A simplified version of the approach consists in taking a non-identified, unconstrained matrix that is later projected onto one of those measure supports. While I was aware of his noise-contrastive estimation of normalising constants, I had not previously heard Michael Gutmann give a talk on that approach (linking to Geyer’s mythical 1994 paper!). And I do remain nonplussed at the possibility of including the normalising constant as an additional parameter [in a computational and statistical sense]…! Both Chris Sherlock and Christophe Andrieu talked about novel aspects of pseudo-marginal techniques, Chris on the lack of variance reduction brought by averaging unbiased estimators of the likelihood, and Christophe on the case of large datasets, recovering better performance in latent variable models by estimating the ratio rather than taking a ratio of estimators. (With Christophe pointing out that this was an exceptional case where harmonic mean estimators could be considered!)
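Geyer's logistic trick behind noise-contrastive estimation deserves a concrete illustration: with samples from the target and from a reference of known density, log Z is recovered as the intercept of a logistic regression whose slope-one term log f − log q is held fixed. The toy target, Gaussian reference, and rejection sampler below are my own assumptions, not Gutmann's implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(2)

# unnormalised target f(x) = exp(-x^4); true Z = 2 Γ(5/4) ≈ 1.813
log_f = lambda x: -x ** 4
# reference with known density: standard normal
log_q = lambda x: -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)

def sample_target(n):
    # simple rejection sampler from the reference; M = 3 bounds f/q
    out = []
    while len(out) < n:
        x = rng.normal(0, 1, 1000)
        keep = np.log(rng.uniform(size=1000)) < log_f(x) - log_q(x) - np.log(3.0)
        out.extend(x[keep])
    return np.array(out[:n])

x_p = sample_target(5000)          # labelled 1
x_q = rng.normal(0, 1, 5000)       # labelled 0

# Geyer (1994): with equal sample sizes, the log-odds of the label
# are log f(x) - log q(x) - log Z, so log Z is the intercept of a
# logistic regression with a fixed, known slope-one offset term
h_p = log_f(x_p) - log_q(x_p)
h_q = log_f(x_q) - log_q(x_q)

def neg_loglik(c):                 # c plays the role of log Z
    return (np.logaddexp(0, -(h_p - c)).sum()
            + np.logaddexp(0, h_q - c).sum())

log_Z = minimize_scalar(neg_loglik).x
print("NCE estimate of Z:", np.exp(log_Z), "(truth ≈ 1.813)")
```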

adaptive resample move for estimating constants

Posted in Books, Statistics, University life on April 4, 2016 by xi'an

[bike trail from Kenilworth to the University of Warwick]

“…adaptive resample-move allows us to reduce the variance of the estimate of normalizing constants.”

A few days before our Estimating Constants workshop, Marco Fraccaro, Ulrich Paquet, and Ole Winther arXived a paper on estimating normalising constants by resample-move sequential Monte Carlo. The main result of this note is a theorem deriving the optimal relative size of each particle system, based on the approximate variance of the associated importance weights. Which is of major importance when running a sequential algorithm under computing time or budget constraints. In practice this theorem cannot be applied sequentially [since it depends on future steps] and the authors propose instead an adaptive algorithm, enlarging the current set of particles if the effective sample size per particle is not large enough. There may however be a danger of an endless enlargement step if the proposal is particularly ill-fitted to the target. Under a fixed total budget, this potential explosion in the number of particles may jeopardise the entire process. Unless some safeguarding mechanism is introduced for moving back in time to recover more variety in the earlier steps. The paper compares the adaptive method with other solutions, including a Riemannian manifold HMC, on Gaussian processes and restricted Boltzmann machines. Both examples being associated with very specialised ways of building the sequence of tempered targets, it seems.
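As background, here is a minimal Python sketch of the resample-move SMC backbone such methods build upon, with a fixed number of particles; the adaptive version discussed above would instead monitor the effective sample size at the resampling step and enlarge the particle system when it drops. The toy target and tempering ladder are again my own choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# toy unnormalised target f(x) = exp(-x^4), tempered from a N(0,1)
# start; truth: log Z = log ∫ f ≈ log 1.813 ≈ 0.595
log_f = lambda x: -x ** 4
log_q0 = lambda x: -0.5 * x ** 2 - 0.5 * np.log(2 * np.pi)
betas = np.linspace(0, 1, 50)              # my own tempering ladder

def smc_log_Z(N=2000, mh_scale=0.8):
    x = rng.normal(0, 1, N)                # exact draws at beta = 0
    log_Z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # incremental weights between consecutive tempered targets
        log_w = (b - b_prev) * (log_f(x) - log_q0(x))
        log_Z += np.logaddexp.reduce(log_w) - np.log(N)
        # resample (multinomial); the adaptive scheme would monitor
        # the ESS here and enlarge the particle system when it drops
        w = np.exp(log_w - log_w.max())
        x = x[rng.choice(N, N, p=w / w.sum())]
        # move: one Metropolis pass targeting the current target
        lp = lambda z, b=b: b * log_f(z) + (1 - b) * log_q0(z)
        prop = x + mh_scale * rng.normal(size=N)
        acc = np.log(rng.uniform(size=N)) < lp(prop) - lp(x)
        x[acc] = prop[acc]
    return log_Z

print("SMC log Z estimate:", smc_log_Z())
```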

CRiSM workshop on estimating constants [#2]

Posted in pictures, Statistics, Travel, University life, Wines on March 31, 2016 by xi'an

The schedule for the CRiSM workshop on estimating constants that Nial Friel, Helen Ogden and I host next April 20-22 at the University of Warwick is now set as follows. (The plain registration fee is £40 and accommodation on campus is available through the online form.)

April 20, 2016
11:45 — 12:30: Adam Johansen
12:30 — 14:00: Lunch
14:00 — 14:45: Anne-Marie Lyne
14:45 — 15:30: Pierre Jacob
15:30 — 16:00: Break
16:00 — 16:45: Roberto Trotta
17:00 — 18:00: ‘Elevator’ talks
18:00 — 20:00: Poster session, Cheese and wine

April 21, 2016
9:00 — 9:45: Michael Betancourt
9:45 — 10:30: Nicolas Chopin
10:30 — 11:00: Coffee break
11:00 — 11:45: Merrilee Hurn
11:45 — 12:30: Jean-Michel Marin
12:30 — 14:00: Lunch
14:00 — 14:45: Sumit Mukherjee
14:45 — 15:30: Yves Atchadé
15:30 — 16:00: Break
16:00 — 16:45: Michael Gutmann
16:45 — 17:30: Panayiota Touloupou
19:00 — 22:00: Dinner

April 22, 2016
9:00 — 9:45: Chris Sherlock
9:45 — 10:30: Christophe Andrieu
10:30 — 11:00: Coffee break
11:00 — 11:45: Antonietta Mira

multilevel Monte Carlo for estimating constants

Posted in Books, Statistics, University life on March 18, 2016 by xi'an

Pierre Del Moral, Ajay Jasra, Kody Law, and Yan Zhou just arXived a paper entitled Sequential Monte Carlo samplers for normalizing constants. Which obviously attracted my interest! The context is that of a sequential Monte Carlo problem, with an associated sequence of targets and of attached normalising constants. While the quantity of interest only relates to the final distribution in the sequence, using Mike Giles‘ multilevel Monte Carlo approach allows for a more accurate estimation, recycling all past particles thanks to the telescoping formula. And the sequential representation also allows for an unbiased estimator, as is well known in the sequential Monte Carlo literature. The paper derives accurate bounds on both the variances of two normalising constant estimators and the costs of producing such estimators (assuming there is an index typo in Corollary 3.1, where L-2 should be L-1). The improvement when compared with traditional SMC is clear in the example contained in the paper. As I read the paper rather quickly and without much attention to the notations, I may have missed the point, but I did not see any conclusion on the choice of the particle population size at each iteration of the SMC algorithm. When I asked Ajay about it, he pointed out that this size can be derived as

N_k=\epsilon^{-2}Lh_k^{(\beta+\zeta)/2}K_L

(with notations taken from the paper).
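As a purely illustrative reading of how this allocation scales across levels, here is a quick computation with invented values of ε, β, ζ and level widths h_k, and with K_L assumed to be of the standard multilevel form Σ_k h_k^{(β−ζ)/2} (an assumption on my part, as the entry does not reproduce the paper's definition).

```python
import numpy as np

# invented illustration values, not the paper's experimental settings
eps, beta, zeta, L = 0.01, 2.0, 1.0, 6
h = 2.0 ** -np.arange(1, L + 1)            # level widths h_k = 2^{-k}
K_L = np.sum(h ** ((beta - zeta) / 2))     # assumed standard MLMC form
N = np.ceil(eps ** -2 * L * h ** ((beta + zeta) / 2) * K_L).astype(int)
print(N)   # particle numbers decay on the finer, costlier levels
```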

multiple try Metropolis

Posted in Books, Statistics, University life on February 18, 2016 by xi'an

Luca Martino and Francisco Louzada recently wrote a paper in Computational Statistics about some difficulties with the multiple try Metropolis algorithm. This version of Metropolis, due to Liu et al. (2000), makes several proposals in parallel and picks one among them by multinomial sampling, with weights proportional to the corresponding importance weights. This is followed by a Metropolis acceptance step that requires simulating the same number of proposed moves from the selected value. While this is necessary to achieve detailed balance, this mixture of MCMC and importance sampling is inefficient in that it simulates a large number of particles and ends up using only one of them. By comparison, a particle filter for the same setting would propagate all N particles along iterations and only resample occasionally, when the ESS gets too small. (I also wonder if the method could be seen as a special kind of pseudo-marginal approach, given that the acceptance ratio is an empirical average with expectation the missing normalising constant [as I later realised the authors had pointed out!]… In which case efficiency comparisons by Christophe Andrieu and Matti Vihola could prove useful.)
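A minimal Python sketch of the multiple-try step just described, with importance weights w = π/q and a toy bimodal target of my own choosing, makes the waste explicit: 2N−1 simulations per iteration for a single retained value.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)

# toy bimodal target, unnormalised
log_pi = lambda x: np.logaddexp(-0.5 * (x - 3) ** 2, -0.5 * (x + 3) ** 2)

def mtm_step(x, N=10, scale=2.0):
    # 1. propose N candidates from q(.|x), compute importance
    #    weights w = pi / q, and select one by multinomial sampling
    ys = x + scale * rng.normal(size=N)
    log_wy = log_pi(ys) - norm.logpdf(ys, loc=x, scale=scale)
    wy = np.exp(log_wy - log_wy.max())
    y = ys[rng.choice(N, p=wy / wy.sum())]
    # 2. reverse stage: N-1 fresh draws from q(.|y) plus the current
    #    state x, as required for detailed balance
    xs = np.append(y + scale * rng.normal(size=N - 1), x)
    log_wx = log_pi(xs) - norm.logpdf(xs, loc=y, scale=scale)
    # 3. accept with the ratio of the two weight sums; each average
    #    of importance weights has the missing normalising constant
    #    as expectation, hence the pseudo-marginal flavour noted above
    log_ratio = np.logaddexp.reduce(log_wy) - np.logaddexp.reduce(log_wx)
    return y if np.log(rng.uniform()) < log_ratio else x

x, chain = 0.0, []
for _ in range(20_000):
    x = mtm_step(x)
    chain.append(x)
print("chain mean (≈ 0 for this symmetric target):", np.mean(chain))
```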

The issue raised by Martino and Louzada is that the estimator of the normalising constant can be poor at times, especially when the chain is in low-probability regions of the target, and hence get the chain stuck. A graph in the paper illustrates this setting. However, the reason for the failure is mostly that the proposal distribution is inappropriate for the purpose of approximating the normalising constant, i.e., that importance sampling does not converge in this situation, since otherwise the average of the importance weights would a.s. converge to the normalising constant. And the method should not worsen when increasing the number of proposals at a given stage. (The solution proposed by the authors, of using a random number of proposals, seems unlikely to solve the issue in a generic situation. Changing the proposals towards different tail behaviours as in population Monte Carlo is more akin to defensive sampling and thus more likely to avoid trapping states. Interestingly, the authors eventually resort to a mixture denominator in the importance sampler, following AMIS.)