Archive for Cambridge

nested sampling when prior and likelihood clash

Posted in Books, Statistics on April 3, 2018 by xi'an

A recent arXival by Chen, Hobson, Das, and Gelderblom proposes a new nested sampling implementation for cases where prior and likelihood disagree, which makes simulations from the prior inefficient. The paper takes the position that a single given prior is used over and over, for all datasets that come along:

“…in applications where one wishes to perform analyses on many thousands (or even millions) of different datasets, since those (typically few) datasets for which the prior is unrepresentative can absorb a large fraction of the computational resources.” Chen et al., 2018

My reaction to this situation, provided (a) I want to implement nested sampling and (b) I realise there is a discrepancy, would be to resort to an importance sampling resolution, as we proposed in our Biometrika paper with Nicolas. Since one objection [from the authors] is that identifying outlier datasets is complicated (it should not be when the likelihood function can be computed) and time-consuming, sequential importance sampling could be implemented.

“The posterior repartitioning (PR) method takes advantage of the fact that nested sampling makes use of the likelihood L(θ) and prior π(θ) separately in its exploration of the parameter space, in contrast to Markov chain Monte Carlo (MCMC) sampling methods or genetic algorithms which typically deal solely in terms of the product.” Chen et al., 2018

The above salesman line does not ring particularly convincing, in that nested sampling is about as myopic as MCMC, since it is based on a similar notion of local proposal move, starting from the lowest likelihood argument (the minimum likelihood estimator!) in the nested sample.
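To make the mechanics concrete, here is a bare-bones nested sampling loop in Python. It is only a sketch under stated assumptions: a toy model with a N(0,1) prior and a single N(θ,1) observation y (my choice, not from the paper), the usual deterministic approximation X_i = exp(-i/N) for the prior mass, and plain rejection from the prior in place of the local proposal move mentioned above. The function name `nested_sampling` and its arguments are hypothetical.

```python
import numpy as np

def nested_sampling(log_like, sample_prior, n_live=200, n_iter=1000, seed=0):
    """Toy nested sampling; returns (log-evidence estimate, number of prior draws)."""
    rng = np.random.default_rng(seed)
    live = sample_prior(rng, n_live)      # live points drawn from the prior
    live_ll = log_like(live)
    n_draws = n_live
    log_z = -np.inf
    log_x_prev = 0.0                      # log prior mass, X_0 = 1
    for i in range(1, n_iter + 1):
        worst = np.argmin(live_ll)        # lowest likelihood point in the nested sample
        ll_min = live_ll[worst]
        log_x = -i / n_live               # assumed approximation X_i = exp(-i/N)
        log_w = np.log(np.exp(log_x_prev) - np.exp(log_x))
        log_z = np.logaddexp(log_z, ll_min + log_w)
        # replace the worst point by a prior draw with higher likelihood;
        # rejection from the prior stands in for a local (MCMC) move
        while True:
            cand = sample_prior(rng, 1)
            n_draws += 1
            cand_ll = log_like(cand)[0]
            if cand_ll > ll_min:
                break
        live[worst] = cand[0]
        live_ll[worst] = cand_ll
        log_x_prev = log_x
    # the remaining live points share the leftover prior mass X_n equally
    log_rest = log_x_prev - np.log(n_live) + np.logaddexp.reduce(live_ll)
    return np.logaddexp(log_z, log_rest), n_draws

y = 1.0                                   # toy observation, compatible with the prior
log_like = lambda theta: -0.5 * (theta - y) ** 2 - 0.5 * np.log(2.0 * np.pi)
sample_prior = lambda rng, n: rng.standard_normal(n)

log_z, n_draws = nested_sampling(log_like, sample_prior)
log_z_exact = -0.5 * np.log(4.0 * np.pi) - y ** 2 / 4.0   # log N(y; 0, 2)
```

In this version the acceptance rate of the rejection step is the remaining prior mass X_i, so a likelihood concentrated far in the tails of the prior requires many more iterations and prior draws before the evidence stabilises. This is the inefficiency at stake when prior and likelihood clash.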

“The advantage of this extension is that one can choose (π’,L’) so that simulating from π’ under the constraint L'(θ) > l is easier than simulating from π under the constraint L(θ) > l. For instance, one may choose an instrumental prior π’ such that Markov chain Monte Carlo steps adapted to the instrumental constrained prior are easier to implement than with respect to the actual constrained prior. In a similar vein, nested importance sampling facilitates contemplating several priors at once, as one may compute the evidence for each prior by producing the same nested sequence, based on the same pair (π’,L’), and by simply modifying the weight function.” Chopin & Robert, 2010

Since the authors propose to switch to a pair (π’,L’) such that π’.L’=π.L, the solution appears to be a special case of importance sampling, with the added drawback that when π’ is not normalised, its normalising constant must be estimated as well. (With an extra nested sampling implementation?) Furthermore, the advocated solution is to use tempering, which is not as obvious as it seems in small dimensions, as the mass does not always diffuse to relevant parts of the space. A more “natural” tempering would be to use a subsample in the (sub)likelihood for nested sampling and keep the remainder of the sample for weighting the evaluation of the evidence.
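As a numerical check of the identity π’.L’=π.L, the following Python sketch repartitions a toy posterior on a grid. It assumes that "tempering" means a power prior π’ ∝ π^β with the leftover factor pushed into L’ (my reading, to be checked against the paper), and uses a N(0,1) prior with one N(θ,1) observation at y=8, both chosen purely for illustration. The helper name `repartition` is hypothetical.

```python
import numpy as np

def repartition(theta, log_prior, log_like, beta):
    """Return (log pi', log L') on a grid, assuming pi' proportional to pi**beta."""
    dtheta = theta[1] - theta[0]
    log_unnorm = beta * log_prior
    # normalising constant of pi**beta, here by a Riemann sum on the grid
    log_z_beta = np.log(np.sum(np.exp(log_unnorm)) * dtheta)
    log_prior_new = log_unnorm - log_z_beta
    # L' absorbs the rest so that pi' * L' = pi * L pointwise
    log_like_new = log_like + (1.0 - beta) * log_prior + log_z_beta
    return log_prior_new, log_like_new

theta = np.linspace(-50.0, 50.0, 200001)
dtheta = theta[1] - theta[0]
log_prior = -0.5 * theta ** 2 - 0.5 * np.log(2.0 * np.pi)          # N(0, 1) prior
log_like = -0.5 * (theta - 8.0) ** 2 - 0.5 * np.log(2.0 * np.pi)   # y = 8, far in the prior tail

log_prior_new, log_like_new = repartition(theta, log_prior, log_like, beta=0.1)

# the evidence is unchanged by the repartitioning
evidence = np.sum(np.exp(log_prior + log_like)) * dtheta
evidence_pr = np.sum(np.exp(log_prior_new + log_like_new)) * dtheta

# prior mass near the posterior bulk (posterior is N(4, 1/2)) under pi and pi'
window = np.abs(theta - 4.0) < 1.0
mass_old = np.sum(np.exp(log_prior)[window]) * dtheta
mass_new = np.sum(np.exp(log_prior_new)[window]) * dtheta
```

In this one-dimensional toy the flattened prior does put more mass where the likelihood sits. The normalising constant of π^β is available by quadrature here, which is precisely what is lost in realistic settings.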

positions in North-East America

Posted in Kids, pictures, Statistics, Travel, University life on September 14, 2017 by xi'an

Today I received emails about openings in both Université de Montréal, Canada, and Harvard University, USA:

  • Professor in Statistics, Biostatistics or Data Science at U de M, deadline October 30th, 2017, a requirement being proficiency in the French language;
  • Tenure-Track Professorship in Statistics at Harvard University, Department of Statistics, details there.

after-dinner at Trinity [jatp]

Posted in pictures, Travel, University life on July 8, 2017 by xi'an

Fourth Bayesian, Fiducial, and Frequentist Conference

Posted in Books, pictures, Statistics, Travel, University life, Wines on March 29, 2017 by xi'an

Next May 1-3, I will attend the 4th Bayesian, Fiducial and Frequentist Conference at Harvard University (hopefully not under snow at that time of year), which is a meeting between philosophers and statisticians about foundational thinking in statistics and inference under uncertainty. This should be fun! (Registration is now open.)

Britain, please stay!

Posted in Books, Kids, pictures, Running, University life on June 7, 2016 by xi'an

A love letter from some Europeans against Brexit that appeared in the Times Literary Supplement a few days ago, and whose message I definitely support:

All of us in Europe respect the right of the British people to decide whether they wish to remain with us in the European Union. It is your decision, and we will all accept it. Nevertheless, if it will help the undecided to make up their minds, we would like to express how very much we value having the United Kingdom in the European Union. It is not just treaties that join us to your country, but bonds of admiration and affection. All of us hope that you will vote to renew them. Britain, please stay.

messages from Harvard

Posted in pictures, Statistics, Travel, University life on March 24, 2016 by xi'an

As in Bristol two months ago, where I joined the statistics reading in the morning, I had the opportunity to discuss the paper on testing via mixtures prior to my talk with a group of Harvard graduate students. Which concentrated on the biasing effect of the Bayes factor against the more complex hypothesis/model. Arguing [if not in those terms!] that Occam’s razor was too sharp. With a neat remark that decomposing the log Bayes factor as

log(p¹(y¹|H))+log(p²(y²|y¹,H))+…

meant that the first marginal was immensely and uniquely impacted by the prior modelling, hence very likely to be very small for a larger model H, which would then take forever to recover from. And asking why there was such a difference with cross-validation

log(p¹(y¹|y⁻¹,H))+log(p²(y²|y⁻²,H))+…

where the leave-one-out posterior predictive is indeed more stable. While the latter leads to major overfitting in my opinion, I had never spotted the former decomposition, which does appear as a strong and maybe damning criticism of the Bayes factor in terms of the long-term impact of the prior modelling.
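The contrast between the two decompositions above can be checked on a conjugate toy model. The Python sketch below assumes y_i ~ N(μ,1) with μ ~ N(0,τ²) and a small made-up dataset (both are illustrative assumptions, not taken from the discussion). It computes the sequential predictive terms, whose sum is the log marginal likelihood, and the leave-one-out terms, for a moderately vague and a very vague prior.

```python
import numpy as np

def log_norm_pdf(x, mean, var):
    return -0.5 * np.log(2.0 * np.pi * var) - 0.5 * (x - mean) ** 2 / var

def sequential_terms(y, tau):
    """log p(y_i | y_1..y_{i-1}, H) for y_i ~ N(mu, 1), mu ~ N(0, tau**2)."""
    terms = []
    for i in range(len(y)):
        prec = 1.0 / tau ** 2 + i        # posterior precision of mu after i observations
        mean = np.sum(y[:i]) / prec      # posterior mean of mu (prior mean is 0)
        terms.append(log_norm_pdf(y[i], mean, 1.0 + 1.0 / prec))
    return np.array(terms)

def loo_terms(y, tau):
    """log p(y_i | y_{-i}, H): leave-one-out posterior predictive terms."""
    prec = 1.0 / tau ** 2 + (len(y) - 1)
    means = (np.sum(y) - y) / prec
    return log_norm_pdf(y, means, 1.0 + 1.0 / prec)

def log_marginal(y, tau):
    """Joint log marginal: y ~ N(0, I + tau**2 * 11')."""
    n = len(y)
    cov = np.eye(n) + tau ** 2 * np.ones((n, n))
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

y = np.array([1.2, 0.7, 1.9, 1.1, 0.4, 1.6])   # made-up data for illustration

seq_10, seq_1000 = sequential_terms(y, 10.0), sequential_terms(y, 1000.0)
loo_10, loo_1000 = loo_terms(y, 10.0), loo_terms(y, 1000.0)

first_term_gap = seq_10[0] - seq_1000[0]          # driven by the prior scale alone
later_terms_gap = abs(seq_10[1:].sum() - seq_1000[1:].sum())
loo_gap = abs(loo_10.sum() - loo_1000.sum())
```

In this example almost all of the sensitivity to τ sits in the first sequential term, log p(y₁|H), which behaves like −log τ for large τ, while the later terms and the leave-one-out sum hardly move.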

Other points made during the talk or before when preparing the talk:

  1. additive mixtures are but one encompassing model, geometric mixtures could be fun too, if harder to process (e.g., missing normalising constant). Or Zellner’s mixtures (with again the normalising issue);
  2. if the final outcome of the “test” is the posterior on α itself, the impact of the hyper-parameter on α is quite relative since this posterior can be calibrated by simulation against limiting cases (α=0,1);
  3. for the same reason the different rate of accumulation near zero and one when compared with a posterior probability is hardly worrying;
  4. what I see as a fundamental difference in processing improper priors for Bayes factors versus mixtures is not perceived as such by everyone;
  5. even a common parameter θ on both models does not mean both models are equally weighted a priori, which relates to an earlier remark in Amsterdam about the different Jeffreys priors one can use;
  6. the MCMC output also produces a sample of θ’s whose behaviour is obviously different from single model outputs. It would be interesting to study further the behaviour of those samples, which are not to be confused with model averaging;
  7. the mixture setting has nothing intrinsically Bayesian in that the model can be processed in other ways.

Harvard snow sprinkle

Posted in Kids, pictures, Running, Travel, University life on March 22, 2016 by xi'an