Archive for Monte Carlo approximations

merging MCMC subposteriors

Posted in Books, Statistics, University life with tags , , , , , , , on June 8, 2016 by xi'an

Christopher Nemeth and Chris Sherlock arXived a paper yesterday about an approach to distributed MCMC sampling via Gaussian processes. As in several other papers commented on the ‘Og, the issue is to merge MCMC samples from sub-posteriors into a sample or any sort of approximation of the complete (product) posterior. I am quite sympathetic to the approach adopted in this paper, namely to use a log-Gaussian process representation of each sub-posterior and then to replace each sub-posterior with its log-Gaussian process posterior expectation in an MCMC or importance scheme. And to assess its variability through the posterior variance of the sum of log-Gaussian processes. As pointed out by the authors the closed form representation of the posterior mean of the log-posterior is invaluable as it allows for an HMC implementation. And importance solutions as well. The probabilistic numerics behind this perspective are also highly relevant.

A few arguable (?) points:

  1. The method often relies on importance sampling and hence on the choice of an importance function that is most likely influential but delicate to calibrate in complex settings as I presume the Gaussian estimates are not useful in this regard;
  2. Using Monte Carlo to approximate the value of the approximate density at a given parameter value (by simulating from the posterior distribution) is natural but is it that efficient?
  3. It could be that, by treating all sub-posterior samples as noisy versions of the same (true) posterior, a more accurate approximation of this posterior could be constructed;
  4. The method relies on the exponentiation of a posterior expectation or simulation. As of yesterday, I am somehow wary of log-normal expectations!
  5. If the purpose of the exercise is to approximate univariate integrals, it would seem more profitable to use the Gaussian processes at the univariate level;
  6. The way the normalising missing constants and the duplicate simulations are processed (or not) could deserve further exploration;
  7. Computing costs are in fine unclear when compared with the other methods in the toolbox.

Computing the variance of a conditional expectation via non-nested Monte Carlo

Posted in Books, pictures, Statistics, University life with tags , , , , on May 26, 2016 by xi'an

Fushimi Inari-Taisha shrine, Kyoto, June 27, 2012The recent arXival by Takashi Goda of Computing the variance of a conditional expectation via non-nested Monte Carlo led me to read it as I could not be certain of the contents from only reading the title! The short paper considers the issue of estimating the variance of a conditional expectation when able to simulate the joint distribution behind the quantity of interest. The second moment E(E[f(X)|Y]²) can be written as a triple integral with two versions of x given y and one marginal y, which means that it can approximated in an unbiased manner by simulating a realisation of y then conditionally two realisations of x. The variance requires a third simulation of x, which the author seems to deem too costly and that he hence replaces with another unbiased version based on two conditional generations only. (He notes that a faster biased version is available with bias going down faster than the Monte Carlo error, which makes the alternative somewhat irrelevant, as it is also costly to derive.) An open question after reading the paper stands with the optimal version of the generic estimator (5), although finding the optimum may require more computing time than it is worth spending. Another one is whether or not this version of the expected conditional variance is more interesting (computation-wise) that the difference between the variance and the expected conditional variance as reproduced in (3) given that both quantities can equally be approximated by unbiased Monte Carlo…

Bayesian model averaging in astrophysics [guest post]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , on August 12, 2015 by xi'an

.[Following my posting of a misfiled 2013 blog, Ewan Cameron told me of the impact of this paper in starting his own blog and I asked him for a guest post, resulting in this analysis, much deeper than mine. No warning necessary this time!]

Back in February 2013 when “Bayesian Model Averaging in Astrophysics: A Review” by Parkinson & Liddle (hereafter PL13) first appeared on the arXiv I was a keen, young(ish) postdoc eager to get stuck into debates about anything and everything ‘astro-statistical’. And with its seemingly glaring flaws, PL13 was more grist to the mill. However, despite my best efforts on various forums I couldn’t get a decent fight started over the right way to do model averaging (BMA) in astronomy, so out of sheer frustration two months later I made my own soapbox to shout from at Another Astrostatistics Blog. Having seen PL13 reviewed recently here on Xian’s Og it feels like the right time to revisit the subject and reflect on where BMA in astronomy is today.

As pointed out to me back in 2013 by Tom Loredo, the act of Bayesian model averaging has been around much longer than its name; indeed an early astronomical example appears in Gregory & Loredo (1992) in which the posterior mean representation of an unknown signal is constructed for an astronomical “light-curve”, averaging over a set of constant and periodic candidate models. Nevertheless the wider popularisation of model averaging in astronomy has only recently taken place through a variety of applications in cosmology: e.g. Liddle, Mukherjee, Parkinson & Wang (2006) and Vardanyan, Trotta & Silk (2011).

In contrast to earlier studies like Gregory & Loredo (1992)—or the classic review on BMA by Hoeting et al. (1999)—in which the target of model averaging is typically either a utility function, a set of future observations, or a latent parameter of the observational process (e.g. the unknown “light-curve” shape) shared naturally by all competing models, the proposal of cosmological BMA studies is to produce a model-averaged version of the posterior for a given ‘shared’ parameter: a so-called “model-averaged PDF”. This proposal didn’t sit well with me back in 2013, and it still doesn’t sit well with me today. Philosophically: without a model a parameter has no meaning, so why should we want to seek meaning in the marginalised distribution of a parameter over an entire set of models? And, practically: to put it another way, without knowing the model ‘label’ to which a given mass of model-averaged parameter probability belongs there’s nothing much useful we can do with this ‘PDF’: nothing much we can say about the data we’ve just analysed and nothing much we can say about future experiments. Whereas the space of the observed data is shared automatically by all competing models, it seems to me to be somehow “un-Bayesian” to place the further restriction that the parameters of separate models share the same scale and topology. I say “un-Bayesian” since this mode of model averaging suggests a formulation of the parameter space + prior pairing stronger than the statement of one’s prior beliefs for the distribution of observable data given the model. But I would be happy to hear arguments from the other side in the comments box below … ! Continue reading