Archive for model choice

are pseudopriors required in Bayesian model selection?

Posted in Books, Kids, pictures, Statistics, University life with tags Bayesian Analysis, canons, cross validated, Invalides, Joe Abercrombie, joint pseudo-posterior, model choice, Paris, posterior probability, pseudo-priors, The Last Argument of Kings on February 29, 2020 by xi'an

An interesting question from X validated about constructing pseudo-priors for Bayesian model selection, namely whether they matter for the concept rather than merely for the implementation. The only case where I am aware of pseudo-priors being used is in Bayesian MCMC algorithms such as Carlin and Chib (1995), where these distributions complement the posterior distribution conditional on a single model (index) into a joint distribution across the parameters of all models. The trick of this construction is that the pseudo-priors can be essentially anything, including distributions depending on the data. And while they impact the ability of the resulting Markov chain to move between models, they have no bearing on the resulting inference, either when choosing a model or when estimating the parameters of a chosen model. The concept of pseudo-priors was also central to the mis-interpretations found in Congdon (2006) and Scott (2002), which we reanalysed with Jean-Michel Marin in Bayesian Analysis (2008) in terms of the distinction between model-based posteriors and joint pseudo-posteriors.
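To make the construction concrete, here is a minimal sketch of a Carlin-and-Chib-type Gibbs sampler for two hypothetical conjugate normal models; the data set, the sampling variances 1 and 9, and the choice of the conditional posteriors as pseudo-priors are all illustrative assumptions, not taken from the original paper.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.8, 1.0, size=50)   # hypothetical data set
n, ybar = len(y), y.mean()
sig2 = [1.0, 9.0]                   # known sampling variances of models M0 and M1

def post_params(k):
    """Posterior mean and variance of theta_k under model k, N(0,1) prior."""
    prec = n / sig2[k] + 1.0
    return (n * ybar / sig2[k]) / prec, 1.0 / prec

# pseudo-priors: here the (data-dependent) conditional posteriors, a choice
# the post notes is legitimate since pseudo-priors cancel from the inference
pseudo = [post_params(0), post_params(1)]

def log_lik(k, theta):
    return -0.5 * np.sum((y - theta) ** 2) / sig2[k] \
           - 0.5 * n * np.log(2 * np.pi * sig2[k])

def log_norm(x, m, v):
    return -0.5 * ((x - m) ** 2 / v + np.log(2 * np.pi * v))

k, theta, counts = 0, np.zeros(2), np.zeros(2)
for _ in range(10_000):
    # 1. update the parameter of the current model from its conditional posterior
    m, v = post_params(k)
    theta[k] = rng.normal(m, np.sqrt(v))
    # 2. update the other model's parameter from its pseudo-prior
    j = 1 - k
    theta[j] = rng.normal(pseudo[j][0], np.sqrt(pseudo[j][1]))
    # 3. update the model index; dividing the joint by both pseudo-priors
    #    leaves weight f_k(y|theta_k) pi_k(theta_k) / pseudo_k(theta_k)
    logw = np.array([log_lik(kk, theta[kk]) + log_norm(theta[kk], 0.0, 1.0)
                     - log_norm(theta[kk], *pseudo[kk]) for kk in (0, 1)])
    w = np.exp(logw - logw.max())
    k = rng.choice(2, p=w / w.sum())
    counts[k] += 1

print("posterior model probabilities:", counts / counts.sum())
```

With the conditional posteriors as pseudo-priors, the model-index weights reduce exactly to the marginal likelihoods in this conjugate toy, which illustrates why this particular choice maximises moves between models while leaving the inference untouched.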
Topological sensitivity analysis for systems biology

Posted in Books, Statistics, Travel, University life with tags Gaussian processes, model choice, model validation, nested sampling, network, PNAS, topology on December 17, 2014 by xi'an

Michael Stumpf sent me Topological sensitivity analysis for systems biology, written by Ann Babtie and Paul Kirk, as a preview before it came out in PNAS, and I read it during the trip to NIPS in Montréal. (The paper is published in open access, so everyone can read it now!) The topic is quite central to many debates about climate change, economics, ecology, finance, &tc., namely assessing the impact of using the wrong model to draw conclusions and make decisions about a real phenomenon. (Which reminded me of the distinction between mechanical and phenomenological models stressed by Michael Blum in his NIPS talk.) And it is of much interest from a Bayesian point of view, since assessing the worth of a model requires modelling the “outside” of that model, using for instance Gaussian processes, as in the talk Tony O’Hagan gave in Warwick earlier this term. I would even go as far as saying that the issue of assessing [and compensating for] how wrong a model is, given the available data, may be the (single) most under-assessed issue in statistics. We (statisticians) have yet to reach our Boxian era.
In Babtie et al., the space or universe of models is represented by network topologies, each defining the set of “parents” in a semi-Markov representation of the (dynamic) model, at which stage Gaussian processes are also called in for help. Alternative models are ranked in terms of fit according to a distance between data simulated from the original model and from the alternatives (which sounds like a form of ABC?!). Obviously, there is a limitation in the number and variety of models considered this way, in that assumptions are still made on the possible models, even though the number of such models increases quickly with the number of nodes. As pointed out in the paper (see, e.g., Fig. 4), the method has, to some extent, a parametric bootstrap flavour.
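As a crude stand-in for this ranking step (and only for that step, since the actual paper relies on Gaussian process approximations of the dynamics, which this toy skips entirely), here is a sketch comparing hypothetical linear network dynamics across all parent sets of a single node; the dynamics, noise levels, and candidate topologies are made up for the example.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(3)

# "true" 3-node network: x_{t+1} = A x_t + noise, rows of A encoding parent sets
A_true = np.array([[0.9, 0.0, 0.0],
                   [0.5, 0.8, 0.0],
                   [0.0, 0.4, 0.7]])

def simulate(A, T=200, noise=0.05):
    x = np.zeros((T, 3))
    x[0] = [1.0, 0.0, 0.0]
    for t in range(T - 1):
        x[t + 1] = A @ x[t] + rng.normal(0.0, noise, 3)
    return x

data = simulate(A_true)                          # "observed" trajectories

# candidate topologies: all 2^3 parent sets of node 1, other rows held fixed;
# the combinatorial growth with the number of nodes is the limitation noted above
results = []
for mask in product([0, 1], repeat=3):
    A = A_true.copy()
    A[1] = A_true[1] * np.array(mask)            # switch parent edges on/off
    sim = simulate(A)
    dist = np.sqrt(np.mean((sim - data) ** 2))   # ABC-like trajectory distance
    results.append((dist, mask))

for dist, mask in sorted(results):
    print(f"parents of node 1 = {mask}, distance = {dist:.3f}")
```

Even the true parent set (1, 1, 0) returns a nonzero distance, since the noise realisations differ between the data and each simulation, but it should still rank first on average, which is all this ranking step asks for.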
What is unclear is how one can conduct Bayesian inference with such a collection of models, unless all models share the same “real” parameters, which sounds unlikely. The paper mentions using a uniform prior on all parameters, but this is difficult to advocate in a general setting. Another point concerns the quantification of how much one can trust a given model, since the models do not seem to be penalised by prior probabilities and hence are all treated identically. This is a limitation of the approach (or an indication that it is only a preliminary step in the evaluation of models), in that some models within a large enough collection will eventually produce an estimate that differs from those of the other models, so that the assessment may become altogether highly pessimistic for this very reason.
“If our parameters have a real, biophysical interpretation, we therefore need to be very careful not to assert that we know the true values of these quantities in the underlying system, just because–for a given model–we can pin them down with relative certainty.”
In addition to its relevance for moving towards approximate models and approximate inference, and in continuation of yesterday’s theme, the paper calls for nested sampling to generate samples from the posterior(s) and to compute the evidence associated with each model. (I realised I had missed this earlier paper by Michael and co-authors on nested sampling for systems biology.) There is no discussion in the paper of why nested sampling was selected, compared with, say, a random walk Metropolis-Hastings algorithm, unless it is used in a fully automated way, but the paper is rather terse on that issue… And running either approach on 10⁷ models in comparison sounds like an awful lot of work!!! Using importance [sampling] nested sampling, as we proposed with Nicolas Chopin, could be a way to speed up this exploration if all parameters are identical between all or most models.
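For readers unfamiliar with the technique, here is a minimal nested sampling sketch on a hypothetical one-parameter normal model; the naive rejection step used to sample above the current likelihood threshold is only for illustration, since practical implementations (presumably including the one behind the paper) replace it with constrained MCMC moves.

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(1.0, 1.0, size=20)             # hypothetical data set

def log_lik(theta):
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * len(y) * np.log(2 * np.pi)

def prior_draw(size=None):                    # prior: theta ~ N(0, 10)
    return rng.normal(0.0, np.sqrt(10.0), size=size)

N, steps = 100, 600                           # live points; crude stopping rule
live = prior_draw(N)
live_ll = np.array([log_lik(t) for t in live])
log_Z, log_X = -np.inf, 0.0                   # evidence and prior-mass fraction

for i in range(steps):
    worst = np.argmin(live_ll)
    log_X_new = -(i + 1) / N                  # deterministic shrinkage X_i = e^{-i/N}
    log_w = np.log(np.exp(log_X) - np.exp(log_X_new))
    log_Z = np.logaddexp(log_Z, live_ll[worst] + log_w)
    log_X = log_X_new
    # replace the worst point by a prior draw above the likelihood threshold
    while True:
        cand = prior_draw()
        if log_lik(cand) > live_ll[worst]:
            break
    live[worst], live_ll[worst] = cand, log_lik(cand)

# add the contribution of the remaining live points
log_Z = np.logaddexp(log_Z, log_X + live_ll.max()
                     + np.log(np.mean(np.exp(live_ll - live_ll.max()))))
print("log evidence estimate:", log_Z)
```

Multiplying such a run by 10⁷ models makes the computational concern above rather tangible: each evidence requires its own sequence of constrained draws.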
talk in Linz [first slide]
Posted in Mountains, pictures, Running, University life with tags ABC, Austria, Boston, IFAS, JKU, JSM, Linz, model choice, Pöstlingberg, talk on September 17, 2014 by xi'an

Olli à/in/im Paris
Posted in Statistics, Travel, University life with tags ABC, Bayesian tests, Big'MC, Bristol, classical tests, IHP, location parameter, model choice, model criticism, p-values, Paris, power, UMP tests on May 27, 2013 by xi'an

Warning: here is an old post from last October that I can at last publish, since Olli just arXived the paper on which this talk was based (more to come, before or after Olli’s talk in Roma!).
Oliver Ratmann came to give a seminar today in our Big’MC seminar series. It was an extension of the talk I attended last month in Bristol:
10:45 Oliver Ratmann (Duke University and Imperial College) – “Approximate Bayesian Computation based on summaries with frequency properties”
Approximate Bayesian Computation (ABC) has quickly become a valuable tool in many applied fields, but the statistical properties obtained by choosing a particular summary, distance function and error threshold are poorly understood. In an effort to better understand the effect of these ABC tuning parameters, we consider summaries that are associated with empirical distribution functions. These frequency properties of summaries suggest what kind of distance functions are appropriate, and the validity of the choice of summaries can be assessed on the fly during Monte Carlo simulations. Among valid choices, uniformly most powerful distances can be shown to optimize the ABC acceptance probability. Considering the binding function between the ABC model and the frequency model of the summaries, we can characterize the asymptotic consistency of the ABC maximum-likelihood estimate in general situations. We provide examples from phylogenetics and dynamical systems to demonstrate that empirical distribution functions of summaries can often be obtained without expensive re-simulations, so that the above theoretical results are applicable in a broad set of applications. In part, this work will be illustrated on fitting phylodynamic models that capture the evolution and ecology of interpandemic influenza A (H3N2) to incidence time series and the phylogeny of H3N2’s immunodominant haemagglutinin gene.
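To fix the vocabulary of the abstract (summary, distance, tolerance), here is a plain ABC rejection sketch on a hypothetical normal model; it only illustrates the tuning parameters under discussion, not Olli's frequency-property construction itself.

```python
import numpy as np

rng = np.random.default_rng(2)
y_obs = rng.normal(2.0, 1.0, size=100)        # hypothetical observed sample

def summary(x):                               # summary statistic s(.)
    return np.array([x.mean(), x.std()])

def distance(s1, s2):                         # distance rho(.,.)
    return float(np.linalg.norm(s1 - s2))

s_obs, eps = summary(y_obs), 0.2              # error threshold (tolerance)
accepted = []
while len(accepted) < 500:
    theta = rng.normal(0.0, 5.0)              # draw from the prior
    x = rng.normal(theta, 1.0, size=100)      # simulate pseudo-data
    if distance(summary(x), s_obs) < eps:     # accept if within tolerance
        accepted.append(theta)

print("ABC posterior mean:", np.mean(accepted))
```

Every one of the three tuning choices above (summary, distance, eps) changes the resulting approximate posterior, which is exactly the gap the abstract's frequency properties are meant to address.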
I however benefited enormously from hearing the talk again, and also from discussing the fundamentals of his approach before and after the talk (in the nearest Aussie pub!). Olli’s approach is (once again!) rather iconoclastic in that he presents ABC as a testing procedure, using frequentist tests and concepts to build an optimal acceptance condition. Since he manipulates several error terms simultaneously (as before), he needs to address the issue of multiple testing but, thanks to a switch between acceptance and rejection, and between null and alternative, the individual α-level tests get turned into a global α-level test.
Model selection for genetic and epidemiological data [back]
Posted in pictures, Statistics, Travel, University life with tags ABC model choice, London, model choice, University College London on March 31, 2012 by xi'an

The afternoon on model choice at the London School of Hygiene (!) and Tropical Medicine was worth the short trip from Paris, especially when the weather in London felt like real summer: walking in the streets was a real treat! The talks were also interesting in that the emphasis was off-key from my usual statistics talks and thus required more focus from me. The first talk, by Stijn Vansteelandt, emphasized (very nicely) the role of confounders and exposure in causal inference in ways that were novel to me (although it seems in the end that a proper graphical modelling of all quantities involved in the process would allow for a standard statistical analysis). I also had trouble envisioning the Bayesian version of the approach, although Stijn referred to a recent paper by Wang et al. While Stijn has a joint paper in Series B that just arrived on my desk, this talk was more related to a paper to appear in Statistical Methods in Medical Research. (The second talk was mine, and presumably too technical, in that I should have gotten rid of the new mathematical assumptions [A1]-[A4] altogether.) The third was a fascinating statistical analysis by Doug Speed of an important genetic heritability paper, by Yang et al., where he took the assumptions of the model one at a time to see how they impacted the conclusions, and found that none was to blame. The fourth and final talk, by David Clayton, covered the role of link functions in GLMs applied to epidemiological models, in connection with older papers from the 1990s, concluding that the choice of the link function mattered for the statistical properties of the variable selection procedures, which I found a bit puzzling given my (limited) econometric intuition that all link functions lead to consistent pseudo-models. In any case, this was a fairly valuable meeting, furthermore attended by a very large audience.