Archive for model choice

Topological sensitivity analysis for systems biology

Posted in Books, Statistics, Travel, University life on December 17, 2014 by xi'an

Michael Stumpf sent me Topological sensitivity analysis for systems biology, written by Ann Babtie and Paul Kirk, en avant-première before it came out in PNAS and I read it during the trip to NIPS in Montréal. (The paper is published in open access, so everyone can read it now!) The topic is quite central to a lot of debates about climate change, economics, ecology, finance, etc., namely to assess the impact of using the wrong model to draw conclusions and make decisions about a real phenomenon. (Which reminded me of the distinction between mechanical and phenomenological models stressed by Michael Blum in his NIPS talk.) And it is of much interest from a Bayesian point of view since assessing the worth of a model requires modelling the “outside” of a model, using for instance Gaussian processes as in the talk Tony O’Hagan gave in Warwick earlier this term. I would even go as far as saying that the issue of assessing [and compensating for] how wrong a model is, given available data, may be the (single) most under-assessed issue in statistics. We (statisticians) have yet to reach our Boxian era.

In Babtie et al., the space or universe of models is represented by network topologies, each defining the set of “parents” in a semi-Markov representation of the (dynamic) model. At this stage, Gaussian processes are also called upon for help. Alternative models are ranked in terms of fit according to a distance to data simulated from the original model (sounds like a form of ABC?!). Obviously, there is a limitation in the number and variety of models considered this way: assumptions are still made on the possible models, while the number of models increases quickly with the number of nodes. As pointed out in the paper (see, e.g., Fig. 4), the method has a parametric bootstrap flavour, to some extent.
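To make the ranking step concrete, here is a minimal toy sketch of ordering alternative models by a distance between their simulated output and data simulated from an original model. Everything in it is an assumption made for illustration (the three candidate curves, the noise level, the root-mean-square distance); it is not the topology search or the distance used by Babtie et al.

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 50)  # common time grid for all simulations


def simulate(mean_curve, noise_sd=0.01):
    """Return one noisy trajectory around a deterministic mean curve."""
    return mean_curve(t) + rng.normal(0.0, noise_sd, size=t.size)


def distance(x, y):
    """Root-mean-square distance between two trajectories (an arbitrary choice)."""
    return float(np.sqrt(np.mean((x - y) ** 2)))


# Hypothetical "original" model and three hypothetical alternatives.
def original(s):
    return np.exp(-0.5 * s)


candidates = {
    "exp_decay_slower": lambda s: np.exp(-0.45 * s),
    "hyperbolic": lambda s: 1.0 / (1.0 + 0.5 * s),
    "linear": lambda s: 1.0 - 0.15 * s,
}

# Data simulated from the original model play the role of the reference.
reference = simulate(original)

# Score each alternative by its distance to the reference, then rank.
scores = {name: distance(simulate(f), reference) for name, f in candidates.items()}
ranking = sorted(scores, key=scores.get)

for name in ranking:
    print(f"{name}: {scores[name]:.3f}")
```

With a single simulated data set per model, the ranking is itself random, which is presumably where the parametric bootstrap flavour comes in: repeating the simulation gives a distribution of ranks rather than one ordering.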

What is unclear is how one can conduct Bayesian inference with such a collection of models. Unless all models share the same “real” parameters, which sounds unlikely. The paper mentions using a uniform prior on all parameters, but this is difficult to advocate in a general setting. Another point concerns the quantification of how much one can trust a given model, since it does not seem that models are penalised by a prior probability. Hence they all are treated identically. This is a limitation of the approach (or an indication that it is only a preliminary step in the evaluation of models) in that some models within a large enough collection will eventually provide an estimate that differs from those produced by the other models. So the assessment may become altogether highly pessimistic for this very reason.

“If our parameters have a real, biophysical interpretation, we therefore need to be very careful not to assert that we know the true values of these quantities in the underlying system, just because–for a given model–we can pin them down with relative certainty.”

In addition to its relevance for moving towards approximate models and approximate inference, and in continuation of yesterday’s theme, the paper calls for nested sampling to generate samples from the posterior(s) and to compute the evidence associated with each model. (I realised I had missed this earlier paper by Michael and co-authors on nested sampling for systems biology.) There is no discussion in the paper on why nested sampling was selected, compared with, say, a random walk Metropolis-Hastings algorithm. Unless it is used in a fully automated way, but the paper is rather terse on that issue… And running either approach on 10⁷ models in comparison sounds like an awful lot of work!!! Using importance [sampling] nested sampling as we proposed with Nicolas Chopin could be a way to speed up this exploration if all parameters are identical between all or most models.
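For readers who have not met nested sampling, here is a minimal sketch of how it turns the evidence into a one-dimensional sum over shrinking prior volumes. The setting is a toy chosen for this note, not the paper's implementation: a U(-5, 5) prior and a likelihood exp(-θ²/2), for which the constrained prior is an interval around zero and can be sampled exactly. In realistic models that constrained step is the hard part and usually requires an MCMC move.

```python
import math

import numpy as np

rng = np.random.default_rng(2)

HALF_WIDTH = 5.0  # toy prior is U(-HALF_WIDTH, HALF_WIDTH)


def log_likelihood(theta):
    """Unnormalised Gaussian log-likelihood, maximal at theta = 0."""
    return -0.5 * theta ** 2


def nested_sampling(n_live=500, n_iter=5000):
    """Estimate the evidence Z = integral of L(theta) * prior(theta)."""
    live = rng.uniform(-HALF_WIDTH, HALF_WIDTH, size=n_live)
    log_l = log_likelihood(live)
    evidence = 0.0
    x_prev = 1.0  # prior volume still enclosed by the live points
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(log_l))
        x_i = math.exp(-i / n_live)  # deterministic approximation of the volume
        evidence += math.exp(log_l[worst]) * (x_prev - x_i)
        x_prev = x_i
        # Replace the worst point by a draw from the prior restricted to
        # {L(theta) > L_worst}; in this toy case that set is |theta| < radius.
        radius = abs(live[worst])
        live[worst] = rng.uniform(-radius, radius)
        log_l[worst] = log_likelihood(live[worst])
    # Contribution of the remaining live points.
    evidence += x_prev * float(np.mean(np.exp(log_l)))
    return evidence


z_estimate = nested_sampling()
z_exact = math.sqrt(2.0 * math.pi) * math.erf(HALF_WIDTH / math.sqrt(2.0)) / (2.0 * HALF_WIDTH)
print(f"nested sampling estimate: {z_estimate:.4f}, exact value: {z_exact:.4f}")
```

Even in this friendly case, one evidence estimate costs thousands of likelihood evaluations, which gives a sense of the bill when the exercise is repeated over a very large collection of models.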

talk in Linz [first slide]

Posted in Mountains, pictures, Running, University life on September 17, 2014 by xi'an

Olli à/in/im Paris

Posted in Statistics, Travel, University life on May 27, 2013 by xi'an

Warning: Here is an old post from last October I can at last post since Olli just arXived the paper on which this talk was based (more to come, before or after Olli’s talk in Roma!).

Oliver Ratmann came to give a talk today in our Big’MC seminar series. It was an extension of the talk I attended last month in Bristol:

10:45 Oliver Ratmann (Duke University and Imperial College) – “Approximate Bayesian Computation based on summaries with frequency properties”

Approximate Bayesian Computation (ABC) has quickly become a valuable tool in many applied fields, but the statistical properties obtained by choosing a particular summary, distance function and error threshold are poorly understood. In an effort to better understand the effect of these ABC tuning parameters, we consider summaries that are associated with empirical distribution functions. These frequency properties of summaries suggest what kinds of distance functions are appropriate, and the validity of the choice of summaries can be assessed on the fly during Monte Carlo simulations. Among valid choices, uniformly most powerful distances can be shown to optimize the ABC acceptance probability. Considering the binding function between the ABC model and the frequency model of the summaries, we can characterize the asymptotic consistency of the ABC maximum-likelihood estimate in general situations. We provide examples from phylogenetics and dynamical systems to demonstrate that empirical distribution functions of summaries can often be obtained without expensive re-simulations, so that the above theoretical results are applicable in a broad set of applications. In part, this work will be illustrated on fitting phylodynamic models that capture the evolution and ecology of interpandemic influenza A (H3N2) to incidence time series and the phylogeny of H3N2’s immunodominant haemagglutinin gene.

I however benefited enormously from hearing the talk again and also from discussing the fundamentals of his approach before and after the talk (in the nearest Aussie pub!). Olli’s approach is (once again!) rather iconoclastic in that he presents ABC as a testing procedure, using frequentist tests and concepts to build an optimal acceptance condition. Since he manipulates several error terms simultaneously (as before), he needs to address the issue of multiple testing but, thanks to a switch between acceptance and rejection, null and alternative, the individual α-level tests get turned into a global α-level test.
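As a reminder of where the three tuning parameters named in the abstract (summary, distance function, error threshold) enter, here is a minimal rejection ABC sketch. It is a plain textbook sampler on a toy Gaussian mean problem, with every setting an assumption made for illustration (prior range, sample size, tolerance); it is not Olli's testing-based procedure, which would replace the acceptance condition marked below with a calibrated test.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setting (an assumption): observations are N(theta, 1), prior is U(-5, 5).
n_obs = 50
observed = rng.normal(2.0, 1.0, size=n_obs)


def summary(x):
    """Summary statistic: the sample mean."""
    return x.mean()


def distance(s1, s2):
    """Distance between two summaries: absolute difference."""
    return abs(s1 - s2)


def abc_rejection(data, n_draws=20000, epsilon=0.1):
    """Keep prior draws whose simulated summary falls within epsilon of the observed one."""
    s_obs = summary(data)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(-5.0, 5.0)                      # draw from the prior
        simulated = rng.normal(theta, 1.0, size=data.size)  # simulate pseudo-data
        # Acceptance condition: the step a testing view of ABC would redesign.
        if distance(summary(simulated), s_obs) <= epsilon:
            accepted.append(theta)
    return np.array(accepted)


sample = abc_rejection(observed)
print(f"accepted {sample.size} draws, ABC posterior mean {sample.mean():.3f}")
```

Shrinking epsilon tightens the approximation but lowers the acceptance rate, which is the trade-off that an optimal acceptance condition is meant to settle.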

Model selection for genetic and epidemiological data [back]

Posted in pictures, Statistics, Travel, University life on March 31, 2012 by xi'an

The afternoon on model choice at the London School of Hygiene (!) and Tropical Medicine was worth the short trip from Paris, especially when the weather in London felt like real summer: walking in the streets was a real treat! The talks were also interesting in that the emphasis was off-key from my usual statistics talks and thus required more focus from me. The first talk by Stijn Vansteelandt emphasized (very nicely) the role of confounders and exposure in causal inference in ways that were novel to me (although it seems in the end that a proper graphical modelling of all quantities involved in the process would allow for a standard statistical analysis). I also had trouble envisioning the Bayesian version of the approach, although Stijn referred to a recent paper by Wang et al. While Stijn has a joint paper in the Series B that just arrived on my desk, this talk is more related to one to appear in Statistical Methodology in Medical Research. (The second talk was mine and presumably too technical in that I should have gotten rid of the new mathematical assumptions [A1]-[A4] altogether.) The third was a fascinating statistical analysis by Doug Speed of an important genetic heritability paper, by Yang et al., where he took the assumptions of the model one at a time to see how they were impacting the conclusions and found that none was to blame. The fourth and final talk by David Clayton covered the role of link functions in GLMs applied to epidemiological models, in connection with older papers from the 1990s, to conclude that the choice of the link function mattered for the statistical properties of the variable selection procedures, which I found a bit puzzling based on my (limited) econometric intuition that all link functions lead to consistent pseudo-models. In any case, this was a fairly valuable meeting, furthermore attended by a very large audience.

Oxford, Oxfordshire

Posted in pictures, Statistics, University life on February 23, 2012 by xi'an

Second Oxonian post of the week! And second English trip of the year. I will give a seminar lecture this afternoon in the Statistics Department on ABC model choice, using the same slides as in Cambridge last month. (Following another ABC talk by Richard Wilkinson a few weeks ago.)

ABC [PhD] course

Posted in Books, R, Statistics, Travel, University life on January 26, 2012 by xi'an

As mentioned in the latest post on ABC, I am giving a short doctoral course on ABC methods and convergence at CREST next week. I have now made a preliminary collection of my slides (plus a few from Jean-Michel Marin’s), available on slideshare (as ABC in Roma, because I am also giving the course in Roma, next month, with an R lab on top of it!):

and I did manage to go over the book by Gouriéroux and Monfort on indirect inference over the weekend. I still need to beef up the slides before the course starts next Thursday! (The core version of the slides is actually from the course I gave in Wharton more than a year ago.)

English trip (1)

Posted in Statistics, Travel, University life on January 25, 2012 by xi'an

Today, I am attending a workshop on the use of graphics processing units in Statistics in Warwick, supported by CRiSM, presenting our recent works with Randal Douc, Pierre Jacob and Murray Smith. (I will use the same slides as in Telecom two months ago, hopefully avoiding the loss of integral and summation signs this time!) Pierre Jacob will talk about Wang-Landau.

Then, tomorrow, I am off to Cambridge to talk about ABC and model choice on Friday afternoon. (Presumably using the same slides as in Provo.)

The (1) in the title is in anticipation of a second trip to Oxford next month and another one to Bristol two months after! (The trip to Edinburgh does not count of course, since it is in Scotland!)
