**M**ichael Stumpf sent me *Topological sensitivity analysis for systems biology*, written by Ann Babtie and Paul Kirk, *en avant-première* before it came out in PNAS, and I read it during the trip to NIPS in Montréal. (The paper is published in open access, so everyone can read it now!) The topic is quite central to a lot of debates about climate change, economics, ecology, finance, &tc., namely assessing the impact of using the wrong model to draw conclusions and make decisions about a real phenomenon. (Which reminded me of the distinction between mechanistic and phenomenological models stressed by Michael Blum in his NIPS talk.) And it is of much interest from a Bayesian point of view, since assessing the worth of a model requires modelling the “outside” of a model, using for instance Gaussian processes as in the talk Tony O’Hagan gave in Warwick earlier this term. I would even go as far as saying that the issue of assessing [and compensating for] how wrong a model is, given available data, may be the (single) most under-assessed issue in statistics. We (statisticians) have yet to reach our Boxian era.

In Babtie et al., the space or universe of models is represented by network topologies, each defining the set of “parents” in a semi-Markov representation of the (dynamic) model, at which stage Gaussian processes are also called in to help. Alternative models are ranked in terms of fit according to their distance to data simulated from the original model (sounds like a form of ABC?!). Obviously, there is a limitation in the number and variety of models considered this way: assumptions are still made on the possible models, while the number of models increases quickly with the number of nodes. As pointed out in the paper (see, e.g., Fig. 4), the method has a parametric bootstrap flavour, to some extent.
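
To make the ranking idea concrete, here is a minimal Python sketch under strong assumptions of my own. It is not the procedure of the paper: there is no parameter fitting and no Gaussian process, and the linear dynamics, the shared `weight` and `decay` values, the three-node `REFERENCE` topology and all function names are hypothetical choices made for illustration. Each topology assigns a parent set to every node, data are simulated from the reference topology, and every candidate is ranked by the Euclidean distance between its trajectory and the reference one.

```python
import itertools
import math

# Hypothetical reference topology: node 0 has no parent, 0 -> 1, 1 -> 2.
REFERENCE = ((), (0,), (1,))


def simulate(parents, weight=-0.5, decay=-1.0, x0=(1.0, 0.5, 0.2), dt=0.01, steps=500):
    """Euler scheme for dx_i/dt = decay * x_i + weight * (sum of the parents of i)."""
    x = list(x0)
    path = [tuple(x)]
    for _ in range(steps):
        dx = [decay * x[i] + weight * sum(x[j] for j in parents[i])
              for i in range(len(x))]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        path.append(tuple(x))
    return path


def trajectory_distance(path_a, path_b):
    """Euclidean distance between two trajectories on the same time grid."""
    return math.sqrt(sum((a - b) ** 2
                         for state_a, state_b in zip(path_a, path_b)
                         for a, b in zip(state_a, state_b)))


def all_topologies(n_nodes=3):
    """Every assignment of a parent set (self-loops excluded) to each node."""
    per_node = []
    for i in range(n_nodes):
        others = [j for j in range(n_nodes) if j != i]
        subsets = [combo for r in range(len(others) + 1)
                   for combo in itertools.combinations(others, r)]
        per_node.append(subsets)
    return list(itertools.product(*per_node))


def rank_topologies(reference=REFERENCE):
    """Sort candidate topologies by distance to data simulated from the reference."""
    target = simulate(reference)
    scored = [(trajectory_distance(simulate(t), target), t) for t in all_topologies()]
    return sorted(scored)
```

Calling `rank_topologies()` returns a list of `(distance, topology)` pairs, closest first. Even this toy shows how fast the collection grows: with parent sets chosen freely, three nodes already give 2^6 = 64 topologies and four nodes give 2^12 = 4096.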

What is unclear is how one can conduct Bayesian inference with such a collection of models, unless all models share the same “real” parameters, which sounds unlikely. The paper mentions using uniform priors on all parameters, but this is difficult to advocate in a general setting. Another point concerns the quantification of how much one can trust a given model, since models do not seem to be penalised by a prior probability and hence are all treated identically. This is a limitation of the approach (or an indication that it is only a preliminary step in the evaluation of models) in that some models within a large enough collection will eventually provide an estimate that differs from those produced by the other models. So the assessment may become altogether highly pessimistic for this very reason.

“If our parameters have a real, biophysical interpretation, we therefore need to be very careful not to assert that we know the true values of these quantities in the underlying system, just because–for a given model–we can pin them down with relative certainty.”

In addition to its relevance for moving towards approximate models and approximate inference, and in continuation of yesterday’s theme, the paper calls upon nested sampling to generate samples from the posterior(s) and to compute the evidence associated with each model. (I realised I had missed this earlier paper by Michael and co-authors on nested sampling for systems biology.) There is no discussion in the paper of why nested sampling was selected over, say, a random walk Metropolis-Hastings algorithm, unless it is used in a fully automated way, but the paper is rather terse on that issue… And running either approach on the 10⁷ models under comparison sounds like an awful lot of work!!! Using importance [sampling] nested sampling, as we proposed with Nicolas Chopin, could be a way to speed up this exploration if all parameters are identical between all or most models.
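
For readers unfamiliar with how nested sampling turns a posterior exploration into an evidence estimate, here is a minimal Python sketch on a toy problem. It is my own illustration, not the implementation used in the paper: the uniform prior on (-5, 5), the standard normal likelihood and the name `nested_sampling_evidence` are all assumptions. The toy is picked so that the likelihood-constrained prior can be simulated exactly (the constraint reduces to |θ| being smaller than its current worst value), which is precisely the step that is hard in realistic models.

```python
import math
import random


def log_likelihood(theta):
    """Toy standard normal log-likelihood (an assumption for illustration)."""
    return -0.5 * theta * theta - 0.5 * math.log(2.0 * math.pi)


def nested_sampling_evidence(n_live=200, n_iter=2000, half_width=5.0, seed=0):
    """Basic nested sampling estimate of the evidence under a U(-half_width, half_width) prior."""
    rng = random.Random(seed)
    # Live points are drawn from the prior.
    live = [rng.uniform(-half_width, half_width) for _ in range(n_live)]
    evidence = 0.0
    x_prev = 1.0  # prior mass enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        worst = min(live, key=log_likelihood)
        # Deterministic approximation of the shrinking prior mass.
        x_curr = math.exp(-i / n_live)
        evidence += math.exp(log_likelihood(worst)) * (x_prev - x_curr)
        x_prev = x_curr
        # Replace the worst point by a prior draw with a higher likelihood.
        # Here this is exact: higher likelihood means smaller |theta|.
        bound = abs(worst)
        live[live.index(worst)] = rng.uniform(-bound, bound)
    # Contribution of the prior mass still covered by the live points.
    evidence += x_prev * sum(math.exp(log_likelihood(t)) for t in live) / n_live
    return evidence
```

With these settings the exact evidence is about 0.1 (the normal density integrates to almost one over (-5, 5), times the prior density 1/10), and `nested_sampling_evidence()` should return a value close to it. Repeating such a run for each of millions of models, with the constrained simulation step replaced by an MCMC move, gives an idea of the computing cost alluded to above.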