Archive for the Books Category

locally weighted MCMC

Posted in Books, Statistics, University life on July 16, 2015 by xi'an

[Picture: Street light near the St Kilda Road bridge, Melbourne, July 21, 2012]

Last week, on arXiv, Espen Bernton, Shihao Yang, Yang Chen, Neil Shephard, and Jun Liu (all from Harvard) proposed a weighting scheme associated with MCMC simulations, in connection with the parallel MCMC of Ben Calderhead discussed earlier on the ‘Og. The weight attached to each proposal is either the acceptance probability itself (with the rejection probability being attached to the current value of the MCMC chain) or a renormalised version of the joint target × proposal, either forward or backward. Both solutions are unbiased in that they have the same expectation as the original MCMC average, being some sort of conditional expectation. The proof of domination in the paper builds upon Calderhead’s formalism.
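To make the first weighting option concrete, here is a minimal sketch for a plain random-walk Metropolis sampler with a single proposal per iteration. It is not the authors' multiple-proposal algorithm: the function name, the Gaussian random walk, and the toy target used afterwards are all assumptions made for illustration. At each step the weighted estimator adds α·h(y) + (1−α)·h(x), the conditional expectation of h at the next state given the current value x and the proposal y, whereas the usual average adds h at whichever state was retained.

```python
import numpy as np

def rw_metropolis_with_recycling(log_target, h, x0, n_iter, step, rng):
    """Random-walk Metropolis returning (plain average, weighted average) of h.

    Hypothetical helper: single-proposal illustration of acceptance-probability
    weighting, not the scheme of the arXiv paper.
    """
    x = x0
    plain_sum = 0.0
    weighted_sum = 0.0
    for _ in range(n_iter):
        y = x + step * rng.standard_normal()          # symmetric proposal
        # acceptance probability, computed on the log scale to avoid overflow
        alpha = np.exp(min(0.0, log_target(y) - log_target(x)))
        # weight alpha on the proposal, 1 - alpha on the current value
        weighted_sum += alpha * h(y) + (1.0 - alpha) * h(x)
        if rng.uniform() < alpha:                     # standard accept/reject
            x = y
        plain_sum += h(x)
    return plain_sum / n_iter, weighted_sum / n_iter
```

For instance, with a standard normal target, `rw_metropolis_with_recycling(lambda x: -0.5 * x * x, lambda x: x * x, 0.0, 20000, 2.4, np.random.default_rng(0))` returns two estimates of the second moment. Both target the same expectation, and the weighted one removes the variability due to the uniform draw.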

This work reminded me of several reweighting proposals we made over the years, from the global Rao-Blackwellisation strategy with George Casella, to the vanilla Rao-Blackwellisation solution we wrote with Randal Douc a few years ago, both of which also demonstrably improve upon the standard MCMC average. By similarly recycling proposed but rejected values. Or by diminishing the variability due to the uniform draw. The slightly parallel nature of the approach also connects with our parallel MCMC version with Pierre Jacob (now at Harvard as well!) and Murray Smith (who now lives in Melbourne, hence the otherwise unrelated picture).

Leave the Pima Indians alone!

Posted in Books, R, Statistics, University life on July 15, 2015 by xi'an

“…our findings shall lead to us be critical of certain current practices. Specifically, most papers seem content with comparing some new algorithm with Gibbs sampling, on a few small datasets, such as the well-known Pima Indians diabetes dataset (8 covariates). But we shall see that, for such datasets, approaches that are even more basic than Gibbs sampling are actually hard to beat. In other words, datasets considered in the literature may be too toy-like to be used as a relevant benchmark. On the other hand, if ones considers larger datasets (with say 100 covariates), then not so many approaches seem to remain competitive” (p.1)

Nicolas Chopin and James Ridgway (CREST, Paris) completed and arXived a paper they had “threatened” to publish for a while now, namely why using the Pima Indian R logistic or probit regression benchmark for checking a computational algorithm is not such a great idea! Given that I am definitely guilty of such a sin (in papers not reported in the survey), I was quite eager to read the reasons why! Beyond the debate on the worth of such a benchmark, the paper considers a wider perspective as to how Bayesian computation algorithms should be compared, including the murky waters of CPU time versus designer or programmer time. Which plays against most MCMC samplers.

As a first entry, Nicolas and James point out that the MAP can be derived by a standard Newton-Raphson algorithm when the prior is Gaussian, and even when the prior is Cauchy as it seems most datasets allow for Newton-Raphson convergence. As well as the Hessian. We actually took advantage of this property in our comparison of evidence approximations published in the Festschrift for Jim Berger. Where we also noticed the awesome performances of an importance sampler based on the Gaussian or Laplace approximation. The authors call this proposal their gold standard. Because they also find it hard to beat. They also pursue this approximation to its logical (?) end by proposing an evidence approximation based on the above and Chib’s formula. Two close approximations are provided by INLA for posterior marginals and by a Laplace-EM for a Cauchy prior. Unsurprisingly, the expectation-propagation (EP) approach is also implemented. What EP lacks in theoretical backup, it seems to recover in sheer precision (in the examples analysed in the paper). And unsurprisingly as well the paper includes a randomised quasi-Monte Carlo version of the Gaussian importance sampler. (The authors report that “the improvement brought by RQMC varies strongly across datasets” without elaborating on the reasons behind this variability. They also do not report the CPU time of the IS-QMC, maybe identical to the one for the regular importance sampling.) Maybe more surprising is the absence of a nested sampling version.
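As an illustration of the Newton-Raphson plus Gaussian importance sampling combination described above, here is a minimal sketch for a logistic regression with a N(0, τ²I) prior. It is only a sketch under stated assumptions, not the code of the paper: all function names are hypothetical, the data are synthetic (not the Pima Indians dataset), and no safeguard such as a line search is included in the Newton iterations.

```python
import numpy as np

def make_synthetic_data(rng, n=200):
    """Synthetic logistic data with an intercept and two covariates (illustrative only)."""
    X = np.column_stack([np.ones(n), rng.standard_normal((n, 2))])
    beta_true = np.array([-0.5, 1.0, -1.0])
    prob = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
    y = (rng.uniform(size=n) < prob).astype(float)
    return X, y

def log_joint(betas, X, y, tau2):
    """Log of likelihood times normalised N(0, tau2 I) prior, for one or many beta vectors."""
    betas = np.atleast_2d(betas)                       # shape (N, d)
    eta = betas @ X.T                                  # shape (N, n)
    loglik = eta @ y - np.logaddexp(0.0, eta).sum(axis=1)
    d = betas.shape[1]
    logprior = -0.5 * (betas**2).sum(axis=1) / tau2 - 0.5 * d * np.log(2 * np.pi * tau2)
    return loglik + logprior

def newton_map(X, y, tau2, n_steps=50):
    """Newton-Raphson for the posterior mode; returns the mode and the Hessian there."""
    d = X.shape[1]
    beta = np.zeros(d)
    for _ in range(n_steps):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p) - beta / tau2
        hess = -(X.T * (p * (1.0 - p))) @ X - np.eye(d) / tau2
        step = np.linalg.solve(hess, grad)
        if np.max(np.abs(step)) < 1e-10:               # converged: hess matches beta
            break
        beta = beta - step
    return beta, hess

def laplace_importance_sampling(X, y, tau2, n_draws, rng):
    """Importance sampling with the Gaussian (Laplace) approximation as proposal."""
    mode, hess = newton_map(X, y, tau2)
    chol = np.linalg.cholesky(np.linalg.inv(-hess))    # proposal covariance factor
    z = rng.standard_normal((n_draws, mode.size))
    draws = mode + z @ chol.T
    # log density of the N(mode, cov) proposal at the draws
    logq = (-0.5 * (z**2).sum(axis=1) - np.log(np.diag(chol)).sum()
            - 0.5 * mode.size * np.log(2 * np.pi))
    logw = log_joint(draws, X, y, tau2) - logq
    shift = logw.max()                                 # stabilise the exponentials
    w = np.exp(logw - shift)
    log_evidence = shift + np.log(w.mean())
    post_mean = (w[:, None] * draws).sum(axis=0) / w.sum()
    ess = w.sum() ** 2 / (w**2).sum()                  # effective sample size
    return log_evidence, post_mean, ess
```

Calling `laplace_importance_sampling(X, y, 25.0, 5000, rng)` on the output of `make_synthetic_data(rng)` returns an evidence estimate, a posterior mean estimate, and an effective sample size. On such a small and regular problem the effective sample size should stay close to the number of draws, which is the sense in which this proposal is hard to beat.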

In the Markov chain Monte Carlo solutions, Nicolas and James compare Gibbs, Metropolis-Hastings, Hamiltonian Monte Carlo, and NUTS. Plus a tempering SMC. All of which are outperformed by importance sampling for small enough datasets. But get back to competing grounds for large enough ones, since importance sampling then fails.

“…let’s all refrain from now on from using datasets and models that are too simple to serve as a reasonable benchmark.” (p.25)

This is a very nice survey on the theme of binary data (more than on the comparison of algorithms in that the authors do not really take into account design and complexity, but resort to MSEs versus CPUs). I however do not agree with their overall message to leave the Pima Indians alone. Or at least not for the reason provided therein, namely that faster and more accurate approximation methods are available and cannot be beaten. Benchmarks always have the limitation of “what you get is what you see”, i.e., the output associated with a single dataset that only has that many idiosyncrasies. Plus, the closeness to a perfect normal posterior makes the logistic posterior too regular to pose a real challenge (even though MCMC algorithms are as usual slower than iid sampling). But having faster and more precise resolutions should on the contrary be cause for cheers, as this provides a reference value, a gold standard, to check against. In a sense, for every Monte Carlo method, there is a much better answer, namely the exact value of the integral or of the optimum! And one is hardly aiming at a more precise inference for the benchmark itself: those Pima Indians [whose actual name is Akimel O’odham] with diabetes involved in the original study are definitely beyond help from statisticians and the model is unlikely to carry over to current populations. When the goal is to compare methods, as in our 2009 paper for Jim Berger’s 60th birthday, what matters is relative speed and relative ease of implementation (besides the obvious convergence to the proper target). In that sense bigger and larger is not always relevant. Unless one tackles really big or really large datasets, for which there is neither benchmark method nor reference value.

can we trust computer simulations? [day #2]

Posted in Books, pictures, Statistics, Travel, University life on July 13, 2015 by xi'an

“Sometimes the models are better than the data.” G. Krinner

Second day at the conference on building trust in computer simulations. Starting with a highly debated issue, climate change projections. Since so many criticisms are addressed to climate models as being not only wrong but also unverifiable. And uncheckable. As explained by Gerhart Krinner, the IPCC has developed methodologies to compare models and evaluate predictions. However, from what I understood, this validation does not say anything about the future, which is the part of the predictions that matters. And that is attacked by critics and feeds climate skeptics. Because it is so easy to argue against the homogeneity of the climate evolution and for “what you’ve seen is not what you’ll get”! (Even though climate skeptics are the least likely to use this time-heterogeneity argument, being convinced as they are of the lack of human impact over the climate.) The second talk was by Viktoria Radchuk about validation in ecology. Defined here as a test of predictions against independent data (and designs). And mentioning Simon Wood’s synthetic likelihood as the Bayesian reference for conducting model choice (as a synthetic likelihoods ratio). I had never thought of this use (found in Wood’s original paper) for synthetic likelihood, and I feel a bit queasy about using a synthetic likelihood ratio as a genuine likelihood ratio. Which led to a lively discussion at the end of her talk. The next talk was about validation in economics by Matteo Richiardi, who discussed state-space models where the hidden state is observed through a summary statistic, a perfect playground for ABC! But Matteo opted instead for a non-parametric approach that seems to increase imprecision and that I have never seen used in state-space models. The last part of the talk was about non-ergodic models, for which checking for validity becomes much more problematic, in my opinion. Unless one manages multiple observations of the non-ergodic path.

Nicole Saam concluded this “Validation in…” morning with Validation in Sociology. With a more pessimistic approach to the possibility of finding a falsifying strategy, because of the vague nature of sociology models. For which data can never be fully informative. She illustrated the issue with an EU negotiation analysis. Where most hypotheses could hardly be tested.

“Bayesians persist with poor examples of randomness.” L. Smith

“Bayesians can be extremely reasonable.” L. Smith

The afternoon session was dedicated to methodology, mostly statistics! Andrew Robinson started with a talk on (frequentist) model validation. Called splitters and lumpers. Illustrated by a forest growth model. He went through traditional hypothesis tests like Neyman-Pearson’s that try to split between samples. And (bio)equivalence tests that take difference as the null. Using his equivalence R package. Then Leonard Smith took over [in a literal way!] from a sort-of-Bayesian perspective, in joint work with Jim Berger and Gary Rosner on pragmatic Bayes which was mostly negative about Bayesian modelling. Introducing (to me) the compelling notion of structural model error as a representation of the inadequacy of the model. With illustrations from weather and climate models. His criticism of the Bayesian approach is that it cannot be holistic while pretending to be [my wording]. And being inadequate to measure model inadequacy, to the point of making prior choice meaningless. Funnily enough, he went back to the ball dropping experiment David Higdon discussed at one JSM I attended a while ago, with the unexpected outcome that one ball did not make it to the bottom of the shaft. A more positive side was that posteriors are useful models but should not be interpreted from a probabilistic perspective. Move beyond probability was his final message. (For most of the talk, I misunderstood P(BS), the probability of a big surprise, for something else…) This was certainly the most provocative talk of the conference and the discussion could have gone on for the rest of the day! In a way, Lenny was deliberately provocative in piling the responsibility upon the Bayesian’s head for being overconfident and not accounting for the physicist’s limitations in modelling the phenomenon of interest. Next talk was by Edward Dougherty on methods used in biology. He separated within-model uncertainty from outside-model inadequacy. The within-model part is mostly easy to agree upon. Even though difficulties in estimating parameters create uncertainty classes of models. Especially because of being from a small data discipline. He analysed the impact of machine learning techniques like classification as being useless without prior knowledge. And argued in favour of the Bayesian minimum mean square error estimator. Which can also lead to a classifier. And experimental design. (Using MSE seems rather reductive when facing large dimensional parameters.) Last talk of the day was by Nicolas Becu, a geographer, with a surprising approach to validation via stakeholders. A priori not too enticing a name! The discussion was of a more philosophical nature, going back to (re)define validation against reality and imperfect models. And including social aspects of validation, e.g., reality being socially constructed. This led to the stakeholders, because a model is then a shared representation. Nicolas illustrated the construction by simulation “games” of a collective model in a community of Thai farmers and in a group of water users.

In a rather unique fashion, we also had an evening discussion on points we shared and points we disagreed upon. After dinner (and wine), which did not help I fear! Bill Oberkampf mentioned the use of manufactured solutions to check code, which seemed very much related to physics. But then we got mired in the necessity of dividing between verification and validation. Which sounded very and too much engineering-like to me. Maybe because I do not usually integrate coding errors and algorithmic errors into my reasoning (verification)… Although sharing code and making it available makes a big difference. Or maybe because considering all models are wrong is not part of my methodology either (validation). This part ended up in a fairly pessimistic conclusion on the lack of trust in most published articles. At least in the biological sciences.

can we trust computer simulations?

Posted in Books, pictures, Statistics, University life on July 10, 2015 by xi'an

How can one validate the outcome of a validation model? Or can we even imagine validation of this outcome? This was the starting question for the conference I attended in Hannover. Which obviously engaged me to the utmost. Relating to some past experiences like advising a student working on accelerated tests for fighter electronics. And failing to agree with him on validating a model to translate those accelerated tests into a realistic setting. Or reviewing this book on climate simulation three years ago while visiting Monash University. Since I discuss in detail below most talks of the day, here is an opportunity to opt away!

Bayesian inference for partially identified models [book review]

Posted in Books, Statistics, University life on July 9, 2015 by xi'an

“The crux of the situation is that we lack theoretical insight into even quite basic questions about what is going on. More particularly, we cannot say anything about the limiting posterior marginal distribution of α compared to the prior marginal distribution of α.” (p.142)

Bayesian inference for partially identified models is a recent CRC Press book by Paul Gustafson that I received for a review in CHANCE with keen interest! If only because the concept of unidentifiability has always puzzled me. And because I have never fully understood what I felt was a sort of joker card, namely that a Bayesian model was the easy solution to the problem since the prior was compensating for the components of the parameter not identified by the data. As defended by Dennis Lindley, for whom “unidentifiability causes no real difficulties in the Bayesian approach”. However, after reading the book, I am less excited in that I do not feel it answers this type of question about non-identifiable models and that it is exclusively centred on the [undoubtedly long-term and multifaceted] research of the author on the topic.

“Without Bayes, the feeling is that all the data can do is locate the identification region, without conveying any sense that some values in the region are more plausible than others.” (p.47)

Overall, the book is pleasant to read, with a light and witty style. The notational conventions are somewhat unconventional but well explained, to distinguish θ from θ* from θ. The format of the chapters is quite similar with a definition of the partially identified model, an exhibition of the transparent reparameterisation, the computation of the limiting posterior distribution [of the non-identified part], a demonstration [which it took me several iterations to read as the English exhibition rather than the French proof, pardon my French!]. Chapter titles suffer from an excess of the “further” denomination… The models themselves are mostly of one kind, namely binary observables and non-observables leading to partially observed multinomials with some non-identifiable probabilities. As in missing-at-random models (Chapter 3). In my opinion, it is only in the final chapters that the important questions are spelled out, not always faced with a definitive answer. In essence, I did not get from the book (i) a characterisation of the non-identifiable parts of a model, of the identifiability of unidentifiability, and of the universality of the transparent reparameterisation, (ii) a tool to assess the impact of a particular prior and possibly to set it aside, and (iii) a limitation to the amount of unidentifiability still allowing for coherent inference. Hence, when closing the book, I still remain in the dark (or at least in the grey) on how to handle partially identified models. The author convincingly argues that there is no special advantage to using a misspecified if identifiable model over a partially identified model, for this imbues false confidence (p.162); however, we also need the toolbox to verify this is indeed the case.

“Given the data we can turn the Bayesian computational crank nonetheless and see what comes out.” (p.xix)

“It is this author’s contention that computation with partially identified models is a “bottleneck” issue.” (p.141)

Bayesian inference for partially identified models is particularly concerned about computational issues and rightly so. It is however unclear to me (without more time to invest in investigating the topic) why the “use of general-purpose software is limited to the [original] parametrisation” (p.24) and why importance sampling would do better than MCMC on a general basis. I would definitely have liked more details on this aspect. There is a computational considerations section at the end of the book, but it remains too allusive for my taste. My naïve intuition would be that the lack of identifiability leads to a flatter posterior and hence to easier MCMC moves, but Paul Gustafson reports instead bad mixing from standard MCMC schemes (like WinBUGS).

In conclusion, the book opens a new perspective on the relevance of partially identifiable models, trying to lift the stigma associated with them, and calls for further theory and methodology to deal with those. Here are the author’s final points (p.162):

  • “Identification is nuanced. Its absence does not preclude a parameter being well estimated, nor its presence guarantee a parameter can be well estimated.”
  • “If we really took limitations of study designs and data quality seriously, then partially identifiable models would crop up all the time in a variety of scientific fields.”
  • “Making modeling assumptions for the sole purpose of gaining full identification can be a mug’s game (…)”
  • “If we accept partial identifiability, then consequently we need to regard sample size differently. There are profound implications of posterior variance tending to a positive limit as the sample size grows.”
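The last point, a posterior variance tending to a positive limit, can be checked on a toy model that is not taken from the book and is only assumed here for illustration: observations y_i ~ N(α+β, s) with independent N(0, v) priors on α and β. Only the sum α+β is identified, and since everything is jointly Gaussian the posterior variances are available in closed form. The function names below are hypothetical.

```python
def posterior_variance_alpha(n, prior_var=1.0, noise_var=1.0):
    """Var(alpha | y_1..y_n) for y_i ~ N(alpha + beta, noise_var),
    with alpha and beta independent N(0, prior_var) a priori."""
    var_ybar = 2.0 * prior_var + noise_var / n      # marginal variance of the sample mean
    # Gaussian conditioning: Var(alpha) - Cov(alpha, ybar)^2 / Var(ybar)
    return prior_var - prior_var**2 / var_ybar

def posterior_variance_sum(n, prior_var=1.0, noise_var=1.0):
    """Var(alpha + beta | y_1..y_n) in the same toy model (the identified part)."""
    var_ybar = 2.0 * prior_var + noise_var / n
    return 2.0 * prior_var - (2.0 * prior_var) ** 2 / var_ybar
```

With unit variances, `posterior_variance_sum(n)` goes to zero as n grows while `posterior_variance_alpha(n)` decreases from 2/3 at n=1 towards 1/2, half the prior variance: the data never tell α and β apart, so more observations stop helping.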

These points may be challenging enough to undertake to read Bayesian inference for partially identified models in order to make up one’s mind about their eventual relevance in statistical modelling.

[Disclaimer about potential self-plagiarism: this post will also be published as a book review in my CHANCE column. ]

how to build trust in computer simulations: Towards a general epistemology of validation

Posted in Books, pictures, Statistics, Travel, University life on July 8, 2015 by xi'an

I have rarely attended a workshop with such a precise goal, but then I have never attended a philosophy workshop either… Tonight, I am flying to Han(n)over, Lower Saxony, for a workshop on the philosophical aspects of simulated models. I was quite surprised to get invited to this workshop, but found it quite a treat to attend a multi-disciplinary meeting about simulations and their connection with the real world! I am less certain I can contribute anything meaningful, but still look forward to it. And will report on the discussions, hopefully. Here is the general motivation of the workshop:

“In the last decades, our capacities to investigate complex systems of various scales have been greatly enhanced by the method of computer simulation. This progress is not without a price though: We can only trust the results of computer simulations if they have been properly validated, i.e., if they have been shown to be reliable. Despite its importance, validation is often still neglected in practice and only poorly understood from a theoretical perspective. The aim of this conference is to discuss methodological and philosophical problems of validation from a multidisciplinary perspective and to take first steps in developing a general framework for thinking about validation. Working scientists from various natural and social sciences and philosophers of science join forces to make progress in understanding the epistemology of validation.”

analysing statistical and computational trade-off of estimation procedures

Posted in Books, pictures, Statistics, University life on July 8, 2015 by xi'an

“The collection of estimates may be determined by questions such as: How much storage is available? Can all the data be kept in memory or only a subset? How much processing power is available? Are there parallel or distributed systems that can be exploited?”

Daniel Sussman, Alexander Volfovsky, and Edoardo Airoldi from Harvard wrote a very interesting paper about setting a balance between statistical efficiency and computational efficiency, a theme that resonates with our recent work on ABC and older considerations about the efficiency of Monte Carlo algorithms. While the paper avoids drifting towards computer science even with a notion like algorithmic complexity, I like the introduction of a loss function in the comparison game, even though the way to combine both dimensions is unclear. And may limit the exercise to an intellectual game. In an ideal setting one would set the computational time, like “I have one hour to get this estimate”, and compare risks under that computing constraint. Possibly dumping some observations from the sample to satisfy the constraint. Ideally. Which is why this also reminds me of ABC: given an intractable likelihood, one starts by throwing away some data precision by using a tolerance ε and usually more through an insufficient statistic. Hence ABC procedures could also be compared in such terms.

In the current paper, the authors only compare schemes of breaking the sample into bits to handle each observation only once. Meaning it cannot be used in both the empirical mean and the empirical variance. This sounds a bit contrived in that the optimum allocation depends on the value of the parameter the procedure attempts to estimate. Still, it could lead to a new form of bandit problems: given a bandit with as many arms as there are parameters, at each new observation, decide on the allocation towards minimising the overall risk. (There is a missing sentence at the end of Section 4.)
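To see why the optimum allocation depends on the unknown parameter, here is a small sketch in a simplified Gaussian setting that is assumed for illustration and is not the setup of the paper. Out of n observations from N(μ, σ²), n1 go to the empirical mean and the remaining n − n1 to the unbiased empirical variance (centred within its own subsample). The total risk is taken, arbitrarily, as the sum of the two mean squared errors, σ²/n1 + 2σ⁴/(n − n1 − 1). Function names are hypothetical.

```python
def total_risk(n1, n, sigma2):
    """Sum of MSEs: sample mean on n1 points, unbiased sample variance on n - n1 points.

    Assumes Gaussian data, for which Var(S^2) = 2 sigma^4 / (m - 1) with m points.
    """
    n2 = n - n1
    return sigma2 / n1 + 2.0 * sigma2**2 / (n2 - 1)

def best_allocation(n, sigma2):
    """Number of observations sent to the mean that minimises the total risk."""
    candidates = range(1, n - 1)     # keep at least two points for the variance
    return min(candidates, key=lambda n1: total_risk(n1, n, sigma2))
```

With n=100, `best_allocation(100, 0.1)` sends about two thirds of the sample to the mean, whereas `best_allocation(100, 10.0)` sends fewer than a fifth: the split that minimises the risk cannot be chosen without knowing σ², which is what makes a sequential, bandit-like allocation appealing.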

Any direction for turning those considerations into a practical decision machine would be fantastic, although the difficulties are formidable, from deciding between estimators and selecting a class of estimators, to computing costs and risks depending on unknown parameters.
