## Think Bayes: Bayesian Statistics Made Simple

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on October 27, 2015 by xi'an

By some piece of luck, I came upon the book Think Bayes: Bayesian Statistics Made Simple, written by Allen B. Downey and published by Green Tea Press [which I could relate to No Starch Press, focussing on coffee!, which published Statistics Done Wrong that I reviewed a while ago] which usually publishes programming books with fun covers. The book is available on-line for free in pdf and html formats, and I went through it during a particularly exciting administrative meeting…

“Most books on Bayesian statistics use mathematical notation and present ideas in terms of mathematical concepts like calculus. This book uses Python code instead of math, and discrete approximations instead of continuous mathematics. As a result, what would be an integral in a math book becomes a summation, and most operations on probability distributions are simple loops.”

The book is most appropriately published in this collection as most of it concentrates on Python programming, with hardly any maths formula. In some sense similar to Jim Albert’s R book. Obviously, coming from maths, and having never programmed in Python, I find the approach puzzling, But just as obviously, I am aware—both from the comments on my books and from my experience on X validated—that a large group (majority?) of newcomers to the Bayesian realm find the mathematical approach to the topic a major hindrance. Hence I am quite open to this editorial choice as it is bound to include more people to think Bayes, or to think they can think Bayes.

“…in fewer than 200 pages we have made it from the basics of probability to the research frontier. I’m very happy about that.”

The choice made of operating almost exclusively through motivating examples is rather traditional in US textbooks. See e.g. Albert’s book. While it goes against my French inclination to start from theory and concepts and end up with illustrations, I can see how it operates in a programming book. But as always I fear it makes generalisations uncertain and understanding more shaky… The examples are per force simple and far from realistic statistics issues. Hence illustrates more the use of Bayesian thinking for decision making than for data analysis. To wit, those examples are about the Monty Hall problem and other TV games, some urn, dice, and coin models, blood testing, sport predictions, subway waiting times, height variability between men and women, SAT scores, cancer causality, a Geiger counter hierarchical model inspired by Jaynes, …, the exception being the final Belly Button Biodiversity dataset in the final chapter, dealing with the (exciting) unseen species problem in an equally exciting way. This may explain why the book does not cover MCMC algorithms. And why ABC is covered through a rather artificial normal example. Which also hides some of the maths computations under the carpet.

“The underlying idea of ABC is that two datasets are alike if they yield the same summary statistics. But in some cases, like the example in this chapter, it is not obvious which summary statistics to choose.¨

In conclusion, this is a very original introduction to Bayesian analysis, which I welcome for the reasons above. Of course, it is only an introduction, which should be followed by a deeper entry into the topic, and with [more] maths. In order to handle more realistic models and datasets.

## a simulated annealing approach to Bayesian inference

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , on October 1, 2015 by xi'an

A misleading title if any! Carlos Albert arXived a paper with this title this morning and I rushed to read it. Because it sounded like Bayesian analysis could be expressed as a special form of simulated annealing. But it happens to be a rather technical sequel [“that complies with physics standards”] to another paper I had missed, A simulated annealing approach to ABC, by Carlos Albert, Hans Künsch, and Andreas Scheidegger. Paper that appeared in Statistics and Computing last year, and which is most interesting!

“These update steps are associated with a flow of entropy from the system (the ensemble of particles in the product space of parameters and outputs) to the environment. Part of this flow is due to the decrease of entropy in the system when it transforms from the prior to the posterior state and constitutes the well-invested part of computation. Since the process happens in finite time, inevitably, additional entropy is produced. This entropy production is used as a measure of the wasted computation and minimized, as previously suggested for adaptive simulated annealing” (p.3)

The notion behind this simulated annealing intrusion into the ABC world is that the choice of the tolerance can be adapted along iterations according to a simulated annealing schedule. Both papers make use of thermodynamics notions that are completely foreign to me, like endoreversibility, but aim at minimising the “entropy production of the system, which is a measure for the waste of computation”. The central innovation is to introduce an augmented target on (θ,x) that is

f(x|θ)π(θ)exp{-ρ(x,y)/ε},

where ε is the tolerance, while ρ(x,y) is a measure of distance to the actual observations, and to treat ε as an annealing temperature. In an ABC-MCMC implementation, the acceptance probability of a random walk proposal (θ’,x’) is then

exp{ρ(x,y)/ε-ρ(x’,y)/ε}∧1.

Under some regularity constraints, the sequence of targets converges to

π(θ|y)exp{-ρ(x,y)},

if ε decreases slowly enough to zero. While the representation of ABC-MCMC through kernels other than the Heaviside function can be found in the earlier ABC literature, the embedding of tolerance updating within the modern theory of simulated annealing is rather exciting.

Furthermore, we will present an adaptive schedule that attempts convergence to the correct posterior while minimizing the required simulations from the likelihood. Both the jump distribution in parameter space and the tolerance are adapted using mean fields of the ensemble.” (p.2)

What I cannot infer from a rather quick perusal of the papers is whether or not the implementation gets into the way of the all-inclusive theory. For instance, how can the Markov chain keep moving as the tolerance gets to zero? Even with a particle population and a sequential Monte Carlo implementation, it is unclear why the proposal scale factor [as in equation (34)] does not collapse to zero in order to ensure a non-zero acceptance rate. In the published paper, the authors used the same toy mixture example as ours [from Sisson et al., 2007], where we earned the award of the “incredibly ugly squalid picture”, with improvements in the effective sample size, but this remains a toy example. (Hopefully a post to be continued in more depth…)

## Mathematical underpinnings of Analytics (theory and applications)

Posted in Statistics, University life, Books with tags , , , , , , , , , , , , , , , on September 25, 2015 by xi'an

“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)

I received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press, quite a while ago. (Not that long ago since the book got published in 2015.) As a book for review for CHANCE. And let it sit on my desk and in my travel bag for the same while as it was unclear to me that it was connected with Statistics and CHANCE. What is [are?!] analytics?! I did not find much of a definition of analytics when I at last opened the book, and even less mentions of statistics or machine-learning, but Wikipedia told me the following:

“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”

Barring the absurdity of speaking of a “multidimensional discipline” [and even worse of linking with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.

“Perhaps in ten years, time, the mathematics of behavioural analytics will be common place: every mathematics department will be doing some of it.”(p.10)

First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not partake from this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering, to hidden Markov models, without precisely mentioning statistics or statistical model, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind. While I personally appreciate the distanced and often ironic tone of the book, reflecting upon the author’s experience in retail modelling, I am thus wondering at which audience Mathematical underpinnings of Analytics aims, for a practitioner would have a hard time jumping the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum… Continue reading

## the worst possible proof [X’ed]

Posted in Kids, Statistics, University life, Books with tags , , , , , , on July 18, 2015 by xi'an

Another surreal experience thanks to X validated! A user of the forum recently asked for an explanation of the above proof in Lynch’s (2007) book, Introduction to Applied Bayesian Statistics and Estimation for Social Scientists. No wonder this user was puzzled: the explanation makes no sense outside the univariate case… It is hard to fathom why on Earth the author would resort to this convoluted approach to conclude about the posterior conditional distribution being a normal centred at the least square estimate and with σ²X’X as precision matrix. Presumably, he has a poor opinion of the degree of matrix algebra numeracy of his readers [and thus should abstain from establishing the result]. As it seems unrealistic to postulate that the author is himself confused about matrix algebra, given his MSc in Statistics [the footnote ² seen above after “appropriately” acknowledges that “technically we cannot divide by” the matrix, but it goes on to suggest multiplying the numerator by the matrix

$(X^\text{T}X)^{-1} (X^\text{T}X)$

which does not make sense either, unless one introduces the trace tr(.) operator, presumably out of reach for most readers]. And this part of the explanation is unnecessarily confusing in that a basic matrix manipulation leads to the result. Or even simpler, a reference to Pythagoras’  theorem.

## Bayesian inference for partially identified models [book review]

Posted in Statistics, University life, Books with tags , , , , , , , , , on July 9, 2015 by xi'an

“The crux of the situation is that we lack theoretical insight into even quite basic questions about what is going on. More particularly, we cannot sayy anything about the limiting posterior marginal distribution of α compared to the prior marginal distribution of α.” (p.142)

Bayesian inference for partially identified models is a recent CRC Press book by Paul Gustafson that I received for a review in CHANCE with keen interest! If only because the concept of unidentifiability has always puzzled me. And that I have never fully understood what I felt was a sort of joker card that a Bayesian model was the easy solution to the problem since the prior was compensating for the components of the parameter not identified by the data. As defended by Dennis Lindley that “unidentifiability causes no real difficulties in the Bayesian approach”. However, after reading the book, I am less excited in that I do not feel it answers this type of questions about non-identifiable models and that it is exclusively centred on the [undoubtedly long-term and multifaceted] research of the author on the topic.

“Without Bayes, the feeling is that all the data can do is locate the identification region, without conveying any sense that some values in the region are more plausible than others.” (p.47)

Overall, the book is pleasant to read, with a light and witty style. The notational conventions are somewhat unconventional but well explained, to distinguish θ from θ* from θ. The format of the chapters is quite similar with a definition of the partially identified model, an exhibition of the transparent reparameterisation, the computation of the limiting posterior distribution [of the non-identified part], a demonstration [which it took me several iterations as the English exhibition rather than the French proof, pardon my French!]. Chapter titles suffer from an excess of the “further” denomination… The models themselves are mostly of one kind, namely binary observables and non-observables leading to partially observed multinomials with some non-identifiable probabilities. As in missing-at-random models (Chapter 3). In my opinion, it is only in the final chapters that the important questions are spelled-out, not always faced with a definitive answer. In essence, I did not get from the book (i) a characterisation of the non-identifiable parts of a model, of the  identifiability of unidentifiability, and of the universality of the transparent reparameterisation, (ii) a tool to assess the impact of a particular prior and possibly to set it aside, and (iii) a limitation to the amount of unidentifiability still allowing for coherent inference. Hence, when closing the book, I still remain in the dark (or at least in the grey) on how to handle partially identified models. The author convincingly argues that there is no special advantage to using a misspecified if identifiable model to a partially identified model, for this imbues false confidence (p.162), however we also need the toolbox to verify this is indeed the case.

“Given the data we can turn the Bayesian computational crank nonetheless and see what comes out.” (p.xix)

“It is this author’s contention that computation with partially identified models is a “bottleneck” issue.” (p.141)

Bayesian inference for partially identified models is particularly concerned about computational issues and rightly so. It is however unclear to me (without more time to invest investigating the topic) why the “use of general-purpose software is limited to the [original] parametrisation” (p.24) and why importance sampling would do better than MCMC on a general basis. I would definitely have liked more details on this aspect. There is a computational considerations section at the end of the book, but it remains too allusive for my taste. My naïve intuition would be that the lack of identifiability leads to flatter posterior and hence to easier MCMC moves, but Paul Gustafson reports instead bad mixing from standard MCMC schemes (like WinBUGS).

In conclusion, the book opens a new perspective on the relevance of partially identifiable models, trying to lift the stigma associated with them, and calls for further theory and methodology to deal with those. Here are the author’s final points (p.162):

• “Identification is nuanced. Its absence does not preclude a parameter being well estimated, not its presence guarantee a parameter can be well estimated.”
• “If we really took limitations of study designs and data quality seriously, then partially identifiable models would crop up all the time in a variety of scientific fields.”
• “Making modeling assumptions for the sole purpose of gaining full identification can be a mug’s game (…)”
• “If we accept partial identifiability, then consequently we need to regard sample size differently. There are profound implications of posterior variance tending to a positive limit as the sample size grows.”

These points may be challenging enough to undertake to read Bayesian inference for partially identified models in order to make one’s mind about their eventual relevance in statistical modelling.

[Disclaimer about potential self-plagiarism: this post will also be published as a book review in my CHANCE column. ]

## Current trends in Bayesian methodology with applications

Posted in Statistics, University life, Books, Travel with tags , , , , , on June 20, 2015 by xi'an

When putting this volume together with Umesh Singh, Dipak Dey, and Appaia Loganathan, my friend Satyanshu Upadhyay from Varanasi, India, asked me for a foreword. The book is now out, with chapters written by a wide variety of Bayesians. And here is my foreword, for what it’s worth:

It is a great pleasure to see a new book published on current aspects of Bayesian Analysis and coming out of India. This wide scope volume reflects very accurately on the present role of Bayesian Analysis in scientific inference, be it by statisticians, computer scientists or data analysts. Indeed, we have witnessed in the past decade a massive adoption of Bayesian techniques by users in need of statistical analyses, partly because it became easier to implement such techniques, partly because both the inclusion of prior beliefs and the production of a posterior distribution that provides a single filter for all inferential questions is a natural and intuitive way to process the latter. As reflected so nicely by the subtitle of Sharon McGrayne’s The Theory that Would not Die, the Bayesian approach to inference “cracked the Enigma code, hunted down Russian submarines” and more generally contributed to solve many real life or cognitive problems that did not seem to fit within the traditional patterns of a statistical model.
Two hundred and fifty years after Bayes published his note, the field is more diverse than ever, as reflected by the range of topics covered by this new book, from the foundations (with objective Bayes developments) to the implementation by filters and simulation devices, to the new Bayesian methodology (regression and small areas, non-ignorable response and factor analysis), to a fantastic array of applications. This display reflects very very well on the vitality and appeal of Bayesian Analysis. Furthermore, I note with great pleasure that the new book is edited by distinguished Indian Bayesians, India having always been a provider of fine and dedicated Bayesians. I thus warmly congratulate the editors for putting this exciting volume together and I offer my best wishes to readers about to appreciate the appeal and diversity of Bayesian Analysis.

## ISBA 2016 [logo]

Posted in Statistics, University life, Wines, Travel, pictures with tags , , , , , , , , , , on April 22, 2015 by xi'an

Things are starting to get in place for the next ISBA 2016 World meeting, in Forte Village Resort Convention Center, Sardinia, Italy. June 13-17, 2016. And not only the logo inspired from the nuraghe below. I am sure the program will be terrific and make this new occurrence of a “Valencia meeting” worth attending. Just like the previous occurrences, e.g. Cancún last summer and Kyoto in 2012.

However, and not for the first time, I wonder at the sustainability of such meetings when faced with always increasing—or more accurately sky-rocketing!—registration fees… We have now reached €500 per participant for the sole (early reg.) fees, excluding lodging, food or transportation. If we bet on 500 participants, this means simply renting the convention centre would cost €250,000 for the four or five days of the meeting. This sounds enormous, even accounting for the processing costs of the congress organiser. (By comparison, renting the convention centre MCMSki in Chamonix for three days was less than €20,000.) Given the likely high costs of staying at the resort, it is very unlikely I will be able to support my PhD students  As I know very well of the difficulty to find dedicated volunteers willing to offer a large fraction of their time towards the success of behemoth meetings, this comment is by no means aimed at my friends from Cagliari who kindly accepted to organise this meeting. But rather at the general state of academic meetings which costs makes them out of reach for a large part of the scientific community.

Thus, this makes me wonder anew whether we should move to a novel conference model given that the fantastic growth of the Bayesian community makes the ideal of gathering together in a single beach hotel for a week of discussions, talks, posters, and more discussions unattainable. If truly physical meetings are to perdure—and this notion is as debatable as the one about the survival of paper versions of the journals—, a new approach would be to find a few universities or sponsors able to provide one or several amphitheatres around the World and to connect all those places by teleconference. Reducing the audience size at each location would greatly the pressure to find a few huge and pricey convention centres, while dispersing the units all around would diminish travel costs as well. There could be more parallel sessions and ways could be found to share virtual poster sessions, e.g. by having avatars presenting some else’s poster. Time could be reserved for local discussions of presented papers, to be summarised later to the other locations. And so on… Obviously, something would be lost of the old camaraderie, sharing research questions and side stories, as well as gossips and wine, with friends from all over the World. And discovering new parts of the World. But the cost of meetings is already preventing some of those friends to show up. I thus think it is time we reinvent the Valencia meetings into the next generation. And move to the Valenci-e-meetings.