Archive for Bayesian statistics

ISBA2020 program

Posted in Kids, Statistics, Travel, University life on January 29, 2020 by xi'an

The scheduled program for ISBA 2020 is now online. And full of exciting sessions, many with a computational focus. With dear hopes that the 2019-nCoV epidemic will have abated by then (and not solely for the sake of the conference, most obviously!). While early registration ends on 15 April, the deadline for junior travel support falls this month. And so does the deadline for contributions.

Hastings 50 years later

Posted in Books, pictures, Statistics, University life on January 9, 2020 by xi'an

What is the exact impact of the Metropolis-Hastings algorithm on the field of Bayesian statistics? And what are the new tools of the trade? What I personally find the most relevant and attractive element in a review on the topic is the current role of this algorithm, rather than its past (his)story, since many such reviews have already appeared and will likely continue to appear. What matters most imho is how much the Metropolis-Hastings algorithm signifies for the community at large, especially beyond academia. Is the availability or unavailability of software like BUGS or Stan a help or a hindrance? Was Hastings’ paper the start of the era of approximate inference or the end of exact inference? Are the algorithm’s intrinsic features, like Markovianity, a fundamental cause of its eventual extinction, owing to the ensuing time constraints, the lack of practical guarantees of convergence, and the illusion of a fully automated version? Or are emerging solutions like unbiased MCMC and asynchronous algorithms a beacon of hope?
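
Since the entire discussion revolves around the Metropolis-Hastings algorithm, here is a minimal random-walk Metropolis-Hastings sketch, with a generic log-target and an arbitrary Gaussian step size; this is purely illustrative code of mine, not taken from Hastings (1970) or from any of the papers discussed below.

```python
import numpy as np

def metropolis_hastings(log_target, x0, n_iter=10_000, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings for a target known up to a constant."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    lp = log_target(x)
    chain = np.empty((n_iter, x.size))
    for t in range(n_iter):
        prop = x + step * rng.standard_normal(x.size)  # symmetric Gaussian proposal
        lp_prop = log_target(prop)
        # for a symmetric proposal, the Hastings ratio reduces to the target ratio
        if np.log(rng.uniform()) < lp_prop - lp:
            x, lp = prop, lp_prop                      # accept the proposal
        chain[t] = x                                   # otherwise keep the current state
    return chain

# toy illustration: a standard bivariate Normal target
chain = metropolis_hastings(lambda x: -0.5 * x @ x, x0=np.zeros(2))
```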

In their recent Biometrika paper, Dunson and Johndrow (2019) wrote a celebration of Hastings’ 1970 paper in the same journal, where they cover adaptive Metropolis (Haario et al., 1999; Roberts and Rosenthal, 2005) and the importance of gradient-based versions towards universal algorithms (Roberts and Tweedie, 1995; Neal, 2003), discussing the advantages of HMC over Langevin versions. They also recall the significant step represented by Peter Green’s (1995) reversible jump algorithm for multimodal and multidimensional targets, as well as tempering (Miasojedow et al., 2013; Woodard et al., 2009). They further cover intractable likelihood cases within MCMC (rather than ABC), with the use of auxiliary variables (Friel and Pettitt, 2008; Møller et al., 2006) and pseudo-marginal MCMC (Andrieu and Roberts, 2009; Andrieu and Vihola, 2016). They naturally insist upon the need to handle huge datasets, high-dimensional parameter spaces, and other scalability issues, with links to unadjusted Langevin schemes (Bardenet et al., 2014; Durmus and Moulines, 2017; Welling and Teh, 2011). Similarly, Dunson and Johndrow (2019) discuss recent developments towards parallel MCMC and non-reversible schemes such as PDMPs as highly promising, with a concluding section on the challenges of automatising and robustifying the said procedures much further, if only to reach a wider range of applications. The paper is well-written and contains a wealth of directions and reflections, including those in my introduction above. Here are some mostly disconnected directions I would have liked to see covered, or covered more:

  1. convergence assessment today, e.g. the comparison of various approximation schemes
  2. Rao-Blackwellisation and other post-processing improvements
  3. other approximate inference tools than the pseudo-marginal MCMC
  4. importance of the parameterisation of the problem for convergence
  5. dimension issues and connection with quasi-Monte Carlo
  6. constrained spaces of measure zero, as for instance matrix distributions imposing zeros outside a diagonal band
  7. given the rise of the machine(-learners), are exploratory and intrinsically slow algorithms like MCMC doomed or can both fields feed one another? The section on optimisation could be expanded in that direction
  8. the wasteful nature of the random walk feature of MCMC algorithms, as opposed to non-reversible kernels like HMC and other PDMPs, missing from the gradient based methods section (and can we once again learn from physicists?)
  9. finer convergence issues and hence inference difficulties with complex MCMC algorithms like Gibbs samplers with incompatible conditionals
  10. use of the Hastings ratio in other algorithms like ABC or EP (in connection with the section on generalised Bayes)
  11. adapting Metropolis-Hastings methods for emerging computing tools like GPUs and quantum computers

or possibly covered less, namely data augmentation, put forward although it is but a special case of auxiliary variables, as in slice sampling and in the earlier physics literature. For instance, both probit and logistic regressions do not truly require data augmentation and are more toy examples than really challenging applications. The approach of Carlin & Chib (1995) is another illustration, which has met with recent interest, despite requiring heavy calibration (just like RJMCMC). As well as a somewhat awkward opposition between Gibbs and Hastings, in that I am not convinced that Gibbs does not remain ultimately necessary to handle high-dimensional problems, in the sense that alternative solutions like Langevin, HMC, or PDMP, or…, rely on Euclidean assumptions for the entire vector, while a direct product of Euclidean structures may prove more adequate.
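
To make the slice-sampling remark above concrete, here is a minimal slice sampler sketch for an (unnormalised) standard Normal target, where the slice is available in closed form; this is again my own illustrative code, and a general target would require Neal's (2003) stepping-out procedure rather than the analytic interval used here.

```python
import numpy as np

def slice_sampler_gaussian(n_iter=10_000, x0=0.0, rng=None):
    """Slice sampling for the unnormalised N(0,1) target f(x) = exp(-x^2/2)."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    draws = np.empty(n_iter)
    for t in range(n_iter):
        # auxiliary (slice) variable: uniform under the density at the current x,
        # i.e. data augmentation in its simplest form
        u = rng.uniform(0.0, np.exp(-0.5 * x**2))
        # the slice {x : exp(-x^2/2) >= u} is the interval [-w, w]
        w = np.sqrt(-2.0 * np.log(u))
        x = rng.uniform(-w, w)          # uniform draw on the slice
        draws[t] = x
    return draws
```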

a hatchet job [book review]

Posted in Books, Statistics, University life on July 20, 2019 by xi'an

By happenstance, I came across a rather savage review of John Hartigan’s Bayes Theory (1984), written by Bruce Hill in JASA, including the following slivers:

“By and large this book is at its best in developing the mathematical consequences of the theory and at its worst when dealing with the underlying ideas and concepts, which seems unfortunate since Bayesian statistics is above all an attempt to deal realistically with the nature of uncertainty and decision making.” B. Hill, JASA, 1986, p.569

“Unfortunately, those who had hoped for a serious contribution to the question will be disappointed.” B. Hill, JASA, 1986, p.569

“If the primary concern is mathematical convenience, not content or meaning, then the enterprise is a very different matter from what most of us think of as Bayesian approach.” B. Hill, JASA, 1986, p.570

“Perhaps in a century or two statisticians and probabilists will reach a similar state of maturity.” B. Hill, JASA, 1986, p.570

“Perhaps this is a good place to mention that the notation in the book is formidable. Bayes’s theorem appears in a form that is almost unrecognizable. As elsewhere, the mathematical treatment is elegant, but none of the deeper issues about the meaning and interpretation of conditional probability is discussed.” B. Hill, JASA, 1986, p.570

“The reader will find many intriguing ideas, much that is outrageous, and even some surprises (the likelihood principle is not mentioned, and conditional inference is just barely mentioned).” B. Hill, JASA, 1986, p.571

“What is disappointing to me is that with a little more discipline and effort with regard to the ideas underlying Bayesian statistics, this book could have been a major contribution to the theory.” B. Hill, JASA, 1986, p.571

Another review by William Sudderth (1985, Bulletin of the American Mathematical Society) is much kinder to the book, except for the complaint that “the pace is brisk and sometimes hard to follow”.

I’m getting the point

Posted in Statistics on February 14, 2019 by xi'an

A long-winded X validated discussion on the [textbook] mean-variance conjugate posterior for the Normal model left me [mildly] depressed about the point and use of answering questions on this forum. Especially as it came at the same time as a catastrophic outcome for my mathematical statistics exam. Possibly an incentive to quit X validated as one quits smoking, although this is not the first attempt.

AIQ [book review]

Posted in Books, Statistics on January 11, 2019 by xi'an

AIQ was my Christmas day read, which I mostly read while the rest of the household was still sleeping. The book, written by two Bayesians, Nick Polson and James Scott, was published before the ISBA meeting last year, but I only bought it on my last trip to Warwick [as a Xmas present]. This is a pleasant book to read, especially while drinking tea by the fire!, well-written and full of facts and anecdotes I did not know or had forgotten (more below). Intended for a general audience, it is also quite light, on the technical side, rather obviously, but also on the philosophical side. While strongly positivist about the potential of AIs for the general good, it cannot be seen as an antidote to the doom-like Superintelligence by Nick Bostrom or the more factual Weapons of Math Destruction by Cathy O’Neil. (Both commented on the ‘Og.)

Indeed, I find the book quite benevolent and maybe a wee bit too rosy in its assessment of AIs, and the discussion on how Facebook and Russian intervention may have significantly helped to turn the White House Orange is missing [imho] the viral nature of the game, when endless loops of highly targeted posts can cut people off from the most basic common sense. While the authors are “optimistic that, given the chance, people can be smart enough”, I do reflect on the sheer fact that the hoax that Hillary Clinton was involved in a child sex ring was ever considered seriously by people. To the point of someone shooting at the pizza restaurant. And I am hence much less optimistic about the ability of a large enough portion of the population, not even the majority, to keep a critical distance from the message carried by AI-driven media. Similarly, while Nick and James point out (rather late in the book) that big data (meaning large data) is not necessarily good data, for being unrepresentative of the population at large, they do not propose (in the book) highly convincing solutions to battle bias in existing and incoming AIs. Leading to a global worry that AIs may do well for a majority of the population and, by the same reasoning, discriminate against a minority. As described in Cathy O’Neil’s book, and elsewhere, proprietary software does not even have to explain why it discriminates. More globally, the business school environment of the authors may have prevented them from stating a worry about the massive power grab by the AI-based companies, which generically grow with little interest in democracy and states, as shown (again) by the recent election or their systematic fiscal optimisation. Or by the massive recourse to machine learning by Chinese authorities towards a social credit grade for all citizens.

“The rage to conclude is one of the most deadly and most sterile manias that belong to humanity. Each religion and each philosophy has claimed to have God for itself alone, to take the measure of the infinite, and to know the recipe for happiness.” Gustave Flaubert

I did not know about Henrietta Leavitt’s prediction rule for pulsating stars, behind Hubble’s discovery, which sounds like an astronomy dual to Rosalind Franklin’s DNA contribution. The use of Bayes’ rule for locating lost vessels is also found in The Theory That Would Not Die. Although I would have also mentioned its failure to locate Malaysia Airlines Flight 370. I had also never heard the great expression of “model rust”. Nor the above quote from Flaubert. It seems I have recently spotted the story of how a 180° switch in perspective on language understanding by machines brought the massive improvement that we witness today. But I cannot remember where. And I have also read about Newton missing the boat on the accuracy of the coinage (was it in Bryson’s book on the Royal Society?!), but with less neutral views on the role of Newton in the matter, as the Laplace of England would have benefited from keeping the lax measures of assessment.

Great to see friendly figures like Luke Bornn and Katherine Heller appearing in the pages. Luke for his work on the statistical analysis of basketball games, Katherine for her work on predictive analytics in medicine. Reflecting on the missed opportunities represented by the accumulation of data on any patient throughout their life, which is as grossly ignored nowadays as it was in Nightingale‘s time. The message of the chapter [on “The Lady with the Lamp”] may again be somewhat over-optimistic: while AI and health companies see clear incentives in developing more encompassing prediction and diagnostic techniques, this will only benefit patients who can afford the ensuing care. Which, given the state of health care systems in the most developed countries, is a decreasing proportion. Not to mention the less developed countries.

Overall, a nice read for the general public, de-dramatising the rise of the machines!, and mixing statistics and machine learning to explain the (human) intelligence behind the AIs. Nothing on the technical side, to be sure, but this was not the intention of the authors.

Binomial vs Bernoulli

Posted in Books, Statistics on December 25, 2018 by xi'an

An interesting confusion on X validated, where someone was convinced that using the Bernoulli representation of a sequence of Bernoulli experiments led to different posterior probabilities for two competing models than using their Binomial representation. The confusion actually stemmed from conditioning on different statistics, namely N¹=4, N²=1 in the first case (for a model M¹ with two probabilities p¹ and p²) and N¹+N²=5 in the second case (for a model M² with a single probability p⁰). While (N¹,N²) is sufficient for the first model and N¹+N² is sufficient for the second model, P(M¹|N¹,N²) is not commensurable with P(M²|N¹+N²)! Another illustration of the fickleness of the notion of sufficiency when comparing models.
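
Here is a minimal numerical sketch of the point, under my own reading of the setup (two sub-samples of Bernoulli trials, made-up counts, uniform priors on all probabilities), not the exact numbers from the X validated thread: the Bayes factor of M¹ against M² is the same whether one records the full Bernoulli sequences or the Binomial counts, since the binomial coefficients cancel, but reducing M² to its own sufficient statistic changes the answer, because the two marginal likelihoods then live on different sample spaces.

```python
from math import comb, lgamma, log, exp

def log_beta(a, b):
    """log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

# hypothetical data: two sub-samples of Bernoulli trials (made-up numbers)
n1, k1 = 3, 2   # sub-sample 1: 3 trials, 2 successes
n2, k2 = 2, 1   # sub-sample 2: 2 trials, 1 success
n, k = n1 + n2, k1 + k2

# Bayes factor of M1 (two probabilities) vs M2 (one probability), uniform priors,
# conditioning both models on the same data: the full Bernoulli sequences
bf_seq = exp(log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
             - log_beta(k + 1, n - k + 1))

# conditioning both models on the same Binomial counts (k1, k2):
# the coefficients appear in both marginals (written out to make the cancellation explicit)
bf_counts = exp(log(comb(n1, k1)) + log(comb(n2, k2))
                + log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
                - log(comb(n1, k1)) - log(comb(n2, k2)) - log_beta(k + 1, n - k + 1))

# conditioning each model on *its own* sufficient statistic: (k1, k2) for M1, k for M2
bf_mixed = exp(log(comb(n1, k1)) + log(comb(n2, k2))
               + log_beta(k1 + 1, n1 - k1 + 1) + log_beta(k2 + 1, n2 - k2 + 1)
               - log(comb(n, k)) - log_beta(k + 1, n - k + 1))

print(bf_seq, bf_counts, bf_mixed)   # the first two agree, the third differs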

at CIRM [jatp]

Posted in Mountains, pictures, Running, Travel on October 21, 2018 by xi'an