## off to New York

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , on March 29, 2015 by xi'an

I am off to New York City for two days, giving a seminar at Columbia tomorrow and visiting Andrew Gelman there. My talk will be about testing as mixture estimation, with slides similar to the Nice ones below if slightly upgraded and augmented during the flight to JFK. Looking at the past seminar speakers, I noticed we were three speakers from Paris in the last fortnight, with Ismael Castillo and Paul Doukhan (in the Applied Probability seminar) preceding me. Is there a significant bias there?!

## Statistics done wrong [book review]

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , , on March 16, 2015 by xi'an

no starch press (!) sent me the pdf version of this incoming book, Statistics done wrong, by Alex Reinhart, towards writing a book review for CHANCE, and I read it over two flights, one from Montpellier to Paris last week, and from Paris to B’ham this morning. The book is due to appear on March 16. It expands on a still existing website developed by Reinhart. (Discussed a year or so away on Andrew’s blog, most in comments, witness Andrew’s comment below.) Reinhart who is, incidentally or not, is a PhD candidate in statistics at Carnegie Mellon University. After apparently a rather consequent undergraduate foray into physics. Quite an unusual level of maturity and perspective for a PhD student..!

“It’s hard for me to evaluate because I am so close to the material. But on first glance it looks pretty reasonable to me.” A. Gelman

Overall, I found myself enjoying reading the book, even though I found the overall picture of the infinitely many mis-uses of statistics rather grim and a recipe for despairing of ever setting things straight..! Somehow, this is an anti-textbook, in that it warns about many ways of applying the right statistical technique in the wrong setting, without ever describing those statistical techniques. Actually without using a single maths equation. Which should be a reason good enough for me to let all hell break loose on that book! But, no, not really, I felt no compunction about agreeing with Reinhart’s warning and if you have reading Andrew’s blog for a while you should feel the same…

“Then again for a symptom like spontaneous human combustion you might get excited about any improvement.” A. Reinhart (p.13)

Maybe the limitation in the exercise is that statistics appears so much fraught with dangers of over-interpretation and false positive and that everyone (except physicists!) is bound to make such invalidated leaps in conclusion, willingly or not, that it sounds like the statistical side of Gödel’s impossibility theorem! Further, the book moves from recommendation at the individual level, i.e., on how one should conduct an experiment and separate data for hypothesis building from data for hypothesis testing, to a universal criticism of the poor standards of scientific publishing and the unavailability of most datasets and codes. Hence calling for universal reproducibility protocols that reminded of the directions explored in this recent book I reviewed on that topic. (The one the rogue bird did not like.) It may be missing on the bright side of things, for instance the wonderful possibility to use statistical models to produce simulated datasets that allow for an evaluation of the performances of a given procedure in the ideal setting. Which would have helped the increasingly depressed reader in finding ways of checking how wrongs things could get..! But also on the dark side, as it does not say much about the fact that a statistical model is most presumably wrong. (Maybe a physicist’s idiosyncrasy!) There is a chapter entitled Model Abuse, but all it does is criticise stepwise regression and somehow botches the description of Simpson’s paradox.

“You can likely get good advice in exchange for some chocolates or a beer or perhaps coauthorship on your next paper.” A. Reinhart (p.127)

The final pages are however quite redeeming in that they acknowledge that scientists from other fields cannot afford a solid enough training in statistics and hence should hire statisticians as consultants for the data collection, analysis and interpretation of their experiments. A most reasonable recommendation!

## accelerating Metropolis-Hastings algorithms by delayed acceptance

Posted in Books, Statistics, University life with tags , , , , , , , , on March 5, 2015 by xi'an

Marco Banterle, Clara Grazian, Anthony Lee, and myself just arXived our paper “Accelerating Metropolis-Hastings algorithms by delayed acceptance“, which is an major revision and upgrade of our “Delayed acceptance with prefetching” paper of last June. Paper that we submitted at the last minute to NIPS, but which did not get accepted. The difference with this earlier version is the inclusion of convergence results, in particular that, while the original Metropolis-Hastings algorithm dominates the delayed version in Peskun ordering, the later can improve upon the original for an appropriate choice of the early stage acceptance step. We thus included a new section on optimising the design of the delayed step, by picking the optimal scaling à la Roberts, Gelman and Gilks (1997) in the first step and by proposing a ranking of the factors in the Metropolis-Hastings acceptance ratio that speeds up the algorithm.  The algorithm thus got adaptive. Compared with the earlier version, we have not pursued the second thread of prefetching as much, simply mentioning that prefetching and delayed acceptance could be merged. We have also included a section on the alternative suggested by Philip Nutzman on the ‘Og of using a growing ratio rather than individual terms, the advantage being the probability of acceptance stabilising when the number of terms grows, with the drawback being that expensive terms are not always computed last. In addition to our logistic and mixture examples, we also study in this version the MALA algorithm, since we can postpone computing the ratio of the proposals till the second step. The gain observed in one experiment is of the order of a ten-fold higher efficiency. By comparison, and in answer to one comment on Andrew’s blog, we did not cover the HMC algorithm, since the preliminary acceptance step would require the construction of a proxy to the acceptance ratio, in order to avoid computing a costly number of derivatives in the discretised Hamiltonian integration.

## Posterior predictive p-values and the convex order

Posted in Books, Statistics, University life with tags , , , , , , , , , on December 22, 2014 by xi'an

Patrick Rubin-Delanchy and Daniel Lawson [of Warhammer fame!] recently arXived a paper we had discussed with Patrick when he visited Andrew and I last summer in Paris. The topic is the evaluation of the posterior predictive probability of a larger discrepancy between data and model

$\mathbb{P}\left( f(X|\theta)\ge f(x^\text{obs}|\theta) \,|\,x^\text{obs} \right)$

which acts like a Bayesian p-value of sorts. I discussed several times the reservations I have about this notion on this blog… Including running one experiment on the uniformity of the ppp while in Duke last year. One item of those reservations being that it evaluates the posterior probability of an event that does not exist a priori. Which is somewhat connected to the issue of using the data “twice”.

“A posterior predictive p-value has a transparent Bayesian interpretation.”

Another item that was suggested [to me] in the current paper is the difficulty in defining the posterior predictive (pp), for instance by including latent variables

$\mathbb{P}\left( f(X,Z|\theta)\ge f(x^\text{obs},Z^\text{obs}|\theta) \,|\,x^\text{obs} \right)\,,$

which reminds me of the multiple possible avatars of the BIC criterion. The question addressed by Rubin-Delanchy and Lawson is how far from the uniform distribution stands this pp when the model is correct. The main result of their paper is that any sub-uniform distribution can be expressed as a particular posterior predictive. The authors also exhibit the distribution that achieves the bound produced by Xiao-Li Meng, Namely that

$\mathbb{P}(P\le \alpha) \le 2\alpha$

where P is the above (top) probability. (Hence it is uniform up to a factor 2!) Obviously, the proximity with the upper bound only occurs in a limited number of cases that do not validate the overall use of the ppp. But this is certainly a nice piece of theoretical work.

## importance sampling schemes for evidence approximation [revised]

Posted in Statistics, University life with tags , , , , , , , on November 18, 2014 by xi'an

After a rather intense period of new simulations and versions, Juong Een (Kate) Lee and I have now resubmitted our paper on (some) importance sampling schemes for evidence approximation in mixture models to Bayesian Analysis. There is no fundamental change in the new version but rather a more detailed description of what those importance schemes mean in practice. The original idea in the paper is to improve upon the Rao-Blackwellisation solution proposed by Berkoff et al. (2002) and later by Marin et al. (2005) to avoid the impact of label switching on Chib’s formula. The Rao-Blackwellisation consists in averaging over all permutations of the labels while the improvement relies on the elimination of useless permutations, namely those that produce a negligible conditional density in Chib’s (candidate’s) formula. While the improvement implies truncated the overall sum and hence induces a potential bias (which was the concern of one referee), the determination of the irrelevant permutations after relabelling next to a single mode does not appear to cause any bias, while reducing the computational overload. Referees also made us aware of many recent proposals that conduct to different evidence approximations, albeit not directly related with our purpose. (One was Rodrigues and Walker, 2014, discussed and commented in a recent post.)

## revenge, death penalty, prisons, &tc.

Posted in Books, Kids, University life with tags , , , , , , , , , on May 17, 2014 by xi'an

In the latest Sunday Review of the New York Times, the Norwegian novelist Jo Nesbo has a tribune on revenge against misdeeds and law as institutionalized revenge. Somewhat hidden in the current justifications of the legal system(s). (As an aside, he mentions the example of the Icelandic Alþingi where justice was dispensed once a year, resulting in beheadings, stake burnings, and drowning in the pond depicted above…) This came a few days after another tribune on a similar topic by Charles Blow, following the “botched Oklahoma execution of Clayton Lockett”, entitled “Eye-for-eye incivility” (an understatement if any!), and arguing  about the economic inefficiency of the death penalty. Besides the basic moral quandaries of taking someone else’s life, perfectly summarised by Franquin in the following dark strip:

This sequence of tribunes links to one of my pet theories, which is that imprisonment is the most inadequate way of addressing crime and law breaking in (modern?) societies.  Setting fully aside the moral notions of revenge and punishment, which aim more at the victim or victim’s relatives than at the perpetrator, and of redemption and remorse, which are at best hypothetical and inspired by religious considerations,  I do wonder why economists have not tried to come up with more rational and game-theoretic ways of dealing with law-breakers than locking them up all together and expecting them to behave forever after the end of their term. More globally, I find it quite surprising that no one ever seems to question the very notion of sending people to jail. Indeed, it does bring any clear benefit to society as a whole. One of the usual arguments I receive in those occasions is that imprisonment keeps dangerous people away. But that seems a fairly weak notion: (i) most violent offenders are not dangerous in an absolute berserker sense but only because local circumstances made them violent at a given occurrence in space and time, (ii) those offenders are only put away for a while (in most civilised countries), (iii) they are not getting any less dangerous while in prison, and (iv) it does not apply to the vast majority of people jailed. Furthermore, from a pure offer-versus-demand perspective, this may be counterproductive: e.g., putting some thieves away in jail for a while simply gives an opportunity to other thieves to take advantage of the “thieving market”.

The Freakonomics blog has some entries on the topic—somewhat supportive of my notion that most criminals act in an overall rational way for which incentives and decentives could be considered—, but still fails to address the larger picture… I showed this post to Andrew who pointed me (of course!) to his blog, as several entries therein also consider the issue, like this one on the puzzles of criminal justice. Or prison terms for financial fraud? But I would push the argument further and call for an ultimate abolishment of the carceral system, seeking efficient and generalised alternatives to imprisonment. As detailed in this U.N. report I just came across. As I think a time will come when imprisonment will be seen as irrational as witch-burning is considered today.

## Bayesian Data Analysis [BDA3 – part #2]

Posted in Books, Kids, R, Statistics, University life with tags , , , , , , , , on March 31, 2014 by xi'an

Here is the second part of my review of Gelman et al.’ Bayesian Data Analysis (third edition):

“When an iterative simulation algorithm is “tuned” (…) the iterations will not in general converge to the target distribution.” (p.297)

Part III covers advanced computation, obviously including MCMC but also model approximations like variational Bayes and expectation propagation (EP), with even a few words on ABC. The novelties in this part are centred at Stan, the language Andrew is developing around Hamiltonian Monte Carlo techniques, a sort of BUGS of the 10’s! (And of course Hamiltonian Monte Carlo techniques themselves. A few (nit)pickings: the book advises important resampling without replacement (p.266) which makes some sense when using a poor importance function but ruins the fundamentals of importance sampling. Plus, no trace of infinite variance importance sampling? of harmonic means and their dangers? In the Metropolis-Hastings algorithm, the proposal is called the jumping rule and denoted by Jt, which, besides giving the impression of a Jacobian, seems to allow for time-varying proposals and hence time-inhomogeneous Markov chains, which convergence properties are much hairier. (The warning comes much later, as exemplified in the above quote.) Moving from “burn-in” to “warm-up” to describe the beginning of an MCMC simulation. Being somewhat 90’s about convergence diagnoses (as shown by the references in Section 11.7), although the book also proposes new diagnoses and relies much more on effective sample sizes. Particle filters are evacuated in hardly half-a-page. Maybe because Stan does not handle particle filters. A lack of intuition about the Hamiltonian Monte Carlo algorithms, as the book plunges immediately into a two-page pseudo-code description. Still using physics vocabulary that put me (and maybe only me) off. Although I appreciated the advice to check analytical gradients against their numerical counterpart.

“In principle there is no limit to the number of levels of variation that can be handled in this way. Bayesian methods provide ready guidance in handling the estimation of the unknown parameters.” (p.381)

I also enjoyed reading the part about modes that stand at the boundary of the parameter space (Section 13.2), even though I do not think modes are great summaries in Bayesian frameworks and while I do not see how picking the prior to avoid modes at the boundary avoids the data impacting the prior, in fine. The variational Bayes section (13.7) is equally enjoyable, with a proper spelled-out illustration, introducing an unusual feature for Bayesian textbooks.  (Except that sampling without replacement is back!) Same comments for the Expectation Propagation (EP) section (13.8) that covers brand new notions. (Will they stand the test of time?!)

“Geometrically, if β-space is thought of as a room, the model implied by classical model selection claims that the true β has certain prior probabilities of being in the room, on the floor, on the walls, in the edge of the room, or in a corner.” (p.368)