## Archive for Boston

## position at Harvard

Posted in pictures, Running, University life with tags academic position, Boston, Charles river, Harvard University, interdisciplinary research, job offer, probability, Statistics, tenure-track position on October 27, 2018 by xi'an

This is to point out an opening for a tenure-track position in statistics and probability at Harvard University, with a deadline of December 1. More specifically, the search is for a candidate in any field of statistics and probability, as well as in any interdisciplinary area where innovative and principled use of statistics and/or probability is of vital importance.

## unbiased HMC

Posted in Books, pictures, Statistics with tags Boston, coupling, Hamiltonian Monte Carlo, Harvard, HMC, Massachusetts, sunrise, unbiasedness on September 25, 2017 by xi'an

Jeremy Heng and Pierre Jacob arXived last week a paper on unbiased Hamiltonian Monte Carlo by coupling, following the earlier paper of Pierre and co-authors on debiasing by coupling a few weeks ago. The coupling within the HMC amounts to running two HMC chains with common random numbers, plus subtleties!

“As with any other MCMC method, HMC estimators are justified in the limit of the number of iterations. Algorithms which rely on such asymptotics face the risk of becoming obsolete if computational power keeps increasing through the number of available processors and not through clock speed.”

The main difficulty here is to have both chains meet (exactly) with large probability, since coupled HMC can only bring these chains close to one another. The trick stands in using both coupled HMC *and* coupled Hastings-Metropolis kernels, since the coupled MH kernel allows for exact meetings when the chains are already close, after which they remain happily and forever together! The algorithm is implemented by choosing between the kernels at random at each iteration. (Unbiasedness follows by the Glynn-Rhee trick, which is eminently well-suited for coupling!) As pointed out from the start of the paper, the appeal of this unbiased version is that the algorithm can be (embarrassingly) parallelised since all processors in use return estimators that are iid copies of one another, hence easily merged into a better estimator.
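The construction is easy to sketch in a few lines. The toy Python version below is my own illustration, not the authors' code: it replaces the HMC ingredient with a coupled random-walk Metropolis kernel on a standard Normal target (maximal coupling of the proposals plus a common accept/reject uniform, so the chains stick together once they meet) and returns the Glynn-Rhee debiased estimator.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # standard Normal target, standing in for the HMC target of the paper
    return -0.5 * x ** 2

def coupled_proposals(mx, my, sigma):
    """Maximal coupling of N(mx, sigma^2) and N(my, sigma^2): correct
    marginals, with the two proposals equal with maximal probability."""
    logq = lambda z, m: -0.5 * ((z - m) / sigma) ** 2
    x = rng.normal(mx, sigma)
    if np.log(rng.uniform()) + logq(x, mx) <= logq(x, my):
        return x, x                      # both chains propose the same point
    while True:                          # rejection sampler for the residual
        y = rng.normal(my, sigma)
        if np.log(rng.uniform()) + logq(y, my) > logq(y, mx):
            return x, y

def coupled_mh_step(x, y, sigma=1.0):
    """One coupled Metropolis step with a common uniform: once the chains
    have met they accept/reject identically and stay together forever."""
    px, py = coupled_proposals(x, y, sigma)
    logu = np.log(rng.uniform())
    if logu <= log_target(px) - log_target(x):
        x = px
    if logu <= log_target(py) - log_target(y):
        y = py
    return x, y

def mh_step(x, sigma=1.0):
    """Plain (marginal) Metropolis step, used to give X a one-step lead."""
    p = rng.normal(x, sigma)
    return p if np.log(rng.uniform()) <= log_target(p) - log_target(x) else x

def unbiased_estimate(h, sigma=1.0, max_iter=10_000):
    """Glynn-Rhee estimator h(X_0) + sum_{t>=1} [h(X_t) - h(Y_{t-1})],
    the sum stopping at the meeting time of the lag-one coupled chains."""
    x = rng.normal()                     # X_0 from the initial distribution
    y = rng.normal()                     # Y_0 from the same initial distribution
    total = h(x)
    x = mh_step(x, sigma)                # X now leads Y by one iteration
    for _ in range(max_iter):
        if x == y:                       # met: all remaining terms vanish
            break
        total += h(x) - h(y)
        x, y = coupled_mh_step(x, y, sigma)
    return total
```

Averaging many independent calls to `unbiased_estimate` is exactly the embarrassingly parallel merging step described above: each processor returns one iid copy.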

## The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life with tags Boston, Florence Nightingale, foundations, JSM 2014, Karl Pearson, likelihood, Pierre Simon Laplace, quincunx, R.A. Fisher, residuals, Stephen Stigler, T.E. Lawrence, The Seven Pillars of Statistical Wisdom, Thomas Bayes, trek rule theorem, W. Gosset on June 10, 2017 by xi'an

I remember quite well attending the ASA Presidential address of Stephen Stigler at JSM 2014, Boston, on the seven pillars of statistical wisdom. In connection with T.E. Lawrence’s 1926 book. Itself in connection with Proverbs IX:1. Unfortunately wrongly translated as *seven pillars* rather than *seven sages*.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts, and a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is that the book highlights the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days where centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

- *aggregation*, which leads to a gain of information by throwing away information, aka the sufficiency principle. One (of several) remarkable stories in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
- *information*, accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musings about [his] potentially exploiting this concentration to cheat the Mint and remain undetected!
- *likelihood*, as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher, in possibly the most impressive single-handed advance in our field;
- *intercomparison* [i.e., scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset‘s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
- *regression* [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations], exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
- *design of experiments* [re-enter Fisher, with his revolutionary vision of changing all factors in Latin square designs], with a fascinating insert on the 18th Century French Loterie, which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
- *residuals*, which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.

The conclusion of the book revisits the seven pillars to ascertain the nature and potential need for an eighth pillar. It is somewhat pessimistic, *at least my reading of it was*, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle incoming challenges and competition. With some amount of exaggeration (!) I do hope the analogy of the seven pillars, which raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predict the future of the field! By its very nature, this wonderful book is about foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!

## Bayes is typically wrong…

Posted in pictures, Running, Statistics, Travel, University life with tags Bayesian Fiducial & Frequentist Conference, Bayesian foundations, Boston, Boston harbour, Don Fraser, fiducial inference, Harvard University, matching priors, Nancy Reid, profile likelihood, skyline, sunrise on May 3, 2017 by xi'an

In Harvard, this morning, Don Fraser gave a talk at the Bayesian, Fiducial, and Frequentist conference where he repeated *[as shown by the above quote]* the rather harsh criticisms on Bayesian inference he published last year in Statistical Science. And which I discussed a few days ago. The “wrongness” of Bayes starts with the completely arbitrary choice of the prior, which Don sees as unacceptable, and then increases because the credible regions are not confidence regions, outside natural parameters from exponential families (Welch and Peers, 1963). And one-dimensional parameters using the profile likelihood (although I cannot find a proper definition of what the profile likelihood is in the paper, apparently a plug-in version that is not a genuine likelihood, hence somewhat falling under the same *this-is-not-a-true-probability* cleaver as the disputed Bayesian approach).
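The failure of exact matching is easy to exhibit numerically. The sketch below is a toy of my own, not one of Don's examples: it computes the exact frequentist coverage of the nominal 95% equal-tailed credible interval for a Binomial proportion under a uniform prior, where the discreteness of the data alone already breaks the match between credible and confidence levels.

```python
from math import comb

def beta_cdf_int(x, a, b):
    """CDF of a Beta(a, b) at x for integer a, b, via the binomial identity
    I_x(a, b) = sum_{j=a}^{a+b-1} C(a+b-1, j) x^j (1-x)^{a+b-1-j}."""
    n = a + b - 1
    return sum(comb(n, j) * x ** j * (1 - x) ** (n - j) for j in range(a, n + 1))

def coverage(p, n, level=0.95):
    """Exact frequentist coverage, at true value p, of the equal-tailed
    credible interval built from a uniform prior on a Binomial(n) proportion
    (posterior Beta(k+1, n-k+1) after observing k successes)."""
    lo, hi = (1 - level) / 2, (1 + level) / 2
    cov = 0.0
    for k in range(n + 1):
        # p lies inside the equal-tailed interval iff its posterior CDF
        # value falls strictly between the two tail probabilities
        u = beta_cdf_int(p, k + 1, n - k + 1)
        if lo < u < hi:
            cov += comb(n, k) * p ** k * (1 - p) ** (n - k)
    return cov
```

For instance `coverage(0.2, 5)` returns 0.94208, short of the nominal 0.95; Welch and Peers' matching results concern continuous exponential-family settings, and even there only the natural parameters match exactly.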

“I expect we’re all missing something, but I do not know what it is.” (D.R. Cox, Statistical Science, 1994)

And then Nancy Reid delivered a plenary lecture *“Are we converging?”* in the afternoon that compared most principles (including objective if not subjective Bayes) against different criteria, like consistency, nuisance elimination, calibration, meaning of probability, and so on, in a highly analytic if pessimistic panorama. (The talk should be available on line at some point soon.)

## messages from Harvard

Posted in pictures, Statistics, Travel, University life with tags Arnold Zellner, Bayesian tests of hypotheses, Boston, Cambridge, Harvard University, Massachusetts, mixtures of distributions on March 24, 2016 by xi'an

As in Bristol two months ago, where I joined the statistics reading group in the morning, I had the opportunity to discuss the paper on testing via mixtures with a group of Harvard graduate students prior to my talk. The discussion concentrated on the biasing effect of the Bayes factor against the more complex hypothesis/model. Arguing [if not in those terms!] that Occam’s razor was too sharp. With a neat remark that decomposing the log Bayes factor as

log p(y₁|H) + log p(y₂|y₁,H) + …

meant that the first marginal was immensely and uniquely impacted by the prior modelling, hence very likely to be very small for a larger model H, which would then take forever to recover from. And asking why there was such a difference with cross-validation

log p(y₁|y₋₁,H) + log p(y₂|y₋₂,H) + …

where the leave-one-out posterior predictor is indeed more stable. While the latter leads to major overfitting in my opinion, I had never spotted the former decomposition, which does appear as a strong and maybe damning criticism of the Bayes factor in terms of long-term impact of the prior modelling.
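In a conjugate Normal toy model (my own illustration, not from the discussion), both decompositions have closed forms, which makes the contrast easy to check: the first sequential term is the prior predictive density of y₁ and collapses as the prior variance grows, while the leave-one-out terms barely move.

```python
import math

def log_norm_pdf(x, mean, var):
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def sequential_terms(y, tau2):
    """Terms log p(y_i | y_1, ..., y_{i-1}, H) of the sequential decomposition
    of the log marginal likelihood, for y_i ~ N(theta, 1), theta ~ N(0, tau2)."""
    terms, s, k = [], 0.0, 0
    for yi in y:
        lam = 1.0 / tau2 + k             # posterior precision after k observations
        # predictive of the next point: N(posterior mean, 1 + posterior variance)
        terms.append(log_norm_pdf(yi, s / lam, 1.0 + 1.0 / lam))
        s += yi
        k += 1
    return terms

def loo_terms(y, tau2):
    """Leave-one-out cross-validation terms log p(y_i | y_{-i}, H)."""
    n, tot = len(y), sum(y)
    lam = 1.0 / tau2 + (n - 1)           # posterior precision on n-1 observations
    return [log_norm_pdf(yi, (tot - yi) / lam, 1.0 + 1.0 / lam) for yi in y]
```

Moving the prior variance from 1 to 10⁴ sinks the first sequential term by several units (it behaves like −½ log τ² for a vague N(0, τ²) prior) while each leave-one-out term changes by a few hundredths at most, which is exactly the long-term prior impact versus stability contrast made above.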

Other points made during the talk or before when preparing the talk:

- additive mixtures are but one encompassing model, geometric mixtures could be fun too, if harder to process (e.g., missing normalising constant). Or Zellner’s mixtures (with again the normalising issue);
- if the final outcome of the “test” is the posterior on α itself, the impact of the hyper-parameter on α is quite relative since this posterior can be calibrated by simulation against limiting cases (α=0,1);
- for the same reason the different rate of accumulation near zero and one when compared with a posterior probability is hardly worrying;
- what I see as a fundamental difference in processing improper priors for Bayes factors versus mixtures is not perceived as such by everyone;
- even a common parameter θ on both models does not mean both models are equally weighted a priori, which relates to an earlier remark in Amsterdam about the different Jeffreys priors one can use;
- the MCMC output also produces a sample of θ’s whose behaviour is obviously different from single model outputs. It would be interesting to study further the behaviour of those samples, which are not to be confused with model averaging;
- the mixture setting has nothing intrinsically Bayesian in that the model can be processed in other ways.
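To make the mixture-based "test" concrete, here is a minimal sketch of my own, under simplifying assumptions: a single unknown weight α between two fixed Normal components (so the posterior on α can be evaluated on a grid, no MCMC required) and a Beta(½, ½) prior on α.

```python
import numpy as np

rng = np.random.default_rng(1)

def mixture_test_posterior(y, mu, grid_size=2001, a=0.5, b=0.5):
    """Posterior on the weight alpha of the encompassing mixture
    alpha * N(0, 1) + (1 - alpha) * N(mu, 1), under a Beta(a, b) prior,
    evaluated on a grid of alpha values."""
    alphas = np.linspace(1e-6, 1 - 1e-6, grid_size)
    f0 = np.exp(-0.5 * y ** 2)            # component densities, up to the
    f1 = np.exp(-0.5 * (y - mu) ** 2)     # common 1/sqrt(2*pi) factor
    # log posterior = Beta log prior + mixture log likelihood, per grid point
    loglik = np.sum(np.log(np.outer(alphas, f0) + np.outer(1 - alphas, f1)), axis=1)
    logpost = (a - 1) * np.log(alphas) + (b - 1) * np.log(1 - alphas) + loglik
    w = np.exp(logpost - logpost.max())   # normalise on the grid
    return alphas, w / w.sum()
```

With data drawn from the first component and a well-separated alternative (say μ = 3), the posterior mass piles up near α = 1; simulating under the limiting cases α = 0 and α = 1 then provides the calibration mentioned in the second bullet above.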