Archive for Pierre Simon Laplace

priors without likelihoods are like sloths without…

Posted in Books, Statistics with tags , , , , , , , , , , , , on September 11, 2017 by xi'an

“The idea of building priors that generate reasonable data may seem like an unusual idea…”

Andrew, Dan, and Michael arXived a opinion piece last week entitled “The prior can generally only be understood in the context of the likelihood”. Which connects to the earlier Read Paper of Gelman and Hennig I discussed last year. I cannot state strong disagreement with the positions taken in this piece, actually, in that I do not think prior distributions ever occur as a given but are rather chosen as a reference measure to probabilise the parameter space and eventually prioritise regions over others. If anything I find myself even further on the prior agnosticism gradation.  (Of course, this lack of disagreement applies to the likelihood understood as a function of both the data and the parameter, rather than of the parameter only, conditional on the data. Priors cannot be depending on the data without incurring disastrous consequences!)

“…it contradicts the conceptual principle that the prior distribution should convey only information that is available before the data have been collected.”

The first example is somewhat disappointing in that it revolves as so many Bayesian textbooks (since Laplace!) around the [sex ratio] Binomial probability parameter and concludes at the strong or long-lasting impact of the Uniform prior. I do not see much of a contradiction between the use of a Uniform prior and the collection of prior information, if only because there is not standardised way to transfer prior information into prior construction. And more fundamentally because a parameter rarely makes sense by itself, alone, without a model that relates it to potential data. As for instance in a regression model. More, following my epiphany of last semester, about the relativity of the prior, I see no damage in the prior being relevant, as I only attach a relative meaning to statements based on the posterior. Rather than trying to limit the impact of a prior, we should rather build assessment tools to measure this impact, for instance by prior predictive simulations. And this is where I come to quite agree with the authors.

“…non-identifiabilities, and near nonidentifiabilites, of complex models can lead to unexpected amounts of weight being given to certain aspects of the prior.”

Another rather straightforward remark is that non-identifiable models see the impact of a prior remain as the sample size grows. And I still see no issue with this fact in a relative approach. When the authors mention (p.7) that purely mathematical priors perform more poorly than weakly informative priors it is hard to see what they mean by this “performance”.

“…judge a prior by examining the data generating processes it favors and disfavors.”

Besides those points, I completely agree with them about the fundamental relevance of the prior as a generative process, only when the likelihood becomes available. And simulatable. (This point is found in many references, including our response to the American Statistician paper Hidden dangers of specifying noninformative priors, with Kaniav Kamary. With the same illustration on a logistic regression.) I also agree to their criticism of the marginal likelihood and Bayes factors as being so strongly impacted by the choice of a prior, if treated as absolute quantities. I also if more reluctantly and somewhat heretically see a point in using the posterior predictive for assessing whether a prior is relevant for the data at hand. At least at a conceptual level. I am however less certain about how to handle improper priors based on their recommendations. In conclusion, it would be great to see one [or more] of the authors at O-Bayes 2017 in Austin as I am sure it would stem nice discussions there! (And by the way I have no prior idea on how to conclude the comparison in the title!)

errors, blunders, and lies [book review]

Posted in Books, Kids, Statistics, University life with tags , , , , , , , , , on July 9, 2017 by xi'an

This new book by David Salsburg is the first one in the ASA-CRC Series on Statistical Reasoning in Science and Society. Which explains why I heard about it both from CRC Press [as a suggested material for a review in CHANCE] and from the ASA [as mass emailing]. The name of the author did not ring a bell until I saw the line about his earlier The Lady Tasting Tea book,  a best-seller in the category of “soft [meaning math- and formula-free] introduction to Statistics through picturesque characters”. Which I did not read either [but Bob Carpenter did].

The current book is of the same flavour, albeit with some maths formulas [each preceded by a lengthy apology for using maths and symbols]. The topic is the one advertised in the title, covering statistical errors and the way to take advantage of them, model mis-specification and robustness, and the detection of biases and data massaging. I read the short book in one quick go, waiting for the results of the French Legislative elections, and found no particular appeal in the litany of examples, historical entries, pitfalls, and models I feel I have already read so many times in the story-telling approach to statistics. (Naked Statistics comes to mind.)

It is not that there anything terrible with the book, which is partly based on the author’s own experience in a pharmaceutical company, but it does not seem to bring out any novelty for engaging into the study of statistics or for handling data in a more rational fashion. And I do not see which portion of the readership is targeted by the book, which is too allusive for academics and too academic for a general audience, who is not necessarily fascinated by the finer details of the history (and stories) of the field. As in The Lady Tasting Tea, the chapters constitute a collection of vignettes, rather than a coherent discourse leading to a theory or defending an overall argument. Some chapters are rather poor, like the initial chapter explaining the distinction between lies, blunders, and errors through the story of the measure of the distance from Earth to Sun by observing the transit of Venus, not that the story is uninteresting, far from it!, but I find it lacking in connecting with statistics [e.g., the meaning of a “correct” observation is never explained]. Or the chapter on the Princeton robustness study, where little is explained about the nature of the wrong distributions, which end up as specific contaminations impacting mostly the variance. And some examples are hardly convincing, like those on text analysis (Chapters 13, 14, 15), where there is little backup for using Benford’s law on such short datasets.  Big data is understood only under the focus of large p, small n, which is small data in my opinion! (Not to mention a minor crime de lèse-majesté in calling Pierre-Simon Laplace Simon-Pierre Laplace! I would also have left the Marquis de aside as this title came to him during the Bourbon Restauration, despite him having served Napoléon for his entire reign.) And, as mentioned above, the book contains apologetic mathematics, which never cease to annoy me since apologies are not needed. While the maths formulas are needed.

The Seven Pillars of Statistical Wisdom [book review]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on June 10, 2017 by xi'an

I remember quite well attending the ASA Presidential address of Stephen Stigler at JSM 2014, Boston, on the seven pillars of statistical wisdom. In connection with T.E. Lawrence’s 1926 book. Itself in connection with Proverbs IX:1. Unfortunately wrongly translated as seven pillars rather than seven sages.

As pointed out in the Acknowledgements section, the book came prior to the address by several years. I found it immensely enjoyable, first for putting the field in a (historical and) coherent perspective through those seven pillars, second for exposing new facts and curios about the history of statistics, third because of a literary style one would wish to see more often in scholarly texts and of a most pleasant design (and the list of reasons could go on for quite a while, one being the several references to Jorge Luis Borges!). But the main reason is to highlight the unified nature of Statistics and the reasons why it does not constitute a subfield of either Mathematics or Computer Science. In these days where centrifugal forces threaten to split the field into seven or more disciplines, the message is welcome and urgent.

Here are Stephen’s pillars (some comments being already there in the post I wrote after the address):

  1. aggregation, which leads to gain information by throwing away information, aka the sufficiency principle. One (of several) remarkable story in this section is the attempt by Francis Galton, never lacking in imagination, to visualise the average man or woman by superimposing the pictures of several people of a given group. In 1870!
  2. information accumulating at the √n rate, aka precision of statistical estimates, aka CLT confidence [quoting  de Moivre at the core of this discovery]. Another nice story is Newton’s wardenship of the English Mint, with musing about [his] potential exploiting this concentration to cheat the Mint and remain undetected!
  3. likelihood as the right calibration of the amount of information brought by a dataset [including Bayes’ essay as an answer to Hume and Laplace’s tests] and by Fisher in possible the most impressive single-handed advance in our field;
  4. intercomparison [i.e. scaling procedures from variability within the data, sample variation], from Student’s [a.k.a., Gosset‘s] t-test, better understood and advertised by Fisher than by the author, and eventually leading to the bootstrap;
  5. regression [linked with Darwin’s evolution of species, albeit paradoxically, as Darwin claimed to have faith in nothing but the irrelevant Rule of Three, a challenging consequence of this theory being an unobserved increase in trait variability across generations] exposed by Darwin’s cousin Galton [with a detailed and exhilarating entry on the quincunx!] as conditional expectation, hence as a true Bayesian tool, the Bayesian approach being more specifically addressed in (on?) this pillar;
  6. design of experiments [re-enters Fisher, with his revolutionary vision of changing all factors in Latin square designs], with an fascinating insert on the 18th Century French Loterie,  which by 1811, i.e., during the Napoleonic wars, provided 4% of the national budget!;
  7. residuals which again relate to Darwin, Laplace, but also Yule’s first multiple regression (in 1899), Fisher’s introduction of parametric models, and Pearson’s χ² test. Plus Nightingale’s diagrams that never cease to impress me.

The conclusion of the book revisits the seven pillars to ascertain the nature and potential need for an eight pillar.  It is somewhat pessimistic, at least my reading of it was, as it cannot (and presumably does not want to) produce any direction about this new pillar and hence about the capacity of the field of statistics to handle in-coming challenges and competition. With some amount of exaggeration (!) I do hope the analogy of the seven pillars that raises in me the image of the beautiful ruins of a Greek temple atop a Sicilian hill, in the setting sun, with little known about its original purpose, remains a mere analogy and does not extend to predict the future of the field! By its very nature, this wonderful book is about foundations of Statistics and therefore much more set in the past and on past advances than on the present, but those foundations need to move, grow, and be nurtured if the field is not to become a field of ruins, a methodology of the past!