## Laplace’s Demon [coming home!]

Posted in Kids, Linux, pictures, Statistics, University life on May 11, 2020 by xi'an

A new online seminar is starting this week, called Laplace’s Demon [after too much immersion in His Dark Materials, lately, rather than Unix coding, I first wrote daemon!] and concerned with Bayesian Machine Learning at Scale. It is run by Criteo in Paris (hence the Laplace filiation, I presume!). Here is the motivational blurb from their webpage:

Machine learning is changing the world we live in at a breakneck pace. From image recognition and generation, to the deployment of recommender systems, it seems to be breaking new ground constantly and influencing almost every aspect of our lives. In this seminar series we ask distinguished speakers to comment on what role Bayesian statistics and Bayesian machine learning have in this rapidly changing landscape. Do we need to optimally process information or borrow strength in the big data era? Are philosophical concepts such as coherence and the likelihood principle relevant when you are running a large scale recommender system? Are variational approximations, MCMC or EP appropriate in a production environment? Can I use the propensity score and call myself a Bayesian? How can I elicit a prior over a massive dataset? Is Bayes a reasonable theory of how to be perfect but a hopeless theory of how to be good? Do we need Bayes when we can just A/B test? What combinations of pragmatism and idealism can be used to deploy Bayesian machine learning in a large scale live system? We ask Bayesian believers, Bayesian pragmatists and Bayesian skeptics to comment on all of these subjects and more.

The seminar takes place on the second Wednesday of the month, at 5pm (GMT+2), starting ill-fatedly with myself on ABC-Gibbs this very Wednesday (13 May 2020), followed by Aki Vehtari, John Ormerod, Nicolas Chopin, François Caron, Pierre Latouche, Victor Elvira, Sara Filippi, and Chris Oates. (I think my very first webinar was a presentation at the Deutsche Bank, New York, which I gave from the CREST videoconference room from 8pm till midnight after my trip was cancelled when the Twin Towers got destroyed, on 07 September 2001…)

## Bertrand-Borel debate

Posted in Books, Statistics on May 6, 2019 by xi'an

On her blog, Deborah Mayo briefly mentioned the Bertrand-Borel debate on the (in)feasibility of hypothesis testing, as reported [and translated] by Erich Lehmann. A first interesting feature is that both mathematicians [both with surnames starting with B!] discuss the probability of causes in the Bayesian spirit of Laplace, with Bertrand considering that the prior probabilities of the different causes are impossible to set, and then moving all the way to dismissing the use of probability theory in this setting, nipping the p-values in the bud..! And with Borel being rather vague about the solution probability theory has to provide, as stressed by Lehmann.

“The Pleiades appear closer to each other than one would naturally expect. This statement deserves thinking about; but when one wants to translate the phenomenon into numbers, the necessary ingredients are lacking. In order to make the vague idea of closeness more precise, should we look for the smallest circle that contains the group? the largest of the angular distances? the sum of squares of all the distances? the area of the spherical polygon of which some of the stars are the vertices and which contains the others in its interior? Each of these quantities is smaller for the group of the Pleiades than seems plausible. Which of them should provide the measure of implausibility? If three of the stars form an equilateral triangle, do we have to add this circumstance, which is certainly very unlikely a priori, to those that point to a cause?” Joseph Bertrand (p.166)

“But whatever objection one can raise from a logical point of view cannot prevent the preceding question from arising in many situations: the theory of probability cannot refuse to examine it and to give an answer; the precision of the response will naturally be limited by the lack of precision in the question; but to refuse to answer under the pretext that the answer cannot be absolutely precise, is to place oneself on purely abstract grounds and to misunderstand the essential nature of the application of mathematics.” Emile Borel (Chapter 4)

Another highly interesting objection of Bertrand’s is somewhat linked with his conditioning paradox, namely that the density of the observed unlikely event depends on the choice of the statistic used to calibrate its unlikeliness. This makes complete sense in that the information contained in each of these statistics, and the resulting probability or likelihood, differ to an arbitrary extent, that there are few cases (monotone likelihood ratio) where the choice can be made, and that Bayes factors share the same drawback if they do not condition upon the entire sample. In which case there is no selection of “circonstances remarquables”. Or of uniformly most powerful tests.

## Gaussian hare and Laplacian tortoise

Posted in Books, Kids, pictures, Statistics, University life on October 19, 2018 by xi'an

A question on X validated on the comparative merits of L¹ versus L² estimation led me to the paper of Stephen Portnoy and Roger Koenker entitled “The Gaussian Hare and the Laplacian Tortoise: Computability of Squared-Error versus Absolute-Error Estimators”, which I had missed at the time, despite enjoying a subscription to Statistical Science till the late 1990s. The authors went as far as producing a parody of Grandville’s illustrations of the Fables de La Fontaine by sticking Laplace’s and Gauss’ heads on the tortoise and the hare!

I remember rather vividly going through Steve Stigler’s account of the opposition between Laplace’s and Legendre’s approaches when reading his History of Statistics in 1990 or 1991… Laplace defended the absolute error on the basis of the default double-exponential (or Laplace) distribution, while Legendre and then Gauss argued in favour of the squared error loss on the basis of a default Normal (or Gaussian) distribution. (Edgeworth later returned to the support of the L¹ criterion.) Portnoy and Koenker focus mostly on ways of accelerating the derivation of the L¹ regression estimators. (I also learned from the paper that Koenker was one of the originators of quantile regression.)

## Riddler collector

Posted in Statistics on September 22, 2018 by xi'an

Once in a while a fairly standard problem makes it to the Riddler puzzle of the week. Today, it is the coupon collector problem, explained by W. Huber on X validated. (W. Huber happens to be the top contributor to this forum, with over 2000 answers, and the highest reputation, closing on 200,000!) With nothing (apparently) unusual: coupons [e.g., collecting cards] come in packs of k=10 with no duplicates, and there are n=100 different coupons. What is the expected number of packs one has to buy before getting all of the n coupons? W. Huber provides an R code to solve the recurrence on the expectation, obtained by conditioning on the number m of different coupons already collected, e(m,n,k), and hence on the remaining number of packs to collect, with a Hypergeometric distribution for the number of new coupons in the next pack. It returns 25.23 packs on average. As is well-known, the expected number of packs needed to secure the final missing card is expensively large, with more than 5 packs necessary on average. The probability distribution of the required number of packs was actually computed by Laplace in 1774 (and then again by Euler in 1785).
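W. Huber’s actual R code is on X validated; a minimal Python transcription of the recurrence described above (the function name `expected_packs` and its defaults are my own invention for illustration) could read:

```python
from math import comb

def expected_packs(n=100, k=10):
    # e[m] = expected number of further packs when m distinct coupons are held;
    # e[n] = 0 since the collection is then complete
    e = [0.0] * (n + 1)
    denom = comb(n, k)
    for m in range(n - 1, -1, -1):
        # Hypergeometric probability of j new coupons in a pack of k distinct
        # cards: choose j among the n - m unseen and k - j among the m seen
        p = [comb(n - m, j) * comb(m, k - j) / denom for j in range(k + 1)]
        # e[m] = 1 + sum_j p[j] e[m+j]; solving the j=0 term for e[m]
        e[m] = (1 + sum(p[j] * e[m + j]
                        for j in range(1, k + 1) if j <= n - m)) / (1 - p[0])
    return e[0]
```

Solving the recurrence backwards from e(n)=0 takes a fraction of a second for these parameters, with `expected_packs(100, 10)` giving the expected pack count for the setup above.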

## an interesting identity

Posted in Books, pictures, Statistics, University life on March 1, 2018 by xi'an

Another interesting X validated question, another remembrance of past discussions on that issue. Discussions that took place in the Institut d’Astrophysique de Paris, nearby this painting of Laplace, when working on our cosmostats project. Namely the potential appeal of recycling multidimensional simulations by permuting the individual components in nearly independent settings. As shown by the variance decomposition in my answer, when opposing N iid pairs (X,Y) to the N combinations of √N simulations of X and √N simulations of Y, the comparison

$\text{var}(\hat{\mathfrak{h}}^2_N)=\text{var}(\hat{\mathfrak{h}}^1_N)+\frac{mn(n-1)}{N^2}\,\text{var}^Y\left\{\mathbb{E}^{X}\left\{\mathfrak{h}(X,Y)\right\}\right\}+\frac{m(m-1)n}{N^2}\,\text{var}^X\left\{\mathbb{E}^Y\left\{\mathfrak{h}(X,Y)\right\}\right\}$

unsurprisingly gives the upper hand to the iid sequence. A sort of converse to Rao-Blackwellisation… Unless the production of the N simulations gets much more costly when compared with the N function evaluations. No wonder we never see this proposal in Monte Carlo textbooks!
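A quick Monte Carlo check of this comparison (my own sketch; the toy function h(x,y)=x+y and the sample sizes are arbitrary choices, picked with strong main effects so that the variance gap is conspicuous):

```python
import numpy as np

rng = np.random.default_rng(0)
m = n = 100                 # sqrt(N) simulations of X and of Y
N = m * n                   # so both estimators average N = 10,000 pairs

def h(x, y):                # toy integrand with strong main effects
    return x + y

iid_est, rec_est = [], []
for _ in range(500):
    # estimator 1: N genuinely iid pairs (X, Y)
    x, y = rng.standard_normal(N), rng.standard_normal(N)
    iid_est.append(h(x, y).mean())
    # estimator 2: recycle by averaging h over all m*n combinations
    xs, ys = rng.standard_normal(m), rng.standard_normal(n)
    rec_est.append(h(xs[:, None], ys[None, :]).mean())

print(np.var(iid_est), np.var(rec_est))   # recycled variance is much larger
```

For this h, the recycled estimator collapses to the sum of the two sample means, so its variance is of order 1/√N rather than 1/N, in agreement with the extra terms of the identity.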