Archive for Law of Large Numbers

Casanova’s Lottery [book review]

Posted in Books, Statistics, University life on January 12, 2023 by xi'an

This “history of a revolutionary game of chance” is the latest book by Stephen Stigler and is indeed of an historical nature, following the French Lottery from its inception as Loterie royale in 1758 to the Loterie Nationale in 1836 (with the intermediate names of Loterie de France, Loterie Nationale, Loterie impériale, Loterie royale reflecting the agitated history of the turn of that Century!).

The incentive for following this State lottery is that it is exceptional by its mathematical foundations. Contrary to other lotteries of the time, it was indeed grounded on the averaging of losses and gains in the long run (for the State). The French (Royal) State thus accepted the possibility of huge losses at some draws since they would be compensated by even larger gains. The reasoning proved most correct since the Loterie went on to provide as much as 4% of the overall State budget, despite the running costs of maintaining a network of betting places and employees, who had to be mathematically savvy in order to compute the exact gains of the winners. This is rather amazing as the understanding of the Law of Large Numbers was quite fresh (on an historical scale) thanks to the considerable advances made by Pascal, Fermat, (Jakob) Bernoulli and a few others. (The book mentions the Encyclopedist and mathematician Jean d’Alembert as being present at the meeting that decided on the creation of the Loterie in 1757.)

One may wonder why Casanova gets the credit for this lottery. In true agreement with Stigler’s Law, it is directly connected with the Genoese lottery and subsequent avatars in some Italian cities, including Casanova’s Venezia. But jack-of-all-trades Casanova was instrumental in selling the notion to the French State, having landed in Paris after a daring flight from the Serenissima’s jails. After succeeding in convincing the King’s officers to launch the scheme crafted by a certain Ranieri (de’) Calzabigi (not to be confused with the much maligned Salieri!), who would later collaborate with Gluck on Orfeo ed Euridice and Alceste, Casanova received a salary from the Loterie administration and further ran several betting offices. Until he left Paris for further adventures! Including an attempt to reproduce the lottery in Berlin, where Frederick II proved less receptive than Louis XV. (Possibly due to Euler’s cautionary advice.) The final sentence of the book stands by its title: “It was indeed Casanova’s lottery” (p.210).

Unsurprisingly, given Stephen’s fascination for Pierre-Simon Laplace, the great man plays a role in the history, first by writing in 1774 one of his earliest papers on a lottery problem, namely the distribution of the number of draws needed for all 90 numbers to appear. His (correct) solution is an alternating sum whose evaluation proved a numerical challenge. Thirty years later, Laplace came up with a good and manageable approximation (see Appendix Two). Laplace also contributed to the end of the Loterie by arguing on moral grounds against this “voluntary” tax, along with Talleyrand, a fellow in perpetually adapting to the changing political regimes. It is a bit of a surprise to read that this rather profitable venture ended in 1836, more under bankers’ than moralists’ pressure. (A new national lottery, based on printed tickets rather than bets on results, was created a century later, in 1933, and survived the Second World War, with the French Loto appearing in 1974 as a direct successor to Casanova’s lottery.)
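To give an idea of the kind of alternating sum involved, here is a minimal sketch by inclusion-exclusion. It assumes (my assumption, not spelled out above) that each draw picks five distinct numbers out of 90 uniformly at random; the function name `prob_all_seen` and its parameters are mine for illustration and not taken from the book. Exact rational arithmetic sidesteps the cancellation between huge terms of opposite signs that makes a direct evaluation delicate.

```python
from fractions import Fraction
from math import comb

def prob_all_seen(n_draws, total=90, per_draw=5):
    """Probability that all `total` numbers have appeared within `n_draws` draws.

    Inclusion-exclusion over the sets of k numbers that are never drawn
    (assumption: each draw is a uniform subset of size `per_draw`).
    """
    n_subsets = comb(total, per_draw)
    prob = Fraction(0)
    for k in range(total + 1):
        # chance that one draw avoids a given set of k numbers
        avoid = Fraction(comb(total - k, per_draw), n_subsets)
        prob += (-1) ** k * comb(total, k) * avoid ** n_draws
    return prob

if __name__ == "__main__":
    for n in (18, 60, 85, 86, 110):
        print(n, float(prob_all_seen(n)))
```

With these assumptions at least 18 draws are needed, and the probability grows slowly with the number of draws, which is why a manageable approximation was welcome.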

The book covers many fascinating aspects, from the daily run of the Loterie, to the various measures (successfully) taken against fraud, to the survival during the Révolution and its extension through (the Napoleonic) Empire, to tests for fairness thanks to numerous data from almanacs, to the behaviour of bettors and the sale of “helping” books, to (Daniel) Bernoulli, Buffon, Condorcet, and Laplace modelling rewards and supporting decreasing marginal utility. Note that there are hardly any mathematical formulas, except for an appendix on the probabilities of wins and the returns, as well as Laplace’s (and Legendre’s) derivations. Which makes the book eminently suited for a large audience, all the more thanks to Stephen Stigler’s perfect style.

This (paperback) book is also very pleasantly designed by the University of Chicago Press, with a pleasant font (Adobe Caslon Pro) and a very nice cover involving Laplace undercover, taken from a painting owned by the author. The many reproductions of period documents are well done and easily readable. And, needless to say given the scholarship of Stephen, the reference list is impressive.

The book is a testament to the remarkable skills of Stephen, who searched for material over thirty years, from Parisian specialised booksellers to French, English, and American archives. He manages to bring into the story a wealth of connections and characters, as for instance Voltaire’s scheme to take advantage of an earlier French State lottery aimed at reimbursing State debtors. (Voltaire actually made a fortune of several million francs out of this poorly designed lottery.) For my personal instruction, the book also brought to life several Métro stations like Pereire and Duverney. But the book’s contents will prove fascinating way beyond Parisian locals and francophiles. Enjoy!

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE. As appropriate for a book about capitalising on chance beliefs!]

important Markov chains

Posted in Books, Statistics, University life on July 21, 2022 by xi'an

With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fosdick (1963), and later exploited in Liu et al. (2003) for instance. And somewhat dual to the vanilla Rao-Blackwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as the supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT. While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is estimated by a Bernoulli simulation.
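The replication mechanism can be illustrated by a toy sketch, which is not the implementation from the paper. It assumes a N(0,1) target, a N(0,2²) instrumental distribution explored by a random walk Metropolis kernel, and a tuning constant `kappa` scaling the weight; all of these choices and all function names are mine for illustration. The number of replicas is drawn by residual sampling, i.e., the integer part of ρ plus a Bernoulli draw on its fractional part.

```python
import math
import random

def residual_replicas(rho, rng):
    """Integer N with E[N] = rho: floor(rho) plus a Bernoulli(fractional part)."""
    base = math.floor(rho)
    return base + (1 if rng.random() < rho - base else 0)

def importance_markov_chain(n_iter, kappa=2.0, seed=0):
    """Toy importance Markov chain (illustrative assumptions only).

    Target: N(0,1). Instrumental: N(0, 2^2), both known up to a constant.
    """
    rng = random.Random(seed)
    log_target = lambda x: -0.5 * x * x
    log_instr = lambda x: -0.125 * x * x
    x, chain = 0.0, []
    for _ in range(n_iter):
        # one random walk Metropolis step targeting the instrumental density
        prop = x + rng.gauss(0.0, 2.0)
        if rng.random() < math.exp(min(0.0, log_instr(prop) - log_instr(x))):
            x = prop
        # importance weight (up to the constant kappa) and number of replicas
        rho = kappa * math.exp(log_target(x) - log_instr(x))
        chain.extend([x] * residual_replicas(rho, rng))
    return chain

if __name__ == "__main__":
    chain = importance_markov_chain(20000, seed=1)
    mean = sum(chain) / len(chain)
    var = sum((v - mean) ** 2 for v in chain) / len(chain)
    print(len(chain), mean, var)
```

States with a weight below one are sometimes dropped (N=0) and states with a larger weight are repeated, so that the output sequence is approximately distributed from the target rather than from the instrumental distribution.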

The [errors in the] error of truth [book review]

Posted in Books, Statistics, University life on August 10, 2021 by xi'an

OUP sent me this book, The error of truth by Steven Osterlind, for review. It is a story about the “astonishing” development of quantitative thinking in the past two centuries. Unfortunately, I found it to be one of the worst books I have read on the history of sciences…

To start with the rather obvious part, I find the scholarship behind the book quite shoddy as the author continuously brings in historical tidbits to support his overall narrative and sometimes fills gaps on his own. It often feels like the material comes from Wikipedia, despite the author expressing a critical view of the on-line encyclopedia. The [long] quote below is presumably the most shocking historical blunder, as the Terror era marks the climax of the French Revolution, rather than the last fight of the French monarchy. Robespierre was the head of the Jacobins, the most radical revolutionaries at the time, and one of the Assembly members who voted for the execution of Louis XVI, which took place before the Terror. And later started to eliminate his political opponents, until he found himself on the guillotine!

“The monarchy fought back with almost unimaginable savagery. They ordered French troops to carry out a bloody campaign in which many thousands of protesters were killed. Any peasant even remotely suspected of not supporting the government was brutally killed by the soldiers; many were shot at point-blank range. The crackdown’s most intense period was a horrific ten-month Reign of Terror (“la Terreur”) during which the government guillotined untold masses (some estimates are as high as 5,000) of its own citizens as a means to control them. One of the architects of the Reign of Terror was Maximilien Robespierre, a French nobleman and lifelong politician. He explained the government’s slaughter in unbelievable terms, as “justified terror . . . [and] an emanation of virtue” (quoted in Linton 2006). Slowly, however, over the next few years, the people gained control. In the end, many nobles, including King Louis XVI and his wife Marie-Antoinette, were themselves executed by guillotining”

Obviously, this absolute misinterpretation does not matter (very) much for the (hi)story of quantification (and uncertainty assessment), but it demonstrates a lack of expertise of the author. And saps whatever trust one could have in new details he brings to light (life?). As for instance when stating

“Bayes did a lot of his developmental work while tutoring students in local pubs. He was a respected teacher. Taking advantage of his immediate resources (in his circumstance, a billiard table), he taught his theorem to many.”

which does not sound very plausible. I never heard that Bayes had students or went to pubs or exposed his result to many before its posthumous publication… Or when Voltaire (who died in 1778) is considered as a seventeenth-century precursor of the Enlightenment. Or when John Graunt, a true member of the Royal Society, is given as a member of the Académie des Sciences. Or when Quetelet is presented as French and as a student of Laplace.

The maths explanations are also puzzling, from the law of large numbers illustrated by six observations, and wrongly expressed (p.54) as

\bar{X}_n+\mu\qquad\text{when}\qquad n\longrightarrow\infty

to the Saint Petersburg paradox being seen as inverse probability, to a botched description of the central limit theorem (p.59), including the meaningless equation (p.60)

\gamma_n=\frac{2^{2n}}{\pi}\int_0^\pi~\cos^{2n} t\,\text dt

to de Moivre’s theorem being given as Taylor’s expansion

f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^2

and as his derivation of the concept of variance, to another botched depiction of the difference between Bayesian and frequentist statistics, incl. the usual horror

P(68.5<70<71.5)=95%

to independence being presented as a non-linear relation (p.111), to the conspicuous absence of Pythagoras in the regression chapter, to attributing to Gauss the concept of a probability density (when Simpson, Bayes, Laplace used it as well), to another highly confusing verbal explanation of densities, including a potential confusion between different representations of a distribution (Fig. 9.6) and the existence of distributions other than the Gaussian distribution, to another error in writing the Gaussian pdf (p.157),

f(x)=\dfrac{e^{-(z-\mu)^2}\big/2\sigma^2}{\sigma\sqrt{2\pi}}

to yet another error in the item response probability (p.301), and… to completely missing the distinction between the map and the territory, i.e., the probabilistic model and the real world (“Truth”), which may be the most important shortcoming of the book.
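For comparison, the standard textbook versions of three of the formulas quoted above, which are presumably what was intended (my reading, not a statement about the book’s sources), are the law of large numbers, Taylor’s expansion, and the Gaussian density:

```latex
\bar{X}_n\longrightarrow\mu\qquad\text{when}\qquad n\longrightarrow\infty

f(z)=\sum_{n=0}^\infty \frac{f^{(n)}(a)}{n!}(z-a)^n

f(x)=\dfrac{e^{-(x-\mu)^2/2\sigma^2}}{\sigma\sqrt{2\pi}}
```

That is, a convergence arrow rather than a plus sign, a power n rather than 2 in the expansion, and the whole quadratic term divided by 2σ² inside the exponent, with x rather than z as the argument.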

The style is somewhat heavy, with many repetitions about the greatness of the characters involved in the story, and some degree of license in bringing them within the narrative of the book. The historical determinism of this narrative is indeed strong, with a tendency to link characters more than they were, and to make them greater than life. Which is a usual drawback of such books, along with the profuse apologies for presenting a few mathematical formulas!

The overall presentation further has a Victorian and conservative flavour in its adoration of great names, an almost exclusive centering on Western Europe, a patriarchal tone (“It was common for them to assist their husbands in some way or another”, p.44; Marie Curie “agreed to the marriage, believing it would help her keep her laboratory position”, p.283), a defense of the empowerment allowed by the Industrial Revolution and of the positive sides of colonialism and of the Western expansion of the USA, including the invention of Coca Cola as a landmark in the march to Progress!, to the fall of the (communist) Eastern Bloc being attributed to Ronald Reagan, Karol Wojtyła, and Margaret Thatcher, to the Bell Curve being written by respected professors with solid scholarship, if controversial, to missing the Ottoman Enlightenment and being particularly disparaging about the Middle East, to dismissing Galton’s eugenics as a misguided enthusiasm of his later years (and side-stepping the issue of Pearson’s and Fisher’s eugenic views).

Another recurrent if minor problem is the poor recording of dates and years when introducing an event or a new character. And the quotes referring to the current edition or translation instead of the original year as, e.g., Bernoulli (1954). Or even better!, Bayes and Price (1963).

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]

estimating a constant (not really)

Posted in Books, Statistics, University life on October 12, 2012 by xi'an

Larry Wasserman wrote a blog entry on the normalizing constant paradox, where he repeats that he does not understand my earlier point… Let me try to recap here this point and the various comments I made on StackExchange (while keeping in mind all this is for intellectual fun!)

The entry is somewhat paradoxical in that Larry acknowledges (in that post) that the analysis in his book, All of Statistics, is wrong. The fact that “g(x)/c is a valid density only for one value of c” (and hence cannot lead to a notion of likelihood on c) is the very reason why I stated that there can be no statistical inference nor prior distribution about c: a sample from f does not bring statistical information about c and there can be no statistical estimate of c based on this sample. (In case you did not notice, I insist upon statistical!)

To me this problem is completely different from a statistical problem, at least in the modern sense: if I need to approximate the constant c (as I do in fact when computing Bayes factors), I can produce an arbitrarily long sample from a certain importance distribution and derive a converging (and sometimes unbiased) approximation of c. Once again, this is Monte Carlo integration, a numerical technique based on the Law of Large Numbers and the stabilisation of frequencies. (Call it a frequentist method if you wish. I completely agree that MCMC methods are inherently frequentist in that sense, and see no problem with this because they are not statistical methods. Of course, this may be the core of the disagreement with Larry and others, that they call statistics the Law of Large Numbers, and I do not. This lack of separation between both notions also shows up in a recent general public talk on Poincaré’s mistakes by Cédric Villani! All this may just mean I am irremediably Bayesian, seeing anything motivated by frequencies as non-statistical!) But that process does not mean that c can take a range of values that would index a family of densities compatible with a given sample. In this Monte Carlo integration approach, the distribution of the sample is completely under control (modulo the errors induced by pseudo-random generation). This approach is therefore outside the realm of Bayesian analysis “that puts distributions on fixed but unknown constants”, because those unknown constants parameterise the distribution of an observed sample. Ergo, c is not a parameter of the sample and the sample Larry argues about (“we have data sampled from a distribution”) contains no information whatsoever about c that is not already in the function g. (It is not “data” in this respect, but a stochastic sequence that can be used for approximation purposes.) Which gets me back to my first argument, namely that c is known (and at the same time difficult or impossible to compute)!
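As a minimal sketch of this Monte Carlo integration point, take a toy case where the answer can be checked: g(x)=exp(−x²/2), so that c=√(2π), with a N(0,2²) importance distribution. Both choices, as well as the function name `estimate_constant`, are assumptions made for illustration. The simulated sample is entirely under my control and only serves to approximate the integral of g.

```python
import math
import random

def estimate_constant(n, seed=0):
    """Importance sampling approximation of c = integral of g (toy example).

    Assumed setting: g(x) = exp(-x^2/2), importance density q = N(0, 2^2).
    The average of g(x_i)/q(x_i) is unbiased for c and converges by the
    Law of Large Numbers.
    """
    rng = random.Random(seed)
    sd = 2.0
    g = lambda x: math.exp(-0.5 * x * x)
    q = lambda x: math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, sd)      # simulate from the importance distribution
        total += g(x) / q(x)        # importance weight
    return total / n

if __name__ == "__main__":
    print(estimate_constant(100000), math.sqrt(2.0 * math.pi))
```

Increasing n makes the approximation as precise as wanted, with no observed data entering the computation at any stage.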

Let me also answer here the comments on “why is this any different from estimating the speed of light c?” and “why can’t you do this with the 100th digit of π?” on the earlier post or on StackExchange. Estimating the speed of light means for me (who repeatedly flunked Physics exams after leaving high school!) that we have a physical experiment that measures the speed of light (as the original one by Rœmer at the Observatoire de Paris I visited earlier last week) and that the statistical analysis infers about c by using those measurements and the impact of the imprecision of the measuring instruments (as we do when analysing astronomical data). If, now, there exists a physical formula of the kind

c=\int_\Xi \psi(\xi) \varphi(\xi) \text{d}\xi

where φ is a probability density, I can imagine stochastic approximations of c based on this formula, but I do not consider it a statistical problem any longer. The case is thus clearer for the 100th digit of π: it is also a fixed number, that I can approximate by a stochastic experiment but on which I cannot attach a statistical tag. (It is 9, by the way.) Throwing darts at random as I did during my Oz tour is not a statistical procedure, but simple Monte Carlo à la Buffon…
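For the record, the dart-throwing experiment can be mimicked by a few lines of simulation. This is a minimal sketch assuming uniform pseudo-random darts over the unit square, with `darts_pi` a name of my own choosing.

```python
import random

def darts_pi(n, seed=0):
    """Approximate pi from n uniform darts thrown at the unit square.

    The proportion of darts falling inside the quarter disc of radius one
    converges to pi/4 by the Law of Large Numbers.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n

if __name__ == "__main__":
    print(darts_pi(200000))
```

The error decreases like one over the square root of n, so this produces a couple of correct digits at best, which is a numerical approximation of a fixed number and nothing more.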

Overall, I still do not see this as a paradox for our field (and certainly not as a critique of Bayesian analysis), because there is no reason a statistical technique should be able to address any and every numerical problem. (Once again, Persi Diaconis would almost certainly differ, as he defended a Bayesian perspective on numerical analysis in the early days of MCMC…) There may be a “Bayesian” solution to this particular problem (and that would be nice) and there may be none (and that would be OK too!), but I am not even convinced I would call this solution “Bayesian”! (Again, let us remember this is mostly for intellectual fun!)
