*“We ought to estimate the chance that the probability for the happening of an event perfectly unknown, should lie between any two named degrees of probability, antecedently to any experiment made about it.”* Letter of R. Price to J. Canton, Nov. 10, 1763

**O**n a lazy and sunny Sunday afternoon, I re-read Thomas Bayes’ 1763 *Essay*. (It is available in LaTeX, courtesy of Peter Lee.) The major part of the *Essay* is actually written by Richard Price, Bayes’ contribution running from page 376 to page 399. Most of the introduction by Price (in the form of a letter to John Canton) rephrases Bayes’ findings, but he stresses that Bayes set a “sure foundation for all our reasonings concerning past facts”. In the spirit of the time, he cannot refrain from relating the uncovering of “fixt laws according to which events happened” to the “existence of the Deity”. He also perceives Bayes’ rule as “solving the converse problem” to De Moivre’s Laws of Chances. Lastly, he stresses that, although chance should relate to *past* events while probability relates to *future* events, the distinction should not impact conditional probability.

*“Given the number of times in which an unknown event has happened and failed; Required the chance that the probability of its happening in a single trial lies somewhere between any two degrees of probability that can be named.”* Th. Bayes

**T**he *Essay* itself consists of (a) a “brief demonstration of the general laws of chance”, (b) the derivation of Bayes’ posterior distribution for the uniform-binomial problem, and (c) the computation of the posterior probability of an arbitrary interval. The first part is a rewording of De Moivre’s Laws of Chance, in particular recalling the definition of a conditional probability. The definition of probability itself is perhaps worth quoting

*5. The probability of any event is the ratio between the value at which an expectation depending on the happening of the event ought to be computed and the value of the thing expected upon it’s happening.*

because it actually defines a probability as a by-product of the expected number of occurrences within a binomial experiment. (There is therefore nothing frequentist in this definition!) The main part (and the huge novelty) in the *Essay* is the derivation of the Beta posterior. Surprisingly, the setup is introduced very abruptly (in that nowhere before were those balls mentioned!):

Postulate. 1. Suppose the square table or plane ABCD to be so made and levelled, that if either of the balls *o* or *W* be thrown upon it, there shall be the same probability that it rests upon any one equal part of the plane as another, and that it must necessarily rest somewhere upon it.

and then the derivation opens with a two-page proof that the prior (uniform) cdf is indeed the uniform cdf, i.e. that the ball *o* falls to the left of any point *b* on *AB* with probability *Ab/AB*. The next result is Prop. 8 [388], which gives the joint probability that the binomial probability lies between *f* and *b* and that the binomial experiment yields *x=p*:

the probability the point *o* should fall between *f* and *b*, any two points named in the line *AB*, and withall that the event *M* should happen *p* times and fail *q* in *p+q* trials, is the ratio of *fghikmb*, the part of the figure *BghikmA* intercepted between the perpendiculars *fg*, *bm* raised upon the line *AB*, to *CA* the square upon *AB*.

where the curve is *y = x^p(1-x)^q*.
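In modern terms, Prop. 8 states that P(*f* < θ < *b*, *X = p*) is the integral of C(*p+q*, *p*) θ^p (1-θ)^q over (*f*, *b*). A minimal numerical sketch of this area computation (midpoint-rule integration; the function name is mine, not Bayes’):

```python
from math import comb

def joint_prob(p, q, f, b, steps=100_000):
    """P(f < theta < b and X = p) under a uniform prior on theta:
    the integral of comb(p+q, p) * theta**p * (1-theta)**q over (f, b)."""
    n = p + q
    h = (b - f) / steps
    total = 0.0
    for k in range(steps):
        t = f + (k + 0.5) * h          # midpoint rule
        total += comb(n, p) * t**p * (1 - t)**q * h
    return total

# Taking f=0, b=1 integrates theta out, leaving the marginal P(X = p) = 1/(n+1)
print(round(joint_prob(3, 2, 0.0, 1.0), 6))  # 0.166667
```

Note that with *f=0* and *b=1* the parameter integrates out, recovering the marginal P(*X = p*) = 1/(*n*+1) whatever *p*, the uniformity invoked in the scholium below.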

**T**he next proposition is then Bayes’ rule, still expressed in terms of surface ratio as above,

The same things supposed, I guess that the probability of the event *M* lies somewhere between *0* and the ratio of *Ab* to *AB*, my chance to be in the right is the ratio of *Abm* to *AiB*.

but clearly set within the *Beta(p+1,q+1)* distribution [in modern terms]. Bayes then inserts a scholium where he attempts to justify the use of the uniform prior; however, I do not see the validity of the reasoning, since he seems to argue instead for a uniform marginal distribution of the binomial outcome:

I have no reason to think that, in a certain number of trials, it should rather happen any one possible number of times than another.
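The ratio of the areas *Abm* to *AiB* in Prop. 9 is, in modern terms, the cdf of the *Beta(p+1,q+1)* posterior evaluated at *Ab/AB*. A small numerical sketch along these lines (again with midpoint-rule areas; the function names are mine):

```python
def area(p, q, upper, steps=200_000):
    # area under the curve y = x^p (1-x)^q between 0 and `upper` (midpoint rule)
    h = upper / steps
    return sum(((k + 0.5) * h) ** p * (1 - (k + 0.5) * h) ** q * h
               for k in range(steps))

def posterior_prob(p, q, x):
    """Prop. 9 as an area ratio: P(theta < x | p successes, q failures),
    i.e. the cdf of a Beta(p+1, q+1) distribution at x."""
    return area(p, q, x) / area(p, q, 1.0)

# After 3 successes and 2 failures the posterior is Beta(4, 3);
# its cdf at 1/2 is exactly 22/64 = 0.34375
print(round(posterior_prob(3, 2, 0.5), 5))  # 0.34375
```

The ratio-of-areas form is exactly Bayes’ geometric statement: the denominator is the whole figure *AiB*, the numerator the part cut off by the perpendicular at *b*.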

The last part of the *Essay* *per se* is about deriving a closed-form formula for the Beta integral, a feat achieved in Rule I [399], which amounts, in slightly more modern notation, to a term-by-term binomial expansion of the incomplete Beta integral.

The 18 remaining pages are written by Richard Price, who first reproduces Bayes’ approximations to the above integral with improvements of his own, then illustrates the performance of such approximations in specific cases, with the astounding fact that the probability interval covered by the approximation is centred at the MLE and not at the Bayes posterior mean. This could be extrapolated as one of the earliest confidence sets, except of course that the probability is over the parameter space. I note that Price also derives [409-410], as a consequence of Bayes’ calculations, what is now known as Laplace’s succession rule…! Besides the derivation of the posterior distribution itself, which must have been a considerable feat for the time, the attention to computational issues is highly commendable, as it would become a constant theme of Bayesian studies for the following centuries!
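The succession rule indeed follows directly from the *Beta(p+1,q+1)* posterior: the predictive probability of a further success is its mean, (*p*+1)/(*p*+*q*+2). A one-liner with exact fractions (the function name is mine):

```python
from fractions import Fraction

def succession_rule(p, q):
    """Predictive probability of a further success after p successes and
    q failures under a uniform prior: the mean of the Beta(p+1, q+1) posterior."""
    return Fraction(p + 1, p + q + 2)

# After 9 successes in 9 trials, the rule gives 10/11
print(succession_rule(9, 0))  # 10/11
```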