Archive for Bertrand’s paradox

Bertrand-Borel debate

Posted in Books, Statistics on May 6, 2019 by xi'an

On her blog, Deborah Mayo briefly mentioned the Bertrand-Borel debate on the (in)feasibility of hypothesis testing, as reported [and translated] by Erich Lehmann. A first interesting feature is that both [starting with] B mathematicians discuss the probability of causes in the Bayesian spirit of Laplace. With Bertrand considering that the prior probabilities of the different causes are impossible to set and then moving all the way to dismiss the use of probability theory in this setting, nipping the p-values in the bud..! And Borel being rather vague about the solution probability theory has to provide. As stressed by Lehmann.

“The Pleiades appear closer to each other than one would naturally expect. This statement deserves thinking about; but when one wants to translate the phenomenon into numbers, the necessary ingredients are lacking. In order to make the vague idea of closeness more precise, should we look for the smallest circle that contains the group? the largest of the angular distances? the sum of squares of all the distances? the area of the spherical polygon of which some of the stars are the vertices and which contains the others in its interior? Each of these quantities is smaller for the group of the Pleiades than seems plausible. Which of them should provide the measure of implausibility? If three of the stars form an equilateral triangle, do we have to add this circumstance, which is certainly very unlikely a priori, to those that point to a cause?” Joseph Bertrand (p.166)

“But whatever objection one can raise from a logical point of view cannot prevent the preceding question from arising in many situations: the theory of probability cannot refuse to examine it and to give an answer; the precision of the response will naturally be limited by the lack of precision in the question; but to refuse to answer under the pretext that the answer cannot be absolutely precise, is to place oneself on purely abstract grounds and to misunderstand the essential nature of the application of mathematics.” Emile Borel (Chapter 4)

Another highly interesting objection of Bertrand is somewhat linked with his conditioning paradox, namely that the density of the observed unlikely event depends on the choice of the statistic that is used to calibrate the unlikeliness, which makes complete sense in that the information contained in each of these statistics and the resulting probability or likelihood differ to an arbitrary extent, that there are few cases (monotone likelihood ratio) where the choice can be made, and that Bayes factors share the same drawback if they do not condition upon the entire sample. In which case there is no selection of “circonstances remarquables”. Or of uniformly most powerful tests.
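To illustrate Bertrand's point that the tail probability depends on the statistic chosen to measure closeness, here is a minimal R sketch. Everything in it is an assumption of mine rather than something taken from Bertrand or Lehmann: six points, a uniform reference distribution over the unit square, an artificial "observed" cluster, and two of the closeness measures listed in the quote (largest distance and sum of squared distances).

```r
set.seed(1)                               # arbitrary seed, for reproducibility only
k <- 6                                    # number of "stars" (assumed)
closeness <- function(xy) {
  d <- dist(xy)                           # all pairwise Euclidean distances
  c(max_dist = max(d), sum_sq = sum(d^2)) # two candidate measures of closeness
}
# hypothetical observed group: points squeezed into a central sub-square
obs <- closeness(matrix(runif(2 * k, 0.2, 0.8), k, 2))
# reference distribution: k points uniform over the unit square
ref <- replicate(1e4, closeness(matrix(runif(2 * k), k, 2)))
# Monte Carlo tail probability of each statistic (obs recycles down the two rows)
pvals <- rowMeans(ref <= obs)
print(pvals)
```

The two tail probabilities need not agree, and nothing in the data says which one is the "right" measure of implausibility.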

Le Monde puzzle [#1024]

Posted in Books, Kids on October 10, 2017 by xi'an

The penultimate and appropriately somewhat Monty Hallesque Le Monde mathematical puzzle of the competition!

A dresser with 5×5 drawers contains a single object in one of the 25 drawers. A player opens a drawer at random and, after each choice, the object moves at random to a drawer adjacent to its current location and the drawer chosen by the player remains open. What is the maximum number of drawers one needs to open to find the object?

In a dresser with 9 drawers in a line, containing again a single object, the player opens drawers one at a time, after which the open drawer is closed and the object moves to one of the drawers adjacent to its current location. What is the maximum number of drawers one needs to open to find the object?

For the first question, setting a pattern of exploration and, given this pattern, simulating a random walk trying to avoid the said pattern as long as possible is feasible, returning a maximum number of steps over many random walks [and hence a lower bound on the true maximum]. As in the following code:

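sefavyd=function(pater=seq(1,49,2)%%25+1){
  # pater: order of opening (default: even-numbered drawers, then odd ones)
  fild=matrix(0,5,5)       # 1 marks an open drawer
  m=pater[1];i=fild[m]=1
  t=sample((1:25)[-m],1)   # initial location of the object, outside drawer m
  nomove=FALSE
  while (!nomove){
   i=i+1
   m=pater[i];fild[m]=1
   if (t==m){ nomove=TRUE}else{
   # adjacent drawers, with column-major indexing of the 5x5 grid
   muv=NULL
   if ((t-1)%%5>0) muv=c(muv,t-1)
   if (t%%5>0) muv=c(muv,t+1)
   if ((t-1)%/%5>0) muv=c(muv,t-5)
   if ((t-1)%/%5<4) muv=c(muv,t+5)  # (t-1), so that drawer 20 can reach 25
   muv=muv[fild[muv]==0]   # the object avoids open drawers
   nomove=(length(muv)==0)
   # rep() prevents sample() from drawing in 1:muv when length(muv)==1
   if (!nomove) t=sample(rep(muv,2),1)}
  }
  return(i)}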

But a direct reasoning starts from the observation that, while two adjacent drawers are not opened, a random walk can, with non-zero probability, switch indefinitely between both drawers. Hence, a sure recovery of the object requires opening one drawer out of two. The minimal number of drawers to open on a 5×5 dresser is 2+3+2+3+2=12. Since in 12 steps, those drawers are all open, spotting the object may require up to 13 steps.
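The count of 12 drawers can be checked with a checkerboard colouring of the dresser. The sketch below assumes drawers indexed by (row, column), which is my own convention; it opens the drawers whose coordinates sum to an odd number and verifies that no two of the remaining closed drawers are adjacent.

```r
grid <- expand.grid(row = 1:5, col = 1:5)      # the 25 drawers
parity <- (grid$row + grid$col) %% 2           # checkerboard colouring
n_open <- sum(parity == 1)                     # drawers to open: 2+3+2+3+2
n_closed <- sum(parity == 0)                   # drawers left closed
# Manhattan distance 1 between two drawers means they are adjacent
d <- as.matrix(dist(grid[parity == 0, ], method = "manhattan"))
adjacent_closed_pairs <- sum(d == 1) / 2       # each pair is counted twice
c(n_open, n_closed, adjacent_closed_pairs)     # 12 13 0
```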

For the second case, unless I [again!] misread the question, whatever pattern one picks for the exploration, there is always a non-zero probability to avoid discovery after an arbitrary number of steps. The [wrong!] answer is thus infinity. To cross-check this reasoning, I wrote the following R code that mimics a random pattern of exploration, associated with an opportunistic random walk that avoids discovery whenever possible (even with very low probability) by pushing the object towards the centre,

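drawl=function(){
  i=1;t=5;nomove=FALSE       # the object starts in the central drawer
  m=sample((1:9)[-t],1)      # first opening, which misses the object
  while (!nomove){
    nextm=sample((1:9),1)    # next drawer to be opened, picked at random
    muv=c(t-1,t+1)
    # moves that stay within the dresser and escape the next opening
    muv=muv[(muv>0)&(muv<10)&(muv!=nextm)]
    # stop when cornered, or give up after 1e6 steps
    nomove=(length(muv)==0)||(i>1e6)
    # weights favour the move closest to the centre
    if (!nomove) t=sample(rep(muv,2),1,
              prob=1/(5.5-rep(muv,2))^4)
    i=i+1}
  return(i)}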

which returns unlimited values on repeated runs. However, I was wrong and the R code unable to dismiss my a priori!, as later discussions with Robin and Julien at Paris-Dauphine exhibited ways of terminating the random walk in 18, then 15, then 14 steps! The idea was to push the target to one of the endpoints because it would then have no option but turning back: an opening pattern like 2, 3, 4, 5, 6, 7, 8, 8 would take care of a hidden object starting in an even drawer, while the following 7, 6, 5, 4, 3, 2 openings would terminate any random path starting from an odd drawer. To double check:

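grawl=function(){
  len=0;muvz=c(2:8,8:2)      # opening pattern 2,...,8,8,7,...,2
  for (t in 1:9){            # every possible starting drawer
    i=1;m=muvz[i];nomove=(t==m)
    while (!nomove){
     i=i+1;m=muvz[i];muv=c(t-1,t+1)
     # moves that stay within the dresser and avoid the next opening
     muv=muv[(muv>0)&(muv<10)&(muv!=m)]
     nomove=(length(muv)==0)  # cornered, hence found
     if (!nomove)
      t=sample(rep(muv,2),1)}
    len=max(len,i)}
  return(len)}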

produces the value 14.
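A deterministic cross-check, which does not rely on simulated paths, is to track the set of drawers where the object may still hide. This is only a sketch: the function name `steps_needed` and its arguments are mine, and the rule that the object must move to a neighbouring drawer after each opening is taken from the puzzle statement.

```r
# Number of openings after which no hiding place survives (Inf if some does)
steps_needed <- function(pattern, n = 9) {
  alive <- rep(TRUE, n)                    # drawers possibly hiding the object
  for (k in seq_along(pattern)) {
    alive[pattern[k]] <- FALSE             # the opened drawer is cleared
    if (!any(alive)) return(k)             # the object is found by step k
    # the object moves to a neighbour: j is possible if j-1 or j+1 was
    alive <- c(alive[-1], FALSE) | c(FALSE, alive[-n])
  }
  Inf
}
steps_needed(c(2:8, 8:2))   # the 14-opening pattern described above
steps_needed(c(2:8, 7:2))   # dropping the repeated 8 breaks the parity switch
```

The first call returns 14, while the second returns Inf, since without the repeated opening of drawer 8 the downward sweep always has the wrong parity.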

what makes variables randoms [book review]

Posted in Books, Mountains, Statistics on July 19, 2017 by xi'an

When the goal of a book is to make measure theoretic probability available to applied researchers for conducting their research, I cannot but applaud! Peter Veazie’s goal of writing “a brief text that provides a basic conceptual introduction to measure theory” (p.4) is hence most commendable. Before reading What makes variables random, I was uncertain how this could be achieved with a limited calculus background, given the difficulties met by our third year maths students. After reading the book, I am even less certain this is feasible!

“…it is the data generating process that makes the variables random and not the data.”

Chapter 2 is about basic notions of set theory. Chapter 3 defines measurable sets and measurable functions and integrals against a given measure μ as

\sup_\pi \sum_{A\in\pi}\inf_{\omega\in A} f(\omega)\mu(A)

which I find particularly unnatural compared with the definition through simple functions (esp. because it does not tell how to handle 0×∞). The ensuing discussion shows the limitation of the exercise in that the definition is only explained for finite sets (since the notion of a partition achieving the supremum on page 29 is otherwise meaningless). A generic problem with the book, in that most examples in the probability section relate to discrete settings (see the discussion of the power set p.66). I also did not see a justification as to why measurable functions enjoy well-defined integrals in the above sense. All in all, to see less than ten pages allocated to measure theory per se is rather staggering! For instance,

\int_A f\text{d}\mu

does not appear to be defined at all.
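To make the supremum-over-partitions definition concrete beyond finite sets, here is a small R sketch under assumptions of my own choosing (not the book's): f(ω)=ω² on [0,1), μ the Lebesgue measure, and partitions π into n equal intervals, for which the infimum of this increasing f sits at the left endpoint.

```r
# Lower sum  sum_{A in pi} inf_{w in A} f(w) mu(A)  over n equal intervals of [0,1)
lower_sum <- function(n, f = function(w) w^2) {
  left <- (0:(n - 1)) / n        # left endpoints, where an increasing f is smallest
  sum(f(left) * (1 / n))         # each interval has Lebesgue measure 1/n
}
sapply(c(10, 100, 1000), lower_sum)   # 0.285, 0.32835, 0.3328335
```

The lower sums increase along these nested partitions and approach 1/3 without reaching it, so no partition achieves the supremum in this continuous setting.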

“…the mathematical probability theory underlying our analyses is just mathematics…”

Chapter 4 moves to probability measures. It distinguishes between objective (or frequentist) and subjective measures, which is of course open to diverse interpretations. And the definition of a conditional measure is the traditional one, conditional on a set rather than on a σ-algebra. Surprisingly as this is in my opinion one major reason for using measures in probability theory. And avoids unpleasant issues such as Bertrand’s paradox. While random variables are defined in the standard sense of real valued measurable functions, I did not see a definition of a continuous random variable or of the Lebesgue measure. And there are only a few lines (p.48) about the notion of expectation, which is so central to measure-theoretic probability as to provide a way of entry into measure theory! Progressing further, the σ-algebra induced by a random variable is defined as a partition (p.52), a particularly obscure notion for continuous rv’s. When the conditional density of one random variable given the realisation of another is finally introduced (p.63), as an expectation reconciling with the set-wise definition of conditional probabilities, it is in a fairly convoluted way that I fear will scare newcomers out of their wits. Since it relies on a sequence of nested sets with positive measure, implying an underlying topology and the like, which somewhat shows the impossibility of the overall task…

“In the Bayesian analysis, the likelihood provides meaning to the posterior.”

Statistics is hurriedly introduced in a short section at the end of Chapter 4, assuming the notion of likelihood is already known by the readers. But nitpicking (p.65) at the representation of the terms in the log-likelihood as depending on an unspecified parameter value θ [not to be confused with the data-generating value of θ, which does not appear clearly in this section]. Section that manages to include arcane remarks distinguishing maximum likelihood estimation from Bayesian analysis, all this within a page! (Nowhere is the Bayesian perspective clearly defined.)

“We should no more perform an analysis clustered by state than we would cluster by age, income, or other random variable.”

The last part of the book is about probabilistic models, drawing a distinction between data generating process models and data models (p.89), by which the author means the hypothesised probabilistic model versus the empirical or bootstrap distribution. An interesting way to relate to the main thread, except that the convergence of the data distribution to the data generating process model cannot be established at this level. And hence that the very nature of bootstrap may be lost on the reader. A second and final chapter covers some common or vexing problems and the author’s approach to them. Revolving around standard errors, fixed and random effects. The distinction between standard deviation (“a mathematical property of a probability distribution”) and standard error (“representation of variation due to a data generating process”) that is followed for several pages seems to boil down to a possible (and likely) model mis-specification. The chapter also contains an extensive discussion of notations, like indexes (or indicators), which seems a strange focus esp. at this location in the book. Over 15 pages! (Furthermore, I find quite confusing that a set of indices is denoted there by the double barred I, usually employed for the indicator function.)

“…the reader will probably observe the conspicuous absence of a time-honoured topic in calculus courses, the “Riemann integral”… Only the stubborn conservatism of academic tradition could freeze it into a regular part of the curriculum, long after it had outlived its historical importance.” Jean Dieudonné, Foundations of Modern Analysis

In conclusion, I do not see the point of this book, from its insistence on measure theory that never concretises for lack of mathematical material to an absence of convincing examples as to why this is useful for the applied researcher, to the intended audience which is expected to already know quite a lot about probability and statistics, to a final meandering around linear models that seems at odds with the remainder of What makes variables random, without providing an answer to this question. Or to the more relevant one of why Lebesgue integration is preferable to Riemann integration. (Not that there do not exist convincing replies to this question!)

failures and uses of Jaynes’ principle of transformation groups

Posted in Books, Kids, R, Statistics, University life on April 14, 2015 by xi'an

This paper by Alon Drory was arXived last week when I was at Columbia. It reassesses Jaynes’ resolution of Bertrand’s paradox, which finds three different probabilities for a given geometric event depending on the underlying σ-algebra (or definition of randomness!). Both Poincaré and Jaynes argued against Bertrand that there was only one acceptable solution under symmetry properties. The author of this paper, Alon Drory, argues this is not the case!

“…contrary to Jaynes’ assertion, each of the classical three solutions of Bertrand’s problem (and additional ones as well!) can be derived by the principle of transformation groups, using the exact same symmetries, namely rotational, scaling and translational invariance.”

Drory rephrases as follows: “In a circle, select at random a chord that is not a diameter. What is the probability that its length is greater than the side of the equilateral triangle inscribed in the circle?”. Jaynes’ solution is indifferent to the orientation of one observer wrt the circle, to the radius of the circle, and to the location of the centre. The latter is the one most discussed by Drory, as he argues that it does not involve an observer but the random experiment itself and relies on a specific version of straw throws in Jaynes’ argument. Meaning other versions are also available. This reminded me of an earlier post on Buffon’s needle and on the different versions of the needle being thrown over the floor. Therein reflecting on the connection with Bertrand’s paradox. And running some further R experiments. Drory’s alternative to Jaynes’ manner of throwing straws is to impale them on darts and throw the darts first! (Which is the same as one of my needle solutions.)

“…the principle of transformation groups does not make the problem well-posed, and well-posing strategies that rely on such symmetry considerations ought therefore to be rejected.”

In short, the conclusion of the paper is that there is an indeterminacy in Bertrand’s problem that allows several resolutions under the principle of indifference that end up with a large range of probabilities, thus siding with Bertrand rather than Jaynes.
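For reference, the three classical solutions of Bertrand’s problem can be reproduced by simulation on the unit circle. This is a minimal sketch of the textbook constructions (random endpoints, random distance to the centre, random midpoint), not code from Drory’s paper; the seed and sample size are arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 1e5
side <- sqrt(3)                      # side of the inscribed equilateral triangle

# (a) both endpoints uniform on the circle
ang <- runif(n, 0, 2 * pi) - runif(n, 0, 2 * pi)
len_a <- 2 * abs(sin(ang / 2))

# (b) distance from the centre to the chord uniform on (0,1)
d <- runif(n)
len_b <- 2 * sqrt(1 - d^2)

# (c) midpoint of the chord uniform over the disc
r <- sqrt(runif(n))                  # radius of a uniform point in the unit disc
len_c <- 2 * sqrt(1 - r^2)

probs <- c(endpoints = mean(len_a > side),
           distance  = mean(len_b > side),
           midpoint  = mean(len_c > side))
round(probs, 2)                      # close to 1/3, 1/2 and 1/4
```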

Buffon versus Bertrand in R

Posted in R, Statistics on April 8, 2011 by xi'an

Following my earlier post on Buffon’s needle and Bertrand’s paradox, above are four outcomes corresponding to four different generations (among many) of the needle locations. The upper right-hand side makes a difference in the number of hits (out of 1000). The R code corresponding to this generation was made in the métro, so do not expect subtlety.

When Buffon meets Bertrand

Posted in R, Statistics, Travel on April 7, 2011 by xi'an

When Peter Diggle gave his “short history” of spatial statistics this morning (I typed this in the taxi from Charles de Gaulle airport, after waiting one hour for my bag!), he started with a nice slide about Buffon’s needle (and Buffon’s portrait), since Julian Besag was often prone to give this problem as a final exam to Durham students (one of whom is responsible for the candidate’s formula). This started me thinking about how this was open to a Bertrand’s paradox of its own. Indeed, randomness for the needle throw can be represented in many ways:

  • needle centre uniformly distributed over the room (or the perpendicular to the boards) with a random orientation (with a provision to have the needle fit);
  • needle endpoint uniformly distributed over the room (again a uniform over the perpendicular is enough) with a random orientation (again with a constraint);
  • random orientation from one corner of the room and a uniform location of the centre on the resulting line (with constraints on both ends for the needle to fit);
  • random orientation from one corner of the room and a uniform location of one endpoint on the resulting line, plus a Bernoulli generation to decide on the orientation (with constraints on both ends for the needle to fit);
  • &tc.

I did not have time to implement those different generation mechanisms in R, but have little doubt they should lead to different probabilities of intersection between the needle and one of the board separations. I actually found a web-page at the University of Alabama Huntsville addressing this problem through exercises (plus 20,000 related entries! Including von Mises’ Probability, Statistics and Truth itself. A book I should read one of those days, following Andrew.). Note that each version corresponds to a physical mechanism. Thus that there is no way to distinguish between them. Had I time, I would also like to consider the limiting case when the room gets infinite as, presumably, some of those proposals would end up being identical.
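As a sketch of the limiting case only, assuming an infinite floor (hence no constraint for the needle to fit), a needle length equal to the board width, and an arbitrary seed, the first two mechanisms of the list can be simulated as follows. The variable names are mine.

```r
set.seed(1)                            # arbitrary seed, for reproducibility only
n <- 1e5
width <- 1                             # board width (assumed)
len <- 1                               # needle length (assumed, at most width)

# mechanism 1: centre uniform across a board, uniform orientation
centre <- runif(n, 0, width)
phi <- runif(n, 0, pi)                 # angle between the needle and the board lines
half <- (len / 2) * sin(phi)           # half-extent across the boards
p_centre <- mean((centre - half < 0) | (centre + half > width))

# mechanism 2: one endpoint uniform across a board, uniform direction
end1 <- runif(n, 0, width)
psi <- runif(n, 0, 2 * pi)             # direction towards the other endpoint
end2 <- end1 + len * sin(psi)
p_endpoint <- mean((end2 < 0) | (end2 > width))

c(p_centre, p_endpoint, 2 * len / (pi * width))
```

Both estimates agree with the classical 2l/(πd) in this unconstrained setting, so any difference between the two mechanisms would have to come from the constraints imposed by a finite room.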

Bertrand’s paradox [R details]

Posted in Books, R, Statistics on March 20, 2011 by xi'an

Some may have had reservations about the “randomness” of the straws I plotted to illustrate Bertrand’s paradox. As they were all going North-West/South-East. I had actually made an inversion between cbind and rbind in the R code, which explained this non-random orientation. Above is the corrected version, which sounds “more random” indeed. (And using wheat as the proper, if weak, colour!) The outcome of a probability of 1/2 has not changed, of course. Here is the R code as well:


lacorde=rep(0,10^3)
plot(0,0,type="n",xlim=c(-2,2),ylim=c(-2,2))

for (t in 1:10^3){

 #distance from O to chord
 dchord=10

 while (dchord>1){
 #Generate "random" straw in large box till it crosses unit circle

 a=runif(2,-10,10)
 b=runif(2,-10,10)

 #endpoints outside the circle
 if ((sum(a^2)>1)&&(sum(b^2)>1)){

 #angles between the straw direction and the position vector at each endpoint
 theta=abs(acos(t(b-a)%*%a/sqrt(sum((b-a)^2)*sum(a^2))))
 theta=theta%%pi
 thetb=abs(acos(t(a-b)%*%b/sqrt(sum((b-a)^2)*sum(b^2))))
 thetb=thetb%%pi

 #chord inside: the foot of the perpendicular from O falls between a and b,
 #which requires both angles to be obtuse
 if (min(abs(theta),abs(thetb))>pi/2)
 dchord=abs(sin(theta))*sqrt(sum(a^2))
 }
 }

 #length of a chord at distance dchord from O in the unit circle
 lacorde[t]=2*sqrt(1-dchord^2)
 if (runif(1)<.1) lines(rbind(a,b),col="wheat")
 }

lecercle=cbind(sin(seq(0,2*pi,le=100)),cos(seq(0,2*pi,le=100)))
lines(lecercle,col="sienna")

As a more relevant final remark, I came to the conclusion (this morning while running) that the probability of this event can be anything between 0 and 1, rather than the three traditional 1/4, 1/3 and 1/2. Indeed, for any distribution of the “random” straws, hence for any distribution on the chord length L, a random draw can be expressed as L=F⁻¹(U), where U is uniform. Therefore, this draw is also an acceptable transform of a uniform draw, just like Bertrand’s three solutions.
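A quick way to see this in R is to pick a family of transforms of a uniform draw and watch the probability move. The family L=2U^a below is purely hypothetical, chosen because its probability of exceeding √3 is available in closed form, namely 1−(√3/2)^(1/a).

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only
u <- runif(1e5)                              # the uniform draws U
# hypothetical chord-length distributions on (0,2): L = 2 * U^a
chord_prob <- function(a) mean(2 * u^a > sqrt(3))
a_values <- c(0.01, 0.2, 1, 5, 100)
rbind(simulated = sapply(a_values, chord_prob),
      exact     = 1 - (sqrt(3) / 2)^(1 / a_values))
```

Letting a range over the positive reals moves the probability over the whole of (0,1).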