Archive for Bayesian decision theory

Bayesian decision riddle

Posted in Books, Kids, Statistics with tags , , , , on June 15, 2017 by xi'an

The current puzzle on The Riddler is a version of the secretary problem with an interesting (?) Bayesian solution.

Given four positive numbers x¹, x², x³, x⁴, observed sequentially, the associated utility is the value of x at the stopping time. What is the optimal stopping rule?

While nothing is mentioned about the distribution of the x’s, I made the assumption that they were iid and uniformly distributed over (0,M), with M unknown and tried a Bayesian resolution with the non-informative prior π(M)=1/M. And failed. The reason for this failure is that the expected utility is infinite at the first step: while the posterior expected utility is finite with three and two observations, meaning I can compare stopping and continuing at the second and third steps, the predicted expected reward for continuing after observing x¹ does not exist because the expected value of max(x¹,x²) given x¹ does not exist. As the predictive density of x² is max(x¹,x²)⁻²…  Several alternatives are possible to bypass this impossible resolution, from changing the utility function to picking another reference prior.

For instance, using a prior like π(M)=1/M² l(and the same monetary return utility) leads to a proper optimal solution, namely

  1. always wait for the second observation x²
  2. stop at x² if x²>11x¹/12, else wait for x³
  3. stop at x³ if x³>23 max(x¹,x²)/24, else observe x⁴

obtained analytically on a bar table in Rouen (and checked numerically later).

Another approach is to try to optimise the probability to pick the largest amount of the four x’s, but this is not leading to an interesting solution, since it corresponds to picking the first maximum after x¹, while picking the largest among remaining ones leads to a somewhat convoluted solution I have no patience to produce here! Plus this is not a really pertinent loss function as it does not discriminate enough against waiting…

going to war [a riddle]

Posted in Books, Kids, Statistics with tags , , , , , on December 16, 2016 by xi'an

On the Riddler this week, a seemingly obvious riddle:

A game consists of Alice and Bob, each with a $1 bill, receiving a U(0,1) strength each, unknown to the other, and deciding or not to bet on this strength being larger than the opponent’s. If no player bets, they both keep their $1 bill. Else, the winner leaves with both bills. Find the optimal strategy.

As often when “optimality” is mentioned, the riddle is unclear because, when looking at the problem from a decision-theoretic perspective, the loss function of each player is not defined in the question. But the St. Petersburg paradox shows the type of loss clearly matters and the utility of money is anything but linear for large values, as explained by Daniel Bernoulli in 1738 (and later analysed by Laplace in his Essai Philosophique).  Let us assume therefore that both players live in circumstances when losing or winning $1 makes little difference, hence when the utility is linear. A loss function attached to the experiment for Alice [and a corresponding utility function for Bob] could then be a function of (a,b), the result of both Uniform draws, and of the decisions δ¹ and δ² of both players as being zero if δ¹=δ²=0 and

L(a,b,\delta^1,\delta^2)=\begin{cases}0&\text{if }\delta^1=\delta^2=0\\\mathbb{I}(a<b)-\mathbb{I}(a>b)&\text{else}\\\end{cases}

Considering this loss function, Alice aims at minimising the expected loss by her choice of δ¹, equal to zero or one, expected loss that hence depends  on the unknown and simultaneous decision of Bob. If for instance Alice assumes Bob takes the decision to compete when observing an outcome b larger than a certain bound α, her decision is based on the comparison of (when B is Uniform (0,1))


(if δ¹=0) and of 1-2a (if δ¹=1). Comparing both expected losses leads to Alice competing (δ¹=1) when a>α/2.

However, there is no reason Alice should know the value of α when playing the (single) game and so she may think that Bob will follow the same reasoning, leading him to choosing a new bound of α/4, and, by iterating the thought process, down all the way to α=0!  So this modelling leads to always play the game, with each player having a ½ probability to win… Alternatively, Alice may set a prior on α, which leads to another bound on a for playing or not the game. Which in itself is not satisfactory either. (The published solution is following the above argument. Except for posting the maths expressions.)

A new approach to Bayesian hypothesis testing

Posted in Books, Statistics with tags , , , , , on September 8, 2016 by xi'an

“The main purpose of this paper is to develop a new Bayesian hypothesis testing approach for the point null hypothesis testing (…) based on the Bayesian deviance and constructed in a decision theoretical framework. It can be regarded as the Bayesian version of the likelihood ratio test.”

This paper got published in Journal of Econometrics two years ago but I only read it a few days ago when Kerrie Mengersen pointed it out to me. Here is an interesting criticism of Bayes factors.

“In the meantime, unfortunately, Bayes factors also suffers from several theoretical and practical difficulties. First, when improper prior distributions are used, Bayes factors contains undefined constants and takes arbitrary values (…) Second, when a proper but vague prior distribution with a large spread is used to represent prior ignorance, Bayes factors tends to favour the null hypothesis. The problem may persist even when the sample size is large (…) Third, the calculation of Bayes factors generally requires the evaluation of marginal likelihoods. In many models, the marginal likelihoods may be difficult to compute.”

I completely agree with these points, which are part of a longer list in our testing by mixture estimation paper. The authors also rightly blame the rigidity of the 0-1 loss function behind the derivation of the Bayes factor. An alternative decision-theoretic based on the Kullback-Leibler distance has been proposed by José Bernardo and Raúl Rueda, in a 2002 paper, evaluating the average divergence between the null and the full under the full, with the slight drawback that any nuisance parameter has the same prior under both hypotheses. (Which makes me think of the Savage-Dickey paradox, since everything here seems to take place under the alternative.) And the larger drawback of requiring a lower bound for rejecting the null. (Although it could be calibrated under the null prior predictive.)

This paper suggests using instead the difference of the Bayesian deviances, which is the expected log ratio integrated against the posterior. (With the possible embarrassment of the quantity having no prior expectation since the ratio depends on the data. But after all the evidence or marginal likelihood faces the same “criticism”.) So it is a sort of Bayes factor on the logarithms, with a strong similarity with Bernardo & Rueda’s solution since they are equal in expectation under the marginal. As in Dawid et al.’s recent paper, the logarithm removes the issue with the normalising constant and with the Lindley-Jeffreys paradox. The approach then needs to be calibrated in order to define a decision bound about the null. The asymptotic distribution of the criterion is  χ²(p)−p, where p is the dimension of the parameter to be tested, but this sounds like falling back on frequentist tests. And the deadly .05% bounds. I would rather favour a calibration of the criterion using prior or posterior predictives under both models…

the art of war

Posted in Books, Kids, pictures with tags , , , on December 30, 2015 by xi'an

My son offered me this illustrated version of The Art of War for Xmas. (Illustrated with traditional Chinese and Japanese drawings.) And I found this citation within the fourth chapter:

Measurement owes its existence to Earth;
Estimation of quantity to Measurement;
Calculation to Estimation of quantity;
Balancing of chances to Calculation;
and Victory to Balancing of chances.

Which could somewhat be seen as an early version of Bayesian decision theory….

Bayesian propaganda?

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , , on April 20, 2015 by xi'an

“The question is about frequentist approach. Bayesian is admissable [sic] only by wrong definition as it starts with the assumption that the prior is the correct pre-information. James-Stein beats OLS without assumptions. If there is an admissable [sic] frequentist estimator then it will correspond to a true objective prior.”

I had a wee bit of a (minor, very minor!) communication problem on X validated, about a question on the existence of admissible estimators of the linear regression coefficient in multiple dimensions, under squared error loss. When I first replied that all Bayes estimators with finite risk were de facto admissible, I got the above reply, which clearly misses the point, and as I had edited the OP question to include more tags, the edited version was reverted with a comment about Bayesian propaganda! This is rather funny, if not hilarious, as (a) Bayes estimators are indeed admissible in the classical or frequentist sense—I actually fail to see a definition of admissibility in the Bayesian sense—and (b) the complete class theorems of Wald, Stein, and others (like Jack Kiefer, Larry Brown, and Jim Berger) come from the frequentist quest for best estimator(s). To make my point clearer, I also reproduced in my answer the Stein’s necessary and sufficient condition for admissibility from my book but it did not help, as the theorem was “too complex for [the OP] to understand”, which shows in fine the point of reading textbooks!

Dennis Lindley (1923-2013)

Posted in Books, Statistics, University life with tags , , , , , , on December 16, 2013 by xi'an

Dennis Lindley most sadly passed away yesterday at the hospital near his home in Somerset. He was one of the founding fathers of our field (of Bayesian statistics), who contributed to formalise Bayesian statistics in a coherent theory. And to make it one with rational decision-making, a perspective missing in Jeffreys’ vision. (His papers figured prominently in the tutorials we gave yesterday for the opening of O’Bayes 250.) At the age of 90, his interest in the topic had not waned away: as his interview with Tony O’Hagan last Spring showed, his passionate arguing for the rationale of the Bayesian approach was still there and alive! The review he wrote of The Black Swan a few years ago also demonstrated he had preserved his ability to see through bogus arguments. (See his scathing “One hardly advances the respect with which statisticians are held in society by making such declarations” in his ripping discussion of Aitkin’s 1991 Posterior Bayes factors.) He also started this interesting discussion last year about the five standard deviations “needed” for the Higgs boson…  My personal email contacts with Dennis over the re-reading of Jeffreys’ book  were a fantastic experience as he kindly contributed by expanding on how the book was received at the time and correcting some of my misunderstanding. It is a pity I can no longer send him the (soon to come?) final version of my Jeffreys-Lindley paradox paper as I intended to do. The email will no longer answer our queries… I figure there will be many testimonies and shared memories of his contributions and life at the Bayes-250 conference tomorrow. Farewell, Dennis, and I hope you now explore the paths of a more coherent world than ours!

Đôi nét về GS. Xi’an

Posted in Books, Travel, University life with tags , , , , on May 28, 2013 by xi'an

Here is a short bio of me written in Vietnamese in conjunction with the course I will give at CMS (Centre for Mathematical Sciences), Ho Chi Min City, next week:

Christian P. Robert là giáo sư tại Khoa Toán ứng dụng của ĐH Paris Dauphine từ năm 2000. GS Robert đã từng giảng dạy ở các ĐH Perdue, Cornell (Mỹ) và ĐH Canterbury (New-Zealand). Ông đã làm biên tập cho tạp chí Journal of the Royal Statistical Society Series B từ năm 2006 đến năm 2009 và là phó biên tập cho tạp chí Annals of Statistics. Năm 2008, ông làm Chủ tịch của Hiệp hội Thống kê Quốc tế về Thống kê Bayes (ISBA). Lĩnh vực nghiên cứu của GS Robert bao gồm Thống kê Bayes mà tập trung chính vào Lý thuyết quyết định (Decision theory) và Mô hình lựa chọn (Model selection), Lý thuyết về Xích Markov trong mô phỏng và Thống kê tính toán.