Archive for admissibility

principles of uncertainty (second edition)

Posted in Books, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , , , on July 21, 2020 by xi'an

A new edition of Principles of Uncertainty is about to appear. I was asked by CRC Press to review the new book and here are some (raw) extracts from my review. (Some comments may not apply to the final and published version, mind.)

In Chapter 6, the proof of the Central Limit Theorem utilises the “smudge” technique, which is to add an independent noise to both the sequence of rvs and its limit. This is most effective and reminds me of quite a similar proof Jacques Neveu used in its probability notes in Polytechnique. Which went under the more formal denomination of convolution, with the same (commendable) purpose of avoiding Fourier transforms. If anything, I would have favoured a slightly more condensed presentation in less than 8 pages. Is Corollary 6.5.8 useful or even correct??? I do not think so because the non-centred average rescaled by √n diverges almost surely. For the same reason, I object to the very first sentence of Section 6.5 (p.246)

In Chapter 7, I found a nice mention of (Hermann) Rubin’s insistence on not separating probability and utility as only the product matters. And another fascinating quote from Keynes, not from his early statistician’s years, but in 1937 as an established economist

“The sense in which I am using the term uncertain is that in which the prospect of a European war is uncertain, or the price of copper and the rate of interest twenty years hence, or the obsolescence of a new invention, or the position of private wealth-owners in the social system in 1970. About these matters there is no scientific basis on which to form any calculable probability whatever. We simply do not know. Nevertheless, the necessity for action and for decision compels us as practical men to do our best to overlook this awkward fact and to behave exactly as we should if we had behind us a good Benthamite calculation of a series of prospective advantages and disadvantages, each multiplied by its appropriate probability, waiting to the summed.”

(is the last sentence correct? I would have expected, pardon my French!, “to be summed”). Further interesting trivia on the criticisms of utility theory, including de Finetti’s role and his own lack of connection with subjective probability principles.

In Chapter 8, a major remark (iii) is found p.293 about the fact that a conjugate family requires a dominating measure (although this is expressed differently since the book shies away from introducing measure theory, ) reminds me of a conversation I had with Jay when I visited Carnegie Mellon in 2013 (?). Which exposes the futility of seeing conjugate priors as default priors. It is somewhat surprising that a notion like admissibility appears as a side quantity when discussing Stein’s paradox in 8.2.1 [and then later in Section 9.1.3] while it seems to me to be central to Bayesian decision theory, much more than the epiphenomenon that Stein’s paradox represents in the big picture. But the book dismisses minimaxity even faster in Section 9.1.4:

As many who suffer from paranoia have discovered, one can always dream-up an even worse possibility to guard against. Thus, the minimax framework is unstable. (p.336)

Interesting introduction of the Wishart distribution to kindly handle random matrices and matrix Jacobians, with the original space being the p(p+1)/2 real space (implicitly endowed with the Lebesgue measure). Rather than a more structured matricial space. A font error makes Corollary 8.7.2 abort abruptly. The space of positive definite matrices is mentioned in Section8.7.5 but still (implicitly) corresponds to the common p(p+1)/2 real Euclidean space. Another typo in Theorem 8.9.2 with a Frenchised version of Dirichlet, Dirichelet. Followed by a Dirchlet at the end of the proof (p.322). Again and again on p.324 and on following pages. I would object to the singular in the title of Section 8.10 as there are exponential families rather than a single one. With no mention made of Pitman-Koopman lemma and its consequences, namely that the existence of conjugacy remains an epiphenomenon. Hence making the amount of pages dedicated to gamma, Dirichlet and Wishart distributions somewhat excessive.

In Chapter 9, I noticed (p.334) a Scheffe that should be Scheffé (and again at least on p.444). (I love it that Jay also uses my favorite admissible (non-)estimator, namely the constant value estimator with value 3.) I wonder at the worth of a ten line section like 9.3, when there are delicate issues in handling meta-analysis, even in a Bayesian mood (or mode). In the model uncertainty section, Jay discuss the (im)pertinence of both selecting one of the models and setting independent priors on their respective parameters, with which I disagree on both levels. Although this is followed by a more reasonable (!) perspective on utility. Nice to see a section on causation, although I would have welcomed an insert on the recent and somewhat outrageous stand of Pearl (and MacKenzie) on statisticians missing the point on causation and counterfactuals by miles. Nonparametric Bayes is a new section, inspired from Ghahramani (2005). But while it mentions Gaussian and Dirichlet [invariably misspelled!] processes, I fear it comes short from enticing the reader to truly grasp the meaning of a prior on functions. Besides mentioning it exists, I am unsure of the utility of this section. This is one of the rare instances where measure theory is discussed, only to state this is beyond the scope of the book (p.349).

prior against truth!

Posted in Books, Kids, Statistics with tags , , , , , , , on June 4, 2018 by xi'an

A question from X validated had interesting ramifications, about what happens when the prior does not cover the true value of the parameter (assuming there ? In fact, not so much in that, from a decision theoretic perspective, the fact that that π(θ⁰)=0, or even that π(θ)=0 in a neighbourhood of θ⁰ does not matter [too much]. Indeed, the formal derivation of a Bayes estimator as minimising the posterior loss means that the resulting estimator may take values that were “impossible” from a prior perspective! Indeed, taking for example the posterior mean, the convex combination of all possible values of θ under π may well escape the support of π when this support is not convex. Of course, one could argue that estimators should further be restricted to be possible values of θ under π but that would reduce their decision theoretic efficiency.

An example is the brilliant minimaxity result by George Casella and Bill Strawderman from 1981: when estimating a Normal mean μ based on a single observation xwith the additional constraint that |μ|<ρ, and when ρ is small enough, ρ1.0567 quite specifically, the minimax estimator for this problem under squared error loss corresponds to a (least favourable) uniform prior on the pair {ρ,ρ}, meaning that π gives equal weight to ρ and ρ (and none to any other value of the mean μ). When ρ increases above this bound, the least favourable prior sees its support growing one point at a time, but remaining a finite set of possible values. However the posterior expectation, 𝔼[μ|x], can take any value on (ρ,ρ).

In an even broader suspension of belief (in the prior), it may be that the prior has such a restricted support that it cannot consistently estimate the (true value of the) parameter, but the associated estimator may remain admissible or minimax.

admissible estimators that are not Bayes

Posted in Statistics with tags , , , , , , on December 30, 2017 by xi'an

A question that popped up on X validated made me search a little while for point estimators that are both admissible (under a certain loss function) and not generalised Bayes (under the same loss function), before asking Larry Brown, Jim Berger, or Ed George. The answer came through Larry’s book on exponential families, with the two examples attached. (Following our 1989 collaboration with Roger Farrell at Cornell U, I knew about the existence of testing procedures that were both admissible and not Bayes.) The most surprising feature is that the associated loss function is strictly convex as I would have thought that a less convex loss would have helped to find such counter-examples.

Charles M. Stein [1920-2016]

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , on November 26, 2016 by xi'an

I have just heard that Charles Stein, Professor at Stanford University, passed away last night. Although the following image is definitely over-used, I truly feel this is the departure of a giant of statistics.  He has been deeply influential on the fields of probability and mathematical statistics, primarily in decision theory and approximation techniques. On the first field, he led to considerable changes in the perception of optimality by exhibiting the Stein phenomenon, where the aggregation of several admissible estimators of unrelated quantities may (and will) become inadmissible for the joint estimation of those quantities! Although the result can be explained by mathematical and statistical reasoning, it was still dubbed a paradox due to its counter-intuitive nature. More foundationally, it led to expose the ill-posed nature of frequentist optimality criteria and certainly contributed to the Bayesian renewal of the 1980’s, before the MCMC revolution. (It definitely contributed to my own move, as I started working on the Stein phenomenon during my thesis, before realising the fundamentally Bayesian nature of the domination results.)

“…the Bayesian point of view is often accompanied by an insistence that people ought to agree to a certain doctrine even without really knowing what this doctrine is.” (Statistical Science, 1986)

The second major contribution of Charles Stein was the introduction of a new technique for normal approximation that is now called the Stein method. It relies on a differential operator and produces estimates of approximation error in Central Limit theorems, even in dependent settings. While I am much less familiar with this aspect of Charles Stein’s work, I believe the impact it has had on the field is much more profound and durable than the Stein effect in Normal mean estimation.

(During the Vietnam War, he was quite active in the anti-war movement and the above picture from 2003 shows that his opinions had not shifted over time!) A giant truly has gone.

Bureau international des poids et mesures [bayésiennes?]

Posted in pictures, Statistics, Travel with tags , , , , , , , , , , , , , on June 19, 2015 by xi'an

The workshop at the BIPM on measurement uncertainty was certainly most exciting, first by its location in the Parc de Saint Cloud in classical buildings overlooking the Seine river in a most bucolic manner…and second by its mostly Bayesian flavour. The recommendations that the workshop addressed are about revisions in the current GUM, which stands for the Guide to the Expression of Uncertainty in Measurement. The discussion centred on using a more Bayesian approach than in the earlier version, with the organisers of the workshop and leaders of the revision apparently most in favour of that move. “Knowledge-based pdfs” came into the discussion as an attractive notion since it rings a Bayesian bell, especially when associated with probability as a degree of belief and incorporating the notion of an a priori probability distribution. And propagation of errors. Or even more when mentioning the removal of frequentist validations. What I gathered from the talks is the perspective drifting away from central limit approximations to more realistic representations, calling for Monte Carlo computations. There is also a lot I did not get about conventions, codes and standards. Including a short debate about the different meanings on Monte Carlo, from simulation technique to calculation method (as for confidence intervals). And another discussion about replacing the old formula for estimating sd from the Normal to the Student’s t case. A change that remains highly debatable since the Student’s t assumption is as shaky as the Normal one. What became clear [to me] during the meeting is that a rather heated debate is currently taking place about the need for a revision, with some members of the six (?) organisations involved arguing against Bayesian or linearisation tools.

This became even clearer during our frequentist versus Bayesian session with a first talk so outrageously anti-Bayesian it was hilarious! Among other things, the notion that “fixing” the data was against the principles of physics (the speaker was a physicist), that the only randomness in a Bayesian coin tossing was coming from the prior, that the likelihood function was a subjective construct, that the definition of the posterior density was a generalisation of Bayes’ theorem [generalisation found in… Bayes’ 1763 paper then!], that objective Bayes methods were inconsistent [because Jeffreys’ prior produces an inadmissible estimator of μ²!], that the move to Bayesian principles in GUM would cost the New Zealand economy 5 billion dollars [hopefully a frequentist estimate!], &tc., &tc. The second pro-frequentist speaker was by comparison much much more reasonable, although he insisted on showing Bayesian credible intervals do not achieve a nominal frequentist coverage, using a sort of fiducial argument distinguishing x=X+ε from X=x+ε that I missed… A lack of achievement that is fine by my standards. Indeed, a frequentist confidence interval provides a coverage guarantee either for a fixed parameter (in which case the Bayesian approach achieves better coverage by constant updating) or a varying parameter (in which case the frequency of proper inclusion is of no real interest!). The first Bayesian speaker was Tony O’Hagan, who summarily shred the first talk to shreds. And also criticised GUM2 for using reference priors and maxent priors. I am afraid my talk was a bit too exploratory for the audience (since I got absolutely no question!) In retrospect, I should have given an into to reference priors.

An interesting specificity of a workshop on metrology and measurement is that they are hard stickers to schedule, starting and finishing right on time. When a talk finished early, we waited until the intended time to the next talk. Not even allowing for extra discussion. When the only overtime and Belgian speaker ran close to 10 minutes late, I was afraid he would (deservedly) get lynched! He escaped unscathed, but may (and should) not get invited again..!