Archive for Bayesian Choice

my book available for a mere $1,091.50

Posted in Books on May 1, 2016 by xi'an

As I was looking at a link to my Bayesian Choice book on Amazon, I found that one site offered it for the modest sum of $1,091.50, a very slight increase when compared with the reference price of $59.95… I do wonder at the reason (scam?) behind this offer, as such a large price is unlikely to attract any potential buyer to the site. (Obviously, if you are interested in this price, feel free to contact me!)

the philosophical importance of Stein’s paradox [a reply from the authors]

Posted in Books, pictures, Statistics, University life on January 15, 2016 by xi'an

[In the wake of my comment on this paper written by three philosophers of Science, I received this reply from Olav Vassend.]

Thank you for reading our paper and discussing it on your blog! Our purpose with the paper was to give an introduction to Stein’s phenomenon for a philosophical audience; it was not meant to — and probably will not — offer a new and interesting perspective for a statistician who is already very familiar with Stein’s phenomenon and its extensive literature.

I have a few more specific comments:

1. We don’t rechristen Stein’s phenomenon as “holistic pragmatism.” Rather, holistic pragmatism is the attitude to frequentist estimation that we think is underwritten by Stein’s phenomenon. Since MLE is sometimes admissible and sometimes not, depending on the number of parameters estimated, the researcher has to take into account his or her goals (whether total accuracy or individual-parameter accuracy is more important) when picking an estimator. To a statistician, this might sound obvious, but to philosophers it’s a pretty radical idea.

2. “The part connecting Stein with Bayes again starts on the wrong foot, since it is untrue that any shrinkage estimator can be expressed as a Bayes posterior mean. This is not even true for the original James-Stein estimator, i.e., it is not a Bayes estimator and cannot be a Bayes posterior mean.”

That seems to depend on what you mean by a “Bayes estimator.” It is possible to have an empirical Bayes prior (constructed from the sample) whose posterior mean is identical to the original James-Stein estimator. But if you don’t count empirical Bayes priors as Bayesian, then you are right.

3. “And to state that improper priors “integrate to a number larger than 1” and that “it’s not possible to be more than 100% confident in anything”… And to confuse the Likelihood Principle with the prohibition of data dependent priors. And to consider that the MLE and any shrinkage estimator have the same expected utility under a flat prior (since, if they had, there would be no Bayes estimator!).”

I’m not sure I completely understand your criticisms here. First, as for the relation between the LP and data-dependent priors — it does seem to me that the LP precludes the use of data-dependent priors.  If you use data from an experiment to construct your prior, then — contrary to the LP — it will not be true that all the information provided by the experiment regarding which parameter is true is contained in the likelihood function, since some of the information provided by the experiment will also be in your prior.

Second, as to our claim that the ML estimator has the same expected utility (under the flat prior) as a shrinkage estimator that dominates it: we incorporated this claim into our paper because it was an objection made by a statistician who read and commented on our paper. Are you saying the claim is false? If so, we would certainly like to know so that we can revise the paper to make it more accurate.

4. I was aware of Rubin’s idea that priors and utility functions (supposedly) are non-separable, but I didn’t (and don’t) quite see the relevance of that idea to Stein estimation.

5. “Similarly, very little of substance can be found about empirical Bayes estimation and its philosophical foundations.”

What we say about empirical Bayes priors is that they cannot be interpreted as degrees of belief; they are just tools. It will be surprising to many philosophers that priors are sometimes used in such an instrumentalist fashion in statistics.

6. The reason why we made a comparison between Stein estimation and AIC was two-fold: (a) for sociological reasons, philosophers are much more familiar with model selection than they are with, say, the LASSO or other regularized regression methods. (b) To us, it’s precisely because model selection and estimation are such different enterprises that it’s interesting that they have such a deep connection: despite being very different, AIC and shrinkage both rely on a bias-variance trade-off.

7. “I also object to the envisioned possibility of a shrinkage estimator that would improve every component of the MLE (in a uniform sense) as it contradicts the admissibility of the single component MLE!”

I don’t think our suggestion here contradicts the admissibility of single component MLE. The idea is just that if we have data D and D’ about parameters φ and φ’, then the estimates of both φ and φ’ can sometimes be improved if the estimation problems are lumped together and a shrinkage estimator is used. This doesn’t contradict the admissibility of MLE, because MLE is still admissible on each of the data sets for each of the parameters.

Again, thanks for reading the paper and for the feedback—we really do want to make sure our paper is accurate, so your feedback is much appreciated. Lastly, I apologize for the length of this comment.

Olav Vassend
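
On point 2 of the reply, the empirical Bayes reading of the James-Stein estimator can be sketched as follows. This is the standard textbook argument rather than anything taken from the paper under discussion, and the notation (x, θ, τ², p) is an assumption made for illustration: a single observation x in dimension p ≥ 3 with identity covariance, and a centred Normal prior whose variance τ² is estimated from the data.

```latex
% Assumed model (illustration only): x | \theta \sim N_p(\theta, I_p), \theta \sim N_p(0, \tau^2 I_p)
\begin{align*}
\mathbb{E}[\theta \mid x] &= \Bigl(1 - \frac{1}{1+\tau^2}\Bigr)\, x
  && \text{(posterior mean for fixed } \tau^2\text{)} \\
x &\sim \mathcal{N}_p\bigl(0, (1+\tau^2) I_p\bigr)
  && \text{(marginal distribution of } x\text{)} \\
\mathbb{E}\Bigl[\frac{p-2}{\|x\|^2}\Bigr] &= \frac{1}{1+\tau^2}
  && \text{(since } \|x\|^2/(1+\tau^2) \sim \chi^2_p \text{ and } \mathbb{E}[1/\chi^2_p] = 1/(p-2)\text{)} \\
\delta^{JS}(x) &= \Bigl(1 - \frac{p-2}{\|x\|^2}\Bigr)\, x
  && \text{(plug-in of the unbiased estimate of } 1/(1+\tau^2)\text{)}
\end{align*}
```

The last line is not the posterior mean under any fixed prior, since the shrinkage factor depends on x; whether that counts as "Bayes" is exactly the point of disagreement in item 2.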

approximating evidence with missing data

Posted in Books, pictures, Statistics, University life on December 23, 2015 by xi'an

[Picture: University of Warwick, May 31 2010]

Panayiota Touloupou (Warwick), Naif Alzahrani, Peter Neal, Simon Spencer (Warwick) and Trevelyan McKinley arXived a paper yesterday on Model comparison with missing data using MCMC and importance sampling, where they proposed an importance sampling strategy based on an early MCMC run to approximate the marginal likelihood, a.k.a. the evidence. Another instance of estimating a constant. It is thus similar to our Frontier paper with Jean-Michel, as well as to the recent Pima Indian survey of James and Nicolas. The authors give the difficulty of calibrating reversible jump MCMC as the starting point of their research. The importance sampler they use is the natural choice of a Gaussian or t distribution centred at some estimate of θ, with a covariance matrix associated with Fisher’s information or derived from the warmup MCMC run.

The comparison between the different approximations to the evidence is done first over longitudinal epidemiological models, involving 11 parameters in the example processed therein. The competitors to the 9 versions of importance samplers investigated in the paper are the raw harmonic mean [rather than our HPD truncated version], Chib’s, path sampling and RJMCMC [which does not make much sense when comparing two models]. But neither bridge sampling nor nested sampling. Without any surprise (!), harmonic means do not converge to the right value, but more surprisingly Chib’s method happens to be less accurate than most importance solutions studied therein. It may be due to the fact that Chib’s approximation requires three MCMC runs and hence is quite costly. The fact that the mixture (or defensive) importance sampling [with 5% weight on the prior] did best begs for a comparison with bridge sampling, no? The difficulty with such a study is obviously that the results only apply in the setting of the simulation, hence that, e.g., another mixture importance sampler or Chib’s solution would behave differently in another model. In particular, it is hard to judge the impact of the dimensions of the parameter and of the missing data.
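
To make the defensive (mixture) importance sampling idea concrete, here is a minimal Python sketch on a toy problem. It is not the authors' implementation and involves no missing data: the model (Normal observations with unit variance and a Normal(0, τ²) prior on the mean) is an assumption chosen because its evidence is available in closed form, and all function names are hypothetical. The proposal mixes a Gaussian centred at the MLE, with variance given by the inverse Fisher information, and the prior with 5% weight.

```python
import math
import random


def log_normal_pdf(x, mean, var):
    """Log-density of a univariate Normal(mean, var) at x."""
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - mean) ** 2 / var


def log_likelihood(theta, ys):
    """Toy model (assumption): y_i ~ Normal(theta, 1), independently."""
    return sum(log_normal_pdf(y, theta, 1.0) for y in ys)


def exact_log_evidence(ys, tau2):
    """Closed-form log marginal likelihood under the prior theta ~ Normal(0, tau2)."""
    n = len(ys)
    s = sum(ys)
    ss = sum(y * y for y in ys)
    return (-0.5 * n * math.log(2 * math.pi)
            - 0.5 * math.log(1 + n * tau2)
            - 0.5 * (ss - tau2 * s * s / (1 + n * tau2)))


def defensive_is_log_evidence(ys, tau2, n_draws, prior_weight, rng):
    """Importance sampling estimate of the log evidence with a mixture proposal:
    prior_weight * prior + (1 - prior_weight) * Normal(MLE, 1 / Fisher information)."""
    n = len(ys)
    centre = sum(ys) / n   # MLE of theta in this toy model
    var = 1.0 / n          # inverse Fisher information
    log_terms = []
    for _ in range(n_draws):
        # Draw from the mixture: first pick a component, then simulate from it.
        if rng.random() < prior_weight:
            theta = rng.gauss(0.0, math.sqrt(tau2))
        else:
            theta = rng.gauss(centre, math.sqrt(var))
        # Density of the full mixture at theta (not of the selected component).
        q = (prior_weight * math.exp(log_normal_pdf(theta, 0.0, tau2))
             + (1 - prior_weight) * math.exp(log_normal_pdf(theta, centre, var)))
        log_terms.append(log_likelihood(theta, ys)
                         + log_normal_pdf(theta, 0.0, tau2)
                         - math.log(q))
    # Average the importance weights on the log scale (log-sum-exp trick).
    top = max(log_terms)
    return top + math.log(sum(math.exp(t - top) for t in log_terms) / n_draws)


if __name__ == "__main__":
    rng = random.Random(2015)
    data = [rng.gauss(1.0, 1.0) for _ in range(20)]
    print("exact    :", exact_log_evidence(data, 4.0))
    print("estimate :", defensive_is_log_evidence(data, 4.0, 5000, 0.05, rng))
```

The prior component keeps the importance weights bounded, which is the usual argument for the defensive mixture; how well this carries over to models with many parameters and missing data is precisely what a toy example cannot tell.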

the philosophical importance of Stein’s paradox

Posted in Books, pictures, Statistics, University life on November 30, 2015 by xi'an

I recently came across this paper written by three philosophers of Science, attempting to set the Stein paradox in a philosophical light. Given my past involvement, I was obviously interested in which new perspective could be proposed, close to sixty years after Stein (1956). A paper that we should actually celebrate next year! However, when reading the document, I did not find a significantly innovative approach to the phenomenon…

The paper does not start in the best possible light since it seems to justify the use of a sample mean through maximum likelihood estimation, which is only the case for a limited number of probability distributions (including the Normal distribution, which may be an implicit assumption). For instance, when the data is Student’s t, the MLE is not the sample mean, no matter how shocking that might sound! (And while this is a minor issue, results about the Stein effect taking place in non-normal settings appear much earlier than 1998. And earlier than in my dissertation. See, e.g., Berger and Bock (1975). Or Brandwein and Strawderman (1978).)
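
The remark about Student's t data can be checked numerically. The following Python sketch is an illustration under stated assumptions (unit scale, known degrees of freedom, and a made-up data set with one outlier); the function names are hypothetical. It finds a root of the likelihood equation for the location parameter through the usual reweighting iteration and compares it with the sample mean.

```python
import statistics


def t_location_mle(xs, df, n_iter=500, tol=1e-12):
    """Location estimate for Student's t data with known df and unit scale
    (both assumptions), via iteratively reweighted averaging (an EM iteration).
    It converges to a stationary point of the likelihood, which may only be
    a local maximum since the t likelihood can be multimodal."""
    mu = statistics.median(xs)  # robust starting value
    for _ in range(n_iter):
        # Observations far from the current mu receive smaller weights.
        weights = [(df + 1.0) / (df + (x - mu) ** 2) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


def t_score(mu, xs, df):
    """Derivative of the t log-likelihood with respect to the location mu."""
    return sum((df + 1.0) * (x - mu) / (df + (x - mu) ** 2) for x in xs)


if __name__ == "__main__":
    sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.1, 9.0]  # made-up data with an outlier
    print("sample mean:", statistics.mean(sample))
    print("t(3) MLE   :", t_location_mle(sample, df=3.0))
```

The outlier drags the sample mean upwards while the likelihood-based estimate downweights it, so the two estimates differ.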

While the linear regression explanation for the Stein effect is already exposed in Steve Stigler’s Neyman Lecture, I still have difficulties with the argument in that for instance we do not know the value of the parameter, which makes the regression and the inverse regression of parameter means over Gaussian observations mere concepts and nothing practical. (Except for the interesting result that two observations make both regressions coincide.) And it does not seem at all intuitive (to me) that imposing a constraint should improve the efficiency of a maximisation program…
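
For readers who have not seen the Stein effect in action, here is a small Monte Carlo sketch in Python. It assumes the textbook setting (one Normal observation with identity covariance in dimension p ≥ 3, quadratic loss, shrinkage towards the origin); the true mean vector and the function names are arbitrary choices made for illustration.

```python
import random


def james_stein(x):
    """Original James-Stein estimator shrinking towards 0 (requires len(x) >= 3)."""
    p = len(x)
    norm2 = sum(v * v for v in x)
    shrink = 1.0 - (p - 2) / norm2
    return [shrink * v for v in x]


def squared_error(estimate, theta):
    """Total quadratic loss summed over all components."""
    return sum((e - t) ** 2 for e, t in zip(estimate, theta))


def simulate_risks(theta, n_rep, rng):
    """Monte Carlo approximation of the quadratic risks of the MLE (x itself)
    and of the James-Stein estimator, when x ~ Normal_p(theta, I_p)."""
    mle_total = 0.0
    js_total = 0.0
    for _ in range(n_rep):
        x = [rng.gauss(t, 1.0) for t in theta]
        mle_total += squared_error(x, theta)
        js_total += squared_error(james_stein(x), theta)
    return mle_total / n_rep, js_total / n_rep


if __name__ == "__main__":
    rng = random.Random(1956)
    mle_risk, js_risk = simulate_risks([1.0] * 10, 5000, rng)
    print("MLE risk        :", mle_risk)   # theoretical value is p = 10
    print("James-Stein risk:", js_risk)
```

The domination holds for the total loss only: nothing in this simulation says that each individual component is better estimated.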

books versus papers [for PhD students]

Posted in Books, Kids, Statistics, University life on July 7, 2012 by xi'an

Before I run out of time, here is my answer to the ISBA Bulletin Students’ corner question of the term: “In terms of publications and from your own experience, what are the pros and cons of books vs journal articles?”

While I started on my first book during my postdoctoral years in Purdue and Cornell [a basic probability book made out of class notes written with Arup Bose, which died against the breakers of some referees’ criticisms], my overall opinion on this is that books are never valued by hiring and promotion committees for what they are worth! It is a universal constant I met in the US, the UK and France alike that books are not helping much for promotion or hiring, at least at an early stage of one’s career. Later, books become a more acknowledged part of senior academics’ vitae. So, unless one has a PhD thesis that is ready to be turned into a readable book without having any impact on one’s publication list, and even if one has enough material and a broad enough message at one’s disposal, my advice is to go solely and persistently for journal articles. Besides the above-mentioned attitude of recruiting and promotion committees, I believe this has several positive aspects: it forces the young researcher to maintain his/her focus on specialised topics in which she/he can achieve rapid prominence, rather than having to spend [quality research] time on replacing the background and building reference. It provides an evaluation by peers of the quality of her/his work, while reviews of books are generally on the light side. It is the starting point for building a network of collaborations, as few people are interested in writing books with strangers (when knowing it is already quite a hardship with close friends!). It is also the entry to workshops and international conferences, where a new book very rarely attracts invitations.

Writing a book is of course exciting and somewhat more deeply rewarding, but it is awfully time-consuming and requires a high level of organization that young faculty members rarely possess when starting a teaching job at a new university (with possibly family changes as well!). I was quite lucky when writing The Bayesian Choice and Monte Carlo Statistical Methods to mostly be on leave from teaching, as it would otherwise have been impossible! That we are not making sufficient progress on our revision of Bayesian Core, started two years ago, is a good enough proof that even with tight planning, great ideas, enthusiasm, sale prospects, and available material, completing a book may get into trouble for mere organisational issues…

About capture-recapture

Posted in Books, Statistics on September 10, 2009 by xi'an

I really like the models derived from capture-recapture experiments, because they encompass latent variables, hidden Markov processes, Gibbs simulation, EM estimation, and hierarchical models in a simple setup with a nice side story to motivate it (at least in Ecology; in Social Sciences, those models are rather associated with sad stories like homeless people, heroin addicts or prostitutes…). I was thus quite surprised to hear from many that the capture-recapture chapter in Bayesian Core was hard to understand. In a sense, I find it easier than the mixture chapter because the data is discrete and everything can [almost!] be done by hand…

Today I received an email from Cristiano about a typo in The Bayesian Choice concerning capture-recapture models:

“I’ve read the paragraph (4.3.3) in your book and I have some doubts about the proposed formula in example 4.3.3. My guess is that a typo is here, where (n-n_1) instead of n_2 should appear in the hypergeometric distribution.”

It is indeed the case! This mistake has been surviving the many revisions and reprints of the book and is also found in the French translation Le Choix Bayésien, in Example 4.19… In both cases, {n_2 \choose n_2-n_{11}} should be {n-n_1 \choose n_2-n_{11}}, shame on me! (The mistake does not appear in Bayesian Core.)
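
A quick way to see that the second binomial coefficient must be the corrected one is to check that the probabilities sum to one. The Python sketch below assumes the standard hypergeometric form, with n the total population size, n_1 and n_2 the two sample sizes and n_{11} the number of individuals captured twice; only the second binomial coefficient comes from the text above, while the first factor, the denominator, the numerical values and the function names are assumptions made for illustration.

```python
from math import comb


def corrected_pmf(n11, n, n1, n2):
    """Hypergeometric probability of n11 recaptures, with the corrected
    coefficient C(n - n1, n2 - n11). The other factors are the assumed
    standard hypergeometric form."""
    return comb(n1, n11) * comb(n - n1, n2 - n11) / comb(n, n2)


def typo_pmf(n11, n, n1, n2):
    """Same expression with the misprinted coefficient C(n2, n2 - n11)."""
    return comb(n1, n11) * comb(n2, n2 - n11) / comb(n, n2)


def total_mass(pmf, n, n1, n2):
    """Sum of pmf over all feasible values of n11."""
    low = max(0, n1 + n2 - n)
    high = min(n1, n2)
    return sum(pmf(k, n, n1, n2) for k in range(low, high + 1))


if __name__ == "__main__":
    n, n1, n2 = 20, 8, 6  # arbitrary illustrative values
    print("corrected version sums to", total_mass(corrected_pmf, n, n1, n2))
    print("misprinted version sums to", total_mass(typo_pmf, n, n1, n2))
```

Only the corrected version is a probability distribution over n_{11}, by Vandermonde's identity.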

My reader also had a fairly interesting question about an extension of the usual model,

“That said, I would appreciate if you could help me in finding references to a slightly different setting, where the assumption is that while collecting the first or the second sample, an individual may appear twice. If we assume that a stopping rule is used: ‘n_1 or n_2 equal 5 and the captured individuals are different’ my guess is that the hypergeometric formulation is incomplete and may lead to overestimation of the population. Is there in your knowledge any already developed study you can point me about this different framework?”

to which I can only suggest to incorporate the error-in-variable structure, i.e., the possible confusion in identifying individuals, within the model and to run a Gibbs sampler that iteratively simulates the latent variables “true numbers of individuals in captures 1 and 2” and the parameters given those latent variables. This problem of counting the same individual twice or more has obvious applications in Ecology, when animals are only identified by watchers, as in whale sightings, and in Social Sciences, when individuals are lacking identification. [To answer specifically the overestimation question, this is clearly the case since n_1 and n_2 are larger than in truth, while n_{11} presumably remains the same…]
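
The overestimation argument can be illustrated with the simplest capture-recapture estimate, the Lincoln-Petersen ratio n_1 n_2 / n_{11}. This estimator is swapped in here for illustration only (it is not the Bayesian analysis discussed above), and the counts below are made up.

```python
def lincoln_petersen(n1, n2, n11):
    """Lincoln-Petersen estimate of the population size from two capture
    samples of sizes n1 and n2 sharing n11 recaptured individuals."""
    if n11 <= 0:
        raise ValueError("at least one recapture is needed")
    return n1 * n2 / n11


if __name__ == "__main__":
    # Hypothetical counts: true numbers of distinct individuals in each sample.
    print("with true counts    :", lincoln_petersen(50, 40, 10))
    # Same experiment, but some individuals were counted twice within a sample,
    # inflating n1 and n2 while n11 is assumed unchanged.
    print("with inflated counts:", lincoln_petersen(55, 44, 10))
```

Since the estimate increases with n_1 and n_2 for fixed n_{11}, any double counting within a sample pushes it upwards.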