Archive for The Bayesian Choice

Bayesian propaganda?

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , , on April 20, 2015 by xi'an

“The question is about frequentist approach. Bayesian is admissable [sic] only by wrong definition as it starts with the assumption that the prior is the correct pre-information. James-Stein beats OLS without assumptions. If there is an admissable [sic] frequentist estimator then it will correspond to a true objective prior.”

I had a wee bit of a (minor, very minor!) communication problem on X validated, about a question on the existence of admissible estimators of the linear regression coefficient in multiple dimensions, under squared error loss. When I first replied that all Bayes estimators with finite risk were de facto admissible, I got the above reply, which clearly misses the point, and as I had edited the OP question to include more tags, the edited version was reverted with a comment about Bayesian propaganda! This is rather funny, if not hilarious, as (a) Bayes estimators are indeed admissible in the classical or frequentist sense—I actually fail to see a definition of admissibility in the Bayesian sense—and (b) the complete class theorems of Wald, Stein, and others (like Jack Kiefer, Larry Brown, and Jim Berger) come from the frequentist quest for best estimator(s). To make my point clearer, I also reproduced in my answer the Stein’s necessary and sufficient condition for admissibility from my book but it did not help, as the theorem was “too complex for [the OP] to understand”, which shows in fine the point of reading textbooks!

comparison of Bayesian predictive methods for model selection

Posted in Books, Statistics, University life with tags , , , , , , , , , on April 9, 2015 by xi'an

“Dupuis and Robert (2003) proposed choosing the simplest model with enough explanatory power, for example 90%, but did not discuss the effect of this threshold for the predictive performance of the selected models. We note that, in general, the relative explanatory power is an unreliable indicator of the predictive performance of the submodel,”

Juho Piironen and Aki Vehtari arXived a survey on Bayesian model selection methods that is a sequel to the extensive survey of Vehtari and Ojanen (2012). Because most of the methods described in this survey stem from Kullback-Leibler proximity calculations, it includes some description of our posterior projection method with Costas Goutis and Jérôme Dupuis. We indeed did not consider prediction in our papers and even failed to include consistency result, as I was pointed out by my discussant in a model choice meeting in Cagliari, in … 1999! Still, I remain fond of the notion of defining a prior on the embedding model and of deducing priors on the parameters of the submodels by Kullback-Leibler projections. It obviously relies on the notion that the embedding model is “true” and that the submodels are only approximations. In the simulation experiments included in this survey, the projection method “performs best in terms of the predictive ability” (p.15) and “is much less vulnerable to the selection induced bias” (p.16).

Reading the other parts of the survey, I also came to the perspective that model averaging makes much more sense than model choice in predictive terms. Sounds obvious stated that way but it took me a while to come to this conclusion. Now, with our mixture representation, model averaging also comes as a natural consequence of the modelling, a point presumably not stressed enough in the current version of the paper. On the other hand, the MAP model now strikes me as artificial and linked to a very rudimentary loss function. A loss that does not account for the final purpose(s) of the model. And does not connect to the “all models are wrong” theorem.

a week in Oxford

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on January 26, 2015 by xi'an

1sprI spent [most of] the past week in Oxford in connection with our joint OxWaSP PhD program, which is supported by the EPSRC, and constitutes a joint Centre of Doctoral Training in  statistical science focussing on data-­intensive environments and large-­scale models. The first cohort of a dozen PhD students had started their training last Fall with the first year spent in Oxford, before splitting between Oxford and Warwick to write their thesis.  Courses are taught over a two week block, with a two day introduction to the theme (Bayesian Statistics in my case), followed by reading, meetings, daily research talks, mini-projects, and a final day in Warwick including presentations of the mini-projects and a concluding seminar.  (involving Jonty Rougier and Robin Ryder, next Friday). This approach by bursts of training periods is quite ambitious in that it requires a lot from the students, both through the lectures and in personal investment, and reminds me somewhat of a similar approach at École Polytechnique where courses are given over fairly short periods. But it is also profitable for highly motivated and selected students in that total immersion into one topic and a large amount of collective work bring them up to speed with a reasonable basis and the option to write their thesis on that topic. Hopefully, I will see some of those students next year in Warwick working on some Bayesian analysis problem!

On a personal basis, I also enjoyed very much my time in Oxford, first for meeting with old friends, albeit too briefly, and second for cycling, as the owner of the great Airbnb place I rented kindly let me use her bike to go around, which allowed me to go around quite freely! Even on a train trip to Reading. As it was a road racing bike, it took me a trip or two to get used to it, especially on the first day when the roads were somewhat icy, but I enjoyed the lightness of it, relative to my lost mountain bike, to the point of considering switching to a road bike for my next bike… I had also some apprehensions with driving at night, which I avoid while in Paris, but got over them until the very last night when I had a very close brush with a car entering from a side road, which either had not seen me or thought I would let it pass. Gave me the opportunity of shouting Oï!

teaching Bayesian statistics in HoChiMin City

Posted in Books, Kids, Statistics, Travel, University life with tags , , , , , on June 5, 2013 by xi'an

DSC_4977Today, I gave my course at the University of Sciences (Truòng Dai hoc) here in Saigon in front of 40 students. It was a bit of a crash course and I covered between four and five chapters of The Bayesian Choice in about six hours. This was exhausting both for them and for me, but I managed to keep writing on the blackboard till the end and they bravely managed to keep their focus till the end as well. Since the students were of various backgrounds different from maths and stats (even though some were completing PhD’s involving Bayesian tools) I do wonder how much they sifted from this crash course, apart from my oft repeated messages that everyone had to pick a prior rather than go fishing for the prior… (No pho tday but a spicy beef stew and banh xeo local pancakes!) Here are the slides for the students:

Đôi nét về GS. Xi’an

Posted in Books, Travel, University life with tags , , , , on May 28, 2013 by xi'an

Here is a short bio of me written in Vietnamese in conjunction with the course I will give at CMS (Centre for Mathematical Sciences), Ho Chi Min City, next week:

Christian P. Robert là giáo sư tại Khoa Toán ứng dụng của ĐH Paris Dauphine từ năm 2000. GS Robert đã từng giảng dạy ở các ĐH Perdue, Cornell (Mỹ) và ĐH Canterbury (New-Zealand). Ông đã làm biên tập cho tạp chí Journal of the Royal Statistical Society Series B từ năm 2006 đến năm 2009 và là phó biên tập cho tạp chí Annals of Statistics. Năm 2008, ông làm Chủ tịch của Hiệp hội Thống kê Quốc tế về Thống kê Bayes (ISBA). Lĩnh vực nghiên cứu của GS Robert bao gồm Thống kê Bayes mà tập trung chính vào Lý thuyết quyết định (Decision theory) và Mô hình lựa chọn (Model selection), Lý thuyết về Xích Markov trong mô phỏng và Thống kê tính toán.

Robins and Wasserman

Posted in Statistics, Travel, University life with tags , , , , on January 17, 2013 by xi'an

entrance to Banaras Hindu University, with the huge ISBA poster, Varanasi, Jan. 10, 2013As I attended Jamie Robins’ session in Varanasi and did not have a clear enough idea of the Robbins and Wasserman paradox to discuss it viva vocce, here are my thoughts after reading Larry’s summary. My first reaction was to question whether or not this was a Bayesian statistical problem (meaning why should I be concered with the problem). Just as the normalising constant problem was not a statistical problem. We are estimating an integral given some censored realisations of a binomial depending on a covariate through an unknown function θ(x). There is not much of a parameter. However, the way Jamie presented it thru clinical trials made the problem sound definitely statistical. So end of the silly objection. My second step is to consider the very point of estimating the entire function (or infinite dimensional parameter) θ(x) when only the integral ψ is of interest. This is presumably the reason why the Bayesian approach fails as it sounds difficult to consistently estimate θ(x) under censored binomial observations, while ψ can be. Of course, if we want to estimate the probability of a success like ψ going through functional estimation this sounds like overshooting. But the Bayesian modelling of the problem appears to require considering all unknowns at once, including the function θ(x) and cannot forget about it. We encountered a somewhat similar problem with Jean-Michel Marin when working on the k-nearest neighbour classification problem. Considering all the points in the testing sample altogether as unknowns would dwarf the training sample and its information content to produce very poor inference. And so we ended up dealing with one point at a time after harsh and intense discussions! Now, back to the Robins and Wasserman paradox, I see no problem in acknowledging a classical Bayesian approach cannot produce a convergent estimate of the integral ψ. Simply because the classical Bayesian approach is an holistic system that cannot remove information to process a subset of the original problem. Call it the curse of marginalisation. Now, on a practical basis, would there be ways of running simulations of the missing Y’s when π(x) is known in order to achieve estimates of ψ? Presumably, but they would end up with a frequentist validation…

testing via credible sets

Posted in Statistics, University life with tags , , , , , , , , , , , on October 8, 2012 by xi'an

Måns Thulin released today an arXiv document on some decision-theoretic justifications for [running] Bayesian hypothesis testing through credible sets. His main point is that using the unnatural prior setting mass on a point-null hypothesis can be avoided by rejecting the null when the point-null value of the parameter does not belong to the credible interval and that this decision procedure can be validated through the use of special loss functions. While I stress to my students that point-null hypotheses are very unnatural and should be avoided at all cost, and also that constructing a confidence interval is not the same as designing a test—the former assess the precision in the estimation, while the later opposes two different and even incompatible models—, let us consider Måns’ arguments for their own sake.

The idea of the paper is that there exist loss functions for testing point-null hypotheses that lead to HPD, symmetric and one-sided intervals as acceptance regions, depending on the loss func. This was already found in Pereira & Stern (1999). The issue with these loss functions is that they involve the corresponding credible sets in their definition, hence are somehow tautological. For instance, when considering the HPD set and T(x) as the largest HPD set not containing the point-null value of the parameter, the corresponding loss function is

L(\theta,\varphi,x) = \begin{cases}a\mathbb{I}_{T(x)^c}(\theta) &\text{when }\varphi=0\\ b+c\mathbb{I}_{T(x)}(\theta) &\text{when }\varphi=1\end{cases}

parameterised by a,b,c. And depending on the HPD region.

Måns then introduces new loss functions that do not depend on x and still lead to either the symmetric or the one-sided credible intervals.as acceptance regions. However, one test actually has two different alternatives (Theorem 2), which makes it essentially a composition of two one-sided tests, while the other test returns the result to a one-sided test (Theorem 3), so even at this face-value level, I do not find the result that convincing. (For the one-sided test, George Casella and Roger Berger (1986) established links between Bayesian posterior probabilities and frequentist p-values.) Both Theorem 3 and the last result of the paper (Theorem 4) use a generic and set-free observation-free loss function (related to eqn. (5.2.1) in my book!, as quoted by the paper) but (and this is a big but) they only hold for prior distributions setting (prior) mass on both the null and the alternative. Otherwise, the solution is to always reject the hypothesis with the zero probability… This is actually an interesting argument on the why-are-credible-sets-unsuitable-for-testing debate, as it cannot bypass the introduction of a prior mass on Θ0!

Overall, I furthermore consider that a decision-theoretic approach to testing should encompass future steps rather than focussing on the reply to the (admittedly dumb) question is θ zero? Therefore, it must have both plan A and plan B at the ready, which means preparing (and using!) prior distributions under both hypotheses. Even on point-null hypotheses.

Now, after I wrote the above, I came upon a Stack Exchange page initiated by Måns last July. This is presumably not the first time a paper stems from Stack Exchange, but this is a fairly interesting outcome: thanks to the debate on his question, Måns managed to get a coherent manuscript written. Great! (In a sense, this reminded me of the polymath experiments of Terry Tao, Timothy Gower and others. Meaning that maybe most contributors could have become coauthors to the paper!)

Follow

Get every new post delivered to your Inbox.

Join 891 other followers