Thank for discussing our work. Let me clarify the technical point that you raised:

– The difference between Leg_{j}(u)_j and T_{j}=Leg_{j}(G(θ)). One is orthonormal polyn of L_{2}[0,1] and the other one is L_{2}[G]. The second one is poly of rank-transform G(θ).

– As you correctly pointed out there is a danger in directly approximating the ratio. We work on it after taking the quantile transform: evaluate the ratio at g⁻¹(θ), which is the d(u;G,F) over unit interval. Now, this new transformed function is a proper density.

-Thus the ratio now becomes d(G(θ)) which can be expended into (NOT in Leg-basis) in , in eq (2.2), as it lives in the Hilbert space L_{2}(G)

– For your last point on Step 2 of our algo, we can also use the simple integrate command.

-Unlike traditional prior-data conflict here we attempted to answer three questions in one-shot: (i) How compatible is the pre-selected g with the given data? (ii) In the event of a conflict, can we also inform the user on the nature of misfit–finer structure that was a priori unanticipated? (iii) Finally, we would like to provide a simple, yet formal guideline for upgrading (repairing) the starting g.

Hopefully, this will clear the air. But thanks for reading the paper so carefully. Appreciate it.

“To develop a “defendable and defensible” Bayesian learning model, we have to go beyond blindly ‘turning the crank’ based on a “go-as-you-like” [approximate guess] prior. A lackluster attitude towards prior modeling could lead to disastrous inference, impacting various fields from clinical drug development to presidential election forecasts. The real questions are: How can we uncover the blind spots of the conventional wisdom-based prior? How can we develop the science of prior model-building that combines both data and science [DS-prior] in a testable manner – a double-yolk Bayesian egg?”

I came through R bloggers on this presentation of a paper by Subhadeep Mukhopadhyay and Douglas Fletcher, Bayesian modelling via goodness of fit, that aims at solving all existing problems with classical Bayesian solutions, apparently! (With also apparently no awareness of David Spiegelhalter’s take on the matter.) As illustrated by both quotes, above and below:

“The two key issues of modern Bayesian statistics are: (i) establishing principled approach for distilling statistical prior that is consistent with the given data from an initial believable scientific prior; and (ii) development of a Bayes-frequentist consolidated data analysis work ow that is more effective than either of the two separately.”

(I wonder who else in this Universe would characterise “modern Bayesian statistics” in such a non-Bayesian way! And love the notion of distillation applied to priors!) The setup is actually one of empirical Bayes inference where repeated values of the parameter θ drawn from the prior are behind independent observations. Which is not the usual framework for a statistical analysis, where a single value of the parameter is supposed to hide behind the data, but most convenient for frequency based arguments behind empirical Bayes methods (which is the case here). The paper adopts a far-from-modern discourse on the “truth” of “the” prior… (Which is always conjugate in that Universe!) Instead of recognising the relativity of a statistical analysis based on a given prior.

When I tried to read the paper any further, I hit a wall as I could not understand the principle described therein. And how it “consolidates Bayes and frequentist, parametric and nonparametric, subjective and objective, quantile and information-theoretic philosophies.”. Presumably the lack of oxygen at the altitude of Chamonix…. Given an “initial guess” at the prior, g, a conjugate prior (in dimension one with an invertible cdf), a family of priors is created in what first looks like a form of non-parametric exponential tilting of g. But a closer look [at (2.1)] exposes the “family” as the tautological π(θ)=g(θ)x π(θ)/g(θ). The ratio is expanded into a Legendre polynomial series. Which use in Bayesian statistics dates a wee bit further back than indicated in the paper (see, e.g., Friedman, 1985; Diaconis, 1986). With the side issue that the resulting approximation does not integrate to one. Another side issue is that the coefficients of the Legendre truncated series are approximated by simulations from the prior [Step 3 of the Type II algorithm], rarely an efficient approach to the posterior.

The above is the running head of the arXived paper with full title “Implications of uniformly distributed, empirically informed priors for phylogeographical model selection: A reply to Hickerson et al.” by Oaks, Linkem and Sukuraman. That I (again) read in the plane to Montréal (third one in this series!, and last because I also watched the Japanese psycho-thriller Midsummer’s Equation featuring a physicist turned detective in one of many TV episodes. I just found some common features with The Devotion of Suspect X, only to discover now that the book has been turned into another episode in the series.)

“Here we demonstrate that the approach of Hickerson et al. (2014) is dangerous in the sense that the empirically-derived priors often exclude from consideration the true values of the models’ parameters. On a more fundamental level, we question the value of adopting an empirical Bayesian stance for this model-choice problem, because it can mislead model posterior probabilities, which are inherently measures of belief in the models after prior knowledge is updated by the data.”

This paper actually is a reply to Hickerson et al. (2014, Evolution), which is itself a reply to an earlier paper by Oaks et al. (2013, Evolution). [Warning: I did not check those earlier references!] The authors object to the use of “narrow, empirically informed uniform priors” for the reason reproduced in the above quote. In connection with the msBayes of Huang et al. (2011, BMC Bioinformatics). The discussion is less about ABC used for model choice and posterior probabilities of models and more about the impact of vague priors, Oaks et al. (2013) arguing that this leads to a bias towards models with less parameters, a “statistical issue” in their words, while Hickerson et al. (2014) think this is due to msBayes way of selecting models and their parameters at random.

“…it is difficult to choose a uniformly distributed prior on divergence times that is broad enough to confidently contain the true values of parameters while being narrow enough to avoid spurious support of models with less parameter space.”

So quite an interesting debate that takes us in fine far away from the usual worries about ABC model choice! We are more at the level empirical versus natural Bayes, seen in the literature of the 80’s. (The meaning of empirical Bayes is not that clear in the early pages as the authors seem to involve any method using the data “twice”.) I actually do not remember reading papers about the formal properties of model choice done through classical empirical Bayes techniques. Except the special case of Aitkin’s (1991,2009) integrated likelihood. Which is essentially the analysis performed on the coin toy example (p.7)

“…models with more divergence parameters will be forced to integrate over much greater parameter space, all with equal prior density, and much of it with low likelihood.”

The above argument is an interesting rephrasing of Lindley’s paradox, which I cannot dispute, but of course it does not solve the fundamental issue of how to choose the prior away from vague uniform priors… I also like the quote “the estimated posterior probability of a model is a single value (rather than a distribution) lacking a measure of posterior uncertainty” as this is an issue on which we are currently working. I fully agree with the statement and we think an alternative assessment to posterior probabilities could be more appropriate for model selection in ABC settings (paper soon to come, hopefully!).

On Friday, I am giving a talk on ABC at Université Laval, in the old city of Québec. While on my way to the 14w5125 workshop on scalable Bayesian computation at BIRS, Banff. I have not visited Laval since the late 1980’s (!) even though my last trip to Québec (the city) was in 2009, when François Perron took me there for ice-climbing and skiing after a seminar in Montréal… (This trip, I will not stay long enough in Québec, alas. Keeping my free day-off for another attempt at ice-climbing near Banff.) Here are slides I have used often in the past year, but this may be the last occurrence as we are completing a paper on the topic with my friends from Montpellier.

This was a most busy and profitable week in Warwick as, in addition to meeting with local researchers and students on a wide range of questions and projects, giving an extended seminar to MASDOC students, attending as many seminars as humanly possible (!), and preparing a 5k race by running in the Warwickshire countryside (in the dark and in the rain), I received the visits of Kerrie Mengersen, Judith Rousseau and Jean-Michel Marin, with whom I made some progress on papers we are writing together. In particular, Jean-Michel and I wrote the skeleton of a paper we (still) plan to submit to COLT 2014 next week. And Judith, Kerrie and I drafted new if paradoxical aconnections between empirical likelihood and model selection. Jean-Michel and Judith also gave talks at the CRiSM seminar, Jean-Michel presenting the latest developments on the convergence of our AMIS algorithm, Judith summarising several papers on the analysis of empirical Bayes methods in non-parametric settings.

(Following my earlier discussion of his paper, Xiaoquan Wen sent me this detailed reply.)

I think it is appropriate to start my response to your comments by introducing a little bit of the background information on my research interest and the project itself: I consider myself as an applied statistician, not a theorist, and I am interested in developing theoretically sound and computationally efficient methods to solve practical problems. The FDR project originated from a practical application in genomics involving hypothesis testing. The details of this particular application can be found in this published paper, and the simulations in the manuscript are also designed for a similar context. In this application, the null model is trivially defined, however there exist finitely many alternative scenarios for each test. We proposed a Bayesian solution that handles this complex setting quite nicely: in brief, we chose to model each possible alternative scenario parametrically, and by taking advantage of Bayesian model averaging, Bayes factor naturally ended up as our test statistic. We had no problem in demonstrating the resulting Bayes factor is much more powerful than the existing approaches, even accounting for the prior (mis-)modeling for Bayes factors. However, in this genomics application, there are potentially tens of thousands of tests need to be simultaneously performed, and FDR control becomes necessary and challenging. Continue reading →

Today, I attended a “miniworkshop” on Bayesian nonparametrics in Paris (Université René Descartes, now located in an intensely renovated area near the Grands Moulins de Paris), in connection with one of the ANR research grants that support my research, BANHDITS in the present case. Reflecting incidentally that it was the third Monday in a row that I was at a meeting listening to talks (after Hong Kong and Newcastle)… The talks were as follows

9h30 – 10h15 : Dominique Bontemps/Sébastien Gadat
Bayesian point of view on the Shape Invariant Model
10h15 – 11h : Pierpaolo De Blasi
Posterior consistency of nonparametric location-scale mixtures for multivariate density estimation
11h30 – 12h15 : Jean-Bernard Salomond
General posterior contraction rate Theorem in inverse problems.
12h15 – 13h : Eduard Belitser
On lower bounds for posterior consistency (I)
14h30 – 15h15 : Eduard Belitser
On lower bounds for posterior consistency (II)
15h15 – 16h : Judith Rousseau
Posterior concentration rates for empirical Bayes approaches
16h – 16h45 : Elisabeth Gassiat
Nonparametric HMM models

While most talks were focussing on contraction and consistency rates, hence far from my current interests, both talk by Judith and Elisabeth held more appeal to me. Judith gave conditions for an empirical Bayes nonparametric modelling to be consistent, with examples taken from Peter Green’s mixtures of Dirichlet, and Elisabeth concluded with a very generic result on the consistent estimation of a finite hidden Markov model. (Incidentally, the same BANHDITS grant will also support the satellite meeting on Bayesian non-parametric at MCMSki IV on Jan. 09.)