Here is the fourth set of slides for my third year statistics course, trying to build intuition about the likelihood surface and why on Earth would one want to find its maximum?!, through graphs. I am yet uncertain whether or not I will reach the point where I can teach more asymptotics so maybe I will also include asymptotic normality of the MLE under regularity conditions in this chapter…
Archive for Bayesian statistics
Here are my tee-shirt design proposals for the official ISBA tee-shirt competition! (I used the facilities of CustomInk.com as I could not easily find a free software around. Except for the last one where I recycled my vistaprint mug design…)
While I do not have any expectation of seeing one of these the winner (!), what is your favourite one?!
Here is the third set of slides for my third year statistics course. Nothing out of the ordinary, but the opportunity to link statistics and simulation for students not yet exposed to Monte Carlo methods. (No ABC yet, but who knows?, I may use ABC as an entry to Bayesian statistics, following Don Rubin’s example! Surprising typo on the Project Euclid page for this 1984 paper, by the way…) On Monday, I had the pleasant surprise to see Shravan Vasishth in the audience, as he is visiting Université Denis Diderot (Paris 7) this month.
The first full day of talks at ISBA 2014, Cancún, was full of goodies, from the three early talks on specifically developed software, including one by Daniel Lee on STAN that completed the one given by Bob Carpenter a few weeks ago in Paris (which gives me the opportunity to advertise STAN tee-shirts!). To the poster session (which just started a wee bit late for my conference sleep pattern!). Sylvia Richardson gave an impressive lecture full of information on Bayesian genomics. I also enjoyed very much two sessions with young Bayesian statisticians, one on Bayesian econometrics and the other one more diverse and sponsored by ISBA. Overall, and this also applies to the programme of the following days, I found that the proportion of non-parametric talks was quite high this year, possibly signalling a switch in the community and the interest of Bayesians. And conversely very few talks on computing related issues. (With most scheduled after my early departure…)
In the first of those sessions, Brendan Kline talked about partially identified parameters, a topic quite close to my interests, although I did not buy the overall modelling adopted in the analysis. For instance, Brendan Kline presented the example of a parameter θ that is the expectation of a random variable Y which is indirectly observed through x̲ < Y < x̅. While he maintained that inference should be restricted to an interval around θ and that using a prior on θ was doomed to fail (and against econometrics culture), I would have preferred to see this example as a missing data one, with both x̲ and x̅ containing information about θ. And somewhat object to the argument against the prior as it would equally apply to any prior modelling. Although unrelated in the themes, Angela Bitto presented a work on the impact of different prior modellings on the estimation of time-varying parameters in time-series models. À la Harrison and West 1994. Discriminating between good and poor shrinkage in a way I could not spot. Unless it was based on the data fit (horror!). And a third talk of interest by Andriy Norets that (very loosely) related to Angela’s talk by presenting a framework to modify credible sets towards frequentist properties: one example was the credible interval on a positive normal mean that led to a frequency-valid confidence interval with a modified prior. This reminded me very much of the shrinkage confidence intervals of the James-Stein era.
In my book review of the recent book by Dirk Kroese and Joshua Chan, Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references. [If nothing else, this typo signals it is high time for a change of my prescription glasses.]
improved approximate-Bayesian model-choice method for estimating shared evolutionary history [reply from the author]
Posted in Books, Statistics, University life with tags ABC, Bayesian statistics, consistence, Dirichlet process, exchangeability, frequency properties, Kingman's coalescent, Molecular Biology and Evolution, Monte Carlo Statistical Methods, reversible jump, sufficiency, summary statistics, taxon on June 3, 2014 by xi'an
[Here is a very kind and detailed reply from Jamie Oakes to the comments I made on his ABC paper a few days ago:]
First of all, many thanks for your thorough review of my pre-print! It is very helpful and much appreciated. I just wanted to comment on a few things you address in your post.
I am a little confused about how my replacement of continuous uniform probability distributions with gamma distributions for priors on several parameters introduces a potentially crippling number of hyperparameters. Both uniform and gamma distributions have two parameters. So, the new model only has one additional hyperparameter compared to the original msBayes model: the concentration parameter on the Dirichlet process prior on divergence models. Also, the new model offers a uniform prior over divergence models (though I don’t recommend it).
Your comment about there being no new ABC technique is 100% correct. The model is new, the ABC numerical machinery is not. Also, your intuition is correct, I do not use the divergence times to calculate summary statistics. I mention the divergence times in the description of the ABC algorithm with the hope of making it clear that the times are scaled (see Equation (12)) prior to the simulation of the data (from which the summary statistics are calculated). This scaling is simply to go from units proportional to time, to units that are proportional to the expected number of mutations. Clearly, my attempt at clarity only created unnecessary opacity. I’ll have to make some edits.
Regarding the reshuffling of the summary statistics calculated from different alignments of sequences, the statistics are not exchangeable. So, reshuffling them in a manner that is not consistent across all simulations and the observed data is not mathematically valid. Also, if elements are exchangeable, their order will not affect the likelihood (or the posterior, barring sampling error). Thus, if our goal is to approximate the likelihood, I would hope the reshuffling would also have little effect on the approximate posterior (otherwise my approximation is not so good?).
You are correct that my use of “bias” was not well defined in reference to the identity line of my plots of the estimated vs true probability of the one-divergence model. I think we can agree that, ideally (all assumptions are met), the estimated posterior probability of a model should estimate the probability that the model is correct. For large numbers of simulation replicates, the proportion of the replicates for which the one-divergence model is true will approximate the probability that the one-divergence model is correct. Thus, if the method has the desirable (albeit “frequentist”) behavior such that the estimated posterior probability of the one-divergence model is an unbiased estimate of the probability that the one-divergence model is correct, the points should fall near the identity line. For example, let us say the method estimates a posterior probability of 0.90 for the one-divergence model for 1000 simulated datasets. If the method is accurately estimating the probability that the one-divergence model is the correct model, then the one-divergence model should be the true model for approximately 900 of the 1000 datasets. Any trend away from the identity line indicates the method is biased in the (frequentist) sense that it is not correctly estimating the probability that the one-divergence model is the correct model. I agree this measure of “bias” is frequentist in nature. However, it seems like a worthwhile goal for Bayesian model-choice methods to have good frequentist properties. If a method strongly deviates from the identity line, it is much more difficult to interpret the posterior probabilities that it estimates. Going back to my example of the posterior probability of 0.90 for 1000 replicates, I would be alarmed if the model was true in only 100 of the replicates.
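[The identity-line check described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the msBayes model nor the ABC method of the paper: two hypothetical models with equal prior weights, x ~ N(0,1) under model 1 and x ~ N(1,1) under model 0, for which the posterior probability is available in closed form. The function names `posterior_prob_m1` and `calibration_table` are made up for the example. Replicates are binned by their posterior probability and, within each bin, the mean posterior probability is compared with the proportion of replicates for which model 1 was the true model.]

```python
import math
import random


def posterior_prob_m1(x):
    """Exact posterior probability of model 1 (x ~ N(0,1)) against
    model 0 (x ~ N(1,1)) under equal prior weights (toy assumption)."""
    # log f1(x) - log f0(x) = -x^2/2 + (x-1)^2/2 = 0.5 - x
    return 1.0 / (1.0 + math.exp(x - 0.5))


def calibration_table(n_reps=100_000, n_bins=10, seed=1):
    """Simulate replicates from the prior predictive and return, for each
    non-empty bin, (mean posterior probability, proportion of replicates
    where model 1 is true, number of replicates in the bin)."""
    rng = random.Random(seed)
    counts = [0] * n_bins
    hits = [0] * n_bins
    prob_sums = [0.0] * n_bins
    for _ in range(n_reps):
        m1_true = rng.random() < 0.5              # draw the true model
        x = rng.gauss(0.0 if m1_true else 1.0, 1.0)  # draw the data
        p = posterior_prob_m1(x)
        b = min(int(p * n_bins), n_bins - 1)      # bin index for p
        counts[b] += 1
        hits[b] += int(m1_true)
        prob_sums[b] += p
    return [
        (prob_sums[b] / counts[b], hits[b] / counts[b], counts[b])
        for b in range(n_bins)
        if counts[b] > 0
    ]


if __name__ == "__main__":
    for mean_p, freq, n in calibration_table():
        print(f"mean posterior {mean_p:.3f}  true-model frequency {freq:.3f}  (n={n})")
```

[Since the posterior probability is exact in this toy setting, the two columns should agree up to Monte Carlo error, that is, the points fall near the identity line. With an approximate posterior, such as an ABC output, a systematic gap between the columns is the frequentist “bias” discussed above.]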
My apologies if my citation of your PNAS paper seemed misleading. The citation was intended to be limited to the context of ABC methods that use summary statistics that are insufficient across the models under comparison (like msBayes and the method I present in the paper). I will definitely expand on this sentence to make this clearer in revisions. Thanks!
Lastly, my concluding remarks in the paper about full-likelihood methods in this domain are not as lofty as you might think. The likelihood function of the msBayes model is tractable, and, in fact, has already been derived and implemented via reversible-jump MCMC (albeit, not readily available yet). Also, there are plenty of examples of rich, Kingman-coalescent models implemented in full-likelihood Bayesian frameworks. Too many to list, but a lot of them are implemented in the BEAST software package. One noteworthy example is the work of Bryant et al. (2012, Molecular Biology and Evolution, 29(8), 1917–32) that analytically integrates over all gene trees for biallelic markers under the coalescent.
The special issue of Statistical Science Kerrie Mengersen and I edited over the past three (four?) years is now out in print! Even though many ‘Og readers may have already seen the table of contents, here it is once again. We hope you will enjoy this 100 page long excursion in big Bayesiana. The papers are not freely accessible as “current papers” on the journal website but can yet be found in the “future papers” section.
(If a sponsor wants to support turning the papers into open access version, he or she is most welcome to contact us or the IMS!) And, thanks to Larry for reminding me!, available on arXiv. Thanks to all authors, discussants, reviewers and special kudos to Jon Wellner for his constant help and support in putting the special issue together!
- Big Bayes Stories—Foreword
- Bayesian Estimation of Population-Level Trends in Measures of Health Status
- Discussion of “Estimating the Distribution of Dietary Consumption Patterns”
- Contribution M. A. Girolam
- Wonderful Examples, but Let’s not Close Our Eyes
- Reply to the Discussion of “Estimating the Distribution of Dietary Consumption Patterns”
- Response to Discussion by A. H. Welsh on the AF 447 Paper