## a partial review of BISP8 [guest post]

Posted in Statistics, Travel, University life on June 17, 2013 by xi'an

Chris Drovandi (QUT) sent me his impressions of BISP8, which just took place in Milano, Italia (BISP stands for Bayesian inference in stochastic processes):

Here is a review of some of the talks at BISP8. For the other talks I do not have sufficient background to give the talks the justice that they deserve. It was a very enjoyable small workshop with many talks in my areas of interest.

In the first session Vanja Dukic presented Bayesian inference of SEIR epidemic DE models and state space models of Google Flu Trends data. In the case of the state space models a particle learning algorithm was developed. The author considered both fixed and random effects for the data in each US state. In the second session, Murali Haran presented a likelihood-free approach for inferring the parameters of a spatio-temporal epidemic model. The speaker used a Gaussian process emulator of the model based on model simulations from a regular grid of parameter values. The emulator approach is suggested to be less intensive in terms of the number of model simulations compared with ABC but is only suitable for low dimensional inference problems (even less so than ABC).

In the first session of day 2 Ana Palacios combined the Gompertz model with Markov processes to create flexible and realistic stochastic growth models. The resulting model has a difficult likelihood, and inference was performed both by completing the likelihood, which creates simple Gibbs moves, and by ABC.

There were three talks in a row on inference for SDEs. The first, by Simon Särkkä, avoids evaluating an intractable transition density by proposing from another diffusion model and computing importance weights using the Girsanov theorem. Next, Samuel Kou used a population MCMC type approach where each chain had a different Euler discretisation. This helps improve mixing for the chain with the finest grid. Moves between chains are complicated by the different dimension of each chain. The author used a filling approach to overcome this. A very interesting aspect of the talk was using information from all chains to extrapolate various posterior quantiles to $\Delta t = 0$ (no discretisation, implying the correct posterior). I assume the extrapolation may not work as well for the extreme quantiles. The third talk, by Andrew Golightly, proposed an auxiliary approach to improve PMCMC for these models. This talk was the most technical (for me), so I need more time to digest it. Following my talk (based on some work here, and some current work) was an applied talk using SMC² methodology.

On the final day Alexandros Beskos investigated the use of SMC for Bayesian inference for a high dimensional (static) parameter. SMC is advocated here due to the ease of adaptation relative to MCMC when there is no structure in the model. The basis of the approach, I believe, was that of Chopin (2002).

## Bayes 250 in London

Posted in Books, pictures, Statistics, Travel, University life on March 20, 2013 by xi'an

The two-day Bayes 250 Conference at the Royal Statistical Society is now officially announced with the complete programme on the RSS website. With the registration form available as well. A mix of eighteen junior and senior speakers covering the thematic and geographical spectra of UK Bayesian statistics. (It would be difficult not to acknowledge the top position of the United Kingdom in the list of contributions to Bayesian statistics!) Plus an interview of Dennis Lindley (pictured above in one of the rare pictures of Dennis available on the Web) by Tony O’Hagan! Thanks to Chris Holmes for organising this exciting meeting celebrating the 1763 publication of the Essay (with me “tagging along” as a co-organiser).

Here is a blurb I wrote as a presentation (pardon my French!):

2013 marks the 250th anniversary of the publication in Dec. 1763 of “An Essay towards solving a Problem in the Doctrine of Chances” in the Philosophical Transactions of the Royal Society of London, based on notes by Thomas Bayes and edited by Richard Price, who submitted the Essay after Bayes’ death.

This publication is acknowledged as the birth certificate of what is now called Bayesian statistics and the Royal Statistical Society decided to celebrate this important milestone in the history of statistics (and not only UK statistics) by organising a conference on Bayesian statistics. The conference will take place at the RSS Headquarters in Errol Street and will run from June 19, late morning, to June 20, early afternoon. Everyone interested is welcome to present their work during the poster session on the afternoon of June 19.

The Royal Statistical Society is looking forward to your participation in this event and hopes you will enjoy the variety in the presentations of the programme.

## who’s afraid of the big B wolf?

Posted in Books, Statistics, University life on March 13, 2013 by xi'an

Aris Spanos just published a paper entitled “Who should be afraid of the Jeffreys-Lindley paradox?” in the journal Philosophy of Science. This piece is a continuation of the debate about frequentist versus likelihoodist versus Bayesian (should it be Bayesianist?! or Laplacist?!) testing approaches, exposed in Mayo and Spanos’ Error and Inference, and discussed in several posts of the ‘Og. I started reading the paper in conjunction with a paper I am currently writing for a special volume in honour of Dennis Lindley, a paper that I will discuss later on the ‘Og…

“…the postdata severity evaluation (…) addresses the key problem with Fisherian p-values in the sense that the severity evaluation provides the “magnitude” of the warranted discrepancy from the null by taking into account the generic capacity of the test (that includes n) in question as it relates to the observed data”(p.88)

First, the antagonistic style of the paper reminds me of Spanos’ previous works in that it relies on repeated value judgements (such as “Bayesian charge”, “blatant misinterpretation”, “Bayesian allegations that have undermined the credibility of frequentist statistics”, “both approaches are far from immune to fallacious interpretations”, “only crude rules of thumbs”, etc.) and rhetorical sleights of hand. (See, e.g., “In contrast, the severity account ensures learning from data by employing trustworthy evidence (…), the reliability of evidence being calibrated in terms of the relevant error probabilities” [my stress].) Connectedly, Spanos often resorts to an unusual [at least for statisticians] vocabulary that amounts to newspeak. Here are some illustrations: “summoning the generic capacity of the test”, “substantively significant”, “custom tailoring the generic capacity of the test”, “the fallacy of acceptance”, “the relevance of the generic capacity of the particular test”. Yes, the term “generic capacity” occurs there with a truly high frequency.

## the anti-Bayesian moment and its passing commented

Posted in Books, Statistics, University life on March 12, 2013 by xi'an

Here is a comment from Deborah Mayo on our rejoinder “the anti-Bayesian moment and its passing” with Andrew Gelman, a comment that could not make it through as a comment:

You assume that I am interested in long-term average properties of procedures, even though I have so often argued that they are at most necessary (as consequences of good procedures), but scarcely sufficient for a severity assessment. The error statistical account I have developed is a statistical philosophy. It is not one to be found in Neyman and Pearson, jointly or separately, except in occasional glimpses here and there (unfortunately). It is certainly not about well-defined accept-reject rules. If N-P had only been clearer, and Fisher better behaved, we would not have had decades of wrangling. However, I have argued, the error statistical philosophy explicates, and directs the interpretation of, frequentist sampling theory methods in scientific, as opposed to behavioural, contexts. It is not a complete philosophy…but I think Gelmanian Bayesians could find in it a source of “standard setting”.

You say “the prior is both a probabilistic object, standard from this perspective, and a subjective construct, translating qualitative personal assessments into a probability distribution. The extension of this dual nature to the so-called “conventional” priors (a very good semantic finding!) is to set a reference … against which to test the impact of one’s prior choices and the variability of the resulting inference. …they simply set a standard against which to gauge our answers.”

I think there are standards for even an approximate meaning of “standard-setting” in science, and I still do not see how an object whose meaning and rationale may fluctuate wildly, even in a given example, can serve as a standard or reference. For what?

Perhaps the idea is that one can gauge how different priors change the posteriors, because, after all, the likelihood is well-defined. That is why the prior and not the likelihood is the camel. But it isn’t obvious why I should want the camel. (camel/gnat references in the paper and response).

## the anti-Bayesian moment and its passing online

Posted in Statistics, University life on March 8, 2013 by xi'an

Our rejoinder “the anti-Bayesian moment and its passing” with Andrew Gelman has now been put online on the webpage of The American Statistician. While this rejoinder is freely available, the paper that generated the discussion and this rejoinder, “‘Not Only Defended But Also Applied’: The Perceived Absurdity of Bayesian Inference”, is only available to subscribers to The American Statistician. Or through arXiv.

## estimating a constant (not really)

Posted in Books, Statistics, University life on October 12, 2012 by xi'an

Larry Wasserman wrote a blog entry on the normalizing constant paradox, where he repeats that he does not understand my earlier point… Let me try to recap here this point and the various comments I made on StackExchange (while keeping in mind all this is for intellectual fun!).

The entry is somewhat paradoxical in that Larry acknowledges (in that post) that the analysis in his book, All of Statistics, is wrong. The fact that “g(x)/c is a valid density only for one value of c” (and hence cannot lead to a notion of likelihood on c) is the very reason why I stated that there can be no statistical inference nor prior distribution about c: a sample from f does not bring statistical information about c and there can be no statistical estimate of c based on this sample. (In case you did not notice, I insist upon statistical!)

To me this problem is completely different from a statistical problem, at least in the modern sense: if I need to approximate the constant c (as I do in fact when computing Bayes factors), I can produce an arbitrarily long sample from a certain importance distribution and derive a converging (and sometimes unbiased) approximation of c. Once again, this is Monte Carlo integration, a numerical technique based on the Law of Large Numbers and the stabilisation of frequencies. (Call it a frequentist method if you wish. I completely agree that MCMC methods are inherently frequentist in that sense, and see no problem with this because they are not statistical methods. Of course, this may be the core of the disagreement with Larry and others, that they call statistics the Law of Large Numbers, and I do not. This lack of separation between both notions also shows up in a recent general public talk on Poincaré’s mistakes by Cédric Villani! All this may just mean I am irremediably Bayesian, seeing anything motivated by frequencies as non-statistical!) But that process does not mean that c can take a range of values that would index a family of densities compatible with a given sample. In this Monte Carlo integration approach, the distribution of the sample is completely under control (modulo the errors induced by pseudo-random generation). This approach is therefore outside the realm of Bayesian analysis “that puts distributions on fixed but unknown constants”, because those unknown constants parameterise the distribution of an observed sample. Ergo, c is not a parameter of the sample and the sample Larry argues about (“we have data sampled from a distribution”) contains no information whatsoever about c that is not already in the function g. (It is not “data” in this respect, but a stochastic sequence that can be used for approximation purposes.) Which gets me back to my first argument, namely that c is known (and at the same time difficult or impossible to compute)!
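To make the Monte Carlo integration point concrete, here is a minimal sketch of an importance sampling approximation of a normalising constant. Everything specific in it is an assumption chosen for illustration and does not come from the discussion with Larry: the unnormalised density is g(x) = exp(-x²/2), so that c = √(2π) is known in closed form, the importance distribution is a zero-mean normal with standard deviation 2, and the names `g`, `q_pdf` and `estimate_c` are hypothetical.

```python
import math
import random


def g(x):
    """Unnormalised density (illustrative choice): its integral is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)


def q_pdf(x, scale=2.0):
    """Density of the importance distribution, a N(0, scale^2) (assumed choice)."""
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))


def estimate_c(n, seed=0, scale=2.0):
    """Average of the importance weights g(x)/q(x) over n draws from q.

    The average is unbiased for c = integral of g and converges to it
    by the Law of Large Numbers.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, scale)          # simulate from the importance distribution
        total += g(x) / q_pdf(x, scale)    # accumulate the importance weight
    return total / n


if __name__ == "__main__":
    print("estimate:", estimate_c(100_000))
    print("exact   :", math.sqrt(2.0 * math.pi))
```

Note that the simulated values are entirely under the control of the experimenter: the sample size can be increased at will and the sample carries nothing about c beyond what is already in g.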

Let me also answer here the comments on “why is this any different from estimating the speed of light c?” and “why can’t you do this with the 100th digit of π?” on the earlier post or on StackExchange. Estimating the speed of light means for me (who repeatedly flunked Physics exams after leaving high school!) that we have a physical experiment that measures the speed of light (such as the original one by Rœmer at the Observatoire de Paris, which I visited last week) and that the statistical analysis draws inference about c by using those measurements and the impact of the imprecision of the measuring instruments (as we do when analysing astronomical data). If, now, there exists a physical formula of the kind

$c=\int_\Xi \psi(\xi) \varphi(\xi) \text{d}\xi$

where φ is a probability density, I can imagine stochastic approximations of c based on this formula, but I do not consider it a statistical problem any longer. The case is thus clearer for the 100th digit of π: it is also a fixed number, that I can approximate by a stochastic experiment but on which I cannot attach a statistical tag. (It is 9, by the way.) Throwing darts at random as I did during my Oz tour is not a statistical procedure, but simple Monte Carlo à la Buffon…
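As an illustration of such a stochastic approximation of a fixed number, here is a small sketch of the dart-throwing idea. The setup is an assumption made for the example, not a description of the experiment mentioned above: darts land uniformly on the unit square, the proportion falling inside the quarter disc of radius 1 approximates π/4, and the function name `estimate_pi` is hypothetical.

```python
import math
import random


def estimate_pi(n_darts, seed=0):
    """Approximate pi by throwing n_darts uniform points on the unit square.

    The proportion of points inside the quarter disc x^2 + y^2 <= 1
    converges to pi/4, hence the factor 4.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return 4.0 * hits / n_darts


if __name__ == "__main__":
    print("estimate:", estimate_pi(200_000))
    print("exact   :", math.pi)
```

The error decreases at the usual Monte Carlo rate of one over the square root of the number of darts, so reaching the 100th digit this way is hopeless in practice, but the principle is the same: a fixed number is approximated by a simulation whose distribution is fully known.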

Overall, I still do not see this as a paradox for our field (and certainly not as a critique of Bayesian analysis), because there is no reason a statistical technique should be able to address any and every numerical problem. (Once again, Persi Diaconis would almost certainly differ, as he defended a Bayesian perspective on numerical analysis in the early days of MCMC…) There may be a “Bayesian” solution to this particular problem (and that would be nice) and there may be none (and that would be OK too!), but I am not even convinced I would call this solution “Bayesian”! (Again, let us remember this is mostly for intellectual fun!)

## ASC 2012 (#2)

Posted in pictures, Running, Statistics, Travel, University life on July 12, 2012 by xi'an

This morning, after a nice and cool run along the river Torrens amidst almost unceasing bird songs, I attended another Bayesian ASC 2012 session with Scott Sisson presenting a simulation method aimed at correcting for biased confidence intervals and Robert Kohn giving the same talk as in Kyoto. Scott’s proposal, which is rather similar to parametric bootstrap bias correction, is actually more frequentist than Bayesian as the bias is defined in terms of a correct frequentist coverage of a given confidence (or credible) interval. (Thus making the connection with Roderick Little’s calibrated Bayes talk of yesterday.) This perspective thus perceives ABC as a particular inferential method, instead of a computational approximation to the genuine Bayesian object. (We will certainly discuss the issue with Scott next week in Sydney.)

Then Peter Donnelly gave a particularly exciting and well-attended talk on the geographic classification of humans, in particular of the (early 1900’s) population of the British Isles, based on a clever clustering idea derived from an earlier paper of Na Li and Matthew Stephens: using genetic sequences from a group of individuals, each individual was paired with the rest of the sample as if it descended from this population. Using an HMM, this led to clustering the sample into about 50 groups, with a remarkable geographic homogeneity: for instance, Cornwall and Devon made two distinct groups, an English-speaking pocket of Wales (Little England) was identified as a specific group and so on, with central, eastern and southern England constituting a homogeneous group of its own…