Archive for ancilarity

how individualistic should statistics be?

Posted in Books, pictures, Statistics on November 5, 2015 by xi'an

Keli Liu and Xiao-Li Meng completed a paper on the very nature of inference, to appear in The Annual Review of Statistics and Its Application. This paper (or chapter) addresses a fundamental (and foundational) question about drawing inference on a new observation based on a sample. That is, about making predictions. To what extent should the characteristics of the sample used for that prediction resemble those of the future observation? In his 1921 book, A Treatise on Probability, Keynes thought this similarity (or individualisation) should be pushed to its extreme, which more or less led him to conclude that statistics was impossible and never to return to the field again. Certainly missing the incoming possibility of comparing models and selecting variables. And not building so much on the “all models are wrong” tenet. On the contrary, classical statistics uses the entire data available and the associated model to run the prediction, and so does Bayesian statistics, although it is less clear how to distinguish between data and control there. Liu & Meng debate the possibility of creating controls from the data alone. Or “alone”, as the model behind always plays a capital role.

“Bayes and Frequentism are two ends of the same spectrum—a spectrum defined in terms of relevance and robustness. The nominal contrast between them (…) is a red herring.”

The paper makes for an exhilarating if definitely challenging read. With a highly witty writing style. If only because the perspective is unusual, to say the least, and requires constant mental contortions to frame the assertions into more traditional terms. For instance, I first thought that Bayesian procedures were in agreement with the ultimate conditioning approach, since they condition on the observables and nothing else (except for the model!). Upon reflection, I am not so convinced that there is such a difference with the frequentist approach, in the (specific) sense that they both take advantage of the entire dataset. Either from the predictive or from the plug-in distribution. It all boils down to how one defines “control”.

“Probability and randomness, so tightly yoked in our minds, are in fact distinct concepts (…) at the end of the day, probability is essentially a tool for bookkeeping, just like the abacus.”

Some sentences from the paper made me think of ABC, even though I am not trying to bring everything back to ABC!, as drawing controls is the nature of the ABC game. ABC draws samples (or controls) from the prior predictive and only keeps those for which the relevant aspects (or the summary statistics) agree with those of the observed data. Which opens similar questions about the validity and precision of the resulting inference, as well as the loss of information due to the projection onto the summary statistics. While ABC is not mentioned in the paper, it can be used as a benchmark to walk through it.
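To make the "drawing controls" reading of ABC concrete, here is a minimal sketch of an ABC rejection sampler. It is not taken from the paper: the toy Normal model, the N(0, 3²) prior, the sample mean as summary statistic, the tolerance, and all function names are assumptions chosen for illustration only.

```python
import random
import statistics


def abc_rejection(observed, prior_sampler, simulator, summary,
                  tolerance, n_draws, rng):
    """Keep the prior draws whose simulated data resemble the observed data.

    Each (theta, pseudo-data) pair is a draw from the prior predictive;
    it is kept only if its summary statistic falls within `tolerance`
    of the observed summary statistic.
    """
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                    # draw a parameter from the prior
        pseudo = simulator(theta, len(observed), rng)  # simulate a "control" sample
        if abs(summary(pseudo) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted


# Toy setting (assumed): y_i ~ N(theta, 1), prior theta ~ N(0, 3^2).
rng = random.Random(0)
observed = [rng.gauss(2.0, 1.0) for _ in range(50)]

accepted = abc_rejection(
    observed,
    prior_sampler=lambda r: r.gauss(0.0, 3.0),
    simulator=lambda theta, n, r: [r.gauss(theta, 1.0) for _ in range(n)],
    summary=statistics.fmean,
    tolerance=0.1,
    n_draws=5000,
    rng=rng,
)

print(len(accepted), "draws kept out of 5000")
print("ABC posterior mean:", statistics.fmean(accepted))
```

The accepted values of theta approximate the posterior given the summary statistic (up to the tolerance), not given the full data, which is exactly where the questions of validity, precision and information loss come from.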

“In the words of Jack Kiefer, we need to distinguish those problems with ‘lucky data’ from those with ‘unlucky data’.”

I liked very much recalling discussions we had with George Casella and Costas Goutis at Cornell about frequentist conditional inference, with the memory of Jack Kiefer still lingering around. However, I am not so excited about the processing of models here since, from what I understand in the paper (!), the probabilistic model behind the statistical analysis must be used to some extent in producing the control case and thus cannot be truly assessed with a critical eye. For instance, of what use is the mean squared error when the model behind is unable to produce the observed data? In particular, the variability of this mean squared error is directly driven by this model. Similarly, the notion of ancillaries is completely model-dependent. In the classification diagrams opposing robustness to relevance, all methods included therein are parametric. While non-parametric types of inference could provide a reference or a calibration ruler, at the very least.

Also, by continuously and maybe a wee bit heavily referring to the doctor-and-patient analogy, the paper is somewhat confusing as to which parts are analogy and which parts are methodology, and as to which type of statistical problem is covered by the discussion (sometimes it feels like all problems and sometimes like medical trials).

“The need to deliver individualized assessments of uncertainty are more pressing than ever.”

A final question leads us to an infinite regress: if the statistician needs to turn to individualized inference, at which level of individuality should the statistician be assessed? And who is going to provide the controls then? In any case, this challenging paper is definitely worth reading by (only mature?) statisticians to ponder the nature of the game!

MCMC at ICMS (2)

Posted in Kids, pictures, Statistics, Travel, University life on April 25, 2012 by xi'an

The second day of our workshop on computational statistics at the ICMS started with a terrific talk by Xiao-Li Meng. Although this talk was related to his Inception talk in Paris last summer, and to the JCGS discussion paper, he brought new geometric aspects to the phenomenon (managing a zero correlation and hence i.i.d.-ness in the simulation of a Gaussian random effect posterior distribution). While I was reflecting on the difficulty of extending the perspective beyond normal models, he introduced a probit example where an exact null correlation cannot be found but an adaptive scheme makes it possible to explore the range of correlation coefficients. This somehow made me think of a possible version of this approach from a tempering perspective, where different data augmentation schemes would be merged into an “optimal” geometric mixture, rather than via interweaving.

As an aside, Xiao-Li mentioned the idea of Bayesian sufficiency and Bayesian ancillarity in the construction of his data augmentation schemes. He then concluded that sufficiency is identical in classical and Bayesian approaches, while ancillarity could be defined in several ways. I have already posted on that, but it seems to me that sufficiency is a weaker notion in the Bayesian perspective, in the sense that all that matters is that the posterior is the same given the observation y and given the observed statistic, rather than uniformly over all possible values of the random variable Y as in the classical sense. As for ancillarity, it is also natural to consider that an ancillary statistic does not bring information on the parameter, i.e. that the prior and the posterior distributions are the same given the observed ancillary statistic. Going further to define ancillarity as posterior independence between “true” parameters and auxiliary variables, as Xiao-Li suggested, does not seem very sound as it leads to the paradoxes Basu liked so much!
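For the record, here is one way of writing down the distinctions I have in mind. The notation is mine and only a sketch: S denotes a candidate sufficient statistic, A a candidate ancillary statistic, π the prior, and y the actual observation.

```latex
% Classical sufficiency: holds for every possible value s, not just the observed one
f(y \mid S(Y) = s, \theta) = f(y \mid S(Y) = s)
  \quad \text{for all } \theta \text{ and all } s

% Bayesian sufficiency (weaker reading): only required at the observed y
\pi(\theta \mid y) = \pi(\theta \mid S(y))

% Bayesian ancillarity (natural reading): observing A(y) does not update the prior
\pi(\theta \mid A(y)) = \pi(\theta)
```

The first line implies the second for any prior, while the second can hold at a given y without the first being true, which is the sense in which the Bayesian notion is weaker.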

Today, the overlap with the previous meetings in Bristol and in Banff was again limited: Arnaud Doucet rewrote his talk to make it less technical, which means I got the idea much more clearly than last week. The idea of having a sequence of pseudo-parameters with the same pseudo-prior seems to open a wide range of possible adaptive schemes. Faming Liang also gave a talk fairly similar to the one he presented in Banff. And David van Dyk as well, which led me to think anew about collapsed Gibbs samplers in connection with ABC and a project I just started here in Edinburgh.

Otherwise, the intense schedule of the day saw us through eleven talks. Daniele Imparato called for distributions (in the physics or Laurent Schwartz’ meaning of the term!) to decrease the variance of Monte Carlo estimations, an approach I hope to look into further as Schwartz’ book is the first math book I ever bought!, an investment I tried to capitalize on once by writing a paper mixing James-Stein estimation and distributions for generalised integration by parts, a paper that was repeatedly rejected until I gave up! Jim Griffin showed us improvements in the exploration of a large number of potential covariates in linear and generalised linear models. Natesh Pillai tried to drag us through several of his papers on covariance matrix estimation, although I fear he lost me along the way! Let me perversely blame the schedule (rather than an early rise to run around Arthur’s Seat!) for falling asleep during Alex Beskos’ talk on Hamiltonian MCMC for diffusions, even though I was looking forward to this talk. (Apologies to Alex!) Then Simon Byrne gave us a quick tour of differential geometry in connection with orthogonalization for Hamiltonian MCMC. Which brought me back very briefly to the early time when I was still considering starting a PhD in differential geometry and then, even more briefly, played with the idea of mixing differential geometry and statistics à la Shun’ichi Amari… Ian Murray and Simo Sarkka completed the day with, respectively, a cartoonesque talk on latent Gaussians that connected well with Xiao-Li’s and a talk on Gaussian approximations to diffusions with unknown parameters, which kept within the main theme of the conference, namely inference on partly observed diffusions.

As written above, this was too intense a day, with hardly any free time to discuss the talks or the ongoing projects, which makes me prefer the pace adopted in Bristol or in Banff. (Having to meet a local student on leave from Dauphine for a year here did not help, of course!)