## my life as a mixture [BAYSM 2014, Wien]

Posted in Books, Kids, Mountains, pictures, Statistics, Travel, University life on September 12, 2014 by xi'an

Next week I am giving a talk at BAYSM in Vienna. BAYSM is the Bayesian Young Statisticians Meeting, so one may wonder why, but Chris Holmes, Mike West, and I got invited as more… erm… senior speakers! So I decided to give a definitely senior talk on a thread pursued throughout my career so far, namely mixtures. Plus it also relates to the work of the other senior speakers. Here is the abstract for the talk:

Mixtures of distributions are fascinating objects for statisticians in that they both constitute a straightforward extension of standard distributions and offer a complex benchmark for evaluating statistical procedures, with a likelihood both computable in linear time and enjoying an exponential number of local models (and sometimes infinite modes). This fruitful playground appeals in particular to Bayesians as it constitutes an easily understood challenge to the use of improper priors and of objective Bayes solutions. This talk will review some ancient and some more recent works of mine on mixtures of distributions, from the 1990 Gibbs sampler to the 2000 label switching and to later studies of Bayes factor approximations, nested sampling performances, improper priors, improved importance samplers, ABC, and an inverse perspective on the Bayesian approach to testing of hypotheses.
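To make the "linear time" remark concrete, here is a minimal sketch (not taken from the talk) for a Gaussian mixture: the log-likelihood costs O(nk) operations, even though expanding the product of the n sums produces k^n terms, one per allocation of the observations to the components. The function names, the parameter values and the five data points are all hypothetical, chosen only for illustration.

```python
from itertools import product

import numpy as np


def log_normal_pdf(x, mean, sd):
    """Log density of a N(mean, sd^2) distribution (broadcasts over arrays)."""
    return -0.5 * np.log(2 * np.pi * sd ** 2) - 0.5 * ((x - mean) / sd) ** 2


def mixture_loglik(x, weights, means, sds):
    """Mixture log-likelihood in O(n k): one log-sum-exp per observation."""
    x = np.asarray(x, dtype=float)[:, None]                      # shape (n, 1)
    log_terms = np.log(weights) + log_normal_pdf(x, means, sds)  # shape (n, k)
    return float(np.logaddexp.reduce(log_terms, axis=1).sum())


def expanded_loglik(x, weights, means, sds):
    """Same quantity via the expanded product: a sum over all k**n allocations."""
    x = np.asarray(x, dtype=float)
    total = 0.0
    for alloc in product(range(len(weights)), repeat=len(x)):
        idx = list(alloc)  # component assigned to each observation
        dens = np.exp(log_normal_pdf(x, means[idx], sds[idx]))
        total += np.prod(weights[idx] * dens)
    return float(np.log(total))


# Hypothetical two-component mixture and toy data
weights = np.array([0.3, 0.7])
means = np.array([-1.0, 2.0])
sds = np.array([1.0, 0.5])
x = np.array([-0.8, 1.9, 2.3, 0.1, -1.5])

fast = mixture_loglik(x, weights, means, sds)
slow = expanded_loglik(x, weights, means, sds)  # 2**5 = 32 terms here
print(fast, slow)
```

Both functions return the same value; only the first remains usable when n grows beyond a few dozen observations.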

I am very grateful to the scientific committee for this invitation, as it will give me the opportunity to meet the new generation, learn from them and, in addition, discover Vienna, where I have never been despite several visits to Austria. Including its top, the Großglockner. I will also give a seminar in Linz the day before, at the Institut für Angewandte Statistik.

## independent component analysis and p-values

Posted in pictures, Running, Statistics, Travel, University life on September 8, 2014 by xi'an

Last morning at the neuroscience workshop, Jean-François Cardoso presented independent component analysis through a highly pedagogical and enjoyable tutorial that stressed the geometric meaning of the approach, summarised by the notion that the (ICA) decomposition

$X=AS$

of the data X seeks both independence between the columns of S and non-Gaussianity. That is, getting as far away from Gaussianity as possible. The geometric bits came from looking at the Kullback-Leibler decomposition of the log likelihood

$-\mathbb{E}[\log L(\theta|X)] = KL(P,Q_\theta) + \mathfrak{E}(P)$

where the expectation is computed under the true distribution P of the data X. And Qθ is the hypothesised distribution. A fine property of this decomposition is a statistical version of Pythagoras’ theorem, namely that when the family of Qθ’s is an exponential family, the Kullback-Leibler distance decomposes into

$KL(P,Q_\theta) = KL(P,Q_{\theta^0}) + KL(Q_{\theta^0},Q_\theta)$

where θ⁰ is the expected maximum likelihood estimator of θ. (We also noticed this possibility of a decomposition in our Kullback-projection variable-selection paper with Jérôme Dupuis.) The talk by Aapo Hyvärinen this morning was related to Jean-François’ in that it used ICA all the way to a three-level representation, albeit oriented towards natural vision modelling, in connection with his book and the paper on unnormalised models recently discussed on the ‘Og.
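Both decompositions above can be checked numerically on a toy case. The sketch below is my own illustration, not Jean-François' example: P is a hypothetical joint distribution on {0,1}², the family Qθ is made of products of two independent Bernoulli distributions (an exponential family), and θ⁰ corresponds to the product of the marginals of P. All numerical values are assumptions.

```python
import numpy as np


def kl(p, q):
    """Kullback-Leibler divergence between two discrete distributions."""
    p, q = np.ravel(p), np.ravel(q)
    return float(np.sum(p * np.log(p / q)))


# Hypothetical "true" joint distribution P on {0,1}^2 (rows: x1, columns: x2)
P = np.array([[0.4, 0.1],
              [0.2, 0.3]])

# Projection of P on the independence family: product of its marginals
p1, p2 = P.sum(axis=1), P.sum(axis=0)
Q0 = np.outer(p1, p2)

# An arbitrary member Q_theta of the independence family
Q = np.outer([0.7, 0.3], [0.25, 0.75])

# First identity: expected negative log-likelihood = KL + entropy of P
cross_entropy = float(-np.sum(P * np.log(Q)))
entropy = float(-np.sum(P * np.log(P)))

# Second identity: the Pythagorean decomposition
lhs = kl(P, Q)
rhs = kl(P, Q0) + kl(Q0, Q)
print(cross_entropy, lhs + entropy)
print(lhs, rhs)
```

In this particular family the first term on the right, KL(P, Q⁰), is the mutual information between the two coordinates, which is one way to see why minimising it is a search for independence.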

In the afternoon, Eric-Jan Wagenmakers [who persistently and rationally fights the (ab)use of p-values and who frequently figures on Andrew's blog] gave a warning tutorial talk about the dangers of trusting p-values and going fishing for significance in existing studies, much in the spirit of Andrew’s blog (except for the defence of Bayes factors). Arguing in favour of preregistration. The talk was full of illustrations from psychology. And included the line that ESP testing is the jester of academia, meaning that testing for whatever form of ESP should be encouraged as a way to check testing procedures. If a procedure finds a significant departure from the null in this setting, there is something wrong with it! I was then reminded that Eric-Jan was one of the authors having analysed Bem’s controversial (!) paper on the “anomalous processes of information or energy transfer that are currently unexplained in terms of known physical or biological mechanisms”… (And of the shocking talk by Jessica Utts on the same topic I attended in Australia two years ago.)

## JSM 2014, Boston [#3]

Posted in Statistics, University life on August 8, 2014 by xi'an

Today I gave a talk in the Advances in model selection session. Organised by Veronika Rockova and Ed George. (A bit of pre-talk stress: I actually attempted to change my slides at 5am and only managed to erase the current version! I thus left early enough to stop by the presentation room…) Here are the final slides, which have much in common with earlier versions, but also borrow from Jean-Michel Marin’s talk in Cambridge. A posteriori, I think the talk missed one slide on the practical run of the ABC random forest algorithm, since later questions showed some miscomprehension from the audience.

The other talks in this session were by Andreas Buja [whom I last met in Budapest last year] on valid post-modelling inference. A very relevant reflection on the fundamental bias in statistical modelling. Then by Nick Polson, about efficient ways to compute MAP for objective functions that are irregular. A great entry into optimisation methods I had never heard of earlier! (The abstract is unrelated.) And last but not least by Veronika Rockova, on mixing Indian buffet processes with spike-and-slab priors for factor analysis with unknown numbers of factors. A definitely advanced contribution to factor analysis, with a very nice idea of introducing a non-identifiable rotation to align on orthogonal designs. (Here too the abstract is unrelated, a side effect of the ASA requiring abstracts sent very long in advance.)

Although discussions lasted well into the following Bayesian Inference: Theory and Foundations session, I managed to listen to a few talks there. In particular, a talk by Keli Liu on constructing non-informative priors. A question of direct relevance. The notion of objectivity is to achieve a frequentist distribution of the Bayes factor associated with the point null that is constant. Or has a constant quantile at a given level. The second talk by Alexandra Bolotskikh related to older interests of mine, namely the construction of improved confidence regions in the spirit of Stein. (Not that surprising, given that a coauthor is Marty Wells, who worked with George and me on the topic.) A third talk by Abhishek Pal Majumder (jointly with Jan Hannig) dealt with a new type of fiducial distributions, with matching prior properties. This sentence popped up a lot over the past days, but this is yet another area where I remain puzzled by the very notion. I mean the notion of fiducial distribution. Esp. in this case where the matching prior gets even closer to being plain Bayesian.

## AppliBUGS day celebrating Jean-Louis Foulley

Posted in pictures, Statistics, University life on June 10, 2014 by xi'an

In case you are in Paris tomorrow and free, there will be an AppliBUGS day focussing on the contributions of our friend Jean-Louis Foulley. (And a regular contributor to the ‘Og!) The meeting takes place in the amphitheatre on the second floor of ENGREF-Montparnasse (19 av du Maine, 75015 Paris, Métro Montparnasse Bienvenüe). I will give a part of the O’Bayes tutorial on alternatives to the Bayes factor.

## a refutation of Johnson’s PNAS paper

Posted in Books, Statistics, University life on February 11, 2014 by xi'an

Jean-Christophe Mourrat recently arXived a paper “P-value tests and publication bias as causes for high rate of non-reproducible scientific results?”, intended as a rebuttal of Val Johnson’s PNAS paper. The arguments therein are not particularly compelling. (Just as ours may sound to the author.)

“We do not discuss the validity of this [Bayesian] hypothesis here, but we explain in the supplementary material that if taken seriously, it leads to incoherent results, and should thus be avoided for practical purposes.”

The refutation is primarily argued as a rejection of the whole Bayesian perspective. (Although we argue Johnson’s perspective is not that Bayesian…) But the argument within the paper is much simpler: if the probability of rejection under the null is at most 5%, then the overall proportion of false positives is also at most 5% and not 20% as argued in Johnson…! Just as simple as this. Unfortunately, the author mixes conditional and unconditional, frequentist and Bayesian probability models. As well as conditioning upon the data and conditioning upon the rejection region… Read at your own risk.
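To see why conditioning matters here, the following sketch contrasts two quantities that both get called "proportion of false positives". It is a generic textbook calculation, not the computation found in either paper, and the level, the power and the proportion of true nulls are hypothetical numbers picked for illustration.

```python
def unconditional_false_positive_rate(alpha, prop_null):
    """P(H0 true and H0 rejected): always bounded by alpha."""
    return alpha * prop_null


def false_positive_share_among_rejections(alpha, power, prop_null):
    """P(H0 true | H0 rejected): conditions on the rejection region."""
    false_pos = alpha * prop_null
    true_pos = power * (1.0 - prop_null)
    return false_pos / (false_pos + true_pos)


alpha, power = 0.05, 0.8  # hypothetical test level and power

for prop_null in (0.5, 0.9, 0.99):  # hypothetical proportions of true nulls
    joint = unconditional_false_positive_rate(alpha, prop_null)
    conditional = false_positive_share_among_rejections(alpha, power, prop_null)
    print(f"prop_null={prop_null}: joint={joint:.3f}, given rejection={conditional:.3f}")
```

The first quantity never exceeds the 5% level, while the second depends on the proportion of true nulls and on the power, and climbs well above 5% when most tested nulls are true.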

## On the use of marginal posteriors in marginal likelihood estimation via importance-sampling

Posted in R, Statistics, University life on November 20, 2013 by xi'an

Perrakis, Ntzoufras, and Tsionas just arXived a paper on marginal likelihood (evidence) approximation (with the above title). The idea behind the paper is to base importance sampling for the evidence on simulations from the product of the (block) marginal posterior distributions. Those simulations can be directly derived from an MCMC output by randomly permuting the components. The only critical issue is to find good approximations to the marginal posterior densities. This is handled in the paper either by normal approximations or by Rao-Blackwell estimates, the latter being rather costly since one importance weight involves B×L computations, where B is the number of blocks and L the number of samples used in the Rao-Blackwell estimates. The time factor does not seem to be included in the comparison studies run by the authors, although it would seem necessary when comparing scenarios.
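Here is a minimal sketch of the idea on a toy problem where the evidence is known in closed form, so the estimate can be checked. It is my own illustration under strong assumptions, not the authors' code: a bivariate normal prior with correlation 0.6, two unit-variance normal observations, exact posterior draws standing in for an MCMC output, two blocks of size one, and the normal approximation to each marginal posterior.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_norm(x, mean, var):
    """Log density of a univariate N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)


def log_mvn2(x, mean, cov):
    """Row-wise log density of a bivariate normal for x of shape (n, 2)."""
    diff = x - mean
    quad = np.einsum("ni,ij,nj->n", diff, np.linalg.inv(cov), diff)
    return -0.5 * (2 * np.log(2 * np.pi) + np.log(np.linalg.det(cov)) + quad)


# Toy model (assumption): theta ~ N2(0, C), y_i | theta_i ~ N(theta_i, 1)
y = np.array([0.5, -1.0])
prior_cov = np.array([[1.0, 0.6], [0.6, 1.0]])
post_cov = np.linalg.inv(np.linalg.inv(prior_cov) + np.eye(2))
post_mean = post_cov @ y

# Exact posterior draws, standing in for an MCMC output
n = 50_000
draws = rng.multivariate_normal(post_mean, post_cov, size=n)

# Permuting one block independently of the other gives draws
# from the product of the marginal posteriors
proposal = np.column_stack([draws[:, 0], rng.permutation(draws[:, 1])])

# Normal approximation to each marginal posterior density
m, v = draws.mean(axis=0), draws.var(axis=0)
log_q = log_norm(proposal[:, 0], m[0], v[0]) + log_norm(proposal[:, 1], m[1], v[1])

# Importance weights: prior x likelihood / product of marginals
log_prior = log_mvn2(proposal, np.zeros(2), prior_cov)
log_lik = log_norm(y[0], proposal[:, 0], 1.0) + log_norm(y[1], proposal[:, 1], 1.0)
log_w = log_prior + log_lik - log_q

log_evidence_hat = float(np.logaddexp.reduce(log_w) - np.log(n))
log_evidence_exact = float(log_mvn2(y[None, :], np.zeros(2), prior_cov + np.eye(2))[0])
print(log_evidence_hat, log_evidence_exact)
```

With Rao-Blackwell estimates in place of the normal approximation, each evaluation of `log_q` would itself average L conditional densities per block, which is where the B×L cost per weight comes from.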

After a standard regression example (that did not include Chib’s solution in the comparison), the paper considers 2- and 3-component mixtures. The discussion centres around label switching (of course) and the deficiencies of Chib’s solution against the current method and Neal’s reference. The study does not include averaging Chib’s solution over permutations as in Berkhof et al. (2003) and Marin et al. (2005), an approach that does eliminate the bias. Especially for a small number of components. Instead, the authors stick to the log(k!) correction, although it is known to be quite unreliable (depending on the amount of overlap between modes). The final example is Diggle et al.’s (1995) longitudinal Poisson regression with random effects on epileptic patients. The appeal of this model is the unavailability of the integrated likelihood, which implies either estimating it by Rao-Blackwellisation or including the 58 latent variables in the analysis. (There is no comparison with other methods.)

As a side note, among the many references provided by this paper, I did not find trace of Skilling’s nested sampling or of safe harmonic means (as exposed in our own survey on the topic).

## whetstone and alum block for Occam’s razor

Posted in Statistics, University life on August 1, 2013 by xi'an

A strange title, if any! (The whetstone is a natural hard stone used for sharpening steel instruments, like knives or sickles and scythes; I remember my grandfathers handling one when cutting hay and weeds. Alum is hydrated potassium aluminium sulphate and is used as a blood coagulant. Both items are naturally related to shaving and razors, if not to Occam!) The whole title of the paper published by Guido Consonni, Jon Forster and Luca La Rocca in Statistical Science is “The whetstone and the alum block: balanced objective Bayesian comparison of nested models for discrete data”. The paper builds on the notions introduced in the last Valencia meeting by Guido and Luca (and discussed by Judith Rousseau and myself).

Beyond the pun (which forced me to look for “alum stone” on Wikipedia, and may be lost on some other non-native readers!), the point in the title is to build a prior distribution aimed at the comparison of two models such that those models are more sharply distinguished: Occam’s razor would thus cut better when the smaller model is true (hence the whetstone) and less when it is not (hence the alum block)… The solution proposed by the authors is to replace the reference prior on the larger model, π1, with a moment prior à la Johnson and Rossell (2010, JRSS B) and then to turn this moment prior into an intrinsic prior à la Pérez and Berger (2002, Biometrika), making it an “intrinsic moment”. The first transform turns π1 into a non-local prior, with the aim of correcting for the imbalanced convergence rates of the Bayes factor under the null and under the alternative (this is the whetstone). The second transform accumulates more mass in the vicinity of the null model (this is the alum block). (While I like the overall perspective on intrinsic priors, the introduction is a wee bit confusing about them, e.g. when it mentions fictive observations instead of predictives.)
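For readers unfamiliar with non-local priors, here is a small sketch of what the first transform does in the simplest possible setting. It is an assumption-laden toy of mine, not the discrete-data construction of the paper: a scalar parameter θ, a null value of 0, a normal base prior standing in for π1, and a first-order moment prior obtained by multiplying the base density by θ² and renormalising.

```python
import numpy as np


def normal_pdf(theta, scale):
    """Density of N(0, scale^2), used here as a stand-in local prior."""
    return np.exp(-0.5 * (theta / scale) ** 2) / (scale * np.sqrt(2 * np.pi))


def moment_prior_pdf(theta, scale):
    """First-order moment prior: theta^2 times the base density, renormalised.

    The normalising constant is E[theta^2] = scale^2 under the base prior.
    """
    return theta ** 2 / scale ** 2 * normal_pdf(theta, scale)


scale = 1.0  # hypothetical prior scale
grid = np.linspace(-10.0, 10.0, 20001)
dx = grid[1] - grid[0]

total_mass = float(np.sum(moment_prior_pdf(grid, scale)) * dx)
at_null_local = float(normal_pdf(0.0, scale))
at_null_moment = float(moment_prior_pdf(0.0, scale))
print(total_mass, at_null_local, at_null_moment)
```

The base prior puts its highest density exactly at the null value, while the moment prior vanishes there, which is the non-local property meant to speed up the accumulation of evidence in favour of the smaller model when it is true.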

Being a referee for this paper, I read it in detail (and also because this is one of my favourite research topics!). Further, we have been engaged in a fruitful discussion with Guido since the last Valencia meeting, and the current paper incorporates some of our comments (and replies to others). I find the proposal of the authors clever and interesting, but not completely Bayesian. Overall, the paper provides a clearly novel methodology that calls for further studies…