Archive for Bayes factor

Bayesian empirical likelihood

Posted in Books, pictures, Statistics with tags , , , , , , on July 21, 2016 by xi'an

non-tibetan flags in Pula, Sardinia, June 12, 2016Sid Chib, Minchul Shin, and Anna Simoni (CREST) recently arXived a paper entitled “Bayesian Empirical Likelihood Estimation and Comparison of Moment Condition Models“. That Sid mentioned to me in Sardinia. The core notion is related to earlier Bayesian forays into empirical likelihood pseudo-models, like Lazar (2005) or our PNAS paper with Kerrie Mengersen and Pierre Pudlo. Namely to build a pseudo-likelihood using empirical likelihood principles and to derive the posterior associated with this pseudo-likelihood. Some novel aspects are the introduction of tolerance (nuisance) extra-parameters when some constraints do not hold, a maximum entropy (or exponentially tilted) representation of the empirical  likelihood function, and a Chib-Jeliazkov representation of the marginal likelihood. The authors obtain a Bernstein-von Mises theorem under correct specification. Meaning convergence. And another one under misspecification.

While the above Bernstein-von Mises theory is somewhat expected (if worth deriving) in the light of frequentist consistency results, the paper also considers a novel and exciting aspect, namely to compare models (or rather moment restrictions) by Bayes factors derived from empirical likelihoods. A grand (encompassing) model is obtained by considering all moment restrictions at once, which first sounds like more restricted, except that the extra-parameters are there to monitor constraints that actually hold. It is unclear from my cursory read of the paper whether priors on those extra-parameters can be automatically derived from a single prior. And how much they impact the value of the Bayes factor. The consistency results found in the paper do not seem to depend on the form of priors adopted for each model (for all three cases of both correctly, one correctly and none correctly specified models). Except maybe for some local asymptotic normality (LAN). Interestingly (?), the authors consider the Poisson versus Negative Binomial test we used in our testing by mixture paper. This paper is thus bringing a better view of the theoretical properties of a pseudo-Bayesian approach based on moment conditions and empirical likelihood approximations. Without a clear vision of the implementation details, from the parameterisation of the constraints (which could be tested the same way) to the construction of the prior(s) to the handling of MCMC difficulties in realistic models.

contemporary issues on hypothesis testing

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , on March 2, 2016 by xi'an

the pond in front of the Zeeman building, University of Warwick, July 01, 2014At Warwick U., CRiSM is sponsoring another workshop, this time on hypothesis testing next Sept. 15 and 16 (just before the Glencoe race!). Registration and poster submission are already open. Most obviously, given my current interests, I am quite excited by the prospect of taking part in this workshop (and sorry that Andrew can only take part by teleconference!).

normality test with 10⁸ observations?

Posted in Books, pictures, Statistics, Travel, University life with tags , , on February 23, 2016 by xi'an

Quentin Gronau and Eric-Jan Wagenmakers just arXived a rather exotic paper in that it merges experimental mathematics with Bayesian inference. The mathematical question at stake here is whether or not one of the classical irrational constants like π, e or √2 are “normal”, that is, have the same frequency for all digits in their decimal expansion. This (still) is an open problem in mathematics. Indeed, the authors do not provide a definitive answer but instead run a Bayesian testing experiment on 100 million digits on π, ending up with a Bayes factor of 2×10³¹. The figure is massive, however one must account for the number of “observations” in the sample. (Which is not a statistical sample, strictly speaking.) While I do not think the argument will convince an algebraist (as the counterargument of knowing nothing about digits after the 10⁸th one is easy to formulate!), I am also uncertain of the relevance of this huge figure, as I am unable to justify a prior on the distribution of digits if the number is not normal. Since we do not even know whether there are non-normal numbers outside rational numbers. While the flat Dirichlet prior is a uniform prior over the simplex, to assume that all possible probability repartitions are equally possible may not appeal to a mathematician, as far as I [do not] know! Furthermore, the multinomial model imposed on (at?) the series of digit of π does not have to agree with this “data” and discrepancies may as well be due to a poor sampling model as to an inappropriate prior. The data may more agree with H⁰ than with H¹ because the sampling model in H¹ is ill-suited. The paper also considers a second prior (or posterior prior) that I do not find particularly relevant.

For all I [do not] know, the huge value of the Bayes factor may be another avatar of the Lindley-Jeffreys paradox. In the sense of my interpretation of the phenomenon as a dilution of the prior mass over an unrealistically large space. Actually, the authors mention the paradox as well (p.5) but seemingly as a criticism of a frequentist approach. The picture above has its lower bound determined by a virtual dataset that produces a χ² statistic equal to the 95% χ² quantile. Dataset that stills produces a fairly high Bayes factor. (The discussion seems to assume that the Bayes factor is a one-to-one function of the χ² statistics, which is not correct I think. I wonder if exactly 95% of the sequence of Bayes factors stays within this band. There is no theoretical reason for this to happen of course.) Hence an illustration of the Lindley-Jeffreys paradox indeed, in its first interpretation of the clash between conclusions based on both paradigms. As a conclusion, I am thus not terribly convinced that this experiment supports the use of a Bayes factor for solving this normality hypothesis. Not that I support the alternative use of the p-value of course! As a sidenote, the pdf file I downloaded from arXiv has a slight bug that interacted badly with my printer in Warwick, as shown in the picture above.

ABC for wargames

Posted in Books, Kids, pictures, Statistics with tags , , , , , , on February 10, 2016 by xi'an

I recently came across an ABC paper in PLoS ONE by Xavier Rubio-Campillo applying this simulation technique to the validation of some differential equation models linking force sizes and values for both sides. The dataset is made of battle casualties separated into four periods, from pike and musket to the American Civil War. The outcome is used to compute an ABC Bayes factor but it seems this computation is highly dependent on the tolerance threshold. With highly variable numerical values. The most favoured model includes some fatigue effect about the decreasing efficiency of armies along time. While the paper somehow reminded me of a most peculiar book, I have no idea on the depth of this analysis, namely on how relevant it is to model a battle through a two-dimensional system of differential equations, given the numerous factors involved in the matter…

approximating evidence with missing data

Posted in Books, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , on December 23, 2015 by xi'an

University of Warwick, May 31 2010Panayiota Touloupou (Warwick), Naif Alzahrani, Peter Neal, Simon Spencer (Warwick) and Trevelyan McKinley arXived a paper yesterday on Model comparison with missing data using MCMC and importance sampling, where they proposed an importance sampling strategy based on an early MCMC run to approximate the marginal likelihood a.k.a. the evidence. Another instance of estimating a constant. It is thus similar to our Frontier paper with Jean-Michel, as well as to the recent Pima Indian survey of James and Nicolas. The authors give the difficulty to calibrate reversible jump MCMC as the starting point to their research. The importance sampler they use is the natural choice of a Gaussian or t distribution centred at some estimate of θ and with covariance matrix associated with Fisher’s information. Or derived from the warmup MCMC run. The comparison between the different approximations to the evidence are done first over longitudinal epidemiological models. Involving 11 parameters in the example processed therein. The competitors to the 9 versions of importance samplers investigated in the paper are the raw harmonic mean [rather than our HPD truncated version], Chib’s, path sampling and RJMCMC [which does not make much sense when comparing two models]. But neither bridge sampling, nor nested sampling. Without any surprise (!) harmonic means do not converge to the right value, but more surprisingly Chib’s method happens to be less accurate than most importance solutions studied therein. It may be due to the fact that Chib’s approximation requires three MCMC runs and hence is quite costly. The fact that the mixture (or defensive) importance sampling [with 5% weight on the prior] did best begs for a comparison with bridge sampling, no? The difficulty with such study is obviously that the results only apply in the setting of the simulation, hence that e.g. another mixture importance sampler or Chib’s solution would behave differently in another model. In particular, it is hard to judge of the impact of the dimensions of the parameter and of the missing data.

model selection and multiple testing

Posted in Books, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , , , on October 23, 2015 by xi'an

Ritabrata Dutta, Malgorzata Bogdan and Jayanta Ghosh recently arXived a survey paper on model selection and multiple testing. Which provides a good opportunity to reflect upon traditional Bayesian approaches to model choice. And potential alternatives. On my way back from Madrid, where I got a bit distracted when flying over the South-West French coast, from Biarritz to Bordeaux. Spotting the lake of Hourtain, where I spent my military training month, 29 years ago!

“On the basis of comparison of AIC and BIC, we suggest tentatively that model selection rules should be used for the purpose for which they were introduced. If they are used for other problems, a fresh justification is desirable. In one case, justification may take the form of a consistency theorem, in the other some sort of oracle inequality. Both may be hard to prove. Then one should have substantial numerical assessment over many different examples.”

The authors quickly replace the Bayes factor with BIC, because it is typically consistent. In the comparison between AIC and BIC they mention the connundrum of defining a prior on a nested model from the prior on the nesting model, a problem that has not been properly solved in my opinion. The above quote with its call to a large simulation study reminded me of the paper by Arnold & Loeppky about running such studies through ecdfs. That I did not see as solving the issue. The authors also discuss DIC and Lasso, without making much of a connection between those, or with the above. And then reach the parametric empirical Bayes approach to model selection exemplified by Ed George’s and Don Foster’s 2000 paper. Which achieves asymptotic optimality for posterior prediction loss (p.9). And which unifies a wide range of model selection approaches.

A second part of the survey considers the large p setting, where BIC is not a good approximation to the Bayes factor (when testing whether or not all mean entries are zero). And recalls that there are priors ensuring consistency for the Bayes factor in this very [restrictive] case. Then, in Section 4, the authors move to what they call “cross-validatory Bayes factors”, also known as partial Bayes factors and pseudo-Bayes factors, where the data is split to (a) make the improper prior proper and (b) run the comparison or test on the remaining data. They also show the surprising result that, provided the fraction of the data used to proper-ise the prior does not converge to one, the X validated Bayes factor remains consistent [for the special case above]. The last part of the paper concentrates on multiple testing but is more tentative and conjecturing about convergence results, centring on the differences between full Bayes and empirical Bayes. Then the plane landed in Paris and I stopped my reading, not feeling differently about the topic than when the plane started from Madrid.

Mathematical underpinnings of Analytics (theory and applications)

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , on September 25, 2015 by xi'an

“Today, a week or two spent reading Jaynes’ book can be a life-changing experience.” (p.8)

I received this book by Peter Grindrod, Mathematical underpinnings of Analytics (theory and applications), from Oxford University Press, quite a while ago. (Not that long ago since the book got published in 2015.) As a book for review for CHANCE. And let it sit on my desk and in my travel bag for the same while as it was unclear to me that it was connected with Statistics and CHANCE. What is [are?!] analytics?! I did not find much of a definition of analytics when I at last opened the book, and even less mentions of statistics or machine-learning, but Wikipedia told me the following:

“Analytics is a multidimensional discipline. There is extensive use of mathematics and statistics, the use of descriptive techniques and predictive models to gain valuable knowledge from data—data analysis. The insights from data are used to recommend action or to guide decision making rooted in business context. Thus, analytics is not so much concerned with individual analyses or analysis steps, but with the entire methodology.”

Barring the absurdity of speaking of a “multidimensional discipline” [and even worse of linking with the mathematical notion of dimension!], this tells me analytics is a mix of data analysis and decision making. Hence relying on (some) statistics. Fine.

“Perhaps in ten years, time, the mathematics of behavioural analytics will be common place: every mathematics department will be doing some of it.”(p.10)

First, and to start with some positive words (!), a book that quotes both Friedrich Nietzsche and Patti Smith cannot get everything wrong! (Of course, including a most likely apocryphal quote from the now late Yogi Berra does not partake from this category!) Second, from a general perspective, I feel the book meanders its way through chapters towards a higher level of statistical consciousness, from graphs to clustering, to hidden Markov models, without precisely mentioning statistics or statistical model, while insisting very much upon Bayesian procedures and Bayesian thinking. Overall, I can relate to most items mentioned in Peter Grindrod’s book, but mostly by first reconstructing the notions behind. While I personally appreciate the distanced and often ironic tone of the book, reflecting upon the author’s experience in retail modelling, I am thus wondering at which audience Mathematical underpinnings of Analytics aims, for a practitioner would have a hard time jumping the gap between the concepts exposed therein and one’s practice, while a theoretician would require more formal and deeper entries on the topics broached by the book. I just doubt this entry will be enough to lead maths departments to adopt behavioural analytics as part of their curriculum… Continue reading


Get every new post delivered to your Inbox.

Join 1,068 other followers