reflections on the probability space induced by moment conditions with implications for Bayesian Inference [refleXions]

Posted in Statistics, University life on November 26, 2014 by xi'an

“The main finding is that if the moment functions have one of the properties of a pivotal, then the assertion of a distribution on moment functions coupled with a proper prior does permit Bayesian inference. Without the semi-pivotal condition, the assertion of a distribution for moment functions either partially or completely specifies the prior.” (p.1)

Ron Gallant will present this paper at the Conference in honour of Christian Gouriéroux held next week at Dauphine and I have been asked to discuss it. What follows is a collection of notes I made while reading the paper, rather than a coherent discussion, to come later. Hopefully prior to the conference.

The difficulty I have with the approach presented therein stands as much with the presentation as with the contents. I find it difficult to grasp the assumptions behind the model(s) and the motivations for only considering a moment and its distribution. Does it all come down to linking fiducial distributions with Bayesian approaches? In which case I am as usual sceptical about the ability to impose an arbitrary distribution on an arbitrary transform of the pair (x,θ), where x denotes the data. Rather than a genuine prior × likelihood construct. But I bet this is mostly linked with my lack of understanding of the notion of structural models.

“We are concerned with situations where the structural model does not imply exogeneity of θ, or one prefers not to rely on an assumption of exogeneity, or one cannot construct a likelihood at all due to the complexity of the model, or one does not trust the numerical approximations needed to construct a likelihood.” (p.4)

As often with econometrics papers, this notion of structural model sets me astray: does this mean any latent variable model or an incompletely defined model, and if so why is it incompletely defined? From a frequentist perspective anything random is not a parameter. The term exogeneity also hints at this notion of the parameter being not truly a parameter, but including latent variables and maybe random effects. Reading further (p.7) drives me to understand the structural model as defined by a moment condition, in the sense that

$\mathbb{E}[m(\mathbf{x},\theta)]=0$

has a unique solution in θ under the true model. However the focus then seems to make a major switch as Gallant considers the distribution of a pivotal quantity like

$Z=\sqrt{n} W(\mathbf{x},\theta)^{-\frac{1}{2}} m(\mathbf{x},\theta)$

as induced by the joint distribution on (x,θ), hence conversely inducing constraints on this joint, as well as an associated conditional. Which is something I have trouble understanding. First, where does this assumed distribution on Z stem from? And, second, exchanging randomness of terms in a random variable as if it were a linear equation is a pretty sure way to produce paradoxes and measure theoretic difficulties.

The purely mathematical problem itself is puzzling: if one knows the distribution of the transform Z=Z(X,Λ), what does that imply on the joint distribution of (X,Λ)? It seems unlikely this will induce a single prior and/or a single likelihood… It is actually more probable that the distribution one arbitrarily selects on m(x,θ) is incompatible with a joint on (x,θ), isn’t it?
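To make the non-uniqueness concrete, here is a minimal simulation sketch under assumptions that are not taken from the paper: the simplest scalar moment m(x,θ) = x̄ − θ with W the sample variance, Normal data with unit variance, and two hypothetical Normal priors on θ (the function name and all numerical settings are illustrative). In this toy case Z is a Student's t pivot whatever the prior, so two very different joints on (x,θ) should return essentially the same distribution for Z:

```python
import numpy as np

def simulate_z(prior_sd, n_rep=5000, n=50, seed=0):
    """Draw (theta, x) from a joint and return Z = sqrt(n) * W^{-1/2} * m."""
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, prior_sd, size=n_rep)          # hypothetical N(0, prior_sd^2) prior
    x = rng.normal(theta[:, None], 1.0, size=(n_rep, n))   # x_i | theta ~ N(theta, 1)
    m = x.mean(axis=1) - theta                             # assumed moment: sample mean minus theta
    w = x.var(axis=1, ddof=1)                              # W: sample variance of the x_i
    return np.sqrt(n) * m / np.sqrt(w)

# Two very different priors, independent random streams
z_tight = simulate_z(prior_sd=0.1, seed=1)
z_vague = simulate_z(prior_sd=10.0, seed=2)

for name, z in [("tight prior", z_tight), ("vague prior", z_vague)]:
    print(name, np.round(np.quantile(z, [0.1, 0.5, 0.9]), 2))
```

Both samples should look like draws from the same t distribution with n−1 degrees of freedom, which is the sense in which the distribution of Z alone cannot pin down the prior in this toy case.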

“The usual computational method is MCMC (Markov chain Monte Carlo) for which the best known reference in econometrics is Chernozhukov and Hong (2003).” (p.6)

While I never heard of this reference before, it looks like a 50-page survey and may be sufficient for an introduction to MCMC methods for econometricians. What I do not get though is the connection between this reference to MCMC and the overall discussion of constructing priors (or not) out of fiducial distributions. The author also suggests using MCMC to produce the MAP estimate but this always struck me as inefficient (unless one uses our SAME algorithm of course).

“One can also compute the marginal likelihood from the chain (Newton and Raftery (1994)), which is used for Bayesian model comparison.” (p.22)

Not the best solution to rely on harmonic means for marginal likelihoods… Definitely not. While the author actually uses the stabilised version (15) of the Newton and Raftery (1994) estimator, which in retrospect looks much like a bridge sampling estimator of sorts, it remains dangerously close to the original [harmonic mean solution] especially for a vague prior. And it only works when the likelihood is available in closed form.
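A small sketch of why the original harmonic mean estimator is unreliable, under assumptions chosen purely for illustration (this is neither the paper's model nor its stabilised version (15)): Normal observations with unit variance, a conjugate N(0,τ²) prior with a hypothetical vague τ² = 100, and exact posterior draws in place of MCMC output. The marginal likelihood is then available in closed form and can be compared with replicated harmonic mean estimates:

```python
import numpy as np

def log_lik(theta, x):
    """Log-likelihood of x_i ~ N(theta, 1), vectorised over an array of theta values."""
    theta = np.atleast_1d(theta)
    sq = ((x[None, :] - theta[:, None]) ** 2).sum(axis=1)
    return -0.5 * x.size * np.log(2 * np.pi) - 0.5 * sq

def exact_log_marginal(x, tau2):
    """Closed-form log marginal likelihood under the prior theta ~ N(0, tau2)."""
    n, s = x.size, x.sum()
    quad = (x ** 2).sum() - tau2 * s ** 2 / (1 + n * tau2)
    return -0.5 * n * np.log(2 * np.pi) - 0.5 * np.log(1 + n * tau2) - 0.5 * quad

def harmonic_mean_log_marginal(x, tau2, n_draws, rng):
    """Original harmonic mean estimator, computed on the log scale."""
    post_var = 1.0 / (x.size + 1.0 / tau2)
    post_mean = post_var * x.sum()
    theta = rng.normal(post_mean, np.sqrt(post_var), size=n_draws)  # exact posterior draws
    a = -log_lik(theta, x)                                 # log of 1 / L(theta_i)
    lse = a.max() + np.log(np.exp(a - a.max()).sum())      # stable log-sum-exp
    return -(lse - np.log(n_draws))                        # -log of the mean of 1 / L

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=20)   # simulated data, illustrative only
tau2 = 100.0                        # hypothetical vague prior variance
exact = exact_log_marginal(x, tau2)
estimates = np.array([harmonic_mean_log_marginal(x, tau2, 100_000, rng)
                      for _ in range(5)])
print("exact log marginal:", round(exact, 3))
print("harmonic mean estimates:", np.round(estimates, 3))
```

In this conjugate setting the inverse likelihood has infinite variance under the posterior as soon as n > 1/τ², so the replicated estimates can scatter widely and converge very slowly towards the exact value.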

“The MCMC chains were comprised of 100,000 draws well past the point where transients died off.” (p.22)

I wonder if the second statement (with a very nice image of those dying transients!) is intended as a consequence of the first one or independently.

“A common situation that requires consideration of the notions that follow is that deriving the likelihood from a structural model is analytically intractable and one cannot verify that the numerical approximations one would have to make to circumvent the intractability are sufficiently accurate.” (p.7)

This then is a completely different business, namely that defining a joint distribution by means of moment equations prevents regular Bayesian inference because the likelihood is not available. This is more exciting because (i) there are alternatives available! From ABC to INLA (maybe) to EP to variational Bayes (maybe). And beyond. In particular, the moment equations are strongly and even insistently suggesting that empirical likelihood techniques could be well-suited to this setting. And (ii) it is no longer a mathematical worry: there exists a joint distribution on m(x,θ), induced by a (or many) joint distribution on (x,θ). So the question of finding whether or not it induces a single proper prior on θ becomes relevant. But, if I want to use ABC, being given the distribution of m(x,θ) seems to mean I can only generate new values of this transform while missing a natural distance between observations and pseudo-observations. Still, I entertain lingering doubts that this is the meaning of the study. Where does the joint distribution come from..?!
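As a rough illustration of the empirical likelihood suggestion, here is a minimal sketch assuming the scalar moment condition E[x − θ] = 0 and Owen's profile empirical likelihood, with the Lagrange multiplier found by plain bisection. The function name, the bisection solver and the simulated data are illustrative choices, not anything taken from the paper:

```python
import numpy as np

def log_empirical_likelihood(x, theta, n_iter=200):
    """Log empirical likelihood ratio for the moment condition E[x - theta] = 0.

    Maximises sum(log(n * w_i)) subject to sum(w_i) = 1 and sum(w_i * (x_i - theta)) = 0.
    The optimal weights are w_i = 1 / (n * (1 + lam * d_i)) with d_i = x_i - theta.
    """
    d = x - theta
    if d.min() >= 0 or d.max() <= 0:
        return -np.inf                  # theta outside the range of the data: infeasible
    # lam must keep every 1 + lam * d_i positive, hence the open interval (lo, hi)
    lo, hi = -1.0 / d.max(), -1.0 / d.min()
    for _ in range(n_iter):
        lam = 0.5 * (lo + hi)
        g = np.sum(d / (1.0 + lam * d))  # strictly decreasing in lam, root gives the multiplier
        if g > 0:
            lo = lam
        else:
            hi = lam
    lam = 0.5 * (lo + hi)
    return -np.sum(np.log1p(lam * d))

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=50)        # simulated data, illustrative only
thetas = np.linspace(0.5, 1.5, 11)
profile = np.array([log_empirical_likelihood(x, t) for t in thetas])
print(np.round(profile, 2))
```

Such a profile could then stand in for the unavailable likelihood, to be multiplied by a prior on θ, at least as a first exploratory step.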

“Typically C is coarse in the sense that it does not contain all the Borel sets (…)  The probability space cannot be used for Bayesian inference”

My understanding of that part is that defining a joint on m(x,θ) is not always enough to deduce a (unique) posterior on θ, which is fine and correct, but rather anticlimactic. This seems to be what Gallant calls a “partial specification of the prior” (p.9).

Overall, after this linear read, I remain very much puzzled by the statistical (or Bayesian) implications of the paper. The fact that the moment conditions are central to the approach would once again induce me to check the properties of an alternative approach like empirical likelihood.

Bayes 250th versus Bayes 2.5.0.

Posted in Books, Statistics, Travel, University life on July 20, 2013 by xi'an

More than a year ago Michael Sørensen (2013 EMS Chair) and Fabrizio Ruggeri (then ISBA President) kindly invited me to deliver the memorial lecture on Thomas Bayes at the 2013 European Meeting of Statisticians, which takes place in Budapest today and the following week. I gladly accepted, although with some worries at having to cover a much wider range of the field rather than my own research topic. And then set to work on the slides in the past week, borrowing from my most “historical” lectures on Jeffreys and Keynes, my reply to Spanos, as well as getting a little help from my nonparametric friends (yes, I do have nonparametric friends!). Here is the result, providing a partial (meaning both incomplete and biased) vision of the field.

Since my talk is on Thursday, and because the talk is sponsored by ISBA, hence representing its members, please feel free to comment and suggest changes or additions as I can still incorporate them into the slides… (Warning, I purposefully kept some slides out to preserve the most surprising entry for the talk on Thursday!)

R.I.P. Emile…

Posted in Mountains, pictures, Running, Statistics, Travel, University life, Wines on July 5, 2013 by xi'an

I was thus in Montpellier for a few days, working with Jean-Michel Marin and attending the very final meeting of our ANR research group called Emile… The very same group that introduced us to ABC in 2005. We had a great time, discussing DIYABC.2, ABC for SNPs, and other extensions with our friend Arnaud Estoup, enjoying an outdoor dinner on the slopes of Pic Saint-Loup and a wine tasting on the way there, listening to ecological modelling this morning from elephant tracking [using INLA] to shell decoration in snails [using massive MCMC], running around Crès lake in the warm rain, and barely escaping the Tour de France on my way to the airport!!!

ISBA on INLA [webinar]

Posted in R, Statistics, University life on April 3, 2013 by xi'an

If you have missed the item of information, Håvard Rue is giving an ISBA webinar tomorrow on INLA:

the ISBA Webinar on INLA is scheduled for April 4th, 2013
from 8:30 - 12:30 EDT.

-------------------------------------------------------
To join the online meeting (Now from mobile devices using the Cisco WebEx
Meeting App)
-------------------------------------------------------

2. Enter the meeting number  730 293 070 and click Join Now

A recording of the webinar will be provided shortly after the event.

Please verify that your computer is capable of connecting using WebEx at

https://support.webex.com/MyAccountWeb/systemRequirement.do?root=Tools&parent=System

or see https://www.webex.com/login/join-meeting-tips  if you are having
trouble connecting.

latent Gaussian model workshop in Reykjavik

Posted in Mountains, R, Statistics, Travel, University life on March 29, 2013 by xi'an

An announcement for an Icelandic meeting next September, a meeting I would have loved to attend (darn!)… This meeting is sponsored by the BayesComp session, of course!!!

We are pleased to announce that the University of Iceland will host the 3rd Workshop on Bayesian Inference for Latent Gaussian Models with Applications (LGM).

The workshop will be held in Reykjavik, Iceland, on September 12-14 2013 at Harpa – Reykjavik Concert Hall and Conference Centre:

The emphasized topics of LGM 2013 are:
-Machine learning
-Spatial and spatio-temporal modeling
-Bayesian non-parametrics
-Latent Gaussian models
-The workshop is not restricted to these topics

The invited speakers are:
-Matthias Katzfuß at Universität Heidelberg
-Bani Mallick at Texas A&M University
-Peter Müller at University of Texas
-Michèle Sebag at INRIA Saclay, CNRS
-Matthias Seeger at École Polytechnique Fédérale de Lausanne
-Christopher Wikle at University of Missouri

Registration fees:
Early bird fee before May 21st € 375
Registration fee after May 21st € 440
Student fee € 250

Detailed information on the scientific program, conference field trip, organizing committee, scientific committee and meeting registration is available on the conference web-site:

LGM 2012, Trondheim

Posted in Mountains, pictures, Running, Statistics, Travel, University life on May 31, 2012 by xi'an

A break from the “snapshots from Guérande” that will be a relief for all ‘Og readers, I am sure: I am now in Trondheim, Norway, for the second Latent Gaussian model meeting, organised by Håvard Rue and his collaborators. As in the earlier edition in Zürich, the main approach to those models (that is adopted in the talks) is the INLA methodology of Rue, Martino and Chopin. I nonetheless (given the theme) gave a presentation on Rao-Blackwellisation techniques for MCMC algorithms. As I had not printed the program of the meeting prior to my departure (blame Guérande!), I had not realised I had only 20 minutes for my talk and kept adding remarks and slides during the flight from Amsterdam to Trondheim [where the clouds prevented me from seeing Jotunheimen]. (So I had to cut the second half of the talk below on parallelisation. Even with this cut, the 20 minutes went awfully fast!) Apart from my talk, I am afraid I was not in a sufficient state of awareness [due to a really early start] to give a comprehensive account of the afternoon talks…

Trondheim is a nice city that sometimes feels like a village despite its size. Walking up to the university along typical wooden houses, then going around the town and along the river tonight while running a 10k loop left me with the impression of a very pleasant place (at least in the summer months).

Bayes on drugs (guest post)

Posted in Books, R, Statistics, University life on May 21, 2012 by xi'an

This post is written by Julien Cornebise.

Last week in Aachen was the 3rd Edition of the Bayes(Pharma) workshop. Its specificity: half-and-half industry/academic participants and speakers, all in Pharmaceutical statistics, with a great care to welcome newcomers to Bayes, so as to spread as much as possible the love where it will actually be used. First things first: all the slides are available online, thanks to the speakers for sharing those. Full disclaimer: being part of the scientific committee of the workshop, I had a strong subjective prior.

3 days, 70 participants, we were fully booked, and even regretfully had to refuse registrations due to lack of room-space (!! German regulations are quite… enforced). Time to size it up for next year, maybe?

My most vivid impression overall: I was struck by the interactivity of the questions/answers after each talk. Rarely fewer than 5 questions per talk (come on, we’ve all attended sessions where the chairman is forced to ask the lone question; no such thing here!), on all points of each talk, with cross-references from one question to the other, even from one *talk* to the other! Seeing so much interaction and discussion in spite of (or, probably, thanks to?) the diversity of the audience was a real treat: not only did the questions bring up additional details about the talk, they were, more importantly, shedding very precious light on the questioners’ mindsets, their practical concerns and needs. Both academics and industry participants were learning on all counts and, for having sometimes seen failed marriages of the kind in the past (either a French round-table degenerating in nasty polemic on “research-induced tax credit”, or just plain mismatch of interests), I was quite impressed that we were purely and simply all interested in multiple facets of the very same thing: the interface between pharma and stats.

As is now a tradition, the first day was a short course, this time by Pr. Emmanuel Lesaffre: based on his upcoming book on Bayesian Biostatistics (Xian, maybe a review someday?), it was meant to be introductory for newcomers to Bayes, but was still packed with enough “tricks of the trade” that even seasoned Bayesians could get something out of it. I very much appreciated the pedagogy in the “live” examples, with clear convergence caveats based on traceplots of common software (WinBUGS). The most vivid memory: his strong spotlight on INLA as “the future of Bayesian computation”. Although my research is mostly on MCMC/SMC, I’m now damn curious to give it a serious try; this was further reinforced by late evening discussions with Gianluca Baio, who revealed that all his results were obtained in seconds of INLA computing.

Day 2 and half-day 3 were invited and contributed talks, all motivated by top-level applications. No convergence theorems here, but practical issues, with constraints that theoreticians (including myself!) would hardly guess exist: very small sample sizes, regulatory issues, competition with legacy methodology with only seconds-long runtime (impossible to run 1 million MCMC steps!), and sometimes even imposed software due to validation processes! Again, as stated above, the number and quality of questions is really what I will keep from those 2 days.

If I had to state one regret, maybe, it would be this unsatisfactory feeling that, for many newcomers, MCMC = WinBUGS, with its obvious restrictions. The lesson I learned: all the great methodological advances of the last 10 years, especially in Adaptive MCMC, have not yet reached most practitioners, since they need *tools* they can use. It may be a sign that, as methodological researchers, we should maybe put a stronger emphasis on bringing software packages forward (for R, of course, but also for JAGS or OpenBUGS!); not only a zip-file with our article’s codes, but a full-fledged package, with ongoing support, maintenance, and forum. That’s a tough balance to find, since the time maintaining a package does not count in the holy-bibliometry… but doesn’t it have more actual impact? Besides, more packages = fewer papers but also = more citations of the corresponding paper. Some do take this road (Robert Gramacy’s packages were cited last week as examples of great support, and Andy Gelman and Matt Hoffman are working on the much-expected STAN, and I mentioned above Håvard Rue’s R-INLA), but I don’t think it is yet considered “best practices”.

As a conclusion, this Bayes-Pharma 2012 workshop reminded me a lot of the SAMSI 2010 Summer Program: while Bayes-Pharma aims to be much more introductory, they had in common this same success in blending pharma-industry and academia. Could it be a specificity of pharma? In which case, I’m looking very much forward to opening ISBA’s Specialized Section on Biostat/Pharmastat that a few colleagues and I are currently working on (more on this here soon). With such a crowd on both sides of the Atlantic, and a looming Bayes 2013 in the Netherlands, that will be exciting.