Archive for cut models

two Parisian talks by Pierre Jacob in January

Posted in pictures, Statistics, University life on December 21, 2017 by xi'an

While back in Paris from Harvard in early January, Pierre Jacob will give two talks on works of his:

January 09, 10:30, séminaire d’Analyse-Probabilités, Université Paris-Dauphine: Unbiased MCMC

Markov chain Monte Carlo (MCMC) methods provide consistent approximations of integrals as the number of iterations goes to infinity. However, MCMC estimators are generally biased after any fixed number of iterations, which complicates both parallel computation and the construction of confidence intervals. We propose to remove this bias by using couplings of Markov chains and a telescopic sum argument, inspired by Glynn & Rhee (2014). The resulting unbiased estimators can be computed independently in parallel, and confidence intervals can be directly constructed from the Central Limit Theorem for i.i.d. variables. We provide practical couplings for important algorithms such as the Metropolis-Hastings and Gibbs samplers. We establish the theoretical validity of the proposed estimators, and study their variances and computational costs. In numerical experiments, including inference in hierarchical models, bimodal or high-dimensional target distributions, logistic regressions with the Pólya-Gamma Gibbs sampler and the Bayesian Lasso, we demonstrate the wide applicability of the proposed methodology as well as its limitations. Finally, we illustrate how the proposed estimators can approximate the “cut” distribution that arises in Bayesian inference for misspecified models.
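As a rough illustration of the coupling construction (my own toy R sketch, not the authors' code: target, tuning values and function names are all made up), one can couple two random-walk Metropolis-Hastings chains through a maximal coupling of their proposals and a common uniform, then compute the telescopic-sum estimator once the chains have met:

# maximal coupling of the two Gaussian proposals N(mu1,s^2) and N(mu2,s^2)
max_coupling <- function(mu1, mu2, s) {
  x <- rnorm(1, mu1, s)
  if (log(runif(1)) < dnorm(x, mu2, s, log = TRUE) - dnorm(x, mu1, s, log = TRUE))
    return(c(x, x))                          # both proposals coincide
  repeat {                                   # otherwise draw the second proposal from the residual
    y <- rnorm(1, mu2, s)
    if (log(runif(1)) > dnorm(y, mu1, s, log = TRUE) - dnorm(y, mu2, s, log = TRUE))
      return(c(x, y))
  }
}

logtarget <- function(x) dnorm(x, log = TRUE)  # toy N(0,1) target

coupled_mh <- function(x, y, s = 1) {          # one coupled Metropolis-Hastings step
  prop <- max_coupling(x, y, s)
  u <- log(runif(1))                           # common uniform for both chains
  if (u < logtarget(prop[1]) - logtarget(x)) x <- prop[1]
  if (u < logtarget(prop[2]) - logtarget(y)) y <- prop[2]
  c(x, y)
}

unbiased_est <- function(h, k = 10, s = 1) {   # one replicate of the estimator H_k
  x <- rnorm(1, 5); y <- rnorm(1, 5)           # overdispersed starting points
  p <- rnorm(1, x, s)                          # advance x one step ahead of y
  if (log(runif(1)) < logtarget(p) - logtarget(x)) x <- p
  est <- 0; t <- 1
  repeat {
    if (t == k) est <- est + h(x)              # h(X_k)
    if (t > k)  est <- est + h(x) - h(y)       # telescopic terms h(X_t) - h(Y_{t-1})
    if (x == y && t >= k) break                # chains have met, remaining terms vanish
    z <- coupled_mh(x, y, s); x <- z[1]; y <- z[2]
    t <- t + 1
  }
  est
}

mean(replicate(1e3, unbiased_est(function(x) x^2)))  # i.i.d. replicates, close to E[X^2]=1

Each replicate is unbiased and independent of the others, so they can be produced in parallel and averaged, with confidence intervals coming from the usual i.i.d. central limit theorem.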

January 11, 10:30, CREST-ENSAE, Paris-Saclay: Better together? Statistical learning in models made of modules [Warning: Paris-Saclay is not in Paris!]

In modern applications, statisticians are faced with integrating heterogeneous data modalities relevant for an inference or decision problem. It is convenient to use a graphical model to represent the statistical dependencies, via a set of connected “modules”, each relating to a specific data modality, and drawing on specific domain expertise in their development. In principle, given data, the conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of any module can contaminate the update of others. In various settings, particularly when certain modules are trusted more than others, practitioners have preferred to avoid learning with the full model in favor of “cut distributions”. In this talk, I will discuss why these modular approaches might be preferable to the full model in misspecified settings, and propose principled criteria to choose between modular and full-model approaches. The question is intertwined with computational difficulties associated with the cut distribution, and new approaches based on recently proposed unbiased MCMC methods will be described.

Long enough after the New Year festivities (if any) to be fully operational for them!

better together?

Posted in Books, Mountains, pictures, Statistics, University life on August 31, 2017 by xi'an

Yesterday came out on arXiv a joint paper by Pierre Jacob, Lawrence Murray, Chris Holmes and myself, Better together? Statistical learning in models made of modules, a paper that was conceived during the MCMSki meeting in Chamonix in 2014! Indeed it is mostly due to Martyn Plummer’s talk at this meeting about the cut issue that we started to work on this topic at the fringes of the [standard] Bayesian world. Fringes because a standard Bayesian approach to the problem would always lead to using the entire dataset and the entire model to infer about a parameter of interest. [Disclaimer: the use of the very slogan of the anti-secessionists during the Scottish Independence Referendum of 2014 in our title is by no means a measure of support for their position!] Comments and suggested applications are most welcome!

The setting of the paper is inspired by realistic situations where a model is made of several modules, connected within a graphical model that represents the statistical dependencies, each module relating to a specific data modality. In a standard Bayesian analysis, given data, a conventional statistical update then allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of, or even massive uncertainty about, any module in the graph can contaminate the estimates and updates of parameters in other modules, often in unpredictable ways, particularly when certain modules are trusted more than others. Hence the appearance of cut models, where practitioners prefer to skip the full model and limit the information propagation between the modules, for example by restricting propagation to only one direction along the edges of the graph. (Which is sometimes represented as a diode on the edge.) The paper investigates in which situations and under which formalism such modular approaches can outperform the full-model approach in misspecified settings, by developing an appropriate decision-theoretic framework, meaning we can choose between [several] modular and full-model approaches.
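To make the diode image concrete, here is a small self-contained R illustration of my own (a toy Gaussian two-module model, not an example from the paper), comparing the full-model Gibbs sampler, where y feeds back into φ, with the cut version, where φ is inferred from its own module only:

# module 1:  z ~ N(phi, 1),          prior phi ~ N(0, 10^2)
# module 2:  y ~ N(phi + theta, 1),  prior theta ~ N(0, 0.5^2)
set.seed(2)
z <- 1; y <- 5                         # y deliberately conflicts with z
niter <- 5000
full <- cutd <- matrix(NA, niter, 2, dimnames = list(NULL, c("phi", "theta")))
phi <- theta <- 0
for (i in 1:niter) {
  ## full-model Gibbs: both conditionals are conjugate normals
  v <- 1 / (1/100 + 1 + 1)
  phi <- rnorm(1, v * (z + y - theta), sqrt(v))
  w <- 1 / (1/0.25 + 1)
  theta <- rnorm(1, w * (y - phi), sqrt(w))
  full[i, ] <- c(phi, theta)
  ## cut version: phi given z only (the diode), then theta given phi and y
  u <- 1 / (1/100 + 1)
  phi_c <- rnorm(1, u * z, sqrt(u))
  theta_c <- rnorm(1, w * (y - phi_c), sqrt(w))
  cutd[i, ] <- c(phi_c, theta_c)
}
colMeans(full)   # phi is dragged towards y by the conflicting second module
colMeans(cutd)   # phi stays close to its module-1 posterior mean, near z

In the cut version both draws are exact, so no feedback ever reaches φ; in the full model the conflict between the two data sources contaminates the inference on φ.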

cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life on January 29, 2014 by xi'an

At MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea [although this happened to be one of those few sessions where the flu had not yet completely taken over!] and as I wanted to check a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed me to the relevant section of the BUGS book, reproduced above. (Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)

I cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a cut is the removal of an edge in a probabilistic graph; however, there must/may be some arbitrariness in building the resulting (“wrong”) conditional distribution. In the Poisson-binomial case treated in Martyn’s talk, I interpret the cut as simulating from

\pi(\phi|z)\pi(\theta|\phi,y)=\dfrac{\pi(\phi)f(z|\phi)}{m(z)}\dfrac{\pi(\theta|\phi)f(y|\theta,\phi)}{m(y|\phi)}

instead of

\pi(\phi|z,y)\pi(\theta|\phi,y)\propto\pi(\phi)f(z|\phi)\pi(\theta|\phi)f(y|\theta,\phi)

hence losing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm, assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ), and said Gibbs sampler thus does not work in generating from the “cut” target. Maybe an alternative could borrow from the rather large if disparate missing-constant toolbox. (In any case, we do not simulate from the original joint distribution.) The natural solution would then be to make an independent proposal on φ with target the posterior given z, and then use any scheme that preserves the conditional of θ given φ and y; “any” is rather wishful thinking at this stage since the only practical solution that I see is to run a Metropolis-Hastings sampler long enough to “reach” stationarity… I also remain with a lingering although not life-threatening question of whether the BUGS code using cut distributions provides the “right” answer. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):
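To make the two-stage scheme above concrete, here is a rough R sketch on a made-up Poisson-binomial toy model (my own construction, not Martyn’s example; all names and likelihood choices are illustrative): φ is drawn exactly from its posterior given z, and for each draw a separate Metropolis-Hastings run targets the conditional of θ given φ and y, in the spirit of the nested solution described above:

set.seed(1)
N <- 100; z <- 23                          # binomial data informing phi
y <- rpois(50, exp(1 + 2 * 0.2))           # Poisson data informing theta given phi

# stage 1: exact draws from pi(phi | z) under a conjugate Beta(1,1) prior
draw_phi <- function(n) rbeta(n, 1 + z, 1 + N - z)

# stage 2: for fixed phi, random-walk MH targeting pi(theta | phi, y)
log_post_theta <- function(theta, phi)
  sum(dpois(y, exp(theta[1] + theta[2] * phi), log = TRUE)) +
  sum(dnorm(theta, 0, 10, log = TRUE))

draw_theta_given_phi <- function(phi, niter = 2000, s = 0.05) {
  theta <- c(0, 0)
  lp <- log_post_theta(theta, phi)
  for (i in 1:niter) {
    prop <- theta + rnorm(2, 0, s)
    lpp <- log_post_theta(prop, phi)
    if (log(runif(1)) < lpp - lp) { theta <- prop; lp <- lpp }
  }
  theta                                    # keep the final state only
}

phis <- draw_phi(200)                      # one inner chain per phi draw,
thetas <- t(sapply(phis, draw_theta_given_phi))  # embarrassingly parallel
colMeans(thetas)                           # cut-"posterior" means of theta

The obvious caveat, as hinted at in the post, is that each inner chain is only approximately stationary after a finite number of iterations, so the overall output is only approximately distributed from the cut target.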