Archive for John Burdon Sanderson Haldane

approximation of improper by vague priors

Posted in Statistics, University life on November 18, 2013 by xi'an

“…many authors prefer to replace these improper priors by vague priors, i.e. probability measures that aim to represent very little knowledge on the parameter.”

Christèle Bioche and Pierre Druilhet arXived a paper with this title a few days ago. They aim at shedding new light on the convergence of vague priors to their limit. Their notion of convergence is pointwise convergence in the quotient space of Radon measures, the quotient being defined by the removal of the “normalising” constant. The first results in the paper do not exhibit particularly enticing properties of the improper limit of proper measures, as the limit cannot be given any (useful) probabilistic interpretation. (A feature already noticeable when reading Jeffreys.) The first result that truly caught my interest, in connection with my current research, is the fact that Haar measures appear as a (weak) limit of conjugate priors (Section 2.5). And that the Jeffreys prior is the limit of the parametrisation-free conjugate priors of Druilhet and Pommeret (2012, Bayesian Analysis, a paper I will discuss soon!). The result about the convergence of posterior means is rather anticlimactic, as the key assumption is the uniform integrability of the sequence of prior densities. An interesting counterexample (somewhat familiar to invariance fans): the sequence of Poisson distributions with mean n has no weak limit. And the Haldane prior does appear as a limit of Beta distributions (less surprising). On (0,1) if not on [0,1].
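To fix ideas, here is a minimal sketch (my notation, not necessarily the authors') of the Beta-to-Haldane limit once normalising constants are quotiented out:

```latex
% Beta(eps,eps) prior, with the normalising constant dropped,
% as in the quotient space of Radon measures:
\pi_\varepsilon(p) \;\propto\; p^{\varepsilon-1}(1-p)^{\varepsilon-1},
  \qquad p \in (0,1).
% Letting eps -> 0 pointwise on (0,1) gives the improper Haldane prior
\pi_0(p) \;\propto\; \frac{1}{p(1-p)},
% whereas, as probability measures on [0,1], the Beta(eps,eps) converge
% weakly to (\delta_0 + \delta_1)/2, hence the (0,1) versus [0,1] caveat.
```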

The paper contains a section on the Jeffreys-Lindley paradox, which is only considered from the second perspective, the one I favour. There is however a mention of the noninformative answer, the (meaningless) one associated with the Lebesgue measure with normalising constant one. This Lebesgue measure also appears as a weak limit in the paper, even though the limit of the posterior probabilities is 1, except when the likelihood has bounded variation outside compact sets, in which case the limit of the probabilities is the prior probability of the null… Interesting, truly, but not compelling enough to change my perspective on the topic. (And thanks to the authors for their thanks!)
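For readers unfamiliar with the paradox, here is the standard textbook reason (not specific to the paper) why the normalising constant matters when testing a point null:

```latex
% Testing H0: theta = theta_0 against H1 with the improper prior c*dtheta,
% the Bayes factor is
B_{01}(x) \;=\; \frac{f(x\mid\theta_0)}{c\int f(x\mid\theta)\,\mathrm{d}\theta}\,,
% so the posterior probability of H0 depends on the arbitrary constant c;
% setting c = 1 produces the "noninformative" answer criticised above.
```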

bioinformatics workshop at Pasteur

Posted in Books, Statistics, University life on September 23, 2013 by xi'an

Once again, I found myself attending lectures on a Monday! This time it was at the Institut Pasteur (where I did not spot any mention of Alexandre Yersin), in the bioinformatics unit, on Bayesian methods in computational biology. The workshop was organised by Michael Nilges and the program started as follows:

9:10 AM Michael Habeck (MPI Göttingen) Bayesian methods for cryo-EM
9:50 AM John Chodera (Sloan-Kettering Research Institute) Toward Bayesian inference of conformational distributions, analysis of isothermal titration calorimetry experiments, and forcefield parameters
11:00 AM Jeff Hoch (University of Connecticut Health Center) Haldane, Bayes, and Reproducible Research: Bedrock Principles for the Era of Big Data
11:40 AM Martin Weigt (UPMC Paris) Direct-Coupling Analysis: From residue co-evolution to structure prediction
12:20 PM Riccardo Pellarin (UCSF) Modeling the structure of macromolecules using cross-linking data
2:20 PM Frederic Cazals (INRIA Sophia-Antipolis) Coarse-grain Modeling of Large Macro-Molecular Assemblies: Selected Challenges
3:00 PM Yannick Spill (Institut Pasteur) Bayesian Treatment of SAXS Data
3:30 PM Guillaume Bouvier (Institut Pasteur) Clustering protein conformations using Self-Organizing Maps

This is a highly interesting community, from which many of the MC and MCMC ideas stemmed, but I must admit I got lost (in translation) most of the time (and did not attend the workshop till its end), just like when I attended this workshop at the German synchrotron in Hamburg last spring: some terms and concepts were familiar, like Gibbs sampling, Hamiltonian MCMC, HMM modelling, EM steps, maximum entropy priors, reversible jump MCMC, &tc., but the talks were going too fast (for me) and focussed instead on the bio-chemical aspects, like protein folding, entropy-enthalpy, free energy, &tc. So the following comments mostly reflect my being alien to this community…

For instance, I found the talk by John Chodera quite interesting (in a fast-forward, high-energy manner), but the probabilistic modelling was mostly absent from his slides (and seemed to reduce to a Gaussian likelihood), and the defence of Bayesian statistics sounded at times like a mantra (something like “put a prior on everything you do not know and everything will end up fine with enough simulations”), a feature I have observed in the past when Bayesian ideas reach a new field (although this hardly seems to be the case here).

All the talks I attended mentioned maximum entropy as a modelling tool, apparently a common one in this domain (although there were too few details for me to tell). For instance, Jeff Hoch’s talk remained at a very general level, referring to a large literature (incl. David Donoho’s) for the advantages of using MaxEnt deconvolution to preserve sensitivity. (The “Haldane” part of his talk was about Haldane —who moved from UCL to the ISI in Calcutta— writing a parody on how to fake genetic data in a convincing manner. And showing the above picture.) Although he linked them with MaxEnt principles, Martin Weigt’s talk was about Markov random fields modelling contacts between amino acids in the protein, but I could not get how the selection among the huge number of possible models was handled: to me, it seemed to amount to estimating a graphical model on the protein, as it also did for my neighbour. (No sign of any ABC processing in the picture.)
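As a toy illustration of this graphical-model reading, and only under my (possibly mistaken) understanding of the talk, here is a Gaussian stand-in: actual direct-coupling analysis fits a Potts model on amino-acid sequences, whereas this sketch merely recovers a planted coupling with the graphical lasso (the sizes and threshold below are made up for the example):

```python
# Toy Gaussian stand-in for the graphical-model reading of direct-coupling
# analysis: estimate a sparse precision matrix and read off "contacts".
import numpy as np
from sklearn.covariance import GraphicalLasso

rng = np.random.default_rng(0)
n_seqs, n_sites = 500, 10               # hypothetical: 500 "sequences", 10 "residues"
X = rng.standard_normal((n_seqs, n_sites))
X[:, 1] += 0.8 * X[:, 0]                # plant a coupling between sites 0 and 1

model = GraphicalLasso(alpha=0.1).fit(X)
coupled = np.abs(model.precision_) > 0.05       # crude threshold on partial correlations
print(np.argwhere(np.triu(coupled, k=1)))       # predicted coupled pairs
```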