Archive for protein folding

bioinformatics workshop at Pasteur

Posted in Books, Statistics, University life on September 23, 2013 by xi'an

Once again, I (did) find myself attending lectures on a Monday! This time, it was at the Institut Pasteur (where I did not spot any mention of Alexandre Yersin), in the bioinformatics unit, on Bayesian methods in computational biology. The workshop was organised by Michael Nilges and the program started as follows:

9:10 AM Michael Habeck (MPI Göttingen) Bayesian methods for cryo-EM
9:50 AM John Chodera (Sloan-Kettering research institute) Toward Bayesian inference of conformational distributions, analysis of isothermal titration calorimetry experiments, and forcefield parameters
11:00 AM Jeff Hoch (University of Connecticut Health Center) Haldane, Bayes, and Reproducible Research: Bedrock Principles for the Era of Big Data
11:40 AM Martin Weigt (UPMC Paris) Direct-Coupling Analysis: From residue co-evolution to structure prediction
12:20 PM Riccardo Pellarin (UCSF) Modeling the structure of macromolecules using cross-linking data
2:20 PM Frederic Cazals (INRIA Sophia-Antipolis) Coarse-grain Modeling of Large Macro-Molecular Assemblies: Selected Challenges
3:00 PM Yannick Spill (Institut Pasteur) Bayesian Treatment of SAXS Data
3:30 PM Guillaume Bouvier (Institut Pasteur) Clustering protein conformations using Self-Organizing Maps

This is a highly interesting community, from which stemmed many of the MC and MCMC ideas, but I must admit I got lost (in translation) most of the time (and did not attend the workshop till its end), just like when I attended this workshop at the German synchrotron in Hamburg last Spring. Some terms and concepts were familiar, like Gibbs sampling, Hamiltonian MCMC, HMM modelling, EM steps, maximum entropy priors, reversible jump MCMC, &tc., but the talks were going too fast (for me) and focussed instead on the bio-chemical aspects, like protein folding, entropy-enthalpy, free energy, &tc. So the following comments mostly reflect my being alien to this community…

For instance, I found the talk by John Chodera quite interesting (in a fast-forward high-energy/content manner), but the probabilistic modelling was mostly absent from his slides (and seemed to reduce to a Gaussian likelihood) and the defence of Bayesian statistics sounded a bit like a mantra at times (something like “put a prior on everything you do not know and everything will end up fine with enough simulations”), a feature I have observed in the past with Bayesian ideas coming to a new field (although this hardly seems to be the case here).

All talks I attended mentioned maximum entropy as a way of modelling, apparently a common tool in this domain (as there were too few details for me to judge). For instance, Jeff Hoch’s talk remained at a very general level, referring to a large literature (incl. David Donoho’s) for the advantages of using MaxEnt deconvolution to preserve sensitivity. (The “Haldane” part of his talk was about Haldane, who moved from UCL to the ISI in Calcutta, writing a parody on how to fake genetic data in a convincing manner. And showing the above picture.) Although he linked them with MaxEnt principles, Martin Weigt’s talk was about Markov random fields modelling contacts between amino acids in the protein, but I could not get how the selection among the huge number of possible models was handled: to me it seemed to amount to estimating a graphical model on the protein, as it also did for my neighbour. (No sign of any ABC processing in the picture.)

Monte Carlo workshop (Tage 1 & 2)

Posted in Statistics, Travel, University life on February 21, 2013 by xi'an

Gathering with simulators from other fields (mostly [quantum] physicists) offers both the appeal of seeing different perspectives on simulation and the difficulty of having to filter alien vocabulary and presentation styles (generally assuming too much background from the audience). For instance, the first talk on Tuesday by Gergely Barnaföldi about using GPUs for simulation was quite accessible, showing poor performances of the (CPU-based) Mersenne twister when using Dieharder as the evaluator. (This was in comparison with GPU-based solutions.) This provided an interesting counterpoint to the (later) seminar by Frederik James on random generators. (Of course, I did have some preliminary background on the topic.)

In contrast, the second talk by Stefan Schäfer involved hybrid Monte Carlo methods but it took a lot of effort (for me) to translate back to my understanding of the notion, gathered from this earlier Read Paper of Girolami and Calderhead, with the heat-bath and leapfrog algorithms. One extreme talk in this regard was William Lester’s talk on Wednesday morning on quantum Monte Carlo and its applications in computational chemistry, where I could not get past the formulas! Too bad because it sounded quite innovative with notions like variational Monte Carlo and diffusion Monte Carlo… Nice movies, though. On the other hand, the final talk of the morning by Gabor Molnar-Saska on option pricing was highly pedagogical, defining everything and using simple examples as illustrations. (It certainly did not cure my misgivings about modelling the evolution of stock prices via pre-defined diffusions like Black-and-Scholes’, but the introduction was welcome, given the heterogeneity of the audience.) Both talks on transportation problems were also more accessible (maybe because they involved no physics!).

The speakers in the afternoon sessions of Wednesday also made a huge effort to bring the whole audience up-to-date about their topic, like protein folding and high-energy particle physics (although everyone knows about the Higgs boson nowadays!). And ensemble Kalman filters (x2). In particular, Andrew Stuart did a great job with his simulation movies. Even the final talk about path-sampling for quantum simulation was mostly understandable, at least as far as the problem it addresses. Sadly, at this stage, I still cannot put a meaning on “quantum Monte Carlo”… (Incidentally, I do not think my own talk reached much of the audience, missing convincing examples I did not have time to present.)