Archive for MCMC algorithms

Cancún, ISBA 2014 [day #3]

Posted in pictures, Statistics, Travel, University life with tags , , , , , on July 23, 2014 by xi'an

Cancun13…already Thursday, our [early] departure day!, with an nth (!) non-parametric session that saw [the newly elected ISBA Fellow!] Judith Rousseau present an ongoing work with Chris Holmes on the convergence or non-convergence conditions for a Bayes factor of a non-parametric hypothesis against another non-parametric. I wondered at the applicability of this test as the selection criterion in ABC settings, even though having an iid sample to start with is a rather strong requirement.

Switching between a scalable computation session with Alex Beskos, who talked about adaptive Langevin algorithms for differential equations, and a non-local prior session, with David Rossell presenting a smoother way to handle point masses in order to accommodate frequentist coverage. Something we definitely need to discuss the next time I am in Warwick! Although this made me alas miss both the first talk of the non-local session by Shane Jensen  the final talk of the scalable session by Doug Vandewrken where I happened to be quoted (!) for my warning about discretising Markov chains into non-Markov processes. In the 1998 JASA paper with Chantal Guihenneuc.

After a farewell meal of ceviche with friends in the sweltering humidity of a local restaurant, I attended [the newly elected ISBA Fellow!] Maria Vanucci’s talk on her deeply involved modelling of fMRI. The last talk before the airport shuttle was François Caron’s description of a joint work with Emily Fox on a sparser modelling of networks, along with an auxiliary variable approach that allowed for parallelisation of a Gibbs sampler. François mentioned an earlier alternative found in machine learning where all components of a vector are updated simultaneously conditional on the previous avatar of the other components, e.g. simulating (x’,y’) from π(x’|y) π(y’|x) which does not produce a convergent Markov chain. At least not convergent to the right stationary. However, running a quick [in-flight] check on a 2-d normal target did not show any divergent feature, when compared with the regular Gibbs sampler. I thus wonder at what can be said about the resulting target or which conditions are need for divergence. A few scribbles later, I realised that the 2-d case was the exception, namely that the stationary distribution of the chain is the product of the marginal. However, running a 3-d example with an auto-exponential distribution in the taxi back home, I still could not spot a difference in the outcome.

controlled thermodynamic integral for Bayesian model comparison [reply]

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , , on April 30, 2014 by xi'an

Reykjavik1Chris Oates wrotes the following reply to my Icelandic comments on his paper with Theodore Papamarkou, and Mark Girolami, reply that is detailed enough to deserve a post on its own:

Thank you Christian for your discussion of our work on the Og, and also for your helpful thoughts in the early days of this project! It might be interesting to speculate on some aspects of this procedure:

(i) Quadrature error is present in all estimates of evidence that are based on thermodynamic integration. It remains unknown how to exactly compute the optimal (variance minimising) temperature ladder “on-the-fly”; indeed this may be impossible, since the optimum is defined via a boundary value problem rather than an initial value problem. Other proposals for approximating this optimum are compatible with control variates (e.g. Grosse et al, NIPS 2013, Friel and Wyse, 2014). In empirical experiments we have found that the second order quadrature rule proposed by Friel and Wyse 2014 leads to substantially reduced bias, regardless of the specific choice of ladder.

(ii) Our experiments considered first and second degree polynomials as ZV control variates. In fact, intuition specifically motivates the use of second degree polynomials: Let us presume a linear expansion of the log-likelihood in θ. Then the implied score function is constant, not depending on θ. The quadratic ZV control variates are, in effect, obtained by multiplying the score function by θ. Thus control variates can be chosen to perfectly correlate with the log-likelihood, leading to zero-variance estimators. Of course, there is an empirical question of whether higher-order polynomials are useful when this Taylor approximation is inappropriate, but they would require the estimation of many more coefficients and in practice may be less stable.

(iii) We require that the control variates are stored along the chain and that their sample covariance is computed after the MCMC has terminated. For the specific examples in the paper such additional computation is a negligible fraction of the total computational, so that we did not provide specific timings. When non-diffegeometric MCMC is used to obtain samples, or when the score is unavailable in closed-form and must be estimated, the computational cost of the procedure would necessarily increase.

For the wide class of statistical models with tractable likelihoods, employed in almost all areas of statistical application, the CTI we propose should provide state-of-the-art estimation performance with negligible increase in computational costs.

controlled thermodynamic integral for Bayesian model comparison

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , on April 24, 2014 by xi'an

Reykjavik1Chris Oates, Theodore Papamarkou, and Mark Girolami (all from the University of Warwick) just arXived a paper on a new form of thermodynamic integration for computing marginal likelihoods. (I had actually discussed this paper with the authors on a few occasions when visiting Warwick.) The other name of thermodynamic integration is path sampling (Gelman and Meng, 1998). In the current paper, the path goes from the prior to the posterior by a sequence of intermediary distributions using a power of the likelihood. While the path sampling technique is quite efficient a method, the authors propose to improve it through the recourse to control variates, in order to decrease the variance. The control variate is taken from Mira et al. (2013), namely a one-dimensional temperature-dependent transform of the score function. (Strictly speaking, this is an asymptotic control variate in that the mean is only asymptotically zero.) This control variate is then incorporated within the expectation inside the path sampling integral. Its arbitrary elements are then calibrated against the variance of the path sampling integral. Except for the temperature ladder where the authors use a standard geometric rate, as the approach does not account for Monte Carlo and quadrature errors. (The degree of the polynomials used in the control variates is also arbitrarily set.) Interestingly, the paper mixes a lot of recent advances, from the zero variance notion of Mira et al. (2013) to the manifold Metropolis-adjusted Langevin algorithm of Girolami and Calderhead (2011), uses as a base method pMCMC (Jasra et al., 2007). The examples processed in the paper are regression (where the controlled version truly has a zero variance!) and logistic regression (with the benchmarked Pima Indian dataset), with a counter-example of a PDE interestingly proposed in the discussion section. I quite agree with the authors that the method is difficult to envision in complex enough models. I also did not see mentions therein of the extra time involved in using this control variate idea.

Advances in scalable Bayesian computation [day #4]

Posted in Books, Mountains, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , , , , on March 7, 2014 by xi'an

polyptych painting within the TransCanada Pipeline Pavilion, Banff Centre, Banff, March 21, 2012Final day of our workshop Advances in Scalable Bayesian Computation already, since tomorrow morning is an open research time ½ day! Another “perfect day in paradise”, with the Banff Centre campus covered by a fine snow blanket, still falling…, and making work in an office of BIRS a dream-like moment.

Still looking for a daily theme, parallelisation could be the right candidate, even though other talks this week went into parallelisation issues, incl. Steve’s talk yesterday. Indeed, Anthony Lee gave a talk this morning on interactive sequential Monte Carlo, where he motivated the setting by a formal parallel structure. Then, Darren Wilkinson surveyed the parallelisation issues in Monte Carlo, MCMC, SMC and ABC settings, before arguing in favour of a functional language called Scala. (Neat entries to those topics can be found on Darren’s blog.) And in the afternoon session, Sylvia Frühwirth-Schnatter exposed her approach to the (embarrassingly) parallel problem, in the spirit of Steve’s , David Dunson’s and Scott’s (a paper posted on the day I arrived in Chamonix and hence I missed!). There was plenty to learn from that talk (do not miss the Yin-Yang moment at 25 mn!), but it also helped me to break a difficulty I had with the consensus Bayes representation for two weeks (more on that later!). And, even though Marc Suchard mostly talked about flu and trees in a very pleasant and broad talk, he also had a slide on parallelisation to fit the theme! Although unrelated with parallelism,  Nicolas Chopin’s talk was on sequential quasi-Monte Carlo algorithms: while I had heard previous versions of this talk in Chamonix and BigMC, I found it full of exciting stuff. And it clearly got the room truly puzzled by this possibility, in a positive way! Similarly, Alex Lenkoski spoke about extreme rain events in Norway with no trace of parallelism, but the general idea behind the examples was to question the notion of the calibrated Bayesian (with possible connections with the cut models).

This has been a wonderful week and I am sure the participants got as much as I did from the talks and the informal exchanges. Thanks to BIRS for the sponsorship and the superb organisation of the week (and to the Banff Centre for providing such a paradisical environment). I feel very privileged to have benefited from this support, even though I deadly hope to be back in Banff within a few years.

ABC for bivariate betas

Posted in Statistics, University life with tags , , , , , , , on February 19, 2014 by xi'an

Crakel and Flegal just arXived a short paper running ABC for doing inference on the parameters of two families of bivariate betas. And I could not but read it thru. And wonder why ABC was that necessary to handle the model. The said bivariate betas are defined from

V_1=(U_1+U_5+U_7)/(U_3+U_6+U_8)\,,

V_2=(U_2+U_5+U_8)/(U_4+U_6+U_7)

when

U_i\sim \text{Ga}(\delta_i,1)

and

X_1=V_1/(1+V_1)\,,\ X_2=V_2/(1+V_2)

This makes each term in the pair Beta and the two components dependent. This construct was proposed by Arnold and Ng (2011). (The five-parameter version cancels the gammas for i=3,4,5.)

Since the pdf of the joint distribution is not available in closed form, Crakel and Flegal zoom on ABC-MCMC as the method of choice and discuss simulation experiments. (The choice of the tolerance ε as an absolute rather than relative value, ε=0.2,0.0.6,0.8, puzzles me, esp. since the distance between the summary statistics is not scaled.) I however wonder why other approaches are impossible. (Or why it is necessary to use this distribution to model correlated betas. Unless I am confused copulas were invented to this effect.) First, this is a latent variable model, so latent variables could be introduced inside an MCMC scheme. A wee bit costly but feasible. Second, several moments of those distributions are known so a empirical likelihood approach could be considered.

MCKSki 5, where willst thou be?!

Posted in Mountains, Statistics, Travel, University life with tags , , , , , , , , , on February 10, 2014 by xi'an

ridge7[Here is a call from the BayesComp Board for proposals for MCMSki 5, renamed as below to fit the BayesComp section. The earlier poll on the 'Og helped shape the proposal, with the year, 2016 vs. 2017, remaining open. I just added town to resort below as it did not sound from the poll people were terribly interested in resorts.]

The Bayesian Computation Section of ISBA is soliciting proposals to host its flagship conference:

Bayesian Computing at MCMSki

The expectation is that the meeting will be held in January 2016, but the committee will consider proposals for other times through January 2017.

This meeting will be the next incarnation of the popular MCMSki series that addresses recent advances in the theory and application of Bayesian computational methods such as MCMC, all in the context of a world-class ski resort/town. While past meetings have taken place in the Alps and the Rocky Mountains, we encourage applications from any venue that could support MCMSki. A three-day meeting is planned, perhaps with an additional day or two of satellite meetings and/or short courses.

One page proposals should address feasibility of hosting the meeting including

1. Proposed dates.
2. Transportation for international participants (both the proximity of international airports and transportation to/from the venue).
3. The conference facilities.
4. The availability and cost of hotels, including low cost options.
5. The proposed local organizing committee and their collective experience organizing international meetings.
6. Expected or promised contributions from the host organization, host country, or industrial partners towards the cost of running the meetings.

Proposals should be submitted to David van Dyk (dvandyk, BayesComp Program Chair) at imperial.ac.uk no later than May 31, 2014.

The Board of Bayesian Computing Section will evaluate the proposals, choose a venue, and appoint the Program Committee for Bayesian Computing at MCMSki.

simulating determinantal processes

Posted in Statistics, Travel with tags , , , , , , , , , , on December 6, 2013 by xi'an

In the plane to Atlanta, I happened to read a paper called Efficient simulation of the Ginibre point process by Laurent Decreusefond, Ian Flint, and Anaïs Vergne (from Telecom Paristech). “Happened to” as it was a conjunction of getting tipped by my new Dauphine colleague (and fellow blogger!) Djalil Chaffaï about the paper, having downloaded it prior to departure, and being stuck in a plane (after watching the only Chinese [somewhat] fantasy movie onboard, Saving General Yang).

This is mostly a mathematics paper. While indeed a large chunk of it is concerned with the rigorous definition of this point process in an abstract space, the last part is about simulating such processes. They are called determinantal (and not detrimental as I was tempted to interpret on my first read!) because the density of an n-set (x1x2,…,xn) is given by a kind of generalised Vandermonde determinant

p(x_1,\ldots,x_n) = \dfrac{1}{n!} \text{det} \left( T(x_i,x_j) \right)

where T is defined in terms of an orthonormal family,

T(x,y) = \sum_{i=1}^n \psi_i(x) \overline{\psi_i(y)}.

(The number n of points can be simulated via an a.s. finite Bernoulli process.) Because of this representation, the sequence of conditional densities for the xi‘s (i.e. x1, x2 given x1, etc.) can be found in closed form. In the special case of the Ginibre process, the ψi‘s are of the form

\psi_i(z) =z^m \exp\{-|z|^2/2\}/\sqrt{\pi m!}

and the process cannot be simulated for it has infinite mass, hence an a.s. infinite number of points. Somehow surprisingly (as I thought this was the point of the paper), the authors then switch to a truncated version of the process that always has a fixed number N of points. And whose density has the closed form

p(x_1,\ldots,x_n) = \dfrac{1}{\pi^N} \prod_i \frac{1}{i!} \exp\{-|z_i|^2/2\}\prod_{i<j} |z_i-z_j|^2

It has an interestingly repulsive quality in that points cannot get close to one another. (It reminded me of the pinball sampler proposed by Kerrie Mengersen and myself at one of the Valencia meetings and not pursued since.) The conclusion (of this section) is anticlimactic, though,  in that it is known that this density also corresponds to the distribution of the eigenvalues of an Hermitian matrix with standardized complex Gaussian entries. The authors mentions that the fact that the support is the whole complex space Cn is a difficulty, although I do not see why.

The following sections of the paper move to the Ginibre process restricted to a compact and then to the truncated Ginibre process restricted to a compact, for which the authors develop corresponding simulation algorithms. There is however a drag in that the sequence of conditionals, while available in closed-form, cannot be simulated efficiently but rely on a uniform accept-reject instead. While I am certainly missing most of the points in the paper, I wonder if a Gibbs sampler would not be an interesting alternative given that the full (last) conditional is a Gaussian density…

Follow

Get every new post delivered to your Inbox.

Join 634 other followers