Archive for Monte Carlo Statistical Methods

PMC for combinatoric spaces

Posted in Statistics, University life with tags , , , , , , , on July 28, 2014 by xi'an

I received this interesting [edited] email from Xiannian Fan at CUNY:

I am trying to use PMC to solve Bayesian network structure learning problem (which is in a combinatorial space, not continuous space).

In PMC, the proposal distributions qi,t can be very flexible, even specific to each iteration and each instance. My problem occurs due to the combinatorial space.

For importance sampling, the requirement for proposal distribution, q, is:

support (p) ⊂ support (q)             (*)

For PMC, what is the support of the proposal distribution in iteration t? is it

support (p) ⊂ U support(qi,t)    (**)

or does (*) apply to every qi,t?

For continuous problem, this is not a big issue. We can use random walk of Normal distribution to do local move satisfying (*). But for combination search, local moving only result in finite states choice, just not satisfying (*). For example for a permutation (1,3,2,4), random swap has only choose(4,2)=6 neighbor states.

Fairly interesting question about population Monte Carlo (PMC), a sequential version of importance sampling we worked on with French colleagues in the early 2000’s.  (The name population Monte Carlo comes from Iba, 2000.)  While MCMC samplers do not have to cover the whole support of p at each iteration, it is much harder for importance samplers as their core justification is to provide an unbiased estimator to for all integrals of interest. Thus, when using the PMC estimate,

1/n ∑i,t {p(xi,t)/qi,t(xi,t)}h(qi,t),  xi,t~qi,t(x)

this estimator is only unbiased when the supports of the qi,t “s are all containing the support of p. The only other cases I can think of are

  1. associating the qi,t “s with a partition Si,t of the support of p and using instead

    i,t {p(xi,t)/qi,t(xi,t)}h(qi,t), xi,t~qi,t(x)

  2. resorting to AMIS under the assumption (**) and using instead

    1/n ∑i,t {p(xi,t)/∑j,t qj,t(xi,t)}h(qi,t), xi,t~qi,t(x)

but I am open to further suggestions!

recycling accept-reject rejections (#2)

Posted in R, Statistics, University life with tags , , , , , , on July 2, 2014 by xi'an

Following yesterday’s post on Rao’s, Liu’s, and Dunson’s paper on a new approach to intractable normalising constants, and taking advantage of being in Warwick, I tested the method on a toy model, namely the posterior associated with n Student’s t observations with unknown location parameter μ and a flat prior,

x_1,\ldots,x_n \sim p(x|\mu) \propto \left[ 1+(x-\mu)^2/\nu \right]^{-(\nu+1)/2}

which is “naturally” bounded by a Cauchy density with scale √ν. The constant M is then easily derived and running the new algorithm follows from a normal random walk proposal targeting the augmented likelihood (R code below).

dunsonAs shown by the above graph, the completion-by-rejection scheme produces a similar outcome (tomato) as the one based on the sole observations (steelblue). With a similar acceptance rate. However, the computing time is much much degraded:

> system.time(g8())
   user  system elapsed
 53.751   0.056  54.103
> system.time(g9())
   user  system elapsed
  1.156   0.000   1.161

when compared with the no-completion version. Here is the entire R code that produced both MCMC samples: Continue reading

recycling accept-reject rejections

Posted in Statistics, University life with tags , , , , , , , , , on July 1, 2014 by xi'an

Vinayak Rao, Lizhen Lin and David Dunson just arXived a paper which proposes anew technique to handle intractable normalising constants. And which exact title is Data augmentation for models based on rejection sampling. (Paper that I read in the morning plane to B’ham, since this is one of my weeks in Warwick.) The central idea therein is that, if the sample density (aka likelihood) satisfies

p(x|\theta) \propto f(x|\theta) \le q(x|\theta) M\,,

where all terms but p are known in closed form, then completion by the rejected values of an hypothetical accept-reject algorithm−hypothetical in the sense that the data does not have to be produced by an accept-reject scheme but simply the above domination condition to hold−allows for a data augmentation scheme. Without requiring the missing normalising constant. Since the completed likelihood is

\prod_{i=1}^n \dfrac{f(x_i|\theta)}{M} \prod_{j=1}^{m_i} \left\{q(y_{ij}|\theta) -\dfrac{f(y_{ij}|\theta)}{M}\right\}

A closed-form, if not necessarily congenial, function.

Now this is quite a different use of the “rejected values” from the accept reject algorithm when compared with our 1996 Biometrika paper on the Rao-Blackwellisation of accept-reject schemes (which, still, could have been mentioned there… Or Section 4.2 of Monte Carlo Statistical Methods. Rather than re-deriving the joint density of the augmented sample, “accepted+rejected”.)

It is a neat idea in that it completely bypasses the approximation of the normalising constant. And avoids the somewhat delicate tuning of the auxiliary solution of Moller et al. (2006)  The difficulty with this algorithm is however in finding an upper bound M on the unnormalised density f that is

  1. in closed form;
  2. with a manageable and tight enough “constant” M;
  3. compatible with running a posterior simulation conditional on the added rejections.

The paper seems to assume further that the bound M is independent from the current parameter value θ, at least as suggested by the notation (and Theorem 2), but this is not in the least necessary for the validation of the formal algorithm. Such a constraint would pull M higher, hence reducing the efficiency of the method. Actually the matrix Langevin distribution considered in the first example involves a bound that depends on the parameter κ.

The paper includes a result (Theorem 2) on the uniform ergodicity that relies on heavy assumptions on the proposal distribution. And a rather surprising one, namely that the probability of rejection is bounded from below, i.e. calling for a less efficient proposal. Now it seems to me that a uniform ergodicity result holds as well when the probability of acceptance is bounded from below since, then, the event when no rejection occurs constitutes an atom from the augmented Markov chain viewpoint. There therefore occurs a renewal each time the rejected variable set ϒ is empty, and ergodicity ensues (Robert, 1995, Statistical Science).

Note also that, despite the opposition raised by the authors, the method per se does constitute a pseudo-marginal technique à la Andrieu-Roberts (2009) since the independent completion by the (pseudo) rejected variables produces an unbiased estimator of the likelihood. It would thus be of interest to see how the recent evaluation tools of Andrieu and Vihola can assess the loss in efficiency induced by this estimation of the likelihood.

Maybe some further experimental evidence tomorrow…

Statistical modeling and computation [apologies]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , on June 11, 2014 by xi'an

In my book review of the recent book by Dirk Kroese and Joshua Chan,  Statistical Modeling and Computation, I mistakenly and persistently typed the name of the second author as Joshua Chen. This typo alas made it to the printed and on-line versions of the subsequent CHANCE 27(2) column. I am thus very much sorry for this mistake of mine and most sincerely apologise to the authors. Indeed, it always annoys me to have my name mistyped (usually as Roberts!) in references.  [If nothing else, this typo signals it is high time for a change of my prescription glasses.]

computational methods for statistical mechanics [day #4]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , on June 7, 2014 by xi'an

Arthur Seat, Edinburgh, Sep. 7, 2011

My last day at this ICMS workshop on molecular simulation started [with a double loop of Arthur's Seat thankfully avoiding the heavy rains of the previous night and then] Chris Chipot‘s magistral entry to molecular simulation for proteins with impressive slides and simulation movies, even though I could not follow the details to really understand the simulation challenges therein, just catching a few connections with earlier talks. A typical example of a cross-disciplinary gap, where the other discipline always seems to be stressing the ‘wrong” aspects. Although this is perfectly unrealistic, it would immensely to prepare talks in pairs for such interdisciplinary workshops! Then Gersende Fort presented results about convergence and efficiency for the Wang-Landau algorithm. The idea is to find the optimal rate for updating the weights of the elements of the partition towards reaching the flat histogram in minimal time. Showing massive gains on toy examples. The next talk went back to molecular biology with Jérôme Hénin‘s presentation on improved adaptive biased sampling. With an exciting notion of orthogonality aiming at finding the slowest directions in the target and putting the computational effort. He also discussed the tension between long single simulations and short repeated ones, echoing a long-going debate in the MCMC community. (He also had a slide with a picture of my first 1983 Apple IIe computer!) Then Antonietta Mira gave a broad perspective on delayed rejection and zero variance estimates. With impressive variance reductions (although some physicists then asked for reduction of order 10¹⁰!). Johannes Zimmer gave a beautiful maths talk on the connection between particle and diffusion limits (PDEs) and Wasserstein geometry and large deviations. (I did not get most of the talk, but it was nonetheless beautiful!) Bert Kappen concluded the day (and the workshop for me) by a nice introduction to control theory. Making connection between optimal control and optimal importance sampling. Which made me idly think of the following problem: what if control cannot be completely… controlled and hence involves a stochastic part? Presumably of little interest as the control would then be on the parameters of the distribution of the control.

“The alanine dipeptide is the fruit fly of molecular simulation.”

The example of this alanine dipeptide molecule was so recurrent during the talks that it justified the above quote by Michael Allen. Not that I am more proficient in the point of studying this protein or using it as a benchmark. Or in identifying the specifics of the challenges of molecular dynamics simulation. Not a criticism of the ICMS workshop obviously, but rather of my congenital difficulty with continuous time processes!!! So I do not return from Edinburgh with a new research collaborative project in molecular dynamics (if with more traditional prospects), albeit with the perception that a minimal effort could bring me to breach the vocabulary barrier. And maybe consider ABC ventures in those (new) domains. (Although I fear my talk on ABC did not impact most of the audience!)

computational methods for statistical mechanics [day #3]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , on June 6, 2014 by xi'an

Arthur Seat, Edinburgh, Sep. 7, 2011

The third day [morn] at our ICMS workshop was dedicated to path sampling. And rare events. Much more into [my taste] Monte Carlo territory. The first talk by Rosalind Allen looked at reweighting trajectories that are not in an equilibrium or are missing the Boltzmann [normalizing] constant. Although the derivation against a calibration parameter looked like the primary goal rather than the tool for constant estimation. Again papers in J. Chem. Phys.! And a potential link with ABC raised by Antonietta Mira… Then Jonathan Weare discussed stratification. With a nice trick of expressing the normalising constants of the different terms in the partition as solution(s) of a Markov system

v\mathbf{M}=v

Because the stochastic matrix M is easier (?) to approximate. Valleau’s and Torrie’s umbrella sampling was a constant reference in this morning of talks. Arnaud Guyader’s talk was in the continuation of Toni Lelièvre’s introduction, which helped a lot in my better understanding of the concepts. Rephrasing things in more statistical terms. Like the distinction between equilibrium and paths. Or bias being importance sampling. Frédéric Cérou actually gave a sort of second part to Arnaud’s talk, using importance splitting algorithms. Presenting an algorithm for simulating rare events that sounded like an opposite nested sampling, where the goal is to get down the target, rather than up. Pushing particles away from a current level of the target function with probability ½. Michela Ottobre completed the series with an entry into diffusion limits in the Roberts-Gelman-Gilks spirit when the Markov chain is not yet stationary. In the transient phase thus.

computational methods for statistical mechanics [day #2]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , on June 5, 2014 by xi'an

Arthur Seat, Edinburgh, Sep. 7, 2011

The last “tutorial” talk at this ICMS workshop ["at the interface between mathematical statistics and molecular simulation"] was given by Tony Lelièvre on adaptive bias schemes in Langevin algorithms and on the parallel replica algorithm. This was both very interesting because of the potential for connections with my “brand” of MCMC techniques and rather frustrating as I felt the intuition behind the physical concepts like free energy and metastability was almost within my reach! The most manageable time in Tony’s talk was the illustration of the concepts through a mixture posterior example. Example that I need to (re)read further to grasp the general idea. (And maybe the book on Free Energy Computations Tony wrote with Mathias Rousset et Gabriel Stoltz.) A definitely worthwhile talk that I hope will get posted on line by ICMS. The other talks of the day were mostly of a free energy nature, some using optimised bias in the Langevin diffusion (except for Pierre Jacob who presented his non-negative unbiased estimation impossibility result).

Follow

Get every new post delivered to your Inbox.

Join 644 other followers