Archive for optimisation

variational particle approximations

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , on February 28, 2014 by xi'an

IMG_2515In the plane to Montréal, today, I read this paper by Kulkarni, Saeedi and Gershman, which will be presented at AISTATS. The main idea is to create a mix between particle Monte Carlo and a kind of quasi-Monte Carlo technique (qNC is not mentionned in the paper), using variational inference (and coordinate ascent) to optimise the location and weight of the particles. It is however restricted to cases with finite support (as a product of N latent variables) as in an HMM with a finite state space. There is also something I do not get in the HMM case, which is that the variational approximation to the filtering is contracted sequentially. This means that at time the K highest weight current particles are selected while the past remains unchanged. Is this due to the Markovian nature of the hidden model? (Blame oxygen deprivation, altitude dizziness or travelling stress, then!) I also fail to understand how for filtering, “at each time step, the algorithm selects the K continuations (new variable assignments of the current particle set) that maximize the variational free energy.” Because the weight function to be optimised (eqn (11)) seems to freeze the whole past path of particles… I presume I will find an opportunity while in Reykjavik to discuss those issues with the authors.

Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life with tags , , , , , , , , , , , , , on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersman and Ligges, from TU Dortmund, the later being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.”(p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a mean to check for (in)stability, then second establishing the ground forFishman Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and  concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises. Continue reading

optimisation via slice sampling

Posted in Statistics with tags , , , , on December 20, 2012 by xi'an

simulated annealing path on a multimodal surface (c) Robert and Casella, 2007This morning, over breakfast; I read the paper recently arXived by John Birge et Nick Polson. I was intrigued by the combination of optimisation and of slice sampling, but got a wee disappointed by the paper, in that it proposes a simple form of simulated annealing, minimising f(x) by targeting a small collection of energy functions exp{-τf(x)}. Indeed, the slice sampler is used to explore each of those targets, i.e. for a fixed temperature τ. For the four functions considered in the paper, a slice sampler can indeed be implemented, but this feature could be seen as a marginalia, given that a random walk Metropolis-Hastings algorithm could be used as a proposal mechanism in other cases. The other intriguing fact is that annealing is not used in the traditional way of increasing the coefficient τ along iterations (as in our SAME algorithm), for which convergence issues are much more intricate, but instead stays at the level where a whole (Markov) sample is used for each temperature, the outcomes being then compared in terms of localisation (and maybe for starting at the next temperature value). I did not see any discussion about the selection of the successive temperatures, which usually is a delicate issue in realistic settings, nor of the stopping rule(s) used to decide the maximum has been reached.

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre Lécuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and myself acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February unless I am confused!) and made of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
MARC A. SUCHARD, IVAN ZORYCH, PATRICK RYAN, DAVID MADIGAN
*Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
SYLVAIN LE CORFF and GERSENDE FORT
* Efficient MCMC for Binomial Logit Models
AGNES FUSSL, SYLVIA FRÜHWIRTH-SCHNATTER, RUDOLF FRÜHWIRTH
* Adaptive Equi-Energy Sampler: Convergence and Illustration
AMANDINE SCHRECK and GERSENDE FORT and ERIC MOULINES
* Particle algorithms for optimization on binary spaces
CHRISTIAN SCHÄFER
* Posterior expectation of regularly paved random histograms
RAAZESH SAINUDIIN, GLORIA TENG, JENNIFER HARLOW, and DOMINIC LEE
* Small variance estimators for rare event probabilities
MICHEL BRONIATOWSKI and VIRGILE CARON
* Self-Avoiding Random Dynamics on Integer Complex Systems
FIRAS HAMZE, ZIYU WANG, and NANDO DE FREITAS
* Bayesian learning of noisy Markov decision processes
SUMEETPAL S. SINGH, NICOLAS CHOPIN, and NICK WHITELEY

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!) Continue reading

more typos in Monte Carlo statistical methods

Posted in Books, Statistics, University life with tags , , , , , , , , , on October 28, 2011 by xi'an

Jan Hanning kindly sent me this email about several difficulties with Chapters 3, Monte Carlo Integration, and  5, Monte Carlo Optimization, when teaching out of our book Monte Carlo Statistical Methods [my replies in italics between square brackets, apologies for the late reply and posting, as well as for the confusion thus created. Of course, the additional typos will soon be included in the typo lists on my book webpage.]:

  1. I seem to be unable to reproduce Table 3.3 on page 88 – especially the chi-square column does not look quite right. [No, they definitely are not right: the true χ² quantiles should be 2.70, 3.84, and 6.63, at the levels 0.1, 0.05, and 0.01, respectively. I actually fail to understand how we got this table that wrong...]
  2. The second question  I have is the choice of the U(0,1) in this Example 3.6. It  feels to me that a choice of Beta(23.5,18.5) for p1 and Beta(36.5,5.5) for p2 might give a better representation based on the data we have. Any comments? [I am plainly uncertain about this... Yours is the choice based on the posterior Beta coefficient distributions associated with Jeffreys prior, hence making the best use of the data. I wonder whether or not we should remove this example altogether... It is certainly "better" than the uniform. However, in my opinion, there is no proper choice for the distribution of the pi's because we are mixing there a likelihood-ratio solution with a Bayesian perspective on the predictive distribution of the likelihood-ratio. If anything, this exposes the shortcomings of a classical approach, but it is likely to confuse the students! Anyway, this is a very interesting problem.]
  3. My students discovered that Problem 5.19 has the following typos, copying from their e-mail: “x_x” should be “x_i” [sure!]. There are a few “( )”s missing here and there [yes!]. Most importantly, the likelihood/density seems incorrect. The normalizing constant should be the reciprocal of the one showed in the book [oh dear, indeed, the constant in the exponential density did not get to the denominator...]. As a result, all the formulas would differ except the ones in part (a). [they clearly need to be rewritten, sorry about this mess!]
  4. I am unsure about the if and only if part of the Theorem 5.15 [namely that the likelihood sequence is stationary if and only if the Q function in the E step has reached a stationary point]. It appears to me that a condition for the “if part” is missing [the "only if" part is a direct consequence of Jensen's inequality]. Indeed Theorem 1 of Dempster et al 1977 has an extra condition [note that the original proof for convergence of EM has a flaw, as discussed here]. Am I missing something obvious? [maybe: it seems to me that, once Q reaches a fixed point, the likelihood L does not change... It is thus tautological, not a proof of convergence! But the theorem says a wee more, so this needs investigating. As Jan remarked, there is no symmetry in the Q function...]
  5. Should there be a (n-m) in the last term of formula (5.17)? [yes, indeed!, multiply the last term by (n-m)]
  6. Finally, I am a bit confused about the likelihood in Example 5.22 [which is a capture-recapture model]. Assume that Hij=k [meaning the animal i is in state k at time j]. Do you assume that you observed Xijr [which is the capture indicator for animal i at time j in zone k: it is equal to 1 for at most one k] as a Binomial B(n,pr) even for r≠k? [no, we observe all Xijr's with r≠k equal to zero]  The nature of the problem seems to suggest that the answer is no [for other indices, Xijr is always zero, indeed] If that is the case I do not see where the power on top of (1-pk) in the middle of the page 185 comes from [when the capture indices are zero, they do not contribute to the sum, which explains for this condensed formula. Therefore, I do not think there is anything wrong with this over-parameterised representation of the missing variables.]
  7. In Section 5.3.4, there seems to be a missing minus sign in the approximation formula for the variance [indeed, shame on us for missing the minus in the observed information matrix!]
  8. I could not find the definition of \mathbb{N}^* in Theorem 6.15. Is it all natural numbers or all integers? May be it would help to include it in Appendix B. [Surprising! This is the set of all positive integers, I thought this was a standard math notation...]
  9. In Definition 6.27, you probably want to say covering of A and not X. [Yes, we were already thinking of the next theorem, most likely!]
  10. In Proposition 6.33 –   all x in A instead of all x in X. [Yes, again! As shown in the proof. Even though it also holds for all x in X]

Thanks a ton to Jan and to his UNC students (and apologies for leading them astray with those typos!!!)

nlm [unused argument(s) (iter = 1)]

Posted in Books, R, Statistics with tags , , , , , on December 29, 2010 by xi'an

Ashley put the following comment on Chapter 5 of Introducing Monte Carlo Methods with R”:

I am reading chapter 5. I try to reproduced the result on page 128. The R codes don’t work on my laptop. When I try to run the following codes on page 128

> for (i in 1:(nlm(like,sta)$it)){
+ mmu=rbind(mmu,nlm(like,sta,iter=i)$est)}

I always get the error message

Error in f(x, …) : unused argument(s) (iter = 1)

It seems that the nlm function doesn’t accept the argument iter. I don’t know how to deal with it. I am in US. I guess the nlm version available to US R users is different from the version in EU. Please help.

And indeed with the most recent versions of R, like 2.12.1 on my own machine, calling nlm

> args(nlm)
function (f, p, ..., hessian = FALSE, typsize = rep(1, length(p)),
 fscale = 1, print.level = 0, ndigit = 12, gradtol = 1e-06,
 stepmax = max(1000 * sqrt(sum((p/typsize)^2)), 1000), steptol = 1e-06,
 iterlim = 100, check.analyticals = TRUE)

with the abbreviated argument iter instead of iterlim produces the above error message. This means the full syntax iterlim=i should now be used. In addition, the function nlm produces the minimum of the first argument f and like should thus be defined as

> like=function(mu){
+   -sum(log((.25*dnorm(da-mu[1])+.75*dnorm(da-mu[2]))))}

to end up with local maxima as on Figure 5.2. (Note: I do not think there are US versus EU versions of R…)

Ashley also pointed out another mistake on that page, namely that we used

> da=rbind(rnorm(10^2),2.5+rnorm(3*10^2))

instead of

> da=c(rnorm(10^2),2.5+rnorm(3*10^2))

to create a sample. Since the two normal samples have different sizes, rbind induces a duplication of the smaller sample, not what we intended!

València 9 snapshot [1]

Posted in Mountains, Running, Statistics, Travel, University life with tags , , , , , on June 5, 2010 by xi'an

Last morning, I attended the talks of Michael Goldstein and Herbie Lee, which were very interesting from very different perspectives. Michael talked about computer models, like the climate models that have been so much attacked recently for being “unrealistic”. The difficulty is obviously in dealing with the fact that the model is incorrect, what Michael calls external uncertainty. As statisticians, we are trained to deal with internal uncertainties, i.e. those conditional on the model. Michael did not propose a generic solution to this difficult problem, but he presented a series of principles towards this goal and his paper in the proceeedings (I have not [yet] read) contains examples of conducting this assessment. (I am not sure building a [statistical] model on top of the current [physical] models stands a chance to convince climato-skeptics, but this is interesting nonetheless.) Herbie addressed a completely different problem, namely the maximisation of a function under constraints when the constraints are partly unknown. (Think of a set whose boundaries are not precisely known.) This was a problem new to me and I plan to read the paper asap, as the design perspective added to the maximisation per se is made in order to decide about the worth of making new [costly] evaluations of the function to maximise.

Otherwise, the morning was spent in a fruitless pursuit of a wireless connection in the hotel where the conference takes place, as so many people were trying to connect at the same time! I eventually resolved the issue by crossing the road to an internet café and renting an ethernet cable for one hour. The hotel is unsurprisingly the soulless and unhelpful place I expected and I do not find any appeal in the high rise landscape constituting the neighbourhood. There is however a small track in the bush nearby that makes for a good running place in the early morning. (Finding a cliff that is both bolted and in the shade is going to prove a challenge!)

Follow

Get every new post delivered to your Inbox.

Join 644 other followers