Archive for optimisation

optimality stands in the eye of the riddler

Posted in Books, Kids, pictures, Statistics with tags , , , , , , on March 22, 2016 by xi'an

When looking at US elections on FiveThirtyEight, I came across this riddle:

Two players go (…) into separate booths (…) and a random number between zero and one appears on a screen (…) chosen from a standard uniform distribution. They can choose to keep that first number, or to (…) get a second random number, which they must keep (…) The prize (…) is awarded to the player who kept the higher number (…) Which number is the optimal cutoff for players to discard their first number and choose another? [The Riddler, Mar 4, 2016]

While the solution is now available, I wonder at the wording of this riddle, where “optimality” is not spelled out. Unless I missed some key entry, as it often happens with puzzles… Assuming both players use the same “optimal” cut-off C to make their decision to keep or drop the original uniform, the probability that one does better than the other is exactly ½ since both outcomes are iid. When considering the expected value of the number kept by a player, a simple integral shows that this value is

½(1-C²+C),

which is maximal for C=½. If one considers instead the median of the number Z kept by a player, a bit more computation leads to the function med(Z) = 1/2C when C²>1/2 and (½+C)/(1+C) when C²<1/2, which is maximal for C²=½.

“…using the golden ratio gives the best chance to win the gold bullion!” [The Riddler, Mar 4, 2016]

Incidentally (or not), the solution on 538 is also difficult to understand in that it starts with the event that the first draw is C, which is an event of probability zero. However, when trying to optimise the choice of C for one player, given that the other player has a known cuttoff of D, I found that the only case when C=D coincides with the solution proposed on Riddler, namely ½(√5-1). To check whether or not this derivation was correct, I also plotted the theoretical (right) versus the empirical (left) versions of the winning probability:

riddlerThere is no difference between the two. But the diagonal is exactly at the .5 level, as expected:riddlerinthestormwith an interesting second curve at the .5 probability level. These two level sets happen to meet at the “golden number” solution, ½(√5-1), which comes as a confirmation of my earlier remark. [Another question connected with this riddle would be to figure out the value of D used by the other player from a sequence of games.]

Approximate Maximum Likelihood Estimation

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , , , on September 21, 2015 by xi'an

linz3Bertl et al. arXived last July a paper on a maximum likelihood estimator based on an alternative to ABC techniques. And to indirect inference. (One of the authors in et al. is Andreas Futschik whom I visited last year in Linz.) Paper that I only spotted when gathering references for a reading list on ABC… The method is related to the “original ABC paper” of Diggle and Gratton (1984) which, parallel to Rubin (1984), contains in retrospect the idea of ABC methods. The starting point is stochastic approximation, namely the optimisation of a function of a parameter θ when written as an expectation of a random variable Y, E[Y|θ], as in the Kiefer-Wolfowitz algorithm. However, in the case of the likelihood function, there is rarely an unbiased estimator and the authors propose instead to use a kernel density estimator of the density of the summary statistic. This means that, at each iteration of the Kiefer-Wolfowitz algorithm, two sets of observations and hence of summary statistics are simulated and two kernel density estimates derived, both to be applied to the observed summary. The sequences underlying the Kiefer-Wolfowitz algorithm are taken from (the excellent optimisation book of) Spall (2003). Along with on-the-go adaptation and convergence test.

The theoretical difficulty in this extension is however that the kernel density estimator is not unbiased and thus that, rigorously speaking, the validation of the Kiefer-Wolfowitz algorithm does not apply here. On the practical side, the need for multiple starting points and multiple simulations of pseudo-samples may induce considerable time overload. Especially if  bootstrap is used to evaluate the precision of the MLE approximation. Besides normal and M/G/1 queue examples, the authors illustrate the approach on a population genetic dataset of Borneo and Sumatra orang-utans. With 5 parameters and 28 summary statistics. Which thus means using a kernel density estimator in dimension 28, a rather perilous adventure..!

future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags , , , , , , , , , , , , , , on September 29, 2014 by xi'an

I am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing, The breadth and quality of talks at MCMski IV is testimony to this. A second trend that is not unrelated to the first one is the development of new and the rehabilitation of older techniques to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being an healthy questioning of the models themselves. As illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision…  A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestion of novel directions in computational statistics or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics

variational particle approximations

Posted in Mountains, pictures, Statistics, Travel, University life with tags , , , , , , , on February 28, 2014 by xi'an

IMG_2515In the plane to Montréal, today, I read this paper by Kulkarni, Saeedi and Gershman, which will be presented at AISTATS. The main idea is to create a mix between particle Monte Carlo and a kind of quasi-Monte Carlo technique (qNC is not mentionned in the paper), using variational inference (and coordinate ascent) to optimise the location and weight of the particles. It is however restricted to cases with finite support (as a product of N latent variables) as in an HMM with a finite state space. There is also something I do not get in the HMM case, which is that the variational approximation to the filtering is contracted sequentially. This means that at time the K highest weight current particles are selected while the past remains unchanged. Is this due to the Markovian nature of the hidden model? (Blame oxygen deprivation, altitude dizziness or travelling stress, then!) I also fail to understand how for filtering, “at each time step, the algorithm selects the K continuations (new variable assignments of the current particle set) that maximize the variational free energy.” Because the weight function to be optimised (eqn (11)) seems to freeze the whole past path of particles… I presume I will find an opportunity while in Reykjavik to discuss those issues with the authors.

Foundations of Statistical Algorithms [book review]

Posted in Books, Linux, R, Statistics, University life with tags , , , , , , , , , , , , , on February 28, 2014 by xi'an

There is computational statistics and there is statistical computing. And then there is statistical algorithmic. Not the same thing, by far. This 2014 book by Weihs, Mersman and Ligges, from TU Dortmund, the later being also a member of the R Core team, stands at one end of this wide spectrum of techniques required by modern statistical analysis. In short, it provides the necessary skills to construct statistical algorithms and hence to contribute to statistical computing. And I wish I had the luxury to teach from Foundations of Statistical Algorithms to my graduate students, if only we could afford an extra yearly course…

“Our aim is to enable the reader (…) to quickly understand the main ideas of modern numerical algorithms [rather] than having to memorize the current, and soon to be outdated, set of popular algorithms from computational statistics.”(p.1)

The book is built around the above aim, first presenting the reasons why computers can produce answers different from what we want, using least squares as a mean to check for (in)stability, then second establishing the ground forFishman Monte Carlo methods by discussing (pseudo-)random generation, including MCMC algorithms, before moving in third to bootstrap and resampling techniques, and  concluding with parallelisation and scalability. The text is highly structured, with frequent summaries, a division of chapters all the way down to sub-sub-sub-sections, an R implementation section in each chapter, and a few exercises. Continue reading

optimisation via slice sampling

Posted in Statistics with tags , , , , on December 20, 2012 by xi'an

simulated annealing path on a multimodal surface (c) Robert and Casella, 2007This morning, over breakfast; I read the paper recently arXived by John Birge et Nick Polson. I was intrigued by the combination of optimisation and of slice sampling, but got a wee disappointed by the paper, in that it proposes a simple form of simulated annealing, minimising f(x) by targeting a small collection of energy functions exp{-τf(x)}. Indeed, the slice sampler is used to explore each of those targets, i.e. for a fixed temperature τ. For the four functions considered in the paper, a slice sampler can indeed be implemented, but this feature could be seen as a marginalia, given that a random walk Metropolis-Hastings algorithm could be used as a proposal mechanism in other cases. The other intriguing fact is that annealing is not used in the traditional way of increasing the coefficient τ along iterations (as in our SAME algorithm), for which convergence issues are much more intricate, but instead stays at the level where a whole (Markov) sample is used for each temperature, the outcomes being then compared in terms of localisation (and maybe for starting at the next temperature value). I did not see any discussion about the selection of the successive temperatures, which usually is a delicate issue in realistic settings, nor of the stopping rule(s) used to decide the maximum has been reached.

Special Issue of ACM TOMACS on Monte Carlo Methods in Statistics

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , on December 10, 2012 by xi'an

As posted here a long, long while ago, following a suggestion from the editor (and North America Cycling Champion!) Pierre Lécuyer (Université de Montréal), Arnaud Doucet (University of Oxford) and myself acted as guest editors for a special issue of ACM TOMACS on Monte Carlo Methods in Statistics. (Coincidentally, I am attending a board meeting for TOMACS tonight in Berlin!) The issue is now ready for publication (next February unless I am confused!) and made of the following papers:

* Massive parallelization of serial inference algorithms for a complex generalized linear model
MARC A. SUCHARD, IVAN ZORYCH, PATRICK RYAN, DAVID MADIGAN
*Convergence of a Particle-based Approximation of the Block Online Expectation Maximization Algorithm
SYLVAIN LE CORFF and GERSENDE FORT
* Efficient MCMC for Binomial Logit Models
AGNES FUSSL, SYLVIA FRÜHWIRTH-SCHNATTER, RUDOLF FRÜHWIRTH
* Adaptive Equi-Energy Sampler: Convergence and Illustration
AMANDINE SCHRECK and GERSENDE FORT and ERIC MOULINES
* Particle algorithms for optimization on binary spaces
CHRISTIAN SCHÄFER
* Posterior expectation of regularly paved random histograms
RAAZESH SAINUDIIN, GLORIA TENG, JENNIFER HARLOW, and DOMINIC LEE
* Small variance estimators for rare event probabilities
MICHEL BRONIATOWSKI and VIRGILE CARON
* Self-Avoiding Random Dynamics on Integer Complex Systems
FIRAS HAMZE, ZIYU WANG, and NANDO DE FREITAS
* Bayesian learning of noisy Markov decision processes
SUMEETPAL S. SINGH, NICOLAS CHOPIN, and NICK WHITELEY

Here is the draft of the editorial that will appear at the beginning of this special issue. (All faults are mine, of course!) Continue reading

Follow

Get every new post delivered to your Inbox.

Join 1,077 other followers