Archive for stochastic optimisation
far south
Posted in Books, Statistics, Travel, University life with tags Biometrika, Computo, ENS Paris-Saclay, habilitation, HDR, MCMC convergence, overdamped Langevin algorithm, PDMP, Saclay, stochastic gradient descent, stochastic optimisation, unadjusted Langevin algorithm, Université Paris-Saclay on February 23, 2022 by xi'an

ICM 2018
Posted in pictures, Statistics, Travel, University life with tags deep learning, ICM 2018, International Congress of Mathematicians, Maria Esteban, Michael Jordan, Rio de Janeiro, stochastic optimisation, Université Paris Dauphine on August 4, 2018 by xi'an

While I am not following the International Congress of Mathematicians which just started in Rio, and even less attending, I noticed an entry on their webpage about my friend and colleague Maria Esteban, which I would have liked to repost verbatim but cannot figure out how. (ICM 2018 also features a plenary lecture by Michael Jordan on gradient-based optimisation [which was also Michael's topic at ISBA 2018] and another by Sanjeev Arora on the maths of deep learning, two talks broadly related to statistics, which is presumably a première at this highly selective maths conference!)
X entropy for optimisation
Posted in Books, pictures, Statistics, Travel, University life with tags cross-entropy method, Gregynog Statistical Conference, Monte Carlo Statistical Methods, Reuven Rubinstein, sheep, simulated annealing, stochastic optimisation, stochastic simulation, sudoku, travelling salesman, Tregynon, Wales on March 29, 2018 by xi'an

At Gregynog, with mounds of snow still visible in the surrounding hills, not to be confused with the many sheep dotting the fields(!), Owen Jones gave a three-hour lecture on simulation for optimisation, a less travelled path when compared with simulation for integration. His second lecture covered cross entropy for optimisation purposes. (I had forgotten that Reuven Rubinstein and Dirk Kroese had put forward this aspect of their technique in the very title of their book, as "A Unified Approach to Combinatorial Optimization, Monte-Carlo Simulation and Machine Learning".) The X entropy approach pushes for simulations restricted to the top values of the target function, iterating to find the best parameter in the parametric family used for the simulation. (Best to be understood in the Kullback sense.) Now, this is a wee bit like simulated annealing, where lots of artificial entities have to be calibrated in the algorithm, the original problem being unrelated to a specific stochastic framework. X entropy facilitates concentration on the highest values of the target, but requires a family of probability distributions that puts weight on the top region, which may be a damning issue in large dimensions. Owen illustrated the approach in the case of the travelling salesman problem, where the parameterised distribution is a Markov chain on the state space of city sequences. Further, if the optimal value of the target is unknown, avoiding getting stuck in a local optimum may be tricky. (Owen presented a proof of convergence for a temperature decreasing to zero slowly enough that it amounts to a sure exploration of the entire state space, in a discrete setting, which offers little reassurance in this respect, as the corresponding algorithm cannot be implemented.) This method falls into the range of methods that are doubly stochastic, in that they rely on Monte Carlo approximations at each iteration of the exploration algorithm.
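As an aside, here is a minimal R sketch of the cross-entropy recipe on a toy continuous maximisation problem, with a Gaussian parametric family whose mean and scale are refitted at each iteration on the elite fraction of the simulations; the objective function and tuning constants below are purely illustrative assumptions of mine, not Owen's example.

# toy cross-entropy maximisation with a Gaussian family (illustrative settings only)
f <- function(x) -sum((x - c(2, -1))^2) + cos(5 * x[1])   # arbitrary objective on R^2
d <- 2; N <- 1e3; rho <- 0.1                               # sample size and elite fraction
mu <- rep(0, d); sg <- rep(3, d)                           # initial parameters of the family
for (t in 1:50) {
  X <- matrix(rnorm(N * d, mean = mu, sd = sg), N, d, byrow = TRUE)
  y <- apply(X, 1, f)
  elite <- X[order(y, decreasing = TRUE)[1:ceiling(rho * N)], , drop = FALSE]
  mu <- colMeans(elite)                  # Kullback-optimal refit within the Gaussian family
  sg <- apply(elite, 2, sd) + 1e-6       # small floor to avoid premature degeneracy
}
mu                                       # approximate maximiser of f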
During a later talk, I tried to recycle one of my earlier R codes on simulated annealing for sudokus, but could not find a useful family of proposal distributions to reach the (unique) solution. Using a mere product of distributions on each of the free positions in the sudoku grid only led me to a penalty of 13 errors…
1 2 8 5 9 7 4 9 3
7 3 5 1 2 4 6 2 8
4 6 9 6 3 8 5 7 1
2 7 5 3 1 6 9 4 8
8 1 4 7 8 9 7 6 2
6 9 3 8 4 2 1 3 5
3 8 6 4 7 5 2 1 9
1 4 2 9 6 3 8 5 7
9 5 7 2 1 8 3 4 6
It is hard to consider a distribution on the space of permutations, 𝔖₈₁.
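For the record, here is a minimal R sketch of the kind of penalty I have in mind, counting duplicated digits within each row, column, and 3×3 box; this counting convention is my own reconstruction rather than the code used at the time, but it does return 13 on the grid above.

# count duplicated digits in each row, column and 3x3 box of a 9x9 grid
# (my own reconstruction of the penalty, not the original R code)
penalty <- function(g) {
  dup <- function(v) sum(table(v) - 1)                  # repeats beyond the first in a block of nine
  box <- function(i, j) g[(3 * i - 2):(3 * i), (3 * j - 2):(3 * j)]
  sum(apply(g, 1, dup)) + sum(apply(g, 2, dup)) +
    sum(sapply(1:3, function(i) sapply(1:3, function(j) dup(box(i, j)))))
}
g <- matrix(c(1,2,8,5,9,7,4,9,3, 7,3,5,1,2,4,6,2,8, 4,6,9,6,3,8,5,7,1,
              2,7,5,3,1,6,9,4,8, 8,1,4,7,8,9,7,6,2, 6,9,3,8,4,2,1,3,5,
              3,8,6,4,7,5,2,1,9, 1,4,2,9,6,3,8,5,7, 9,5,7,2,1,8,3,4,6),
            9, 9, byrow = TRUE)
penalty(g)  # returns 13 for the grid above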
journée algorithmes stochastiques à Dauphine vendredi
Posted in Statistics, University life with tags CEREMADE, computational statistics, MCMC, Paris, sequential Monte Carlo, Statistics, stochastic algorithms, stochastic optimisation, tea, Université Paris Dauphine, workshop on November 28, 2017 by xi'an

A final reminder (?) that we hold a special day of five talks around stochastic algorithms at Dauphine this Friday, from 10:00 till 17:30. Attendance is free, coffee and tea are free (while they last!), come and join us!
SAME but different
Posted in Statistics, University life with tags data cloning, document analysis, map, Monte Carlo Statistical Methods, parallel MCMC, SAME, simulated annealing, simulation, stochastic optimisation, variational Bayes methods on October 27, 2014 by xi'an

After several clones of our SAME algorithm appeared in the literature, it is rather fun to see another paper acknowledging the connection. SAME but different was arXived today by Zhao, Jiang and Canny. The point of this short paper is to show that a parallel implementation of SAME achieves efficient performance when compared with existing standards. Since the duplicated latent variables are independent [given θ], they can be simulated in parallel. The authors further assume independence between the components of those latent variables, as well as finite support, as in document analysis, so that the replicated latent variables can all be sampled at once. Parallelism is thus used solely for the components of the latent variable(s). SAME is normally associated with an annealing schedule, but the authors could not detect an improvement over a fixed and large number of replications. They report gains comparable to state-of-the-art variational Bayes on two large datasets. Quite fun to see SAME getting a new life thanks to computer scientists!
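For readers who have forgotten the SAME principle, here is a minimal and hedged R sketch on a toy problem of my own choosing (a two-component Gaussian mixture with unit variances and known equal weights, nothing like the document-analysis setting of the paper): the allocation vector is replicated k times, each replica is updated independently given the means, and the means are then updated given all replicas, so that the marginal in the means is the posterior raised to the power k.

# hedged toy sketch of SAME with a fixed number k of replicated allocations
# (my own illustration, not the setting or code of Zhao, Jiang and Canny)
set.seed(1)
x <- c(rnorm(100, -2), rnorm(100, 3))      # simulated 50/50 mixture data, unit variances
n <- length(x); k <- 10                    # k replicas of the latent allocations
mu <- c(-1, 1); s0 <- 10                   # current means and N(0, s0^2) prior on each mean
for (t in 1:500) {
  cnt <- c(0, 0); sm <- c(0, 0)            # counts and sums cumulated over the k replicas
  for (j in 1:k) {                         # replicas are independent given mu: parallelisable
    p1 <- dnorm(x, mu[1]) / (dnorm(x, mu[1]) + dnorm(x, mu[2]))
    z  <- rbinom(n, 1, p1)                 # allocation to component 1
    cnt <- cnt + c(sum(z), n - sum(z))
    sm  <- sm + c(sum(x[z == 1]), sum(x[z == 0]))
  }
  prec <- k / s0^2 + cnt                   # the prior is raised to the power k as well
  mu <- rnorm(2, mean = sm / prec, sd = 1 / sqrt(prec))
}
mu                                         # close to the marginal MAP of the means for large k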