Archive for Matlab
Tonight, I am invited to give a speed-presenting talk at the Paris Machine Learning last meeting of Season 2, with the themes of DL, Recovering Robots, Vowpal Wabbit, Predcsis, Matlab, and Bayesian test [by yours truly!] The meeting will take place in Jussieu, Amphi 25, Here are my slides for the meeting:
As it happened, the meeting was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I came to talk about three hours after the beginning, which was less than optimal for the most technical presentation of the evening. I actually wonder if I even managed to carry the main idea of replacing Bayes factors with posteriors of the mixture weight! [I had plenty of time to reflect upon this on my way back home as I had to wait for several and rare and crowded RER trains until one had enough room for me and my bike!]
Kate Lee pointed me to a rather surprising inefficiency in matlab, exploited in Sylvia Früwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…] Using different values for a [click on the graph] and a range of values of b. To no visible difference between both implementations, at least when using system.time for checking.
a=seq(.1,4,le=25) for (t in 1:25) a[t]=system.time( rgamma(10^7,.3,a[t])) a=a/system.time(rgamma(10^7,.3,1))
Once arrived home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I rerun an even bigger experiment as
a=seq(.1,4,le=25) for (t in 1:25) a[t]=system.time( rgamma(10^8,.3,a[t])) a=a/system.time(rgamma(10^8,.3))
and got the graph below. Which is much more interesting because it shows that some values of a are leading to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1., 5.8. No clear pattern emerging.)Update
As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!
Our newly created Chaire “Economie et gestion des nouvelles données” in Paris-Dauphine, ENS Ulm, École Polytechnique and ENSAE is recruiting a data scientist starting as early as May 1, the call remaining open till the position is filled. The location is in one of the above labs in Paris, the duration for at least one year, salary is varying, based on the applicant’s profile, and the contacts are Stephane Gaiffas (stephane.gaiffas AT cmap DOT polytechnique.fr), Robin Ryder (ryder AT ceremade DOT dauphine.fr). and Gabriel Peyré (peyre AT ceremade DOT dauphine.fr). Here are more details:
The chaire “Economie et gestion des nouvelles données” is recruiting a talented young engineer specialized in large scale computing and data processing. The targeted applications include machine learning, imaging sciences and finance. This is a unique opportunity to join a newly created research group between the best Parisian labs in applied mathematics and computer science (ParisDauphine, ENS Ulm, Ecole Polytechnique and ENSAE) working hand in hand with major industrial companies (Havas, BNP Paribas, Warner Bros.). The proposed position consists in helping researchers of the group to develop and implement large scale data processing methods, and applying these methods on real life problems in collaboration with the industrial partners.
A non exhaustive list of methods that are currently investigated by researchers of the group, and that will play a key role in the computational framework developed by the recruited engineer, includes :
● Large scale non smooth optimization methods (proximal schemes, interior points, optimization on manifolds).
● Machine learning problems (kernelized methods, Lasso, collaborative filtering, deep learning, learning for graphs, learning for timedependent systems), with a particular focus on large scale problems and stochastic methods.
● Imaging problems (compressed sensing, superresolution).
● Approximate Bayesian Computation (ABC) methods.
● Particle and Sequential Monte Carlo methods
The candidate should have a very good background in computer science with various programming environments (e.g. Matlab, Python, C++) and knowledge of high performance computing methods (e.g. GPU, parallelization, cloud computing). He/she should adhere to the open source philosophy and possibly be able to interact with the relevant communities (e.g. scikitlearn initiative). Typical curriculum includes engineering school or Master studies in computer science / applied maths / physics, and possibly a PhD (not required).
The recruited engineer will work within one of the labs of the chaire. He/she will benefit from a very stimulating working environment and all required computing resources. He/she will work in close interaction with the 4 research labs of the chaire, and will also have regular meetings with the industrial partners. More information about the chaire can be found online at http://www.di.ens.fr/~aspremon/chaire/
Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, Statistical Modeling and Computation is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am not the most appropriate or least
unbiased reviewer for this book. Continue reading
On the flight back from Warwick, I read a fairly recently arXived paper by Umberto Picchini and Julie Forman entitled “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study” that relates to earlier ABC works (and the MATLAB abc-sde package) by the first author (earlier works I missed). Among other things, the authors propose an acceleration device for ABC-MCMC: when simulating from the proposal, the Metropolis-Hastings acceptance probability can be computed and compared with a uniform rv prior to simulating pseudo-data. In case of rejection, the pseudo-data does not need to be simulated. In case of acceptance, it is compared with the observed data as usual. This is interesting for two reasons: first it always speeds up the algorithm. Second, it shows the strict limitations of ABC-MCMC, since the rejection takes place without incorporating the information contained in the data. (Even when the proposal incorporates this information, the comparison with the prior does not go this way.) This also relates to one of my open problems, namely how to simulate directly summary statistics without simulating the whole pseudo-dataset.
Another thing (related with acceleration) is that the authors use a simulated subsample rather than the simulated sample in order to gain time: this worries me somehow as the statistics corresponding to the observed data is based on the whole observed data. I thus wonder how both statistics could be compared, since they have different distributions and variabilities, even when using the same parameter value. Or is this a sort of pluggin/bootstrap principle, the true parameter being replaced with its estimator based on the whole data? Maybe this does not matter in the end (when compared with the several levels of approximation)…
My Paris colleague (and fellow-runner) Aurélien Garivier has produced an interesting comparison of 4 (or 6 if you consider scilab and octave as different from matlab) computer languages in terms of speed for producing the MLE in a hidden Markov model, using EM and the Baum-Welch algorithms. His conclusions are that
- matlab is a lot faster than R and python, especially when vectorization is important : this is why the difference is spectacular on filtering/smoothing, not so much on the creation of the sample;
- octave is a good matlab emulator, if no special attention is payed to execution speed…;
- scilab appears as a credible, efficient alternative to matlab;
- still, C is a lot faster; the inefficiency of matlab in loops is well-known, and clearly shown in the creation of the sample.
(In this implementation, R is “only” three times slower than matlab, so this is not so damning…) All the codes are available and you are free to make suggestions to improve the speed of of your favourite language!