Archive for Matlab

postdoc in Bayesian machine learning in Berlin [reposted]

Posted in R, Statistics, Travel, University life with tags , , , , , , , , , , , , on December 24, 2019 by xi'an

The working group of Statistics at Humboldt University of Berlin invites applications for one Postdoctoral research fellow (full-time employment, 3 years with extension possible) to contribute to the research on mathematical and statistical aspects of (Bayesian) learning approaches. The research positions are associated with the Emmy Noether group Regression Models beyond the Mean – A Bayesian Approach to Machine Learning and working group of Applied Statistics at the School of Business and Economics at Humboldt-Universität Berlin. Opportunities for own scientific qualification (PhD)/career development are provided, see an overview and further links. The positions are to be filled at the earliest possible date and funded by the German Research Foundation (DFG) within the Emmy Noether programme.

Requirements:
– an outstanding PhD in Statistics, Mathematics, or related field with specialisation in Statistics, Data Science or Mathematics;
– a strong background in at least one of the following fields: mathematical statistics, computational methods, Bayesian statistics, statistical learning, advanced regression modelling;
– a thorough mathematical understanding.
– substantial experience in scientific programming with Matlab, Python, C/C++, R or similar;
– strong interest in developing novel statistical methodology and its applications in various fields such as economics or natural and life sciences;
– a very good communication skills and team experience, proficiency of the written and spoken English language (German is not obligatory).

Opportunities:
We offer the unique environment of young researchers and leading international experts in the fields. The vibrant international network includes established collaborations in Singapore and Australia. The positions offer potential to closely work with several applied sciences. Information about the research profile of the research group and further contact details can be found here. The positions are paid according to the Civil Service rates of the German States “TV-L”, E13 (if suitably qualified).

Applications should include:
– a CV with list of publications
– a motivational statement (at most one page) explaining the applicant’s interest in the announced position as well as their relevant skills and experience
– copies of degrees/university transcripts
– names and email addresses of at least two professors that may provide letters of recommendation directly to the hiring committee Applications should be sent as a single PDF file to: Prof. Dr. Nadja Klein (nadja.klein[at]hu-berlin.de), whom you may also contact for questions concerning this job post. Please indicate “Research Position Emmy Noether”.

Application deadline: 31st of January 2020

HU is seeking to increase the proportion of women in research and teaching, and specifically encourages qualified female scholars to apply. Severely disabled applicants with equivalent qualifications will be given preferential consideration. People with an immigration background are specifically encouraged to apply. Since we will not return your documents, please submit copies in the application only.

Matlab goes deep [learning]

Posted in Books, pictures, R, Statistics, University life with tags , , on September 5, 2016 by xi'an

deepearningsA most interesting link I got when reading Le Monde, about MatLab proposing deep learning tools…

Paris Machine Learning Meeting #10 Season 2

Posted in Books, Kids, pictures, Statistics, University life with tags , , , , , , , , , , on June 17, 2015 by xi'an

Invalides, Paris, May 8, 2012

Tonight, I am invited to give a speed-presenting talk at the Paris Machine Learning last meeting of Season 2, with the themes of DL, Recovering Robots, Vowpal Wabbit, Predcsis, Matlab, and Bayesian test [by yours truly!] The meeting will take place in Jussieu, Amphi 25, Here are my slides for the meeting:

As it happened, the meeting  was quite crowded with talks and plagued with technical difficulties in transmitting talks from Berlin and Toronto, so I came to talk about three hours after the beginning, which was less than optimal for the most technical presentation of the evening. I actually wonder if I even managed to carry the main idea of replacing Bayes factors with posteriors of the mixture weight! [I had plenty of time to reflect upon this on my way back home as I had to wait for several and rare and crowded RER trains until one had enough room for me and my bike!]

scale acceleration

Posted in pictures, R, Statistics, Travel, University life with tags , , , , , , , , on April 24, 2015 by xi'an

thalysKate Lee pointed me to a rather surprising inefficiency in matlab, exploited in Sylvia Früwirth-Schnatter’s bayesf package: running a gamma simulation by rgamma(n,a,b) takes longer and sometimes much longer than rgamma(n,a,1)/b, the latter taking advantage of the scale nature of b. I wanted to check on my own whether or not R faced the same difficulty, so I ran an experiment [while stuck in a Thalys train at Brussels, between Amsterdam and Paris…] Using different values for a [click on the graph] and a range of values of b. To no visible difference between both implementations, at least when using system.time for checking.

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
     rgamma(10^7,.3,a[t]))[3]
a=a/system.time(rgamma(10^7,.3,1))[3]

Once arrived home, I wondered about the relevance of the above comparison, since rgamma(10^7,.3,1) forces R to use 1 as a scale, which may differ from using rgamma(10^7,.3), where 1 is known to be the scale [does this sentence make sense?!]. So I rerun an even bigger experiment as

a=seq(.1,4,le=25)
for (t in 1:25) a[t]=system.time(
     rgamma(10^8,.3,a[t]))[3]
a=a/system.time(rgamma(10^8,.3))[3]

and got the graph below. Which is much more interesting because it shows that some values of a are leading to a loss of efficiency of 50%. Indeed. (The most extreme cases correspond to a=0.3, 1.1., 5.8. No clear pattern emerging.)thalys2Update

As pointed out by Martyn Plummer in his comment, the C function behind the R rgamma function and Gamma generator does take into account the scale nature of the second parameter, so the above time differences are not due to this function but rather to whatever my computer was running at the same time…! Apologies to anyone I scared with this void warning!

data scientist position

Posted in R, Statistics, University life with tags , , , , , , , , , , on April 8, 2014 by xi'an

Université Paris-DauphineOur newly created Chaire “Economie et gestion des nouvelles données” in Paris-Dauphine, ENS Ulm, École Polytechnique and ENSAE is recruiting a data scientist starting as early as May 1, the call remaining open till the position is filled. The location is in one of the above labs in Paris, the duration for at least one year, salary is varying, based on the applicant’s profile, and the contacts are Stephane Gaiffas (stephane.gaiffas AT cmap DOT polytechnique.fr), Robin Ryder (ryder AT ceremade DOT dauphine.fr). and Gabriel Peyré (peyre AT ceremade DOT dauphine.fr). Here are more details:

Job description

The chaire “Economie et gestion des nouvelles données” is recruiting a talented young engineer specialized in large scale computing and data processing. The targeted applications include machine learning, imaging sciences and finance. This is a unique opportunity to join a newly created research group between the best Parisian labs in applied mathematics and computer science (ParisDauphine, ENS Ulm, Ecole Polytechnique and ENSAE) working hand in hand with major industrial companies (Havas, BNP Paribas, Warner Bros.). The proposed position consists in helping researchers of the group to develop and implement large scale data processing methods, and applying these methods on real life problems in collaboration with the industrial partners.

A non exhaustive list of methods that are currently investigated by researchers of the group, and that will play a key role in the computational framework developed by the recruited engineer, includes :
● Large scale non smooth optimization methods (proximal schemes, interior points, optimization on manifolds).
● Machine learning problems (kernelized methods, Lasso, collaborative filtering, deep learning, learning for graphs, learning for timedependent systems), with a particular focus on large scale problems and stochastic methods.
● Imaging problems (compressed sensing, superresolution).
● Approximate Bayesian Computation (ABC) methods.
● Particle and Sequential Monte Carlo methods

Candidate profile

The candidate should have a very good background in computer science with various programming environments (e.g. Matlab, Python, C++) and knowledge of high performance computing methods (e.g. GPU, parallelization, cloud computing). He/she should adhere to the open source philosophy and possibly be able to interact with the relevant communities (e.g. scikitlearn initiative). Typical curriculum includes engineering school or Master studies in computer science / applied maths / physics, and possibly a PhD (not required).

Working environment

The recruited engineer will work within one of the labs of the chaire. He/she will benefit from a very stimulating working environment and all required computing resources. He/she will work in close interaction with the 4 research labs of the chaire, and will also have regular meetings with the industrial partners. More information about the chaire can be found online at http://www.di.ens.fr/~aspremon/chaire/

Statistical modeling and computation [book review]

Posted in Books, R, Statistics, University life with tags , , , , , , , , , , , , , on January 22, 2014 by xi'an

Dirk Kroese (from UQ, Brisbane) and Joshua Chan (from ANU, Canberra) just published a book entitled Statistical Modeling and Computation, distributed by Springer-Verlag (I cannot tell which series it is part of from the cover or frontpages…) The book is intended mostly for an undergrad audience (or for graduate students with no probability or statistics background). Given that prerequisite, Statistical Modeling and Computation is fairly standard in that it recalls probability basics, the principles of statistical inference, and classical parametric models. In a third part, the authors cover “advanced models” like generalised linear models, time series and state-space models. The specificity of the book lies in the inclusion of simulation methods, in particular MCMC methods, and illustrations by Matlab code boxes. (Codes that are available on the companion website, along with R translations.) It thus has a lot in common with our Bayesian Essentials with R, meaning that I am not the most appropriate or least unbiased reviewer for this book. Continue reading

accelerated ABC

Posted in R, Statistics, Travel, University life with tags , , , , , on October 17, 2013 by xi'an

AF flight to Montpellier, Feb. 07, 2012On the flight back from Warwick, I read a fairly recently arXived paper by Umberto Picchini and Julie Forman entitled “Accelerating inference for diffusions observed with measurement error and large sample sizes using Approximate Bayesian Computation: A case study” that relates to earlier ABC works (and the MATLAB abc-sde package) by the first author (earlier works I missed). Among other things, the authors propose an acceleration device for ABC-MCMC: when simulating from the proposal, the Metropolis-Hastings acceptance probability can be computed and compared with a uniform rv prior to simulating pseudo-data. In case of rejection, the pseudo-data does not need to be simulated. In case of acceptance, it is compared with the observed data as usual. This is interesting for two reasons: first it always speeds up the algorithm. Second, it shows the strict limitations of ABC-MCMC, since the rejection takes place without incorporating the information contained in the data. (Even when the proposal incorporates this information, the comparison with the prior does not go this way.) This also relates to one of my open problems, namely how to simulate directly summary statistics without simulating the whole pseudo-dataset.

Another thing (related with acceleration) is that the authors use a simulated subsample rather than the simulated sample in order to gain time: this worries me somehow as the statistics corresponding to the observed data is based on the whole observed data. I thus wonder how both statistics could be compared, since they have different distributions and variabilities, even when using the same parameter value. Or is this a sort of pluggin/bootstrap principle, the true parameter being replaced with its estimator based on the whole data? Maybe this does not matter in the end (when compared with the several levels of approximation)…