Archive for PhD thesis

efficient learning in ABC

Posted in Statistics on October 11, 2012 by xi'an

Jean-Michel Marin just posted on arXiv a joint paper of ours, Efficient learning in ABC algorithms. This paper, to whose elaboration [if not writing] I contributed, is one of the chapters of Mohammed Sedki’s thesis. (Mohammed is about to defend this thesis, which is currently with the reviewers. A preliminary version of this paper was presented at ABC in London and it is now under revision for Statistics and Computing, hence missing the special issue!)

The paper builds on the sequential ABC scheme of Del Moral et al. (2012), already discussed in an earlier post, where the tolerance level at each step is adapted from the previous iterations as a quantile of the distances. (The quantile level is based on the current effective sample size.) In a “systematic” step, the particles that are closest to the observations are preserved and duplicated, while those farther away are sampled randomly. The resulting population of particles is then perturbed by an adaptive (random walk) kernel, and the algorithm stops when the tolerance level no longer decreases or when the acceptance rate of the Metropolis step becomes too low. Pierre Pudlo and Mohammed Sedki experimented with a parallel implementation of the algorithm, obtaining an almost linear speed-up in the number of cores. It is less clear that the same would work on a GPU, because of the communication requirements. Overall, the new algorithm brings a significant improvement in computing time when compared with earlier versions, mainly because the number of simulations from the model is minimised. (It was tested on a rather complex population scenario retracing the invasion of honeybees in Europe, in connection with the previous post!)
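To make the adaptive scheme concrete, here is a minimal Python sketch of a tolerance-adaptive ABC-SMC sampler on a hypothetical toy model (normal data with unknown mean, a N(0, 10²) prior, the sample mean as summary statistic). It is not the algorithm of the paper: as simplifying assumptions, the quantile level `alpha` is fixed rather than derived from the effective sample size, particles outside the tolerance are replaced by uniformly chosen copies of those inside it, and the random-walk scale is tied to the empirical variance of the particles. The names `adaptive_abc_smc`, `alpha`, `min_acc` and `min_rel_drop` are illustrative only.

```python
import numpy as np

def simulate(theta, rng, n_obs=50):
    # Hypothetical toy model: n_obs draws from N(theta, 1), summarised by their mean
    theta = np.atleast_1d(theta)
    return rng.normal(theta[:, None], 1.0, size=(theta.size, n_obs)).mean(axis=1)

def log_prior(theta):
    # Hypothetical N(0, 10^2) prior, up to an additive constant
    return -0.5 * (theta / 10.0) ** 2

def adaptive_abc_smc(s_obs, n_particles=1000, alpha=0.5, min_acc=0.05,
                     min_rel_drop=0.01, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, 10.0, size=n_particles)  # initial draws from the prior
    dist = np.abs(simulate(theta, rng) - s_obs)
    eps = np.inf
    for _ in range(max_iter):
        # New tolerance: the alpha-quantile of the current distances
        new_eps = np.quantile(dist, alpha)
        if np.isfinite(eps) and new_eps > (1.0 - min_rel_drop) * eps:
            break  # stopping rule 1: tolerance no longer decreasing
        eps = new_eps
        # Keep particles within eps, replace the others by copies of kept ones
        alive = np.flatnonzero(dist <= eps)
        dead = np.flatnonzero(dist > eps)
        pick = rng.choice(alive, size=dead.size)
        theta[dead] = theta[pick]
        dist[dead] = dist[pick]
        # Adaptive random-walk Metropolis move targeting the ABC posterior at eps
        scale = np.sqrt(2.0 * np.var(theta))
        prop = theta + scale * rng.normal(size=n_particles)
        prop_dist = np.abs(simulate(prop, rng) - s_obs)
        log_u = np.log(rng.uniform(size=n_particles))
        accept = (prop_dist <= eps) & (log_u < log_prior(prop) - log_prior(theta))
        theta[accept] = prop[accept]
        dist[accept] = prop_dist[accept]
        if accept.mean() < min_acc:
            break  # stopping rule 2: Metropolis acceptance rate too low
    return theta, eps

theta, eps = adaptive_abc_smc(s_obs=1.0)
print(theta.mean(), eps)
```

The two `break` statements mirror the stopping rules described in the post. Each iteration costs exactly one model simulation per particle (in the Metropolis move), so the number of iterations directly gives the simulation budget.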

thesis defence

Posted in pictures, Statistics, Travel, University life, Wines on September 4, 2012 by xi'an

Today, my (now former) PhD student Pierre Jacob defended his thesis in Paris-Dauphine. This is a superb thesis on computational Bayesian statistics with five papers accepted or in the process of being accepted, covering parallel MCMC (our joint paper in JCGS with Murray Smith), free energy sampler (with Nicolas Chopin, who was also involved in the thesis direction, published in Bayesian Statistics 9), Wang-Landau algorithms, both at the theoretical (with Robin Ryder, in revision for AAP) and algorithmic levels (with Luke Bornn, Arnaud Doucet and Pierre Del Moral in JCGS), and SMC² (with Nicolas Chopin and Omiros Papaspiliopoulos, to appear in Series B). This impressive range of results was accomplished in about two and a half years, thanks to a high level of autonomy and an intense involvement in the thesis, as well as a long visit to Vancouver (UBC) where he collaborated with Luke Bornn and Arnaud Doucet. My advisor role was thus more in re-reading papers and arranging trips abroad than in directing the research per se, at least at the daily level, which suited me perfectly! After a two-month visit in Perth this summer/winter, Pierre will soon move to Singapore for a postdoctoral year collaborating with Ajay Jasra (and presumably many others, given his ability to start new projects wherever he goes). Congratulations on the thesis and good luck for the future!

ISBA 2012 [guest post]

Posted in pictures, Statistics, Travel, University life on June 27, 2012 by xi'an

(This post on his impressions of ISBA 2012 was written by Sam Clifford, a PhD student at QUT.)

Living in Australia has a few benefits when it comes to ISBA 2012. The most obvious one is that travel to Japan takes less than 24 hours and all occurs within time zones which are fairly close to what I consider “normal time”. While my roommate and other attendees appeared to be in various states of exhaustion and mania (we all deal with a lack of sleep very differently), I spent the Monday morning after registration exploring parts of downtown Kyoto with two other attendees from QUT.

Well rested and ready for action, I managed to stay awake during the foundational topic sessions. I had seen Aad van der Vaart talk about coverage of credible sets this time last year in Veracruz at 8BNP, but having come further with my studies I now understood much more of what was going on and could appreciate the implications at a deeper level. The take-home message: “you can be very certain but very wrong”. It’s definitely food for thought as I go ahead with my own work on non-/semi-parametric smoothing.

The standout talks for me so far have been in the Advances in Gaussian Processes, Hierarchies of BNP processes and Beta process sessions. While I don’t work directly in these fields, I find them absolutely fascinating, and in the GP session I was treated to three very good examples of GPs being used, with some very clever intricacies, to solve large-scale physical science problems. Cari Kaufman’s presentation, in particular, was a great demonstration of how we can use the properties of GPs and flexible mean estimators to obtain sensible smoothers that interpolate the data we have while giving good estimates of the remaining uncertainty. We had a chat on the way to lunch about the overlap between our respective work and started thinking about some interesting problems that exist in this overlap.

The Hierarchical NP Bayes session had Emily Fox talking about hierarchies of GPs that take advantage of partitioning and the additive properties of GPs to give us a multiresolution GP modelling technique. Combining a globally smooth GP with smaller-scale GPs, which can model local and discontinuous behaviour, in a straightforward and computationally efficient manner is a really neat way to take care of the multiple scales of behaviour in data. Every time I see what Fox is working on, I get really excited about the ways we might be able to use it in my group.
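The additive idea can be illustrated through the covariance: a sum of independent GPs has the sum of their covariances as covariance, and independent local GPs on the cells of a partition contribute a block-diagonal term, zero across cells, which is what allows sample paths to jump at cell boundaries. The Python sketch below builds such a two-level covariance with squared-exponential kernels and a single fixed partition. The kernel choice, the partition boundaries, the hyperparameters and the function names are assumptions for illustration; this is not Emily Fox's model, which, as described, uses hierarchies of partitions.

```python
import numpy as np

def sq_exp(x, y, variance, lengthscale):
    # Squared-exponential kernel matrix between 1-d input arrays x and y
    d = x[:, None] - y[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

def multiscale_cov(x, boundaries, global_params=(1.0, 2.0), local_params=(0.3, 0.1)):
    # Global smooth component shared across the whole domain
    K = sq_exp(x, x, *global_params)
    # Assign each input to a cell of a fixed (hypothetical) partition
    cell = np.digitize(x, boundaries)
    same_cell = cell[:, None] == cell[None, :]
    # Independent local GPs, one per cell: zero covariance across cells
    K = K + np.where(same_cell, sq_exp(x, x, *local_params), 0.0)
    return K

# Draw one sample path on [0, 10] with cell boundaries at 3 and 7
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
K = multiscale_cov(x, boundaries=[3.0, 7.0])
f = rng.multivariate_normal(np.zeros_like(x), K + 1e-6 * np.eye(x.size))
```

The local term stays a valid covariance because it is the elementwise product of a positive semi-definite kernel matrix with the positive semi-definite same-cell indicator matrix (Schur product theorem), so the sum is positive semi-definite as well.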

Probably my favourite talk was Tamara Broderick’s talk on the search for exchangeable feature probability functions as a way to characterise latent feature models in the way clustering models have been characterised. I know that Tam spent a lot of time on this talk, both late at night and early in the morning, and it paid off. The moment of beauty, for me, came when she presented an extension to Kingman’s Paintbox that allows for overlapping “partitions” by ensuring that the second feature was shared between the partitions where the first feature was and wasn’t expressed (such that p2|p1 and p2|~p1 are in equal proportions, linking it to independence). At that point, the talk stopped being about some interesting models based on the Beta process and became a call to discover what was possible in terms of links between a painting scheme and EFPFs. The paper became available on the arXiv during the Beta Process session.

Written during the Wednesday morning coffee break.

day of the theses

Posted in Statistics, University life on May 14, 2012 by xi'an

Today, I will spend my day in thesis defenses, as I sit on a defense committee this morning at Supélec, for a thesis written by Alireza Roodaki on a new approach to trans-dimensional MCMC for mixtures of distributions. Rather than a new way to simulate from posterior distributions with a varying number of components, the thesis concentrates on the post-simulation processing of the outcome of the simulation, constructing an object similar to the point process representation of Matthew Stephens, where components have a meaning across varying dimensions. An interesting and novel perspective. In the afternoon, I am part of another defense committee, for the habilitation of Fadoua Balabdaoui, my colleague in Paris-Dauphine. Fadoua works in non-parametric statistics under shape constraints, but has a wide range of interests and publications that fully justify a habilitation degree at this stage of her career. (Habilitation is a degree required in France and Germany to become a Full Professor and to autonomously advise PhD students.)

An ethical issue

Posted in Statistics, University life on November 19, 2011 by xi'an

A few weeks ago, I was asked to act as an external referee for a PhD thesis. This thesis involved some improvement upon standard statistical methodology and applications to another field. When I eventually got the PhD document, I discovered that it started with a preface (written by the PhD student) claiming that the student’s work had been used by co-workers, including the PhD supervisor, and published in a refereed journal without the student’s name or agreement, and moreover with some fabricated data… This was quite a shock, as I had not been made aware of this super-delicate issue beforehand. And I had no information on the published piece of work, which seemed to be in the other field (I have not been able to find it since then). When I complained to the university, I got transferred to the dean of graduate studies, who almost immediately withdrew the request for a PhD evaluation [by me]…

I find the whole affair quite bizarre and somewhat disturbing. Indeed, when I recontacted the university to mention my concerns, I got the following [edited and possibly translated] email:

As I’m sure you can appreciate, this is an unusual case. [We were] not able to alert you to this when nominating you as examiners, as it is important that we follow our University process and allow examiners to reach independent conclusions as to the value of the work before them. [We are] bound by our PhD Statute and would be prejudicing the examination process if [we] provided additional information to examiners. [We] would also be providing a route for the candidate to appeal the outcome of the examination process.

This does not make any sense to me, given that any referee of this thesis is going to face the same issue when reading the first pages of the thesis… Either the PhD student should remove this complaint from the PhD document (but this does not seem right, given that there is a published paper containing some of the results claimed in the thesis, even though referees from Statistics are very unlikely to be aware of it, as, again, I could not find the corresponding paper), or all the information should be provided to the referees of the thesis so that they can judge the matter in full light… I do not see how I could pursue the matter any further, but the whole story left me feeling quite uncomfortable.
