Archive for PhD course

NCE, VAEs, GANs & even ABC…

Posted in Statistics with tags , , , , , , , , , , , , , on May 14, 2021 by xi'an

As I was preparing my (new) lectures for a PhD short course “at” Warwick (meaning on Teams!), I read a few surveys and other papers on all these acronyms. It included the massive Guttmann and Hyvärinen 2012 NCE JMLR paperGoodfellow’s NIPS 2016 tutorial on GANs, and  Kingma and Welling 2019 introduction to VAEs. Which I found a wee bit on the light side, maybe missing the fundamentals of the notion… As well as the pretty helpful 2019 survey on normalising flows by Papamakarios et al., although missing on the (statistical) density estimation side.  And also a nice (2017) survey of GANs by Shakir Mohamed and Balaji Lakshminarayanan with a somewhat statistical spirit, even though convergence issues are not again not covered. But misspecification is there. And the many connections between ABC and GANs, if definitely missing on the uncertainty aspects. While Deep Learning by Goodfellow, Bengio and Courville adresses both the normalising constant (or partition function) and GANs, it was somehow not deep enough (!) to use for the course, offering only a few pages on NCE, VAEs and GANs. (And also missing on the statistical references addressing the issue, incl. [or excl.]  Geyer, 1994.) Overall, the infinite variations offered on GANs leave me uncertain about their statistical relevance, as it is unclear how good the regularisation therein is for handling overfitting and consistent estimation. (And if I spot another decomposition of the Kullback-Leibler divergence, I may start crying…)

Jeffreys priors for hypothesis testing [Bayesian reads #2]

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , , on February 9, 2019 by xi'an

A second (re)visit to a reference paper I gave to my OxWaSP students for the last round of this CDT joint program. Indeed, this may be my first complete read of Susie Bayarri and Gonzalo Garcia-Donato 2008 Series B paper, inspired by Jeffreys’, Zellner’s and Siow’s proposals in the Normal case. (Disclaimer: I was not the JRSS B editor for this paper.) Which I saw as a talk at the O’Bayes 2009 meeting in Phillie.

The paper aims at constructing formal rules for objective proper priors in testing embedded hypotheses, in the spirit of Jeffreys’ Theory of Probability “hidden gem” (Chapter 3). The proposal is based on symmetrised versions of the Kullback-Leibler divergence κ between null and alternative used in a transform like an inverse power of 1+κ. With a power large enough to make the prior proper. Eventually multiplied by a reference measure (i.e., the arbitrary choice of a dominating measure.) Can be generalised to any intrinsic loss (not to be confused with an intrinsic prior à la Berger and Pericchi!). Approximately Cauchy or Student’s t by a Taylor expansion. To be compared with Jeffreys’ original prior equal to the derivative of the atan transform of the root divergence (!). A delicate calibration by an effective sample size, lacking a general definition.

At the start the authors rightly insist on having the nuisance parameter v to differ for each model but… as we all often do they relapse back to having the “same ν” in both models for integrability reasons. Nuisance parameters make the definition of the divergence prior somewhat harder. Or somewhat arbitrary. Indeed, as in reference prior settings, the authors work first conditional on the nuisance then use a prior on ν that may be improper by the “same” argument. (Although conditioning is not the proper term if the marginal prior on ν is improper.)

The paper also contains an interesting case of the translated Exponential, where the prior is L¹ Student’s t with 2 degrees of freedom. And another one of mixture models albeit in the simple case of a location parameter on one component only.

ABC in Les Diablerets

Posted in Statistics with tags , , , , , , , , , , on February 14, 2017 by xi'an

Since I could not download the slides of my ABC course in Les Diablerets in one go, I broke them by chapters as follows. (Warning: there is very little novelty in those slides, except for the final part on consistency.)

Although I did not do it on purpose (!), starting with indirect inference and other methods inspired from econometrics induced some discussion in the first hour of the course with econometricians in the room. Including Elvezio Ronchetti.

I also regretted piling too much material in the alphabet soup, as it was too widespread for a new audience. And as I could not keep the coherence of the earlier parts by going thru so many papers at once. Especially since I was a bit knackered after a day of skiing….

I managed to get to the final convergence chapter on the last day, even though I had to skip some of the earlier material. Which should be reorganised anyway as the parts between model choice with random forests and inference with random forests are not fully connected!

off to Oxford

Posted in Kids, pictures, Travel, University life with tags , , , , , , , on January 31, 2016 by xi'an

Oxford, Feb. 23, 2012I am off to Oxford this evening for teaching once again in the Bayesian module of the OxWaSP programme. Joint PhD programme between Oxford and Warwick, supported by the EPSRC. And with around a dozen new [excellent!] PhD students every year. Here are the slides of a longer course that I will use in the coming days:

And by popular request (!) here is the heading of my Beamer file:

% Rather be using my own color
\setbeamercolor{alerted text}{fg=lightred}

a week in Oxford

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , , , , , , , on January 26, 2015 by xi'an

1sprI spent [most of] the past week in Oxford in connection with our joint OxWaSP PhD program, which is supported by the EPSRC, and constitutes a joint Centre of Doctoral Training in  statistical science focussing on data-­intensive environments and large-­scale models. The first cohort of a dozen PhD students had started their training last Fall with the first year spent in Oxford, before splitting between Oxford and Warwick to write their thesis.  Courses are taught over a two week block, with a two day introduction to the theme (Bayesian Statistics in my case), followed by reading, meetings, daily research talks, mini-projects, and a final day in Warwick including presentations of the mini-projects and a concluding seminar.  (involving Jonty Rougier and Robin Ryder, next Friday). This approach by bursts of training periods is quite ambitious in that it requires a lot from the students, both through the lectures and in personal investment, and reminds me somewhat of a similar approach at École Polytechnique where courses are given over fairly short periods. But it is also profitable for highly motivated and selected students in that total immersion into one topic and a large amount of collective work bring them up to speed with a reasonable basis and the option to write their thesis on that topic. Hopefully, I will see some of those students next year in Warwick working on some Bayesian analysis problem!

On a personal basis, I also enjoyed very much my time in Oxford, first for meeting with old friends, albeit too briefly, and second for cycling, as the owner of the great Airbnb place I rented kindly let me use her bike to go around, which allowed me to go around quite freely! Even on a train trip to Reading. As it was a road racing bike, it took me a trip or two to get used to it, especially on the first day when the roads were somewhat icy, but I enjoyed the lightness of it, relative to my lost mountain bike, to the point of considering switching to a road bike for my next bike… I had also some apprehensions with driving at night, which I avoid while in Paris, but got over them until the very last night when I had a very close brush with a car entering from a side road, which either had not seen me or thought I would let it pass. Gave me the opportunity of shouting Oï!