Archive for conferences

your GAN is secretly an energy-based model

Posted in Books, Statistics, University life on January 5, 2021 by xi'an

As I was reading this NeurIPS 2020 paper by Che et al., and trying to make sense of it, I came across a citation to our paper Casella, Robert and Wells (2004) on a generalized accept-reject sampling scheme where the proposal changes at each simulation, which sounds surprising if appreciated! But, after checking, this paper also appears as the first reference on the Wikipedia page for rejection sampling, which makes me wonder if many actually read it. (On the side, we mostly wrote this paper on a drive from Baltimore to Ithaca, after JSM 1999.)

“We provide more evidence that it is beneficial to sample from the energy-based model defined both by the generator and the discriminator instead of from the generator only.”

The paper seems to propose a post-processing of the output of a GAN generator, simulating from a combination of both generator and discriminator, via an (unadjusted) Langevin algorithm. The core idea is that, if p(.) is the true data generating process, g(.) the estimated generator and d(.) the discriminator, then

p(x) ≈ p⁰(x)∝g(x) exp(d(x))

(The approximation would be exact were the discriminator optimal.) The authors work with the latent z’s, in the GAN sense, meaning that generating pseudo-data x from g means taking a deterministic transform of z, x=G(z). When considering the above p⁰, a generation from p⁰ can be seen as accept-reject with acceptance probability proportional to exp[d{G(z)}]. (On the side, Lemma 1 is the standard validation for accept-reject sampling schemes.)
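To make the accept-reject reading concrete, here is a minimal Python sketch on a toy pair of my own choosing, not the networks of the paper: the "generator" G is the identity map (so g is the standard Normal density) and the "discriminator" d(x) = -(x-1)²/2 is bounded above by D_MAX = 0. Both functions and the bound are assumptions made for illustration. With these choices, g(x) exp(d(x)) is proportional to a Normal density with mean 1/2 and variance 1/2, which gives something to check the output against.

```python
import numpy as np

rng = np.random.default_rng(0)

def G(z):
    # Toy "generator" (assumption): identity map, so x = G(z) is N(0, 1).
    return z

def d(x):
    # Toy "discriminator" output (assumption), bounded above by D_MAX.
    return -0.5 * (x - 1.0) ** 2

D_MAX = 0.0  # upper bound on d, required for a valid acceptance probability

def accept_reject(n_proposals):
    z = rng.standard_normal(n_proposals)   # latent draws z
    x = G(z)                               # pseudo-data x = G(z)
    accept_prob = np.exp(d(x) - D_MAX)     # at most 1, proportional to exp[d{G(z)}]
    keep = rng.uniform(size=n_proposals) < accept_prob
    return x[keep], keep.mean()

samples, acc_rate = accept_reject(200_000)
# Accepted draws should look like N(1/2, 1/2) for this toy pair.
print(samples.mean(), samples.var(), acc_rate)
```

The need for an upper bound on d is what makes plain accept-reject awkward with an actual discriminator network, which is presumably one reason for resorting to a Langevin move on the latent z instead.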

Reading this paper made me realise how much the field had evolved since my previous GAN related read. With directions like Metropolis-Hastings GANs and Wasserstein GANs. (And I noticed a “broader impact” section past the conclusion section about possible misuses with societal consequences, which is a new requirement for NeurIPS publications.)

scalable Metropolis-Hastings, nested Monte Carlo, and normalising flows

Posted in Books, pictures, Statistics, University life on June 16, 2020 by xi'an

Over a sunny if quarantined Sunday, I started reading the PhD dissertation of Rob Cornish, Oxford University, as I am the external member of his viva committee. Ending up in a highly pleasant afternoon discussing this thesis over a (remote) viva yesterday. (If bemoaning a lost opportunity to visit Oxford!) The introduction to the viva was most helpful and set the results within the different time and geographical zones of the Ph.D since Rob had to switch from one group of advisors in Engineering to another group in Statistics. Plus an encompassing prospective discussion, expressing pessimism at exact MCMC for complex models and looking forward to further advances in probabilistic programming.

Made of three papers, the thesis includes this ICML 2019 [remember the era when there were conferences?!] paper on scalable Metropolis-Hastings, by Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet, which I commented on last year. Which achieves a remarkable and paradoxical O(1/√n) cost per iteration, provided (global) lower bounds are found on the (local) Metropolis-Hastings acceptance probabilities, since they allow for Poisson thinning à la Devroye (1986), and second order Taylor expansions constructed for all components of the target, with the third order derivatives providing bounds. However, the variability of the acceptance probability gets higher, which induces a longer but still manageable if the concentration of the posterior is in tune with the Bernstein von Mises asymptotics. I had not paid enough attention in my first read to the strong theoretical justification for the method, relying on the convergence of MAP estimates in well- and (some) mis-specified settings. Now, I would have liked to see the paper dealing with a more complex problem than logistic regression.
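As a reminder of why bounds open the door to thinning, here is a small Python sketch of the generic trick, which is not the algorithm of the paper. Suppose an acceptance probability factorises as exp(-λ₁) × … × exp(-λₙ) and that upper bounds λ̄ᵢ ≥ λᵢ are cheaply available (equivalently, lower bounds on each factor). The event "accept" is then the event that a Poisson process with total rate Σλᵢ has no point, which can be simulated by thinning a dominating process with rate Σλ̄ᵢ. The names thinned_accept, lam_fn and lam_bar, and the toy values, are mine.

```python
import numpy as np

def thinned_accept(lam_fn, lam_bar, rng):
    """Return True with probability exp(-sum_i lam_i), for 0 <= lam_i <= lam_bar[i].

    lam_fn(idx) evaluates lam_i only at the requested indices (hypothetical helper).
    """
    total = lam_bar.sum()
    n_events = rng.poisson(total)          # points of the dominating process
    if n_events == 0:
        return True
    # Attach each point to a factor i with probability lam_bar[i] / total.
    # (A real implementation would use an alias table rather than rng.choice.)
    idx = rng.choice(lam_bar.size, size=n_events, p=lam_bar / total)
    # Thinning: each point survives with probability lam_i / lam_bar_i.
    survivors = rng.uniform(size=n_events) < lam_fn(idx) / lam_bar[idx]
    return not survivors.any()             # accept iff no point survives

rng = np.random.default_rng(2)
n = 1000
true_lam = 1e-3 * rng.uniform(size=n)      # toy values, unknown "in bulk"
lam_bar = np.full(n, 1e-3)                 # toy global bounds
freq = np.mean([thinned_accept(lambda i: true_lam[i], lam_bar, rng)
                for _ in range(20_000)])
print(freq, np.exp(-true_lam.sum()))
```

In this toy setting the dominating rate is 1, so about one of the n = 1000 terms gets evaluated per decision on average, which is the source of the sub-linear cost when the bounds are tight.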

The second paper in the thesis is an ICML 2018 proceeding by Tom Rainforth, Robert Cornish, Hongseok Yang, Andrew Warrington, and Frank Wood, which considers Monte Carlo problems involving several nested expectations in a non-linear manner, meaning that (a) several levels of Monte Carlo approximations are required, with associated asymptotics, and (b) the resulting overall estimator is biased. This includes common doubly intractable posteriors, obviously, as well as (Bayesian) design and control problems. [And it has nothing to do with nested sampling.] The resolution chosen by the authors is strictly plug-in, in that they replace each level in the nesting with a Monte Carlo substitute and do not attempt to reduce the bias. Which means a wide range of solutions (other than the plug-in one) could have been investigated, including bootstrap maybe. For instance, Bayesian design is presented as an application of the approach, but since it relies on the log-evidence, there exist several versions for estimating (unbiasedly) this log-evidence. Similarly, the Forsythe-von Neumann technique applies to arbitrary transforms of a primary integral. The central discussion dwells on the optimal choice of the volume of simulations at each level, optimal in terms of asymptotic MSE. Or rather asymptotic bound on the MSE. The interesting result being that the outer expectation requires the square of the number of simulations for the other expectations. Which all need to converge to infinity. A trick in finding an estimator for a polynomial transform reminded me of the SAME algorithm in that it duplicated the simulations as many times as the highest power of the polynomial. (The ‘Og briefly reported on this paper… four years ago.)
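For a feel of the bias of the plug-in solution, here is a toy Python illustration of my own, not an example from the paper. Take X ~ N(0,1), Y|X ~ N(X,1) and the non-linear transform f(u) = u², so that the target E[f(E[Y|X])] = E[X²] equals 1. Replacing the inner expectation with an average over M draws gives an estimator whose expectation is 1 + 1/M, whatever the number N of outer draws.

```python
import numpy as np

def nested_plug_in(n_outer, n_inner, rng):
    # Outer level: X ~ N(0, 1).
    x = rng.standard_normal(n_outer)
    # Inner level: n_inner draws of Y | X ~ N(X, 1) for each outer X.
    y = x[:, None] + rng.standard_normal((n_outer, n_inner))
    inner_means = y.mean(axis=1)       # Monte Carlo substitute for E[Y | X]
    return np.mean(inner_means ** 2)   # plug-in estimate of E[(E[Y | X])^2]

rng = np.random.default_rng(1)
for m in (5, 20, 50):
    # Outer size taken as the square of the inner size, as in the result above.
    est = nested_plug_in(n_outer=m * m, n_inner=m, rng=rng)
    print(m, est, 1.0 + 1.0 / m)       # estimate versus its expectation
```

Here the bias 1/M and the Monte Carlo standard error, of order 1/√N, are of the same magnitude when N = M², which is one way to read the square rule.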

The third and last part of the thesis is a proposal [to appear in ICML 20] on relaxing bijectivity constraints in normalising flows with continuously indexed flows. (Or CIF. As Rob made a joke about this cleaning brand, let me add (?) to that joke by mentioning that looking at CIF and bijections is less dangerous in a Trump cum COVID era than looking at CIF and injections!) With Anthony Caterini, George Deligiannidis and Arnaud Doucet as co-authors. I am much less familiar with this area and hence a wee bit puzzled at the purpose of removing what I understand to be an appealing side of normalising flows, namely to produce a manageable representation of density functions as a combination of bijective and differentiable functions of a baseline random vector, like a standard Normal vector. The argument made in the paper is that imposing this representation of the density imposes a constraint on the topology of its support since said support is homeomorphic to the support of the baseline random vector. While the supporting theoretical argument is a mathematical theorem that shows the Lipschitz bound on the transform should be infinity in the case the supports are topologically different, these arguments may be overly theoretical when faced with the practical implications of the replacement strategy. I somewhat miss its overall strength given that the whole point seems to be in approximating a density function, based on a finite sample.
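The "manageable representation" mentioned above is nothing but the change of variable formula. A minimal Python sketch, with a one-dimensional affine bijection x = a z + b picked by me for illustration (actual flows compose many such differentiable bijections with learned parameters):

```python
import numpy as np

def log_std_normal(z):
    # Log-density of the baseline N(0, 1) variate.
    return -0.5 * z ** 2 - 0.5 * np.log(2.0 * np.pi)

def flow_forward(z, a=2.0, b=1.0):
    # Toy bijective, differentiable transform (assumption): x = a z + b.
    return a * z + b

def flow_log_density(x, a=2.0, b=1.0):
    # Change of variable: log p_X(x) = log p_Z(f^{-1}(x)) - log |f'(z)|.
    z = (x - b) / a
    return log_std_normal(z) - np.log(abs(a))

# Sanity check: the transformed density integrates to (about) one.
xs = np.linspace(-15.0, 17.0, 20001)
dx = xs[1] - xs[0]
total_mass = np.exp(flow_log_density(xs)).sum() * dx
print(total_mass)
```

Since the transform is a homeomorphism, the support of x is the image of the support of z and inherits its topology, which is the constraint discussed in the paper.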

sustainable workshops and conferences

Posted in Kids, pictures, Statistics, Travel, University life on February 17, 2020 by xi'an

The current uncertainty about whether or not ISBA 2020 will take place (and where it will take place), along with a recent Nature article and a discussion in the common room of the Department of Statistics at Warwick, leads me to renew the call for a more sustainable form of conferencing. By creating a network of local havens or a garland of local magnets (competition open for a catchy “x of local y” denomination) attracting people in the area to gather together, attend some of the video-ed talks in the other knots, and add their own local activities, from talks to collaborative brainstorming. Obviously, this requires additional planning and some technical details, but it should become a habit rather than the exception. ABC in Grenoble is thinking about it, let me know if you are interested in creating a local image of the workshop.

NeurIPS without visa

Posted in pictures, Statistics, Travel, University life on September 22, 2019 by xi'an

I came by chance upon this 2018 entry in Synced that NeurIPS now takes place in Canada between Montréal and Vancouver primarily because visas to Canada are easier to get than visas to the USA, even though some researchers still face difficulties in securing theirs. Especially researchers from some African countries, which is presented in the article as one of the reasons the next ICLR takes place in Addis Ababa. Which I wish I could attend! In the meanwhile, I will be taking part in an ABC workshop in Vancouver, December 08, prior to NeurIPS 2019, before visiting the Department of Statistics at UBC the day after. (My previous visit there was in 1990, I believe!) Incidentally but interestingly, the lottery entries for NeurIPS 2019 are open till September 25, to the public (those not contributing to the conference or any of its affiliated groups). This is certainly better than having bots buying all entries within 12 minutes of the opening time!

More globally, this entry makes me wonder how learned societies could invest in ensuring locations for their (international) meetings allow for a maximum inclusion in terms of these visa difficulties, but also ensuring freedom and safety for all members. Which may prove a de facto impossibility. For instance, Ethiopia has a rather poor record in terms of human rights and, in particular, homosexuality is criminalised there. An alternative would be to hold the conferences in parallel locations chosen to multiply the chances for this inclusion, but this could prove counter-productive [for inclusion] by creating groups that would never ever meet. An insolvable conundrum?

the future of conferences

Posted in Books, Kids, pictures, Travel, University life on January 22, 2019 by xi'an

The last issue of Nature for 2018 offers a stunning collection of science photographs, ten portraits of people who mattered (for the editorial board of Nature), and a collection of journalists’ entries on scientific conferences. The latter point leading to interesting questioning on the future of conferences, some of which relate to earlier entries on this blog. Like attempts to make them have a lesser carbon footprint, by only attending focused conferences and workshops, warning about predatory ones, creating local hives on different continents that can partake of all talks but reduce travel and size and still allow for person-to-person exchanges, multiplying the meetings and opportunities around a major conference to induce “only” one major trip (as in the past summer of British conferences, or the incoming geographical combination of BNP and O’Bayes 2019), cutting the traditional dreary succession of short talks in parallel in favour of “unconferences” where participants set communally the themes and structure of the meeting (but beware the dangers of bias brought by language, culture, seniority!). Of course, this move towards new formats will meet opposition from several corners, including administrators who too often see conferences as a pretext for paid vacations and refuse supporting costs without a “concrete” proof of work in the form of a presentation.

Another aspect of conferences was discussed there, namely the art of delivering great talks. Which is indeed more an art than a science, since the impact will not only depend on the speaker and the slides, but also on the audience and the circumstances. As years pile on, I am getting less stressed and probably too relaxed about giving talks, but still rarely feel I have reached enough of the audience. And still falling too easily for the infodump mistake… Which reminds me of a recent column in Significance (although I cannot link to it!), complaining about “finding it hard or impossible to follow many presentations, particularly those that involved a large number of equations.” Which sounds strange to me as, on the contrary, I quickly lose track in talks with no equations. And as mathematical statistics or probability issues seem to imply the use of maths symbols and equations. (This reminded me of a short course I gave once in an undisclosed location, where a portion of the audience left after the first morning, due to my use of “too many Greek letters”.) Actually, I am always annoyed at apologies for using proper maths notations, since they are the tools of our trade.

Another entry of importance in this issue of Nature is an interview with Katherine Heller and Hal Daumé, as first chairs for diversity and inclusion at N[eur]IPS. Where they discuss the actions taken since the previous NIPS 2017 meeting to address the lack of inclusiveness and the harassment cases exposed there, first by Kristian Lum, Lead Statistician at the Human Rights Data Analysis Group (HRDAG), whose blog denunciation set the wheels turning towards a safer and better environment (in stats as well as machine-learning). This included the [last minute] move towards renaming the conference as NeurIPS to avoid sexual puns on the former acronym (which as a non-native speaker I missed until it was pointed out to me!). Judging from the feedback, it seems that the wheels have indeed turned a significant amount and hopefully will continue their progress.