Archive for NIPS

the Montréal declarAIon

Posted in University life on April 27, 2019 by xi'an

In conjunction with Yoshua Bengio being one of the three recipients of the 2018 Alan Turing Award, Nature ran an interview with him about the Montréal Declaration for responsible AI, which he launched at NeurIPS last December.

“Self-regulation is not going to work. Do you think that voluntary taxation works? It doesn’t.”

The interview reflects on the dangers of abuse of and by AIs, from surveillance to discrimination, while remaining somewhat unclear on the means to implement the ten generic principles listed there. (I had missed the Declaration when it came out.) I agree with the principles stressed by this list, well-being, autonomy, privacy, democracy, diversity, prudence, responsibility, and sustainability, but it remains to be seen how they can be imposed upon corporations, whose own public image puts more restraint on them than ethics, or on governments that are all too ready to automatise justice, police, and the restriction of citizens' rights. Which makes the construction of a responsible AI institution difficult to imagine, if the current lack of outreach of the extra-national institutions is any gauge. (A striking coincidence is that, when Yoshua Bengio pointed out that France was trying to make Europe an AI power, there was also a tribune in Le Monde about the lack of practical impact of this call to arms, apart from more academics moving to half-time positions in private companies.) [Warning: the picture does not work well with the dark background theme of this blog.]

the future of conferences

Posted in Books, Kids, pictures, Travel, University life on January 22, 2019 by xi'an

The last issue of Nature for 2018 offers a stunning collection of science photographs, ten portraits of people who mattered (for the editorial board of Nature), and a collection of journalists' entries on scientific conferences. The latter point leads to interesting questioning on the future of conferences, some of which relates to earlier entries on this blog. Like attempts to give them a smaller carbon footprint, by only attending focused conferences and workshops, warning about predatory ones, creating local hives on different continents that can partake of all talks but reduce travel and size while still allowing for person-to-person exchanges, multiplying the meetings and opportunities around a major conference to induce “only” one major trip (as in the past summer of British conferences, or the incoming geographical combination of BNP and O'Bayes 2019), or cutting the traditional dreary succession of short talks in parallel in favour of “unconferences” where participants communally set the themes and structure of the meeting (but beware the dangers of bias brought by language, culture, and seniority!). Of course, this move towards new formats will meet opposition from several corners, including administrators who too often see conferences as a pretext for paid vacations and refuse to support costs without a “concrete” proof of work in the form of a presentation.

Another aspect of conferences discussed there is the art of delivering great talks. Which is indeed more an art than a science, since the impact will depend not only on the speaker and the slides, but also on the audience and the circumstances. As years pile on, I am getting less stressed and probably too relaxed about giving talks, but still rarely feel I have reached enough of the audience.
And I still fall too easily for the infodump mistake… Which reminds me of a recent column in Significance (although I cannot link to it!), complaining about “finding it hard or impossible to follow many presentations, particularly those that involved a large number of equations.” Which sounds strange to me, as on the opposite I quickly lose track in talks with no equations, and as mathematical statistics or probability issues seem to call for maths symbols and equations. (This reminded me of a short course I gave once in an undisclosed location, where a portion of the audience left after the first morning, due to my use of “too many Greek letters”.) Actually, I am always annoyed at apologies for using proper maths notations, since they are the tools of our trade.

Another entry of importance in this issue of Nature is an interview with Katherine Heller and Hal Daumé, as first chairs for diversity and inclusion at N[eur]IPS. They discuss the actions taken since the NIPS 2017 meeting to address the lack of inclusiveness and the harassment cases exposed there, first by Kristian Lum, Lead Statistician at the Human Rights Data Analysis Group (HRDAG), whose blog denunciation set the wheels turning towards a safer and better environment (in stats as well as machine learning). This included the [last minute] move towards renaming the conference NeurIPS to avoid sexual puns on the former acronym (which, as a non-native speaker, I missed until it was pointed out to me!). Judging from the feedback, it seems that the wheels have indeed turned a significant amount and hopefully will continue to turn.

peer reviews on-line or peer community?

Posted in Statistics on September 20, 2018 by xi'an

Nature (or more precisely some researchers through Nature, associated with the UK Wellcome Trust, the US Howard Hughes Medical Institute (HHMI), and ASAPbio) has (have) launched a call for publishing reviews next to accepted papers, one way or another, which is something I (and many others) have supported for quite a while. Including for rejected papers, not only because making these reviews public should in principle diminish the time involved in re-reviewing re-submitted papers, but also because this should induce authors to revise papers with obvious flaws and missing references (?). Or abstain from re-submitting. Or publish a rejoinder addressing the criticisms. Anything that increases the communication between all parties, as well as the perspectives on a given paper. (This year, NIPS allows for the posting of reviews of rejected submissions, which I find a positive trend!)

In connection with this entry, I am still most sorry that I could not pursue the [superior in my opinion] project of a Peer Community in computational statistics, as the time required by Biometrika editing is just too great [given my current stamina!] for me to handle another journal (or rather this better alternative to a journal!). I hope someone else can take over the project and create the editorial team needed to run it.

And yet again in connection with this post (!), Andrew posted an announcement about the launch of researchers.one, an on-line publication forum launched by Harry Crane and Ryan Martin, where the authors handle the peer review process from A to Z, including choosing the reviewers, whose reviews may be public or not, and taken into account or not. Once published, the papers are open to comments from users, which constitutes a form of post-publication peer review. Albeit a weak one in my opinion, as the weakness of all such open repositories is the potential lack of interest of, and reaction from, the community. Incidentally, there is a $10 fee per submission for maintenance. Contrary to Peer Community in…, the copyright is partly transferred to researchers.one, which apparently prevents further publication in another journal.

troubling trends in machine learning

Posted in Books, pictures, Running, Statistics, University life on July 25, 2018 by xi'an

This morning, in Coventry, while having an n-th cup of tea after a very early morning run (light comes early at this time of the year!), I spotted an intriguing title in the arXivals of the day, by Zachary Lipton and Jacob Steinhardt, addressing the academic shortcomings of machine learning papers. While I first thought little of the attempt to address poor scholarship in the machine learning literature, I read it with growing interest and, although I am pessimistic about the chances of reversing the trend, considering the relentless pace and massive production of the community, I consider the exercise worth conducting, if only to launch a debate on the excesses found in the literature.

“…desirable characteristics:  (i) provide intuition to aid the reader’s understanding, but clearly distinguish it from stronger conclusions supported by evidence; (ii) describe empirical investigations that consider and rule out alternative hypotheses; (iii) make clear the relationship between theoretical analysis and intuitive or empirical claims; and (iv) use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts”

The points made by the authors are (p.1)

  1. Failure to distinguish between explanation and speculation
  2. Failure to identify the sources of empirical gains
  3. Mathiness
  4. Misuse of language

Again, I had misgivings about point 3., but this is not an anti-maths argument: it is rather about the recourse to vaguely connected or oversold mathematical results as a way to support a method.

Most interestingly (and living dangerously!), the authors select specific papers to illustrate their points, picking from well-established authors and from their own papers rather than from junior authors. And they also include counter-examples of papers going the(ir) right way. Among the recommendations for emerging from the morass of poor scholarship, they suggest favouring critical writing and retrospective surveys (provided authors can be found for these!). And they mention open reviews before I can mention these myself. While one would think that published anonymous reviews are a step in the right direction, I would actually say that this should be the norm (plus or minus anonymity) for all journals or successors of journals (PCIs coming strongly to mind). But requiring more work from the referees implies rewards for said referees, as done in some biology and hydrology journals I refereed for (and by PCIs of course).

rage against the [Nature] Machine [Intelligence]

Posted in Books, Statistics, University life on May 15, 2018 by xi'an

Yesterday evening, my friend and colleague Pierre Alquier (CREST-ENSAE) got interviewed (for a few seconds on-line, around minute 06) by the French national radio, France Culture, about the recent call to boycott the forthcoming Nature Machine Intelligence electronic journal. A call to the machine learning community, based on the fact that none of the major machine learning journals, like JMLR, charge for access, and that related conferences like AISTATS and NIPS also make their accepted papers available on-line for free. As noted in the call

“Machine learning has been at the forefront of the movement for free and open access to research. For example, in 2001 the Editorial Board of the Machine Learning Journal resigned en masse to form a new zero-cost open access journal, the Journal of Machine Learning Research (JMLR).”

1500 nuances of gan [gan gan style]

Posted in Books, Statistics, University life on February 16, 2018 by xi'an

I recently realised that there is a currently very popular trend in machine learning called GANs [for generative adversarial networks] that strongly connects with ABC, at least in that it relies mostly on the availability of a generative model, i.e., a probability model from which one can simulate, as in x=G(ϵ;θ), to draw inference about θ [or predictions]. For instance, there was a GAN tutorial at NIPS 2016 by Ian Goodfellow and many talks on the topic at recent NIPS meetings, the 1500 in the title referring to the citations of the GAN paper by Goodfellow et al. (2014). (The name adversarial comes from opposing the true model to the generative model in the inference.)

If you remember Jeffreys's famous pique about classical tests being based on improbable events that did not happen, GAN, like ABC, is sort of the opposite, in that it generates events until the one that was observed happens. More precisely, it generates pseudo-samples and adjusts the parameter θ until these samples are as confused as possible with samples from the data generating (“true”) distribution. (In its original incarnation, GAN is indeed an optimisation scheme in θ.) A basic presentation of GAN is that it constructs a function D(x,ϕ) that represents the probability that x came from the true model p rather than from the generative model, ϕ being the parameter of a neural network trained to this effect by maximising in ϕ (while the generator minimises in θ) the two-term objective function

E[log D(x,ϕ)] + E[log(1 − D(G(ϵ;θ),ϕ))]

where the first expectation is taken under the true model and the second one under the generative model.
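To make the two-term objective concrete, here is a minimal Monte Carlo sketch; all specifics are hypothetical choices for illustration (a N(2,1) “true” model, a location-shift generator, a hand-set logistic discriminator), none coming from the original GAN paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def D(x, phi):
    # hand-set logistic discriminator, D(x) = sigmoid(a*x + b):
    # the estimated probability that x came from the "true" model
    a, b = phi
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

def G(eps, theta):
    # toy generator: a location shift of standard normal noise
    return eps + theta

def gan_objective(phi, theta, n=100_000):
    # Monte Carlo estimate of E[log D(x,phi)] + E[log(1 - D(G(eps;theta),phi))],
    # first expectation under the true model, second under the generator
    x_true = rng.normal(2.0, 1.0, n)    # "true" model, here N(2, 1)
    eps = rng.normal(0.0, 1.0, n)
    return np.mean(np.log(D(x_true, phi))) + np.mean(np.log(1.0 - D(G(eps, theta), phi)))

# a generator far from the truth is easy to discriminate, hence a larger objective
print(gan_objective(phi=(1.0, -1.0), theta=-2.0))
print(gan_objective(phi=(1.0, -1.0), theta=2.0))   # generator matching the truth
```

With this fixed discriminator, a generator far from the truth is easily told apart, so the objective is larger; in an actual GAN the discriminator parameter ϕ would itself be trained by maximising the same quantity.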

“The discriminator tries to best distinguish samples away from the generator. The generator tries to produce samples that are indistinguishable by the discriminator.” Edward

One ABC perception of this technique is that the confusion rate

E[log(1 − D(G(ϵ;θ),ϕ))]

is a form of distance between the data and the generative model, an expectation that can be approximated by repeated simulations from this generative model. Which suggests an extension from the optimisation approach to an ABCyesian version, by selecting the smallest distances across a range of θ's simulated from the prior.
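A toy sketch of this ABCyesian selection, under purely illustrative assumptions (a fixed, hand-set quadratic discriminator standing in for a pre-trained one, a flat prior on (-10, 10), a 5% acceptance rate, and the same location-shift generator G(ϵ;θ)=ϵ+θ):

```python
import numpy as np

rng = np.random.default_rng(1)

def D(x):
    # hypothetical pre-trained discriminator with quadratic features,
    # high where the data (centred near 2) live and low elsewhere
    return 1.0 / (1.0 + np.exp(-(1.5 - 0.5 * (x - 2.0) ** 2)))

def confusion_distance(theta, n=500):
    # Monte Carlo estimate of E[log(1 - D(G(eps; theta)))] for G(eps; theta) = eps + theta
    eps = rng.normal(0.0, 1.0, n)
    return np.mean(np.log1p(-D(eps + theta)))

# ABC-flavoured selection: simulate theta from a flat prior on (-10, 10)
# and keep the 5% of draws with the smallest "distance"
thetas = rng.uniform(-10.0, 10.0, 2_000)
dists = np.array([confusion_distance(t) for t in thetas])
accepted = thetas[np.argsort(dists)[:100]]
print(accepted.mean())   # the accepted draws concentrate near the data's centre
```

The accepted θ's cluster where the generated samples fall in the discriminator's high-probability region, mimicking the usual ABC rejection step with the confusion rate playing the role of the distance between summaries.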

This notion relates to solutions using classification tools for density ratio estimation, connecting for instance with Gutmann and Hyvärinen (2012), and ultimately with Geyer's 1992 normalising constant estimator.
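The classification view of density ratio estimation can be demonstrated in a few lines: label samples from p as 1 and samples from q as 0, fit a logistic regression, and (with equal sample sizes) the fitted logit estimates log p(x)/q(x). A minimal sketch with two Gaussians, chosen so that the true log-ratio is exactly linear (the sample sizes, learning rate, and iteration count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(2)

# density-ratio estimation by classification: samples from p get label 1,
# samples from q get label 0, and a logistic regression is fitted
n = 20_000
xp = rng.normal(0.0, 1.0, n)            # p = N(0, 1)
xq = rng.normal(1.0, 1.0, n)            # q = N(1, 1)
x = np.concatenate([xp, xq])
y = np.concatenate([np.ones(n), np.zeros(n)])

X = np.column_stack([np.ones_like(x), x])   # features (1, x)
w = np.zeros(2)
for _ in range(1500):                   # plain gradient ascent on the average log-likelihood
    p_hat = 1.0 / (1.0 + np.exp(-X @ w))
    w += 0.1 * X.T @ (y - p_hat) / len(y)

# here log p(x)/q(x) = 0.5 - x exactly, so w should approach (0.5, -1)
print(w)
```

Because log N(0,1)/N(1,1) = 0.5 − x is linear in x, the logistic classifier with features (1, x) recovers the ratio exactly in the large-sample limit, which is the mechanism behind both the GAN discriminator and noise-contrastive style estimators.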

Another link between ABC and networks also came out during that trip. Proposed by Bishop (1994), mixture density networks (MDNs) are mixture representations of the posterior [with component parameters functions of the data] trained on the prior predictive through a neural network. These MDNs can be trained on the ABC learning table [based on a specific if redundant choice of summary statistics] and used as substitutes for the posterior distribution, which brings an interesting alternative to Simon Wood's synthetic likelihood. In a paper I had missed, Papamakarios and Murray suggest replacing regular ABC with this version…

Dirichlet process mixture inconsistency

Posted in Books, Statistics on February 15, 2016 by xi'an

Judith Rousseau pointed out to me this NIPS paper by Jeff Miller and Matthew Harrison on the possible inconsistency of Dirichlet mixture priors for estimating the (true) number of components in a (true) mixture model: the resulting posterior on the number of components does not concentrate on the right number. This is not the case when setting a prior on the unknown number of components of a mixture, where consistency occurs. (The inconsistency results established in the paper are actually focussed on iid Gaussian observations, for which the estimated number of Gaussian components is almost never equal to 1.) In a more recent arXiv paper, they also show that a Dirichlet prior on the weights plus a prior on the number of components can still produce the same features as a Dirichlet mixture prior, even the stick breaking representation! (A paper that I already reviewed last Spring.)