Archive for PhD

improving bridge samplers by GANs

Posted in Books, pictures, Statistics with tags bridge sampling, curse of dimensionality, GANs, noise contrastive estimation, normalising flow, PhD, Saint Giles cemetery, University of Oxford on July 20, 2021 by xi'an

Hanwen Xing from Oxford recently posted a paper on arXiv about using GANs to improve the overlap between the densities in bridge sampling, bringing out new connections with noise contrastive estimation. The idea is to optimise a transform of one of the densities h() to bring it closer to the other density k(), using for instance normalising flows. (The call to transforms for bridge sampling is not new, dating at least to Voter in 1985, the year I was starting my PhD!) Furthermore, using an f-divergence as a measure of functional distance allows for a reasonably straightforward update of the transform, which can be reformulated as a GAN target. This is somewhat natural in that the transform aims at confusing simulations from the transform of h with simulations from k. This is quite an interesting proposal, even though computing the optimal transform is time-consuming and subject to the curse of dimensionality. I also wonder whether iterating the optimisation, one density after the other, would bring further improvement.
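To make the GAN reformulation concrete, here is a hedged toy sketch, in no way Xing's actual algorithm: an affine map stands in for the normalising flow and is tuned so that a logistic-regression discriminator (the noise-contrastive-estimation angle) can no longer tell transformed draws from h apart from draws from k. The Gaussian choices for h and k, the quadratic features, and all step sizes are illustrative assumptions.

```python
# Hedged toy sketch (not Xing's algorithm): tune an affine map T so that
# a logistic-regression discriminator cannot tell T(h)-draws from k-draws,
# the GAN / noise-contrastive-estimation view of matching two densities.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(u):
    return 1 / (1 + np.exp(-np.clip(u, -30, 30)))   # numerically safe

def feats(x):                        # quadratic features for the classifier
    return np.stack([np.ones_like(x), x, x ** 2], axis=1)

def disc_loss(w, xh, xk):            # cross-entropy, label 1 = k, 0 = T(h)
    pk, ph = sigmoid(feats(xk) @ w), sigmoid(feats(xh) @ w)
    return -(np.log(pk + 1e-12).mean() + np.log(1 - ph + 1e-12).mean())

z = rng.normal(size=2000)            # base draws from h = N(0,1), held fixed
xk = rng.normal(3.0, 1.5, size=2000) # draws from k = N(3, 1.5^2)

theta = np.array([0.0, 0.0])         # (mu, log sigma) of T(z) = mu + sigma*z
w = np.zeros(3)
for it in range(500):
    xh = theta[0] + np.exp(theta[1]) * z
    for _ in range(2):               # discriminator: gradient steps on w,
        pk = sigmoid(feats(xk) @ w)  # with a little weight decay to keep
        ph = sigmoid(feats(xh) @ w)  # the classifier from saturating
        grad_w = -(feats(xk).T @ (1 - pk) - feats(xh).T @ ph) / len(z)
        w -= 0.02 * (grad_w + 0.01 * w)
    g = np.zeros(2)                  # generator: raise the discriminator's
    for j in range(2):               # loss (= confuse it), via central
        d = np.zeros(2); d[j] = 1e-4 # finite differences with common z
        up = (theta + d)[0] + np.exp((theta + d)[1]) * z
        lo = (theta - d)[0] + np.exp((theta - d)[1]) * z
        g[j] = (disc_loss(w, up, xk) - disc_loss(w, lo, xk)) / 2e-4
    theta += 0.05 * g
print("fitted transform:", theta[0], np.exp(theta[1]))  # drifts toward (3, 1.5)
```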
scalable Metropolis-Hastings, nested Monte Carlo, and normalising flows

Posted in Books, pictures, Statistics, University life with tags Bayesian neural networks, Bernstein-von Mises theorem, CIF, computing cost, conferences, density approximation, dissertation, doubly intractable posterior, evidence, ICML 2019, ICML 2020, image analysis, International Conference on Machine Learning, L¹ convergence, logistic regression, nested Monte Carlo, normalising flow, PhD, probabilistic programming, quarantine, SAME algorithm, scalable MCMC, thesis defence, University of Oxford, variational autoencoders, viva on June 16, 2020 by xi'an

Over a sunny if quarantined Sunday, I started reading the PhD dissertation of Rob Cornish, Oxford University, as I am the external member of his viva committee, and I ended up spending a highly pleasant afternoon discussing this thesis over a (remote) viva yesterday. (While bemoaning a lost opportunity to visit Oxford!) The introduction to the viva was most helpful and set the results within the different time and geographical zones of the PhD, since Rob had to switch from one group of advisors in Engineering to another group in Statistics. Plus an encompassing prospective discussion, expressing pessimism at exact MCMC for complex models and looking forward to further advances in probabilistic programming.
Made of three papers, the thesis includes this ICML 2019 [remember the era when there were conferences?!] paper on scalable Metropolis-Hastings, by Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, Georges Deligiannidis, and Arnaud Doucet, which I commented on last year. Which achieves a remarkable and paradoxical O(1/√n) cost per iteration, provided (global) lower bounds are found on the (local) Metropolis-Hastings acceptance probabilities, since these allow for Poisson thinning à la Devroye (1986), with second-order Taylor expansions constructed for all components of the target and the third-order derivatives providing the bounds. However, the variability of the acceptance probability gets higher, which induces a longer, but still manageable, mixing time provided the concentration of the posterior is in tune with the Bernstein-von Mises asymptotics. I had not paid enough attention on my first read to the strong theoretical justification for the method, relying on the convergence of MAP estimates in well- and (some) mis-specified settings. Now, I would have liked to see the paper dealing with a more complex problem than logistic regression.
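As a point of reference, here is a hedged toy sketch of the factorised Metropolis-Hastings mechanism that scalable MH builds upon, with none of the paper's Taylor control variates or Poisson-thinning machinery; the Gaussian-mean target and all tuning constants are illustrative assumptions.

```python
# Hedged toy sketch of factorised Metropolis-Hastings, the device SMH
# builds on: for pi(theta) ∝ prod_i pi_i(theta) and a symmetric proposal,
# the move is accepted only if EVERY factor passes its own uniform test,
# so evaluation can stop at the first failure. (Naive factorisation forces
# tiny steps, as here; SMH instead bounds the factors via second-order
# Taylor expansions and applies Poisson thinning, none of which is shown.)
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=100)         # data; target = posterior of a mean

def log_factor(theta, yi):                 # one likelihood factor N(yi; theta, 1)
    return -0.5 * (yi - theta) ** 2

def fmh_step(theta, scale=0.02):
    prop = theta + scale * rng.normal()    # symmetric random-walk proposal
    for i in rng.permutation(len(y)):      # early rejection: stop at 1st veto
        log_r = log_factor(prop, y[i]) - log_factor(theta, y[i])
        if np.log(rng.uniform()) > min(0.0, log_r):
            return theta                   # this factor rejected the move
    return prop                            # every factor accepted it

theta, chain = 0.0, []
for _ in range(10_000):
    theta = fmh_step(theta)
    chain.append(theta)
print("posterior mean ≈", np.mean(chain[2000:]))   # close to y.mean() ≈ 2
```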
The second paper in the thesis is an ICML 2018 proceedings paper by Tom Rainforth, Robert Cornish, Hongseok Yang, Andrew Warrington, and Frank Wood, which considers Monte Carlo problems involving several expectations nested in a non-linear manner, meaning that (a) several levels of Monte Carlo approximation are required, with associated asymptotics, and (b) the resulting overall estimator is biased. This includes common doubly intractable posteriors, obviously, as well as (Bayesian) design and control problems. [And it has nothing to do with nested sampling.] The resolution chosen by the authors is strictly plug-in, in that they replace each level in the nesting with a Monte Carlo substitute and do not attempt to reduce the bias. Which means a wide range of solutions (other than the plug-in one) could have been investigated, including bootstrap maybe. For instance, Bayesian design is presented as an application of the approach, but since it relies on the log-evidence, there exist several versions for estimating (unbiasedly) this log-evidence. Similarly, the Forsythe-von Neumann technique applies to arbitrary transforms of a primary integral. The central discussion dwells on the optimal choice of the number of simulations at each level, optimal in terms of asymptotic MSE. Or rather of an asymptotic bound on the MSE. The interesting result being that the outer expectation requires the square of the number of simulations used for the inner expectations, all of which need to converge to infinity. A trick for finding an estimator of a polynomial transform reminded me of the SAME algorithm, in that it duplicates the simulations as many times as the highest power of the polynomial. (The ‘Og briefly reported on this paper… four years ago.)
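Here is a hedged toy sketch of the plug-in nested Monte Carlo estimator, with the outer sample size set to the square of the inner one as per the allocation above; the Gaussian example and the choice f(u) = u² are illustrative assumptions, picked so that the truth (γ = 1) and the plug-in bias (exactly 1/M) are known in closed form.

```python
# Hedged toy sketch of plug-in nested Monte Carlo for
#   gamma = E_Y[ f( E_Z[ g(Y,Z) ] ) ]  with a non-linear f,
# each expectation replaced by a sample average and the outer sample size
# set to the SQUARE of the inner one. Here f(u) = u^2, Z|Y ~ N(Y,1),
# g(y,z) = z, so the inner mean is Y, the truth is gamma = E[Y^2] = 1,
# and the plug-in bias is exactly 1/M (inner noise pushed through f).
import numpy as np

rng = np.random.default_rng(2)
f = lambda u: u ** 2                        # the non-linear outer transform

def nested_mc(M):                           # inner size M, outer size N = M^2
    N = M ** 2
    y = rng.normal(size=N)                  # outer draws Y_n
    z = y[:, None] + rng.normal(size=(N, M))   # inner draws Z_{n,m} given Y_n
    return f(z.mean(axis=1)).mean()         # strictly plug-in, hence biased

for M in (4, 16, 64):
    est = np.mean([nested_mc(M) for _ in range(50)])
    print(f"M={M:3d}, N=M^2={M**2:5d}: estimate ≈ {est:.3f}"
          f" (truth 1, plug-in bias ≈ 1/M = {1/M:.3f})")
```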
The third and last part of the thesis is a proposal [to appear in ICML 2020] on relaxing bijectivity constraints in normalising flows with continuously indexed flows. (Or CIFs. As Rob made a joke about the cleaning brand, let me add (?) to that joke by mentioning that looking at CIF and bijections is less dangerous, in a Trump cum COVID era, than looking at CIF and injections!) With Anthony Caterini, George Deligiannidis and Arnaud Doucet as co-authors. I am much less familiar with this area and hence a wee bit puzzled at the purpose of removing what I understand to be an appealing side of normalising flows, namely producing a manageable representation of density functions as a combination of bijective and differentiable functions of a baseline random vector, like a standard Normal vector. The argument made in the paper is that this representation of the density imposes a constraint on the topology of its support, since said support is homeomorphic to the support of the baseline random vector. While the supporting theoretical argument is a mathematical theorem showing that the Lipschitz constant of the transform must diverge to infinity when the two supports are topologically different, these arguments may be overly theoretical when faced with the practical implications of the replacement strategy. I somewhat miss its overall strength, given that the whole point seems to be approximating a density function based on a finite sample.
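For the record, here is a minimal sketch of the bijective representation in question, a one-dimensional affine flow where the change-of-variables formula gives the density in closed form; the parameters are arbitrary and CIFs themselves are not sketched.

```python
# Minimal sketch of a normalising flow's bijective representation:
# x = T(z) with T a differentiable bijection of a baseline Normal z, so
#   log p_X(x) = log p_Z(T^{-1}(x)) + log |d T^{-1} / dx|.
# With such a bijection the support of x stays homeomorphic to that of z,
# the very constraint that continuously indexed flows (CIFs) relax.
import numpy as np

rng = np.random.default_rng(4)
mu, sigma = 1.0, 2.0                       # arbitrary parameters of the flow

def T(z):                                  # the bijection: base z -> data x
    return mu + sigma * z

def T_inv(x):
    return (x - mu) / sigma

def log_density(x):                        # change of variables, 1-d case
    z = T_inv(x)
    log_pz = -0.5 * z ** 2 - 0.5 * np.log(2 * np.pi)   # standard Normal base
    return log_pz - np.log(sigma)          # here |d T^{-1}/dx| = 1/sigma

x = T(rng.normal(size=5))                  # sampling: push base draws through T
exact = -0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma) - 0.5 * np.log(2 * np.pi)
print(np.allclose(log_density(x), exact))  # True: matches the N(mu, sigma²) pdf
```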
the mind of a con man
Posted in University life with tags Diederik Stapel, fake data, Holland, NYT, PhD, seminar, The New York Times, Tilburg on May 21, 2013 by xi'an

“The tone of his talks, he said, was “Let’s not talk about the plumbing, the nuts and bolts — that’s for plumbers, for statisticians.””
As I got a tablet last week and immediately subscribed to the New York Times, I started reading papers from recent editions and got to this long article of April 26, by Yudhijit Bhattacharjee on Diederik Stapel, the Dutch professor of psychology who used fake data in dozens of papers and PhD theses.
“In his early years of research — when he supposedly collected real experimental data — Stapel wrote papers laying out complicated and messy relationships between multiple variables. He soon realized that journal editors preferred simplicity.”
This article is rather puzzling in its presentation of the facts. While Stapel acknowledges making up the data that conveniently supported his theses, the journalist’s analysis is fairly ambivalent, for instance considering that faking data is a “lesser threat to the integrity of science than the massaging of data and selective reporting of experiments”. At the beginning of the article, Stapel is shown going back to places where his experiments were supposed to have taken place, but he “could not find a location that matched the conditions described in his experiment”, making it sound as if he had forgotten…
“Science is of course about discovery, about digging to discover the truth. But it is also communication, persuasion, marketing (…) People are on the road with their talk. With the same talk. It’s like a circus (…) They give a talk in Berlin, two days later they give the same talk in Amsterdam, then they go to London. They are traveling salesmen selling their story.”
The above quote from Stapel is even more puzzling, as if giving the same talk in different places were unacceptable academic behaviour, on par with faking data and plagiarism… I do give the same talk in several conferences and seminars, mostly to different audiences, and I do not see a problem with this. If I persist in this behaviour, it will get boring for people who see the same talk over and over, and it should lead to me not being invited to conferences or seminars any longer, but there is nothing unethical or a-scientific in it. Another illustration of the ambivalence of both the character and the article. I frankly dislike this approach to fraud, a kind of “50 shades of lies”, where all academics fall under suspicion that, one way or another, they also acted unethically and in their own interest rather than towards the advancement of Science…
ABC [PhD] course
Posted in Books, R, Statistics, Travel, University life with tags ABC, CREST, graduate course, indirect inference, Malakoff, model choice, Paris, PhD, R, Roma, summary statistics on January 26, 2012 by xi'an

As mentioned in the latest post on ABC, I am giving a short doctoral course on ABC methods and convergence at CREST next week. I have now made a preliminary collection of my slides (plus a few from Jean-Michel Marin’s), available on slideshare (as ABC in Roma, because I am also giving the course in Roma, next month, with an R lab on top of it!):
and I did manage to go over the book by Gouriéroux and Monfort on indirect inference over the weekend. I still need to beef up the slides before the course starts next Thursday! (The core version of the slides is actually from the course I gave in Wharton more than a year ago.)
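For readers unfamiliar with the acronym, here is a hedged minimal sketch of the rejection-ABC principle at the core of the course (in Python rather than R, unlike the planned lab, and drawing on nothing from the actual slides); the Gaussian toy model, prior, and tolerance are illustrative assumptions.

```python
# Hedged minimal sketch of rejection ABC: draw parameters from the prior,
# simulate pseudo-data, and keep the draws whose summary statistic falls
# within epsilon of the observed one. All modelling choices are toy ones.
import numpy as np

rng = np.random.default_rng(3)
y_obs = rng.normal(2.0, 1.0, size=50)        # "observed" data
s_obs = y_obs.mean()                         # a (here sufficient) summary

def abc_rejection(n_sim=100_000, eps=0.05):
    theta = rng.normal(0.0, 5.0, size=n_sim)             # prior draws
    pseudo = rng.normal(theta[:, None], 1.0, size=(n_sim, 50))
    keep = np.abs(pseudo.mean(axis=1) - s_obs) < eps     # tolerance check
    return theta[keep]

post = abc_rejection()
print(len(post), "accepted; ABC posterior mean ≈", post.mean())
```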
matematika mugaz bestalde [mathematics on the other side of the border]
Posted in Statistics with tags applied mathematics, Basque Center for Applied Mathematics, Basque country, Bilbao, internship, MCMC, PhD, simulation on October 19, 2011 by xi'an

Here is a call for applications (postdocs and PhDs, as well as undergrads) to the Basque Center for Applied Mathematics in the beautiful Basque country (in Bilbao, Spain):
BCAM, the Basque Center for Applied Mathematics (www.bcamath.org), is a Research Center whose mission is to develop high quality interdisciplinary research. With an international team of researchers, the scientific program is structured in several research areas covering various fields of Applied Mathematics. Moreover, BCAM is a host institution for several national and international research projects and also develops collaborative projects with Basque Industry. BCAM has opened an International Call for Researchers, offering positions for Senior and Associate Researchers, Postdoctoral Fellows and PhD Students in all areas of Applied Mathematics. Specific positions are also available within the Research Project NUMERIWAVES “New Analytical and Numerical Methods in Wave Propagation”, coordinated by E. Zuazua and funded by the ERC – European Research Council. Applications must be submitted on-line. Deadline for submission: November 11th 2011
The deadline is November 11, about three weeks from now. Of course, when looking at the programme so far, it is very remotely connected with Statistics, but they do emphasize simulation. If not of the MCMC type.