a journal of the plague year [deconfited reviews]

Found a copy of Humans by Donald Westlake on the book sharing shelves at Dauphine. And read it within a few hours, as it is very light reading but quite funny nonetheless. If hardly ranking as a mystery novel. Or crime novel, unless the crime is Gaiacide and the criminal God. Reminded me of the equally light Bobby Dollar series by Tad Williams. As the main character is an angel, falling for humans as he tries to steer them towards the Armageddon. The setting is the early 1990s, with the main scares being atomic disaster (Chernobyl) and the AIDS pandemic. Plus the rise of environmental worries and of Chinese autocracy. I put it back on the shelves on my next visit to Dauphine, hopefully for someone else to enjoy!

Baked radish stems with basil for making pesto, with a bit more bitterness than usual. Cooked plenty of fennel since this is fennel season. Continued making my weekly rhubarb preserve. Keeping the garden active, now watching squash vines invading new territory, hopefully with an edible reward in the Fall. Tomatoes are growing incredibly fast as well! Saw another fox in the Parc before official opening times, quite close if speeding away from me and barely avoiding bumping into a pair of greyhounds, which fortunately seemed completely unconcerned.

Watched Children of Men after an exhausting week online for a grant panel. While a parable for the coming collapse of civilisation under political, biological and environmental apocalypses [is there any meaning in using apocalypse in the plural!?] and a premonitory tale on Brexit and the buttressing of Britain [or Trump and his Big Wall mania] induced by anti-immigrant rhetoric, the film is over the top in terms of plot and action, with symbolism taking over realism, even to the slightest degree, every shot being filled with references to religions and arts (like the Pink Floyd flying pig), to previous environmental disasters (with long shots of burning cows reminiscent of the mad cow crisis) and geo-political upheavals (including a Hamas type protest in the refugee camp, with a short appearance of a jeep with a French flag more reminiscent of the liberation of Paris in August 1944). Characters are caricatural, with a very Manichean division between very few good ones and mostly bad ones. The most ridiculous part of the scenario may well be the battle scene in the refugee camp [tanks versus pistols!]… Once again stunned by all the awards and praise piled upon that film.

Read two more volumes of the Witcher [bought during BayesComp for my son!]. One being Sword of Destiny and a series of short stories, like the first volume. The second Blood of Elves and the beginning of the novels. The first season on TV borrows mostly from the first two collections of short stories. Which are somewhat better than the novel, as the latter is very slow paced and overly sentimental. Not terrible, mind.

Completed with the utmost reluctance the Horde du Contrevent [translating as the windwalkers] by Alain Damasio (no English translation available, but an Italian version, l’Orda del Vento, is). A book that I again picked for figuring in Le Monde 100 bes&tc list! And felt like constantly facing a strong, icy wind when going through the pages of that unusual book. The style is unpleasant and rather pretentious, with numerous puns in French. The story is one of a (religious? mystical?) group walking against the wind(s) for decades to reach the source of these winds and to find the last types of wind no one has ever met. Their dreary pilgrimage is described by the 23 members of the group, called the Horde, with a typographical symbol at the start of each paragraph identifying who’s speaking (and a convenient page marker with all these symbols). A bit heavy-handed as a polyphonic novel (appropriately composed in a Corsican retreat!) and even more so in the crypto-Nietzschean philosophy it carries… The background universe there is somehow eco-steam-punk, with the wind producing most of the energy. The most exciting part involves rather realistic ice climbing. However, I clearly stand in the small minority of those less than impressed by the book as it is highly popular among French readers, one of the highest printings in the Folio collection, with side products including a BD and a movie (in the making?). (And enough votes from fans to almost reach the 10 most favourite novels in Le Monde list.)

a cartoon that could have been made for lockdown

scalable Langevin exact algorithm [armchair Read Paper]

So, Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts presented their Read Paper with discussions on the Wednesday aft! With a well-sized if virtual audience of nearly a hundred people. Here are a few notes scribbled during the Readings. And attempts at keeping the traditional structure of the meeting alive.

In their introduction, they gave the intuition of a quasi-stationary chain as the probability to be in A at time t while still alive behaving as π(A) × exp(-λt) for a fixed killing rate λ. The concept is quite fascinating if less straightforward than stationarity! The presentation put the stress on the available recourse to an unbiased estimator of the κ rate whose initialisation scaled as O(n) but allowed a subsampling cost reduction afterwards. With a subsampling rate connected with Bayesian asymptotics, namely with how quickly the posterior concentrates. Unfortunately, this makes the practical construction harder, since n is finite and the concentration rate is unknown (although a default guess should be √n). I wondered if the link with self-avoiding random walks was more than historical.
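To make the quasi-stationary intuition concrete, here is a minimal sketch on a finite-state chain rather than on the killed Brownian motion of the paper. The sub-stochastic matrix `Q` and the helper names `step` and `quasi_stationary` are hypothetical, chosen only for illustration: the mass missing from each row is the probability of being killed, and the chain conditioned on survival settles on a distribution π while the surviving mass decays geometrically, i.e. as exp(-λt).

```python
import math

# Hypothetical sub-stochastic transition matrix on three states: each row sums
# to less than one, the missing mass being the probability of being killed.
Q = [
    [0.50, 0.30, 0.10],
    [0.20, 0.60, 0.15],
    [0.10, 0.30, 0.50],
]


def step(mu, Q):
    """One step of the killed chain: mu_next(j) = sum_i mu(i) * Q[i][j]."""
    n = len(Q)
    return [sum(mu[i] * Q[i][j] for i in range(n)) for j in range(n)]


def quasi_stationary(Q, mu0, n_steps=200):
    """Iterate the killed chain, renormalising by the surviving mass each time."""
    mu = list(mu0)
    rho = None
    for _ in range(n_steps):
        nxt = step(mu, Q)
        rho = sum(nxt)               # one-step survival probability, given alive
        mu = [m / rho for m in nxt]  # condition on survival
    return mu, rho


pi, rho = quasi_stationary(Q, [1.0, 0.0, 0.0])
lam = -math.log(rho)  # killing rate: the surviving mass decays like exp(-lam * t)
print("quasi-stationary distribution:", pi)
print("killing rate:", lam)
```

Once the conditioned chain has settled, the probability of being in state j at time t while still alive is proportional to pi[j] × exp(-lam × t), whatever the starting state.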

The initialisation of the method remains a challenge in complex environments. And hence one may wonder whether and by how much it improves upon SMC. Furthermore, while the motivation for using a Brownian motion stems from the practical side, this simulation does not account for the target π. This completely blind excursion sounds worse than simulating from the prior in other settings.

One early illustration for quasi stationarity was based on a hypothetical distribution of lions and wandering (Brownian) antelopes. I found that the associated concept of soft killing was not necessarily well received by… the antelopes!

As it happens, my friend and coauthor Natesh Pillai was the first discussant! I did not get the details of his first bimodal example. But he addressed my earlier question about how large the running time T should be. Since the computational cost should be exploding with T. He also drew an analogy with improper posteriors as to wonder about the availability of convergence assessment.

And my friend and coauthor Nicolas Chopin was the second discussant! Starting with a request to… leave the Pima Indians (model) alone!! But also getting into a deeper assessment of the alternative use of SMCs.

scalable Langevin exact algorithm [Read Paper]

Murray Pollock, Paul Fearnhead, Adam M. Johansen and Gareth O. Roberts (CoI: all with whom I have strong professional and personal connections!) have a Read Paper discussion happening tomorrow [under relaxed lockdown conditions in the UK, except for the absurd quatorzine on all travelers, but still in a virtual format] that we discussed together [from our respective homes] at Paris Dauphine. And which I already discussed on this blog when it first came out.

Here are quotes I spotted during this virtual Dauphine discussion but we did not come up with enough material to build a significant discussion, although wondering at the potential for solving the O(n) bottleneck, handling doubly intractable cases like the Ising model. And noticing the nice features of the log target being estimable by unbiased estimators. And of using control variates, for once well-justified in a non-trivial environment.

“However, in practice this simple idea is unlikely to work. We can see this most clearly with the rejection sampler, as the probability of survival will decrease exponentially with t—and thus the rejection probability will often be prohibitively large.”

“This can be viewed as a rejection sampler to simulate from μ(x,t), the distribution of the Brownian motion at time t conditional on its surviving to time t. Any realization that has been killed is ‘rejected’ and a realization that is not killed is a draw from μ(x,t). It is easy to construct an importance sampling version of this rejection sampler.”
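The two quotes can be illustrated with a rough sketch, which is emphatically not the exact path-space construction of the paper. It assumes an Euler discretisation of the Brownian path with step `dt` and a hypothetical killing rate `kappa(x) = x²/2`. The rejection version keeps only the paths that survive to time t. The importance sampling version keeps every path and weights it by its survival probability.

```python
import math
import random


def kappa(x):
    """Hypothetical killing rate, chosen only for illustration."""
    return 0.5 * x * x


def killed_brownian(t_end, dt, rng, x0=0.0):
    """Euler-discretised Brownian path; returns (x at t_end, alive flag, log-weight)."""
    x, alive, log_w = x0, True, 0.0
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        hazard = kappa(x) * dt
        log_w -= hazard  # importance weight: exp(-sum of kappa * dt) along the path
        if alive and rng.random() > math.exp(-hazard):
            alive = False  # rejection version: the path is killed on this step
        x += math.sqrt(dt) * rng.gauss(0.0, 1.0)
    return x, alive, log_w


def survival_fraction(t_end, n_paths=2000, dt=0.05, seed=1):
    """Return (fraction of accepted paths, average importance weight)."""
    rng = random.Random(seed)
    draws = [killed_brownian(t_end, dt, rng) for _ in range(n_paths)]
    accepted = sum(1 for _, alive, _ in draws if alive)
    mean_weight = sum(math.exp(lw) for _, _, lw in draws) / n_paths
    return accepted / n_paths, mean_weight


for t in (1.0, 2.0, 4.0):
    frac, mean_w = survival_fraction(t)
    print(f"t={t}: accepted fraction {frac:.3f}, mean weight {mean_w:.3f}")
```

Both columns estimate the same survival probability, which shrinks as t grows. This is the reason the plain rejection sampler becomes prohibitive for long runs, while the weighted version wastes no path but sees its weights degenerate instead.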

scalable Metropolis-Hastings, nested Monte Carlo, and normalising flows

Over a sunny if quarantined Sunday, I started reading the PhD dissertation of Rob Cornish, Oxford University, as I am the external member of his viva committee. Ending up in a highly pleasant afternoon discussing this thesis over a (remote) viva yesterday. (If bemoaning a lost opportunity to visit Oxford!) The introduction to the viva was most helpful and set the results within the different time and geographical zones of the Ph.D since Rob had to switch from one group of advisors in Engineering to another group in Statistics. Plus an encompassing prospective discussion, expressing pessimism at exact MCMC for complex models and looking forward to further advances in probabilistic programming.

Made of three papers, the thesis includes this ICML 2019 [remember the era when there were conferences?!] paper on scalable Metropolis-Hastings, by Rob Cornish, Paul Vanetti, Alexandre Bouchard-Côté, George Deligiannidis, and Arnaud Doucet, which I commented on last year. Which achieves a remarkable and paradoxical O(1/√n) cost per iteration, provided (global) lower bounds are found on the (local) Metropolis-Hastings acceptance probabilities since they allow for Poisson thinning à la Devroye (1986) and second order Taylor expansions constructed for all components of the target, with the third order derivatives providing bounds. However, the variability of the acceptance probability gets higher, which induces a longer but still manageable if the concentration of the posterior is in tune with the Bernstein von Mises asymptotics. I had not paid enough attention in my first read to the strong theoretical justification for the method, relying on the convergence of MAP estimates in well- and (some) mis-specified settings. Now, I would have liked to see the paper dealing with a more complex problem than logistic regression.

The second paper in the thesis is an ICML 2018 proceedings paper by Tom Rainforth, Robert Cornish, Hongseok Yang, Andrew Warrington, and Frank Wood, which considers Monte Carlo problems involving several nested expectations in a non-linear manner, meaning that (a) several levels of Monte Carlo approximations are required, with associated asymptotics, and (b) the resulting overall estimator is biased. This includes common doubly intractable posteriors, obviously, as well as (Bayesian) design and control problems. [And it has nothing to do with nested sampling.] The resolution chosen by the authors is strictly plug-in, in that they replace each level in the nesting with a Monte Carlo substitute and do not attempt to reduce the bias. Which means a wide range of solutions (other than the plug-in one) could have been investigated, including bootstrap maybe. For instance, Bayesian design is presented as an application of the approach, but since it relies on the log-evidence, there exist several versions for estimating (unbiasedly) this log-evidence. Similarly, the Forsythe-von Neumann technique applies to arbitrary transforms of a primary integral. The central discussion dwells on the optimal choice of the volume of simulations at each level, optimal in terms of asymptotic MSE. Or rather asymptotic bound on the MSE. The interesting result being that the outer expectation requires the square of the number of simulations for the other expectations. Which all need to go to infinity. A trick in finding an estimator for a polynomial transform reminded me of the SAME algorithm in that it duplicated the simulations as many times as the highest power of the polynomial. (The ‘Og briefly reported on this paper… four years ago.)
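As a toy illustration of the plug-in approach and of its bias (a sketch of my own, not an example taken from the paper), consider estimating E[f(E[X|Y])] with f(u) = u², Y ~ N(0,1) and X|Y ~ N(Y,1). The exact value is E[Y²] = 1, while the plug-in estimator based on M inner draws has expectation 1 + 1/M, whatever the number N of outer draws. The function name `nested_estimate` and the sample sizes are arbitrary choices.

```python
import random


def nested_estimate(n_outer, n_inner, rng):
    """Plug-in estimate of E[ f(E[X | Y]) ] with f(u) = u**2.

    Toy model (an assumption for illustration): Y ~ N(0, 1) and X | Y ~ N(Y, 1),
    so that E[X | Y] = Y and the exact value is E[Y**2] = 1.
    """
    total = 0.0
    for _ in range(n_outer):
        y = rng.gauss(0.0, 1.0)
        # Inner Monte Carlo average standing in for the conditional expectation
        inner = sum(rng.gauss(y, 1.0) for _ in range(n_inner)) / n_inner
        total += inner ** 2  # non-linear transform of a noisy inner estimate
    return total / n_outer


rng = random.Random(42)
few_inner = nested_estimate(n_outer=20000, n_inner=2, rng=rng)   # expectation 1 + 1/2
many_inner = nested_estimate(n_outer=5000, n_inner=50, rng=rng)  # expectation 1 + 1/50
print("M=2: ", few_inner)
print("M=50:", many_inner)
```

In this toy case the squared bias is 1/M² while the variance is of order 1/N, so balancing both terms in the MSE calls for N growing like M², in line with the rate mentioned above.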

The third and last part of the thesis is a proposal [to appear in ICML 20] on relaxing bijectivity constraints in normalising flows with continuously indexed flows. (Or CIF. As Rob made a joke about this cleaning brand, let me add (?) to that joke by mentioning that looking at CIF and bijections is less dangerous in a Trump cum COVID era than at CIF and injections!) With Anthony Caterini, George Deligiannidis and Arnaud Doucet as co-authors. I am much less familiar with this area and hence a wee bit puzzled at the purpose of removing what I understand to be an appealing side of normalising flows, namely to produce a manageable representation of density functions as a combination of bijective and differentiable functions of a baseline random vector, like a standard Normal vector. The argument made in the paper is that imposing this representation of the density imposes a constraint on the topology of its support since said support is homeomorphic to the support of the baseline random vector. While the supporting theoretical argument is a mathematical theorem that shows the Lipschitz bound on the transform should be infinity in the case the supports are topologically different, these arguments may be overly theoretical when faced with the practical implications of the replacement strategy. I somewhat miss its overall strength given that the whole point seems to be in approximating a density function, based on a finite sample.
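For readers as unfamiliar with the area as I am, the "manageable representation" amounts to the change-of-variable formula applied layer by layer. The following is a one-dimensional sketch with two hypothetical bijective layers (a sinh map followed by an affine map, both picked arbitrarily and not taken from the paper), where the log-density of the output is the baseline Normal log-density at the inverted point plus the log-derivatives of the inverse maps.

```python
import math


def std_normal_logpdf(z):
    """Log-density of the baseline N(0, 1) variable."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


# Two hypothetical bijective, differentiable layers on the real line, each
# described by its inverse and by the log-derivative of that inverse.
def affine_inverse(x, shift=1.0, scale=2.0):
    """Inverse of u -> shift + scale * u, with log |d inverse / dx|."""
    return (x - shift) / scale, -math.log(abs(scale))


def sinh_inverse(u):
    """Inverse of z -> sinh(z), with log |d asinh(u) / du|."""
    return math.asinh(u), -0.5 * math.log1p(u * u)


def flow_logpdf(x):
    """Log-density of x = shift + scale * sinh(z) when z ~ N(0, 1)."""
    u, log_det_affine = affine_inverse(x)  # undo the last layer first
    z, log_det_sinh = sinh_inverse(u)
    return std_normal_logpdf(z) + log_det_affine + log_det_sinh


print(flow_logpdf(1.0))
```

Since every layer is a bijection of the real line, the support of the resulting density is necessarily the whole line, which is a (trivial) instance of the topological constraint discussed in the paper.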