Archive for MCMC

congrats [IMS related]

Posted in Statistics with tags , , , , , , , , , , , on July 21, 2021 by xi'an

When I read through the June-July issue of the IMS Bulletin, I saw many causes for celebration and congratulations!, from Richard Samworth’s award of an Advanced ERC grant, to the new IMS fellows, including my friends, Ismael Castillo, Steve Mc Eachern, and Natesh Pillai, as well as my current or former associate editors, Johan Segers (JRSS B) and Changbao Wu (Biometrika). To my friends Alicia Carriquiry, David Dunson, and Tamara Broderick receiving 2021 COPSS awards, along others, including Wing Hung Wong (of the precursor Tanner & Wong, 1987 fame!). Natesh also figures among the “Quadfecta 23”, the exclusive club of authors having published at least one paper in each of the four Annals published by the IMS!

black box MCMC

Posted in Books, Statistics with tags , , , , , , , , on July 17, 2021 by xi'an

“…back-box methods, despite using no information of the proposal distribution, can actually give better estimation accuracy than the typical importance sampling [methods]…”

Earlier this week I was pointed out to Liu & Lee’s black box importance sampling, published in AISTATS 2017. (which I did not attend). Already found in Briol et al. (2015) and Oates, Girolami, and Chopin (2017), the method starts from Charles Stein‘s “unbiased estimator of the loss” (that was a fundamental tool in my own PhD thesis!), a variation on integration by part:

\mathbb E_p[\nabla\log p(X) f(X)+\nabla f(X)]=0

for differentiable functions f and p cancelling at the boundaries. It also holds for the kernelised extension

\mathbb E_p[k_p(X,x')]=0

for all x’, where the integrand is a 1-d function of an arbitrary kernel k(x,x’) and of the score function ∇log p. This null expectation happens to be a minimum since

\mathbb E_{X,X'\sim q}[k_p(X,X')]\ge 0

and hence importance weights can be obtained by minimising

\sum_{ij} w_i w_j k_p(x_i,x_j)

in w (from the unit simplex), for a sample of iid realisations from a possibly unknown distribution with density q. Liu & Lee show that this approximation converges faster than the standard Monte Carlo speed √n, when using Hilbertian properties of the kernel through control variates. Actually, the same thing happens when using a (leave-one-out) non-parametric kernel estimate of q rather than q. At least in theory.

“…simulating n parallel MCMC chains for m steps, where the length m of the chains can be smaller than what is typically used in MCMC, because it just needs to be large enough to bring the distribution `roughly’ close to the target distribution”

A practical application of the concept is suggested in the above quote. As a corrected weight for interrupted MCMC. Or when using an unadjusted Langevin algorithm. Provided the minimisation of the objective quadratic form is fast enough, the method can thus be used as a benchmark for regular MCMC implementation.

ISBA 2021 grand finale

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , on July 3, 2021 by xi'an

Last day of ISBA (and ISB@CIRM), or maybe half-day, since there are only five groups of sessions we can attend in Mediterranean time.

My first session was one on priors for mixtures, with 162⁺ attendees at 5:15am! (well, at 11:15 Wien or Marseille time), Gertrud Malsiner-Walli distinguishing between priors on number of components [in the model] vs number of clusters [in the data], with a minor question of mine whether or not a “prior” is appropriate for a data-dependent quantity. And Deborah Dunkel presenting [very early in the US!] anchor models for fighting label switching, which reminded me of the talk she gave at the mixture session of JSM 2018 in Vancouver. (With extensions to consistency and mixtures of regression.) And Clara Grazian debating on objective priors for the number of components in a mixture [in the Sydney evening], using loss functions to build these. Overall it seems there were many talks on mixtures and clustering this year.

After the lunch break, when several ISB@CIRM were about to leave, we ran the Objective Bayes contributed session, which actually included several Stein-like minimaxity talks. Plus one by Théo Moins from the patio of CIRM, with ciccadas in the background. Incredibly chaired by my friend Gonzalo, who had a question at the ready for each and every speaker! And then the Savage Awards II session. Which ceremony is postponed till Montréal next year. And which nominees are uniformly impressive!!! The winner will only be announced in September, via the ISBA Bulletin. Missing the ISBA general assembly for a dinner in Cassis. And being back for the Bayesian optimisation session.

I would have expected more talks at the boundary of BS & ML (as well as COVID and epidemic decision making), the dearth of which should be a cause for concern if researchers at this boundary do not prioritise ISBA meetings over more generic meetings like NeurIPS… (An exception was George Papamakarios’ talk on variational autoencoders in the Savage Awards II session.)

Many many thanks to the group of students at UConn involved in setting most of the Whova site and running the support throughout the conference. It indeed went on very smoothly and provided a worthwhile substitute for the 100% on-site version. Actually, I both hope for the COVID pandemic (or at least the restrictions attached to it) to abate and for the hybrid structure of meetings to stay, along with the multiplication of mirror workshops. Being together is essential to the DNA of conferences, but travelling to a single location is not so desirable, for many reasons. Looking for ISBA 2022, a year from now, either in Montréal, Québec, or in one of the mirror sites!

ISBA 2021 low key

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , on July 2, 2021 by xi'an

Fourth day of ISBA (and ISB@CIRM), which was a bit low key for me as I had a longer hike with my wife in the morning, including a swim in a sea as cold as the Annecy lake last month!, but nonetheless enjoyable and crystal clear, then attacked my pile of Biometrika submissions that had accumulated beyond the reasonable since last week, chased late participants who hadn’t paid yet, reviewed a paper that was due two weeks ago, chatted with participants before they left, discussed a research problem, and as a result ended attending only four sessions over the whole day. Including one about Models and Methods for Networks and Graphs, with interesting computation challenges, esp. in block models, the session in memoriam of Hélène Massam, where Gérard Letac (part of ISB@CIRM!), Jacek Wesolowski, and Reza Mohammadi, all coauthors of Hélène, made presentations on their joint advances. Hélène was born in Marseille, actually, in 1949, and even though she did not stay in France after her École Normale studies, it was a further commemoration to attend this session in her birth-place. I also found out about them working on the approximation of a ratio of normalising constants for the G-Wishart. The last session of my data was the Susie Bayarri memorial lecture, with Tamara Roderick as the lecturer. Reporting on an impressive bunch of tricks to reduce computing costs for hierarchical models with Gaussian processes.

ISBA 2021.3

Posted in Kids, Mountains, pictures, Running, Statistics, Travel, University life, Wines with tags , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , on July 1, 2021 by xi'an

Now on the third day which again started early with a 100% local j-ISBA session. (After a group run to and around Mont Puget, my first real run since 2020!!!) With a second round of talks by junior researchers from master to postdoc level. Again well-attended. A talk about Bayesian non-parametric sequential taxinomy by Alessandro Zito used the BayesANT acronym, which reminded me of the new vave group Adam and the Ants I was listening to forty years ago, in case they need a song as well as a logo! (Note that BayesANT is also used for a robot using Bayesian optimisation!) And more generally a wide variety in the themes. Thanks to the j-organisers of this 100% live session!

The next session was on PDMPs, which I helped organise, with Manon Michel speaking from Marseille, exploiting the symmetry around the gradient, which is distribution-free! Then, remotely, Kengo Kamatani, speaking from Tokyo, who expanded the high-dimensional scaling limit to the Zig-Zag sampler, exhibiting an argument against small refreshment rates, and Murray Pollock, from Newcastle, who exposed quite clearly the working principles of the Restore algorithm, including why coupling from the past was available in this setting. A well-attended session despite the early hour (in the USA).

Another session of interest for me [which I attended by myself as everyone else was at lunch in CIRM!] was the contributed C16 on variational and scalable inference that included a talk on hierarchical Monte Carlo fusion (with my friends Gareth and Murray as co-authors), Darren’s call to adopt functional programming in order to save Bayesian computing from extinction, normalising flows for modularisation, and Dennis’ adversarial solutions for Bayesian design, avoiding the computation of the evidence.

Wes Johnson’s lecture was about stories with setting prior distributions based on experts’ opinions. Which reminded me of the short paper Kaniav Kamary and myself wrote about ten years ago, in response to a paper on the topic in the American Statistician. And could not understand the discrepancy between two Bayes factors based on Normal versus Cauchy priors, until I was told they were mistakenly used repeatedly.

Rushing out of dinner, I attended both the non-parametric session (live with Marta and Antonio!) and the high-dimension computational session on Bayesian model choice (mute!). A bit of a schizophrenic moment, but allowing to get a rough picture in both areas. At once. Including an adaptive MCMC scheme for selecting models by Jim Griffin. Which could be run directly over the model space. With my ever-going wondering at the meaning of neighbour models.