Archive for NeurIPS 2021

séminaire parisien de statistique [09/01/23]

Posted in Books, pictures, Statistics, University life on January 22, 2023 by xi'an

I had missed the séminaire parisien de statistique for most of the Fall semester, hence was determined to attend the first session of the year 2023, all the more because the talks were close to my interests. To wit, Chiara Amorino spoke about particle systems for McKean-Vlasov SDEs parameterised by several parameters, when repeatedly observing discretised versions, thereby establishing the consistency of a contrast estimator of these parameters. I was initially confused by the mention of interacting particles, since the work is not at all related with simulation. Just wondering whether this contrast could prove useful for a likelihood-free approach in building a Gibbs distribution?
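
To fix ideas on the setting (my own toy illustration, not taken from the talk), here is a minimal Python sketch of the interacting-particle discretisation behind such models, for a mean-field SDE dX_t = -θ(X_t - E[X_t])dt + σdW_t, where the McKean-Vlasov expectation is replaced by the empirical mean over the N particles; the drift, θ, and σ are all illustrative choices:

```python
import numpy as np

def particle_euler(theta, sigma, n_particles=500, n_steps=1000, dt=0.01, seed=None):
    """Euler-Maruyama scheme for a toy McKean-Vlasov SDE,
    dX_t = -theta (X_t - E[X_t]) dt + sigma dW_t,
    with E[X_t] replaced by the empirical mean over the interacting particles."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n_particles)          # initial particle cloud
    path = np.empty((n_steps + 1, n_particles))
    path[0] = x
    for k in range(n_steps):
        drift = -theta * (x - x.mean())           # mean-field interaction term
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_particles)
        path[k + 1] = x
    return path

# repeatedly observed, discretised versions of the system, as in the estimation setting
obs = particle_euler(theta=1.5, sigma=0.7, seed=0)[::10]
```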

Valentin de Bortoli then spoke on diffusion Schrödinger bridges for generative models, which allowed me to better my understanding of this idea presented by Arnaud at the Flatiron workshop last November. The presentation here was quite different, using a forward versus backward explanation via a sequence of transforms that end up approximately Gaussian, once more reminiscent of sequential Monte Carlo. The transforms are themselves approximate Gaussian versions relying on a discretised Ornstein-Uhlenbeck process, with a missing score term, since said score involves a marginal density at each step of the sequence. It can be represented [as below] as an expectation conditional on the (observed) variate at time zero (with a connection with Hyvärinen's NCE / score matching!). Practical implementation is done via neural networks.
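
The missing score presumably takes the form of the standard denoising identity (my reconstruction, with p_t the marginal at step t and p_{t|0} the Ornstein-Uhlenbeck transition from time zero, rather than the speaker's notation):

```latex
\nabla_{x_t}\log p_t(x_t)
  \;=\; \mathbb{E}\left[\,\nabla_{x_t}\log p_{t|0}(x_t\mid X_0)\;\middle|\;X_t=x_t\,\right]
```

where the inner score is conditional on the variate at time zero and the outer expectation is under the backward conditional of X₀ given X_t = x_t, hence the connection with score matching.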

Last but not least, my friend Randal talked about his Kick-Kac formula, which connects with the one we considered in our 2004 paper with Jim Hobert. While I had heard an earlier version, this talk was mostly on probability aspects and highly enjoyable, as he included some short proofs. The formula expresses the stationary probability measure π of the original Markov chain in terms of explorations between two visits to an accessible set C, more general than a small set. At first there is an annoying remainder term, due to the set not being Harris recurrent, but it eventually cancels out. Memoryless transportation can be implemented because C is free for the picking, for instance the set where the target is bounded by a manageable density, allowing for an accept-reject step. The resulting chain is non-reversible. However, due to the difficulty of simulating from the target restricted to C, a second and parallel Markov chain is created instead. Performance, unsurprisingly, depends on the choice of C, but it can be adapted to the target on the go.
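
For context, the classical Kac representation that the Kick-Kac formula generalises writes (my reconstruction, assuming Harris recurrence, with σ_C the return time to C and π_C the renormalised restriction of π to C):

```latex
\pi(f) \;=\; \frac{\mathbb{E}_{\pi_C}\left[\sum_{k=0}^{\sigma_C-1} f(X_k)\right]}
                  {\mathbb{E}_{\pi_C}\left[\sigma_C\right]},
\qquad \sigma_C=\inf\{n\ge 1:\,X_n\in C\}
```

that is, π averages the excursions of the chain between two consecutive visits to C.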

day two at ISBA 22

Posted in Mountains, pictures, Running, Statistics, Travel on June 30, 2022 by xi'an

Still woke up too early, which let me go for a long run on Mont Royal (which felt almost immediately familiar from earlier runs at MCM 2017!) at dawn and at a pleasant temperature (but I missed the top bagel bakery on the way back!). Skipped the morning plenary lectures to complete recommendation letters and finish a paper submission. But had a terrific lunch with a good friend I had not seen since before Covid times, at a local branch of Kinton Ramen, which I had already enjoyed in Vancouver as my Airbnb was located on top of it.

I chaired the afternoon Bayesian computations session, with Onur Teymur presenting the general spirit of his NeurIPS 21 paper on black-box probabilistic numerics. Mentioning that a new textbook on the topic by Philipp Hennig, Michael Osborne, and Hans Kersting had appeared today! The second talk was by Laura Bondi, who discussed an ABC model choice approach to assess breast cancer screening. With enough missing data (out of 78,051 women followed over 12 years) to lead to an intractable likelihood. Starting with vanilla ABC using 32 summaries and moving to our random forest approach. Unsurprisingly concluding with different top models, but not characterising the identifiability provided by the choice of the summaries. The third talk was by Ryan Chan (fresh Warwick PhD recipient), about a Fusion divide-and-conquer approach that avoids the approximations of earlier approaches. In particular, he uses a clever accept-reject algorithm to generate a product of densities using the component densities. A nice trick that Murray explained to me while visiting Paris last month. (The approach appears to be parameterisation dependent.) The final talk was by Umberto Picchini, in a sense the synthetic likelihood mirror of Massi's talk yesterday, in constructing a guided proposal relying on observed summaries. Though without comparing both approaches on a given toy example like the g-and-k distribution.
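
As an aside, the g-and-k distribution mentioned as a potential benchmark is easily simulated through its quantile function, which is what makes it a standard ABC toy example despite its intractable likelihood; a minimal sketch (with the usual c = 0.8 convention, and octile summaries as one common choice) is:

```python
import numpy as np

def rgk(n, a, b, g, k, c=0.8, seed=None):
    """Draws from the g-and-k distribution via its quantile function,
    Q(u) = a + b * (1 + c * tanh(g * z / 2)) * (1 + z^2)^k * z,  z = Phi^{-1}(u),
    using tanh(g z / 2) = (1 - exp(-g z)) / (1 + exp(-g z))."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n)  # z = Phi^{-1}(U) for U uniform
    return a + b * (1 + c * np.tanh(g * z / 2)) * (1 + z**2) ** k * z

# simulated data that an ABC scheme would reduce to summaries, e.g. octiles
x = rgk(10_000, a=3.0, b=1.0, g=2.0, k=0.5, seed=0)
summaries = np.quantile(x, np.linspace(0.125, 0.875, 7))
```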

posterior collapse

Posted in Statistics on February 24, 2022 by xi'an

The latest ABC One World webinar was a talk by Yixin Wang about the posterior collapse of auto-encoders, of which I was completely unaware. It is essentially an identifiability issue with auto-encoders, where the latent variable z at the source of the VAE does not impact the likelihood, assumed to be an exponential family with parameter depending on z and on θ, possibly through a neural network construct. The variational part comes from the parameter being estimated as θ⁰ via a variational approximation.

“…the problem of posterior collapse mainly arises from the model and the data, rather than from inference or optimization…”

The collapse means that the posterior for the latent variable satisfies p(z|θ⁰,x)=p(z), which is not a standard property since θ⁰=θ⁰(x). Which Yixin Wang, David Blei, and John Cunningham show is equivalent to p(x|θ⁰,z)=p(x|θ⁰), i.e., to z being unidentifiable. The above quote is then both correct and incorrect, in that the choice of the inference approach, i.e., of the estimator θ⁰=θ⁰(x), has an impact on whether or not p(z|θ⁰,x)=p(z) holds. As acknowledged by the authors when describing “methods modify the optimization objectives or algorithms of VAE to avoid parameter values θ at which the latent variable is non-identifiable”. They later build a resolution for identifiable VAEs by imposing that the conditional p(x|θ,z) is injective in z for all values of θ, resulting in a neural network with Brenier maps.
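
The equivalence is a one-line Bayes argument (assuming the prior on z is free of θ):

```latex
p(z\mid\theta^0,x)\;=\;\frac{p(x\mid\theta^0,z)\,p(z)}{p(x\mid\theta^0)}
\quad\Longrightarrow\quad
\Big[\,p(z\mid\theta^0,x)=p(z)\ \text{for all}\ x
\;\Longleftrightarrow\;\; p(x\mid\theta^0,z)=p(x\mid\theta^0)\,\Big]
```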

From a Bayesian perspective, I have difficulties connecting to the issue, the folklore being that selecting a proper prior is a sufficient fix for avoiding non-identifiability. But, more fundamentally, I wonder about the relevance of inferring the latent z's and hence of worrying about their identifiability or lack thereof.
