Archive for Bayesian GANs

BayesComp²³ [aka MCMski⁶]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on March 20, 2023 by xi'an

The main BayesComp meeting started right after the ABC workshop and went on at a grueling pace, and offered a constant conundrum as to which of the four sessions to attend, the more when trying to enjoy some outdoor activity during the lunch breaks. My overall feeling is that it went on too fast, too quickly! Here are some quick and haphazard notes from some of the talks I attended, as for instance the practical parallelisation of an SMC algorithm by Adrien Corenflos, the advances made by Giacommo Zanella on using Bayesian asymptotics to assess robustness of Gibbs samplers to the dimension of the data (although with no assessment of the ensuing time requirements), a nice session on simulated annealing, from black holes to Alps (if the wrong mountain chain for Levi), and the central role of contrastive learning à la Geyer (1994) in the GAN talks of Veronika Rockova and Éric Moulines. Victor  Elvira delivered an enthusiastic talk on our massively recycled importance on-going project that we need to complete asap!

While their earlier arXived paper was on my reading list, I was quite excited by Nicolas Chopin’s (along with Mathieu Gerber) work on some quadrature stabilisation that is not QMC (but not too far either), with stratification over the unit cube (after a possible reparameterisation) requiring more evaluations, plus a sort of pulled-by-its-own-bootstrap control variate, but beating regular Monte Carlo in terms of convergence rate and practical precision (if accepting a large simulation budget from the start). A difficulty common to all (?) stratification proposals is that it does not readily applies to highly concentrated functions.

I chaired the lightning talks session, which were 3mn one-slide snapshots about some incoming posters selected by the scientific committee. While I appreciated the entry into the poster session, the more because it was quite crowded and busy, if full of interesting results, and enjoyed the slide solely made of “0.234”, I regret that not all poster presenters were not given the same opportunity (although I am unclear about which format would have permitted this) and that it did not attract more attendees as it took place in parallel with other sessions.

In a not-solely-ABC session, I appreciated Sirio Legramanti speaking on comparing different distance measures via Rademacher complexity, highlighting that some distances are not robust, incl. for instance some (all?) Wasserstein distances that are not defined for heavy tailed distributions like the Cauchy distribution. And using the mean as a summary statistic in such heavy tail settings comes as an issue, since the distance between simulated and observed means does not decrease in variance with the sample size, with the practical difficulty that the problem is hard to detect on real (misspecified) data since the true distribution behing (if any) is unknown. Would that imply that only intrinsic distances like maximum mean discrepancy or Kolmogorov-Smirnov are the only reasonable choices in misspecified settings?! While, in the ABC session, Jeremiah went back to this role of distances for generalised Bayesian inference, replacing likelihood by scoring rule, and requirement for Monte Carlo approximation (but is approximating an approximation that a terrible thing?!). I also discussed briefly with Alejandra Avalos on her use of pseudo-likelihoods in Ising models, which, while not the original model, is nonetheless a model and therefore to taken as such rather than as approximation.

I also enjoyed Gregor Kastner’s work on Bayesian prediction for a city (Milano) planning agent-based model relying on cell phone activities, which reminded me at a superficial level of a similar exploitation of cell usage in an attraction park in Singapore Steve Fienberg told me about during his last sabbatical in Paris.

In conclusion, an exciting meeting that should have stretched a whole week (or taken place in a less congenial environment!). The call for organising BayesComp 2025 is still open, by the way.

 

ABC in Lapland²

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , , , , , on March 16, 2023 by xi'an

On the second day of our workshop, Aki Vehtari gave a short talk about his recent works on speed up post processing by importance sampling a simulation of an imprecise version of the likelihood until the desired precision is attained, importance corrected by Pareto smoothing¹⁵. A very interesting foray into the meaning of practical models and the hard constraints on computer precision. Grégoire Clarté (formerly a PhD student of ours at Dauphine) stayed on a similar ground of using sparse GP versions of the likelihood and post processing by VB²³ then stir and repeat!

Riccardo Corradin did model-based clustering when the nonparametric mixture kernel is missing a normalizing constant, using ABC with a Wasserstein distance and an adaptive proposal, with some flavour of ABC-Gibbs (and no issue of label switching since this is clustering). Mixtures of g&k models, yay! Tommaso Rigon reconsidered clustering via a (generalised Bayes à la Bissiri et al.) discrepancy measure rather than a true model, summing over all clusters and observations a discrepancy between said observation and said cluster. Very neat if possibly costly since involving distances to clusters or within clusters. Although she considered post-processing and Bayesian bootstrap, Judith (formerly [?] Dauphine)  acknowledged that she somewhat drifted from the theme of the workshop by considering BvM theorems for functionals of unknown functions, with a form of Laplace correction. (Enjoying Lapland so much that I though “Lap” in Judith’s talk was for Lapland rather than Laplace!!!) And applications to causality.

After the (X country skiing) break, Lorenzo Pacchiardi presented his adversarial approach to ABC, differing from Ramesh et al. (2022) by the use of scoring rule minimisation, where unbiased estimators of gradients are available, Ayush Bharti argued for involving experts in selecting the summary statistics, esp. for misspecified models, and Ulpu Remes presented a Jensen-Shanon divergence for selecting models likelihood-freely²², using a test statistic as summary statistic..

Sam Duffield made a case for generalised Bayesian inference in correcting errors in quantum computers, Joshua Bon went back to scoring rules for correcting the ABC approximation, with an importance step, while Trevor Campbell, Iuri Marocco and Hector McKimm nicely concluded the workshop with lightning-fast talks in place of the cancelled poster session. Great workshop, in my most objective opinion, with new directions!

ABC in Lapland

Posted in Mountains, pictures, Statistics, University life with tags , , , , , , , , , , , , , , , , on March 15, 2023 by xi'an

Greetings from Levi, Lapland! Sonia Petrone beautifully started the ABC workshop with a (the!) plenary Sunday night talk on quasi-Bayes in the spirit of both Fortini & Petrone (2020) and the more recent Fong, Holmes, and Walker (2023). The talk got me puzzled by wondering the nature of convergence, in that it happens no matter what the underlying distribution (or lack thereof) of the data is, in that, even without any exchangeability structure, the predictive is converging. The quasi stems from a connection with the historical Smith and Markov (1978) sequential update approximation for the posterior attached with mixtures of distributions. Which itself relates to both Dirichlet posterior updates and Bayesian bootstrap à la Newton & Raftery. Appropriate link when the convergence seems to stem from the sequence of predictives instead of the underlying distribution, if any, pulling Bayes by its own bootstrap…! Chris Holmes also talked the next day about this approach, esp. about a Bayesian approach to causality that does not require counterfactuals, in connection with a recent arXival of his (on my reading list).

Carlo Alberto presented both his 2014 SABC (simulated annealing) algorithm with a neat idea of reducing waste in the tempering schedule and a recent summary selection approach based on an auto-encoder function of both y and noise to reduce to sufficient statistic. A similar idea was found in Yannik Schälte’s talk (slide above). Who was returning to Richard Wiilkinson’s exact ABC¹³ with adaptive sequential generator, also linking to simulated annealing and ABC-SMC¹² to the rescue. Notion of amortized inference. Seemingly approximating data y with NN and then learn parameter by a normalising flow.

David Frazier talked on Q-posterior²³ approach, based on Fisher’s identity, for approximating score function, which first seemed to require some exponential family structure on a completed model (but does not, after discussing with David!), Jack Jewson on beta divergence priors²³ for uncertainty on likelihoods, better than KLD divergence on e-contamination situations, any impact on ABC? Masahiro Fujisawa back to outliers impact on ABC, again with e-contaminations (with me wondering at the impact of outliers on NN estimation).

In the afternoon session (due to two last minute cancellations, we skipped (or [MCMC] skied) one afternoon session, which coincided with a bright and crispy day, how convenient! ), Massi Tamborino (U of Warwick) FitzHugh-Nagumo process, with impossibilities to solve the inference problem differently, for instance Euler-Maruyama does not always work, numerical schemes are inducing a bias. Back to ABC with the hunt for a summary that get rid of the noise, as in Carlo Alberto’s work. Yuexi Wang talked about her works on adversarial ABC inspired from GANs. Another instance where noise is used as input. True data not used in training? Imke Botha discussed an improvement to ensemble Kalman inversion which, while biased, gains over both regular SMC timewise and ensemble Kalman inversion in precision, and Chaya Weerasinghe focussed on Bayesian forecasting in state space models under model misspecification, via approximate Bayesian computation, using an auxiliary model to produce summary statistics as in indirect inference.

robust inference using posterior bootstrap

Posted in Books, Statistics, University life with tags , , , , , , , , , , , , , , , on February 18, 2022 by xi'an

The famous 1994 Read Paper by Michael Newton and Adrian Raftery was entitled Approximate Bayesian inference, where the boostrap aspect is in randomly (exponentially) weighting each observation in the iid sample through a power of the corresponding density, a proposal that happened at about the same time as Tony O’Hagan suggested the related fractional Bayes factor. (The paper may also be equally famous for suggesting the harmonic mean estimator of the evidence!, although it only appeared as an appendix to the paper.) What is unclear to me is the nature of the distribution g(θ) associated with the weighted bootstrap sample, conditional on the original sample, since the outcome is the result of a random Exponential sample and of an optimisation step. With no impact of the prior (which could have been used as a penalisation factor), corrected by Michael and Adrian via an importance step involving the estimation of g(·).

At the Algorithm Seminar today in Warwick, Emilie Pompe presented recent research, including some written jointly with Pierre Jacob, [which I have not yet read] that does exactly that inclusion of the log prior as penalisation factor, along with an extra weight different from one, as motivated by the possibility of a misspecification. Including a new approach to cut models. An alternative mentioned during the talk that reminds me of GANs is to generate a pseudo-sample from the prior predictive and add it to the original sample. (Some attendees commented on the dependence of the later version on the chosen parameterisation, which is an issue that had X’ed my mind as well.)

ABC by classification

Posted in pictures, Statistics, Travel, University life with tags , , , , , , , , , , on December 21, 2021 by xi'an

As a(nother) coincidence, yesterday, we had a reading group discussion at Paris Dauphine a few days after Veronika Rockova presented the paper in person in Oaxaca. The idea in ABC by classification that she co-authored with Yuexi Wang and Tetsuya Kaj is to use the empirical Kullback-Leibler divergence as a substitute to the intractable likelihood at the parameter value θ. In the generalised Bayes setting of Bissiri et al. Since this quantity is not available it is estimated as well. By a classification method that somehow relates to Geyer’s 1994 inverse logistic proposal, using the (ABC) pseudo-data generated from the model associated with θ. The convergence of the algorithm obviously depends on the choice of the discriminator used in practice. The paper also makes a connection with GANs as a potential alternative for the generalised Bayes representation. It mostly focus on the frequentist validation of the ABC posterior, in the sense of exhibiting a posterior concentration rate in n, the sample size, while requiring performances of the discriminators that may prove hard to check in practice. Expanding our 2018 result to this setting, with the tolerance decreasing more slowly than the Kullback-Leibler estimation error.

Besides the shared appreciation that working with the Kullback-Leibler divergence was a nice and under-appreciated direction, one point that came out of our discussion is that using the (estimated) Kullback-Leibler divergence as a form of distance (attached with a tolerance) is less prone to variability (or more robust) than using directly (and without tolerance) the estimate as a substitute to the intractable likelihood, if we interpreted the discrepancy in Figure 3 properly. Another item was about the discriminator function itself: while a machine learning methodology such as neural networks could be used, albeit with unclear theoretical guarantees, it was unclear to us whether or not a new discriminator needed be constructed for each value of the parameter θ. Even when the simulations are run by a deterministic transform.

%d bloggers like this: