Archive for sampling

MCMC without evaluating the target [aatB-mMC joint seminar, 24 April]

Posted in pictures, Statistics, Travel, University life on April 11, 2024 by xi'an

On 24 April 2024, Guanyang Wang (Rutgers University, visiting ESSEC) will give a joint All about that Bayes – mostly Monte Carlo seminar on

MCMC when you do not want to evaluate the target distribution

In sampling tasks, it is common for target distributions to be known only up to a normalizing constant. However, in many situations, evaluating even the unnormalized distribution can be costly or infeasible. This issue arises in scenarios such as sampling from the Bayesian posterior for large datasets and sampling from 'doubly intractable' distributions. We provide a way to unify various MCMC algorithms, including several minibatch MCMC algorithms and the exchange algorithm. This framework not only simplifies the theoretical analysis of existing algorithms but also leads to new ones. Similar frameworks exist in the literature, but they concentrate on different objectives.

The talk takes place at 4pm CEST, in room 8 at PariSanté Campus, Paris 15.
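As a pointer for the unfamiliar reader, here is a minimal sketch of one of the algorithms covered by this unifying framework, the exchange algorithm of Murray et al., for a doubly intractable likelihood f(x|θ)=g(x,θ)/Z(θ). The helpers log_g, log_prior, propose (assumed symmetric), and sample_model (an exact sampler from the model) are hypothetical placeholders:

```python
import numpy as np

def exchange_step(theta, x, log_g, log_prior, propose, sample_model, rng):
    """One exchange-algorithm move: the intractable ratio Z(theta)/Z(theta')
    cancels thanks to an auxiliary dataset w drawn at the proposed value."""
    theta_new = propose(theta, rng)        # symmetric proposal assumed
    w = sample_model(theta_new, rng)       # exact draw from f(.|theta_new)
    log_alpha = (log_prior(theta_new) - log_prior(theta)
                 + log_g(x, theta_new) - log_g(x, theta)
                 + log_g(w, theta) - log_g(w, theta_new))
    return theta_new if np.log(rng.random()) < log_alpha else theta
```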

mostly MC [April]

Posted in Books, Kids, Statistics, University life on April 5, 2024 by xi'an

Bayesian differential privacy for free?

Posted in Books, pictures, Statistics on September 24, 2023 by xi'an

“We are interested in the question of how we can build differentially-private algorithms within the Bayesian framework. More precisely, we examine when the choice of prior is sufficient to guarantee differential privacy for decisions that are derived from the posterior distribution (…) we show that the Bayesian statistician’s choice of prior distribution ensures a base level of data privacy through the posterior distribution; the statistician can safely respond to external queries using samples from the posterior.”

Recently I came across this 2016 JMLR paper of Christos Dimitrakakis et al. on “how Bayesian inference itself can be used directly to provide private access to data, with no modification.” Which comes as a surprise since it implies that Bayesian sampling would be enough, per se, to keep both the data private and the information it conveys available. The main assumption on which this result is based is one of Lipschitz continuity of the model density, namely that, for a specific (pseudo-)distance ρ

|\log f(x|\theta)-\log f(y|\theta)|\le L\rho(x,y)

uniformly in θ over a set Θ with enough prior mass

\pi(\Theta)\ge 1-e^{-\epsilon}

for an ε>0. In this case, the Kullback-Leibler divergence between the posteriors π(θ|x) and π(θ|y) is bounded by a constant times ρ(x,y), the constant being 2L when Θ is the entire parameter space. This condition ensures differential privacy on the posterior distribution (and even more on an associated MCMC sample): more precisely, the posterior is (2L,0)-differentially private when Θ is the entire parameter space. While there is an efficiency issue linked with the result, since the bound L is set by the model and hence immovable, this remains a fundamental result for the field (as shown by its high number of citations).
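To make the bound concrete, here is a small numerical sketch (my own toy example, not from the paper) with an exponential model f(x|θ)=θ exp(-θx) and a Gamma prior truncated to Θ=(0,L], so that |log f(x|θ)-log f(y|θ)|=θ|x-y|≤L|x-y|; the quadrature checks that the posterior KL divergence indeed falls below 2Lρ(x,y):

```python
import numpy as np
from scipy import integrate, stats

L = 2.0          # Lipschitz constant: theta <= L on Theta = (0, L]
a, b = 2.0, 1.0  # Gamma prior shape / rate (arbitrary choices)
x, y = 1.3, 1.1  # two neighbouring observations, rho(x, y) = |x - y|

def post(theta, datum):
    # unnormalised posterior, with the prior truncated to Theta = (0, L]
    return stats.gamma.pdf(theta, a, scale=1/b) * theta * np.exp(-theta * datum)

def normalised(datum):
    Z, _ = integrate.quad(post, 0, L, args=(datum,))
    return lambda t: post(t, datum) / Z

px, py = normalised(x), normalised(y)
kl, _ = integrate.quad(lambda t: px(t) * np.log(px(t) / py(t)), 1e-9, L)
print(f"KL = {kl:.4f} <= 2 L rho(x, y) = {2 * L * abs(x - y):.4f}")
```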

transport, diffusions, and sampling

Posted in pictures, Statistics, Travel, University life on November 19, 2022 by xi'an

At the Sampling, Transport, and Diffusions workshop at the Flatiron Institute, on Day #2, Marilou Gabrié (École Polytechnique) gave the second introductory lecture, on merging MCMC sampling with normalising flows fitted to the target distribution by a divergence criterion like KL, which only requires the shape of the target density (i.e., up to a normalising constant). I first wondered about ergodicity guarantees when simultaneously running MCMC and training the map, due to the adaptation of the flow, but the update of the map only depends on the current particle cloud in (8). From an MCMC perspective, it sounds somewhat paradoxical to see the independent sampler making such an unexpected come-back, considering that no insider information is available about the (complex) posterior to drive the [what-you-get-is-what-you-see] construction of the transport map. However, the proposed approach superposes local (random-walk-like) and global (transport) proposals in Algorithm 1.
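For the record, a bare-bones version of such a superposition (my sketch, not the authors' Algorithm 1) alternates at random between a random-walk move and an independence move drawn from the current flow q, with sample() and log_prob() as assumed interfaces of the fitted flow:

```python
import numpy as np

def mh_step(x, log_target, flow, rng, p_global=0.5, step=0.5):
    if rng.random() < p_global:   # global move: flow-based independence sampler
        y = flow.sample()
        log_alpha = (log_target(y) - log_target(x)
                     + flow.log_prob(x) - flow.log_prob(y))
    else:                         # local move: random-walk Metropolis
        y = x + step * rng.standard_normal(x.shape)
        log_alpha = log_target(y) - log_target(x)
    return y if np.log(rng.random()) < log_alpha else x
```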

Qiang Liu followed on learning transport maps, with the interesting notion of causalizing a graph by removing intersections (which are impossible for an ODE, as discussed in Eric Vanden-Eijnden's talk yesterday) through coupling, which underlies his notion of rectified flows. Possibly connecting with the next lightning talk, by Jonathan Weare, on spurious modes created by a variational Monte Carlo sampler and the use of stochastic gradient, corrected by (case-dependent?) regularisation.
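As I understand it (a toy rendering of my own, with a linear drift standing in for a neural network), rectification regresses the constant velocity x₁-x₀ of the straight-line interpolation onto (x,t), so that the learned ODE paths no longer cross:

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal(1000)         # draws from the source
x1 = 3.0 + rng.standard_normal(1000)   # draws from the target (arbitrary coupling)
t = rng.random(1000)
xt = (1 - t) * x0 + t * x1             # points on the straight interpolation paths

# least-squares fit of v(x, t) ~ alpha x + beta t + gamma to the velocities x1 - x0
A = np.column_stack([xt, t, np.ones_like(t)])
coef, *_ = np.linalg.lstsq(A, x1 - x0, rcond=None)
print("fitted drift v(x, t) = {:.2f} x + {:.2f} t + {:.2f}".format(*coef))
```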

Then came a whole series of MCMC talks!

Sam Livingstone spoke on Barker's proposal (an incoming Biometrika paper!) as part of a general class of transforms g of the MH ratio, using jump processes based on a nasty normalising constant related to g (tractable for the original Barker algorithm). I then realised I had missed his StatSci paper on how to speak to statistical physics researchers!
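To recall the special case, Barker's rule replaces the Metropolis-Hastings acceptance min(1,r) with r/(1+r), both instances of a balancing function g satisfying g(r)=r g(1/r); a two-line comparison (my sketch):

```python
import numpy as np

def mh_accept(log_r, rng):      # Metropolis-Hastings: g(r) = min(1, r)
    return np.log(rng.random()) < min(0.0, log_r)

def barker_accept(log_r, rng):  # Barker: g(r) = r / (1 + r) = sigmoid(log r)
    return rng.random() < 1.0 / (1.0 + np.exp(-log_r))
```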

Charles Margossian spoke about using a massive number of short parallel runs (the many-short-chains regime), from a recent paper written with Aki, Andrew, and Lionel Riou-Durand (Warwick), among others. Which brings us back to the challenge of producing convergence diagnostics, and more precisely to the Gelman-Rubin R̂ statistic or its recent nR̂ avatar (with its linear limitations and dependence on parameterisation, as opposed to fuller distributional criteria). The core of the approach is in using blocks of GPUs to improve and speed up the estimation of the between-chain variance. (D for R².) I still wonder at the waste of simulations / computing power resulting from stopping the runs almost immediately after warm-up is over, since reaching the stationary regime or an approximation thereof should be exploited more efficiently. (Starting from a minimal-discrepancy sample would also improve efficiency.)
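For illustration, here is how the plain R̂ (not the nested nR̂ of the paper) behaves in the many-short-chain regime, where the between-chain variance B is estimated from a large number K of chains rather than from long runs (my own sketch):

```python
import numpy as np

def rhat(chains):
    """Gelman-Rubin diagnostic for a (K, N) array of K chains of length N."""
    K, N = chains.shape
    B = N * chains.mean(axis=1).var(ddof=1)   # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (N - 1) / N * W + B / N        # pooled variance estimate
    return np.sqrt(var_plus / W)

draws = np.random.default_rng(1).standard_normal((1024, 16))  # K=1024, N=16
print(f"R-hat = {rhat(draws):.3f}")   # close to 1 for iid stationary draws
```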

Lu Zhang also talked on the issue of cutting down warmup, presenting a paper co-authored with Bob, Andrew, and Aki, recommending Laplace / variational approximations for reaching high-posterior-density regions faster, via an algorithm called Pathfinder that relies on ELBO checks to counter the poor performance of Laplace approximations. In the spirit of the workshop, it could be profitable to further transform / push forward the outcome by a transport map.
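A very rough caricature of the idea (mine, leaving out the L-BFGS curvature estimates and the multi-path version of the actual algorithm): follow an optimisation path towards the mode, wrap a Gaussian around each iterate, and keep the candidate with the highest Monte Carlo ELBO:

```python
import numpy as np
from scipy.optimize import minimize

def pathfinder_sketch(log_post, x0, rng, n_draws=100, sigma=1.0):
    path = [np.array(x0, dtype=float)]
    minimize(lambda z: -log_post(z), x0,
             callback=lambda zk: path.append(np.copy(zk)))
    best, best_elbo = path[0], -np.inf
    for mu in path:
        z = mu + sigma * rng.standard_normal((n_draws, mu.size))
        logq = -0.5 * ((z - mu) ** 2).sum(axis=1) / sigma**2  # up to a constant
        elbo = np.mean([log_post(zi) for zi in z]) - logq.mean()
        if elbo > best_elbo:            # the constant in log q cancels across
            best, best_elbo = mu, elbo  # candidates since sigma is fixed
    return best

rng = np.random.default_rng(0)
log_post = lambda th: -0.5 * np.sum((th - 3.0) ** 2)   # toy Gaussian posterior
print(pathfinder_sketch(log_post, np.zeros(2), rng))   # close to the mode (3, 3)
```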

Yuling Yao (of stacking and Pareto-smoothing fame!) gave an original and challenging (in a positive sense) talk on the many ways of bridging densities [linked with the remark he shared with me the day before] and their statistical significance, questioning our usual reliance on arithmetic or geometric mixtures. Ignoring computational issues, selecting a bridging pattern sounds no different from choosing a parameterised family of embedding distributions. This new typology of models can then be endowed with properties that are more or less appealing. (Occurrences of the Hyvärinen score and our mixtestin perspective in the talk!)
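For the non-expert reader, the two default bridging schemes between densities p₀ and p₁ questioned in the talk are (a sketch with unnormalised Gaussians):

```python
import numpy as np

p0 = lambda x: np.exp(-0.5 * x**2)          # N(0, 1), unnormalised
p1 = lambda x: np.exp(-0.5 * (x - 4)**2)    # N(4, 1), unnormalised

def arithmetic(x, t):                       # mixture path, bimodal in between
    return (1 - t) * p0(x) + t * p1(x)

def geometric(x, t):                        # annealing path, stays unimodal
    return p0(x)**(1 - t) * p1(x)**t

x = np.linspace(-3.0, 7.0, 5)
print(arithmetic(x, 0.5), geometric(x, 0.5))
```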

Miranda Holmes-Cerfon talked about MCMC on stratifications (illustrated by this beautiful picture of nanoparticle random walks), which means sampling under varying constraints and dimensions, with densities defined with respect to the corresponding Hausdorff measures. This sounds like a perfect setting for reversible jump, and in a sense it is, as mentioned in the talk. Except that the moves between manifolds are driven by the proximity to said manifold, helping with a higher acceptance rate, and making the proposals easier to construct since projections (or their reverses) have a physical meaning. (But I could not tell from the talk why the approach was seemingly escaping the symmetry constraint set by Peter Green's RJMCMC on the reciprocal moves between two given manifolds.)

data assimilation and reduced modelling for high-D problems [CIRM]

Posted in Books, Kids, Mountains, pictures, Running, Statistics, University life on February 8, 2021 by xi'an

Next summer, from 19 July till 27 August, there will be a six-week program at CIRM on the above theme, bringing together scientists from both the academic and industrial communities. The program includes a one-week summer school followed by five weeks of research sessions on projects proposed by academic and industrial partners.

Confirmed speakers of the summer school (Jul 19-23) are:

  • Albert Cohen (Sorbonne University)
  • Masoumeh Dashti (University of Sussex)
  • Eric Moulines (Ecole Polytechnique)
  • Anthony Nouy (Ecole Centrale de Nantes)
  • Claudia Schillings (Mannheim University)

Junior participants may apply for fellowships covering part of or their whole stay. Registration and fellowship applications will open soon.