Archive for the Statistics Category

partial rankings and aggregate ranks

Posted in Books, Kids, R, Statistics, Travel, University life with tags , , , , , , , , , on March 22, 2023 by xi'an

When interviewing impressive applicants from a stunning variety of places and background for fellows in our Data Science for Social Good program (in Warwick and Kaiserslautern) this summer, we came through the common conundrum of comparing ranks while each of us only meeting a subset of the candidates. Over a free morning, I briefly thought of the problem (while swimming) and then wrote a short R code to infer about an aggregate ranking, ρ, based on a simple model, namely a Poisson distribution on the distance between an individual’s ranking and the aggregate

d(r_i,\rho)\sim\mathcal P(\lambda)

a uniform distribution on the missing ranks as well as on the aggregate, and a non-informative prior on λ. Leading to a three step Gibbs sampler for the completion and the simulation of ρ and λ.

I am aware that the problem has been tackled in many different ways, including Bayesian ones (as in Deng et al., 2014) and local ones, but this was a fun exercise. Albeit we did not use any model in the end!

JAGS Workshop [10-14 July 2023]

Posted in Statistics, University life, Books, Travel, pictures, R with tags , , , , , , , , on March 21, 2023 by xi'an

Hey, JAGS users and would-be users, be warned that registration is now open for the annual JAGS workshop on probabilistic modelling for cognitive science. The tenth instalment of this workshop takes place July 10–14, 2023 in Amsterdam and online. This workshop is meant for researchers who want to learn how to apply Bayesian inference in practice. Most applications we discuss are taken from the field of cognitive science. The workshop is based on the book Bayesian Cognitive Modeling: A practical course written by Michael Lee and Eric-Jan Wagenmakers. It is followed by a shorter workshop (15-16 July) on Theory and Practice of Bayesian Hypothesis Testing.

BayesComp²³ [aka MCMski⁶]

Posted in Books, Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , , , , , , , , , , , , , , on March 20, 2023 by xi'an

The main BayesComp meeting started right after the ABC workshop and went on at a grueling pace, and offered a constant conundrum as to which of the four sessions to attend, the more when trying to enjoy some outdoor activity during the lunch breaks. My overall feeling is that it went on too fast, too quickly! Here are some quick and haphazard notes from some of the talks I attended, as for instance the practical parallelisation of an SMC algorithm by Adrien Corenflos, the advances made by Giacommo Zanella on using Bayesian asymptotics to assess robustness of Gibbs samplers to the dimension of the data (although with no assessment of the ensuing time requirements), a nice session on simulated annealing, from black holes to Alps (if the wrong mountain chain for Levi), and the central role of contrastive learning à la Geyer (1994) in the GAN talks of Veronika Rockova and Éric Moulines. Victor  Elvira delivered an enthusiastic talk on our massively recycled importance on-going project that we need to complete asap!

While their earlier arXived paper was on my reading list, I was quite excited by Nicolas Chopin’s (along with Mathieu Gerber) work on some quadrature stabilisation that is not QMC (but not too far either), with stratification over the unit cube (after a possible reparameterisation) requiring more evaluations, plus a sort of pulled-by-its-own-bootstrap control variate, but beating regular Monte Carlo in terms of convergence rate and practical precision (if accepting a large simulation budget from the start). A difficulty common to all (?) stratification proposals is that it does not readily applies to highly concentrated functions.

I chaired the lightning talks session, which were 3mn one-slide snapshots about some incoming posters selected by the scientific committee. While I appreciated the entry into the poster session, the more because it was quite crowded and busy, if full of interesting results, and enjoyed the slide solely made of “0.234”, I regret that not all poster presenters were not given the same opportunity (although I am unclear about which format would have permitted this) and that it did not attract more attendees as it took place in parallel with other sessions.

In a not-solely-ABC session, I appreciated Sirio Legramanti speaking on comparing different distance measures via Rademacher complexity, highlighting that some distances are not robust, incl. for instance some (all?) Wasserstein distances that are not defined for heavy tailed distributions like the Cauchy distribution. And using the mean as a summary statistic in such heavy tail settings comes as an issue, since the distance between simulated and observed means does not decrease in variance with the sample size, with the practical difficulty that the problem is hard to detect on real (misspecified) data since the true distribution behing (if any) is unknown. Would that imply that only intrinsic distances like maximum mean discrepancy or Kolmogorov-Smirnov are the only reasonable choices in misspecified settings?! While, in the ABC session, Jeremiah went back to this role of distances for generalised Bayesian inference, replacing likelihood by scoring rule, and requirement for Monte Carlo approximation (but is approximating an approximation that a terrible thing?!). I also discussed briefly with Alejandra Avalos on her use of pseudo-likelihoods in Ising models, which, while not the original model, is nonetheless a model and therefore to taken as such rather than as approximation.

I also enjoyed Gregor Kastner’s work on Bayesian prediction for a city (Milano) planning agent-based model relying on cell phone activities, which reminded me at a superficial level of a similar exploitation of cell usage in an attraction park in Singapore Steve Fienberg told me about during his last sabbatical in Paris.

In conclusion, an exciting meeting that should have stretched a whole week (or taken place in a less congenial environment!). The call for organising BayesComp 2025 is still open, by the way.

 

optimal scaling for proximal MALA [All about that Bayes seminar, 21/03, Palaiseau]

Posted in Statistics, University life with tags , , , , , on March 19, 2023 by xi'an

An All about that Bayes seminar next Tuesday, at 2pm, at AgroParisTech, Francesca Crucinio (formerly Warwick and now ENSAE):

We consider a recently proposed class of MCMC methods which uses proximity maps instead of gradients to build proposal mechanisms which can be employed for both differentiable and non-differentiable targets. These methods have been shown to be stable for a wide class of targets, making them a valuable alternative to Metropolis-adjusted Langevin algorithms (MALA); and have found wide application in imaging contexts. The wider stability properties are obtained by building the Moreau-Yoshida envelope for the target of interest, which depends on a parameter λ. In this work, we investigate the optimal scaling problem for this class of algorithms, which encompasses MALA, and provide practical guidelines for the implementation of these methods.
Joint work with Alain Durmus, Pablo Jiménez, Gareth O. Roberts.

kuva Lapista⁴ [jatp]

Posted in Statistics with tags , , , , , , on March 18, 2023 by xi'an

%d bloggers like this: