## acronyms [CDT lectures]

Posted in Books, Statistics on May 16, 2022 by xi'an

This week, I gave a short and introductory course in Warwick for the CDT (PhD) students on my perceived connections between reverse logistic regression à la Geyer and GANs, among other things. The first attempt was cancelled in 2020 due to the pandemic, the second one in 2021 was on-line and thus offered little possibility for interaction. Preparing for this third attempt made me read more papers on some statistical analyses of GANs and WGANs, which was more satisfactory [for me] even though I could not get into the technical details…

## finding our way in the dark

Posted in Books, pictures, Statistics on November 18, 2021 by xi'an

The paper Finding our Way in the Dark: Approximate MCMC for Approximate Bayesian Methods by Evgeny Levi and (my friend) Radu Craiu was recently published in Bayesian Analysis. The central motivation for their work is that both ABC and synthetic likelihood are costly methods when the data is large and does not allow for smaller summaries. That is, when summaries S of smaller dimension cannot be directly simulated. The idea is to try to estimate

$h(\theta)=\mathbb{P}_\theta(d(S,S^\text{obs})\le\epsilon)$

since this is the substitute for the likelihood used for ABC. (A related idea is to build an approximate and conditional [on θ] distribution on the distance, an idea with which Doc. Stoehr and I played a wee bit without getting anything definitely interesting!) This is a one-dimensional object, hence non-parametric estimates could be considered… For instance using k-nearest neighbour methods (which were already linked with ABC by Gérard Biau and co-authors). A random forest could also be used (?). Or neural nets. The method still requires a full simulation of new datasets, so I wonder at the gain unless the replacement of the naïve indicator with h(θ) brings clear improvement to the approximation. Hence far fewer simulations. The ESS reduction is definitely improved, esp. since the CPU cost is higher. Could this be associated with the recourse to independent proposals?
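To make the k-nearest neighbour idea concrete, here is a minimal sketch on a toy problem of my own choosing, not the algorithm of Levi and Craiu: the summary S is the mean of n Normal(θ,1) draws, the prior is uniform on (-3,3), and h(θ) is estimated by averaging the acceptance indicators of the k simulated parameters closest to θ. All names (`simulate_summary`, `build_history`, `knn_h`, `exact_h`) and all tuning values are assumptions made for the illustration.

```python
import math
import random


def simulate_summary(theta, n, rng):
    """Toy model (an assumption): S is the mean of n draws from N(theta, 1)."""
    return sum(rng.gauss(theta, 1.0) for _ in range(n)) / n


def build_history(s_obs, eps, n, n_sims, rng):
    """Simulate (theta, indicator) pairs under a U(-3, 3) prior (also an assumption)."""
    pairs = []
    for _ in range(n_sims):
        theta = rng.uniform(-3.0, 3.0)
        hit = abs(simulate_summary(theta, n, rng) - s_obs) <= eps
        pairs.append((theta, 1.0 if hit else 0.0))
    return pairs


def knn_h(theta, history, k):
    """k-NN estimate of h(theta): mean indicator over the k closest simulated thetas."""
    nearest = sorted(history, key=lambda pair: abs(pair[0] - theta))[:k]
    return sum(hit for _, hit in nearest) / k


def exact_h(theta, s_obs, eps, n):
    """Closed form available only because S ~ N(theta, 1/n) in this toy model."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    root_n = math.sqrt(n)
    return cdf((s_obs + eps - theta) * root_n) - cdf((s_obs - eps - theta) * root_n)


rng = random.Random(1)
history = build_history(s_obs=0.0, eps=0.3, n=20, n_sims=5000, rng=rng)
h_hat = knn_h(0.2, history, k=200)
h_true = exact_h(0.2, s_obs=0.0, eps=0.3, n=20)
print(f"k-NN estimate {h_hat:.3f} versus exact value {h_true:.3f}")
```

The past simulations are recycled for every new value of θ, which is where a saving could come from, at the price of a smoothing bias controlled by k.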

In a sense, Bayesian synthetic likelihood does not convey the same appeal, since it is a bit more of a tough cookie: approximating the mean and variance is multidimensional. (BSL is always more expensive!)
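For comparison, a rough sketch of what a Gaussian synthetic log-likelihood evaluation involves, again on a toy setting that is entirely my assumption (Normal(θ,1) data, with the sample mean and sample variance as summaries): every value of θ requires m fresh simulations to estimate a mean vector and a covariance matrix, rather than a single scalar. The function names and the values of n and m are hypothetical.

```python
import math
import random


def summaries(theta, n, rng):
    """Toy model (an assumption): mean and variance of n draws from N(theta, 1)."""
    y = [rng.gauss(theta, 1.0) for _ in range(n)]
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    return mean, var


def synthetic_loglik(theta, s_obs, n, m, rng):
    """Gaussian synthetic log-likelihood at theta, estimated from m simulated summary pairs."""
    sims = [summaries(theta, n, rng) for _ in range(m)]
    mu = [sum(s[j] for s in sims) / m for j in range(2)]
    # Unbiased 2x2 covariance estimate of the simulated summaries.
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in sims) / (m - 1)
            for b in range(2)] for a in range(2)]
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = s_obs[0] - mu[0], s_obs[1] - mu[1]
    # Squared Mahalanobis distance, using the explicit inverse of a 2x2 matrix.
    maha = (cov[1][1] * d0 * d0 - 2.0 * cov[0][1] * d0 * d1
            + cov[0][0] * d1 * d1) / det
    # Bivariate Normal log-density evaluated at the observed summaries.
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * maha


rng = random.Random(2)
s_obs = (0.0, 1.0)  # hypothetical observed summaries
ll_near = synthetic_loglik(0.0, s_obs, n=50, m=200, rng=rng)
ll_far = synthetic_loglik(1.0, s_obs, n=50, m=200, rng=rng)
print(f"synthetic log-likelihood at 0: {ll_near:.2f}, at 1: {ll_far:.2f}")
```

With only two summaries the covariance is cheap to invert by hand, but the number of moments to estimate grows quadratically with the dimension of the summary statistic.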

As a side remark, the authors use two chains in parallel to simplify convergence proofs, as we did a while ago with AMIS!

## ABC in Svalbard [the day after]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, Travel, University life on April 19, 2021 by xi'an

The following and very kind email was sent to me the day after the workshop

thanks once again to make the conference possible. It was full of interesting studies within a friendly environment, I really enjoyed it. I think it is not easy to make a comfortable and inspiring conference in a remote version and across two continents, but this has been the result. I hope to be in presence (maybe in Svalbard!) the next edition.

and I fully agree with the talks being full of interest and diverse. And with the scheduling of the talks across antipodal locations being a wee bit of a challenge, mostly because of the daylight saving time switches! And with seeing people together being a comfort (esp. since some were enjoying wine and cheese!).

I nonetheless found the experience somewhat daunting, only alleviated by sharing a room with a few others in Dauphine and having the opportunity to react immediately (and off-the-record) to the on-going talk. As a result I find myself getting rather scared by the prospect of the incoming ISBA 2021 World meeting. With parallel sessions and an extensive schedule from 5:30am till 9:30pm (in EDT, i.e. GMT-4) that nicely accommodates the time zones of all speakers. I am thus thinking of (safely) organising a local cluster to attend the conference together and recover some of the social interactions that are such an essential component of [real] conferences, including students’ participation. It will of course depend on whether conference centres like CIRM reopen before the end of June. And if enough people see some appeal in this endeavour. In the meanwhile, remember to register for ISBA 2021, and for free!, before 01 May.

## ABC in Svalbard [#2]

Posted in Books, Mountains, pictures, R, Running, Statistics, Travel, University life on April 14, 2021 by xi'an

The second day of the ABC wwworkshop got a better start than yesterday [for me] as I managed to bike to Dauphine early enough to watch the end of Gael’s talk and Matias Quiroz’ in full on the Australian side (of zoom). With an interesting take on using frequency-domain (pseudo-)likelihoods in complex models. Followed by two talks by David Frazier from Monash and Chris Drovandi from Brisbane on BSL, the first on misspecification with a finer analysis as to why synthetic likelihood may prove worse: the Mahalanobis distance behind it may get very small and the predictive distribution of the distance may become multimodal. Also pointing out the poor coverage of both ABC and BSL credible intervals. And Chris gave a wide-ranging coverage of summary-free likelihood-free approaches, with examples where they were faring well against some summary-based solutions. Olivier from Grenoble [with a co-author from Monash, keeping up the Australian theme] discussed dimension reductions which could possibly lead to better summary statistics, albeit unrelated to ABC!

Riccardo Corradin considered this most Milanese problem of all problems (!), namely how to draw inference on completely random distributions. The clustering involved in this inference being costly, the authors use our Wasserstein ABC approach on the partitions, with a further link to our ABC-Gibbs algorithm (which Grégoire had just presented) for the tolerance selection. Marko Järvenpää presented an approach related to a just-published paper in Bayesian Analysis, with a notion of noisy likelihood modelled as a Gaussian process. Towards avoiding evaluating the (greedy) likelihood too often, as in the earlier Korattikara et al. (2014). And coining the term of Bayesian Metropolis-Hastings sampler (as the regular Metropolis (Rosenbluth) is frequentist)! And Pedro Rodrigues discussed using normalising flows in poorly identified (or inverse) models. Raising the issue of validating this approximation to the posterior and connecting with earlier talks.

The afternoon session was a replay of the earliest talks from the Australian mirrors. Clara Grazian gave the first talk yesterday on using and improving a copula-based ABC, introducing empirical likelihood, Gaussian processes and splines. Leading to a question as to whether or not the copula family could be chosen by ABC tools. David Nott raised the issue of conflicting summary statistics. Illustrated by a Poisson example using the pair made of the empirical mean and the empirical variance as summary: while the empirical mean is sufficient, conditioning on both leads to a different ABC outcome. Which indirectly relates to a work in progress in our Dauphine group. Anthony Ebert discussed the difficulty of handling state space model parameters with ABC. In an ABCSMC² version, the likelihood is integrated out by a particle filter approximation, but this leads to difficulties with the associated algorithm, which I somewhat associate with the discrete nature of the approximation, possibly incorrectly. Jacob Priddle talked about a whitening version of Bayesian synthetic likelihood. By arguing that the variance of the Monte Carlo approximation to the moments of the Normal synthetic likelihood is much improved when assuming that the components of the summary statistic are independent. I am somewhat puzzled by the proposal, though, in that the whitening matrix needs to be estimated as well.
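The Poisson illustration can be mimicked with a few lines of rejection ABC. This is a sketch under my own assumptions rather than a reproduction of the example from the talk: a U(0,10) prior on the Poisson rate, samples of size 10, a common tolerance on both summaries, and hypothetical observed summaries (mean 2, variance 6) chosen to be overdispersed on purpose so that the two summaries conflict.

```python
import math
import random


def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for small rates)."""
    limit, k, p = math.exp(-lam), 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def abc_poisson(obs_mean, obs_var, eps, n, n_sims, rng):
    """Rejection ABC under a U(0, 10) prior (an assumption), with one or two summaries."""
    keep_mean, keep_both = [], []
    for _ in range(n_sims):
        lam = rng.uniform(0.0, 10.0)
        y = [rpois(lam, rng) for _ in range(n)]
        m = sum(y) / n
        v = sum((x - m) ** 2 for x in y) / (n - 1)
        if abs(m - obs_mean) <= eps:
            keep_mean.append(lam)        # accepted on the (sufficient) mean alone
            if abs(v - obs_var) <= eps:
                keep_both.append(lam)    # accepted on the pair (mean, variance)
    return keep_mean, keep_both


rng = random.Random(3)
# Hypothetical observed summaries, overdispersed relative to a Poisson model.
keep_mean, keep_both = abc_poisson(obs_mean=2.0, obs_var=6.0, eps=1.0,
                                   n=10, n_sims=20000, rng=rng)
post_mean = sum(keep_mean) / len(keep_mean)
post_both = sum(keep_both) / len(keep_both)
print(f"mean only: {len(keep_mean)} draws, ABC posterior mean {post_mean:.2f}")
print(f"mean and variance: {len(keep_both)} draws, ABC posterior mean {post_both:.2f}")
```

Since the same simulations are used for both versions, the draws accepted on the pair are a subset of those accepted on the mean, and the shift between the two ABC posterior means is entirely due to the extra and conflicting summary.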

Thanks to all colleagues and friends involved in building and running the mirrors and making some exchanges possible despite the distances and time differences! Looking forward to a genuine ABC meeting in a reasonable future, and who knows?!, reuniting in Svalbard for real! (The temperature in Longyearbyen today was -14°, if this makes some feel better about missing the trip!!!) Rather than starting a new series of “ABC not in…”

## ABC in Svalbard [#1]

Posted in Books, Mountains, pictures, R, Running, Statistics, Travel, University life on April 13, 2021 by xi'an

It started a bit awkwardly for me as I ran late, having accidentally switched to UK time the previous evening (despite a record-breaking biking-time to the University!), then the welcome desk could not find the key to the webinar room and I ended up following the first session from my office, by myself (and my teapot)… Until we managed to reunite in the said room (with an air quality detector!).

Software sessions are rather difficult to follow and I wonder what the ideal on-line version should be. We could borrow from our teaching experience newly gained from the past year, where we had to engage students without the ability to roam the computer lab and look at their screens to force them to engage in coding. It is however unrealistic to run a computer lab, unless a few “guinea pigs” could be selected in advance and show their progress or lack thereof during the session. In any case, thanks to the speakers who made the presentations of

1. BSL (R)
2. ELFI (Python)
3. ABCpy (Python)

this morning/evening. (Just taking the opportunity to point out the publication of the latest version of DIYABC!).

Florence Forbes’ talk on using mixture of experts was quite alluring (and generated online discussions during the break, recovering some of the fun in real conferences), esp. from my longtime interest in normalising flows in mixtures of regression (and more to come as part of our biweekly reading group!). Louis talked about gaining efficiency by not resampling the entire data in large network models. Edwin Fong brought martingales and infinite dimension distributions to the rescue, generalising Polya urns! And Justin Alsing discussed the advantages of estimating the likelihood rather than estimating the posterior, which sounds counterintuitive. With a return to mixtures as approximations, using instead normalising flows. With the worth-repeating message that ABC marginalises over nuisance parameters so easily! And a nice perspective on ABayesian decision, which does not occur that often in the ABC literature. Cecilia Viscardi made a link between likelihood estimation and large deviations à la Sanov, the rare event being associated with the larger distances, albeit dependent on a primary choice of the tolerance. Michael Gutmann presented an intriguing optimisation Monte Carlo approach from his AISTATS 2020 paper of last year, the simulated parameter being defined by a fiducial inversion. Reweighted by the prior times a Jacobian term, which struck me as a wee bit odd, i.e. using two distributions on θ. And Rito concluded the day by seeking approximate sufficient statistics by constructing exponential families whose components are themselves parameterised as neural networks with neural parameter ω. Leading to an unnormalised model because of the energy function, hence to the use of inference techniques on ω that do not require the constant, like Gutmann & Hyvärinen (2012). And using the (pseudo-)sufficient statistic as ABC summary statistic. Which still requires an exchange MCMC step within ABC.