## Archive for approximate likelihood

## ISBA 2016 [#3]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags ABC, approximate likelihood, Calasetta, ISBA 2016, j-ISBA, loss function, restricted inference, Sant'Antioco, Sardinia, Statistical Science, summary statistics on June 16, 2016 by xi'an

Among the sessions I attended yesterday, I really liked the one on robustness and model misspecification. Especially the talk by Steve MacEachern on Bayesian inference based on insufficient statistics, with a striking graph of the degradation of the Bayes factor as the prior variance increases. I sadly had no time to grab a picture of the graph, which compared this poor performance against a stable rendering when using a proper summary statistic. It clearly relates to our work on ABC model choice, as well as to my worries about the Bayes factor, which explains why I am quite excited about this notion of restricted inference. In this session, Chris Holmes also summarised his two recent papers on loss-based inference, which I discussed here in a few posts, including the Statistical Science discussion Judith and I wrote recently. I also went to the j-ISBA [section] session, which was sadly under-attended, maybe due to too many parallel sessions, maybe due to the lack of a unifying statistical theme.

## communication-efficient distributed statistical learning

Posted in Books, Statistics, University life with tags approximate likelihood, big data, distributed computing, logistic regression, M-estimation, MCMC, scalability, tall data, Taylor expansion on June 10, 2016 by xi'an

**M**ichael Jordan, Jason Lee, and Yun Yang just arXived a paper with their proposal on handling large datasets through distributed computing, thus contributing to the currently very active research topic of approximate solutions in large Bayesian models. The core of the proposal is summarised by the screenshot above, where the approximate likelihood replaces the exact likelihood with a first-order Taylor expansion. The first term is the likelihood computed for a given subsample (or a given thread) at a ratio of one to N, and the difference of the gradients is only computed once at a good enough guess. While the paper also considers M-estimators and non-Bayesian settings, the Bayesian part thus consists in running a regular MCMC when the log-target is approximated by the above. I first thought this proposal amounted to a Gaussian approximation à la Simon Wood or to an INLA approach but this is not the case: the first term of the approximate likelihood is exact and hence can be of any form, while the scalar product is linear in θ, providing a sort of first-order approximation, albeit frozen at the chosen starting value.
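Since the screenshot is not reproduced here, a minimal Python sketch of the construction as described above may help: an exact average log-likelihood on one block, plus a term linear in θ that is built from a difference of gradients evaluated once at a guess. The logistic model, the function names (`avg_loglik`, `avg_grad`, `make_surrogate`), the simulated data, and the sign convention (log-likelihood rather than loss) are all assumptions made for illustration, not the authors' code.

```python
import math
import random

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def avg_loglik(theta, data):
    """Average logistic log-likelihood of `data`, a list of (x, y) pairs
    with x a tuple of covariates and y in {0, 1}."""
    return sum(y * dot(theta, x) - math.log1p(math.exp(dot(theta, x)))
               for x, y in data) / len(data)

def avg_grad(theta, data):
    """Gradient of avg_loglik with respect to theta."""
    g = [0.0] * len(theta)
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-dot(theta, x)))
        for j, xj in enumerate(x):
            g[j] += (y - p) * xj
    return [gj / len(data) for gj in g]

def make_surrogate(blocks, theta_bar):
    """Surrogate average log-likelihood using the first block only,
    corrected by a gradient difference frozen at theta_bar."""
    local = blocks[0]
    full = [obs for block in blocks for obs in block]
    g_local = avg_grad(theta_bar, local)
    g_full = avg_grad(theta_bar, full)  # computed once, at the guess
    shift = [gf - gl for gf, gl in zip(g_full, g_local)]

    def surrogate(theta):
        # exact local term + scalar product that is linear in theta
        return avg_loglik(theta, local) + dot(theta, shift)

    return surrogate

# Hypothetical toy data: 4 "machines" holding 50 observations each.
rng = random.Random(1)

def simulate_block(n, true_theta=(0.5, -1.0)):
    block = []
    for _ in range(n):
        x = (1.0, rng.gauss(0.0, 1.0))
        p = 1.0 / (1.0 + math.exp(-dot(true_theta, x)))
        block.append((x, 1 if rng.random() < p else 0))
    return block

blocks = [simulate_block(50) for _ in range(4)]
theta_bar = [0.3, -0.2]  # the "good enough guess"
surrogate = make_surrogate(blocks, theta_bar)
print(surrogate([0.5, -1.0]))
```

By construction the gradient of this surrogate at `theta_bar` coincides with the full-data gradient there, which is the sense in which the correction is first order and frozen at the starting value.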

Assuming that each block of the dataset is stored on a separate machine, I think the approach could further be implemented in parallel, running N MCMC chains and comparing the outputs, with a post-simulation summary stemming from the N empirical distributions thus produced. I also wonder how the method would perform outside the fairly smooth logistic regression case, where the single sample captures the target well enough. The picture above shows a minor gain in a misclassification rate that is already essentially zero.

## Bayesian composite likelihood

Posted in Books, Statistics, University life with tags ABC, approximate likelihood, Bayesian Analysis, composite likelihood, Kullback-Leibler divergence, machine learning, mixture of experts on February 11, 2016 by xi'an

“…the pre-determined weights assigned to the different associations between observed and unobserved values represent strong a priori knowledge regarding the informativeness of clues. A poor choice of weights will inevitably result in a poor approximation to the “true” Bayesian posterior…”

**L**ast Xmas, Alexis Roche arXived a paper on Bayesian inference via composite likelihood. I find the paper quite interesting in that [and only in that] it defends the innovative notion of writing a composite likelihood as a pool of opinions about some features of the data. Recall that each term in the composite likelihood is a marginal likelihood for some projection z=f(y) of the data y. As in ABC settings, although closed-form expressions for those marginals are rarely available. The composite likelihood is parameterised by powers of those components. Each component is associated with an expert, whose weight reflects its importance. The sum of the powers is constrained to be equal to one, even though I do not understand why the dimensions of the projections play no role in this constraint. Simplicity is advanced as an argument, which sounds rather weak… Even though this may be infeasible in any realistic problem, it would be more coherent to see the weights as producing the best Kullback-Leibler approximation to the true posterior. Or to use a prior on the weights and estimate them along with the parameter θ. The former could be incorporated into the latter following the approach of Holmes & Walker (2013). While the ensuing discussion is most interesting, it falls short of connecting the different components in terms of the (joint) information they bring about the parameters. Especially because the weights are assumed to be given rather than inferred. Especially when they depend on θ. I also wonder why the variational Bayes interpretation is not exploited any further. And see no clear way to exploit this perspective in an ABC environment.
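To fix ideas on the weighted pool of experts, here is a toy Python sketch in which the composite log-likelihood is the sum of the experts' log marginal likelihoods, each multiplied by its power, with the powers constrained to sum to one. The Gaussian experts, the projections, and the names `normal_logpdf` and `composite_loglik` are hypothetical choices for illustration and are not taken from the paper.

```python
import math

def normal_logpdf(z, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((z - mean) / sd) ** 2

def composite_loglik(theta, y, experts, weights):
    """Log of prod_i p_i(f_i(y) | theta) ** w_i.

    experts: list of (f, logpdf) pairs, where f projects the data y to
    z = f(y) and logpdf(z, theta) is the log marginal density of z.
    weights: the powers, required here to sum to one.
    """
    if abs(sum(weights) - 1.0) > 1e-12:
        raise ValueError("weights must sum to one")
    return sum(w * logpdf(f(y), theta)
               for (f, logpdf), w in zip(experts, weights))

# Assumed toy setting: y = (y1, y2, y3) i.i.d. N(theta, 1).
# Expert 1 only sees y1; expert 2 only sees the mean of (y2, y3),
# whose marginal is N(theta, 1/2).
y = (0.8, 1.4, 1.0)
experts = [
    (lambda y: y[0],
     lambda z, theta: normal_logpdf(z, theta, 1.0)),
    (lambda y: (y[1] + y[2]) / 2,
     lambda z, theta: normal_logpdf(z, theta, math.sqrt(0.5))),
]
print(composite_loglik(1.0, y, experts, weights=(0.5, 0.5)))
```

In this sketch the weights are fixed inputs, which is exactly the feature questioned above; putting a prior on them would turn `weights` into an extra argument to be sampled along with `theta`.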

## MCMskv #5 [future with a view]

Posted in Kids, Mountains, R, Statistics, Travel, University life with tags airbnb, approximate likelihood, asynchronous algorithms, BayesComp, BAYSM, big data, computational complexity, exact Monte Carlo, Lenzerheide, likelihood-free methods, MCMC convergence, MCMskv, Metropolis-Hastings algorithm, noisy Metropolis-Hastings algorithm, quasi-Monte Carlo methods, snow, Switzerland on January 12, 2016 by xi'an

**A**s I am flying back to Paris (with an afternoon committee meeting in München in-between), I am reminiscing on the superlative scientific quality of this MCMski meeting, on the novel directions in computational Bayesian statistics exhibited therein, and on the potential settings for the next meeting. If any.

First, as hopefully obvious from my previous entries, I found the scientific program very exciting, with almost uniformly terrific talks, and a coverage of the field of computational Bayesian statistics that is perfectly tuned to my own interest. In that sense, MCMski is my “top one” conference! Even without considering the idyllic location. While some of the talks were about papers I had already read (and commented here), others brought new vistas and ideas. If one theme is to emerge from this meeting it has to be the one of approximate and noisy algorithms, with a wide variety of solutions and approaches to overcome complexity issues. If anything, I wish the solutions would also incorporate the Boxian fact that the statistical models themselves are approximate. Overall, a fantastic program (says one member of the scientific committee).

Second, as with previous MCMski meetings, I again enjoyed the unique ambience of the meeting, which always feels more relaxed and friendly than other conferences of a similar size, maybe because of the après-ski atmosphere or of the special coziness provided by luxurious mountain hotels. This year's hotel was particularly pleasant, with non-guests like myself able to partake of some of its facilities. A big thank you to Anto for arranging so meticulously all the details of such a large meeting!!! I am even more grateful when realising this is the third time Anto takes over the heavy load of organising MCMski. Grazie mille!

Since this is a [and even the!] BayesComp conference, the current section program chair and board must decide on the structure and schedule of the next meeting. A few suggestions if I may: I would scrap entirely the name *MCMski* from the next conference as (a) it may sound like academic tourism for unaware bystanders (who only need to check the program of any of the MCMski conferences to stand reassured!) and (b) its topics go way beyond MCMC. Given the large attendance and equally large proportion of young researchers, I would also advise against hosting the conference in a ski resort for both cost and accessibility reasons [as we had already discussed after MCMskiv], in favour of a town large enough to offer a reasonable range of accommodations and of travel options. Like Chamonix, Innsbruck, Reykjavik, or any place with a major airport about one hour away… If nothing is available with skiing possibilities, so be it! While the outdoor inclinations of the early organisers induced us to pick locations where skiing over lunch break was a perk, any accessible location that allows for a concentration of researchers in a small area and for the ensuing day-long exchange is fine! Among the novelties in the program, the tutorials and the Breaking news! sessions were quite successful (says one member of the scientific committee). And should be continued in one format or another. Maybe a more programming-oriented thread could be added as well… And as we had mentioned earlier, it would be great to see a stronger involvement of the Young Bayesian section in the program! (Even though the current meeting already had many young researcher talks.)

## never mind the big data here’s the big models [workshop]

Posted in Kids, pictures, Statistics, Travel, University life with tags approximate likelihood, Bayesian model comparison, Bayesian statistics, big data, big models, GAMs, gaussian process, latent Gaussian models, likelihood function, misspecified model, model criticism, modelling, point processes, Sex Pistols, spatial statistics, University of Warwick on December 22, 2015 by xi'an

**M**aybe the last occurrence this year of the pastiche of the iconic LP of the Sex Pistols!, made by Tamara Polajnar. The last workshop as well of the big data year in Warwick, organised by the Warwick Data Science Institute. I appreciated the different talks this afternoon, but enjoyed particularly Dan Simpson’s and Rob Scheichl’s. The presentation by Dan was so hilarious that I could not resist asking him for permission to post the slides here:

Not only hilarious [and I have certainly missed 67% of the jokes], but quite deep about the meaning(s) of modelling and his views about getting around the most blatant issues. Rob presented a more computational talk on the ways to reach petaflops on current supercomputers, in connection with weather prediction models used (or soon to be used) by the Met Office. For a prediction area of 1 km². Along with significant improvements resulting from multiscale Monte Carlo and quasi-Monte Carlo. Definitely impressive! And a brilliant conclusion to the Year of Big Data (and big models).

## BAYSM’14 recollection

Posted in Books, Kids, pictures, Statistics, Travel, University life, Wines with tags ABC, approximate likelihood, architecture, Austria, BAYSM 2014, Donau, econometrics, Heuriger, interweaving, MCMC, Vienna, Wien, WU Wien, young Bayesians on September 23, 2014 by xi'an

**W**hen I got invited to BAYSM’14 last December, I was quite excited to be part of the event. (And to have the opportunity to be in Austria, in Wien and on the new WU campus!) And most definitely and a posteriori I have not been disappointed given the high expectations I had for that meeting…! The organisation was seamless, even by Austrian [high] standards, the program diverse and innovative, if somewhat brutal for older Bayesians, and the organising committee (Angela Bitto, Gregor Kastner, and Alexandra Posekany) deserves an ISBA recognition award [yet to be created!] for their hard work and dedication. Thanks also to Sylvia Frühwirth-Schnatter for hosting the meeting in her university. They set the standard very high for the next BAYSM organising team. (To be held in Firenze/Florence, on June 19-21, 2016, just prior to the ISBA World meeting *not* taking place in Banff. A great idea to associate with a major meeting, in order to save on travel costs. Maybe the following BAYSM will take place in Edinburgh! Young, local, and interested Bayesians just have to contact the board of BAYS with proposals.)

So, very exciting and diverse. A lot of talks in applied domains, esp. economics and finance in connection with the themes of the guest institution, WU. On the talks most related to my areas of interest, I was pleased to see Matthew Simpson working on interweaving MCMC with Vivek Roy and Jarad Niemi, Madhura Killedar constructing her own kind of experimental ABC on galaxy clusters, Kathrin Plankensteiner using Gaussian processes on accelerated test data, Julyan Arbel explaining modelling by completely random measures for hazard mixtures [and showing his filiation with me by (a) adapting my pun title to his talk, (b) adding an unrelated mountain picture to the title page, (c) including a picture of a famous probabilist, Paul Lévy, to his introduction of Lévy processes and (d) using xkcd strips], Ewan Cameron considering future ABC for malaria modelling, Konstantinos Perrakis working on generic importance functions in data augmentation settings, Markus Hainy presenting his likelihood-free design (that I commented a while ago), Kees Mulder explaining how to work with the circular von Mises distribution. Not to mention the numerous posters I enjoyed over the first evening. And my student Clara Grazian who talked about our joint and current work on Jeffreys priors for mixtures of distributions. Whose talk led me to think of several extensions…

Besides my trek through past and current works of mine dealing with mixtures, the plenary sessions for mature Bayesians were given by Mike West and Chris Holmes, who gave very different talks but with a similar message: that data was catching up with modelling, and with a vengeance, and that we [or rather young Bayesians] needed to deal with this difficulty. And use approximate or proxy models. Somewhat in connection with my last part on an alternative to Bayes factors, Mike also mentioned a modification of the factor in order to attenuate the absorbing impact of long time series. And Chris re-set Bayesian analysis within decision theory, constructing approximate models by incorporating the loss function as a substitute for the likelihood.

Once again, a terrific meeting in a fantastic place with a highly unusual warm spell. Plus enough time to run around Vienna and its castles and churches. And enjoy local wines (great conference evening at a Heuriger, where we did indeed experience Gemütlichkeit.) And museums. Wunderbar!

## proper likelihoods for Bayesian analysis

Posted in Books, Statistics, University life with tags ABC, approximate likelihood, asymptotic normality, Bayesian Analysis, Biometrika, Montpellier, Padova, summary statistics on April 11, 2013 by xi'an

**W**hile in Montpellier yesterday (where I also had the opportunity of tasting an excellent local wine!), I had a look at the 1992 Biometrika paper by Monahan and Boos on “*Proper likelihoods for Bayesian analysis*”. This is a paper I missed and that was pointed out to me during the discussions in Padova. The main point of this short paper is to decide when a method based on an approximate likelihood function is truly (or properly) Bayes. Just the very question a bystander would ask of ABC methods, wouldn’t it?! The validation proposed by Monahan and Boos is one of calibration of credible sets, just as in the recent arXiv paper of Dennis Prangle, Michael Blum, G. Popovic and Scott Sisson I reviewed three months ago. The idea is indeed to check by simulation that the true posterior coverage of an α-level set equals the nominal coverage α. In other words, the predictive based on the likelihood approximation should be uniformly distributed and this leads to a goodness-of-fit test based on simulations. As in our ABC model choice paper, *Proper likelihoods for Bayesian analysis* notices that Bayesian inference drawn upon an insufficient statistic is proper and valid, simply less accurate than the Bayesian inference drawn upon the whole dataset. The paper also states a conjecture:

A [approximate] likelihood L is a coverage proper Bayesian likelihood if and only if L has the form L(y|θ) = c(s) g(s|θ) where s=S(y) is a statistic with density g(s|θ) and c(s) some function depending on s alone.

conjecture that sounds incorrect in that noisy ABC is also well-calibrated. (I am not 100% sure of this argument, though.) An interesting section covers the case of pivotal densities as substitute likelihoods and of the confusion created by the double meaning of the parameter θ. The last section is also connected with ABC in that Monahan and Boos reflect on the use of large sample approximations, like normal distributions for estimates of θ, which are a special kind of statistic, but do not report formal results on the asymptotic validation of such approximations. All in all, a fairly interesting paper!

**R**eading this highly interesting paper also made me realise that the criticism I had made in my review of Prangle et al. about the difficulty for this calibration method to address the issue of summary statistics was incorrect: when using the true likelihood function, the use of an arbitrary summary statistic is validated by this method and is thus proper.
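The coverage check can be illustrated by a small simulation, sketched here in Python under an assumed conjugate toy model (θ ~ N(0,1), observations N(θ,1), statistic s equal to the mean of m observations) that is mine and not taken from Monahan and Boos. The function name `empirical_coverage` and the parameter `v_lik`, the variance the working likelihood attributes to s, are hypothetical. With the true density of s, the credible intervals should cover at the nominal rate whatever m is; with a misstated variance they should not.

```python
import random
from statistics import NormalDist

def empirical_coverage(n_rep, m, v_lik, level=0.9, seed=0):
    """Fraction of simulated datasets whose central credible interval,
    built from the working likelihood, contains the true theta.

    Assumed toy model: theta ~ N(0, 1) and s | theta ~ N(theta, 1/m),
    with s the mean of m unit-variance observations. The working
    likelihood treats s as N(theta, v_lik)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    hits = 0
    for _ in range(n_rep):
        theta = rng.gauss(0.0, 1.0)             # draw from the prior
        s = rng.gauss(theta, (1.0 / m) ** 0.5)  # statistic given theta
        post_prec = 1.0 + 1.0 / v_lik           # conjugate normal update
        post_mean = (s / v_lik) / post_prec
        post_sd = post_prec ** -0.5
        hits += abs(theta - post_mean) <= z * post_sd
    return hits / n_rep

# True density of a statistic based on only m = 5 observations.
print(empirical_coverage(20000, m=5, v_lik=1 / 5))
# Same statistic, but the working likelihood understates its variance.
print(empirical_coverage(20000, m=5, v_lik=1 / 50))
```

The first call uses a statistic that would be insufficient if more than five observations were available, yet keeps the exact density of s, so it stays calibrated and is merely less informative; the second call breaks calibration by misrepresenting that density.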