## Archive for MCMSki IV

## off to Chamonix!

Posted in Statistics with tags Aiguille des Grands Charmoz, Chamonix-Mont-Blanc, French Alps, ice climbing, jatp, Les Houches, MCMSki IV, Mont Blanc, ski resorts, ski town, vacations on February 1, 2020 by xi'an

## your interesting published article “An introduction to the special issue“

Posted in Books, University life with tags academic journals, automated email, Beall's list, MCMSki IV, open access, predatory publishing on April 1, 2019 by xi'an

**I**n the flow of unsolicited emails interested in publishing my work, a contender for the top call is this one, received today from Computer Communication & Collaboration, which cites my foreword to the special issue of Statistics & Computing published out of the talks at MCMski IV in Chamonix. In 2014. (According to the above site, the publisher of the journal, Better Advances Press, does not meet most of its criteria and was identified as predatory on Beall's List, as of January 3, 2017.)

> Your interesting published article “An introduction to the special issue “Joint IMS-ISBA meeting – MCMSki 4″” drives me to call for new papers, on behalf of Computer Communication & Collaboration, which is an English quarterly journal in Canada.
>
> This peer-reviewed journal focuses on smart internet and it welcomes papers on general theories of computer science, data communications, multimedia, social network, machine learning, data mining, intelligent collaboration and other relevant topics, both theoretical and empirical.
>
> All papers should be written in professional English. The length of 2000-6000 words is suggested. We accept papers in MS-word or PDF format.
>
> If your paper is qualified for publication after refereeing, it will be published within 2-4 months from the date of submission.
>
> Thank you for your consideration.

## off to Chamonix!

Posted in Mountains, pictures, Running, Travel with tags Aiguille des Grands Charmoz, Chamonix-Mont-Blanc, French Alps, ice climbing, jatp, Les Houches, MCMSki IV, Mont Blanc, ski town, vacations on March 2, 2018 by xi'an

## better together?

Posted in Books, Mountains, pictures, Statistics, University life with tags Bayesian Analysis, better together, Chamonix-Mont-Blanc, cut models, decision theory, diode, Martyn Plummer, MCMSki IV, Scottish independence referendum on August 31, 2017 by xi'an

**Y**esterday came out on arXiv a joint paper by Pierre Jacob, Lawrence Murray, Chris Holmes and myself, *Better together? Statistical learning in models made of modules*, a paper that was conceived during the MCMski meeting in Chamonix, 2014! Indeed it is mostly due to Martyn Plummer‘s talk at this meeting about the cut issue that we started to work on this topic at the fringes of the [standard] Bayesian world. Fringes because a standard Bayesian approach to the problem would always lead to using the entire dataset and the entire model to infer about a parameter of interest. *[Disclaimer: the use of the very slogan of the anti-secessionists during the Scottish Independence Referendum of 2014 in our title is by no means a measure of support of their position!]* Comments and suggested applications are most welcome!

The setting of the paper is inspired by realistic situations where a model is made of several modules, connected within a graphical model that represents the statistical dependencies, each module relating to a specific data modality. In a standard Bayesian analysis, given data, a conventional statistical update allows for coherent uncertainty quantification and information propagation through and across the modules. However, misspecification of, or even massive uncertainty about, any module in the graph can contaminate the estimates and updates of the parameters of other modules, often in unpredictable ways. Particularly so when certain modules are trusted more than others. Hence the appearance of cut models, where practitioners prefer skipping the full model and limiting the information propagation between these modules, for example by restricting propagation to only one direction along the edges of the graph. (Which is sometimes represented as a diode on the edge.) The paper investigates in which situations and under which formalism such modular approaches can outperform the full-model approach in misspecified settings, by developing the appropriate decision-theoretic framework, meaning one can choose between [several] modular and full-model approaches.
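To make the cut idea concrete, here is a minimal toy sketch, entirely of my own making and not taken from the paper, with two Gaussian modules where the second carries an unmodelled bias: the full posterior on the shared parameter gets contaminated, while the cut version only lets information flow from the trusted module.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-module model (my toy example, not the paper's):
#   module 1: y1_i ~ N(phi, 1)                      [trusted, flat prior on phi]
#   module 2: y2_i ~ N(phi + theta, 1), theta ~ N(0, tau^2)
# Module 2 is simulated with a large unmodelled bias, i.e. it is misspecified.
phi_true, tau = 1.0, 0.1
y1 = rng.normal(phi_true, 1.0, size=100)
y2 = rng.normal(phi_true + 5.0, 1.0, size=200)   # bias of 5 not in the model
n1, n2 = len(y1), len(y2)

# Full Bayes: both modules inform phi, with precisions n1 and 1/(tau^2 + 1/n2),
# so the misspecified module drags the posterior mean of phi away from 1.
prec1, prec2 = n1, 1.0 / (tau**2 + 1.0 / n2)
phi_full = (prec1 * y1.mean() + prec2 * y2.mean()) / (prec1 + prec2)

# Cut model: information flows only from module 1 to module 2 (the "diode"):
# phi is inferred from y1 alone, then plugged into the conditional of theta.
phi_cut_draws = rng.normal(y1.mean(), 1.0 / np.sqrt(n1), size=10_000)
post_prec = 1.0 / tau**2 + n2
theta_cut_draws = rng.normal(n2 * (y2.mean() - phi_cut_draws) / post_prec,
                             1.0 / np.sqrt(post_prec))
phi_cut = phi_cut_draws.mean()

print(f"full-model posterior mean of phi: {phi_full:.2f}")   # contaminated
print(f"cut posterior mean of phi:        {phi_cut:.2f}")    # stays near 1
```

In this conjugate toy case both posteriors are available in closed form, which makes the contamination of the full-model update directly visible; in realistic modular models the cut distribution is the hard-to-sample object.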

## unbiased MCMC

Posted in Books, pictures, Statistics, Travel, University life with tags convergence assessment, coupling, coupling from the past, cut distribution, MCMC, MCMSki IV, perfect sampling, renewal process on August 25, 2017 by xi'an

**T**wo weeks ago, Pierre Jacob, John O'Leary, and Yves F. Atchadé arXived a paper on unbiased MCMC with coupling. Associating MCMC with unbiasedness is rather challenging, since MCMC algorithms rarely produce simulations from the exact target, unless specific tools like renewal can be exploited in an efficient manner. (I supported the use of such renewal techniques as early as 1995, but later experiments led me to think renewal control was too rare an occurrence to consider it as a generic convergence assessment method.)

This new paper makes me think I had given up too easily! Here the central idea is the coupling of two (MCMC) chains, associated with the debiasing formula used by Glynn and Rhee (2014) and already discussed here. Having the coupled chains meet at some time with probability one implies that the debiasing formula does not need a (random) stopping time: the coupling time is sufficient. Furthermore, several estimators can be derived from the same coupled Markov chain simulations, obtained by starting the averaging at a later time than the first iteration. The average of these (unbiased) averages results in a weighted estimate that puts more weight on the later differences. Although coupling is also at the basis of perfect simulation methods, the analogy between this debiasing technique and perfect sampling is hard to fathom, since the coupling of two chains is not a perfect sampling instant. (Something obvious only in retrospect for me is that the variance of the resulting unbiased estimator is at best the variance of the original MCMC estimator.)
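For the curious reader, here is a minimal sketch of the simplest version of the estimator (averaging from the first iteration only), on a toy independent Metropolis–Hastings target of my own choosing, not one of the paper's examples: the two chains share the proposed value and the uniform at each step, hence meet in finite time, and the telescoping corrections remove the bias of the starting value.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy target N(0,1), independent Metropolis-Hastings with a N(0, 3^2)
# proposal (my own toy setting). The two chains share the proposal and
# the uniform, so they coalesce as soon as both accept the same value.
def log_w(x):
    # log of target density / proposal density (up to a constant)
    return -0.5 * x**2 + 0.5 * (x / 3.0) ** 2

def coupled_step(x, y):
    z = rng.normal(0.0, 3.0)
    log_u = np.log(rng.uniform())
    x_new = z if log_u < log_w(z) - log_w(x) else x
    y_new = z if log_u < log_w(z) - log_w(y) else y
    return x_new, y_new

def unbiased_estimate(h, x0):
    # Rhee-Glynn-type estimator with lag-one chains: X runs one step ahead
    # of Y, and telescoping corrections h(X_t) - h(Y_{t-1}) are accumulated
    # until the chains meet. E[estimate] = E_target[h], whatever the start x0.
    x, _ = coupled_step(x0, x0)   # X_1 (a plain MH step)
    y = x0                        # Y_0, same initial value hence same marginal
    est = h(x0)                   # h(X_0)
    while x != y:
        est += h(x) - h(y)
        x, y = coupled_step(x, y)
    return est

# unbiasedness check: estimates of E[X^2] = 1 under N(0,1), all started
# from the deliberately bad initial value x0 = 2
ests = [unbiased_estimate(lambda v: v * v, 2.0) for _ in range(4000)]
print(f"average of 4000 unbiased estimates of E[X^2]: {np.mean(ests):.3f}")
```

Each single estimate can be quite variable, but averaging independent replicates converges to the exact posterior expectation with no burn-in decision, which is the selling point of the approach for parallel computing.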

When discussing the implementation of coupling in Metropolis and Gibbs settings, the authors give a simple optimal coupling algorithm I was not aware of, which is a form of accept-reject also found in perfect sampling, I believe. (Renewal based on small sets makes an appearance on page 11.) I did not fully understand the way two random walk Metropolis steps are coupled, in that the normal proposals seem at odds with the boundedness constraints. But coupling is clearly working in this setting, while renewal does not. In toy examples like the (Efron and Morris!) baseball data and the (Gelfand and Smith!) pump failure data, the parameters k and m of the algorithm can be optimised against the variance of the averaged averages. And this approach proves highly useful in the case of the cut distribution, a problem which I became aware of during MCMSki IV and on which we are currently working with Pierre and others.
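The optimal coupling in question is, if I understand correctly, the classical maximal coupling of two proposal distributions by accept-reject; the Gaussian setting and names below are my own illustration, not the paper's code. It returns a pair with the prescribed marginals and the largest possible probability of the two components being equal.

```python
import numpy as np

rng = np.random.default_rng(2)

def npdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def maximal_coupling(mu_p, mu_q, sigma):
    # Accept-reject construction of a maximal coupling of p = N(mu_p, sigma^2)
    # and q = N(mu_q, sigma^2): both marginals are exact and
    # P(X = Y) = integral of min(p, q), the largest any coupling allows.
    x = rng.normal(mu_p, sigma)
    if rng.uniform(0.0, npdf(x, mu_p, sigma)) <= npdf(x, mu_q, sigma):
        return x, x                      # proposals coincide: the chains can meet
    while True:                          # otherwise draw Y from the residual of q
        y = rng.normal(mu_q, sigma)
        if rng.uniform(0.0, npdf(y, mu_q, sigma)) > npdf(y, mu_p, sigma):
            return x, y

pairs = [maximal_coupling(0.0, 1.0, 1.0) for _ in range(20_000)]
meet = np.mean([x == y for x, y in pairs])
print(f"empirical meeting probability: {meet:.3f}")   # theory: 2*Phi(-1/2) = 0.617
```

Plugged into two Metropolis chains as the joint proposal, this construction gives each chain its usual proposal marginally while maximising the chance that both propose, and then accept, the same value.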

## future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags ABC, Apple II, approximation, BUGS, computational statistics, expectation-propagation, JAGS, MCMC, MCMSki IV, Monte Carlo, optimisation, STAN, statistical computing, sunset, variational Bayes methods on September 29, 2014 by xi'an

**I** am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing. The breadth and quality of the talks at MCMski IV is testimony to this. A second trend, not unrelated to the first one, is the development of new techniques and the rehabilitation of older ones to handle complex models by approximations, witness ABC, expectation-propagation, variational Bayes, &tc. With a corollary being a healthy questioning of the models themselves, as illustrated for instance in Chris Holmes' talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I remain wary of the alleged “inevitability” of turning statistics into an “optimize+penalize” tunnel vision… A third characteristic is the emergence of new languages and meta-languages intended to handle the complexity both of problems and of solutions, towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that yet another scale of language is now required…

If you have any suggestions of novel directions in computational statistics, or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics…

## likelihood-free inference via classification

Posted in Books, Mountains, pictures, Statistics, Travel, University life with tags ABC, Chamonix, classification, MCMSki IV, random forests, summary statistics on August 5, 2014 by xi'an

**L**ast week, Michael Gutmann, Ritabrata Dutta, Samuel Kaski, and Jukka Corander posted on arXiv the latest version of the paper they had presented at MCMSki 4. As indicated by its (above) title, it suggests implementing ABC based on classification tools, thus making it somewhat connected to our recent random forest paper.

**T**he starting idea in the paper is that datasets generated from distributions with different parameters should be easier to classify than datasets generated from distributions with the same parameters. And that classification accuracy naturally induces a distance between datasets, and between the parameters behind those datasets. We had followed much the same track when starting to use random forests, before realising that, for our model choice setting, proceeding the entire ABC way once the random forest procedure had been constructed was counter-productive. Random forests are just too deadly efficient as model choice machines to try to compete with them through an ABC postprocessing. Performances are just… Not. As. Good!
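As a toy illustration of the principle, entirely my own sketch using a held-out nearest-class-mean rule rather than the classifiers benchmarked in the paper: the cross-validated accuracy of a classifier trained to separate observed from simulated data serves as the ABC discrepancy, with an accuracy of 0.5 meaning the two datasets are indistinguishable.

```python
import numpy as np

rng = np.random.default_rng(3)

def classif_accuracy(x, y):
    # Discrepancy between datasets x and y: 2-fold cross-validated accuracy
    # of a nearest-class-mean classifier (a stand-in for the LDA, SVM, &tc.
    # of the paper). Accuracy ~ 0.5: indistinguishable; ~ 1: fully separable.
    data = np.concatenate([x, y])
    labels = np.concatenate([np.zeros(len(x)), np.ones(len(y))])
    idx = rng.permutation(len(data))
    correct = 0
    for fold in (idx[: len(idx) // 2], idx[len(idx) // 2:]):
        train = np.setdiff1d(idx, fold)
        m0 = data[train][labels[train] == 0].mean()
        m1 = data[train][labels[train] == 1].mean()
        pred = (np.abs(data[fold] - m1) < np.abs(data[fold] - m0)).astype(float)
        correct += (pred == labels[fold]).sum()
    return correct / len(data)

# classification-ABC for the mean of a N(theta, 1) sample (toy example):
# keep the parameter values whose simulated datasets are hardest to tell
# apart from the observed one.
x_obs = rng.normal(2.0, 1.0, size=200)
thetas = rng.uniform(-5.0, 5.0, size=300)
dists = [classif_accuracy(x_obs, rng.normal(t, 1.0, size=200)) for t in thetas]
keep = thetas[np.argsort(dists)[:30]]    # accept the 10% closest datasets
print(f"classification-ABC posterior mean of theta: {keep.mean():.2f}")
```

In this toy case the sample mean is obviously a sufficient summary, so the classifier buys nothing; the appeal of the approach is precisely for settings where informative summary statistics are hard to design by hand.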

**A** side question: I have obviously never thought about that before, but why is the naïve Bayes classification rule so called?! It never sounded very Bayesian to me to (a) use the true value of the parameter and (b) average the classification performances. Interestingly, the authors (i) show identical performances for other classification methods (Fig. 2) and (ii) report an exception for MA time series: when we first experimented with random forests, raw data from an MA(2) model was used to select between MA(1) and MA(2) models, and the performances of the resulting random forest were quite poor.

**N**ow, one opposition between our two approaches is that Michael and his coauthors also include point estimation within the range of classification-based ABC inference. As we stressed in our paper, we restrict the range to classification and model choice because we do not think those machine learning tools are stable and powerful enough to perform regression and posterior probability approximation. I also see a practical weakness in the estimation scheme proposed in this new paper, namely that the Monte Carlo error gets in the way of the consistency theorem, and possibly of the simulation method itself. Another remark is that, while the authors compare the fit produced by different classification methods, there should be a way to aggregate them towards higher efficiency. Returning once more to our random forest paper, we saw improved performances each time we included a reference method, from LDA to SVMs. It would be interesting to see a (summary) variable selection version of the proposed method. A final remark is that computing time and effort do not seem to get mentioned in the paper (unless Indian jetlag confuses me more than usual). I wonder how fast the computing effort grows with the sample size to reach parametric and quadratic convergence rates.