**A**s pointed out by Peter Coles on his blog, In the Dark, Hyungsuk Tak, Sujit Ghosh, and Justin Ellis just arXived a review of the unsafe use of improper priors in astronomy papers, 24 out of 75 having failed to establish that the corresponding posteriors are well-defined. And they exhibit such an instance (of impropriety) in a MNRAS paper by Pihajoki (2017), which is a complexification of Gelfand et al. (1990), also used by Jim Hobert in his thesis. (Even though the formal argument used to show the impropriety of the posterior in Pihajoki’s paper does not sound right since it considers divergence at a single value of a parameter β.) Besides repeating this warning about an issue that was rather quickly identified in the infancy of MCMC, if not in the very first publications on the Gibbs sampler, the paper seems to argue against using improper priors due to this potential danger, stating that instead proper priors that include all likely values and beyond are to be preferred. Which reminds me of the BUGS feature of using a N(0,10⁹) prior instead of the flat prior, missing the fact that “very large” variances do impact the resulting inference (if only for the issue of model comparison, remember Lindley-Jeffreys!). And are informative in that sense. However, it is obviously a good idea to advise checking for propriety (!) and using such alternatives may come as a safety button, providing a comparison benchmark to spot possible divergences in the resulting inference.

## Archive for BUGS

## improperties on an astronomical scale

Posted in Books, pictures, Statistics with tags astronomy, astrostatistics, Bayesian inference, BUGS, improper posteriors, impropriety, noninformative priors, vague priors on December 15, 2017 by xi'an## adaptive rejection sampling with fixed number of nodes

Posted in Books, Statistics, University life with tags accept-reject algorithm, ARS, BUGS, Gibbs sampler, Monte Carlo Statistical Methods, Wally Gilks on October 8, 2015 by xi'an**T**his paper was arXived today by Martino and Louzada. It starts from the adaptive rejection sampling algorithm of Gilks and Wild (1993), which provided an almost automatic random generator for univariate log-concave target probability densities. By creating an envelope of the true target based on convexity. This algorithm was at the core of BUGS in its early days. And makes a good example of accept reject algorithm that I often use in class. It is however not so popular nowadays because of the unidimensional constraint. And because it is not particularly adequate for simulating a *single* realisation from a given target. As is the case when used in a Gibbs sampler. The authors only consider here the issue of the growing cost of running the adaptive rejection sampling algorithm, which is due to the update of the envelope at each rejection. They however fail to quantify this increase, which involves (a) finding the interval where the current rejection lies, among n existing intervals, which is of order O(n), and (b) creating both modified envelopes on both new intervals, which is of order O(1). The proposal of the authors, called cheap adaptive rejection sampling, settles for a fixed number τ of support points (and intervals), solely swapping a rejected point with the closest support point when this improves the acceptance rate. The test for swapping involves first computing the alternative target (on a pair of intervals) and the corresponding normalising constant, while swapping the rejected point with the closest support point involves finding the closest point, which is of order O(log τ). I am rather surprised that the authors do not even mention the execution time orders, resorting instead to a simulation where the gain brought by their proposal is not overwhelming. There is also an additional cost for figuring out the fixed number τ of support points, another issue not mentioned in the paper.

## reis naar Amsterdam

Posted in Books, Kids, pictures, Running, Statistics, Travel, University life, Wines with tags Amsterdam, Bayesian statistics, BUGS, canals, Holland, Ising model, JASP, Marc Kac, minimal description length principle, normalising constant, psychology, R, STAN, UvA on April 16, 2015 by xi'an**O**n Monday, I went to Amsterdam to give a seminar at the University of Amsterdam, in the department of psychology. And to visit Eric-Jan Wagenmakers and his group there. And I had a fantastic time! I talked about our mixture proposal for Bayesian testing and model choice without getting hostile or adverse reactions from the audience, quite the opposite as we later discussed this new notion for several hours in the café across the street. I also had the opportunity to meet with Peter Grünwald [who authored a book on the minimum description length principle] pointed out a minor inconsistency of the common parameter approach, namely that the Jeffreys prior on the first model did not have to coincide with the Jeffreys prior on the second model. (The Jeffreys prior for the mixture being unavailable.) He also wondered about a more conservative property of the approach, compared with the Bayes factor, in the sense that the non-null parameter could get closer to the null-parameter while still being identifiable.

Among the many persons I met in the department, Maarten Marsman talked to me about his thesis research, Plausible values in statistical inference, which involved handling the Ising model [a non-sparse Ising model with O(p²) parameters] by an auxiliary representation due to Marc Kac and getting rid of the normalising (partition) constant by the way. (Warning, some approximations involved!) And who showed me a simple probit example of the Gibbs sampler getting stuck as the sample size n grows. Simply because the uniform conditional distribution on the parameter concentrates faster (in 1/n) than the posterior (in 1/√n). This does not come as a complete surprise as data augmentation operates in an n-dimensional space. Hence it requires more time to get around. As a side remark [still worth printing!], Maarten dedicated his thesis as *“To my favourite random variables , Siem en Fem, and to my normalizing constant, Esther”*, from which I hope you can spot the influence of at least two of my book dedications! As I left Amsterdam on Tuesday, I had time for a enjoyable dinner with E-J’s group, an equally enjoyable early morning run [with perfect skies for sunrise pictures!], and more discussions in the department. Including a presentation of the new (delicious?!) Bayesian software developed there, JASP, which aims at non-specialists [i.e., researchers unable to code in R, BUGS, or, God forbid!, STAN] And about the consequences of mixture testing in some psychological experiments. Once again, a fantastic time discussing Bayesian statistics and their applications, with a group of dedicated and enthusiastic Bayesians!

## future of computational statistics

Posted in Books, pictures, R, Statistics, University life with tags ABC, Apple II, approximation, BUGS, computational statistics, expectation-propagation, JAGS, MCMC, MCMSki IV, Monte Carlo, optimisation, STAN, statistical computing, sunset, variational Bayes methods on September 29, 2014 by xi'anI am currently preparing a survey paper on the present state of computational statistics, reflecting on the massive evolution of the field since my early Monte Carlo simulations on an Apple //e, which would take a few days to return a curve of approximate expected squared error losses… It seems to me that MCMC is attracting more attention nowadays than in the past decade, both because of methodological advances linked with better theoretical tools, as for instance in the handling of stochastic processes, and because of new forays in accelerated computing via parallel and cloud computing, The breadth and quality of talks at MCMski IV is testimony to this. A second trend that is not unrelated to the first one is the development of new and the rehabilitation of older techniques to handle complex models by approximations, witness ABC, Expectation-Propagation, variational Bayes, &tc. With a corollary being an healthy questioning of the models themselves. As illustrated for instance in Chris Holmes’ talk last week. While those simplifications are inevitable when faced with hardly imaginable levels of complexity, I still remain confident about the “inevitability” of turning statistics into an “optimize+penalize” tunnel vision… A third characteristic is the emergence of new languages and meta-languages intended to handle complexity both of problems and of solutions towards a wider audience of users. STAN obviously comes to mind. And JAGS. But it may be that another scale of language is now required…

If you have any suggestion of novel directions in computational statistics or instead of dead ends, I would be most interested in hearing them! So please do comment or send emails to my gmail address bayesianstatistics…

## MCqMC 2014 [day #3]

Posted in pictures, Running, Statistics, Travel, University life, Wines with tags ABC, adaptive MCMC methods, ARMS algorithm, Belgian beer, Belgium, brewery, BUGS, dimension curse, Langevin diffusion, Leffe, Leuven, MALA, MCMC, MCQMC2014, Monte Carlo Statistical Methods, multi-level Monte Carlo, Stella Artois on April 10, 2014 by xi'an**A**s the second day at MCqMC 2014, was mostly on multi-level Monte Carlo and quasi-Monte Carlo methods, I did not attend many talks but had a long run in the countryside (even saw a pheasant and a heron), worked at “home” on pressing recruiting evaluations and had a long working session with Pierre Jacob. Plus an evening out sampling (just) a few Belgian beers in the shade of the city hall…

**T**oday was more in my ballpark as there were MCMC talks the whole day! The plenary talk was not about MCMC as Erich Novak presented a survey on the many available results bounding the complexity of approximating an integral based on a fixed number of evaluations of the integrand, some involving the dimension (and its curse), some not, some as fast as √n and some not as fast, all this depending on the regularity and the size of the classes of integrands considered. In some cases, the solution was importance sampling, in other cases, quasi-Monte Carlo, and yet other cases were still unsolved. Then Yves Atchadé gave a new perspective on computing the asymptotic variance in the central limit theorem on Markov chains when truncating the autocovariance, Matti Vihola talked about theoretical orderings of Markov chains that transmuted into the very practical consequence that using more simulations in a pseudo-marginal likelihood approximation improved acceptance rate and asymptotic variances (and this applies to aBC-MCMC as well), Radu Craiu proposed a novel processing of adaptive MCMC by treating various approximations to the true target as food for a multiple-try Metropolis algorithm, and Luca Martino had a go at resuscitating the ARMS algorithm of Gilks and Wild (used for a while in BUGS), although the talk did not dissipate all of my misgivings about the multidimensional version! I had more difficulties following the “Warwick session” which was made of four talks by current or former students from Warwick, although I appreciated the complexity of the results in infinite dimensional settings and novel approximations to diffusion based Metropolis algorithms. No further session this afternoon as the “social” activity was to visit the nearby Stella Artois brewery! This activity made us very social, for certain, even though there was hardly a soul around in this massively automated factory. *(Maybe an ‘Og post to come one of those days…)*

## A discussion on Bayesian analysis : Selecting Noninformative Priors

Posted in Statistics with tags Bayesian predictive, BUGS, conditional means priors, Dirichlet prior, noninformative priors, PhD thesis, The American Statistician on February 26, 2014 by xi'an**F**ollowing an earlier post on the American Statistician 2013 paper by Seaman III and co-authors, *Hidden dangers of specifying noninformative priors*, my PhD student Kaniav Kamary wrote a paper re-analysing the examples processed by those authors and concluding to the stability of the posterior distributions of the parameters and to the effect of the noninformative prior being essentially negligible. (This is the very first paper quoting verbatim from the ‘Og!) Kaniav logically submitted the paper to the American Statistician.

## cut, baby, cut!

Posted in Books, Kids, Mountains, R, Statistics, University life with tags BUGS, Chamonix, CREST, cut models, decompression, flu, graphical models, JAGS, Martyn Plummer, MCMC, MCMSki IV, Monte Carlo Statistical Methods, OpenBUGS, The BUGS book on January 29, 2014 by xi'an**A**t MCMSki IV, I attended (and chaired) a session where Martyn Plummer presented some developments on cut models. As I was not sure I had gotten the idea *[although this happened to be one of those few sessions where the flu had not yet completely taken over!]* and as I wanted to check about a potential explanation for the lack of convergence discussed by Martyn during his talk, I decided to (re)present the talk at our “MCMSki decompression” seminar at CREST. Martyn sent me his slides and also kindly pointed out to the relevant section of the BUGS book, reproduced above. *(Disclaimer: do not get me wrong here, the title is a pun on the infamous “drill, baby, drill!” and not connected in any way to Martyn’s talk or work!)*

**I** cannot say I get the idea any clearer from this short explanation in the BUGS book, although it gives a literal meaning to the word “cut”. From this description I only understand that a *cut* is the removal of an edge in a probabilistic graph, however there must/may be some arbitrariness in building the wrong conditional distribution. In the Poisson-binomial case treated in Martyn’s case, I interpret the cut as simulating from

instead of

hence loosing some of the information about φ… Now, this cut version is a function of φ and θ that can be fed to a Metropolis-Hastings algorithm. Assuming we can handle the posterior on φ and the conditional on θ given φ. If we build a Gibbs sampler instead, we face a difficulty with the normalising constant m(y|φ). Said Gibbs sampler thus does not work in generating from the “cut” target. Maybe an alternative borrowing from the rather large if disparate missing constant toolbox. (In any case, we *do not* simulate from the original joint distribution.) The natural solution would then be to make a independent proposal on φ with target the posterior given z and then any scheme that preserves the conditional of θ given φ and y; “any” is rather wistful thinking at this stage since the only practical solution that I see is to run a Metropolis-Hasting sampler long enough to “reach” stationarity… I also remain with a lingering although not life-threatening question of whether or not the BUGS code using cut distributions provide the “right” answer or not. Here are my five slides used during the seminar (with a random walk implementation that did not diverge from the true target…):