Archive for MCMC

adaptive incremental mixture MCMC

Posted in Statistics on August 12, 2022 by xi'an

Sadly, I missed this adaptive incremental mixture MCMC paper by my friends Florian Maire, Nial Friel, Antonietta Mira, and Adrian E. Raftery when it came out in JCGS in 2019. The core of the paper is about building a time-inhomogeneous mixture independent proposal, starting from an initial distribution and adding one component when hitting a point for which the ratio target / proposal is large, as this points out a part of the space that is not well-enough explored, while the other components do not change, except for a proportional decrease in the weights. This proposal reminded me of the inspiring paper of Gåsemyr (2003), which in some ways inspired our population Monte Carlo sampler. Obviously, there is a what-you-get-is-what-you-see drawback to the approach in that regions where this ratio is high may never be explored by the proposal, despite its adaptivity.

The added component is Normal, centred at the associated (accepted) proposed value ø and with covariance matrix a local estimate based on past iterations of the algorithm. And with weight proportional to the (powered) target density at ø, which does not require a normalising constant. The method however requires setting a certain number of calibration parameters, like the power γ for the weight, the lower bound M for the target-to-proposal ratio, and the rate of diminishing adaptation (which is also needed for ergodicity à la Roberts and Rosenthal (2007)). And the implicit choice of a particular parameterisation for the Normal mixture to be close enough to the target. In the posted experiments, the number of components in the mixture does not grow to unmanageable figures, but a further adaptation could be to remove components that are inactive or lead to systematic rejection, as we did in the population Monte Carlo paper.
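
For illustration, here is a minimal one-dimensional sketch of the mechanism in R (with hypothetical names, a toy bimodal target, a fixed component scale instead of the local covariance estimate, and no diminishing adaptation, hence not the authors' implementation):

# toy adaptive incremental mixture proposal within independent Metropolis-Hastings
target <- function(x) dnorm(x, 2, .3) / 2 + dnorm(x, -2, .3) / 2  # toy bimodal target
M <- 3        # lower bound on the target-to-proposal ratio triggering a new component
gamma <- 0.5  # power applied to the target density in the component weight

mix <- list(mu = 0, sd = 2, w = 1)   # initial single-component Normal proposal
dprop <- function(x) sum(mix$w * dnorm(x, mix$mu, mix$sd))
rprop <- function() {
  k <- sample(length(mix$w), 1, prob = mix$w)
  rnorm(1, mix$mu[k], mix$sd[k])
}

x <- 0; chain <- numeric(1e4)
for (t in 1:1e4) {
  y <- rprop()
  # independent Metropolis-Hastings acceptance
  if (runif(1) < target(y) * dprop(x) / (target(x) * dprop(y))) {
    x <- y
    # incremental adaptation: the accepted value sits where the proposal is deficient
    if (target(x) / dprop(x) > M) {
      mix$mu <- c(mix$mu, x)
      mix$sd <- c(mix$sd, .5)              # fixed scale here, a local estimate in the paper
      mix$w  <- c(mix$w, target(x)^gamma)  # unnormalised weight, powered target
      mix$w  <- mix$w / sum(mix$w)         # existing weights only shrink proportionally
    }
  }
  chain[t] <- x
}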

an introduction to MCMC sampling

Posted in Books, Kids, Statistics on August 9, 2022 by xi'an

Following a rather clueless question on X validated, I had a quick read of A simple introduction to Markov Chain Monte–Carlo sampling, by Ravenzwaaij, Cassey, and Brown, published in 2018 in Psychonomic Bulletin & Review, which I had never opened to this day. The setting is very basic and the authors are at pains to make their explanations as simple as possible, but I find the effort somehow backfires under the excess of details. And the characteristic avoidance of mathematical symbols and formulae. For instance, in the Normal mean example that is used as introductory illustration and that confused the question originator, there is no explanation for the posterior being a N(100,15) distribution, 100 being the sample average, the notation N(μ|x,σ) is used for the posterior density, and then the Metropolis comparison brings an added layer of confusion:

“Since the target distribution is normal with mean 100 (the value of the single observation) and standard deviation 15,  this means comparing N(100|108, 15) against N(100|110, 15).”

as it most unfortunately exchanges the positions of μ and x (which is equal to 100). There is no fundamental error there, due to the symmetry of the Normal density, but this switch from posterior to likelihood certainly contributes to the confusion of the QO. Similarly for the Metropolis step description:

“If the new proposal has a lower posterior value than the most recent sample, then randomly choose to accept or reject the new proposal, with a probability equal to the height of both posterior values.”

And the shortcomings of MCMC may prove equally difficult to ingest: like
“The method will “work” (i.e., the sampling distribution will truly be the target distribution) as long as certain conditions are met. Firstly, the likelihood values calculated (…) to accept or reject the new proposal must accurately reflect the density of the proposal in the target distribution. When MCMC is applied to Bayesian inference, this means that the values calculated must be posterior likelihoods, or at least be proportional to the posterior likelihood (i.e., the ratio of the likelihoods calculated relative to one another must be correct).”

which leaves me uncertain as to what the authors do mean by the alternative situation, i.e., by the proposed value not reflecting the proposal density. Again, the reluctance to use (more) formulae hurts the intended pedagogical explanations.
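
For the record, the whole example fits in a handful of lines of R (a minimal sketch, assuming a single observation x = 100, a flat prior on μ, and σ = 15 known, which seems to be the implicit setting of the paper):

# random walk Metropolis for the toy posterior μ | x ~ N(100, 15)
x <- 100; sigma <- 15
log_post <- function(mu) dnorm(x, mean = mu, sd = sigma, log = TRUE)  # x given μ, not μ given x

mu <- 110                      # arbitrary starting value
draws <- numeric(1e4)
for (t in 1:1e4) {
  prop <- mu + rnorm(1, 0, 5)  # symmetric (random walk) proposal
  # accept with probability min(1, posterior(prop) / posterior(mu))
  if (log(runif(1)) < log_post(prop) - log_post(mu)) mu <- prop
  draws[t] <- mu
}
c(mean(draws), sd(draws))      # should come out close to 100 and 15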

important Markov chains

Posted in Books, Statistics, University life on July 21, 2022 by xi'an

With Charly Andral (PhD, Paris Dauphine), Randal Douc, and Hugo Marival (PhD, Telecom SudParis), we just arXived a paper on importance Markov chains that merges importance sampling and MCMC. An idea already mentioned in Hastings (1970) and even earlier in Fosdick (1963), and later exploited in Liu et al. (2003) for instance. And somewhat dual of the vanilla Rao-Blackwellisation paper Randal and I wrote a (long!) while ago. Given a target π with a dominating measure π⁰≥Mπ, using a Markov kernel to simulate from this dominating measure and subsampling by the importance weight ρ does produce a new Markov chain with the desired target measure as invariant distribution. However, the domination assumption is rather unrealistic and a generic approach can be implemented without it, by defining an extended Markov chain, with the addition of the number N of replicas as the supplementary term… And a transition kernel R(n|x) on N with expectation ρ, which is a minimal(ist) assumption for the validation of the algorithm. While this initially defines a semi-Markov chain, an extended Markov representation is also feasible, by decreasing N one by one until reaching zero, and this is most helpful in deriving convergence properties for the resulting chain, including a CLT. While the choice of the kernel R is free, the optimal choice is associated with residual sampling, where only the fractional part of ρ is estimated by a Bernoulli simulation.
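
In its simplest (toy, one-dimensional) form, and in my own notation rather than the paper's, the scheme can be sketched in a few lines of R: run an MCMC chain targeting the instrumental distribution π⁰, then replicate each state a random number of times N with conditional expectation proportional to the importance ratio, using the residual floor-plus-Bernoulli kernel:

# toy one-dimensional importance Markov chain (illustrative, not the paper's code)
# instrumental distribution π⁰: Student t with 3 df; target π: standard Normal
dpi0 <- function(x) dt(x, df = 3)
dpi  <- function(x) dnorm(x)

# a random walk Metropolis chain targeting the instrumental π⁰
niter <- 1e4
x <- 0; aux <- numeric(niter)
for (t in 1:niter) {
  prop <- x + rnorm(1)
  if (runif(1) < dpi0(prop) / dpi0(x)) x <- prop
  aux[t] <- x
}

# importance ratios; kappa > 0 only rescales the expected number of replicas
kappa <- 1
rho <- kappa * dpi(aux) / dpi0(aux)

# residual kernel R(n|x): N = floor(rho) + Bernoulli(fractional part), so E[N|x] = rho
N <- floor(rho) + rbinom(niter, 1, rho - floor(rho))
out <- rep(aux, times = N)   # the replicated states have π as invariant distribution

c(mean(out), sd(out))        # should approach 0 and 1, the target moments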

Advancements in Bayesian Methods and Implementations [to appear]

Posted in Books, Statistics on July 17, 2022 by xi'an

As noted in another post, I wrote a chapter on Bayesian testing for an incoming handbook, Advancements in Bayesian methods and implementations, which is published by Elsevier at an atrocious price (as usual). Here is the table of contents:

1. Fisher Information, Cramér-Rao and Bayesian Paradigm by Roy Frieden
2. Compound beta binomial distribution functions by Angelo Plastino
3. MCMC for GLMMs by Vivekananda Roy
4. Signal Processing and Bayesian by Chandra Murthy
5. Mathematical theory of Bayesian statistics where all models are wrong by Sumio Watanabe
6. Machine Learning and Bayesian by Jun Zhu
7. Non-parametric Bayes by Stephen Walker
8. [50 shades of] Bayesian testing [of hypotheses] by Christian P. Robert
9. Data Analysis with humans by Samuel Kaski
10. Bayesian Inference under selection by G. Alastair Young
11. Variational inference or Functional horseshoe by Anirban Bhattacharya
12. Generalized Bayes by Ryan Martin

and my chapter is also available on arXiv, quickly gathered from earlier short courses at O’Bayes meetings and some xianblog entries on the topic, hence not containing much novelty!

Bayes Rules! [book review]

Posted in Books, Kids, Mountains, pictures, R, Running, Statistics, University life on July 5, 2022 by xi'an

Bayes Rules! is a new introductory textbook on Applied Bayesian Model(l)ing, written by Alicia Johnson (Macalester College), Miles Ott (Johnson & Johnson), and Mine Dogucu (University of California Irvine). Textbook sent to me by CRC Press for review. It is available (free) online as a website and has a github site, as well as a bayesrules R package. (Which reminds me that both our own book R packages, bayess and mcsm, have gone obsolete on CRAN! And that I should find time to figure out the issue for an upgrading…)

As far as I can tell [from abroad and from only teaching students with a math background], Bayes Rules! seems to be catering to early (US) undergraduate students with very little exposure to mathematical statistics or probability, as it introduces basic probability notions like pmf, joint distribution, and Bayes’ theorem (as well as Greek letters!) and shies away from integration or algebra (a covariance matrix occurs on page 437). For instance, the Normal-Normal conjugacy derivation is considered a “mouthful” (page 113). The exposition is somewhat stretched along the 500+ pages as a result, imho, which is presumably a feature shared with most textbooks at this level, and, accordingly, the exercises and quizzes are more about intuition and reproducing the contents of the chapter than about technicalities. In fact, I did not spot there a mention of sufficiency, consistency, posterior concentration (almost made on page 113), improper priors, ergodicity, irreducibility, &tc., while other notions are not precisely defined, like ESS, weakly informative (page 234) or vague priors (page 77), prior information—which makes the negative answer to the quiz “All priors are informative” (page 90) rather confusing—R-hat, density plot, scaled likelihood, and more.

As an alternative to “technical derivations”, Bayes Rules! centres on intuition and simulation (yay!) via its bayesrules R package. Itself relying on rstan. Learning from example (as R code is always provided), the book proceeds through conjugate priors, MCMC (Metropolis-Hastings) methods, regression models, and hierarchical regression models. Quite impressive given the limited prerequisites set by the authors. (I appreciated the representations of the prior-likelihood-posterior, especially in the sequential case.)

Regarding the “hot tip” (page 108) that the posterior mean always stands between the prior mean and the data mean, this should be made conditional on a conjugate setting and a mean parameterisation. Defining MCMC as a method that produces a sequence of realisations that are not from the target makes a point, except of course that there are settings where the realisations are from the target, for instance after a renewal event. Tuning MCMC should remain a partial mystery to readers after reading Chapter 7 as the Goldilocks principle is quite vague. Similarly, the derivation of the hyperparameters in a novel setting (not covered by the book) should prove a challenge, even though the readers are encouraged to “go forth and do some Bayes things” (page 509).
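
To make the hot-tip caveat concrete (in my notation, not the book's): in the conjugate Normal case with a N(μ₀, τ²) prior on the mean μ and observations x₁, …, xₙ i.i.d. N(μ, σ²) with σ² known, the posterior mean is E[μ|x₁,…,xₙ] = σ²/(σ²+nτ²) · μ₀ + nτ²/(σ²+nτ²) · x̄, a convex combination of the prior mean μ₀ and the sample mean x̄, hence always sandwiched between the two; outside such conjugate and mean-parameterised settings the sandwiching may fail.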

While Bayes factors are supported for some hypothesis testing (with no point null), model comparison follows more exploratory methods like X validation and expected log-predictive comparison.

The examples and exercises are diverse (if mostly US centric), modern (including cultural references that completely escape me), and often reflect on the authors’ societal concerns. In particular, their concern about a fair use of the inferred models is preeminent, even though a quantitative assessment of the degree of fairness would require a much more advanced perspective than the book allows… (In that respect, Exercise 18.2 and the following ones are about book banning (in the US). Given the progressive tone of the book, and the recent ban of math textbooks in the US, I wonder if some conservative boards would consider banning it!) Concerning the Himalaya summiting running example (Chapters 18 & 19), where the probability to summit is conditional on the age of the climber and the use of additional oxygen, I am somewhat surprised that the altitude of the targeted peak is not included as a covariate. For instance, Ama Dablam (6848 m) is compared with Annapurna I (8091 m), which has the highest fatality-to-summit ratio (38%) of all. This should matter more than age: the Aosta guide Abele Blanc climbed Annapurna without oxygen at age 57! More to the point, the (practical) detailed examples do not bring unexpected conclusions, as for instance the fact that runners [thrice alas!] tend to slow down with age.

A geographical comment: Uluru (page 267) is not a city (!), but an impressive sandstone monolith in the heart of Australia, a five-hour drive away from Alice Springs. And historical mentions: Alan Turing (page 10) and the team at Bletchley Park indeed used Bayes factors (and sequential analysis) in cracking the Enigma, but this remained classified information for quite a while. Arianna Rosenbluth (page 10, but missing on page 165) was indeed a major contributor to Metropolis et al. (1953, not cited), but would not qualify as a Bayesian statistician as the goal of their algorithm was a characterisation of the Boltzmann (or Gibbs) distribution, not statistical inference. And David Blackwell’s (page 10) Basic Statistics is possibly the earliest instance of an introductory Bayesian and decision-theory textbook, but it never mentions Bayes or Bayesianism.

[Disclaimer about potential self-plagiarism: this post or an edited version will eventually appear in my Book Review section in CHANCE.]
