inf R ! [book review]

Posted in Books, R, Travel with tags , , , , , , , , , , , on June 10, 2021 by xi'an

Thanks to my answering a (basic) question on X validated involving an R code, R mistakes and some misunderstanding about Bayesian hierarchical modelling, I got pointed out to Patrick Burns’ The R inferno. This is not a recent book as the second edition is of 2012, with a 2011 version still available on-line. Which is the version I read. As hinted by the cover, the book plays on Dante’s Inferno and each chapter is associated with a circle of Hell… Including drawings by Botticelli. The style is thus most enjoyable and sometimes hilarious. Like hell!

The first circle (reserved for virtuous pagans) is about treating integral reals as if they were integers, the second circle (attributed to gluttons, although Dante’s is for the lustful) is about allocating more space along the way, as in the question I answered and in most of my students’ codes! The third circle (allocated here to blasphemous sinners, destined for Dante’s seven circle, when Dante’s third circle is to the gluttons) points out the consequences of not vectorising, with for instance the impressive capacities of the ifelse() function [exploited to the max in R codecolfing!].  And the fourth circle (made for the lustfuls rather than Dante’s avaricious and prodigals) is a short warning about the opposite over-vectorising. Circle five (destined for the treasoners, and not Dante’s wrathfuls) pushes for and advises about writing R functions. Circle six recovers Dante’s classification, welcoming (!) heretics, and prohibiting global assignments, in another short chapter. Circle seven (alloted to the simoniacs—who should be sharing the eight circle with many other sinners—rather than the violents as in Dante’s seventh) discusses object attributes, with the distinction between S3 and S4 methods somewhat lost on me. Circle eight (targeting the fraudulents, as in Dante’s original) is massive as it covers “a large number of ghosts, chimeras and devils”, a collection of difficulties and dangers and freak occurences, with the initial warning that “It is a sin to assume that code does what is intended”. A lot of these came as surprises to me and I was rarely able to spot the difficulty without the guidance of the book. Plenty to learn from these examples and counter-examples. Reaching Circle nine (where live (!) the thieves, rather than Dante’s traitors). A “special place for those who feel compelled to drag the rest of us into hell.” Discussing the proper ways to get help on fori. Like Stack Exchange. Concluding with the tongue-in-cheek comment that “there seems to be positive correlation between a person’s level of annoyance at [being asked several times the same question] and ability to answer questions.” This being a hidden test, right?!, as the correlation should be negative.

on approximations of Φ and Φ⁻¹

Posted in Books, Kids, R, Statistics with tags , , , , , , , , , on June 3, 2021 by xi'an

As I was working on a research project with graduate students, I became interested in fast and not necessarily very accurate approximations to the normal cdf Φ and its inverse. Reading through this 2010 paper of Richards et al., using for instance Polya’s

$F_0(x) =\frac{1}{2}(1+\sqrt{1-\exp(-2x^2/\pi)})$

(with another version replacing 2/π with the squared root of π/8) and

$F_2(x)=1/1+\exp(-1.5976x(1+0.04417x^2))$

not to mention a rational faction. All of which are more efficient (in R), if barely, than the resident pnorm() function.

      test replications elapsed relative user.self
3 logistic       100000   0.410    1.000     0.410
2    polya       100000   0.411    1.002     0.411
1 resident       100000   0.455    1.110     0.455


For the inverse cdf, the approximations there are involving numerical inversion except for

$F_0^{-1}(p) =(-\pi/2 \log[1-(2p-1)^2])^{\frac{1}{2}}$

which proves slightly faster than qnorm()

       test replications elapsed relative user.self
2 inv-polya       100000   0.401    1.000     0.401
1  resident       100000   0.450    1.000     0.450


scale matters [maths as well]

Posted in pictures, R, Statistics with tags , , , , , , , , on June 2, 2021 by xi'an

A question from X validated on why an independent Metropolis sampler of a three component Normal mixture based on a single Normal proposal was failing to recover the said mixture…

When looking at the OP’s R code, I did not notice anything amiss at first glance (I was about to drive back from Annecy, hence did not look too closely) and reran the attached code with a larger variance in the proposal, which returned the above picture for the MCMC sample, close enough (?) to the target. Later, from home, I checked the code further and noticed that the Metropolis ratio was only using the ratio of the targets. Dividing by the ratio of the proposals made a significant (?) to the representation of the target.

More interestingly, the OP was fundamentally confused between independent and random-walk Rosenbluth algorithms, from using the wrong ratio to aiming at the wrong scale factor and average acceptance ratio, and furthermore challenged by the very notion of Hessian matrix, which is often suggested as a default scale.

more air for MCMC

Posted in Books, R, Statistics with tags , , , , , , , , , , , , , , on May 30, 2021 by xi'an

Aki Vehtari, Andrew Gelman, Dan Simpson, Bob Carpenter, and Paul-Christian Bürkner have just published a Bayesian Analysis paper about using an improved R factor for MCMC convergence assessment. From the early days of MCMC, convergence assessment has been a recurring (and recurrent!) question in the community. First leading to a flurry of proposals, [which Kerrie, Chantal, and myself reviewwwed in the Valencia 1998 proceedings], and then slowly disintegrating under the onslaughts of reality—i.e. that none could not be 100% foolproof in full generality—…. This included the (possibly now forgotten) single-versus-multiple-chains debate between Charlie Geyer [for single] and Andrew Gelman and Don Rubin [for multiple]. The later introduced an analysis-of-variance R factor, which remains quite popular up to this day, in part for being part of most MCMC software, like BUGS. That this R may fail to identify convergence issues, even in the more recent split version, does not come as a major surprise, since any situation with a long-term influence of the starting distribution may well fail to identify missing (significant) parts of the posterior support. (It is thus somewhat disconcerting to me to see that the main recommendation is to move the bound on R from 1.1 to 1.01, reminding me to some extent of a recent proposal to move the null rejection boundary from 0.05 to 0.005…) Similarly, the ESS may prove a poor signal for convergence or lack thereof, especially because the approximation of the asymptotic variance relies on stationarity assumptions. While multiplying the monitoring tools (as in CODA) helps with identifying convergence issues, looking at a single convergence indicator is somewhat like looking only at a frequentist estimator! (And with greater automation comes greater responsibility—in keeping a critical perspective.)

Looking for a broader perspective, I thus wonder at what we would instead need to assess the lack of convergence of an MCMC chain without much massaging of the said chain. An evaluation of the (Kullback, Wasserstein, or else) distance between the distribution of the chain at iteration n or across iterations, and the true target? A percentage of the mass of the posterior visited so far, which relates to estimating the normalising constant, with a relatively vast array of solutions made available in the recent years? I remain perplexed and frustrated by the fact that, 30 years later, the computed values of the visited likelihoods are not better exploited. Through for instance machine-learning approximations of the target. that could themselves be utilised for approximating the normalising constant and potential divergences from other approximations.

bean bag win

Posted in Books, Kids, pictures, R with tags , , , , on May 19, 2021 by xi'an

A quick riddle from The Riddler, where a multiple step game sees a probability of a 3 point increase of .4 and a probability of a 1 point increase of .3 with a first strategy (A), versus a probability of a 3 point increase of .4 and a probability of a 1 point increase of .3 with a second strategy (B), and a sure miss third strategy (C). The goal is to optimise the probability of hitting exactly 3 points after 4 steps.

The optimal strategy is to follow A while the score is zero, C when the score is 3, and B otherwise. The corresponding winning probability is 0.8548, as checked by the following code

win=function(n=1,s=0){
if(n==4)return((s==3)+.4*(!s)+.8*(s==2))
else{return(max(c(
.4*win(n+1,s+3)+.3*win(n+1,s+1)+.3*win(n+1,s),
.1*win(n+1,s+3)+.8*win(n+1,s+1)+.1*win(n+1,s),
win(n+1,s))))}}