## Archive for SAS

## Jean-Paul Benzécri (1932-2019)

Posted in Books, pictures, Statistics, University life with tags analyse des correspondances, analyse des données, French statistics, ISUP, Jean-Paul Benzécri, Jussieu, obituary, Paris 6, SAS, Université de Paris, Université Pierre et Marie Curie on December 3, 2019 by xi'an

**I** learned last weekend that Jean-Paul Benzécri had died earlier in the week. He was a leading and charismatic figure of the French renewal in data analysis (or *analyse des données*), which used mostly algebraic tools to analyse large datasets while staying as far as possible from the strong abstraction of French statistics at that time. While I did not know him on a personal basis, I remember from my years as a lecturer at the Institut de Statistique de l’Université de Paris (ISUP), Université Pierre et Marie Curie, that he used to come there once a week to meet with a large group of younger statisticians, students and junior faculty, and then talk to them for long hours while walking back and forth along the corridor in Jussieu. This showed extreme dedication from the group, as this windowless corridor was particularly ghastly! (I also remember, less fondly, hours spent over piles and piles of SAS printout, trying to make sense of multiple graphs of projections produced by these algebraic methods and feeling there were too many degrees of freedom for them to feel rigorous enough.)

## don’t be late for BayesComp’2020

Posted in Statistics with tags AutoStat, BayesComp 2020, Bayesian computing, conference, conference fees, Florida, Gainesville, ISBA, MCMSki, Nimble, SAS, STAN, tutorial, University of Florida on October 4, 2019 by xi'an

**A**n important reminder that October 14 is the deadline for regular registration to BayesComp 2020, as late fees will apply afterwards!!! The conference looks attractive enough to agree to pay more, but still…

## deadlines for BayesComp’2020

Posted in pictures, Statistics, Travel, University life with tags AutoStat, BayesComp 2020, Bayesian computing, conference, Florida, Gainesville, ISBA, Nimble, SAS, STAN, tutorial, University of Florida on August 17, 2019 by xi'an

**W**hile I forgot to send a reminder that August 15 was the first deadline of BayesComp 2020, for early registration, here are further deadlines and dates:

- BayesComp 2020 occurs on January 7-10 2020 in Gainesville, Florida, USA
- Registration is open with regular rates till October 14, 2019
- Deadline for submission of poster proposals is December 15, 2019
- Deadline for travel support applications is September 20, 2019
- There are four free tutorials on January 7, 2020, on Stan, NIMBLE, SAS, and AutoStat

## SAS on Bayes

Posted in Books, Kids, pictures, R, Statistics, University life with tags asymptotics, Bayesian inference, credible intervals, cross validated, epistemic probability, plug-in resolution, SAS, Stack Exchange on November 8, 2016 by xi'an

**F**ollowing a question on X Validated, I became aware of the following descriptions of the pros and cons of Bayesian analysis, as perceived by whoever (Tim Arnold?) wrote SAS/STAT(R) 9.2 User’s Guide, Second Edition. I replied more specifically on the point

> It [Bayesian inference] provides inferences that are conditional on the data and are exact, without reliance on asymptotic approximation. Small sample inference proceeds in the same manner as if one had a large sample. Bayesian analysis also can estimate any functions of parameters directly, without using the “plug-in” method (a way to estimate functionals by plugging the estimated parameters in the functionals).

which I find utterly confusing and not particularly relevant. The other points in the list are more traditional, except for this one

> It provides interpretable answers, such as “the true parameter θ has a probability of 0.95 of falling in a 95% credible interval.”

that I find somewhat unappealing in that the 95% probability is only relevant with respect to the resulting posterior, hence has no absolute (and definitely no frequentist) meaning. The criticisms of the prior selection

> It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.

> It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.

are traditional but nonetheless irksome. Once it is acknowledged that there is no correct or true prior, it follows naturally that the resulting inference will depend on the choice of the prior and has to be understood conditional on the prior, which is why the credible interval has, for instance, an epistemic rather than frequentist interpretation. There is also little reason for trying to convince a fellow Bayesian statistician about one’s prior. Everything is conditional on the chosen prior and I see less and less why this should be an issue.

## optimising accept-reject

Posted in R, Statistics, University life with tags accept-reject algorithm, loops, Poisson distribution, R, R course, SAS, system.time on November 21, 2012 by xi'an

**I** spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features in this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one Poisson distribution, the code

    rtpois=function(n,lambda){
      sampl=c()
      while (length(sampl)<n){
        x=rpois(1,lambda)
        if (x!=0) sampl=c(sampl,x)}
      return(sampl)
    }

is favoured by my R course students but highly inefficient:

    > system.time(rtpois(10^5,.5))
       user  system elapsed
     61.600  27.781  98.727

both for the stepwise increase in the size of the vector and for the loop. For instance, defining the vector sampl first requires a tenth of the above time (note the switch from 10⁵ to 10⁶):

    > system.time(rtpois(10^6,.5))
       user  system elapsed
     54.155   0.200  62.635
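The preallocated version is not reproduced in the text above, so here is a minimal sketch of what it could look like. The name `rtpois_prealloc` and the counter `i` are illustrative choices, not taken from the original post; the loop itself is kept, only the repeated growth of the vector is removed.

```r
# Sketch only: same accept-reject loop, but the output vector is
# allocated once instead of being grown by c() at each acceptance.
# (rtpois_prealloc is a hypothetical name used for illustration.)
rtpois_prealloc <- function(n, lambda) {
  sampl <- numeric(n)        # allocate the full vector up front
  i <- 0                     # number of accepted values so far
  while (i < n) {
    x <- rpois(1, lambda)    # one Poisson proposal
    if (x != 0) {            # accept only non-zero values
      i <- i + 1
      sampl[i] <- x
    }
  }
  sampl
}

set.seed(1)                  # arbitrary seed, for reproducibility only
x <- rtpois_prealloc(1000, 0.5)
```

Timings will of course vary across machines, so the figures quoted above should be read as orders of magnitude.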

**A**s discussed by the author of the post, more efficient programming should aim at avoiding the loop by predicting the number of proposals necessary to accept a given number of values. Since the bound M used in accept-reject algorithms is also the expected number of attempts for one acceptance, one should start with something around Mn proposed values. (Assuming of course all densities are properly normalised.) For instance, in the case of the truncated-at-one Poisson distribution based on proposals from the regular Poisson, the bound is 1/(1-e^{-λ}). A first implementation of this principle is to build the sample via a few recursive passes:

    rtpois=function(n,lambda){
      propal=rpois(ceiling(n/(1-exp(-lambda))),lambda)
      propal=propal[propal>0]
      n0=length(propal)
      if (n0>=n)
        return(propal[1:n])
      else
        return(c(propal,rtpois(n-n0,lambda)))
    }

with a higher efficiency:

    > system.time(rtpois(10^6,.5))
       user  system elapsed
      0.816   0.092   0.928
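As a quick sanity check on the claim that M is the expected number of attempts per acceptance, the following sketch compares the empirical acceptance rate of plain Poisson proposals with 1/M. The seed, the value λ=0.5 and the variable names are arbitrary choices made for this illustration.

```r
# Sketch only: empirical check that the number of Poisson proposals
# per accepted (non-zero) value is close to M = 1/(1-exp(-lambda)).
set.seed(101)                     # arbitrary seed, for reproducibility only
lambda <- 0.5
M <- 1 / (1 - exp(-lambda))       # accept-reject bound for this proposal
draws <- rpois(1e6, lambda)       # a large batch of proposals
acc_rate <- mean(draws > 0)       # fraction of proposals that are accepted
c(empirical = 1 / acc_rate, bound = M)
```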

Replacing the expectation with an upper bound using the variance of the negative binomial distribution does not make a significant dent in the computing time

    rtpois=function(n,lambda){
      M=1/(1-exp(-lambda))
      propal=rpois(ceiling(M*(n+2*sqrt(n/M)/(M-1))),lambda)
      propal=propal[propal>0]
      n0=length(propal)
      if (n0>=n)
        return(propal[1:n])
      else
        return(c(propal,rtpois(n-n0,lambda)))}

since we get

    > system.time(rtpois(10^6,.5))
       user  system elapsed
      0.824   0.048   0.877
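For reference, here is one way to spell out the negative binomial argument, as a sketch under the assumption that proposals are independent and each is accepted with probability p = 1/M. The symbol N (total number of proposals needed to reach n acceptances) is introduced here for illustration only.

```latex
% N = n + (number of rejections before the n-th acceptance),
% where the number of rejections is negative binomial with parameters (n, p), p = 1/M.
\begin{align*}
\mathbb{E}[N] &= \frac{n}{p} = nM, \\
\operatorname{var}(N) &= \frac{n(1-p)}{p^{2}} = nM(M-1), \\
\mathbb{E}[N] + 2\sqrt{\operatorname{var}(N)} &= nM + 2\sqrt{nM(M-1)}.
\end{align*}
```

The mean-plus-two-standard-deviations budget in the last line is only one possible inflation of Mn; the code above uses a somewhat different correction term. Either way, the choice only affects how often the recursive top-up is triggered, which is consistent with the negligible change in timing.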

**T**he second point about this Poisson example is that simulating a distribution with a restricted support using another distribution with a larger support is quite inefficient, especially when λ goes to zero. By comparison, using a Poisson proposal with parameter μ and translating it by 1 may bring a considerable improvement: without getting into the gory details, it can be shown that the optimal value of μ (in terms of maximal acceptance probability) is λ and that the corresponding probability of acceptance is

(1-e^{-λ})/λ

which is larger than the probability of the original approach when λ is less than one. As shown by the graph below, this allows for a lower bound in the probability of acceptance that remains tolerable.
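A minimal sketch of the translated proposal, assuming μ=λ, could look as follows. The acceptance step is worked out here from the two Poisson mass functions rather than taken from the original post: with μ=λ, the ratio of the truncated target to the translated proposal at x is proportional to 1/x, so a proposed value x is accepted with probability 1/x. The name `rtpois_shift` is hypothetical.

```r
# Sketch only: accept-reject for the Poisson restricted to {1, 2, ...},
# using 1 + Poisson(lambda) as proposal (i.e. mu = lambda).
# (rtpois_shift is a hypothetical name used for illustration.)
rtpois_shift <- function(n, lambda) {
  if (n <= 0) return(numeric(0))
  rate <- (1 - exp(-lambda)) / lambda   # overall acceptance probability
  m <- ceiling(n / rate)                # expected number of proposals needed
  propal <- 1 + rpois(m, lambda)        # proposals supported on 1, 2, ...
  keep <- runif(m) < 1 / propal         # accept x with probability 1/x
  propal <- propal[keep]
  n0 <- length(propal)
  if (n0 >= n) propal[1:n] else c(propal, rtpois_shift(n - n0, lambda))
}

set.seed(42)                            # arbitrary seed, for reproducibility only
lambda <- 0.5
x <- rtpois_shift(1e5, lambda)
```

For λ=0.5, the acceptance probability of this translated proposal is about 0.79, against about 0.39 for the untranslated Poisson proposal.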