## Archive for SAS

## Jean-Paul Benzécri (1932-2019)

Posted in Books, pictures, Statistics, University life with tags analyse des correspondances, analyse des données, French statistics, ISUP, Jean-Paul Benzécri, Jussieu, obituary, Paris 6, SAS, Université de Paris, Université Pierre et Marie Curie on December 3, 2019 by xi'an

**I** learned last weekend that Jean-Paul Benzécri had died earlier in the week. He was a leading and charismatic figure of the French renewal in data analysis (or *analyse des données*) that used mostly algebraic tools to analyse large datasets, while staying as far as possible from the strong abstraction of French statistics at that time. While I did not know him on a personal basis, I remember from my lecturer years there that he used to come to Institut de Statistique de l’Université de Paris (ISUP), Université Pierre et Marie Curie, once a week and meet with a large group of younger statisticians, students and junior faculty, and then talk to them for long hours while walking back and forth along the corridor in Jussieu. This showed extreme dedication from the group, as this windowless corridor was particularly ghastly! (I also remember less fondly hours spent over piles and piles of SAS printout trying to make sense of multiple graphs of projections produced by these algebraic methods and feeling there were too many degrees of freedom for them to feel rigorous enough.)

## don’t be late for BayesComp’2020

Posted in Statistics with tags AutoStat, BayesComp 2020, Bayesian computing, conference, conference fees, Florida, Gainesville, ISBA, MCMSki, Nimble, SAS, STAN, tutorial, University of Florida on October 4, 2019 by xi'an

**A**n important reminder that October 14 is the deadline for regular registration to BayesComp 2020 as late fees will apply afterwards!!! The conference looks attractive enough to agree to pay more, but still…

## deadlines for BayesComp’2020

Posted in pictures, Statistics, Travel, University life with tags AutoStat, BayesComp 2020, Bayesian computing, conference, Florida, Gainesville, ISBA, Nimble, SAS, STAN, tutorial, University of Florida on August 17, 2019 by xi'an

**W**hile I have forgotten to send a reminder that August 15 was the first deadline of BayesComp 2020 for the early registrations, here are further deadlines and dates:

- BayesComp 2020 occurs on January 7-10 2020 in Gainesville, Florida, USA
- Registration is open with regular rates till October 14, 2019
- Deadline for submission of poster proposals is December 15, 2019
- Deadline for travel support applications is September 20, 2019
- There are four free tutorials on January 7, 2020, related to Stan, NIMBLE, SAS, and AutoStat

## SAS on Bayes

Posted in Books, Kids, pictures, R, Statistics, University life with tags asymptotics, Bayesian inference, credible intervals, cross validated, epistemic probability, plug-in resolution, SAS, Stack Exchange on November 8, 2016 by xi'an

**F**ollowing a question on X Validated, I became aware of the following descriptions of the pros and cons of Bayesian analysis, as perceived by whoever (Tim Arnold?) wrote SAS/STAT(R) 9.2 User’s Guide, Second Edition. I replied more specifically on the point

It [Bayesian inference] provides inferences that are conditional on the data and are exact, without reliance on asymptotic approximation. Small sample inference proceeds in the same manner as if one had a large sample. Bayesian analysis also can estimate any functions of parameters directly, without using the “plug-in” method (a way to estimate functionals by plugging the estimated parameters in the functionals).

which I find utterly confusing and not particularly relevant. The other points in the list are more traditional, except for this one

It provides interpretable answers, such as “the true parameter θ has a probability of 0.95 of falling in a 95% credible interval.”

that I find somewhat unappealing in that the 95% probability is only relevant with respect to the resulting posterior, hence has no absolute (and definitely no frequentist) meaning. The criticisms of the prior selection

It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.

It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.

are traditional but nonetheless irksome. Once it is acknowledged that there is no correct or true prior, it follows naturally that the resulting inference will depend on the choice of the prior and has to be understood conditional on the prior, which is why the credible interval has for instance an epistemic rather than frequentist interpretation. There is also little reason for trying to convince a fellow Bayesian statistician about one’s prior. Everything is conditional on the chosen prior and I see less and less why this should be an issue.
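As a small illustration of this conditioning on the prior, here is a hedged R sketch on made-up data (7 successes out of 20 Bernoulli trials, with two arbitrary Beta priors, none of which comes from the SAS guide): the same interval carries a 95% posterior probability under the prior that produced it and a different probability under another prior.

```r
# Hypothetical data: 7 successes out of 20 Bernoulli trials
x <- 7; n <- 20

# 95% equal-tailed credible interval under a Beta(a, b) prior
# (the posterior is Beta(a + x, b + n - x) by conjugacy)
cred_int <- function(a, b) qbeta(c(0.025, 0.975), a + x, b + n - x)

flat   <- cred_int(1, 1)    # uniform prior
inform <- cred_int(10, 2)   # arbitrary prior favouring large success probabilities

# Posterior probability of the "flat" interval under each posterior
cover_flat   <- diff(pbeta(flat, 1 + x, 1 + n - x))
cover_inform <- diff(pbeta(flat, 10 + x, 2 + n - x))
print(rbind(flat, inform))
print(c(cover_flat, cover_inform))
```

The first coverage is 0.95 by construction, while the second one is not, which is all the "95%" can mean: a statement relative to the chosen prior.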

## optimising accept-reject

Posted in R, Statistics, University life with tags accept-reject algorithm, loops, Poisson distribution, R, R course, SAS, system.time on November 21, 2012 by xi'an

**I** spotted on R-bloggers a post discussing optimising the efficiency of programming accept-reject algorithms. While it is about SAS programming, and apparently supported by the SAS company, there are two interesting features with this discussion. The first one is about avoiding the dreaded loop in accept-reject algorithms. For instance, taking the case of the truncated-at-one Poisson distribution, the code

    rtpois=function(n,lambda){
      sampl=c()
      while (length(sampl)<n){
        x=rpois(1,lambda)
        if (x!=0) sampl=c(sampl,x)}
      return(sampl)
    }

is favoured by my R course students but highly inefficient:

    > system.time(rtpois(10^5,.5))
       user  system elapsed
     61.600  27.781  98.727

both for the stepwise increase in the size of the vector and for the loop. For instance, defining the vector sampl first requires a tenth of the above time (note the switch from 10⁵ to 10⁶):

    > system.time(rtpois(10^6,.5))
       user  system elapsed
     54.155   0.200  62.635
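The preallocated version behind this second timing is not reproduced in the post. A minimal sketch of what it might look like, where the name `rtpois_prealloc` and the index `i` are my own choices rather than the original code:

```r
# Same loop as above, but the output vector is allocated once
# instead of being grown by c() at every acceptance.
rtpois_prealloc <- function(n, lambda) {
  sampl <- numeric(n)   # allocate the whole vector first
  i <- 0                # number of accepted values so far
  while (i < n) {
    x <- rpois(1, lambda)
    if (x != 0) {       # reject zeros, keep everything else
      i <- i + 1
      sampl[i] <- x
    }
  }
  sampl
}
```

This removes the repeated copying of `sampl` but keeps one call to `rpois` per proposal, so the loop itself remains the bottleneck.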

**A**s discussed by the author of the post, more efficient programming should aim at avoiding the loop by predicting the number of proposals necessary to accept a given number of values. Since the bound M used in accept-reject algorithms is also the expected number of attempts for one acceptance, one should start with something around Mn proposed values. (Assuming of course all densities are properly normalised.) For instance, in the case of the truncated-at-one Poisson distribution based on proposals from the regular Poisson, the bound is 1/(1-e^{-λ}). A first implementation of this principle is to build the sample via a few loops:

    rtpois=function(n,lambda){
      propal=rpois(ceiling(n/(1-exp(-lambda))),lambda)
      propal=propal[propal>0]
      n0=length(propal)
      if (n0>=n) return(propal[1:n])
      else return(c(propal,rtpois(n-n0,lambda)))
    }

with a higher efficiency:

    > system.time(rtpois(10^6,.5))
       user  system elapsed
      0.816   0.092   0.928

Replacing the expectation with an upper bound using the variance of the negative binomial distribution does not make a significant dent in the computing time

    rtpois=function(n,lambda){
      M=1/(1-exp(-lambda))
      propal=rpois(ceiling(M*(n+2*sqrt(n/M)/(M-1))),lambda)
      propal=propal[propal>0]
      n0=length(propal)
      if (n0>=n) return(propal[1:n])
      else return(c(propal,rtpois(n-n0,lambda)))
    }

since we get

    > system.time(rtpois(10^6,.5))
       user  system elapsed
      0.824   0.048   0.877

**T**he second point about this Poisson example is that simulating a distribution with a restricted support using another distribution with a larger support is quite inefficient, especially when λ goes to zero. By comparison, using a Poisson proposal with parameter μ and translating it by 1 may bring a considerable improvement: without getting into the gory details, it can be shown that the optimal value of μ (in terms of maximal acceptance probability) is λ and that the corresponding probability of acceptance is

(1-e^{-λ})/λ

which is larger than the probability of the original approach when λ is less than one. As shown by the graph below, this allows for a lower bound in the probability of acceptance that remains tolerable.
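For the record, a possible R sketch of this translated proposal with μ=λ. The function name `rtpois_shift` is mine and the code is an illustration rather than the version used for the graph. With target f(x) = λ^x e^{-λ}/{x!(1-e^{-λ})} and proposal g(x) = λ^{x-1} e^{-λ}/(x-1)! on x≥1, the ratio f(x)/g(x) equals λ/{x(1-e^{-λ})}, which is maximal at x=1, so a proposed x is accepted with probability 1/x:

```r
# Truncated-at-one Poisson via a Poisson(lambda) proposal translated by 1.
rtpois_shift <- function(n, lambda) {
  M <- lambda / (1 - exp(-lambda))       # bound on f/g, attained at x = 1
  nprop <- ceiling(n * M)                # expected number of proposals needed
  propal <- 1 + rpois(nprop, lambda)     # proposals supported on 1, 2, ...
  keep <- propal[runif(nprop) < 1 / propal]  # accept x with probability 1/x
  n0 <- length(keep)
  if (n0 >= n) return(head(keep, n))
  c(keep, rtpois_shift(n - n0, lambda))  # top up if too few were accepted
}

# Acceptance probabilities of the two approaches, as functions of lambda
acc_original <- function(lambda) 1 - exp(-lambda)
acc_shifted  <- function(lambda) (1 - exp(-lambda)) / lambda
```

Comparing `acc_original` and `acc_shifted` over a grid of λ values shows the crossing at λ=1, and taking the better of the two proposals keeps the acceptance probability above 1-e^{-1}.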

## Bayes vs. SAS

Posted in Books, R, Statistics with tags adaptive MCMC methods, Bayes factor, Bayesian inference, BUGS, coda, convergence diagnostics, DIC, GENMOD, Gibbs sampling, hypothesis testing, improper prior, Introducing Monte Carlo Methods with R, LIFEREG, MCMC, Metropolis-Hastings, model choice, PHREG, SAS, variable selection on May 7, 2010 by xi'an

**G**lancing perchance at the back of my *Amstat News*, I was intrigued by the **SAS** advertisement

Bayesian Methods

Specify Bayesian analysis for ANOVA, logistic regression, Poisson regression, accelerated failure time models and Cox regression through the GENMOD, LIFEREG and PHREG procedures.

Analyze a wider variety of models with the MCMC procedure, a general purpose Bayesian analysis procedure.

and so decided to take a look at those items on the **SAS** website. (Some entries date back to 2006 so I am not claiming novelty in this post, just my reading through the manual!)

**E**ven though I have not looked at a **SAS** program since the time in 1984 I was learning principal component and discriminant analysis by programming **SAS** procedures on punched cards, it seems the MCMC part is rather manageable (if you can manage **SAS** at all!), looking very much like a second BUGS to my bystander eyes, even to the point of including ARS algorithms! The models are defined in a BUGS manner, with priors on the side (and this includes improper priors, despite a confusing first example that mixes very large variances with vague priors for the linear model!). The basic scheme is a random walk proposal with adaptive scale or covariance matrix. (The adaptivity on the covariance matrix is slightly confusing in that the way it is described it does not seem to implement the requirements of Roberts and Rosenthal for sure convergence.) Gibbs sampling is not directly covered, although some examples are in essence using Gibbs samplers. Convergence is assessed via ca. 1995 methods à la Cowles and Carlin, including the rather unreliable Raftery and Lewis indicator, but so does **Introducing Monte Carlo Methods with R**, which takes advantage of the R *coda* package. I have not tested (!) any of the features in the MCMC procedure but judging from a quick skim through the 283-page manual everything looks reasonable enough. I wonder if anyone has ever tested a SAS program against its BUGS counterpart for efficiency comparison.
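To make the notion of a random walk proposal with adaptive scale concrete, here is a rough one-dimensional R sketch. It is not the algorithm implemented in the MCMC procedure (which I have not run): the function name `adaptive_rwm`, the 0.44 target acceptance rate, and the 1/√t step size are my own choices, the decreasing step being one simple way to make the adaptation vanish over iterations.

```r
# Random-walk Metropolis whose proposal scale is tuned on the fly
# towards a target acceptance rate, with adaptation steps shrinking as 1/sqrt(t).
adaptive_rwm <- function(logtarget, init, niter, target_acc = 0.44) {
  x <- numeric(niter)
  cur <- init
  logscale <- 0                          # log of the proposal standard deviation
  for (t in 1:niter) {
    prop <- cur + exp(logscale) * rnorm(1)
    acc <- min(1, exp(logtarget(prop) - logtarget(cur)))
    if (runif(1) < acc) cur <- prop      # Metropolis accept/reject step
    # increase the scale if accepting too often, decrease it otherwise
    logscale <- logscale + (acc - target_acc) / sqrt(t)
    x[t] <- cur
  }
  x
}

# Toy run on a standard normal target, started far from the mode
set.seed(1)
chain <- adaptive_rwm(function(x) dnorm(x, log = TRUE), init = 5, niter = 1e4)
```

The resulting `chain` can then be passed through the usual convergence diagnostics after discarding an initial stretch.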

**T**he Bayesian aspects are rather traditional as well, except for the testing issue. Indeed, from what I have read, **SAS** does not engage in testing and remains within estimation bounds, offering only HPD regions for variable selection without producing a genuine Bayesian model choice tool. I understand the issues with handling improper priors versus computing Bayes factors, as well as some delicate computational requirements, but this is a truly important chunk missing from the package. (Of course, the package contains a DIC (Deviance information criterion) capability, which may be seen as a substitute, but I have reservations about the relevance of DIC outside generalised linear models. Same difficulty with the posterior predictive.) As usual with SAS, the documentation is huge (I still remember the shelves after shelves of documentation volumes in my 1984 card-punching room!) and full of options and examples. Nothing to complain about. Except maybe the list of *disadvantages in using Bayesian analysis:*

- *It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.*
- *It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.*
- *It often comes with a high computational cost, especially in models with a large number of parameters.*

which does not say much… Since the MCMC procedure allows for any degree of hierarchical modelling, it is always possible to check the impact of a given prior by letting its parameters go random. I found that most practitioners are happy with the formalisation of their prior beliefs into mathematical densities, rather than adamant about a specific prior. As for computation, this is not a major issue.

## SAS on the radio

Posted in Statistics with tags SAS on February 4, 2009 by xi'an

**I** was quite surprised this morning when hearing on the (public) radio France Inter that the report on driving conditions was sponsored by SAS (yes, the software)… I am not even sure SAS has anything to do with driving predictions, but it is interesting to see that SAS needs to advertise on the main French public radio, despite all insurance companies already requesting SAS proficiency from our students.