hard birthday problem

Posted in Books, Kids, R, Statistics on February 4, 2021 by xi'an

From an X validated question, I found that WordPress now allows for direct links to pdf documents, like the above paper by my old friend Anirban Das Gupta! The question is about estimating the number M of individuals given N distinct birth dates over a year of T days. After looking around, I could not find a simpler representation of the probability of N=r than (1) in my answer,

$\frac{T!}{(T-r)!}\frac{m!}{T^m} \sum_{\substack{(r_1,\ldots,r_m);\\ \sum_1^m r_i=r\ \&\ \sum_1^m i\,r_i=m}}1\Big/\prod_{j=1}^m r_j!\,(j!)^{r_j}$
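Since the inner sum in (1), once multiplied by m!, is the Stirling number of the second kind S(m,r), the probability can be evaluated without enumerating partitions. A minimal R sketch (function names are mine):

```r
# Stirling number of the second kind, via S(m,r) = r S(m-1,r) + S(m-1,r-1)
stirling2 <- function(m, r) {
  S <- matrix(0, m + 1, r + 1)
  S[1, 1] <- 1                          # S(0,0) = 1
  for (i in 1:m) for (j in 1:min(i, r))
    S[i + 1, j + 1] <- j * S[i, j + 1] + S[i, j]
  S[m + 1, r + 1]
}
# probability (1): P(N = r) = T!/(T-r)! x S(m,r) / T^m, on the log scale
pbirth <- function(r, m, T)
  exp(lfactorial(T) - lfactorial(T - r) + log(stirling2(m, r)) - m * log(T))
```

For instance, pbirth(9, 10, 365) returns the probability that ten individuals share exactly nine distinct birthdays over a 365-day year.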

borrowed from a paper by Fisher et al. (another Fisher!). Checking Feller (1970, p.102) leads to the probability

${T \choose r}\sum_{\nu=0}^r (-1)^{\nu}{r\choose\nu}\left(1-\frac{T-r+\nu}T \right)^m$

which fits the simulation frequencies rather nicely, as shown using

apply(!apply(matrix(sample(1:T,Nb*M,rep=TRUE),Nb,M),1,duplicated),2,sum)
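For instance, Feller's formula can be set against the simulated frequencies; a quick R sketch, with illustrative sizes (ten individuals, 10⁴ replicates) and variable names of my own choosing:

```r
Td <- 365; M <- 10; Nb <- 1e4    # days, individuals, simulation replicates
# simulated numbers of distinct birthdays among M individuals
sims <- apply(!apply(matrix(sample(1:Td, Nb * M, replace = TRUE), Nb, M),
                     1, duplicated), 2, sum)
# Feller's probability that the M birthdays occupy exactly r days
feller <- function(r, m, T)
  choose(T, r) * sum((-1)^(0:r) * choose(r, 0:r) * ((r - 0:r) / T)^m)
# empirical frequency versus exact probability, e.g. at r = 9
c(mean(sims == 9), feller(9, M, Td))
```

The two values agree up to Monte Carlo error; note that the alternating sum in feller() suffers from cancellation for much larger m, when the log-scale Stirling representation is preferable.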


Further, Feller (1970, pp.103-104) justifies an asymptotic Poisson approximation, with parameter $\lambda(M)=T\exp\{-M/T\}$ for the number of empty days, from which an estimate of $M$ can be derived. With the birthday problem as illustration (pp.105-106)!
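One way such an estimate can be derived is by matching the observed number of empty days, T−N, with the Poisson mean λ(M); a sketch, with purely illustrative numbers:

```r
T <- 365; N <- 300                 # illustrative: 300 distinct birthdays observed
# solve T - N = T * exp(-M/T) for M (moment estimate of the number of individuals)
Mhat <- -T * log((T - N) / T)
Mhat
```

By construction, T * exp(-Mhat / T) recovers the observed count of empty days T − N.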

It may be that a completion from N to (R¹,R²,…), where the components are the numbers of days with one birthdate, with two birthdates, &tc., could help design an EM algorithm that would remove the summation in (1), but I did not spend more time on the problem (than finding a SAS approximation to the probability!).

Jean-Paul Benzécri (1932-2019)

Posted in Books, pictures, Statistics, University life on December 3, 2019 by xi'an

I learned last weekend that Jean-Paul Benzécri had died earlier in the week. He was a leading and charismatic figure of the French renewal in data analysis (or analyse des données) that used mostly algebraic tools to analyse large datasets, while staying as far as possible from the strong abstraction of French statistics at that time. While I did not know him on a personal basis, I remember from my lecturer years there that he used to come to Institut de Statistique de l’Université de Paris (ISUP), Université Pierre et Marie Curie, once a week and meet with a large group of younger statisticians, students and junior faculty, and then talk to them for long hours while walking back and forth along the corridor in Jussieu. Showing extreme dedication from the group as this windowless corridor was particularly ghastly! (I also remember less fondly hours spent over piles and piles of SAS printout trying to make sense of multiple graphs of projections produced by these algebraic methods and feeling there were too many degrees of freedom for them to feel rigorous enough.)

don’t be late for BayesComp’2020

Posted in Statistics on October 4, 2019 by xi'an

An important reminder that October 14 is the deadline for regular registration to BayesComp 2020 as late fees will apply afterwards!!! The conference looks attractive enough to agree to pay more, but still…

Posted in pictures, Statistics, Travel, University life on August 17, 2019 by xi'an

While I forgot to send a reminder that August 15 was the early registration deadline for BayesComp 2020, here are further deadlines and dates:

1. BayesComp 2020 occurs on January 7-10 2020 in Gainesville, Florida, USA
2. Registration is open with regular rates till October 14, 2019
3. Deadline for submission of poster proposals is December 15, 2019
4. Deadline for travel support applications is September 20, 2019
5. There are four free tutorials on January 7, 2020, related to Stan, NIMBLE, SAS, and AutoStat

SAS on Bayes

Posted in Books, Kids, pictures, R, Statistics, University life on November 8, 2016 by xi'an

Following a question on X Validated, I became aware of the following descriptions of the pros and cons of Bayesian analysis, as perceived by whoever (Tim Arnold?) wrote the SAS/STAT(R) 9.2 User's Guide, Second Edition. I replied more specifically on the point

It [Bayesian inference] provides inferences that are conditional on the data and are exact, without reliance on asymptotic approximation. Small sample inference proceeds in the same manner as if one had a large sample. Bayesian analysis also can estimate any functions of parameters directly, without using the “plug-in” method (a way to estimate functionals by plugging the estimated parameters in the functionals).

which I find utterly confusing and not particularly relevant. The other points in the list are more traditional, except for this one

It provides interpretable answers, such as “the true parameter θ has a probability of 0.95 of falling in a 95% credible interval.”

that I find somewhat unappealing, in that the 95% probability is only relevant with respect to the resulting posterior, hence has no absolute (and definitely no frequentist) meaning. The criticisms of the prior selection

It does not tell you how to select a prior. There is no correct way to choose a prior. Bayesian inferences require skills to translate subjective prior beliefs into a mathematically formulated prior. If you do not proceed with caution, you can generate misleading results.

It can produce posterior distributions that are heavily influenced by the priors. From a practical point of view, it might sometimes be difficult to convince subject matter experts who do not agree with the validity of the chosen prior.

are traditional but nonetheless irksome. Once it is acknowledged that there is no correct or true prior, it follows naturally that the resulting inference depends on the choice of the prior and has to be understood conditional on that prior, which is why the credible interval has, for instance, an epistemic rather than a frequentist interpretation. There is also little reason to try to convince a fellow Bayesian statistician of one's prior: everything is conditional on the chosen prior, and I see less and less why this should be an issue.