## how large is 9!!!!!!!!!?

Posted in Statistics with tags , , , , , , , , , on March 17, 2017 by xi'an

This may sound like an absurd question [and in some sense it is!], but this came out of a recent mathematical riddle on The Riddler, asking for the largest number one could write with ten symbols. The difficulty with this riddle is the definition of a symbol, as the collection of available symbols is a very relative concept. For instance, if one takes  the symbols available on a basic pocket calculator, besides the 10 digits and the decimal point, there should be the four basic operations plus square root and square,which means that presumably 999999999² is the largest one can  on a cell phone, there are already many more operations, for instance my phone includes the factorial operator and hence 9!!!!!!!!! is a good guess. While moving to a computer the problem becomes somewhat meaningless, both because there are very few software that handle infinite precision computing and hence very large numbers are not achievable without additional coding, and because it very much depends on the environment and on which numbers and symbols are already coded in the local language. As illustrated by this X validated answer, this link from The Riddler, and the xkcd entry below. (The solution provided by The Riddler itself is not particularly relevant as it relies on a particular collection of symbols, which mean Rado’s number BB(9999!) is also a solution within the right referential.)

## what does more efficient Monte Carlo mean?

Posted in Books, Kids, R, Statistics with tags , , , , , , on March 17, 2017 by xi'an

“I was just thinking that there might be a magic trick to simulate directly from this distribution without having to go for less efficient methods.”

In a simple question on X validated a few days ago [about simulating from x²φ(x)] popped up the remark that the person asking the question wanted a direct simulation method for higher efficiency. Compared with an accept-reject solution. Which shows a misunderstanding of what “efficiency” means on Monte Carlo situations. If it means anything, I would think it is reflected in the average time taken to return one simulation and possibly in the worst case. But there is no reason to call an inverse cdf method more efficient than an accept reject or a transform approach since it all depends on the time it takes to make the inversion compared with the other solutions… Since inverting the closed-form cdf in this example is much more expensive than generating a Gamma(½,½), and taking plus or minus its root, this is certainly the case here. Maybe a ziggurat method could be devised, especially since x²φ(x)<φ(x) when |x|≤1, but I am not sure it is worth the effort!

## multiplying a Gaussian matrix and a Gaussian vector

Posted in Books with tags , , , , , on March 2, 2017 by xi'an

This arXived note by Pierre-Alexandre Mattei was actually inspired by one of my blog entries, itself written from a resolution of a question on X validated. The original result about the Laplace distribution actually dates at least to 1932 and a paper by Wishart and Bartlett!I am not sure the construct has clear statistical implications, but it is nonetheless a good calculus exercise.

The note produces an extension to the multivariate case. Where the Laplace distribution is harder to define, in that multiple constructions are possible. The current paper opts for a definition based on the characteristic function. Which leads to a rather unsavoury density with Bessel functions. It however satisfies the constructive definition of being a multivariate Normal multiplied by a χ variate plus a constant vector multiplied by the same squared χ variate. It can also be derived as the distribution of

Wy+||y||²μ

when W is a (p,q) matrix with iid Gaussian columns and y is a Gaussian vector with independent components. And μ is a vector of the proper dimension. When μ=0 the marginals remain Laplace.

## an accurate variance approximation

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , on February 7, 2017 by xi'an

In answering a simple question on X validated about producing Monte Carlo estimates of the variance of estimators of exp(-θ) in a Poisson model, I wanted to illustrate the accuracy of these estimates against the theoretical values. While one case was easy, since the estimator was a Binomial B(n,exp(-θ)) variate [in yellow on the graph], the other one being the exponential of the negative of the Poisson sample average did not enjoy a closed-form variance and I instead used a first order (δ-method) approximation for this variance which ended up working surprisingly well [in brown] given that the experiment is based on an n=20 sample size.

Thanks to the comments of George Henry, I stand corrected: the variance of the exponential version is easily manageable with two lines of summation! As

$\text{var}(\exp\{-\bar{X}_n\})=\exp\left\{-n\theta[1-\exp\{-2/n\}]\right\}$

$-\exp\left\{-2n\theta[1-\exp\{-1/n\}]\right\}$

which allows for a comparison with its second order Taylor approximation:

## a well-hidden E step

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , on February 3, 2017 by xi'an

A recent question on X validated ended up being quite interesting! The model under consideration is made of parallel Markov chains on a finite state space, all with the same Markov transition matrix, M, which turns into a hidden Markov model when the only summary available is the number of chains in a given state at a given time. When writing down the EM algorithm, the E step involves the expected number of moves from a given state to a given state at a given time. The conditional distribution of those numbers of chains is a product of multinomials across times and starting states, with no Markov structure since the number of chains starting from a given state is known at each instant. Except that those multinomials are constrained by the number of “arrivals” in each state at the next instant and that this makes the computation of the expectation intractable, as far as I can see.

A solution by Monte Carlo EM means running the moves for each instant under the above constraints, which is thus a sort of multinomial distribution with fixed margins, enjoying a closed-form expression but for the normalising constant. The direct simulation soon gets too costly as the number of states increases and I thus considered a basic Metropolis move, using one margin (row or column) or the other as proposal, with the correction taken on another margin. This is very basic but apparently enough for the purpose of the exercise. If I find time in the coming days, I will try to look at the ABC resolution of this problem, a logical move when starting from non-sufficient statistics!

## a typo that went under the radar

Posted in Books, R, Statistics, University life with tags , , , , , , , on January 25, 2017 by xi'an

A chance occurrence on X validated: a question on an incomprehensible formula for Bayesian model choice: which, most unfortunately!, appeared in Bayesian Essentials with R! Eeech! It looks like one line in our LATEX file got erased and the likelihood part in the denominator altogether vanished. Apologies to all readers confused by this nonsensical formula!

## flea circus

Posted in Books, Kids, pictures, R, Statistics with tags , , , , , , , , , , , on December 8, 2016 by xi'an

An old riddle found on X validated asking for Monte Carlo resolution  but originally given on Project Euler:

A 30×30 grid of squares contains 30² fleas, initially one flea per square. When a bell is rung, each flea jumps to an adjacent square at random. What is the expected number of unoccupied squares after 50 bell rings, up to six decimal places?

The debate on X validated is whether or not a Monte Carlo resolution is feasible. Up to six decimals, certainly not. But with some lower precision, certainly. Here is a rather basic R code where the 50 steps are operated on the 900 squares, rather than the 900 fleas. This saves some time by avoiding empty squares.

xprmt=function(n=10,T=50){

mean=0
for (t in 1:n){

board=rep(1,900)
for (v in 1:T){

beard=rep(0,900)
if (board[1]>0){
poz=c(0,1,0,30)
ne=rmultinom(1,board[1],prob=(poz!=0))
beard[1+poz]=beard[1+poz]+ne}
#
for (i in (2:899)[board[-1][-899]>0]){
u=(i-1)%%30+1;v=(i-1)%/%30+1
poz=c(-(u>1),(u<30),-30*(v>1),30*(v<30))
ne=rmultinom(1,board[i],prob=(poz!=0))
beard[i+poz]=beard[i+poz]+ne}
#
if (board[900]>0){
poz=c(-1,0,-30,0)
ne=rmultinom(1,board[900],prob=(poz!=0))
beard[900+poz]=beard[900+poz]+ne}
board=beard}
mean=mean+sum(board==0)}
return(mean/n)}


The function returns an empirical average over n replications. With a presumably awkward approach to the borderline squares, since it involves adding zeros to keep the structure the same… Nonetheless, it produces an approximation that is rather close to the approximate expected value, in about 3mn on my laptop.

> exprmt(n=1e3)
[1] 331.082
> 900/exp(1)
[1] 331.0915


Further gains follow from considering only half of the squares, as there are two independent processes acting in parallel. I looked at an alternative and much faster approach using the stationary distribution, with the stationary being the Multinomial (450,(2/1740,3/1740…,4/1740,…,2/1740)) with probabilities proportional to 2 in the corner, 3 on the sides, and 4 in the inside. (The process, strictly speaking, has no stationary distribution, since it is periodic. But one can consider instead the subprocess indexed by even times.) This seems to be the case, though, when looking at the occupancy frequencies, after defining the stationary as:

inva=function(B=30){
return(c(2,rep(3,B-2),2,rep(c(3,
rep(4,B-2),3),B-2),2,rep(3,B-2),2))}


namely

> mn=0;n=1e8 #14 clock hours!
> proz=rep(c(rep(c(0,1),15),rep(c(1,0),15)),15)*inva(30)
> for (t in 1:n)
+ mn=mn+table(rmultinom(1,450,prob=rep(1,450)))[1:4]
> mn=mn/n
> mn[1]=mn[1]-450
> mn
0      1      2     3
166.11 164.92  82.56 27.71
> exprmt(n=1e6) #55 clock hours!!
[1] 165.36 165.69 82.92 27.57

my original confusion being that the Poisson approximation had not yet taken over… (Of course, computing the first frequency for the stationary distribution does not require any simulation, since it is the sum of the complement probabilities to the power 450, i.e., 166.1069.)