as the probability to exceed n is the probability that at least one value is not observed by the n-th draw, namely

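(1/2)^{n}+(2/3)^{n}+(5/6)^{n}-(1/6)^{n}-(1/3)^{n}-(1/2)^{n}

for n≥1, where the two (1/2)^{n} terms cancel. Since the probabilities for n=0,1,2 are all equal to one, this leads to an easy summation for the expectation, namely

3+(2/3)³/(1/3)+(5/6)³/(1/6)-(1/3)³/(2/3)-(1/6)³/(5/6)=73/10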
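As a quick numerical sanity check of the 73/10 value, the tail probabilities can be summed directly. The sketch below assumes outcome probabilities (1/6, 1/3, 1/2), as implied by the prob=1:3 argument of the simulation code in this post; p, tail_prob and expected_N are illustrative names, not part of the original code.

```r
# Outcome probabilities, matching prob=1:3 in the simulation (1/6, 1/3, 1/2)
p <- c(1, 2, 3) / 6

# P(N > n): at least one of the three values is still missing after n draws
# (inclusion-exclusion, valid for n >= 1 since all three cannot be missing)
tail_prob <- function(n) {
  singles <- sapply(n, function(k) sum((1 - p)^k))  # one given value missing
  pairs   <- sapply(n, function(k) sum(p^k))        # two given values missing
  singles - pairs
}

# E[N] = sum over n >= 0 of P(N > n); the n = 0 term equals 1 and the
# series is truncated at n = 500, where the remaining terms are negligible
expected_N <- 1 + sum(tail_prob(1:500))
expected_N
```

Up to rounding and the (negligible) truncation of the series, the value returned should agree with 73/10.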

Checking that the result holds is also straightforward:

averages <- function(n=1){
  x=matrix(sample(1:3,100,rep=TRUE,prob=1:3),100,3)
  x[,1]=as.integer(x[,2]<2)
  x[,3]=as.integer(x[,2]>2)
  x[,2]=1-x[,1]-x[,3]
  y=apply(apply(x,2,cumsum),1,prod)
  m=1+sum(y==0)
  return(apply(x[1:m,],2,sum))}

since this gives

mumbl=matrix(0,1e5,3)
for (t in 1:1e5) mumbl[t,]=averages()
> apply(mumbl,2,mean)
[1] 1.21766 2.43265 3.64759
> sum(apply(mumbl,2,mean))
[1] 7.2979
> apply(mumbl,2,mean)*c(6,3,2)
[1] 7.30596 7.29795 7.29518

Filed under: Books, Kids, R Tagged: 538, FiveThirtyEight, stopping rule, The Riddler

Filed under: pictures, Travel, University life Tagged: Bavaria, Germany, Max Planck Institute, München, neo-gothic architecture

Filed under: Books, pictures, Statistics, University life Tagged: conditional probability, debiasing, Monte Carlo approximations, Monte Carlo Statistical Methods, Rao-Blackwellisation

- no bonuses like supplementary material, code, open or edited comments
- no reduction in the subscription rate of the journals and penalty fees if one still wants a paper version, which amounts to a massive increase in the subscription price
- no disengagement from the commercial publisher, whose role becomes even less relevant
- no access to the issues of the years one has paid for, once one stops subscribing.

“The benefits of electronic publishing include: faster publishing speeds; increased content; instant access from a range of electronic devices; additional functionality; and of course, environmental sustainability.”

The move is sold with typical marketing noise. But I do not buy it: publishing speeds will remain the same, being driven by the reviewing part, I do not see where the contents are increased, and I cannot seriously read a journal article on my phone, so this range of electronic devices remains a gadget. Not happy!

Filed under: Books, pictures, Statistics, University life Tagged: academic journals, Electronic Journal of Statistics, JRSSB, Royal Statistical Society, RSS

- MCMC with Strings and Branes: The Suburban Algorithm (Extended Version) by Jonathan J. Heckman, Jeffrey G. Bernstein, Ben Vigoda
- Methods for Bayesian Variable Selection with Binary Response Data using the EM Algorithm by Patrick McDermott, John Snyder, Rebecca Willison
- Computing the variance of a conditional expectation via non-nested Monte Carlo by Takashi Goda
- Metropolis-Hastings algorithms with autoregressive proposals, and a few examples by Richard A. Norton, Colin Fox
- Multilevel Particle Filters: Normalizing Constant Estimation by Ajay Jasra, Kengo Kamatani, Prince Peprah Osei, Yan Zhou
- Sobol’ indices for problems defined in non-rectangular domains by S. Kucherenko, O.V. Klymenko, N. Shah

Enjoy!

Filed under: Books, Statistics, University life Tagged: arXiv

“…we will show that [importance sampling] is unnecessary in many instances…” (p.6)

An obvious question that stems from the approach is the call for importance sampling, since the numerator of the importance sampler involves the full likelihood, which is unavailable in most instances where sub-sampled MCMC is required. I may have missed the part of the paper where the above statement is discussed, but the only realistic example discussed therein is the Bayesian regression tree (BART) of Chipman et al. (1998). This indeed constitutes a challenging, if one-dimensional, example, but also one that requires delicate tuning, which leads to cancelling importance weights but may prove hard to extrapolate to other models.

Filed under: Books, Statistics, University life Tagged: BART, Canada, consensus Monte Carlo, importance sampling, likelihood function, Monte Carlo Statistical Methods, scaling, subsampling, University of Toronto

[which still has a link with e in that the fraction of empty bins converges to e⁻¹ when n=m], this led me to some more involved investigation on the distribution of Y. While it can be shown directly that the probability that k bins are non-empty is

$$\binom{n}{k}\,\frac{1}{n^m}\sum_{i=1}^{k}(-1)^{k-i}\binom{k}{i}\,i^m$$

with an R representation by

miss<-function(n,m){
  p=rep(0,n)
  for (k in 1:n)
    p[k]=choose(n,k)*sum((-1)^((k-1):0)*choose(k,1:k)*(1:k)^m)
  return(rev(p)/n^m)}
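As a small sanity check on this occupancy distribution, here is a minimal sketch that recomputes the probabilities and verifies that they sum to one and reproduce the mean number of empty bins, n(1-1/n)^m. It assumes the standard surjection-counting form of the probability of k non-empty bins (n bins, m balls); nonempty_prob is a hypothetical helper name, not part of the original code.

```r
# Hypothetical helper: probability that exactly k of the n bins are non-empty
# after m balls are thrown uniformly at random
nonempty_prob <- function(k, n, m) {
  i <- 1:k
  choose(n, k) * sum((-1)^(k - i) * choose(k, i) * i^m) / n^m
}

n <- 6; m <- 8
pk <- sapply(1:n, nonempty_prob, n = n, m = m)

sum(pk)               # total mass, should be 1 up to rounding
sum((n - 1:n) * pk)   # mean number of empty bins from the distribution
n * (1 - 1/n)^m       # closed-form mean, for comparison
```

The alternating sum is computed in floating point, so this check is only reliable for moderate values of n and m.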

I wanted to take advantage of the moments of Y, since it writes as a sum of n indicators, counting the number of empty cells. However, the higher moments of Y are not as straightforward as its expectation and I struggled with the representation until I came upon this formula

$$\mathbb{E}[Y^k]=\sum_{i=1}^{k} i!\,S(k,i)\,\binom{n}{i}\left(1-\frac{i}{n}\right)^m$$

where S(k,i) denotes the Stirling number of the second kind… Or i!S(n,i) is the number of surjections from a set of size n to a set of size i. Which leads to the distribution of Y by inverting the moment equations, as in the following R code:

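# Stirling2() is not in base R; it is assumed here to come from the gmp package
diss<-function(n,m){
  A=matrix(0,n,n)
  mome=rep(0,n)
  A[n,]=rep(1,n) # last equation: the probabilities sum to one
  mome[n]=1
  for (k in 1:(n-1)){
    A[k,]=(0:(n-1))^k # k-th moment equation over the support 0,...,n-1
    for (i in 1:k)
      mome[k]=mome[k]+factorial(i)*as.integer(Stirling2(k,i))*
        choose(n,i)*(1-i/n)^m}
  return(solve(A,mome))}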
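The moment representation can be checked by brute force on a small case. The sketch below relies on the standard identity E[Y(Y-1)…(Y-i+1)] = n!/(n-i)! (1-i/n)^m for the number Y of empty bins, combined with Stirling numbers of the second kind. The helpers stirling2 and moment_Y are hypothetical names written in base R for this illustration, so that no extra package is needed.

```r
# Hypothetical helper: Stirling numbers of the second kind S(k, i), via the
# recursion S(k, i) = i * S(k - 1, i) + S(k - 1, i - 1)
stirling2 <- function(k, i) {
  if (k == i) return(1)
  if (i == 0 || i > k) return(0)
  i * stirling2(k - 1, i) + stirling2(k - 1, i - 1)
}

# k-th moment of the number Y of empty bins (n bins, m balls):
# sum over i of i! S(k, i) choose(n, i) (1 - i/n)^m
moment_Y <- function(k, n, m) {
  i <- 1:min(k, n)
  S <- sapply(i, function(j) stirling2(k, j))
  sum(factorial(i) * S * choose(n, i) * (1 - i / n)^m)
}

# Brute-force check on a small case: enumerate all n^m allocations
n <- 3; m <- 4
alloc <- expand.grid(rep(list(1:n), m))
Y <- n - apply(alloc, 1, function(r) length(unique(r)))
by_enumeration <- sapply(1:3, function(k) mean(Y^k))
by_formula     <- sapply(1:3, moment_Y, n = n, m = m)
rbind(by_enumeration, by_formula)
```

The two rows should coincide up to rounding. The enumeration grows as n^m, so it is only meant for very small n and m.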

that I still checked by raw simulations from the multinomial

zample<-function(n,m,T=1e4){
  x=matrix(sample(1:n,m*T,rep=TRUE),nrow=T)
  # number of distinct bins hit in each simulated row
  x=apply(x,1,function(r) length(unique(r)))
  return(n-x)}

Filed under: Kids, R, Statistics Tagged: moment derivation, moments, multinomial distribution, occupancy, R, Stack Exchange, Stirling number, surjection

Filed under: pictures, Travel, University life Tagged: Bavaria, Germany, Marienplatz, Munich, Neues Rathaus

“The hunters wolfed down chicken fried steaks or wolfed down cuds of Red Man, Beech-Nut, Levi Garrett, or Jackson’s Apple Jack”

The Snow Geese was written in 2002 by William Fiennes, a young Englishman recovering from a serious disease and embarking on a wild quest to overcome post-sickness depression. While the idea behind the trip is rather alluring, namely to follow Arctic geese from their wintering grounds in Texas to their summer nesting place on Baffin Island, the book itself is sort of a disaster. The prose of the author is very heavy, or even very very heavy, with an accumulation of descriptions that do not contribute to the story and a highly bizarre habit of mentioning brands by groups of three. And of using heavy duty analogies, as in *“we were travelling across the middle of a page, with whiteness and black markings all around us, and geese lifting off the snow like letters becoming unstuck”*. The reflections about the recovery of the author from a bout of depression and the rise of homesickness and nostalgia are not in the least deep or challenging, while the trip of the geese does not get beyond the descriptive. Worse, the geese remain a mystery, a blur, and a collective, rather than the book bringing the reader closer to them. If anything is worth mentioning there, it is instead the encounters of the author with rather unique characters, at every step of his road- and plane-trips. To the point of sounding too unique to be true… His hunting trip with a couple of Inuit hunters north of Iqaluit on Baffin Island is both a high and a low of the book, in that sharing a few days with them in the wild is exciting in a primeval sense, while witnessing them shoot down the very geese the author followed for 5000 kilometres sort of negates the entire purpose of the trip. It then makes perfect sense to close the story with a feeling of urgency, for there is nothing worth adding.

Filed under: Books, Kids, pictures, Travel Tagged: Antarctica, Baffin Island, Inuits, Rue Mouffetard, snow geese, William Fiennes

Filed under: Mountains, pictures, Travel, Wines Tagged: Italian wines, Le Sassine, Lenzerheide, MCMskv, Ripasso, Switzerland, Valpolicella