## off to Banff [17w5024]

Posted in Mountains, pictures, Running, Statistics, Travel, University life with tags , , , , , , on February 18, 2017 by xi'an

Today, I fly from Paris to Amsterdam to Calgary to attend the ABC’ory workshop (15w2214) at the Banff International Research Station (BIRS) that Luke Bornn, Jukka Corander, Gael Martin, Dennis Prangle, Richard Wilkinson and myself built. The meeting is to brainstorm about the foundations of ABC for statistical inference rather than about the computational aspects of ABC, but the schedule is quite flexible for other directions!

## A knapsack riddle [#2]?

Posted in Kids, R, Statistics with tags , , , on February 17, 2017 by xi'an

Still about this allocation riddle of the past week, and still with my confusion about the phrasing of the puzzle, when looking at a probabilistic interpretation of the game, rather than for a given adversary’s y, the problem turns out to search for the maximum of

$\mathbb{E}[L(x,Y)]=\sum_{i=1}^{10} i\{P(Y_ix_i)\}$

where the Y’s are Binomial B(100,p). Given those p’s, this function of x is available in closed form and can thus maximised by a simulated annealing procedure, coded as

utility=function(x,p){
ute=2*pbinom(x[1]-1,100,prob=p[1])+
dbinom(x[1],100,p[1])
for (i in 2:10)
ute=ute+2*i*pbinom(x[i]-1,100,prob=p[i])+
i*dbinom(x[i],100,p[i])
return(ute)}
#basic term in utility
baz=function(i,x,p){
return(i*dbinom(x[i],100,p[i])+
i*dbinom(x[i]-1,100,p[i]))}
#relies on a given or estimated p
x=rmultinom(n=1,siz=100,prob=p)
maxloz=loss=0
newloss=losref=utility(x,p)
#random search
T=1e3
Te=1e2
baza=rep(0,10)
t=1
while ((t<T)||(newloss>loss)){
loss=newloss
i=sample(1:10,1,prob=(10:1)*(x>0))
#moving all other counters by one
xp=x+1;xp[i]=x[i]
#corresponding utility change
for (j in 1:10) baza[j]=baz(j,xp,p)
proz=exp(log(t)*(baza-baza[i])/Te)
#soft annealing move
j=sample(1:10,1,prob=proz)
if (i!=j){ x[i]=x[i]-1;x[j]=x[j]+1}
newloss=loss+baza[j]-baza[i]
if (newloss>maxloz){
maxloz=newloss;argz=x}
t=t+1
if ((t>T-10)&(newloss<losref)){
t=1;loss=0
x=rmultinom(n=1,siz=100,prob=p)
newloss=losref=utility(x,p)}}


which seems to work, albeit not always returning the same utility. For instance,

> p=cy/sum(cy)
> utility(argz,p)
[1] 78.02
> utility(cy,p)
[1] 57.89


or

> p=sy/sum(sy)
> utility(argz,p)
[1] 82.04
> utility(sy,p)
[1] 57.78


Of course, this does not answer the question as intended and reworking the code to that purpose is not worth the time!

## a concise introduction to statistical inference [book review]

Posted in Statistics with tags , , , , , , , , , , on February 16, 2017 by xi'an

[Just to warn readers and avoid emails about Xi’an plagiarising Christian!, this book was sent to me by CRC Press for a review. To be published in CHANCE.]

This is an introduction to statistical inference. And with 180 pages, it indeed is concise! I could actually stop the review at this point as a concise review of a concise introduction to statistical inference, as I do not find much originality in this introduction, intended for “mathematically sophisticated first-time student of statistics”. Although sophistication is in the eye of the sophist, of course, as this book has margin symbols in the guise of integrals to warn of section using “differential or integral calculus” and a remark that the book is still accessible without calculus… (Integral calculus as in Riemann integrals, not Lebesgue integrals, mind you!) It even includes appendices with the Greek alphabet, summation notations, and exponential/logarithms.

“In statistics we often bypass the probability model altogether and simply specify the random variable directly. In fact, there is a result (that we won’t cover in detail) that tells us that, for any random variable, we can find an appropriate probability model.” (p.17)

Given its limited mathematical requirements, the book does not get very far in the probabilistic background of statistics methods, which makes the corresponding chapter not particularly helpful as opposed to a prerequisite on probability basics. Since not much can be proven without “all that complicated stuff about for any ε>0” (p.29). And makes defining correctly notions like the Central Limit Theorem impossible. For instance, Chebychev’s inequality comes within a list of admitted results. There is no major mistake in the chapter, even though mentioning that two correlated Normal variables are jointly Normal (p.27) is inexact.

“The power of a test is the probability that you do not reject a null that is in fact correct.” (p.120)

Most of the book follows the same pattern as other textbooks at that level, covering inference on a mean and a probability, confidence intervals, hypothesis testing, p-values, and linear regression. With some words of caution about the interpretation of p-values. (And the unfortunate inversion of the interpretation of power above.) Even mentioning the Cult [of Significance] I reviewed a while ago.

Given all that, the final chapter comes as a surprise, being about Bayesian inference! Which should make me rejoice, obviously, but I remain skeptical of introducing the concept to readers with so little mathematical background. And hence a very shaky understanding of a notion like conditional distributions. (Which reminds me of repeated occurrences on X validated when newcomers hope to bypass textbooks and courses to grasp the meaning of posteriors and such. Like when asking why Bayes Theorem does not apply for expectations.) I can feel the enthusiasm of the author for this perspective and it may diffuse to some readers, but apart from being aware of the approach, I wonder how much they carry away from this brief (decent) exposure. The chapter borrows from Lee (2012, 4th edition) and from Berger (1985) for the decision-theoretic part. The limitations of the exercise are shown for hypothesis testing (or comparison) by the need to restrict the parameter space to two possible values. And for decision making. Similarly, introducing improper priors and the likelihood principle [distinguished there from the law of likelihood] is likely to get over the head of most readers and clashes with the level of the previous chapters. (And I do not think this is the most efficient way to argue in favour of a Bayesian approach to the problem of statistical inference: I have now dropped all references to the likelihood principle from my lectures. Not because of the controversy, but simply because the students do not get it.) By the end of the chapter, it is unclear a neophyte would be able to spell out how one could specify a prior for one of the problems processed in the earlier chapters. The appendix on de Finetti’s formalism on personal probabilities is very much unlikely to help in this regard. While it sounds so far beyond the level of the remainder of the book.

## SMC on a sequence of increasing dimension targets

Posted in Statistics with tags , , , , , , , , , on February 15, 2017 by xi'an

Richard Everitt and co-authors have arXived a preliminary version of a paper entitled Sequential Bayesian inference for mixture models and the coalescent using sequential Monte Carlo samplers with transformations. The central notion is an SMC version of the Carlin & Chib (1995) completion in the comparison of models in different dimensions. Namely to create auxiliary variables for each model in such a way that the dimension of the completed models are all the same. (Reversible jump MCMC à la Peter Green (1995) can also be interpreted this way, even though only relevant bits of the completion are used in the transitions.) I find the paper and the topic most interesting if only because it relates to earlier papers of us on population Monte Carlo. It also brought to my awareness the paper by Karagiannis and Andrieu (2013) on annealed reversible jump MCMC that I had missed at the time it appeared. The current paper exploits this annealed expansion in the devising of the moves. (Sequential Monte Carlo on a sequence of models with increasing dimension has been studied in the past.)

The way the SMC is described in the paper, namely, reweight-subsample-move, does not strike me as the most efficient as I would try to instead move-reweight-subsample, using a relevant move that incorporate the new model and hence enhance the chances of not rejecting.

One central application of the paper is mixture models with an unknown number of components. The SMC approach applied to this problem means creating a new component at each iteration t and moving the existing particles after adding the parameters of the new component. Since using the prior for this new part is unlikely to be at all efficient, a split move as in Richardson and Green (1997) can be considered, which brings back the dreaded Jacobian of RJMCMC into the picture! Here comes an interesting caveat of the method, namely that the split move forces a choice of the split component of the mixture. However, this does not appear as a strong difficulty, solved in the paper by auxiliary [index] variables, but possibly better solved by a mixture representation of the proposal, as in our PMC [population Monte Carlo] papers. Which also develop a family of SMC algorithms, incidentally. We found there that using a mixture representation of the proposal achieves a provable variance reduction.

“This puts a requirement on TSMC that the single transition it makes must be successful.”

As pointed by the authors, the transformation SMC they develop faces the drawback that a given model is only explored once in the algorithm, when moving to the next model. On principle, there would be nothing wrong in including regret steps, retracing earlier models in the light of the current one, since each step is an importance sampling step valid on its own right. But SMC also offers a natural albeit potentially high-varianced approximation to the marginal likelihood, which is quite appealing when comparing with an MCMC outcome. However, it would have been nice to see a comparison with alternative estimates of the marginal in the case of mixtures of distributions. I also wonder at the comparative performances of a dual approach that would be sequential in the number of observations as well, as in Chopin (2004) or our first population Monte Carlo paper (Cappé et al., 2005), since subsamples lead to tempered versions of the target and hence facilitate moves between models, being associated with flatter likelihoods.

## ABC in Les Diablerets

Posted in Statistics with tags , , , , , , , , , , on February 14, 2017 by xi'an

Since I could not download the slides of my ABC course in Les Diablerets in one go, I broke them by chapters as follows. (Warning: there is very little novelty in those slides, except for the final part on consistency.)

Although I did not do it on purpose (!), starting with indirect inference and other methods inspired from econometrics induced some discussion in the first hour of the course with econometricians in the room. Including Elvezio Ronchetti.

I also regretted piling too much material in the alphabet soup, as it was too widespread for a new audience. And as I could not keep the coherence of the earlier parts by going thru so many papers at once. Especially since I was a bit knackered after a day of skiing….

I managed to get to the final convergence chapter on the last day, even though I had to skip some of the earlier material. Which should be reorganised anyway as the parts between model choice with random forests and inference with random forests are not fully connected!

## a knapsack riddle?

Posted in Books, pictures, R, Statistics, Travel with tags , , , , , , on February 13, 2017 by xi'an

The [then current now past] riddle of the week is a sort of multiarmed bandits optimisation. Of sorts. Or rather a generalised knapsack problem. The question is about optimising the allocation of 100 undistinguishable units to 10 distinct boxes against a similarly endowed adversary, when the loss function is

$L(x,y)=(x_1>y_1)-(x_1y_{10})-(x_{10}

and the distribution q of the adversary is unknown. As usual (!), the phrasing of the riddle is somewhat ambiguous but I am under the impression that the game is played sequentially, hence that one can learn about the distribution of the adversary, at least when assuming this adversary keeps the same distribution q at all times. Continue reading

## and it only gets worse…

Posted in Statistics with tags , , , , , on February 12, 2017 by xi'an

“The Trump administration has dropped the federal government’s challenge to a nationwide injunction issued last year that blocked the fulfillment of Obama administration guidelines stating that transgender students’ access to bathrooms and other gender-segregated school facilities was protected under existing federal civil rights law.” NYT, Feb 11, 2017

“President Trump vowed on Thursday to overturn a law restricting political speech by tax-exempt churches, a potentially huge victory for the religious right and a gesture to evangelicals, a voting bloc he attracted to his campaign by promising to free up their pulpits.” NYT, Feb  2, 2017