## Archive for multi-armed bandits

## bandits for doubly intractable posteriors

**L**ast Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm, both of which are special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants, rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm involves four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)
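Schematically, and with the caveat that this is my own reading of the max-min rule rather than pseudo-code from the paper, the acceptance step could look as follows, each method contributing a ratio estimate in both time directions (hence the fourfold auxiliary simulation):

```python
def bandit_acceptance(ratios):
    """ratios maps each method name to its (forward, backward) estimates of
    the Metropolis acceptance ratio; the rule takes the minimum over the two
    time directions within each method (the detailed-balance correction),
    then the maximum over methods, capped at one. Method names and the exact
    pairing are my placeholders, not the paper's notation."""
    return min(1.0, max(min(fwd, bwd) for fwd, bwd in ratios.values()))
```

For instance, with made-up estimates `{"auxiliary": (0.8, 1.2), "exchange": (0.5, 2.0)}`, the rule retains 0.8, the larger of the two direction-wise minima.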

Posted in Statistics with tags coupling, coupling from the past, detailed balance, doubly intractable posterior, Les Trois Brigands, multi-armed bandits, normalising constant, pseudo-marginal MCMC, ratio of integrals on April 17, 2019 by xi'an

## Le Monde puzzle [#1014]

Posted in Books, Kids with tags competition, Le Monde, mathematical puzzle, multi-armed bandits on June 30, 2017 by xi'an

**O**ne of those Le Monde puzzles where a computational solution seems out of reach: given 2N batteries out of which N are charged, how many tests of pairs of batteries are necessary to find a working pair? I first tried a brute force recursive resolution checking all (2N-1)N orders of pairs over all (2N)! permutations of 0…01…1, but the magnitude of the exploration is such that the program failed to conclude for N=3! After realising that N+4 is an upper bound on the number of attempts (since testing all N disjoint pairs until success or exhaustion leaves 4 attempts to find the two operative batteries within a couple of tested pairs), I managed to cap the depth of the recursion in the following R code:

```r
library(combinat)
N=4
permz=permn(rep(c(0,1),N))
seqz=matrix(0,factorial(2*N),2*N)
for (i in 1:factorial(2*N)) seqz[i,]=permz[[i]]
optest=function(earl,scorz,N){ #remaining pairs
  best=N+4
  if (length(earl)>N+3) return(best)
  for (i in 1:(2*N-1))
   for (j in ((i+1):(2*N))){
    prez=i+2*N*j
    if (!(prez%in%earl)){
     scorzz=apply(cbind(scorz,seqz[,i]*seqz[,j]),1,max)
     if (min(scorzz)==1){
      best=length(earl)+1
      break()
     }else{
      best=min(best,optest(c(earl,prez),scorzz,N))}}
    if (best==6) break()
   }
  return(best)}
optest(earl=c(1+2*N*2,1+2*N*3),
 scorz=apply(matrix(as.vector(seqz[,c(1,1)])*
  as.vector(seqz[,c(2,3)]),ncol=2),1,max),N=N)
```

which returned a solution for N=3 (6 tests are enough, testing all combinations of the first three batteries then all combinations of the last three) but is still running for N=4! (I wonder at a possible resolution by modelling this puzzle through a very special type of multi-armed bandit problem with (pairwise) aggregated outcomes, but I did not go any further in that direction.)
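For N=3 the strategy above can be checked exhaustively in a few lines — a quick Python verification of mine, not part of the puzzle's solution: by pigeonhole, one of the two triples of batteries must contain at least two of the three charged ones, so one of the 6 tested pairs always works.

```python
from itertools import combinations

# the 6 tests: every pair within the first three batteries,
# then every pair within the last three
tests = list(combinations(range(3), 2)) + list(combinations(range(3, 6), 2))

# exhaustive check over all placements of the 3 charged batteries among 6:
# some tested pair must consist of two charged batteries
for charged in combinations(range(6), 3):
    assert any(i in charged and j in charged for i, j in tests)
```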

## analysing statistical and computational trade-off of estimation procedures

Posted in Books, pictures, Statistics, University life with tags ABC, computing cost, efficiency, Harvard University, insufficiency, limited resources, multi-armed bandits on July 8, 2015 by xi'an

“The collection of estimates may be determined by questions such as: How much storage is available? Can all the data be kept in memory or only a subset? How much processing power is available? Are there parallel or distributed systems that can be exploited?”

**D**aniel Sussman, Alexander Volfovsky, and Edoardo Airoldi from Harvard wrote a very interesting paper about setting a balance between statistical efficiency and computational efficiency, a theme that resonates with our recent work on ABC and older considerations about the efficiency of Monte Carlo algorithms. While the paper avoids drifting towards computer science, even with a notion like *algorithmic complexity*, I like the introduction of a loss function in the comparison game, even though the way to combine both dimensions is unclear and may limit the exercise to an intellectual game. In an ideal setting one would set the computational time, like *“I have one hour to get this estimate”*, and compare risks under that computing constraint, possibly dumping some observations from the sample to satisfy it. Which is why this also reminds me of ABC: given an intractable likelihood, one starts by throwing away some data precision by using a tolerance ε and usually more through an insufficient statistic. Hence ABC procedures could also be compared in such terms.

In the current paper, the authors only compare schemes that break the sample into bits so that each observation is handled only once, meaning it cannot be used in both the empirical mean and the empirical variance. This sounds a bit contrived in that the optimum allocation depends on the value of the parameter the procedure attempts to estimate. Still, it could lead to a new form of bandit problems: given a bandit with as many arms as there are parameters, at each new observation, decide on the allocation towards minimising the overall risk. (There is a missing sentence at the end of Section 4.)
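To make the speculation concrete, here is a toy Python sketch of such an allocation rule — entirely my own invention, not the authors' scheme, with crude 1/√n standard-error proxies standing in for the unknown, parameter-dependent risks:

```python
import random

def allocate(stream):
    """Greedy one-pass split of observations between a 'mean arm' and a
    'variance arm'; each observation is used exactly once. The 1/sqrt(n)
    standard-error proxies are invented for this sketch and ignore the
    parameter dependence noted above. Assumes len(stream) >= 2."""
    mean_obs, var_obs = [], []
    for x in stream:
        se_mean = 1.0 / (len(mean_obs) + 1) ** 0.5  # proxy risk, mean arm
        se_var = 1.0 / (len(var_obs) + 1) ** 0.5    # proxy risk, variance arm
        (mean_obs if se_mean >= se_var else var_obs).append(x)
    mu = sum(mean_obs) / len(mean_obs)
    s2 = sum((x - mu) ** 2 for x in var_obs) / len(var_obs)
    return mu, s2

random.seed(1)
mu, s2 = allocate([random.gauss(0.0, 1.0) for _ in range(2000)])
```

With the symmetric proxies above the rule simply alternates, but any genuine risk estimate (itself depending on the unknown parameters) could be plugged in instead.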

Any direction for turning those considerations into a practical decision machine would be fantastic, although the difficulties are formidable, from deciding between estimators and selecting a class of estimators, to computing costs and risks depending on unknown parameters.

## importance weighting without importance weights [ABC for bandits?!]

Posted in Books, Statistics, University life with tags ABC, machine learning, multi-armed bandits, tolerance, Zurich on March 27, 2015 by xi'an

**I** did not read very far in the recent arXival by Neu and Bartók, but I got the impression that it was a version of ABC for bandit problems where the probabilities behind the bandit arms are not available but can be generated, since the stopping rule found in “Recurrence weighting for multi-armed bandits” is the generation of an arm equal to the learner's draw (p.5). Since there is no tolerance there, the method is exact (“unbiased”). As no reference is made to the ABC literature, this may be after all a mere analogy…
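As I understand the trick (a sketch of the principle, not the authors' code): to attach the importance weight 1/p to the learner's draw without ever evaluating p, keep regenerating arms from the same mechanism until one matches the draw; the number of generations needed is Geometric(p), whose expectation is exactly 1/p.

```python
import random

def recurrence_weight(draw_arm, a):
    """Count draws from draw_arm() until one equals a: the count is
    Geometric(p) with p = P(draw_arm() == a), so its expectation is 1/p,
    an unbiased importance weight obtained without evaluating p."""
    count = 1
    while draw_arm() != a:
        count += 1
    return count

# demo: four equiprobable arms, so the weight of arm 0 should average to 4
rng = random.Random(7)
est = sum(recurrence_weight(lambda: rng.randrange(4), 0)
          for _ in range(20000)) / 20000
```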

## bandits at CREST

Posted in Statistics, University life with tags ABC, bandits, Jeffreys priors, multi-armed bandits, sequential decision theory, Thompson sampling on November 5, 2013 by xi'an

**T**oday, Emilie Kaufmann (Telecom ParisTech) gave a talk at BiP, the Bayes in Paris seminar at CREST, on multi-armed bandits. I always liked those problems that mix probability, learning, sequential statistics, decision theory and design (exploration versus exploitation). This is a well-known problem with a large literature but, nonetheless, I still find talks on the topic entertaining and mind-challenging. Esp. as I had missed the recent arXiv paper by Aurélien Garivier and Olivier Cappé.

**H**ere the goal is to define Bayesian strategies that do well from a frequentist viewpoint. *(This reminded me of something I read or heard the other day about people acting more Bayesianly against people than against computers, where they get down to random (and poor) strategies.)* Emilie mentioned a computing issue with the Gittins finite horizon setting, which happens to be a dynamic programming issue. This “justifies” moving to the asymptotic version of the game by Lai and Robbins (1985). The classical algorithmic resolution uses UCB (upper confidence bounds) on the probabilities of reward and picks the bandit arm with the highest bound each time, which involves some degree of approximation.
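As a point of comparison, here is a minimal sketch of the textbook UCB1 rule for Bernoulli arms (the classical version, not Garivier and Cappé's refinement):

```python
import math
import random

def ucb1(arms, horizon, rng=None):
    """Pull the arm maximising empirical mean + sqrt(2 log t / n_i);
    arms is a list of Bernoulli success rates, unknown to the player."""
    rng = rng or random.Random(42)
    counts = [0] * len(arms)
    sums = [0.0] * len(arms)
    for t in range(1, horizon + 1):
        if t <= len(arms):          # initialisation: pull each arm once
            a = t - 1
        else:
            a = max(range(len(arms)),
                    key=lambda i: sums[i] / counts[i]
                    + math.sqrt(2 * math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < arms[a] else 0.0
        counts[a] += 1
        sums[a] += reward
    return counts

counts = ucb1([0.2, 0.5, 0.8], 5000)   # the 0.8 arm should dominate
```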

**I**n the Bayesian case, the reference is the Thompson (1933) sampling algorithm, which picks an arm at random according to its posterior probability of being optimal, or something similar. Emilie and co-authors managed to extend the sampling to generic exponential families, showing that the Jeffreys prior leads to an asymptotically optimal solution. I am actually surprised at seeing the Jeffreys prior in this framework, given the need to predict what is happening on arms not yet or hardly visited, a need that seems to prohibit using an improper prior. Maybe all this does not matter for asymptotics…
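In the Bernoulli case the Jeffreys prior is the (proper) Beta(1/2, 1/2), and Thompson sampling reduces to a few lines — a generic sketch of the principle, not Emilie's implementation:

```python
import random

def thompson(arms, horizon, rng=None):
    """Sample each arm's success rate from its Beta posterior under the
    Jeffreys Beta(1/2, 1/2) prior, play the argmax, update with the
    observed Bernoulli reward."""
    rng = rng or random.Random(0)
    succ = [0] * len(arms)
    fail = [0] * len(arms)
    for _ in range(horizon):
        samples = [rng.betavariate(0.5 + s, 0.5 + f)
                   for s, f in zip(succ, fail)]
        a = max(range(len(arms)), key=samples.__getitem__)
        if rng.random() < arms[a]:
            succ[a] += 1
        else:
            fail[a] += 1
    return succ, fail

succ, fail = thompson([0.2, 0.8], 3000)  # the 0.8 arm should be pulled most
```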

**M**y only voiced question was that I did not see why the computing tools for the finite horizon problem are not more developed. Maybe ABC could be used for bandits??? Although it seems from the talk that Thompson sampling does very well for finite horizons.