Archive for multi-armed bandits

bandits for stratified Monte Carlo

Posted in Books, pictures, Statistics, University life on December 12, 2021 by xi'an

In our Monte Carlo reading group in Dauphine, we recently went through Carpentier, Munos and Antos' Adaptive strategy for stratified Monte Carlo sampling, a 2015 JMLR paper. Which, given K strata and their corresponding weights, aims at producing an efficient estimate of the weighted mean. This problem can be re-expressed as a K-armed bandit problem, where choosing an arm amounts to running the optimal number of simulations per arm (stratum). The ideal solution takes a number of simulations proportional to the weight × standard deviation of the associated stratum. The proposed algorithm estimates this quantity on the go by always pulling the arm (stratum) with the largest (over-)estimated value. The rather unusual perspective of the paper is to bring out precise, deterministic, and finite-sample bounds on the errors between the optimal allocations (to the arms) and their sequential approximations, the weighted MSE regrets, and the overall regret. Albeit bounds crucially depending on the sub-Gaussianity rates c¹ and c² of the arm distributions for the construction of the shrinkage coefficient β… (Obviously, I am not deeply knowledgeable in bandits so may miss some of the more recent literature.) An extension of interest would be to estimate the weights as well, for instance when the masses of the strata are unknown. Or even deciding on the strata themselves. We also all wondered at a possible link with Wang-Landau, but the desiderata sounded divergent.
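The allocation rule can be sketched in a few lines. Below is a minimal Python illustration (my own toy version, not the authors' code) of the principle: maintain a per-stratum estimate of w&#8342;σ&#8342;, inflate it with an exploration bonus, and always sample the stratum with the largest over-estimate. The strata, weights, and the generic square-root-log bonus are all toy assumptions, standing in for the paper's sub-Gaussian construction of β.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy setting: K Gaussian strata with known weights, unknown std deviations
K = 4
w = np.array([0.4, 0.3, 0.2, 0.1])    # strata weights (sum to one)
mus = np.array([0.0, 1.0, 2.0, 3.0])  # per-stratum means (unknown to the algorithm)
sds = np.array([3.0, 1.0, 0.5, 0.1])  # per-stratum std devs (unknown to the algorithm)

def pull(k):
    """Draw one sample from stratum k."""
    return rng.normal(mus[k], sds[k])

T = 5000
# initialise with two pulls per arm so that a sample variance exists
samples = [[pull(k), pull(k)] for k in range(K)]
for t in range(2 * K, T):
    # over-estimate w_k * sigma_k with a UCB-style bonus (a generic
    # sqrt-log term here, in lieu of the paper's sub-Gaussian constants)
    ucb = np.array([w[k] * (np.std(samples[k], ddof=1) +
                            np.sqrt(2 * np.log(T) / len(samples[k])))
                    for k in range(K)])
    k = int(np.argmax(ucb))          # pull the arm with the largest over-estimate
    samples[k].append(pull(k))

# stratified estimate of the weighted mean, and realised allocation
est = sum(w[k] * np.mean(samples[k]) for k in range(K))
alloc = np.array([len(s) for s in samples]) / T
```

On this toy example the allocation concentrates on the first stratum, whose w&#8342;σ&#8342; product dominates, matching the ideal proportional rule.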

 

reXing the bridge

Posted in Books, pictures, Statistics on April 27, 2021 by xi'an

As I was re-reading Xiao-Li Meng's and Wing Hung Wong's 1996 bridge sampling paper in Statistica Sinica, I realised they were making the link with Geyer's (1994) mythical tech report, in the sense that the iterative construction of α functions “converges to the `reverse logistic regression’ described in Geyer (1994) for the two-density cases” (p.839). Although they also saw the latter as an “iterative” application of Torrie and Valleau’s (1977) “umbrella sampling” estimator. And cited Bennett (1976) in the Journal of Computational Physics [for which Elsevier still asks for $39.95!] as the originator of the formula [check (6)] and of its optimal solution [check (8)]. Bennett (1976) also mentions that the method fares poorly when the targets do not overlap:

“When the two ensembles neither overlap nor satisfy the above smoothness condition, an accurate estimate of the free energy cannot be made without gathering additional MC data from one or more intermediate ensembles”

in which case this sequence of intermediate targets could be constructed and, who knows?!, optimised. (This may be the chain solution discussed in the conclusion of the paper.) Another optimisation not considered in enough detail is the allocation of the computing time to the two densities, maybe using a bandit strategy to avoid estimating the variance of the importance weights first.
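As a reminder of how simple the optimal bridge estimator is in practice, here is a hedged Python sketch of Meng and Wong's fixed-point iteration on a toy pair of unnormalised Gaussian densities where the true ratio of normalising constants is known to be ½. The densities, sample sizes, and number of iterations are illustrative choices of mine, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

# toy pair: q1 is an unnormalised N(0,1), q2 an unnormalised N(0,2^2),
# hence the true ratio of normalising constants is Z1/Z2 = 1/2
q1 = lambda x: np.exp(-0.5 * x**2)
q2 = lambda x: np.exp(-x**2 / 8.0)

n1 = n2 = 20_000
x1 = rng.normal(0.0, 1.0, n1)        # draws from the first target
x2 = rng.normal(0.0, 2.0, n2)        # draws from the second target
s1, s2 = n1 / (n1 + n2), n2 / (n1 + n2)

l1 = q1(x1) / q2(x1)                 # likelihood ratios at the first sample
l2 = q1(x2) / q2(x2)                 # ... and at the second sample

r = 1.0                              # initial guess for Z1/Z2
for _ in range(50):                  # fixed-point iteration for the optimal bridge
    num = np.mean(l2 / (s1 * l2 + s2 * r))
    den = np.mean(1.0 / (s1 * l1 + s2 * r))
    r = num / den
```

With these two well-overlapping targets the iteration converges in a handful of steps; Bennett's warning above is about the opposite regime, when the ratios l&#8321;, l&#8322; degenerate.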

bandits for doubly intractable posteriors

Posted in Statistics on April 17, 2019 by xi'an

Last Friday, Guanyang Wang arXived a paper on the use of multi-armed bandits (hence the reference to the three bandits) to handle intractable normalising constants. The bandit compares or mixes Møller et al. (2006) auxiliary variable solution with Murray et al. (2006) exchange algorithm. Which are both special cases of pseudo-marginal MCMC algorithms. In both cases, the auxiliary variables produce an unbiased estimator of the ratio of the constants. Rather than the ratio of two unbiased estimators as in the more standard pseudo-marginal MCMC. The current paper tries to compare the two approaches based on the variance of the ratio estimate, but cannot derive a general ordering. The multi-armed bandit algorithm exploits both estimators of the acceptance ratio to pick the one that is almost the largest, almost because there is a correction for validating the step by detailed balance. The bandit acceptance probability is the maximum [over the methods] of the minimum [over the time directions] of the original acceptance ratio. While this appears to be valid, note that the resulting algorithm implies four times as many auxiliary variates as the original ones, which makes me wonder at the gain when compared with a parallel implementation of these methods, coupled at random times. (The fundamental difficulty of simulating from likelihoods with an unknown normalising constant remains, see p.4.)

Le Monde puzzle [#1014]

Posted in Books, Kids on June 30, 2017 by xi'an


One of those Le Monde puzzles where a computational solution seems out of reach: given 2N batteries out of which N are charged, how many tests of pairs of batteries are necessary to find a working pair? I first tried a brute-force recursive resolution checking all (2N-1)N orders of pairs over all (2N)! permutations of 0…01…1, but the magnitude of the exploration is such that the program failed to conclude for N=3! After realising that N+4 is an upper bound on the number of attempts (since testing all N disjoint pairs until success or exhaustion leaves 4 attempts to find the two operative batteries within a couple of tested pairs), I managed to cap the depth of the recursion in the following R code:

library(combinat)
N=4
# all (2N)! orderings of N dead (0) and N charged (1) batteries
permz=permn(rep(c(0,1),N))
seqz=matrix(0,factorial(2*N),2*N)
for (i in 1:factorial(2*N)) seqz[i,]=permz[[i]]

optest=function(earl,scorz,N){
  # earl: encoded pairs already tested; scorz: indicator, for each
  # battery ordering, that a working pair has already been found
  best=N+4 # upper bound on the number of tests needed
  if (length(earl)>N+3) return(best)
  for (i in 1:(2*N-1))
    for (j in ((i+1):(2*N))){
      prez=i+2*N*j # unique encoding of the pair (i,j)
      if (!(prez%in%earl)){
        # testing (i,j) succeeds when both batteries are charged
        scorzz=apply(cbind(scorz,seqz[,i]*seqz[,j]),1,max)
        if (min(scorzz)==1){ # every ordering is now resolved
          best=length(earl)+1
          break()
        }else{
          best=min(best,optest(c(earl,prez),scorzz,N))}}
      if (best==6) break() # prune once the known N=3 optimum is reached
    }
  return(best)}

optest(earl=c(1+2*N*2,1+2*N*3), # start with pairs (1,2) and (1,3) tested
       scorz=apply(matrix(as.vector(seqz[,c(1,1)])*
                          as.vector(seqz[,c(2,3)]),ncol=2),1,max),
       N=N)

which returned a solution for N=3 (6 tests are enough, testing all combinations of the first three batteries then all combinations of the last three) but is still running for N=4! (I wonder at a possible resolution by modelling this puzzle through a very special type of multi-armed bandit problem with (pairwise) aggregated outcomes, but I did not go any further in that direction.)

analysing statistical and computational trade-off of estimation procedures

Posted in Books, pictures, Statistics, University life on July 8, 2015 by xi'an


“The collection of estimates may be determined by questions such as: How much storage is available? Can all the data be kept in memory or only a subset? How much processing power is available? Are there parallel or distributed systems that can be exploited?”

Daniel Sussman, Alexander Volfovsky, and Edoardo Airoldi from Harvard wrote a very interesting paper about setting a balance between statistical efficiency and computational efficiency, a theme that resonates with our recent work on ABC and older considerations about the efficiency of Monte Carlo algorithms. While the paper avoids drifting towards computer science, even with a notion like algorithmic complexity, I like the introduction of a loss function in the comparison game, even though the way to combine both dimensions is unclear. And may limit the exercise to an intellectual game. In an ideal setting one would set the computational time, like “I have one hour to get this estimate”, and compare risks under that computing constraint. Possibly dumping some observations from the sample to satisfy the constraint. Ideally. Which is why this also reminds me of ABC: given an intractable likelihood, one starts by throwing away some data precision by using a tolerance ε and usually more through an insufficient statistic. Hence ABC procedures could also be compared in such terms.

In the current paper, the authors only compare schemes of breaking the sample into bits to handle each observation only once. Meaning it cannot be used in both the empirical mean and the empirical variance. This sounds a bit contrived in that the optimum allocation depends on the value of the parameter the procedure attempts to estimate. Still, it could lead to a new form of bandit problems: given a bandit with as many arms as there are parameters, at each new observation, decide on the allocation towards minimising the overall risk. (There is a missing sentence at the end of Section 4.)
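To make the single-use constraint concrete, here is a small Python simulation (my own toy illustration, not the authors' setup): each observation in a Gaussian sample is allocated either to the empirical mean or to the empirical variance, and the combined squared-error risk is estimated by Monte Carlo for a few splits. As noted above, the best split depends on the unknown (μ, σ).

```python
import numpy as np

rng = np.random.default_rng(2)

def split_risk(frac, n=200, mu=1.0, sigma=2.0, reps=2000):
    """Monte Carlo estimate of the combined risk (squared error on mu plus
    squared error on sigma^2) when a fraction `frac` of the n observations
    is devoted to the mean and the rest to the variance, each used once."""
    n_mu = max(2, int(frac * n))
    risks = []
    for _ in range(reps):
        x = rng.normal(mu, sigma, n)
        mu_hat = x[:n_mu].mean()        # first block: mean estimate only
        var_hat = x[n_mu:].var(ddof=1)  # second block: variance estimate only
        risks.append((mu_hat - mu)**2 + (var_hat - sigma**2)**2)
    return float(np.mean(risks))

# compare a few allocations of the sample between the two estimators
risks = {f: split_risk(f) for f in (0.2, 0.5, 0.8)}
```

Since the variance of the empirical variance scales with σ⁴, the variance block deserves most of the sample here, and a different (μ, σ) would shift the optimum, which is the contrived aspect mentioned above.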

Any direction for turning those considerations into a practical decision machine would be fantastic, although the difficulties are formidable, from deciding between estimators and selecting a class of estimators, to computing costs and risks depending on unknown parameters.
