bandits for stratified Monte Carlo
In our Monte Carlo reading group in Dauphine, we recently went through Carpentier, Munos and Antos’ Adaptive strategy for stratified Monte Carlo sampling, a 2015 JMLR paper. Which given K strata and corresponding weights on the different strata aims at producing an efficient estimate of the weighted mean. This problem can be re-expressed as a K armed bandit problem, where choosing an arm is driven by running the optimal number of simulation per arm (stratum). The ideal solution takes a number of simulations proportional to the weight x standard deviation of the associated stratum. The proposed algorithm estimates this quantity on the go by always pulling the arm (stratum) with the largest (over-)estimated value. The rather unusual perspective of the paper is to bring out precise, deterministic, and finite-sample bounds on the errors between the optimal allocations (to the arms) and their sequential approximations, the weighted MSE regrets, and the overall regret. Albeit crucially depending on the sub-Gaussianity rates c¹ and c² of the arm distributions for the construction of the shrinkage coefficient β… (Obviously, I am not deeply knowledgeable in bandits so may miss some of the more recent literature.) An extension of interest would be to estimate the weights as well, for instance when the masses of the strata are unknown. Or even deciding on the strata themselves. We also all wondered at a possible link with Wang-Landau, but the desiderata sounded divergent.
Leave a Reply