Archive for sequential decision theory

the Kelly criterion and then some

Posted in R, Statistics on August 26, 2022 by xi'an

The Kelly criterion is a way to optimise an unlimited sequence of bets under the following circumstances: a probability p of winning each bet, a loss of a fraction a of the sum bet, a gain of a fraction b of the sum bet, and a fraction f of the current fortune as the sum bet. Then

f^*=\dfrac{p}{a}-\dfrac{1-p}{b}

is the fraction optimising the expected log-growth

\mathbb{E}\left[\log\left\{(X_n/X_0)^{1/n}\right\}\right]
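
For completeness, the optimal fraction follows from maximising the expected one-step log-return

g(f) = p\log(1+fb)+(1-p)\log(1-fa),

whose derivative cancels at

\dfrac{pb}{1+fb}=\dfrac{(1-p)a}{1-fa} \quad\Longleftrightarrow\quad f=\dfrac{p}{a}-\dfrac{1-p}{b}.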

Here is a rendering of the empirical probability of reaching 250 before ruin, when starting with a fortune of 100, when a=1, p=0.3, and f and b vary (on a small grid), with Kelly's solutions marked on top, meaning that they achieve a high probability of avoiding ruin. Until they cannot.
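
As a minimal sketch of this kind of simulation (the grid, the ruin threshold of one unit, and the number of replications are illustrative choices rather than the original code):

reach <- function(f, b, p = 0.3, a = 1, start = 100, goal = 250, M = 1e3) {
  # empirical probability of reaching goal before (near-)ruin, betting a
  # fraction f of the current fortune, gaining b*bet w.p. p, losing a*bet otherwise
  hits <- 0
  for (m in 1:M) {
    x <- start
    while (x >= 1 && x < goal) {   # declare ruin below one unit of fortune
      bet <- f * x
      x <- x + ifelse(runif(1) < p, b * bet, -a * bet)
    }
    hits <- hits + (x >= goal)
  }
  hits / M
}

fs <- seq(0.05, 0.95, by = 0.05)   # betting fractions
bs <- seq(0.5, 10, by = 0.5)       # gain multipliers
probs <- outer(fs, bs, Vectorize(function(f, b) reach(f, b)))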

The Riddler is asking for a variant of this betting scheme, where the probability p of winning the bet is proportional to 1/(1+b), namely 0.9/(1+b). In that case, the probabilities of reaching 250 before ruin (using the same R code as above) are approximated as follows, with a maximal probability that does not exceed 0.36, the probability of winning in one go when betting the whole fortune of 100 with a goal of 250. It thus may be that the optimal choice, probability-wise, is that one. Note that in that case, whatever the value of b, the Kelly criterion returns a negative fraction. Actually, the solution posted by the Riddler the week after is slightly above, 0.3686 or 1−(3/5)^{9/10}, which I never reached by the sequential bet of a fixed fraction of the current fortune, esp. when not accounting for the fact that aiming at 250 rather than a smaller target was removing a 0.9 factor.
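
Since the variant only changes the success probability, the above sketch can be reused with p replaced by 0.9/(1+b) (again an illustration rather than the original code):

probs2 <- outer(fs, bs, Vectorize(function(f, b) reach(f, b, p = .9 / (1 + b))))
max(probs2)  # remains below 0.36 = 0.9/(1 + 1.5), the single-bet value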

bandits at CREST

Posted in Statistics, University life on November 5, 2013 by xi'an

Today, Emilie Kaufmann (Telecom ParisTech) gave a talk at BiP, the Bayes in Paris seminar at CREST, on multiarmed bandits. I always liked those problems that mix probability, learning, sequential statistics, decision theory and design (exploration versus exploitation). This is a well-known problem with a large literature but, nonetheless, I still find talks on the topic entertaining and mind-challenging. Esp. as I had missed the recent arXiv paper by Aurélien Garivier and Olivier Cappé.

Here the goal is to define Bayesian strategies that do well from a frequentist viewpoint. (This reminded me of something I read or heard the other day about people acting more Bayesianly against people than against computers, where they get down to random (and poor) strategies.) Emilie mentioned a computing issue with the Gittins finite horizon setting, which happens to be a dynamic programming issue. This “justifies” moving to the asymptotic version of the game by Lai and Robbins (1985). The classical algorithmic resolution uses UCB (upper confidence bounds) on the probabilities of reward and picks the bandit arm with the highest bound each time, which involves some degree of approximation.
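
For concreteness, here is a bare-bones R version of UCB1 for Bernoulli rewards (a textbook sketch, not the variant discussed in the talk; the arm probabilities below are arbitrary):

ucb1 <- function(p_true, horizon = 1e3) {
  # play each arm once, then pull the arm maximising mean + sqrt(2*log(t)/pulls)
  K <- length(p_true)
  pulls <- rewards <- rep(0, K)
  for (t in 1:horizon) {
    arm <- if (t <= K) t else which.max(rewards / pulls + sqrt(2 * log(t) / pulls))
    r <- rbinom(1, 1, p_true[arm])
    pulls[arm] <- pulls[arm] + 1
    rewards[arm] <- rewards[arm] + r
  }
  pulls  # allocation of pulls across the arms
}

ucb1(c(0.3, 0.5, 0.7))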

In the Bayesian case, the reference is the Thompson (1933) sampling algorithm, which picks a branch at random according to the estimated proportions or something similar. Emilie and co-authors managed to extend the sampling to generic exponential families, showing that the Jeffreys prior leads to an asymptotically optimal solution. I am actually surprised at seeing the Jeffreys prior in this framework, given the need to predict what is happening on branches not yet or hardly visited, which seems to prohibit using an improper prior. Maybe all this does not matter for asymptotics…
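
In the Bernoulli case, Thompson sampling under the Jeffreys Be(1/2,1/2) prior amounts to one posterior draw per arm followed by an argmax (a minimal sketch of the generic idea, not the exponential-family extension of the paper):

thompson <- function(p_true, horizon = 1e3) {
  K <- length(p_true)
  succ <- fail <- rep(0, K)
  for (t in 1:horizon) {
    theta <- rbeta(K, succ + 0.5, fail + 0.5)  # one draw from each Be(s+1/2, f+1/2) posterior
    arm <- which.max(theta)
    r <- rbinom(1, 1, p_true[arm])
    succ[arm] <- succ[arm] + r
    fail[arm] <- fail[arm] + 1 - r
  }
  succ + fail  # pulls per arm
}

thompson(c(0.3, 0.5, 0.7))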

My only voiced question was why the computing for the finite horizon problem is not more developed. Maybe ABC could be used for bandits??? Although it seems from the talk that Thompson sampling does very well for finite horizons.
