## Bayesian decision riddle

Posted in Books, Kids, Statistics with tags , , , , on June 15, 2017 by xi'an

The current puzzle on The Riddler is a version of the secretary problem with an interesting (?) Bayesian solution.

Given four positive numbers x¹, x², x³, x⁴, observed sequentially, the associated utility is the value of x at the stopping time. What is the optimal stopping rule?

While nothing is mentioned about the distribution of the x’s, I made the assumption that they were iid and uniformly distributed over (0,M), with M unknown and tried a Bayesian resolution with the non-informative prior π(M)=1/M. And failed. The reason for this failure is that the expected utility is infinite at the first step: while the posterior expected utility is finite with three and two observations, meaning I can compare stopping and continuing at the second and third steps, the predicted expected reward for continuing after observing x¹ does not exist because the expected value of max(x¹,x²) given x¹ does not exist. As the predictive density of x² is max(x¹,x²)⁻²…  Several alternatives are possible to bypass this impossible resolution, from changing the utility function to picking another reference prior.

For instance, using a prior like π(M)=1/M² l(and the same monetary return utility) leads to a proper optimal solution, namely

1. always wait for the second observation x²
2. stop at x² if x²>11x¹/12, else wait for x³
3. stop at x³ if x³>23 max(x¹,x²)/24, else observe x⁴

obtained analytically on a bar table in Rouen (and checked numerically later).

Another approach is to try to optimise the probability to pick the largest amount of the four x’s, but this is not leading to an interesting solution, since it corresponds to picking the first maximum after x¹, while picking the largest among remaining ones leads to a somewhat convoluted solution I have no patience to produce here! Plus this is not a really pertinent loss function as it does not discriminate enough against waiting…

## a secretary problem with maximum ability

Posted in Kids, R with tags , , , , on April 28, 2017 by xi'an

The Riddler of today has a secretary problem, where one measures sequentially N random variables until one deems the current variable to be the largest of the whole sample. The classical secretary problem has a counter-intuitive solution where one first measures N/e random variables without taking any decision and then and only then picks the first next outcome larger than the largest in the first group. (For large values of N.) The added information in the current riddle is that the distribution of those iid random variables is set to be uniform on {1,…,M}, which begs for a modification in the algorithm. As for instance when observing M on the current draw.

The approach I devised is certainly suboptimal, as I decided to pick the currently observed value if the (conditional) probability it is the largest is larger than the probability subsequent draws. This translates into the following R code:

M=100 #maximum value
N=10  #total number of draws
hyprob=function(m){
# m is sequence of draws so far
n=length(m);mmax=max(m)
if ((m[n]<mmax)||(mmax-n<N-n)){prob=0
}else{
prob=prod(sort((1:mmax)[-m],dec=TRUE)
[1:(N-n)]/((M-n):(M-N+1))}
return(prob)}

decision=function(draz=sample(1:M,N)){
i=0
keepgoin=TRUE
while ((keepgoin)&(i<N)){
i=i+1
keepgoin=(hyprob(draz[1:i])<0.5)}
return(c(i,draz[i],(draz[i]<max(draz))))}


which produces a winning rate of around 62% when N=10 and M=100, hence much better than the expected performances of the (asymptotic) secretary algorithm, with a winning frequency of 1/e. (For N=10 and M=100, the winning frequency is only 27%.)

## another riddle with a stopping rule

Posted in Books, Kids, R with tags , , , on May 27, 2016 by xi'an

A puzzle on The Riddler last week that is rather similar to an earlier one. Given the probability (1/2,1/3,1/6) on {1,2,3}, what is the mean of the number N of draws to see all possible outcomes and what is the average number of 1’s in those draws? The second question is straightforward, as the proportions of 1’s, 2’s and 3’s in the sequence till all values are observed remain 3/6, 2/6 and 1/6. The first question follows from the representation of the average

$\mathbb{E}[N]=\sum_{n=3}^\infty \mathbb{P}(N>n) + 3$

as the probability to exceed n is the probability that at least one value is not observed by the n-th draw, namely

3+(1/2)n+(2/3)n+(5/6)n-(1/6)n-(1/3)n-(1/2)n

which leads to an easy summation for the expectation, namely

3+(2/3)³/(1/3)+(5/6)³/(1/6)-(1/3)³/(2/3)-(1/6)³/(5/6)=73/10

## stopping rule impact

Posted in Books, R, Statistics, University life with tags , , , on May 9, 2014 by xi'an

Here is a question from my friend Shravan Vasishth about the consequences of using a stopping rule:

Psycholinguists and psychologists often adopt the following type of data-gathering procedure: The experimenter gathers n data points, then checks for significance (p<0.05 or not). If it’s not significant, he gets more data (n more data points). Since time and money are limited, he might decide to stop anyway at sample size, say, some multiple of n.  One can play with different scenarios here. A typical n might be 10 or 15.

This approach would give us a distribution of t-values and p-values under repeated sampling. Theoretically, under the standard assumptions of frequentist methods, we expect a Type I error to be 0.05. This is the case in standard analyses (I also track the t-statistic, in order to compare it with my stopping rule code below).

Here’s a simulation showing what happens. I wanted to ask you whether this simulation makes sense. I assume here that the experimenter gathers 10 data points, then checks for significance (p<0.05 or not). If it’s not significant, he gets more data (10 more data points). Since time and money are limited, he might decide to stop anyway at sample size 60. This gives us p-values under repeated sampling. Theoretically, under the standard assumptions of frequentist methods, we expect a Type I error to be 0.05. This is the case in standard analyses:
##Standard:
pvals<-NULL
tstat_standard<-NULL
n<-10 # sample size
nsim<-1000 # number of simulations
stddev<-1 # standard dev
mn<-0 ## mean

for(i in 1:nsim){
samp<-rnorm(n,mean=mn,sd=stddev)
pvals[i]<-t.test(samp)$p.value tstat_standard[i]<-t.test(samp)$statistic}

## Type I error rate: about 5% as theory says:
table(pvals<0.05)[2]/nsim


But the situation quickly deteriorates as soon as adopt the strategy I outlined above:

pvals<-NULL
tstat<-NULL
## how many subjects can I run?
upper_bound<-n*6

for(i in 1:nsim){
## at the outset we have no significant result:
significant<-FALSE
## null hyp is going to be true,
## so any rejection is a mistake.
## take sample:
x<-rnorm(n,mean=mn,sd=stddev)
while(!significant & length(x)<upper_bound){
## if not significant:
if(t.test(x)$p.value>0.05){ ## get more data: x<-append(x,rnorm(n,mean=mn,sd=stddev)) ## otherwise stop: } else {significant<-TRUE}} ## will be either significant or not: pvals[i]<-t.test(x)$p.value
tstat[i]<-t.test(x)\$statistic}


Now let’s compare the distribution of the t-statistic in the standard case vs with the above stopping rule. We get fatter tails with the above stopping rule, as shown by the histogram below.

Is this a correct way to think about the stopping rule problem?

To which I replied the following:

By adopting a stopping rule on a random iid sequence, you favour values in the sequence that agree with your stopping condition, hence modify the distribution of the outcome. To take an extreme example, if you draw N(0,1) variates until the empirical average is between -2 and 2, the average thus produced cannot remain N(0,1/n) but have a different distribution.

The t-test statistic you build from your experiment is no longer distributed as a uniform variate because of the stopping rule: the sample(x1,…,x10m) (with random size 10m [resulting from increases in the sample size by adding 10 more observations at a time] is distributed from

$\prod_{i=1}^{10m} \phi(x_i) \times \prod_{j=1}^{m-1} \mathbb{I}_{t(x_1,\ldots,x_{10j})>.05} \times \mathbb{I}_{t(x_1,\ldots,x_{10m})<.05}$

if 10m<60 [assuming the maximal acceptable sample size is 60] and from

$\prod_{i=1}^{60} \phi(x_i) \times \prod_{j=1}^{5} \mathbb{I}_{t(x_1,\ldots,x_{10j})>.05}$

otherwise. The histogram at the top of this post is the empirical distribution of the average of those observations, clearly far from a normal distribution.