## you said that I said that you said that…

Posted in Books, Kids, pictures, Statistics on May 25, 2021 by xi'an

A riddle from The Riddler on limited-information decision making, which I thought I had failed to understand:

Two players, Martina and Olivia, are each secretly given independent Uniform(0,1) realisations, m and u. Starting with Martina, they must state to the other player whom they think probably has the greater number until they agree. They are playing as a team, hoping to maximize the chances they correctly predict who has the greater number. For a given round, what is the probability that the person they agree on really does have the bigger number?

A logical strategy is as follows: If m>.5, P(U>m)<.5, hence Martina should state her number is probably bigger, which conveys to Olivia that M>.5. If u<.5, Olivia can agree for certain; else, if u>.75, P(M>u)<.5 and she can state that hers is probably the larger number, while if .5<u<.75, Olivia can state (truthfully) that her number is probably smaller, although there is a ¼ probability she is wrong, averaged over that range (and up to ½ as u approaches .75). As detailed in the solution, the probability of finishing on a false statement is ¼²+¼³+…, equal to 1/12.
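As a sanity check on the 1/12 figure, here is a minimal Monte Carlo sketch in Python of the midpoint strategy described above. The function name `play`, the dictionary bookkeeping, and the cap `max_turns` are my own illustrative choices, not part of the Riddler solution; the code only assumes that both numbers are independent Uniform(0,1) draws and that each statement publicly halves the range where the speaker's number can lie.

```python
import random


def play(m, u, max_turns=200):
    """Return the agreed name ('M' or 'O') under the midpoint strategy.

    Hypothetical helper: each player claims her own number is bigger
    when it exceeds the midpoint of the interval publicly known to
    contain the other player's number.
    """
    values = {"M": m, "O": u}
    bounds = {"M": [0.0, 1.0], "O": [0.0, 1.0]}  # public knowledge
    speaker, other = "M", "O"  # Martina starts
    previous = None
    for _ in range(max_turns):
        lo, hi = bounds[other]
        cut = 0.5 * (lo + hi)
        if values[speaker] > cut:
            claim = speaker
            # the statement reveals that the speaker's number exceeds cut
            bounds[speaker][0] = max(bounds[speaker][0], cut)
        else:
            claim = other
            # the statement reveals that the speaker's number is below cut
            bounds[speaker][1] = min(bounds[speaker][1], cut)
        if claim == previous:  # both players named the same person
            return claim
        previous = claim
        speaker, other = other, speaker
    return previous  # safety net, practically never reached


rng = random.Random(2021)
n_games = 200_000
errors = 0
for _ in range(n_games):
    m, u = rng.random(), rng.random()
    truth = "M" if m > u else "O"
    errors += play(m, u) != truth
error_rate = errors / n_games
print(f"estimated error rate: {error_rate:.4f} (1/12 = {1 / 12:.4f})")
```

With this many games the estimate should land within a few thousandths of 1/12, which matches the geometric series ¼²+¼³+… term by term: an agreement reached at the k-th reply is wrong with overall probability ¼ to the power k+1.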

## the fundamental incompatibility of HMC and data subsampling

Posted in Books, Statistics, University life on February 23, 2015 by xi'an

Last week, Michael Betancourt, from Warwick, arXived a neat wee note on the fundamental difficulties in running HMC on a subsample of the original data. The core message is that using only a fraction of the data to run an HMC, with the hope that it will preserve the stationary distribution, does not work. The only way to recover from the bias is to use a Metropolis-Hastings step using the whole data, a step that both kills most of the computing gain and has very low acceptance probabilities. Even the strategy that subsamples for each step in a single trajectory fails: there cannot be a significant gain in time without a significant bias in the outcome. Too bad..! Now, there are ways of accelerating HMC, for instance by parallelising the computation of gradients but, just as in any other approach (?), the information provided by the whole data is only available when looking at the whole data.
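To get a feel for the bias, here is a toy Python sketch of my own, not an experiment from Betancourt's note. It assumes a Normal mean model, y_i ~ N(θ,1) with a flat prior, so that the exact posterior variance is 1/n, and it runs leapfrog trajectories without any Metropolis-Hastings correction, once with the full-data gradient and once with a fresh minibatch at every gradient evaluation. The function `leapfrog_chain` and all tuning values (step size, trajectory length, batch size) are hypothetical choices made for illustration.

```python
import random
import statistics


def leapfrog_chain(y, n_iter, eps, n_steps, batch_size, rng):
    """Uncorrected HMC for y_i ~ N(theta, 1) under a flat prior.

    The potential is U(theta) = 0.5 * sum((y_i - theta)^2), whose
    gradient n * (theta - mean(y)) is estimated from a minibatch
    when batch_size < len(y).
    """
    n = len(y)
    full_mean = statistics.fmean(y)

    def grad(theta):
        if batch_size >= n:
            return n * (theta - full_mean)
        batch = rng.sample(y, batch_size)  # fresh subsample, no replacement
        return n * (theta - statistics.fmean(batch))

    theta = full_mean
    draws = []
    for _ in range(n_iter):
        p = rng.gauss(0.0, 1.0)  # momentum refreshment
        p -= 0.5 * eps * grad(theta)  # initial half step
        for step in range(n_steps):
            theta += eps * p
            scale = 0.5 if step == n_steps - 1 else 1.0  # final half step
            p -= scale * eps * grad(theta)
        draws.append(theta)  # no accept/reject step: proposal always kept
    return draws


rng = random.Random(1)
y = [rng.gauss(0.0, 1.0) for _ in range(1000)]
full = leapfrog_chain(y, 2000, 0.01, 5, len(y), rng)
sub = leapfrog_chain(y, 2000, 0.01, 5, 10, rng)
var_full = statistics.variance(full)
var_sub = statistics.variance(sub)
print(f"exact posterior variance: {1 / len(y):.5f}")
print(f"full-data gradient:       {var_full:.5f}")
print(f"subsampled gradient:      {var_sub:.5f}")
```

In this toy setting the full-data chain stays close to the exact variance, up to a small discretisation error since there is no correction step, while the subsampled chain is far more dispersed because each noisy gradient kicks the momentum by much more than its nominal scale. Removing that inflation would take an accept/reject step based on the whole data, which is exactly the cost subsampling was meant to avoid.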