Archive for tree

you said that I said that you said that…

Posted in Books, Kids, pictures, Statistics on May 25, 2021 by xi'an

A riddle from The Riddler on limited-information decision making, which I thought I had failed to understand:

Two players, Martina and Olivia, are each secretly given independent uniform realisations on (0,1), m and u. Starting with Martina, they must state to the other player who they think probably has the greater number, until they agree. They are playing as a team, hoping to maximise the chances they correctly predict who has the greater number. For a given round, what is the probability that the person they agree on really does have the bigger number?

A logical strategy is as follows: if m>.5, P(U>m)<.5, hence Martina should state that her number is probably bigger, which conveys to Olivia that m>.5. If u<.5, Olivia can agree for certain; else, if u>.75, P(M>u)<.5 and she can state that her number is probably larger, while if .5<u<.75, Olivia can state (truthfully) that her number is probably smaller, although there is a probability of up to ½ that she is wrong. As detailed in the solution, the probability of finishing on a false statement is ¼²+¼³+…, equal to 1/12.
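
As a quick sanity check, here is a Monte Carlo sketch in R of the bisection strategy described above (my own reconstruction, not The Riddler's official solution, and assuming independent Uniform(0,1) draws): every "mine is probably bigger" claim raises, by halving, the threshold the other player must clear, and the game stops as soon as a player falls below the current threshold and agrees.

# Monte Carlo check of the 1/12 error probability, assuming the bisection
# strategy above (reconstruction, not official Riddler code)
play_once <- function() {
  m <- runif(1); u <- runif(1)
  truth <- if (m > u) "M" else "O"
  # the m<.5 case is the mirror image of the m>.5 case: reflect both numbers,
  # remembering that the identity of the larger number then flips
  flipped <- (m < 0.5)
  if (flipped) { m <- 1 - m; u <- 1 - u }
  vals <- c(M = m, O = u)
  lo <- 0.5         # Martina's opening claim conveys m > 1/2
  claimant <- "M"   # player currently claimed to hold the bigger number
  speaker <- "O"
  repeat {
    thr <- (lo + 1) / 2
    if (vals[speaker] > thr) {  # counter-claim: "mine is probably bigger"
      lo <- thr
      claimant <- speaker
      speaker <- if (speaker == "M") "O" else "M"
    } else break                # agree with the standing claim
  }
  agreed <- if (flipped) setdiff(c("M", "O"), claimant) else claimant
  agreed != truth               # TRUE when the team agrees on the wrong player
}
set.seed(101)
mean(replicate(1e5, play_once()))  # close to 1/12 ≈ 0.083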

Take thrice thy money, bid me tear the bond

Posted in Books, Kids on April 21, 2021 by xi'an

A rather fun riddle for which the pen&paper approach proved easier than coding in R, for once. It goes as follows: starting with one euro, one sequentially predicts a sequence of four random bits, betting an arbitrary fraction of one's current money on each bit. When the prediction is right, the stake is returned doubled; otherwise it is lost. Given the information that the same bit cannot occur thrice in a row, what is the optimal minimax gain?

Three simplifications: (i) each bet is a fraction ε of the player's current fortune, so that the final fortune is a product of factors (1±ε) over the successive bets; (ii) when the next bit is uncertain, i.e., could be either 0 or 1, this fraction ε can be taken signed in (-1,1), the sign indicating which bit is backed; (iii) when the outcome is known, i.e., when the two previous bits are identical, the optimal choice is ε=1. The final fortune of the player is thus of the form

(1±ε)(1±ε’)(1±ε”)(1±ε”’)

if the first three bits alternate (e.g., 0101 or 0100), while it is of the form

2(1±ε)(1±ε’)(1±ε”)

if there are two identical successive bits among the first three results (e.g., 1101 or 0110), since the following bit is then known and bet upon with ε=1. When choosing each of the fractions ε, the minimum final gain must be maximised. This implies that ε=0 for the bet on the final bit when its outcome is uncertain, since the worst of the two factors 1±ε is 1-|ε|, largest at ε=0 (and ε=1 otherwise). In case of an alternating début, like 01, the minimal gain is

min{(1±ε)(1±ε’)(1+ε”),2(1±ε)(1±ε’)(1-ε”)}

which is maximised by ε”=1/3, equating both terms and taking the objective value 4(1±ε)(1±ε’)/3. This leads to the worst-case gain after the first bit being

min{4(1±ε)(1+ε’)/3,2(1±ε)(1-ε’)}

which is maximised by ε’=1/5, for the objective value 8(1±ε)/5. By symmetry, the optimal choice for the opening bet is ε=0, which ends up with a guaranteed final fortune of 8/5, that is, a minimax gain of 3/5. [The quote is from Shakespeare's The Merchant of Venice.]
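
For a cross-check, here is a short backward-induction sketch in R (mine, not part of the original pen&paper solution): the state is the pair of most recent bits, the bettor picks a signed exposure a in (-1,1) to the event "next bit is one", and the optimal hedge equates the two worst-case continuations, yielding the factor 2V0V1/(V0+V1).

# backward-induction check that the guaranteed final fortune is 8/5, a gain of 3/5
# (my own verification sketch under the simplifications (i)-(iii) above)
best_factor <- function(round, last2, n = 4) {
  if (round > n) return(1)
  admissible <- c(0, 1)   # a bit is forbidden when it would be the third in a row
  if (length(last2) == 2 && last2[1] == last2[2])
    admissible <- setdiff(admissible, last2[1])
  cont <- sapply(admissible, function(b)
    best_factor(round + 1, c(tail(last2, 1), b), n))
  if (length(admissible) == 1)
    return(2 * cont)      # outcome known: bet everything, i.e., ε = 1
  V0 <- cont[admissible == 0]; V1 <- cont[admissible == 1]
  2 * V0 * V1 / (V0 + V1) # optimal exposure a = (V0 - V1)/(V0 + V1)
}
best_factor(1, integer(0))  # 1.6 = 8/5, hence a minimax gain of 3/5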

off to BayesComp 20, Gainesville

Posted in pictures, Statistics, Travel, University life on January 7, 2020 by xi'an

little people of Tofino [jatp]

Posted in Kids, pictures, Running, Travel on August 19, 2018 by xi'an

Bayesian regression trees [seminar]

Posted in pictures, Statistics, University life on January 26, 2018 by xi'an
During her visit to Paris, Veronika Ročková (Chicago Booth) will give a talk at ENSAE-CREST, on the Saclay Plateau, at 2pm. Here is the abstract:
Posterior Concentration for Bayesian Regression Trees and Ensembles
(joint with Stephanie van der Pas)

Since their inception in the 1980s, regression trees have been one of the more widely used non-parametric prediction methods. Tree-structured methods yield a histogram reconstruction of the regression surface, where the bins correspond to terminal nodes of recursive partitioning. Trees are powerful, yet susceptible to overfitting. Strategies against overfitting have traditionally relied on pruning greedily grown trees. The Bayesian framework offers an alternative remedy against overfitting through priors. Roughly speaking, a good prior charges smaller trees, where overfitting does not occur. While the consistency of random histograms, trees and their ensembles has been studied quite extensively, the theoretical understanding of the Bayesian counterparts has been missing. In this paper, we take a step towards understanding why/when Bayesian trees and their ensembles do not overfit. To address this question, we study the speed at which the posterior concentrates around the true smooth regression function. We propose a spike-and-tree variant of the popular Bayesian CART prior and establish new theoretical results showing that regression trees (and their ensembles) (a) are capable of recovering smooth regression surfaces, achieving optimal rates up to a log factor, (b) can adapt to the unknown level of smoothness, and (c) can perform effective dimension reduction when p>n. These results provide a piece of missing theoretical evidence explaining why Bayesian trees (and additive variants thereof) have worked so well in practice.
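
As a toy illustration of the histogram-reconstruction point in the opening lines of the abstract, here is a minimal (and deliberately non-Bayesian) R example fitting a single rpart regression tree to a smooth function, the prediction being piecewise constant over the bins defined by the terminal nodes; this sketch is mine and not taken from the paper.

library(rpart)
set.seed(1)
# a smooth regression function observed with noise
n <- 500
x <- runif(n)
y <- sin(2 * pi * x) + rnorm(n, sd = 0.3)
dat <- data.frame(x = x, y = y)
# a single regression tree: a histogram-type reconstruction, constant on the
# bins given by the terminal nodes of the recursive partitioning
fit <- rpart(y ~ x, data = dat)
grid <- data.frame(x = seq(0, 1, length.out = 200))
plot(x, y, col = "grey", pch = 20)
lines(grid$x, predict(fit, newdata = grid), type = "s", lwd = 2)  # step-function fit
lines(grid$x, sin(2 * pi * grid$x), lty = 2)                      # true smooth curve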