Archive for the Statistics Category

Le Monde puzzle [#1157]

Posted in Books, Kids, R on October 1, 2020 by xi'an

The weekly puzzle from Le Monde is an empty (?) challenge:

Kimmernaq and Aputsiaq play a game where Kimmernaq picks ten different integers between 1 and 100, and Aputsiaq must find a partition of these integers into two groups with identical sums. Who is winning?

Indeed, if the two groups have equal sums, the total of the ten integers is twice that common sum, hence even. Any choice of integers with an odd total (e.g., 1, 2, …, 10, summing to 55) is thus a sure win for Kimmernaq. End of the lame game (if I understood its wording correctly!). If some integers can be left out of the two groups, however, either outcome seems possible: calling the R code

P=1;M=1e3                 #M doubles as sampling range and attempt cap
while (P<M){              #exits only if one draw resists M attempts
 a=sample(1:M,10);P=Q=0   #ten distinct integers in 1:M
 while((P<M)&(!Q)){
  t=sample(1:7,1)         #subset of size 10-t, i.e., between 3 and 9
  o=1                     #odd dummy sum to enter the next loop
  while(!!(s<-sum(o))%%2)o=sample(a,10-t) #resample until the subset sum is even
  Q=max(2*cumsum(b<-sample(o))==s) #does a prefix of a random permutation sum to s/2?
  P=P+1}}

I found no solution (i.e., no exit from the outer while loop) for M not too large…  So Aputsiaq is apparently winning. Le Monde's solution considers the 2¹⁰-1=1023 sums of non-empty subsets of the 10 integers, which cannot exceed 91+92+⋯+100=955: at least two of these sums must therefore be equal, and removing the terms common to both subsets yields two disjoint groups with identical sums.
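A quick numerical check of this pigeonhole argument (a sketch of mine, not part of the published solution):

a=sample(1:100,10)                    #ten distinct integers in 1:100
e=expand.grid(rep(list(0:1),10))[-1,] #the 2¹⁰-1=1023 non-empty subsets
s=apply(e,1,function(x)sum(a*x))      #their sums, all at most 955
c(max(s)<=955,anyDuplicated(s)>0)     #hence two sums must coincide

When considering the second half of the question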

What if Kimmernaq picks 6 distinct integers between 1 and 40, and Aputsiaq must find a partition of these integers into two groups with identical sums. Who is winning?

recycling the above R code produced draws that systematically hit the upper bound M, even for much larger values of M. So Kimmernaq must have a means to pick 6 integers such that no subgroup can be broken into two parts with identical sums. One such outcome being

> a
[1] 36 38 30 18  1 22

one can check that all the possible sums differ:

aa=a           #the six singleton sums
for(i in 2:5){ #subset sizes 2 to 5
 bb=NULL       #random subsets until all C(6,i) sums appear (loops forever if two coincide!)
 while(length(bb)<choose(6,i))bb=unique(c(bb,sum(sample(a,i))))
 aa=c(aa,bb)}
unique(aa)     #62 distinct values iff all proper subset sums differ

and the outcome is indeed of length 2⁶-2=62!
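Alternatively (a deterministic sketch of mine, using the same a), all proper subset sums can be enumerated at once with combn, avoiding the potential infinite loop of the above when two same-size subsets happen to share a sum:

ss=unlist(lapply(1:5,function(i)combn(a,i,sum))) #all proper non-empty subset sums
length(unique(ss))==2^6-2                        #TRUE iff the 62 sums all differ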

As an aside, a strange [to me at least] R “mistake” occurred when recycling the variable F in a code-golfing spirit, on the ground that it equals zero (FALSE) by default, rather than defining a new variable Q:

while((P<M)&(!F)){
  ...
  F=max(2*cumsum(b<-sample(o))==s)
  P=P+1}

the counter P was not getting updated! Indeed, F is only FALSE until assigned: once set to 1 by a successful partition, it keeps this value at the next round of the outer loop (where, unlike Q, it is not reset to zero), hence the inner loop is never entered again and P freezes.
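A minimal illustration of the trap (my own sketch):

F   # [1] FALSE, the default binding
F=1 # a first assignment overrides this default for good
!F  # [1] FALSE from now on, so a loop guarded by !F is no longer entered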

a case for Bayesian deep learning

Posted in Books, pictures, Statistics, Travel, University life on September 30, 2020 by xi'an

Andrew Wilson wrote a piece about Bayesian deep learning last winter. Which I just read. It starts with the (posterior) predictive distribution being the core of Bayesian model evaluation or of model (epistemic) uncertainty.

“On the other hand, a flat prior may have a major effect on marginalization.”

Interesting sentence, as, from my viewpoint, using a flat prior is a no-no when running model evaluation, since the marginal likelihood (or evidence) is then no longer a probability density. (Check the Lindley-Jeffreys paradox in this tribune.) The author then argues in favour of a Bayesian approach to deep neural networks on the ground that the data cannot be informative on every parameter in the network, which should then be integrated out wrt a prior. He also draws a parallel between deep ensemble learning, where random initialisations produce different fits, and posterior distributions, although the equivalent of the prior distribution in an optimisation exercise remains somewhat vague.

“…we do not need samples from a posterior, or even a faithful approximation to the posterior. We need to evaluate the posterior in places that will make the greatest contributions to the [posterior predictive].”

The paper also contains an interesting point distinguishing between priors over parameters and priors over functions, only the latter mattering for prediction. Which must be structured enough to compensate for the lack of data information about most aspects of the functions. The paper further discusses uninformative priors (over the parameters) in the O'Bayes sense as a default way to select priors. It is however unclear to me how this discussion accounts for the problems met in high dimensions by standard uninformative solutions. More aggressively penalising priors may be needed, such as those found in high-dimensional variable selection. As in, e.g., the 10⁷-dimensional space mentioned in the paper. Interesting read all in all!

one World ABC seminar [term #2]

Posted in Statistics on September 29, 2020 by xi'an

The One World ABC seminar continues on-line this semester! With talks every other Thursday at 11:30 UK time (12:30 central European time). Incoming speakers are

with presenters to be confirmed for 29 October. Anyone interested in presenting at this webinar in the near future should not hesitate to contact Massimiliano Tamborrino in Warwick or any of the other organisers of the seminar!

banned books week

Posted in Statistics on September 28, 2020 by xi'an

Le Monde puzzle [#1155]

Posted in Books, Kids, R on September 26, 2020 by xi'an

The weekly puzzle from Le Monde is another two-player game challenge:

Anahera and Wiremu play a game for T rounds. They successively pick a digit between 1 and 3, never repeating the previous one, and sum these digits. The last to play wins if the sum is a multiple of 3. Who is the winner for an optimal strategy?

By simple dynamic programming of the optimal strategy at each step

N=3
A=matrix(-1,20,N)
A[1,1:N]=1:N
for (T in 2:20)
  for (i in 1:N) A[T,i]=i+ifelse(!T%%2, #parity check
    max((i+A[T-1,-i])%%3),              #avoid zero
    min((i+A[T-1,-i])%%3))              #seek zero

the first to play can always win the game. Not fun!
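As a cross-check (my own sketch, under my reading of the rules, namely that the last mover wins when the final sum is a multiple of 3 and loses otherwise), an exhaustive game-tree search returns, for each horizon T, whether the first player can force a win:

wins=function(last,total,left){
  for(d in setdiff(1:3,last)){        #digits allowed, never repeating last
    if(left==1){                      #current mover makes the last move
      if(!(total+d)%%3)return(TRUE)   #and wins on a multiple of 3
    }else if(!wins(d,total+d,left-1)) #move d leaves the opponent losing
      return(TRUE)
  }
  FALSE}                              #no winning move for the current mover
sapply(2:10,function(T)wins(0,0,T))   #can the first mover force a win?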