## Le Monde puzzle [#843]

Posted in Books, Kids, R on December 7, 2013 by xi'an

A Le Monde mathematical puzzle of moderate difficulty:

How many binary quintuplets (a,b,c,d,e) can be found such that any pair of quintuplets differs by at least two digits?

I solved it with the following R code, which iteratively eliminates the quintuplets that are not different enough from the ones already kept. It does so for a random order of the 2⁵ quintuplets, because the order matters for the resulting number (the intToBits trick was provided by an answer on StackExchange/stackoverflow):

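```r
sol=0
for (t in 1:10^4){ #random permutations
  # the 32 binary quintuplets as columns, in a random order
  perm=sapply(0:31,function(x){
    as.integer(intToBits(x))})[1:5,sample(1:32)]
  V=32;inin=rep(TRUE,V);J=1
  while (J<V){
    # flag later quintuplets differing from the J-th one by fewer than two digits
    for (i in (J+1):V)
      if (sum(perm[,J]!=perm[,i])<2)
        inin[i]=FALSE
    perm=perm[,inin,drop=FALSE];V=sum(inin);inin=rep(TRUE,V)
    J=J+1}
  # keep the largest set found so far
  if (sol<V){
    sol=V;levote=perm}
}
```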

which returns solutions like

```
> sol
[1] 11
> levote
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
[1,]    0    0    0    0    1    1    1    1    0     1     0
[2,]    0    1    0    1    0    1    0    1    0     1     1
[3,]    0    1    1    0    1    0    1    1    1     0     0
[4,]    0    1    1    1    0    0    0    0    0     1     0
[5,]    0    0    0    0    0    0    1    0    0     0     0
```
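A quick way to check that a set of quintuplets stored as the columns of a 0/1 matrix, such as `levote`, does satisfy the constraint is to compute the smallest number of differing digits over all pairs of columns. The sketch below is not part of the original post and the function name `min_digit_difference` is hypothetical; it relies on the fact that, for 0/1 vectors, the Manhattan distance computed by `dist` is exactly the number of differing digits.

```r
# Hypothetical helper (not from the post): smallest number of differing digits
# between any two columns of a 0/1 matrix, each column being one quintuplet
min_digit_difference <- function(code) {
  # dist() compares rows, hence the transpose; for 0/1 entries the
  # Manhattan distance counts the positions where two quintuplets differ
  min(dist(t(code), method = "manhattan"))
}
```

A candidate set is admissible when `min_digit_difference(levote) >= 2`.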

In the same Science leaflet, Marco Zito had yet another column worth bloggin’ about (or against), under the title “Voyage au bout du bruit” [Journey to the end of noise] (with no apologies to Céline!), where he blathers about (background) noise ["bruit"] versus signal without ever mentioning statistics. I will not repeat the earlier feat of translating the column, but he also includes an interesting piece of trivia: in the old TV sets of my childhood, the “snow” seen in the absence of a transmitted signal is due in part to the cosmic microwave background (CMB)!

## Le Monde puzzle [#842]

Posted in Books, Kids, R on November 30, 2013 by xi'an

An easily phrased (and solved?) Le Monde mathematical puzzle that does not [really] require R code:

The five triplets A,B,C,D,E are such that

$A_1=B_2+C_2+D_2+E_2\,,\ B_1=A_2+C_2+D_2+E_2\,,...$

and

$A_2=B_3+C_3+D_3+E_3\,,\ B_2=A_3+C_3+D_3+E_3\,,...$

Given that

$A_1=193\,,\ B_1=175\,,\ C_1=185\,,\ D_1=187\,,\ E_1<175$

find the five triplets.

Adding up both sets of equations shows that everything depends solely upon E1… So running R code that checks all possible values of E1 is a brute-force solution. However, one must first find what to check. Given that the sums of the first, second, and third components of the triplets are of the form (16s,4s,s), the possible choices for E1 are necessarily restricted to

```
> S0=193+187+185+175
> ceiling(S0/16)
[1] 47
> floor((S0+175)/16)
[1] 57
> (47:57)*16-S0 #E1=S1-S0
[1]  12  28  44  60  76  92 108 124 140 156 172
```
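The (16s,4s,s) form comes from summing each set of five equations. Writing $S_j=A_j+B_j+C_j+D_j+E_j$ for the sum of the $j$-th components, each equation of the first set reads $X_1=S_2-X_2$ and each equation of the second set reads $X_2=S_3-X_3$, for $X=A,\ldots,E$, so that

$$S_1=5S_2-S_2=4S_2\,,\qquad S_2=5S_3-S_3=4S_3\,,\qquad\text{hence}\qquad (S_1,S_2,S_3)=(16s,4s,s)\,,\ s=S_3\,.$$

With $S_0=A_1+B_1+C_1+D_1=740$ and, assuming non-negative entries, $0\le E_1<175$, the constraint $16s=S_0+E_1$ leaves $47\le s\le 57$, which is the range enumerated in the R lines above.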

The first two values correspond to a second sum S2 equal to 188 and 192, respectively, which is incompatible with A1 being 193, since A1=S2-A2 cannot exceed S2. For the remaining values, E2 and E3 are then given by E2=S2-E1 and E3=S3-E2:

```
> S2=(49:57)*4
> E1=(49:57)*16-S0
> E2=S2-E1
> S3=S2/4
> S3-E2 #E3
[1] -103  -90  -77  -64  -51  -38  -25  -12    1
```

Since E3 cannot be negative, this excludes all values but E1=172. No brute force in the end…
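Even though brute force turns out to be unnecessary, a direct scan over every value of E1 below 175 gives a quick sanity check. This is a minimal sketch, not the post's own code: it assumes all entries are non-negative integers (the same assumption behind the exclusions above), and the function name `solve_triplets` is hypothetical.

```r
# Hypothetical brute-force check: scan E1 = 0,...,E1max and keep the
# configurations with non-negative second and third components
solve_triplets <- function(first = c(A = 193, B = 175, C = 185, D = 187),
                           E1max = 174) {
  sols <- list()
  for (E1 in 0:E1max) {
    S1 <- sum(first) + E1
    if (S1 %% 16 != 0) next          # the sums must be of the form (16s, 4s, s)
    S2 <- S1 / 4
    S3 <- S2 / 4
    x1 <- c(first, E = E1)
    x2 <- S2 - x1                    # X1 = S2 - X2 for each triplet X
    x3 <- S3 - x2                    # X2 = S3 - X3 for each triplet X
    if (all(c(x2, x3) >= 0))         # non-negativity is an assumption
      sols[[length(sols) + 1]] <- rbind(x1, x2, x3)
  }
  sols
}
sols <- solve_triplets()
sols
```

Each column of a returned matrix is one triplet. The scan returns a single solution, with E1=172 and the triplets A=(193,35,22), B=(175,53,4), C=(185,43,14), D=(187,41,16) and E=(172,56,1).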

## Unusual timing shows how random mass murder can be (or even less)

Posted in Books, R, Statistics, Travel on November 29, 2013 by xi'an

This post follows up on the original one about the USA Today headline I read during my flight to Toronto last month. I remind you that the unusual pattern was about observing four U.S. mass murders happening within four days, “for the first time in at least seven years”, which means that the largest difference between the four dates is at most 3 days, not 4!

I asked my friend Anirban Das Gupta from Purdue University about the exact value of this probability and the first thing he pointed out was that I had used a different meaning of “within 4”. He then went into an elaborate calculation to find an upper bound on this probability, an upper bound that was way above both my Monte Carlo approximation and my rough calculation of last post. I rechecked my R code and found it was not achieving the right approximation, since the event is that one date is within 3 days of at least three other dates… I thus rewrote the following R code

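```r
T=10^6                        # number of simulated years
four=rep(0,T)
for (t in 1:T){
  day=sort(sample(1:365,30,replace=TRUE)) #30 random days
  day=c(day,day[day>362]-365) #account for toric difference: wrap the last 3 days
  tem=outer(day,day,"-")      # tem[i,j]=day[i]-day[j]
  # does some date close a 4-day window containing at least 4 dates?
  four[t]=any(rowSums((tem>-1)&(tem<4))>3)
}
mean(four)
```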

[having checked it was ok for two dates within 1 day, which returns the birthday problem probability] and found 0.070214, which is much larger than the earlier value and means it takes on average 14 years (1/0.070214≈14.2) for the “unlikely” event to happen! And the chance that it happens at least once within seven years is about 40%.
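The 14 years and 40% figures follow from treating successive years as independent, each with the same probability of such a cluster, so that the waiting time is geometric. A minimal sketch of that arithmetic, where the independence across years is an assumption of the sketch, together with the exact birthday-problem probability for 30 dates that the sanity check should recover:

```r
p_hat <- 0.070214                     # Monte Carlo estimate of the yearly probability
# assumes independent years sharing the same probability p_hat
mean_wait <- 1 / p_hat                # mean of a geometric waiting time, in years
p_seven <- 1 - (1 - p_hat)^7          # chance of at least one occurrence in seven years
# exact probability that at least two of 30 uniform dates fall on the same day
p_birthday <- 1 - prod((365:336) / 365)
round(c(mean_wait = mean_wait, p_seven = p_seven, p_birthday = p_birthday), 3)
```

This gives roughly 14.2 years, 0.40, and 0.706; R's own `pbirthday(30)` returns the same birthday value.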

Another coincidence relates to this evaluation, namely the fact that two elderly couples in France committed suicide together within three days of each other last week. However, I could not find figures for the number of couple suicides per year. Maybe because it is extremely rare. Or undetected…

## Importance sampling schemes for evidence approximation in mixture models

Posted in R, Statistics, University life on November 27, 2013 by xi'an

Jeong Eun (Kate) Lee and I completed this paper, “Importance sampling schemes for evidence approximation in mixture models”, now posted on arXiv. (With the customary one-day lag for posting, making me bemoan the days of yore when arXiv would give a definitive arXiv number at the time of submission.) Kate came twice to Paris in the past years to work with me on this evaluation of Chib’s original marginal likelihood estimate (also called the candidate formula by Julian Besag), and of the improvement proposed by Berkhof, van Mechelen, and Gelman (2003), based on averaging over all permutations, an idea that we rediscovered in an earlier paper with Jean-Michel Marin. (And an idea that Andrew seemed to have completely forgotten, despite being the very first one to publish [in English] a paper on a Gibbs sampler for mixtures.) Given that this averaging can get quite costly, we propose a preliminary step that reduces the number of relevant permutations to be considered in the averaging by removing far-away modes that do not contribute to the Rao-Blackwell estimate, a scheme we call dual importance sampling. We also considered modelling the posterior as a product of k-component mixtures on the components, following a vague idea I had had in the back of my mind for many years, but it did not help. In the above boxplot comparison of estimators, the marginal likelihood estimators are

1. Chib’s method using T = 5000 samples with a permutation correction by multiplying by k!.
2. Chib’s method (1), using T = 5000 samples which are randomly permuted.
3. Importance sampling estimate (7), using the maximum likelihood estimate (MLE) of the latents as centre.
4. Dual importance sampling using q in (8).
5. Dual importance sampling using an approximate in (14).
6. Bridge sampling (3). Here, label switching is imposed in hyperparameters.
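For readers less familiar with the candidate formula, here is a sketch of the identity behind Chib's estimate and of the permutation-averaged Rao-Blackwell estimate, in generic notation rather than the paper's own: $\theta^*$ is a fixed parameter value (typically one with high posterior density), $z^{(t)}$ $(t=1,\ldots,T)$ are the latent allocations produced by the Gibbs sampler, and $\mathfrak{S}_k$ is the set of the $k!$ permutations of the component labels.

$$m(x)=\frac{f(x\mid\theta^*)\,\pi(\theta^*)}{\pi(\theta^*\mid x)}\,,\qquad \hat\pi(\theta^*\mid x)=\frac{1}{T}\sum_{t=1}^{T}\pi(\theta^*\mid x,z^{(t)})\,,$$

$$\tilde\pi(\theta^*\mid x)=\frac{1}{T\,k!}\sum_{\sigma\in\mathfrak{S}_k}\sum_{t=1}^{T}\pi(\sigma(\theta^*)\mid x,z^{(t)})\,.$$

When the Gibbs sampler remains stuck in one of the $k!$ symmetric modes, $\hat\pi(\theta^*\mid x)$ approximates $k!\,\pi(\theta^*\mid x)$ rather than $\pi(\theta^*\mid x)$, hence the multiplication by $k!$ in the first estimator. Averaging over all permutations, as in $\tilde\pi$, removes the need for this correction but requires $k!$ times more evaluations, the cost that dual importance sampling aims at reducing.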

## MCMSki IV, Jan. 6-8, 2014, Chamonix (news #12)

Posted in Mountains, R, Statistics, University life on November 26, 2013 by xi'an

We are converging towards MCMSki IV, with the conference getting closer and closer! I hope that by now all intended participants have registered (registration is still open!), found a place to stay during and around the conference (still feasible!), and booked their flight to Geneva (or nearby).

First, if you plan to present a poster, please send me your poster abstract asap at bayesianstatistics@gmail.com. We currently have 45 abstracts on my special WordPress blog and there is no deadline for sending your abstracts, even though Jan. 07 may be a wee bit extreme…

Second, we currently have 195 registered participants. This is fantastic! I am looking forward to this great company and do not expect to find free time to go skiing during the meeting! Note that there will be hardly any conference material, except for a single sheet with the program and rooms, so make sure to plan your sessions in advance. I also remind participants that the banquet is a paid option in the registration form: its cost is not included in the basic registration…

Third, finalise your travel plans to and from Chamonix. The airport in Geneva is 80 km away and you need to book a shuttle or a bus if your timing does not coincide with the three shuttles (each way) available via the conference registration page. There are two Doodle polls to monitor arrivals and departures, but hardly any entries so far. Not sure I can add any extra shuttle, but if there are enough of you… Check the conference website for travel tips. And do not, I repeat do not!, consider booking a taxi at Geneva airport as an option, since taxis there are extremely expensive. Horrendously so. (Both on the French and Swiss sides of the airport.)