Le Monde puzzle [#1016]

Posted in Books, Kids with tags , , , on July 16, 2017 by xi'an

An even more straightforward Le Monde mathematical puzzle that took a few minutes to code in the train to Cambridge:

1. Breaking {1,…,8} into two sets of four integrals, what is (or are) the division into two groups of equal size such that the sums of the squared terms from each are equal? Same question for the set {21,…,28}.
2.  Considering the integers from 1 to 12, how many divisions into two groups of size six satisfy the above property? Same question when the two groups are of different sizes.

The first code is

nop=TRUE
while (nop){
s=sample(1:8)
nop=(sum(s[1:4]^2)!=sum(s[5:8]^2))}


with result

1 6 4 7

while the second set leads to the unique [drifted] solution (up to symmetries)

21 24 26 27


and the divisions for the larger set {1,…,12} is unique in the equal case, and are four in the unequal case.

RNG impact on MCMC [or lack thereof]

Posted in Books, R, Statistics, Travel, University life with tags , , , , , , , on July 13, 2017 by xi'an

Following the talk at MCM 2017 about the strange impact of the random generator on the outcome of an MCMC generator, I tried in Montréal airport the following code on the banana target of Haario et al. (1999), copied from Soetaert and Laine and using the MCMC function of the FME package:

library(FME)
Banana <- function (x1, x2) {
return(x2 - (x1^2+1)) }
pmultinorm <- function(vec, mean, Cov) {
diff <- vec - mean
ex <- -0.5*t(diff) %*% solve(Cov) %*% diff
rdet <- sqrt(det(Cov))
power <- -length(diff)*0.5
return((2.*pi)^power / rdet * exp(ex)) }
BananaSS <- function (p) {
P <- c(p[1], Banana(p[1], p[2]))
Cov <- matrix(nr = 2, data = c(1, 0.9, 0.9, 1))
N=1e3
ejd=matrix(0,4,N)
RNGkind("Mars")
for (t in 1:N){
MCMC <- modMCMC(f = BananaSS, p = c(0, 0.7),
jump = diag(nrow = 2, x = 5), niter = 1e3)
ejd[1,t]=mean((MCMC$pars[-1,2]-MCMC$pars[1,2])^2)}


since this divergence from the initial condition seemed to reflect the experiment of the speaker at MCM 2017. Unsurprisingly, no difference came from using the different RNGs in R (which may fail to contain those incriminated by the study)…

easy riddle

Posted in Books, Kids, R with tags , , , , , on July 12, 2017 by xi'an

From the current Riddler, a problem that only requires a few lines of code and a few seconds of reasoning. Or not.

N households each stole the earnings from one of the (N-1) other households, one at a time. What is the probability that a given household is not burglarised? And what are the expected final earnings of each household in the list, assuming they all start with \$1?

The first question is close to Feller’s enveloppe problem in that

$\left(1-\frac{1}{N-1}\right)^{N-1}$

is close to exp(-1) for N large. The second question can easily be solved by an R code like

N=1e3;M=1e6
fina=rep(1,N)
for (v in 1:M){
ordre=sample(1:N)
vole=sample(1:N,N,rep=TRUE)
while (min(abs(vole-(1:N)))==0)
vole[abs(vole-(1:N))==0]=sample(1:N,
sum(vole-(1:N)==0))
cash=rep(1,N)
for (t in 1:N){
cash[ordre[t]]=cash[ordre[t]]+cash[vole[t]];cash[vole[t]]=0}
fina=fina+cash[ordre]}


which returns a pretty regular exponential-like curve, although I cannot figure the exact curve beyond the third burglary. The published solution gives the curve

${\frac{N-2}{N-1}}^{999}\times 2+{\frac{1}{N-1}}^{t-1}\times{\frac{N-1}{N}}^{N-t}\times\frac{N}{N-1}$

corresponding to the probability of never being robbed (and getting on average an extra unit from the robbery) and of being robbed only before robbing someone else (with average wealth N/(N-1)).

MCM 2017 snapshots [#2]

Posted in Books, pictures, Running, Statistics, University life with tags , , , , , , , , , , , on July 7, 2017 by xi'an

On the second day of MCM 2017, Emmanuel Gobet (from Polytechnique) gave the morning plenary talk on regression Monte Carlo methods, where he presented several ways of estimating conditional means of rv’s in nested problems where conditioning involves other conditional expectations. While interested in such problems in connection with ABC, I could not see how the techniques developed therein could apply to said problems.

By some of random chance, I ended up attending a hard-core random generation session where the speakers were discussing discrepancies between GNU library generators [I could not understand the target of interest and using MCMC till convergence seemed prone to false positives!], and failed statistical tests of some 64-bit Mersenne Twisters, and low discrepancy on-line subsamples of Uniform samples. Most exciting of all, Josef Leydold gave a talk on ratio-of-uniforms, on which I spent some time a while ago  (till ending up reinventing the wheel!), with highly refined cuts of the original box.

My own 180 slides [for a 50mn talk] somewhat worried my chairman, Art Owen, who kindly enquired the day before at the likelihood I could go through all 184 of them!!! I had appended the ABC convergence slides to an earlier set of slides on ABC with random forests in case of questions about that aspect, although I did not plan to go through those slides [and I mostly covered the 64 other slides] As the talk was in fine more about an inference method than a genuine Monte Carlo technique, plus involved random forests that sounded unfamiliar to many, I did not get many questions from the audience but had several deep discussions with people after the talk. Incidentally, we have just reposted our paper on ABC estimation via random forests, updated the abcrf R package, and submitted it to Peer Community in Evolutionary Biology!

Le Monde puzzle [#1013]

Posted in Books, Kids with tags , , , , , on June 23, 2017 by xi'an

A purely arithmetic Le Monde mathematical puzzle:

An operation þ applies to all pairs of natural integers with the properties

0 þ (a+1) = (0 þ a)+1, (a+1) þ (b+1)=(a þ b)+1, 271 þ 287 = 77777, 2018 þ 39 = 2018×39

Find the smallest integer d>287 such that there exists c<d leading to c þ d = c x d, the smallest integer f>2017 such that 2017 þ f = 2017×40. Is there any know integer f such that f þ 2017 = 40×2017?

The major appeal in this puzzle (where no R programming seems to help!) is that the “data” does not completely defines the operation  þ ! Indeed, when a<b, it is straightforward to deduce that a þ b = (0 þ 0)+b, hence solving the first two questions by deriving (0 þ 0)=270×287 [with d=315 and f=2017×40-270×287], but the opposed quantity b þ a is not defined, apart from (2018-39) þ 0. This however brings a resolution since

(2018-39) þ 0 = 2017×39 and (2018-39+2017) þ 2017 = 2017×39+2017 = 2017×40

non-Bayesian decision riddle

Posted in Statistics with tags , , , on June 22, 2017 by xi'an

As a continuation of the Bayesian resolution of last week riddle, I looked at a numeric resolution of the four secretaries problem, while in the train back from Rouen (and trying to block the chatter of my neighbours, a nuisance I find myself more and more sensitive to!). The target function is defined as

gainz=function(b,c,T=1e4,type="raw"){
x=matrix(runif(4*T),ncol=4)
maz=t(apply(x,1,cummax))
zam=t(apply(x[,4:1],1,cummax))
if (type=="raw"){return(mean(
((x[,2]>b*x[,1])*x[,2]+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*x[,3]+
(x[,3]<c*maz[,2])*x[,4]))/maz[,4]))}
if (type=="global"){return(mean(
((x[,2]>b*x[,1])*(x[,2]==maz[,4])+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==maz[,4])+
(x[,3]<c*maz[,2])*(x[,4]==maz[,4])))))}
if (type=="remain"){return(mean(
((x[,2]>b*x[,1])*(x[,2]==zam[,3])+
(x[,2]<b*x[,1])*((x[,3]>c*maz[,2])*(x[,3]==zam[,2])+
(x[,3]<c*maz[,2])*(x[,4]==zam[,2])))))}}


where the data is generated from a U(0,1) distribution as the loss functions are made scaled free by deciding to always sacrifice the first draw, x¹. This function is to be optimised in (b,c) and hence I used a plain vanilla simulated annealing R code:

avemale=function(T=3e4,type){
b=c=.5
maxtar=targe=gainz(b,c,T=1e4,type)
temp=0.1
for (t in 1:T){
bp=b+runif(1,-temp,temp)
cp=c+runif(1,-temp,temp)
parge=(bp>0)*(cp>0)*gainz(bp,cp,T=1e4,type)
if (parge>maxtar){
b=bs=bp;c=cs=cp;maxtar=targe=parge}else{
if (runif(1)<exp((parge-targe)/temp)){
b=bp;c=cp;targe=parge}}
temp=.9999*temp}
return(list(bs=bs,cs=cs,max=maxtar))}


with outcomes

• b=1, c=.5, and optimum 0.8 for the raw type
• b=c=1 and optimum 0.45 for the global type
• b undefined, c=2/3 and optimum 0.75 for the remain type

datazar

Posted in R, Statistics, University life with tags , , , , , , , on June 4, 2017 by xi'an

A few weeks ago and then some, I [as occasional blogger!] got contacted by datazar.com to write a piece on this data-sharing platform. I then went and checked what this was all about, having the vague impression this was a platform where I could store and tun R codes, besides dropping collective projects, but from what I quickly read, it sounds more like being able to run R scripts from one’s machine using data and code stored on datazar.com. But after reading just one more blog entry I finally understood it is also possible to run R, SQL, NotebookJS (and LaTeX) directly on that platform, without downloading code or data to one’s machine. Which makes it a definitive plus with this site, as users can experiment with no transfer to their computer. Hence on a larger variety of platforms. While personally I do not [yet?] see how to use it for my research or [limited] teaching, it seems like an [yet another] interesting exploration of the positive uses of Internet to collaborate and communicate on scientific issues! With no opinion on privacy and data protection offered by the site, of course.