## overlap, overstreched

Posted in Books, Kids, R, Statistics with tags , , , , , , on June 15, 2020 by xi'an An interesting challenge on The Riddler on the probability to see a random interval X’ing with all other random intervals when generating n intervals from Dirichlet D(1,1,1). As it happens the probability is always 2/3, whatever n>1, as shown by the R code below (where replicate cannot be replaced by rep!):

qro=function(n,T=1e3){
quo=function(n){
xyz=t(apply(matrix(runif(2*n),n),1,sort))
sum(xyz[,1]<min(xyz[,2])&xyz[,2]>max(xyz[,1]))<0}
mean(replicate(quo(n),T))}


and discussed more in details on X validated. As only a property on permutations and partitions. (The above picture is taken from this 2015 X validated post.)

## Le Monde puzzle [#1051]

Posted in Books, Kids, R with tags , , , , , , on May 18, 2018 by xi'an A combinatoric Le Monde mathematical puzzle of limited size: When the only allowed move is to switch two balls from adjacent boxes, what is the minimal number of moves to return all balls in the above picture to their respective boxes? Same question with six boxes and 12 balls.

The question is rather interesting to code as I decided to use recursion (as usual!) but wanted to gain time by storing the number of steps needed by any configuration to reach its ordered recombination. Meaning I had to update an external vector within the recursive function for each new configuration I met. With help from Julien Stoehr, who presented me with the following code, a simplification of a common R function

v.assign <- function (i,value,...) {
temp <- get(i, pos = 1)
temp[...] <- value
assign(i, temp, pos = 1)}


which assigns one or several entries to the external vector i. I thus used this trick in the following R code, where cosz is a vector of size 5¹⁰, much larger than the less than 10! values I need but easier to code. While n≤5.

n=5;tn=2*n
baz=n^(0:(tn-1))
cosz=rep(-1,n^tn)
swee <- function(balz){
indz <- sum((balz-1)*baz)
if (cosz[indz]==-1){
if (min(diff(balz))==0){ #ordered
v.assign("cosz",indz,value=1)}else{
val <- n^tn
for (i in 2:n)
for (j in (2*i-1):(2*i))
for (k in (2*i-3):(2*i-2)){
calz <- balz
calz[k] <- balz[j];calz[j] 0)
val <- min(val,1+swee(calz))}
v.assign("cosz",indz,value=val)
}}
return(cosz[indz])}


which returns 2 for n=2, 6 for n=3, 11 for n=4, 15 for n=5. In the case n=6, I need a much better coding of the permutations of interest. Which is akin to ranking all words within a dictionary with letters (1,1,…,6,6). After some thinking (!) and searching, I came up with a new version, defining

parclass=rep(2,n)
rankum=function(confg){
n=length(confg);permdex=1
for (i in 1:(n-1)){
x=confg[i]
if (x>1){
for (j in 1:(x-1)){
if(parclass[j]>0){
parclass[j]=parclass[j]-1
permdex=permdex+ritpermz(n-i,parclass)
parclass[j]=parclass[j]+1}}}
parclass[x]=parclass[x]-1}
return(permdex)}

ritpermz=function(n,parclass){
return(factorial(n)/prod(factorial(parclass)))}


for finding the index of a given permutation, between 1 and (2n)!/2!..2!, and then calling the initial swee(p) with this modified allocation. The R code was still running when I posted this entry… and six days later, it returned the answer of 23.

## Le Monde puzzle [#1043]

Posted in Books, Kids with tags , , , , , , on March 5, 2018 by xi'an An arithmetic Le Monde mathematical puzzle :

A number is “noble” if all its digits are different and if it is equal to the average of all numbers created by permuting its digits. What are the noble numbers?

There is no need for simulation when plain enumeration works. After failing to install the R packge permutations, I installed the R package permute, which works, although (a) the function allPerm does not apply directly to a vector of characters or numbers but only to its size:

> allPerms(c("a","r","h"))
[,1] [,2] [,3]
[1,]    1    3    2
[2,]    2    1    3
[3,]    2    3    1
[4,]    3    1    2
[5,]    3    2    1


and (b) as seen above the function does not contain “all” permutations since it misses the identity permutation.  Which ends up being fine for solving this puzzle. Using a bit of digit-character manipulation

findzol=function(N=2){
for (u in 1:(10^N-1)){
digz=strsplit(as.character(u),"")[]
if (length(digz)<N)
digz=c(rep("0",N-length(digz)),digz)
if (length(unique(digz))==N){
permz=apply(matrix(digz[allPerms(1:N)],
ncol=N),2,as.numeric)
if (mean(permz%*%10^{(N-1):0})==u) print(u)}}}


I found solutions for N=3

> findzol(3)
 370
 407
 481
 518
 592
 629


and none for N=4,5,6. Le Monde gives solutions for N=9, which is not achievable by my code!

## Le Monde puzzle [#977]

Posted in Books, Kids, pictures, Statistics, Travel, University life with tags , , , , , on October 3, 2016 by xi'an A mild arithmetic Le Monde mathematical puzzle:

Find the optimal permutation of {1,2,..,15} towards minimising the maximum of sum of all three  consecutive numbers, including the sums of the 14th, 15th, and first numbers, as well as the 15th, 1st and 2nd numbers.

If once again opted for a lazy solution, not even considering simulated annealing!,

parme=sample(15)
max(apply(matrix(c(parme,parme[-1],
parme,parme[-(1:2)],parme[1:2]),3),2,sum))


and got the minimal value of 26 for the permutation

14 9 2 15 7 1 11 10 4 12 8 5 13 6 3

Le Monde gave a solution with value 25, though, which is

11 9 7 5 13 8 2 10 14 6 1 12 15 4 3

but there is a genuine mistake in the solution!! This anyway shows that brute force may sometimes fail. (Which sounds like a positive conclusion to failing to find the proper solution! But trying with a simple simulated annealing version did not produce any 25 either…)

## relabelling mixtures (#2)

Posted in Statistics, Travel, University life with tags , , , , , , on February 5, 2015 by xi'an

Following the previous post, I went and had  a (long) look at Puolamäki and Kaski’s paper. I must acknowledge that, despite having several runs through the paper, I still have trouble with the approach… From what I understand, the authors use a Bernoulli mixture pseudo-model to reallocate the observations to components.  That is, given an MCMC output with simulated allocations variables (a.k.a., hidden or latent variables), they create a (TxK)xn matrix of component binary indicators e.g., for a three component mixture,

0 1 0 0 1 0…
1 0 0 0 0 0…
0 0 1 1 0 1…
0 1 0 0 1 1…

and estimate a probability to be in component j for each of the n observations, according to the (pseudo-)likelihood $\prod_{r=1}^R \sum_{j=1}^K \prod_{i=1}^n \beta_{i,j}^{z_{i,r}}(1-\beta_{i,j})^{1-z_{i,r}}$

It took me a few days, between morning runs and those wee hours when I cannot get back to sleep (!), to make some sense of this Bernoulli modelling. The allocation vectors are used together to estimate the probabilities of being “in” component j together. However the data—which is the outcome of an MCMC simulation and de facto does not originate from that Bernoulli mixture—does not seem appropriate, both because it is produced by an MCMC simulation and is made of blocks of highly correlated rows [which sum up to one]. The Bernoulli likelihood above also defines a new model, with many more parameters than in the original mixture model. And I fail to see why perfect, partial or inexistent label switching [in the MCMC sequence] is not going to impact the estimation of the Bernoulli mixture. And why an argument based on a fixed parameter value (Theorem 3) extends to an MCMC outcome where parameters themselves are subjected to some degree of label switching. Bemused, I remain…