## Archive for graph

Posted in Books, Kids, pictures with tags , , , , , , , on December 16, 2018 by xi'an

A first graph in Le Monde about the impact of the recent tax changes on French households as a percentage of net income with negative values at both ends, except for a small spike at about 10% and another one for the upper 1%, presumably linked with the end of the fortune tax (ISF).

A second one showing incompressible expenses by income category, with poorest households facing a large constraint on lodging, missing the fraction due to taxes. Unless the percentage is computed after tax.

A last and amazing one detailing the median monthly income per socio-professional category, not because of the obvious typo on the blue collar median 1994!, but more fundamentally because retirees have a median income in the upper part of the range. (This may be true in most developed countries, I was just unaware of this imbalance.)

## a Simpson paradox of sorts

Posted in Books, Kids, pictures, R with tags , , , , , , , , , on May 6, 2016 by xi'an

The riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. A representation of a connected graph is through a matrix X of zeros and ones, on which one can spot the nodes satisfying the above condition as the positive entries of the vector (X1)^2-(X^21), if 1 denotes the vector of ones. I thus wrote an R code aiming at optimising this target

targe <- function(F){
sum(F%*%F%*%rep(1,N)/(F%*%rep(1,N))^2<1)}


by mere simulated annealing:

rate <- function(N){
# generate matrix F
# 1. no single
F=matrix(0,N,N)
F[sample(2:N,1),1]=1
F[1,]=F[,1]
for (i in 2:(N-1)){
if (sum(F[,i])==0)
F[sample((i+1):N,1),i]=1
F[i,]=F[,i]}
if (sum(F[,N])==0)
F[sample(1:(N-1),1),N]=1
F[N,]=F[,N]
# 2. more connections
F[lower.tri(F)]=F[lower.tri(F)]+
sample(0:1,N*(N-1)/2,rep=TRUE,prob=c(N,1))
F[F>1]=1
F[upper.tri(F)]=t(F)[upper.tri(t(F))]
#simulated annealing
T=1e4
temp=N
targo=targe(F)
for (t in 1:T){
#1. local proposal
nod=sample(1:N,2)
prop=F
prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
1-prop[nod[1],nod[2]]
while (min(prop%*%rep(1,N))==0){
nod=sample(1:N,2)
prop=F
prop[nod[1],nod[2]]=prop[nod[2],nod[1]]=
1-prop[nod[1],nod[2]]}
target=targe(prop)
if (log(runif(1))*temp<target-targo){
F=prop;targo=target}
#2. global proposal
prop=F prop[lower.tri(prop)]=F[lower.tri(prop)]+
sample(c(0,1),N*(N-1)/2,rep=TRUE,prob=c(N,1))
prop[prop>1]=1
prop[upper.tri(prop)]=t(prop)[upper.tri(t(prop))]
target=targe(prop)
if (log(runif(1))*temp<target-targo){
F=prop;targo=target}
temp=temp*.999
}
return(F)}


This code returns quite consistently (modulo the simulated annealing uncertainty, which grows with N) the answer N-2 as the number of entries above average! Which is rather surprising in a Simpson-like manner since all entries but two are above average. (Incidentally, I found out that Edward Simpson recently wrote a paper in Significance about the Simpson-Yule paradox and him being a member of the Bletchley Park Enigma team. I must have missed out the connection with the Simpson paradox when reading the paper in the first place…)

## Le Monde puzzle [#13]

Posted in R with tags , , on April 14, 2011 by xi'an

This week, Le Monde offers not one but three related puzzles:

1. Is it possible to label the twelve edges of a cube by consecutive numbers such that the sum of the edge numbers at any of the eight nodes is constant?
2. Is it possible to label the eight nodes of a cube by consecutive numbers such that the sums of the node numbers over the edges are all different?
3. It possible to label the twelve edges of a cube by consecutive numbers except for two edges where the difference is two such that the sum of the edge numbers at any of the eight nodes is constant?

A first comment is that the lowest number amont the eight, twelve or fourteen can always be taken equal to zero. The problems are location invariant.

For the first problem, since the edges are labeled from 0 to 11, the sum of the numbers is 1+…+11 = 11×12/2. The sum of the values over all nodes is therefore 11×12, which should also be equal to 8xs, s being the constant sum at each node. Since 11×12 cannot be divided by 8, this is impossible.

For the second problem, were there a solution, an R resolution would be

 edges=rbind(c(1,1,1,2,2,3,3,4,5,5,6,7),c(2,4,5,3,6,4,7,8,6,8,7,8))
le=11
while (le<12){
node=sample(0:7)
le=length(unique(node[edges[1,]]+node[edges[2,]]))}


However, the R code does not produce an answer after a long wait, implying this is not possible.

For the third problem, there is a jump in the sequence of edge numbers at a>-1, then again at 10>b>a. Therefore the sum of the edge numbers is 11×6 + (b-a)+2x(11-b), i.e 88-a-b, which should be divisible by 4 when all the nodes have the same value. This occurs when a+b is divisible by 4, e.g. when a=6, b=10. This choice leads to the R code

b=sample(3:9,1)
a=sample(c(4-b,8-b,12-b),1)
while ((a<0)||(a>b)) a=sample(c(4-b,8-b,12-b),1)
newedge=sample(c(0:a,(a+2):(b+1),(b+3):13))
le=NULL
for (i in 1:8)
le=c(le,sum(newedge[((edges[1,]!=i)*(edges[2,]!=i))==0]))
while (length(unique(le))>1){
b=sample(3:9,1)
a=sample(c(4-b,8-b,12-b),1)
while ((a<0)||(a>b)) a=sample(c(4-b,8-b,12-b),1)
newedge=sample(c(0:a,(a+2):(b+1),(b+3):13))
le=NULL
for (i in 1:8)
le=c(le,sum(newedge[((edges[1,]!=i)*(edges[2,]!=i))==0]))
}


which produces a positive outcome

> le
[1] 20 20 20 20 20 20 20 20
> a
[1] 0
> b
[1] 8
> newedge
[1]  9  8  3  7  4  0 13 12 11  6  5  2


## Le Monde puzzle [48]

Posted in R, Statistics with tags , , , , , , on December 2, 2010 by xi'an

This week(end), the Le Monde puzzle can be (re)written as follows (even though it is presented as a graph problem):

Given a square 327×327 symmetric matrix A, where each non-diagonal entry is in {1,2,3,4,5} and $\text{diag}(A)=0$, does there exist a triplet (i,j,k) such that

$a_{ij} = a_{jk} = a_{ki}$

Solving this problem in R is very easy. Or appears to be. We can indeed create a random matrix A and check whether or not any of the five triple indicator matrices

$(A==c)^3\,,\qquad c=1,\ldots,5,$

has a non-zero diagonal entry. Indeed, since $B=A^3$ satisfies

$b_{uu} = \sum_{v,w} a_{uv}a_{vw}a_{wu}$

there is a non-zero entry iff there exists a triplet (u,v,w) such that the product $a_{uv}a_{vw}a_{wu}$ is different from zero. Here is the R code:

chec=1
for (mc in 1:10^3){

#random filled matrix
A=matrix(sample(1:5,(357)^2,rep=TRUE),357)
A=A*upper.tri(A)+t(A*upper.tri(A))

agr=0
for (t in 1:5){
B=(A==t)
agr=max(agr,max(diag(B%*%B%*%B)>0))
if (agr==1){ break()}
}

chec=min(chec,agr)
if (chec==0){ break()}
}


I did run the above code and did not find any case where no triplet was sharing the same number. Neither did I for 326, 325,… But Robin Ryder told me that this is a well-known problem in graph theory that goes under the name of Ramsey’s problem and that 327 is an upper bound on the number of nodes for the existence of the triplet with 5 colours/numbers. So this is another illustration of a case when crude simulation cannot exhibit limiting cases in order to void a general property, because of the number of possible cases. To improve the chances of uncovering the boudary value, we would need a simulation that dis-favours triplets with the same number. I am also curious to see Le Monde solution tomorrow, since finding Ramsey’s numbers seems to be a hard problem, no solution being provided for 6 colours/numbers.

Incidentally, I wonder if there is a faster way to produce a random symmetric matrix than the cumbersome

A=matrix(sample(1:5,(357)^2,rep=TRUE,357)
A=A*upper.tri(A)+t(A*upper.tri(A))


Using the alternative saving on the lower triangular part

A=matrix(sample(1:5,(357)^2,rep=TRUE,357)
A[upper.tri(A)]=sample(1:5,357*178,rep=TRUE)
A=A*upper.tri(A)+t(A*upper.tri(A))


certainly takes longer…