Random sudokus

After thinking about random sudokus for a few more weeks, I eventually came to read the paper by Newton and DeSalvo about the entropy of sudoku matrices. As written earlier, if we consider (as Newton and DeSakvo) a uniform distribution where the sudokus are drawn uniformly over the set of all sudokus, the entropy of this distribution is \log(N), where N is the size of this set of all sudokus. The (Shannon) entropy considered in the paper is completely different: it is

H = -\sum_{i=1}^9 \hat \sigma_i \log(\hat \sigma_i)

where the \hat \sigma_i‘s are the normalised transforms of the singular values of the sudoku matrix. The result that the average H is about 1.733, being larger than the same average over a collection of purely random matrices, 1.651, is then much less spectacular…

Another entry in the same paper is worth pondering about: the authors consider a random selection mechanism, working one entry at a time from the first row downwards, with a backward step in case of impossible entries. An R rendering of this generator is

s=matrix(0,ncol=9,nrow=9)
pool=rep(TRUE,9)
s[1,]=sample(1:9,9)
i=2;
while (i<10){
boxa=3*trunc((i-1)/3)+1;boxa=boxa:(boxa+2)
for (j in 1:9){
boxb=3*trunc((j-1)/3)+1;boxb=boxb:(boxb+2)
for (u in (1:9))
pool[u]=(sum(u==s[i,])+sum(u==s[,j]) +sum(u==s[boxa,boxb]))==0
if (sum(pool)>1){
s[i,j]=sample((1:9)[pool],1)
}else{
if (sum(pool)==1)
s[i,j]=(1:9)[pool]
if (sum(pool)==0){
rmrk=sample(0:(i-1),1,prob=1/(1:i)^9)
s[(i-rmrk):i, ]=0
i=i-rmrk-1
break()

}}}
i=i+1
}

and it produces valid sudokus! The question is rather to test how uniform this generator is (the authors replace uniformity with unbiasedness, resulting in a simplistic necessary condition on the generator!) and the paper is only considering an answer based on the average of the s[i,j]’s, which are naturally equidistributed over 1,2,…,9! While this answer is unsatisfactory, I have no clear idea on how to attack the problem. For instance, I ran an Rperiment looking for the sequence 1,2,…,9 to appear in either a row or a column: I think the probability of occurrence of this sequence is 1/8! (since there is always one row and one column starting with 1), i.e. 2.5 10-5, while 36000 simulations from the above showed a frequency of 5.5 10-5 both for rows and columns (i.e., 2 occurences over the 36000 replicas!). This only shows the need for a much larger experiment (and presumably a move from R to C  if I want the answer quickly!). Another test that is permutation invariant would be to check for the distribution of the number of different digits on the diagonal of the sudoku matrix, but I am not certain about the theoretical distribution. For instance, running 10,000 simulations, the average number of different digits is 6.3, with no occurrence of 1, 2 or 3, and the binomial fit is poor.

3 Responses to “Random sudokus”

  1. […] in discrete state spaces.There is no connexity in most of those problems. For instance, in the sudoku example, the values in a given entry of the grid have no ordinal meaning and 7 is just as far from […]

  2. I get the message:

    > for (j in 1:9) pool[1,j,-s[1,j]]=FALSE
    Error in pool[1, j, -s[1, j]] = FALSE : incorrect number of subscripts

    • oops! Sorry, I forgot to remove this line from my earlier sudoku solver… Incidentally, turning the 9x9x9 array pool into a vector of length 9 cuts the computing time by a factor of 10!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

%d bloggers like this: