Le Monde puzzle [#825]

Yet another puzzle which first part does not require R programming, even though it is a programming question in essence:

Given five real numbers x1,…,x5, what is the minimal number of pairwise comparisons needed to rank them? Given 33 real numbers, what is the minimal number of pairwise comparisons required to find the three largest ones?

I do not see a way out of considering the first question as the quickest possible sorting of a real sample. Using either quicksort or heapsort, I achieve sorting the 5 numbers in exactly 6 comparisons for any order of the initial sample. (Now, there may be an even faster way based on comparing partial sums first… I just do not see how!) Update: Oops! I realised I made my reasoning based on a reasonable case, the correct answer is indeed 7!!!

For the second part, let us start from the remark that 32 comparisons are needed to find the largest number, then at most 31 for the second largest, and at most 30 for the third largest (since we can take advantage of the partial ordering resulting from the determination of the largest number). This is poor. If I instead use a heap algorithm, I need O(n log{n}) comparisons to build this binary tree whose parents are always larger than their siblings, as in the above example. (I can produce a sort of heap structure, although non-binary, in an average 16×2.5=40 steps. And a maximum 16×3=48 steps.) The resulting tree provides the largest number (100 in the above example) and at least the second largest number (36 in the above). To get the third largest number, I first need a comparison between the one-before-last terms of the heap (19 vs. 36 in the above), and one or two extra comparisons (25 vs. 19 and maybe 25 vs. 1 in the above). (This would induce an average 1.5 extra comparison and a maximum 2 extra comparisons, resulting in a total of 41.5 average and 49.5 maximum comparisons with my sub-optimal heap construction.)  Once again, using comparisons of sums may help in speeding up the process, for instance comparing numbers by groups of 3, but I did not pursue this solution…

quick3If instead I try to adapt quicksort to this problem, I can have a dynamic pivot that always keep at most two terms above it, providing the three numbers as a finale result. Here is an R code to check its performances:

quick3=function(x){

comp=0
i=1
lower=upper=NULL
pivot=x[1]

for (i in 2:length(x)){

if (x[i]<pivot){ lower=c(lower,x[i])
}else{

upper=c(upper,x[i])
if (length(upper)>1) comp=comp+1}

comp=comp+1

if (length(upper)==3){

pivot=min(upper)
upper=sort(upper)[-1]
}}

if (length(upper)<3) upper=c(pivot,upper)

list(top=upper,cost=comp)
}

When running this R code on 10⁴ random sequences of 33 terms, I obtained the following statistics, I obtained the following statistics on the computing costs

</p>
<p style="text-align: justify;">> summary(costs)
Min.   1st Qu.  Median  Mean    3rd Qu.  Max.
32.00   36.00   38.00   37.89   40.00   49.00

and the associated histogram represented above. Interestingly, the minimum is the number of comparisons needed to produce the maximum!

Reading the solution in Le Monde in the train to London and Bayes 250, I discovered that the author suggests a 7 comparison solution in the first case [compare A and B, C and D = 2 comparisons; if A>B and C>D, compare A and C and get, say A>C>D = 1 comparison; insert E in this ordering by comparing with C and then A or D = 2 comparisons, obtaining e.g. A>E>C>D; conclude by inserting B by first comparing with C then with D or E = 2 comparisons] and a 41 comparison solution in the second case. He is (or I am!) I was clearly mistaken in the first case while the quick3 algorithm does 41 or less most of the time (90%)  but not always.

3 Responses to “Le Monde puzzle [#825]”

  1. For the first part, I considered the following approach:
    a) Let T denote a random variable that denotes the ordering of the 5 numbers. T can assume 5! = 120 different values.
    b) An infallible sorter reduces the entropy of T to 0, no matter what your prior is.
    c) Each comparison gets you a bit of information, and reduces your entropy by, at most, log(2).
    d) Putting a uniform prior on T, your initial entropy is log(120).
    e) Since log(120)-k log(2) >= 0 for any k <= 6, a lower bound on the number of comparisons is 7.

    (I couldn't get this to work for the second part!)

  2. It’s not hard to see why your answer for the first problem is too optimistic. Each comparison gives you 1 bit of information. The worst case performance of your sorting algorithm – in terms of number of comparisons – can never be less than the number of bits required to enumerate all possible permutations, i.e if you are sorting n real numbers, you need at least the length of n! in binary in the worst case. This is ceiling(log(factorial(n), base=2)), which is 7 for n=5.

    If you believe the author, this bound can be attained. But standard sorting algorithms don’t seem to manage it. I get maximum 12 comparisons for heapsort, 10 for quicksort and 8 for mergesort.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 598 other followers