## Le Monde puzzle [#825]

**Y**et another puzzle which first part does not require R programming, even though it is a programming question in essence:

Given five real numbers x_{1},…,x_{5}, what is the minimal number ofpairwise comparisons needed to rank them? Given 33 real numbers, what is the minimal number of pairwise comparisons required to find the three largest ones?

**I** do not see a way out of considering the first question as the quickest possible sorting of a real sample. Using either quicksort or heapsort, I achieve sorting the 5 numbers in exactly 6 comparisons for any order of the initial sample. *(Now, there may be an even faster way based on comparing partial sums first… I just do not see how!)* **Update:** Oops! I realised I made my reasoning based on a reasonable case, the correct answer is indeed 7!!!

**F**or the second part, let us start from the remark that 32 comparisons are needed to find the largest number, then at most 31 for the second largest, and at most 30 for the third largest (since we can take advantage of the partial ordering resulting from the determination of the largest number). This is poor. If I instead use a heap algorithm, I need O(n log{n}) comparisons to build this binary tree whose parents are always larger than their siblings, as in the above example. *(I can produce a sort of heap structure, although non-binary, in an average 16×2.5=40 steps. And a maximum 16×3=48 steps.)* The resulting tree provides the largest number (100 in the above example) and at least the second largest number (36 in the above). To get the third largest number, I first need a comparison between the one-before-last terms of the heap (19 vs. 36 in the above), and one or two extra comparisons (25 vs. 19 and maybe 25 vs. 1 in the above). *(This would induce an average 1.5 extra comparison and a maximum 2 extra comparisons, resulting in a total of 41.5 average and 49.5 maximum comparisons with my sub-optimal heap construction.)* Once again, using comparisons of sums may help in speeding up the process, for instance comparing numbers by groups of 3, but I did not pursue this solution…

**I**f instead I try to adapt quicksort to this problem, I can have a dynamic pivot that always keep at most two terms above it, providing the three numbers as a finale result. Here is an R code to check its performances:

quick3=function(x){ comp=0 i=1 lower=upper=NULL pivot=x[1] for (i in 2:length(x)){ if (x[i]<pivot){ lower=c(lower,x[i]) }else{ upper=c(upper,x[i]) if (length(upper)>1) comp=comp+1} comp=comp+1 if (length(upper)==3){ pivot=min(upper) upper=sort(upper)[-1] }} if (length(upper)<3) upper=c(pivot,upper) list(top=upper,cost=comp) }

When running this R code on 10⁴ random sequences of 33 terms, I obtained the following statistics, I obtained the following statistics on the computing costs

</p> <p style="text-align: justify;">> summary(costs) Min. 1st Qu. Median Mean 3rd Qu. Max. 32.00 36.00 38.00 37.89 40.00 49.00

and the associated histogram represented above. Interestingly, the minimum is the number of comparisons needed to produce the maximum!

**R**eading the solution in Le Monde in the train to London and Bayes 250, I discovered that the author suggests a 7 comparison solution in the first case ** [compare A and B, C and D = 2 comparisons; if A>B and C>D, compare A and C and get, say A>C>D = 1 comparison; insert E in this ordering by comparing with C and then A or D = 2 comparisons, obtaining e.g. A>E>C>D; conclude by inserting B by first comparing with C then with D or E = 2 comparisons]** and a 41 comparison solution in the second case.

June 19, 2013 at 4:52 pm

For the first part, I considered the following approach:

a) Let T denote a random variable that denotes the ordering of the 5 numbers. T can assume 5! = 120 different values.

b) An infallible sorter reduces the entropy of T to 0, no matter what your prior is.

c) Each comparison gets you a bit of information, and reduces your entropy by, at most, log(2).

d) Putting a uniform prior on T, your initial entropy is log(120).

e) Since log(120)-k log(2) >= 0 for any k <= 6, a lower bound on the number of comparisons is 7.

(I couldn't get this to work for the second part!)

June 19, 2013 at 4:30 pm

It’s not hard to see why your answer for the first problem is too optimistic. Each comparison gives you 1 bit of information. The worst case performance of your sorting algorithm – in terms of number of comparisons – can never be less than the number of bits required to enumerate all possible permutations, i.e if you are sorting n real numbers, you need at least the length of n! in binary in the worst case. This is ceiling(log(factorial(n), base=2)), which is 7 for n=5.

If you believe the author, this bound can be attained. But standard sorting algorithms don’t seem to manage it. I get maximum 12 comparisons for heapsort, 10 for quicksort and 8 for mergesort.

June 20, 2013 at 6:22 am

Thanks: I included the deterministic solution leading to 7 comparisons.