 In the motivating toy example to our ABC model choice paper, we compare summary statistics, mean, median, variance, and… median absolute deviation (mad). The latest is the only one able to discriminate between our normal and Laplace models (as now discussed on Cross Validated!). When rerunning simulations to produce nicer graphical outcomes (for the revision), I noticed a much longer run time associated with the computation of the mad statistic. Here is a comparison for the computation of the mean, median, and mad on identical simulations:

```> system.time(mmean(10^5))
user  system elapsed
4.040   0.056   4.350
> system.time(mmedian(10^5))
user  system elapsed
12.509   0.012  15.353
user  system elapsed
23.345   0.036  23.458
```

Now, this is not particularly surprising: computing a median takes longer than computing a mean, even using quicksort!, hence computing two medians… Still, having to wait about six times longer for the delivery of a mad statistics is somehow…mad!

### 6 Responses to “mad statistic”

1. Anthony Says:

Similar to Iain’s comment, but just to note that the deterministic O(n) time median algorithm (median of medians) is often slower than the expected O(n) time (worst case O(n^2)) algorithm (Hoare’s selection algorithm), which is a minor modification of quicksort (along the same lines as what Iain said).

If you are computing many medians of at least moderately sized data sets, e.g. at each iteration of an MCMC kernel, it may be worthwhile to do so using Hoare’s selection algorithm. I found in my MSc. thesis that this made a pretty big difference in computation time (I was computing a lot of medians at each step of a particle filter).

• xi'an Says:

Thanks to both Iain and Anthony for those remarks. Is there an R function that would go faster than median() and mad()?

• Anthony Says:

Unfortunately, I don’t know if anyone has implemented one of these in R.

2. Grizzly Says:

What are these functions, mmedian/mmad? Are they part of some R package?

• xi'an Says:

no, they are my functions, computing mean, median, and mad a certain number of times to get a stable system.time(.)….

3. Iain Murray Says:

It is possible to find a median in O(n), rather than the quicksort’s O(n.log(n)): there’s no need to sort buckets that can’t contain the median. But I rarely bother with these fancy methods; sorting is hardly ever a bottleneck in any of my code.

If it really is taking lots of time for *huge* vectors, do we really need the exact median? One could get a decent estimate from a subsample. Not looking at all the data is cheaper than O(n) :-).

This site uses Akismet to reduce spam. Learn how your comment data is processed.