Published 02 Sep 2011
A pair of interesting threads about sorting algorithms recently surfaced on Stack Overflow, bringing to discussion an exciting concept (which was entirely unknown by me): sorting networks.
The first thread poses a straightforward question: what is the fastest way to sort 6 integers? The question calls for optimized code to sort an array of integers whose size is fixed and known in advance to be 6.
Experience shows that we can often sort elements a lot faster if we have some extra information about them, such as their distribution or range of values. So intuitively, one might expect better (albeit maybe not asymptotically better) performance from a sorting algorithm tailored for fixed-size arrays when compared to a generic version, fit for any input size.
And indeed some very interesting techniques were elicited in the replies to that thread. However, one of them struck me for its simplicity and sheer speed: using a sorting network. This turned out to be faster than the generic solutions (such as insertion sort) that were posted as replies, according to the original poster’s tests.
A brief explanation on that solution: sorting networks are abstract representations of the comparisons and swaps used to sort a fixed number of integers. They are often schematized as circuits in which input wires carry the values through pairwise compare-and-swap gates.
|— A simple sorting network for four elements|
Sorting networks leverage the fact that, given a fixed number of integers, the comparisons needed to sort them are also fixed. That is, the same circuit for n integers can sort all inputs of that size just by comparing and swapping elements in a fixed order.
I must not have been the only one puzzled by these clever constructs, since a second thread on Stack Overflow soon wondered, why are sorting networks so fast? The answer appears to be a combination of two factors: one, sorting networks tend to make less comparisons than generic sorting algorithms, and two, the comparisons made by sorting networks are more easily parallelizable at CPU level.
So, in order to gain perspective on the power of that new sorting technique, I decided to run a few practical tests of my own.
The methodology is simple. I implemented the quicksort algorithm in three forms. The first is a plain and simple version, which can be found in most textbooks. The second uses the common optimization of resorting to insertion sort for small arrays, of sizes up to eight. And the third used sorting networks for small arrays, again for sizes eight and smaller.
I then ran tests for input arrays of ascending (that is, already sorted), descending (sorted in reverse), all-zeroes and random integers, of sizes doubling from 2^16 (65536) to 2^22 (4194304), and compared the average user CPU times of five runs as measured with time.
The codes were compiled with clang 2.9, using -O1 and -O2, and ran on a 64-bit Linux system, with a Core I3 380M processor.
The code I used for testing is available in my Github page, in the sorting-networks-test repository. Anyone is welcome to review and try it. I also included all raw test results in that repository.
For a short summary of my results, I did not notice much difference between the insertion sort and the sorting networks versions. In fact, the former slightly outperformed the latter in almost all tests. On the bright side, both were consistently (and considerably) faster than the textbook algorithm.
It may be the case that the speed-up provided by sorting networks is highly compiler/CPU dependent, so I’d like to see more test results on the matter before drawing a definitive conclusion. But the fact that this new technique was a match for the old and well-known quicksort + insertion sort duo in terms of performance makes it a nice addition to my toolbox anyway.