7.9 Sorting by Distribution (Radix Sort) · Foundations of Algorithms Using C++ Pseudocod

## 7.9 Sorting by Distribution (Radix Sort) In the preceding section, we showed that any algorithm that sorts only by comparisons of keys can be no better than 螛 (*n lg n*). If we know nothing about the keys except that they are from an ordered set, we have no choice but to sort by comparing the keys. However, when we have more knowledge we can consider other sorting algorithms. By using additional information about the keys, we next develop one such algorithm. Suppose we know that the keys are all nonnegative integers represented in base 10. Assuming that they all have the same number of digits, we can first distribute them into distinct piles based on the values of the leftmost digits. That is, keys with the same leftmost digit are placed in the same pile. Each pile can then be distributed into distinct piles based on the values of the second digits from the left. Each new pile can then be distributed into distinct piles based on the values of the third digits from the left, and so on. After we have inspected all the digits, the keys will be sorted. [Figure 7.14](#ch07fig14) illustrates this procedure, which is called *sorting by distribution,* because the keys are distributed into piles. [![Click To expand](https://box.kancloud.cn/d05fa17d39eacfb26010f1b297ffa125_350x246.jpg)](fig7-14_0.jpg) Figure 7.14: Sorting by distribution while inspecting the digits from left to right. A difficulty with this procedure is that we need a variable number of piles. I Suppose instead that we allocate precisely 10 piles (one for each decimal digit), we inspect digits from right to left, and we always place a key in the pile corresponding to the digit currently being inspected. If we do this, the keys still end up sorted as long as we obey the following rule: On each pass, if two keys are to be placed in the same pile, the key coming from the leftmost pile (in the previous pass) is placed to the left of the other key. This procedure is illustrated in [Figure 7.15](#ch07fig15). As an example of how keys are placed, notice that after the first pass, key 416 is in a pile to the left of key 317. Therefore, when they are both placed in the first pile in the second pass, key 416 is placed to the left of key 317. In the third pass, however, key 416 ends up to the right of key 317 because it is placed in the fourth pile, whereas key 317 is placed in the third pile. In this way, the keys end up in the correct order according to their rightmost digits after the first pass, in the correct order according to their two rightmost digits after the second pass, and in the correct order according to all three digits after the third pass, which means they are sorted. [![Click To expand](https://box.kancloud.cn/40d084099fbf09926aebb4e2977ffefd_350x238.jpg)](fig7-15_0.jpg) Figure 7.15: Sorting by distribution while inspecting the digits from right to left. This sorting method precedes computers, having been the method used on the old card-sorting machines. It is called ***radix*** sort because the information used to sort the keys is a particular radix (base). The radix could be any number base, or we could use the letters of the alphabet. The number of piles is the same as the radix. For example, if we were sorting numbers represented in hexadecimal, the number of piles would be 16; if we were sorting alpha keys represented in the English alphabet, the number of piles would be 26 because there are 26 letters in that alphabet. Because the number of keys in a particular pile changes with each pass, a good way to implement the algorithm is to use linked lists. Each pile is represented by a linked list. After each pass, the keys are removed from the lists (piles) by coalescing them into one master linked list. They are ordered in that list according to the lists (piles) from which they were removed. In the next pass, the master list is traversed from the beginning, and each key is placed at the end of the list (pile) to which it belongs in that pass. In this way the rule just given is obeyed. For readability, we present this algorithm under the assumption that the keys are nonnegative integers represented in base 10. It can be readily modified to sort keys in other radix representations without affecting the order of the time complexity. We need the following declarations for the algorithm. ``` struct nodetype { keytype key; nodetype* link; }; typedef nodetype* node_pointer; ``` Algorithm 7.6: Radix Sort**![Start example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Problem: Sort *n* nonnegative integers, represented in base 10, in nondecreasing order. Inputs: linked list *masterlist* of *n* nonnegative integers, and an integer numdigits, which is the maximum number of decimal digits in each integer. Outputs: the linked list *masterlist* containing the integers in nondecreasing order. ``` void radixsort (node_pointer& masterlist, int numdigits) { index i; node_pointer list [0..9]; for (i = 1; i <= numdigits; i++){ distribute (masterlist, i); coalesce (masterlist); } } void distribute (node_pointer& masterlist, index i); // i is index of current { // digit being inspected. index l; node_pointer p; for (j = 0; j <= 9; j++) // Empty current piles. list [j] = NULL; p = masterlist; // Traverse masterlist. while (p ! = NULL){ j = value of ith digit (from the right) in p - < key; link p to the end of list [j]; p = p - > link; } } void coalesce (node_pointer& masterlist); { index j; masterlist = NULL; // Empty masterlist. for (j = 0; j <= 9; j++) link the nodes in list[j] to the end of masterlist; } ``` **![End example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)** Next we analyze the algorithm. **![Start Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Analysis of Algorithm 7.6 Every-Case Time Complexity (Radix Sort)Basic operation: Because there are no comparisons of keys in this algorithm, we need to find a different basic operation. In an efficient implementation of *coalesce*, the lists that contain the piles would have pointers to both their beginnings and their ends so that we can readily add each list to the end of the previous one without traversing the list. Therefore, in each pass through the for loop in that procedure, a list is added to the end of *masterlist* by simply assigning an address to one pointer variable. We can take that assignment as the basic operation. In procedure *distribute*, we can take any or all of the instructions in the **while** loop as the basic operations. Therefore, to have a unit consistent with *coalesce*, we choose the one that adds a key to the end of a list by assigning an address to a pointer variable. Input size: *n*, the number of integers in *masterlist*, and *numdigits*, the maximum number of decimal digits in each integer. Traversal of the entirety of *masterlist* always requires *n* passes through the while loop in *distribute.* Addition of all the lists to *masterlist* always requires ten passes through the for loop in *coalesce.* Each of these procedures is called *numdigits* times from *radixsort.* Therefore, ![](https://box.kancloud.cn/c2761f29a7433f3122969c4b38725746_400x20.jpg) **![End Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)** This is not a 螛 (*n*) algorithm because the bound is in terms of *numdigits* and *n.* We can create arbitrarily large time complexities in terms of *n* by making *numdigits* arbitrarily large. For example, because 1,000,000,000 has 10 digits, it will take 螛 (n2) time to sort 10 numbers if the largest one is 1,000,000,000. In practice, the number of digits is ordinarily much smaller than the number of numbers. For example, if we are sorting 1,000,000 social security numbers, *n* is 1,000,000 whereas *numdigits* is only 9. It is not hard to see that, when the keys are distinct, the best-case time complexity of Radix Sort is in 螛 (*n* lg *n*), and ordinarily we achieve the best case. Next we analyze the extra space used by Radix Sort. **![Start Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Analysis of Algorithm 7.6 Analysis of Extra Space Usage (Radix Sort)No new nodes are ever allocated in the algorithm because a key is never needed simultaneously in *masterlist* and in a list representing a pile. This means that the only extra space is the space needed to represent the keys in a linked list in the first place. Therefore, the extra space is in 螛 (*n*) links. By "in 螛 (*n*) links" we mean that the number of links is in 螛 (*n*). **![End Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**