## 8.5 The Selection Problem: Introduction to Adversary Arguments

So far we've discussed searching for a key *x* in a list of *n* keys. Next we address a different searching problem called the ***Selection problem.*** This problem is to find the *k*th-largest (or *k*th-smallest) key in a list of *n* keys. We assume that the keys are in an unsorted array (the problem is trivial for a sorted array). First we discuss the problem for *k* = 1, which means that we are finding the largest (or smallest) key. Next we show that we can simultaneously find the smallest and largest keys with fewer comparisons than would be necessary if each were found separately. Then we discuss the problem for *k* = 2, which means that we are finding the second-largest (or second-smallest) key. Finally, we address the general case.

### 8.5.1 Finding the Largest Key

The following is a straightforward algorithm for finding the largest key.

Algorithm 8.2: Find Largest Key

Problem: Find the largest key in the array *S* of size *n.*

Inputs: positive integer *n*, array of keys *S* indexed from 1 to *n.*

Outputs: variable *large*, whose value is the largest key in *S.*

```
void find_largest (int n, const keytype S[], keytype& large)
{
   index i;

   large = S[1];
   for (i = 2; i <= n; i++)
      if (S[i] > large)
         large = S[i];
}
```

Clearly, the number of comparisons of keys done by the algorithm is given by

*T* (*n*) = *n* − 1.

Intuitively, it seems impossible to improve on this performance. The next theorem establishes that this is so. We can think of an algorithm that determines the largest key as a tournament among the keys.
Each comparison is a match in the tournament, the larger key is the ***winner*** of the comparison, and the smaller key is the ***loser.*** The largest key is the winner of the tournament. We use this terminology throughout this section.

Theorem 8.7

Any deterministic algorithm that can find the largest of *n* keys in every possible input only by comparisons of keys must in every case do at least

*n* − 1 comparisons of keys.

Proof: The proof is by contradiction. That is, we show that if the algorithm does fewer than *n* − 1 comparisons for some input of size *n*, the algorithm must give the wrong answer for some other input. To that end, if the algorithm finds the largest key by doing at most *n* − 2 comparisons for some input, at least two keys in that input never lose a comparison. At least one of those two keys cannot be reported as the largest key. We can create a new input by replacing (if necessary) that key by a key that is larger than all keys in the original input. Because the results of all comparisons will be the same as for the original input, the new key will not be reported as the largest, which means that the algorithm will give the wrong answer for the new input. This contradiction proves that the algorithm must do at least *n* − 1 comparisons for every input of size *n.*

You must be careful to interpret [Theorem 8.7](#ch08ex16) correctly. It does not mean that every algorithm that searches only by comparisons of keys must do at least *n* − 1 comparisons to find the largest key. For example, if an array is sorted, we can find the largest key without doing any comparisons simply by returning the last item in the array.
However, an algorithm that returns the last item in an array can find the largest key only when the largest key is the last item. It cannot find the largest key in every possible input. [Theorem 8.7](#ch08ex16) concerns algorithms that can find the largest key in every possible input. Of course, we can use an analogous version of [Algorithm 8.2](#ch08ex15) to find the smallest key in *n* − 1 comparisons and an analogous version of [Theorem 8.7](#ch08ex16) to show that *n* − 1 is a lower bound for finding the smallest key.

### 8.5.2 Finding Both the Smallest and Largest Keys

A straightforward way to find the smallest and largest keys simultaneously is to modify [Algorithm 8.2](#ch08ex15) as follows.

Algorithm 8.3: Find Smallest and Largest Keys

Problem: Find the smallest and largest keys in an array *S* of size *n.*

Inputs: positive integer *n*, array of keys *S* indexed from 1 to *n.*

Outputs: variables *small* and *large*, whose values are the smallest and largest keys in *S.*

```
void find_both (int n, const keytype S[],
                keytype& small, keytype& large)
{
   index i;

   small = S[1];
   large = S[1];
   for (i = 2; i <= n; i++)
      if (S[i] < small)
         small = S[i];
      else if (S[i] > large)
         large = S[i];
}
```

Using [Algorithm 8.3](#ch08ex17) is better than finding the smallest and largest keys independently, because for some inputs the comparison of *S*\[*i*\] with *large* is not done for every *i.* Therefore, we have improved on the every-case performance. But whenever *S*\[1\] is the smallest key, that comparison is done for all *i.* Therefore, the worst-case number of comparisons of keys is given by

*W* (*n*) = 2(*n* − 1) = 2*n* − 2,

which is exactly the number of comparisons done if the smallest and largest keys are found separately.
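To see this worst case concretely, here is a hypothetical instrumented version of Algorithm 8.3 (a sketch of ours, not the book's code): it uses `int` keys and 0-based C++ arrays in place of the book's 1-based `keytype` arrays, and adds a counter so we can watch when both tests run.

```cpp
#include <cassert>

// A sketch (ours) that instruments Algorithm 8.3 with a comparison
// counter. 0-based indexing; int keys stand in for keytype.
void find_both_counted(int n, const int S[], int& small, int& large,
                       int& comparisons)
{
    comparisons = 0;
    small = S[0];
    large = S[0];
    for (int i = 1; i < n; i++) {
        comparisons++;                 // comparison of S[i] with small
        if (S[i] < small)
            small = S[i];
        else {
            comparisons++;             // comparison of S[i] with large
            if (S[i] > large)
                large = S[i];
        }
    }
}
```

On an increasing array the first key is smallest, so both tests run for every *i* and the count is 2(*n* − 1); on a decreasing array the second test never runs and the count drops to *n* − 1.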
It may seem that we cannot improve on this performance, but we can. The trick is to pair the keys and find which key in each pair is smaller. This can be done with about *n*/2 comparisons. We can then find the smallest of all the smaller keys with about *n*/2 comparisons and the largest of all the larger keys with about *n*/2 comparisons. In this way we find both the smallest and largest keys with only about 3*n*/2 total comparisons. An algorithm for this method follows. The algorithm assumes that *n* is even.

Algorithm 8.4: Find Smallest and Largest Keys by Pairing Keys

Problem: Find the smallest and largest keys in an array *S* of size *n.*

Inputs: positive even integer *n*, array of keys *S* indexed from 1 to *n.*

Outputs: variables *small* and *large*, whose values are the smallest and largest keys in *S.*

```
void find_both2 (int n,            // n is assumed to be even.
                 const keytype S[],
                 keytype& small, keytype& large)
{
   index i;

   if (S[1] < S[2]) {
      small = S[1];
      large = S[2];
   }
   else {
      small = S[2];
      large = S[1];
   }
   for (i = 3; i <= n - 1; i = i + 2) {   // Increment i by 2.
      if (S[i] < S[i + 1]) {
         if (S[i] < small)
            small = S[i];
         if (S[i + 1] > large)
            large = S[i + 1];
      }
      else {
         if (S[i + 1] < small)
            small = S[i + 1];
         if (S[i] > large)
            large = S[i];
      }
   }
}
```

It is left as an exercise to modify the algorithm so that it works when *n* is odd and to show that its number of comparisons of keys is given by

*T* (*n*) = 3*n*/2 − 2 if *n* is even, and *T* (*n*) = 3(*n* − 1)/2 if *n* is odd.

Can we improve on this performance? We show that the answer is "no." We do not use a decision tree to show this because decision trees do not work well for the Selection problem. The reason is as follows. We know that a decision tree for the Selection problem must contain at least *n* leaves because there are *n* possible outcomes.
[Lemma 7.3](LiB0064.html#725) says that if a binary tree has *n* leaves, its depth is greater than or equal to ⌈lg *n*⌉. Therefore, our lower bound on the number of leaves gives us a lower bound of ⌈lg *n*⌉ comparisons in the worst case. This is not a very good lower bound, because we already know that it takes at least *n* − 1 comparisons just to find the largest key ([Theorem 8.7](#ch08ex16)). [Lemma 8.1](LiB0068.html#769) is no more useful, because we can only readily establish that there must be *n* − 1 comparison nodes in the decision tree. Decision trees do not work well for the Selection problem because a result can be in more than one leaf. [Figure 8.9](#ch08fig09) shows the decision tree for [Algorithm 8.2](#ch08ex15) (Find Largest Key) when *n* = 4. There are four leaves that report 4 and two that report 3. The number of comparisons done by the algorithm is 3 rather than lg *n* = lg 4 = 2. We see that lg *n* is a weak lower bound.

[![Click To expand](https://box.kancloud.cn/427c04ad257e1fbafd9e4f6266943524_350x211.jpg)](fig8-9_0.jpg)

Figure 8.9: The decision tree corresponding to [Algorithm 8.2](#ch08ex15) when *n* = 4.

We use another method, called an ***adversary argument***, to establish our lower bound. An *adversary* is an opponent or foe. Suppose you are in the local singles bar and an interested stranger asks you the tired question, "What's your sign?" This means that the person wants to know your astrological sign. There are 12 such signs, each corresponding to a period of approximately 30 days. If, for example, you were born August 25, your sign would be Virgo. To make this stale encounter more exciting, you decide to spice things up by being an adversary. You tell the stranger to guess your sign by asking yes/no questions. Being an adversary, you have no intention of disclosing your sign; you simply want to force the stranger to ask as many questions as possible.
Therefore, you always provide answers that narrow down the search as little as possible. For example, suppose the stranger asks, "Were you born in summer?" Because a "no" answer would narrow the search to nine months and a "yes" answer would narrow it to three months, you answer "no." If the stranger then asks, "Were you born in a month with 31 days?" you answer "yes," because more than half of the nine possible months remaining have 31 days. The only requirement of your answers is that they be consistent with ones already given. For example, if the stranger forgets that you said you were not born in summer and later asks if you were born in July, you could not answer "yes" because this would not be consistent with your previous answers. Inconsistent answers have no sign (birthday) satisfying them. Because you answer each question so as to leave the maximum number of remaining possibilities and because your answers are consistent, you force the stranger to ask as many questions as possible before reaching a logical conclusion.

Suppose some adversary's goal is to make an algorithm work as hard as possible (as you did with the stranger). At each point where the algorithm must make a decision (for example, after a comparison of keys), the adversary tries to choose a result of the decision that will keep the algorithm going as long as possible. The only restriction is that the adversary must always choose a result that is consistent with those already given. As long as the results are consistent, there must be an input for which this sequence of results occurs. If the adversary forces the algorithm to do the basic instruction *f*(*n*) times, then *f*(*n*) is a lower bound on the worst-case time complexity of the algorithm.

We use an adversary argument to obtain a lower bound on the worst-case number of comparisons needed to find both the largest and smallest keys. To establish that bound, we can assume that the keys are distinct.
We can do this because a lower bound on the worst-case time complexity for a subset of inputs (those with distinct keys) is a lower bound on the worst-case time complexity when all inputs are considered. Before presenting the theorem, we show our adversary's strategy.

Suppose we have some algorithm that solves the problem of finding the smallest and largest keys only by comparisons of keys. If all the keys are distinct, at any given time during the execution of the algorithm a given key has one of the following states:

| State | Description of State |
| --- | --- |
| X | The key has not been involved in a comparison. |
| L | The key has lost at least one comparison and has never won. |
| W | The key has won at least one comparison and has never lost. |
| WL | The key has won at least one comparison and has lost at least one. |

We can think of these states as containing units of information. If a key has state X, there are zero units of information. If a key has state L or W, there is one unit of information because we know that the key has either lost or won a comparison. If a key has state WL, there are two units of information because we know that the key has both won and lost comparisons. For the algorithm to establish that one key *small* is smallest and another key *large* is largest, the algorithm must know that every key other than *small* has won a comparison and every key other than *large* has lost a comparison. This means that the algorithm must learn

(*n* − 1) + (*n* − 1) = 2*n* − 2

units of information. Because the goal is to make the algorithm work as hard as possible, an adversary wants to provide as little information as possible with each comparison. For example, if the algorithm first compares *s*2 and *s*1, it doesn't matter what an adversary answers, because two units of information are supplied either way. Let's say the adversary answers that *s*2 is larger. The state of *s*2 then changes from X to W, and the state of *s*1 changes from X to L.
Suppose the algorithm next compares *s*3 and *s*1. If an adversary answers that *s*1 is larger, the state of *s*1 changes from L to WL and the state of *s*3 changes from X to L. This means that two units of information are disclosed. Because only one unit of information is disclosed by answering that *s*3 is larger, we have our adversary give this answer. [Table 8.2](#ch08table02) shows an adversary strategy that always discloses a minimal amount of information. When it doesn't matter which key is answered, we have simply chosen the first one in the comparison. This strategy is all we need to prove our theorem. But first let's show an example of how our adversary would actually use the strategy.

Table 8.2: Our Adversary's Strategy for Foiling an Algorithm That Finds the Smallest and Largest Keys\[[\*](#ftn.ch08fnt02)\]

| *s*i Before | *s*j Before | Key Declared Larger by Adversary | *s*i After | *s*j After | Units of Information Learned by Algorithm |
| --- | --- | --- | --- | --- | --- |
| X | X | *s*i | W | L | 2 |
| X | L | *s*i | W | L | 1 |
| X | W | *s*j | L | W | 1 |
| X | WL | *s*i | W | WL | 1 |
| L | L | *s*i | WL | L | 1 |
| L | W | *s*j | L | W | 0 |
| L | WL | *s*j | L | WL | 0 |
| W | W | *s*i | W | WL | 1 |
| W | WL | *s*i | W | WL | 0 |
| WL | WL | Consistent with previous answers | WL | WL | 0 |

\[[\*](#ch08fnt02)\]The keys *s*i and *s*j are being compared.

Example 8.2

[Table 8.3](#ch08table03) shows our adversary's strategy for foiling [Algorithm 8.3](#ch08ex17) for an input size of 5. We have assigned values to the keys that are consistent with the answers. Assigning such values is not really necessary, but the adversary must keep track of the order of the keys imposed by the answers, so that a consistent answer can be given when both keys have state WL. An easy way to do this is to assign values.
Furthermore, assigning values illustrates that a consistent set of answers does indeed have an input associated with it. Other than the order determined by the adversary's strategy, the answers are arbitrary. For example, when *s*3 is declared larger than *s*1, we give *s*3 the value 15. We could have given it any value greater than 10. Notice that after *s*3 is compared with *s*2, we change the value of *s*3 from 15 to 30. Remember that the adversary has no actual input when presented with the algorithm. An answer (and therefore an input value) is constructed dynamically whenever our adversary is given the decision the algorithm is currently making. It is necessary to change the value of *s*3 from 15 because *s*2 is larger than 15 and our adversary's answer is that *s*3 is larger than *s*2.

After the algorithm is done, *s*1 has lost to all other keys and *s*5 has won over all other keys. Therefore, *s*1 is smallest and *s*5 is largest. The input constructed by our adversary has *s*1 as the smallest key because this is an input that makes [Algorithm 8.3](#ch08ex17) work hardest. Notice that eight comparisons are done in [Table 8.3](#ch08table03), and the worst-case number of comparisons done by [Algorithm 8.3](#ch08ex17) when the input size is 5 is given by

2(5 − 1) = 8.

This means that our adversary succeeds in making [Algorithm 8.3](#ch08ex17) work as hard as possible when the input size is 5.
Table 8.3: Our Adversary's Answers in Attempting to Foil [Algorithm 8.3](#ch08ex17) for an Input Size of 5 (states/assigned values shown for *s*1 through *s*5)

| Comparison | Key Declared Larger by Adversary | *s*1 | *s*2 | *s*3 | *s*4 | *s*5 | Units of Information Learned by Algorithm |
| --- | --- | --- | --- | --- | --- | --- | --- |
| *s*2 < *s*1 | *s*2 | L/10 | W/20 | X | X | X | 2 |
| *s*2 > *s*1 | *s*2 | L/10 | W/20 | X | X | X | 0 |
| *s*3 < *s*1 | *s*3 | L/10 | W/20 | W/15 | X | X | 1 |
| *s*3 > *s*2 | *s*3 | L/10 | WL/20 | W/30 | X | X | 1 |
| *s*4 < *s*1 | *s*4 | L/10 | WL/20 | W/30 | W/15 | X | 1 |
| *s*4 > *s*3 | *s*4 | L/10 | WL/20 | WL/30 | W/40 | X | 1 |
| *s*5 < *s*1 | *s*5 | L/10 | WL/20 | WL/30 | W/40 | W/15 | 1 |
| *s*5 > *s*4 | *s*5 | L/10 | WL/20 | WL/30 | WL/40 | W/50 | 1 |

When presented with a different algorithm for finding the largest and smallest keys, our adversary will provide answers that try to make that algorithm work as hard as possible. You are encouraged to determine the adversary's answers when presented with [Algorithm 8.4](#ch08ex18) (Find Smallest and Largest Keys by Pairing Keys) and some input size.

When developing an adversary strategy, our goal is to make algorithms for solving some problem work as hard as possible. A poor adversary may not actually achieve this goal. However, regardless of whether or not the goal is reached, the strategy can be used to obtain a lower bound on how hard the algorithms must work. Next we do this for the adversary strategy just described.
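The strategy of Table 8.2 can also be encoded as a short routine. The sketch below is ours, not the book's: it observes that the table's choices coincide with a simple rule, namely, declare larger whichever key's victory would disclose the fewest units of information, preferring the first key on ties (the WL-versus-WL row, where any consistent answer works, falls out as zero units).

```cpp
#include <cassert>

// Key states as in the text: never compared / lost only / won only / both.
enum State { X, L, W, WL };

State after_win(State s)  { return s == X ? W : (s == L ? WL : s); }
State after_loss(State s) { return s == X ? L : (s == W ? WL : s); }

// Units disclosed by one key's transition: X->W, X->L, L->WL, and W->WL
// are one unit each; staying in the same state discloses nothing.
int disclosed(State before, State after) { return before != after; }

// Answer the comparison of two keys with states si and sj: declare the
// key larger whose victory discloses fewer units, preferring the first
// key on ties. Updates both states; returns the units learned.
int adversary(State& si, State& sj)
{
    int if_i = disclosed(si, after_win(si)) + disclosed(sj, after_loss(sj));
    int if_j = disclosed(sj, after_win(sj)) + disclosed(si, after_loss(si));
    if (if_i <= if_j) { si = after_win(si); sj = after_loss(sj); return if_i; }
    else              { sj = after_win(sj); si = after_loss(si); return if_j; }
}
```

Running this rule over all ten state pairs reproduces Table 8.2 row for row.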
Theorem 8.8

Any deterministic algorithm that can find both the smallest and the largest of *n* keys in every possible input only by comparisons of keys must in the worst case do at least the following numbers of comparisons of keys:

3*n*/2 − 2 if *n* is even, and 3(*n* − 1)/2 if *n* is odd.

Proof: We show that this is a lower bound on the worst case by showing that the algorithm must do at least this many comparisons when the keys are distinct. As noted previously, the algorithm must learn 2*n* − 2 units of information to find both the smallest and largest keys. Suppose we present our adversary with the algorithm. [Table 8.2](#ch08table02) shows that our adversary provides two units of information in a single comparison only when both keys have not been involved in previous comparisons. If *n* is even, this can be the case for at most

*n*/2 comparisons,

which means that the algorithm can get at most 2(*n*/2) = *n* units of information in this way. Because our adversary provides at most one unit of information in the other comparisons, the algorithm must do at least

2*n* − 2 − *n* = *n* − 2

additional comparisons to get all the information it needs. Our adversary therefore forces the algorithm to do at least

*n*/2 + (*n* − 2) = 3*n*/2 − 2

comparisons of keys. It is left as an exercise to analyze the case in which *n* is odd.

Because [Algorithm 8.4](#ch08ex18) does the number of comparisons in the bound in [Theorem 8.8](#ch08ex20), that algorithm is optimal in its worst-case performance. We have chosen a worthy adversary because we have found an algorithm that performs as well as the bound it provides.
We know therefore that no other adversary could provide a larger bound. [Example 8.2](#ch08ex19) illustrates why [Algorithm 8.3](#ch08ex17) is suboptimal. In that example, [Algorithm 8.3](#ch08ex17) takes eight comparisons to learn eight units of information, whereas an optimal algorithm would take only six comparisons. [Table 8.3](#ch08table03) shows that the second comparison is useless because no information is learned.

When adversary arguments are used, the adversary is sometimes called an ***oracle.*** Among ancient Greeks and Romans, an oracle was an entity with great knowledge that answered questions posed by humans.

### 8.5.3 Finding the Second-Largest Key

To find the second-largest key, we can use [Algorithm 8.2](#ch08ex15) (Find Largest Key) to find the largest key with *n* − 1 comparisons, eliminate that key, and then use [Algorithm 8.2](#ch08ex15) again to find the largest remaining key with *n* − 2 comparisons. Therefore, we can find the second-largest key with 2*n* − 3 comparisons. We should be able to improve on this, because many of the comparisons done when finding the largest key can be used to eliminate keys from contention for the second largest. That is, any key that loses to a key other than the largest cannot be the second largest. The Tournament method, described next, uses this fact.

The ***Tournament method*** is so named because it is patterned after the method used in elimination tournaments. For example, to determine the best college basketball team in the United States, 64 teams compete in the NCAA Tournament. Teams are paired and 32 games are played in the first round. The 32 victors are paired for the second round and 16 games are played. This process continues until only two teams remain for the final round. The winner of the single game in that round is the champion. It takes lg 64 = 6 rounds to determine the champion.

For simplicity, assume that the numbers are distinct and that *n* is a power of 2.
As is done in the NCAA Tournament, we pair the keys and compare the pairs in rounds until only one round remains. If there are eight keys, there are four comparisons in the first round, two in the second, and one in the last. The winner of the last round is the largest key. [Figure 8.10](#ch08fig10) illustrates the method.

The Tournament method directly applies only when *n* is a power of 2. When this is not the case, we can add enough items to the end of the array to make the array size a power of 2. For example, if the array is an array of 53 integers, we add 11 elements, each with value −∞, to the end of the array to make the array contain 64 elements. In what follows, we will simply assume that *n* is a power of 2.

[![Click To expand](https://box.kancloud.cn/ec2f80384bac441a9cb05aeb1a689d79_350x239.jpg)](fig8-10_0.jpg)

Figure 8.10: The Tournament method.

Although the winner in the last round is the largest key, the loser in that round is not necessarily the second largest. In [Figure 8.10](#ch08fig10), the second-largest key (16) loses to the largest key (18) in the second round. This is a difficulty with many actual tournaments because the two best teams do not always meet in the championship game. Anyone familiar with the American football Super Bowl is well aware of this.

To find the second-largest key, we can keep track of all the keys that lose to the largest key, and then use [Algorithm 8.2](#ch08ex15) to find the largest of those. But how can we keep track of those keys when we do not know in advance which key is the largest? We can do this by maintaining linked lists of keys, one for each key. After a key loses a match, it is added to the winning key's linked list. It is left as an exercise to write an algorithm for this method.

If *n* is a power of 2, there are *n*/2 comparisons in the first round, *n*/2² in the second round, …, and *n*/2^(lg *n*) = 1 in the last round.
The total number of comparisons in all the rounds is given by

*n*/2 + *n*/2² + ⋯ + *n*/2^(lg *n*) = *n* (1/2 + 1/4 + ⋯ + 1/*n*) = *n* (1 − 1/*n*) = *n* − 1.

The second-to-last equality is obtained by applying the result in [Example A.4](LiB0095.html#1299) in [Appendix A](LiB0093.html#1281). This is the number of comparisons needed to complete the tournament. (Notice that we find the largest key with an optimal number of comparisons.) The largest key will have been involved in lg *n* matches, which means that there will be lg *n* keys in its linked list. If we use [Algorithm 8.2](#ch08ex15), it takes lg *n* − 1 comparisons to find the largest key in this linked list. That key is the second-largest key. Therefore, the total number of comparisons needed to find the second-largest key is

(*n* − 1) + (lg *n* − 1) = *n* + lg *n* − 2.

It is left as an exercise to show that, for *n* in general,

*n* + ⌈lg *n*⌉ − 2.

This performance is a substantial improvement over using [Algorithm 8.2](#ch08ex15) twice to find the second-largest key. Recall that it takes 2*n* − 3 comparisons to do it that way. Can we obtain an algorithm that does even fewer comparisons? We use an adversary argument to show that we cannot.

Theorem 8.9

Any deterministic algorithm that can find the second largest of *n* keys in every possible input only by comparisons of keys must in the worst case do at least

*n* + ⌈lg *n*⌉ − 2 comparisons of keys.

Proof: To determine that a key is second largest, an algorithm must determine that it is the largest of the *n* − 1 keys besides the largest key. Let *m* be the number of comparisons won by the largest key. None of these comparisons is useful in determining that the second-largest key is the largest of the remaining *n* − 1 keys.
[Theorem 8.7](#ch08ex16) says that this determination requires at least *n* − 2 comparisons. Therefore, the total number of comparisons is at least *m* + *n* − 2. This means that, to prove the theorem, an adversary only needs to force the algorithm to make the largest key compete in at least ⌈lg *n*⌉ comparisons.

Our adversary's strategy is to associate a node in a tree with each key. Initially, *n* single-node trees are created, one for each key. Our adversary uses the trees to give the result of the comparison of *s*i and *s*j as follows:

- If both *s*i and *s*j are roots of trees containing the same number of nodes, the answer is arbitrary. The trees are combined by making the key that is declared smaller a child of the other key.
- If both *s*i and *s*j are roots of trees, and one tree contains more nodes than the other, the root of the smaller tree is declared smaller, and the trees are combined by making that root a child of the other root.
- If *s*i is a root and *s*j is not, *s*j is declared smaller and the trees are not changed.
- If neither *s*i nor *s*j is a root, the answer is consistent with previously assigned values and the trees are not changed.

[Figure 8.11](#ch08fig11) shows how trees would be combined when our adversary is presented with the Tournament method and an instance of size 8. When the choice is arbitrary, we make the key with the larger index the winner. Only the comparisons up to completion of the tournament are shown. After that, more comparisons would be done to find the largest of the keys that lost to the largest key, but the tree would be unchanged.

Let *size*k be the number of nodes in the tree rooted at the largest key immediately after the *k*th comparison won by that key. Then

*size*k ≤ 2 × *size*k−1,

because the number of nodes in a defeated key's tree can be no greater than the number of nodes in the victor's tree. The initial condition is *size*0 = 1.
We can solve this recurrence using the techniques in [Appendix B](LiB0102.html#1369) to conclude that

*size*k ≤ 2^k.

If a key is a root when the algorithm stops, it has never lost a comparison. Therefore, if two trees remain when the algorithm stops, two keys have never lost a comparison. At least one of those keys is not reported as the second-largest key. We can create a new input with all the other values the same, but with the values of the two roots changed so that the key that we know is not reported as the second largest is indeed the second-largest key. The algorithm would give the wrong answer for that input. So when the algorithm stops, all *n* keys must be in one tree. Clearly, the root of that tree must be the largest key. Therefore, if *m* is the total number of comparisons won by the largest key,

*n* = *size*m ≤ 2^m, which means that *m* ≥ lg *n*, and hence *m* ≥ ⌈lg *n*⌉.

The last step derives from the fact that *m* is an integer. This proves the theorem.

[![Click To expand](https://box.kancloud.cn/d0200daa693faad4f3f36ede6824c931_350x490.jpg)](fig8-11_0.jpg)

Figure 8.11: The trees created by our adversary in [Theorem 8.9](#ch08ex21) when presented with the Tournament method and an input size of 8.

Because the Tournament method performs as well as our lower bound, it is optimal. We have again found a worthy adversary. No other adversary could produce a greater bound. In the worst case, it takes at least *n* − 1 comparisons to find the largest key and at least *n* + ⌈lg *n*⌉ − 2 comparisons to find the second-largest key. Any algorithm that finds the second-largest key must also find the largest key, because to know that a key is second largest, we must know that it has lost one comparison. That loss must be to the largest key. Therefore, it is not surprising that it is harder to find the second-largest key.
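Returning to the exercise in Section 8.5.3: the Tournament method with lists of losers might be sketched as follows. This is one possible solution of ours, not the book's code; it uses 0-based `int` arrays, a `std::vector` of loser indices per key in place of linked lists, and assumes the keys are distinct and *n* ≥ 2 is a power of 2.

```cpp
#include <vector>
#include <cassert>

// A sketch (ours) of the Tournament method for the second-largest key.
// Assumes distinct keys and n >= 2 a power of 2; 0-based indexing.
int second_largest(int n, const int S[])
{
    std::vector<int> alive;                    // indices still in the tournament
    for (int i = 0; i < n; i++) alive.push_back(i);
    std::vector<std::vector<int>> lost_to(n);  // lost_to[i]: indices beaten by S[i]

    while (alive.size() > 1) {                 // play one round per iteration
        std::vector<int> next;
        for (std::size_t j = 0; j + 1 < alive.size(); j += 2) {
            int a = alive[j], b = alive[j + 1];
            int winner = (S[a] > S[b]) ? a : b;
            int loser  = (S[a] > S[b]) ? b : a;
            lost_to[winner].push_back(loser);  // record the loss
            next.push_back(winner);
        }
        alive = next;
    }
    int champ = alive[0];                      // the largest key
    // The second largest is the largest of the lg n keys that lost
    // directly to the champion (Algorithm 8.2 applied to that list).
    int second = lost_to[champ][0];
    for (int idx : lost_to[champ])
        if (S[idx] > S[second]) second = idx;
    return S[second];
}
```

The while loop does *n* − 1 comparisons in total, and the final loop does lg *n* − 1 more, matching the *n* + lg *n* − 2 count derived above. (The key values in the test below are ours, arranged in the spirit of Figure 8.10 so that 16 loses to 18 before the final round.)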
### 8.5.4 Finding the kth-Smallest Key

In general, the Selection problem entails finding the *k*th-largest or *k*th-smallest key. So far, we've discussed finding the largest key, because it has seemed more appropriate for the terms used. That is, it seems appropriate to call the largest key a winner. Here we discuss finding the *k*th-smallest key because it makes our algorithms more lucid. For simplicity, we assume that the keys are distinct.

One way to find the *k*th-smallest key in Θ(*n* lg *n*) time is to sort the keys and return the *k*th key. We develop a method that requires fewer comparisons. Recall that procedure *partition* in [Algorithm 2.7](LiB0018.html#179), which is used in Quicksort ([Algorithm 2.6](LiB0018.html#177)), partitions an array so that all keys smaller than some pivot item come before it in the array and all keys larger than that pivot item come after it. The slot at which the pivot item is located is called the *pivotpoint.* We can solve the Selection problem by partitioning until the pivot item is at the *k*th slot. We do this by recursively partitioning the left subarray (the keys smaller than the pivot item) if *k* is less than *pivotpoint*, and by recursively partitioning the right subarray if *k* is greater than *pivotpoint.* When *k* = *pivotpoint*, we're done. The following divide-and-conquer algorithm solves the problem by this method.

Algorithm 8.5: Selection

Problem: Find the *k*th-smallest key in array *S* of *n* distinct keys.
Inputs: positive integers *n* and *k* where *k* ≤ *n*, array of distinct keys *S* indexed from 1 to *n.*

Outputs: the *k*th-smallest key in *S.* It is returned as the value of function *selection.*

```
keytype selection (index low, index high, index k)
{
   index pivotpoint;

   if (low == high)
      return S[low];
   else {
      partition (low, high, pivotpoint);
      if (k == pivotpoint)
         return S[pivotpoint];
      else if (k < pivotpoint)
         return selection (low, pivotpoint - 1, k);
      else
         return selection (pivotpoint + 1, high, k);
   }
}

void partition (index low, index high,   // This is the same routine
                index& pivotpoint)       // that appears in Algorithm 2.7.
{
   index i, j;
   keytype pivotitem;

   pivotitem = S[low];                   // Choose first item for pivotitem.
   j = low;
   for (i = low + 1; i <= high; i++)
      if (S[i] < pivotitem) {
         j++;
         exchange S[i] and S[j];
      }
   pivotpoint = j;
   exchange S[low] and S[pivotpoint];    // Put pivotitem at pivotpoint.
}
```

As with our recursive functions in previous chapters, *n* and *S* are not inputs to function *selection.* The top-level call to that function would be

kthsmallest = selection(1, *n*, *k*);

As in Quicksort ([Algorithm 2.6](LiB0018.html#177)), the worst case occurs when the input to each recursive call contains one less item. This happens, for example, when the array is sorted in increasing order and *k* = *n.* [Algorithm 8.5](#ch08ex22) therefore has the same worst-case time complexity as [Algorithm 2.6](LiB0018.html#177), which means that the *worst-case time complexity of the number of comparisons of keys done by [Algorithm 8.5](#ch08ex22)* is given by

*W* (*n*) = *n*(*n* − 1)/2 ∈ Θ(*n*²).

Although the worst case is the same as that of Quicksort, we show next that [Algorithm 8.5](#ch08ex22) performs much better on the average.
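Before the average-case analysis, the quadratic worst case just described can be observed concretely. The sketch below (ours, not the book's code) translates Algorithm 8.5 to 0-based C++ `int` arrays and adds a comparison counter; on a sorted array with *k* = *n* it does exactly *n*(*n* − 1)/2 comparisons.

```cpp
#include <cassert>
#include <utility>

// A sketch (ours) of Algorithm 8.5 with 0-based indexing, int keys,
// and a counter for the comparisons of S[i] with pivotitem.
int comparisons = 0;

int partition0(int S[], int low, int high)
{
    int pivotitem = S[low];             // choose first item for pivotitem
    int j = low;
    for (int i = low + 1; i <= high; i++) {
        comparisons++;                  // the basic operation
        if (S[i] < pivotitem) {
            j++;
            std::swap(S[i], S[j]);
        }
    }
    std::swap(S[low], S[j]);            // put pivotitem at pivotpoint
    return j;
}

int selection0(int S[], int low, int high, int k)   // k is 0-based here
{
    if (low == high) return S[low];
    int p = partition0(S, low, high);
    if (k == p)      return S[p];
    else if (k < p)  return selection0(S, low, p - 1, k);
    else             return selection0(S, p + 1, high, k);
}
```

On the sorted input {1, …, 6} with *k* = *n*, each recursive call shrinks the subarray by one item, giving 5 + 4 + 3 + 2 + 1 = 15 = *n*(*n* − 1)/2 comparisons.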
**![Start Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Analysis of Algorithm 8.5

Average-Case Time Complexity (Selection)![](https://box.kancloud.cn/485bc67a7beadb016ac9d44f235f6e03_27x27.jpg)

**Basic operation**: the comparison of *S\[i\]* with *pivotitem* in *partition.*

**Input size**: *n*, the number of items in the array.

We assume that all inputs are equally likely. This means that we assume that all values of *k* are entered with equal frequency and all values of *pivotpoint* are returned with equal frequency. Let *p* stand for *pivotpoint.* There are *n* outcomes for which there is no recursive call (that is, if *p = k* for *k* = 1, 2, …, *n*). There are two outcomes for which 1 is the input size in the first recursive call (that is, if *p* = 2 with *k* = 1, or *p = n* − 1 with *k = n*). There are 2(2) = 4 outcomes for which 2 is the input size in the first recursive call (that is, if *p* = 3 with *k* = 1 or 2, or *p = n* − 2 with *k = n* − 1 or *n*). Listed below are the numbers of outcomes for all the input sizes:

| *Input Size in First Recursive Call* | *Number of Outcomes That Yield the Given Size* |
| --- | --- |
| 0 | *n* |
| 1 | 2 |
| 2 | 2(2) |
| 3 | 2(3) |
| ⋮ | ⋮ |
| *i* | 2(*i*) |
| ⋮ | ⋮ |
| *n* − 1 | 2(*n* − 1) |

It is not hard to see that, for each of these input sizes, all allowable values of *k* appear with equal frequency. Recall from the every-case analysis of [Algorithm 2.7](LiB0018.html#179) that the number of comparisons in procedure *partition* is *n* − 1. Therefore, the average is given by the following recurrence:

![](https://box.kancloud.cn/847e314c571b8245e5a9f20790cdaf97_400x32.jpg)

Using the result in [Example A.1](LiB0095.html#1293) in [Appendix A](LiB0093.html#1281) and the fact that *A* (0) = 0, and simplifying, we have

![](https://box.kancloud.cn/8d308e53c10af69142e488d91c9605e0_400x30.jpg)

Next we mimic the technique used in the average-case time complexity analysis of [Algorithm 2.6](LiB0018.html#177) (Quicksort).
That is, we multiply the expression for *A* (*n*) by *n*², apply the resulting expression to *n* − 1, subtract the expression for *n* − 1 from the one for *n*, and simplify to obtain

![](https://box.kancloud.cn/3ae32c26c5b9f737f68b9cee08c83b89_340x68.jpg)

Because *A* (0) = 0, we have the following recurrence:

![](https://box.kancloud.cn/e41defbc01f79ef0276d275030b18503_277x69.jpg)

This recurrence can be solved using induction, as described in [Section B.1](LiB0102.html#1371) in [Appendix B](LiB0102.html#1369). The solution is

![](https://box.kancloud.cn/bcf89704d2e5e5708f8e2917e40234ff_105x28.jpg)

In the same way, we can use the recurrence to show that *A* (*n*) is bounded below by a linear function. Therefore,

![](https://box.kancloud.cn/950b78c72ebaa269216a425e2a215f4e_115x21.jpg)

It is straightforward to use the recurrence for *A* (*n*) to show that, for large values of *n*,

![](https://box.kancloud.cn/7c78ff6f51fbde14947dced775d1b4f4_90x18.jpg)

**![End Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**

On the average, [Algorithm 8.5](#ch08ex22) (Selection) does only a linear number of comparisons. Of course, the reason that this algorithm performs better on the average than [Algorithm 2.6](LiB0018.html#177) (Quicksort) is that Quicksort has two calls to *partition*, whereas this algorithm has only one. However, they both degenerate into the same complexity when the input to the recursive call contains *n* − 1 items. (In this case, Quicksort inputs an empty subarray on one side.) That time complexity is quadratic. If we could somehow prevent this from happening in [Algorithm 8.5](#ch08ex22), we should be able to improve on worst-case quadratic time. Next we show how this is done.

The best thing would be if *pivotpoint* were to split the array in the middle, because then the input would be cut in half in each recursive call. Recall that the ***median*** of *n* distinct keys is the key such that half the keys are smaller and half are larger.
(This is precise only if *n* is odd.) If we could always choose the median for *pivotitem*, we would have optimal performance. But how can we determine the median? In procedure *partition*, we could try calling function *selection* with an input consisting of the original array and *k* equal to about half the size of that array. And yet this will not help because we end up back in selection with an instance of the same size as our original instance. However, the following maneuver does work. Assume for the moment that *n* is an odd multiple of 5. (We need not use 5; this is discussed at the end of the section.) We divide the *n* keys into *n*/5 groups of keys, each containing five keys. We find the median of each of these groups directly. As you are asked to show in the exercises, this can be done with six comparisons. We then call function *selection* to determine the median of the *n*/5 medians. The median of the medians is not necessarily the median of the *n* keys, but, as [Figure 8.12](#ch08fig12) shows, it is reasonably close. In that figure, the keys to the left of the smallest median (keys 2 and 3) must be less than the median of the medians, and the keys to the right of the largest median (keys 18 and 22) must be greater than the median of the medians. In general, the keys to the right of the smallest median (keys 8 and 12) and the keys to the left of the largest median (keys 6 and 14) could lie on either side of the median of medians. Notice that there are ![](https://box.kancloud.cn/d18a5c438cb071a1fef638681c807a47_92x48.jpg) [![Click To expand](https://box.kancloud.cn/589bbbce861f10ca618d7dbfceeaef7a_350x187.jpg)](fig8-12_0.jpg) Figure 8.12: Each bar represents a key. We do not know if the boldfaced keys are less than or greater than the median of the medians. keys that could be on either side of the median of the medians. 
It is not hard to see that, whenever *n* is an odd multiple of 5, there are

![](https://box.kancloud.cn/c69e728845bba7f62bb235a251027e7b_78x38.jpg)

keys that could lie on either side of the median of the medians. Therefore, there are at most

![](https://box.kancloud.cn/083ed58d56d98fa6f45a706da03238a2_356x85.jpg)

keys on one side of the median of the medians. We return to this result when we analyze the algorithm that uses this strategy. First we present the algorithm.

Algorithm 8.6: Selection Using the Median**![Start example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Problem: Find the *k*th-smallest key in the array *S* of *n* distinct keys.

Inputs: positive integers *n* and *k* where *k* ≤ *n*, array of distinct keys *S* indexed from 1 to *n.*

Outputs: the *k*th-smallest key in *S.* It is returned as the value of function *select.*

```
keytype select (int n, keytype S[], index k)
{
  return selection2 (S, 1, n, k);
}

keytype selection2 (keytype S[], index low, index high, index k)
{
  index pivotpoint;

  if (high == low)
     return S[low];
  else {
     partition2 (S, low, high, pivotpoint);
     if (k == pivotpoint)
        return S[pivotpoint];
     else if (k < pivotpoint)
        return selection2 (S, low, pivotpoint − 1, k);
     else
        return selection2 (S, pivotpoint + 1, high, k);
  }
}

void partition2 (keytype S[], index low, index high,
                 index& pivotpoint)
{
  const int arraysize = high − low + 1;
  const int r = ⌈arraysize / 5⌉;
  index i, j, mark, first, last;
  keytype pivotitem, T [1 .. r];

  for (i = 1; i <= r; i++) {
     first = low + 5*i − 5;
     last = minimum (low + 5*i − 1, high);
     T[i] = median of S[first] through S[last];
  }
  pivotitem = select (r, T, ⌊(r + 1) / 2⌋);  // Approximate the
  j = low;                                   // median.
  for (i = low; i <= high; i++)
     if (S[i] == pivotitem) {
        exchange S[i] and S[j];
        mark = j;                            // Mark where pivotitem
        j++;                                 // is placed.
     }
     else if (S[i] < pivotitem) {
        exchange S[i] and S[j];
        j++;
     }
  pivotpoint = j − 1;
  exchange S[mark] and S[pivotpoint];        // Put pivotitem at
                                             // pivotpoint.
}
```

**![End example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**

In [Algorithm 8.6](#ch08ex23), unlike our other recursive algorithms, we show a simple function that calls our recursive function. The reason is that this simple function needs to be called in two places with different inputs. That is, it is called in procedure *partition2* with *T* as the input, and at the top level as follows:

![](https://box.kancloud.cn/3beaf5771081a96f2e3f28363a39a984_237x21.jpg)

We also made the array an input to the recursive function *selection2* because the function is called to process both the global array *S* and the local array *T.* Next we analyze the algorithm.

**![Start Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Analysis of Algorithm 8.6

Worst-Case Time Complexity (Selection Using the Median)![](https://box.kancloud.cn/485bc67a7beadb016ac9d44f235f6e03_27x27.jpg)

Basic operation: the comparison of *S\[i\]* with *pivotitem* in *partition2.*

Input size: *n*, the number of items in the array.

For simplicity, we develop a recurrence assuming that *n* is an odd multiple of 5. The recurrence approximately holds for *n* in general. The components in the recurrence are as follows.

- The time in function *selection2* when called recursively from function *selection2.* As already discussed, if *n* is an odd multiple of 5, at most ![](https://box.kancloud.cn/4191200332a53b3bae8724a2c8f9ad40_98x43.jpg) end up on one side of *pivotpoint*, which means that this is the worst-case number of keys in the input to this recursive call to *selection2.*
- The time in function *selection2* when called from procedure *partition2.* The number of keys in the input to this call to *selection2* is *n*/5.
- The number of comparisons required to find the medians. As mentioned previously, the median of five numbers can be found by making six comparisons. When *n* is a multiple of 5, the algorithm finds the median of exactly *n*/5 groups of five numbers.
Therefore, the total number of comparisons required to find the medians is 6*n*/5.
- The number of comparisons required to partition the array. This number is *n* (assuming an efficient implementation of the comparisons).

We have established the following recurrence:

![](https://box.kancloud.cn/382932d251c5a7e20b4e8b958f7fbe3a_340x95.jpg)

It is possible to show that the approximate equality holds even when *n* is not an odd multiple of 5. Of course, *W* (*n*) does not really have nonintegral inputs; however, considering such inputs simplifies our analysis. The recurrence does not suggest any obvious solution that can be used in an induction argument. Furthermore, it is not solvable by any of the other techniques in [Appendix B](LiB0102.html#1369). However, we can use a technique called ***constructive induction*** to obtain a candidate solution that can be used in an induction argument. That is, because we suspect that *W* (*n*) is linear, let's assume that *W* (*m*) ≤ *cm* for all *m* < *n* and for some constant *c.* Then the recurrence implies that

![](https://box.kancloud.cn/b3760c7e4e57f56e34ad5a576c42f0b1_400x40.jpg)

Because we want to be able to conclude that *W* (*n*) ≤ *cn*, we need to solve

![](https://box.kancloud.cn/30e0399a72fd0c0e32c611b00ea84b76_175x40.jpg)

to determine a value of *c* that works in an induction argument. The solution is

![](https://box.kancloud.cn/bd37d68eb4cfc140045c3e9505ac8b34_55x18.jpg)

We then choose the smallest value of *c* that satisfies the inequality and proceed with a formal induction argument to show that, when *n* is not small, the worst-case time complexity is approximately bounded as follows:

![](https://box.kancloud.cn/534af1da4275619ef9ec873cf59eee2d_104x21.jpg)

Clearly, the inequality holds for *n* ≤ 5. It is left as an exercise to complete the induction argument. [Theorem 8.7](#ch08ex16) says that linear time is required just to solve the case where *k* = 1.
We can conclude that

![](https://box.kancloud.cn/7044d625197494ec0faec22756569c40_118x21.jpg)

**![End Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**

We have successfully solved the Selection problem with an algorithm that is linear-time in the worst case. As mentioned previously, we did not have to divide the array into groups of size 5 to do this. If *m* is the group size, any odd value of *m* ≥ 5 yields a linear-time complexity. We now present the reason; establishing the results we state is left as exercises.

For an arbitrary *m*, the recurrence for the worst-case time complexity is given by

![](https://box.kancloud.cn/b641af878ed1bf96e8b92c7be6e0b91a_400x42.jpg)

where *a* is a positive constant. The sum of the coefficients of *n* in the expressions on the right is

![](https://box.kancloud.cn/4166beefc361a67d89604154ad8a2d44_190x40.jpg)

It is straightforward to show that this expression is less than 1 if and only if *m* > 3. It is possible to show that the recurrence

![](https://box.kancloud.cn/ad96e6d97a77877297f1247379ce84f9_248x21.jpg)

has a linear solution if *p + q* < 1. Therefore, Recurrence 8.2 has a linear solution for all odd *m* ≥ 5. When *m* = 3, Recurrence 8.2 is as follows:

![](https://box.kancloud.cn/3fffd96909ac2d19400c73315075e18f_279x47.jpg)

Using induction, it is possible to show that this recurrence gives

![](https://box.kancloud.cn/0b8803e5181798581ce1b6f5202761aa_148x21.jpg)

Therefore, 5 is the smallest odd value of *m* that yields linear performance. When *m* = 7, 9, or 11, the upper bound *c* on the time complexity is slightly smaller than when *m* = 5. The value of *c* increases very slowly as *m* increases beyond 11. For *m* not small, it is possible to show that *c* is approximated by 4 lg *m.* For example, if *m* = 100, the constant is about 4 lg 100 ≈ 26.6. Our linear-time algorithm for the Selection problem is from Blum, Floyd, Pratt, Rivest, and Tarjan (1973).
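The analyses above take for granted that the median of five distinct keys can be found directly with six comparisons (one of the exercises). A sketch of one such scheme follows; the helper names and the particular comparison order are our own choices, not the book's. It first discards the smallest of the first four keys, which cannot be the median, and then extracts the second-smallest of the four keys that remain:

```cpp
#include <algorithm>
#include <cassert>

static int cmps = 0;                        // comparison counter
static bool lt(int x, int y) { ++cmps; return x < y; }

// Median of five distinct keys using exactly six comparisons.
int median5(int a, int b, int c, int d, int e) {
    using std::swap;
    if (lt(b, a)) swap(a, b);               // 1: now a <= b
    if (lt(d, c)) swap(c, d);               // 2: now c <= d
    if (lt(c, a)) { swap(a, c); swap(b, d); } // 3: now a <= b, a <= c <= d
    // a is smaller than b, c, and d, so at most one key (e) is below it;
    // a cannot be the median, and the median of the five is the
    // second-smallest of {b, c, d, e}.
    if (lt(e, b)) swap(b, e);               // 4: now b <= e
    if (lt(b, c))                           // 5
        return lt(e, c) ? e : c;            // 6: b smallest; next is min(e, c)
    else
        return lt(d, b) ? d : b;            // 6: c smallest; next is min(b, d)
}

// Exhaustive check over all 120 orderings of five distinct keys:
// the result is always the true median, using exactly six comparisons.
bool median5_ok() {
    int v[5] = {1, 2, 3, 4, 5};
    do {
        cmps = 0;
        if (median5(v[0], v[1], v[2], v[3], v[4]) != 3) return false;
        if (cmps != 6) return false;
    } while (std::next_permutation(v, v + 5));
    return true;
}
```

Since there are only 120 orderings of five distinct keys, `median5_ok` verifies the six-comparison claim exhaustively rather than by case analysis.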
The original version is more complex, but its number of comparisons of keys is only about 5.5*n.* Hyafil (1976) has shown that a lower bound for finding the *k*th-smallest key in a set of *n* keys for *k* > 1 is given by

![](https://box.kancloud.cn/71de5509d60cb909eb5265791ba13dab_238x47.jpg)

The proof can also be found in Horowitz and Sahni (1978). Notice that [Theorem 8.9](#ch08ex21) is a special case of this result. Other selection algorithms and lower bounds can be found in Schonhage, Paterson, and Pippenger (1976) and in Fussenegger and Gabow (1976).

### 8.5.5 A Probabilistic Algorithm for the Selection Problem

In obtaining lower bounds for algorithms, we have assumed the algorithms to be deterministic, but we mentioned that those bounds also hold for probabilistic algorithms. We close this discussion by presenting a probabilistic algorithm for the Selection problem to illustrate when such an algorithm is useful.

[Section 5.3](#ch08lev2sec7) presented a probabilistic algorithm, namely a Monte Carlo algorithm for approximating the efficiency of a backtracking algorithm. Recall that a Monte Carlo algorithm does not necessarily give the correct answer. Rather, it provides an estimate of that answer, and the probability that the estimate is close to the correct answer increases as the time available to the algorithm increases. Here we show a different kind of probabilistic algorithm, called a Sherwood algorithm.

A ***Sherwood algorithm*** always gives the correct answer. Such an algorithm is useful when some deterministic algorithm runs much faster on the average than it does in the worst case. Recall that this is true of [Algorithm 8.5](#ch08ex22) (Selection). The worst-case quadratic time is achieved in that algorithm when the *pivotpoint* for a particular input is repeatedly close to *low* or *high* in the recursive calls.
This happens, for example, when the array is sorted in increasing order and *k = n.* Because this is not the case for most inputs, the algorithm's average performance is linear.

Suppose that, for a particular input, we choose the pivot item at random according to a uniform distribution. Then, when the algorithm is run for that input, the *pivotpoint* is more likely to be away from the endpoints than close to them. (Randomness is reviewed in [Section A.8](LiB0100.html#1327) in [Appendix A](LiB0093.html#1281).) Therefore, linear performance is more likely. Because the number of comparisons is linear when averaged over all inputs, intuitively it seems that the ***expected value*** of the number of comparisons of keys done for a particular input should be linear when the pivot item is chosen at random according to a uniform distribution. We prove that this is the case, but first we stress the difference between this expected value and the average value obtained for [Algorithm 8.5](#ch08ex22).

Assuming that all possible inputs are presented in equal numbers, the average number of comparisons done by [Algorithm 8.5](#ch08ex22) is linear. For any given input, [Algorithm 8.5](#ch08ex22) always does the same number of comparisons, which is quadratic for some inputs. The Sherwood algorithm we are about to present does not do the same number of comparisons each time it executes on a given input. For any given input, it sometimes does a linear number of comparisons and sometimes does a quadratic number. However, if the algorithm is run many times using the same input, we can expect the average of the running times to be linear.

You may ask why we would want to use such an algorithm when [Algorithm 8.6](#ch08ex23) (Selection Using the Median) guarantees linear performance. The reason is that [Algorithm 8.6](#ch08ex23) has a high constant because of the overhead needed for approximating the median.
For a given input, our Sherwood algorithm runs faster on the average than [Algorithm 8.6](#ch08ex23). The decision about whether to use [Algorithm 8.6](#ch08ex23) or the Sherwood algorithm depends on the needs of the particular application. If better average performance is most important, we should use the Sherwood algorithm. If, on the other hand, quadratic performance can never be tolerated, we should use [Algorithm 8.6](#ch08ex23).

We stress again the advantage of the Sherwood algorithm over [Algorithm 8.5](#ch08ex22) (Selection). As long as the inputs are uniformly distributed, [Algorithm 8.5](#ch08ex22) also performs better on the average than [Algorithm 8.6](#ch08ex23). However, in some particular application, the inputs may always be ones that approach the worst case in [Algorithm 8.5](#ch08ex22). In such an application, [Algorithm 8.5](#ch08ex22) always exhibits quadratic-time performance. The Sherwood algorithm avoids this difficulty in any application by choosing the pivot item at random according to a uniform distribution. Next we present the probabilistic (Sherwood) algorithm.

Algorithm 8.7: Probabilistic Selection**![Start example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Problem: Find the *k*th-smallest key in array *S* of *n* distinct keys.
Inputs: positive integers *n* and *k* where *k* ≤ *n*, array of distinct keys *S* indexed from 1 to *n.*

Outputs: the *k*th-smallest key in *S.* It is returned as the value of function *selection3.*

```
keytype selection3 (index low, index high, index k)
{
  index pivotpoint;

  if (low == high)
     return S[low];
  else {
     partition3 (low, high, pivotpoint);
     if (k == pivotpoint)
        return S[pivotpoint];
     else if (k < pivotpoint)
        return selection3 (low, pivotpoint − 1, k);
     else
        return selection3 (pivotpoint + 1, high, k);
  }
}

void partition3 (index low, index high, index& pivotpoint)
{
  index i, j, randspot;
  keytype pivotitem;

  randspot = random index between low and high inclusive
             chosen according to a uniform distribution;
  pivotitem = S[randspot];                // Randomly choose pivotitem.
  exchange S[low] and S[randspot];        // Move pivotitem to the front.
  j = low;
  for (i = low + 1; i <= high; i++)
     if (S[i] < pivotitem) {
        j++;
        exchange S[i] and S[j];
     }
  pivotpoint = j;
  exchange S[low] and S[pivotpoint];      // Put pivotitem at pivotpoint.
}
```

**![End example](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**

[Algorithm 8.7](#ch08ex24) differs from [Algorithm 8.5](#ch08ex22) only in its random choice of the pivot item. Next we prove that the expected value of the number of comparisons is linear for any input. This analysis must be different from the average-case analysis of [Algorithm 8.5](#ch08ex22) (Selection), because in that analysis we assumed that all values of *k* are entered with equal frequency. We want to obtain a linear-time expected value for each input. Because each input has a specific value of *k*, we can't assume that all values of *k* occur with equal frequency. We show that, independent of *k*, the expected value is linear.
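A minimal runnable C++ sketch of the Sherwood strategy follows (0-based indexing, `std::mt19937` for the uniform random choice, a fixed seed, and the comparison counter are our additions for illustration). The pivot item is moved to the front, after which partitioning proceeds exactly as in Algorithm 8.5:

```cpp
#include <cassert>
#include <random>
#include <utility>
#include <vector>

static std::mt19937 gen(17);          // fixed seed so runs are reproducible
static long comparisons = 0;          // counts key comparisons in partition3

// partition3: pivot chosen uniformly at random from S[low..high],
// moved to the front, then partitioned as in Algorithm 2.7.
static int partition3(std::vector<int>& S, int low, int high) {
    std::uniform_int_distribution<int> d(low, high);
    std::swap(S[low], S[d(gen)]);     // randomly choose pivotitem
    int pivotitem = S[low];
    int j = low;
    for (int i = low + 1; i <= high; i++) {
        ++comparisons;
        if (S[i] < pivotitem)
            std::swap(S[++j], S[i]);
    }
    std::swap(S[low], S[j]);          // put pivotitem at pivotpoint
    return j;
}

// Sherwood selection: kth-smallest key (k is 1-based, as in the text).
// The answer is correct on every run; only the running time is random.
int selection3(std::vector<int>& S, int low, int high, int k) {
    if (low == high) return S[low];
    int p = partition3(S, low, high);
    if (k - 1 == p)
        return S[p];
    else if (k - 1 < p)
        return selection3(S, low, p - 1, k);
    else
        return selection3(S, p + 1, high, k);
}
```

Running it on a sorted array with *k = n*, the worst case for the deterministic Algorithm 8.5, shows a comparison count far below the roughly *n*²/2 comparisons the deterministic version would do on that input.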
**![Start Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**Analysis of Algorithm 8.7

Expected-Value Time Complexity (Probabilistic Selection)![](https://box.kancloud.cn/485bc67a7beadb016ac9d44f235f6e03_27x27.jpg)

Basic operation: the comparison of *S\[i\]* with *pivotitem* in *partition3.*

Input size: *n*, the number of items in the array.

Given that we are looking for the *k*th-smallest key in an input of size *n*, the following table shows the size of the input and the new value of *k* in the first recursive call for each value of *pivotpoint.*

| *Pivotpoint* | *Input Size* | *New Value of k* |
| --- | --- | --- |
| 1 | *n* − 1 | *k* − 1 |
| 2 | *n* − 2 | *k* − 2 |
| ⋮ | ⋮ | ⋮ |
| *k* − 1 | *n* − (*k* − 1) | 1 |
| *k* | 0 | |
| *k* + 1 | *k* | *k* |
| *k* + 2 | *k* + 1 | *k* |
| ⋮ | ⋮ | ⋮ |
| *n* | *n* − 1 | *k* |

Because all values of *pivotpoint* are equally likely, we have the following recurrence for the expected value:

![](https://box.kancloud.cn/82a26f2f8e0288b41b4d52a99aaabade_400x63.jpg)

We can analyze this recurrence using constructive induction as we did in the worst-case analysis of [Algorithm 8.6](#ch08ex23). That is, because we suspect that the recurrence is linear, we look for a value of *c* such that an induction argument proves *E* (*n*, *k*) ≤ *cn.* This time, we show the induction argument, but we leave as an exercise the determination that *c* = 4 is the smallest constant that would work.

Induction base: Because no comparisons are done for *n* = 1,

> *E* (1, *k*) = 0 ≤ 4 (1).

Induction hypothesis: Suppose that, for all *m* < *n* and all *k* ≤ *m*,

> *E* (*m*, *k*) ≤ 4*m*.
Induction step: We need to show that, for all *k* ≤ *n*, ![](https://box.kancloud.cn/a3a51010ab250af4071d3d99a63dddcc_107x20.jpg) By the recurrence and the induction hypothesis, ![](https://box.kancloud.cn/f81a42c45467155bae9414caea01224a_400x64.jpg) We have that ![](https://box.kancloud.cn/dd08af11da187579104083c535e9bacf_400x227.jpg) The third equality is obtained by twice applying the result of [Example A.1](LiB0095.html#1293) in [Appendix A](LiB0093.html#1281). The last inequality derives from the fact that, in general, *k (n − k)* ≤ *n*2/4. Plugging the result just obtained into Inequality 8.3, we have ![](https://box.kancloud.cn/7280b6fa146eccaad882fb51a0decc82_365x40.jpg) We have shown that, independent of the value of *k*, ![](https://box.kancloud.cn/cb267cd064514e9855665b50d4c71e58_174x23.jpg) **![End Sidebar](https://box.kancloud.cn/e95df9d604ab2d89febe370ae4d88fb1_1x1.gif)**
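As a concrete check on this bound, the recurrence for *E* (*n*, *k*) can be tabulated directly. The sketch below (the function name, the cutoff *N*, and the exact form of the recurrence as we read it from the pivotpoint table are our own) computes *E* (*n*, *k*) exactly for all *k* ≤ *n* ≤ *N* and reports the largest ratio *E* (*n*, *k*)/*n* observed; it stays below 4, as the induction argument guarantees:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Tabulate the expected-comparison recurrence from the pivotpoint table:
//   E(n,k) = n - 1 + (1/n) [ sum_{p=1}^{k-1} E(n-p, k-p)
//                          + sum_{p=k+1}^{n} E(p-1, k) ],   E(1,k) = 0,
// and return the largest E(n,k)/n seen over all k <= n <= N.
double max_expected_ratio(int N) {
    std::vector<std::vector<double>> E(N + 1);
    for (int n = 0; n <= N; n++)
        E[n].assign(n + 2, 0.0);            // E[1][k] = 0 by initialization
    double worst = 0.0;
    for (int n = 2; n <= N; n++)
        for (int k = 1; k <= n; k++) {
            double s = 0.0;
            for (int p = 1; p < k; p++)     // pivotpoint below k
                s += E[n - p][k - p];
            for (int p = k + 1; p <= n; p++) // pivotpoint above k
                s += E[p - 1][k];
            E[n][k] = (n - 1) + s / n;      // n - 1 comparisons, then recurse
            worst = std::max(worst, E[n][k] / n);
        }
    return worst;
}
```

For moderate *N* the observed maximum sits comfortably between the trivial lower bound of *n* − 1 comparisons and the proven upper bound of 4*n*.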