## 7.8.3 Lower Bounds for Average-Case Behavior
We obtain our results under the assumption that all possible permutations are equally likely to be the input.
If the pruned, valid, binary decision tree corresponding to a deterministic sorting algorithm for sorting *n* distinct keys contains any comparison nodes with only one child (as is the case for the tree in [Figure 7.12](LiB0062.html#706)), we can replace each such node by its child and prune the child to obtain a decision tree that sorts using no more comparisons than did the original tree. Every nonleaf in the new tree will contain exactly two children. A binary tree in which every nonleaf contains exactly two children is called a ***2-tree.*** We summarize this result with the following lemma.
**Lemma 7.5** To every pruned, valid, binary decision tree for sorting *n* distinct keys, there corresponds a pruned, valid decision 2-tree that is at least as efficient as the original tree.

Proof: The proof follows from the preceding discussion.
The ***external path length (EPL)*** of a tree is the total length of all paths from the root to the leaves. For example, for the tree in [Figure 7.11](LiB0062.html#705),
$$\mathit{EPL} = 2 + 3 + 3 + 2 + 3 + 3 = 16$$
Recall that the number of comparisons done by a decision tree to reach a leaf is the length of the path to the leaf. Therefore, the *EPL* of a decision tree is the total number of comparisons done by the decision tree to sort all possible inputs. Because there are *n*! different inputs of size *n* (when all keys are distinct) and because we are assuming all inputs to be equally likely, the average number of comparisons done by a decision tree for sorting *n* distinct keys is given by
$$\frac{\mathit{EPL}}{n!}$$
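The definitions above can be checked directly on a small tree. The following is a quick sketch in Python (not from the text; the tuple representation and the particular tree shape are illustrative assumptions):

```python
# A leaf is any non-tuple value; an internal (comparison) node is a pair
# (left, right).  The EPL sums the depths of all leaves.
def epl(tree, depth=0):
    if not isinstance(tree, tuple):
        return depth                  # a leaf: path length from the root
    left, right = tree
    return epl(left, depth + 1) + epl(right, depth + 1)

# An assumed 2-tree with 3! = 6 leaves at depths 2, 3, 3, 2, 3, 3 -- the
# shape of a pruned, valid decision tree for three distinct keys.
tree3 = (("abc", ("acb", "cab")), ("bac", ("bca", "cba")))
total = epl(tree3)        # EPL = 2 + 3 + 3 + 2 + 3 + 3 = 16
average = total / 6       # average comparisons = EPL / n! = 16/6
```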
This result enables us to prove an important lemma. First we define ***minEPL(m)*** as the minimum of the *EPL* of 2-trees containing *m* leaves. The lemma now follows.
**Lemma 7.6** Any deterministic algorithm that sorts *n* distinct keys only by comparisons of keys must on the average do at least

$$\frac{\mathit{minEPL}(n!)}{n!} \text{ comparisons.}$$
Proof: [Lemma 7.1](LiB0062.html#709) says that to every deterministic algorithm for sorting *n* distinct keys there corresponds a pruned, valid, binary decision tree containing *n*! leaves. [Lemma 7.5](#ch07ex15) says that we can convert that decision tree to a 2-tree that is at least as efficient as the original tree. Because the original tree has *n*! leaves, so must the 2-tree we obtain from it. The lemma now follows from the preceding discussion.
By [Lemma 7.6](#ch07ex16), to obtain a lower bound for the average case, we need only find a lower bound for *minEPL(m)*, which is accomplished by means of the following four lemmas.
**Lemma 7.7** Any 2-tree that has *m* leaves and whose *EPL* equals *minEPL(m)* must have all of its leaves on at most the bottom two levels.
Proof: Suppose that some 2-tree does not have all of its leaves on the bottom two levels. Let *d* be the depth of the tree, let *A* be a leaf in the tree that is not on one of the bottom two levels, and let *k* be the depth of *A*. Because nodes at the bottom level have depth *d*,

$$k \le d - 2$$
We show that this tree cannot minimize the *EPL* among trees with the same number of leaves, by developing a 2-tree with the same number of leaves and a lower *EPL.* We can do this by choosing a nonleaf *B* at level *d* – 1 in our original tree, removing its two children, and giving two children to *A*, as illustrated in [Figure 7.13](LiB0062.html#707). Clearly the new tree has the same number of leaves as the original tree. In our new tree, neither *A* nor the children of *B* are leaves, but they are leaves in our old tree. Therefore, we have decreased the *EPL* by the length of the path to *A* and by the lengths of the two paths to *B*'s children. That is, we have decreased the *EPL* by
$$k + d + d = k + 2d$$
In our new tree, *B* and the two new children of *A* are leaves, but they are not leaves in our old tree. Therefore, we have increased the *EPL* by the length of the path to *B* and the lengths of the two paths to *A*'s new children. That is, we have increased the *EPL* by
$$(d - 1) + (k + 1) + (k + 1) = d + 2k + 1$$
The net change in the *EPL* is
$$(d + 2k + 1) - (k + 2d) = k - d + 1 \le -1 < 0$$
The inequality holds because *k* ≤ *d* − 2. Because the net change in the *EPL* is negative, the new tree has a smaller *EPL.* This completes the proof that the old tree cannot minimize the *EPL* among trees with the same number of leaves.
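The bookkeeping in this proof can be verified on a concrete tree. A quick sketch in Python (not from the text; the two tree shapes are illustrative assumptions): moving *B*'s two children to *A* changes the *EPL* by exactly *k* − *d* + 1.

```python
# EPL of a tree given as nested tuples: leaves are non-tuple values.
def epl(tree, depth=0):
    if not isinstance(tree, tuple):
        return depth
    left, right = tree
    return epl(left, depth + 1) + epl(right, depth + 1)

# Depth d = 3; the leaf A sits at level k = 1 (so k <= d - 2), and
# B = ("y", "z") is a nonleaf at level d - 1 = 2.
before = ("A", (("w", "x"), ("y", "z")))
# After the swap: B loses its children (becoming a leaf) and A gains two.
after = (("A1", "A2"), (("w", "x"), "B"))
k, d = 1, 3
assert epl(after) - epl(before) == k - d + 1   # net change = -1 < 0
```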
**Lemma 7.8** Any 2-tree that has *m* leaves and whose *EPL* equals *minEPL(m)* must have

$$2^d - m \text{ leaves at level } d - 1 \quad\text{and}\quad 2m - 2^d \text{ leaves at level } d$$

and have no other leaves, where *d* is the depth of the tree.
Proof: Because [Lemma 7.7](#ch07ex17) says that all leaves are at the bottom two levels and because nonleaves in a 2-tree must have two children, it is not hard to see that there must be $2^{d-1}$ nodes at level *d* − 1. Therefore, if *r* is the number of leaves at level *d* − 1, the number of nonleaves at that level is $2^{d-1} - r$. Because nonleaves in a 2-tree have exactly two children, for every nonleaf at level *d* − 1 there are two leaves at level *d*. Because these are the only leaves at level *d*, the number of leaves at level *d* is equal to $2(2^{d-1} - r)$. Because [Lemma 7.7](#ch07ex17) says that all leaves are at level *d* or *d* − 1,
$$r + 2(2^{d-1} - r) = m$$
Simplifying yields
$$r = 2^d - m$$
Therefore, the number of leaves at level *d* is
$$m - r = m - (2^d - m) = 2m - 2^d$$
**Lemma 7.9** For any 2-tree that has *m* leaves and whose *EPL* equals *minEPL(m)*, the depth *d* is given by

$$d = \lceil \lg m \rceil$$
Proof: We prove the case where *m* is a power of 2. The proof of the general case is left as an exercise. If *m* is a power of 2, then, for some integer *k*,
$$m = 2^k$$
Let *d* be the depth of a minimizing tree. As in [Lemma 7.8](#ch07ex18), let *r* be the number of leaves at level *d* – 1. By that lemma,
$$r = 2^d - m = 2^d - 2^k$$
Because $r \ge 0$, we must have $d \ge k$. We show that assuming *d* > *k* leads to a contradiction. If *d* > *k*, then
$$r = 2^d - 2^k \ge 2^{k+1} - 2^k = 2^k = m$$
Because $r \le m$, this means that *r* = *m*, and all leaves are at level *d* − 1. But there must be some leaves at level *d.* This contradiction implies that *d* = *k*, which means that *r* = 0. Because *r* = 0,
$$2^d - m = 0$$
which means that *d* = lg *m.* Because $\lg m = \lceil \lg m \rceil$ when *m* is a power of 2, this completes the proof.
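The leaf counts of Lemma 7.8 and the depth of Lemma 7.9 can be sanity-checked together. A quick sketch in Python (not from the text), using integer bit lengths instead of floating-point logarithms: for *m* ≥ 2, `(m - 1).bit_length()` equals ⌈lg *m*⌉.

```python
# For a minimizing 2-tree with m leaves: depth d = ceil(lg m) (Lemma 7.9),
# with 2^d - m leaves at level d - 1 and 2m - 2^d at level d (Lemma 7.8).
for m in range(2, 1000):
    d = (m - 1).bit_length()      # d = ceil(lg m), exact integer arithmetic
    r = 2**d - m                  # leaves at level d - 1
    bottom = 2 * m - 2**d         # leaves at level d
    assert r >= 0 and bottom > 0  # a tree of depth d has leaves at level d
    assert r + bottom == m        # every leaf is on the bottom two levels
    assert bottom == 2 * (2**(d - 1) - r)  # two leaves per nonleaf at level d-1
```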
**Lemma 7.10** For all integers *m* > 1,

$$\mathit{minEPL}(m) \ge m \lfloor \lg m \rfloor$$
Proof: By [Lemma 7.8](#ch07ex18), any 2-tree that minimizes the *EPL* must have $2^d - m$ leaves at level *d* − 1, have $2m - 2^d$ leaves at level *d*, and have no other leaves. We therefore have
$$\mathit{minEPL}(m) = (2^d - m)(d - 1) + (2m - 2^d)d = md + m - 2^d$$
Therefore, by [Lemma 7.9](#ch07ex19),
$$\mathit{minEPL}(m) = m\lceil \lg m \rceil + m - 2^{\lceil \lg m \rceil}$$
If *m* is a power of 2, this expression clearly equals $m \lg m + m - m = m \lg m$, which equals $m \lfloor \lg m \rfloor$ in this case. If *m* is not a power of 2, then $\lceil \lg m \rceil = \lfloor \lg m \rfloor + 1$. So, in this case,
$$\mathit{minEPL}(m) = m(\lfloor \lg m \rfloor + 1) + m - 2^{\lfloor \lg m \rfloor + 1} = m \lfloor \lg m \rfloor + 2m - 2^{\lceil \lg m \rceil} > m \lfloor \lg m \rfloor$$
This inequality holds because, in general, $2m > 2^{\lceil \lg m \rceil}$. This completes the proof.
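The closed form derived in this proof can be confirmed by brute force. A quick sketch in Python (not from the text; `min_epl` is a hypothetical helper): *minEPL* satisfies the recurrence *minEPL*(1) = 0 and *minEPL*(*m*) = min over *a* + *b* = *m* of *minEPL*(*a*) + *minEPL*(*b*) + *m*, because joining two subtrees under a new root lengthens each of the *m* root-to-leaf paths by one.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_epl(m):
    # Brute-force minimum EPL over all 2-trees with m leaves.
    if m == 1:
        return 0
    return min(min_epl(a) + min_epl(m - a) + m for a in range(1, m // 2 + 1))

# Check against the closed form md + m - 2^d (with d = ceil(lg m)) from
# the proof, and against the bound m * floor(lg m) of Lemma 7.10.
for m in range(2, 128):
    d = (m - 1).bit_length()               # ceil(lg m)
    floor_lg = m.bit_length() - 1          # floor(lg m)
    assert min_epl(m) == m * d + m - 2**d  # closed form from the proof
    assert min_epl(m) >= m * floor_lg      # Lemma 7.10
```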
Now that we have a lower bound for *minEPL(m)*, we can prove our main result.
**Theorem 7.4** Any deterministic algorithm that sorts *n* distinct keys only by comparisons of keys must on the average do at least

$$\lfloor \lg(n!) \rfloor \approx n \lg n - 1.45n \text{ comparisons.}$$
Proof: By [Lemma 7.6](#ch07ex16), any such algorithm must on the average do at least
$$\frac{\mathit{minEPL}(n!)}{n!} \text{ comparisons.}$$
By [Lemma 7.10](#ch07ex20), this expression is greater than or equal to
$$\frac{n!\,\lfloor \lg(n!) \rfloor}{n!} = \lfloor \lg(n!) \rfloor$$
The proof now follows from [Lemma 7.4](LiB0063.html#718).
We see that Mergesort's average-case performance of about *n* lg *n* – 1.26*n* is near optimal for algorithms that sort only by comparisons of keys.
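How close "near optimal" is can be seen numerically. A quick sketch in Python (not from the text), using `lgamma` so that lg(*n*!) can be evaluated without forming *n*! itself:

```python
from math import lgamma, log, log2

def lg_factorial(n):
    # lgamma(n + 1) = ln(n!), so dividing by ln 2 gives lg(n!).
    return lgamma(n + 1) / log(2)

n = 1000
bound = lg_factorial(n)                 # the lower bound lg(n!), about 8529.4
approx = n * log2(n) - 1.45 * n         # the bound's familiar approximation
mergesort_avg = n * log2(n) - 1.26 * n  # mergesort's average, from the text
assert approx < bound < mergesort_avg   # mergesort sits just above the bound
```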