## 8.2 Interpolation Search
The bounds just obtained are for algorithms that rely only on comparisons of keys. We can improve on these bounds if we use some other information to assist in our search. Recall that Barney Beagle does more than just compare keys to find Colleen Collie's number in the phone book. He does not start in the middle of the phone book because he knows that the C's are near the front. He "interpolates" and starts near the front. We develop an algorithm for this strategy next.
Suppose we are searching 10 integers, and we know that the first integer ranges from 0 to 9, the second from 10 to 19, the third from 20 to 29, …, and the tenth from 90 to 99. Then we can immediately report failure if the search key *x* is less than 0 or greater than 99, and, if neither of these is the case, we need only compare *x* with *S*[1 + ⌊*x*/10⌋]. For example, we would compare *x* = 25 with *S*[1 + ⌊25/10⌋] = *S*[3]. If they were not equal, we would report failure.
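This direct-addressing check is simple enough to sketch in C++. The `locate` name, the 1-indexing via a dummy slot, and the sample key values in the usage below are our own illustration, not from the text:

```cpp
// Direct-address lookup for the 10-slot example: slot i (1-indexed) holds a
// key in [10(i-1), 10(i-1)+9], so a key x in 0..99 can only be at index
// x/10 + 1. S[0] is a dummy so the data is 1-indexed as in the text.
int locate(const int S[11], int x) {
    if (x < 0 || x > 99) return 0;   // out of range: report failure
    int i = 1 + x / 10;              // the only slot that could hold x
    return (S[i] == x) ? i : 0;      // one comparison decides
}
```

With keys such as 5, 12, 25, 33, 41, 57, 68, 73, 88, 91, a single comparison locates *x* = 25 at slot 3 or reports failure for *x* = 26.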
We usually do not have this much information. However, in some applications it is reasonable to assume that the keys are close to being evenly distributed between the first one and the last one. In such cases, instead of checking whether *x* equals the middle key, we can check whether *x* equals the key that is located about where we would expect to find *x*. For example, if we think 10 keys are close to being evenly distributed from 0 to 99, we would expect to find *x* = 25 about in the third position, and we would compare *x* first with *S* \[3\] instead of *S* \[5\]. The algorithm that implements this strategy is called ***Interpolation Search.*** As in Binary Search, *low* is set initially to 1 and *high* to *n.* We then use linear interpolation to determine approximately where we feel *x* should be located. That is, we compute
$$\mathit{mid} = \mathit{low} + \left\lfloor \frac{(x - S[\mathit{low}])\,(\mathit{high} - \mathit{low})}{S[\mathit{high}] - S[\mathit{low}]} \right\rfloor$$
For example, if *S* \[1\] = 4 and *S* \[10\] = 97, and we were searching for *x* = 25,
$$\mathit{mid} = 1 + \left\lfloor \frac{(25 - 4)(10 - 1)}{97 - 4} \right\rfloor = 1 + \left\lfloor \frac{189}{93} \right\rfloor = 3$$
Other than the different way of computing *mid* and some extra bookkeeping, the Interpolation Search algorithm proceeds like Binary Search ([Algorithm 1.5](LiB0009.html#38)).
Algorithm 8.1: Interpolation Search

Problem: Determine whether *x* is in the sorted array *S* of size *n.*

Inputs: positive integer *n*, and sorted (nondecreasing order) array of numbers *S* indexed from 1 to *n.*
Outputs: the location *i* of *x* in *S*; 0 if *x* is not in *S.*
```
void interpsrch (int n,
                 const number S[],
                 number x,
                 index& i)
{
  index low, high, mid;
  number denominator;

  low = 1; high = n; i = 0;
  if (S[low] <= x && x <= S[high])
    while (low <= high && i == 0) {
      denominator = S[high] - S[low];
      if (denominator == 0)
        mid = low;
      else
        mid = low + ⌊(x - S[low]) * (high - low) / denominator⌋;
      if (x == S[mid])
        i = mid;
      else if (x < S[mid])
        high = mid - 1;
      else
        low = mid + 1;
    }
}
```
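For readers who want to run the algorithm directly, here is one possible C++ rendering of the pseudocode. It is our own sketch, not the book's code: it uses 0-based `std::vector` indexing and returns −1 rather than 0 on failure.

```cpp
#include <vector>

// Interpolation Search on a sorted vector; returns the index of x,
// or -1 if x is not present. Integer division supplies the floor.
int interpsrch(const std::vector<long long>& S, long long x) {
    int low = 0, high = static_cast<int>(S.size()) - 1;
    // The loop guard also keeps x inside [S[low], S[high]], so the
    // interpolated mid always lands within the current subarray.
    while (low <= high && x >= S[low] && x <= S[high]) {
        long long denominator = S[high] - S[low];
        int mid = (denominator == 0)
                      ? low
                      : low + static_cast<int>(
                            (x - S[low]) * (high - low) / denominator);
        if (x == S[mid]) return mid;
        if (x < S[mid]) high = mid - 1;
        else            low = mid + 1;
    }
    return -1;  // x not in S
}
```

For the ten keys 4, 10, 15, 25, 40, 55, 60, 70, 85, 97 and *x* = 25, the first probe lands at index 2 (value 15) and the second probe finds 25 at index 3.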
If the keys are close to being evenly spaced, Interpolation Search homes in on the possible location of *x* more quickly than does Binary Search. For instance, in the preceding example, if *x* = 25 were less than *S*\[3\], Interpolation Search would reduce the instance of size 10 to one of size 2, whereas Binary Search would reduce it to one of size 4.
Suppose that the keys are uniformly distributed between *S* \[1\] and *S* \[*n*\]. By this we mean that the probability of a randomly chosen key being in a particular range equals its probability of being in any other range of the same length. If this were the case, we would expect to find *x* at approximately the slot determined by Interpolation Search, and therefore we would expect Interpolation Search to outperform Binary Search on the average. Indeed, under the assumptions that the keys are uniformly distributed and that the search key *x* is equally likely to be in each of the array slots, it is possible to show that the average-case time complexity of Interpolation Search is given by
$$\Theta(\lg(\lg n))$$
If *n* equals one billion, lg (lg *n*) is about 5, whereas lg *n* is about 30.
A drawback of Interpolation Search is its worst-case performance. Suppose again that there are 10 keys and their values are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 100. If *x* were 10, *mid* would repeatedly be set to *low*, and *x* would be compared with every key. In the worst case, Interpolation Search degenerates to a sequential search. Notice that the worst case happens when *mid* is repeatedly set to *low.* A variation of Interpolation Search called ***Robust Interpolation Search*** remedies this situation by establishing a variable *gap* such that *mid* − *low* and *high* − *mid* are always at least *gap.* Initially we set
$$\mathit{gap} = \left\lfloor (\mathit{high} - \mathit{low} + 1)^{1/2} \right\rfloor$$
and we compute *mid* using the previous formula for linear interpolation. After that computation, the value of *mid* is possibly changed with the following computation:
$$\mathit{mid} = \min\bigl(\mathit{high} - \mathit{gap},\ \max(\mathit{mid},\ \mathit{low} + \mathit{gap})\bigr)$$
In the example where *x* = 10 and the 10 keys are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 100, *gap* would initially be ⌊(10 − 1 + 1)^1/2^⌋ = 3, *mid* would initially be 1, and we would obtain
$$\mathit{mid} = \min\bigl(10 - 3,\ \max(1,\ 1 + 3)\bigr) = \min(7,\ 4) = 4$$
In this way we guarantee that the index used for the comparison is at least *gap* positions away from *low* and *high.* Whenever the search for *x* continues in the subarray containing the larger number of array elements, the value of *gap* is doubled, but it is never made greater than half the number of array elements in that subarray. For instance, in the previous example, the search for *x* continues in the larger subarray (the one from *S* \[5\] to *S* \[10\]). Therefore, we would double *gap*, except that in this case the subarray contains only six array elements, and doubling *gap* would make it exceed half the number of array elements in the subarray. We double *gap* in order to quickly escape from large clusters. When *x* is found to lie in the subarray containing the smaller number of array elements, we reset *gap* to its initial value.
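The initial *gap* and the clamp on *mid* are small enough to sketch as C++ helpers. The function names are ours, and the doubling-and-reset bookkeeping for *gap* described above is deliberately left out:

```cpp
#include <algorithm>  // std::min, std::max
#include <cmath>      // std::sqrt

// Initial gap for a subarray spanning low..high: floor of sqrt(high - low + 1).
int initial_gap(int low, int high) {
    return static_cast<int>(std::sqrt(static_cast<double>(high - low + 1)));
}

// Clamp the interpolated mid so that it lies at least gap positions
// away from both low and high.
int robust_mid(int mid, int low, int high, int gap) {
    return std::min(high - gap, std::max(mid, low + gap));
}
```

For the example above (*low* = 1, *high* = 10, interpolated *mid* = 1), `initial_gap(1, 10)` gives 3 and `robust_mid(1, 1, 10, 3)` gives min(10 − 3, max(1, 1 + 3)) = 4.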
Under the assumptions that the keys are uniformly distributed and that the search key *x* is equally likely to be in each of the array slots, the average-case time complexity of Robust Interpolation Search is in Θ(lg(lg *n*)). Its worst-case time complexity is in Θ((lg *n*)²), which is worse than that of Binary Search but much better than that of Interpolation Search.
There are quite a few extra computations in Robust Interpolation Search relative to Interpolation Search and in Interpolation Search relative to Binary Search. In practice, one should analyze whether the savings in comparisons justifies this increase in computations.
The searches described here are also applicable to words, because words can readily be encoded as numbers. We can therefore apply Interpolation Search to searching the phone book, just as Barney Beagle does.