## 8.2 Interpolation Search
The bounds just obtained are for algorithms that rely only on comparisons of keys. We can improve on these bounds if we use some other information to assist in our search. Recall that Barney Beagle does more than just compare keys to find Colleen Collie's number in the phone book. He does not start in the middle of the phone book because he knows that the C's are near the front. He "interpolates" and starts near the front. We develop an algorithm for this strategy next.
Suppose we are searching 10 integers, and we know that the first integer ranges from 0 to 9, the second from 10 to 19, the third from 20 to 29, …, and the tenth from 90 to 99. Then we can immediately report failure if the search key *x* is less than 0 or greater than 99, and, if neither of these is the case, we need only compare *x* with *S*[1 + ⌊*x*/10⌋]. For example, we would compare *x* = 25 with *S*[1 + ⌊25/10⌋] = *S*[3]. If they were not equal, we would report failure.
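This direct-addressing check is simple enough to sketch in C++. The `locate` name, the 1-indexing via a dummy slot, and the sample key values in the usage below are our own illustration, not from the text:

```cpp
// Direct-address lookup for the 10-slot example: slot i (1-indexed) holds a
// key in [10(i-1), 10(i-1)+9], so a key x in 0..99 can only be at index
// x/10 + 1. S[0] is a dummy so the data is 1-indexed as in the text.
int locate(const int S[11], int x) {
    if (x < 0 || x > 99) return 0;   // out of range: report failure
    int i = 1 + x / 10;              // the only slot that could hold x
    return (S[i] == x) ? i : 0;      // one comparison decides
}
```

With keys such as 5, 12, 25, 33, 41, 57, 68, 73, 88, 91, a single comparison locates *x* = 25 at slot 3 or reports failure for *x* = 26.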
We usually do not have this much information. However, in some applications it is reasonable to assume that the keys are close to being evenly distributed between the first one and the last one. In such cases, instead of checking whether *x* equals the middle key, we can check whether *x* equals the key that is located about where we would expect to find *x*. For example, if we think 10 keys are close to being evenly distributed from 0 to 99, we would expect to find *x* = 25 about in the third position, and we would compare *x* first with *S* \[3\] instead of *S* \[5\]. The algorithm that implements this strategy is called ***Interpolation Search.*** As in Binary Search, *low* is set initially to 1 and *high* to *n.* We then use linear interpolation to determine approximately where we feel *x* should be located. That is, we compute
$$\mathit{mid} = \mathit{low} + \left\lfloor \frac{(x - S[\mathit{low}])\,(\mathit{high} - \mathit{low})}{S[\mathit{high}] - S[\mathit{low}]} \right\rfloor$$
For example, if *S* \[1\] = 4 and *S* \[10\] = 97, and we were searching for *x* = 25,
$$\mathit{mid} = 1 + \left\lfloor \frac{(25 - 4)(10 - 1)}{97 - 4} \right\rfloor = 1 + \left\lfloor \frac{189}{93} \right\rfloor = 3$$
Other than the different way of computing *mid* and some extra bookkeeping, the Interpolation Search algorithm proceeds like Binary Search ([Algorithm 1.5](LiB0009.html#38)).
Algorithm 8.1: Interpolation Search

Problem: Determine whether *x* is in the sorted array *S* of size *n.*

Inputs: positive integer *n*, and sorted (nondecreasing order) array of numbers *S* indexed from 1 to *n.*
Outputs: the location *i* of *x* in *S*; 0 if *x* is not in *S.*
```
void interpsrch (int n,
                 const number S[],
                 number x,
                 index& i)
{
  index low, high, mid;
  number denominator;

  low = 1; high = n; i = 0;
  if (S[low] <= x && x <= S[high])
    while (low <= high && i == 0) {
      denominator = S[high] - S[low];
      if (denominator == 0)
        mid = low;
      else
        mid = low + ⌊(x - S[low]) * (high - low) / denominator⌋;
      if (x == S[mid])
        i = mid;
      else if (x < S[mid])
        high = mid - 1;
      else
        low = mid + 1;
    }
}
```
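For readers who want to run the algorithm directly, here is one possible C++ rendering of the pseudocode. It is our own sketch, not the book's code: it uses 0-based `std::vector` indexing and returns −1 rather than 0 on failure.

```cpp
#include <vector>

// Interpolation Search on a sorted vector; returns the index of x,
// or -1 if x is not present. Integer division supplies the floor.
int interpsrch(const std::vector<long long>& S, long long x) {
    int low = 0, high = static_cast<int>(S.size()) - 1;
    // The loop guard also keeps x inside [S[low], S[high]], so the
    // interpolated mid always lands within the current subarray.
    while (low <= high && x >= S[low] && x <= S[high]) {
        long long denominator = S[high] - S[low];
        int mid = (denominator == 0)
                      ? low
                      : low + static_cast<int>(
                            (x - S[low]) * (high - low) / denominator);
        if (x == S[mid]) return mid;
        if (x < S[mid]) high = mid - 1;
        else            low = mid + 1;
    }
    return -1;  // x not in S
}
```

For the ten keys 4, 10, 15, 25, 40, 55, 60, 70, 85, 97 and *x* = 25, the first probe lands at index 2 (value 15) and the second probe finds 25 at index 3.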
If the keys are close to being evenly spaced, Interpolation Search homes in on the possible location of *x* more quickly than does Binary Search. For instance, in the preceding example, if *x* = 25 were less than *S*\[3\], Interpolation Search would reduce the instance of size 10 to one of size 2, whereas Binary Search would reduce it to one of size 4.
Suppose that the keys are uniformly distributed between *S* \[1\] and *S* \[*n*\]. By this we mean that the probability of a randomly chosen key being in a particular range equals its probability of being in any other range of the same length. If this were the case, we would expect to find *x* at approximately the slot determined by Interpolation Search, and therefore we would expect Interpolation Search to outperform Binary Search on the average. Indeed, under the assumptions that the keys are uniformly distributed and that the search key *x* is equally likely to be in each of the array slots, it is possible to show that the average-case time complexity of Interpolation Search is given by
$$\Theta(\lg(\lg n))$$
If *n* equals one billion, lg (lg *n*) is about 5, whereas lg *n* is about 30.
A drawback of Interpolation Search is its worst-case performance. Suppose again that there are 10 keys and their values are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 100. If *x* were 10, *mid* would repeatedly be set to *low*, and *x* would be compared with every key. In the worst case, Interpolation Search degenerates to a sequential search. Notice that the worst case happens when *mid* is repeatedly set to *low.* A variation of Interpolation Search called ***Robust Interpolation Search*** remedies this situation by establishing a variable *gap* such that *mid* − *low* and *high* − *mid* are always at least *gap.* Initially we set
$$\mathit{gap} = \left\lfloor (\mathit{high} - \mathit{low} + 1)^{1/2} \right\rfloor$$
and we compute *mid* using the previous formula for linear interpolation. After that computation, the value of *mid* is possibly changed with the following computation:
$$\mathit{mid} = \min\bigl(\mathit{high} - \mathit{gap},\ \max(\mathit{mid},\ \mathit{low} + \mathit{gap})\bigr)$$
In the example where *x* = 10 and the 10 keys are 1, 2, 3, 4, 5, 6, 7, 8, 9, and 100, *gap* would initially be ⌊(10 − 1 + 1)^1/2^⌋ = 3, *mid* would initially be 1, and we would obtain
$$\mathit{mid} = \min\bigl(10 - 3,\ \max(1,\ 1 + 3)\bigr) = \min(7,\ 4) = 4$$
In this way we guarantee that the index used for the comparison is at least *gap* positions away from *low* and *high.* Whenever the search for *x* continues in the subarray containing the larger number of array elements, the value of *gap* is doubled, but it is never made greater than half the number of array elements in that subarray. For instance, in the previous example, the search for *x* continues in the larger subarray (the one from *S* \[5\] to *S* \[10\]). Therefore, we would double *gap*, except that in this case the subarray contains only six array elements, and doubling *gap* would make it exceed half the number of array elements in the subarray. We double *gap* in order to quickly escape from large clusters. When *x* is found to lie in the subarray containing the smaller number of array elements, we reset *gap* to its initial value.
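The initial *gap* and the clamp on *mid* are small enough to sketch as C++ helpers. The function names are ours, and the doubling-and-reset bookkeeping for *gap* described above is deliberately left out:

```cpp
#include <algorithm>  // std::min, std::max
#include <cmath>      // std::sqrt

// Initial gap for a subarray spanning low..high: floor of sqrt(high - low + 1).
int initial_gap(int low, int high) {
    return static_cast<int>(std::sqrt(static_cast<double>(high - low + 1)));
}

// Clamp the interpolated mid so that it lies at least gap positions
// away from both low and high.
int robust_mid(int mid, int low, int high, int gap) {
    return std::min(high - gap, std::max(mid, low + gap));
}
```

For the example above (*low* = 1, *high* = 10, interpolated *mid* = 1), `initial_gap(1, 10)` gives 3 and `robust_mid(1, 1, 10, 3)` gives min(10 − 3, max(1, 1 + 3)) = 4.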
Under the assumptions that the keys are uniformly distributed and that the search key *x* is equally likely to be in each of the array slots, the average-case time complexity of Robust Interpolation Search is in Θ(lg(lg *n*)). Its worst-case time complexity is in Θ((lg *n*)²), which is worse than that of Binary Search but much better than that of Interpolation Search.
There are quite a few extra computations in Robust Interpolation Search relative to Interpolation Search and in Interpolation Search relative to Binary Search. In practice, one should analyze whether the savings in comparisons justifies this increase in computations.
The searches described here are also applicable to words, because words can readily be encoded as numbers. We can therefore apply Interpolation Search to searching the phone book, just as Barney Beagle does.