## 9.2 Input Size Revisited
So far it has usually sufficed to call *n* the input size in our algorithms because *n* has been a reasonable measure of the amount of data in the input. For example, in the case of sorting algorithms, *n*, the number of keys to be sorted, is a good measure of the amount of data in the input. So we called *n* the input size. However, we must not inadvertently assume that *n* is the input size in every algorithm. Consider the following algorithm, which determines whether a positive integer *n* is prime.
```
bool prime (int n)
{
    int i;
    bool isPrime;                  // "switch" is a reserved word in C++,
                                   // so we call the flag isPrime
    isPrime = true;
    i = 2;
    while (isPrime && i * i <= n) {   // i * i <= n means i ≤ n^(1/2)
        if (n % i == 0)            // % returns the remainder
            isPrime = false;       // when n is divided by i.
        else
            i++;
    }
    return isPrime;
}
```
The number of passes through the **while** loop in this prime-checking algorithm is clearly in Θ(*n*<sup>1/2</sup>). However, is it a polynomial-time algorithm? The parameter *n* is the input to the algorithm; it is not the size of the input. That is, each value of *n* constitutes an instance of the problem. This is unlike a sorting algorithm, for example, in which *n* is the number of keys and the instance is the *n* keys. If the value of *n* is the input and not the size of the input in function *prime*, what is the size of the input? We return to this question after we define input size more concretely than we did in [Section 1.3](LiB0010.html#57).
Definition For a given algorithm, the ***input size*** is defined as the number of characters it takes to write the input.
This definition is not different from that given in [Section 1.3](LiB0010.html#57). It is only more specific about how we measure the size of the input. To count the characters it takes to write the input, we need to know how the input is encoded. Suppose that we encode it in binary, which is used inside computers. Then the characters used for the encoding are binary digits (bits), and the number of characters it takes to encode a positive integer *x* is ⌊lg *x*⌋ + 1. For example, 31 = 11111₂ and ⌊lg 31⌋ + 1 = 5. We simply say that it takes about lg *x* bits to encode a positive integer *x* in binary. Suppose that we use binary encoding and we wish to determine the input size for an algorithm that sorts *n* positive integers. The integers to be sorted are the inputs to the algorithm. Therefore, the input size is the count of the number of bits it takes to encode them. If the largest integer is *L*, and we encode each integer in the number of bits needed to encode the largest, then it takes about lg *L* bits to encode each of them. The input size for the *n* integers is therefore about *n* lg *L.* Suppose that instead we choose to encode the integers in base 10. Then the characters used for the encoding are decimal digits, it takes about log *L* characters to encode the largest, and the input size for the *n* integers is about *n* log *L.* Because
*n* lg *L* = (lg 10)(*n* log *L*),
an algorithm is polynomial-time in terms of one of these input sizes if and only if it is polynomial-time in terms of the other.
If we restrict ourselves to "reasonable" encoding schemes, then the particular encoding scheme used does not affect the determination of whether an algorithm is polynomial-time. There does not seem to be a satisfactory formal definition of "reasonable." However, for most algorithms we usually agree on what is reasonable. For example, for any algorithm in this text, any base other than 1 could be used to encode an integer without affecting whether or not the algorithm is polynomial-time. We would therefore consider any such encoding system to be reasonable. Encoding in base 1, which is called the unary form, would not be considered reasonable.
In the preceding chapters, we simply called *n*, the number of keys to be sorted, the input size in our sorting algorithms. Using *n* as the input size, we showed that the algorithms are polynomial-time. Do they remain polynomial-time when we are precise about the input size? Next we illustrate that they do. When we are being precise about input size, we also need to be more precise (than we were in [Section 1.3.1](LiB0010.html#58)) about the definition of worst-case time complexity. The precise definition follows.
Definition For a given algorithm, *W*(*s*) is defined as the maximum number of steps done by the algorithm for an input size of *s.* *W*(*s*) is called the ***worst-case time complexity*** of the algorithm.
A ***step*** can be considered the equivalent of one machine comparison or assignment, or, to keep the analysis machine-independent, one bit comparison or assignment. This definition is not different from that given in [Section 1.3](LiB0010.html#57). It is only more specific about the basic operation. That is, according to this definition, each step constitutes one execution of the basic operation. We used *s* instead of *n* for the input size because (1) the parameter *n* to our algorithms is not always a measure of the input size (e.g., in the prime-checking algorithm presented at the start of this section) and (2) when *n* is a measure of the input size, it is not ordinarily a precise measure of it. According to the definition just given, we must count all the steps done by the algorithm. Let's illustrate how we can do this, while still avoiding the details of the implementation, by analyzing [Algorithm 1.3](LiB0008.html#32) (Exchange Sort). For simplicity, assume that the keys are positive integers and that there are no other fields in the records. Look again at [Algorithm 1.3](LiB0008.html#32). The number of steps done to increment loops and do branching is bounded by a constant *c* times *n*². If the integers are sufficiently large, they cannot be compared or assigned in one step by the computer. We saw a similar situation when we discussed large integer arithmetic in [Section 2.6](LiB0020.html#207). Therefore, we should not consider one key comparison or assignment as one step. To keep our analysis machine-independent, we consider one step to be either one bit comparison or one bit assignment. Therefore, if *L* is the largest integer, it takes at most lg *L* steps to compare one integer or to assign one integer.
When we analyzed [Algorithm 1.3](LiB0008.html#32) in [Sections 1.3](LiB0010.html#57) and [7.2](LiB0053.html#633), we saw that in the worst case Exchange Sort does *n*(*n* − 1)/2 comparisons of keys and 3*n*(*n* − 1)/2 assignments to sort *n* positive integers. Therefore, the maximum number of steps done by Exchange Sort is no greater than
[*n*(*n* − 1)/2] lg *L* + [3*n*(*n* − 1)/2] lg *L* + *cn*² = 2*n*(*n* − 1) lg *L* + *cn*².
Let's use *s* = *n* lg *L* as the input size. Then
2*n*(*n* − 1) lg *L* + *cn*² ≤ 2*n*(*n* lg *L*) + *cn*² ≤ 2*s* · *s* + *cs*² = (2 + *c*)*s*².

(The middle step uses *n* ≤ *n* lg *L* = *s*, which holds because lg *L* ≥ 1.)
We have shown that Exchange Sort remains polynomial-time when we are precise about the input size. We can obtain similar results for all the algorithms that we've shown to be polynomial-time using imprecise input sizes. Furthermore, we can show that the algorithms that we've shown to be nonpolynomial-time (e.g., Algorithm 3.11) remain nonpolynomial-time when we are precise about the input size. We see that when *n* is a measure of the amount of data in the input, we obtain correct results concerning whether an algorithm is polynomial-time by simply using *n* as the input size. Therefore, when this is the case, we continue to use *n* as the input size.
We return now to the prime-checking algorithm. Because the input to the algorithm is the value of *n*, the input size is the number of characters it takes to encode *n.* Recall that if we are using base 10 encoding, it takes ⌊log *n*⌋ + 1 characters to encode *n.* For example, if the number is 340, it takes 3 decimal digits, not 340, to encode it. In general, if we are using base 10 encoding, and if we set
*s* = log *n*,
then *s* is approximately the size of the input. In the worst case, there are ⌊*n*<sup>1/2</sup>⌋ − 1 passes through the loop in function *prime.* Because *n* = 10<sup>*s*</sup>, the worst-case number of passes through the loop is about 10<sup>*s*/2</sup>. Because the total number of steps is at least equal to the number of passes through the loop, the time complexity is nonpolynomial. If we use binary encoding, it takes about lg *n* characters to encode *n.* Therefore, if we use binary encoding,
*r* = lg *n*
is about equal to the input size, and the number of passes through the loop is about equal to 2<sup>*r*/2</sup>. The time complexity is still nonpolynomial. The result remains unchanged as long as we use a "reasonable" encoding scheme. We mentioned previously that we would not consider unary encoding "reasonable." If we used that encoding scheme, it would take *n* characters to encode the number *n*. For example, the number 7 would be encoded as 1111111. Using this encoding, the prime-checking algorithm has a polynomial time complexity. So we see that our results do change if we stray from a "reasonable" encoding system.
In algorithms such as the prime-checking algorithm, we call *n* a ***magnitude*** in the input. We've seen other algorithms whose time complexities are polynomials in terms of magnitude(s) but are not polynomials in terms of size. The time complexity of [Algorithm 1.7](LiB0009.html#51) for computing the *n*th Fibonacci term is in Θ(*n*). Because *n* is a magnitude in the input and lg *n* measures the size of the input, the time complexity of [Algorithm 1.7](LiB0009.html#51) is linear in terms of magnitude but exponential in terms of size. The time complexity of [Algorithm 3.2](LiB0025.html#263) for computing the binomial coefficient is in Θ(*n*²). Because *n* is a magnitude in the input and lg *n* measures the size of the input, the time complexity of [Algorithm 3.2](LiB0025.html#263) is quadratic in terms of magnitude but exponential in terms of size. The time complexity of the dynamic programming algorithm for the 0–1 Knapsack problem discussed in Section 4.4.4 is in Θ(*nW*). In this algorithm, *n* is a measure of size because it is the number of items in the input. However, *W* is a magnitude because it is the maximum capacity of the knapsack; lg *W* measures the size of *W*. This algorithm's time complexity is polynomial in terms of magnitudes and size, but it is exponential in terms of size alone.
An algorithm whose worst-case time complexity is bounded above by a polynomial function of its size and magnitudes is called ***pseudopolynomial-time***. Such an algorithm can often be quite useful because it is inefficient only when confronted with instances containing extremely large numbers. Such instances might not pertain to the application of interest. For example, in the 0–1 Knapsack problem, we might often be interested in cases where *W* is not extremely large.