# Chapter 11: Introduction to Parallel Algorithms
## Overview
Suppose you want to build a fence in your backyard, and to do so you must dig 10 deep holes, one for each fence post. Realizing that it would be a laborious and unpleasant task to dig the 10 holes individually in sequence, you look for some alternative. You remember how Mark Twain's famous character Tom Sawyer tricked his friends into helping him whitewash a fence by pretending it was fun. You decide to use the same clever ruse, but you update it a bit. You pass out flyers to your health-conscious neighbors announcing a hole-digging contest in your backyard. Whoever is most fit should be able to dig a hole fastest and therefore win the contest. You offer some insignificant first prize, such as a six-pack of beer, knowing that the prize is not really important to your neighbors. Rather, they just want to prove how fit they are. On contest day, 10 strong neighbors simultaneously dig 10 holes. This is called digging the holes *in parallel*. You have saved yourself a lot of work and completed the hole digging much faster than you would have by digging the holes in sequence by yourself.
Just as you can complete the hole-digging job faster by having friends work in parallel, often a computer can complete a job faster if it has many processors executing instructions in parallel. (A ***processor*** in a computer is a hardware component that processes instructions and data.) So far we have discussed only sequential processing. That is, all of the algorithms we've presented have been designed to be implemented on a traditional sequential computer. Such a computer has one processor executing instructions in sequence, similar to your digging the 10 holes in sequence by yourself. These computers are based on the model introduced by John von Neumann. As [Figure 11.1](#ch11fig01) illustrates, this model consists of a single processor, called the central processing unit (CPU), and memory. The model takes a single sequence of instructions and operates on a single sequence of data. Such a computer is called a ***single instruction stream, single data stream*** (SISD) computer, and is popularly known as a *serial* computer.
[![Figure 11.1: A traditional serial computer](https://box.kancloud.cn/b1cedea38e03387cfc465fc30d9e36f8_350x72.jpg)](fig11-1_0.jpg)
Figure 11.1: A traditional serial computer.
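The difference between the two approaches can be sketched in C++ (the `dig_hole` function and the thread-per-hole setup below are our own illustration, not from the text): a serial computer runs one instruction stream, like one person digging hole after hole, while a parallel machine assigns a processor, modeled here by a thread, to each hole.

```cpp
#include <functional>
#include <thread>
#include <vector>

// Mark one hole as dug.  Each hole is a distinct element of the vector,
// so concurrent calls on different holes do not interfere.
void dig_hole(int hole, std::vector<int>& dug) { dug[hole] = 1; }

// Sequential (SISD-style): one "digger" handles all n holes in sequence.
std::vector<int> dig_sequentially(int n) {
    std::vector<int> dug(n, 0);
    for (int i = 0; i < n; ++i)
        dig_hole(i, dug);                     // one at a time
    return dug;
}

// Parallel: one thread ("neighbor") per hole, all digging at once.
std::vector<int> dig_in_parallel(int n) {
    std::vector<int> dug(n, 0);
    std::vector<std::thread> neighbors;
    for (int i = 0; i < n; ++i)
        neighbors.emplace_back(dig_hole, i, std::ref(dug));
    for (auto& t : neighbors) t.join();       // wait until every hole is done
    return dug;
}
```

Both versions dig the same 10 holes; the parallel one finishes in roughly the time of the single slowest dig rather than the sum of all 10.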
Many problems could be solved much faster if a computer had many processors executing instructions simultaneously (in parallel). This would be like having your 10 neighbors dig the 10 holes at the same time. For example, consider the Bayesian network introduced in [Section 6.3](LiB0050.html#606). [Figure 11.2](#ch11fig02) shows such a network. Each vertex in that network represents a possible condition of a patient. There is an edge from one vertex to another if having the condition at the first vertex could cause one to have the condition at the second vertex. For example, the top-right vertex represents the condition of being a smoker, and the edge emanating from that vertex means that smoking can cause lung cancer. A given cause does not always result in its potential effects. Therefore, the probability of each effect given each of its causes also needs to be stored in the network. For example, the probability (0.5) of being a smoker is stored at the vertex containing "Smoker." The probability (0.1) of having lung cancer, given that one is not a smoker, is stored at the vertex containing "Lung cancer." The probability (0.99) of having a positive chest x-ray, given that one has both tuberculosis and lung cancer, along with the probabilities of having a positive chest x-ray given the other three combinations of values of its causes, is stored at the vertex containing "Positive chest x-ray." The basic inference problem in a Bayesian network is to determine the probabilities of the conditions at all remaining vertices when it is learned that the conditions at certain vertices are present. For example, if a patient were known to be a smoker and to have had a positive chest x-ray, we might want to know the probabilities that the patient had lung cancer, had tuberculosis, had shortness of breath, and had recently visited Asia. Pearl (1986) developed an inference algorithm for solving this problem.
In this algorithm, each vertex sends messages to its parents and children. For example, when it is learned that the patient has a positive chest x-ray, the vertex for "Positive chest x-ray" sends messages to its parents "Tuberculosis" and "Lung cancer." When each of these vertices receives its message, the probability of the condition at the vertex is computed, and the vertex then sends messages to its own parents and other children. When those vertices receive messages, the new probabilities of the conditions at the vertices are computed, and they in turn send messages. The message-passing scheme terminates at roots and leaves. When it is learned that the patient is also a smoker, another message stream begins at that vertex. A traditional sequential computer can compute the value of only one message or one probability at a time. The value of the message to "Tuberculosis" could be computed first, then the new probability of tuberculosis, then the value of the message to "Lung cancer," then its probability, and so on.
[![Figure 11.2: A Bayesian network](https://box.kancloud.cn/bebd56d77294a2ec49d6923b0bd08463_350x354.jpg)](fig11-2_0.jpg)
Figure 11.2: A Bayesian network.
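To make the sequential message traffic concrete, here is a toy sketch. It is not Pearl's actual inference algorithm (which also computes probabilities); it only models how messages spread outward from an evidence vertex over the network's edges, with the vertex names and `Graph` encoding below being our own hypothetical rendering of Figure 11.2. A serial computer handles exactly one vertex, and therefore one message, at a time.

```cpp
#include <map>
#include <queue>
#include <set>
#include <string>
#include <vector>

// Toy encoding of a network: each vertex name maps to its neighbors
// (parents and children combined, since messages flow both ways).
using Graph = std::map<std::string, std::vector<std::string>>;

// Sequentially propagate messages outward from `evidence`.  Each vertex is
// processed one at a time, mirroring a serial computer computing one message
// or one probability per step.  Returns the order in which vertices fire.
std::vector<std::string> propagate(const Graph& g, const std::string& evidence) {
    std::vector<std::string> order;
    std::set<std::string> seen{evidence};
    std::queue<std::string> pending;
    pending.push(evidence);
    while (!pending.empty()) {
        std::string v = pending.front();
        pending.pop();
        order.push_back(v);                  // "compute" at v, one vertex per step
        for (const auto& nbr : g.at(v))      // send messages onward
            if (seen.insert(nbr).second)     // each vertex fires only once
                pending.push(nbr);
    }
    return order;
}
```

On a five-vertex network shaped like Figure 11.2, starting from "Positive chest x-ray," the vertices fire strictly one after another: first the x-ray vertex, then its two parents, then their remaining neighbors.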
If each vertex had its own processor that was capable of sending messages to the processors at the other vertices, then when it was learned that the patient had a positive chest x-ray, we could first compute and send the messages to "Tuberculosis" and "Lung cancer." When each of these vertices received its message, it could independently compute and send messages to its parents and other children. Furthermore, if we also learned that the patient was a smoker, the vertex containing "Smoker" could simultaneously be computing and sending a message to its child. Clearly, if all this were taking place simultaneously, the inference could be done much more quickly. A Bayesian network used in actual applications often contains hundreds of vertices, and the inferred probabilities are needed immediately. This means that the time savings could be quite significant.
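The one-processor-per-vertex idea can be sketched with the same toy encoding (again our own illustration, not Pearl's algorithm; the vertex names and `Graph` type are hypothetical): every vertex that has just received a message gets its own thread, so a whole wave of vertices computes and forwards messages simultaneously instead of one at a time.

```cpp
#include <map>
#include <mutex>
#include <set>
#include <string>
#include <thread>
#include <vector>

// Toy encoding of a network: each vertex name maps to its neighbors.
using Graph = std::map<std::string, std::vector<std::string>>;

// Propagate messages outward from the evidence vertices, one thread per
// vertex in the current wave.  Returns the successive waves: all vertices in
// a wave fire simultaneously, like vertices with their own processors.
std::vector<std::set<std::string>>
propagate_parallel(const Graph& g, const std::set<std::string>& evidence) {
    std::vector<std::set<std::string>> waves;
    std::set<std::string> seen = evidence;
    std::set<std::string> frontier = evidence;
    while (!frontier.empty()) {
        waves.push_back(frontier);
        std::set<std::string> next;
        std::mutex m;                        // guards seen/next across threads
        std::vector<std::thread> workers;
        for (const auto& v : frontier)
            workers.emplace_back([&, v] {
                for (const auto& nbr : g.at(v)) {
                    std::lock_guard<std::mutex> lock(m);
                    if (seen.insert(nbr).second)
                        next.insert(nbr);
                }
            });
        for (auto& t : workers) t.join();    // wait for the whole wave
        frontier = std::move(next);
    }
    return waves;
}
```

On the five-vertex example, the serial version needs five steps, whereas here the work collapses into three waves: the x-ray vertex, then both of its parents at once, then both remaining vertices at once. With hundreds of vertices, that collapse is where the significant time savings come from.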
What we have just described is an architecture for a particular kind of ***parallel computer***. Such computers are called "parallel" because each processor can execute instructions simultaneously (in parallel) with all the other processors. The cost of processors has decreased dramatically over the past three decades. Currently, the speed of an off-the-shelf microprocessor is within one order of magnitude of the speed of the fastest serial computer. However, microprocessors cost many orders of magnitude less. Therefore, by connecting microprocessors as described in the previous paragraph, it is possible to obtain computing power greater than that of the fastest serial computer for substantially less money. There are many applications that can benefit significantly from parallel computation. Applications in artificial intelligence include the Bayesian network problem described previously, inference in neural networks, natural language understanding, speech recognition, and machine vision. Other applications include database query processing, weather prediction, pollution monitoring, analysis of protein structures, and many more.
There are many ways to design parallel computers. [Section 11.1](LiB0090.html#1232) discusses some of the considerations necessary in parallel design and some of the most popular parallel architectures. [Section 11.2](LiB0091.html#1252) shows how to write algorithms for one particular kind of parallel computer, called a PRAM (for "parallel random access machine"). As we shall see, this particular kind of computer is not very practical. However, the PRAM model is a straightforward generalization of the sequential model of computation. Furthermore, a PRAM algorithm can be translated into algorithms for many practical machines. So PRAM algorithms serve as a good introduction to parallel algorithms.