# Chapter 11: Introduction to Parallel Algorithms

## Overview

Suppose you want to build a fence in your backyard, and it's necessary to dig 10 deep holes, one for each fence post. Realizing that it would be a laborious and unpleasant task to individually dig the 10 holes in sequence, you look for some alternative. You remember how Mark Twain's famous character Tom Sawyer tricked his friends into helping him whitewash a fence by pretending it was fun. You decide to use the same clever ruse, but you update it a bit. You pass out flyers to your health-conscious neighbors announcing a hole-digging contest in your backyard. Whoever is most fit should be able to dig a hole fastest and therefore win the contest. You offer some insignificant first prize, such as a six-pack of beer, knowing that the prize is not really important to your neighbors. Rather, they just want to prove how fit they are. On contest day, 10 strong neighbors simultaneously dig 10 holes. This is called digging the holes *in parallel*. You have saved yourself a lot of work and completed the hole digging much faster than you would have done by digging the holes in sequence by yourself.

Just as you can complete the hole-digging job faster by having friends work in parallel, often a computer can complete a job faster if it has many processors executing instructions in parallel. (A ***processor*** in a computer is a hardware component that processes instructions and data.) So far we have discussed only sequential processing. That is, all of the algorithms we've presented have been designed to be implemented on a traditional sequential computer. Such a computer has one processor executing instructions in sequence, similar to your digging the 10 holes in sequence by yourself. These computers are based on the model introduced by John von Neumann. As [Figure 11.1](#ch11fig01) illustrates, this model consists of a single processor, called the central processing unit (CPU), and memory. The model takes a single sequence of instructions and operates on a single sequence of data. Such a computer is called a ***single instruction stream, single data stream*** (SISD) computer, and is popularly known as a *serial* computer.

[![Figure 11.1](https://box.kancloud.cn/b1cedea38e03387cfc465fc30d9e36f8_350x72.jpg)](fig11-1_0.jpg)

Figure 11.1: A traditional serial computer.

Many problems could be solved much faster if a computer had many processors executing instructions simultaneously (in parallel). This would be like having your 10 neighbors dig the 10 holes at the same time. For example, consider the Bayesian network introduced in [Section 6.3](LiB0050.html#606). [Figure 11.2](#ch11fig02) shows such a network. Each vertex in that network represents a possible condition of a patient. There is an edge from one vertex to another if having the condition at the first vertex could cause one to have the condition at the second vertex. For example, the top-right vertex represents the condition of being a smoker, and the edge emanating from that vertex means that smoking can cause lung cancer. A given cause does not always result in its potential effects. Therefore, the probability of each effect given each of its causes also needs to be stored in the network. For example, the probability (0.5) of being a smoker is stored at the vertex containing "Smoker." The probability (0.1) of having lung cancer, given that one is not a smoker, is stored at the vertex containing "Lung cancer." The probability (0.99) of having a positive chest x-ray, given that one has both tuberculosis and lung cancer, along with the probabilities of having a positive chest x-ray given the other three combinations of values of its causes, are stored at the vertex containing "Positive chest x-ray."
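To make this storage scheme concrete, the sketch below shows one plausible way to represent a vertex together with its conditional probabilities, using the numbers quoted above. The `Vertex` type, its field names, and the map-based table are illustrative assumptions, not a representation taken from the text.

```cpp
#include <map>
#include <string>
#include <vector>

// One vertex of the network: the condition it represents, the vertices that
// can cause it (its parents), and a conditional probability table (CPT)
// keyed by the true/false values of the parents. A root vertex, which has
// no parents, stores its prior probability under the empty key.
struct Vertex {
    std::string condition;                    // e.g., "Lung cancer"
    std::vector<Vertex*> parents;             // edges coming into this vertex
    std::map<std::vector<bool>, double> cpt;  // parent values -> probability
};

int main() {
    Vertex smoker{"Smoker", {}, {}};
    smoker.cpt[{}] = 0.5;                // P(smoker) = 0.5

    Vertex lungCancer{"Lung cancer", {&smoker}, {}};
    lungCancer.cpt[{false}] = 0.1;       // P(lung cancer | not a smoker) = 0.1

    Vertex tuberculosis{"Tuberculosis", {}, {}};  // its own causes omitted here
    Vertex xray{"Positive chest x-ray", {&tuberculosis, &lungCancer}, {}};
    xray.cpt[{true, true}] = 0.99;       // P(positive x-ray | TB and lung cancer)
    // ...plus entries for the other three combinations of the two causes
}
```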
The basic inference problem in a Bayesian network is to determine the probability of having the conditions at all remaining vertices when it is learned that the conditions at certain vertices are present. For example, if a patient were known to be a smoker and to have had a positive chest x-ray, we might want to know the probabilities that the patient had lung cancer, had tuberculosis, had shortness of breath, and had recently visited Asia. Pearl (1986) developed an inference algorithm for solving this problem. In this algorithm, each vertex sends messages to its parents and children. For example, when it is learned that the patient has a positive chest x-ray, the vertex for "Positive chest x-ray" sends messages to its parents "Tuberculosis" and "Lung cancer." When each of these vertices receives its message, the probability of the condition at the vertex is computed, and the vertex then sends messages to its parents and other children. When those vertices receive messages, the new probabilities of their conditions are computed, and they in turn send messages. The message-passing scheme terminates at roots and leaves. When it is learned that the patient is also a smoker, another message stream begins at that vertex. A traditional sequential computer can compute the value of only one message or one probability at a time. The value of the message to "Tuberculosis" could be computed first, then the new probability of tuberculosis, then the value of the message to "Lung cancer," then its probability, and so on.

[![Figure 11.2](https://box.kancloud.cn/bebd56d77294a2ec49d6923b0bd08463_350x354.jpg)](fig11-2_0.jpg)

Figure 11.2: A Bayesian network.

If each vertex had its own processor that was capable of sending messages to the processors at the other vertices, then when it was learned that the patient had a positive chest x-ray, we could first compute and send the messages to "Tuberculosis" and "Lung cancer." When each of these vertices received its message, it could independently compute and send messages to its parents and other children. Furthermore, if we also knew that the patient was a smoker, the vertex containing "Smoker" could simultaneously be computing and sending a message to its child. Clearly, if all this were taking place simultaneously, the inference could be done much more quickly. A Bayesian network used in actual applications often contains hundreds of vertices, and the inferred probabilities are needed immediately. This means that the time savings could be quite significant.
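Pearl's actual update formulas are beyond the scope of this overview, but the concurrency pattern just described can be sketched. The following sketch, which assumes `std::thread` and a singly connected network (as Pearl's algorithm does), has each vertex forward messages to its parents and remaining children on separate threads; the numeric update is a placeholder, not Pearl's computation.

```cpp
#include <mutex>
#include <thread>
#include <vector>

struct Node {
    std::vector<Node*> parents, children;
    double probability = 0.0;
    std::mutex m;

    // Handle an incoming message: update this vertex, then forward to every
    // neighbor except the sender, each on its own thread. Propagation dies
    // out naturally at roots and leaves, which have no one left to notify.
    void receive(double message, Node* sender) {
        {
            std::lock_guard<std::mutex> lock(m);
            probability = message;  // stand-in for the real probability update
        }
        std::vector<std::thread> workers;
        for (Node* p : parents)
            if (p != sender) workers.emplace_back(&Node::receive, p, message, this);
        for (Node* c : children)
            if (c != sender) workers.emplace_back(&Node::receive, c, message, this);
        for (auto& w : workers) w.join();
    }
};

int main() {
    Node xray, tuberculosis, lungCancer;
    xray.parents = {&tuberculosis, &lungCancer};
    tuberculosis.children = {&xray};
    lungCancer.children = {&xray};
    xray.receive(1.0, nullptr);  // evidence arrives at "Positive chest x-ray"
}
```

On a sequential machine the two forwarding loops would run their bodies one after another; with one processor per vertex, the messages to "Tuberculosis" and "Lung cancer" are computed at the same time.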
What we have just described is an architecture for a particular kind of ***parallel computer***. Such computers are called "parallel" because each processor can execute instructions simultaneously (in parallel) with all the other processors. The cost of processors has decreased dramatically over the past three decades. Currently, the speed of an off-the-shelf microprocessor is within one order of magnitude of the speed of the fastest serial computer. However, microprocessors cost many orders of magnitude less.

Therefore, by connecting microprocessors as described in the previous paragraph, it is possible to obtain computing power exceeding that of the fastest serial computer for substantially less money.

There are many applications that can benefit significantly from parallel computation. Applications in artificial intelligence include the Bayesian network problem described previously, inference in neural networks, natural language understanding, speech recognition, and machine vision. Other applications include database query processing, weather prediction, pollution monitoring, analysis of protein structures, and many more.

There are many ways to design parallel computers. [Section 11.1](LiB0090.html#1232) discusses some of the considerations necessary in parallel design and some of the most popular parallel architectures. [Section 11.2](LiB0091.html#1252) shows how to write algorithms for one particular kind of parallel computer, called a PRAM (for "parallel random access machine"). As we shall see, this particular kind of computer is not very practical. However, the PRAM model is a straightforward generalization of the sequential model of computation. Furthermore, a PRAM algorithm can be translated into algorithms for many practical machines. So PRAM algorithms serve as a good introduction to parallel algorithms.
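As a small foretaste of [Section 11.2](LiB0091.html#1252), the sketch below has the flavor of a PRAM algorithm: it sums *n* numbers in about lg *n* parallel rounds by adding pairs concurrently, with `std::thread` standing in for PRAM processors. It is an illustration under those assumptions, not the book's PRAM notation.

```cpp
#include <cstddef>
#include <iostream>
#include <thread>
#include <vector>

// PRAM-flavored parallel sum: in each round, "processor" i adds the element
// one stride away into its own slot, halving the number of partial sums.
// After ceil(lg n) rounds the total sits in S[0]. A real PRAM runs each
// round in lockstep; joining all threads per round approximates that.
int main() {
    std::vector<int> S = {3, 1, 4, 1, 5, 9, 2, 6};
    const std::size_t n = S.size();
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        std::vector<std::thread> procs;
        for (std::size_t i = 0; i + stride < n; i += 2 * stride)
            procs.emplace_back([&S, i, stride] { S[i] += S[i + stride]; });
        for (auto& t : procs) t.join();  // barrier: one parallel step per round
    }
    std::cout << "sum = " << S[0] << '\n';  // prints sum = 31
}
```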