With the availability of the hardware, the most pressing question in parallel computing today is: How to program parallel computers to solve problems efficiently and in a practical and economically feasible way? As is the case in the sequential world, parallel computing requires algorithms, programming languages, and compilers, as well as operating systems, in order to actually perform a computation on the parallel hardware. All these ingredients of parallel computing are currently receiving a good deal of well-deserved attention from researchers.
This book is about one (and perhaps the most fundamental) aspect of parallelism, namely, parallel algorithms. A parallel algorithm is a solution method for a given problem destined to be performed on a parallel computer. In order to properly design such algorithms, one needs to have a clear understanding of the model of computation underlying the parallel computer.
1.2 MODELS OF COMPUTATION
Any computer, whether sequential or parallel, operates by executing instructions on
data. A stream of instructions (the algorithm) tells the computer what to do at each
step. A stream of data (the input to the algorithm) is affected by these instructions.
Depending on whether there is one or several of these streams, we can distinguish
among four classes of computers:
1. Single Instruction stream, Single Data stream (SISD)
2. Multiple Instruction stream, Single Data stream (MISD)
3. Single Instruction stream, Multiple Data stream (SIMD)
4. Multiple Instruction stream, Multiple Data stream (MIMD).
We now examine each of these classes in some detail. In the discussion that follows we shall not be concerned with input, output, or peripheral units that are available on every computer.
1.2.1 SISD Computers
A computer in this class consists of a single processing unit receiving a single stream of instructions that operate on a single stream of data, as shown in Fig. 1.1. At each step during the computation the control unit emits one instruction that operates on a datum obtained from the memory unit. Such an instruction may tell the processor, for example, to perform some arithmetic or logic operation on the datum and then put it back in memory.

Figure 1.1 SISD computer. (A single control unit sends one instruction stream to one processor, which exchanges a single data stream with the memory.)
The overwhelming majority of computers today adhere to this model invented
by John von Neumann and his collaborators in the late 1940s. An algorithm for a
computer in this class is said to be sequential (or serial).
Example 1.1
In order to compute the sum of n numbers, the processor needs to gain access to the memory n consecutive times and each time receive one number. There are also n - 1 additions involved that are executed in sequence. Therefore, this computation requires on the order of n operations in total.
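A minimal sketch of this sequential computation (in Python, added here for illustration; the function name is ours, not the text's) makes the n - 1 additions explicit:

def sequential_sum(numbers):
    """Add n numbers on a single processor: n memory accesses, n - 1 additions."""
    total = numbers[0]          # first number fetched from memory
    for x in numbers[1:]:       # the remaining n - 1 numbers, one per step
        total = total + x       # n - 1 additions executed in sequence
    return total

print(sequential_sum([3, 1, 4, 1, 5, 9, 2, 6]))   # 31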
This example shows that algorithms for SISD computers do not contain any parallelism. The reason is obvious: there is only one processor! In order to obtain from a computer the kind of parallel operation defined earlier, it will need to have several processors. This is provided by the next three classes of computers, the classes of interest in this book. In each of these classes, a computer possesses N processors, where N > 1.
1.2.2 MISD Computers
Here, N processors each with its own control unit share a common memory unit
where data reside, as shown in Fig. 1.2. There are N streams of instructions and one
stream of data. At each step, one datum received from memory is operated upon by all
the processors simultaneously, each according to the instruction it receives from its
control. Thus, parallelism is achieved by letting the processors do different things at
the same time on the same datum. This class of computers lends itself naturally to
those computations requiring an input to be subjected to several operations, each
receiving the input in its original form. Two such computations are now illustrated.
Figure 1.2 MISD computer. (The N processors, each driven by its own control unit and instruction stream, all operate on the single data stream held in the shared memory.)
Example 1.2
It is required to determine whether a given positive integer z has no divisors except 1 and itself. The obvious solution to this problem is to try all possible divisors of z: If none of these succeeds in dividing z, then z is said to be prime; otherwise z is said to be composite.
We can implement this solution as a parallel algorithm on an MISD computer. The idea is to split the job of testing potential divisors among processors. Assume that there are as many processors on the parallel computer as there are potential divisors of z. All processors take z as input, then each tries to divide it by its associated potential divisor and issues an appropriate output based on the result. Thus it is possible to determine in one step whether z is prime. More realistically, if there are fewer processors than potential divisors, then each processor can be given the job of testing a different subset of these divisors. In either case, a substantial speedup is obtained over a purely sequential implementation.
Although more efficient solutions to the problem of primality testing exist, we have chosen the simple one as it illustrates the point without the need for much mathematical sophistication. □
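The division of labour just described can be sketched as follows (illustrative Python only; the MISD processors are simulated by a list of per-divisor tests, and the name is_prime_misd is ours):

def is_prime_misd(z):
    """Simulate the MISD scheme: one 'processor' per potential divisor of z.
    Every processor receives the same datum z and applies a different test
    (division by its own trial divisor).  On a real MISD machine all of these
    trial divisions would take place in the same step.
    """
    if z < 2:
        return False
    trial_divisors = range(2, z)                        # one divisor per processor
    verdicts = [z % d != 0 for d in trial_divisors]     # all tests on the same datum
    return all(verdicts)                                # prime iff no processor finds a divisor

print(is_prime_misd(13), is_prime_misd(15))             # True False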
Example 1.3
In many applications, we often need to determine to which of a number of classes a given object belongs. The object may be a mathematical one, where it is required to associate a number with one of several sets, each with its own properties. Or it may be a physical one: A robot scanning the deep-sea bed "sees" different objects that it has to recognize in order to distinguish among fish, rocks, algae, and so on. Typically, membership of the object is determined by subjecting it to a number of different tests.
The classification can be done very quickly on an MISD computer with as many processors as there are classes. Each processor is associated with a class and can recognize members of that class. Given an object to be classified, it is sent simultaneously to all processors where it is tested in parallel. The object belongs to the class associated with that processor that reports the success of its test. (Of course, it may be that the object does not belong to any of the classes tested for, in which case all processors report failure.) As in example 1.2, when fewer processors than classes are available, several tests are performed by each processor; here, however, in reporting success, a processor must also provide the class to which the object belongs. □
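In the same spirit, a small sketch of the classification scheme (illustrative Python; the classes and membership tests are invented for the example):

def classify_misd(obj, tests):
    """tests maps a class name to a membership test; each test plays the role
    of one MISD processor examining the same object in parallel."""
    successes = [name for name, test in tests.items() if test(obj)]
    return successes[0] if successes else None    # None: all processors report failure

tests = {
    "even":     lambda n: n % 2 == 0,
    "square":   lambda n: int(n ** 0.5) ** 2 == n,
    "negative": lambda n: n < 0,
}
print(classify_misd(9, tests))    # 'square'
print(classify_misd(7, tests))    # None: belongs to none of the classes tested for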
The preceding examples show that the class of MISD computers could be extremely useful in many applications. It is also apparent that the kinds of computations that can be carried out efficiently on these computers are of a rather specialized nature. For most applications, MISD computers would be rather awkward to use. Parallel computers that are more flexible, and hence suitable for a wide range of problems, are described in the next two sections.
1.2.3 SIMD Computers
In this class, a parallel computer consists of N identical processors, as shown in Fig. 1.3.

Figure 1.3 SIMD computer. (N processors, each with its own data stream and connected to a shared memory or an interconnection network, all driven by a single instruction stream from one control unit.)

Each of the N processors possesses its own local memory where it can store both
programs and data. All processors operate under the control of a single instruction
stream issued by a central control unit. Equivalently, the N processors may be
assumed to hold identical copies of a single program, each processor's copy being
stored in its local memory. There are N data streams, one per processor.
The processors operate synchronously: At each step, all processors execute the same instruction, each on a different datum. The instruction could be a simple one (such as adding or comparing two numbers) or a complex one (such as merging two lists of numbers). Similarly, the datum may be simple (one number) or complex (several numbers). Sometimes, it may be necessary to have only a subset of the processors execute an instruction. This information can be encoded in the instruction itself, thereby telling a processor whether it should be active (and execute the instruction) or inactive (and wait for the next instruction). There is a mechanism, such as a global clock, that ensures lock-step operation. Thus processors that are inactive during an instruction or those that complete execution of the instruction before others may stay idle until the next instruction is issued. The time interval between two instructions may be fixed or may depend on the instruction being executed.
In most interesting problems that we wish to solve on an SIMD computer, it is
desirable for the processors to be able to communicate among themselves during the
computation in order to exchange data or intermediate results. This can be achieved
in two ways, giving rise to two subclasses: SIMD computers where communication is
through a shared memory and those where it is done via an interconnection network.
1.2.3.1 Shared-Memory (SM) SIMD Computers. This class is also known in the literature as the Parallel Random-Access Machine (PRAM) model.
Here, the N processors share a common memory that they use in the same way a
group of people may use a bulletin board. When two processors wish to communicate,
they do so through the shared memory. Say processor i wishes to pass a number to
processor j. This is done in two steps. First, processor i writes the number in the
shared memory at a given location known to processor j. Then, processor j reads the
number from that location.
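A toy sketch of this two-step exchange (illustrative Python; the shared memory is modeled as a list, and the agreed-upon location index is an assumption of the sketch, not something prescribed by the model):

shared_memory = [None] * 8        # the "bulletin board" shared by all processors
MAILBOX_FOR_J = 5                 # a location known to both processors (assumed here)

def processor_i_writes(value):
    shared_memory[MAILBOX_FOR_J] = value     # step 1: processor i writes the number

def processor_j_reads():
    return shared_memory[MAILBOX_FOR_J]      # step 2: processor j reads it back

processor_i_writes(42)
print(processor_j_reads())        # 42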
During the execution of a parallel algorithm, the N processors gain access to the
shared memory for reading input data, for reading or writing intermediate results, and
for writing final results. The basic model allows all processors to gain access to the
shared memory simultaneously if the memory locations they are trying to read from
or write into are different. However, the class of shared-memory SIMD computers can
be further divided into four subclasses, according to whether two or more processors
can gain access to the same memory location simultaneously:
(i) Exclusive-Read, Exclusive-Write (EREW) SM SIMD Computers. Access to memory locations is exclusive. In other words, no two processors are allowed simultaneously to read from or write into the same memory location.
(ii) Concurrent-Read, Exclusive-Write (CREW) SM SIMD Computers. Multiple processors are allowed to read from the same memory location but the right to write is still exclusive: No two processors are allowed to write into the same location simultaneously.
(iii) Exclusive-Read, Concurrent-Write (ERCW) SM SIMD Computers. Multiple processors are allowed to write into the same memory location but read accesses remain exclusive.
(iv) Concurrent-Read, Concurrent-Write (CRCW) SM SIMD Computers.
Both multiple-read and multiple-write privileges are granted.
Allowing multiple-read accesses to the same address in memory should in principle pose no problems (except perhaps some technological ones to be discussed later). Conceptually, each of the several processors reading from that location makes a copy of the location's contents and stores it in its own local memory.
With multiple-write accesses, however, difficulties arise. If several processors are attempting simultaneously to store (potentially different) data at a given address, which of them should succeed? In other words, there should be a deterministic way of specifying the contents of that address after the write operation. Several policies have been proposed to resolve such write conflicts, thus further subdividing classes (iii) and (iv). Some of these policies are

(a) the smallest-numbered processor is allowed to write, and access is denied to all other processors;
(b) all processors are allowed to write provided that the quantities they are attempting to store are equal, otherwise access is denied to all processors; and
(c) the sum of all quantities that the processors are attempting to write is stored.
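These three policies can be mimicked in a few lines (illustrative Python; the policy names "priority", "common", and "sum" are ours, and on a real CRCW machine the conflict is resolved in hardware within a single step):

def crcw_write(attempts, policy):
    """attempts: list of (processor_index, value) pairs aimed at one memory location."""
    if policy == "priority":                 # policy (a): smallest-numbered processor wins
        return min(attempts)[1]
    if policy == "common":                   # policy (b): allowed only if all values agree
        values = {v for _, v in attempts}
        if len(values) == 1:
            return values.pop()
        raise ValueError("access denied: processors disagree")
    if policy == "sum":                      # policy (c): the sum of all values is stored
        return sum(v for _, v in attempts)
    raise ValueError("unknown policy")

attempts = [(3, 10), (1, 10), (7, 10)]
print(crcw_write(attempts, "priority"), crcw_write(attempts, "sum"))   # 10 30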
A typical representative of the class of problems that can be solved on parallel
computers of the SM SIMD family is given in the following example.
Example 1.4
Consider a very large computer file consisting of n distinct entries. We shall assume for simplicity that the file is not sorted in any order. (In fact, it may be the case that keeping the file sorted at all times is impossible or simply inefficient.) Now suppose that it is required to determine whether a given item x is present in the file in order to perform a standard database operation, such as read, update, or delete. On a conventional (i.e., SISD) computer, retrieving x requires n steps in the worst case, where each step is a comparison between x and a file entry. The worst case clearly occurs when x is either equal to the last entry or not equal to any entry. On the average, of course, we expect to do a little better: If the file entries are distributed uniformly over a given range, then half as many steps are required to retrieve x.
The job can be done a lot faster on an EREW SM SIMD computer with N
processors, where N R. We now discuss a number of
features of this model.
(i) Price. The first question to ask is: What is the price paid to fully interconnect N processors? There are N - 1 lines leaving each processor for a total of N(N - 1)/2 lines. Clearly, such a network is too expensive, especially for large values of N. This is particularly true if we note that with N processors the best we can hope for is an N-fold reduction in the number of steps required by a sequential algorithm, as shown in section 1.3.1.3.
(ii) Feasibility. Even if we could afford such a high price, the model is unrealistic in practice, again for large values of N. Indeed, there is a limit on the number of lines that can be connected to a processor, and that limit is dictated by the actual physical size of the processor itself.
(iii) Relation to SM SIMD. Finally, it should be noted that the fully interconnected model as described is weaker than a shared-memory computer for the same reason as the R-block shared memory: No more than one processor can gain access simultaneously to the memory block associated with another processor. Allowing the latter would yield a cost of N^2 x f(M/N), which is about the same as for the SM SIMD (not counting the quadratic cost of the two-way lines): This clearly would defeat our original purpose of getting a more feasible machine!
Simple Networks for SIMD Computers. It is fortunate that in most appli-
cations a small subset of all pairwise connections is usually sufficient to obtain a good
performance. The most popular of these networks are briefly outlined in what follows.
Keep in mind that since two processors can communicate in a constant number of
steps on an SM SIMD computer, any algorithm for an interconnection-network SIMD
computer can be simulated on the former model in no more steps than required to
execute it by the latter.
(i) Linear Array. The simplest way to interconnect N processors is in the form of a one-dimensional array, as shown in Fig. 1.6 for N = 6. Here, processor P_i is linked to its two neighbors P_{i-1} and P_{i+1} through a two-way communication line. Each of the end processors, namely, P_1 and P_N, has only one neighbor.
(ii) Two-Dimensional Array. A two-dimensional network is obtained by arranging the N processors into an m x m array, where m = N^(1/2), as shown in Fig. 1.7 for m = 4. The processor in row j and column k is denoted by P(j, k), where 0 <= j, k <= m - 1.
(v) Cube Connection. Assume that N = 2^q for some q >= 1 and let N processors be available, P_0, P_1, ..., P_{N-1}. A q-dimensional cube (or hypercube) is obtained by connecting each processor to q neighbors. The q neighbors P_j of P_i are defined as follows: The binary representation of j is obtained from that of i by complementing a single bit. This is illustrated in Fig. 1.10 for q = 3. The indices of P_0, P_1, ..., P_7 are given in binary notation. Note that each processor has three neighbors.
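Since the neighbor rule is just a single-bit complement, it can be sketched directly (illustrative Python; the exclusive-or singles out one bit position at a time):

def hypercube_neighbors(i, q):
    """Indices of the q neighbors of processor P_i in a q-dimensional cube:
    complement one bit of the binary representation of i."""
    return [i ^ (1 << b) for b in range(q)]

q = 3
for i in range(2 ** q):
    print(format(i, "03b"), "->", [format(j, "03b") for j in hypercube_neighbors(i, q)])
# e.g. the first line printed is: 000 -> ['001', '010', '100']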
There are several other interconnection networks besides the ones just de-
scribed. The decision regarding which of these to use largely depends on the
application and in particular on such factors as the kinds of computations to be
performed, the desired speed of execution, and the number of processors available. We
conclude this section by illustrating a parallel algorithm for an SIMD computer that
uses an interconnection network.
Figure 1.10 Cube connection. (Processor indices are given in binary; each processor is joined to the three processors whose indices differ from its own in exactly one bit.)

Example 1.5
Assume that the sum of n numbers x_1, x_2, ..., x_n needs to be computed. There are n - 1 additions involved in this computation, and a sequential algorithm running on a conventional (i.e., SISD) computer will require n steps to complete it, as mentioned in example 1.1. Using a tree-connected SIMD computer with log n levels and n/2 leaves, the job can be done in log n steps, as shown in Fig. 1.11 for n = 8.
The original input is received at the leaves, two numbers per leaf. Each leaf adds its inputs and sends the result to its parent. The process is now repeated at each subsequent level: Each processor receives two inputs from its children, computes their sum, and sends it to its parent. The final result is eventually produced by the root. Since at each level all the processors operate in parallel, the sum is computed in log n steps. This compares very favorably with the sequential computation.
The improvement in speed is even more dramatic when m sets, each of n numbers, are available and the sum of each set is to be computed. A conventional machine requires nm steps in this case. A naive application of the parallel algorithm produces the m sums in m(log n) steps.

Figure 1.11 Adding eight numbers on a processor tree. (Inputs x_1, x_2, ..., x_8 enter at the leaves; the output, their sum, leaves the root.)

Through a process known as pipelining, however, we can do significantly
better. Notice that once a set has been processed by the leaves, they are free to receive the next one. The same observation applies to all processors at higher levels. Hence each of the m - 1 sets that follow the initial one can be input to the leaves one step after its predecessor. Once the first sum exits from the root, a new sum is produced in the next step. The entire process therefore takes log n + m - 1 steps.
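The tree computation, together with the pipelined timing for m sets, can be sketched as follows (illustrative Python; each pass of the loop stands for one parallel step performed by an entire level of processors):

from math import log2

def tree_sum(numbers):
    """Sum n = 2^k numbers on a processor tree: the leaves add pairs, then each
    level adds the partial sums of its children, for log n levels in all."""
    level = [numbers[i] + numbers[i + 1] for i in range(0, len(numbers), 2)]   # leaves
    while len(level) > 1:                                                      # interior levels
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

n, m = 8, 5
print(tree_sum([1, 2, 3, 4, 5, 6, 7, 8]))   # 36, computed in log n = 3 parallel steps
print(int(log2(n)) + m - 1)                 # pipelined time for m sets: log n + m - 1 = 7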
It should be clear from our discussion so far that SIMD computers are
considerably more versatile than those conforming to the MISD model. Numerous
problems covering a wide variety of applications can be solved by parallel algorithms
on SIMD computers. Also, as shown by examples 1.4 and 1.5, algorithms for these computers are relatively easy to design, analyze, and implement. In one respect, however, this class of computers is restricted to those problems that can be subdivided into a set of identical subproblems, all of which are then solved simultaneously by the same set of instructions. Obviously, there are many computations that do not fit this pattern. In some problems it may not be possible or desirable to execute all instructions
some problems it may not be possible or desirable to execute all instructions
synchronously. Typically, such problems are subdivided into subproblems that are
not necessarily identical and cannot or should not be solved by the same set of
instructions. To solve these problems, we turn to the class of MIMD computers.
1.2.4 MIMD Computers
This class of computers is the most general and most powerful in our paradigm of
parallel computation that classifies parallel computers according to whether the
instruction and/or the data streams are duplicated. Here we have N processors, N
streams of instructions, and N streams of data, as shown in Fig. 1.12. The processors
here are of the type used in MISD computers in the sense that each possesses its own
control unit in addition to its local memory and arithmetic and logic unit. This makes
these processors more powerful than the ones used for SIMD computers.
Each processor operates under the control of an instruction stream issued by its
control unit. Thus the processors are potentially all executing different programs on
different data while solving different subproblems of a single problem. This means that
the processors typically operate asynchronously. As with SIMD computers, commu-
nication between processors is performed through a shared memory or an intercon-
nection network. MIMD computers sharing a common memory are often referred to
as multiprocessors (or tightly coupled machines) while those with an interconnection
network are known as multicomputers (or loosely coupled machines).
Since the processors on a multiprocessor computer share a common memory,
the discussion in section 1.2.3.1 regarding the various modes of concurrent memory
access applies here as well. Indeed, two or more processors executing an asynchronous
algorithm may, by accident or by design, wish to gain access to the same memory
location. We can therefore talk of EREW, CREW, ERCW, and CRCW SM MIMD
computers and algorithms, and various methods should be established for resolving
memory access conflicts in models that disallow them.
Figure 1.12 MIMD computer. (N processors, each receiving its own instruction stream from its own control unit and operating on its own data stream, connected through a shared memory or an interconnection network.)
Multicomputers are sometimes referred to as distributed systems. The distinction
is usually based on the physical distance separating the processors and is therefore
often subjective. A rule of thumb is the following: If all the processors are in close
proximity of one another (they are all in the same room, say), then they are a
multicomputer; otherwise (they are in different cities, say) they are a distributed
system. The nomenclature is relevant only when it comes to evaluating parallel
algorithms. Because processors in a distributed system are so far apart, the number of
data exchanges among them is significantly more important than the number of
computational steps performed by any of them.
The following example examines an application where the great flexibility of
MIMD computers is exploited.
Example 1.6
Computer programs that play games of strategy, such as chess, do so by generating and
searching so-called game trees. The root of the tree is the current game configuration or position from which the program is to make a move. Children of the root represent all the positions reached through one move by the program. Nodes at the next level represent all positions reached through the opponent's reply. This continues up to some predefined number of levels. Each leaf position is now assigned a value representing its "goodness" from the program's point of view. The program then determines the path leading to the best position it can reach assuming that the opponent plays a perfect game. Finally, the original move on this path (i.e., an edge leaving the root) is selected for the program.
As there are typically several moves per position, game trees tend to be very large.
In order to cut down on the search time, these trees are generated as they are searched.
The idea is to explore the tree using the depth-first search method. From the given root
position, paths are created and examined one by one. First, a complete path is built from
the root to a leaf. The next path is obtained by backing up from the current leaf to a position not all of whose descendants have yet been explored and building a new path. During the generation of such a path it may happen that a position is reached that, based on information collected so far, definitely leads to leaves that are no better than the ones already examined. In this case the program interrupts its search along that path, and all descendants of that position are ignored. A cutoff is said to have occurred. Search can now resume along a new path.
So far we have described the search procedure as it would be executed sequentially. One way to implement it on an MIMD computer would be to distribute the subtrees of the root among the processors and let as many subtrees as possible be explored in parallel. During the search the processors may exchange various pieces of information. For example, one processor may obtain from another the best move found so far: This may lead to further cutoffs. Another datum that may be communicated is whether a processor has finished searching its subtree(s). If there is a subtree that is still under consideration, then an idle processor may be assigned the job of searching part of that subtree.
This approach clearly does not lend itself to implementation on an SIMD computer as the sequence of operations involved in the search is not predictable in advance. At any given point, the instruction being executed varies from one processor to another: While one processor may be generating a new position, a second may be evaluating a leaf, a third may be executing a cutoff, a fourth may be backing up to start a new path, a fifth may be communicating its best move, a sixth may be signaling the end of its search, and so on. □
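A rough sketch of this distribution of subtrees among asynchronous processors (illustrative Python only; the moves, their scores, and the idea of reducing each subtree search to a precomputed number are all invented for the example, with threads standing in for the MIMD processors):

from concurrent.futures import ThreadPoolExecutor
import threading

best = {"score": float("-inf"), "move": None}   # the best move found so far, shared by all
lock = threading.Lock()

def search_subtree(move, subtree_score):
    """One processor asynchronously searches the subtree under one move of the root."""
    with lock:                                   # exchange information: report the best move so far
        if subtree_score > best["score"]:
            best["score"], best["move"] = subtree_score, move

subtrees = {"e4": 0.3, "d4": 0.5, "c4": 0.1}     # hypothetical moves and subtree values
with ThreadPoolExecutor(max_workers=3) as pool:  # one processor per subtree of the root
    for move, score in subtrees.items():
        pool.submit(search_subtree, move, score)

print(best["move"])                              # 'd4'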
1.2.4.1 Programming MIMD Computers. As mentioned earlier, the
MIMD model of parallel computation is the most general and powerful possible.
Computers in this class are used to solve in parallel those problems that lack the regular structure required by the SIMD model. This generality does not come for free: Asynchronous algorithms are difficult to design, evaluate, and implement. In order to appreciate the complexity involved in programming MIMD computers, it is important to distinguish between the notion of a process and that of a processor. An asynchronous algorithm is a collection of processes some or all of which are executed simultaneously on a number of available processors. Initially, all processors are free. The parallel algorithm starts its execution on an arbitrarily chosen processor. Shortly thereafter it creates a number of computational tasks, or processes, to be performed. A process thus corresponds to a section of the algorithm: There may be several processes associated with the same algorithm section, each with a different parameter.
Once a process is created, it must be executed on a processor. If a free processor is available, the process is assigned to the processor that performs the computations specified by the process. Otherwise (if no free processor is available), the process is queued and waits for a processor to be free.
When a processor completes execution of a process, it becomes free. If a process is waiting to be executed, then it can be assigned to the processor just freed. Otherwise (if no process is waiting), the processor is queued and waits for a process to be created.
The order in which processes are executed by processors can obey any policy
that assigns priorities to processes. For example, processes can be executed in a first-
in-first-out or in a last-in-first-out order. Also, the availability of a processor is
sometimes not sufficient for the processor to be assigned a waiting process. An
additional condition may have to be satisfied before the process starts. Similarly, if a
processor has already been assigned a process and an unsatisfied condition is
encountered during execution, then the processor is freed. When the condition for
resumption of that process is later satisfied, a processor (not necessarily the original
one) is assigned to it. These are but a few of the scheduling problems that characterize the programming of multiprocessors. Finding efficient solutions to these problems is of paramount importance if MIMD computers are to be considered useful. Note that none of these scheduling problems arise on the less flexible but easier to program SIMD computers.
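A toy first-in-first-out scheduler illustrates the matching of processes to processors described above (illustrative Python; the queue names, processor names, and process names are invented for the example):

from collections import deque

# Processes wait in one queue, free processors in another; whichever side is
# available first is matched with the other as soon as possible.
process_queue = deque()
free_processors = deque(["P1", "P2"])

def run(processor, process):
    print(processor, "executes", process)

def create_process(name):
    if free_processors:                       # a free processor exists: assign at once
        run(free_processors.popleft(), name)
    else:                                     # otherwise the process waits its turn
        process_queue.append(name)

def finish(processor):                        # the processor becomes free again
    if process_queue:
        run(processor, process_queue.popleft())
    else:
        free_processors.append(processor)

for p in ["sort", "search", "merge"]:
    create_process(p)                         # 'merge' must wait in the process queue
finish("P1")                                  # P1 frees up and picks up 'merge'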
1.2.4.2 Special-Purpose Architectures. In theory, any parallel al-
gorithm can be executed efficiently on the MIMD model. The latter can therefore be
used to build parallel computers with a wide variety of applications. Such computers
are said to have a general-purpose architecture. In practice, by contrast, it is quite
sensible in many applications to assemble several processors in a configuration
specifically designed for the problem at hand. The result is a parallel computer well
suited for solving that problem very quickly but that cannot in general be used for any
other purpose. Such a computer is said to have a special-purpose architecture. With a
particular problem in mind, there are several ways to design a special-purpose parallel
computer. For example, a collection of specialized or very simple processors may be
used in one of the standard networks such as the mesh. Alternatively, one may
interconnect a number of standard processors in a custom geometry. These two
approaches may also be combined.
Example 1.7
Black-and-white pictures are stored in computers in the form of two-dimensional arrays. Each array entry represents a picture element, or pixel. A 0 entry represents a white pixel, a 1 entry a black pixel. The larger the array, the more pixels we have, and hence the higher the resolution, that is, the precision with which the picture is represented. Once a picture is stored in that way, it can be processed, for example, to remove any noise that may be present, increase the sharpness, fill in missing details, and determine contours of objects.
Assume that it is desired to execute a very simple noise removal algorithm that gets rid of "salt" and "pepper" in pictures, that is, sparse white dots on a black background and sparse black dots on a white background, respectively. Such an algorithm can be implemented very efficiently on a set of very simple processors in a two-dimensional configuration where each processor is linked to its eight closest neighbors (i.e., the mesh with diagonal connections in addition to horizontal and vertical ones). Each processor corresponds to a pixel and stores its value. All the processors can now execute the following step in parallel: If a pixel is 0 (1) and all its neighbors are 1 (0), it changes its value to 1 (0). □
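A sketch of this noise removal step (illustrative Python; the double loop visits the interior pixels one by one, whereas on the mesh every processor executes the step simultaneously):

def remove_salt_and_pepper(picture):
    """Each pixel plays the role of one mesh processor with eight neighbors:
    a pixel surrounded entirely by the opposite value flips to match its neighbors."""
    rows, cols = len(picture), len(picture[0])
    result = [row[:] for row in picture]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbors = [picture[r + dr][c + dc]
                         for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                         if (dr, dc) != (0, 0)]
            if all(v != picture[r][c] for v in neighbors):
                result[r][c] = neighbors[0]
    return result

pic = [[0, 0, 0, 0],
       [0, 1, 0, 0],      # an isolated black dot ("pepper") on a white background (0 = white, 1 = black)
       [0, 0, 0, 0],
       [0, 0, 0, 0]]
print(remove_salt_and_pepper(pic)[1][1])   # 0: the isolated dot has been removed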
One final observation is in order in concluding this section. Having studied a
variety of approaches to building parallel computers, it is natural to ask: How is one
to choose a parallel computer from among the available models? We already saw how
one model can use its computational abilities to simulate an algorithm designed for
another model. In fact, we shall show in the next section that one processor is capable
of executing any parallel algorithm. This indicates that all the models of parallel computers are equivalent in terms of the problems that they can solve. What distinguishes one from another is the ease and speed with which it solves a particular problem. Therefore, the range of applications for which the computer will be used and the urgency with which answers to problems are needed are important factors in deciding what parallel computer to use. However, as with many things in life, the choice of a parallel computer is mostly dictated by economic considerations.
1.3 ANALYZING ALGORITHMS
This book is concerned with two aspects of parallel algorithms: their design and their
analysis. A number of algorithm design techniques were illustrated in section 1.2 in
connection with our description of the different models of parallel computation. The
examples studied therein also dealt with the question of algorithm analysis. This refers
to the process of determining how good an algorithm is, that is, how fast, how
expensive to run, and how efficient it is in its use of the available resources. In this
section we define more formally the various notions used in this book when analyzing
parallel algorithms.
Once a new algorithm for some problem has been designed, it is usually
evaluated using the following criteria: running time, number of processors used, and
cost. Besides these standard metrics, a number of other technology-related measures
are sometimes used when it is known that the algorithm is destined to run on a
computer based on that particular technology.
1.3.1 Running Time
Since speeding up computations appears to be the main reason behind our interest in
building parallel computers, the most important measure in evaluating a parallel
algorithm is therefore its running time. This is defined as the time taken by the
algorithm to solve a problem on a parallel computer, that is, the time elapsed from the
moment the algorithm starts to the moment it terminates. If the various processors do
not all begin and end their computation simultaneously, then the running time is