Chapter 2: Introduction to Parallel Processing
Parallel Computing
• Parallel computing is a form of computing in which a job is broken into discrete
parts that can be executed concurrently.
• Each part is further broken down to a series of instructions. Instructions
from each part execute simultaneously on different CPUs.
• Parallel systems deal with the simultaneous use of multiple computer
resources that can include a single computer with multiple processors, a
number of computers connected by a network to form a parallel processing
cluster or a combination of both.
• Parallel systems are more difficult to program than single-processor computers
because parallel architectures vary widely and the activities of multiple CPUs
must be coordinated and synchronized.
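As a concrete illustration, here is a minimal Python sketch of these ideas (the worker count, part sizes, and function names are illustrative, not from the text) in which one job is broken into discrete parts that execute on different CPUs:

```python
# Minimal sketch: break one job into discrete parts and run them
# concurrently on separate CPUs with a process pool.
from multiprocessing import Pool

def process_part(part):
    # Each part is itself a series of instructions; here, a simple sum.
    return sum(part)

if __name__ == "__main__":
    job = list(range(1_000_000))
    # Break the job into four discrete parts (strided slices).
    parts = [job[i::4] for i in range(4)]
    # Execute the parts simultaneously on different CPUs.
    with Pool(processes=4) as pool:
        partial_sums = pool.map(process_part, parts)
    # Coordinate/combine the partial results.
    print(sum(partial_sums) == sum(job))  # True
```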
The State of Computing
• Modern computers are equipped with powerful hardware facilities
driven by extensive software packages.
1. To assess state-of-the-art computing, we first review historical
milestones in the development of computers.
2. Then we take a grand tour of the crucial hardware and software
elements built into modern computer systems.
1. Computer Development Milestones
• Prior to 1945, computers were made with mechanical or electromechanical
parts. The earliest mechanical computer can be traced back to 500 BC in the
form of the abacus used in China.
• Blaise Pascal built a mechanical adder/subtractor in France in 1642.
• Charles Babbage designed a difference engine in England for polynomial
evaluation in 1827.
• Konrad Zuse built the first binary mechanical computer in Germany in 1941.
• Howard Aiken proposed the very first electromechanical decimal computer,
which was built as the Harvard Mark I by IBM in 1944. Both Zuse’s and
Aiken’s machines were designed for general-purpose computations.
• Obviously, the fact that computing and communication were carried
out with moving mechanical parts greatly limited the computing speed
and reliability of mechanical computers.
• Modern computers were marked by the introduction of electronic
components.
• The moving parts in mechanical computers were replaced by high-
mobility electrons in electronic computers.
• Information transmission by mechanical gears or levers was replaced
by electric signals traveling almost at the speed of light.
Computer Generations
• Over the past several decades, electronic computers have gone through roughly five generations of
development.
• The table below provides a summary of the five generations of electronic computer development.
• Each of the first three generations lasted about 10 years. The fourth generation covered a time span of 15 years.
The fifth generation today has processors and memory devices with more than 1 billion transistors on a single
silicon chip.
| Generation | Technology and Architecture | Software and Applications | Representative Systems |
|---|---|---|---|
| First (1945-1954) | Vacuum tubes and relay memories, CPU driven by PC and accumulator, fixed-point arithmetic. | Machine/assembly languages, single user, no subroutine linkage, programmed I/O using CPU. | ENIAC, Princeton IAS, IBM 701. |
| Second (1955-1964) | Discrete transistors and core memories, floating-point arithmetic, I/O processors, multiplexed memory access. | HLL used with compilers, subroutine libraries, batch processing monitor. | IBM 7090, CDC 1604, Univac LARC. |
| Third (1965-1974) | Integrated circuits (SSI/MSI), microprogramming, pipelining, cache, and lookahead processors. | Multiprogramming and timesharing OS, multiuser applications. | IBM 360/370, CDC 6600, TI-ASC, PDP-8. |
| Fourth (1975-1990) | LSI/VLSI and semiconductor memory, multiprocessors, vector supercomputers, multicomputers. | Multiprocessor OS, languages, compilers, and environments for parallel processing. | VAX 9000, Cray X-MP, IBM 3090, BBN TC2000. |
| Fifth (1991-present) | Advanced VLSI processors, memory, and switches; high-density packaging; scalable architectures. | Superscalar processors, systems on a chip, massively parallel processing, grand challenge applications, heterogeneous processing. | Desktop, laptop, and notebook computers; IBM’s Watson. |
2. Elements of Modern Computers
• Hardware, software, and programming elements of a modern computer system are briefly
introduced below in the context of parallel processing.
• The concept of computer architecture is no longer restricted to the structure of the bare
machine hardware.
• A modern computer is an integrated system consisting of machine hardware, an instruction
set, system software, application programs, and user interfaces.
Computing Problems
• The use of a computer is driven by real-life problems demanding cost effective solutions.
• Depending on the nature of the problems, the solutions may require different computing
resources.
• For numerical problems in science and technology, the solutions demand complex
mathematical formulations and intensive integer or floating-point computations.
• For alphanumerical problems in business and government, the solutions demand efficient
transaction processing, large database management, and information retrieval operations.
• For artificial intelligence (AI) problems, the solutions demand logic inferences and
symbolic manipulations.
• These computing problems have been labeled numerical computing, transaction
processing and logical reasoning.
• Some complex problems may demand a combination of these processing modes.
Algorithms and Data Structures
• Special algorithms and data structures are needed to specify the computations and
communications involved in computing problems.
• Most numerical algorithms are deterministic, using regularly structured data. Symbolic
processing may use heuristics or nondeterministic searches over large knowledge bases.
Hardware Resources
• Processors, memory, and peripheral devices form the hardware core of a computer system.
• Special hardware interfaces are often built into I/O devices such as display terminals,
workstations, optical page scanners, magnetic ink character recognizers, modems, network
adapters, voice data entry devices, printers, and plotters.
• The study of computer architecture involves both hardware organization and programming/
software requirements.
• As seen by an assembly language programmer, computer architecture is abstracted by its
instruction set, which includes opcodes (operation codes), addressing modes, registers,
virtual memory, etc.
• From the hardware implementation point of view, the abstract machine is organized with
CPUs, caches, buses, microcode, pipelines, physical memory, etc.
• Therefore, the study of architecture covers both instruction set architectures and machine
implementation organizations.
• Over the past decades, computer architecture has gone through evolutionary rather than
revolutionary changes.
• As depicted in the figure, we started with the von Neumann architecture built as a
sequential machine executing scalar data.
• The sequential computer was improved from bit-serial to word-parallel operations, and
from fixed-point to floating-point operations.
• The von Neumann architecture is slow due to the sequential execution of instructions in
programs.
• Lookahead techniques were introduced to prefetch instructions in order to overlap I/E
(instruction fetch and execute) operations and to enable functional parallelism.
• Functional parallelism was supported by two approaches: One is to use multiple functional
units simultaneously, and the other is to practice pipelining at various processing levels.
• The latter includes pipelined instruction execution, pipelined arithmetic computations, and
memory-access operations.
• Pipelining is a technique where multiple instructions are overlapped during execution.
• Pipelining has proven especially attractive in performing identical operations repeatedly
over vector data strings.
• Vector operations were originally carried out implicitly by software-controlled looping
using scalar pipeline processors.
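The overlap that pipelining achieves can be made concrete with a short sketch (a hypothetical helper; the segment count k and task count n are illustrative parameters). It prints the space-time diagram discussed under General Considerations below:

```python
# Print a space-time diagram for a k-segment pipeline processing n tasks:
# each row is a segment, each column a clock cycle, and the overlap of
# tasks T1..Tn across segments is visible along the diagonals.
def space_time(k=4, n=6):
    print("      " + " ".join(f"{'t' + str(c):>3}" for c in range(1, n + k)))
    for seg in range(1, k + 1):
        cells = []
        for clock in range(1, n + k):
            task = clock - seg + 1   # task occupying this segment now
            cells.append(f"T{task}" if 1 <= task <= n else "--")
        print(f"S{seg}:  " + " ".join(f"{c:>3}" for c in cells))

space_time()
```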
ARCHITECTURAL CLASSIFICATION
Basic types of architectural classification:
• FLYNN’S TAXONOMY OF COMPUTER ARCHITECTURE
• FENG’S CLASSIFICATION
• HANDLER’S CLASSIFICATION
Other types of architectural classification:
• Classification based on coupling between processing elements
• Classification based on mode of accessing memory
Flynn’s classification (1966) is based on the multiplicity of instruction streams
and data streams in computer systems:
• single-instruction, single-data streams (SISD);
• single-instruction, multiple-data streams (SIMD);
• multiple-instruction, single-data streams (MISD); and
• multiple-instruction, multiple-data streams (MIMD).
SISD
Conventional single-processor von Neumann
computers are classified as SISD systems.
SIMD ARCHITECTURE
The SIMD model of parallel computing consists of two parts: a front-end computer
of the usual von Neumann style, and a processor array.
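As a software analogy of the SIMD idea (not the hardware processor array described above; NumPy is used purely for illustration), one vectorized statement applies a single "instruction" to many data elements at once:

```python
# SISD vs SIMD flavor, illustrated with NumPy's vectorized operations.
import numpy as np

a = np.arange(8, dtype=np.float64)
b = np.full(8, 2.0)

# SISD style: one instruction stream operates on one data item at a time.
sisd = [a[i] * b[i] for i in range(len(a))]

# SIMD style: one multiply "instruction" is applied to all elements;
# NumPy dispatches it to vectorized code over the whole array.
simd = a * b

assert list(simd) == sisd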
FENG’S CLASSIFICATION
Feng’s classification measures parallelism by word length and bit-slice length. A bit
slice is a string of bits, one taken from each word at the same vertical (bit) position.
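A small sketch of the bit-slice idea (the word width and sample values are made up for illustration):

```python
# Given n words of m bits each, slice j collects bit j from every word
# (one vertical column of bits across the words).
def bit_slice(words, j):
    """Return bit j of each word, as a list of 0/1 values."""
    return [(w >> j) & 1 for w in words]

words = [0b1010, 0b0111, 0b1100]   # three 4-bit words
print(bit_slice(words, 2))          # bits at position 2 -> [0, 1, 1]
```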
Pipelining
• A clock is applied to all registers after enough time has elapsed to perform all segment
activity.
• The pipeline organization is demonstrated by means of a simple example.
• To perform the combined multiply and add operations with a stream of numbers
Ai * Bi + Ci for i = 1, 2, 3, …, 7
• Each suboperation is to be implemented in a segment within a pipeline.
R1 ← Ai, R2 ← Bi          Input Ai and Bi
R3 ← R1 * R2, R4 ← Ci     Multiply and input Ci
R5 ← R3 + R4              Add Ci to the product
• Each segment has one or two registers and a combinational circuit as shown in Fig below.
• The five registers are loaded with new data every clock pulse. The effect of each clock is
shown in the table.
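The clock-by-clock behavior can be reproduced with a short simulation (the operand values are arbitrary; updating segments in reverse order emulates all registers latching on the same clock edge):

```python
# Simulation of the three-segment multiply-add pipeline R1..R5.
A = [1, 2, 3, 4, 5, 6, 7]
B = [7, 6, 5, 4, 3, 2, 1]
C = [10, 20, 30, 40, 50, 60, 70]

R1 = R2 = R3 = R4 = R5 = None
results = []
for clock in range(len(A) + 2):           # 2 extra cycles drain the pipe
    if R3 is not None:                    # segment 3: R5 <- R3 + R4
        R5 = R3 + R4
        results.append(R5)
    if R1 is not None:                    # segment 2: R3 <- R1*R2, R4 <- Ci
        R3, R4 = R1 * R2, C[clock - 1]
    else:
        R3 = R4 = None
    if clock < len(A):                    # segment 1: R1 <- Ai, R2 <- Bi
        R1, R2 = A[clock], B[clock]
    else:
        R1 = R2 = None

assert results == [a * b + c for a, b, c in zip(A, B, C)]
```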
General Considerations
• Any operation that can be decomposed into a sequence of suboperations of about the same
complexity can be implemented by a pipeline processor.
• The general structure of a four-segment pipeline is illustrated in Fig below.
• We define a task as the total operation performed going through all the segments in the
pipeline.
• The behavior of a pipeline can be illustrated with a space-time diagram.
• It shows the segment utilization as a function of time.
• Now, consider a non-pipeline unit that performs the same operation and takes a time equal
to t_n to complete each task.
• The total time required for n tasks is n·t_n; the comparison with a k-segment pipeline is
sketched below.
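Completing the comparison (a standard textbook derivation; the segment count k and pipeline clock period t_p are not defined in the text above and are introduced here as assumptions):

```latex
% Non-pipeline vs k-segment pipeline timing for n tasks.
\begin{aligned}
T_{\text{non-pipeline}} &= n\,t_n \\
T_{\text{pipeline}}     &= (k + n - 1)\,t_p \\
S &= \frac{n\,t_n}{(k + n - 1)\,t_p}
  \quad\xrightarrow{\;n \to \infty\;}\quad \frac{t_n}{t_p}
\end{aligned}
```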