COME6102 Chapter 1 Introduction 2 of 2
Two categories of parallel computers are discussed below, namely shared (common) memory and unshared (distributed) memory machines.
Shared Memory
Shared memory machines can be divided into three main classes based upon memory access times: UMA, NUMA and COMA.
UMA machines are sometimes referred to as CC-UMA (Cache Coherent UMA). Cache coherent means that if one processor updates a location in shared memory, all the other processors know about the update. Cache coherency is accomplished at the hardware level.
Advantages:
A global address space provides a user-friendly programming perspective to memory.
Data sharing between tasks is both fast and uniform due to the proximity of memory to the CPUs.
Disadvantages:
The primary disadvantage is the lack of scalability between memory and CPUs. Adding more CPUs can geometrically increase traffic on the shared memory-CPU path and, for cache coherent systems, geometrically increase traffic associated with cache/memory management.
The programmer is responsible for the synchronization constructs that ensure "correct" access of global memory (a minimal sketch follows this list).
Expense: it becomes increasingly difficult and expensive to design and produce shared
memory machines with ever increasing numbers of processors.
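As a minimal sketch of that synchronization burden (using Python's multiprocessing module purely for illustration; the counter and worker names are made up and not part of these notes), two processes increment a value in shared memory under an explicit lock so that their read-modify-write sequences cannot interleave:

# Programmer-managed synchronization on shared memory (illustrative only).
# Without the lock, the two workers could interleave their read-modify-write
# steps on the shared counter and lose updates.
from multiprocessing import Process, Value, Lock

def worker(counter, lock, n_increments):
    for _ in range(n_increments):
        with lock:                 # exclusive access to the shared location
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)        # an integer placed in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 100000)) for _ in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)           # 200000 with the lock; unpredictable without it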
Distributed Memory
Like shared memory systems, distributed memory systems vary widely but share a common characteristic: they require a communication network to connect inter-processor memory, as depicted in Figure 1.11.
Processors have their own local memory. Memory addresses in one processor do not map to another
processor, so there is no concept of global address space across all processors.
Because each processor has its own local memory, it operates independently and the changes it makes
to its local memory have no effect on the memory of other processors. Hence, the concept of cache
coherency does not apply.
When a processor needs access to data in another processor, it is usually the task of the programmer
to explicitly define how and when data is communicated. Synchronization between tasks is likewise
the programmer's responsibility.
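To make that explicit communication concrete, here is a minimal sketch using mpi4py as one possible message-passing library (the library choice and the data being sent are illustrative assumptions, not part of these notes): process 0 decides what to send and when, and process 1 must post a matching receive.

# Explicit message passing between two processes with private memories.
# Run with, for example: mpiexec -n 2 python send_recv.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    data = [1, 2, 3, 4]                # exists only in process 0's local memory
    comm.send(data, dest=1, tag=0)     # the programmer chooses what, where and when
elif rank == 1:
    data = comm.recv(source=0, tag=0)  # the matching receive completes the transfer
    print("rank 1 received", data)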
Modern multicomputers use hardware routers to pass messages. Based on the interconnection network, routers and channels used, multicomputers are divided into generations:
o 1st generation: based on board technology, using hypercube architecture and software-controlled message switching.
o 2nd generation: implemented with mesh-connected architecture, hardware message routing and a software environment for medium-grained distributed computing.
o 3rd generation: fine-grained multicomputers such as the MIT J-Machine.
The network "fabric" used for data transfer varies widely, though it can be as simple as Ethernet.
Advantages:
Memory is scalable with the number of processors: increase the number of processors and the size of memory increases proportionately.
Each processor can rapidly access its own memory without interference and without the
overhead incurred with trying to maintain cache coherency.
Cost effectiveness: can use commodity, off-the-shelf processors and networking.
Disadvantages:
The programmer is responsible for many of the details associated with data communication
between processors.
It may be difficult to map existing data structures, based on global memory, to this memory
organization.
Non-uniform memory access (NUMA) times: data residing on a remote node takes longer to access than node-local data.
Vector Processors
A vector processor consists of a scalar processor and a vector unit, which can be thought of as an independent functional unit capable of efficient vector operations.
Vector Hardware
Vector computers have hardware to perform the vector operations efficiently. Operands cannot be used directly from memory; rather, they are loaded into registers, and the results are put back into registers after the operation. Vector hardware has the special ability to overlap or pipeline operand processing, as depicted in Figure 1.12.
Vector functional units are pipelined and fully segmented: each stage of the pipeline performs a step of the function on different operand(s), and once the pipeline is full, a new result is produced each clock period (cp).
Pipelining
The pipeline is divided up into individual segments, each of which is completely independent and
involves no hardware sharing. This implies that the machine can be working on separate operands at
the same time. This ability enables it to produce one result per clock period as soon as the pipeline is
full. The same instruction is followed repeatedly using the pipeline technique so the vector processor
processes all the elements of a vector in exactly the same way. The pipeline segments an arithmetic operation, such as a floating-point multiply, into stages, passing the output of one stage to the next stage as input. The next pair of operands may enter the pipeline after the first stage has processed
the previous pair of operands. The processing of a number of operands may be carried out
simultaneously.
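One way to quantify this overlap is a simple clock-period count: with an s-stage pipeline, the first result appears after s clock periods and each subsequent result one period later, so n operand pairs take roughly s + (n - 1) periods instead of s * n without pipelining. The sketch below uses hypothetical numbers (a 6-stage multiply pipeline and a 64-element vector) chosen only for illustration:

# Clock-period count for a fully segmented (pipelined) vector functional unit
# compared with an unpipelined unit performing the same n operations.
def pipelined_cycles(stages, n):
    # first result after 'stages' periods, then one new result per period
    return stages + (n - 1)

def unpipelined_cycles(stages, n):
    # each operation must drain the whole unit before the next can start
    return stages * n

s, n = 6, 64   # hypothetical 6-stage floating-point multiply, 64-element vector
print("pipelined:  ", pipelined_cycles(s, n), "clock periods")    # 69
print("unpipelined:", unpipelined_cycles(s, n), "clock periods")  # 384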
The loading of a vector register is itself a pipelined operation, with the ability to load one element
each clock period after some initial startup overhead.
SIMD Array Processors
An array processor is basically a single instruction, multiple data (SIMD) computer. Its control unit is a computer with high-speed registers, local memory and an arithmetic logic unit. There are N data streams, one per processor, so different data can be used in each processor. Figure 1.13 below shows a typical SIMD or array processor.
These processors consist of a number of memory modules which can be either global or dedicated to each processor. Thus the main memory is the aggregate of the memory modules. The processing elements and memory units communicate with each other through an interconnection network. SIMD processors are especially designed for performing vector computations. SIMD has two basic architectural organizations, depending on whether the memory modules are dedicated to the processing elements or shared globally.
All N identical processors operate under the control of a single instruction stream issued by a central control unit. Popular examples of this type of SIMD configuration are the ILLIAC IV, CM-2 and MP-1.
Each PEi is essentially an arithmetic logic unit (ALU) with attached working registers and local
memory PEMi for the storage of distributed data.
The CU also has its own main memory for the storage of programs. The function of the CU is to decode the instructions and determine where the decoded instructions should be executed. The PEs perform the same function (the same instruction) synchronously, in a lock-step fashion, under command of the CU. In order to maintain synchronous operation a global clock is used. Thus at each step, i.e., when the global clock pulse changes, all processors execute the same instruction, each on different data (single instruction, multiple data).
SIMD machines are particularly useful for solving problems involving vector calculations, where one can easily exploit data parallelism. In such calculations the same set of instructions is applied to all subsets of data. Suppose we add two vectors, each having N elements, on a SIMD machine with N/2 processing elements. The same addition instruction is issued to all N/2 processors and all processing elements execute the instruction simultaneously. It takes 2 steps to add the two vectors, as compared to N steps on a SISD machine. The distributed data can be loaded into the PEMs from an external source via the system bus, or via a system broadcast mode using the control bus.
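To make the N/2-processor example concrete, the following sketch emulates it with NumPy (ordinary sequential library code standing in for real SIMD hardware; the vector length N = 8 and the 4 processing elements are illustrative assumptions):

# Emulating vector addition on a SIMD machine with N/2 processing elements:
# N = 8 elements, N/2 = 4 PEs, each PE adds one element pair per step,
# so the whole addition completes in 2 lock-step steps instead of N = 8.
import numpy as np

a = np.arange(8)          # vector A, N = 8 elements
b = np.arange(8, 16)      # vector B
c = np.empty(8, dtype=a.dtype)

n_pe = 4                  # number of processing elements
for step in range(2):     # 2 steps instead of N steps on a SISD machine
    idx = np.arange(n_pe) + step * n_pe   # element handled by each PE this step
    c[idx] = a[idx] + b[idx]              # every PE executes the same add together

print(c)                  # [ 8 10 12 14 16 18 20 22]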
The array processor can be classified into two categories depending on how the memory units are organized: a dedicated (local) memory organization or a global memory organization. Formally, an SIMD computer can be characterized by the 4-tuple C = <N, F, I, M>, where:
N = the number of PEs in the system; for example, the ILLIAC IV has N = 64 and the BSP has N = 16.
F = a set of data-routing functions provided by the interconnection network.
I = the set of machine instructions for scalar, vector, data-routing and network-manipulation operations.
M = the set of masking schemes, where each mask partitions the set of PEs into disjoint subsets of enabled PEs and disabled PEs.
PRAM Model:
In the parallel random access machine (PRAM) model, a set of processors share a single global memory that any processor can access in unit time. The PRAM model can apply to SIMD-class machines if all processors execute identical instructions on the same cycle, or to MIMD-class machines if the processors execute different instructions.
Load imbalance is the only form of overhead in the PRAM model.
The four most important variations of the PRAM are:
EREW - Exclusive read, exclusive write; any memory location may only be accessed once in
any one step. This forbids more than one processor from reading or writing the same memory cell simultaneously.
CREW - Concurrent read, exclusive write; any memory location may be read any number of
times during a single step, but only written to once, with the write taking place after the reads.
ERCW - Exclusive read, concurrent write; any memory location may be written to concurrently by several processors during a single step, but may be read by only one.
CRCW - Concurrent read, concurrent write; any memory location may be written to or read
from any number of times during a single step. A CRCW PRAM model must define some
rule for resolving multiple writes, such as giving priority to the lowest-numbered processor
or choosing amongst processors randomly.
The PRAM is popular because it is theoretically tractable and because it gives algorithm designers a common target. Nevertheless, PRAMs cannot be emulated optimally on all architectures.
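As an illustration of one such CRCW resolution rule, the sketch below simulates a single write step in which conflicting writes to the same cell are resolved in favour of the lowest-numbered processor (the memory size and the particular writes are made-up values for illustration):

# Simulating one CRCW PRAM write step: several processors may target the same
# address; conflicts go to the lowest-numbered processor.
def crcw_priority_write(memory, writes):
    # writes: list of (processor_id, address, value) issued in the same step
    winners = {}
    for pid, addr, value in writes:
        # keep only the write from the lowest-numbered processor per address
        if addr not in winners or pid < winners[addr][0]:
            winners[addr] = (pid, value)
    for addr, (_, value) in winners.items():
        memory[addr] = value

memory = [0] * 4
crcw_priority_write(memory, [(3, 2, 30), (1, 2, 10), (2, 0, 99)])
print(memory)   # [99, 0, 10, 0] -- processor 1 wins the conflict at address 2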
VLSI Model:
Parallel computers rely on the use of VLSI chips to fabricate their major components, such as processor arrays, memory arrays and large-scale switching networks. With the rapid advent of very large scale integration (VLSI) technology, computer architects are now trying to implement parallel algorithms directly in hardware. The AT^2 model is an example of a complexity model for two-dimensional VLSI chips.
Processor speed is often measured in terms of millions of instructions per second, frequently called the MIPS rate of the processor. Multiprocessor architectures can be broadly classified as tightly coupled multiprocessors and loosely coupled multiprocessors.
A tightly coupled multiprocessor is also called a UMA (uniform memory access) machine, because each CPU can access memory data in the same (uniform) amount of time. This is the true multiprocessor. A loosely coupled multiprocessor is called a NUMA (non-uniform memory access) machine: each of its node computers can access its local memory data at one (relatively fast) speed, and remote memory data at a much slower speed.
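A small worked example of the difference (with hypothetical latencies of 100 ns for local and 300 ns for remote accesses, chosen only for illustration): on a NUMA node the effective access time depends on how many references stay local, whereas on a UMA machine every reference costs the same.

# Effective memory access time on a NUMA node for a given fraction of
# node-local references, using hypothetical latencies.
def effective_access_ns(local_fraction, local_ns=100.0, remote_ns=300.0):
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns

print(effective_access_ns(0.9))   # 120.0 ns when 90% of accesses are node-local
print(effective_access_ns(0.5))   # 200.0 ns when half of the accesses are remote
# A UMA machine would see the same (uniform) cost, e.g. 100 ns, for every access.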
PRAM and VLSI are the advanced models used for designing and analyzing parallel architectures.
Key Words
multiprocessor A computer in which processors can execute separate instruction streams, but have
access to a single address space. Most multiprocessors are shared memory machines, constructed by
connecting several processors to one or more memory banks through a bus or switch.
multicomputer A computer in which processors can execute separate instruction streams, have their
own private memories and cannot directly access one another's memories. Most multicomputers are
disjoint memory machines, constructed by joining nodes (each containing a microprocessor and some
memory) via links.
MIMD Multiple Instruction, Multiple Data; a category of Flynn's taxonomy in which many
instruction streams are concurrently applied to multiple data sets. A MIMD architecture is one in
which heterogeneous processes may execute at different rates.
MIPS one Million Instructions Per Second. A performance rating usually referring to integer or non-floating-point instructions.
vector processor A computer designed to apply arithmetic operations to long vectors or arrays. Most vector processors rely heavily on pipelining to achieve high performance.
pipelining Overlapping the execution of two or more operations.