UNIT – I
Theory of Parallelism, Parallel computer models, The State of Computing, Multiprocessors and
Multicomputers, Multivector and SIMD Computers, PRAM and VLSI models, Architectural development
tracks, Program and network properties, Conditions of parallelism, Program partitioning and Scheduling,
Program flow Mechanisms, System interconnect Architectures.
Theory of Parallelism
Introduction
Parallelism means doing many things at the same time. Instead of finishing one task and
then starting another, the computer works on multiple instructions together. This saves time
and increases performance.
Modern processors (multi-core CPUs, GPUs, supercomputers) are based on this idea.
Types of Parallelism
1. Bit-Level Parallelism
o Processor handles more bits in one step.
o Example: a 64-bit processor can process a 64-bit operand in one step, while a 32-bit processor needs two steps.
2. Instruction-Level Parallelism (ILP)
o Many instructions run at the same time.
o Done using pipelining and superscalar execution.
3. Loop-Level Parallelism
o Loop tasks are divided among processors.
o Example: In matrix multiplication, each processor calculates part of the result.
4. Task-Level Parallelism
o Different functions run at the same time.
o Example: one thread handles input while another processes output (see the sketch after this list).
5. Job-Level Parallelism
o Two or more programs run together.
o Example: A user can listen to music while browsing the internet.
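As a rough illustration of task-level parallelism, the sketch below runs two independent functions in separate threads with Python's standard threading module; the function names are invented for the example, and because of CPython's global interpreter lock true CPU parallelism would normally use processes instead.

import threading

# two independent tasks that can run at the same time (task-level parallelism)
def handle_input():
    data = [x * x for x in range(5)]
    print("input prepared:", data)

def process_output():
    print("output processed")

t1 = threading.Thread(target=handle_input)
t2 = threading.Thread(target=process_output)
t1.start(); t2.start()   # both tasks are in flight together
t1.join(); t2.join()     # wait for both to finish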
Performance Laws
1. Amdahl’s Law
o States that the achievable speedup is limited by the serial (non-parallel) portion of the program:
S = 1 / ((1 − P) + P / N)
where P = parallel portion, N = number of processors.
2. Gustafson’s Law
o Argues that with increasing problem size, parallelism becomes more effective.
o Suggests scalability improves with larger workloads and more processors.
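A small sketch of both laws as Python functions, where P is the parallel fraction and N the number of processors; the Gustafson formula used here is the common scaled-speedup form S = (1 − P) + N·P.

def amdahl_speedup(p, n):
    # maximum speedup when a fraction p of the work is parallelizable
    return 1.0 / ((1.0 - p) + p / n)

def gustafson_speedup(p, n):
    # scaled speedup when the problem size grows with the machine
    return (1.0 - p) + n * p

print(amdahl_speedup(0.9, 16))     # about 6.4: limited by the 10% serial part
print(gustafson_speedup(0.9, 16))  # 14.5: larger workloads scale better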
Applications
Weather forecasting, scientific simulations.
Machine learning and AI.
Image and video processing.
Robotics and real-time systems.
Advantages
Saves execution time.
Increases efficiency.
Handles large problems easily.
Disadvantages
Needs special hardware.
Programming is difficult.
Communication between tasks may slow down execution.
Applications of Flynn’s Classification
SISD: Personal computers, simple microcontrollers.
SIMD: Image/video processing, scientific simulations, AI training.
MISD: Rare, used in fault-tolerant systems.
MIMD: Most real-world systems like multicore CPUs, servers, clusters, cloud
computing.
Disadvantages of Flynn’s Classification
Some models (MISD) are impractical.
Complex design and programming.
Higher cost for hardware and maintenance.
The State of Computing
Introduction
The state of computing refers to the current trends, challenges, and progress in computer
systems.
Initially, computers were sequential uniprocessors, but as clock-frequency scaling hit the
power wall and Moore’s Law slowed, the industry moved towards parallelism. Today’s computing is
dominated by multi-core processors, GPUs, cloud computing, and AI accelerators.
Evolution of Computing
1. First Generation (1940s–50s) – Vacuum tubes, sequential execution.
2. Second Generation (1960s) – Transistors, faster uniprocessors.
3. Third Generation (1970s) – Integrated circuits, pipelining started.
4. Fourth Generation (1980s–90s) – Microprocessors, early multiprocessors.
5. Fifth Generation (2000s onwards) – Parallel and distributed systems, cloud
computing, GPUs, AI hardware.
Modern State of Computing
1. Multi-Core and Many-Core CPUs
o Processors now have multiple cores (quad-core, octa-core, 64+ cores in
servers).
o Each core executes tasks in parallel.
2. GPUs and Accelerators
o GPUs provide SIMD parallelism for graphics, AI, and scientific tasks.
o TPUs (Tensor Processing Units) and FPGAs are used for machine learning.
3. Cloud Computing
o Shared, on-demand resources available via the internet.
o Examples: AWS, Microsoft Azure, Google Cloud.
4. Edge and Mobile Computing
o Processing happens closer to data sources (IoT devices, mobile phones).
o Reduces latency in real-time applications.
5. Big Data and AI
o Massive datasets require parallel and distributed computing.
o AI/ML workloads dominate supercomputing and cloud systems.
Diagram: Evolution
Sequential → Pipelining → Superscalar → Multi-core → Many-core → Cloud/AI
Applications
Cloud services (Google Drive, AWS).
AI (Chatbots, Recommendation engines).
Real-time systems (autonomous vehicles, robotics).
Supercomputers for climate modeling, drug discovery.
Multiprocessors and Multicomputers
Multiprocessors (shared-memory systems)
Types of Multiprocessors:
1. Symmetric Multiprocessors (SMP):
o All processors are equal and share the same memory.
o Example: Modern multi-core CPUs.
2. Non-Uniform Memory Access (NUMA):
o Memory is divided among processors, but all of it remains globally accessible.
o Access time depends on whether the location is local or remote to the processor.
Advantages:
Easy to program (shared memory model).
High performance for medium-scale systems.
Disadvantages:
Memory contention (bottleneck if many processors access memory).
Limited scalability (not good for thousands of processors).
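A minimal sketch of the shared-memory style used on SMP machines: several threads update a single shared counter through a lock, which shows both why shared memory is easy to program and where memory contention comes from; the names are illustrative.

import threading

counter = 0                  # shared variable (single global address space)
lock = threading.Lock()

def worker(n):
    global counter
    for _ in range(n):
        with lock:           # serialized access: the contention point
            counter += 1

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)               # 40000, but the shared lock limits scalability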
Multicomputers (message-passing systems)
Each processor (node) has its own private memory, and nodes communicate by passing messages over a network.
Examples:
Computer clusters, distributed supercomputers.
MPI (Message Passing Interface) is commonly used (see the sketch after this subsection).
Advantages:
Highly scalable (can connect thousands of processors).
No memory contention, since each processor has private memory.
Disadvantages:
Difficult to program (explicit message passing).
Higher communication overhead.
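A minimal message-passing sketch in the multicomputer style, assuming the third-party mpi4py package is installed and the script is launched with something like mpirun -n 2 python demo.py (the file name and message contents are illustrative).

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()       # each process (node) has its own private memory

if rank == 0:
    data = {"task": "sort", "payload": [3, 1, 2]}
    comm.send(data, dest=1, tag=0)      # explicit message passing
elif rank == 1:
    data = comm.recv(source=0, tag=0)
    print("rank 1 received:", data)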
Applications
Multiprocessors: General-purpose servers, desktops, multi-core processors in
laptops.
Multicomputers: Supercomputers, large-scale simulations, scientific computing,
cloud systems.
Differences between Multiprocessors and Multicomputers
Address Space: multiprocessors provide a single global address space (all processors see the same memory); multicomputers have multiple private address spaces (each processor has its own).
Cost: multiprocessors are more expensive due to shared-memory hardware; multicomputers are cheaper and easier to build from networked PCs.
Speed: multiprocessors are faster for small to medium systems; multicomputers are faster for large-scale systems.
Multivector and SIMD Computers
Introduction
Parallel computing systems aim to process large volumes of data efficiently by executing
multiple operations at once. Two important categories are SIMD (Single Instruction
Multiple Data) computers and Multivector computers. Both are widely used in scientific,
engineering, multimedia, and AI applications, but they differ in design and working.
1. SIMD Computers
Definition: A SIMD computer has one control unit that broadcasts the same
instruction to many processing elements (PEs), but each PE works on a different
data item at the same time.
This is a form of data parallelism because the same operation is applied to many data
elements simultaneously.
Characteristics:
1. Single instruction → multiple data streams.
2. All processors execute in lockstep (synchronized).
3. Very efficient for applications with regular data structures like arrays and matrices.
Examples:
GPUs (Graphics Processing Units).
Array processors like ILLIAC IV, MasPar.
Supercomputers using SIMD for scientific simulations.
Applications:
Image/video processing.
Signal processing.
Matrix multiplication.
Weather forecasting.
Advantages:
Simple control mechanism.
High performance for vectorizable tasks.
Saves power and reduces instruction overhead.
Limitations:
Not good for irregular tasks or tasks with different control flows.
Idle processors if data size is uneven.
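A toy sketch of the SIMD idea: one “instruction” (a Python function standing in for a hardware operation) is broadcast to several processing elements, each holding a different data item, and all results are produced in the same lockstep step.

def instruction(x):
    return x * 2 + 1              # the same operation for every PE

pe_data = [10, 20, 30, 40]        # one data item per processing element
results = [instruction(d) for d in pe_data]   # applied to all PEs "at once"
print(results)                    # [21, 41, 61, 81]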
2. Multivector Computers
Definition: Multivector computers use vector processors that can execute operations
on entire arrays of data (vectors) in a single instruction.
They are more powerful than SIMD because they can handle multiple vector
operations and support vector registers.
Characteristics:
1. Operates on long vectors of data (e.g., arrays of 1000 numbers).
2. Special vector instructions (e.g., vector add, vector multiply).
3. Reduces instruction fetch/decode overhead by applying one instruction to many
elements.
Examples:
CRAY vector supercomputers (CRAY-1, CRAY-XMP).
NEC SX series.
Modern CPUs with vector extensions (Intel AVX, ARM NEON).
Applications:
Scientific simulations (physics, astronomy).
Engineering computations.
Machine learning & AI acceleration.
Linear algebra and matrix-based problems.
Advantages:
Faster execution for vectorizable problems.
Reduces instruction overhead.
Excellent for scientific workloads.
Limitations:
Requires vector-friendly programs.
More complex and costly hardware.
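A small sketch contrasting an element-by-element loop with a single whole-array operation, using NumPy as a stand-in for vector hardware; the library dispatches the array operation in compiled code, which is the spirit of a vector instruction (on many CPUs it ultimately uses extensions such as AVX or NEON).

import numpy as np

a = np.arange(1000, dtype=np.float64)
b = np.arange(1000, dtype=np.float64)

# scalar style: one addition issued per element
c_scalar = [a[i] + b[i] for i in range(len(a))]

# vector style: a single "vector add" covering all 1000 elements
c_vector = a + b
assert np.allclose(c_scalar, c_vector)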
Differences between SIMD and Multivector Computers
Data Handling: SIMD works on individual data items spread across many processors; a multivector machine works on entire vectors (arrays) in one go.
Control: in SIMD, a central control unit broadcasts instructions; in a multivector machine, the vector unit fetches and executes vector instructions.
Flexibility: SIMD is best for simple data-parallel tasks; multivector machines are best for complex scientific computations.
Execution Style: SIMD uses lockstep execution, where all processors follow the same instruction; a multivector machine can perform multiple vector instructions simultaneously.
Diagram
SIMD: [Instruction] → P1(Data1), P2(Data2), P3(Data3), P4(Data4)
Conclusion
SIMD computers are ideal for simple, data-parallel applications such as graphics,
image processing, and matrix operations.
Multivector computers are more powerful, designed for scientific and engineering
applications where large vector operations dominate.
Together, they form the foundation of high-performance computing (HPC) and
modern processors (CPU + GPU hybrid systems).
PRAM (Parallel Random Access Machine) Model
Introduction
PRAM is an idealized shared-memory model of parallel computation used for designing and analyzing parallel algorithms.
Architecture of PRAM
1. Processors (P1, P2, … Pn): A large number of simple processors.
2. Shared Memory: A single global memory accessible by all processors.
3. Control: Synchronous execution (all processors run in lockstep).
4. Uniform Access: Each processor can read/write from any memory cell in unit time.
Advantages of PRAM
Simplifies the analysis of parallel algorithms.
Provides a clean mathematical model.
Helps estimate speedup and efficiency of parallel programs.
Limitations of PRAM
Unrealistic because actual hardware cannot support unlimited processors and
constant-time memory access.
Ignores communication delays and memory contention.
Used only as a theoretical tool, not a real implementation.
Applications
Designing and analyzing algorithms for sorting, searching, matrix multiplication, and
graph problems.
Teaching parallel computing concepts.
Diagram
P1 P2 P3 P4 ... Pn
\ | /
[Shared Global Memory]
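A minimal sketch of a PRAM-style parallel sum, simulated sequentially: in an ideal PRAM all pairs are added in the same synchronous step, so n values are reduced in about log2(n) steps.

def pram_sum(values):
    data = list(values)
    steps = 0
    while len(data) > 1:
        # one synchronous PRAM step: every pair is combined "in parallel"
        data = [data[i] + data[i + 1] if i + 1 < len(data) else data[i]
                for i in range(0, len(data), 2)]
        steps += 1
    return data[0], steps

total, steps = pram_sum(range(16))
print(total, steps)   # 120 4 -> 16 values summed in 4 parallel steps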
VLSI (Very Large Scale Integration) Model
Introduction
VLSI (Very Large Scale Integration) is the process of integrating thousands to
millions of transistors onto a single chip.
In parallel computing, the VLSI model studies how efficiently an algorithm can be
mapped to hardware by considering chip area and execution time.
Key Points
1. Processing Elements (PEs): Many small processors are embedded on a single chip.
2. Interconnection Network: Communication between PEs is done via on-chip
networks.
3. Area–Time Complexity:
o A = Area of chip (proportional to number of processors + wires).
o T = Time taken to execute the algorithm.
o Efficiency measured using AT² (area–time squared).
4. Goal: Design hardware that minimizes chip area and computation time.
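A toy comparison of two hypothetical designs under the AT² measure; the area and time figures below are invented purely for illustration.

designs = {
    "A": {"area": 4.0, "time": 10.0},   # smaller chip, slower algorithm
    "B": {"area": 8.0, "time": 6.0},    # larger chip, faster algorithm
}
for name, d in designs.items():
    at2 = d["area"] * d["time"] ** 2
    print(name, at2)    # A -> 400.0, B -> 288.0: B wins under AT^2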
Advantages
High speed and performance due to massive parallelism.
Compact and low-cost compared to separate processors.
Suitable for special-purpose architectures like systolic arrays, GPUs, and AI
accelerators.
Limitations
Physical constraints: chip size, power, and heat dissipation.
Complex and costly design process.
Limited flexibility (hardware is fixed once fabricated).
Applications
Design of CPUs and GPUs.
AI accelerators (e.g., Google TPU, NVIDIA Tensor Cores).
Digital Signal Processing (DSP) and image processing.
Network-on-Chip in multicore processors.
Diagram
+----------------------------------+
| Multiple Processing Elements |
| +---+ +---+ +---+ +---+ |
| |PE1| |PE2| |PE3| |PEn| |
| +---+ +---+ +---+ +---+ |
| Interconnection Network |
+----------------------------------+
Differences between PRAM and VLSI Models
Practicality: the PRAM model is purely theoretical and idealized (not practical), while the VLSI model is realistic and used in physical chip design.
Conclusion
PRAM model is best for theoretical study of parallel algorithms.
VLSI model is best for practical implementation of algorithms in hardware.
Together, they bridge the gap between algorithm theory and hardware design.
Architectural Development Tracks
Introduction
The evolution of computer architecture follows different development tracks
based on technological improvements and performance demands.
These tracks describe the historical progression from sequential computing to
parallel and distributed computing.
Program and Network Properties
Introduction
In parallel computing, program behavior and network organization determine how
efficiently tasks can be executed.
Program properties describe how a program can be divided into parallel parts.
Network properties describe how processors communicate in a parallel system.
1. Program Properties
1. Parallelism
o Amount of work that can be done simultaneously.
o Measured as the degree of parallelism = (total operations) / (time steps).
2. Granularity
o Coarse-grain: Large tasks, fewer communication needs.
o Fine-grain: Small tasks, frequent communication.
3. Data Dependence
o Determines whether instructions can be executed in parallel.
o Types: true (flow) dependence, anti-dependence, and output dependence (illustrated in the sketch after this list).
4. Control Dependence
o Arises from conditional branches (e.g., if–else).
o Reduces parallel execution possibilities.
5. Computational Load Balance
o Work must be evenly distributed among processors to avoid idle time.
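A minimal sketch of the three data-dependence types between statements, using illustrative variable names; the comments note which pairs block parallel execution.

b, c = 2, 3
a = b + c      # S1
d = a * 2      # S2: true (flow) dependence on S1 -- S2 reads the a written by S1
b = 5          # S3: anti-dependence on S1 -- S3 overwrites b, which S1 reads
a = d - 1      # S4: output dependence on S1 -- both statements write a
# S1 and S2 must stay ordered; the anti- and output dependences can often be
# removed by renaming variables, which recovers extra parallelism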
2. Network Properties
1. Topology
o Structure of processor interconnection (bus, mesh, hypercube, tree, etc.).
2. Diameter
o The longest shortest path (measured in links) between any two processors; a smaller
diameter means faster worst-case communication.
3. Connectivity
o Number of alternative paths between processors → improves fault tolerance.
4. Bisection Width/Bandwidth
o Minimum number of links that must be cut to divide the network into two
equal halves.
o Higher = better data transfer capacity.
5. Latency & Bandwidth
o Latency: Time to deliver a message.
o Bandwidth: Maximum data transfer rate.
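A small sketch that evaluates diameter and bisection width for two common topologies using their standard closed-form values (assumed here rather than derived): a k x k 2-D mesh and a d-dimensional hypercube.

def mesh_metrics(k):
    # k x k mesh: corner-to-corner distance and links cut down the middle
    diameter = 2 * (k - 1)
    bisection_width = k
    return diameter, bisection_width

def hypercube_metrics(d):
    # d-dimensional hypercube with 2**d nodes
    diameter = d                    # equals log2(number of nodes)
    bisection_width = 2 ** (d - 1)  # half of the nodes each lose one link
    return diameter, bisection_width

print(mesh_metrics(4))        # (6, 4) for 16 nodes
print(hypercube_metrics(4))   # (4, 8) for 16 nodes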
Diagram (Conceptual)
Program Properties → Parallelism, Granularity, Dependence, Balance
Network Properties → Topology, Diameter, Bandwidth, Latency
Conditions of Parallelism
Introduction
Parallelism means executing multiple operations simultaneously.
For a program to be executed in parallel, certain conditions must be satisfied so that
tasks can run independently without conflicts.
Diagram (Conceptual)
Conditions of Parallelism:
├── Data Dependence
├── Control Dependence
├── Resource Dependence
├── Granularity
├── Load Balance
└── Communication/Latency
Conclusion
Parallelism is only possible when dependencies are minimized, resources are
available, and tasks are balanced.
These conditions ensure that programs run efficiently in parallel without idle
processors or conflicts.
Program Partitioning and Scheduling
1. Program Partitioning
Partitioning = dividing a program into smaller units (tasks or data blocks).
Functional Partitioning
o Different processors perform different functions.
o Example: One processor handles input, another does computation, another
does output.
Data Partitioning
o Input data is split into smaller blocks and distributed among processors.
o Example: splitting an array into chunks for parallel sorting (see the sketch after this list).
Granularity in Partitioning
o Fine-grain: Small tasks, more communication overhead.
o Coarse-grain: Larger tasks, less communication, better efficiency.
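A minimal sketch of data partitioning for parallel sorting with the standard multiprocessing module: the array is split into chunks, each worker sorts one chunk, and the sorted chunks are merged; the chunk sizes and data are illustrative.

from multiprocessing import Pool
from heapq import merge

def sort_chunk(chunk):
    return sorted(chunk)

if __name__ == "__main__":
    data = [5, 2, 9, 1, 7, 3, 8, 6, 4, 0] * 1000
    workers = 4
    size = (len(data) + workers - 1) // workers
    chunks = [data[i:i + size] for i in range(0, len(data), size)]  # data partitioning
    with Pool(workers) as pool:
        sorted_chunks = pool.map(sort_chunk, chunks)   # each processor sorts its block
    result = list(merge(*sorted_chunks))               # combine the partial results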
2. Program Scheduling
Scheduling = deciding the order and allocation of tasks to processors.
Static Scheduling
o Tasks are assigned to processors before execution.
o Works well when tasks are predictable.
o Example: Loop iterations divided equally among processors.
Dynamic Scheduling
o Tasks are assigned at runtime depending on resource availability.
o Handles irregular or unpredictable workloads better.
Load Balancing
o Work must be distributed evenly among processors.
o Prevents idle processors and bottlenecks.
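A small sketch contrasting static and dynamic scheduling with multiprocessing.Pool: the chunksize argument pre-assigns large blocks (static style), while imap_unordered lets idle workers pull one task at a time (dynamic style); the task durations are deliberately uneven and purely illustrative.

from multiprocessing import Pool
import random, time

def task(n):
    time.sleep(n * 0.001)    # simulate work of varying size
    return n

if __name__ == "__main__":
    jobs = [random.randint(1, 50) for _ in range(200)]
    with Pool(4) as pool:
        # static style: the job list is split into 4 big blocks up front
        static = pool.map(task, jobs, chunksize=len(jobs) // 4)
        # dynamic style: workers fetch one job at a time as they finish
        dynamic = list(pool.imap_unordered(task, jobs, chunksize=1))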
Diagram (Conceptual)
Program Partitioning → Functional / Data
↓
Program Scheduling → Static / Dynamic
↓
Balanced Parallel Execution
Conclusion
Partitioning divides the program into manageable parallel tasks.
Scheduling arranges those tasks efficiently on processors.
Together, they ensure high performance, efficiency, and scalability in parallel
computing.
Program Flow Mechanisms
Introduction
In computer architecture, program flow mechanisms define how instructions are
executed and how the control/data flows in the system.
Different mechanisms support different types of parallelism.
Diagram (Conceptual)
Program Flow Mechanisms:
├── Control Flow → Sequential execution
├── Data Flow → Executes when data ready
├── Demand Driven → Executes when needed
└── Reduction → Expression reduction
Conclusion
Control flow is best for sequential tasks.
Data flow and reduction models exploit parallelism.
Demand-driven flow avoids redundant computation.
Together, these mechanisms provide different ways to achieve efficient execution in
modern architectures.
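A minimal sketch of the demand-driven (lazy) idea using a Python generator: a value is computed only when something downstream actually asks for it, so redundant computation is avoided.

def expensive(x):
    print("computing", x)    # visible marker showing when work actually happens
    return x * x

lazy = (expensive(x) for x in range(1_000_000))   # nothing computed yet

first_three = [next(lazy) for _ in range(3)]      # only three computations occur
print(first_three)           # [0, 1, 4]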
System Interconnect Architectures
Diagram (Conceptual)
System Interconnects:
├── Bus
├── Crossbar
├── Multistage (Omega, Banyan)
├── Mesh / Torus
├── Hypercube
└── Tree / Fat-Tree
Conclusion
Bus is simple but limited.
Crossbar is powerful but costly.
Multistage, Mesh, Hypercube, and Tree networks provide scalable and efficient
interconnections.
Choice of interconnect depends on system size, cost, and performance
requirements.