MCSE-103 Advanced Computer Architecture (June 2020)
#### Question 1a: Flynn's Classification of Computer Architectures
- **Flynn's Taxonomy**: Classifies computer architectures by the number of concurrent instruction streams and data streams, yielding four categories: SISD, SIMD, MISD, and MIMD.
- **SISD (Single Instruction, Single Data)**:
- **Architecture**:
- A single control unit issues one instruction stream that operates on a single data stream.
- **Examples**:
- Traditional uniprocessor machines.
- **Use Cases**: Suitable for general-purpose computing tasks where parallelism is not required.
- **Diagram**:
```
Control Unit -> Processing Element -> Data Stream
```
- **SIMD (Single Instruction, Multiple Data)**:
- **Architecture**:
- A single control unit broadcasts one instruction stream to multiple processing elements.
- Each processing element executes the same instruction on different pieces of data.
- **Examples**:
- Vector processors and GPU shader cores.
- **Use Cases**: Ideal for tasks that can be parallelized across large data sets, such as image processing or scientific simulations.
- **Diagram**:
```
     Control Unit
     |    |    |
    PE1  PE2  PE3
     |    |    |
    DS1  DS2  DS3
```
- **MISD (Multiple Instruction, Single Data)**:
- **Architecture**:
- Multiple processing units execute different instructions on the same data stream.
- **Examples**:
- Largely theoretical; fault-tolerant systems that run redundant computations on the same data are sometimes cited.
- **Use Cases**: Potentially useful in scenarios requiring multiple types of analysis on the same data.
- **Diagram**:
```
CU1 -> PE1 \
CU2 -> PE2  >-- Data Stream
CU3 -> PE3 /
```
- **MIMD (Multiple Instruction, Multiple Data)**:
- **Architecture**:
- Multiple autonomous processors execute different instructions on different data streams.
- **Examples**:
- Multicore processors.
- Distributed computing clusters.
- **Use Cases**: General-purpose parallel computing, from multicore desktops to supercomputers.
- **Diagram**:
```
CU1 -> PE1 -> Data Stream 1
CU2 -> PE2 -> Data Stream 2
CU3 -> PE3 -> Data Stream 3 ... CUn -> PEn -> Data Stream n
```
#### Question 1b: Need for Parallel Processing and Classification of Parallel Computing Structures
- **Performance**:
- Parallel processing can significantly increase computational speed by dividing tasks among multiple
processors.
- Example: Weather simulations, where data can be processed in parallel to speed up forecasts.
- **Efficiency**:
- Keeps hardware busy: while one task waits (e.g., on I/O), other processors continue doing useful work.
- **Scalability**:
- Workloads can grow by adding processors rather than redesigning the system around a single faster one.
- **Cost-Effectiveness**:
- Several commodity processors working in parallel are often cheaper than one processor of equivalent speed.
- Example: Financial modeling, where faster computations can lead to timely decisions and cost savings.
- **Pipelined Processors**:
- **Definition**: Divide instruction execution into stages so that several instructions are in flight at once:
```
Fetch -> Decode -> Execute -> Memory Access -> Write-Back
```
- **Vector Processors**:
- **Definition**: Operate on entire vectors (one-dimensional arrays) with a single instruction.
- **Use Case**: Efficient for scientific computations involving large data sets.
- **Diagram**:
```
Vector Registers -> Pipelined ALU -> Result Vector
```
- **Array Processors**:
- **Definition**: Grid of processors performing the same instruction on different data points.
- **Diagram**:
```
     Control Unit
    /     |     \
  PE1 -- PE2 -- PE3   (grid of PEs in lockstep)
```
- **Multithreaded Processors**:
- **Definition**: Use multiple threads within a single processor to perform tasks concurrently.
- **Use Case**: Enhances performance for applications that can be parallelized at the thread level.
- **Diagram**:
```
Processor Core
Thread 1
Thread 2
```
- **Multiprocessors**:
- **Definition**: Systems with two or more processors that cooperate on a single workload.
- **Types**: Tightly coupled (shared memory) vs. loosely coupled (distributed memory).
- **Diagram**:
```
CPU1   CPU2   CPU3
   \     |     /
   Shared Memory
```
- **Definition**:
- Pipelining is a technique where multiple instruction phases are overlapped to improve processing
efficiency.
- Each stage in the pipeline performs a part of an instruction, passing it to the next stage in a sequential
manner.
- **Stages**: Each instruction moves through a fixed sequence of stages, ideally advancing one stage per clock cycle.
- **Example**:
- An instruction pipeline with five stages: Fetch, Decode, Execute, Memory Access, Write-Back.
- **Benefits**:
- **Increased Throughput**: Multiple instructions are processed simultaneously, increasing the overall
processing speed.
- **Resource Efficiency**: Better utilization of processor resources by keeping all stages active.
- **Diagram**:
```
Time ->
I1: F  D  E  M  W
I2:    F  D  E  M  W
I3:       F  D  E  M  W
```
- **Increased Throughput**:
- Example: If each stage takes one clock cycle, a five-stage pipeline completes one instruction per cycle once the pipeline is full, instead of one instruction every five cycles.
- **Reduced Total Execution Time**:
- The latency of a single instruction is unchanged, but the total time for a sequence of instructions drops sharply.
- Example: In a non-pipelined system, instructions would be executed strictly one after another, increasing total wait time.
- **Resource Efficiency**:
- Example: Instead of having one instruction monopolize the processor, multiple instructions share
resources, reducing idle time.
- **Illustration**:
- Diagram:
```
Non-Pipelined (5 cycles per instruction):
Time ->  1  2  3  4  5  6  7  8  9  10 ...
         [---I1---][---I2---][---I3---]

Pipelined:
Time ->  1  2  3  4  5  6  7
Fetch    I1 I2 I3
Decode      I1 I2 I3
Execute        I1 I2 I3
Mem               I1 I2 I3
WB                   I1 I2 I3
```
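The cycle counts behind these diagrams can be sketched in a few lines of Python (an illustration, not part of the original notes), assuming one stage per clock cycle and no stalls:

```python
# Sketch: cycle counts for n instructions through a k-stage pipeline,
# assuming one stage per clock cycle and no stalls.

def non_pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    # Each instruction occupies the processor for all k stages.
    return n_instructions * k_stages

def pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    # The first instruction takes k cycles; each later instruction
    # completes one cycle after the previous (steady state: 1 per cycle).
    return k_stages + (n_instructions - 1)

print(non_pipelined_cycles(3, 5))  # 15 cycles
print(pipelined_cycles(3, 5))      # 7 cycles
```

Three instructions on a five-stage pipeline finish in 7 cycles instead of 15, matching the overlapped timeline above.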
- **Speedup**:
- **Definition**: The ratio of the time taken to complete a task without pipelining to the time taken
with pipelining.
- **Example**:
- A task that takes 100 cycles without pipelining and 25 cycles with pipelining gives Speedup = 100 / 25 = 4.
- **Diagram**:
```
Speedup = T_non-pipelined / T_pipelined
```
- **Throughput**: The number of instructions completed per unit time.
- **Example**:
- If a pipelined processor can complete 10 instructions in 10 cycles, its throughput is 1 instruction per
cycle.
- **Diagram**:
```
Throughput = Instructions completed / Total cycles
```
- **Efficiency**:
- **Definition**: The ratio of useful work done to the total work expended, often expressed as Speedup / Number of stages.
- **Example**:
- A five-stage pipeline completing 100 instructions in 104 cycles achieves a speedup of about 4.8, i.e. an efficiency of roughly 96%.
- This high efficiency is due to the overlap of instruction processing stages, minimizing idle time and maximizing use of resources.
- **Diagram**:
```
Efficiency = Speedup / Number of stages
```
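The three metrics above can be computed together; the sketch below (illustrative, assuming an ideal stall-free pipeline) follows the definitions directly:

```python
# Sketch: speedup, throughput, and efficiency for n instructions
# through a k-stage pipeline (one cycle per stage, no stalls).

def pipeline_metrics(n: int, k: int):
    t_serial = n * k              # cycles without pipelining
    t_pipe = k + (n - 1)          # cycles with pipelining
    speedup = t_serial / t_pipe
    throughput = n / t_pipe       # instructions completed per cycle
    efficiency = speedup / k      # fraction of the ideal k-fold speedup
    return speedup, throughput, efficiency

s, t, e = pipeline_metrics(100, 5)
print(round(s, 2), round(t, 2), round(e, 2))  # 4.81 0.96 0.96
```

As n grows, speedup approaches k and efficiency approaches 100%, which is why long instruction streams benefit most from pipelining.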
### Unit 1: Vector Processing and SIMD Array Processor
- **Definition**:
- Vector processing involves executing a single instruction on multiple data elements simultaneously.
- This contrasts with scalar processing, where each instruction operates on a single data element at a
time.
- **Applications**:
- **Scientific Computing**: Vector processors excel in tasks such as linear algebra operations (matrix
multiplications, vector additions).
- **Graphics Processing**: Used in rendering pipelines for transforming and shading vertices.
- **Signal Processing**: Efficient for processing large volumes of data in real-time applications (e.g.,
audio and video processing).
- **Benefits**:
- **Performance**: Handles large datasets efficiently by processing multiple data elements in parallel.
- **Speed**: Significantly faster than scalar processing for operations on large arrays or matrices.
- **Power Efficiency**: Achieves higher performance per watt compared to scalar processors due to
parallelism.
- **Example**:
- Cray supercomputers historically used vector processing units for scientific simulations and modeling.
- **Diagram**:
```
Vector Registers -> Pipelined Functional Units -> Result Vector
```
- **Definition**:
- SIMD (Single Instruction, Multiple Data) array processors execute the same instruction on multiple
data elements simultaneously.
- Arrays of processing elements (PEs) operate in parallel under the control of a central unit (CU).
- **Architecture**:
- **Control Unit (CU)**: Fetches instructions and broadcasts them to all processing elements.
- **Processing Elements (PEs)**: Execute the same instruction but on different data elements.
- **Applications**:
- **Graphics Processing Units (GPUs)**: Use SIMD architecture for parallel execution of shader
programs.
- **Machine Learning**: SIMD processors optimize parallel operations in neural network training.
- **Diagram**:
```
          Control Unit
        /      |      \
      PE1     PE2     PE3
       |       |       |
     Mem1    Mem2    Mem3
```
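The CU/PE relationship can be mimicked in ordinary Python (a toy model, not a real ISA; the class names are chosen here for illustration):

```python
# Sketch: a control unit broadcasting one instruction to several
# processing elements, each holding its own local data (SIMD style).

class ProcessingElement:
    def __init__(self, local_data):
        self.data = local_data

    def execute(self, op):
        # Every PE applies the same operation to its own data element.
        self.data = op(self.data)

class ControlUnit:
    def __init__(self, pes):
        self.pes = pes

    def broadcast(self, op):
        # Single instruction, multiple data: one op, all PEs in lockstep.
        for pe in self.pes:
            pe.execute(op)

pes = [ProcessingElement(x) for x in [1, 2, 3, 4]]
cu = ControlUnit(pes)
cu.broadcast(lambda x: x * 10)   # same instruction on different data
print([pe.data for pe in pes])   # [10, 20, 30, 40]
```

A real SIMD machine performs the "loop" in hardware, in parallel; the model only captures the single-instruction, multiple-data control structure.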
- **Data Hazards**:
- **Definition**: Arise when instructions close together in the pipeline depend on the same register or memory location.
- **Types**:
- **RAW (Read After Write)**: Reading a register before its value is updated.
- **WAR (Write After Read)**: Writing to a register before its previous value is read.
- **WAW (Write After Write)**: Writing to the same register multiple times before the previous write
completes.
- **Resolution**:
- **Forwarding**: Passing data directly from one pipeline stage to another to avoid stalls.
- **Stalling**: Inserting pipeline bubbles until the required value becomes available.
- **Control Hazards**:
- **Definition**: Arise due to conditional branches that affect program flow.
- **Resolution**:
- **Branch Prediction**: Speculating whether a branch will be taken or not before the actual decision.
- **Delayed Branching**: Filling the slot after a branch with instructions that execute regardless of the branch outcome, hiding the decision delay.
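The three data-hazard types can be checked mechanically; the sketch below (an illustration, with instructions simplified to a destination register plus source registers) applies the definitions above:

```python
# Sketch: detecting RAW, WAR, and WAW hazards between two instructions,
# each modeled as (destination register, set of source registers).

def hazards(first, second):
    """first executes before second; returns the hazard types between them."""
    dst1, srcs1 = first
    dst2, srcs2 = second
    found = []
    if dst1 in srcs2:
        found.append("RAW")   # second reads what first writes
    if dst2 in srcs1:
        found.append("WAR")   # second writes what first reads
    if dst1 == dst2:
        found.append("WAW")   # both write the same register
    return found

# ADD R1, R2, R3  then  SUB R4, R1, R5: SUB reads R1 before ADD's write-back
print(hazards(("R1", {"R2", "R3"}), ("R4", {"R1", "R5"})))  # ['RAW']
```

A forwarding unit performs essentially this comparison in hardware between pipeline stages, routing the fresh value directly to the dependent instruction.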
- **Multicomputer Systems**:
- **Definition**: Consist of multiple autonomous nodes, each with its own private memory, connected by a network.
- **Characteristics**: Distributed memory with no shared address space; processes communicate by explicit message passing.
- **Use Cases**: High availability, scalability, and fault tolerance in distributed computing environments.
- **Multiprocessor Systems**:
- **Definition**: Consist of multiple processors sharing a common memory and operating system.
- **Characteristics**: Shared address space, a single operating-system image, and fast communication through shared memory.
- **Use Cases**: High-performance computing, where shared memory access speeds up inter-process
communication and data sharing.
- **Definition**:
- Multiprocessor systems feature multiple processors that share a common memory space and can
execute tasks concurrently.
- **Architectural Models**:
- **Tightly Coupled**:
- Processors share a common memory and communicate through it; low latency, but scalability is limited by memory contention.
- **Loosely Coupled**:
- Each processor has its own local memory and communicates with the others over a network.
- Offers scalability and fault tolerance but requires efficient message-passing protocols.
- **Use Cases**:
- Tightly coupled systems are ideal for real-time processing and high-performance computing (HPC).
- Loosely coupled systems excel in distributed computing environments where scalability and fault
tolerance are critical.
- **Characteristics**:
- **Scalability**: Easily scalable by adding more processors and nodes to the network.
- **Intra-processor Communication**:
- Processors communicate within the system using shared buses or interconnection networks.
- **Inter-processor Communication**:
- Data exchange between processors involves message-passing protocols that manage communication
overhead.
- **Definition**:
- Interconnection networks connect processors, memory, and I/O devices within a multiprocessor or
multicomputer system.
- **Schemes**:
- **Bus-based**:
- All processors share a single bus; simple and inexpensive, but the bus becomes a bottleneck as processors are added.
- **Crossbar Switch**:
- A dedicated switch sits between every processor and every memory module; non-blocking, but the switch count grows as n x n.
- **Multistage Networks**:
- Several stages of small switches (e.g., an omega network) route traffic between processors and memory.
- Provides fault tolerance and scalability, common in supercomputers and HPC clusters.
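The cost trade-off between these schemes is easy to quantify; this sketch (illustrative, assuming n processors and n memory modules, with n a power of two for the omega network) counts switching elements:

```python
# Sketch: switch counts for crossbar vs. multistage (omega) networks
# connecting n processors to n memory modules.

import math

def crossbar_switches(n: int) -> int:
    # One crosspoint per (processor, memory) pair: n^2 total.
    return n * n

def omega_switches(n: int) -> int:
    # log2(n) stages, each with n/2 two-by-two switches.
    return (n // 2) * int(math.log2(n))

print(crossbar_switches(64))  # 4096
print(omega_switches(64))     # 192
```

For 64 processors the multistage network needs roughly 20x fewer switches, which is why large systems accept its blocking behavior in exchange for the lower cost.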
- **Definition**:
- Load balancing distributes tasks and computational load evenly across processors to optimize system
performance.
- **Techniques**:
- Adjusts task assignment in real-time based on current system load and performance metrics.
- Example: Task stealing where idle processors take on tasks from overloaded processors.
- **Example**:
- Job scheduling algorithms dynamically allocate tasks to processors based on their current workload.
- Load balancing ensures efficient resource utilization and minimizes idle time in multiprocessor
systems.
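A common scheduling heuristic behind such algorithms is "longest task to the least-loaded processor"; the sketch below (an illustration, with task costs as plain numbers) implements it with a min-heap:

```python
# Sketch: greedy load balancing -- each task is assigned to the
# currently least-loaded processor, largest tasks first.

import heapq

def balance(task_costs, n_processors):
    # Min-heap of (current load, processor id).
    loads = [(0, p) for p in range(n_processors)]
    heapq.heapify(loads)
    assignment = {p: [] for p in range(n_processors)}
    for cost in sorted(task_costs, reverse=True):
        load, p = heapq.heappop(loads)        # least-loaded processor
        assignment[p].append(cost)
        heapq.heappush(loads, (load + cost, p))
    return assignment

result = balance([7, 3, 5, 2, 8, 4], 2)
print({p: sum(tasks) for p, tasks in result.items()})  # {0: 15, 1: 14}
```

The two processors end up with nearly equal loads (15 vs. 14 units). Dynamic schemes such as task stealing achieve the same effect at run time, without knowing task costs in advance.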
### Unit 3: Synchronization and Cache Coherence
- **Definition**:
- Synchronization coordinates concurrent processes so that they access shared data in a safe, well-defined order.
- **Mechanisms**:
- **Mutual Exclusion**:
- Locks or semaphores guarantee that only one process is inside a critical section at a time.
- **Atomic Operations**:
- Hardware primitives (e.g., test-and-set, compare-and-swap) that complete without interruption.
- **Barrier Synchronization**:
- Ensures that all processes reach a specific point before continuing execution.
- **Diagram**:
```
P1 --->|
P2 --->| barrier |---> all proceed together
P3 --->|
```
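Barrier synchronization maps directly onto Python's standard `threading.Barrier`; the short sketch below (illustrative thread names and counts) shows that no thread passes the barrier until all parties have arrived:

```python
# Sketch: barrier synchronization with threading.Barrier --
# every thread blocks at wait() until all 3 parties have arrived.

import threading

barrier = threading.Barrier(3)
results = []

def worker(i):
    # ... per-thread work would happen here ...
    barrier.wait()        # block until all 3 threads reach this point
    results.append(i)     # only runs after every thread has arrived

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))    # [0, 1, 2]
```

If the barrier were removed, threads could race ahead independently; with it, the post-barrier code in every thread is guaranteed to start only after all three have finished their pre-barrier work.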
- **Definition**:
- Cache coherence ensures that multiple processors accessing shared data maintain consistency across
their local caches.
- **Protocols**:
- **MESI Protocol**:
- Maintains cache coherence using four states: Modified, Exclusive, Shared, and Invalid.
- Ensures that only one cache has the right to modify a given block of data at a time.
- **MOESI Protocol**:
- Extends MESI with an Owned state, allowing a cache that holds modified data to supply it directly to other caches without first writing it back to main memory.
- **MESIF Protocol**:
- Refines MESI by introducing a Forward state that designates a single cache to respond to requests for shared data, reducing redundant replies.
- **Implementation**:
- Hardware-based coherence protocols ensure consistent data across caches through snooping or
directory-based approaches.
- Example: Intel processors use MESI-based protocols to maintain cache coherence efficiently.
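The core MESI transitions can be modeled in a toy simulator (a deliberately simplified sketch for one cache line and two caches; real protocols handle writebacks, evictions, and many more events):

```python
# Sketch: toy MESI state machine for a single cache line in two caches.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def read(states, cache):
    # On a read miss, any other holder of the line drops to Shared.
    if states[cache] == I:
        others_have_it = any(s != I for c, s in states.items() if c != cache)
        for c in states:
            if states[c] in (M, E):
                states[c] = S     # (writeback of Modified data omitted)
        states[cache] = S if others_have_it else E

def write(states, cache):
    # A write invalidates every other copy; the writer becomes Modified.
    for c in states:
        states[c] = I
    states[cache] = M

states = {"cache0": I, "cache1": I}
read(states, "cache0")    # cache0 -> Exclusive (sole copy)
read(states, "cache1")    # both -> Shared
write(states, "cache1")   # cache1 -> Modified, cache0 -> Invalid
print(states)             # {'cache0': 'Invalid', 'cache1': 'Modified'}
```

This captures the invariant the notes describe: at any moment at most one cache holds the line in a writable (Modified/Exclusive) state, so only one cache can modify a given block at a time.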
### Conclusion
These notes cover various aspects of parallel computing, from Flynn's and Handler's classifications to pipelining, vector processing, hazards, multiprocessor architectures, interconnection networks, load balancing, synchronization, and cache coherence, with explanations, examples, and diagrams illustrating key concepts in advanced computer architecture.