# MCSE-103 Advanced Computer Architecture (June 2020)

The document discusses Flynn's classification of parallel computing structures, detailing SISD, SIMD, MISD, and MIMD architectures, along with their definitions, examples, and use cases. It also covers the need for parallel processing, the benefits of pipelining, vector processing, and the differences between multicomputer and multiprocessor systems. Additionally, it explains interconnection network schemes and load balancing techniques in multiprocessor systems.


### Unit 1: Flynn's and Handler's Classification of Parallel Computing Structures

#### Question 1a: Flynn's Classification of Parallel Processing

- **Flynn's Taxonomy**:

- **SISD (Single Instruction stream, Single Data stream)**:

- **Definition**: A single instruction operates on a single data point at a time.

- **Architecture**:

- Single control unit directs operations.

- One processing element executes instructions.

- **Examples**:

- Traditional personal computers.

- Simple microprocessors (e.g., early Intel CPUs).

- **Use Cases**: Suitable for general-purpose computing tasks where parallelism is not required.

- **Diagram**:

```

Control Unit -> Processing Element -> Memory

```

- **SIMD (Single Instruction stream, Multiple Data streams)**:

- **Definition**: One instruction operates on multiple data points simultaneously.

- **Architecture**:

- Single control unit broadcasts instructions to multiple processing elements.

- Each processing element executes the same instruction on different pieces of data.

- **Examples**:

- Modern Graphics Processing Units (GPUs).

- Vector processors used in scientific computing.

- **Use Cases**: Ideal for tasks that can be parallelized across large data sets, such as image processing or scientific simulations.

- **Diagram**:

```
Control Unit

| | |

PE1 PE2 PE3 ... PEn

```

- **MISD (Multiple Instruction streams, Single Data stream)**:

- **Definition**: Multiple instructions operate on a single data stream.

- **Architecture**:

- Multiple control units and processing elements.

- Rarely used due to limited practical applications.

- **Examples**:

- Hypothetical systems, certain fault-tolerant systems.

- **Use Cases**: Potentially useful in scenarios requiring multiple types of analysis on the same data.

- **Diagram**:

```

CU1 -> PE1

CU2 -> PE2

CU3 -> PE3 ... CUn -> PEn

Data Stream

```

- **MIMD (Multiple Instruction streams, Multiple Data streams)**:

- **Definition**: Multiple processors execute different instructions on different data points simultaneously.

- **Architecture**:

- Multiple autonomous processors, each with its own control unit.

- Processors can operate asynchronously.

- **Examples**:

- Multicore processors.

- Distributed systems like computer clusters.


- **Use Cases**: Suitable for a wide range of applications, from general-purpose computing to large-scale scientific simulations.

- **Diagram**:

```

CU1 -> PE1 -> Data Stream 1

CU2 -> PE2 -> Data Stream 2

CU3 -> PE3 -> Data Stream 3 ... CUn -> PEn -> Data Stream n

```
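
- **Sketch (Python)**: A minimal illustration of the SIMD and MIMD categories above, using NumPy to apply one operation across a whole array (SIMD-style) and a thread pool to run different operations on different data (MIMD-style). The function names and data are invented for the example; this is an analogy, not a hardware model.

```
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# SIMD-style: one operation ("add 1") broadcast over many data elements at once.
data = np.arange(8)
simd_result = data + 1                      # every element processed by the same instruction

# MIMD-style: independent workers apply different operations to different data.
def square(x):
    return x * x

def negate(x):
    return -x

tasks = [(square, 3), (negate, 5), (square, 7)]
with ThreadPoolExecutor(max_workers=3) as pool:
    mimd_results = list(pool.map(lambda t: t[0](t[1]), tasks))

print(simd_result)                          # [1 2 3 4 5 6 7 8]
print(mimd_results)                         # [9, -5, 49]
```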

#### Question 1b: Need for Parallel Processing and Classification of Parallel Computing Structures

- **Need for Parallel Processing**:

- **Performance**:

- Parallel processing can significantly increase computational speed by dividing tasks among multiple
processors.

- Example: Weather simulations, where data can be processed in parallel to speed up forecasts.

- **Efficiency**:

- Utilizes resources more effectively by distributing workloads across processors.

- Example: Data centers distributing web requests across multiple servers.

- **Scalability**:

- Can handle larger problems by adding more processors.

- Example: Scientific research requiring large-scale simulations or data analysis.

- **Cost-Effectiveness**:

- Reduces processing time, leading to lower operational costs.

- Example: Financial modeling where faster computations can lead to timely decisions and cost
savings.
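
- **Sketch (Python)**: A minimal illustration of the performance argument above, splitting an independent summation across worker processes with `multiprocessing`; the chunk size and worker count are arbitrary choices for the example.

```
from multiprocessing import Pool

def partial_sum(bounds):
    """Sum the integers in [lo, hi) -- one independent chunk of the workload."""
    lo, hi = bounds
    return sum(range(lo, hi))

if __name__ == "__main__":
    n, workers = 10_000_000, 4
    step = n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]

    with Pool(processes=workers) as pool:
        total = sum(pool.map(partial_sum, chunks))   # chunks run in parallel

    print(total == sum(range(n)))   # True: same answer, work divided four ways
```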

- **Classification of Parallel Computing Structures**:

- **Pipelined Processors**:

- **Definition**: Overlapping phases of instruction execution.

- **Stages**: Fetch, decode, execute, memory access, write-back.

- **Use Case**: Increases instruction throughput, commonly used in modern CPUs.


- **Diagram**:

```

Fetch -> Decode -> Execute -> Memory Access -> Write-Back

```

- **Vector Processors**:

- **Definition**: Perform operations on entire vectors simultaneously.

- **Use Case**: Efficient for scientific computations involving large data sets.

- **Example**: Cray supercomputers.

- **Diagram**:

```

Vector Processor

```

- **Array Processors**:

- **Definition**: Grid of processors performing the same instruction on different data points.

- **Use Case**: Suitable for data-parallel tasks.

- **Example**: Early SIMD systems.

- **Diagram**:

```

Array of Processing Elements (PE)

```

- **Multithreaded Processors**:

- **Definition**: Use multiple threads within a single processor to perform tasks concurrently.

- **Use Case**: Enhances performance for applications that can be parallelized at the thread level.

- **Example**: Modern CPUs with hyper-threading.

- **Diagram**:

```

Processor Core

Thread 1

Thread 2
```

- **Multiprocessors**:

- **Definition**: Systems with multiple processors working on different tasks.

- **Types**: Tightly coupled (shared memory) vs. loosely coupled (distributed memory).

- **Use Case**: Suitable for high-performance computing tasks.

- **Diagram**:

```

Multiple Processors -> Shared Memory

```

### Unit 1: Pipelined and Vector Processors

#### Question 2a: What is Pipelining?

- **Definition**:

- Pipelining is a technique where multiple instruction phases are overlapped to improve processing
efficiency.

- Each stage in the pipeline performs a part of an instruction, passing it to the next stage in a sequential
manner.

- **Processing in the Pipeline**:

- **Stages**:

- **Fetch**: Retrieve instruction from memory.

- **Decode**: Determine the required operations and operands.

- **Execute**: Perform the operations.

- **Memory Access**: Read/write data from/to memory.

- **Write-Back**: Store the results back in registers.

- **Example**:

- An instruction pipeline with five stages: Fetch, Decode, Execute, Memory Access, Write-Back.

- Each stage processes a different part of an instruction simultaneously.

- **Benefits**:
- **Increased Throughput**: Multiple instructions are processed simultaneously, increasing the overall
processing speed.

- **Resource Efficiency**: Better utilization of processor resources by keeping all stages active.

- **Diagram**:

```

Time ->

Fetch Decode Execute Mem Access Write-Back

| | | | |

+------->+------->+------->+--------->+---------->

```
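
- **Sketch (Python)**: A small simulation that prints which instruction occupies each of the five stages on every clock cycle, assuming one cycle per stage and no hazards (an idealization for illustration).

```
STAGES = ["Fetch", "Decode", "Execute", "MemAccess", "WriteBack"]

def pipeline_timeline(num_instructions):
    """Print which instruction is in each stage on every cycle (ideal pipeline)."""
    total_cycles = len(STAGES) + num_instructions - 1
    for cycle in range(total_cycles):
        row = []
        for s, stage in enumerate(STAGES):
            i = cycle - s                   # instruction index occupying this stage
            row.append(f"{stage}:I{i + 1}" if 0 <= i < num_instructions else f"{stage}:--")
        print(f"cycle {cycle + 1:2d}  " + "  ".join(row))

pipeline_timeline(3)                        # three instructions through a five-stage pipeline
```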

#### Question 2b: Why Does Pipelining Improve Performance?

- **Increased Throughput**:

- Multiple instructions are processed simultaneously, leading to a higher number of instructions executed per unit of time.

- Example: If each stage takes one clock cycle, a five-stage pipeline completes one instruction per cycle once it is full, i.e., five instructions every five cycles.

- **Reduced Waiting Time**:

- Pipelining does not shorten the latency of a single instruction, but because stages overlap, each instruction starts sooner instead of waiting for the previous one to finish entirely.

- Example: In a non-pipelined system, every instruction must wait for the previous one to complete all stages before it can even be fetched.

- **Resource Efficiency**:

- Keeps all stages of the processor active, maximizing resource usage.

- Example: Instead of having one instruction monopolize the processor, multiple instructions share
resources, reducing idle time.

- **Illustration**:

- In a non-pipelined architecture, an instruction might take five cycles to complete. In a pipelined architecture, once the pipeline is full, an instruction completes every cycle.

- Diagram:

```
Non-Pipelined:

Time -> 1 2 3 4 5 6 7 8 9 10

I1 I2 I3

Pipelined:

Time -> 1 2 3 4 5 6 7 8 9 10

Fetch -> I1 I2 I3

Decode -> I1 I2 I3

Execute-> I1 I2 I3

Mem -> I1 I2 I3

WB -> I1 I2 I3

```

### Unit 1: Speedup, Throughput, and Efficiency of Pipelined Architecture

#### Question 3a: Speedup, Throughput, and Efficiency of a Pipelined Architecture

- **Speedup**:

- **Definition**: The ratio of the time taken to complete a task without pipelining to the time taken
with pipelining.

- **Formula**: Speedup (S) = Non-Pipelined Time / Pipelined Time.

- **Example**:

- Non-pipelined execution time: 100 cycles.

- Pipelined execution time: 25 cycles.

- Speedup: 100 / 25 = 4.

- **Diagram**:

```

Speedup ->

Non-Pipelined -> |------------------------------|

Pipelined -> |----|----|----|----|


```

- **Throughput**:

- **Definition**: The number of instructions processed per unit time.

- **Formula**: Throughput (T) = Number of Instructions / Time.

- **Example**:

- If a pipelined processor can complete 10 instructions in 10 cycles, its throughput is 1 instruction per
cycle.

- This is compared to a non-pipelined processor, where instructions complete sequentially, resulting in a lower throughput.

- **Diagram**:

```

Throughput ->

Pipelined -> |----|----|----|----|----|

```

- **Efficiency**:

- **Definition**: The fraction of stage-time slots that perform useful work, i.e., how close the pipeline comes to its ideal speedup.

- **Formula**: Efficiency = Speedup / Number of stages = n / (k + n - 1) for n instructions on a k-stage pipeline (one cycle per stage).

- **Example**:

- 100 instructions on a 5-stage pipeline take 5 + 100 - 1 = 104 cycles instead of 500.

- Speedup = 500 / 104 ≈ 4.8, so Efficiency = 4.8 / 5 ≈ 96%.

- Efficiency approaches 100% for long instruction streams because, once the pipeline is full, every stage does useful work in every cycle.

- **Diagram**:

```

Efficiency ->

Pipelined -> |----|----|----|----|----|

```
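
- **Sketch (Python)**: The three measures computed together for n instructions on a k-stage pipeline, assuming one cycle per stage and no stalls (the same idealization used in the examples above).

```
def pipeline_metrics(n, k, cycle_time=1.0):
    """Ideal pipeline metrics for n instructions on a k-stage pipeline."""
    non_pipelined_time = n * k * cycle_time
    pipelined_time = (k + n - 1) * cycle_time   # fill time + one completion per cycle
    speedup = non_pipelined_time / pipelined_time
    throughput = n / pipelined_time             # instructions per unit time
    efficiency = speedup / k                    # fraction of stage-cycles doing useful work
    return speedup, throughput, efficiency

s, t, e = pipeline_metrics(n=100, k=5)
print(f"speedup={s:.2f}, throughput={t:.3f} instr/cycle, efficiency={e:.1%}")
# speedup=4.81, throughput=0.962 instr/cycle, efficiency=96.2%
```
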
### Unit 1: Vector Processing and SIMD Array Processor

#### Question 4a: What is Vector Processing?

- **Definition**:

- Vector processing involves executing a single instruction on multiple data elements simultaneously.

- This contrasts with scalar processing, where each instruction operates on a single data element at a
time.

- **Applications**:

- **Scientific Computing**: Vector processors excel in tasks such as linear algebra operations (matrix
multiplications, vector additions).

- **Graphics Processing**: Used in rendering pipelines for transforming and shading vertices.

- **Signal Processing**: Efficient for processing large volumes of data in real-time applications (e.g.,
audio and video processing).

- **Benefits**:

- **Performance**: Handles large datasets efficiently by processing multiple data elements in parallel.

- **Speed**: Significantly faster than scalar processing for operations on large arrays or matrices.

- **Power Efficiency**: Achieves higher performance per watt compared to scalar processors due to
parallelism.

- **Example**:

- Cray supercomputers historically used vector processing units for scientific simulations and modeling.

- **Diagram**:

```

Vector Processor

```
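
- **Sketch (Python)**: The same arithmetic written element by element (scalar style) and as one whole-vector operation (vector style) using NumPy; the array size is arbitrary, and on real vector hardware the whole-array form would map onto vector instructions.

```
import numpy as np

a = np.random.rand(100_000)
b = np.random.rand(100_000)

# Scalar style: one element per operation.
c_scalar = np.empty_like(a)
for i in range(len(a)):
    c_scalar[i] = a[i] * 2.0 + b[i]

# Vector style: one operation over the entire vector.
c_vector = a * 2.0 + b

print(np.allclose(c_scalar, c_vector))      # True -- same result, one vector operation
```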

#### Question 4b: SIMD Array Processor

- **Definition**:

- SIMD (Single Instruction, Multiple Data) array processors execute the same instruction on multiple
data elements simultaneously.

- Arrays of processing elements (PEs) operate in parallel under the control of a central unit (CU).
- **Architecture**:

- **Control Unit (CU)**: Issues instructions to multiple PEs.

- **Processing Elements (PEs)**: Execute the same instruction but on different data elements.

- **Interconnection Network**: Facilitates data exchange between PEs and memory.

- **Applications**:

- **Graphics Processing Units (GPUs)**: Use SIMD architecture for parallel execution of shader
programs.

- **Scientific Computing**: Accelerates simulations involving large-scale computations.

- **Machine Learning**: SIMD processors optimize parallel operations in neural network training.

- **Diagram**:

```

Control Unit (CU) -> [PE1, PE2, PE3, ... , PEn]

```

### Unit 2: Data and Control Hazards

#### Question 5a: Data and Control Hazards

- **Data Hazards**:

- **Definition**: Occur when instructions depend on the results of previous instructions.

- **Types**:

- **RAW (Read After Write)**: An instruction reads a register before an earlier instruction has written its new value, so it would see stale data.

- **WAR (Write After Read)**: An instruction writes a register before an earlier instruction has read its old value.

- **WAW (Write After Write)**: Two instructions write the same register, and the later write must not be overtaken by the earlier one.

- **Resolution**:

- **Forwarding**: Passing data directly from one pipeline stage to another to avoid stalls.

- **Pipeline Interlocks**: Inserting bubbles (no-ops) to prevent data hazards.

- **Register Renaming**: Using additional registers to avoid name conflicts.

- **Control Hazards**:
- **Definition**: Arise due to conditional branches that affect program flow.

- **Resolution**:

- **Branch Prediction**: Speculating whether a branch will be taken or not before the actual decision.

- **Delayed Branching**: Filling the slot(s) after a branch with instructions that are useful regardless of the branch outcome, so the pipeline never has to discard work.

- **Dynamic Scheduling**: Reordering independent instructions at run time so useful work continues while the branch is resolved.
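
- **Sketch (Python)**: A minimal check for RAW, WAR, and WAW dependences between two instructions, given their destination and source registers; the `(dest, srcs)` encoding is invented for the example.

```
def hazards(earlier, later):
    """Classify data hazards between two instructions.

    Each instruction is a (dest, srcs) pair, e.g. ("r1", {"r2", "r3"}).
    """
    e_dest, e_srcs = earlier
    l_dest, l_srcs = later
    found = []
    if e_dest in l_srcs:
        found.append("RAW")                 # later reads what earlier writes
    if l_dest in e_srcs:
        found.append("WAR")                 # later writes what earlier still reads
    if l_dest == e_dest:
        found.append("WAW")                 # both write the same register
    return found

i1 = ("r1", {"r2", "r3"})                   # r1 <- r2 + r3
i2 = ("r4", {"r1", "r5"})                   # r4 <- r1 * r5
print(hazards(i1, i2))                      # ['RAW']
```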

#### Question 5b: Difference Between Multicomputer and Multiprocessor Systems

- **Multicomputer Systems**:

- **Definition**: Comprise multiple independent computers connected via a network.

- **Characteristics**:

- Each computer has its own memory and operating system.

- Communication between computers occurs via message passing.

- Examples include clusters of PCs or workstations connected over a network.

- **Use Cases**: High availability, scalability, and fault tolerance in distributed computing
environments.

- **Multiprocessor Systems**:

- **Definition**: Consist of multiple processors sharing a common memory and operating system.

- **Characteristics**:

- Processors access shared memory for communication and synchronization.

- Examples include symmetric multiprocessing (SMP) systems or NUMA architectures.

- **Use Cases**: High-performance computing, where shared memory access speeds up inter-process
communication and data sharing.

### Unit 2: Multiprocessor Models

#### Question 6a: Multiprocessor Architectural Models

- **Definition**:

- Multiprocessor systems feature multiple processors that share a common memory space and can
execute tasks concurrently.

- **Architectural Models**:
- **Tightly Coupled**:

- Processors share memory and communicate directly.

- Suitable for applications requiring high-speed communication and synchronization.

- **Loosely Coupled**:

- Processors have separate memories and communicate via interconnection networks.

- Offers scalability and fault tolerance but requires efficient message-passing protocols.

- **Use Cases**:

- Tightly coupled systems are ideal for real-time processing and high-performance computing (HPC).

- Loosely coupled systems excel in distributed computing environments where scalability and fault
tolerance are critical.

#### Question 6b: Loosely Coupled Multiprocessor System

- **Characteristics**:

- **Independent Memory**: Each processor has its own memory space.

- **Communication**: Processors communicate via message passing over a network.

- **Scalability**: Easily scalable by adding more processors and nodes to the network.

- **Examples**: Beowulf clusters, grid computing networks.

- **Intra-node Communication**:

- Within each node, the processor communicates with its own local memory and I/O over a local bus.

- **Inter-processor Communication**:

- Data exchange between processors involves message-passing protocols that manage communication
overhead.
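
- **Sketch (Python)**: A minimal message-passing exchange using processes and queues as stand-ins for independent nodes; a real loosely coupled system would communicate over a network (e.g., MPI or sockets) instead.

```
from multiprocessing import Process, Queue

def node(node_id, inbox, outbox):
    """One 'computer' with private state; it communicates only by messages."""
    local_memory = {"id": node_id}          # nothing here is shared with other nodes
    msg = inbox.get()                       # blocking receive
    outbox.put(f"node {node_id} processed '{msg}'")

if __name__ == "__main__":
    to_node, from_node = Queue(), Queue()
    p = Process(target=node, args=(1, to_node, from_node))
    p.start()
    to_node.put("task A")                   # send a message instead of sharing memory
    print(from_node.get())                  # node 1 processed 'task A'
    p.join()
```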

### Unit 3: Interconnection Networks and Load Balancing

#### Question 7a: Interconnection Network Schemes

- **Definition**:

- Interconnection networks connect processors, memory, and I/O devices within a multiprocessor or
multicomputer system.

- **Schemes**:
- **Bus-based**:

- Uses a shared communication bus for data exchange.

- Simple and cost-effective but can lead to bottlenecks.

- **Crossbar Switch**:

- Directly connects multiple devices in a non-blocking manner.

- Offers high throughput but can be costly and complex to implement.

- **Multistage Networks**:

- Connects devices in multiple stages (layers) of switches.

- Balances cost and performance, commonly used in large-scale systems.

- **Mesh and Torus**:

- Grid-based structures connecting processors in a mesh or toroidal topology.

- Provides fault tolerance and scalability, common in supercomputers and HPC clusters.
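
- **Sketch (Python)**: The cost trade-off made concrete by counting shared-bus taps, crossbar crosspoints (N x N), and 2-D mesh links (r(c-1) + c(r-1) for an r x c grid).

```
def bus_links(n):
    return n                                            # each device taps one shared bus

def crossbar_points(n):
    return n * n                                        # full N x N switch matrix

def mesh_links(rows, cols):
    return rows * (cols - 1) + cols * (rows - 1)        # horizontal + vertical links

for n in (4, 16, 64):
    side = int(n ** 0.5)
    print(f"{n:3d} nodes: bus={bus_links(n):3d}  "
          f"crossbar={crossbar_points(n):5d}  "
          f"mesh={mesh_links(side, side):4d}")
# crossbar cost grows quadratically, which is why it is fast but expensive
```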

#### Question 7b: Load Balancing in Multiprocessor Systems

- **Definition**:

- Load balancing distributes tasks and computational load evenly across processors to optimize system
performance.

- **Techniques**:

- **Static Load Balancing**:

- Pre-determined assignment of tasks based on known workload characteristics.

- Example: Round-robin scheduling or partitioning tasks based on computational complexity.

- **Dynamic Load Balancing**:

- Adjusts task assignment in real-time based on current system load and performance metrics.

- Example: Task stealing where idle processors take on tasks from overloaded processors.

- **Example**:

- Job scheduling algorithms dynamically allocate tasks to processors based on their current workload.

- Load balancing ensures efficient resource utilization and minimizes idle time in multiprocessor
systems.
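
- **Sketch (Python)**: A minimal dynamic load-balancing loop in which idle workers pull the next task from a shared queue, so faster or less-loaded workers naturally absorb more of the work; the task and worker counts are invented for the example.

```
import queue
import threading

tasks = queue.Queue()
for t in range(12):
    tasks.put(t)                            # 12 units of work to distribute

completed = {0: [], 1: [], 2: []}           # which worker handled which task

def worker(worker_id):
    while True:
        try:
            t = tasks.get_nowait()          # idle worker grabs the next pending task
        except queue.Empty:
            return
        completed[worker_id].append(t)
        tasks.task_done()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for th in threads: th.start()
for th in threads: th.join()
print(completed)                            # load spreads across workers dynamically
```
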
### Unit 3: Synchronization and Coherence in Multiprocessor Systems

#### Question 8a: Synchronization Mechanisms in Multiprocessor Systems

- **Definition**:

- Synchronization ensures orderly execution of concurrent processes or threads sharing resources.

- **Mechanisms**:

- **Mutual Exclusion**:

- Prevents multiple processes from accessing a shared resource simultaneously.

- Example: Locks, semaphores, or atomic instructions.

- **Atomic Operations**:

- Guarantees that a sequence of operations is executed as a single unit without interruption.

- Example: Compare-and-swap (CAS) in shared memory systems.

- **Barrier Synchronization**:

- Ensures that all processes reach a specific point before continuing execution.

- Example: Barrier synchronization used in parallel computations to synchronize threads.

- **Diagram**:

```

Synchronization Mechanisms ->

Mutual Exclusion -> Locks, Semaphores

Atomic Operations -> Compare-and-swap

Barrier Synchronization -> Barrier

```
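
- **Sketch (Python)**: Mutual exclusion and barrier synchronization using Python threads; the lock and barrier here play the roles that atomic instructions and hardware barriers play in a multiprocessor.

```
import threading

counter = 0
lock = threading.Lock()                     # mutual exclusion around the shared counter
barrier = threading.Barrier(4)              # all 4 threads must arrive before any continues

def worker():
    global counter
    for _ in range(10_000):
        with lock:                          # only one thread updates the counter at a time
            counter += 1
    barrier.wait()                          # phase boundary: wait for every worker

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(counter)                              # 40000 -- no updates lost
```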

#### Question 8b: Cache Coherence Protocols

- **Definition**:

- Cache coherence ensures that multiple processors accessing shared data maintain consistency across
their local caches.
- **Protocols**:

- **MESI Protocol**:

- Maintains cache coherence using four states: Modified, Exclusive, Shared, and Invalid.

- Ensures that only one cache has the right to modify a given block of data at a time.

- **MOESI Protocol**:

- Extends MESI with an Owned state: a cache may hold a modified line that is also shared, supplying it directly to other caches and deferring the write-back to main memory.

- Improves efficiency by reducing memory traffic and access latency.

- **MESIF Protocol**:

- Refines MESI (rather than MOESI) by adding a Forward state that designates exactly one of the sharing caches to respond to requests for the line.

- Reduces traffic on the interconnect by allowing direct cache-to-cache transfers without every sharer responding.

- **Implementation**:

- Hardware-based coherence protocols ensure consistent data across caches through snooping or
directory-based approaches.

- Example: Intel processors use MESI-based protocols to maintain cache coherence efficiently.
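
- **Sketch (Python)**: A highly simplified view of MESI transitions for one cache line, from a single cache's perspective; real protocols also move the data and snoop/directory messages (omitted here), and a read miss may install the line in Exclusive rather than Shared when no other cache holds it.

```
# Next state of a cache line, keyed by (current_state, event).
# "read"/"write" are requests from the local processor;
# "bus_read"/"bus_write" are snooped requests from another cache.
MESI = {
    ("I", "read"):      "S",   # miss, line fetched and marked shared (simplified)
    ("I", "write"):     "M",   # miss for ownership, line now dirty
    ("S", "read"):      "S",
    ("S", "write"):     "M",   # other sharers must be invalidated first
    ("S", "bus_read"):  "S",
    ("S", "bus_write"): "I",
    ("E", "read"):      "E",
    ("E", "write"):     "M",   # silent upgrade, no bus traffic needed
    ("E", "bus_read"):  "S",
    ("E", "bus_write"): "I",
    ("M", "read"):      "M",
    ("M", "write"):     "M",
    ("M", "bus_read"):  "S",   # supply the data, drop to shared
    ("M", "bus_write"): "I",
}

state = "I"
for event in ["read", "bus_write", "write", "bus_read"]:
    state = MESI[(state, event)]
    print(f"after {event:9s} -> {state}")   # S, I, M, S
```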

### Conclusion

This detailed response covers various aspects of parallel computing, from Flynn's and Handler's
classifications to pipelining, vector processing, hazards, multiprocessor architectures, interconnection
networks, load balancing, synchronization, and cache coherence. Each section provides in-depth
explanations, examples, and diagrams to illustrate key concepts in advanced computer architecture.
