MCSE-103 Advanced Computer Architecture (June 2020)
#### Question 1a: Flynn's Classification of Computer Architectures
- **Flynn's Taxonomy**: Classifies computer architectures by the number of concurrent instruction streams and data streams, yielding four categories: SISD, SIMD, MISD, and MIMD.
- **SISD (Single Instruction, Single Data)**:
- **Architecture**:
- A single control unit issues one instruction stream that operates on a single data stream.
- **Examples**:
- Traditional uniprocessor machines.
- **Use Cases**: Suitable for general-purpose computing tasks where parallelism is not required.
- **Diagram**:
```
Control Unit -> Processing Element -> Data Stream
```
- **SIMD (Single Instruction, Multiple Data)**:
- **Architecture**:
- A single control unit broadcasts one instruction stream to multiple processing elements.
- Each processing element executes the same instruction on different pieces of data.
- **Examples**:
- Vector processors and GPU shader cores.
- **Use Cases**: Ideal for tasks that can be parallelized across large data sets, such as image processing or scientific simulations.
- **Diagram**:
```
     Control Unit
     |    |    |
    PE1  PE2  PE3
     |    |    |
    DS1  DS2  DS3
```
- **MISD (Multiple Instruction, Single Data)**:
- **Architecture**:
- Multiple processing units execute different instructions on the same data stream.
- **Examples**:
- Largely theoretical; fault-tolerant systems that run redundant computations on the same data are sometimes cited.
- **Use Cases**: Potentially useful in scenarios requiring multiple types of analysis on the same data.
- **Diagram**:
```
CU1 -> PE1 \
CU2 -> PE2  >-- Data Stream
CU3 -> PE3 /
```
- **MIMD (Multiple Instruction, Multiple Data)**:
- **Architecture**:
- Multiple autonomous processors execute different instructions on different data streams.
- **Examples**:
- Multicore processors.
- Distributed computing clusters.
- **Use Cases**: General-purpose parallel computing, from multicore desktops to supercomputers.
- **Diagram**:
```
CU1 -> PE1 -> Data Stream 1
CU2 -> PE2 -> Data Stream 2
CU3 -> PE3 -> Data Stream 3 ... CUn -> PEn -> Data Stream n
```
#### Question 1b: Need for Parallel Processing and Classification of Parallel Computing Structures
- **Performance**:
- Parallel processing can significantly increase computational speed by dividing tasks among multiple
processors.
- Example: Weather simulations, where data can be processed in parallel to speed up forecasts.
- **Efficiency**:
- Keeps hardware busy: while one task waits (e.g., on I/O), other processors continue doing useful work.
- **Scalability**:
- Workloads can grow by adding processors rather than redesigning the system around a single faster one.
- **Cost-Effectiveness**:
- Several commodity processors working in parallel are often cheaper than one processor of equivalent speed.
- Example: Financial modeling, where faster computations can lead to timely decisions and cost savings.
- **Pipelined Processors**:
- **Definition**: Divide instruction execution into stages so that several instructions are in flight at once:
```
Fetch -> Decode -> Execute -> Memory Access -> Write-Back
```
- **Vector Processors**:
- **Definition**: Operate on entire vectors (one-dimensional arrays) with a single instruction.
- **Use Case**: Efficient for scientific computations involving large data sets.
- **Diagram**:
```
Vector Registers -> Pipelined ALU -> Result Vector
```
- **Array Processors**:
- **Definition**: Grid of processors performing the same instruction on different data points.
- **Diagram**:
```
     Control Unit
    /     |     \
  PE1 -- PE2 -- PE3   (grid of PEs in lockstep)
```
- **Multithreaded Processors**:
- **Definition**: Use multiple threads within a single processor to perform tasks concurrently.
- **Use Case**: Enhances performance for applications that can be parallelized at the thread level.
- **Diagram**:
```
Processor Core
Thread 1
Thread 2
```
- **Multiprocessors**:
- **Definition**: Systems with two or more processors that cooperate on a single workload.
- **Types**: Tightly coupled (shared memory) vs. loosely coupled (distributed memory).
- **Diagram**:
```
CPU1   CPU2   CPU3
   \     |     /
   Shared Memory
```
- **Definition**:
- Pipelining is a technique where multiple instruction phases are overlapped to improve processing
efficiency.
- Each stage in the pipeline performs a part of an instruction, passing it to the next stage in a sequential
manner.
- **Stages**: Each instruction moves through a fixed sequence of stages, ideally advancing one stage per clock cycle.
- **Example**:
- An instruction pipeline with five stages: Fetch, Decode, Execute, Memory Access, Write-Back.
- **Benefits**:
- **Increased Throughput**: Multiple instructions are processed simultaneously, increasing the overall
processing speed.
- **Resource Efficiency**: Better utilization of processor resources by keeping all stages active.
- **Diagram**:
```
Time ->
I1: F  D  E  M  W
I2:    F  D  E  M  W
I3:       F  D  E  M  W
```
- **Increased Throughput**:
- Example: If each stage takes one clock cycle, a five-stage pipeline completes one instruction per cycle once the pipeline is full, instead of one instruction every five cycles.
- **Reduced Total Execution Time**:
- The latency of a single instruction is unchanged, but the total time for a sequence of instructions drops sharply.
- Example: In a non-pipelined system, instructions would be executed strictly one after another, increasing total wait time.
- **Resource Efficiency**:
- Example: Instead of having one instruction monopolize the processor, multiple instructions share
resources, reducing idle time.
- **Illustration**:
- Diagram:
```
Non-Pipelined (5 cycles per instruction):
Time ->  1  2  3  4  5  6  7  8  9  10 ...
         [---I1---][---I2---][---I3---]

Pipelined:
Time ->  1  2  3  4  5  6  7
Fetch    I1 I2 I3
Decode      I1 I2 I3
Execute        I1 I2 I3
Mem               I1 I2 I3
WB                   I1 I2 I3
```
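The cycle counts behind these diagrams can be sketched in a few lines of Python (an illustration, not part of the original notes), assuming one stage per clock cycle and no stalls:

```python
# Sketch: cycle counts for n instructions through a k-stage pipeline,
# assuming one stage per clock cycle and no stalls.

def non_pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    # Each instruction occupies the processor for all k stages.
    return n_instructions * k_stages

def pipelined_cycles(n_instructions: int, k_stages: int) -> int:
    # The first instruction takes k cycles; each later instruction
    # completes one cycle after the previous (steady state: 1 per cycle).
    return k_stages + (n_instructions - 1)

print(non_pipelined_cycles(3, 5))  # 15 cycles
print(pipelined_cycles(3, 5))      # 7 cycles
```

Three instructions on a five-stage pipeline finish in 7 cycles instead of 15, matching the overlapped timeline above.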
- **Speedup**:
- **Definition**: The ratio of the time taken to complete a task without pipelining to the time taken
with pipelining.
- **Example**:
- A task that takes 100 cycles without pipelining and 25 cycles with pipelining gives Speedup = 100 / 25 = 4.
- **Diagram**:
```
Speedup = T_non-pipelined / T_pipelined
```
- **Throughput**: The number of instructions completed per unit time.
- **Example**:
- If a pipelined processor can complete 10 instructions in 10 cycles, its throughput is 1 instruction per
cycle.
- **Diagram**:
```
Throughput = Instructions completed / Total cycles
```
- **Efficiency**:
- **Definition**: The ratio of useful work done to the total work expended, often expressed as Speedup / Number of stages.
- **Example**:
- A five-stage pipeline completing 100 instructions in 104 cycles achieves a speedup of about 4.8, i.e. an efficiency of roughly 96%.
- This high efficiency is due to the overlap of instruction processing stages, minimizing idle time and maximizing use of resources.
- **Diagram**:
```
Efficiency = Speedup / Number of stages
```
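The three metrics above can be computed together; the sketch below (illustrative, assuming an ideal stall-free pipeline) follows the definitions directly:

```python
# Sketch: speedup, throughput, and efficiency for n instructions
# through a k-stage pipeline (one cycle per stage, no stalls).

def pipeline_metrics(n: int, k: int):
    t_serial = n * k              # cycles without pipelining
    t_pipe = k + (n - 1)          # cycles with pipelining
    speedup = t_serial / t_pipe
    throughput = n / t_pipe       # instructions completed per cycle
    efficiency = speedup / k      # fraction of the ideal k-fold speedup
    return speedup, throughput, efficiency

s, t, e = pipeline_metrics(100, 5)
print(round(s, 2), round(t, 2), round(e, 2))  # 4.81 0.96 0.96
```

As n grows, speedup approaches k and efficiency approaches 100%, which is why long instruction streams benefit most from pipelining.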
### Unit 1: Vector Processing and SIMD Array Processor
- **Definition**:
- Vector processing involves executing a single instruction on multiple data elements simultaneously.
- This contrasts with scalar processing, where each instruction operates on a single data element at a
time.
- **Applications**:
- **Scientific Computing**: Vector processors excel in tasks such as linear algebra operations (matrix
multiplications, vector additions).
- **Graphics Processing**: Used in rendering pipelines for transforming and shading vertices.
- **Signal Processing**: Efficient for processing large volumes of data in real-time applications (e.g.,
audio and video processing).
- **Benefits**:
- **Performance**: Handles large datasets efficiently by processing multiple data elements in parallel.
- **Speed**: Significantly faster than scalar processing for operations on large arrays or matrices.
- **Power Efficiency**: Achieves higher performance per watt compared to scalar processors due to
parallelism.
- **Example**:
- Cray supercomputers historically used vector processing units for scientific simulations and modeling.
- **Diagram**:
```
Vector Registers -> Pipelined Functional Units -> Result Vector
```
- **Definition**:
- SIMD (Single Instruction, Multiple Data) array processors execute the same instruction on multiple
data elements simultaneously.
- Arrays of processing elements (PEs) operate in parallel under the control of a central unit (CU).
- **Architecture**:
- **Control Unit (CU)**: Fetches instructions and broadcasts them to all processing elements.
- **Processing Elements (PEs)**: Execute the same instruction but on different data elements.
- **Applications**:
- **Graphics Processing Units (GPUs)**: Use SIMD architecture for parallel execution of shader
programs.
- **Machine Learning**: SIMD processors optimize parallel operations in neural network training.
- **Diagram**:
```
          Control Unit
        /      |      \
      PE1     PE2     PE3
       |       |       |
     Mem1    Mem2    Mem3
```
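The CU/PE relationship can be mimicked in ordinary Python (a toy model, not a real ISA; the class names are chosen here for illustration):

```python
# Sketch: a control unit broadcasting one instruction to several
# processing elements, each holding its own local data (SIMD style).

class ProcessingElement:
    def __init__(self, local_data):
        self.data = local_data

    def execute(self, op):
        # Every PE applies the same operation to its own data element.
        self.data = op(self.data)

class ControlUnit:
    def __init__(self, pes):
        self.pes = pes

    def broadcast(self, op):
        # Single instruction, multiple data: one op, all PEs in lockstep.
        for pe in self.pes:
            pe.execute(op)

pes = [ProcessingElement(x) for x in [1, 2, 3, 4]]
cu = ControlUnit(pes)
cu.broadcast(lambda x: x * 10)   # same instruction on different data
print([pe.data for pe in pes])   # [10, 20, 30, 40]
```

A real SIMD machine performs the "loop" in hardware, in parallel; the model only captures the single-instruction, multiple-data control structure.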
- **Data Hazards**:
- **Definition**: Arise when instructions close together in the pipeline depend on the same register or memory location.
- **Types**:
- **RAW (Read After Write)**: Reading a register before its value is updated.
- **WAR (Write After Read)**: Writing to a register before its previous value is read.
- **WAW (Write After Write)**: Writing to the same register multiple times before the previous write
completes.
- **Resolution**:
- **Forwarding**: Passing data directly from one pipeline stage to another to avoid stalls.
- **Stalling**: Inserting pipeline bubbles until the required value becomes available.
- **Control Hazards**:
- **Definition**: Arise due to conditional branches that affect program flow.
- **Resolution**:
- **Branch Prediction**: Speculating whether a branch will be taken or not before the actual decision.
- **Delayed Branching**: Filling the slot after a branch with instructions that execute regardless of the branch outcome, hiding the decision delay.
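The three data-hazard types can be checked mechanically; the sketch below (an illustration, with instructions simplified to a destination register plus source registers) applies the definitions above:

```python
# Sketch: detecting RAW, WAR, and WAW hazards between two instructions,
# each modeled as (destination register, set of source registers).

def hazards(first, second):
    """first executes before second; returns the hazard types between them."""
    dst1, srcs1 = first
    dst2, srcs2 = second
    found = []
    if dst1 in srcs2:
        found.append("RAW")   # second reads what first writes
    if dst2 in srcs1:
        found.append("WAR")   # second writes what first reads
    if dst1 == dst2:
        found.append("WAW")   # both write the same register
    return found

# ADD R1, R2, R3  then  SUB R4, R1, R5: SUB reads R1 before ADD's write-back
print(hazards(("R1", {"R2", "R3"}), ("R4", {"R1", "R5"})))  # ['RAW']
```

A forwarding unit performs essentially this comparison in hardware between pipeline stages, routing the fresh value directly to the dependent instruction.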
- **Multicomputer Systems**:
- **Definition**: Consist of multiple autonomous nodes, each with its own private memory, connected by a network.
- **Characteristics**: Distributed memory with no shared address space; processes communicate by explicit message passing.
- **Use Cases**: High availability, scalability, and fault tolerance in distributed computing environments.
- **Multiprocessor Systems**:
- **Definition**: Consist of multiple processors sharing a common memory and operating system.
- **Characteristics**: Shared address space, a single operating-system image, and fast communication through shared memory.
- **Use Cases**: High-performance computing, where shared memory access speeds up inter-process
communication and data sharing.
- **Definition**:
- Multiprocessor systems feature multiple processors that share a common memory space and can
execute tasks concurrently.
- **Architectural Models**:
- **Tightly Coupled**:
- Processors share a common memory and communicate through it; low latency, but scalability is limited by memory contention.
- **Loosely Coupled**:
- Each processor has its own local memory and communicates with the others over a network.
- Offers scalability and fault tolerance but requires efficient message-passing protocols.
- **Use Cases**:
- Tightly coupled systems are ideal for real-time processing and high-performance computing (HPC).
- Loosely coupled systems excel in distributed computing environments where scalability and fault
tolerance are critical.
- **Characteristics**:
- **Scalability**: Easily scalable by adding more processors and nodes to the network.
- **Intra-processor Communication**:
- Processors communicate within the system using shared buses or interconnection networks.
- **Inter-processor Communication**:
- Data exchange between processors involves message-passing protocols that manage communication
overhead.
- **Definition**:
- Interconnection networks connect processors, memory, and I/O devices within a multiprocessor or
multicomputer system.
- **Schemes**:
- **Bus-based**:
- All processors share a single bus; simple and inexpensive, but the bus becomes a bottleneck as processors are added.
- **Crossbar Switch**:
- A dedicated switch sits between every processor and every memory module; non-blocking, but the switch count grows as n x n.
- **Multistage Networks**:
- Several stages of small switches (e.g., an omega network) route traffic between processors and memory.
- Provides fault tolerance and scalability, common in supercomputers and HPC clusters.
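The cost trade-off between these schemes is easy to quantify; this sketch (illustrative, assuming n processors and n memory modules, with n a power of two for the omega network) counts switching elements:

```python
# Sketch: switch counts for crossbar vs. multistage (omega) networks
# connecting n processors to n memory modules.

import math

def crossbar_switches(n: int) -> int:
    # One crosspoint per (processor, memory) pair: n^2 total.
    return n * n

def omega_switches(n: int) -> int:
    # log2(n) stages, each with n/2 two-by-two switches.
    return (n // 2) * int(math.log2(n))

print(crossbar_switches(64))  # 4096
print(omega_switches(64))     # 192
```

For 64 processors the multistage network needs roughly 20x fewer switches, which is why large systems accept its blocking behavior in exchange for the lower cost.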
- **Definition**:
- Load balancing distributes tasks and computational load evenly across processors to optimize system
performance.
- **Techniques**:
- Adjusts task assignment in real-time based on current system load and performance metrics.
- Example: Task stealing where idle processors take on tasks from overloaded processors.
- **Example**:
- Job scheduling algorithms dynamically allocate tasks to processors based on their current workload.
- Load balancing ensures efficient resource utilization and minimizes idle time in multiprocessor
systems.
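A common scheduling heuristic behind such algorithms is "longest task to the least-loaded processor"; the sketch below (an illustration, with task costs as plain numbers) implements it with a min-heap:

```python
# Sketch: greedy load balancing -- each task is assigned to the
# currently least-loaded processor, largest tasks first.

import heapq

def balance(task_costs, n_processors):
    # Min-heap of (current load, processor id).
    loads = [(0, p) for p in range(n_processors)]
    heapq.heapify(loads)
    assignment = {p: [] for p in range(n_processors)}
    for cost in sorted(task_costs, reverse=True):
        load, p = heapq.heappop(loads)        # least-loaded processor
        assignment[p].append(cost)
        heapq.heappush(loads, (load + cost, p))
    return assignment

result = balance([7, 3, 5, 2, 8, 4], 2)
print({p: sum(tasks) for p, tasks in result.items()})  # {0: 15, 1: 14}
```

The two processors end up with nearly equal loads (15 vs. 14 units). Dynamic schemes such as task stealing achieve the same effect at run time, without knowing task costs in advance.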
### Unit 3: Synchronization and Cache Coherence
- **Definition**:
- Synchronization coordinates concurrent processes so that they access shared data in a safe, well-defined order.
- **Mechanisms**:
- **Mutual Exclusion**:
- Locks or semaphores guarantee that only one process is inside a critical section at a time.
- **Atomic Operations**:
- Hardware primitives (e.g., test-and-set, compare-and-swap) that complete without interruption.
- **Barrier Synchronization**:
- Ensures that all processes reach a specific point before continuing execution.
- **Diagram**:
```
P1 --->|
P2 --->| barrier |---> all proceed together
P3 --->|
```
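Barrier synchronization maps directly onto Python's standard `threading.Barrier`; the short sketch below (illustrative thread names and counts) shows that no thread passes the barrier until all parties have arrived:

```python
# Sketch: barrier synchronization with threading.Barrier --
# every thread blocks at wait() until all 3 parties have arrived.

import threading

barrier = threading.Barrier(3)
results = []

def worker(i):
    # ... per-thread work would happen here ...
    barrier.wait()        # block until all 3 threads reach this point
    results.append(i)     # only runs after every thread has arrived

threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(sorted(results))    # [0, 1, 2]
```

If the barrier were removed, threads could race ahead independently; with it, the post-barrier code in every thread is guaranteed to start only after all three have finished their pre-barrier work.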
- **Definition**:
- Cache coherence ensures that multiple processors accessing shared data maintain consistency across
their local caches.
- **Protocols**:
- **MESI Protocol**:
- Maintains cache coherence using four states: Modified, Exclusive, Shared, and Invalid.
- Ensures that only one cache has the right to modify a given block of data at a time.
- **MOESI Protocol**:
- Extends MESI with an Owned state, allowing a cache that holds modified data to supply it directly to other caches without first writing it back to main memory.
- **MESIF Protocol**:
- Refines MESI by introducing a Forward state that designates a single cache to respond to requests for shared data, reducing redundant replies.
- **Implementation**:
- Hardware-based coherence protocols ensure consistent data across caches through snooping or
directory-based approaches.
- Example: Intel processors use MESI-based protocols to maintain cache coherence efficiently.
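The core MESI transitions can be modeled in a toy simulator (a deliberately simplified sketch for one cache line and two caches; real protocols handle writebacks, evictions, and many more events):

```python
# Sketch: toy MESI state machine for a single cache line in two caches.

M, E, S, I = "Modified", "Exclusive", "Shared", "Invalid"

def read(states, cache):
    # On a read miss, any other holder of the line drops to Shared.
    if states[cache] == I:
        others_have_it = any(s != I for c, s in states.items() if c != cache)
        for c in states:
            if states[c] in (M, E):
                states[c] = S     # (writeback of Modified data omitted)
        states[cache] = S if others_have_it else E

def write(states, cache):
    # A write invalidates every other copy; the writer becomes Modified.
    for c in states:
        states[c] = I
    states[cache] = M

states = {"cache0": I, "cache1": I}
read(states, "cache0")    # cache0 -> Exclusive (sole copy)
read(states, "cache1")    # both -> Shared
write(states, "cache1")   # cache1 -> Modified, cache0 -> Invalid
print(states)             # {'cache0': 'Invalid', 'cache1': 'Modified'}
```

This captures the invariant the notes describe: at any moment at most one cache holds the line in a writable (Modified/Exclusive) state, so only one cache can modify a given block at a time.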
### Conclusion
These notes cover various aspects of parallel computing, from Flynn's and Handler's classifications to pipelining, vector processing, hazards, multiprocessor architectures, interconnection networks, load balancing, synchronization, and cache coherence, with explanations, examples, and diagrams illustrating key concepts in advanced computer architecture.