Processor Architectures and Instruction Sets

This document covers the fundamentals of processor and instruction set architectures, focusing on memory operations, addressing modes, and instruction types. It explains how processors interact with memory, the importance of multiprocessor systems, and provides examples of assembly-level programming. Additionally, it discusses the differences between various instruction set architectures like x86 and ARM, highlighting their use cases and characteristics.

UNIT 5

PROCESSOR AND INSTRUCTION SET


ARCHITECTURES
LEARNING OBJECTIVES
1. Explain memory locations and operations in
processors.
2. Identify and describe different addressing modes.
3. Differentiate various instruction types used in ISAs.
4. Develop simple assembly-level programming examples.
5. Explain the concept and importance of
multiprocessors.
INTRODUCTION
 Understanding how processors work with memory and perform data
operations is important in computer systems.
 The processor is the main part of a computer that uses different methods
(addressing modes) to access memory and run instructions like moving data,
doing calculations, and making decisions.
 Learning these basics helps you write simple code that follows how the
processor works.
 Today, with faster and more powerful computers, it’s also important to know
how systems with more than one processor (multiprocessor systems) work.
 This unit will cover these topics with clear explanations and practical
examples.
MEMORY LOCATIONS AND
OPERATIONS
MEMORY LOCATIONS AND OPERATIONS
 Memory is a sequence of storage cells, each with a
unique address.
 Two common units are:
 Byte – The smallest unit (8 bits). Each memory address points
to one byte.
 Word – A group of bytes (often 16 bits, 32 bits, 64 bits
depending on the CPU architecture).
PROCESSOR MEMORY TYPES
 Random Access Memory (RAM): Read/write memory for data
and instructions.
 Read-Only Memory (ROM): Non-volatile, permanent storage;
often used for firmware.
 Flash Memory: Re-writable memory, slower and less durable
than RAM.
 Memory Hierarchy: Registers (fastest), cache memory, main memory (RAM),
secondary storage; virtual memory extends main memory onto secondary storage.
MEMORY LOCATION CONCEPTS
 Byte: Smallest addressable unit.
 Word: Multiple bytes, aligned for efficient access.
 Aligned Addressing
 Data is stored at addresses divisible by the word size (e.g., 0, 4, 8 for 32-bit words).
 CPU reads the entire word in one fast operation.
 Unaligned Addressing
 Data stored at addresses not divisible by the word size (e.g., 3, 5, 7).
 CPU needs multiple reads, making access slower.
 Quick Analogy: An aligned access grabs one whole box; an unaligned access has to
grab parts from two boxes.
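The aligned-vs-unaligned idea can be sketched in Python (a 4-byte word size is assumed here for illustration; real CPUs differ in how they handle unaligned access):

```python
WORD_SIZE = 4  # assume 32-bit (4-byte) words

def is_aligned(address, word_size=WORD_SIZE):
    # Aligned addresses are divisible by the word size (0, 4, 8, ...).
    return address % word_size == 0

def reads_needed(address, word_size=WORD_SIZE):
    # Count how many word-sized "boxes" a word starting at this address
    # touches: one if aligned, two if it straddles a word boundary.
    first_box = address // word_size
    last_box = (address + word_size - 1) // word_size
    return last_box - first_box + 1
```

For example, a word at address 4 needs one read, while a word at address 5 spans two boxes and needs two.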
DATA FLOW IN MEMORY
 Data is transferred between memory and the processor via load and store operations.
 Load (LD): Move data from memory → to register (Memory Read operation).
 Example (ARM): LDR r0, [r1] (load the value at the address in r1 into r0)
 Store (ST): Move data from register → to memory (Memory Write operation).
 Example (ARM): STR r0, [r1] (store the value of r0 into memory at the address in r1)
 Both require an exact memory address to read or write.
 How it Works:
 Controlled by control signals and address calculation logic.
 CPU calculates where to read/write data.
 With Offset Example (ARM): ldr r0, [r1, #10] → loads data from address r1 + 10 into r0.
DATAPATH: EXECUTING AN ARITHMETIC INSTRUCTION

 The datapath is the part of the CPU where data actually moves and operations happen.
 Executing an Arithmetic Instruction (Step by Step)
 Example: ADD R0, R1, R2
1. Instruction Fetch: CPU fetches the instruction from memory into the instruction
register.
2. Instruction Decode: Control unit decodes the opcode to understand it's an ADD.
3. Operand Fetch: Values from R1 and R2 are read from the register file.
4. Execute (ALU): ALU performs addition of values from R1 and R2.
5. Result Storage: Result is written back into R0.
6. Next Instruction: Program Counter (PC) increments to fetch the next instruction.
MEMORY READ AND WRITE
 Memory Read
 CPU places the address of data on the address bus.
 Control unit signals a memory read operation.
 Memory responds by placing data on the data bus.
 CPU reads data into a register.
 Memory Write
 CPU places the address on the address bus.
 Data to write is placed on the data bus.
 Control unit signals a memory write operation.
 Memory writes data into the addressed location.
ADDRESSING MODES
ADDRESSING MODES
 Addressing modes describe how the CPU finds operands (values or data)
when running instructions.
 Operands can be:
 Constants (e.g., #5)
 Registers (e.g., r0, ax)
 Memory locations (e.g., [0x1000], [r1, #10])
 Understanding these modes helps with writing and reading assembly
language.
 Data Flow: Address calculation based on mode; control unit computes
effective addresses for memory access.
ADDRESSING MODES
 Immediate – Operand is a constant in the instruction. Examples: mov r0, #10 (ARM); mov eax, 5 (8086).
 Register – Operand is stored in a register. Examples: add r1, r2, r3 (ARM); add eax, ebx → eax = eax + ebx (8086).
 Direct – Operand is at a fixed memory address. Example: mov ax, [0x1000] (8086).
 Indirect – A register contains the memory address of the operand. Examples: ldr r0, [r1] (ARM); mov ax, [bx] (8086).
 Indexed – Address = base register + index register. Example: mov eax, [ebx + esi] (8086).
 Base + Offset – Address = base register + constant offset. Examples: ldr r0, [r1, #10] (ARM); mov eax, [ebx + 4] (8086).
ADDRESSING MODE EXAMPLES
 Immediate Addressing: Operand is a constant
in the instruction.
 Fast and simple; no memory access.
 Example:
 ARM: mov r0, #5 – Loads 5 into r0.
 8086: mov ax, 5
ADDRESSING MODE EXAMPLES
 Register Addressing: Operand is already in a
register.
 Very fast; no memory involved.
 Example:
 ARM: add r0, r1, r2.
 8086: add ax, bx
ADDRESSING MODE EXAMPLES
 Direct Addressing: Address is directly given in the
instruction.
 Example:
 8086: mov ax, [0x1000] – Load value from address 0x1000.
 Indirect Addressing: A register holds the memory address.
 Example:
 8086: mov ax, [bx].
 ARM: ldr r0, [r1].
ADDRESSING MODE EXAMPLES

 Indexed Addressing: Address = base register + index


register.
 Good for accessing arrays.
 Example:
 8086: mov eax, [ebx + esi].
 ARM: ldr r0, [r1, r2].
ADDRESSING MODE EXAMPLES

 Base + Offset Addressing: Address = base register +


a constant number.
 Used for local variables or fields in structs.
 Example:
 ARM: ldr r0, [r1, #10].
 8086: mov eax, [ebx + 4].
ADVANTAGES & DISADVANTAGES
 Immediate – Advantages: fast, no memory access. Disadvantages: cannot handle large or changing values.
 Register – Advantages: fastest, efficient. Disadvantages: limited number of registers.
 Direct – Advantages: simple for fixed memory. Disadvantages: not flexible, slower than register.
 Indirect – Advantages: enables pointers and dynamic access. Disadvantages: needs an extra register, slower.
 Indexed – Advantages: great for arrays and tables. Disadvantages: needs 2 registers, complex.
 Base + Offset – Advantages: flexible, good for structured data. Disadvantages: offset range may be limited.
IDENTIFY ADDRESSING MODE FROM CODE

 mov r0, #100 → Immediate (constant is in the instruction)
 add r1, r2, r3 → Register (all operands are in registers)
 mov eax, [0x400] → Direct (accessing a fixed memory address)
 mov ax, [bx] → Indirect (address held in register bx)
 mov eax, [ebx + ecx] → Indexed (base + index register)
 ldr r0, [r1, #12] → Base + Offset (base + constant offset)


SUMMARY TABLE
 Immediate – Operand inside the instruction; no address calculation. Example: mov r0, #10. Best for constants, flags.
 Register – Operand in a register; no address calculation. Example: add $t0, $t1, $t2. Best for fast arithmetic.
 Direct – Operand at a fixed memory address taken from the instruction. Example: mov ax, [0x1000]. Best for globals, simple access.
 Indirect – A register holds the address; use the register content. Example: ldr r0, [r1]. Best for pointers.
 Indexed – Address = base + index register; add two register values. Example: mov eax, [ebx+esi]. Best for arrays.
 Base + Offset – Address = register + constant; add register and offset. Example: ldr r0, [r1, #10]. Best for structs, stack variables.
OPERAND FETCH IN A PROCESSOR
 Operand Fetch is when the CPU retrieves the data (operand) needed by an
instruction before it runs.
 Data can come from: A register (fast) or Memory (slower).
 Why it matters: The CPU must fetch data before it can perform actions like
add, compare, or jump.
 How it knows where to fetch: The addressing mode in the instruction
decides how and where to get the operand.
 Examples:
 mov r0, #5 → Immediate mode (data in instruction).
 ldr r1, [r2] → Indirect mode (address in r2).
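Operand fetch under each addressing mode can be mimicked with a small Python helper. This is only a sketch: the register and memory contents below are made up for illustration.

```python
def operand_fetch(mode, regs, memory, **args):
    # regs: register file as a dict; memory: address -> value.
    if mode == "immediate":      # e.g. mov r0, #5
        return args["const"]
    if mode == "register":       # e.g. add r0, r1, r2
        return regs[args["reg"]]
    if mode == "direct":         # e.g. mov ax, [0x1000]
        return memory[args["addr"]]
    if mode == "indirect":       # e.g. ldr r0, [r1]
        return memory[regs[args["reg"]]]
    if mode == "indexed":        # e.g. mov eax, [ebx + esi]
        return memory[regs[args["base"]] + regs[args["index"]]]
    if mode == "base_offset":    # e.g. ldr r0, [r1, #10]
        return memory[regs[args["base"]] + args["offset"]]
    raise ValueError("unknown mode: " + mode)
```

With regs = {"r1": 0x1000} and memory[0x100A] = 33, base + offset fetch with offset 10 returns 33, mirroring ldr r0, [r1, #10].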
CONCEPT OF PAGING IN MEMORY ADDRESSING
 Paging is a memory management technique that divides both a program’s
memory and physical memory (RAM) into fixed-size blocks:
 Pages (in the program)
 Frames (in physical memory)
 Both pages and frames are the same size.
 Why Use Paging?
 Manages memory efficiently.
 Removes the need for large continuous memory blocks.
 Allows programs larger than physical memory to run using virtual memory.
PAGING IN MEMORY ADDRESSING EXAMPLE
 Imagine a 16 KB program, and each page/frame is 4 KB.
 The program will be divided into 4 pages (Page 0, Page 1, Page 2, Page 3).
 Physical memory (RAM) has free frames at scattered spots; say Frame 5, Frame 9, Frame 2,
and Frame 7 are available.
 The system loads each program page into any available frame.
 Program Page Loaded into Frame
 Page 0 → Frame 5
 Page 1 → Frame 9
 Page 2 → Frame 2
 Page 3 → Frame 7
 You don’t need the program stored in one big continuous space!
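The page-to-frame mapping above can be checked with a short Python sketch. The page table values come from the example; the translation logic (split into page number and offset, then swap page for frame) is the standard scheme.

```python
PAGE_SIZE = 4 * 1024  # 4 KB pages and frames, as in the example

# Page table from the example: program page -> physical frame
page_table = {0: 5, 1: 9, 2: 2, 3: 7}

def translate(virtual_address):
    # Split the virtual address into page number and offset,
    # then replace the page number with its frame number.
    page = virtual_address // PAGE_SIZE
    offset = virtual_address % PAGE_SIZE
    frame = page_table[page]
    return frame * PAGE_SIZE + offset
```

Byte 100 of Page 1, for instance, lands at byte 100 of Frame 9 in physical memory.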
INSTRUCTION TYPES & INSTRUCTION SET
ARCHITECTURES (ISA)
INSTRUCTIONS
 An instruction is a fundamental command that directs the processor to
perform a specific operation.
 It is a basic unit of a program executed by the CPU to carry out tasks such as:
 Data transfer: Moving data between memory, registers, and I/O devices.
 Arithmetic and logical operations: Performing calculations and comparisons.
 Control operations: Managing program flow through branching and loops.
 Input/Output operations: Handling communication with external devices.
 Every instruction belongs to the processor’s Instruction Set Architecture (ISA)
- a formal specification that defines the available operations, their formats, and
execution behavior, ensuring standardized interaction between hardware and
software.
INSTRUCTION SET ARCHITECTURE (ISA)
 An Instruction Set Architecture (ISA) is a collection of instructions that a
processor can understand and execute.
 It defines how the CPU works with data, memory, and registers, and how
instructions are structured.
 The ISA acts as a bridge between hardware and software, making sure
programs run properly on the processor.
 Different processors have different numbers of instructions:
 x86 (CISC): Has over 500 instructions, including complex ones.
 ARM (RISC): Has about 100–150 simpler, faster instructions.
 These differences help processors suit different uses like ARM for mobile
devices and x86 for desktops and servers.
BASIC TYPES OF INSTRUCTIONS
 Data Transfer – Moves data between memory and registers (copies data from memory to a register or vice versa). Example: LOAD R1, [Addr]
 Arithmetic – Performs math like add or subtract (adds or subtracts values in registers, stores the result). Example: ADD R1, R2, R3
 Logical – Works with bits using AND, OR (does bit-by-bit operations on register values). Example: AND R1, R2, R3
 Control Flow – Changes program order (jump, loop); if a condition is met, jumps to a different instruction. Example: BEQ R1, R2, Loop
 Input/Output – Communicates with external devices (sends or receives data via ports). Example: IN, OUT
Instruction Set Architecture (ISA) vs Microarchitecture
 Definition – ISA: the interface between software and hardware. Microarchitecture: the internal design and organization of the CPU.
 Focus – ISA: defines what instructions the CPU can execute and how they behave. Microarchitecture: defines how the CPU executes those instructions internally.
 Includes – ISA: instruction set, formats, addressing modes, data types. Microarchitecture: ALU, registers, control units, pipelines, caches, buses.
 Purpose – ISA: ensures software compatibility across different processors. Microarchitecture: optimizes performance, power consumption, and complexity.
 Example – The x86 ISA is used by Intel and AMD CPUs; Intel and AMD have different CPU designs but both support the x86 ISA.
 Impact on Software – Software runs the same on any CPU supporting the ISA, but performance and efficiency vary depending on the microarchitecture.
Types of ISA & Use Cases
 x86 – A Complex Instruction Set Computing (CISC) architecture featuring a large and versatile set of instructions, enabling compact and feature-rich programs. Commonly used in desktops, laptops, and servers; widely implemented by processors from Intel and AMD.
 ARM – A Reduced Instruction Set Computing (RISC) architecture designed for simplicity, efficiency, and low power consumption; uses fewer, faster instructions. Popular in smartphones, tablets, embedded systems, IoT devices, and low-power electronics (e.g., ARM Cortex series).
 RISC-V – An open-source RISC architecture that is modular and freely available; allows customization for specific applications and is highly suited for experimentation. Frequently adopted in academic research, education, and customizable embedded systems due to its open design.
 MIPS – A classic RISC-based architecture known for its simplicity and clean design, historically influential in teaching computer architecture principles. Still used in networking devices (routers, switches) and university-level education.
 JVM (Stack-based) – A stack-based ISA where instructions operate on a runtime stack rather than registers; facilitates portability across hardware platforms. Used for Java applications, WebAssembly, and cross-platform software environments.
Stack-based vs Register-based ISA
 Definition – Stack-based: a CPU design where most instructions implicitly operate on a stack, a last-in, first-out (LIFO) data structure. Register-based: a CPU design where instructions operate directly on named registers, fast, small storage locations within the CPU.
 Data Handling – Stack-based: operands are implicitly taken from and stored to a stack using PUSH and POP operations; no operand names appear in the instructions. Register-based: operands are explicitly stored in and accessed through named registers; instruction operands specify the registers.
 Instruction Format – Stack-based: typically shorter and simpler, since operands are implied by the stack's top positions. Register-based: longer and more structured, as each instruction specifies source and destination registers.
 Speed & Performance – Stack-based: generally slower, because stack memory operations require more frequent memory reads/writes. Register-based: typically faster, as direct register access reduces memory interactions; suitable for pipelined architectures.
 Complexity – Stack-based: simple and easy to implement, often used in virtual machines and interpreters. Register-based: more flexible and efficient for modern hardware, but requires more complex hardware design and instruction encoding.
 Instruction Scheduling – Stack-based: harder to optimize, as operands are limited to stack-top elements. Register-based: easier for compilers and CPUs to optimize, since any register can typically be selected.
 Use Cases – Stack-based: virtual machines, platform-independent environments, and systems with simpler hardware needs. Register-based: high-performance general-purpose processors, embedded systems, and real-time applications.
 Examples – Stack-based: JVM (Java Virtual Machine), WebAssembly, Forth language systems. Register-based: ARM, MIPS, RISC-V, x86 (a CISC ISA, but with RISC-like register operations).
DATA TRANSFER INSTRUCTIONS
 Move data between registers and memory.
 These instructions do not affect flags.

 ARM: LDR R0, [R1] – Load value from memory into R0
 8086: MOV AX, BX – Copy BX value into AX
 MIPS: lw $t0, 0($t1) – Load word from memory into register $t0
ARITHMETIC AND LOGICAL INSTRUCTIONS
 Addition: ADD R0, R1, R2 (ARM); ADD AX, BX (8086); add $t0, $t1, $t2 (MIPS)
 Subtraction: SUB R0, R1, #4 (ARM); SUB AX, 4 (8086); sub $t0, $t1, $t2 (MIPS)
 Bitwise AND: AND R0, R1, R2 (ARM); AND AX, BX (8086); and $t0, $t1, $t2 (MIPS)
 Bitwise OR: ORR R0, R1, R2 (ARM); OR AX, BX (8086); or $t0, $t1, $t2 (MIPS)
 Purpose: Perform mathematical and logical (bitwise) operations.
 Effect: These instructions often set flags like Zero (Z), Carry (C), or Negative (N).
COMPARISON INSTRUCTIONS
 ARM: CMP R1, R2 – Compares R1 and R2 (sets flags)
 8086: CMP AX, BX – Compares AX and BX (sets flags)
 MIPS: slt $t0, $t1, $t2 – Sets $t0 = 1 if $t1 < $t2, else 0
 A comparison sets flags (or a 0/1 value in MIPS); no other result is stored.
BRANCH INSTRUCTIONS
 Unconditional Jump: B label (ARM); JMP label (8086); j label (MIPS)
 Conditional Branch: BEQ label (ARM); JE label (8086); beq $t0, $t1, label (MIPS)
 Call Subroutine: BL func (ARM); CALL func (8086); jal func (MIPS)
 Return: RET (ARM); RET (8086); jr $ra (MIPS)
INSTRUCTION FORMAT (STRUCTURE)
 Every machine instruction has two main parts:
1. Opcode (Operation Code): This tells the CPU what action
to perform.
 Examples:
 ADD (add numbers)
 MOV (move data)
 SUB (subtract)
2. Operands: These tell the CPU where to find the data to
use or where to store the result.
 Operands can be:
 Registers: Small, fast storage inside the CPU (like R1, R2).
 Memory addresses: Locations in the computer’s RAM.
 Constants: Fixed values (like the number 5).
INSTRUCTION FORMAT EXAMPLE
 ARM: ADD R0, R1, R2 – opcode ADD; operands R0 (dest), R1, R2 (sources)
 8086: MOV AX, BX – opcode MOV; operands AX (dest), BX (source)
 MIPS: add $t0, $t1, $t2 – opcode add; operands $t0 (dest), $t1, $t2 (sources)
 In ARM, you can add shifts: ADD R0, R1, R2, LSL #2 → R2 is shifted left 2 bits before addition
MACHINE CODE AND INSTRUCTION DECODING
 A machine instruction is stored as binary code.
 The CPU reads it and understands:
 Opcode: What action to do (like add, move, etc.)
 Operands: Where to find or put the data (like which registers or memory)
 Example:
 ADD R1, R2, R3 means “Add the values in R2 and R3, store the result in
R1.”
 In binary, this instruction is split into parts: [opcode for ADD] [destination
R1] [source R2] [source R3]
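To make the opcode/operand split concrete, here is a toy Python encoding. The field widths and opcode numbers below are invented for illustration and do not match any real ISA.

```python
# Toy 32-bit instruction word (invented format, not a real ISA):
# bits 31-24 opcode, bits 14-10 dest register, 9-5 src1, 4-0 src2.
OPCODES = {"ADD": 0x01, "SUB": 0x02}
NAMES = {v: k for k, v in OPCODES.items()}

def encode(op, rd, rs1, rs2):
    # Pack the opcode and three register numbers into one binary word.
    return (OPCODES[op] << 24) | (rd << 10) | (rs1 << 5) | rs2

def decode(word):
    # Split the binary word back into opcode and operand fields,
    # as the CPU's decoder would.
    return NAMES[word >> 24], (word >> 10) & 0x1F, (word >> 5) & 0x1F, word & 0x1F
```

Decoding an encoded ADD R1, R2, R3 recovers exactly the opcode and the three register numbers.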
MACHINE CODE AND INSTRUCTION DECODING
 Data Transfer – Move data. ARM: LDR R0, [R1]; 8086: MOV AX, BX; MIPS: lw $t0, 0($t1)
 Arithmetic – Do math. ARM: ADD R0, R1, R2; 8086: ADD AX, BX; MIPS: add $t0, $t1, $t2
 Logical – Bitwise operations. ARM: AND R0, R1, R2; 8086: AND AX, BX; MIPS: and $t0, $t1, $t2
 Comparison – Set flags, no result. ARM: CMP R1, R2; 8086: CMP AX, BX; MIPS: slt $t0, $t1, $t2
 Control Flow (Jump) – Change instruction order. ARM: BEQ label; 8086: JMP label; MIPS: beq $t0, $t1, label
 Function Call/Return – Manage subroutines. ARM: BL, RET; 8086: CALL, RET; MIPS: jal, jr $ra
STEPS IN THE INSTRUCTION CYCLE
 The CPU follows a set of steps to execute every instruction, called the Instruction
Cycle:
1. Fetch: Read the next instruction from memory (address in the Program Counter -
PC) and store it in the Instruction Register (IR).
2. Decode: The Control Unit (CU) interprets the instruction to identify the
operation and data locations.
3. Operand Fetch: If needed, fetch data (operands) from registers or memory using
the addressing mode.
4. Execute: The ALU or CPU performs the operation (e.g., add, compare, jump).
5. Write Back: Store the result in a register or memory, as required.
6. Update PC: Move to the next instruction or jump to a new address if specified.
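The steps above can be sketched as a tiny Python interpreter for a made-up three-operand ISA (register names and opcodes here are illustrative only):

```python
def run(program, regs):
    """Minimal fetch-decode-execute loop for a toy three-operand ISA."""
    pc = 0                                   # Program Counter
    while pc < len(program):
        instr = program[pc]                  # 1. Fetch
        op, rd, rs1, rs2 = instr             # 2. Decode
        a, b = regs[rs1], regs[rs2]          # 3. Operand fetch
        if op == "ADD":                      # 4. Execute in the "ALU"
            result = a + b
        elif op == "SUB":
            result = a - b
        else:
            raise ValueError("unknown opcode: " + op)
        regs[rd] = result                    # 5. Write back
        pc += 1                              # 6. Update PC
    return regs
```

Running [("ADD", "R0", "R1", "R2")] with R1 = 2 and R2 = 3 leaves R0 = 5, mirroring the ADD R0, R1, R2 walk-through earlier.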
SIMPLE PROGRAMMING EXAMPLES
SIMPLE PROGRAMMING TOOLS
 Pseudocode:
 Informal, high-level description of a program.
 Written in plain language to show logic.
 Not executable but easy to understand.
 Used for planning algorithms.
 Assembly Language:
 Low-level language close to machine code.
 Uses short commands (e.g., add, mov) to control hardware.
 Requires knowledge of CPU (registers, memory).
 Translated to machine code by an assembler.
WHEN TO USE THE TOOLS

 Pseudocode: Plan and explain logic before coding.


 Assembly: Write low-level, hardware-specific code.
 Works with registers (temporary storage).
 Uses instructions like add, load, branch.
 Control flow uses conditions based on register values.
EXAMPLE 1 – SUM OF ARRAY ELEMENTS

 Pseudocode:
 sum = 0
 for each item in array:
 sum = sum + item
 return sum
EXAMPLE 1 – SUM OF ARRAY ELEMENTS
 Assembly-style (RISC-V): x1: sum, x2: index, x4: base address of array, x6: length
 li x1, 0 # sum = 0
 li x2, 0 # index = 0
 loop:
 slli x5, x2, 2 # byte offset = index * 4 (4-byte words)
 add x5, x4, x5 # x5 = address of array[index]
 lw x3, 0(x5) # load array[index]
 add x1, x1, x3 # sum += array[index]
 addi x2, x2, 1 # index++
 blt x2, x6, loop # repeat if index < length
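The same loop can be mirrored in Python, keeping the register roles from the assembly sketch (x1 = sum, x2 = index, x3 = loaded element):

```python
def sum_array(array):
    # Register roles follow the RISC-V sketch above.
    x1 = 0                      # li x1, 0   -> sum = 0
    x2 = 0                      # li x2, 0   -> index = 0
    while x2 < len(array):      # blt x2, length, loop
        x3 = array[x2]          # lw: load array[index]
        x1 = x1 + x3            # add x1, x1, x3
        x2 = x2 + 1             # addi x2, x2, 1
    return x1
```

sum_array([1, 2, 3, 4]) returns 10, the same result the assembly loop would leave in x1.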
EXAMPLE 2 – IF-ELSE (FIND MAXIMUM)

 Pseudocode:
 if a > b:
 max = a
 else:
 max = b
EXAMPLE 2 – IF-ELSE (FIND MAXIMUM)
 Assembly-style (RISC-V): x1: a, x2: b, x3: max
 slt x5, x1, x2 # x5 = 1 if x1 < x2 (set on less-than)
 beq x5, x0, else # if x1 >= x2 (x5 == 0), go to else
 mv x3, x2 # max = b
 j end
 else:
 mv x3, x1 # max = a
 end:
KEY EXECUTION STEPS
1. Set Initial Values: Put starting values (like numbers, counters, or
addresses) into registers.
2. Do Arithmetic or Comparisons: Perform operations like add,
subtract, or compare values in registers.
3. Store Results: Save the result of the operation back into a register.
4. Use Branches to Control Flow: Use instructions like beq, bne, j to
decide which part of the program runs next.
5. Loop or Exit Based on Conditions: Repeat steps or exit the program
based on test results (like if a counter reached a limit).
MULTIPROCESSORS
MULTIPROCESSOR
 A multiprocessor system is a computer that has two or more processors (CPUs)
working together.
 These processors can:
 Share the same memory or have separate memory.
 Work at the same time (in parallel) to make the computer faster.
 Help the computer run many tasks at once (multitasking) and be more reliable.
 Multiprocessors can be classified into different architectures based on their
processor and memory configuration.
 Multiprocessor systems are commonly found in modern desktops, servers,
smartphones, and supercomputers.
HOW DO MULTIPROCESSOR SYSTEMS WORK?
 Each processor (CPU) can run its own instructions or work together with other
CPUs to handle a big task faster.
 They can share one memory or have private memory for each CPU.
 These systems improve:
 Speed (by doing more things at the same time).
 Multitasking (running many programs smoothly).
 Reliability (if one processor has a problem, others can keep working).
 Example:
 In a smartphone, multiple CPU cores: Run apps, Handle background tasks, Manage
user input all at the same time, keeping the phone fast and responsive.
TYPES OF MULTIPROCESSOR
ARCHITECTURES
SYMMETRIC MULTI-PROCESSOR (SMP)
 Memory: All CPUs share the same physical memory and I/O resources.
 OS Treatment: The operating system treats all processors equally.
 Use Case: Common in desktops, laptops, and servers.
 Advantages:
 Easier programming model (using threads).
 Uniform access to resources.
 Disadvantages:
 Limited scalability because of bus contention and memory latency when many CPUs try
to access shared memory.
 Practical Example: A modern 4-core desktop PC where each core runs parts of the
operating system and user applications.
ASYMMETRIC MULTI-PROCESSOR (AMP)
 Structure: One processor (the master) controls the system while the others (slaves) handle specific
tasks.
 Memory: Processors may have their own dedicated memories or share parts of the system.
 Use Case: Often found in embedded and real-time systems.
 Advantages:
 Simpler hardware design for dedicated functions.
 Efficient for specialized or time-critical operations.
 Disadvantages:
 Limited flexibility; task distribution may be unbalanced.
 More complex coordination for resource sharing.
 Practical Example: An industrial controller where one CPU manages system coordination while others
process sensor inputs and control motors.
MASSIVELY PARALLEL PROCESSOR (MPP)
 Memory: Each CPU (or node) has its own private memory (distributed memory).
 Interconnection: CPUs or nodes are linked through a network topology (such as mesh,
ring, or hypercube).
 Scalability: Highly scalable, enabling the addition of many processors to handle large-scale
computations.
 Use Case: Predominantly used in supercomputers and large computing clusters.
 Advantages:
 High Scalability: Can easily scale to hundreds or thousands of processors.
 Improved Performance for Large Problems: Ideal for tasks like scientific simulations, data
analytics, and research that require massive amounts of parallel processing.
 Fault Isolation: Failure in one node often does not bring down the entire system.
MASSIVELY PARALLEL PROCESSOR (MPP)
 Disadvantages:
 Programming Complexity: Requires specialized programming models (often using message
passing interfaces like MPI) to manage distributed memory.
 Interconnect Overhead: The network connecting the nodes can become a bottleneck if
not designed properly.
 Cost: High cost of setting up and maintaining extensive hardware and interconnection
networks.
 Practical Example: A weather prediction supercomputer that uses MPP to distribute
computation across thousands of nodes, each processing a portion of the overall
simulation.
MULTICORE PROCESSORS
 Integration: Multiple processor cores are integrated on a single chip.
 Memory Sharing: Cores typically share one or more levels of cache and a memory
interface.
 Use Case: Widely used in smartphones, laptops, and desktops.
 Advantages:
 High performance in a compact and energy-efficient design.
 Enables extensive multitasking.
 Disadvantages:
 Increased complexity for maintaining cache coherence.
 Possible thermal challenges due to concentrated processing power.
 Practical Example: A smartphone with 8 cores that balance performance and battery
efficiency while running several apps simultaneously.
SIMULTANEOUS MULTITHREADING (SMT)
 Concept: A single CPU core can execute instructions from multiple threads concurrently.
 Resource Sharing: Multiple threads share core resources, increasing overall utilization.
 Use Case: Features like Intel’s Hyper-Threading enhance performance in modern CPUs.
 Advantages:
 Improves throughput without requiring additional cores.
 More efficient use of CPU resources.
 Disadvantages:
 Potential resource contention among threads.
 More complex scheduling in the OS.
 Practical Example: A CPU core managing multiple threads in a web server to handle
numerous simultaneous user requests.
Comparison of Processor Architectures
 Memory Access – SMP: all processors (CPUs) share one common memory. AMP: CPUs can have private or shared memory. MPP: each processor (or node) has its own private memory. Multicore: multiple cores share main memory, each with its own small cache. SMT: multiple threads share the resources of a single processor core.
 Communication – SMP: through shared variables in memory. AMP: one main CPU controls, others assist. MPP: through message passing (sending data between nodes). Multicore: cores communicate through shared memory. SMT: threads communicate within the same core.
 Programming Model – SMP: easier to program using threads and shared memory. AMP: simple for small, control-based tasks. MPP: more complex because it requires sending messages between processors. Multicore: easier for parallel programming using threads or processes. SMT: easier, handled by the CPU's own thread manager.
 Scalability – SMP: limited, adding too many CPUs can slow things down. AMP: limited because one CPU is in charge. MPP: highly scalable, can add hundreds or thousands of processors. Multicore: good scalability within a single chip. SMT: limited by the number of threads a core can run at once.
 Practical Example – SMP: a desktop computer with 4 processor cores. AMP: a car's airbag controller handling sensor data. MPP: a supercomputer for weather forecasting or scientific research. Multicore: a smartphone with an 8-core processor. SMT: a CPU handling multiple threads for a busy web server.
ADVANTAGES OF MULTIPROCESSOR SYSTEMS
 Faster Performance: Tasks are divided among CPUs, reducing
processing time.
 Parallel Processing: Multiple instructions run at once, increasing
system throughput.
 Reliability: The system can continue operating even if one CPU fails.
 Efficient Resource Sharing: In shared memory systems, devices and
memory are used more efficiently.
 Energy Efficiency: Modern multicore processors achieve better
performance per watt.
DISADVANTAGES OF MULTIPROCESSOR SYSTEMS
 Complex Programming: Parallel programming requires careful
synchronization, data sharing, and communication.
 Memory Contention: Multiple processors accessing memory can
lead to bottlenecks.
 Cache Coherence Issues: Maintaining a consistent view of data
across multiple caches is challenging.
 Hardware Complexity: Managing interconnects, scheduling, and
memory access adds design complexity.
 Heat Dissipation: More active cores produce more heat, which can
be a thermal management challenge.
SYNCHRONIZATION IN MULTIPROCESSORS
 When multiple processors access shared data, they must be
synchronized to avoid errors:
 Race Condition: Occurs when two CPUs try to update the same
data at the same time.
 Deadlock: When CPUs wait indefinitely for each other's resources.
 Cache Coherence: Ensures that all processors see the most recent
value of shared data.
 Mutual Exclusion: Ensures only one CPU can access critical data at
any moment.
COMMON SYNCHRONIZATION TOOLS
 Spinlocks – CPUs continually check until a lock is free. Example: kernel-level synchronization.
 Mutexes – Allow only one thread or CPU to access a shared resource at a time. Example: file system access control.
 Barriers – All CPUs must reach a certain point before any proceed. Example: synchronizing threads in parallel algorithms.
 Atomic Instructions – Perform a read-modify-write operation without interruption. Example: safely incrementing a shared counter.
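A mutex in action can be sketched with Python's threading.Lock (a software stand-in for the hardware mechanisms above; the thread and iteration counts are arbitrary):

```python
import threading

counter = 0
lock = threading.Lock()  # mutex: only one thread may hold it at a time

def increment(times):
    global counter
    for _ in range(times):
        with lock:           # acquire, update, release: mutual exclusion
            counter += 1

# Four threads each increment the shared counter 10,000 times.
threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# Because every increment was protected, counter is exactly 4 * 10_000.
```

Without the lock, two threads could read the same old value of counter, each add 1, and lose an update: a race condition.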
CACHE COHERENCE IN SHARED-MEMORY
MULTIPROCESSORS

 In computers with multiple processors (CPUs) that share the same


memory, each CPU usually has its own small, fast memory called a
cache.
 Cache Coherence means that when one CPU updates shared data,
all other CPUs either:
 Update their copies of that data.
 Remove (invalidate) their old copies from their caches.
 This makes sure that every CPU or core always has the latest,
correct value for shared data.
CACHE COHERENCE IN SHARED-MEMORY
MULTIPROCESSORS
 Why is Cache Coherence Important?
 If different processors have different values for the same data, programs can:
 Behave unpredictably, or
 Give wrong results.
 So, it’s important to keep data consistent between all caches and the main memory for
the computer to work correctly, smoothly, and reliably.
 Cache Coherence in Modern Processors
 Each processor core has its own L1 cache (very fast, but small).
 There may also be shared L2 or L3 caches (larger, a bit slower).
 Coherence protocols help keep all these caches updated and synchronized.
 Having more levels of cache makes cache management more complex, but it also
improves overall speed and performance.
COMMON CACHE COHERENCE PROTOCOLS
 Write-Through: Every time data is written, it updates both the CPU’s cache and the
main memory immediately.
 Write-Back: Data is first written to the cache, and main memory is updated later,
improving speed.
 MESI Protocol: Tracks cache data using four states:
 Modified: Cache has changed data not yet in main memory.
 Exclusive: Cache has the only copy, unchanged.
 Shared: Cache data is the same as memory, shared by CPUs.
 Invalid: Cache data is outdated or invalid.
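The four states can be sketched as a simplified Python model. This is only an illustration: a real MESI protocol also handles write-backs, bus signalling, and more transitions than the two events modelled here.

```python
# Simplified MESI sketch for one cache line held by several caches.
# states: cache name -> one of "M", "E", "S", "I".

def on_read(states, cpu):
    # Read: Exclusive if no other cache holds the line; otherwise every
    # holder (including a Modified one, write-back omitted) drops to Shared.
    holders = [c for c in states if c != cpu and states[c] != "I"]
    for c in holders:
        states[c] = "S"
    states[cpu] = "S" if holders else "E"

def on_write(states, cpu):
    # Write: invalidate all other copies; this copy becomes Modified.
    for c in states:
        states[c] = "I"
    states[cpu] = "M"
```

Tracing two caches: cpu0 reads (Exclusive), cpu1 reads (both Shared), cpu1 writes (cpu1 Modified, cpu0 Invalid), which matches the state definitions above.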
SINGLE-CORE VS MULTI-CORE CPUS
 Number of Cores – Single-core: 1. Multi-core: 2 or more (4, 8, 16, etc.).
 Parallelism – Single-core: limited to one task at a time. Multi-core: multiple tasks running simultaneously.
 Power Efficiency – Single-core: less efficient per task. Multi-core: more efficient overall.
 Usage – Single-core: older systems, simple applications. Multi-core: modern PCs, smartphones, high-performance servers.
 Practical Example – Single-core: an older Pentium processor. Multi-core: a current Intel Core i7 processor.
USE CASES OF MULTIPROCESSOR ARCHITECTURES
 Embedded Systems – AMP. Example: a vehicle's airbag controller handling sensor data.
 Personal Computers – SMP, Multicore. Example: a desktop PC with an 8-core CPU running multiple apps.
 Cloud Computing – SMP, Multicore. Example: data centers with high-core-count servers.
 Scientific Computing – MPP. Example: a weather simulation supercomputer with thousands of nodes.
 Mobile Devices – Multicore, SMT. Example: a smartphone managing multiple background services.

CONCLUSION
 Multiprocessor Systems use two or more CPUs to improve performance and reliability.
 Architectures include SMP, AMP, Multicore, SMT, and MPP, each addressing different needs:
 SMP & Multicore: Best for general-purpose, shared-memory systems.
 AMP: Suitable for specialized embedded and real-time applications.
 SMT: Enhances performance within a single core.
 MPP: Ideal for high-end, massively scalable systems used in research and scientific
computing.
 Advantages include parallel execution, increased throughput, fault tolerance, and better
resource utilization.
 Challenges include complex programming, memory contention, cache coherence issues, and
hardware complexity.
UNIT 5 END
