Processor Architectures and Instruction Sets
The datapath is the part of the CPU where data actually moves and operations happen.
Executing an Arithmetic Instruction (Step by Step)
Example: ADD R0, R1, R2
1. Instruction Fetch: CPU fetches the instruction from memory into the instruction
register.
2. Instruction Decode: Control unit decodes the opcode to understand it's an ADD.
3. Operand Fetch: Values from R1 and R2 are read from the register file.
4. Execute (ALU): ALU performs addition of values from R1 and R2.
5. Result Storage: Result is written back into R0.
6. Next Instruction: Program Counter (PC) increments to fetch the next instruction.
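The six steps above can be traced with a small Python model. This is an illustrative sketch only: real CPUs fetch encoded machine words, not decoded tuples, and the register contents here are invented for the example.

```python
# Minimal model of one pass through the datapath for ADD R0, R1, R2.
# "Memory" holds a pre-decoded tuple instead of real machine code.
memory = [("ADD", 0, 1, 2)]              # program: R0 = R1 + R2
registers = [0, 7, 5]                    # R0, R1, R2
pc = 0                                   # program counter

ir = memory[pc]                          # 1. Instruction Fetch -> IR
opcode, dest, src1, src2 = ir            # 2. Decode opcode and operand fields
a, b = registers[src1], registers[src2]  # 3. Operand Fetch from register file
if opcode == "ADD":                      # 4. Execute in the ALU
    result = a + b
registers[dest] = result                 # 5. Write result back to R0
pc += 1                                  # 6. PC now points at the next instruction

print(registers[0])                      # -> 12
```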
MEMORY READ AND WRITE
Memory Read
CPU places the address of data on the address bus.
Control unit signals a memory read operation.
Memory responds by placing data on the data bus.
CPU reads data into a register.
Memory Write
CPU places the address on the address bus.
Data to write is placed on the data bus.
Control unit signals a memory write operation.
Memory writes data into the addressed location.
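Both handshakes can be sketched with the buses modeled as plain variables. The signal names and addresses below are illustrative, not taken from any real bus standard:

```python
# Toy model of CPU <-> memory transfers over address/data buses.
memory = {0x1000: 0}              # one addressable location

def mem_read(address):
    # CPU drives the address bus, the control unit signals READ,
    # memory answers on the data bus, and the CPU latches the value.
    data_bus = memory[address]
    return data_bus

def mem_write(address, value):
    # CPU drives the address bus and data bus, the control unit
    # signals WRITE, and memory stores the value at that address.
    memory[address] = value

mem_write(0x1000, 42)             # write 42 to address 0x1000
r0 = mem_read(0x1000)             # read it back into a "register"
print(r0)                         # -> 42
```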
ADDRESSING MODES
Addressing modes describe how the CPU finds operands (values or data)
when running instructions.
Operands can be:
Constants (e.g., #5)
Registers (e.g., r0, ax)
Memory locations (e.g., [0x1000], [r1, #10])
Understanding these modes helps with writing and reading assembly
language.
Data Flow: Address calculation based on mode; control unit computes
effective addresses for memory access.
Mode | Description | Example | Used In
Indirect | Register contains the memory address of operand | ldr r0, [r1] / mov ax, [bx] | ARM / 8086
Indexed | Address = base + index register | mov eax, [ebx + esi] | 8086
Base + Offset | Address = base register + constant offset | ldr r0, [r1, #10] / mov eax, [ebx + 4] | ARM / 8086
ADDRESSING MODE EXAMPLES
Immediate Addressing: Operand is a constant
in the instruction.
Fast and simple; no memory access.
Example:
ARM: mov r0, #5 – Loads 5 into r0.
8086: mov ax, 5
Register Addressing: Operand is already in a
register.
Very fast; no memory involved.
Example:
ARM: add r0, r1, r2.
8086: add ax, bx
Direct Addressing: Address is directly given in the
instruction.
Example:
8086: mov ax, [0x1000] – Load value from address 0x1000.
Indirect Addressing: A register holds the memory address.
Example:
8086: mov ax, [bx].
ARM: ldr r0, [r1].
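The modes above differ only in how the operand (or its effective address) is obtained. A small illustrative helper, with register and memory contents invented for the example:

```python
# How each addressing mode locates its operand (illustrative values).
regs = {"r1": 0x1000}                       # r1 holds an address
mem = {0x1000: 11, 0x1008: 22, 0x100A: 33}

def operand(mode, **arg):
    if mode == "immediate":                 # operand is in the instruction
        return arg["imm"]
    if mode == "register":                  # operand is a register's value
        return regs[arg["reg"]]
    if mode == "direct":                    # address given in the instruction
        return mem[arg["addr"]]
    if mode == "indirect":                  # register holds the address
        return mem[regs[arg["reg"]]]
    if mode == "base_offset":               # address = base register + offset
        return mem[regs[arg["reg"]] + arg["off"]]

print(operand("immediate", imm=5))              # mov r0, #5         -> 5
print(operand("direct", addr=0x1008))           # mov ax, [0x1008]   -> 22
print(operand("indirect", reg="r1"))            # ldr r0, [r1]       -> 11
print(operand("base_offset", reg="r1", off=10)) # ldr r0, [r1, #10]  -> 33
```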
ADVANTAGES & DISADVANTAGES
Mode | Advantages | Disadvantages
Immediate | Fast, no memory access | Cannot handle large or changing values
Direct | Simple for fixed memory | Not flexible, slower than register
SUMMARY TABLE
Mode | Operand Location | Example | Address Calculation | Typical Use
Immediate | Inside the instruction | mov r0, #10 | No calculation | Constants, flags
ISA VS. MICROARCHITECTURE
Feature | Instruction Set Architecture (ISA) | Microarchitecture
Definition | Interface between software and hardware. | Internal design and organization of the CPU.
Focus | Defines what instructions the CPU can execute and how they behave. | Defines how the CPU executes those instructions internally.
Includes | Instruction set, formats, addressing modes, data types. | ALU, registers, control units, pipelines, caches, buses.
Purpose | Ensures software compatibility across different processors. | Optimizes performance, power consumption, and complexity.
Example | x86 ISA used by Intel and AMD CPUs. | Intel and AMD have different CPU designs but support the x86 ISA.
Impact on Software | Software runs the same on any CPU supporting the ISA. | Performance and efficiency vary depending on the microarchitecture.
Types of ISA & Use Cases
RISC-V | An open-source RISC architecture that is modular and freely available. It allows customization for specific applications and is highly suited for experimentation. | Frequently adopted in academic research, education, and customizable embedded systems due to its open design.
MIPS | A classic RISC-based architecture known for its simplicity and clean design, historically influential in teaching computer architecture principles. | Still used in networking devices (routers, switches) and university-level education.
JVM (Stack-based) | A stack-based ISA where instructions operate on a runtime stack rather than registers. Facilitates portability across hardware platforms. | Used for Java applications, WebAssembly, and cross-platform software environments.
Stack-based vs Register-based ISA
Feature | Stack-based ISA | Register-based ISA
Definition | A CPU design where most instructions implicitly operate on a stack, a last-in, first-out (LIFO) data structure. | A CPU design where instructions operate directly on named registers: fast, small storage locations within the CPU.
Data Handling | Operands are implicitly taken from and stored to a stack using PUSH and POP operations; no operand names are specified in the instructions. | Operands are explicitly stored in and accessed through named registers; instruction operands specify the registers.
Instruction Format | Typically shorter and simpler, since operands are implied by the stack's top positions. | Longer and more structured, as each instruction specifies source and destination registers.
Speed & Performance | Generally slower, because stack memory operations require more frequent memory reads/writes. | Typically faster, as direct register access reduces memory interactions; suitable for pipelined architectures.
Instruction Scheduling | Harder to optimize, as operands are limited to stack-top elements. | Easier for compilers and CPUs to optimize, since any register can typically be selected.
Examples | JVM (Java Virtual Machine), WebAssembly, Forth language systems. | ARM, MIPS, RISC-V, x86 (a CISC ISA, but with RISC-like register operations).
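The contrast in the table can be made concrete by evaluating the same expression, (2 + 3) * 4, both ways. This is a toy interpreter for illustration, not the JVM's or any real ISA's encoding:

```python
# Stack-based: operands are implicit (taken from the top of the stack).
def run_stack(program):
    stack = []
    for op, *arg in program:
        if op == "push":
            stack.append(arg[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

stack_result = run_stack([
    ("push", 2), ("push", 3), ("add",),   # 2 3 +  -> 5 on the stack
    ("push", 4), ("mul",),                # 5 4 *  -> 20 on the stack
])

# Register-based: every instruction names its operands explicitly.
r = {"r1": 2, "r2": 3, "r3": 4}
r["r4"] = r["r1"] + r["r2"]               # add r4, r1, r2
r["r5"] = r["r4"] * r["r3"]               # mul r5, r4, r3

print(stack_result, r["r5"])              # -> 20 20
```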
DATA TRANSFER INSTRUCTIONS
Move data between registers and memory.
These instructions do not affect flags.
Category | Purpose | ARM | 8086 | MIPS
Arithmetic | Do math | ADD R0, R1, R2 | ADD AX, BX | add $t0, $t1, $t2
Logical | Bitwise operations | AND R0, R1, R2 | AND AX, BX | and $t0, $t1, $t2
Comparison | Set flags, no result | CMP R1, R2 | CMP AX, BX | slt $t0, $t1, $t2
Function Call/Return | Manage subroutines | BL, RET | CALL, RET | jal, jr $ra
Note: in MIPS add $t0, $t1, $t2, the first operand ($t0) is the destination.
STEPS IN THE INSTRUCTION CYCLE
The CPU follows a set of steps to execute every instruction, called the Instruction
Cycle:
1. Fetch: Read the next instruction from memory (address in the Program Counter,
PC). Store it in the Instruction Register (IR).
2. Decode: The Control Unit (CU) interprets the instruction to identify the
operation and data locations.
3. Operand Fetch: If needed, fetch data (operands) from registers or memory using
the addressing mode.
4. Execute: The ALU or CPU performs the operation (e.g., add, compare, jump).
5. Write Back: Store the result in a register or memory, as required.
6. Update PC: Move to the next instruction or jump to a new address if specified.
SIMPLE PROGRAMMING EXAMPLES
SIMPLE PROGRAMMING TOOLS
Pseudocode:
Informal, high-level description of a program.
Written in plain language to show logic.
Not executable but easy to understand.
Used for planning algorithms.
Assembly Language:
Low-level language close to machine code.
Uses short commands (e.g., add, mov) to control hardware.
Requires knowledge of CPU (registers, memory).
Translated to machine code by an assembler.
EXAMPLE 1 – SUM OF ARRAY ELEMENTS
Pseudocode:
sum = 0
for each item in array:
    sum = sum + item
return sum
Assembly-style (RISC-V):
li x1, 0            # sum = 0                 x1: sum
li x2, 0            # index = 0               x2: index
                    # x4: base address of array, x5: number of elements
loop:
slli x6, x2, 2      # byte offset = index * 4 (one word per element)
add x6, x4, x6      # address of array[index]
lw x3, 0(x6)        # load array[index]
add x1, x1, x3      # sum += array[index]
addi x2, x2, 1      # index++
blt x2, x5, loop    # repeat while index < length
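The same loop in Python, with memory modeled as 4-byte words at a base address (the base address and array contents are illustrative):

```python
# Mirror of the array-sum loop: memory as a byte-addressed dict of words.
array = [3, 1, 4, 1, 5]
base = 0x2000                       # base address of the array (illustrative)
mem = {base + 4 * i: v for i, v in enumerate(array)}

total = 0                           # sum register
index = 0                           # index register
while index < len(array):           # repeat while index < length
    addr = base + index * 4         # byte offset = index * 4
    total += mem[addr]              # load array[index], add to sum
    index += 1                      # index++

print(total)                        # -> 14
```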
EXAMPLE 2 – IF-ELSE (FIND MAXIMUM)
Pseudocode:
if a > b:
max = a
else:
max = b
EXAMPLE 2 – IF-ELSE (FIND MAXIMUM)
Assembly-style (RISC-V):
slt x5, x1, x2 # x5 = 1 if x1 < x2 slt: set x5 to 1 if less
beq x5, x0, else # if x1 >= x2, go to else beq: branch if equal (x5 == 0)
mv x3, x2 # max = b
j end
else:
mv x3, x1 # max = a
end:
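The branch logic can be traced in Python, with slt and beq written out explicitly (x1 holds a and x2 holds b, as in the comments above):

```python
def find_max(a, b):
    x1, x2 = a, b
    x5 = 1 if x1 < x2 else 0    # slt x5, x1, x2
    if x5 == 0:                 # beq x5, x0, else (branch taken when a >= b)
        x3 = x1                 # else: mv x3, x1 -> max = a
    else:
        x3 = x2                 # fall-through: mv x3, x2 -> max = b
    return x3

print(find_max(7, 3), find_max(3, 7))   # -> 7 7
```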
KEY EXECUTION STEPS
1. Set Initial Values: Put starting values (like numbers, counters, or
addresses) into registers.
2. Do Arithmetic or Comparisons: Perform operations like add,
subtract, or compare values in registers.
3. Store Results: Save the result of the operation back into a register.
4. Use Branches to Control Flow: Use instructions like beq, bne, j to
decide which part of the program runs next.
5. Loop or Exit Based on Conditions: Repeat steps or exit the program
based on test results (like if a counter reached a limit).
MULTIPROCESSORS
A multiprocessor system is a computer that has two or more processors (CPUs)
working together.
These processors can:
Share the same memory or have separate memory.
Work at the same time (in parallel) to make the computer faster.
Help the computer run many tasks at once (multitasking) and be more reliable.
Multiprocessors can be classified into several architecture types based on
processor and memory configuration.
Multiprocessor systems are commonly found in modern desktops, servers,
smartphones, and supercomputers.
HOW DO MULTIPROCESSOR SYSTEMS WORK?
Each processor (CPU) can run its own instructions or work together with other
CPUs to handle a big task faster.
They can share one memory or have private memory for each CPU.
These systems improve:
Speed (by doing more things at the same time).
Multitasking (running many programs smoothly).
Reliability (if one processor has a problem, others can keep working).
Example:
In a smartphone, multiple CPU cores: Run apps, Handle background tasks, Manage
user input all at the same time, keeping the phone fast and responsive.
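The idea of splitting one big task across processors can be sketched with Python's concurrent.futures. Here worker threads stand in for cores (the OS decides how they map onto actual CPUs), and the four-way chunking scheme is illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1, 1001))          # one big task: sum 1..1000

def sum_chunk(chunk):
    # each worker handles its own slice, like one core handling a subtask
    return sum(chunk)

# split the work into 4 chunks, one per worker
chunks = [data[i::4] for i in range(4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    partials = list(pool.map(sum_chunk, chunks))

total = sum(partials)                # combine the partial results
print(total)                         # -> 500500
```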
TYPES OF MULTIPROCESSOR
ARCHITECTURES
SYMMETRIC MULTIPROCESSOR (SMP)
Memory: All CPUs share the same physical memory and I/O resources.
OS Treatment: The operating system treats all processors equally.
Use Case: Common in desktops, laptops, and servers.
Advantages:
Easier programming model (using threads).
Uniform access to resources.
Disadvantages:
Limited scalability because of bus contention and memory latency when many CPUs try
to access shared memory.
Practical Example: A modern 4-core desktop PC where each core runs parts of the
operating system and user applications.
ASYMMETRIC MULTIPROCESSOR (AMP)
Structure: One processor (the master) controls the system while the others (slaves) handle specific
tasks.
Memory: Processors may have their own dedicated memories or share parts of the system.
Use Case: Often found in embedded and real-time systems.
Advantages:
Simpler hardware design for dedicated functions.
Efficient for specialized or time-critical operations.
Disadvantages:
Limited flexibility; task distribution may be unbalanced.
More complex coordination for resource sharing.
Practical Example: An industrial controller where one CPU manages system coordination while others
process sensor inputs and control motors.
MASSIVELY PARALLEL PROCESSOR (MPP)
Memory: Each CPU (or node) has its own private memory (distributed memory).
Interconnection: CPUs or nodes are linked through a network topology (such as mesh,
ring, or hypercube).
Scalability: Highly scalable, enabling the addition of many processors to handle large-scale
computations.
Use Case: Predominantly used in supercomputers and large computing clusters.
Advantages:
High Scalability: Can easily scale to hundreds or thousands of processors.
Improved Performance for Large Problems: Ideal for tasks like scientific simulations, data
analytics, and research that require massive amounts of parallel processing.
Fault Isolation: Failure in one node often does not bring down the entire system.
Disadvantages:
Programming Complexity: Requires specialized programming models (often using message
passing interfaces like MPI) to manage distributed memory.
Interconnect Overhead: The network connecting the nodes can become a bottleneck if
not designed properly.
Cost: High cost of setting up and maintaining extensive hardware and interconnection
networks.
Practical Example: A weather prediction supercomputer that uses MPP to distribute
computation across thousands of nodes, each processing a portion of the overall
simulation.
MULTICORE PROCESSORS
Integration: Multiple processor cores are integrated on a single chip.
Memory Sharing: Cores typically share one or more levels of cache and a memory
interface.
Use Case: Widely used in smartphones, laptops, and desktops.
Advantages:
High performance in a compact and energy-efficient design.
Enables extensive multitasking.
Disadvantages:
Increased complexity for maintaining cache coherence.
Possible thermal challenges due to concentrated processing power.
Practical Example: A smartphone with 8 cores that balance performance and battery
efficiency while running several apps simultaneously.
SIMULTANEOUS MULTITHREADING (SMT)
Concept: A single CPU core can execute instructions from multiple threads concurrently.
Resource Sharing: Multiple threads share core resources, increasing overall utilization.
Use Case: Features like Intel’s Hyper-Threading enhance performance in modern CPUs.
Advantages:
Improves throughput without requiring additional cores.
More efficient use of CPU resources.
Disadvantages:
Potential resource contention among threads.
More complex scheduling in the OS.
Practical Example: A CPU core managing multiple threads in a web server to handle
numerous simultaneous user requests.
Comparison of Processor Architectures
Programming Model:
SMP: easier to program, using threads and shared memory.
AMP: simple for small, control-based tasks.
MPP: more complex, because it requires sending messages between processors.
Multicore: easier for parallel programming, using threads or processes.
SMT: easier; thread scheduling is handled by the CPU's own thread manager.
Practical Examples: an older Pentium processor vs. a current Intel Core i7 processor.
USE CASES OF MULTIPROCESSOR ARCHITECTURES
Application Area | Architecture Used | Practical Example
Personal Computers | SMP, Multicore | A desktop PC with an 8-core CPU running multiple apps