
Chapter II: Instructions: Language of The Computer

Chapter II of 'Computer Architecture' by Nguyen Duong Quynh Nhi discusses the language of the computer, focusing on MIPS assembly language and its design principles. It emphasizes the importance of simplicity and regularity in instruction design, as well as the balance between the number of registers and system performance. The chapter also covers memory operands, signed and unsigned numbers, and various operations related to computer instructions.


Computer Architecture - Chapter II

Instructions: Language of the Computer


By Nguyen Duong Quynh Nhi from USTH Learning Support
Jun 2025

Contents

1 Operations & Operands of the Computer Hardware
1.1 Design Principle 1: Simplicity favors regularity
1.2 Design Principle 2: Smaller is Faster
1.3 Memory Operands
1.3.1 General
1.3.2 Registers vs Memory
1.3.3 Constant/Immediate Operands

2 Signed and Unsigned Numbers
2.1 Unsigned Numbers
2.2 Signed Numbers
2.2.1 2s-Complement Signed Integers

3 Representing Instructions in the Computer
3.1 Representing Instructions
3.2 MIPS Fields
3.3 Design Principle 3: Good design demands good compromises
3.4 Stored Program Computers

4 Logical Operations

5 Instructions for Making Decisions

6 Supporting Procedures in Computer Hardware
6.1 Nested Procedures
6.2 Local Data on the Stack
6.3 Communicating with People
6.3.1 Character Data
6.3.2 Byte/Halfword/Word Operations

7 MIPS Addressing for 32-bit Immediates and Addresses
7.1 32-Bit Immediate Operands
7.2 Branch Addressing
7.3 Jump Addressing
7.4 MIPS Addressing Mode Summary
7.5 MIPS Encoding Summary

8 Parallelism and Instructions: Synchronization

9 Translating and Starting a Program
9.1 Compiler
9.2 Assembler
9.3 Linker
9.4 Loader
9.5 Dynamic Linking
9.6 Starting a Java Program

10 A C Sort Example to Put It All Together
10.1 The Procedure Swap
10.2 The Procedure Sort
10.3 The Outer Loop
10.4 Inner Loop
10.5 Preserving Registers

11 Arrays versus Pointers
11.1 Arrays vs. Pointers
11.2 Comparison of Array vs. Pointers

12 MIPS Instructions

13 The Intel x86 ISA Overview
13.1 Performance and Architecture
13.2 Multimedia and SIMD
13.3 64-bit Computing
13.4 Advanced Vector Extensions (AVX)
13.5 Operand Types
13.6 Addressing Modes
13.7 Instruction Encoding
13.8 Execution Model

14 Other RISC-V Instructions
14.1 RV64I Base Integer Instructions
14.2 RV32I Variant
14.3 Standard Extensions

15 Fallacies & Pitfalls
15.1 Fallacies
15.2 Pitfalls

1 Operations & Operands of the Computer Hardware
• Each MIPS arithmetic instruction performs only one operation and must always
have exactly three variables (operands).

• Example:
We want to place the sum of four variables b, c, d, and e into variable a.
Solution:
add a, b, c

– After the first step, the updated value of variable a is the sum of b & c.

add a, a, d

– In the second step, we add d to variable a which is currently the sum of b & c.

⇒ After this step, the value of variable a is the sum of b, c & d.


add a, a, e

– In this step, e is added to variable a, which means that the latest value of variable
a is the sum of b, c, d & e.

1.1 Design Principle 1: Simplicity favors regularity


• MIPS assembly language requires every arithmetic instruction to have exactly three
operands, keeping the hardware simple.
⇒ Requiring the same number of operands in every instruction keeps instruction
decoding consistent (regularity).
⇒ This simple, regular design enables higher performance at lower cost
(simplicity).

• Example:
A somewhat complex statement contains the five variables f, g, h, i, and j:
f = (g + h) − (i + j);
What might a C compiler produce?
Solution:
add t0, g, h // temp t0 = g + h
add t1, i, j // temp t1 = i + j
sub f, t0, t1 // f = t0 - t1
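Both examples use the same three-operand discipline, and that discipline can be mirrored in C. In this minimal sketch, the hypothetical helpers `add3` and `sub3` each perform exactly one operation on two sources and write one destination, like MIPS `add` and `sub`. Each helper call corresponds to one instruction.

```c
#include <stdint.h>

/* Hypothetical helpers: exactly one operation, two sources, one destination,
   like add/sub in MIPS. */
static void add3(int32_t *dst, int32_t src1, int32_t src2) { *dst = src1 + src2; }
static void sub3(int32_t *dst, int32_t src1, int32_t src2) { *dst = src1 - src2; }

/* a = b + c + d + e takes three instructions. */
int32_t sum4(int32_t b, int32_t c, int32_t d, int32_t e) {
    int32_t a;
    add3(&a, b, c);   /* add a, b, c */
    add3(&a, a, d);   /* add a, a, d */
    add3(&a, a, e);   /* add a, a, e */
    return a;
}

/* f = (g + h) - (i + j) needs two temporaries. */
int32_t expr(int32_t g, int32_t h, int32_t i, int32_t j) {
    int32_t t0, t1, f;
    add3(&t0, g, h);  /* add t0, g, h */
    add3(&t1, i, j);  /* add t1, i, j */
    sub3(&f, t0, t1); /* sub f, t0, t1 */
    return f;
}
```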

1.2 Design Principle 2: Smaller is Faster
• In MIPS architecture, registers are like the bricks of a computer's construction. Each
register holds 32 bits of data; a group of 32 bits is called a word in MIPS.
• Why does MIPS limit itself to 32 registers?
• Reasons:
– More registers mean longer wires and more complex circuitry. Electronic signals
take longer to travel farther, which can lengthen each clock cycle.
– The designer must strike a balance: enough registers for the programmer, but not
so many that the whole system slows down.
– Keeping the number of registers small allows for faster clock cycles and simpler
hardware design.
⇒ This principle highlights that keeping components (here, the number of registers)
small helps the computer run faster.

1.3 Memory Operands


1.3.1 General
• Registers in MIPS are limited (only 32), so composite data such as arrays is stored in
memory, which behaves like a large, one-dimensional array of bytes.
• Because arithmetic can only use registers, MIPS provides data transfer instructions to
move data between memory and registers:
– lw (load word): memory → register
– sw (store word): register → memory
• MIPS is a load/store architecture:
– Arithmetic (e.g., add, sub) only works on registers.
– To use data in memory, you must load it into a register first.
• Example:
lw $t0, 0($s1) // Load the word at memory address $s1 + 0 into $t0
sw $t0, 4($s2) // Store the word in $t0 into memory at address $s2 + 4
• This example uses a format called base addressing:
Address = Base Register + Offset

• In MIPS, words (32-bit) must be aligned at addresses divisible by 4.
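The base-addressing rule and the alignment restriction can be sketched in C. This minimal model assumes 32-bit addresses and a 16-bit signed offset, as in the MIPS lw/sw encoding. `effective_address` and `is_word_aligned` are hypothetical names, not a library API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Effective address for lw/sw: base register + sign-extended 16-bit offset.
   Unsigned arithmetic wraps modulo 2^32, like a 32-bit address adder. */
uint32_t effective_address(uint32_t base, int16_t offset) {
    return base + (uint32_t)(int32_t)offset;
}

/* A word access is legal only if the address is a multiple of 4,
   i.e., its two lowest bits are 0. */
bool is_word_aligned(uint32_t addr) {
    return (addr & 0x3u) == 0;
}
```

For example, assuming $s2 holds 0x10010000, `sw $t0, 4($s2)` writes to 0x10010004, which is word aligned.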


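• Endianness (the order in which the bytes of a word are stored):
– Big-endian: the most significant ("big end") byte is stored at the lowest address (MIPS is big-endian)
– Little-endian: the least significant ("little end") byte is stored at the lowest address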
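The two byte orders differ only in which byte lands at the lowest address. This minimal C sketch writes a word byte by byte, so the result does not depend on the host machine's own byte order. The function names are illustrative.

```c
#include <stdint.h>

/* Most significant byte at the lowest address (mem[0]). */
void store_big_endian(uint8_t mem[4], uint32_t word) {
    mem[0] = (uint8_t)(word >> 24);
    mem[1] = (uint8_t)(word >> 16);
    mem[2] = (uint8_t)(word >> 8);
    mem[3] = (uint8_t)word;
}

/* Least significant byte at the lowest address (mem[0]). */
void store_little_endian(uint8_t mem[4], uint32_t word) {
    mem[0] = (uint8_t)word;
    mem[1] = (uint8_t)(word >> 8);
    mem[2] = (uint8_t)(word >> 16);
    mem[3] = (uint8_t)(word >> 24);
}
```

Storing 0x12345678 puts 0x12 at the lowest address in big-endian order and 0x78 there in little-endian order.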

1.3.2 Registers vs Memory
• Registers are faster than memory because:

– Registers are limited in number and located close to the CPU.


– Accessing registers takes fewer clock cycles and uses less energy.
– Accessing data in memory requires extra instructions.
– MIPS arithmetic instructions can only operate on data in registers.

• Programs often have more variables than available registers.

– The compiler keeps the most frequently used variables in registers.


– Less frequently used variables are kept in memory; putting them there is called spilling registers.

• To achieve high performance and energy efficiency:

– The instruction set should have a sufficient number of registers.


– Compilers must manage registers efficiently.

1.3.3 Constant/Immediate Operands


• To add a constant to a register, MIPS provides a quick add instruction with one constant
operand, called add immediate (addi)

• Example:
addi $s3,$s3,4 // $s3 = $s3 + 4

• Making the Common Case Fast:

– Constants appear frequently in programs.


– Including constants directly in arithmetic instructions makes operations faster
and more energy-efficient than loading them from memory.
– MIPS includes a special register, $zero, which is hardwired to the value 0.
∗ This simplifies instruction design.
∗ For example, a move can be written as add with one operand as $zero.
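The role of $zero and of addi can be illustrated with a small register-file model in C. This is a hypothetical sketch, not how any particular simulator is written. Register 0 ignores writes and always reads as 0, so `move` comes for free from `add`, as the bullet above describes.

```c
#include <stdint.h>

/* Hypothetical 32-entry register file; register 0 models $zero. */
typedef struct { uint32_t r[32]; } RegFile;

/* Writes to register 0 are discarded, so it always reads as 0. */
void write_reg(RegFile *rf, int rd, uint32_t value) {
    if (rd != 0) rf->r[rd] = value;
}

uint32_t read_reg(const RegFile *rf, int rs) {
    return rs == 0 ? 0 : rf->r[rs];
}

/* addi rd, rs, imm: the constant comes from the instruction, not from memory. */
void addi(RegFile *rf, int rd, int rs, int16_t imm) {
    write_reg(rf, rd, read_reg(rf, rs) + (uint32_t)(int32_t)imm);
}

/* add rd, rs, rt */
void add(RegFile *rf, int rd, int rs, int rt) {
    write_reg(rf, rd, read_reg(rf, rs) + read_reg(rf, rt));
}

/* move rd, rs is simply add rd, rs, $zero. */
void move(RegFile *rf, int rd, int rs) {
    add(rf, rd, rs, 0);
}
```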

2 Signed and Unsigned Numbers
2.1 Unsigned Numbers
• Numbers are kept in computer hardware as a series of high and low electronic signals
⇒ They are considered base 2 numbers (binary numbers).

• In any number base, the value of the ith digit d is:

$$d \times \text{Base}^i$$

⇒ n-bit binary numbers can be represented in terms of bit value times a power of 2 ($x_i$
means the ith bit of x):

$$x = x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$$

• Range of Unsigned Numbers: $0$ to $+2^n - 1$
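The positional formula translates directly into code. This minimal C sketch assumes the bits are given least significant first and that n is at most 63, so the sum fits in 64 bits. `unsigned_value` and `unsigned_max` are hypothetical names.

```c
#include <stdint.h>

/* Value of an n-bit unsigned number: sum of x_i * 2^i,
   where bits[i] holds x_i (least significant bit first). */
uint64_t unsigned_value(const int bits[], int n) {
    uint64_t x = 0;
    for (int i = 0; i < n; i++) {
        if (bits[i]) x += (uint64_t)1 << i;
    }
    return x;
}

/* Largest n-bit unsigned value, 2^n - 1 (valid for 1 <= n <= 63). */
uint64_t unsigned_max(int n) {
    return ((uint64_t)1 << n) - 1;
}
```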

2.2 Signed Numbers


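• Computer programs calculate with both positive and negative numbers
⇒ How do we distinguish the positive from the negative?

– In the past, sign and magnitude representation added a separate sign bit, but it was
abandoned: it was unclear where to put the sign bit, adders needed an extra step to set
the sign, and there were both a positive and a negative zero.
– Today's representation follows from unsigned subtraction: subtracting a large number
from a small one borrows from a string of leading 0s, so the result has a string of leading
1s. Hence leading 0s mean positive and leading 1s mean negative.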

2.2.1 2s-Complement Signed Integers


• Given n bits, we can represent both positive and negative n-bit numbers in terms of the
bit value times a power of 2:

$$x = -x_{n-1} \cdot 2^{n-1} + x_{n-2} \cdot 2^{n-2} + \cdots + x_1 \cdot 2^1 + x_0 \cdot 2^0$$

• Range of Signed Numbers: $-2^{n-1}$ to $+2^{n-1} - 1$

• To negate a 2s-complement binary number:

– Step 1: Invert every 0 to 1 & every 1 to 0


– Step 2: Add 1 to the result

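• Example: Negate 2ten in 32-bit two's complement.
• Solution:
2ten = 0000 0000 0000 0000 0000 0000 0000 0010two
Step 1 (invert every bit): 1111 1111 1111 1111 1111 1111 1111 1101two
Step 2 (add 1): 1111 1111 1111 1111 1111 1111 1111 1110two = −2ten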
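Both the invert-and-add-1 rule and the weighted-sum formula, in which the top bit carries a negative weight, can be checked with a short C sketch. It assumes 32-bit values and uses unsigned arithmetic for the bit manipulation, because that is well defined in C. `negate` and `signed_value` are illustrative names.

```c
#include <stdint.h>

/* Negate a 32-bit two's complement number: invert every bit, then add 1. */
uint32_t negate(uint32_t x) {
    return ~x + 1u;
}

/* Evaluate x = -x31 * 2^31 + x30 * 2^30 + ... + x0 * 2^0. */
int64_t signed_value(uint32_t x) {
    int64_t v = -(int64_t)((x >> 31) & 1u) * ((int64_t)1 << 31);
    for (int i = 0; i < 31; i++) {
        if ((x >> i) & 1u) v += (int64_t)1 << i;
    }
    return v;
}
```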

• To convert a binary number in n bits to more than n bits (sign extension):


– Take most significant bit from smaller quantity and replicate it to fill the new bits
of the larger one
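• Example: Convert 16-bit binary versions of 2ten and −2ten to 32-bit binary numbers.
• Solution:
2ten: 0000 0000 0000 0010two → 0000 0000 0000 0000 0000 0000 0000 0010two (sign bit 0 copied into the upper 16 bits)
−2ten: 1111 1111 1111 1110two → 1111 1111 1111 1111 1111 1111 1111 1110two (sign bit 1 copied into the upper 16 bits)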
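Sign extension amounts to copying one bit into all the new positions. This minimal C sketch handles the 16-to-32-bit case and shows zero extension next to it for contrast. The function names are hypothetical.

```c
#include <stdint.h>

/* Sign extension: copy bit 15 of the 16-bit value into bits 31..16. */
uint32_t sign_extend16(uint16_t h) {
    return (h & 0x8000u) ? (0xFFFF0000u | h) : (uint32_t)h;
}

/* Zero extension, for comparison: the new upper bits are always 0. */
uint32_t zero_extend16(uint16_t h) {
    return (uint32_t)h;
}
```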

3 Representing Instructions in the Computer
3.1 Representing Instructions
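• Instructions are kept in the computer as a series of high and low electronic signals and
can be represented as numbers
⇒ Each piece of an instruction is an individual number, and placing these numbers side
by side forms the instruction.
• Convention to map register names into numbers (used in the instruction format):
– Registers $s0 to $s7 map onto registers 16 to 23
– Registers $t0 to $t7 map onto registers 8 to 15
• MIPS instructions are 32 bits long. To distinguish it from assembly language, the numeric
version of instructions is called machine language, and a sequence of such instructions is
called machine code.
• Almost all computer data sizes are multiples of 4 ⇒ hexadecimal (base 16) numbers are
popular, since each hexadecimal digit stands for exactly 4 bits.
• The hexadecimal-binary conversion table:
0 = 0000, 1 = 0001, 2 = 0010, 3 = 0011
4 = 0100, 5 = 0101, 6 = 0110, 7 = 0111
8 = 1000, 9 = 1001, a = 1010, b = 1011
c = 1100, d = 1101, e = 1110, f = 1111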

• Example: eca8 6420 = 1110 1100 1010 1000 0110 0100 0010 0000
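Because each hexadecimal digit stands for exactly four bits, conversion is a digit-by-digit lookup. This minimal C sketch prints a 32-bit value as eight 4-bit groups. The function name and the 40-character buffer size are assumptions of this example.

```c
#include <stdint.h>

/* Write x as 8 groups of 4 binary digits separated by spaces.
   Each hexadecimal digit maps to exactly one 4-bit group.
   out needs 32 digits + 7 spaces + '\0' = 40 chars. */
void to_binary_groups(uint32_t x, char out[40]) {
    int pos = 0;
    for (int bit = 31; bit >= 0; bit--) {
        out[pos++] = ((x >> bit) & 1u) ? '1' : '0';
        if (bit % 4 == 0 && bit != 0) out[pos++] = ' ';
    }
    out[pos] = '\0';
}
```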

3.2 MIPS Fields

• Meaning of each field in MIPS (R-type) instructions, with its width in bits:


– op (6 bits): Basic operation of the instruction (traditionally called the opcode).
– rs (5 bits): The first register source operand.
– rt (5 bits): The second register source operand.
– rd (5 bits): The register destination operand. It gets the result of the operation.
– shamt (5 bits): Shift amount (0 for instructions that do not shift).
– funct (6 bits): Function code. This field selects the specific variant of the operation in
the op field.
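The six fields add up to 6 + 5 + 5 + 5 + 5 + 6 = 32 bits. This minimal C sketch packs them with shifts and ORs. `encode_r` is a hypothetical helper. For add $t0, $s1, $s2 the fields are op = 0, rs = 17 ($s1), rt = 18 ($s2), rd = 8 ($t0), shamt = 0 and funct = 32 (add).

```c
#include <stdint.h>

/* Pack the six R-type fields: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6). */
uint32_t encode_r(uint32_t op, uint32_t rs, uint32_t rt,
                  uint32_t rd, uint32_t shamt, uint32_t funct) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) | ((rd & 0x1Fu) << 11) |
           ((shamt & 0x1Fu) << 6) | (funct & 0x3Fu);
}
```

`encode_r(0, 17, 18, 8, 0, 32)` yields 0x02324020, the machine code for add $t0, $s1, $s2.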
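• Problems arise when an instruction needs longer fields than those shown above (for
example, a load needs a constant offset, which a 5-bit field would limit to 32 values)
⇒ There is a conflict between the desire to keep all instructions the same length and
the desire to have a single instruction format.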

3.3 Design Principle 3: Good design demands good compromises
• This principle resolves that conflict: MIPS keeps all instructions the same length
(32 bits), at the cost of having more than one instruction format.

• In MIPS, we use multiple instruction formats (R-type for register operations, I-type
for immediates and data transfers) to support different kinds of instructions.
⇒ More formats make decoding somewhat harder, but every instruction stays 32 bits
long (consistency & efficiency).

• Multiple formats complicate the hardware; MIPS reduces this complexity by keeping
the formats similar.

• For example, the first three fields of the R-type & I-type formats have the same sizes
and the same names, and the I-type 16-bit constant occupies the space of the last three
R-type fields.
⇒ How does the hardware know whether to treat the last half of the instruction as
three fields (R-type) or as a single 16-bit field (I-type)?

– The answer: the formats are distinguished by the value in the first field (op).
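The I-type layout and the role of the op field can also be sketched in C. This is a minimal illustration with hypothetical helper names. It assumes the MIPS opcodes 35 for lw and 8 for addi, and it only checks whether op is 0 (R-type) rather than decoding every format.

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack the I-type fields: op(6) rs(5) rt(5) immediate(16). */
uint32_t encode_i(uint32_t op, uint32_t rs, uint32_t rt, int16_t imm) {
    return ((op & 0x3Fu) << 26) | ((rs & 0x1Fu) << 21) |
           ((rt & 0x1Fu) << 16) | (uint32_t)(uint16_t)imm;
}

/* The op field (bits 31..26) selects the format:
   op == 0 means R-type, so bits 15..0 are rd, shamt and funct. */
bool is_r_type(uint32_t instr) {
    return (instr >> 26) == 0;
}

/* For an I-type instruction, bits 15..0 form one signed 16-bit field. */
int32_t i_immediate(uint32_t instr) {
    int32_t v = (int32_t)(instr & 0xFFFFu);
    return v >= 0x8000 ? v - 0x10000 : v;
}
```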

• MIPS instruction encoding:

3.4 Stored Program Computers


• Today’s computers are built on two key principles:

– Instructions are represented as numbers.


⇒ Programs are often shipped as files of binary numbers ⇒ Binary compatibility
allows compiled programs to work on different computers that share the same instruction set.
– Programs are stored in memory to be read or written, just like data
⇒ Programs can operate on programs.

⇒ The stored-program concept:

4 Logical Operations
• Logical operations were added to programming languages & instruction sets to simplify
working with fields of bits within a word, or even with individual bits.

• One common use is packing & unpacking bits into words.

• C & Java logical operations and their MIPS instructions:

• Shift Operations:

– Shift left logical (sll): Moves all bits to the left & fills the emptied bits with 0s.
Shifting left by i bits is equivalent to multiplying by 2^i.
– Shift right logical (srl): Moves all bits to the right & fills the emptied bits with 0s.

• AND Operations:

– Leaves a 1 in the result only if both operand bits are 1

– Used with a "mask" to force certain bits to 0
⇒ Used to isolate or filter specific parts of binary data

• OR Operations:

– Place a 1 in the result if either operand bit is a 1.

• XOR Operations:

– Compares bits one by one and sets the result bit to:
∗ 1 if the bits are different
∗ 0 if the bits are the same
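The logical operations map directly onto C's bitwise operators. This minimal sketch uses hypothetical wrapper names. Shifting left by 4 multiplies by 16 (9 << 4 = 144). In C a shift amount of 32 or more is undefined, so callers must keep it below 32.

```c
#include <stdint.h>

uint32_t shift_left(uint32_t x, unsigned n)  { return x << n; }  /* sll, n < 32 */
uint32_t shift_right(uint32_t x, unsigned n) { return x >> n; }  /* srl, fills with 0s */
uint32_t and_op(uint32_t a, uint32_t b) { return a & b; }        /* and */
uint32_t or_op(uint32_t a, uint32_t b)  { return a | b; }        /* or  */
uint32_t xor_op(uint32_t a, uint32_t b) { return a ^ b; }        /* xor */

/* Isolate the width-bit field that starts at bit lo:
   shift it down, then AND with a mask of width 1s. */
uint32_t extract_field(uint32_t x, unsigned lo, unsigned width) {
    uint32_t mask = (width >= 32) ? 0xFFFFFFFFu : ((1u << width) - 1u);
    return (x >> lo) & mask;
}
```

For instance, `extract_field(0x02324020, 11, 5)` recovers the rd field (8, i.e., $t0) of add $t0, $s1, $s2.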

5 Instructions for Making Decisions


• Decision making is represented in programming languages using the if statement,
sometimes combined with go to statements and labels.
• Conditional branches:
– beq register1, register2, L1
∗ if register1 = register2, branch to instruction labeled L1
– bne register1, register2, L1
∗ if register1 ≠ register2, branch to instruction labeled L1
• Example: Compiling If Statements (f, g, h, i, j are in x19 to x23)
– C code:
if (i==j) f = g+h;
else f = g-h;
– RISC-V code:
bne x22, x23, Else // go to Else if i != j
add x19, x20, x21 // f = g + h (skipped if i != j)
beq x0, x0,Exit // unconditional branch
Else: sub x19, x20, x21 // f = g - h (skipped if i = j)
Exit:
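• Example: Compiling a While Loop (i is in x22, k in x24, and the base address of
save in x25; elements of save are 64-bit doublewords):
– C code:
while (save[i] == k)
i += 1;
– RISC-V code:
Loop: slli x10, x22, 3 // temp reg x10 = i * 8
add x10, x10, x25 // x10 = address of save[i]
ld x9, 0(x10) // temp reg x9 = save[i]
bne x9, x24, Exit // go to Exit if save[i] != k
addi x22, x22, 1 // i = i + 1
beq x0, x0, Loop // go to Loop
Exit: ...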
⇒ MIPS compilers use the slt, slti, beq, bne and the fixed value of 0 to create all
relative conditions: equal, not equal, less than, less than or equal, greater than, greater
than or equal.
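The branch-based structure of both examples can be mirrored in C with `goto`, one statement per instruction. This sketch is only illustrative; real C code would use `while` and `if`. The function names are hypothetical. `save` is assumed to hold 64-bit elements, matching the `slli ..., 3` (i × 8) and `ld` in the RISC-V code, and at least one element must differ from k so the loop terminates.

```c
#include <stdint.h>

/* while (save[i] == k) i += 1;  written in branch form */
int64_t skip_equal(const int64_t save[], int64_t i, int64_t k) {
Loop:
    if (save[i] != k) goto Exit;   /* bne x9, x24, Exit */
    i += 1;                         /* addi x22, x22, 1 */
    goto Loop;                      /* beq x0, x0, Loop */
Exit:
    return i;
}

/* if (i == j) f = g + h; else f = g - h;  written in branch form */
int64_t if_else(int64_t i, int64_t j, int64_t g, int64_t h) {
    int64_t f;
    if (i != j) goto Else;          /* bne x22, x23, Else */
    f = g + h;                      /* add x19, x20, x21 */
    goto Exit;                      /* beq x0, x0, Exit */
Else:
    f = g - h;                      /* sub x19, x20, x21 */
Exit:
    return f;
}
```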

• Signed comparison branches (RISC-V): blt, bge

• Unsigned comparison branches (RISC-V): bltu, bgeu
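Signed and unsigned branches can disagree on the same bit pattern. For example, 0xFFFFFFFF is −1 as a signed value but the largest value as an unsigned one. This minimal C sketch uses hypothetical names. It also shows a common trick: a single unsigned comparison checks 0 <= idx < n (assuming n >= 0), because a negative idx looks like a huge unsigned number.

```c
#include <stdbool.h>
#include <stdint.h>

/* Like blt: compare as two's complement numbers. */
bool less_signed(int32_t a, int32_t b)     { return a < b; }

/* Like bltu: compare the same bits as unsigned numbers. */
bool less_unsigned(uint32_t a, uint32_t b) { return a < b; }

/* Bounds check 0 <= idx < n with one unsigned comparison (n >= 0). */
bool in_bounds(int32_t idx, int32_t n) {
    return (uint32_t)idx < (uint32_t)n;
}
```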

• Basic block: a sequence of instructions with:

– No embedded branches (except at end)


– No branch targets (except at beginning)

6 Supporting Procedures in Computer Hardware


• Procedures help organize code by breaking it into smaller, manageable parts.

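• There are six steps in executing a procedure:

1. Place parameters in registers where the procedure can access them (x10 to x17 in
RISC-V; $a0 to $a3 in MIPS)


2. Transfer control to the procedure
3. Acquire the storage resources needed for the procedure
4. Perform the procedure's operations
5. Place the result in a register where the caller can access it
6. Return to the place of call (the return address is in x1 in RISC-V, $ra in MIPS)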

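• In addition to allocating these registers, MIPS provides the jump-and-link instruction
(jal), which jumps to the procedure and saves the address of the following instruction in
the return address register ($ra):


jal ProcedureAddress

• To jump to an address held in a register, RISC-V uses jalr (jump-and-link register);
for example, a procedure returns with jalr x0, 0(x1). MIPS returns with jr $ra:


jalr rd, offset(rs1) // rd receives the return address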

6.1 Nested Procedures

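• Leaf procedures are procedures that do not call other procedures.

• Nonleaf procedures call others; recursive procedures even invoke clones of themselves.


⇒ Nonleaf procedures must be careful with registers: the caller saves on the stack any
argument or temporary registers it still needs after the call, and the callee saves the
return address and any saved registers it uses.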

6.2 Local Data on the Stack

• Stack is used to store variables that are local to the procedure but do not fit in registers.

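• MIPS uses the stack pointer ($sp) to keep track of the top of the stack. By convention
the stack grows from higher addresses to lower addresses, so pushing subtracts from $sp.

• Because $sp can change during a procedure, some MIPS software also uses a frame
pointer ($fp) that points to the first word of the procedure's frame, giving a stable
base for referencing local variables.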
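The downward-growing stack can be modeled in C with a byte offset that plays the role of $sp. This is a hypothetical sketch with no overflow or underflow checks. Each push and pop corresponds to the addi/sw and lw/addi pairs a MIPS procedure would use.

```c
#include <stdint.h>

#define STACK_BYTES 256u

/* Hypothetical model: a small stack area addressed in bytes,
   growing from high addresses toward low addresses, one word per slot. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4];  /* word-sized backing store */
    uint32_t sp;                     /* byte offset of the current top */
} Stack;

void stack_init(Stack *s) { s->sp = STACK_BYTES; }  /* empty: sp at the top */

/* push: addi $sp, $sp, -4 then sw value, 0($sp) */
void push(Stack *s, uint32_t value) {
    s->sp -= 4;
    s->mem[s->sp / 4] = value;
}

/* pop: lw value, 0($sp) then addi $sp, $sp, 4 */
uint32_t pop(Stack *s) {
    uint32_t value = s->mem[s->sp / 4];
    s->sp += 4;
    return value;
}
```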

6.3 Communicating with People

6.3.1 Character Data

• Computers use 8-bit bytes to represent characters, with ASCII being the standard
character encoding.
• However, ASCII has a limited character set
⇒ Unicode addresses this by supporting the alphabets of most human languages, with
encodings such as UTF-8 & UTF-16 for efficient storage and compatibility.

6.3.2 Byte/Halfword/Word Operations

• Load byte/halfword/word: Sign-extend to 64 bits in rd


– lb rd, offset(rs1)
– lh rd, offset(rs1)
– lw rd, offset(rs1)
• Load byte/halfword/word unsigned: Zero-extend to 64 bits in rd
– lbu rd, offset(rs1)
– lhu rd, offset(rs1)
– lwu rd, offset(rs1)
• Store byte/halfword/word: Store rightmost 8/16/32 bits
– sb rs2, offset(rs1)
– sh rs2, offset(rs1)
– sw rs2, offset(rs1)
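The signed and unsigned loads differ only in how the upper bits are filled. This minimal C sketch models lb, lbu and sb on a byte array, with 64-bit results as in the RV64 description above. The function names are hypothetical.

```c
#include <stdint.h>

/* lb: load one byte and sign-extend it to 64 bits (copy bit 7 upward). */
int64_t load_byte(const uint8_t *base, int64_t offset) {
    int64_t b = base[offset];
    return b >= 0x80 ? b - 0x100 : b;
}

/* lbu: load one byte and zero-extend it to 64 bits. */
uint64_t load_byte_unsigned(const uint8_t *base, int64_t offset) {
    return base[offset];
}

/* sb: store only the rightmost 8 bits of the register. */
void store_byte(uint8_t *base, int64_t offset, uint64_t reg) {
    base[offset] = (uint8_t)(reg & 0xFFu);
}
```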

7 MIPS Addressing for 32-bit Immediates and Addresses

7.1 32-Bit Immediate Operands

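• Constants are frequently short & fit into the immediate field (16 bits in MIPS, 12 bits
in RISC-V)
• However, in some cases they are bigger
⇒ RISC-V provides load upper immediate (lui), which:
– Copies a 20-bit constant to bits [31:12] of rd
– Extends bit 31 into bits [63:32]
– Clears bits [11:0] of rd to 0
(MIPS lui is analogous: it loads a 16-bit constant into bits [31:16] and clears bits [15:0].)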
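A full 32-bit constant is built with lui for the upper 20 bits plus addi for the lower 12. Because addi sign-extends its immediate, the split is not a plain cut when bit 11 is set: in that case the upper part must be rounded up by one. This minimal C sketch works under those assumptions, uses hypothetical names, and produces 32-bit results only.

```c
#include <stdint.h>

/* Split a 32-bit constant into a 20-bit lui part and a signed 12-bit addi part. */
void split_constant(uint32_t value, uint32_t *upper20, int32_t *lower12) {
    int32_t lo = (int32_t)(value & 0xFFFu);
    if (lo >= 0x800) lo -= 0x1000;          /* addi will sign-extend these 12 bits */
    *lower12 = lo;
    /* If lo is negative, subtracting it rounds the upper part up by one. */
    *upper20 = ((value - (uint32_t)lo) >> 12) & 0xFFFFFu;
}

/* Result of lui rd, upper20 followed by addi rd, rd, lower12
   (low 32 bits only; on RV64, lui also sign-extends bit 31 upward). */
uint32_t rebuild_constant(uint32_t upper20, int32_t lower12) {
    return (upper20 << 12) + (uint32_t)lower12;
}
```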

7.2 Branch Addressing

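• The conditional branch instruction must specify two operands in addition to the
branch address.

• If the branch address had to fit in the instruction's immediate field alone, no program
could be larger than that field allows. Instead, a register is always added to the branch
offset, so the branch computes:

Program counter = Register + Branch offset

⇒ This sum lets programs be far larger than the offset field alone could address, which
solves the branch address size problem.

• Conditional branches are found in loops and if statements


⇒ They tend to branch to a nearby instruction (forward / backward).
⇒ To support this efficiently, the PC is used as that register. This PC-relative addressing
calculates the branch address as (RISC-V, where the immediate counts halfwords):

Target address = PC + immediate × 2

In MIPS, the offset counts words relative to the next instruction:
Target address = (PC + 4) + immediate × 4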
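Both PC-relative rules can be written as one-line functions. This minimal C sketch assumes 32-bit addresses that wrap around. The function names are hypothetical.

```c
#include <stdint.h>

/* RISC-V: the branch immediate counts halfwords, relative to the branch itself. */
uint32_t riscv_branch_target(uint32_t pc, int32_t imm) {
    return pc + (uint32_t)imm * 2u;
}

/* MIPS: the 16-bit offset counts words, relative to the next instruction (PC + 4). */
uint32_t mips_branch_target(uint32_t pc, int16_t offset) {
    return pc + 4u + (uint32_t)(int32_t)offset * 4u;
}
```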

7.3 Jump Addressing

• MIPS jump instructions have the simplest addressing, known as the J-type format.

• This format consists of 6 bits for the operation field & the remaining 26 bits for the
address field.

• This format is used by both j (jump) & jal (jump-and-link): j for unconditional jumps
and jal for procedure calls, situations where the target address may be far from the
current location.

7.4 MIPS Addressing Mode Summary

• The MIPS addressing modes are:

– Immediate addressing, where the operand is a constant within the instruction itself.
– Register addressing, where the operand is a register.
– Base or displacement addressing, where the operand is at the memory location
whose address is the sum of a register and a constant in the instruction.
– PC-relative addressing, where the branch address is the sum of the PC and a
constant in the instruction.
– Pseudodirect addressing, where the jump address is the 26 bits of the instruction
concatenated with the upper bits of the PC.
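Pseudodirect addressing can be sketched the same way. This minimal C version assumes the classic MIPS rule that the top 4 bits come from PC + 4. It also assumes the 26-bit field is a word address, so the field is shifted left by 2. The function name is hypothetical.

```c
#include <stdint.h>

/* MIPS J-type target: upper 4 bits of PC + 4, then the 26-bit field
   shifted left by 2 (instruction addresses are multiples of 4). */
uint32_t mips_jump_target(uint32_t pc, uint32_t addr26) {
    return ((pc + 4u) & 0xF0000000u) | ((addr26 & 0x03FFFFFFu) << 2);
}
```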

7.5 MIPS Encoding Summary

• MIPS instructions are all 32 bits long and fall into 3 types:

– R-type (Register-type): Used for register-to-register operations (e.g., add, sub)


– I-type (Immediate-type): Used for operations with constants, loads, stores,
and conditional branches
∗ In RISC-V, these are split into:
· I-type for immediate ops and loads
· S-type for stores
· SB-type for conditional branches
– J-type (Jump-type): Used for unconditional jumps and procedure calls (e.g.,
j, jal)
∗ In RISC-V, this corresponds to the UJ-type format (e.g., jal)

• In RISC-V, there’s also a U-type format used for loading a 20-bit upper immediate
(e.g., lui, auipc)

8 Parallelism and Instructions: Synchronization

• Parallel execution is easier when tasks are independent but when tasks share data,
they must cooperate.
• Cooperation requires synchronization - making sure one task finishes writing before
another reads (using lock & unlock operations)
⇒ This avoids the unpredictable program behavior caused by a data race.
• To implement synchronization, hardware support is required:
– The ability to atomically read and modify a memory location.
– Atomically means nothing else can interpose itself between the read and the write.
• There are two ways to achieve atomicity:
– Single atomic instruction:
∗ One typical operation: atomic swap/exchange
∗ This operation interchanges a value in a register for a value in memory
– Atomic instruction pair (used in MIPS/RISC-V):
∗ This pair includes a special load (load linked) & a special store (store conditional);
RISC-V calls them lr (load reserved) and sc (store conditional).
∗ ll (load linked): reads a value and watches the memory location.
∗ sc (store conditional): tries to store a new value:
· If the location has not changed since the ll → the store succeeds (the result
register gets 1 in MIPS, 0 in RISC-V)
· If the location has changed → the store fails (the result register gets 0 in MIPS,
a non-zero value in RISC-V)
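C11 exposes both styles of atomic primitive. In this minimal sketch, the spin lock uses `atomic_exchange`, an atomic swap. The increment uses a retry loop around `atomic_compare_exchange_weak`, which mirrors the ll/sc pattern and, like sc, may fail spuriously. On load-linked/store-conditional machines compilers typically implement such a compare-exchange with that instruction pair, though this is not guaranteed. The function names are hypothetical.

```c
#include <stdatomic.h>

/* Spin lock on an atomic swap: write 1 and get the old value back in one
   indivisible step. An old value of 0 means the lock was free. */
void spin_lock(atomic_int *lk) {
    while (atomic_exchange(lk, 1) != 0) {
        /* another task holds the lock; keep trying */
    }
}

void spin_unlock(atomic_int *lk) {
    atomic_store(lk, 0);
}

/* Increment in the ll/sc style: read, compute, then store only if the
   location is unchanged; otherwise retry with the freshly read value. */
int atomic_increment(atomic_int *counter) {
    int old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* on failure, old has been updated to the current value */
    }
    return old + 1;
}
```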

9 Translating and Starting a Program

9.1 Compiler

• The compiler transforms the C program into an assembly language program.

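• A translation hierarchy for C: C program → compiler → assembly language program →
assembler → object file (machine language module) → linker (together with library
routines) → executable (machine language program) → loader → memory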

9.2 Assembler

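• The assembler translates pseudoinstructions (instructions the hardware does not
implement directly) into one or more real machine instructions, giving common
patterns a simpler form (e.g., move $t0, $t1 becomes add $t0, $zero, $t1).

• For more complex translations, the assembler uses register $at (assembler temporary);
e.g., blt $t0, $t1, L becomes slt $at, $t0, $t1 followed by bne $at, $zero, L.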

• To produce a complete program, the assembler first converts your assembly code →
machine code and then creates an object file.

• The object file for UNIX systems typically contains six distinct pieces:

– Header: describes the contents of the object module


– Text segment: translated instructions
– Static data segment: data allocated for the life of the program
– Relocation info: for contents that depend on absolute location of loaded program
– Symbol table: global definitions and external refs
– Debug info: for associating with source code

9.3 Linker

• The linker combines all the independently compiled and assembled modules into one
complete executable program.
⇒ This avoids recompiling the entire program when only a small part changes (e.g.,
one function), which saves time and effort.

• The linker performs three main steps:

– Merges code and data segments from different modules.


– Resolves symbols by determining addresses of functions and variables.
– Patches location-dependent references (e.g., absolute addresses) and external
references (from other files/libraries).

• The linker uses:

– Relocation information to adjust memory addresses after placing modules.


– The symbol table to match function and variable names to their memory
locations.

• In older systems, a relocating loader was needed to fix memory addresses at load
time.

• But in modern systems with virtual memory, the program can be loaded into a fixed
virtual address space.
⇒ So the linker can do all address fixing ahead of time, and the loader just places the
program in memory to run.

9.4 Loader

• The loader is used to load the image file from disk into memory:

– Read header to determine segment sizes.


– Create virtual address space.
– Copy text and initialized data into memory.
∗ Or set page table entries so they can be faulted in.
– Set up arguments on stack.
– Initialize registers (including sp, fp, gp).
– Jump to startup routine
∗ Copies arguments to x10, ... and calls main
∗ When main returns, do exit syscall.

9.5 Dynamic Linking

• Unlike Static Linking, which happens before the program runs, Dynamic Linking occurs
during program execution (at runtime).

• To resolve library routines at runtime, both the program and the library store extra
information:

– Names of external functions (symbols)


– Where to find them (location/address)

⇒ This allows the program to:

– Pick up newer versions of libraries without recompiling


– Save memory by loading only the code that is actually used
– Reduce executable file size (libraries are shared)

• Lazy Linking (Lazy Procedure Linkage): a smart strategy to delay loading each
function until it’s really needed

– At program start: Each external function call points to a small "stub" or
placeholder (dummy routine).
– First time the function is called: The program jumps to the stub, which tells
the dynamic linker to find and load the real routine, then updates the pointer.
– Future calls: They go directly to the real routine — no more indirection.
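The stub mechanism can be imitated in C with a function pointer. In this hypothetical sketch, the pointer stands in for the table entry a real dynamic linker would patch, and `resolve_count` counts how often the slow lookup path runs. Real systems do this with linker-generated tables, not user code.

```c
/* Hypothetical "library routine" that the dynamic linker would locate. */
static int real_square(int x) { return x * x; }

static int resolve_count = 0;   /* how many times the slow path ran */
static int stub_square(int x);

/* Call sites always go through this pointer; at start it targets the stub. */
static int (*square_ptr)(int) = stub_square;

/* Stub: on the first call, "resolve" the real routine, patch the pointer,
   then forward the call. Later calls bypass the stub entirely. */
static int stub_square(int x) {
    resolve_count++;              /* stands in for the dynamic linker's lookup */
    square_ptr = real_square;     /* update the indirect pointer */
    return real_square(x);
}

int call_square(int x) { return square_ptr(x); }
```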

9.6 Starting a Java Program

• Unlike C, which is compiled ahead of time into machine code for one particular instruction
set, Java was designed to be portable and safe.

• Java source code is compiled into an intermediate format called Java bytecode,
which is executed by the Java Virtual Machine (JVM).

• To improve performance, Java adopted Just-In-Time (JIT) compilation: frequently
executed bytecode is translated into native machine code while the program runs.

10 A C Sort Example to Put It All Together

• Because short examples don’t show how a full program works, we look at two full C
functions and translate them into MIPS:

– One that swaps array elements


– One that sorts an array

• The C for statement has three parts: initialization, loop test and iteration increment.

10.1 The Procedure Swap

• C parameters: swap (int v[], int k)

• MIPS uses:

– $a0: base address of array v


– $a1: index k
– $t0: temporary variable temp (swap is a leaf procedure so there is no need to save
$t0)

⇒ Final Complete Procedure:

swap:
sll $t1, $a1, 2 # $t1 = k * 4
add $t1, $a0, $t1 # $t1 = address of v[k]
lw $t0, 0($t1) # temp = v[k]
lw $t2, 4($t1) # $t2 = v[k+1]
sw $t2, 0($t1) # v[k] = v[k+1]
sw $t0, 4($t1) # v[k+1] = temp
jr $ra # return
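The same byte-address arithmetic can be written in C. This sketch (hypothetical name `swap_bytes`) is equivalent to temp = v[k]; v[k] = v[k+1]; v[k+1] = temp. It spells out each step of the MIPS code, assuming 4-byte `int32_t` elements and k >= 0. `memcpy` is used so the word accesses are well defined in C.

```c
#include <stdint.h>
#include <string.h>

void swap_bytes(int32_t v[], int32_t k) {
    unsigned char *base = (unsigned char *)v;          /* $a0 = base of v */
    unsigned char *addr = base + ((uint32_t)k << 2);   /* $t1 = v + k * 4 */
    int32_t temp, next;
    memcpy(&temp, addr, 4);       /* lw $t0, 0($t1) */
    memcpy(&next, addr + 4, 4);   /* lw $t2, 4($t1) */
    memcpy(addr, &next, 4);       /* sw $t2, 0($t1) */
    memcpy(addr + 4, &temp, 4);   /* sw $t0, 4($t1) */
}
```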

10.2 The Procedure Sort

• The Sort Procedure in C


void sort(int v[], int n) {
    int i, j;
    for (i = 0; i < n; i++) {
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j--) {
            swap(v, j);
        }
    }
}

• This function sorts array v with n elements.

• It uses two loops: outer and inner
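A compilable version of the two procedures is useful for checking the assembly against. The `swap` body is written to match the MIPS swap shown earlier, and `sort` is the C code above. Note that `j >= 0` is tested before `v[j]` is read, which is why the MIPS inner loop checks Condition 1 first. Each outer pass swaps v[i] downward into place, so the algorithm behaves like an insertion sort built from swaps.

```c
/* Exchange v[k] and v[k+1]. */
void swap(int v[], int k) {
    int temp = v[k];
    v[k] = v[k + 1];
    v[k + 1] = temp;
}

/* Sort v[0..n-1] into ascending order. */
void sort(int v[], int n) {
    int i, j;
    for (i = 0; i < n; i += 1) {
        /* j >= 0 is checked first, so v[-1] is never read */
        for (j = i - 1; j >= 0 && v[j] > v[j + 1]; j -= 1) {
            swap(v, j);
        }
    }
}
```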

10.3 The Outer Loop

• Initialization:
move $s0, $zero # i = 0

• Loop Condition:

for1tst: slt $t0, $s0, $s3 # $t0 = (i < n)
beq $t0, $zero, exit1 # exit if i >= n

• Increment and Jump Back:

addi $s0, $s0, 1 # i += 1
j for1tst # go back to loop condition

10.4 Inner Loop

• Initialization:
addi $s1, $s0, -1 # j = i - 1

• Condition 1: j >= 0
for2tst: slti $t0, $s1, 0 # $t0 = 1 if j < 0
bne $t0, $zero, exit2 # exit if j < 0

• Condition 2: v[j] > v[j+1]

sll $t1, $s1, 2 # $t1 = j * 4
add $t2, $s2, $t1 # $t2 = address of v[j]
lw $t3, 0($t2) # $t3 = v[j]
lw $t4, 4($t2) # $t4 = v[j+1]
slt $t0, $t4, $t3 # $t0 = (v[j] > v[j+1])
beq $t0, $zero, exit2 # exit if v[j] <= v[j+1]

• Procedure Call:
move $a0, $s2 # pass v
move $a1, $s1 # pass j
jal swap # call swap

• Decrement and Jump:

addi $s1, $s1, -1 # j -= 1
j for2tst # go back to inner loop condition
exit2:

10.5 Preserving Registers

• Assembly procedures like sort must save and restore the saved registers they use (in the
RISC-V version, x19 to x22) and the return address (x1), so the caller's values are not altered.

• We use the stack to save and restore these values:

– Preserve saved registers (function prologue):

addi sp, sp, -40 // make room on the stack for 5 registers
sd x1, 32(sp) // save return address
sd x22, 24(sp) // save s6 (used in sort)
sd x21, 16(sp) // save s5 (used in sort)
sd x20, 8(sp) // save s4 (used in sort)
sd x19, 0(sp) // save s3 (used in sort)

– Restore saved registers (function epilogue):


exit1:
ld x19, 0(sp) // restore s3
ld x20, 8(sp) // restore s4
ld x21, 16(sp) // restore s5
ld x22, 24(sp) // restore s6
ld x1, 32(sp) // restore return address
addi sp, sp, 40 // restore stack pointer
jalr x0, 0(x1) // return from procedure

11 Arrays versus Pointers

11.1 Arrays vs. Pointers

• To access array[i], the computer must:

– Multiply i by the size of each element.


– Add that result to the starting address of the array.

⇒ This gives the exact memory address of array[i].

• Pointers store memory addresses directly.


⇒ Stepping a pointer through an array avoids recomputing index × element size for
every access, which often makes pointer code faster and simpler at the low level.

11.2 Comparison of Array vs. Pointers

• Array indexing is simple and safe, but may repeat address computations.

• Pointer arithmetic can be faster by avoiding repeated operations, but it must be used
carefully.

• Compilers can achieve the same effect as manual use of pointers:

– They replace multiplication with a cheaper shift → strength reduction


– They avoid repeated address calculations within loops → induction variable elimination
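The classic illustration is clearing an array both ways. In this minimal sketch (function names hypothetical), the array version indexes with i, while the pointer version advances p by one element on each iteration. An optimizing compiler will often generate similar code for both.

```c
/* Array version: the address of array[i] is base + i * sizeof(int). */
void clear_array(int array[], int size) {
    for (int i = 0; i < size; i += 1)
        array[i] = 0;
}

/* Pointer version: p itself walks through memory, one element per step. */
void clear_pointer(int *array, int size) {
    for (int *p = &array[0]; p < &array[size]; p = p + 1)
        *p = 0;
}
```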

12 MIPS Instructions

• MIPS is a commercial RISC architecture that predates RISC-V

• Both share similar basic set of instructions:

– 32-bit instructions
– 32 general purpose registers, register 0 is always 0
– 32 floating-point registers
– Memory accessed only by load/store instructions
– Use consistent addressing modes for all data types

• However, they have differences in branching:

– For <, ≤, >, ≥


– RISC-V: blt, bge, bltu, bgeu
– MIPS: slt, sltu (set less than; the result is 0 or 1)
∗ Then use beq, bne to complete the branch

13 The Intel x86 ISA Overview


We trace the evolution of Intel’s x86 processors through four key themes: performance
improvements, multimedia support, 64-bit transition, and SIMD/vector enhancements.

13.1 Performance and Architecture


– i486 (1989): Added pipelining, on-chip cache, and FPU.
– Pentium (1993): Introduced superscalar execution and 64-bit data path.
– Pentium Pro (1995): Major redesign with out-of-order execution.
– Pentium 4 (2001): Focused on high clock speed with new microarchitecture.

13.2 Multimedia and SIMD


– MMX (Pentium, 1996): First multimedia instructions.
– SSE (Pentium III, 1999): Streaming SIMD Extensions for parallel data.
– SSE2–SSE4 (2001–2006): Improved vector processing, added in Pentium 4 and
Intel Core.

13.3 64-bit Computing
– AMD64 (2003): Extended x86 to 64 bits.
– Intel EM64T (2004): Intel’s version with enhancements.
– Impact: More memory, registers, and precision.

13.4 Advanced Vector Extensions (AVX)


– AVX (2008): Longer SIMD registers, more instructions.
– Purpose: High-performance tasks like media, crypto, and science.

13.5 Operand Types


Most x86 instructions use two operands:

– Register ↔ Register
– Register ↔ Memory
– Register ↔ Immediate
– Memory ↔ Immediate

13.6 Addressing Modes


Common memory access modes:

– Register indirect: address in a register


– Base + offset: $[R_{base} + \text{displacement}]$
– Indexed: $[R_{base} + 2^{scale} \times R_{index}]$
– Full: $[R_{base} + 2^{scale} \times R_{index} + \text{displacement}]$

13.7 Instruction Encoding


– Variable-length format
– Prefixes: modify behavior (e.g., size, lock)
– Postbyte (following the opcode): specifies operands and addressing mode

13.8 Execution Model


– Complex instructions are translated into simpler micro-ops
– Simple: 1-to-1 mapping; Complex: 1-to-many
– Internally behaves like RISC for efficiency

14 Other RISC-V Instructions

14.1 RV64I Base Integer Instructions


– Core 64-bit instructions for arithmetic, logic, and control.
– auipc: adds imm << 12 to pc, used with jalr for jumps.
– slt, sltu, slti, sltiu: set-less-than (signed/unsigned).
– 32-bit operations: addw, subw, sllw, srlw, sraiw, etc.

14.2 RV32I Variant


– Same structure as RV64I but with 32-bit registers and data.

14.3 Standard Extensions


– M: Integer multiply/divide.
– A: Atomic operations.
– F, D: Single and double floating point.
– C: Compressed 16-bit instructions.

15 Fallacies & Pitfalls

15.1 Fallacies
– Complex instructions are always better: They may reduce the number of
instructions, but are harder to implement and can slow down even simple ones.
– Assembly is always faster: Compilers often produce faster and safer code,
especially on modern CPUs.
– Instruction sets evolve freely: Backward compatibility means new instructions
get added, not replaced — making ISAs more complex.

15.2 Pitfalls
– Word addresses increase by 1: In reality, word addresses increase by 4 bytes,
not 1.
– Using a pointer to a local variable: Once a function ends, its local variables
are gone — such pointers become invalid.
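The word-address pitfall is easy to check in C. Measuring pointer differences in bytes shows the 4-byte step between consecutive 32-bit words. This is a minimal sketch, and `word_address` is a hypothetical helper.

```c
#include <stddef.h>
#include <stdint.h>

/* In byte-addressed memory, consecutive 32-bit words are 4 bytes apart,
   so word i starts at base + 4 * i, not base + i. */
uintptr_t word_address(uintptr_t base, size_t i) {
    return base + 4u * i;
}
```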
