Lecture 04
x86-64 Programming – part 1
Euhyun Moon, Ph.D.
Machine Learning Systems (MLSys) Lab
Computer Science and Engineering
Sogang University
Slides adapted from Randy Bryant and Dave O'Hallaron: Introduction to Computer Systems, CMU
Sogang University CSE3030: Introduction to Computer Systems
COR1010: AI Programming
Architecture Sits at the Hardware Interface
• Source code: different applications or algorithms
  • C language: Program A, Program B, your program
• Compiler: performs optimizations, generates instructions
  • GCC, Clang
• Architecture: instruction set
  • x86-64
  • ARMv8 (AArch64/A64)
• Hardware: different implementations
  • x86-64: Intel Pentium 4, Intel Core 2, Intel Core i7, AMD Opteron, AMD Athlon
  • ARMv8: ARM Cortex-A53, Apple A7
Definitions
• Architecture (ISA): The parts of a processor design that
one needs to understand to write assembly code
• “What is directly visible to software”
• The “contract” or “blueprint” between hardware and software
• Microarchitecture: Implementation of the architecture
Instruction Set Architectures
• The ISA defines:
• The system’s state (e.g., registers, memory, program counter)
• The instructions the CPU can execute
• The effect that each of these instructions will have on the system state
(Figure: a CPU containing the PC and registers, connected to memory)
Instruction Set Philosophies
• Complex Instruction Set Computing (CISC):
Add more and more elaborate and specialized instructions as
needed
• Lots of tools for programmers to use, but hardware must be able to
handle all instructions
• x86-64 is CISC, but only a small subset of its instructions is encountered
in Linux programs
• Reduced Instruction Set Computing (RISC):
Keep instruction set small and regular
• Easier to build fast hardware
• Let software do the complicated operations by composing simpler ones
General ISA Design Decisions
• Instructions
• What instructions are available? What do they do?
• How are they encoded?
• Registers
• How many registers are there?
• How wide are they?
• Memory
• How do you specify a memory location?
Mainstream ISAs
• x86-64 Instruction Set: Macbooks & PCs (Core i3, i5, i7, M)
• ARM Instruction Set: Smartphone-like devices (iPhone, iPad, Raspberry Pi)
• RISC-V Instruction Set: Mostly research (some traction in embedded)
Writing Assembly Code? In 2025?
• Chances are, you’ll never write a program in assembly, but
understanding assembly is the key to the machine-level
execution model:
• Behavior of programs in the presence of bugs
• When high-level language model breaks down
• Tuning program performance
• Understand optimizations done/not done by the compiler
• Understanding sources of program inefficiency
• Implementing systems software
• What are the “states” of processes that the OS must manage?
• Using special units (timers, I/O co-processors, etc.) inside processor!
• Fighting malicious software
• Distributed software is in binary form
Assembly Programmer’s View
(Figure: the CPU, holding the PC, registers, and condition codes, exchanges addresses, data, and instructions with memory, which holds code, data, and the stack)
• Programmer-visible state
  • PC: the Program Counter (%rip in x86-64)
    • Address of next instruction
  • Named registers
    • Together in “register file”
    • Heavily used program data
  • Condition codes
    • Store status information about most recent arithmetic operation
    • Used for conditional branching
• Memory
  • Byte-addressable array
  • Code and user data
  • Includes the Stack (for supporting procedures)
x86-64 Assembly “Data Types”
• Integral data of 1, 2, 4, or 8 bytes
• Data values
• Addresses
• Floating point data of 4, 8, 10 or 2x8 or 4x4 or 8x2 bytes (not covered in CSE3030)
• Different registers for those (e.g., %xmm1, %ymm2)
• Come from extensions to x86 (SSE, AVX, …)
• No aggregate types such as arrays or structures
• Just contiguously allocated bytes in memory
• Two common syntaxes
• “AT&T”: used by our course, slides, textbook, gnu tools, …
• “Intel”: used by Intel documentation, Intel tools, …
• Must know which you’re reading
What is a Register?
• A location in the CPU that stores a small amount of data,
which can be accessed very quickly (once every clock cycle)
• Registers have names, not addresses
• In assembly, they start with % (e.g., %rsi)
• Registers are at the heart of assembly programming
• They are a precious commodity in all architectures, but especially
x86
x86-64 Integer Registers – 64 bits wide
64-bit   low 32 bits     64-bit   low 32 bits
%rax     %eax            %r8      %r8d
%rbx     %ebx            %r9      %r9d
%rcx     %ecx            %r10     %r10d
%rdx     %edx            %r11     %r11d
%rsi     %esi            %r12     %r12d
%rdi     %edi            %r13     %r13d
%rsp     %esp            %r14     %r14d
%rbp     %ebp            %r15     %r15d
• Can reference low-order 4 bytes (also low-order 2 & 1 bytes)
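The sub-register names can be thought of as narrower views of the same 64-bit value. A minimal C sketch of that idea, using truncating casts; the helper names `low4`, `low2`, and `low1` are hypothetical and only illustrate which bytes each view exposes:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: each returns the low-order part of a 64-bit value,
   mirroring what a narrower register name exposes. */
uint32_t low4(uint64_t r) { return (uint32_t)r; } /* like %eax within %rax */
uint16_t low2(uint64_t r) { return (uint16_t)r; } /* like %ax within %rax  */
uint8_t  low1(uint64_t r) { return (uint8_t)r; }  /* like %al within %rax  */

/* Self-check with an arbitrary example value. */
void check_low_views(void) {
    uint64_t rax = 0x1122334455667788ULL;
    assert(low4(rax) == 0x55667788u);
    assert(low2(rax) == 0x7788u);
    assert(low1(rax) == 0x88u);
}
```

The casts only model reading the narrow view; what happens to the upper bytes when a narrow register is written is a separate topic not shown here.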
Some History: IA32 Registers – 32 bits wide
32-bit   16-bit   high/low byte   Name origin (mostly obsolete)
%eax     %ax      %ah, %al        accumulate
%ecx     %cx      %ch, %cl        counter
%edx     %dx      %dh, %dl        data
%ebx     %bx      %bh, %bl        base
%esi     %si                      source index
%edi     %di                      destination index
%esp     %sp                      stack pointer
%ebp     %bp                      base pointer
• %eax through %edi are general purpose
• The 16-bit names are virtual registers (backwards compatibility)
Memory vs. Register
• Addresses vs. Names
  • Memory: 0x7FFFD024C3DC; registers: %rdi
• Big vs. Small
  • Memory: ~8 GiB; registers: (16 x 8 B) = 128 B
• Slow vs. Fast
  • Memory: ~50-100 ns; registers: sub-nanosecond timescale
• Dynamic vs. Static
  • Memory: can “grow” as needed while program runs; registers: fixed number in hardware
Three Basic Kinds of Instructions
1) Transfer data between memory and register
• Load data from memory into register
• %reg = Mem[address]
• Store register data into memory
• Mem[address] = %reg
• Remember: Memory is indexed just like an array of bytes!
2) Perform arithmetic operation on register or memory data
• c = a + b; z = x << y; i = h & g;
3) Control flow: what instruction to execute next
• Unconditional jumps to/from procedures
• Conditional branches
Instruction Sizes
• Size specifiers
• b = 1-byte “byte”
• w = 2-byte “word”
• l = 4-byte “long word”
• q = 8-byte “quad word”
• Note that due to backwards-compatible support for 8086
programs (16-bit machines!), “word” means 16 bits = 2
bytes in x86 instruction names
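The size specifiers line up with operand sizes in bytes. A small C sketch of that mapping follows; `size_suffix` is a hypothetical helper, not part of any toolchain. On x86-64 Linux (an assumption about the platform), `char`, `short`, `int`, and `long` have sizes 1, 2, 4, and 8, so they correspond to b, w, l, and q.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the AT&T size suffix for an operand of
   nbytes bytes, or '?' if there is no matching integer suffix. */
char size_suffix(size_t nbytes) {
    switch (nbytes) {
    case 1: return 'b';  /* byte      */
    case 2: return 'w';  /* word      */
    case 4: return 'l';  /* long word */
    case 8: return 'q';  /* quad word */
    default: return '?';
    }
}

/* Self-check using fixed-width types so the sizes do not depend on the ABI. */
void check_size_suffix(void) {
    assert(size_suffix(sizeof(int8_t))  == 'b');
    assert(size_suffix(sizeof(int16_t)) == 'w');
    assert(size_suffix(sizeof(int32_t)) == 'l');
    assert(size_suffix(sizeof(int64_t)) == 'q');
}
```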
Operand Types
• Immediate: Constant integer data ($)
  • Examples: $0x400, $-533
  • Like C literal, but prefixed with ‘$’
  • Encoded with 1, 2, 4, or 8 bytes depending on the instruction
• Register: 1 of 16 integer registers (%)
  • Examples: %rax, %r13
  • But %rsp reserved for special use
  • Others have special uses for particular instructions
• Memory: Consecutive bytes of memory at a computed address (written with parentheses)
  • Simplest example: (%rax)
  • Various other “address modes”
(Figure: the integer registers %rax, %rcx, %rdx, %rbx, %rsi, %rdi, %rsp, %rbp, %rN)
x86-64 Introduction
• Data transfer instruction (mov)
• Arithmetic operations
• Memory addressing modes
• swap example
Moving Data
• General form: mov_ source, destination
• Missing letter (_) specifies size of operands
• Note that due to backwards-compatible support for 8086 programs
(16-bit machines!), “word” means 16 bits = 2 bytes in x86 instruction
names
• Lots of these in typical code
• movb src, dst: Move 1-byte “byte”
• movw src, dst: Move 2-byte “word”
• movl src, dst: Move 4-byte “long word”
• movq src, dst: Move 8-byte “quad word”
Operand Combinations
Source   Dest   Src, Dest              C Analog
Imm      Reg    movq $0x4, %rax        var_a = 0x4;
Imm      Mem    movq $-147, (%rax)     *p_a = -147;
Reg      Reg    movq %rax, %rdx        var_d = var_a;
Reg      Mem    movq %rax, (%rdx)      *p_d = var_a;
Mem      Reg    movq (%rax), %rdx      var_d = *p_a;
• Cannot do memory-memory transfer with a single instruction
• How would you do it?
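One way to answer the question is to go through a register: load from the source address, then store to the destination address. The C sketch below mirrors that two-step pattern. `copy_long` is a hypothetical name, and the instructions in the comments are what a compiler might plausibly emit, not guaranteed output.

```c
#include <assert.h>

/* Copies one 8-byte value from *src to *dst using a temporary,
   which plays the role of a register. */
void copy_long(long *dst, const long *src) {
    long tmp = *src;  /* e.g. movq (%rsi), %rax   memory -> register */
    *dst = tmp;       /* e.g. movq %rax, (%rdi)   register -> memory */
}

/* Self-check with arbitrary values. */
void check_copy_long(void) {
    long a = 1, b = 2;
    copy_long(&a, &b);
    assert(a == 2 && b == 2);
}
```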
Some Arithmetic Operations
• Binary (two-operand) Instructions:
  • Maximum of one memory operand
  • Beware argument order!
  • No distinction between signed and unsigned
    • Only arithmetic vs. logical shifts
Format            Computation
addq src, dst     dst = dst + src    (dst += src)
subq src, dst     dst = dst - src
imulq src, dst    dst = dst * src    signed mult
sarq src, dst     dst = dst >> src   Arithmetic
shrq src, dst     dst = dst >> src   Logical
shlq src, dst     dst = dst << src   (same as salq)
xorq src, dst     dst = dst ^ src
andq src, dst     dst = dst & src
orq src, dst      dst = dst | src
• The trailing q in each instruction name is the operand size specifier
• How do you implement “rcx = rax + rbx”?
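Because each binary instruction overwrites its destination, a three-operand computation such as rcx = rax + rbx can be done as a copy followed by an add. The sketch below models that in C, and also contrasts arithmetic and logical right shifts. All function names are hypothetical. The signed right shift of a negative value is implementation-defined in C; the comment assumes GCC or Clang on x86-64, which use an arithmetic shift.

```c
#include <assert.h>
#include <stdint.h>

/* Models rcx = rax + rbx with two-operand instructions. */
int64_t add_into_third(int64_t rax, int64_t rbx) {
    int64_t rcx = rax;  /* movq %rax, %rcx */
    rcx += rbx;         /* addq %rbx, %rcx */
    return rcx;
}

/* Signed shift: typically compiled to sarq (sign bit is replicated). */
int64_t shift_arith(int64_t x, int n) { return x >> n; }

/* Unsigned shift: compiled to shrq (zeros are shifted in). */
uint64_t shift_logical(uint64_t x, int n) { return x >> n; }

void check_binary_ops(void) {
    assert(add_into_third(2, 3) == 5);
    assert(shift_arith(-16, 2) == -4);  /* assumes arithmetic shift */
    assert(shift_logical((uint64_t)-16, 2) == 0x3FFFFFFFFFFFFFFCULL);
}
```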
Some Arithmetic Operations
• Unary (one-operand) Instructions:
Format Computation
incq dst     dst = dst + 1    increment
decq dst     dst = dst - 1    decrement
negq dst     dst = -dst       negate
notq dst     dst = ~dst       bitwise complement
• See CS:APP3e textbook Section 3.5.5 for more instructions:
mulq, cqto, idivq, divq
Arithmetic Example
Register   Use(s)
%rdi       1st argument (x)
%rsi       2nd argument (y)
%rax       return value

long simple_arith(long x, long y)
{
  long t1 = x + y;
  long t2 = t1 * 3;
  return t2;
}

What the compiled code effectively computes:
  y += x;
  y *= 3;
  long r = y;
  return r;

simple_arith:
  addq %rdi, %rsi
  imulq $3, %rsi
  movq %rsi, %rax
  ret
Example of Basic Addressing Modes
void swap(long *xp, long *yp)
{
long t0 = *xp;
long t1 = *yp;
*xp = t1;
*yp = t0;
}
Compiler Explorer: https://godbolt.org/z/zc4Pcq
swap:
movq (%rdi), %rax
movq (%rsi), %rdx
movq %rdx, (%rdi)
movq %rax, (%rsi)
ret
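To see the function in action, here is a minimal sketch that calls swap on two local variables. The driver `swap_demo` is hypothetical; the values 123 and 456 are the ones used in the memory diagrams of this lecture.

```c
#include <assert.h>

void swap(long *xp, long *yp) {
    long t0 = *xp;  /* movq (%rdi), %rax */
    long t1 = *yp;  /* movq (%rsi), %rdx */
    *xp = t1;       /* movq %rdx, (%rdi) */
    *yp = t0;       /* movq %rax, (%rsi) */
}

/* Hypothetical driver: swaps 123 and 456 and reports the results. */
void swap_demo(long out[2]) {
    long x = 123, y = 456;
    swap(&x, &y);
    out[0] = x;
    out[1] = y;
}
```

Note that the caller passes addresses (`&x`, `&y`), which is why the assembly dereferences %rdi and %rsi rather than using them as values.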
Understanding swap()
void swap(long *xp, long *yp)
{
  long t0 = *xp;
  long t1 = *yp;
  *xp = t1;
  *yp = t0;
}

Register      Variable
%rdi     ⇔    xp
%rsi     ⇔    yp
%rax     ⇔    t0
%rdx     ⇔    t1

swap:
  movq (%rdi), %rax
  movq (%rsi), %rdx
  movq %rdx, (%rdi)
  movq %rax, (%rsi)
  ret
Understanding swap()
• On entry: %rdi = 0x120 (xp), %rsi = 0x100 (yp)
• Memory words: Mem[0x120] = 123, Mem[0x100] = 456 (the words at 0x118, 0x110, and 0x108 are not used)
• State after each instruction:

Instruction            Effect       %rax   %rdx   Mem[0x120]   Mem[0x100]
(on entry)                                        123          456
movq (%rdi), %rax      t0 = *xp     123           123          456
movq (%rsi), %rdx      t1 = *yp     123    456    123          456
movq %rdx, (%rdi)      *xp = t1     123    456    456          456
movq %rax, (%rsi)      *yp = t0     123    456    456          123
ret
Memory Addressing Modes: Basic
• Indirect: (R) Mem[Reg[R]]
• Data in register R specifies the memory address
• Like pointer dereference in C
• Example: movq (%rcx), %rax
• Displacement: D(R) Mem[Reg[R]+D]
• Data in register R specifies the start of some memory region
• Constant displacement D specifies the offset from that address
• Example: movq 8(%rbp), %rdx
Complete Memory Addressing Modes
• General:
• D(Rb,Ri,S) Mem[Reg[Rb]+Reg[Ri]*S+D]
• Rb: Base register (any register)
• Ri: Index register (any register except %rsp)
• S: Scale factor (1, 2, 4, 8) – why these numbers?
• D: Constant displacement value (a.k.a. immediate)
• Special cases (see CS:APP3e Figure 3.3 on p.181)
• D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D] (S=1)
• (Rb,Ri,S) Mem[Reg[Rb]+Reg[Ri]*S] (D=0)
• (Rb,Ri) Mem[Reg[Rb]+Reg[Ri]] (S=1,D=0)
• (,Ri,S) Mem[Reg[Ri]*S] (Rb=0,D=0)
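The scaled-index form maps naturally onto array indexing: the base register holds the array's starting address, the index register holds i, and the scale factor is the element size. The sketch below makes that correspondence explicit in C. The function names are hypothetical, and the instructions in the comments are plausible compiler output rather than guaranteed.

```c
#include <assert.h>
#include <stdint.h>

/* a[i] for 8-byte elements: e.g. movq (%rdi,%rsi,8), %rax */
int64_t get_quad(const int64_t *a, long i) { return a[i]; }

/* a[i] for 4-byte elements: e.g. movl (%rdi,%rsi,4), %eax */
int32_t get_long_word(const int32_t *a, long i) { return a[i]; }

/* The address of a[i] is base + i * sizeof(element), computed by hand here. */
const char *elem_address(const int64_t *a, long i) {
    return (const char *)a + i * (long)sizeof(int64_t);
}

void check_indexing(void) {
    int64_t q[3] = {10, 20, 30};
    int32_t w[3] = {1, 2, 3};
    assert(get_quad(q, 2) == 30);
    assert(get_long_word(w, 1) == 2);
    assert(elem_address(q, 2) == (const char *)&q[2]);
}
```

This also suggests an answer to "why these numbers?": 1, 2, 4, and 8 are the sizes in bytes of the integral data types.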
Address Computation Examples
Given: %rdx = 0xf000, %rcx = 0x0100
D(Rb,Ri,S) → Mem[Reg[Rb]+Reg[Ri]*S+D]
Expression        | Address Computation | Address
0x8(%rdx)         |                     |
(%rdx,%rcx)       |                     |
(%rdx,%rcx,4)     |                     |
0x80(,%rdx,2)     |                     |
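The four expressions can be checked by evaluating Reg[Rb] + Reg[Ri]*S + D directly. The sketch below does this in C with %rdx = 0xf000 and %rcx = 0x0100; `effective_address` is a hypothetical helper, and an omitted base or index is passed as 0 with an omitted scale passed as 1.

```c
#include <assert.h>
#include <stdint.h>

/* Computes base + index * scale + disp, the address named by D(Rb,Ri,S). */
uint64_t effective_address(uint64_t base, uint64_t index,
                           uint64_t scale, int64_t disp) {
    return base + index * scale + (uint64_t)disp;
}

void check_address_examples(void) {
    const uint64_t rdx = 0xf000, rcx = 0x0100;
    /* 0x8(%rdx):       0xf000 + 0x8      */
    assert(effective_address(rdx, 0, 1, 0x8) == 0xf008);
    /* (%rdx,%rcx):     0xf000 + 0x100    */
    assert(effective_address(rdx, rcx, 1, 0) == 0xf100);
    /* (%rdx,%rcx,4):   0xf000 + 4*0x100  */
    assert(effective_address(rdx, rcx, 4, 0) == 0xf400);
    /* 0x80(,%rdx,2):   2*0xf000 + 0x80   */
    assert(effective_address(0, rdx, 2, 0x80) == 0x1e080);
}
```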
Address Computation Instruction
• leaq src, dst
• "lea" stands for load effective address
• src is address expression (any of the formats we’ve seen)
• dst is a register
• Sets dst to the address computed by the src expression
(does not go to memory! – it just does math)
• Example: leaq (%rdx,%rcx,4), %rax
• Uses:
• Computing addresses without a memory reference
• e.g., translation of p = &x[i];
• Computing arithmetic expressions of the form x+k*i+d
• Though k can only be 1, 2, 4, or 8
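Both uses can be illustrated with small C functions whose bodies fit the x+k*i+d pattern. The names below are hypothetical, and the leaq instructions in the comments are what an optimizing compiler might choose, stated as an assumption rather than guaranteed output.

```c
#include <assert.h>

/* Arithmetic of the form x + k*i + d with k = 4, d = 7.
   A compiler may emit: leaq 7(%rdi,%rsi,4), %rax */
long scaled_sum(long x, long i) { return x + 4 * i + 7; }

/* Address computation without a memory access: p = &x[i].
   A compiler may emit: leaq (%rdi,%rsi,8), %rax */
long *addr_of_elem(long *x, long i) { return &x[i]; }

void check_lea_uses(void) {
    long arr[4] = {0, 0, 0, 0};
    assert(scaled_sum(10, 3) == 29);
    assert(addr_of_elem(arr, 2) == arr + 2);
}
```

In neither function is memory read: `addr_of_elem` only produces a pointer, just as leaq only produces an address.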
Example: lea vs. mov
Registers (before): %rcx = 0x4, %rdx = 0x100; %rax, %rbx, %rdi, %rsi are to be filled in
Memory (word address: value):
  0x120: 0x400
  0x118: 0xF
  0x110: 0x8
  0x108: 0x10
  0x100: 0x1
Instructions:
  leaq (%rdx,%rcx,4), %rax
  movq (%rdx,%rcx,4), %rbx
  leaq (%rdx), %rdi
  movq (%rdx), %rsi
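The difference between the two instructions can be simulated in C: lea returns the computed address itself, while mov uses that address to read memory. In the sketch below, `mem_read`, `lea`, and `mov_load` are hypothetical helpers, and the memory contents are the five words shown in the diagram.

```c
#include <assert.h>
#include <stdint.h>

/* The memory words from the slide; other addresses read as 0 here. */
uint64_t mem_read(uint64_t addr) {
    switch (addr) {
    case 0x120: return 0x400;
    case 0x118: return 0xF;
    case 0x110: return 0x8;
    case 0x108: return 0x10;
    case 0x100: return 0x1;
    default:    return 0;
    }
}

/* lea: just the address arithmetic, no memory access. */
uint64_t lea(uint64_t base, uint64_t index, uint64_t scale) {
    return base + index * scale;
}

/* mov with a memory source: compute the address, then load from it. */
uint64_t mov_load(uint64_t base, uint64_t index, uint64_t scale) {
    return mem_read(lea(base, index, scale));
}

void check_lea_vs_mov(void) {
    const uint64_t rcx = 0x4, rdx = 0x100;
    assert(lea(rdx, rcx, 4) == 0x110);     /* leaq (%rdx,%rcx,4), %rax */
    assert(mov_load(rdx, rcx, 4) == 0x8);  /* movq (%rdx,%rcx,4), %rbx */
    assert(lea(rdx, 0, 1) == 0x100);       /* leaq (%rdx), %rdi        */
    assert(mov_load(rdx, 0, 1) == 0x1);    /* movq (%rdx), %rsi        */
}
```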
lea: “It just does math”
Arithmetic Example
Register   Use(s)
%rdi       1st argument (x)
%rsi       2nd argument (y)
%rdx       3rd argument (z)

long arith(long x, long y, long z)
{
  long t1 = x + y;
  long t2 = z + t1;
  long t3 = x + 4;
  long t4 = y * 48;
  long t5 = t3 + t4;
  long rval = t2 * t5;
  return rval;
}

arith:
  leaq (%rdi,%rsi), %rax
  addq %rdx, %rax
  leaq (%rsi,%rsi,2), %rdx
  salq $4, %rdx
  leaq 4(%rdi,%rdx), %rcx
  imulq %rcx, %rax
  ret

• Interesting Instructions
  • leaq: “address” computation
  • salq: shift
  • imulq: multiplication
    • Only used once!
Arithmetic Example
Register   Use(s)
%rdi       x
%rsi       y
%rdx       z, t4
%rax       t1, t2, rval
%rcx       t5

long arith(long x, long y, long z)
{
  long t1 = x + y;
  long t2 = z + t1;
  long t3 = x + 4;
  long t4 = y * 48;
  long t5 = t3 + t4;
  long rval = t2 * t5;
  return rval;
}

arith:
  leaq (%rdi,%rsi), %rax     # rax/t1 = x + y
  addq %rdx, %rax            # rax/t2 = t1 + z
  leaq (%rsi,%rsi,2), %rdx   # rdx = 3 * y
  salq $4, %rdx              # rdx/t4 = (3*y) * 16
  leaq 4(%rdi,%rdx), %rcx    # rcx/t5 = x + t4 + 4
  imulq %rcx, %rax           # rax/rval = t5 * t2
  ret
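One way to convince yourself that the assembly computes the same result is to transcribe it into C one instruction at a time and compare against the original function. In the sketch below, `arith_steps` is a hypothetical transcription; the shift is written as a multiplication by 16 because left-shifting a negative signed value is undefined in C.

```c
#include <assert.h>

long arith(long x, long y, long z) {
    long t1 = x + y;
    long t2 = z + t1;
    long t3 = x + 4;
    long t4 = y * 48;
    long t5 = t3 + t4;
    long rval = t2 * t5;
    return rval;
}

/* Instruction-by-instruction transcription of the assembly. */
long arith_steps(long rdi, long rsi, long rdx) {
    long rax = rdi + rsi;      /* leaq (%rdi,%rsi), %rax    t1           */
    rax += rdx;                /* addq %rdx, %rax           t2           */
    rdx = rsi + rsi * 2;       /* leaq (%rsi,%rsi,2), %rdx  3*y          */
    rdx *= 16;                 /* salq $4, %rdx             t4 = 48*y    */
    long rcx = rdi + rdx + 4;  /* leaq 4(%rdi,%rdx), %rcx   t5           */
    rax *= rcx;                /* imulq %rcx, %rax          rval         */
    return rax;
}

void check_arith(void) {
    /* x=1, y=2, z=3: t2 = 6, t5 = 5 + 96 = 101, rval = 606 */
    assert(arith(1, 2, 3) == 606);
    assert(arith_steps(1, 2, 3) == 606);
}
```

The transcription shows how y * 48 is split into a leaq (times 3) and a shift (times 16), and how t3 never gets its own register because x + 4 is folded into the last leaq.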
Control Flow
Register   Use(s)
%rdi       1st argument (x)
%rsi       2nd argument (y)
%rax       return value

long max(long x, long y)
{
  long max;
  if (x > y) {
    max = x;
  } else {
    max = y;
  }
  return max;
}

max:
  ???
  movq %rdi, %rax
  ???
  ???
  movq %rsi, %rax
  ???
  ret
Control Flow
Register   Use(s)
%rdi       1st argument (x)
%rsi       2nd argument (y)
%rax       return value

long max(long x, long y)
{
  long max;
  if (x > y) {
    max = x;
  } else {
    max = y;
  }
  return max;
}

max:
  if x <= y then jump to else    # conditional jump
  movq %rdi, %rax
  jump to done                   # unconditional jump
else:
  movq %rsi, %rax
done:
  ret
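The jump structure above can be written in C with goto, which keeps the same shape as the assembly skeleton: a conditional jump over the "then" code and an unconditional jump over the "else" code. This is a sketch for illustration; `max_goto` and its label names are hypothetical.

```c
#include <assert.h>

long max_goto(long x, long y) {
    long result;
    if (x <= y) goto else_branch;  /* conditional jump: skip the "then" part */
    result = x;                    /* movq %rdi, %rax */
    goto done;                     /* unconditional jump: skip the "else" part */
else_branch:
    result = y;                    /* movq %rsi, %rax */
done:
    return result;                 /* ret */
}

void check_max_goto(void) {
    assert(max_goto(3, 5) == 5);
    assert(max_goto(5, 3) == 5);
    assert(max_goto(4, 4) == 4);
}
```

Note that the test is inverted (x <= y rather than x > y) because the jump is taken when the "then" branch should be skipped.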
Conditionals and Control Flow
• Conditional branch/jump
• Jump to somewhere else if some condition is true,
otherwise execute next instruction
• Unconditional branch/jump
• Always jump when you get to this instruction
• Together, they can implement most control flow constructs in
high-level languages:
• if (condition) then {…} else {…}
• while (condition) {…}
• do {…} while (condition)
• for (initialization; condition; iterative) {…}
• switch {…}
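As an illustration of how a loop reduces to the two kinds of jumps, the sketch below writes the same while loop twice: once in structured form and once using only a conditional and an unconditional goto. The function and label names are hypothetical, and the goto version is one possible arrangement, not the only one a compiler could choose.

```c
#include <assert.h>

/* Structured version: sum of 1..n. */
long sum_to(long n) {
    long sum = 0, i = 1;
    while (i <= n) {
        sum += i;
        i++;
    }
    return sum;
}

/* Same loop using only conditional and unconditional jumps. */
long sum_to_goto(long n) {
    long sum = 0, i = 1;
loop_test:
    if (i > n) goto loop_done;  /* conditional jump out of the loop */
    sum += i;
    i++;
    goto loop_test;             /* unconditional jump back to the test */
loop_done:
    return sum;
}

void check_loops(void) {
    assert(sum_to(10) == 55);
    assert(sum_to_goto(10) == 55);
    assert(sum_to_goto(0) == 0);
}
```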
Summary
• x86-64 is a complex instruction set computing (CISC)
architecture
• There are 3 types of operands in x86-64
• Immediate, Register, Memory
• There are 3 types of instructions in x86-64
• Data transfer, Arithmetic, Control Flow
• Memory Addressing Modes: The addresses used for
accessing memory in mov (and other) instructions can be
computed in several different ways
• Base register, index register, scale factor, and displacement map
well to pointer arithmetic operations
• Control flow in x86 is determined by Condition Codes