7.x86 - Architecture
Abhijit A M
https://jerry153fish.github.io/2016/01/01/8086-Registers.html
8086 Registers
16 bit ought to be enough ! (?)
General purpose registers
four 16-bit data registers: AX, BX, CX, DX
each in two 8-bit halves, e.g. AH and AL
Registers for memory addressing
SP, BP, SI, DI : 16 bit
SP Stack Pointer, BP Base Pointer, SI Source Index, DI Destination Index
IP Instruction Pointer
Addressable memory: 2^16 bytes = 64 KB
8086 address extension
8086 has 20-bit physical addresses ==> 1 MB address space
the extra four bits usually come from a 16-bit "segment register":
CS - code segment, for fetches via IP
SS - stack segment, for load/store via SP and BP
DS - data segment, for load/store via other registers
ES - another data segment, destination for string operations
%cs:%ip full address: %cs * 16 + %ip is the actual address that goes on the bus
virtual to physical translation is pa = va + seg*16
e.g. set CS = 4096 to execute starting at 65536
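The translation rule pa = va + seg*16 can be tried out in a few lines of C. This is a minimal sketch: the function name seg_to_phys is made up for illustration, and the 20-bit mask models the 20 address lines of the 8086.

```c
#include <stdint.h>

/* Real-mode 8086 address formation: physical = segment * 16 + offset.
 * The sum is truncated to 20 bits because the 8086 has 20 address lines. */
uint32_t seg_to_phys(uint16_t seg, uint16_t off)
{
    return (((uint32_t)seg << 4) + off) & 0xFFFFFu;
}
```

With CS = 4096 and IP = 0, seg_to_phys(4096, 0) gives 65536, which matches the example on the slide.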
8086 address extension
Extending the ‘address space’ with the help of an MMU
FLAGS register
FLAGS - various condition codes: whether the last arithmetic operation
overflowed
was positive/negative
was [not] zero
produced a carry/borrow on add/subtract
etc.
whether interrupts are enabled
direction of data copy instructions
Used by conditional jumps: J[N]S, J[N]Z, J[N]C, J[N]O ...
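How the condition codes follow from an addition can be sketched in C. The struct and function names below are illustrative only, and the sketch models just the carry, zero, sign and overflow flags of a 32-bit ADD.

```c
#include <stdbool.h>
#include <stdint.h>

/* The four condition codes this sketch models. */
struct flags {
    bool cf;  /* carry out of bit 31 (unsigned overflow) */
    bool zf;  /* result is zero */
    bool sf;  /* result is negative (bit 31 set) */
    bool of;  /* signed overflow */
};

/* Compute the flags a 32-bit ADD of a and b would produce. */
struct flags add_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;   /* wraps modulo 2^32, like the hardware */
    struct flags f;

    f.cf = r < a;
    f.zf = (r == 0);
    f.sf = (r >> 31) & 1;
    /* Signed overflow: operands have the same sign, result has the other. */
    f.of = ((~(a ^ b) & (a ^ r)) >> 31) & 1;
    return f;
}
```

For example, 0x7FFFFFFF + 1 sets the sign and overflow flags but not carry, so JO and JS would jump while JC would not.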
32 bit 80386
boots in 16-bit mode, then on running particular instructions switches to 32-bit mode (protected mode)
registers are 32 bits wide, called EAX rather than AX
EAX EBX ECX EDX
ESP EBP ESI EDI
operands and addresses 32-bit in 32-bit mode,
e.g. ADD does 32-bit arithmetic
Segment registers are 16 bit: CS, SS, DS, etc.
32 bit 80386
Still possible to access 16-bit registers using AX or BX
Specific encoding of machine instructions tells whether operands are 16 or 32 bit
prefixes 0x66/0x67 toggle between 16-bit and 32-bit operands/addresses respectively
in 32-bit mode, MOVW is expressed as 0x66 MOVW
the .code32 in bootasm.S tells the assembler to generate 0x66 for e.g. MOVW
80386 also changed segments and added paged memory...
Summary of registers in 80386
(32 bit)
General registers
32 bits : EAX EBX ECX EDX
16 bits : AX BX CX DX
8 bits : AH AL BH BL CH CL DH DL
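The 8-bit and 16-bit names are views onto the same 32-bit register. A minimal sketch with shifts and masks shows the relationship. The helper names are invented for this example.

```c
#include <stdint.h>

/* AX is the low 16 bits of EAX; AL and AH are the low and high bytes of AX. */
uint16_t reg_ax(uint32_t eax) { return (uint16_t)(eax & 0xFFFFu); }
uint8_t  reg_al(uint32_t eax) { return (uint8_t)(eax & 0xFFu); }
uint8_t  reg_ah(uint32_t eax) { return (uint8_t)((eax >> 8) & 0xFFu); }

/* Writing AL replaces only the low byte; the upper 24 bits of EAX are kept. */
uint32_t write_al(uint32_t eax, uint8_t val)
{
    return (eax & 0xFFFFFF00u) | val;
}
```

With EAX = 0x12345678, AX is 0x5678, AH is 0x56 and AL is 0x78.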
Summary of registers in 80386
Segment Registers
CS : Holds the Code segment in which your program runs.
DS : Holds the Data segment that your program accesses.
ES, FS, GS : These are extra segment registers
SS : Holds the Stack segment your program uses.
All 16 bit
Summary of registers in 80386
Pairs of Indexes & pointers (Segment & Registers)
For a typical register, the corresponding segment is used.
CS:EIP
Code Segment: Instruction Pointer
E.g. mov $32, %eax ==> The code of the move instruction is fetched using CS:EIP
SS:ESP
Stack Segment: ESP
E.g. push $32 ==> The $32 will be pushed on the stack, using the SS:ESP address
SS:EBP
Stack Segment: EBP ==> mov (%ebp), %eax ==> for accessing (%ebp), the SS:EBP address will be used
DS:ESI, ES:EDI
ESI: Extended Source Index, EDI: Extended Destination Index
The EFLAGS register for flags
X86 Assembly Code
Syntax
Intel syntax: op dst, src (Intel manuals!)
AT&T (gcc/gas) syntax: op src, dst (xv6)
uses b, w, l suffix on instructions to specify size of operands
Operands are registers, constant, memory via register, memory via constant
Examples of X86 instructions
AT&T syntax              "C"-ish equivalent              Operands
movl %eax, %edx          edx = eax;                      register mode
movl $0x123, %edx        edx = 0x123;                    immediate
movl 0x123, %edx         edx = *(int32_t*)0x123;         direct
movl (%ebx), %edx        edx = *(int32_t*)ebx;           indirect
movl 4(%ebx), %edx       edx = *(int32_t*)(ebx+4);       displaced
Instructions suffix/prefix
mov %eax, %ebx # 32 bit data
movw %ax, %bx # move 16 bit data
mov %ax, %bx # ax is 16 bit, so equivalent to movw
mov $123, 0x123 # Ambiguous: the operand size is not known
movw $123, 0x123 # correct, move 16 bit data
Types of Instructions
data movement: MOV, PUSH, POP, ...
Arithmetic: TEST, SHL, ADD, AND, ...
i/o: IN, OUT, ...
Control: JMP, JZ, JNZ, CALL, RET
String: REP MOVSB, ...
System: IRET, INT
Stack, given by %esp
Grows downwards
pushl %eax
Semantics: subtract 4 (a 32-bit value is 4 bytes) from esp (the stack grows downwards), then move eax to the memory pointed to by esp:
subl $4, %esp # esp = esp - 4
movl %eax, (%esp) # *esp = eax
popl %eax
has the opposite semantics:
movl (%esp), %eax # eax = *esp
addl $4, %esp # esp = esp + 4
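The push and pop semantics can be imitated on a toy machine in C. This is only a sketch: struct machine, the 64-byte memory and the function names are assumptions made for the example, with esp kept as a byte offset into the toy memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 64

struct machine {
    uint8_t  mem[STACK_BYTES];  /* toy memory that holds only the stack */
    uint32_t esp;               /* byte offset into mem */
};

/* An empty stack starts at the highest address and grows downwards. */
void machine_init(struct machine *m)
{
    memset(m->mem, 0, sizeof m->mem);
    m->esp = STACK_BYTES;
}

/* pushl: esp = esp - 4, then *esp = val */
void pushl(struct machine *m, uint32_t val)
{
    assert(m->esp >= 4);                  /* toy stack overflow check */
    m->esp -= 4;
    memcpy(&m->mem[m->esp], &val, 4);
}

/* popl: val = *esp, then esp = esp + 4 */
uint32_t popl(struct machine *m)
{
    uint32_t val;

    assert(m->esp + 4 <= STACK_BYTES);    /* toy stack underflow check */
    memcpy(&val, &m->mem[m->esp], 4);
    m->esp += 4;
    return val;
}
```

Pushing 10 and then 20 lowers esp by 8. The two pops then return 20 and 10, in that order, and esp is back where it started.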
Function calls using stack: Approximate picture
Contract between caller (function) and callee (called function)
at entry to a function (i.e. just after call):
%eip points at first instruction of function
%esp+4 (upwards, reverse direction!) points at first argument
%esp points at return address
after the function returns (i.e. just after ret):
%eip contains return address
%esp points at arguments pushed by caller
called function may have trashed formal arguments
%eax (and %edx, if return type is 64-bit) contains return value (or trash if function is void)
%eax, %edx (above), and %ecx may be trashed
%ebp, %ebx, %esi, %edi must contain contents from time of call
“caller” save and “callee” save registers
Terminology:
%eax, %ecx, %edx are "caller save" registers
%ebp, %ebx, %esi, %edi are "callee save" registers
This means: if main() calls add(), the code of add() (the callee) can freely use eax, ecx and edx.
If it uses ebp, ebx, etc., it must save them before using them and restore them before returning.
GCC does more than that !
function prologue:
pushl %ebp
movl %esp, %ebp
function epilogue:
movl %ebp, %esp
popl %ebp
(ret)
each function has a stack frame marked by %ebp, %esp
%esp can move to make the stack frame bigger or smaller
%ebp points at the saved %ebp from the previous function, forming a chain to walk the stack
Stack Frame
ebp, for debugging
The frame pointer (in ebp) is not strictly needed: the compiler can compute the address of the return address and of the function arguments from its knowledge of the current depth of the stack pointer (in esp).
The frame pointer is useful for debugging purposes
The current function is identified by the current value of eip.
The current value of *(ebp+4) is the return address into the caller.
The current value of *((*ebp) + 4) (where *ebp contains the saved ebp of the caller) is the return address into the caller's caller.
The current value of *((**ebp) + 4) is the return address into the caller's caller's caller, and so on . . .
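The chain of saved ebp values can be walked in a loop. The sketch below does this on a toy memory image rather than on a real stack. The function name walk_frames, the word-array memory model, and the use of ebp == 0 to mark the outermost frame are all assumptions of this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy memory: mem[i] is the 32-bit word stored at byte address 4*i.
 * Each frame stores the caller's saved ebp at address ebp and the
 * return address at ebp + 4. An ebp of 0 ends the chain (assumption). */
size_t walk_frames(const uint32_t *mem, uint32_t ebp,
                   uint32_t *ret_addrs, size_t max)
{
    size_t n = 0;

    while (ebp != 0 && n < max) {
        ret_addrs[n++] = mem[(ebp + 4) / 4];  /* *(ebp + 4): return address */
        ebp = mem[ebp / 4];                   /* *ebp: saved ebp of the caller */
    }
    return n;  /* number of return addresses collected */
}
```

Starting from the innermost frame, the first entry is the return address into the caller, the second is the return address into the caller's caller, and so on.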
Let’s see a demo of how the stack is built and destroyed during function calls, on a
Linux machine using GCC.
Demo program (C):
int mult(int a, int b) {
    int c, d = 20, e = 30, f;
    f = add(d, e);
    c = a * b + f;
    return c;
}
int add(int x, int y) {
    int z;
    z = x + y;
    return z;
}
Code of mult(), up to the call of add():
mult:
    pushl %ebp
    movl %esp, %ebp
    subl $24, %esp
    movl $20, -24(%ebp)
    movl $30, -20(%ebp)
    subl $8, %esp
    pushl -20(%ebp)
    pushl -24(%ebp)
    call add
Stack on entry to mult() (higher addresses at the top; X is the frame base of main()):
    X:                          <-- ebp
    b
    a
    RetAddr main()              <-- esp
After pushl %ebp: X (prev ebp) is pushed at address Y; esp = Y
After movl %esp, %ebp: ebp = esp = Y
After subl $24, %esp: esp = Y - 24, space for the local variables
After movl $20, -24(%ebp) and movl $30, -20(%ebp): d = 20 is at Y - 24, e = 30 is at Y - 20
After subl $8, %esp: esp = Y - 32
After pushl -20(%ebp) and pushl -24(%ebp): 30 (y) and then 20 (x) are pushed; esp points at 20 (x)
After call add: the return address into mult() (Retadd mult()) is pushed; esp points at it and eip is at the first instruction of add()
Stack just after call add:
    X:
    b
    a
    RetAddr main()
    Y:  X (prev ebp)            <-- ebp
    ...
    e = 30                      (Y - 20)
    d = 20                      (Y - 24)
    ...                         (8 bytes from subl $8)
    30 (y)
    20 (x)
    Retadd mult()               <-- esp
Code of add():
add:
    pushl %ebp
    movl %esp, %ebp
    subl $16, %esp
    movl 8(%ebp), %edx
    movl 12(%ebp), %eax
    addl %edx, %eax
    movl %eax, -4(%ebp)
    movl -4(%ebp), %eax
    leave
    ret
Stack after pushl %ebp, movl %esp, %ebp and subl $16, %esp (the part below d = 20):
    30 (y)                      12(%ebp)
    20 (x)                      8(%ebp)
    Retadd mult()               4(%ebp)
    Y (prev ebp)                <-- ebp
    ...                         (16 bytes of locals; z is at -4(%ebp))
                                <-- esp
After movl 8(%ebp), %edx: edx = 20
After movl 12(%ebp), %eax: eax = 30
After addl %edx, %eax: eax = eax + edx = 50
After movl %eax, -4(%ebp): z = 50 is stored on the stack
movl -4(%ebp), %eax: some redundant code is generated here, before "leave". The result is already in eax (eax = 50)
leave # Set ESP to EBP, then pop EBP.
leave: step 1: esp = ebp, so esp points at Y (prev ebp)
leave: step 2: pop ebp, so ebp = Y (the frame of mult()) and esp points at Retadd mult()
ret: pops Retadd mult() into eip; execution continues in mult() at f = add(d, e); // here
Back in mult(), after add() returns (eax = 50):
mult:
    ....
    call add
    addl $16, %esp
    movl %eax, -16(%ebp)
    movl 8(%ebp), %eax
    imull 12(%ebp), %eax
    movl %eax, %edx
    movl -16(%ebp), %eax
    addl %edx, %eax
    movl %eax, -12(%ebp)
    movl -12(%ebp), %eax
    leave
    ret
After addl $16, %esp: 20 (x), 30 (y) and the 8 bytes from subl $8 are removed from the stack; esp = Y - 24
After movl %eax, -16(%ebp): f = 50 (eax)
After movl 8(%ebp), %eax: eax = a
After imull 12(%ebp), %eax: eax = eax * b
After movl %eax, %edx: edx = eax
After movl -16(%ebp), %eax: eax = f
After addl %edx, %eax: eax = edx + eax // eax = a*b + f
After movl %eax, -12(%ebp): c = eax
movl -12(%ebp), %eax: again some redundant code
Stack just before leave:
    X:
    b
    a
    RetAddr main()
    Y:  X (prev ebp)            <-- ebp
    ...
    c = eax                     (Y - 12)
    f = 50                      (Y - 16)
    e = 30                      (Y - 20)
    d = 20                      (Y - 24)  <-- esp
After leave: esp = ebp, then ebp = X (prev ebp); esp points at RetAddr main() // eax = a*b + f
ret: pops RetAddr main() into eip; main() finds the result in eax
Lessons
Calling function (caller)
Pushes arguments on the stack (copies the values)
On call
Return IP is pushed
Initially in called function (callee)
Old ebp is pushed
ebp = esp
esp is decremented to make space for local variables
Lessons
Before Return
Ensure that the result is in ‘eax’
On Return
esp = ebp
Pop ebp (ebp = old ebp)
On ‘ret’
Pop ‘return IP’ and go back to the old function
Lessons
This was a demonstration for a
User program, compiled with GCC, On Linux
Followed the conventions we discussed earlier
Applicable to
C programs which work using LIFO function calls
Compiler can’t generate code using this mechanism for
Functions like fork(), exec(), scheduler(), etc.
Boot code of OS
Memory Management
X86 address : protected mode address translation
X86 paging
Segmentation + Paging
Selector value is implicit, based on the address being accessed.
Instruction: CS
Stack Variable: SS
Data: DS
etc.
GDT Entry
Page Directory Entry (PDE)
Page Table Entry (PTE)
Segment selector
EFLAGS register
CR0
CR2
CR3
CR4
mmu.h : paging related macros
#define PTXSHIFT 12 // offset of PTX in a linear address
#define PDXSHIFT 22 // offset of PDX in a linear address
// page directory index
#define PDX(va) (((uint)(va) >> PDXSHIFT) & 0x3FF)
// page table index
#define PTX(va) (((uint)(va) >> PTXSHIFT) & 0x3FF)
// construct virtual address from indexes and offset
#define PGADDR(d, t, o) ((uint)((d) << PDXSHIFT | (t) << PTXSHIFT | (o)))
// +--------10------+-------10-------+---------12----------+
// | Page Directory | Page Table | Offset within Page |
// | Index | Index | |
// +----------------+----------------+---------------------+
// \--- PDX(va) --/ \--- PTX(va) --/
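A short, self-contained sketch shows how these macros split a linear address into its three fields and put it back together. The macros are copied from the slide. The typedef and the helpers page_offset and rebuild_va are additions for this example only.

```c
typedef unsigned int uint;  /* assumed to be 32 bits wide, as on x86 */

#define PTXSHIFT 12  /* offset of PTX in a linear address */
#define PDXSHIFT 22  /* offset of PDX in a linear address */
#define PDX(va) (((uint)(va) >> PDXSHIFT) & 0x3FF)
#define PTX(va) (((uint)(va) >> PTXSHIFT) & 0x3FF)
#define PGADDR(d, t, o) ((uint)((d) << PDXSHIFT | (t) << PTXSHIFT | (o)))

/* Low 12 bits: the offset within the 4096-byte page (helper for this sketch). */
uint page_offset(uint va)
{
    return va & 0xFFF;
}

/* Split va into directory index, table index and offset, then rebuild it. */
uint rebuild_va(uint va)
{
    return PGADDR(PDX(va), PTX(va), page_offset(va));
}
```

For the address 0x80108123, PDX gives 512, PTX gives 264 (0x108) and the offset is 0x123. rebuild_va returns the original address.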
mmu.h : paging related macros
// Page directory and page table constants.
#define NPDENTRIES 1024 // # directory entries per page directory
#define NPTENTRIES 1024 // # PTEs per page table
#define PGSIZE 4096 // bytes mapped by a page
#define PGROUNDUP(sz) (((sz)+PGSIZE-1) & ~(PGSIZE-1))
#define PGROUNDDOWN(a) (((a)) & ~(PGSIZE-1))
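The rounding macros are easiest to follow with concrete numbers. In the sketch below the macros are copied from the slide, while the typedef and the helper pages_needed are assumptions added for illustration.

```c
typedef unsigned int uint;  /* assumed to be 32 bits wide, as on x86 */

#define PGSIZE 4096  /* bytes mapped by a page */
#define PGROUNDUP(sz)  (((sz)+PGSIZE-1) & ~(PGSIZE-1))
#define PGROUNDDOWN(a) (((a)) & ~(PGSIZE-1))

/* Number of whole pages needed to hold sz bytes (helper for this sketch). */
uint pages_needed(uint sz)
{
    return PGROUNDUP(sz) / PGSIZE;
}
```

PGROUNDUP(4097) is 8192 and PGROUNDDOWN(4097) is 4096, so a 5000-byte region needs two pages.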
mmu.h : paging related macros
// Page table/directory entry flags.
#define PTE_P 0x001 // Present
#define PTE_W 0x002 // Writeable
#define PTE_U 0x004 // User
#define PTE_PS 0x080 // Page Size
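A page table entry combines a page-aligned physical address with these flag bits in its low 12 bits. The sketch below illustrates this. The flag values are the ones from the slide, while make_pte, pte_addr and pte_user_writable are hypothetical helpers written for this example, not functions taken from xv6.

```c
typedef unsigned int uint;  /* assumed to be 32 bits wide, as on x86 */

#define PTE_P  0x001  /* Present */
#define PTE_W  0x002  /* Writeable */
#define PTE_U  0x004  /* User */
#define PTE_PS 0x080  /* Page Size */

/* Combine a page-aligned physical address with flag bits. */
uint make_pte(uint pa, uint flags)
{
    return (pa & ~0xFFFu) | (flags & 0xFFFu);
}

/* Physical address part of an entry: the upper 20 bits. */
uint pte_addr(uint pte)
{
    return pte & ~0xFFFu;
}

/* True if the entry is present and writeable from user mode. */
int pte_user_writable(uint pte)
{
    uint need = PTE_P | PTE_W | PTE_U;

    return (pte & need) == need;
}
```

An entry made from 0x00108000 with PTE_P | PTE_W has the value 0x00108003. It is present and writeable, but not accessible from user mode.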
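[Figure: address translation through segmentation. Logical address = offset. The segment registers CS, SS and DS select entries of the GDT, and the result is the physical address in RAM.]
GDT:
Index  Base  Limit  Permissions
3      -     -      -
2      0     4GB    Write
1      0     4GB    Read, Execute
0      0     0      0
From: entry
Till: inside main(), before kvmalloc()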
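[Figure: segmentation followed by paging with 4MB pages. Logical address = offset. CS, SS, etc. select GDT entries, which gives the linear address. CR3 points to entrypgdir, which maps the linear address into RAM.]
GDT:
Selector  Base  Limit  Permissions
3         -     -      -
2         0     4GB    Write
1         0     4GB    Read, Execute
0         0     0      0
entrypgdir (4MB pages):
Index  Physical base  Flags
512    0              P,W,PS
...
0      0              P,W,PS
From: entry
Till: inside main(), before kvmalloc()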
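[Figure: segmentation followed by paging with 4MB pages, with the linear address split into Dir | Offset and the physical address in RAM shown. CR3 points to entrypgdir.]
GDT:
Selector  Base  Limit  Permissions
3         -     -      -
2         0     4GB    Write
1         0     4GB    Read, Execute
0         0     0      0
entrypgdir (4MB pages):
Index  Physical base  Flags
512    0              P,W,PS
...
0      0              P,W,PS
Even now, for low addresses (entry 0), logical address = physical address, but the translation goes through the page directory. Entry 512 maps the 4MB starting at KERNBASE onto the same physical 0 to 4MB.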
Free List in XV6
Obtained after main() -> kinit1()
[Figure: kmem with its fields lock, uselock and run *freelist. The free list is drawn twice: seen independently as a linked list, and as it actually lies in memory.]
[Figure: xv6 memory layout, virtual address space next to physical memory.]
Virtual address space, top to bottom:
4GB
DEVSPACE = 3.96GB
Unmapped
KERNBASE+PHYSTOP = 2272MB (about 2.22GB)
Kernel data + memory
data = 0x80108000 = 2049.03125 MB (obtained from kernel.sym)
Kernel code + RO Data
KERNBASE+EXTMEM = 2049MB
I/O Space
KERNBASE = 2048MB
Process address space
0
Physical memory, top to bottom:
DEVSPACE = 3.96GB
Unused
PHYSTOP = 224MB
Kernel data + memory
1.03125 MB
Kernel code + RO Data
EXTMEM = 1MB
I/O Space
0
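The numbers in this layout are related by a fixed offset of KERNBASE between kernel virtual addresses and physical addresses. A minimal sketch of that arithmetic follows. The constant values correspond to the figures above (2048MB, 1MB, 224MB), and the function names v2p and p2v are chosen for this example.

```c
#include <stdint.h>

#define KERNBASE 0x80000000u  /* 2048 MB: start of the kernel's virtual range */
#define EXTMEM   0x00100000u  /* 1 MB: start of extended physical memory */
#define PHYSTOP  0x0E000000u  /* 224 MB: top of usable physical memory */

/* Kernel virtual address -> physical address. */
uint32_t v2p(uint32_t va)
{
    return va - KERNBASE;
}

/* Physical address -> kernel virtual address. */
uint32_t p2v(uint32_t pa)
{
    return pa + KERNBASE;
}
```

The start of kernel data at virtual 0x80108000 corresponds to physical 0x00108000 (1.03125 MB), and PHYSTOP appears at virtual 0x8E000000 (2272 MB).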
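After kvmalloc() in main()
[Figure: Logical address = offset. Through the GDT (CS, SS, DS) it becomes the linear address, split into Dir | pg | Offset. CR3 points to kpgdir, whose entries point to page tables, which give the physical address in RAM.]
GDT:
Index  Base  Limit  Permissions
3      -     -      -
2      0     4GB    Write
1      0     4GB    Read, Execute
0      0     0      0
Now Linear Address = Logical Address != Physical Address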
After seginit() in main().
On the processor where we started booting
[Figure: Logical address = offset, translated to the linear address (Dir | pg | Offset) and then to the physical address in RAM.]