03 Machine Basics
14-513 18-613
Carnegie Mellon
Announcements
Lab 0 due today at midnight – no grace days allowed
If the lab is taking you more than 10 hours, consider dropping the course or
preparing to study C hard over the next 3 weeks!
Hand in via Autolab (if still on the waitlist, submit once off the waitlist)
Lab 1 (datalab) went out Jan 18, is due Feb 1
Lab 2 (bomb lab) goes out via Autolab on Thurs Jan 25
Due Feb 8
Our Coverage
IA32
The traditional x86
For 15/18-213: RIP, Summer 2015
x86-64
The standard
shark> gcc hello.c
shark> gcc -m64 hello.c
Presentation
Book covers x86-64
Web aside on IA32
We will only cover x86-64
Levels of Abstraction
C programmer
#include <stdio.h>
int main(){
  int i, n = 10, t1 = 0, t2 = 1, nxt;
  for (i = 1; i <= n; ++i){
    printf("%d, ", t1);
    nxt = t1 + t2;
    t1 = t2;
    t2 = nxt; }
  return 0; }
Assembly programmer
Computer Designer
Gates, clocks, circuit layout, …
Seems like nice clean layers…
Definitions
Architecture or ISA (instruction set architecture): The
interface between software and hardware. Aka, the
lowest-level programming interface.
Examples: instruction set specification, registers, memory model
Not a specification of the hardware itself.
Microarchitecture: Implementation of the architecture
Examples: cache sizes and core frequency
Much more, e.g., modern processors execute instructions out of order!
Code Forms:
Machine Code: The byte-level programs that a processor executes
Assembly Code: A text representation of machine code
Example ISAs:
Intel: x86, IA32, Itanium, x86-64
ARM: Used in almost all mobile phones, and some laptops/servers
RISC-V: New open-source ISA
Programmer-Visible State
PC: Program counter
Address of next instruction
Called “RIP” (x86-64)
Register file
Heavily used program data
Condition codes
Store status information about most recent arithmetic or logical operation
Used for conditional branching
Memory
Byte addressable array
Code and user data
Stack to support procedures
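To make the condition codes concrete, here is a minimal C sketch of the status a 64-bit add would leave behind. The struct, the function name, and the flag names (ZF, SF, CF, OF, the usual x86 names, which this slide does not list) are assumptions for illustration only. It models the idea and is not how the hardware is built.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of four x86 condition codes (names are an assumption). */
struct cond_codes {
    bool zf;  /* zero flag: result was 0 */
    bool sf;  /* sign flag: top bit of the result is set */
    bool cf;  /* carry flag: unsigned overflow out of bit 63 */
    bool of;  /* overflow flag: signed (two's-complement) overflow */
};

/* Status left by a 64-bit add of the form dst += src. */
struct cond_codes add_flags(uint64_t src, uint64_t dst)
{
    uint64_t r = dst + src;  /* wraps modulo 2^64 */
    struct cond_codes cc;
    cc.zf = (r == 0);
    cc.sf = (r >> 63) != 0;
    cc.cf = (r < dst);       /* a wrapped sum is smaller than either operand */
    /* Signed overflow: both operands share a sign and the result does not. */
    cc.of = (((src ^ r) & (dst ^ r)) >> 63) != 0;
    return cc;
}
```

A conditional branch then only needs to test these bits. For example, a "jump if equal" after a comparison would look at the zero flag.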
addq
add %rbx, %rax is rax += rbx
%esi  %si  source index
%edi  %di  destination index
%esp  %sp  stack pointer
%ebp  %bp  base pointer
Assembly: Operations
Transfer data between memory and register
Load data from memory into register
Store register data into memory
Transfer control
Unconditional jumps to/from procedures
Conditional branches
Indirect branches
Activity 1
Form pairs
One person open the activity instructions at
https://www.cs.cmu.edu/~213/activities/gdb-and-assembly.pdf
wget http://www.cs.cmu.edu/~213/activities/gdb-and-assembly.tar
tar xf gdb-and-assembly.tar
cd gdb-and-assembly
./act1
movq (%rcx),%rax
movq 8(%rbp),%rdx
Special Cases
(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]]
D(Rb,Ri) Mem[Reg[Rb]+Reg[Ri]+D]
(Rb,Ri,S) Mem[Reg[Rb]+S*Reg[Ri]]
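These forms correspond to familiar C address arithmetic. The sketch below assumes nothing beyond standard C: the scaled-index form (Rb,Ri,S) matches array indexing, and a constant displacement, as in movq 8(%rbp),%rdx, matches a field offset. The struct and function names are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type, used only to show a constant displacement. */
struct pair {
    long first;   /* offset 0 */
    long second;  /* offset sizeof(long), which is 8 on x86-64 Linux */
};

/* (Rb,Ri,S) with S = sizeof(long): the address of a[i] is base + S*i. */
uintptr_t indexed_address(const long *a, size_t i)
{
    return (uintptr_t)a + sizeof(long) * i;
}

/* D(Rb): the address of p->second is base + D, where D is the field offset. */
uintptr_t field_address(const struct pair *p)
{
    return (uintptr_t)p + offsetof(struct pair, second);
}
```

Comparing indexed_address(a, 3) with (uintptr_t)&a[3] should show that the two agree, which is why a compiler can often fold an array access into a single memory operand.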
Activity 2
%rsi
%rdi
void swap(long *xp, long *yp)
{
  long t0 = *xp;
  long t1 = *yp;
  *xp = t1;
  *yp = t0;
}
swap:
  movq (%rdi), %rax
  movq (%rsi), %rdx
  movq %rdx, (%rdi)
  movq %rax, (%rsi)
  ret
Understanding Swap()
void swap(long *xp, long *yp)
{
  long t0 = *xp;
  long t1 = *yp;
  *xp = t1;
  *yp = t0;
}
Register  Value
%rdi      xp
%rsi      yp
%rax      t0
%rdx      t1
swap:
  movq (%rdi), %rax  # t0 = *xp
  movq (%rsi), %rdx  # t1 = *yp
  movq %rdx, (%rdi)  # *xp = t1
  movq %rax, (%rsi)  # *yp = t0
  ret
Understanding Swap(): initial state
Registers
%rdi  0x120
%rsi  0x100
%rax
%rdx
Memory
Address  Value
0x120    123
0x118
0x110
0x108
0x100    456
swap:
  movq (%rdi), %rax  # t0 = *xp
  movq (%rsi), %rdx  # t1 = *yp
  movq %rdx, (%rdi)  # *xp = t1
  movq %rax, (%rsi)  # *yp = t0
  ret
Understanding Swap(): state after movq (%rdi), %rax  # t0 = *xp
Registers
%rdi  0x120
%rsi  0x100
%rax  123
%rdx
Memory
Address  Value
0x120    123
0x118
0x110
0x108
0x100    456
Understanding Swap(): state after movq (%rsi), %rdx  # t1 = *yp
Registers
%rdi  0x120
%rsi  0x100
%rax  123
%rdx  456
Memory
Address  Value
0x120    123
0x118
0x110
0x108
0x100    456
Understanding Swap(): state after movq %rdx, (%rdi)  # *xp = t1
Registers
%rdi  0x120
%rsi  0x100
%rax  123
%rdx  456
Memory
Address  Value
0x120    456
0x118
0x110
0x108
0x100    456
Understanding Swap(): state after movq %rax, (%rsi)  # *yp = t0
Registers
%rdi  0x120
%rsi  0x100
%rax  123
%rdx  456
Memory
Address  Value
0x120    456
0x118
0x110
0x108
0x100    123
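The trace can be replayed in C with a toy model of memory and registers. This is only a sketch: the array mem, the helpers load and store, and the function run_swap_trace are hypothetical names, and the addresses 0x100 through 0x120 are the ones shown on the slides, not real addresses.

```c
#include <stdint.h>

#define MEM_BASE  0x100u  /* lowest address shown in the trace */
#define MEM_WORDS 5       /* 0x100, 0x108, 0x110, 0x118, 0x120 */

static int64_t mem[MEM_WORDS];  /* toy memory: one slot per 8-byte word */

/* Hypothetical helpers mapping a slide address to a slot in mem[]. */
int64_t load(uint64_t addr)            { return mem[(addr - MEM_BASE) / 8]; }
void    store(uint64_t addr, int64_t v) { mem[(addr - MEM_BASE) / 8] = v; }

/* Replays the four movq instructions of swap on the toy state. */
void run_swap_trace(void)
{
    uint64_t rdi = 0x120, rsi = 0x100;  /* xp and yp */
    int64_t rax, rdx;                   /* t0 and t1 */

    store(0x120, 123);                  /* initial *xp */
    store(0x100, 456);                  /* initial *yp */

    rax = load(rdi);                    /* movq (%rdi), %rax   t0 = *xp */
    rdx = load(rsi);                    /* movq (%rsi), %rdx   t1 = *yp */
    store(rdi, rdx);                    /* movq %rdx, (%rdi)   *xp = t1 */
    store(rsi, rax);                    /* movq %rax, (%rsi)   *yp = t0 */
}
```

After run_swap_trace() returns, load(0x120) and load(0x100) hold the swapped values from the final slide of the trace.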
Address computation examples (the values imply %rdx = 0xf000 and %rcx = 0x100)
Expression     Address Computation  Address
0x8(%rdx)      0xf000 + 0x8         0xf008
(%rdx,%rcx)    0xf000 + 0x100       0xf100
(%rdx,%rcx,4)  0xf000 + 4*0x100     0xf400
0x80(,%rdx,2)  2*0xf000 + 0x80      0x1e080
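The four rows can be checked mechanically. The sketch below assumes the register values implied by the table (%rdx = 0xf000, %rcx = 0x100) and uses a hypothetical helper, effective_address, for the general form D(Rb,Ri,S). Plain integers stand in for register contents.

```c
#include <stdint.h>

/* General form D(Rb,Ri,S): address = D + Reg[Rb] + S*Reg[Ri].
   Hypothetical helper: pass 0 for an omitted base or index, 1 for an omitted scale. */
uint64_t effective_address(uint64_t d, uint64_t rb, uint64_t ri, uint64_t s)
{
    return d + rb + s * ri;
}

/* Returns 1 if every row of the table is reproduced, 0 otherwise. */
int table_matches(void)
{
    const uint64_t rdx = 0xf000, rcx = 0x100;  /* values implied by the table */

    return effective_address(0x8,  rdx, 0,   1) == 0xf008    /* 0x8(%rdx)     */
        && effective_address(0,    rdx, rcx, 1) == 0xf100    /* (%rdx,%rcx)   */
        && effective_address(0,    rdx, rcx, 4) == 0xf400    /* (%rdx,%rcx,4) */
        && effective_address(0x80, 0,   rdx, 2) == 0x1e080;  /* 0x80(,%rdx,2) */
}
```

Note that in the last row %rdx is the index register, not the base, so it is the value that gets scaled.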
Quiz Time!
Check out:
https://canvas.cmu.edu/courses/39547/quizzes/118132
Uses
Computing addresses without a memory reference
E.g., translation of p = &x[i];
Computing arithmetic expressions of the form x + k*y
k = 1, 2, 4, or 8
Example
long m12(long x)
{
  return x*12;
}
Converted to ASM by compiler:
  leaq (%rdi,%rdi,2), %rax  # t = x+2*x
  salq $2, %rax             # return t<<2
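The same two steps can be written out in C to confirm that they compute x*12. This is a sketch with a hypothetical function name, m12_steps. It uses unsigned arithmetic internally so that the shift is well defined for negative inputs, which mirrors how the machine instructions operate on raw 64-bit values.

```c
/* Straightforward version, as on the slide. */
long m12(long x)
{
    return x * 12;
}

/* Same result, computed the way the two instructions do it. */
long m12_steps(long x)
{
    unsigned long ux = (unsigned long)x;
    unsigned long t = ux + 2 * ux;  /* leaq (%rdi,%rdi,2), %rax : t = x + 2*x */
    return (long)(t << 2);          /* salq $2, %rax            : t << 2      */
}
```

The multiplier 12 factors as 3 * 4: the leaq handles the factor 3 (x + 2*x, using an allowed scale of 2) and the shift handles the factor 4.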
Object Code
Code for sumstore
0x0400595:
0x53 0x48 0x89 0xd3 0xe8 0xf2 0xff 0xff 0xff 0x48 0x89 0x03 0x5b 0xc3
Total of 14 bytes
Each instruction 1, 3, or 5 bytes
Starts at address 0x0400595
Assembler
Translates .s into .o
Binary encoding of each instruction
Nearly-complete image of executable code
Missing linkages between code in different files
Linker
Resolves references between files
Combines with static run-time libraries
E.g., code for malloc, printf
Some libraries are dynamically linked
Linking occurs when program begins execution
Disassembler
objdump -d sum
Useful tool for examining object code
Analyzes bit pattern of series of instructions
Produces approximate rendition of assembly code
Can be run on either a.out (complete executable) or .o file
Alternate Disassembly
Object Code
0x0400595:
0x53 0x48 0x89 0xd3 0xe8 0xf2 0xff 0xff 0xff 0x48 0x89 0x03 0x5b 0xc3
Disassembled
Dump of assembler code for function sumstore:
0x0000000000400595 <+0>:  push %rbx
0x0000000000400596 <+1>:  mov %rdx,%rbx
0x0000000000400599 <+4>:  callq 0x400590 <plus>
0x000000000040059e <+9>:  mov %rax,(%rbx)
0x00000000004005a1 <+12>: pop %rbx
0x00000000004005a2 <+13>: retq
Within gdb Debugger
Disassemble procedure
gdb sum
disassemble sumstore
Examine the 14 bytes starting at sumstore
x/14xb sumstore
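To connect the raw bytes to the disassembly, the sketch below groups the 14 bytes by instruction and decodes the call target by hand. The array and function names are hypothetical. The only facts it relies on are the bytes and addresses shown above and the x86-64 rule that opcode 0xe8 is a call followed by a 4-byte little-endian displacement relative to the next instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define SUMSTORE_ADDR 0x400595u  /* start address from the listing */

/* The 14 bytes of sumstore, grouped to match the disassembly offsets. */
static const uint8_t sumstore_code[14] = {
    0x53,                          /* <+0>  push %rbx        (1 byte)  */
    0x48, 0x89, 0xd3,              /* <+1>  mov  %rdx,%rbx   (3 bytes) */
    0xe8, 0xf2, 0xff, 0xff, 0xff,  /* <+4>  callq plus       (5 bytes) */
    0x48, 0x89, 0x03,              /* <+9>  mov  %rax,(%rbx) (3 bytes) */
    0x5b,                          /* <+12> pop  %rbx        (1 byte)  */
    0xc3                           /* <+13> retq             (1 byte)  */
};

/* Target of the call whose 0xe8 opcode sits at byte offset off. */
uint64_t call_target(size_t off)
{
    /* Assemble the 4 displacement bytes, least significant first. */
    uint32_t raw = (uint32_t)sumstore_code[off + 1]
                 | (uint32_t)sumstore_code[off + 2] << 8
                 | (uint32_t)sumstore_code[off + 3] << 16
                 | (uint32_t)sumstore_code[off + 4] << 24;
    int32_t disp = (int32_t)raw;              /* 0xfffffff2 is -14 */
    uint64_t next = SUMSTORE_ADDR + off + 5;  /* address after the call */
    return next + (uint64_t)(int64_t)disp;    /* wraps modulo 2^64 */
}
```

Calling call_target(4) should give 0x400590, the address gdb labels as plus: the next instruction is at 0x40059e and the displacement is -14.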
No symbols in "WINWORD.EXE".
Disassembly of section .text:
30001000 <.text>:
30001000: 55              push %ebp
30001001: 8b ec           mov %esp,%ebp
30001003: 6a ff           push $0xffffffff
30001005: 68 90 10 00 30  push $0x30001090
3000100a: 68 91 dc 4c 30  push $0x304cdc91
Reverse engineering forbidden by Microsoft End User License Agreement