Hardware-assisted Run-time Protection
N. Asokan†‡
https://asokan.org/asokan/
@nasokan

Acknowledgements: Thomas Nyman‡, Hans Liljestrand†, Lachlan J Gunn‡, Jan-Erik Ekberg‡, §


†) University of Waterloo, ‡) Aalto University, §) Huawei Technologies
You will be learning

Part 1: Memory-related run-time attacks
• Common attack techniques against C/C++

Part 2: Hardware-assisted defenses
• Emerging mechanisms in CotS processors

Part 3: Theory of run-time attacks
• What are “weird machines”?
Example: Buffer overflows caused by missing bounds checks

Software developer → source code → compiler & linker → executable file

Source code:

    int main(int argc, char *argv[])
    {
        puts("So... The End...");
        doit(argv[1]);
        puts("or... maybe not?");
        return 0;
    }

    void doit(char *str)
    {
        char buf[8];
        char *ptr = buf;
        strcpy(buf, str);
        puts(ptr);
    }

Executable file (disassembly):

    0804843b <doit>:
     804843b: 55                push   %ebp
     804843c: 89 e5             mov    %esp,%ebp
     804843e: 83 ec 0c          sub    $0xc,%esp
     8048441: 8d 45 f4          lea    -0xc(%ebp),%eax
     8048444: 89 45 fc          mov    %eax,-0x4(%ebp)
     8048447: ff 75 08          pushl  0x8(%ebp)
     804844a: 8d 45 f4          lea    -0xc(%ebp),%eax
     804844d: 50                push   %eax
     804844e: e8 ad fe ff ff    call   8048300 <strcpy@plt>
     8048453: 83 c4 08          add    $0x8,%esp
     8048456: ff 75 fc          pushl  -0x4(%ebp)
     8048459: e8 b2 fe ff ff    call   8048310 <puts@plt>
     804845e: 83 c4 04          add    $0x4,%esp
     8048461: 90                nop
     8048462: c9                leave
     8048463: c3                ret

    08048464 <main>:
     8048464: 55                push   %ebp
     8048465: 89 e5             mov    %esp,%ebp
     8048467: 68 20 85 04 08    push   $0x8048520
     804846c: e8 9f fe ff ff    call   8048310 <puts@plt>
     8048471: 83 c4 04          add    $0x4,%esp
     8048474: 8b 45 0c          mov    0xc(%ebp),%eax
     8048477: 83 c0 04          add    $0x4,%eax
     804847a: 8b 00             mov    (%eax),%eax
     804847c: 50                push   %eax
     804847d: e8 b9 ff ff ff    call   804843b <doit>
     8048482: 83 c4 04          add    $0x4,%esp
     8048485: 68 31 85 04 08    push   $0x8048531
     804848a: e8 81 fe ff ff    call   8048310 <puts@plt>
     804848f: 83 c4 04          add    $0x4,%esp
     8048492: b8 00 00 00 00    mov    $0x0,%eax
     8048497: c9                leave
     8048498: c3                ret
     8048499: 66 90             xchg   %ax,%ax
     804849b: 66 90             xchg   %ax,%ax
     804849d: 66 90             xchg   %ax,%ax
     804849f: 90                nop

missing bounds-checks in call to strcpy!
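For contrast with the unchecked strcpy() in doit(), the following is a minimal sketch of a bounds-checked variant. The names copy_bounded and doit_checked are hypothetical helpers introduced only for illustration; they are not part of the original example, and rejecting over-long input is just one possible policy.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 8

/* Hypothetical helper: copy at most dst_size - 1 bytes of src into dst and
 * always NUL-terminate. Returns 0 on success, -1 if src was truncated. */
static int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    assert(dst != NULL && src != NULL && dst_size > 0);

    size_t len = strlen(src);
    int truncated = len >= dst_size;

    if (truncated)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';
    return truncated ? -1 : 0;
}

/* Same shape as doit() on the slide, with the unchecked strcpy replaced. */
static int doit_checked(const char *str)
{
    char buf[BUF_SIZE];
    char *ptr = buf;

    if (copy_bounded(buf, sizeof buf, str) != 0)
        return -1;      /* reject over-long input instead of overflowing */
    puts(ptr);
    return 0;
}
```

With this variant, doit_checked("Hello !") prints the string, while an argument of eight or more characters is rejected before ptr or the saved return address can be reached.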


Run-time behaviour

The user runs the program from the previous example:

    $ ./a.out "Hello !"

[Figure: the disassembly of main and doit shown next to the process address space and the stack contents while doit() calls strcpy().]

Address space layout:

    0xffffffff
                Kernel space
    0xc0000000  (= TASK_SIZE)
                Stack (grows down)
                Memory Mapping Region
    0x40000000
                Heap (grows up)
                Bss segment
                Data segment
                Text segment
    0x08040000
    0x00000000

Stack contents (each address is the start of the cell beside it, bytes listed in memory order):

    0xbfffceb8  parent stack frame
    0xbfffceb4  arguments:      0xbfffd18a (argv[1])
    0xbfffceb0  return address: 0x08048482 (saved eip)
    0xbfffceac  frame pointer:  0xbfffceb8 (saved ebp)
    0xbfffcea8  ptr:            0xbfffcea0 (&buf)
    0xbfffcea4  buf[4..7]:      'O' ' ' '!' '\0'
    0xbfffcea0  buf[0..3]:      'H' 'E' 'L' 'L'
    0xbfffce9c  arguments:      0xbfffcea0 (ptr)
    0xbfffce98                  0xbfffd18a (str)
                strcpy stack frame
Control-flow hijacking

    void doit(char *str)
    {
        char buf[8];
        char *ptr = buf;

        strcpy(buf, str);
        puts(ptr);
    }

    $ ./a.out $(perl -e 'print "A"x8 \
        ."\x??\x??\x04\x08" \
        ."A"x4 \
        ."\x64\x84\x04\x08"')

[Figure: address space layout as on the previous slide; the text segment holds libc init, main (0x08048464), doit (0x0804843b), strcpy (0x08048300) and puts (0x08048310).]

Stack contents after strcpy() (each address is the start of the cell beside it):

    0xbfffceb8  parent stack frame
    0xbfffceb4  arguments:      0xbfffd18a (argv[1])
    0xbfffceb0  return address: 0x08048464 (main), was 0x08048482 (saved eip)
    0xbfffceac  frame pointer:  'A' 'A' 'A' 'A', was 0xbfffceb8 (saved ebp)
    0xbfffcea8  ptr:            0x0804???? (data), was 0xbfffcea0 (&buf)
    0xbfffcea4  buf[4..7]:      'A' 'A' 'A' 'A'
    0xbfffcea0  buf[0..3]:      'A' 'A' 'A' 'A'
    0xbfffce9c  arguments:      0xbfffcea0 (ptr)
    0xbfffce98                  0xbfffd18a (str)
                strcpy stack frame

corrupt code pointer / control flow
Memory-related run-time attacks

Software written in memory-unsafe languages such as C/C++
• Suffers from various memory-related errors

Memory errors may allow run-time attacks to compromise program behaviour
• Control-flow hijacking / code injection
• Return-Oriented Programming (ROP)
• Non-control-data attacks
• Data-Oriented Programming (DOP)
Run-time attacks compromise program behaviour

    1  if (authenticated != true)
         then: call unprivileged()
         else: call privileged()
    2  unprivileged() { … }
    3  privileged() { … }

[Figure: control-flow graph of blocks 1, 2 and 3. The adversary exploits a bug to redirect the control flow, for example to injected shellcode.]
(i) Code-injection attacks

Exploit memory error (e.g. buffer overflow) to:
• Inject shellcode into writable memory (usually stack)
• Corrupt code pointer (usually return address) to redirect execution flow to shellcode

Countermeasures:
• Stack canaries (1990): detect sequential overwrites that corrupt the return address
• W⊕X memory access control policy (2003): prevent execution of shellcode by ensuring that memory pages are either writable or executable, never both

[Figure: stack frame with the buffer at SP and the frame record (frame pointer, return address) at FP; the return address is replaced by a bad return address that points to shellcode placed on the stack.]

Elias Levy (as Aleph One), Smashing the stack for fun and profit, Phrack 7 (1996)
Cowan et al., StackGuard: Automatic adaptive detection and prevention of buffer-overflow attacks, USENIX Security (1998)
Szekeres et al., SoK: Eternal War in Memory, IEEE SP (2013)
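The canary idea can be sketched with a small model. This is not how a compiler implements stack protection: the frame is modelled as a flat byte array (an assumption made so that the "overflow" stays inside one object and the program remains well-defined), and frame_model, frame_write and the offsets are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Modelled frame: 8-byte buffer, then a 4-byte canary, then a 4-byte
 * return address, in increasing address order. */
enum { BUF_LEN = 8, CANARY_OFF = BUF_LEN, RET_OFF = CANARY_OFF + 4, FRAME_LEN = RET_OFF + 4 };

struct frame_model { unsigned char bytes[FRAME_LEN]; };

/* Prologue: place the canary between the buffer and the return address. */
static void frame_init(struct frame_model *f, uint32_t canary, uint32_t ret)
{
    memset(f->bytes, 0, FRAME_LEN);
    memcpy(f->bytes + CANARY_OFF, &canary, sizeof canary);
    memcpy(f->bytes + RET_OFF, &ret, sizeof ret);
}

/* Unchecked sequential write starting at the buffer, like strcpy(buf, str).
 * The length is only limited to the model itself. */
static void frame_write(struct frame_model *f, const void *src, size_t len)
{
    assert(len <= FRAME_LEN);
    memcpy(f->bytes, src, len);
}

/* Epilogue check: returns 1 if the canary still holds the reference value. */
static int frame_canary_ok(const struct frame_model *f, uint32_t canary)
{
    return memcmp(f->bytes + CANARY_OFF, &canary, sizeof canary) == 0;
}

/* Read back the (possibly corrupted) return address. */
static uint32_t frame_ret(const struct frame_model *f)
{
    uint32_t ret;
    memcpy(&ret, f->bytes + RET_OFF, sizeof ret);
    return ret;
}
```

A sequential write that reaches the return address must pass over the canary first, so the epilogue check fails. A write that targets the return address directly would leave the canary intact, which is why canaries only detect sequential overwrites.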
Classic code-injection

    void doit(char *str)
    {
        char buf[8];
        char *ptr = buf;

        strcpy(buf, str);
        puts(ptr);
    }

Payload bytes as shown on the slide:

    $ ./a.out $(perl -e 'print "A"x8 \
        ."\x??\x??\x04\x08" \
        ."A"x4 \
        ."\xb8\xce\xff\xfb" \
        ."\x80\xcd" ."\x40" \
        ."\xc0\x31" ."\x80\xcd" \
        ."\x0b\xb0" ."\xc2\x89" \
        ."\xc1\x89" ."\xe3\x89" \
        ."\x6e\x69\x62\x2f\x68" \
        ."\x68\x73\x2f\x2f\x68" \
        ."\x50" ."\xc0\x31"')

[Figure: address space layout and stack contents as on the previous slides, after strcpy().]

Stack contents after strcpy():
• buf: 'A' x 8
• ptr: 0x0804???? (data), was the address of buf
• frame pointer: 'A' 'A' 'A' 'A', was the saved ebp
• return address: address of the injected shellcode (0xbfffceb8 on the slide), was 0x08048482 (saved eip)
• shellcode: written above the return address, over the argv[1] argument and the parent stack frame
Return-oriented programming (high-level idea)

[Figure: the words “Return oriented Programming” assembled from separate letter fragments.]
Return-oriented programming
Attacker arranges call stack with code pointers to existing code sequences (“gadgets”)
• Given a suitable gadget set, arbitrary return-oriented programs can be constructed

[Figure: the adversary exploits a bug and chains gadgets found in existing code. Each gadget is a short instruction sequence that ends in a return, for example:]

    push edi              cmp eax,edi           mov eax,[ebp+0x8]     cmp [esi+0x74],edi
    leave                 leave                 leave                 leave
    ret                   ret                   ret                   ret

    add $0x4,%esp
    leave
    ret
(ii) Code-reuse attacks

Exploit memory error without injecting code:
• Corrupt code pointer (usually return address) to redirect execution flow to existing code:
  • Library functions (return-into-libc)
  • Pre-existing instruction sequences (gadgets)

Countermeasures:
• Control-flow Integrity (2005): detect control-flow transfers outside the static control-flow graph, or mismatched returns (shadow stack)
• Address space randomization (2001): hide locations of useful gadgets in memory

[Figure: stack frame with the buffer between SP and SP+size and the frame record (frame pointer, return address) at FP. The return address of the function is replaced by the address of gadget 1, followed by forged data, the address of gadget 2, forged data and the address of gadget 3. Gadget 1: “… load x0, sp; ret”. Gadget 2: “load x0, x0; ret”. Gadget 3: “br <puts>; ret”.]

A. Peslyak (as Solar Designer), Getting around non-executable stack (and fix), Bugtraq (1997)
H. Shacham, The geometry of innocent flesh on the bone: return-into-libc without function calls (on the x86), ACM CCS (2007)
T. Kornau, Return Oriented Programming for the ARM Architecture, MSc Thesis, RUB (2009)
M. Abadi, Control-flow integrity, ACM CCS (2005)
CFI: High-level idea

[Figure: control-flow graph with initial node A and nodes B, C, D, F, G. Each node performs a CFI check on its outgoing edges.]

    CFI check at   Allowed edges
    A              (A,B), (A,F)
    B              (B,A), (B,C), (B,D)
    C              (C,B), (C,G)
    D              (D,B), (D,G)
    F              (F,A)
    G              (G,C), (G,D)

CFI violation at D: disallowed edge (F,D)

Legend:
• intended forward-edge in CFG
• intended backward-edge in CFG
• malicious edge not part of CFG
• initial node in CFG
• node in CFG
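The edge sets on this slide can be written down as a lookup table. The sketch below is only an illustration of the idea, assuming the allowed edges listed on the slide; cfi_check and the NODE_* names are hypothetical, and real CFI schemes instrument indirect branches rather than consult a table of node pairs.

```c
#include <assert.h>
#include <stddef.h>

enum node { NODE_A, NODE_B, NODE_C, NODE_D, NODE_F, NODE_G, NODE_COUNT };

struct edge { enum node from, to; };

/* Allowed control-flow edges, taken from the slide. */
static const struct edge allowed_edges[] = {
    {NODE_A, NODE_B}, {NODE_A, NODE_F},
    {NODE_B, NODE_A}, {NODE_B, NODE_C}, {NODE_B, NODE_D},
    {NODE_C, NODE_B}, {NODE_C, NODE_G},
    {NODE_D, NODE_B}, {NODE_D, NODE_G},
    {NODE_F, NODE_A},
    {NODE_G, NODE_C}, {NODE_G, NODE_D},
};

/* Returns 1 if the transfer from -> to is part of the control-flow graph,
 * 0 if it would be a CFI violation. */
static int cfi_check(enum node from, enum node to)
{
    assert(from < NODE_COUNT && to < NODE_COUNT);

    for (size_t i = 0; i < sizeof allowed_edges / sizeof allowed_edges[0]; i++)
        if (allowed_edges[i].from == from && allowed_edges[i].to == to)
            return 1;
    return 0;
}
```

With this table, cfi_check(NODE_F, NODE_D) returns 0, matching the disallowed edge (F,D) in the figure.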
Shadow Stack: High-level idea

[Figure: functions A, B and C form the call chain A→B→C, which is mirrored on a “shadow stack”; the adversary's tampering is marked in the figure.]
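A shadow stack can be sketched as a second stack of return addresses that is compared against the regular stack on every return. The code below is a minimal software model under that assumption; shadow_call, shadow_return and the fixed depth are hypothetical, and a real shadow stack must itself be protected from the adversary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { SHADOW_DEPTH = 64 };

struct shadow_stack {
    uintptr_t entries[SHADOW_DEPTH];
    size_t top;
};

/* On call: record the return address on the shadow stack. */
static void shadow_call(struct shadow_stack *s, uintptr_t ret_addr)
{
    assert(s->top < SHADOW_DEPTH);
    s->entries[s->top++] = ret_addr;
}

/* On return: pop the recorded address and compare it with the return
 * address found on the regular stack. Returns 1 on match, 0 on mismatch. */
static int shadow_return(struct shadow_stack *s, uintptr_t ret_addr_on_stack)
{
    assert(s->top > 0);
    return s->entries[--s->top] == ret_addr_on_stack;
}
```

A return address that was overwritten on the regular stack no longer matches the recorded copy, so the mismatched return is detected.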
Non-control data attack

    void doit(char *str)
    {
        char buf[8];
        char *ptr = buf;

        strcpy(buf, str);
        puts(ptr);
    }

    $ ./a.out $(perl -e 'print "A"x8 \
        ."\x08\xb0\xc4\x09"')

[Figure: address space layout and stack contents as on the previous slides, after strcpy().]

Stack contents after strcpy():
• buf: 'A' x 8
• ptr: 0x09c4b008 (in heap), was the address of buf
• frame pointer: 0xbfffceb8 (saved ebp), unchanged
• return address: 0x08048482 (saved eip), unchanged

Program logic that can be influenced as a result of a memory vulnerability constitutes “data-oriented gadgets”.
The attacker influences the behavior of benign program code without breaking control-flow integrity.
Data-oriented Programming
Enables expressive computation via use of “data-oriented gadgets” without diverging from the program’s benign control-flow
• Requires a “gadget dispatch” that allows chaining together gadgets at will

[Figure: the adversary exploits a bug to supply a data-oriented program (#1, #2, #3, …). A loop selector dispatches the corresponding data-oriented gadgets, such as copy(), load() and store().]
Data-oriented programming
• Given a suitable gadget dispatch, an attacker can chain together data-oriented gadgets at will
• Dispatch must be able to chain data-oriented gadgets without violating control-flow

[Figure: address space layout (kernel space, stack, memory mapping region, heap, bss, data and text segments) with a loop selector dispatching gadgets #1, #2 and #3. Caption: corrupt data flow.]


Selected Research & Vulnerabilities

1988-99
• ret2libc: Solar Designer (Phrack)
• Morris Worm: RCE in fingerd
• CVE-1999-1416: DoS & RCE in Solaris Answerbook2
• Format string vulnerabilities: Anders (Bugtraq, 1999)
2001
• Advanced ret2libc: Nergal (Phrack)
2005
• x86-64 borrowed code chunks exploitation: Krahmer
• Non-control-data attacks: Chen et al (SSYM ’05)
2007
• ROP on x86: Shacham (CCS’07)
2008
• ROP on ATMEL AVR: Francillon et al (CCS’08)
• ROP on SPARC: Buchanan et al (CCS’08)
2009
• ROP Rootkits: Hund et al (USENIX Sec. ’09)
• ROP on PowerPC: FX Lindner (BlackHat USA)
• ROP on ARM / iOS: Miller et al (BlackHat Europe)
2010
• ROP w/o Returns: Checkoway et al (CCS’10)
• CVE-2010-3765: Nobel Peace Prize website 0day
• CVE-2010-2883: RCE in Adobe Reader and Acrobat
2011-12
• CVE-2011-1938: RCE in PHP
• CVE-2012-0003: RCE in WMP MIDI library
• String-Oriented Programming: Payer (28C3 ’11)
2013
• JIT-ROP: Snow et al (IEEE S&P’13)
• CVE-2013-3893: RCE in Internet Explorer
2014
• CVE-2014-9222: Misfortune cookie in RomPager
• Blind ROP: Bittau et al (IEEE S&P’14)
• Stitching Gadgets: Davi et al (USENIX’14)
• Write Once, Pwn Anywhere: Yu (BlackHat USA’14)
• CVE-2014-0160: Heartbleed vuln. in OpenSSL
2015
• Out-of-Control: Göktas et al (IEEE S&P’14)
• Gadget size Matters: Göktas et al (USENIX’14)
• ROP is Still Dangerous: Carlini et al (USENIX’14)
• Data-Oriented Exploits: Hu et al (USENIX Sec.’15)
2016
• SROP: Bosman et al (IEEE S&P’14)
• Control-flow Bending: Carlini et al (USENIX Sec.’16)
• CVE-2016-0034: Angler RCE in Silverlight
• DOP: Hu et al (IEEE S&P ’16)
Taxonomy of Defenses

[Figure: attack stages from memory vulnerability to exploit execution. The defense named next to each step in the figure is given in parentheses.]

1. Memory vulnerability: out-of-bounds pointer, dangling pointer, format string vulnerability (Memory-safety); results in an unintended read or an unintended write
2. Integrity violation: exfiltrate data; modify code (Code Integrity); modify control-data (Code-pointer Integrity); modify non-control-data (Software Compartmentalization)
3. Exploit payload: interpret exfiltrated data; inject attacker-controlled code (Instruction Set Randomization); inject attacker-controlled address (Address-space Randomization); inject attacker-controlled data (Software Diversification)
4. Exploit dispatch: indirect jump to corrupt address; return to corrupt address (Control-flow Integrity); use of corrupt data (Data-flow Integrity)
5. Exploit execution: execute modified code (Binary Attestation); execute injected code; execute gadget / code fragment; execute data-oriented gadget (Run-time Attestation)

Outcomes: information leak, code-injection attack, control-flow attack, data-oriented attack

From Thomas Nyman’s doctoral dissertation, Towards Hardware-assisted Run-time Protection, 2020 (Figure adapted from Szekeres et al., SoK: Eternal War in Memory, IEEE SP (2013))
[Figure: defenses placed on two axes, software vs. hardware and coarse-grained vs. fine-grained.]

Software (coarse-grained to fine-grained):
• Containers, chroot, Privilege kernels, Virtual machines
• Software CFI, Memory-safe languages

Hardware (coarse-grained to fine-grained):
• Harvard architecture, TEEs, Memory Protection (MPU), Virtual memory (MMU), Memory segments, Protection rings (Exception levels), Enclaves, W⊕X
• HW-assisted CFI, Tagged memory, Branch target indicators, HW shadow stack, Pointer Authentication, Fine-grained protection domains, HW-assisted bounds checks, Run-time scope enforcement
Hardware-assisted defenses

How to thwart run-time attacks?

Run-time attacks are now routine

Software defenses incur security vs. cost tradeoffs

Hardware-assisted defenses are attractive

Protect against run-time attacks without incurring a significant performance penalty
Design new hardware-security mechanisms

Example: HardScope
• Enforce variable visibility rules at run time
• Mitigate effects of attacks that corrupt data-plane information
• Digital design, FPGA realization, compiler instrumentation, extensive analysis

Deployment challenge:
• Required the addition of 7 new instructions to the RISC-V ISA

Nyman et al. HardScope: Hardening Embedded Systems Against Data-Oriented Attacks. DAC 2019
Hardware-assisted defenses in CotS processors

ARMv8-A mechanisms
• Pointer Authentication (PA)
• Memory Tagging Extension (MTE)
• Branch Target Identification (BTI)

Intel x86_64 mechanisms
• Memory Protection eXtension (MPX)
• Memory Protection Keys (PKU)
• Control-flow Enforcement Technology (CET)
ARMv8-A mechanisms

Pointer Integrity: memory safety for pointers

Ensure pointers in memory remain unchanged
• Code pointer integrity implies CFI
  • Control-flow attacks manipulate code pointers
• Data pointer integrity
  • Reduces data-only attack surface

    function {
        store return_address
        …                      (adversary writes corrupt_address!)
        load return_address
        verify integrity       (pointer integrity check, PI)
        return
    }

Kuznetsov et al. “Code-Pointer Integrity”, USENIX OSDI 2014
ARMv8.3-A Pointer Authentication

[Figure labels: PAuth, PA key management, PA instructions]

General purpose hardware primitive approximating pointer integrity
• Ensure pointers in memory remain unchanged

Introduced in ARMv8.3-A specification (2016), improved in ARMv8.6-A (2020)
• First compatible processors 2018 (Apple A12 / iOS12)
• Userspace support in Linux 4.21, enhancements in 5.0, in-kernel support in 5.7
• Instrumentation support in GCC 7.0 (-msign-return-address, deprecated in GCC 9.0; -mbranch-protection=pac-ret[+leaf] in GCC 9.0 and newer)

ARM. Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile. Version E.a. July 2019
ARM. Developments in the Arm A-Profile Architecture: Armv8.6-A. September 2019
ARMv8.3-A PA – PAC Generation

Adds Pointer Authentication Code (PAC) into unused bits of pointer
• Keyed, tweakable MAC from pointer address and 64-bit modifier
• PA keys protected by hardware, modifier decided where pointer created and used

Pointer layout (most significant bits first):

    8 bits          tag/PAC
    1 bit           reserved
    3 – 23 bits     sign ext./PAC
    VA_SIZE bits    virtual address (A_P)

PAC = H_K(A_P, M)
• A_P (pointer address) and M (64-bit modifier) are held in general purpose registers
• K (PA key) is held in a configuration register

ARM. Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile. Version E.a. July 2019
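The sign-then-authenticate flow can be sketched in software. The code below is a toy model only: toy_mac is a non-cryptographic mixing function standing in for the hardware MAC, the layout is simplified to a 48-bit address with a 16-bit PAC in the top bits (the real layout reserves bits as shown above), and toy_pac / toy_aut are hypothetical names, not Arm intrinsics.

```c
#include <assert.h>
#include <stdint.h>

enum { VA_BITS = 48 };                       /* assumed address width */
#define VA_MASK ((UINT64_C(1) << VA_BITS) - 1)

/* splitmix64 finalizer, used here only as a stand-in mixing function. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Toy H_K(A_P, M): 16-bit tag from address, modifier and key. */
static uint64_t toy_mac(uint64_t addr, uint64_t modifier, uint64_t key)
{
    return mix64(mix64(addr ^ key) + modifier) >> VA_BITS;
}

/* Like PAC*: place the tag in the unused upper bits of the pointer. */
static uint64_t toy_pac(uint64_t ptr, uint64_t modifier, uint64_t key)
{
    assert((ptr & ~VA_MASK) == 0);
    return ptr | (toy_mac(ptr, modifier, key) << VA_BITS);
}

/* Like AUT*: recompute the tag and compare. On success stores the stripped
 * pointer in *out and returns 1, otherwise returns 0. */
static int toy_aut(uint64_t signed_ptr, uint64_t modifier, uint64_t key,
                   uint64_t *out)
{
    uint64_t addr = signed_ptr & VA_MASK;

    if ((signed_ptr >> VA_BITS) != toy_mac(addr, modifier, key))
        return 0;
    *out = addr;
    return 1;
}
```

In this model the key is an ordinary argument; on real hardware it lives in a configuration register that the signing code cannot read.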
ARMv8.3-A PA – Key management and instructions

Keys for PAC generation and verification

    APIAKey_EL1    Key A for instruction address PACs
    APIBKey_EL1    Key B for instruction address PACs
    APDAKey_EL1    Key A for data address PACs
    APDBKey_EL1    Key B for data address PACs
    APGAKey_EL1    Key for generic authentication

PA Instructions

    PAC<i|d><a|b> <Xd> <Xm>    Add PAC to address in Xd using modifier in Xm
    AUT<i|d><a|b> <Xd> <Xm>    Authenticate address in Xd using modifier in Xm
    PACGA <Xd> <Xn> <Xm>       Calculate generic PAC for data in Xn using modifier in Xm
    XPAC<i|d> <Xd>             Strip PAC for address in Xd
    BRA<a|b> <Xn> <Xm>         Branch to address in Xn after authenticating it with modifier in Xm
    BLRA<a|b> <Xn> <Xm>        As BRA but perform branch with link
    RETA<a|b>                  Authenticate address in LR with SP as modifier and return
    ERETA<a|b>                 Authenticate address in ELR with SP as modifier and exception return
    LDRA<a|b> <Xt> <Xn>        Authenticate address in Xn using modifier zero and load value to Xt

BRA, BLRA, RETA and ERETA operate on instruction keys only; LDRA operates on data keys only.

ARM. Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile. Version E.a. July 2019
PA-based protection schemes

PA instructions are primitives, assembled to form protection schemes

Two main components:
• When are pointers “PACed” and “unPACed”?
• Which modifier is used at a given point?

What should the modifier be for a given pointer?
• For security: using many different modifiers makes replay attacks harder
• For functionality: large numbers of modifiers are hard to keep track of
Example: -msign-return-address
Deployed in GCC 7.0 and LLVM/Clang 7.0

Function return address protection:

    func {
        pacia LR, SP     ; generate PAC for the return address (ia key)
        str LR           ; PAC + return address stored on the stack
        …
        ldr LR           ; PAC? + return address loaded from the stack
        autia LR, SP     ; verify PAC (ia key)
        ret
    }

pacia – add PAC
autia – authenticate

Risk of PAC reuse!

Qualcomm “Pointer Authentication on ARMv8.3”, whitepaper 2017
PA return address protection as a canary

The signed return address effectively is a canary:
• Any overflow that corrupts the return address is detected

More powerful than -fstack-protector canaries:
• Does not require reference value
• Can be bound to contextual information (e.g., the SP value)
• Protects return address against arbitrary writes

Also has similar weaknesses:
• Existing return addresses can be reused

[Figure: stack frame with the buffer at SP and the frame record (frame pointer, signed return address) at FP.]

Liljestrand et al, Protecting the stack with PACed canaries. SysTEX@SOSP 2019
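PA only approximates fully-precise pointer integrity
Adversary may reuse PACs

    /* func1() */
    brl %func1

    /* func2() */
    brl %func2

    func1 {
        pacia LR, SP
        str LR
        …
    }

    func2 {
        pacia LR, SP
        str LR
        …
        ldr LR
        autia LR, SP
        ret
    }

pacia – add PAC
autia – authenticate

[Figure: stack addresses ..ab08 to ..ab50 with SP at ..ab20. The func1 and func2 stack frames occupy the same stack location and hold PAC+ret_address_1 and PAC+ret_address_2 respectively.]

When both functions are entered with the same SP value, both return addresses are signed with the same modifier, so a signed return address saved by one function still authenticates when substituted into the frame of the other.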
PARTS

Modifier: based on pointer type
• Assigned at compile-time based on C type
• “this pointer really points to this type of data or function”

On-use or on-load authentication
• Branching with combined auth+branch instruction (blraa)
• Iterating an array uses only one authentication

PACed only on pointer creation!

    // ptr = …
    …
    mov Xmod, #type_id
    pacia Xptr, Xmod

Authenticated on load:

    // *ptr
    …
    ldr Xptr, [Addr]
    mov Xmod, #type_id
    autda Xptr, Xmod
    ldr Xd, [Xptr]

Authenticated on use:

    // ptr();
    …
    mov Xmod, #type_id
    blraa Xptr, Xmod
    …

pacda – add PAC with data A-key
autda – authenticate
pacia – add PAC with instr A-key
blraa – authenticate and branch

Liljestrand et al. PAC it up: Towards Pointer Integrity using ARM Pointer Authentication USENIX Security (2019)
Authenticated Call Stack: high-level idea

Chained MAC of authentication tokens cryptographically bound to return addresses
• Provides modifier (auth) bound to all previous return addresses on the call stack
• Statistically unique to control-flow path
  • prevents reuse
  • allows precise verification of returns
• Latest token kept in a dedicated register

    auth_0 = H_K(ret_0, 0)
    auth_1 = H_K(ret_1, auth_0)
    …
    auth_n = H_K(ret_n, auth_{n-1})

Liljestrand et al. PACStack: an Authenticated Call Stack. Usenix Security (2021)
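The chaining rule auth_n = H_K(ret_n, auth_{n-1}) can be sketched as a simple fold over the return addresses on the call stack. This is an illustration under stated assumptions, not the PACStack implementation: toy_hk is a non-cryptographic stand-in for the keyed MAC, tokens are kept as full 64-bit values instead of b-bit PACs, and acs_chain is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* splitmix64 finalizer, used only as a stand-in mixing function. */
static uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= UINT64_C(0xbf58476d1ce4e5b9);
    x ^= x >> 27; x *= UINT64_C(0x94d049bb133111eb);
    x ^= x >> 31;
    return x;
}

/* Toy H_K(ret, prev_auth): binds a return address to the previous token. */
static uint64_t toy_hk(uint64_t key, uint64_t ret, uint64_t prev_auth)
{
    return mix64(mix64(ret ^ key) + prev_auth);
}

/* Computes the token for the innermost of n nested calls:
 * auth_0 = H_K(ret_0, 0), auth_i = H_K(ret_i, auth_{i-1}). */
static uint64_t acs_chain(uint64_t key, const uint64_t *rets, size_t n)
{
    uint64_t auth = 0;

    assert(rets != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        auth = toy_hk(key, rets[i], auth);
    return auth;
}
```

Because each token depends on the previous one, two call paths that differ in any return address end up with different tokens in this model, which is the property that prevents reuse of a signed return address from another path.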
Mitigation of hash-collisions: PAC masking

• Challenge: PAC collisions occur on average after 1.253 · 2^(b/2) return addresses
  • For b = 16: n = 321 addresses
• Solution: Prevent recognizing collisions by masking each auth
  • pseudo-random mask generated using pacib(0x0, auth_{i-1})

Maximum probability of success for different attacks:

    Attack                              w/o Masking    w/ Masking
    Reuse previous auth collision       1              2^-b
    Guess auth to existing call-site    2^-b           2^-b
    Guess auth to arbitrary address     2^-2b          2^-2b

Liljestrand et al. PACStack: an Authenticated Call Stack. Usenix Security (2021)
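The collision figure follows from the standard birthday-bound approximation for the expected number of uniformly random b-bit values drawn before the first collision. The derivation below assumes PACs behave like uniformly random b-bit values, which is an idealization:

```latex
n \approx \sqrt{\frac{\pi}{2}\, 2^{b}} \approx 1.253 \cdot 2^{b/2},
\qquad
b = 16:\quad n \approx 1.253 \cdot 2^{8} = 1.253 \cdot 256 \approx 321
```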
ARMv8.5-A Memory Tagging Extension

[Figure labels: MTE, address tags, allocation tags]

Ensures memory accesses are safe by comparing tag in pointer with tag in memory
• Can prevent sequential buffer overflows, and (with high probability) other memory errors

Introduced in ARMv8.5-A specification (announced in 2018), no hardware currently
• Userspace support in Linux 5.10, to be enabled via PROT_MTE flag in mmap()
• Stack tagging in LLVM 9.0, heap tagging support planned
• Experimental support in Android 11 via LLVM’s Scudo memory allocator

ARM, Armv8.5-A Memory Tagging Extension, whitepaper 2019
ARM, Opensource support for Armv8.5-A Memory Tagging Extension, blog post 2019
ARMv8.5-A MTE

Address tags stored in top 4-bits of a pointer
• uses existing top-byte ignore (TBI) feature

Allocation tags stored transparently by hardware and cached
• 4-bit tag per 16-byte granule of memory

Mismatch between tags reported either:
• synchronously (precise check during testing), or
• asynchronously (imprecise checks after deployment)

ARM, Armv8.5-A Memory Tagging Extension, whitepaper 2019
ARM, Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile, Version F.c. July 2020
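The tag comparison can be modelled in software. The sketch below assumes a tiny 256-byte memory, one 4-bit allocation tag per 16-byte granule, and the address tag stored in bits 56 to 59 of a 64-bit pointer; tagged_mem, make_ptr, tag_region and tag_check are hypothetical names, and on real hardware the check is performed by the load and store instructions themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GRANULE = 16, MEM_SIZE = 256, TAG_SHIFT = 56 };

struct tagged_mem {
    unsigned char data[MEM_SIZE];
    unsigned char tags[MEM_SIZE / GRANULE];   /* allocation tag per granule */
};

/* Build a pointer (here: an offset into the model) carrying an address tag. */
static uint64_t make_ptr(uint64_t offset, unsigned tag)
{
    return ((uint64_t)(tag & 0xf) << TAG_SHIFT) | offset;
}

/* Set the allocation tag of every granule in [off, off + len). */
static void tag_region(struct tagged_mem *m, uint64_t off, size_t len, unsigned tag)
{
    assert(off % GRANULE == 0 && len % GRANULE == 0 && off + len <= MEM_SIZE);
    for (size_t g = off / GRANULE; g < (off + len) / GRANULE; g++)
        m->tags[g] = tag & 0xf;
}

/* Returns 1 if an access through ptr passes the tag check, 0 on mismatch. */
static int tag_check(const struct tagged_mem *m, uint64_t ptr)
{
    uint64_t off = ptr & ((UINT64_C(1) << TAG_SHIFT) - 1);
    unsigned ptr_tag = (unsigned)(ptr >> TAG_SHIFT) & 0xf;

    assert(off < MEM_SIZE);
    return m->tags[off / GRANULE] == ptr_tag;
}
```

If two adjacent allocations carry different tags, a pointer to the first one passes the check for all 16 bytes of its granule and fails as soon as it is advanced into the neighbouring granule, which is how a sequential overflow is caught.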
Example: Stack Tagging

• Choose random tag on function entry
• For each slot in stack frame choose tag at an offset to initial tag
• Accesses using immediate offset from SP are unchecked

    int main(int argc, char *argv[])
    {
        ...
        doit(argv[1]);
        ...
        return 0;
    }

    void doit(char *str)
    {
        char buf[16];
        char *ptr = buf;

        strcpy(ptr, str);
        puts(ptr);
    }

    void strcpy(char *str)
    {
        ...
    }

[Figure: stack (grows down) with the main, doit and strcpy stack frames.]

doit stack frame (each address is the start of the cell beside it):

    0xffffceb8  main stack frame
    0xffffceb0  return address: 0x08048482 (saved LR)
    0xffffcea8  frame pointer:  0xffffceb8 (saved FP)
    0xffffcea0  ptr:            0xffffce88 (&buf)
    0xffffce98  (unlabelled slot)
    0xffffce88  buf (16 bytes)
                strcpy stack frame
LLVM MemTagSanitizer

Random base tag for each stack frame
• Slots sequentially tagged to minimize tag book-keeping
• Uses Stack Safety Analysis to optimize instrumentation

Globals tagging requires loader support to assign initial tags
Heap tagging planned via the new secure Scudo allocator

Provides:
• Deterministic prevention of sequential overflows
• Probabilistic detection of use-after-free and non-sequential out-of-bounds
  • In the general case: 1 − 2^-4 ≈ 0.94 chance of detection

E. Stepanov et al., Memory tagging in LLVM and Android, LLVM Developers’ Meeting (2020)
LLVM, MemTagSanitizer, online documentation
LLVM, Stack Safety Analysis, online documentation
LLVM Stack Safety Analysis

Introduced by Kuznetsov et al. but also used to optimize MTE instrumentation

Memory safety loosely defined as:
A memory object is safe, if all pointers derived from it are guaranteed to only access the memory object itself.

Does not preclude safe object corruption
• by unrelated unsafe memory accesses
• by within-allocation memory corruption

Algorithm finds access range of pointers
• If range is within allocated memory, then allocation is provably safe
• Local analysis
  • Determines local use ranges for allocations and function arguments
• Global analysis
  • Merges ranges from function arguments
  • Runs until fixed point reached

Pointers in memory assumed unsafe!

V. Kuznetsov et al., Code-pointer integrity, OSDI (2014)
R. Gil et al., There’s a hole in the bottom of the C: on the effectiveness of allocation protection, SecDev (2018)
MTE (and MemTagSanitizer) challenges

Tags are corruptible
• Random tags prevent hard-coding
• Adversary can inject tagged pointers
• Safe memory tags always known!
• Guessing probability 2^-4 with short 4-bit tags

Analysis is not MTE aware
• Assume pointers in memory are unsafe

Unclear security properties
• Probabilistic, but sometimes not
• No hard guarantees with tag corruption

LLVM, MemTagSanitizer, online documentation
Our goal

Prevent tag forgery
• Enforce that any pointer loaded from unsafe memory is recognized as unsafe

Leverage MTE-awareness in safety analysis
• Introduce MTE-specific protected domain (in addition to safe / unsafe domains)
• Prove/make a larger set of allocations safe (better optimization)

Provide clearer security guarantees
• Allow programmer to indicate variables that must remain safe
• Provide concrete guarantees for variables designated as safe by the compiler
Preventing tag corruption

Tags can be enforced to achieve hard guarantees!

Always set one tag-bit on load from memory
• Prevents injection of “safe” tags (costs one tag-bit)

Alternatively use ARM Pointer Authentication
• Probabilistic but does not reserve one tag-bit

    // Load ptr1 from ptr2
    char *ptr1 = *ptr2;
    // Apply ptr2 tag bits to ptr1
    ptr1 = ptr1 | (tag & ptr2);

➔ pointers in safe memory can remain uncorrupted!
MTE-aware analysis

MTE can be used to prevent sequential overflows
• Surround with different tag (either other allocation or dedicated memory guard)

Can be treated as if memory safe: new protected memory safe domain
• violation will lead to crash
• tagging of allocation itself can be omitted

Analysis checks that for any sequential access
• start_range ⊆ allocated_memory
• max_step < memory_guard_size

[Figure: doit stack frame between the main and strcpy stack frames, holding the return address 0x00..08048482 (saved LR), the frame pointer 0x00..ffffceb8 (saved FP), ptr = 0x10..ffffce88 (&buf), buf, and a memory guard placed directly below buf.]
Analysis of in-memory pointers

Pointer safety requires storage location is safe
• Must prove safety of pointer within storage
• In addition to allocation-based safety
• Must find all subsequent loads of pointer
• Requires points-to analysis in general case
• Non-linear data-flow through globals / heap

    struct s { char buff[32]; char *ptr; };
    // sizeof(struct s) = 40

    struct s store;                     // store is safe
    struct s DATA;
    store.ptr = &DATA;                  // Is store pointer safe?
    for (i = 0; i < 40; ++i)
        store.buff[i] = get_char();
    g_ptr = store.ptr;                  // store in global: where is &DATA used?
    char *c = &store;
    func(store.ptr);

R. Gil et al., There’s a hole in the bottom of the C: on the effectiveness of allocation protection, SecDev (2018)
MTE-aware analysis with in-memory pointers

Conservative approximation of pointer-safety
• Check in-allocation bounds based on type
• Assumes non-local stores unsafe
• Assumes non-typed use is unsafe

Allows lightweight data-flow analysis
• Without dependence on full points-to analysis

    struct s { char buff[32]; char *ptr; };
    // sizeof(struct s) = 40

    struct s store;
    struct s DATA;
    store.ptr = &DATA;                  // Is store pointer safe?
    for (i = 0; i < 40; ++i)
        store.buff[i] = get_char();     // &store.buff out of bounds
    g_ptr = store.ptr;                  // store in global: &DATA leaked to global
    char *c = &store;                   // type information lost
    func(store.ptr);                    // need to merge function use with DATA use
ARMv8.5-A Branch Target Identification

[Figure label: BTI]

Hardware-assisted CFI similar to Intel CET Indirect Branch Tracking

Introduced in ARMv8.5-A specification (2018)
• Support in Linux 5.8
• Instrumentation support in GCC 9.0 (-mbranch-protection=standard|bti)

ARM. Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile. Version E.a. July 2019
ARMv8.5-A BTI

Indirect branches to guarded code regions require marker instructions
• compiler places marker instructions at potential indirect branch targets
• two classes of targets: calls and jumps (RET instructions not restricted by BTI)

Branch sources

    BTI call type branches
    BLR …                      Indirect function calls
    BR <x16|x17>               PLT entries and tail calls

    BTI jump type branches
    BR … (except x16|x17)      Branches to jump tables

BTI Marker Instructions

    BTI <c|j|cj>               Branch Target Identification for c=calls, j=jumps, cj=calls or jumps
    BRK                        Breakpoint Instruction
    HLT                        Halting breakpoint
    PACIASP / PACIBSP          Create PAC for Instruction address in LR using key A/B and SP as modifier

ARM. Arm® Architecture Reference Manual Armv8, for Armv8-A architecture profile. Version E.a. July 2019
Taxonomy of Defenses

[Figure: the taxonomy of defenses shown earlier in the deck, annotated with the ARMv8-A mechanisms. MTE and PAuth are marked at the memory vulnerability / Memory-safety stage, BTI at the exploit dispatch stage.]

Adapted from Szekeres et al., SoK: Eternal War in Memory, IEEE SP (2013)
Intel x86_64 mechanisms

Intel Memory Protection Extension

Bound
Bound
MPX Check
Directory
ISA

Run-time checks for memory accesses to detect pointer bounds violations

Deployed in SkyLake microarchitecture (2015)


• Support in Linux 3.9, removed in 5.6
• Instrumentation support in GCC 5.0, removed in 9.0 (-fcheck-pointer-bounds)

Intel. Intel® 64 and IA-32 Architectures Software Developer Manuals. Volume 1. Chapter 17. May 2019

Intel MPX – Example

struct obj { char buf[100]; int len; };

struct obj *a[10];
for (i = 0; i < M; i++) {
    total += a[i]->len;
}

Lowered to pointer operations:

obj* a[10]                     // Array of pointers to objs
total = 0
for (i=0; i<M; i++):
    ai = a + i                 // Pointer arithmetic on a
    objptr = load ai           // Pointer to obj at a[i]
    lenptr = objptr + 100      // Pointer to obj.len
    len = load lenptr
    total += len               // Total length of all objs

Intel MPX – Bound Check Instructions

MPX Registers
BND0 – BND3                    Bound registers, each storing a 64-bit LowerBound (LB) and a 64-bit UpperBound (UB)

MPX Instructions
BNDMK <Reg> <Addr> <offset>    Create LB (Addr) and UB (Addr + offset) in Reg

Bound Check Instructions
BNDCL <Reg> <Addr>             Check Addr against LB in Reg
BNDCU <Reg> <Addr>             Check Addr against UB in Reg (UB in 1's complement form)
BNDCN <Reg> <Addr>             Check Addr against UB in Reg (UB not in 1's complement form)

Bound Management Instructions
BNDMOV <Reg> <Reg|Addr>        Copy LB and UB from Reg or Addr to Reg
BNDMOV <Addr> <Reg>            Store LB and UB from Reg to Addr
BNDLDX <Reg> <SIB>             Load LB and UB from bound directory
BNDSTX <SIB> <Reg>             Store LB and UB to bound directory

Intel. Intel® 64 and IA-32 Architectures Software Developer Manuals. Volume 1. Chapter 17.4. May 2019
Intel MPX – Example

struct obj { char buf[100]; int len; };

struct obj *a[10];
for (i = 0; i < M; i++) {
    total += a[i]->len;
}

With MPX bound checks inserted:

obj* a[10]                         // Array of pointers to objs
a_b = bndmk a, a+79                // Make bounds [a, a+79]
total = 0
for (i=0; i<M; i++):
    ai = a + i                     // Pointer arithmetic on a
    bndcl a_b, ai                  // LowerBound check of a[i]
    bndcu a_b, ai+7                // UpperBound check of a[i]
    objptr = load ai               // Pointer to obj at a[i]
    objptr_b = bndldx ai           // Bounds for pointer at a[i]
    lenptr = objptr + 100          // Pointer to obj.len
    bndcl objptr_b, lenptr         // Check LowerBound of obj.len
    bndcu objptr_b, lenptr+3       // Check UpperBound of obj.len
    len = load lenptr
    total += len                   // Total length of all objs

Oleksenko et al. Design of Intel MPX. Web. 2018.


Oleksenko et al. Intel MPX Explained: A Cross-layer Analysis of the Intel MPX System Stack. SIGMETRICS ’18
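
The bound checks in the MPX example can be imitated in plain C to see what the instrumentation enforces. The following is only a software sketch under stated assumptions: `struct bounds`, `make_bounds`, `in_bounds` and `checked_total` are hypothetical names, not MPX intrinsics, and a failed check returns -1 where real MPX would raise a #BR exception. The per-object bounds that `bndldx` would load from the bound table are left out for brevity.

```c
#include <stddef.h>
#include <stdint.h>

struct obj { char buf[100]; int len; };

/* Hypothetical software stand-in for one MPX bound register. */
struct bounds {
    uintptr_t lb; /* lowest valid address */
    uintptr_t ub; /* highest valid address (inclusive) */
};

/* Models bndmk: bounds covering [base, base + size - 1]. */
static struct bounds make_bounds(const void *base, size_t size)
{
    struct bounds b;
    b.lb = (uintptr_t)base;
    b.ub = (uintptr_t)base + size - 1;
    return b;
}

/* Models bndcl followed by bndcu for an access of len bytes at addr.
 * Returns 1 if the access is in bounds; real MPX raises #BR otherwise. */
static int in_bounds(struct bounds b, uintptr_t addr, size_t len)
{
    return addr >= b.lb && addr + len - 1 <= b.ub;
}

/* Sums a[i]->len for i < m, where the array a has n elements.
 * Returns -1 as soon as an access to a[i] fails its bounds check. */
static long checked_total(struct obj **a, size_t n, size_t m)
{
    struct bounds a_b = make_bounds(a, n * sizeof *a);
    long total = 0;
    for (size_t i = 0; i < m; i++) {
        uintptr_t ai = (uintptr_t)a + i * sizeof *a;
        if (!in_bounds(a_b, ai, sizeof *a))
            return -1; /* a[i] lies outside the array */
        total += a[i]->len;
    }
    return total;
}
```

With a 10-element array, `checked_total(a, 10, 10)` sums all lengths, while `checked_total(a, 10, 11)` stops at the out-of-bounds element instead of dereferencing it.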
Intel MPX – Bound Directory

MAWA: MPX address-width adjust

Bound directory size: 2^(1+MAWA) GBytes
Intel. Intel® 64 and IA-32 Architectures Software Developer Manuals. Volume 1. Chapter 17.4.3.1 May 2019
Intel MPX – Limitations

Study by Oleksenko et al. identified the following limitations:

• overhead comparable to software-based approaches (up to 4x slowdown, ~50% overhead on average)
• no protection against temporal memory safety errors (e.g. use-after-free)
• no support for multithreading, can lead to unsafe data races between threads
• no support for common C/C++ idioms due to memory layout restrictions
• conflicts with other ISA extensions (Intel TSX, SGX)
• instrumentation incurs significant performance penalty (> 15%) even if MPX is not available

Susceptible to Bounds Check Bypass via a Meltdown-type transient execution attack (Meltdown-BR)

• exploits lazy handling of the raised bound range (#BR) exception

Oleksenko et al. Intel MPX Explained: A Cross-layer Analysis of the Intel MPX System Stack. SIGMETRICS ’18
Canella et al. A Systematic Evaluation of Transient Execution Attacks and Defenses. USENIX Sec ’19



Adapting MPX for kernel code

Bound Directory cannot be used in kernel code


• kernel cannot handle page faults at arbitrary points within its own execution
• pre-allocating the bound directory and bound tables not feasible due to memory overhead
 - bound directory for 64-bit kernel is 2^28 64-bit entries = 2 GBytes
 - each bound table is 2^17 32-byte entries = 4 MBytes
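
The two sizes follow directly from entry count times entry size. A minimal sketch of the arithmetic, where `table_bytes` is a hypothetical helper name introduced only for this illustration:

```c
#include <stdint.h>

/* Size in bytes of a table with 2^log2_entries entries of entry_bytes each. */
static uint64_t table_bytes(unsigned log2_entries, uint64_t entry_bytes)
{
    return (UINT64_C(1) << log2_entries) * entry_bytes;
}
```

`table_bytes(28, 8)` gives 2^31 bytes (2 GBytes) for the bound directory, and `table_bytes(17, 32)` gives 2^22 bytes (4 MBytes) for each bound table.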

Solution: dynamically determine pointer bounds using existing kernel metadata

https://ssg.aalto.fi/research/projects/kernel-hardening/
Reshetova et al. Towards Linux Kernel Memory Safety. Software: Practice and Experience, Volume 48, Issue 12. 2018
Intel Protection Keys for Userspace (PKU)

User-level memory access-control mechanism at page granularity


• Associates each memory page with a 4-bit protection key kept in page table entry
• Access control rules for protection keys maintained by userspace code in PKRU register

Deployed in Skylake microarchitecture (server configuration, 2015)


• Support in Linux 4.6 (known as MPK)
• Userspace support in GNU C Library (glibc) 2.27, GCC 5.3 (pkey_mprotect)

Intel. Intel® 64 and IA-32 Architectures Software Developer Manuals. Volume 1. Chapter 2.7., 4.6.2 May 2019
Intel PKU – Instructions

PKU Registers
PKRU       Protection Key Rights Register for User pages; holds two bits per protection key: access disable (AD) and write disable (WD)

PKU Instructions
RDPKRU     Read PKRU value to EAX
WRPKRU     Write EAX value to PKRU

Intel. Intel® 64 and IA-32 Architectures Software Developer Manuals. Volume 1. Chapter 2.7., 4.6.2 May 2019
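
The PKRU layout can be made concrete with a small model. PKRU holds one access-disable (AD) and one write-disable (WD) bit for each of the 16 protection keys, with AD at bit 2i and WD at bit 2i+1 for key i. The helper names below are hypothetical, and the sketch models only the PKRU check for user-mode data accesses; it ignores the ordinary page-table permissions that apply in addition.

```c
#include <stdint.h>

/* Returns 1 if PKRU permits data reads from pages tagged with key (0..15). */
static int pkru_allows_read(uint32_t pkru, unsigned key)
{
    return !((pkru >> (2 * key)) & 1u); /* AD bit clear */
}

/* Returns 1 if PKRU permits data writes to pages tagged with key (0..15). */
static int pkru_allows_write(uint32_t pkru, unsigned key)
{
    return ((pkru >> (2 * key)) & 3u) == 0; /* neither AD nor WD set */
}

/* Returns a copy of pkru with the AD and WD bits of key replaced. */
static uint32_t pkru_set(uint32_t pkru, unsigned key, int ad, int wd)
{
    uint32_t mask = 3u << (2 * key);
    uint32_t bits = ((ad ? 1u : 0u) | (wd ? 2u : 0u)) << (2 * key);
    return (pkru & ~mask) | bits;
}
```

For example, `pkru_set(0, 3, 0, 1)` sets only the WD bit of key 3, making pages with that key read-only for the current thread. On real hardware the new value would be installed with WRPKRU, which is why the access-control rules can be changed from userspace without a system call.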
Intel Control-flow Enforcement Technology

CET components: Shadow Stack, Indirect Branch Tracking

Hardware-assisted Control-Flow Integrity (CFI) to prevent control-flow hijacking

Deployed in Tiger Lake microarchitecture (mobile CPUs, 2020)


• Linux support proposed in 2018 [mem-mgmt, usermode SHSTK, IBT] (currently at v15)
• Runtime support in glibc 2.28, instrumentation support in GCC 8.0 (-fcf-protection)
• Enabled by default in Fedora 28 and Ubuntu 19.10 onwards

Intel. Control-flow Enforcement Technology Specification, Revision 3.0. May 2019


Intel CET – Shadow Stack

Mechanism for protecting return addresses stored on the call stack


• introduces second stack used exclusively for copies of return addresses
• return address popped from both stacks on return and compared

Writes to shadow stack restricted to control-flow and management instructions


• shadow stack pages protected by page table protections (additional “shadow stack” attribute)
• page protection also prevents overflow and underflow of shadow stack

New architectural register: Shadow Stack Pointer


• Cannot be directly encoded as source, destination or memory operand by instructions

Intel. Control-flow Enforcement Technology Specification, Revision 3.0. Chapter 2. May 2019
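
The push-on-call, compare-on-return behaviour described above can be sketched as a small software model. This is an illustration under assumptions, not how CET is implemented: the names `struct shadow_stack`, `shstk_call` and `shstk_ret` and the fixed depth are hypothetical, and -1 return values stand in for the faults that hardware would raise (#CP on a mismatch, a page fault on overflow or underflow).

```c
#include <stddef.h>
#include <stdint.h>

#define SHSTK_DEPTH 64 /* assumed capacity for this model */

struct shadow_stack {
    uintptr_t slots[SHSTK_DEPTH];
    size_t top; /* number of saved return addresses */
};

/* On CALL: save a copy of the return address.
 * Returns 0 on success, -1 if the shadow stack is full. */
static int shstk_call(struct shadow_stack *s, uintptr_t ret_addr)
{
    if (s->top == SHSTK_DEPTH)
        return -1;
    s->slots[s->top++] = ret_addr;
    return 0;
}

/* On RET: pop the saved copy and compare it with the return address
 * found on the ordinary call stack.
 * Returns 0 if they match, -1 on mismatch or underflow. */
static int shstk_ret(struct shadow_stack *s, uintptr_t ret_addr_on_stack)
{
    if (s->top == 0)
        return -1;
    return s->slots[--s->top] == ret_addr_on_stack ? 0 : -1;
}
```

An attacker who overwrites a return address on the ordinary stack cannot also update the saved copy, so the comparison in `shstk_ret` fails.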
Intel CET – Indirect Branch Tracking

Prevents diverting indirect CALL/JMP to invalid targets

• typical attack vector in call/jmp-oriented programming attacks
• achieves only weak CFI guarantees (single class of targets)

Requires indirect JMP / CALL to target specific marker instructions
• compiler places marker at all potential indirect branch targets
• new control-protection exception (#CP) raised otherwise

[Figure: IBT state machine. IDLE moves to WAIT_FOR_ENDBRANCH on an indirect CALL / JMP; the ENDBRANCH marker for the current mode (or legacy mode) returns it to IDLE; any other instruction generates a #CP fault.]

IBT Marker Instructions
ENDBR32    Marker instruction in 32-bit mode
ENDBR64    Marker instruction in 64-bit mode

Intel. Control-flow Enforcement Technology Specification, Revision 3.0. Chapter 3. May 2019
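
The tracking rule amounts to a two-state machine over the stream of executed instructions. The sketch below is a simplified model under assumptions: the enum and function names are hypothetical, instructions are reduced to three kinds, legacy mode is left out, and a return value of -1 stands in for the #CP fault.

```c
enum ibt_state { IBT_IDLE, IBT_WAIT_FOR_ENDBRANCH };
enum insn_kind { INSN_OTHER, INSN_INDIRECT_BRANCH, INSN_ENDBRANCH };

/* Feeds the next executed instruction to the tracker.
 * Returns 0 and updates *st if execution may continue,
 * or -1 if the instruction after an indirect branch is not a marker. */
static int ibt_step(enum ibt_state *st, enum insn_kind insn)
{
    if (*st == IBT_WAIT_FOR_ENDBRANCH && insn != INSN_ENDBRANCH)
        return -1; /* control-protection fault */
    *st = (insn == INSN_INDIRECT_BRANCH) ? IBT_WAIT_FOR_ENDBRANCH : IBT_IDLE;
    return 0;
}
```

Because every marker is accepted after every indirect branch, the model also shows why the guarantee is weak: any marked location is a valid target for any indirect CALL or JMP.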
Taxonomy of Defenses

[Figure: attack stages, with the defenses that apply at each stage]
• Memory vulnerability: out-of-bounds pointer, dangling pointer, format string vulnerability, leading to an unintended read or unintended write. Defense: memory safety.
• ❷ Integrity violation: exfiltrate data, modify code, modify control-data, modify non-control-data. Defenses: code integrity, code-pointer integrity, software compartmentalization.
• Exploit payload: interpret exfiltrated data, inject attacker-controlled code, inject attacker-controlled address, inject attacker-controlled data. Defenses: instruction set randomization, address-space randomization, software diversification.
• ❹ Exploit dispatch: indirect jump to corrupt address, return to corrupt address, use of corrupt data. Defenses: control-flow integrity, data-flow integrity.
• Exploit execution: execute modified code, execute injected code fragment / gadget code fragment, execute data-oriented gadget. Defenses: binary attestation, run-time attestation.
• Outcome: information leak, code-injection attack, control-flow attack, data-oriented attack.
• Hardware mechanisms marked on this slide: MPX (memory vulnerability stage), CET (exploit dispatch stage).

Adapted from Szekeres et al., SoK: Eternal War in Memory, IEEE SP (2013)
Comparison

Intel MPX vs. ARMv8.5-A MTE

                             Intel MPX          ARMv8.5-A MTE
Spatial error protection     ✓                  ✓
Temporal error protection    ✗                  ✓
Enforcement model            Deterministic      Probabilistic (16 classes)
Memory Overhead              High               ?
Run-time Overhead            Moderate to High   ?

Intel CET ShadowStack vs. ARMv8.3-A PA

                             Intel CET Shadow Stack   ARMv8.3-A PA
Return address protection    ✓                        ✓
Indirect branch protection   ✗                        capable*
Data pointer protection      ✗                        capable*
Enforcement model            Deterministic            Probabilistic**
Immune to pointer reuse      ✓                        ✗
Memory Overhead              Low to Moderate          N/A
Run-time Overhead            ? (likely low)           Low

*) Liljestrand et al. PAC It Up: Towards Pointer Integrity using ARM Pointer Authentication. USENIX Security’19
**) Liljestrand et al. PACStack: an Authenticated Call Stack. Usenix Security (2021)
Intel IBT vs. ARMv8.5-A BTI

                             Intel IBT         ARMv8.5-A BTI
Indirect branch protection   ✓ (one class)     ✓ (two classes)
Enforcement model            Deterministic     Deterministic
Memory Overhead              N/A               N/A
Run-time Overhead            ? (likely low)    ? (likely low)

A theory of run-time attacks
von Neumann architecture

Architecture for a stored-program computer

• Realizes (theoretical) concept of universal Turing machine
• Instructions and data stored in memory
• Operates by changing internal state, i.e., instructions read and modify some data.

With a large addressable memory, different memory types (e.g. SRAM, DRAM, flash etc.) and I/O map onto single memory space

[Figure: Computer (circa 1945) and Computer (circa 2020). Central Processing Unit containing a Control Unit (clock, configuration regs, I/O), an Arithmetic / Logic Unit (math) and registers (PC, CIR, AC, MAR, MDR); Memory, L2 Memory, L3 Memory; Input Device and Output Device.]

Programs as intended finite state machines

Design of program $p$ can be modeled as (potentially very large) finite state machine†,‡
• The intended finite state machine (IFSM) describes the intended function of $p$
• To execute the IFSM on real-world computers, $p$ is realized as a software emulator for the IFSM

$\theta = (Q, i, F, \Sigma, \Delta, \delta, \sigma)$ §

The IFSM represents a bug-free version of $p$

$p$ is a (potentially faulty) emulator for the IFSM

$p$ runs on a processor $cpu$

[Figure: example IFSM with states OPENED and CLOSED and transitions "close" and "open"; annotations mark a state, a transition and a transition condition.]

†) or a finite state transducer if output is possible
‡) non-equivalence of FSM/FST to a Turing machine does not matter as any real-world computing device has finite memory
§) $Q$ = set of states, $i$ = initial state, $F$ = set of final states, $\Sigma, \Delta$ = input and output alphabets, state transition function $\delta: Q \times \Sigma \to Q$, output function $\sigma: Q \times \Sigma \to \Delta$
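
The door example from the figure can be written down as a transition function $\delta: Q \times \Sigma \to Q$. This is a minimal sketch: the enum and function names are hypothetical, and the choice that inputs with no matching edge leave the state unchanged is an assumption, since the slide shows only the "open" and "close" edges.

```c
enum door_state { CLOSED, OPENED };   /* Q */
enum door_input { IN_OPEN, IN_CLOSE }; /* Sigma */

/* delta: Q x Sigma -> Q for the two-state door IFSM. */
static enum door_state door_delta(enum door_state q, enum door_input in)
{
    if (q == CLOSED && in == IN_OPEN)
        return OPENED;
    if (q == OPENED && in == IN_CLOSE)
        return CLOSED;
    return q; /* assumption: inputs without an edge are ignored */
}
```

A program $p$ that implements this function on a real processor is the "emulator" in the sense of the slide: its variables and memory encode the current IFSM state.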
cpu states

$Q_{cpu} = Q_{cpu}^{IFSM} \cup Q_{cpu}^{trans}$

$Q_{cpu}^{IFSM}$: concrete states of target machine that map to a state in the IFSM

$Q_{cpu}^{trans}$: benign transitory states that occur during emulation of an edge in the IFSM; part of intended transitions

[Figure: the sets $Q_{cpu}^{trans}$ and $Q_{cpu}^{IFSM}$ that make up $Q_{cpu}$]

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Two different perspectives of $\theta$

[Figure: two views of the same execution.
 View 1: program $p$ moves through State 1, State 2, State 3, State 4 and State 5, one instruction per step, with data arriving as input along the way.
 View 2: the data is the user's "program" and program $p$ is "data" from the user's perspective; only State 1, State 4 and State 5 are shown, reached on input.]


T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Two different perspectives of $\theta$

[Figure: two views of the same execution.
 View 1: program $p$ moves through State 1, State 2, State 3, State 4 and State 5, one instruction per step, with data arriving as input along the way.
 View 2: the data is the attacker's "program" and program $p$ is "data" from the attacker's perspective; State 1 is followed by two states marked "State ?", reached on input.]


T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
What is a “weird state”?

$Q_{cpu} = Q_{cpu}^{IFSM} \cup Q_{cpu}^{trans} \cup Q_{cpu}^{weird}$

$Q_{cpu}^{IFSM}$: concrete states of target machine that map to a state in the IFSM

$Q_{cpu}^{trans}$: benign transitory states that occur during emulation of an edge in the IFSM; part of intended transitions

$Q_{cpu}^{weird}$: set of states in $Q_{cpu}$ that are neither in $Q_{cpu}^{IFSM}$ nor in $Q_{cpu}^{trans}$

Weird states arise unintentionally and have no meaningful interpretation in the IFSM

[Figure: the sets $Q_{cpu}^{trans}$, $Q_{cpu}^{IFSM}$ and $Q_{cpu}^{weird}$ that make up $Q_{cpu}$]

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Reaching a weird state

$q_{init} \in Q_{cpu}^{weird}$

$q_i \in Q_{cpu}^{IFSM} \cup Q_{cpu}^{trans}$

Intuitively: a bug has occurred when cpu enters a weird state

[Figure: transition from a state $q_i$ in $Q_{cpu}^{IFSM} \cup Q_{cpu}^{trans}$ into the state $q_{init}$ in $Q_{cpu}^{weird}$]

Vulnerability
• method of moving $p$ to a weird state (accessible to attacker)

Exploitation; run-time attack
• process of choosing $q_i$, entering $q_{init}$ and programming resulting “weird machine” in order to violate security properties of the IFSM
T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
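
A tiny example shows how a single bit flip can move an emulator from a state $q_i$ that maps to the IFSM into a weird state $q_{init}$. The sketch assumes a hypothetical two-state IFSM (a door that is CLOSED or OPENED) whose current state the emulator keeps in one byte; the encodings and function names are illustrative, and transitory states are ignored.

```c
#include <stdint.h>

/* Hypothetical encoding the emulator p uses for the two IFSM states. */
#define ENC_CLOSED 0u
#define ENC_OPENED 1u

/* Returns 1 if the concrete memory cell maps to a state of the IFSM,
 * 0 if it has no interpretation in the IFSM (a weird state). */
static int maps_to_ifsm_state(uint8_t cell)
{
    return cell == ENC_CLOSED || cell == ENC_OPENED;
}

/* Attacker primitive: flip one chosen bit (0..7) of the cell. */
static uint8_t flip_bit(uint8_t cell, unsigned bit)
{
    return (uint8_t)(cell ^ (1u << bit));
}
```

Flipping bit 0 of `ENC_CLOSED` yields `ENC_OPENED`, which still maps to an IFSM state. Flipping bit 1 yields the value 2, which maps to no IFSM state: whatever the emulator does next with that value is a transition of the weird machine, not of the IFSM.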
Weird machines

Recall: $\theta = (Q, i, F, \Sigma, \Delta, \delta, \sigma)$, where $Q$ = set of states, $i$ = initial state, $F$ = set of final states, $\Sigma, \Delta$ = input and output alphabets, state transition function $\delta: Q \times \Sigma \to Q$, output function $\sigma: Q \times \Sigma \to \Delta$

A weird machine is a computational device where IFSM transitions operate on weird states

$\theta_{weird} = (Q_{cpu}^{weird}, q_{init}, Q_{cpu}^{IFSM} \cup Q_{cpu}^{trans}, \Sigma', \Delta', \delta', \sigma')$

Instruction stream depends on input
• weird machine programmed through carefully crafted input to $p$ once $q_{init}$ has been entered
Emergent instruction set
• attacker (programmer of the weird machine) must discover the (often unwieldy) semantics of instructions
Unknown state space
• depends heavily on $p$ and $q_{init}$
Unknown computational power
• greater complexity of the IFSM may yield greater number of instructions, but whether or not the instructions are usable is difficult to predict

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Possible sources of weird states

Human error when program $p$ is developed

• Memory-related errors, e.g.,
 - spatial errors (buffer overflows)
 - temporal errors (use-after-free)
• Logic errors, e.g., integer overflow

Hardware faults when $p$ is executed

• Probabilistically deterministic hardware
• Fault injection, e.g., Rowhammer

Transcription errors when $p$ is transmitted over error-prone medium
• Hardware failure, e.g., hard drive

[Figure: Central Processing Unit containing a Control Unit (clock, configuration regs, I/O), an Arithmetic / Logic Unit (math), a pipeline and registers (PC, SP, general-purpose registers); memory holding instructions and data; input and eventual output.]

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Y. Kim et al., Flipping Bits in Memory Without Accessing Them: An Experimental Study of DRAM Disturbance Errors, ISCA (2014)
Modelling the attacker

Arbitrary program-point, chosen-bitflip
• can stop $p$ anywhere, flip any one bit in memory, and continue

Arbitrary program-point, chosen-bitflip, registers
• same as above, but cannot modify/access registers

Fixed program-point, chosen-bitflip, registers

Fixed program-point, sequential memory rewriting, registers
• classical buffer overflow

Arbitrary program-point, arbitrary memory-rewriting, registers
• most powerful adversary

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Defining security

Depends on the desired security goal of $\theta$ and $p$: e.g., not disclose sensitive information $s$

Attacker defines $\theta_{exploit}$ to (adaptively) interact with $\theta_{weird}$

Attacker wins if $s$ is in the output of $\theta_{weird}$ with a higher probability than random

T. F. Dullien, Weird machines, exploitability, and provable unexploitability, IEEE Transactions on Emerging Topics in Computing (2017)
Takeaways

New hardware-assisted defenses are emerging and are (going to be) widely available

How to utilize available primitives effectively?


• Towards pointer integrity with PA (Usenix SEC ’19)

How to deal with downsides?


e.g. optimally minimize scope for PA reuse attacks?
• For return addresses: PACStack (Usenix SEC ’21)
• For other types of pointers?
https://ssg.aalto.fi/research/projects/harp/

How do different hardware primitives compare?

We have open postdoc and graduate student positions. Talk to me!


Acknowledgments

Icons on slides 3, 4, 5, 15, 16, 17, 19, 21, 22, 23 and 24 made by Good Ware from www.flaticon.com
licensed by CC 3.0 BY

The PHP logo on slide 25 made by Colin Viebrock licensed by CC BY-SA 4.0

The BSD daemon on slide 25 is copyright of Marshall Kirk McKusick

All product and company names and logos are trademarks™ or registered ® trademarks of their
respective holders. Use of them does not imply any affiliation with or endorsement by them.

Slide 11 (Return-oriented programming (high-level idea)) is by Luca Davi.
