2-A Case of Dynamic Program Analysis - CS510 Software Engineering

This document describes dynamic program analysis using instrumentation. It discusses how dynamic instrumentation works by inserting probes into the binary code as it executes. The probes are inserted by Valgrind, a dynamic binary instrumentation framework. Valgrind intercepts the original program binary instructions and redirects execution through its core, which dispatches basic blocks to instrumentation tools. The tools can insert additional instructions or modify the original ones to monitor and analyze the program as it runs.

Uploaded by

runqi fan

A Case of Dynamic Program Analysis
CS510 Software Engineering
Outline

➢ Introduction
• Static instrumentation vs. dynamic instrumentation
➢ How to implement a dynamic information flow system


What Is Instrumentation

max = 0;
for (p = head; p; p = p->next)
{
    printf("In loop\n");
    count[0]++;
    if (p->value > max)
    {
        printf("True branch\n");
        count[1]++;
        max = p->value;
    }
}
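The instrumented loop can be turned into a self-contained sketch. The Node type, the count array, and the function name find_max are assumptions made for illustration; the slide only shows the loop body and the fields value and next.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node; the slide only uses the fields value and next. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/* Probe counters: count[0] = loop iterations, count[1] = true branches. */
static unsigned long count[2];

/* The slide's loop with the two counting probes inserted. */
static int find_max(const Node *head)
{
    int max = 0;
    for (const Node *p = head; p; p = p->next) {
        count[0]++;                 /* probe: loop body entered */
        if (p->value > max) {
            count[1]++;             /* probe: true branch taken */
            max = p->value;
        }
    }
    assert(count[1] <= count[0]);   /* at most one true branch per iteration */
    return max;
}
```

Calling find_max on a three-node list and then reading count[0] and count[1] gives the basic-block counts that a profiler would collect.
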


What Can Instrumentation Do?

➢ Profiler for compiler optimization:
• Basic-block count
• Value profile
➢ Microarchitectural study:
• Instrument branches to simulate branch predictors
• Generate traces
➢ Bug checking:
• Find references to uninitialized or unallocated addresses
➢ Software tools that are capable of instrumentation:
• Valgrind, Pin, Purify, ATOM, EEL, Diablo, …
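As an illustration of the branch-predictor use case, here is a minimal sketch of a probe that an instrumented branch could call with its actual outcome. The 2-bit saturating counter scheme and the names Predictor and on_branch are assumptions chosen for the example; the slides do not prescribe a particular predictor.

```c
#include <assert.h>

/* 2-bit saturating counter: states 0 and 1 predict "not taken",
   states 2 and 3 predict "taken". */
typedef struct {
    unsigned state;        /* current counter value, 0..3 */
    unsigned long hits;    /* correct predictions */
    unsigned long misses;  /* wrong predictions */
} Predictor;

/* Probe called by an instrumented branch with its actual outcome. */
static void on_branch(Predictor *bp, int taken)
{
    assert(bp->state <= 3);
    int predicted_taken = bp->state >= 2;
    if (predicted_taken == (taken != 0))
        bp->hits++;
    else
        bp->misses++;
    /* Move the counter toward the observed outcome, saturating at 0 and 3. */
    if (taken && bp->state < 3)
        bp->state++;
    else if (!taken && bp->state > 0)
        bp->state--;
}
```

The simulated predictor never affects the program being analyzed; it only consumes the stream of outcomes that the probes report.
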


Binary Instrumentation Is Dominant

➢ Libraries are a big pain for source code level instrumentation
• Proprietary libraries: communication (MPI, PVM), linear algebra (NGA), database query (SQL libraries).
➢ Easily handles multi-lingual programs
• Source code level instrumentation is heavily language dependent.
➢ Source-level constructs have more complicated semantics than machine instructions
➢ Turning off compiler optimizations can maintain an almost perfect mapping from instructions to source code lines
➢ Worms and viruses are rarely provided with source code
➢ We will be talking about binary instrumentation only
• Static
• Dynamic


Static Instrumentation (Diablo)

[Figure: Diablo tool flow. Source files (*.c, *.cpp, *.S) are compiled by gcc, g++, and asm into *.o and *.a files, which ld links into a.out. DIABLO then runs these stages: Read Object Format, Disassemble Bundles, Construct ICFG, Analyses/Optimizations, Serialize ICFG, Assemble Bundles, Write Object Format. The result is b.out.]


Static Instrumentation Characteristics

➢ Perform instrumentation before code is run
• New binary = original binary + instrumentation
• Raise binary to IR, transform IR, transfer back to binary
➢ All libraries are usually statically linked
• The resulting binary is large
➢ Program representations are usually built from the program binary
• CFG
• Call graph
• PDG is hard to build from binary
  ➢ Points-to analysis on binary is almost impossible
  ➢ Simple DFA is possible


Dynamic Instrumentation - Valgrind

➢ Developed by Julian Seward at/around Cambridge University, UK
• Google-O'Reilly Open Source Award for "Best Toolmaker" 2006
• A merit (bronze) Open Source Award 2004
➢ Open source
• Works on x86, AMD64, and PPC code
➢ Easy to execute, e.g.:
• valgrind --tool=memcheck ls
➢ It has become very popular
• One of the two most popular dynamic instrumentation tools
  ➢ Pin and Valgrind
• Very good usability, extensibility, and robustness
  ➢ 25MLOC
• Mozilla, MIT, CMU-security, Me, and many other places
➢ Overhead is the problem
• 5-10X slowdown without any instrumentation


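Valgrind Infrastructure

[Figure: Valgrind architecture. Binary Code and Input enter the VALGRIND CORE. The Dispatcher sends a pc to the BB Decoder, which produces a BB. The BB is handed to the Instrumenter of a tool (Tool 1, Tool 2, …, Tool n), which returns a New BB. The BB Compiler places the New BB in the Trampoline, where it executes together with the tool's Runtime (state) and yields a New pc for the Dispatcher.]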


Valgrind Infrastructure

[Figure sequence: the same architecture diagram, animated over eight slides while Valgrind runs the program below with a tool that prints the pc of each executed block.]

1: do {
2:    i=i+1;
3:    s1;
4: } while (i<2)
5: s2;

➢ The Dispatcher starts with pc 1
➢ The BB Decoder decodes the basic block at pc 1 (lines 1-4)
➢ The Instrumenter inserts print("1") at the top of the block
➢ The new block is placed in the Trampoline and executed; the loop body runs twice
• OUTPUT: 1 1
➢ The loop exits with new pc 5
➢ The BB Decoder decodes "5: s2;" and the Instrumenter inserts print("5")
➢ The new block is placed next to the first one and executed
• OUTPUT: 1 1 5

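The dispatch cycle walked through on these slides can be sketched as a small simulation. This is not Valgrind code: basic blocks are modeled as C functions, "translation" only fills a cache entry, and all names (Block, Translated, code_cache, run, END_PC) are hypothetical. The initial value i = 0 is also an assumption, chosen because it reproduces the slide's output.

```c
#include <assert.h>
#include <stddef.h>

/* END_PC is a hypothetical marker meaning "program finished". */
enum { MAX_PC = 8, END_PC = 0, TRACE_MAX = 16 };

typedef struct { int i; } State;
typedef int (*Block)(State *);     /* runs one basic block, returns the next pc */

/* Original basic blocks of the slide's program (s1 and s2 are left empty). */
static int bb1(State *s) { s->i = s->i + 1; return s->i < 2 ? 1 : 5; }
static int bb5(State *s) { (void)s; return END_PC; }

static const Block original[MAX_PC] = { [1] = bb1, [5] = bb5 };

/* A "translated" block: the original body plus the probe the tool inserted. */
typedef struct { Block body; int probe_pc; } Translated;

static Translated code_cache[MAX_PC];      /* body == NULL: not translated yet */
static int translations;                   /* blocks that went through the tool */
static int output[TRACE_MAX], output_len;  /* stands in for OUTPUT on the slides */

/* Decode + instrument + compile, reduced to filling one cache entry. */
static void translate(int pc)
{
    code_cache[pc].body = original[pc];
    code_cache[pc].probe_pc = pc;          /* the inserted print(pc) */
    translations++;
}

/* Dispatcher loop: look up the pc, translate on a miss, run the new block. */
static void run(State *s, int pc)
{
    while (pc != END_PC) {
        assert(pc > 0 && pc < MAX_PC && original[pc] != NULL);
        if (code_cache[pc].body == NULL)
            translate(pc);
        assert(output_len < TRACE_MAX);
        output[output_len++] = code_cache[pc].probe_pc;  /* probe runs first */
        pc = code_cache[pc].body(s);                     /* then the original code */
    }
}
```

Starting run with i = 0 and pc 1 records 1, 1, 5 in output while translating only two blocks, since the second loop iteration reuses the cached translation.
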
Dynamic Instrumentation Characteristics

➢ A trampoline is required.
➢ Does not require recompiling or relinking
• Saves time: compile and link times are significant in real systems.
• Can instrument without linking (relinking is not always possible).
➢ Dynamically turn on/off, change instrumentation
• From t1 to t2 I want to execute F'; from t3 to t4 I want F''
  ➢ Can be done by invalidating the mapping in the dispatcher.
➢ Can instrument running programs (such as Web or database servers)
• Production systems.
➢ Can instrument self-mutating code.
• Obfuscation can be easily circumvented.
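Switching instrumentation by invalidating the dispatcher's mapping can be sketched as follows. This is a simplified model under stated assumptions: a translated block is represented only by the probe that was baked into it, and probe_f1, probe_f2, switch_tool, and dispatch are hypothetical names standing in for F', F'', and the dispatcher.

```c
#include <assert.h>
#include <stddef.h>

enum { NUM_BLOCKS = 4 };

typedef void (*Probe)(int pc);

static int calls_f1, calls_f2;
static void probe_f1(int pc) { (void)pc; calls_f1++; }   /* stands for F'  */
static void probe_f2(int pc) { (void)pc; calls_f2++; }   /* stands for F'' */

static Probe active_tool = probe_f1;  /* probe used for new translations */
static Probe mapping[NUM_BLOCKS];     /* pc -> probe baked into the translation;
                                         NULL means "not translated" */

/* Change the instrumentation: drop every cached translation so that the
   next dispatch of each block re-instruments it with the new tool. */
static void switch_tool(Probe new_tool)
{
    active_tool = new_tool;
    for (int pc = 0; pc < NUM_BLOCKS; pc++)
        mapping[pc] = NULL;
}

/* Dispatcher: translate on a miss, then run the block's probe. */
static void dispatch(int pc)
{
    assert(pc >= 0 && pc < NUM_BLOCKS);
    if (mapping[pc] == NULL)
        mapping[pc] = active_tool;    /* "translate" with the current tool */
    mapping[pc](pc);
}
```

Without the invalidation loop, a block translated under F' would keep running F' even after active_tool changes, because its cached translation would still be found.
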


Dynamic Instrumentation Characteristics

➢ Overhead is high
• Dispatching, indexing
• Dynamic instrumentation
➢ Usually does not provide program representations at run time
• Hard to acquire
• Unacceptable runtime overhead
• Simple representations such as BB are provided
• Workaround: combine with static tools
  ➢ Diablo + Valgrind


Case Study: Implement A Dynamic Information Flow System in Valgrind


Information Flow System

➢ IFS is important
• Confidentiality at runtime = IFS
• Taint analysis = IFS
• Memory reference error detection = IFS
• Data lineage system = IFS
• Dynamic slicing is partly an IFS
➢ Essence of an IFS
• A runtime abstract interpretation engine
  ➢ Driven by the executed program path
➢ Implementation on Valgrind is surprisingly easy
• As we will see


Language and Abstract Model

➢ Our binary (RISC)
• ADD r1 / #Imm, r2
• LOAD [r1 / #Imm], r2
• STORE r1, [r2 / #Imm]
• MOV r1 / #Imm, r2
• CALL r1
• SYS_READ r1, r2
  ➢ r1 is the starting address of the buffer, r2 is the size
➢ Abstract state
• One bit, the security bit (tainted bit)
• Prevent calls through a tainted value.


Implement A New Tool In Valgrind

➢ Use a template
• The tool lackey is a good candidate
• Two parts to fill in
  ➢ Instrumenter
  ➢ Runtime
➢ Instrumenter
• Initialization
• Instrumentation
• Finalization
• System calls interception
➢ Runtime
• Transfer functions
• Memory management for abstract state


How to Store Abstract State

➢ Shadow memory
• We need a mapping
  ➢ Addr → Abstract State
  ➢ Register → Abstract State

[Figure: Virtual Space beside Shadow Space; the value val stored at [addr] has an abstract state abs at the corresponding shadow location.]

typedef
   struct {
      UChar abits[65536];
   } SecMap;

static SecMap* primary_map[65536];

static SecMap default_map;


How to Store Abstract State

static void init_shadow_memory ( void )
{
   Int i;

   /* The default map marks every address as untainted. */
   for (i = 0; i < 65536; i++)
      default_map.abits[i] = 0;
   /* Initially every primary entry shares the default map. */
   for (i = 0; i < 65536; i++)
      primary_map[i] = &default_map;
}


How to Store Abstract State

static SecMap* alloc_secondary_map ( void )
{
   Int i;
   SecMap* map = VG_(shadow_alloc)(sizeof(SecMap));

   /* A fresh secondary map starts out untainted. */
   for (i = 0; i < 65536; i++)
      map->abits[i] = 0;
   return map;
}

How to Store Abstract State

void Accessible ( Addr addr )
{
   /* Give this 64KB chunk its own secondary map before its first write,
      so that the shared default map is never modified. */
   if (primary_map[addr >> 16] == &default_map)
      primary_map[addr >> 16] = alloc_secondary_map();
}

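The two-level shadow map can be tried outside Valgrind with a standalone sketch. It assumes 32-bit addresses and replaces the Valgrind-specific pieces with standard C: uint8_t and uint32_t instead of UChar and Addr, and calloc instead of VG_(shadow_alloc). The helper names make_accessible, get_abit, and set_abit are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* One secondary map covers a 64KB chunk of the address space. */
typedef struct { uint8_t abits[65536]; } SecMap;

static SecMap *primary_map[65536];  /* indexed by the upper 16 address bits */
static SecMap default_map;          /* zero-initialized: everything untainted */

static void init_shadow_memory(void)
{
    for (size_t i = 0; i < 65536; i++)
        primary_map[i] = &default_map;   /* all chunks share the default map */
}

/* Replace the shared default map with a private one before the first write. */
static void make_accessible(uint32_t addr)
{
    if (primary_map[addr >> 16] == &default_map) {
        SecMap *map = calloc(1, sizeof *map);   /* calloc zeroes the bits */
        if (map == NULL)
            abort();
        primary_map[addr >> 16] = map;
    }
}

static uint8_t get_abit(uint32_t addr)
{
    assert(primary_map[addr >> 16] != NULL);  /* init_shadow_memory must have run */
    return primary_map[addr >> 16]->abits[addr & 0xffff];
}

static void set_abit(uint32_t addr, uint8_t bit)
{
    make_accessible(addr);
    primary_map[addr >> 16]->abits[addr & 0xffff] = bit;
}
```

Reads go straight through the primary map, so untouched chunks cost no memory beyond the single default map; only chunks that have been written get their own 64KB secondary map.
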
Initialization

void SK_(pre_clo_init)(void)
{
   VG_(details_name) ("CS510 IFS");

   init_shadow_memory();

   VG_(needs_shadow_memory) ();
   VG_(needs_shadow_regs) ();

   VG_(register_noncompact_helper)((Addr) & RT_load);
   VG_(register_noncompact_helper)((Addr) & …);
}


Finalization

➢ EMPTY

void SK_(fini)(Int exitcode)
{
}


Instrumentation & Runtime

UCodeBlock* SK_(instrument)(UCodeBlock* cb_in, …)
{
   UCodeBlock* cb = VG_(setup_UCodeBlock)(…);

   for (i = 0; i < VG_(get_num_instrs)(cb_in); i++) {
      u = VG_(get_instr)(cb_in, i);
      switch (u->opcode) {
         /* One case per instruction kind; each is shown on its own slide. */
         case LD:   …
         case ST:   …
         case MOV:  …
         case ADD:  …
         case CALL: …
      }
   }
   return cb;
}

Instrumentation & Runtime - LOAD

LD [r1], r2

switch (u->opcode) {
   case LD:
      VG_(ccall_RR_R) (cb, (Addr) RT_load, u->r1, SHADOW(u->r1), SHADOW(u->r2));
}

SHADOW(r2) = SM(r1) | SHADOW(r1)

UChar RT_load (Addr a, UChar sr1)
{
   /* Taint of the loaded cell, combined with the taint of the address register. */
   UChar s_bit = primary_map[a >> 16]->abits[a & 0xffff];
   return (s_bit | sr1);
}


Instrumentation & Runtime - STORE

ST r1, [r2]

switch (u->opcode) {
   case ST:
      VG_(ccall_RRR_0) (cb, (Addr) RT_store, u->r2, SHADOW(u->r1), SHADOW(u->r2));
}

SM(r2) = SHADOW(r1) | SHADOW(r2)

void RT_store (Addr a, UChar sr1, UChar sr2)
{
   UChar s_bit = sr1 | sr2;
   Accessible(a);   /* make sure this chunk has its own secondary map */
   primary_map[a >> 16]->abits[a & 0xffff] = s_bit;
}


Instrumentation & Runtime - MOV

MOV r1, r2

switch (u->opcode) {
   case MOV:
      uInstr2(cb, MOV, …, SHADOW(u->r1), …, SHADOW(u->r2));
}

SHADOW(r2) = SHADOW(r1)


Instrumentation & Runtime - ADD

ADD r1, r2

switch (u->opcode) {
   case ADD:
      VG_(ccall_RR_R) (cb, (Addr) RT_add, SHADOW(u->r1), SHADOW(u->r2), SHADOW(u->r2));
}

SHADOW(r2) = SHADOW(r1) | SHADOW(r2)

UChar RT_add (UChar sr1, UChar sr2)
{
   return sr1 | sr2;
}


Instrumentation & Runtime - CALL

CALL r1

switch (u->opcode) {
   case CALL:
      VG_(ccall_R_0) (cb, (Addr) RT_call, SHADOW(u->r1));
}

if (SHADOW(r1)) printf ("Please call CS590F")

void RT_call (UChar sr1)
{
   /* A tainted call target means the input controls where execution goes. */
   if (sr1) VG_(printf) ("Please call CS590F\n");
}


Instrumentation & Runtime - SYS_READ

SYS_READ r1, r2

SM(r1[0 .. r2-1]) = 1

void * SK_(pre_syscall) (… UInt syscallno…)
{
   if (syscallno == SYSCALL_READ) {
      get_syscall_params (…, &r1, &r2, …);
      /* Every byte of the input buffer becomes tainted. */
      for (i = 0; i < r2; i++) {
         a = r1 + i;
         Accessible(a);
         primary_map[a >> 16]->abits[a & 0xffff] = 1;
      }
   }
}


Done!

➢ Let us run it through a buffer overflow exploit

void (* F) ();
char A[2];
...
read(B, 256);
i=2;
A[i]=B[i];
...
(*F) ();


[Figure sequence: Virtual Space beside Shadow Space, animated over three slides. The virtual space holds, from top to bottom, i, F, A[1], A[0], …, B, and the registers r1, r2, r3; the shadow space holds one taint bit for each of them.]

Source           Instructions        Shadow update
read(B, 256);    MOV &B, r1
                 MOV 256, r2
                 SYS_READ r1, r2     SM(r1[0-r2]) = 1: all of B is tainted
i=2;             MOV 2, r1
                 ST r1, [&i]         SM(&i) = SHADOW(r1)
A[i]=B[i];       LD [&i], r1
                 MOV &B, r2
                 ADD r1, r2
                 LD [r2], r2         SHADOW(r2) = SM(r2) | SHADOW(r2), with r2 = &B[2]: r2 is tainted
                 MOV &A, r3
                 ADD r1, r3
                 ST r2, [r3]         SM(r3) = SHADOW(r2) | SHADOW(r3), with r3 = &A[2]: F is tainted
(*F) ();         MOV F, r1           SHADOW(r1) = SM(F): r1 is tainted
                 CALL r1             if (SHADOW(r1)) printf ("Call …");

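The whole trace can be replayed with a tiny taint-tracking interpreter. This is a sketch under several assumptions, not the Valgrind tool itself: memory is addressed in cells rather than bytes, F is placed directly after A[1] so that &A[2] == &F, the input value 0x41 is made up, and LDA/STA are hypothetical immediate-address forms of LOAD/STORE.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical cell-addressed layout: F sits right after A[1]. */
enum { B_ADDR = 0, B_SIZE = 256, A_ADDR = 300, F_ADDR = 302, I_ADDR = 303,
       MEM_CELLS = 512 };

typedef enum { MOVI, ADD, LD, ST, LDA, STA, SYS_READ, CALL } Op;
typedef struct { Op op; int x, y; } Insn;  /* slide order: source, destination */

static int reg[4], mem[MEM_CELLS];
static unsigned char sreg[4], smem[MEM_CELLS];  /* shadow registers and memory */

/* Runs one instruction with its transfer function; returns 1 on a tainted CALL. */
static int step(Insn in)
{
    int a;
    unsigned char s;
    switch (in.op) {
    case MOVI: reg[in.y] = in.x; sreg[in.y] = 0; break;   /* constants are clean */
    case ADD:  reg[in.y] += reg[in.x]; sreg[in.y] |= sreg[in.x]; break;
    case LD:   a = reg[in.x]; assert(a >= 0 && a < MEM_CELLS);
               s = smem[a] | sreg[in.x];                  /* SM(r1) | SHADOW(r1) */
               reg[in.y] = mem[a]; sreg[in.y] = s; break;
    case ST:   a = reg[in.y]; assert(a >= 0 && a < MEM_CELLS);
               mem[a] = reg[in.x]; smem[a] = sreg[in.x] | sreg[in.y]; break;
    case LDA:  reg[in.y] = mem[in.x]; sreg[in.y] = smem[in.x]; break;
    case STA:  mem[in.y] = reg[in.x]; smem[in.y] = sreg[in.x]; break;
    case SYS_READ:
        for (a = reg[in.x]; a < reg[in.x] + reg[in.y]; a++) {
            assert(a >= 0 && a < MEM_CELLS);
            mem[a] = 0x41;   /* made-up attacker-controlled input */
            smem[a] = 1;     /* everything read from outside is tainted */
        }
        break;
    case CALL: return sreg[in.x] != 0;
    }
    return 0;
}

/* Replays the trace for A[index] = B[index]; returns the number of alarms. */
static int run_trace(int index)
{
    const Insn prog[] = {
        { MOVI, B_ADDR, 1 }, { MOVI, B_SIZE, 2 }, { SYS_READ, 1, 2 }, /* read(B, 256) */
        { MOVI, index, 1 },  { STA, 1, I_ADDR },                      /* i = index    */
        { LDA, I_ADDR, 1 },  { MOVI, B_ADDR, 2 }, { ADD, 1, 2 }, { LD, 2, 2 },
        { MOVI, A_ADDR, 3 }, { ADD, 1, 3 },       { ST, 2, 3 },       /* A[i] = B[i]  */
        { LDA, F_ADDR, 1 },  { CALL, 1, 0 },                          /* (*F)()       */
    };
    int alarms = 0;
    memset(reg, 0, sizeof reg);   memset(mem, 0, sizeof mem);
    memset(sreg, 0, sizeof sreg); memset(smem, 0, sizeof smem);
    for (size_t k = 0; k < sizeof prog / sizeof prog[0]; k++)
        alarms += step(prog[k]);
    return alarms;
}
```

With index 2 the store lands on F and the call raises an alarm; with index 1 the store stays inside A, F remains untainted, and the same call goes through silently.
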
What Is Not Covered

➢ Information flow through control dependence
• Valgrind alone is not able to handle it
• Valgrind + Diablo

p=getpassword( );

if (p=="zhang") {
   send (m);
}
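A minimal sketch of why control dependence escapes a data-flow-only tracker: the secret bit is copied through a branch, so no tainted value is ever assigned to the result. The Tracked type and both function names are hypothetical, and the taint field is updated by hand the way the transfer functions of the case study would update it.

```c
#include <assert.h>

/* A value paired with its taint bit. */
typedef struct { int value; unsigned char taint; } Tracked;

/* Direct copy: data dependence carries the taint along. */
static Tracked copy_direct(Tracked secret)
{
    return secret;
}

/* Copies a one-bit secret through a branch. Only constants are assigned,
   so a tracker that follows data dependence alone leaves the taint at 0. */
static Tracked copy_via_branch(Tracked secret)
{
    assert(secret.value == 0 || secret.value == 1);  /* one-bit secret */
    Tracked out = { 0, 0 };      /* constant: untainted */
    if (secret.value)            /* control dependence on the secret */
        out.value = 1;           /* constant assignment: taint stays 0 */
    return out;
}
```

The value returned by copy_via_branch equals the secret while its taint bit stays clear, which is the same kind of leak as the password check on this slide.
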


Extending the IFS to Identify Memory Bugs



Wrap-Up

➢ Abstract interpretation driven by the concrete execution
• Only need to consider the transfer functions for statements
• No need to figure out how to combine abstract information since there is only one path
➢ Implemented through code instrumentation
➢ Termination is often not an issue, efficiency may be a concern
➢ Medicine vs. illness
• Whereas static analysis is more like "precaution vs. illness"
➢ More flexibility in algorithm design, broader design space.
