Senior Project Proposal
Prabir Shrestha (4915302)
Myo Min Zin (4845411)
Napaporn Wuthongcharernkun (4846824)
Objective
Motivation
Scope
The Framework
Gantt Chart
Questions and Answers
A Naive Compiler
Front-end
Back-end
Evolution of Computer Programming
Managed Code vs Unmanaged Code
Bulky .NET Framework
Operating Systems written in managed code
Why Low Level Virtual Machine?
– Source Language independent
– Retargetable code generator
– Supports various architectures
• X86, PowerPC, ARM
– Open source
It is not
– a compiler,
– a virtual machine like the JVM or the .NET Framework
It is
– A modular compiler infrastructure
• a collection of (C++) libraries and tools to
help in building compilers, debuggers,
program analyzers etc.
Commonly referred to as LLVM
Started as an academic project at the
University of Illinois in 2002.
Current development is led mainly by
Apple Inc.
Projects related to LLVM
– Clang: C/C++ front-end; aims to replace
gcc
– OpenGL engine in Mac OS X 10.5
– Used by Adobe Systems Inc., Nvidia, and Sun
Microsystems Laboratories
Keywords- Categories
Operators and Special Characters
Source Language Features
Types
bool char float int string class struct enum object
Conditionals
if else
Loops
for while do
Single Inheritance
base
Encapsulation
private protected public
Overloading Operators
operator
Method Overloading / Method Overriding
override virtual
Indexing & Properties (Accessor/Mutator)
get set value
Modifiers
static sealed
Type Casting
explicit implicit
base enum int private this
bool explicit namespace protected true
break extern new return typeof
char false null sealed using
class float operator set virtual
const for object sizeof void
continue get public static while
do if override string value
else implicit struct is
Operators & Special characters supported
x.y x-- < != *= ||
f(x) --x > (T) x /=
a[x] + <= = *
x++ - >= += /
++x ! == -= &&
Single class Inheritance
Encapsulation
Overloadable Operators
Method Overloading/Overriding
Properties (Accessors / Mutators)
Overall Process
Scanner
Parser
Semantic Analyzer
Code Generator
Assembling and Linking
Tokenization: identifying the tokens in
the input stream.
Skips meaningless characters such as
white space.
Lexical analysis: checking for lexical
errors.
The Coco/R tool generates the scanner
and the parser at the same time.
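As a rough illustration of the tokenization step described above, the following Python sketch splits an input string into tokens, skips white space, and reports a lexical error for an unknown character. It is not Coco/R output; the token classes, the keyword subset, and the `tokenize` function are assumptions made only for this example.

```python
import re

# Hypothetical token classes for a small subset of the source language.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT",  r"[A-Za-z_]\w*"),
    ("OP",     r"==|!=|<=|>=|\+\+|--|&&|\|\||[-+*/<>=!(){};.,\[\]]"),
    ("SKIP",   r"\s+"),   # white space carries no meaning
    ("ERROR",  r"."),     # any other character is a lexical error
]
KEYWORDS = {"if", "else", "while", "for", "do", "class", "int", "bool"}
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Return a list of (kind, lexeme) pairs for the input text."""
    tokens = []
    for match in MASTER.finditer(text):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "ERROR":
            raise SyntaxError(
                f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "IDENT" and lexeme in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, lexeme))
    return tokens
```

For example, `tokenize("while (x <= 10) x++;")` yields a keyword token, followed by operator, identifier, and number tokens, with the spaces dropped.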
Syntax analysis is performed in this
phase.
Coco/R generates a recursive
descent parser.
– Top down parsing method
– Procedural-like functions
– Generally for each production rule, one
procedure is generated.
Accepts grammars in LL(k) form
LL: left-to-right scan, leftmost derivation
– LL(1) conflict resolvers may be needed
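The idea of "one procedure per production rule" can be sketched by hand in Python. This is not code generated by Coco/R; the three-rule expression grammar in the comments and the `Parser` and `parse` names are assumptions chosen for illustration. Each method decides what to do by looking at a single lookahead token, which is what makes the grammar LL(1).

```python
class Parser:
    """Recursive-descent parser for a tiny, hypothetical expression grammar."""

    def __init__(self, tokens):
        self.tokens = list(tokens) + ["<eof>"]
        self.pos = 0

    def peek(self):
        # The single lookahead token.
        return self.tokens[self.pos]

    def advance(self):
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expect(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, found {self.peek()!r}")
        return self.advance()

    def expr(self):
        # Expr = Term { ("+" | "-") Term } .
        node = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = (op, node, self.term())
        return node

    def term(self):
        # Term = Factor { ("*" | "/") Factor } .
        node = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = (op, node, self.factor())
        return node

    def factor(self):
        # Factor = number | "(" Expr ")" .
        if self.peek() == "(":
            self.advance()
            node = self.expr()
            self.expect(")")
            return node
        if self.peek().isdigit():
            return int(self.advance())
        raise SyntaxError(f"unexpected token {self.peek()!r}")

def parse(tokens):
    """Parse a list of token strings into a nested-tuple syntax tree."""
    parser = Parser(tokens)
    tree = parser.expr()
    parser.expect("<eof>")
    return tree
```

Calling `parse(["1", "+", "2", "*", "3"])` builds a tree in which the multiplication is nested under the addition, because `term` is called from `expr`.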
Parser Error-Recovery Techniques
– Synchronization
– Weak Symbols
Synchronization Technique
– SYNC symbols are placed at points in the
grammar where errors are unlikely.
– Upon error detection, parser skips input
symbols until it finds one that is expected
at a synchronization point.
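The synchronization technique can be sketched as follows. This is a simplified, hand-written Python illustration, not the recovery code Coco/R generates. The reduced declaration form (`keyword ident { }`), the choice of synchronization tokens, and the function names are assumptions for this example.

```python
# Tokens that are expected at the synchronization point (hypothetical choice).
SYNC_TOKENS = {"class", "struct", "enum", "<eof>"}

def skip_to_sync(tokens, pos):
    """Skip input symbols until one expected at the synchronization point."""
    while tokens[pos] not in SYNC_TOKENS:
        pos += 1
    return pos

def parse_type_decls(tokens):
    """Parse declarations of the form `keyword ident { }`.

    Returns (declarations, errors); parsing continues after each error.
    """
    tokens = list(tokens) + ["<eof>"]
    pos, decls, errors = 0, [], []
    while True:
        if tokens[pos] not in SYNC_TOKENS:
            errors.append(f"unexpected {tokens[pos]!r} at token {pos}")
            pos = skip_to_sync(tokens, pos)
        if tokens[pos] == "<eof>":
            return decls, errors
        kind, name = tokens[pos], tokens[pos + 1]
        well_formed = (name.isidentifier() and name not in SYNC_TOKENS
                       and tokens[pos + 2:pos + 4] == ["{", "}"])
        if well_formed:
            decls.append((kind, name))
            pos += 4
        else:
            errors.append(f"malformed {kind} declaration at token {pos}")
            pos = skip_to_sync(tokens, pos + 1)
```

With the input `class A { } struct { } enum C { }`, the malformed `struct` is reported once and the following `enum` is still parsed.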
Weak Symbols
– Placed in front of tokens that are prone
to error, i.e. often misspelled or missing.
– When an error is encountered, the parser
reports it and can jump to the next
synchronization point.
Synchronization Example
TypeDecl =
  SYNC
  ( "class" ident [ClassBase] ClassBody [";"]
  | "struct" ident [Base] StructBody [";"]
  | "enum" ident [":" IntType] EnumBody [";"]
  ).
Weak Symbols Example
EnumBody
=
"{" EnumMember { WEAK "," EnumMember} "}".
A phase that follows parsing
Checks for semantic errors once
lexical and syntax errors have been
checked
Examples:
– type checks, variable scoping,
constants not being modified, no
redefinition of classes, methods, or
member variables
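A minimal sketch of such checks, assuming a flat list of statements and a single scope, might look like this in Python. The statement tuples and the `check` function are hypothetical and exist only to illustrate redefinition, constant-assignment, and type checks.

```python
def check(statements):
    """Return a list of semantic error messages.

    Hypothetical statement forms, for illustration only:
      ("decl", type_name, name, is_const)   declare a variable or constant
      ("assign", name, value_type)          assign a value of value_type
    """
    symbols, errors = {}, []   # symbol table: name -> (type, is_const)
    for stmt in statements:
        if stmt[0] == "decl":
            _, type_name, name, is_const = stmt
            if name in symbols:
                errors.append(f"redefinition of '{name}'")
            else:
                symbols[name] = (type_name, is_const)
        elif stmt[0] == "assign":
            _, name, value_type = stmt
            if name not in symbols:
                errors.append(f"'{name}' is not declared")
                continue
            type_name, is_const = symbols[name]
            if is_const:
                errors.append(f"cannot assign to constant '{name}'")
            elif type_name != value_type:
                errors.append(
                    f"cannot assign {value_type} to {type_name} '{name}'")
    return errors
```

A real analyzer would walk the AST and keep one symbol table per scope; the flat list here only keeps the example short.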
Runs after AST construction and semantic analysis
Generates LLVM Intermediate
Representation (IR)
Language and Target independent
Designed to support multiple language
frontends
Represents the key operations of
ordinary processors
Avoids machine specific constraints
– physical registers, pipelining
Does not define runtime and OS
system functions
– these are defined by runtime libraries
IR is a typed Virtual Instruction Set
– unbounded number of registers
– operations are low level
– checked for consistency
Usually 3-address code
%temp2 = add i32 %temp0, %temp1
Instructions are typed
Instructions are polymorphic
Usually in Static Single Assignment (SSA)
form
– a new register for each result
– uses phi (φ) functions
– the code generator tries to keep these
variables in the same physical registers
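To make the 3-address and "new register for each result" ideas concrete, the Python sketch below lowers a nested expression into a list of instructions written in LLVM-like text. It does not use the LLVM libraries. The tuple-based expression format, the fixed `i32` type, and the `%t0`, `%t1` naming scheme are assumptions for this example.

```python
from itertools import count

# Mapping from source operators to LLVM integer opcodes.
OPCODES = {"+": "add", "-": "sub", "*": "mul"}

def lower(expr, code, fresh):
    """Emit instructions for expr into code; return the operand holding it."""
    if isinstance(expr, int):
        return str(expr)          # integer constant operand
    if isinstance(expr, str):
        return f"%{expr}"         # named value, e.g. a function argument
    op, lhs, rhs = expr
    left = lower(lhs, code, fresh)
    right = lower(rhs, code, fresh)
    result = f"%t{next(fresh)}"   # a fresh register for every result
    code.append(f"{result} = {OPCODES[op]} i32 {left}, {right}")
    return result

def to_three_address(expr):
    """Return the instruction list for a nested (op, lhs, rhs) expression."""
    code = []
    lower(expr, code, count())
    return code
```

For `a + b * 2`, written as `("+", "a", ("*", "b", 2))`, the multiplication is emitted first into `%t0` and the addition then uses `%t0` as an operand. No register is ever assigned twice.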
Constant Folding
Simplifies constant expressions at
compile time
Example
i = 100 * 20 * 3  →  i = 6000
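A minimal sketch of constant folding over an expression tree, in Python. The nested `(op, lhs, rhs)` tuple format and the `fold` function are assumptions for this example; an LLVM-based compiler would normally rely on LLVM's own passes rather than hand-written code like this.

```python
import operator

FOLDABLE = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(expr):
    """Evaluate every subtree whose operands are all integer constants."""
    if not isinstance(expr, tuple):
        return expr               # a constant or a variable name
    op, lhs, rhs = expr
    lhs, rhs = fold(lhs), fold(rhs)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return FOLDABLE[op](lhs, rhs)
    return (op, lhs, rhs)
```

Here `fold(("*", ("*", 100, 20), 3))` reduces to the single constant `6000`, while a subtree that contains a variable is left in place.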
Constant Propagation
Substituting the values of known
constants in expressions at compile time
Example
int x = 7           int x = 7           int x = 7
int y = 14 - x  →   int y = 14 - 7  →   int y = 7
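The example above can be reproduced with a short Python sketch for straight-line code. It substitutes known constants and then folds the result, so it combines propagation with constant folding. The `(name, expression)` statement format and the function names are assumptions for this illustration.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def simplify(expr, known):
    """Replace known variables by their values, then fold constant subtrees."""
    if isinstance(expr, str):
        return known.get(expr, expr)
    if isinstance(expr, int):
        return expr
    op, lhs, rhs = expr
    lhs, rhs = simplify(lhs, known), simplify(rhs, known)
    if isinstance(lhs, int) and isinstance(rhs, int):
        return OPS[op](lhs, rhs)
    return (op, lhs, rhs)

def propagate(assignments):
    """Process a list of (name, expr) assignments in order."""
    known, result = {}, []
    for name, expr in assignments:
        expr = simplify(expr, known)
        if isinstance(expr, int):
            known[name] = expr        # value is now a known constant
        else:
            known.pop(name, None)     # value is unknown after this assignment
        result.append((name, expr))
    return result
```

`propagate([("x", 7), ("y", ("-", 14, "x"))])` rewrites the second assignment to `y = 7`. The sketch handles only straight-line code; branches and loops need data-flow analysis.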
Strength Reduction
A costly operation is replaced with an
equivalent but less expensive operation
Examples
y = x / 8   →  y = x >> 3  (for unsigned or non-negative x)
y = x * 64  →  y = x << 6
y = x * 2   →  y = x + x
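The three rewrites above can be sketched as a small Python function. The `reduce_strength` name, the tuple result format, and the `unsigned` flag are assumptions for this example. The flag is there because replacing a division by a right shift is only safe when the operand cannot be negative.

```python
def reduce_strength(op, operand, constant, unsigned=True):
    """Rewrite `operand op constant` into a cheaper form where possible.

    Returns an (op, lhs, rhs) tuple; the input is returned unchanged when
    no rewrite applies.
    """
    is_pow2 = constant > 1 and (constant & (constant - 1)) == 0
    if op == "*" and constant == 2:
        return ("+", operand, operand)
    if op == "*" and is_pow2:
        return ("<<", operand, constant.bit_length() - 1)
    if op == "/" and is_pow2 and unsigned:
        return (">>", operand, constant.bit_length() - 1)
    return (op, operand, constant)
```

For instance, `reduce_strength("*", "x", 64)` gives a left shift by 6, and `reduce_strength("/", "x", 8, unsigned=False)` leaves the division untouched.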
Elimination of Useless Instructions
Drops operations that do not change the
value being computed
Examples
x = y + 0  →  x = y
y = z * 1  →  y = z
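A minimal Python sketch of this simplification, assuming expressions are nested `(op, lhs, rhs)` tuples, is shown below. The `simplify_identity` name and the set of identities handled are choices made for this illustration only.

```python
def simplify_identity(expr):
    """Remove operations with an identity operand (x + 0, x - 0, x * 1, x / 1)."""
    if not isinstance(expr, tuple):
        return expr               # a constant or a variable name
    op, lhs, rhs = expr
    lhs, rhs = simplify_identity(lhs), simplify_identity(rhs)
    if op in ("+", "-") and rhs == 0:
        return lhs
    if op == "+" and lhs == 0:
        return rhs
    if op in ("*", "/") and rhs == 1:
        return lhs
    if op == "*" and lhs == 1:
        return rhs
    return (op, lhs, rhs)
```

`simplify_identity(("+", "y", 0))` returns `"y"`, so the assignment `x = y + 0` becomes a plain copy `x = y`.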
User
Phases