Compiler 2018:

Big Picture
Lequn Chen
March 16, 2017
Why Compilers Course

• Improve programming skills

• ~10k loc

• Very important project experience in your CV

• programming skills
• perseverance
What Are Compilers

Compiler        Source Language    Target Language
gcc             C                  Linux x86-64 ELF
javac           Java               JVM Bytecode
your compiler   M*                 Linux x86-64 Assembly
What Are Virtual Machines
Java, Kotlin, Scala, Clojure
  → JVM Bytecode
  → JVM on x86-64 Linux, JVM on x86-64 Windows, JVM on x86-64 macOS, JVM on Embedded Systems

Source language → JVM Bytecode:
• Language Features
• Optimization based on High Level Semantics

JVM Bytecode → machine:
• Interpreter
• JIT
• Optimization focused on Code Generation
LLVM
• not “Low-Level Virtual Machine” anymore

C/C++, Rust, Swift, Haskell
  → LLVM IR
  → x86-64, RISC, ARM, MIPS

Front-end (source language → LLVM IR):
• Language Features
• Optimization based on High Level Semantics

Back-end (LLVM IR → target):
• Code Generation
• Transforms and Optimizations
Compilers
• Overview
• Front-end → Intermediate Representation → Back-end
• More detail?
• Lexing
• Parsing
• Semantic Analysis
• IR Generation
• IR Optimization
• Code Generation
• Target-dependent Optimizations
About the Course
• Language: whatever you want

• Lexing and parsing library: whatever you want

• Source language: M* (C-and-Java-like)

• Target platform: Linux x86-64 Assembly in NASM

• Additional language features: whatever you want

• as long as it is compatible with the manual
• Optimizations: whatever you want
• as long as you can pass the tests
Lexing & Parsing
Source Code
while f3 < 100 {
f3 = f1 + f2;
f1, f2 = f2, f3;
}
Lexing
KEYWORD while
IDENTIFIER f3
SYMBOL <
LITERAL 100
SYMBOL {
IDENTIFIER f3
SYMBOL =
IDENTIFIER f1
SYMBOL +
IDENTIFIER f2
SYMBOL ;
IDENTIFIER f1
SYMBOL ,
IDENTIFIER f2
SYMBOL =
IDENTIFIER f2
SYMBOL ,
IDENTIFIER f3
SYMBOL ;
SYMBOL }
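
A minimal sketch of a lexer that produces the token stream above, assuming a regex-based design. The token classes come from the slide; the `KEYWORDS` set, the `lex` function, and the symbol set are illustrative assumptions that cover only this example, not the M* language.

```python
import re

# Assumption: only the keyword used in the slide's example.
KEYWORDS = {"while"}

# One named group per token class; the first alternative that matches wins.
TOKEN_RE = re.compile(r"""
    (?P<LITERAL>\d+)
  | (?P<IDENTIFIER>[A-Za-z_]\w*)
  | (?P<SYMBOL>[<{}=+;,])
  | (?P<SKIP>\s+)
  | (?P<ERROR>.)
""", re.VERBOSE)


def lex(source):
    """Turn source text into a list of (kind, text) tokens."""
    tokens = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token itself
        if kind == "ERROR":
            raise SyntaxError(f"unexpected character {text!r} at offset {m.start()}")
        if kind == "IDENTIFIER" and text in KEYWORDS:
            kind = "KEYWORD"  # keywords are identifiers with a reserved spelling
        tokens.append((kind, text))
    return tokens


SOURCE = """while f3 < 100 {
    f3 = f1 + f2;
    f1, f2 = f2, f3;
}"""

for kind, text in lex(SOURCE):
    print(kind, text)
```

Treating keywords as reserved identifiers keeps the regular expression small; a parser generator such as ANTLR 4 handles this for you.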
Parsing
Abstract Syntax Tree

WHILE
  EXPRESSION: <
    f3
    100
  BODY
    STATEMENT: ASSIGN
      f3
      EXPRESSION: +
        f1
        f2
    STATEMENT: UNPACKING
      TARGET: f1, f2
      SOURCE: f2, f3
Syntax Error
Abstract Syntax Tree

WHILE
  EXPRESSION: <
    f3
    100
  BODY: ???

Syntax Error: Expect loop body
Syntax Error
Abstract Syntax Tree

WHILE
  EXPRESSION: <
    f3
    100
  BODY
    STATEMENT: ASSIGN
      f3
      EXPRESSION: +
        f1
        f2
    STATEMENT: UNPACKING
      TARGET: f1, f2
      SOURCE: f2, f3

Syntax Error: Missing }
Parsing: Grammars
stmt: expr NEWLINE
    | ID '=' expr NEWLINE
    | NEWLINE
    ;

expr: <assoc=right> expr op='^' expr

Factor → ( Expr )
       | Integer

def Expr():
    Expr()        # Infinite Recursion!
    match('+')
    Term()
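
The infinite recursion above comes from a left-recursive rule (Expr calls Expr before consuming any token). One common fix in a hand-written recursive-descent parser is to rewrite the rule as Expr → Term ('+' Term)* and use a loop. The sketch below assumes that rewrite; the `Parser` class, the list-of-strings token format, and the tuple-shaped AST are hypothetical, and `term` plays the role of the slide's Factor.

```python
class Parser:
    """Recursive descent for:  Expr -> Term ('+' Term)*   Term -> INTEGER | '(' Expr ')'"""

    def __init__(self, tokens):
        self.tokens = tokens  # e.g. ["1", "+", "(", "2", "+", "3", ")"]
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def match(self, expected):
        if self.peek() != expected:
            raise SyntaxError(f"expected {expected!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        # A loop replaces the left-recursive call, so every iteration consumes a token.
        node = self.term()
        while self.peek() == "+":
            self.match("+")
            node = ("+", node, self.term())  # builds a left-associative tree
        return node

    def term(self):
        tok = self.peek()
        if tok == "(":
            self.match("(")
            node = self.expr()
            self.match(")")
            return node
        if tok is not None and tok.isdigit():
            self.pos += 1
            return int(tok)
        raise SyntaxError(f"unexpected token {tok!r}")


def parse(tokens):
    parser = Parser(tokens)
    tree = parser.expr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing token {parser.peek()!r}")
    return tree
```

ANTLR 4 performs a comparable rewrite automatically for directly left-recursive rules, which is why a grammar like the `expr` rule above can be written as is.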
Lexer & Parser?

• Usually, lexer and parser can be completely separated.

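• However, the lexer sometimes needs context from the parser, e.g. in C++:

• vector<pair<int, int>> (is >> one shift operator or two closing angle brackets?)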
Pragmatic Solution
• What to do
• Build AST
• Check syntax errors
• Use a parser generator, especially ANTLR 4
• Check https://github.com/antlr/grammars-v4
• Read if you want
• https://abcdabcd987.com/using-antlr4/
• https://abcdabcd987.com/notes-on-antlr4/
Challenge Yourself

• Hand-written lexer and parser

• Check Parsing Techniques: A Practical Guide
Semantic Analysis
Semantic Error
while f3 < 100 {
f3 = f1 + f4;
f1, f2 = f2, f3;
}

Abstract Syntax Tree

WHILE
  EXPRESSION: <
    f3
    100
  BODY
    STATEMENT: ASSIGN
      f3
      EXPRESSION: +
        f1
        f4
    STATEMENT: UNPACKING
      TARGET: f1, f2
      SOURCE: f2, f3

Semantic Error: f4 used before declaration
Language Features

• x, y = y, x

• c = sum(x * y for x in a for y in b)

• a.sort(key=lambda x: x[0])
Pragmatic Solution

• What to do
• Walk the AST
• Build symbol table
• Check all kinds of semantic errors
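
The three steps above can be sketched as a single AST walk that maintains a scoped symbol table. Everything here is an illustrative assumption rather than the course's required design: the tuple-shaped AST nodes, the `SymbolTable` and `check` names, and the choice to open a new scope for a loop body.

```python
class SemanticError(Exception):
    pass


class SymbolTable:
    """A stack of scopes; each scope maps a name to its declared type."""

    def __init__(self):
        self.scopes = [{}]

    def push(self):
        self.scopes.append({})

    def pop(self):
        self.scopes.pop()

    def declare(self, name, type_):
        if name in self.scopes[-1]:
            raise SemanticError(f"{name} redeclared in the same scope")
        self.scopes[-1][name] = type_

    def lookup(self, name):
        # Search from the innermost scope outwards.
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        raise SemanticError(f"{name} used before declaration")


def check(node, table):
    """Walk a tuple-shaped AST (hypothetical node shapes shown in the comments)."""
    kind = node[0]
    if kind == "decl":        # ("decl", name, type)
        table.declare(node[1], node[2])
    elif kind == "var":       # ("var", name)
        table.lookup(node[1])
    elif kind == "num":       # ("num", value)
        pass
    elif kind == "binop":     # ("binop", op, left, right)
        check(node[2], table)
        check(node[3], table)
    elif kind == "assign":    # ("assign", target_name, expr)
        table.lookup(node[1])
        check(node[2], table)
    elif kind == "while":     # ("while", cond, [stmt, ...])
        check(node[1], table)
        table.push()
        for stmt in node[2]:
            check(stmt, table)
        table.pop()
    else:
        raise ValueError(f"unknown node kind {kind!r}")
```

A real checker would also compute and compare types during the same walk; this sketch only resolves names.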
Challenge Yourself

• Add features to the language

• unpacking
• list comprehension
• lambda
• lifetimes
• …
IR Generation
IR: What & Why

• Intermediate Representation

• Focus less on the source language

• Pay more attention to the target platform

• Most transformations and analyses are done on the IR

IR Design

• IR design is closely related to

• Source language
• Target machine
• Transforms / Analysis
IR: Multiple Levels
• A compiler can use more than one IR, and those IRs can sit at more than one level.

• HIR/MIR: Carry more information. May have a type system similar to the source language. Higher-level analyses and transforms can be performed on them. (Alias analysis works better with type knowledge.)
• point1.x => (LoadField point1 "x")

• LIR: Closer to the target machine. Doesn’t carry much type information (only general-purpose vs. floating-point registers). Focuses on code generation.
• point1.x => (LoadMem (Mem baseAddr 4))
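
A small sketch of the lowering step between the two levels: the high-level instruction still names a field, and the low-level one only knows a base and a byte offset. The dataclass names mirror the slide's notation; the `LAYOUTS` and `VAR_TYPES` tables are hypothetical and chosen so that `x` lands at offset 4 as in the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadField:   # HIR: still knows which field of which object is read
    base: str
    field: str


@dataclass(frozen=True)
class Mem:         # LIR addressing: base register plus byte offset
    base: str
    offset: int


@dataclass(frozen=True)
class LoadMem:     # LIR: just a memory read, no field or type name left
    addr: Mem


# Assumed inputs from earlier phases: struct layouts and variable types.
LAYOUTS = {"Point": {"y": 0, "x": 4}}
VAR_TYPES = {"point1": "Point"}


def lower(instr, var_types, layouts):
    """Replace a field access by a raw memory access using the struct layout."""
    struct_name = var_types[instr.base]
    offset = layouts[struct_name][instr.field]
    return LoadMem(Mem(instr.base, offset))


print(lower(LoadField("point1", "x"), VAR_TYPES, LAYOUTS))
```

After lowering, an analysis can no longer tell that two loads touch different fields without reasoning about offsets, which is why type-aware analyses are easier to run before this step.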
IR: Multiple Levels
• A compiler can use more than one IR, and those IRs can sit at more than one level.

• LLVM: Low Level Virtual Machine

• Actually, its level is not that low.
• And it happens that LLVM uses a single representation.

• The more information you keep, the more chances you have to do analysis and transforms.
LLVM IR
• LLVM keeps almost everything!

C source:

    struct RT {
      char A;
      int B[10][20];
      char C;
    };
    struct ST {
      int X;
      double Y;
      struct RT Z;
    };
    int *foo(struct ST *s) {
      return &s[1].Z.B[5][13];
    }

LLVM IR:

    %struct.RT = type { i8, [10 x [20 x i32]], i8 }
    %struct.ST = type { i32, double, %struct.RT }

    define i32* @foo(%struct.ST* %s) {
    entry:
      %arrayidx = getelementptr inbounds %struct.ST,
                  %struct.ST* %s, i64 1, i32 2, i32 1, i64 5, i64 13
      ret i32* %arrayidx
    }
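
To see how much structure the getelementptr instruction keeps, it helps to work out the byte offset it stands for. The sketch below does that arithmetic under an assumed layout (i8 = 1 byte, i32 = 4, double = 8, each field aligned to its natural alignment, as is typical on x86-64 Linux). The type encoding and helper names are made up for illustration; this is not LLVM's API.

```python
# Hypothetical type encoding: ("prim", size) | ("array", count, elem) | ("struct", fields)
I8, I32, DOUBLE = ("prim", 1), ("prim", 4), ("prim", 8)


def array(count, elem):
    return ("array", count, elem)


def struct(*fields):
    return ("struct", fields)


def align_up(offset, alignment):
    return (offset + alignment - 1) // alignment * alignment


def alignment(ty):
    if ty[0] == "prim":
        return ty[1]
    if ty[0] == "array":
        return alignment(ty[2])
    return max(alignment(f) for f in ty[1])


def size(ty):
    if ty[0] == "prim":
        return ty[1]
    if ty[0] == "array":
        return ty[1] * size(ty[2])
    end = 0
    for f in ty[1]:
        end = align_up(end, alignment(f)) + size(f)
    return align_up(end, alignment(ty))  # tail padding


def field_offset(ty, index):
    offset = 0
    for i, f in enumerate(ty[1]):
        offset = align_up(offset, alignment(f))
        if i == index:
            return offset
        offset += size(f)
    raise IndexError(index)


def gep_offset(ty, indices):
    """First index steps over whole objects; later indices step into the type."""
    offset = indices[0] * size(ty)
    for idx in indices[1:]:
        if ty[0] == "struct":
            offset += field_offset(ty, idx)
            ty = ty[1][idx]
        else:  # array
            ty = ty[2]
            offset += idx * size(ty)
    return offset


RT = struct(I8, array(10, array(20, I32)), I8)
ST = struct(I32, DOUBLE, RT)
print(gep_offset(ST, [1, 2, 1, 5, 13]))
```

Under these assumptions, `struct ST` occupies 824 bytes, `Z` starts at offset 16, `B` at offset 4 inside `Z`, each row of `B` is 80 bytes, so the result is 824 + 16 + 4 + 5·80 + 13·4 = 1296. A lower-level IR would store only that number; LLVM IR keeps the path through the types.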
IR Design: Structure
• Tree (the Tiger Book)
• ✘ I cannot understand it
• ✘ Hard to analyze and transform

• Linear (the Dragon Book)

• ✘ Hard to analyze and transform

• Control Flow Graph

• ✔ Easy to build CFG IR
• ✔ Further analysis and transformations need CFG
Control Flow Graph
Source:

    if x > y:
        z = x
        foo()
    else:
        z = y
        bar()
    print(z)

Control flow graph:

             [ if x > y ]
              /        \
    [ z = x  ]          [ z = y  ]
    [ foo()  ]          [ bar()  ]
              \        /
             [ print(z) ]

• Node: Basic Block

• BB: Straight-line piece of code with no jumps except at its end and no jump targets except at its start

• Directed Edge: Jumps
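
A minimal sketch of how basic blocks and edges can be recovered from a flat instruction list, using the usual "leaders" rule: a block starts at the first instruction, at every label, and after every jump. The tuple-shaped instructions and the `build_cfg` name are assumptions made for this example.

```python
def build_cfg(instrs):
    """instrs: ("label", name) | ("jump", target) |
               ("branch", cond, then_target, else_target) | ("op", text)
    Returns (blocks, edges): blocks is a list of instruction lists,
    edges is a set of (from_block_index, to_block_index)."""
    if not instrs:
        return [], set()

    # 1. Leaders: first instruction, every label, every instruction after a jump.
    leaders = {0}
    for i, ins in enumerate(instrs):
        if ins[0] == "label":
            leaders.add(i)
        elif ins[0] in ("jump", "branch") and i + 1 < len(instrs):
            leaders.add(i + 1)
    starts = sorted(leaders)
    blocks = [instrs[s:e] for s, e in zip(starts, starts[1:] + [len(instrs)])]

    # 2. Map each label to the block it opens.
    label_to_block = {b[0][1]: n for n, b in enumerate(blocks) if b[0][0] == "label"}

    # 3. Edges: explicit jumps, or fall-through to the next block.
    edges = set()
    for n, block in enumerate(blocks):
        last = block[-1]
        if last[0] == "jump":
            edges.add((n, label_to_block[last[1]]))
        elif last[0] == "branch":
            edges.add((n, label_to_block[last[2]]))
            edges.add((n, label_to_block[last[3]]))
        elif n + 1 < len(blocks):
            edges.add((n, n + 1))
    return blocks, edges


# The if/else example from the slide, written as a flat instruction list.
PROGRAM = [
    ("branch", "x > y", "then", "else"),
    ("label", "then"), ("op", "z = x"), ("op", "foo()"), ("jump", "end"),
    ("label", "else"), ("op", "z = y"), ("op", "bar()"),
    ("label", "end"), ("op", "print(z)"),
]
blocks, edges = build_cfg(PROGRAM)
print(len(blocks), sorted(edges))
```

Calls such as `foo()` are treated as ordinary operations here, so they do not end a block. If the IR generator emits blocks directly while walking the AST, this reconstruction step is unnecessary.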

Design: Memory Model?
• Memory-to-Memory
• Reg Alloc: What should be kept in registers?
• ✘ Lots of students wasted lots of time on it

• Register-to-Register:
• Unlimited virtual registers
• Reg Alloc: What should be spilled to memory?
• ✔ Easy to understand
• ✔ Similar to the target platform
Design: Function?
• Should the “function” and “function call” concepts be present in the IR?

• I’m strongly in favor of it

• ✔ Simplifies things
• A function call doesn’t split a basic block
• In optimization terminology, “global” means inside a function, not the whole program.
Debugging

• I printed my IR in LLVM’s format and ran it

• Painful!
• No direct memory arithmetic!

• I wrote my own interpreter

• https://github.com/abcdabcd987/LLIRInterpreter
• Life is much easier!
Pragmatic Solution
• What to do
• Walk the AST
• Generate IR
• IR Design
• Use CFG IR. Don’t use tree IR or linear IR.
• Use register-to-register memory model
• Check senior students’ design for reference, for example
• https://github.com/abcdabcd987/LLIRInterpreter
Challenge Yourself

• Design your own IR

• Read for your reference if you want:
• https://speakerdeck.com/abcdabcd987/compiler2016-by-abcdabcd987
Optimizations
Optimizations
• Loop optimizations
  • Loop unrolling
  • Software pipelining
• Data-flow optimizations
  • Common subexpression elimination
  • Constant folding and propagation
• SSA-based optimizations
  • Global value numbering
  • Sparse conditional constant propagation
• Code generator optimization
  • Register allocation
  • Instruction selection
  • Instruction scheduling
• Others
  • Dead code elimination
  • Inlining
  • …
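
As a taste of the simplest item on this list, here is a sketch of constant folding on an expression tree. The tuple-shaped expressions and the `fold` name are assumptions for illustration; a compiler for this course would more likely fold instructions in its CFG IR.

```python
import operator

# Assumption: only three operators, enough to show the idea.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr):
    """expr is an int constant, a variable name (str), or (op, left, right).
    Sub-expressions whose operands are all constants are evaluated at compile time."""
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


# x + 2 * (3 + 4)  becomes  x + 14
print(fold(("+", "x", ("*", 2, ("+", 3, 4)))))
```

Python integers never overflow, so a real implementation has to wrap results to the target's integer width to preserve the program's behavior.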
Register Allocation

• Register-to-register IR: infinite virtual registers

• Real machine: limited number of registers

• Register allocation: map virtual registers to real registers

• Spilling: which virtual registers should move to memory

Register Allocation
• Linear scan algorithm
• ✔ Sounds easier?
• ✔ Allocate faster
• ✘ Slightly worse run time
• Graph coloring algorithm
• ✘ Liveness analysis
• ✘ Write more lines of code
• ✔ Better run time performance
• ✔ Actually, not hard at all. Way simpler than lots of OI/ACM
algorithms.
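
To back up the claim that graph coloring is not hard, here is a stripped-down sketch of the simplify/select idea. It assumes the interference graph has already been built from liveness analysis, and it omits coalescing and the rewrite-and-retry loop after spilling. The `color_graph` name and the dict-of-sets graph format are assumptions.

```python
def color_graph(interference, k):
    """interference: dict mapping each virtual register to the set of registers
    live at the same time (symmetric). k: number of physical registers.
    Returns (coloring, spilled)."""
    graph = {n: set(adj) for n, adj in interference.items()}
    stack, spilled = [], []

    # Simplify: repeatedly remove a node with fewer than k neighbours.
    while graph:
        node = next((n for n in graph if len(graph[n]) < k), None)
        if node is None:
            # Nothing is trivially colorable: spill the most constrained node.
            node = max(graph, key=lambda n: len(graph[n]))
            spilled.append(node)
        else:
            stack.append(node)
        for neighbour in graph.pop(node):
            graph[neighbour].discard(node)

    # Select: color in reverse removal order; a free color always exists
    # because each node had fewer than k neighbours left when it was removed.
    coloring = {}
    for node in reversed(stack):
        used = {coloring[nb] for nb in interference[node] if nb in coloring}
        coloring[node] = next(c for c in range(k) if c not in used)
    return coloring, spilled


GRAPH = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(color_graph(GRAPH, 3))
print(color_graph(GRAPH, 2))
```

With three registers nothing is spilled; with two, the triangle a-b-c cannot be colored, so one of its nodes goes to memory.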
Pragmatic Solution

• What to do
• Analyze and transform IR
• Graph coloring register allocation
• Inlining
Challenge Yourself

• Try all kinds of optimizations

Code Generation
Pragmatic Solution

• What to do
• Transform IR to target machine assembly
• Do it in a naïve way
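
One possible reading of "naïve" is sketched below: every virtual register lives in its own stack slot, and each IR instruction expands to a fixed load/compute/store pattern in NASM syntax. The three-address IR shape and the `gen_function` name are assumptions; calls, control flow, and constants wider than 32 bits are not handled.

```python
def gen_function(name, instrs):
    """instrs: ("const", dst, value) | ("add" or "sub", dst, a, b) | ("ret", src)
    Returns NASM assembly text for one function."""
    slots = {}

    def slot(vreg):
        # Give each virtual register an 8-byte stack slot on first use.
        if vreg not in slots:
            slots[vreg] = 8 * (len(slots) + 1)
        return f"qword [rbp - {slots[vreg]}]"

    body = []
    for ins in instrs:
        op = ins[0]
        if op == "const":
            body.append(f"    mov {slot(ins[1])}, {ins[2]}")
        elif op in ("add", "sub"):
            # Load, compute in rax, store back: slow but always correct.
            body.append(f"    mov rax, {slot(ins[2])}")
            body.append(f"    {op} rax, {slot(ins[3])}")
            body.append(f"    mov {slot(ins[1])}, rax")
        elif op == "ret":
            body.append(f"    mov rax, {slot(ins[1])}")
            body.append("    leave")
            body.append("    ret")
        else:
            raise ValueError(f"unknown instruction {op!r}")

    frame = (8 * len(slots) + 15) // 16 * 16  # keep rsp 16-byte aligned
    prologue = [
        f"global {name}",
        f"{name}:",
        "    push rbp",
        "    mov rbp, rsp",
        f"    sub rsp, {frame}",
    ]
    return "\n".join(prologue + body)


print(gen_function("main", [
    ("const", "t0", 1),
    ("const", "t1", 2),
    ("add", "t2", "t0", "t1"),
    ("ret", "t2"),
]))
```

Once this works end to end, register allocation replaces the stack slots with physical registers wherever it can.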
Challenge Yourself

• Dive further into x86-64

• Instruction selection
• Instruction scheduling
Standard Library
Pragmatic Solution

• Use libc
Challenge Yourself

• Write your own standard library

• Write your own heap memory allocator
Wrap Up
Pragmatic Solution
• Use ANTLR 4. Imitate existing ANTLR 4 grammars.
• Use CFG IR. Use register-to-register model.
• Use graph coloring register allocation.
• Use libc
• Talk to classmates, TAs, senior students
• Ask for help. Don’t plagiarize others’ code.
Challenge Yourself

• And help others
