0% found this document useful (0 votes)
79 views62 pages

CS 540 Compiler Design Overview

The document outlines the course CS 540 Spring 2013 which covers topics in compilers including lexical analysis, syntax analysis, semantic analysis, code generation, and code optimization. It provides the prerequisites, instructor contact information, grading breakdown, required textbook, and notes the course is also offered online in a distance education section.

Uploaded by

Suman Mondal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
79 views62 pages

CS 540 Compiler Design Overview

The document outlines the course CS 540 Spring 2013 which covers topics in compilers including lexical analysis, syntax analysis, semantic analysis, code generation, and code optimization. It provides the prerequisites, instructor contact information, grading breakdown, required textbook, and notes the course is also offered online in a distance education section.

Uploaded by

Suman Mondal
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 62

CS 540 Spring 2013

The Course covers:


• Lexical Analysis
• Syntax Analysis
• Semantic Analysis
• Runtime environments
• Code Generation
• Code Optimization

CS 540 Spring 2013 GMU 2


Pre-requisite courses
• Strong programming background in C, C+
+ or Java – CS 310
• Formal Language (NFAs, DFAs, CFG) –
CS 330
• Assembly Language Programming and
Machine Architecture –CS 367

CS 540 Spring 2013 GMU 3


Operational Information
• Office: Engineering Building, Rm. 5315

• E-mail: [email protected]

• Class Web Page: Blackboard


• Discussion board: Piazza

• Computer Accounts on zeus.vse.gmu.edu (link on


‘Useful Links’)

CS 540 Spring 2013 GMU 4


CS 540 Course Grading
• Programming Assignments (45%)
– 5% + 10% + 10% + 20%
• Exams – midterm and final (25%, 30%)

CS 540 Spring 2013 GMU 5


Resources
• Textbooks:
– Compilers: Principles, Techniques and Tools,
Aho, Lam, Sethi & Ullman, 2007 (required)
– lex & yacc, Levine et. al.
• Slides
• Sample code for Lex/YACC (C, C++, Java)

CS 540 Spring 2013 GMU 6


Distance Education
• CS 540 Spring ‘13 session is delivered to the
Internet section (Section 540-DL) online by
NEW
• Students in distance section will access to online
lectures and can play back the lectures and
download the PDF slide files
• The distance education students will be given the
midterm and final exam on campus, on the same
day/time as in class students. Exam locations
will be announced closer to the exam dates.

CS 540 Spring 2013 GMU 7


Lecture 1: Introduction to
Language Processing & Lexical
Analysis
CS 540
What is a compiler?
A program that reads a program written in one
language and translates it into another
language.

Source language Target language

Traditionally, compilers go from high-level


languages to low-level languages.
CS 540 Spring 2013 GMU 9
Compiler Architecture

In more detail:
Intermediate
Language

Source Target Language


Language Front End – Back End –
language specific machine specific

•Separation of Concerns
•Retargeting

CS 540 Spring 2013 GMU 10


Compiler Architecture
Intermediate
Intermediate
Language
Language

Scanner Parser Semantic Code


Source Code Target
language
(lexical (syntax Analysis Generator language
Optimizer
analysis) analysis) (IC generator)
tokens Syntactic
structure

Symbol
Table

CS 540 Spring 2013 GMU 11


Lexical Analysis - Scanning

Scanner tokens Parser Semantic Code


Source (lexical (syntax Generator
Analysis
languag analysis) analysis) (IC generator)
e

Code
Optimizer

• Tokens described formally


• Breaks input into tokens
Symbol
• Remove white space
Table

CS 540 Spring 2013 GMU 12


Input: result = a + b * c / d

• Tokens:
‘result’, ‘=‘, ‘a’, ‘+’, ‘b’, ‘*’, ‘c’, ‘/’, ‘d’

identifiers
operators

CS 540 Spring 2013 GMU 13


Static Analysis - Parsing
tokens Syntactic
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)

Code
Optimizer

• Syntax described formally


• Tokens organized into syntax tree
that describes structure Symbol
• Error checking (syntax)
Table

CS 540 Spring 2013 GMU 14


Input: result = a + b * c / d
Exp ::= Exp ‘+’ Exp Assign
| Exp ‘-’ Exp
| Exp ‘*’ Exp ID ‘=‘ Exp
| Exp ‘/’ Exp
| ID Exp ‘+’ Exp

Assign ::= ID ‘=‘ Exp ID Exp ‘*’ Exp

ID Exp ‘/’ Exp

ID ID

CS 540 Spring 2013 GMU 15


Semantic Analysis
Syntactic/semantic
Syntactic structure
Scanner Parser structure Semantic Code
Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)

Syntactic/semantic
structure
Code
Optimizer

• “Meaning”
• Type/Error Checking
• Intermediate Code Generation –
abstract machine Symbol
Table

CS 540 Spring 2013 GMU 16


Optimization

Scanner Parser Semantic Code


Source Target
language
(lexical (syntax Analysis Generator
language
analysis) analysis) (IC generator)
Syntactic/semantic
structure
Syntactic/semantic
Code structure
Optimizer

• Improving efficiency (machine


independent)
• Finding optimal code is NP Symbol
Table

CS 540 Spring 2013 GMU 17


Code Generation
Syntactic/semantic
structure
Scanner Parser Semantic Code
Source Target
language
(lexical (syntax Analysis Generator language
analysis) analysis) (IC generator)

Syntactic/semantic
Code structure
Optimizer

• IC to real machine code


• Memory management, register
allocation, instruction selection, Symbol
instruction scheduling, … Table

CS 540 Spring 2013 GMU 18


Issues Driving Compiler Design
• Correctness
• Speed (runtime and compile time)
– Degrees of optimization
– Multiple passes
• Space
• Feedback to user
• Debugging
CS 540 Spring 2013 GMU 19
Related to Compilers
• Interpreters (direct execution)
• Assemblers
• Preprocessors
• Text formatters (non-WYSIWYG)
• Analysis tools

CS 540 Spring 2013 GMU 20


Why study compilers?
• Bring together:
– Data structures & Algorithms
– Formal Languages
– Computer Architecture
• Influence:
– Language Design
– Architecture (influence is bi-directional)
• Techniques used influence other areas (program
analysis, testing, …)

CS 540 Spring 2013 GMU 21


Review of Formal Languages
• Regular expressions, NFA, DFA
• Translating between formalisms
• Using these formalisms

CS 540 Spring 2013 GMU 22


What is a language?
• Alphabet – finite character set (
• String – finite sequence of characters – can
be , the empty string (Some texts use as
the empty string)
• Language – possibly infinite set of strings
over some alphabet – can be { }, the empty
language.

CS 540 Spring 2013 GMU 23


Suppose = {a,b,c}. Some
languages over could be:
• {aa,ab,ac,bb,bc,cc}
• {ab,abc,abcc,abccc,. . .}
• { }
• {}
• {a,b,c,
• …

CS 540 Spring 2013 GMU 24


Why do we care about Regular
Languages?
• Formally describe tokens in the language
– Regular Expressions
– NFA
– DFA
• Regular Expressions  finite automata
• Tools assist in the process

CS 540 Spring 2013 GMU 25


Regular Expressions
The regular expressions over finite  are the
strings over the alphabet  + { ), (, |, * }
such that:
1. { } (empty set) is a regular expression for the
empty set
2. is a regular expression denoting {  }
3. a is a regular expression denoting set { a } for
any a in 

CS 540 Spring 2013 GMU 26


Regular Expressions
4. If P and Q are regular expressions over , then so are:
• P | Q (union)
If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then
P | Q denotes the set {a,…,e,0,…,9}
• PQ (concatenation)
If P denotes the set {a,…,e}, Q denotes the set {0,…,9} then
PQ denotes the set {a0,…,e0,a1,…,e9}
• Q* (closure)
If Q denotes the set {0,…,9} then Q* denotes the set {,0,
…,9,00,…99,…}

CS 540 Spring 2013 GMU 27


Examples
If  = {a,b}
• (a | b)(a | b)
• (a | b)*b
• a*b*a*
• a*a (also known as a+)
• (ab*)|(a*b)

CS 540 Spring 2013 GMU 28


Nondeterministic Finite Automata
A nondeterministic finite automaton (NFA) is a
mathematical model that consists of
1. A set of states S
2. A set of input symbols 
3. A transition function that maps state/symbol pairs to a
set of states:
S x { + }  set of S
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no
CS 540 Spring 2013 GMU 29
Example NFA
Transition Table:
a

STATE
0

1 b 2 b 3 a b 
a,b 0 0,3 0 1
1 2
S = {0,1,2,3} 2 3
S0 = 0
3
 = {a,b}
F = {3}

CS 540 Spring 2013 GMU 30


NFA Execution
An NFA says ‘yes’ for an input string if there is some path from
the start state to some final state where all input has been
processed.
NFA(int s0, int input) {
if (all input processed && s0 is a final state) return Yes;
if (all input processed && s0 not a final state) return No;

for all states s1 where transition(s0,table[input]) = s1


if (NFA(s1,input_element+1) == Yes) return Yes;

for all states s1 where transition(s0,) = s1


if (NFA(s1,input_element) == Yes) return Yes;
return No;
}
Uses backtracking to search
all possible paths
CS 540 Spring 2013 GMU 31
Deterministic Finite Automata
A deterministic finite automaton (DFA) is a mathematical
model that consists of
1. A set of states S
2. A set of input symbols 
3. A transition function that maps state/symbol pairs to a
state:
SxS
4. A special state s0 called the start state
5. A set of states F (subset of S) of final states
INPUT: string
OUTPUT: yes or no
CS 540 Spring 2013 GMU 32
DFA Execution
DFA(int start_state) {
state current = start_state;
input_element = next_token();
while (input to be processed) {
current =
transition(current,table[input_element])

if current is an error state return No;


input_element = next_token();
}
if current is a final state return Yes;
else return No;
}

CS 540 Spring 2013 GMU 33


Regular Languages
1. There is an algorithm for converting any RE into
an NFA.
2. There is an algorithm for converting any NFA to a
DFA.
3. There is an algorithm for converting any DFA to a
RE.

These facts tell us that REs, NFAs and DFAs have


equivalent expressive power. All three describe
the class of regular languages.
CS 540 Spring 2013 GMU 34
Converting Regular
Expressions to NFAs
The regular expressions over finite  are the strings
over the alphabet  + { ), (, |, *} such that:
• { } (empty set) is a regular expression for the empty set

• Empty string is a regular expression denoting {  }



• a is a regular expression denoting {a } for any a in 
a

CS 540 Spring 2013 GMU 35


Converting Regular
Expressions to NFAs
If P and Q are regular expressions with NFAsNp, Nq:
P | Q (union) Np
 

 
Nq

PQ (concatenation)
Np Nq
  

CS 540 Spring 2013 GMU 36


Converting Regular
Expressions to NFAs
If Q is a regular expression with NFA Nq:
Q* (closure) 
Nq
 

CS 540 Spring 2013 GMU 37


Example (ab* | a*b)*
Starting with:
a
ab* 1 2
a*b 3
b
4

b a

ab* | a*b
a
1 2
 
5 b 6
 
3 b 4

CS 540 Spring 2013 GMU 38


Example (ab* | a*b)*
ab* | a*b
a
1 2 

5 b 6

3 b 4 
(ab* | a*b)*
a a
1 2 

 b

7 5 6 8
 
 b
 3 4
a

CS 540 Spring 2013 GMU 39


Converting NFAs to DFAs
• Idea: Each state in the new DFA will correspond to
some set of states from the NFA. The DFA will be in
state {s0,s1,…} after input if the NFA could be in any
of these states for the same input.

• Input: NFA N with state set SN, alphabet , start state sN, final
states FN, transition function TN: SN x  + {}  set of SN
• Output: DFA D with state set SD, alphabet , start state
sD = -closure(sN), final states FD, transition function TD: SD
x   SD

CS 540 Spring 2013 GMU 40


-closure()
Defn: -closure(T) = T + all NFA states reachable
from any state in T using only  transitions.
b
1 2 b

a 
b 5 -closure({1,2,5}) = {1,2,5}
a
-closure({4}) = {1,4}
3

4 -closure({3}) = {1,3,4}
-closure({3,5}) = {1,3,4,5}

CS 540 Spring 2013 GMU 41


Algorithm: Subset Construction
sD = -closure(sN) -- create start state for DFA
SD = {sD} (unmarked)
while there is some unmarked state R in SD
mark state R
for all a in  do
s = -closure((R,a));
if s not already in SD then add it (unmarked)
TD(R,a) = s;
end for
end while
FD = any element of SD that contains a state in FN

CS 540 Spring 2013 GMU 42


Example 1: Subset Construction
NFA


1 2 a,b

a b 5

a,b
3 4
b

CS 540 Spring 2013 GMU 43


Example 1: Subset Construction
NFA
1,2


1 2 a,b

a b 5

a,b a b
3 4
b {1,2}

CS 540 Spring 2013 GMU 44


Example 1: Subset Construction
NFA
b
1,2 4,5

 a
1 2 a,b

a b 5 3,5

a,b a b
3 4
b {1,2} {3,5} {4,5}
{3,5}
{4,5}

CS 540 Spring 2013 GMU 45


Example 1: Subset Construction
NFA
b
1,2 4,5

 a
1 2 a,b
b 3,5 b 4
a 5

a,b a b
3 4
b {1,2} {3,5} {4,5}
{3,5} - {4}
{4,5}
{4}

CS 540 Spring 2013 GMU 46


Example 1: Subset Construction
NFA
b a,b
1,2 4,5 5

 a
1 2 a,b
b 3,5 b 4
a 5

a,b a b
3 4
b {1,2} {3,5} {4,5}
{3,5} - {4}
{4,5} {5} {5}
{4}
{5}
CS 540 Spring 2013 GMU 47
Example 1: Subset Construction
NFA
b a,b
1,2 4,5 5

 a a,b
1 2 a,b
b 3,5 b 4
a 5

a,b a b
3 4
b {1,2} {3,5} {4,5}
{3,5} - {4}
{4,5} {5} {5}
All final states since the
NFA final state is included {4} {5} {5}
{5} - -
CS 540 Spring 2013 GMU 48
Example 2: Subset Construction
NFA

b
1 2 b

a  b 5

a
3 4

CS 540 Spring 2013 GMU 49


Example 2: Subset Construction
NFA DFA

b a
1 2 b 1 1,3,4
 b b b
a 5 a
a 2 b a
3 4 b 1,3,4,5
 b
a
1,4,5

CS 540 Spring 2013 GMU 50


Example 3: Subset Construction
NFA DFA

b
 b
1 2 a 3 5 1,2,4

 a
b b
b 3,4 3,5 3
a 4 5
b b
a

4
a

CS 540 Spring 2013 GMU 51


Converting DFAs to REs
1. Combine serial links by concatenation
2. Combine parallel links by alternation
3. Remove self-loops by Kleene closure
4. Select a node (other than initial or final) for
removal. Replace it with a set of equivalent
links whose path expressions correspond to the
in and out links
5. Repeat steps 1-4 until the graph consists of a
single link between the entry and exit nodes.

CS 540 Spring 2013 GMU 52


Example
a
d b d a d
0 1 2 3 4 5
c
d b b

6 c 7

parallel edges become alternation

d a|b|c d a d
0 1 2 3 4 5

d b
b|c
6 7

CS 540 Spring 2013 GMU 53


Example
d a|b|c d a d
0 1 2 3 4 5

d b
b|c
6 7

serial edges become concatenation

d (a|b|c) d a d
0 3 4 5
b (b|c) d

CS 540 Spring 2013 GMU 54


Example
d (a|b|c) d a d
0 3 4 5
b (b|c) d

Find paths that can be “shortened”

d (a|b|c) d a d
0 3 4 5

b(b|c)da

CS 540 Spring 2013 GMU 55


Example
d (a|b|c) d a d
0 3 4 5

b(b|c)da eliminate self-loops

d (a|b|c) d a (b(b|c)da)*d
0 3 4 5

serial edges become concatenation

d (a|b|c) d a (b(b|c)da)*d
0 5

CS 540 Spring 2013 GMU 56


Describing Regular Languages
• Generate all strings in the language
• Generate only strings in the language

Try the following:


– Strings of {a,b} that end with ‘abb’
– Strings of {a,b} that don’t end with ‘abb’
– Strings of {a,b} where every a is followed by at
least one b
CS 540 Spring 2013 GMU 57
Strings of (a|b)* that end in abb

re: (a|b)*abb
a b b
0 1 2 3

a,b NFA

b
a

a b b
0 1 2 3

b a a

DFA
CS 540 Spring 2013 GMU 58
Strings of (a|b)* that don’t end in
abb

re: ??
b
a

a b b
0 1 2 3

b a a

DFA/NFA

CS 540 Spring 2013 GMU 59


Strings of (a|b)* that don’t end in
abb
b b
a a

a b b b*a a*b b
0 1 2 3 0 1 2 3

b a a a

bb
a*bbb
a

b*a a*b b*a a*b


0 1 2 0 1 2

a*bba |
ba a*ba

CS 540 Spring 2013 GMU 60


Suggestions for writing
NFA/DFA/RE
• Typically, one of these formalisms is more
natural for the problem. Start with that and
convert if necessary.
• In NFA/DFAs, each state typically captures
some partial solution
• Be sure that you include all relevant edges
(ask – does every state have an outgoing
transition for all alphabet symbols?)

CS 540 Spring 2013 GMU 61


Non-Regular Languages
Not all languages are regular”
• The language ww where w=(a|b)*

Non-regular languages cannot be described


using REs, NFAs and DFAs.

CS 540 Spring 2013 GMU 62

You might also like