Compiler Design Notes
Programming Language
A programming language is an organized way of communicating with a computer
using a set of commands and instructions that direct the computer to perform
a specific task. In these notes the term refers to compiler-based languages.
Programs written in them generally run faster than programs written in
scripting languages.
1. Compilation creates an executable file (for example, a .exe file).
2. A compiler is needed.
3. More complex to write.
4. More code is needed.
Examples
1. C
2. C++
3. Java
4. Pascal
5. COBOL
6. BASIC
Scripting Language
A scripting language is a programming language that supports scripts. Scripting
languages do not need to be compiled; they are interpreted. They are often
used for web applications. Because a scripting language is interpreter-based
and needs less code, programs take less time to write.
1. Does not create a .exe file.
2. No compiler is needed.
3. Easy to write.
4. Less code is needed.
Examples
1. PHP
2. Python
3. JavaScript
4. VBScript
5. Perl
6. Ruby
Markup language
A markup language is a computer language that uses tags to define elements within
a document. No compiler is needed. The language specifies the formatting,
both layout and style, within a text file.
Examples
1. HTML
2. XML
3. XHTML
4. SGML
Client-side scripting
Client-side scripting cannot be used to connect to a database on the server. It is
executed in the web browser.
Examples
1. JavaScript
2. ActionScript
3. VBScript
4. Ajax
5. HTML
6. CSS
Server-side scripting
Server-side scripting is used to connect to a database on the server. It is executed
on the web server and generates content for dynamic web pages.
Example 1
S ⇢ a, S ⇢ Aab, S ⇢ ε
Where
B ⇢ B11 | B1 | 1 | 0
where
S and B are non-terminals
0 and 1 are terminals
Example 1
S ⇢ a, S ⇢ abA, S ⇢ ε
where
S and A are non-terminals
a and b are terminals
ε is the empty string
Example 2
S ⇢ 10B | 00S
B ⇢ 11B | 1B | 1 | 0
where
S and B are non-terminals
0 and 1 are terminals
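As a rough illustration of how the productions of Example 2 generate strings, the sketch below expands the leftmost non-terminal of each sentential form until only terminals remain. The function name `generate` and the `max_len` cut-off are assumptions made for this sketch; they are not part of the notes.

```python
from collections import deque

# Productions of Example 2; uppercase letters are non-terminals.
GRAMMAR = {
    "S": ["10B", "00S"],
    "B": ["11B", "1B", "1", "0"],
}

def generate(start="S", max_len=5):
    """Return all terminal strings of length <= max_len derivable from start."""
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost non-terminal in the sentential form.
        pos = next((i for i, ch in enumerate(form) if ch in GRAMMAR), None)
        if pos is None:
            results.add(form)  # only terminals remain
            continue
        for rhs in GRAMMAR[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # No production shrinks a form, so over-long forms can be dropped.
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return sorted(results, key=lambda s: (len(s), s))

print(generate(max_len=4))
```

With `max_len=4` the sketch yields 100, 101, 1010 and 1011. The production S ⇢ 00S never produces a short string on its own, because S must eventually be replaced by 10B.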
Necessity of compiler
• Techniques used in a lexical analyzer can be used in text editors,
information retrieval systems, and pattern recognition programs.
Properties of Compiler
1. Correctness
   Correct output in execution.
2. It should report errors
   Correctly report if the programmer is not following the language syntax.
3. Efficiency
   Compile time and execution time.
4. Debugging / Usability.
Compiler                                          Interpreter
1. It translates the whole program at a time.     1. It translates statement by statement.
2. Compiler is faster.                            2. Interpreter is slower.
3. Debugging is not easy.                         3. Debugging is easy.
4. Compilers are not portable.                    4. Interpreters are portable.
Types of compiler
Native code compiler
A compiler may produce binary output that runs on the same computer and
operating system as the compiler itself. This type of compiler is called a
native code compiler.
1) Cross Compiler
A cross compiler is a compiler that runs on one machine and
produces object code for another machine.
2) Bootstrap compiler
A compiler that is implemented in its own source language is called a bootstrap
compiler, or self-hosting compiler.
3) One pass compiler
The compilation is done in one pass over the source program, hence the
compilation is completed very quickly. This is used for programming
languages such as Pascal, COBOL and FORTRAN.
4) Multi-pass compiler (2 or 3 pass compiler)
In this compiler, the compilation is done step by step. Each step uses the
result of the previous step and creates another intermediate result.
Example: gcc, Turbo C++
5) JIT Compiler
A just-in-time (JIT) compiler translates code into machine code at run time.
It is used for the Java programming language and Microsoft .NET.
6) Source to source compiler
It is a type of compiler that takes a high-level language as its input and
produces a high-level language as its output. Example: OpenMP
List of compiler
1. Ada compiler
2. ALGOL compiler
3. BASIC compiler
4. C# compiler
5. C compiler
6. C++ compiler
7. COBOL compiler
8. Smalltalk compiler
9. Java compiler
ASSEMBLER
Source-to-source Compiler
Source code of one programming language is translated into the source of another language.
Loader
A loader is a program that places programs into memory and prepares them for execution.
The loader is a part of the OS; it loads executable files into memory and runs them.
In analysis phase
1. Lexical Analyzer
2. Syntax Analyzer
3. Semantic Analyzer.
In synthesis phase
1. Intermediate Code Generator
2. Code Optimizer
3. Code Generator
1. Lexical Analysis
Lexical analyzer phase is the first phase of compilation process. It takes source code as input.
2. Syntax Analysis
Syntax analysis is the second phase of compilation process. It takes tokens as input and
generates a parse tree as output.
3. Semantic Analysis
Semantic analysis is the third phase of compilation process.
4. Intermediate Code Generation
The compiler translates the source code into intermediate code.
5. Code Optimization
6. Code Generation
Lexical Analyzer
Lexical Analyzer reads the source program character by character and returns the tokens of the
source program.
Syntax Analyzer
1. A Syntax Analyzer creates the syntactic structure (generally a parse tree) of the given
program.
2. A syntax analyzer is also called a parser.
3. A parse tree describes a syntactic structure
4. The syntax of a language is specified by a context free grammar (CFG).
Semantic Analyzer
1. A semantic analyzer checks the source program for semantic errors and collects the type
information for the code generation.
2. Type-checking is an important part of semantic analyzer.
3. Normally semantic information cannot be represented by the context-free grammar used in
syntax analyzers.
Symbol table
Symbol table information is used by the analysis and synthesis phases.
It is an essential data structure in a compiler.
It is used to verify if a variable has been declared.
It is used to determine the scope of a name.
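The two uses listed above (checking that a variable is declared, and resolving the scope of a name) can be sketched with a stack of dictionaries. This is a minimal illustration only; the class name `SymbolTable` and its methods are hypothetical, and real compilers store much more per symbol.

```python
class SymbolTable:
    """A stack of scopes; each scope maps a name to its attributes."""

    def __init__(self):
        self.scopes = [{}]  # index 0 is the global scope

    def enter_scope(self):
        self.scopes.append({})

    def exit_scope(self):
        self.scopes.pop()

    def declare(self, name, type_):
        scope = self.scopes[-1]
        if name in scope:
            raise KeyError(f"'{name}' is already declared in this scope")
        scope[name] = {"type": type_, "level": len(self.scopes) - 1}

    def lookup(self, name):
        """Search from the innermost scope outwards; None means undeclared."""
        for scope in reversed(self.scopes):
            if name in scope:
                return scope[name]
        return None


table = SymbolTable()
table.declare("a", "int")
table.enter_scope()
table.declare("a", "float")   # shadows the outer 'a'
inner = table.lookup("a")
table.exit_scope()
outer = table.lookup("a")
print(inner, outer, table.lookup("b"))
```

Inside the nested scope the lookup finds the inner declaration, after leaving it the outer one is visible again, and the undeclared name `b` gives `None`.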
Regular definition
A regular definition defines a pattern for finite strings of symbols. A language defined by
a regular grammar is known as a regular language.
Properties of Regular Languages
Union
If L1 and L2 are two regular languages, their union L1 ∪ L2 will also be regular.
Complement
If L(G) is a regular language, its complement L'(G) will also be regular.
L(G) = {a^n | n > 1}
L'(G) = {a^n | n <= 1}
Kleene Closure
If L1 is a regular language, its Kleene closure L1* will also be regular.
L1 = (a ∪ b)
L1* = (a ∪ b)*
Concatenation
If L1 and L2 are two regular languages, their concatenation L1.L2 will also be regular.
Intersection
If L1 and L2 are two regular languages, their intersection L1 ∩ L2 will also be regular.
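Two of these closure properties can be made concrete with finite automata. The sketch below builds a DFA for the complement example above (strings of more than one a), flips its accepting states to get the complement, and uses the product construction for intersection. The `DFA` class and the second language (an even number of a's) are assumptions chosen for illustration.

```python
from itertools import product

class DFA:
    def __init__(self, states, alphabet, delta, start, accepting):
        self.states, self.alphabet = set(states), set(alphabet)
        self.delta, self.start, self.accepting = delta, start, set(accepting)

    def accepts(self, word):
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting

def complement(dfa):
    """Swap accepting and non-accepting states of a complete DFA."""
    return DFA(dfa.states, dfa.alphabet, dfa.delta, dfa.start,
               dfa.states - dfa.accepting)

def intersection(d1, d2):
    """Product construction: run both DFAs in lockstep."""
    states = set(product(d1.states, d2.states))
    delta = {((p, q), s): (d1.delta[(p, s)], d2.delta[(q, s)])
             for (p, q) in states for s in d1.alphabet}
    accepting = {(p, q) for (p, q) in states
                 if p in d1.accepting and q in d2.accepting}
    return DFA(states, d1.alphabet, delta, (d1.start, d2.start), accepting)

# DFA over {a} for strings with more than one a: states count a's, capped at 2.
more_than_one = DFA({0, 1, 2}, {"a"},
                    {(0, "a"): 1, (1, "a"): 2, (2, "a"): 2}, 0, {2})
# Hypothetical second language: strings with an even number of a's.
even = DFA({0, 1}, {"a"}, {(0, "a"): 1, (1, "a"): 0}, 0, {0})

at_most_one = complement(more_than_one)
both = intersection(more_than_one, even)
print(at_most_one.accepts("a"), both.accepts("aa"), both.accepts("aaa"))
```

Because each result is again a DFA, the resulting language is regular, which is the content of the closure property.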
Precedence
1. * highest precedence.
2. Concatenation (.) second-highest precedence.
3. | (Union operator) lowest precedence.
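Python's `re` module follows the same precedence order, so it can be used to check how an unparenthesized expression groups. In the sketch below the pattern `a|bc*` is an arbitrary example chosen for illustration.

```python
import re

# 'a|bc*' groups as a | (b (c*)): star binds first, then concatenation, then union.
implicit = re.compile(r"a|bc*")
explicit = re.compile(r"a|(?:b(?:c*))")
# Parentheses override the default grouping: (a|b) followed by c*.
grouped = re.compile(r"(?:a|b)c*")

for word in ["a", "b", "bccc", "ac", "bcbc", ""]:
    # implicit and explicit should agree on every word.
    print(repr(word),
          bool(implicit.fullmatch(word)),
          bool(explicit.fullmatch(word)),
          bool(grouped.fullmatch(word)))
```

The string ac is rejected by `a|bc*` but accepted by `(a|b)c*`, which shows that union is applied last unless parentheses say otherwise.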
Example
Σ = {a, b}
a* = {ε, a, aa, aaa, aaaa, …}
a+ = {a, aa, aaa, aaaa, …}
With L = Σ:
L* = {ε, a, b, aa, ab, ba, bb, aab, aba, aaba, …}
L+ = {a, b, aa, ab, ba, bb, aab, aaba, …}
L^2 = {aa, ab, ba, bb}
L^3 = {aaa, aab, bbb, bba, …}
L^4 = {aaaa, aabb, bbbb, bbaa, …}
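The fixed-length sets in the example above (all strings of length 2, 3, 4 over {a, b}) can be enumerated mechanically. The helper names `power` and `star_up_to` below are made up for this sketch; since the Kleene closure is infinite, only a finite prefix of it is listed.

```python
from itertools import product

def power(alphabet, n):
    """All strings of length exactly n over the alphabet."""
    return ["".join(p) for p in product(sorted(alphabet), repeat=n)]

def star_up_to(alphabet, max_len):
    """Finite prefix of the Kleene closure: every string of length 0..max_len."""
    return [w for n in range(max_len + 1) for w in power(alphabet, n)]

L = {"a", "b"}
print(power(L, 2))
print(star_up_to(L, 2))
```

The empty string appears only in the closure listing; the positive closure is the same listing without it.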
Lexical analysis
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
It converts the high-level input program into a sequence of tokens.
• Type tokens (id, number, real, . . . )
• Punctuation tokens ( ; , ( ) { } . . . )
• Alphabetic tokens, i.e. keywords (if, void, return, . . . )
Role of Lexical Analyser
The lexical analyzer is the first phase of a compiler. Its main task is to read the input
characters and produce as output a sequence of tokens that the parser uses for syntax analysis.
As shown in the figure, upon receiving a “get next token” command from the parser, the lexical
analyzer reads input characters until it can identify the next token.
In this way it converts a sequence of characters into a sequence of tokens. It also removes
any extra whitespace and comments written in the source code.
1. Error can be detected.
2. Error is found during the execution of the program.
Basic Terminologies
Token
A sequence of characters that represents a unit of information in the source program.
1) Identifiers
2) Keywords
3) Operators
4) Special symbols
5) Constants
Example
int a = 9;
where
int - keyword
a - identifier
= - operator
9 - constant
; - special symbol
Solution
Number of tokens = 5
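The count of five tokens can be reproduced with a small regular-expression tokenizer. This is only a sketch: the keyword set, the operator and symbol character classes, and the function name `tokenize` are assumptions, not a full C lexer.

```python
import re

KEYWORDS = {"int", "float", "if", "else", "return", "void"}  # assumed subset

# One named group per token class; order matters when patterns overlap.
TOKEN_SPEC = [
    ("constant",       r"\d+"),
    ("identifier",     r"[A-Za-z_]\w*"),
    ("operator",       r"[=+\-*/<>]"),
    ("special_symbol", r"[;,(){}]"),
    ("skip",           r"\s+"),   # blanks and new lines are non-tokens
    ("mismatch",       r"."),     # any other character is an error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    """Return a list of (token class, lexeme) pairs."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "skip":
            continue
        if kind == "mismatch":
            raise SyntaxError(f"unexpected character {lexeme!r}")
        if kind == "identifier" and lexeme in KEYWORDS:
            kind = "keyword"
        tokens.append((kind, lexeme))
    return tokens

tokens = tokenize("int a = 9;")
print(len(tokens), tokens)
```

Each pair holds the token class and the lexeme that matched its pattern; whitespace is skipped and so does not count as a token.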
Non-Token
1. Comments
2. Blanks
3. New line
Lexeme
Sequence of characters in the source program that is matched by the pattern for a token.
Pattern
A set of strings in the input for which the same token is produced as output.
2. String
A string is a finite sequence of symbols taken from ∑.
Example
0110 is a valid string over the alphabet ∑ = {0, 1}
3. Length of a String
It is the number of symbols present in a string.
Examples −
• If S = ‘caeda’, |S|= 5
• If S = ‘010111’, |S|= 6
• If |S|= 0, it is called an empty string
Language
It can be finite or infinite.
Example
Examples
A,D,E,F,G
Examples
a, d, e, f, g (small letters)
3. Null String
NIL, ε
Context free grammar
G = (V, T, P, S)
V: set of non-terminal symbols
T: set of terminal symbols
P: set of production rules
S: start symbol
Derivation Tree
• Root vertex − Start symbol.
• Vertex − Non-terminal symbol.
• Leaves − Terminal symbol
Derivation Tree Approaches
Top-down Approach −
• Starts with the starting symbol S
• Goes down to tree leaves using productions
Bottom-up Approach −
• Starts from tree leaves
• Proceeds upward to the root which is the starting symbol S
Grammar Ambiguity
A grammar is ambiguous if some string it generates has:
1. More than one leftmost derivation, or
2. More than one rightmost derivation, or
3. More than one derivation tree (parse tree).
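To make the third condition concrete, the sketch below counts parse trees under the grammar E ⇢ E + E | E * E | id. This grammar is a standard textbook illustration and an assumption here; it does not appear in these notes. A count above 1 for any string means the grammar is ambiguous.

```python
from functools import lru_cache

def count_parse_trees(tokens):
    """Count parse trees for tokens under E -> E + E | E * E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))

print(count_parse_trees(["id", "+", "id", "*", "id"]))
```

For id + id * id the count is 2: one tree has + at the root and the other has * at the root.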
Example 1
Example
L = {a^n b^n}
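A language of n a's followed by n b's can be generated by the grammar S ⇢ aSb | ε. That grammar, and the choice to allow n = 0, are assumptions of this sketch, since the notes give only the language. The recursive function mirrors the two productions directly.

```python
def derives_anbn(word):
    """Recursive check mirroring S -> a S b | ε (assumed grammar, n >= 0)."""
    if word == "":
        return True                      # S -> ε
    if word[0] == "a" and word[-1] == "b":
        return derives_anbn(word[1:-1])  # S -> a S b
    return False

for w in ["", "ab", "aabb", "aab", "abab"]:
    print(repr(w), derives_anbn(w))
```

Each recursive call strips one matching a and b, which corresponds to one application of S ⇢ aSb in a top-down derivation.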
Properties of context-free languages
1. Union
Context-free languages are closed under union: if L1 and L2 are two context-free
languages, L1 ∪ L2 is also context-free.
Example
2. Concatenation
Context-free languages are closed under concatenation: if L1 and L2 are two context-free
languages, L1.L2 is also context-free.
Example
3. Kleene Closure
Context-free languages are closed under Kleene closure: if L1 is a context-free
language, L1* is also context-free.
Example
4. Intersection
Context-free languages are not closed under intersection: if L1 and L2 are two context-free
languages, L1 ∩ L2 need not be context-free.
Example
5. Complement
Context-free languages are not closed under complement: if L1 is a context-free
language, its complement need not be context-free.
Example
Parser
Syntax analysis
Syntax Analyzer creates the syntactic structure of the given source program.
This syntactic structure is mostly a parse tree.
Syntax Analyzer is also known as parser.
The syntax of a programming language is described by a context-free grammar (CFG). We will use
BNF (Backus-Naur Form) notation in the description of CFGs.
The syntax analyzer (parser) checks whether a given source program satisfies the rules implied
by a context-free grammar or not.
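A small recursive-descent parser shows what this check looks like in practice. The grammar below is an assumed example written in BNF, not one taken from these notes:
<expr> ::= <term> "+" <expr> | <term>
<term> ::= <factor> "*" <term> | <factor>
<factor> ::= "(" <expr> ")" | id
The sketch has one function per non-terminal and returns a parse tree as nested tuples, or raises `SyntaxError` when the token sequence does not satisfy the grammar.

```python
def parse(tokens):
    """Return a parse tree for tokens, or raise SyntaxError."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r} at position {pos}, found {peek()!r}")
        pos += 1

    def expr():
        left = term()
        if peek() == "+":
            expect("+")
            return ("expr", left, "+", expr())
        return ("expr", left)

    def term():
        left = factor()
        if peek() == "*":
            expect("*")
            return ("term", left, "*", term())
        return ("term", left)

    def factor():
        if peek() == "(":
            expect("(")
            inner = expr()
            expect(")")
            return ("factor", "(", inner, ")")
        expect("id")
        return ("factor", "id")

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree

print(parse(["id", "+", "id", "*", "id"]))
```

Splitting the grammar into expr, term and factor levels gives * a higher precedence than +, and it also makes the grammar unambiguous, so each valid input has exactly one parse tree.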