Compiler: A compiler is a translator which translates a program (the source program) written in one language into an equivalent program (the target program) written in another language. Advantages: 1. Source code is not included by the compiler, therefore compiled code is more secure than interpreted code. 2. A compiler produces an executable file, so the program can be run without need of the source code. 3. The object program can be used whenever required without the need of recompilation. Disadvantages: 1. If an error is found, the whole program (source code) has to be re-compiled again and again. 2. Object code needs to be produced before a final executable file; this can be a slow process.

Interpreter: An interpreter is a computer program that directly executes program instructions written in a programming or scripting language, without requiring them previously to have been compiled into a machine language program. Advantages: 1. If an error is found, there is no need to retranslate the whole program as with a compiler. 2. Debugging (checking errors) is easier since the interpreter stops when it finds an error. Disadvantages: 1. Source code is required for the program to be executed, and this source code can be read by any other programmer, so it is not secure. 2. Interpreted programs are generally slower than compiled programs because an interpreter translates one line at a time.

Scope Rules: The scope of a program entity (for example, a data item) is the part of the program where the entity is accessible. In most languages the scope of a data item is restricted to the program block in which it is declared. For example, variable x of block A is accessible in block A and in the enclosed block B. However, variable y of block A is not accessible in block B, since y is re-declared in block B. Thus, the statement x = y uses the y of block B.

Control Structure: The control structure of a language is the collection of language facilities for sequencing and altering the flow of control during the execution of programs.

PHASES OF COMPILER: 1. Lexical Analysis Phase: The lexical analysis phase scans the source code as a stream of characters and converts it into meaningful lexemes. 2. Syntax Analysis Phase: The syntax analysis phase takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree). 3. Semantic Analysis Phase: Semantic analysis processes the parse tree, determines the "meaning" of statements and generates an intermediate code. 4. Intermediate Code Generation Phase: In the intermediate code generation phase, the parse tree representation of the source code is converted into a low-level, machine-like intermediate representation.
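As a toy illustration of the lexical phase, the sketch below groups a character stream into (token name, lexeme) pairs using Python regular expressions; the token names and patterns here are assumptions made for this example, not part of any real compiler.

```python
import re

# Illustrative token patterns for a tiny language (assumed, not standard).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=]"),
    ("SKIP",   r"\s+"),
]

def tokenize(source):
    """Scan the source character stream and group characters into tokens."""
    pattern = "|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC)
    tokens = []
    for m in re.finditer(pattern, source):
        if m.lastgroup != "SKIP":                    # whitespace is not a token
            tokens.append((m.lastgroup, m.group()))  # (token name, lexeme)
    return tokens
```

For example, tokenize("area = base * 2") yields the pairs (ID, area), (OP, =), (ID, base), (OP, *), (NUMBER, 2).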
5. Code Optimization Phase: Optimization means making the code shorter and less complex, so that it can execute faster and take less space. 6. Code Generation Phase: The code generation phase translates the intermediate code representation of the source program into the target language program.

Lexical Analysis: Lexical analysis, or scanning, is the first phase of the compiler. The program which performs lexical analysis is called a scanner; a scanner is also called a lexical analyzer or linear analyzer. The lexical analyzer is the interface between the source program and the compiler. The scanner scans the input program character by character and groups the characters into lexical units called tokens.

Syntax Analysis (Parsing): Syntax analysis is the second phase of the compiler, also called parsing or hierarchical analysis. In parsing, the syntax analyzer processes the string of descriptors synthesized by the lexical analyzer to determine the syntactic structure of an input statement. The syntax analysis phase takes words/tokens from the lexical analyzer and checks the syntactic correctness of the input program. The output of the parsing step is a representation of the syntactic structure of a statement; the conventional representation is a syntax tree. Parsing takes the tokens produced by lexical analysis as input and generates a parse tree (or syntax tree).

Semantic Analysis: Semantic analysis is the third phase of the compiler, which checks whether the parse tree (or syntax tree) follows the rules of the language. Processing performed by the semantic analysis step can be classified into: 1. Processing of declarative statements. 2. Processing of imperative statements. During semantic processing of declarative statements, items of information are added to the various lexical tables. While processing executable (imperative) statements, information from the lexical tables is used to determine the semantic validity of a construct. Semantic analysis analyzes the syntax tree to identify the elemental evaluations and then applies rules of validity to them. In this phase, the actual analysis is done to determine the meaning of a statement; the meaning can be determined only if the statement is syntactically correct.

PASSES OF COMPILER: A pass means several phases of the compiler are grouped together and executed. A compiler pass refers to one traversal of the compiler through the entire source program. In an implementation of a compiler, the activities of one or more phases are combined into a single module known as a pass.

Single-pass Compiler: A single-pass compiler is a compiler that passes through the source code of each compilation unit only once. A one-pass compiler is a compiler that passes through the parts of each compilation unit only once, immediately translating each part into its final machine code.
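The grouping of phases into passes can be sketched as plain function composition; the four phase functions below are illustrative stand-ins (assumed names and behaviour), not a real compiler.

```python
# Stand-in phase functions (assumed): each maps one representation to the next.
def lexical(src):
    return src.split()                         # source text -> token list

def syntax(tokens):
    return ("prog", tokens)                    # tokens -> parse tree

def intermediate(tree):
    return [f"ic:{t}" for t in tree[1]]        # tree -> intermediate code

def generate(ic):
    return [s.replace("ic:", "mc:") for s in ic]   # -> target code

def single_pass(src):
    """One traversal: every phase applied in a single front-to-back sweep."""
    return generate(intermediate(syntax(lexical(src))))

def two_pass(src):
    """Pass I: analysis and intermediate code; Pass II: code generation."""
    ic = intermediate(syntax(lexical(src)))    # Pass I output
    return generate(ic)                        # Pass II reads Pass I's output
```

Both organizations produce the same target code; they differ only in how the phase modules are grouped into traversals.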
Advantages: 1. A single-pass compiler is faster than a multi-pass compiler. 2. A single-pass compiler uses fewer passes for compilation. 3. The compilation process in a single-pass compiler is less time consuming than in a multi-pass compiler.

Two-pass Compiler: A two-pass compiler is a compiler which goes through the source program twice and generates object code. The first pass, called Pass I, performs tasks like lexical analysis, syntax analysis, semantic analysis and intermediate code generation. The second pass, called Pass II, performs tasks like storage allocation, code optimization and code generation.

Multi-pass Compiler: A multi-pass compiler is a type of compiler that processes the source code or abstract syntax tree of a program several times. In Pass III, the compiler can read the output file produced by the second pass and check whether the tree follows the rules of the language. The output of the semantic analysis phase is the annotated syntax tree. Advantages: 1. A multi-pass compiler requires less memory space than a single-pass compiler. 2. The wider scope available to these compilers allows better code generation.

CROSS COMPILER: A compiler which runs on one machine and produces target code for another machine is known as a cross compiler. A cross compiler is a type of compiler that can create executable code for machines other than the machine it runs on.

BOOTSTRAPPING: Bootstrapping is a process in which a simple language is used to translate a more complicated program, which in turn may handle an even more complicated program. Bootstrapping is an approach for making a self-compiling compiler, that is, a compiler written in the source programming language that it intends to compile. A bootstrap compiler can compile the compiler itself, and thus we can use this compiled compiler to compile everything else, as well as future versions of itself. Advantages: 1. A bootstrapped compiler is written in the language it compiles. 2. Using bootstrapping techniques, an optimizing compiler can optimize itself.

Code Optimization: Code optimization is aimed at improving the execution efficiency of a program. It is a back-end phase of the compiler. The code optimization phase gets the intermediate code as input and produces optimized intermediate code as output. Code optimization is used to improve the intermediate code so that the output program runs faster and takes less space. Advantages: 1. An optimized program may occupy 25 per cent less storage and execute three times faster than an un-optimized program. 2. It reduces the cost of execution. Disadvantage: 1. About 40% extra compilation time is needed.
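One such optimization, constant folding, can be sketched on three-address-style tuples; the (op, arg1, arg2, result) instruction shape is an assumption made for this example.

```python
def fold_constants(code):
    """Replace an operation whose operands are both constants by its value,
    computed at compile time instead of at run time."""
    folded = []
    for op, a, b, result in code:
        if isinstance(a, int) and isinstance(b, int):
            value = {"+": a + b, "*": a * b}[op]       # compute now
            folded.append(("=", value, None, result))  # plain copy instruction
        else:
            folded.append((op, a, b, result))
    return folded
```

For instance, the instruction t1 = 4 * 60 is folded to the copy t1 = 240, while instructions involving run-time variables are left untouched.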
LEXICAL ANALYSIS: Lexical analysis is the act of breaking down source text into a set of words called tokens. Each token is found by matching sequential characters to patterns. A program which performs lexical analysis is termed a lexical analyzer (lexer), tokenizer or scanner. Lexical analysis is the process of converting a sequence of characters from the source program into a sequence of tokens.

Scanner: The scanner scans the input program character by character and groups the characters into lexical units called tokens.

Token: A token is a sequence of characters that represents a unit of information in the source program. A token is a pair which contains a token name and an optional attribute value.

Pattern: A rule that defines the set of input strings for which the same token is produced as output is known as a pattern.

Lexeme: A lexeme is a sequence of characters in the source program that matches the pattern for a token. A lexeme is an instance of a token: it is the actual text/character stream that matches the pattern and is recognized as a token.

Lexical Errors: Lexical errors can be detected during the lexical analysis phase. A lexical error is a sequence of characters that does not match the pattern of any token.

Input Buffering: Input buffering is a temporary area of memory into which each record of data is read when the input statement executes. There are efficiency issues concerned with the buffering of input.

Sentinel: The sentinel is a special character that cannot be part of the source program; a natural choice is the character eof. The advantage is that we will not lose characters for which a token is not yet formed while moving from one half of the buffer to the other.

FINITE AUTOMATA: A finite automaton is formally defined as a 5-tuple (Q, ∑, δ, q0, F) where Q is a non-empty finite set of states, ∑ is the input alphabet, q0 is the initial state, F is the set of final states with F ⊆ Q, and δ is a transition function (mapping function) Q x ∑ → Q, which determines the next state from the current state and input symbol. Finite State Machines (FSMs) can be used to simplify lexical analysis. A finite automaton is a recognizer: it merely recognizes whether an input string is in the language or not.

APPLICATIONS OF REGULAR EXPRESSIONS AND FINITE AUTOMATA (FA): An automaton with a finite number of states is called a Finite Automaton (FA) or Finite State Machine (FSM). Ways to build a lexical analyzer: 1. Write a lexical analyzer in a conventional systems programming language (using regular expressions). 2. Write a lexical analyzer in assembly language. 3. Generate a lexical analyzer using a lexical analyzer generator (the LEX utility on LINUX).
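The 5-tuple definition can be written out directly. The automaton below is an assumed example: it accepts strings over {0, 1} containing an even number of 1s.

```python
# Finite automaton as a 5-tuple: states Q, alphabet SIGMA, start state q0,
# final states F, and transition function delta: Q x SIGMA -> Q.
Q     = {"even", "odd"}
SIGMA = {"0", "1"}
q0    = "even"
F     = {"even"}
delta = {("even", "0"): "even", ("even", "1"): "odd",
         ("odd",  "0"): "odd",  ("odd",  "1"): "even"}

def accepts(string):
    """Run the DFA: the automaton recognizes whether the string is in the language."""
    state = q0
    for ch in string:
        state = delta[(state, ch)]   # next state from current state and input
    return state in F
```

Here accepts("101") is True (two 1s) while accepts("1") is False, exactly the recognizer behaviour described above.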
LEX: Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). The purpose of a lex program is to read an input stream and recognize tokens. The lex compiler, or simply lex, is a tool for automatically generating a lexical analyzer for a language. It is an integrated utility of the UNIX operating system. The input notation for lex is referred to as the lex language.

22. Lex Library functions: yylex(): This function is used to start or resume scanning. The next call to yylex() in the program will continue from the point where it left off. All code in the rules section is copied into yylex(). yytext: Whenever the lexer matches a token, the text of the token is stored in the null-terminated string yytext (which works just like a pointer in C). Whenever a new token is matched, the contents of yytext are replaced by the new token. yywrap(): The purpose of the yywrap() function is to do additional processing in order to "wrap" things up before terminating. yyerror(): The yyerror() function is called to report an error to the user.

SYNTAX ANALYSIS: Syntax analysis is the second phase of a compiler. Syntax is the grammatical structure of a language or program. The syntax analysis phase is also known as parsing.

Parsing: Parsing is the process of determining whether a string of tokens can be generated by a grammar.

Parser: The program performing syntax analysis is known as the parser. The syntax analyzer (parser) plays an important role in compiler design. The main objective of the parser is to check the input tokens to analyze their grammatical correctness. Tasks: 1. The main function of the syntax analyzer is to check that the tokens output by the lexical analyzer occur in patterns permitted by the syntax of the language. 2. It should report any syntax error in the program. 3. It should also recover from errors so that it can continue to process the rest of the input.

Terms required in the syntax analysis phase: Alphabet: A set of characters allowed by the language; the individual members of this set are called terminals. String: A group of characters from the alphabet; any finite sequence of alphabet symbols is called a string. Syntax Rules: The rules to be applied for splitting a sentence into an appropriate syntactic form for analysis. Context-Free Grammar (CFG): A CFG is built on an alphabet of letters called terminals, from which we make the strings that will be the words of a language.

Sentence: A string of terminals is called a sentence. For example: id + id * id. Sentential form: It consists of non-terminals and terminals. For example: E * E, where E is a non-terminal and * is a terminal.
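These notions can be made concrete with a small grammar. The sketch below performs a leftmost derivation of the sentence id + id * id using the (assumed) productions E -> E + E | E * E | id; each intermediate list is a sentential form.

```python
# Expression grammar: E is the only non-terminal (and the start symbol).
# Productions: E -> E + E | E * E | id
def leftmost_step(sentential, production):
    """Replace the leftmost non-terminal E using the given production body."""
    i = sentential.index("E")
    return sentential[:i] + production + sentential[i + 1:]

# Leftmost derivation of the sentence  id + id * id :
form = ["E"]
for body in (["E", "+", "E"], ["id"], ["E", "*", "E"], ["id"], ["id"]):
    form = leftmost_step(form, body)
```

The successive sentential forms are E, E + E, id + E, id + E * E, id + id * E, and finally the sentence id + id * id, which contains terminals only.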
Derivation: A derivation is basically a sequence of production rules applied in order to get the input string. Reduction: The replacement of a set of terminals or non-terminals (a sentential form) which matches the R.H.S. of a production rule by the non-terminal on its L.H.S. is called a reduction. Syntax tree or Parse tree or Derivation tree: It is a graphical representation of an expression which indicates the sequence in which production rules can be applied to the start symbol so as to derive the string.

27. Types of Parsers: top-down parsing and bottom-up parsing. TOP-DOWN PARSING: In top-down parsing, we start at the top of the parse tree (with the start symbol) and end up at the bottom of the parse tree. Top-down parsing is also called LL parsing (Left-to-right scan, Leftmost derivation) because in this method we always select the leftmost non-terminal for expansion.

Top-Down Parsing with Backtracking: Backtracking means that if one derivation of a production fails, the syntax analyzer restarts the process using different rules of the same production. The backtracking technique may process the input string more than once to determine the right production.

RECURSIVE DESCENT PARSING: A parser that uses a set of recursive procedures to recognize its input without backtracking is called a Recursive Descent Parser (RDP). A recursive descent parsing program consists of a set of procedures, one for each non-terminal. Execution starts with the start symbol's procedure, and ends or halts when that procedure body has scanned the entire input string.

PREDICTIVE PARSER: Predictive parsers are top-down parsers. A predictive parser is a recursive descent parser that does not require backtracking. Predictive parsing is a special form of recursive-descent parsing in which the lookahead symbol unambiguously determines the procedure selected for each non-terminal.

LL Parser: An LL parser is a top-down parser for a subset of the Context-Free Grammars (CFGs). An LL parser parses the input from left to right and constructs a leftmost derivation of the sentence. The class of grammars which are parsable in this way is known as the LL grammars. An LL parser is called an LL(k) parser if it uses k tokens of lookahead when parsing a sentence.

LL(1) Parser: LL(1) stands for Left-to-right parse, Leftmost derivation, 1-symbol lookahead. Definition: A predictive parser can be constructed for the class of LL(1) grammars. The first 'L' stands for scanning the input from left to right, the second 'L' stands for producing a leftmost derivation, and '1' stands for using one input symbol of lookahead at each step to make parsing action decisions.
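A minimal recursive-descent/predictive parser can be sketched as below; the tiny grammar E -> T (+ T)*, T -> id is an assumption chosen so that one token of lookahead suffices and no backtracking is needed.

```python
def parse_expression(tokens):
    """Predictive recursive-descent parser for  E -> T (+ T)* ,  T -> 'id'.
    One procedure per non-terminal; the lookahead token picks the production."""
    pos = 0

    def lookahead():
        return tokens[pos] if pos < len(tokens) else None

    def match(expected):
        nonlocal pos
        if lookahead() != expected:
            raise SyntaxError(f"expected {expected!r}, got {lookahead()!r}")
        pos += 1

    def T():                       # procedure for non-terminal T
        match("id")
        return "id"

    def E():                       # procedure for non-terminal E (start symbol)
        node = T()
        while lookahead() == "+":  # lookahead decides: expand (+ T) or stop
            match("+")
            node = ("+", node, T())
        return node

    tree = E()
    if lookahead() is not None:
        raise SyntaxError("trailing input")
    return tree
```

Parsing ["id", "+", "id", "+", "id"] yields a left-leaning tree, and any input that violates the grammar raises a syntax error instead of triggering backtracking.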
LL(1) Grammars: A grammar whose parsing table has no multiply-defined entries is said to be LL(1). The LL(1) grammars are suitable for top-down parsing. An LL(1) grammar is not left recursive and not ambiguous.

BOTTOM-UP PARSING: Bottom-up parsing constructs a parse tree for an input string beginning at the leaves (the bottom) and working up towards the root (the top). In bottom-up parsing, we start with the input string and, using steps of reduction, reach the start symbol (distinguished symbol) of the grammar.

OPERATOR PRECEDENCE PARSER: An operator precedence parser is a bottom-up parser that interprets an operator precedence grammar. An operator precedence grammar is a context-free grammar with the property that no production has either an empty right-hand side (null production) or two adjacent non-terminals in its right-hand side.

SHIFT-REDUCE PARSING: The most common bottom-up parsers are the shift-reduce parsers. Shift-reduce parsers examine the input tokens and either shift (push) them onto a stack or reduce elements at the top of the stack, replacing a right-hand side by a left-hand side. A shift-reduce parser can be implemented using a stack; the stack consists of grammar symbols (non-terminals/terminals).

LR PARSERS: LR parsing was invented by Donald Knuth in 1965. LR parsing is a bottom-up technique: LR means the input is scanned from left to right, generating a rightmost derivation in reverse. Advantages: 1. The LR method is a non-backtracking shift-reduce parsing method. 2. A large class of grammars can be parsed using the LR method; it is a superset of the grammars handled by predictive parsers. 3. Syntax errors are detected immediately while scanning the input. 4. Left recursion is not a problem in LR parsing; the grammar may be left recursive.

Types of LR Parser: SLR Parser: The SLR parsing action and goto functions are obtained from the deterministic finite automaton that recognizes viable prefixes. SLR does not produce well-defined parsing action tables for all grammars, but it does succeed on several grammars for programming languages. Given a grammar G, we augment G to obtain G', and from G' construct C, the canonical collection of sets of items for G'. CLR Parser: CLR refers to canonical LR with lookahead. CLR parsing uses the canonical collection of LR(1) items to construct the CLR(1) parsing table. A CLR(1) parsing table has more states than the SLR(1) table. In CLR(1), reduce entries are placed only under the lookahead symbols. LALR Parser: LALR stands for Look-Ahead LR Parser. It is intermediate in power between the SLR and CLR parsers. It is a compaction of the CLR parser, and hence the tables obtained from it are smaller than the CLR parsing table.
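The shift-reduce mechanics can be sketched with a deliberately naive loop; a real LR parser consults a parsing table rather than reducing greedily, so this is only an illustration of shift and reduce moves, using an assumed ambiguous grammar E -> E + E | E * E | id.

```python
# Grammar (assumed for illustration):  E -> E + E | E * E | id
RULES = [(("id",), "E"), (("E", "+", "E"), "E"), (("E", "*", "E"), "E")]

def shift_reduce(tokens):
    """Naive shift-reduce loop: shift tokens onto a stack, and reduce whenever
    the top of the stack matches a production's right-hand side."""
    stack, rest, trace = [], list(tokens), []
    while True:
        for rhs, lhs in RULES:                 # try to reduce the stack top
            n = len(rhs)
            if tuple(stack[-n:]) == rhs:
                stack[-n:] = [lhs]             # replace R.H.S. by L.H.S.
                trace.append(("reduce", lhs, list(stack)))
                break
        else:
            if not rest:                       # nothing to shift or reduce
                break
            stack.append(rest.pop(0))          # shift the next input token
            trace.append(("shift", stack[-1], list(stack)))
    return stack, trace
```

On input id + id the stack ends as ["E"], meaning the input reduces to the start symbol; the trace records each shift and reduce move. Because this sketch has no lookahead, it cannot resolve the precedence of + versus * the way a table-driven LR parser would.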
PARSER GENERATOR (YACC): YACC stands for "Yet Another Compiler-Compiler". YACC assists in the next phase of the compiler: it creates a parser which is output in a form suitable for inclusion in the next phase.

SDT: A Syntax Directed Translation (SDT) specifies the translation of a construct in terms of attributes associated with its syntactic components.

SYNTAX DIRECTED TRANSLATION (SDT): Syntax-directed translation fundamentally works by adding actions to the productions in a context-free grammar, resulting in a Syntax-Directed Definition (SDD). Syntax-directed translation refers to a method of compiler implementation where the source language translation is completely driven by the parser. The main idea behind syntax-directed translation is that the semantics, or meaning, of the program is closely tied to its syntax.

SYNTAX DIRECTED DEFINITIONS (SDD): A context-free grammar in which the productions are shown along with their associated semantic rules is called a syntax-directed definition. While grammars and automata describe the structure or syntax of the strings in a language, something more is needed to describe the meaning or semantics of those strings.

41. Evaluating an SDD at the Nodes of a Parse Tree: To evaluate an SDD, construct the annotated parse tree and evaluate the attributes at its nodes. The evaluation order depends on whether an attribute is synthesized or inherited. Synthesized attributes can be evaluated in bottom-up order (postorder traversal), i.e. children first. Inherited attributes can be evaluated in top-down order (preorder traversal), i.e. parent first. For an SDD having both synthesized and inherited attributes, there is no guarantee of any one order in which to evaluate the attributes at the nodes.

42. Dependency Graph: The inter-dependencies among the inherited and synthesized attributes at the nodes of a parse tree can be shown by a directed graph called a dependency graph. It is useful and customary to depict the data flow in a node for a given production rule by this simple diagram. A dependency graph represents the flow of information between the attribute instances in a parse tree.

43. S-Attributed: An SDD is S-attributed if every attribute is synthesized. A syntax-directed translation is called S-attributed if all its attributes are synthesized.

44. L-Attributed Definitions: The class of syntax-directed definitions whose attributes can always be evaluated in depth-first order is called the L-Attributed Definitions.
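Bottom-up evaluation of a synthesized attribute can be sketched on a small annotated tree; the tuple-based tree shape and the attribute name "val" are assumptions for this example.

```python
def evaluate(node):
    """Synthesized attribute 'val': each node's value is computed from its
    children, so evaluation visits children first (postorder / bottom-up)."""
    if isinstance(node, int):          # leaf: a digit whose val is itself
        return node
    op, left, right = node             # interior node of the parse tree
    lval, rval = evaluate(left), evaluate(right)
    return lval + rval if op == "+" else lval * rval

# Annotated parse tree for 3 * 5 + 4, with * grouped below +:
tree = ("+", ("*", 3, 5), 4)
```

Evaluating this tree yields 19: the children 3 * 5 are computed first (val = 15), and only then the parent + node, exactly the children-first order required for synthesized attributes.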
APPLICATION OF SDT: The main application of syntax-directed translation is the construction of syntax trees.

Construction of Syntax Trees: A syntax tree is nothing but a condensed form of a parse tree in which the operator and keyword nodes of the parse tree are moved to their parents and a chain of single productions is replaced by a single link. Definition: A syntax tree is a condensed form of parse tree which is useful for representing language constructs.

Code Generation: Code generation is the process by which a compiler's code generator converts some intermediate representation of the source code into a form (e.g., machine code) that can be readily executed by a machine.

Postfix Strings: In postfix notation an operator appears to the right of its operands. This is also called Polish notation. In postfix Polish notation, each operator is written immediately after its operands. The set of Polish expressions with operators + and * can be generated by a grammar.

Triples and Quadruples: An instruction in the triples representation is divided into three fields: op, arg1 and arg2. The fields arg1 and arg2, the arguments of op (the operator), are either pointers to the symbol table or pointers into the triple structure. Since three fields are used, this intermediate code format is known as triples. A quadruple is a record structure with four fields, which we call op, arg1, arg2 and result. The op field contains an internal code for the operator.

CODE OPTIMIZATION: For efficient execution of the program, code optimization is required. Code optimization is a back-end phase of the compiler. The goals of optimization are the reduction of execution time and the improvement of memory usage. Efficiency is achieved by: 1. Eliminating redundancies in a program. 2. Reordering or rewriting computations in a program. 3. Using appropriate code generation strategies. Advantages: 1. An optimized program may occupy 25 per cent less storage and execute three times faster than an un-optimized program. 2. It reduces the cost of execution. Disadvantage: 1. About 40% extra compilation time is needed.

Optimizing Transformations: Compiler optimization is generally implemented using a sequence of optimizing transformations: algorithms which take a program and transform it to produce a semantically equivalent output program that uses fewer resources or executes faster. Optimizing transformations are classified into local and global transformations. 1. Local transformation is used for small segments of a program. 2. Global transformation is used for larger segments consisting of loops or function bodies.
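The quadruple format can be sketched as records; the helper below is an assumed, illustrative routine that emits quadruples for an assignment from a postfix operand/operator list, allocating compiler temporaries t1, t2, ... as results.

```python
def quads_for(target, expr_postfix):
    """Build quadruples (op, arg1, arg2, result) from a postfix list,
    allocating a fresh temporary for each operator's result."""
    quads, stack = [], []
    temps = (f"t{i}" for i in range(1, 100))   # compiler-generated temporaries
    for item in expr_postfix:
        if item in ("+", "*"):
            arg2, arg1 = stack.pop(), stack.pop()
            result = next(temps)
            quads.append((item, arg1, arg2, result))
            stack.append(result)
        else:
            stack.append(item)                 # operand name
    quads.append(("=", stack.pop(), None, target))
    return quads
```

For a = b + c * d (postfix b c d * +), this produces the quadruples (*, c, d, t1), (+, b, t1, t2), (=, t2, -, a): each record carries the operator, two argument fields and an explicit result field.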
Elimination of Common Sub-expressions: An occurrence of an expression E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation.

Dead Code Elimination: Dead code is code which can be omitted from a program without affecting the results. Dead code is detected by checking whether the value assigned in an assignment statement is used anywhere in the program. Dead code is nothing but useless code: statements that compute values that never get used.

THREE ADDRESS CODE: Three-address code is a sequence of instructions of the form a = b op c, where a, b and c are names, constants or compiler-generated temporaries, and op is an operator. Three-address code is a linearized representation of a syntax tree in which explicit names correspond to the interior nodes of the graph. The reason for the term "three-address code" is that each statement usually contains three addresses: two for the operands and one for the result.

DAG for Expressions: A Directed Acyclic Graph (DAG) for an expression identifies the common sub-expressions in the expression. A DAG is constructed similarly to a syntax tree: a DAG has a node for every sub-expression of the expression; an interior node represents an operator and its children represent its operands. The difference between a syntax tree and a DAG is that a node N in a DAG has more than one parent if N represents a common sub-expression.

DAG Value-number Method: Each record is one node, which has a label field that determines the operation code. A node can be referred to by its index or position in the array. The integer index of a node is called its value number.

Basic Blocks: A basic block is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halting or branching except at the last instruction.

Flow Graphs: A graph representation of three-address statements is called a flow graph.

59. ISSUES IN THE DESIGN OF A CODE GENERATOR: 1. Input to the Code Generator, 2. The Target Program, 3. Memory Management, 4. Instruction Selection, 5. Register Allocation, 6. Evaluation Order.
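The value-number method described above can be sketched directly: nodes live in an array and a dictionary maps each (op, left, right) signature to its value number, so a repeated sub-expression reuses its existing node; the node encoding is an assumption for this example.

```python
def value_number(nodes, index_of, op, left, right):
    """Return the value number for (op, left, right), creating a node only if
    this sub-expression has not been seen before (common sub-expr sharing)."""
    key = (op, left, right)
    if key not in index_of:
        index_of[key] = len(nodes)   # value number = position in the array
        nodes.append(key)
    return index_of[key]

# DAG for (b + c) * (b + c): the repeated sub-expression b + c is one node.
nodes, index_of = [], {}
b    = value_number(nodes, index_of, "leaf", "b", None)
c    = value_number(nodes, index_of, "leaf", "c", None)
s1   = value_number(nodes, index_of, "+", b, c)
s2   = value_number(nodes, index_of, "+", b, c)   # reuses the existing node
prod = value_number(nodes, index_of, "*", s1, s2)
```

Only four nodes are created for the five sub-expressions: the second request for b + c returns the same value number as the first, which is exactly how a DAG node gains more than one parent.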