Compiler Design
A compiler is a program that reads a program written in one language (the source language) and
translates it into an equivalent program in another language (the target language). The compiler
also reports to its user the presence of errors in the source program.
2. What are the two parts of a compilation? Explain briefly.
Analysis and Synthesis are the two parts of compilation.
The analysis part breaks up the source program into constituent pieces and creates an
intermediate representation of the source program.
The synthesis part constructs the desired target program from the intermediate representation.
3. List the subparts or phases of the analysis part.
Analysis consists of three phases:
Linear Analysis.
Hierarchical Analysis.
Semantic Analysis.
Preprocessor
Source program
Compiler
Assembler
Lexical Analyzer
Syntax Analyzer
Semantic Analyzer
Code optimizer
Code generator
Single-pass
Multi-pass
Load-and-go
Debugging or optimizing
Preprocessors
Assemblers
10. List the phases that constitute the front end of a compiler.
The front end consists of those phases or parts of phases that depend primarily on the source
language and are largely independent of the target machine. These include
Semantic analysis
A certain amount of code optimization can be done by the front end as well. The front end also
includes the error handling that goes along with each of these phases.
11. Mention the back-end phases of a compiler.
The back end of a compiler includes those portions that depend on the target machine. These
portions generally do not depend on the source language, only on the intermediate language.
These include
Code optimization
Code generation, along with error handling and symbol-table operations.
Parser generators
Scanner generators
Data-flow engines
Pattern: There is a set of strings in the input for which the same token is produced as
output. This set of strings is described by a rule called a pattern associated with the token.
Lexeme: A sequence of characters in the source program that is matched by the pattern
for a token.
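The relationship between token, pattern and lexeme can be sketched with a small scanner. This is only an illustration: the token names (NUM, ID, RELOP, SKIP) and their patterns are assumptions chosen for the example, not definitions taken from these notes.

```python
import re

# Hypothetical token names and patterns, chosen only for illustration.
TOKEN_PATTERNS = [
    ("NUM", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("RELOP", r"<=|>=|<>|<|>|="),
    ("SKIP", r"\s+"),
]
# One combined regular expression with a named group per token.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Return (token, lexeme) pairs; whitespace is matched but not reported."""
    pairs = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"no pattern matches at position {pos}")
        if m.lastgroup != "SKIP":
            # m.lastgroup is the token, m.group() is the lexeme it matched.
            pairs.append((m.lastgroup, m.group()))
        pos = m.end()
    return pairs


print(tokenize("count <= 10"))
# [('ID', 'count'), ('RELOP', '<='), ('NUM', '10')]
```

Here the lexemes "count" and "x1" would both produce the same token ID, because both are matched by the pattern associated with that token.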
Union: L ∪ M = {s | s is in L or s is in M}
Character classes ([abc], where a, b and c are alphabet symbols, denotes the regular
expression a | b | c.)
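Both notations can be checked with a few lines of Python. This is a minimal sketch: the sample languages L and M and the helper name same_language are assumptions made for the example.

```python
import re

# Union of two (finite, hypothetical) languages: {s | s is in L or s is in M}.
L = {"a", "ab"}
M = {"ab", "b"}
union = L | M

# The character class [abc] and the alternation a|b|c describe the same strings.
char_class = re.compile(r"[abc]")
alternation = re.compile(r"a|b|c")


def same_language(samples):
    """True if both expressions accept exactly the same strings among samples."""
    return all(
        bool(char_class.fullmatch(s)) == bool(alternation.fullmatch(s))
        for s in samples
    )


print(sorted(union))
print(same_language(["a", "b", "c", "d", "ab", ""]))
```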
Specification of syntax
T is a set of terminals
S is a start symbol
Canonical LR parser
Backtracking
Left recursion
Left factoring
Ambiguity
A → X.YZ
A → XY.Z
A → XYZ.
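The items of a production can be enumerated mechanically by placing the dot at every position of the right side. The sketch below assumes a production is given as a head and a list of body symbols; the function name lr0_items is hypothetical. Note that the full enumeration also yields the item with the dot at the left end.

```python
def lr0_items(head, body):
    """Return every LR(0) item of the production head -> body as a string."""
    items = []
    for dot in range(len(body) + 1):
        marked = body[:dot] + ["."] + body[dot:]
        items.append(f"{head} -> {''.join(marked)}")
    return items


for item in lr0_items("A", ["X", "Y", "Z"]):
    print(item)
# A -> .XYZ
# A -> X.YZ
# A -> XY.Z
# A -> XYZ.
```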
17. What is meant by viable prefixes?
The prefixes of right sentential forms that can appear on the stack of a shift-reduce parser
are called viable prefixes. Equivalently, a viable prefix is a prefix of a right sentential form
that does not continue past the right end of the rightmost handle of that sentential form.
18. Define handle.
A handle of a string is a substring that matches the right side of a production, and whose
reduction to the nonterminal on the left side of the production represents one step along the
reverse of a rightmost derivation.
A handle of a right-sentential form γ is a production A → β and a position of γ where the string
β may be found and replaced by A to produce the previous right-sentential form in a rightmost
derivation of γ. That is, if S =>* αAw => αβw, then A → β in the position following α is a handle
of αβw.
19. What are kernel & non-kernel items?
Kernel items, which include the initial item, S' → .S, and all items whose dots are not at the
left end.
Non-kernel items, which have their dots at the left end.
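The distinction shows up when the closure of a set of items is computed: the kernel items are given, and the closure operation adds only non-kernel items. The following is a minimal sketch under assumptions: the tiny augmented grammar, the tuple representation of items and the function names are all chosen for illustration.

```python
# Hypothetical augmented grammar: S' -> S, S -> ( S ) | x
GRAMMAR = {
    "S'": [["S"]],
    "S": [["(", "S", ")"], ["x"]],
}


def closure(kernel):
    """Close a list of items; each item is (head, body_tuple, dot_position)."""
    items = list(kernel)
    seen = set(kernel)
    for head, body, dot in items:  # the list grows while it is scanned
        if dot < len(body) and body[dot] in GRAMMAR:
            for production in GRAMMAR[body[dot]]:
                item = (body[dot], tuple(production), 0)
                if item not in seen:
                    seen.add(item)
                    items.append(item)
    return items


def is_kernel(item):
    """Kernel: the initial item S' -> .S, or any item whose dot is not at the left end."""
    head, body, dot = item
    return dot > 0 or head == "S'"


state = closure([("S", ("(", "S", ")"), 1)])  # kernel item S -> (.S)
print([is_kernel(item) for item in state])
# [True, False, False]
```

Because the non-kernel items can always be regenerated by the closure operation, a parser generator only needs to store the kernel items of each state.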
20. What is phrase level error recovery?
Phrase level error recovery is implemented by filling in the blank entries in the predictive parsing
table with pointers to error routines. These routines may change, insert, or delete symbols on the
input and issue appropriate error messages. They may also pop from the stack.
Syntax tree
Postfix
3. Define backpatching.
Backpatching is the activity of filling in unspecified label information, using appropriate
semantic actions, during the code generation process. The functions used in the semantic actions
are mklist(i), merge_list(p1,p2) and backpatch(p,i).
4. Mention the functions that are used in backpatching.
mklist(i) creates a new list. The index i is passed as an argument to this function, where
i is an index into the array of quadruples.
merge_list(p1,p2) concatenates the two lists pointed to by p1 and p2. It returns
a pointer to the concatenated list.
backpatch(p,i) inserts i as the target label for each of the statements on the list
pointed to by p.
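The three functions can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: quadruples are modelled as Python lists of the form [op, arg1, arg2, result], the helper emit is hypothetical, and backpatch is assumed to fill in the jump target of every quadruple on the list, which is its usual meaning.

```python
quads = []  # array of quadruples; for a jump, the fourth field is the target index


def emit(op, arg1=None, arg2=None, result=None):
    """Hypothetical helper: append a quadruple and return its index."""
    quads.append([op, arg1, arg2, result])
    return len(quads) - 1


def mklist(i):
    """Create a new list containing only the quadruple index i."""
    return [i]


def merge_list(p1, p2):
    """Concatenate two lists of quadruple indices."""
    return p1 + p2


def backpatch(p, i):
    """Fill in i as the jump target of every quadruple on list p."""
    for index in p:
        quads[index][3] = i


# if a < b then x := 1, with both jump targets unknown when the jumps are emitted
true_list = mklist(emit("if<", "a", "b"))  # quad 0: if a < b goto _
false_list = mklist(emit("goto"))          # quad 1: goto _
backpatch(true_list, len(quads))           # true exit is the next quad, index 2
emit(":=", "1", None, "x")                 # quad 2: x := 1
backpatch(false_list, len(quads))          # false exit is index 3, after the assignment
```

After the two backpatch calls, quad 0 jumps to index 2 and quad 1 jumps to index 3, even though neither target was known when the jumps were generated.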
5. What is the intermediate code representation for the expression a or b and not c?
The intermediate code representation for the expression a or b and not c is the three address
sequence
t1 := not c
t2 := b and t1
t3 := a or t2
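The sequence above can be produced by a post-order walk of the expression tree, creating one temporary per operator. The sketch below assumes the expression has already been parsed with the usual precedence (not binds tighter than and, which binds tighter than or) into nested tuples; the representation and the name gen_tac are assumptions for the example.

```python
import itertools


def gen_tac(expr):
    """Translate a nested-tuple boolean expression into three-address statements.

    expr is ("or", l, r), ("and", l, r), ("not", e) or a variable name.
    """
    code = []
    counter = itertools.count(1)

    def visit(node):
        if isinstance(node, str):
            return node  # a variable is its own place
        op, *operands = node
        places = [visit(child) for child in operands]  # operands first
        temp = f"t{next(counter)}"
        if op == "not":
            code.append(f"{temp} := not {places[0]}")
        else:
            code.append(f"{temp} := {places[0]} {op} {places[1]}")
        return temp

    visit(expr)
    return code


for line in gen_tac(("or", "a", ("and", "b", ("not", "c")))):
    print(line)
# t1 := not c
# t2 := b and t1
# t3 := a or t2
```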
6. What are the various methods of implementing three address statements?
The three address statements can be implemented using the following methods.
Triples: the use of temporary names is avoided by referring to a computed value through a
pointer to the triple that computes it; other operands are pointers into the symbol table.
Indirect triples: a listing of pointers to triples is kept, rather than a listing of the
triples themselves.
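As a rough illustration, the expression a or b and not c can be stored as triples, with a separate statement list playing the role of indirect triples. The tuple layout, the use of integers as pointers to other triples and the small evaluate function are assumptions made for this sketch.

```python
# Triples: (op, arg1, arg2). An integer argument points to the triple at that
# position; a string argument stands for a symbol-table entry.
triples = [
    ("not", "c", None),  # (0)
    ("and", "b", 0),     # (1)
    ("or", "a", 1),      # (2)
]

# Indirect triples: a separate list of pointers fixes the order of execution,
# so statements can be reordered without renumbering the triples themselves.
statements = [0, 1, 2]


def evaluate(triples, order, env):
    """Execute the triples in the given order and return the last value."""
    values = {}

    def operand(x):
        return values[x] if isinstance(x, int) else env[x]

    for index in order:
        op, arg1, arg2 = triples[index]
        if op == "not":
            values[index] = not operand(arg1)
        elif op == "and":
            values[index] = operand(arg1) and operand(arg2)
        elif op == "or":
            values[index] = operand(arg1) or operand(arg2)
    return values[order[-1]]


print(evaluate(triples, statements, {"a": False, "b": True, "c": False}))
# True
```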
The code generator should produce correct, high-quality code. In other words,
the generated code should make effective use of the resources of
the target machine.
o Define and use: the three-address statement a := b + c is said to define a and to
use b and c.
o Live and dead: a name in a basic block is said to be live at a given point if
its value is used after that point in the program. A name in a basic block is
said to be dead at a given point if its value is never used after that point in the
program.
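Liveness inside a single basic block can be computed by scanning the block backwards, applying the define and use terminology at each statement. The sketch below is an assumption-laden illustration: each statement is reduced to a (defined name, used names) pair, and the set of names live on exit from the block is supplied by the caller.

```python
def liveness(block, live_on_exit):
    """Return, for each statement, the set of names live just after it.

    block is a list of (target, operands) pairs, one per three-address statement.
    """
    live = set(live_on_exit)
    live_after = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        target, operands = block[i]
        live_after[i] = set(live)
        live.discard(target)   # the statement defines target
        live.update(operands)  # and uses its operands
    return live_after


block = [
    ("t1", ["a", "b"]),   # t1 := a + b
    ("t2", ["t1", "c"]),  # t2 := t1 * c
    ("a", ["t2"]),        # a := t2
]
print(liveness(block, live_on_exit={"a"}))
```

In this example t1 is live after the first statement and dead after the second, since its value is never used again.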
2. List the terminologies used in basic blocks.
3. What is a flow graph?
A flow graph is a directed graph in which flow-of-control information is added to the basic
blocks.
The block whose leader is the first statement is called the initial block.
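A flow graph can be built by first finding the leaders and then linking the resulting blocks. The sketch below uses the usual leader rules (the first statement, any target of a jump, and any statement that follows a jump); the statement encoding and the function names are assumptions for the example.

```python
def find_leaders(code):
    """code is a list of (op, target); target is a statement index for jumps."""
    leaders = {0}  # the first statement is a leader
    for i, (op, target) in enumerate(code):
        if op in ("goto", "if"):
            leaders.add(target)      # the target of a jump is a leader
            if i + 1 < len(code):
                leaders.add(i + 1)   # so is the statement after a jump
    return sorted(leaders)


def flow_graph(code):
    """Return (leaders, edges); block k starts at leaders[k]."""
    leaders = find_leaders(code)
    starts = leaders + [len(code)]
    block_of = {}
    for b in range(len(leaders)):
        for i in range(starts[b], starts[b + 1]):
            block_of[i] = b
    edges = set()
    for b in range(len(leaders)):
        last = starts[b + 1] - 1
        op, target = code[last]
        if op in ("goto", "if"):
            edges.add((b, block_of[target]))    # edge to the jump target
        if op != "goto" and last + 1 < len(code):
            edges.add((b, block_of[last + 1]))  # fall-through edge
    return leaders, sorted(edges)


code = [
    ("assign", None),  # 0: i := 1
    ("assign", None),  # 1: t := i * 2
    ("if", 1),         # 2: if i < 10 goto 1
    ("assign", None),  # 3: x := t
]
print(flow_graph(code))
# ([0, 1, 3], [(0, 1), (1, 1), (1, 2)])
```

Block 0, whose leader is statement 0, is the initial block.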
Determining which names are used inside the block but computed outside the block.
Determining which statements of the block could have their computed value used outside the
block.
Simplifying the list of quadruples by eliminating common subexpressions and by not
performing assignments of the form x := y unless they are necessary.
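Eliminating common subexpressions within one block can be sketched as follows. Note the assumptions: this version keeps a table of available expressions rather than building a DAG, it replaces a repeated computation with a copy statement instead of removing the statement, and the quadruple layout (target, op, arg1, arg2) is chosen for the example.

```python
def eliminate_common_subexpressions(block):
    """Rewrite repeated computations in a basic block as copies."""
    available = {}  # (op, arg1, arg2) -> name that currently holds the value
    result = []
    for target, op, arg1, arg2 in block:
        key = (op, arg1, arg2)
        if key in available:
            result.append((target, "copy", available[key], None))
        else:
            result.append((target, op, arg1, arg2))
        # Assigning to target invalidates every expression that reads target
        # and every table entry whose value was held in target.
        available = {
            k: v
            for k, v in available.items()
            if target not in (k[1], k[2]) and v != target
        }
        if target not in (arg1, arg2):
            available.setdefault(key, target)
    return result


block = [
    ("t1", "+", "a", "b"),
    ("t2", "+", "a", "b"),     # same expression, operands unchanged
    ("a", "copy", "t2", None),
    ("t3", "+", "a", "b"),     # a has changed, so this must be recomputed
]
for quad in eliminate_common_subexpressions(block):
    print(quad)
```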
Algebraic simplification
1. Mention the issues to be considered while applying the techniques for code
optimization.
The improvement over the program efficiency must be achieved without changing the
algorithm of the program.
o Machine-dependent optimization is based on the characteristics of the target
machine, such as its instruction set and addressing modes, and aims to
produce efficient target code.
o Machine-independent optimization is based on the characteristics of the
programming language, such as appropriate program structure and the use of
efficient arithmetic properties, and aims to reduce execution time.
Available expressions
Reaching definitions
Live variables
Busy variables
Static allocation
Stack allocation
Heap allocation
The activation record is a block of memory used for managing the information needed by a
single execution of a procedure. The various fields of an activation record are:
Temporary variables
Local variables
Control link
Access link
Actual parameters
Return values
Call by value
Call by reference
Copy-restore
Call by name
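The difference between the first three methods becomes visible when the same variable is passed for two parameters. The sketch below simulates a call p(a, a), where the body of p is x := x + 1; y := y + 1. Everything here is an assumption made for illustration: variables are kept in a dictionary called store, parameters are modelled as one-element lists, and call by name is not simulated.

```python
def body(x, y):
    """The callee: x := x + 1; y := y + 1. Each parameter is a one-element cell."""
    x[0] += 1
    y[0] += 1


def call_by_value(store, name):
    body([store[name]], [store[name]])  # the callee works on private copies
    return store[name]


def call_by_reference(store, name):
    cell = [store[name]]
    body(cell, cell)                    # both formals alias the same location
    store[name] = cell[0]
    return store[name]


def call_copy_restore(store, name):
    x, y = [store[name]], [store[name]]  # copy in
    body(x, y)
    store[name] = x[0]                   # copy out, left to right
    store[name] = y[0]
    return store[name]


print(call_by_value({"a": 1}, "a"))      # 1
print(call_by_reference({"a": 1}, "a"))  # 3
print(call_copy_restore({"a": 1}, "a"))  # 2
```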