

CS1352 – Principles of Compiler Design


University Question Key

Staff Incharge: Mrs.R.Ramya


April / May 2008

PART-A
1. Differentiate compiler and interpreter.
A compiler translates the source program into a target program, whereas an interpreter directly performs the operations implied by the source program.

2. Write short notes on buffer pair.


Buffer pairs are concerned with efficiency and are used when lookahead on the input is needed.
It is a specialized buffering technique used to reduce the overhead required to process an input character. The buffer is divided into two N-character halves, and two pointers are used. It is used when the lexical analyzer needs to look ahead several characters beyond the lexeme of a pattern before a match is announced. A small sketch follows.
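A minimal Python sketch of the idea (the buffer size N, the class name and the file object are illustrative; only the forward pointer is modelled, and lexemeBegin and sentinel characters are omitted):

N = 4096   # size of one buffer half (illustrative)

class TwoBufferInput:
    # Hedged sketch of the buffer-pair scheme: the input is read N characters at a
    # time into two halves, so the lexer can scan ahead past the current lexeme
    # without re-reading the file one character at a time.
    def __init__(self, f):
        self.f = f
        self.buf = [f.read(N), f.read(N)]   # the two N-character halves
        self.half = 0                       # which half the forward pointer is in
        self.fwd = 0                        # forward pointer within that half

    def next_char(self):
        if self.fwd < len(self.buf[self.half]):
            ch = self.buf[self.half][self.fwd]
            self.fwd += 1
            return ch
        if len(self.buf[self.half]) < N:    # last read was short: end of input
            return ''
        # current half exhausted: move to the other half and refill the one just left
        self.half = 1 - self.half
        self.fwd = 0
        self.buf[1 - self.half] = self.f.read(N)
        return self.next_char()

For example, TwoBufferInput(open('prog.src')) would deliver the source one character at a time while reading the file only in N-character blocks.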

3. Construct a parse tree of (a+b)*c for the grammar E->E+E | E*E | (E) | id.
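The key does not reproduce the tree; a parse tree for (a+b)*c (with a, b and c parsed as id), corresponding to the derivation E => E*E => (E)*E => (E+E)*E => (a+b)*c, is:

E
├── E
│   ├── (
│   ├── E
│   │   ├── E → id (a)
│   │   ├── +
│   │   └── E → id (b)
│   └── )
├── *
└── E → id (c)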

4. Eliminate immediate left recursion for the following grammar E->E+T | T, T->T
* F | F, F-> (E) | id.
The rule to eliminate immediate left recursion is: A->Aα | β can be converted into A->βA' and A'->αA' | ε. So, the grammar after eliminating left recursion is
E->TE'; E'->+TE' | ε; T->FT'; T'->*FT' | ε; F-> (E) | id.

5. Write short notes on global data flow analysis.


Collecting information about the way data is used in a program.
• Takes control flow into account.
Forward flow vs. backward flow:
• Forward: compute OUT for given IN, GEN, KILL
– Information propagates from the predecessors of a vertex.
– Examples: reachability, available expressions, constant propagation
• Backward: compute IN for given OUT, GEN, KILL
– Information propagates from the successors of a vertex.
– Example: live-variable analysis
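As an illustration of the forward case, a minimal Python sketch of an iterative (round-robin) forward data-flow solver; the flow-graph representation (block list, predecessor map) and the GEN/KILL sets are assumed inputs, not something defined in the answer key:

def forward_dataflow(blocks, pred, gen, kill):
    # Iterative forward data-flow analysis.
    # blocks: list of block ids; pred[b]: list of predecessors of b;
    # gen[b], kill[b]: sets generated / killed by block b.
    # Equations: OUT[b] = gen[b] U (IN[b] - kill[b]); IN[b] = union of OUT[p] over predecessors p.
    IN = {b: set() for b in blocks}
    OUT = {b: set(gen[b]) for b in blocks}
    changed = True
    while changed:                      # iterate until a fixed point is reached
        changed = False
        for b in blocks:
            IN[b] = set().union(*(OUT[p] for p in pred[b])) if pred[b] else set()
            new_out = gen[b] | (IN[b] - kill[b])
            if new_out != OUT[b]:
                OUT[b] = new_out
                changed = True
    return IN, OUT

The backward case is symmetric: swap the roles of IN and OUT and use successors instead of predecessors.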


6. Define back patching with an example.


Back patching is the activity of filling in unspecified label information, using appropriate semantic actions, during the code generation process. The functions used in the semantic actions are makelist(i), merge(p1,p2) and backpatch(p,j).
Source:
if a or b then
   if c then
      x = y+1

Translation:
      if a goto L1
      if b goto L1
      goto L3
L1: if c goto L2
      goto L3
L2: x = y+1
L3:

After Backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x = y+1
106:

7. Give syntax directed translation for the following statement Call p1(int a, int b).
param a
param b
call p1

8. How will you find the leaders in basic block?


Leaders are the first statements of basic blocks. They are found by three rules:
 The first statement of the code sequence is a leader
 Any statement that is the target of a conditional or unconditional goto is a leader
 Any statement that immediately follows a goto or conditional goto statement is a leader
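A small Python sketch applying these three rules; the three-address statements are assumed to be plain strings such as 'goto L1' or 'L1: x = y + z' (a real implementation would use a structured IR):

def find_leaders(code):
    # code: list of three-address statements as strings
    leaders = set()
    if code:
        leaders.add(0)                              # rule 1: first statement
    targets = set()
    for stmt in code:
        if 'goto' in stmt:                          # conditional or unconditional jump
            targets.add(stmt.split('goto')[-1].strip())
    for i, stmt in enumerate(code):
        label = stmt.split(':')[0].strip() if ':' in stmt else None
        if label in targets:
            leaders.add(i)                          # rule 2: target of a jump
        if i > 0 and 'goto' in code[i - 1]:
            leaders.add(i)                          # rule 3: statement after a jump
    return sorted(leaders)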

9. Define code motion.


Code motion decreases the amount of code inside a loop. It takes an expression that yields the same result independent of the number of times the loop is executed (a loop-invariant computation) and places it before the loop.
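For example (limit and t are illustrative names; limit - 2 is a loop-invariant computation):

Before code motion:
while (i <= limit - 2) do
    body

After code motion:
t = limit - 2
while (i <= t) do
    body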

10. Define basic block and flow graph.


A basic block is a sequence of consecutive statements in which flow of control enters
at the beginning and leaves at the end without halt or possibility of branching except at
the end. A flow graph is a directed graph in which flow-of-control information is added to the basic blocks.
· The nodes of the flow graph are the basic blocks.
· The block whose leader is the first statement is called the initial block.
· There is a directed edge from block B1 to block B2 if B2 can immediately follow B1 in some execution sequence; B1 is then a predecessor of B2, and B2 a successor of B1.


PART - B
11. a. i. Explain the phases of compiler, with the neat schematic. (12)
The process of compilation is complex, so it is customary, from both the logical and the implementation points of view, to partition the compilation process into several phases. A phase is a logically cohesive operation that takes as input one representation of the source program and produces as output another representation. (2)
Source program is a stream of characters: E.g. pos = init + rate * 60 (6)
– Lexical analysis: groups characters into non-separable units, called tokens, and generates the token stream: id1 = id2 + id3 * const
• The information about the identifiers must be stored somewhere (symbol
table).
– Syntax analysis: checks whether the token stream meets the grammatical
specification of the language and generates the syntax tree.
– Semantic analysis: checks whether the program has a meaning (e.g. if pos is a record and init and rate are integers, then the assignment does not make sense).
[Tree diagrams omitted: syntax analysis produces the tree := (id1, + (id2, * (id3, 60))); semantic analysis inserts a type conversion, giving := (id1, + (id2, * (id3, inttoreal(60)))).]


– Intermediate code generation, intermediate code is something that is both close to the
final machine code and easy to manipulate (for optimization). One example is the three-
address code:
dst = op1 op op2
• The three-address code for the assignment statement:
temp1 = inttoreal(60);
temp2 = id3 * temp1;
temp3 = id2 + temp2;
id1 = temp3
– Code optimization: produces better/semantically equivalent code.
temp1 = id3 * 60.0
id1 = id2 + temp1
– Code generation: generates assembly
MOVF id3, R2
MULF #60.0, R2
MOVF id2, R1
ADDF R2, R1
MOVF R1, id1
 Symbol Table Creation / Maintenance
 Contains Info (storage, type, scope, args) on Each “Meaningful” Token, typically
Identifiers
 Data Structure Created / Initialized During Lexical Analysis
 Utilized / Updated During Later Analysis & Synthesis


 Error Handling
 Detection of Different Errors Which Correspond to All Phases
 Each phase should know somehow to deal with error, so that compilation
can proceed, to allow further errors to be detected
[Schematic: Source Program → (1) Lexical Analyzer → (2) Syntax Analyzer → (3) Semantic Analyzer → (4) Intermediate Code Generator → (5) Code Optimizer → (6) Code Generator → Target Program, with the Symbol-table Manager and the Error Handler interacting with every phase.] (4)

ii. Write short notes on compiler construction tools. (4)


 Parser Generators : Produce Syntax Analyzers
 Scanner Generators : Produce Lexical Analyzers
 Syntax-directed Translation Engines : Generate Intermediate Code
 Automatic Code Generators : Generate Actual Code
 Data-Flow Engines : Support Optimization

(OR)

b. i. Explain grouping of phases. (8)


Front and back ends: (3)
Often, the phases are collected into a front end and a back end. The front end has
those phases, which depend primarily on source language and largely independent of the
target machine. These include lexical and syntactic analysis, the creation of symbol table,
semantic analysis and the generation of intermediate code.
Back end has those phases, which depend primarily on target machine and largely
independent of the source language, just the intermediate language. These include code
optimization phase, along with necessary error handling and symbol table operations.

Passes: (2)
Several phases are often implemented in a single pass consisting of reading an input file and writing an output file. The activities of those phases are interleaved during the pass.


Reducing the number of passes: (3)


It is desirable to have few passes, since it takes time to read and write intermediate files.
But, on the other hand, if we group several phases into one pass, then we must keep entire
program in memory, because one phase may need information in a different order than a
previous phase produces it.
For some phases, grouping into one pass presents few problems; for example, the interface between the lexical and syntactic analyzers can be limited to a single token. For other phases it is harder:
 It is often very hard to perform code generation until the intermediate representation has been completely generated.
 We cannot generate target code for a construct if we do not know the types of the variables involved in the construct.
 We cannot determine the target address of a forward jump until we have seen the intervening source code and generated target code for it.
Intermediate and target code generation can nevertheless be merged into a single pass using back patching: a blank slot is left for the missing information, and the slot is filled in when the information becomes available.

ii. Explain specification of tokens. (8)


Regular expressions are the notation for specifying token patterns; each pattern matches a set of strings.
Strings and languages: (2)
An alphabet is a finite set of symbols. A string over an alphabet is a finite sequence of symbols drawn from that alphabet. Terms for parts of a string: prefix, suffix, substring, proper prefix and proper suffix.
Language: a set of strings over some fixed alphabet.

Operations on languages: (2)


 Concatenation
 Union
 Kleene closure
 Positive closure

Regular expressions: (2)


 ε is a regular expression that denotes {ε}
 If a is a symbol in Σ, then a is a regular expression that denotes {a}
 Suppose r and s are regular expressions denoting the languages L(r) and L(s).
Then,
 (r) | (s) is a regular expression denoting L(r) U L(s)
 (r)(s) is a regular expression denoting L(r)L(s)
 (r)* is a regular expression denoting (L(r))*
 (r) is a regular expression denoting L(r)
A language denoted by a regular expression is said to be a regular set.
 The unary operator * has the highest precedence and is left associative
 Concatenation has the second highest precedence and is left associative
 | has the lowest precedence and is left associative


Regular definitions: (2)


It is a sequence of definitions of the form d1->r1, d2->r2, …, dn->rn, where each di is a distinct name and each ri is a regular expression over the symbols in Σ U {d1, d2, …, di-1}.
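For example, identifiers can be specified by the regular definition (a standard textbook form, not taken from the question paper):

letter -> A | B | … | Z | a | b | … | z
digit  -> 0 | 1 | … | 9
id     -> letter (letter | digit)*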

12. a. Find the SLR parsing table for the given grammar and parse the sentence
(a+b)*c. E->E+E | E*E | (E) | id.
Given grammar:
1. E->E+E
2. E->E*E
3. E->(E)
4. E->id

Augmented grammar:
E'->E
E->E+E
E->E*E
E->(E)
E->id

Canonical collection of LR(0) items:
I0:               E'->.E, E->.E+E, E->.E*E, E->.(E), E->.id
I1: goto(I0, E)   E'->E., E->E.+E, E->E.*E
I2: goto(I0, ()   E->(.E), E->.E+E, E->.E*E, E->.(E), E->.id
I3: goto(I0, id)  E->id.
I4: goto(I1, +)   E->E+.E, E->.E+E, E->.E*E, E->.(E), E->.id
I5: goto(I1, *)   E->E*.E, E->.E+E, E->.E*E, E->.(E), E->.id
I6: goto(I2, E)   E->(E.), E->E.+E, E->E.*E
I7: goto(I4, E)   E->E+E., E->E.+E, E->E.*E
I8: goto(I5, E)   E->E*E., E->E.+E, E->E.*E
I9: goto(I6, ))   E->(E).

Other transitions:
goto(I2, () = I2, goto(I2, id) = I3
goto(I4, () = I2, goto(I4, id) = I3
goto(I5, () = I2, goto(I5, id) = I3
goto(I6, +) = I4, goto(I6, *) = I5
goto(I7, +) = I4, goto(I7, *) = I5
goto(I8, +) = I4, goto(I8, *) = I5

First(E) = {(, id}


Follow(E)={+, *, ), $}


SLR parsing table:

           Action                                          Goto
States   +        *        (      )      id     $         E
  0                        S2            S3               1
  1      S4       S5                            Acc
  2                        S2            S3               6
  3      r4       r4              r4            r4
  4                        S2            S3               7
  5                        S2            S3               8
  6      S4       S5              S9
  7      S4,r1    S5,r1           r1            r1
  8      S4,r2    S5,r2           r2            r2
  9      r3       r3              r3            r3

Parsing the sentence (a+b)*c (a, b and c are instances of id):

0 (a+b)*c$ shift 2
0(2 a+b)*c$ shift 3
0(2a3 +b)*c$ reduce by E->id
0(2E6 +b)*c$ shift 4
0(2E6+4 b)*c$ shift 3
0(2E6+4b3 )*c$ reduce by E->id
0(2E6+4E7 )*c$ reduce by E->E+E
0(2E6 )*c$ shift 9
0(2E6)9 *c$ reduce by E->(E)
0E1 *c$ shift 5
0E1*5 c$ shift 3
0E1*5c3 $ reduce by E->id
0E1*5E8 $ reduce by E->E*E
0E1 $ accept

(OR)

b. Find the predictive parser for the given grammar and parse the sentence (a+b)*c.
E->E+E | E*E | (E) | id.
 Elimination of left recursion (2)
 Calculation of First (3)
 Calculation of Follow (3)
 Predictive parsing table (6)
 Parsing the sentence (2)
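A worked sketch of these steps (supplied here because the key only lists the mark split; the grammar after eliminating left recursion is the one obtained in Question 4):

E -> TE';  E' -> +TE' | ε;  T -> FT';  T' -> *FT' | ε;  F -> (E) | id

First(E) = First(T) = First(F) = { (, id }
First(E') = { +, ε }      First(T') = { *, ε }

Follow(E) = Follow(E') = { ), $ }
Follow(T) = Follow(T') = { +, ), $ }
Follow(F) = { +, *, ), $ }

The predictive parsing table then contains, for example, M[E, (] = E->TE', M[E', +] = E'->+TE', M[E', )] = M[E', $] = E'->ε, M[T, id] = T->FT', M[T', *] = T'->*FT' and M[F, id] = F->id, and the sentence (a+b)*c (with a, b, c as id) is accepted without conflicts.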


13. a. Generate intermediate code for the following code segment along with the required
syntax directed translation scheme: (8)
i) if(a>b)
x=a+b
else
x=a-b
where a and x are of real type and b is of int type.
Syntax directed translation scheme for if E then S1 else S2:
E.true:= newlabel;
E.false:=newlabel;
S1.next:=S.next;
S2.next:=S.next;
S.code:=E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.next) ||
gen(E.false “:”) || S2.code
Intermediate code generated:
if a>b goto L1
goto L2
L1: t1:=inttoreal(b)
x:=a+t1
goto L3
L2: t2:=inttoreal(b)
x:=a-t2
L3:
ii) int a,b; (8)
float c;
a=10;
switch(a)
{
case 10: c=1;
case 20: c=2;
}
Switch/Case statement in source language form:
switch(E)
{
Case V1: S1
..
Case Vn-1:Sn-1
default: Sn
}
Translation of the source switch/case statement into the intermediate language (three-address code, 3AC):
Translation 1:
code to evaluate E into temporary t
goto test
L1: code for S1
goto next


L2: code for S2


goto next

Ln-1: code for Sn-1
go to next
Ln: code for Sn
goto next
test: if t=V1 goto L1
if t=V2 goto L2

if t=Vn-1 goto Ln-1
goto Ln
next:
Translation 2:
code to evaluate E into t
if t≠V1 goto L1
code for S1
goto next
L1: if t≠V2 goto L2
code for S2
goto next
L2: …

Ln-2: if t≠Vn-1 goto Ln-1


code for Sn-1
goto next
Ln-1: code for Sn
next:

Intermediate code generated:


int a,b;
float c;
SYMTAB have the following information for the above declarations:
Let offset=0
Name Type Offset Width
a integer 0 4
b integer 4 4
c float 8 8
3AC:
a:=10
if a≠10 goto L1
c:=1
goto next
L1: if a≠20 goto next
c:=2
next:


(OR)

b. i. Generate intermediate code for the following code segment along with the required
syntax directed translation scheme: (8)
i=1; s=0;
while(i<=10)
s=s+a[i][i]
i=i+1
Semantic rules for while E do S1:
S.begin:=newlabel
E.true:= newlabel;
E.false:=S.next;
S1.next:=S.begin;
S.code:=gen(S.begin’:’) || E.code || gen(E.true “:”) || S1.code || gen(‘goto’ S.begin)

Intermediate code generated:


(1) i:=1
(2) s:=0
(3) if i <= 10 goto (5)
(4) goto (15)
(5) t1=i*10
(6) t1=t1+i
(7) t1=t1*4
(8) t2=addr(a)-44 //(1*10+1)*4
(9) t3=t2[t1] //a[i][i]
(10) t4=s+t3
(11) s=t4
(12) t5=i+1
(13) i=t5
(14) goto (3)
(15) …

ii. Write short notes on back-patching. (8)


Back patching is the activity of filling in unspecified label information, using appropriate semantic actions, during the code generation process. (2)
The functions used in the semantic actions are (2)
makelist(i) – creates a new list containing only i, an index into the array of quadruples.
merge(p1,p2) – concatenates the lists pointed to by p1 and p2.
backpatch(p,j) – inserts j as the target label for each quadruple on the list pointed to by p.
Example: (4)
Source:
if a or b then
   if c then
      x = y+1

Translation:
      if a goto L1
      if b goto L1
      goto L3
L1: if c goto L2
      goto L3
L2: x = y+1
L3:


After Backpatching:
100: if a goto 103
101: if b goto 103
102: goto 106
103: if c goto 105
104: goto 106
105: x = y+1
106:
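A minimal Python sketch of the three list-handling functions over a global quadruple array (the quadruple representation, with a 'target' field for the unfilled jump destination, is an assumption for illustration):

quads = []                 # each quadruple is a dict; 'target' holds the jump destination

def makelist(i):
    # create a new list containing only i, an index into the quadruple array
    return [i]

def merge(p1, p2):
    # concatenate the lists pointed to by p1 and p2 and return the merged list
    return p1 + p2

def backpatch(p, j):
    # insert j as the target label for each quadruple on the list p
    for i in p:
        quads[i]['target'] = j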

14. a. i. Explain the various issues in the design of code generation. (6)
 Input to the code generator
Intermediate representation of the source program, like linear
representations such as postfix notation, three address representations such as
quadruples, virtual machine representations such as stack machine code and
graphical representations such as syntax trees and dags.
 Target programs
The output may be absolute machine language, relocatable machine language or assembly language.
 Memory management
Mapping names in the source program to the addresses of data objects in run-time memory is done cooperatively by the front end and the code generator.
 Instruction selection
Nature of the instruction set of the target machine determines the difficulty of
instruction selection.
 Register allocation
Instructions involving register operands are shorter and faster than those involving memory operands. The use of registers is divided into two sub-problems:
o During register allocation, we select the set of variables that will reside in
registers at a point in the program
o During a subsequent register assignment phase, we pick the specific
register that a variable will reside in
 Choice of evaluation order
The order in which computations are performed affects the efficiency of the target code.
 Approaches to code generation

ii. Explain code generation phase with simple code generation algorithm. (10)
It generates target code for a sequence of three address statements. (2)
Assumptions:
 For each operator in three address statement, there is a corresponding target
language operator.
 Computed results can be left in registers as long as possible.
E.g. a=b+c: (2)
 Add Rj,Ri where Ri has b and Rj has c and result in Ri. Cost=1;
 Add c, Ri where Ri has b and result in Ri. Cost=2;
 Mov c, Rj; Add Rj, Ri; Cost=3;
Register descriptor: Keeps track of what is currently in each register
Address descriptor: Keeps track of the location where the current value of the name can be found at run time. (2)


Code generation algorithm: For x= y op z (2)


 Invoke the function getreg to determine the location L where the result of y op z should be stored (a register or a memory location).
 Consult the address descriptor of y to determine y', a current location of y; if y is not already in L, generate the instruction MOV y', L.
 Generate the instruction op z', L, where z' is a current location of z.
 If the current values of y and/or z have no next uses and are not live on exit from the block, update the register descriptors so that they no longer hold y and/or z.
Getreg: (2)
 If y is in a register that holds the values of no other names and y is not live and has no next use, return the register of y for L.
 Failing that, return an empty register for L if one is available.
 Failing that, if x has a next use in the block, find an occupied register, spill it (store its value to memory) and use it.
 If x is not used in the block, or no suitable register can be found, select the memory location of x as L.
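A compressed Python sketch of this algorithm for statements of the form x = y op z (the descriptors are plain dictionaries, spilling and liveness bookkeeping for y and z are omitted, and the MOV/op mnemonics are illustrative):

registers = ['R0', 'R1']
reg_desc = {r: None for r in registers}     # register descriptor: which name each register holds
addr_desc = {}                              # address descriptor: where each name's current value lives

def getreg(x, y, live):
    # simplified getreg: pick the location L for the result of x = y op z
    loc = addr_desc.get(y)
    if loc in reg_desc and reg_desc[loc] == y and y not in live:
        return loc                          # reuse y's register if y is dead afterwards
    for r in registers:
        if reg_desc[r] is None:
            return r                        # otherwise an empty register, if any
    return x                                # otherwise fall back to x's memory location (no spilling here)

def gen_statement(x, op, y, z, live, out):
    # emit code for x = y op z and keep the descriptors up to date
    L = getreg(x, y, live)
    y_loc = addr_desc.get(y, y)             # y', a current location of y (memory name by default)
    if y_loc != L:
        out.append('MOV %s, %s' % (y_loc, L))
    z_loc = addr_desc.get(z, z)             # z', a current location of z
    out.append('%s %s, %s' % (op, z_loc, L))
    if L in reg_desc:
        reg_desc[L] = x                     # L now holds x
    addr_desc[x] = L

For the sequence t = a - b, u = a - c, d = t + u (with liveness sets reflecting that t and u are not needed afterwards), successive calls emit MOV a, R0; SUB b, R0; MOV a, R1; SUB c, R1; ADD R1, R0; the final store of d back to memory is not modelled.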

(OR)
b. i. Generate DAG representation of the following code and list out the
applications of DAG representation: (12)
i=1; s=0;
while(i<=10)
s=s+a[i][i]
i=i+1
Intermediate code generated: (4)
(1) i:=1
(2) s:=0
(3) if i <= 10 goto (5)
(4) goto (15)
(5) t1=i*10
(6) t1=t1+i
(7) t1=t1*4
(8) t2=addr(a)-44 //(1*10+1)*4
(9) t3=t2[t1] //a[i][i]
(10) t4=s+t3
(11) s=t4
(12) t5=i+1
(13) i=t5
(14) goto (3)
(15) …
DAG generation: (4)
Applications of DAG: (4)
· Determining the common sub-expressions.
· Determining which identifiers have their values used in the block
· Determining which statements compute values that could be used outside the block
· Simplifying the list of quadruples by eliminating common subexpressions and by not performing copy assignments of the form x := y unless they are really necessary.
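A minimal Python sketch of DAG construction for a basic block; a quadruple is assumed to be a (dst, op, arg1, arg2) tuple, with op None for copy statements — this only illustrates the node-reuse idea, not the full algorithm:

def build_dag(block):
    nodes = []      # each node is ('leaf', name) or (op, left_index, right_index)
    cache = {}      # (op, left, right) -> node index, so identical triples are reused
    cur = {}        # name -> node currently holding its value

    def node_for(name):
        # return the node attached to name, creating a leaf if necessary
        if name not in cur:
            key = ('leaf', name)
            if key not in cache:
                cache[key] = len(nodes)
                nodes.append(key)
            cur[name] = cache[key]
        return cur[name]

    for dst, op, a1, a2 in block:
        if op is None:                      # copy statement dst = a1
            cur[dst] = node_for(a1)
            continue
        key = (op, node_for(a1), node_for(a2))
        if key not in cache:                # build a new interior node ...
            cache[key] = len(nodes)
            nodes.append(key)
        cur[dst] = cache[key]               # ... or reuse it: a common subexpression
    return nodes, cur

For the block t1 = b + c; t2 = b + c, the second statement attaches t2 to the node already built for t1, which is exactly how common subexpressions are detected.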


ii. Write short notes on next-use information with suitable example. (4)
If the name in a register is no longer needed, then the register can be assigned to
some other name. This idea of keeping a name in storage only if it will be used
subsequently can be applied in a number of contexts.
Computing next uses: (2)
The use of a name in a three-address statement is defined as follows: Suppose a
three-address statement i assigns a value to x. If statement j has x as an operand and
control can flow from statement i to j along a path that has no intervening assignments to
x, then we say statement j uses the value of x computed at i.
Example:
i:  x := a op b
j:  z := x op y    // statement j uses the value of x computed at statement i
Algorithm to determine next use: (2)
The algorithm to determine next uses makes a backward pass over each basic
block, recording for each name x whether x has a next use in the block and if not,
whether it is live on exit from the block (using data flow analysis). Suppose we reach
three-address statement i: x: =y op z in our backward scan. Then do the following:
 Attach to statement i, the information currently found in the symbol table
regarding the next use and the liveness of x, y, and z.
 In the symbol table, set x to “not live” and “no next use”
 In the symbol table, set y and z to “live” and the next uses of y and z to i.
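A Python sketch of this backward pass (the symbol table is modelled as a plain dictionary, statements as (x, y, z) triples for x = y op z, and the live-on-exit set is assumed to come from data-flow analysis):

def next_use_info(block, live_on_exit):
    table = {name: (True, None) for name in live_on_exit}   # name -> (live?, next use)
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):                  # backward scan
        x, y, z = block[i]
        # attach to statement i the information currently recorded for x, y, z
        info[i] = {n: table.get(n, (False, None)) for n in (x, y, z)}
        table[x] = (False, None)                             # x: not live, no next use
        table[y] = (True, i)                                 # y and z: live, next use is i
        table[z] = (True, i)
    return info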

15. a. i. Explain the principal sources of optimization. (8)


Code optimization is needed to make the code run faster or take less space or both.
Function preserving transformations:
 Common sub expression elimination
 Copy propagation
 Dead-code elimination
 Constant folding
Common subexpression elimination: (2)
E is called a common subexpression if E was previously computed and the values of the variables in E have not changed since the previous computation.
Copy propagation: (2)
Assignments of the form f := g are called copy statements, or copies for short. The idea is to use g for f wherever possible after the copy statement.
Dead-code elimination: (2)
A variable is live at a point in the program if its value can be used subsequently; otherwise it is dead, and statements computing only dead values can be removed. Constant folding: deducing at compile time that the value of an expression is a constant and using the constant instead.
Loop optimization: (2)
 Code motion: Moving code outside the loop
Takes an expression that yields the same result independent of the number of
times a loop is executed (a loop-invariant computation) and place the expression before
the loop.
 Induction variable elimination
 Reduction in strength: Replacing an expensive operation by a cheaper one.
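A small before/after illustration in three-address code (the statements are illustrative):

Before optimization:
t1 := 4 * i
t2 := a[t1]
t3 := 4 * i          // common subexpression: same operator and operands, unchanged since t1
t4 := b[t3]
x := 2 * 3.14        // operands are constants

After common subexpression elimination and constant folding:
t1 := 4 * i
t2 := a[t1]
t4 := b[t1]
x := 6.28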


ii. Write short notes on: (8)


(1) Storage organization
Subdivision of run time memory:
Run-time storage: the block of memory obtained from the operating system in which the compiled program executes. It is subdivided into four areas, laid out as Code | Static data | Stack | Heap:
 Code: the generated target code
 Static data: statically allocated data objects
 Stack: keeps track of procedure activations
 Heap: stores all other information
Activation record: (Frame)
It is used to store the information required by a single procedure call.
Returned value
Actual parameters
Optional control link
Optional access link
Saved machine status
Local data
temporaries
Temporaries hold values that arise in the evaluation of expressions. Local data is the data local to an execution of the procedure. Saved machine status records the state of the machine just before the procedure is called. The control link (dynamic link) points to the activation record of the calling procedure. The access link refers to non-local data held in other activation records. Actual parameters are those passed to the called procedure. The returned value field is used by the called procedure to return a value to the calling procedure.

Compile time layout of local data:


The amount of storage needed for a name is determined by its type. The field for
the local data is laid out as the declarations in a procedure are examined at compile time.
The storage layout for data objects is strongly influenced by the addressing constraints on
the target machine.

(2) Parameter passing.


• Call by value
– A formal parameter is treated just like a local name. Its storage is in the
activation record of the called procedure
– The caller evaluates the actual parameters and places their r-values in the storage for the formals
• Call by reference
• If an actual parameter is a name or an expression having an l-value, then that l-value itself is passed
• However, if the actual has no l-value (e.g. a+b or 2), the expression is evaluated into a new location and the address of that location is passed


• Copy-Restore: Hybrid between call-by-value and call-by-ref (copy in, copy out)
– The actual parameters are evaluated; their r-values are passed, and the l-values of the actuals are noted
– When the called procedure is done, the r-values of the formals are copied back into the l-values of the actuals
• Call by name
– The procedure is treated like a macro (inline expansion): its body is substituted for the call, with the actual parameters substituted for the formals

(OR)

b. i. Optimize the following code using various optimization technique: (12)


i=1; s=0;
for (i=1; i<=3; i++)
for (j=1;j<=3;j++)
c[i][j]=c[i][j] + a[i][j] + b[i][j]

Intermediate code generated:

(1) i=1
(2) s=0
(3) i=1
(4) if i<=3 goto (6)
(5) goto (27)
(6) j=1
(7) if j<=3 goto (9)
(8) goto (24)
(9) t1=i*3
(10) t1=t1+j
(11) t1=t1*4
(12) t2=addr(c)-16 //(1*3+1)*4
(13) t3=addr(a)-16
(14) t4=addr(b)-16
(15) t5=t2[t1] //c[i][j]
(16) t6=t3[t1] //a[i][j]
(17) t7=t5+t6
(18) t8=t4[t1] //b[i][j]
(19) t9=t7+t8
(20) t2[t1]=t9 //store into c[i][j]
(21) t10=j+1
(22) j=t10
(23) goto (7)
(24) t11=i+1
(25) i=t11
(26) goto (4)
(27) ...
Using common-subexpression elimination, dead-code elimination and copy propagation (together with loop-invariant code motion), the code can be optimized; a possible result is sketched below.
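A hedged sketch of one possible result, after hoisting the loop-invariant base-address computations (12)–(14) out of the loops, propagating the copies j = t10 and i = t11, and deleting the dead store s = 0 and the redundant first i = 1:

(1) t2 = addr(c) - 16
(2) t3 = addr(a) - 16
(3) t4 = addr(b) - 16
(4) i = 1
(5) if i <= 3 goto (7)
(6) goto (23)
(7) j = 1
(8) if j <= 3 goto (10)
(9) goto (21)
(10) t1 = i*3
(11) t1 = t1 + j
(12) t1 = t1*4
(13) t5 = t2[t1]
(14) t6 = t3[t1]
(15) t7 = t5 + t6
(16) t8 = t4[t1]
(17) t9 = t7 + t8
(18) t2[t1] = t9
(19) j = j + 1
(20) goto (8)
(21) i = i + 1
(22) goto (5)
(23) ...

Strength reduction could further replace the multiplications in (10) and (12) by additions on an induction variable.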

ii. Write short notes on access to non-local names. (4)


The scope rules of a language decide how to reference the non-local names.
Methods used:
 Static or lexical scoping: It determines the declaration that applies to a name by
examining the program text alone.
 Dynamic scoping: It determines the declaration that applies to a name at run time,
by considering the current activations
Blocks – the rule used is the most closely nested rule
Two methods for implementing block structure:
 Stack allocation
 Complete allocation

