0% found this document useful (0 votes)
17 views2 pages

Compiler Design - pdf2

Free note important

Uploaded by

me2020031
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views2 pages

Compiler Design - pdf2

Free note important

Uploaded by

me2020031
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 2

• Introduction Compiler • Token • Recursive Decent Parsing

A Compiler is a translator from one language, the input or source language, to another language, the output or target It is basically a sequence of characters that are treated as a unit as it cannot be further broken down. In programming Typically, top-down parsers are implemented as a set of recursive functions that descent through a
language. Often, but not always, the target language is an assembler language or the machine language for a languages like C language- keywords (int, char, float, cost, got, continue, etc.) identifiers (user-defined names), parse tree for a string. This approach is known as recursive descent parsing, also known as LL(k) parsing
computer processor. operators (+, -, *, /), delimiters/punctuators like comma (,), semicolon(;), braces ({ }), etc. , strings can be considered where the first L stands for or left-to-right, the second L stands for leftmost-derivation, and k indicates k-
Note that using a compiler requires a two-step process to run a program. as tokens. This phase recognizes three types of tokens: Terminal Symbols (TRM)- Keywords and Operators, Literals symbol look ahead. Therefore, a parser using the single symbol look-ahead method and top-down
1. Execute the compiler (and possibly an assembler) to translate the source program into a machine language (LIT), and Identifiers (IDN). parsing without backtracking is called LL(1) parser. In the following sections, will also use an extended
program. Let’s understand now how to calculate tokens in a source code (C language): BNF notation no in we
2. Execute the resulting machine language program, supplying appropriate input. Example 1: int a = 10; //Input Source code which some regulation expression operators are to be incorporated. A syntax expression defines
Compiler is a translator program that translates a program written in (HLL) the source program and translates it into Tokens int (keyword), a(identifier), =(operator), 10(constant) and ;(punctuation- sentences of the form, or. A syntax of the form defines sentences that consist of a sentence of the form
an equivalent program in (MLL) the target program. As an important part of a compiler is error showing to the semicolon) followed by a sentence of the form followed by a sentence of the form. A syntax of the form defines zero
programmer. • Lex or occurrence of the form. A syntax of the form defines zero or more occurrences
Executing a program written n HLL programming language is basically of two parts. The source program must first be Lex is a tool or a computer program that generates Lexical Analysers (converts the stream of characters into of the form.
compiled translated into an object program. Then the results object program is loaded into a memory executed. tokens). The Lex tool itself is a compiler. The Lex compiler takes the input and transforms that input into A usual implementation of an LL(1) parser is.
Types of Compiler input patterns. It is commonly used with YACC(Yet Another Compiler Compiler). It was written by Mike Lesk • initialize its data structures,
There are mainly three types of compilers. and Eric Schmidt. • get the lookahead token by calling scanner routines, and
1 Single Pass Compiler Function of Lex • call the routine that implements the start symbol.
When all the phases of the compiler are present inside a single module, it is simply called a single-pass Here is an example.
1. In the first step the source code which is in the Lex language having the file name ‘File.l’ gives as input to
compiler. It performs the work of converting source code to machine code. proc syntax Analysis()
the Lex Compiler commonly known as Lex to get the output as lex.yy.c. begin
2 Two Pass Compiler 2. After that, the output lex.yy.c will be used as input to the C compiler which gives the output in the form initialize(); // initialize global data and structures
Two-pass compiler is a compiler in which the program is translated twice, once from the front end and the of an ‘a.out’ file, and finally, the output file a.out will take the stream of character and generates tokens as NOTES.IN program(); // parser routine that implements the start symbol end;
back from the back end known as Two Pass Compiler. output. next Token(); // get the lookahead token
3 Multiphases Compiler lex.yy.c: It is a C program. • Predictive Parsing
. Backpatching
When several intermediate codes are created in a program and a syntax tree is processed many times, it is File.l: It is a Lex source program The easiest way to implement the syntax-directed definitions Predictive parsing is a special case of recursive descent parsing where no backtracking is required.
called Multi pass Compiler. It breaks codes into smaller programs. a.out: It is a Lexical analyzer for boolean expressions is to use two passes. First, construct The key problem of predictive parsing is to determine the production to be applied for a non-terminal in case of
• Major Data Structure in Compiler a syntax tree for the input, and then walk the tree in depth- alternatives.
Symbol Tables are organized for fast lookup. Items are typically entered once and then looked up several times. Hash first order, computing the translations. The main problem Non-recursive predictive parser
with generating code for boolean expressions and flow-of-
Tables and Balanced Binary Search Trees are commonly used. Each record contains a "name" (symbol) and control statements in a a single pass is sis that during one
information describing it. single pass we may not know the labels that control must go
Simple Hash Table>>> Asher translates "name" into an integer in a fixed range- the hash value. Hash Value indexes to at the time the jump statements are generated. Hence,
into an array of lists. series of branching statements with the targets of the jumps
Entry with that symbol is in that list or is not stored at all. Items with same hash value = bucket. left unspecified d is generated. Each statement will be put on
Balanced Binary Search Tree <<< Binary search trees work if they are kept balanced. Can achieve logarithmic a list of f goto statements whose labels will be filled in when
the proper label can be determined. We call this subsequent
lookup time. Algorithms are somewhat complex. Red-black trees and AVL trees are used. No leaf is much farther
filling in of labels. backpatching.
from root than any other. To manipulate lists of labels, we use three functions:
Parse Tree>> The structure of a modern computer language is tree-like. Trees represent recursion well. A makelist(i) creates a new list containing only i, an index into
Lex File Format The table-driven predictive parser has an input buffer, stack, a parsing table and an outputstream.
grammatical structure is a node with its parts as child nodes. Interior nodes are non-terminals. The tokens of the the array of quadruples; makelist returns a pointer to the list
A Lex program consists of three parts and is separated it has made. Input buffer:
language are leaves
by %% delimiters:- concatenated list. It consists of strings to be parsed, followed by S to indicate the end of the input string.
• Phases of a Compiler
Declarations %% merge(p1.p2) concatenates the lists pointed to by pl and p2, Stack:
We basically have two phases of compilers, namely the Analysis phase and Synthesis phase. The analysis
Translation rules %% and returns pointer to the It contains a sequence of grammar symbols preceded by $ to indicate the bottom of the stack. Initially, the stack
phase creates an intermediate representation from the given source code. The synthesis phase creates an Auxiliary procedures backpatch(p,i) inserts i as the target label for each of the contains the start symbol on top of $.
equivalent target program from the intermediate representation. Declarations: The declarations include declarations of variables.
statements on the list pointed to by p.
Parsing table:
A compiler is a software program that converts the high-level source code written in a programming Transition rules: These rules consist of Pattern and Action. It is a two-dimensional array M [A, a], where 'A' is a non-terminal and 'a' is a terminal. Predictive parsing program. The
language into low-level machine code that can be executed by the computer hardware. The process of Auxiliary procedures: The Auxiliary section holds auxiliary functions used in the actions. parser is controlled by a program that considers X, the symbol on top of stack, and a, thecurrent input symbol. These
converting the source code into machine code involves several phases or stages, which are collectively • Lexical Analysis: two symbols determine the parser action. There are three possibilities:
known as the phases of a compiler. The typical phases of a compiler are: A lexical analyser is also called a "Scanner". Given the code's statement/ input string, it reads the statement from • If Xa S, the parser halts and announces successful completion of parsing.
Lexical Analysis:> The first phase of a compiler is lexical analysis, also known as scanning. This phase left to right character-wise. The input to a lexical analyser is the pure high-level code from the preprocessor. It • If X = a ≠ S, the parser pops X off the stack and advances the input pointer to the next input symbol.
reads the source code and breaks it into a stream of tokens, which are the basic units of the programming identifies valid lexemes from the program and returns tokens to the syntax analyser, one after the other, • If X is a non-terminal, the program consults entry M[X, a] of the parsing table M. This entry will either be
corresponding to the get Next Token command from the syntax analyser. an X-production of the grammar or an error entry.
language. The tokens are then passed on to the next phase for further processing.
• If M[X, a] (XUVW), the parser replaces X on top of the stack by UVW
Syntax Analysis>> : The second phase of a compiler is syntax analysis, also known as parsing. This phase
• If M[X, a] error, the parser calls an error recovery routine
takes the stream of tokens generated by the lexical analysis phase and checks whether they conform to the
• LR parser :
grammar of the programming language. The output of this phase is usually an Abstract Syntax Tree (AST). LR parser is a bottom-up parser for context-free grammar that is very generally used by computer programming
Semantic Analysis>> : The third phase of a compiler is semantic analysis. This phase checks whether the language compiler and other associated tools. LR parser reads their input from left to right and produces a
code is semantically correct, i.e., whether it conforms to the language’s type system and other semantic There are three important terms to grab: right-most derivation. It is called a Bottom-up parser because it attempts to reduce the top-level grammar
rules. In this stage, the compiler checks the meaning of the source code to ensure that it makes sense. The Tokens: A Token is a pre-defined sequence of characters that cannot be broken down further. It is like an abstract productions by building up from the leaves. LR parsers are the most powerful parser of all deterministic
compiler performs type checking, which ensures that variables are used correctly and that operations are symbol that represents a unit. A token can have an optional attribute value. There are different types of tokens: parsers in practice.
performed on compatible data types. The compiler also checks for other semantic errors, such as Rules for LR parser :
Identifiers (user-defined) , Delimiters/ punctuations (;, ,, {}, etc.) , Operators (+, -, *, /, etc.) , Special symbols
undeclared variables and incorrect function calls. The rules of LR parser as follows.
Keywords , Numbers
The first item from the given grammar rules adds itself as the first closed set.
Intermediate Code Generation:>> The fourth phase of a compiler is intermediate code generation. This Lexemes: A lexeme is a sequence of characters matched in the source program that matches the pattern of a If an object is present in the closure of the form A→ α. β. γ, where the next symbol after the symbol is non-terminal,
phase generates an intermediate representation of the source code that can be easily translated into token. add the symbol’s production rules where the dot precedes the first item.
machine code. For example: (, ) are lexemes of type punctuation where punctuation is the token. Repeat steps (B) and (C) for new items added under (B).
Optimization:>> The fifth phase of a compiler is optimization. This phase applies various optimization Patterns: A pattern is a set of rules a scanner follows to match a lexeme in the input program to identify a valid
techniques to the intermediate code to improve the performance of the generated machine code. token. It is like the lexical analyser’s description of a token to validate a lexeme. • LR parser diagram : LALR
Code Generation>> : The final phase of a compiler is code generation. This phase takes the optimized For example, the characters in the keyword are the pattern to identify a keyword. To identify an identifier the LALR Parser is lookahead LR parser. It is the most powerful
intermediate code and generates the actual machine code that can be executed by the target hardware. parser which can handle large classes of grammar. The size
pre-defined set of rules to create an identifier is the pattern
of CLR parsing table is quite large as compared to other
Analysis: Input Buffer • Difference between Top down parsing and Bottom up parsing
parsing table. LALR reduces the size of this table.LALR works
The lexical analyzer scans the characters of the source program one at a time to discover tokens. Often, however, • Top-Down Parsing>
similar to CLR. The only difference is , it combines the similar
many characters beyond the next token many have to be examined before the next token itself can be determined. 1> It is a parsing strategy that first looks at the highest level of the parse tree and works down the parse tree by
states of CLR parsing table into one single state.
For this and other reasons, it is desirable for the lexical analyzer to read its input from an input buffer. Fig. 1.8 shows using the rules of grammar.
a buffer divided into two halves of, say 100 characters each. One pointer marks the beginning of the token being The general syntax becomes [A->∝.B, a ]
2> Top-down parsing attempts to find the left most derivations for an input string.
discovered. A look ahead pointer scans ahead of the beginning point, until the token is discovered. We view the where A->∝.B is production and a is a terminal or right end
3> In this parsing technique we start parsing from the top (start symbol of parse tree) to down (the leaf node of
position of each pointer as being between the character last read and the character next to be read. In practice each marker $
parse tree) in a top-down manner. LR(1) items=LR(0) items + look ahead
buffering scheme adopts one convention either a pointer is at the symbol last read or the symbol it is ready to read. 4> This parsing technique uses Left Most Derivation.
The distance which the look ahead pointer may have to travel past the actual token may be large. For example, in a 5> The main leftmost decision is to select what production rule to use in order to construct the string.
PL/I program we may see: 6> Example: Recursive Descent parser. Steps for constructing the LALR parsing table :
DECLARE (ARGI, ARG2... ARG n) • Bottom-Up Parsing>> 1 Writing augmented grammar
Without knowing whether DECLARE is a keyword or an array name until we see the character that follows the right 1 It is a parsing strategy that first looks at the lowest level of the parse tree and works up the parse tree by using the 2 LR(1) collection of items to be found
parenthesis. In either case, the token itself ends at the second E. If the look ahead pointer travels beyond the buffer rules of grammar. 3 Defining 2 functions: goto[list of terminals] and action[list of non-terminals] in the LALR parsing table
half in which it began, the other half must be loaded with the next characters from the source file. 2 Bottom-up parsing can be defined as an attempt to reduce the input string to the start symbol of a grammar.
Since the buffer shown in figure is of limited size there is an implied constraint on how much look ahead can be used 3 In this parsing technique we start parsing from the bottom (leaf node of the parse tree) to up (the start symbol of
before the next token is discovered. In the above example, if the look ahead traveled to the left half and all the way the parse tree) in a bottom-up manner
through the left half to the middle, we could not reload the right half, because would lose characters that had not yet 4 This parsing technique uses Right Most Derivation.
been grouped into tokens. While we can make the buffer larger if we chose or use another buffering scheme, we 5 The main decision is to select when to use a production rule to reduce the string to get the starting symbol.
cannot ignore the fact that overhead is limited. Example: ItsShift Reduce parser.
• SLR (1)-Simple LR Parser: • Directed Acyclic Graph (DGA)
Shift-reduce parsing attempts to construct a parse tree for an input string beginning at the leaves and working up The Directed Acyclic Graph (DAG) is used to represent the structure of basic blocks, to visualize the flow • Basic Blocks in Compiler Design
towards the root. In other words, it is a process of "reducing" (opposite of deriving a symbol using a production rule) of values between basic blocks, and to provide optimization techniques in the basic block. To apply an Basic Block is a straight line code sequence that has no branches in and out branches except to the entry and at the
a string w to the start symbol of a grammar. At every (reduction) step, a particular substring matching the RHS of a optimization technique to a basic block, a DAG is a three-address code that is generated as the result of end respectively. Basic Block is a set of statements that always executes one after other, in a sequence.
production rule is replaced by the symbol on the LHS of the production. A general form of shift-reduce parsing is LR an intermediate code generation. The first task is to partition a sequence of three-address codes into basic blocks. A new basic block is begun with the
(scanning from Left to right and using Right-most derivation in reverse) parsing, which is used in a number of Directed acyclic graphs are a type of data structure and they are used to apply transformations to basic blocks. first instruction and instructions are added until a jump or a label is met. In the absence of a jump, control moves
automatic parser generators like Yacc, Bison, etc. A convenient way to implement a shift-reduce parser is to use a The Directed Acyclic Graph (DAG) facilitates the transformation of basic blocks. further consecutively from one instruction to another. The idea is standardized in the algorithm below:
stack to hold grammar symbols and an input buffer to hold the string w to be parsed. The symbol $ is used to mark DAG is an efficient method for identifying common sub-expressions. Algorithm: Partitioning three-address code into basic blocks.
the bottom of the stack and also the right-end of the input. It demonstrates how the statement’s computed value is used in subsequent statements. Input: A sequence of three address instructions.
Notation ally, the top of the stack is identified through a separator symbol, and the input string to be Examples of directed acyclic graph : Process: Instructions from intermediate code which are leaders are determined. The following are the rules used for
• Loop Optimizations:
parsed appears on the right side of. The stack content appears on the left of . We now give a brief introduction to a very important place for
finding a leader:
For example, an intermediate stage of parsing can be shown as follows: optimizations, namely loops, especially the inner loops where The first three-address instruction of the intermediate code is a leader.
Sid1+id2 id3S .... (1) programs tend to spend the bulk of their time. The running time of a Instructions that are targets of unconditional or conditional jump/goto statements are leaders.
Here "Sid1" is in the stack, while the input yet to be seen is "+ id2* id3$* In shift-reduce parser, there are two program may be improved if we decrease the number of instructions Instructions that immediately follow unconditional or conditional jump/goto statements are considered leaders.
fundamental operations: shift and reduce. in an inner loop, even if we increase the amount of code outside that Each leader thus determined its basic block contains itself and all instructions up to excluding the next leader.
Shift operation: The next input symbol is shifted onto the top of the stack. loop. Example 1: The following sequence of three-address statements forms a basic block:
1 Three techniques are important for loop optimization:2 Code
After shifting + into the stack, the above state captured in (1) would change into: Sid1+ id2 id35 motion, which moves code outside a loop; 3 Induction- variable
t1 := a*a , t2 := a*b , t3 := 2*t2 , t4 := t1+t3 , t5 := b*b , t6 := t4 +t5
• What is a Polymorphic Function? elimination, which we apply to replace variables from inner loop.4 • Data flow analysis
A function is said to be a Polymorphic function if it can work with multiple types of data or objects. We can also Reduction in strength, which replaces and 3 expensive operation by a It is the analysis of flow of data in control flow graph, i.e., the analysis that determines the information
describe it as a function that can be used to perform the same type of operation on different types of input. cheaper one, such as a multiplication by an addition. regarding the definition and use of data in program. With the help of this analysis, optimization can be done.
Types of Polymorphic Functions in Compiler Design In general, its process in which values are computed using data flow analysis. The data flow property
There are two types of polymorphic functions that are used in compiler design. Directed Acyclic Graph Characteristics :
A Directed Acyclic Graph for Basic Block is a directed acyclic graph with the following labels on nodes. represents information that can be used for optimization.
1. Ad-hoc Polymorphism:
The graph’s leaves each have a unique identifier, which can be variable names or constants. Data flow analysis is a technique used in compiler design to analyze how data flows through a program. It
It is also known as “Overloading Ad-hoc Polymorphism”, which allows functions that have the same name to act
differently for different data types. For example, The plus operator will add two numbers but perform a concatenation The interior nodes of the graph are labelled with an operator symbol. involves tracking the values of variables and expressions as they are computed and used throughout the
operation on two strings. In addition, nodes are given a string of identifiers to use as labels for storing the computed value. program , with the goal of identifying opportunities for optimization and identifying potential errors.
2. Parametric Polymorphism: Directed Acyclic Graphs have their own definitions for transitive closure and transitive reduction. The basic idea behind data flow analysis is to model the program as a graph, where the nodes represent
It is also known as "Early Binding Parametric Polymorphism.” It opens a way to use the same code for different data Directed Acyclic Graphs have topological orderings defined. program statements and the edges represent data flow dependencies between the statements. The data
types. It is implemented by using Templates. For example: To develop an understanding of this sort of Polymorphism, • . Peephole Optimization flow information is then propagated through the graph, using a set of rules and equations to compute the
let us execute a program to find the greater of two Integers or two Strings. Target code often contains redundant instructions and suboptimal constructs.
values of variables and expressions at each point in the program.
• TYPES Expressiion Examine a short sequence of target instruction (peephole) and replace by a shorter or faster sequence peephole is a
types of data flow analysis performed by compilers include:
Type Expressions The type expressions are used to represent the type of a programming language construct. Type small moving window on the target systems.
A statement-by-statement code-generation strategy often produces target code that contains redundant 1 Reaching Definitions Analysis: This analysis tracks the definition of a variable or expression and
expression can be a basic type or formed by recursively applying an operator called a type constructor to other type
instructions and suboptimal constructs. A simple but effective technique for locally improving the target code is determines the points in the program where the definition “reaches” a particular use of the variable or
expressions. The basic types and constructors depend on the source language to be verified. Let us define type
expression as follows: peephole optimization, a method for trying to improve the performance of the target program by examining a short expression. This information can be used to identify variables that can be safely optimized or eliminated.
• A basic type is a type expression • Boolean, char, integer, real, void, type_error • A type constructor applied to type sequence of target instructions (called the e peephole) and replacing these instructions by a shorter or faster 2 Live Variable Analysis: This analysis determines the points in the program where a variable or expression
expressions is a type expression • Array: array(I, T) • Array (I,T) is a type expression denoting the type of an array with sequence, whenever possible. The peephole is a small, moving window the target program. The code in the peephole is “live”, meaning that its value is still needed for some future computation. This information can be used to
elements of type T and index set I, where T is a type expression. Index set I often represents a range of integers. For need not be contiguous, although some implementations do require this. identify variables that can be safely removed or optimized.
example, the Pascal declaration var C: array[1..20] of integer; associates the type expression array(1..20, integer) Peephole optimization examples. 3 Available Expressions Analysis: This analysis determines the points in the program where a particular
Redundant loads and stores
with C. • Product: T1 × T2 • If T1 and T2 are two type expressions, then their Cartesian product T1 × T2 is a type expression is “available”, meaning that its value has already been computed and can be reused. This
Consider the code sequence
expression. We assume that × associates to the left. • Record: record((N1 × T1) × (N2 × T2)) A record differs from a information can be used to identify opportunities for common subexpression elimination and other
Move Ro, a Move a, Ro
product. The fields of a record have names. The record type constructor will be applied to a tuple formed from field optimization techniques.
Instruction 2 can always be removed if it does not have a label.
types and field names. For example, the Pascal program fragment type node = record address: integer ; data : array 4 Constant Propagation Analysis: This analysis tracks the values of constants and determines the points in
Now, we will give some examples of program transformations that are characteristic of peephole optimization:
[1..15] of char end; var node table : array [1..10] of node ; declares the type name “node” representing the type
Redundant loads and stores: If we see the instruction sequence the program where a particular constant value is used. This information can be used to identify
expression record((address integer) × (data × array(1..15,char))) and the variable “node table” to be an array of
Move Ro, a opportunities for constant folding and other optimization techniques.
records of this type. Pointer: pointer (T)
Move a, Ro • Issues In The Design Of A Code Generator
Pointer(T) is a type expression denoting the type “pointer to an object of type T where T is a type expression. For
We can delete instruction (2) because whenever (2) is executed, (1) will ensure that the value of a is already in The following issues arise during the code generation phase:
example, in Pascal, the declaration var pt.: *row declares variable “ptr.” to have type pointer(row). Function: D → R Input to code generator 2. Target program 3. Memory management 4. Instruction selection 5. Register allocation 6 Evaluation order
register RO. Note that is (2) has a label, we could not be sure that (1) was always executed immediately before (2)
• STORAGE Allocation Input to code generator:>> The input to the code generation consists of the intermediate representation of the source program
and so we could not remove (2)
Storage Allocation Strategies These represent the different storage-allocation strategies used in the distinct parts of produced by front end, together with information in the symbol table to determine run-time addresses of the data objects denoted by
• Register allocation and assignment the names in the intermediate representation.
the run-time memory organization (as shown in slide 8). We will now look at the possibility of using these strategies
Registers are the fastest locations in the memory hierarchy. But unfortunately, this resource is limited. It comes Intermediate representation can be:
to allocate memory for activation records. Different languages use different strategies for this purpose. For example,
old FORTRAN used static allocation, Algol type languages use stack allocation, and LISP type languages use heap under the most constrained resources of the target processor. Register allocation is an NP-complete problem. Linear representation such as postfix notation. B. Three address representation such as quadruples c. Virtual machine representation
However, this problem can be reduced to graph coloring to achieve allocation and assignment. Therefore a good such as stack machine code d. Graphical representations such as syntax trees and dags.
allocation. Static allocation: Names are bound to storage as the program is compiled No runtime support is required Target program:>> The output of the code generator is the target program. The output may be:
Bindings do not change at run time On every invocation of procedure names are bound to the same storage Values of register allocator computes an effective approximate solution to a hard problem.
a. Absolute machine language It can be placed in a fixed memory location and can be executed immediately.
local names are retained across activations of a procedure These are the fundamental characteristics of static b. Reloadable machine language It allows subprograms to be compiled separately. C. Assembly language
allocation. Since name binding occurs during compilation, there is no need for a run-time support package. The ROPUNOTES.IN
retention of local name values across procedure activations means that when control returns to a procedure, the Code generation is made easier.
Memory management:>> 1 Names in the source program are mapped to front end and code generator. Addresses of data objects in
values of the locals are the same as they were when control last left. For example, suppose we had the following
run-time memory by the program
code, written in a language using static allocation: function F( ) { int a; print(a); a = 10; } After calling F( ) once, if it was Figure – Input-Output 2 It makes use of symbol table, that is, a name in a three-address statement refers to a symbol-table entry for the name.
called a second time, the value of a would initially be 10, and this is what would get printed The register allocator determines which values will reside in the register and which register will hold each of those 3 Labels in three-address statements have to be converted to addresses of instructions.
• Stack Allocation values. It takes as its input a program with an arbitrary number of registers and produces a program with a finite Instruction selection:>> 1 The instructions of target machine should be complete and uniform.
Stack Allocation The activation records that are pushed onto and popped for the run time stack as the control flows 2 Instruction speeds and machine idioms are important factors when efficiency of target program is considered.
register set that can fit into the target machine. (See image) 3 The quality of the generated code is determined by its speed and size.
through the given activation tree. First the procedure is activated. Procedure read array’s activation is pushed onto Allocation vs Assignment: Register allocation>> Instructions involving register operands are shorter and faster than those involving operands in emery.
the stack, when the control reaches the first line in the procedure sort. After the control returns from the activation of
Allocation – The use of registers is subdivided into two sub problems:
the read array, its activation is popped. In the activation of sort, the control then reaches a call of sort with actuals 1 Register allocation – the set of variables that will reside in registers at a point in the program is selected.
Maps an unlimited namespace onto that register set of the target machine.
and 9 and an activation of assort is pushed onto the top of the stack. In the last stage the activations for partition Register assignment the specific register that a variable will reside in is picked.
Reg. to Reg. Model: Maps virtual registers to physical registers but spills excess amount to memory.
(1,3) and assort (1,0) have begun and ended during the life time of sort (1,3), so their activation records have come Certain machine requires even-odd register pairs for some operands and results. For example, consider the division instruction of the
and gone from the stack, leaving the activation record for sort (1,3) on top. Mem. to Mem. Model: Maps some subset of the memory location to a set of names that models the physical form: D x, y
• Dynamic Storage register set. Where, x-dividend even register in even/odd register pair y – Divisor even register holds the remainder odd register holds the quotient
Allocation ensures that code will fit the target machine’s reg. set at each instruction. Evaluation order>> The order in which the computations are performed can affect the efficiency of the target code. Some computation
. Dynamic Storage Allocation: Generally languages like Lisp and ML which do not allow for explicit de-allocation of orders require fewer registers to hold intermediate results than others.
memory do garbage collection. A reference to a pointer that is no longer valid is called a 'dangling reference'. For Assignment –
• CODE IMPROVIG TRANSFORMATIONS
example, consider this C code: Maps an allocated name set to the physical register set of the target machine.
Algorithms for performing the code improving transformations rely on data-flow information. Here we consider
int main (void) { int* a=fun(); } int* fun() { int a=3; int* b=&a; return b; } Assumes allocation has been done so that code will fit into the set of physical registers.
common sub-expression elimination, copy propagation and transformations for moving loop invariant computations
Here, the pointer returned by fun() no longer points to a valid address in memory as the activation of fun() has ended. No more than ‘k’ values are designated into the registers, where ‘k’ is the no. of physical registers.
out of loops and for eliminating induction variables. Global transformations are not substitute for local
This kind of situation is called a 'dangling reference'. In case of explicit allocation it is more likely to happen as the • Flow Graph in Code Generation
A basic block is a simple combination of statements. Except for entry and exit, the basic blocks do not have any branches like in and out.
transformations; both must be performed.
user can de-allocate any part of memory, even something that has to a pointer pointing to a valid piece of memory Elimination of global common sub expressions:
It means that the flow of control enters at the beginning and it always leaves at the end without any halt. The execution of a set of
• Dead Code Elimination
instructions of a basic block always takes place in the form of a sequence. • The available expressions data-flow problem discussed in the last section allows us to determine if an expression
In the world of software development, optimizing program efficiency and maintaining clean code are crucial goals. Dead code
elimination, an essential technique employed by compilers and interpreters, plays a significant role in achieving these objectives. This The first step is to divide a group of three-address codes into the basic block. The new basic block always begins with the first at point p in a flow graph is a common sub-expression. The following algorithm formalizes the intuitive ideas
article explores the concept of dead code elimination, its importance in program optimization, and its benefits. We will delve into the instruction and continues to add instructions until it reaches a jump or a label. If no jumps or labels are identified, the control will flow presented for eliminating common sub-expressions.
process of identifying and eliminating dead code, emphasizing original content to ensure a plagiarism-free article. from one instruction to the next in sequential order.
Understanding Dead Code Dead code refers to sections of code within a program that is never executed during runtime and has no The algorithm for the construction of the basic block is described below step by step:
impact on the program’s output or behavior. Identifying and removing dead code is essential for improving program efficiency, Algorithm: The algorithm used here is partitioning the three-address code into basic blocks.
reducing complexity, and enhancing maintainability. Input: A sequence of three-address codes will be the input for the basic blocks.
Benefits of Dead Code Elimination Output: A list of basic blocks with each three address statements, in exactly one block, is considered as the output.
Enhanced Program Efficiency: By removing dead code, unnecessary computations and memory usage are eliminated, resulting in Method: We’ll start by identifying the intermediate code’s leaders. The following are some guidelines for identifying leaders:
faster and more efficient program execution. The first instruction in the intermediate code is generally considered as a leader.
Improved Maintainability: Dead code complicates the understanding and maintenance of software systems. By eliminating it, The instructions that target a conditional or unconditional jump statement can be considered as a leader.
developers can focus on relevant code, improving code readability, and facilitating future updates and bug fixes Any instructions that are just after a conditional or unconditional jump statement can be considered as a leader.

You might also like