
E4 Building a Compiler Using Flex and Bison

4.1 Requirements
• Win_flex and Win_bison tools.
• Visual Studio 2012.

4.2 Objectives
• Identify the different phases of the compiler.
• Introduce Lex and Yacc as powerful compiler-building tools.
• Build a small compiler using flex and bison.

4.3 Introduction
A compiler is a program that translates (or compiles) a program written in a high-level programming language
that is suitable for human programmers into the low-level machine language that is required by computers.
During this process, the compiler will also attempt to spot and report obvious programmer mistakes.
Since writing a compiler is a nontrivial task, it is a good idea to structure the work. A typical way of
doing this is to split the compilation into several phases with well-defined interfaces. Conceptually, these
phases operate in sequence (though in practice, they are often interleaved), each phase (except the first)
taking the output from the previous phase as its input. In some compilers, the ordering of phases may differ
slightly, some phases may be combined or split into several phases or some extra phases may be inserted
between those mentioned below.

4.3.1 Lexical analysis


This is the initial part of reading and analysing the program text: The text is read and divided into tokens,
each of which corresponds to a symbol in the programming language, e.g., a variable name, keyword or
number.

4.3.2 Syntax analysis
This phase takes the list of tokens produced by the lexical analysis and arranges these in a tree-structure
(called the syntax tree) that reflects the structure of the program. This phase is often called parsing.

4.3.3 Type checking


This phase analyses the syntax tree to determine if the program violates certain consistency requirements,
e.g., if a variable is used but not declared or if it is used in a context that does not make sense given the
type of the variable, such as trying to use a boolean value as a function pointer.

4.3.4 Intermediate code generation


The program is translated to a simple machine independent intermediate language.

4.3.5 Register allocation


The symbolic variable names used in the intermediate code are translated to numbers, each of which corresponds to a register in the target machine code.

4.3.6 Assembly and linking


The intermediate language is translated to assembly language (a textual representation of machine code) for
a specific machine architecture.

4.3.7 Machine code generation


The assembly-language code is translated into binary representation and addresses of variables, functions,
etc., are determined.

4.4 Lex and Yacc


Lex and Yacc are two very important and powerful tools for building compilers. Lex stands for Lexical Analyzer and Yacc stands for Yet Another Compiler Compiler. Lex creates a C function that will tokenize input according to a set of regular expressions, while Yacc generates a C program for a parser from BNF rules. Lex and Yacc are part of BSD Unix. GNU has its own, enhanced versions, called Flex and Bison, which will be used in this experiment.

Figure 4.1: Lex and Yacc

Figure 4.1 illustrates how lex and yacc work.


• First, the lexical rules are input to Lex to generate the lexical analyzer (lexer). The entry point of the lexical analyzer is the C function yylex().
• Next, the grammar rules are input to Yacc, which uses them to generate the syntactical analyzer (parser) that implements the grammar. The syntactical analyzer entry point is the C function yyparse().
• Finally, a main function should be supplied to add the missing parts of the compiler and to drive the
parser. The lexer and parser, along with the main function, are compiled and linked together to form
the executable for the compiler.
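
As a concrete illustration of that last step, the following is a minimal driver sketch in C. It is only an illustrative assumption of what such a main function can look like (reading from standard input); the experiments below supply their own main functions that open a text file instead.

/* Minimal driver sketch: assumes a Flex-generated lexer and a Bison-generated parser are linked in. */
#include <stdio.h>

extern FILE *yyin;            /* input stream used by the generated lexer        */
int yyparse(void);            /* entry point of the generated parser             */

int yyerror(const char *msg)  /* called by the parser when a syntax error occurs */
{
    fprintf(stderr, "parse error: %s\n", msg);
    return 1;
}

int main(void)
{
    yyin = stdin;             /* parse from standard input                       */
    if (yyparse() != 0)       /* yyparse() repeatedly calls yylex() for tokens   */
        fprintf(stderr, "compilation failed\n");
    return 0;
}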

4.5 Building a complete compiler


In this experiment we are building a complete compiler for a very simple language for a simple machine, using three steps as shown in Figure 4.2.
• The first step is to use Flex and Bison to do syntax analysis of the input and generate the intermediate language.
• Step two is to use Flex and Bison to generate assembly code from the intermediate code.
• Finally, apply the assembly code to the assembler to generate the final machine code.

Figure 4.2: Compiler steps

4.5.1 Lexer
The word lexical means pertaining to words. In terms of programming languages, words are objects like variable names, numbers, keywords, etc. Such words are traditionally called tokens. A lexical analyser, or lexer for short, takes as its input a string of individual letters and divides this string into tokens. Additionally, it filters out whatever separates the tokens (the so-called white-space), i.e., layout characters (spaces, newlines, etc.) and comments. Figure 4.3 illustrates the lexer's job.
Some important terminologies:
• token: a name for a set of input strings with related structure.
Example: identifier, integer constant
• pattern: a rule describing the set of strings associated with a token.
Example: a letter followed by zero or more letters, digits, or underscores.
• lexeme: the actual input string that matches a pattern.
Example: count

Figure 4.3: Lexer

In order to understand how the lexer specifies tokens, we should first learn about regular expressions.

4.5.1.1 Regular Expressions


An alphabet is defined as a finite set of symbols that are used to compose strings, a string is a finite sequence of alphabet symbols, and a language is a (finite or infinite) set of strings.
Given an alphabet, we will describe sets of strings by regular expressions. A regular expression is an algebraic notation that is compact and easy for humans to use and understand. Regular expressions that describe simple sets of strings can be combined to form regular expressions that describe more complex sets of strings.

Regular operations on languages:

• Union: R ∪ S = {x | x ∈ R or x ∈ S}
• Concatenation: RS = {xy | x ∈ R and y ∈ S}
• Kleene closure: R* = R concatenated with itself 0 or more times
  = {ε} ∪ R ∪ RR ∪ RRR ∪ ... , i.e., the strings obtained by concatenating a finite number of strings from the set R (ε denotes the empty string).

Common extensions to regular expression notation:
• One or more repetitions of r: r+
• A range of characters: [a-zA-Z], [0-9]
• An optional expression: r?
• Any single character: .

Regular expression examples:
• letter = [a-zA-Z]
• digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9, or digit = [0-9]
• ident = letter(letter | digit)*
• Integer_const = digit+
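
In Flex notation, these same expressions are written as named definitions in the first part of the lex file; this is exactly the form used in Experiment 1 below, where braces refer back to an earlier definition:

letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+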

4.5.1.2 Lex File


As shown in Figure 4.4, regular expressions are written in the lex file. The lex file is used to generate the lexer C routine that will recognize the tokens.

Figure 4.4: Lex file

The lex file can be divided into three parts, as shown in Figure 4.5,

Figure 4.5: Lex file parts

in which no part is required. However, the first %% is required, in order to mark the separation between the declarations and the productions.

4.5.1.3 First part of lex file


This part of a Lex file may contain:
• Code written in the target language (usually C or C++), enclosed between %{ and %}, which will be placed at the top of the file that Lex will create. That is the place where we usually put the include files. Lex will copy all the content enclosed between these signs "as is" into the target file. The two signs have to be placed at the beginning of the line.
• Regular expressions, as explained in Section 4.5.1.1.

4.5.1.4 Second part of lex file


This part is intended to instruct Lex what the generated analyser should do when it encounters one pattern or another. It may contain:
• Some specifications, written in the target language (usually C or C++), surrounded by %{ and %} (at the beginning of a line). The specifications will be put at the beginning of the yylex() function, which is the function that consumes the tokens and returns an integer.
• Productions, having the syntax: regular_expression action.
• If the action is missing, Lex will copy the matching characters as is to the standard output. If the action is specified, it has to be written in the target language. If it contains more than one instruction or is written on more than one line, you will have to enclose it between { and }.
You should also note that comments such as /* ... */ can be present in the second part of a Lex file only if enclosed between braces, in the action part of the statements. Otherwise, Lex would consider them as regular expressions or actions, which would give errors or, at least, weird behaviour. Finally, the yytext variable used in the actions contains the characters accepted by the regular expression. This is a char array, of length yyleng (i.e., char yytext[yyleng]).
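
For example, a rule of the following shape (an illustrative sketch, not part of the experiments below) would print every matched identifier together with its length using yytext and yyleng:

{id}   { printf("identifier '%s' of length %d\n", yytext, yyleng); }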

4.6 Experiment 1: Extracting Tokens From Source Code
The following experiment shows how the lexer can extract tokens from source code. The language is composed of only two types of statements: declaration statements, e.g., int x, followed by assignment statements, e.g., x=4.
%{
#define YYSTYPE char*
#include "stdlib.h"

int lineno=1;
%}
%option nounistd
%option noyywrap
%option never-interactive
white   [ \r\t]+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+
%%
{white}   { }
{number}  { printf("Number\n"); }
"int"     { printf("INT\n"); }
{id}      { printf("ID\n"); }
"="       { printf("ASSIGN\n"); }
%%
int yyerror(void)
{
    printf("Error\n");
}

int main()
{
    FILE* fp=fopen("D:\\lab4folders\\compiler\\test.txt","r");
    yyin=fp;
    yylex();
    system("pause");
}

Experiment steps:
• open file exp1.l
• open cmd from start menu.
• cd to the folder that contains your file.
• run the following command: win_flex exp1.l.
• lex.yy.c file is now created.
• open Visual studio.
• create new empty console project.

• add the following file to the project: lex.yy.c and build the project.
• create the file test.txt in the following folder: D:\lab4folders\compiler.
• add the following in the text file:
int x
int y
x=4
y=9

• Now run your visual studio project.


The output you will see is the tokens found in your source code.

4.6.0.5 Exercises
• The previous code only allowed int declarations; add a float declaration.

4.6.1 Parsing (Syntax Analysis)


• Main Task: Take a token sequence from the scanner and verify that it is a syntactically correct program.
• Secondary Tasks:
– Process declarations and set up symbol table information accordingly, in preparation for semantic
analysis.
– Construct a syntax tree in preparation for intermediate code generation.

4.6.1.1 Context-Free Grammar


A context-free grammar for a language specifies the syntactic structure of programs in that language. Components of a grammar:
• a finite set of tokens (obtained from the scanner);
• a set of variables representing "related" sets of strings, e.g., declarations, statements, expressions;
• a set of rules that show the structure of these strings;
• an indication of the "top-level" set of strings we care about.

An Example:
Program -> Declaration Statements

Declaration -> |
               Declaration INT ID

Statements -> |
              Statements Statement

Statement -> ID ASSIGN NUMBER

The grammar in this example describes a program that starts with declarations, then statements. Declaration is defined by recursion. The declaration part of the program may be: int x, int x int y, int x int y int z, or any number of declarations. Statements are also defined by recursion.
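
As an illustration, the token sequence of the test input used below (int x, int y, x=4, y=9) can be derived from this grammar as follows, expanding one variable at each step:

Program
=> Declaration Statements
=> Declaration INT ID Statements                        (Declaration -> Declaration INT ID)
=> Declaration INT ID INT ID Statements                 (Declaration -> Declaration INT ID)
=> INT ID INT ID Statements                             (Declaration -> empty)
=> INT ID INT ID Statements Statement Statement         (Statements -> Statements Statement, twice)
=> INT ID INT ID Statement Statement                    (Statements -> empty)
=> INT ID INT ID ID ASSIGN NUMBER ID ASSIGN NUMBER      (Statement -> ID ASSIGN NUMBER, twice)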

4.6.1.2 Yacc File
As shown in Figure 4.6, grammar rules are written in the yacc file. The yacc file is used to generate the parser C routine that will do the syntax analysis.

Figure 4.6: Yacc file

The yacc file can be divided into three parts, as shown in Figure 4.7; only the first %% and the second part are mandatory.

Figure 4.7: Yacc file parts

4.6.1.2.0.1 First part of yacc file The first part of a Yacc file may contain:
• The specifications written in the target language, enclosed between %{ and %} (each symbol at the beginning of a line), that will be put at the top of the parser generated by Yacc.
• Declaration of the tokens with the %token keyword.
• The types of the terminals, using the reserved word %union.
• Information about operators' priority or associativity.
• The axiom of the grammar, using the reserved word %start (if not specified, the axiom is the first production of the second part of the file).
The yylval variable, implicitly declared of the %union type, is really important in the file, since it is the variable that contains the value of the last token read.
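
For illustration, a typical first part might look like the sketch below. Note that the experiments in this lab take a simpler route and #define YYSTYPE char* instead of using %union, so this sketch only shows the notation, not the code used later.

%{
#include <stdio.h>      /* copied to the top of the generated parser */
%}
%union {                /* possible types of token values (yylval)   */
    int   ival;
    char *sval;
}
%token <ival> NUMBER    /* NUMBER carries an int value               */
%token <sval> ID        /* ID carries a string value                 */
%left  PLUS MINUS       /* operator associativity and precedence     */
%start Program          /* the axiom (start symbol) of the grammar   */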

4.6.1.2.0.2 Second part of yacc file This part cannot be empty. It may contain:
• declarations and/or definitions enclosed between %{ and %}.
• Productions of the language's grammar.
These productions look like Figure 4.8,

Figure 4.8: Production of language grammer

where each body_i may be a terminal or nonterminal symbol of the language.
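
For instance, a production with an action has the following shape (this particular rule appears in Experiment 2 below):

Statement : ID ASSIGN NUMBER   { printf("ID ASSIGN NUMBER"); }
          ;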

4.6.2 Experiment 2: Using flex and bison to understand source code


This experiment completes Experiment 1 by adding the yacc file. Note that the main function is removed from the lex file and added to the yacc file.
Lex file (exp2.l):
%{
#define YYSTYPE char*
#include "exp2.tab.h"
#include "stdlib.h"

int lineno=1;
%}
%option nounistd
%option noyywrap
%option never-interactive

white   [ \r\t]+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+
%%
{white}   { }
{number}  { yylval=strdup(yytext); return NUMBER; }
"int"     { yylval=strdup(yytext); return INT; }
{id}      { yylval=strdup(yytext); return ID; }
"="       { yylval=strdup(yytext); return ASSIGN; }

%%

int yyerror(void)
{
    printf("Error\n");
}

Yacc file (exp2.y):


%{
#include <stdio.h>
#include <stdlib.h>
#define YYSTYPE char*

extern FILE* yyin;
%}
%token NUMBER
%token ID
%token INT
%token ASSIGN

%%
/* Reduction rule for the whole program */
Program : Declaration Statements { } ;

/* Reduction rules for declarations at the beginning of the program */
Declaration : |
              Declaration INT ID { printf("INT ID "); }
            ;

/* Reduction rule for the set of statements */
Statements : |
             Statements Statement {}
           ;

/* Reduction rule for arithmetic assignment statements */
Statement : ID ASSIGN NUMBER { printf("ID ASSIGN NUMBER"); }
          ;
%%
int main()
{
    FILE* fp=fopen("D:\\lab4folders\\compiler\\test.txt","r");
    yyin=fp;
    if(yyparse()!=0)
        printf("Error found.\n");
    system("pause");
}

Experiment steps:
• put both files exp2.l and exp2.y in one folder.

• open cmd from start menu.
• cd to the folder that contains your files
• run the following command: win_flex exp2.l
• lex.yy.c file is now created.
• run the following command: win_bison -d exp2.y.
• exp2.tab.h and exp2.tab.c are now created.
• open Visual studio.
• create new empty console project.
• add the following files to the project: lex.yy.c, exp2.tab.h and exp2.tab.c, and build the project.
• create the file test.txt in the following folder: D:\lab4folders\compiler.
• add the following in the text file:
int x
int y
x=4
y=9

• Now run your visual studio project.

4.6.2.1 Exercises
• The previous code only allowed int declarations; add a float declaration.
• The previous code only allowed assigning numbers to variables; allow assigning numbers or variables. For example, your code should be able to compile both x=3 and x=y.

4.6.3 Experiment 3: Using flex and bison to understand source code - Part 2
Repeat Experiment 2 to extend the compiler to cover the following grammar, which is very close to C syntax:
Program -> Declaration Statements

Declaration -> |
               Declaration Type ID DELIMITOR

Type -> INT | FLOAT

Statements -> Statements Statement |

Statement -> ID ASSIGN AExp DELIMITOR

AExp -> AExp PLUS term
      | AExp MINUS term
      | term

term -> term TIMES factor
      | term DIVIDE factor
      | factor

factor -> LP AExp RP
        | NUMBER
        | ID

Statement -> ID ASSIGN BExp DELIMITOR

BExp -> TRUE
     | FALSE

This grammar extends the grammar in Experiment 2. Both int and float declarations are allowed by defining a variable type (int or float). Instead of assigning a number to ID as in the previous experiment, a variable called AExp is assigned to the ID. AExp can be the result of adding or subtracting two terms, or a term. A variable called term is defined to be the result of multiplying or dividing two factors, or a factor. A factor is defined to be an AExp between two parentheses, an ID, or a number. The reason for defining the operations in several rules is to preserve operator precedence. Highest precedence is given to what is enclosed by parentheses, then multiplication and division. Lowest precedence is given to addition and subtraction. Finally, assigning binary values (true and false) to an ID is allowed by the last rule.
Make the required changes in the lex and yacc files:

• Define new tokens in the lex file.


• Add the new tokens in the yacc file.
• Define new grammar rules in the yacc file.

then follow the same steps as in Experiment 2 to build and run the compiler.

4.7 Intermediate code generation


Many compilers use a medium-level language as a stepping-stone between the high-level language and the
very low-level machine code. Such stepping-stone languages are called intermediate code. Apart from
structuring the compiler into smaller jobs, using an intermediate language has other advantages:
• If the compiler needs to generate code for several different machine-architectures, only one translation
to intermediate code is needed. Only the translation from intermediate code to machine language (i.e.,
the back-end) needs to be written in several versions.
• If several high-level languages need to be compiled, only the translation to intermediate code needs to be written for each language. They can all share the back-end, i.e., the translation from intermediate code to machine code.
• Intermediate code is implementable via syntax directed translation, so it can be folded into the parsing
process.

4.7.1 Three address code


The intermediate code used in this experiment is called three address code, where at most three operands are used in each instruction. So, if a high-level instruction has more operands, it will be split into several three address code instructions using temporary variables. For example, the high-level instruction
a = 1 + 2*b
would be translated to the three address code instructions
t1 = 2*b
t2 = 1+t1
a = t2
where t1 and t2 are temporary variables.

The sentence a = 1 + 2*b can be parsed by the following grammar:
Statement : ID ASSIGN AExp   {/* action3 here */}
AExp : AExp PLUS term        {/* action2 here */}
term : term TIMES ID         {/* action1 here */} |

So, during parsing we can generate the intermediate code. For each grammar rule we will generate a three address code sentence of the form temp = arg1 op arg2. Tracing the parsing of the previous sentence a = 1 + 2*b, and from your knowledge of grammar derivations, you know that action1 would take place first, then action2, then action3. The actions will generate three address code instructions as Table 4.1 indicates:

Table 4.1: Generating three address code

Action1    Create temp variable t1,  t1 = 2*b
Action2    Create temp variable t2,  t2 = 1+t1
Action3    a = t2

4.7.2 Experiment 4: Using flex and bison to generate intermediate code


In this experiment we shall generate the intermediate code for the grammar in Experiment 3.
Lex file:
%{
#define YYSTYPE char*
#include "three.tab.h"
#include "stdlib.h"

int lineno=1;
%}
%option nounistd
%option noyywrap
%option never-interactive

white   [ \r\t]+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+

%%
{white}   { }
{number}  { yylval=strdup(yytext); return NUMBER; }
"int"     { yylval=strdup(yytext); return INT; }
"float"   { yylval=strdup(yytext); return FLOAT; }
"true"    { yylval=strdup(yytext); return TRUE; }
"false"   { yylval=strdup(yytext); return FALSE; }
{id}      { yylval=strdup(yytext); return ID; }
"+"       { yylval=strdup(yytext); return PLUS; }
"-"       { yylval=strdup(yytext); return MINUS; }
"*"       { yylval=strdup(yytext); return TIMES; }
"/"       { yylval=strdup(yytext); return DIVIDE; }
"="       { yylval=strdup(yytext); return ASSIGN; }
"("       { yylval=strdup(yytext); return LB; }
")"       { yylval=strdup(yytext); return RB; }
";"       { yylval=strdup(yytext); return DELIMITOR; }
"\n"      { lineno++; }
%%

int yyerror(void)
{
    printf("Error\n");
    exit(1);
}

Yacc file:
%{
#include <stdio.h>
#include <stdlib.h>

#define YYSTYPE char*

extern FILE* yyin;
extern int lineno;

int temporaryGenerated=0;

FILE* fo;

int operationcount=0;
// Structure to hold the code generated
struct quadruple
{
    char* op;
    char* arg1;
    char* arg2;
    char* result;
} operations[1000];

// Temporary variable generator
char* newTemp()
{
    char* temp1;
    temp1=(char*)malloc(10);
    sprintf(temp1,"t%d",++temporaryGenerated);
    return temp1;
}
// Code printer
void printCode()
{
    int i;
    for(i=0;i<operationcount;i++)
    {
        fprintf(fo,"%s=%s%s%s\n",operations[i].result,
                operations[i].arg1,operations[i].op,operations[i].arg2);
    }
}
// Operation
void AddOperation(char* op,char* arg1,char* arg2,char* result)
{
    operations[operationcount].op=op;
    operations[operationcount].arg1=arg1;
    operations[operationcount].arg2=arg2;
    operations[operationcount].result=result;
    operationcount++;
}

%}

%token NUMBER
%token ID
%token PLUS
%token MINUS
%token TIMES
%token DIVIDE
%token INT
%token FLOAT
%token ASSIGN
%token TRUE
%token FALSE
%token LB
%token RB
%token DELIMITOR
%left PLUS MINUS
%left TIMES DIVIDE

%start Program

%%
/* Reduction rule for the whole program */
Program : Declaration Statements {
            /* Successful */ printCode(); }
        ;

/* Reduction rules for declarations at the beginning of the program */
Declaration : Declaration Type ID DELIMITOR { fprintf(fo,"%s %s\n",$2,$3); }
            |
            ;
Type : INT { $$ = $1; }
     | FLOAT { $$ = $1; }
     ;

/* Reduction rule for the set of statements */
Statements : Statements Statement {}
           |
           ;

/* Reduction rule for arithmetic assignment statements */
Statement : ID ASSIGN AExp DELIMITOR { AddOperation(" ",$3," ",$1); }
          ;
AExp : AExp PLUS term { $$=newTemp(); AddOperation($2,$1,$3,$$); }
     | AExp MINUS term {/* add your code here */ }
     | term { $$ = $1; }
     ;

term : term TIMES factor { $$=newTemp(); AddOperation($2,$1,$3,$$); }
     | term DIVIDE factor { /* add your code here */ }
     | factor { $$ = $1; }
     ;
factor : LB AExp RB { /* add your code here */ }
       | NUMBER { $$ = $1; }
       | ID { $$ = $1; }
       ;

/* Reduction rule for boolean and boolean assignment expressions */
Statement : ID ASSIGN BExp DELIMITOR {/* add your code here */ }
          ;
BExp : TRUE
     | FALSE
     ;

%%
/*
int yyerror(char* s) {
    printf("%s\n",s);
    return 1;
}
*/
int main()
{
    FILE* fp=fopen("D:\\lab4folders\\compiler\\test.txt","r");
    fo=fopen("D:\\lab4folders\\compiler\\out.txt","w");
    yyin=fp;

    if(fo==NULL){
        printf("can't open file");
        exit(1);
    }

    if(yyparse()!=0)
    {
        fprintf(stderr,"Error found.\n");
        printf("%d\n",lineno);
    }
}

The code also exists on the lab computers. Please ask your TA for assistance.

4.7.2.1 Discussion
The generated parser reads the source code from the text file test.txt and writes the output in the text file out.txt. The C code in the action part after each rule manages how the three address code is generated in the output file. For example:
Declaration : Declaration Type ID DELIMITOR { fprintf(fo,"%s %s\n",$2,$3); }
            |
            ;

$2 means the value of the second symbol in the rule, so int or float is written in the output file. $3 means the value of the third symbol in the rule, so the lexeme of the token ID is written in the output file.
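
As an annotation of that rule, for the input line int x; the positional values are:

Declaration : Declaration Type ID DELIMITOR { fprintf(fo,"%s %s\n",$2,$3); }
/*               $1        $2  $3     $4                                    */
/* for "int x;": $2 = "int", $3 = "x", so the line "int x" is written       */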

In the previous example, the action of the rule writes directly to the output file. But this cannot be done for the other statements. As explained in Section 4.7.1, statements should be divided into three address code statements using temporary variables.

The function newTemp returns the name of a new temporary variable. The first time the function is called it returns t1, the second time it returns t2, and so on. The function AddOperation fills a new structure of type quadruple that stores a three-address operation (operation, argument1, argument2, result) in the operations array.

For the actions of operation statements, a new temp variable is created first, the grammar symbol's result ($$) takes the name of the new temp variable, and the AddOperation function is called.
term : term TIMES factor { $$=newTemp(); AddOperation($2,$1,$3,$$); }

Note that the action part will not be executed until all the non-terminals in the grammar rule are resolved. So, the action of the highest precedence operation will be executed first.

Finally, the action of the first grammar rule is executed and the printCode function is called. printCode writes all the operations in the operations array to the output file.
Program : Declaration Statements {/* Successful */ printCode(); } ;
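
As a concrete trace (a sketch based on the actions above, with the temporary numbering starting at t1 as if this were the first statement parsed), consider the statement x=y+5*z;. The reductions, and the quadruples they record, happen in this order:

term TIMES factor  (5*z)  : $$ = newTemp() -> t1;  AddOperation("*","5","z","t1")
AExp PLUS term     (y+t1) : $$ = newTemp() -> t2;  AddOperation("+","y","t1","t2")
ID ASSIGN AExp     (x=t2) :                        AddOperation(" ","t2"," ","x")

When printCode runs at the end of the program, these quadruples are written (after the declarations) as t1=5*z, t2=y+t1 and x=t2.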

4.7.2.2 Intermediate-code generation


Write the following as the input source code in the test file.
int y;
int z;
int x;
y=z*2;
x=y+5*z;

Then run the project to see the output intermediate code in a file called out.txt.
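
Since this particular input uses only +, * and plain assignments (none of the rules you still have to complete), out.txt should look roughly like the following; temporary names and spacing may differ slightly:

int y
int z
int x
t1=z*2
y=t1
t2=5*z
t3=y+t2
x=t3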
Exercises:
• In the yacc file you need to complete the code wherever you find /* add your code here */. Write the following as the input source code in the test file.

int y;
int z;
int x;
y=z/2;
x=y-5*z;
x=(y+5)*z;
x=true;

4.7.3 Final code Generation


The last step in this compiler is to generate the final code from the three address code. The final code is an assembly code called Mano assembly language. Table 4.2 shows some examples of three address code and the corresponding assembly code:

Table 4.2: 3-address code to Mano-assembly examples

3-address code        Mano's Simple Computer assembly code
int x                 HLT
int y                 x,  DEC 0
int t1                y,  DEC 0
                      t1, DEC 0
x=y+z                 LDA y
                      ADD z
                      STA x
x=y-z                 LDA z
                      CMA
                      INC
                      ADD y
                      STA x
goto L                BUN L
if x >= y goto L      LDA y
                      CMA
                      INC
                      ADD x
                      SNA
                      BUN L

4.7.4 Experiment 5: Using flex and bison to generate final code


In this experiment we need to create a program that takes 3-address code as input and generates assembly code. We have several issues to handle, like assigning variables to the processor's limited registers, or how to handle the two branches of the if-else statement. For simplicity we will generate assembly for the Mano machine, which has only one register, called the accumulator. Also for simplicity we will not handle the if-else statement in this experiment.
As three address code is the source language that needs to be translated to Mano assembly, which is the target language, we can use lex and yacc to do this job in a simple manner.
Again for simplicity we will handle only the following:
• int and float declarations
• Addition, subtraction, and assignment
• A simple case of the if condition
Lex file:
%{
#define YYSTYPE char*
#include "three.tab.h"
#include "stdlib.h"
int lineno=1;
%}
%option nounistd
%option noyywrap
%option never-interactive

white   [ \r\t]+
letter  [A-Za-z]
digit   [0-9]
id      {letter}({letter}|{digit})*
number  {digit}+
grelop  >=

%%
{white}   { }
{number}  { yylval=strdup(yytext); return NUMBER; }
"if"      { yylval=strdup(yytext); return IF; }
"int"     { yylval=strdup(yytext); return INT; }
"float"   { yylval=strdup(yytext); return FLOAT; }
"goto"    { yylval=strdup(yytext); return GOTO; }
{id}      { yylval=strdup(yytext); return ID; }
"+"       { yylval=strdup(yytext); return PLUS; }
"="       { yylval=strdup(yytext); return ASSIGN; }
{grelop}  { yylval=strdup(yytext); return GRELOP; }
"\n"      { lineno++; }
%%
int yyerror(void)
{
    printf("Error\n");
    exit(1);
}

Yacc File:
%{
#include <stdio.h>
#include <stdlib.h>

#define YYSTYPE char*

extern FILE* yyin;
extern int lineno;

FILE* fo;

%}

%token NUMBER
%token ID
%token PLUS
%token INT
%token FLOAT
%token ASSIGN
%token GRELOP
%token IF
%token GOTO

%start Program
%%
/* Reduction rule for the whole program */
Program : Declaration Statements {
            /* Successful */ }
        ;

/* Reduction rules for declarations at the beginning of the program */
Declaration : Declaration Type ID { fprintf(fo,"%s, DEC 0\n",$3); }
            | ;
Type : INT { $$ = $1; }
     | FLOAT { $$ = $1; }
     ;

/* Reduction rule for the set of statements */
Statements : Statements Statement {}
           |
           ;

/* Reduction rule for arithmetic assignment statements */
Statement : ID ASSIGN term PLUS term
            { fprintf(fo,"LDA %s\nADD %s\nSTA %s\n",$3,$5,$1); }
          | ID ASSIGN term { fprintf(fo,"LDA %s\nSTA %s\n",$3,$1); }
          ;

term : NUMBER { $$ = $1; }
     | ID { $$ = $1; }
     ;
/* Reduction rule for if / if-else statements */
Statement : IF term GRELOP term GOTO ID
            { fprintf(fo,"LDA %s\nCMA \nINC \nADD %s\nSNA \nBUN %s\n",$4,$2,$6); }
          ;
Statement : GOTO ID { fprintf(fo,"BUN %s\n",$2); }
          ;
%%

int main()
{
    FILE* fp=fopen("D:\\lab4folders\\compiler\\test.txt","r");
    fo=fopen("D:\\lab4folders\\compiler\\out.txt","w");
    fprintf(fo,"HLT\n");
    yyin=fp;

    if(fo==NULL){
        printf("can't open file");
        exit(1);
    }

    if(yyparse()!=0)
    {
        fprintf(stderr,"Error found.\n");
        printf("%d\n",lineno);
    }
}

Use the steps from Experiment 2 to generate the lexer and parser, build your project, and test it using the following instructions:
int x
int y
int z
x=y+z
x=z
if x>= y goto L
goto k
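
For that input, the generated out.txt should look roughly like the following (HLT is printed first by main, then the DEC 0 data words for the declarations, then the code for each statement, matching Table 4.2):

HLT
x, DEC 0
y, DEC 0
z, DEC 0
LDA y
ADD z
STA x
LDA z
STA x
LDA y
CMA
INC
ADD x
SNA
BUN L
BUN k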

As you have studied assembly and assemblers in second and third years, you can understand how the machine code would later be generated from the assembly code using the machine assembler.
Exercises:
• Make the needed changes in the lex and yacc files to enable the minus operation.

4.7.5 References
• "Basics of Compiler Design", Torben AEgidius Mogensen, anniversary edition, ISBN 978-87-993154-0-6.

