Compiler all practicals
Ex. No: 1
Date:
Aim:
To write a C program to implement a lexical analyzer for the 'if' statement.
Algorithm:
Lexeme              Token
********            *******
if                  <1,1>
variable-name       <2,address>
numeric-constant    <3,address>
;                   <4,4>
(                   <5,0>
)                   <5,1>
{                   <6,0>
}                   <6,1>
>                   <62,62>
>=                  <620,620>
<                   <60,60>
<=                  <600,600>
!                   <33,33>
!=                  <330,330>
=                   <61,61>
==                  <610,610>
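The operator rows of the table follow one rule: a single-character operator is numbered by its ASCII value, and the same operator followed by '=' gets ten times that value. The sketch below only illustrates that rule; the helper names relop_code and print_relop_token are hypothetical and are not part of the practical's program.

```c
#include <stdio.h>

/* Hypothetical helper: token code for an operator lexeme such as ">" or ">=".
 * A one-character operator uses its ASCII value; a trailing '=' multiplies
 * that value by 10, as in the table ('>' is 62, ">=" is 620). */
int relop_code(const char *lexeme)
{
    int code = (unsigned char)lexeme[0];
    if (lexeme[0] != '\0' && lexeme[1] == '=')
        code *= 10;
    return code;
}

/* Print a token in the <code,code> form used by the table. */
void print_relop_token(const char *lexeme)
{
    int code = relop_code(lexeme);
    printf("%s\t<%d,%d>\n", lexeme, code, code);
}
```

Calling print_relop_token("<=") prints the lexeme followed by the pair built from 600, the same value the table lists for that operator.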
Program:
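#include <stdio.h>
#include <ctype.h>
#include <string.h>

char vars[100][100];    /* symbol table: one row per distinct identifier */
int vcnt;               /* number of identifiers stored in vars */
char input[1000], c;
char token[50];
int tlen;
int state = 0, pos = 0, i = 0, id;

/* Return the symbol-table entry for str, adding it on first use. */
char *getAddress(char str[])
{
    for (i = 0; i < vcnt; i++)
        if (strcmp(str, vars[i]) == 0)
            return vars[i];
    strcpy(vars[vcnt], str);
    return vars[vcnt++];
}

/* Return 1 if c can start a relational or assignment operator. */
int isrelop(char c)
{
    if (c == '>' || c == '<' || c == '!' || c == '=')
        return 1;
    else
        return 0;
}

int main(void)
{
    printf("Enter the Input String:");
    if (fgets(input, sizeof input, stdin) == NULL)
        return 1;
    input[strcspn(input, "\n")] = '\0';    /* drop the trailing newline */
    do
    {
        c = input[pos];
        putchar(c);                        /* echo each character as it is scanned */
        switch (state)
        {
        case 0:                            /* expect 'i' of "if" */
            if (c == 'i')
                state = 1;
            break;
        case 1:                            /* expect 'f' of "if" */
            if (c == 'f')
            {
                printf("\t<1,1>\n");
                state = 2;
            }
            break;
        case 2:                            /* start of the next token */
            if (isspace((unsigned char)c))
                printf("\b");
            if (isalpha((unsigned char)c))
            {
                token[0] = c;
                tlen = 1;
                state = 3;
            }
            if (isdigit((unsigned char)c))
                state = 4;
            if (isrelop(c))
                state = 5;
            if (c == ';') printf("\t<4,4>\n");
            if (c == '(') printf("\t<5,0>\n");
            if (c == ')') printf("\t<5,1>\n");
            if (c == '{') printf("\t<6,0>\n");
            if (c == '}') printf("\t<6,1>\n");
            break;
        case 3:                            /* inside an identifier */
            if (!isalnum((unsigned char)c))
            {
                token[tlen] = '\0';
                printf("\b\t<2,%p>\n", (void *)getAddress(token));
                state = 2;
                pos--;                     /* rescan the lookahead character */
            }
            else if (tlen < (int)sizeof token - 1)
                token[tlen++] = c;
            break;
        case 4:                            /* inside a numeric constant */
            if (!isdigit((unsigned char)c))
            {
                printf("\b\t<3,%p>\n", (void *)&input[pos]);
                state = 2;
                pos--;
            }
            break;
        case 5:                            /* one operator character already read */
            id = input[pos - 1];
            if (c == '=')
                printf("\t<%d,%d>\n", id * 10, id * 10);
            else
            {
                printf("\b\t<%d,%d>\n", id, id);
                pos--;
            }
            state = 2;
            break;
        }
        pos++;
    }
    while (c != 0);
    putchar('\n');
    return 0;
}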
Sample Input & Output:
if <1,1>
( <5,0>
a <2,0960>
>= <620,620>
b <2,09c4>
) <5,1>
max <2,0A28>
= <61,61>
a <2,0A8c>
; <4,4>
Result:
The above C program was successfully executed and verified.
Ex. No: 2
Date:
Aim:
To write a C program to implement a lexical analyzer for arithmetic expressions.
Algorithm:
Lexeme              Token
*******             ******
Variable name       <1,address>
Numeric constant    <2,address>
;                   <3,3>
=                   <4,4>
+                   <43,43>
+=                  <430,430>
-                   <45,45>
-=                  <450,450>
*                   <42,42>
*=                  <420,420>
/                   <47,47>
/=                  <470,470>
%                   <37,37>
%=                  <370,370>
^                   <94,94>
^=                  <940,940>
Program:
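#include <stdio.h>
#include <ctype.h>
#include <string.h>

char vars[100][100];    /* symbol table: one row per distinct identifier */
int vcnt;               /* number of identifiers stored in vars */
char input[1000], c;
char token[50];
int tlen;
int state = 0, pos = 0, i = 0, id;

/* Return the symbol-table entry for str, adding it on first use. */
char *getAddress(char str[])
{
    for (i = 0; i < vcnt; i++)
        if (strcmp(str, vars[i]) == 0)
            return vars[i];
    strcpy(vars[vcnt], str);
    return vars[vcnt++];
}

/* Return 1 if c is an arithmetic operator (the name is kept from Ex. No: 1). */
int isrelop(char c)
{
    if (c == '+' || c == '-' || c == '*' || c == '/' || c == '%' || c == '^')
        return 1;
    else
        return 0;
}

int main(void)
{
    printf("Enter the Input String:");
    if (fgets(input, sizeof input, stdin) == NULL)
        return 1;
    input[strcspn(input, "\n")] = '\0';    /* drop the trailing newline */
    do
    {
        c = input[pos];
        putchar(c);                        /* echo each character as it is scanned */
        switch (state)
        {
        case 0:                            /* start of the next token */
            if (isspace((unsigned char)c))
                printf("\b");
            if (isalpha((unsigned char)c))
            {
                token[0] = c;
                tlen = 1;
                state = 1;
            }
            if (isdigit((unsigned char)c))
                state = 2;
            if (isrelop(c))
                state = 3;
            if (c == ';')
                printf("\t<3,3>\n");
            if (c == '=')
                printf("\t<4,4>\n");
            break;
        case 1:                            /* inside an identifier */
            if (!isalnum((unsigned char)c))
            {
                token[tlen] = '\0';
                printf("\b\t<1,%p>\n", (void *)getAddress(token));
                state = 0;
                pos--;                     /* rescan the lookahead character */
            }
            else if (tlen < (int)sizeof token - 1)
                token[tlen++] = c;
            break;
        case 2:                            /* inside a numeric constant */
            if (!isdigit((unsigned char)c))
            {
                printf("\b\t<2,%p>\n", (void *)&input[pos]);
                state = 0;
                pos--;
            }
            break;
        case 3:                            /* one operator character already read */
            id = input[pos - 1];
            if (c == '=')
                printf("\t<%d,%d>\n", id * 10, id * 10);
            else
            {
                printf("\b\t<%d,%d>\n", id, id);
                pos--;
            }
            state = 0;
            break;
        }
        pos++;
    }
    while (c != 0);
    putchar('\n');
    return 0;
}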
Sample Input & Output:
Enter the Input String: a=a*2+b/c;
a
= <4,4>
a
* <42,42>
2
+ <43,43>
b
/ <47,47>
c
; <3,3>
Result:
The above C program was successfully executed and verified.
PRACTICAL NO. 3
In the following image, from state q0 on input a there are two possible next states, q1 and
q2; similarly, from q0 on input b the next states are q0 and q1. The next state for a given
input is therefore not uniquely determined. Hence this FA is called a non-deterministic
finite automaton (NFA). Its transition function is
δ: Q × Σ → 2^Q
where Q is the finite set of states, Σ is the input alphabet, and 2^Q is the power set of Q
(the set of all subsets of Q).
Regular Expression
o The language accepted by a finite automaton can be described by simple
expressions called regular expressions. They are a compact and effective way to
represent such languages.
o The languages described by regular expressions are referred to as regular
languages.
o A regular expression can also be described as a sequence of patterns that defines
a set of strings.
o Regular expressions are used to match character combinations in strings.
String-searching algorithms use these patterns for find and find-and-replace operations.
For instance:
In a regular expression, x* means zero or more occurrences of x. It can generate {ε, x, xx,
xxx, xxxx, .....}
In a regular expression, x+ means one or more occurrences of x. It can generate {x, xx, xxx,
xxxx, .....}
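The difference between the two operators shows up only on the empty string. A minimal sketch in C, assuming we test a whole string against x* or x+ for a single character x; the function names run_length, matches_star and matches_plus are hypothetical.

```c
#include <stddef.h>

/* Count how many leading characters of s equal x (x is assumed not to be '\0'). */
static size_t run_length(const char *s, char x)
{
    size_t n = 0;
    while (s[n] == x)
        n++;
    return n;
}

/* Return 1 if the whole string s matches x* (zero or more x), else 0. */
int matches_star(const char *s, char x)
{
    return s[run_length(s, x)] == '\0';
}

/* Return 1 if the whole string s matches x+ (one or more x), else 0. */
int matches_plus(const char *s, char x)
{
    size_t n = run_length(s, x);
    return n >= 1 && s[n] == '\0';
}
```

For the empty string, matches_star returns 1 and matches_plus returns 0; for "xxx" both return 1.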
Method
Example WRITE ON LEFT SIDE
An ε-NFA is similar to an NFA, with one difference: the ε-move. Its transition
function also allows the empty string ε as a possible input. Transitions taken
without consuming an input symbol are called ε-transitions. In state diagrams,
they are usually labeled with the Greek letter ε. ε-transitions provide a
convenient way of modeling systems whose current state is not precisely known:
if we are modeling a system and it is not clear whether the current state (after
processing some input string) should be q or q', then we can add an ε-transition
between these two states, thus putting the automaton in both states simultaneously.
One way to implement regular expressions is to convert them into a finite
automaton, known as an ε-NFA (epsilon-NFA). Because ε-transitions do not consume
any input, the automaton can move from one state to another
without reading any characters from the input string.
The process of converting a regular expression into an ε-NFA is as
follows:
1. Create a single start state for the automaton, and mark it as the initial state.
2. For each character in the regular expression, create a new state and add an
edge between the previous state and the new state, with the character as
the label.
3. For each operator in the regular expression (such as "*" for zero or more, "+"
for one or more, and "?" for zero or one), create new states and add the
appropriate ε-edges to represent the operator.
4. Mark the final state as the accepting state, which is the state that is reached
when the regular expression is fully matched.
Common regular-expression building blocks used to construct an ε-NFA:
Example: Create an ε-NFA for the regular expression (a|b)*a
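To check a hand-drawn answer for this example, the epsilon-NFA can be simulated directly. The sketch below is one possible automaton for "any string of a's and b's that ends in a", not necessarily the one drawn in the journal; the five-state layout, the state numbers and the function names are assumptions made for illustration. A set of states is stored as a bit mask, and the epsilon closure is recomputed after every input symbol.

```c
#define NSTATES 5
#define ACCEPT  4   /* state 4 is the only accepting state */

/* eps[s]: bit mask of states reachable from s by one epsilon move. */
static const unsigned eps[NSTATES] = {
    (1u << 1) | (1u << 3),  /* 0: enter the loop body, or leave the loop */
    0,                      /* 1: reads 'a' or 'b' (see step) */
    (1u << 0),              /* 2: back to state 0 to repeat the loop */
    0,                      /* 3: reads the final 'a' (see step) */
    0                       /* 4: accepting */
};

/* States reached from s by consuming the symbol ch. */
static unsigned step(int s, char ch)
{
    if (s == 1 && (ch == 'a' || ch == 'b'))
        return 1u << 2;
    if (s == 3 && ch == 'a')
        return 1u << 4;
    return 0;
}

/* Epsilon closure: keep adding epsilon successors until nothing changes. */
static unsigned closure(unsigned set)
{
    unsigned prev;
    do {
        prev = set;
        for (int s = 0; s < NSTATES; s++)
            if (set & (1u << s))
                set |= eps[s];
    } while (set != prev);
    return set;
}

/* Return 1 if the automaton accepts input, else 0. */
int enfa_accepts(const char *input)
{
    unsigned current = closure(1u << 0);   /* start in state 0 */
    for (; *input != '\0'; input++) {
        unsigned next = 0;
        for (int s = 0; s < NSTATES; s++)
            if (current & (1u << s))
                next |= step(s, *input);
        current = closure(next);
    }
    return (current & (1u << ACCEPT)) != 0;
}
```

With this automaton, enfa_accepts("abba") returns 1, while enfa_accepts("ab") and enfa_accepts("") return 0.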
Practical No. 4
Program :
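//Including Libraries
#include<stdio.h>
#include<stdlib.h>
#include<string.h>

//Global Variables
int z = 0, i = 0, j = 0, c = 0;

// a: input string, ac: reduce message, stk: parse stack, act: shift message
char a[16], ac[20], stk[15], act[10];

// Checks whether the stack contains the right-hand side of a production
// which is to be reduced. Rules are E->2E2, E->3E3, E->4.
void check()
{
    strcpy(ac, "REDUCE TO E -> ");

    // checking for the production E->4
    for(z = 0; z < c; z++)
    {
        if(stk[z] == '4')
        {
            printf("%s4", ac);
            stk[z] = 'E';
            stk[z + 1] = '\0';
            //printing action
            printf("\n$%s\t%s$\t", stk, a);
        }
    }
    // checking for the production E->2E2
    for(z = 0; z < c - 2; z++)
    {
        if(stk[z] == '2' && stk[z + 1] == 'E' &&
                            stk[z + 2] == '2')
        {
            printf("%s2E2", ac);
            stk[z] = 'E';
            stk[z + 1] = '\0';
            stk[z + 2] = '\0';
            printf("\n$%s\t%s$\t", stk, a);
            i = i - 2;
        }
    }
    // checking for the production E->3E3
    for(z = 0; z < c - 2; z++)
    {
        if(stk[z] == '3' && stk[z + 1] == 'E' &&
                            stk[z + 2] == '3')
        {
            printf("%s3E3", ac);
            stk[z] = 'E';
            stk[z + 1] = '\0';
            stk[z + 2] = '\0';
            printf("\n$%s\t%s$\t", stk, a);
            i = i - 2;
        }
    }
}

//Driver Function
int main()
{
    printf("GRAMMAR is -\nE->2E2 \nE->3E3 \nE->4\n");
    // a is input string
    strcpy(a, "32423");
    c = strlen(a);
    strcpy(act, "SHIFT");
    // column labels, then the initial stack and input
    printf("\nstack \t input \t action");
    printf("\n$\t%s$\t", a);
    // one shift per input symbol
    for(i = 0; j < c; i++, j++)
    {
        // Printing action
        printf("%s", act);
        // pushing the next input symbol onto the stack
        stk[i] = a[j];
        stk[i + 1] = '\0';
        a[j] = ' ';
        // Printing action
        printf("\n$%s\t%s$\t", stk, a);
        // reduce if the stack now ends in a right-hand side
        check();
    }
    // Rechecking last time if the stack contains a production
    check();
    // accept only if the stack holds just the start symbol E
    if(stk[0] == 'E' && stk[1] == '\0')
        printf("Accept\n");
    else
        printf("Reject\n");
    return 0;
}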
OUTPUT:
GRAMMAR is -
E->2E2
E->3E3
E->4
#include <stdio.h>

struct op {
    char l;       /* left-hand side: a single-character variable */
    char r[20];   /* right-hand side expression */
} op[10], pr[10];

int main() {
    int i, n, z = 0;

    printf("Enter the number of values: ");
    if (scanf("%d", &n) != 1 || n < 1 || n > 10)
        return 1;
    printf("Enter values:\n");
    for (i = 0; i < n; i++) {
        printf("left: ");
        scanf(" %c", &op[i].l); // space before %c skips pending whitespace
        printf("\tright: ");
        scanf("%19s", op[i].r);
    }
    printf("Intermediate Code\n");
    for (i = 0; i < n; i++) {
        printf("%c=%s\n", op[i].l, op[i].r);
    }
    /* pr[] and z are filled by the optimization passes, which this listing omits. */
    printf("\nOptimized Code\n");
    for (i = 0; i < z; i++) {
        if (pr[i].l != '\0') {
            printf("%c=%s\n", pr[i].l, pr[i].r);
        }
    }
    return 0;
}
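One optimization that fits the left/right statement layout used above is dead-code elimination. The sketch below is an assumption about what such a pass could look like, not the pass belonging to this practical: it keeps a statement only if its left-hand variable appears in a later right-hand side, and always keeps the last statement as the result. The names stmt and eliminate_dead_code are hypothetical, and a single pass does not remove statements that feed only other dead statements.

```c
#include <string.h>

struct stmt {
    char l;       /* left-hand side variable */
    char r[20];   /* right-hand side expression */
};

/* Copy into out[] every statement whose result is used by a later statement,
 * plus the last statement (assumed to hold the final result).
 * Returns the number of statements kept. */
int eliminate_dead_code(const struct stmt *in, int n, struct stmt *out)
{
    int kept = 0;
    for (int i = 0; i < n; i++) {
        int used = (i == n - 1);
        for (int j = i + 1; j < n && !used; j++)
            if (strchr(in[j].r, in[i].l) != NULL)
                used = 1;
        if (used)
            out[kept++] = in[i];
    }
    return kept;
}
```

For the statements a=9, b=c+d, e=c+d, f=b+e, r=f the pass drops a=9, because a is never read afterwards, and keeps the other four.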
Program
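#include <stdio.h>
#include <string.h>
#include <ctype.h>

#define MAXREG 10

typedef struct {
    char var[10];   /* variable currently held in the register */
    int alive;      /* 1 while the register is in use */
} regist;
regist preg[MAXREG];

/* The three helpers below are not shown in the original listing; their
   behavior is assumed from the way main() calls them. */

/* Keep only the characters exp[st] .. exp[end-1] in exp. */
void substring(char exp[], int st, int end) {
    int i, j = 0;
    char dup[10] = "";
    for (i = st; i < end; i++)
        dup[j++] = exp[i];
    dup[j] = '\0';
    strcpy(exp, dup);
}

/* Record var in the first free register and return its index (MAXREG if none). */
int getregister(char var[]) {
    int i;
    for (i = 0; i < MAXREG; i++)
        if (preg[i].alive == 0) {
            strcpy(preg[i].var, var);
            break;
        }
    return i;
}

/* Copy the leading alphabetic run of exp into v. */
void getvar(char exp[], char v[]) {
    int i, j = 0;
    for (i = 0; isalpha((unsigned char)exp[i]); i++)
        v[j++] = exp[i];
    v[j] = '\0';
}

int main() {
    char basic[10][10], var[10][10], fstr[10], op;
    int i, j, k, reg, vc = 0, flag = 0;
    printf("\nEnter the Three Address Code:\n");
    for (i = 0; i < 10; i++) {              /* read statements until "exit" */
        if (fgets(basic[i], sizeof basic[i], stdin) == NULL)
            break;
        basic[i][strcspn(basic[i], "\n")] = '\0';
        if (strcmp(basic[i], "exit") == 0)
            break;
    }
    printf("\nThe Equivalent Assembly Code is:\n");
    for (j = 0; j < i; j++) {
        vc = 0;
        getvar(basic[j], var[vc++]);        /* result variable */
        strcpy(fstr, var[vc - 1]);
        substring(basic[j], strlen(var[vc - 1]) + 1, strlen(basic[j]));
        getvar(basic[j], var[vc++]);        /* first operand */
        reg = getregister(var[vc - 1]);
        if (reg == MAXREG) {
            printf("\nNo free register");
            return 1;
        }
        if (preg[reg].alive == 0) {
            printf("\nMov R%d, %s", reg, var[vc - 1]);
            preg[reg].alive = 1;
        }
        op = basic[j][strlen(var[vc - 1])];
        substring(basic[j], strlen(var[vc - 1]) + 1, strlen(basic[j]));
        getvar(basic[j], var[vc++]);        /* second operand */
        switch (op) {
        case '+':
            printf("\nAdd");
            break;
        case '-':
            printf("\nSub");
            break;
        case '*':
            printf("\nMul");
            break;
        case '/':
            printf("\nDiv");
            break;
        }
        flag = 1;
        for (k = 0; k <= reg; k++) {        /* is the second operand already in a register? */
            if (strcmp(preg[k].var, var[vc - 1]) == 0) {
                printf(" R%d, R%d", k, reg);
                preg[k].alive = 0;
                flag = 0;
                break;
            }
        }
        if (flag) {
            printf(" %s, R%d", var[vc - 1], reg);
            printf("\nMov %s, R%d", fstr, reg);
        }
        strcpy(preg[reg].var, var[vc - 3]); /* the register now holds the result */
    }
    printf("\n");
    return 0;
}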
Study of LEX and YACC Tool
Aim: Study the LEX and YACC tools and evaluate an arithmetic expression with parentheses,
unary and binary operators using Flex and Yacc. [Need to write the yylex() function, to be
used with Lex and Yacc.]
Description:
LEX - A Lexical analyzer generator:
Lex is a computer program that generates lexical analyzers ("scanners" or "lexers"). Lex is
commonly used with the yacc parser generator.
Lex reads an input stream specifying the lexical analyzer and outputs source code implementing
the lexer in the C programming language.
1. A lexer or scanner is used to perform lexical analysis, or the breaking up of an input stream
into meaningful units, or tokens.
2. For example, consider breaking a text file up into individual words.
3. Lex: a tool for automatically generating a lexer or scanner given a lex specification (.l file).
Structure of a Lex file
Definition section:
%%
Rules section:
%%
C code section:
<statements>
➢ The C code section contains C statements and functions that are copied verbatim to the
generated source file. These statements presumably contain code called by the
rules in the rules section. In large programs it is more convenient to place this
code in a separate file and link it in at compile time.
Description:-
The lex command reads File or standard input, generates a C language program, and writes it
to a file named lex.yy.c. This file, lex.yy.c, is a compilable C language program. A C++ compiler
can also compile the output of the lex command. The -C flag renames the output file to lex.yy.C
for the C++ compiler. The C++ program generated by the lex command can use either STDIO or
IOSTREAMS. If the cpp define _CPP_IOSTREAMS is true during a C++ compilation, the program uses
IOSTREAMS for all I/O. Otherwise, STDIO is used.
The lex command uses rules and actions contained in File to generate a program, lex.yy.c, which
can be compiled with the cc command. The compiled lex.yy.c can then receive input, break the input
into the logical pieces defined by the rules in File, and run program fragments contained in the
actions in File.
The generated program is a C language function called yylex. The lex command stores the yylex
function in a file named lex.yy.c. You can use the yylex function alone to recognize simple
one-word input, or you can use it with other C language programs to perform more difficult input
analysis functions. For example, you can use the lex command to generate a program that simplifies
an input stream before sending it to a parser program generated by the yacc command.
The yylex function analyzes the input stream using a program structure called a finite state
machine. This structure allows the program to exist in only one state (or condition) at a time.
There is a finite number of states allowed. The rules in File determine how the program moves
from one state to another. If you do not specify a File, the lex command reads standard input.
It treats multiple files as a single file.
Note: Since the lex command uses fixed names for intermediate and output files, you can have only
one program generated by lex in a given directory.
Regular Expression Basics
.     : matches any single character except \n
*     : matches 0 or more instances of the preceding regular expression
+     : matches 1 or more instances of the preceding regular expression
?     : matches 0 or 1 of the preceding regular expression
|     : matches the preceding or following regular expression
[ ]   : defines a character class
( )   : groups the enclosed regular expression into a new regular expression
"…"   : matches everything within the " " literally
Special Functions
• yytext
– where the text matched most recently is stored
• yyleng
– number of characters in the text most recently matched
• yylval
– associated value of the current token
• yymore()
– append the next string matched to the current contents of yytext
• yyless(n)
– remove from yytext all but the first n characters
• unput(c)
– return character c to the input stream
• yywrap()
– may be replaced by the user
– The yywrap function is called by the lexical analyser whenever it reads an EOF as the first
character when trying to match a regular expression
Files
y.output -- Contains a readable description of the parsing tables and a report on conflicts
generated by grammar ambiguities.
y.tab.c -- Contains an output file.
y.tab.h -- Contains definitions for token names.
yacc.tmp -- Temporary file.
yacc.debug -- Temporary file.
yacc.acts -- Temporary file.
/usr/ccs/lib/yaccpar -- Contains parser prototype for C programs.
/usr/ccs/lib/liby.a -- Contains a run-time library.
YACC: Yet Another Compiler-Compiler
Yacc is written in portable C. The class of specifications accepted is a very general one:
LALR(1) grammars with disambiguating rules.
Basic specification
Names refer to either tokens or non-terminal symbols. Yacc requires token names to be
declared as such. In addition, it is often desirable to include the lexical analyzer as
part of the specification file; it may be useful to include other programs as well. Thus,
every specification file consists of three sections: the declarations, the (grammar) rules,
and the programs. The sections are separated by double percent "%%" marks. (The percent '%'
is generally used in yacc specifications as an escape character.) In other words, a full
specification file looks like
Declarations
%%
Rules
%%
Programs
The declaration section may be empty. Moreover, if the programs section is omitted, the second %%
mark may be omitted also; thus the smallest legal yacc specification is
%%
Rules
Blanks, tabs and newlines are ignored except that they may not appear in
names or multi-character reserved symbols. Comments may appear wherever a name is legal;
they are enclosed in /* ... */ as in C and PL/I.
The rules section is made up of one or more grammar rules. A grammar rule has the form: A : BODY ;
USING THE LEX PROGRAM WITH THE YACC PROGRAM
The Lex program recognizes only extended regular expressions and formats them into
character packages called tokens, as specified by the input file. When using the Lex program
to make a lexical analyzer for a parser, the lexical analyzer (created from the Lex command)
partitions the input stream. The parser (from the yacc command) assigns structure
to the resulting pieces. You can also use other programs along with programs generated by
Lex or yacc commands.
A token is the smallest independent unit of meaning as defined by either the parser or the
lexical analyzer. A token can contain data, a language keyword, an identifier or other parts
of a language syntax.
The yacc program looks for a lexical analyzer subroutine named yylex, which is generated by
the lex command. Normally the default main program in the Lex library calls the yylex
subroutine. However, if the yacc command is loaded and its main program is used, yacc calls
the yylex subroutine. In this case each Lex rule should end with:
return(token);
where the appropriate token value is returned.
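The aim asks for a hand-written yylex() function. The sketch below shows the calling convention in plain C: yylex() returns a token code on each call (0 at end of input) and leaves the value of a number in yylval. The token code NUMBER, the input pointer src, and the small recursive-descent functions factor, term, expr and evaluate are assumptions for illustration; in a real build the token macros come from y.tab.h and the expression grammar is handled by the yacc-generated parser, which this sketch only stands in for.

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical token code; yacc would normally define it in y.tab.h. */
#define NUMBER 258

static const char *src;   /* remaining input text, set by evaluate() */
int yylval;               /* value of the most recent NUMBER token */

/* Hand-written lexer: returns NUMBER, a single operator character, or 0 at end. */
int yylex(void)
{
    while (*src == ' ' || *src == '\t')
        src++;
    if (*src == '\0')
        return 0;
    if (isdigit((unsigned char)*src)) {
        char *end;
        yylval = (int)strtol(src, &end, 10);
        src = end;
        return NUMBER;
    }
    return (unsigned char)*src++;
}

static int lookahead;     /* current token, as the parser sees it */
static int expr(void);

/* factor: NUMBER | '-' factor | '(' expr ')' */
static int factor(void)
{
    int v = 0;
    if (lookahead == NUMBER) {
        v = yylval;
        lookahead = yylex();
    } else if (lookahead == '-') {          /* unary minus */
        lookahead = yylex();
        v = -factor();
    } else if (lookahead == '(') {
        lookahead = yylex();
        v = expr();
        if (lookahead == ')')
            lookahead = yylex();
    }
    return v;
}

/* term: factor (('*' | '/') factor)* */
static int term(void)
{
    int v = factor();
    while (lookahead == '*' || lookahead == '/') {
        int op = lookahead;
        lookahead = yylex();
        int r = factor();
        if (op == '*')
            v *= r;
        else if (r != 0)                    /* this sketch skips division by zero */
            v /= r;
    }
    return v;
}

/* expr: term (('+' | '-') term)* */
static int expr(void)
{
    int v = term();
    while (lookahead == '+' || lookahead == '-') {
        int op = lookahead;
        lookahead = yylex();
        int r = term();
        v = (op == '+') ? v + r : v - r;
    }
    return v;
}

/* Evaluate one expression string; stands in for the yacc-generated parser. */
int evaluate(const char *text)
{
    src = text;
    lookahead = yylex();
    return expr();
}
```

For example, evaluate("-(2+3)*4") returns -20: the lexer hands the parser the tokens '-', '(', NUMBER, '+', NUMBER, ')', '*', NUMBER one at a time, in the same way yylex() feeds a yacc parser.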
The yacc command assigns an integer value to each token defined in the yacc grammar file through
a #define preprocessor statement.
The lexical analyzer must have access to these macros to return the tokens to the parser. Use the
yacc -d option to create a y.tab.h file and include the y.tab.h file in the Lex specification file
by adding the following lines to the definition section of the Lex specification file:
%{
#include "y.tab.h"
%}
Alternatively you can include the lex.yy.c file in the yacc output file by adding the following
line after the second %% (percent sign, percent sign) delimiter in the yacc grammar file:
#include "lex.yy.c"
The yacc library should be loaded before the Lex library to get a main program that invokes the
yacc parser. You can generate Lex and yacc programs in either order.