Screenshot 2024-02-07 104122-Compressed

Uploaded by

vinaykumarms343

1.4 Lexical analysis

Now, let us see "What is lexical analysis?"

Definition: The process of reading the source program and converting it into tokens is called lexical analysis.

1.4.1 The role of lexical analyzer

Now, let us see "What is the role of lexical analyzer?" The lexical analyzer is the first phase of the compiler. The various tasks performed by the lexical analyzer are:

* Read a sequence of characters from the source program and produce the tokens.
* Send the tokens thus generated to the parser for syntax analysis. The parser is also called the syntax analyzer.

During this process, the lexical analyzer interacts with the symbol table to insert identifiers and constants. Sometimes, information about identifiers is read from the symbol table to assist in determining the proper token to send to the parser. The interaction between the lexical analyzer and the parser is represented pictorially below:

[Figure: Source program → Scanner (lexical analyzer) → token → Parser → Semantic analysis. The parser requests each token by calling getNextToken().]

* The parser program calls the function getNextToken(), which is defined in the lexical analyzer.

[Figure: calling sequence. The parser program calls getNextToken() in the lexer program, which ends with "return token;".]

* The function getNextToken() of the lexical analyzer returns the token to the parser for parsing.
* If the token obtained is an identifier, it is entered into the symbol table along with its various attribute values, and the lexical analyzer returns a token as a pair consisting of an integer code denoted by ID and a pointer to the symbol-table entry for that identifier.
* The other actions performed by the lexical analyzer are:
    * Remove comments from the program.
    * Remove white space such as blanks, tabs and newline characters from the source program before the tokens are obtained.
    * Keep track of line numbers so as to associate line numbers with error messages.
    * If any errors are encountered, display appropriate error messages along with line numbers.
    * Preprocessing may be done during the lexical analysis phase.

Example 26: Obtain a grammar to generate the following language:

$L = \{ww^R \mid w \in \{a, b\}^*\}$

Solution: The language can be written as:

$L = \{\varepsilon, aa, bb, abba, baab, aaaa, bbbb, \ldots\}$

Observe that every string in the language is a palindrome of even length. This is achieved by deleting the productions $S \to a \mid b$ from the grammar that generates all palindromes. So, the final grammar is given by:

$S \to \varepsilon$
$S \to aSa \mid bSb$

Note: In the above grammar, if the production $S \to \varepsilon$ is replaced by $S \to c$, the resulting grammar generates the language $L = \{wcw^R \mid w \in \{a, b\}^*\}$.

1.5 Input buffering

Now, let us see "Why input buffering is required?" Input buffering is essential for the following reasons:

* Since the lexical analyzer is the first phase of the compiler, it is the only phase that reads the source program character by character, and it spends considerable time reading the source program. Thus, the speed of lexical analysis is a concern.
* Lexical analyzers may have to look one or more characters beyond the next lexeme before the right lexeme can be determined.

For this reason, we use the concept of input buffering, where a block of 1024 or 4096 or more characters is read in one read operation and stored in an array to speed up the process.
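The block-read idea described above can be sketched in C as follows. This is only an illustration under stated assumptions: `BLOCK_SIZE` and the function name `count_lines_buffered` are hypothetical names introduced here, and counting newlines stands in for the real work of a lexical analyzer. Note also that the C standard I/O library already buffers internally, so the sketch shows the principle rather than a measured speed-up.

```c
#include <assert.h>
#include <stdio.h>

#define BLOCK_SIZE 4096 /* assumed block size; the text suggests 1024 or 4096 */

/* Counts newline characters in `in`, fetching up to BLOCK_SIZE characters
 * per fread() call instead of asking for one character at a time.
 * The number of non-empty blocks fetched is stored in *read_calls. */
long count_lines_buffered(FILE *in, long *read_calls)
{
    char buffer[BLOCK_SIZE]; /* the "buffer" of the definition above */
    size_t n;
    long lines = 0;

    assert(in != NULL && read_calls != NULL);
    *read_calls = 0;

    while ((n = fread(buffer, 1, sizeof buffer, in)) > 0) {
        ++*read_calls;
        /* All further processing touches memory only, not the file. */
        for (size_t i = 0; i < n; ++i) {
            if (buffer[i] == '\n') {
                ++lines;
            }
        }
    }
    return lines;
}
```

For a 10000-character input, the loop body runs three times (4096 + 4096 + 1808 characters), whereas a character-at-a-time reader would issue 10000 requests.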
Now, let us see "What is input buffering?"

Definition: The method of reading a block of characters (1K or 4K or more bytes) from the disk in one read operation and storing it in memory (normally in the form of an array) for further processing and faster access is called input buffering. The memory (an array) where a block of characters read from the disk is stored is called a buffer.

[Figure: buffer layout, with Buffer2 labelled.]

* The size of each buffer is N, where N is usually the size of a disk block. If the size of the disk block is 4K, 4096 characters can be read into the buffer with one system call, rather than one system call per character, which consumes a lot of time.
* Irrespective of the number of characters stored in the buffer, the last character of each buffer is eof.
* Note that eof retains its use as a marker for the end of the entire input. Any eof that appears other than at the end of a buffer means that the input is at an end.
* The use of the two pointers lexemeBeginning and inputPointer, and the method of accessing a lexeme, remain the same as in buffer pairs.
* The algorithm consisting of lookahead code with sentinels is shown below:

    switch (*inputPointer++) {
    case eof:
        if (inputPointer is at end of first buffer) {
            reload the second buffer;
            inputPointer = beginning of second buffer;
            break;
        }
        if (inputPointer is at end of second buffer) {
            reload the first buffer;
            inputPointer = beginning of first buffer;
            break;
        }
        /* eof within a buffer indicates the end of the input */
        /* So, terminate lexical analysis */
        break;
    /* Cases for other characters */
    }

* Observe from the above algorithm that instead of two tests per character as in the buffer pair technique (one for the end of the buffer and one for the character itself), only one test is needed in the common case, namely the test for the eof marker.
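The sentinel algorithm above can be made concrete with a small C sketch. Several things here are assumptions made for illustration and are not taken from the text: the input comes from an in-memory string rather than a disk file, the buffer size `N` is only 8 so that reloads happen quickly, the NUL character plays the role of eof (so the input must not contain NUL), and the names `Scanner`, `reload`, `scanner_init` and `next_char` are hypothetical. The lexemeBeginning pointer is omitted to keep the sketch short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define N 8           /* demo size; the text suggests the disk-block size */
#define SENTINEL '\0' /* stands in for eof; assumes no NUL in the input */

typedef struct {
    const char *src;       /* unread input (stands in for the disk file) */
    char buf[2 * (N + 1)]; /* two halves, each N characters plus sentinel */
    char *forward;         /* plays the role of inputPointer */
} Scanner;

/* Load up to N characters into half h (0 or 1) and append the sentinel. */
static void reload(Scanner *s, int h)
{
    char *half = s->buf + h * (N + 1);
    size_t n = strlen(s->src);
    if (n > N) {
        n = N;
    }
    memcpy(half, s->src, n);
    half[n] = SENTINEL; /* lands in the last slot only when the half is full */
    s->src += n;
}

void scanner_init(Scanner *s, const char *text)
{
    assert(s != NULL && text != NULL);
    s->src = text;
    reload(s, 0);
    s->forward = s->buf;
}

/* Returns the next input character, or SENTINEL at the real end of input. */
char next_char(Scanner *s)
{
    char *end1 = s->buf + N;         /* sentinel slot of the first half */
    char *end2 = s->buf + 2 * N + 1; /* sentinel slot of the second half */

    for (;;) {
        char c = *s->forward++;
        if (c != SENTINEL) {
            return c; /* common case: a single test per character */
        }
        if (s->forward - 1 == end1) { /* end of first buffer */
            reload(s, 1);
            s->forward = s->buf + N + 1;
        } else if (s->forward - 1 == end2) { /* end of second buffer */
            reload(s, 0);
            s->forward = s->buf;
        } else {
            s->forward--; /* sentinel inside a buffer: input is exhausted */
            return SENTINEL;
        }
    }
}
```

Calling `scanner_init(&s, "int sum = a + b1;")` and then `next_char(&s)` repeatedly yields the 17 characters of the statement in order, crossing a buffer boundary twice, and then keeps returning `SENTINEL`.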
1.6 Specifications of tokens

Problems

Problem: Prove that $L = \{0^n 1^n \mid n \ge 1\}$ is not regular.

$L = \{01, 0011, 000111, \ldots\}$

* Assume L is regular (proof by contradiction) and let n be the constant of the pumping lemma.
* Select a string $w = xyz$ in L with $|w| \ge n$ such that $|y| \ge 1$, $|xy| \le n$ and $xy^kz \in L$ for all $k \ge 0$.
* For instance, with n = 2 and w = 0011, we get x = 0, y = 0 and z = 11.
* Take, for example, k = 2: $xy^2z = 00011$ has more 0's than 1's, so $xy^2z \notin L$. In general, choosing $w = 0^n1^n$ forces y to consist only of 0's (because $|xy| \le n$), so pumping y always unbalances the 0's and 1's.
* This contradicts the assumption. Hence $L = \{0^n 1^n \mid n \ge 1\}$ is not regular.

Problem: Prove that $L = \{a^p \mid p \text{ is a prime}\}$ is not a regular language.

$L = \{aa, aaa, aaaaa, \ldots\}$

* Assume L is regular and let n be an integer constant.
* Select a string w from L and write $w = xyz$ such that $|y| \ge 1$, $|xy| \le n$ and $xy^kz \in L$ for all $k \ge 0$; a suitable choice of k then yields a contradiction.

[The remaining steps of this handwritten proof are not legible in the source.]

In the above statement, the patterns, lexemes and respective tokens are shown below. The token names are symbolic names defined using the #define directive.

    Lexeme      Pattern                Token
    char        keyword                CHAR
    str         identifier             <ID, 1>
    [           left bracket           LEFT_BRACKET
    ]           right bracket          RIGHT_BRACKET
    =           assignment operator    ASSIGN
    "hel…"      string                 <LITERAL, …>
    ;           symbol                 SEMI_COLON

1.4.3.1 Tokens

Now, let us see "What is a token?"

Definition: A token is a pair consisting of a token name and an optional attribute value. The token names are typically integer codes represented using symbolic names such as INT, FLOAT and SEMI_COLON, defined in the file token.h. The attribute values are optional: they are not present for keywords, operators and symbols, but they are present for all identifiers and constants.

* For every keyword there is a unique token name. For example, INT, FLOAT, CHAR.
* For every symbol there is a unique token name. For example, SEMI_COLON, COLON, COMMA, LEFT_PARANTHESIS, RIGHT_PARANTHESIS, ASSIGN.
* For an identifier sum, the token is <ID, 1>, where ID is the token name and 1 is the position of the identifier in the symbol table.

Whenever there is a request from the parser, the lexical analyzer sends the token. So, tokens are the output of the lexical analyzer and the input to the syntax analyzer. The syntax analyzer uses these tokens to check whether the program is syntactically correct by deriving the tokens from the grammar. All the tokens are represented using symbolic constants defined using the #define directive.

[Listing: "TOKENS with corresponding integer codes for keywords", a series of #define directives whose names and integer codes are not legible in the source.]

1.4.3.2 Lexeme

Now, let us see "What is a lexeme?"

Definition: A sequence of characters in the source program that matches a pattern, such as an identifier, a number, a relational operator, an arithmetic operator, or a symbol such as #, ( and ), is called a lexeme. In other words, a lexeme is a string of characters read from the source file that matches a pattern and corresponds to a token.

1.4.3.3 Patterns

Now, let us see "What is a pattern?"

Definition: The description of a lexeme is called a pattern. More formally, a pattern is a rule describing a set of lexemes. The various patterns are shown below:

* Keyword: The pattern keyword is a sequence of characters that forms a reserved word of the language. For example, int, if, else, while, do and switch are all reserved words. They are also called keywords.
* Identifier: The pattern identifier is described as a letter or an underscore followed by any number of letters, digits or underscores. For example, sum, i, pos, first and rate_of_interest, which represent variables in a program or names of functions, structures and so on, are all treated as identifiers.
* Relational operator: The pattern relational operator is described as the set of symbols that represent the various relational operators of the language. For example, <, <=, >, >=, == and != match this pattern.
* Symbols: The pattern symbols is described as a set of symbols such as #, $, (, ) and so on.

Example 1.2: Identify lexemes and tokens in the following statement:

    printf("Simple Interest = %f\n", si);

Solution: The lexemes, patterns and tokens for the given printf statement are shown below:

* printf is a lexeme matching the pattern identifier and returning the token <ID, 1>, where ID is the token name and 1 is the position of the identifier printf in the symbol table.
* The character ( is a lexeme matching the pattern symbol and returning the token LEFT_PARANTHESIS.
* The sequence of characters "Simple Interest = %f\n" is a lexeme matching the pattern string and returning the token <LITERAL, 2>, where LITERAL is the token name and 2 is the position of the literal in the symbol table.
* The character , is a lexeme matching the pattern symbol and returning the token COMMA.
* si is a lexeme matching the pattern identifier and returning the token <ID, 3>, where ID is the token name and 3 is the position of the identifier si in the symbol table.
* The character ) is a lexeme matching the pattern symbol and returning the token RIGHT_PARANTHESIS.
* The character ; is a lexeme matching the pattern symbol and returning the token SEMI_COLON.

Example: Obtain the grammar to generate the language

$L = \{0^m 1^m 2^n \mid m \ge 1 \text{ and } n \ge 0\}$

Solution: Given the language $L = \{0^m 1^m 2^n \mid m \ge 1 \text{ and } n \ge 0\}$, the productions can be generated as shown below:

$S \to AB$ ..... (1)

* The variable A should produce m number of 0's followed by m number of 1's, with 01 as the shortest string (since $m \ge 1$). This is achieved using the following production:

$A \to 01 \mid 0A1$ [Similar to example 20, page 4.17]

* B should produce any number of 2's. Any number of 2's can be generated using the production:

$B \to \varepsilon \mid 2B$ [Similar to example 1, page 4.7]

So, the final grammar to accept the given language is:

$S \to AB$
$A \to 01 \mid 0A1$
$B \to \varepsilon \mid 2B$

The following grammar also generates the same language. The reader is required to verify the answer.

$S \to A \mid S2$
$A \to 01 \mid 0A1$
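One way to check the first grammar (S → AB, A → 01 | 0A1, B → ε | 2B) is to turn each production into a small C function, as in the sketch below. This is a hand-written recursive-descent recognizer offered as an illustration only; the function names `parse_A`, `parse_B` and `in_language` are hypothetical and do not come from the text.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A -> 01 | 0A1 : consumes 0^m 1^m with m >= 1, advancing *p on success. */
static bool parse_A(const char **p)
{
    if (**p != '0') {
        return false;
    }
    ++*p; /* consume the leading 0 */
    if (**p == '0' && !parse_A(p)) {
        return false; /* the inner A of 0A1 failed */
    }
    if (**p != '1') {
        return false;
    }
    ++*p; /* consume the matching 1 */
    return true;
}

/* B -> epsilon | 2B : consumes any number of 2's (possibly none). */
static void parse_B(const char **p)
{
    while (**p == '2') {
        ++*p;
    }
}

/* S -> AB : true exactly when w has the form 0^m 1^m 2^n, m >= 1, n >= 0. */
bool in_language(const char *w)
{
    const char *p = w;

    assert(w != NULL);
    if (!parse_A(&p)) {
        return false;
    }
    parse_B(&p);
    return *p == '\0'; /* the whole string must be consumed */
}
```

With these definitions, `in_language("01")`, `in_language("012")` and `in_language("0011222")` are true, while `in_language("")`, `in_language("001")`, `in_language("0112")` and `in_language("2")` are false. The same strings can be used to check the alternative grammar S → A | S2 by hand.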
