Compiler
Lexical Analyzer
Kamalika Bhattacharjee
Asst Prof., Dept. of CSE, NIT Trichy
Lexical Analysis
• Recognize tokens and ignore white spaces, comments
• Generate a token stream
• Discard whatever does not contribute to parsing
• white spaces (blanks, tabs, newlines) and comments
• Construct constants:
• convert numbers to token num and pass the number as its attribute
● Ex: integer 31 → <num, 31>
• Recognize keywords and identifiers
● Ex: counter = counter + increment → id = id + id
• Find word boundaries
• Report Errors
• Lexical Errors
○ Ex: fi ( a == f ( x ) ) ... Here fi may be a misspelt if or a valid identifier; since fi matches the pattern for id, the lexer returns id and leaves the decision to a later phase
○ Other cases: Panic mode error recovery
■ Delete one character from the remaining input
■ Insert a missing character into the remaining input
■ Replace a character by another character
■ Transpose two adjacent characters
❖ Model using regular expressions
❖ Recognize using Finite State Automata
❖ Use a symbol table
❖ Keep track of line numbers
Units of Lexical Analyzer
● Token: a syntactic category
○ Sentences consist of strings of tokens
○ Ex: number, identifier, keyword, string etc.
● Lexeme: an actual sequence of characters forming an instance of a token
○ Ex: 100.01, counter, const, "I am happy." etc.
● Pattern: Rule describing the set of strings (lexemes) for a token
○ Ex: letter (letter | digit)*
○ In general, there is a set of strings in the input for which the same token is produced as output.
■ Described by a rule called a pattern associated with the token
■ This pattern is said to match each string in the set.
○ A lexeme is a sequence of characters in the source program that is matched by the pattern for a token.
○ The patterns are specified using regular expressions.
Lexical Analysis
Tricky Problems:
• Fixed format vs. dynamic format
• Unreserved keywords
○ Ex (PL/I): if then then then = else; else else = then
○ Ex (PL/I): if if then then = then + 1
Role of Lexical Analyzer
• Push back is required due to lookahead
• Implemented through a buffer
• Keep input in a buffer
• Move pointers over the input
• Use a pair of input buffers, alternately reloaded
• Two buffers of the same size, usually the size of a disk block
• Two pointers to move: the beginning of the current lexeme and the forward scanning pointer
Each character read needs two tests: is it the end of the buffer, and which character is it
• Use a special character to mark the end of each buffer as well as the end of input → Sentinels
• Only one test per character to determine what was read
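The scheme below is a minimal C sketch of the buffer pair with sentinels; it assumes the byte '\0' as the sentinel (so it cannot handle NUL bytes in the input) and leaves out the guard that keeps the lexeme-begin half from being overwritten. The buffer size and helper names are illustrative.

    #include <stdio.h>

    #define BUF_SIZE 4096                 /* usually one disk block */

    static char buf[2 * BUF_SIZE + 2];    /* two halves, one sentinel slot each */
    static char *forward;                 /* forward scanning pointer */
    static FILE *src;

    /* Fill one half and terminate it with the sentinel; a short read
       leaves the sentinel early, which then marks end of input. */
    static void load(char *half) {
        size_t n = fread(half, 1, BUF_SIZE, src);
        half[n] = '\0';
    }

    /* One test per character: only when the sentinel is seen do we check
       whether it ends a buffer half (reload) or the whole input. */
    static int next_char(void) {
        for (;;) {
            char c = *forward++;
            if (c != '\0')
                return (unsigned char)c;
            if (forward - 1 == buf + BUF_SIZE) {                /* end of half 1 */
                load(buf + BUF_SIZE + 1);
                forward = buf + BUF_SIZE + 1;
            } else if (forward - 1 == buf + 2 * BUF_SIZE + 1) { /* end of half 2 */
                load(buf);
                forward = buf;
            } else {
                return EOF;               /* sentinel inside a half: input done */
            }
        }
    }

    int main(int argc, char **argv) {
        src = argc > 1 ? fopen(argv[1], "r") : stdin;
        if (!src) return 1;
        load(buf);
        forward = buf;
        for (int c; (c = next_char()) != EOF; )
            putchar(c);                   /* echo: stands in for the scanner */
        return 0;
    }

Note how the common case costs exactly one comparison; the boundary checks run only when a sentinel is actually read.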
Implementation Approaches
• Use assembly language: most efficient, but the most difficult to implement
• Use a high-level language like C: efficient, but still difficult to implement
• Use tools like LEX or FLEX: easy to implement, but not as efficient as the first two
Usual Approach
• Start with a tool-based implementation and move towards an implementation in a high-level language
• Then replace the I/O operations by fast and efficient assembly language routines
Construct a Lexical Analyzer
• Allow white spaces, numbers and arithmetic operators in an expression
• Return tokens and attributes to the syntax analyzer
• A global variable tokenval is used to return the attribute value of the token (lexeme)
• A finite set of tokens is defined
• Patterns describe the strings belonging to each token
Problems:
• Scans text character by character
• The lookahead character determines the type of token and the word boundary
• The first character alone cannot determine the type of token
• Large computational overhead per input character
Approach: Systematically construct the lexical analyzer (see the sketch below)
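A sketch of such an analyzer in C, in the style of the classic lexan() routine; the token codes NUM and DONE are assumed names.

    #include <ctype.h>
    #include <stdio.h>

    #define NUM  256     /* token codes beyond the character range: assumed */
    #define DONE 257

    int tokenval = 0;    /* global attribute value, as in the slides */
    int lineno   = 1;

    /* Return the next token; white space is stripped, digits are
       accumulated into tokenval, and operators return themselves. */
    int lexan(void) {
        for (;;) {
            int c = getchar();
            if (c == ' ' || c == '\t')
                ;                              /* discard blanks and tabs */
            else if (c == '\n')
                lineno++;
            else if (isdigit(c)) {
                tokenval = c - '0';
                while (isdigit(c = getchar()))
                    tokenval = tokenval * 10 + (c - '0');
                ungetc(c, stdin);              /* push back the lookahead */
                return NUM;
            }
            else if (c == EOF)
                return DONE;
            else {
                tokenval = c;                  /* operators carry themselves */
                return c;
            }
        }
    }

    int main(void) {
        for (int t; (t = lexan()) != DONE; )
            if (t == NUM) printf("<num, %d>\n", tokenval);
            else          printf("<op, %c>\n", t);
        return 0;
    }

On the input 12 + 3 this emits <num, 12> <op, +> <num, 3>; the single pushed-back character is exactly the lookahead that decides the word boundary.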
Symbol Table
• Stores information for subsequent phases
• Minimum Functionality
• Insert(s,t): save lexeme s and token t and return pointer
• Lookup(s): return index of entry for lexeme s or 0 if s is not found
• Implementation
• Fixed amount of space to store lexemes
• Wastes space: not advisable
• Store lexemes in a separate array
• Each lexeme is terminated by eos
• The symbol table keeps pointers into the lexeme array
• Can save ~70% of the space wasted by the fixed-size scheme
• 'Other Attributes' are filled in by later phases
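A compact C sketch of this layout: lexemes live in one packed array, separated by eos ('\0'), and each table entry points into it. The array sizes and the demo token code are assumptions.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define STRMAX 999    /* size of the packed lexeme array: assumed */
    #define SYMMAX 100    /* size of the symbol table: assumed */

    struct entry { char *lexptr; int token; };  /* other attributes later */

    static char lexemes[STRMAX];
    static struct entry symtable[SYMMAX];
    static int lastchar  = -1;   /* last used position in lexemes */
    static int lastentry = 0;    /* entry 0 stays unused: 0 means not found */

    /* lookup(s): index of the entry for lexeme s, or 0 if absent */
    int lookup(const char *s) {
        for (int p = lastentry; p > 0; p--)
            if (strcmp(symtable[p].lexptr, s) == 0)
                return p;
        return 0;
    }

    /* insert(s, t): copy s into lexemes (eos-terminated), record token t */
    int insert(const char *s, int token) {
        int len = (int)strlen(s);
        if (lastentry + 1 >= SYMMAX || lastchar + len + 2 >= STRMAX) {
            fprintf(stderr, "symbol table full\n");
            exit(1);
        }
        symtable[++lastentry].token = token;
        symtable[lastentry].lexptr  = &lexemes[lastchar + 1];
        lastchar += len + 1;
        strcpy(symtable[lastentry].lexptr, s);
        return lastentry;
    }

    int main(void) {
        insert("count", 258);             /* 258: an assumed id token code */
        printf("%d %d\n", lookup("count"), lookup("counter"));  /* 1 0 */
        return 0;
    }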
Implementation Issues
• Handling keywords as reserved
• Consider the keywords themselves as lexemes
• Store entries for all keywords in the symbol table during initialization (as sketched below)
• Look up every new lexeme
• A nonzero return value means a corresponding entry already exists in the Symbol Table, i.e. the lexeme is a keyword
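A sketch of that initialization step, building on the insert() routine above; the keyword set and token codes are assumptions for illustration.

    /* Seed the symbol table with the reserved keywords at start-up. */
    #define IF   260
    #define THEN 261
    #define ELSE 262

    extern int insert(const char *s, int token);   /* from the sketch above */

    static const struct { const char *lexeme; int token; } keywords[] = {
        { "if", IF }, { "then", THEN }, { "else", ELSE }, { NULL, 0 }
    };

    void init_keywords(void) {
        for (int i = 0; keywords[i].lexeme; i++)
            insert(keywords[i].lexeme, keywords[i].token);
    }

After this, a nonzero lookup() on a freshly scanned word both detects the keyword and yields its token code, so no separate keyword matching is needed.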
• Handling of blanks
• In FORTRAN blanks are insignificant: Counter is the same as Count er
• If keywords are not reserved [PL/I]
if then then then = else; else else = then
if if then then = then + 1
DECLARE(arg1, arg2, arg3, ..., argn)
• DECLARE may be a keyword or an array name; this cannot be decided until the character after the closing parenthesis is seen
• Requires arbitrary lookahead and very large buffers
• How to specify tokens?
• Tokens may have similar prefixes
• Each character should be looked at only once
• How to describe tokens?
• Regular languages
• Finite Automata
Regular Definitions
• Take a fax number: 91-(431)-250-0133
• Take an email id: kamalika@[Link]
• Identifiers
• Floating point numbers
Use of shorthand notations (a possible set of definitions is sketched below):
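One possible set of regular definitions for these examples, written in the deck's pattern notation plus the usual shorthands ([...] for character classes, ? for optional, + for one or more); the definitions are illustrative, not canonical.

    letter → [A-Za-z]
    digit  → [0-9]
    id     → letter ( letter | digit )*
    num    → digit+ ( . digit+ )? ( E ( + | - )? digit+ )?
    fax    → digit digit "-(" digit digit digit ")-" digit digit digit "-" digit digit digit digit

Here num matches 31, 3.14 or 3.1E+2, and fax matches 91-(431)-250-0133.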
Implementation of Specifications
• Regular expressions are only specifications; implementation is still required
• Just yes/no answer on validity of the token is not enough
• Goal: Partition the input into tokens
• If a token belongs to more than one category, priority rules are needed to remove the ambiguity
• Give priority to the tokens listed earlier
• Reserved keyword policy
• Pick the longest possible string in L(R)
• The principle of "maximal munch"
• Regular expressions provide a concise and useful notation for string patterns
• Good algorithms require a single pass over the input
• How to break up text: should elsex=0 be tokenized as else x = 0 or as elsex = 0? Maximal munch chooses the single identifier elsex (see the sketch after this list)
• Regular expressions alone are not enough
• Lexical definitions consist of regular definitions, priority rules and the maximal munch principle
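A small self-contained C illustration of maximal munch on exactly this input; the token names in the output are informal.

    #include <ctype.h>
    #include <stdio.h>
    #include <string.h>

    /* Read the longest run of letters/digits first, then decide keyword
       vs identifier, so "elsex=0" yields the identifier "elsex" rather
       than the keyword "else" followed by "x". */
    int main(void) {
        const char *input = "elsex=0", *p = input;
        while (*p) {
            if (isalpha((unsigned char)*p)) {
                const char *start = p;
                while (isalnum((unsigned char)*p)) p++;   /* longest match */
                int n = (int)(p - start);
                if (n == 4 && strncmp(start, "else", 4) == 0)
                    printf("<keyword, else>\n");
                else
                    printf("<id, %.*s>\n", n, start);
            } else if (isdigit((unsigned char)*p)) {
                const char *start = p;
                while (isdigit((unsigned char)*p)) p++;
                printf("<num, %.*s>\n", (int)(p - start), start);
            } else {
                printf("<'%c'>\n", *p++);
            }
        }
        return 0;
    }

Running it prints <id, elsex>, <'='>, <num, 0>: the longest match wins, and keyword recognition happens only after the whole word is consumed.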
Recognition of Tokens
Construct an analyzer that will return <token, attribute> pairs for:
• Relational operators
• Identifiers
• White spaces
• Unsigned numbers
Implementation of Transition Diagram
• Switch-case based structure (see the sketch below)
• Unsigned numbers: another transition diagram
• As the complexity of the transition diagram increases, implementation becomes more difficult and more error-prone
• Tradeoff: may need to unget() a large number of characters
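A C sketch of the switch-case structure for the relational-operator diagram; the state numbers follow the usual textbook numbering, and the attribute names (LT, LE, ...) are assumed.

    #include <stdio.h>

    enum relop { NONE, LT, LE, EQ, NE, GT, GE };  /* attribute values: assumed */

    /* Each diagram state becomes one case; accepting states reached with
       one character of lookahead retract it with ungetc(). */
    enum relop scan_relop(FILE *in) {
        int state = 0;
        for (;;) {
            int c = fgetc(in);
            switch (state) {
            case 0:
                if (c == '<')      state = 1;   /* may be <, <= or <> */
                else if (c == '=') return EQ;
                else if (c == '>') state = 6;   /* may be > or >= */
                else { ungetc(c, in); return NONE; }
                break;
            case 1:                             /* seen '<' */
                if (c == '=') return LE;
                if (c == '>') return NE;
                ungetc(c, in);                  /* retract the lookahead */
                return LT;
            case 6:                             /* seen '>' */
                if (c == '=') return GE;
                ungetc(c, in);
                return GT;
            }
        }
    }

    int main(void) {
        printf("relop code: %d\n", scan_relop(stdin));
        return 0;
    }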
Lexical Analyzer Generator
• Input to the generator
• List of regular expressions in priority order
• Associated actions for each regular expression (generate the kind of token and other bookkeeping information)
• Output of the generator
• A program that reads the input character stream and breaks it into tokens
• Reports lexical errors, if any
LEX regular expressions
• Implementing lookahead: a pattern is matched only when followed by the given right context
DO 10 I = 1.25 (an assignment to the variable DO10I)
DO 10 I = 1,25 (the header of a DO loop)
• With blanks insignificant, the lexer cannot classify DO until it sees the . or the ,
• Specification for DO as keyword: DO/(letter|digit)*=(letter|digit)*,
Lexical Analyzer Generator
• Structure of a LEX program: declarations %% translation rules %% auxiliary functions
• Translation rule: pattern { action }
• How does LEX work?
• Regular expressions to describe the tokens
• Translate each regular expression into NFA
• Convert the NFA into an equivalent DFA
• Minimize the DFA to reduce number of states
• Generate code driven by the DFA tables
• installID() returns a pointer into the symbol table, which is placed in yylval
• Two other variables are available:
• yytext: pointer to the beginning of the lexeme
• yyleng: length of the lexeme
• yylval is a global variable shared with the parser (see the sketch below)
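A minimal LEX specification exercising this whole pipeline; the token codes and the install helpers are assumed names, and the stubs exist only so the sketch builds (e.g. flex scan.l && cc lex.yy.c).

    %{
    /* Declarations section: C definitions copied into the scanner. */
    #include <stdio.h>
    #define IF  258
    #define ID  259
    #define NUM 260
    int yylval;                    /* attribute value handed to the parser */
    int installID(void);
    int installNum(void);
    %}
    delim   [ \t\n]
    ws      {delim}+
    letter  [A-Za-z]
    digit   [0-9]
    id      {letter}({letter}|{digit})*
    number  {digit}+(\.{digit}+)?
    %%
    {ws}      { /* discard white space: no token is returned */ }
    if        { return IF; }            /* listed before {id}: priority */
    {id}      { yylval = installID();  return ID;  }
    {number}  { yylval = installNum(); return NUM; }
    .         { return yytext[0]; }     /* single-character tokens */
    %%
    /* Auxiliary functions. A real installID would insert yytext (of
       length yyleng) into the symbol table and return its index. */
    int installID(void)  { return 0; }
    int installNum(void) { return 0; }
    int yywrap(void)     { return 1; }

    int main(void) {
        int t;
        while ((t = yylex()) != 0)
            printf("<%d, %s>\n", t, yytext);
        return 0;
    }

Listing the if rule before {id} encodes the priority rule from the earlier slide, and LEX itself applies maximal munch when several patterns match.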