
Role of Lexical Analyzer_Input Buffering

The document provides an overview of the role of the lexical analyzer in a compiler, which includes reading source program characters, grouping them into lexemes, and producing tokens for syntax analysis. It also discusses input buffering techniques to enhance processing efficiency and outlines error recovery methods for lexical errors. Key concepts such as tokens, patterns, and lexemes are defined, along with the use of buffer pairs and sentinel characters in input handling.

UNIT I

INTRODUCTION TO COMPILERS

Lexical Analysis
Role of Lexical Analyzer
Input Buffering
The Role of the Lexical Analyzer
• First phase of a compiler
• Reads the input characters of the source
program, groups them into lexemes, and
produces as output a sequence of tokens, one
for each lexeme in the source program
• The stream of tokens is sent to the parser for
syntax analysis
• When it discovers a lexeme constituting an
identifier, it enters that lexeme into the
symbol table
The Role of the Lexical Analyzer

• getNextToken command
– Causes the lexical analyzer to read characters from its
input until it can identify the next lexeme and produce for
it the next token, which it returns to the parser
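The getNextToken interaction can be sketched in Python as follows. This is a minimal illustration, not a definitive implementation: the token names, the regular expressions in TOKEN_SPEC, and the names Lexer, get_next_token and parse_all are all assumptions made for a toy language, and parse_all merely stands in for the parser that would request tokens one at a time.

```python
import re
from typing import NamedTuple, Optional

class Token(NamedTuple):
    name: str    # token name, e.g. "id" or "number"
    lexeme: str  # the characters that were matched

# Hypothetical patterns for a toy language (not taken from the slides).
TOKEN_SPEC = [
    ("number", r"\d+"),
    ("id",     r"[A-Za-z_]\w*"),
    ("op",     r"[+\-*/=]"),
    ("ws",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

class Lexer:
    def __init__(self, source: str) -> None:
        self.source = source
        self.pos = 0

    def get_next_token(self) -> Optional[Token]:
        """Read characters until the next lexeme is identified, then return its token."""
        while self.pos < len(self.source):
            match = MASTER.match(self.source, self.pos)
            if match is None:
                raise SyntaxError(f"unexpected character {self.source[self.pos]!r}")
            self.pos = match.end()
            if match.lastgroup != "ws":      # whitespace is skipped, not returned
                return Token(match.lastgroup, match.group())
        return None                          # end of input

def parse_all(lexer: Lexer) -> list:
    """Stand-in for the parser: keep asking for tokens until the input ends."""
    tokens = []
    while (token := lexer.get_next_token()) is not None:
        tokens.append(token)
    return tokens
```

The lexer does no work until the caller asks for a token, which mirrors the parser-driven control flow described on the slide.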
The Role of the Lexical Analyzer
• Other tasks
– Stripping out comments and whitespace
– Correlating error messages generated by the
compiler with the source program
• Associate a line number with each error message
– Expansion of macros
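A small Python sketch of two of these tasks, stripping comments and whitespace and attaching a line number to each lexeme so that error messages can refer to the source line. The comment syntax ("//" to end of line), the rule that any other run of non-blank characters is a lexeme, and the names lexemes_with_lines and format_error are assumptions chosen for illustration. Macro expansion is not covered.

```python
import re

# Assumed rule: a lexeme is a run of non-blank, non-"/" characters, or a single "/".
LEXEME = re.compile(r"[^\s/]+|/")

def lexemes_with_lines(source):
    """Return (line_number, lexeme) pairs, dropping whitespace and // comments."""
    line, pos, out = 1, 0, []
    while pos < len(source):
        ch = source[pos]
        if ch == "\n":
            line += 1                      # count lines for later error messages
            pos += 1
        elif ch.isspace():
            pos += 1                       # strip other whitespace
        elif source.startswith("//", pos):
            end = source.find("\n", pos)   # strip the comment up to the newline
            pos = len(source) if end == -1 else end
        else:
            match = LEXEME.match(source, pos)
            out.append((line, match.group()))
            pos = match.end()
    return out

def format_error(line, message):
    """Associate a line number with an error message."""
    return f"line {line}: {message}"
```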
The Role of the Lexical Analyzer
Tokens, Patterns and Lexemes
• Token
– A pair consisting of a token name and an optional attribute value
• The token name is an abstract symbol representing a kind of lexical unit
– E.g.: keyword, identifier.
• The token names are the input symbols that the parser processes
• Pattern
– A description of the form that the lexemes of a token may take
• Keyword - the pattern is just the sequence of characters that forms the keyword
• Identifier - the pattern is a more complex structure that is matched by many strings
• Lexeme
– A sequence of characters in the source program that matches the
pattern for a token and is identified by the lexical analyzer as an
instance of that token
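The three terms can be made concrete with a short Python sketch. The keyword set, the identifier pattern, and the choice to use the symbol-table index as the attribute value are assumptions for illustration only. The function name classify is likewise hypothetical.

```python
import re

KEYWORDS = {"if", "else", "while"}                   # assumed keyword set
ID_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")   # assumed identifier pattern

symbol_table = []   # identifier lexemes; the list index serves as the attribute value

def classify(lexeme):
    """Return the token (name, attribute) for a lexeme, or None if no pattern matches."""
    if lexeme in KEYWORDS:
        # Keyword: the pattern is the keyword itself, and no attribute is needed.
        return (lexeme, None)
    if ID_PATTERN.fullmatch(lexeme):
        # Identifier: many different lexemes match the same pattern, so the
        # attribute records which one was seen (its symbol-table entry).
        if lexeme not in symbol_table:
            symbol_table.append(lexeme)
        return ("id", symbol_table.index(lexeme))
    return None
```

Here "count" and "total" are two different lexemes that match the same pattern and yield the same token name, id, with different attribute values.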
The Role of the Lexical Analyzer
Lexical Errors
• Panic mode recovery
– Delete successive characters from the remaining
input, until the lexical analyzer can find a well-
formed token at the beginning of what input is left
• Other possible error-recovery actions are:
– Delete one character from the remaining input
– Insert a missing character into the remaining input
– Replace a character by another character
– Transpose two adjacent characters
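Panic mode recovery can be sketched in Python as below. The token patterns and the function name tokenize_with_panic_mode are assumptions. The sketch also stops deleting at whitespace, which is a simplification rather than part of the technique as stated on the slide.

```python
import re

# Assumed token patterns: numbers, identifiers, and a few operators.
TOKEN = re.compile(r"\d+|[A-Za-z_]\w*|[+\-*/=()]")

def tokenize_with_panic_mode(source):
    """Return (lexemes, errors); errors are (position, deleted_text) pairs."""
    lexemes, errors = [], []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(source, pos)
        if match:
            lexemes.append(match.group())
            pos = match.end()
            continue
        # Panic mode: delete successive characters until a well-formed
        # token can be found at the beginning of the remaining input.
        start = pos
        while (pos < len(source)
               and not source[pos].isspace()
               and TOKEN.match(source, pos) is None):
            pos += 1
        errors.append((start, source[start:pos]))
    return lexemes, errors
```

For the input "a = 3 $#@ b", the three characters "$#@" are deleted as a single error and scanning resumes at "b".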
Input Buffering
• Speeds up reading of the source program
• Two-buffer scheme to handle large
lookaheads
Input Buffering
Buffer Pairs
• Involves two buffers that are alternately
reloaded
– To reduce the amount of overhead required to
process a single input character
• Each buffer is of the same size N
– N is usually the size of a disk block
• Two pointers to the input are maintained:
– lexemeBegin
• Marks the beginning of the current lexeme
– forward
• Scans ahead until a pattern match is found

Input Buffering
Buffer Pairs

• Once the next lexeme is determined, forward is set
to the character at its right end
• After the lexeme is recorded, lexemeBegin is set to
the character immediately after the lexeme just
found
• If the end of one buffer is reached, the other buffer is
reloaded from the input, and forward is moved to
the beginning of the newly loaded buffer
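The buffer-pair scheme can be sketched in Python as follows. This is an illustration under several assumptions: the class name BufferPair, the helper take, and the representation of each half as a Python string are hypothetical. In this sketch forward sits one position past the last character read rather than on it, and a lexeme is assumed never to be longer than the two halves can hold.

```python
import io

class BufferPair:
    """Two halves of up to n characters each, reloaded alternately from a stream."""

    def __init__(self, stream, n=4096):
        self.stream = stream
        self.n = n
        self.halves = [stream.read(n), ""]   # first half is loaded up front
        self.current = 0                     # half that forward is scanning
        self.forward = 0                     # offset of the next character to read
        self.lexeme_begin = (0, 0)           # (half, offset) where the lexeme starts

    def next_char(self):
        """Advance forward by one character; return None at end of input."""
        if self.forward == len(self.halves[self.current]):
            # End of this half: reload the other half and move forward to its start.
            data = self.stream.read(self.n)
            if not data:
                return None
            self.current = 1 - self.current
            self.halves[self.current] = data
            self.forward = 0
        ch = self.halves[self.current][self.forward]
        self.forward += 1
        return ch

    def lexeme(self):
        """Return the text from lexeme_begin up to forward, then start a new lexeme."""
        half, offset = self.lexeme_begin
        if half == self.current:
            text = self.halves[half][offset:self.forward]
        else:  # the lexeme straddles the boundary between the two halves
            text = self.halves[half][offset:] + self.halves[self.current][:self.forward]
        self.lexeme_begin = (self.current, self.forward)
        return text

def take(pair, k):
    """Read k characters and return them as one lexeme (demonstration helper)."""
    for _ in range(k):
        pair.next_char()
    return pair.lexeme()
```

With n=4 and the input "intmainvoid", the lexemes "main" and "void" each straddle a buffer boundary, yet both are recovered intact because the half holding the start of the lexeme has not yet been overwritten.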
Input Buffering
Sentinels
• To combine the buffer-end test with the test for
the current character, each buffer is extended to
hold a sentinel character at the end
– A special character that cannot be part of the source
program; a natural choice is the character eof
• An eof that appears anywhere other than at the end
of a buffer marks the end of the input
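A Python sketch of the sentinel technique follows. It assumes the NUL character stands in for eof and never occurs in the source text. The names SentinelBuffer and read_all and the single-list layout with 2n + 2 slots are also assumptions made for illustration.

```python
import io

EOF = "\0"   # assumed sentinel character; must never occur in the source text

class SentinelBuffer:
    def __init__(self, stream, n=4096):
        self.stream = stream
        self.n = n
        # Layout: first half in [0, n), sentinel in slot n,
        # second half in [n + 1, 2n + 1), sentinel in slot 2n + 1.
        self.buf = [EOF] * (2 * n + 2)
        self._load(0)
        self.forward = 0

    def _load(self, start):
        data = self.stream.read(self.n)
        self.buf[start:start + len(data)] = data
        if len(data) < self.n:
            # Input exhausted: an eof inside the half marks the end of input.
            self.buf[start + len(data)] = EOF

    def next_char(self):
        ch = self.buf[self.forward]
        if ch != EOF:                          # common case: a single test per character
            self.forward += 1
            return ch
        if self.forward == self.n:             # sentinel at the end of the first half
            self._load(self.n + 1)
            self.forward = self.n + 1
            return self.next_char()
        if self.forward == 2 * self.n + 1:     # sentinel at the end of the second half
            self._load(0)
            self.forward = 0
            return self.next_char()
        return None                            # eof elsewhere: end of input

def read_all(buffer):
    """Collect every character until end of input (demonstration helper)."""
    return "".join(iter(buffer.next_char, None))
```

For ordinary characters only the comparison with the sentinel is performed. The buffer-end checks run only when the sentinel is actually seen, which is the saving the slide describes.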
