Role of Lexical Analyzer_Input Buffering
INTRODUCTION TO COMPILERS
Lexical Analysis
Role of Lexical Analyzer
Input Buffering
The Role of the Lexical Analyzer
• First phase of a compiler
• Reads the input characters of the source
program, groups them into lexemes, and
produces as output a sequence of tokens, one
for each lexeme in the source program
• The stream of tokens is sent to the parser for
syntax analysis
• When it discovers a lexeme constituting an
identifier, it enters that lexeme into the
symbol table
• getNextToken command
– Causes the lexical analyzer to read characters from its
input until it can identify the next lexeme and produce for
it the next token, which it returns to the parser
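The parser-driven interaction above can be sketched as follows. This is a minimal illustration, not a reference implementation: the class name Lexer, the token names "id", "number", "op" and "eof", and the helper parse_all are assumptions made for this example.

```python
class Lexer:
    """Toy lexical analyzer that hands out one token per call."""

    def __init__(self, text):
        self.text = text
        self.pos = 0             # index of the next unread character
        self.symbol_table = {}   # identifier lexeme -> entry index

    def get_next_token(self):
        text = self.text
        # Skip whitespace between lexemes.
        while self.pos < len(text) and text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(text):
            return ("eof", None)
        start = self.pos
        ch = text[self.pos]
        if ch.isalpha() or ch == "_":
            # Read characters until the identifier lexeme ends.
            while self.pos < len(text) and (text[self.pos].isalnum() or text[self.pos] == "_"):
                self.pos += 1
            lexeme = text[start:self.pos]
            # Enter the identifier into the symbol table on first sight.
            index = self.symbol_table.setdefault(lexeme, len(self.symbol_table))
            return ("id", index)
        if ch.isdigit():
            while self.pos < len(text) and text[self.pos].isdigit():
                self.pos += 1
            return ("number", int(text[start:self.pos]))
        # Assumption: any other character is a one-character operator.
        self.pos += 1
        return ("op", ch)


def parse_all(lexer):
    """Stand-in for the parser: keeps asking for the next token until eof."""
    tokens = []
    while True:
        token = lexer.get_next_token()
        tokens.append(token)
        if token[0] == "eof":
            return tokens
```

For the input "x = y + 42", parse_all returns ("id", 0), ("op", "="), ("id", 1), ("op", "+"), ("number", 42), ("eof", None), and the symbol table ends up holding x and y.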
• Other tasks
– Stripping out comments and whitespace
– Correlating error messages generated by the
compiler with the source program
• Associate a line number with each error message
– Expansion of macros
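A small sketch of the first two tasks, under simplifying assumptions: comments start with '#' and run to the end of the line, there are no string literals, and the set of illegal characters is made up for the example. The function names are hypothetical.

```python
def strip_and_number(source):
    """Yield (line_number, text) for every line that still has content
    after its '#' comment and surrounding whitespace are removed."""
    for line_number, line in enumerate(source.splitlines(), start=1):
        code = line.split("#", 1)[0].strip()
        if code:
            yield line_number, code


def find_illegal_characters(source, illegal="@$"):
    """Return error messages tagged with the original source line number."""
    messages = []
    for line_number, code in strip_and_number(source):
        for ch in code:
            if ch in illegal:
                messages.append(f"line {line_number}: unexpected character {ch!r}")
    return messages
```

Because the line number is taken before comments and blank lines are dropped, an error on the fourth source line is still reported as line 4, even though only two lines of code survive the stripping.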
Tokens, Patterns and Lexemes
• Token
– A pair consisting of a token name and an optional attribute value
• The token name is an abstract symbol representing a kind of lexical unit
– E.g.: keyword, identifier.
• The token names are the input symbols that the parser processes
• Pattern
– A description of the form that the lexemes of a token may take
• Keyword - the pattern is just the sequence of characters that forms the keyword
• Identifier - the pattern is a more complex structure that is matched by many strings
• Lexeme
– A sequence of characters in the source program that matches the
pattern for a token and is identified by the lexical analyzer as an
instance of that token
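The three terms can be made concrete with regular expressions standing in for patterns. This is only a sketch: the token names (IF, ELSE, ID, NUMBER, RELOP, WS) and the choice of attribute values are assumptions for illustration.

```python
import re

# (token name, pattern) pairs. Keywords come before ID so that "if"
# is not classified as an identifier.
TOKEN_SPEC = [
    ("IF",     r"if\b"),
    ("ELSE",   r"else\b"),
    ("ID",     r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"[0-9]+"),
    ("RELOP",  r"<=|>=|==|!=|<|>"),
    ("WS",     r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(text):
    """Return (token name, attribute) pairs for the lexemes in text."""
    tokens = []
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise ValueError(f"no pattern matches at position {pos}")
        name = match.lastgroup    # which pattern matched
        lexeme = match.group()    # the matched characters themselves
        if name != "WS":
            # Keywords need no attribute; the other tokens keep their lexeme.
            attribute = lexeme if name in ("ID", "NUMBER", "RELOP") else None
            tokens.append((name, attribute))
        pos = match.end()
    return tokens
```

For "if count <= 10" the lexemes are if, count, <= and 10, and the resulting tokens are ("IF", None), ("ID", "count"), ("RELOP", "<=") and ("NUMBER", "10"). The keyword pattern matches exactly one string, while the ID pattern is matched by many.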
Lexical Errors
• Panic mode recovery
– Delete successive characters from the remaining
input until the lexical analyzer can find a
well-formed token at the beginning of what input is left
• Other possible error-recovery actions are:
– Delete one character from the remaining input
– Insert a missing character into the remaining input
– Replace a character by another character
– Transpose two adjacent characters
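Panic mode recovery can be sketched as below. The token pattern (identifiers, integers and a few one-character operators) and the function name are assumptions chosen for the example. The sketch also stops deleting at whitespace, since whitespace is skipped anyway.

```python
import re

# Assumed token patterns: identifiers, integers, single-character operators.
TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[-+*/=()]")


def tokenize_with_panic_mode(text):
    """Return (lexemes, errors); errors are (position, deleted text) pairs."""
    lexemes, errors = [], []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        match = TOKEN.match(text, pos)
        if match:
            lexemes.append(match.group())
            pos = match.end()
            continue
        # Panic mode: delete successive characters until a well-formed
        # token (or whitespace) starts the remaining input.
        start = pos
        while pos < len(text) and not text[pos].isspace() and not TOKEN.match(text, pos):
            pos += 1
        errors.append((start, text[start:pos]))
    return lexemes, errors
```

On "a = b @# 3" the characters "@#" at position 6 are deleted and scanning resumes, giving the lexemes a, =, b and 3 plus one recorded error.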
Input Buffering
• Speeds up reading of the source program
• A two-buffer scheme handles large
lookaheads safely
Buffer Pairs
• Involves two buffers that are alternately
reloaded
– To reduce the amount of overhead required to
process a single input character
• Each buffer is of the same size N
– N is usually the size of a disk block
• Two pointers to the input are maintained:
– lexemeBegin
• Marks the beginning of the current lexeme
– forward
• Scans ahead until a pattern match is found
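A minimal sketch of a buffer pair with the two pointers. The class and method names are hypothetical, and the sketch assumes that no lexeme plus its lookahead is longer than N, so a half is never reloaded while the current lexeme still needs it.

```python
import io


class BufferPair:
    """Two halves of N characters each, reloaded alternately."""

    def __init__(self, stream, n=4096):
        self.stream = stream
        self.n = n
        self.halves = ["", ""]
        self.reloads = 0        # number of read calls issued so far
        self.lexeme_begin = 0   # index (0 .. 2N-1) where the current lexeme starts
        self.forward = 0        # index (0 .. 2N-1) of the next character to scan
        self._reload(0)

    def _reload(self, half):
        # One read call fills a whole half with up to N characters.
        self.halves[half] = self.stream.read(self.n)
        self.reloads += 1

    def _char_at(self, index):
        half, offset = divmod(index, self.n)
        data = self.halves[half]
        return data[offset] if offset < len(data) else None

    def peek(self):
        """Character under forward, or None at end of input."""
        return self._char_at(self.forward)

    def advance(self):
        """Return the character under forward and move forward by one,
        reloading the other half when forward crosses into it."""
        ch = self._char_at(self.forward)
        if ch is None:
            return None
        self.forward = (self.forward + 1) % (2 * self.n)
        if self.forward % self.n == 0:
            self._reload(self.forward // self.n)
        return ch

    def take_lexeme(self):
        """Return the characters from lexeme_begin up to (not including)
        forward, then start the next lexeme at forward."""
        chars = []
        i = self.lexeme_begin
        while i != self.forward:
            chars.append(self._char_at(i))
            i = (i + 1) % (2 * self.n)
        self.lexeme_begin = self.forward
        return "".join(chars)


def read_words(pair):
    """Example client: split the input into whitespace-separated lexemes."""
    words = []
    while pair.peek() is not None:
        if pair.peek().isspace():
            pair.advance()
            pair.take_lexeme()   # discard the whitespace
            continue
        while pair.peek() is not None and not pair.peek().isspace():
            pair.advance()
        words.append(pair.take_lexeme())
    return words
```

With a deliberately tiny N of 4, read_words(BufferPair(io.StringIO("ab cde fg"), n=4)) yields ab, cde and fg using three read calls, even though cde and fg each straddle a boundary between the two halves.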