
Chapter 2 – Lexical Analysis

Compiler
• Compiler translates from one language to another

Source code → Front End → Back End → Target code

• Front End: Analysis


• Takes input source code
• Returns Abstract Syntax Tree and symbol table
• Back End: Synthesis
• Takes AST and symbol table
• Returns machine-executable binary code, or virtual machine code
Front End

Lexical Analysis → Syntax Analysis → Semantic Analysis
• Lexical Analysis: breaks input into individual words – “tokens”


• Syntax Analysis: parses the phrase structure of program
• Semantic Analysis: calculates meaning of program
The role of the Lexical Analyzer

-> read the input characters of the source program
-> group them into lexemes
-> produce as output a sequence of tokens, one for each lexeme in the source
program
Lexing & Parsing
• From strings to data structures

Strings/Files → (Lexing) → Tokens → (Parsing) → Abstract Syntax Trees
Interactions between the lexical analyzer
and the parser
Tokens, Patterns and Lexemes
• A pattern is a description of the form that the lexemes of a token may take
(the set of rules that define a TOKEN).

• A lexeme is a sequence of characters in the source program that matches
the pattern for a token and is identified by the lexical analyzer as an instance of that
token.

• A token is a pair consisting of a token name and an optional attribute
value.
• Common token names are
• identifiers: names the programmer chooses
• keywords: names already in the programming language
• separators (also known as punctuators): punctuation characters and paired-delimiters
• operators: symbols that operate on arguments and produce results
• literals: numeric, logical, textual, reference literals
• ………..
Tokens, Patterns and Lexemes
• Consider this expression in the programming language C:
sum=3+2;
• Tokenized and represented by the following table:

Lexeme   Token Name
sum      Identifier
=        Operator
3        Literal
+        Operator
2        Literal
;        Separator
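As a sketch of how a scanner could produce the table above, here is a minimal Python tokenizer. It is not from the slides: the token names follow the table, and the regular expressions chosen for each class are illustrative.

```python
import re

# One illustrative pattern per token name from the table above.
TOKEN_SPEC = [
    ("Identifier", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("Literal",    r"[0-9]+"),
    ("Operator",   r"[=+\-*/]"),
    ("Separator",  r";"),
]
# Combine the patterns into one master regex of named groups.
master = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(source):
    # For each match, lastgroup tells us which token class fired.
    return [(m.group(), m.lastgroup) for m in master.finditer(source)]

print(tokenize("sum=3+2;"))
```

Running it on `sum=3+2;` reproduces the lexeme/token pairs of the table in order.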
Tokens, Patterns and Lexemes
• Consider: if (y <= t) y = y - 3;

Lexeme   Token Name
if       Keyword
(        Open parenthesis
y        Identifier
<=       Comparison operator
t        Identifier
)        Close parenthesis
y        Identifier
=        Assignment operator
y        Identifier
-        Arithmetic operator
3        Integer
;        Semicolon
Attributes for Tokens
• When more than one lexeme can match a pattern, the lexical analyzer
must provide the subsequent compiler phases additional information
about the particular lexeme that matched.

• For example, the pattern for token number matches both 0 and 1, but it is
extremely important for the code generator to know which lexeme was
found in the source program.
Tokens, Patterns and Lexemes
cout << 3+2+3;
Lexeme The following tokens are returned by
scanner to parser in specified order
cout <identifier, ‘cout’>
<< <operator, ‘<<‘>
3 <literal, ‘3’>
+ <operator, ‘+’>
2 <literal, ‘2’>
+ <operator, ‘+’>
3 <literal, ‘3’>
; <punctuator, ‘;’>
Tokens
if (num1 == num2)
result = 1;
else
result = 0;

\tif (num1 == num2)\n\t\tresult = 1;\n\telse\n\t\tresult = 0;


Tokens
• Token class
• In English: noun, verb, adjective, …..

• In a programming language: identifier, keyword, (, ), number, …


Tokens
• Token classes correspond to sets of strings.

• Identifier:
- Identifiers are strings of letters, digits, and underscores, starting with a letter or an
underscore
num1, result, name20, _result, …..
• Integer:
- A non-empty string of digits
10, 89, 001, 00, …….
• Keyword:
- A fixed set of reserved words
if, else, for, while, ….
• Whitespace:
- A non-empty sequence of blanks, newlines, and tabs
Lexical Analysis

Strings/Files → (Lexing) → Tokens <name, attribute> → (Parsing) → Abstract Syntax Trees
Lexical Analysis

result=50 → (Lexing) → <id, ‘result’> <op, ‘=’> <int, ‘50’> → (Parsing) → Abstract Syntax Trees
Lexical Analysis
\tif (num1 == num2)\n\t\tresult = 1;\n\telse\n\t\tresult = 0;

=> Go through and identify the tokens of the substrings.

Whitespace: A non-empty sequence of blanks, newlines, and tabs


Keywords: A fixed set of reserved words
Identifiers: Identifiers are strings of letters, digits, and underscores, starting with a letter or an
underscore
Numbers
Operators
OpenParenthesis
CloseParenthesis
Semicolon
Lexical Analysis: Regular expression
• Lexical structure = token classes

• Token classes correspond to sets of strings.


- Use regular expressions to specify which set of strings belongs to each token class
Lexical Analysis: Regular expressions
• Single character
‘a’ = {“a”}
• Epsilon
ε = {“”}
• Union
A + B = {a | a∈A} ∪ {b | b ∈B}
• Concatenation
AB = {ab | a∈A ∧ b ∈B}
• Iteration
A* = ∪i≥0 A^i, where A^i = A……A (i times) and A^0 = ε
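These operations can be tried concretely with Python sets of strings. This is a sketch: A and B are arbitrary small example sets, and A* is approximated by a finite prefix, since the full set is infinite.

```python
A = {"a", "ab"}
B = {"", "b"}

union = A | B                            # A + B
concat = {x + y for x in A for y in B}   # AB = {ab | a in A, b in B}

def power(S, i):
    # A^i: concatenate S with itself i times; A^0 = {""} (epsilon).
    result = {""}
    for _ in range(i):
        result = {x + y for x in result for y in S}
    return result

# A* is infinite, so build only A^0 ∪ A^1 ∪ A^2 as an approximation.
star_up_to_2 = power(A, 0) | power(A, 1) | power(A, 2)
```

Note how "a" + "b" and "ab" + "" both yield "ab", so the concatenation set has three elements, not four.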
Lexical Analysis: Regular expressions
• The regular expression over Σ are the smallest set of expressions including

R = ε
| ‘c’ where c ∈ Σ
| A+B where A, B are regular expressions over Σ
| AB where A, B are regular expressions over Σ
| A* where A is a regular expression over Σ
Lexical Analysis: Regular expressions
Σ = {0, 1}

1* = ∪i≥0 1^i = ε + 1 + 11 + 111 + 1111 + ……..

(1+0)1 = {ab | a ∈ 1+0 ∧ b ∈ 1} = 11 + 01

0* + 1* = {0^i | i ≥ 0} ∪ {1^i | i ≥ 0}
        = ε + 0 + 00 + 000 + 0000 + ………. + 1 + 11 + 111 + 1111 + ……..

(0+1)* = ∪i≥0 (0+1)^i
       = ε + (0+1) + (0+1)(0+1) + …… + (0+1)……(0+1)
       = all strings of 0’s and 1’s
       = Σ*
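The examples above can be checked with Python’s re module. Note the notational difference: the slides write union as +, while re writes it as |.

```python
import re

one_star = re.compile(r"1*")        # 1*
assert one_star.fullmatch("")       # epsilon is in the language
assert one_star.fullmatch("111")
assert not one_star.fullmatch("101")

u = re.compile(r"(1|0)1")           # (1+0)1 = {11, 01}
assert u.fullmatch("11") and u.fullmatch("01")
assert not u.fullmatch("10")

any_01 = re.compile(r"(0|1)*")      # (0+1)* = all strings over {0, 1}
assert any_01.fullmatch("0110100")
```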
Lexical Analysis
Meaning function L maps syntax to semantics:

L(e) = M    (regular expression → set of strings)

‘a’ = {“a”}              => L(‘a’) = {“a”}
ε = {“”}                 => L(ε) = {“”}
A + B = A ∪ B            => L(A + B) = L(A) ∪ L(B)
AB = {ab | a∈A ∧ b∈B}    => L(AB) = {ab | a ∈ L(A) ∧ b ∈ L(B)}
A* = ∪i≥0 A^i            => L(A*) = ∪i≥0 L(A^i)
Regular Expression
• keyword: A fixed set of reserved words (“if” or “else” or “for” or …..)
Regular expression for if: ‘i’’f’
Regular expression for else: ‘e’’l’’s’’e’
Regular expression for for: ‘f’’o’’r’

Regular expression for keyword:


‘i’’f’ + ‘e’’l’’s’’e’ + ‘f’’o’’r’ + ……….
=> ‘if’ + ‘else’ + ‘for’ + ……….
Regular Expression
• Integer: a non-empty string of digits

- regular expression for the set of strings corresponding to all the single
digits

digit = ‘0’ + ‘1’ + ‘2’ + ‘3’ + ‘4’ + ‘5’ + ‘6’ + ‘7’ + ‘8’ + ‘9’

integer = digit digit* = digit+


Identifier: strings of letters, digits, and underscores, starting with a letter or
an underscore.

digit = ‘0’ + ‘1’ + ‘2’ + ‘3’ + ‘4’ + ‘5’ + ‘6’ + ‘7’ + ‘8’ + ‘9’
= [0-9]
letter_ = [a-zA-Z_]
identifier = letter_(letter_ + digit)*
Whitespace: a non-empty sequence of blanks, newlines, and tabs

whitespace = (‘ ‘ + ‘\n’ + ‘\t’)+
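The three token-class definitions above can be transcribed into Python re syntax (a sketch; the slides’ + union becomes a character class or |):

```python
import re

digit      = r"[0-9]"
letter_    = r"[a-zA-Z_]"
# identifier = letter_(letter_ + digit)*
identifier = re.compile(letter_ + f"({letter_}|{digit})*")
# integer = digit+
integer    = re.compile(digit + "+")
# whitespace = (' ' + '\n' + '\t')+
whitespace = re.compile(r"[ \n\t]+")

assert identifier.fullmatch("_result")
assert not identifier.fullmatch("9lives")   # must not start with a digit
assert integer.fullmatch("001")             # leading zeros are allowed
assert whitespace.fullmatch(" \n\t")
```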


[email protected]

=> Make regular expression for this email address:

letter+’@’letter+’.’letter+’.’letter+
Regular Expression
• At least one: AA* ≡ A+

• Union: A|B ≡ A+B

• Option: A + ε ≡ A?

• Range: ‘a’ + ‘b’ + … + ‘z’ ≡ [a-z]

• Excluded range: complement of [a-z] ≡ [^a-z]


Number in Pascal: A floating point number can have some digits, an

optional fraction and an optional exponent (3.15E+10, 8E-3, 15.6, …)


digit = ‘0’+’1’+’2’+’3’+’4’+’5’+’6’+’7’+’8’+’9’
digits = digit+
opt_fraction = (‘.’digits) + ε = (‘.’digits)?
opt_exponent = (‘E’(‘+’ + ’-’ + ε)digits) + ε = (‘E’(‘+’ + ‘-’)?digits)?
num = digits opt_fraction opt_exponent
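The Pascal number definition above can be written as one Python regular expression; as in the slides, ? plays the role of “+ ε”.

```python
import re

digits       = r"[0-9]+"
opt_fraction = rf"(\.{digits})?"        # ('.'digits)?
opt_exponent = rf"(E[+-]?{digits})?"    # ('E'('+' + '-')?digits)?
num          = re.compile(digits + opt_fraction + opt_exponent)

for s in ["3.15E+10", "8E-3", "15.6", "42"]:
    assert num.fullmatch(s)
assert not num.fullmatch(".5")   # digits before the point are required
```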
Regular Expression
• Regular expressions describe many useful languages

• A regular expression is a specification of a language

• We still need an implementation
Regular Expressions => Lexical Spec
1. Write a regular expressions for the lexemes of each token class
• number = digit+
• keyword = ‘if’ + ‘else’ + …
• identifier = letter_(letter_ + digit)*
• openPar = ‘(‘
• closePar = ‘)’
• ………..

2. Construct R, matching all lexemes for all tokens


R = keyword + identifier + number + …..
= R1 + R2 + ….
• (This step is done automatically by tools like flex)
3. Let input be x1….xn
For 1 ≤ i ≤ n, check x1…..xi ∈ L(R) ?

4. If success, then we know that

x1…..xi ∈ L(Rj) for some j

R = R1 + R2 + R3 + …..

5. Remove the matched prefix x1….xi from the input and go to (3)

How much input is used?

If x1….xi ∈ L(R)
and x1….xj ∈ L(R)
with i ≠ j

Rule: Pick the longest possible string in L(R)

– Pick j if j > i
– The “maximal munch”
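The maximal-munch rule can be sketched in a few lines of Python: at each position, try every token pattern and keep the longest match. The token names and patterns here are illustrative, not from the slides.

```python
import re

RULES = [
    ("assign", re.compile(r"=")),
    ("eq",     re.compile(r"==")),
    ("id",     re.compile(r"[a-z]+")),
]

def longest_match(text, pos):
    # Try every rule at this position and keep the LONGEST lexeme.
    best = None
    for name, pat in RULES:
        m = pat.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())
    return best

# On "==", both '=' (length 1) and '==' (length 2) match: take the longer.
assert longest_match("a==b", 1) == ("eq", "==")
# On "if", 'i' alone would also match id, but maximal munch takes all of "if".
assert longest_match("if", 0) == ("id", "if")
```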
Which token is used?
x1….xi ∈ L(Rj)
x1….xi ∈ L(Rk) => which token is used?

Keywords = ‘if’ + ‘else’ + ….

Identifiers = letter(letter + digit)*

if ∈ L(Keywords)
if ∈ L(Identifiers)
=> Choose the rule listed FIRST.
• What if no rule matches?
x1….xi ∉ L(R)

Error = all strings not in the language of our lexical specification

Make a regular expression for error strings and PUT IT LAST IN PRIORITY
(lowest priority)
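Both disambiguation rules can be combined in one sketch: on a length tie the rule listed FIRST wins, and a catch-all error rule sits last. The names and patterns are illustrative.

```python
import re

RULES = [
    ("keyword", re.compile(r"if|else|for")),
    ("id",      re.compile(r"[a-z]+")),
    ("error",   re.compile(r".")),      # lowest priority: any one character
]

def scan_one(text, pos):
    best = None
    for name, pat in RULES:             # earlier rules are checked first...
        m = pat.match(text, pos)
        if m and (best is None or len(m.group()) > len(best[1])):
            best = (name, m.group())    # ...so ties keep the earlier rule
    return best

assert scan_one("if", 0) == ("keyword", "if")   # tie with id: keyword listed first
assert scan_one("iffy", 0) == ("id", "iffy")    # longer match beats the keyword
assert scan_one("?", 0) == ("error", "?")       # no real rule matches
```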
• Regular expressions are a concise notation for string patterns

• Use in lexical analysis requires small extensions


• To resolve ambiguities
• Matches as long as possible
• Highest priority match
• To handle errors
• Make a regular expression for error strings and PUT IT LAST IN PRIORITY.
Make a regular expression for:
• A keyword is a reserved word whose meaning is already defined by the
programming language. We cannot use a keyword for any other purpose
in a program. Every programming language has some set of
keywords.
Examples: int, do, while, void, return, …………
Make a regular expression for:
• Identifiers
Identifiers are the names given to different programming elements. Whether
the name is given to a variable, a function, or any other programming element,
it follows some basic naming conventions listed below:

1. Keywords must not be used as identifiers.
2. An identifier must begin with a letter a-z A-Z or an underscore _ symbol.
3. An identifier can contain letters a-z A-Z, digits 0-9 and the underscore _ symbol.
4. An identifier must not contain any special character (e.g. !@$*.'[] etc.) except
the underscore _.
Make a regular expression for:
• Operator
Operators are the symbols used for arithmetical or logical operations.
Different programming languages provide different sets of operators; some
common operators are:
• Arithmetic operator (+, -, *, /, %)
• Assignment operator (=)
• Relational operator (>, <, >=, <=, ==, !=)
• Logical operator (&&, ||, !)
• Bitwise operator (&, |, ^, ~, <<, >>)
• Increment/Decrement operator (++, --)
• Conditional/Ternary operator (? :)
Make a regular expression for:
• Literals
Literals are constant values that are used for performing various operations and
calculations. There are basically three types of literals:
1. Integer literal
An integer literal represents integer or numeric values.
Example: 1, 100, -12312, etc.
2. Floating point literal
A floating point literal represents fractional values.
Example: 2.123, 1.02, -2.33, 13e54, -23.3, etc.
3. Character literal
A character literal represents character values. A single character is enclosed in single
quotes (' ') while a sequence of characters is enclosed in double quotes (" ").
Example: 'a', 'n', "Hello", "Hello123", etc.
Finite Automata
• Regular expressions = specification
• Finite automata = implementation

• A finite automaton consists of

• An input alphabet Σ
• A finite set of states S
• A start state q0
• A set of accepting states F ⊆ S
• A set of transitions δ: state →(input) state
Finite Automata
• Transition
s1 →(a) s2
• Is read:
in state s1, on input a, go to state s2

• If at end of input the automaton is in an accepting state => accept

• Otherwise => reject:
• it terminates in a state s ∉ F, or
• it gets stuck (no transition on the input)
Finite Automata
• A state

• The start state

• An accepting state

• A transition, labeled with an input symbol a
Finite Automata
• A finite automaton that accepts only “a”:

q0 →(a) q1    (q1 accepting)

• What happens if the input strings are:


• “a”
• “b”
• “ab”

• The language of a finite automaton is the set of strings it accepts.


Finite Automata
• A finite automaton accepting any number of 0’s followed by a single 1:

q0 →(1) q1, with a 0-loop on q0    (q1 accepting)

Run on 001:
q0  001
q0  01
q0  1
q1  ε
=> Accept

Run on 011:
q0  011
q0  11
q1  1    (stuck: q1 has no transition on 1)
=> Reject
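The two traces above can be replayed with a direct simulation of this automaton. The dictionary encoding of the transition function is a sketch, not from the slides.

```python
# Automaton for 0*1: q0 --0--> q0, q0 --1--> q1; q1 is the only accepting state.
DELTA = {("q0", "0"): "q0", ("q0", "1"): "q1"}
ACCEPTING = {"q1"}

def accepts(s):
    state = "q0"
    for ch in s:
        if (state, ch) not in DELTA:
            return False          # stuck: no transition => reject
        state = DELTA[(state, ch)]
    return state in ACCEPTING

assert accepts("001")             # the accepting trace above
assert accepts("1")               # zero 0's is allowed
assert not accepts("011")         # the rejecting trace: stuck after "01"
```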
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
Regular Expressions to NFA
• For each kind of regular expression, define an equivalent NFA that accepts
exactly the same language as the language of the regular expression.
An NFA for a regular expression M is drawn as a box M with one start state and one accepting state.

• For ε: a single ε-transition from the start state to the accepting state

• For input a: a single a-transition from the start state to the accepting state
Regular Expressions to NFA
• Concatenation
• For RS: connect the accepting state of R’s NFA to the start state of S’s NFA with an ε-transition
• Union
• For R + S: add a new start state with ε-transitions to the start states of R and S, and ε-transitions from their accepting states to a new accepting state
• Iteration
• For R*: add ε-transitions that allow skipping R entirely and looping from R’s accepting state back to its start
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*

• For 0: start →(0) accept
• For 1: start →(1) accept
• For 0 + 1: a new start state with ε-moves into the NFAs for 0 and 1, and ε-moves from their accepting states to a common accepting state
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*

• For 01: start →(0) ∘ →(ε) ∘ →(1) accept
• For (01)*: wrap the NFA for 01 with ε-moves that allow skipping it entirely and looping from its accepting state back to its start
Regular Expressions to NFA
• Consider the regular expression (0+1)(01)*
The complete NFA, with states A–L (L accepting):

A →ε B, A →ε D          (union of 0 and 1)
B →0 C, D →1 E
C →ε F, E →ε F
F →ε G                  (concatenation)
G →ε H, G →ε L          (iteration: enter or skip (01)*)
H →0 I, I →ε J, J →1 K
K →ε H, K →ε L          (loop or exit)
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
NFA to DFA
• Simulate the NFA
• Each state of the DFA
= a non-empty subset of the states of the NFA
• Start state of the DFA
= the set of NFA states reachable through ε-moves from the NFA start state
• Add a transition S →(a) S’ to the DFA if
– S’ is the set of NFA states reachable from any
state in S after seeing the input a, considering ε-moves as well
• A DFA state is final if
= its set includes a final state of the NFA
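The steps above can be sketched directly as code. This is an illustrative subset construction, not from the slides: the NFA is encoded as state → symbol → set of states, with "" standing for ε, and the tiny example automaton at the end (for 0*1 with one ε-move) is an assumption.

```python
def eps_closure(nfa, states):
    # All states reachable from `states` via ε-moves alone.
    stack, closure = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in nfa.get(s, {}).get("", set()) - closure:
            closure.add(t)
            stack.append(t)
    return frozenset(closure)

def move(nfa, states, a):
    # All states reachable from `states` on one input symbol a.
    return {t for s in states for t in nfa.get(s, {}).get(a, set())}

def nfa_to_dfa(nfa, start, alphabet):
    start_set = eps_closure(nfa, {start})
    dfa, todo = {}, [start_set]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for a in alphabet:
            T = eps_closure(nfa, move(nfa, S, a))
            if T:
                dfa[S][a] = T
                todo.append(T)
    return start_set, dfa

# Tiny hypothetical example: NFA for 0*1 with an ε-move p -> q.
nfa = {"p": {"": {"q"}}, "q": {"0": {"q"}, "1": {"r"}}}
start, dfa = nfa_to_dfa(nfa, "p", "01")
assert start == frozenset({"p", "q"})          # ε-closure of the start state
assert dfa[start]["1"] == frozenset({"r"})
```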
NFA to DFA
• NFA for (0+1)(01)* (states A–L, L accepting):

A →ε B, A →ε D; B →0 C; D →1 E; C →ε F; E →ε F; F →ε G;
G →ε H, G →ε L; H →0 I; I →ε J; J →1 K; K →ε H, K →ε L
NFA to DFA
• NFA for (0+1)(01)* (states A–L, L accepting), and the resulting DFA,
whose states are sets of NFA states:

ABD →0 CFGHL    ABD →1 EFGHL
CFGHL →0 IJ     EFGHL →0 IJ
IJ →1 KLH
KLH →0 IJ

(start state ABD; CFGHL, EFGHL and KLH are accepting since they contain L)
0
Regular Expressions to non-deterministic
finite automata (NFA)

Lexical Specification → Regular Expressions → Non-deterministic Finite Automata (NFA) → Deterministic Finite Automata (DFA) → Table-driven Implementation of DFA
Implementation of DFA
• A DFA can be implemented by a 2D table T
– One dimension is “states”
– Other dimension is “input symbol”

– For every transition Si →(a) Sk, define T[i, a] = k
(one row per state, one column per input symbol)
Implementation of DFA
• DFA for (0+1)(01)*:
S0 →0 S1, S0 →1 S2, S1 →0 S3, S2 →0 S3, S3 →1 S4, S4 →0 S3
(S1, S2 and S4 accepting)

T      0    1
S0     S1   S2
S1     S3   –
S2     S3   –
S3     –    S4
S4     S3   –
Implementation of DFA
i = 0;
state = 0;
while (input[i]) {
    state = T[state, input[i++]];
}
/* accept if the loop ends in an accepting state */

T      0    1
S0     S1   S2
S1     S3   –
S2     S3   –
S3     –    S4
S4     S3   –
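The table-driven loop above can be made runnable. This Python sketch encodes the table for (0+1)(01)* as a dictionary; a missing entry plays the role of an empty table cell, and states S1, S2 and S4 are the accepting states.

```python
# Transition table T for (0+1)(01)*: T[(state, symbol)] = next state.
T = {
    (0, "0"): 1, (0, "1"): 2,
    (1, "0"): 3,
    (2, "0"): 3,
    (3, "1"): 4,
    (4, "0"): 3,
}
ACCEPTING = {1, 2, 4}

def matches(s):
    state = 0
    for ch in s:
        if (state, ch) not in T:
            return False          # empty table cell: reject
        state = T[(state, ch)]
    return state in ACCEPTING

assert matches("0")               # (0+1) alone
assert matches("001")             # 0 followed by one (01) pair
assert matches("10101")           # 1 followed by (01)(01)
assert not matches("011")         # stuck: S1 has no transition on 1
```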
Implementation of DFA
• DFA for (0+1)(01)*:
S0 →0 S1, S0 →1 S2, S1 →0 S3, S2 →0 S3, S3 →1 S4, S4 →0 S3

T      0    1
S0     S1   S2
S1     S3   –
S2     S3   –
S3     –    S4
S4     S3   –
Implementation of NFA
• An NFA table maps each state and input symbol (including ε) to a SET of states.
For the NFA of (0+1)(01)* (states A–L, L accepting):

State   0      1      ε
A       –      –      {B, D}
B       {C}    –      –
C       –      –      {F}
D       –      {E}    –
E       –      –      {F}
F       –      –      {G}
G       –      –      {H, L}
H       {I}    –      –
I       –      –      {J}
J       –      {K}    –
K       –      –      {L, H}
Summarize
• Conversion of NFA to DFA is the key
• DFAs are faster but less compact: the transition tables can be very large
• NFAs are slower to simulate but more concise
• In practice, tools provide tradeoffs between speed and space
• Tools generally offer a series of options, via configuration files or
command-line flags, which allow you to choose whether you want to be closer
to a full DFA or to a pure NFA
Assignment 1 (Lexical Analyzer)
