Chapter 3 Finite Automata and Lexical Analysis

Uploaded by

Yohannes Dereje
Copyright
© All Rights Reserved

Chapter 3:

Finite automata and lexical analysis


Contents

• Lexical analysis
  – The role of the lexical analyzer: lexical scanning, token classes, keyword recognition
• Finite automata
  – Alphabet, strings and languages
  – Regular expressions
  – Finite automata (DFA and NFA)
  – From regular expressions to finite automata
  – Minimizing the number of states of a DFA
Lexical analysis

• Lexical analysis (scanning) plays an important role in the compilation of a program.
• The lexical analyzer takes the source program as input, reads it one character at a time, and produces the equivalent token stream.
• For example, the source statement A = B + C * 50 becomes the token stream x1 = x2 + x3 * 50 after the lexical analysis phase, where x1, x2 and x3 are identifier tokens standing for A, B and C.
• Other tasks performed by lexers are:
  – skip comments and white space;
  – detect lexical errors in tokens (e.g., illegal characters or malformed numbers).

Lexical analysis
• Input program representation: character sequence
• Output program representation: token sequence
• Analysis specification: regular expressions
• Recognizing (abstract) machine: finite automata
• Implementation: finite automata
Role of the lexical analyzer
• The lexical analyzer performs the following tasks:
  – Reads the source program, scans the input characters, groups them into lexemes and produces tokens as output.
  – Enters the identified tokens (e.g., identifiers) into the symbol table.
  – Strips out white space and comments from the source program.
  – Correlates error messages with the source program, i.e., reports each error together with the line number where it occurs.
  – Expands macros if they are found in the source program.
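The following Python sketch shows a hand-written scanner that performs several of these tasks. It is illustrative only. The token set (`TOKEN_SPEC`), the `//` comment syntax and the function name `scan` are assumptions made for this example. The scanner strips white space and comments. It enters identifiers into a dictionary that plays the role of the symbol table, and it reports illegal characters with their line numbers. Macro expansion is not covered.

```python
import re

# Hypothetical token specification for a tiny language (an assumption for
# illustration, not the slides' language). Earlier alternatives win ties.
TOKEN_SPEC = [
    ("COMMENT", r"//[^\n]*"),               # line comment: stripped
    ("NEWLINE", r"\n"),                     # counted for error line numbers
    ("SKIP", r"[ \t]+"),                    # white space: stripped
    ("NUMBER", r"[0-9]+"),
    ("ID", r"[A-Za-z][A-Za-z_0-9]*"),
    ("OP", r"[=+\-*/;]"),
    ("ERROR", r"."),                        # any other character is illegal
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def scan(source):
    """Return (tokens, symbol_table, errors) for the given source text."""
    tokens, symbols, errors = [], {}, []
    line = 1
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "NEWLINE":
            line += 1
        elif kind in ("SKIP", "COMMENT"):
            pass                            # white space and comments produce no token
        elif kind == "ERROR":
            errors.append(f"line {line}: illegal character {lexeme!r}")
        elif kind == "ID":
            # Enter a new identifier into the symbol table once; the token
            # refers to its entry number (1, 2, 3, ...).
            entry = symbols.setdefault(lexeme, len(symbols) + 1)
            tokens.append(("id", entry))
        else:
            tokens.append((kind, lexeme))
    return tokens, symbols, errors
```

As an example, `scan("A = B + C * 50")` returns the tokens `("id", 1), ("OP", "="), ("id", 2), ("OP", "+"), ("id", 3), ("OP", "*"), ("NUMBER", "50")`. This mirrors the x1 = x2 + x3 * 50 form of the earlier example.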
Need of a lexical analyzer
• Simplicity of compiler design
  – Removing white space and comments in the lexical analyzer lets the syntax analyzer work efficiently on syntactic constructs.
• Compiler efficiency is improved
  – Specialized buffering techniques for reading characters speed up the compilation process.
• Compiler portability is enhanced
Description of a language
• Syntax: the form or structure of the expressions, statements, and program units.
  – Specifies how statements, declarations, and other language constructs are written.
• Semantics: the meaning of the expressions, statements, and program units.
  – What programs do: their behavior and meaning.
• Semantics is more complex and harder to define precisely, as is also the case for natural language.
• Example: if statement
  – Syntax: if (<expr>) <statement>
  – Semantics: if <expr> is true, execute <statement>


Description of a language
• A sentence is a string of characters over some alphabet.
• A language is a set of sentences.
• A lexeme is the lowest-level syntactic unit of a language (e.g., +, int, total).
• The lexemes of a programming language include its numeric literals, operators, and special words, among others.
• Lexemes are partitioned into groups. For example, the names of variables, methods, classes, and so forth in a programming language form a group called identifiers.
• A token is a category of lexemes (e.g., identifier, keyword, whitespace).
• In programming languages, the following are the possible tokens to be identified:
  – Keywords
  – Constants
  – Identifiers
  – Numbers
  – Operators
  – Punctuation symbols

• Spaces or certain other characters may act as delimiters that separate one word from the next.
• Consider the following Java statement:
  index = 2 * count + 17;
• The lexemes and tokens of this statement are:
Description of a language
• Pattern: a rule that sequences of characters (lexemes) must match to form a token.
• It can be defined by regular expressions or grammar rules.
• For example, the identifier pattern [A-Za-z][A-Za-z_0-9]* matches a letter followed by any number of letters, digits or underscores.
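The following Python sketch splits the Java statement above into lexemes and tokens. Only the identifier pattern comes from the slide. The other token names (`int_literal`, `equal_sign`, `mult_op`, `plus_op`, `semicolon`) are illustrative assumptions. At each position the scanner tries the patterns in order and treats spaces purely as delimiters.

```python
import re

# (token name, compiled pattern); names other than "identifier" are assumptions.
PATTERNS = [
    ("int_literal", re.compile(r"[0-9]+")),
    ("identifier", re.compile(r"[A-Za-z][A-Za-z_0-9]*")),   # pattern from the slide
    ("equal_sign", re.compile(r"=")),
    ("mult_op", re.compile(r"\*")),
    ("plus_op", re.compile(r"\+")),
    ("semicolon", re.compile(r";")),
]


def lexemes_and_tokens(statement):
    """Return a list of (lexeme, token) pairs for the statement."""
    pairs, pos = [], 0
    while pos < len(statement):
        if statement[pos].isspace():          # spaces only delimit lexemes
            pos += 1
            continue
        for token, pattern in PATTERNS:
            m = pattern.match(statement, pos)
            if m:
                pairs.append((m.group(), token))
                pos = m.end()
                break
        else:
            raise ValueError(f"no pattern matches at position {pos}")
    return pairs


for lexeme, token in lexemes_and_tokens("index = 2 * count + 17;"):
    print(f"{lexeme:6} {token}")
```

The block prints eight rows, from `index  identifier` to `;      semicolon`. Taking the first pattern that matches is enough here, because no two of these patterns can match at the same position. Overlapping patterns, such as keywords and identifiers, need Lex's longest-match rule instead.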
Approaches to building a lexical analyzer
• There are three approaches to building a lexical analyzer:
1) Write a formal description of the token patterns of the language using a descriptive language related to regular expressions. These descriptions are used as input to a software tool (for example, Lex) that automatically generates a lexical analyzer.
2) Design a state transition diagram that describes the token patterns of the language and write a program that implements the diagram.
3) Design a state transition diagram that describes the token patterns of the language and hand-construct a table-driven implementation of the state diagram.
Automata
• An automaton, in this context, is an algorithm or program that automatically recognizes whether a particular string belongs to a language or not, by checking the string against the rules of the language.
• More generally, an automaton is a self-operating machine or control mechanism designed to automatically follow a predetermined sequence of operations, or respond to predetermined instructions.
• The word "automaton" (plural "automata") means something that works automatically.
• There are different varieties of such abstract machines (also called models of computation), which can be defined mathematically.

Automata models
• Finite automaton
  – Finite automata are formal models of computation that accept exactly the regular languages, i.e., the languages described by regular expressions.
  – Used in text processing, compilers, and hardware design.
• Context-free grammar
  – Used in programming languages and artificial intelligence.
Specification of tokens (alphabet, strings and languages)
• Some important definitions regarding languages include:
1) Alphabet: a finite, non-empty set of symbols.
• It is denoted by Σ (Greek letter sigma).
• Examples: Σ = {a, b}, Σ = {i, j, k}
• Roman alphabet: Σ = {a, b, …, z}
• Binary alphabet: Σ = {0, 1}, which is pertinent to the theory of computation.
Alphabet, strings & languages
• String: a finite sequence of symbols, usually written next to one another and not separated by commas.
• Examples:
1) If Σa = {0, 1}, then 001001 is a string over Σa.
2) If Σb = {a, b, …, z}, then axyrpqstcd is a string over Σb.
3) 00110, 01, 1 and 0 are all strings over the alphabet Σ = {0, 1}.
4) abab, aabb, ab, ba and a are all strings over the alphabet Σ = {a, b}.
String operations
1) Empty string: the string of zero length is called the empty string.
• It is denoted by ε.
• The empty string plays the role that 0 plays in a number system: it is the identity for concatenation (εw = wε = w).
2) Reverse string: if w = w1w2…wn, where each wi ∈ Σ, the reverse of w is wR = wnwn−1…w1.
3) Substring: z is a substring of w if z appears consecutively within w.
• Example: deck is a substring of abcdeckabcjkl.
4) Concatenation: given a string x of length m and a string y of length n, the concatenation of x and y, written xy, is the string obtained by appending y to the end of x, i.e., x1x2…xmy1y2…yn.
• To concatenate a string with itself many times, we use the "superscript" notation: $w^k$ denotes w concatenated with itself k times, with $w^0 = \varepsilon$ (for example, $(ab)^3 = ababab$).


Lexicographic ordering
• The lexicographic ordering of strings is the same as dictionary ordering, except that shorter strings precede longer strings.
• The lexicographic ordering of all strings over the alphabet {0, 1} is
  (ε, 0, 1, 00, 01, 10, 11, 000, …).
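The string operations and the lexicographic ordering above can be sketched in a few lines of Python. Strings are plain Python strings, and `""` stands for ε. The function names are hypothetical helpers for this illustration.

```python
from itertools import count, islice, product


def reverse(w):
    """w^R: the symbols of w in reverse order."""
    return w[::-1]


def power(w, k):
    """w^k: w concatenated with itself k times; power(w, 0) is "" (ε)."""
    return w * k


def shortlex(alphabet):
    """Yield all strings over `alphabet` in lexicographic (shortlex) order."""
    for length in count(0):                      # shorter strings first
        for symbols in product(sorted(alphabet), repeat=length):
            yield "".join(symbols)               # dictionary order within a length


print(reverse("abc"), power("ab", 3), "deck" in "abcdeckabcjkl")
print(list(islice(shortlex({"0", "1"}), 8)))
```

This prints `cba ababab True` and then `['', '0', '1', '00', '01', '10', '11', '000']`. The leading `''` is ε.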


Languages
• A language is simply a set of strings involving symbols from some alphabet.
• Any set of strings over an alphabet Σ is called a language.
• The set of all strings over an alphabet Σ, including the empty string, is denoted Σ*.
• Infinite languages L are denoted as:
  L = { w ∈ Σ* : w has property P }


• For example,
1) L1 = { w ∈ {0, 1}* : w has an equal number of 0's and 1's }
  L1 = {01, 10, 1010, 1100, …}
2) L2 = { w ∈ Σ* : w = wR }, where wR is the reverse of w (the palindromes over Σ).
  L2 = {101, 10101, radar, level, …}
Operations on languages
• Union:
  – If L1 and L2 are two languages, their union, denoted L1 ∪ L2, is the language containing all strings w that belong to L1, to L2, or to both.
• Concatenation of languages:
  – If L1 and L2 are languages over Σ, their concatenation is L = L1•L2, or simply L = L1L2, where
    L = { w ∈ Σ* : w = x•y for some x ∈ L1 and y ∈ L2 }.
• Kleene star:
  – The Kleene star of a language L is denoted L*.
  – L* is the set of all strings obtained by concatenating zero or more strings from L:
    L* = { w ∈ Σ* : w = w1w2…wk for some k ≥ 0 and some w1, w2, …, wk ∈ L }
  – Example: if L = {01, 1, 100}, then 110001110011 ∈ L*, since 110001110011 = 1•100•01•1•100•1•1 and each of these strings is in L.
  – Equivalently, $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$, where $L^0 = \{\varepsilon\}$.
• Positive closure: the positive closure of a language L, denoted $L^+$, is the set of strings obtained by concatenating one or more strings from L: $L^+ = L^1 \cup L^2 \cup L^3 \cup \cdots$.
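You can decide membership in L* by checking whether a string splits into pieces from L. The following is a minimal dynamic-programming sketch in Python. The name `in_star` is a hypothetical helper, and L is given as a finite set of strings.

```python
def in_star(w, L):
    """Return True if w is in L*, i.e. w splits into zero or more strings of L."""
    # reachable[i] is True when the prefix w[:i] is a concatenation of strings of L.
    reachable = [False] * (len(w) + 1)
    reachable[0] = True                  # the empty prefix: L^0 = {ε}
    for i in range(len(w)):
        if not reachable[i]:
            continue
        for piece in L:
            if piece and w.startswith(piece, i):
                reachable[i + len(piece)] = True
    return reachable[len(w)]


print(in_star("110001110011", {"01", "1", "100"}))   # True
```

The result agrees with the decomposition 1•100•01•1•100•1•1. The empty string always gives True, because ε ∈ L^0.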
Regular expressions
• Regular expressions are a mathematical tool designed to represent regular languages.
• They are built from a set of primitives and operations.
• This representation involves a combination of strings of symbols from some alphabet Σ, parentheses, and the operators +, ⋅ and *.
• A regular expression is obtained from the symbols of the alphabet (e.g., {a, b, c}), the empty string ε and the empty set ∅ by applying the operations +, ⋅ and * (union, concatenation and Kleene star).
Regular expressions
A regular expression can be recursively defined as follows:
1) ε is a regular expression denoting the language containing only the empty string: L(ε) = {ε}.
2) ∅ is a regular expression denoting the empty language: L(∅) = { }.
3) For each symbol x ∈ Σ, x is a regular expression denoting L(x) = {x}.
4) If X is a regular expression denoting the language L(X) and Y is a regular expression denoting the language L(Y), then
  – X + Y is a regular expression denoting L(X + Y) = L(X) ∪ L(Y);
  – X ⋅ Y is a regular expression denoting L(X ⋅ Y) = L(X) ⋅ L(Y);
  – X* is a regular expression denoting L(X*) = (L(X))*.
5) Any expression obtained by applying rules 1 to 4 a finite number of times is a regular expression.
• Examples (Σ = {0, 1} in the last two)
  – 0 + 1 represents the set {0, 1}
  – 1 represents the set {1}
  – (0 + 1)1 represents the set {01, 11}
  – (a + b)⋅(b + c) represents the set {ab, bb, ac, bc}
  – (0 + 1)* = ε + (0 + 1) + (0 + 1)(0 + 1) + … = Σ*
  – (0 + 1)+ = (0 + 1)(0 + 1)* = Σ+ = Σ* − {ε}
Building regular expressions
• Assume that Σ = {a, b, c}.
• Zero or more: a* means zero or more a's.
  – To say zero or more ab's, i.e., {ε, ab, abab, …}, you need to write (ab)*.
• One or more: since a* means zero or more a's, you can use aa* (or equivalently a*a) to mean one or more a's.
  – Similarly, to describe one or more ab's, that is {ab, abab, ababab, …}, you can use ab(ab)*.
• Zero or one: an optional a can be described as (a + ε).
• Examples: represent the following sets by regular expressions:
  a. {ε, ab}
  b. {1, 11, 111, …}
  c. {ab, a, b, bb}
• Solution
  a. The set {ε, ab} is represented by the regular expression ε + ab.
  b. The set {1, 11, 111, …} is obtained by concatenating 1 with any element of {1}*. Therefore 1(1)* represents the given set.
  c. The set {ab, a, b, bb} is represented by the regular expression ab + a + b + bb.
Regular expressions – exercises
• Obtain the regular expressions for the following sets:
1. The set of all strings over {a, b} beginning and ending with 'a'.
  ⇒ The regular expression is a(a + b)*a. This expression requires at least two symbols. If the one-symbol string a is also counted, use a + a(a + b)*a.
2. $\{b^2, b^5, b^8, \ldots\}$
  ⇒ The regular expression is bb(bbb)*.
3. $\{a^{2n+1} \mid n > 0\}$
  ⇒ The regular expression is a(aa)+.
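You can spot-check answers like these with Python's `re` module. Its notation differs from the slides: union is written `|`, and `+` means "one or more" (the slides' positive closure). `matches` is a hypothetical helper. It uses `re.fullmatch`, so the whole string, not just a prefix, must belong to the language.

```python
import re

# Slide notation -> Python `re` notation (union "+" becomes "|").
ANSWERS = {
    "a(a + b)*a": r"a(a|b)*a",
    "bb(bbb)*": r"bb(bbb)*",
    "a(aa)+": r"a(aa)+",
}


def matches(pattern, w):
    """True if the entire string w is in the language of `pattern`."""
    return re.fullmatch(pattern, w) is not None


samples = ["a", "aa", "aba", "bb", "bbbbb", "aaa", "aaaaa"]
for slide_form, py_pattern in ANSWERS.items():
    accepted = [w for w in samples if matches(py_pattern, w)]
    print(f"{slide_form:12} accepts {accepted}")
```

The loop prints `['aa', 'aba', 'aaa', 'aaaaa']`, `['bb', 'bbbbb']` and `['aaa', 'aaaaa']`. Note that `a` is rejected by the first pattern, as discussed in exercise 1.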
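• Let L = {ab, aa, baa}. Which of the following strings are in L*?
  I. abaabaaabaa
  II. aaaabaaaa
  III. baaaaabaaaab
  IV. baaaaabaa
Regular expressions – exercises
• Answer: note that L* is the star closure of L, given by $L^* = L^0 \cup L^1 \cup L^2 \cup \cdots$.
  I. abaabaaabaa = ab•aa•baa•ab•aa, so this string is in L*.
  II. aaaabaaaa = aa•aa•baa•aa, so this string is in L*.
  III. baaaaabaaaab is not in L*. Reading from the left, each step admits only one piece (baa•aa•ab•aa•aa), which leaves a final b, and b is not in L.
  IV. baaaaabaa = baa•aa•ab•aa, so this string is in L*.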
Finite automata
• A finite automaton is a mathematical model of a system with discrete inputs and outputs.
• The system can be in any one of a finite number of internal configurations, or "states".
• The state of the system summarizes the information concerning past inputs that is needed to determine the behavior of the system on subsequent inputs.
Finite automata
• A finite state automaton (finite automaton) is an abstract machine having:
  – A finite set of states. These carry no further structure and provide a simple form of memory.
  – A start state and a set of final states.
  – A finite set of input symbols (alphabet).
  – A finite set of transition rules, which specify how the machine in a particular state responds to a given input symbol.
Elements of finite automata
• The various components of a finite automaton are:
  – Input tape
  – Finite control
  – Reading head
• Input tape: divided into cells (squares), each of which can hold one symbol from the input alphabet.
• Finite control: indicates the current state and decides the next state on receiving a particular input from the input tape.
• The tape reader reads the cells one by one from left to right, one input symbol at a time.
• Reading head: examines the symbol read and moves to the right, with or without changing the state.
Finite automata
• Finite automata can be classified into two types:
  – Deterministic finite automaton (DFA)
  – Non-deterministic finite automaton (NDFA / NFA)

Deterministic finite automaton (DFA)
• A DFA, also known as a deterministic finite state machine, is a finite state machine that accepts or rejects finite strings of symbols and produces a unique computation (run) of the automaton for each input string.
• "Deterministic" refers to the uniqueness of the computation.
Deterministic finite automaton (DFA)
• Formal definition of a DFA
• A DFA is a 5-tuple M = (Q, Σ, δ, q0, F), where
  – Q: a finite set of states
  – Σ: an alphabet of input symbols
  – δ : Q × Σ → Q: a transition function
  – q0 ∈ Q: a start state
  – F ⊆ Q: a set of accepting (or final) states
Deterministic finite automaton (DFA)
• The input mechanism can move only from left to right and reads exactly one symbol on each step.
• The transitions from one internal state to another are governed by the transition function δ.
• If δ(q0, 0) = q1, then whenever the DFA is in state q0 and the current input symbol is 0, the DFA goes into state q1.
Example – DFA
• Q = {0, 1, 2}
• Σ = {a}
• 0 = start state
• 1 = final state
• The transition function is:
  – δ(0, a) = 1
  – δ(1, a) = 2
  – δ(2, a) = 2
Example
• Design a DFA whose recognized language is $L = \{a^n b : n \ge 0\}$.
• For this language, the strings are $b, ab, a^2b, a^3b, \ldots$
• Therefore the DFA accepts all strings consisting of an arbitrary number of a's followed by a single b.
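The Python sketch below gives one possible DFA for this language as a transition table. The state names q0, q1 and q2 are choices made for this sketch. So is the explicit dead state q2, which a transition diagram often leaves out.

```python
# One possible DFA for L = {a^n b : n >= 0} over Σ = {a, b}. The state names are
# assumptions: q0 = start (reading a's), q1 = accepting (read the single b),
# q2 = dead state (input continued after the b).
DELTA = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q2",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}
START, FINALS = "q0", {"q1"}


def accepts_anb(w):
    """Return True if w is in {a^n b : n >= 0}; w is assumed to be over {a, b}."""
    state = START
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in FINALS


for w in ["b", "ab", "aaab", "", "a", "ba", "abb"]:
    print(repr(w), accepts_anb(w))
```

The first three strings are accepted. The other four are rejected: the empty string and "a" end in q0, while "ba" and "abb" fall into the dead state q2.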
Non-deterministic finite automata (NFA)
• In automata theory, a nondeterministic finite automaton (NFA), or nondeterministic finite state machine, is a finite state machine where, from each state and a given input symbol, the automaton may jump into several possible next states.
• A non-deterministic finite state automaton (NFA) is a 5-tuple (Q, Σ, δ, s0, F), where:
  – Q is a finite set called the states;
  – Σ is a finite set called the alphabet;
  – δ : Q × (Σ ∪ {ε}) → 2^Q is the transition function;
  – s0 ∈ Q is the start state;
  – F ⊆ Q is the set of accept (final) states.
Example – NFA
• Please note that this is an NFA, since δ(q0, 0) = {q0, q1}: on input 0, state q0 can move to q0 or to q1.
DFA vs NFA:
S.No | DFA | NFA
1 | For every symbol of the alphabet, there is exactly one state transition from each state. | We do not need to specify how the NFA reacts to every symbol.
2 | A DFA cannot use empty-string (ε) transitions. | An NFA can use empty-string (ε) transitions.
3 | A DFA can be understood as one machine. | An NFA can be understood as multiple little machines computing at the same time.
4 | A DFA is more difficult to construct. | An NFA is easier to construct.
DFA representations
• There are different representations of a DFA, such as:
  – Graph: transition diagram
  – Tabular: transition table
  – Mathematical: detailed description
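Transition diagram
• A directed graph whose vertices are states and whose edges are labeled with symbols of the alphabet.
• A diagram consisting of circles to represent states and directed line segments to represent transitions between the states.
• The initial state and the final states are marked with special symbols (conventionally an incoming arrow for the initial state and a double circle for a final state).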
Transition table
• A state transition table is a table showing which state a finite state machine (or which set of states, in the case of an NFA) will move to, based on the current state and the current input.
  – Rows: states
  – Columns: inputs
  – Entries: next state
  – → marks the start state
  – * marks a final state
Detailed description
• The mathematical model of an automaton consists of:
  – Q: finite set of states
  – Σ: finite set of input symbols
  – δ : Q × Σ → Q: transition function
  – q0: start / initial state
  – F: set of final / accepting states


Transition functions
• The transition function returns the next state.
• Parameters:
  – current state
  – current input symbol
• δ(current state, current input symbol) → next state
• Example:
  δ(q0, 0) = q1
  δ(q0, 1) = q0
  δ(q1, 0) = q1
  δ(q1, 1) = q2
  δ(q2, 0) = q2
Example – DFA
• Determine the DFA schematic (transition diagram) for M = (Q, Σ, δ, q1, F), where Q = {q1, q2, q3}, Σ = {0, 1}, q1 is the start state, F = {q2}, and δ is given by the table below.
Language of accepted strings
• Consider the DFA shown in the figure below.
• Input strings: 01, 011
• Check the acceptability of each string.
Consider the following NFA:
Check the acceptability of the following strings:
i) 011
ii) 010
iii) 011011
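The NFA in the figure is not reproduced in this text. The Python sketch below therefore uses a hypothetical NFA over {0, 1} that accepts the strings containing 01. Like the earlier NFA example, it has δ(q0, 0) = {q0, q1}. The simulation tracks the set of all states the NFA could be in and accepts if that set contains a final state at the end.

```python
def nfa_accepts(delta, start, finals, w):
    """Simulate an NFA (no ε-moves) by tracking the set of current states."""
    current = {start}
    for symbol in w:
        # Union of δ(q, symbol) over every state q the NFA might be in.
        current = {r for q in current for r in delta.get((q, symbol), set())}
    return bool(current & finals)


# Hypothetical NFA: accepts strings over {0, 1} that contain 01.
delta = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
    ("q2", "0"): {"q2"}, ("q2", "1"): {"q2"},
}
for w in ["011", "010", "011011", "110"]:
    print(w, nfa_accepts(delta, "q0", {"q2"}, w))
```

For this particular NFA, 011, 010 and 011011 are accepted and 110 is rejected. The results for the NFA in the figure may differ.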
Regular expressions to NFA
1) The regular expression ε denotes the language {ε}, which contains only the empty string.
  – RE = ε
2) The regular expression ∅ denotes the empty language ∅; no strings belong to this language, not even the empty string.
  – RE = ∅
3) For any x in Σ, the regular expression x denotes the language {x}.
  – RE = x
4) For juxtaposition (concatenation), strings in L(r1) followed by strings in L(r2), we chain the NFAs together.
  – RE = r1r2
5) The "+" denotes "or" in a regular expression, so we use an NFA with a choice of paths.
  – RE = r1 + r2
6) The star (*) denotes zero or more applications of the regular expression, hence a loop has to be set up in the NFA.
  – RE = r*
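Rules 3 to 6 can be turned into a small Python program. The sketch below follows Thompson's construction and assumes the regular expression has already been parsed into nested tuples. It adds ε-edges around every fragment, whereas hand-drawn versions often merge some of these states. Rules 1 and 2 (ε and ∅) are omitted for brevity. Acceptance is checked by simulating the resulting ε-NFA with ε-closures.

```python
from itertools import count

# Regular expressions as nested tuples (a representation assumed for this sketch):
#   ("sym", x)  ("cat", r1, r2)  ("or", r1, r2)  ("star", r)
# An NFA is (start, accept, edges); each edge is (src, label, dst), label None = ε.


def thompson(regex):
    ids, edges = count(), []

    def build(node):
        start, accept = next(ids), next(ids)
        kind = node[0]
        if kind == "sym":            # rule 3: start --x--> accept
            edges.append((start, node[1], accept))
        elif kind == "cat":          # rule 4: chain the NFAs for r1 and r2
            s1, a1 = build(node[1])
            s2, a2 = build(node[2])
            edges.extend([(start, None, s1), (a1, None, s2), (a2, None, accept)])
        elif kind == "or":           # rule 5: a choice of two paths
            for sub in node[1:]:
                s, a = build(sub)
                edges.extend([(start, None, s), (a, None, accept)])
        elif kind == "star":         # rule 6: a loop, plus a bypass for zero repetitions
            s, a = build(node[1])
            edges.extend([(start, None, s), (a, None, s),
                          (a, None, accept), (start, None, accept)])
        else:
            raise ValueError(f"unknown node {node!r}")
        return start, accept

    start, accept = build(regex)
    return start, accept, edges


def eps_closure(states, edges):
    """States reachable from `states` using only ε-edges (including themselves)."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for src, label, dst in edges:
            if src == q and label is None and dst not in closure:
                closure.add(dst)
                stack.append(dst)
    return closure


def run_nfa(nfa, w):
    start, accept, edges = nfa
    current = eps_closure({start}, edges)
    for symbol in w:
        moved = {dst for src, label, dst in edges if src in current and label == symbol}
        current = eps_closure(moved, edges)
    return accept in current


# (0+1)* from Example 1 and (01+2*)0 from the exercise.
example_1 = ("star", ("or", ("sym", "0"), ("sym", "1")))
exercise_2 = ("cat", ("or", ("cat", ("sym", "0"), ("sym", "1")), ("star", ("sym", "2"))),
              ("sym", "0"))
print(run_nfa(thompson(example_1), "0110"), run_nfa(thompson(exercise_2), "2220"))
```

This prints `True True`. The same tuple form covers both worked examples that follow.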
Examples
1) Construct an NFA with ε-moves for the regular expression (0+1)*.
Solution:
The NFA is constructed step by step by breaking the regular expression into smaller regular expressions:
• r3 = (r1 + r2)
• R = r3*, where r1 = 0, r2 = 1
NFA for r1:
NFA for r2:
NFA for r3:


And finally NFA for the regular expression (0+1)* will be


Exercise
2) Construct an NFA with ε-moves for the regular expression (01+2*)0.
Solution:
The NFA is constructed step by step by breaking the regular expression into smaller regular expressions:
• R = (r1 + r2)r3, where r1 = 01, r2 = 2* and r3 = 0


Equivalence of NFA and DFA
• Two finite accepters M1 and M2 are equivalent iff L(M1) = L(M2), i.e., if both accept the same language.
• DFAs and NFAs recognize the same class of languages (the regular languages).
• It is important to note that every NFA has an equivalent DFA.
Problem statement
• Let X = (Qx, Σ, δx, q0, Fx) be an NDFA that accepts the language L(X).
• We have to design an equivalent DFA Y = (Qy, Σ, δy, q0, Fy) such that L(Y) = L(X).

NDFA to DFA conversion: subset construction algorithm
• Input: an NDFA
• Output: an equivalent DFA
• Step 1: Create the state table from the given NDFA.
• Step 2: Create a blank state table under the possible input alphabets for the equivalent DFA.
• Step 3: Mark the start state of the DFA by q0 (same as the NDFA).
• Step 4: Find out the combination of states {Q0, Q1, …, Qn} for each possible input alphabet.
• Step 5: Each time we generate a new DFA state under the input alphabet columns, apply Step 4 again; otherwise go to Step 6.
• Step 6: The states that contain any of the final states of the NDFA are the final states of the equivalent DFA.
• Let us illustrate the conversion of an NFA (NDFA) to a DFA through an example.
Example
Steps for converting an NFA with ε to a DFA
• The ε-closure of a given state A is the set of states that can be reached from A using only ε (null) moves, including A itself.
01: Take the ε-closure of the NFA's start state as the start state of the DFA.
02: For each input symbol, find the states that can be reached from the present DFA state, i.e., the union of the ε-closures of the transition values of every NFA state contained in the current DFA state.
03: If a new state is found, take it as the current state and repeat Step 02.
04: Repeat Steps 02 and 03 until no new state appears in the transition table of the DFA.
05: Mark as final every DFA state that contains at least one final state of the NFA.
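Steps 01 to 05 map onto a short Python function. In this sketch, NFA transitions are a dict from (state, symbol) to a set of states. The empty string `""` is used as the ε label, which is a convention of the sketch. Each DFA state is a frozenset of NFA states, and the empty set plays the role of a dead state.

```python
from collections import deque

EPS = ""  # label used for ε-moves in this sketch (an assumption)


def eps_closure(states, delta):
    """All NFA states reachable from `states` using only ε-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, EPS), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)


def subset_construction(delta, start, finals, alphabet):
    """Convert an ε-NFA {(state, symbol): set_of_states} into a DFA over subsets."""
    dfa_start = eps_closure({start}, delta)             # step 01
    dfa_delta, seen, queue = {}, {dfa_start}, deque([dfa_start])
    while queue:                                        # steps 03-04
        current = queue.popleft()
        for a in alphabet:                              # step 02
            moved = set()
            for q in current:
                moved |= delta.get((q, a), set())
            target = eps_closure(moved, delta)
            dfa_delta[(current, a)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    dfa_finals = {S for S in seen if S & finals}        # step 05
    return dfa_start, dfa_delta, dfa_finals


# Hypothetical ε-NFA for a*b*: q0 loops on a, an ε-move leads to q1, q1 loops on b.
delta = {("q0", "a"): {"q0"}, ("q0", EPS): {"q1"}, ("q1", "b"): {"q1"}}
start, dfa_delta, dfa_finals = subset_construction(delta, "q0", {"q1"}, "ab")
for (state, symbol), target in dfa_delta.items():
    print(sorted(state), symbol, "->", sorted(target))
```

For this ε-NFA, the DFA has three states: {q0, q1} (the start state), {q1}, and the empty dead state. The first two are final.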
Example
(continued)

DFA minimization
• DFA minimization is the task of transforming a given deterministic finite automaton (DFA) into an equivalent DFA that has the minimum number of states.

There are two popular methods for minimizing a DFA:

1. Minimization of DFA using the equivalence theorem
01: Eliminate all dead states and inaccessible states from the given DFA (if any).
• Dead state
  – A non-final state that transitions to itself on every input symbol in Σ is called a dead state.
• Inaccessible state
  – A state that can never be reached from the initial state is called an inaccessible state.
02: Now start applying the equivalence theorem.
• Take a counter variable k and initialize it to 0. P0 is the initial partition into two sets: the non-final states and the final states.
03: Increment k by 1.
• Find Pk by partitioning the different sets of Pk−1.
• In each set of Pk−1, consider all possible pairs of states within the set. If two states are distinguishable, i.e., on some input symbol they move to states in different sets of Pk−1, place them in different sets of Pk.
04: Repeat Step 03 until no change in the partition occurs.
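The Python sketch below implements these steps as partition refinement. It removes inaccessible states by a reachability search. It does not delete dead states, because in a complete DFA they simply end up merged into one block. The example DFA over {0, 1} is hypothetical. In it, F is unreachable and A and C are equivalent.

```python
def minimize_by_partition(alphabet, delta, start, finals):
    """Equivalence-theorem minimization: refine P0 = {Q - F, F} until stable."""
    # Step 01 (inaccessible states): keep only states reachable from the start.
    reachable, stack = {start}, [start]
    while stack:
        q = stack.pop()
        for a in alphabet:
            r = delta[(q, a)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)
    # Step 02: P0 separates the non-final states from the final states.
    partition = [block for block in (reachable - finals, reachable & finals) if block]
    while True:                                              # steps 03-04
        block_of = {q: i for i, block in enumerate(partition) for q in block}
        refined = []
        for block in partition:
            groups = {}
            for q in block:
                # States stay together only if each symbol leads into the same set.
                key = tuple(block_of[delta[(q, a)]] for a in alphabet)
                groups.setdefault(key, set()).add(q)
            refined.extend(groups.values())
        if len(refined) == len(partition):                   # no change: done
            return refined
        partition = refined


delta = {
    ("A", "0"): "B", ("A", "1"): "C", ("B", "0"): "B", ("B", "1"): "D",
    ("C", "0"): "B", ("C", "1"): "C", ("D", "0"): "B", ("D", "1"): "E",
    ("E", "0"): "B", ("E", "1"): "C", ("F", "0"): "E", ("F", "1"): "F",
}
print(minimize_by_partition("01", delta, "A", {"E"}))
```

The successive partitions are P0 = {A, B, C, D}, {E}, then P1 = {A, B, C}, {D}, {E}, then P2 = {A, C}, {B}, {D}, {E}, and P3 = P2. The minimal DFA therefore has four states.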
Example – Minimize the given DFA using the equivalence theorem
Example #2
2. DFA minimization using the Myhill-Nerode theorem (table-filling method)
• Steps
01: Draw a table for all pairs of states (Qi, Qj), not necessarily connected directly. [All pairs are unmarked initially.]
02: Consider every state pair (Qi, Qj) in the DFA where Qi ∈ F and Qj ∉ F, or vice versa, and mark it. [Here F is the set of final states.]
03: Repeat this step until no more pairs can be marked:
• If there is an unmarked pair (Qi, Qj), mark it if the pair {δ(Qi, A), δ(Qj, A)} is marked for some input symbol A.
Example – DFA minimization using the Myhill-Nerode theorem (table-filling method)
• To understand the concept of minimization using the Myhill-Nerode theorem, let us take an example.
• To minimize the above DFA using the Myhill-Nerode theorem, follow these steps:
• Initially, an X is placed in each entry corresponding to one final state and one non-final state, in the following format:
[Table of state pairs, e.g. (A, E), (B, H)]

DFA minimization example
A language for specifying lexical analyzers
• There is a wide range of tools for constructing lexical analyzers.
  – Lex
• Lex is a computer program that generates lexical analyzers.
• Lex is commonly used with the yacc parser generator.
A language for specifying lexical analyzers
• Lex specification or structure
• A Lex program has the following form:

Auxiliary definitions
D1 = R1
D2 = R2
…
Dn = Rn

• Each Di is a distinct name and each Ri is a regular expression whose symbols are chosen from Σ ∪ {D1, D2, …, Di−1}.
• Example:
  letter = A|B|…|Z
  digit = 0|1|…|9
  identifier = letter(letter | digit)*
Translation rules
P1 {A1}
P2 {A2}
…
Pn {An}

• Each Pi is a regular expression, called a token pattern, over the alphabet consisting of Σ and the auxiliary definition names.
• Example:
  ab* (for input symbols a, b)
  if, then, else (for keywords)
• Each Ai is a program fragment describing what action the lexical analyzer should take when token Pi is found.
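When several translation rules could apply, a Lex-generated scanner picks the longest matching prefix. If several patterns match that same longest prefix, the rule listed first wins. The Python sketch below imitates this rule. The rule list (keywords before identifiers, a `relop` token) is hypothetical, and the sketch shows only the matching discipline; it is not Lex itself.

```python
import re

# Hypothetical translation rules P_i {A_i}, in priority order. Each action
# returns a token, or None to discard the lexeme (as with white space in Lex).
RULES = [
    (re.compile(r"[ \t\n]+"), lambda s: None),
    (re.compile(r"if|then|else"), lambda s: ("keyword", s)),
    (re.compile(r"[A-Za-z][A-Za-z0-9]*"), lambda s: ("id", s)),   # letter(letter|digit)*
    (re.compile(r"[0-9]+"), lambda s: ("num", int(s))),
    (re.compile(r"<=|<|="), lambda s: ("relop", s)),
]


def lex(text):
    pos, tokens = 0, []
    while pos < len(text):
        # Longest match wins; on a tie the earlier rule wins (strict >), so
        # "if" is a keyword while "ifx" is an identifier.
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(text, pos)
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        if best_action is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        token = best_action(text[pos:pos + best_len])
        if token is not None:
            tokens.append(token)
        pos += best_len
    return tokens


print(lex("if x1 <= 10 then ifx = 2 else y = 3"))
```

In the output, `if`, `then` and `else` come out as keywords and `ifx` as an identifier, which shows the two disambiguation rules at work.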
Creating a lexical analyzer
• First, a specification of a lexical analyzer is prepared by creating a program lex.l in the Lex language.
• Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
• Finally, lex.yy.c is run through the C compiler to produce an object program a.out, which is the lexical analyzer that transforms an input stream into a sequence of tokens.
Recognition of tokens
• Recognizers
  – Tokens can be recognized by finite automata.
  – A recognition device reads input strings over the alphabet of the language and decides whether the input strings belong to the language.
  – Example: the syntax analysis part of a compiler.
  – Compilers and interpreters recognize syntax and convert it into a machine-understandable form.
• Generators
  – A device that generates sentences of a language.
  – One can determine whether a particular sentence is syntactically correct by comparing it to the structure of the generator.
  – Example: regular expression
Implementation of a lexical analyzer
• Three general approaches to implementing a lexical analyzer:
  – By using a lexical-analyzer generator (e.g., Lex):
    • The generator provides routines for reading and buffering the input.
  – By writing the lexical analyzer in a high-level language.
  – By writing the lexical analyzer in a low-level language.
