Wachemo University
Institute of Technology
Department of Computer Science
Course Title: Compiler Design (CoSc4103)
Chapter Two: Lexical Analysis and Lex
By: Tseganesh M.(MSc.)
Compiler Design (CoSc4103)
Chapter Two
Lexical Analysis and Lex
Outline
2.1. The role of the lexical analyzer
2.2. Token: Specification and Recognition of Tokens
2.3. Lexical Error Recovery
2.4. Finite Automata: NFA to DFA Conversion
2.5. A Typical Lexical Analyzer Generator
2.1. The role of the Lexical Analyzer
Lexical analysis is the first phase of a compiler.
A lexical analyzer is also called a "Scanner".
The input to a lexical analyzer is the pure high-level code from the preprocessor.
Main functions of Lexical analyzer
1st task: read the given source code from left to right, character by character, and produce a
sequence of tokens that are used for syntax analysis,
i.e., the output of lexical analysis is a stream of tokens, which is the input to the parser.
2nd task: remove comments and white space from the source code, in the form of blank,
tab, and newline characters.
Another task: generate an error message if it finds an invalid token in the source program.
It identifies valid lexemes from the program and returns tokens to the syntax analyzer,
one after the other, in response to each getNextToken command from the syntax
analyzer.
[Figure: the lexical analyzer reads characters from the source program (the entire program is read into memory, with the ability to put a character back), returns a token and token value to the parser on each getNextToken request, and the result flows on to semantic analysis; both the lexical analyzer and the parser consult the symbol table, e.g. for id entries.]
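To make this interface concrete, here is a minimal C sketch of the token stream handed to the parser; the type and function names (Token, getNextToken, TOK_*) are illustrative assumptions, not a fixed standard:

/* Hypothetical token representation: a token name plus an optional
   attribute value (e.g., a symbol-table index or a numeric constant). */
typedef enum { TOK_ID, TOK_NUM, TOK_ASSIGN, TOK_EOF } TokenName;

typedef struct {
    TokenName name;      /* which token class the lexeme belongs to   */
    int       attribute; /* e.g., symbol-table index or literal value */
} Token;

/* Stub scanner, for illustration only: a real one reads characters
   from the source program and groups them into lexemes. */
Token getNextToken(void) {
    Token t = { TOK_EOF, 0 };
    return t;
}

/* The parser pulls one token at a time until end of input. */
void parse(void) {
    for (Token t = getNextToken(); t.name != TOK_EOF; t = getNextToken()) {
        /* grammar-driven processing of token t goes here */
    }
}

int main(void) { parse(); return 0; }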
Lexical Analyzer cont’d……
The lexical analyzer works closely with the syntax analyzer.
But there are some reasons for separating lexical analysis from parsing:
Simplicity of design
Improving compiler efficiency
Enhancing compiler portability (e.g. Linux to Win)
When you work on lexical analysis, there are three important terms to know:
lexeme, pattern, and token.
Token, Pattern, Lexeme
Lexeme: a sequence of characters in the source program that matches the
pattern of a token.
Pattern: the set of rules that the scanner follows to identify a valid lexeme for a token.
A pattern describes what can be a token, and
these patterns can be defined by means of regular expressions.
Token: a set of strings defining an atomic element with a defined meaning.
It is a pre-defined sequence of characters that cannot be broken down further.
A token has a token name and an optional token/attribute value.
Lexical Analyzer cont’d……
Some examples of tokens, lexemes, and patterns:
Token        Lexeme   Pattern
Keyword      while    w-h-i-l-e
Relop        <        <, >, >=, <=, !=, ==
Integer      7        (0-9)+  (a sequence of digits with at least one digit)
String       "Hi"     characters enclosed by " "
Punctuation  ,        ; , . ! etc.
Identifier   number   a sequence of letters and digits beginning with a letter
But here are some questions raised by these tasks of the lexical analyzer:
How does the lexical analyzer read the input string and break it into lexemes?
How can it understand the patterns and check if the lexemes are valid?
What does the Lexical Analyzer send to the next phase?
2.2. Token: Specification and Recognition of Tokens
In a programming language, keywords, constants, identifiers, strings, numbers, whitespace,
operators, and punctuation symbols are considered tokens.
For example, in C or C++ language, the variable declaration line
int value = 100;
contains the tokens:
int (keyword), value (identifier), = (operator), 100 (constant) and ; (symbol).
Attributes of Token
In a program, sometimes more than one lexeme matches the pattern corresponding to one token,
so the lexical analyzer must provide additional information about the particular lexeme,
because the later phases need this additional information about the lexeme to perform
different operations.
Lexical analyzer collects information about tokens into their associated attributes and sends a
sequence of tokens with their information to the next phase.
i.e., the tokens are sent as a pair of <Token name, Attribute value> to the Syntax
analyzer
Tokens cont’d……
Example: the tokens and associated attribute values for the FORTRAN statement
E = M * C ** 2
are written below as a sequence of pairs:
<id, pointer to symbol table entry for E>
<assign-op>
<id, pointer to symbol table entry for M>
<mult-op>
<id, pointer to symbol table entry for C>
<exp-op>
<number, integer value 2>

Token   Attribute
ID      index of symbol-table entry for E
=       (none)
ID      index of symbol-table entry for M
*       (none)
ID      index of symbol-table entry for C
**      (none)
NUM     integer value 2
A lexeme is like an instance of a token, and the attribute column is used to show which lexeme
of the token is used.
For every lexeme, the 1st and 2nd columns of the above table are sent to the Syntax Analyzer.
Tokens cont’d……
Specifications of Tokens
To answer the question “how the lexical analyzer can check the validity of lexemes with
tokens”, it is critical to know the following specifications of tokens:
1) Alphabet
2) Strings
3) Special symbols
4) Language
5) Regular expression
6) etc……
Let us understand how the language theory undertakes these terms:
1. Alphabets
Any finite set of symbols
{0,1} is a set of binary alphabets,
{0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F} is a set of Hexadecimal alphabets,
{a-z, A-Z} is a set of English language alphabets.
2. Strings
Any finite sequence of alphabets (characters) is called a string.
A string over some alphabet is a finite sequence of symbols drawn from that alphabet.
Tokens cont’d……
In language theory, the terms sentence and word are often used as synonyms for the term
"string."
Length of a string S is the total number of occurrences of symbols in it, denoted by |S|,
e.g., the length of the string compiler is 8 and is denoted by |compiler| = 8
A string having no alphabets, i.e. a string of zero length is known as an empty string and is
denoted by ε (epsilon).
3. Special symbols
A typical high-level language contains the following special symbols:-
Arithmetic symbols:   Addition (+), Subtraction (-), Modulo (%), Multiplication (*), Division (/)
Punctuation:          Comma (,), Semicolon (;), Dot (.), Arrow (->)
Assignment:           =
Special assignment:   +=, /=, *=, -=
Comparison:           ==, !=, <, <=, >, >=
Preprocessor:         #
Location specifier:   &
Logical:              &, &&, |, ||, !
Shift operators:      >>, >>>, <<, <<<
Tokens cont’d……
4. Language
A language is a set of strings over some fixed, finite alphabet.
Computer languages are considered sets, and mathematical set operations can be
performed on them.
Regular languages can be described by means of regular expressions.
5. Regular Expressions
Regular expressions are an important notation to specify lexeme patterns for a token.
Each pattern matches a set of strings, so regular expressions serve as names for a set of
strings.
Regular expressions are used to represent the language for the lexical analyzer.
The lexical analyzer needs to scan and identify only the finite set of valid strings/tokens/lexemes
that belong to the language in hand.
It searches for the pattern defined by the language rules.
A grammar defined by regular expressions is known as regular grammar
The language defined by regular grammar is known as regular language.
Tokens cont’d……
Programming language tokens can be described by regular languages.
There are a number of algebraic laws obeyed by regular expressions; they are built from the
operations on languages described below.
Operations on languages
There are several important operations that can be applied to languages.
Union of two languages L and M is written as;
L U M = {s | s is in L or s is in M}
Concatenation of two languages L and M is written as;
LM = {st | s is in L and t is in M}
Kleene closure of a language L is written as;
L* = Zero or more occurrence of language L
Example: the following shows these operations applied to two small languages:
Let L={0,1} and S={a,b,c}
Union : L U S={0,1,a,b,c}
Concatenation : L.S={0a,1a,0b,1b,0c,1c}
Kleene closure : L*={ ε,0,1,00….}
Positive closure : L+={0,1,00….}
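As an illustrative sketch in C, with the two example languages hard-coded, the union and concatenation above can be enumerated directly (Kleene closure is not enumerated because it is an infinite set):

#include <stdio.h>

int main(void) {
    const char *L[] = { "0", "1" };       /* L = {0, 1}    */
    const char *S[] = { "a", "b", "c" };  /* S = {a, b, c} */

    /* Union: every string that is in L or in S. */
    printf("L U S = { ");
    for (int i = 0; i < 2; i++) printf("%s ", L[i]);
    for (int j = 0; j < 3; j++) printf("%s ", S[j]);
    printf("}\n");

    /* Concatenation: every string st with s in L and t in S. */
    printf("L.S = { ");
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 3; j++)
            printf("%s%s ", L[i], S[j]);
    printf("}\n");
    return 0;
}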
Tokens cont’d……
In lexical analysis, using regular expressions it is possible to represent:
i. valid tokens of a language,
ii. occurrences of symbols, and
iii. language tokens;
i. Representing valid tokens of a language in regular expression
If x is a regular expression, then:
x* means zero or more occurrences of x,
i.e., it can generate { ε, x, xx, xxx, xxxx, … }
x+ means one or more occurrences of x,
i.e., it can generate { x, xx, xxx, xxxx, … }; equivalently, x+ = x.x*
x? means at most one occurrence of x,
i.e., it can generate either {x} or {ε}.
[a-z] is all lower-case alphabets of English language.
[A-Z] is all upper-case alphabets of English language.
[0-9] is all natural digits used in mathematics.
Tokens cont’d……
ii. Representation of occurrence of symbols using regular expressions
letter = [a-z] | [A-Z]
digit = 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9, i.e., [0-9]
sign = [+ | -]
iii. Representation of language tokens using regular expressions
Decimal = (sign)? (digit)+
Identifier = (letter) (letter | digit)*
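As a sketch of what the Decimal pattern means operationally, the hypothetical C function below accepts exactly the strings matching (sign)?(digit)+:

#include <ctype.h>

/* Returns 1 if s matches (sign)?(digit)+ : an optional leading
   + or - followed by one or more digits, and nothing else. */
int is_decimal(const char *s) {
    if (*s == '+' || *s == '-') s++;            /* (sign)? : at most one sign */
    if (!isdigit((unsigned char)*s)) return 0;  /* (digit)+ needs at least one */
    while (isdigit((unsigned char)*s)) s++;     /* consume the remaining digits */
    return *s == '\0';                          /* nothing may follow */
}

For example, is_decimal("-42") returns 1, while is_decimal("4x2") and is_decimal("+") return 0.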
However, the only problem left with the lexical analyzer is how to verify the validity of a
regular expression used in specifying the patterns of keywords of a language.
A well-accepted solution to this problem is to use finite automata for verification.
To recognize and verify the tokens, the lexical analyzer builds Finite Automata for every pattern.
Transition diagrams can be built and converted into programs as an intermediate step.
Each state in the transition diagram represents a piece of code.
Every identified lexeme walks through the Automata.
The programs built from Automata can consist of switch statements to keep track of the state of the
lexeme. The lexeme is verified to be a valid token if it reaches the final state.
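For instance, a transition diagram for Identifier = (letter)(letter | digit)* has a start state and one accepting state; a hypothetical hand translation into a C switch statement might look like this:

#include <ctype.h>

/* State 0 = start, state 1 = accepting. The lexeme is a valid
   identifier iff the walk ends in the accepting state. */
int is_identifier(const char *s) {
    int state = 0;
    for (; *s != '\0'; s++) {
        switch (state) {
        case 0:                                    /* first symbol must be a letter */
            if (isalpha((unsigned char)*s)) state = 1;
            else return 0;
            break;
        case 1:                                    /* letters or digits may follow */
            if (isalnum((unsigned char)*s)) state = 1;
            else return 0;
            break;
        }
    }
    return state == 1;   /* valid token iff we reached the final state */
}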
2.3. Lexical Error Recovery
Lexical errors:
are the type of error that can be detected during the lexical analysis phase
a lexical error is a sequence of characters that does not match the pattern of any token,
and so cannot be scanned into any valid token
they are thrown by the lexer when it is unable to continue, i.e., when there is no way to
recognize a lexeme as a valid token
Lexical errors are not very common, but they should be managed by the scanner.
Some common lexical errors in the lexical phase are:
spelling errors in identifiers, operators, keywords, etc.
appearance of some illegal character
exceeding the length limit of identifiers or numeric constants
removal of a character that should be present
replacement of a character with an incorrect character
transposition of two characters
Lexical Error cont’d……
Example: consider this C code:
void main() {
    int x = 10, y = 20;
    char *a;
    a = &x;
    x = 1xab;
}
In this code, 1xab is neither a number nor an identifier,
so this code will show a lexical error.
Lexical error recovery: there are some recovery mechanisms to remove lexical errors.
Some possible error-recovery actions, illustrated by repairs of the misspelled "cout"
(a sketch of action i follows this list), are:
i. deleting an unnecessary character, e.g. coutt -> cout
ii. inserting a missing character, e.g. cot -> cout
iii. replacing an incorrect character by a correct character, e.g. couf -> cout
iv. transposing two adjacent characters, e.g. ocut -> cout
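As a rough sketch of action i only (the other three actions have the same shape), the hypothetical C function below tests whether deleting one character of a misspelled lexeme yields a known keyword:

#include <string.h>

/* Returns 1 if deleting exactly one character of lexeme yields
   keyword (e.g. "coutt" -> "cout"); returns 0 otherwise. */
int fixable_by_deletion(const char *lexeme, const char *keyword) {
    size_t n = strlen(lexeme);
    if (n != strlen(keyword) + 1) return 0;
    for (size_t skip = 0; skip < n; skip++) {      /* try deleting each position */
        size_t j, k;
        for (j = 0, k = 0; j < n; j++) {
            if (j == skip) continue;               /* the character being deleted */
            if (lexeme[j] != keyword[k++]) break;  /* mismatch: try next position */
        }
        if (j == n) return 1;                      /* all remaining characters matched */
    }
    return 0;
}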
However, a few errors are beyond the power of the lexical analyzer to recognize, because a
lexical analyzer has a very localized view of the source program; some other phase of the
compiler handles such errors.
For instance, if the string fi is encountered for the first time in a C/C++ program in the context of:
fi (a == b) …
a lexical analyzer cannot tell whether fi is a misspelling of the keyword if or an undeclared
function identifier.
2.4. Finite Automata: NFA to DFA Conversion
A finite automaton is a state machine that takes a string of symbols as input and changes its
state accordingly.
A finite automaton is a recognizer for regular expressions.
When a regular expression string is fed into a finite automaton, it changes its state for each literal.
If the input string is successfully processed and the automaton reaches its final state, the string
is accepted,
i.e., the string that was fed is said to be a valid token of the language in hand.
Regular expressions = the specification
Finite automata = the implementation
A finite automaton consists of
An input alphabet Σ
A set of states S
A start state n
A set of accepting states F ⊆ S
A set of transitions of the form: state --input--> state
Automata: NFA to DFA cont’d……
Transition: s1 --a--> s2
This can be read as: in state s1, on input "a", go to state s2.
At the end of the input:
if in an accepting state => accept; otherwise => reject.
If no transition is possible => reject.
Finite automata state graphs can be built up using:
a state (drawn as a circle)
the start state (marked with an incoming arrow)
an accepting state (drawn as a double circle)
a transition (an arrow between states labeled with an input symbol, e.g. "a")
Simple Example: a finite automaton that accepts only "1"
[Diagram: the start state has a single transition on input 1 to an accepting state.]
Automata: NFA to DFA cont’d……
A finite automaton accepts a string if we can follow transitions labeled with the characters in the
string from the start to some accepting state
Another Example: a finite automaton accepting any number of 1's followed by a single 0
Alphabet: {0, 1}
[Diagram: the start state has a self-loop on input 1 and a transition on input 0 to an accepting state.]
Check that "1110" is accepted by this finite automaton.
Exercise: given the alphabet {0,1}, what language is recognized by this automaton machine?
[Exercise diagram omitted.]
Epsilon Moves
Another kind of transition: ε-moves
A --ε--> B: here the machine can move from state A to state B without
reading any input.
Automata: NFA to DFA cont’d……
Types of Finite Automata
i. Non-Deterministic Automata (NFA).
ii. Deterministic Automata (DFA)
i. Nondeterministic Finite Automata (NFA)
Can have multiple transitions for one input in a given state
Can have ε-moves
An NFA accepts if it can reach a final state
ii. Deterministic Finite Automata (DFA):
A DFA is a special case of an NFA in which:
it has at most one transition per input from any state, and
it has no ε-moves, i.e., no transitions on input ε.
A DFA is formally defined by the 5-tuple notation M = (Q, Σ, δ, q0, F), where
Q is a finite, non-empty set of states,
Σ is the input alphabet (the input set),
q0 is the initial state, with q0 ∈ Q,
F ⊆ Q is the set of final states, and
δ is the transition (mapping) function; using this function the next state can be determined.
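For example, the earlier automaton that accepts any number of 1's followed by a single 0 can be written as M = ({q0, q1}, {0, 1}, δ, q0, {q1}), with δ(q0, 1) = q0 and δ(q0, 0) = q1; to make δ total, the remaining transitions go to a non-accepting dead state.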
Automata: NFA to DFA cont’d……
Reading assignment
Execution of finite automata
Details of NFA vs. DFA
Converting a regular expression into a minimized DFA
Regular expressions to finite automata
NFA to DFA conversion
Implementation of DFA
You can refer to further resources for a detailed elaboration.
2.5. Lexical Analyzer Generator
Creating a lexical analyzer with Lex:
First, a lexical analyzer is prepared by creating a program lex.l in the Lex language.
Then, lex.l is run through the Lex compiler to produce a C program lex.yy.c.
Finally, lex.yy.c is run through the C compiler to produce an object program a.out;
a.out is the lexical analyzer that transforms an input stream into a sequence of tokens.
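On a typical Unix system, assuming the classic lex (or flex) tool and a C compiler are installed, the same pipeline is driven by commands such as:

lex lex.l           (produces lex.yy.c)
cc lex.yy.c -ll     (compiles it, linking the Lex library; use -lfl with flex)
./a.out             (runs the generated lexical analyzer)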
Lexical Analyzer cont’d……
■ Lex Specification: a Lex program consists of three parts:
{ definitions }
%%
{ rules }
%%
{ user subroutines }

For example, the first two parts of a vowel/consonant counter look like:

%{
int vowels = 0, cons = 0;
%}
%%
[aeiouAEIOU] {vowels++;}
[a-zA-Z]     {cons++;}
%%

where,
■ Definitions include declarations of variables, constants, and regular definitions
■ Rules are statements of the form p1 {action1} p2 {action2} … pn {actionn}
■ where each pi is a regular expression and
■ action describes what action the lexical analyzer should take when pattern pi matches a
lexeme.
■ Actions are written in C code.
■ User subroutines are auxiliary procedures needed by the actions.
■ These can be compiled separately and loaded with the lexical analyzer.
Lexical Analyzer cont’d……
■ Consider the following Lex program, which counts vowels and consonants:

%{
int vowels = 0;
int cons = 0;
%}

%%
[aeiouAEIOU] {vowels++;}
[a-zA-Z]     {cons++;}
%%

int yywrap() {
    return 1;
}

int main() {
    printf(" Enter any string to count vowels and consonants; at end press ^d\n");
    yylex();
    printf("no: of vowels are: %d\n", vowels);
    printf("no of consonants: %d\n", cons);
    return 0;
}

Steps to execute this Lex program:
First write the source code in the EditPlusPortable editor or any editor with Lex tools, then:
Tools -> 'Lex File Compiler'
Tools -> 'Lex Build'
Tools -> 'Open CMD'
Then in the command prompt type 'name_of_file.exe' (e.g., '[Link]') and press Enter.
Then enter your whole input and press Enter.
Finally press Ctrl + Z and press Enter; then you see the output.
Lexical Analyzer cont’d……
■ The output for the above program will look like: [output screenshot omitted]
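For example, a hand-reconstructed run on the input "hello world" would look roughly like this (blanks and newlines fall through to Lex's default echo rule):

 Enter any string to count vowels and consonants; at end press ^d
hello world
no: of vowels are: 3
no of consonants: 7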
Next class
Chapter 3: Syntax Analysis
Outline
3.1. Role of a parser
3.2. Parsing
3.3. Types of parsing
3.4. Parser Generator: Yacc