The document discusses the concepts of regular expressions and regular languages, emphasizing their role in defining patterns for finite strings and programming language tokens. It explains operations on languages such as union, concatenation, and Kleene closure, along with the precedence and associativity of these operations. Additionally, it covers finite automata as a mathematical model for recognizing regular expressions and details the construction of finite automata, including states, transitions, and final states.
Regular expressions
Compiler Design
Count the number of tokens in the following C code
Regular Expression and Regular Languages
• The lexical analyzer needs to scan and identify only a finite set of valid strings, tokens, or lexemes that belong to the language at hand.
• It searches for the patterns defined by the language rules.
• Regular expressions can describe these languages by defining a pattern for finite strings of symbols.
• The grammar defined by regular expressions is known as regular grammar. The language defined by regular grammar is known as regular language.
• Programming language tokens can be described by regular languages.
• Regular expressions obey a number of algebraic laws, which can be used to manipulate regular expressions into equivalent forms.
The various operations on languages are:
• Union of two languages L and M is written as L ∪ M = {s | s is in L or s is in M}
• Concatenation of two languages L and M is written as LM = {st | s is in L and t is in M}
• The Kleene closure of a language L is written as L* = the set of strings formed by concatenating zero or more strings of L.

Notations
If r and s are regular expressions denoting the languages L(r) and L(s), then
• Union : (r)|(s) is a regular expression denoting L(r) ∪ L(s)
• Concatenation : (r)(s) is a regular expression denoting L(r)L(s)
• Kleene closure : (r)* is a regular expression denoting (L(r))*
• (r) is a regular expression denoting L(r)

Precedence and Associativity
• *, concatenation (.), and | (pipe sign) are left associative
• * has the highest precedence
• Concatenation (.) has the second highest precedence.
• | (pipe sign) has the lowest precedence of all.

Representing valid tokens of a language in regular expression
If x is a regular expression, then:
• x* means zero or more occurrences of x, i.e., it can generate { ε, x, xx, xxx, xxxx, … }
• x+ means one or more occurrences of x, i.e., it can generate { x, xx, xxx, xxxx, … }; it is equivalent to x.x*
• x? means at most one occurrence of x, i.e., it can generate { ε, x }
• [a-z] is any lower-case letter of the English alphabet.
• [A-Z] is any upper-case letter of the English alphabet.
• [0-9] is any decimal digit.

Note: in the solutions below, + written between two expressions denotes union (the same as |), while + written after an expression denotes one or more occurrences.

Write the regular expression for the language accepting all strings containing any number of a's and b's.
Solution: The regular expression will be: R = (a + b)*
This gives the set L = {ε, a, aa, b, bb, ab, ba, aba, bab, .....}, any combination of a and b. The expression (a + b)* generates every combination of a and b, including the null string.

Write the regular expression for the language accepting all combinations of a's except the null string, over the set Σ = {a}
Solution: The regular expression has to be built for the language L = {a, aa, aaa, ....} This set indicates that there is no null string.
So we can denote the regular expression as: R = a+

Write the regular expression for the language accepting all combinations of a's, over the set Σ = {a}
Solution: All combinations of a's means a may appear zero times, once, twice, and so on. If a appears zero times, the result is the null string. That is, we expect the set {ε, a, aa, aaa, ....}. So the regular expression is: R = a*, the Kleene closure of a.

Write the regular expression for the language accepting all strings which start with 1 and end with 0, over Σ = {0, 1}.
Solution: In the regular expression, the first symbol should be 1 and the last symbol should be 0. The regular expression is: R = 1 (0+1)* 0

Write the regular expression for the language starting and ending with a and having any combination of b's in between.
Solution: The regular expression will be: R = a b* a

Write the regular expression for the language starting with a but not having consecutive b's.
Solution: The regular expression has to be built for the language: L = {a, aa, ab, aaa, aab, aba, abab, .....}
The regular expression for the above language is: R = (a + ab)(a + ab)*, i.e., one or more repetitions of a or ab. The null string is excluded because it does not start with a.

Regular Expression and Regular Languages
• The language accepted by a finite automaton can be described by simple expressions called regular expressions.
• They are a compact and effective way to represent a regular language.
• A language denoted by some regular expression is referred to as a regular language.
• A regular expression can also be described as a pattern that defines a set of strings.
• Regular expressions are used to match character combinations in strings.
• String-searching algorithms use such patterns for find or find-and-replace operations on strings.

Finite automata
• A finite automaton is a state machine that takes a string of symbols as input and changes its state accordingly.
• A finite automaton is a recognizer for regular expressions.
• When a string is fed into a finite automaton, it changes its state for each literal (input symbol).
• If the input string is successfully processed and the automaton reaches a final state, the string is accepted, i.e., the string just fed is a valid token of the language at hand.

The mathematical model of finite automata consists of:
• Finite set of states (Q)
• Finite set of input symbols (Σ)
• One start state (q0)
• Set of final states (qf)
• Transition function (δ)

Finite automata Construction
Let L(r) be a regular language recognized by some finite automaton (FA).
• States : States of the FA are represented by circles. The name of each state is written inside its circle.
• Start state : The state from which the automaton starts is known as the start state. The start state has an arrow pointing towards it.
• Intermediate states : Every intermediate state has at least two arrows: one pointing to it and another pointing out from it.
• Final state : If the input string is successfully parsed, the automaton is expected to be in this state. A final state is represented by double circles. It may have any number of arrows pointing to it and any number of arrows pointing out from it.
• Transition : The transition from one state to another happens when the desired symbol is found in the input. Upon a transition, the automaton can either move to the next state or stay in the same state. Movement from one state to another is shown as a directed arrow that points to the destination state. If the automaton stays in the same state, an arrow pointing from the state to itself is drawn.

Finite automata Construction
Regular expressions: a b a* ab a|b a? ab* (ab|cd)* a(b|c)*d a(b|a)*b
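
As an illustration of the model above, here is a minimal Python sketch of a deterministic finite automaton for a(b|a)*b, one of the expressions listed. The state names (q0, q1, q2, dead), the explicit dead state, and the helper names accepts and agrees_up_to are assumptions made for this example and do not come from the slides. The sketch also cross-checks the automaton against Python's own re module on all short strings over {a, b}.

```python
import re
from itertools import product

# Hypothetical hand-built DFA for a(b|a)*b: strings over {a, b} that
# start with 'a' and end with 'b'. State names are illustrative only.
STATES = {"q0", "q1", "q2", "dead"}   # Q
ALPHABET = {"a", "b"}                 # input symbols
START = "q0"                          # start state
FINAL = {"q2"}                        # set of final states
DELTA = {                             # transition function
    ("q0", "a"): "q1",     # first symbol is 'a'
    ("q0", "b"): "dead",   # first symbol is 'b': can never be accepted
    ("q1", "a"): "q1",     # last symbol read was 'a'
    ("q1", "b"): "q2",     # last symbol read was 'b'
    ("q2", "a"): "q1",
    ("q2", "b"): "q2",
    ("dead", "a"): "dead",
    ("dead", "b"): "dead",
}


def accepts(s):
    """Run the DFA on s; accept if it ends in a final state."""
    state = START
    for ch in s:
        if ch not in ALPHABET:
            return False  # symbol outside the alphabet: reject
        state = DELTA[(state, ch)]
    return state in FINAL


PATTERN = re.compile(r"a(b|a)*b")


def agrees_up_to(max_len):
    """Compare the DFA with re.fullmatch on every string up to max_len."""
    for length in range(max_len + 1):
        for chars in product("ab", repeat=length):
            s = "".join(chars)
            if accepts(s) != (PATTERN.fullmatch(s) is not None):
                return False
    return True


if __name__ == "__main__":
    for s in ["ab", "aab", "abab", "a", "ba", ""]:
        print(repr(s), accepts(s))
    print("agrees with re up to length 8:", agrees_up_to(8))
```

In a diagram, q0 would carry the incoming start arrow and q2 would be drawn with a double circle. The dead state is drawn explicitly here so that the transition function is defined for every state and symbol; many diagrams omit it.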