CS242_Module 5
26/12/2021
Theory of Computing
This Presentation is mainly dependent on the textbook: Introduction to Automata Theory, Languages, and Computation: Global Edition, 3rd edition (2013) PHI
by John E. Hopcroft, Rajeev Motwani and Jeffrey D. Ullman
• Context-Free Grammars
Not all languages are regular
• So what happens to the languages which are
not regular?
• Can we still come up with a language
recognizer?
• i.e., something that will accept (or reject) strings that
belong (or do not belong) to the language?
Context-Free Languages
• A language class larger than the class of regular languages
• Supports natural, recursive notation called “context-free
grammar”
• Applications:
• Parse trees, compilers
• XML
[Figure: the class of regular languages (FA/RE) nested inside the class of context-free languages (PDA/CFG)]
An Example
• A palindrome is a word that reads the same from both
ends
• E.g., mom, nolemonnomelon, madam, 101010101
• Let L = { w | w is a binary palindrome}
• Is L regular?
• No.
• Proof:
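• Let w = 0^N 1 0^N, where N is the pumping-lemma constant (a positive integer)
• By the Pumping Lemma, w can be rewritten as xyz, such that x y^k z is also in L (for any k ≥ 0)
• But |xy| ≤ N and y ≠ ε
• ==> y lies entirely in the leading block of 0s, i.e., y is in 0+
• ==> x y^k z will NOT be in L for k = 0, because xz = 0^(N−|y|) 1 0^N has fewer 0s before the 1 than after it
• ==> Contradiction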
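The pumping argument above can be checked by brute force for small N. The sketch below is only an illustration (the function names are hypothetical, not from the textbook): it tries every split w = xyz with |xy| ≤ N and y ≠ ε and confirms that pumping down (k = 0) never yields a palindrome.

```python
def is_palindrome(s: str) -> bool:
    """A string is a palindrome if it equals its own reverse."""
    return s == s[::-1]


def pumping_down_always_fails(n: int) -> bool:
    """Check every split w = xyz of w = 0^n 1 0^n with |xy| <= n and |y| >= 1.

    Returns True if x + z (the k = 0 case) is never a palindrome.
    """
    w = "0" * n + "1" + "0" * n
    for xy_len in range(1, n + 1):
        for y_len in range(1, xy_len + 1):
            x = w[:xy_len - y_len]
            z = w[xy_len:]
            if is_palindrome(x + z):
                return False
    return True


print(all(pumping_down_always_fails(n) for n in range(1, 8)))
```

This is a sanity check for a few values of N, not a replacement for the proof.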
CFL
• The language of palindromes is a CFL, because it
supports recursive substitution (in the form of a CFG)
• This is because we can construct a “grammar” like this:
1. A ==> ε
2. A ==> 0
3. A ==> 1
4. A ==> 0A0
5. A ==> 1A1
(Rules 1 to 5 are the productions.)
• This can be also written as
A => 0A0 | 1A1 | 0 | 1 | ε
• Variable or non-terminal: a symbol that appears on the left side of a production. The only variable in this grammar is A.
• Terminal: a symbol of the language's alphabet; terminals appear only on the right side of productions. Here, 0 and 1 (ε denotes the empty string).
How does the CFG for palindromes work?
An input string belongs to the language (i.e.,
accepted) if and only if it can be generated by the
CFG.
Generating a string from a grammar:
1. Pick and choose a sequence of productions that would
allow us to generate the string.
2. At every step, replace one variable with the body of one of its productions.
• Example: w = 01110
• G: A => 0A0 | 1A1 | 0 | 1 | ε
• G can generate w as follows:
1. A => 0A0
2. => 01A10
3. => 01110
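The derivation of 01110 can be reproduced mechanically. The following is a minimal sketch, assuming the palindrome grammar A => 0A0 | 1A1 | 0 | 1 | ε; the function name derive_palindrome is hypothetical. It peels matching symbols off both ends of the input and records each sentential form.

```python
from typing import List, Optional


def derive_palindrome(w: str) -> Optional[List[str]]:
    """Return the sentential forms of a derivation of w from A, or None.

    Grammar: A => 0A0 | 1A1 | 0 | 1 | ε
    """
    if any(ch not in "01" for ch in w):
        return None
    prefix, suffix = "", ""   # terminals already generated around A
    rest = w                  # what the remaining A must still produce
    steps = ["A"]
    while len(rest) >= 2:
        if rest[0] != rest[-1]:
            return None       # neither A => 0A0 nor A => 1A1 applies
        prefix += rest[0]
        suffix = rest[-1] + suffix
        rest = rest[1:-1]
        steps.append(prefix + "A" + suffix)
    # Finish with A => 0, A => 1, or A => ε (rest is "0", "1", or "").
    steps.append(prefix + rest + suffix)
    return steps


print(derive_palindrome("01110"))  # ['A', '0A0', '01A10', '01110']
```

A return value of None means no derivation exists, i.e., the string is not in the language.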
Definition of Context-Free Grammar
• A context-free grammar is denoted as G = (V, T, P, S),
where
• V = Set of variables or non-terminals
• T = Set of terminal symbols (alphabet ∪ {ε})
• P = Set of productions, each of which is of the form
V ==> α1 | α2 | …
• Where each αi is an arbitrary string of variables and terminal symbols
• S = The start variable
Example: CFG for the language of binary palindromes:
• G = ({A}, {0,1}, P, A)
• P = A ==> 0A0 | 1A1 | 0 | 1 | ε
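The 4-tuple G = (V, T, P, S) maps naturally onto a small data structure. This is one possible encoding, not a standard API: the class name CFG, its field names, and the choice of the empty string to stand for ε are all assumptions made for illustration.

```python
from dataclasses import dataclass

EPSILON = ""  # assumption: the empty string stands in for ε


@dataclass
class CFG:
    variables: frozenset   # V
    terminals: frozenset   # T
    productions: dict      # P: head -> tuple of bodies (each body is a string)
    start: str             # S

    def is_well_formed(self) -> bool:
        """Check that S is in V, V and T are disjoint, every head is a
        variable, and every body uses only symbols from V or T."""
        if self.start not in self.variables:
            return False
        if self.variables & self.terminals:
            return False
        symbols = self.variables | self.terminals
        for head, bodies in self.productions.items():
            if head not in self.variables:
                return False
            for body in bodies:
                if any(sym not in symbols for sym in body):
                    return False
        return True


# The binary-palindrome grammar G = ({A}, {0,1}, P, A).
G_PAL = CFG(
    variables=frozenset({"A"}),
    terminals=frozenset({"0", "1"}),
    productions={"A": ("0A0", "1A1", "0", "1", EPSILON)},
    start="A",
)
print(G_PAL.is_well_formed())
```

Each symbol is a single character here, which keeps bodies as plain strings; grammars with multi-character symbols would need bodies stored as tuples instead.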
More examples (1)
• Parenthesis matching in code
E.g., ()(((())))((()))….
CFG is:
• S => (S) | SS | ε
• A grammar for L = {0^m 1^n | m ≥ n}
CFG is:
• S => 0S1 | A
• A => 0A | ε
• Syntax checking
• In scenarios where there is a general need for:
• Matching a symbol with another symbol, or
• Matching a count of one symbol with that of another symbol, or
• Recursively substituting one symbol with a string of other symbols
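To see how a grammar matches the count of one symbol against another, the grammar S => 0S1 | A, A => 0A | ε for {0^m 1^n | m ≥ n} can be turned into a recognizer with one function per variable. This is a sketch under the assumption that each function simply mirrors the productions of its variable; the names in_S and in_A are hypothetical.

```python
def in_A(w: str) -> bool:
    """A => 0A | ε generates exactly the strings 0^j, j >= 0."""
    return all(ch == "0" for ch in w)


def in_S(w: str) -> bool:
    """S => 0S1 | A: either hand the whole string to A, or strip a
    leading 0 and a trailing 1 and recurse on the middle."""
    if in_A(w):                                   # S => A
        return True
    return (len(w) >= 2 and w[0] == "0" and w[-1] == "1"
            and in_S(w[1:-1]))                    # S => 0S1


for w in ["", "000", "00011", "011", "10"]:
    print(repr(w), in_S(w))
```

Each use of S => 0S1 pairs one 0 with one 1, and the leftover 0s are absorbed by A, which is why m ≥ n holds for every generated string.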
More examples (2)
• L1 = {0^n | n ≥ 0}
• L2 = {0^n | n ≥ 1}
• L3 = {0^i 1^j 2^k | i = j or j = k, where i, j, k ≥ 0}
• L4 = {0^i 1^j 2^k | i = j or i = k, where i, j, k ≥ 1}
Applications of CFLs & CFGs
• Compilers use parsers for syntax checking
• The syntax that a parser checks is specified by a CFG
1. Balancing parentheses:
• B ==> BB | (B) | Statement
• Statement ==> …
2. If-then-else:
• S ==> SS | if Condition then Statement else Statement | if
Condition then Statement | Statement
• Condition ==> …
• Statement ==> …
3. C parentheses matching { … }
4. Pascal begin-end matching
5. YACC (Yet Another Compiler-Compiler)
More applications
• Markup languages
• Nested Tag Matching
• HTML
• <html> …<p> … <a href=…> … </a> </p> … </html>
• XML
• <PC> … <MODEL> … </MODEL> .. <RAM> … </RAM> … </PC>
Structure of a production
A ==> α1 | α2 | … | αk
is shorthand for the k productions:
1. A ==> α1
2. A ==> α2
3. A ==> α3
…
k. A ==> αk
CFG conventions
• Terminal symbols <== a, b, c…
Syntactic Expressions in
Programming Languages
result = a*b + location + 10 * distance + c
String membership
How do we decide whether a string belongs to the language defined by a CFG?
1. Derivation
• Head to body
2. Recursive inference
• Body to head
Both are equivalent forms.
Example:
• w = 01110
• Is w a palindrome?
CFG: A => 0A0 | 1A1 | 0 | 1 | ε
A => 0A0
=> 01A10
=> 01110
Simple Expressions
• We can write a CFG for accepting simple expressions
• G = (V, T, P, S)
• V = {E, F}
• T = {0, 1 ,a, b, +, *, (, )}
• S = E
• P=
• E ==> E+E | E*E | (E) | F
• F ==> aF | bF | 0F | 1F | a | b | 0 | 1
Generalization of derivation
▪ Derivation is head ==> body
▪ A ==> X (A derives X in a single step)
▪ A ==>*G X (A derives X in zero or more steps, using the productions of G)
▪ Transitivity:
IF A ==>*G B, and B ==>*G C, THEN A ==>*G C
Context-Free Language
• The language of a CFG, G=(V, T, P, S), denoted by L(G), is
the set of terminal strings that have a derivation from
the start variable S.
• L(G) = { w in T* | S ==>*G w }
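The definition of L(G) suggests a direct, if inefficient, way to list the short strings of a language: explore every sentential form reachable from S and keep those with no variables left. The sketch below assumes single-character symbols and a grammar whose sentential forms contain at most one variable (true for the palindrome grammar); the function name language_up_to is hypothetical.

```python
from collections import deque


def language_up_to(productions, start, max_len):
    """Return all terminal strings w with S =>* w and len(w) <= max_len.

    productions maps each variable to a list of bodies ("" stands for ε).
    Assumption: sentential forms hold at most one variable, so forms longer
    than max_len + 1 can be pruned. A grammar such as S => SS | ε would
    need a different pruning rule.
    """
    seen = {start}
    queue = deque([start])
    result = set()
    while queue:
        form = queue.popleft()
        # Locate the leftmost variable, if any.
        pos = next((i for i, s in enumerate(form) if s in productions), None)
        if pos is None:
            result.add(form)          # no variables left: a terminal string
            continue
        for body in productions[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            terminals = sum(1 for s in new if s not in productions)
            if (terminals <= max_len and len(new) <= max_len + 1
                    and new not in seen):
                seen.add(new)
                queue.append(new)
    return result


G_PAL = {"A": ["0A0", "1A1", "0", "1", ""]}
print(sorted(language_up_to(G_PAL, "A", 3), key=lambda w: (len(w), w)))
```

For the palindrome grammar this lists exactly the binary palindromes up to the chosen length.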
Left-most & Right-most Derivations
For the CFG:
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | ε
Derive the string a*(ab+10) from G: E ==>*G a*(ab+10)

Left-most derivation (always substitute the leftmost variable):
E
==> E * E
==> F * E
==> aF * E
==> a * E
==> a * (E)
==> a * (E + E)
==> a * (F + E)
==> a * (aF + E)
==> a * (abF + E)
==> a * (ab + E)
==> a * (ab + F)
==> a * (ab + 1F)
==> a * (ab + 10F)
==> a * (ab + 10)

Right-most derivation (always substitute the rightmost variable):
E
==> E * E
==> E * (E)
==> E * (E + E)
==> E * (E + F)
==> E * (E + 1F)
==> E * (E + 10F)
==> E * (E + 10)
==> E * (F + 10)
==> E * (aF + 10)
==> E * (abF + 10)
==> E * (ab + 10)
==> F * (ab + 10)
==> aF * (ab + 10)
==> a * (ab + 10)
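A left-most derivation is fully determined by the sequence of productions chosen, so it can be replayed step by step. The sketch below assumes the grammar E => E+E | E*E | (E) | F, F => aF | bF | 0F | 1F | ε, with the empty string standing for ε; the names PRODUCTIONS, CHOICES and leftmost_derivation are hypothetical.

```python
PRODUCTIONS = {
    "E": ["E+E", "E*E", "(E)", "F"],
    "F": ["aF", "bF", "0F", "1F", ""],   # "" stands for ε
}


def leftmost_derivation(start, choices):
    """Apply each (head, body) choice to the leftmost variable in turn
    and return the list of sentential forms."""
    form = start
    steps = [form]
    for head, body in choices:
        pos = next((i for i, s in enumerate(form) if s in PRODUCTIONS), None)
        if pos is None or form[pos] != head or body not in PRODUCTIONS[head]:
            raise ValueError(f"cannot apply {head} => {body!r} to {form!r}")
        form = form[:pos] + body + form[pos + 1:]
        steps.append(form)
    return steps


# Production choices for a*(ab+10), in left-most order.
CHOICES = [
    ("E", "E*E"), ("E", "F"), ("F", "aF"), ("F", ""),
    ("E", "(E)"), ("E", "E+E"),
    ("E", "F"), ("F", "aF"), ("F", "bF"), ("F", ""),
    ("E", "F"), ("F", "1F"), ("F", "0F"), ("F", ""),
]

for step in leftmost_derivation("E", CHOICES):
    print(step)
```

Replacing the leftmost-variable search with a rightmost one (and reordering the choices accordingly) would replay a right-most derivation of the same string.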
Leftmost vs. Rightmost derivations
• For every leftmost derivation, there is a rightmost derivation, and vice versa.
  • Will use parse trees to prove this
• Does every word generated by a CFG have a leftmost and a rightmost derivation?
  • Easy to prove (reverse direction)
• Could there be words which have more than one leftmost (or rightmost) derivation?
  • Yes, depending on the grammar
CFG & CFL
• Gpal:
A => 0A0 | 1A1 | 0 | 1 | ε
• Theorem: a string w in {0,1}* is in L(Gpal) if and only if w is a palindrome.
• Proof:
• Use induction
• On string length for the IF part
• On length of derivation for the ONLY IF part
• Parse Trees
Parse Trees
• Each derivation in a CFG can be represented using a parse tree:
• Each internal node is labeled by a variable in V
• Each leaf is labeled by a terminal symbol (or by ε)
• For a production A ==> X1X2…Xk, any internal node labeled A has k children, labeled X1, X2, …, Xk from left to right
• Parse tree for a production (the trees for subsequent productions hang below its children):
• A ==> X1..Xi..Xk
         A
      /  |  \
   X1 …  Xi … Xk
Examples
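G (expressions):
E => E+E | E*E | (E) | F
F => aF | bF | 0F | 1F | 0 | 1 | a | b
Parse tree for a + 1:
        E
      / | \
     E  +  E
     |     |
     F     F
     |     |
     a     1
G (palindromes):
A => 0A0 | 1A1 | 0 | 1 | ε
Parse tree for 0110:
        A
      / | \
     0  A  0
      / | \
     1  A  1
        |
        ε
Reading a tree from the root down gives a derivation; reading it from the leaves up gives a recursive inference.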
Parse Trees, Derivations, and
Recursive Inferences
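Production:
A ==> X1..Xi..Xk
[Figure: parse tree with root A and children X1 … Xi … Xk; a derivation reads it from the root down, a recursive inference from the leaves up]
[Figure: diagram relating the equivalent forms: derivation, right-most derivation, recursive inference]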
Interchangeability of different
CFG representations
• Parse tree ==> left-most derivation
• DFS left to right
• Parse tree ==> right-most derivation
• DFS right to left
• ==> every left-most derivation has a corresponding right-most derivation, and vice versa
• Derivation ==> Recursive inference
• Reverse the order of productions
• Recursive inference ==> Parse trees
• bottom-up traversal of parse tree
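The conversions from a parse tree to a left-most or right-most derivation can be sketched in a few lines. The representation here is an assumption made for illustration: a node is a (label, children) pair, and a leaf has an empty child list. Repeatedly expanding the leftmost unexpanded internal node visits nodes in the same order as a left-to-right depth-first traversal; picking the rightmost one gives the right-to-left order.

```python
def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding the tree's internal
    nodes, always picking the leftmost (or rightmost) unexpanded variable."""
    form = [tree]                        # current sentential form, as nodes
    steps = [tree[0]]
    while True:
        idxs = [i for i, (_, kids) in enumerate(form) if kids]
        if not idxs:
            return steps                 # only leaves remain
        i = idxs[0] if leftmost else idxs[-1]
        form[i:i + 1] = form[i][1]       # replace the node by its children
        steps.append("".join(label for label, _ in form))


# Parse tree for a + 1 under E => E+E | F, F => a | 1.
TREE = ("E", [
    ("E", [("F", [("a", [])])]),
    ("+", []),
    ("E", [("F", [("1", [])])]),
])

print(derivation(TREE))                  # left-most
print(derivation(TREE, leftmost=False))  # right-most
```

Both derivations come from the same tree and end in the same string, which is the point of the interchangeability argument.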
• Applications of Context-Free Grammars
Relationship between CFLs and RLs
CFLs & Regular Languages
• A CFG is said to be right-linear if all the productions are
one of the following two forms: A ==> wB (or) A ==> w
Where:
• A & B are variables,
• w is a string of terminals
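Because each right-linear production consumes a string of terminals and then hands over to at most one variable, membership can be decided the way a nondeterministic finite automaton would: by tracking reachable (variable, input position) pairs. The sketch below is an illustration with hypothetical names; the grammar used is A => 01B | C, B => 11B | 0C | 1A, C => 1A | 0 | 1, encoded as (w, next variable or None) pairs.

```python
def accepts(productions, start, s):
    """Decide membership for a right-linear grammar.

    productions maps a variable to a list of (w, nxt) pairs, meaning
    A => w nxt when nxt is a variable and A => w when nxt is None.
    """
    seen = set()
    stack = [(start, 0)]
    while stack:
        var, i = stack.pop()
        if (var, i) in seen:
            continue                   # avoid looping on unit productions
        seen.add((var, i))
        for w, nxt in productions[var]:
            if not s.startswith(w, i):
                continue
            j = i + len(w)
            if nxt is None:
                if j == len(s):
                    return True        # A => w consumed the rest of the input
            else:
                stack.append((nxt, j))
    return False


G = {
    "A": [("01", "B"), ("", "C")],
    "B": [("11", "B"), ("0", "C"), ("1", "A")],
    "C": [("1", "A"), ("0", None), ("1", None)],
}

for s in ["0", "0100", "11", "01", ""]:
    print(repr(s), accepts(G, "A", s))
```

The variables play the role of states, which is the intuition behind the link between right-linear grammars and regular languages.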
Some Examples
[Figure: state diagrams of finite automata over {0, 1} with states A, B, C]
Right-linear CFG:
A => 01B | C
B => 11B | 0C | 1A
C => 1A | 0 | 1
• Ambiguity in Grammars and Languages
Ambiguity in CFGs
Why does ambiguity matter?
• E ==> E + E | E * E | (E) | a | b | c | 0 | 1
• For the string a * b + c, the two values are different!
• LM derivation #1 (the parse tree groups the string as (a*b)+c):
  E => E + E => E * E + E ==>* a * b + c
          E
        / | \
       E  +  E
     / | \   |
    E  *  E  c
    |     |
    a     b
• LM derivation #2 (the parse tree groups the string as a*(b+c)):
  E => E * E => a * E => a * E + E ==>* a * b + c
        E
      / | \
     E  *  E
     |   / | \
     a  E  +  E
        |     |
        b     c
The calculated value depends on which of the two parse trees is actually used.
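The effect of the two parse trees can be made concrete by evaluating each one. In this sketch a tree is either a leaf name or a (left, operator, right) triple; the representation, the names, and the sample values a = 2, b = 3, c = 4 are illustrative assumptions, not taken from the slides.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def evaluate(tree, env):
    """Evaluate a parse tree given as a leaf name or a (left, op, right) triple."""
    if isinstance(tree, str):
        return env[tree]
    left, op, right = tree
    return OPS[op](evaluate(left, env), evaluate(right, env))


TREE_1 = (("a", "*", "b"), "+", "c")   # E => E + E applied first: (a*b)+c
TREE_2 = ("a", "*", ("b", "+", "c"))   # E => E * E applied first: a*(b+c)
ENV = {"a": 2, "b": 3, "c": 4}         # illustrative values

print(evaluate(TREE_1, ENV))  # 10
print(evaluate(TREE_2, ENV))  # 14
```

The same string yields two different values, which is why a compiler cannot tolerate an ambiguous expression grammar.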
Removing Ambiguity in
Expression Evaluations
• It may be possible to remove ambiguity for some CFLs
• E.g., in a CFG for expression evaluation, by imposing rules & restrictions such as a precedence rule
• This requires rewriting the grammar
Precedence (highest to lowest): (), *, +
• Modified unambiguous version:
E => E + T | T
T => T * F | F
F => I | (E)
I => a | b | c | 0 | 1
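The unambiguous grammar E => E + T | T, T => T * F | F, F => I | (E), I => a | b | c | 0 | 1 translates almost directly into a recursive-descent parser with one function per variable. The sketch below is one possible rendering, not the textbook's algorithm: the left-recursive productions are implemented as loops (a common workaround, since a naive recursive call on E => E + T would never terminate), and the tree format and function names are hypothetical.

```python
IDENTIFIERS = {"a", "b", "c", "0", "1"}


def parse(text):
    """Parse text and return a nested (left, op, right) tree."""
    tokens = [ch for ch in text if not ch.isspace()]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        pos += 1
        return tok

    def factor():                        # F => I | (E)
        tok = take()
        if tok == "(":
            tree = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return tree
        if tok in IDENTIFIERS:
            return tok
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():                          # T => T * F | F, written as a loop
        tree = factor()
        while peek() == "*":
            take()
            tree = (tree, "*", factor())
        return tree

    def expr():                          # E => E + T | T, written as a loop
        tree = term()
        while peek() == "+":
            take()
            tree = (tree, "+", term())
        return tree

    tree = expr()
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return tree


print(parse("a * b + c"))    # (('a', '*', 'b'), '+', 'c')
print(parse("a * (b + c)"))  # ('a', '*', ('b', '+', 'c'))
```

Because * is handled one level below +, the string a * b + c now has only the grouping (a*b)+c; parentheses are the only way to obtain a*(b+c).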
Inherently Ambiguous CFLs
• However, for some languages, it may not be possible to
remove ambiguity
Main Reference
1. Context-Free Grammars
2. Parse Trees
3. Applications of Context-Free Grammars
4. Ambiguity in Grammars and Languages
(Introduction to Automata Theory, Languages, and Computation
(2013) Global Edition 3rd Edition)
Additional References
https://www3.nd.edu/~cpennycu/2019/assets/fall/TOC/08%20Context%20Free%20Grammars.pdf
https://www3.nd.edu/~cpennycu/2019/assets/fall/TOC/09%20Chomsky%20Normal%20Form.pdf
Thank You