Context Free Grammar
Where are we…?
I. Theory of Automata
→ II. Theory of Formal Languages
III. Theory of Turing Machines …
Important..!!!
◼ Read the first five pages of Chapter 12
before studying this lecture at home.
Chapter 12: Context-Free Grammars
◼ programming languages
◼ compiling a program: an operation that
generates an equivalent program in
machine or assembler language.
◼ 2 phases:
1. parsing
2. translation to machine language
((1/2)+9)/(4+(8/21)+(5/(3+(1/2))))
Parsing splits a sequence of characters or values into smaller parts
Example: AE (Arithmetic Expressions)
◼ Rule 1: Any number is in AE
◼ Rule 2: If x and y are in AE, then so are:
(x) – (x) (x+y) (x–y) (x*y) (x / y)
A different way of defining the set AE is to use a set of substitution rules, similar to grammatical rules:
Substitution rules that define the AE’s:
S → AE
AE → (AE + AE)
AE → (AE–AE)
AE → (AE*AE)
AE → (AE/AE)
AE → (AE)
AE → –(AE)
AE → NUMBER
What about numbers?
NUMBER → FIRST-DIGIT
FIRST-DIGIT → FIRST-DIGIT OTHER-DIGIT
FIRST-DIGIT → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
OTHER-DIGIT → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Example: deriving ((3+4)*(6+7))
S => AE => (AE*AE) => ((AE+AE)*AE) => ((AE+AE)*(AE+AE)) => … => ((3+4)*(6+7))
How to generate the number 1066?
NUMBER => FIRST-DIGIT
=> FIRST-DIGIT OTHER-DIGIT
=> FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT
=> FIRST-DIGIT OTHER-DIGIT OTHER-DIGIT OTHER-DIGIT
=> … => 1066
Definition: A context-free grammar (CFG) consists of:
1. an alphabet Σ of letters, called terminals.
2. a set of symbols, called nonterminals or variables.
One symbol S is called the start symbol.
3. a finite set of productions of the form:
A → α
where A is a nonterminal and α is a finite
sequence (word) of nonterminals and terminals.
Context Free Grammar (CFG)
CFG terminologies
➢ Terminals: The symbols that can’t be replaced by
anything are called terminals.
➢ Non-Terminals: The symbols that must be replaced
by other things are called non-terminals.
➢ Productions: The grammatical rules are often called
productions
◼ Examples:
AE:
S → AE
AE → (AE + AE)
AE → (AE–AE)
AE → (AE*AE)
AE → (AE)
AE → –(AE)
AE → NUMBER
Terminals: (, ), +, –, *, numbers
Nonterminals: S, AE
NUMBERS:
NUMBER → FIRST-DIGIT
FIRST-DIGIT → FIRST-DIGIT OTHER-DIGIT
FIRST-DIGIT → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
OTHER-DIGIT → 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
Terminals: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9
Nonterminals: NUMBER, FIRST-DIGIT, OTHER-DIGIT
Context Free Grammar (CFG)
Note
➢ The terminals are designated by small letters, while
the non-terminals are designated by capital letters.
➢ There is at least one production that has the non-
terminal S at its left side.
Chapter 12: Context-Free Grammars
Definition: A sequence of applications of productions starting
with the start symbol and ending in a sequence of terminals is
called a derivation.
Definition: The language generated by a CFG is the set of all
sequences of terminals produced by derivations. We also say
language defined by, language derived from, or language
produced by the CFG.
Definition: A language generated by a CFG is called a context-
free language.
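The definitions above can be tried out with a small sketch that enumerates the short words of a CFG by breadth-first search over leftmost derivations. The function name `words_up_to`, the dictionary encoding of productions (single-character symbols, with "" standing for Λ) and the `slack` bound on working-string length are assumptions made for this illustration; the bound is a simple pruning heuristic, so grammars with many nullable nonterminals may need a larger value.

```python
from collections import deque

def words_up_to(productions, start, max_len, slack=2):
    """Collect the words of length <= max_len derivable from `start`.

    `productions` maps each nonterminal (one character) to a list of
    right-hand sides; the empty string "" stands for Λ.
    """
    words = set()
    seen = {start}
    queue = deque([start])
    while queue:
        working = queue.popleft()
        # Position of the leftmost nonterminal in the working string.
        pos = next((i for i, c in enumerate(working) if c in productions), None)
        if pos is None:
            words.add(working)  # only terminals left: a finished word
            continue
        for rhs in productions[working[pos]]:
            new = working[:pos] + rhs + working[pos + 1:]
            # Terminals are never erased, so too many terminals is hopeless.
            terminals = sum(1 for c in new if c not in productions)
            if terminals > max_len or len(new) > max_len + slack:
                continue
            if new not in seen:
                seen.add(new)
                queue.append(new)
    return words

# S → aS | Λ generates language(a*)
print(sorted(words_up_to({"S": ["aS", ""]}, "S", 3), key=len))
```

Every working string is expanded at its leftmost nonterminal only, which is enough because every derivable word also has a leftmost derivation.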
Examples: S → aS   S → Λ
S => aS => aaS => aaaS => aaaaS => aaaaaS => aaaaaaS => aaaaaaΛ = aaaaaa
Generated language: {Λ, a, aa, aaa, …} = language(a*).
S → SS   S → a   S → Λ
S => SS => SSS => SaS => SaSS => ΛaSS => ΛaaS => ΛaaΛ = aa
Generated language: {Λ, a, aa, aaa, …} = language(a*).
(An infinite number of derivations for the word aa.)
In general:
variables: upper-case letters
terminals: lower-case letters
The empty word Λ:
Is it a nonterminal? No: there is no production of the form Λ → ...
Is it a terminal? Not exactly, because it is erased: ΛaaΛ = aa.
N → Λ means that N can simply be deleted.
◼ S → aS   S → bS   S → a   S → b
S => aS => abS => abbS => abba
◼ S → X   S → Y   X → Λ   Y → aY   Y → bY   Y → a   Y → b
◼ S → aS   S → bS   S → a   S → b   S → Λ
S => aS => abS => abbS => abbaS => abba
◼ S → aS   S → bS   S → Λ
Examples
S → XaaX   X → aX   X → bX   X → Λ
Can you generate the word abaab? How?
S => XaaX => aXaaX => abXaaX => abXaabX => abaabX => abaab
How can you generate the word baabaab?
Another example:
S → XY
X → aX
X → bX
X → a
Y → Ya
Y → Yb
Y → a
Abbreviation:
S → XY
X → aX | bX | a
Y → Ya | Yb | a
Chapter 12: Context-Free Grammars
Examples
S → SS | ES | SE | Λ | DSD
E → aa | bb
D → ab | ba
Can you identify the language generated by this CFG..?
EVEN-EVEN=language([aa+bb+(ab + ba)(aa+bb)*(ab+ba)]*)
Exercise: Generate aababbab
Exercise: Generate abababbabbbaabaa
18
Chapter 12: Context-Free Grammars
More examples
What is the language of this CFG..???
S → aSb | Λ
S => aSb => aaSbb => aaaSbbb => aaaaSbbbb => aaaaaSbbbbb => aaaaabbbbb
◼ The language is {a^n b^n | n = 0, 1, 2, …}
What type of words are generated by this CFG..???
S → aSa | bSb | Λ
The words are the even-length palindromes (including Λ).
Example: What is the language?
Productions:
1. S → aB
2. S → bA
3. A→a
4. A → aS
5. A → bAA
6. B→b
7. B → bS
8. B → aBB
➢ This grammar generates the language EQUAL: the nonempty words with the same number of a's and b's.
➢ Practice by deriving at least 3 strings from this language.
Exercise 1:
◼ Find a CFG for the following language:
{a^n b a^n | n = 0, 1, 2, …}
◼ S → aSa | b
Exercise 2:
➢ Find a CFG that generates the language of
strings, defined over Σ={a,b}, beginning
and ending in different letters.
22
Solution Exercise 2:
Consider the following CFG
S → aXb | bXa
X → aX | bX | Λ
➢ The CFG generates the language of strings, defined
over Σ={a,b}, beginning and ending in different
letters.
Solution Exercise 3
productions:
1. S → YXY
2. Y → aY | bY | Λ
3. X → bbb
➢ Using production 2, Y generates Λ, a, b, and every combination of a's and b's. Thus Y generates the language of (a+b)*.
➢ Production 3 provides the substring bbb, which must occur at least once.
Derivation Trees or Parse Trees
S → AB   A → aaA | Λ   B → Bb | Λ
S => AB => aaAB => aaABb => aaBb => aab
Derivation tree:
        S
      /   \
     A     B
   / | \   | \
  a  a  A  B  b
        |  |
        Λ  Λ
yield: a a Λ Λ b = aab
Sometimes, derivation order doesn't matter
Leftmost: always replace the leftmost nonterminal first
S => AB => aaAB => aaB => aaBb => aab
Rightmost: always replace the rightmost nonterminal first
S => AB => ABb => Ab => aaAb => aab
Both derivations have the same derivation tree.
Derivation Tree Example 2
S → AA
A → AAA | bA | Ab | a
[Figure: parse trees for the word bbaaaab, built up step by step from S → AA]
Parse trees are also called syntax trees, generation trees,
production trees, or derivation trees.
Remark: In a parse tree every internal node is labelled
with a variable (nonterminal) and every leaf is labelled
with a terminal.
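As a small illustration of the remark, the yield of a parse tree can be read off by concatenating its leaves from left to right. The tuple encoding `(label, children)` and the name `tree_yield` are assumptions made for this sketch; leaves are plain strings, with "" standing for Λ.

```python
def tree_yield(node):
    """Concatenate the leaves of a parse tree from left to right."""
    if isinstance(node, str):      # a leaf: a terminal, or "" for Λ
        return node
    _label, children = node        # an internal node labelled with a nonterminal
    return "".join(tree_yield(child) for child in children)

# Tree for the derivation S => AB => aaAB => aaABb => aaBb => aab
# under S → AB, A → aaA | Λ, B → Bb | Λ
tree = ("S", [
    ("A", ["a", "a", ("A", [""])]),
    ("B", [("B", [""]), "b"]),
])
print(tree_yield(tree))  # aab
```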
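Example:
S → S+S | S*S | NUMBER
NUMBER → …
S => S+S => S+S*S => … => 3+4*5
S => S*S => S+S*S => … => 3+4*5
Ambiguity: the word 3+4*5 has two parse trees.
[Figure: two parse trees for 3+4*5. In the first, the root is S + S, with 3 on the left and S * S (4 and 5) on the right. In the second, the root is S * S, with S + S (3 and 4) on the left and 5 on the right.]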
Example: use parentheses now
S → (S+S) | (S*S) | NUMBER
NUMBER → …
S => (S+S) => (S+(S*S)) => … => (3+(4*5))
S => (S*S) => ((S+S)*S) => … => ((3+4)*5)
[Figure: the parse tree of (3+(4*5)) and the parse tree of ((3+4)*5)]
[Figure: evaluating the two trees bottom-up. For (3+(4*5)): 4*5 = 20, then 3+20 = 23. For ((3+4)*5): 3+4 = 7, then 7*5 = 35.]
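Lukasiewicz* (Prefix) Notation
Also called o-o-o notation
Example: (3+(4*5)) becomes +3*45
[Figure: the parse tree of (3+(4*5)) reduced step by step to an operator tree with + at the root, 3 as its left child, and * (with children 4 and 5) as its right child]
* Pronounced: Wu-cash-ay-vich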
Example: ((3+4)*5) becomes *+345
[Figure: the parse tree of ((3+4)*5) reduced step by step to an operator tree with * at the root, + (with children 3 and 4) as its left child, and 5 as its right child]
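Infix example: ( (1+2) * (3+4) + 5 ) * 6
Prefix form: *+*+12+3456
Evaluation: repeatedly replace the first substring of the form operator, operand, operand by its value.
String                    First o-o-o substring
* + * + 1 2 + 3 4 5 6     + 1 2  = 3
* + * 3 + 3 4 5 6         + 3 4  = 7
* + * 3 7 5 6             * 3 7  = 21
* + 21 5 6                + 21 5 = 26
* 26 6                    * 26 6 = 156
156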
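The evaluation of a prefix expression can also be sketched recursively: an operator is followed by exactly two operands, each of which is either a number or another prefix expression. The name `eval_prefix` and the token-list input format are assumptions for this illustration, and only + and * are handled, as in the grammar on these slides.

```python
def eval_prefix(tokens):
    """Evaluate a prefix (Lukasiewicz) expression given as a list of tokens."""
    def parse(i):
        # Returns (value, index of the first token after this sub-expression).
        tok = tokens[i]
        if tok in ("+", "*"):
            left, i = parse(i + 1)   # first operand
            right, i = parse(i)      # second operand
            return (left + right if tok == "+" else left * right), i
        return int(tok), i + 1       # a number is its own value

    value, end = parse(0)
    if end != len(tokens):
        raise ValueError("unexpected tokens after the expression")
    return value

print(eval_prefix("* + * + 1 2 + 3 4 5 6".split()))  # 156
```

Tokens are separated by spaces here because multi-digit numbers such as 21 would otherwise be ambiguous in a prefix string.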
Exercise 1
◼ Convert infix to Polish notation (draw the tree)
◼ ((1+2)*3)+4
Example: S → AB   A → a   B → b
S => AB => aB => ab
S => AB => Ab => ab
Both derivations give the same parse tree: S has the children A and B, with leaves a and b.
● x + y * z can be produced with two different parse trees:
[Figure: two different parse trees for x + y * z]
● A CFG is ambiguous if there is at least one word in the language that has at least two derivation trees. It is called unambiguous otherwise.
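Example: S → aS | Sa | a generates language(a+) and is ambiguous.
[Figure: four different parse trees for the word aaa, one for each way of choosing S → aS or S → Sa in the first two steps]
Example: S → aS | a generates the same language and is unambiguous.
[Figure: the only parse tree for aaa under S → aS | a]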
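Ambiguity can be checked for a particular word by counting its parse trees. The sketch below does this by trying every way of splitting the word among the symbols of each right-hand side. The name `count_parse_trees` and the dictionary encoding (single-character symbols) are assumptions for this illustration, and the code assumes a grammar with no Λ-productions and no unit productions, since otherwise the recursion need not terminate.

```python
from functools import lru_cache

def count_parse_trees(productions, start, word):
    """Count the distinct parse trees of `word` from `start`.

    Assumes no Λ-productions and no unit productions, so every symbol
    derives at least one letter.
    """
    @lru_cache(maxsize=None)
    def sym(symbol, i, j):
        # Number of trees with root `symbol` and yield word[i:j].
        if symbol not in productions:                    # a terminal
            return 1 if j == i + 1 and word[i] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in productions[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbols of `rhs` together yield word[i:j].
        if len(rhs) == 1:
            return sym(rhs, i, j)
        total = 0
        # The first symbol covers word[i:k]; each later symbol needs a letter.
        for k in range(i + 1, j - len(rhs) + 2):
            total += sym(rhs[0], i, k) * seq(rhs[1:], k, j)
        return total

    return sym(start, 0, len(word))

print(count_parse_trees({"S": ["aS", "Sa", "a"]}, "S", "aaa"))  # 4
print(count_parse_trees({"S": ["aS", "a"]}, "S", "aaa"))        # 1
```

A count of two or more for some word shows that the grammar is ambiguous; a count of one for a few words does not by itself prove that it is unambiguous.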
Exercise 2
◼ Show that the following CFG is ambiguous
S → XaX
X → aX | bX | Λ
47
Context Free Grammar (CFG)
Total language tree
➢ For a given CFG, consider the tree that has the start symbol S as its root and whose nodes are working strings of terminals and non-terminals.
➢ The descendants of each node are all possible results of applying one production to one non-terminal of the working string.
➢ This tree is called the total language tree.
Total Language Trees
S → aa | bX | aXX
X → ab | b
S
  aa
  bX
    bab
    bb
  aXX
    aabX
      aabab
      aabb
    abX
      abab
      abb
    aXab
      aabab
      abab
    aXb
      aabb
      abb
The language generated by the given CFG is
{aa, bab, bb, aabab, aabb, abab, abb}
Total language tree may be infinite
S → aSb | bS | a
The total language tree is infinite, since S can call itself recursively.
S
  aSb
    aaSbb
    abSb
    aab
  bS
    baSb
    bbS
    ba
  a
Every node that still contains S has three children of its own (for example, aaSbb gives aaaSbbb, aabSbb and aaabb), and so on without end.
Grammatical Format
Note…
➢ All Regular languages can be generated by CFGs.
➢ Some Non-Regular languages can be generated by CFGs
but not all possible languages can be generated by CFG
➢ e.g. the CFG S → aSb | ab generates the non-regular language
{a^n b^n | n = 1, 2, 3, …}.
➢ Note:
For every FA, there exists a CFG that generates the
language accepted by this FA.
Chapter 13: Grammatical Format
Theorem. All regular languages are context-free
languages.
Proof. We show that for any FA, there is a CFG such that
the language generated by the grammar is the same as
the language accepted by the FA.
By constructive algorithm.
Input: a finite automaton (FA).
Output: a CF grammar (CFG).
◼ The alphabet of terminals is the alphabet of the FA.
◼ Nonterminals are the state names. (The start state is renamed S.)
◼ For every edge, create a production:
an edge labelled a from state X to state Y gives X → aY
a loop labelled a at state X gives X → aX
◼ For every final state X, create a production:
X → Λ
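The construction can be sketched in a few lines. The function name `fa_to_cfg` and the encoding of the FA as a dictionary from (state, letter) to state are assumptions for this illustration; state names are single letters and the start state is assumed to be named S already, so a right-hand side such as "aM" reads as terminal a followed by nonterminal M, and "" stands for Λ.

```python
def fa_to_cfg(transitions, finals):
    """Build the productions of a CFG from a deterministic FA.

    transitions: dict mapping (state, letter) -> target state
    finals: the set of final states
    """
    productions = {}
    # One production X → aY for every edge labelled a from X to Y.
    for (state, letter), target in sorted(transitions.items()):
        productions.setdefault(state, []).append(letter + target)
    # One production X → Λ for every final state X.
    for state in sorted(finals):
        productions.setdefault(state, []).append("")
    return productions

# FA with states S (start), M and F (final)
fa = {
    ("S", "a"): "M", ("S", "b"): "S",
    ("M", "a"): "F", ("M", "b"): "S",
    ("F", "a"): "F", ("F", "b"): "F",
}
print(fa_to_cfg(fa, {"F"}))
```

For this FA the result corresponds to S → aM | bS, M → aF | bS, F → aF | bF | Λ.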
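Example:
[Figure: FA with start state S–, state M and final state F+. From S, a leads to M and b loops at S. From M, a leads to F and b leads back to S. F has a loop labelled a,b.]
S → aM | bS
M → aF | bS
F → aF | bF | Λ
The word babbaaba (letters b a b b a a b a) is accepted along the path of states
S → S → M → S → S → M → F → F → F
and is derived by
S => bS => baM => babS => babbS => babbaM => babbaaF
=> babbaabF => babbaabaF => babbaaba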
Exercise 1: Give a CFG for words accepted
by this FA.
57
Chapter 13: Grammatical Format
Definition: A semiword is a sequence of terminals (possibly none)
followed by exactly one nonterminal.
(terminal)(terminal)…(terminal)(Nonterminal)
Definition: A CFG is a regular grammar if all of its productions
have the form:
Nonterminal → semiword
or
Nonterminal → word (a sequence of terminals or Λ)
58
Chapter 13: Grammatical Format
Theorem. All languages generated by regular grammars are
regular.
Proof. By constructive algorithm. We build a transition graph.
◼ The alphabet of the transition graph is the set of terminals.
◼ One state for each nonterminal. The state named S is the start
state. We add one final state +.
◼ Transitions:
a production Nx → wy Nz gives an edge labelled wy from state Nx to state Nz
a production Np → wq gives an edge labelled wq from state Np to the final state +
Example: S → aaS   S → bbS   S → Λ
[Figure: TG with start state –, loops labelled aa and bb at –, and an edge labelled Λ from – to the final state +]
Language: (aa+bb)*
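Example: S → aaS | bbS | abX | baX | Λ
X → aaX | bbX | abS | baS
[Figure: TG with start state – (for S), state X and final state +. Both – and X have loops labelled aa,bb. Edges labelled ab and ba lead from – to X and from X back to –. An edge labelled Λ leads from – to +.]
Language: EVEN-EVEN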
Exercise 2: Build a TG corresponding to
this CFG
◼ S → aA | bB
◼ A → aS | a
◼ B → bS | b
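Exercise 2: SOLUTION
[Figure: TG with states S (start, –), A, B and a final state +. Edges: S to A labelled a, S to B labelled b, A to S labelled a, A to + labelled a, B to S labelled b, B to + labelled b.]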
Null or Λ Productions
The production of the form non-terminal → Λ is said to be null
production.
Example:
Consider the CFG,
S → aA|bB| Λ
A → aa| Λ
B → aS
◼ Here S → Λ and A → Λ are null productions.
Chapter 13: Grammatical Format
Killing Λ Productions
◼ If Λ is in the language, a production of the form N → Λ (called a
Λ-production) is necessary.
◼ The existence of a production of the form N → Λ does not
necessarily mean that Λ is a part of the language.
S → aX S → a X → Λ
Theorem. Let L be a language generated by a CFG. There exists a
CFG without productions of the form X → Λ such that:
1. If Λ ∉ L, L is generated by the new grammar.
2. If Λ ∈ L, all words of L except for Λ are generated by the new
grammar.
Killing Λ-Productions
Λ-Productions:
In a given CFG, we call a non-terminal N nullable
◼ if there is a production N → Λ, or
◼ there is a derivation that starts at N and leads to Λ:
N ➔ ……… ➔ Λ
◼ Λ-Productions are undesirable.
◼ We can replace Λ-production with appropriate non-Λ
productions.
66
Replacement Rule.
1. Delete all Λ-productions.
2. Add the following productions:
For every production X → old string,
add new production(s) of the form X → …, where the
right side accounts for every modification of
the old string that can be formed by deleting any
subset of its nullable non-terminals,
except that we do not allow X → Λ to be formed, even if
all the characters in the old string are nullable.
Note
➢ While adding new productions, every nullable non-terminal must be
handled with care, not only those with a direct Λ-production.
➢ All productions that contain nullable non-terminals are used to add new
productions, but only the Λ-productions themselves are deleted.
Example: Consider the CFG
S → a | Xb | aYa
X → Y | Λ      (X is nullable)
Y → b | X      (Y is nullable)
Old production     New production
X → Y              nothing
X → Λ              nothing
Y → X              nothing
S → Xb             S → b
S → aYa            S → aa
So the new CFG is
S → a | Xb | aa | aYa | b
X → Y
Y → b | X
Example: Consider the CFG
S → Xa
X → aX | bX | Λ      (X is nullable)
Old production     New production
S → Xa             S → a
X → aX             X → a
X → bX             X → b
So the new CFG is
S → a | Xa
X → aX | bX | a | b
Example
S → XY
X → Zb
Y → bW
Z → AB
W → Z
A → aA | bA | Λ
B → Ba | Bb | Λ
• Nullable non-terminals are? A, B, Z and W
Old production     New production
S → XY             nothing new
X → Zb             X → b
Y → bW             Y → b
Z → AB             Z → A and Z → B
W → Z              nothing new
A → aA             A → a
A → bA             A → b
B → Ba             B → a
B → Bb             B → b
So the new CFG is
S → XY
X → Zb | b
Y → bW | b
Z → AB | A | B
W → Z
A → aA | bA | a | b
B → Ba | Bb | a | b
Killing unit-productions
◼ Definition: A production of the form
◼ non-terminal → one non-terminal
is called a unit production.
◼ The following theorem allows us to get rid of unit
productions:
Theorem 24:
If there is a CFG for the language L that has no Λ-
productions, then there is also a CFG for L with no Λ-
productions and no unit productions.
75
Proof of Theorem 24
◼ This is another proof by constructive algorithm.
◼ Algorithm: For every pair of non-terminals A and
B, if the CFG has a unit production A → B, or if
there is a chain
A ➔ X1 ➔ X2 ➔ … ➔ B
where X1, X2, ... are non-terminals, create new
productions as follows:
◼ If the non-unit productions from B are
B → s1 | s2 | …
where s1, s2, ... are strings, we create the productions
A → s1 | s2 | …
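The algorithm in the proof can be sketched as follows: for each non-terminal A, collect every non-terminal reachable from A through a chain of unit productions, then give A all the non-unit productions of those non-terminals. The name `kill_unit_productions` and the dictionary encoding (single-character symbols) are assumptions for this illustration, and the input is assumed to have no Λ-productions.

```python
def kill_unit_productions(productions):
    """Replace unit productions by the non-unit productions they lead to."""
    def is_unit(body):
        # A unit production has exactly one non-terminal on its right side.
        return body in productions

    new = {}
    for head in productions:
        # Non-terminals reachable from `head` by chains of unit productions.
        reachable = [head]
        for current in reachable:              # the list grows while we iterate
            for body in productions[current]:
                if is_unit(body) and body not in reachable:
                    reachable.append(body)
        bodies = []
        for nonterminal in reachable:
            for body in productions[nonterminal]:
                if not is_unit(body) and body not in bodies:
                    bodies.append(body)
        new[head] = bodies
    return new

cfg_units = {"S": ["A", "bb"], "A": ["B", "b"], "B": ["S", "a"]}
print(kill_unit_productions(cfg_units))
```

For the grammar S → A | bb, A → B | b, B → S | a, every non-terminal ends up with the three productions bb, b and a.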
Example
◼ Consider the CFG
S → A| bb
A→B|b
B→S|a
◼ The non-unit productions are
S → bb A→b B→a
◼ And unit productions are
S→A
A→B
B→S
77
Example contd.
◼ Let’s list all unit productions and their sequences and create new
productions:
S→A gives S→b
S→A→B gives S→a
A→B gives A→a
A→B→S gives A → bb
B→S gives B → bb
B→S→A gives B→b
◼ Eliminating all unit productions, the new CFG is
S → bb | b | a
A → b | a | bb
B → a | bb | b
◼ This CFG generates a finite language since there are no non-
terminals in any strings produced from S.
78
Chapter 13: Grammatical Format
Definition. A CFG is said to be in Chomsky Normal
Form (CNF) if all the productions have the form:
Nonterminal → (Nonterminal)(Nonterminal)
Nonterminal → terminal
A → BC or A→a
Examples:
Chomsky Normal Form      Not Chomsky Normal Form
S → AS                   S → AS
S → a                    S → AAS
A → SA                   A → SA
A → b                    A → aa
Theorem.
Let L be a language generated by a CFG.
There is another grammar which is in CNF
that generates all the words of L (except Λ).
81
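CNF construction
◼ Proof: following previous theorems, we can:
◼ Eliminate all Λ-productions
◼ Eliminate all unit productions
◼ Reduce every production to the form X → X1X2…Xn (non-terminals only) or X → terminal, by introducing a new non-terminal for each terminal that appears in a longer right side
◼ Replace each production of the first type, if n > 2, with
X → X1R1, R1 → X2R2, R2 → X3R3, …, Rn-3 → Xn-2Rn-2, Rn-2 → Xn-1Xn
where R1, …, Rn-2 are new non-terminals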
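The last step of the construction, splitting a long right side into a chain of two-symbol productions, can be sketched as follows. The name `split_long_production` and the helper names R1, R2, … are assumptions for this illustration; right sides are lists of symbol names, and a full conversion would also have to keep helper names distinct across different productions.

```python
def split_long_production(head, body, prefix="R"):
    """Split head → X1 X2 … Xn (n > 2) into productions with two symbols each.

    Returns a list of (left side, right side) pairs.
    """
    n = len(body)
    if n <= 2:
        return [(head, list(body))]            # already short enough
    rules = []
    current = head
    for k in range(n - 2):
        helper = f"{prefix}{k + 1}"            # hypothetical fresh non-terminal
        rules.append((current, [body[k], helper]))
        current = helper
    # The last helper produces the final two symbols.
    rules.append((current, [body[n - 2], body[n - 1]]))
    return rules

for left, right in split_long_production("X", ["X1", "X2", "X3", "X4"]):
    print(left, "→", " ".join(right))
```

A right side with n symbols yields n-1 productions and uses n-2 helper non-terminals, matching the chain above.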
Conversion to Chomsky Normal Form
◼ Example 1:
Initial grammar (not in Chomsky Normal Form):
S → ABa
A → aab
B → Ac
Example 1 Solution
Final grammar in Chomsky Normal Form:
S → AV1
V1 → BTa
A → TaV2
V2 → TaTb
B → ATc
Ta → a
Tb → b
Tc → c
Example 2
Convert the following CFG to CNF:
S → aSa | bSb | a | b | aa | bb
Example 2 Sol
To convert the above CFG to be in CNF, introduce the new productions as
A → a, B → b, then the new CFG will be
S → ASA|BSB|AA|BB|a|b
A→a
B→b
Introduce non-terminals R1 and R2 so that
S → AR1|BR2|AA|BB|a|b
R1 → SA
R2 → SB
A→a
B→b which is in CNF.
Exercise 1: Convert the CFG to CNF
◼ S → ASA | aB
◼ A → B | S
◼ B → b | Λ
Exercise 1 Sol:
S → AA1 | UB | AS | SA | a
A → b | AA1 | UB | AS | SA | a
B→b
U→a
A1 → SA
89
Exercise 2: Convert this CFG to CNF
S → ABAB
A→a|Λ
B→b|Λ
90
Leftmost derivation
➢ A derivation of a word w, generated by a CFG, in which at each step a
production is applied to the leftmost non-terminal in the working
string is called a leftmost derivation.
➢ The leftmost non-terminal is the non-terminal that occurs first from
the left in the working string.
Example
Consider the following CFG
S → XY
X → XX | a
Y → YY | b
The following are two leftmost derivations of aaabb:
S => XY          S => XY
=> XXY           => XXY
=> aXY           => XXXY
=> aXXY          => aXXY
=> aaXY          => aaXY
=> aaaY          => aaaY
=> aaaYY         => aaaYY
=> aaabY         => aaabY
=> aaabb         => aaabb
Theorem
➢ Any word that can be generated by a certain CFG has also a left
most derivation.
➢ The above theorem can be stated for right most derivation as well.
Example
Consider the following CFG
S → YX
X → XX | b
Y → YY | a
The following are the leftmost (left column) and rightmost (right column) derivations of abbbb:
S => YX          S => YX
=> aX            => YXX
=> aXX           => YXb
=> abX           => YXXb
=> abXX          => YXbb
=> abbX          => YXXbb
=> abbXX         => YXbbb
=> abbbX         => Ybbbb
=> abbbb         => abbbb