Theory of Computation
Automata Theory
(CFG, CFL, CNF)
Objectives
• Introduce Context-free Grammar
(CFG) and Context-free Language
(CFL)
• Show that Regular Language can be
described by CFG
• Terminology related to CFG
– Leftmost Derivation, Ambiguity,
Chomsky Normal Form (CNF)
• Converting a CFG into CNF
Context-free Grammar
(Example)
Substitution Rules:
  A → 0A1
  A → B
  B → #
Variables: A, B
Terminals: 0, 1, #
Start Variable: A
Important: Substitution Rule in CFG has a special form:
Exactly one variable (and nothing else) on the left
side of the arrow
How does CFG generate strings?
A → 0A1
A → B
B → #
• Write down the start variable
• Find a variable that is written down, and a
rule whose left side is that variable; then,
replace the variable with the right side of the rule
• Repeat the above step until no variable is
left
How does CFG generate strings?
A → 0A1
A → B
B → #
Step 1. A (write down the start variable)
Step 2. 0A1 (find a rule and replace)
Step 3. 00A11 (find a rule and replace)
Step 4. 00B11 (find a rule and replace)
Step 5. 00#11 (find a rule and replace)
Now, the string 00#11 does not have any variable.
We can stop.
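The five steps above can be replayed mechanically. The following is a minimal Python sketch, not part of the original slides; the names RULES and derive are made up for illustration, and each rule is picked by its index in the list for its variable.

```python
# Rules of the example grammar; "A" and "B" are variables, the rest are terminals.
RULES = {"A": ["0A1", "B"], "B": ["#"]}

def derive(start, choices):
    """Apply the chosen rules in order, always to the leftmost variable.

    `choices` gives, for each step, the index of the rule to use for the
    leftmost variable. Returns every intermediate string of the derivation.
    """
    current = start
    steps = [current]
    for choice in choices:
        # Locate the leftmost variable still present in the string.
        pos = next(i for i, sym in enumerate(current) if sym in RULES)
        rhs = RULES[current[pos]][choice]
        current = current[:pos] + rhs + current[pos + 1:]
        steps.append(current)
    return steps

# A -> 0A1, A -> 0A1, A -> B, B -> #
print(" => ".join(derive("A", [0, 0, 1, 0])))
```

The last string in the returned list contains no variable, which is the stopping condition described above.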
How does CFG generate strings?
• The sequence of substitutions to
generate a string is called a derivation
• E.g., A derivation of the string
000#111 in the previous grammar is
A ⇒ 0A1 ⇒ 00A11 ⇒ 000A111
⇒ 000B111 ⇒ 000#111
• The same information can be
represented pictorially by a parse tree
(next slide)
Parse Tree
(parse tree for 000#111)
        A
      / | \
     0  A  1
      / | \
     0  A  1
      / | \
     0  A  1
        |
        B
        |
        #
Language of the Grammar
• In fact, the previous grammar can
generate strings #, 0#1, 00#11,
000#111, …
• The set of all strings that can be
generated by a grammar G is called
the language of G, denoted by L(G)
• The language of the previous
grammar is {0^n#1^n | n ≥ 0}
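To see the first few members of L(G), one can enumerate derivations breadth-first. This is a rough sketch under one stated assumption: no rule of this grammar has an empty right side, so a string never shrinks and any form longer than the bound can be discarded. The function name language_up_to is hypothetical.

```python
from collections import deque

RULES = {"A": ["0A1", "B"], "B": ["#"]}

def language_up_to(start, max_len):
    """Collect every terminal string of length <= max_len derivable from start.

    Pruning by length is safe only because this grammar has no empty
    right-hand sides, so strings never get shorter during a derivation.
    """
    found = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, s in enumerate(form) if s in RULES), None)
        if pos is None:
            found.add(form)  # no variable left: a string of the language
            continue
        for rhs in RULES[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return found

print(sorted(language_up_to("A", 7), key=len))
```

With a bound of 7 the result is the set of strings 0^n#1^n for n = 0, 1, 2, 3.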
CFG (Formal Definition)
• A CFG is a 4-tuple (V,T, R, S), where
– V is a finite set of variables
– T is a finite set of terminals
– R is a set of substitution rules, where
each rule consists of a variable (left side
of the arrow) and a string of variables
and terminals (right side of the arrow)
– S ∈ V is the start variable
CFG (terminology)
• Let u and v be strings of variables and
terminals
• We say u derives v, denoted by u ⇒* v, if
– u = v, or
– there exists u1, u2, …, uk, k ≥ 0 such that
u ⇒ u1 ⇒ u2 ⇒ … ⇒ uk ⇒ v
• In other words, for a grammar G = (V,T,R,S),
L(G) = { w ∈ T* | S ⇒* w }
CFG (more examples)
• Let G = ( {S}, {a,b}, R, S ), and the set
of rules, R, is
– S → aSb | SS | ε
  This notation is an abbreviation for
    S → aSb
    S → SS
    S → ε
• What will this grammar generate?
• If we think of a as “(” and b as “)”, G generates
all strings of properly nested parentheses
• Is the following a CFG?
G = ( {A,B}, {0,1}, R, A )
A → 0B1 | A | 0
B → 1B0 | 1
0B → A
Designing CFG
• Can we design CFG for
{0^n 1^n | n ≥ 0} ∪ {1^n 0^n | n ≥ 0} ?
• Do we know CFG for {0^n 1^n | n ≥ 0}?
• Do we know CFG for {1^n 0^n | n ≥ 0}?
Designing CFG
• CFG for the language L1 = {0^n 1^n | n ≥ 0}
S → 0S1 | ε
• CFG for the language L2 = {1^n 0^n | n ≥ 0}
S → 1S0 | ε
• CFG for L1 ∪ L2
S → S1 | S2
S1 → 0 S1 1 | ε
S2 → 1 S2 0 | ε
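The union grammar can be mirrored by one small recursive function per variable. This is only an illustrative sketch; the function names in_s1, in_s2 and in_s are invented here, each following the rules of the variable it is named after.

```python
def in_s1(w):
    # S1 -> 0 S1 1 | ε : peel a matching 0...1 pair off the outside.
    if w == "":
        return True
    return w.startswith("0") and w.endswith("1") and in_s1(w[1:-1])

def in_s2(w):
    # S2 -> 1 S2 0 | ε : the same idea with the roles of 0 and 1 swapped.
    if w == "":
        return True
    return w.startswith("1") and w.endswith("0") and in_s2(w[1:-1])

def in_s(w):
    # S -> S1 | S2 : the start variable simply chooses one of the two grammars.
    return in_s1(w) or in_s2(w)

for w in ["", "0011", "1100", "0110"]:
    print(repr(w), in_s(w))
```

The new start variable does nothing except pick a sub-grammar, which is why the two original start variables had to be renamed to S1 and S2.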
Designing CFG
• Can we design CFG for {0^{2n}1^{3n} | n ≥ 0}?
• Yes, by “linking” the occurrence of 0’s
with the occurrence of 1’s
• The desired CFG is:
S → 00S111 | ε
• Can we construct the CFG for the
language { w | w is a palindrome } ?
Assume that the alphabet of w is {0,1}
• Examples for palindrome: 010, 0110,
001100, 01010, 1101011, …
Regular Language & CFG
Theorem: Any regular language can be
described by a CFG.
How to prove? (By construction)
Regular Language & CFG
Proof: Let D be the DFA recognizing the
language. Create a distinct variable Vi for
each state qi in D.
• Make V0 the start variable of CFG
Assume that q0 is the start state of D
• Add a rule Vi → aVj if δ(qi,a) = qj
• Add a rule Vi → ε if qi is an accept state
Then, we can show that the above CFG generates
exactly the same language as D (how to show?)
Regular Language & CFG
(Example)
DFA (start state q0, which is also the only accept state):
  δ(q0, 0) = q0    δ(q0, 1) = q1
  δ(q1, 0) = q0    δ(q1, 1) = q1
CFG G = ( {V0, V1}, {0,1}, R, V0 ), where R is
V0 → 0V0 | 1V1 | ε
V1 → 1V1 | 0V0
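The construction in the proof can be written out as a short function. This is a sketch under some assumptions: states are named "q0", "q1", ..., the transition function is a dictionary, a rule body is a tuple of symbols, and the empty tuple () stands for ε. The function name dfa_to_cfg is hypothetical. The DFA used here is the two-state machine whose transitions can be read off from the grammar rules above.

```python
def dfa_to_cfg(states, alphabet, delta, start, accepting):
    """Build grammar rules from a DFA: one variable Vi per state qi."""
    var = {q: "V" + q[1:] for q in states}  # q0 -> V0, q1 -> V1, ...
    rules = {var[q]: [] for q in states}
    for q in states:
        for a in alphabet:
            # Rule Vi -> a Vj whenever delta(qi, a) = qj.
            rules[var[q]].append((a, var[delta[(q, a)]]))
        if q in accepting:
            # Rule Vi -> ε for accept states; () stands for the empty string.
            rules[var[q]].append(())
    return rules, var[start]

delta = {("q0", "0"): "q0", ("q0", "1"): "q1",
         ("q1", "0"): "q0", ("q1", "1"): "q1"}
rules, start_var = dfa_to_cfg(["q0", "q1"], ["0", "1"], delta, "q0", {"q0"})
for head, bodies in rules.items():
    print(head, "->", " | ".join("".join(b) or "ε" for b in bodies))
```

Each derivation step in the resulting grammar corresponds to one DFA transition, and the ε rule can finish the derivation only in an accept state.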
Leftmost Derivation
• A derivation which always replaces the
leftmost variable in each step is called a
leftmost derivation
– E.g., Consider the CFG for the properly nested
parentheses ( {S}, {(,)}, R, S ) with rules R:
S → ( S ) | SS | ε
– Then, S ⇒ SS ⇒ (S)S ⇒ ( )S ⇒ ( ) ( S ) ⇒
( ) ( ) is a leftmost derivation
– But, S ⇒ SS ⇒ S(S) ⇒ (S)(S) ⇒ ( ) ( S ) ⇒
( ) ( ) is not a leftmost derivation
• However, we note that both derivations
correspond to the same parse tree
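Whether a given derivation is leftmost can be checked step by step. The sketch below is illustrative only; the name is_leftmost is made up, sentential forms are written without spaces, and a step counts as leftmost if it can be obtained by rewriting the leftmost variable with some rule.

```python
# Grammar for properly nested parentheses; "" stands for ε.
RULES = {"S": ["(S)", "SS", ""]}

def is_leftmost(derivation):
    """Return True if every step rewrites the leftmost variable using some rule."""
    for u, v in zip(derivation, derivation[1:]):
        pos = next((i for i, s in enumerate(u) if s in RULES), None)
        if pos is None:
            return False  # no variable left, so no further step is possible
        candidates = [u[:pos] + rhs + u[pos + 1:] for rhs in RULES[u[pos]]]
        if v not in candidates:
            return False
    return True

first = ["S", "SS", "(S)S", "()S", "()(S)", "()()"]
second = ["S", "SS", "S(S)", "(S)(S)", "()(S)", "()()"]
print(is_leftmost(first), is_leftmost(second))
```

The second derivation fails at its second step, where the right-hand S is rewritten while a variable still stands to its left.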
Ambiguity
• Sometimes, a string can have two or more
leftmost derivations!!
• E.g., Consider CFG ( {S}, {+,x,a}, R, S) with
rules R:
S → S + S | S x S | a
– The string a + a x a has two leftmost
derivations as follows:
– S ⇒ S + S ⇒ a + S ⇒ a + S x S ⇒ a + a x S
⇒ a + a x a
– S ⇒ S x S ⇒ S + S x S ⇒ a + S x S ⇒ a + a x S
⇒ a + a x a
Ambiguity
• If a string has two or more leftmost
derivations in a CFG G, we say the string is
derived ambiguously in G
• A grammar is ambiguous if some string is
derived ambiguously
• Note that the two leftmost derivations in
the previous example correspond to
different parse trees (see next slide)
– In fact, each leftmost derivation corresponds
to a unique parse tree
Two parse trees for a + a x a
        S                    S
      / | \                / | \
     S  +  S              S  x  S
     |   / | \          / | \   |
     a  S  x  S        S  +  S  a
        |     |        |     |
        a     a        a     a
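Since each leftmost derivation corresponds to a unique parse tree, ambiguity can be detected by counting parse trees. The following sketch is written only for the grammar S → S + S | S x S | a; the function name count_parse_trees is hypothetical and the input is written without spaces.

```python
from functools import lru_cache

def count_parse_trees(w):
    """Count parse trees of w under S -> S+S | SxS | a."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives the substring w[i:j].
        total = 1 if w[i:j] == "a" else 0       # rule S -> a
        for k in range(i + 1, j - 1):           # k = position of the top operator
            if w[k] in "+x":                    # rules S -> S+S and S -> SxS
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(w))

print(count_parse_trees("a+axa"))
```

A count greater than 1 means the string is derived ambiguously; for a + a x a the two trees differ in whether + or x sits at the root.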
Fun Fact:
Inherently Ambiguous
• Sometimes when we have an ambiguous
grammar, we can find an unambiguous grammar
that generates the same language
• However, some languages can be generated
only by ambiguous grammars
E.g., { a^n b^n c^m | n, m ≥ 0 } ∪ { a^n b^m c^m | n, m ≥ 0 }
• Such a language is called inherently ambiguous
Chomsky Normal Form (CNF)
• A CFG is in Chomsky Normal Form if each
rule is of the form
A → BC
A → a
where
– a is any terminal
– A,B,C are variables
– B, C cannot be start variable
• However, S → ε is allowed, where S is the start variable
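The definition translates directly into a rule-by-rule check. This is a minimal sketch under an assumed representation: a grammar is a dictionary from each variable to a list of rule bodies, a body is a tuple of symbols, and () stands for ε. The name is_cnf is hypothetical.

```python
def is_cnf(rules, start):
    """Check every rule against the three allowed CNF shapes."""
    variables = set(rules)
    for head, bodies in rules.items():
        for body in bodies:
            if len(body) == 0:
                ok = head == start                 # only S -> ε is allowed
            elif len(body) == 1:
                ok = body[0] not in variables      # A -> a
            elif len(body) == 2:
                # A -> BC, with neither B nor C being the start variable
                ok = all(s in variables and s != start for s in body)
            else:
                ok = False
            if not ok:
                return False
    return True

not_cnf = {"S": [("A", "S", "A"), ("a", "B")],
           "A": [("B",), ("S",)],
           "B": [("b",), ()]}
print(is_cnf(not_cnf, "S"))
```

The example grammar fails for several reasons at once: a body of length three, a terminal mixed with a variable, unit rules, and an ε rule on a variable other than the start variable.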
Converting a CFG to CNF
Theorem: Any context-free language
can be generated by a context-free
grammar in Chomsky Normal Form.
Hint: When is a general CFG not in
Chomsky Normal Form?
Proof Idea
The only reasons for a CFG not to be in CNF:
1. Start variable appears on right side
2. It has ε rules, such as A → ε
3. It has unit rules, such as A → A, or B → C
4. Some rules do not have exactly two
variables or one terminal on right side
Proof idea: Convert a grammar into CNF
by handling the above cases
The Conversion (step 1)
• Proof: Let G be the context-free
grammar generating the context-free
language. We want to convert G into
CNF.
• Step 1: Add a new start variable S0
and the rule S0 → S, where S is the
start variable of G
This ensures that the start variable of the new grammar
does not appear on the right side of any rule
The Conversion (step 2)
• Step 2: We take care of all ε rules. To
remove the rule A → ε, for each
occurrence of A on the right side of a rule,
we add a new rule with that occurrence
deleted.
– E.g., R → uAvAw causes us to add the rules:
R → uAvw, R → uvAw, R → uvw
• If we have the rule R → A, we add R → ε
unless we had previously removed R → ε
After removing A → ε, the new grammar still
generates the same language as G.
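The rules added for one right side can be generated by deciding, for every occurrence of A, whether to keep or delete it. The sketch below covers only this one sub-step, not the whole of Step 2; the name drop_occurrences is hypothetical, a body is a tuple of symbols, and () stands for ε.

```python
from itertools import product

def drop_occurrences(body, var):
    """All versions of `body` with each occurrence of `var` kept or deleted."""
    # Every symbol contributes one option, except `var`, which contributes two:
    # keep it (a one-symbol tuple) or delete it (the empty tuple).
    options = [((sym,), ()) if sym == var else ((sym,),) for sym in body]
    return {sum(choice, ()) for choice in product(*options)}

# R -> uAvAw: the original body plus the three new ones from the slide.
for body in sorted(drop_occurrences(("u", "A", "v", "A", "w"), "A")):
    print("R ->", "".join(body))
```

For the body consisting of A alone the function returns both (A) and the empty body, which matches the case R → A giving R → ε. Remembering which ε rules were removed earlier is left to the caller.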
The Conversion (step 3)
• Step 3: We remove the unit rule A →
B. To do so, for each rule B → u
(where u is a string of variables and
terminals), we add the rule A → u.
– E.g., if we have A → B, B → aC, B → CC,
we add: A → aC, A → CC
After removing A → B, the new grammar still
generates the same language as G.
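Removing a single unit rule can be sketched as follows. The dictionary-of-tuples grammar representation and the name remove_unit_rule are assumptions made for this example, and the sketch does not keep track of unit rules removed in earlier rounds.

```python
def remove_unit_rule(rules, a, b):
    """Remove the unit rule a -> b and copy every body of b over to a."""
    new_rules = {head: list(bodies) for head, bodies in rules.items()}
    new_rules[a] = [body for body in new_rules[a] if body != (b,)]
    for body in rules.get(b, []):
        # Skip the trivial rule a -> a and anything a already has.
        if body != (a,) and body not in new_rules[a]:
            new_rules[a].append(body)
    return new_rules

grammar = {"A": [("B",)], "B": [("a", "C"), ("C", "C")]}
print(remove_unit_rule(grammar, "A", "B")["A"])
```

If B itself has unit rules, they are copied to A as well and have to be removed in a later round of the same step.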
The Conversion (step 4)
• Step 4: Suppose we have a rule
A → u1 u2 … uk, where k > 2 and each ui
is a variable or a terminal. We replace
this rule by
– A → u1 A1, A1 → u2 A2, A2 → u3 A3, …,
A_{k-2} → u_{k-1} u_k
After the change, the string on the right side of any
rule is either of length 1 (a terminal) or length 2 (two
variables, or 1 variable + 1 terminal, or two terminals)
The Conversion (step 4 cont.)
• To remove a rule A → u1u2 with some
terminals on the right side, we replace
the terminal ui by a new variable Ui and
add the rule Ui → ui
After the change, the string on the right side of any
rule is exactly a terminal or two variables
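Both parts of Step 4 can be sketched as small helpers. The names binarize and lift_terminals, and the naming scheme for the fresh variables (head followed by a counter, and U_ followed by the terminal), are assumptions made for this example; a real conversion would also have to make sure the fresh names do not clash with existing variables.

```python
def binarize(head, body):
    """Split head -> u1 u2 ... uk (k > 2) into a chain of length-2 rules."""
    rules = []
    current = head
    for i, sym in enumerate(body[:-2], start=1):
        fresh = f"{head}{i}"               # hypothetical naming: A1, A2, ...
        rules.append((current, (sym, fresh)))
        current = fresh
    rules.append((current, tuple(body[-2:])))
    return rules

def lift_terminals(body, variables):
    """In a length-2 body, replace each terminal t by a new variable U_t."""
    new_body, new_rules = [], []
    for sym in body:
        if sym in variables:
            new_body.append(sym)
        else:
            fresh = f"U_{sym}"             # hypothetical naming for Ui
            new_body.append(fresh)
            new_rules.append((fresh, (sym,)))
    return tuple(new_body), new_rules

print(binarize("A", ("u1", "u2", "u3", "u4")))
print(lift_terminals(("a", "B"), {"S", "A", "B"}))
```

The first helper produces k - 1 rules for a body of length k, and the second leaves a body untouched when it already consists of two variables.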
The Conversion (example)
• Let G be the grammar on the left
side. We get the new grammar on the
right side after the first step.
S → ASA | aB        S0 → S
A → B | S           S → ASA | aB
B → b | ε           A → B | S
                    B → b | ε
The Conversion (example)
• After that, we remove B → ε
Before removing B → ε     After removing B → ε
S0 → S                    S0 → S
S → ASA | aB              S → ASA | aB | a
A → B | S                 A → B | S | ε
B → b | ε                 B → b
The Conversion (example)
• After that, we remove A → ε
Before removing A → ε     After removing A → ε
S0 → S                    S0 → S
S → ASA | aB | a          S → ASA | aB | a | SA | AS | S
A → B | S | ε             A → B | S
B → b                     B → b
The Conversion (example)
• Then, we remove S → S and S0 → S
After removing S → S:
S0 → S
S → ASA | aB | a | SA | AS
A → B | S
B → b
After removing S0 → S:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → B | S
B → b
The Conversion (example)
• Then, we remove A → B
Before removing A → B:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → B | S
B → b
After removing A → B:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | S
B → b
The Conversion (example)
• Then, we remove A → S
Before removing A → S:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | S
B → b
After removing A → S:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b
The Conversion (example)
• Then, we apply Step 4
Before Step 4:
S0 → ASA | aB | a | SA | AS
S → ASA | aB | a | SA | AS
A → b | ASA | aB | a | SA | AS
B → b
After Step 4:
S0 → AA1 | UB | a | SA | AS
S → AA1 | UB | a | SA | AS
A → b | AA1 | UB | a | SA | AS
B → b
A1 → SA
U → a
Grammar is in CNF