Properties of Context-Free Languages

Uploaded by

SANDYA DUMPA
Topics
1) Simplifying CFGs, Normal forms
2) Pumping lemma for CFLs
3) Closure and decision properties of CFLs

How to “simplify” CFGs?

Three ways to simplify/clean a CFG
(clean)
1. Eliminate useless symbols

(simplify)
2. Eliminate ε-productions: A → ε
3. Eliminate unit productions: A → B

Eliminating useless symbols

Grammar cleanup

Eliminating useless symbols
A symbol X is reachable if there exists a derivation:
• S ⇒* α X β

A symbol X is generating if there exists a derivation:
• X ⇒* w, for some w ∈ T*

For a symbol X to be “useful”, it has to be both reachable and generating:
• S ⇒* α X β ⇒* w’, for some w’ ∈ T*
  (the first part shows X is reachable, the second part shows it is generating)

Algorithm to detect useless symbols
1. First, eliminate all symbols that are not generating
2. Next, eliminate all symbols that are not reachable

Is the order of these steps important, or can we switch?
(The order matters, as the next example shows.)

Example: Useless symbols
• S → AB | a
• A → b

1. A, S are generating
2. B is not generating (and therefore B is useless)
3. ==> Eliminating B (i.e., remove all productions that involve B):
   1. S → a
   2. A → b
4. Now, A is not reachable and therefore is useless
5. Simplified G:
   1. S → a

What would happen if you reverse the order, i.e., test reachability before generating?
It will fail to remove: A → b

Algorithm to find all generating symbols (X ⇒* w)
• Given: G=(V,T,P,S)
• Basis:
  • Every symbol in T is obviously generating.
• Induction:
  • Suppose there is a production A → α, where every symbol of α is generating
  • Then, A is also generating

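The basis and induction steps above amount to a fixed-point computation. The following is a minimal Python sketch under stated assumptions: the representation of productions as (head, body) pairs and the name `generating_symbols` are illustrative and do not come from the slides.

```python
def generating_symbols(terminals, productions):
    """Return the set of generating symbols (terminals included)."""
    generating = set(terminals)          # basis: every terminal is generating
    changed = True
    while changed:                       # induction: repeat until nothing new is added
        changed = False
        for head, body in productions:
            # A -> alpha makes A generating once every symbol of alpha is generating
            if head not in generating and all(s in generating for s in body):
                generating.add(head)
                changed = True
    return generating


# Grammar from the useless-symbols example: S -> AB | a, A -> b
productions = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
gen = generating_symbols({"a", "b"}, productions)
print(sorted(gen))
```

On this grammar B never enters the set, which matches step 2 of the example (B is not generating).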
Algorithm to find all reachable symbols (S ⇒* α X β)
• Given: G=(V,T,P,S)
• Basis:
  • S is obviously reachable (from itself)
• Induction:
  • Suppose there is a production A → α1 α2 … αk, where A is reachable
  • Then, all symbols on the right hand side, {α1, α2, …, αk}, are also reachable.

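The reachability computation can be sketched as a worklist traversal from the start symbol. This is an illustrative sketch only: the (head, body) representation and the name `reachable_symbols` are assumptions, not part of the slides. It is run on the example grammar both before and after B has been removed, to show why the order of the two steps matters.

```python
def reachable_symbols(start, productions):
    """Return every symbol reachable from the start symbol."""
    reachable = {start}                  # basis: S is reachable from itself
    worklist = [start]
    while worklist:
        head = worklist.pop()
        for h, body in productions:
            if h != head:
                continue
            for symbol in body:          # induction: symbols in the body of a reachable head
                if symbol not in reachable:
                    reachable.add(symbol)
                    worklist.append(symbol)
    return reachable


original = [("S", ("A", "B")), ("S", ("a",)), ("A", ("b",))]
after_removing_B = [("S", ("a",)), ("A", ("b",))]

reach_first = reachable_symbols("S", original)           # reachability tested first
reach_second = reachable_symbols("S", after_removing_B)  # reachability tested after removing B
print(sorted(reach_first), sorted(reach_second))
```

If reachability is tested first, A still looks reachable (through S → AB), so A → b survives; tested after removing B, A is no longer reachable.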
Eliminating ε-productions
A → ε

What’s the point of removing ε-productions (A → ε)?
Eliminating ε-productions
• It is not possible to eliminate ε-productions for languages which include ε in their word set, so we will target the grammar for the rest of the language
• Theorem: If G=(V,T,P,S) is a CFG for a language L, then L \ {ε} has a CFG without ε-productions
• Definition: A is “nullable” if A ⇒* ε
• If A is nullable, then any production of the form “B → CAD” can be simulated by:
  • B → CD | CAD
• This allows us to remove the ε-productions for A

Example: Eliminating ε-productions
• Let L be the language represented by the following CFG G:
  i. S → AB
  ii. A → aAA | ε
  iii. B → bBB | ε
Goal: To construct G1, which is the grammar for L - {ε}

• Nullable symbols: {A, B} (and therefore also S, because S → AB)
• G1 can be constructed from G as follows:
  • B → b | bB | bB | bBB
  • ==> B → b | bB | bBB
  • Similarly, A → a | aA | aAA
  • Similarly, S → A | B | AB
• Note: L(G) = L(G1) U {ε}

Simplified grammar G1:
  • S → A | B | AB
  • A → a | aA | aAA
  • B → b | bB | bBB
  (G1 plus the production S → ε again generates L)

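The construction used in this example can be sketched in Python: first compute the nullable variables, then, for each production, emit every variant obtained by keeping or dropping each nullable symbol, and discard the empty body. The names `nullable_symbols` and `eliminate_epsilon` and the (head, body) representation are assumptions made for illustration.

```python
from itertools import product


def nullable_symbols(productions):
    """Variables A with A =>* epsilon (fixed-point computation)."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for head, body in productions:
            # an empty body, or a body made only of nullable symbols, makes head nullable
            if head not in nullable and all(s in nullable for s in body):
                nullable.add(head)
                changed = True
    return nullable


def eliminate_epsilon(productions):
    """Return an epsilon-free production set generating L - {epsilon}."""
    nullable = nullable_symbols(productions)
    result = set()
    for head, body in productions:
        # each nullable symbol may be kept or dropped; other symbols are always kept
        options = [((s,), ()) if s in nullable else ((s,),) for s in body]
        for choice in product(*options):
            new_body = tuple(s for part in choice for s in part)
            if new_body:                 # never emit an epsilon-production
                result.add((head, new_body))
    return result


# G: S -> AB, A -> aAA | epsilon, B -> bBB | epsilon
productions = [("S", ("A", "B")),
               ("A", ("a", "A", "A")), ("A", ()),
               ("B", ("b", "B", "B")), ("B", ())]
g1 = eliminate_epsilon(productions)
for head, body in sorted(g1):
    print(head, "->", "".join(body))
```

Using a set for the result removes duplicates such as the repeated bB in B → b | bB | bB | bBB.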
Eliminating unit productions
A → B (B has to be a variable)

What’s the point of removing unit productions?
It will save substitutions (derivation steps).
E.g.,
before:                    after:
A → B | …                  A → xxx | yyy | zzz | …
B → C | …                  B → xxx | yyy | zzz | …
C → D | …                  C → xxx | yyy | zzz | …
D → xxx | yyy | zzz        D → xxx | yyy | zzz

Eliminating unit productions
• A unit production is one of the form A → B, where both A and B are variables
• E.g.,
  1. E → T | E+T
  2. T → F | T*F
  3. F → I | (E)
  4. I → a | b | Ia | Ib | I0 | I1
• How to eliminate unit productions?
  • Replace E → T with E → F | T*F
  • Then, upon recursive application wherever there is a unit production:
    • E → F | T*F | E+T (substituting for T)
    • E → I | (E) | T*F | E+T (substituting for F)
    • E → a | b | Ia | Ib | I0 | I1 | (E) | T*F | E+T (substituting for I)
  • Now, E has no unit productions
• Similarly, eliminate the remaining unit productions

Example: eliminating unit productions
G:
1. E → T | E+T
2. T → F | T*F
3. F → I | (E)
4. I → a | b | Ia | Ib | I0 | I1

Unit pairs    Only non-unit productions to be added to P1
(E,E)         E → E+T
(E,T)         E → T*F
(E,F)         E → (E)
(E,I)         E → a | b | Ia | Ib | I0 | I1
(T,T)         T → T*F
(T,F)         T → (E)
(T,I)         T → a | b | Ia | Ib | I0 | I1
(F,F)         F → (E)
(F,I)         F → a | b | Ia | Ib | I0 | I1
(I,I)         I → a | b | Ia | Ib | I0 | I1

G1:
1. E → E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1
2. T → T*F | (E) | a | b | Ia | Ib | I0 | I1
3. F → (E) | a | b | Ia | Ib | I0 | I1
4. I → a | b | Ia | Ib | I0 | I1

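The unit-pair table above can be reproduced mechanically. The sketch below is illustrative: the names `unit_pairs`, `eliminate_unit`, and the helper `prods` are assumptions, and each character of a body string is treated as one grammar symbol.

```python
def unit_pairs(variables, productions):
    """All pairs (A, B) such that A =>* B using unit productions only."""
    pairs = {(v, v) for v in variables}          # basis: (A, A) for every variable
    changed = True
    while changed:
        changed = False
        for a, b in list(pairs):
            for head, body in productions:
                # induction: (A, B) is a unit pair and B -> C is a unit production
                if head == b and len(body) == 1 and body[0] in variables:
                    if (a, body[0]) not in pairs:
                        pairs.add((a, body[0]))
                        changed = True
    return pairs


def eliminate_unit(variables, productions):
    """For each unit pair (A, B), add A -> alpha for every non-unit B -> alpha."""
    result = set()
    for a, b in unit_pairs(variables, productions):
        for head, body in productions:
            is_unit = len(body) == 1 and body[0] in variables
            if head == b and not is_unit:
                result.add((a, body))
    return result


def prods(head, *bodies):
    """Hypothetical helper: build productions from strings, one symbol per character."""
    return [(head, tuple(b)) for b in bodies]


variables = {"E", "T", "F", "I"}
productions = (prods("E", "T", "E+T") + prods("T", "F", "T*F") +
               prods("F", "I", "(E)") + prods("I", "a", "b", "Ia", "Ib", "I0", "I1"))
pairs = unit_pairs(variables, productions)
g1 = eliminate_unit(variables, productions)
for head in "ETFI":
    print(head, "->", " | ".join(sorted("".join(b) for h, b in g1 if h == head)))
```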
Putting all this together…
• Theorem: If G is a CFG for a language that contains at least one string other than ε, then there is another CFG G1, such that L(G1) = L(G) - {ε}, and G1 has:
  • no ε-productions
  • no unit productions
  • no useless symbols

• Algorithm:
  Step 1) eliminate ε-productions
  Step 2) eliminate unit productions
  Step 3) eliminate useless symbols

Normal Forms

Why normal forms?
• If all productions of the grammar could be expressed in the same form(s), then:
  a. It becomes easy to design algorithms that use the grammar
  b. It becomes easy to show proofs and properties

Chomsky Normal Form (CNF)
Let G be a CFG for some L - {ε}
Definition:
G is said to be in Chomsky Normal Form if all its productions are in one of the following two forms:
i. A → BC, where A, B, C are variables, or
ii. A → a, where a is a terminal
In addition:
• G has no useless symbols
• G has no unit productions
• G has no ε-productions

CNF checklist
Is this grammar in CNF?
G1:
1. E → E+T | T*F | (E) | a | b | Ia | Ib | I0 | I1
2. T → T*F | (E) | a | b | Ia | Ib | I0 | I1
3. F → (E) | a | b | Ia | Ib | I0 | I1
4. I → a | b | Ia | Ib | I0 | I1

Checklist:
• G has no ε-productions
• G has no unit productions
• G has no useless symbols
• But…
• the normal form for productions is violated (e.g., E → E+T has three symbols in its body and mixes a terminal with variables)

So, the grammar is not in CNF

How to convert a G into CNF?
• Assumption: G has no ε-productions, unit productions or useless symbols

1) For every terminal a that appears in a production body of length 2 or more:
   i. create a unique variable, say Xa, with a production Xa → a, and
   ii. replace all instances of a in those bodies by Xa

2) Now, all productions will be in one of the following two forms:
   A → B1B2…Bk (k ≥ 2) or A → a

3) Replace each production of the form A → B1B2B3…Bk (k ≥ 3) by:
   A → B1C1, C1 → B2C2, and so on, ending with C(k-2) → B(k-1)Bk
   (C1, C2, … are new variables)

Example #1
G:
S => AS | BABC
A => A1 | 0A1 | 01
B => 0B | 0
C => 1C | 1

G in CNF:
X0 => 0
X1 => 1
S => AS | BY1
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1

All productions are of the form: A=>BC or A=>a
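
The conversion of Example #1 can be reproduced with a short Python sketch. It assumes the input already has no ε-productions, unit productions or useless symbols, and that the generated names (X0, X1, Y1, Y2, …) do not clash with existing variables; the function name `to_cnf` and the (head, body) representation are illustrative choices.

```python
def to_cnf(variables, productions):
    """Convert a cleaned-up grammar to CNF (sketch; fresh names assumed unused)."""
    term_var = {}                       # terminal a -> its new variable Xa
    step1 = []
    for head, body in productions:
        if len(body) >= 2:              # step 1: terminals in long bodies become Xa
            body = tuple(s if s in variables else term_var.setdefault(s, "X" + s)
                         for s in body)
        step1.append((head, body))
    step1 += [(x, (a,)) for a, x in term_var.items()]

    result, counter = [], 0
    for head, body in step1:
        while len(body) > 2:            # step 3: split A -> B1 B2 ... Bk into a chain
            counter += 1
            fresh = f"Y{counter}"
            result.append((head, (body[0], fresh)))
            head, body = fresh, body[1:]
        result.append((head, body))
    return result


variables = {"S", "A", "B", "C"}
productions = [("S", tuple("AS")), ("S", tuple("BABC")),
               ("A", tuple("A1")), ("A", tuple("0A1")), ("A", tuple("01")),
               ("B", tuple("0B")), ("B", ("0",)),
               ("C", tuple("1C")), ("C", ("1",))]
cnf = to_cnf(variables, productions)
for head, body in cnf:
    print(head, "=>", "".join(body))
```

With productions listed in this order, the fresh variables come out named as on the slide (Y1 and Y2 for BABC, Y3 for 0A1).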

Languages with ε
• For languages that include ε,
  • Write down the rest of grammar in CNF
  • Then add production “S => ε” at the end
E.g., consider:
G:
S => AS | BABC
A => A1 | 0A1 | 01 | ε
B => 0B | 0 | ε
C => 1C | 1 | ε

G in CNF:
X0 => 0
X1 => 1
S => AS | BY1 | ε
Y1 => AY2
Y2 => BC
A => AX1 | X0Y3 | X0X1
Y3 => AX1
B => X0B | 0
C => X1C | 1

Other Normal Forms
• Greibach Normal Form (GNF)
  • All productions are of the form A ==> a α, where a is a terminal and α is a (possibly empty) string of variables
Example:
S → mXY | mY | mXC | mC | p
X → mX | m
Y → mXD | mD | o
O → o
P → p

Return of the Pumping Lemma !!
Think of languages that cannot be CFLs
== think of languages for which a stack will not be enough
e.g., the language of strings of the form ww

Why pumping lemma?
• A result that will be useful in proving that certain languages are not CFLs
• (just like we did for regular languages)

The Pumping Lemma for CFLs
Let L be a CFL.
Then there exists a constant N, s.t.,
• if z ∈ L s.t. |z| ≥ N, then we can write z = uvwxy, such that:
  1. |vwx| ≤ N
  2. vx ≠ ε
  3. For all k ≥ 0: u v^k w x^k y ∈ L

Note: we are pumping in two places (v & x)

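Meaning: repetition in the last m+1 variables
[Figure: parse tree for z = uvwxy. A path S = A0, A1, A2, …, Ah-1, Ah = a with h ≥ m+1 (more than m levels) passes through only m distinct variables, so among its last m+1 variables two must coincide: Ai = Aj with h-m ≤ i < j ≤ h. The subtree below Ai yields vwx and the subtree below Aj yields w.]
• Therefore, vx ≠ ε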
Extending the parse tree…
[Figure: two parse trees rooted at S = A0, with Ai = Aj and h ≥ m+1. Replacing Aj with Ai (k times) gives z = u v^k w x^k y; or, replacing Ai with Aj gives z = uwy.]
==> For all k ≥ 0: u v^k w x^k y ∈ L

Application of Pumping Lemma for CFLs
Example 1: L = {a^m b^m c^m | m > 0}
Claim: L is not a CFL
Proof:
• Let N <== P/L constant
• Pick z = a^N b^N c^N
• Apply pumping lemma to z and show that there exists at least one other string constructed from z (obtained by pumping up or down) that is ∉ L

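Proof contd…
• z = uvwxy
• As z = a^N b^N c^N and |vwx| ≤ N and vx ≠ ε
• ==> v, x cannot contain all three symbols (a, b, c)
• ==> we can pump up or pump down to build another string which is ∉ L (pumping changes the count of at most two of the three symbols, so the three counts can no longer be equal)
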
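The argument can be checked by brute force for small values of N. The sketch below is only a sanity check for specific N, not a proof; the names `in_L` and `every_split_fails` are made up for illustration. It enumerates every decomposition z = uvwxy with |vwx| ≤ N and vx ≠ ε, and confirms that pumping down (k = 0) or up (k = 2) always leaves L.

```python
def in_L(s):
    """Membership in L = { a^m b^m c^m | m > 0 }."""
    m = len(s) // 3
    return m > 0 and s == "a" * m + "b" * m + "c" * m


def every_split_fails(N):
    """True if no split z = uvwxy (|vwx| <= N, vx nonempty) survives pumping."""
    z = "a" * N + "b" * N + "c" * N
    n = len(z)
    for i in range(n + 1):                          # u = z[:i]
        for l in range(i, min(i + N, n) + 1):       # vwx = z[i:l], so |vwx| <= N
            for j in range(i, l + 1):               # v = z[i:j]
                for k in range(j, l + 1):           # w = z[j:k], x = z[k:l]
                    u, v, w, x, y = z[:i], z[i:j], z[j:k], z[k:l], z[l:]
                    if not (v or x):
                        continue                    # condition vx != epsilon
                    # the lemma would require both pumped strings to stay in L
                    if all(in_L(u + v * t + w + x * t + y) for t in (0, 2)):
                        return False
    return True


results = {N: every_split_fails(N) for N in range(1, 5)}
print(results)
```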
Example #2 for P/L application
• L = { ww | w is in {0,1}* }
• Show that L is not a CFL
  • Try string z = 0^N 0^N
    • what happens?
  • Try string z = 0^N 1^N 0^N 1^N
    • what happens?

CFL Closure Properties

Closure Property Results
• CFLs are closed under:
  • Union
  • Concatenation
  • Kleene closure operator
  • Substitution
  • Homomorphism, inverse homomorphism
  • Reversal
• CFLs are not closed under:
  • Intersection
  • Difference
  • Complementation
  Note: Regular languages are closed under these operators
Strategy for Closure Property Proofs
• First prove “closure under substitution”
• Using the above result, prove other closure properties
• CFLs are closed under:
  • Union
  • Concatenation
  • Kleene closure operator
  • Substitution (prove this first)
  • Homomorphism, inverse homomorphism
  • Reversal

The Substitution operation
Note: s(L) can use a different alphabet

For each a ∈ ∑, let s(a) be a language
If w = a1a2…an ∈ L, then:
  s(w) = { x1x2…xn | xi ∈ s(ai) } ⊆ s(L)
Example:
• Let ∑ = {0,1}
• Let: s(0) = {a^n b^n | n ≥ 1}, s(1) = {aa, bb}
• If w = 01, s(w) = s(0).s(1)
  • E.g., s(w) contains a^1 b^1 aa, a^1 b^1 bb, a^2 b^2 aa, a^2 b^2 bb, … and so on.

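The example s(w) for w = 01 can be computed directly once s(0) is truncated to a finite set. In this sketch the bound n ≤ 3 and the function name `substitute` are assumptions made only so that the sets are finite and printable.

```python
def substitute(word, s):
    """s(w): concatenate, symbol by symbol, one string from each s(a_i)."""
    result = {""}
    for symbol in word:
        result = {x + y for x in result for y in s[symbol]}
    return result


# s(0) = { a^n b^n | n >= 1 } truncated to n <= 3 (assumption), s(1) = { aa, bb }
s = {"0": {"a" * n + "b" * n for n in range(1, 4)},
     "1": {"aa", "bb"}}
sw = substitute("01", s)
print(sorted(sw, key=lambda t: (len(t), t)))
```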
CFLs are closed under Substitution
IF L is a CFL and s is a substitution defined on L s.t. s(a) is a CFL for every symbol a, THEN:
• s(L) is also a CFL

What is s(L)?
s(L) is the union of s(w) over all w in L:
L        s(L)
w1       s(w1)
w2       s(w2)
w3       s(w3)
w4       s(w4)
Note: each s(w) is itself a set of strings
CFLs are closed under union
Let L1 and L2 be CFLs
To show: L1 U L2 is also a CFL
Let us show by using the result of Substitution

• Make a new language:
  • Lnew = {a,b} s.t., s(a) = L1 and s(b) = L2
  ==> s(Lnew) == same as == L1 U L2

• A more direct, alternative proof
  • Let S1 and S2 be the starting variables of the grammars for L1 and L2
  • Add a new start variable S with the productions S → S1 | S2
Example
Language                      Grammar
L1 = {a^n b^n}                S1 → aS1b | ε
L2 = {ww^R}                   S2 → aS2a | bS2b | ε
Union
L = {a^n b^n} U {ww^R}        S → S1 | S2
CFLs are closed under concatenation
• Let L1 and L2 be CFLs
Let us show by using the result of Substitution

• Make Lnew = {ab} s.t., s(a) = L1 and s(b) = L2
  ==> L1 L2 = s(Lnew)

• A proof without using substitution?

Example
Language                      Grammar
L1 = {a^n b^n}                S1 → aS1b | ε
L2 = {ww^R}                   S2 → aS2a | bS2b | ε
Concatenation
L = {a^n b^n}{ww^R}           S → S1S2
CFLs are closed under Kleene Closure
• Let L be a CFL
• Let Lnew = {a}* and s(a) = L
• Then, L* = s(Lnew)

Example
Language                      Grammar
L = {a^n b^n}                 S → aSb | ε
Star Operation
L* = {a^n b^n}*               S1 → SS1 | ε
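The three example constructions (union, concatenation, star) can be written as grammar transformations and checked on short strings. This is a sketch under stated assumptions: the two grammars use disjoint variable names, the new start symbols are unused, and the helper `language` is a hypothetical bounded enumerator whose length cap (max_len + 3) is sufficient for these small grammars but is not a general-purpose bound.

```python
def union(g1, s1, g2, s2, new_start="S"):
    return g1 + g2 + [(new_start, (s1,)), (new_start, (s2,))]       # S -> S1 | S2


def concatenation(g1, s1, g2, s2, new_start="S"):
    return g1 + g2 + [(new_start, (s1, s2))]                        # S -> S1 S2


def star(g, s, new_start="S0"):
    return g + [(new_start, (s, new_start)), (new_start, ())]       # S0 -> S S0 | epsilon


def language(productions, start, max_len):
    """Terminal strings of length <= max_len, found via leftmost derivations."""
    variables = {h for h, _ in productions}
    seen, stack, words = set(), [(start,)], set()
    while stack:
        form = stack.pop()
        if form in seen:
            continue
        seen.add(form)
        idx = next((i for i, s in enumerate(form) if s in variables), None)
        if idx is None:                       # no variables left: a terminal string
            words.add("".join(form))
            continue
        for head, body in productions:
            if head == form[idx]:
                new = form[:idx] + body + form[idx + 1:]
                n_terminals = sum(s not in variables for s in new)
                if n_terminals <= max_len and len(new) <= max_len + 3:
                    stack.append(new)
    return words


g1 = [("S1", ("a", "S1", "b")), ("S1", ())]                                # {a^n b^n}
g2 = [("S2", ("a", "S2", "a")), ("S2", ("b", "S2", "b")), ("S2", ())]      # {w w^R}
w1, w2 = language(g1, "S1", 4), language(g2, "S2", 4)
w_union = language(union(g1, "S1", g2, "S2"), "S", 4)
w_concat = language(concatenation(g1, "S1", g2, "S2"), "S", 4)
w_star = language(star(g1, "S1"), "S0", 4)
print(sorted(w_union), sorted(w_concat), sorted(w_star))
```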
CFLs are closed under Reversal
We won’t use substitution to prove this result
• Let L be a CFL, with grammar G=(V,T,P,S)
• For L^R, construct G^R=(V,T,P^R,S) s.t.,
  • If A ==> α is in P, then:
  • A ==> α^R is in P^R
  • (that is, reverse the body of every production)

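A small sketch of the reversal construction follows. The grammar used here is a made-up, non-recursive example so that the whole language can be enumerated; `reverse_grammar` and `expand` are illustrative names, and `expand` only terminates for non-recursive grammars with single-character terminals.

```python
def reverse_grammar(productions):
    """Build P^R by reversing the body of every production."""
    return [(head, body[::-1]) for head, body in productions]


def expand(productions, symbol):
    """All strings derivable from symbol (assumes a non-recursive grammar)."""
    bodies = [b for h, b in productions if h == symbol]
    if not bodies:
        return {symbol}                 # a terminal derives only itself
    out = set()
    for body in bodies:
        partial = {""}
        for s in body:
            partial = {p + q for p in partial for q in expand(productions, s)}
        out |= partial
    return out


# Hypothetical grammar: S -> aB, B -> bc | c, so L = {abc, ac}
g = [("S", ("a", "B")), ("B", ("b", "c")), ("B", ("c",))]
g_rev = reverse_grammar(g)
lang = expand(g, "S")
lang_rev = expand(g_rev, "S")
print(sorted(lang), sorted(lang_rev))
```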
Some negative closure results
CFLs are not closed under Intersection
• Existential proof:
  • L1 = {0^n 1^n 2^i | n ≥ 1, i ≥ 1}
  • L2 = {0^i 1^n 2^n | n ≥ 1, i ≥ 1}
• Both L1 and L2 are CFLs
  • Grammars?
• But L1 ∩ L2 cannot be a CFL
  • Why? (L1 ∩ L2 = {0^n 1^n 2^n | n ≥ 1}, which the pumping lemma shows is not a CFL)
• We have an example where the intersection of two CFLs is not a CFL.
• Therefore, CFLs are not closed under intersection
Example
L1 = {a^n b^n c^m}            L2 = {a^n b^m c^m}
Context-free:                 Context-free:
S → AC                        S → AB
A → aAb | ε                   A → aA | ε
C → cC | ε                    B → bBc | ε
Intersection
L1 ∩ L2 = {a^n b^n c^n}       NOT context-free
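The claimed intersection can be checked on all short strings. This sketch uses direct membership predicates rather than the grammars themselves; the names `in_L1` and `in_L2` and the length bound of 6 are assumptions chosen for illustration, and n, m ≥ 0 is allowed because both grammars above have ε-productions.

```python
import re
from itertools import product


def in_L1(s):
    """a^n b^n c^m: equal numbers of a's and b's, any number of c's."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(1)) == len(m.group(2))


def in_L2(s):
    """a^n b^m c^m: any number of a's, equal numbers of b's and c's."""
    m = re.fullmatch(r"(a*)(b*)(c*)", s)
    return m is not None and len(m.group(2)) == len(m.group(3))


# every string over {a, b, c} of length at most 6
words = ["".join(p) for n in range(7) for p in product("abc", repeat=n)]
both = {w for w in words if in_L1(w) and in_L2(w)}
print(sorted(both, key=len))
```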
Some negative closure results
CFLs are not closed under complementation
• Follows from the fact that CFLs are not closed under intersection
• By De Morgan's law: L1 ∩ L2 = complement( complement(L1) U complement(L2) )
Logic: if CFLs were to be closed under complementation
• ==> the whole right hand side becomes a CFL (because CFLs are closed under union)
• ==> the left hand side (intersection) is also a CFL
• but we just showed CFLs are NOT closed under intersection!
• ==> CFLs cannot be closed under complementation.
Some negative closure results
CFLs are not closed under difference
• Follows from the fact that CFLs are not closed under complementation
• Because, if CFLs were closed under difference, then:
  • complement(L) = ∑* - L, and ∑* is itself a CFL
  • So complement(L) would have to be a CFL too
  • Contradiction

Decision Properties
• Emptiness test
  • Generating test
  • Reachability test
• Membership test
  • PDA acceptance

The CYK Algorithm
• J. Cocke
• D. Younger
• T. Kasami
• Independently developed an algorithm to answer this question (the membership test: is a given string w in L(G)?).
The CYK Algorithm Basics
• Relies on the structure of the rules in a Chomsky Normal Form grammar
• Uses a “dynamic programming” or “table-filling” algorithm
Chomsky Normal Form
• Normal Form is described by a set of conditions that each rule in the grammar must satisfy
• A context-free grammar is in CNF if each rule has one of the following forms:
  • A → BC (at most 2 symbols on right side)
  • A → a (terminal symbol), or
  • S → λ (null string)
  where B, C ∈ V - {S}
Construct a Triangular Table
• Each row corresponds to one length of substrings
  • Bottom Row: strings of length 1
  • Second from Bottom Row: strings of length 2
  • …
  • Top Row: the whole string ‘w’
Construct a Triangular Table
• Xi,i is the set of variables A such that A → wi is a production of G
• In general, Xi,j is the set of variables A such that A ⇒* wi…wj
• To fill Xi,j, compare at most n pairs of previously computed sets:
  (Xi,i , Xi+1,j), (Xi,i+1 , Xi+2,j) … (Xi,j-1 , Xj,j)
Construct a Triangular Table
X1, 5
X1, 4 X2, 5
X1, 3 X2, 4 X3, 5
X1, 2 X2, 3 X3, 4 X4, 5
X1, 1 X2, 2 X3, 3 X4, 4 X5, 5
w1 w2 w3 w4 w5

Table for string ‘w’ that has length 5


Example CYK Algorithm
• Show the CYK Algorithm with the following example:
  • CNF grammar G
    • S → AB | BC
    • A → BA | a
    • B → CC | b
    • C → AB | a
  • w is baaba
  • Question: Is baaba in L(G)?
Constructing The Triangular Table
Grammar: S → AB | BC;  A → BA | a;  B → CC | b;  C → AB | a

Calculating the Bottom ROW
{B}     {A, C}  {A, C}  {B}     {A, C}
b       a       a       b       a

Constructing The Triangular Table (substrings of length 2)
Grammar: S → AB | BC;  A → BA | a;  B → CC | b;  C → AB | a

• X1,2 = (Xi,i , Xi+1,j) = (X1,1 , X2,2)
  • → {B}{A, C} = {BA, BC}
  • Steps:
    • Look for production rules to generate BA or BC
    • There are two: S and A
  • X1,2 = {S, A}

• X2,3 = (Xi,i , Xi+1,j) = (X2,2 , X3,3)
  • → {A, C}{A, C} = {AA, AC, CA, CC} = Y
  • Steps:
    • Look for production rules to generate Y
    • There is one: B
  • X2,3 = {B}

• X3,4 = (Xi,i , Xi+1,j) = (X3,3 , X4,4)
  • → {A, C}{B} = {AB, CB} = Y
  • Steps:
    • Look for production rules to generate Y
    • There are two: S and C
  • X3,4 = {S, C}

• X4,5 = (Xi,i , Xi+1,j) = (X4,4 , X5,5)
  • → {B}{A, C} = {BA, BC} = Y
  • Steps:
    • Look for production rules to generate Y
    • There are two: S and A
  • X4,5 = {S, A}

Table after filling the second row:
{S, A}  {B}     {S, C}  {S, A}
{B}     {A, C}  {A, C}  {B}     {A, C}
b       a       a       b       a
Constructing The Triangular Table (substrings of length 3)
Grammar: S → AB | BC;  A → BA | a;  B → CC | b;  C → AB | a

• X1,3 = (Xi,i , Xi+1,j) (Xi,i+1 , Xi+2,j) = (X1,1 , X2,3) , (X1,2 , X3,3)
  • → {B}{B} U {S, A}{A, C} = {BB, SA, SC, AA, AC} = Y
  • Steps:
    • Look for production rules to generate Y
    • There are NONE
  • X1,3 = Ø
  • no elements in this set (empty set)

• X2,4 = (Xi,i , Xi+1,j) (Xi,i+1 , Xi+2,j) = (X2,2 , X3,4) , (X2,3 , X4,4)
  • → {A, C}{S, C} U {B}{B} = {AS, AC, CS, CC, BB} = Y
  • Steps:
    • Look for production rules to generate Y
    • There is one: B
  • X2,4 = {B}

• X3,5 = (Xi,i , Xi+1,j) (Xi,i+1 , Xi+2,j) = (X3,3 , X4,5) , (X3,4 , X5,5)
  • → {A, C}{S, A} U {S, C}{A, C} = {AS, AA, CS, CA, SA, SC, CA, CC} = Y
  • Steps:
    • Look for production rules to generate Y
    • There is one: B
  • X3,5 = {B}

Table after filling the third row:
Ø       {B}     {B}
{S, A}  {B}     {S, C}  {S, A}
{B}     {A, C}  {A, C}  {B}     {A, C}
b       a       a       b       a
Final Triangular Table
{S, A, C}   <-- X1,5
Ø       {S, A, C}
Ø       {B}     {B}
{S, A}  {B}     {S, C}  {S, A}
{B}     {A, C}  {A, C}  {B}     {A, C}
b       a       a       b       a

- Table for string ‘w’ that has length 5
- The algorithm populates the triangular table

Example (Result)
• Is baaba in L(G)?
Yes
We can see the S in the set X1,n where n = 5.
In the table, the cell X1,5 = {S, A, C}; since S ∈ X1,5, baaba ∈ L(G).
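
The table-filling procedure walked through above can be sketched in Python. The function name `cyk` and the dictionary keyed by (i, j) are illustrative choices; indices are 1-based to match the Xi,j notation of the slides, and the grammar is assumed to be in CNF.

```python
def cyk(productions, w):
    """Fill the triangular table: X[i, j] = variables deriving w_i ... w_j."""
    n = len(w)
    X = {}
    for i in range(1, n + 1):                       # bottom row: substrings of length 1
        X[i, i] = {h for h, body in productions if body == (w[i - 1],)}
    for length in range(2, n + 1):                  # one row per substring length
        for i in range(1, n - length + 2):
            j = i + length - 1
            cell = set()
            for k in range(i, j):                   # pairs (X[i,k], X[k+1,j])
                for head, body in productions:
                    if (len(body) == 2 and body[0] in X[i, k]
                            and body[1] in X[k + 1, j]):
                        cell.add(head)
            X[i, j] = cell
    return X


# CNF grammar G from the example: S -> AB | BC, A -> BA | a, B -> CC | b, C -> AB | a
productions = [("S", ("A", "B")), ("S", ("B", "C")),
               ("A", ("B", "A")), ("A", ("a",)),
               ("B", ("C", "C")), ("B", ("b",)),
               ("C", ("A", "B")), ("C", ("a",))]
table = cyk(productions, "baaba")
accepted = "S" in table[1, 5]
print(table[1, 5], accepted)
```

The three nested loops over length, start position and split point give the usual O(n^3) table-filling cost for a fixed grammar.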
“Undecidable” problems for CFL
• Is a given CFG G ambiguous?
• Is a given CFL inherently ambiguous?
• Is the intersection of two CFLs empty?
• Are two CFLs the same?
• Is a given L(G) equal to ∑*?

Summary
• Normal Forms
  • Chomsky Normal Form
  • Greibach Normal Form
  • Useful in proving P/L
• Pumping Lemma for CFLs
  • Main difference: z = u v^i w x^i y
• Closure properties
  • Closed under: union, concatenation, reversal, Kleene closure, homomorphism, substitution
  • Not closed under: intersection, complementation, difference