
Jan-June 2025 Btcs 4 Sem v10 Btcs404 Btcs404 Unit3 Notes

The document provides an overview of context-free grammars (CFG), including their components, derivation processes, and examples of CFGs for various languages. It discusses concepts such as derivation trees, leftmost and rightmost derivations, ambiguity in grammars, and closure properties of context-free languages. Additionally, it covers CFG simplification techniques, including the removal of unit and null productions.

Uploaded by

ved26raj

KALINGA UNIVERSITY

Department of Computer Science


Unit – 3
COURSE: BTCS SEM/Year: 4th Sem
SUBJECT: TOAFL (BTCS 404)

Grammar:
 A grammar consists of:
o a set of variables (also called non-terminals), one of which is designated the
start variable. It is customary to use upper case letters for the variables.
o a set of terminals (from the alphabet) and
o a list of productions (also called rules).
 E.g.: the language {0^n1^n | n ≥ 0}. The grammar for this is:
o S → 0S1
o S→ϵ
Here, ‘S’ is the only variable. The terminals are 0 and 1 and there are 2 productions.
 A production allows one to take a string containing a variable and replace the variable
by the RHS of the production.
 String w of terminals is generated by the grammar if, starting with the start variable,
one can apply productions and end up with w. The sequence of strings so obtained is a
derivation of w.
 We focus on a special version of grammars called a context-free grammar (CFG). A
language is context-free if it is generated by a CFG.
Context Free Grammar (CFG):
 A Context Free Grammar (CFG) is a 4-tuple G = (V, Ʃ, R, S), where:
o V: Finite set of variables or non-terminals, generally represented by capital
letters.
o Ʃ: Finite set of terminals, disjoint from V.
o R: Finite set of rules (productions), each rule consisting of a variable and a string of
variables and terminals, i.e.: V → (V U Ʃ)*.
o S: Starting variable or non-terminal (S ϵ V), called the start symbol of the grammar.
 G is context free if all the productions in ‘R’ have the form α → β, where
α ϵ V and β ϵ (V U Ʃ)*.
 Every regular grammar is context free, so every regular language is also context free.
The converse does not hold: {0^n1^n | n ≥ 0} is generated by the grammar S → 0S1 | ϵ
but is not regular. So, the family of regular languages is a proper subset of the family
of context free languages.
 Many programming language constructs have an inherently recursive structure that
can be defined by a context free grammar. E.g.: if ‘S1’ and ‘S2’ are statements and ‘E’ is
an expression, then “if E then S1 else S2” is a statement. The production for this is:
stmt → if expr then stmt else stmt
Derivation:
 First, we define two notations: =>G and =>G*.
 If α → β is a production of ‘R’ in a CFG and ‘a’ and ‘b’ are strings in (V U Ʃ)*, then
a α b =>G a β b. We say that the production α → β is applied to the string a α b to
obtain a β b, or that a α b directly derives a β b.

 If α1, α2, …, αm are strings in (V U Ʃ)*, where m ≥ 1 and α1 =>G α2 =>G … =>G αm,
then we say that α1 =>G* αm, i.e.: ‘α1’ derives ‘αm’ in grammar ‘G’.
E.g.: R = A → 0A1
A → B
B → #
 The context free grammar is: V = {A, B}, Ʃ = {0, 1, #}, S = A. A derivation of the
string 000#111 in grammar ‘G’ is: A =>G 0A1 =>G 00A11 =>G 000A111 =>G 000B111
=>G 000#111.
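The derivation above can be replayed mechanically. The following is a minimal sketch, assuming the productions A → 0A1, A → B, B → # and single-character symbols; the helper name derive is hypothetical and only illustrates the "directly derives" step.

```python
def derive(start, steps):
    """Apply each production (lhs, rhs) to the leftmost occurrence of lhs.

    Returns the list of sentential forms, beginning with the start symbol.
    """
    forms = [start]
    current = start
    for lhs, rhs in steps:
        i = current.index(lhs)  # raises ValueError if lhs does not occur
        current = current[:i] + rhs + current[i + 1:]
        forms.append(current)
    return forms

# Three uses of A -> 0A1, then A -> B, then B -> #
steps = [("A", "0A1")] * 3 + [("A", "B"), ("B", "#")]
forms = derive("A", steps)
print(" => ".join(forms))
```

Each element of forms is one sentential form, so the list as a whole is a derivation of the last string from the start symbol.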
Design of CFG for a Language:
1. Design a CFG for the language L = {a^n b^n | n ≥ 1}.
Ans: V = {S}, Ʃ = {a, b}
R = S → ab
S → aSb
E.g.: S =>G aSb =>G aabb
2. Design a CFG for L = {wcw^R | w ϵ (a, b)*}, where w^R is the reverse of w.
Ans: E.g.: for w = abab, the string wcw^R = ababcbaba is in L.
Ʃ = {a, b, c}
V = {S}
R = S → aSa
S → bSb
S → c
The productions S → abSba and S → abcba are derivable from these three rules and are
therefore redundant.
3. Design a CFG for any combination of ‘a’ and ‘b’, i.e.: L = (a + b)*.
Ans: Ʃ = {a, b}
V = {S}
R = S → ϵ
S → aS
S → bS
4. Design a CFG that generates the strings over {a, b} containing at least one occurrence of ‘aaa’.
Ans: Ʃ = {a, b}
V = {S, A}
R = S → AaaaA
A → aA | bA | ϵ
5. Design a CFG, generating an alternating sequence of ‘0’ and ‘1’.
Ans: Ʃ = {0, 1}
V = {S, A, B}
R = S → 0A
S → 1B
A → 1B
A→ϵ
B → 0A
B→ϵ

6. Construct a CFG, which generates palindrome binary numbers.
Ans: Ʃ = {0, 1}
V = {S}
R = S → 0S0
S → 1S1
S→0
S→1
S→ϵ
7. Design a CFG for a regular expression (0 + 1)*. 00. (0 + 1)*.
Ans: Ʃ = {0, 1}
V = {S, A}
R = S → A00A
A → 0A
A → 1A
A→ϵ
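A designed grammar can be sanity-checked by enumerating every string it generates up to a small length and comparing with a direct description of the language. The sketch below does this for examples 6 and 7; the function names generate and all_binary are hypothetical, and the pruning bound is an assumption that holds for the grammars in this section, not for every CFG.

```python
import re
from collections import deque
from itertools import product

def generate(rules, start, max_len):
    """Terminal strings of length <= max_len derivable from `start`.

    Assumptions of this sketch: variables are single upper-case letters,
    "" stands for the empty string, and no useful sentential form needs
    more than max_len + 2 symbols.
    """
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Expand the leftmost variable; a form without variables is a result.
        idx = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if idx is None:
            results.add(form)
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            terminals = sum(not ch.isupper() for ch in new)
            if terminals <= max_len and len(new) <= max_len + 2 and new not in seen:
                seen.add(new)
                queue.append(new)
    return results

def all_binary(max_len):
    """All strings over {0, 1} of length 0..max_len."""
    return {"".join(p) for n in range(max_len + 1) for p in product("01", repeat=n)}

rules6 = {"S": ["0S0", "1S1", "0", "1", ""]}        # binary palindromes
rules7 = {"S": ["A00A"], "A": ["0A", "1A", ""]}     # (0 + 1)* 00 (0 + 1)*

palindromes = {w for w in all_binary(5) if w == w[::-1]}
with_00 = {w for w in all_binary(5) if re.fullmatch(r"[01]*00[01]*", w)}
print(generate(rules6, "S", 5) == palindromes)
print(generate(rules7, "S", 5) == with_00)
```

Agreement up to a bounded length is only evidence, not a proof, that the grammar generates the intended language.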
Generation of Derivation Tree:
 A derivation tree or parse tree is an ordered rooted tree that graphically represents
how a string is derived from a context-free grammar.
Representation Technique:
 Root vertex: Must be labelled by the start symbol.
 Interior vertex: Labelled by a non-terminal symbol.
 Leaves: Labelled by a terminal or ϵ.
 If S → x1x2 … xn is a production rule in a CFG, then the parse tree/ derivation tree
has a root labelled S whose children, from left to right, are x1, x2, …, xn.

 There are two different approaches to draw a derivation tree;


 Top down Approach:
o Starts with the starting symbol S
o Goes down to tree leaves using productions.
 Bottom up Approach:
o Starts with tree leaves.
o Proceeds upward to the root which is the starting symbol S.
Derivation or Yield of a Tree:
 The derivation or the yield of a parse tree is the final string obtained by concatenating
the labels of the leaves of the tree from left to right, ignoring the Nulls. However, if
all the leaves are Null, derivation is Null.
 Example: Let a CFG {V, Ʃ, R, S} be
V = {S}, Ʃ = {a, b}, Starting symbol = S, R = S → SS | aSb | ϵ.
One string derivable from the above CFG is “abaabb”:
S => SS => aSbS => abS => abaSb => abaaSbb => abaabb
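The yield can be computed by a left-to-right walk over the leaves. This is a minimal sketch, assuming a parse tree is written as a (label, children) pair with an empty child list for leaves and the label "ε" for null leaves; the names tree_yield and leaf are hypothetical.

```python
def tree_yield(node):
    """Concatenate leaf labels from left to right, skipping null (ε) leaves."""
    label, children = node
    if not children:
        return "" if label == "ε" else label
    return "".join(tree_yield(child) for child in children)

def leaf(symbol):
    return (symbol, [])

# Parse tree for "abaabb" under S -> SS | aSb | ε
empty = ("S", [leaf("ε")])                     # S -> ε
ab = ("S", [leaf("a"), empty, leaf("b")])      # S -> aSb with inner S -> ε
aabb = ("S", [leaf("a"), ab, leaf("b")])       # S -> aSb with inner S deriving ab
tree = ("S", [ab, aabb])                       # S -> SS

print(tree_yield(tree))
```

A tree whose leaves are all null, such as empty above, yields the empty string, which matches the rule that the derivation is Null in that case.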

Sentential Form and Partial Derivation Tree:
 A partial derivation tree is a sub-tree of a derivation tree/parse tree such that either all
of its children are in the sub-tree or none of them are in the sub-tree.
Example:
 If in any CFG the productions are: S → AB, A → aaA | ϵ, B →Bb| ϵ the partial
derivation tree can be the following:
 If a partial derivation tree contains the root S, it is called a sentential form. The above
sub-tree is also in sentential form.
Leftmost and Rightmost Derivation of a String:
 Leftmost derivation - A leftmost derivation is obtained by applying production to the
leftmost variable in each step.
 Rightmost derivation - A rightmost derivation is obtained by applying production to
the rightmost variable in each step.
Example:
 Let any set of production rules in a CFG be:
X → X + X | X * X | X | a over the alphabet {a, +, *}.
 The leftmost derivation for the string "a + a * a" may be:
X => X + X => a + X => a + X * X => a + a * X => a + a * a
 The rightmost derivation for the above string "a + a * a" may be:
X => X * X => X * a => X + X * a => X + a * a => a + a * a
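Both derivations can be reproduced step by step by always rewriting the leftmost or the rightmost variable. The sketch below assumes the single variable X and writes the string without spaces; the function name derive and its parameters are hypothetical.

```python
def derive(start, bodies, variable="X", leftmost=True):
    """Rewrite the leftmost (or rightmost) variable with each body in turn."""
    forms = [start]
    current = start
    for body in bodies:
        i = current.find(variable) if leftmost else current.rfind(variable)
        if i < 0:
            raise ValueError("no variable left to rewrite")
        current = current[:i] + body + current[i + 1:]
        forms.append(current)
    return forms

left = derive("X", ["X+X", "a", "X*X", "a", "a"], leftmost=True)
right = derive("X", ["X*X", "a", "X+X", "a", "a"], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both lists end in the same string; they differ only in which variable is chosen at each step and, for this grammar, in the first production applied.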

Ambiguity in Context Free Grammars:

 If a context free grammar G has more than one derivation tree for some string
w ϵ L(G), it is called an ambiguous grammar. There exist multiple right-most or left-most
derivations for some string generated from that grammar.


Problem:
 Check whether the grammar G with production rules: X → X + X | X * X | X | a is
ambiguous or not.
Solution:
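 Let’s find out the derivation tree for the string "a + a * a". It has two leftmost
derivations.
Derivation 1: X => X + X => a + X => a + X * X => a + a * X => a + a * a
Parse tree 1: the root X has children X, +, X; the right child expands to X * X, so the
string is grouped as a + (a * a).
Derivation 2: X => X * X => X + X * X => a + X * X => a + a * X => a + a * a
Parse tree 2: the root X has children X, *, X; the left child expands to X + X, so the
string is grouped as (a + a) * a.
Since there are two parse trees for a single string "a + a * a", the grammar G is ambiguous.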
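The number of parse trees of a string can be counted directly, which gives a quick ambiguity check for this grammar. The sketch below assumes the productions X → X + X | X * X | a and deliberately leaves out the unit rule X → X, since that rule would add infinitely many trivially different trees; the function name count_trees is hypothetical.

```python
from functools import lru_cache

def count_trees(s):
    """Number of parse trees of s under X -> X+X | X*X | a (no spaces in s)."""
    @lru_cache(maxsize=None)
    def count(i, j):
        # Trees deriving the substring s[i:j]
        if j - i == 1:
            return 1 if s[i] == "a" else 0
        total = 0
        for k in range(i + 1, j - 1):
            if s[k] in "+*":  # operator at the root splits the string in two
                total += count(i, k) * count(k + 1, j)
        return total
    return count(0, len(s))

print(count_trees("a+a*a"))
```

A count greater than 1 for any string shows that the grammar is ambiguous; a count of 1 for the strings tried does not prove the opposite.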
CFL Closure Property:
 Context-free languages are closed under:
o Union
o Concatenation
o Kleene Star operation
 Union:
o Let L1 and L2 be two context free languages. Then L1 U L2 is also context free.
o E.g.: Let L1 = {a^n b^n | n > 0}. Corresponding grammar G1 will have
R: S1 → aS1b | ab.
Let L2 = {c^m d^m | m ≥ 0}. Corresponding grammar G2 will have R: S2 → cS2d | ϵ
Union of L1 and L2, L1 U L2 = {a^n b^n} U {c^m d^m}
The corresponding grammar G will have the additional production S → S1 | S2

 Concatenation:
o If L1 and L2 are context free languages, then L1L2 is also context free.
o E.g.: Concatenation of the languages L1 and L2, L = L1L2 = {a^n b^n c^m d^m}.
o The corresponding grammar G will have the additional production S → S1S2
 Kleene Star:
o If L is a context free language, then L* is also context free.
o E.g.: Let L = {a^n b^n | n ≥ 0}. Corresponding grammar G will have
R: S → aSb | ϵ
o Kleene Star L1 = {a^n b^n}*
o The corresponding grammar G1 will have additional productions S1 → SS1 | ϵ
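The three constructions only add productions for a new start symbol. The following is a minimal sketch, assuming a grammar is a dict from a variable to a list of right-hand sides (each a list of symbols, with the empty list for ϵ) and that the two grammars share no variable names; the function names are hypothetical.

```python
def union(g1, s1, g2, s2, start="S"):
    """Grammar for L1 U L2: adds S -> S1 | S2."""
    return {start: [[s1], [s2]], **g1, **g2}

def concatenation(g1, s1, g2, s2, start="S"):
    """Grammar for L1L2: adds S -> S1 S2."""
    return {start: [[s1, s2]], **g1, **g2}

def kleene_star(g, s, start="S1"):
    """Grammar for L*: adds S1 -> S S1 | ϵ."""
    return {start: [[s, start], []], **g}

G1 = {"S1": [["a", "S1", "b"], ["a", "b"]]}   # {a^n b^n | n > 0}
G2 = {"S2": [["c", "S2", "d"], []]}           # {c^m d^m | m >= 0}
G = {"S": [["a", "S", "b"], []]}              # {a^n b^n | n >= 0}

print(union(G1, "S1", G2, "S2"))
print(concatenation(G1, "S1", G2, "S2"))
print(kleene_star(G, "S"))
```

The original productions are copied unchanged, which is why the result is still a context free grammar.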
 Context free languages are not closed under:
o Intersection: If L1 and L2 are context free languages, then L1 ∩ L2 is not
necessarily context free.
o Complement: If L1 is a context free language, then L1’ may not be context
free.
 Context free languages are, however, closed under intersection with a regular
language: if L1 is a regular language and L2 is a context free language, then L1 ∩ L2 is
a context free language.
CFG Simplification:
 In a CFG, it may happen that all the production rules and symbols are not needed for
the derivation of strings. Besides, there may be some null productions and unit
productions. Elimination of these productions and symbols is called simplification of
CFGs.
 Simplification essentially comprises the following steps:
o Reduction of CFG
o Removal of Unit Productions
o Removal of Null Productions
 Reduction of CFG: CFGs are reduced in two phases:
o Phase 1: Derivation of an equivalent grammar, G’, from the CFG, G, such
that each variable derives some terminal string.
o Derivation Procedure:
 Step 1: Include in W1 all variables that directly derive some terminal string, and
initialize i = 1.
 Step 2: Form Wi + 1 by adding to Wi all variables that have a production whose
right side consists only of terminals and symbols of Wi.
 Step 3: Increment i and repeat Step 2, until Wi + 1 = Wi
 Step 4: Include all production rules whose symbols all belong to Wi U Ʃ.
o Phase 2: Derivation of an equivalent grammar, G’’ from the CFG, G’, such
that each symbol appears in a sentential form.
o Derivation Procedure:
 Step 1: Include the start symbol in Y1 and initialize i = 1.
 Step 2: Include all symbols, Yi + 1, that can be derived from Yi and
include all production rules that have been applied.
 Step 3: Increment i and repeat Step 2, until Yi + 1 = Yi

 Problem: Find a reduced grammar equivalent to the grammar G, having production
rules, R: S → AC | B, A → a, C → c | BC, E → aA | e
 Solution:
o Phase 1:
 Ʃ = { a, c, e }
 W1 = {A, C, E} from rules A → a, C → c and E → aA
 W2 = {A, C, E} U { S } from rule S → AC
 W3 = {A, C, E, S} U ϕ
 Since W2 = W3, we can derive G’ as: G’ = {{ A, C, E, S }, { a, c, e },
R, {S}} where R: S → AC, A → a, C → c , E → aA | e
o Phase 2:
 Y1 = {S}
 Y2 = {S, A, C} from rule S → AC
 Y3 = {S, A, C, a, c} from rules A → a and C → c
 Y4 = {S, A, C, a, c}
 Since Y3 = Y4, we can derive G” as: G” = {{A, C, S}, {a, c}, R, {S}},
where R: S → AC, A → a, C → c
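The two phases are both fixed-point computations and can be sketched in a few lines. The code below assumes single-character symbols and represents R as a list of (left side, right side) pairs; the function name reduce_cfg is hypothetical.

```python
def reduce_cfg(rules, terminals, start):
    """Return (generating variables, reachable symbols, reduced rules)."""
    def usable(rhs, generating):
        return all(s in terminals or s in generating for s in rhs)

    # Phase 1: variables that derive some terminal string (the sets W1, W2, ...)
    generating = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules:
            if lhs not in generating and usable(rhs, generating):
                generating.add(lhs)
                changed = True
    rules1 = [(l, r) for l, r in rules if l in generating and usable(r, generating)]

    # Phase 2: symbols that appear in some sentential form (the sets Y1, Y2, ...)
    reachable = {start}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in rules1:
            if lhs in reachable:
                for s in rhs:
                    if s not in reachable:
                        reachable.add(s)
                        changed = True
    rules2 = [(l, r) for l, r in rules1 if l in reachable]
    return generating, reachable, rules2

rules = [("S", "AC"), ("S", "B"), ("A", "a"), ("C", "c"),
         ("C", "BC"), ("E", "aA"), ("E", "e")]
generating, reachable, reduced = reduce_cfg(rules, {"a", "c", "e"}, "S")
print(sorted(generating), sorted(reachable), reduced)
```

For the grammar of the problem, B is dropped in Phase 1 because it derives no terminal string, and E is dropped in Phase 2 because it cannot be reached from S.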

 Removal of Unit productions:
o Any production rule of the form A → B, where A, B ϵ Non-terminal, is called a
unit production.
 Removal Procedure:
o Step 1: To remove A → B, add production A → x to the grammar rule
whenever B → x occurs in the grammar. [x is a terminal or, more generally, any
right-hand side of B; x can be Null]
o Step 2: Delete A → B from the grammar.
o Step 3: Repeat from step 1 until all unit productions are removed.
 Problem: Remove unit production from the following:
S → XY, X → a, Y → Z | b, Z → M, M → N, N → a
 Solution:
o There are 3 unit productions in the grammar: Y → Z, Z → M and M → N
o At first, we will remove M → N.
o As N → a, we add M → a, and M → N is removed.
o The production set becomes
S → XY, X → a, Y → Z | b, Z → M, M → a, N → a
o Now we will remove Z → M.
o As M → a, we add Z→ a, and Z → M is removed.
o The production set becomes
S → XY, X → a, Y → Z | b, Z → a, M → a, N → a
o Now we will remove Y → Z.
o As Z → a, we add Y→ a, and Y → Z is removed.
o The production set becomes
S → XY, X → a, Y → a | b, Z → a, M → a, N → a
o Now Z, M, and N are unreachable, hence we can remove those.
o The final CFG is unit production free: S → XY, X → a, Y → a | b
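The same result can be obtained in one pass by first collecting, for each variable, every variable it reaches through unit productions. This is a variant of the step-by-step procedure above, not the procedure itself; the sketch assumes single-character symbols, and the function names are hypothetical.

```python
def remove_unit_productions(rules, variables):
    """rules: dict variable -> set of right-hand sides (strings)."""
    result = {}
    for a in variables:
        # Variables reachable from `a` using unit productions only
        closure, stack = {a}, [a]
        while stack:
            v = stack.pop()
            for rhs in rules.get(v, ()):
                if rhs in variables and rhs not in closure:
                    closure.add(rhs)
                    stack.append(rhs)
        # Copy every non-unit right-hand side of those variables to `a`
        result[a] = {rhs for b in closure for rhs in rules.get(b, ())
                     if rhs not in variables}
    return result

def keep_reachable(rules, start, variables):
    """Drop variables that cannot be reached from the start symbol."""
    seen, stack = {start}, [start]
    while stack:
        v = stack.pop()
        for rhs in rules[v]:
            for s in rhs:
                if s in variables and s not in seen:
                    seen.add(s)
                    stack.append(s)
    return {v: bodies for v, bodies in rules.items() if v in seen}

variables = {"S", "X", "Y", "Z", "M", "N"}
rules = {"S": {"XY"}, "X": {"a"}, "Y": {"Z", "b"},
         "Z": {"M"}, "M": {"N"}, "N": {"a"}}
no_units = remove_unit_productions(rules, variables)
final = keep_reachable(no_units, "S", variables)
print(final)
```

After the unit productions are removed, Z, M and N no longer occur on any right side reachable from S, so the second function discards them.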

 Removal of Null Productions
o In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a
production A → ϵ or there is a derivation that starts at A and finally ends up
with ϵ: A => ... => ϵ
 Removal Procedure:
o Step 1: Find out nullable non-terminal variables which derive ϵ.
o Step 2: For each production A → α, construct all productions A → x where x
is obtained from α by removing one or more of the nullable non-terminals found in Step 1.
o Step 3: Combine the original productions with the result of step 2 and remove
ϵ-productions.
 Problem:
o Remove null production from the following:
S→ASA | aB | b, A → B, B → b | ϵ
 Solution:
o There are two nullable variables: A and B
o At first, we will remove B → ϵ.
o After removing B → ϵ, the production set becomes:
S→ASA | aB | b | a, A → B| b | ϵ, B → b
o Now we will remove A → ϵ.
o After removing A → ϵ, the production set becomes:
S→ASA | aB | b | a | SA | AS | S, A → B| b, B → b
o This is the final production set without null productions.
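The removal procedure can be sketched as a nullable-set computation followed by expanding every production. The code below assumes single-character symbols and writes ϵ as the empty string; the function names are hypothetical. If the start symbol itself is nullable, ϵ is lost from the language, which this sketch does not handle.

```python
from itertools import product

def nullable_variables(rules):
    """Step 1: variables that derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in rules.items():
            if lhs not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(lhs)
                changed = True
    return nullable

def remove_null_productions(rules):
    """Steps 2 and 3: keep or drop each nullable symbol, discard empty bodies."""
    nullable = nullable_variables(rules)
    new_rules = {}
    for lhs, bodies in rules.items():
        out = set()
        for body in bodies:
            options = [(s, "") if s in nullable else (s,) for s in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate:
                    out.add(candidate)
        new_rules[lhs] = out
    return new_rules

rules = {"S": ["ASA", "aB", "b"], "A": ["B"], "B": ["b", ""]}
nullable = nullable_variables(rules)
result = remove_null_productions(rules)
print(sorted(nullable), result)
```

For the grammar of the problem this gives the same productions for S and B as the worked solution. For A it gives only A → B; the extra alternative A → b in the worked solution is already covered by A → B and B → b.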

Pumping Lemma for CFG:
 If L is a context-free language, there is a pumping length p such that any string w ϵ L
of length ≥ p can be written as w = uvxyz, where vy ≠ ϵ, |vxy| ≤ p, and for all i ≥ 0,
uv^i x y^i z ϵ L.

Applications of Pumping Lemma:


 Pumping lemma is used to show that a language is not context free. Let us
take an example and show how it is checked.
Problem:
 Find out whether the language L = {0^n1^n2^n | n ≥ 1} is context free or not.
Solution:
 Assume L is context free. Then, L must satisfy the pumping lemma.
 At first, choose the number n of the pumping lemma. Then, take z as 0^n1^n2^n.
 Break z into uvwxy, where |vwx| ≤ n and vx ≠ ϵ.
 Hence, vwx cannot involve both 0s and 2s, since the last 0 and the first 2 are at least
(n + 1) positions apart. There are two cases:
 Case 1: vwx has no 2s. Then vx has only 0s and 1s. Then uwy, which would have to
be in L, has n 2s, but fewer than n 0s or 1s.
 Case 2: vwx has no 0s. Then vx has only 1s and 2s. Then uwy, which would have to
be in L, has n 0s, but fewer than n 1s or 2s.
 In both cases a contradiction occurs. Hence L is not a context free language.
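For a small pumping length the argument can be confirmed by brute force: try every split z = uvwxy with |vwx| ≤ n and vx ≠ ϵ and see whether any of them survives pumping. The sketch below checks only i = 0 and i = 2, which is enough to rule a split out; the function names in_L and pumpable_split_exists are hypothetical.

```python
def in_L(s):
    """Membership in {0^n 1^n 2^n | n >= 1}."""
    n = len(s) // 3
    return n >= 1 and s == "0" * n + "1" * n + "2" * n

def pumpable_split_exists(z, p):
    """True if some split z = u v w x y with |vwx| <= p and vx != ''
    keeps u v^i w x^i y in L for both i = 0 and i = 2."""
    n = len(z)
    for a in range(n + 1):                        # u = z[:a]
        for d in range(a, min(a + p, n) + 1):     # vwx = z[a:d]
            for b in range(a, d + 1):             # v = z[a:b]
                for c in range(b, d + 1):         # w = z[b:c], x = z[c:d]
                    v, x = z[a:b], z[c:d]
                    if not v and not x:
                        continue
                    u, w, y = z[:a], z[b:c], z[d:]
                    if all(in_L(u + v * i + w + x * i + y) for i in (0, 2)):
                        return True
    return False

n = 3
z = "0" * n + "1" * n + "2" * n
print(pumpable_split_exists(z, n))
```

A result of False means that no split of this particular z can be pumped, which is the contradiction used in the solution. The check covers one value of n only; the proof above is what covers all of them.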

Chomsky Normal Form (CNF):
 A CFG is in Chomsky Normal Form if the Productions are in the following forms:
o A→a
o A → BC
o S→ϵ
where A, B, and C are non-terminals and a is terminal.
Algorithm to Convert into Chomsky Normal Form:
 Step 1: If the start symbol S occurs on some right side, create a new start symbol S’
and a new production S’ → S.
 Step 2: Remove Null productions.
 Step 3: Remove unit productions.
 Step 4: Replace each production A → B1 …… Bn where n > 2 with A → B1C where
C → B2 ….. Bn. Repeat this step for all productions having more than two symbols on
the right side.
 Step 5: If the right side of any production is in the form A → aB where a is a terminal
and A, B are non-terminal, then the production is replaced by A → XB and X→ a.
Repeat this step for every production which is in the form A → aB.
Problem:
Convert the following CFG into CNF
S→ ASA | aB, A → B | S, B → b | ϵ
Solution:
 Since S appears in R.H.S, we add a new start symbol S0 and S0 → S is added to the
production set and it becomes:
S0 → S,
S → ASA | aB,
A → B | S,
B→b|ϵ
 Now we will remove the null productions: B → ϵ and A → ϵ.
 After removing B → ϵ, the production set becomes:
S0 → S,
S → ASA | aB | a,
A → B | S | ϵ,
B→b
 After removing A → ϵ, the production set becomes:
S0 → S,
S → ASA | aB | a | AS | SA | S,
A → B | S,
B→b
 Now we will remove the unit productions.
 After removing S → S, the production set becomes:
S0 → S,
S → ASA | aB | a | AS | SA,
A → B | S,
B→b

 After removing S0 → S, the production set becomes:
S0 → ASA | aB | a | AS | SA,
S→ ASA | aB | a | AS | SA
A → B | S,
B→b
 After removing A→ B, the production set becomes:
S0 → ASA | aB | a | AS | SA,
S→ ASA | aB | a | AS | SA
A→S|b
B→b
 After removing A→ S, the production set becomes:
S0 → ASA | aB | a | AS | SA,
S→ ASA | aB | a | AS | SA
A → b |ASA | aB | a | AS | SA,
B→b
 Now we will find the productions with more than two variables in the R.H.S.
 Here, S0 → ASA, S → ASA and A → ASA have more than two non-terminals in the R.H.S.
 Hence we will apply step 4, introducing X → SA, to get the following production set:
S0 → AX | aB | a | AS | SA
S→ AX | aB | a | AS | SA
A → b |AX | aB | a | AS | SA
B→b
X→ SA
 The productions S0 → aB, S → aB and A → aB still mix a terminal with a non-terminal, so
we apply step 5 and the final production set, which is in CNF, becomes:
S0 → AX | YB | a | AS | SA
S→ AX | YB | a | AS | SA
A → b |AX | YB | a | AS | SA
B→b
X→ SA
Y→a
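Whether a production set is in CNF can be checked mechanically against the three allowed forms. The sketch below assumes right-hand sides written as space-separated symbols with | between alternatives, and treats every symbol that has its own productions as a non-terminal; the names parse and is_cnf are hypothetical.

```python
def parse(text):
    """Split 'A X | a' into [['A', 'X'], ['a']]; 'ε' becomes the empty body."""
    return [[] if alt.strip() == "ε" else alt.split() for alt in text.split("|")]

def is_cnf(rules, start):
    """Every production must be A -> a, A -> BC, or start -> ε."""
    variables = set(rules)
    for lhs, bodies in rules.items():
        for body in bodies:
            one_terminal = len(body) == 1 and body[0] not in variables
            two_variables = len(body) == 2 and all(s in variables for s in body)
            start_epsilon = len(body) == 0 and lhs == start
            if not (one_terminal or two_variables or start_epsilon):
                return False
    return True

common = "A X | Y B | a | A S | S A"
final = {"S0": parse(common), "S": parse(common), "A": parse("b | " + common),
         "B": parse("b"), "X": parse("S A"), "Y": parse("a")}
before = {"S0": parse("S"), "S": parse("A S A | a B"),
          "A": parse("B | S"), "B": parse("b | ε")}

print(is_cnf(final, "S0"), is_cnf(before, "S0"))
```

The check accepts the final production set of the worked example and rejects the set obtained right after adding S0, which still has unit, null and long productions.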
Greibach Normal Form (GNF):
 A CFG is in Greibach Normal Form if the Productions are in the following forms:
A→b
A → bD1 … Dn
S→ϵ
Where A, D1, .... , Dn are non-terminals and b is a terminal.
Algorithm to Convert a CFG into Greibach Normal Form:
 Step 1: If the start symbol S occurs on some right side, create a new start symbol S’
and a new production S’ → S.
 Step 2: Remove Null productions.
 Step 3: Remove unit productions.
 Step 4: Remove all direct and indirect left-recursion.
 Step 5: Do proper substitutions of productions to convert it into the proper form of
GNF.

Problem:
Convert the following CFG into GNF
S→ XY | Xn | p
X → mX | m
Y → Xn | o
Solution:
 Here, S does not appear on the right side of any production and there are no unit or
null productions in the production rule set. So, we can skip Step 1 to Step 3.
 Step 4: There is no left recursion, so nothing has to be removed.
 Step 5: Now after replacing X in S → XY | Xn | p with mX | m
 We obtain S → mXY | mY | mXn | mn | p.
 And after replacing X in Y→ Xn | o with the right side of X → mX | m
 We obtain Y→ mXn | mn | o.
 A new production N → n is added to the production set, so that the terminal n after
the leading terminal is replaced by a non-terminal, and then we come to the final GNF
as the following:
S → mXY | mY | mXN | mN | p
X→ mX | m
Y→ mXN | mN | o
N→n
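The substitution step for the grammar S → XY | Xn | p, X → mX | m, Y → Xn | o can be sketched as two small transformations: replace a leading variable by its right-hand sides, then replace every terminal after the first position by a fresh variable. The function names and the fresh variable name N are assumptions made for this illustration.

```python
def substitute_leading(rules, target):
    """Replace a leading `target` in other variables' bodies by target's bodies."""
    out = {}
    for lhs, bodies in rules.items():
        new_bodies = []
        for body in bodies:
            if lhs != target and body and body[0] == target:
                new_bodies.extend(t + body[1:] for t in rules[target])
            else:
                new_bodies.append(body)
        out[lhs] = new_bodies
    return out

def lift_trailing_terminals(rules, names):
    """names maps a terminal to a fresh variable, e.g. {'n': 'N'} (hypothetical)."""
    out = {lhs: [body[:1] + [names.get(s, s) for s in body[1:]] for body in bodies]
           for lhs, bodies in rules.items()}
    for terminal, variable in names.items():
        out[variable] = [[terminal]]
    return out

rules = {"S": [["X", "Y"], ["X", "n"], ["p"]],
         "X": [["m", "X"], ["m"]],
         "Y": [["X", "n"], ["o"]]}
step1 = substitute_leading(rules, "X")
gnf = lift_trailing_terminals(step1, {"n": "N"})
for lhs, bodies in gnf.items():
    print(lhs, "->", " | ".join("".join(b) for b in bodies))
```

Every right-hand side of the result starts with a terminal and continues with non-terminals only, which is the form required by GNF.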
Chomsky Hierarchy:
 According to Noam Chomsky, there are four types of grammars: Type 0, Type 1,
Type 2 and Type 3.
Grammar Type | Grammar Accepted          | Language Accepted               | Automaton
Type 0       | Unrestricted grammar      | Recursively enumerable language | Turing machine
Type 1       | Context sensitive grammar | Context sensitive language      | Linear bounded automaton
Type 2       | Context free grammar      | Context free language           | Pushdown automaton
Type 3       | Regular grammar           | Regular language                | Finite state automaton

Type - 3 Grammar:
 Type-3 grammars generate regular languages. Type-3 grammars must have a single
non-terminal on the left-hand side and a right-hand side consisting of a single terminal
or single terminal followed by a single non-terminal.
 The productions must be in the form X → a or X → aY where X, Y ϵ N (Non-terminal)
and a ϵ T (Terminal).
 The rule S → ϵ is allowed if S does not appear on the right side of any rule.
Example
X → ϵ
X → a | aY
Y→b
Type - 2 Grammar:

 Type-2 grammars generate context-free languages. The productions must be in the
form A → γ where A ϵ N (Non-terminal) and γ ϵ (T∪N)* (String of terminals and
non-terminals).
 The languages generated by these grammars are recognized by a non-deterministic
pushdown automaton.
Example:
S→Xa
X→a
X → aX
X → abc
X→ϵ
Type - 1 Grammar

 Type-1 grammars generate context-sensitive languages. The productions must be in
the form α A β → α γ β where A ϵ N (Non-terminal) and α, β, γ ϵ (T ∪ N)* (Strings
of terminals and non-terminals).
 The strings α and β may be empty, but γ must be non-empty. The rule S → ϵ is
allowed if S does not appear on the right side of any rule. The languages generated by
these grammars are recognized by a linear bounded automaton.
Example:
AB → AbBc
A → bcA
B→b
Type - 0 Grammars:

 Type-0 grammars generate recursively enumerable languages. The productions have
no restrictions. They are any phrase structure grammar including all formal grammars.
 They generate the languages that are recognized by a Turing machine.
 The productions can be in the form of α → β where α is a string of terminals and
non-terminals with at least one non-terminal and α cannot be null. β is a string of
terminals and non-terminals.
Example:
S → ACaB
Bc → acB
CB → DB
aD → Db
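The four definitions can be turned into a small classifier that gives the most restrictive type a production satisfies; the type of a grammar is then the smallest type over its productions. The sketch assumes single-character symbols and ignores the special rule S → ϵ for Type 3 and Type 1; the function names are hypothetical.

```python
def production_type(lhs, rhs, variables):
    """Most restrictive Chomsky type (3, 2, 1 or 0) that lhs -> rhs satisfies."""
    def is_var(symbol):
        return symbol in variables

    if len(lhs) == 1 and is_var(lhs):
        # Type 3: X -> a or X -> aY
        single_terminal = len(rhs) == 1 and not is_var(rhs)
        terminal_then_var = len(rhs) == 2 and not is_var(rhs[0]) and is_var(rhs[1])
        return 3 if single_terminal or terminal_then_var else 2
    # Type 1: alpha A beta -> alpha gamma beta with gamma non-empty
    for i, symbol in enumerate(lhs):
        if is_var(symbol):
            alpha, beta = lhs[:i], lhs[i + 1:]
            if (len(rhs) > len(alpha) + len(beta)
                    and rhs.startswith(alpha) and rhs.endswith(beta)):
                return 1
    return 0

def grammar_type(productions, variables):
    return min(production_type(l, r, variables) for l, r in productions)

type1_example = [("AB", "AbBc"), ("A", "bcA"), ("B", "b")]
type0_example = [("S", "ACaB"), ("Bc", "acB"), ("CB", "DB"), ("aD", "Db")]
print(grammar_type(type1_example, {"A", "B"}))
print(grammar_type(type0_example, {"S", "A", "B", "C", "D"}))
```

In the Type-0 example the productions Bc → acB and aD → Db do not keep their context in place, so they fit none of the restricted forms and pull the whole grammar down to Type 0.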
