Chapter 3: Grammar
Context-Free Languages
Context-free languages (CFLs) are generated by context-free grammars. The set of all
context-free languages is identical to the set of languages accepted
by pushdown automata, and the set of regular languages is a subset of the CFLs.
• Grammar:
- A grammar is a set of rules that determines whether a string belongs
to a particular language or not.
- The syntax of a language is defined by this notation.
- Parsers are designed with the help of a CFG.
• Generation of Derivation Tree
- A derivation tree or parse tree is an ordered rooted tree that graphically
represents how a string is derived from a CFG.
• Representation Technique
- Root vertex: Must be labeled by the start symbol.
- Vertex: Labeled by a non-terminal symbol.
- Leaves: Labeled by a terminal symbol or ε.
• Top-down Approach
- Starts with the starting symbol S.
- Goes down to tree leaves using productions.
• Bottom-up Approach
- Starts from tree leaves.
- Proceeds upward to the root which is the starting symbol S.
• The derivation or the yield of a parse tree is the final string obtained by
concatenating the labels of the leaves of the tree from left to right,
ignoring the nulls. However, if all the leaves are null, the yield is
null.
• EXAMPLE
Let a CFG (N, T, P, S) have N = {S}, T = {a, b}, start symbol S, and productions
P: S → SS | aSb | ε. One string derivable from this CFG is “abaabb”.
Answer:
S ⇒ SS
  ⇒ aSbS    (apply S → aSb)
  ⇒ abS     (apply S → ε)
  ⇒ abaSb   (apply S → aSb)
  ⇒ abaaSbb (apply S → aSb)
  ⇒ abaabb  (apply S → ε)
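The derivation above can be replayed mechanically. The sketch below applies one production per step; the position of the rewritten S at each step is chosen by hand to match the derivation:

```python
def apply(sentential, index, lhs, rhs):
    """Replace the non-terminal at position `index` (which must be lhs) by rhs."""
    assert sentential[index] == lhs
    return sentential[:index] + rhs + sentential[index + 1:]

steps = [
    ("S",       0, "S", "SS"),   # S => SS
    ("SS",      0, "S", "aSb"),  # => aSbS
    ("aSbS",    1, "S", ""),     # => abS    (S -> epsilon)
    ("abS",     2, "S", "aSb"),  # => abaSb
    ("abaSb",   3, "S", "aSb"),  # => abaaSbb
    ("abaaSbb", 4, "S", ""),     # => abaabb (S -> epsilon)
]

current = "S"
for expected, index, lhs, rhs in steps:
    assert current == expected   # the step starts from the listed form
    current = apply(current, index, lhs, rhs)

print(current)  # abaabb
```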
• Sentential Form and Partial Derivation Tree
• A partial derivation tree is a sub-tree of a derivation tree/parse tree such
that, for every vertex, either all of its children are in the sub-tree or none of
them are.
• Example
If in any CFG the productions are:
S → AB, A → aaA | ε, B → Bb | ε
• The partial derivation tree can be the following:
- If a partial derivation tree contains the root S, it is
known as a sentential form.
- The above example is also in sentential form.
Types of Derivation Tree
• Leftmost and Rightmost Derivation of a String
- Leftmost derivation: A leftmost derivation is obtained by applying a production to
the leftmost non-terminal at each step.
- Rightmost derivation: A rightmost derivation is obtained by applying a production to
the rightmost non-terminal at each step.
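The two strategies differ only in which non-terminal is rewritten first. A minimal sketch, using the single-non-terminal grammar S → SS | (S) | ε for balanced parentheses (the production chosen at each step is supplied by hand):

```python
NONTERMINALS = {"S"}

def rewrite(sentential, rhs, leftmost=True):
    """Apply S -> rhs to the leftmost (or rightmost) occurrence of S."""
    positions = [i for i, c in enumerate(sentential) if c in NONTERMINALS]
    i = positions[0] if leftmost else positions[-1]
    return sentential[:i] + rhs + sentential[i + 1:]

# Leftmost:  S => SS => (S)S => ()S => ()(S) => ()()
left = "S"
for rhs in ["SS", "(S)", "", "(S)", ""]:
    left = rewrite(left, rhs, leftmost=True)

# Rightmost: S => SS => S(S) => S() => (S)() => ()()
right = "S"
for rhs in ["SS", "(S)", "", "(S)", ""]:
    right = rewrite(right, rhs, leftmost=False)

print(left, right)  # ()() ()()
```

Both derivations produce the same string, and in fact correspond to the same parse tree; they differ only in the order the non-terminals are expanded.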
Left and Right Recursive Grammars
• In a CFG G, a production of the form X → Xa, where X is a non-terminal
and a is a string of terminals, is called a left recursive production. A grammar
containing a left recursive production is called a left recursive grammar.
• Similarly, if a CFG G has a production of the form X → aX, where X is a non-
terminal and a is a string of terminals, it is called a right recursive production.
A grammar containing a right recursive production is called a right
recursive grammar.
CFG Simplification
• In a CFG, it may happen that not all of the production rules and symbols are needed
for the derivation of strings. Besides, there may be some null productions and
unit productions. Eliminating these productions and symbols is called simplification
of CFGs.
Steps to simplify:
• Reduction of CFG
• Removal of Unit Productions
• Removal of Null Productions
Reduction of CFG
Derivation of an equivalent grammar, G’, from the CFG, G, such that each variable
derives some terminal string.
Derivation Procedure Part 1:
Step 1: Include all symbols, W1, that derive some terminal and initialize i=1.
Step 2: Include all symbols, Wi+1, that derive Wi.
Step 3: Increment i and repeat Step 2, until Wi+1 = Wi.
Step 4: Include all production rules whose symbols all belong to Wi or are terminals.
Derivation of an equivalent grammar, G”, from the CFG, G’ such that each symbol
appears in a sentential form.
Derivation Procedure part 2:
Step 1: Include the start symbol in Y1 and initialize i = 1.
Step 2: Include all symbols, Yi+1, that can be derived from Yi and include all production
rules that have been applied.
Step 3: Increment i and repeat Step 2, until Yi+1 = Yi.
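Both passes of the reduction can be sketched as fixed-point computations. The grammar below is an illustrative example (variable names are assumptions):

```python
def reduce_cfg(grammar, start):
    """grammar: dict variable -> list of right-hand-side strings.
    Symbols not in the dict are treated as terminals."""
    # Part 1: W = variables that derive some terminal string.
    W = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            if A not in W and any(all(c in W or c not in grammar for c in rhs)
                                  for rhs in rhss):
                W.add(A)
                changed = True
    g1 = {A: [r for r in rhss if all(c in W or c not in grammar for c in r)]
          for A, rhss in grammar.items() if A in W}
    # Part 2: Y = symbols reachable from the start symbol.
    Y, stack = {start}, [start]
    while stack:
        for rhs in g1.get(stack.pop(), ()):
            for c in rhs:
                if c in g1 and c not in Y:
                    Y.add(c)
                    stack.append(c)
    return {A: g1[A] for A in g1 if A in Y}

G = {"S": ["AC", "B"], "A": ["a"], "C": ["c", "BC"], "B": [], "E": ["aA", "e"]}
print(reduce_cfg(G, "S"))  # {'S': ['AC'], 'A': ['a'], 'C': ['c']}
```

Here B derives no terminal string (Part 1 removes it, and with it S → B and C → BC), and E is not reachable from S (Part 2 removes it).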
Removal of Unit Productions
Any production rule of the form A → B, where A, B ∈ Non-terminals, is called a unit
production.
Removal Procedure:-
Step 1: To remove A → B, add the production A → x to the grammar whenever B → x
occurs in the grammar. [x is a string of terminals and non-terminals; x can be null]
Step 2: Delete A→B from the grammar.
Step 3: Repeat from step 1 until all unit productions are removed.
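The three steps above can be sketched as follows; the example grammar S → A | 0A1, A → B, B → 11 is an illustrative assumption:

```python
def remove_unit_productions(grammar):
    """grammar: dict non-terminal -> set of right-hand-side strings."""
    nonterminals = set(grammar)
    result = {A: set() for A in grammar}
    for A in grammar:
        # All B with A =>* B using only unit productions (including A itself).
        reachable, stack = {A}, [A]
        while stack:
            for rhs in grammar[stack.pop()]:
                if rhs in nonterminals and rhs not in reachable:
                    reachable.add(rhs)
                    stack.append(rhs)
        # Step 1: copy every non-unit production of each reachable B up to A.
        for B in reachable:
            result[A] |= {rhs for rhs in grammar[B] if rhs not in nonterminals}
        # Step 2 is implicit: unit productions are simply never copied.
    return result

G = {"S": {"A", "0A1"}, "A": {"B"}, "B": {"11"}}
print(remove_unit_productions(G))
# S -> 0A1 | 11, A -> 11, B -> 11; no unit productions remain
```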
Removal of Null Productions
In a CFG, a non-terminal symbol ‘A’ is a nullable variable if there is a production A → ϵ
or there is a derivation that starts at A and finally ends up with
ϵ: A → .......… → ϵ
Removal Procedure:
Step 1: Find out the nullable non-terminal variables which derive ϵ.
Step 2: For each production A → a, construct all productions A → x where x is obtained
from ‘a’ by removing one or more of the non-terminals found in Step 1.
Step 3: Combine the original productions with the results of Step 2 and remove the ϵ-
productions.
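A sketch of this procedure, with ϵ written as the empty string and an illustrative grammar S → aAb, A → aA | ϵ:

```python
from itertools import combinations

def nullable_symbols(grammar):
    """Step 1: the variables that can derive the empty string."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for A, rhss in grammar.items():
            if A not in nullable and any(all(c in nullable for c in rhs)
                                         for rhs in rhss):
                nullable.add(A)
                changed = True
    return nullable

def remove_null_productions(grammar):
    nullable = nullable_symbols(grammar)
    result = {A: set() for A in grammar}
    for A, rhss in grammar.items():
        for rhs in rhss:
            spots = [i for i, c in enumerate(rhs) if c in nullable]
            # Step 2: drop every subset of the nullable positions.
            for k in range(len(spots) + 1):
                for drop in combinations(spots, k):
                    new = "".join(c for i, c in enumerate(rhs) if i not in drop)
                    if new:            # Step 3: keep no epsilon productions
                        result[A].add(new)
    return result

G = {"S": {"aAb"}, "A": {"aA", ""}}
print(remove_null_productions(G))  # S -> aAb | ab, A -> aA | a
```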
Chomsky Normal Form
In formal language theory, a CFG G is said to be in Chomsky normal form if all of
its production rules are of the form:
Q → RE,
or Q → f,
or S → ε,
where Q, R, E are non-terminals, S is the start symbol, f is a terminal, and
ε (epsilon) represents the empty string.
Greibach Normal Form
In formal language theory, a CFG is in Greibach normal form (GNF) if the right-hand
sides of all production rules start with a terminal symbol, optionally followed by some
variables.
That is, every production has one of the forms:
P → aQWERTY
P→a
S→ε
Here P, Q, W, E, R, T, Y are non-terminals, S is the start symbol, a is a terminal, and ε is the empty string.
Non Deterministic Push down Automata
• A nondeterministic pushdown automaton (npda) is basically an nfa with a stack
added to it.
• A nondeterministic pushdown automaton or npda is a 7-tuple
M = (Q,∑, Γ, δ,q0, z, F)
• Q is a finite set of states,
∑ is the input alphabet,
Γ is the stack alphabet,
δ is a transition function,
q0 ∈ Q is the initial state,
z ∈ Γ is the stack start symbol, and
F ⊆ Q is a set of final states.
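As a concrete illustration, here is a small npda for {a^n b^n | n ≥ 1}. The state names, the encoding of the stack as a string (top = last character), and acceptance by empty stack are choices made for this sketch:

```python
delta = {
    ("q0", "a", "z"): {("q0", "za")},   # read a with z on top: keep z, push a
    ("q0", "a", "a"): {("q0", "aa")},   # read a with a on top: push another a
    ("q0", "b", "a"): {("q1", "")},     # first b: pop an a, switch state
    ("q1", "b", "a"): {("q1", "")},     # each further b pops one a
    ("q1", "",  "z"): {("q1", "")},     # epsilon move: pop z to empty the stack
}

def accepts(w):
    """Search over configurations (state, input position, stack); accept by
    empty stack once the whole input is consumed."""
    frontier, seen = [("q0", 0, "z")], set()
    while frontier:
        cfg = frontier.pop()
        if cfg in seen:
            continue
        seen.add(cfg)
        state, i, stack = cfg
        if i == len(w) and stack == "":
            return True
        if not stack:
            continue
        top, rest = stack[-1], stack[:-1]
        if i < len(w):
            for nxt, push in delta.get((state, w[i], top), ()):
                frontier.append((nxt, i + 1, rest + push))
        for nxt, push in delta.get((state, "", top), ()):
            frontier.append((nxt, i, rest + push))
    return False

print(accepts("aabb"), accepts("aab"))  # True False
```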
Block diagram of PDA
Example of NPDA
PDA equivalence to CFG
• Algorithm to find PDA corresponding to a given CFG:
Input − A CFG, G = (V, T, P, S)
Output − Equivalent PDA, P = (Q, ∑, Γ, δ, q0, Z0, F)
Step 1 − Convert the productions of the CFG into GNF.
Step 2 − The PDA will have only one state {q}.
Step 3 − The start symbol of CFG will be the start symbol in the PDA.
Step 4 − All non-terminals of the CFG will be the stack symbols of the PDA and
all the terminals of the CFG will be the input symbols of the PDA.
Step 5 − For each production of the form A → aX, where a is a terminal, A is a
non-terminal, and X is a (possibly empty) string of non-terminals, add the transition
δ(q, a, A) ∋ (q, X).
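Steps 2–5 can be sketched directly: each GNF production A → aX becomes a transition that consumes a and replaces A on the stack by X. The GNF grammar for {a^n b^n | n ≥ 1} below is an illustrative assumption:

```python
def cfg_to_pda(productions, start):
    """productions: list of (A, rhs) in GNF, i.e. rhs[0] is a terminal and
    rhs[1:] are non-terminals. Returns the transition table and the start
    stack symbol (Steps 2-4: one state, CFG start symbol on the stack,
    non-terminals as stack symbols, terminals as input symbols)."""
    delta = {}
    for A, rhs in productions:
        delta.setdefault((rhs[0], A), set()).add(rhs[1:])
    return delta, start

def accepts(delta, start, w):
    """Simulate the one-state PDA; accept by empty stack. The stack is a
    string whose first character is the top."""
    stacks = {start}
    for a in w:
        stacks = {X + stack[1:]
                  for stack in stacks if stack
                  for X in delta.get((a, stack[0]), ())}
    return "" in stacks

# Illustrative GNF grammar for {a^n b^n | n >= 1}: S -> aSB | aB, B -> b
gnf = [("S", "aSB"), ("S", "aB"), ("B", "b")]
delta, z = cfg_to_pda(gnf, "S")
print(accepts(delta, z, "aabb"), accepts(delta, z, "abb"))  # True False
```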
Chapter 3: Parse Tree
I. Parse Tree
a. Parse trees are a representation of derivations that is much more
compact. Several derivations may correspond to the same parse
tree.
For example, in the balanced parenthesis grammar, the following parse
tree:

        S
      /   \
     S     S
    /|\   /|\
   ( S ) ( S )
     |     |
     ε     ε
b. The yield of a parse tree is the concatenation of the labels of the
leaves, from left to right. The yield of the tree above is ()().
2. Leftmost Derivation and Rightmost derivation
a. In a leftmost derivation, at each step the leftmost nonterminal is
replaced. In a rightmost derivation, at each step the rightmost
nonterminal is replaced.
b. Such replacements are indicated by ⇒L and ⇒R respectively.
c. Their transitive closures are ⇒L* and ⇒R* respectively.
d. In the balanced parenthesis grammar, this is a leftmost derivation:
S ⇒ SS ⇒ (S)S ⇒ ()S ⇒ ()(S) ⇒ ()().
This is a rightmost derivation:
S ⇒ SS ⇒ S(S) ⇒ S() ⇒ (S)() ⇒ ()()
3. Ambiguity In CFG
1. A context-free grammar G = (V, Σ, R, S) is ambiguous if there is
some string w ∈ Σ* such that there are two distinct parse trees T1
and T2 having S at the root and having yield w.
2. Equivalently, w has two or more leftmost derivations, or two or more
rightmost derivations.
Note that languages are not ambiguous; grammars are. Also, it has to be the
same string w with two different (leftmost or rightmost) derivations for a
grammar to be ambiguous.
Here is an example of an ambiguous grammar:
E→E+E
E→E∗E
E → (E)
E→a
E→b
E→c
In this grammar, the string a + b ∗ c can be parsed in two different ways
corresponding to doing the addition before or after the multiplication. This
is very bad for a compiler, because the compiler uses the parse tree to
generate code, meaning that this string could have two very different
semantics. Here are two parse trees for the string a + b ∗ c in this
grammar:
      E                  E
    / | \              / | \
   E  +  E            E  *  E
   |   / | \        / | \   |
   a  E  *  E      E  +  E  c
      |     |      |     |
      b     c      a     b
➢ A grammar G is ambiguous if there is a string in L(G) for which
● there are two distinct parse trees, or
● there are two distinct leftmost derivations, or
● there are two distinct rightmost derivations
Parse Tree for a Grammar
A derivation can be represented as a parse tree.
CFG G2:
S → aSb | ε        w = aaaabbbb

Derivation:
S ⇒ aSb ⇒ aaSbb ⇒ aaaSbbb ⇒ aaaaSbbbb ⇒ aaaabbbb

Parse tree:
        S
      / | \
     a  S  b
      / | \
     a  S  b
      / | \
     a  S  b
      / | \
     a  S  b
        |
        ε
5. Pumping Lemma for CFLs
➢ For every CFL L there is a constant k ≥ 0 such that for any word z in L of
length at least k, there are strings u, v, w, x, y such that
➢ z = uvwxy,
➢ vx ≠ ε,
➢ |vwx| ≤ k,
and for each i ≥ 0, the string u v^i w x^i y belongs to L.
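A worked instance of the lemma for the CFL {a^n b^n | n ≥ 0}: for z = a^k b^k, one valid decomposition puts v and x on either side of the middle, so pumping them together keeps the counts equal. The particular split below is an illustrative choice, not the only one:

```python
def in_L(s):
    """Membership test for L = {a^n b^n | n >= 0}."""
    n = len(s) // 2
    return s == "a" * n + "b" * n

k = 4                                   # the pumping constant, assumed here
z = "a" * k + "b" * k                   # a word of L with length >= k
u, v, w, x, y = "a" * (k - 1), "a", "", "b", "b" * (k - 1)

assert z == u + v + w + x + y           # z = uvwxy
assert v + x != ""                      # vx is non-empty
assert len(v + w + x) <= k              # |vwx| <= k

for i in range(6):
    assert in_L(u + v * i + w + x * i + y)   # u v^i w x^i y stays in L
print("all pumped strings remain in L")
```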
6. Deterministic Push Down Automata
Two transitions ((p1,u1,v1),(q1,z1)) and ((p2,u2,v2),(q2,z2)) are compatible if
1. p1 = p2,
2. u1 and u2 are compatible (which means that u1 is a prefix of u2 or u2 is a prefix
of u1), and
3. v1 and v2 are compatible.
➢ A pushdown automaton is deterministic if, for every pair of compatible transitions,
these transitions are identical.
➢ Let L be a language defined over the alphabet Σ, the language L is deterministic
context-free if and only if it is accepted by a deterministic pushdown automaton.
Not all context-free languages are deterministic context-free:
L1 = {wcw^R | w ∈ {a,b}*} is deterministic context-free.
L2 = {ww^R | w ∈ {a,b}*} is context-free but not deterministic context-free.
➢ A DPDA with empty-stack acceptance has less power than a DPDA with final-state
acceptance.
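The contrast between L1 and L2 can be seen concretely: for wcw^R the centre marker c tells the machine exactly when to switch from pushing to matching, so at most one move applies in every configuration. A sketch (state names are assumptions):

```python
def dpda_accepts(s):
    """Deterministic PDA for L1 = {w c w^R | w in {a,b}*}."""
    stack = []
    state = "push"
    for ch in s:
        if state == "push":
            if ch in "ab":
                stack.append(ch)        # push the symbols of w
            elif ch == "c":
                state = "match"         # the centre marker is visible
            else:
                return False
        else:  # state == "match"
            if not stack or stack[-1] != ch:
                return False
            stack.pop()                 # match w^R against the stack
    return state == "match" and not stack

print(dpda_accepts("abcba"), dpda_accepts("abcab"))  # True False
```

For ww^R there is no marker, so a PDA must guess the midpoint nondeterministically; that is why L2 is context-free but not deterministic context-free.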
Chapter 3: Closure Properties
I. Closure properties of CFLs
1. Union
2. Concatenation
3. Kleene Star
4. Reversal
5. Homomorphism
1. Union
• If L1 and L2 are two context free languages, their union L1 ∪ L2 will also be
context free.
• Let L1 and L2 be CFLs with grammars G1 and G2, respectively.
• Assume G1 and G2 have no variables in common.
• Let S1 and S2 be the start symbols of G1 and G2.
• Form a grammar for (L1 ∪ L2) by combining all the productions of G1 and G2.
• Add a new start symbol S. Add productions S -> S1 | S2.
Example: Let L1 = { a^n b^n c^m | m >= 0 and n >= 0 } and L2 = { a^n b^m c^m | n >= 0 and m >= 0 }
be two context free languages; then L3 = L1 ∪ L2 = { a^n b^n c^m } ∪ { a^n b^m c^m }
(n >= 0, m >= 0) is also context free.
• Here language L1 generates all strings in which the number of a’s
equals the number of b’s, and L2 generates all strings in which the
number of b’s equals the number of c’s.
• The union requires only one of the two conditions to be true, and this can be
accepted by a pushdown automaton. Hence, language L3 is also a CFL.
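The construction above is mechanical. A sketch, representing each grammar as a dict from variable to right-hand sides (the grammars for {a^n b^n} and {b^m c^m} are illustrative):

```python
def union_grammar(g1, s1, g2, s2, new_start="S"):
    """Combine two grammars (dict variable -> list of right-hand sides)
    with disjoint variables under a fresh start symbol S -> S1 | S2."""
    assert not (set(g1) & set(g2)), "variables must be disjoint"
    assert new_start not in g1 and new_start not in g2
    return {**g1, **g2, new_start: [s1, s2]}, new_start

g1 = {"S1": ["aS1b", ""]}   # generates {a^n b^n}
g2 = {"S2": ["bS2c", ""]}   # generates {b^m c^m}
g, s = union_grammar(g1, "S1", g2, "S2")
print(g[s])  # ['S1', 'S2']
```

The same skeleton gives the concatenation construction by replacing the new productions with S → S1S2, and the Kleene-star construction with S → S1S | ε.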
2. Concatenation
• L1 and L2 are CFLs, then their concatenation L1.L2 will also be context free
• Let L1 and L2 be CFL’s with grammars G1 and G2, respectively.
• Assume G1 and G2 have no variables in common.
• Let S1 and S2 be the start symbols of G1 and G2.
• Form a new grammar for L1. L2 by starting with productions of G1 and G2.
• Add a new start symbol S and production S -> S1S2.
• Every derivation from S results in a string in L1 followed by one in L2.
Example: Let L1 = { a^n b^n | n >= 0 } and L2 = { c^m d^m | m >= 0 } be two context free
languages; then L3 = L1.L2 = { a^n b^n c^m d^m | m >= 0 and n >= 0 } is also
context free.
• Here the grammar of L1 generates all strings with equal numbers of a’s
and b’s, and L2 generates strings with equal numbers of c’s and d’s. Language L3
can be accepted by a pushdown automaton.
• Hence, CFLs are closed under concatenation.
3. Kleene Star (*)
• If L1 is context free, then its Kleene closure L1* will also be context free.
• Let L have grammar G, with start symbol S1.
• Grammar for L* by introducing to G a new start symbol S and the productions
S -> S1S | ε.
• A rightmost derivation from S generates a sequence of zero or more S1’s, each of
which generates some string in L.
Example: If L1 = { a^n b^n | n >= 0 } is a context free language, then L1* = { a^n b^n | n >= 0 }*
is also context free.
4. Reversal
• L is a CFL with grammar G
• Form a grammar for L^R by reversing the right side of every production.
Example: Let grammar G have productions S -> 0S1 | 01. Then the reversed
grammar G^R has S -> 1S0 | 10. Hence, L^R is also context free.
5. Homomorphism
• Let L be a CFL with grammar G.
• Let h be a homomorphism on the terminal symbols of G.
• Construct a grammar for h(L) by replacing each terminal symbol a by h(a).
Example: Grammar G has productions S -> 0S1 | 01. Homomorphism h is defined
by h(0) = ab, h(1) = ε. Then h(L(G)) has the grammar with productions
S -> abS | ab.
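The substitution step can be sketched directly; the grammar and homomorphism below are the ones from the example:

```python
def apply_homomorphism(productions, h, nonterminals):
    """Replace every terminal a on a right-hand side by h(a)."""
    return {A: ["".join(c if c in nonterminals else h[c] for c in rhs)
                for rhs in rhss]
            for A, rhss in productions.items()}

G = {"S": ["0S1", "01"]}
h = {"0": "ab", "1": ""}       # h(0) = ab, h(1) = epsilon
print(apply_homomorphism(G, h, {"S"}))  # {'S': ['abS', 'ab']}
```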
II. Non-Closure Properties of CFLs
1. Intersection
2. Complementation
3. Difference
1. Intersection
• Let L1 and L2 be two context free languages:
L1 = { a^n b^n c^m | n >= 0 and m >= 0 }
L2 = { a^m b^n c^n | n >= 0 and m >= 0 }
L3 = L1 ∩ L2 = { a^n b^n c^n | n >= 0 }
• L1 generates all strings in which the number of a’s equals the number of b’s, and
L2 generates all strings in which the number of b’s equals the number of c’s.
• The intersection requires both conditions to be true. It cannot be accepted by a
pushdown automaton, so it is not context free.
2. Complementation
• If CFLs were closed under complementation, they would also be closed under
intersection, since by De Morgan's law L1 ∩ L2 is the complement of
(complement of L1) ∪ (complement of L2), and CFLs are closed under union.
• CFLs are not closed under intersection (see above). Thus, CFLs are not closed
under complementation.
3. Difference
• Let L1 and L2 be two context free languages.
Proof: L1 ∩ L2 = L1 – (L1 – L2).
• CFLs are not closed under intersection. If CFLs were closed under difference, then
by the identity above they would be closed under intersection, but they are not.
• Thus, CFLs are not closed under difference.
Chapter 3: Context Sensitive Grammar
I. Context Sensitive Grammar and Languages
1. Definition: A Context Sensitive Grammar (CSG) is a quadruple G = (N, ∑, P, S)
where,
- N is the set of non-terminal symbols
- ∑ is the set of terminal symbols
- S is the start symbol
- P is the set of productions, of the form αAβ → αɤβ where ɤ ≠ ϵ
• In a derivation, the non-terminal A is rewritten to ɤ only in the context α…β
• A CSG is a non-contracting grammar: since ɤ ≠ ϵ, every rule α → β satisfies |α| ≤ |β|
2. Context Sensitive Language(CSL)
The language that can be defined by context-sensitive grammar is called Context
sensitive language (CSL).
Example: S → abc | aAbc
Ab → bA
Ac → Bbcc
bB → Bb
aB → aa | aaA
2.1 Derivation in the context sensitive language
S → aAbc
→ abAc
→ abBbcc
→ aBbbcc
→ aaAbbcc
→ aabAbcc
→ aabbAcc
→ aabbBbccc
→ aabBbbccc
→ aaBbbbccc
→ aaabbbccc
The context sensitive language is L = {a^n b^n c^n | n ≥ 1}.
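Each line of the derivation above can be checked mechanically. The sketch below confirms that every step follows from the previous sentential form by a single application of one production:

```python
RULES = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
         ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]

def one_step(s, t):
    """True if t follows from s by a single application of one production."""
    for lhs, rhs in RULES:
        for i in range(len(s) - len(lhs) + 1):
            if s[i:i + len(lhs)] == lhs and s[:i] + rhs + s[i + len(lhs):] == t:
                return True
    return False

steps = ["S", "aAbc", "abAc", "abBbcc", "aBbbcc", "aaAbbcc", "aabAbcc",
         "aabbAcc", "aabbBbccc", "aabBbbccc", "aaBbbbccc", "aaabbbccc"]

assert all(one_step(s, t) for s, t in zip(steps, steps[1:]))
print("derivation verified:", steps[-1])
```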
3. Closure properties of Context sensitive Languages(CSLs)
3.1 Union
If λ ∉ L1 ∪ L2, then let G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1, S → S2}),
where S ∉ N1 ∪ N2, a new symbol. It can be seen that G generates the
language L1 ∪ L2. Hence, CSLs are closed under union.
3.2 Concatenation
• If λ ∉ L1 and λ ∉ L2, then let G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2}),
where S ∉ N1 ∪ N2 a new symbol.
• If λ ∈ L1 and λ ∉ L2, then
let G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S2} \ {S1 → λ}),
where S is a new symbol.
• If λ ∉ L1 and λ ∈ L2, then
let G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S1} \ {S2 → λ}),
where S ∉ N1 ∪ N2 a new symbol.
• If λ∈ L1 and λ ∈ L2, then
let G = (N1 ∪ N2 ∪ {S}, T, S, P1 ∪ P2 ∪ {S → S1S2, S → S1, S → S2, S → λ} \
{S1 → λ, S2 → λ}), where S ∉ N1 ∪ N2.
Hence, in each of the above four cases a grammar G for the language L1.L2 can be constructed.
3.3 Kleene Star
Now let G1 = (N1, T, S1, P1) and G2 = (N2, T, S2, P2) be grammars in Kuroda normal
form, both generating the language L, such that N1 ∩ N2 = ∅.
Let G = (N1 ∪ N2 ∪ {S, S'}, T, S,
P1 ∪ P2 ∪ {S → λ, S → S1, S → S1S2, S → S1S2S', S' → S1, S' → S1S2, S' → S1S2S'}
\ {S1 → λ, S2 → λ}),
where S, S' ∉ N1 ∪ N2 are new symbols. Then G generates the language L*.
4. Equivalence of Context sensitive languages (CSLs)
• The following grammar G is context-sensitive:
S -> aTb | ab
aT -> aaTb | ac
• The language generated by grammar G, L = {ab} ∪ {a^n c b^n | n > 0}, is also
context-free.
Example: A context free grammar G1 for the same language:
S -> aTb | ab
T -> aTb | c
• So every context-free language is context sensitive, but a context sensitive
language need not be context free.
4.1 Theorem: Every context-sensitive language L is recursive.
Let L be a context sensitive language and G a context sensitive grammar for L. A
derivation of a string w has the form S ⇒ x1 ⇒ x2 ⇒ x3 ⇒ ... ⇒ w.
The number of possible derivations to examine is bounded: we know that
|xi| ≤ |xi+1| (G is non-contracting), so no sentential form is longer than w.
To check whether w is in L(G):
- Construct a transition graph whose vertices are the strings of length at most |w|.
- Paths correspond to derivations in G: add an edge from x to y if x ⇒ y.
- w ∈ L(G) if and only if there is a path from S to w.
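This decision procedure can be sketched as a search over sentential forms of length at most |w| (safe because the grammar is non-contracting), using the a^n b^n c^n grammar from earlier in this chapter:

```python
RULES = [("S", "abc"), ("S", "aAbc"), ("Ab", "bA"), ("Ac", "Bbcc"),
         ("bB", "Bb"), ("aB", "aa"), ("aB", "aaA")]

def member(w, start="S"):
    """Decide w in L(G) by exploring every sentential form of length <= |w|."""
    seen, frontier = {start}, [start]
    while frontier:
        s = frontier.pop()
        for lhs, rhs in RULES:
            i = s.find(lhs)
            while i != -1:
                t = s[:i] + rhs + s[i + len(lhs):]
                # Non-contracting: forms longer than w can never shrink back.
                if len(t) <= len(w) and t not in seen:
                    seen.add(t)
                    frontier.append(t)
                i = s.find(lhs, i + 1)
    return w in seen

print(member("aabbcc"), member("aabcc"))  # True False
```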
4.2 Theorem: There exists a recursive language that is not context sensitive.
• The language L defined below is recursive.
List all possible CSGs Gi = (Ni, {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, Si, Pi) that
generate numbers (strings of decimal digits).
Define L as the language containing the indices of the grammars that do
not generate the number of their own position in the list: L = {i | i ∉ L(Gi)}.
We can create such a list of all context-sensitive grammars that
generate numbers, and we can decide whether or not a context-sensitive
grammar generates its position in the list. Hence, language L is recursive.
4.3 Continued: the language L above is not context sensitive.
• Assume that L is a context sensitive language. Then there is a context sensitive
grammar Gc in the list such that L(Gc) = L, for some index c.
- If c ∈ L(Gc), then by the definition of L we have c ∉ L, but L = L(Gc), which is a
contradiction.
- If c ∉ L(Gc), then c ∈ L, which is also a contradiction, since L = L(Gc).
- Hence, language L is not context sensitive.
5. Chomsky Hierarchy
5.1. Hierarchy of languages.
- Type-0 : Recursively enumerable language
- Type-1 : Context Sensitive language
- Type-2 : Context Free language
- Type-3 : Regular language
5.1.1 Type-0 : Recursively enumerable language
Type-0 grammars generate recursively enumerable languages. The
productions have no restrictions. They are any phase structure grammar
including all formal grammars. They generate the languages that are recognized
by a Turing machine.
The productions can be in the form of α → β where α is a string of terminals and
non-terminals with at least one non-terminal and α cannot be null. β is a string of
terminals and non-terminals.
S → ACaB
Bc → acB
CB → DB
aD → Db
5.1.2 Type-1: Context Sensitive language
Type-1 grammars generate context-sensitive languages. The productions must
be in the form α A β → α γ β where A ∈ N (Non-terminal) and α, β, γ ∈ (T ∪
N)* (Strings of terminals and non-terminals)
The strings α and β may be empty, but γ must be non-empty. The rule S → ε is
allowed if S does not appear on the right side of any rule. The languages
generated by these grammars are recognized by a linear bounded automaton.
AB → AbBc
A → bcA
B→b
5.1.3 Type-2: Context Free language
Type-2 grammars generate context-free languages. The productions must be in
the form A → γ where A ∈ N (Non terminal) and γ ∈ (T ∪ N)* (String of terminals
and non-terminals). The languages generated by these grammars can be
recognized by a non-deterministic pushdown automaton.
S→Xa
X→a
X → aX
X → abc
X→ε
5.1.4 Type-3 : Regular language
Type-3 grammars generate regular languages. Type-3 grammars must have a
single non-terminal on the left-hand side and a right-hand side consisting of a
single terminal or a single terminal followed by a single non-terminal.
The productions must be of the form X → a or X → aY, where X, Y ∈ N (non-
terminals) and a ∈ T (a terminal). The rule S → ε is allowed if S does not appear on
the right side of any rule.
X→ε
X → a | aY
Y→b
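A Type-3 grammar can be read directly as an NFA, which is one way to see that it generates a regular language. A sketch using the example grammar above (the sink state name "#" is an assumption):

```python
RULES = {"X": ["", "a", "aY"], "Y": ["b"]}   # the grammar above
SINK = "#"                                    # assumed accepting sink state

def accepts(w, start="X"):
    states = {start}
    for c in w:
        nxt = set()
        for X in states:
            for rhs in RULES.get(X, ()):
                if rhs and rhs[0] == c:
                    # X -> cY moves to Y; X -> c moves to the accepting sink.
                    nxt.add(rhs[1:] if len(rhs) > 1 else SINK)
        states = nxt
    # Accept in the sink, or in any variable that has an epsilon production.
    return any(X == SINK or "" in RULES.get(X, ()) for X in states)

print(accepts(""), accepts("a"), accepts("ab"), accepts("b"))
# True True True False
```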
X. Linear Bounded Automata (LBA)
A Linear Bounded Automaton is a single-tape non-deterministic Turing Machine with
two special tape symbols, the left end marker < and the right end marker >.
The transitions must satisfy these conditions:
- They must not replace the marker symbols with any other symbol.
- They must not write on cells beyond the marker symbols.
The initial configuration is: < q0 a1 a2 a3 a4 a5 ... an > = < q0 w >
A linear bounded automaton (LBA) is a type of Turing machine wherein
the tape is not permitted to move off the portion of the tape containing the input.
If the machine tries to move its head off either end of the input, the head stays
where it is, in the same way that the head will not move off the left-hand end of
an ordinary Turing machine's tape. It is a TM with a limited amount of memory. It
can only solve problems requiring memory that can fit within the tape used for the
input. Using a tape alphabet larger than the input alphabet allows the available
memory to be increased up to a constant factor.
1. Formal Definition: A Linear Bounded Automaton is a non-deterministic Turing
Machine, M = (Q, Σ, T, 𝛿, B, q0, <, >, t, r), where,
- Q is set of all states
- Σ is set of all terminals
- T is set of all tape alphabets
- 𝛿 is set of transitions
- B is blank symbol
- q0 is the initial state
- < is left marker and > is right marker
- t is accept state
- r is reject state
Example 1.1: Design a Turing Machine for the context sensitive language
L = {a^n b^n c^n | n ≥ 1}.
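A sketch of the marking strategy such a machine can use: repeatedly mark one a, one b and one c between the end markers (so the tape never grows), and accept when everything is marked. This models the strategy rather than a full transition table:

```python
def lba_accepts(w):
    """Marking strategy for {a^n b^n c^n | n >= 1} within the end markers."""
    # First sweep: the input must have the shape a*b*c* over {a,b,c}.
    if set(w) - set("abc") or any(x > y for x, y in zip(w, w[1:])):
        return False
    tape = list("<" + w + ">")          # end markers are never overwritten
    while "a" in tape and "b" in tape and "c" in tape:
        tape[tape.index("a")] = "X"     # mark one a,
        tape[tape.index("b")] = "Y"     # one b,
        tape[tape.index("c")] = "Z"     # and one c per pass
    # Accept iff everything is marked and at least one triple was found.
    return all(ch in "<>XYZ" for ch in tape) and "X" in tape

print(lba_accepts("aabbcc"), lba_accepts("aabbc"))  # True False
```

The head never leaves the portion of the tape between < and >, which is exactly the linear-bounded restriction described above.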