Harambee University
Department of Computer Science
Chapter 3
Context free languages (CFL)
CFG
In formal language theory, a context-free
language (CFL) is a language generated by
a context-free grammar (CFG).
A context-free grammar (CFG) is a formal
grammar whose production rules can be applied to
a nonterminal symbol regardless of its context.
It is a formal grammar which is used to generate all
possible patterns of strings in a given formal
language.
In particular, in a context-free grammar, the rules
that govern the language are production rules.
CFG
CFG stands for context-free grammar.
A context-free grammar G is defined by a 4-tuple:
G = (V, T, P, S)
G is the grammar, which consists of a set of
production rules. It is used to generate the strings of a
language.
T is the finite set of terminal symbols. Terminals are
denoted by lower-case letters.
V is the finite set of non-terminal symbols. Non-terminals
are denoted by capital letters.
P is the finite set of production rules. Each rule replaces
a non-terminal symbol (the left side of the production)
in a string with a sequence of terminal and/or
non-terminal symbols (the right side).
S is the start symbol, a member of V, from which every
derivation begins.
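As a minimal sketch, the 4-tuple can be written down as a small Python data structure. The class name `Grammar`, its field names and the method `is_well_formed` are illustrative choices, not part of the formal definition; the sample grammar uses the productions S → aSa | bSb | c.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """A context-free grammar G = (V, T, P, S)."""
    variables: frozenset      # V: non-terminal symbols
    terminals: frozenset      # T: terminal symbols
    productions: tuple        # P: (left side, right side) pairs
    start: str                # S: start symbol

    def is_well_formed(self) -> bool:
        """Check the constraints stated in the definition."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(lhs in self.variables and set(rhs) <= symbols
                    for lhs, rhs in self.productions)
        )


palindrome = Grammar(
    variables=frozenset({"S"}),
    terminals=frozenset({"a", "b", "c"}),
    productions=(("S", "aSa"), ("S", "bSb"), ("S", "c")),
    start="S",
)
print(palindrome.is_well_formed())
```

The check only confirms that the start symbol is a variable, that V and T are disjoint, and that every production uses known symbols with a single variable on the left side.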
Example CFG
Production rules:
o S → aSa
o S → bSb
o S→c
Now check that the string abbcbba can be derived from the
given CFG.
Apply the production S → aSa once, then S → bSb twice:
o S ⇒ aSa
o ⇒ abSba
o ⇒ abbSbba
Finally, applying the production S → c gives the string
abbcbba:
o ⇒ abbcbba
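The derivation can be replayed mechanically. The sketch below assumes that sentential forms are plain strings and that each step names the production to apply; the function name `derive` is a hypothetical helper.

```python
def derive(start, steps):
    """Apply (left side, right side) productions in order.

    Each step rewrites the leftmost occurrence of the left-side symbol,
    and all sentential forms are returned as a list.
    """
    forms = [start]
    for lhs, rhs in steps:
        current = forms[-1]
        if lhs not in current:
            raise ValueError(f"{lhs} does not occur in {current}")
        forms.append(current.replace(lhs, rhs, 1))
    return forms


steps = [("S", "aSa"), ("S", "bSb"), ("S", "bSb"), ("S", "c")]
forms = derive("S", steps)
print(" => ".join(forms))
# S => aSa => abSba => abbSbba => abbcbba
```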
Classification of Context Free Grammars
Context-free grammars (CFGs) can be classified on the
basis of the following two properties:
1) Based on the number of strings generated.
• If a CFG generates a finite number of strings, the
grammar is said to be non-recursive.
• Example:
S → Aa
A → b | c
This grammar generates only the strings ba and ca.
• If a CFG generates an infinite number of strings, the
grammar is said to be recursive.
• Example:
S → SaS
S → b
The language (set of strings) generated by this
grammar is {b, bab, babab, …}, which is infinite.
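To see the difference between the two example grammars, one can enumerate every string up to a length bound. This is a rough sketch that assumes non-terminals are single upper-case letters and that the grammar has no ε-productions; `generate` is a hypothetical helper name.

```python
from collections import deque


def generate(productions, start, max_len):
    """Return all terminal strings of length <= max_len.

    productions maps each non-terminal (an upper-case letter) to a list
    of right sides. Assumes there are no ε-productions, so sentential
    forms never shrink and forms longer than max_len can be discarded.
    """
    seen, results = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        pos = next((i for i, ch in enumerate(form) if ch.isupper()), None)
        if pos is None:
            results.add(form)
            continue
        for rhs in productions[form[pos]]:
            new = form[:pos] + rhs + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda s: (len(s), s))


non_recursive = {"S": ["Aa"], "A": ["b", "c"]}
recursive = {"S": ["SaS", "b"]}
print(generate(non_recursive, "S", 10))  # ['ba', 'ca']
print(generate(recursive, "S", 5))       # ['b', 'bab', 'babab']
```

Raising the bound never adds strings for the non-recursive grammar, while the recursive grammar keeps producing longer ones.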
2) Based on the number of derivation trees.
• If every string in the language has exactly one
derivation tree, the CFG is unambiguous.
• If some string has more than one leftmost derivation,
rightmost derivation or parse tree, the CFG
is ambiguous.
Capabilities of CFG
Context-free grammars can describe the syntax of most
programming languages.
If the grammar is properly designed, an efficient
parser can be constructed automatically.
Using associativity and precedence
information, suitable grammars for expressions can be
constructed.
Context-free languages have many applications
in programming languages; in particular, most
arithmetic expressions are generated by context-free
grammars.
Context-free grammars can describe nested
structures such as balanced parentheses, matching
begin-end pairs, corresponding if-then-else constructs
and so on.
Parsers
A parser is the component of a compiler that analyses
the tokens coming from the lexical analysis
phase and checks them against the grammar.
• A parser takes input in the form of a sequence of
tokens and produces output in the form of a parse
tree.
• There are two types of parsing: top-down parsing
and bottom-up parsing.
Parsing…
Top-down parsing
• Top-down parsing is commonly implemented as
recursive-descent parsing or predictive parsing.
• In top-down parsing, the parse starts from the
start symbol and transforms it into the input string.
• Parse Tree representation of input string "acdb" is
as follows:
Parsing…
Bottom-up parsing
• Bottom-up parsing is also known as shift-reduce
parsing.
• Bottom-up parsing is used to construct a parse
tree for an input string.
• In bottom-up parsing, the parse starts with
the input symbols and constructs the parse tree up
to the start symbol by tracing out the rightmost
derivation of the string in reverse.
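The shift-reduce idea can be sketched for the small grammar S → aSa | bSb | c. The parser below reduces as soon as the top of the stack matches a right side, which is a simplifying assumption that works for this grammar; real shift-reduce parsers use parsing tables and lookahead to decide between shifting and reducing. The name `shift_reduce` is a hypothetical helper.

```python
def shift_reduce(tokens, productions, start="S"):
    """Return the list of reductions if tokens reduce to start, else None.

    productions is a list of (left side, right side) pairs. After each
    shift, the parser reduces whenever the top of the stack equals a
    right side. This greedy policy is enough for the grammar below but
    not for grammars that need lookahead to resolve conflicts.
    """
    stack, reductions = [], []
    for token in tokens:
        stack.append(token)                      # shift
        reduced = True
        while reduced:
            reduced = False
            for lhs, rhs in productions:
                n = len(rhs)
                if n <= len(stack) and "".join(stack[-n:]) == rhs:
                    del stack[-n:]               # pop the matched right side
                    stack.append(lhs)            # push the non-terminal
                    reductions.append(f"{lhs} -> {rhs}")
                    reduced = True
                    break
    return reductions if stack == [start] else None


P = [("S", "aSa"), ("S", "bSb"), ("S", "c")]
print(shift_reduce("abbcbba", P))
# ['S -> c', 'S -> bSb', 'S -> bSb', 'S -> aSa']
```

The reductions come out in the reverse order of the derivation S ⇒ aSa ⇒ abSba ⇒ abbSbba ⇒ abbcbba.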
Parse tree
A parse tree is the graphical representation of a derivation.
Each node is labelled with a symbol, which can be a terminal
or a non-terminal.
In parsing, the string is derived from the start symbol.
The root of the parse tree is that start symbol.
A parse tree reflects the precedence of operators: the
deepest sub-tree is evaluated first,
so the operator in a parent node has lower
precedence than the operator in its sub-tree.
A parse tree follows these points:
All leaf nodes have to be terminals.
All interior nodes have to be non-terminals.
Reading the leaves from left to right gives the original
input string.
Example
Production rules: S → S + S | S * S, S → a | b | c
Input: a * b + c
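One possible parse tree for this input can be written as nested Python tuples, with + at the root so that * lies deeper in the tree. The representation (a pair of a non-terminal and a list of children) and the function name `leaves` are assumptions made for this sketch.

```python
def leaves(node):
    """Return the leaves of a parse tree from left to right.

    A node is either a terminal (a plain string) or a pair
    (non-terminal, list of child nodes).
    """
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


# Parse tree for a * b + c with + at the root, so that * sits deeper
# in the tree and is evaluated first.
tree = ("S", [
    ("S", [("S", ["a"]), "*", ("S", ["b"])]),
    "+",
    ("S", ["c"]),
])
print(" ".join(leaves(tree)))  # a * b + c
```

Every interior node is the non-terminal S, every leaf is a terminal, and the leaves read from left to right reproduce the input.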
Grammar Ambiguity
A grammar is said to be ambiguous if there exists
more than one leftmost derivation, more than
one rightmost derivation or more than one parse
tree for some input string.
If the grammar is not ambiguous, it is called
unambiguous.
Cont….
S → aSb | SS, S → ε
For the string aabb, the above grammar generates two
parse trees:
An ambiguous grammar is not suitable for
compiler construction.
No general algorithm can automatically detect and remove
ambiguity from an arbitrary CFG.
Ambiguity is removed by rewriting the grammar
so that each string has only one parse tree.
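Ambiguity can be made concrete by counting parse trees. The sketch below is hard-wired to the expression grammar S → S + S | S * S | a | b | c rather than working for arbitrary grammars; the function name `count_parse_trees` is a hypothetical helper.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for S -> S + S | S * S | a | b | c."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from S.
        if j - i == 1:
            return 1 if tokens[i] in ("a", "b", "c") else 0
        total = 0
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                # Split at the operator chosen for the root production.
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["a", "*", "b", "+", "c"]))  # 2
```

The two trees correspond to (a * b) + c and a * (b + c), so this grammar is ambiguous.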
Derivation
A derivation is a sequence of production rule applications.
It is used to obtain the input string from the start
symbol.
During parsing, we have to take two decisions:
We have to decide which non-terminal is to be
replaced.
We have to decide the production rule by which the
non-terminal will be replaced.
Left-most Derivation
Leftmost derivation is the process of deriving a string
by expanding the leftmost non-terminal at each
step.
The graphical
representation of a leftmost derivation is called a
leftmost derivation tree.
Leftmost derivations may be defined for any
formal grammar in which
no terminal symbols occur on the left side of
any production.
In a leftmost derivation, the sentential form is scanned
from left to right and the first non-terminal found is
replaced using a production rule.
Example of left-most derivation
S → S+S
S → S-S
S → a | b | c
The leftmost derivation of a-b+c is:
S ⇒ S+S
⇒ S-S+S
⇒ a-S+S
⇒ a-b+S
⇒ a-b+c
Right-most Derivation
In a rightmost derivation, the rightmost non-terminal is expanded
at each step.
So the sentential form is scanned from right to left to find the
non-terminal to replace.
Example:
S → S+S
S → S-S
S → a | b | c
The rightmost derivation of a-b+c is:
S ⇒ S-S
⇒ S-S+S
⇒ S-S+c
⇒ S-b+c
⇒ a-b+c
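Both derivation orders can be replayed with one small function. This sketch assumes non-terminals are single upper-case letters and takes the right sides to apply, one per step; `derive_in_order` is a hypothetical helper name.

```python
def derive_in_order(start, rules, leftmost=True):
    """Rewrite the leftmost (or rightmost) non-terminal at every step.

    rules is a list of right sides, one per step. Non-terminals are
    assumed to be upper-case letters.
    """
    forms = [start]
    for rhs in rules:
        form = forms[-1]
        positions = [i for i, ch in enumerate(form) if ch.isupper()]
        if not positions:
            raise ValueError(f"no non-terminal left in {form}")
        pos = positions[0] if leftmost else positions[-1]
        forms.append(form[:pos] + rhs + form[pos + 1:])
    return forms


left = derive_in_order("S", ["S+S", "S-S", "a", "b", "c"], leftmost=True)
right = derive_in_order("S", ["S-S", "S+S", "c", "b", "a"], leftmost=False)
print(" => ".join(left))   # S => S+S => S-S+S => a-S+S => a-b+S => a-b+c
print(" => ".join(right))  # S => S-S => S-S+S => S-S+c => S-b+c => a-b+c
```

Both runs end in the same string a-b+c; only the order in which non-terminals are expanded differs.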
Simplification of CFG
• A CFG has a recursive structure.
• The languages generated by CFGs are
called context-free languages.
• A CFG has one condition on production rules: the
left-hand side of each rule must be a single
variable, while the right-hand side may be any
combination of variables and terminals.
• In a CFG, not all production rules and
symbols are needed for the derivation of strings.
• Some production rules are never used during the
derivation of any string.
• Elimination of such productions or symbols is
called simplification of the context-free grammar.
Cont…
We will use several simple methods to simplify a
given context-free grammar without changing the language
it generates.
The following types of production rules can be
eliminated from a context-free grammar:
Useless productions
Null productions
Unit productions
Example (removing null productions)
Given:
S → XYX
X → aX | ε
Y → bY | ε
Result:
S → XYX | XY | YX | XX | X | Y
X → aX | a
Y → bY | b
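Null-production removal can be sketched in two passes: first find the nullable non-terminals, then list every way of keeping or dropping them on each right side. The dictionary representation, the use of "" for ε and the name `remove_null_productions` are assumptions of this sketch.

```python
from itertools import product


def remove_null_productions(grammar):
    """Return a grammar without ε-productions.

    grammar maps a non-terminal to a list of right sides; the empty
    string "" stands for ε. The empty string itself is dropped from
    the generated language.
    """
    # 1. Find all nullable non-terminals by fixed-point iteration.
    nullable = set()
    changed = True
    while changed:
        changed = False
        for lhs, bodies in grammar.items():
            if lhs not in nullable and any(
                all(sym in nullable for sym in body) for body in bodies
            ):
                nullable.add(lhs)
                changed = True

    # 2. For each right side, keep or drop every nullable symbol.
    result = {}
    for lhs, bodies in grammar.items():
        new_bodies = []
        for body in bodies:
            options = [(sym, "") if sym in nullable else (sym,) for sym in body]
            for choice in product(*options):
                candidate = "".join(choice)
                if candidate and candidate not in new_bodies:
                    new_bodies.append(candidate)
        result[lhs] = new_bodies
    return result


given = {"S": ["XYX"], "X": ["aX", ""], "Y": ["bY", ""]}
for lhs, bodies in remove_null_productions(given).items():
    print(lhs, "->", " | ".join(bodies))
```

For this grammar the original right side XYX is kept alongside the shorter alternatives XY, YX, XX, X and Y, since a string such as aba still needs all three symbols.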
Chomsky's Normal Form
CNF stands for Chomsky normal form.
A CFG (context-free grammar) is in CNF (Chomsky
normal form) if every production rule satisfies one of
the following conditions:
The start symbol generating ε. For example, S → ε.
A non-terminal generating two non-terminals. For
example, S → AB.
A non-terminal generating a terminal. For example,
S → a.
Chomsky's Normal Form
For example:
o G1 = {S → AB, S → ε, A → a, B → b}
o G2 = {S → aA, A → a, B → c}
The production rules of grammar G1 satisfy the
rules specified for CNF,
so the grammar G1 is in CNF.
However, the production rules of grammar G2 do
not satisfy the rules specified for CNF, as S → aA
contains a terminal followed by a non-terminal.
So the grammar G2 is not in CNF.
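The three conditions can be checked rule by rule. The sketch assumes single-letter symbols (upper case for non-terminals, lower case for terminals) and "" for ε; `is_cnf` is a hypothetical helper name.

```python
def is_cnf(productions, start="S"):
    """Check every (left side, right side) pair against the CNF rules.

    Non-terminals are upper-case letters, terminals are lower-case
    letters, and the empty string "" stands for ε.
    """
    for lhs, rhs in productions:
        if rhs == "":
            ok = lhs == start                      # only S -> ε
        elif len(rhs) == 1:
            ok = rhs.islower()                     # A -> a
        elif len(rhs) == 2:
            ok = rhs.isupper()                     # A -> BC
        else:
            ok = False
        if not ok:
            return False
    return True


g1 = [("S", "AB"), ("S", ""), ("A", "a"), ("B", "b")]
g2 = [("S", "aA"), ("A", "a"), ("B", "c")]
print(is_cnf(g1))  # True
print(is_cnf(g2))  # False
```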
Chomsky's Normal Form
• A context-free grammar is said to be in a “normal form”
when restrictions are imposed on the right side of
its production rules. In CNF, every rule has one of the forms:
A → BC
A → a
• Here A, B and C are variables, and a is a terminal.
Steps to convert CFG to Chomsky's Normal Form
(CNF)
1. If the start variable S occurs on the right side of any
production rule, create a new start symbol S1
and add the new production S1 → S.
2. Eliminate all null productions and unit productions
from the given production rules.
3. Eliminate terminals from the RHS of a production
if they occur together with other non-terminals or terminals.
For example, S → aA is decomposed as: S → RA, R → a
4. Eliminate RHS with more than two non-terminals.
For example, S → ASB is decomposed as: S → RB, R → AS
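Step 4 can be sketched as a loop that repeatedly folds the first two symbols of a long right side into a fresh non-terminal. The function name `binarize` and the way fresh names are supplied (an iterator passed by the caller) are assumptions of this sketch.

```python
def binarize(lhs, rhs, fresh_names):
    """Split lhs -> rhs into rules with at most two right-side symbols.

    rhs is a list of non-terminals; fresh_names is an iterator that
    supplies unused non-terminal names (hypothetical helper names).
    """
    rules = []
    rhs = list(rhs)
    while len(rhs) > 2:
        new = next(fresh_names)
        rules.append((new, rhs[:2]))      # e.g. R -> A S
        rhs = [new] + rhs[2:]             # e.g. S -> R B
    rules.append((lhs, rhs))
    return rules


print(binarize("S", ["A", "S", "B"], iter(["R"])))
# [('R', ['A', 'S']), ('S', ['R', 'B'])]
```

Rules that already have one or two symbols on the right side are returned unchanged.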
Example 1
Convert the given CFG to CNF. Consider the given
grammar G1:
S → a | aA | B
A → aBB | ε
B → Aa | b
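Step 1: Create a new start symbol S1 with the production S1 → S.
(In this grammar S does not occur on any RHS, so the step is not
strictly required; it is applied here to show the procedure.)
The grammar will be: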
S1 → S
S → a | aA | B
A → aBB | ε
B → Aa | b
Example…
Step 2: As grammar G1 contains the null production A → ε, its removal from the
grammar yields:
S1 → S
S → a | aA | B
A → aBB
B → Aa | b | a
Now, as grammar G1 contains the unit production S → B, its removal yields:
S1 → S
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Also remove the unit production S1 → S; its removal from the grammar
yields:
S1 → a | aA | Aa | b
S → a | aA | Aa | b
A → aBB
B → Aa | b | a
Example…
Step 3: In the production rules S1 → aA | Aa, S → aA | Aa, A → aBB and
B → Aa, the terminal a appears together with other symbols on the RHS.
So we introduce a new variable X with X → a and replace a with X in
these rules:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → XBB
B → AX | b | a
X → a
Step 4: In the production rule A → XBB, the RHS has more than two
symbols; decomposing it yields:
S1 → a | XA | AX | b
S → a | XA | AX | b
A → RB
B → AX | b | a
X → a
R → XB