Removing Direct and Indirect Left Recursion in a Grammar
Last Updated: 14 Oct, 2024
Left recursion is a common problem in grammars used for parsing in the syntax analysis phase of compilation. It is important to remove it, because a top-down parser (for example, a recursive-descent parser) can fall into an infinite loop on a left-recursive grammar. This article explains in detail how to remove left recursion, with examples and explanations of each step.
Types of Left Recursion
There are two types of Left Recursion:
- Direct Left Recursion
- Indirect Left Recursion
The full process for removing each type, direct and indirect left recursion, is described below.
Removing Direct Left Recursion in a Grammar
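Left Recursion: A grammar with a production of the form S ⇒ S α | β is called left recursive, where S is a nonterminal and α and β are strings of terminals and nonterminals, with β not beginning with S. For example, S ⇒ S a | b is left recursive.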
Problem with Left Recursion: If a grammar contains left recursion, a top-down parser can enter an infinite loop during syntax analysis. To expand S, the parser first tries to expand S again at the same input position, without consuming any input and without any condition that could stop the recursion.
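To see the loop concretely, here is a minimal Python sketch of a naive recursive-descent recognizer for the hypothetical grammar S ⇒ S a | c. The names `parse_S` and `loops_forever` are illustrative only. Because the first thing `parse_S` does is call itself at the same position, Python's recursion limit is reached and a `RecursionError` is raised instead of an answer:

```python
# Hypothetical grammar  S -> S a | c, recognized by naive recursive descent.
def parse_S(tokens, pos):
    """Try to match S starting at tokens[pos]; return the end position or None."""
    # Alternative S -> S a: expand the leading S first, at the SAME position,
    # so this call repeats forever without consuming a token.
    end = parse_S(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # Alternative S -> c (never reached).
    if pos < len(tokens) and tokens[pos] == "c":
        return pos + 1
    return None


def loops_forever(text):
    """Return True if parsing `text` hits Python's recursion limit."""
    try:
        parse_S(list(text), 0)
    except RecursionError:
        return True
    return False


print(loops_forever("caa"))  # True
```

The input is never examined: the failure comes purely from the shape of the grammar, which is why the grammar itself must be rewritten.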
Algorithm to Remove Direct Left Recursion, with an example. Suppose we have a grammar which contains left recursion:
S ⇒ S a | S b | c | d
First, check whether the grammar contains left recursion. If it does, separate the left-recursive alternatives from the others. In our example, S ⇒ S a and S ⇒ S b are left recursive, while S ⇒ c and S ⇒ d are not.
Next, introduce a new nonterminal S' and append it to the end of every alternative that is not left recursive (here c and d). The new production for S is:
S ⇒ cS' | dS'
Then write S' as the head of a new production. For each left-recursive alternative S ⇒ S α, drop the leading S and append S' to what remains, giving S' ⇒ α S'. Finally, add the alternative S' ⇒ ε so that the repetition can end:
S' ⇒ ε | aS' | bS'
After conversion, the equivalent grammar without left recursion is:
S ⇒ cS' | dS'
S' ⇒ ε | aS' | bS'
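The steps above can be sketched in Python. This is a minimal illustration, assuming a grammar representation in which each alternative is a list of symbols and an empty list stands for ε; the function names `remove_direct_left_recursion` and `show` are hypothetical. It separates the left-recursive alternatives (the α parts) from the rest (the β parts) and builds the two new productions:

```python
EPSILON = "ε"


def remove_direct_left_recursion(head, alternatives):
    """Rewrite  head -> head a1 | ... | head am | b1 | ... | bn
    as        head  -> b1 head' | ... | bn head'
              head' -> a1 head' | ... | am head' | ε
    Each alternative is a list of symbols; [] stands for ε."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:  # no direct left recursion: leave the productions unchanged
        return {head: alternatives}
    new_head = head + "'"
    return {
        head: [beta + [new_head] for beta in betas],
        new_head: [alpha + [new_head] for alpha in alphas] + [[]],  # [] is ε
    }


def show(grammar):
    """Print each production as  head -> body | body | ..."""
    for head, alts in grammar.items():
        bodies = [" ".join(alt) if alt else EPSILON for alt in alts]
        print(head, "->", " | ".join(bodies))


show(remove_direct_left_recursion("S", [["S", "a"], ["S", "b"], ["c"], ["d"]]))
# S -> c S' | d S'
# S' -> a S' | b S' | ε
```

Keeping symbols in lists (rather than concatenated strings) avoids ambiguity when nonterminal names are longer than one character, such as A1 or S'.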
Removing Indirect Left Recursion in a Grammar
A grammar has indirect left recursion if some nonterminal A can derive, in two or more steps, a string that begins with A itself (A ⇒ ... ⇒ A α). For example,
A ⇒ B r
B ⇒ C d
C ⇒ A t
where A, B, C are nonterminals and r, d, t are terminals. Here, starting with A, we can derive A again: A ⇒ B r ⇒ C d r ⇒ A t d r.
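Indirect left recursion can be detected mechanically: draw an edge from A to X whenever some alternative of A begins with the nonterminal X; A is then left recursive exactly when it can reach itself. The Python sketch below does this with a depth-first search. It assumes the grammar has no ε-productions (otherwise symbols after a nullable nonterminal would also need edges), and the name `left_recursive_nonterminals` is hypothetical:

```python
def left_recursive_nonterminals(grammar):
    """Return the nonterminals A with A =>+ A ... (direct or indirect).
    Assumes no ε-productions, so only the first symbol of each
    alternative can start a derivation."""
    # Edge A -> X whenever some alternative of A begins with nonterminal X.
    first = {a: {alt[0] for alt in alts if alt and alt[0] in grammar}
             for a, alts in grammar.items()}
    result = set()
    for start in grammar:
        seen, stack = set(), list(first[start])
        while stack:  # depth-first search from `start`
            x = stack.pop()
            if x == start:
                result.add(start)
                break
            if x not in seen:
                seen.add(x)
                stack.extend(first[x])
    return result


grammar = {
    "A": [["B", "r"]],
    "B": [["C", "d"]],
    "C": [["A", "t"]],
}
print(sorted(left_recursive_nonterminals(grammar)))  # ['A', 'B', 'C']
```

All three nonterminals are reported, because each lies on the cycle A, B, C, even though no single production is directly left recursive.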
Algorithm to Remove Indirect Left Recursion, with an example. Consider the grammar:
A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ A1 A1 | a
where A1, A2, A3 are nonterminals and a, b are terminals.
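First, order the nonterminals (here A1, A2, A3) and identify each production Ai ⇒ Aj γ with j < i, because such productions can cause indirect left recursion (for example, A1 ⇒ A2 A3 ⇒ A3 A1 A3 ⇒ A1 A1 A1 A3). In our case: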
A3 ⇒ A1 A1 | a
Next, substitute the productions of A1 wherever A1 appears at the left end of another production. Here, replace the leading A1 in the production of A3 with A1's body, A2 A3 (the second A1 is left unchanged):
A3 ⇒ A2 A3 A1 | a
Now substitute the productions of A2 (A2 ⇒ A3 A1 | b) for the leading A2:
A3 ⇒ (A3 A1 | b) A3 A1 | a
Distributing the concatenation over the alternatives gives:
A3 ⇒ A3 A1 A3 A1 | b A3 A1 | a
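The substitution step can be expressed as a small helper. In this Python sketch (the name `substitute_leading` is hypothetical, and each alternative is a list of symbols), only a leading occurrence of the target nonterminal is replaced, as in the derivation above, where the trailing A1 of A3 ⇒ A1 A1 stays in place:

```python
def substitute_leading(alternatives, target, target_alternatives):
    """Replace each alternative  target γ  by  δ γ  for every  target -> δ.
    Only a leading occurrence of `target` is replaced."""
    result = []
    for alt in alternatives:
        if alt and alt[0] == target:
            result.extend(delta + alt[1:] for delta in target_alternatives)
        else:
            result.append(alt)
    return result


A1 = [["A2", "A3"]]
A2 = [["A3", "A1"], ["b"]]
A3 = [["A1", "A1"], ["a"]]

A3 = substitute_leading(A3, "A1", A1)  # A3 -> A2 A3 A1 | a
A3 = substitute_leading(A3, "A2", A2)  # A3 -> A3 A1 A3 A1 | b A3 A1 | a
print(A3)  # [['A3', 'A1', 'A3', 'A1'], ['b', 'A3', 'A1'], ['a']]
```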
The production for A3 now has direct left recursion, which can be removed with the direct method described above.
As in the direct case, introduce a new nonterminal A' and append it to every alternative of A3 that does not begin with A3; the remainder of the left-recursive alternative (A1 A3 A1) becomes the body of A'. The new productions are:
A3 ⇒ b A3 A1 A' | aA'
A' ⇒ ε | A1 A3 A1 A'
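If an ε-production is undesirable, the ε alternative can be distributed: every alternative ending in A' is duplicated without the trailing A', and A' ⇒ ε is dropped: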
A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'
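The ε-distribution step can also be sketched in Python. The helper name `distribute_epsilon` is hypothetical; it assumes, as holds for the primed nonterminal A' here, that the nullable nonterminal appears only at the end of alternatives and that its only nullable alternative is ε itself (written as an empty list):

```python
def distribute_epsilon(grammar, nullable):
    """Remove `nullable -> ε` and, for each alternative ending in `nullable`,
    add a copy without that final symbol. [] stands for ε."""
    result = {}
    for head, alts in grammar.items():
        # Copies with the trailing nullable nonterminal dropped come first.
        shortened = [alt[:-1] for alt in alts if alt and alt[-1] == nullable]
        # Keep every original alternative except the ε-alternative of `nullable`.
        kept = [alt for alt in alts if alt or head != nullable]
        result[head] = shortened + kept
    return result


grammar = {
    "A3": [["b", "A3", "A1", "A'"], ["a", "A'"]],
    "A'": [["A1", "A3", "A1", "A'"], []],
}
for head, alts in distribute_epsilon(grammar, "A'").items():
    print(head, "->", " | ".join(" ".join(alt) for alt in alts))
# A3 -> b A3 A1 | a | b A3 A1 A' | a A'
# A' -> A1 A3 A1 | A1 A3 A1 A'
```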
The resulting grammar is then:
A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'
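Putting the pieces together, the standard ordering-based algorithm processes the nonterminals in a fixed order (here A1, A2, A3), substitutes earlier nonterminals at the left end of each production, and then removes direct left recursion. The Python sketch below is self-contained and assumes, as the textbook algorithm does, that the grammar has no ε-productions and no cycles (A ⇒+ A). It names the new nonterminal A3' rather than A' and keeps the ε alternative instead of distributing it; all function names are hypothetical:

```python
def substitute_leading(alternatives, target, target_alternatives):
    """Replace each alternative  target γ  by  δ γ  for every  target -> δ."""
    result = []
    for alt in alternatives:
        if alt and alt[0] == target:
            result.extend(delta + alt[1:] for delta in target_alternatives)
        else:
            result.append(alt)
    return result


def remove_direct_left_recursion(head, alternatives):
    """head -> head α | β   becomes   head -> β head',  head' -> α head' | ε."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return {head: alternatives}
    new_head = head + "'"
    return {head: [beta + [new_head] for beta in betas],
            new_head: [alpha + [new_head] for alpha in alphas] + [[]]}


def remove_left_recursion(grammar, order):
    """Process nonterminals in `order`; [] in an alternative list means ε."""
    result = {}
    for i, a_i in enumerate(order):
        alts = grammar[a_i]
        # Substitute every earlier nonterminal A_j that starts an alternative.
        for a_j in order[:i]:
            alts = substitute_leading(alts, a_j, result[a_j])
        # Then remove any direct left recursion that the substitution exposed.
        result.update(remove_direct_left_recursion(a_i, alts))
    return result


grammar = {
    "A1": [["A2", "A3"]],
    "A2": [["A3", "A1"], ["b"]],
    "A3": [["A1", "A1"], ["a"]],
}
for head, alts in remove_left_recursion(grammar, ["A1", "A2", "A3"]).items():
    print(head, "->", " | ".join(" ".join(alt) if alt else "ε" for alt in alts))
# A1 -> A2 A3
# A2 -> A3 A1 | b
# A3 -> b A3 A1 A3' | a A3'
# A3' -> A1 A3 A1 A3' | ε
```

The output matches the hand derivation before the ε-distribution step, up to the name of the new nonterminal.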
Implementation:
C++
#include <iostream>
#include <string>
#include <vector>
using namespace std;

class NonTerminal {
    string name;                     // Stores the head of the production rule
    vector<string> productionRules;  // Stores the bodies of the production rules

public:
    NonTerminal(string name) {
        this->name = name;
    }

    // Returns the head of the production rule
    string getName() {
        return name;
    }

    // Replaces the bodies of the production rules
    void setRules(vector<string> rules) {
        productionRules = rules;
    }

    // Returns the bodies of the production rules
    vector<string> getRules() {
        return productionRules;
    }

    // Adds one body (alternative) to the production rules
    void addRule(string rule) {
        productionRules.push_back(rule);
    }

    // Prints the production rules in the form "A -> x | y"
    void printRule() {
        string toPrint = name + " ->";
        for (size_t i = 0; i < productionRules.size(); i++) {
            toPrint += (i == 0 ? " " : " | ") + productionRules[i];
        }
        cout << toPrint << endl;
    }
};

class Grammar {
    vector<NonTerminal> nonTerminals;

public:
    // Parses a rule such as "S -> Sa | Sb | c | d" and adds it to the grammar
    void addRule(string rule) {
        bool nt = false;
        string parse = "";
        for (char c : rule) {
            if (c == ' ') {
                if (!nt) {
                    nonTerminals.push_back(NonTerminal(parse));
                    nt = true;
                    parse = "";
                } else if (parse.size()) {
                    nonTerminals.back().addRule(parse);
                    parse = "";
                }
            } else if (c != '|' && c != '-' && c != '>') {
                parse += c;
            }
        }
        if (parse.size()) {
            nonTerminals.back().addRule(parse);
        }
    }

    void inputData() {
        addRule("S -> Sa | Sb | c | d");
    }

    // Algorithm for eliminating non-immediate left recursion:
    // every rule of A that starts with B is replaced using B's rules
    void solveNonImmediateLR(NonTerminal &A, NonTerminal &B) {
        string nameB = B.getName();
        vector<string> rulesA = A.getRules(), rulesB = B.getRules(), newRulesA;
        for (auto rule : rulesA) {
            if (rule.substr(0, nameB.size()) == nameB) {
                for (auto rule1 : rulesB) {
                    newRulesA.push_back(rule1 + rule.substr(nameB.size()));
                }
            } else {
                newRulesA.push_back(rule);
            }
        }
        A.setRules(newRulesA);
    }

    // Algorithm for eliminating immediate left recursion
    void solveImmediateLR(NonTerminal &A) {
        string name = A.getName();
        string newName = name + "'";
        vector<string> alphas, betas, newRulesA, newRulesA1;
        // Checks if there is left recursion or not
        for (auto rule : A.getRules()) {
            if (rule.substr(0, name.size()) == name) {
                alphas.push_back(rule.substr(name.size()));
            } else {
                betas.push_back(rule);
            }
        }
        // If no left recursion, exit
        if (alphas.empty())
            return;
        if (betas.empty())
            newRulesA.push_back(newName);
        for (auto beta : betas)
            newRulesA.push_back(beta + newName);
        for (auto alpha : alphas)
            newRulesA1.push_back(alpha + newName);
        // Amends the original rule
        A.setRules(newRulesA);
        newRulesA1.push_back("\u03B5");
        // Adds the new production rule (done last: push_back may invalidate A)
        NonTerminal newNonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.push_back(newNonTerminal);
    }

    // Eliminates left recursion
    void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < i; j++) {
                solveNonImmediateLR(nonTerminals[i], nonTerminals[j]);
            }
            solveImmediateLR(nonTerminals[i]);
        }
    }

    // Prints all the rules of the grammar
    void printRules() {
        for (auto nonTerminal : nonTerminals) {
            nonTerminal.printRule();
        }
    }
};

int main() {
    Grammar grammar;
    grammar.inputData();
    grammar.applyAlgorithm();
    grammar.printRules();
    return 0;
}
Java
import java.util.ArrayList;

class NonTerminal {
    private String name;
    private ArrayList<String> rules;

    public NonTerminal(String name) {
        this.name = name;
        rules = new ArrayList<>();
    }

    public void addRule(String rule) {
        rules.add(rule);
    }

    public void setRules(ArrayList<String> rules) {
        this.rules = rules;
    }

    public String getName() {
        return name;
    }

    public ArrayList<String> getRules() {
        return rules;
    }

    // Prints the production rules in the form "A -> x | y"
    public void printRule() {
        System.out.println(name + " -> " + String.join(" | ", rules));
    }
}

class Grammar {
    private ArrayList<NonTerminal> nonTerminals;

    public Grammar() {
        nonTerminals = new ArrayList<>();
    }

    // Parses a rule such as "S -> Sa | Sb | c | d" and adds it to the grammar
    public void addRule(String rule) {
        boolean nt = false;
        String parse = "";
        for (int i = 0; i < rule.length(); i++) {
            char c = rule.charAt(i);
            if (c == ' ') {
                if (!nt) {
                    nonTerminals.add(new NonTerminal(parse));
                    nt = true;
                    parse = "";
                } else if (parse.length() != 0) {
                    nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
                    parse = "";
                }
            } else if (c != '|' && c != '-' && c != '>') {
                parse += c;
            }
        }
        if (parse.length() != 0) {
            nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
        }
    }

    public void inputData() {
        addRule("S -> Sa | Sb | c | d");
    }

    // Algorithm for eliminating non-immediate left recursion:
    // every rule of A that starts with B is replaced using B's rules
    public void solveNonImmediateLR(NonTerminal A, NonTerminal B) {
        String nameB = B.getName();
        ArrayList<String> newRulesA = new ArrayList<>();
        for (String rule : A.getRules()) {
            // startsWith avoids substring() throwing on rules shorter than nameB
            if (rule.startsWith(nameB)) {
                for (String rule1 : B.getRules()) {
                    newRulesA.add(rule1 + rule.substring(nameB.length()));
                }
            } else {
                newRulesA.add(rule);
            }
        }
        A.setRules(newRulesA);
    }

    // Algorithm for eliminating immediate left recursion
    public void solveImmediateLR(NonTerminal A) {
        String name = A.getName();
        String newName = name + "'";
        ArrayList<String> alphas = new ArrayList<>();
        ArrayList<String> betas = new ArrayList<>();
        ArrayList<String> newRulesA = new ArrayList<>();
        ArrayList<String> newRulesA1 = new ArrayList<>();
        // Checks if there is left recursion or not
        for (String rule : A.getRules()) {
            if (rule.startsWith(name)) {
                alphas.add(rule.substring(name.length()));
            } else {
                betas.add(rule);
            }
        }
        // If no left recursion, exit
        if (alphas.isEmpty())
            return;
        if (betas.isEmpty())
            newRulesA.add(newName);
        for (String beta : betas)
            newRulesA.add(beta + newName);
        for (String alpha : alphas)
            newRulesA1.add(alpha + newName);
        // Amends the original rule
        A.setRules(newRulesA);
        newRulesA1.add("\u03B5");
        // Adds new production rule
        NonTerminal newNonTerminal = new NonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.add(newNonTerminal);
    }

    // Eliminates left recursion
    public void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++) {
            for (int j = 0; j < i; j++) {
                solveNonImmediateLR(nonTerminals.get(i), nonTerminals.get(j));
            }
            solveImmediateLR(nonTerminals.get(i));
        }
    }

    // Prints all the rules of the grammar
    public void printRules() {
        for (NonTerminal nonTerminal : nonTerminals) {
            nonTerminal.printRule();
        }
    }
}

class Main {
    public static void main(String[] args) {
        Grammar grammar = new Grammar();
        grammar.inputData();
        grammar.applyAlgorithm();
        grammar.printRules();
    }
}
Python
class NonTerminal:
    def __init__(self, name):
        self.name = name
        self.rules = []

    def addRule(self, rule):
        self.rules.append(rule)

    def setRules(self, rules):
        self.rules = rules

    def getName(self):
        return self.name

    def getRules(self):
        return self.rules

    # Prints the production rules in the form "A -> x | y"
    def printRule(self):
        print(self.name + " -> " + " | ".join(self.rules))


class Grammar:
    def __init__(self):
        self.nonTerminals = []

    # Parses a rule such as "S -> Sa | Sb | c | d" and adds it to the grammar
    def addRule(self, rule):
        nt = False
        parse = ""
        for c in rule:
            if c == ' ':
                if not nt:
                    self.nonTerminals.append(NonTerminal(parse))
                    nt = True
                    parse = ""
                elif parse != "":
                    self.nonTerminals[-1].addRule(parse)
                    parse = ""
            elif c not in "|->":
                parse += c
        if parse != "":
            self.nonTerminals[-1].addRule(parse)

    def inputData(self):
        self.addRule("S -> Sa | Sb | c | d")

    # Algorithm for eliminating non-immediate left recursion:
    # every rule of A that starts with B is replaced using B's rules
    def solveNonImmediateLR(self, A, B):
        nameB = B.getName()
        newRulesA = []
        for rule in A.getRules():
            if rule.startswith(nameB):
                for rule1 in B.getRules():
                    newRulesA.append(rule1 + rule[len(nameB):])
            else:
                newRulesA.append(rule)
        A.setRules(newRulesA)

    # Algorithm for eliminating immediate left recursion
    def solveImmediateLR(self, A):
        name = A.getName()
        newName = name + "'"
        alphas = []
        betas = []
        newRulesA = []
        newRulesA1 = []
        # Checks if there is left recursion or not
        for rule in A.getRules():
            if rule.startswith(name):
                alphas.append(rule[len(name):])
            else:
                betas.append(rule)
        # If no left recursion, exit
        if len(alphas) == 0:
            return
        if len(betas) == 0:
            newRulesA.append(newName)
        for beta in betas:
            newRulesA.append(beta + newName)
        for alpha in alphas:
            newRulesA1.append(alpha + newName)
        # Amends the original rule
        A.setRules(newRulesA)
        newRulesA1.append("\u03B5")
        # Adds new production rule
        newNonTerminal = NonTerminal(newName)
        newNonTerminal.setRules(newRulesA1)
        self.nonTerminals.append(newNonTerminal)

    # Eliminates left recursion
    def applyAlgorithm(self):
        size = len(self.nonTerminals)
        for i in range(size):
            for j in range(i):
                self.solveNonImmediateLR(self.nonTerminals[i], self.nonTerminals[j])
            self.solveImmediateLR(self.nonTerminals[i])

    # Prints all the rules of the grammar
    def printRules(self):
        for nonTerminal in self.nonTerminals:
            nonTerminal.printRule()


grammar = Grammar()
grammar.inputData()
grammar.applyAlgorithm()
grammar.printRules()
Output:
S -> cS' | dS'
S' -> aS' | bS' | ε
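Time Complexity: The immediate-recursion step scans each rule once, so for a grammar like the example above it runs in O(n*s), where n is the number of production rules and s is the maximum length of a rule. The substitution step for indirect recursion runs for every pair of nonterminals and can multiply the number of alternatives, so on general grammars the output, and with it the running time, can grow much larger than this bound.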
Conclusion
In conclusion, removing direct and indirect left recursion from a grammar plays an important role in smooth parsing during syntax analysis. A left-recursive grammar can send a top-down parser into an infinite loop; restructuring the grammar with the techniques above produces an equivalent grammar that such parsers can handle, making them more reliable.