Removing Direct and Indirect Left Recursion in a Grammar

Last Updated : 14 Oct, 2024

Left recursion is a common problem when a grammar is used for top-down parsing in the syntax analysis phase of compilation. A top-down parser, such as a recursive descent parser, can loop forever on a left-recursive grammar, so the recursion has to be removed before such a parser is built. This article explains how to remove both direct and indirect left recursion, with a worked example for each.

Types of Left Recursion

There are two types of Left Recursion:

Direct Left Recursion
Indirect Left Recursion

The process for removing each type, Direct Left Recursion and Indirect Left Recursion, is described below.

Removing Direct Left Recursion in a Grammar

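Left Recursion: A grammar of the form S ⇒ S a | b is called left recursive, where S is any nonterminal, a and b are any strings of grammar symbols, and b does not begin with S. The defining feature is that the first symbol on the right-hand side is S itself.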

Problem with Left Recursion: If a grammar is left recursive, a top-down parser can enter an infinite loop during syntax analysis. To expand S, the parser first has to expand S again at the same input position, without consuming any input or checking any condition, so the expansion never ends.
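The infinite loop can be reproduced with a small sketch. The function below is a hypothetical, naive recursive descent procedure for the grammar S ⇒ S a | c (a simplified stand-in for the examples in this article). It is only meant to illustrate the failure, not to show how a real parser is written.

```python
def parse_S(tokens, pos):
    """Naive recursive descent procedure for S -> S a | c (left recursive)."""
    # First alternative: S -> S a. The procedure calls itself at the same
    # position before consuming any token, so the state never changes.
    end = parse_S(tokens, pos)
    if end is not None and end < len(tokens) and tokens[end] == "a":
        return end + 1
    # Second alternative: S -> c. This line is never reached.
    if pos < len(tokens) and tokens[pos] == "c":
        return pos + 1
    return None


def demo():
    """Run the parser and report how it fails."""
    try:
        parse_S(list("caa"), 0)
    except RecursionError:
        return "RecursionError: parse_S never consumed a token"
    return "parsed"


print(demo())  # prints "RecursionError: parse_S never consumed a token"
```

Python stops the runaway recursion with a RecursionError. In a language without such a limit, the same procedure would run until the call stack is exhausted.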

Algorithm to Remove Left Recursion with an Example

Suppose we have a grammar which contains left recursion:

S ⇒ S a | S b | c | d

Check if the given grammar contains left recursion. If it does, separate out the left-recursive production and work on it. In our example:

S ⇒ S a | S b | c | d

Introduce a new nonterminal and append it to every alternative that does not begin with S. We create a new nonterminal S' and write the new production as:

S ⇒ cS' | dS'

Write a production with the new nonterminal S' on the LHS. On the RHS, S' either produces ε or produces whatever followed S in one of the left-recursive alternatives (here a or b), with S' appended at the end.

S' ⇒ ε | aS' | bS'

So, after conversion, the new equivalent production is:

S ⇒ cS' | dS'
S' ⇒ ε | aS' | bS'
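To see that the converted grammar can be parsed top-down without looping, here is a minimal recognizer sketch for it. The function names parse_S, parse_S_prime and accepts are chosen for this illustration; the input is assumed to be a string in which every character is one terminal.

```python
def parse_S(s, pos=0):
    """S -> c S' | d S'. Returns the position after S, or None on failure."""
    if pos < len(s) and s[pos] in "cd":
        # One terminal is consumed before recursing, so progress is guaranteed.
        return parse_S_prime(s, pos + 1)
    return None


def parse_S_prime(s, pos):
    """S' -> a S' | b S' | ε. Always succeeds, possibly consuming nothing."""
    if pos < len(s) and s[pos] in "ab":
        return parse_S_prime(s, pos + 1)
    return pos  # ε alternative: consume nothing


def accepts(s):
    """True if the whole string is derivable from S."""
    return parse_S(s) == len(s)


print(accepts("caab"))  # True
print(accepts("d"))     # True
print(accepts("ac"))    # False
```

Every recursive call happens only after a terminal has been consumed, which is why the recursion depth is bounded by the length of the input.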

Removing Indirect Left Recursion in a Grammar

A grammar is said to have indirect left recursion if, starting from some nonterminal of the grammar, it is possible to derive, in two or more steps, a string whose first symbol is that same nonterminal. For example,

A ⇒ B r 
B ⇒ C d
C ⇒ A t

where A, B, C are non-terminals and r, d, t are terminals. Here, starting with A, we can derive a string that begins with A again: A ⇒ B r ⇒ C d r ⇒ A t d r.
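Indirect left recursion is harder to spot by eye than direct left recursion, so a small check can help. The sketch below is one possible way to detect it, under the assumption that no alternative is empty, so that only the first symbol of each alternative matters. The function name and the dictionary representation of the grammar are choices made for this illustration.

```python
def left_recursive_nonterminals(grammar):
    """Return the nonterminals that can derive a string starting with themselves.

    grammar maps each nonterminal to a list of alternatives, and each
    alternative is a list of symbols. Assumption: no nullable symbols,
    so only the first symbol of an alternative is inspected.
    """
    # Edge A -> X whenever nonterminal X is the first symbol of an alternative of A.
    first_edges = {
        head: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for head, alts in grammar.items()
    }
    result = set()
    for start in grammar:
        seen, stack = set(), list(first_edges[start])
        while stack:
            sym = stack.pop()
            if sym == start:
                result.add(start)  # found a path leading back to start
                break
            if sym not in seen:
                seen.add(sym)
                stack.extend(first_edges[sym])
    return result


example = {
    "A": [["B", "r"]],
    "B": [["C", "d"]],
    "C": [["A", "t"]],
}
print(sorted(left_recursive_nonterminals(example)))  # ['A', 'B', 'C']
```

All three nonterminals are reported because each of them lies on the cycle A, B, C, A.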

Algorithm to Remove Indirect Left Recursion with the help of an example:

A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ A1 A1 | a

Where A1, A2, A3 are non terminals and a, b are terminals.

Identify the productions which can cause indirect left recursion. In our case,

A3 ⇒ A1 A1 | a

Substitute the production of the leading nonterminal in its place: substitute A1 ⇒ A2 A3 for the first A1 in the production of A3.

A3 ⇒ A2 A3 A1 | a

Now in this production substitute A2 ⇒ A3 A1 | b

A3 ⇒ (A3 A1 | b) A3 A1 | a

and then distributing,

A3 ⇒ A3 A1 A3 A1 | b A3 A1 | a

The production now has direct left recursion, which can be removed with the direct left recursion method.

Eliminating direct left recursion as above, introduce a new nonterminal and append it to every alternative that does not begin with A3. We create a new nonterminal A' and write the new productions as:

A3 ⇒ b A3 A1 A' | aA'
A' ⇒ ε | A1 A3 A1 A'

The ε alternative can be removed by distributing it, that is, by adding a copy of each alternative with A' omitted:

A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'

The resulting grammar is then:

A1 ⇒ A2 A3
A2 ⇒ A3 A1 | b
A3 ⇒ b A3 A1 | a | b A3 A1 A' | aA'
A' ⇒ A1 A3 A1 | A1 A3 A1 A'
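The substitution and elimination steps of this example can be replayed with a short sketch. It represents each alternative as a list of symbols and uses an empty list for ε; the helper names substitute, eliminate_direct and show are hypothetical and are not taken from the implementation that follows. The sketch stops at the form with the ε alternative and does not perform the ε distribution.

```python
def substitute(alternatives, target, target_alts):
    """Replace a leading `target` in each alternative by every alternative of target."""
    out = []
    for alt in alternatives:
        if alt and alt[0] == target:
            out.extend(t + alt[1:] for t in target_alts)
        else:
            out.append(alt)
    return out


def eliminate_direct(head, alternatives, new_head):
    """Split head -> head α | β into head -> β new_head and new_head -> α new_head | ε."""
    alphas = [alt[1:] for alt in alternatives if alt and alt[0] == head]
    betas = [alt for alt in alternatives if not alt or alt[0] != head]
    if not alphas:
        return alternatives, None  # nothing to do
    head_alts = [beta + [new_head] for beta in betas]
    new_alts = [alpha + [new_head] for alpha in alphas] + [[]]  # [] stands for ε
    return head_alts, new_alts


def show(head, alts):
    """Format a production, printing the empty alternative as ε."""
    return head + " -> " + " | ".join(" ".join(a) if a else "ε" for a in alts)


A1 = [["A2", "A3"]]
A2 = [["A3", "A1"], ["b"]]
A3 = [["A1", "A1"], ["a"]]

A3 = substitute(A3, "A1", A1)  # A3 -> A2 A3 A1 | a
A3 = substitute(A3, "A2", A2)  # A3 -> A3 A1 A3 A1 | b A3 A1 | a
A3, A_prime = eliminate_direct("A3", A3, "A'")

print(show("A3", A3))       # A3 -> b A3 A1 A' | a A'
print(show("A'", A_prime))  # A' -> A1 A3 A1 A' | ε
```

Keeping symbols in lists rather than in one concatenated string avoids ambiguity between names such as A1 and A', at the cost of a slightly more verbose grammar literal.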

Implementation:

C++

#include <bits/stdc++.h>
using namespace std;

class NonTerminal {
    string name;                    // Stores the Head of production rule
    vector<string> productionRules; // Stores the body of production rules

public:
    NonTerminal(string name) {
        this->name = name;
    }

    // Returns the head of the production rule
    string getName() {
        return name;
    }

    // Replaces the body of the production rules
    void setRules(vector<string> rules) {
        productionRules.clear();
        for (auto rule : rules){
            productionRules.push_back(rule);
        }
    }

    vector<string> getRules() {
        return productionRules;
    }

    void addRule(string rule) {
        productionRules.push_back(rule);
    }

    // Prints the production rules
    void printRule() {
        string toPrint = "";
        toPrint += name + " ->";

        for (string s : productionRules){
            toPrint += " " + s + " |";
        }

        toPrint.pop_back();
        cout << toPrint << endl;
    }
};

class Grammar {
    vector<NonTerminal> nonTerminals;

public:
    // Add rules to the grammar
    void addRule(string rule) {
        bool nt = 0;
        string parse = "";

        for (char c : rule){
            if (c == ' ') {
                if (!nt) {
                    NonTerminal newNonTerminal(parse);
                    nonTerminals.push_back(newNonTerminal);
                    nt = 1;
                    parse = "";
                } else if (parse.size()){
                    nonTerminals.back().addRule(parse);
                    parse = "";
                }
            }else if (c != '|' && c != '-' && c != '>'){
                parse += c;
            }
        }
        if (parse.size()){
            nonTerminals.back().addRule(parse);
        }
    }

    void inputData() {
        addRule("S -> Sa | Sb | c | d");
    }

    // Algorithm for eliminating the non-Immediate Left Recursion
    void solveNonImmediateLR(NonTerminal &A, NonTerminal &B) {
        string nameA = A.getName();
        string nameB = B.getName();

        vector<string> rulesA, rulesB, newRulesA;
        rulesA = A.getRules();
        rulesB = B.getRules();

        for (auto rule : rulesA) {
            if (rule.substr(0, nameB.size()) == nameB) {
                for (auto rule1 : rulesB){
                    newRulesA.push_back(rule1 + rule.substr(nameB.size()));
                }
            }
            else{
                newRulesA.push_back(rule);
            }
        }
        A.setRules(newRulesA);
    }

    // Algorithm for eliminating Immediate Left Recursion
    void solveImmediateLR(NonTerminal &A) {
        string name = A.getName();
        string newName = name + "'";

        vector<string> alphas, betas, rules, newRulesA, newRulesA1;
        rules = A.getRules();

        // Checks if there is left recursion or not
        for (auto rule : rules) {
            if (rule.substr(0, name.size()) == name){
                alphas.push_back(rule.substr(name.size()));
            }
            else{
                betas.push_back(rule);
            }
        }

        // If no left recursion, exit
        if (!alphas.size())
            return;

        if (!betas.size())
            newRulesA.push_back(newName);

        for (auto beta : betas)
            newRulesA.push_back(beta + newName);

        for (auto alpha : alphas)
            newRulesA1.push_back(alpha + newName);

        // Amends the original rule
        A.setRules(newRulesA);
        newRulesA1.push_back("\u03B5");

        // Adds new production rule
        NonTerminal newNonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.push_back(newNonTerminal);
    }

    // Eliminates left recursion
    void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++){
            for (int j = 0; j < i; j++){
                solveNonImmediateLR(nonTerminals[i], nonTerminals[j]);
            }
            solveImmediateLR(nonTerminals[i]);
        }
    }

    // Print all the rules of grammar
    void printRules() {
        for (auto nonTerminal : nonTerminals){
            nonTerminal.printRule();
        }
    }
};

int main(){
    //freopen("output.txt", "w+", stdout);

    Grammar grammar;
    grammar.inputData();
    grammar.applyAlgorithm();
    grammar.printRules();

    return 0;
}

Java

import java.util.*;

class NonTerminal{
    private String name;
    private ArrayList<String> rules;

    public NonTerminal(String name) {
        this.name = name;
        rules = new ArrayList<>();
    }

    public void addRule(String rule) {
        rules.add(rule);
    }

    public void setRules(ArrayList<String> rules) {
        this.rules = rules;
    }

    public String getName() {
        return name;
    }

    public ArrayList<String> getRules() {
        return rules;
    }

    public void printRule() {
        System.out.print(name + " -> ");
        for (int i = 0; i < rules.size(); i++){
            System.out.print(rules.get(i));
            if (i != rules.size() - 1)
                System.out.print(" | ");
        }
        System.out.println();
    }
}


class Grammar{
    private ArrayList<NonTerminal> nonTerminals;

    public Grammar() {
        nonTerminals = new ArrayList<>();
    }

    public void addRule(String rule) {
        boolean nt = false;
        String parse = "";

        for (int i = 0; i < rule.length(); i++){
            char c = rule.charAt(i);
            if (c == ' ') {
                if (!nt) {
                    NonTerminal newNonTerminal = new NonTerminal(parse);
                    nonTerminals.add(newNonTerminal);
                    nt = true;
                    parse = "";
                } else if (parse.length() != 0){
                    nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
                    parse = "";
                }
            }else if (c != '|' && c != '-' && c != '>'){
                parse += c;
            }
        }
        if (parse.length() != 0){
            nonTerminals.get(nonTerminals.size() - 1).addRule(parse);
        }
    }

    public void inputData() {
        addRule("S -> Sa | Sb | c | d");
    }

    public void solveNonImmediateLR(NonTerminal A, NonTerminal B) {
        String nameA = A.getName();
        String nameB = B.getName();

        ArrayList<String> rulesA = A.getRules();
        ArrayList<String> rulesB = B.getRules();
        ArrayList<String> newRulesA = new ArrayList<>();

        for (String rule : rulesA) {
            // startsWith avoids an exception when the rule is shorter than nameB
            if (rule.startsWith(nameB)) {
                for (String rule1 : rulesB){
                    newRulesA.add(rule1 + rule.substring(nameB.length()));
                }
            }
            else{
                newRulesA.add(rule);
            }
        }
        A.setRules(newRulesA);
    }

    public void solveImmediateLR(NonTerminal A) {
        String name = A.getName();
        String newName = name + "'";

        ArrayList<String> alphas = new ArrayList<>();
        ArrayList<String> betas = new ArrayList<>();
        ArrayList<String> rules = A.getRules();
        ArrayList<String> newRulesA = new ArrayList<>();
        ArrayList<String> newRulesA1 = new ArrayList<>();

        // Checks if there is left recursion or not
        for (String rule : rules) {
            // startsWith avoids an exception when the rule is shorter than name
            if (rule.startsWith(name)){
                alphas.add(rule.substring(name.length()));
            }
            else{
                betas.add(rule);
            }
        }

        // If no left recursion, exit
        if (alphas.size() == 0)
            return;

        if (betas.size() == 0)
            newRulesA.add(newName);

        for (String beta : betas)
            newRulesA.add(beta + newName);

        for (String alpha : alphas)
            newRulesA1.add(alpha + newName);

        // Amends the original rule

        A.setRules(newRulesA);
        newRulesA1.add("\u03B5");

        // Adds new production rule
        NonTerminal newNonTerminal = new NonTerminal(newName);
        newNonTerminal.setRules(newRulesA1);
        nonTerminals.add(newNonTerminal);
    }

    public void applyAlgorithm() {
        int size = nonTerminals.size();
        for (int i = 0; i < size; i++){
            for (int j = 0; j < i; j++){
                solveNonImmediateLR(nonTerminals.get(i), nonTerminals.get(j));
            }
            solveImmediateLR(nonTerminals.get(i));
        }
    }

    void printRules() {
        for (NonTerminal nonTerminal : nonTerminals){
            nonTerminal.printRule();
        }
    }
}

class Main{
    public static void main(String[] args) {
        Grammar grammar = new Grammar();
        grammar.inputData();
        grammar.applyAlgorithm();
        grammar.printRules();
    }
}

Python

class NonTerminal :
    def __init__(self, name) :
        self.name = name
        self.rules = []
    def addRule(self, rule) :
        self.rules.append(rule)
    def setRules(self, rules) :
        self.rules = rules
    def getName(self) :
        return self.name
    def getRules(self) :
        return self.rules
    def printRule(self) :
        print(self.name + " -> ", end = "")
        for i in range(len(self.rules)) :
            print(self.rules[i], end = "")
            if i != len(self.rules) - 1 :
                print(" | ", end = "")
        print()
        
        
class Grammar :
    def __init__(self) :
        self.nonTerminals = []

    def addRule(self, rule) :
        nt = False
        parse = ""

        for i in range(len(rule)) :
            c = rule[i]
            if c == ' ' :
                if not nt :
                    newNonTerminal = NonTerminal(parse)
                    self.nonTerminals.append(newNonTerminal)
                    nt = True
                    parse = ""
                elif parse != "" :
                    self.nonTerminals[len(self.nonTerminals) - 1].addRule(parse)
                    parse = ""
            elif c != '|' and c != '-' and c != '>' :
                parse += c
        if parse != "" :
            self.nonTerminals[len(self.nonTerminals) - 1].addRule(parse)

    def inputData(self) :
        self.addRule("S -> Sa | Sb | c | d")

    def solveNonImmediateLR(self, A, B) :
        nameA = A.getName()
        nameB = B.getName()

        rulesA = A.getRules()
        rulesB = B.getRules()
        newRulesA = []

        for rule in rulesA :
            if rule[0 : len(nameB)] == nameB :
                for rule1 in rulesB :
                    newRulesA.append(rule1 + rule[len(nameB) : ])
            else :
                newRulesA.append(rule)
        A.setRules(newRulesA)

    def solveImmediateLR(self, A) :
        name = A.getName()
        newName = name + "'"

        alphas = []
        betas = []
        rules = A.getRules()
        newRulesA = []
        newRulesA1 = []

        # Checks if there is left recursion or not
        for rule in rules :
            if rule[0 : len(name)] == name :
                alphas.append(rule[len(name) : ])
            else :
                betas.append(rule)

        # If no left recursion, exit
        if len(alphas) == 0 :
            return

        if len(betas) == 0 :
            newRulesA.append(newName)

        for beta in betas :
            newRulesA.append(beta + newName)

        for alpha in alphas :
            newRulesA1.append(alpha + newName)

        # Amends the original rule

        A.setRules(newRulesA)
        newRulesA1.append("\u03B5")

        # Adds new production rule
        newNonTerminal = NonTerminal(newName)
        newNonTerminal.setRules(newRulesA1)
        self.nonTerminals.append(newNonTerminal)

    def applyAlgorithm(self) :
        size = len(self.nonTerminals)
        for i in range(size) :
            for j in range(i) :
                self.solveNonImmediateLR(self.nonTerminals[i], self.nonTerminals[j])
            self.solveImmediateLR(self.nonTerminals[i])

    def printRules(self) :
        for nonTerminal in self.nonTerminals :
            nonTerminal.printRule()

            
grammar = Grammar()
grammar.inputData()
grammar.applyAlgorithm()
grammar.printRules()

Output

S -> cS' | dS' 
S' -> aS' | bS' | ε

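Time Complexity: For a grammar with only direct left recursion, such as the example above, the time complexity of the algorithm is O(n*s), where n is the number of production rules and s is the maximum string length of a rule. When indirect left recursion is present, the substitution step can multiply the number of rules, so the running time can be considerably higher.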

Conclusion

In conclusion, removing both direct and indirect left recursion from a grammar plays an important role in ensuring smooth parsing during the syntax analysis phase of compilation. A left-recursive grammar can send a top-down parser into an infinite loop, but the techniques described above restructure the grammar so that the same language can be parsed without that risk.

madarsh986
