VIDYAVARDHAKA COLLEGE OF ENGINEERING
(Autonomous, affiliated to VTU)
DEPARTMENT OF
CSE (ARTIFICIAL INTELLIGENCE & MACHINE LEARNING)
Mini Project Report
On
“Context-free Grammar (CFG)”
Submitted in partial fulfillment of the requirement for the completion of V semester of
BACHELOR OF ENGINEERING
Submitted by:
ULLAS CV 4VV21AI054
SUHAS T 4VV21AI052
PREETHAM S 4VV21AI036
RAKESH R 4VV21AI039
VIBUDHA DATTA SN 4VV21AI056
Under the guidance of:
LATHA DU
Assistant Professor
Dept. of CSE(AI&ML)
CERTIFICATE
Certified that the mini project work entitled “Context-free Grammar (CFG)” is a
bona fide work carried out by ULLAS CV (4VV21AI054), SUHAS T
(4VV21AI052), PREETHAM S (4VV21AI036), RAKESH R (4VV21AI039)
and VIBUDHA DATTA SN (4VV21AI056) in partial fulfillment of the
requirement for the completion of V semester in CSE(AI&ML) of
Vidyavardhaka College of Engineering during the year 2023-24. It is certified
that all corrections/suggestions indicated for Internal Assessment have been
incorporated in the report. The report has been approved as it satisfies the
academic requirements with respect to Mini Project work.
Signature of the Guide Signature of the HOD
LATHA DU Dr VINUTHA DC
Assistant Professor Professor & Head
Table of Contents
1. Introduction
2. Algorithm
3. Code
4. Output
5. Conclusion
INTRODUCTION
Context-free grammar (CFG) is a fundamental concept in formal language theory and
computer science. It provides a structured way to describe the syntax of languages, including
programming languages, natural languages, and many others. In this report, we will explore
the key aspects of context-free grammars, their components, and their applications.
Definition: A context-free grammar consists of a set of production rules that define the syntax of
a language. Each rule has a single non-terminal symbol (representing a syntactic category) on its
left-hand side and a sequence of terminals and/or non-terminals on its right-hand side. Because the
left-hand side is always a lone non-terminal, a rule can be applied wherever that non-terminal
appears, regardless of the surrounding symbols. This is what makes the grammar "context-free".
Components of CFG:
Non-terminal symbols: Represent syntactic categories or variables in the grammar.
Terminal symbols: Represent basic units of the language, such as keywords, identifiers,
and punctuation marks.
Production rules: Specify how non-terminal symbols can be replaced by sequences of
terminals and/or non-terminals.
Start symbol: A designated non-terminal from which every valid string in the language is derived.
Formal Representation: A CFG is formally represented as a tuple (V, Σ, R, S), where:
V is a finite set of non-terminal symbols.
Σ is a finite set of terminal symbols, disjoint from V.
R is a finite set of production rules of the form A → α, where A is in V and α is a string over V ∪ Σ.
S is the start symbol, an element of V.
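The tuple definition above maps directly onto a small data structure. The following is a minimal sketch, not part of the project code: the class name Grammar, its field names, and the toy grammar anbn are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    """Illustrative container for the tuple (V, Σ, R, S); names are assumptions."""
    nonterminals: frozenset  # V
    terminals: frozenset     # Σ
    rules: tuple             # R: pairs (head, body), where body is a tuple of symbols
    start: str               # S

    def is_well_formed(self) -> bool:
        """Check the structural constraints implied by the tuple definition."""
        if self.nonterminals & self.terminals:
            return False  # V and Σ must be disjoint
        if self.start not in self.nonterminals:
            return False  # S must be a non-terminal
        symbols = self.nonterminals | self.terminals
        # Every rule needs a non-terminal head and a body made of known symbols.
        return all(
            head in self.nonterminals and all(sym in symbols for sym in body)
            for head, body in self.rules
        )


# Hypothetical toy grammar for the language a^n b^n (n >= 1): S -> a S b | a b
anbn = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    rules=(("S", ("a", "S", "b")), ("S", ("a", "b"))),
    start="S",
)

print(anbn.is_well_formed())  # True
```

Storing each rule body as a tuple keeps the grammar immutable and hashable, which is convenient when rules are later used as dictionary keys or cached.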
Context-free grammars play a crucial role in the development of programming languages,
where they define the syntactic rules that govern how programs are written and interpreted.
They provide a foundation for designing compilers, interpreters, and other language
processing tools.
One of the key features of context-free grammars is their generative power: because rules may be
recursive, a finite set of rules can generate an infinite set of valid strings. Language
recognition and parsing algorithms address the converse problem, determining whether a given
input can be derived from the grammar rules.
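To illustrate how a finite set of recursive rules yields an unbounded set of strings, the sketch below enumerates strings by breadth-first leftmost derivation. The grammar S -> a S b | a b and the function name generate are assumptions made for this example; the length-based pruning is only valid because this grammar has no empty productions.

```python
from collections import deque

# Hypothetical grammar for illustration: S -> a S b | a b
RULES = {"S": [("a", "S", "b"), ("a", "b")]}


def generate(start, rules, max_len):
    """Return all terminal strings of length <= max_len, shortest derivations first.

    Assumes no rule has an empty body, so a sentential form never shrinks
    and any form longer than max_len can be discarded safely.
    """
    results = []
    queue = deque([(start,)])
    seen = {(start,)}
    while queue:
        form = queue.popleft()
        # Locate the leftmost non-terminal in the current sentential form.
        idx = next((i for i, sym in enumerate(form) if sym in rules), None)
        if idx is None:
            results.append("".join(form))  # only terminals remain
            continue
        # Apply every production of that non-terminal (one derivation step each).
        for body in rules[form[idx]]:
            new_form = form[:idx] + body + form[idx + 1:]
            if len(new_form) <= max_len and new_form not in seen:
                seen.add(new_form)
                queue.append(new_form)
    return results


print(generate("S", RULES, 6))  # ['ab', 'aabb', 'aaabbb']
```

Raising max_len keeps producing longer strings from the same two rules, which is the sense in which the generated language is infinite.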
Steps involved in Context-free Grammar
1. Problem Definition: Clearly define the language or syntax you want to describe using
a context-free grammar (CFG). This could be the syntax of a programming language,
a subset of natural language, or any other structured language.
2. Define Terminals and Non-terminals: Identify the basic building blocks of your
language, known as terminals, which are the actual symbols or tokens in the
language. Then, define non-terminals, which represent syntactic categories or
abstract symbols that can be expanded into other symbols.
3. Define Production Rules: Write down a set of production rules that specify how non-
terminals can be replaced by sequences of terminals and/or other non-terminals.
Each production rule typically takes the form A → α, where A is a non-terminal
symbol and α is a string of terminals and/or non-terminals.
4. Formalize the Grammar: Organize the production rules into a formal grammar
definition, usually represented in the form G = (V, Σ, R, S), where V is the set of
non-terminals, Σ is the set of terminals, R is the set of production rules, and S is the
start symbol.
5. Test and Validate: Test the CFG by generating and parsing strings according to the
defined grammar rules. Ensure that the generated strings are syntactically correct
according to the grammar and that the parser can correctly recognize and parse
valid strings.
6. Refinement and Iteration: Refine the grammar as needed based on testing and
validation results. This may involve adding additional production rules, modifying
existing rules, or adjusting the language definition to better capture the desired
syntax.
7. Documentation: Document the context-free grammar thoroughly, including
explanations of the terminals, non-terminals, production rules, and any special
syntax or conventions used. Provide examples of valid and invalid strings according
to the grammar rules.
8. Application and Integration: Integrate the context-free grammar into the relevant
application or system where it will be used, such as a compiler, parser, natural
language processing tool, or other language processing system. Ensure that the
grammar is appropriately applied and utilized within the context of the larger
system.
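The testing and documentation steps above (steps 5 and 7) can be sketched as a small validation harness that checks valid and invalid example sentences against a grammar. This is an illustrative sketch only: the dictionary layout RULES, the function accepts, and the example sentences are assumptions, and the simple recognizer shown here assumes a grammar without empty productions or cyclic unit rules.

```python
from functools import lru_cache

# Toy English grammar written as a plain dictionary (layout is an assumption).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("Det", "N", "PP")],
    "VP": [("V", "NP"), ("V", "NP", "PP")],
    "PP": [("P", "NP")],
    "Det": [("the",), ("a",)],
    "N": [("man",), ("dog",), ("park",), ("telescope",)],
    "V": [("saw",), ("ate",), ("walked",), ("gazed",)],
    "P": [("in",), ("on",), ("by",), ("with",)],
}


def accepts(sentence, start="S"):
    """Return True if the whitespace-tokenized sentence is derivable from start."""
    tokens = tuple(sentence.split())

    @lru_cache(maxsize=None)
    def derives(symbol, i, j):
        # Does symbol derive exactly tokens[i:j]?
        if symbol not in RULES:  # terminal: must match a single token
            return j == i + 1 and tokens[i] == symbol
        return any(matches(body, i, j) for body in RULES[symbol])

    @lru_cache(maxsize=None)
    def matches(body, i, j):
        # Does the symbol sequence body derive exactly tokens[i:j]?
        if len(body) == 1:
            return derives(body[0], i, j)
        # The first symbol covers tokens[i:k]; every remaining symbol
        # must still have at least one token left.
        return any(
            derives(body[0], i, k) and matches(body[1:], k, j)
            for k in range(i + 1, j - len(body) + 2)
        )

    return len(tokens) > 0 and derives(start, 0, len(tokens))


# Hypothetical test cases: (sentence, expected result)
CASES = [
    ("the man saw a dog in the park", True),
    ("the dog walked the man with a telescope", True),
    ("the man saw", False),          # verb phrase lacks an object
    ("man the saw a dog", False),    # wrong word order
    ("the man saw a cat", False),    # 'cat' is not in the lexicon
]

for text, expected in CASES:
    status = "ok" if accepts(text) == expected else "MISMATCH"
    print(f"{status}: {text!r} -> {accepts(text)}")
```

Keeping the expected result next to each sentence lets the same list serve as both a regression test and the documented set of valid and invalid examples.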
CODE SNIPPET
import nltk
# Define the context-free grammar
grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det N PP
VP -> V NP | V NP PP
PP -> P NP
Det -> 'the' | 'a'
N -> 'man' | 'dog' | 'park' | 'telescope'
V -> 'saw' | 'ate' | 'walked' | 'gazed'
P -> 'in' | 'on' | 'by' | 'with'
""")
# Create a parser for the grammar
parser = nltk.ChartParser(grammar)
# Parse a sentence and print every parse tree the grammar allows
sentence = "the man saw a dog in the park".split()
for tree in parser.parse(sentence):
    print(tree)
OUTPUT
CONCLUSION
Context-free grammar (CFG) provides a formal framework for describing the syntax of
languages in a hierarchical structure.
It consists of a set of production rules that define how symbols (non-terminals) can be
replaced by sequences of other symbols and terminals.
The rules in a CFG are typically expressed as productions of the form A→β, where A is a
non-terminal symbol and β is a string of terminals and/or non-terminals.
CFGs are widely used in formal language theory, compiler design, natural language
processing, and other areas of computer science.
Parsing algorithms such as the CYK algorithm and the Earley parser are used to determine
whether a given string belongs to the language generated by a CFG.
Despite their simplicity, CFGs have limitations in expressing certain language
constructs, leading to the development of more powerful formalisms such as context-
sensitive grammars and tree-adjoining grammars.
One of the key concepts in CFGs is the derivation process, where a sequence of
production rule applications transforms an initial symbol (usually the start symbol) into
a string of terminals.
CFGs are used extensively in the design and implementation of programming
languages, where they define the syntactic structure of the language and guide the
construction of parsers and compilers.
Context-free grammars are widely used in natural language processing tasks such as
parsing sentences and generating syntactically correct sentences, and the resulting parse
trees can serve as input to semantic analysis.
CFGs are also employed in other areas, including syntax highlighting in text editors
and pattern recognition.
The formalism of context-free grammars provides a foundation for understanding the
hierarchical structure of languages, making them a fundamental concept in theoretical
computer science and linguistics.
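As an illustration of the CYK algorithm mentioned in this conclusion, the following minimal sketch decides membership for a grammar in Chomsky normal form (every rule is either A -> B C or A -> 'a'). The function name cyk, the rule lists, and the example grammar for a^n b^n are assumptions introduced here, not part of the project code.

```python
def cyk(tokens, binary_rules, lexical_rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    binary_rules:  list of (head, (left, right)) for rules A -> B C
    lexical_rules: list of (head, terminal)      for rules A -> 'a'
    """
    n = len(tokens)
    if n == 0:
        return False  # this sketch does not handle the empty string
    # table[i][length] holds the non-terminals that derive tokens[i:i+length]
    table = [[set() for _ in range(n + 1)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        table[i][1] = {head for head, term in lexical_rules if term == tok}
    for length in range(2, n + 1):            # span length
        for i in range(n - length + 1):       # span start
            for split in range(1, length):    # size of the left part
                left = table[i][split]
                right = table[i + split][length - split]
                for head, (b, c) in binary_rules:
                    if b in left and c in right:
                        table[i][length].add(head)
    return start in table[0][n]


# Hypothetical CNF grammar for a^n b^n (n >= 1):
#   S -> A B | A T,  T -> S B,  A -> 'a',  B -> 'b'
BINARY = [("S", ("A", "B")), ("S", ("A", "T")), ("T", ("S", "B"))]
LEXICAL = [("A", "a"), ("B", "b")]

print(cyk(list("aabb"), BINARY, LEXICAL))  # True
print(cyk(list("abab"), BINARY, LEXICAL))  # False
```

The three nested loops over span length, start position, and split point give the algorithm its cubic running time in the length of the input.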