Natural Language Processing
Instructor: Dr. Muhammad Zubair Asghar
(Associate Professor, Faculty of Computing)
Deterministic Grammars in NLP
A deterministic grammar defines how sentences are constructed using fixed
production rules, with no probabilities attached to those rules. It is most
commonly written as a Context-Free Grammar (CFG), in which each non-terminal
symbol expands into a sequence of non-terminals and terminals according to
strictly defined rules. For example, a simple grammar might state that a
sentence (S) always consists of a noun phrase (NP) followed by a verb phrase
(VP).
With an unambiguous grammar of this kind, a sentence such as “the dog chased
the cat” is always parsed into one valid structure, and the same sentence
always yields the same parse tree. Deterministic grammars are widely used in
formal language processing, such as compilers for programming languages,
because programming syntax requires strict, unambiguous rules. However,
natural language is often ambiguous (e.g., the word “bank” could mean a
financial institution or a riverbank), and deterministic grammars are not
well-suited to resolving such ambiguity: when a sentence has more than one
valid parse, they return every parse but cannot prioritize one over another.
Python Code for Deterministic Grammars
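The following is a minimal sketch in plain Python, not necessarily the code shown on the original slide. It assumes a toy grammar (the `GRAMMAR` dictionary and the helper names `parse`, `expand`, and `parse_sentence` are illustrative choices) and uses simple top-down matching, which only works for grammars without left recursion. Because no probabilities are involved, the same sentence always yields the same list of parse trees.

```python
# Toy grammar (assumed for illustration): each non-terminal maps to its
# possible right-hand sides; any symbol that is not a key is a terminal word.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def parse(symbol, tokens, pos):
    """Yield (tree, next_pos) for every way `symbol` can match tokens from `pos`."""
    if symbol not in GRAMMAR:  # terminal: must equal the current word
        if pos < len(tokens) and tokens[pos] == symbol:
            yield symbol, pos + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, pos)


def expand(symbol, rhs, tokens, pos):
    """Match the symbols of one right-hand side from left to right."""
    partial = [([], pos)]  # (children matched so far, position reached)
    for child in rhs:
        partial = [
            (children + [subtree], nxt)
            for children, p in partial
            for subtree, nxt in parse(child, tokens, p)
        ]
    for children, end in partial:
        yield (symbol, *children), end


def parse_sentence(sentence):
    """Return every parse tree of the whole sentence as nested tuples."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


trees = parse_sentence("the dog chased the cat")
for tree in trees:
    print(tree)
```

Calling `parse_sentence` again with the same sentence returns an identical list, and a word sequence the rules do not license returns an empty list.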
Stochastic Grammar in NLP
In contrast, a stochastic grammar, also called a probabilistic grammar,
introduces probabilities into the rules of grammar. This approach is more
flexible and more closely aligned with how natural language is used in real life.
A Probabilistic Context-Free Grammar (PCFG) assigns probabilities to
production rules so that when multiple parse trees are possible, the parser can
select the one with the highest likelihood.
For example, a grammar might specify that a noun phrase (NP) is more likely to
be constructed as Det N (with a probability of 0.6) than as just N (with a
probability of 0.4); the probabilities of all expansions of the same
non-terminal sum to 1. Similarly, a verb phrase (VP) might be more likely to
expand as V NP than as a standalone verb. This probability weighting enables
the grammar to resolve ambiguity in favor of the most frequent structure
observed in the training data.
Python Code for Stochastic Grammar in NLP
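As a rough illustration in plain Python (again, not necessarily the code from the original slide), the sketch below scores parse trees under a small PCFG. The NP probabilities (0.6 and 0.4) come from the text above; every other probability, the `RULE_PROB` table, and the helper names are assumptions made for this example. The probability of a tree is the product of the probabilities of the rules used in it. This toy grammar is too small to be ambiguous, so the sketch only shows the scoring step; a PCFG parser would compute this score for each candidate parse of a sentence and keep the highest one.

```python
import math

# Rule probabilities keyed by (left-hand side, right-hand side).
# The NP values (0.6 / 0.4) come from the text; all others are assumed.
RULE_PROB = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 1.0,
    ("N", ("dog",)): 0.5,
    ("N", ("cat",)): 0.5,
    ("V", ("chased",)): 1.0,
}


def check_normalized(rule_prob):
    """Return True if the expansions of every left-hand side sum to 1."""
    totals = {}
    for (lhs, _), p in rule_prob.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(math.isclose(t, 1.0) for t in totals.values())


def tree_probability(tree):
    """Multiply the probabilities of every rule used in a nested-tuple tree."""
    if isinstance(tree, str):  # a word itself applies no rule
        return 1.0
    label, *children = tree
    # The right-hand side is the word, or the label of each child subtree.
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = RULE_PROB[(label, rhs)]
    for child in children:
        prob *= tree_probability(child)
    return prob


# Two hand-built trees: "the dog chased the cat" and "the dog chased".
transitive = (
    "S",
    ("NP", ("Det", "the"), ("N", "dog")),
    ("VP", ("V", "chased"), ("NP", ("Det", "the"), ("N", "cat"))),
)
intransitive = ("S", ("NP", ("Det", "the"), ("N", "dog")), ("VP", ("V", "chased")))

p_transitive = tree_probability(transitive)
p_intransitive = tree_probability(intransitive)
print(p_transitive, p_intransitive)
```

When a sentence does have several candidate trees, selecting the preferred one reduces to `max(candidates, key=tree_probability)`.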
What are CFGs?
A Context-Free Grammar (CFG) is a formal system used in Natural Language
Processing (NLP) and computational linguistics to describe the syntax of
natural or programming languages. A CFG consists of a set of rules
(productions) that describe how symbols (words, phrases) can be combined to
form valid sentences.
Components of a CFG
A CFG is formally represented as a 4-tuple:
G = (V, Σ, R, S)
where:
o V (Variables / Non-terminals) → abstract symbols like S, NP, VP (sentence,
  noun phrase, verb phrase).
o Σ (Terminals) → actual words in the language, like dog, cat, runs.
o R (Rules / Productions) → how non-terminals expand. Example:
    S → NP VP
    NP → Det Noun
o S (Start symbol) → usually S for sentence.
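The four components can be written down directly as data. The sketch below is one possible representation, assuming a hypothetical `CFG` named tuple and an `is_well_formed` helper, neither of which comes from any library. The rules beyond S → NP VP and NP → Det Noun are added only to make the toy grammar complete.

```python
from typing import NamedTuple


class CFG(NamedTuple):
    """Illustrative container for G = (V, Σ, R, S)."""
    variables: frozenset  # V: non-terminals
    terminals: frozenset  # Σ: words of the language
    rules: tuple          # R: (left-hand side, right-hand side) pairs
    start: str            # S: start symbol


def is_well_formed(g):
    """Check that the four components are consistent with each other."""
    symbols = g.variables | g.terminals
    return (
        g.start in g.variables
        and not (g.variables & g.terminals)  # V and Σ must be disjoint
        and all(
            lhs in g.variables and all(s in symbols for s in rhs)
            for lhs, rhs in g.rules
        )
    )


grammar = CFG(
    variables=frozenset({"S", "NP", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"the", "dog", "cat", "runs"}),
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb",)),      # assumed rule, not in the text
        ("Det", ("the",)),
        ("Noun", ("dog",)),
        ("Noun", ("cat",)),
        ("Verb", ("runs",)),
    ),
    start="S",
)

print(is_well_formed(grammar))
```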
Python Example with NLTK
We can implement CFGs using NLTK in Python:
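A minimal sketch, assuming NLTK is installed (`pip install nltk`); the grammar rules and the example sentence are illustrative and may differ from the original slide. It uses `nltk.CFG.fromstring` to build the grammar and `nltk.ChartParser` to enumerate the parse trees.

```python
import nltk

# Toy grammar (assumed for illustration); quoted symbols are terminals.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Noun
    VP -> Verb NP | Verb
    Det -> 'the'
    Noun -> 'dog' | 'cat'
    Verb -> 'chased' | 'runs'
""")

parser = nltk.ChartParser(grammar)
tokens = "the dog chased the cat".split()

# parse() returns an iterator over every tree the grammar allows.
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

The start symbol and the rule set are available through `grammar.start()` and `grammar.productions()`, which correspond to S and R in the 4-tuple above.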