
Understanding Deterministic Grammars in NLP

The document discusses deterministic and stochastic grammars in Natural Language Processing (NLP). Deterministic grammars use fixed production rules and are suitable for formal language processing, while stochastic grammars incorporate probabilities to handle ambiguity in natural languages. It also explains Context-Free Grammars (CFGs) and their components, which are essential for describing the syntax of languages.


Natural Language Processing

Instructor: Dr. Muhammad Zubair Asghar


(Associate Professor, Faculty of Computing)
Deterministic Grammars in NLP
A deterministic grammar is a grammar that defines how sentences are
constructed using fixed production rules, without any element of probability. It
is most commonly represented as a Context-Free Grammar (CFG), where each
non-terminal symbol is expanded into other non-terminals or terminals
according to strictly defined rules. For example, a simple grammar might define
that a sentence (S) must always consist of a noun phrase (NP) followed by a
verb phrase (VP).
Deterministic Grammars in NLP (Contin…)
With fixed rules and no probabilities, the grammar's output is fully repeatable:
every time the sentence “the dog chased the cat” is given, the grammar produces
the same parse tree. Deterministic grammars are widely used in formal language
processing, such as compilers and programming languages, because programming
syntax requires strict, unambiguous rules. However, in natural languages,
ambiguity often arises (e.g., the word “bank” could mean a financial
institution or a riverbank), and deterministic grammars are not well-suited to
resolving such ambiguity: when a sentence has more than one valid parse, the
grammar accepts all of them and cannot prioritize one parse over another.
Python Code for Deterministic Grammars
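A minimal sketch of a deterministic grammar in plain Python, using only the standard library. The rule S → NP VP follows the description above; the other rules, the small vocabulary, and the helper names (GRAMMAR, parse, expand, parse_sentence) are assumptions made for illustration. The parser tries every rule top-down and returns every complete parse tree it finds, so running it twice on the same sentence gives the same result.

```python
# Toy grammar (assumed for illustration): non-terminal -> list of alternatives.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"]],
    "VP": [["V", "NP"]],
    "Det": [["the"]],
    "N": [["dog"], ["cat"]],
    "V": [["chased"]],
}


def parse(symbol, tokens, start):
    """Yield (tree, end) for every way `symbol` can derive tokens[start:end]."""
    if symbol not in GRAMMAR:  # terminal: must match the next token exactly
        if start < len(tokens) and tokens[start] == symbol:
            yield symbol, start + 1
        return
    for rhs in GRAMMAR[symbol]:
        yield from expand(symbol, rhs, tokens, start, [])


def expand(symbol, rhs, tokens, pos, children):
    """Match the remaining right-hand-side symbols in `rhs` from `pos` on."""
    if not rhs:
        yield (symbol, *children), pos
        return
    for child, end in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, end, children + [child])


def parse_sentence(sentence):
    """Return all complete parse trees for a whitespace-tokenised sentence."""
    tokens = sentence.split()
    return [tree for tree, end in parse("S", tokens, 0) if end == len(tokens)]


trees = parse_sentence("the dog chased the cat")
for tree in trees:
    print(tree)
```

Because no rule carries a weight, a sentence with several valid structures would simply produce several trees in the returned list, with nothing to rank them.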
Stochastic Grammar in NLP

In contrast, a stochastic grammar, also called a probabilistic grammar,
introduces probabilities into the rules of grammar. This approach is more
flexible and more closely aligned with how natural language is used in real life.
A Probabilistic Context-Free Grammar (PCFG) assigns probabilities to
production rules so that when multiple parse trees are possible, the parser can
select the one with the highest likelihood.


Stochastic Grammar in NLP (Contin…)

For example, a grammar might specify that a noun phrase (NP) is more likely to
be constructed as Det N (with a probability of 0.6) than as just N (with a
probability of 0.4). Similarly, a verb phrase (VP) might more likely follow the
structure V NP than a standalone verb. This probability weighting enables the
grammar to resolve ambiguity in favor of the most natural or frequent structure
according to training data.
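The weighting can be made concrete with a small sketch: the probability of a derivation is the product of the probabilities of the rules it uses. The NP probabilities (0.6 and 0.4) are taken from the example above; the VP probabilities, the rule table RULES, and both helper functions are assumptions for illustration, and word-level (lexical) probabilities are left out to keep the arithmetic short.

```python
import math

# Rule probabilities. The NP values (0.6 / 0.4) come from the text above;
# the S and VP values are assumed for illustration.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,  # assumed
    ("VP", ("V",)): 0.3,       # assumed
}


def check_normalised(rules):
    """Probabilities of all expansions of one non-terminal must sum to 1."""
    totals = {}
    for (lhs, _), p in rules.items():
        totals[lhs] = totals.get(lhs, 0.0) + p
    return all(math.isclose(t, 1.0) for t in totals.values())


def derivation_probability(derivation, rules=RULES):
    """Multiply the probabilities of the rules used in one derivation."""
    return math.prod(rules[rule] for rule in derivation)


# A sentence shaped like "the dog chased the cat": both NPs use NP -> Det N.
with_determiners = [
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("VP", ("V", "NP")),
    ("NP", ("Det", "N")),
]
# A sentence shaped like "dogs chased cats": both NPs use NP -> N.
bare_nouns = [
    ("S", ("NP", "VP")),
    ("NP", ("N",)),
    ("VP", ("V", "NP")),
    ("NP", ("N",)),
]

print(check_normalised(RULES))
print(derivation_probability(with_determiners))  # 1.0 * 0.6 * 0.7 * 0.6
print(derivation_probability(bare_nouns))        # 1.0 * 0.4 * 0.7 * 0.4
```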


Python Code for Stochastic Grammar in NLP (Contin…)
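A minimal sketch of how a probabilistic parser might choose between competing parses, in plain Python. The grammar, its probabilities, and the example sentence are all assumed for illustration and are not taken from the slides; the prepositional-phrase (PP) rules are included only so that the sentence has two valid structures. The code enumerates every parse and keeps the most probable one, which is simpler than the dynamic-programming parsers used in practice.

```python
# Toy PCFG: non-terminal -> list of (right-hand side, probability).
# All rules and probabilities here are assumed for illustration.
PCFG = {
    "S": [(["NP", "VP"], 1.0)],
    "NP": [(["Det", "N"], 0.7), (["Det", "N", "PP"], 0.3)],
    "VP": [(["V", "NP"], 0.6), (["V", "NP", "PP"], 0.4)],
    "PP": [(["P", "NP"], 1.0)],
    "Det": [(["the"], 1.0)],
    "N": [(["dog"], 0.3), (["man"], 0.4), (["telescope"], 0.3)],
    "V": [(["saw"], 1.0)],
    "P": [(["with"], 1.0)],
}


def parse(symbol, tokens, start):
    """Yield (tree, probability, end) for each derivation from `start`."""
    if symbol not in PCFG:  # terminal: must match the next token exactly
        if start < len(tokens) and tokens[start] == symbol:
            yield symbol, 1.0, start + 1
        return
    for rhs, rule_prob in PCFG[symbol]:
        yield from expand(symbol, rhs, tokens, start, [], rule_prob)


def expand(symbol, rhs, tokens, pos, children, prob):
    """Match the rest of `rhs`, multiplying in each child's probability."""
    if not rhs:
        yield (symbol, *children), prob, pos
        return
    for child, child_prob, end in parse(rhs[0], tokens, pos):
        yield from expand(symbol, rhs[1:], tokens, end,
                          children + [child], prob * child_prob)


def all_parses(sentence):
    """Return (tree, probability) for every complete parse."""
    tokens = sentence.split()
    return [(tree, p) for tree, p, end in parse("S", tokens, 0)
            if end == len(tokens)]


def best_parse(sentence):
    """Return the most probable (tree, probability), or None if no parse."""
    parses = all_parses(sentence)
    return max(parses, key=lambda tp: tp[1]) if parses else None


sentence = "the dog saw the man with the telescope"
for tree, p in all_parses(sentence):
    print(round(p, 7), tree)
best_tree, best_prob = best_parse(sentence)
```

With these assumed numbers, the parse that attaches "with the telescope" to the verb phrase scores higher than the one that attaches it to "the man", so best_parse returns the former. A grammar without probabilities would accept both and stop there.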
What are CFGs?
A Context-Free Grammar (CFG) is a formal system used in Natural Language
Processing (NLP) and computational linguistics to describe the syntax of
natural or programming languages. A CFG consists of a set of rules
(productions) that describe how symbols (words, phrases) can be combined to
form valid sentences.
Components of a CFG
A CFG is formally represented as a 4-tuple:
G = (V, Σ, R, S)
where:
o V (Variables / Non-terminals): abstract symbols like S, NP, VP (sentence,
noun phrase, verb phrase).
o Σ (Terminals): actual words in the language, like dog, cat, runs.
o R (Rules / Productions): how non-terminals expand. Example:
    S → NP VP
    NP → Det Noun
o S (Start symbol): usually S for sentence.
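The four components map naturally onto a small data structure. The sketch below is one possible encoding, with a hypothetical Grammar class and an is_well_formed check that are not part of any library; rules beyond S → NP VP and NP → Det Noun are assumed so that the example is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    variables: frozenset  # V: non-terminals
    terminals: frozenset  # Σ: words of the language
    rules: tuple          # R: (left-hand side, right-hand side) pairs
    start: str            # S: start symbol

    def is_well_formed(self):
        """Check the basic constraints implied by the 4-tuple definition."""
        symbols = self.variables | self.terminals
        return (
            self.start in self.variables
            and not (self.variables & self.terminals)
            and all(lhs in self.variables and all(s in symbols for s in rhs)
                    for lhs, rhs in self.rules)
        )


grammar = Grammar(
    variables=frozenset({"S", "NP", "VP", "Det", "Noun", "Verb"}),
    terminals=frozenset({"the", "dog", "cat", "runs"}),
    rules=(
        ("S", ("NP", "VP")),
        ("NP", ("Det", "Noun")),
        ("VP", ("Verb",)),    # assumed
        ("Det", ("the",)),    # assumed
        ("Noun", ("dog",)),   # assumed
        ("Noun", ("cat",)),   # assumed
        ("Verb", ("runs",)),  # assumed
    ),
    start="S",
)
print(grammar.is_well_formed())
```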
Python Example with NLTK
We can implement CFGs using NLTK in Python:
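A minimal sketch, assuming the nltk package is installed (no corpus downloads are needed for this). It uses nltk.CFG.fromstring to define a grammar and nltk.ChartParser to parse a tokenised sentence; the S and NP rules follow the examples above, while the remaining rules and the vocabulary are assumed for illustration.

```python
import nltk

# Toy grammar: the S and NP rules follow the examples above; the remaining
# rules and the vocabulary are assumed for illustration.
grammar = nltk.CFG.fromstring("""
    S -> NP VP
    NP -> Det Noun
    VP -> Verb | Verb NP
    Det -> 'the'
    Noun -> 'dog' | 'cat'
    Verb -> 'runs' | 'chased'
""")

parser = nltk.ChartParser(grammar)

tokens = "the dog chased the cat".split()
trees = list(parser.parse(tokens))
for tree in trees:
    print(tree)
```

Each result is an nltk.Tree, so tree.label() and tree.leaves() can be used to inspect it.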
THE END
