NATURAL LANGUAGE PROCESSING 5TH AI
UNIT III SYNTACTIC ANALYSIS
Context-Free Grammars, Grammar rules for English, Treebanks, Normal Forms for grammar –
Dependency Grammar – Syntactic Parsing, Ambiguity, Dynamic Programming parsing – Shallow
parsing – Probabilistic CFG, Probabilistic CYK, Probabilistic Lexicalized CFGs - Feature
structures, Unification of feature structures.
Context-Free Grammars
Context-free grammars (CFGs) are foundational in natural language processing (NLP) for formally
describing the structure of natural language sentences. A CFG consists of a set of production rules used to
generate all well-formed sentences in a language.
Key Elements of a Context-Free Grammar
Terminals (T): The basic symbols from which strings are formed (e.g., actual words in a
language).
Non-terminals/Variables (V): Syntactic categories or placeholders for groups of terminals (e.g.,
Sentence 'S', Noun Phrase 'NP', Verb Phrase 'VP').
Production Rules (P): Rules of the form A → β, where A is a non-terminal and β is a
sequence of terminals and/or non-terminals.
Start Symbol (S): A special non-terminal symbol from which generation begins (often the symbol
'S' for 'Sentence')
Formal Definition
A CFG is defined as a 4-tuple (V, T, P, S), where:
V: Set of non-terminals
T: Set of terminals
P: Set of production rules (A → β)
S: Start symbol (S ∈ V)
Example
A simple CFG for part of English might include:
V = {S, NP, VP, Det, N}
T = {"the", "cat", "sat"}
P = {S → NP VP, NP → Det N, Det → "the", N → "cat", VP → "sat"}
Start symbol: S
This CFG can generate the sentence "the cat sat" as a valid structure
Use in NLP
CFGs are vital for:
Describing syntactic structure (syntax trees) of natural language.
Building parsing algorithms to analyze sentence structure.
Modeling hierarchical and recursive constituents in language (like nested noun phrases)
Example:
Here is how the sentence "I can go to school" can be represented in context-free grammar (CFG)
format:
Non-terminals
- S: Sentence
- NP: Noun Phrase
- VP: Verb Phrase
- MOD: Modal
- V: Verb
- PP: Prepositional Phrase
- P: Preposition
- N: Noun
Terminals
- "I", "can", "go", "to", "school"
Production Rules
S → NP VP
NP → "I"
VP → MOD V PP
MOD → "can"
V → "go"
PP → P N
P → "to"
N → "school"
Derivation
S
→ NP VP
→ "I" VP
→ "I" MOD V PP
→ "I" "can" V PP
→ "I" "can" "go" PP
→ "I" "can" "go" P N
→ "I" "can" "go" "to" N
→ "I" "can" "go" "to" "school"
This CFG generates the sentence "I can go to school" and shows the structure required to parse it in
natural language processing applications.
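The same grammar can be tried out directly. Below is a minimal sketch using NLTK (an assumption; any CFG toolkit would do) that encodes the rules above and parses "I can go to school".

```python
# Minimal sketch of the CFG above using NLTK (assumes the nltk package is installed).
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> 'I'
VP -> MOD V PP
MOD -> 'can'
V -> 'go'
PP -> P N
P -> 'to'
N -> 'school'
""")

parser = nltk.ChartParser(grammar)
sentence = "I can go to school".split()

for tree in parser.parse(sentence):
    tree.pretty_print()   # prints the constituency tree for the sentence
```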
Grammar rules for English
Grammar rules for English in NLP provide the structural foundation to parse and generate well-formed
sentences. Here are the key types of grammar rules and concepts used in natural language processing for
English:
1. Parts of Speech (POS)
Labeling each word as a noun, verb, adjective, adverb, etc.
Example: In "The dog runs fast", "the" is an article, "dog" is a noun, "runs" is a verb, "fast" is an
adverb.
Used for POS tagging in NLP.
2. Syntax Rules
Define the arrangement of words in a sentence.
Example: S → NP VP (A sentence can be a Noun Phrase followed by a Verb Phrase.)
In context-free grammar (CFG), these are production rules applied recursively.
3. Sentence Structure
The basic format is Subject + Verb + Object.
Example: ("Ravi" is the subject, "eats" the verb, "mangoes" the object).
NLP tools break sentences into such chunks for analysis.
4. Tense and Subject-Verb Agreement
Ensures verbs correspond correctly with the subject and tense.
Example: "He walks" (singular subject) versus "They walk" (plural subject).
5. Noun Phrases and Verb Phrases
Groups of words acting as a single noun (noun phrase) or single verb (verb phrase).
Example: "The big brown dog" (noun phrase), "is barking loudly" (verb phrase).
6. Dependency Grammar
Focuses on word-to-word relationships within a sentence (e.g., subject-verb-object
dependencies).
Helps machines understand roles and relationships.
7. Grammar Ambiguity
Some sentences can be interpreted in more than one way.
Example: "I saw the man with the telescope" (multiple possible attachments/interpretations).
Example Grammar Rules in CFG Format
S → NP VP
NP → Det N | Det Adj N | "I" | "he"
VP → V NP | V NP PP | V | MOD V NP
PP → P NP
Det → "the" | "a"
N → "cat" | "dog" | "school"
V → "chased" | "slept" | "go"
MOD → "can"
P → "to"
These rules enable NLP systems to check syntax, perform parsing, translation, sentence segmentation,
and other language tasks
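As a quick illustration of how such rules license well-formed strings, the sketch below feeds them to NLTK's sentence generator. This is a minimal sketch assuming NLTK is installed; the Adj productions ("big", "lazy") are added here as illustrative assumptions so that every non-terminal can be expanded.

```python
# Minimal sketch: enumerating strings licensed by the CFG rules above with NLTK.
import nltk
from nltk.parse.generate import generate

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | Det Adj N | 'I' | 'he'
VP -> V NP | V NP PP | V | MOD V NP
PP -> P NP
Det -> 'the' | 'a'
N -> 'cat' | 'dog' | 'school'
V -> 'chased' | 'slept' | 'go'
MOD -> 'can'
P -> 'to'
Adj -> 'big' | 'lazy'
""")

# Print the first ten strings the grammar derives from S.
for words in generate(grammar, n=10):
    print(' '.join(words))
```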
TREEBANKS
In Natural Language Processing (NLP), a Treebank is a linguistically annotated text corpus where each
sentence is paired with a syntactic or semantic structure represented as a tree. These structures typically
represent phrase structure or dependency relations that explicitly show how words in a sentence are
syntactically related.
What is a Treebank?
A Treebank is a parsed corpus where sentences are manually or semi-automatically annotated
with their syntactic trees or dependency trees.
The tree structure encodes grammatical relations and hierarchical phrase organization, such as
noun phrases (NP), verb phrases (VP), and clauses.
Treebanks are created from corpora already annotated with simpler linguistic information, such as
part-of-speech (POS) tags.
Types of Treebanks
Syntactic Treebanks: Annotate the syntactic structure of sentences, focusing on phrase
structure or dependencies.
Examples: Penn Treebank (phrase structure), Universal Dependencies (dependency trees).
Semantic Treebanks: Annotate semantic relationships and roles within sentences, extending the
syntactic annotation with meaning representation.
Examples: PropBank (annotates verbal propositions and arguments), Groningen Meaning Bank.
How Treebanks are Built
Manual annotation by expert linguists or semi-automatic methods refined by linguists.
Parsing tools generate initial trees that human annotators check and correct for accuracy.
Annotation schemes may follow specific linguistic theories or be more general.
Representations are often stored in simple bracketed text form, XML, or specialized formats.
Uses of Treebanks in NLP
Training and evaluating parsing models: Treebanks serve as "gold standard" data, enabling
supervised machine learning for syntactic parsers.
Grammar induction: Extract production rules and probabilities for statistical grammars like
probabilistic context-free grammars (PCFGs).
Linguistic research: Study syntactic phenomena, frequency of specific constructions, and test
linguistic theories.
Improving NLP systems: POS taggers, dependency parsers, semantic role labelers, and machine
translation models use treebanks extensively.
Benchmarking: Parsing accuracy is often evaluated by comparing automatic parses with gold
treebank annotation using metrics like labeled attachment score (LAS).
Examples of Notable Treebanks
Penn Treebank: Early, influential phrase-structure treebank for English.
Universal Dependencies: Multilingual project providing consistent dependency annotations.
PropBank: Focuses on semantic role annotation (who did what to whom).
Arabic Treebank, Chinese Treebank: Language-specific treebanks covering syntactic
annotation for those languages.
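A treebank sample ships with NLTK, which makes it easy to see both the gold-standard trees and the grammar rules that can be read off them. This is a minimal sketch assuming NLTK and its 'treebank' corpus data are installed (e.g., via nltk.download('treebank')).

```python
# Sketch: reading a Penn Treebank sample from NLTK and extracting grammar rules.
import nltk
from nltk.corpus import treebank

tree = treebank.parsed_sents('wsj_0001.mrg')[0]   # first annotated sentence of the sample
tree.pretty_print()                               # display the gold-standard parse tree

# Grammar induction: production rules can be read directly off the treebank trees.
for production in tree.productions()[:10]:
    print(production)
```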
NORMAL FORMS FOR GRAMMAR
Normal Forms for grammars are standardized ways to represent context-free grammars (CFGs) with
restrictions on the form of production rules. These normal forms simplify parsing algorithms and
theoretical analysis.
Chomsky Normal Form (CNF)
A context-free grammar is in Chomsky Normal Form if every production rule is of one of the
following forms:
A → BC, where A, B, C are non-terminal symbols and B, C are not the start symbol.
A → a, where a is a terminal symbol.
S → ε, where S is the start symbol and ε is the empty string (allowed only if the language
contains the empty string).
Key properties of CNF:
Each rule either produces two non-terminals or one terminal (or the empty string for the
start symbol).
Any CFG can be converted into an equivalent CNF grammar.
CNF is widely used in parsing algorithms like the CYK algorithm.
Derivations in CNF for strings of length n take exactly 2n − 1 steps, aiding parsing
efficiency.
Conversion process usually involves:
1. Eliminating null productions.
2. Eliminating unit productions.
3. Removing useless symbols.
4. Ensuring productions produce either two non-terminals or a single terminal.
5. Creating new non-terminals for terminals when they appear in longer productions.
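Whether a grammar is in CNF can be checked mechanically, since every rule must expand to either two non-terminals or a single terminal. Below is a minimal sketch of such a check using NLTK's grammar classes (an assumption; the small grammar is illustrative).

```python
# Minimal sketch: checking whether every production of a CFG is in Chomsky Normal Form.
import nltk
from nltk.grammar import Nonterminal

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N
VP -> V NP
Det -> 'the'
N -> 'cat' | 'dog'
V -> 'chased'
""")

def in_cnf(production):
    """A rule is in CNF if it is A -> B C (two non-terminals) or A -> a (one terminal)."""
    rhs = production.rhs()
    if len(rhs) == 2:
        return all(isinstance(sym, Nonterminal) for sym in rhs)
    if len(rhs) == 1:
        return not isinstance(rhs[0], Nonterminal)   # a single terminal
    return False

print(all(in_cnf(p) for p in grammar.productions()))   # True for this grammar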
Greibach Normal Form (GNF)
Another important normal form where each production is of the form:
A → aα, where a is a terminal and α is a possibly empty string of non-terminals.
It ensures the leftmost symbol on the right side of each production is always a terminal.
Useful for top-down parsing and establishing certain theoretical properties.
Importance in NLP
Normal forms like CNF simplify parsing by restricting production shapes, making algorithms
more efficient and easier to implement.
They provide a foundation for efficient syntax analysis and automated parsing tools.
The CNF conversion aids in developing probabilistic grammars and facilitates parsers used in
syntactic analysis in natural language understanding systems.
DEPENDENCY GRAMMAR
Dependency Grammar in NLP is a framework that represents sentence structure based on direct
relationships between words. It models how words depend on one another, focusing on word-to-word
connections rather than hierarchical phrase structures. In this system, a sentence is viewed as a
dependency graph where words (nodes) are linked by directed edges indicating dependencies from a
"head" word to its "dependents."
Key concepts include:
The head of a sentence, usually the main verb, governs other words.
Each word (except the root/head) depends on another word, forming a directed graph or tree
structure.
Dependency relations are labeled to show grammatical functions such as subject, object, or
modifier.
The structure is generally flatter than phrase-structure grammars and well suited for languages
with flexible word order.
An example of dependency grammar can be illustrated with the sentence:
"The quick brown fox jumps over the lazy dog."
In its dependency structure:
"jumps" is the root (main verb).
"fox" is a dependent of "jumps" as the subject.
"The," "quick," and "brown" are dependents of "fox," acting as determiners and adjectives
describing the subject.
"over" is a preposition dependent on "jumps."
"dog" is a dependent of "over," the object of the preposition.
"The" and "lazy" are dependents of "dog," acting as determiner and adjective describing the
object.
This structure shows how each word relates directly to another word (its head), forming labeled
dependencies such as subject, object, or modifier. The connections represent grammatical roles directly
between words rather than constituent phrases.
This relational structure helps NLP systems analyze sentence meaning and syntax efficiently by focusing
on word-to-word dependencies.
Another example is the sentence:
"Kevin can hit the baseball with a bat."
Here "hit" is the root; "Kevin" is its subject, "can" is an auxiliary, "baseball" is its object, and "with"
heads a prepositional phrase attached to "hit", with "bat" as the object of the preposition.
SYNTACTIC PARSING
Syntactic parsing in NLP is the process of analyzing a sentence's grammatical structure according to
formal grammar rules and constructing a representation, such as a parse tree, that shows how words and
phrases are related syntactically. This process enables machines to understand the arrangement of words,
parts of speech, and grammatical relationships within the sentence, which is fundamental for many
language understanding tasks.
There are two main types of syntactic parsing:
Constituency Parsing: Breaks a sentence into nested constituents or phrases (e.g., noun phrases,
verb phrases), forming a hierarchical tree that reflects phrase structure.
Dependency Parsing: Focuses on the relationships between individual words by building a
directed graph or tree that shows which words depend on which others.
Syntactic parsing helps resolve structural ambiguities in sentences and facilitates downstream tasks like
information extraction, semantic role labeling, machine translation, and text-to-speech. It can be rule-
based or use statistical and machine learning methods due to the complexity and ambiguity of natural
language.
Example:
Here's an example illustrating syntactic parsing using a parse tree for the sentence:
"John hit the ball."
In syntactic parsing with a constituency-based grammar:
The root of the parse tree is "S" (Sentence).
"S" branches into two main constituents: "NP" (Noun Phrase) and "VP" (Verb Phrase).
The "NP" consists of the word "John," which is a noun acting as the subject.
The "VP" consists of the verb "hit," and the noun phrase "the ball" as its object.
The noun phrase "the ball" further branches into a determiner "the" and a noun "ball."
This hierarchical tree visually represents the grammatical structure, showing how words group into
phrases and how those phrases relate to each other. The parse tree reflects the syntactic structure
according to grammar rules, making relationships like subject-verb-object explicit.
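The same tree can be written in bracketed form and displayed programmatically. A minimal sketch with NLTK's Tree class (an assumption) is shown below.

```python
# Sketch: the constituency parse described above, written in bracketed form and
# displayed with NLTK's Tree class (assumes nltk is installed).
from nltk import Tree

parse = Tree.fromstring("(S (NP (N John)) (VP (V hit) (NP (Det the) (N ball))))")
parse.pretty_print()   # draws the hierarchical S -> NP VP structure as ASCII art
```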
SYNTACTIC AMBIGUITY
Syntactic ambiguity, also known as structural ambiguity, occurs in syntactic analysis when a sentence can
be parsed in multiple valid ways, leading to different interpretations of its grammatical structure. This
ambiguity arises not because of word meanings but due to the sentence’s syntax—how words and phrases
are organized and related.
Examples of Syntactic Ambiguity
Prepositional Phrase Attachment: In the sentence "I saw the man with the telescope," it is
ambiguous whether "with the telescope" modifies "saw" (meaning the instrument used to see) or
"the man" (describing which man was seen).
Attachment of Clauses: "While Don was reading the newspaper, his sister knocked on the door."
The phrase "the newspaper" could initially be interpreted as the object of "was reading," or as the
subject of a new clause ("the newspaper lay unnoticed"), causing temporary ambiguity.
Multiple Interpretations in News Headlines or Sentences: Headlines like “Kids Make
Nutritious Snacks” can be interpreted either as children producing snacks or as snacks made from
kids, showing ambiguity due to syntax.
Garden Path Sentences: Sentences that lead the reader to an initial incorrect parse requiring
reanalysis, such as "The old man the boats," where "man" is a verb, not a noun.
Syntactic ambiguity poses significant challenges for parsing algorithms because multiple parse trees
(structures) may be possible for one sentence, and disambiguation is required to find the correct meaning.
Parsers may generate a set of all possible structures (parse forests) and use semantic or statistical
information to resolve ambiguity.
Here are some classic examples of syntactic ambiguity sentences:
1. "I saw the man with the telescope."
This can mean either:
The observer used a telescope to see the man, or
The man being seen has a telescope.
2. "Kids make nutritious snacks."
This humorous example could mean:
Kids prepare nutritious snacks, or
Kids themselves are nutritious snacks.
3. "The woman held the baby in the green blanket."
Possible interpretations include:
The baby wrapped in the green blanket is being held,
The woman is using the green blanket to hold the baby, or
The woman herself is wrapped in the green blanket while holding the baby.
4. "Miners refuse to work after death."
5. Ambiguity:
Miners stop working because of a death, or
Miners continue refusing to work even after dying.
6. "John saw the man on the hill with a telescope."
Ambiguity about who has the telescope and whose location is on the hill.
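Such ambiguity shows up directly in a parser's output: the same sentence yields more than one parse tree. The sketch below, a minimal example assuming NLTK, uses a small grammar in which the PP can attach either to the verb phrase or to the noun phrase, so "I saw the man with the telescope" receives two parses.

```python
# Sketch: a toy grammar under which "I saw the man with the telescope" is ambiguous.
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> Det N | NP PP | 'I'
VP -> V NP | VP PP
PP -> P NP
Det -> 'the'
N -> 'man' | 'telescope'
V -> 'saw'
P -> 'with'
""")

parser = nltk.ChartParser(grammar)
sentence = "I saw the man with the telescope".split()

trees = list(parser.parse(sentence))
print(f"Number of parses: {len(trees)}")   # 2 -- one per PP attachment
for tree in trees:
    tree.pretty_print()
```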
DYNAMIC PROGRAMMING PARSING
Dynamic Programming parsing in NLP refers to parsing algorithms that efficiently construct parse trees
by breaking down the parsing task into overlapping subproblems and solving each subproblem only once,
storing and reusing the results to avoid redundant computations.
Key Concepts
Parsing is the process of analyzing a sentence's syntactic structure based on a grammar.
Dynamic programming parsing leverages a tabular approach, where partial parse results for
substrings are stored in a table (chart).
It reduces the exponential search space to polynomial time by reusing smaller sub-constituent
parses.
Common Dynamic Programming Parsing Algorithms
1. CKY (Cocke-Younger-Kasami) Algorithm
Works on grammars converted to Chomsky Normal Form (CNF).
Uses a bottom-up approach filling a table to recognize constituents over substrings.
Runs in O(n³ · |G|) time, where n is the sentence length and |G| is the grammar size.
Can produce all possible parse trees for ambiguous sentences.
2. Earley Parser
Works with any context-free grammar (no CNF restriction).
Combines top-down prediction, bottom-up recognition, and completion steps.
Uses dynamic programming to avoid redundant parsing of substructures.
Practical for natural language parsing due to flexibility with left recursion and ambiguity.
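The table-filling idea behind the CKY algorithm described above can be sketched in a few lines of Python. The toy CNF grammar, encoded as dictionaries, is an illustrative assumption, and the code is a recognizer only: it reports whether a parse exists rather than building trees.

```python
# Minimal CKY recognizer sketch for a toy grammar in Chomsky Normal Form.

# CNF rules: lexical (A -> word) and binary (A -> B C).
lexical = {"the": {"Det"}, "cat": {"N"}, "dog": {"N"}, "chased": {"V"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}

def cky_recognize(words):
    n = len(words)
    # table[i][j] holds the non-terminals that can derive words[i:j].
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        table[i][i + 1] = set(lexical.get(w, ()))
    for span in range(2, n + 1):                 # width of the substring
        for i in range(0, n - span + 1):
            j = i + span
            for k in range(i + 1, j):            # split point
                for B in table[i][k]:
                    for C in table[k][j]:
                        table[i][j] |= binary.get((B, C), set())
    return "S" in table[0][n]

print(cky_recognize("the dog chased the cat".split()))   # True
```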
Advantages
Efficiently handles ambiguous and recursive grammars.
Avoids exponential backtracking by caching results.
Supports extraction of multiple parse trees or most probable parse in probabilistic models.
Foundation for many statistical and neural parsing algorithms.
Applications in NLP
Syntactic parsing to generate phrase structure or dependency trees.
Part-of-speech tagging and named entity recognition improvements.
Semantic parsing and machine translation.
Text understanding, question answering, and information extraction.
Shallow parsing
Shallow parsing, also known as chunking or light parsing, is an NLP technique that focuses on identifying
and extracting the main syntactic constituents or phrases from sentences without performing a full,
detailed grammatical analysis.
What is Shallow Parsing?
It segments a sentence into non-overlapping phrases or "chunks," such as noun phrases (NP), verb
phrases (VP), and prepositional phrases (PP).
Unlike full parsing, which generates a complete syntactic tree showing detailed relationships
between all words, shallow parsing provides a simpler, flatter structure highlighting key phrase
boundaries.
The goal is to capture important functional units for use in downstream NLP tasks efficiently.
How Shallow Parsing Works
Uses part-of-speech (POS) tagging as input to detect phrase boundaries.
Applies techniques like rule-based pattern matching, machine learning (e.g., Hidden Markov
Models, Conditional Random Fields, support vector machines), or neural networks to identify
chunks.
Can also include named entity recognition to extract entities like people, locations, and
organizations.
Advantages
Computationally less expensive and faster than full parsing.
Provides sufficient structure for many NLP applications without the complexity of deep parsing.
More robust to imperfect or ambiguous input.
Useful for large-scale text processing in real-time or near real-time scenarios.
Applications in NLP
Information Extraction: Extract key phrases and entities from unstructured text.
Text Summarization: Identify important sentence components.
Sentiment Analysis: Focus on relevant phrases to detect sentiment.
Machine Translation: Improve phrase-level translation quality.
Question Answering: Detect meaningful constituents for query understanding.
Example
For sentence: "The black cat sat on the mat."
Noun Phrase (NP): "The black cat"
Verb Phrase (VP): "sat"
Prepositional Phrase (PP): "on the mat"
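Chunks like these can be extracted with a simple tag-pattern chunker. Below is a minimal sketch using NLTK's RegexpParser (an assumption); it also assumes the tokenizer and POS-tagger data used by word_tokenize and pos_tag have been downloaded.

```python
# Sketch: shallow parsing (chunking) of the example sentence with an NLTK RegexpParser.
import nltk

sentence = "The black cat sat on the mat."
tagged = nltk.pos_tag(nltk.word_tokenize(sentence))

# Chunk grammar: an NP is an optional determiner, any adjectives, then a noun;
# a PP is a preposition followed by an NP; a VP here is just a verb.
chunk_grammar = r"""
  NP: {<DT>?<JJ>*<NN.*>}
  PP: {<IN><NP>}
  VP: {<VB.*>}
"""
chunker = nltk.RegexpParser(chunk_grammar)
print(chunker.parse(tagged))   # flat tree with NP, VP and PP chunks
```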
PROBABILISTIC CFG
Probabilistic Context-Free Grammar (PCFG) is an extension of the standard Context-Free Grammar
(CFG) used in Natural Language Processing (NLP) that associates probabilities with each production
rule. It helps to address ambiguity by modeling the likelihood of different parse trees for a given sentence.
Definition
A PCFG is formally defined as a 5-tuple G = (N, T, S, R, P), where:
N is a set of non-terminal symbols.
T is a set of terminal symbols.
S is the start symbol.
R is a set of production rules of the form A → α, where A ∈ N and α ∈ (N ∪ T)*.
P assigns a probability to each production rule, such that the probabilities of rules
sharing the same left-hand-side non-terminal sum to 1.
How It Works
Each production rule A → α has a probability P(A → α) representing how likely that
rule is to be chosen given A.
The probability of a particular parse tree (derivation) is the product of the probabilities of the
production rules used to generate it.
PCFGs model the uncertainty and ambiguity inherent in natural language by providing a
probabilistic ranking of possible parses.
Probabilities are often learned from annotated corpora (e.g., treebanks) using maximum
likelihood estimation or more advanced machine learning methods.
Example
Consider the PCFG with productions and probabilities like:
S → NP VP [1.0]
NP → Det Noun [0.4]
NP → NP PP [0.6]
VP → Verb NP [1.0]
etc.
The probability of a parse tree is the product of the probabilities of all productions used in that tree.
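Filling the toy grammar above out into a complete PCFG and handing it to NLTK's ViterbiParser gives the most probable parse directly. The lexical rules and the second VP rule below are illustrative assumptions added so the grammar covers a full sentence; NLTK is assumed to be installed.

```python
# Sketch: a complete toy PCFG and Viterbi parsing with NLTK.
import nltk

pcfg = nltk.PCFG.fromstring("""
S -> NP VP            [1.0]
NP -> Det Noun        [0.4]
NP -> NP PP           [0.6]
VP -> Verb NP         [0.7]
VP -> Verb NP PP      [0.3]
PP -> P NP            [1.0]
Det -> 'the'          [1.0]
Noun -> 'dog' [0.4] | 'cat' [0.3] | 'telescope' [0.3]
Verb -> 'saw'         [1.0]
P -> 'with'           [1.0]
""")

parser = nltk.ViterbiParser(pcfg)
sentence = "the dog saw the cat with the telescope".split()

# The parser returns the single most probable tree, with its probability.
for tree in parser.parse(sentence):
    print(tree)
    print("P(tree) =", tree.prob())
```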
Benefits in NLP
Helps resolve syntactic ambiguity by choosing the most probable parse.
Enables statistical parsing algorithms.
Provides a principled framework to incorporate corpus-derived statistical information.
Supports efficient parsing through dynamic programming algorithms like the probabilistic CYK
parser.
Probabilistic CYK
Probabilistic CYK (Cocke–Younger–Kasami) parsing is an extension of the classic CYK parsing
algorithm used for parsing sentences with Probabilistic Context-Free Grammars (PCFGs). It finds
the most probable syntactic parse tree for a sentence by systematically combining probabilities of
grammar rules.
How Probabilistic CYK Works
Assumes the grammar is in Chomsky Normal Form (CNF).
Uses a dynamic programming table where each cell represents a substring of the input sentence.
Each cell stores the probability of the best parse for that substring with respect to each non-
terminal.
The algorithm proceeds bottom-up, filling the table by combining smaller substrings:
For a substring w_i … w_j, it considers all split points k where i ≤ k < j.
For each production rule A → BC, it computes:
P(A, i, j) = max_{i ≤ k < j} [ P(A → BC) × P(B, i, k) × P(C, k+1, j) ]
It selects the split point and rule that yield the highest probability.
The final answer is the probability of the start symbol spanning the entire sentence
w_1 … w_n.
The algorithm can also reconstruct the best parse tree using backpointers.
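The recurrence above can be sketched directly in Python. The toy CNF grammar, its rule probabilities, and the example sentence are illustrative assumptions; the sketch returns only the probability of the best parse (backpointers for tree reconstruction are omitted).

```python
# Minimal probabilistic CKY sketch implementing the max-product recurrence above.
from collections import defaultdict

lexical = {            # P(A -> word)
    "the": [("Det", 1.0)],
    "dog": [("N", 0.5)], "cat": [("N", 0.5)],
    "chased": [("V", 1.0)],
}
binary = {             # P(A -> B C)
    ("Det", "N"): [("NP", 1.0)],
    ("V", "NP"): [("VP", 1.0)],
    ("NP", "VP"): [("S", 1.0)],
}

def pcky(words):
    n = len(words)
    best = defaultdict(dict)   # best[(i, j)][A] = highest probability for A over words[i:j]
    for i, w in enumerate(words):
        for A, p in lexical.get(w, []):
            best[(i, i + 1)][A] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):                      # split point
                for B, pb in best[(i, k)].items():
                    for C, pc in best[(k, j)].items():
                        for A, rule_p in binary.get((B, C), []):
                            p = rule_p * pb * pc
                            if p > best[(i, j)].get(A, 0.0):
                                best[(i, j)][A] = p
    return best[(0, n)].get("S", 0.0)

print(pcky("the dog chased the cat".split()))   # probability of the best parse: 0.25
```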
Applications and Advantages
Resolves ambiguity by ranking multiple parses probabilistically.
Provides a principled approach for statistical parsing.
Widely used in NLP parsers trained on annotated treebanks.
Guarantees polynomial-time parsing, O(n³ × |G|), where n is the sentence length and |G| is
the grammar size.
PROBABILISTIC LEXICALIZED CFGS
Probabilistic Lexicalized Context-Free Grammars (PLCFGs) are an advanced type of grammar used in
natural language processing that combine two ideas: probabilities of rules (like in Probabilistic CFGs) and
lexical heads (important words) within phrases.
Lexicalized: Each non-terminal symbol in the grammar is paired with a head word that
represents the key lexical item of that phrase. For example, instead of just having a noun phrase
(NP), you have NP(head word), like NP(dog) or VP(run).
Probabilistic: Each production rule has an associated probability. These probabilities specify
how likely it is that a particular rule will apply in a given context.
Together, PLCFGs model not just the structure of sentences, but also how the choice of lexical
items affects the structure.
Why Lexicalization Matters
A simple PCFG treats categories like NP or VP as abstract entities without considering important
lexical details.
However, the choice of words (e.g., a particular verb or noun) strongly influences how phrases
combine.
Lexicalized rules capture dependencies like verb subcategorization (which arguments a verb
takes) and agreements.
How it Works in Parsing
Parsing with PLCFGs means finding the most probable parse tree that respects both the grammar
rules and the lexical heads.
The grammar has many more rules because each non-terminal includes the lexical head, so the
state space grows.
Algorithms based on dynamic programming, similar to probabilistic CYK, are used but with
additional bookkeeping for the heads.
Example
For sentence: "Workers dumped sacks into a bin"
The start symbol might be S(dumped) because "dumped" is the lexical head of the sentence.
Its children might be NP(workers) and VP(dumped), showing that the VP is headed by "dumped".
This detailed lexical info helps decide between competing parses by weighting parses where
heads fit together well more highly.
FEATURE STRUCTURES
Feature structures are a formal way to represent and organize linguistic information in Natural Language
Processing (NLP). They are commonly used in advanced grammar formalisms to describe properties of
linguistic elements, such as words or phrases, in a structured and flexible manner.
What Are Feature Structures?
A feature structure is essentially a set of attribute-value pairs, where each attribute (called a
feature) describes some property and is paired with a value.
Values can be atomic (like "singular" for number, or "nominative" for case) or they can
themselves be feature structures, allowing nested, hierarchical information.
Feature structures are often visualized as attribute-value matrices (AVMs) or directed graphs
with features as labeled arcs and their values as nodes.
Example
A noun phrase (NP) might have features for number, case, and gender like:
Feature Value
NUM singular
CASE nominative
GENDER feminine
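Such an attribute-value matrix can be written down directly with NLTK's FeatStruct class (an assumption), as in the minimal sketch below.

```python
# Sketch: the feature structure above as an NLTK FeatStruct (assumes nltk is installed).
import nltk

np_features = nltk.FeatStruct(NUM='singular', CASE='nominative', GENDER='feminine')
print(np_features)   # printed as an attribute-value matrix
```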
Why Use Feature Structures?
They enable rich, precise modeling of linguistic information beyond simple category labels.
Allow encoding syntactic, semantic, and morphological information compactly.
Facilitate unification, a process that merges two feature structures and checks for compatibility
of features, used in parsing and grammar checking.
Support constraint-based grammars like Lexical Functional Grammar (LFG) and Head-driven
Phrase Structure Grammar (HPSG).
Applications in NLP
Representing word properties such as tense, number, or agreement.
Encoding relationships and constraints in parsing and generation.
Improving robustness and expressiveness of syntactic and semantic grammars.
Supporting functional and dependency analyses for deeper language understanding.
UNIFICATION OF FEATURE STRUCTURES
Unification of feature structures in NLP is an operation that combines two feature structures into a single
one that contains all the information from both, provided they are compatible. It is a key process used to
merge and reconcile linguistic information from different sources or constraints.
What is Unification?
Unification attempts to merge two sets of attribute-value pairs (feature structures).
If the features conflict (e.g., one has number=singular and the other number=plural), unification
fails.
If compatible, unification produces a new feature structure that is more specific, integrating all
the information from the inputs.
How It Works
Features are recursively checked and merged.
For atomic attributes, values must match or be compatible.
For nested feature structures, unification is applied recursively.
It is monotonic (information only grows) and order-independent (unifying in any order yields the
same result if successful).
Example
Feature structure 1: {num: singular, person: 3rd}
Feature structure 2: {num: singular, gender: feminine}
Unification result: {num: singular, person: 3rd, gender: feminine}
Conflicting example:
FS1: {num: singular}
FS2: {num: plural}
Unification fails because of incompatible values.
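Both cases can be reproduced with NLTK's FeatStruct.unify method (an assumption), as in the minimal sketch below: the compatible pair merges, while the conflicting pair returns None.

```python
# Sketch: unification of the two feature structures above with NLTK.
import nltk

fs1 = nltk.FeatStruct(num='singular', person='3rd')
fs2 = nltk.FeatStruct(num='singular', gender='feminine')
print(fs1.unify(fs2))   # merged structure containing num, person and gender

fs3 = nltk.FeatStruct(num='plural')
print(fs1.unify(fs3))   # None: conflicting num values, unification fails
```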
Importance
Ensures that linguistic constraints like agreement and subcategorization are respected during
parsing.
Helps integrate information from lexical entries and syntactic rules.
Central in constraint-based grammar formalisms like HPSG and LFG.