16/01/2025, 16:05 OneNote
Syntactic Analysis
Wednesday, 15 January 2025 11:52 AM
UNIT III: Syntactic Analysis: Context-Free Grammars, Grammar rules for English,
Treebanks, Normal Forms for grammar – Dependency Grammar – Syntactic Parsing,
Ambiguity, Dynamic Programming parsing – Shallow parsing – Probabilistic CFG,
Probabilistic CYK, Probabilistic Lexicalized CFGs – Feature structures, Unification of
feature structures.
Syntactic Analysis
Syntactic analysis determines the grammatical structure of a sentence, which is needed for a proper
understanding of the sentence's meaning.
1. Context-Free Grammars (CFGs)
A Context-Free Grammar (CFG) is a formal system used to describe the syntax of languages, including
programming and natural languages. A CFG consists of:
• Non-terminal symbols: Represent abstract concepts or categories (e.g., Sentence (S), Noun Phrase
(NP), Verb Phrase (VP)).
• Terminal symbols: Represent actual words or symbols in the language (e.g., "dog," "runs").
• Production rules: Define how non-terminals can be replaced with terminals and/or other
non-terminals (e.g., S → NP VP).
• Start symbol: The top-level symbol from which sentences in the language are derived (usually S).
CFGs are powerful for modeling the hierarchical structure of languages, enabling parsing and
understanding of sentences.
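Formally, a CFG is written as a 4-tuple G = (V, T, P, S), where:
V - the set of variables (non-terminal symbols).
T - the set of terminal symbols.
P - the set of production rules; each rule rewrites a single non-terminal as a sequence of
terminals and/or non-terminals.
S - the start symbol, which is a member of V.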
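The four components can be written down directly as a data structure. The following is a minimal sketch, not a standard library API: the `CFG` class, the `toy` grammar, and the helper `leftmost_derivation` are names made up for this illustration, and the two lexical rules (NP → dog, VP → runs) are assumptions built from the example words above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    variables: frozenset   # V: non-terminal symbols
    terminals: frozenset   # T: terminal symbols
    productions: tuple     # P: (left-hand side, right-hand side tuple) pairs
    start: str             # S: start symbol

    def is_well_formed(self) -> bool:
        """Check the defining constraints of a CFG."""
        if self.start not in self.variables:
            return False
        if self.variables & self.terminals:
            return False  # V and T must be disjoint
        symbols = self.variables | self.terminals
        return all(
            lhs in self.variables and all(sym in symbols for sym in rhs)
            for lhs, rhs in self.productions
        )


# Hypothetical toy grammar; the lexical rules are assumptions for illustration.
toy = CFG(
    variables=frozenset({"S", "NP", "VP"}),
    terminals=frozenset({"dog", "runs"}),
    productions=(
        ("S", ("NP", "VP")),
        ("NP", ("dog",)),
        ("VP", ("runs",)),
    ),
    start="S",
)


def leftmost_derivation(grammar, choices):
    """Apply the productions indexed by `choices`, always to the leftmost non-terminal."""
    form = [grammar.start]
    steps = [" ".join(form)]
    for idx in choices:
        lhs, rhs = grammar.productions[idx]
        pos = next(i for i, sym in enumerate(form) if sym in grammar.variables)
        assert form[pos] == lhs, "chosen rule does not match the leftmost non-terminal"
        form[pos:pos + 1] = rhs
        steps.append(" ".join(form))
    return steps


print(" => ".join(leftmost_derivation(toy, [0, 1, 2])))
# S => NP VP => dog VP => dog runs
```

Each step of the derivation replaces exactly one non-terminal, which is what makes the grammar context-free: the replacement does not depend on the neighbouring symbols.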
2. Grammar Rules for English
English grammar can be captured using CFGs by defining production rules such as:
https://onedrive.live.com/view.aspx?resid=B601116B79FD1A71!s8c0260e73b5f4ae7a8c34978f96df1ff&migratedtospo=true&redeem=aHR0cHM6Ly8xZHJ2Lm1zL28vYy9iNjAxMTE2Yjc5ZmQxYTcxL0V1ZGdBb3hmTy… 1/4
• S → NP VP (A sentence consists of a noun phrase and a verb phrase).
• NP → Det N (A noun phrase can include a determiner and a noun).
• VP → V NP (A verb phrase may contain a verb followed by a noun phrase).
For example:
• Sentence: "The cat eats fish."
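○ S → NP VP (NP = "The cat", VP = "eats fish")
○ NP → Det N (Det = "The", N = "cat")
○ VP → V NP (V = "eats", NP = "fish"; the bare noun "fish" needs an additional rule such as NP → N)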
These rules allow for the structural analysis of English sentences.
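To see the rules at work, here is a small top-down (recursive-descent) parser sketch for the example sentence. It is only an illustration under stated assumptions: the rule NP → N (for the bare noun "fish") and the `LEXICON` mapping words to categories are additions not given in the notes, and the function names are hypothetical.

```python
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],   # NP -> N is an added assumption for bare "fish"
    "VP": [["V", "NP"]],
}
LEXICON = {"the": "Det", "cat": "N", "fish": "N", "eats": "V"}  # assumed toy lexicon


def parse(symbol, words, start):
    """Yield (tree, next_position) for every way `symbol` can cover words[start:...]."""
    if symbol not in RULES:  # word category (Det, N, V): match exactly one word
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            yield (symbol, words[start]), start + 1
        return
    for rhs in RULES[symbol]:
        yield from expand(rhs, words, start, (symbol,))


def expand(rhs, words, pos, partial):
    """Match the symbols of `rhs` left to right, extending the partial tree."""
    if not rhs:
        yield partial, pos
        return
    for subtree, nxt in parse(rhs[0], words, pos):
        yield from expand(rhs[1:], words, nxt, partial + (subtree,))


def parse_sentence(sentence):
    """Return all trees for S that cover the whole sentence."""
    words = sentence.lower().rstrip(".").split()
    return [tree for tree, end in parse("S", words, 0) if end == len(words)]


trees = parse_sentence("The cat eats fish.")
print(trees[0])
```

The parser returns one tree, with "the cat" as the subject NP and "eats fish" as the VP. A word order the rules do not license, such as "cat the eats", yields no tree at all.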
3. Treebanks
A Treebank is a database of sentences annotated with syntactic or semantic structures, often in the
form of parse trees. Treebanks are valuable for:
• Training and evaluating parsing algorithms.
• Capturing linguistic phenomena in a language.
Example
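Treebank annotations are commonly stored as bracketed strings, as in the Penn Treebank. The sketch below reads one such string into nested tuples and extracts the words and the grammar rules used. The annotated sentence is written by hand for this illustration (it is not taken from a real treebank), and the function names are hypothetical.

```python
def read_tree(text):
    """Parse one bracketed tree such as "(NP (Det The) (N cat))" into nested tuples."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected '('"
        label = tokens[pos + 1]
        pos += 2
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word (leaf)
                pos += 1
        pos += 1  # consume ')'
        return (label, *children)

    tree = node()
    assert pos == len(tokens), "trailing tokens after the tree"
    return tree


def leaves(tree):
    """Return the words of the tree in sentence order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree[1:] for word in leaves(child)]


def productions(tree):
    """List the rules used in the tree, one (lhs, rhs) pair per internal node."""
    if isinstance(tree, str):
        return []
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules = [(label, rhs)]
    for child in children:
        rules.extend(productions(child))
    return rules


# Hand-written annotation, for illustration only.
annotated = "(S (NP (Det The) (N cat)) (VP (V eats) (NP (N fish))))"
tree = read_tree(annotated)
print(leaves(tree))           # ['The', 'cat', 'eats', 'fish']
print(productions(tree)[:2])  # [('S', ('NP', 'VP')), ('NP', ('Det', 'N'))]
```

Counting how often each rule appears across all trees of a treebank is the usual starting point for estimating the rule probabilities of a probabilistic grammar.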
4. Normal Forms for Grammar
Normal forms are standardized representations of grammar rules to simplify parsing algorithms. Key
normal forms include:
• Chomsky Normal Form (CNF): Each production rule is of the form A → B C or A → a,
where A, B, C are non-terminals and a is a terminal.
• Greibach Normal Form (GNF): Rules are of the form A → a α, where a is a terminal
and α is a (possibly empty) sequence of non-terminals.
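The two definitions translate directly into checks on the shape of each rule. This is a minimal sketch that follows the definitions exactly as stated above (it ignores special cases such as an empty-string rule for the start symbol); the rule sets and function names are made up for illustration.

```python
def is_cnf(rules, nonterminals):
    """True if every rule is A -> B C (two non-terminals) or A -> a (one terminal)."""
    for lhs, rhs in rules:
        if lhs not in nonterminals:
            return False
        binary = len(rhs) == 2 and all(sym in nonterminals for sym in rhs)
        lexical = len(rhs) == 1 and rhs[0] not in nonterminals
        if not (binary or lexical):
            return False
    return True


def is_gnf(rules, nonterminals):
    """True if every rule is A -> a alpha: one terminal, then only non-terminals."""
    for lhs, rhs in rules:
        if lhs not in nonterminals or not rhs:
            return False
        if rhs[0] in nonterminals:
            return False
        if any(sym not in nonterminals for sym in rhs[1:]):
            return False
    return True


NONTERMINALS = {"S", "NP", "VP", "Det", "N", "V"}
cnf_rules = [
    ("S", ("NP", "VP")), ("NP", ("Det", "N")), ("VP", ("V", "NP")),
    ("Det", ("the",)), ("N", ("cat",)), ("NP", ("fish",)), ("V", ("eats",)),
]
unit_rule = ("NP", ("N",))  # a unit rule A -> B is not allowed in CNF
gnf_rules = [("S", ("the", "N", "VP")), ("N", ("cat",)), ("VP", ("eats",))]

print(is_cnf(cnf_rules, NONTERMINALS))                # True
print(is_cnf(cnf_rules + [unit_rule], NONTERMINALS))  # False
print(is_gnf(gnf_rules, NONTERMINALS))                # True
print(is_gnf(cnf_rules, NONTERMINALS))                # False
```

Note how the CNF rule set replaces the unit rule NP → N with the lexical rule NP → fish; removing unit rules in this way is one of the steps when converting a grammar to CNF.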
5. Dependency Grammar
A Dependency Grammar focuses on the relationships between words in a sentence, emphasizing how
words depend on each other. It is represented as a dependency tree, where:
• Nodes represent words.
• Edges represent dependency relations (e.g., subject, object).
Example:
• Sentence: "The cat eats fish."
○ "eats" (root)
○ "cat" (subject of "eats")
○ "fish" (object of "eats")
Dependency grammars are commonly used in syntactic parsing tasks.
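A dependency tree can be stored compactly as one head index per word. The sketch below encodes the example sentence this way; the attachment of "The" to "cat" with the label "determiner" is an assumption added to complete the tree, and the function names are hypothetical.

```python
words = ["The", "cat", "eats", "fish"]
# heads[i] is the index of the word that words[i] depends on; -1 marks the root.
heads = [1, 2, -1, 2]
# "determiner" is an assumed label; the other relations come from the example.
labels = ["determiner", "subject", "root", "object"]


def arcs(words, heads, labels):
    """Return (head word, relation, dependent word) triples."""
    return [
        ("ROOT" if h < 0 else words[h], rel, w)
        for w, h, rel in zip(words, heads, labels)
    ]


def is_valid_tree(heads):
    """Exactly one root, and following heads from any word reaches it without a cycle."""
    if sum(1 for h in heads if h < 0) != 1:
        return False
    for start in range(len(heads)):
        seen, node = set(), start
        while heads[node] >= 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node]
    return True


for head, relation, dependent in arcs(words, heads, labels):
    print(f"{head} --{relation}--> {dependent}")
print(is_valid_tree(heads))  # True
```

Unlike a constituency tree, this representation has no phrase nodes such as NP or VP: every node is a word, and the verb "eats" sits at the top.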
6. Syntactic Parsing
Syntactic parsing involves analyzing a sentence to produce its syntactic structure, typically as a tree:
• Constituency Parsing: Focuses on breaking sentences into sub-phrases (constituents) using CFGs.
• Dependency Parsing: Focuses on finding the dependencies between words.
7. Ambiguity
Ambiguity arises when a sentence can have multiple interpretations:
• Lexical Ambiguity: A word has multiple meanings (e.g., "bank" as a financial institution or
riverbank).
• Structural Ambiguity: A sentence has multiple valid parse trees (e.g., "I saw the man with a
telescope").
Resolving ambiguity is critical for accurate syntactic parsing.
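The structural ambiguity of "I saw the man with a telescope" can be made visible by enumerating every parse tree a grammar allows. The sketch below does this for a small grammar in Chomsky Normal Form; the grammar, the lexicon, and the function name `all_parses` are assumptions chosen for illustration.

```python
from functools import lru_cache

# Assumed toy grammar: (parent, left child, right child).
BINARY = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),   # the PP modifies the verb phrase
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),   # the PP modifies the noun phrase
    ("PP", "P", "NP"),
]
LEXICAL = {
    "i": ("NP",), "saw": ("V",), "the": ("Det",), "a": ("Det",),
    "man": ("N",), "telescope": ("N",), "with": ("P",),
}


def all_parses(words, start="S"):
    """Return every parse of `words` as a bracketed string."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def trees(cat, i, j):
        found = []
        if j == i + 1 and cat in LEXICAL.get(words[i], ()):
            found.append(f"({cat} {words[i]})")
        for parent, left_cat, right_cat in BINARY:
            if parent != cat:
                continue
            for k in range(i + 1, j):  # split point between the two children
                for left in trees(left_cat, i, k):
                    for right in trees(right_cat, k, j):
                        found.append(f"({cat} {left} {right})")
        return tuple(found)

    return list(trees(start, 0, len(words)))


parses = all_parses("i saw the man with a telescope".split())
for p in parses:
    print(p)
print(len(parses))  # 2
```

One tree attaches "with a telescope" to the noun phrase (the man has the telescope), the other attaches it to the verb phrase (the seeing was done with the telescope). The grammar alone cannot choose between them.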
8. Dynamic Programming Parsing
Dynamic programming is used in parsing algorithms to avoid redundant computations:
• CYK Algorithm: A bottom-up parser for CFGs in Chomsky Normal Form.
• Earley Parser: Handles any CFG (no normal form required) by combining top-down prediction
with bottom-up completion.
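As an illustration of the dynamic programming idea, here is a minimal CYK recognizer sketch. Each table cell is computed once from smaller cells and then reused, which is where the saving over naive re-parsing comes from. The grammar is an assumed toy grammar in CNF (the lexical rule NP → fish stands in for a unit rule), and the function name is hypothetical.

```python
def cyk_recognize(words, binary_rules, lexical_rules, start="S"):
    """Return (accepted, table); table[i][j] holds the non-terminals deriving words[i:j]."""
    n = len(words)
    table = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    # Spans of length 1: look up each word's categories.
    for i, word in enumerate(words):
        table[i][i + 1] = {lhs for lhs, term in lexical_rules if term == word}
    # Longer spans: combine two adjacent, already-computed spans.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for lhs, left, right in binary_rules:
                    if left in table[i][k] and right in table[k][j]:
                        table[i][j].add(lhs)
    return start in table[0][n], table


# Assumed toy grammar in Chomsky Normal Form.
BINARY_RULES = [("S", "NP", "VP"), ("NP", "Det", "N"), ("VP", "V", "NP")]
LEXICAL_RULES = [
    ("Det", "the"), ("N", "cat"), ("N", "fish"), ("NP", "fish"), ("V", "eats"),
]

accepted, table = cyk_recognize("the cat eats fish".split(), BINARY_RULES, LEXICAL_RULES)
print(accepted)     # True
print(table[0][2])  # {'NP'}
print(table[2][4])  # {'VP'}
```

The three nested loops over span length, start position, and split point give the familiar cubic running time in the sentence length.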
9. Shallow Parsing
Shallow Parsing (or chunking) identifies phrases in a sentence without generating a full parse tree:
• Goal: Extract noun phrases, verb phrases, etc.
• Example: For "The cat sleeps," identify:
○ NP: "The cat"
○ VP: "sleeps"
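A chunker of this kind can be sketched as a single left-to-right scan over part-of-speech tags. The patterns used here (NP = optional determiner, any adjectives, one or more nouns; VP = one or more verbs) and the tag names are simplifying assumptions, and the input is assumed to be already tagged.

```python
def chunk(tagged):
    """Group (word, tag) pairs into flat NP and VP chunks; other words get the label "O"."""
    chunks, i = [], 0
    while i < len(tagged):
        # Try an NP: optional Det, any number of Adj, at least one N.
        j = i
        if tagged[j][1] == "Det":
            j += 1
        while j < len(tagged) and tagged[j][1] == "Adj":
            j += 1
        k = j
        while k < len(tagged) and tagged[k][1] == "N":
            k += 1
        if k > j:
            chunks.append(("NP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Try a VP: one or more verbs.
        k = i
        while k < len(tagged) and tagged[k][1] == "V":
            k += 1
        if k > i:
            chunks.append(("VP", [w for w, _ in tagged[i:k]]))
            i = k
            continue
        # Anything else stays outside the chunks.
        chunks.append(("O", [tagged[i][0]]))
        i += 1
    return chunks


print(chunk([("The", "Det"), ("cat", "N"), ("sleeps", "V")]))
# [('NP', ['The', 'cat']), ('VP', ['sleeps'])]
```

The output is a flat list of chunks. Nothing records how the chunks relate to each other, which is exactly what separates shallow parsing from full parsing.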
10. Probabilistic CFGs (PCFGs)
PCFGs extend CFGs by associating probabilities with production rules. They help disambiguate multiple
parses by choosing the most probable one:
• Example: P(S → NP VP) = 0.9
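Under a PCFG, the probability of a parse tree is the product of the probabilities of all rules used in it, and the probabilities of all rules with the same left-hand side sum to 1. The sketch below checks both on a toy grammar. Only P(S → NP VP) = 0.9 comes from the notes; every other rule and probability is an assumed value for illustration.

```python
import math
from collections import defaultdict

PCFG = {
    ("S", ("NP", "VP")): 0.9,    # value from the notes
    ("S", ("VP",)): 0.1,         # all remaining values are assumptions
    ("NP", ("Det", "N")): 0.6,
    ("NP", ("N",)): 0.4,
    ("VP", ("V", "NP")): 0.7,
    ("VP", ("V",)): 0.3,
    ("Det", ("the",)): 1.0,
    ("N", ("cat",)): 0.5,
    ("N", ("fish",)): 0.5,
    ("V", ("eats",)): 1.0,
}


def is_proper(pcfg):
    """True if the rule probabilities for each left-hand side sum to 1."""
    totals = defaultdict(float)
    for (lhs, _), p in pcfg.items():
        totals[lhs] += p
    return all(math.isclose(t, 1.0) for t in totals.values())


def tree_probability(tree, pcfg):
    """Multiply the probabilities of every rule used in the tree."""
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    prob = pcfg[(label, rhs)]
    for child in children:
        if not isinstance(child, str):
            prob *= tree_probability(child, pcfg)
    return prob


tree = ("S",
        ("NP", ("Det", "the"), ("N", "cat")),
        ("VP", ("V", "eats"), ("NP", ("N", "fish"))))
print(is_proper(PCFG))
print(tree_probability(tree, PCFG))  # 0.9 * 0.6 * 1.0 * 0.5 * 0.7 * 1.0 * 0.4 * 0.5
```

With these assumed values the product is approximately 0.0378. When a sentence has several parse trees, the same computation is done for each, and the tree with the highest product is chosen.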
11. Probabilistic CYK Parsing
This is an extension of the CYK algorithm for PCFGs:
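• CYK stands for Cocke-Younger-Kasami parsing.
• The table is first filled with the lexical categories of each word, for example:
The → {Det}, boy → {Noun}, eats → {Verb}.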
• Uses probabilities to find the most likely parse for a sentence.
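The following sketch shows one way to extend the CYK table with probabilities: each cell keeps only the most probable tree for each non-terminal (a Viterbi-style maximum). The grammar is in CNF and all probabilities are assumed values chosen for illustration; the function name is hypothetical.

```python
def viterbi_cyk(words, binary, lexical, start="S"):
    """Return (probability, tree) of the most probable parse, or None if there is none."""
    n = len(words)
    best = {}  # (i, j, A) -> (probability, tree string) of the best A over words[i:j]
    for i, w in enumerate(words):
        for (cat, word), p in lexical.items():
            if word == w:
                best[i, i + 1, cat] = (p, f"({cat} {w})")
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, left, right), p in binary.items():
                    if (i, k, left) in best and (k, j, right) in best:
                        lp, lt = best[i, k, left]
                        rp, rt = best[k, j, right]
                        cand = p * lp * rp
                        if cand > best.get((i, j, parent), (0.0, ""))[0]:
                            best[i, j, parent] = (cand, f"({parent} {lt} {rt})")
    return best.get((0, n, start))


# Assumed toy PCFG in Chomsky Normal Form; all probabilities are illustrative.
BINARY = {
    ("S", "NP", "VP"): 1.0,
    ("VP", "V", "NP"): 0.7,
    ("VP", "VP", "PP"): 0.3,
    ("NP", "Det", "N"): 0.6,
    ("NP", "NP", "PP"): 0.2,
    ("PP", "P", "NP"): 1.0,
}
LEXICAL = {
    ("NP", "i"): 0.2, ("V", "saw"): 1.0, ("Det", "the"): 0.5, ("Det", "a"): 0.5,
    ("N", "man"): 0.5, ("N", "telescope"): 0.5, ("P", "with"): 1.0,
}

prob, tree = viterbi_cyk("i saw the man with a telescope".split(), BINARY, LEXICAL)
print(prob)
print(tree)
```

With these assumed probabilities, attaching the prepositional phrase to the verb phrase scores about 0.000945, against about 0.00063 for attaching it to the noun phrase, so the verb-attachment tree is returned. Different rule probabilities could reverse the choice.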
12. Probabilistic Lexicalized CFGs
Probabilistic lexicalized CFGs annotate each non-terminal with its head word, so that rule
probabilities can depend on specific words:
• Adds context by including head words (e.g., associating "eats" with the verb phrase).
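Head words are usually passed up the tree from a designated "head child" of each phrase. The sketch below annotates every node of a parse tree with its head word. The head rules in `HEAD_CHILD` (for example, the head of a VP is its verb) are simplified assumptions, and the function name is hypothetical.

```python
# Assumed head rules: which child supplies the head word of each phrase type.
HEAD_CHILD = {"S": "VP", "VP": "V", "NP": "N", "PP": "P"}


def lexicalize(tree):
    """Return (annotated_tree, head_word); each label becomes "Label(head)"."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        word = children[0]  # a word category is headed by its own word
        return (f"{label}({word})", word), word
    annotated, heads = [], {}
    for child in children:
        sub, head = lexicalize(child)
        annotated.append(sub)
        heads[child[0]] = head
    head = heads[HEAD_CHILD[label]]
    return (f"{label}({head})", *annotated), head


tree = ("S",
        ("NP", ("Det", "the"), ("N", "cat")),
        ("VP", ("V", "eats"), ("NP", ("N", "fish"))))
annotated, head = lexicalize(tree)
print(head)             # eats
print(annotated[0])     # S(eats)
print(annotated[1][0])  # NP(cat)
print(annotated[2][0])  # VP(eats)
```

After this step a rule such as VP(eats) → V(eats) NP(fish) can be given its own probability, which lets the grammar prefer objects that fit the particular verb.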
13. Feature Structures
Feature structures represent syntactic, semantic, or morphological properties of linguistic units as
attribute-value pairs:
• Example: For "cats":
○ Number: plural
○ Part-of-Speech: noun
14. Unification of Feature Structures
Unification is the process of merging feature structures to check compatibility:
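• Combines two sets of attributes if they are consistent, and fails if they conflict. For example,
[Number: plural] unifies with [Part-of-Speech: noun], but not with [Number: singular].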
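Unification can be sketched as a recursive merge of nested dictionaries. This is a simplified illustration: it does not handle shared (re-entrant) values, it uses `None` to signal failure, and the nested "agreement" feature and the example structures are assumptions added for the demonstration.

```python
def unify(a, b):
    """Return the merged feature structure, or None if a and b conflict."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)
        for key, value in b.items():
            if key in result:
                merged = unify(result[key], value)
                if merged is None:
                    return None  # conflicting values somewhere below this key
                result[key] = merged
            else:
                result[key] = value  # feature only present in b
        return result
    return a if a == b else None  # atomic values must match exactly


cats = {"pos": "noun", "agreement": {"number": "plural"}}
needs_plural_subject = {"agreement": {"number": "plural", "person": 3}}
needs_singular_subject = {"agreement": {"number": "singular"}}

print(unify(cats, needs_plural_subject))
# {'pos': 'noun', 'agreement': {'number': 'plural', 'person': 3}}
print(unify(cats, needs_singular_subject))
# None
```

The successful result contains the information from both inputs, while the failed case shows how unification enforces agreement, for example between a subject and its verb.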