Grammars
• Before you can parse, you need a grammar.
• So where do grammars come from?
Grammar Engineering
Lovingly hand-crafted decades-long efforts by humans to write
grammars (typically in some particular grammar formalism of interest
to the linguists developing the grammar).
TreeBanks
Semi-automatically generated sets of parse trees for the sentences in
some corpus. Typically in a generic, lowest-common-denominator
formalism (of no particular interest to any modern linguist).
1
3/6/2019
TreeBank Grammars
• Reading off the grammar…
• The grammar is the set of rules (local
subtrees) that occur in the annotated
corpus
• They tend to avoid recursion (and
elegance and parsimony); i.e., they tend
to be flat and redundant
• Penn TreeBank (III) has about 17,500
grammar rules under this definition.
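The rule-reading idea can be sketched in a few lines of Python. The nested-tuple tree encoding and the toy sentence below are illustrative assumptions, not the actual Penn TreeBank format:

```python
# Read a grammar off a "treebank": every local subtree (a node plus its
# immediate children) is one rule. Trees are nested tuples
# (label, child1, child2, ...), with bare strings as words.

def rules(tree, out=None):
    """Collect every rule (LHS, RHS-tuple) occurring in `tree`."""
    if out is None:
        out = set()
    label, *children = tree
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    out.add((label, rhs))
    for c in children:
        if not isinstance(c, str):
            rules(c, out)          # recurse into each subtree
    return out

t = ("S",
     ("NP", ("Pro", "they")),
     ("VP", ("V", "hid"),
            ("NP", ("Det", "the"), ("N", "letter"))))

for lhs, rhs in sorted(rules(t)):
    print(lhs, "->", " ".join(rhs))
```

Run over a whole corpus of such trees, the union of these sets is exactly the "treebank grammar" the slide describes, flatness and redundancy included.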
TreeBanks
[figure]
Sample Rules
[figure]
Example
[figure]
TreeBanks
• TreeBanks provide a grammar (of a sort).
• As we’ll see they also provide the training data for various ML
approaches to parsing.
• But they can also provide useful data for more purely linguistic
pursuits.
You might have a theory about whether or not something can happen
in a particular language.
Or a theory about the contexts in which something can happen.
TreeBanks can give you the means to explore those theories, if you
can formulate the questions in the right way and get the data you
need.
TreeBanks
• Finally, you should have noted a bit of a
circular argument here.
• Treebanks provide a grammar because
we can read the rules of the grammar out
of the treebank.
• But how did the trees get in there in the
first place? There must have been a
grammar theory in there someplace…
TreeBanks
• Typically, not all of the sentences are
hand-annotated by humans.
• They’re automatically parsed and then
hand-corrected.
Parsing
• Parsing with CFGs refers to the task of
assigning correct trees to input strings
• Correct here means a tree that covers all
and only the elements of the input and has
an S at the top
• It doesn’t actually mean that the system
can select the correct tree from among all
the possible trees
Parsing
• As with everything of interest, parsing
involves a search which involves the
making of choices
• We’ll start with some basic (meaning bad)
methods before moving on to the one or
two that you need to know
For Now
• Assume…
You have all the words already in some buffer
The input isn’t POS tagged
We won’t worry about morphological analysis
All the words are known
Top-Down Parsing
• Since we’re trying to find trees rooted
with an S (Sentence), start with the rules
that give us an S.
• Then work your way down from there to
the words.
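A minimal top-down (recursive-descent) recognizer can make this concrete. The toy grammar and sentence are invented for illustration; note that a left-recursive rule such as Nominal -> Nominal Noun would send this naive version into an infinite loop:

```python
# Naive top-down recognizer: start from S and expand downward,
# backtracking over alternative rules. Toy grammar, for illustration only.

GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["Det", "N"], ["Pro"]],
    "VP":  [["V", "NP"]],
    "Det": [["the"]],   "N": [["letter"]],
    "Pro": [["they"]],  "V": [["hid"]],
}

def parse(symbols, words):
    """True iff the symbol sequence can derive exactly `words`."""
    if not symbols:
        return not words                    # success only if input consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:                    # non-terminal: try each expansion
        return any(parse(expansion + rest, words)
                   for expansion in GRAMMAR[first])
    # terminal: must match the next input word
    return bool(words) and words[0] == first and parse(rest, words[1:])

print(parse(["S"], "they hid the letter".split()))  # True
```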
Top-Down Space
[figure]
Bottom-Up Parsing
• Of course, we also want trees that cover
the input words. So start with trees that
link up with the words in the right way.
• Then work your way up from there.
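The same toy grammar can be recognized bottom-up with a shift-reduce sketch. The greedy always-reduce-first control strategy below is an illustrative assumption; it happens to work for this grammar but fails on grammars that need backtracking over reduction choices:

```python
# Bottom-up (shift-reduce) recognizer sketch: shift words onto a stack,
# reduce whenever the top of the stack matches a rule's right-hand side.
# Toy grammar; greedy control, no backtracking.

RULES = [("S", ("NP", "VP")), ("NP", ("Det", "N")), ("NP", ("Pro",)),
         ("VP", ("V", "NP")),
         ("Det", ("the",)), ("N", ("letter",)),
         ("Pro", ("they",)), ("V", ("hid",))]

def shift_reduce(words):
    stack, buffer = [], list(words)
    while stack != ["S"]:
        for lhs, rhs in RULES:              # try to reduce the stack top
            if tuple(stack[-len(rhs):]) == rhs:
                stack[-len(rhs):] = [lhs]
                break
        else:
            if not buffer:
                return False                # stuck: nothing to reduce or shift
            stack.append(buffer.pop(0))     # shift the next word
    return not buffer                       # S built, and all input consumed

print(shift_reduce("they hid the letter".split()))  # True
```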
Bottom-Up Space
[figure]
Control
• Of course, in both cases we left out how to
keep track of the search space and how to
make choices
Which node to try to expand next
Which grammar rule to use to expand a node
Top-Down and Bottom-Up
• Top-down
Only searches for trees that can be answers
(i.e. S’s)
But also suggests trees that are not consistent
with any of the words
• Bottom-up
Only forms trees consistent with the words
But also suggests trees that make no sense globally
Problems
• Even with the best filtering, backtracking
methods are doomed if they don’t
address certain problems
Ambiguity
Shared subproblems
Ambiguity
[figure]
Shared Sub-Problems
• No matter what kind of search (top-down,
bottom-up, or mixed) we choose, we don’t
want to unnecessarily redo work we’ve
already done.
Shared Sub-Problems
• Consider
A flight from Indianapolis to Houston on TWA
Shared Sub-Problems
• Assume a top-down parse making bad
initial choices on the Nominal rule.
• In particular…
Nominal -> Nominal Noun
Nominal -> Nominal PP
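Dynamic programming is the standard cure for shared subproblems: compute each (symbol, span) cell once and reuse it, instead of re-deriving it on every backtrack. Below is a CKY-style sketch over a toy Chomsky-normal-form grammar invented around the slide's noun phrase; the symbols and lexicon are illustrative assumptions:

```python
# CKY-style recognition: table[i, j] holds every symbol that can derive
# words[i:j]. Each cell is filled exactly once, so the Nominal over
# "flight from Indianapolis" is computed once and shared, never re-derived.

def cky(words, lexicon, binary):
    n = len(words)
    table = {}
    for i, w in enumerate(words):               # length-1 spans from the lexicon
        table[i, i + 1] = set(lexicon.get(w, ()))
    for length in range(2, n + 1):              # longer spans, bottom-up
        for i in range(n - length + 1):
            j = i + length
            cell = set()
            for k in range(i + 1, j):           # each split reuses smaller cells
                for b in table[i, k]:
                    for c in table[k, j]:
                        cell |= binary.get((b, c), set())
            table[i, j] = cell
    return table

LEX = {"a": {"Det"}, "flight": {"Nominal"},
       "from": {"Prep"}, "Indianapolis": {"NP"}}
BIN = {("Det", "Nominal"): {"NP"},
       ("Prep", "NP"): {"PP"},
       ("Nominal", "PP"): {"Nominal"}}

words = "a flight from Indianapolis".split()
table = cky(words, LEX, BIN)
print(table[0, len(words)])  # {'NP'}
```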
Shared Sub-Problems
[figures]
Dependency Grammars
More appropriate for languages other than
English
Popular in Europe, India
Key notion is thematic roles between words
Often binary in nature
Example 12.14 in book
They hid the letter on the shelf
hid
├── They
├── letter
│   └── the
└── the shelf
Differences with CFG
• No non-terminals
• Each link connects two lexical nodes
• Typed dependency parse
Each link can be labeled (about 48
relation types; see Fig. 12.15)
• Useful for languages with flexible ordering
• Can convert CFG parse to dependency
parse (see 12.7.1)
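One common machine-readable form of such a typed parse is a head-index list: each word points at its governor, plus a relation label. The sketch below encodes the slide's example; the label names (nsubj, obj, obl, ...) are illustrative modern-style names, not necessarily the exact 48-relation inventory of Fig. 12.15:

```python
# The slide's dependency parse as (head index, relation) pairs.
# Head index 0 marks the root; other indices are 1-based into `words`.

words = ["They", "hid", "the", "letter", "on", "the", "shelf"]
heads = [(2, "nsubj"),   # They   <- hid
         (0, "root"),    # hid    is the root
         (4, "det"),     # the    <- letter
         (2, "obj"),     # letter <- hid
         (7, "case"),    # on     <- shelf
         (7, "det"),     # the    <- shelf
         (2, "obl")]     # shelf  <- hid

for word, (head, rel) in zip(words, heads):
    governor = "ROOT" if head == 0 else words[head - 1]
    print(f"{rel}({governor}, {word})")
```

Note there are no non-terminals anywhere in this representation, which is exactly the contrast with a CFG parse that the slide draws.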
Dependency parsing
• Very important “growth” area for new
research
• Helpful in machine translation systems
• Closer to semantics than CFG
• Abstracts out the word-order variation