
Grammars

• Before you can parse you need a grammar.


• So where do grammars come from?
 Grammar Engineering
 Lovingly hand-crafted decades-long efforts by humans to write
grammars (typically in some particular grammar formalism of interest
to the linguists developing the grammar).
 TreeBanks
 Semi-automatically generated sets of parse trees for the sentences in
some corpus. Typically in a generic lowest common denominator
formalism (of no particular interest to any modern linguist).

1
3/6/2019
TreeBank Grammars

• Reading off the grammar…


• The grammar is the set of rules (local
subtrees) that occur in the annotated
corpus
• They tend to avoid recursion (and elegance and parsimony)
 I.e., they tend to be flat and redundant
• Penn TreeBank (III) has about 17,500 grammar rules under this definition.
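The rule-reading idea above can be sketched in a few lines. This is a toy of my own devising: it assumes trees are nested `(label, children...)` tuples rather than Penn-style bracketed strings, and simply collects one rule per local subtree (a node plus its immediate children).

```python
# Reading a grammar off a treebank: each local subtree (a parent node and
# its immediate children) contributes one rule. Toy representation: trees
# are nested (label, child, child, ...) tuples.

def read_off_rules(tree, rules=None):
    """Collect the CFG rules implicit in one annotated tree."""
    if rules is None:
        rules = set()
    label, *children = tree
    if children and isinstance(children[0], tuple):
        # Internal node: LHS is this label, RHS is the children's labels.
        rules.add((label, tuple(child[0] for child in children)))
        for child in children:
            read_off_rules(child, rules)
    else:
        # Preterminal: the label dominates a word, e.g. ("NN", "letter").
        rules.add((label, tuple(children)))
    return rules

# "They hid the letter" as an annotated tree
t = ("S",
     ("NP", ("PRP", "They")),
     ("VP", ("VBD", "hid"),
            ("NP", ("DT", "the"), ("NN", "letter"))))

for lhs, rhs in sorted(read_off_rules(t)):
    print(lhs, "->", " ".join(rhs))
```

Run over a whole corpus of such trees, the union of these rule sets is exactly the flat, redundant grammar the slide describes.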
TreeBanks

TreeBanks

Sample Rules

Example

TreeBanks

• TreeBanks provide a grammar (of a sort).


• As we’ll see they also provide the training data for various ML
approaches to parsing.
• But they can also provide useful data for more purely linguistic
pursuits.
 You might have a theory about whether or not something can happen in a particular language.
 Or a theory about the contexts in which something can happen.
 TreeBanks can give you the means to explore those theories, if you can formulate the questions in the right way and extract the data you need.

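As a sketch of that kind of corpus query: the snippet below counts how often a hypothesized configuration (here, an NP containing a PP modifier) occurs. The tuple-tree representation and the one-sentence "corpus" are my own toy setup; a real study would iterate over something like the Penn Treebank.

```python
# Using a treebank to test a linguistic hypothesis: how often does an
# NP immediately dominate an NP plus a PP modifier?

def local_trees(tree):
    """Yield every (label, child_labels) local subtree in the tree."""
    label, *children = tree
    if children and isinstance(children[0], tuple):
        yield label, tuple(c[0] for c in children)
        for c in children:
            yield from local_trees(c)

def count_matches(trees, lhs, rhs):
    """Count occurrences of the local configuration lhs -> rhs."""
    return sum(1 for t in trees for l, r in local_trees(t)
               if l == lhs and r == rhs)

corpus = [
    ("S", ("NP", ("NP", ("DT", "a"), ("NN", "flight")),
                 ("PP", ("IN", "to"), ("NP", ("NNP", "Houston")))),
          ("VP", ("VBD", "left"))),
]
print(count_matches(corpus, "NP", ("NP", "PP")))  # 1
```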
TreeBanks

• Finally, you should have noted a bit of a circular argument here.
• Treebanks provide a grammar because we can read the rules of the grammar out of the treebank.
• But how did the trees get in there in the first place? There must have been a grammar theory in there someplace…
TreeBanks

• Typically, not all of the sentences are hand-annotated by humans.
• They’re automatically parsed and then hand-corrected.

Parsing

• Parsing with CFGs refers to the task of assigning correct trees to input strings.
• Correct here means a tree that covers all and only the elements of the input and has an S at the top.
• It doesn’t actually mean that the system can select the correct tree from among all the possible trees.
Parsing

• As with everything of interest, parsing involves a search, which involves making choices.
• We’ll start with some basic (meaning bad) methods before moving on to the one or two that you need to know.

For Now

• Assume…
 You have all the words already in some buffer
 The input isn’t POS tagged
 We won’t worry about morphological analysis
 All the words are known

Top-Down Parsing

• Since we’re trying to find trees rooted with an S (Sentence), start with the rules that give us an S.
• Then work your way down from there to the words.

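The top-down strategy can be sketched as a naive recursive-descent recognizer: keep a list of symbols still to derive, expand the leftmost nonterminal by each of its rules, and only consult the input when a preterminal is reached. The toy grammar and lexicon are my own invention, and this sketch deliberately ignores left recursion, which would send it into a loop.

```python
# Top-down parsing in miniature: expand downward from S, checking against
# the input only at the preterminals. Naive backtracking recognizer.

GRAMMAR = {
    "S":  [["NP", "VP"]],
    "NP": [["DT", "NN"], ["PRP"]],
    "VP": [["VBD", "NP"], ["VBD"]],
}
LEXICON = {"DT": {"the"}, "NN": {"letter"}, "PRP": {"they"}, "VBD": {"hid"}}

def expand(symbols, words):
    """Try to derive exactly `words` from the symbol sequence `symbols`."""
    if not symbols:
        return not words          # success iff input is fully consumed
    first, rest = symbols[0], symbols[1:]
    if first in GRAMMAR:          # nonterminal: try each expansion
        return any(expand(rhs + rest, words) for rhs in GRAMMAR[first])
    # preterminal: must match the next input word
    return bool(words) and words[0] in LEXICON.get(first, ()) \
        and expand(rest, words[1:])

print(expand(["S"], ["they", "hid", "the", "letter"]))  # True
```

Note how the parser happily proposes NP → DT NN for "they hid …" before the input rules it out: top-down search suggests trees that are inconsistent with the words.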
Top Down Space

Bottom-Up Parsing

• Of course, we also want trees that cover the input words. So start with trees that link up with the words in the right way.
• Then work your way up from there.

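The bottom-up strategy can likewise be sketched as brute-force reduction: start from the word string and repeatedly replace any substring matching some rule's right-hand side with its left-hand side, succeeding if everything collapses to S. The toy rule set is mine; the `seen` set just stops re-exploration of sequences already tried.

```python
# Bottom-up parsing in miniature: reduce the word string toward S by
# replacing rule right-hand sides with their left-hand sides.

RULES = [("S", ("NP", "VP")), ("NP", ("DT", "NN")), ("NP", ("PRP",)),
         ("VP", ("VBD", "NP")), ("DT", ("the",)), ("NN", ("letter",)),
         ("PRP", ("they",)), ("VBD", ("hid",))]

def reduces_to_s(seq, seen=None):
    """True if `seq` (a tuple of words/labels) can be reduced to ('S',)."""
    seen = set() if seen is None else seen
    if seq == ("S",):
        return True
    if seq in seen:               # don't re-explore a failed sequence
        return False
    seen.add(seq)
    for lhs, rhs in RULES:        # try every reduction at every position
        n = len(rhs)
        for i in range(len(seq) - n + 1):
            if seq[i:i + n] == rhs:
                if reduces_to_s(seq[:i] + (lhs,) + seq[i + n:], seen):
                    return True
    return False

print(reduces_to_s(("they", "hid", "the", "letter")))  # True
```

The mirror-image weakness shows up here: every reduction is licensed by the words, but the search happily builds constituents (e.g. a VP with no subject in sight) that can never be part of a complete S.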
Bottom-Up Space

Bottom-Up Space

Control

• Of course, in both cases we left out how to keep track of the search space and how to make choices:
 Which node to try to expand next
 Which grammar rule to use to expand a node

Top-Down and Bottom-Up

• Top-down
 Only searches for trees that can be answers (i.e., S’s)
 But also suggests trees that are not consistent with any of the words
• Bottom-up
 Only forms trees consistent with the words
 But suggests trees that make no sense globally

Problems

• Even with the best filtering, backtracking methods are doomed if they don’t address certain problems:
 Ambiguity
 Shared subproblems

Ambiguity

Shared Sub-Problems

• No matter what kind of search (top-down, bottom-up, or mixed) we choose:
 We don’t want to unnecessarily redo work we’ve already done.

Shared Sub-Problems

• Consider
 A flight from Indianapolis to Houston on TWA

Shared Sub-Problems

• Assume a top-down parse making bad initial choices on the Nominal rule.
• In particular…
 Nominal -> Nominal Noun
 Nominal -> Nominal PP

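The standard cure for shared subproblems is dynamic programming: record, in a chart, what each span of the input can be, so every subanalysis (the Nominal over "flight from Indianapolis", say) is built once and reused. A minimal CKY-style recognizer sketch, with a toy Chomsky-normal-form grammar of my own invention:

```python
# Avoiding shared subproblems: a CKY chart answers each "what labels can
# cover words[i:j]?" question exactly once, however many larger analyses
# reuse that span.

def cky(words, lexical, binary):
    n = len(words)
    # chart[i][j] = set of labels covering words[i:j]
    chart = [[set() for _ in range(n + 1)] for _ in range(n + 1)]
    for i, w in enumerate(words):
        chart[i][i + 1] = set(lexical.get(w, ()))
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):            # every split point
                for b in chart[i][k]:
                    for c in chart[k][j]:
                        chart[i][j] |= binary.get((b, c), set())
    return chart

lexical = {"they": {"NP"}, "hid": {"V"}, "the": {"Det"}, "letter": {"N"}}
binary = {("Det", "N"): {"NP"}, ("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
chart = cky(["they", "hid", "the", "letter"], lexical, binary)
print("S" in chart[0][4])  # True
```

Unlike the backtracking searches above, this does O(n³) work regardless of how many bad attachment choices exist, because no span is ever re-parsed.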
Shared Sub-Problems

Shared Sub-Problems

Shared Sub-Problems

Shared Sub-Problems

Dependency Grammars

 More appropriate for languages other than English
 Popular in Europe, India
 Key notion is the thematic relations between words
 Links are often binary in nature

Example 12.14 in book

 They hid the letter on the shelf

  hid
  ├── They
  └── letter
       ├── the
       └── shelf
            └── the
Differences with CFG

• No non-terminals
• Each link is between two lexical nodes
• Typed dependency parse
 Each link can be labeled (about 48 relations; see Fig. 12.15)
• Useful for languages with flexible word order
• Can convert a CFG parse to a dependency parse (see 12.7.1)
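Concretely, a typed dependency parse is just a set of labeled head → dependent arcs over the words, with no nonterminal nodes at all. Below, arcs for the example sentence, with "on" attached as a case marker on "shelf"; the relation names are illustrative placeholders, not the exact inventory of the book's Fig. 12.15.

```python
# A typed dependency parse as data: labeled head -> dependent arcs over
# word positions. Indices are 1-based; head 0 marks the root.

sentence = ["They", "hid", "the", "letter", "on", "the", "shelf"]
arcs = [(0, "root", 2),   # hid is the root
        (2, "nsubj", 1),  # hid -> They
        (2, "obj", 4),    # hid -> letter
        (4, "det", 3),    # letter -> the
        (4, "nmod", 7),   # letter -> shelf (the "on the shelf" modifier)
        (7, "case", 5),   # shelf -> on
        (7, "det", 6)]    # shelf -> the

def dependents(head):
    """Return (relation, word) pairs for a head's dependents."""
    return [(rel, sentence[dep - 1]) for h, rel, dep in arcs if h == head]

print(dependents(2))  # the dependents of "hid"
```

Because the structure is a flat set of arcs rather than a nested bracketing, reordering the words changes positions but not the arcs themselves, which is exactly why this representation travels well across flexible-word-order languages.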
Dependency parsing

• Very important “growth” area for new research
• Helpful in machine translation systems
• Closer to semantics than CFG
• Abstracts away word-order variation

