AAiT
SiTE
Course Title: Natural Language Processing (TSC-7261)
Credit Hour: 3
Instructor: Fantahun B. (PhD) [email protected]
Office: NB #
11/13/2023
POS Tagging and Sequence Labeling
Objectives:
After completing this chapter, students will be able to:
11/13/2023
NLP
Fantahun B.(PhD)
3
POS Tagging
From the earliest linguistic traditions (Yaska and Panini, 5th C. BCE; Aristotle, 4th C. BCE) came the idea that words can be classified into grammatical categories:
parts of speech, also called word classes, POS, POS tags, morphological classes, or lexical tags.
Early inventories already included categories still in use today, such as the adverb, conjunction, participle, and article.
These traditions (the Sanskrit grammarians Yaska and Panini in India, Aristotle and the Stoics in Greece) are the origin of the idea that words can be grouped into classes.
POS Tagging
Part-of-speech tagging is the process of assigning a part-of-speech tag to each word in an input text; the goal is to find the correct tag for the situation.
Example: book
VERB: (Book that flight)
NOUN: (Hand me that book).
POS tagging maps a sequence x1,…,xn of words to a sequence y1,…,yn of POS tags.
POS Tagging
Maps a sequence x1,…,xn of words to a sequence y1,…,yn of POS tags.
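To make the input/output shape concrete, here is a minimal sketch using NLTK's off-the-shelf tagger (this only illustrates the word-to-tag mapping, not the algorithms discussed later in the chapter; it assumes nltk plus its punkt and tagger models are installed).

import nltk
# nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")  # if not yet installed

words = nltk.word_tokenize("Book that flight")   # x1, ..., xn
tags = nltk.pos_tag(words)                       # a list of (word, tag) pairs
print(tags)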
POS Tagging: Significance
What do you think is the significance of POS?
The significance of parts-of-speech lies in the large amount of information they give about a word and its neighbors (knowing whether a word is a noun or a verb tells us which words are likely to occur nearby).
POS Tagging: Significance
Speech synthesis systems: a word's part-of-speech can tell us how to pronounce it.
More examples:
• object: OBject (noun) vs. obJECT (verb)
• discount: DIScount (noun) vs. disCOUNT (verb)
• Which is the noun and which is the verb? INsult vs. inSULT, OVERflow vs. overFLOW, DIScount vs. disCOUNT
POS Tagging: Significance
Information retrieval: POS can also be used in stemming, since knowing a word's part-of-speech helps tell us which morphological affixes it can take.
Parts-of-speech are also very useful for linguistic research; for example, they can be used to help find instances or frequencies of particular constructions in large corpora.
POS Tagging: POS Categories: 1-closed classes
Closed class words:
Closed classes are those that have relatively fixed membership.
For example, prepositions are a closed class because there is a
fixed set of them in English; new prepositions are rarely coined.
Usually function words: short, frequent words with grammatical
function
• determiners: a, an, the
• pronouns: she, he, I
• prepositions: on, under, over, near, by, …
POS Tagging: POS Categories: 2-open classes
Open class words: nouns and verbs are open classes
because new nouns and verbs are continually coined or
borrowed from other languages.
content words: Nouns, Verbs, Adjectives, Adverbs
Interjections: oh, ouch, uh-huh, yes, hello
o New nouns and verbs like iPhone or to fax
POS Tagging: Open vs. closed word classes (overview figure)
Open class ("content") words:
• Nouns: Proper (Janet, Italy), Common (cat, cats, mango)
• Verbs: Main (eat, went)
• Adjectives, Adverbs
• Interjections (Ow, hello)
• … more
Closed class ("function") words:
• Auxiliary verbs (be, can, do, had)
• Numbers (one, 122,312)
• Prepositions (to, with)
• Particles (off, up)
• Pronouns (they, its)
• … more
POS Tagging: Tagsets for English
A commonly used tagset is the 45-tag Penn Treebank tagset.
POS Tagging: Universal dependencies tagset
POS Tagging: Some tagged English sentences
Example:
There/PRO were/VERB 70/NUM children/NOUN there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX reported/VERB in/ADP
today/NOUN ’s/PART New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN
of/IN other/JJ topics/NNS ./.
There/EX are/VBP 70/CD children/NNS there/RB
POS Tagging: Difficulties in tagging
Some tagging distinctions are quite hard for both humans and machines, for example distinguishing prepositions (IN) from particles (RP) and from adverbs (RB).
POS Tagging: Difficulties in tagging
Making these decisions requires sophisticated knowledge of syntax;
tagging manuals (Santorini, 1990) give various heuristics that can help
human coders make these decisions, and that can also provide useful
features for automatic taggers. E.g., two heuristics from Santorini (1990):
Prepositions generally are associated with a following noun phrase (although they may also be followed by prepositional phrases), and the word around is tagged as an adverb when it means "approximately".
Particles often can either precede or follow a noun phrase object:
• She told off/RP her friends.
• She told her friends off/RP.
Prepositions, by contrast, cannot: She stepped off/IN the train, but *She stepped the train off is an ungrammatical sentence.
POS Tagging: Difficulties in tagging
2) Another difficulty is labeling the words that can modify nouns; sometimes the modifiers preceding nouns are themselves common nouns rather than adjectives.
POS Tagging: Difficulties in tagging
Some words can be adjectives, common nouns, or proper nouns, and such modifiers are especially hard to label, for example:
Chinese/NN cooking/NN
Pacific/NN waters/NNS
POS Tagging: Difficulties in tagging
3) A third known difficulty in tagging is distinguishing past participles (VBN) from adjectives (JJ):
• They were married/VBN by the Justice of the Peace yesterday at 5:00.
• At the time, she was already married/JJ.
POS Tagging: Algorithms
Many algorithms have been applied to this problem, including:
Rule-based tagging
Probabilistic / stochastic methods
o HMM tagging
o Maximum entropy tagging
Transformation-based (Brill) tagging
POS Tagging: Algorithms - Rule-Based POS Tagging
The earliest algorithms for automatically assigning POS were based on a two-stage architecture: first, a dictionary assigns each word a list of potential parts-of-speech; second, a large list of hand-written disambiguation rules winnows the list down to a single part-of-speech per word. The EngCG tagger, based on the Constraint Grammar architecture, is a representative example.
POS Tagging: Algorithms - Rule-Based POS Tagging
The EngCG ENGTWOL lexicon is based on two-level morphology; each of its roughly 56,000 English word-stem entries is annotated with morphological and syntactic features.
Fig. 5.11 shows some selected words, together with a slightly simplified listing of their features.
POS Tagging: Algorithms - Rule-Based POS Tagging
Most of the features in Fig. 5.11 are relatively self-explanatory: SG means singular, -SG3 other than third-person-singular.
ABSOLUTE means non-comparative and non-superlative for an adjective; NOINDEFDETERMINER means that a word (like furniture) does not appear with the indefinite determiner a.
SV, SVO, and SVOO specify the subcategorization (complementation) pattern for the verb: SV means the verb appears solely with a subject (nothing occurred); SVO with a subject and an object (I showed the film); SVOO with a subject and two complements (She showed her the ball).
POS Tagging: Algorithms - Rule-Based POS Tagging
In the first stage of the tagger, each word is run through the two-level
lexicon transducer and the entries for all possible parts-of-speech are
returned.
For example the phrase “Pavlov had shown that salivation . . .“ would
return the following list (one reading per line; the correct reading for each word, shown in boldface in the original figure, is marked here with "←"):
Pavlov      PAVLOV N NOM SG PROPER       ←
had         HAVE V PAST VFIN SVO         ←
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV        ←
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS                           ←
salivation  N NOM SG                     ←
...
POS Tagging: Algorithms - Rule-Based POS Tagging
EngCG then applies a large set of constraints (as many as 3,744 constraints in the EngCG-2 system) to the input sentence to rule out incorrect parts-of-speech.
The marked entries in the table above show the desired result, in which the simple past tense tag (rather than the past participle tag) is applied to had, and the complementizer (CS) tag is applied to that.
The constraints are used in a negative way, to eliminate tags that are inconsistent with the context. For example, one constraint eliminates all readings of that except the ADV (adverbial intensifier) sense (the sense in it isn't that odd).
POS Tagging: Algorithms - Rule-Based POS Tagging
Here is a simplified version of the constraint (adapted from Jurafsky & Martin):
ADVERBIAL-THAT RULE
Given input: "that"
if
  (+1 A/ADV/QUANT);   /* if the next word is an adjective, adverb, or quantifier */
  (+2 SENT-LIM);      /* and the word after that is a sentence boundary, */
  (NOT -1 SVOC/A);    /* and the previous word is not a verb like consider, */
                      /* which allows adjectives as object complements */
then eliminate non-ADV tags
else eliminate ADV tag
POS Tagging: Algorithms - Rule-Based POS Tagging
The first two clauses of this rule check to see that the that directly precedes a sentence-final adjective, adverb, or quantifier (as in it isn't that odd); in all other cases the adverb reading is eliminated. The last clause eliminates cases preceded by verbs like consider, which can take an adjective as an object complement (I consider that odd), where that should not be tagged as an adverb.
POS Tagging: Algorithms - Rule-Based POS Tagging
Another rule is used to express the constraint that the complementizer (CS) sense of that is most likely when the previous word is a verb that expects a complement (like believe, think, or show) and that is followed by the beginning of a noun phrase and a finite verb.
POS Tagging: Markov Chains
An HMM is nothing more than a probabilistic function of a
Markov process.
POS Tagging: Markov Chains
Markov chains and Hidden Markov Models are both extensions of finite automata: a weighted finite-state automaton associates a probability with each arc, and a Markov chain is a weighted automaton in which the input sequence uniquely determines the states the automaton goes through. Because it cannot represent inherently ambiguous problems, a Markov chain is only useful for assigning probabilities to unambiguous sequences.
POS Tagging: Markov Chains
Fig. 6.1a shows a Markov chain for assigning a probability to a sequence of weather events, where the vocabulary consists of HOT, COLD, and WARM.
POS Tagging: Markov Chains
Fig. 6.1b shows another simple example of a Markov chain for assigning a probability to a sequence of words w1 … wn; in effect it represents a bigram language model.
POS Tagging: Markov Chains
First-order Markov chain assumptions:
• Markov assumption (limited history): P(q_i | q_1 … q_{i−1}) = P(q_i | q_{i−1})
• The outgoing transition probabilities from any state sum to 1: Σ_j a_ij = 1
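A minimal sketch of what the Markov assumption buys us: the probability of a whole state sequence factors into a start probability times one transition probability per step. The transition numbers below are made-up illustrative values, not the ones in Fig. 6.1.

start = {"HOT": 0.6, "COLD": 0.4}                      # initial distribution
trans = {
    "HOT":  {"HOT": 0.7, "COLD": 0.3},
    "COLD": {"HOT": 0.4, "COLD": 0.6},
}

def sequence_probability(states):
    """P(q1..qn) = P(q1) * prod_i P(q_i | q_{i-1})  (Markov assumption)."""
    p = start[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= trans[prev][cur]
    return p

print(sequence_probability(["HOT", "HOT", "COLD"]))    # 0.6 * 0.7 * 0.3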
POS Tagging: Hidden Markov Models
In an HMM, you don't know the state sequence that the model passes through; you only see some probabilistic function of it (the observations).
POS Tagging: Hidden Markov Models
An HMM allows us to talk about both observed events (like words that we see in the input) and hidden events (like POS tags) that we think of as causal factors in our probabilistic model.
To exemplify these models, consider the ice-cream task of Eisner (2002a):
Imagine that you are a climatologist in the year 2799 studying the history of
global warming. You cannot find any records of the weather in Baltimore,
Maryland, for the summer of 2007, but you do find Jason Eisner’s diary, which
lists how many ice creams Jason ate every day that summer. Our goal is to
use these observations to estimate the temperature every day. We’ll simplify
this weather task by assuming there are only two kinds of days: cold (C) and
hot (H).
POS Tagging: Hidden Markov Models
Formal definition of an HMM
An HMM is specified by the following components:
• Q = q_1 q_2 … q_N: a set of N states
• A = a_11 … a_ij … a_NN: a transition probability matrix, where a_ij is the probability of moving from state i to state j, with Σ_j a_ij = 1 for all i
• O = o_1 o_2 … o_T: a sequence of T observations, each drawn from a vocabulary V
• B = b_i(o_t): a sequence of observation likelihoods (emission probabilities), each expressing the probability of observation o_t being generated from state i
• q_0, q_F: a special start state and end (final) state not associated with observations
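A minimal container mirroring the (Q, A, B, π) definition above, instantiated for the ice-cream task. The parameter values are illustrative placeholders, not necessarily the exact numbers of Fig. 6.3.

from dataclasses import dataclass
from typing import Dict, List

@dataclass
class HMM:
    states: List[str]                      # Q
    trans: Dict[str, Dict[str, float]]     # A: P(next_state | state)
    emit: Dict[str, Dict[int, float]]      # B: P(observation | state)
    start: Dict[str, float]                # pi: P(state | start)

ice_cream_hmm = HMM(
    states=["HOT", "COLD"],
    trans={"HOT": {"HOT": 0.6, "COLD": 0.4},
           "COLD": {"HOT": 0.5, "COLD": 0.5}},
    emit={"HOT": {1: 0.2, 2: 0.4, 3: 0.4},
          "COLD": {1: 0.5, 2: 0.4, 3: 0.1}},
    start={"HOT": 0.8, "COLD": 0.2},
)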
POS Tagging: Hidden Markov Models
A first-order Hidden Markov Model instantiates two simplifying assumptions:
1) As with a first-order Markov chain, the probability of a particular state depends only on the previous state (Markov assumption): P(q_i | q_1 … q_{i−1}) = P(q_i | q_{i−1})
2) The probability of an output observation o_i depends only on the state that produced the observation, q_i, and not on any other states or any other observations (output independence): P(o_i | q_1 … q_T, o_1 … o_T) = P(o_i | q_i)
Fig. 6.3 shows a sample HMM for the ice cream task. The two hidden states (H and C)
correspond to hot and cold weather, while the observations (drawn from the
alphabet O = {1,2,3}) correspond to the number of ice creams eaten by Jason on a
given day.
POS Tagging: Hidden Markov Models
Notice that in the HMM in Fig. 6.3, there is a (non-zero) probability of transitioning between any two states; such an HMM is called a fully connected or ergodic HMM.
Sometimes, however, many transitions have zero probability: in a left-to-right (Bakis) HMM there are no transitions from a higher-numbered state to a lower-numbered state. Such HMMs are generally used to model temporal processes like speech.
POS Tagging: Hidden Markov Models
Hidden Markov Models are characterized by three fundamental problems:
1) Likelihood: given an HMM λ = (A,B) and an observation sequence O, determine the likelihood P(O|λ).
2) Decoding: given an observation sequence O and an HMM λ = (A,B), discover the best hidden state sequence Q.
3) Learning: given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.
HMMs: Computing Likelihood: The Forward Algorithm
For example, given the HMM in Fig. 6.2b, what is the probability of the ice-cream sequence 3 1 3?
For a Markov chain, where the surface observations are the same as the hidden events, we could compute the probability of 3 1 3 just by following the states labeled 3 1 3 and multiplying the probabilities along the arcs. For a Hidden Markov Model things are not so simple, because we do not know what the hidden (weather) state sequence was.
HMMs: Computing Likelihood: The Forward Algorithm
Simpler case: suppose we already knew the weather and wanted to compute the probability of an ice-cream sequence.
First, recall that for Hidden Markov Models each hidden state produces only a single observation, so the likelihood of the observations given a particular hidden state sequence Q is P(O|Q) = Π_{i=1..T} P(o_i | q_i).
HMMs: Computing Likelihood: The Forward Algorithm
First, let's compute the joint probability of being in a particular weather sequence Q and generating a particular sequence O of ice-cream events: P(O, Q) = P(O|Q) × P(Q) = Π_i P(o_i | q_i) × Π_i P(q_i | q_{i−1}).
HMMs: Computing Likelihood: The Forward Algorithm
Now, we can compute the total probability of the observations by summing over all possible hidden state sequences:
P(O) = Σ_Q P(O, Q) = Σ_Q P(O|Q) P(Q)          (6.13)
For an HMM with N hidden states and an observation sequence of T observations, however, there are N^T possible hidden sequences, so this direct computation is infeasible for realistic problems.
HMMs: Computing Likelihood: The Forward Algorithm
Solution: we use the forward algorithm, an efficient O(N²T) dynamic-programming algorithm that folds the N^T paths into a trellis.
HMMs: Computing Likelihood: The Forward Algorithm
Each cell of the forward algorithm trellis, α_t(j), represents the probability of being in state j after seeing the first t observations, given the automaton λ: α_t(j) = P(o_1, o_2, …, o_t, q_t = j | λ).
Fig. 6.7 shows an example of the forward trellis for computing the likelihood of 3 1 3 with the ice-cream HMM.
HMMs: Computing Likelihood: The Forward Algorithm
For a given state q_j at time t, the value α_t(j) is computed as:
α_t(j) = Σ_{i=1..N} α_{t−1}(i) a_ij b_j(o_t)          (6.15)
The three factors multiplied in Eq. 6.15 when extending the previous paths to compute the forward probability at time t are:
• α_{t−1}(i): the previous forward path probability from the previous time step
• a_ij: the transition probability from previous state q_i to current state q_j
• b_j(o_t): the state observation likelihood of the observation symbol o_t given the current state j
HMMs: Computing Likelihood: The Forward Algorithm
We give two formal definitions of the forward algorithm: the pseudocode in Fig. 6.9 and the definitional recursion here:
1. Initialization: α_1(j) = a_{0j} b_j(o_1),   1 ≤ j ≤ N
2. Recursion: α_t(j) = Σ_{i=1..N} α_{t−1}(i) a_ij b_j(o_t),   1 ≤ j ≤ N, 1 < t ≤ T
3. Termination: P(O|λ) = α_T(q_F) = Σ_{i=1..N} α_T(i) a_{iF}
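A sketch of this recursion in Python, using the same illustrative ice-cream parameters as the HMM sketch earlier. For simplicity it omits the explicit final state and terminates with Σ_i α_T(i).

states = ["HOT", "COLD"]
start = {"HOT": 0.8, "COLD": 0.2}
trans = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(obs):
    # Initialization: alpha_1(j) = pi_j * b_j(o_1)
    alpha = [{j: start[j] * emit[j][obs[0]] for j in states}]
    # Recursion: alpha_t(j) = sum_i alpha_{t-1}(i) * a_ij * b_j(o_t)
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({j: sum(prev[i] * trans[i][j] for i in states) * emit[j][o]
                      for j in states})
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha[-1].values()), alpha

likelihood, _ = forward([3, 1, 3])
print(likelihood)   # P(3 1 3 | lambda) under these toy parameters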
HMMs: Decoding: The Viterbi Algorithm
Decoding: given as input an HMM λ = (A,B) and a sequence of observations O = o_1, o_2, …, o_T, find the most probable sequence of states Q = q_1 q_2 … q_T.
HMMs: Decoding: The Viterbi Algorithm
We might propose to find the best sequence as follows:
1. for each possible hidden state sequence (HHH, HHC, HCH, etc.), run the forward algorithm and compute the likelihood of the observation sequence given that hidden state sequence;
2. then choose the hidden state sequence with the maximum observation likelihood.
But this is infeasible, since there are exponentially many (N^T) state sequences. Instead, the most common decoding algorithm for HMMs is the Viterbi algorithm, another dynamic-programming algorithm.
HMMs: Decoding: The Viterbi Algorithm
Fig. 6.10 shows an example of the Viterbi trellis for computing the best hidden state sequence for the observation sequence 3 1 3; the idea is to process the observations left to right, filling out the trellis.
Each cell of the Viterbi trellis, v_t(j), represents the probability that the HMM is in state j after seeing the first t observations and passing through the most probable state sequence q_0, q_1, …, q_{t−1}, given the automaton λ.
The value of each cell is computed by recursively taking the most probable path that could lead us to this cell. Formally, each cell expresses the probability:
v_t(j) = max_{q_0,q_1,…,q_{t−1}} P(q_0, q_1, …, q_{t−1}, o_1, o_2, …, o_t, q_t = j | λ)
HMMs: Decoding: The Viterbi Algorithm
For a given state q_j at time t, the value v_t(j) is computed as:
v_t(j) = max_{i=1..N} v_{t−1}(i) a_ij b_j(o_t)          (6.20)
The three factors multiplied in Eq. 6.20 when extending the previous paths to compute the Viterbi probability at time t are:
• v_{t−1}(i): the previous Viterbi path probability from the previous time step
• a_ij: the transition probability from previous state q_i to current state q_j
• b_j(o_t): the state observation likelihood of the observation symbol o_t given the current state j
HMMs: Decoding: The Viterbi Algorithm
Note that the Viterbi algorithm is identical to the forward algorithm
except that it takes the max over the previous path probabilities
where the forward algorithm takes the sum.
Note also that the Viterbi algorithm has one component that the forward algorithm doesn't have: backpointers. While the forward algorithm only needs to produce an observation likelihood, the Viterbi algorithm must also produce the most likely state sequence; we do this by keeping track of the path of hidden states that led to each state and then, at the end, backtracing the best path to the beginning (the Viterbi backtrace).
HMMs: Decoding: The Viterbi Algorithm
Finally, we can give a formal definition of the Viterbi recursion as follows:
1. Initialization: v_1(j) = a_{0j} b_j(o_1); bt_1(j) = 0,   1 ≤ j ≤ N
2. Recursion: v_t(j) = max_{i=1..N} v_{t−1}(i) a_ij b_j(o_t); bt_t(j) = argmax_{i=1..N} v_{t−1}(i) a_ij b_j(o_t)
3. Termination: the best score is P* = max_{i=1..N} v_T(i), and the backtrace starts at q_T* = argmax_{i=1..N} v_T(i)
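A sketch of the same recursion in Python: identical to the forward sketch except that sums become maxes and backpointers record the best previous state. Parameters are the same illustrative ice-cream values as before.

states = ["HOT", "COLD"]
start = {"HOT": 0.8, "COLD": 0.2}
trans = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(obs):
    v = [{j: start[j] * emit[j][obs[0]] for j in states}]   # v_1(j)
    backptr = [{j: None for j in states}]
    for o in obs[1:]:
        prev = v[-1]
        col, bp = {}, {}
        for j in states:
            # v_t(j) = max_i v_{t-1}(i) * a_ij * b_j(o_t)
            best_i = max(states, key=lambda i: prev[i] * trans[i][j])
            col[j] = prev[best_i] * trans[best_i][j] * emit[j][o]
            bp[j] = best_i
        v.append(col)
        backptr.append(bp)
    # Termination: pick the best final state, then follow backpointers
    last = max(states, key=lambda j: v[-1][j])
    path = [last]
    for bp in reversed(backptr[1:]):
        path.append(bp[path[-1]])
    return v[-1][last], list(reversed(path))

print(viterbi([3, 1, 3]))   # (best path probability, best state sequence)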
Training HMMs: The Forward-Backward Algorithm
Learning: given an observation sequence O and the set of possible states in the HMM, learn the HMM parameters A and B.
The standard algorithm for HMM training is the forward-backward or Baum-Welch algorithm, a special case of the Expectation-Maximization (EM) algorithm.
Training HMMs: The Forward-Backward Algorithm
Simpler case: training a Markov chain rather than an HMM.
Since the states in a Markov chain are observed, we can run the model on the observation sequence and directly see which path we took; a Markov chain has no emission probabilities B, so the only parameters to train belong to the transition probability matrix A.
Training HMMs: The Forward-Backward Algorithm
We get the maximum likelihood estimate of the probability a_ij of a particular transition between states i and j by counting the number of times that transition was taken and normalizing by all transitions out of i: a_ij = C(i→j) / Σ_q C(i→q).
For an HMM we cannot compute these counts directly, because we don't know which path of states the model went through; this is the training problem.
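A sketch of this maximum-likelihood estimate for the observable case, e.g. estimating tag-to-tag transitions from a tagged corpus. The tiny "corpus" below is a hypothetical example.

from collections import Counter, defaultdict

tag_sequences = [
    ["DT", "NN", "VB", "DT", "NN"],
    ["PRP", "VB", "DT", "NN"],
]

counts = defaultdict(Counter)
for tags in tag_sequences:
    for prev, cur in zip(tags, tags[1:]):
        counts[prev][cur] += 1          # C(i -> j)

# a_ij = C(i -> j) / sum_q C(i -> q)
a = {i: {j: c / sum(cs.values()) for j, c in cs.items()}
     for i, cs in counts.items()}
print(a["DT"])   # {'NN': 1.0}: in this toy data DT is always followed by NN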
Training HMMs: The Forward-Backward Algorithm
The Baum-Welch algorithm uses two neat intuitions to solve this problem.
The first is to estimate the counts iteratively: we start with an estimate for the transition and observation probabilities and then use these estimates to derive better and better probabilities.
The second is to get our estimated probabilities by computing the forward probability for an observation and then dividing that probability mass among all the different paths that contributed to this forward probability.
Training HMMs: The Forward-Backward Algorithm
The backward probability β is the probability of seeing the observations from time t+1 to the end, given that we are in state i at time t (and of course given the automaton λ):
β_t(i) = P(o_{t+1}, o_{t+2}, …, o_T | q_t = i, λ)
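A sketch of the backward pass, the mirror image of the forward sketch above (same illustrative ice-cream parameters).

states = ["HOT", "COLD"]
trans = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
emit = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def backward(obs):
    T = len(obs)
    beta = [dict() for _ in range(T)]
    beta[T - 1] = {i: 1.0 for i in states}          # initialization
    for t in range(T - 2, -1, -1):
        # beta_t(i) = sum_j a_ij * b_j(o_{t+1}) * beta_{t+1}(j)
        beta[t] = {i: sum(trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                          for j in states)
                   for i in states}
    return beta

print(backward([3, 1, 3])[0])   # beta_1(i) for each state i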
Training HMMs: The Forward-Backward Algorithm
We are now ready to understand how the forward and backward probabilities can help us estimate the transition probabilities a_ij and observation probabilities b_i(o_t) from an observation sequence, even though the actual path taken through the model is hidden.
POS Tagging: Hidden Markov Models
Sources of information in tagging
1) Syntagmatic information: the tags of other words in the context of the word we are interested in. An early tagger that used only such information about syntagmatic patterns correctly tagged only 77% of words.
2) Lexical information: the word itself.
Charniak et al. (1993) showed that a 'dumb' tagger that simply assigns the most common tag to each word performs at the surprisingly high level of 90% correct.
As a result, the performance of such a 'dumb' tagger has been used as a baseline performance level in subsequent studies.
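A sketch of that baseline: assign every word its most frequent tag in a tagged training corpus. The training data here is a hypothetical toy corpus; unknown words simply fall back to NN.

from collections import Counter, defaultdict

train = [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("to", "TO"),
         ("race", "VB"), ("the", "DT"), ("a", "DT"), ("race", "NN")]

freq = defaultdict(Counter)
for word, tag in train:
    freq[word][tag] += 1
most_likely = {w: c.most_common(1)[0][0] for w, c in freq.items()}

def baseline_tag(words):
    return [(w, most_likely.get(w, "NN")) for w in words]

print(baseline_tag(["the", "race"]))   # race -> NN, its most frequent tag here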
POS Tagging: Hidden Markov Models
Sources of information in Tagging
All modern taggers in some way make use of a combination of
syntagmatic information (looking at information about tag sequences) and
lexical information (predicting a tag based on the word concerned).
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
The HMM tagging algorithm chooses as the most likely tag sequence the one that maximizes the product of the tag transition probabilities and the word likelihoods.
For this example, we will use the 87-tag Brown corpus tagset,
because it has a specific tag for to, TO, used only when to is an
infinitive; prepositional uses of to are tagged as IN.
Example:
(5.36) Secretariat/NNP is/BEZ expected/VBN to/TO race/VB tomorrow/NR
(5.37) People/NNS continue/VB to/TO inquire/VB the/AT reason/NN for/IN
the/AT race/NN for/IN outer/JJ space/NN
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
Let’s look at how race can be correctly tagged as a VB instead
of an NN in (5.36).
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
Almost all the probabilities in these two sequences are identical; the choice hinges on the probabilities involving race and its neighboring tags.
The tag transition probabilities P(NN|TO) and P(VB|TO) answer the question: how likely is a noun (or a verb) given the previous tag TO? Corpus counts (Jurafsky & Martin quote estimates from the combined Brown and Switchboard corpora) give the following probabilities, showing that verbs are about 500 times as likely as nouns to occur after TO:
P(NN|TO) = .00047
P(VB|TO) = .83
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
Let’s now turn to P(wi|ti), the lexical likelihood of the word race given
a part-of-speech tag.
For the two possible tags VB and NN, these correspond to the probabilities P(race|VB) and P(race|NN), answering the question: if we were expecting a verb (or a noun), how likely is it that it would be race?
P(race|NN) = .00057
P(race|VB) = .00012
Finally, we need the tag transition probability for the following tag (NR for tomorrow):
P(NR|VB) = .0027
P(NR|NN) = .0012
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
If we multiply the lexical likelihoods with the tag sequence probabilities, the verb reading wins:
P(VB|TO) P(NR|VB) P(race|VB) = .00000027
P(NN|TO) P(NR|NN) P(race|NN) = .00000000032
so the HMM tagger correctly tags race as a VB in (5.36), even though VB is the less likely sense of race overall.
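The arithmetic, spelled out with the estimates quoted on the preceding slides:

# Combining the probabilities quoted above (Brown/Switchboard estimates
# from Jurafsky & Martin):
p_vb = 0.83 * 0.0027 * 0.00012     # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057  # P(NN|TO) * P(NR|NN) * P(race|NN)
print(p_vb)   # ~2.7e-07
print(p_nn)   # ~3.2e-10: the VB reading wins by roughly three orders of magnitude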
POS Tagging: Algorithms - Transformation-Based Tagging
Transformation-Based Tagging, sometimes called Brill tagging, is an instance of the Transformation-Based Learning (TBL) approach (Brill, 1995). It draws inspiration from both rule-based and stochastic taggers: like rule-based taggers it uses rules that specify which tags should be assigned to which words, but like stochastic taggers it is a machine learning technique in which the rules are automatically induced from a previously tagged training corpus.
POS Tagging: Algorithms - Transformation-Based Tagging
Imagine an artist painting a picture of a white house with green trim
against a blue sky. Suppose most of the picture was sky, and hence
most of the picture was blue.
The artist might begin by using a very broad brush and painting the entire canvas blue.
Next she might switch to a somewhat smaller white brush and paint the entire house white; she would just color in the whole house, not worrying about the brown roof, or the blue windows, or the green gables.
Next she takes a smaller brown brush and colors over the roof.
Now she takes up the blue paint on a small brush and paints in the blue windows on the house.
Finally she takes a very fine green brush and does the trim on the gables.
POS Tagging: Algorithms - Transformation-Based Tagging
The painter starts with a broad brush that covers a lot of the canvas
but colors a lot of areas that will have to be repainted. The next layer
colors less of the canvas, but also makes fewer "mistakes". Each new
layer uses a finer brush that corrects less of the picture, but makes
fewer mistakes.
TBL uses somewhat the same method as this painter.
The TBL algorithm has a set of tagging rules. A corpus is first tagged
using the broadest rule, that is, the one that applies to the most
cases. Then a slightly more specific rule is chosen, which changes
some of the original tags. Next, an even narrower rule is applied, which
changes a smaller number of tags (some of which might be
previously changed tags).
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
Let’s look at one of the rules used by Brill’s (1995) tagger. Before
the rules apply, the tagger labels every word with its most-likely
tag. We get these most-likely tags from a tagged corpus. For
example, in the Brown corpus, race is most likely to be a noun:
P(NN|race) = .98
P(VB|race) = .02
This means that the two examples of race that we saw above will both initially be coded as NN.
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
In the first case, this is a mistake, as NN is the incorrect tag: Secretariat/NNP is/BEZ expected/VBN to/TO race/NN tomorrow/NR.
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
After selecting the most-likely tag, Brill’s tagger applies its
transformation rules.
As it happens, Brill's tagger learned a rule that applies exactly to this situation: Change NN to VB when the previous tag is TO. This changes race/NN to race/VB in (5.36), since race there follows to/TO.
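A sketch of applying that single transformation to a sentence that was first given most-likely tags:

def apply_to_vb_rule(tagged):
    """Change NN to VB when the previous tag is TO."""
    out = list(tagged)
    for k in range(1, len(out)):
        word, tag = out[k]
        if tag == "NN" and out[k - 1][1] == "TO":
            out[k] = (word, "VB")          # transformation fires here
    return out

initial = [("expected", "VBN"), ("to", "TO"), ("race", "NN"), ("tomorrow", "NR")]
print(apply_to_vb_rule(initial))
# race/NN becomes race/VB because the previous tag is TO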
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
Brill’s TBL algorithm has three major stages.
1. It labels every word with its most-likely tag.
2. It examines every possible transformation and selects the one that results in the most improved tagging.
3. It then re-tags the data according to this rule.
These stages are repeated until some stopping criterion is reached, such as insufficient improvement over the previous pass.
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
The output of the TBL process is an ordered list of transformations, which together constitute a tagging procedure that can be applied to new text.
In principle the set of possible transformations is infinite, yet TBL must consider every possible transformation in order to pick the best one on each pass through the algorithm. Thus the algorithm needs a way to limit the set of transformations. This is done by designing a small set of templates (abstracted transformations); every allowable transformation is an instantiation of one of the templates.
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
Brill's set of templates is listed in Fig. 5.20, and Fig. 5.21 gives the details of the algorithm for learning transformations.
POS Tagging: Maximum Entropy Models
A second probabilistic machine learning framework that can be used for sequence classification is Maximum Entropy (MaxEnt) modeling, also known as multinomial logistic regression.
POS Tagging: Maximum Entropy Models
The task of classification is to take a single observation, extract some useful features describing it, and then, based on these features, classify the observation into one of a set of discrete classes (e.g., whether a word should be tagged NN or not).
POS Tagging: Maximum Entropy Models
MaxEnt belongs to the family of classifiers known as the exponential or
log-linear classifiers.
Let's flesh out this intuition just a bit more. Assume that we have some input x (for example, the word race together with its context) about which we want to make a classification decision. From x we extract a set of features f_i(c, x), each with an associated weight w_i; MaxEnt combines them linearly (multiplying each feature by its weight and summing) and uses that sum as an exponent.
POS Tagging: Maximum Entropy Models
Given the features and weights, our goal is to choose a class (for
example a POS-tag) for the word. MaxEnt does this by choosing the
most probable tag; the probability of a particular class c given the
observation x is:
P(c|x) = exp( Σ_i w_i f_i(c,x) ) / Σ_{c′ ∈ C} exp( Σ_i w_i f_i(c′,x) )
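A sketch of this equation in Python: exponentiate the weighted feature sum for each class and normalize. The two features and their weights are hypothetical illustrative values for tagging the word race.

import math

def f1(c, x):   # word is "race" and candidate class is VB
    return 1 if c == "VB" and x["word"] == "race" else 0

def f2(c, x):   # previous tag is TO and candidate class is VB
    return 1 if c == "VB" and x["prev_tag"] == "TO" else 0

features = [f1, f2]
weights = [0.8, 0.6]          # hypothetical weights w_i
classes = ["VB", "NN"]

def maxent_prob(c, x):
    score = {cls: math.exp(sum(w * f(cls, x) for w, f in zip(weights, features)))
             for cls in classes}
    return score[c] / sum(score.values())

x = {"word": "race", "prev_tag": "TO"}
print({c: round(maxent_prob(c, x), 3) for c in classes})   # probabilities sum to 1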
POS Tagging: Maximum Entropy Models
Multinomial logistic regression is usually called MaxEnt in speech and language processing; the name comes from the principle of maximum entropy from which the same model can be derived.
POS Tagging: Maximum Entropy Models
We need to make one more change to see the final MaxEnt equation: in language processing the features are usually binary indicator functions f_i(c, x) that depend both on a property of the observation x and on the class c, taking the value 1 when the property holds and 0 otherwise.
POS Tagging: Maximum Entropy Models
Example features: POS
POS Tagging: Maximum Entropy Models
Two more part-of-speech tagging features might focus on aspects of a word's spelling and case, for example whether it ends in -ing or whether it is lower-cased.
POS Tagging: Maximum Entropy Models
Since each feature is dependent on both a property of the observation
and the class being labeled, we would need to have separate features
for, e.g., the link between race and VB, or the link between a previous
TO and NN:
POS Tagging: Maximum Entropy Models
Each of these features has a corresponding weight. Thus the weight
w1(c,x) would indicate how strong a cue the word race is for the tag VB,
the weight w2(c,x) would indicate how strong a cue the previous tag TO
is for the current word being a VB, and so on.
POS Tagging: Maximum Entropy Models
Let's assume that the feature weights for the two classes VB and NN have been learned from a training corpus.
POS Tagging: Maximum Entropy Models
Notice that when we use MaxEnt to perform hard classification, we don't actually need to compute the probabilities: the normalizing denominator is the same for every class, so we can simply pick the class whose exponentiated weighted feature sum is largest.
POS Tagging: Maximum Entropy Models
But as we'll see later in Sec. 6.8, we generally don't use MaxEnt for hard classification.
Instead, MaxEnt is commonly used as part of sequence classification, where we want not the best single class for one unit, but the best total sequence.
For this task, it's useful to exploit the entire probability distribution over classes for each individual unit, to help find the best overall sequence.
POS Tagging: Maximum Entropy Models
The features we have described so far express a single binary property of an observation; but it is often useful to create features that express combinations of properties, for example:
For example a word starting with a capital letter (eg. Day) is more likely to be a
proper noun (NNP) than a common noun (eg. in United Nations Day). But a word
which is capitalized but which occurs at the beginning of the sentence (the
previous word is <s>), as in Day after day...., is not more likely to be a proper
noun.
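A sketch of such indicator features, including a combination feature that fires only for a capitalized word that is not sentence-initial. The feature names and the observation structure are hypothetical.

def f_capitalized(c, x):
    return 1 if c == "NNP" and x["word"][0].isupper() else 0

def f_cap_not_sentence_initial(c, x):
    return 1 if (c == "NNP" and x["word"][0].isupper()
                 and x["prev_word"] != "<s>") else 0

x1 = {"word": "Day", "prev_word": "Nations"}   # ... United Nations Day ...
x2 = {"word": "Day", "prev_word": "<s>"}       # Day after day ...
print(f_capitalized("NNP", x2))                # 1: plain capitalization fires even sentence-initially
print(f_cap_not_sentence_initial("NNP", x1))   # 1: capitalization is informative here
print(f_cap_not_sentence_initial("NNP", x2))   # 0: sentence-initial capital is discounted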
POS Tagging: Maximum Entropy Models
A key to successful use of MaxEnt is thus the design of appropriate features and feature combinations.
POS Tagging: Maximum Entropy Models (Learning)
Learning a MaxEnt model can be done via a generalization of the maximum likelihood principle: choose the weights w that maximize the log probability of the training data.
Regularized version: ŵ = argmax_w Σ_j log P(y^(j) | x^(j)) − α Σ_i w_i²
POS Tagging
Bibliography
• D. Jurafsky and J. H. Martin, Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition.
• C. Manning and H. Schütze, Foundations of Statistical Natural Language Processing.