
AAU

AAiT
SiTE
Course Title: Natural Language Processing (TSC-7261)
Credit Hour: 3
Instructor: Fantahun B. (PhD)  [email protected]
Office: NB #

3-Parts of Speech Tagging and Sequence Labeling


2023/2024, AA
POS Tagging and Sequence Labeling
Contents
 POS Tagging
 Lexical syntax
 Hidden Markov Models
 Maximum Entropy Models

11/13/2023

NLP

Fantahun B.(PhD)

2
POS Tagging and Sequence Labeling
Objectives:
After completing this chapter, students will be able to:

11/13/2023

NLP

Fantahun B.(PhD)

3
POS Tagging
 From the earliest linguistic traditions (Yaska and Panini 5th C. BCE,
Aristotle 4th C. BCE), the idea that words can be classified into
grammatical categories
 part of speech, word classes, POS, POS tags, morphological classes,
or lexical tags

 8 parts of speech attributed to Dionysius Thrax of Alexandria (c.


1st C. BCE):
 Noun,
 Verb,
 Pronoun,
 Preposition,

 Adverb,
 Conjunction,
 Participle,
 Article

 These categories are relevant for NLP today.


11/13/2023

NLP

Fantahun B.(PhD)

From the earliest linguistic traditions (the Sanskrit grammarians Yaska and Panini in India, and Aristotle and the Stoics in Greece) came the idea that words can be classified into grammatical categories.
POS Tagging
 Part-of-speech tagging is the process of assigning a part-

of-speech to each word in a text.

 Tagging is a disambiguation task. Why?

  Words are ambiguous: they can have more than one possible part-of-speech, and the goal is to find the correct tag for the situation.

 Example book:
 VERB: (Book that flight)
 NOUN: (Hand me that book).
 Maps from sequence x1,…,xn of words to

y1,…,yn of POS tags

11/13/2023

NLP

Fantahun B.(PhD)

5
POS Tagging

A sketch of part of speech tagging.


 Input: a sequence x1,x2,...,xn of (tokenized) words and a tagset
 Output: a sequence y1,y2,...,yn of tags, each output yi corresponding exactly to one input xi
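
As a quick, hedged illustration of this input/output mapping (not part of the original slides), an off-the-shelf tagger such as NLTK's can be used; this assumes the nltk package and its tokenizer/tagger resources are installed:

```python
# A minimal sketch of the tagging interface: tokens in, (token, tag) pairs out.
# Assumes `pip install nltk` and that the punkt and averaged_perceptron_tagger
# resources have been downloaded.
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

for sentence in ["Book that flight.", "Hand me that book."]:
    tokens = nltk.word_tokenize(sentence)   # x1, ..., xn
    print(nltk.pos_tag(tokens))             # [(x1, y1), ..., (xn, yn)], Penn Treebank tags
```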

11/13/2023

NLP

Fantahun B.(PhD)

6
POS Tagging
 Map from sequence x1,…,xn of words to y1,…,yn of POS tags

11/13/2023

NLP

Fantahun B.(PhD)

7
POS Tagging: Significance
What do you think is the significance of POS?
 The significance of parts-of-speech is the large amount of

information they give about a word and its neighbors.

 Useful in a language model for speech recognition.


 Eg. tagsets distinguish between possessive pronouns (my, your, his,

her, its) and personal pronouns (I, you, he, me).


• possessive pronouns  likely to be followed by a noun,
• personal pronouns  likely to be followed by a verb.

11/13/2023

NLP

Fantahun B.(PhD)

8
POS Tagging: Significance
 Speech synthesis system: a word’s part-of-speech can tell us

something about how the word is pronounced.

 Example, the word content can be a noun or an adjective.

They are pronounced differently


• As noun  pronounced CONtent,
• As adjective  pronounced conTENT.

 Thus knowing the POS can produce more natural pronunciations in

a speech synthesis system and more accuracy in a speech


recognition system.

 More examples
• Object: OBject (noun) and obJECT (verb)
• Discount: DIScount (noun) and disCOUNT (verb)
• INsult vs. inSULT? OVERflow vs. overFLOW? DIScount vs. disCOUNT?
11/13/2023

NLP

Fantahun B.(PhD)

9
POS Tagging: Significance
 Information Retrieval: POS can also be used in stemming for information retrieval (IR), since knowing a word’s POS can help tell us which morphological affixes it can take. POS tags can also enhance an IR application by selecting out nouns or other important words from a document.
 Parsing, WSD: Automatic assignment of POS plays a role in parsing,

in word-sense disambiguation algorithms, and in shallow parsing of


texts to quickly find names, times, dates, or other named entities
for the information extraction applications.
 Linguistic research: corpora that have been marked for POS are

very useful for linguistic research. For example, they can be used
to help find instances or frequencies of particular constructions.

11/13/2023

NLP

Fantahun B.(PhD)

10
POS Tagging: POS Categories: 1-closed classes
 Closed class words:
 Closed classes are those that have relatively fixed membership.
 For example, prepositions are a closed class because there is a
fixed set of them in English; new prepositions are rarely coined.
 Usually function words: short, frequent words with grammatical
function
• determiners: a, an, the
• pronouns: she, he, I
• prepositions: on, under, over, near, by, …

11/13/2023

NLP

Fantahun B.(PhD)

11
POS Tagging: POS Categories: 2-open classes
 Open class words: nouns and verbs are open classes
because new nouns and verbs are continually coined or
borrowed from other languages.
 content words: Nouns, Verbs, Adjectives, Adverbs
 Interjections: oh, ouch, uh-huh, yes, hello
o New nouns and verbs like iPhone or to fax

 There are four major open classes that occur in the

languages of the world;


 nouns, verbs, adjectives, and adverbs.
 English has all four of these, although not every language does.
11/13/2023

NLP

Fantahun B.(PhD)

12
POS Tagging and Sequence Labeling
Open class ("content") words
 Nouns
  Proper: Janet, Italy
  Common: cat, cats, mango
 Verbs
  Main: eat, went
 Adjectives: old, green, tasty
 Adverbs: slowly, yesterday
 Numbers: 122,312, one
 … more

Closed class ("function") words
 Determiners: the, a, some
 Conjunctions: and, or
 Pronouns: they, its
 Auxiliary: be, can, do, had
 Prepositions: to, with
 Particles: off, up
 Interjections: Ow, hello
 … more

11/13/2023

NLP

Fantahun B.(PhD)

See Section-5.1 page-124-130


13
POS Tagging: Tagsets for English
 There are a small number of popular tagsets for English, many

of which evolved from


 the 87-tag tagset used for the Brown corpus (Francis, 1979; Francis and Kučera, 1982).


 This corpus was tagged with POS by first applying the TAGGIT

program and then hand-correcting the tags.

 Besides this original Brown tagset, other most common tagsets:


 the small 45-tag Penn Treebank tagset (Marcus et al., 1993), and
 the medium-sized 61-tag C5 tagset used by the Lancaster UCREL project’s CLAWS (Constituent Likelihood Automatic Word-tagging System) tagger to tag the British National Corpus (BNC) (Garside et al., 1997).

11/13/2023

NLP

Fantahun B.(PhD)

14
POS Tagging: Tagsets for English

[Figure: the 45-tag Penn Treebank tagset]

11/13/2023

NLP

Fantahun B.(PhD)

15
POS Tagging: Universal dependencies tagset
[Figure: the Universal Dependencies tagset]

11/13/2023

NLP

Fantahun B.(PhD)

16
POS Tagging: Some tagged English sentences
Example:
There/PRO were/VERB 70/NUM children/NOUN there/ADV ./PUNC
Preliminary/ADJ findings/NOUN were/AUX reported/VERB in/ADP
today/NOUN ’s/PART New/PROPN England/PROPN Journal/PROPN
of/ADP Medicine/PROPN
The/DT grand/JJ jury/NN commented/VBD on/IN a/DT number/NN
of/IN other/JJ topics/NNS ./.
There/EX are/VBP 70/CD children/NNS there/RB

11/13/2023

NLP

Fantahun B.(PhD)

17
POS Tagging: Difficulties in tagging
 Some tagging distinctions are quite hard for both humans

and machines to make.

1) Eg. prepositions (IN), particles (RP), and adverbs (RB) can

have a large overlap.

 Words like around can be all three:


 Mrs./NNP Shaefer/NNP never/RB got/VBD around/RP to/TO
joining/VBG
 All/DT we/PRP gotta/VBN do/VB is/VBZ go/VB around/IN the/DT
corner/NN
 Chateau/NNP Petrus/NNP costs/VBZ around/RB 250/CD

11/13/2023

NLP

Fantahun B.(PhD)

18
POS Tagging: Difficulties in tagging
 Making these decisions requires sophisticated knowledge of syntax;

tagging manuals (Santorini, 1990) give various heuristics that can help
human coders make these decisions, and that can also provide useful
features for automatic taggers. Eg. two heuristics from Santorini (1990):
 Prepositions are generally associated with a following noun phrase (although they may also be followed by prepositional phrases), and the word around is tagged as an adverb when it means “approximately”.
 Particles often can either precede or follow a noun phrase object:
• She told off/RP her friends  particle
• She told her friends off/RP.

 Prepositions, unlike particles, cannot follow their noun phrase (* is used here to mark an ungrammatical sentence):

• She stepped off/IN the train  preposition


• *She stepped the train off/IN.
11/13/2023

NLP

Fantahun B.(PhD)

19
POS Tagging: Difficulties in tagging
2) Another difficulty is labeling the words that can modify nouns.
 Sometimes the modifiers preceding nouns are common nouns

like cotton below,

 other times the Treebank tagging manual specifies that

modifiers be tagged as adjectives (for example if the modifier is


a hyphenated common noun like income-tax) and

 other times as proper nouns (for modifiers which are

hyphenated proper nouns like Gramm-Rudman):


o cotton/NN sweater/NN
o income-tax/JJ return/NN
o the/DT Gramm-Rudman/NP Act/NP

11/13/2023

NLP

Fantahun B.(PhD)

20
POS Tagging: Difficulties in tagging
 Some words that can be adjectives, common nouns, or

proper nouns, are tagged in the Treebank as common nouns


when acting as modifiers:

 Chinese/NN cooking/NN
 Pacific/NN waters/NNS

11/13/2023

NLP

Fantahun B.(PhD)

21
POS Tagging: Difficulties in tagging
3) A third known difficulty in tagging is distinguishing past

participles (VBN) from adjectives (JJ).

 A word like married is a past participle when it is being used

in an eventive, verbal way, as below, and is an adjective


when it is being used to express a property, as below:
 They were married/VBN by the Justice of the Peace

yesterday at 5:00.
 At the time, she was already married/JJ.

11/13/2023

NLP

Fantahun B.(PhD)

22
POS Tagging: Algorithms
 Many algorithms have been applied to this problem,

including
 Rule-based tagging
 Probabilistic / Stochastic methods
o HMM tagging
o Maximum entropy tagging

 Transformation based tagging and


 memory-based tagging.

11/13/2023

NLP

Fantahun B.(PhD)

23
POS Tagging: Algorithms - Rule-Based POS Tagging
 The earliest algorithms for automatically assigning POS were

based on a two-stage architecture (Harris, 1962; Klein and


Simmons, 1963; Greene and Rubin, 1971).
1. Use a dictionary to assign each word a list of potential POS.
2. Use large lists of hand-written disambiguation rules to winnow

down this list to a single POS for each word.

 One of the most comprehensive rule-based approaches is the Constraint Grammar approach (Karlsson et al., 1995a). In this section we describe a tagger based on this approach, the EngCG tagger (Voutilainen, 1995, 1999).

11/13/2023

NLP

Fantahun B.(PhD)

24
POS Tagging: Algorithms - Rule-Based POS Tagging
 The ENGTWOL lexicon used by EngCG is based on two-level morphology and has about 56,000 entries for English word stems (Heikkilä, 1995),
 counting a word with multiple POS (e.g., nominal and verbal senses of

hit) as separate entries, and


 not counting inflected and many derived forms.

 Each entry is annotated with a set of morphological and

syntactic features.
 Fig. 5.11 shows some selected words, together with a slightly

simplified listing of their features; these features are used in


rule writing.
11/13/2023

NLP

Fantahun B.(PhD)

25
POS Tagging: Algorithms - Rule-Based POS Tagging

11/13/2023

NLP

Fantahun B.(PhD)

26
POS Tagging: Algorithms - Rule-Based POS Tagging
 Most of the features in Fig. 5.11 are relatively self-explanatory;
 SG for singular, -SG3 for other than third-person-singular.
 ABSOLUTE  non-comparative and non-superlative for an adjective,

NOMINATIVE  non-genitive, and PCP2 means past participle.

 PRE, CENTRAL, and POST are ordering slots for determiners

(predeterminers (all) come before determiners (the): all the president’s


men).

 NOINDEFDETERMINER  words like furniture do not appear with the

indefinite determiner a.

 SV, SVO, and SVOO specify the subcategorization or complementation

pattern for the verb. SV means the verb appears solely with a subject
(nothing occurred); SVO with a subject and an object (I showed the film);
SVOO with a subject and two complements: She showed her the ball.

11/13/2023

NLP

Fantahun B.(PhD)

27
POS Tagging: Algorithms - Rule-Based POS Tagging
 In the first stage of the tagger, each word is run through the two-level

lexicon transducer and the entries for all possible parts-of-speech are
returned.

 For example the phrase “Pavlov had shown that salivation . . .“ would

return the following list (one line per possible tag, with the correct tag
shown in boldface):
Pavlov      PAVLOV N NOM SG PROPER
had         HAVE V PAST VFIN SVO
            HAVE PCP2 SVO
shown       SHOW PCP2 SVOO SVO SV
that        ADV
            PRON DEM SG
            DET CENTRAL DEM SG
            CS
salivation  N NOM SG
...

11/13/2023

NLP

Fantahun B.(PhD)

28
POS Tagging: Algorithms - Rule-Based POS Tagging
 EngCG then applies a large set of constraints (as many as 3,744

constraints in the EngCG-2 system) to the input sentence to rule out


incorrect parts-of-speech.

 The boldfaced entries in the table above show the desired result, in

which the simple past tense tag (rather than the past participle tag) is
applied to had, and the complementizer (CS) tag is applied to that.

 The constraints are used in a negative way, to eliminate tags that are

inconsistent with the context.

 For example one constraint eliminates all readings of that except the

ADV (adverbial intensifier) sense (this is the sense in the sentence it


isn’t that odd).

 Here’s a simplified version of the constraint. . .

11/13/2023

NLP

Fantahun B.(PhD)

29
POS Tagging: Algorithms - Rule-Based POS Tagging
 Here’s a simplified version of the constraint:

ADVERBIAL-THAT RULE
Given input: “that”
  if  (+1 A/ADV/QUANT);   /* next word is an adjective, adverb, or quantifier */
      (+2 SENT-LIM);      /* and the word after that is a sentence boundary */
      (NOT -1 SVOC/A);    /* and the previous word is not a verb like consider
                             that takes an adjective complement */
  then eliminate non-ADV tags
  else eliminate ADV tag
11/13/2023

NLP

Fantahun B.(PhD)

30
POS Tagging: Algorithms - Rule-Based POS Tagging
 The first two clauses of this rule check to see that the that

directly precedes a sentence-final adjective, adverb, or


quantifier. In all other cases the adverb reading is eliminated.

 The last clause eliminates cases preceded by verbs like consider

or believe which can take a noun and an adjective; this is to


avoid tagging the following instance of that as an adverb:

I consider that odd.

11/13/2023

NLP

Fantahun B.(PhD)

31
POS Tagging: Algorithms - Rule-Based POS Tagging
 Another rule is used to express the constraint that the

complementizer sense of that is most likely to be used if the


previous word is a verb which expects a complement (like
believe, think, or show), and if that is followed by the beginning
of a noun phrase and finite verb.

 This description oversimplifies the EngCG architecture; the

system also includes probabilistic constraints, and also makes


use of other syntactic information we haven’t discussed. The
interested reader should consult Karlsson et al. (1995b) and
Voutilainen (1999).

11/13/2023

NLP

Fantahun B.(PhD)

32
POS Tagging: Markov Chains
 An HMM is nothing more than a probabilistic function of a

Markov process.

 Markov processes/chains/models were first developed by

Andrei A. Markov (a student of Chebyshev).

 We will refer to vanilla Markov models as Visible Markov Models

(VMMs) when we want to be careful to distinguish them from


HMMs.

 Markov models can be used whenever one wants to model the

probability of a linear sequence of events.

11/13/2023

NLP

Fantahun B.(PhD)

33
POS Tagging: Markov Chains
 Markov chains and Hidden Markov Models are both extensions

of the finite automata.

 A finite automaton is defined by a set of states and a set of transitions between states that are taken based on the input observations.

 A weighted finite-state automaton is a simple augmentation of

the finite automaton in which each arc is associated with a


probability, indicating how likely that path is to be taken.

 The probability on all the arcs leaving a node must sum to 1.


 A Markov chain is a special case of a weighted automaton in

which the input sequence uniquely determines which states the


automaton will go through.

11/13/2023

NLP

Fantahun B.(PhD)

34
POS Tagging: Markov Chains
 Fig. 6.1a shows a Markov chain for assigning a probability to a

sequence of weather events, where the vocabulary consists of


HOT, COLD, and RAINY.

11/13/2023

NLP

Fantahun B.(PhD)

35
POS Tagging: Markov Chains
 Fig. 6.1b shows another simple example of a Markov chain for

assigning a probability to a sequence of words w1, ..., wn.

 A Markov chain is specified by the following components:
  Q = q1 q2 ... qN: a set of N states
  A = a11 a12 ... aNN: a transition probability matrix, each aij representing the probability of moving from state i to state j, with Σj aij = 1 for every i
  an initial probability distribution over states (or, equivalently, a special start state q0)
11/13/2023

NLP

Fantahun B.(PhD)

36
POS Tagging: Markov Chains
 First-order Markov chain assumption: the probability of a state depends only on the immediately preceding state,
  P(qi | q1 ... qi−1) = P(qi | qi−1)
 Each aij expresses the probability P(qj | qi); hence the probabilities on all arcs leaving a state sum to 1:
  Σj aij = 1
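
A small runnable sketch, under assumed transition probabilities (the actual numbers in Fig. 6.1a are not reproduced in these slides), of how a Markov chain assigns a probability to a sequence of weather events:

```python
# Probability of a state sequence under a first-order Markov chain.
# The transition probabilities below are illustrative, not the values from Fig. 6.1a.
START = "<s>"
A = {
    ("<s>", "HOT"): 0.5, ("<s>", "COLD"): 0.3, ("<s>", "RAINY"): 0.2,
    ("HOT", "HOT"): 0.6, ("HOT", "COLD"): 0.3, ("HOT", "RAINY"): 0.1,
    ("COLD", "HOT"): 0.3, ("COLD", "COLD"): 0.5, ("COLD", "RAINY"): 0.2,
    ("RAINY", "HOT"): 0.2, ("RAINY", "COLD"): 0.4, ("RAINY", "RAINY"): 0.4,
}

def sequence_probability(states):
    """P(q1, ..., qn) = P(q1 | <s>) * prod over i of P(qi | qi-1)."""
    prob = 1.0
    prev = START
    for q in states:
        prob *= A[(prev, q)]
        prev = q
    return prob

print(sequence_probability(["HOT", "HOT", "COLD"]))   # 0.5 * 0.6 * 0.3 = 0.09
```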

11/13/2023

NLP

Fantahun B.(PhD)

37
POS Tagging: Hidden Markov Models
 In an HMM, you don't know the state sequence that the model

passes through, but only some probabilistic function of it.

 A Markov chain is useful when we need to compute a probability

for a sequence of events that we can observe in the world.

 In many cases, however, the events we are interested in may not

be directly observable in the world.


 Example: in POS tagging, we didn’t observe POS tags in the world;
we saw words, and had to infer the correct tags from the word
sequence. We call the POS tags hidden because they are not
observed.
 In speech recognition; in that case we’ll see acoustic events in the
world, and have to infer the presence of ‘hidden’ words that are
the underlying causal source of the acoustics.

11/13/2023

NLP

Fantahun B.(PhD)

38
POS Tagging: Hidden Markov Models
 An HMM allows us to talk about both observed events (like words that we see in the input) and hidden events (like POS tags) that we think of as causal factors in our probabilistic model.

 To exemplify these models, we’ll use a task conceived of by Jason Eisner

(2002a).

 Imagine that you are a climatologist in the year 2799 studying the history of

global warming. You cannot find any records of the weather in Baltimore,
Maryland, for the summer of 2007, but you do find Jason Eisner’s diary, which
lists how many ice creams Jason ate every day that summer. Our goal is to
use these observations to estimate the temperature every day. We’ll simplify
this weather task by assuming there are only two kinds of days: cold (C) and
hot (H).

 So the Eisner task is as follows: Given a sequence of observations O,

each observation an integer corresponding to the number of ice creams


eaten on a given day, figure out the correct ‘hidden’ sequence Q of
weather states (H or C) which caused Jason to eat the ice cream.

11/13/2023

NLP

Fantahun B.(PhD)

39
POS Tagging: Hidden Markov Models
Formal definition of an HMM
 An HMM is specified by the following components:
  Q = q1 q2 ... qN: a set of N hidden states
  A = a11 a12 ... aNN: a transition probability matrix, each aij representing the probability of moving from state i to state j, with Σj aij = 1 for every i
  O = o1 o2 ... oT: a sequence of T observations, each drawn from a vocabulary V
  B = bi(ot): a sequence of observation likelihoods (emission probabilities), each expressing the probability of observation ot being generated from state i
  an initial probability distribution over states (equivalently, special start and end states q0 and qF that are not associated with observations)

11/13/2023

NLP

Fantahun B.(PhD)

40
POS Tagging: Hidden Markov Models
A first-order Hidden Markov Model makes two simplifying assumptions.
1) As with a first-order Markov chain, the probability of a particular state is dependent only on the previous state:
   P(qi | q1 ... qi−1) = P(qi | qi−1)
2) The probability of an output observation oi is dependent only on the state that produced the observation, qi, and not on any other states or any other observations:
   P(oi | q1 ... qi, o1 ... oi−1) = P(oi | qi)

Fig. 6.3 shows a sample HMM for the ice cream task. The two hidden states (H and C)
correspond to hot and cold weather, while the observations (drawn from the
alphabet O = {1,2,3}) correspond to the number of ice creams eaten by Jason on a
given day.
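
As a concrete sketch of these components, the ice-cream HMM can be written down as a pair of tables; the numbers below are illustrative placeholders, since the actual values of Fig. 6.3 are not reproduced in these slides:

```python
# Ice-cream HMM: hidden states, transition matrix A, emission matrix B,
# and initial distribution pi. All probabilities are illustrative placeholders.
states = ["HOT", "COLD"]
observations_vocab = [1, 2, 3]    # number of ice creams eaten in a day

pi = {"HOT": 0.8, "COLD": 0.2}    # P(q1)
A = {                             # P(q_t | q_{t-1})
    "HOT":  {"HOT": 0.6, "COLD": 0.4},
    "COLD": {"HOT": 0.5, "COLD": 0.5},
}
B = {                             # P(o_t | q_t)
    "HOT":  {1: 0.2, 2: 0.4, 3: 0.4},
    "COLD": {1: 0.5, 2: 0.4, 3: 0.1},
}
```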
11/13/2023

NLP

Fantahun B.(PhD)

41
POS Tagging: Hidden Markov Models

11/13/2023

NLP

Fantahun B.(PhD)

42
POS Tagging: Hidden Markov Models
 Notice that in the HMM in Fig. 6.3, there is a (non-zero) probability

of transitioning between any two states. Such an HMM is called a


fully-connected or ergodic HMM.

 Sometimes, however, we have HMMs in which many of the

transitions between states have zero probability.

 For example, in left-to-right ( Bakis HMMs ), the state transitions

proceed from left to right, as shown in Fig. 6.4.

 There are no transitions going from a higher-numbered state to a

lower-numbered state

 (or, more accurately, any transitions from a higher-numbered state

to a lower-numbered state have zero probability).

 Bakis HMMs are generally used to model temporal processes like

speech.

11/13/2023

NLP

Fantahun B.(PhD)

43
POS Tagging: Hidden Markov Models

11/13/2023

NLP

Fantahun B.(PhD)

44
POS Tagging: Hidden Markov Models
 Hidden Markov Models are characterized by three fundamental problems:
 Problem 1 (Likelihood): given an HMM λ = (A, B) and an observation sequence O, determine the likelihood P(O | λ).
 Problem 2 (Decoding): given an observation sequence O and an HMM λ = (A, B), discover the best hidden state sequence Q.
 Problem 3 (Learning): given an observation sequence O and the set of states in the HMM, learn the HMM parameters A and B.

11/13/2023

NLP

Fantahun B.(PhD)

45
HMMs: Computing Likelihood: The Forward Algorithm

 For example, given the HMM in Fig. 6.2b, what is the probability of

the sequence 3 1 3?

 For a Markov chain, where the surface observations are the same as the hidden events, we could compute the probability of 3 1 3 just by following the states labeled 3 1 3 and multiplying the probabilities along the arcs.

 For an HMM, things are not so simple. We want to determine the probability of an ice-cream observation sequence like 3 1 3, but we don’t know what the hidden state sequence is!

11/13/2023

NLP

Fantahun B.(PhD)

46
HMMs: Computing Likelihood: The Forward Algorithm
 Simpler case: we already knew the weather, and wanted to

predict how much ice cream Jason would eat.

 First, recall that for Hidden Markov Models, each hidden state

produces only a single observation. Thus the sequence of hidden


states and the sequence of observations have the same length.

 Given this one-to-one mapping and the Markov assumptions expressed in Eq. 6.6, for a particular hidden state sequence Q = q0, q1, q2, ..., qT and an observation sequence O = o1, o2, ..., oT, the likelihood of the observation sequence is:

  P(O | Q) = Π(i=1..T) P(oi | qi)
11/13/2023

NLP

Fantahun B.(PhD)

47
HMMs: Computing Likelihood: The Forward Algorithm

 But of course, we don’t actually know what the hidden state

(weather) sequence was.

 We’ll need to compute the probability of ice-cream events 3 1 3

instead by summing over all possible weather sequences, weighted


by their probability.

11/13/2023

NLP

Fantahun B.(PhD)

48
HMMs: Computing Likelihood: The Forward Algorithm
 First, let’s compute the joint probability of being in a

particular weather sequence Q and generating a particular


sequence O of ice-cream events.

 In general, this is:

  P(O, Q) = P(O | Q) × P(Q) = Π(i=1..T) P(oi | qi) × Π(i=1..T) P(qi | qi−1)

 The computation of the joint probability of our ice-cream observation 3 1 3 and one possible hidden state sequence hot hot cold is as follows (Fig. 6.6 shows a graphic representation of this):

  P(3 1 3, hot hot cold) = P(hot|start) × P(hot|hot) × P(cold|hot) × P(3|hot) × P(1|hot) × P(3|cold)

11/13/2023

NLP

Fantahun B.(PhD)

49
HMMs: Computing Likelihood: The Forward Algorithm

11/13/2023

NLP

Fantahun B.(PhD)

50
HMMs: Computing Likelihood: The Forward Algorithm
 Now, we can compute the total probability of the observations just by summing over all possible hidden state sequences:

  P(O) = Σ(Q) P(O, Q) = Σ(Q) P(O | Q) P(Q)

 For our particular case, we would sum over the 8 (= 2³) three-event sequences:

  P(3 1 3) = P(3 1 3, cold cold cold) + P(3 1 3, cold cold hot)
           + P(3 1 3, hot hot cold) + ...      (6.13)

 What is the problem with this approach?


 For an HMM with N hidden states and an observation sequence of T observations, there are N^T possible hidden sequences. For real tasks, where N and T are both large, N^T is far too large a number to enumerate.

11/13/2023

NLP

Fantahun B.(PhD)

51
HMMs: Computing Likelihood: The Forward Algorithm
 Solution: we use the Forward Algorithm, which is efficient (O(N²T)).

 The forward algorithm is a kind of dynamic programming

algorithm, i.e., an algorithm that uses a table to store


intermediate values as it builds up the probability of the
observation sequence.

 The forward algorithm computes the observation probability

by summing over the probabilities of all possible hidden


state paths that could generate the observation sequence,
but it does so efficiently by implicitly folding each of these
paths into a single forward trellis.

11/13/2023

NLP

Fantahun B.(PhD)

52
HMMs: Computing Likelihood: The Forward Algorithm
 Each cell of the forward algorithm trellis, αt(j), represents the probability of being in state j after seeing the first t observations, given the automaton λ.

 The value of each cell αt(j) is computed by summing over the probabilities of every path that could lead us to this cell.

 Formally, each cell expresses the following probability:

  αt(j) = P(o1, o2, ..., ot, qt = j | λ)

 Fig. 6.7 shows an example of the forward trellis for computing the likelihood of the ice-cream observations 3 1 3.

11/13/2023

NLP

Fantahun B.(PhD)

53
HMMs: Computing Likelihood: The Forward Algorithm
[Figure 6.7: the forward trellis for the ice-cream observations 3 1 3]

11/13/2023

NLP

Fantahun B.(PhD)

54
HMMs: Computing Likelihood: The Forward Algorithm
 For a given state qj at time t, the value αt(j) is computed as:

  αt(j) = Σ(i=1..N) αt−1(i) aij bj(ot)      (6.15)

 The three factors that are multiplied in Eq. 6.15 in extending the previous paths to compute the forward probability at time t are:
  αt−1(i): the previous forward path probability from the previous time step
  aij: the transition probability from previous state qi to current state qj
  bj(ot): the state observation likelihood of the observation symbol ot given the current state j

11/13/2023

NLP

Fantahun B.(PhD)

55
HMMs: Computing Likelihood: The Forward Algorithm

11/13/2023

NLP

Fantahun B.(PhD)

56
HMMs: Computing Likelihood: The Forward Algorithm
 We give two formal definitions of the forward algorithm: the pseudocode (refer to Fig. 6.9) and a statement of the definitional recursion here:
 1. Initialization:  α1(j) = a0j bj(o1),  1 ≤ j ≤ N
 2. Recursion:       αt(j) = Σ(i=1..N) αt−1(i) aij bj(ot),  1 ≤ j ≤ N, 1 < t ≤ T
 3. Termination:     P(O|λ) = Σ(i=1..N) αT(i)
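
A runnable sketch of this forward recursion for the ice-cream task; the HMM parameters are illustrative placeholders, not the values of Fig. 6.3:

```python
# Forward algorithm: computes P(O | lambda) in O(N^2 * T) time via dynamic
# programming. Parameters are illustrative placeholders for the ice-cream HMM.
states = ["HOT", "COLD"]
pi = {"HOT": 0.8, "COLD": 0.2}
A = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
B = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def forward(observations):
    # alpha[t][j]: probability of the first t+1 observations, ending in state j
    alpha = [{j: pi[j] * B[j][observations[0]] for j in states}]     # initialization
    for o_t in observations[1:]:                                     # recursion
        prev = alpha[-1]
        alpha.append({
            j: sum(prev[i] * A[i][j] for i in states) * B[j][o_t]
            for j in states
        })
    return sum(alpha[-1][j] for j in states)                         # termination

print(forward([3, 1, 3]))   # total likelihood of the observation sequence 3 1 3
```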

11/13/2023

NLP

Fantahun B.(PhD)

57
HMMs: Decoding: The Viterbi Algorithm

 For any model, such as an HMM, that contains hidden variables,

the task of determining which sequence of variables is the


underlying source of some sequence of observations is called the
decoding task.

 In the ice cream domain, given a sequence of ice cream

observations 3 1 3 and an HMM, the task of the decoder is to find


the best hidden weather sequence (H H H).

11/13/2023

NLP

Fantahun B.(PhD)

58
HMMs: Decoding: The Viterbi Algorithm
 We might propose to find the best sequence as follows:
1. for each possible hidden state sequence (HHH, HHC, HCH, etc.),

we could run the forward algorithm and compute the likelihood of


the observation sequence given that hidden state sequence.

2. then we could choose the hidden state sequence with the max

observation likelihood.

 Problem: exponentially large number of state sequences!


 Solution: the most common decoding algorithms for HMMs, the

Viterbi Algorithm.

 Like the forward algorithm, Viterbi is a kind of dynamic

programming, and makes uses of a dynamic programming trellis.


Viterbi also strongly resembles another dynamic programming
variant, the minimum edit distance algorithm.

11/13/2023

NLP

Fantahun B.(PhD)

59
HMMs: Decoding: The Viterbi Algorithm
 Fig. 6.10 shows an example of the Viterbi trellis for computing the

best hidden state sequence for the observation sequence 3 1 3.

 The idea is to process the observation sequence left to right, filling

out the trellis. Each cell of the Viterbi trellis, vt(j) represents the
probability that the HMM is in state j after seeing the first t
observations and passing through the most probable state
sequence q0,q1, ...,qt−1, given the automaton λ .

 The value of each cell vt(j) is computed by recursively taking the

most probable path that could lead us to this cell. Formally, each cell expresses the following probability:

  vt(j) = max(q0,q1,...,qt−1) P(q0, q1, ..., qt−1, o1, o2, ..., ot, qt = j | λ)
 Note that we represent the most probable path by taking the

maximum over all possible previous state sequences.

11/13/2023

NLP

Fantahun B.(PhD)

60
HMMs: Decoding: The Viterbi Algorithm
[Figure 6.10: the Viterbi trellis for computing the best hidden state sequence for the observations 3 1 3]

11/13/2023

NLP

Fantahun B.(PhD)

61
HMMs: Decoding: The Viterbi Algorithm
 For a given state qj at time t, the value vt(j) is computed as:

  vt(j) = max(i=1..N) vt−1(i) aij bj(ot)      (6.20)

 The three factors that are multiplied in Eq. 6.20 for extending the previous paths to compute the Viterbi probability at time t are:
  vt−1(i): the previous Viterbi path probability from the previous time step
  aij: the transition probability from previous state qi to current state qj
  bj(ot): the state observation likelihood of the observation symbol ot given the current state j

11/13/2023

NLP

Fantahun B.(PhD)

62
HMMs: Decoding: The Viterbi Algorithm
 Note that the Viterbi algorithm is identical to the forward algorithm

except that it takes the max over the previous path probabilities
where the forward algorithm takes the sum.

 Note also that the Viterbi algorithm has one component that the

forward algorithm doesn’t have: backpointers.

 This is because while the forward algorithm needs to produce an

observation likelihood, the Viterbi algorithm must produce a


probability and also the most likely state sequence.

 We compute this best state sequence by keeping track of the path

of hidden states that led to each state, as suggested in Fig. 6.12,


and then at the end tracing back the best path to the beginning
(the Viterbi backtrace ).

11/13/2023

NLP

Fantahun B.(PhD)

63
HMMs: Decoding: The Viterbi Algorithm
 Finally, we can give a formal definition of the Viterbi recursion as follows:
 1. Initialization:  v1(j) = a0j bj(o1);  bt1(j) = 0
 2. Recursion:       vt(j) = max(i=1..N) vt−1(i) aij bj(ot);  btt(j) = argmax(i=1..N) vt−1(i) aij bj(ot)
 3. Termination:     the best path probability is max(i=1..N) vT(i); the best state sequence is recovered by following the backpointers btt(j) back from the best final state.
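
A runnable sketch of the Viterbi recursion with backpointers, again using illustrative placeholder parameters for the ice-cream HMM:

```python
# Viterbi decoding: most likely hidden state sequence for an observation
# sequence. Parameters are illustrative placeholders for the ice-cream HMM.
states = ["HOT", "COLD"]
pi = {"HOT": 0.8, "COLD": 0.2}
A = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
B = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def viterbi(observations):
    v = [{j: pi[j] * B[j][observations[0]] for j in states}]   # initialization
    backpointer = [{j: None for j in states}]
    for o_t in observations[1:]:                                # recursion: max instead of sum
        prev = v[-1]
        col, bp = {}, {}
        for j in states:
            best_i = max(states, key=lambda i: prev[i] * A[i][j])
            col[j] = prev[best_i] * A[best_i][j] * B[j][o_t]
            bp[j] = best_i
        v.append(col)
        backpointer.append(bp)
    last = max(states, key=lambda j: v[-1][j])                  # termination
    path = [last]
    for bp in reversed(backpointer[1:]):                        # follow the backpointers
        path.append(bp[path[-1]])
    return list(reversed(path)), v[-1][last]

print(viterbi([3, 1, 3]))   # best hidden state path and its Viterbi probability
```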

11/13/2023

NLP

Fantahun B.(PhD)

64
Training HMMs: The Forward-Backward Algorithm

 Problem: the third problem for HMMs:


o learning the parameters of an HMM, i.e., the A and B matrices.
 Input:
o an unlabeled sequence of observations O and a vocabulary of

potential hidden states Q.


o Thus for the ice cream task, we would start with a sequence of
observations O = {1,3,2, ...,}, and the set of hidden states H and C.
o For the POS tagging task we would start with a sequence of
observations O = {w1,w2,w3 . . .} and a set of hidden states NN, NNS,
VBD, IN,... And so on.
 Algorithm: the forward-backward or Baum-Welch algorithm (Baum, 1972),
a special case of the Expectation-Maximization or EM algorithm (Dempster
et al., 1977).
11/13/2023

NLP

Fantahun B.(PhD)

66
Training HMMs: The Forward-Backward Algorithm
 Simpler case of training a Markov chain rather than HMM:
 Since the states in a Markov chain are observed, we can run the

model on the observation sequence and directly see which path


we took through the model, and which state generated each
observation symbol.

 A Markov chain of course has no emission probabilities B

(alternatively we could view a Markov chain as a degenerate


Hidden Markov Model where all the b probabilities are 1.0 for the
observed symbol and 0 for all other symbols.).

 Thus the only probabilities we need to train are the transition

probability matrix A.

11/13/2023

NLP

Fantahun B.(PhD)

67
Training HMMs: The Forward-Backward Algorithm
 We get the maximum likelihood estimate of the probability aij of a particular transition between states i and j by counting the number of times the transition was taken, which we could call C(i → j), and then normalizing by the total count of all times we took any transition from state i:

  aij = C(i → j) / Σ(q∈Q) C(i → q)      (6.27)

 We can directly compute this probability in a Markov chain because

we know which states we were in.
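
For the fully observed (Markov chain) case, Eq. 6.27 is just relative-frequency counting; a minimal sketch over a toy observed state sequence (the data here is made up for illustration):

```python
# Maximum likelihood estimate of transition probabilities from an observed
# state sequence: a_ij = C(i -> j) / sum over q of C(i -> q). Toy data only.
from collections import Counter, defaultdict

observed_states = ["HOT", "HOT", "COLD", "COLD", "HOT", "COLD", "COLD"]

counts = defaultdict(Counter)
for prev, curr in zip(observed_states, observed_states[1:]):
    counts[prev][curr] += 1

A_hat = {
    i: {j: c / sum(row.values()) for j, c in row.items()}
    for i, row in counts.items()
}
print(A_hat)   # e.g. {'HOT': {'HOT': ..., 'COLD': ...}, 'COLD': {...}}
```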

 For an HMM we cannot compute these counts directly from an

observation sequence since we don’t know which path of states was


taken through the machine for a given input.

 The Baum-Welch algorithm uses two neat intuitions to solve this

problem.

11/13/2023

NLP

Fantahun B.(PhD)

68
Training HMMs: The Forward-Backward Algorithm
 The Baum-Welch algorithm uses two neat intuitions to solve this

problem.

1) iteratively estimate the counts.


o We will start with an estimate for the transition and observation

probabilities, and then use these estimated probabilities to derive better


and better probabilities.

2) Get estimated probabilities by computing the forward probability for

an observation and then dividing that probability mass among all the
different paths that contributed to this forward probability.

 In order to understand the algorithm, we need to define a useful

probability related to the forward probability, called the backward


probability.

11/13/2023

NLP

Fantahun B.(PhD)

69
Training HMMs: The Forward-Backward Algorithm
 The backward probability β is the probability of seeing the observations from time t+1 to the end, given that we are in state i at time t (and of course given the automaton λ):

  βt(i) = P(ot+1, ot+2, ..., oT | qt = i, λ)
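
A sketch of the backward pass, mirroring the forward code shown earlier (placeholder parameters again, since the slides do not reproduce the actual values):

```python
# Backward probabilities: beta[t][i] = P(o_{t+1}, ..., o_T | q_t = i).
# Placeholder parameters for the ice-cream HMM.
states = ["HOT", "COLD"]
A = {"HOT": {"HOT": 0.6, "COLD": 0.4}, "COLD": {"HOT": 0.5, "COLD": 0.5}}
B = {"HOT": {1: 0.2, 2: 0.4, 3: 0.4}, "COLD": {1: 0.5, 2: 0.4, 3: 0.1}}

def backward(observations):
    beta = [{i: 1.0 for i in states}]             # initialization at the final time step
    for o_next in reversed(observations[1:]):     # recursion, from t = T-1 down to 1
        nxt = beta[0]
        beta.insert(0, {
            i: sum(A[i][j] * B[j][o_next] * nxt[j] for j in states)
            for i in states
        })
    return beta                                   # beta[t][i] for t = 0 .. T-1

print(backward([3, 1, 3])[0])
```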

11/13/2023

NLP

Fantahun B.(PhD)

70
Training HMMs: The Forward-Backward Algorithm

11/13/2023

NLP

Fantahun B.(PhD)

71
Training HMMs: The Forward-Backward Algorithm
 We are now ready to understand how the forward and backward

probabilities can help us compute the transition probability aij and


observation probability bi(ot) from an observation sequence, even
though the actual path taken through the machine is hidden.

 How do we compute the numerator?


 Consult your textbook

11/13/2023

NLP

Fantahun B.(PhD)

72
POS Tagging: Hidden Markov Models
Sources of information in Tagging
 tags of other words in the context of the word we are interested

in.

 Syntagmatic structural information


 Not very successful,
• eg. Greene and Rubin (1971), an early deterministic rule-based tagger that

used such information about syntagmatic patterns correctly tagged only 77%
of words.

 Just knowing the word involved gives a lot of information about

the correct tag

 Charniak et al. (1993) showed that a `dumb' tagger that simply assigns the most common tag to each word performs at the surprisingly high level of 90% correct (a minimal sketch of this baseline follows below).
 As a result, the performance of such a `dumb' tagger has been used to give a baseline performance level in subsequent studies.
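
A minimal sketch of this most-frequent-tag baseline, trained from a toy tagged corpus (the data below is made up for illustration):

```python
# Most-frequent-tag baseline ("dumb" tagger): tag every word with the tag it
# was seen with most often in training. Toy training data for illustration.
from collections import Counter, defaultdict

training = [("the", "DT"), ("race", "NN"), ("is", "VBZ"), ("on", "IN"),
            ("to", "TO"), ("race", "VB"), ("the", "DT"), ("book", "NN")]

tag_counts = defaultdict(Counter)
for word, tag in training:
    tag_counts[word][tag] += 1

def baseline_tag(word, default="NN"):
    counts = tag_counts.get(word)
    return counts.most_common(1)[0][0] if counts else default

print([(w, baseline_tag(w)) for w in "the race to race".split()])
```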
11/13/2023

NLP

Fantahun B.(PhD)

73
POS Tagging: Hidden Markov Models
Sources of information in Tagging
 And all modern taggers in some way make use of a

combination of

 syntagmatic information (looking at information about tag

sequences) and
 lexical information (predicting a tag based on the word
concerned).

11/13/2023

NLP

Fantahun B.(PhD)

74
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
 HMM tagging algorithm chooses as the most likely tag

sequence the one that maximizes the product of two terms;

 the probability of the sequence of tags, and


 the probability of each tag generating a word.

 For this example, we will use the 87-tag Brown corpus tagset,

because it has a specific tag for to, TO, used only when to is an
infinitive; prepositional uses of to are tagged as IN.

 Example:
(5.36) Secretariat/NNP is/BEZ expected/VBN to/TO race/VB tomorrow/NR
(5.37) People/NNS continue/VB to/TO inquire/VB the/AT reason/NN for/IN
the/AT race/NN for/IN outer/JJ space/NN
11/13/2023

NLP

Fantahun B.(PhD)

75
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
 Let’s look at how race can be correctly tagged as a VB instead

of an NN in (5.36).

11/13/2023

NLP

Fantahun B.(PhD)

76
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
 Almost all the probabilities in these two sequences are identical; in

Fig. 5.12 we have highlighted in boldface the three probabilities that


differ. Let’s consider two of these, corresponding to P(ti|ti−1) and
P(wi|ti) .
Fig. 5.12a P(ti|ti−1) = P(VB|TO),
Fig. 5.12b P(ti|ti−1) = P(NN|TO).

 The tag transition probabilities P(NN|TO) and P(VB|TO) give us the

answer to the question “How likely are we to expect a verb (noun)


given the previous tag?”

 A look at the (87-tag) Brown corpus gives us the following probabilities, showing that verbs are far more likely than nouns to occur after TO:

  P(NN|TO) = .00047
  P(VB|TO) = .83

11/13/2023

NLP

Fantahun B.(PhD)

77
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
 Let’s now turn to P(wi|ti), the lexical likelihood of the word race given

a part-of-speech tag.

 For the two possible tags VB and NN, these correspond to the

probabilities: P(race|VB) and P(race|NN).

 Here are the lexical likelihoods from Brown:

P(race|NN) = .00057
P(race|VB) = .00012
 Finally, we need to represent the tag sequence probability for the

following tag (in this case the tag NR for tomorrow):


P(NR|VB) = .0027
P(NR|NN) = .0012

11/13/2023

NLP

Fantahun B.(PhD)

78
POS Tagging: Hidden Markov Models
Computing the most-likely tag sequence
 If we multiply the lexical likelihoods with the tag sequence

probabilities, we see that the probability of the sequence with


the VB tag is higher.
 Hence, the HMM tagger correctly tags race as a VB in Fig. 5.12
despite the fact that it is the less likely sense of race:
P(VB|TO)P(NR|VB)P(race|VB) = .00000027
P(NN|TO)P(NR|NN)P(race|NN) = .00000000032
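
These two products can be checked directly from the Brown-corpus estimates quoted above:

```python
# Comparing the two candidate tags for "race" using the Brown-corpus
# estimates quoted in the slides.
p_vb = 0.83    * 0.0027 * 0.00012   # P(VB|TO) * P(NR|VB) * P(race|VB)
p_nn = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)

print(f"VB path: {p_vb:.2e}")       # ~2.7e-07
print(f"NN path: {p_nn:.2e}")       # ~3.2e-10
print("Chosen tag:", "VB" if p_vb > p_nn else "NN")
```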

11/13/2023

NLP

Fantahun B.(PhD)

79
POS Tagging: Algorithms - Transformation-Based Tagging
 Transformation-Based Tagging, sometimes called Brill tagging,

is an instance of the Transformation-Based Learning (TBL)


approach to machine learning (Brill, 1995), and draws
inspiration from both the rule-based and stochastic taggers.
 Like the rule-based taggers, TBL is based on rules that specify what tags should be assigned to what words.


 Like the stochastic taggers, TBL is a machine learning

technique, in which rules are automatically induced from


the data.
 Like some but not all of the HMM taggers, TBL is a supervised

learning technique; it assumes a pre-tagged training


corpus.
11/13/2023

NLP

Fantahun B.(PhD)

80
POS Tagging: Algorithms - Transformation-Based Tagging
 Imagine an artist painting a picture of a white house with green trim

against a blue sky. Suppose most of the picture was sky, and hence
most of the picture was blue.
 The artist might begin by using a very broad brush and painting the

entire canvas blue.


 Next she might switch to a somewhat smaller white brush, and paint the

entire house white. She would just color in the whole house, not worrying
about the brown roof, or the blue windows or the green gables.
 Next she takes a smaller brown brush and colors over the roof.
 Now she takes up the blue paint on a small brush and paints in the blue

windows on the house.


 Finally she takes a very fine green brush and does the trim on the gables.

11/13/2023

NLP

Fantahun B.(PhD)

81
POS Tagging: Algorithms - Transformation-Based Tagging
 The painter starts with a broad brush that covers a lot of the canvas but colors a lot of areas that will have to be repainted. The next layer colors less of the canvas but also makes fewer “mistakes”. Each new layer uses a finer brush that corrects less of the picture but makes fewer mistakes.
 TBL uses somewhat the same method as this painter.
 The TBL algorithm has a set of tagging rules. A corpus is first tagged

using the broadest rule, that is, the one that applies to the most
cases. Then a slightly more specific rule is chosen, which changes
some of the original tags. Next an even narrower rule, which
changes a smaller number of tags (some of which might be
previously changed tags).
11/13/2023

NLP

Fantahun B.(PhD)

82
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
 Let’s look at one of the rules used by Brill’s (1995) tagger. Before

the rules apply, the tagger labels every word with its most-likely
tag. We get these most-likely tags from a tagged corpus. For
example, in the Brown corpus, race is most likely to be a noun:
P(NN|race) = .98
P(VB|race) = .02
 This means that the two examples of race that we saw above will

both be coded as NN.

11/13/2023

NLP

Fantahun B.(PhD)

83
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
 In the first case, this is a mistake, as NN is the incorrect tag:
  (5.36) Secretariat/NNP is/BEZ expected/VBN to/TO race/NN tomorrow/NR
 In the second case, this race is correctly tagged as an NN:
  (5.37) People/NNS continue/VB to/TO inquire/VB the/AT reason/NN for/IN the/AT race/NN for/IN outer/JJ space/NN

11/13/2023

NLP

Fantahun B.(PhD)

84
POS Tagging: Transformation-Based Tagging
How TBL rules are applied
 After selecting the most-likely tag, Brill’s tagger applies its

transformation rules.
 As it happens, Brill’s tagger learned a rule that applies exactly

to this mistagging of race:


Change NN to VB when the previous tag is TO
 This rule would change race/NN to race/VB in exactly the

following situation, since it is preceded by to/TO.
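
A minimal sketch of applying this single Brill transformation to a most-likely-tag output (the sentence and initial tags below are illustrative):

```python
# Apply one Brill transformation: "Change NN to VB when the previous tag is TO".
# The initial tags come from a most-likely-tag pass (illustrative here).
tagged = [("expected", "VBN"), ("to", "TO"), ("race", "NN"), ("tomorrow", "NR")]

def apply_rule(tags, from_tag="NN", to_tag="VB", trigger_prev="TO"):
    out = list(tags)
    for i in range(1, len(out)):
        word, tag = out[i]
        if tag == from_tag and out[i - 1][1] == trigger_prev:
            out[i] = (word, to_tag)
    return out

print(apply_rule(tagged))   # race/NN becomes race/VB because it follows to/TO
```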

11/13/2023

NLP

Fantahun B.(PhD)

85
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
 Brill’s TBL algorithm has three major stages.
1. It labels every word with its most-likely tag.
2. It examines every possible transformation, and selects the one

that results in the most improved tagging.


3. It re-tags the data according to this rule.
 The last two stages are repeated until some stopping criterion is

reached, such as insufficient improvement over the previous pass.


 Note that stage two requires that TBL knows the correct tag of

each word; that is, TBL is a supervised learning algorithm.


11/13/2023

NLP

Fantahun B.(PhD)

86
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
 The output of the TBL process is an ordered list of transformations;

these then constitute a “tagging procedure” that can be applied


to a new corpus.

 In principle the set of possible transformations is infinite, since we

could imagine transformations such as “transform NN to VB if the


previous word was “IBM” and the word “the” occurs between 17
and 158 words before that”.

 But TBL needs to consider every possible transformation, in order to

pick the best one on each pass through the algorithm. Thus the
algorithm needs a way to limit the set of transformations. This is
done by designing a small set of templates (abstracted
transformations). Every allowable transformation is an instantiation
of one of the templates.

11/13/2023

NLP

Fantahun B.(PhD)

87
POS Tagging: Transformation-Based Tagging
How TBL rules are learned
 Brill’s set of templates is listed in Fig. 5.20. Fig. 5.21 gives the details

of this algorithm for learning transformations. (refer on page 169)

11/13/2023

NLP

Fantahun B.(PhD)

89
POS Tagging: Maximum Entropy Models
 A second probabilistic machine learning framework called

Maximum Entropy modeling, MaxEnt for short.

 MaxEnt is more widely known as multinomial logistic regression.


 Our goal in this chapter is to introduce the use of MaxEnt for

sequence classification.

 Recall that the task of sequence classification or sequence

labelling is to assign a label to each element in some sequence,


such as assigning a part-of-speech tag to a word.

 The most common MaxEnt sequence classifier is the Maximum

Entropy Markov Model or MEMM, to be introduced in Sec. 6.8.


But before we see this use of MaxEnt as a sequence classifier,
we need to introduce non-sequential classification.

11/13/2023

NLP

Fantahun B.(PhD)

90
POS Tagging: Maximum Entropy Models
 The task of classification is to take a single observation, extract some

useful features describing the observation, and then based on these


features, to classify the observation into one of a set of discrete
classes.
 A probabilistic classifier does slightly more than this; in addition to
assigning a label or class, it gives the probability of the observation
being in that class; indeed, for a given observation a probabilistic
classifier gives a probability distribution over all classes.
 Such non-sequential classification tasks occur throughout speech and
language processing.
 text classification (spam/ham)
 sentiment analysis (positive/negative opinion).
 sentence boundaries (a period (‘.’) as either a sentence boundary or

not).

11/13/2023

NLP

Fantahun B.(PhD)

91
POS Tagging: Maximum Entropy Models
 MaxEnt belongs to the family of classifiers known as the exponential or

log-linear classifiers.

 MaxEnt works by extracting some set of features from the input,

combining them linearly (meaning that we multiply each by a weight


and then add them up), and then, for reasons we will see below,
using this sum as an exponent.

 Let’s flesh out this intuition just a bit more. Assume that we have some

input x (perhaps it is a word that needs to be tagged, or a document


that needs to be classified) from which we extract some features. A
feature for tagging might be this word ends in -ing or the previous
word was ‘the’. For each such feature fi, we have some weight wi.

11/13/2023

NLP

Fantahun B.(PhD)

92
POS Tagging: Maximum Entropy Models
 Given the features and weights, our goal is to choose a class (for example a POS tag) for the word. MaxEnt does this by choosing the most probable class; the probability of a particular class c given the observation x is:

  p(c|x) = (1/Z) exp( Σi wi fi )

 Here Z is a normalizing factor, used to make the probabilities correctly sum to 1; and as usual exp(x) = e^x.

11/13/2023

NLP

Fantahun B.(PhD)

93
POS Tagging: Maximum Entropy Models
 Multinomial logistic regression is called MaxEnt in speech and

language processing (see Sec. 6.7.1 on the intuition behind the


name ‘maximum entropy’)
 Adding some details to this equation, first we’ll flesh out the normalization factor Z, specify the number of features as N, and make the value of the weight dependent on the class c. The final equation is:

  p(c|x) = exp( Σ(i=1..N) wci fi ) / Σ(c′∈C) exp( Σ(i=1..N) wc′i fi )

 Note that the normalization factor Z is just used to make the

exponential into a true probability;

11/13/2023

NLP

Fantahun B.(PhD)

94
POS Tagging: Maximum Entropy Models
 We need to make one more change to see the final MaxEnt equation.

So far we’ve been assuming that the features fi are real-valued. It is


more common in speech and language processing, however, to use
binary-valued features. A feature that only takes on the values 0 and 1 is
also called an indicator function.
 In general, the features we use are indicator functions of some property
of the observation and the class we are considering assigning. Thus in
MaxEnt, instead of the notation fi, we will often use the notation fi(c,x),
meaning a feature i for a particular class c for a given observation x.
 The final equation for computing the probability of y being of class c given x in MaxEnt is:

  p(c|x) = exp( Σ(i=1..N) wci fi(c, x) ) / Σ(c′∈C) exp( Σ(i=1..N) wc′i fi(c′, x) )

11/13/2023

NLP

Fantahun B.(PhD)

95
POS Tagging: Maximum Entropy Models
 Example features: POS

(6.81) Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow/


 We are doing classification, not sequence classification
 We would like to know whether to assign the class VB to race (or instead

assign some other class like NN).


 One useful feature, we’ll call it f1, would be the fact that the current
word is race. We can thus add a binary feature which is true if this is the
case:
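
The textbook’s exact feature definitions are not reproduced in these slides; the indicator functions below are an illustrative sketch of what such binary features look like in code (the class pairings are assumptions, not the original figures):

```python
# Illustrative MaxEnt indicator features f_i(c, x): each returns 1 or 0.
# x is a small dict describing the observation; the class pairings are
# assumptions for illustration, not the textbook's exact features.
def f1(c, x):   # current word is "race" and the candidate class is VB
    return 1 if x["word"] == "race" and c == "VB" else 0

def f2(c, x):   # current word ends in -ing and the candidate class is VBG
    return 1 if x["word"].endswith("ing") and c == "VBG" else 0

def f3(c, x):   # previous tag is TO and the candidate class is VB
    return 1 if x["prev_tag"] == "TO" and c == "VB" else 0

x = {"word": "race", "prev_tag": "TO"}
print([f(c, x) for f in (f1, f2, f3) for c in ("VB", "NN")])
```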

11/13/2023

NLP

Fantahun B.(PhD)

96
POS Tagging: Maximum Entropy Models
 Two more part-of-speech tagging features might focus on aspects of a

word’s spelling and case:

11/13/2023

NLP

Fantahun B.(PhD)

97
POS Tagging: Maximum Entropy Models
 Since each feature is dependent on both a property of the observation

and the class being labeled, we would need to have separate feature
for, e.g, the link between race and VB, or the link between a previous
TO and NN:

11/13/2023

NLP

Fantahun B.(PhD)

98
POS Tagging: Maximum Entropy Models
 Each of these features has a corresponding weight. Thus the weight

w1(c,x) would indicate how strong a cue the word race is for the tag VB,
the weight w2(c,x) would indicate how strong a cue the previous tag TO
is for the current word being a VB, and so on.

11/13/2023

NLP

Fantahun B.(PhD)

99
POS Tagging: Maximum Entropy Models
 Let’s assume that the feature weights for the two classes VB and NN are as shown in Fig. 6.19.


 Let’s call the current input observation (where the current word is
race) x.
 We can now compute P(NN|x) and P(VB|x), using Eq. 6.80
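
Fig. 6.19’s weights are not reproduced in these slides, so the weights and feature values below are hypothetical; the sketch only shows how the normalized exponential is computed for the two classes:

```python
# MaxEnt classification for the word "race": P(c|x) is a normalized exponential
# of a weighted feature sum. Weights and feature values are hypothetical,
# since Fig. 6.19 is not reproduced in the slides.
import math

weights = {                       # hypothetical w_{c,i} for three features
    "VB": [0.8, 0.01, 0.1],
    "NN": [0.8, -0.01, 0.0],
}
features = {                      # hypothetical f_i(c, x) values for this x
    "VB": [1, 1, 0],
    "NN": [1, 0, 1],
}

def maxent_probs(features, weights):
    scores = {c: math.exp(sum(w * f for w, f in zip(weights[c], features[c])))
              for c in weights}
    z = sum(scores.values())      # normalizing factor Z
    return {c: s / z for c, s in scores.items()}

print(maxent_probs(features, weights))   # probability distribution over {VB, NN}
```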

11/13/2023

NLP

Fantahun B.(PhD)

100
POS Tagging: Maximum Entropy Models
 Notice that when we use MaxEnt to perform classification, MaxEnt

naturally gives us a probability distribution over the classes.


 If we want to do a hard-classification and choose the single-best
class, we can choose the class that has the highest probability, i.e.:

 Classification in MaxEnt is thus a generalization of classification in

(boolean) logistic regression.


 In boolean logistic regression, classification involves building one
linear expression which separates the observations in the class from
the observations not in the class.
 Classification in MaxEnt, by contrast, involves building a separate
linear expression for each of C classes.
11/13/2023

NLP

Fantahun B.(PhD)

101
POS Tagging: Maximum Entropy Models
 But as we’ll see later in Sec. 6.8, we generally don’t use MaxEnt for

hard classification.

 Usually we want to use MaxEnt as part of sequence classification,

where we want not the best single class for one unit, but the best
total sequence.

 For this task, it’s useful to exploit the entire probability distribution for

each individual unit, to help find the best sequence.

 Indeed even in many non-sequence applications a probability

distribution over the classes is more useful than a hard choice.

11/13/2023

NLP

Fantahun B.(PhD)

102
POS Tagging: Maximum Entropy Models
 The features we have described so far express a single binary

property of an observation.

 But it is often useful to create more complex features that express

combinations of properties of a word. Some kinds of machine


learning models, like Support Vector Machines (SVMs), can
automatically model the interactions between primitive properties,
but in MaxEnt any kind of complex feature has to be defined by
hand.

For example a word starting with a capital letter (eg. Day) is more likely to be a
proper noun (NNP) than a common noun (eg. in United Nations Day). But a word
which is capitalized but which occurs at the beginning of the sentence (the
previous word is <s>), as in Day after day...., is not more likely to be a proper
noun.

 Even if each of these properties were already a primitive feature,

MaxEnt would not model their combination, so this boolean


combination of properties would need to be encoded as a feature by
hand:

11/13/2023

NLP

Fantahun B.(PhD)

103
POS Tagging: Maximum Entropy Models
 A key to successful use of MaxEnt is thus the design of appropriate

features and feature combinations.

11/13/2023

NLP

Fantahun B.(PhD)

104
POS Tagging: Maximum Entropy Models (Learning)
 Learning a MaxEnt model can be done via a generalization of the logistic regression learning algorithms described in Sec. 6.6.4; as we saw in (6.73), we want to find the parameters w which maximize the log likelihood of the M training samples:

  ŵ = argmax(w) Σ(j=1..M) log P(y(j) | x(j))

 As with binary logistic regression, we use some convex optimization algorithm to find the weights which maximize this function.

 Regularized version:

  ŵ = argmax(w) Σ(j=1..M) log P(y(j) | x(j)) − α Σ(i) wi²
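
In practice this optimization is rarely hand-rolled; a hedged sketch using scikit-learn’s multinomial logistic regression (assumes scikit-learn is installed; the feature vectors and labels below are toy stand-ins for real tagging features):

```python
# Training a MaxEnt (multinomial logistic regression) classifier with
# L2 regularization via scikit-learn. Features and labels are toy data.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row is a binary feature vector f(x); each label is a POS class.
X = np.array([[1, 1, 0],    # e.g. word=race, previous tag TO
              [1, 0, 1],    # e.g. word=race, previous tag DT
              [0, 1, 0],
              [0, 0, 1]])
y = np.array(["VB", "NN", "VB", "NN"])

clf = LogisticRegression(C=1.0, max_iter=1000)   # C controls L2 regularization strength
clf.fit(X, y)
print(clf.classes_)
print(clf.predict_proba([[1, 1, 0]]))            # probability distribution over classes
```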

11/13/2023

NLP

Fantahun B.(PhD)

105
POS Tagging
Bibliography
 D. Jurafsky and J. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition (2nd edition).
 C. Manning and H. Schütze. Foundations of Statistical Natural Language Processing.

11/13/2023

NLP

Fantahun B.(PhD)

106
