CSCI 5832
Natural Language Processing
Jim Martin
Lecture 9
Today 2/12
• Review
GT example
• HMMs and Viterbi
POS tagging
Good-Turing Intuition
• Notation: Nc is the frequency-of-frequency-c
(the number of word types seen exactly c times)
So N10 = 1, N1 = 3, etc.
• To estimate counts/probabilities for unseen species,
use the number of species (words) we’ve seen once:
c* = (c+1) N(c+1)/Nc, so p0 = N1/N
• All other estimates are adjusted (down) to allow for
increased probabilities for unseen events
HW 0 Results
• Favorite color: 21 events
Blue 8, Green 3, Red 2, Black 2, White 2,
Periwinkle 1, Gamboge 1, Eau-de-Nil 1, Brown 1
• Count of counts
N1 = 4, N2 = 3, N3 = 1, N4–7 = 0, N8 = 1
GT for a New Color
• Count of counts: N1 = 4, N2 = 3, N3 = 1, N4–7 = 0, N8 = 1
• Treat the 0s as 1s, so c0* = N1 = 4 and
P(new color) = N1/N = 4/21 = .19
If we knew the number of colors out there, we would
divide .19 by the number of colors not seen.
• Otherwise
c1* = (1+1) N2/N1 = 2 · 3/4 = 1.5
P*(Periwinkle) = 1.5/21 = .07
c2* = (2+1) N3/N2 = 3 · 1/3 = 1
P*(Black) = 1/21 = .047
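To make the arithmetic concrete, here is a minimal Python sketch (not from the lecture) that reproduces these numbers from the raw color counts:

```python
from collections import Counter

# Color counts from the HW 0 slide (21 observations total)
counts = {"Blue": 8, "Green": 3, "Red": 2, "Black": 2, "White": 2,
          "Periwinkle": 1, "Gamboge": 1, "Eau-de-Nil": 1, "Brown": 1}
N = sum(counts.values())          # 21
Nc = Counter(counts.values())     # count of counts: {8: 1, 3: 1, 2: 3, 1: 4}

# Probability mass reserved for unseen colors: p0 = N1 / N
p0 = Nc[1] / N                    # 4/21 ~ 0.19

def c_star(c):
    """Good-Turing adjusted count: c* = (c+1) * N_{c+1} / N_c."""
    return (c + 1) * Nc[c + 1] / Nc[c]

print(p0)                          # 0.190...
print(c_star(1), c_star(1) / N)    # 1.5, ~0.07  -> P*(Periwinkle)
print(c_star(2), c_star(2) / N)    # 1.0, ~0.047 -> P*(Black)
```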
GT for New Color
• Count of counts: N1 = 4, N2 = 3, N3 = 1, N4–7 = 0, N8 = 1
• But 2 twists
Treat the high flyers as trusted,
so P(Blue) should stay 8/21
Use interpolation to smooth the bin counts before
re-estimation, to deal with zeros in the count of counts, e.g.
c3* = (3+1) N4/N3 = 4 · 0/1 = 0
Why Logs?
Simple Good-Turing does linear
interpolation in log-space. Why?
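Because the count-of-counts bins are sparse (here N4–N7 = 0), Simple Good-Turing fits a straight line to the nonzero (log c, log Nc) points and reads smoothed bin counts off that line. A minimal sketch using numpy and the color data above; the published SGT procedure has more detail (e.g., when to switch from raw to smoothed counts), which this omits:

```python
import numpy as np

# Nonzero count-of-counts bins from the color example
cs  = np.array([1, 2, 3, 8])
Ncs = np.array([4, 3, 1, 1])

# Fit log(Nc) = intercept + slope * log(c); zeros can't be logged,
# which is why the fit uses only the nonzero bins
slope, intercept = np.polyfit(np.log(cs), np.log(Ncs), 1)

def S(c):
    """Smoothed count-of-counts read off the fitted line."""
    return np.exp(intercept) * c ** slope

# c* = (c+1) * S(c+1) / S(c) is now defined even where raw N_{c+1} = 0,
# fixing the c3* = (3+1) * N4/N3 = 0 problem from the previous slide
for c in [1, 2, 3]:
    print(c, (c + 1) * S(c + 1) / S(c))
```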
Part of Speech tagging
• Part of speech tagging
Parts of speech
What’s POS tagging good for anyhow?
Tag sets
Rule-based tagging
Statistical tagging
Simple most-frequent-tag baseline
Important Ideas
Training sets and test sets
Unknown words
HMM tagging
Parts of Speech
• 8 (ish) traditional parts of speech
Noun, verb, adjective, preposition, adverb,
article, interjection, pronoun, conjunction, etc.
Also called: parts-of-speech, lexical categories,
word classes, morphological classes, lexical
tags, POS
Lots of debate in linguistics about the number,
nature, and universality of these
We’ll completely ignore this debate.
POS examples
• N noun chair, bandwidth, pacing
• V verb study, debate, munch
• ADJ adjective purple, tall, ridiculous
• ADV adverb unfortunately, slowly
• P preposition of, by, to
• PRO pronoun I, me, mine
• DET determiner the, a, that, those
POS Tagging example
WORD tag
the DET
koala N
put V
the DET
keys N
on P
the DET
table N
POS Tagging
• Words often have more than one POS:
back
The back door = JJ
On my back = NN
Win the voters back = RB
Promised to back the bill = VB
• The POS tagging problem is to determine
the POS tag for a particular instance of a
word.
These examples are from Dekang Lin
How hard is POS tagging?
Measuring ambiguity
2 methods for POS tagging
1. Rule-based tagging
(ENGTWOL)
2. Stochastic (=Probabilistic) tagging
HMM (Hidden Markov Model) tagging
Hidden Markov Model Tagging
• Using an HMM to do POS tagging
• Is a special case of Bayesian inference
Foundational work in computational linguistics
Bledsoe 1959: OCR
Mosteller and Wallace 1964: authorship
identification
• It is also related to the “noisy channel”
model that’s the basis for ASR, OCR and
MT
POS Tagging as Sequence
Classification
• We are given a sentence (an “observation” or
“sequence of observations”)
Secretariat is expected to race tomorrow
• What is the best sequence of tags which
corresponds to this sequence of observations?
• Probabilistic view:
Consider all possible sequences of tags
Out of this universe of sequences, choose the tag
sequence which is most probable given the
observation sequence of n words w1…wn.
Road to HMMs
• We want, out of all sequences of n tags t1…tn, the single
tag sequence such that P(t1…tn|w1…wn) is highest:
t̂1…n = argmax over t1…tn of P(t1…tn | w1…wn)
• Hat ^ means “our estimate of the best one”
• Argmax_x f(x) means “the x such that f(x) is maximized”
Road to HMMs
• This equation is guaranteed to give us the
best tag sequence
• But how to make it operational? How to
compute this value?
• Intuition of Bayesian classification:
Use Bayes rule to transform into a set of other
probabilities that are easier to compute
Using Bayes Rule
• P(t1…tn | w1…wn) = P(w1…wn | t1…tn) P(t1…tn) / P(w1…wn)
• The denominator is the same for every candidate tag sequence,
so we can drop it:
t̂1…n = argmax over t1…tn of P(w1…wn | t1…tn) P(t1…tn)
Likelihood and Prior
• Likelihood: assume each word depends only on its own tag
P(w1…wn | t1…tn) ≈ Π P(wi | ti)
• Prior: assume each tag depends only on the previous tag
P(t1…tn) ≈ Π P(ti | ti−1)
• So: t̂1…n = argmax over t1…tn of Π P(wi | ti) P(ti | ti−1)
Two Sets of Probabilities (1)
• Tag transition probabilities p(ti|ti-1)
Determiners likely to precede adjs and nouns
That/DT flight/NN
The/DT yellow/JJ hat/NN
So we expect P(NN|DT) and P(JJ|DT) to be high
Compute P(NN|DT) by counting in a labeled
corpus:
P(NN|DT) = Count(DT, NN) / Count(DT)
Two Sets of Probabilities (2)
• Word likelihood probabilities p(wi|ti)
VBZ (3sg Pres verb) likely to be “is”
Compute P(is|VBZ) by counting in a
labeled corpus:
P(is|VBZ) = Count(VBZ, is) / Count(VBZ)
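As a rough illustration (not the lecture’s code), both kinds of probabilities can be estimated from any tagged corpus; this sketch uses NLTK’s Penn Treebank sample, which it assumes is installed:

```python
from collections import Counter
import nltk

nltk.download("treebank", quiet=True)   # small Penn Treebank sample

tag_unigrams, tag_bigrams, word_tag = Counter(), Counter(), Counter()
for sent in nltk.corpus.treebank.tagged_sents():
    prev = "<s>"                         # sentence-start pseudo-tag
    for word, tag in sent:
        tag_bigrams[(prev, tag)] += 1    # C(t_{i-1}, t_i)
        tag_unigrams[tag] += 1           # C(t)
        word_tag[(word, tag)] += 1       # C(t, w)
        prev = tag

# Transition probability: P(NN | DT) = C(DT, NN) / C(DT)
print(tag_bigrams[("DT", "NN")] / tag_unigrams["DT"])
# Word likelihood: P(is | VBZ) = C(VBZ, is) / C(VBZ)
print(word_tag[("is", "VBZ")] / tag_unigrams["VBZ"])
```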
An Example: the verb “race”
• Secretariat/NNP is/VBZ expected/VBN to/TO
race/VB tomorrow/NR
• People/NNS continue/VB to/TO inquire/VB
the/DT reason/NN for/IN the/DT race/NN
for/IN outer/JJ space/NN
• How do we pick the right tag?
Disambiguating “race”
Example
• P(NN|TO) = .00047
• P(VB|TO) = .83
• P(race|NN) = .00057
• P(race|VB) = .00012
• P(NR|VB) = .0027
• P(NR|NN) = .0012
• P(VB|TO)P(NR|VB)P(race|VB) = .00000027
• P(NN|TO)P(NR|NN)P(race|NN)=.00000000032
• So we (correctly) choose the verb reading.
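A two-line check of this computation, with the numbers taken straight from the slide:

```python
# Transition probs P(t_i | t_{i-1}), likelihoods P(w | t), from the slide
verb = 0.83    * 0.0027 * 0.00012   # P(VB|TO) * P(NR|VB) * P(race|VB)
noun = 0.00047 * 0.0012 * 0.00057   # P(NN|TO) * P(NR|NN) * P(race|NN)
print(f"{verb:.2e} vs {noun:.2e}")  # ~2.7e-07 vs ~3.2e-10 -> VB wins
```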
Hidden Markov Models
• What we’ve described with these two
kinds of probabilities is a Hidden Markov
Model
• Let’s just spend a bit of time tying this into
the model
• First some definitions.
Definitions
• A weighted finite-state automaton adds
probabilities to the arcs
The probabilities on the arcs leaving any state
must sum to one
• A Markov chain is a special case in which the
input sequence uniquely determines which
states the automaton will go through
• Markov chains can’t represent inherently
ambiguous problems
Useful for assigning probabilities to unambiguous
sequences
Markov chain for weather
Markov chain for words
Markov chain = “First-order
Observable Markov Model”
• A set of states
Q = q1, q2…qN; the state at time t is qt
• Transition probabilities:
a set of probabilities A = a01, a02, …, an1, …, ann
Each aij represents the probability of transitioning from
state i to state j
The set of these is the transition probability matrix A
• Current state only depends on previous state
P(qi | q1 ...qi−1) = P(qi | qi−1 )
Markov chain for weather
• What is the probability of 4 consecutive
rainy days?
• Sequence is rainy-rainy-rainy-rainy
• I.e., state sequence is 3-3-3-3
• P(3,3,3,3) =
π3 · a33 · a33 · a33 = 0.2 × (0.6)³ = 0.0432
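A minimal sketch of this computation. Only π(rainy) = 0.2 and P(rainy|rainy) = 0.6 come from the slide; the other entries of the chain below are made-up placeholders so the matrix is complete:

```python
import numpy as np

# States 0 and 1 are placeholders; state 2 = rainy (from the slide)
pi = np.array([0.5, 0.3, 0.2])        # pi[2] = P(start rainy) = 0.2
A  = np.array([[0.6, 0.3, 0.1],       # placeholder rows; each sums to 1
               [0.3, 0.3, 0.4],
               [0.2, 0.2, 0.6]])      # A[2, 2] = P(rainy -> rainy) = 0.6

def seq_prob(states):
    """P(q1..qn) = pi[q1] * prod of A[q_{t-1}, q_t] for a Markov chain."""
    p = pi[states[0]]
    for prev, cur in zip(states, states[1:]):
        p *= A[prev, cur]
    return p

print(seq_prob([2, 2, 2, 2]))   # 0.2 * 0.6**3 = 0.0432
```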
HMM for Ice Cream
• You are a climatologist in the year 2799
• Studying global warming
• You can’t find any records of the weather
in Baltimore, MD for the summer of 2007
• But you find Jason Eisner’s diary
• Which lists how many ice creams Jason
ate every day that summer
• Our job: figure out how hot it was
Hidden Markov Model
• For Markov chains, the output symbols are the same
as the states.
See hot weather: we’re in state hot
• But in part-of-speech tagging (and other things)
The output symbols are words
But the hidden states are part-of-speech tags
• So we need an extension!
• A Hidden Markov Model is an extension of a Markov
chain in which the output symbols are not the same as
the states.
• This means we don’t know which state we are in.
Hidden Markov Models
• States Q = q1, q2 … qN
• Observations O = o1, o2 … oN
Each observation is a symbol from a vocabulary
V = {v1, v2, …, vV}
• Transition probabilities
Transition probability matrix A = {aij}
aij = P(qt = j | qt−1 = i), 1 ≤ i, j ≤ N
• Observation likelihoods
Output probability matrix B = {bi(k)}
bi(k) = P(Xt = ok | qt = i)
• Special initial probability vector π
πi = P(q1 = i), 1 ≤ i ≤ N
Eisner task
• Given
Ice Cream Observation Sequence:
1,2,3,2,2,2,3…
• Produce:
Weather Sequence: H,C,H,H,H,C…
HMM for ice cream
Transitions between the hidden
states of HMM, showing A probs
B observation likelihoods for
POS HMM
The A matrix for the POS HMM
The B matrix for the POS HMM
Viterbi intuition: we are looking
for the best ‘path’
(Lattice: states S1–S5, one column of candidate tags per word of
“promised to back the bill”; candidates include VB, VBD, VBN,
NNP, TO, JJ, RB, DT, NN)
The Viterbi Algorithm
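As a concrete sketch of the algorithm on the ice-cream HMM (the transition and emission numbers below are illustrative, not necessarily the lecture’s):

```python
import numpy as np

# Hidden states 0=Hot, 1=Cold; observations are ice creams eaten (1-3)
pi = np.array([0.8, 0.2])            # initial state probabilities
A  = np.array([[0.7, 0.3],           # P(next state | Hot)
               [0.4, 0.6]])          # P(next state | Cold)
B  = np.array([[0.2, 0.4, 0.4],      # P(1|Hot), P(2|Hot), P(3|Hot)
               [0.5, 0.4, 0.1]])     # P(1|Cold), P(2|Cold), P(3|Cold)

def viterbi(obs):
    """Return the most probable hidden state sequence for obs (values 1-3)."""
    T, N = len(obs), len(pi)
    v = np.zeros((T, N))                 # v[t, j]: best path prob ending in j at t
    back = np.zeros((T, N), dtype=int)   # backpointers
    v[0] = pi * B[:, obs[0] - 1]
    for t in range(1, T):
        for j in range(N):
            scores = v[t - 1] * A[:, j] * B[j, obs[t] - 1]
            back[t, j] = np.argmax(scores)
            v[t, j] = scores[back[t, j]]
    # Trace the backpointers from the best final state
    path = [int(np.argmax(v[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    path.reverse()
    return ["H" if s == 0 else "C" for s in path]

print(viterbi([3, 1, 3]))   # ['H', 'H', 'H'] with these numbers
```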
Viterbi example
Error Analysis
• Look at a confusion matrix
• See what errors are causing problems
Noun (NN) vs ProperNoun (NNP) vs Adj (JJ)
Preterite (VBD) vs Participle (VBN) vs Adjective (JJ)
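A confusion matrix is just a table of (gold tag, predicted tag) counts. A tiny sketch with made-up tag sequences, showing the kinds of confusions listed above:

```python
from collections import Counter

gold = ["NN", "NNP", "JJ", "VBD", "VBN", "NN"]
pred = ["NN", "NN",  "JJ", "VBN", "VBN", "JJ"]

# Count each (gold, predicted) pair; off-diagonal cells are the errors
for (g, p), n in Counter(zip(gold, pred)).items():
    if g != p:
        print(f"gold {g} tagged as {p}: {n}")
```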
Evaluation
• The result is compared with a manually
coded “Gold Standard”
Typically accuracy reaches 96-97%
This may be compared with the result for a
baseline tagger (one that uses no context).
• Important: 100% is impossible even for
human annotators.
Summary
• HMM Tagging
Markov Chains
Hidden Markov Models