Hidden Markov and Maximum Entropy Models
Introduction
Markov Chains
- Observed Markov model
- weighted finite-state automata
- Probabilistic graphical model
Hidden Markov model
- transition probability matrix
- Observed likelihood
- emission probability
- Left-to-right (Bakis) HMM
Maximum entropy models
- log-linear classifiers
- linear regression
- Logistic regression
- hyperplane
Maximum entropy Markov models
- MaxEnt model
- HMM tagging model
- MEMM tagging model
@Copyrights: Natural Language Processing (NLP) Organized by Dr. Ahmad Jalal (http://portals.au.edu.pk/imc/)
1. Introduction (Hidden Markov and Maximum Entropy Models)
Two important classes of statistical models for processing text & speech;
(1) Hidden Markov model (HMM),
(2) Maximum entropy model (MaxEnt), and particularly a Markov-related
variant of MaxEnt called the maximum entropy Markov model (MEMM).
HMMs and MEMMs are both sequence classifiers.
- A sequence classifier or sequence labeller is a model whose job is to assign
some label or class to each unit in a sequence.
- They compute a probability distribution over possible labels and choose the best
label sequence.
2. Markov Chains
Markov chains and hidden Markov models are both extensions of finite
automata.
A finite automaton is defined by a set of states and a set of transitions between
states.
A Markov chain is a special case of a weighted automaton in which the input
sequence uniquely determines which states the automaton will go through.
Because
it can’t represent inherently ambiguous problems, a Markov chain is only
useful for assigning probabilities to unambiguous sequences.
2. Markov Chains (Cont…)
Figure 6.1 (a) shows;
- a Markov chain for assigning a probability to a sequence of weather events (one state per
weather type), for which the vocabulary consists of HOT, COLD and RAINY.
Figure 6.1 (b) shows;
- another simple example of a Markov chain, this one for assigning a probability to a
sequence of words w1, w2, …, wn (one state per word).
2. Markov Chains (Alternative representation)
An alternative representation that is sometimes used for Markov chains doesn’t rely on a
start or end state,
- instead representing the distribution over initial states and accepting states explicitly.
Examples; compute the probability of each of the following sequences (a Python sketch of
the computation follows the list of sequences below);
o hot hot hot hot => P(hot hot hot hot) = .5 * .5 * .5 * .5 = 0.0625
o cold hot cold hot => P(cold hot cold hot) = .5 * .2 * .5 * .2 = 0.01
COLD-> COLD->WARM->WARM->WARM-> HOT-> COLD
HOT->COLD->HOT->HOT->WARM->COLD->COLD->WARM
WARM->HOT->COLD->WARM->COLD->HOT->WARM->HOT
HOT->COLD->COLD->WARM->WARM->HOT->COLD->WARM
COLD->HOT->WARM->WARM->COLD->HOT->WARM->HOT
WARM->COLD->HOT->WARM->COLD->COLD->HOT->WARM
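For these computations, a minimal Python sketch of the alternative representation is shown below; the initial distribution and transition table here are invented placeholders, not the values from the figures in the slides.

# Sketch: probability of a state sequence under a Markov chain, using the
# alternative representation (explicit initial distribution + transition matrix).
# The probabilities below are made-up placeholders, not the figures' values.
initial = {"HOT": 0.5, "COLD": 0.3, "WARM": 0.2}
trans = {
    "HOT":  {"HOT": 0.6, "COLD": 0.1, "WARM": 0.3},
    "COLD": {"HOT": 0.2, "COLD": 0.5, "WARM": 0.3},
    "WARM": {"HOT": 0.4, "COLD": 0.3, "WARM": 0.3},
}

def chain_probability(sequence):
    # P(s1 ... sn) = initial(s1) * product over i of P(s_i | s_{i-1})
    p = initial[sequence[0]]
    for prev, curr in zip(sequence, sequence[1:]):
        p *= trans[prev][curr]
    return p

print(chain_probability(["HOT", "HOT", "HOT", "HOT"]))
print(chain_probability(["COLD", "COLD", "WARM", "WARM", "WARM", "HOT", "COLD"]))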
2. Markov Chains (Class Participation)
How do we compute the probability of each of the following sentences using a
7-state model?
(a) Students did their assignment well at time (* highest likelihood).
(b) did their assignment student well at time (* 2nd-highest likelihood).
(c) At student assignment well did time their (* lowest likelihood).
How do we compute the probability of each of the following sentences using a
5-state model?
(a) Weather is hot and dry (* highest likelihood).
(b) and is hot weather dry (* 2nd-highest likelihood).
(c) hot weather and dry is (* lowest likelihood).
3. Hidden Markov Model (HMM)
A Hidden Markov Model (HMM) allows us to talk about both
- observed events (like words that we see in the input) and
- hidden events (like part-of-speech tags) that we think of as causal factors in our
probabilistic model.
A formal definition of a Hidden Markov Model focuses on how it
differs from a Markov chain.
- An HMM doesn’t rely on a start or end state,
- instead representing the distribution over initial and accepting states explicitly.
3. Hidden Markov Model (HMM) (Cont…)
A first-order hidden Markov model instantiates two simplifying assumptions;
First, the probability of a particular state depends only on the previous state (the Markov
assumption):
P(qi | q1 … qi−1) = P(qi | qi−1)
Second, the probability of an output observation oi
- depends only on the state qi that produced the observation and
- not on any other states or any other observations (output independence):
P(oi | q1 … qi … qT , o1 … oi … oT) = P(oi | qi)
3. Hidden Markov Model (HMM) [Example]
In Figure;
Two states : ‘Low’ and ‘High’ atmospheric
pressure.
Two observations : ‘Rain’ and ‘Dry’.
Transition probabilities: P(‘Low’|‘Low’)=0.3 ,
P(‘High’|‘Low’)=0.7 , P(‘Low’|‘High’)=0.2,
P(‘High’|‘High’)=0.8
Observation probabilities : P(‘Rain’|‘Low’)=0.6 ,
P(‘Dry’|‘Low’)=0.4 , P(‘Rain’|‘High’)=0.4 ,
P(‘Dry’|‘High’)=0.6 .
Initial probabilities: say P(‘Low’)=0.4 ,
P(‘High’)=0.6 .
3. Hidden Markov Model (HMM) [Example-1] (Cont…)
Calculation of the observation sequence probability;
Suppose we want to calculate the probability of a sequence of observations in our example,
{‘Dry’,’Rain’}, using the transition, observation, and initial probabilities given above.
Consider all possible hidden state sequences:
P({‘Dry’,’Rain’}) = P({‘Dry’,’Rain’}, {‘Low’,’Low’}) + P({‘Dry’,’Rain’}, {‘Low’,’High’}) +
P({‘Dry’,’Rain’}, {‘High’,’Low’}) + P({‘Dry’,’Rain’}, {‘High’,’High’})
where the first term is:
P({‘Dry’,’Rain’}, {‘Low’,’Low’}) = P(‘Dry’|’Low’) P(‘Low’) P(‘Rain’|’Low’) P(‘Low’|’Low’)
= 0.4 * 0.4 * 0.6 * 0.3 = 0.0288
(a Python sketch of this enumeration appears below).
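The same enumeration can be written as a short Python sketch, using the parameters of this example (with P(‘Dry’|‘High’) taken as 0.6 so that each state’s observation distribution sums to 1):

# Sketch: probability of observing {'Dry', 'Rain'} under the Low/High-pressure HMM,
# summed over all possible hidden state sequences.
from itertools import product

initial = {"Low": 0.4, "High": 0.6}
trans = {("Low", "Low"): 0.3, ("Low", "High"): 0.7,
         ("High", "Low"): 0.2, ("High", "High"): 0.8}
emit = {("Low", "Rain"): 0.6, ("Low", "Dry"): 0.4,
        ("High", "Rain"): 0.4, ("High", "Dry"): 0.6}

def observation_probability(observations):
    # Sum P(observations, states) over every possible hidden state sequence.
    total = 0.0
    for path in product(["Low", "High"], repeat=len(observations)):
        p = initial[path[0]] * emit[(path[0], observations[0])]
        for prev, curr, obs in zip(path, path[1:], observations[1:]):
            p *= trans[(prev, curr)] * emit[(curr, obs)]
        total += p
    return total

print(observation_probability(["Dry", "Rain"]))  # the {'Low','Low'} term alone is 0.0288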
3. Hidden Markov Model (HMM) [Example-2] (Cont…)
Typed word recognition; assume all characters are separated.
The character recognizer outputs the probability of an image being a particular
character, P(image|character).
3. Hidden Markov Model (HMM) [Example-3] (Cont…)
We can construct a single HMM for all words.
Hidden states = all characters in the alphabet.
Transition probabilities and initial probabilities are calculated from a language
model.
Observations and observation probabilities are as before.
Here we have to determine the best sequence of hidden states, the one that
most likely produced the word image.
This is an application of the decoding problem (a Viterbi sketch follows below).
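A minimal sketch of Viterbi decoding is shown below; the parameter dictionaries (initial, trans, emit) are assumed inputs, supplied in practice by the language model and by the character recognizer's P(image|character) scores.

# Sketch: Viterbi decoding - find the most probable hidden state (character) sequence.
def viterbi(observations, states, initial, trans, emit):
    # best[t][s] = (probability of the best path ending in state s at time t, backpointer)
    best = [{s: (initial[s] * emit[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        column = {}
        for s in states:
            prob, prev = max((best[-1][p][0] * trans[p][s] * emit[s][obs], p) for p in states)
            column[s] = (prob, prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(states, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return list(reversed(path))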
4. Left-to-right (Bakis) HMM
In left-to-right (also called Bakis)
HMMs, the state transitions proceed from
left to right.
In a Bakis HMM,
- no transitions go from a higher-numbered state to a lower-numbered state.
Such models range from a single-state HMM to multi-state HMMs, as shown in the figure
(a transition-matrix sketch follows below);
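The left-to-right constraint shows up as an upper-triangular transition matrix; below is a small sketch for a 4-state Bakis HMM, with made-up probabilities.

# Sketch: a 4-state left-to-right (Bakis) transition matrix. Entries below the
# diagonal are zero, so no transition moves to a lower-numbered state.
# The specific probabilities are invented for illustration.
import numpy as np

A = np.array([
    [0.6, 0.4, 0.0, 0.0],   # state 1 -> stay or move to state 2
    [0.0, 0.5, 0.5, 0.0],   # state 2 -> stay or move to state 3
    [0.0, 0.0, 0.7, 0.3],   # state 3 -> stay or move to state 4
    [0.0, 0.0, 0.0, 1.0],   # state 4 is the final (absorbing) state
])
assert np.allclose(A.sum(axis=1), 1.0)   # each row is a probability distribution
assert np.allclose(A, np.triu(A))        # upper-triangular: left-to-right only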
4. Left-to-right (Bakis) HMM (Home Assignment)
Draw left-to-right HMM models with 2 states, 3 states, and 4 states for the
following problems;
[Figures (a), (b), (c): tennis posture detection images]
5. Maximum Entropy Models
The second probabilistic machine learning framework is called maximum entropy modelling.
- MaxEnt is more widely known as multinomial logistic regression.
MaxEnt belongs to the family of classifiers known as exponential or log-linear
classifiers.
- MaxEnt works by extracting some set of features from the input,
- combining them linearly (meaning that each feature is multiplied by a weight and then
added up), and using this sum as an exponent.
Example-1:
In text classification,
- we need to decide whether a particular email should be classified as spam, or
- determine whether a particular sentence or document expresses a positive or negative
opinion.
5. Maximum Entropy Models (Cont…)
Example-2: Assume that we have some input x (perhaps it is a word that needs to be tagged or
a document that needs to be classified).
- From input x, we extract some features fi.
- A feature for tagging might be “this word ends in -ing”.
- For each such feature fi, we have some weight wi.
Given the features and weights, our goal is to choose a class for the word.
- The probability of a particular class c given the observation x is;
p(c|x) = (1/Z) exp( Σi wi fi )
where Z is a normalization factor, used to make the probabilities correctly sum to 1.
Finally, in the actual MaxEnt model,
- the features f and weights w both depend on the class c (i.e., we’ll have different features and
weights for different classes);
p(c|x) = exp( Σi wci fi(c,x) ) / Σc′ exp( Σi wc′i fi(c′,x) )
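A minimal sketch of this computation is shown below; the feature functions and weights are invented purely for illustration and are not taken from the slides.

# Sketch: MaxEnt / multinomial logistic regression,
# p(c|x) = exp(sum_i w_{c,i} f_i(c, x)) / Z, with Z summing over all classes.
import math

def maxent_probability(x, classes, features, weights):
    scores = {
        c: math.exp(sum(w * f(c, x) for f, w in zip(features, weights[c])))
        for c in classes
    }
    z = sum(scores.values())              # normalization factor Z
    return {c: s / z for c, s in scores.items()}

# Hypothetical example: tag a word as VB or NN using two binary features.
features = [
    lambda c, x: 1.0 if x["word"].endswith("ing") and c == "VB" else 0.0,
    lambda c, x: 1.0 if x["prev_tag"] == "TO" and c == "VB" else 0.0,
]
weights = {"VB": [0.8, 1.2], "NN": [0.0, 0.0]}   # made-up weights
print(maxent_probability({"word": "racing", "prev_tag": "TO"}, ["VB", "NN"], features, weights))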
5.1 Linear Regression
In linear regression, we are given a set of observations;
- each observation is associated with some features,
- and we want to predict some real-valued outcome for each observation.
Example; predicting housing prices.
Levitt and Dubner showed that the words used in a real estate ad can be a good predictor of;
- whether a house will sell for more or less than its asking price.
- e.g., houses whose real estate ads have words like
fantastic, cute, or charming tend to sell for
lower prices,
- e.g., while houses whose ads have words like
maple and granite tend to sell for
higher prices.
5.1 Linear Regression (Cont…)
Figure shows;
- a graph of these points, with the feature (# of adjectives) on the x-axis and the price on the
y-axis, together with the regression line.
Suppose the weight vector that we had previously learned for this task was
w = (w0, w1, w2, w3) = (18000, −5000, −3000, −1.8).
Then the predicted value for this house would be computed by multiplying each feature by
its weight (see the sketch below).
The equation of any line is y = mx + b; as shown on the graph, the slope of this line is
m = −4900, while the intercept is b = 16550.
The feature x here is the number of adjectives.
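Below is a minimal sketch of the prediction step; the weight vector is the one given above, while the feature values are hypothetical, since the slide's feature table is not reproduced here.

# Sketch: linear-regression prediction, price = w0 + w1*f1 + w2*f2 + w3*f3.
w = [18000, -5000, -3000, -1.8]   # weight vector from the slide; w0 is the intercept
features = [1, 1, 2, 500]         # f0 = 1 for the intercept; f1-f3 are made-up feature values

predicted_price = sum(wi * fi for wi, fi in zip(w, features))
print(predicted_price)            # 18000 - 5000 - 6000 - 900 = 6100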
5.1 Linear Regression (Class Participation)
Example; Global warming may be reducing average
snowfall in your town, and you are asked to predict how
much snow you think will fall this year.
Looking at the following table, you might guess somewhere
around 10-20 inches. That’s a good guess, but you could make
a better guess by using regression.
- Find the linear-regression predictions for 2014, 2015, 2016, 2017 and
2018.
Hint:
- Regression also gives you a useful equation, which for this
chart is: y = −2.2923x + 4624.4.
- For example, for 2005:
y = −2.2923(2005) + 4624.4 = 28.3385 inches, which is
pretty close to the actual figure of 30 inches for that year.
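A small sketch of this calculation, using only the regression equation given in the hint (the snowfall table itself is a figure and is not reproduced here):

# Sketch: evaluating the regression line from the hint for the requested years.
def predicted_snowfall(year):
    # y = -2.2923x + 4624.4, with x = year and y = snowfall in inches
    return -2.2923 * year + 4624.4

for year in [2005, 2014, 2015, 2016, 2017, 2018]:
    print(year, round(predicted_snowfall(year), 4))
# 2005 -> 28.3385 inches, matching the worked example above.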
5.2 Logistic Regression
In logistic regression, we classify whether some observation x is in the class (true) or not in
the class (false).
Example; we are assigning a part-of-speech tag to the word “race” in;
Secretariat/NNP is/BEZ expected/VBN to/TO race/?? tomorrow/
- We are just doing classification, not sequence classification, so let’s consider just this single
word.
- We would like to know whether to assign the class VB to race (or instead assign some other
class like NN).
Case 1: We can thus add a binary feature that is true if this is the case.
Case 2: Another feature would be whether the previous word “to” has the tag TO
(a logistic-regression sketch with two such features follows below).
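A minimal binary logistic-regression sketch in the spirit of Case 1 and Case 2 is shown below; the two indicator features and their weights are illustrative assumptions, not values from the slides.

# Sketch: binary logistic regression for the race/VB decision.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def p_is_vb(word, prev_word, prev_tag, weights, bias):
    # f1: the word itself is "race"; f2: the previous word "to" carries the tag TO
    f = [1.0 if word == "race" else 0.0,
         1.0 if prev_word == "to" and prev_tag == "TO" else 0.0]
    z = bias + sum(w * fi for w, fi in zip(weights, f))
    return sigmoid(z)            # P(class = VB | x); 1 minus this is P(not VB | x)

print(p_is_vb("race", "to", "TO", weights=[0.3, 1.5], bias=-0.5))   # made-up weights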
5.3 Maximum Entropy Markov Models
Previously, the HMM tagging model was based on probabilities of the form
P(tag|tag) and P(word|tag).
- That means that if we want to include some source of knowledge in the tagging process,
we must find a way to encode that knowledge into one of these two probabilities.
- But many knowledge sources are hard to fit into these models.
For example, when tagging unknown words,
- useful features include capitalization, the presence of hyphens, word endings, and so on.
- There is no easy way to fit probabilities like P(capitalization|tag), P(hyphen|tag),
P(suffix|tag), and so on into an HMM-style model.
- For an HMM to model the most probable part-of-speech tag sequence, we rely on Bayes’ rule:
T̂ = argmaxT P(T|W) = argmaxT P(W|T) P(T)
5.3 Maximum Entropy Markov Models (Cont…)
In an MEMM, we break down the probabilities as follows;
T̂ = argmaxT P(T|W) = argmaxT ∏i P(ti | wi, ti−1)
Fig. The dependency graph for a traditional HMM (left) and for a Maximum Entropy Markov
Model (right).
In the case of the HMM, its parameters are used to maximize the likelihood of the observation
sequence (see the figure at left).
In the MEMM, by contrast, the current state St depends on the current observation Ot and on
the previous state St−1.
5.3 Maximum Entropy Markov Models [Example-1] (Cont…)
More formally, in the HMM, we compute the
probability of the state sequence given the
observations (via Bayes’ rule, dropping the constant denominator) as;
P(Q|O) = ∏i P(oi|qi) × ∏i P(qi|qi−1)
In the MEMM, we compute the probability of the
state sequence given the observations directly as;
P(Q|O) = ∏i P(qi | qi−1, oi)
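The two factorizations can be written side by side as a small sketch; the parameter tables are assumed inputs, and a real MEMM would compute each P(qi | qi−1, oi) with a MaxEnt classifier over features of (qi−1, oi) rather than a lookup table.

# Sketch: scoring a candidate state sequence under the HMM and MEMM factorizations.
from math import prod

def hmm_score(states, observations, trans, emit):
    # prod_i P(o_i|q_i) * prod_i P(q_i|q_{i-1}); "START" marks the initial state.
    return prod(emit[(q, o)] for q, o in zip(states, observations)) * \
           prod(trans[(p, q)] for p, q in zip(["START"] + states[:-1], states))

def memm_score(states, observations, local):
    # prod_i P(q_i | q_{i-1}, o_i), each factor from a local conditional model.
    return prod(local[(p, o)][q]
                for p, q, o in zip(["START"] + states[:-1], states, observations))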
(Class Presentation)
Design a case study with proper examples of;
Linear Regression,
Logistic Regression,
Maximum Entropy Markov Models.
6. HMM vs. Maximum Entropy Markov Models [Example]
Text classification: Asia or Europe
[Training data: boxed example documents for the Europe and Asia classes, built from the
words Monaco, Hong, and Kong; see the original slide.]
NB model structure: Class → X1 = M
HMM factors: P(A) = P(E) = ; P(M|A) = ; P(M|E) =
HMM predictions: P(A,M) = ; P(E,M) =
MEMM predictions: P(A|M) = ; P(E|M) =
6. HMM vs. Maximum Entropy Markov Models [Example] (Cont…)
Text classification: Asia or Europe
[Training data: boxed example documents for the Europe and Asia classes, built from the
words Monaco, Hong, and Kong; see the original slide.]
NB model structure: Class → X1 = H, X2 = K
HMM factors: P(A) = P(E) = ; P(H|A) = , P(K|A) = ; P(H|E) = , P(K|E) =
HMM predictions: P(A,H,K) = ; P(E,H,K) =
MEMM predictions: P(A|H,K) = ; P(E|H,K) =
6. HMM vs. Maximum Entropy Markov Models [Example] (Cont…)
Text classification: Asia or Europe
[Training data: boxed example documents for the Europe and Asia classes, built from the
words Monaco, Hong, and Kong; see the original slide.]
NB model structure: Class → H, K, M
HMM factors: P(A) = P(E) = ; P(H|A) = , P(K|A) = , P(M|A) = ; P(H|E) = , P(K|E) = , P(M|E) =
HMM predictions: P(A,H,K,M) = ; P(E,H,K,M) =
MEMM predictions: P(A|H,K,M) = ; P(E|H,K,M) =
6. HMM vs. Maximum Entropy Markov Models [Example] (Cont…)
NLP relevance: we often have overlapping features….
HMM models multi-count correlated evidence
• Each feature is multiplied in, even when you have multiple features telling you the same thing
Maximum Entropy models (pretty much) solve this problem
• As we will see, this is done by weighting features so that model expectations match the observed
(empirical) expectations.
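To make the double-counting concrete, here is a small numerical sketch with invented counts: two perfectly correlated features (Hong and Kong) are each multiplied in by an NB/HMM-style model, while a MaxEnt-style model can split the weight between them so the combined evidence counts only once.

# Sketch: correlated features double-counted by an NB/HMM-style model vs. a
# MaxEnt-style model that shares weight between them. All numbers are invented.
import math

# NB/HMM-style: multiply in each feature's likelihood independently.
p_hk_given_asia, p_hk_given_europe = 0.9, 0.1   # same value for "Hong" and for "Kong"
prior_asia = prior_europe = 0.5
joint_asia = prior_asia * p_hk_given_asia * p_hk_given_asia       # Hong AND Kong counted twice
joint_europe = prior_europe * p_hk_given_europe * p_hk_given_europe
print("NB-style P(Asia|Hong,Kong):", joint_asia / (joint_asia + joint_europe))   # ~0.988

# MaxEnt-style: the two correlated features can share weight (w/2 each),
# so together they contribute what one feature's worth of evidence would.
w_single = math.log(0.9 / 0.1)        # weight one such feature "deserves"
score_asia = 2 * (w_single / 2)       # Hong and Kong each get half the weight
print("MaxEnt-style P(Asia|Hong,Kong):", 1 / (1 + math.exp(-score_asia)))        # ~0.9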