ISyE 6416: Computational Statistics
Spring 2023
Lecture 8: EM algorithm and
Gaussian Mixture Model
Prof. Yao Xie
H. Milton Stewart School of Industrial and Systems Engineering
Georgia Institute of Technology
Expectation-Maximization (EM) Algorithm
▶ an algorithm for computing the maximum likelihood estimator in non-ideal cases: missing data, indirect observations
▶ missing data
▶ clustering (unknown label)
▶ hidden-states in HMM
▶ latent factors
▶ replace one difficult likelihood maximization with a sequence of easier maximizations
▶ in the limit, the answer to the original problem
Applications of EM
▶ Data clustering in machine learning
▶ Natural language processing (Baum-Welch algorithm to fit hidden Markov model)
▶ Imputing missing data
General set-up
[Diagram: hidden-state space S and observation space O]
▶ we do not observe S, only observe indirectly from O
▶ Joint distribution of state and observation f (S, O|θ)
Deriving EM
▶ Introduce
Q(θ; θ′ ) = E{log f (S, O|θ)|θ′ , O}
▶ The expectation uses the conditional distribution of S given O and an assumed value of the parameter θ′
Intuition
Given O, the “best guess” we could have for S is its conditional expectation with respect to S|O, θ (a notion of projection); but computing this expectation involves the parameter value. We take a guess and improve it in the next round.
Comment on the Q function
The Q-function (the conditional expected complete-data log-likelihood):
Q(θ; θ′ ) = E{log f (S, O|θ)|θ′ , O}
▶ The expectation is taken with respect to the conditional distribution f (S|O)
▶ O: observed data
▶ In this sense, it has a Bayesian flavor: we have to compute the posterior distribution of the state given the observation
▶ θ′ : assumed value of the parameter when deriving the posterior distribution f (S|O)
▶ θ is the parameter in the “log-likelihood” log f (S, O|θ) that we will maximize with respect to
▶ θ and θ′ are generally not the same within an iteration of the algorithm
E-M algorithm
▶ E-step: compute the expectation of the complete-data log-likelihood (observed data O, unknown state S)
Q(θ; θ′ ) = E{log f (S, O|θ)|θ′ , O}
▶ M-step: compute the maximum likelihood estimate using the expectation from the previous step
E-step ⇒ M-step ⇒ E-step ⇒ M-step ⇒ · · ·
▶ stop when ∥θ_{k+1} − θ_k∥ < ϵ or |Q(θ_{k+1}|θ_k) − Q(θ_k|θ_{k−1})| < ϵ
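A minimal sketch of this generic loop, assuming user-supplied e_step and m_step routines (hypothetical names, not from the lecture) and a simple parameter-change stopping rule:

```python
import numpy as np

def em(theta0, e_step, m_step, tol=1e-6, max_iter=100):
    """Generic EM loop. `e_step(theta)` returns whatever posterior
    expectations the M-step needs; `m_step(expectations)` returns the
    maximizer of Q(. | theta). Stops when the parameter stops moving."""
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        expectations = e_step(theta)      # E-step: form the ingredients of Q(. | theta)
        theta_new = np.asarray(m_step(expectations), dtype=float)  # M-step
        if np.linalg.norm(theta_new - theta) < tol:
            return theta_new
        theta = theta_new
    return theta
```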
Example: EM for missing data
n = 4, p = 2
x1 = (0, 2)T , x2 = (1, 0)T , x3 = (2, 2)T , x4 = (∗, 4)T
Assume they are i.i.d. samples from the Gaussian
N([µ1 , µ2 ]^T , diag(σ1^2 , σ2^2 ))
Use EM algorithm to impute the missing data *.
Hidden state: Missing data.
Pattern classification, R. O. Duda, P. E. Hart, and D. G. Stork
(Cont.) Example: missing data
▶ Initialization: θ0 = (0, 0, 1, 1)T , i.e., mean [0, 0]T and covariance I2 .
▶ E-step
Q(θ|θ0 ) = E_{x41}[log p(x|θ) | x1 , x2 , x3 , x42 ]
= Σ_{i=1}^{3} log p(xi |θ) + ∫ log p([x41 , 4]^T |θ) · p([x41 , 4]^T |θ0 ) dx41
= Σ_{i=1}^{3} log p(xi |θ) − (1 + µ1^2)/(2σ1^2) − (4 − µ2)^2/(2σ2^2) − log(2πσ1 σ2 )
▶ M-step
θ1 = arg max_θ Q(θ|θ0 )
(Cont.) Example: missing data - iterations
θ1 = (0.75, 2.0, 0.938, 2.0)^T ⇒ µ1 = [0.75, 2.0]^T , Σ1 = diag(0.938, 2.0)
θ2 = (1.0, 2.0, 0.667, 2.0)^T
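A short code sketch of this example, using the closed-form M-step implied by the Q-function above (not the lecture's own code); it reproduces θ1 in the first iteration and converges to the values shown for θ2:

```python
import numpy as np

# Observed data; the first coordinate of x4 is missing.
x_obs = np.array([[0., 2.], [1., 0.], [2., 2.]])
x42 = 4.0                       # observed second coordinate of x4

mu = np.array([0., 0.])         # theta0 = (0, 0, 1, 1): mean [0, 0]^T,
var = np.array([1., 1.])        # diagonal variances (1, 1)

for _ in range(100):
    # E-step: with a diagonal covariance, x41 | x42, theta_k ~ N(mu1, sigma1^2),
    # so we only need E[x41] = mu1 and E[x41^2] = sigma1^2 + mu1^2.
    e_x41 = mu[0]
    e_x41_sq = var[0] + mu[0] ** 2

    # M-step: closed-form maximizer of Q(theta | theta_k).
    mu_new = np.array([(x_obs[:, 0].sum() + e_x41) / 4,
                       (x_obs[:, 1].sum() + x42) / 4])
    var_new = np.array([
        (((x_obs[:, 0] - mu_new[0]) ** 2).sum()
         + e_x41_sq - 2 * mu_new[0] * e_x41 + mu_new[0] ** 2) / 4,
        (((x_obs[:, 1] - mu_new[1]) ** 2).sum() + (x42 - mu_new[1]) ** 2) / 4,
    ])
    if np.allclose(mu_new, mu) and np.allclose(var_new, var):
        break
    mu, var = mu_new, var_new

# First iteration: mu = (0.75, 2.0), var = (0.938, 2.0);
# the iterates converge to mu = (1.0, 2.0), var = (0.667, 2.0).
print(mu, var)
```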
The absent-minded biologist
197 animals
Distributed into 4 categories
125 18 20 34
Multinomial model with 5 categories and unknown parameter θ:
(1/2, θ/4, (1 − θ)/4, (1 − θ)/4, θ/4)
Can we figure out the number of “Monkey A” animals based on the data?
(Cont.) The absent-minded biologist
▶ data y = (125, 18, 20, 34)
▶ now assume y1 = y11 + y12 = 125
▶ Likelihood function
f (y|θ) = n!/(y11 ! y12 ! y2 ! y3 ! y4 !) · (1/2)^{y11} (θ/4)^{y12} ((1 − θ)/4)^{y2} ((1 − θ)/4)^{y3} (θ/4)^{y4}
▶ log-likelihood
ℓ(θ|y) ∝ (y12 + y4 ) log θ + (y2 + y3 ) log(1 − θ)
▶ y12 unknown, cannot directly maximize ℓ(θ|y)
(Cont.) The absent-minded biologist: set-up EM
Q(θ|θ′ ) = Ey12 [(y12 + y4 ) log θ + (y2 + y3 ) log(1 − θ)|y1 , . . . , y4 , θ′ ]
= (Ey12 [y12 |y1 , θ′ ] + y4 ) log θ + (y2 + y3 ) log(1 − θ)
Conditional distribution of y12 given y1 : Binomial(y1 , (θ′/4)/(θ′/4 + 1/2))
E_{y12}[y12 |y1 , θ′ ] = y1 θ′/(2 + θ′ ) := y12^{θ′}
E-step:
Q(θ|θ′ ) = (y12^{θ′} + y4 ) log θ + (y2 + y3 ) log(1 − θ)
M-step:
θ_{k+1} = arg max_θ Q(θ|θ_k) = (y12^{θ_k} + y4 ) / (y12^{θ_k} + y2 + y3 + y4 )
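A brief sketch of the resulting iteration for the counts above (not the lecture's code; the limit quoted in the comment is the numerical fixed point of this update):

```python
import numpy as np

y = np.array([125, 18, 20, 34])     # observed counts (y1, y2, y3, y4)
theta = 0.5                          # initial guess

for _ in range(200):
    # E-step: y12 | y1, theta ~ Binomial(y1, (theta/4) / (theta/4 + 1/2)),
    # so E[y12 | y1, theta] = y1 * theta / (2 + theta).
    y12 = y[0] * theta / (2 + theta)
    # M-step: maximize (y12 + y4) log(theta) + (y2 + y3) log(1 - theta).
    theta_new = (y12 + y[3]) / (y12 + y[1] + y[2] + y[3])
    if abs(theta_new - theta) < 1e-10:
        break
    theta = theta_new

print(theta)   # approaches roughly 0.627 for these counts
```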
Fitting Gaussian mixture model (GMM)
xi ∼ Σ_{c=1}^{C} πc ϕ(xi |µc , Σc )
ϕ: density of multi-variate normal
▶ parameters {µc , Σc , πc }_{c=1}^{C}
▶ assume C is known.
▶ observed data {x1 , . . . , xn }
▶ complete data {(x1 , y1 ), . . . , (xn , yn )}
yi : “label” for each sample, missing.
[Figure: samples (xi , yi ) drawn from three Gaussian components with mixing weights π1 , π2 , π3 ]
EM for GMM
▶ If we know the label information yi , the likelihood function can be easily written:
π_{yi} ϕ(xi |µ_{yi} , Σ_{yi} )
▶ now yi is unknown; compute its expectation with respect to the current set of parameters:
Q(θ|θ′ ) = Σ_{i=1}^{n} E[log π_{yi} + log ϕ(xi |µ_{yi} , Σ_{yi} )|xi , θ′ ]
E-step
▶ (πc^{(k)} , µc^{(k)} , Σc^{(k)} ): parameter values in the kth iteration
▶ we need yi |xi , the posterior distribution of the label given observation xi :
pi,c := p(yi = c|xi ) ∝ πc^{(k)} ϕ(xi |µc^{(k)} , Σc^{(k)} )
and Σ_{c=1}^{C} p(yi = c|xi ) = 1
Q(θ|θ_k) = Σ_{i=1}^{n} E[log π_{yi} + log ϕ(xi |µ_{yi} , Σ_{yi} )|xi , θ_k]
= Σ_{i=1}^{n} Σ_{c=1}^{C} pi,c log πc + Σ_{i=1}^{n} Σ_{c=1}^{C} pi,c log ϕ(xi |µc , Σc )
Q: where is θ?
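(θ enters through πc , µc , Σc inside the two sums; the pi,c themselves are computed under the fixed θ_k.) A small sketch of the responsibility computation, with made-up toy parameters and scipy used for the normal density ϕ:

```python
import numpy as np
from scipy.stats import multivariate_normal

# Toy 2-D data and current (k-th iteration) parameter guesses -- made-up values.
X = np.array([[0.0, 0.1], [3.0, 2.9], [0.2, -0.1]])
pis = np.array([0.5, 0.5])
mus = np.array([[0.0, 0.0], [3.0, 3.0]])
Sigmas = np.array([np.eye(2), np.eye(2)])

# Unnormalized posteriors pi_c^(k) * phi(x_i | mu_c^(k), Sigma_c^(k)) ...
resp = np.column_stack([
    pis[c] * multivariate_normal.pdf(X, mean=mus[c], cov=Sigmas[c])
    for c in range(len(pis))
])
# ... normalized so each row sums to one: p_{i,c} = p(y_i = c | x_i).
resp /= resp.sum(axis=1, keepdims=True)
print(resp.round(3))
```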
M-step
▶ Maximize Q(θ|θ_k) with respect to πc , µc , Σc (note that they can be maximized separately):
θ_{k+1} = arg max_θ Q(θ|θ_k)
▶ note that Σ_{c=1}^{C} πc = 1
µc^{(k+1)} = Σ_{i=1}^{n} pi,c xi / Σ_{i=1}^{n} pi,c
Σc^{(k+1)} = Σ_{i=1}^{n} pi,c (xi − µc^{(k+1)} )(xi − µc^{(k+1)} )^T / Σ_{i=1}^{n} pi,c
πc^{(k+1)} = (1/n) Σ_{i=1}^{n} pi,c
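A sketch of these closed-form M-step updates (a plain-numpy illustration, not the lecture's implementation); alternating it with the responsibility computation above gives one full EM iteration for the GMM:

```python
import numpy as np

def m_step_gmm(X, resp):
    """Closed-form M-step for a Gaussian mixture, given the E-step
    responsibilities. X: (n, p) data; resp: (n, C) matrix of p_{i,c}
    whose rows sum to one."""
    n = X.shape[0]
    Nc = resp.sum(axis=0)                 # expected number of samples per component
    pis = Nc / n                          # pi_c^(k+1)
    mus = (resp.T @ X) / Nc[:, None]      # mu_c^(k+1): soft-weighted means
    Sigmas = []
    for c in range(resp.shape[1]):
        diff = X - mus[c]
        Sigmas.append((resp[:, c, None] * diff).T @ diff / Nc[c])  # Sigma_c^(k+1)
    return pis, mus, np.array(Sigmas)
```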
Interpretation
▶ pi,c : probability of each sample belonging to component c
▶ πc^{(k+1)} : counts the expected number of samples belonging to component c
▶ soft assignment: xi belongs to component c with assignment probability pi,c
▶ µc^{(k+1)} : “average” centroid using soft assignment
▶ Σc^{(k+1)} : “average” covariance using soft assignment
[Figure: soft-assignment probabilities P(yi = j|xi ) of a sample xi across three components, e.g., 0.5, 0.3, 0.2]
k-means
▶ K-means: “hard” assignment
▶ EM algorithm: “soft” assignment: in the end, pi,c can be viewed as a soft label for each sample; convert into a hard label:
ĉi = arg max_{c=1,...,C} pi,c
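For example (toy responsibilities, not from the lecture), the conversion is a single argmax per row:

```python
import numpy as np

# Soft assignments p_{i,c} from the E-step (rows sum to one) -- toy values.
resp = np.array([[0.5, 0.3, 0.2],
                 [0.1, 0.8, 0.1]])
hard_labels = resp.argmax(axis=1)   # most probable component per sample -> [0, 1]
```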
Demo
▶ The wine data set was introduced by Forina et al. (1986)
▶ It originally includes the results of 27 chemical measurements on 178 wines made
in the same region of Italy but derived from three different cultivars: Barolo,
Grignolino and Barbera
▶ We use the first two principal components of the data
Mixture of 3 Gaussian components
▶ First run PCA to reduce the data dimension to 2
▶ Use pi,c , c = 1, 2, 3 as the proportion of “red”, “green”, and “blue” components
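A rough approximation of this demo using scikit-learn's bundled 13-feature copy of the wine data (an assumption; the lecture used the original 27 measurements and its own EM code):

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# scikit-learn ships a 13-feature version of the Forina wine data
# (178 samples, 3 cultivars); the lecture used the original 27 measurements.
X = load_wine().data
X2 = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))

# Fit a 3-component Gaussian mixture (EM under the hood); p_{i,c} = soft labels.
gmm = GaussianMixture(n_components=3, random_state=0).fit(X2)
p = gmm.predict_proba(X2)     # (178, 3): proportions for "red", "green", "blue"
print(p[:5].round(3))
```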
Properties of EM
▶ The EM algorithm converges to a local maximum
▶ Heuristic: escape local maxima through random restarts
▶ EM works on improving Q(θ|θ′ ) rather than directly improving log f (x|θ)
▶ one can show that improvement on Q(θ|θ′ ) improves log f (x|θ)
▶ EM works well with exponential family
▶ E-step: sum of expectations of the sufficient statistics
▶ M-step: maximizing a linear function
usually possible to derive closed-form update
Convergence of EM
▶ Proof by A. Dempster, N. Laird and D. Rubin in 1977, later generalized by C. F. J. Wu in 1983
▶ Basic idea: find a sequence of lower bounds for the likelihood function
▶ EM monotonically increases the observed-data log-likelihood:
ℓ(θk+1 ) ≥ Q(θk+1 ; θk ) ≥ Q(θk ; θk ) = ℓ(θk )
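The chain of inequalities can be made precise with the standard decomposition below (a sketch of the usual argument, not necessarily the exact proof shown in lecture):

```latex
% Decompose the observed-data log-likelihood using any reference value theta':
%   ell(theta) = log f(O | theta) = Q(theta; theta') - H(theta; theta'),
% where H(theta; theta') = E{ log f(S | O, theta) | O, theta' }.
\[
\ell(\theta_{k+1}) - \ell(\theta_k)
 = \underbrace{\bigl[Q(\theta_{k+1};\theta_k) - Q(\theta_k;\theta_k)\bigr]}_{\ge 0
   \ \text{(M-step maximizes } Q(\cdot\,;\theta_k))}
 \;-\;
 \underbrace{\bigl[H(\theta_{k+1};\theta_k) - H(\theta_k;\theta_k)\bigr]}_{\le 0
   \ \text{(Jensen's inequality)}}
 \;\ge\; 0.
\]
```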