
Expectation Maximization
(Data Mining and Warehousing)

Outline
 Expectation Maximization
 How EM works
 Applications of EM
 Example – Probabilistic Clustering
 Fuzzy Clustering Using EM
 K-means vs EM in terms of Clustering
 Gaussian Mixture Model with EM
 Optimal Number of Clusters
 Advantages and Limitations
Expectation Maximization

 An iterative approach for finding (local) maximum likelihood or maximum a posteriori (MAP) estimates of model parameters.
 Can handle models with latent (unobserved) variables.
 Gaussian mixture models are an approach to density estimation in which the parameters of the component distributions are fit using the expectation-maximization algorithm.
General MLE Process Vs. EM

 Both Maximum Likelihood Estimation (MLE) and EM can find the "best-fit" parameters, but with different methodologies.
 MLE uses all of the observed data at once to estimate the parameters; EM starts with a guess at the parameters and then alternately refines the model to fit the guesses and the observed data.

 EM – The Chicken-and-Egg Problem
- We need the parameters (mean, covariance) to know which source (cluster) each point came from.
- We need to know the source of each point to estimate those parameters.
EM Algorithm – How It Works

Below are the steps of the Expectation Maximization Algorithm:


 The E-step: Estimate the values of the latent (missing) variables, given the observed data and the current parameter estimates.
 The M-step: Re-estimate the model parameters by maximizing the expected log-likelihood computed in the E-step.
These two steps are repeated until convergence; a minimal sketch of the loop is given below.
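The sketch below makes the E-step/M-step loop concrete for a two-component univariate Gaussian mixture. It is not part of the original slides: the synthetic data, the starting values, and the fixed iteration count are assumptions made purely for illustration.

```python
# A minimal EM loop for a 1-D mixture of two Gaussians (illustrative sketch).
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data drawn from two 1-D Gaussians (assumed for this demo).
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.5, 300)])

# Initial guesses for the mixing weights, means and standard deviations.
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sigma = np.array([1.0, 1.0])

def normal_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

for _ in range(100):  # a fixed iteration budget stands in for a convergence test
    # E-step: responsibility of each component for each data point.
    dens = np.stack([pi[k] * normal_pdf(x, mu[k], sigma[k]) for k in range(2)], axis=1)
    resp = dens / dens.sum(axis=1, keepdims=True)

    # M-step: re-estimate weights, means and variances from the responsibilities.
    nk = resp.sum(axis=0)
    pi = nk / len(x)
    mu = (resp * x[:, None]).sum(axis=0) / nk
    sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk)

print("weights:", pi.round(3), "means:", mu.round(3), "stds:", sigma.round(3))
```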
EM Algorithm – How It Works (Contd.)
 The EM algorithm is mostly used in probabilistic clustering methods (unsupervised), especially in Fuzzy Clustering and Probabilistic Model Based Clustering.

Applications of The EM Algorithm
 Computer Vision and Machine Learning.
 Natural Language Processing (NLP).
 Estimating the parameters of Hidden Markov Model (HMM) classifiers.
 Reconstruction of medical images.
Probabilistic Clustering

 A method for deriving clusters in which each object is assigned a probability of belonging to each cluster.
 Data objects are assumed to come from some underlying distribution.
 If the probability is interpreted as a degree of membership, then these are fuzzy clustering techniques.
Fuzzy Clustering

Given a set of objects, X = {x1,…,xn}, a fuzzy set S is a subset of X that allows each object in X to have a membership degree between 0 and 1. Formally, a fuzzy set S can be modeled as a function FS: X -> [0,1].
This concept can be applied to clustering.
 Fuzzy clustering allows an object to belong to more than one cluster.
 The clustering can be represented using a partition matrix M, where each object is assigned a membership degree for each fuzzy cluster.
 Also called soft clustering.
 Used in text mining.
Fuzzy Clustering Using EM

Consider the following six points:
a(3,3), b(4,10), c(9,6), d(14,8), e(18,11) and f(21,7)
We randomly select two points, say c1 = a and c2 = b, as the initial centers of the two clusters.
Fuzzy Clustering Using EM (Contd.)

 1st E-step: Assign objects to the clusters.
 Calculate the weight (membership degree) of each object for each cluster: wij denotes the weight of object i in cluster j.
 For any data object o and the two current centers c1 and c2, the weights are

w(o, c1) = dist(o, c2)^2 / (dist(o, c1)^2 + dist(o, c2)^2)
w(o, c2) = dist(o, c1)^2 / (dist(o, c1)^2 + dist(o, c2)^2)

For data point c(9,6): dist(c, c1)^2 = 45 and dist(c, c2)^2 = 41, so
w(c, c1) = 41/86 ≈ 0.48 and w(c, c2) = 45/86 ≈ 0.52


Fuzzy Clustering Using EM (Contd.)

1st M-step: Update the previously assigned centroids using the weighted data points,

cj = ( Σo w(o, cj)^2 * o ) / ( Σo w(o, cj)^2 ),   where j = 1, 2.

According to this formula, after the 1st iteration we get the updated centers c1(8.47, 5.12) and c2(10.42, 8.99).
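To make these numbers easy to verify, here is a small NumPy sketch (not part of the original slides) that reproduces the first E-step and M-step of this example. The weights are rounded to two decimal places before the M-step, matching the partition matrix on the slide, so the printed centers agree with the values quoted above.

```python
# Reproduce the first E-step and M-step of the six-point fuzzy clustering example.
import numpy as np

points = np.array([[3, 3], [4, 10], [9, 6], [14, 8], [18, 11], [21, 7]], dtype=float)
c1, c2 = points[0].copy(), points[1].copy()   # initial centers: a and b

# E-step: membership weights from squared Euclidean distances.
d1 = ((points - c1) ** 2).sum(axis=1)
d2 = ((points - c2) ** 2).sum(axis=1)
w1 = np.round(d2 / (d1 + d2), 2)              # weight towards cluster 1
w2 = np.round(d1 / (d1 + d2), 2)              # weight towards cluster 2
print("weights for c1:", w1)                  # [1.   0.   0.48 0.42 0.41 0.47]
print("weights for c2:", w2)                  # [0.   1.   0.52 0.58 0.59 0.53]

# M-step: centers as means weighted by the squared membership degrees.
c1 = (w1[:, None] ** 2 * points).sum(axis=0) / (w1 ** 2).sum()
c2 = (w2[:, None] ** 2 * points).sum(axis=0) / (w2 ** 2).sum()
print("updated c1:", c1.round(2))             # [8.47 5.12]
print("updated c2:", c2.round(2))             # [10.42  8.99]
```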
Fuzzy Clustering Using EM (Contd.)

Partition matrix and updated centroids after three iterations:


Comparison of Clustering Methods (k-means vs. EM)

K-means:
1. Hard clustering.
2. Based on Euclidean distance.
3. Works on numeric data only.
4. Inherently non-robust; sensitive to outliers.

Expectation Maximization:
1. Soft clustering.
2. Based on probability density.
3. Works on both nominal and numeric data.
4. Robust method.
Probabilistic Model Based Clustering and Mixture Models

Probabilistic Model Based Clustering
 Cluster analysis aims to find the hidden categories underlying the data, based on generative models.
 Each hidden category is a distribution over the data space.
 Each category represents a probabilistic cluster.

Mixture Models
 A probabilistically grounded way of doing soft clustering.
 Each cluster is described by a generative model (e.g., Gaussian or multinomial).
 The model parameters (mean, covariance, etc.) are unknown and must be estimated.
Advantages and Limitations of EM

Advantages:
 The likelihood is guaranteed not to decrease with each iteration.
 The E-step and M-step are often quite easy to implement for many problems.
 Solutions to the M-step often exist in closed form.
Limitations:
 Slow convergence.
 It may converge to a local optimum only.
 It requires both the forward and backward probabilities (numerical optimization requires only the forward probability).
EM – Dealing with The Local Maxima Problem

 Each EM iteration increases (or at least does not decrease) the likelihood, but the algorithm has a propensity to converge to a local maximum.
 Restarting the algorithm from several different initial "parameter guesses" can be a solution, as sketched below.
 Among all the (guessed) starting parameters, we choose the run that yields the greatest likelihood.
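A minimal sketch of this restart strategy using scikit-learn's GaussianMixture; the data and parameter values below are assumed for illustration only. The n_init argument repeats the EM run from several random initializations and keeps the best-scoring one.

```python
# Random restarts for EM via scikit-learn's GaussianMixture (illustrative sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(4, 1, (100, 2))])

# n_init=10 repeats the EM run from 10 random starts and keeps the best fit.
gm = GaussianMixture(n_components=2, n_init=10, init_params="random", random_state=0)
gm.fit(X)
print("best average log-likelihood per sample:", gm.score(X))
print("estimated means:\n", gm.means_.round(2))
```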
References

1. Jung YG, Kang MS, Heo J. Clustering performance comparison using K-means and expectation maximization algorithms. Biotechnol Biotechnol Equip. 2014;28(sup1):S44-S48. doi:10.1080/13102818.2014.949045
2. Gupta, Ujjwal Das, Vinay Menon, and Uday Babbar. "Detecting the number of clusters during expectation-maximization clustering using information criterion." In 2010 Second International Conference on Machine Learning and Computing, pp. 169-173. IEEE, 2010.
3. https://towardsdatascience.com/a-comparison-between-k-means-clustering-and-expectation-maximization-estimation-for-clustering-8c75a1193eb7
4. https://www.geeksforgeeks.org/ml-expectation-maximization-algorithm/
5. http://www.inf.ed.ac.uk/teaching/courses/iaml/2011/slides/em.pdf
6. https://machinelearningmastery.com/expectation-maximization-em-algorithm/
7. https://www.statisticshowto.com/em-algorithm-expectation-maximization/
Thank you!!
Univariate Gaussian Mixture Model with EM

Does it look more like a sample from the yellow Gaussian, or the blue one?

Bayesian Posterior

Each cluster follows a 1-D Gaussian distribution.
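The question posed by the figure is answered by the Bayesian posterior (the responsibility) of each component. The short sketch below is illustrative only: the mixing weights, means, and standard deviations are made-up numbers, not values from the slides.

```python
# Posterior probability that a point x came from each 1-D Gaussian component.
from scipy.stats import norm

x = 2.0                      # the query point
weights = [0.6, 0.4]         # mixing proportions (yellow, blue), assumed
means = [0.0, 5.0]           # component means, assumed
stds = [1.0, 1.5]            # component standard deviations, assumed

# Bayes' rule: P(k | x) is proportional to P(k) * N(x | mu_k, sigma_k).
joint = [w * norm.pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
total = sum(joint)
posterior = [j / total for j in joint]
print("P(yellow | x) = %.3f, P(blue | x) = %.3f" % tuple(posterior))
```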
Optimal Number of Gaussians (k)

 What happens if we keep increasing the number of clusters?
- The dimensionality (number of parameters) of the model also increases.
- The likelihood increases monotonically.
 What if we focus only on maximizing the likelihood, with any number of clusters allowed?
- We may end up with k = n clusters for n data points.
 An information criterion is used to select among models with different numbers of parameters p.
- It introduces a penalty term for each parameter.
- Pick the simplest adequate model.
BIC : choose p that maximizes { L – ½*p*log(n) }
AIC : choose p that minimizes { 2*p – 2*L }
(L is the maximized log-likelihood, p the number of parameters, n the number of data points)
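As an illustration of model selection with an information criterion, here is a short scikit-learn sketch (synthetic data and assumed settings, not from the slides) that fits Gaussian mixtures for several values of k and keeps the one with the lowest BIC:

```python
# Choosing the number of mixture components with BIC (illustrative sketch).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (150, 2)),
               rng.normal(5, 1, (150, 2)),
               rng.normal((0, 6), 1, (150, 2))])

# Fit mixtures with k = 1..6 components and keep the k with the lowest BIC.
bics = []
for k in range(1, 7):
    gm = GaussianMixture(n_components=k, n_init=5, random_state=0).fit(X)
    bics.append(gm.bic(X))           # gm.aic(X) would give the AIC instead
    print(f"k={k}: BIC={bics[-1]:.1f}")

best_k = int(np.argmin(bics)) + 1
print("k chosen by BIC:", best_k)
```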
