Lec 12

The document discusses generative models in machine learning, focusing on maximum likelihood estimation (MLE) and the differences between generative and discriminative algorithms. It highlights the advantages and disadvantages of generative algorithms, such as their ability to work with limited data and robustness to feature corruption. Additionally, it introduces more complex models like Gaussian mixtures and the Expectation Maximization (EM) algorithm for optimizing these models with latent variables.

Generative Models

General Recipe for MLE Algorithms

Given a problem with label set $\mathcal{Y}$, find a way to map data features $\mathbf{x}$ to PMFs $\mathbb{P}[y \mid \mathbf{x}, \boldsymbol{\theta}]$ with support $\mathcal{Y}$
The notation $\boldsymbol{\theta}$ captures parameters in the model (e.g. weight vectors, bias terms)
For binary classification, $\mathcal{Y} = \{-1, +1\}$ and the PMF has two entries; for multiclassification, $\mathcal{Y} = [C]$ and the PMF has $C$ entries
The function $\boldsymbol{\theta} \mapsto \mathbb{P}[y \mid \mathbf{x}, \boldsymbol{\theta}]$ is often called the likelihood function
The function $\boldsymbol{\theta} \mapsto -\log \mathbb{P}[y \mid \mathbf{x}, \boldsymbol{\theta}]$ is called the negative log likelihood function
Given data $\{(\mathbf{x}^i, y^i)\}_{i=1}^n$, find the model parameters that maximize the likelihood function, i.e. parameters that think the training labels are very likely:
$\hat{\boldsymbol{\theta}} = \arg\max_{\boldsymbol{\theta}} \prod_{i=1}^n \mathbb{P}[y^i \mid \mathbf{x}^i, \boldsymbol{\theta}] = \arg\min_{\boldsymbol{\theta}} \sum_{i=1}^n -\log \mathbb{P}[y^i \mid \mathbf{x}^i, \boldsymbol{\theta}]$
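To make the recipe concrete, here is a minimal sketch in Python, assuming (our choice for illustration, not something fixed by the slides) a sigmoid likelihood map for binary labels in $\{-1,+1\}$; the MLE is found by numerically minimizing the negative log likelihood on made-up data.

```python
# Minimal MLE-recipe sketch (illustrative): sigmoid likelihood for y in {-1,+1},
# i.e. P[y | x, w] = 1 / (1 + exp(-y * w^T x)), fitted by minimizing the NLL.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])                      # made-up "true" parameters
y = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-X @ w_true)), 1, -1)

def nll(w):
    # negative log likelihood: sum_i -log P[y_i | x_i, w], computed stably
    margins = y * (X @ w)
    return np.sum(np.logaddexp(0.0, -margins))

w_mle = minimize(nll, x0=np.zeros(d)).x                  # numerical MLE
print("estimated parameters:", np.round(w_mle, 2))
```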
Generative Models

So far, we looked at probability theory as a tool to express the belief of an ML algorithm that the true label is such and such
Likelihood: given a model $\boldsymbol{\theta}$, it tells us $\mathbb{P}[y \mid \mathbf{x}, \boldsymbol{\theta}]$
We also looked at how to use probability theory to express our beliefs about which models are preferred by us and which are not
Prior: this just tells us $\mathbb{P}[\boldsymbol{\theta}]$
Notice that in all of this, the data features were always considered constant and never questioned as being random or flexible
Can we also talk about $\mathbb{P}[\mathbf{x}]$?
Generative Algorithms

ML algos that can learn distributions of the form $\mathbb{P}[\mathbf{x}]$ or $\mathbb{P}[\mathbf{x} \mid y]$ or $\mathbb{P}[\mathbf{x}, y]$
A slightly funny bit of terminology used in machine learning
Discriminative Algorithms: those that only use $\mathbb{P}[y \mid \mathbf{x}]$ to do their stuff
Generative Algorithms: those that use $\mathbb{P}[\mathbf{x}]$ or $\mathbb{P}[\mathbf{x}, y]$ etc. to do their stuff
Generative algorithms have their advantages and disadvantages
More expensive: slower train times, slower test times, larger models
An overkill: often we need only $\mathbb{P}[y \mid \mathbf{x}]$ to make predictions – discriminative algos are enough!
More frugal: can work even if we have very little training data (e.g. RecSys)
More robust: can work even if features are corrupted, e.g. some features missing
A recent application of generative techniques (GANs etc.) allows us to generate novel examples of a certain class of data points
A very simple generative model

Given a few feature vectors $\mathbf{x}^1, \dots, \mathbf{x}^n \in \mathbb{R}^d$ (never mind labels for now)
We wish to learn a probability distribution with support over $\mathbb{R}^d$
This distribution should capture interesting properties about the data in a way that allows us to do things like generate similar-looking feature vectors etc.
Let us try to learn a standard (unit-covariance) Gaussian as this distribution, i.e. we wish to learn $\boldsymbol{\mu} \in \mathbb{R}^d$ so that the distribution $\mathcal{N}(\boldsymbol{\mu}, I_d)$ explains the data well
One way is to look for a $\boldsymbol{\mu}$ that achieves maximum likelihood, i.e. MLE!!
As before, assume that our feature vectors were independently generated; the log likelihood is
$\sum_{i=1}^n \log \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}, I_d) = -\frac{1}{2}\sum_{i=1}^n \|\mathbf{x}^i - \boldsymbol{\mu}\|_2^2 + \text{const}$
which, upon applying first order optimality, gives us $\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}^i$
We just learnt $\mathcal{N}(\hat{\boldsymbol{\mu}}, I_d)$ as our generating dist. for data features!
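A small numerical sketch of this result on made-up 2-D data: under a unit-covariance Gaussian, the MLE of $\boldsymbol{\mu}$ is simply the sample mean.

```python
# MLE for the mean of N(mu, I_d): the sample mean (illustrative synthetic data).
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, d = 500, 2
X = rng.multivariate_normal(mean=[2.0, -1.0], cov=np.eye(d), size=n)

mu_hat = X.mean(axis=0)          # first-order optimality: mu_hat = (1/n) sum_i x_i

# sanity check: the sample mean scores at least as well as a nearby guess
log_lik = lambda mu: multivariate_normal(mean=mu, cov=np.eye(d)).logpdf(X).sum()
print(mu_hat.round(2), log_lik(mu_hat) >= log_lik(mu_hat + 0.1))
```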
A more powerful generative model

Suppose we are not satisfied with the above simple model
Suppose we wish to instead learn a $\boldsymbol{\mu} \in \mathbb{R}^d$ as well as a $\sigma > 0$ so that the distribution $\mathcal{N}(\boldsymbol{\mu}, \sigma^2 \cdot I_d)$ explains the data well
Log likelihood function (be careful – cannot ignore any terms now):
$\mathcal{LL}(\boldsymbol{\mu}, \sigma) = \sum_{i=1}^n \log \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}, \sigma^2 I_d) = -\frac{nd}{2}\log(2\pi\sigma^2) - \frac{1}{2\sigma^2}\sum_{i=1}^n \|\mathbf{x}^i - \boldsymbol{\mu}\|_2^2$
F.O. optimality w.r.t. $\boldsymbol{\mu}$, i.e. $\nabla_{\boldsymbol{\mu}} \mathcal{LL} = \mathbf{0}$, gives us $\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}^i$
F.O. optimality w.r.t. $\sigma$, i.e. $\frac{\partial \mathcal{LL}}{\partial \sigma} = 0$, gives us $\hat{\sigma}^2 = \frac{1}{nd}\sum_{i=1}^n \|\mathbf{x}^i - \hat{\boldsymbol{\mu}}\|_2^2$
Since this stationary point satisfies $\sigma > 0$ (whenever the data points are not all identical), this must be the global opt. too!
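A similar sketch for the $\mathcal{N}(\boldsymbol{\mu}, \sigma^2 I_d)$ model on made-up data: the closed-form MLE updates are the sample mean and the average squared deviation over all $n \cdot d$ coordinates.

```python
# MLE for N(mu, sigma^2 * I_d) via the closed-form updates derived above.
import numpy as np

rng = np.random.default_rng(2)
n, d = 1000, 3
X = rng.normal(loc=5.0, scale=2.0, size=(n, d))    # made-up data: mu = (5,5,5), sigma = 2

mu_hat = X.mean(axis=0)
sigma2_hat = np.sum((X - mu_hat) ** 2) / (n * d)   # MLE of sigma^2 (divide by n*d, no n-1 correction)
print(mu_hat.round(2), round(float(np.sqrt(sigma2_hat)), 2))
```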
A still more powerful generative model

Suppose we wish to instead learn a $\boldsymbol{\mu} \in \mathbb{R}^d$ as well as a $\Sigma \succ 0$ so that the distribution $\mathcal{N}(\boldsymbol{\mu}, \Sigma)$ explains the data well (the notation $\succeq$ stands for PSD)
Log likelihood function:
$\mathcal{LL}(\boldsymbol{\mu}, \Sigma) = \sum_{i=1}^n \log \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}, \Sigma) = -\frac{n}{2}\log\det(2\pi\Sigma) - \frac{1}{2}\sum_{i=1}^n (\mathbf{x}^i - \boldsymbol{\mu})^\top \Sigma^{-1} (\mathbf{x}^i - \boldsymbol{\mu})$
F.O.O. w.r.t. $\boldsymbol{\mu}$, i.e. $\nabla_{\boldsymbol{\mu}} \mathcal{LL} = \mathbf{0}$, gives $\Sigma^{-1}\sum_{i=1}^n (\mathbf{x}^i - \boldsymbol{\mu}) = \mathbf{0}$
This definitely gives $\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}^i$ when $\Sigma \succ 0$, i.e. when $\Sigma^{-1}$ exists
We may have other solutions in some other funny cases (e.g. a singular $\Sigma$), which basically means there may be multiple optima for this problem
F.O. optimality w.r.t. $\Sigma$, i.e. $\frac{\partial \mathcal{LL}}{\partial \Sigma} = 0$, requires more work
A still more powerful generative model

For a square matrix $A$, its trace is defined as the sum of its diagonal elements: $\operatorname{tr}(A) = \sum_i A_{ii}$
Easy result: if $\mathbf{v} \in \mathbb{R}^d$, then $\mathbf{v}^\top A \mathbf{v} = \operatorname{tr}(AB)$ where $B = \mathbf{v}\mathbf{v}^\top$
Not so easy result: if $B$ is a constant matrix, then $\frac{\partial \operatorname{tr}(AB)}{\partial A} = B^\top$
Recall: dims of derivatives always equal those of the quantity w.r.t. which the deriv is taken
Let us denote $S(\boldsymbol{\mu}) = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}^i - \boldsymbol{\mu})(\mathbf{x}^i - \boldsymbol{\mu})^\top$ for convenience
New expression: $\mathcal{LL}(\boldsymbol{\mu}, \Sigma) = -\frac{n}{2}\log\det(2\pi\Sigma) - \frac{n}{2}\operatorname{tr}\!\left(\Sigma^{-1} S(\boldsymbol{\mu})\right)$
A still more powerful generative model

For any matrices of compatible dimensions we have the following (see "The Matrix Cookbook" – reference section on course webpage – for these results)
Symmetry: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$
Linearity: $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$ and $\operatorname{tr}(cA) = c\operatorname{tr}(A)$
New expression: $\mathcal{LL}(\hat{\boldsymbol{\mu}}, \Sigma) = -\frac{n}{2}\log\det(2\pi\Sigma) - \frac{n}{2}\operatorname{tr}\!\left(\Sigma^{-1} S(\hat{\boldsymbol{\mu}})\right)$ where $\hat{\boldsymbol{\mu}} = \frac{1}{n}\sum_{i=1}^n \mathbf{x}^i$ (assume $\Sigma$ symmetric)
F.O.O. w.r.t. $\Sigma$, i.e. $\frac{\partial \mathcal{LL}}{\partial \Sigma} = 0$, gives $-\frac{n}{2}\Sigma^{-1} + \frac{n}{2}\Sigma^{-1} S(\hat{\boldsymbol{\mu}})\, \Sigma^{-1} = 0$, which gives $\hat{\Sigma} = S(\hat{\boldsymbol{\mu}}) = \frac{1}{n}\sum_{i=1}^n (\mathbf{x}^i - \hat{\boldsymbol{\mu}})(\mathbf{x}^i - \hat{\boldsymbol{\mu}})^\top$
Since $\hat{\Sigma}$ is PSD as well as symmetric, this must be the global optimum!
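A sketch of the full-covariance case on synthetic data: $\hat{\Sigma}$ is the (divide-by-$n$) sample covariance, and as a sanity check the code verifies that a small symmetric perturbation of $\hat{\Sigma}$ can only lower the likelihood, consistent with it being the optimum.

```python
# Full-covariance Gaussian MLE: mu_hat = sample mean, Sigma_hat = sample covariance.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
n, d = 2000, 2
true_cov = np.array([[2.0, 0.8], [0.8, 1.0]])
X = rng.multivariate_normal(mean=[1.0, -2.0], cov=true_cov, size=n)

mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n                 # note: divide by n, not n-1

def log_lik(mu, Sigma):
    return multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()

perturb = 0.05 * np.array([[1.0, 0.2], [0.2, 1.0]])   # small symmetric PSD bump
print(Sigma_hat.round(2))
print(log_lik(mu_hat, Sigma_hat) >= log_lik(mu_hat, Sigma_hat + perturb))  # True
```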
MAP, Bayesian Generative Models?

The previous techniques allow us to learn the parameters of a Gaussian distribution (either $\boldsymbol{\mu}$, or $(\boldsymbol{\mu}, \sigma^2)$, or $(\boldsymbol{\mu}, \Sigma)$) that offer the highest likelihood of observed data features by computing the MLE
We can incorporate priors over $\boldsymbol{\mu}$ (e.g. Gaussian, Laplacian), priors over $\sigma^2$ (e.g. the inverse Gamma dist. which has support only over non-negative numbers) and over $\Sigma$ (e.g. the inverse Wishart dist. which has support only over PSD matrices) and compute the MAP
We can also perform full-blown Bayesian inference by computing posterior distributions over these quantities – calculations involving the predictive posterior get messy – beyond scope of CS771
However, we can make generative models more powerful in other ways
Still more powerful generative model?

Suppose we are concerned that a single Gaussian cannot capture all the variations in our data
Just as in LwP, when we realized that sometimes a single prototype is not enough
Can we learn 2 (or more) Gaussians to represent our data instead?
Such a generative model is often called a mixture of Gaussians
The Expectation Maximization (EM) algorithm is a very powerful technique for performing this and several other tasks
Soft clustering, learning Gaussian mixture models (GMM)
Robust learning, mixed regression
Learning a Mixture of Two Gaussians

We suspect that instead of one Gaussian, two Gaussians are involved in generating our feature vectors
Let us call them $\mathcal{N}(\boldsymbol{\mu}_1, I_d)$ and $\mathcal{N}(\boldsymbol{\mu}_2, I_d)$
Each of these is called a component of this GMM
Covariance matrices, and more than two components, can also be incorporated
Since we are unsure which data point came from which component, we introduce a latent variable $z^i \in \{1, 2\}$ per data point to denote this
This means that if someone tells us that $z^i = 1$, the first Gaussian is responsible for that data point and consequently the likelihood expression is $\mathbb{P}[\mathbf{x}^i \mid z^i = 1] = \mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_1, I_d)$; similarly, if someone tells us that $z^i = 2$, the second Gaussian is responsible for that data point and the likelihood expression is $\mathcal{N}(\mathbf{x}^i; \boldsymbol{\mu}_2, I_d)$
The English word "latent" means hidden or dormant or concealed
Nice name since this variable describes something that was hidden from us
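A small sketch of this two-component setup, assuming identity-covariance components and made-up candidate means: for each point we can evaluate the likelihood under either component and read off which latent value would explain it best.

```python
# Per-point likelihoods under each of the two identity-covariance components.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(4)
mu1, mu2 = np.array([0.0, 0.0]), np.array([5.0, 5.0])   # illustrative candidate means
X = np.vstack([rng.normal(mu1, 1.0, size=(50, 2)),
               rng.normal(mu2, 1.0, size=(50, 2))])

lik1 = multivariate_normal(mean=mu1, cov=np.eye(2)).pdf(X)   # P[x | z = 1]
lik2 = multivariate_normal(mean=mu2, cov=np.eye(2)).pdf(X)   # P[x | z = 2]
z_hat = np.where(lik1 >= lik2, 1, 2)    # most likely latent value per point
print(z_hat[:5], z_hat[-5:])
```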
MLE with Latent Variables

We wish to obtain the maximum (log) likelihood models, i.e.
$\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \log \mathbb{P}[\mathbf{x}^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2]$
Since we do not know the values of the latent variables, we force them into the expression using the law of total probability:
$\mathbb{P}[\mathbf{x}^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2] = \sum_{z^i \in \{1,2\}} \mathbb{P}[\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2]$
We did a similar thing (introducing models) in predictive posterior calculations
Very difficult optimization problem – NP-hard in general
However, two heuristics exist which work reasonably well in practice
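The sketch below writes out this marginalized log likelihood for the two-component model, assuming (an assumption made here for concreteness) equal mixing weights of $1/2$; note the "log of a sum" structure that makes direct maximization hard.

```python
# Marginal log likelihood of a two-component mixture with equal mixing weights:
# log P[x_i] = log( 0.5 * N(x_i; mu_1, I) + 0.5 * N(x_i; mu_2, I) ).
import numpy as np
from scipy.stats import multivariate_normal
from scipy.special import logsumexp

rng = np.random.default_rng(5)
X = np.vstack([rng.normal([0, 0], 1.0, size=(50, 2)),
               rng.normal([5, 5], 1.0, size=(50, 2))])

def marginal_log_lik(mu1, mu2):
    log_p1 = multivariate_normal(mean=mu1, cov=np.eye(2)).logpdf(X) + np.log(0.5)
    log_p2 = multivariate_normal(mean=mu2, cov=np.eye(2)).logpdf(X) + np.log(0.5)
    # law of total probability over z, done stably in log space
    return logsumexp(np.stack([log_p1, log_p2]), axis=0).sum()

print(marginal_log_lik(np.array([0., 0.]), np.array([5., 5.])))   # good guess
print(marginal_log_lik(np.array([2., 2.]), np.array([3., 3.])))   # worse guess
```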
Heuristic 1: Alternating Optimization

Convert the original optimization problem
$\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \log \left( \sum_{z^i \in \{1,2\}} \mathbb{P}[\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2] \right)$
to a double maximization problem (assume the mixture proportions are constant)
$\arg\max_{\boldsymbol{\mu}_1, \boldsymbol{\mu}_2} \sum_{i=1}^n \max_{z^i \in \{1,2\}} \log \mathbb{P}[\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2]$
The most important difference between the original and the new problem is that the original has a log of a sum, which is very difficult to optimize, whereas the new problem gets rid of this and looks simply like an MLE problem. We know how to solve MLE problems very easily!
The intuition behind reducing things to a double optimization is that it may mostly be the case that only one of the terms in the summation will dominate, and if this is the case, then approximating the sum by the largest term should be okay
In several ML problems with latent vars, although the above double optimization problem is (still) difficult, the following two problems are easy
Step 1: Fix $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ and update the latent variables to their optimal values, i.e. $\hat{z}^i = \arg\max_{z^i \in \{1,2\}} \mathbb{P}[\mathbf{x}^i, z^i \mid \boldsymbol{\mu}_1, \boldsymbol{\mu}_2]$
Step 2: Fix the latent variables and update $\boldsymbol{\mu}_1, \boldsymbol{\mu}_2$ to their optimal values
Keep alternating between step 1 and step 2 till you are tired or till the process has converged!
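A sketch of Heuristic 1 for the two-component, identity-covariance mixture on synthetic data: Step 1 assigns each point to its most likely component (here, the nearer mean), Step 2 refits each mean on its assigned points, and the two steps alternate until convergence. The data, initial means, and iteration cap are illustrative choices.

```python
# Alternating optimization ("hard" assignments) for a two-component mixture.
import numpy as np

rng = np.random.default_rng(6)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([5, 5], 1.0, size=(100, 2))])

mu = np.array([[-1.0, 1.0], [6.0, 4.0]])        # rough initial guesses for mu_1, mu_2
for _ in range(20):                             # "till tired or converged"
    # Step 1: with identity covariances, argmax_z P[x, z] = nearest mean
    dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)   # shape (n, 2)
    z = dists.argmin(axis=1)
    # Step 2: with assignments fixed, the MLE of each mean is a per-cluster mean
    # (assumes each component keeps at least one point, true for this data)
    new_mu = np.array([X[z == k].mean(axis=0) for k in (0, 1)])
    if np.allclose(new_mu, mu):
        break
    mu = new_mu

print(mu.round(2))      # close to the true means (0, 0) and (5, 5)
```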
Heuristic 1 at Work

As discussed before, we assume a mixture of two Gaussians $\mathcal{N}(\boldsymbol{\mu}_1, I_d)$ and $\mathcal{N}(\boldsymbol{\mu}_2, I_d)$
Step 1 becomes: $\hat{z}^i = \arg\min_{k \in \{1,2\}} \|\mathbf{x}^i - \boldsymbol{\mu}_k\|_2$ (assign each point to its closer mean)
Step 2 becomes: $\boldsymbol{\mu}_k = \frac{1}{n_k}\sum_{i: \hat{z}^i = k} \mathbf{x}^i$
Thus, $\boldsymbol{\mu}_1 = \frac{1}{n_1}\sum_{i: \hat{z}^i = 1} \mathbf{x}^i$ and $\boldsymbol{\mu}_2 = \frac{1}{n_2}\sum_{i: \hat{z}^i = 2} \mathbf{x}^i$, where $n_k$ is the number of data points for which $\hat{z}^i = k$
Repeat!
Isn't this like the k-means clustering algorithm?
Not just "like" – this is the k-means algorithm! This means that the k-means algorithm is one heuristic way to compute an MLE which is difficult to compute directly!
Indeed! Notice that even here, instead of choosing just one value of the latent variables at each time step, we can instead use a distribution over their support
I have a feeling that the second heuristic will also give us something familiar!
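Since the two steps above are exactly the assignment and mean-update steps of k-means, a library k-means run should recover essentially the same centers; a quick check (assuming scikit-learn is available) on the same kind of synthetic data:

```python
# k-means as a library call: its centers match the hard-assignment heuristic's means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(7)
X = np.vstack([rng.normal([0, 0], 1.0, size=(100, 2)),
               rng.normal([5, 5], 1.0, size=(100, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
centers = km.cluster_centers_[np.argsort(km.cluster_centers_[:, 0])]
print(centers.round(2))     # close to the true means (0, 0) and (5, 5)
```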
