Lecture 5 Variational Autoencoder
● Variational autoencoders (VAEs) are generative models, like Generative Adversarial Networks.
● Their association with this group of models derives mainly from the architectural affinity with the basic
autoencoder (the final training objective has an encoder and a decoder), but their mathematical
formulation differs significantly.
● VAEs are directed probabilistic graphical models (DPGMs) whose posterior is approximated by a neural
network, forming an autoencoder-like architecture (the formulation is sketched just after this list).
● Unlike discriminative modeling, which aims to learn a predictor given the observation, generative
modeling tries to simulate how the data are generated, in order to understand the underlying causal
relations.
● Causal relations indeed have the great potential of being generalisable.
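Concretely, in standard VAE notation (the slides do not spell this out), the generative model and its approximate posterior can be written as:

```latex
p_\theta(x, z) = p_\theta(x \mid z)\, p(z), \qquad p(z) = \mathcal{N}(0, I), \qquad
q_\phi(z \mid x) \approx p_\theta(z \mid x),
```

where the decoder network parameterizes the likelihood p_theta(x|z), the encoder network parameterizes the approximate posterior q_phi(z|x), and the prior over the latent code z is a standard Gaussian.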
In a lot of real-world problems, we have a whole bunch of data that we're looking at. It could be images, or text,
or audio, whatever it is. But the underlying process that generated the data could be much simpler, living in a
much lower dimensional space than the actual data that we're looking at. So a lot of techniques in machine
learning try to compress the dimensionality of your data into a smaller space. One very popular technique
that's used a lot in recent papers is the variational autoencoder. This is going to be a pretty technical lecture,
so I hope you're ready to dive into the mechanics of variational autoencoders.
So before we dive into the mechanics of variational autoencoders, I first want to introduce normal
autoencoders. I'm going to assume that you're already familiar with standard neural network architectures and
things like backpropagation. What an autoencoder does is take some kind of input data with a very high
dimensionality, it could be an image or a vector, anything at all, run it through a neural network, and try to
compress the data into a smaller representation.
It does this with two main components. The first component is what we call the encoder. The encoder is
simply a stack of layers, either fully connected or convolutional, that take the input and compress it down to a
smaller representation with fewer dimensions than the input; this is what we call the bottleneck.
The second component is the decoder, which tries to reconstruct the input from the bottleneck, again using
fully connected or convolutional layers.
And the loss function for training an autoencoder simply compares the reconstructed version at the end of
your decoder network with your input.
By computing pixel-to-pixel differences between the reconstruction and the input, we get a reconstruction
loss, and we can start training our network to compress images.
And so obviously you have simple autoencoders that use fully connected layers, but you can just as well swap
them out for convolutional layers if you're working with images or something like audio, for example.
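To make this concrete, here is a minimal sketch in TensorFlow/Keras, not the exact code from the lecture: it uses MNIST images flattened into 784-dimensional vectors, and the layer sizes are illustrative choices.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Load MNIST and flatten each 28x28 image into a 784-dimensional vector in [0, 1].
(x_train, _), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0

# Encoder: compress the 784-dimensional input down to a small bottleneck.
inputs = layers.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(inputs)
bottleneck = layers.Dense(32, activation="relu")(h)   # the compressed representation

# Decoder: reconstruct the input from the bottleneck.
h = layers.Dense(256, activation="relu")(bottleneck)
outputs = layers.Dense(784, activation="sigmoid")(h)  # pixel values back in [0, 1]

autoencoder = Model(inputs, outputs)
# Reconstruction loss: compare output pixels to input pixels (the target is the input itself).
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(x_train, x_train, epochs=10, batch_size=128)
```

Swapping the Dense layers for convolutional ones would give the convolutional variant mentioned above; the training setup stays the same.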
And if you look at what's going on here, if you train a deep convolutional network to do encoding and decoding
of a whole bunch of images, you're actually creating a whole new kind of compression algorithm. And Google is
actually thinking of using these types of networks for reducing the amount of bandwidth that you use on your
phone.
So if you download an image, then the full resolution image is first downscaled, then it's sent to you over the
wireless internet connection, and then in your phone there is actually a decoder that reconstructs the full
resolution image from the compressed representation.
And if you apply this to something like MNIST, for example, then it's very interesting to see what these hidden
representations are actually learning.
So here you can see a bunch of images where on the left side you can see the input digits that are being fed
through the network, and then on the right side all of those are reconstructed images. But you can see what
happens if we change the size of the hidden representation.
So if we use only a 2D hidden representation, that means that our bottleneck, you know, in the middle of the
network, is only two variables. Then we get reconstructions that look pretty okay, but they are very fuzzy, and
the fuzziness is because you force the entire information of your image to go through just two variables, so
when you reconstruct, you obviously lose some of that detail, and that is why the images look so fuzzy.
If you use more dimensions in your latent representation, you can get reconstructions that are much clearer
and much sharper but you need more information in that bottleneck.
And it's interesting to note that the exact same technique is applied to image segmentation as well. So you take
an input image, you run it through your convolutional encoder, it goes through a bottleneck representation, and
then it gets remapped to a full output image. But in this case, instead of reconstructing the original image,
you're actually trying to target a segmented version of your image.
And it's exactly this type of network that is used in self-driving cars to segment the different parts of the public
road into specific objects that a car needs to detect.
Okay so that's the basic idea behind autoencoders, but there are a few very clever tricks that you can apply to
an autoencoder to have it do some really fancy stuff.
So imagine that you start with a normal MNIST digit. It's a clean image, nothing's wrong with it.
But then you add a whole lot of noise to it, and you run that noisy image through your encoder network. You
get through the bottleneck representation and then you try to reconstruct the image.
But instead of reconstructing the noisy image, what you're going to do is try and reconstruct the original clean
image. And if you train this network on a whole bunch of these noisy MNIST digits, you force the encoding step
to actually get rid of the noise. And this is what we call a denoising autoencoder.
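As a minimal sketch of that idea, reusing the autoencoder and x_train from the earlier sketch (the noise level of 0.3 is just an illustrative choice, not from the lecture):

```python
import numpy as np

# Corrupt the inputs with Gaussian noise, but keep the clean images as targets.
noise = 0.3 * np.random.randn(*x_train.shape)
x_train_noisy = np.clip(x_train + noise, 0.0, 1.0)

# Same architecture and reconstruction loss as before; only the input changes:
# the network sees the noisy image but is asked to reproduce the clean one.
autoencoder.fit(x_train_noisy, x_train, epochs=10, batch_size=128)
```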
And so you can see here that by using this approach you can actually train a denoising auto-encoder that is very
good at removing noise from input images. And denoising images isn't the only thing that you can do with this
type of approach.
So in this case for example you take an input and instead of adding noise to it, you simply crop a rectangular
area out of the image, and you throw it away. You replace it with white or black pixels. You feed that input image
through the network, and you try to reconstruct the original full image.
And this technique is what we call neural inpainting: you take a small part of the image, throw it away, and
then ask the network to reconstruct whatever was there in the original image.
And with this approach you can do simple things like removing watermarks from images. But you could also
remove a parked car for example if you are filming on a movie set in a natural setting.
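A minimal sketch of that setup, again reusing x_train and the autoencoder from above (the 8x8 patch size and the choice of black pixels are arbitrary illustrative choices):

```python
import numpy as np

def mask_random_patch(images, patch=8):
    """Blank out a random square patch in each 28x28 image (here: black pixels)."""
    masked = images.copy().reshape(-1, 28, 28)
    for img in masked:
        top = np.random.randint(0, 28 - patch)
        left = np.random.randint(0, 28 - patch)
        img[top:top + patch, left:left + patch] = 0.0
    return masked.reshape(images.shape)

# Masked images go in, but the training target is still the full original image.
x_train_masked = mask_random_patch(x_train)
autoencoder.fit(x_train_masked, x_train, epochs=10, batch_size=128)
```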
Okay so now that we have the basic concept behind a normal autoencoder, let's introduce variational
autoencoders. The idea behind variational autoencoders is that instead of mapping any input to a fixed
vector, you want to map your input onto a distribution. And so the only thing that's different in a variational
autoencoder is that your normal bottleneck vector z is replaced by two separate vectors: one representing the
mean of your distribution, and the other one representing the standard deviation of that distribution. And so
whenever you need a vector to feed through your decoder network, the only thing you have to do is take a
sample from the distribution and then feed it to the decoder.
And so to train a variational autoencoder, the loss function in this case actually consists of two terms.
The first term represents the reconstruction loss, so this is really the same as the autoencoder step, except that
here there is an expectation operator because we are sampling from a distribution. The second part of the
loss function is what we call the KL divergence. I'm not going to go into all of the details, because there is a lot
of math involved, but basically what you want to make sure is that the distribution you're learning is not too
far removed from a normal (Gaussian) distribution. So you're going to try and force your latent distribution to
stay relatively close to a mean of zero and a standard deviation of one.
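Written out in standard VAE notation (the lecture does not show this formula explicitly, but it is the standard objective), the loss for a single input x is:

```latex
\mathcal{L}(\theta, \phi; x)
  = \underbrace{-\,\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]}_{\text{reconstruction}}
  \;+\; \underbrace{D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, \mathcal{N}(0, I)\big)}_{\text{regularizer}},
\qquad
D_{\mathrm{KL}} = -\tfrac{1}{2} \sum_j \big(1 + \log \sigma_j^2 - \mu_j^2 - \sigma_j^2\big).
```

The closed-form KL term on the right is what pulls the means mu_j toward zero and the standard deviations sigma_j toward one.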
And so before we can start training our variational autoencoder, there is one final trick that we have to use.
Because if you look at the computation graph of our network right now, we have a problem: in the middle of
the network, after the bottleneck, we have a sampling operation. There is a node there that takes a sample
from a distribution and then feeds that sample through the decoder. But the problem is that you cannot run
backpropagation through a sampling node; you cannot push gradients through it. So in order to run your
gradients through the entire network and train everything end-to-end, we're going to use what we call the
reparameterization trick.
And the trick goes as follows. If you look at the latent vector that you're sampling, you can actually write that
vector as a fixed mu, which is a parameter that you're learning, plus some sigma, which is also a parameter
that you're learning, multiplied by an epsilon, and this epsilon is where we're going to put the stochastic part.
This epsilon is always going to be standard Gaussian: it always has zero mean and a standard deviation of one.
We're going to sample that epsilon, multiply it by sigma, add mu, and we have our latent vector.
And the clever thing here is that mu and sigma are the only things that we actually want to train, so those are
where we have to be able to compute gradients and run backpropagation. But that epsilon doesn't really
matter, because we never want to change it: epsilon is a fixed stochastic node. It's still stochastic, but we don't
have to run backpropagation through it, so it doesn't matter that it's a sampling operation.
And so this is the reparameterization trick: instead of having a full stochastic node that blocks all of your
gradients because you can't do backpropagation through it, you split it up into a part that you can backprop
through and another part that is still stochastic but that you don't want to train, because it's fixed.
Pretty clever, right?
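As a minimal sketch of the trick (assuming the encoder outputs a mean and a log-variance, which is a common convention; sample_latent is my own name, not something from the lecture):

```python
import tensorflow as tf

def sample_latent(mu, log_var):
    """Reparameterization trick: z = mu + sigma * epsilon, with epsilon ~ N(0, I).
    Gradients flow through mu and log_var; epsilon is just fixed standard-normal noise."""
    epsilon = tf.random.normal(shape=tf.shape(mu))
    return mu + tf.exp(0.5 * log_var) * epsilon
```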
So let's take a quick look at some code in TensorFlow. Here you can see the encoder network, which is training
two sets of parameters: the means and the standard deviations of our distribution.
And then in the actual autoencoder we do a sampling operation from that distribution to actually get our
latent vector.
And then you can see where the KL divergence is computed, after which you compute your loss and backprop
through it.
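The code shown on screen is not reproduced in these notes, but a minimal TensorFlow/Keras sketch along the same lines (layer sizes, the 784-dimensional input, and the binary cross-entropy reconstruction term are my own illustrative assumptions; it reuses sample_latent and x_train from the sketches above) could look like this:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

latent_dim = 2

# Encoder: two output heads, the means and the log-variances of q(z|x).
enc_in = layers.Input(shape=(784,))
h = layers.Dense(256, activation="relu")(enc_in)
z_mu = layers.Dense(latent_dim)(h)
z_log_var = layers.Dense(latent_dim)(h)
encoder = Model(enc_in, [z_mu, z_log_var])

# Decoder: maps a sampled latent vector back to pixel space.
dec_in = layers.Input(shape=(latent_dim,))
h = layers.Dense(256, activation="relu")(dec_in)
dec_out = layers.Dense(784, activation="sigmoid")(h)
decoder = Model(dec_in, dec_out)

optimizer = tf.keras.optimizers.Adam()

@tf.function
def train_step(x):
    with tf.GradientTape() as tape:
        mu, log_var = encoder(x)
        z = sample_latent(mu, log_var)      # reparameterization trick from above
        x_hat = decoder(z)
        # Reconstruction term: binary cross-entropy summed over the 784 pixels.
        recon = 784 * tf.reduce_mean(tf.keras.losses.binary_crossentropy(x, x_hat))
        # KL divergence between N(mu, sigma^2) and the standard normal prior N(0, I).
        kl = -0.5 * tf.reduce_mean(
            tf.reduce_sum(1 + log_var - tf.square(mu) - tf.exp(log_var), axis=1))
        loss = recon + kl
    variables = encoder.trainable_variables + decoder.trainable_variables
    grads = tape.gradient(loss, variables)
    optimizer.apply_gradients(zip(grads, variables))
    return loss

# Training loop over mini-batches of x_train:
for batch in tf.data.Dataset.from_tensor_slices(x_train).batch(128):
    train_step(batch)
```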
Alright and so before we go and look at some visual results of what you can do with variational autoencoders, I
want to note one final thing. There is a new class of variational autoencoders with a lot of promising results,
called disentangled variational autoencoders. The basic idea behind this disentanglement is that you want to
make sure that the different neurons in your latent representation are uncorrelated, that they all try to learn
something different about the input data. And to implement this, the only thing you have to change is to add
one hyperparameter (usually called beta) to your loss function that weighs how strongly the KL divergence
counts in the loss, as sketched below. And so in the disentangled version, the autoencoder will only use a
specific latent variable if it really has a benefit, and if it doesn't benefit the compression, it will simply stick to
the standard normal prior.
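In the same notation as before (again a standard formulation from the disentanglement literature rather than something shown in the lecture), the only change is the weight beta on the KL term:

```latex
\mathcal{L}_\beta(\theta, \phi; x)
  = -\,\mathbb{E}_{z \sim q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  \;+\; \beta\, D_{\mathrm{KL}}\!\big(q_\phi(z \mid x) \,\|\, \mathcal{N}(0, I)\big), \qquad \beta \geq 1.
```

Setting beta = 1 recovers the ordinary variational autoencoder; a larger beta pushes each latent variable harder toward the standard normal prior, so a variable only gets "used" when it pays for itself in reconstruction quality.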
So in order to show the results of a disentangled representation, let's look at a very simple data set.
The data set consists of images that are generated from four latent factors. So you have the x position, the y
position, the size of the objects, and the rotation of the object. And by just picking a sample from that
distribution, you have four values, you can just generate an image that is generated from exactly that hidden
representation. And then the idea is if you train a disentangled variational autoencoder, what you would like to
see is that the autoencoder is able to reconstruct and come up with that exact mapping of those four latent
variables to encode the information in its inputs.
And it turns out that if you use the normal loss function of a variational autoencoder, it simply comes up with a
whole bunch of latent variables, but it's not really finding exactly those latent variables that we used to
generate the images.
But if you disentangle your representations, it gets much closer. So here on the left side, you can see that by
increasing that beta factor in your autoencoder, you're actually forcing your auto encoder to map the
information onto only a few of those latent variables. So instead of using all ten of them, the autoencoder only
uses five of the latent variables to encode the information. And you can see that the first one represents the Y
position, the second is the x position, then you have the scale, which is the third one, and then in fact there are
two latent representations that the autoencoder used for representing the rotation of an object.
But interestingly, all the other latent representations, even though they are there, are still stuck at the standard
Gaussian prior, and this is because they weren't really necessary to encode the information of our input.
And here you can see some really interesting results where researchers at Google DeepMind applied
variational autoencoders to their DeepMind Lab environment. So you can see a 3D world where an agent can
sort of run around. And what they did is compress the input images that the agent is seeing into the latent
space and then reconstruct them. But what you can also do is start changing the latent variables and then see
what happens to the reconstruction. And it turns out that if you use a disentangled variational autoencoder,
then changing the latent variables actually corresponds to some very interpretable things. So here you can see
that changing the first latent variable changes the color of the floor but nothing else. And then there are other
latent variables that correspond to turning to the left or turning to the right, and there are even some that
change the rotation and the identity of specific objects that the agent is looking at.
And in contrast, if you don't use this disentanglement, then whenever you start changing a latent
representation, everything starts blurring up in the image, and it's not really clear what this latent vector was
trying to encode.
And so I think this image sums everything up. On the left side, you have a disentangled variational autoencoder,
and you can see that if you change the first dimension in your latent space, then the face is rotated but nothing
else changes. If you do the same thing in a normal variational autoencoder, the face also rotates but you can
see that a lot of other stuff is changing as well.
And then as a comparison, on the right side you can see the results for a Generative Adversarial Network.
And so the holy grail of disentangled variational autoencoders is to have some kind of network that can extract
very useful, causal features from a very high dimensional space, and then use those features for whatever task
it's trying to learn. And the hope is then that those learned features will also generalize to domains outside of
your training data.
And so one of the common domains where people are trying to apply variational autoencoders is, for example,
reinforcement learning, because the whole problem in reinforcement learning is that you have very sparse
rewards and it takes a really long time to train anything. So by using a variational autoencoder as a sort of
feature extractor, the hope is that you can actually run your agent on the compressed representation instead
of on the full input space.
And so when using this in practice, there is actually a very clear trade-off. If you disentangle the latent space
too little, then your network tends to overfit: you give it too much freedom, so it can just learn how to
reconstruct your training data, but it won't generalize to unseen data in new cases. On the other hand,
if you disentangle too much, then you actually lose a lot of the high definition detail in your input, and this can
actually hurt performance in a lot of applications. So personally I find this a really interesting idea, and I'm very
curious if this will lead to some types of networks that can learn to extract very useful low dimensional
information from very high dimensional spaces. Because in the end we want to train agents that are able to
understand the world by compressing a whole lot of information and then learning useful behavior on that
latent space.