Autoencoder
Hands-On Machine Learning with Scikit-Learn and TensorFlow, Chap. 15
Matsuda laboratory B4
Wataru Hirota
Efficient Data Representation
[Diagram: inputs x1, x2, x3 pass through an encoder to a compact coding, and a decoder reconstructs x1, x2, x3.]
What is an Autoencoder?
• A powerful feature detector.
• Unsupervised learning.
• Works by simply learning to copy its inputs to its outputs.
• Used for various tasks.
Outline
• Typical Autoencoders
• Visualizing Features
• Unsupervised Pretraining with Autoencoders
• Various Autoencoders
• Denoising Autoencoders
• Sparse Autoencoders
• Variational Autoencoder
Typical Autoencoders
• Undercomplete
  • The hidden layer has a lower dimensionality than the input data.
  • Used for dimensionality reduction.
  • cf. Overcomplete: the coding layer is larger than the input.
• The cost function is the reconstruction loss:
  • the difference between the input data and the output reconstructed from it.
  • MSE, cross entropy, etc. (a minimal sketch of both losses follows below).
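As a rough illustration of these two loss choices, here is a minimal sketch of an undercomplete autoencoder in the TF 1.x graph style used throughout these slides; the layer sizes and the use of tf.layers.dense are my own choices, not the book's exact code.

```python
import tensorflow as tf  # TF 1.x graph-mode API, as in the book's examples

n_inputs = 28 * 28   # e.g. flattened MNIST pixels scaled to [0, 1]
n_codings = 150      # undercomplete: the coding layer is smaller than the input

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
codings = tf.layers.dense(X, n_codings, activation=tf.nn.elu)  # encoder
logits = tf.layers.dense(codings, n_inputs)                    # decoder, pre-activation
outputs = tf.sigmoid(logits)                                   # reconstruction in [0, 1]

# Reconstruction loss, option 1: MSE between input and reconstruction
mse_loss = tf.reduce_mean(tf.square(outputs - X))

# Reconstruction loss, option 2: pixel-wise cross entropy (inputs must be in [0, 1])
xent_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
```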
Comparison with PCA
• If the autoencoder uses only linear activations and the cost function is the MSE, then it can be shown that it ends up performing PCA.
Comparison with PCA
• The transformation matrix $U$ obtained from PCA minimizes the following problem:

  $\min_{U} \sum_i \left\lVert \boldsymbol{x}_i - U^{\top} U \boldsymbol{x}_i \right\rVert^2$

• If you regard $U$ (encoder) and $U^{\top}$ (decoder) as the weight matrices of a linear autoencoder, the autoencoder solves the same problem (a numeric check follows below).
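To make the claim concrete, here is a small NumPy check (a toy example of my own, not from the book): the top-k principal directions give a smaller value of the objective above than a random orthonormal projection of the same rank.

```python
import numpy as np

rng = np.random.RandomState(42)

# Toy data: 500 points in R^10 that mostly live in a 3-D subspace
X = rng.randn(500, 3) @ rng.randn(3, 10) + 0.05 * rng.randn(500, 10)
X = X - X.mean(axis=0)          # PCA assumes centered data

def reconstruction_error(U):
    """sum_i || x_i - U^T U x_i ||^2 for a k x d matrix U with orthonormal rows."""
    return np.sum((X - X @ U.T @ U) ** 2)

k = 3
# PCA solution: the top-k right singular vectors of the data matrix
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_pca = Vt[:k]

# A random rank-k orthonormal projection for comparison
Q, _ = np.linalg.qr(rng.randn(10, k))
U_rand = Q.T

print("PCA    :", reconstruction_error(U_pca))    # smallest achievable value
print("random :", reconstruction_error(U_rand))   # noticeably larger
```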
Stacked Autoencoder
• Multiple hidden layers.
• Adding more layers helps the autoencoder learn more complex codings.
  • On the other hand, it becomes easier to overfit…
[Diagram: a stacked autoencoder with layers input → hidden1 → hidden2 → hidden3 → output.]
Let's implement with TensorFlow
[Diagram: the same stacked autoencoder with layer sizes: input 28*28, hidden1 300, hidden2 (codings) 150, hidden3 300, output 28*28.]
Let's implement with TensorFlow
• arg_scope acts like a C++-style namespace: the fully_connected options set in that scope apply to every call inside it.
• The regularization losses of the whole graph can be collected and added to the reconstruction loss (an element-wise addition over the list of loss tensors).
• A minimal sketch of the resulting graph follows below.
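Since the code screenshots are not reproduced here, below is a minimal sketch of the graph the two annotations refer to, written against the TF 1.x tf.contrib API used in the book; the regularization scale l2_reg and the learning rate are assumed values, not the book's exact settings.

```python
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected, l2_regularizer
from tensorflow.contrib.framework import arg_scope

n_inputs = 28 * 28            # 784, as on the architecture slide
n_hidden1 = 300
n_hidden2 = 150               # codings
n_hidden3 = n_hidden1
n_outputs = n_inputs

l2_reg = 0.0001               # assumed hyperparameters
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

# arg_scope plays the role of a "C++-like namespace": every fully_connected
# call inside it shares these default arguments.
with arg_scope([fully_connected],
               activation_fn=tf.nn.elu,
               weights_regularizer=l2_regularizer(l2_reg)):
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    hidden3 = fully_connected(hidden2, n_hidden3)
    outputs = fully_connected(hidden3, n_outputs, activation_fn=None)

reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))   # MSE

# The regularization losses of the whole graph are collected here;
# tf.add_n sums the list of tensors element-wise.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)

training_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```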
How to Train Stacked Autoencoder?
[Diagram: two-phase training. Phase 1 trains the outer layers (hidden1 and the output layer) with an MSE against the input; Phase 2 freezes them and trains the inner layers (hidden2 and hidden3) with an MSE against hidden1's activations. A sketch follows below.]
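The book implements this in a single graph; here is a hedged sketch of the same idea with explicit weight variables (layer sizes taken from the earlier slide, initialization and optimizer settings assumed). Phase 2 "freezes" the outer layers simply by leaving their variables out of var_list.

```python
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 150
n_hidden3, n_outputs = n_hidden1, n_inputs   # symmetric architecture
learning_rate = 0.01
act = tf.nn.elu

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

W1, b1 = weight([n_inputs, n_hidden1]), tf.Variable(tf.zeros(n_hidden1))
W2, b2 = weight([n_hidden1, n_hidden2]), tf.Variable(tf.zeros(n_hidden2))
W3, b3 = weight([n_hidden2, n_hidden3]), tf.Variable(tf.zeros(n_hidden3))
W4, b4 = weight([n_hidden3, n_outputs]), tf.Variable(tf.zeros(n_outputs))

hidden1 = act(tf.matmul(X, W1) + b1)
hidden2 = act(tf.matmul(hidden1, W2) + b2)
hidden3 = act(tf.matmul(hidden2, W3) + b3)
outputs = tf.matmul(hidden3, W4) + b4

optimizer = tf.train.AdamOptimizer(learning_rate)

# Phase 1: train the outer pair (hidden1, output layer) to reconstruct X,
# bypassing the middle layers (possible because n_hidden3 == n_hidden1).
phase1_outputs = tf.matmul(hidden1, W4) + b4
phase1_loss = tf.reduce_mean(tf.square(phase1_outputs - X))
phase1_op = optimizer.minimize(phase1_loss, var_list=[W1, b1, W4, b4])

# Phase 2: freeze the outer layers (leave them out of var_list) and train
# hidden2/hidden3 to reproduce hidden1's activations, the inner MSE above.
phase2_loss = tf.reduce_mean(tf.square(hidden3 - hidden1))
phase2_op = optimizer.minimize(phase2_loss, var_list=[W2, b2, W3, b3])
```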
How to Visualize Features?
1. Consider each neuron in every hidden layer, and find the training instances that activate it the most.
  • This is the simplest technique, and it is effective for the top hidden layer.
  • But for lower layers this technique doesn't work well. (A small sketch of the idea follows below.)
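A tiny NumPy sketch of technique 1. The activation matrix here is dummy random data; in practice you would run the trained encoder over the training set first.

```python
import numpy as np

# Dummy activations: one row per training instance, one column per neuron in
# the hidden layer of interest. In practice, run the encoder on the training
# set to obtain this matrix.
rng = np.random.RandomState(0)
activations = rng.rand(60000, 150)

k = 5
# For each neuron, the indices of the k instances that activate it the most.
top_k = np.argsort(-activations, axis=0)[:k]   # shape: (k, n_neurons)
print(top_k[:, 0])   # the 5 training instances that most excite neuron 0
```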
How to Visualize Features?
2. Show an image where a pixel’s intensity corresponds to
the weight of the connection.
• e.g. the denoising autoencoder (discussed later).
How to Visualize Features?
3. Tweak a random input image so that a certain neuron will activate even more.
  1. Feed the autoencoder a random input image,
  2. measure the activation of the neuron you are interested in,
  3. and then perform backpropagation to tweak the image in such a way that the neuron activates even more.
• This is a useful technique to visualize the kinds of inputs that a neuron is looking for. (A sketch follows below.)
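A hedged sketch of this gradient-based technique in TF 1.x style. The two-layer encoder here is a fresh stand-in (in practice you would reuse the trained autoencoder's graph and weights), and the neuron index, step size, and iteration count are arbitrary choices.

```python
import numpy as np
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 150
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu)
hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu)

neuron_index = 7
activation = hidden2[0, neuron_index]          # the neuron we want to excite
gradient = tf.gradients(activation, X)[0]      # d(activation) / d(input image)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # use trained weights in practice
    image = np.random.rand(1, n_inputs)            # 1. start from a random image
    for _ in range(100):
        act_val, grad_val = sess.run([activation, gradient], feed_dict={X: image})
        image += 0.1 * grad_val                    # 2-3. nudge the image uphill
        image = np.clip(image, 0.0, 1.0)
```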
Pretraining Using (Stacked) Autoencoders
• Train a stacked autoencoder using all the data, then reuse its lower layers to create a network for your task.
[Diagram: the autoencoder's hidden1 and hidden2 parameters are copied into a new network (input → hidden1 → hidden2 → hidden3′ → softmax output) for the supervised task. A sketch follows below.]
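A minimal sketch of the "copy parameters" step in TF 1.x: build the classifier with the same layer names as the autoencoder and restore only those variables from the autoencoder's checkpoint. The checkpoint path ./stacked_ae.ckpt, layer names, and sizes are illustrative assumptions.

```python
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2, n_classes = 28 * 28, 300, 150, 10

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
y = tf.placeholder(tf.int32, shape=[None])

# Same layer names as in the autoencoder so the checkpoint variables line up.
hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu, name="hidden1")
hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu, name="hidden2")
logits = tf.layers.dense(hidden2, n_classes, name="softmax_out")   # new layer

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
training_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# "Copy parameters": restore only hidden1/hidden2 from the autoencoder run.
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden[12]")
pretrain_saver = tf.train.Saver(reuse_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    pretrain_saver.restore(sess, "./stacked_ae.ckpt")
    # ...then train on the labeled subset as usual.
```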
Denoising Autoencoder
• Add noise to the input, and train the autoencoder to recover the original, noise-free input.
[Diagrams: two corruption variants: additive Gaussian noise on the input, or randomly switched-off inputs (dropout), applied before the encoder.]
Denoising Autoencoder Implementation
[Code screenshot: the dropout version. Note that is_training should be False after training. A sketch of both corruption schemes follows below.]
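A minimal sketch of the two corruption schemes in TF 1.x style. The noise level and dropout rate are example values, and is_training mirrors the flag mentioned on the slide; it defaults to False so that corruption is disabled after training.

```python
import tensorflow as tf

n_inputs = 28 * 28
noise_level = 1.0       # example value
dropout_rate = 0.3      # example value

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
# Defaults to False, so corruption is off at test time unless explicitly fed True.
is_training = tf.placeholder_with_default(False, shape=(), name="is_training")

# Variant 1: additive Gaussian noise, applied only while training
X_noisy = tf.cond(is_training,
                  lambda: X + noise_level * tf.random_normal(tf.shape(X)),
                  lambda: X)

# Variant 2: randomly switched-off inputs (dropout); the flag controls it directly.
X_drop = tf.layers.dropout(X, dropout_rate, training=is_training)

# Either corrupted tensor is fed to the encoder, while the reconstruction loss
# still compares the decoder's output with the clean X.
```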
An Example of Denoising
Sparse Autoencoder
• Reduce the number of active neurons in the coding layer.
• Add sparsity loss into the cost function.
• “If you could speak only a few words per month, you would probably
try to make them worth listening to.”
Sparsity Loss
• The Kullback–Leibler (KL) divergence is commonly used.
• It has much stronger gradients than the MSE.
KL Divergence
• $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$
• It equals 0 iff $\forall i\;[P(i) = Q(i)]$.
• $-\log Q(i) - (-\log P(i)) = \log \frac{P(i)}{Q(i)}$ is interpreted as the information gain.
• In this case, $P(i)$ is the sparsity target and $Q(i)$ is the actual average activation of a coding neuron over the training batch. (A sketch of the resulting sparsity loss follows below.)
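Putting the pieces together, here is a hedged sketch of a sparse autoencoder's cost function in TF 1.x style. The sigmoid coding layer (so the mean activation can be read as a probability), the sparsity target, and the sparsity weight are assumed example values.

```python
import tensorflow as tf

n_inputs, n_codings = 28 * 28, 300
sparsity_target = 0.1
sparsity_weight = 0.2

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
codings = tf.layers.dense(X, n_codings, activation=tf.nn.sigmoid)
logits = tf.layers.dense(codings, n_inputs)
outputs = tf.sigmoid(logits)

def kl_divergence(p, q):
    """KL divergence between two Bernoulli distributions with parameters p and q."""
    return p * tf.log(p / q) + (1 - p) * tf.log((1 - p) / (1 - q))

mean_activation = tf.reduce_mean(codings, axis=0)    # q: average over the batch
sparsity_loss = tf.reduce_sum(kl_divergence(sparsity_target, mean_activation))

reconstruction_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
loss = reconstruction_loss + sparsity_weight * sparsity_loss
```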
Variational Autoencoder
[Diagram: VAE architecture. The encoder (input → hidden1 → hidden2) outputs μ and σ; the codings are formed as μ + σ × Gaussian noise; the decoder (hidden3 → output) reconstructs the input.]
Variational Autoencoder (VAE)
• A generative model:
  • it can not only reconstruct the training data, but also generate new instances.
• The outputs are partly determined by chance (probabilistic),
  • as opposed to denoising autoencoders, which use randomness only during training.
Latent Loss
• The latent loss pushes the autoencoder toward codings that look as though they were sampled from $N(\mathbf{0}, \mathbf{I})$.
• Most VAE encoders are trained to output $\gamma = \log \sigma^2$ rather than $\sigma$. (This makes it easier for the encoder to capture sigmas of different scales.)
Latent Loss
$D_{\mathrm{KL}}\big(q(\boldsymbol{z} \mid \boldsymbol{X}) \,\big\|\, p(\boldsymbol{z})\big) = D_{\mathrm{KL}}\big(N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \,\big\|\, N(\boldsymbol{0}, \boldsymbol{I})\big) = -\frac{1}{2} \sum_{i=1}^{n} \left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$
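A sketch of this latent loss in TF 1.x style, assuming the encoder's top layer hidden2 outputs both μ and γ = log σ², as described above. The layer sizes are arbitrary, and hidden2 is fed as a placeholder here instead of a full encoder.

```python
import tensorflow as tf

n_hidden2, n_codings = 500, 20
hidden2 = tf.placeholder(tf.float32, shape=[None, n_hidden2])   # encoder top layer

mu = tf.layers.dense(hidden2, n_codings)       # mean of the codings
gamma = tf.layers.dense(hidden2, n_codings)    # gamma = log(sigma^2)

# Reparameterization: codings = mu + sigma * Gaussian noise
noise = tf.random_normal(tf.shape(gamma), dtype=tf.float32)
codings = mu + tf.exp(0.5 * gamma) * noise

# Latent loss = D_KL( N(mu, sigma^2) || N(0, I) ), summed over the coding units
latent_loss = 0.5 * tf.reduce_sum(
    tf.exp(gamma) + tf.square(mu) - 1 - gamma)
```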
Digits Generated by VAE
Conclusion
• Autoencoders learn effective representations of their inputs.
• Autoencoders are powerful feature detectors, while the
underlying ideas are simple.
• Autoencoders can be used for many tasks.
• Pretraining, denoising, generating new images, …
