Autoencoder
Hands-On Machine Learning with Scikit-Learn and TensorFlow, Chap. 15
Matsuda laboratory B4
Wataru Hirota
Efficient Data Representation
[Diagram: inputs x1, x2, x3 pass through an encoder to a compact coding, and a decoder reconstructs x1, x2, x3.]
What is an Autoencoder?
• A powerful feature detector.
• Unsupervised learning.
• Works by simply learning to copy its inputs to its outputs.
• Used for various tasks.
Outline
• Typical Autoencoders
• Visualizing Features
• Unsupervised Pretraining with Autoencoders
• Various Autoencoders
• Denoising Autoencoders
• Sparse Autoencoders
• Variational Autoencoder
Typical Autoencoders
• Undercomplete
  • The hidden layer has a lower dimensionality than the input data.
  • Used for dimensionality reduction.
  • cf. Overcomplete: the coding layer is larger than the input.
• The cost function is the reconstruction loss:
  • the difference between the input data and the output reconstructed from it.
  • MSE, cross entropy, etc. (a minimal sketch of both losses follows below).
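As a rough illustration of these two loss choices, here is a minimal sketch of an undercomplete autoencoder in the TF 1.x graph style used throughout these slides; the layer sizes and the use of tf.layers.dense are my own choices, not the book's exact code.

```python
import tensorflow as tf  # TF 1.x graph-mode API, as in the book's examples

n_inputs = 28 * 28   # e.g. flattened MNIST pixels scaled to [0, 1]
n_codings = 150      # undercomplete: the coding layer is smaller than the input

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
codings = tf.layers.dense(X, n_codings, activation=tf.nn.elu)  # encoder
logits = tf.layers.dense(codings, n_inputs)                    # decoder, pre-activation
outputs = tf.sigmoid(logits)                                   # reconstruction in [0, 1]

# Reconstruction loss, option 1: MSE between input and reconstruction
mse_loss = tf.reduce_mean(tf.square(outputs - X))

# Reconstruction loss, option 2: pixel-wise cross entropy (inputs must be in [0, 1])
xent_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
```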
Comparison with PCA
• If the autoencoder uses only linear activations and the cost function is the MSE, then it can be shown that it ends up performing PCA.
Comparison with PCA
• The transformation matrix $U$ obtained from PCA minimizes the following problem:

  $\min_{U} \sum_i \left\lVert \boldsymbol{x}_i - U^{\top} U \boldsymbol{x}_i \right\rVert^2$

• If you regard $U$ (encoder) and $U^{\top}$ (decoder) as the weight matrices of a linear autoencoder, the autoencoder solves the same problem (a numeric check follows below).
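To make the claim concrete, here is a small NumPy check (a toy example of my own, not from the book): the top-k principal directions give a smaller value of the objective above than a random orthonormal projection of the same rank.

```python
import numpy as np

rng = np.random.RandomState(42)

# Toy data: 500 points in R^10 that mostly live in a 3-D subspace
X = rng.randn(500, 3) @ rng.randn(3, 10) + 0.05 * rng.randn(500, 10)
X = X - X.mean(axis=0)          # PCA assumes centered data

def reconstruction_error(U):
    """sum_i || x_i - U^T U x_i ||^2 for a k x d matrix U with orthonormal rows."""
    return np.sum((X - X @ U.T @ U) ** 2)

k = 3
# PCA solution: the top-k right singular vectors of the data matrix
_, _, Vt = np.linalg.svd(X, full_matrices=False)
U_pca = Vt[:k]

# A random rank-k orthonormal projection for comparison
Q, _ = np.linalg.qr(rng.randn(10, k))
U_rand = Q.T

print("PCA    :", reconstruction_error(U_pca))    # smallest achievable value
print("random :", reconstruction_error(U_rand))   # noticeably larger
```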
Stacked Autoencoder
• Multiple hidden layers.
• Adding more layers helps the autoencoder learn more complex codings.
  • On the other hand, it becomes easier to overfit…
[Diagram: a stacked autoencoder with layers input → hidden1 → hidden2 → hidden3 → output.]
Let's implement with TensorFlow
[Diagram: the same stacked autoencoder with layer sizes: input 28*28, hidden1 300, hidden2 (codings) 150, hidden3 300, output 28*28.]
Let's implement with TensorFlow
• arg_scope acts like a C++-style namespace: the fully_connected options set in that scope apply to every call inside it.
• The regularization losses of the whole graph can be collected and added to the reconstruction loss (an element-wise addition over the list of loss tensors).
• A minimal sketch of the resulting graph follows below.
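Since the code screenshots are not reproduced here, below is a minimal sketch of the graph the two annotations refer to, written against the TF 1.x tf.contrib API used in the book; the regularization scale l2_reg and the learning rate are assumed values, not the book's exact settings.

```python
import tensorflow as tf
from tensorflow.contrib.layers import fully_connected, l2_regularizer
from tensorflow.contrib.framework import arg_scope

n_inputs = 28 * 28            # 784, as on the architecture slide
n_hidden1 = 300
n_hidden2 = 150               # codings
n_hidden3 = n_hidden1
n_outputs = n_inputs

l2_reg = 0.0001               # assumed hyperparameters
learning_rate = 0.01

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

# arg_scope plays the role of a "C++-like namespace": every fully_connected
# call inside it shares these default arguments.
with arg_scope([fully_connected],
               activation_fn=tf.nn.elu,
               weights_regularizer=l2_regularizer(l2_reg)):
    hidden1 = fully_connected(X, n_hidden1)
    hidden2 = fully_connected(hidden1, n_hidden2)
    hidden3 = fully_connected(hidden2, n_hidden3)
    outputs = fully_connected(hidden3, n_outputs, activation_fn=None)

reconstruction_loss = tf.reduce_mean(tf.square(outputs - X))   # MSE

# The regularization losses of the whole graph are collected here;
# tf.add_n sums the list of tensors element-wise.
reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
loss = tf.add_n([reconstruction_loss] + reg_losses)

training_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```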
How to Train Stacked Autoencoder?
[Diagram: two-phase training. Phase 1 trains the outer layers (hidden1 and the output layer) with an MSE against the input; Phase 2 freezes them and trains the inner layers (hidden2 and hidden3) with an MSE against hidden1's activations. A sketch follows below.]
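The book implements this in a single graph; here is a hedged sketch of the same idea with explicit weight variables (layer sizes taken from the earlier slide, initialization and optimizer settings assumed). Phase 2 "freezes" the outer layers simply by leaving their variables out of var_list.

```python
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 150
n_hidden3, n_outputs = n_hidden1, n_inputs   # symmetric architecture
learning_rate = 0.01
act = tf.nn.elu

X = tf.placeholder(tf.float32, shape=[None, n_inputs])

def weight(shape):
    return tf.Variable(tf.truncated_normal(shape, stddev=0.1))

W1, b1 = weight([n_inputs, n_hidden1]), tf.Variable(tf.zeros(n_hidden1))
W2, b2 = weight([n_hidden1, n_hidden2]), tf.Variable(tf.zeros(n_hidden2))
W3, b3 = weight([n_hidden2, n_hidden3]), tf.Variable(tf.zeros(n_hidden3))
W4, b4 = weight([n_hidden3, n_outputs]), tf.Variable(tf.zeros(n_outputs))

hidden1 = act(tf.matmul(X, W1) + b1)
hidden2 = act(tf.matmul(hidden1, W2) + b2)
hidden3 = act(tf.matmul(hidden2, W3) + b3)
outputs = tf.matmul(hidden3, W4) + b4

optimizer = tf.train.AdamOptimizer(learning_rate)

# Phase 1: train the outer pair (hidden1, output layer) to reconstruct X,
# bypassing the middle layers (possible because n_hidden3 == n_hidden1).
phase1_outputs = tf.matmul(hidden1, W4) + b4
phase1_loss = tf.reduce_mean(tf.square(phase1_outputs - X))
phase1_op = optimizer.minimize(phase1_loss, var_list=[W1, b1, W4, b4])

# Phase 2: freeze the outer layers (leave them out of var_list) and train
# hidden2/hidden3 to reproduce hidden1's activations, the inner MSE above.
phase2_loss = tf.reduce_mean(tf.square(hidden3 - hidden1))
phase2_op = optimizer.minimize(phase2_loss, var_list=[W2, b2, W3, b3])
```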
How to Visualize Features?
1. Consider each neuron in every hidden layer, and find the training instances that activate it the most.
  • This is the simplest technique, and it is effective for the top hidden layer.
  • But for lower layers this technique doesn't work well. (A small sketch of the idea follows below.)
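A tiny NumPy sketch of technique 1. The activation matrix here is dummy random data; in practice you would run the trained encoder over the training set first.

```python
import numpy as np

# Dummy activations: one row per training instance, one column per neuron in
# the hidden layer of interest. In practice, run the encoder on the training
# set to obtain this matrix.
rng = np.random.RandomState(0)
activations = rng.rand(60000, 150)

k = 5
# For each neuron, the indices of the k instances that activate it the most.
top_k = np.argsort(-activations, axis=0)[:k]   # shape: (k, n_neurons)
print(top_k[:, 0])   # the 5 training instances that most excite neuron 0
```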
How to Visualize Features?
2. Show an image where a pixel’s intensity corresponds to
the weight of the connection.
• e.g. the denoising autoencoder (discussed later).
How to Visualize Features?
3. Tweak a random input image so that a certain neuron will activate even more.
  1. Feed the autoencoder a random input image,
  2. measure the activation of the neuron you are interested in,
  3. and then perform backpropagation to tweak the image in such a way that the neuron activates even more.
• This is a useful technique to visualize the kinds of inputs that a neuron is looking for. (A sketch follows below.)
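A hedged sketch of this gradient-based technique in TF 1.x style. The two-layer encoder here is a fresh stand-in (in practice you would reuse the trained autoencoder's graph and weights), and the neuron index, step size, and iteration count are arbitrary choices.

```python
import numpy as np
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2 = 28 * 28, 300, 150
X = tf.placeholder(tf.float32, shape=[None, n_inputs])
hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu)
hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu)

neuron_index = 7
activation = hidden2[0, neuron_index]          # the neuron we want to excite
gradient = tf.gradients(activation, X)[0]      # d(activation) / d(input image)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # use trained weights in practice
    image = np.random.rand(1, n_inputs)            # 1. start from a random image
    for _ in range(100):
        act_val, grad_val = sess.run([activation, gradient], feed_dict={X: image})
        image += 0.1 * grad_val                    # 2-3. nudge the image uphill
        image = np.clip(image, 0.0, 1.0)
```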
Pretraining Using (Stacked) Autoencoders
• Train a stacked autoencoder using all the data, then reuse its lower layers to create a network for your task.
[Diagram: the autoencoder's hidden1 and hidden2 parameters are copied into a new network (input → hidden1 → hidden2 → hidden3′ → softmax output) for the supervised task. A sketch follows below.]
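A minimal sketch of the "copy parameters" step in TF 1.x: build the classifier with the same layer names as the autoencoder and restore only those variables from the autoencoder's checkpoint. The checkpoint path ./stacked_ae.ckpt, layer names, and sizes are illustrative assumptions.

```python
import tensorflow as tf

n_inputs, n_hidden1, n_hidden2, n_classes = 28 * 28, 300, 150, 10

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
y = tf.placeholder(tf.int32, shape=[None])

# Same layer names as in the autoencoder so the checkpoint variables line up.
hidden1 = tf.layers.dense(X, n_hidden1, activation=tf.nn.elu, name="hidden1")
hidden2 = tf.layers.dense(hidden1, n_hidden2, activation=tf.nn.elu, name="hidden2")
logits = tf.layers.dense(hidden2, n_classes, name="softmax_out")   # new layer

loss = tf.reduce_mean(
    tf.nn.sparse_softmax_cross_entropy_with_logits(labels=y, logits=logits))
training_op = tf.train.AdamOptimizer(0.001).minimize(loss)

# "Copy parameters": restore only hidden1/hidden2 from the autoencoder run.
reuse_vars = tf.get_collection(tf.GraphKeys.GLOBAL_VARIABLES, scope="hidden[12]")
pretrain_saver = tf.train.Saver(reuse_vars)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    pretrain_saver.restore(sess, "./stacked_ae.ckpt")
    # ...then train on the labeled subset as usual.
```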
Denoising Autoencoder
• Add noise to the input, and train the autoencoder to recover the original, noise-free input.
[Diagrams: two corruption variants: additive Gaussian noise on the input, or randomly switched-off inputs (dropout), applied before the encoder.]
Denoising Autoencoder Implementation
[Code screenshot: the dropout version. Note that is_training should be False after training. A sketch of both corruption schemes follows below.]
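A minimal sketch of the two corruption schemes in TF 1.x style. The noise level and dropout rate are example values, and is_training mirrors the flag mentioned on the slide; it defaults to False so that corruption is disabled after training.

```python
import tensorflow as tf

n_inputs = 28 * 28
noise_level = 1.0       # example value
dropout_rate = 0.3      # example value

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
# Defaults to False, so corruption is off at test time unless explicitly fed True.
is_training = tf.placeholder_with_default(False, shape=(), name="is_training")

# Variant 1: additive Gaussian noise, applied only while training
X_noisy = tf.cond(is_training,
                  lambda: X + noise_level * tf.random_normal(tf.shape(X)),
                  lambda: X)

# Variant 2: randomly switched-off inputs (dropout); the flag controls it directly.
X_drop = tf.layers.dropout(X, dropout_rate, training=is_training)

# Either corrupted tensor is fed to the encoder, while the reconstruction loss
# still compares the decoder's output with the clean X.
```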
An Example of Denoising
Sparse Autoencoder
• Reduce the number of active neurons in the coding layer.
• Add sparsity loss into the cost function.
• “If you could speak only a few words per month, you would probably
try to make them worth listening to.”
Sparsity Loss
• The Kullback–Leibler (KL) divergence is commonly used.
• It has much stronger gradients than the MSE.
KL Divergence
• $D_{\mathrm{KL}}(P \,\|\, Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$
• It equals 0 iff $\forall i\;[P(i) = Q(i)]$.
• $-\log Q(i) - (-\log P(i)) = \log \frac{P(i)}{Q(i)}$ is interpreted as the information gain.
• In this case, $P(i)$ is the sparsity target and $Q(i)$ is the actual average activation of a coding neuron over the training batch. (A sketch of the resulting sparsity loss follows below.)
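Putting the pieces together, here is a hedged sketch of a sparse autoencoder's cost function in TF 1.x style. The sigmoid coding layer (so the mean activation can be read as a probability), the sparsity target, and the sparsity weight are assumed example values.

```python
import tensorflow as tf

n_inputs, n_codings = 28 * 28, 300
sparsity_target = 0.1
sparsity_weight = 0.2

X = tf.placeholder(tf.float32, shape=[None, n_inputs])
codings = tf.layers.dense(X, n_codings, activation=tf.nn.sigmoid)
logits = tf.layers.dense(codings, n_inputs)
outputs = tf.sigmoid(logits)

def kl_divergence(p, q):
    """KL divergence between two Bernoulli distributions with parameters p and q."""
    return p * tf.log(p / q) + (1 - p) * tf.log((1 - p) / (1 - q))

mean_activation = tf.reduce_mean(codings, axis=0)    # q: average over the batch
sparsity_loss = tf.reduce_sum(kl_divergence(sparsity_target, mean_activation))

reconstruction_loss = tf.reduce_mean(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=X, logits=logits))
loss = reconstruction_loss + sparsity_weight * sparsity_loss
```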
Variational Autoencoder
[Diagram: VAE architecture. The encoder (input → hidden1 → hidden2) outputs μ and σ; the codings are formed as μ + σ × Gaussian noise; the decoder (hidden3 → output) reconstructs the input.]
Variational Autoencoder (VAE)
• A generative model:
  • it can not only reconstruct the training data, but also generate new instances.
• The outputs are partly determined by chance (probabilistic),
  • as opposed to denoising autoencoders, which use randomness only during training.
Latent Loss
• The latent loss pushes the autoencoder toward codings that look as though they were sampled from $N(\mathbf{0}, \mathbf{I})$.
• Most VAE encoders are trained to output $\gamma = \log \sigma^2$ rather than $\sigma$. (This makes it easier for the encoder to capture sigmas of different scales.)
Latent Loss
$D_{\mathrm{KL}}\big(q(\boldsymbol{z} \mid \boldsymbol{X}) \,\big\|\, p(\boldsymbol{z})\big) = D_{\mathrm{KL}}\big(N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \,\big\|\, N(\boldsymbol{0}, \boldsymbol{I})\big) = -\frac{1}{2} \sum_{i=1}^{n} \left(1 + \log \sigma_i^2 - \mu_i^2 - \sigma_i^2\right)$
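A sketch of this latent loss in TF 1.x style, assuming the encoder's top layer hidden2 outputs both μ and γ = log σ², as described above. The layer sizes are arbitrary, and hidden2 is fed as a placeholder here instead of a full encoder.

```python
import tensorflow as tf

n_hidden2, n_codings = 500, 20
hidden2 = tf.placeholder(tf.float32, shape=[None, n_hidden2])   # encoder top layer

mu = tf.layers.dense(hidden2, n_codings)       # mean of the codings
gamma = tf.layers.dense(hidden2, n_codings)    # gamma = log(sigma^2)

# Reparameterization: codings = mu + sigma * Gaussian noise
noise = tf.random_normal(tf.shape(gamma), dtype=tf.float32)
codings = mu + tf.exp(0.5 * gamma) * noise

# Latent loss = D_KL( N(mu, sigma^2) || N(0, I) ), summed over the coding units
latent_loss = 0.5 * tf.reduce_sum(
    tf.exp(gamma) + tf.square(mu) - 1 - gamma)
```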
Digits Generated by VAE
Conclusion
• Autoencoders learn effective representations of their inputs.
• Autoencoders are powerful feature detectors, while the
underlying ideas are simple.
• Autoencoders can be used for many tasks.
• Pretraining, denoising, generating new images, …
