Auto Encoders
In the name of God
Mehrnaz Faraz
Faculty of Electrical Engineering
K. N. Toosi University of Technology
Milad Abbasi
Faculty of Electrical Engineering
Sharif University of Technology
Auto Encoders
• An unsupervised deep learning algorithm
• Artificial neural networks trained on unlabeled data
• Useful for dimensionality reduction and clustering
𝑧 = 𝑠(𝑤𝑥 + 𝑏)
𝑥̂ = 𝑠(𝑤′𝑧 + 𝑏′)
𝑥̂ is 𝑥’s reconstruction
𝑧 is some latent representation or code and 𝑠 is a non-linearity such as the sigmoid
[Figure: 𝑥 → Encoder → 𝑧 → Decoder → 𝑥̂]
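The two equations above can be checked with a few lines of NumPy; the layer sizes, random weights, and mean-squared reconstruction error below are illustrative assumptions, not details from the slides:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
# Illustrative sizes: a 6-dimensional input compressed to a 3-dimensional code.
W, b = rng.normal(size=(3, 6)), np.zeros(3)        # encoder parameters w, b
W_p, b_p = rng.normal(size=(6, 3)), np.zeros(6)    # decoder parameters w', b'

x = rng.random(6)                  # one unlabeled sample
z = sigmoid(W @ x + b)             # z = s(wx + b): latent code
x_hat = sigmoid(W_p @ z + b_p)     # x_hat = s(w'z + b'): reconstruction of x
error = np.mean((x - x_hat) ** 2)  # reconstruction error minimized during training
```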
Auto Encoders
• Simple structure:
[Figure: Input 𝒙𝟏–𝒙𝟑 → Hidden layer (Encoder) → Reconstructed output 𝒙𝟏–𝒙𝟑 (Decoder)]
Undercomplete AE
• Hidden layer is undercomplete if it is smaller than the input layer
– Compresses the input
– Hidden nodes will be good features for the training data
[Figure: 𝑥 → 𝑤 → 𝑧 (narrow hidden layer) → 𝑤′ → 𝑥̂]
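A minimal sketch of an undercomplete auto encoder in code, assuming PyTorch and illustrative sizes (784-dimensional input, 32-dimensional code); making the code wider than the input would instead give the overcomplete case on the next slide:

```python
import torch
import torch.nn as nn

class UndercompleteAE(nn.Module):
    def __init__(self, input_dim=784, code_dim=32):   # code_dim < input_dim => undercomplete
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, code_dim), nn.Sigmoid())
        self.decoder = nn.Sequential(nn.Linear(code_dim, input_dim), nn.Sigmoid())

    def forward(self, x):
        z = self.encoder(x)      # compressed code
        return self.decoder(z)   # reconstruction x_hat

model = UndercompleteAE()
x = torch.rand(16, 784)                      # dummy batch of unlabeled data
loss = nn.functional.mse_loss(model(x), x)   # train to reconstruct the input itself
loss.backward()
```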
Overcomplete AE
• Hidden layer is overcomplete if it is larger than the input layer
– No compression in hidden layer.
– Each hidden unit could copy a different input component.
[Figure: 𝑥 → 𝑤 → 𝑧 (wide hidden layer) → 𝑤′ → 𝑥̂]
Deep Auto Encoders
• Deep Auto Encoders (DAE)
• Stacked Auto Encoders (SAE)
Training Deep Auto Encoder
• First layer:
[Figure: the first layer is trained as a standalone auto encoder: inputs 𝒙𝟏–𝒙𝟒 → hidden units 𝒂𝟏–𝒂𝟑 (Encoder) → reconstructed 𝒙𝟏–𝒙𝟒 (Decoder)]
Training Deep Auto Encoder
• Features of first layer:
[Figure: the trained first-layer encoder maps 𝒙𝟏–𝒙𝟒 to features 𝒂𝟏–𝒂𝟑]
Training Deep Auto Encoder
• Second layer:
[Figure: the second layer is trained as an auto encoder on the first-layer features: 𝒂𝟏–𝒂𝟑 → hidden units 𝒃𝟏, 𝒃𝟐 → reconstructed 𝒂𝟏–𝒂𝟑]
Training Deep Auto Encoder
• Features of second layer:
[Figure: the stacked encoder maps 𝒙𝟏–𝒙𝟒 → 𝒂𝟏–𝒂𝟑 → second-layer features 𝒃𝟏, 𝒃𝟐]
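The four steps above amount to greedy layer-wise training: each layer is trained as a small auto encoder on the previous layer's outputs, and the trained encoders are then stacked. A rough sketch, assuming PyTorch, mean-squared reconstruction error, and the 4-3-2 layer sizes from the figures:

```python
import torch
import torch.nn as nn

def train_layer(inputs, in_dim, hidden_dim, epochs=100, lr=1e-2):
    """Train one encoder/decoder pair to reconstruct its own input; return the encoder and its codes."""
    encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.Sigmoid())
    decoder = nn.Sequential(nn.Linear(hidden_dim, in_dim), nn.Sigmoid())
    opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(decoder(encoder(inputs)), inputs)
        loss.backward()
        opt.step()
    return encoder, encoder(inputs).detach()

x = torch.rand(256, 4)                              # dummy unlabeled data (x1..x4)
enc1, a = train_layer(x, in_dim=4, hidden_dim=3)    # first layer:  x -> a (a1..a3)
enc2, b = train_layer(a, in_dim=3, hidden_dim=2)    # second layer: a -> b (b1, b2)
stacked_encoder = nn.Sequential(enc1, enc2)         # deep encoder: x -> a -> b
```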
Using Deep Auto Encoder
• Feature extraction
• Dimensionality reduction
• Classification
[Figure: the stacked encoder maps inputs 𝒙𝟏–𝒙𝟒 to features 𝒃𝟏, 𝒃𝟐]
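For classification, one common pattern is to put a small classifier on top of the pretrained encoder and fine-tune it on labeled data. A sketch assuming PyTorch, the 4-3-2 sizes from the figures, and a hypothetical 10-class problem; in practice the encoder weights would come from the unsupervised pretraining above rather than fresh initialization:

```python
import torch
import torch.nn as nn

# Stacked encoder (4 inputs -> 3 -> 2 features); shown freshly initialized for brevity.
encoder = nn.Sequential(nn.Linear(4, 3), nn.Sigmoid(),
                        nn.Linear(3, 2), nn.Sigmoid())
classifier = nn.Sequential(encoder, nn.Linear(2, 10))    # features b1, b2 -> class scores

x = torch.rand(32, 4)              # small labeled batch
y = torch.randint(0, 10, (32,))    # labels
loss = nn.functional.cross_entropy(classifier(x), y)
loss.backward()                    # fine-tunes the encoder and the classifier head together
```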
Using Deep Auto Encoder
• Reconstruction
[Figure: full reconstruction path: 𝒙𝟏–𝒙𝟒 → 𝒂𝟏–𝒂𝟑 → 𝒃𝟏, 𝒃𝟐 (Encoder), then 𝒃𝟏, 𝒃𝟐 → 𝒂𝟏–𝒂𝟑 → 𝒙𝟏–𝒙𝟒 (Decoder)]
Using AE
• Denoising
• Data compression
• Unsupervised learning
• Manifold learning
• Generative model
Types of Auto Encoder
• Stacked auto encoder (SAE)
• Denoising auto encoder (DAE)
• Sparse Auto Encoder (SAE)
• Contractive Auto Encoder (CAE)
• Convolutional Auto Encoder (CAE)
• Variational Auto Encoder (VAE)
Generative Models
• Given training data, generate new samples from same
distribution
– Variational Auto Encoder (VAE)
– Generative Adversarial Network (GAN)
Variational Auto Encoder
[Figure: input x → Encoder q_φ(z|x) → latent code 𝒛𝟏, 𝒛𝟐 → Decoder p_θ(x|z) → output x̂]
Variational Auto Encoder
• Use probabilistic encoding and decoding
– Encoder: q_φ(z|x)
– Decoder: p_θ(x|z)
• x: Unknown probability distribution
• z: Gaussian probability distribution
Training Variational Auto Encoder
• Latent space:
[Figure: the encoder q_φ(z|x): inputs 𝒙𝟏–𝒙𝟒 → hidden units 𝒉𝟏–𝒉𝟑 → mean 𝝁 and variance 𝝈 → sampled latent code 𝒛]
– With one neuron each for 𝝁 and 𝝈, the latent code follows a 1-dimensional Gaussian distribution.
– With n neurons for 𝝁 and 𝝈, the latent distribution is n-dimensional.
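A compact sketch of this encoder head together with the usual reparameterization trick (sampling z = μ + σ·ε so gradients can flow through the sampling step), assuming PyTorch, a 784-dimensional input, a 2-dimensional latent space, and the standard VAE loss (reconstruction term plus KL term); all sizes are illustrative:

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, input_dim=784, latent_dim=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(input_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)       # mean head
        self.logvar = nn.Linear(256, latent_dim)   # log-variance head (more stable than predicting sigma directly)
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                     nn.Linear(256, input_dim), nn.Sigmoid())

    def forward(self, x):
        h = self.hidden(x)
        mu, logvar = self.mu(h), self.logvar(h)
        eps = torch.randn_like(mu)
        z = mu + torch.exp(0.5 * logvar) * eps     # reparameterization: z = mu + sigma * eps
        return self.decoder(z), mu, logvar

vae = VAE()
x = torch.rand(16, 784)
x_hat, mu, logvar = vae(x)
recon = nn.functional.binary_cross_entropy(x_hat, x, reduction='sum')
kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # KL(q_phi(z|x) || N(0, I))
loss = recon + kl
loss.backward()

# Generating new data: sample z ~ N(0, I) from the latent space and decode it.
with torch.no_grad():
    samples = vae.decoder(torch.randn(16, 2))
```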
Training Variational Auto Encoder
• Generating new data:
– Example: MNIST Database
[Figure: an MNIST digit x → Encoder → latent space → Decoder → generated digit]
Generative Adversarial Network
• VAE: x → Encoder → z → Decoder → x̂
• GAN:
– Can generate samples
– Trained by two networks competing with each other
– Both use neural networks
– z is some random noise (Gaussian/uniform)
– z can be thought of as the latent representation of the image
[Figure: noise z → Generator → fake x̂; the Discriminator receives real x and fake x̂, decides "fake or real?", and this decision drives the loss]
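A minimal sketch of one adversarial training step, assuming PyTorch, 64-dimensional noise, tiny fully connected networks, and the standard binary cross-entropy GAN losses; these are illustrative choices rather than details given on the slides:

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 784), nn.Tanh())    # Generator: z -> fake x
D = nn.Sequential(nn.Linear(784, 128), nn.LeakyReLU(0.2), nn.Linear(128, 1))        # Discriminator: x -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

real = torch.rand(32, 784)    # dummy batch of real samples
z = torch.randn(32, 64)       # latent noise

# 1) Discriminator step: label real samples 1 and generated samples 0.
fake = G(z).detach()
d_loss = bce(D(real), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()

# 2) Generator step: try to make the discriminator call its fakes real.
g_loss = bce(D(G(z)), torch.ones(32, 1))
opt_g.zero_grad()
g_loss.backward()
opt_g.step()
```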
GAN’s Architecture
• Overview:
[Figure: latent-space noise → Generator → generated fake samples; real samples and fake samples → Discriminator → "Is D correct?" → fine-tune training]
Using GAN
• Image generation:
Using GAN
• Data manipulation:
Denoising Auto Encoder
• Add noise to the input and train the network to recover the original.
Denoising Auto Encoder
[Figure: two corruption variants of the same Input → Hidden 1 → Hidden 2 → Hidden 3 → Output network: one adds Gaussian noise to the input, the other applies dropout (randomly switched-off inputs)]
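A small sketch of both corruption styles shown in the figure, assuming PyTorch and a single-hidden-layer network; in either case the reconstruction loss is computed against the clean input, not the corrupted one:

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 64), nn.ReLU(),     # encoder
                   nn.Linear(64, 784), nn.Sigmoid())  # decoder
x_clean = torch.rand(32, 784)

# Corruption option 1: additive Gaussian noise on the input.
x_noisy = x_clean + 0.2 * torch.randn_like(x_clean)

# Corruption option 2: dropout-style masking (randomly switched-off inputs).
x_masked = x_clean * (torch.rand_like(x_clean) > 0.3).float()

# Either corrupted version goes in, but the target is always the clean input.
loss = nn.functional.mse_loss(ae(x_noisy), x_clean)
loss.backward()
```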
Sparse Auto Encoder
• Reduce the number of active neurons in the coding layer.
– Add sparsity loss into the cost function.
• Sparsity loss:
– Kullback–Leibler (KL) divergence is commonly used.
Sparse Auto Encoder
KL(ρ ‖ ρ̂ⱼ) = ρ log(ρ / ρ̂ⱼ) + (1 − ρ) log((1 − ρ) / (1 − ρ̂ⱼ))

J_sparse(w, b) = J(w, b) + β Σⱼ KL(ρ ‖ ρ̂ⱼ)

where ρ is the target sparsity level, ρ̂ⱼ is the average activation of hidden unit j over the training data, and β weights the sparsity penalty against the reconstruction cost J(w, b).
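A short sketch of this penalty in code, assuming PyTorch, a sigmoid coding layer, a target sparsity ρ = 0.05, and weight β = 1e-3 (all illustrative values):

```python
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid())
decoder = nn.Sequential(nn.Linear(64, 784), nn.Sigmoid())

x = torch.rand(32, 784)
z = encoder(x)

rho = 0.05                  # target average activation
rho_hat = z.mean(dim=0)     # observed average activation of each hidden unit over the batch
kl = rho * torch.log(rho / rho_hat) + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))

beta = 1e-3
loss = nn.functional.mse_loss(decoder(z), x) + beta * kl.sum()   # J_sparse = J + beta * sum_j KL
loss.backward()
```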