Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Deep Learning Workshop
Dublin City University
27-28 April 2017
Generative Models and
Adversarial Training
Day 2 Lecture 3
1
What is a generative model?
A model P(X; ϴ) that we can draw samples
from.
E.g. A Gaussian Mixture Model
● Fitting: EM algorithm
● Drawing samples (see the sketch below):
○ Draw a sample from the categorical distribution to select a Gaussian
○ Draw a sample from that Gaussian
GMMs are not generally complex enough
to draw samples of images from.
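As a concrete illustration, ancestral sampling from a fitted GMM is just those two draws. A minimal NumPy sketch, with made-up mixture parameters standing in for a model fitted by EM:

```python
import numpy as np

# Hypothetical 1-D mixture parameters (would normally come from EM fitting)
weights = np.array([0.5, 0.3, 0.2])   # mixing coefficients
means   = np.array([-2.0, 0.0, 3.0])  # component means
stds    = np.array([0.5, 1.0, 0.8])   # component standard deviations

def sample_gmm(n):
    # 1. Draw component indices from the categorical distribution
    ks = np.random.choice(len(weights), size=n, p=weights)
    # 2. Draw from the selected Gaussians
    return np.random.normal(means[ks], stds[ks])

samples = sample_gmm(1000)  # samples whose density follows P(X = x)
```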
[Figure: 1-D density P(X = x) over x, with samples x drawn from the Gaussian mixture]
2
Why are generative models important?
● Model the probability density of images
● Understanding P(X) may help us understand P(Y | X)
● Generate novel content
● Generate training data for discriminative networks
● Artistic applications
● Image completion
● Monte Carlo estimators
3
Generative adversarial networks
New method of training deep generative models
Idea: pit a generator and a discriminator against each other
● Generator tries to draw samples from P(X)
● Discriminator tries to tell if sample came from the generator or the real world
Both discriminator and generator are deep networks (differentiable functions)
Can train with backprop: train discriminator for a while, then train generator, then
discriminator, …
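For reference, the objective from the original GAN paper (Goodfellow et al. 2014) that this game corresponds to: the discriminator D maximises, and the generator G minimises, the value function

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big] +
  \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```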
4
Generative adversarial networks (conceptual)
[Diagram: latent random variable → Generator → sample; real-world images → sample; both samples → Discriminator → Real / Fake → Loss]
5
The generator
Deterministic mapping from a latent random vector z to a sample from the model distribution q(x), which should approximate p(x)
Usually a deep neural network.
E.g. DCGAN:
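A minimal sketch of a DCGAN-style generator, assuming PyTorch; the layer sizes are illustrative (32×32 RGB output), not the exact configuration from the DCGAN paper:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, z_dim=100, channels=3, base=64):
        super().__init__()
        self.net = nn.Sequential(
            # Project the latent vector up to a 4x4 feature map, then upsample
            nn.ConvTranspose2d(z_dim, base * 8, 4, 1, 0), nn.BatchNorm2d(base * 8), nn.ReLU(True),
            nn.ConvTranspose2d(base * 8, base * 4, 4, 2, 1), nn.BatchNorm2d(base * 4), nn.ReLU(True),  # 4 -> 8
            nn.ConvTranspose2d(base * 4, base * 2, 4, 2, 1), nn.BatchNorm2d(base * 2), nn.ReLU(True),  # 8 -> 16
            nn.ConvTranspose2d(base * 2, channels, 4, 2, 1), nn.Tanh(),                                # 16 -> 32, in [-1, 1]
        )

    def forward(self, z):
        return self.net(z.view(z.size(0), -1, 1, 1))

G = Generator()
z = torch.randn(16, 100)   # latent random vectors
fake_images = G(z)         # shape (16, 3, 32, 32)
```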
6
The discriminator
Parameterised function that tries to distinguish between samples from the real image distribution p(x) and samples from the generator's distribution q(x).
Usually a deep convolutional neural network.
[Diagram: a stack of convolutional layers followed by fully connected layers]
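A matching discriminator sketch (again PyTorch, illustrative sizes): convolutional feature extraction followed by a fully connected layer producing the probability that the input is real.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, channels=3, base=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(channels, base, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),      # 32 -> 16
            nn.Conv2d(base, base * 2, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),      # 16 -> 8
            nn.Conv2d(base * 2, base * 4, 4, 2, 1), nn.LeakyReLU(0.2, inplace=True),  # 8 -> 4
        )
        self.classifier = nn.Linear(base * 4 * 4 * 4, 1)  # fully connected real/fake logit

    def forward(self, x):
        h = self.features(x).flatten(1)
        return torch.sigmoid(self.classifier(h))  # probability the input came from p(x)

D = Discriminator()
p_real = D(torch.randn(16, 3, 32, 32))  # shape (16, 1)
```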
7
Training GANs
[Diagram: same GAN setup as above; latent random variable → Generator → sample, real-world images → sample, Discriminator → Real / Fake → Loss]
Alternate between training the discriminator and generator
(Both generator and discriminator are differentiable modules.)
8
[Diagram: same GAN setup as above]
1. Fix generator weights, draw samples from both real world and generated images
2. Train discriminator to distinguish between real world and generated images
Backprop error to update discriminator weights
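A sketch of this discriminator update, reusing the hypothetical G and D modules from the earlier sketches (PyTorch, standard binary cross-entropy loss; real_images are assumed to be 32×32 batches matching those sketches):

```python
import torch
import torch.nn.functional as F

d_opt = torch.optim.Adam(D.parameters(), lr=2e-4, betas=(0.5, 0.999))

def discriminator_step(real_images):
    d_opt.zero_grad()
    batch = real_images.size(0)
    # Generator weights are fixed: its samples are drawn outside the graph
    with torch.no_grad():
        fake_images = G(torch.randn(batch, 100))
    loss_real = F.binary_cross_entropy(D(real_images), torch.ones(batch, 1))   # real -> 1
    loss_fake = F.binary_cross_entropy(D(fake_images), torch.zeros(batch, 1))  # fake -> 0
    loss = loss_real + loss_fake
    loss.backward()  # backprop error to update discriminator weights
    d_opt.step()
    return loss.item()
```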
9
[Diagram: same GAN setup as above]
1. Fix discriminator weights
2. Sample from generator
3. Backprop error through discriminator to update generator weights
Backprop error to update generator weights
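The corresponding generator update (same hypothetical modules), using the common non-saturating trick of labelling generated samples as real, so that the error backpropagated through the frozen discriminator pushes the generator towards fooling it:

```python
import torch
import torch.nn.functional as F

g_opt = torch.optim.Adam(G.parameters(), lr=2e-4, betas=(0.5, 0.999))

def generator_step(batch=64):
    g_opt.zero_grad()
    fake_images = G(torch.randn(batch, 100))
    # Error flows back *through* the discriminator, but only g_opt.step() runs,
    # so only the generator weights are updated here.
    loss = F.binary_cross_entropy(D(fake_images), torch.ones(batch, 1))  # target: fool D
    loss.backward()
    g_opt.step()
    return loss.item()
```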
10
Training GANs
Iterate these two steps until convergence (which may not happen)
● Updating the discriminator should make it better at discriminating between real images and
generated ones (discriminator improves)
● Updating the generator makes it better at fooling the current discriminator (generator improves)
Eventually (we hope) the generator gets so good that it is impossible for the discriminator to tell the
difference between real and generated images: discriminator accuracy = 0.5
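Putting the two hypothetical step functions above together, the whole procedure is just an alternating loop (num_epochs and dataloader are assumed to be defined):

```python
for epoch in range(num_epochs):
    for real_images, _ in dataloader:                 # image labels are unused
        d_loss = discriminator_step(real_images)      # discriminator improves
        g_loss = generator_step(real_images.size(0))  # generator improves
```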
11
[Figure: discriminator training phase and generator training phase]
12
Some examples of generated
images…
13
ImageNet
Source:
https://openai.com/blog/generative-models/
14
CIFAR-10
Source:
https://openai.com/blog/generative-models/
15
Credit: Alec Radford. Code on GitHub
16
Credit: Alec Radford. Code on GitHub
17
Issues
Known to be very difficult to train:
● Formulated as a “game” between two networks
● Unstable dynamics: hard to keep generator and discriminator in balance
● Optimization can oscillate between solutions
● Generator can collapse (mode collapse)
Possible to use supervised labels to help prevent this (Salimans et al., Improved Techniques for Training GANs):
https://arxiv.org/abs/1606.03498
18
Conditional GANs
GANs can be conditioned on other information, e.g. a label (see the sketch below)
● z might capture the random characteristics of the data (e.g. the variability of possible futures)
● c conditions the deterministic part (e.g. the label)
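One simple way to condition, shown as a sketch rather than the scheme from any particular paper: embed the label into a code c and concatenate it with z at the generator input, and with the image at the discriminator input (PyTorch, MNIST-sized images assumed).

```python
import torch
import torch.nn as nn

class ConditionalGenerator(nn.Module):
    def __init__(self, z_dim=100, n_classes=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)  # label -> code c
        self.net = nn.Sequential(
            nn.Linear(z_dim + n_classes, 256), nn.ReLU(True),
            nn.Linear(256, img_dim), nn.Tanh(),
        )

    def forward(self, z, labels):
        c = self.embed(labels)                     # c conditions the deterministic part
        return self.net(torch.cat([z, c], dim=1))  # z carries the random variation

class ConditionalDiscriminator(nn.Module):
    def __init__(self, n_classes=10, img_dim=28 * 28):
        super().__init__()
        self.embed = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(img_dim + n_classes, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, x, labels):
        h = torch.cat([x.flatten(1), self.embed(labels)], dim=1)
        return self.net(h)  # probability the (image, label) pair is real

fake = ConditionalGenerator()(torch.randn(8, 100), torch.randint(0, 10, (8,)))
```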
For details on ways to condition GANs: Ways of Conditioning Generative Adversarial Networks (Kwak and Zhang, 2016)
19
Generating images/frames conditioned on
captions
(Reed et al. 2016b) (Zhang et al. 2016)
20
Predicting the future with adversarial training
Want to train a model to predict the pixels in frame (t+K) from the pixels in frame t.
There are many possible futures for the same frame.
Using a supervised loss like MSE results in blurry solutions: the loss is minimized if the predictor averages over the possibilities when predicting.
We really want a sample, not the mean.
Adversarial training can solve this: it is easy for an adversary to detect blurry frames.
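A sketch of the kind of combined objective this leads to; the exact losses and weights in Mathieu et al. differ (they also use a gradient-difference term), this just illustrates adding an adversarial term to a per-pixel loss:

```python
import torch
import torch.nn.functional as F

def prediction_loss(predicted_frame, target_frame, d_prob_on_prediction, lambda_adv=0.05):
    # Per-pixel term: on its own this favours the blurry average over possible futures
    mse = F.mse_loss(predicted_frame, target_frame)
    # Adversarial term: penalises predictions the discriminator can tell are generated
    adv = F.binary_cross_entropy(d_prob_on_prediction,
                                 torch.ones_like(d_prob_on_prediction))
    return mse + lambda_adv * adv
```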
Mathieu et al. Deep multi-scale video prediction beyond mean square error, ICLR 2016 (https://arxiv.org/abs/1511.05440)
21
Mathieu et al. Deep multi-scale video prediction beyond mean square error, ICLR 2016 (https://arxiv.org/abs/1511.05440)
22
Image super-resolution
Bicubic: does not use data statistics. SRResNet: trained with MSE. SRGAN recognises that there are multiple correct answers, rather than averaging over them.
(Ledig et al. 2016)
23
Image super-resolution
(Ledig et al. 2016)
24
Saliency prediction
[Diagram: input image → generated saliency map, compared (BCE) with the ground-truth saliency map]
Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier
Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” arXiv. 2017.
25
Saliency prediction
[Diagram: data loss (BCE) combined with adversarial loss]
Junting Pan, Cristian Canton, Kevin McGuinness, Noel E. O’Connor, Jordi Torres, Elisa Sayrol and Xavier
Giro-i-Nieto. “SalGAN: Visual Saliency Prediction with Generative Adversarial Networks.” arXiv. 2017.
26
Image-to-Image translation
Isola, Phillip, Jun-Yan Zhu, Tinghui Zhou, and Alexei A. Efros. "Image-to-image translation with conditional adversarial networks."
arXiv:1611.07004 (2016).
[Diagram: Generator produces generated pairs; Discriminator compares generated pairs with real-world ground-truth pairs → Loss]
27
Questions?
28
