
21CSE424T – Deep Learning for Data Analytics
UNIT IV
Generative Adversarial Networks: Generator, Discriminator, Loss functions, Generator
loss, Discriminator loss, Training – Deep Convolutional Generative Adversarial
Networks: Wasserstein GAN, BEGAN, CycleGAN, Conditional GANs: Pix2pix
Tutorial:
T7. To build a model using GAN to resemble MNIST digits
T8. To implement a Deep Convolutional GAN to generate complex color images
To implement a Deep Convolutional GAN on Fashion MNIST data set using ReLU as
activation function for generator, leaky ReLU as activation function for discriminator
Generative Adversarial Networks
Generative Adversarial Networks (GANs) help machines create new, realistic data by learning from existing examples.
They were introduced by Ian Goodfellow and his team in 2014 and have transformed how computers generate images, videos, music and more.
Unlike traditional models that only recognize or classify data, GANs take a creative approach by generating entirely new content that closely resembles real-world data.
This ability has helped various fields such as art, gaming, healthcare and data science.
How GANs Work? (diagram)
GAN vs Traditional ML (comparison)
Architecture of GAN
1. Generator Model
The generator is a deep neural network that takes random noise as input and generates realistic data samples such as images or text.
It learns the underlying data patterns by adjusting its internal parameters during training
through backpropagation.
Its objective is to produce samples that the discriminator classifies as real.
Architecture of GAN
Generator Loss Function: The generator tries to minimize this loss:

J_G = -(1/m) Σ_{i=1..m} log D(G(z_i))

where
J_G measures how well the generator is fooling the discriminator.
G(z_i) is the generated sample from random noise z_i.
D(G(z_i)) is the discriminator's estimated probability that the generated sample is real.
The generator aims to maximize D(G(z_i)), meaning it wants the discriminator to classify its fake data as real (probability close to 1).
Architecture of GAN
2. Discriminator Model
The discriminator acts as a binary classifier that helps distinguish between real and generated data.
It learns to improve its classification ability through training, refining its parameters to
detect fake samples more accurately.
When dealing with image data, the discriminator uses convolutional layers or other relevant
architectures which help to extract features and enhance the model’s ability.
Architecture of GAN
Discriminator Loss Function: The discriminator tries to minimize this loss:

J_D = -(1/m) Σ_{i=1..m} [log D(x_i) + log(1 - D(G(z_i)))]

J_D measures how well the discriminator classifies real and fake samples.
x_i is a real data sample.
G(z_i) is a fake sample from the generator.
D(x_i) is the discriminator's probability that x_i is real.
D(G(z_i)) is the discriminator's probability that the fake sample is real.
The discriminator wants to correctly classify real data as real (maximize log D(x_i)) and fake data as fake (maximize log(1 - D(G(z_i)))).
Architecture of GAN
MinMax Loss
GANs are trained using a MinMax Loss between the generator and discriminator:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 - D(G(z)))]

where,
G is the generator network and D is the discriminator network
p_data(x) = true data distribution
p_z(z) = distribution of random noise (usually normal or uniform)
D(x) = discriminator's estimate of real data
D(G(z)) = discriminator's estimate of generated data
The generator tries to minimize this loss (to fool the discriminator) and the discriminator tries to maximize it (to detect fakes accurately).
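The minimax value can be checked numerically. Below is a small illustrative sketch (not from the slides; the sample probabilities are invented) that estimates V(D, G) from a batch of discriminator outputs:

```python
import math

def minimax_value(d_real, d_fake):
    """Batch estimate of V(D, G): mean log D(x) over real samples
    plus mean log(1 - D(G(z))) over fake samples."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# A confident, correct discriminator (real near 1, fake near 0) pushes V
# toward 0, which is why the discriminator wants to maximize V.
v_strong = minimax_value([0.9, 0.95], [0.05, 0.1])
# A fully fooled discriminator (0.5 everywhere) gives V = 2*log(0.5) ≈ -1.386,
# the value the generator drives the game toward.
v_fooled = minimax_value([0.5, 0.5], [0.5, 0.5])
```

The gap between the two values is exactly what training shrinks: the generator improves until the discriminator's outputs collapse toward 0.5.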
Architecture of GAN
How does a GAN work?
GANs train by having two networks, the Generator (G) and the Discriminator (D), compete and improve together.
1. Generator's First Move
The generator starts with a random noise vector like random numbers. It uses this noise as
a starting point to create a fake data sample such as a generated image. The generator’s
internal layers transform this noise into something that looks like real data.
How does a GAN work?
2. Discriminator's Turn
The discriminator receives two types of data:
Real samples from the actual training dataset.
Fake samples created by the generator.
D's job is to analyze each input and find whether it's real data or something G cooked up. It
outputs a probability score between 0 and 1. A score of 1 shows the data is likely real and 0
suggests it's fake.
How does a GAN work?
3. Adversarial Learning
If the discriminator correctly classifies real and fake data it gets better at its job.
If the generator fools the discriminator by creating realistic fake data, it receives a positive
update and the discriminator is penalized for making a wrong decision.
4. Generator's Improvement
Each time the discriminator mistakes fake data for real, the generator learns from this
success.
Through many iterations, the generator improves and creates more convincing fake
samples.
How does a GAN work?
5. Discriminator's Adaptation
The discriminator also learns continuously by updating itself to better spot fake data.
This constant back-and-forth makes both networks stronger over time.
6. Training Progression
As training continues, the generator becomes highly proficient at producing realistic data.
Eventually the discriminator struggles to distinguish real from fake, which shows that the GAN has reached a well-trained state.
At this point, the generator can produce high-quality synthetic data that can be used for
different applications.
Types of GAN
1. Vanilla GAN
Vanilla GAN is the simplest type of GAN. It consists of:
A generator and a discriminator, both built using multi-layer perceptrons (MLPs).
The model optimizes its mathematical formulation using stochastic gradient descent (SGD).
While foundational, Vanilla GAN can face problems like:
Mode collapse: The generator produces limited types of outputs repeatedly.
Unstable training: The generator and discriminator may not improve smoothly.
Types of GAN
2. Conditional GAN (CGAN)
Conditional GAN (CGAN) adds an additional conditional parameter to guide the generation process. Instead of generating data randomly, CGANs allow the model to produce specific types of outputs.
Working of CGANs:
A conditional variable (y) is fed into both the generator and the discriminator.
This ensures that the generator creates data corresponding to the given condition (e.g
generating images of specific objects).
The discriminator also receives the labels to help distinguish between real and fake data.
Example: Instead of generating any random image, CGAN can generate a specific object like
a dog or a cat based on the label.
Types of GAN
3. Deep Convolutional GAN (DCGAN)
Deep Convolutional GAN (DCGAN) are among the most popular types of GANs used for
image generation.
They are important because they:
Uses Convolutional Neural Networks (CNNs) instead of simple multi-layer perceptrons
(MLPs).
Max pooling layers are replaced with convolutional stride helps in making the model more
efficient.
Fully connected layers are removed, which allows for better spatial understanding of
images.
DCGANs are successful because they generate high-quality, realistic images.
Types of GAN
4. Laplacian Pyramid GAN (LAPGAN)
Laplacian Pyramid GAN (LAPGAN) is designed to generate ultra-high-quality images by
using a multi-resolution approach.
Working of LAPGAN:
Uses multiple generator-discriminator pairs at different levels of the Laplacian pyramid.
Images are first downsampled at each layer of the pyramid and then upscaled again using a Conditional GAN (CGAN).
This process allows the image to gradually refine details, reducing noise and improving clarity.
Due to its ability to generate highly detailed images, LAPGAN is considered a superior
approach for photorealistic image generation.
Types of GAN
5. Super Resolution GAN (SRGAN)
Super-Resolution GAN (SRGAN) is designed to increase the resolution of low-quality images
while preserving details.
Working of SRGAN:
Uses a deep neural network combined with an adversarial loss function.
Enhances low-resolution images by adding finer details, making them appear sharper and more realistic.
Helps reduce common image upscaling errors such as blurriness and pixelation.
Application Of Generative Adversarial
Networks (GAN)
Image Synthesis & Generation: GANs generate realistic images, avatars and
high-resolution visuals by learning patterns from training data. They are used in art, gaming
and AI-driven design.
Image-to-Image Translation: They can transform images between domains while
preserving key features. Examples include converting day images to night, sketches to
realistic images or changing artistic styles.
Text-to-Image Synthesis: They create visuals from textual descriptions, helping applications in AI-generated art, automated design and content creation.
Data Augmentation: They generate synthetic data to improve machine learning models, making them more robust and generalizable in fields with limited labeled data.
High-Resolution Image Enhancement: They upscale low-resolution images which helps in
improving clarity for applications like medical imaging, satellite imagery and video
enhancement.
Advantages of GAN
Synthetic Data Generation: GANs produce new, synthetic data resembling real data
distributions which is useful for augmentation, anomaly detection and creative tasks.
High-Quality Results: They can generate photorealistic images, videos, music and other
media with high quality.
Unsupervised Learning: They don't require labeled data, making them effective in scenarios where labeling is expensive or difficult.
Versatility: They can be applied across many tasks including image synthesis, text-to-image
generation, style transfer, anomaly detection and more.
Loss Functions in GANs
In GANs (Generative Adversarial Networks), loss functions are critical—they guide both the
generator and the discriminator during training.
There are various types, but some are foundational and widely used:
1. Standard (Minimax) Loss Function
2. Non-Saturating Loss
3. Least Squares Loss (LSGAN)
4. Wasserstein Loss (WGAN)
Loss Functions in GANs
1. Standard (Minimax) Loss Function
This is the original GAN loss from Ian Goodfellow’s 2014 paper. It’s formulated as a
two-player minimax game:
Discriminator Loss:

L_D = -E_x[log D(x)] - E_z[log(1 - D(G(z)))]

The discriminator tries to assign 1 (real) to true data and 0 (fake) to generated data.
Generator Loss:

L_G = E_z[log(1 - D(G(z)))]

The generator tries to 'fool' the discriminator into classifying its outputs as real.
Loss Functions in GANs
2. Non-Saturating Loss

Instead of minimizing log(1 − D(G(z))), the generator maximizes log D(G(z)).

This helps prevent vanishing gradients during training and encourages better generator
learning.
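The vanishing-gradient point can be seen with one line of calculus. The sketch below (illustrative, not from the slides) compares the gradient magnitude of the saturating loss log(1 − p) with the non-saturating loss −log p, where p = D(G(z)), at the start of training when the discriminator confidently rejects fakes (p small):

```python
# p = D(G(z)), the discriminator's output on a fake sample
def saturating_grad(p):
    return -1.0 / (1.0 - p)   # d/dp of log(1 - p)

def non_saturating_grad(p):
    return -1.0 / p           # d/dp of -log(p)

p = 0.01  # early in training: the discriminator easily rejects fakes
g_sat = abs(saturating_grad(p))        # ≈ 1.01: weak signal, generator barely learns
g_nonsat = abs(non_saturating_grad(p)) # 100.0: strong signal
```

As p grows toward 1 the two losses trade places, but early on the non-saturating form gives the generator a usable gradient.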
Loss Functions in GANs
3. Least Squares Loss (LSGAN)
Discriminator minimizes: L_D = (1/2) E_x[(D(x) - 1)^2] + (1/2) E_z[D(G(z))^2]
Generator minimizes: L_G = (1/2) E_z[(D(G(z)) - 1)^2]
This loss penalizes predictions that are far from the expected value more heavily, aiding stability and convergence.
Loss Functions in GANs
4. Wasserstein Loss (WGAN)
Critic (discriminator) objective: maximize E[D(x)] - E[D(G(z))]
Generator loss: -E[D(G(z))]
This measures the earth mover (Wasserstein) distance and solves some training instability issues in classic GANs.
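As a quick numeric sketch (critic scores invented for illustration), the WGAN losses are just differences of critic scores, with no log or sigmoid:

```python
def mean(xs):
    return sum(xs) / len(xs)

def critic_loss(d_real, d_fake):
    # The critic maximizes E[D(x)] - E[D(G(z))]; written as a quantity to minimize:
    return -(mean(d_real) - mean(d_fake))

def wgan_generator_loss(d_fake):
    return -mean(d_fake)

# Critic scores are unbounded real numbers (no sigmoid at the output).
c_loss = critic_loss([2.0, 3.0], [-1.0, 0.0])   # -3.0
g_loss = wgan_generator_loss([-1.0, 0.0])       # 0.5
# The original WGAN enforces the required Lipschitz constraint by clipping
# critic weights to a small range (e.g. [-0.01, 0.01]) after each update.
```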
Loss Functions in GANs
5. Other Loss Functions
Hinge loss
Logistic loss
Feature matching
BEGAN (Boundary Equilibrium) loss: uses an autoencoder loss, balancing generator and
discriminator with a control variable.
Loss Functions in GANs
Binary Cross Entropy/Log Loss for
Binary Classification
Binary cross-entropy (log loss) is a loss function used in binary classification problems. It
quantifies the difference between the actual class labels (0 or 1) and the predicted probabilities
output by the model. The lower the binary cross-entropy value, the better the model’s predictions
align with the true labels.
Mathematically, Binary Cross-Entropy (BCE) is defined as:

BCE = -(1/N) Σ_{i=1..N} [y_i log(p_i) + (1 - y_i) log(1 - p_i)]

where:
N is the number of observations
y_i is the actual binary label (0 or 1) of the i-th observation.
p_i is the predicted probability of the i-th observation being in class 1.
Since the model’s output is a probability between 0 and 1, minimizing binary cross-entropy during
training helps improve predictive accuracy, ensuring the model effectively distinguishes between
two classes.
How Does Binary Cross-Entropy Work?
Binary Cross-Entropy measures the distance between the true labels and the predicted probabilities. When the predicted probability p_i is close to the actual label y_i, the BCE value is low, indicating a good prediction.
Conversely, when the predicted probability deviates significantly from the actual label, the
BCE value is high, indicating a poor prediction. The logarithmic component of the BCE
function penalizes wrong predictions more heavily than correct ones.
Why is Binary Cross-Entropy
Important?
Training Deep Learning Models: Binary Cross-Entropy is used as the loss function for
training neural networks in binary classification tasks. It helps in adjusting the model's
weights to minimize the prediction error.
Probabilistic Interpretation: BCE provides a probabilistic interpretation of the model's
predictions, making it suitable for applications where understanding the confidence of
predictions is important, such as in medical diagnosis or fraud detection.
Model Evaluation: BCE is a clear and interpretable metric for evaluating the performance
of binary classification models. Lower BCE values indicate better model performance.
Handling Imbalanced Data: BCE can be particularly useful in scenarios with imbalanced
datasets, where one class is significantly more frequent than the other. By focusing on
probability predictions, it helps the model learn to make accurate predictions even in the
presence of class imbalance.
Mathematical Example of Binary Cross-Entropy
Consider a binary classification problem where we have the following true labels y and predicted probabilities p for a set of observations:

Observation | True Label (y) | Predicted Probability (p)
1           | 1              | 0.9
2           | 0              | 0.2
3           | 1              | 0.8
4           | 0              | 0.4
Mathematical Example of Binary
Cross-Entropy
Observation 1:
True label y1 = 1 and predicted probability p1 = 0.9
Loss1 = -(1·log(0.9) + (1-1)·log(1-0.9)) = -log(0.9) ≈ 0.1054
Similarly, for the other observations,
Predicted probability p2 = 0.2 and Loss2 = 0.2231
Predicted probability p3 = 0.8 and Loss3 = 0.2231
Predicted probability p4 = 0.4 and Loss4 = 0.5108
Next, we sum the individual losses and calculate the average:
Total Loss = 0.1054 + 0.2231 + 0.2231 + 0.5108 = 1.0624
Average Loss (BCE) = 1.0624/4 = 0.2656. Therefore, the Binary Cross-Entropy loss for these observations is approximately 0.2656.
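The hand computation above can be checked directly. This short snippet implements the BCE formula and reproduces the ≈0.2656 average:

```python
import math

def binary_cross_entropy(y, p):
    """Average BCE over a batch of labels y and predicted probabilities p."""
    return -sum(yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
                for yi, pi in zip(y, p)) / len(y)

y = [1, 0, 1, 0]
p = [0.9, 0.2, 0.8, 0.4]
loss = binary_cross_entropy(y, p)   # ≈ 0.2656, matching the worked example
```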
Training in GAN
1. Components
Generator: Creates fake samples (e.g., images) from random noise.
Discriminator: Receives real samples from the training set and fake samples from the
generator; tries to distinguish between them.
2. Training Steps
Generator produces fake data:
◦ The generator transforms random noise into synthetic data.
Training in GAN
Discriminator evaluates samples:
◦ It receives a batch of real data and a batch of generated (fake) data.
◦ It outputs a probability for each sample indicating how likely it is real.

Update discriminator:
◦ The discriminator is trained to assign correct labels: ‘real’ to real data and ‘fake’ to
generated data.
◦ Its weights are updated to better tell fake from real.
Training in GAN
Update generator:
◦ The generator is updated based on the discriminator’s feedback.
◦ Its objective: generate samples that the discriminator is more likely to label as ‘real’.

Alternate training:
◦ Typically, the discriminator and generator are alternately updated. Each is held constant
while the other is trained for a step or batch.
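The alternating scheme above can be demonstrated end to end on a toy problem. The sketch below is an illustrative construction (not from the slides; all names and hyperparameters are invented): a one-parameter generator g(z) = θ + z is fit to 1-D real data centred at 3.0, against a logistic discriminator, with one discriminator step and one generator step per iteration.

```python
import math, random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_toy_gan(steps=2000, lr=0.05, seed=0):
    """Alternate one discriminator step and one generator step per iteration."""
    random.seed(seed)
    theta, w, b = 0.0, 0.0, 0.0   # generator mean; discriminator weight and bias
    for _ in range(steps):
        x = random.gauss(3.0, 0.5)           # real sample
        g = theta + random.gauss(0.0, 0.5)   # fake sample
        # Discriminator step (generator held fixed):
        # gradient ascent on log D(x) + log(1 - D(g))
        d_real, d_fake = sigmoid(w * x + b), sigmoid(w * g + b)
        w += lr * ((1 - d_real) * x - d_fake * g)
        b += lr * ((1 - d_real) - d_fake)
        # Generator step (discriminator held fixed):
        # gradient ascent on log D(g), the non-saturating update
        d_fake = sigmoid(w * g + b)
        theta += lr * (1 - d_fake) * w
    return theta

theta = train_toy_gan()   # theta drifts from 0 toward the real mean 3.0
```

The generator only ever sees the discriminator's feedback (the term (1 − D(g))·w), yet its samples migrate toward the real distribution, which is the whole point of the adversarial setup.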
Training in GAN
3. Objective
The adversarial game:
The generator tries to maximize the probability of the discriminator being wrong (fooling
it), while the discriminator tries to minimize its own error in telling real from fake,
representing a minimax game.
Equilibrium:
Training continues until the generator produces data so realistic that the discriminator
cannot distinguish between real and fake (outputting 0.5 probability for both).
Training in GAN
4. Why Is It Challenging?
Convergence is hard to pinpoint:
Unlike supervised learning, there’s no explicit target or label for generator outputs.
Balancing:
If one network becomes much stronger than the other, the system collapses.
Mode collapse and instability:
Common issues where the generator produces limited diversity or fails to converge.
GAN Training Optimization and
Best Practices
Regularization Techniques
◦ Batch Normalization: used to stabilize training by normalizing activations layer-wise. Commonly applied in both generator and discriminator to improve convergence speed and stability.
◦ Weight Initialization: proper initialization of weights improves GAN training dynamics.
GAN Training Optimization and
Best Practices
Activation Functions
◦ LeakyReLU is often preferred in the discriminator to avoid dead neurons and improve gradient flow.
◦ ReLU is commonly used in the generator except the output layer, where Tanh or Sigmoid is used.
GAN Training Optimization and
Best Practices
Optimizers
◦ The Adam optimizer is extensively used with a learning rate of around 0.0002 and beta_1 momentum of 0.5 (instead of the default 0.9) to stabilize training.
◦ Alternatives include RMSProp and SGD variants, but Adam remains popular for GANs.
GAN Training Optimization and
Best Practices
Feature Matching
◦ Instead of optimizing the generator to directly fool the discriminator, the generator tries to match features in intermediate discriminator layers. This helps stabilize training by smoothing gradients.
GAN Training Optimization and
Best Practices
Mini-Batch and Batch Sizes
◦ Training with mini-batches helps converge faster and can improve
stability. Some architectures even use batch sizes as small as one or two.
Label Smoothing
◦ Using softened or noisy labels (e.g., instead of labels 0 and 1, use 0.1 and
0.9) for discriminator targets to reduce overconfidence and improve
generalization.
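One-sided label smoothing can be illustrated numerically. This sketch (illustrative values) shows that with a smoothed target of 0.9 the BCE loss is minimized at a prediction of 0.9, so the discriminator is penalized for becoming overconfident:

```python
import math

def bce_point(target, p):
    """BCE for a single prediction p against a (possibly soft) target."""
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

hard_confident = bce_point(1.0, 0.999)    # ~0.001: hard labels reward pushing p -> 1
smooth_confident = bce_point(0.9, 0.999)  # large: overshooting the 0.9 target is punished
smooth_at_target = bce_point(0.9, 0.9)    # the minimum loss for a 0.9 target
```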
GAN Training Optimization and
Best Practices
Two Time-Scale Update Rule (TTUR)
◦ Train generator and discriminator with different learning rates (often a higher rate for the discriminator) to balance training and prevent oscillations.
Loss Functions
◦ Using improved loss functions like Wasserstein loss or Least Squares loss
can make training more stable and improve the quality of generated
samples.
GAN Training Optimization and
Best Practices
Avoiding Mode Collapse
◦ Mode collapse happens when the generator produces limited variety.
Techniques like minibatch discrimination, feature matching, and unrolled
GANs help combat this issue.
Deep Convolutional GAN
•Deep Convolutional GAN (DCGAN) was proposed by Radford, Metz and Chintala in 2015.
•It is widely used in many convolution-based generation techniques.
•Its goal was to make training GANs stable.
•Hence, they proposed some architectural changes for computer vision problems.
Deep Convolutional GAN
A DCGAN (Deep Convolutional Generative Adversarial Network) is a class of machine learning frameworks designed for unsupervised learning using two neural networks — a generator and a discriminator — that contest with each other in a game-theoretic scenario.
Deep Convolutional GAN
Key Concepts
Generator: Takes random noise as input and creates fake images intended to
look real.
Discriminator: Receives both real images (from a dataset) and fake images
(from the generator), then learns to distinguish between the two.
Need for Deep Convolutional GAN
•DCGANs were introduced to reduce the problem of mode collapse.
•Mode collapse occurs when the generator becomes biased towards a few outputs and cannot produce outputs of every variation from the dataset.
•For example, take the case of the MNIST digits dataset (digits from 0 to 9): we want the generator to generate all types of digits, but sometimes the generator becomes biased towards two or three digits and produces only them.
•Because of that, the discriminator also gets optimized towards those particular digits only, and this state is known as mode collapse.
•This problem can be overcome by using DCGANs.
Architecture of DCGANs
Architecture of DCGANs
❖Unlike classic GANs, DCGANs introduce convolutional and convolutional-transpose
layers (instead of fully connected layers), resulting in better image generation capabilities.
Key architectural guidelines of DCGANs include:
❖Use of strided convolutions and transposed convolutions for upsampling/downsampling.
❖Removal of pooling layers (replaced by convolutional layers with strides).
❖Use of batch normalization for stable training.
❖Removal of fully connected hidden layers for deeper networks.
❖LeakyReLU activation in discriminator; ReLU in generator (tanh on output).
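The stride-based up/downsampling rule can be sanity-checked with the standard transposed-convolution size formula. A small sketch (the parameter choices, 4×4 kernels with stride 2 and padding 1, follow the common DCGAN setup):

```python
def tconv_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a transposed (fractionally-strided) convolution."""
    return (size - 1) * stride - 2 * pad + kernel

# DCGAN-style generator: project noise to a 4x4 feature map, then double four times.
sizes = [4]
for _ in range(4):
    sizes.append(tconv_out(sizes[-1]))
# sizes == [4, 8, 16, 32, 64] -> a 64x64 output image
```

This is the arithmetic behind "each transposed convolution doubles the image dimension" in the generator description below.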
Architecture of DCGANs
❑The generator of the DCGAN architecture takes a 100-dimensional noise vector, sampled from a uniform distribution, as input.
❑First, it changes the dimension to 4x4x1024 and performs a fractionally-strided (transposed) convolution 4 times with a stride of 1/2 (every time it is applied, it doubles the spatial dimensions while reducing the number of output channels).
❑The generated output has dimensions of (64, 64, 3).
❑There are some architectural changes proposed in the generator, such as the removal of all fully connected layers and the use of Batch Normalization, which helps in stabilizing training.
❑We use the ReLU activation function in all layers of the generator, except for the output layer.
Architecture of DCGANs
❑The role of the discriminator here is to determine whether an image comes from the real dataset or from the generator.
❑The discriminator can be designed much like a convolutional neural network that performs an image classification task.
❑Instead of fully connected layers, it uses only strided convolutions with LeakyReLU as the activation function; the input of the discriminator is a single image (from the dataset or generated) and the output is a score that determines whether the image is real or generated.
Applications of DCGANs
❑Image generation (e.g., generating faces, objects, artwork)

❑Image super-resolution

❑Style transfer

❑Unsupervised feature learning


Advantages of DCGANs
✅ Produces high-quality, sharp images
✅ More stable training than original GAN
✅ Learns rich image features (useful for transfer learning)
Wasserstein GAN
Wasserstein Generative Adversarial Network (WGANs) is a variation of Deep Learning GAN
with little modification in the algorithm.
Generative Adversarial Network (GAN) is a method for constructing an efficient generative
model.
Martin Arjovsky, Soumith Chintala, and Léon Bottou developed this network in 2017. This is
used widely to produce real images.
WGAN's architecture uses deep neural networks for both generator and discriminator.
The key difference between GANs and WGANs is the loss function and the gradient penalty.
WGANs were introduced as the solution to mode collapse issues.
The network uses the Wasserstein distance, which provides a meaningful and smoother
measure of distance between distributions.
BEGAN: Boundary Equilibrium
Generative Adversarial Networks
BEGAN is a GAN (Generative Adversarial Network) variant introduced in 2017 by David
Berthelot, Thomas Schumm, and Luke Metz. It is recognized for introducing an
equilibrium-enforcing method using an autoencoder-based discriminator and a novel loss
derived from the Wasserstein distance.
Key Concepts and Contributions
Autoencoder-Based Discriminator:
BEGAN uses an autoencoder (not a classic classifier network) as the discriminator. The
discriminator learns to reconstruct input images, and its reconstruction loss indicates the
difference between real and generated samples.
Key Concepts and Contributions
Equilibrium Training:
BEGAN proposes a new technique to balance the power between the generator and discriminator. It introduces a control variable k_t that keeps the generator and discriminator in equilibrium, preventing either from overpowering the other during training.
Loss Function and Wasserstein Distance:
The loss is based on the Wasserstein distance between distributions of autoencoder loss for
real and generated images, enhancing training stability and convergence.
BEGAN Objective Functions
Discriminator Loss (Autoencoder Reconstruction Error):
L_D = L(x) - k_t · L(G(z))
◦ L(x): reconstruction loss of the discriminator (autoencoder) on real data.
◦ L(G(z)): reconstruction loss on generated data.
◦ k_t: proportional control variable balancing generator/discriminator.
BEGAN Objective Functions
Generator Loss:
L_G = L(G(z))
◦ The generator tries to make its output as reconstructable as real data by the discriminator.
BEGAN Objective Functions
Balance Equation:

k_{t+1} = k_t + λ_k (γ · L(x) - L(G(z)))

The parameter k is adjusted at each step to maintain the desired balance γ between the autoencoder losses for real and generated samples.
Features of BEGAN
Convergence Measure:
BEGAN proposes an approximate measure of convergence to quantify training progress, an
issue often ambiguous in other GANs.
Visual Quality and Training Stability:
BEGAN achieves high visual quality even on high-resolution images, along with stable and
fast training compared to earlier GAN models.
Controllable Trade-off:
The model allows easy adjustment between image diversity and image visual quality through
its equilibrium parameter.
Applications
❑High-quality image synthesis and generation.
❑Representation learning using the autoencoder-based discriminator.
❑Tasks needing a balance between diversity and quality/realism in generated
samples.
Cycle GAN
❖CycleGAN solves this problem by learning to change images from one style to
another without needing matching pairs.

❖It understands the features of the new style and transforms the original
images accordingly.

❖This makes it useful for tasks like changing seasons in photos, turning one
animal into another or converting pictures into paintings.
Cycle GAN
The process starts with an input image (x); Generator G translates it to the target domain, like turning a photo into a painting. Then generator F takes this transformed image and maps it back to the original domain, helping reconstruct an image close to the input.

The model measures the difference between the original and reconstructed images using a loss function like mean squared error. This cycle consistency loss helps the network learn meaningful, reversible mappings between the two domains.
Architecture of Cycle GAN
1. Generators: Create new images in the target style.
CycleGAN has two generators G and F:
G transforms images from domain X like photos to domain Y like artwork.
F transforms images from domain Y back to domain X.
The generator mapping functions are as follows:
G:X→Y
F:Y→X
where X is the input image distribution and Y is the desired output distribution such as Van
Gogh styles.
Architecture of Cycle GAN
2. Discriminators: Decide if images are real (from the dataset) or fake (generated).
There are two discriminators Dₓ and Dᵧ.
Dₓ distinguishes between real images from X and generated images F(y).
Dᵧ distinguishes between real images from Y and generated images G(x).
To further regularize the mappings, CycleGAN uses two more loss functions in addition to the adversarial loss.
Architecture of Cycle GAN
1. Forward Cycle Consistency Loss: Ensures that when we apply G and then F to an image we get back the original image.
For example: x → G(x) → F(G(x)) ≈ x
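The cycle x → G(x) → F(G(x)) ≈ x can be made concrete with toy scalar "domains". In this invented illustration, G doubles a value and F halves it; a perfect inverse gives zero cycle loss while an imperfect one leaves a residual:

```python
def cycle_consistency_loss(xs, G, F):
    """L1 cycle loss: mean |F(G(x)) - x| over a batch of scalar samples."""
    return sum(abs(F(G(x)) - x) for x in xs) / len(xs)

G = lambda x: 2 * x          # forward mapping X -> Y
F_good = lambda y: y / 2     # exact inverse: F(G(x)) == x
F_bad = lambda y: y / 2 + 1  # imperfect inverse: off by a constant

xs = [1.0, 2.0, 3.0]
loss_good = cycle_consistency_loss(xs, G, F_good)  # 0.0
loss_bad = cycle_consistency_loss(xs, G, F_bad)    # 1.0
```

Minimizing this quantity is what forces G and F to be (approximate) inverses of each other.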
Architecture of Cycle GAN
2. Backward Cycle Consistency Loss: Ensures that when we apply F and then G to an image we get back the original image.
For example: y → F(y) → G(F(y)) ≈ y
Generator Architecture
Each CycleGAN generator has three main sections:
Encoder: The input image is passed through three convolution layers which extract features and compress the image while increasing the number of channels. For example, a 256×256×3 image is reduced to 64×64×256 after this step.
Transformer: The encoded image is processed through 6 or 9 residual blocks, depending on the input size, which helps retain important image details.
Decoder: The transformed image is up-sampled using two deconvolution layers, restoring it to its original size.
Generator Structure:
c7s1-64 → d128 → d256 → R256 (×6 or 9) → u128 → u64 → c7s1-3
c7s1-k: 7×7 convolution layer with k filters.
dk: 3×3 convolution with stride 2 (down-sampling).
Rk: residual block with two 3×3 convolutions.
uk: fractional-stride deconvolution (up-sampling).
Discriminator Architecture
(PatchGAN)
In CycleGAN the discriminator uses a PatchGAN instead of a regular GAN discriminator.
1. A regular GAN discriminator looks at the entire image (e.g. 256×256 pixels) and outputs a single score that says whether the whole image is real or fake.
2. PatchGAN instead judges smaller patches (e.g. 70×70 regions). It outputs a grid of values where each value judges whether the corresponding patch is real or fake.
This lets PatchGAN focus on local details such as textures and small patterns rather than the whole image at once, which helps improve the quality of generated images.
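The "70×70" refers to the receptive field of each output value, and it can be derived from the convolution stack itself with the standard receptive-field recurrence. A small sketch (layer kernels and strides follow the usual 70×70 PatchGAN: 4×4 kernels, three stride-2 layers, then two stride-1 layers):

```python
def receptive_field(layers):
    """Receptive field of stacked convolutions; layers = [(kernel, stride), ...]."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump   # grow the field by the kernel extent at this scale
        jump *= s              # each stride multiplies the step between outputs
    return rf

rf = receptive_field([(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)])
# rf == 70: each output value judges a 70x70 patch of the input image
```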
Discriminator Structure:
C64 → C128 → C256 → C512 → Final Convolution
Ck: 4×4 convolution with k filters, InstanceNorm and LeakyReLU (InstanceNorm is not applied in the first layer).
The final layer produces a 1-channel output map marking real vs. fake patches.
Cost Function in CycleGAN
CycleGAN uses a cost function or loss function to help the training process. The cost function
is made up of several parts:
Adversarial Loss: We apply adversarial loss to both our generator mappings and discriminators. This adversarial loss is written as:

L_GAN(G, D_Y, X, Y) = E_{y~p(y)}[log D_Y(y)] + E_{x~p(x)}[log(1 - D_Y(G(x)))]
Cost Function in CycleGAN
Cycle Consistency Loss: Given a random set of images, an adversarial network can map the input images to any random permutation of images in the output domain that induces an output distribution similar to the target distribution.
Thus adversarial loss alone cannot guarantee that a given input x_i maps to the desired y_i.
For this to happen, the mapping should be cycle-consistent.
This loss function is used in CycleGAN to measure the error of the inverse mapping x -> G(x) -> F(G(x)). The behavior induced by this loss function is that F(G(x)) closely matches the real input x.
Cost Function in CycleGAN
The full cost function is the sum of the adversarial losses and the cycle-consistency loss:
L(G, F, D_X, D_Y) = L_GAN(G, D_Y, X, Y) + L_GAN(F, D_X, Y, X) + λ L_cyc(G, F)
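A toy numeric sketch of this combined objective follows, using the least-squares adversarial form the CycleGAN authors adopt for training stability, scalar discriminator outputs, and flattened lists in place of images (illustrative only, not the reference implementation):

```python
def mae(a, b):
    # mean absolute error between two flattened "images"
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_gan_loss(d_y_fake, d_x_fake, x, f_g_x, y, g_f_y, lam=10.0):
    # Adversarial terms (least-squares form): each generator wants the
    # opposing discriminator to output 1.0 on its fakes.
    adv_g = (d_y_fake - 1.0) ** 2   # G trying to fool D_Y
    adv_f = (d_x_fake - 1.0) ** 2   # F trying to fool D_X
    # Cycle consistency: F(G(x)) should reconstruct x, G(F(y)) reconstruct y.
    cyc = mae(x, f_g_x) + mae(y, g_f_y)
    return adv_g + adv_f + lam * cyc

# D_Y(G(x)) = 0.5, F fools D_X perfectly, small reconstruction error on x:
print(cycle_gan_loss(0.5, 1.0, [0.0, 0.5], [0.25, 0.5], [0.0], [0.0]))  # -> 1.5
```

The weighting λ (10 in the paper) controls how strongly reconstruction fidelity is enforced relative to realism.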
Evaluating CycleGAN’s
Performance
AMT Perceptual Studies: These involve real people reviewing generated images to see if they look real. It works like a voting system in which participants on Amazon Mechanical Turk compare AI-created images with actual ones.
FCN Scores: These help measure accuracy, especially on datasets like Cityscapes. The scores check how well the model captures objects in images by evaluating per-pixel accuracy and IoU (Intersection over Union), which measures how well the shapes of generated objects match the real ones.
Drawbacks and Limitations
CycleGAN is good at modifying textures, like turning a horse's coat into zebra stripes, but cannot significantly change object shapes or structures.
The model learns to change colors and patterns rather than reshape objects, which makes structural modifications difficult.
It can also give unpredictable results: generated images may look unnatural or contain distortions.
Conditional Generative Adversarial
Network
Conditional Generative Adversarial Networks (CGANs) are a specialized type of Generative
Adversarial Network (GAN) that generate data based on specific conditions such as labels or
descriptions. Unlike standard GANs that produce random outputs, CGANs control the
generation process by adding additional information which allows the creation of targeted
and precise data.
For example, given a dataset with various car brands, a CGAN can be conditioned to generate images of only Mercedes cars by specifying "Mercedes" as the condition. This conditioning mechanism helps the model generate data that closely aligns with the desired attributes or categories.
Architecture and Working of
CGANs
1. Generator in CGANs: The generator creates synthetic data such as images, text or videos.
It takes two inputs:
Random Noise (z): A vector of random values that adds diversity to generated outputs.
Conditioning Information (y): Extra data like labels or context that guides what the
generator produces for example a class label such as "cat" or "dog".
The generator combines the noise and the conditioning information to produce realistic data that matches the given condition. For example, if the condition y is "cat", the generator will create an image of a cat.
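A minimal sketch of how these two inputs are typically combined, here by concatenating the noise vector with a one-hot label (the noise dimension and class count are illustrative assumptions, not values from a specific paper):

```python
import random

def one_hot(label, num_classes):
    # one-hot encoding of the conditioning label y
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def generator_input(z_dim=100, label=1, num_classes=10):
    # Random noise z provides diversity; the label y steers what is generated
    # (e.g. class 1 might mean "cat" in a hypothetical 10-class dataset).
    z = [random.gauss(0.0, 1.0) for _ in range(z_dim)]
    y = one_hot(label, num_classes)
    return z + y  # concatenated vector fed to the generator's first layer

print(len(generator_input()))  # -> 110
```

In convolutional CGANs the label is often instead broadcast and concatenated as extra feature channels, but the principle is the same.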
Architecture and Working of
CGANs
2. Discriminator in CGANs: The discriminator is a binary classifier that decides whether
input data is real or fake. It also receives two inputs:
Real Data (x): Actual samples from the dataset.
Conditioning Information (y): The same condition given to the generator.
Using both the real/fake data and the condition, the discriminator learns to judge whether the data is genuine and whether it matches the condition. For example, if the input is an image labeled "cat", the discriminator verifies whether it truly looks like a real cat.
Architecture and Working of
CGANs
3. Interaction Between Generator and Discriminator: The generator and discriminator
train together through adversarial training:
The generator tries to create fake data based on noise (z) and condition (y) that can fool the
discriminator.
The discriminator attempts to correctly classify real vs. fake data considering the condition
(y).
The goal of the adversarial process is:
Generator: Produce data that the discriminator believes is real.
Discriminator: Accurately distinguish between real and fake data.
Architecture and Working of
CGANs
4. Loss Function and Training: Training is guided by a loss function that balances the
generator and discriminator:
min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x|y)] + E_{z∼p_z(z)}[log(1 − D(G(z|y)))]
❑The first term encourages the discriminator to classify real samples correctly.
❑The second term pushes the generator to produce samples that the discriminator classifies
as real.
Here E represents the expected value, p_data is the real data distribution, and p_z is the prior noise distribution.
As training progresses both the generator and discriminator improve. This adversarial
process results in the generator producing more realistic data conditioned on the input
information.
Image-to-Image Translation using
Pix2Pix
Pix2pix GANs were proposed by researchers at UC Berkeley in 2017.
It uses a conditional Generative Adversarial Network to perform the image-to-image translation task (i.e., converting one image to another, such as facades to buildings, Google Maps to Google Earth, etc.).
Architecture
The pix2pix uses conditional generative adversarial networks (conditional-GAN) in its
architecture.
The reason for this is that even if we train a model with a simple L1/L2 loss function for a particular image-to-image translation task, it might not capture the nuances of the images.
Generator
The architecture used in the generator is U-Net. It is similar to an encoder-decoder architecture except for the use of skip connections between the encoder and decoder. Skip connections are used because when the encoder downsamples the image, its output contains rich feature and class information but loses low-level features such as the spatial arrangement of objects in the image; skip connections between encoder and decoder layers prevent this loss of low-level detail.
Generator
Encoder Architecture: The Encoder network of the Generator network has seven
convolutional blocks. Each convolutional block has a convolutional layer, followed by a
LeakyReLU activation function (with a slope of 0.2 in the paper). Each convolutional block also has a batch normalization layer, except the first block.
Decoder Architecture: The Decoder network of the Generator network has seven
Transpose convolutional blocks. Each upsampling convolutional block (Dconv) has an
upsampling layer, followed by a convolutional layer, a batch normalization layer, and a ReLU
activation function.
The generator architecture contains skip connections between each layer i and layer n − i,
where n is the total number of layers. Each skip connection simply concatenates all channels
at layer i with those at layer n − i.
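The channel bookkeeping implied by these skip connections can be sketched as follows. The encoder channel counts below are illustrative assumptions, and the decoder is assumed to emit the mirrored channel count at each stage:

```python
def unet_decoder_inputs(enc_channels):
    # enc_channels[i] = output channels of encoder block i (last = bottleneck).
    # Decoder block j up-samples, then concatenates with the mirrored encoder
    # activation enc_channels[n - 1 - j], so its convolution sees the sum.
    n = len(enc_channels)
    inputs = []
    prev_out = enc_channels[-1]          # bottleneck feeds the first decoder block
    for j in range(1, n):
        skip = enc_channels[n - 1 - j]   # mirrored encoder activation (layer n - i)
        inputs.append(prev_out + skip)   # channel concatenation
        prev_out = skip                  # assume the decoder emits `skip` channels
    return inputs

print(unet_decoder_inputs([64, 128, 256, 512, 512, 512, 512]))
# -> [1024, 1024, 1024, 768, 384, 192]
```

Each decoder convolution thus receives roughly double the channels it would without skip connections, which is where the recovered low-level detail enters.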
Discriminator
The discriminator uses the PatchGAN architecture, which consists of a number of convolutional blocks. It takes an N×N patch of the image and tries to determine whether that patch is real or fake. The discriminator is applied convolutionally across the whole image, and the responses are averaged to produce the final discriminator output D.
Discriminator
Each block of the discriminator contains a convolution layer, batch norm layer, and
LeakyReLU. This discriminator receives two inputs:
The input image and Target Image (which discriminator should classify as real)
The input image and Generated Image (which they should classify as fake).
The PatchGAN is used because the authors argue that it preserves high-frequency details in the image, while low-frequency correctness is handled by the L1 loss.
Generator Loss
The generator loss is a linear combination of the adversarial (sigmoid cross-entropy) loss and the L1 loss between the generated image and the target image; in the paper the L1 term is weighted by λ = 100:
G* = arg min_G max_D L_cGAN(G, D) + λ L_L1(G)
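A scalar sketch of this combination, assuming a sigmoid cross-entropy GAN term and λ = 100 (lists stand in for image tensors; not the authors' TensorFlow code):

```python
import math

def gan_term(d_outputs):
    # sigmoid cross-entropy of discriminator scores on fakes vs. a target of 1
    return -sum(math.log(d) for d in d_outputs) / len(d_outputs)

def l1(gen, target):
    # mean absolute error between generated and target "images"
    return sum(abs(g - t) for g, t in zip(gen, target)) / len(gen)

def pix2pix_gen_loss(d_outputs, gen, target, lam=100.0):
    # adversarial realism term + weighted per-pixel reconstruction term
    return gan_term(d_outputs) + lam * l1(gen, target)
```

With λ = 100, the L1 term dominates early training, anchoring the output to the target while the adversarial term sharpens high-frequency detail.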
Discriminator Loss
The discriminator loss takes two inputs real image and generated image:
❖real_loss is a sigmoid cross-entropy loss between the discriminator's outputs on the real images and an array of ones (since these are the real images).
❖generated_loss is a sigmoid cross-entropy loss between the discriminator's outputs on the generated images and an array of zeros (since these are the fake images).
The total loss is the sum of the real_loss and generated_loss.
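The two terms can be sketched with scalar sigmoid cross-entropy over per-patch discriminator outputs (illustrative only, not the reference implementation):

```python
import math

def bce(pred, target):
    # sigmoid cross-entropy for one patch score against a 0/1 target
    return -(target * math.log(pred) + (1.0 - target) * math.log(1.0 - pred))

def disc_loss(real_patches, fake_patches):
    # real patches should score 1, generated patches should score 0
    real_loss = sum(bce(p, 1.0) for p in real_patches) / len(real_patches)
    generated_loss = sum(bce(p, 0.0) for p in fake_patches) / len(fake_patches)
    return real_loss + generated_loss  # total discriminator loss
```

For example, an undecided discriminator that scores every patch 0.5 incurs a total loss of 2·log 2 ≈ 1.386.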