Self-Supervised Learning Strategies

The document discusses self-supervised learning, highlighting its motivation to utilize vast amounts of unlabeled data for training deep learning models without costly manual labeling. It outlines strategies for self-supervision, including non-contrastive and contrastive methods, and provides examples of pretext tasks such as rotation, jigsaw puzzles, and colorization. The document also touches on the effectiveness of various self-supervised learning techniques and their applications in downstream tasks.


SELF-SUPERVISED LEARNING
Prof. Biplab Banerjee
GNR 650
Slides Overview
• Motivation
• Introduction to self-supervision
• Strategies of self-supervision
• Non-Contrastive Strategies
• Contrastive Strategies
• Use cases
Motivation
• Supervised learning has shown great promise in deep
learning.
• Deep learning models are data hungry in nature.
• Manual labelling of data is a costly and time-consuming
affair.
• E.g. ImageNet dataset curation started in 2006 and
continued till 2010 (~5 years for collecting and
labelling).
• The internet contains a vast amount of unlabeled data
that has not yet been effectively utilised.
Motivation
• Can we learn a label-agnostic feature representation
from large unlabeled data that generalises to multiple
tasks?
• Unsupervised learning is hard, so we instead carry out
self-supervision to learn from data without labels.

[Diagram: Unlabeled Data → Neural Network → Representation]
What is Self-Supervision?
• Generate labelled data from unlabeled data by some
means of automation, without much human supervision.
• Train a neural network to predict the "generated"
labels to learn a better representation.
• Finally, fine-tune on the downstream task with few
samples.
What is Self-Supervision?
• We train a neural network by "generating" labels from
unlabeled data and then fine-tune on the downstream
task of interest.
Strategy for Self-Supervision
• From unlabeled data, generate tasks, known as pretext
tasks, for learning a better representation of the data
via a neural network encoder.
• Then transfer the learnt encoder to the downstream
task by replacing the head and training it.
• Two primary ways to do it:
o Non-contrastive methods
o Contrastive methods
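The transfer recipe above — pretrain an encoder on a pretext task, discard the pretext head, and attach a fresh downstream head — can be sketched in a few lines. This is a toy numpy sketch; all shapes and names are illustrative assumptions, and real encoders are deep networks, not single matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder weights, assumed to have been learnt during pretext training.
encoder = rng.normal(size=(32, 8))
# Pretext head (e.g. 4 pretext classes); discarded after pretraining.
pretext_head = rng.normal(size=(8, 4))

x = rng.normal(size=(5, 32))           # a batch of inputs
features = x @ encoder                 # transferred representation

# Fresh head for the downstream task, trained on the few labelled samples.
downstream_head = rng.normal(size=(8, 10))
logits = features @ downstream_head
```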
How to create a pretext task?
• Given a data input, we generate the pretext task so
that the model tries to predict or reconstruct some
part of (or the entire) data itself.
• E.g. using one part of an image to generate some
other part of the image.
• Then train a neural network in a contrastive or
non-contrastive way for self-supervision.
• Hope that the learnt encoder "generalizes" to the
downstream task.
Source: Yann LeCun @EPFL - "Self-supervised learning: could machines learn like humans?" ([Link])
Non-Contrastive way
• Involves taking an input image, distorting it in some
way, and then trying to predict the distortion.
• This creates a supervised learning task by automatically
generating labels.
• Some common tasks are:
o Rotation
o Inpainting
o Colorization
o Jigsaw puzzle
o Counting objects
o Relative patch position
Rotation
• Rotate images and try to predict the rotation applied.
• Idea: a model can predict the rotation only if it has
visual common sense of how the object looks without
distortion.
• Each image is rotated into 4 directions (0°, 90°, 180°,
270°) and the model predicts which one.
• Treated as a 4-way classification problem.
Source: Unsupervised Representation Learning by Predicting Image Rotations by Gidaris et al., 2018
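Generating the rotation pretext labels is entirely automatic. A minimal numpy sketch of the label generation step (the classifier itself is omitted):

```python
import numpy as np

def make_rotation_task(image):
    """Given an HxW(xC) image, return its 4 rotated copies and their labels.

    Label k means the image was rotated by k*90 degrees; the pretext
    network is trained to classify k from the rotated image alone.
    """
    rotations = [np.rot90(image, k) for k in range(4)]
    labels = list(range(4))
    return rotations, labels
```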
Relative patch position
• Given two patches, predict their relative position.
• The model learns how different parts of images are
relatively placed, and thus how objects are arranged
in the real world.
• Relative to a single patch, we take the 8 surrounding
patches, number them, and try to predict the number.
• Treated as a classification problem.
Source: Unsupervised Visual Representation Learning by Context Prediction by Doersch et al., 2015
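A toy sketch of how such a training pair could be sampled from a 3x3 grid. This simplification omits the gaps and jitter Doersch et al. add between patches to block trivial boundary-matching solutions:

```python
import numpy as np

def relative_patch_pair(image, rng):
    """Split an image into a 3x3 grid; return (center, neighbor, label).

    label in 0..7 indexes which of the 8 surrounding grid cells the
    second patch came from, so the pretext task is 8-way classification.
    """
    h, w = image.shape[:2]
    ph, pw = h // 3, w // 3
    patches = [image[r * ph:(r + 1) * ph, c * pw:(c + 1) * pw]
               for r in range(3) for c in range(3)]
    center = patches[4]                      # middle cell of the grid
    neighbor_idx = [0, 1, 2, 3, 5, 6, 7, 8]  # every cell except the center
    label = int(rng.integers(8))
    return center, patches[neighbor_idx[label]], label
```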
Jigsaw Puzzle
• Given 9 patches, jumble them up and then try to
predict which permutation was applied.
• Treated as a classification problem by indexing into a
fixed set of permutations.
Source: Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles by Noroozi & Favaro, 2016
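The "class" of a shuffled puzzle is just the index of the permutation used. A toy 2x2 version using all 24 permutations of 4 tiles; Noroozi & Favaro instead use 9 tiles and a subset of ~1000 permutations chosen for large mutual Hamming distance:

```python
import itertools
import random

# Toy class set: all 24 permutations of 4 tiles.
PERMUTATIONS = list(itertools.permutations(range(4)))

def make_jigsaw_example(tiles, rng):
    """Shuffle the tiles by a randomly chosen permutation; the label is
    the index of that permutation, turning the puzzle into classification."""
    label = rng.randrange(len(PERMUTATIONS))
    perm = PERMUTATIONS[label]
    return [tiles[i] for i in perm], label
```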
Inpainting
• Fill in the missing patch via inpainting.
• Can be treated as a regression problem for
reconstructing the missing pixels.
• Can also be treated as a generative problem to
generate the missing pixels.
• So, it utilizes both a reconstruction loss and an
adversarial loss for good results.
Source: Context Encoders: Feature Learning by Inpainting by Pathak et al., 2016
Inpainting (Losses)
• The reconstruction loss tries to reconstruct the missing
pixels.
• The adversarial loss tries to tell whether the image
passed in is real or inpainted.
Source: Context Encoders: Feature Learning by Inpainting by Pathak et al., 2016
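The two losses are combined into one joint objective. A toy numpy sketch of the generator-side loss: the weights heavily favor reconstruction as in Pathak et al., and the discriminator score is assumed to be given (training the discriminator itself is omitted):

```python
import numpy as np

def context_encoder_loss(real_patch, pred_patch, disc_score,
                         lam_rec=0.999, lam_adv=0.001):
    """Joint inpainting objective: masked-region L2 reconstruction plus a
    non-saturating adversarial term. disc_score is the discriminator's
    probability that pred_patch is real; the generator wants it near 1."""
    rec = np.mean((real_patch - pred_patch) ** 2)
    adv = -np.log(disc_score + 1e-8)  # small epsilon avoids log(0)
    return lam_rec * rec + lam_adv * adv
```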
Inpainting (comparison of loss results)

Source: Context Encoders: Feature Learning by Inpainting by Pathak et al., 2016
Masking possibilities
Results on different tasks
Colorization
• Tries to predict the colors from the black-and-white
image.
• This is done in the Lab color space as opposed to the
RGB color space.
• L denotes perceptual lightness; a and b denote the
color-opponent (green–red and blue–yellow) channels.
• Can be treated as a reconstruction or a generative
problem.
Source: Colorful Image Colorization by Zhang et al., 2016a
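Zhang et al. in fact cast colorization as per-pixel classification over a quantized ab space (313 in-gamut bins in the paper). A toy quantizer with an assumed 10-unit grid and range, not the paper's exact binning:

```python
import numpy as np

def ab_to_class(ab, grid=10, lo=-110, hi=110):
    """Quantize an (a, b) chroma pair into a bin index, so colorization
    becomes classification over color bins rather than regression."""
    n = (hi - lo) // grid                       # bins per axis
    a_bin = int(np.clip((ab[0] - lo) // grid, 0, n - 1))
    b_bin = int(np.clip((ab[1] - lo) // grid, 0, n - 1))
    return a_bin * n + b_bin                    # row-major bin index
```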
Colorization using split brain
autoencoder
• Divide the input into different channel groups, then
use one group to predict another via separate
encoders.
• A common way of splitting: separate color from
lightness, and use color to predict lightness and vice
versa.
• Another way is to split the input into depth and color
images and use either of them to predict the other.
Source: Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction by Zhang et al., 2016b
Colorization using split brain
autoencoder
• Colorization and depth prediction examples are shown.
Source: Split-Brain Autoencoders: Unsupervised Learning by Cross-Channel Prediction by Zhang et al., 2016b
Alternate aggregation technique
Results
• Models pre-trained on ImageNet and fine-tuned on
PASCAL data.

[Chart: fully supervised ImageNet pretraining and a
no-pretraining baseline (with weight rescaling) compared
against the inpainting, relative position, colorization,
jigsaw solver, split-brain autoencoder, and image rotation
pretext tasks.]

Source: Unsupervised Representation Learning by Predicting Image Rotations by Gidaris et al., 2018
Contrastive self-supervised learning
Multi-modal contrastive learning
SimCLR
Data augmentations
Results
Effects of the projection head
Effects on the batch size
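The loss behind the SimCLR slides above is NT-Xent (normalized temperature-scaled cross-entropy): each sample's two augmented views are positives, and all other embeddings in the 2N batch are negatives. A minimal numpy sketch; batch size, embedding dimension, and temperature here are illustrative:

```python
import numpy as np

def nt_xent(z1, z2, tau=0.5):
    """NT-Xent loss for paired view embeddings z1[i], z2[i]."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine similarities
    sim = z @ z.T / tau
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z1)
    # Index of each row's positive: view i pairs with view i+n and vice versa.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logprob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -logprob[np.arange(2 * n), pos].mean()
```

Nearly identical views should give a much lower loss than unrelated ones, which is what the objective rewards.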
Momentum contrast
• MoCo decouples the batch size from the number of
negatives.
• Uses a queue of past key embeddings as negative
samples.
• Tackles SimCLR's need for very large batches.


Gradient update in MoCo
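MoCo's two mechanisms — an exponential-moving-average (momentum) update of the key encoder, and a FIFO queue of past keys as negatives — can be sketched in numpy (parameter lists and sizes are toy assumptions):

```python
from collections import deque
import numpy as np

def momentum_update(q_params, k_params, m=0.999):
    """EMA update of the key encoder from the query encoder; only the
    query encoder receives gradients in MoCo."""
    return [m * k + (1 - m) * q for q, k in zip(q_params, k_params)]

class NegativeQueue:
    """Fixed-size FIFO of past key embeddings used as negatives,
    decoupling the number of negatives from the batch size."""
    def __init__(self, size):
        self.buf = deque(maxlen=size)  # oldest keys evicted automatically
    def enqueue(self, keys):
        self.buf.extend(keys)
    def negatives(self):
        return np.array(self.buf)
```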
Results
Barlow twins
Advantages of BT
• Redundancy reduction in the features
• Avoids collapse in the embedding space
• Does not require large batches or a momentum encoder
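Both advantages fall out of the Barlow Twins objective: it drives the cross-correlation matrix of the two views' (standardized) embeddings toward the identity. Diagonal terms make the views agree; off-diagonal terms decorrelate features, which is the redundancy reduction above. A numpy sketch (λ matches the paper's default 5e-3; the epsilon is an added numerical guard):

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins: (C - I) penalty on the cross-correlation matrix C."""
    n, d = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)  # standardize per feature
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                            # d x d cross-correlation
    on_diag = ((np.diag(c) - 1) ** 2).sum()      # invariance term
    off_diag = (c ** 2).sum() - (np.diag(c) ** 2).sum()  # redundancy term
    return on_diag + lam * off_diag
```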
Results
DINO
Some visualization
Segmentation task using DINO
backbone
Pretext-invariant SSL
PIRL with memory bank
Results
References
• lecture_12.pdf ([Link])
• Self-Supervised Representation Learning | Lil'Log ([Link])
• Week 10 · Deep Learning ([Link])
• Yann LeCun @EPFL - "Self-supervised learning: could machines learn like humans?" ([Link])
