Míriam Bellver
miriam.bellver@bsc.edu
PhD Candidate
Barcelona Supercomputing Center
Semantic Segmentation
Day 2 Lecture 3
#DLUPC
http://bit.ly/dlcv2018
Segmentation
Define the accurate boundaries of all objects in an image
2
Semantic Segmentation
Label every pixel!
Don’t differentiate instances (cows)
Classic computer vision problem
Slide Credit: CS231n 3
Instance Segmentation
Detect instances, give category, label pixels
“Simultaneous detection and segmentation” (SDS)
Labels are class-aware and instance-aware
Slide Credit: CS231n 4
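To make the difference between the two label types concrete, here is a minimal sketch on a toy 3 x 6 image containing two cows. The arrays, the class ids and the `instance_to_semantic` helper are assumptions made up for illustration; they do not follow any particular dataset's annotation format.

```python
# Toy class ids (assumed for illustration only).
BACKGROUND, COW = 0, 1

# Instance segmentation: each pixel stores an instance id (0 = no object).
instance_map = [
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
]
# Category of each instance id: both instances are cows.
instance_class = {1: COW, 2: COW}


def instance_to_semantic(instance_map, instance_class):
    """Collapse instance ids into class ids, discarding instance identity."""
    return [
        [instance_class.get(inst_id, BACKGROUND) for inst_id in row]
        for row in instance_map
    ]


# Semantic segmentation: both cows end up with the same label.
semantic_map = instance_to_semantic(instance_map, instance_class)
for row in semantic_map:
    print(row)
```

Going from the instance map to the semantic map is a simple lookup; the reverse direction is not possible without extra information, which is why instance segmentation is the harder task.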
Outline
Segmentation Datasets
Semantic Segmentation Methods
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
5
Segmentation: Datasets
Pascal Visual Object Classes
● 20 categories
● +10,000 images
● Semantic segmentation GT
● Instance segmentation GT
Pascal Context
● Real indoor & outdoor scenes
● 540 categories
● +10,000 images
● Dense annotations
● Semantic segmentation GT
● Objects + stuff
6
Segmentation: Datasets
COCO Common Objects in Context
● Real indoor & outdoor scenes
● 80 categories
● +300,000 images
● 2M instances
● Partial annotations
● Semantic segmentation GT
● Instance segmentation GT
● Objects, but no stuff
ADE20K
● Real general scenes
● +150 categories
● +22,000 images
● Semantic segmentation GT
● Instance + parts segmentation GT
● Objects and stuff
7
Segmentation: Datasets
Cityscapes
● Real driving scenes
● 30 categories
● +25,000 images
● 20,000 partial annotations
● 5,000 dense annotations
● Semantic segmentation GT
● Instance segmentation GT
● Depth, GPS and other metadata
● Objects and stuff
Mapillary Vistas Dataset
● Real driving scenes
● 100 categories
● 25,000 images
● Semantic segmentation GT
● Instance + parts segmentation GT
● Objects and stuff
8
Outline
Segmentation Datasets
Semantic Segmentation Methods
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
9
From Classification to Segmentation
Slide Credit: CS231n
Extract patch → Run through a CNN → Classify center pixel (e.g. COW)
Repeat for every pixel
10
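The patch-based procedure above can be sketched as follows. This is a toy illustration under stated assumptions: `toy_classifier` is a hypothetical stand-in for the CNN (it only thresholds the patch sum), and `extract_patch` and `segment_by_patches` are names invented for this example.

```python
def extract_patch(image, row, col, half):
    """Return the (2*half+1) x (2*half+1) patch centred on (row, col), zero-padded."""
    height, width = len(image), len(image[0])
    return [
        [
            image[r][c] if 0 <= r < height and 0 <= c < width else 0
            for c in range(col - half, col + half + 1)
        ]
        for r in range(row - half, row + half + 1)
    ]


def toy_classifier(patch):
    """Hypothetical stand-in for the CNN: label 1 if the patch is bright enough."""
    total = sum(sum(patch_row) for patch_row in patch)
    return 1 if total >= 3 else 0


def segment_by_patches(image, classify, half=1):
    """Label the centre pixel of one patch per pixel: H * W classifier calls."""
    return [
        [classify(extract_patch(image, r, c, half)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


image = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
labels = segment_by_patches(image, toy_classifier)
for row in labels:
    print(row)
```

Neighbouring patches overlap almost entirely, so a real CNN would recompute the same features many times. That redundancy is what motivates running the network once over the whole image.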
From Classification to Segmentation
Slide Credit: CS231n
Run “fully convolutional” network to get all pixels at once
11
Semantic Segmentation
Slide Credit: CS231n
Problem 1: Smaller output due to pooling
12
Learnable upsampling
Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
Slide Credit: CS231n
13
Reminder: Convolutional Layer
Slide Credit: CS231n
Typical 3 x 3 convolution, stride 1 pad 1
Input: 4 x 4 Output: 4 x 4
14
Reminder: Convolutional Layer
Slide Credit: CS231n
Typical 3 x 3 convolution, stride 1 pad 1
Input: 4 x 4 Output: 4 x 4
Dot product between filter and input
15
Reminder: Convolutional Layer
Slide Credit: CS231n
Typical 3 x 3 convolution, stride 2 pad 1
Input: 4 x 4 Output: 2 x 2
17
Reminder: Convolutional Layer
Slide Credit: CS231n
Typical 3 x 3 convolution, stride 2 pad 1
Input: 4 x 4 Output: 2 x 2
Dot product between filter and input
18
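The input and output sizes quoted in these reminder slides follow from the standard size formula, floor((in + 2*pad - kernel) / stride) + 1. A minimal sketch, assuming a square single-channel image and a filter of ones (both chosen only for illustration; `conv_output_size` and `conv2d` are names made up for this example):

```python
def conv_output_size(in_size, kernel, stride, pad):
    """Output length along one axis: floor((in + 2*pad - kernel) / stride) + 1."""
    return (in_size + 2 * pad - kernel) // stride + 1


def conv2d(image, kernel, stride=1, pad=0):
    """Single-channel 2D convolution (cross-correlation, as in CNNs), zero-padded."""
    k = len(kernel)
    size = len(image)  # assumes a square image
    out = conv_output_size(size, k, stride, pad)

    def pixel(r, c):
        # Positions outside the image read as zero padding.
        return image[r][c] if 0 <= r < size and 0 <= c < size else 0

    result = []
    for i in range(out):
        row = []
        for j in range(out):
            top, left = i * stride - pad, j * stride - pad
            # Dot product between the filter and the window it currently covers.
            row.append(sum(kernel[a][b] * pixel(top + a, left + b)
                           for a in range(k) for b in range(k)))
        result.append(row)
    return result


image = [[1] * 4 for _ in range(4)]   # 4 x 4 input of ones
kernel = [[1] * 3 for _ in range(3)]  # 3 x 3 filter of ones

same = conv2d(image, kernel, stride=1, pad=1)     # stride 1 pad 1: 4 x 4 output
strided = conv2d(image, kernel, stride=2, pad=1)  # stride 2 pad 1: 2 x 2 output
print(len(same), len(strided))
```

With stride 2 the filter moves two pixels in the input for every pixel in the output, which is what halves the resolution.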
Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
20
Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
Input gives weight for filter values
21
Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
3 x 3 “deconvolution”, stride 2 pad 1
Input: 2 x 2 Output: 4 x 4
Input gives weight for filter values
Sum where output overlaps
22
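The two steps on this slide (each input value weights a copy of the filter, and overlapping copies are summed) can be sketched in a few lines. This is an illustrative implementation, not the one used in any framework. One assumption needs stating: the plain size formula (in - 1) * stride - 2 * pad + kernel gives 3 for these settings, because both a 3 x 3 and a 4 x 4 input map to 2 x 2 under the forward stride-2 convolution. Frameworks such as PyTorch resolve the ambiguity with an `output_padding` argument; the sketch mimics it, only for `output_padding <= pad`, to reach the 4 x 4 output on the slide.

```python
def transposed_conv2d(x, kernel, stride, pad, output_padding=0):
    """Single-channel transposed convolution via scatter-add (square inputs only)."""
    n, k = len(x), len(kernel)
    assert 0 <= output_padding <= pad, "this sketch only handles output_padding <= pad"

    # Uncropped output: filter copies are placed `stride` pixels apart.
    full = (n - 1) * stride + k
    canvas = [[0] * full for _ in range(full)]
    for i in range(n):
        for j in range(n):
            for a in range(k):
                for b in range(k):
                    # The input value weights the filter; overlapping copies are summed.
                    canvas[i * stride + a][j * stride + b] += x[i][j] * kernel[a][b]

    # Crop `pad` pixels at the top/left and `pad - output_padding` at the bottom/right.
    lo, hi = pad, full - (pad - output_padding)
    return [row[lo:hi] for row in canvas[lo:hi]]


x = [[1, 2],
     [3, 4]]
kernel = [[1] * 3 for _ in range(3)]  # filter of ones, chosen for readability

up = transposed_conv2d(x, kernel, stride=2, pad=1, output_padding=1)
for row in up:
    print(row)
```

In the printed result the value 10 appears where all four filter copies overlap (1 + 2 + 3 + 4), while other positions receive one or two contributions.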
Learnable Upsample: Transposed Convolution
Warning: Checkerboard effect when kernel size is not divisible by the stride
Source: distill.pub
23
Learnable Upsample: Transposed Convolution
Warning: Checkerboard effect when kernel size is not divisible by the stride
Source: distill.pub
stride = 2, kernel_size = 3
24
Learnable Upsample: Transposed Convolution
Warning: Checkerboard effect in images generated by
neural networks
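The divisibility condition can be checked by counting, in 1D, how many filter copies land on each output position of a transposed convolution. A minimal sketch; `overlap_counts` is a helper name made up for this example:

```python
def overlap_counts(n_in, kernel, stride):
    """Number of filter copies covering each output position (1D, no cropping)."""
    length = (n_in - 1) * stride + kernel
    counts = [0] * length
    for i in range(n_in):
        for a in range(kernel):
            counts[i * stride + a] += 1
    return counts


# kernel_size = 3 is not divisible by stride = 2: interior counts alternate.
uneven = overlap_counts(4, kernel=3, stride=2)
# kernel_size = 4 is divisible by stride = 2: interior counts are uniform.
even = overlap_counts(4, kernel=4, stride=2)

print(uneven)  # [1, 1, 2, 1, 2, 1, 2, 1, 1]
print(even)    # [1, 1, 2, 2, 2, 2, 2, 2, 1, 1]
```

The alternating 2, 1, 2, 1 pattern in the first case means some output pixels systematically receive more contributions than their neighbours. In 2D the same unevenness occurs along both axes and shows up as a checkerboard.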
Learnable Upsample: Transposed Convolution
Slide Credit: CS231n
Noh et al. Learning Deconvolution Network for Semantic Segmentation. ICCV 2015
“Regular” VGG “Upside down” VGG
26
Semantic Segmentation
Problem 2: Coarse output
High-level features (e.g. conv5 layer) from a pretrained classification network are the input for the segmentation branch
27
Skip Connections
Slide Credit: CS231n
Skip connections = Better results
Recovering low-level features from early layers
Long et al. Fully Convolutional Networks for Semantic Segmentation. CVPR 2015
28
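A skip connection of this kind can be sketched as "upsample the coarse score map, then add the score map predicted from an earlier, higher-resolution layer". The example below is a simplification under stated assumptions: it uses nearest-neighbour upsampling where the paper uses learned or bilinear upsampling, the score maps are made-up single-channel numbers, and `upsample_nearest` and `fuse` are names invented for this example.

```python
def upsample_nearest(score_map, factor=2):
    """Repeat every value `factor` times along both axes."""
    out = []
    for row in score_map:
        wide = [value for value in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def fuse(coarse, fine):
    """Upsample the coarse scores to the fine resolution and sum element-wise."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


# Coarse scores from a deep layer (2 x 2): good semantics, poor localisation.
coarse = [[1, 0],
          [0, 2]]
# Finer scores from an earlier layer (4 x 4): sharper spatial detail.
fine = [[0, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 0]]

fused = fuse(coarse, fine)
for row in fused:
    print(row)
```

The fused map keeps the blocky structure of the coarse prediction but varies within each block wherever the earlier layer contributes detail.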
Dilated Convolutions
Yu & Koltun. Multi-Scale Context Aggregation by Dilated Convolutions. ICLR 2016
Structural change in convolutional layers for dense prediction problems (e.g. image segmentation)
● The receptive field grows exponentially with depth when the dilation rate doubles at each layer → deeper layers see more context than with regular convolutions
● The number of parameters increases only linearly as you add more layers
29
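The two claims above can be checked with simple arithmetic: each layer with kernel size k and dilation d widens the receptive field by (k - 1) * d. A minimal sketch, assuming stride-1 3 x 3 layers with a single channel and dilation rates 1, 2, 4, ... as in Yu & Koltun; the function names are made up for this example.

```python
def receptive_field(kernel, dilations):
    """Receptive field (one axis) of stacked stride-1 convolutions."""
    field = 1
    for dilation in dilations:
        field += (kernel - 1) * dilation
    return field


def num_weights(kernel, n_layers):
    """Weights in a single-channel stack: grows linearly with depth."""
    return n_layers * kernel * kernel


for n_layers in range(1, 5):
    regular = receptive_field(3, [1] * n_layers)
    dilated = receptive_field(3, [2 ** i for i in range(n_layers)])
    print(n_layers, regular, dilated, num_weights(3, n_layers))
# 1 3 3 9
# 2 5 7 18
# 3 7 15 27
# 4 9 31 36
```

Both stacks have the same number of weights at every depth. Only the spacing between the filter taps changes, so the extra context comes at no parameter cost.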
Dilated Convolutions
Source: https://github.com/vdumoulin/conv_arithmetic
30
State-of-the-art models
● U-Net
○ Deconvolutions
○ Skip connections
Ronneberger et al. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015
31
State-of-the-art models
● PSPNet (dilated convolutions + pyramid pooling)
Zhao et al. Pyramid Scene Parsing Network. CVPR 2017
32
State-of-the-art models
● DeepLab v2 (dilated convolutions + CRF)
● DeepLab v3 (adds pyramid pooling, removes the CRF)
Chen et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. TPAMI 2017
Chen et al. Rethinking Atrous Convolution for Semantic Image Segmentation. arXiv 2017
33
Summary
Segmentation Datasets
Semantic Segmentation Methods
● Deconvolution (or transposed convolution)
● Dilated Convolution
● Skip Connections
34
Questions?
35

Semantic Segmentation - Míriam Bellver - UPC Barcelona 2018
