Detailed Report on Residual Networks (ResNet)
Introduction
The video titled "ResNet (actually) explained in under 10 minutes" provides an accessible overview of
Residual Networks (ResNet), a groundbreaking deep learning architecture introduced by He et al. in the
seminal 2015 paper "Deep Residual Learning for Image Recognition". The video explains the challenges
faced when training deep neural networks and how ResNet addresses them through the concept of
residual learning.
Key Concepts
1. Super Resolution Task:
- The video begins by illustrating a practical application of deep neural networks: super resolution,
where a low-resolution image is transformed into a high-resolution one.
- The challenge arises when deeper networks fail to improve performance: beyond a certain depth,
training loss increases instead of showing the expected decrease.
2. Challenges in Deep Learning:
- As neural networks become deeper, the input signal can degrade due to the cumulative effect of many
stacked non-linear activation functions, and gradients shrink as they propagate backward through the
layers, known as the vanishing gradient problem.
- The network struggles to retain the input signal, which is crucial for learning effective
transformations.
3. Residual Learning:
- To tackle these issues, the video introduces the concept of residual learning. Instead of learning the
entire transformation from low to high resolution, the network learns the residual (the difference)
between the two images.
- This approach simplifies the learning task, allowing the network to focus on the essential changes
needed rather than retaining the entire input.
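The idea can be illustrated with a minimal sketch in PyTorch (the tensors and the 0.01 noise scale are illustrative stand-ins, not values from the video): when the desired output is close to the input, the residual target is much smaller in magnitude than the full target, which is what makes it easier to learn.

```python
import torch

# Stand-ins for a bicubic-upsampled low-res image and its high-res target.
upsampled = torch.rand(1, 3, 32, 32)
target = upsampled + 0.01 * torch.randn(1, 3, 32, 32)  # close to the input

full_target = target                  # what a plain network must predict
residual_target = target - upsampled  # what a residual network must predict

# The residual is far smaller in magnitude, hence an easier learning target.
easier = residual_target.abs().mean() < full_target.abs().mean()
```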
4. Residual Blocks:
- The core building block of ResNet is the residual block, which consists of two main convolutional
layers followed by batch normalization and an activation function (typically ReLU).
- A key feature of the residual block is the addition of the input (identity) to the output of the
convolutional layers, enabling the network to learn the residual mapping.
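A minimal PyTorch sketch of such a block might look as follows (the layer ordering mirrors the description above; channel counts and kernel sizes are illustrative assumptions, not taken from the video):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch norm and ReLU, plus a skip connection."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        identity = x                               # keep the input signal
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        out = out + identity                       # add the identity: learn the residual
        return self.relu(out)
```

Because the convolutions preserve both channel count and spatial size, the element-wise addition of input and output is well-defined.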
5. Dimensionality Issues:
- The video highlights two significant problems when generalizing residual connections to various
tasks:
- Dimensionality Mismatch: In tasks like image classification, the input and output dimensions differ
(e.g., an image input mapped to a single class label).
- Signal Propagation: The training signal can be lost as it propagates through the network.
6. Solutions to Dimensionality Mismatch:
- To address dimensionality mismatch, the authors of the original ResNet paper proposed two
solutions:
1. Zero Padding: This method involves adding zeros to the input features to match the output
dimensions without introducing new parameters. However, it can waste computational resources.
2. 1x1 Convolution: This approach uses a 1x1 convolution to adjust the number of channels in the
input features, ensuring that the dimensions match without wasting computation. This method
introduces additional parameters but retains meaningful information.
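The 1x1-convolution shortcut can be sketched in a few lines of PyTorch (the channel counts, stride, and spatial size below are illustrative assumptions): a 1x1 convolution with stride 2 both doubles the channels and halves the spatial resolution, so the shortcut lines up with the block's output.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 56, 56)  # input: 64 channels, 56x56 feature map

# 1x1 convolution as a projection shortcut: adjusts channels (64 -> 128)
# and spatial size (stride 2 halves 56 -> 28) so the addition matches.
projection = nn.Conv2d(64, 128, kernel_size=1, stride=2, bias=False)
shortcut = projection(x)  # shape: (1, 128, 28, 28)
```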
7. Implementation of Residual Blocks:
- The video briefly discusses the implementation of residual blocks in PyTorch, emphasizing the
importance of matching dimensions for element-wise addition.
- The authors of the ResNet paper strategically reduce dimensionality every few blocks to maintain
computational efficiency while transitioning from high-dimensional inputs to lower-dimensional
outputs.
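Putting the pieces together, a downsampling residual block along these lines might look as follows in PyTorch. This is a sketch under the assumptions above (a stride-2 first convolution and a 1x1 projection shortcut, as in common ResNet implementations), not the exact code from the video:

```python
import torch
import torch.nn as nn

class DownsampleResidualBlock(nn.Module):
    """Residual block that halves spatial size and changes the channel count.

    The shortcut uses a 1x1 convolution (with matching stride) so that the
    element-wise addition in forward() has compatible shapes.
    """

    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3,
                               stride=2, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3,
                               padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        self.relu = nn.ReLU(inplace=True)
        # Projection shortcut: match both channel count and spatial size.
        self.shortcut = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 1, stride=2, bias=False),
            nn.BatchNorm2d(out_channels),
        )

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + self.shortcut(x))
```

Stacking ordinary residual blocks and inserting a downsampling block every few stages is how the full architecture trades spatial resolution for channel depth as the signal moves toward the output.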
Conclusion
The video effectively demystifies the concept of Residual Networks, illustrating how they enable the
training of deeper neural networks by addressing the challenges of signal degradation and
dimensionality mismatch. By focusing on learning residuals rather than the full transformation, ResNet
architectures have become foundational in modern deep learning, influencing a wide range of
applications beyond image recognition.