Difference Between AlexNet, VGGNet, ResNet, and Inception - by Aqeel Anwar - Towards Data Science

This document summarizes and compares four famous convolutional neural network (CNN) architectures: AlexNet, VGGNet, ResNet, and Inception. It explains the key details of each network, including their structures, parameters, and innovations. For example, it notes that AlexNet was one of the first deep CNNs to achieve high accuracy on ImageNet challenges. VGGNet reduced parameters by using only 3x3 filters. ResNet addressed issues of accuracy dropping with increased layers using shortcut connections.


11/11/22, 7:09 AM Difference between AlexNet, VGGNet, ResNet, and Inception | by Aqeel Anwar | Towards Data Science


Published in Towards Data Science

Aqeel Anwar Follow

Jun 7, 2019 · 9 min read · Listen


Difference between AlexNet, VGGNet, ResNet, and Inception

In this tutorial, I will quickly go through the details of four famous CNN
architectures and how they differ from each other by explaining their W3H (When,
Why, What, and How).

AlexNet
When?

The Alan Turing Year

The year of Sustainable Energy for All

London Olympics

Why? AlexNet was born out of the need to improve the results of the ImageNet
challenge. It was one of the first deep convolutional networks to achieve
considerable accuracy on the ImageNet LSVRC-2012 challenge, with a top-5
accuracy of 84.7% compared to 73.8% for the second-best entry. The
idea of spatial correlation in an image frame was explored using convolutional
layers and receptive fields.

What? The network consists of 5 Convolutional (CONV) layers and 3 Fully
Connected (FC) layers. The activation used is the Rectified Linear Unit (ReLU). The
structural details of each layer in the network can be found in the table below.

https://towardsdatascience.com/the-w3h-of-alexnet-vggnet-resnet-and-inception-7baaaecccc96 1/14

Alexnet Block Diagram (source:oreilly.com)

The network has a total of 62 million trainable variables.

How? The input to the network is a batch of RGB images of size 227x227x3, and the
output is a 1000x1 probability vector with one entry per class.

Data augmentation is carried out to reduce over-fitting. It includes mirroring
and cropping the images to increase the variation in the training data-set. The
network uses an overlapped max-pooling layer after the first, second, and fifth
CONV layers. Overlapped maxpool layers are simply maxpool layers with a stride
smaller than the window size. A 3x3 maxpool layer is used with a stride of 2,
hence creating overlapped receptive fields. This overlapping reduced the top-1
and top-5 error rates by 0.4% and 0.3%, respectively.
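To make the overlap concrete, here is a minimal sketch in plain Python. The helper names `pool_output_size` and `pool_windows` are made up for this illustration, and the 55-wide input in the last line assumes the first CONV layer produces a 55x55 map (an 11x11 kernel with stride 4 on a 227x227 image).

```python
def pool_output_size(size, window, stride):
    """Output length of a pooling layer along one axis, without padding."""
    return (size - window) // stride + 1


def pool_windows(size, window, stride):
    """Input indices covered by each pooling window along one axis."""
    count = pool_output_size(size, window, stride)
    return [list(range(i * stride, i * stride + window)) for i in range(count)]


# Overlapped pooling: window 3, stride 2 (stride < window).
overlapped = pool_windows(7, window=3, stride=2)
# Non-overlapped pooling: window 2, stride 2 (stride == window).
plain = pool_windows(6, window=2, stride=2)

print(overlapped)  # neighbouring windows share one index
print(plain)       # windows are disjoint
print(pool_output_size(55, 3, 2))  # map size after a 3x3, stride-2 maxpool
```

With a stride of 2 and a window of 3, every window shares its last index with the start of the next one, which is all that "overlapped" means here.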

Before AlexNet, the most commonly used activation functions were sigmoid and
tanh. Because these functions saturate, they suffer from the Vanishing Gradient
(VG) problem, which makes the network difficult to train. AlexNet uses the ReLU
activation function, which does not saturate for positive inputs and is
therefore much less affected by the VG problem. The original paper showed that
a four-layer convolutional network with ReLU reached a 25% training error rate
on CIFAR-10 about 6 times faster than the same network with tanh non-linearity.
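The saturation argument can be checked numerically. The sketch below is only an illustration using the standard derivative formulas; the function names are my own, and the ReLU derivative at exactly zero is set to 0 by convention.

```python
import math


def sigmoid_grad(x):
    """Derivative of the sigmoid: s(x) * (1 - s(x))."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2


def relu_grad(x):
    """Derivative of ReLU: 1 for positive inputs, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


# The sigmoid and tanh derivatives shrink towards zero as the input grows,
# while the ReLU derivative stays at 1 for any positive input.
for x in (0.0, 2.0, 5.0, 10.0):
    print(x, sigmoid_grad(x), tanh_grad(x), relu_grad(x))
```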

Although ReLU helps with the vanishing gradient problem, due to its
unbounded nature, the activations can become unnecessarily large. To limit
this, AlexNet introduced Local Response Normalization (LRN). The idea behind
LRN is to normalize each activation by the activity in its neighborhood (in
AlexNet, adjacent channels at the same spatial position), amplifying the
excited neuron while dampening the surrounding neurons at the same time.
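As a rough sketch of LRN, the function below normalizes the activations at one spatial position across adjacent channels. The function name and the toy input are my own, and the defaults (n=5, k=2, alpha=1e-4, beta=0.75) are the constants I recall from the AlexNet paper, so treat them as assumptions and check them against the paper before relying on them.

```python
def local_response_norm(channels, n=5, k=2.0, alpha=1e-4, beta=0.75):
    """Normalize one spatial position across neighbouring channels.

    `channels` holds one activation per channel at a single (x, y) location.
    Each activation is divided by (k + alpha * sum of squares)^beta, where
    the sum runs over up to `n` adjacent channels centred on it.
    """
    total = len(channels)
    half = n // 2
    out = []
    for i, a in enumerate(channels):
        lo = max(0, i - half)
        hi = min(total - 1, i + half)
        energy = sum(channels[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * energy) ** beta)
    return out


# Channel 1 is strongly excited; its neighbours are damped more than
# channels that sit further away from it.
acts = [1.0, 50.0, 1.0, 1.0, 1.0, 1.0, 1.0]
normed = local_response_norm(acts)
print(normed)
```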

AlexNet also addresses the over-fitting problem by using drop-out layers, where
the output of each neuron is set to zero during training with a probability of
p=0.5. Although this keeps the network from over-fitting by helping it escape
from bad local minima, it also roughly doubles the number of iterations
required for convergence.
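A minimal drop-out sketch is shown below. Note that it uses the "inverted" variant, which rescales the surviving activations during training; this is a swapped-in but common formulation, whereas the AlexNet paper instead scales the outputs by 0.5 at test time. The function name and the `rng` parameter are my own.

```python
import random


def dropout(activations, p=0.5, training=True, rng=random):
    """Inverted drop-out: zero each activation with probability p in training.

    Survivors are divided by (1 - p) so the expected value is unchanged,
    which lets the layer act as a plain pass-through at test time.
    """
    if not training:
        return list(activations)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


rng = random.Random(0)  # fixed seed so the run is repeatable
hidden = [1.0] * 10000
dropped = dropout(hidden, p=0.5, rng=rng)

print(sum(1 for v in dropped if v == 0.0) / len(dropped))  # about half are zeroed
print(sum(dropped) / len(dropped))                         # mean stays close to 1
```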

VGGNet
When?

International Year of Family Farming and Crystallography

First Robotic Landing on Comet

Year of Robin Williams’ death

Why? VGGNet was born out of the need to reduce the number of parameters in the CONV
layers and improve training time.

What? There are multiple variants of VGGNet (VGG16, VGG19, etc.) which differ
only in the total number of layers in the network. The structural details of a VGG16
network are shown below.


VGG16 Block Diagram (source: neurohive.io)

VGG16 has a total of 138 million parameters. The important point to note here is
that all the conv kernels are of size 3x3 and maxpool kernels are of size 2x2 with a
stride of two.


How? The idea behind having fixed-size kernels is that all the variable-size
convolutional kernels used in AlexNet (11x11, 5x5, 3x3) can be replicated by
using multiple 3x3 kernels as building blocks. The replication is in terms of the
receptive field covered by the kernels.

Let’s consider the following example. Say we have an input layer of size 5x5x1.
Implementing a conv layer with a kernel size of 5x5 and stride one will result in an
output feature map of 1x1. The same output feature map can be obtained by
implementing two 3x3 conv layers with a stride of 1: the first reduces the 5x5
input to a 3x3 map, and the second reduces that map to 1x1.
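The arithmetic behind this example can be sketched in a few lines. The helpers below are hypothetical names for this illustration and assume no padding and a stride of one unless stated otherwise.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    """Output length of a conv layer along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def stacked_output_size(size, kernels):
    """Output length after applying stride-1 conv layers one after another."""
    for kernel in kernels:
        size = conv_output_size(size, kernel)
    return size


def receptive_field(kernels):
    """Receptive field of a stack of stride-1 conv layers."""
    field = 1
    for kernel in kernels:
        field += kernel - 1
    return field


print(conv_output_size(5, 5))          # one 5x5 conv on a 5x5 input
print(stacked_output_size(5, [3, 3]))  # two 3x3 convs on the same input
print(receptive_field([3, 3]))         # two 3x3 convs see a 5x5 region
```

Both paths end in a 1x1 map, and the stacked 3x3 layers cover the same 5x5 region of the input as the single 5x5 kernel.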


Now let’s look at the number of variables needed to be trained. For a 5x5 conv layer
filter, the number of variables is 25. On the other hand, two conv layers of kernel
size 3x3 have a total of 3x3x2=18 variables (a reduction of 28%).

Similarly, the effect of one 7x7 (11x11) conv layer can be achieved by implementing
three (five) 3x3 conv layers with a stride of one. This reduces the number of
trainable variables by 44.9% (62.8%). A reduced number of trainable variables
means faster learning and more robustness to over-fitting.
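The percentages above can be reproduced with a short calculation. As in the text, this sketch counts only the weights of a single-channel kernel and ignores biases and channel depth; the function names are made up for the illustration.

```python
def single_kernel_weights(size):
    """Weights in one size x size kernel (single channel, no bias)."""
    return size * size


def stacked_3x3_weights(layers):
    """Weights in a stack of 3x3 kernels (single channel, no bias)."""
    return 3 * 3 * layers


def reduction_percent(big_kernel, layers):
    """Relative saving from replacing one big kernel with stacked 3x3 kernels."""
    big = single_kernel_weights(big_kernel)
    small = stacked_3x3_weights(layers)
    return 100.0 * (big - small) / big


# (big kernel size, number of 3x3 layers with the same receptive field)
for big_kernel, layers in [(5, 2), (7, 3), (11, 5)]:
    print(big_kernel, layers, round(reduction_percent(big_kernel, layers), 1))
```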

ResNet
When?

Discovery of Gravitational Waves

International year of soil and light-based technologies

The Martian movie

Why? Neural Networks are notorious for not being able to find a simpler mapping
when it exists.

For example, say we have a fully connected multi-layer perceptron network and
we want to train it on a data-set where the input equals the output. The simplest
solution to this problem is for every hidden layer to have an identity weight
matrix and zero biases. But when such a network is trained using
back-propagation, a rather complex mapping is learned where the weights and
biases have a wide range of values.

Another example is adding more layers to an existing neural network. Say we
have a network f(x) that has achieved an accuracy of n% on a data-set. Adding
more layers to this network, giving g(f(x)), should yield an accuracy of at
least n%, i.e. in the worst case g(.) should be an identity mapping yielding the
same accuracy as that of f(x), if not more. But unfortunately, that is not the
case. Experiments have shown that the accuracy decreases when more layers are
added to the network.

The issues mentioned above are often attributed to the vanishing gradient
problem. As we make the CNN deeper, the derivative when back-propagating to
the initial layers becomes almost insignificant in value.
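A toy calculation shows how quickly the back-propagated signal can shrink. This sketch assumes every layer contributes the same derivative factor and ignores the weights entirely, so it is an illustration of the chain rule rather than a model of a real network.

```python
def gradient_scale(depth, local_derivative):
    """Product of `depth` identical per-layer derivative factors (chain rule)."""
    scale = 1.0
    for _ in range(depth):
        scale *= local_derivative
    return scale


# 0.25 is the largest value the derivative of the sigmoid can take.
for depth in (2, 10, 20, 50):
    print(depth, gradient_scale(depth, 0.25))
```

Even in this best case for the sigmoid, the factor reaching the first layer of a 20-layer stack is below one part in a trillion.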

ResNet addresses this problem by introducing two types of ‘shortcut connections’:
Identity shortcut and Projection shortcut.

What? There are multiple versions of ResNetXX architectures where ‘XX’ denotes
the number of layers. The most commonly used ones are ResNet50 and ResNet101.
Since the vanishing gradient problem was taken care of (more about it in the How
part), CNNs started to get deeper and deeper. Below we present the structural
details of ResNet18.


ResNet18 has around 11 million trainable parameters. It consists of CONV layers
with filters of size 3x3 (just like VGGNet). Only two pooling layers are used
throughout the network, one at the beginning and the other at the end. A
shortcut connection skips every two CONV layers. The solid arrows show identity
shortcuts where the dimensions of the input and output are the same, while the
dotted ones represent the projection connections where the dimensions differ.

How? As mentioned earlier, ResNet architecture makes use of shortcut connections
to solve the vanishing gradient problem. The basic building block of ResNet is a
Residual block that is repeated throughout the network.


Residual Block — Image is taken from the original paper

Instead of learning the mapping x → F(x), the network learns the mapping
x → F(x)+G(x). When the dimensions of the input x and output F(x) are the same, the
function G(x) = x is an identity function and the shortcut connection is called
an Identity connection. The identity mapping is then learned by zeroing out the
weights in the intermediate layers during training, since it is easier to zero out
the weights than to push them to one.
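The identity shortcut can be sketched with plain lists. This is a simplified stand-in, assuming F(x) is two fully connected layers with a ReLU in between; the real block uses CONV layers, batch normalization, and a final ReLU, all of which are left out here. The function names are my own.

```python
def matvec(weights, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def relu(vector):
    return [max(0.0, v) for v in vector]


def residual_block(x, w1, w2):
    """y = F(x) + x, where F is two linear layers with a ReLU in between."""
    fx = matvec(w2, relu(matvec(w1, x)))
    return [f + xi for f, xi in zip(fx, x)]


x = [1.0, -2.0, 3.0]
zeros = [[0.0] * 3 for _ in range(3)]

# With all weights at zero, F(x) = 0 and the block passes x through unchanged.
print(residual_block(x, zeros, zeros))
```

This is why an extra residual block can fall back to the identity: the optimizer only has to drive the weights of F towards zero.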

For the case when the dimensions of F(x) differ from x (due to a stride greater
than 1 in the CONV layers in between), the Projection connection is implemented
rather than the Identity connection. The function G(x) changes the dimensions of
input x to those of output F(x). Two kinds of mapping were considered in the
original paper.

Non-trainable Mapping (Padding): The input x is simply padded with zeros to
make the dimensions match those of F(x).

Trainable Mapping (Conv Layer): A 1x1 conv layer is used to map x to G(x). It can
be seen from the table above that across the network the spatial dimensions are
either kept the same or halved, and the depth is either kept the same or doubled,
so the product of Width and Depth after each conv layer remains the same, i.e.
3584. 1x1 conv layers halve the spatial dimensions by using a stride of 2 and
double the depth by using twice as many filters as the input depth. The
number of 1x1 filters is equal to the depth of F(x).
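The trainable projection can be illustrated with a tiny pure-Python 1x1 convolution. The tensor layout `x[h][w][c]`, the helper names, and the toy sizes are my own choices, and the stage sizes at the end are the (width, depth) pairs I assume for ResNet18 with a 224x224 input.

```python
def conv1x1(x, filters, stride=2):
    """Apply a 1x1 convolution to x[h][w][c] using filters[out][c]."""
    return [
        [
            [sum(w * v for w, v in zip(f, pixel)) for f in filters]
            for pixel in row[::stride]
        ]
        for row in x[::stride]
    ]


def shape(x):
    return (len(x), len(x[0]), len(x[0][0]))


height, width, depth = 4, 4, 2
x = [[[1.0] * depth for _ in range(width)] for _ in range(height)]
# Twice as many filters as input channels, so the depth doubles.
filters = [[0.5] * depth for _ in range(2 * depth)]

g = conv1x1(x, filters, stride=2)
print(shape(x), "->", shape(g))  # spatial size halved, depth doubled

# Assumed ResNet18 stage sizes as (width, depth); the product stays constant.
stages = [(56, 64), (28, 128), (14, 256), (7, 512)]
print([w * d for w, d in stages])
```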

Inception
When?

International Year of Family Farming and Crystallography

First Robotic Landing on Comet

Year of Robin Williams’ death

Why? In an image classification task, the size of the salient feature can vary
considerably within the image frame. Hence, deciding on a fixed kernel size is
rather difficult. Larger kernels are preferred for more global features that are
distributed over a large area of the image. Smaller kernels, on the other hand,
provide good results in detecting area-specific features that are distributed
across the image frame. For effective recognition of such variable-sized
features, we need kernels of different sizes. That is what Inception does. Instead
of simply going deeper in terms of the number of layers, it goes wider. Multiple
kernels of different sizes are implemented within the same layer.

What? The Inception network architecture consists of several inception modules of
the following structure.

Inception Module (source: original paper)

Each inception module consists of four operations in parallel:

1x1 conv layer

3x3 conv layer

5x5 conv layer

max pooling

The 1x1 conv blocks shown in yellow are used for depth reduction. The results from
the four parallel operations are then concatenated depth-wise to form the Filter
Concatenation block (in green). There are multiple versions of Inception, the
simplest one being GoogLeNet.
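To see what depth-wise concatenation and 1x1 depth reduction do to the numbers, here is a small sketch. The branch sizes are the ones I recall for the first inception module of GoogLeNet (input depth 192); treat them as illustrative assumptions rather than values taken from this article. Biases are ignored and `conv_weights` is a made-up helper.

```python
def conv_weights(kernel, in_depth, out_depth):
    """Weights in a conv layer, ignoring biases."""
    return kernel * kernel * in_depth * out_depth


in_depth = 192
# Output depth of each parallel branch (assumed values).
branches = {"1x1": 64, "3x3": 128, "5x5": 32, "pool_proj": 32}
# Depth after the 1x1 reduction placed before the 3x3 and 5x5 convs.
reduce_3x3, reduce_5x5 = 96, 16

# Depth-wise concatenation simply adds up the branch depths.
out_depth = sum(branches.values())

# Cost of the 3x3 and 5x5 branches without and with the 1x1 reduction.
direct = conv_weights(3, in_depth, branches["3x3"]) + conv_weights(
    5, in_depth, branches["5x5"]
)
reduced = (
    conv_weights(1, in_depth, reduce_3x3)
    + conv_weights(3, reduce_3x3, branches["3x3"])
    + conv_weights(1, in_depth, reduce_5x5)
    + conv_weights(5, reduce_5x5, branches["5x5"])
)

print(out_depth)
print(direct, reduced)
```

Under these assumptions the 1x1 reduction cuts the weights of the two larger branches to well under half, which is what makes running several kernel sizes side by side affordable.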


How? Inception increases the network space from which the best network is to be
chosen via training. Each inception module can capture salient features at different
levels. Global features are captured by the 5x5 conv layer, while the 3x3 conv layer is
prone to capturing distributed features. The max-pooling operation is responsible
for capturing low-level features that stand out in a neighborhood. At a given level,
all of these features are extracted and concatenated before they are fed to the next
layer. We leave it to the network and its training to decide which features hold the
most value and to weight them accordingly. For example, if the images in the data-set
are rich in global features without too many low-level features, then the trained
Inception network will have very small weights corresponding to the 3x3 conv kernel
as compared to the 5x5 conv kernel.


Summary
In the table below these four CNNs are sorted with respect to their top-5 accuracy
on the ImageNet dataset. The number of trainable parameters and the Floating Point
Operations (FLOP) required for a forward pass can also be seen.

Several comparisons can be drawn:

AlexNet and ResNet-152 both have about 60M parameters, but there is about a
10% difference in their top-5 accuracy. Training a ResNet-152, however, requires
a lot more computation (about 10 times more than AlexNet), which means more
training time and energy.

VGGNet not only has a higher number of parameters and FLOP than ResNet-152
but also a lower accuracy. It takes more time to train a VGGNet, and the result
is less accurate.

Training an AlexNet takes about the same time as training Inception. Inception's
memory requirements are about 10 times lower, and its accuracy is about 9% higher.

Bonus:
Compact cheat sheets for this topic and many other important topics in Machine
Learning can be found in the link below

Cheat Sheets for Machine Learning Interview Topics

A visual cheatsheet for ML interviews (www.cheatsheets.aqeel-anwar.com)


If this article was helpful to you, feel free to clap, share and respond to it. If you
want to learn more about Machine Learning and Data Science, follow me @Aqeel Anwar or
connect with me on LinkedIn.

