
Evolution of CNN Architecture

Introduction
• A Convolutional Neural Network (CNN, or ConvNet) is a
special kind of multi-layer neural network, designed to
recognize visual patterns directly from pixel images with
minimal preprocessing.
• The ImageNet project is a large visual database designed
for use in visual object recognition software research.
• The ImageNet project runs an annual software contest, the
ImageNet Large Scale Visual Recognition Challenge
(ILSVRC), where software programs compete to correctly
classify and detect objects and scenes.
• Training set of 1.2M labelled images from 1000
categories (732 to 1300 training samples per class)
LeNet in 1998
• LeNet is a 7-level convolutional network proposed by
LeCun in 1998 that classifies digits; it was used by
several banks to recognize hand-written numbers on
cheques, digitized as 32x32 pixel greyscale input images.
• Processing higher-resolution images requires larger and
more numerous convolutional layers, so the technique is
constrained by the availability of computing resources.
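
To make the structure concrete, here is a minimal PyTorch sketch of a LeNet-5-style network for 32x32 greyscale inputs. Treat it as an approximation: the 1998 original used sparse connectivity between feature maps and an RBF output layer, replaced here with ordinary dense layers.

```python
# A minimal LeNet-5-style network (a sketch, not the exact 1998 model).
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

# 32x32 greyscale input, as described above
print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)  # torch.Size([1, 10])
```

Counting the two conv layers, two pooling layers, and three fully connected layers gives the 7 levels mentioned above.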
AlexNet in 2012
• 8-layer CNN: 5 Conv layers, 3 FC layers
• 227×227 input
• Max pooling, ReLU nonlinearity
• Trained on two GTX 580 GPUs
Structure of AlexNet
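
The original structure figure is not reproduced here; as a stand-in, below is a minimal single-GPU PyTorch sketch of the AlexNet layer layout. The original model was split across the two GPUs and also used dropout and local response normalization, both omitted here.

```python
# A minimal single-GPU AlexNet sketch: 5 conv layers + 3 FC layers.
import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, 11, stride=4), nn.ReLU(),    # 227 -> 55
            nn.MaxPool2d(3, stride=2),                    # 55 -> 27
            nn.Conv2d(96, 256, 5, padding=2), nn.ReLU(),  # 27 -> 27
            nn.MaxPool2d(3, stride=2),                    # 27 -> 13
            nn.Conv2d(256, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, 3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),                    # 13 -> 6
        )
        self.classifier = nn.Sequential(
            nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(torch.flatten(self.features(x), 1))

print(AlexNet()(torch.randn(1, 3, 227, 227)).shape)  # torch.Size([1, 1000])
```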
ZFNet in 2013
• ZFNet is a modified version of AlexNet which gives
better accuracy.
• One major difference between the approaches was that
ZFNet used 7x7 filters in the first convolutional layer,
whereas AlexNet used 11x11 filters.
• Bigger filters risk losing a lot of pixel information,
which can be retained by using smaller filter sizes in
the earlier conv layers.
• The number of filters increases as we go deeper.
• This network also used ReLUs for its activations and
was trained using batch stochastic gradient descent.
Architecture of ZFNet
• A 224 by 224 crop of an image is presented as the input
• This is convolved with 96 different 1st layer filters, each of
size 7 by 7 using a stride of 2 in both x and y.
• The resulting feature maps are then:
– Passed through a rectified linear (ReLU) function
– Pooled (max, within 3x3 regions, using stride 2)
– Contrast normalized across feature maps to give 96 different 55
by 55 element feature maps.
• Similar operations are repeated in layers 2, 3, 4, and 5.
• The last two layers are fully connected, taking features from
the top convolutional layer as input.
• The final layer is a C-way softmax function, C being
the number of classes.
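
A quick PyTorch sketch can verify the layer-1 sizes described above. The padding value here is an assumption chosen to reproduce the 110x110 convolution output and 55x55 pooled maps (the slide does not state it), and the contrast normalization step is omitted.

```python
# Checking ZFNet's layer-1 output sizes: 7x7 conv, stride 2, then
# 3x3 max pooling with stride 2, giving 96 feature maps of 55x55.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 224, 224)  # 224x224 crop of an image
conv1 = nn.Conv2d(3, 96, kernel_size=7, stride=2, padding=1)  # -> 110x110
pool1 = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)      # -> 55x55
relu = nn.ReLU()

y = pool1(relu(conv1(x)))
print(y.shape)  # torch.Size([1, 96, 55, 55])
```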
VGG in 2014
• VGG Net used 3x3 filters compared to 11x11
filters in AlexNet and 7x7 in ZFNet.
• Two consecutive 3x3 filters give an effective receptive
field of 5x5, and three 3x3 filters give an effective
receptive field of 7x7 (see the sketch after this list).
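
A tiny sketch of the receptive-field arithmetic behind this claim: each additional 3x3, stride-1 convolution grows the receptive field by kernel_size - 1 = 2 pixels.

```python
# Receptive field of a stack of n 3x3 (stride-1) convolutions: 2n + 1.
def stacked_3x3_receptive_field(n):
    rf = 1
    for _ in range(n):
        rf += 2  # each 3x3 layer adds kernel_size - 1 = 2 pixels
    return rf

print(stacked_3x3_receptive_field(2))  # 5 -> two 3x3 layers ~ one 5x5 filter
print(stacked_3x3_receptive_field(3))  # 7 -> three 3x3 layers ~ one 7x7 filter
```

The stacked 3x3 layers cover the same window with fewer parameters and more nonlinearities than a single large filter.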
GoogLeNet in 2014
• GoogLeNet proposed a module called the inception
module, which runs convolutions of several filter sizes
in parallel and concatenates their outputs into a mini
network; this module is repeated throughout the network.
• GoogLeNet uses 9 inception modules and eliminates all
fully connected layers by using average pooling to go
from 7x7x1024 to 1x1x1024. This saves a lot of
parameters (a sketch of both ideas follows below).
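
Below is a minimal PyTorch sketch of an inception module, followed by the average-pooling step that replaces the fully connected layers. The branch widths used in the example match those commonly quoted for GoogLeNet's inception (3a) block, but treat them as illustrative.

```python
# A minimal GoogLeNet-style inception module: four parallel branches
# whose outputs are concatenated along the channel dimension.
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch, c1, c3_red, c3, c5_red, c5, pool_proj):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.ReLU())
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, c3_red, 1), nn.ReLU(),
                                nn.Conv2d(c3_red, c3, 3, padding=1), nn.ReLU())
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, c5_red, 1), nn.ReLU(),
                                nn.Conv2d(c5_red, c5, 5, padding=2), nn.ReLU())
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),
                                nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

m = Inception(192, 64, 96, 128, 16, 32, 32)  # inception (3a)-style widths
print(m(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])

# Average pooling 7x7x1024 -> 1x1x1024, replacing fully connected layers
gap = nn.AvgPool2d(7)
print(gap(torch.randn(1, 1024, 7, 7)).shape)  # torch.Size([1, 1024, 1, 1])
```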
ResNet in 2015
• There are 152 layers in the Microsoft ResNet.
• In principle, increasing the number of layers should
keep decreasing the error rate; ResNet makes very deep
networks trainable in practice by adding identity skip
(residual) connections, as the sketch below shows.
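
A minimal PyTorch sketch of a basic residual block: the skip connection adds the block's input back to its output, so an extra block can always fall back to the identity mapping instead of degrading accuracy. (The 152-layer ResNet actually stacks deeper "bottleneck" blocks, but the skip-connection idea is the same.)

```python
# A basic ResNet residual block with an identity skip connection.
import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)  # add the input back: identity skip

blk = BasicBlock(64)
print(blk(torch.randn(1, 64, 56, 56)).shape)  # torch.Size([1, 64, 56, 56])
```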
Comparison
