Convolutional Neural Networks in Python with Keras
In this tutorial, you’ll learn how to implement Convolutional Neural Networks
(CNNs) in Python with Keras, and how to overcome overfitting with dropout.
Dec 5, 2017 · 15 min read
Aditya Sharma
Topics: Python, Artificial Intelligence, Machine Learning, Deep Learning
You might have already heard of image or facial recognition or self-driving cars. These are
real-life implementations of Convolutional Neural Networks (CNNs). In this blog post, you will
learn and understand how to implement these deep, feed-forward artificial neural networks
in Keras and also learn how to overcome overfitting with the regularization technique called
"dropout".
More precisely, you'll go through the following steps:
- You'll first try to understand the data. You'll use Python and its libraries to load, explore and analyze your data;
- After that, you'll preprocess your data: you'll learn how to resize and rescale the images, convert your labels into one-hot encoding vectors and split up your data into training and validation sets;
- With all of this done, you can construct the neural network model: you'll learn how to model the data and form the network. Next, you'll compile, train and evaluate the model, visualizing the accuracy and loss plots;
- Then, you will learn about the concept of overfitting and how you can overcome it by adding a dropout layer;
- With this information, you can revisit your original model and re-train it. You'll also re-evaluate your new model and compare the results of both models;
- Next, you'll make predictions on the test data, convert the probabilities into class labels and plot a few test samples that your model classified correctly and incorrectly;
- Finally, you will visualize the classification report, which will give you more in-depth intuition about which classes were (in)correctly classified by your model.
Would you like to take a course on Keras and deep learning in Python? Consider taking
DataCamp's Deep Learning in Python course!
Also, don't miss our Keras cheat sheet, which shows you the six steps that you need to go
through to build neural networks in Python with code examples!
A specific kind of such a deep neural network is the convolutional network, which is
commonly referred to as CNN or ConvNet. It's a deep, feed-forward artificial neural network.
Remember that feed-forward neural networks are also called multi-layer perceptrons (MLPs),
which are the quintessential deep learning models. The models are called "feed-forward"
because information flows right through the model. There are no feedback connections in
which outputs of the model are fed back into itself.
CNNs specifically are inspired by the biological visual cortex. The cortex has small regions
of cells that are sensitive to the specific areas of the visual field. This idea was expanded by
a captivating experiment done by Hubel and Wiesel in 1962 (if you want to know more,
here's a video). In this experiment, the researchers showed that some individual neurons in
the brain activated or fired only in the presence of edges of a particular orientation, like
vertical or horizontal edges. For example, some neurons fired when exposed to vertical edges
and some when shown a horizontal edge. Hubel and Wiesel found that all of these neurons
were well ordered in a columnar fashion and that together they were able to produce visual
perception. This idea of specialized components inside of a system having specific tasks is
one that machines use as well and one that you can also find back in CNNs.
Convolutional neural networks have been one of the most influential innovations in the field
of computer vision. They have performed a lot better than traditional computer vision and
have produced state-of-the-art results. These neural networks have proven to be successful
in many different real-life case studies and applications, such as the image recognition, facial recognition and self-driving car examples mentioned above.
To understand this success, you'll have to go back to 2012, the year in which Alex Krizhevsky
used convolutional neural networks to win that year's ImageNet Competition, reducing the
classification error from 26% to 15%.
Note that the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which began in
2010, is an annual competition where research teams assess their algorithms on a given
data set and compete to achieve higher accuracy on several visual recognition tasks.
This was the time when neural networks regained prominence after quite some time. This is
often called the "third wave of neural networks". The other two waves were in the 1940s until
the 1960s and in the 1970s to 1980s.
Alright, you know that you'll be working with feed-forward networks that are inspired by the
biological visual cortex, but what does that actually mean?
1. The convolution layer computes the output of neurons that are connected to local
regions or receptive fields in the input, each computing a dot product between their
weights and the small receptive field to which they are connected in the input volume.
Each computation leads to the extraction of a feature map from the input image. In other
words, imagine you have an image represented as a 5x5 matrix of values, and you take
a 3x3 matrix and slide that 3x3 window or kernel around the image. At each position of
that window, you multiply the values of your 3x3 window by the values in the image that
are currently being covered by the window and sum them up. As a result, you'll get a single
number that represents all the values in that window of the image (see the sketch after this
list). You use this layer for filtering: as the window moves over the image, you check for
patterns in that section of the image. This works because of filters: the window of weights
acts as a filter that responds strongly to the pattern it encodes.
2. The pooling layer then downsamples each feature map, for example by keeping only the
maximum value in every 2 x 2 window (max-pooling), which reduces the spatial dimensions
while retaining the most salient information.
3. The objective of the fully connected layer is to flatten the high-level features that are
learned by convolutional layers and to combine all the features. It passes the flattened
output to the output layer where you use a softmax classifier or a sigmoid to predict the
input class label.
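To make the sliding-window idea from the first point concrete, here is a minimal NumPy sketch (not part of the model you will build below, just an illustration with a made-up 5 x 5 image and a 3 x 3 kernel):

import numpy as np

def convolve2d(image, kernel):
    # slide the kernel over the image; each position yields one number
    # (technically a cross-correlation, which is what CNN layers compute)
    ih, iw = image.shape
    kh, kw = kernel.shape
    feature_map = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(feature_map.shape[0]):
        for j in range(feature_map.shape[1]):
            window = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(window * kernel)
    return feature_map

image = np.arange(25).reshape(5, 5)            # a toy 5 x 5 "image"
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])                # a simple vertical-edge filter
print(convolve2d(image, kernel))               # 3 x 3 feature map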
Fashion-MNIST is similar to the MNIST dataset that you might already know, which you use
to classify handwritten digits. That means that the image dimensions, training and test splits
are similar to the MNIST dataset. Tip: if you want to learn how to implement a Multi-Layer
Perceptron (MLP) for classification tasks with this latter dataset, go to this deep learning
with Keras tutorial.
You can find the Fashion-MNIST dataset here, but you can also load it with the help of
specific TensorFlow and Keras modules. You'll see how this works in the next section!
You have probably done this a million times by now, but it's always an essential step to get
started. Now you're completely set to start analyzing, processing and modeling your data!
import numpy as np
from keras.utils import to_categorical
import matplotlib.pyplot as plt
%matplotlib inline
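Next, load the Fashion-MNIST data. A minimal sketch using the Keras datasets module (the variable names train_X, train_Y, test_X and test_Y are chosen to match the code in the rest of this tutorial):

from keras.datasets import fashion_mnist

# download (on first use) and load the train and test splits
(train_X, train_Y), (test_X, test_Y) = fashion_mnist.load_data()

print('Training data shape : ', train_X.shape, train_Y.shape)
print('Testing data shape : ', test_X.shape, test_Y.shape)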
Also, don't forget to take a look at the images in your dataset:
plt.figure(figsize=[5,5])
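# A sketch of the rest of the plotting code (assuming the data was loaded as above):
# show the first image of the training set and the first image of the test set
# side by side, with their ground-truth labels as titles.

# first image of the training set
plt.subplot(121)
plt.imshow(train_X[0,:,:], cmap='gray')
plt.title("Ground Truth : {}".format(train_Y[0]))

# first image of the test set
plt.subplot(122)
plt.imshow(test_X[0,:,:], cmap='gray')
plt.title("Ground Truth : {}".format(test_Y[0]))
plt.show()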
The output of the above two plots looks like an ankle boot, and this class is assigned a class
label of 9. Similarly, other fashion products will have different labels, but similar products will
have the same labels. This means that all the 7,000 ankle boot images will have a class label of
9.
Data Preprocessing
As you could see in the above plot, the images are grayscale images with pixel values that
range from 0 to 255. Also, these images have a dimension of 28 x 28. As a result, you'll need
to preprocess the data before you feed it into the model.
As a first step, convert each 28 x 28 image of the train and test set into a matrix of size
28 x 28 x 1 which is fed into the network.
train_X = train_X.reshape(-1, 28,28, 1)
test_X = test_X.reshape(-1, 28,28, 1)
train_X.shape, test_X.shape
The data right now is in a uint8 format, so before you feed it into the network you need
to convert its type to float32, and you also have to rescale the pixel values to the range 0 - 1
inclusive. So let's do that!
train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X = train_X / 255.
test_X = test_X / 255.
Now you need to convert the class labels into a one-hot encoding vector.
In one-hot encoding, you convert the categorical data into a vector of numbers. The reason
why you convert the categorical data in one hot encoding is that machine learning
algorithms cannot work with categorical data directly. You generate one boolean column for
each category or class. Only one of these columns could take on the value 1 for each
sample. Hence, the term one-hot encoding.
For your problem statement, the one-hot encoding will be a row vector, and for each image,
it will have a dimension of 1 x 10. The important thing to note here is that the vector consists
of all zeros except for the class that it represents, where it is 1. For example, the ankle
boot image that you plotted above has a label of 9, so for all the ankle boot images, the one-hot
encoding vector would be [0 0 0 0 0 0 0 0 0 1] .
So let's convert the training and testing labels into one-hot encoding vectors:
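A sketch of the conversion with the to_categorical helper imported earlier (variable names chosen to match the rest of the tutorial):

# Change the labels from integers to one-hot encoded vectors
train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)

# Display the change for one category label
print('Original label:', train_Y[0])
print('After conversion to one-hot:', train_Y_one_hot[0])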
('Original label:', 9)
('After conversion to one-hot:', array([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]))
That's pretty clear, right? Note that you can also print the train_Y_one_hot , which will
display a matrix of size 60000 x 10 in which each row depicts one-hot encoding of an
image.
This last step is a crucial one. In machine learning or any data-specific task, you should
partition the data correctly. For the model to generalize well, you split the training data
into two parts, one designed for training and another one for validation. In this case, you
will train the model on 80% of the training data and validate it on the remaining 20% of the
training data. This will also help to reduce overfitting, since you will be validating the model
on data it has not seen in the training phase, which will help in boosting the test performance.
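One way to do the split is with scikit-learn's train_test_split; a sketch (the random_state value is arbitrary, it just makes the split reproducible):

from sklearn.model_selection import train_test_split

# hold out 20% of the training data as a validation set
train_X, valid_X, train_label, valid_label = train_test_split(train_X,
                                                              train_Y_one_hot,
                                                              test_size=0.2,
                                                              random_state=13)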
One last time, let's check the shape of the training and validation sets.
train_X.shape,valid_X.shape,train_label.shape,valid_label.shape
((48000, 28, 28, 1), (12000, 28, 28, 1), (48000, 10), (12000, 10))
The Network
The images are of size 28 x 28. You convert the image matrix to an array, rescale it between
0 and 1, reshape it so that it's of size 28 x 28 x 1, and feed this as an input to the network.
import keras
from keras.models import Sequential,Input,Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
You will use a batch size of 64; a higher batch size of 128 or 256 is also fine, depending on
the available memory. The batch size contributes massively to determining the learning parameters
and affects the prediction accuracy. You will train the network for 20 epochs.
batch_size = 64
epochs = 20
num_classes = 10
More specifically, you add Leaky ReLUs because they attempt to fix the problem of dying
Rectified Linear Units (ReLUs). The ReLU activation function is used a lot in neural network
architectures and more specifically in convolutional networks, where it has proven to be
more effective than the widely used logistic sigmoid function. As of 2017, this activation
function is the most popular one for deep neural networks. The ReLU function allows the
activation to be thresholded at zero. However, during the training, ReLU units can "die". This
can happen when a large gradient flows through a ReLU neuron: it can cause the weights to
update in such a way that the neuron will never activate on any data point again. If this
happens, then the gradient flowing through the unit will forever be zero from that point on.
Leaky ReLUs attempt to solve this: instead of being zero for negative inputs, the function has a
small slope there (0.1 in the model below).
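As a quick illustration of the difference between the two activations (a NumPy sketch, not part of the network code):

import numpy as np

def relu(x):
    return np.maximum(0.0, x)             # zero for all negative inputs

def leaky_relu(x, alpha=0.1):
    return np.where(x > 0, x, alpha * x)  # small slope for negative inputs

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))        # [ 0.    0.    0.    1.5 ]
print(leaky_relu(x))  # [-0.2  -0.05  0.    1.5 ]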
Next, you'll add the max-pooling layer with MaxPooling2D() and so on. The last layer is a
Dense layer that has a softmax activation function with 10 units, which is needed for this
multi-class classification problem.
fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',input_shape=(28,28,1),padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))
fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(Dense(num_classes, activation='softmax'))
Let's visualize the layers that you created in the above step by using the summary function.
This will show the number of parameters (weights and biases) in each layer and also the total
number of parameters in your model.
fashion_model.summary()
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_51 (Conv2D) (None, 28, 28, 32) 320
_________________________________________________________________
leaky_re_lu_57 (LeakyReLU) (None, 28, 28, 32) 0
_________________________________________________________________
max_pooling2d_49 (MaxPooling (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_52 (Conv2D) (None, 14, 14, 64) 18496
_________________________________________________________________
leaky_re_lu_58 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 7, 7, 64) 0
_________________________________________________________________
conv2d_53 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
leaky_re_lu_59 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 128) 0
_________________________________________________________________
flatten_17 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_33 (Dense) (None, 128) 262272
_________________________________________________________________
leaky_re_lu_60 (LeakyReLU) (None, 128) 0
_________________________________________________________________
dense_34 (Dense) (None, 10) 1290
=================================================================
Total params: 356,234
Trainable params: 356,234
Non-trainable params: 0
_________________________________________________________________
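Next, compile the model and train it for 20 epochs. A sketch of the two calls (the Adam optimizer and categorical cross-entropy loss are assumptions, consistent with the compile call shown later for the dropout model):

fashion_model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer=keras.optimizers.Adam(),
                      metrics=['accuracy'])

# train for 20 epochs, validating on the held-out 20% after every epoch
fashion_train = fashion_model.fit(train_X, train_label,
                                  batch_size=batch_size,
                                  epochs=epochs,
                                  verbose=1,
                                  validation_data=(valid_X, valid_label))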
Finally! You trained the model on fashion-MNIST for 20 epochs, and by observing the
training accuracy and loss, you can say that the model did a good job since after 20 epochs
the training accuracy is 99% and the training loss is quite low.
However, it looks like the model is overfitting, as the validation loss is 0.4396 and the
validation accuracy is 92%. Overfitting gives an intuition that the network has memorized
the training data very well but is not guaranteed to work on unseen data, and that is why
there is a difference in the training and validation accuracy.
You probably need to handle this. In the next sections, you'll learn how you can make your
model perform much better by adding a Dropout layer into the network and keeping all the
other layers unchanged.
But first, let's evaluate the performance of your model on the test set before you come to
a conclusion.
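A sketch of the evaluation on the test set (test_Y_one_hot is the one-hot encoded test labels created earlier):

test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=0)
print('Test loss:', test_eval[0])
print('Test accuracy:', test_eval[1])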
The test accuracy looks impressive. It turns out that your classifier does better than the
benchmark that was reported here, which is an SVM classifier with mean accuracy of 0.897.
Also, the model does well compared to some of the deep learning models mentioned on the
GitHub profile of the creators of fashion-MNIST dataset.
However, you saw that the model looked like it was overfitting. Are these results really all
that good?
Let's put your model evaluation into perspective and plot the accuracy and loss plots
between training and validation data:
accuracy = fashion_train.history['acc']
val_accuracy = fashion_train.history['val_acc']
loss = fashion_train.history['loss']
val_loss = fashion_train.history['val_loss']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
The validation loss shows a clear sign of overfitting: similar to the validation accuracy, it
improved for the first 4-5 epochs, but after that it started to increase. This means that the
model tried to memorize the training data and succeeded.
With this in mind, it's time to introduce some dropout into our model and see if it helps in
reducing overfitting.
So let's create, compile and train the network again but this time with dropout. And run it for
20 epochs with a batch size of 64.
batch_size = 64
epochs = 20
num_classes = 10
fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',padding='same',input_shape=(28,28,1)))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))
fashion_model.add(Dropout(0.25))
fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.25))
fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.4))
fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(Dropout(0.3))
fashion_model.add(Dense(num_classes, activation='softmax'))
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_54 (Conv2D) (None, 28, 28, 32) 320
_________________________________________________________________
leaky_re_lu_61 (LeakyReLU) (None, 28, 28, 32) 0
_________________________________________________________________
max_pooling2d_52 (MaxPooling (None, 14, 14, 32) 0
_________________________________________________________________
dropout_29 (Dropout) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_55 (Conv2D) (None, 14, 14, 64) 18496
_________________________________________________________________
leaky_re_lu_62 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
max_pooling2d_53 (MaxPooling (None, 7, 7, 64) 0
_________________________________________________________________
dropout_30 (Dropout) (None, 7, 7, 64) 0
_________________________________________________________________
conv2d_56 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
leaky_re_lu_63 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
max_pooling2d_54 (MaxPooling (None, 4, 4, 128) 0
_________________________________________________________________
dropout_31 (Dropout) (None, 4, 4, 128) 0
_________________________________________________________________
flatten_18 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_35 (Dense) (None, 128) 262272
_________________________________________________________________
leaky_re_lu_64 (LeakyReLU) (None, 128) 0
_________________________________________________________________
dropout_32 (Dropout) (None, 128) 0
_________________________________________________________________
dense_36 (Dense) (None, 10) 1290
=================================================================
Total params: 356,234
Trainable params: 356,234
Non-trainable params: 0
_________________________________________________________________
fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
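Then train the regularized network; a sketch (fashion_train_dropout is the history object used for the plots further below):

fashion_train_dropout = fashion_model.fit(train_X, train_label,
                                          batch_size=batch_size,
                                          epochs=epochs,
                                          verbose=1,
                                          validation_data=(valid_X, valid_label))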
Let's save the model so that you can directly load it and not have to train it again for 20
epochs. This way, you can load the model later on if you need it and modify the
architecture; alternatively, you can start the training process on this saved model. It is
always a good idea to save the model -and even the model's weights!- because it saves you
time. Note that you can also save the model after every epoch so that, if some issue occurs
that stops the training at an epoch, you will not have to start the training from the
beginning.
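For instance, saving a checkpoint after every epoch can be done with the ModelCheckpoint callback (a sketch; the filename pattern is an arbitrary choice):

from keras.callbacks import ModelCheckpoint

# write a checkpoint file at the end of every epoch
checkpointer = ModelCheckpoint(filepath='fashion_model_epoch_{epoch:02d}.h5py',
                               verbose=1)
# pass it to fit via the callbacks argument, e.g.:
# fashion_model.fit(..., callbacks=[checkpointer])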
fashion_model.save("fashion_model_dropout.h5py")
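Now evaluate the retrained model on the test set, exactly as before (a sketch):

test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=1)
print('Test loss:', test_eval[0])
print('Test accuracy:', test_eval[1])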
Wow! It looks like adding Dropout to our model worked: even though the test accuracy did not
improve significantly, the test loss decreased compared to the previous results.
Now, let's plot the accuracy and loss plots between training and validation data one last time.
accuracy = fashion_train_dropout.history['acc']
val_accuracy = fashion_train_dropout.history['val_acc']
loss = fashion_train_dropout.history['loss']
val_loss = fashion_train_dropout.history['val_loss']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
Therefore, you can say that your model's generalization capability became much better,
since the loss on both the test set and the validation set was only slightly higher than the
training loss.
Predict Labels
predicted_classes = fashion_model.predict(test_X)
Since the predictions you get are floating point values (class probabilities), it is not feasible
to compare them directly with the true test labels. So, you will round off the output, which
converts the float values into integers. Further, you will use np.argmax() to select the index
with the highest value in each row.
For example, if the prediction for one test image is 0 1 0 0 0 0 0 0 0 0 , the
output should be the class label 1 .
predicted_classes = np.argmax(np.round(predicted_classes),axis=1)
predicted_classes.shape, test_Y.shape
((10000,), (10000,))
correct = np.where(predicted_classes==test_Y)[0]
print("Found %d correct labels" % len(correct))
for i, correct in enumerate(correct[:9]):
    plt.subplot(3,3,i+1)
    plt.imshow(test_X[correct].reshape(28,28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[correct], test_Y[correct]))
    plt.tight_layout()
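The same kind of plot for the incorrectly classified samples is a near-mirror of the block above (a sketch):

incorrect = np.where(predicted_classes != test_Y)[0]
print("Found %d incorrect labels" % len(incorrect))
for i, incorrect in enumerate(incorrect[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(test_X[incorrect].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], test_Y[incorrect]))
    plt.tight_layout()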
By looking at a few images, you cannot be sure why your model is unable to classify the
above images correctly, but it seems that similar patterns shared across multiple classes
affect the performance of the classifier, even though the CNN is a robust architecture. For
example, images 5 and 6 belong to different classes but look kind of similar, maybe a jacket
or perhaps a long-sleeve shirt.
Classification Report
The classification report will help us identify the misclassified classes in more detail. You
will be able to observe which of the ten classes the model performed poorly on.
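A sketch using scikit-learn's classification_report (the generic "Class i" names are placeholders; you could substitute the actual Fashion-MNIST label names):

from sklearn.metrics import classification_report

target_names = ["Class {}".format(i) for i in range(num_classes)]
print(classification_report(test_Y, predicted_classes, target_names=target_names))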
You can see that the classifier is underperforming for class 6 regarding both precision and
recall. For class 0 and class 2, the classifier is lacking precision. Also, for class 4, the
classifier is slightly lacking both precision and recall.
Go Further!
This tutorial was a good start to convolutional neural networks in Python with Keras. If you
were able to follow along easily, or even with a little more effort, well done! Try doing some
experiments, maybe with the same model architecture but using different public datasets.
There is still a lot to cover, so why not take DataCamp's Deep Learning in Python course? In
the meantime, also make sure to check out the Keras documentation, if you haven't done so
already. You will find more examples and information on all functions, arguments, more
layers, etc. It will undoubtedly be an indispensable resource when you're learning how to
work with neural networks in Python!
If you would rather read a book that explains the fundamentals of deep learning (with
Keras) together with how it's used in practice, you should definitely read François Chollet's
Deep Learning with Python book.