0% found this document useful (0 votes)
4 views

Convolutional Neural Networks in Python _ DataCamp

This tutorial teaches how to implement Convolutional Neural Networks (CNNs) in Python using Keras, focusing on data preprocessing, model construction, and techniques to prevent overfitting such as dropout. It covers the Fashion-MNIST dataset, including loading, analyzing, and preparing the data for training and validation. The tutorial also discusses the architecture of CNNs and their applications in real-world scenarios like image classification and self-driving cars.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
4 views

Convolutional Neural Networks in Python _ DataCamp

This tutorial teaches how to implement Convolutional Neural Networks (CNNs) in Python using Keras, focusing on data preprocessing, model construction, and techniques to prevent overfitting such as dropout. It covers the Fashion-MNIST dataset, including loading, analyzing, and preparing the data for training and validation. The tutorial also discusses the architecture of CNNs and their applications in real-world scenarios like image classification and self-driving cars.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 22

Home Tutorials Python

Convolutional Neural
Networks in Python with Keras
In this tutorial, you’ll learn how to implement Convolutional Neural Networks
(CNNs) in Python with Keras, and how to overcome overfitting with dropout.
Dec 5, 2017 · 15 min read

Aditya Sharma

TO P I C S

Python

Artificial Intelligence

Machine Learning

Deep Learning

You might have already heard of image or facial recognition or self-driving cars. These are
real-life implementations of Convolutional Neural Networks (CNNs). In this blog post, you will
learn and understand how to implement these deep, feed-forward artificial neural networks
in Keras and also learn how to overcome overfitting with the regularization technique called
"dropout".

More specifically, you'll tackle the following topics in today's tutorial:

You will be introduced to convolutional neural networks;

Then, you'll first try to understand the data. You'll use Python and its libraries to load,
explore and analyze your data,

After that, you'll preprocess your data: you'll learn how to resize, rescale, convert your
labels into one-hot encoding vectors and split up your data in training and validation
sets;

With all of this done, you can construct the neural network model: you'll learn how to
model the data and form the network. Next, you'll compile, train and evaluate the
model, visualizing the accuracy and loss plots;

Then, you will learn about the concept of overfitting and how you can overcome it by
adding a dropout layer;

With this information, you can revisit your original model and re-train the model. You'll
also re-evaluate your new model and compare the results of both the models;

Next, you'll make predictions on the test data, convert the probabilities into class labels
and plot few test samples that your model correctly classified and incorrectly classified;

Finally, you will visualize the classification report which will give you more in-depth
intuition about which class was (in)correctly classified by your model.
Would you like to take a course on Keras and deep learning in Python? Consider taking
DataCamp's Deep Learning in Python course!

Also, don't miss our Keras cheat sheet, which shows you the six steps that you need to go
through to build neural networks in Python with code examples!

Convolutional Neural Network: Introduction


By now, you might already know about machine learning and deep learning, a computer
science branch that studies the design of algorithms that can learn. Deep learning is a
subfield of machine learning that is inspired by artificial neural networks, which in turn are
inspired by biological neural networks.

A specific kind of such a deep neural network is the convolutional network, which is
commonly referred to as CNN or ConvNet. It's a deep, feed-forward artificial neural network.
Remember that feed-forward neural networks are also called multi-layer perceptrons(MLPs),
which are the quintessential deep learning models. The models are called "feed-forward"
because information flows right through the model. There are no feedback connections in
which outputs of the model are fed back into itself.

CNNs specifically are inspired by the biological visual cortex. The cortex has small regions
of cells that are sensitive to the specific areas of the visual field. This idea was expanded by
a captivating experiment done by Hubel and Wiesel in 1962 (if you want to know more,
T U TO R I A L S here's a video). In this experiment, the researchers showed that some individual neurons in category
Sale endsEN
in
the brain activated or fired only in the presence of edges of a particular orientation like 3d 10h 33m 27s
vertical or horizontal edges. For example, some neurons fired when exposed to vertical sides
and some when shown a horizontal edge. Hubel and Wiesel found that all of these neurons
were well ordered in a columnar fashion and that together they were able to produce visual
perception. This idea of specialized components inside of a system having specific tasks is
one that machines use as well and one that you can also find back in CNNs.

Convolutional neural networks have been one of the most influential innovations in the field
of computer vision. They have performed a lot better than traditional computer vision and
have produced state-of-the-art results. These neural networks have proven to be successful
in many different real-life case studies and applications, like:

Image classification, object detection, segmentation, face recognition;

Self driving cars that leverage CNN based vision systems;

Classification of crystal structure using a convolutional neural network;

And many more, of course!

To understand this success, you'll have to go back to 2012, the year in which Alex Krizhevsky
used convolutional neural networks to win that year's ImageNet Competition, reducing the
classification error from 26% to 15%.

Note that ImageNet Large Scale Visual Recognition Challenge (ILSVRC) began in the year
2010 is an annual competition where research teams assess their algorithms on the given
data set and compete to achieve higher accuracy on several visual recognition tasks.

This was the time when neural networks regained prominence after quite some time. This is
often called the "third wave of neural networks". The other two waves were in the 1940s until
the 1960s and in the 1970s to 1980s.

Alright, you know that you'll be working with feed-forward networks that are inspired by the
biological visual cortex, but what does that actually mean?

Take a look at the picture below:


Figure: Convolutional Neural Network from Wikimedia
The image shows you that you feed an image as an input to the network, which goes
through multiple convolutions, subsampling, a fully connected layer and finally outputs
something.

But what are all these concepts?

1. The convolution layer computes the output of neurons that are connected to local
regions or receptive fields in the input, each computing a dot product between their
weights and a small receptive field to which they are connected to in the input volume.
Each computation leads to extraction of a feature map from the input image. In other
words, imagine you have an image represented as a 5x5 matrix of values, and you take
a 3x3 matrix and slide that 3x3 window or kernel around the image. At each position of
that matrix, you multiply the values of your 3x3 window by the values in the image that
are currently being covered by the window. As a result, you'll get a single number that
represents all the values in that window of the images. You use this layer to filtering: as
the window moves over the image, you check for patterns in that section of the image.
This works because of filters, which are multiplied by the values outputted by the
convolution.

2. The objective of subsampling is to get an input representation by reducing its


dimensions, which helps in reducing overfitting. One of the techniques of subsampling is
max pooling. With this technique, you select the highest pixel value from a region
depending on its size. In other words, max pooling takes the largest value from the
window of the image currently covered by the kernel. For example, you can have a max-
pooling layer of size 2 x 2 will select the maximum pixel intensity value from 2 x 2 region.
You're right to think that the pooling layer then works a lot like the convolution layer!
You also take a kernel or a window and move it over the image; The only difference is
the function that is applied to the kernel and the image window isn't linear.

Figure: Max-Pooling from Wikipedia

3. The objective of the fully connected layer is to flatten the high-level features that are
learned by convolutional layers and combining all the features. It passes the flattened
output to the output layer where you use a softmax classifier or a sigmoid to predict the
input class label.

For more information, you can go here.


The Fashion-MNIST Data Set
Before you go ahead and load in the data, it's good to take a look at what you'll exactly be
working with! The Fashion-MNIST dataset is a dataset of Zalando's article images, with
28x28 grayscale images of 70,000 fashion products from 10 categories, and 7,000 images
per category. The training set has 60,000 images, and the test set has 10,000 images. You
can double check this later when you have loaded in your data! ;)

Fashion-MNIST is similar to the MNIST dataset that you might already know, which you use
to classify handwritten digits. That means that the image dimensions, training and test splits
are similar to the MNIST dataset. Tip: if you want to learn how to implement an Multi-Layer
Perceptron (MLP) for classification tasks with this latter dataset, go to this deep learning
with Keras tutorial.

You can find the Fashion-MNIST dataset here, but you can also load it with the help of
specific TensorFlow and Keras modules. You'll see how this works in the next section!

Load the Data


Keras comes with a library called datasets , which you can use to load datasets out of the
box: you download the data from the server and speeds up the process since you no longer
have to download the data to your computer. The train and test images along with the
labels are loaded and stored in variables train_X , train_Y , test_X , test_Y , respectively.

from keras.datasets import fashion_mnist


(train_X,train_Y), (test_X,test_Y) = fashion_mnist.load_data()

Explain code POWERED BY

Using TensorFlow backend.

Explain code POWERED BY

Great! That was pretty simple, wasn't it?

You have probably done this a million times by now, but it's always an essential step to get
started. Now you're completely set to start analyzing, processing and modeling your data!

Analyze the Data


Let's now analyze how images in the dataset look like. Even though you know the dimension
of the images by now, it's still worth the effort to analyze it programmatically: you might
have to rescale the image pixels and resize the images.

import numpy as np
from keras.utils import to_categorical
import matplotlib.pyplot as plt
%matplotlib inline

print('Training data shape : ', train_X.shape, train_Y.shape)

print('Testing data shape : ', test_X.shape, test_Y.shape)

Explain code POWERED BY

('Training data shape : ', (60000, 28, 28), (60000,))


('Testing data shape : ', (10000, 28, 28), (10000,))

Explain code POWERED BY


From the above output, you can see that the training data has a shape of 60000 x 28 x 28
since there are 60,000 training samples each of 28 x 28 dimension. Similarly, the test data
has a shape of 10000 x 28 x 28 since there are 10,000 testing samples.

# Find the unique numbers from the train labels


classes = np.unique(train_Y)
nClasses = len(classes)
print('Total number of outputs : ', nClasses)
print('Output classes : ', classes)

Explain code POWERED BY

('Total number of outputs : ', 10)


('Output classes : ', array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint8))

Explain code POWERED BY

There's also a total of ten output classes that range from 0 to 9.

Also, don't forget to take a look at what the images in your dataset:

plt.figure(figsize=[5,5])

# Display the first image in training data


plt.subplot(121)
plt.imshow(train_X[0,:,:], cmap='gray')
plt.title("Ground Truth : {}".format(train_Y[0]))

# Display the first image in testing data


plt.subplot(122)
plt.imshow(test_X[0,:,:], cmap='gray')
plt.title("Ground Truth : {}".format(test_Y[0]))

Explain code POWERED BY

Text(0.5,1,u'Ground Truth : 9')

Explain code POWERED BY

The output of above two plots looks like an ankle boot, and this class is assigned a class
label of 9. Similarly, other fashion products will have different labels, but similar products will
have same labels. This means that all the 7,000 ankle boot images will have a class label of
9.

Data Preprocessing
As you could see in the above plot, the images are grayscale images have pixel values that
range from 0 to 255. Also, these images have a dimension of 28 x 28. As a result, you'll need
to preprocess the data before you feed it into the model.

As a first step, convert each 28 x 28 image of the train and test set into a matrix of size
28 x 28 x 1 which is fed into the network.
train_X = train_X.reshape(-1, 28,28, 1)
test_X = test_X.reshape(-1, 28,28, 1)
train_X.shape, test_X.shape

Explain code POWERED BY

((60000, 28, 28, 1), (10000, 28, 28, 1))

Explain code POWERED BY

The data right now is in an int8 format, so before you feed it into the network you need
to convert its type to float32, and you also have to rescale the pixel values in range 0 - 1
inclusive. So let's do that!

train_X = train_X.astype('float32')
test_X = test_X.astype('float32')
train_X = train_X / 255.
test_X = test_X / 255.

Explain code POWERED BY

Now you need to convert the class labels into a one-hot encoding vector.

In one-hot encoding, you convert the categorical data into a vector of numbers. The reason
why you convert the categorical data in one hot encoding is that machine learning
algorithms cannot work with categorical data directly. You generate one boolean column for
each category or class. Only one of these columns could take on the value 1 for each
sample. Hence, the term one-hot encoding.

For your problem statement, the one hot encoding will be a row vector, and for each image,
it will have a dimension of 1 x 10. The important thing to note here is that the vector consists
of all zeros except for the class that it represents, and for that, it is 1. For example, the ankle
boot image that you plotted above has a label of 9, so for all the ankle boot images, the one
hot encoding vector would be [0 0 0 0 0 0 0 0 1 0] .

So let's convert the training and testing labels into one-hot encoding vectors:

# Change the labels from categorical to one-hot encoding


train_Y_one_hot = to_categorical(train_Y)
test_Y_one_hot = to_categorical(test_Y)

# Display the change for category label using one-hot encoding


print('Original label:', train_Y[0])
print('After conversion to one-hot:', train_Y_one_hot[0])

Explain code POWERED BY

('Original label:', 9)
('After conversion to one-hot:', array([ 0., 0., 0., 0., 0., 0., 0., 0.,

Explain code POWERED BY

That's pretty clear, right? Note that you can also print the train_Y_one_hot , which will
display a matrix of size 60000 x 10 in which each row depicts one-hot encoding of an
image.

This last step is a crucial one. In machine learning or any data specific task, you should
partition the data correctly. For the model to generalize well, you split the training data
into two parts, one designed for training and another one for validation. In this case, you
will train the model on 80\% of the training data and validate it on 20\% of the
remaining training data. This will also help to reduce overfitting since you will be
validating the model on the data it would not have seen in training phase, which will
help in boosting the test performance.

from sklearn.model_selection import train_test_split


train_X,valid_X,train_label,valid_label = train_test_split(train_X, train_Y_one_h

Explain code POWERED BY

For one last time let's check the shape of training and validation set.

train_X.shape,valid_X.shape,train_label.shape,valid_label.shape

Explain code POWERED BY

((48000, 28, 28, 1), (12000, 28, 28, 1), (48000, 10), (12000, 10))

Explain code POWERED BY

The Network
The images are of size 28 x 28. You convert the image matrix to an array, rescale it between
0 and 1, reshape it so that it's of size 28 x 28 x 1, and feed this as an input to the network.

You'll use three convolutional layers:

The first layer will have 32-3 x 3 filters,

The second layer will have 64-3 x 3 filters and

The third layer will have 128-3 x 3 filters.

In addition, there are three max-pooling layers each of size 2 x 2.

Figure: Architecture of the Model

Model the Data


First, let's import all the necessary modules required to train the model.

import keras
from keras.models import Sequential,Input,Model
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
Explain code POWERED BY

You will use a batch size of 64 using a higher batch size of 128 or 256 is also preferable it all
depends on the memory. It contributes massively to determining the learning parameters
and affects the prediction accuracy. You will train the network for 20 epochs.

batch_size = 64
epochs = 20
num_classes = 10

Explain code POWERED BY

Neural Network Architecture


In Keras, you can just stack up layers by adding the desired layer one by one. That's exactly
what you'll do here: you'll first add a first convolutional layer with Conv2D() . Note that you
use this function because you're working with images! Next, you add the Leaky ReLU
activation function which helps the network learn non-linear decision boundaries. Since you
have ten different classes, you'll need a non-linear decision boundary that could separate
these ten classes which are not linearly separable.

More specifically, you add Leaky ReLUs because they attempt to fix the problem of dying
Rectified Linear Units (ReLUs). The ReLU activation function is used a lot in neural network
architectures and more specifically in convolutional networks, where it has proven to be
more effective than the widely used logistic sigmoid function. As of 2017, this activation
function is the most popular one for deep neural networks. The ReLU function allows the
activation to be thresholded at zero. However, during the training, ReLU units can "die". This
can happen when a large gradient flows through a ReLU neuron: it can cause the weights to
update in such a way that the neuron will never activate on any data point again. If this
happens, then the gradient flowing through the unit will forever be zero from that point on.
Leaky ReLUs attempt to solve this: the function will not be zero but will instead have a small
negative slope.

Next, you'll add the max-pooling layer with MaxPooling2D() and so on. The last layer is a
Dense layer that has a softmax activation function with 10 units, which is needed for this
multi-class classification problem.

fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',input_shape=(
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))
fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(Dense(num_classes, activation='softmax'))

Explain code POWERED BY

Compile the Model


After the model is created, you compile it using the Adam optimizer, one of the most popular
optimization algorithms. You can read more about this optimizer here. Additionally, you
specify the loss type which is categorical cross entropy which is used for multi-class
classification, you can also use binary cross-entropy as the loss function. Lastly, you specify
the metrics as accuracy which you want to analyze while the model is training.
fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras

Explain code POWERED BY

Let's visualize the layers that you created in the above step by using the summary function.
This will show some parameters (weights and biases) in each layer and also the total
parameters in your model.

fashion_model.summary()

Explain code POWERED BY

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_51 (Conv2D) (None, 28, 28, 32) 320
_________________________________________________________________
leaky_re_lu_57 (LeakyReLU) (None, 28, 28, 32) 0
_________________________________________________________________
max_pooling2d_49 (MaxPooling (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_52 (Conv2D) (None, 14, 14, 64) 18496
_________________________________________________________________
leaky_re_lu_58 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
max_pooling2d_50 (MaxPooling (None, 7, 7, 64) 0
_________________________________________________________________
conv2d_53 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
leaky_re_lu_59 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
max_pooling2d_51 (MaxPooling (None, 4, 4, 128) 0
_________________________________________________________________
flatten_17 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_33 (Dense) (None, 128) 262272
_________________________________________________________________
leaky_re_lu_60 (LeakyReLU) (None, 128) 0
_________________________________________________________________
dense_34 (Dense) (None, 10) 1290
=================================================================
Total params: 356,234
Trainable params: 356,234
Non-trainable params: 0
_________________________________________________________________

Explain code POWERED BY

Train the Model


It's finally time to train the model with Keras' fit() function! The model trains for 20 epochs.
The fit() function will return a history object; By storying the result of this function in
fashion_train , you can use it later to plot the accuracy and loss function plots between
training and validation which will help you to analyze your model's performance visually.

fashion_train = fashion_model.fit(train_X, train_label, batch_size=batch_size,epo

Explain code POWERED BY


Train on 48000 samples, validate on 12000 samples
Epoch 1/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.4661 - acc:
Epoch 2/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.2874 - acc:
Epoch 3/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.2420 - acc:
Epoch 4/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.2088 - acc:
Epoch 5/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.1838 - acc:
Epoch 6/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.1605 - acc:
Epoch 7/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.1356 - acc:
Epoch 8/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.1186 - acc:
Epoch 9/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.0985 - acc:
Epoch 10/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.0873 - acc:
Epoch 11/20
48000/48000 [==============================] - 59s 1ms/step - loss: 0.0739 - acc:
Epoch 12/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0628 - acc:
Epoch 13/20
48000/48000 [==============================] - 61s 1ms/step - loss: 0.0569 - acc:
Epoch 14/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0452 - acc:
Epoch 15/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0421 - acc:
Epoch 16/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0379 - acc:
Epoch 17/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0405 - acc:
Epoch 18/20
48000/48000 [==============================] - 60s 1ms/step - loss: 0.0285 - acc:
Epoch 19/20
48000/48000 [==============================] - 61s 1ms/step - loss: 0.0322 - acc:
Epoch 20/20
48000/48000 [==============================] - 61s 1ms/step - loss: 0.0262 - acc:

Explain code POWERED BY

Finally! You trained the model on fashion-MNIST for 20 epochs, and by observing the
training accuracy and loss, you can say that the model did a good job since after 20 epochs
the training accuracy is 99% and the training loss is quite low.

However, it looks like the model is overfitting, as the validation loss is 0.4396 and the
validation accuracy is 92%. Overfitting gives an intuition that the network has memorized
the training data very well but is not guaranteed to work on unseen data, and that is why
there is a difference in the training and validation accuracy.

You probably need to handle this. In next sections, you'll learn how you can make your
model perform much better by adding a Dropout layer into the network and keeping all the
other layers unchanged.

But first, let's evaluate the performance of your model on the test set before you come on to
a conclusion.

Model Evaluation on the Test Set


test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=0)
Explain code POWERED BY

print('Test loss:', test_eval[0])


print('Test accuracy:', test_eval[1])

Explain code POWERED BY

('Test loss:', 0.46366268818555401)


('Test accuracy:', 0.91839999999999999)

Explain code POWERED BY

The test accuracy looks impressive. It turns out that your classifier does better than the
benchmark that was reported here, which is an SVM classifier with mean accuracy of 0.897.
Also, the model does well compared to some of the deep learning models mentioned on the
GitHub profile of the creators of fashion-MNIST dataset.

However, you saw that the model looked like it was overfitting. Are these results really all
that good?

Let's put your model evaluation into perspective and plot the accuracy and loss plots
between training and validation data:

accuracy = fashion_train.history['acc']
val_accuracy = fashion_train.history['val_acc']
loss = fashion_train.history['loss']
val_loss = fashion_train.history['val_loss']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

Explain code POWERED BY


From the above two plots, you can see that the validation accuracy almost became
stagnant after 4-5 epochs and rarely increased at certain epochs. In the beginning, the
validation accuracy was linearly increasing with loss, but then it did not increase much.

The validation loss shows that this is the sign of overfitting, similar to validation accuracy it
linearly decreased but after 4-5 epochs, it started to increase. This means that the model
tried to memorize the data and succeeded.

With this in mind, it's time to introduce some dropout into our model and see if it helps in
reducing overfitting.

Adding Dropout into the Network


You can add a dropout layer to overcome the problem of overfitting to some extent. Dropout
randomly turns off a fraction of neurons during the training process, reducing the
dependency on the training set by some amount. How many fractions of neurons you want
to turn off is decided by a hyperparameter, which can be tuned accordingly. This way,
turning off some neurons will not allow the network to memorize the training data since not
all the neurons will be active at the same time and the inactive neurons will not be able to
learn anything.

So let's create, compile and train the network again but this time with dropout. And run it for
20 epochs with a batch size of 64.

batch_size = 64
epochs = 20
num_classes = 10

Explain code POWERED BY

fashion_model = Sequential()
fashion_model.add(Conv2D(32, kernel_size=(3, 3),activation='linear',padding='same
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D((2, 2),padding='same'))
fashion_model.add(Dropout(0.25))
fashion_model.add(Conv2D(64, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.25))
fashion_model.add(Conv2D(128, (3, 3), activation='linear',padding='same'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(MaxPooling2D(pool_size=(2, 2),padding='same'))
fashion_model.add(Dropout(0.4))
fashion_model.add(Flatten())
fashion_model.add(Dense(128, activation='linear'))
fashion_model.add(LeakyReLU(alpha=0.1))
fashion_model.add(Dropout(0.3))
fashion_model.add(Dense(num_classes, activation='softmax'))

Explain code POWERED BY


fashion_model.summary()

Explain code POWERED BY

_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_54 (Conv2D) (None, 28, 28, 32) 320
_________________________________________________________________
leaky_re_lu_61 (LeakyReLU) (None, 28, 28, 32) 0
_________________________________________________________________
max_pooling2d_52 (MaxPooling (None, 14, 14, 32) 0
_________________________________________________________________
dropout_29 (Dropout) (None, 14, 14, 32) 0
_________________________________________________________________
conv2d_55 (Conv2D) (None, 14, 14, 64) 18496
_________________________________________________________________
leaky_re_lu_62 (LeakyReLU) (None, 14, 14, 64) 0
_________________________________________________________________
max_pooling2d_53 (MaxPooling (None, 7, 7, 64) 0
_________________________________________________________________
dropout_30 (Dropout) (None, 7, 7, 64) 0
_________________________________________________________________
conv2d_56 (Conv2D) (None, 7, 7, 128) 73856
_________________________________________________________________
leaky_re_lu_63 (LeakyReLU) (None, 7, 7, 128) 0
_________________________________________________________________
max_pooling2d_54 (MaxPooling (None, 4, 4, 128) 0
_________________________________________________________________
dropout_31 (Dropout) (None, 4, 4, 128) 0
_________________________________________________________________
flatten_18 (Flatten) (None, 2048) 0
_________________________________________________________________
dense_35 (Dense) (None, 128) 262272
_________________________________________________________________
leaky_re_lu_64 (LeakyReLU) (None, 128) 0
_________________________________________________________________
dropout_32 (Dropout) (None, 128) 0
_________________________________________________________________
dense_36 (Dense) (None, 10) 1290
=================================================================
Total params: 356,234
Trainable params: 356,234
Non-trainable params: 0
_________________________________________________________________

Explain code POWERED BY

fashion_model.compile(loss=keras.losses.categorical_crossentropy, optimizer=keras

Explain code POWERED BY

fashion_train_dropout = fashion_model.fit(train_X, train_label, batch_size=batch_

Explain code POWERED BY

Train on 48000 samples, validate on 12000 samples


Epoch 1/20
48000/48000 [==============================] - 66s 1ms/step - loss: 0.5954 - acc:
Epoch 2/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.3797 - acc:
Epoch 3/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.3302 - acc:
Epoch 4/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.3034 - acc:
Epoch 5/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.2843 - acc:
Epoch 6/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.2699 - acc:
Epoch 7/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2561 - acc:
Epoch 8/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2503 - acc:
Epoch 9/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2437 - acc:
Epoch 10/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2307 - acc:
Epoch 11/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2307 - acc:
Epoch 12/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2229 - acc:
Epoch 13/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2202 - acc:
Epoch 14/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.2161 - acc:
Epoch 15/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2119 - acc:
Epoch 16/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2073 - acc:
Epoch 17/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2050 - acc:
Epoch 18/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.2016 - acc:
Epoch 19/20
48000/48000 [==============================] - 65s 1ms/step - loss: 0.2001 - acc:
Epoch 20/20
48000/48000 [==============================] - 64s 1ms/step - loss: 0.1972 - acc:

Explain code POWERED BY

Let's save the model so that you can directly load it and not have to train it again for 20
epochs. This way, you can load the model later on if you need it and modify the
architecture; Alternatively, you can start the training process on this saved model. It is
always a good idea to save the model -and even the model's weights!- because it saves you
time. Note that you can also save the model after every epoch so that, if some issue occurs
that stops the training at an epoch, you will not have to start the training from the
beginning.

fashion_model.save("fashion_model_dropout.h5py")

Explain code POWERED BY

Model Evaluation on the Test Set


Finally, let's also evaluate your new model and see how it performs!

test_eval = fashion_model.evaluate(test_X, test_Y_one_hot, verbose=1)

Explain code POWERED BY

10000/10000 [==============================] - 5s 461us/step


Explain code POWERED BY

print('Test loss:', test_eval[0])


print('Test accuracy:', test_eval[1])

Explain code POWERED BY

('Test loss:', 0.21460009642243386)


('Test accuracy:', 0.92300000000000004)

Explain code POWERED BY

Wow! Looks like adding Dropout in our model worked, even though the test accuracy did not
improve significantly but the test loss decreased compared to the previous results.

Now, let's plot the accuracy and loss plots between training and validation data for the one
last time.

accuracy = fashion_train_dropout.history['acc']
val_accuracy = fashion_train_dropout.history['val_acc']
loss = fashion_train_dropout.history['loss']
val_loss = fashion_train_dropout.history['val_loss']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()

Explain code POWERED BY


Finally, you can see that the validation loss and validation accuracy both are in sync with
the training loss and training accuracy. Even though the validation loss and accuracy line
are not linear, but it shows that your model is not overfitting: the validation loss is
decreasing and not increasing, and there is not much gap between training and validation
accuracy.

Therefore, you can say that your model's generalization capability became much better
since the loss on both test set and validation set was only slightly more compared to the
training loss.

Predict Labels
predicted_classes = fashion_model.predict(test_X)

Explain code POWERED BY

Since the predictions you get are floating point values, it will not be feasible to compare the
predicted labels with true test labels. So, you will round off the output which will convert the
float values into an integer. Further, you will use np.argmax() to select the index number
which has a higher value in a row.

For example, let's assume a prediction for one test image to be 0 1 0 0 0 0 0 0 0 0 , the
output for this should be a class label 1 .

predicted_classes = np.argmax(np.round(predicted_classes),axis=1)

Explain code POWERED BY

predicted_classes.shape, test_Y.shape

Explain code POWERED BY

((10000,), (10000,))

Explain code POWERED BY

correct = np.where(predicted_classes==test_Y)[0]
print "Found %d correct labels" % len(correct)
for i, correct in enumerate(correct[:9]):
plt.subplot(3,3,i+1)
plt.imshow(test_X[correct].reshape(28,28), cmap='gray', interpolation='none')
plt.title("Predicted {}, Class {}".format(predicted_classes[correct], test_Y[
plt.tight_layout()

Explain code POWERED BY

Found 9188 correct labels

Explain code POWERED BY


incorrect = np.where(predicted_classes!=test_Y)[0]
print "Found %d incorrect labels" % len(incorrect)
for i, incorrect in enumerate(incorrect[:9]):
plt.subplot(3,3,i+1)
plt.imshow(test_X[incorrect].reshape(28,28), cmap='gray', interpolation='none
plt.title("Predicted {}, Class {}".format(predicted_classes[incorrect], test_
plt.tight_layout()

Explain code POWERED BY

Found 812 incorrect labels

Explain code POWERED BY

By looking at a few images, you cannot be sure as to why your model is not able to classify
the above images correctly, but it seems like a variety of the similar patterns present on
multiple classes affect the performance of the classifier although CNN is a robust
architecture. For example, images 5 and 6 both belong to different classes but look kind of
similar maybe a jacket or perhaps a long sleeve shirt.

Classification Report
Classification report will help us in identifying the misclassified classes in more detail. You
will be able to observe for which class the model performed bad out of the given ten
classes.

from sklearn.metrics import classification_report


target_names = ["Class {}".format(i) for i in range(num_classes)]
print(classification_report(test_Y, predicted_classes, target_names=target_names)

Explain code POWERED BY


precision recall f1-score support

Class 0 0.77 0.90 0.83 1000


Class 1 0.99 0.98 0.99 1000
Class 2 0.88 0.88 0.88 1000
Class 3 0.94 0.92 0.93 1000
Class 4 0.88 0.87 0.88 1000
Class 5 0.99 0.98 0.98 1000
Class 6 0.82 0.72 0.77 1000
Class 7 0.94 0.99 0.97 1000
Class 8 0.99 0.98 0.99 1000
Class 9 0.98 0.96 0.97 1000

avg / total 0.92 0.92 0.92 10000

Explain code POWERED BY

You can see that the classifier is underperforming for class 6 regarding both precision and
recall. For class 0 and class 2, the classifier is lacking precision. Also, for class 4, the
classifier is slightly lacking both precision and recall.

Go Further!
This tutorial was good start to convolutional neural networks in Python with Keras. If you
were able to follow along easily or even with little more efforts, well done! Try doing some
experiments maybe with same model architecture but using different types of public
datasets available.

There is still a lot to cover, so why not take DataCamp's Deep Learning in Python course? In
the meantime, also make sure to check out the Keras documentation, if you haven't done so
already. You will find more examples and information on all functions, arguments, more
layers, etc. It will undoubtedly be an indispensable resource when you're learning how to
work with neural networks in Python!

If you rather feel like reading a book that explains the fundamentals of deep learning (with
Keras) together with how it's used in practice, you should definitely read François Chollet's
Deep Learning in Python book.

TO P I C S

Python Artificial Intelligence Machine Learning Deep Learning

Training more people?


Get your team access to the full DataCamp for business platform.

For Business

For a bespoke solution book a demo.

Learn more about Python and Deep Learning

COURSE

Introduction to Deep Learning in Python


4 hr 251.9K
Learn the fundamentals of neural networks and how to build deep learning models using Keras 2.0 in Python.

See Details Start Course

See More

Related
C H E AT- S H E E T
Keras Cheat Sheet: Neural
Networks in Python

T U TO R I A L
Convolutional Neural Networks
(CNN) with TensorFlow Tutorial

T U TO R I A L
Keras Tutorial: Deep Learning in
Python

See More

Grow your data skills with DataCamp for Mobile


Make progress on the go with our mobile courses and daily 5-minute coding challenges.

LEARN

Learn Python

Learn AI

Learn Power BI

Learn Data Engineering

Assessments

Career Tracks

Skill Tracks

Courses

Data Science Roadmap

DATA C O U R S E S
Python Courses

R Courses

SQL Courses

Power BI Courses

Tableau Courses

Alteryx Courses

Azure Courses

AWS Courses

Google Sheets Courses

Excel Courses

AI Courses

Data Analysis Courses

Data Visualization Courses

Machine Learning Courses

Data Engineering Courses

Probability & Statistics Courses

DATA L A B

Get Started

Pricing

Security

Documentation

C E R T I F I C AT I O N

Certifications

Data Scientist

Data Analyst

Data Engineer

SQL Associate

Power BI Data Analyst

Tableau Certified Data Analyst

Azure Fundamentals

AI Fundamentals

RESOURCES

Resource Center

Upcoming Events
Blog

Code-Alongs

Tutorials

Docs

Open Source

RDocumentation

Book a Demo with DataCamp for Business

Data Portfolio

PLANS

Pricing

For Students

For Business

For Universities

Discounts, Promos & Sales

DataCamp Donates

FO R B U S I N E S S

Business Pricing

Teams Plan

Data & AI Unlimited Plan

Customer Stories

Partner Program

ABOUT

About Us

Learner Stories

Careers

Become an Instructor

Press

Leadership

Contact Us

DataCamp Español

DataCamp Português

DataCamp Deutsch

DataCamp Français
S U P PO R T

Help Center

Become an Affiliate

Privacy Policy Cookie Notice Do Not Sell My Personal Information Accessibility Security Terms of Use

© 2025 DataCamp, Inc. All Rights Reserved.

You might also like