Transfer Learning
So far, we have trained accurate models on large datasets, and also downloaded a pre-trained model that we used with no training necessary. But what if we cannot find a pre-trained model that does exactly what we need, and what if we do not have a sufficiently large dataset to train a model from scratch? In this case, there is a very helpful technique we can use called transfer learning (https://blogs.nvidia.com/blog/2019/02/07/what-is-transfer-learning/).
With transfer learning, we take a pre-trained model and retrain it on a task that has some overlap with the
original training task. A good analogy for this is an artist who is skilled in one medium, such as painting, who
wants to learn to practice in another medium, such as charcoal drawing. We can imagine that the skills they
learned while painting would be very valuable in learning how to draw with charcoal.
As an example in deep learning, say we have a pre-trained model that is very good at recognizing different types of cars, and we want to train a model to recognize types of motorcycles. Much of what the car model learned would likely be very useful, for instance the ability to recognize headlights and wheels.
Transfer learning is especially powerful when we do not have a large and varied dataset. In this case, a model trained from scratch would likely memorize the training data quickly, yet fail to generalize well to new data. With transfer learning, we can increase our chances of training an accurate and robust model on a small dataset.
Objectives
In our last exercise, we used a pre-trained ImageNet (http://www.image-net.org/) model to let in all dogs, but
keep out other animals. In this exercise, we would like to create a doggy door that only lets in a particular dog.
In this case, we will make an automatic doggy door for a dog named Bo, the United States First Dog between
2009 and 2017. There are more pictures of Bo in the data/presidential_doggy_door folder.
The challenge is that the pre-trained model was not trained to recognize this specific dog, and we only have 30 pictures of Bo. If we tried to train a model from scratch using those 30 pictures, we would experience overfitting and poor generalization. However, if we start with a pre-trained model that is adept at detecting dogs, we can leverage that learning to gain a generalized understanding of Bo using our smaller dataset. We can use transfer learning to solve this challenge.
Let us start by downloading the pre-trained model. Again, this is available directly from the Keras library. This time, however, there is an important difference. The last layer of an ImageNet model is a dense layer (https://developers.google.com/machine-learning/glossary#dense-layer) of 1000 units, representing the 1000 possible classes in the dataset. In our case, we want the model to make a different classification: is this Bo or not? Because we want the classification to be different, we are going to remove the last layer of the model. We can do this by setting the flag include_top=False when downloading the model. After removing this top layer, we can add new layers that will yield the type of classification that we want:
from tensorflow import keras

base_model = keras.applications.VGG16(
    weights='imagenet',  # Load weights pre-trained on ImageNet.
    input_shape=(224, 224, 3),
    include_top=False)   # Leave off the 1000-unit classification layer.
In [2]: base_model.summary()
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
=================================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
_________________________________________________________________
Freezing the base layers is as simple as setting trainable on the model to False.
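For the base_model we downloaded above, that is a single line:

base_model.trainable = False  # Freeze all of VGG16's pre-trained weights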
We can now add the new trainable layers to the pre-trained model. They will take the features from the pre-trained layers and turn them into predictions on the new dataset. We will add two layers to the model. First will be a pooling layer like we saw in our earlier convolutional neural network (https://developers.google.com/machine-learning/glossary#convolutional_layer). (If you want a more thorough understanding of the role of pooling layers in CNNs, please read this detailed blog post (https://machinelearningmastery.com/pooling-layers-for-convolutional-neural-networks/#:~:text=A%20pooling%20layer%20is%20a,Convolutional%20Layer).) We then need to add our final layer, which will classify Bo or not Bo. This will be a densely connected layer with one output.
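One way to assemble these layers with the Keras functional API, consistent with the model summary shown below, is sketched here:

inputs = keras.Input(shape=(224, 224, 3))
x = base_model(inputs, training=False)        # Run the frozen VGG16 base in inference mode
x = keras.layers.GlobalAveragePooling2D()(x)  # Pool the 7x7x512 feature maps to a 512-length vector
outputs = keras.layers.Dense(1)(x)            # One output: Bo or not Bo (no activation, so this is a logit)
model = keras.Model(inputs, outputs)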
Let us take a look at the model, now that we have combined the pre-trained model with the new layers.
In [5]: model.summary()
Model: "model"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
_________________________________________________________________
vgg16 (Model) (None, 7, 7, 512) 14714688
_________________________________________________________________
global_average_pooling2d (Gl (None, 512) 0
_________________________________________________________________
dense (Dense) (None, 1) 513
=================================================================
Total params: 14,715,201
Trainable params: 513
Non-trainable params: 14,714,688
_________________________________________________________________
Keras gives us a nice summary here, as it shows the vgg16 pre-trained model as one unit, rather than
showing all of the internal layers. It is also worth noting that we have many non-trainable parameters as we
have frozen the pre-trained model.
As with our previous exercises, we need to compile the model with loss and metrics options. We have to make some different choices here. In previous cases we had many categories in our classification problem. As a result, we picked categorical crossentropy for the calculation of our loss. In this case we only have a binary classification problem (Bo or not Bo), and so we will use binary crossentropy (https://www.tensorflow.org/api_docs/python/tf/keras/losses/BinaryCrossentropy). Further detail about the differences between the two can be found here (https://gombru.github.io/2018/05/23/cross_entropy_loss/). We will also use binary accuracy instead of traditional accuracy.
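A minimal sketch of that compile call; we pass from_logits=True because our final dense layer has no activation, and we leave the optimizer at its default:

model.compile(loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])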
Since we are dealing with a very small dataset, it is especially important that we augment our data. As before, we will make small modifications to the existing images, which will allow the model to see a wider variety of images to learn from. This will help it learn to recognize new pictures of Bo instead of just memorizing the pictures it trains on.
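A sketch of such an augmentation setup with Keras' ImageDataGenerator; the specific transformation ranges here are illustrative choices rather than values fixed by the exercise:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    samplewise_center=True,   # Normalize each image to zero mean
    rotation_range=10,        # Randomly rotate up to 10 degrees
    zoom_range=0.1,           # Randomly zoom in up to 10%
    width_shift_range=0.1,    # Randomly shift horizontally up to 10%
    height_shift_range=0.1,   # Randomly shift vertically up to 10%
    horizontal_flip=True,     # Randomly flip images left-right
    vertical_flip=False)      # Upside-down dogs are unlikely in practice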
We have seen datasets in a couple of different formats so far. In the MNIST exercise, we were able to download the dataset directly from within the Keras library. For the sign language dataset, the data was in CSV files. For this exercise, we are going to load images directly from folders using Keras' flow_from_directory (https://keras.io/api/preprocessing/image/) function. We have set up our directories to help this process go smoothly, as our labels are inferred from the folder names. In the data/presidential_doggy_door directory, we have train and valid directories, each of which has folders for images of Bo and not Bo. In the not_bo directories, we have pictures of other dogs and cats, to teach our model to keep out other pets. Feel free to explore the images to get a sense of our dataset.
Note that flow_from_directory (https://keras.io/api/preprocessing/image/) will also allow us to size our images to match the model: 224x224 pixels with 3 channels.
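A sketch of those calls, assuming the datagen from above and the directory layout just described (the batch size is an illustrative choice):

train_it = datagen.flow_from_directory(
    'data/presidential_doggy_door/train/',
    target_size=(224, 224),  # Resize images to match the model's input shape
    color_mode='rgb',
    class_mode='binary',     # Two classes: bo and not_bo
    batch_size=8)

valid_it = datagen.flow_from_directory(
    'data/presidential_doggy_door/valid/',
    target_size=(224, 224),
    color_mode='rgb',
    class_mode='binary',
    batch_size=8)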
Time to train our model and see how it does. Recall that when using a data generator, we have to explicitly set the number of steps_per_epoch.
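A sketch of that training call, using the generators from above; steps_per_epoch=12 matches the 12 steps per epoch visible in the output below, while validation_steps is an illustrative choice:

model.fit(train_it,
          validation_data=valid_it,
          steps_per_epoch=12,
          validation_steps=4,
          epochs=20)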
Epoch 1/20
12/12 [==============================] - 5s 441ms/step - loss: 1.4728 - binary_accuracy: 0.6979 - val_loss: 0.9218 - val_binary_accuracy: 0.8000
Epoch 2/20
12/12 [==============================] - 2s 187ms/step - loss: 0.4864 - binary_accuracy: 0.8242 - val_loss: 0.7172 - val_binary_accuracy: 0.8000
Epoch 3/20
12/12 [==============================] - 2s 144ms/step - loss: 0.4520 - binary_accuracy: 0.8681 - val_loss: 0.7320 - val_binary_accuracy: 0.8667
Epoch 4/20
12/12 [==============================] - 2s 155ms/step - loss: 0.3451 - binary_accuracy: 0.9062 - val_loss: 0.0649 - val_binary_accuracy: 1.0000
Epoch 5/20
12/12 [==============================] - 2s 138ms/step - loss: 0.1165 - binary_accuracy: 0.9121 - val_loss: 0.3346 - val_binary_accuracy: 0.9333
Epoch 6/20
12/12 [==============================] - 2s 139ms/step - loss: 0.0585 - binary_accuracy: 0.9670 - val_loss: 0.2352 - val_binary_accuracy: 0.9667
Epoch 7/20
12/12 [==============================] - 2s 155ms/step - loss: 0.0383 - binary_accuracy: 0.9896 - val_loss: 0.1609 - val_binary_accuracy: 0.9667
Epoch 8/20
12/12 [==============================] - 2s 141ms/step - loss: 0.0645 - binary_accuracy: 0.9780 - val_loss: 0.1469 - val_binary_accuracy: 0.9333
Epoch 9/20
12/12 [==============================] - 2s 131ms/step - loss: 0.0416 - binary_accuracy: 0.9780 - val_loss: 0.2644 - val_binary_accuracy: 0.9333
Epoch 10/20
12/12 [==============================] - 2s 149ms/step - loss: 0.0304 - binary_accuracy: 0.9896 - val_loss: 0.0225 - val_binary_accuracy: 1.0000
Epoch 11/20
12/12 [==============================] - 2s 151ms/step - loss: 0.0384 - binary_accuracy: 0.9780 - val_loss: 0.0927 - val_binary_accuracy: 0.9667
Epoch 12/20
12/12 [==============================] - 1s 124ms/step - loss: 0.0338 - binary_accuracy: 0.9780 - val_loss: 0.0147 - val_binary_accuracy: 1.0000
Epoch 13/20
12/12 [==============================] - 2s 140ms/step - loss: 0.0035 - binary_accuracy: 1.0000 - val_loss: 0.1087 - val_binary_accuracy: 0.9667
Epoch 14/20
12/12 [==============================] - 2s 151ms/step - loss: 0.0029 - binary_accuracy: 1.0000 - val_loss: 0.1541 - val_binary_accuracy: 0.9667
Epoch 15/20
12/12 [==============================] - 2s 138ms/step - loss: 0.0040 - binary_accuracy: 1.0000 - val_loss: 0.0363 - val_binary_accuracy: 1.0000
Epoch 16/20
12/12 [==============================] - 2s 151ms/step - loss: 7.8249e-04 - binary_accuracy: 1.0000 - val_loss: 0.0541 - val_binary_accuracy: 0.9667
Epoch 17/20
12/12 [==============================] - 2s 132ms/step - loss: 0.0027 - binary_accuracy: 1.0000 - val_loss: 0.0649 - val_binary_accuracy: 0.9667
Epoch 18/20
12/12 [==============================] - 2s 142ms/step - loss: 0.0051 - binary_accuracy: 1.0000 - val_loss: 0.0456 - val_binary_accuracy: 0.9667
Epoch 19/20
12/12 [==============================] - 2s 150ms/step - loss: 4.5949e-04 - binary_accuracy: 1.0000 - val_loss: 0.1243 - val_binary_accuracy: 0.9667
Epoch 20/20
12/12 [==============================] - 2s 143ms/step - loss: 0.0042 - binary_accuracy: 1.0000 - val_loss: 0.0217 - val_binary_accuracy: 1.0000
Out[9]: <tensorflow.python.keras.callbacks.History at 0x7f25782c05c0>
Discussion of Results
Both the training and validation accuracy should be quite high. This is a pretty awesome result! We were able
to train on a small dataset, but because of the knowledge transferred from the ImageNet model, it was able to
achieve high accuracy and generalize well. This means it has a very good sense of Bo and pets who are not
Bo.
If you saw some fluctuation in the validation accuracy, that is okay too. We have a technique for improving our
model in the next section.
Now that the new layers of the model are trained, we have the option to apply a final trick to improve the model, called fine-tuning (https://developers.google.com/machine-learning/glossary#f). To do this, we unfreeze the entire model and train it again with a very small learning rate (https://developers.google.com/machine-learning/glossary#learning-rate). This will cause the base pre-trained layers to take very small steps and adjust slightly, improving the model by a small amount.
Note that it is important to only do this step after the model with frozen layers has been fully trained. The
untrained pooling and classification layers that we added to the model earlier were randomly initialized. This
means they needed to be updated quite a lot to correctly classify the images. Through the process of
backpropagation (https://developers.google.com/machine-learning/glossary#backpropagation), large initial
updates in the last layers would have caused potentially large updates in the pre-trained layers as well. These
updates would have destroyed those important pre-trained features. However, now that those final layers are
trained and have converged, any updates to the model as a whole will be much smaller (especially with a very
small learning rate) and will not destroy the features of the earlier layers.
Let's try unfreezing the pre-trained layers, and then fine-tuning the model:

# Unfreeze the base model
base_model.trainable = True

# It's important to recompile your model after you make any changes
# to the `trainable` attribute of any inner layer, so that your changes
# are taken into account
model.compile(optimizer=keras.optimizers.RMSprop(learning_rate=0.00001),  # Very low learning rate
              loss=keras.losses.BinaryCrossentropy(from_logits=True),
              metrics=[keras.metrics.BinaryAccuracy()])
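Then we fit the model again for a few more epochs; a sketch, with validation_steps again an illustrative choice:

model.fit(train_it,
          validation_data=valid_it,
          steps_per_epoch=12,
          validation_steps=4,
          epochs=10)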
Epoch 1/10
12/12 [==============================] - 8s 682ms/step - loss: 0.0767 - binary_accuracy: 0.9890 - val_loss: 0.0057 - val_binary_accuracy: 1.0000
Epoch 2/10
12/12 [==============================] - 2s 176ms/step - loss: 1.0105e-04 - binary_accuracy: 1.0000 - val_loss: 0.0217 - val_binary_accuracy: 1.0000
Epoch 3/10
12/12 [==============================] - 2s 184ms/step - loss: 3.8384e-05 - binary_accuracy: 1.0000 - val_loss: 0.0427 - val_binary_accuracy: 0.9667
Epoch 4/10
12/12 [==============================] - 2s 180ms/step - loss: 0.0064 - binary_accuracy: 1.0000 - val_loss: 0.1059 - val_binary_accuracy: 0.9333
Epoch 5/10
12/12 [==============================] - 2s 171ms/step - loss: 3.4135e-04 - binary_accuracy: 1.0000 - val_loss: 0.0040 - val_binary_accuracy: 1.0000
Epoch 6/10
12/12 [==============================] - 2s 189ms/step - loss: 1.5092e-05 - binary_accuracy: 1.0000 - val_loss: 0.0222 - val_binary_accuracy: 1.0000
Epoch 7/10
12/12 [==============================] - 2s 178ms/step - loss: 1.0870e-04 - binary_accuracy: 1.0000 - val_loss: 0.0164 - val_binary_accuracy: 1.0000
Epoch 8/10
12/12 [==============================] - 2s 171ms/step - loss: 9.2534e-06 - binary_accuracy: 1.0000 - val_loss: 0.0019 - val_binary_accuracy: 1.0000
Epoch 9/10
12/12 [==============================] - 2s 168ms/step - loss: 5.3043e-06 - binary_accuracy: 1.0000 - val_loss: 0.0011 - val_binary_accuracy: 1.0000
Epoch 10/10
12/12 [==============================] - 2s 185ms/step - loss: 1.3720e-05 - binary_accuracy: 1.0000 - val_loss: 0.0048 - val_binary_accuracy: 1.0000
Now that we have a well-trained model, it is time to create our doggy door for Bo! We can start by looking at
the predictions that come from the model. We will preprocess the image in the same way we did for our last
doggy door.
import matplotlib.image as mpimg
import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing import image as image_utils
from tensorflow.keras.applications.vgg16 import preprocess_input

def show_image(image_path):
    image = mpimg.imread(image_path)
    plt.imshow(image)

def make_predictions(image_path):
    show_image(image_path)
    # Load and resize the image to match the model's expected input
    image = image_utils.load_img(image_path, target_size=(224, 224))
    image = image_utils.img_to_array(image)
    # Add a batch dimension: the model expects (batch, height, width, channels)
    image = image.reshape(1, 224, 224, 3)
    # Apply VGG16's standard input preprocessing
    image = preprocess_input(image)
    preds = model.predict(image)
    return preds
In [13]: make_predictions('data/presidential_doggy_door/valid/bo/bo_20.jpg')
In [14]: make_predictions('data/presidential_doggy_door/valid/not_bo/121.jpg')
It looks like a negative prediction means that it is Bo, and a positive prediction means it is something else. We can use this information to have our doggy door only let Bo in!
Solution
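A sketch of the doggy door function, using the sign convention we just observed (the printed messages are illustrative):

def presidential_doggy_door(image_path):
    preds = make_predictions(image_path)
    if preds[0] < 0:  # Negative logit: the model thinks this is Bo
        print("It's Bo! Let him in!")
    else:             # Positive logit: some other animal
        print("That's not Bo! Stay out!")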
In [17]: presidential_doggy_door('data/presidential_doggy_door/valid/not_bo/131.jpg')
In [18]: presidential_doggy_door('data/presidential_doggy_door/valid/bo/bo_29.jpg')
Summary
Great work! With transfer learning, you have built a highly accurate model using a very small dataset. This can be an extremely powerful technique, and it can be the difference between a successful project and one that cannot get off the ground. We hope these techniques can help you out in similar situations in the future!
There is a wealth of helpful resources for transfer learning in the NVIDIA Transfer Learning Toolkit
(https://developer.nvidia.com/tlt-getting-started).
Next
So far, the focus of this workshop has primarily been on image classification. In the next section, in service of
giving you a more well-rounded introduction to deep learning, we are going to switch gears and address
working with sequential data, which requires a different approach.