
NB4-08: Biomedical Imaging and PyTorch III

Goal: In this notebook we use early stopping: when the model stops improving for a certain
number of epochs, we halt training. We will use a skin lesion dataset with two classes.

1. Introduction

What is Early Stopping?

Early stopping is a widely used regularization technique in machine learning and
deep learning aimed at preventing overfitting during the training process. Here are
some key points about early stopping:

1. Purpose: The main goal of early stopping is to halt the training of a model once
it becomes apparent that the model's performance on a validation set is no
longer improving. This helps in ensuring that the model generalizes well to new,
unseen data rather than just fitting the training data.

2. Mechanism: During training, the model's performance is periodically evaluated
on a separate validation dataset. If the validation performance does not
improve for a certain number of consecutive epochs (a parameter known as
"patience"), the training is stopped. The best weights observed during training
are then restored.

3. Parameters:

o Patience: This parameter defines the number of epochs to wait for an
improvement before stopping. If set to 5, for example, training will stop if
there is no improvement in validation performance for 5 epochs.

o Minimum Delta: This parameter sets the minimum change in the
monitored metric to qualify as an improvement. This helps to filter out
small changes that are not significant.

4. Benefits:

o Prevents Overfitting: By stopping early, the model is less likely to fit
the noise in the training data, leading to better generalization on unseen
data.

o Efficiency: Saves computational resources by not continuing training
when improvements are unlikely.

o Simplicity: Easy to implement and does not require modification of the
underlying model architecture.

5. Implementation: Early stopping is typically implemented using callbacks in
deep learning frameworks such as Keras and TensorFlow; in PyTorch it is usually
written by hand in the training loop. In either case, the logic monitors the
validation metrics and stops training when the criteria are met.

6. Limitations: While early stopping is effective, it may not always be the best
choice for every model or dataset. For some complex problems, more
sophisticated regularization techniques might be required alongside early
stopping.

Using Early Stopping with TensorFlow

In TensorFlow, tf.keras.callbacks.EarlyStopping is used to implement early stopping.
The steps to use early stopping in TensorFlow are:

1. Import Required Libraries: First, import TensorFlow and other relevant
packages.

import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

2. Define the Model: Define your TensorFlow/Keras model as usual.

model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(input_shape,)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

3. Configure Early Stopping: Set up the EarlyStopping callback. You can adjust
several parameters such as monitor, patience, and min_delta.

o monitor: Metric to watch (e.g., 'val_loss', 'val_accuracy').

o patience: Number of epochs with no improvement after which training will
be stopped.

o min_delta: Minimum change in the monitored metric to qualify as an
improvement.

o mode: With 'auto', the direction is inferred from the monitored metric: for a
loss it behaves as 'min' (stop when the loss stops decreasing); for an accuracy
it behaves as 'max' (stop when the accuracy stops increasing).

early_stopping = EarlyStopping(
monitor='val_loss', # Can be 'val_loss', 'val_accuracy', etc.
patience=10, # Number of epochs with no improvement to stop training
min_delta=0.001, # Minimum change to qualify as an improvement
mode='min', # Can be 'min', 'max', or 'auto'
verbose=1
)

4. Train the Model with Early Stopping: Pass the EarlyStopping callback to
the fit method.

history = model.fit(
x_train, y_train,
epochs=100,
validation_data=(x_val, y_val),
callbacks=[early_stopping]
)

You can adjust other parameters of EarlyStopping according to your needs, for
example:

• restore_best_weights: If set to True, the model weights from the epoch with the
best value of the monitored metric are restored when training stops.

early_stopping = EarlyStopping(
monitor='val_loss',
patience=10,
min_delta=0.001,
mode='min',
verbose=1,
restore_best_weights=True
)

This configuration ensures that after stopping the training, the model will have the
best weights obtained during the training process.

Using Early Stopping with PyTorch

In PyTorch, early stopping is not built-in like in TensorFlow, but you can implement it
easily by monitoring the validation loss (or any other metric) during the
training loop. When the validation loss does not improve for a specified number of
epochs, you stop the training and optionally restore the best model weights. This
is our goal in this notebook.
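As a preview of where we are heading, here is a minimal sketch of a reusable early-stopping
helper. The class name EarlyStopping and its parameters are illustrative (PyTorch ships no
such class); the later sections implement the same idea inline in the training loop.

import copy

class EarlyStopping:
    # Illustrative helper, not part of PyTorch: tracks the best validation loss
    # and signals when training should stop.
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience        # epochs to wait for an improvement
        self.min_delta = min_delta      # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.best_state = None          # copy of the best model weights
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss, model):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model.state_dict())
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True

    def restore_best(self, model):
        if self.best_state is not None:
            model.load_state_dict(self.best_state)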

2. Setting up Our Workspace

First, we check whether a GPU is connected. The nvidia-smi command (NVIDIA System
Management Interface) is a command-line utility provided by NVIDIA for monitoring and
managing NVIDIA GPUs. It provides detailed information about the status and performance
of the GPUs, including utilization, temperature, memory usage, the processes using each
GPU, and more.
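In a Colab notebook the utility is invoked as a shell command from a code cell, for example:

# Prints driver/CUDA versions, GPU model, memory usage and running processes.
!nvidia-smi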

Setting our workspace: /content and /content/datasets

Setting our Home

We save the project root directory '/content' as HOME, since we will be navigating
through the file system and may keep multiple projects under the same HOME.
Additionally, we keep the datasets in a 'datasets' directory, so all datasets are
easily accessible from any project.
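A minimal sketch of this setup (the variable names HOME and DATASETS are assumptions,
since the original code cell is not shown):

import os

HOME = '/content'                           # project root in Colab
DATASETS = os.path.join(HOME, 'datasets')   # shared directory for all datasets
os.makedirs(DATASETS, exist_ok=True)
os.chdir(HOME)
print('HOME:', HOME)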

Mount Google Drive


Next, we import the drive module from the google.colab library, which provides the
functionality for mounting Google Drive in Google Colab.

Google Drive is then mounted and made available at the path /content/drive. The user
will be prompted to authorize access to Google Drive. Once authorized, the contents of
Google Drive are accessible from that point onwards in the Colab notebook.
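The corresponding code is short:

from google.colab import drive

# Mount Google Drive at /content/drive; Colab will prompt for authorization.
drive.mount('/content/drive')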

3. Load a Dataset (Dataloader)

Create a directory where we can save our dataset

Create the dataset directory (if it doesn't exist), where we are going to save the
dataset with which we are going to train our CNN.

Change to new directory datasets

We check whether the file specified by file already exists in the current directory. If
it does not, the code inside the conditional downloads it from the specified URL and then
extracts the contents of exp0.zip into the current directory quietly, overwriting any
existing files if necessary.
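A sketch of this step, assuming the archive is called exp0.zip (the download URL below is
a placeholder, since the original one is not shown):

import os

file = 'exp0.zip'
url = 'https://example.com/exp0.zip'   # placeholder: original URL not shown

os.chdir('/content/datasets')
if not os.path.exists(file):
    # Download the archive, then extract it quietly, overwriting existing files.
    os.system(f'wget -q {url} -O {file}')
    os.system(f'unzip -oq {file}')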

Display 8 images from a class from test

Now, we will use the matplotlib library to display multiple images in a 2x4 grid layout.
The next code cell imports the necessary modules, including matplotlib.pyplot for plotting,
os for building file paths, and PIL for image handling.

It specifies the directory containing the images and retrieves the paths of the first
8 .jpg images in that directory using os.path.join().

Then, it creates a figure with subplots arranged in a 2x4 grid and iterates through the
image paths, displaying each image in a subplot using imshow(). The title of each
subplot is set to indicate the image index, and axis labels are turned off.

After displaying all images, it adjusts the layout to prevent overlapping and shows the figure.
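A sketch of that plotting code (the image directory path is an assumption; adapt it to the
class folder you want to display):

import os
import matplotlib.pyplot as plt
from PIL import Image

image_dir = '/content/datasets/test/class_0'   # placeholder path to one test class
files = sorted(f for f in os.listdir(image_dir) if f.endswith('.jpg'))[:8]
image_paths = [os.path.join(image_dir, f) for f in files]

fig, axes = plt.subplots(2, 4, figsize=(12, 6))   # 2x4 grid
for i, (ax, path) in enumerate(zip(axes.flat, image_paths)):
    ax.imshow(Image.open(path))
    ax.set_title(f'Image {i + 1}')
    ax.axis('off')

plt.tight_layout()   # adjust spacing so titles and images do not overlap
plt.show()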

Setting a Dataloader

A DataLoader is fundamental in the context of machine learning and deep learning,
especially when working with large or complex datasets. Its main purpose is to facilitate
the efficient loading and manipulation of data during model training.
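A minimal sketch of how the loaders could be set up with torchvision's ImageFolder (the
paths, image size and batch size are assumptions, since the original cell is not shown):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),        # scales pixel values to [0, 1]
])

train_data = datasets.ImageFolder('/content/datasets/train', transform=transform)
val_set = datasets.ImageFolder('/content/datasets/val', transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=2)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)   # e.g. torch.Size([32, 3, 224, 224]) torch.Size([32])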

Normalize the dataloaders using statistics

• Normalization: Normalization is crucial for ensuring that pixel values across images are
on a similar scale [0, 1], which helps in stabilizing and speeding up the training process
of deep neural networks.

• Dataset Preparation: Each dataset (train_data, val_set, test_set) is prepared with
consistent transformations and normalization, facilitating uniformity in data processing
across the training, validation, and testing phases.
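A sketch of computing per-channel statistics on the training set and reusing them for every
split (it assumes the train_data / train_loader objects from the previous sketch):

import torch
from torchvision import transforms

# Accumulate per-channel mean and std over the training images.
mean = torch.zeros(3)
std = torch.zeros(3)
n = 0
for images, _ in train_loader:
    images = images.view(images.size(0), images.size(1), -1)
    mean += images.mean(dim=2).sum(dim=0)
    std += images.std(dim=2).sum(dim=0)
    n += images.size(0)
mean /= n
std /= n

# Reuse the same statistics for the train, validation and test transforms.
normalize = transforms.Normalize(mean.tolist(), std.tolist())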
4. Define a Convolutional Neural Network

5. Train the Network

Early Stopping

As previously explained, early stopping is a technique used in machine learning
model training to halt training before the model begins to overfit the training data.
This is done by monitoring a metric of interest on the validation set and stopping
training when the metric ceases to improve for a certain number of consecutive
epochs.

Running several training sessions in a row can overwrite ("crush") the results of a
previous run. To avoid this, you can use the Python standard library os to create a
directory named "train" in the current directory and save the trained models (.pth files)
there each time the model is trained. Here's an example of how to do it:
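A minimal sketch, assuming a trained PyTorch model object named model (the directory and
file names are illustrative):

import os
import torch

# Create a 'train' directory (if it does not exist) and save the model weights there.
os.makedirs('train', exist_ok=True)
torch.save(model.state_dict(), os.path.join('train', 'model.pth'))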

To create directories named train1, train2, etc., each time you execute a training loop,
you can modify the code to check the number of existing training directories and then
create the next directory in sequence. Here's an example of how you could do this:
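A sketch of that numbering logic (the directory naming scheme is an assumption):

import os

# Find existing 'trainN' directories and create the next one in the sequence.
existing = [d for d in os.listdir('.') if d.startswith('train') and d[5:].isdigit()]
next_idx = max((int(d[5:]) for d in existing), default=0) + 1
run_dir = f'train{next_idx}'
os.makedirs(run_dir)
print('Saving this run to', run_dir)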

Checkpoints and Early Stopping

Application checkpointing is a fault tolerance technique. In this approach, a snapshot of
the state of the system is taken in case of system failure. If there is a problem, you can
resume from the snapshot. The checkpoint may be used directly or as the starting point for
a new run, picking up where it left off. When training deep learning models, the checkpoint
captures the weights of the model. These weights can be used to make predictions as-is or
as the basis for ongoing training.

PyTorch does not provide a dedicated checkpointing function, but it has functions for
retrieving and restoring the weights of a model, so you can implement the checkpointing
logic with them. Let's make a checkpoint and a resume function, which simply save the
weights of a model and load them back:
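A sketch of such a pair of functions (the names checkpoint and resume follow the text; only
the model weights are saved here):

import torch

def checkpoint(model, filename):
    # Save the model weights (its state_dict) to disk.
    torch.save(model.state_dict(), filename)

def resume(model, filename):
    # Load previously saved weights back into the model.
    model.load_state_dict(torch.load(filename))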

Sometimes there is state outside of the model that you may want to checkpoint as well. One
particular example is the optimizer: in cases like SGD with momentum or Adam, there are
dynamically adjusted quantities such as momentum terms. If you restart your training loop,
you may want to restore the optimizer state as well. It is not difficult to do.

The idea is to extend your checkpoint() function: the torch.save() and torch.load()
functions are backed by pickle, a Python object serializer, so you can use them with a list
or dict container that bundles several states together.
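A sketch of an extended checkpoint that bundles the model weights, the optimizer state and
the epoch number into one dict (the dict key names are assumptions):

import torch

def checkpoint(model, optimizer, epoch, filename):
    # Bundle model and optimizer state (plus the current epoch) into a dict and save it.
    torch.save({
        'epoch': epoch,
        'model_state': model.state_dict(),
        'optimizer_state': optimizer.state_dict(),
    }, filename)

def resume(model, optimizer, filename):
    # Restore both the model weights and the optimizer state from the saved dict.
    state = torch.load(filename)
    model.load_state_dict(state['model_state'])
    optimizer.load_state_dict(state['optimizer_state'])
    return state['epoch']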

Displaying the model metric curves

Checkpointing is not only for fault tolerance. You can also use it to keep your best model.
How to define "best" is subjective, but using the score on the validation set is a sensible
method. Let's say we keep only the best model ever found.

The variable best_accuracy keeps track of the highest validation accuracy (val_acc)
obtained so far, expressed as a percentage from 0 to 100. Whenever a higher accuracy is
observed, the model is checkpointed to the file best_model.pth. After the entire training
loop, the best model is restored via the resume() function created earlier.
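A sketch of that loop, using the simple checkpoint()/resume() pair defined earlier;
train_one_epoch() and evaluate() are assumed helpers (not shown in this notebook) that run
one training epoch and return the validation accuracy in percent:

n_epochs = 100
best_accuracy = -1.0   # highest validation accuracy (0-100) seen so far

for epoch in range(n_epochs):
    train_one_epoch(model, train_loader, optimizer)   # assumed helper
    val_acc = evaluate(model, val_loader)             # assumed helper, accuracy in %
    if val_acc > best_accuracy:
        best_accuracy = val_acc
        checkpoint(model, 'best_model.pth')           # keep the best model so far

# After the training loop, restore the best weights found.
resume(model, 'best_model.pth')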

Afterward, you can make predictions with the model on unseen data. Beware that if you're
using a different metric for checkpointing, e.g., the cross-entropy loss, a better model
comes with a lower cross entropy, so you should keep track of the lowest cross entropy
obtained.

Defining Early Stopping with Best Model

The training loop can be modified as follows: Keep track of the best model based on
the validation metric. If the current model is better than the previously saved best
model, update the best model.

Best Model and Early Stopping

You can also checkpoint the model unconditionally at every epoch alongside the best-model
checkpointing, since you are free to create multiple checkpoint files. Because the code
above finds the best model and makes a copy of it, a common further optimization of the
training loop is to stop it early when the hope of seeing further improvement is slim. This
is the early stopping technique, and it can save training time.

The code above validates the model on the validation set at the end of each epoch and keeps
the best model found in a checkpoint file. The simplest strategy for early stopping is to
set a threshold of epochs (the patience): if the model has not improved over that many
consecutive epochs, you terminate the training loop early. This can be implemented as
follows:
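A sketch of that strategy on top of the best-model loop above (the helpers, the checkpoint
file name and the patience value are assumptions, as before):

n_epochs = 100
patience = 10          # number of epochs without improvement before stopping
best_accuracy = -1.0
epochs_no_improve = 0

for epoch in range(n_epochs):
    train_one_epoch(model, train_loader, optimizer)   # assumed helper
    val_acc = evaluate(model, val_loader)             # assumed helper, accuracy in %

    if val_acc > best_accuracy:
        best_accuracy = val_acc
        epochs_no_improve = 0
        checkpoint(model, 'best_model.pth')
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print(f'Early stopping at epoch {epoch}: no improvement for {patience} epochs')
            break

# Restore the best weights before validating / testing.
resume(model, 'best_model.pth')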

6. Validating our model
