
NB4-08: Biomedical Imaging and PyTorch III

Goal: In this notebook we use early stopping: when the model stops improving for a certain
number of epochs, we halt training. We will use a skin lesion dataset with two classes.

1. Introduction

What is Early Stopping?

Early stopping is a widely used regularization technique in machine learning and
deep learning aimed at preventing overfitting during the training process. Here are
some key points about early stopping:

1. Purpose: The main goal of early stopping is to halt the training of a model once
it becomes apparent that the model's performance on a validation set is no
longer improving. This helps in ensuring that the model generalizes well to new,
unseen data rather than just fitting the training data.

2. Mechanism: During training, the model's performance is periodically evaluated
on a separate validation dataset. If the validation performance does not
improve for a certain number of consecutive epochs (a parameter known as
"patience"), the training is stopped. The best weights observed during training
are then restored.

3. Parameters:

o Patience: This parameter defines the number of epochs to wait for an
improvement before stopping. If set to 5, for example, training will stop if
there is no improvement in validation performance for 5 epochs.

o Minimum Delta: This parameter sets the minimum change in the
monitored metric to qualify as an improvement. This helps to filter out
small changes that are not significant.

4. Benefits:

o Prevents Overfitting: By stopping early, the model is less likely to fit
the noise in the training data, leading to better generalization on unseen
data.

o Efficiency: Saves computational resources by not continuing training
when improvements are unlikely.

o Simplicity: Easy to implement and does not require modification of the
underlying model architecture.

5. Implementation: Early stopping is typically implemented using callbacks in
deep learning frameworks such as Keras and TensorFlow; in PyTorch it is usually
written by hand in the training loop. In either case, the logic monitors the
validation metrics and stops training when the criteria are met.

6. Limitations: While early stopping is effective, it may not always be the best
choice for every model or dataset. For some complex problems, more
sophisticated regularization techniques might be required alongside early
stopping.

Using Early Stopping with TensorFlow

In TensorFlow, tf.keras.callbacks.EarlyStopping is used to implement early stopping.
The steps to use early stopping in TensorFlow are:

1. Import Required Libraries: First, import TensorFlow and other relevant
packages.

import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

2. Define the Model: Define your TensorFlow/Keras model as usual.

model = tf.keras.Sequential([
tf.keras.layers.Dense(64, activation='relu', input_shape=(input_shape,)),
tf.keras.layers.Dense(64, activation='relu'),
tf.keras.layers.Dense(1)
])

model.compile(optimizer='adam', loss='mse', metrics=['mae'])

3. Configure Early Stopping: Set up the EarlyStopping callback. You can adjust
several parameters such as monitor, patience, and min_delta.

o monitor: Metric to watch (e.g., 'val_loss', 'val_accuracy').

o patience: Number of epochs with no improvement after which training will
be stopped.

o min_delta: Minimum change in the monitored metric to qualify as an
improvement.

o mode: With 'auto', the direction is inferred from the monitored metric: for a
loss it behaves as 'min' (stop when the loss stops decreasing); for an accuracy
it behaves as 'max' (stop when the accuracy stops increasing).

early_stopping = EarlyStopping(
monitor='val_loss', # Can be 'val_loss', 'val_accuracy', etc.
patience=10, # Number of epochs with no improvement to stop training
min_delta=0.001, # Minimum change to qualify as an improvement
mode='min', # Can be 'min', 'max', or 'auto'
verbose=1
)

4. Train the Model with Early Stopping: Pass the EarlyStopping callback to
the fit method.

history = model.fit(
x_train, y_train,
epochs=100,
validation_data=(x_val, y_val),
callbacks=[early_stopping]
)

You can adjust other parameters of EarlyStopping according to your needs, for
example:

• restore_best_weights: If set to True, the model weights from the epoch with the
best value of the monitored metric are restored when training stops.

early_stopping = EarlyStopping(
monitor='val_loss',
patience=10,
min_delta=0.001,
mode='min',
verbose=1,
restore_best_weights=True
)

This configuration ensures that after stopping the training, the model will have the
best weights obtained during the training process.

Using Early Stopping with PyTorch

In PyTorch, early stopping is not built-in like in TensorFlow, but you can implement it
easily by monitoring the validation loss (or any other metric) during the
training loop. When the validation loss does not improve for a specified number of
epochs, you stop the training and optionally restore the best model weights. This
is our goal in this notebook.
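As a preview of where we are heading, here is a minimal sketch of a reusable early-stopping
helper. The class name EarlyStopping and its parameters are illustrative (PyTorch ships no
such class); the later sections implement the same idea inline in the training loop.

import copy

class EarlyStopping:
    # Illustrative helper, not part of PyTorch: tracks the best validation loss
    # and signals when training should stop.
    def __init__(self, patience=5, min_delta=0.0):
        self.patience = patience        # epochs to wait for an improvement
        self.min_delta = min_delta      # minimum decrease that counts as improvement
        self.best_loss = float('inf')
        self.best_state = None          # copy of the best model weights
        self.counter = 0
        self.should_stop = False

    def step(self, val_loss, model):
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_state = copy.deepcopy(model.state_dict())
            self.counter = 0
        else:
            self.counter += 1
            if self.counter >= self.patience:
                self.should_stop = True

    def restore_best(self, model):
        if self.best_state is not None:
            model.load_state_dict(self.best_state)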

2. Setting up Our Workspace

First, we check whether a GPU is connected. The nvidia-smi command (NVIDIA System
Management Interface) is a command-line utility provided by NVIDIA for monitoring and
managing NVIDIA GPUs. It provides detailed information about the status and performance
of the GPUs, including utilization, temperature, memory usage, the processes using each
GPU, and more.
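In a Colab notebook the utility is invoked as a shell command from a code cell, for example:

# Prints driver/CUDA versions, GPU model, memory usage and running processes.
!nvidia-smi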

Setting our workspace: /content and /content/datasets

Setting our Home

We save the project root directory '/content' as HOME, since we will be navigating
through the file system and may keep multiple projects under the same HOME.
Additionally, we keep the datasets in a 'datasets' directory, so all datasets are
easily accessible from any project.
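A minimal sketch of this setup (the variable names HOME and DATASETS are assumptions,
since the original code cell is not shown):

import os

HOME = '/content'                           # project root in Colab
DATASETS = os.path.join(HOME, 'datasets')   # shared directory for all datasets
os.makedirs(DATASETS, exist_ok=True)
os.chdir(HOME)
print('HOME:', HOME)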

Mount Google Drive


Next, we import the drive module from the google.colab library, which provides the
functionality for mounting Google Drive in Google Colab.

Google Drive is then mounted and made available at the path /content/drive. The user
will be prompted to authorize access to Google Drive. Once authorized, the contents of
Google Drive are accessible from that point onwards in the Colab notebook.
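The corresponding code is short:

from google.colab import drive

# Mount Google Drive at /content/drive; Colab will prompt for authorization.
drive.mount('/content/drive')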

3. Load a Dataset (Dataloader)

Create a directory where we can save our dataset

Create the dataset directory (if it doesn't exist), where we are going to save the
dataset with which we are going to train our CNN.

Change to new directory datasets

We check whether the file specified by file already exists in the current directory. If
it does not, the code inside the conditional downloads it from the specified URL and then
extracts the contents of exp0.zip into the current directory quietly, overwriting any
existing files if necessary.
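A sketch of this step, assuming the archive is called exp0.zip (the download URL below is
a placeholder, since the original one is not shown):

import os

file = 'exp0.zip'
url = 'https://example.com/exp0.zip'   # placeholder: original URL not shown

os.chdir('/content/datasets')
if not os.path.exists(file):
    # Download the archive, then extract it quietly, overwriting existing files.
    os.system(f'wget -q {url} -O {file}')
    os.system(f'unzip -oq {file}')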

Display 8 images from a class from test

Now, we will use the matplotlib library to display multiple images in a 2x4 grid layout.
The next code cell imports the necessary modules, including matplotlib.pyplot for plotting,
os for building file paths, and PIL for image handling.

It specifies the directory containing the images and retrieves the paths of the first
8 .jpg images in that directory using os.path.join().

Then, it creates a figure with subplots arranged in a 2x4 grid and iterates through the
image paths, displaying each image in a subplot using imshow(). The title of each
subplot is set to indicate the image index, and axis labels are turned off.

After displaying all images, it adjusts the layout to prevent overlapping and shows the figure.
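A sketch of that plotting code (the image directory path is an assumption; adapt it to the
class folder you want to display):

import os
import matplotlib.pyplot as plt
from PIL import Image

image_dir = '/content/datasets/test/class_0'   # placeholder path to one test class
files = sorted(f for f in os.listdir(image_dir) if f.endswith('.jpg'))[:8]
image_paths = [os.path.join(image_dir, f) for f in files]

fig, axes = plt.subplots(2, 4, figsize=(12, 6))   # 2x4 grid
for i, (ax, path) in enumerate(zip(axes.flat, image_paths)):
    ax.imshow(Image.open(path))
    ax.set_title(f'Image {i + 1}')
    ax.axis('off')

plt.tight_layout()   # adjust spacing so titles and images do not overlap
plt.show()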

Setting a Dataloader

A DataLoader is fundamental in the context of machine learning and deep learning,
especially when working with large or complex datasets. Its main purpose is to facilitate
the efficient loading and manipulation of data during model training.
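A minimal sketch of how the loaders could be set up with torchvision's ImageFolder (the
paths, image size and batch size are assumptions, since the original cell is not shown):

from torch.utils.data import DataLoader
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),        # scales pixel values to [0, 1]
])

train_data = datasets.ImageFolder('/content/datasets/train', transform=transform)
val_set = datasets.ImageFolder('/content/datasets/val', transform=transform)

train_loader = DataLoader(train_data, batch_size=32, shuffle=True, num_workers=2)
val_loader = DataLoader(val_set, batch_size=32, shuffle=False, num_workers=2)

images, labels = next(iter(train_loader))
print(images.shape, labels.shape)   # e.g. torch.Size([32, 3, 224, 224]) torch.Size([32])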

Normalize the dataloaders using statistics

• Normalization: Normalization is crucial for ensuring that pixel values across images are
on a similar scale [0, 1], which helps in stabilizing and speeding up the training process
of deep neural networks.

• Dataset Preparation: Each dataset (train_data, val_set, test_set) is prepared with
consistent transformations and normalization, facilitating uniformity in data processing
across the training, validation, and testing phases.
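A sketch of computing per-channel statistics on the training set and reusing them for every
split (it assumes the train_data / train_loader objects from the previous sketch):

import torch
from torchvision import transforms

# Accumulate per-channel mean and std over the training images.
mean = torch.zeros(3)
std = torch.zeros(3)
n = 0
for images, _ in train_loader:
    images = images.view(images.size(0), images.size(1), -1)
    mean += images.mean(dim=2).sum(dim=0)
    std += images.std(dim=2).sum(dim=0)
    n += images.size(0)
mean /= n
std /= n

# Reuse the same statistics for the train, validation and test transforms.
normalize = transforms.Normalize(mean.tolist(), std.tolist())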
4. Define a Convolutional Neural Network

5. Train the Network

Early Stopping

As previously explained, early stopping is a technique used in machine learning
model training to halt training before the model begins to overfit the training data.
This is done by monitoring a metric of interest on the validation set and stopping
training when the metric ceases to improve for a certain number of consecutive
epochs.

Running several training sessions in a row can overwrite ("crush") the results of a
previous run. To avoid this, you can use the Python standard library os to create a
directory named "train" in the current directory and save the trained models (.pth files)
there each time the model is trained. Here's an example of how to do it:
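A minimal sketch, assuming a trained PyTorch model object named model (the directory and
file names are illustrative):

import os
import torch

# Create a 'train' directory (if it does not exist) and save the model weights there.
os.makedirs('train', exist_ok=True)
torch.save(model.state_dict(), os.path.join('train', 'model.pth'))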

To create directories named train1, train2, etc., each time you execute a training loop,
you can modify the code to check the number of existing training directories and then
create the next directory in sequence. Here's an example of how you could do this:
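A sketch of that numbering logic (the directory naming scheme is an assumption):

import os

# Find existing 'trainN' directories and create the next one in the sequence.
existing = [d for d in os.listdir('.') if d.startswith('train') and d[5:].isdigit()]
next_idx = max((int(d[5:]) for d in existing), default=0) + 1
run_dir = f'train{next_idx}'
os.makedirs(run_dir)
print('Saving this run to', run_dir)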

Checkpoints and Early Stopping

Application checkpointing is a fault tolerance technique. In this approach, a snapshot of
the state of the system is taken in case of system failure. If there is a problem, you can
resume from the snapshot. The checkpoint may be used directly or as the starting point for
a new run, picking up where it left off. When training deep learning models, the checkpoint
captures the weights of the model. These weights can be used to make predictions as-is or
as the basis for ongoing training.

PyTorch does not provide a dedicated checkpointing function, but it has functions for
retrieving and restoring the weights of a model, so you can implement the checkpointing
logic with them. Let's make a checkpoint and a resume function, which simply save the
weights of a model and load them back:
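A sketch of such a pair of functions (the names checkpoint and resume follow the text; only
the model weights are saved here):

import torch

def checkpoint(model, filename):
    # Save the model weights (its state_dict) to disk.
    torch.save(model.state_dict(), filename)

def resume(model, filename):
    # Load previously saved weights back into the model.
    model.load_state_dict(torch.load(filename))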

Sometimes there is state outside of the model that you may want to checkpoint as well. One
particular example is the optimizer: in cases like SGD with momentum or Adam, there are
dynamically adjusted quantities such as momentum terms. If you restart your training loop,
you may want to restore the optimizer state as well. It is not difficult to do.

The idea is to extend your checkpoint() function: the torch.save() and torch.load()
functions are backed by pickle, a Python object serializer, so you can use them with a list
or dict container that bundles several states together.
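A sketch of an extended checkpoint that bundles the model weights, the optimizer state and
the epoch number into one dict (the dict key names are assumptions):

import torch

def checkpoint(model, optimizer, epoch, filename):
    # Bundle model and optimizer state (plus the current epoch) into a dict and save it.
    torch.save({
        'epoch': epoch,
        'model_state': model.state_dict(),
        'optimizer_state': optimizer.state_dict(),
    }, filename)

def resume(model, optimizer, filename):
    # Restore both the model weights and the optimizer state from the saved dict.
    state = torch.load(filename)
    model.load_state_dict(state['model_state'])
    optimizer.load_state_dict(state['optimizer_state'])
    return state['epoch']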

Displaying the model metric curves

Checkpointing is not only for fault tolerance. You can also use it to keep your best model.
How to define "best" is subjective, but using the score on the validation set is a sensible
method. Let's say we keep only the best model ever found.

The variable best_accuracy keeps track of the highest validation accuracy (val_acc)
obtained so far, expressed as a percentage from 0 to 100. Whenever a higher accuracy is
observed, the model is checkpointed to the file best_model.pth. After the entire training
loop, the best model is restored via the resume() function created earlier.
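A sketch of that loop, using the simple checkpoint()/resume() pair defined earlier;
train_one_epoch() and evaluate() are assumed helpers (not shown in this notebook) that run
one training epoch and return the validation accuracy in percent:

n_epochs = 100
best_accuracy = -1.0   # highest validation accuracy (0-100) seen so far

for epoch in range(n_epochs):
    train_one_epoch(model, train_loader, optimizer)   # assumed helper
    val_acc = evaluate(model, val_loader)             # assumed helper, accuracy in %
    if val_acc > best_accuracy:
        best_accuracy = val_acc
        checkpoint(model, 'best_model.pth')           # keep the best model so far

# After the training loop, restore the best weights found.
resume(model, 'best_model.pth')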

Afterward, you can make predictions with the model on unseen data. Beware that if you're
using a different metric for checkpointing, e.g., the cross-entropy loss, a better model
comes with a lower cross entropy, so you should keep track of the lowest cross entropy
obtained.

Defining Early Stopping with Best Model

The training loop can be modified as follows: Keep track of the best model based on
the validation metric. If the current model is better than the previously saved best
model, update the best model.

Best Model and Early Stopping

You can also checkpoint the model unconditionally at every epoch alongside the best-model
checkpointing, since you are free to create multiple checkpoint files. Because the code
above finds the best model and makes a copy of it, a common further optimization of the
training loop is to stop it early when the hope of seeing further improvement is slim. This
is the early stopping technique, and it can save training time.

The code above validates the model on the validation set at the end of each epoch and keeps
the best model found in a checkpoint file. The simplest strategy for early stopping is to
set a threshold of epochs (the patience): if the model has not improved over that many
consecutive epochs, you terminate the training loop early. This can be implemented as
follows:
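A sketch of that strategy on top of the best-model loop above (the helpers, the checkpoint
file name and the patience value are assumptions, as before):

n_epochs = 100
patience = 10          # number of epochs without improvement before stopping
best_accuracy = -1.0
epochs_no_improve = 0

for epoch in range(n_epochs):
    train_one_epoch(model, train_loader, optimizer)   # assumed helper
    val_acc = evaluate(model, val_loader)             # assumed helper, accuracy in %

    if val_acc > best_accuracy:
        best_accuracy = val_acc
        epochs_no_improve = 0
        checkpoint(model, 'best_model.pth')
    else:
        epochs_no_improve += 1
        if epochs_no_improve >= patience:
            print(f'Early stopping at epoch {epoch}: no improvement for {patience} epochs')
            break

# Restore the best weights before validating / testing.
resume(model, 'best_model.pth')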

6. Validating our model
