Learning Algorithms for Internet of Things
Applying Python Tools to Improve Data Collection Use for System Performance
G.R. Kanagachidambaresan
N. Bharathi
Maker Innovations Series
Jump start your path to discovery with the Apress Maker Innovations
series! From the basics of electricity and components through to the most
advanced options in robotics and Machine Learning, you’ll forge a path to
building ingenious hardware and controlling it with cutting-edge software.
All while gaining new skills and experience with common toolsets you can
take to new projects or even into a whole new career.
The Apress Maker Innovations series offers project-based learning,
while keeping theory and best processes front and center. So you get
hands-on experience while also learning the terms of the trade and how
entrepreneurs, inventors, and engineers think through creating and
executing hardware projects. You can learn to design circuits, program AI,
create IoT systems for your home or even city, and so much more!
Whether you’re a beginning hobbyist or a seasoned entrepreneur
working out of your basement or garage, you’ll scale up your skillset to
become a hardware design and engineering pro, often using low-cost and
open-source hardware and software such as the Raspberry Pi, Arduino, PIC
microcontrollers, and the Robot Operating System (ROS). Programmers and
software engineers have great opportunities to learn, too, as many projects
and control environments are based in popular languages and operating
systems, such as Python and Linux.
If you want to build a robot, set up a smart home, tackle assembling a
weather-ready meteorology system, or create a brand-new circuit using
breadboards and circuit design software, this series has all that and more!
Written by creative and seasoned Makers, every book in the series tackles
both tested and leading-edge approaches and technologies for bringing
your visions and projects to life.
G.R. Kanagachidambaresan
N. Bharathi
Learning Algorithms for Internet of Things: Applying Python Tools to
Improve Data Collection Use for System Performance
G.R. Kanagachidambaresan, Vel Tech Dr. RR & Dr. SR Technical University, Chennai, Tamil Nadu, India
N. Bharathi, Chennai, Tamil Nadu, India
Table of Contents
Acknowledgments ... xvii
Preface ... xix
2.2 TensorFlow ... 28
2.2.1 Features of TensorFlow ... 29
2.2.2 Modules of TensorFlow ... 30
2.3 PyTorch ... 34
2.3.1 Features of PyTorch ... 35
2.3.2 Libraries in PyTorch ... 37
2.3.3 Modules in PyTorch ... 38
2.4 SciPy ... 42
2.4.1 Features of SciPy ... 42
2.4.2 Modules in SciPy ... 43
2.5 Theano ... 45
2.5.1 Features of Theano ... 46
2.5.2 Modules in Theano ... 47
2.6 Pandas ... 50
2.6.1 Features of Pandas ... 50
2.6.2 Modules in Pandas ... 52
2.7 Matplotlib ... 53
2.7.1 Features of Matplotlib ... 53
2.7.2 Modules in Matplotlib ... 55
2.8 Scikit-learn ... 61
2.8.1 Features of Scikit-learn ... 61
2.8.2 Modules in Scikit-learn ... 62
2.9 Seaborn ... 66
2.9.1 Features of Seaborn ... 67
2.9.2 Modules in Seaborn ... 68
2.10 OpenCV ... 71
2.10.1 Features of OpenCV ... 71
2.10.2 Modules in OpenCV ... 73
2.11 Summary ... 75
Chapter 3: Supervised Algorithms ... 77
3.1 Introduction ... 77
3.2 Regression ... 77
3.2.1 Linear Regression ... 78
3.2.2 Polynomial Regression ... 78
3.2.3 Bayesian Linear Regression ... 79
3.2.4 Ridge Regression ... 80
3.2.5 Lasso Regression ... 81
3.2.6 Case Study with Medical Applications ... 81
3.3 Classification ... 92
3.3.1 Logistic Regression ... 93
3.3.2 Decision Trees ... 94
3.3.3 Naïve Bayes ... 94
3.3.4 Random Forest ... 95
3.3.5 Support Vector Machines ... 96
3.3.6 Case Study with Agriculture ... 97
Chapter 4: Unsupervised Algorithms ... 109
4.1 Introduction ... 109
4.2 K-Means Clustering ... 110
4.3 Hierarchical Clustering ... 112
4.4 Principal Component Analysis ... 114
4.5 Independent Component Analysis ... 115
4.6 Anomaly Detection ... 118
Chapter 6: Artificial Neural Networks for IoT ... 177
6.1 Introduction to Artificial Neural Networks (ANNs) ... 177
6.2 Architecture of ANN ... 179
6.3 Activation Function ... 180
6.4 Loss Function ... 185
6.5 Types of Artificial Neural Network Architectures ... 191
6.5.1 Feed-Forward ANN ... 191
6.5.2 Feedback Networks ... 203
6.5.3 Unsupervised ANNs ... 207
6.6 Summary ... 208
Chapter 7: Convolutional Neural Networks for IoT ... 209
7.1 Introduction ... 209
7.2 General Architecture of CNN ... 213
7.3 Types of CNNs ... 215
7.4 Case Study for Computer Vision ... 222
7.5 Summary ... 231
Chapter 8: RNNs, LSTMs, and GANs ... 233
8.1 Introduction ... 233
8.2 Recurrent Neural Networks ... 233
8.3 Long Short-Term Memory (LSTM) ... 238
8.4 Bidirectional LSTM Model ... 240
8.5 Generative Adversarial Networks (GANs) ... 242
8.6 Application Case Study ... 246
8.7 Summary ... 249
Chapter 9: Optimization Methods ... 251
9.1 Introduction ... 251
9.2 Gradient Descent ... 252
9.3 Batch Gradient Descent ... 255
9.4 Stochastic Gradient Descent ... 258
9.5 Mini-Batch Gradient Descent ... 261
9.6 Adagrad ... 263
9.7 RMSProp ... 267
9.8 Adadelta ... 271
9.9 Momentum ... 275
9.10 Nesterov Momentum ... 279
9.11 Adam ... 283
9.12 Adamax ... 287
9.13 SMORMS3 ... 291
9.14 Summary ... 292
Index ... 293
About the Authors
G.R. Kanagachidambaresan is a professor
in the Department of Computer Science
and Engineering at Vel Tech Rangarajan Dr.
Sagunthala R&D Institute of Science and
Technology. He completed his PhD in 2017 at
Anna University and currently handles funded
projects from ISRO, DBT, and DRDO. He has
written books and articles on the topics of IoT,
wireless networks, and expert systems. He has
also consulted for leading multinational companies. He
is the managing director for Eazythings Technology Private Limited and
a TEC committee member for DBT. He is also the editor in chief for the
Next Generation Computing and Communication Engineering series from
Wiley.
About the Technical Reviewer
Massimo Nardone has three decades
of experience in security, web/mobile
development, and cloud and IT/OT/IoT
architecture. His true passions are security
and Android. He has been programming and
teaching how to program with Android, Perl,
PHP, Java, VB, Python, C/C++, and MySQL for
more than 30 years. He holds a master’s degree
in computing science from the University
of Salerno, Italy. He has worked as a chief
information security officer (CISO), software
engineer, chief security architect, security executive, OT/IoT/IIoT security
leader, and security architect for many years. He is currently the VP of OT
security for SSH Communications Security.
Acknowledgments
Our sincere thanks to Jessica Vakili, Shobana Srinivasan, and the entire
Apress team. Our sincere thanks to Vel Tech and SRM management and
DBT – INDIA (BT/PR47509/AAQ/3/1058/2022).
Preface
The way we connect with our surroundings, gadgets, and systems has
completely changed as a result of the Internet of Things’ (IoT’s) exponential
rise. An enormous quantity of data is being generated every day due to
the widespread use of smart devices and sensors, providing previously
unheard-of opportunities to improve efficiency, automation, and decision-
making in a variety of fields. But rather than stopping at data collection
and conveyance, the real promise of IoT is in turning that data into insights
that can be put to use. This is when learning algorithms—which include
methods from both deep learning and machine learning—come into play.
To construct intelligent systems that can self-optimize, foresee
the future, and adapt to changing environments, this book explores
the integration of learning algorithms with the Internet of Things.
We can create smarter systems that tackle a variety of societal issues,
from transportation and smart cities to healthcare and agriculture, by
combining the advantages of IoT with machine learning. The deliberate
use of learning algorithms makes the idea of “smartness” more than just a
catchphrase; it becomes an actual reality.
This book offers readers a thorough understanding of how learning
algorithms can improve real-time IoT applications. With its coverage of
fundamental ideas, useful applications, and real-world scenarios, it offers
a comprehensive method for utilizing these technologies to boost system
efficiency and data use. Code samples and thorough descriptions showing
how to apply these techniques for different applications are especially
helpful for researchers and developers.
Funding Information
Part of this book is supported by India's Department of Biotechnology
(BT/PR47509/AAQ/3/1058/2022).
CHAPTER 1
Introduction to Learning Algorithms
Learning algorithms are capable of extracting features from input data
and gaining the intelligence to identify and predict new input data. This is
similar to how a human learns from birth. The training data contains the
question as well as the answer, such as an image of an object and its name.
Just as a baby learns the names of objects while growing up, computer
programs implemented with mathematical and logical computations learn
from such examples and act as learning algorithms.
Generally, the learning algorithms can be machine learning
algorithms, deep learning algorithms, genetic algorithms, and supporting
optimizers. The commonality behind all the learning algorithms is that
they extract information from the input training data and apply the gained
knowledge to make predictions and identify new input data.
Machine learning algorithms enable computers to gain knowledge from
input data automatically. Past data fed as input is used to train
mathematical models in order to predict future data. The building blocks
of deep learning algorithms are artificial neural networks, which form the
basis for computation and learn the features of the data. As the number of
layers in an artificial neural network increases appropriately, the accuracy
increases, and the algorithm learns the features with fewer resources.
It computes the difference between the actual output and the predicted
output and aggregates it as a single value. The popular machine learning
algorithms are logistic regression, linear regression, naïve Bayes, decision
tree, random forest, K-nearest neighbor (KNN) classification, support
vector machine (SVM), and gradient boosting. These are categorized
under supervised machine learning. K-means, Apriori, hierarchical
clustering, anomaly detection, principal component analysis, and
independent component analysis are unsupervised machine learning
algorithms.
Machine learning is used in diverse domains such as face or fingerprint
recognition, product recommendations in online stores, friend suggestions
on Facebook, and profile recommendations on LinkedIn. Many top
companies employ machine learning techniques to study customers'
interests from large volumes of data and recommend products. The
following are more use cases:
the learning of complex datasets and their patterns. Finally, the output
layer combines the results produced by the hidden layers based on the
requirement of the task, as shown in Figure 1-2. Though deep neural
networks are computationally intensive, they give more accurate results
than shallow neural networks. Hence, deep neural networks are most
desirable for applications involving real-time activities such as natural
language processing, image and video processing, and multivariate time
series-based and real-time forecasting applications.
The two different models based on the flow of information in a deep
neural network are forward propagation and backpropagation. In the
former model, the input signal moves toward the output layer through the
hidden layers one by one. As the name implies, the signal moves only
in the forward direction. Backpropagation, the second model, uses the
chain rule to examine the contribution of each neuron to the output as
well as to the error. The error value is then fed back through the network,
and the weights of the neurons are adjusted to reduce the error in the
output. In addition, there exist optimization techniques that also contribute
to the reduction of errors in backpropagation models. You can find a
detailed discussion of optimization techniques later in this chapter.
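As a concrete illustration of the two flows, the following minimal NumPy sketch (an illustration written for this discussion, with arbitrary array sizes and learning rate) pushes one input through a single hidden layer and then applies the chain rule to obtain the weight adjustments.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# One training example: 3 input values and 1 target output
x = np.array([0.5, 0.1, 0.9])
y = np.array([1.0])

rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 4))   # input-to-hidden weights
W2 = rng.normal(size=(4, 1))   # hidden-to-output weights

# Forward propagation: the signal moves only toward the output layer
h = sigmoid(x @ W1)            # hidden-layer activations
y_hat = sigmoid(h @ W2)        # predicted output

# Backpropagation: the chain rule gives each weight's share of the error
error = y_hat - y
delta_out = error * y_hat * (1 - y_hat)
delta_hid = (delta_out @ W2.T) * h * (1 - h)

learning_rate = 0.1
W2 -= learning_rate * np.outer(h, delta_out)   # adjust weights to reduce the error
W1 -= learning_rate * np.outer(x, delta_hid)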
Apart from the two models, two more functions used in deep neural
networks play key roles: the activation function and the loss function.
The activation function converts the values of the input data into a form
that the deep neural network can work with. For example, the sigmoid
activation function, which is an "S"-shaped curve, takes any real number
as input and produces output in the range of 0 to 1. Other activation
functions also exist, such as tanh, linear, etc. A loss function is a
performance measure that indicates how well a neural network produces
output. Mean absolute error, mean square error, cross entropy, etc., are
some of the loss functions that compute the loss, or difference between
the actual and predicted values, in deep neural networks and deep
learning algorithms.
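The snippet below (a small illustration with made-up numbers, not tied to any dataset used later) evaluates a few of the activation and loss functions mentioned above using plain NumPy.

import numpy as np

z = np.array([-2.0, 0.0, 3.0])

sigmoid = 1 / (1 + np.exp(-z))   # squashes any real number into (0, 1)
tanh = np.tanh(z)                # squashes values into (-1, 1)
relu = np.maximum(0, z)          # passes positive values, zeroes out the rest

y_true = np.array([1.0, 0.0, 1.0])
y_pred = np.array([0.9, 0.2, 0.7])

mae = np.mean(np.abs(y_true - y_pred))           # mean absolute error
mse = np.mean((y_true - y_pred) ** 2)            # mean square error
bce = -np.mean(y_true * np.log(y_pred) +
               (1 - y_true) * np.log(1 - y_pred))  # binary cross entropy

print(sigmoid, tanh, relu)
print(mae, mse, bce)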
The algorithms that are most common in deep learning are
convolutional neural networks (CNNs), recurrent neural networks
(RNNs), and long short-term memory networks (LSTMs). In addition,
generative adversarial networks (GANs), multilayer perceptrons (MLPs),
radial basis function networks (RBFNs), deep belief networks (DBNs),
restricted Boltzmann machines (RBMs), autoencoders, self-organizing
maps (SOMs), and transfer learning algorithms are also used in specific
applications of deep learning.
The applications that use the previously listed deep learning algorithms
are as follows: CNNs perform well when processing satellite images and
detecting anomalies. RNNs are used for natural language processing
and handwritten character recognition. LSTMs are good at time-series
prediction and music composition. GANs support rendering 3D objects
and creating cartoon characters. MLPs are the best choice for speech
recognition and machine translation. RBFNs are good at classification and
regression. DBNs perform well in video recognition and motion capture.
RBMs are used in collaborative filtering and feature learning. Autoencoders
are well suited for pharmaceutical discovery and popularity prediction.
SOMs help in understanding high-dimensional data through visualization.
1.4 Optimization
Optimization is generally a mathematical technique with an objective
function that is either maximized or minimized, depending on the nature
of the problem to be solved. The properties of the variables involved in the
problem determine the type of approach used to solve it. In the machine
learning and deep learning context, optimizers are algorithms that modify
the learning rates and the weights assigned in the neural network in order
to reduce the loss or improve the results. Hence, optimizers are used to
solve minimization problems. The optimization is done by comparing the
results of every iteration while changing the parameters involved in the
model until the optimum results are generated. The basic terms you need
to understand at this juncture are sample, epoch, and batch. A sample is a
single entry or row in the given dataset. An epoch denotes one complete
iteration of the algorithm over the entire training dataset. A batch
represents the number of samples chosen for updating the attributes or
parameters of the model.
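To make the three terms concrete, assume (purely for illustration) a training set of 1,000 samples and a batch size of 50; the relationship can then be written out directly.

num_samples = 1000   # a sample is a single row of the dataset
batch_size = 50      # samples used for one parameter update
epochs = 10          # passes over the entire training dataset

updates_per_epoch = num_samples // batch_size   # 20 batches per epoch
total_updates = updates_per_epoch * epochs      # 200 parameter updates in all
print(updates_per_epoch, total_updates)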
There are various algorithms for optimizing the model and its results.
The two main optimizers, which are the simplest and oldest techniques,
are called gradient descent and stochastic gradient descent. Before
discussing the optimizers, the fundamental terms everyone should be
aware of are global maximum, global minimum, local maximum, and
local minimum. In a mathematical function, the smallest value within a
given range, rather than over the entire series, is called a local minimum,
and the largest value in such a range of the function is called a local
maximum. Alternatively, the smallest value over the entire series of the
function is called the global minimum, and the largest is the global
maximum. Generally, more than one local minimum and local maximum
are possible, but there exists only one global minimum and one global
maximum, as in Figure 1-3.
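Although the individual optimizers are discussed in detail later, a bare-bones gradient descent loop (a sketch written for illustration, minimizing an arbitrarily chosen one-variable function) shows how a parameter is nudged step by step toward a minimum.

# Minimize f(w) = (w - 3)^2, whose global minimum is at w = 3
def gradient(w):
    return 2 * (w - 3)          # derivative of f with respect to w

w = 0.0                         # initial guess
learning_rate = 0.1
for step in range(100):
    w -= learning_rate * gradient(w)   # move against the gradient

print(round(w, 4))              # approaches 3.0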
1.5 Summary
This chapter provided fundamental knowledge of learning algorithms.
With this basic knowledge, anyone can proceed with programming the
learning algorithms. Programming learning algorithms involves calling
methods that already implement the algorithms; a given dataset is passed
as a parameter along with the other hyperparameters required by the
algorithm. In the next chapter, the packages that offer the learning
algorithm methods are discussed.
CHAPTER 2
Python Packages for Learning Algorithms
2.1 Keras
Keras is an open-source software library that serves as an application
programming interface (API) to the TensorFlow platform. It is an easy-to-
use API that simplifies a developer’s work by providing the building blocks
for machine learning and deep learning implementations.
The word Keras has its origin in the Greek word κέρας, which
means horn. Keras was developed mainly for research work on the Open-
ended Neuro-Electronic Intelligent Robot Operating System (ONEIROS).
Now the Keras API is used for developing software in many well-known
companies such as YouTube, Twitter, NASA, Waymo, etc.
© G.R. Kanagachidambaresan and N. Bharathi 2024
G.R. Kanagachidambaresan and N. Bharathi, Learning Algorithms for Internet of Things,
Maker Innovations Series, https://doi.org/10.1007/979-8-8688-0530-1_2
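As a quick taste of the API, the following minimal sketch (the layer sizes and the random stand-in data are placeholders chosen only for illustration; the Keras bundled with TensorFlow is assumed) stacks a few of the building blocks described below into a small classifier.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# A tiny model assembled from Keras building blocks
model = keras.Sequential([
    keras.Input(shape=(4,)),                  # four input features
    layers.Dense(8, activation="relu"),       # densely connected hidden layer
    layers.Dense(3, activation="softmax"),    # three output classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Random data stands in for a real sensor dataset
X = np.random.rand(100, 4)
y = np.random.randint(0, 3, size=100)
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
print(model.predict(X[:2]))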
The base Layer class This forms the base from which all other layers are
developed by inheriting it. A layer is basically a callable
object that consumes tensors as input and produces
tensors as output.
Layer activations This governs the usage of activations. The activation
function is applied on the layers or through the argument
in layer methods. Many built-in activations exist as string
identifiers to pass as values of the activation argument.
Layer weight initializers This initializes the weights of Keras layers with the help
of keyword arguments such as kernel_initializer
and bias_initializer. Built-in initializers can also be
passed as string identifiers. In addition, custom initializers
can be created if required.
Layer weight regularizers This applies a penalty on layer activity or parameters
during learning optimization. These penalties contribute to the loss function
that the network in turn optimizes. The three main keyword arguments for
regularization are kernel_regularizer, bias_regularizer, and
activity_regularizer.
Layer weight constraints To apply constraints on the weights of each layer, the
tf.keras.constraints module defines the model parameters or keyword
arguments that set constraints during the learning process.
Core layers This layer has the following components: input object
(initialize tensor), dense layer (densely connected neural
network layer), activation layer (applies activation function
to the output), embedding layer (converts +ve integers to
fixed size vectors), masking layer (masks sequence to skip
timesteps), and lambda layer (wraps arbitrary expressions
for quick prototype). These components form the core
layer.
Convolution layers The convolution kernel can be created with the layer input
over the required dimensions such as 1D, 2D, and 3D to
produce output tensors.
Pooling layers It is for pooling operation on 1D, 2D, and 3D layers. It
supports Max, Average, GlobalMax, and GlobalAverage
pooling.
Recurrent layers Recurrent neural networks (RNNs) are enabled using
this layer. It supports long short-term memory (LSTM),
gated recurrent unit (GRU), simple RNN, time distributed,
bidirectional, convolutional LSTM, and base RNN.
Preprocessing layers The preprocessing layer supports building a Keras-based
input processing pipeline. These pipelines serve as independent preprocessing
code blocks that can be combined with Keras models, and the combination can
later be saved as a single model. Text, image, and feature preprocessing
methods are available.
Normalization layers Normalization converts all the data to a common scale
within a required range. Batch normalization transforms the data so that its
mean is close to 0 and its standard deviation is close to 1.
Regularization layers This layer sets the dropout in the input randomly to 0
during training to avoid an overfitting possibility. Also, this
layer has methods to update the cost function with respect
to input activity.
Attention layers This layer is responsible for focusing selectively on
important portions of input. This layer prioritizes and
emphasizes necessary information to improve the
performance of the model.
Reshaping layers The shape of the input is modified based on the
requirement. It is a layer that enables the alteration of
structure of the model layer without affecting data.
Merging layers This layer merges more than one input tensor of the same
shape into a single tensor of the same shape. This is mainly useful in deep
learning algorithms where fusion of multiple images, text, or input from
various sensors is required.
Activation layers This layer is responsible for providing various activation
function layers such as rectified linear unit (ReLU), Softmax, parametric ReLU,
and the exponential and thresholded activation functions.
2.2 TensorFlow
TensorFlow is an open-source software library framework especially
developed for machine learning and deep learning algorithms. It was
created by the Google team to analyze, design, and develop the ideas and
concepts in artificial intelligence, machine learning, and deep learning.
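A short, self-contained sketch (assuming TensorFlow 2.x; the values are arbitrary) of what tensors and automatic differentiation look like in TensorFlow:

import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[1.0], [0.5]])
c = tf.matmul(a, b)                   # matrix multiplication on tensors
print(c.numpy())

# Automatic differentiation, the basis of training neural networks
x = tf.Variable(2.0)
with tf.GradientTape() as tape:
    y = x ** 2 + 3 * x
print(tape.gradient(y, x).numpy())    # dy/dx = 2x + 3 = 7.0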
Module Purpose
compat module This module helps in writing compatible code that works
both in TensorFlow 1.x and 2.x and Python 2 and 3 using
compatible functions.
config module This module performs the configuration of logical or
physical devices for experimental, optimizer, and threading
purpose.
data module The input data pipeline type specification and representation
methods such as iterators, TFrecords, and MultilineText are
offered by this module.
debugging module This module helps to debug using functions such as
assert_equal, assert_less, and assert_greater
elementwise. It has functions to check or assert every
attribute of tensor.
distribute module This module helps in distributing the algorithm among
multiple machines or devices.
dtypes module This module works with the data types of tensors and
converts/casts tensor types.
errors module This module handles the exception types and errors that are
generated during execution.
estimator module The estimator module does training, evaluation, prediction,
and export. This module has all the experimenting and
exporting methods.
experimental module This module supports experimenting with deep learning
packages, distributed tensors, the TensorFlow-TensorRT compiler for inference
on NVIDIA devices, etc.
feature_column module This module works with several types of columns such
as numerical, categorical, embedding, hash, etc.
sysconfig module This module gives us the information about the system
configuration such as compilation and linker flags, C++
header files, and TensorFlow framework library directory
location
test module This module helps in understanding the testing of
TensorFlow with its benchmarks.
tpu module This module describes the options for tensor processing
unit and accelerated linear algebra (XLA) compilation.
train module This module works with training models and its associated
methods.
types module This module has the methods to convert union of all types
into tensors.
version module This module gives the version details of the compiler, git,
graph_def, etc.
xla module This module supports the XLA optimizing compiler to
accelerate TensorFlow models.
2.3 PyTorch
PyTorch is a specialized machine learning and deep learning library that
uses GPUs in addition to CPUs. It is a healthy competitor to Keras and
TensorFlow and is good at tensor manipulation, GPU acceleration, and
highly optimized automatic differentiation. Google released Keras in
early 2015 to train neural networks with easy-to-use APIs, but its low-level
features could not be customized. In late 2015, Google released
TensorFlow, which gained popularity soon and serves as a back end
for the Keras library. It also implemented more lower-level features for
Library Usage
TorchAudio It deals with signal and audio processing. Loading and storing
waveform tensors from and to files, media stream encoding
and decoding, synthesizing, filtering, etc., are the operations
supported.
TorchData The data pipelines (a series of processing steps on data)
are one of the key features in data science and analytics.
TorchData contains the data loading primitives for easy
construction and usage of data pipelines.
TorchRec It is a PyTorch library that deals with recommendation
systems that are very large. General parallelism and sparsity
primitives are the key features of the TorchRec library to
train models with large datasets shared across many GPUs.
TorchServe It offers easy and flexible tools to deploy PyTorch models in
production. In addition, it provides easy versioning along with
built-in or custom handlers, multiple models in one instance, etc.
TorchText This library deals with sentiment analysis, sequence tagging,
classification, corresponding data processing utilities, and
datasets.
TorchVision It deals with computer vision-based manipulations on images
and videos such as transformation and augmentation,
segmentation, object detection, classification, etc.
PyTorch on XLA Devices This library supports methods and enables PyTorch
execution on Accelerated Linear Algebra (XLA) devices.
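The following short sketch (it assumes only the core torch package; the GPU branch simply falls back to the CPU when CUDA is unavailable) shows the tensor manipulation and automatic differentiation highlighted above.

import torch

# Tensors can live on the CPU or, when available, on a GPU
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.tensor([[1.0, 2.0], [3.0, 4.0]], device=device)
b = torch.ones(2, 1, device=device)
print(a @ b)                      # matrix multiplication

# Automatic differentiation
x = torch.tensor(2.0, requires_grad=True)
y = x ** 2 + 3 * x
y.backward()
print(x.grad)                     # dy/dx = 2x + 3 = 7.0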
2.4 SciPy
SciPy is a Python package developed for scientific computation–based
applications. It offers statistical, signal processing, and optimization-based
functions. It was developed by Travis Oliphant, who also created NumPy, so
SciPy uses NumPy as its foundation. The functions offered in SciPy provide
additional support and optimized versions of frequently accessed NumPy
functionality. The SciPy collection of mathematical algorithms and
functions is widely used in computation-intensive applications.
SciPy also offers various functions and classes for visualizing data to
support decision systems and real-time analysis. It has rich user-
friendly numerical functions for numerical computations, integration,
differentiation, and optimization. SciPy is highly associated with NumPy,
and the arguments and return types of SciPy functions are mostly
NumPy arrays.
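As a small illustration (the function being minimized and the sample readings are arbitrary), the optimization and statistics routines can be called in a few lines, with NumPy arrays going in and coming out:

import numpy as np
from scipy import optimize, stats

# Minimize a simple one-dimensional function
result = optimize.minimize(lambda w: (w[0] - 3) ** 2 + 1, x0=np.array([0.0]))
print(result.x)                    # close to [3.]

# Descriptive statistics on sensor-like readings
readings = np.array([21.0, 21.5, 22.1, 20.9, 21.7])
desc = stats.describe(readings)
print(desc.mean, desc.variance)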
2.5 Theano
Theano is a Python library enabling computationally intensive
applications and research on a large scale. The multidimensional
mathematical arrays and expressions are efficiently defined, manipulated,
optimized, and evaluated with the Theano Python library functions.
2.6 Pandas
Pandas is a data processing–based Python package that provides
efficient data structures called series and dataframe. A series handles
one-dimensional data, and a dataframe deals with higher dimensions.
A dataframe mainly works with relational or tabular data that consists of
heterogeneously typed columns. It forms a basic but powerful building block
of data for performing real-world data analysis in Python.
Data in any form such as ordered/unordered time series, arbitrary
rows and columns type, statistical records, etc., can be easily represented
in series or dataframe. Like many other packages, Pandas is also built
on top of the NumPy to contribute to scientific applications. It supports
robust data input and output tools that can be loaded from or to any kind
of source such as Excel files, flat files, databases, and sophisticated file
systems.
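A minimal illustration (the sensor readings are invented for the example) of the two data structures and the input/output tools:

import pandas as pd

# A Series holds one-dimensional data
temps = pd.Series([21.0, 22.5, 23.1], name="temperature")

# A DataFrame holds table-like data with heterogeneously typed columns
readings = pd.DataFrame({
    "sensor": ["s1", "s2", "s3"],
    "temperature": [21.0, 22.5, 23.1],
    "humidity": [40, 38, 42],
})
print(temps.mean())
print(readings.describe())           # quick statistical summary
readings.to_csv("readings.csv")      # one of the robust output tools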
2.7 Matplotlib
Matplotlib is one of the fundamental Python packages to visualize the data
through static or interactive charts. It was developed by John D. Hunter.
Matplotlib has two application interfaces (APIs): an explicit “Axes” and
an implicit “pyplot.” The Axes interface works on Figure objects and
constructs the visualization in stages. It is also called an object-oriented
interface as it uses methods of the Figure object to create an Axes object,
which in turn enables the call to plot or drawing methods. The Implicit
pyplot API creates the Figure and Axes objects by itself and enables
the user to call the plot method directly. Though the plot() method
is available in the data handling libraries such as Pandas, xarray, etc.,
Matplotlib offers functions specific for the visual demonstration of data
and improves the understanding of data.
Matplotlib initially managed 2D plots, and the functions and utilities in its
1.0 release were defined for two dimensions. Later, three-dimensional
plotting utilities were also implemented on top of the 2D display. The
mplot3d toolkit provides a comfortable way of visualizing three-dimensional
plots. Hence, 3D plots are generated in applications by importing mplot3d
with from mpl_toolkits import mplot3d.
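The two interfaces and the mplot3d toolkit can be exercised with a short sketch (the plotted curves are arbitrary):

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits import mplot3d     # enables 3D axes

x = np.linspace(0, 2 * np.pi, 100)

# Implicit pyplot interface
plt.plot(x, np.sin(x))
plt.title("pyplot interface")
plt.show()

# Explicit Axes (object-oriented) interface with a 3D projection
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot3D(np.cos(x), np.sin(x), x)
ax.set_title("Axes interface, 3D")
plt.show()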
Module Purpose
matplotlib.bezier This module provides utility for working with Bezier paths.
matplotlib.category This module generates graphs with any axis as categorical
variables/strings.
matplotlib.cbook This module provides various utility functions and classes.
matplotlib.cm This module supports built-in color maps.
matplotlib.collections This module holds the classes for generating plots for
large collections of objects.
matplotlib.colorbar This module enhances the visualization by mapping the scalar
values to colors.
matplotlib.colors This module has the functionality of converting a string or
numeric representation of color into RGB or RGBA sequences.
matplotlib.container This module serves as a container for the collection of
semantically related items in the plots, for example, set of
bars in bar plot.
matplotlib.contour This module generates contour plots that illustrates 3D
surface in a 2D format with constant Z slices.
matplotlib.dates This module builds plots by mapping the dates on the axis.
matplotlib.docstring This module is used to describe the components in the plots
using strings over it. But this module is deprecated.
matplotlib.dviread This module helps in reading the DVI files output by LaTex or
Tex.
matplotlib.figure This module consists of Figure, SubFigure, and
SubplotParams that hold all plot elements.
matplotlib.font_manager This module governs fonts by finding, using, and
managing them across platforms.
matplotlib.patches This module supports patches in any object of the figure with
face color and edge color.
matplotlib.path This module enables the drawing of ploy lines and serves as
base class for all vector drawings.
matplotlib.patheffects This module provides functions to apply multiple draw
stages to any artist object in the visualization.
matplotlib.pyplot This module is for generating interactive plots and
programmatic plot generations.
matplotlib.projections This module governs the mapping or transformation of
data coordinates to display coordinates.
matplotlib.quiver This module supports the plotting of vector fields. Two plot
types, namely quiver and barb plots, are currently under it.
matplotlib.rcsetup This module offers the functions and classes for runtime
configuration (RC) settings for customization of Matplotlib
plots.
matplotlib.sankey This module enables the visualization of Sankey diagrams.
matplotlib.scale This module defines the scale of data values on an axis, such
as logarithmic, linear, etc.
matplotlib.sphinxext This module defines the functions and modules to get input
as .rst (ReStructure Text) files and convert into HTML. It
supports sphinx, which is a tool to automate documentation in
an elegant manner.
matplotlib.spines This module manages the spines, which are lines connecting
the boundaries of data area with the tick marks on axis.
matplotlib.style This module works with various styles that dictates the visual
appearance of the plots. The styles can be specified using RC
parameters.
matplotlib.table This module is used to generate tables for storing values in a
grid of cells from texts.
matplotlib.testing This module is developed for testing the figures and images
by comparing them.
matplotlib.text This module generates the text in the figure. It has x, y, and
string text as its base parameters.
matplotlib.texmanager This module supports TeX expressions in figures, such
as LaTeX, for the PS, PDF backends, etc.
matplotlib.ticker This module determines the location and appearance of the
major and minor ticks along the axes.
matplotlib.tight_bbox This module governs the functions and classes to
generate bounding boxes as axis-aligned rectangles denoted by a tuple of four
integers; it is deprecated.
matplotlib.tight_layout This module enables the layout of not only plots but
also subplots with a convenient arrangement of them.
matplotlib.transforms This module depicts the geometric transformations that
decide the final position of all component objects in the figure. Transforms are
denoted by a tree in which every object depends on its children for its
geometric value.
2.8 Scikit-learn
Scikit-learn is a machine learning library that has implemented supervised
and unsupervised learning. Besides, it is an open-source library distributed
under the three-clause BSD (Berkeley Software Distribution) license and
supports various tools such as model fitting and predicting, model
selection, model validation, transforming and preprocessing, pipelines, and
many more utility functions. The powerful automatic parameter search is
provided by Scikit-learn to decide the best parameter combinations.
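The fit/predict workflow and the automatic parameter search look roughly like the following sketch (it uses the bundled iris toy dataset, and the parameter grid is arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Automatic search over candidate parameter combinations
search = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))   # accuracy of the best model on held-out data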
Module Purpose
sklearn.base This module provides base classes and utility functions for all
estimators.
sklearn.calibration This module helps in the calibration of predicted probabilities.
sklearn.cluster This module supports unsupervised clustering algorithms.
sklearn.compose This module consists of composite estimators or meta
estimators for generating composite models with transformers.
sklearn.covariance This module includes the methods and functions to
estimate the covariance of the features defined in data points.
2.9 Seaborn
Seaborn is a library for generating statistical graphics, which supports
switching between different visual representations to provide a better
understanding of data. Its development is based on the Matplotlib and
Pandas data structures and functions. Its plotting methods operate either
on the Pandas dataframe or on arrays that hold the entire dataset and
accomplish the meaningful mapping and statistical aggregation needed
to generate more elaborate plots. The package supports datasets of
different formats and the corresponding types of plots.
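A short sketch using the "tips" sample dataset that Seaborn can fetch (an illustration only; any Pandas dataframe works the same way) gives a feel for the dataframe-oriented plotting interface:

import seaborn as sns
import matplotlib.pyplot as plt

# Load a small example dataset into a Pandas DataFrame
tips = sns.load_dataset("tips")

# Mapping and aggregation are handled by the plotting call itself
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="day")
plt.show()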
2.10 OpenCV
OpenCV is an open-source library that consists of a variety of methods
for computer vision algorithms. It defines the basic data structure that
includes multidimensional arrays, matrices, etc. It comprises libraries
that implement image processing operations, video analysis, 2D features
framework, calibration of camera and 3D reconstruction, object detection,
high-level GUI, video I/O, etc.
Image processing operations consist of filtering, image transformation,
color space conversions, etc. Video analysis consists of motion estimation,
background filtration, object tracking, etc. A 2D features framework
comprises feature detectors, descriptors, and descriptor matchers. Camera
calibration and 3D reconstruction include object pose estimation,
stereo camera calibration, stereo correspondence algorithms, etc. Object
detection has methods to detect the objects and predefined classes
instances such as person, car, face, computer, pen, etc. A high-level GUI
is an interface to access UI capabilities and its corresponding methods.
Video I/O deals with the capturing of videos and coding and decoding
of them.
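A few of these image processing operations in code (the image file names are placeholders; any readable image will do):

import cv2

# Read an image from disk
img = cv2.imread("sample.jpg")

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # color space conversion
blurred = cv2.GaussianBlur(gray, (5, 5), 0)    # filtering
edges = cv2.Canny(blurred, 50, 150)            # edge/feature detection

cv2.imwrite("edges.jpg", edges)                # write the processed image back out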
2.11 Summary
This chapter provided an overview of the Python packages that are widely
used to implement learning algorithms. The packages are highly efficient
and powerful in generating the models. The efficiency of the models is
measured using the performance metrics supported by the packages. In
the next chapter, we will discuss the supervised learning algorithms and
their applications.
CHAPTER 3
Supervised Algorithms
3.1 Introduction
Supervised algorithms are grouped under one category of machine
learning called supervised learning. As the name implies, the whole
process of learning is designed as if a teacher monitors the learning
process. The learning process starts with an input dataset, which has a set
of features or attributes along with outputs mapped one to one. Here
the inputs are called the independent variables, and the outputs are called
the dependent variables. The values of the output are called labels, which
make it easier for the training process to correlate them with the input
features. In short, a mapping function is generated from the dataset, which
has known inputs and outputs, through the training process. The predicted
output is then generated by the mapping function, which calculates a label
for each set of inputs for which the output is unknown. This chapter
describes the supervised learning algorithms in detail along with their
simple implementations in Python as case studies.
3.2 Regression
Regression is a data modeling technique that finds the best-fit line covering
the data points plotted with the independent variables against the
dependent variable. The best-fit line is determined by computing the
distances between the possible set of lines and the data points. The line
with the smallest sum of distances from all the data points is designated
as the best-fit line. Various types of regression techniques exist; they differ
in the form of the regression line, the number of independent variables,
and the type of dependent variable. Obviously, the nature of the data
highly influences the type of regression to use.
The cost function for ridge regression is as follows:

CF = \sum_{i=1}^{m} (Y_{\text{actual}} - Y_{\text{pred}})^2 + \text{penalty}
   = \sum_{i=1}^{m} \Big( Y_{\text{actual}} - \sum_{j=0}^{p} \alpha_j x_{ij} \Big)^2 + \lambda \sum_{j=0}^{p} \alpha_j^2        ------ Eqn 7

where λ is the penalty term, which is multiplied with the sum of the squares of
the coefficients; this is known as L2 regularization.
The cost function for lasso regression takes the same form, with an L1 penalty:

CF = \sum_{i=1}^{m} (Y_{\text{actual}} - Y_{\text{pred}})^2 + \text{penalty}
   = \sum_{i=1}^{m} \Big( Y_{\text{actual}} - \sum_{j=0}^{p} \alpha_j x_{ij} \Big)^2 + \lambda \sum_{j=0}^{p} |\alpha_j|        ------ Eqn 8
This is very similar to ridge regression except that the penalty term (λ) is
multiplied by the sum of the absolute values of the coefficients instead of the
sum of their squares. LASSO stands for "Least Absolute Shrinkage and Selection
Operator." Lasso employs shrinkage, in which the data converges toward a
central point such as the mean. Shrinkage is used to reduce the bias-variance
trade-off and in turn reduces overfitting.
import os
os.chdir("/content/drive/MyDrive/Colab Notebooks/LAIoT")
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model (the fitting lines were not captured in this extract; X is assumed
# to hold the age feature and y the maximum heart rate)
lin_reg = LinearRegression()
lin_reg.fit(X_train, y_train)

# Model evaluation
y_pred_train = lin_reg.predict(X_train)
y_pred_test = lin_reg.predict(X_test)

# Visualize
plt.scatter(X, y, color='blue')
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, lin_reg.predict(X_train), color='green')
plt.title('Linear Regression')
plt.xlabel('Age')
plt.ylabel('Maximum Heart Rate')
plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Build polynomial features and fit the model (reconstructed; the degree used in
# the original listing is not shown, so degree=2 is assumed here)
poly_features = PolynomialFeatures(degree=2)
X_poly_train = poly_features.fit_transform(X_train)
X_poly_test = poly_features.transform(X_test)
poly_reg = LinearRegression()
poly_reg.fit(X_poly_train, y_train)

# Model evaluation
y_pred_train = poly_reg.predict(X_poly_train)
y_pred_test = poly_reg.predict(X_poly_test)

# Visualize
plt.scatter(X, y, color='blue')
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, poly_reg.predict(poly_features.fit_transform(X_train)), color='green')
plt.title('Polynomial Regression')
plt.xlabel('Age')
plt.ylabel('Maximum Heart Rate')
plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import BayesianRidge
from sklearn.metrics import mean_squared_error, r2_score
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model (the fitting lines were not captured in this extract)
bayesian_reg = BayesianRidge()
bayesian_reg.fit(X_train, y_train)

# Model evaluation
y_pred_train = bayesian_reg.predict(X_train)
y_pred_test = bayesian_reg.predict(X_test)

# Visualize
plt.scatter(X, y, color='blue')
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, bayesian_reg.predict(X_train), color='green')
plt.title('Bayesian Regression')
plt.xlabel('Age')
plt.ylabel('Maximum Heart Rate')
plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error, r2_score
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model (reconstructed; alpha is left at its default because the value
# used in the original listing is not shown)
lasso_reg = Lasso()
lasso_reg.fit(X_train, y_train)

# Model evaluation
y_pred_train = lasso_reg.predict(X_train)
y_pred_test = lasso_reg.predict(X_test)

# Visualize
plt.scatter(X, y, color='blue')
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, lasso_reg.predict(X_train), color='green')
plt.title('Lasso Regression')
plt.xlabel('Age')
plt.ylabel('Maximum Heart Rate')
plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, r2_score
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Fit the model (reconstructed; alpha is left at its default because the value
# used in the original listing is not shown)
ridge_reg = Ridge()
ridge_reg.fit(X_train, y_train)

# Model evaluation
y_pred_train = ridge_reg.predict(X_train)
y_pred_test = ridge_reg.predict(X_test)

# Visualize
plt.scatter(X, y, color='blue')
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, ridge_reg.predict(X_train), color='green')
plt.title('Ridge Regression')
plt.xlabel('Age')
plt.ylabel('Maximum Heart Rate')
plt.show()
3.3 Classification
The prediction of labels is called classification when the target variable is
not continuous in nature. Classification is the process of understanding,
recognizing, and categorizing the data into an existing category. It is a
supervised machine learning technique where the models are trained to
predict the label of the given data. It works based on the probability or
threshold score to determine the category of the data. It trains the model
with the training set of data and evaluates with the test set of data. Then
the model is subjected to the prediction of new data. Classification works
well on both structured and unstructured data.
for training, and the majority vote is the final output. Boosting builds a
sequential model that combines weak learners with stronger ones in order to
get high accuracy in the final model.
The best part of this method is the generation of a number of trees
(not only decision trees; other methods are also supported) from different
subsets of features in the dataset. Features are selected randomly, and the
best among them is searched. The set of trees generated through this process
reflects a wide diversity of feature selection and yields better overall results.
The margin is the distance between the support vectors from each
class of data, as shown in Figure 3-2. The maximization of margin distance
is recommended to classify the future data points with more confidence.
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, accuracy_score
import numpy as np
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)

# Fit the classifier (the fitting lines were not captured in this extract)
lr = LogisticRegression()
lr.fit(X_train, y_train)

# Model evaluation
pred = lr.predict(X_test)
ac = accuracy_score(y_test, pred)
cm = confusion_matrix(y_test, pred)
print(ac)
print(cm)

# Visualization of metric
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix=cm,
    display_labels=["alive", "othercause", "pesticides"])
cm_display.plot()
plt.show()
Accuracy = 0.8303323580163553
Confusion Matrix =
[[21948 231 0]
[ 3600 187 0]
[ 659 33 0]]
#############Decision Trees#####################
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
import numpy as np
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
#Model Evaluation
y_pred = agriTree.predict(X_test)
ac = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test, y_pred)
print(ac)
print(cm)
#Visualization of metric
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
cm, display_labels = ["alive", "othercause","pesticides"])
cm_display.plot()
plt.show()
Accuracy = 0.8383974791807337
Confusion Matrix =
[[22178 39 0]
[ 3596 172 0]
[ 628 45 0]]
###############Naive Bayes######################
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import confusion_matrix, accuracy_score
import numpy as np
# Train-test split
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3)
#Model Evaluation
y_pred = classifier.predict(X_test)
ac = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test, y_pred)
print(ac)
print(cm)
#Visualization of metric
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
cm, display_labels = ["alive", "othercause","pesticides"])
cm_display.plot()
plt.show()
Accuracy = 0.817278115387501
Confusion Matrix =
[[20972 1175 1]
[ 3006 815 0]
[ 504 185 0]]
###################Random Forest###############
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, accuracy_score
import numpy as np
# Train-test split
X_train, X_test,y_train,y_test = train_test_split(X,Y,
test_size=0.3)
#Model Evaluation
y_pred = rf_model.predict(X_test)
ac = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test, y_pred)
print(ac)
print(cm)
#Visualization of metric
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
cm, display_labels = ["alive", "othercause","pesticides"])
cm_display.plot()
plt.show()
Accuracy = 0.8209918223422612
Confusion Matrix =
[[21088 1011 114]
[ 2835 764 139]
[ 469 204 34]]
import pandas as pd
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from sklearn import metrics
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, accuracy_score
import numpy as np
#Train-test Split
X_train, X_test,y_train,y_test = train_test_split(X,Y,
test_size=0.3)
#Model Evaluation
y_pred = classifier.predict(X_test)
ac = accuracy_score(y_test,y_pred)
cm = confusion_matrix(y_test, y_pred)
print(ac)
print(cm)
#Visualization of metric
cm_display = metrics.ConfusionMatrixDisplay(confusion_matrix =
cm, display_labels = ["alive", "othercause","pesticides"])
cm_display.plot()
plt.show()
Accuracy = 0.8382474304148848
Confusion Matrix =
[[22346 0 0]
[ 3606 0 0]
[ 706 0 0]]
CHAPTER 4
Unsupervised
Algorithms
4.1 Introduction
In real time, it is hard to generate data in an unstructured format
and deduce insights from it. Unsupervised learning determines the
relationship between the data or group of data points. It uses the features
of data and not the label of data to study patterns, and hence experts
compare it with human intellect. The unstructured data is converted to
structured data by identifying similarities between the individual data
records in the dataset using unsupervised learning.
Unsupervised learning is more advantageous in handling complex
data, deriving insights from raw data without the requirement of labels,
and recognizing the patterns involved between unstructured data in
real time; it is also cheaper than supervised learning. Though it has
many benefits, there are challenges involved. It consumes more time for
training, it is difficult to identify the hidden patterns, and the results are
unpredictable as there are no labels in the dataset to verify.
Σ_{i=1..m} Σ_{k=1..K} ( Wik ∗ ||xi − µk||² )  ----- Eqn 1
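Eqn 1 is the within-cluster sum of squared distances that k-means minimizes; a minimal NumPy sketch of evaluating it for given assignments (illustrative variable names, assuming X, labels, and centroids are NumPy arrays):
import numpy as np
def kmeans_objective(X, labels, centroids):
    # sum of squared distances of each point to its assigned centroid
    return sum(np.sum((X[labels == k] - centroids[k]) ** 2) for k in range(len(centroids)))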
DE = √( Σ_{i=1..n} (Ai − Bi)² )  ------- Eqn 2
DSE = Σ_{i=1..n} (Ai − Bi)²  ------- Eqn 3
DM = Σ_{i=1..n} ( |Aix − Bix| + |Aiy − Biy| )  ------- Eqn 4
DC = Σ_{i=0..n−1} (Ai × Bi) / ( √(Σ_{i=0..n−1} Ai²) × √(Σ_{i=0..n−1} Bi²) )  ------ Eqn 5
The greater the angle between two data points, the farther apart they are considered. The cosine measure often leads to outcomes similar to Euclidean distance, whereas the Manhattan method can produce different outcomes.
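For reference, SciPy provides these measures directly; a small sketch with two example vectors (not from the book's dataset):
from scipy.spatial import distance
a, b = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(distance.euclidean(a, b))    # Eqn 2
print(distance.sqeuclidean(a, b))  # Eqn 3
print(distance.cityblock(a, b))    # Eqn 4 (Manhattan)
print(distance.cosine(a, b))       # 1 minus the cosine similarity of Eqn 5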
Consider several sources recorded together through a microphone; the resultant sound wave is the mixed signal. The independent components are the sounds generated by the radio, cow, bird, man, and kid. Each source's sound needs to be separated from the mixed signal.
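A minimal sketch of such a separation with scikit-learn's FastICA, assuming mixed is an array of shape (n_samples, n_microphones); the component count of five matches the five assumed sources:
from sklearn.decomposition import FastICA
ica = FastICA(n_components=5, random_state=0)
sources = ica.fit_transform(mixed)   # recovered independent components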
A => B
If set A contains more than one item, for example A = {x, y, z}, then its
cardinality is 3.
There are three measures used in the apriori algorithm: support, confidence, and lift. Support is defined as the ratio of the frequency of an item to the total number of occurrences of all items, and it can also be calculated for more than one item. For example, if A and B both occur, their joint frequency is freq(A, B), and the support is freq(A, B)/N. The support measure for more than one item is what market basket analysis requires. Confidence is the ratio of the frequency of more than one item to the frequency of a given item, for example freq(A, B)/freq(A). The final measure helps find out how the sale of each item influences the other: lift is defined as the ratio of the support of the items measured together to the product of their individual supports, for example support(A, B)/(support(A) × support(B)). The apriori algorithm iteratively uses these measures to prune infrequent itemsets and generate rules.
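The three measures can be illustrated with a small computation over example transactions (illustrative data, not from the book):
transactions = [{"milk", "bread"}, {"milk"}, {"bread", "butter"}, {"milk", "bread", "butter"}]
N = len(transactions)
support_A = sum("milk" in t for t in transactions) / N
support_B = sum("bread" in t for t in transactions) / N
support_AB = sum({"milk", "bread"} <= t for t in transactions) / N
confidence = support_AB / support_A               # freq(A, B) / freq(A)
lift = support_AB / (support_A * support_B)       # support(A, B) / (support(A) x support(B))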
A = U * S * VT
where U and V are the orthogonal matrices and represent the singular
vectors and S is the diagonal matrix that has singular values of A.
The key concept of singular value decomposition is that it splits data
into crucial parts and makes use of it to find the patterns in them.
The SVD algorithm is as follows:
1. Eigen decomposition of the matrix ATA is computed
with a standard algorithm.
2. The singular values of A are computed and sorted in
descending order.
3. The singular vectors of A are computed for each
singular value and normalized to unit length.
The left and right singular vectors of A corresponding to the nonzero singular values are obtained from the eigenvectors of AAT and ATA, respectively.
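A minimal NumPy sketch of the decomposition and a reconstruction check (illustrative matrix, not from the book):
import numpy as np
A = np.array([[3.0, 1.0], [1.0, 3.0], [0.0, 2.0]])
U, S, VT = np.linalg.svd(A, full_matrices=False)
print(S)                                        # singular values in descending order
print(np.allclose(A, U @ np.diag(S) @ VT))      # reconstruction check -> True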
# Output of mounting Google Drive: Mounted at /content/drive
import nltk
nltk.download('punkt')
# Working directory: /content/drive/MyDrive/Colab Notebooks/LAIoT/Chapter 4
import re
from nltk.corpus import stopwords   # requires nltk.download('stopwords') to have been run

corpus = []
for i in range(len(sentence)):
    sen = re.sub('[^a-zA-Z]', " ", sentence[i])
    sen = sen.lower()
    sen = sen.split()
    sen = ' '.join([w for w in sen if w not in stopwords.words('english')])
    corpus.append(sen)
from gensim.models import Word2Vec
all_words = [i.split() for i in corpus]
model = Word2Vec(all_words, min_count=1)
print(all_words)
sent_vector = []
for i in corpus:
    plus = 0
    for j in i.split():
        plus += model.wv[j]
    plus = plus / len(i.split())
    sent_vector.append(plus)
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
df = pd.DataFrame(sent_vector)
data_scaler = StandardScaler()
scaled_data = data_scaler.fit_transform(df)
pca = PCA()
pca.fit(scaled_data)
pca.explained_variance_ratio_
plt.figure(figsize=(10,8))
plt.plot(range(1,12),pca.explained_variance_ratio_.cumsum(),
marker='o',linestyle='--')
plt.title("Variance Analysis")
plt.xlabel('Number of sentence vector')
plt.ylabel('Cumulative Explained Variance')
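The scores_pca array used below is the projection of the scaled data onto the leading components; a minimal sketch of that step, assuming 10 components are retained, which matches the (11, 10) shape printed later:
pca = PCA(n_components=10)
scores_pca = pca.fit_transform(scaled_data)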
n_clusters = 6
kmeans_pca = KMeans(n_clusters, init = 'k-means++',
random_state = 42)
kmeans_pca.fit(scores_pca)
KMeans(algorithm='auto',copy_x=True,init='kmeans++',
max_iter=300,n_clusters=6,n_init=10,random_
state=42,tol=0.0001,verbose=0)
print(df.shape)
print(scores_pca.shape)
print(df.head())
scores_pca_df = pd.DataFrame(scores_pca)
print(scores_pca_df.head())
df_pca_kmeans = pd.concat([df, pd.DataFrame(scores_pca)], axis=1)
df_pca_kmeans.columns.values[-3:] = [100, 101, 102]
df_pca_kmeans[103] = kmeans_pca.labels_
df_pca_kmeans.rename(columns={100 : "column1" })
df_pca_kmeans.head()
(11, 100)
(11, 10)
0 1 2 3 4 5 6 \
0 -0.000558 -0.000520 -0.001140 -0.000719 0.004166 0.000213 -0.001123
1 0.000328 0.000260 0.001941 0.000443 -0.001431 -0.001857 0.002054
2 0.002847 -0.001982 -0.002119 0.001163 0.000719 -0.000856 0.001758
3 -0.000510 0.001485 -0.001931 -0.000623 -0.000544 -0.000669 -0.000470
4 0.000483 -0.001095 -0.000232 0.001141 -0.003356 -0.000425 0.001856
94 95 96 97 98 99
0 0.001408 0.002663 0.001535 0.000791 0.001581 -0.000122
1 0.003436 0.001415 -0.000968 0.000330 -0.000288 -0.002185
2 -0.003008 0.001151 0.000490 -0.001016 -0.001708 -0.001008
3 0.000685 -0.000670 -0.000724 -0.001122 0.000735 0.001186
4 0.001008 0.001520 -0.001825 0.001465 -0.000411 -0.001971
7 8 9
0 1.538730 -0.534066 -1.722997
1 5.781088 2.407620 0.864836
2 -1.987024 -2.434113 0.190779
3 -1.920528 3.576935 0.259598
4 -4.645400 2.295637 -0.576942
0 1 2 3 4 5 6 7
8 9 ... 1 2 3 4
5 6 100 101 102 103
0 -0.000558 -0.000520 -0.001140 -0.000719 0.004166 0.000213 -0.001123
0.004808 -0.002858 0.001510 ... -5.378297 0.182338 6.673170 -2.845145
3.595698 -2.252162 1.538730 -0.534066 -1.722997 1
1 0.000328 0.000260 0.001941 0.000443 -0.001431 -0.001857 0.002054
0.001826 -0.002245 -0.002983 ... 5.983585 0.752204 -1.399546 -1.548659
1.364393 -2.160367 5.781088 2.407620 0.864836 4
2 0.002847 -0.001982 -0.002119 0.001163 0.000719 -0.000856 0.001758
-0.000427 -0.001381 -0.000780 ... 5.883429 -3.167400 5.510932 -0.829297
-4.062222 -0.308087 -1.987024 -2.434113 0.190779 4
df_pca_kmeans['test'] = df_pca_kmeans[103].map({0:'first', 1:'second', 2:'third', 3:'fourth'})
print(df_pca_kmeans.shape)
(11, 112)
for i in range(100, 103):
    x_axis = df_pca_kmeans[i]
    y_axis = df_pca_kmeans[i+1]
    plt.figure(figsize=(10,8))
    sns.scatterplot(x=x_axis, y=y_axis, hue=df_pca_kmeans['test'],
                    palette=['g','r','c','m'])
    plt.title('clusters with PCA and Kmeans')
    xlabel = str(i-99)
    ylabel = str(i-98)
    plt.xlabel('PCA attribute' + xlabel)
    plt.ylabel('PCA attribute' + ylabel)
    plt.show()
for i in sorted(my_list):
print(sentence[i])
Output:
4.11 Summary
This chapter discussed the various unsupervised algorithms and how
they work. It explained the clustering techniques to group the data
points into different collections. It also discussed PCA, which identifies
the main components that are necessary for learning and works for
dimensionality reduction, and ICA, which separates the collective data
into its independent components. Anomaly detection finds the outliers
among the data points. Neural networks imitate the human brain to carry the learning process further. The apriori algorithm derives the relationships between frequently occurring items and their affinity for one another. SVD finds the crucial regions in the dataset to figure out the pattern hidden in it.
CHAPTER 5
Reinforcement
Learning
5.1 Introduction
The previous two chapters explained the various algorithms of supervised
and unsupervised machine learning techniques. There are certain
algorithms that imitate the learning process of human beings. Because
of the varied nature of these algorithms, instead of categorizing them as
supervised or unsupervised, they are categorized under a new machine
learning technique called reinforcement learning.
The fundamental concept of reinforcement learning is that it must
proceed on a trial-and-error basis to achieve the goal. The system
determines the reward or penalty based on the actions taken. Ultimately,
the goal should be attained with maximum reward points. This approach is
highly suitable for real-world problems with complex objectives.
The features of reinforcement learning are as follows:
• The actions taken by the software agent regulate the successive output it generates.
• Decisions about how to proceed, based on actions and outputs, are sequential in nature. Any real number or signal can represent the progress, and there is no supervisor to orient the agent toward the goal.
vπ(s) = E[Gt | St = s]
where St is the state at time t and Gt is the return. The action-value function can be expanded as follows:
qπ(s,a) = E[Gt | St = s, At = a]
        = E[Rt+1 + γRt+2 + … + γ^(T−t−1)RT | St = s, At = a]
        = E[Rt+1 + γGt+1 | St = s, At = a]
        = Σ_{s′,r} p(s′, r | s, a) [ r + γ Σ_{a′} π(a′ | s′) qπ(s′, a′) ]
where At is the action taken at time t.
π*(s) = argmax_a Σ_{s′,r} p(s′, r | s, a) [ r + γ v*(s′) ]
V(s) ← Σ_{s′,r} p(s′, r | s, a) [ r + γ V(s′) ]
Based on the different actions taken from the same state, the policy
differs from π to π’, and the policy improvement for the different actions is
as follows:
Q(s,a) ← Q(s,a) + (Wt / C(s,a)) [ G − Q(s,a) ]
θ = θ − α∇L (θ )
∇J(θ) ∝ Σ_s µ(s) Σ_a qπ(s,a) ∇π(a | s, θ)
θt+1 = θt + α γ^t Gt ∇ln π(At | St, θt)
where ∇ln π(At | St, θt) = ∇π(At | St, θt) / π(At | St, θt)
Advantage Actor-Critic: This algorithm is the combination of policy
gradient method and temporal difference method along with neural
network. The advantage function is given as follows:
θt+1 = θt + α γ^t Advt ∇ln π(At | St, θt). If Advt > 0, then taking the action is better than following the policy.
import gym
import numpy as np
import matplotlib
import matplotlib.pyplot as plt                 # for visualizing the frozen lake environment
from matplotlib import animation                # for displaying video
from IPython.display import HTML                # for displaying video
import os
from typing import Tuple, Dict, Optional, Iterable, Callable   # Optional is used in render
LEFT = 0
DOWN = 1
RIGHT = 2
UP = 3
MAPS = {
"4x4": ["SFFF", "FHFH", "FFFH", "HFFG"],
"8x8": [
"SFFFFFFF",
"FFFFFFFF",
"FFFHFFFF",
"FFFFFHFF",
"FFFHFFFF",
"FHHFFFHF",
"FHFFHFHF",
"FFFHFFFG",
],
}
# DFS to check that it's a valid path.
c_new = c + y
if r_new < 0 or r_new >= max_size or c_new < 0 or c_new >= max_size:
continue
if board[r_new][c_new] == "G":
return True
if board[r_new][c_new] != "H":
frontier.append((r_new, c_new))
return False
# The code below generates a random map for the frozen lake
# problem, with one start and one goal along with holes and frozen
# regions.
class FrozenLakeEnv(gym.Env):
"""
Frozen lake involves crossing a frozen lake from Start(S)
to Goal(G) without falling into any Holes(H) by walking
over the Frozen(F) lake.
The agent may not always move in the intended direction due
to the slippery nature of the frozen lake.
### Action Space
The agent takes a 1-element vector for actions.
The action space is `(dir)`, where `dir` decides direction
to move in which can be:
- 0: LEFT
- 1: DOWN
- 2: RIGHT
- 3: UP
### Observation Space
The observation is a value representing the agent's current
position as
current_row * nrows + current_col (where both the row and
col start at 0).
For example, the goal position in the 4x4 map can be
calculated as follows: 3 * 4 + 3 = 15.
The number of possible observations is dependent on the
size of the map.
For example, the 4x4 map has 16 possible observations.
### Rewards
Reward schedule:
- Reach goal(G): +1
- Reach hole(H): 0
- Reach frozen(F): 0
### Arguments
```
    gym.make('FrozenLake-v1', desc=None, map_name="4x4", is_slippery=True)
```
`desc`: Used to specify custom map for frozen lake. For
example,
desc=["SFFF", "FHFH", "FFFH", "HFFG"].
A random generated map can be specified by calling the
function `generate_random_map`. For example,
```
    from gym.envs.toy_text.frozen_lake import generate_random_map
    gym.make('FrozenLake-v1', desc=generate_random_map(size=8))
```
`map_name`: ID to use any of the preloaded maps.
"4x4":[
"SFFF",
"FHFH",
"FFFH",
"HFFG"
]
`is_slippery`: True/False. If True will move in
intended direction with
probability of 1/3 else will move in either perpendicular
direction with
# Rendering modes
metadata = {
"render_modes": ["human", "ansi", "rgb_array"],
"render_fps": 4,
}
def __init__(
self,
render_mode: Optional[str] = None,
desc=None,
map_name="4x4",
is_slippery=True,
shaped_rewards: bool = False,
size: int = 16
):
if desc is None and map_name is None:
desc = generate_random_map()
elif desc is None:
desc = MAPS[map_name]
self.desc = desc = np.asarray(desc, dtype="c")
self.render_mode = render_mode
# pygame utils
self.window_size = (min(64 * ncol, 512), min(64 * nrow, 512))
self.cell_size = (
self.window_size[0] // self.ncol,
self.window_size[1] // self.nrow,
)
self.window_surface = None
self.clock = None
self.hole_img = None
self.cracked_hole_img = None
self.ice_img = None
self.elf_images = None
self.goal_img = None
self.start_img = None
def reset(
self,
*,
seed: Optional[int] = None,
options: Optional[dict] = None,
):
super().reset(seed=seed)
self.s = categorical_sample(self.initial_state_distrib,
self.np_random)
self.lastaction = None
if self.render_mode == "human":
self.render()
return int(self.s), {"prob": 1}
def render(self):
if self.render_mode is None:
logger.warn(
"You are calling render method without
specifying any render mode. "
"You can specify the render_mode at
initialization, "
f'e.g. gym("{self.spec.id}", render_mode="rgb_array")'
)
elif self.render_mode == "ansi":
return self._render_text()
else: # self.render_mode in {"human", "rgb_array"}:
return self._render_gui(self.render_mode)
        self.elf_images = [
            pygame.transform.scale(pygame.image.load(f_name), self.cell_size)
            for f_name in elfs
        ]
        desc = self.desc.tolist()
        assert isinstance(desc, list), f"desc should be a list or an array, got {desc}"
        for y in range(self.nrow):
            for x in range(self.ncol):
                pos = (x * self.cell_size[0], y * self.cell_size[1])
                rect = (*pos, *self.cell_size)
                self.window_surface.blit(self.ice_img, pos)
                if desc[y][x] == b"H":
                    self.window_surface.blit(self.hole_img, pos)
                elif desc[y][x] == b"G":
                    self.window_surface.blit(self.goal_img, pos)
                elif desc[y][x] == b"S":
                    self.window_surface.blit(self.start_img, pos)
                pygame.draw.rect(self.window_surface, (180, 200, 230), rect, 1)
        # paint the elf
        bot_row, bot_col = self.s // self.ncol, self.s % self.ncol
        cell_rect = (bot_col * self.cell_size[0], bot_row * self.cell_size[1])
@staticmethod
def _create_Lake(size: int) -> Dict[Tuple[int],
Iterable[Tuple[int]]]:
lake = {s: [] for s in range(size) }
return lake
@staticmethod
def _center_small_rect(big_rect, small_dims):
offset_w = (big_rect[2] - small_dims[0]) / 2
offset_h = (big_rect[3] - small_dims[1]) / 2
return (
big_rect[0] + offset_w,
big_rect[1] + offset_h,
)
def _render_text(self):
desc = self.desc.tolist()
outfile = StringIO()
row, col = self.s // self.ncol, self.s % self.ncol
desc = [[c.decode("utf-8") for c in line] for line
in desc]
desc[row][col] = utils.colorize(desc[row][col], "red",
highlight=True)
if self.lastaction is not None:
outfile.write(f" ({['Left', 'Down', 'Right', 'Up']
[self.lastaction]})\n")
else:
outfile.write("\n")
outfile.write("\n".join("".join(line) for line in
desc) + "\n")
with closing(outfile):
return outfile.getvalue()
def close(self):
if self.window_surface is not None:
import pygame
pygame.display.quit()
pygame.quit()
env = FrozenLakeEnv(render_mode='rgb_array')
env.reset()
#print("this is frozen lake")
#env.render()
frame = env.render()
plt.figure(figsize=(6,6))
plt.axis('off')
plt.imshow(np.squeeze(frame))
print(f"For example, the initial state is: {env.reset()}")
print(f"The space state is of type: {env.observation_space}")
print(f"An example of a valid action is: {env.action_space.sample()}")
print(f"The action state is of type: {env.action_space}")
env.reset()
action = env.action_space.sample()
print(action)
next_state, reward, done, info,_ = env.step(action)
frame = env.render()
plt.axis('off')
plt.title(f"State: {next_state}")
plt.imshow(np.squeeze(frame))
state = env.reset()
episode = []
terminated = False
truncated = False
# policy function
def policy(state):
action_probabilities = np.array([0.25, 0.25, 0.25, 0.25])
return action_probabilities
action_probabilities = policy((0,0))
for action, prob in zip(range(4), action_probabilities):
print(f"Probability of taking action {action}: {prob}")
state = env.reset()
action_probabilities = policy(state)
objects = ('Left', 'Down', 'Right', 'Up')   # matches the action encoding above
y_pos = np.arange(len(objects))
plt.bar(y_pos, action_probabilities, alpha=0.5)
plt.xticks(y_pos, objects)
plt.ylabel('P(a|s)')
plt.title('Random Policy')
plt.tight_layout()
plt.show()
state_values = np.zeros(shape=(16))
def policy_evaluation(policy_probs, state_values, theta=1e-6, gamma=0.99):
    delta = float("inf")
    while delta > theta:
        delta = 0
        for index in range(16):
            old_value = state_values[index]
            new_value = 0
            action_probabilities = policy_probs[index]
            for action, prob in enumerate(action_probabilities):
                next_state, reward, _, _ = env.simulate_step(index, action)
                new_value += prob * (reward + gamma * state_values[next_state])
            state_values[index] = new_value
            delta = max(delta, abs(old_value - new_value))
max_qsa = float("-inf")
for action in range(4):
next_state, reward, _, _ = env.simulate_step(index, action)
qsa = reward + gamma * state_values[next_state]
if qsa > max_qsa:
max_qsa = qsa
new_action = action
action_probs = np.zeros(4)
action_probs[new_action] = 1.
policy_probs[(index)] = action_probs
if new_action != old_action:
policy_stable = False
return policy_stable
policy_iteration(policy_probs, state_values)
def display_video(frames):
orig_backend = matplotlib.get_backend()
matplotlib.use('Agg')
fig, ax = plt.subplots(1, 1, figsize=(5, 5))
matplotlib.use(orig_backend)
ax.set_axis_off()
ax.set_aspect('equal')
ax.set_position([0, 0, 1, 1])
im = ax.imshow(frames[0])
def update(frame):
im.set_data(frame)
return [im]
anim = animation.FuncAnimation(fig=fig, func=update,
frames=frames,
interval=50, blit=True,
repeat=False)
return HTML(anim.to_html5_video())
# The agent method for testing the environment, playing the game
# with successive actions for 10 episodes by default.
frames.append(img)
state = next_state
if state==15:
break;
print(episode)
return display_video(frames)
5.7 Summary
Reinforcement learning is the process of gaining experience much like a
human being does. The components involved are state, action, reward,
return, exploration, exploitation, experience, etc. Two major types of
reinforcement learning are the model-based and model-free methods. We
discussed model-free methods in this chapter. Also, the applications of RL
and a case study with Python are discussed. In the next chapter, we will
discuss the artificial neural networks in more detail.
CHAPTER 6
Artificial Neural
Networks for IoT
6.1 Introduction to Artificial Neural
Networks (ANNs)
An ANN is a deep learning method that imitates the neural functionality
of the human brain. The human brain is composed of billions of neurons
that consist of dendrites, axon, and nucleus. Information is communicated
between the excited neurons through electric and chemical signals.
Figure 6-1 shows the synapse or synaptic junction formed through the
connection of the axon of one neuron and the dendrites of another
neuron. The axon and dendrites act as the transmitter and receiver during information transmission in the human brain. The information received through the dendrites is processed at the nucleus, which in turn produces electric and chemical signals that are transmitted to the next neuron through the axon tips when its threshold level is reached.
import numpy as np
import matplotlib.pyplot as plt

x = [-5,-4.5,-4,-3.5,-3,-2.5,-2,-1.5,-1,-0.5,0,1,1.5,2,2.5,3,3.5,4,4.5,5]
y1,y2,y3,y4,y5,y6,y7,y8,y9 = [],[],[],[],[],[],[],[],[]
def linear(x):
    return x
def binary(x):
    if x >= 0:
        return 1
    else:
        return 0
def sigmoid(x):
    return 1 / (1 + np.exp(-x))
def tanh(x):
    return np.tanh(x)
def relu(x):
    return max(0.0, x)
def leakyrelu(x):
    alpha = 0.1   # small negative slope
    return max(alpha*x, x)
def exponentialLU(x):
    alpha = 1.0   # positive-valued hyperparameter
    if x > 0:
        return x
    else:
        return alpha * (np.exp(x) - 1)
def exp_sum(x):
    # sum of exponentials over the whole input list, used by softmax
    return np.sum(np.exp(x))
def softmax(x, exp_sm):
    exp_x = np.exp(x)
    return exp_x / exp_sm
def swish(x):
    return x * sigmoid(x)
exp_sm = exp_sum(x)
for i in range(len(x)):
    y1.append(linear(x[i]))
    y2.append(binary(x[i]))
    y3.append(sigmoid(x[i]))
    y4.append(tanh(x[i]))
    y5.append(relu(x[i]))
    y6.append(leakyrelu(x[i]))
    y7.append(exponentialLU(x[i]))
    y8.append(softmax(x[i], exp_sm))
    y9.append(swish(x[i]))
plt.plot(x, y1, label='Linear')
plt.plot(x, y2, label='Binary')
plt.legend()
plt.show()
plt.plot(x, y3, label='Sigmoid')
plt.plot(x, y4, label='Tanh')
plt.legend()
plt.show()
plt.plot(x, y5, label='ReLU')
plt.plot(x, y6, label='Leaky ReLU')
plt.plot(x, y7, label='Exponential LU')
plt.legend()
plt.show()
plt.plot(x, y8, label='Softmax')
plt.plot(x, y9, label='Swish')
plt.legend()
plt.show()
Binary class loss function: The loss function used in binary classification models is called the binary class loss function. In this type, the predicted output is one of two values, namely 0 or 1. A threshold value is chosen to classify the data under category 0 or 1. The default threshold is 0.5: the prediction falls under category 0 if the value is less than the threshold and under category 1 if the value is greater than the threshold.
The binary classification loss functions are as follows:
import numpy as np
y_actual = np.array([1.0, 0.0, 1.0, 0.0, 1.0])
y_pred = np.array([0.8, 0.1, 0.9, 0.3, 0.9])
# Calculates the mean squared error (MSE) loss between
# predicted and actual values.
def mseloss(y_actual, y_pred):
n = len(y_actual)
mse_loss = np.sum((y_pred - y_actual) ** 2) / n
return mse_loss
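Binary cross-entropy is another common binary-class loss; a minimal NumPy sketch (not one of the book's listings), assuming the same y_actual and y_pred arrays as above:
def bceloss(y_actual, y_pred, eps=1e-12):
    # clip to avoid log(0), then average the negative log-likelihood
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))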
Output:
0.031999999999999994
0.01885637600036785
0.15999999999999998
0.1791800084452842
0.56
0.31360000000000005
import numpy as np
y_actual_onehot = np.zeros_like(y_pred)
y_actual_onehot[np.arange(len(y_actual)), y_actual] = 1
# calculate loss
scce_loss = -np.mean(np.sum(y_actual_onehot * np.log(y_pred), axis=-1))
return scce_loss
print(multiclassCELoss(y_actual, y_pred))
y_actual = np.array([1,2,0])
y_pred = np.array([[0.1, 0.6, 0.3], [0.2, 0.3, 0.5],
[0.8, 0.1, 0.1]])
print(sparse_categorical_crossentropy(y_actual, y_pred))
print(kl_divergence_loss(y_actual, y_pred))
output:
0.47570545188004854
0.47570545188004854
0.43386458262986244
import numpy as np
The following code illustrates this with three input neurons in the
input layer and two neurons in output layer.
import numpy as np
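The original listing is not reproduced in full here; a minimal sketch of such a forward pass, with three input neurons, two output neurons, and randomly drawn weights (so the printed values will differ from the output shown next):
np.random.seed(1)
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
X = np.random.rand(2, 3)     # two samples, three input neurons
W = np.random.rand(3, 2)     # weights: three inputs to two output neurons
b = np.random.rand(2)        # one bias per output neuron
print(sigmoid(X @ W + b))    # 2x2 matrix of output activations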
[[0.77928423 0.81475486]
[0.79882622 0.83384928]]
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, mean_squared_error
from sklearn.preprocessing import OneHotEncoder
from sklearn.datasets import make_blobs
from tqdm import tqdm_notebook
# creation of color map
my_cmap = matplotlib.colors.LinearSegmentedColormap.from_list("", ["red","blue","green","yellow"])
self.bs2 = 0
self.bs3 = 0
self.bs4 = 0
# initialise w, b
if initialise:
self.wt1 = np.random.randn()
self.wt2 = np.random.randn()
self.wt3 = np.random.randn()
self.wt4 = np.random.randn()
self.wt5 = np.random.randn()
self.wt6 = np.random.randn()
self.wt7 = np.random.randn()
self.wt8 = np.random.randn()
self.wt9 = np.random.randn()
self.bs1 = 0
self.bs2 = 0
self.bs3 = 0
self.bs4 = 0
if display_loss:
loss = {}
m = X.shape[1]
self.wt1 -= learning_rate * dwt1 / m
self.wt2 -= learning_rate * dwt2 / m
self.wt3 -= learning_rate * dwt3 / m
self.wt4 -= learning_rate * dwt4 / m
self.wt5 -= learning_rate * dwt5 / m
self.wt6 -= learning_rate * dwt6 / m
if display_loss:
Y_pred = self.predict(X)
loss[i] = mean_squared_error(Y_pred, Y)
if display_loss:
plt.plot(loss.values())
plt.xlabel('Epochs')
plt.ylabel('Mean Squared Error')
plt.show()
ffn = simpleFFNetwork()
#predictions
Y_pred_train = ffn.predict(X_train)
#model performance
print("Training accuracy", round(accuracy_train, 2))
print("Validation accuracy", round(accuracy_val, 2))
(500, 2) (500,)
(375, 2) (125, 2)
import numpy as np
import matplotlib.pyplot as plt
# RNN parameters
input_size = 1
hidden_size = 10
output_size = 1
learning_rate = 0.01
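The listing's parameter initialization is not reproduced in full here; a minimal sketch of the weights, biases, and sigmoid assumed by the forward pass below:
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
Wx = np.random.randn(hidden_size, input_size) * 0.01    # input-to-hidden weights
Wh = np.random.randn(hidden_size, hidden_size) * 0.01   # hidden-to-hidden weights
Wy = np.random.randn(output_size, hidden_size) * 0.01   # hidden-to-output weights
bh = np.zeros((hidden_size, 1))                          # hidden bias
by = np.zeros((output_size, 1))                          # output bias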
# Forward pass
def forward(inputs):
xh = np.zeros((hidden_size, 1))
outputs = []
for x in inputs:
x = x.reshape(-1, 1) # Convert to column vector
mulx = np.dot(Wx, x)
mulh = np.dot(Wh, xh)
xh = mulx + mulh + bh
xh = sigmoid(xh)
# the output weight vector is multiplied with the hidden vector h to give the output vector
y = Wy @ xh + by
outputs.append(y)
return outputs, xh
# Training loop
for epoch in range(100): # 100 epochs
for i in range(len(X)):
# Prediction
predictions = []
for i in range(len(X)):
inputs = X[i].reshape(n_steps, input_size)
outputs, _ = forward(inputs)
predictions.append(outputs[-1].item())
Autoencoders
Autoencoders consist of encoders and decoders that map the input space
to a lower-dimensional intermediate representation and regenerate input
data from the intermediate representation, respectively.
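A minimal Keras sketch of this idea (illustrative sizes, not a listing from the book): a 784-dimensional input compressed to a 32-dimensional code and then reconstructed.
import tensorflow as tf
inputs = tf.keras.Input(shape=(784,))
code = tf.keras.layers.Dense(32, activation='relu')(inputs)       # encoder
outputs = tf.keras.layers.Dense(784, activation='sigmoid')(code)  # decoder
autoencoder = tf.keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')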
6.6 Summary
Artificial neural networks are imitations of the human brain’s neural
networks, which are used in predictions. The components involved are
input, hidden, and output layers; weights; bias; activation functions; and
loss functions. Two major types of ANNs are feed-forward and feedback
ANNs. We discussed both types and their subtypes in this chapter. Also, the
Python programs for implementing activation functions, loss functions,
feed-forward, and feedback network architectures were discussed. In
the next chapter, we will discuss the convolutional neural networks in
more detail.
CHAPTER 7
Convolutional Neural
Networks for IoT
7.1 Introduction
Deep neural networks have evolved into various types that are suitable for
different domains. Convolutional neural networks (CNNs) are one type
that are the right fit for computer vision applications. Object detection
and image processing applications can be developed using CNNs. CNNs
are different from other machine learning techniques because they can
extract features on their own from images as well as other types of data
without manual operations involved. There are many built-in CNNs such
as AlexNet, GoogleNet, ResNet, etc.
A sound implementation of a CNN lies in the better understanding
and usage of its basic operations and the components that it consists of.
The basic components are kernels, convolution operation, feature map,
pooling, and striding.
Figure 7-1. Generation of 3x3 feature maps with 5x5 input image
size, 3x3 filter size, and stride = 1
The output feature map size is given by Wo = (Wi − F + 2P)/S + 1, where Wi and Wo are the input image and output feature map sizes in pixels, F is the size of the filter or kernel, P is the amount of padding used to extend the borders of the input image, and S is the stride, the number of rows or columns skipped as the filter moves over the input image. Hence, after applying convolution, Wi x Wi x Din gets transformed into Wo x Wo x Dout.
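For instance, with the 5x5 input, 3x3 filter, no padding, and stride 1 of Figure 7-1, Wo = (5 − 3 + 2x0)/1 + 1 = 3, which matches the 3x3 feature map shown there.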
Input Layer
The input layer is the layer that has the input image with a predefined
size. The standard dimension of the image is 227x227x3 or 224x224x3 as
an RGB image. This image dimension is followed in almost all types of
CNN. Alternatively, the grayscale image dimension is 32x32x1.
Convolution Layer
The process of CNN starts with the convolution layer in which the
convolution operation is carried out between the input image and the filter
or kernel. The common filter sizes are 2x2, 3x3, 5x5, and 7x7. The filter
slides over the input image, and as it slides, the dot product is calculated
between the portion of the image covered by filter and the filter itself.
This results in a feature map that provides details about the image such
as edges, corners, notable parts of the objects, etc. The convolution layer
can generate a greater number of feature maps to provide various details of
the image with the usage of multiple filters. The feature maps in turn feed
to successive layers to learn more and more about the features of the input
image. A strength of the convolution layer is that it preserves the spatial relationships between pixels.
Pooling Layer
This layer acts as a bridge between the convolution layer and the fully
connected layer. The key role of this layer is to reduce the dimension of the feature maps, lowering the computation cost. This pooling process
is performed independently on each feature map irrespective of any
order in the feature maps. This layer along with the convolution layer
works for extracting the features and helps in recognizing the features
independently.
LeNet
This type of CNN is mainly focused on handwritten character recognition
with the regular convolution, pooling, and fully connected layers that have
nonlinear activation functions. The convolution layer is for extracting
features, pooling for downsampling, and using tanh activation functions
for nonlinearity. The FC layer is for classifying the input images under
various classes. To avoid overfitting of the model, dropout is used to reduce
complexity and improve performance.
The architecture demonstrated in Figure 7-3 consists of three convolution layers and two pooling layers arranged alternately. It also comprises three fully connected layers of size 120, 84, and 10, respectively, and the output is categorized into one of the 10 classes.
AlexNet
This model was proposed in 2012 through a research paper by Alex
Krizhevsky and his fellow researchers. It is a deep architecture that
involves five convolution layers along with max pooling. As we walk through
the architecture of AlexNet as shown in Figure 7-4, the input image of size
227x227x3 is fed to the first layer of convolution with a 11x11 filter size with
stride 4. So, 96 filters are applied in the first layer over the input layer. The
output dimension of the first layer is 55x55x96. The second layer is the max
pooling layer, which reduces the dimension as 27x27x96.
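Checking the arithmetic with the feature-map size formula: (227 − 11 + 2x0)/4 + 1 = 55, giving the 55x55x96 output, and AlexNet's 3x3 max pooling with stride 2 then gives (55 − 3)/2 + 1 = 27, i.e., 27x27x96.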
Later, the first fully connected layer of output size 4096 is generated.
Then one more fully connected layer is formed. Finally, the output layer is
formed with a fully connected 1,000 classes.
As we move forward through the model, the number of filters
increases, and it helps to extract features in more detail in the architecture.
Also, the feature map size decreases as we move deeper.
VGGNet
The VGG-16 model comprises 13 convolutional layers and 3 fully connected layers along with pooling layers. The input is fed to the first convolution layer, which has 64 filters of size 3x3; this is repeated once more in the second convolution layer, and the output of these two layers is 224x224x64. Max pooling is then applied with a 2x2 window and stride 2, giving 112x112x64.
Next, two more convolution layers are formed with 128 filters of size 3x3, giving 112x112x128. Max pooling reduces this to 56x56x128.
Next, three convolution layers with 256 filters of size 3x3 give 56x56x256, and max pooling reduces it to 28x28x256. Three convolution layers with 512 filters give 28x28x512, and max pooling reduces it to 14x14x512. The same set of three convolution layers and one max pooling layer is repeated to produce a 7x7x512 feature map. Finally, it is flattened into three fully connected layers, two of size 4,096 and one covering the 1,000 categorized classes.
The architecture shown in Figure 7-5 is VGGNet-16 as explained. VGGNet-19 is similar but has three more convolution layers.
GoogLeNet
This CNN model is very different from other models as it uses 1x1
convolution, global average pooling, inception module, and auxiliary
classifier for training.
Global average pooling: This layer transforms the 7x7 feature map
to 1x1, which highly decreases the computation cost. Other networks
such as AlexNet and ZFNet have fully connected layer loaded with more
parameters, which obviously increases the computation cost.
1x1 convolution: This convolution operation is used in the inception
module. It is used to reduce the number of weights and biases of the
architecture to increase the depth of the architecture.
Inception module: In other models the convolution layer has a fixed size, but the inception module performs three convolutions of different sizes along with max pooling and concatenates the four results to form the final output.
ResNet
The ResNet model starts with an input image of size 227x227x3. Zero padding increases the dimension to 229x229x3. This is followed by a convolution layer with a kernel size of 7x7 and stride 2, producing 112x112x64 feature maps. Next, max pooling is applied to reduce the size, giving 56x56x64.
"""
# Importing the necessary packages
import tensorflow as tf
#ouput
Model: "sequential"
_______________________________________________________________
Layer (type) Output Shape Param #
====================================================================
conv2d (Conv2D) (None, 24, 24, 50) 1300
max_pooling2d (MaxPooling2D) (None, 12, 12, 50) 0
conv2d_1 (Conv2D) (None, 10, 10, 50) 22550
max_pooling2d_1 (MaxPooling2D) (None, 5, 5, 50) 0
flatten (Flatten) (None, 1250) 0
dense (Dense) (None, 10) 12510
================================================================
#output
Epoch 1/10
422/422 - 66s - loss: 0.6083 - accuracy: 0.7824 - val_loss:
0.4082 - val_accuracy: 0.8518 - 66s/epoch - 157ms/step
Epoch 2/10
422/422 - 60s - loss: 0.3943 - accuracy: 0.8609 - val_loss:
0.3625 - val_accuracy: 0.8748 - 60s/epoch - 141ms/step
Epoch 3/10
422/422 - 61s - loss: 0.3484 - accuracy: 0.8777 - val_loss:
0.3344 - val_accuracy: 0.8772 - 61s/epoch - 145ms/step
Epoch 4/10
422/422 - 61s - loss: 0.3184 - accuracy: 0.8876 - val_loss:
0.2932 - val_accuracy: 0.8978 - 61s/epoch - 144ms/step
Epoch 5/10
422/422 - 59s - loss: 0.3003 - accuracy: 0.8925 - val_loss:
0.2778 - val_accuracy: 0.9072 - 59s/epoch - 139ms/step
Epoch 6/10
#output
1/1 [==========================] - 5s 5s/step - loss: 0.2767 -
accuracy: 0.9014
# Printing the test results
print('Test loss: {0:.4f}. Test accuracy: {1:.2f}%'.
format(test_loss, test_accuracy*100.))
#output
Test loss: 0.2767. Test accuracy: 90.14%
"""### Plotting images and the results"""
import matplotlib.pyplot as plt
import numpy as np
# Split the test_data into 2 arrays, containing the images and
# the corresponding labels
#output
column1 column2
0 0 T-shirt/top
1 1 Trouser
2 2 Pullover
3 3 Dress
4 4 Coat
5 5 Sandal
6 6 Shirt
7 7 Sneaker
8 8 Bag
9 9 Ankle boot
plt.axis('off')
plt.imshow(images_plot[i-1], cmap="gray", aspect='auto')
plt.show()
# Print the correct label for the image
print("Label: {} {}".format(labels_test[i-1], df.column2[labels_test[i-1]]))
#output
plt.bar(x=[0,1,2,3,4,5,6,7,8,9], height=probabilities[0],
tick_label=["T-shirt/top", "Trouser", "Pullover", "Dress",
"Coat","Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"])
#output
7.5 Summary
This chapter discussed the general architecture of CNNs. CNNs are
useful in computer vision-based applications. The various types of CNNs are used for specialized purposes in recognizing objects. LeNet, AlexNet, GoogLeNet, VGGNet, and ResNet were discussed, and their detailed architectures were illustrated. Finally, a case study on computer vision was implemented in Python with the Fashion-MNIST dataset, which recognized the garments with an overall test accuracy of 90.14%.
CHAPTER 8
RNNs, LSTMs, and GANs
RNN Architecture
An RNN has memory cells in the neurons that play a key role during
computation. The memory cell controls the flow of information between
input and output layers for keeping track of information and predicting
the output. During the forward propagation, it computes the output of
each neuron of that layer by updating its internal state in the memory cell
and forwards the output to the next layer. During backward propagation
through time (BPTT), the partial derivatives of the output are sent back through the network along the sequence of inputs received. This in turn helps determine the effect of the previous neuron's output on the current cell's output. The weights of the network are then tuned to get better predictions over the iterations of the training phase.
The formula for finding the current state (hidden) is as follows:
ht=f(ht-1,it)
where ht is the current state, ht-1 is the previous state, and it is the input.
After applying the activation function tanh, the current state expression is
as follows:
ht=tanh(whh*ht-1 + wih*it+b)
where whh is weight of recurrent neuron, wih is the weight of the input
neuron, and b is the bias of hidden layer.
The output is as follows:
ot = whoht + c
where who is the weight of the output layer and c is the bias at the
output layer.
Figure 8-1 illustrates the temporal unfold of an RNN, which depicts the
forward feed and backward feed over successive time instances.
Types of RNN
The RNN is classified into four types, as illustrated in Figure 8-2, based on
the nature or number of inputs and outputs: one-to-one, one-to-many,
many-to-one, and many-to-many.
One-to-one: This type is the simplest among all types. It has only one
input and one output. The input and output sizes are fixed. It is also known
as vanilla RNN. This type is recommended for image classification.
One-to-many: This type of RNN generates several outputs from a fixed-size single input. It generates a sequence of outputs, which makes it well suited to image captioning applications: it gives a sentence with multiple words as output from a single fixed-size image input.
Many-to-one: This type of RNN produces the fixed-size single output
from the sequence of several inputs. This receives a sentence with a sequence
of words as input and generates the classified output stating the sentence is
positive or negative or neutral. It is suitable for sentiment analysis applications.
Many-to-many: This type of RNN generates a sequence of several
outputs from the sequence of several inputs. This in turn is categorized
into two types based on the number of differences between the input and
output. If the number of inputs and outputs are the same, then it is called
an equal unit size and is suitable for name-entity recognition and tagging.
LSTM Architecture
The LSTM architecture mainly comprises three gates: input gate, forget
gate, and output gate, as shown in Figure 8-3. The input gate controls
the information stored into the memory cell, the forget gate controls the discarding of information from the memory cell, and finally the output
gate controls the reading of information stored in the memory cell. The
LSTM network has many such cells connected in succession of one after
the other.
The memory cell has two states: the hidden state and the cell state.
The states are updated continuously and carry forward the information
from one time step to the next. The cell state corresponds to the long-term
memory, and the hidden state corresponds to the short-term memory. The
hidden state of the current memory cell is updated based on the input, the
previous hidden state, and the current cell state. This mechanism allows
the LSTM to retain whatever information is necessary and discard the
unnecessary information from the memory cell and achieves the learning
of long-term dependencies.
A series of steps happens at each cell over the LSTM network to
receive the previous input and proceed further to produce the sequence
of information. The steps are computing the forget gate, determining the
input gate value, updating the cell state based on the previous two steps,
and finally computing the hidden state that serves as output to the next
cell. Hence, the hidden state carries the information from the previous cell,
whereas the cell state gathers all the previous information as the long-term
information carrier. This in turn supports LSTM to work well with time-
series and sequential applications.
The forget gate is computed using the function ft = σ(Wf·[ht-1, xt] + bf), where Wf is the weight matrix of the forget gate, [ht-1, xt] is the concatenation of the previous hidden state and the current input, bf is the bias of the forget gate, and σ is the sigmoid activation function.
The input gate value is computed using the function it = σ(Wi·[ht-1, xt] + bi), where Wi is the weight matrix of the input gate and bi is the bias of the input gate. A candidate cell state is computed as C̃t = tanh(Wc·[ht-1, xt] + bc).
The cell state is then updated using Ct = ft ⊙ Ct-1 + it ⊙ C̃t, where Ct-1 is the previous cell state and ⊙ is the element-wise (Hadamard) product.
Finally, the output gate is computed as ot = σ(Wo·[ht-1, xt] + bo), where Wo is the weight matrix of the output gate and bo is the bias of the output gate, and the hidden state that serves as the cell's output is ht = ot ⊙ tanh(Ct).
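A minimal NumPy sketch of one LSTM cell step following these equations (illustrative sizes and random parameters, not a listing from the book):
import numpy as np
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
n_in, n_hid = 3, 4
xt = np.random.randn(n_in, 1)                       # current input
ht_prev, Ct_prev = np.zeros((n_hid, 1)), np.zeros((n_hid, 1))
concat = np.vstack([ht_prev, xt])                   # [h_{t-1}, x_t]
Wf, Wi, Wc, Wo = (np.random.randn(n_hid, n_hid + n_in) for _ in range(4))
bf = bi = bc = bo = np.zeros((n_hid, 1))
ft = sigmoid(Wf @ concat + bf)                      # forget gate
it = sigmoid(Wi @ concat + bi)                      # input gate
Ct_cand = np.tanh(Wc @ concat + bc)                 # candidate cell state
Ct = ft * Ct_prev + it * Ct_cand                    # new cell state
ot = sigmoid(Wo @ concat + bo)                      # output gate
ht = ot * np.tanh(Ct)                               # new hidden state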
GAN training is guided by two losses: the generator loss and the discriminator loss. The generator loss reflects how poorly the fake data passes for the real data, and the discriminator loss reflects how often the discriminator accepts fake data as real. The two losses are communicated, as depicted in Figure 8-5, through backpropagation to the generator and discriminator, respectively, so that each improves its process and reduces its corresponding loss.
Types of GAN
The GANs are categorized based on its neural network structure, its
process, its suitability for applications, etc. There are various types of
GANs, and in this section, we discuss the famous types.
Text to Speech
This module requires the installation of a few language-related libraries that are used to convert text to speech. The pyttsx3 library is a cross-platform Python text-to-speech package that initializes a speech engine through the engine interface. Another library, espeak, is a speech synthesizer that generates clear, fast speech in English and a few other languages. gTTS is the Google Translate Text-to-Speech library, which makes it easy to convert text to speech from a Python script.
# gTTS converts the text input into an audio object which can be
# stored as an .mp3 file
from gtts import gTTS
sentence = "Welcome to learning algorithms for IoT"   # example text (assumed; the original listing defines its own sentence)
speak_lang = 'en'
top_domain = 'co.in'
speech = gTTS(text=sentence, lang=speak_lang, tld=top_domain, slow=False)
speech.save("firstspeech.mp3")
Speech to Text
This module converts the speech into text by using a speech recognition
library. It uses the Google Speech API. This library needs to be installed
before calling the Recognizer and recognize_google methods. This module
also uses the pydub library in Python to manipulate audio files such as
play, edit, merge/split, etc. The only constraint in using the pydub library is
that it manipulates only .wav files.
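The module's listing is not reproduced here; a minimal sketch of recognizing speech from a .wav file with the speech_recognition library (hypothetical file name):
import speech_recognition as sr
recognizer = sr.Recognizer()
with sr.AudioFile("firstspeech.wav") as source:   # assumed .wav input file
    audio = recognizer.record(source)             # read the entire audio file
text = recognizer.recognize_google(audio)         # Google Speech API
print(text)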
The recognized text is printed as the output.
8.7 Summary
The chapter explained the basic recurrent neural network and its
architecture in detail. The various types of RNNs were demonstrated using
figures. Besides the types, neural network connectivity was discussed.
Then the most efficient type of RNN, the LSTM, was covered, with its basic
architecture, and the mathematical expressions for all three gates were
discussed. Then the powerful bidirectional LSTM was explained so that
its efficiency could be fully utilized. Finally, the adversarial generator and discriminator model that forms a GAN was described along with its pros and cons.
CHAPTER 9
Optimization Methods
9.1 Introduction
Learning algorithms have several parameters, such as the learning rate, the number of epochs, etc. The output of a learning algorithm is compared to the actual output, and the difference between them, called the cost function, is calculated. Optimizers are functions used in machine learning algorithms to minimize the difference between the actual output and the predicted output by following the gradient. The minimization of the cost function value is carried out by computing the gradient, which measures how the cost function changes with respect to each of the parameters involved in learning. The higher the gradient, the steeper the slope and the faster learning happens. This repeats for several iterations in order to minimize the cost function value, and the learning process stops when the slope becomes zero.
Types of Optimizers
The optimizers update the weights involved in the learning algorithm
along with the learning rate to move toward the minimum cost function
value. It will repeat for certain iterations, and it stops at a point where
further minimization is negligible with a corresponding computing time. It
updates the weight as follows:
Wx = Wx−1 − α (∂θ/∂Wx)
The commonly used optimizers include the following:
• Adagrad
• RMSProp
• Adadelta
• Momentum
• Nesterov momentum
• Adam
• Adamax
• SMORMS3
Gradient descent may also fail to reach the global minimum. Since nonconvex functions have many peaks and valleys, multiple local minima exist, and this type may get trapped in any one of the local minima instead of the global minimum.
Another challenge is a saddle point problem where the gradient
becomes zero and is not an optimal value. Besides, very small and very
large gradient values lead to vanishing gradient or exploding gradient
problems. This is mainly because of the incorrect initialization of
learning parameters, and it results in high-cost function values due to
nonconvergence. The following code and Figure 9-1 demonstrate this
optimizer:
# function to be optimized
def f(x, y):
return x**2 + y**2
return series
# input specification
Batch gradient descent takes the entire training set in each iteration, computes the mean gradient, and updates the parameters using it. The cost function value decreases smoothly as it progresses over the epochs. This type is well suited to an increased number of features. Though it generates accurate output and a comparatively smooth error manifold, the computation cost is very high.
The downside of this optimizer is that it takes relatively more time for
convergence as it takes the entire training set during each iteration. This
type is not good for the nonconvex functions as it handles the entire set
of input in each iteration; therefore, there exists the possibility of getting
trapped in local minima. The following code and Figure 9-2 demonstrate
the convergence of this optimizer:
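The original listing spans pages not reproduced in full here; a minimal sketch of batch gradient descent on a simple linear model with a mean-squared-error cost (illustrative, not the book's listing):
import numpy as np
def batch_gradient_descent(X, y, lr=0.1, epochs=50):
    # linear model y_hat = X @ w, trained on the full batch each epoch
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(epochs):
        y_hat = X @ w
        grad = 2 * X.T @ (y_hat - y) / len(y)   # mean gradient over the whole training set
        w -= lr * grad                           # single update per epoch
        losses.append(np.mean((y_hat - y) ** 2))
    return w, losses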
return loss_val
#input parameters
X = npy.array([[0.88837122, 0.55088876, 0.07233497, 0.18170872,
0.75790715],
[0.99225237, 0.5395178, 0.79842092, 0.02775931, 0.39621768],
[0.30150944, 0.68509148, 0.3827384, 0.83898851, 0.7266716]])
y = npy.array([[0.67738928], [0.81657103], [0.13115408]])
loss_val = SGD_train(X, y, 20, stp_sz=0.1, plt_all=1)
mean_loss = current_loss / N
loss_val.append(mean_loss)
print(f"\nEpoch: {epoch}, Mean Square Error: {mean_loss}")
return loss_val
9.6 Adagrad
This type is called adaptive gradient because the learning rate changes at each iteration. Here the learning rate plays a major role in updating the parameters involved in learning, because different features influence the learning with different frequencies. The weights of high-frequency features are updated with low learning rates, and the weights of low-frequency features are updated with high learning rates, to get better accuracy. This type of optimizer is best suited for sparse data. Hence, at each iteration the learning rate α is calculated based on the time instance t and other quantities such as the frequency of the features. Also, with this type there is no need to change the learning rate manually.
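In its commonly stated form (the notation here follows the RMSProp section below and is ours rather than the book's), the per-parameter Adagrad update is Wt+1 = Wt − (α / √(Gt + ε)) * gradt, where Gt is the running sum of squared gradients of that parameter and ε is a small constant that avoids division by zero.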
The downside of the optimizer is that, as the number of iterations increases, the accumulated squared gradients grow very large, so the effective learning rate α becomes very small. This leads to the vanishing gradient problem, where there is almost no change between the cost function values of the previous and current iterations. The following code and Figure 9-5 demonstrate this optimizer:
# function to be optimized
def f(x, y):
    return x**2 + y**2

# Adagrad function
def adaG(f, df_dx_dy, xyval, num_itrns, stp_sz):
    # record solutions
    solns = list()
    sol_scores = []
    sol_traj = []
    # setting initial position
    soln = xyval[:, 0] + npy.random.rand(len(xyval)) * (xyval[:, 1] - xyval[:, 0])
    # compute sum of square gradients
    sq_gd_sums = [0.0 for _ in range(xyval.shape[0])]
    # gradient descent calculation
    for itr in range(num_itrns):
        grad = df_dx_dy(soln[0], soln[1])
        # calculate squared partial derivatives and add
        for i in range(grad.shape[0]):
            sq_gd_sums[i] += grad[i]**2.0
        # record new solution
        new_soln = list()
        for i in range(soln.shape[0]):
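            # The remainder of this listing is completed here as a minimal,
            # assumed reconstruction consistent with the Adagrad rule; the
            # small constant 1e-8 and the driver values below are ours.
            # per-parameter step from the accumulated squared gradients
            alfa = stp_sz / (1e-8 + sq_gd_sums[i] ** 0.5)
            # take the step for this variable and record it
            value = soln[i] - alfa * grad[i]
            new_soln.append(value)
        # evaluate and store the new solution
        soln = npy.asarray(new_soln)
        soln_eval = f(soln[0], soln[1])
        solns.append(soln)
        sol_scores.append(soln_eval)
        sol_traj.append(soln.copy())
        print(f"\nIteration: {itr}, solution: {soln}, evaluation of solution: {soln_eval}")
    return soln, sol_scores, sol_traj

# gradient of the objective (assumed helper)
def df_dx_dy(x, y):
    return npy.asarray([2.0 * x, 2.0 * y])

# assumed driver: bounds, Adagrad run, and surface for plotting
xyval = npy.asarray([[-1.0, 1.0], [-1.0, 1.0]])
solns, sol_scores, sol_traj = adaG(f, df_dx_dy, xyval, 50, 0.1)
x = npy.linspace(xyval[0, 0], xyval[0, 1], 100)
y = npy.linspace(xyval[1, 0], xyval[1, 1], 100)
X, Y = npy.meshgrid(x, y)
Z = f(X, Y)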
solns = npy.asarray(solns)
# Plot trajectory path
figure = mplt.figure()
grph = figure.add_subplot(111, projection='3d')
grph.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
grph.scatter(solns[0], solns[1], f(solns[0], solns[1]),
             color='red', label='Best solution')
grph.plot([pt[0] for pt in sol_traj],
          [pt[1] for pt in sol_traj], sol_scores,
          color='blue', label='Trajectory from initial point')
grph.set_xlabel('X')
grph.set_ylabel('Y')
grph.legend()
9.7 RMSProp
The root mean square propagation optimizer is an extension of Adagrad. It overcomes Adagrad's limitation of a monotonically decreasing learning rate by computing a moving average of the squared gradients for each individual weight. The learning rate is divided by this moving average so that the learning rate of each weight adapts. Because the moving average decays exponentially, older gradients are gradually discarded, which helps the optimizer reach the convergence point sooner once it discovers the convex portion of the function being optimized.
The gradient is calculated as gradt = δC/δw, where C is the cost function. The squared-gradient moving average is E[grad²]t = βE[grad²]t-1 + (1-β)gradt², where β is the decay rate, generally set to 0.9. The adaptive learning rate is then ηt = η / √(E[grad²]t + ε), where η is the initial learning rate and ε is a very small constant that avoids division by zero, often set to 1e-8. The weight is updated by Wt+1 = Wt − ηt * gradt. This process is carried out for each parameter of the network for the specified number of epochs or until the convergence point. The following code and Figure 9-6 demonstrate this optimizer:
# function to be optimized
def f(x1, x2):
    return 7 * x1**2.0 + 5 * x2**2.0
# initialization of parameters
init_x1 = -4.0
init_x2 = 3.0
lr_rate = 0.1
gamma = 0.9
epsilon = 1e-8
epochs_limit = 50
# RMSprop execution
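# A minimal RMSProp update loop, sketched as an assumed reconstruction (not
# the book's exact listing); it produces the x1_traj, x2_traj, and y_traj
# lists used by the plotting code below.
x1, x2 = init_x1, init_x2
Eg1, Eg2 = 0.0, 0.0  # moving averages of the squared gradients
x1_traj, x2_traj, y_traj = [x1], [x2], [f(x1, x2)]
for epoch in range(epochs_limit):
    # gradients of f(x1, x2) = 7*x1^2 + 5*x2^2
    g1, g2 = 14.0 * x1, 10.0 * x2
    # exponentially decaying averages of the squared gradients
    Eg1 = gamma * Eg1 + (1.0 - gamma) * g1**2
    Eg2 = gamma * Eg2 + (1.0 - gamma) * g2**2
    # per-parameter adaptive steps
    x1 = x1 - (lr_rate / (epsilon + Eg1**0.5)) * g1
    x2 = x2 - (lr_rate / (epsilon + Eg2**0.5)) * g2
    x1_traj.append(x1)
    x2_traj.append(x2)
    y_traj.append(f(x1, x2))
    print(f"Epoch: {epoch}, x1: {x1:.4f}, x2: {x2:.4f}, f: {f(x1, x2):.6f}")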
# plot in 3D
x1v = npy.arange(-5.0, 5.0, 0.1)
x2v = npy.arange(-5.0, 5.0, 0.1)
# Computing y component over the grid
X1, X2 = npy.meshgrid(x1v, x2v)
Y = f(X1, X2)
fig1 = mplt.figure()
graph3d = fig1.add_subplot(111, projection='3d')
graph3d.plot_surface(X1, X2, Y, cmap='viridis', alpha=0.5)
graph3d.scatter(x1_traj[-1], x2_traj[-1], f(x1_traj[-1], x2_traj[-1]),
                color='red', label='Best solution')
graph3d.plot(x1_traj, x2_traj, y_traj, color='blue',
             label='Trajectory from initial point')
graph3d.set_xlabel('x1')
graph3d.set_ylabel('x2')
graph3d.set_zlabel('y')
graph3d.legend()
graph3d.set_title('RMSProp learning')
9.8 Adadelta
This is also an extension of Adagrad. Instead of using the entire history of gradients, it uses an exponentially weighted average of the squares of a fixed window of past gradients, which keeps the learning rate from shrinking to a very small value. The weight update formula is the same, but the learning rate at each iteration is computed from this weighted average of a fixed number of past gradients.
The moving average E[grad²]t at time instant t is given by E[grad²]t = γE[grad²]t-1 + (1-γ)gradt², where γ is the decay rate, generally set to 0.9.
Let the parameter vector update be ∆θt = -η * gradt and θt+1 = θt + ∆θt. The Adagrad update vector is ∆θt = (-η / √(Gt + ϵ)) * gradt, and it is rewritten using the average of past squared gradients as ∆θt = (-η / √(E[grad²]t + ϵ)) * gradt.
This can then be written as ∆θt = (-η / RMS[grad]t) * gradt. Now the learning rate η is replaced with the RMS of the parameter updates up to the previous time step, RMS[∆θ]t−1. Then ∆θt becomes ∆θt = -(RMS[∆θ]t−1 / RMS[grad]t) * gradt, with θt+1 = θt + ∆θt. The following code demonstrates this optimizer:
# function to be optimized
def f(x, y):
    return 7.0*x**2.0 + 5.0*y**2.0

# Adadelta implementation
def AD(f, df_dx_dy, xyval, num_itr, rhow, ep=1e-3):
    # record all solutions
    solns = list()
    sol_scores = []
    sol_traj = []
    # set an initial position
    soln = xyval[:, 0] + npy.random.rand(len(xyval)) * (xyval[:, 1] - xyval[:, 0])
    # initialize list for storing average square gradients
    mean_sqr_grad = [0.0 for _ in range(xyval.shape[0])]
    # list of the average parameter updates
    mean_sqr_param = [0.0 for _ in range(xyval.shape[0])]
    # iterate, computing the gradient at the current solution
    for itr in range(num_itr):
        grad = df_dx_dy(soln[0], soln[1])
        for i in range(grad.shape[0]):
            # compute squared gradient
            sqgrad = grad[i]**2.0
            # compute squared gradient moving average and update
            mean_sqr_grad[i] = (mean_sqr_grad[i] * rhow) + (sqgrad * (1.0 - rhow))
        # record new solution
        new_soln = list()
        for i in range(soln.shape[0]):
            # compute the step size of this variable
            alfa = (ep + math.sqrt(mean_sqr_param[i])) / (ep + math.sqrt(mean_sqr_grad[i]))
            # record the variation
            variation = alfa * grad[i]
            # compute squared parameter moving average and update
            mean_sqr_param[i] = (mean_sqr_param[i] * rhow) + (variation**2.0 * (1.0 - rhow))
            # compute the next position using this variable and record it
            value = soln[i] - variation
            new_soln.append(value)
        soln = npy.asarray(new_soln)
        solns.append(soln)
        soln_eval = f(soln[0], soln[1])
        # display progress
        print(f"\nIteration: {itr}, solution: {soln}, evaluation of solution: {soln_eval}")
        # append solution and trajectory of this iteration
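        # The closing lines and driver of this listing are completed here as
        # a minimal, assumed reconstruction (the iteration count, rhow value,
        # bounds, and gradient helper below are ours, not the book's).
        sol_scores.append(soln_eval)
        sol_traj.append(soln.copy())
    return soln, sol_scores, sol_traj

# gradient of the objective (assumed helper)
def df_dx_dy(x, y):
    return npy.asarray([14.0 * x, 10.0 * y])

# assumed driver: bounds, Adadelta run, and surface for plotting
xyval = npy.asarray([[-1.0, 1.0], [-1.0, 1.0]])
solns, sol_scores, sol_traj = AD(f, df_dx_dy, xyval, 120, rhow=0.99)
x = npy.linspace(xyval[0, 0], xyval[0, 1], 100)
y = npy.linspace(xyval[1, 0], xyval[1, 1], 100)
X, Y = npy.meshgrid(x, y)
Z = f(X, Y)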
solns = npy.asarray(solns)
# Plot the trajectory
fig1 = mplt.figure()
grph = fig1.add_subplot(111, projection='3d')
grph.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
grph.scatter(solns[0], solns[1], f(solns[0], solns[1]),
             color='red', label='Best solution')
grph.plot([point[0] for point in sol_traj],
          [point[1] for point in sol_traj], sol_scores,
          color='blue', label='Trajectory from initial point')
grph.set_xlabel('X')
grph.set_ylabel('Y')
grph.legend()
9.9 Momentum
This type overcomes the downsides of the gradient descent variants by adding momentum, an exponentially weighted average of a fixed number of previous gradients.
It is an extension of stochastic gradient descent and works well for nonconvex functions. It reduces the number of oscillations of the traditional gradient descent optimizers by accelerating the search toward the global minimum.
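In the standard formulation (the notation here is ours), momentum maintains a velocity vt = γ * vt-1 + α * gradt and applies Wt+1 = Wt − vt, where γ is the momentum coefficient, typically around 0.9. Nesterov momentum is a closely related variant that evaluates the gradient at the look-ahead position Wt − γ * vt-1 before taking the step.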
# function to be optimized
def f(x, y):
    return x**2.0 + y**2.0
# function to be optimized
def f(x, y):
    return x**2.0 + y**2.0
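A minimal Nesterov momentum routine for this objective, sketched under assumptions (the function name NAG, the momentum coefficient, step size, iteration count, and bounds below are ours rather than the book's exact listing), produces the solns, sol_scores, and sol_traj values used by the plotting code that follows:
# gradient of the objective (assumed helper)
def df_dx_dy(x, y):
    return npy.asarray([2.0 * x, 2.0 * y])

# Nesterov momentum implementation (assumed sketch)
def NAG(f, df_dx_dy, xyval, num_itr, stp_sz, mu):
    sol_scores = []
    sol_traj = []
    # set an initial position and zero velocity
    soln = xyval[:, 0] + npy.random.rand(len(xyval)) * (xyval[:, 1] - xyval[:, 0])
    velocity = npy.zeros(len(xyval))
    for itr in range(num_itr):
        # evaluate the gradient at the look-ahead position
        lookahead = soln + mu * velocity
        grad = df_dx_dy(lookahead[0], lookahead[1])
        # update the velocity and take the step
        velocity = mu * velocity - stp_sz * grad
        soln = soln + velocity
        soln_eval = f(soln[0], soln[1])
        sol_scores.append(soln_eval)
        sol_traj.append(soln.copy())
        print(f"\nIteration: {itr}, solution: {soln}, evaluation of solution: {soln_eval}")
    return soln, sol_scores, sol_traj

# assumed driver values
xyval = npy.asarray([[-1.0, 1.0], [-1.0, 1.0]])
solns, sol_scores, sol_traj = NAG(f, df_dx_dy, xyval, 30, 0.1, 0.8)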
# plot 3D visualization
x = npy.linspace(xyval[0, 0], xyval[0, 1], 100)
y = npy.linspace(xyval[1, 0], xyval[1, 1], 100)
X, Y = npy.meshgrid(x, y)
Z = f(X, Y)
solns = npy.asarray(solns)
fig1 = mplt.figure()
grph = fig1.add_subplot(111, projection='3d')
grph.plot_surface(X, Y, Z, cmap='viridis', alpha=0.5)
grph.scatter(solns[0], solns[1], f(solns[0], solns[1]),
             color='red', label='Best solution')
grph.plot([point[0] for point in sol_traj],
          [point[1] for point in sol_traj], sol_scores,
          color='blue', label='Trajectory from initial point')
grph.set_xlabel('X')
grph.set_ylabel('Y')
grph.legend()
9.11 Adam
This type is the adaptive moment estimation optimizer, which computes an adaptive learning rate for each parameter involved in learning at every iteration. It also employs bias correction in the earlier iterations to balance the initialization bias. It is an amalgamation of momentum and RMSProp for determining the parameters. This type is best for nonconvex functions and noisy functions with sparse gradients.
The hyperparameters used in Adam are α, the step size; β1, the decay rate of the momentum; β2, the decay rate of the squared gradients; and ϵ, a small value to avoid division by zero, generally set to 1e-8.
The momentum is given by mt = β1 * mt-1 + (1-β1) * gradt.
The root mean squared gradient is given by υt = β2 * υt-1 + (1-β2) * gradt².
The update is given by θ = θ − (α * mt) / √(υt + ϵ).
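In the standard Adam formulation (stated here for completeness; the notation is ours), the bias-corrected moments m̂t = mt / (1 − β1^t) and υ̂t = υt / (1 − β2^t) replace mt and υt in the update, θ = θ − (α * m̂t) / (√υ̂t + ϵ), which is what the bias correction mentioned above refers to.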
The Adam optimizer requires very little memory and is suitable for problems with non-stationary objectives, large datasets, many parameters, and so on.
The downside of the optimizer is that its performance is highly sensitive to the hyperparameters, and it requires more computational overhead compared to the plain gradient descent variants. The following code and Figure 9-10 demonstrate this optimizer:
# function to be optimized
def f(x, y):
    return x ** 2.0 + y ** 2.0

# Adam implementation
def ADAM(f, df_dx_dy, xyval, num_itr, alfa, bta1, bta2, eps=1e-8):
    # set an initial position
    x = xyval[:, 0] + npy.random.rand(len(xyval)) * (xyval[:, 1] - xyval[:, 0])
    sol_scores = []
    sol_traj = []
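    # The body and driver of this listing are completed here as a minimal,
    # assumed reconstruction of the Adam rule (the moment variable names,
    # iteration count, and hyperparameter values below are ours).
    # first and second moment vectors
    m = [0.0 for _ in range(xyval.shape[0])]
    v = [0.0 for _ in range(xyval.shape[0])]
    for itr in range(num_itr):
        grad = df_dx_dy(x[0], x[1])
        for i in range(x.shape[0]):
            # update biased first and second moment estimates
            m[i] = bta1 * m[i] + (1.0 - bta1) * grad[i]
            v[i] = bta2 * v[i] + (1.0 - bta2) * grad[i]**2.0
            # bias-corrected estimates
            m_hat = m[i] / (1.0 - bta1**(itr + 1))
            v_hat = v[i] / (1.0 - bta2**(itr + 1))
            # parameter update
            x[i] = x[i] - alfa * m_hat / (v_hat**0.5 + eps)
        sol_score = f(x[0], x[1])
        sol_scores.append(sol_score)
        sol_traj.append(x.copy())
        print(f"\nIteration: {itr}, solution: {x}, score of solution: {sol_score}")
    return x, sol_scores, sol_traj

# gradient of the objective (assumed helper)
def df_dx_dy(x, y):
    return npy.asarray([2.0 * x, 2.0 * y])

# assumed driver: bounds and Adam run
xyval = npy.asarray([[-1.0, 1.0], [-1.0, 1.0]])
solns, sol_scores, sol_traj = ADAM(f, df_dx_dy, xyval, 60, 0.1, 0.9, 0.999)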
# plot 3D visualization
x = npy.linspace(xyval[0, 0], xyval[0, 1], 100)
y = npy.linspace(xyval[1, 0], xyval[1, 1], 100)
X, Y = npy.meshgrid(x, y)
Z = f(X, Y)
9.12 Adamax
This type is an extended version of the Adam optimizer. Adam scales the update by the exponentially decayed average of squared past gradients, whereas Adamax scales it by the maximum of past gradient magnitudes (the exponentially weighted infinity norm). This optimizer automatically adjusts the learning rate for all parameters involved in the learning model. The moment vector and the exponentially weighted infinity norm for each parameter are denoted mv and ev. The gradient is given by grad(t) = f'(x(t-1)). The moment vector update using the gradient and the hyperparameter β1 is mv(t) = β1 * mv(t-1) + (1-β1) * grad(t). The exponentially weighted infinity norm using the β2 hyperparameter is ev(t) = max(β2 * ev(t-1), abs(grad(t))). The parameter is then updated with the rule x(t) = x(t-1) − (α / (1 − β1^t)) * mv(t) / ev(t), which the following listing implements:
# function to be optimized
def f(x, y):
    return x**2.0 + y**2.0

# Adamax implementation
def adMax(f, df_dx_dy, xyval, num_itr, alfa, bta1, bta2):
    solns = list()
    # set an initial position
    x = xyval[:, 0] + npy.random.rand(len(xyval)) * (xyval[:, 1] - xyval[:, 0])
    sol_scores = []
    sol_traj = []
    # moment initialization
    m1 = [0.0 for _ in range(xyval.shape[0])]
    m2 = [0.0 for _ in range(xyval.shape[0])]
    for itr in range(num_itr):
        # compute gradient
        grad = df_dx_dy(x[0], x[1])
        # find solution for each variable one at a time
        for i in range(x.shape[0]):
            m1[i] = bta1 * m1[i] + (1.0 - bta1) * grad[i]
            m2[i] = max(bta2 * m2[i], abs(grad[i]))
            stp_sz = alfa / (1.0 - bta1**(itr + 1))
            dta = m1[i] / m2[i]
            x[i] = x[i] - stp_sz * dta
        sol_score = f(x[0], x[1])
        sol_scores.append(sol_score)
        sol_traj.append(x.copy())
        # display the progress
        print(f"\nIteration: {itr}, solution: {x}, score of solution: {sol_score}")
    return x, sol_scores, sol_traj
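# Assumed driver (the gradient helper, bounds, and hyperparameter values are
# ours, not the book's exact listing) so that the plotting code below has
# the values it refers to.
def df_dx_dy(x, y):
    return npy.asarray([2.0 * x, 2.0 * y])

xyval = npy.asarray([[-1.0, 1.0], [-1.0, 1.0]])
solns, sol_scores, sol_traj = adMax(f, df_dx_dy, xyval, 60, 0.1, 0.9, 0.999)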
# plot 3D visualization
x = npy.linspace(xyval[0, 0], xyval[0, 1], 100)
9.13 SMORMS3
Squared mean over root mean squared cubed (SMORMS3) is a variant of RMSProp in which the squared gradient term is replaced with the cube of the squared gradient. Unlike RMSProp, which averages the squared gradients, SMORMS3 uses the cube root of the moving average of the cubed squared gradients, which keeps the learning rate from decreasing too quickly and from becoming so small that it slows down the optimization process. In addition, like RMSProp, it has a damping factor that prevents the learning rate from becoming too large.
It is best for functions whose gradients have high variance during the optimization process. It offers better performance when the dataset is large and has higher dimensions, and it remains stable even when the data has noisy gradients. These benefits come at the cost of higher computational expense. Try coding this yourself; it requires only minor modifications to the RMSProp code.
9.14 Summary
Gradient descent and its variants are simple optimizers but comparatively slow in computation. Adagrad adapts the learning rate and achieves better computational speed. RMSProp and Adadelta are alternatives that use a moving average of gradients and avoid the monotonically decreasing learning rate. Momentum and Nesterov momentum add momentum to speed up convergence. The Adam optimizer computes updates for each parameter in each iteration and applies bias correction. Adamax applies the infinity norm of the moving average of the gradient. SMORMS3 applies the cube of the squared gradient. Each optimizer has its own benefits, and based on the nature of the problem the appropriate optimizer can be applied to get better results.