MLT Unit-4 Notes

UNIT-4

 Artificial Neural Network:

The term "Artificial Neural Network" is derived from biological neural networks, which form the structure of the human brain. Just as the human brain has neurons interconnected with one another, an artificial neural network has neurons interconnected with one another in the various layers of the network. These neurons are known as nodes.

An Artificial Neural Network is a computing system in the field of Artificial Intelligence that attempts to mimic the network of neurons that makes up the human brain, so that computers can understand things and make decisions in a human-like manner. An artificial neural network is built by programming computers to behave like interconnected brain cells.

The human brain contains on the order of 100 billion neurons, and each neuron is connected to somewhere between 1,000 and 100,000 others. In the human brain, data is stored in a distributed manner, and more than one piece of this data can be retrieved from memory in parallel when necessary. We can therefore think of the human brain as an incredibly powerful parallel processor.

We can understand the artificial neural network with an example. Consider a digital logic gate that takes inputs and gives an output, such as an "OR" gate with two inputs: if one or both of the inputs are "On," the output is "On"; if both inputs are "Off," the output is "Off." Here the output depends only on the inputs. Our brain does not work this way: the relationship between outputs and inputs keeps changing, because the neurons in our brain are "learning."
 Perceptrons:

A single-layer perceptron is the basic unit of a neural network. A perceptron consists of input values, weights and a bias, a weighted sum, and an activation function.

 Basic Components of Perceptron:

The perceptron is a type of artificial neural network and a fundamental concept in machine learning. The basic components of a perceptron are:
1. Input Layer: The input layer consists of one or more input
neurons, which receive input signals from the external
world or from other layers of the neural network.

2. Weights: Each input neuron is associated with a weight, which represents the strength of the connection between the input neuron and the output neuron.

3. Bias: A bias term is added to the weighted sum to give the perceptron additional flexibility in modeling complex patterns in the input data.

4. Activation Function: The activation function determines the output of the perceptron based on the weighted sum of the inputs and the bias term. Common activation functions used in perceptrons include the step function, the sigmoid function, and the ReLU function.

5. Output: The output of the perceptron is a single binary value, either 0 or 1, which indicates the class or category to which the input data belongs.

6. Training Algorithm: The perceptron is typically trained using a supervised learning algorithm such as the perceptron learning algorithm or backpropagation. During training, the weights and biases of the perceptron are adjusted to minimize the error between the predicted output and the true output for a given set of training examples.

Overall, the perceptron is a simple yet powerful algorithm that can be used to perform binary classification tasks, and it paved the way for the more complex neural networks used in deep learning today.
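Tying the components above together, here is a minimal sketch of a single perceptron with a step activation function, trained with the perceptron learning rule on the OR-gate data from the earlier example; the learning rate and number of epochs are arbitrary choices for illustration:

```python
import numpy as np

# OR-gate training data: two binary inputs, one binary target
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 1])

w = np.zeros(2)      # weights, one per input
b = 0.0              # bias term
eta = 0.1            # learning rate

def step(z):
    """Step activation: output 1 if the weighted sum exceeds 0, else 0."""
    return 1 if z > 0 else 0

for epoch in range(10):                  # a few passes over the data
    for xi, target in zip(X, y):
        o = step(np.dot(w, xi) + b)      # weighted sum + bias -> activation
        w += eta * (target - o) * xi     # perceptron learning rule
        b += eta * (target - o)

print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 1, 1, 1]
```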

 Multilayer Perceptron:
The Multilayer Perceptron (MLP) was developed to overcome the main limitation of the single-layer perceptron: it can only learn linear decision boundaries. An MLP is a neural network where the mapping between inputs and output is non-linear.

A Multilayer Perceptron has an input layer, an output layer, and one or more hidden layers with many neurons stacked together. It uses the backpropagation algorithm to improve the accuracy of the trained model. And while in the perceptron the neuron must use a threshold (step) activation function, neurons in a Multilayer Perceptron can use arbitrary activation functions such as the sigmoid, tanh, or ReLU.

Working:
1. Each input node represents a feature of the dataset.
2. Each input node passes its input value to the hidden layer.
3. In the hidden layer, each edge has a weight that is multiplied by the corresponding input value. All these products arriving at a hidden node are summed together to generate that node's output.
4. The activation function is applied in the hidden layer to determine the active nodes.
5. The result is passed on to the output layer.
6. The difference between the predicted and actual output is calculated at the output layer.
7. The model uses backpropagation to adjust the weights after calculating the predicted output. (A toy forward pass is sketched below.)
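As a rough illustration of steps 1–7, the toy sketch below performs one forward pass of a tiny MLP with a single hidden layer and sigmoid activations; the layer sizes and random weights are assumptions, and the backpropagation update itself is omitted:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = np.array([0.5, -1.2, 3.0])          # steps 1-2: input feature values

W1 = rng.normal(size=(4, 3))            # hidden layer: 4 neurons, 3 inputs
b1 = np.zeros(4)
W2 = rng.normal(size=(1, 4))            # output layer: 1 neuron
b2 = np.zeros(1)

h = sigmoid(W1 @ x + b1)                # steps 3-4: weighted sums + activation
y_pred = sigmoid(W2 @ h + b2)           # step 5: output layer

y_true = 1.0
error = y_true - y_pred                 # step 6: error used by backprop (step 7)
print(y_pred, error)
```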

 Gradient Descent and Delta Rule:

A set of data points is said to be linearly separable if the data can be divided into two classes using a straight line. If the data cannot be divided into two classes using a straight line, the data points are said to be non-linearly separable. Although the perceptron rule finds a successful weight vector when the training examples are linearly separable, it can fail to converge if the examples are not linearly separable.

A second training rule, called the delta rule, is designed to overcome this difficulty. If the training examples are not linearly separable, the delta rule converges toward a best-fit approximation to the target concept.

The key idea behind the delta rule is to use gradient descent to
search the hypothesis space of possible weight vectors to find the
weights that best fit the training examples.

This rule is important because gradient descent provides the basis for the BACKPROPAGATION algorithm, which can learn networks with many interconnected units.

Derivation of the Delta Rule

The delta training rule is best understood by considering the task of training an unthresholded perceptron; that is, a linear unit for which the output o is given by

o(x) = w · x = w0 + w1x1 + w2x2 + ... + wnxn

Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.

In order to derive a weight learning rule for linear units, let us begin by specifying a measure of the training error of a hypothesis (weight vector) relative to the training examples. Although there are many ways to define this error, one common measure is

E(w) = (1/2) Σd∈D (td – od)²

where D is the set of training examples, td is the target output for training example d, and od is the output of the linear unit for training example d.

How do we calculate the direction of steepest descent along the error surface?

The direction of steepest descent can be found by computing the derivative of E with respect to each component of the vector w. This vector derivative is called the gradient of E with respect to w, written as

∇E(w) = [ ∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn ]

Since the gradient specifies the direction of steepest increase of E, the training rule for gradient descent is

w ← w + Δw,  where  Δw = –η ∇E(w)

Here η is a positive constant called the learning rate, which determines the step size in the gradient descent search.

The negative sign is present because we want to move the weight vector in the direction that decreases E.

This training rule can also be written in its component form:

wi ← wi + Δwi,  where  Δwi = –η ∂E/∂wi

Here, differentiating the error measure E defined above gives

∂E/∂wi = Σd∈D (td – od)(–xid)

Finally, substituting this into the rule yields the delta rule for gradient descent:

Δwi = η Σd∈D (td – od) xid

where xid denotes the input component xi for training example d.
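As a purely illustrative sketch, the update above can be implemented for a small made-up dataset as follows; the learning rate and iteration count are arbitrary:

```python
import numpy as np

# Toy training set D: each row of X is an example, t its target output
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
t = np.array([3.0, 3.0, 7.0, 7.0])       # here the target happens to be x1 + x2

w = np.zeros(2)                           # weight vector of the linear unit
eta = 0.01                                # learning rate

for _ in range(1000):
    o = X @ w                             # linear unit output od for every example
    grad = -(X.T @ (t - o))               # dE/dwi = sum_d (td - od)(-xid)
    w = w - eta * grad                    # delta rule: wi <- wi + eta * sum_d (td - od) * xid

print(w)                                  # converges toward [1.0, 1.0]
```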

 Self-Organizing Map:
A Self-Organizing Map (SOM), or Kohonen map, is a type of artificial neural network introduced by Teuvo Kohonen in the 1980s.
A SOM is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. It differs from other ANNs in that it applies competitive learning rather than error-correction learning (such as backpropagation with gradient descent). SOMs use a neighborhood function to preserve the topological properties of the input space, reducing the data to a spatially organized representation that also helps to discover correlations in the data.

Although the SOM was initially proposed for data visualization, it has been applied to different problems, including a solution to the Traveling Salesman Problem (TSP).

 Applications

 Dimensionality reduction and data visualization: In terms of dimensionality reduction, Principal Component Analysis (PCA) is one of the most popular tools and has been used broadly. Compared to PCA, SOM has the advantage of maintaining the topological (structural) information of the training data and is not inherently linear. Using PCA on high-dimensional data may cause a loss of information when the dimension is reduced to two. If the target data has many dimensions and every dimension is equally important, SOM can be more useful than PCA.

 Seismic facies analysis for oil and gas exploration: Seismic facies analysis refers to the interpretation of facies type from seismic reflector information. It generates groups based on the identification of different individual features. These methods find an organization in the dataset and form organized relational clusters. However, these clusters may or may not have any physical analogs, so a calibration method relating SOM clusters to physical reality is needed. This calibration method must define the mapping between the groups and the measured physical properties; it should also provide an estimate of the validity of the relationships.

 Text Clustering: Text clustering is an unsupervised learning process, i.e., it does not depend on prior knowledge of the data and relies solely on the similarity relationships between documents in order to separate a document collection into clusters. An important preprocessing step in text clustering is deciding how the text can be represented as a mathematical expression for further analysis and processing. A common choice is Salton's Vector Space Model (VSM).

 Self-Organizing Maps architecture

Self-organizing maps consist of two layers: the first is the input layer, and the second is the output layer, also called the feature map.

A SOM can integrate multi-modal input vectors and extract the relations among them in a 2-dimensional plane. A SOM can also be used for clustering unlabeled data, or for classifying labeled data by labeling the output units after learning. Unlike other ANN types, a SOM does not use activation functions in its neurons; the input is compared directly against the weight vectors of the output layer.
 What really happens in SOM?
Each data point in the data set competes for representation. SOM mapping starts by initializing the weight vectors. A sample vector is then selected at random, and the map of weight vectors is searched to find the weight vector that best represents that sample. Each weight vector has neighboring weights that are close to it. The winning weight vector is rewarded by being moved closer to the randomly selected sample vector; the neighbors of that weight vector are also moved closer to the sample. This allows the map to form different shapes: most commonly square, rectangular, hexagonal, or L shapes in the 2D feature space.

 Algorithm:
The learning procedure of a SOM is described below.

1. Let wi,j(t) be the weight from an input layer unit i to a Kohonen layer unit j at time t. Each wi,j(0) is initialized with random numbers.

2. Let xi(t) be the data input to the input layer unit i at time t;
calculate the Euclidean distance dj between xi(t) and wi,j(t)

3. Search for the Kohonen layer unit that minimizes dj; this unit is designated the best matching unit.

4. Update the weight wi,j(t) of each Kohonen layer unit contained in the neighborhood region Nc(t) of the best matching unit using equation (2), where α(t) is a learning coefficient:

wi,j(t+1) = wi,j(t) + α(t) ( xi(t) – wi,j(t) )     (2)

5. Repeat processes 2–4 up to the maximum iteration of learning.
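A compact NumPy sketch of steps 1–5 is given below (illustrative only; the grid size, iteration count, and the exponentially decaying learning coefficient and neighborhood radius are assumed values, not prescriptions from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((200, 3))              # 200 input vectors, e.g. RGB colours

grid = 10                                # 10 x 10 Kohonen layer
W = rng.random((grid, grid, 3))          # step 1: random initial weights
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid), indexing="ij"), axis=-1)

n_iter = 2000
for t in range(n_iter):
    x = data[rng.integers(len(data))]                  # step 2: pick an input vector
    d = np.linalg.norm(W - x, axis=-1)                 # Euclidean distance to every unit
    bmu = np.unravel_index(np.argmin(d), d.shape)      # step 3: best matching unit

    alpha = 0.5 * np.exp(-t / n_iter)                  # decaying learning coefficient
    sigma = grid / 2 * np.exp(-t / n_iter)             # decaying neighborhood radius
    dist2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-dist2 / (2 * sigma ** 2))              # neighborhood function around the BMU

    W += alpha * h[..., None] * (x - W)                # step 4: update wi,j(t+1)
```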

 Self-organizing maps training:
As mentioned before, a SOM does not use backpropagation with SGD to update its weights; this type of unsupervised ANN uses competitive learning to update its weights. Competitive learning is based on three processes:

 Competition
 Cooperation
 Adaptation
Let’s look at the process in detail-

 Competition:

Each neuron of the output layer has a weight vector of dimension n. We compute the distance between each output layer neuron's weight vector and the input data; the neuron with the lowest distance is the winner of the competition. The Euclidean metric is commonly used to compute this distance.

 Cooperation:
We will update the vector of the winner neuron in the final process (adaptation), along with its neighbors. How do we choose the neighbors? Neighbors are selected using a neighborhood kernel function, which depends on two factors: time and the distance between the winner neuron and the other neuron.

 Adaptation:
After selecting the winner neuron and its neighbors, we update those neurons. They are not all updated by the same amount: the greater the distance between a neuron and the input data, the smaller the adjustment applied to it.

 Pros of Kohonen Maps


 Data is easily interpreted and understood (reduction of
dimensionality and grid clustering)
 Capable of handling several types of classification problems while providing a useful and intelligible summary of the data.

 Cons of Kohonen Maps


 It does not build a generative model for the data, i.e.,
the model does not understand how data is created.
 It does not perform well with categorical data, and even worse with mixed types of data.
 Model preparation is slow, and the map is hard to train against slowly evolving data.

 Convolutional Neural Network:

A Convolutional Neural Network (also known as ConvNet or CNN) is a type of feed-forward neural network used in tasks like image analysis, natural language processing, and other complex image classification problems.

Convolution Layer:

A CNN works by comparing images piece by piece.

Filters are spatially small along width and height but extend through the full depth of the input image. Each filter is designed so that it detects a specific type of feature in the input image.
In the convolution layer, we move the filter/kernel to every possible position on the input matrix. At each position, element-wise multiplication between the filter-sized patch of the input image and the filter is performed, and the results are summed.

Translating the filter to every possible position of the input image matrix makes it possible to discover whether the feature is present anywhere in the image.

The resulting matrix is called the feature map.

Convolutional neural networks can learn multiple features in parallel. In the final stage, we stack all the output feature maps along the depth dimension and produce the output. A minimal example of a single convolution is sketched below.
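To make the sliding-filter idea concrete, below is a minimal NumPy sketch (not from the notes) that slides a single filter over a toy grayscale image with stride 1 and no padding, producing one feature map:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide `kernel` over every valid position of `image` (stride 1, no padding)
    and return the feature map of element-wise products summed at each position."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    feature_map = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]
            feature_map[i, j] = np.sum(patch * kernel)
    return feature_map

image = np.random.rand(6, 6)                      # toy 6x6 grayscale image
edge_filter = np.array([[1.0, 0.0, -1.0]] * 3)    # crude vertical-edge detector
print(convolve2d(image, edge_filter).shape)       # (4, 4) feature map
```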
Now, let's go over a few important terms that you might encounter
when learning about Convolutional Neural Networks.

Local connectivity: Images are represented as a matrix of pixel values, and the dimension grows with the size of the image. If every neuron were connected to all the neurons in the previous layer, as in a fully connected layer, the number of parameters would increase manifold.
To resolve this, we connect each neuron to only a patch of the input data. This spatial extent (also known as the receptive field of the neuron) determines the size of the filter.

Here's how it works in practice: suppose we have an input image of size 128*128*3. If the filter size is 5*5*3, then each neuron in the convolution layer will have a total of 5*5*3 = 75 weights (plus 1 bias parameter).
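The same arithmetic can be checked in code; the figure of 32 filters for the whole layer is an assumed value used only to extend the example:

```python
filter_h, filter_w, in_depth = 5, 5, 3
weights_per_neuron = filter_h * filter_w * in_depth       # 75 weights
params_per_neuron = weights_per_neuron + 1                # +1 bias = 76

n_filters = 32                                            # assumed layer width
layer_params = n_filters * params_per_neuron              # 32 * 76 = 2432
print(weights_per_neuron, params_per_neuron, layer_params)
```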

Spatial arrangement governs the size of the output volume and how the neurons in it are arranged.

Three hyperparameters control the size of the output volume (a worked example follows this list):

 Depth—The depth of the output volume is equal to the number of filters we use to look for different features in the image. The output volume has the activation/feature maps stacked along the depth, making the depth equal to the number of filters used.
 Stride—Stride refers to the number of pixels we slide while matching the filter with the input image patch. If the stride is one, we move the filter one pixel at a time. The higher the stride, the smaller the output volume produced spatially.
 Zero-padding—Padding zeros around the border of the input data allows us to control the spatial size of the output volume.
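Along each spatial dimension, these hyperparameters determine the output size through the standard relation (W – F + 2P)/S + 1, where W is the input size, F the filter size, P the padding, and S the stride. A quick check in code, with numbers chosen only as an example:

```python
def conv_output_size(w, f, p, s):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1

# 128x128 input, 5x5 filter, no padding, stride 1 -> 124x124
print(conv_output_size(128, 5, p=0, s=1))
# same input with 2 pixels of zero-padding -> 128x128 (spatial size preserved)
print(conv_output_size(128, 5, p=2, s=1))
```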

Parameter sharing means that the same weight matrix acts on all the neurons in a particular feature map—the same filter is applied in different regions of the image. Natural images have statistical properties, one of which is invariance to translation.
For example, an image of a cat remains an image of a cat even if it is translated one pixel to the right—CNNs take this property into account by sharing parameters across multiple image locations. Thus, we can find a cat with the same feature matrix whether the cat appears at column i or column i+1 in the image.

ReLU Layer:

In this layer, the ReLU activation function is applied: every negative value in the output volume from the convolution layer is replaced with zero. This introduces non-linearity into the network.

Pooling Layer:

Pooling layers are added in between two convolution layers with the
sole purpose of reducing the spatial size of the image
representation.

The pooling layer has two hyperparameters:


 window size
 stride

From each window, we take either the maximum value or the average of the values in the window, depending on the type of pooling being performed.

The pooling layer operates independently on every depth slice of the input, resizes each slice spatially, and then stacks the slices back together.
Types of Pooling:

Max Pooling selects the maximum element from each window of the feature map. Thus, after the max-pooling layer, the output is a feature map containing the most dominant features of the previous feature map.

Average Pooling computes the average of the elements present in the region of the feature map covered by the filter. It simply averages the features from the feature map.

NOTE: In practice, Max Pooling often works better than Average Pooling at preserving the most salient features. A small sketch of both is shown below.

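A minimal NumPy sketch of the pooling operation described above, assuming the common choice of a 2x2 window with stride 2:

```python
import numpy as np

def max_pool(feature_map, window=2, stride=2):
    """Take the maximum of every window x window patch, moving by `stride`."""
    h = (feature_map.shape[0] - window) // stride + 1
    w = (feature_map.shape[1] - window) // stride + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            patch = feature_map[i * stride:i * stride + window,
                                j * stride:j * stride + window]
            out[i, j] = patch.max()      # use patch.mean() for average pooling
    return out

fm = np.arange(16).reshape(4, 4)
print(max_pool(fm))                      # 2x2 map: [[5, 7], [13, 15]]
```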
Normalization Layer:

Normalization layers, as the name suggests, normalize the output of the previous layers. They are added between the convolution and pooling layers, allowing every layer of the network to learn more independently and helping to avoid overfitting.

However, normalization layers of this kind are not used in many advanced architectures because they do not contribute much towards effective training.

Fully Connected Layer:

The convolutional layer, along with the pooling layer, forms a block in the Convolutional Neural Network. The number of such blocks may be increased to capture finer details, depending on the complexity of the task, at the cost of more computational power.
Once feature extraction is done, we flatten the final feature representation and feed it to a regular fully connected neural network for image classification.

How do Convolutional Neural Networks work?

Now, let's get into the nitty-gritty of how CNNs work in practice.
A CNN has hidden convolutional layers that form the base of ConvNets. Like any other layer, a convolutional layer receives an input volume, performs a mathematical scalar product with the feature matrix (filter), and outputs feature maps.

Features refer to minute details in the image data such as edges, borders, shapes, textures, objects, circles, etc.

Convolutional layers detect these patterns in the image data with the help of filters. The simpler, lower-level details (such as edges and corners) are captured by the first few convolutional layers; the deeper the network goes, the more sophisticated the pattern searching becomes.

For example, in later layers rather than edges and simple shapes,
filters may detect specific objects like eyes or ears, and eventually a
cat, a dog, and whatnot.
The first hidden layer in the network dealing with images is usually a
convolutional layer.

When adding a convolutional layer to a network, we need to specify the number of filters we want the layer to have.

A filter can be thought of as a relatively small matrix for which we decide the number of rows and columns. The values of this filter matrix are initialized with random numbers. When the convolutional layer receives the pixel values of the input data, the filter convolves over each patch of the input matrix.

The output of the convolutional layer is usually passed through the ReLU activation function to bring non-linearity to the model. It takes the feature map and replaces all the negative values with zero.

But—we haven't yet addressed the issue of excessive computation that was a drawback of using feedforward neural networks, have we? Convolution alone does not bring a significant improvement there.

The pooling layer is added after the convolutional layer to reduce the dimensions.

We take a window of say 2x2 and select either the maximum pixel
value or the average of all pixels in the window and continue sliding
the window. So, we take the feature map, perform a pooling
operation, and generate a new feature map reduced in size.

Pooling is a very important step in the ConvNet, as it reduces the computation and makes the model tolerant towards distortions and variations.

The convolutional layers are responsible for feature extraction. But what about the final prediction? A fully connected dense neural network uses the flattened feature matrix and makes the prediction according to the use case.
 1D Convolutional Neural Network (1D CNN):

Input: Typically used for sequence data, such as time series or text.
Input is a 1D array (e.g., a sequence of words or sensor readings
over time).

Convolution Operation: Convolution is applied along one dimension (the sequence length). A kernel (filter) slides over the input, capturing local patterns.

Example Use Cases: Time series analysis (predicting trends or patterns in a sequence); natural language processing (text classification, sentiment analysis).

Architecture Highlights: Convolutional layers with ReLU activation; pooling layers (e.g., MaxPooling) to downsample and capture essential features; fully connected layers for classification or regression.
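To make this concrete, here is a hedged Keras sketch of such a 1D CNN; the sequence length of 100, single input channel, layer widths, and binary output are all assumptions chosen for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

model_1d = tf.keras.Sequential([
    layers.Conv1D(32, kernel_size=3, activation="relu",
                  input_shape=(100, 1)),          # 100 time steps, 1 channel
    layers.MaxPooling1D(pool_size=2),             # downsample the sequence
    layers.Conv1D(64, kernel_size=3, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(1, activation="sigmoid")         # binary classification head
])
model_1d.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model_1d.summary()
```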

 2D Convolutional Neural Network (2D CNN):

Input: Primarily used for image data, where pixels have spatial relationships. The input is a 2D grid (e.g., an image with width, height, and channels).

Convolution Operation: Convolution is applied in both spatial dimensions (width and height). Kernels move over the image, capturing local features and spatial hierarchies.

Example Use Cases: Image classification (identifying objects or scenes in images); object detection (detecting and localizing objects within an image); image segmentation (assigning labels to individual pixels).

Architecture Highlights: Convolutional layers with ReLU activation; pooling layers to downsample spatial dimensions; fully connected layers for high-level feature representation and decision making.
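An analogous 2D sketch, again with assumed values (a 64x64 RGB input and ten output classes):

```python
import tensorflow as tf
from tensorflow.keras import layers

model_2d = tf.keras.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu",
                  input_shape=(64, 64, 3)),        # width, height, RGB channels
    layers.MaxPooling2D((2, 2)),                   # downsample spatial dimensions
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(10, activation="softmax")         # e.g. 10 image classes
])
model_2d.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                 metrics=["accuracy"])
```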

Key Differences:

Data Dimension:

1D CNNs operate on sequential data with a single dimension.

2D CNNs operate on 2D grids, commonly used for image data with width, height, and channels.

Applications:

1D CNNs are suitable for tasks involving sequences, like time series
and text data.

2D CNNs excel in tasks related to images, capturing spatial relationships and patterns.

Both 1D and 2D CNNs leverage convolutional layers to automatically learn hierarchical representations from input data, making them effective in various machine learning applications.

 Case Study: Diabetic Retinopathy Detection with CNNs

1. Problem Statement: Diabetic retinopathy is a leading cause of blindness among diabetic patients. Early detection is crucial for timely intervention and the prevention of vision loss.

2. Data Collection: Gather a diverse dataset of retinal images, including both normal and diabetic retinopathy cases. Annotate the images to indicate the severity of diabetic retinopathy (e.g., using an established clinical severity grading scale such as the International Clinical Diabetic Retinopathy scale).

3. Data Preprocessing: Resize and standardize the images for consistency. Augment the dataset to increase diversity and improve model generalization. Normalize pixel values to ensure numerical stability during training.

4. Model Architecture: Design a CNN architecture suitable for image classification.

Input Layer: Accepts retinal images with appropriate dimensions.

Convolutional Layers: Extract hierarchical features from the images.

Activation Functions (e.g., ReLU): Introduce non-linearity.

Pooling Layers: Downsample and retain essential information.

Fully Connected Layers: Map high-level features to output classes.

Output Layer: Provides predictions for the different levels of diabetic retinopathy severity.

5. Training:

Split the dataset into training, validation, and test sets.

Train the CNN using the training set, adjusting model parameters
based on performance on the validation set.

Utilize transfer learning if a model pre-trained on a large dataset is available (a hedged sketch follows).
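One way such a transfer-learning setup could look in Keras is sketched below; the ResNet50 backbone, 224x224 input size, five severity classes, and the train_ds/val_ds dataset objects are all assumptions for illustration, not details from the case study:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Pre-trained ImageNet backbone, reused as a fixed feature extractor
base = tf.keras.applications.ResNet50(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
base.trainable = False                        # freeze the pre-trained weights

num_severity_levels = 5                       # assumed number of DR severity classes
model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(num_severity_levels, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)   # hypothetical datasets
```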

6. Evaluation:

Evaluate the model on the test set to assess its performance. Metrics may include sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC-ROC).
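For a binary simplification of the task (e.g., referable vs. non-referable retinopathy), these metrics could be computed with scikit-learn as sketched below; the labels and scores are made up purely for illustration:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score, accuracy_score

# Hypothetical true labels and predicted probabilities from the model
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.8, 0.7, 0.3, 0.2, 0.9, 0.6])
y_pred = (y_score >= 0.5).astype(int)            # threshold the probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)                     # true positive rate (recall)
specificity = tn / (tn + fp)                     # true negative rate
print(accuracy_score(y_true, y_pred), sensitivity, specificity,
      roc_auc_score(y_true, y_score))
```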

7. Interpretability: Employ model interpretability techniques to understand which regions of the retinal images contribute to the predictions. Grad-CAM (Gradient-weighted Class Activation Mapping) can highlight important regions.

8. Integration: Integrate the trained model into a system that can take retinal images as input and provide predictions of diabetic retinopathy severity. Ensure interoperability with existing healthcare systems.

9. Deployment: Deploy the model for real-world use, potentially in collaboration with healthcare professionals. Continuously monitor and update the model based on new data.

10. Results: Assess the impact of the CNN on diabetic retinopathy detection by measuring improvements in early diagnosis and patient outcomes.

Outcome: The deployment of a CNN for diabetic retinopathy detection can contribute to early intervention and improved management of the condition, ultimately reducing the risk of vision loss among diabetic patients. Regular screenings using such automated systems can enhance the efficiency and accessibility of healthcare services in managing diabetic retinopathy.
