
Module 4

Neural network
• Neural networks, also known as artificial neural networks (ANNs) or
simulated neural networks (SNNs), are a subset of machine learning
and are at the heart of deep learning algorithms. Their name and
structure are inspired by the human brain, mimicking the way that
biological neurons signal to one another.
Artificial Neural Network (ANN)
• The human brain is made up of about 85 billion neurons, resulting in a network capable of storing a tremendous amount of knowledge. As you might expect, this dwarfs the brains of other living creatures.
• For instance, a cat has roughly a billion neurons, a mouse has about 75 million neurons, and a cockroach has only about a million neurons. In contrast, many ANNs contain far fewer neurons, typically only several hundred, so we're in no danger of creating an artificial brain anytime in the near future; even a fruit fly brain with 100,000 neurons far exceeds the current ANN state of the art.
• Rudimentary ANNs have been used for over 50 years to simulate the brain's
approach to problem solving.
• At first, this involved learning simple functions, like the logical AND function or the
logical OR.
• These early exercises were used primarily to construct models of how biological
brains might function.

Practical problems such as:

• Speech and handwriting recognition programs, like those used by voicemail transcription services and postal mail sorting machines.
• Automation of smart devices, like an office building's environmental controls, or self-driving cars and self-piloting drones.
• Sophisticated models of weather and climate patterns, tensile strength, fluid dynamics, and many other scientific, social, or economic phenomena.

ANNs are versatile learners that can be applied to nearly any learning task: classification, numeric prediction, and even unsupervised pattern recognition.
From biological to artificial neurons
• Because ANNs were intentionally designed as conceptual
models of human brain activity, it is helpful to first understand
how biological neurons function.
• As illustrated in the following figure, incoming signals are
received by the cell's dendrites through a biochemical process
that allows the impulse to be weighted according to its
relative importance or frequency.
• As the cell body begins to accumulate the incoming signals, a
threshold is reached at which the cell fires and the output
signal is then transmitted via an electrochemical process
down the axon. At the axon's terminals, the electric signal is
again processed as a chemical signal to be passed to the
neighboring neurons across a tiny gap known as a synapse.
The model of a single artificial neuron can be understood in terms very similar
to the biological model. As depicted in the following figure, a directed
network diagram defines a relationship between the input signals received by
the dendrites (x variables) and the output signal (y variable). Just as with the
biological neuron, each dendrite's signal is weighted (w values) according to
its importance—ignore for now how these weights are determined. The input
signals are summed by the cell body and the signal is passed on according to
an activation function denoted by f.
A typical artificial neuron with n input dendrites can be represented by the formula that follows. The w weights allow each of the n inputs (x) to contribute a greater or lesser amount to the sum of input signals. The net total is used by the activation function f, and the resulting signal, y(x), is the output axon:

y(x) = f( Σ wi xi ), summing over i = 1, …, n
There are numerous variants of neural networks. Each can be defined in terms of the following characteristics:

• An activation function, which transforms a neuron's net input signal into a single output signal to be broadcast further in the network
• A network topology (or architecture), which describes the number of neurons in the model as well as the number of layers and the manner in which they are connected
• A training algorithm, which specifies how connection weights are set in order to inhibit or excite neurons in proportion to the input signal

The activation function is the mechanism by which the artificial neuron processes information and passes it throughout the network.
Threshold activation function
A threshold activation function is a type of activation function used in some neural networks, typically in simpler or older models like perceptrons. It is a binary activation function that outputs a specific value (commonly 0 or 1) based on whether the input exceeds a certain threshold.
• Binary output: the output is either 0 or 1, making it useful in situations where binary decisions are needed.
• Thresholding behavior: a threshold value is defined, and the function "activates" when the input crosses that threshold.
Unit step activation function
Here, the neuron fires when the sum of input signals is at least zero. Because of its shape, this is sometimes called a unit step activation function.
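To make this concrete, here is a minimal Python sketch (not from the original slides) of a single neuron with a unit step activation; the choice of weights and bias that make it compute logical AND is purely illustrative.

```python
import numpy as np

def unit_step(net):
    """Threshold activation: fire (output 1) when the net input is at least zero."""
    return 1 if net >= 0 else 0

def neuron(x, w, b):
    """A single artificial neuron: weighted sum of inputs, then activation."""
    return unit_step(np.dot(w, x) + b)

# With weights of 1 and a bias of -1.5, the neuron computes logical AND:
# it fires only when both inputs are 1 (since 1 + 1 - 1.5 >= 0).
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", neuron(np.array(x), np.array([1.0, 1.0]), -1.5))
```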
The number of layers

• Input layer: the nodes that receive unprocessed signals directly from the input data.
• Output layer: the nodes that generate the output signals, the predicted values.
• Hidden nodes: nodes that process the signals from the input nodes before they reach the output layer.
Network topology
1. The capacity of a neural network to learn is rooted in its topology, or the patterns
and structures of interconnected neurons.
2. Although there are countless forms of network architecture, they can be
differentiated by three key characteristics:
• The number of layers
• Whether information in the network is allowed to travel backward
• The number of nodes within each layer of the network

The topology determines the complexity of tasks that can be learned by the network. Generally, larger and more complex networks are capable of identifying more subtle patterns and complex decision boundaries. However, the power of a network is not only a function of the network size, but also the way units are arranged.
The input and output nodes are arranged in groups known as layers. Because the input
nodes process the incoming data exactly as received, the network has only one set of
connection weights (labelled here as w1, w2, and w3).
It is therefore termed a single-layer network. Single-layer networks can be used for
basic pattern classification, particularly for patterns that are linearly separable, but
more sophisticated networks are required for most learning tasks.
As you might expect, an obvious way to create more complex networks is by adding
additional layers.
As depicted here, a multilayer network adds one or more hidden layers that process
the signals from the input nodes prior to reaching the output node.
Most multilayer networks are fully connected, which means that every node in one layer is connected to every node in the next layer, but this is not required.
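As a small illustration of a fully connected multilayer network, here is a hedged Python/NumPy sketch of a single forward pass; the 3-4-1 layer sizes, random weights, and sigmoid activation are assumptions made for the demo, not details from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# A fully connected 3-4-1 network: 3 input nodes, one hidden layer of 4
# nodes, and 1 output node. Every node in one layer connects to every node
# in the next, so each layer's connection weights form one matrix.
W_hidden = rng.normal(size=(4, 3))   # hidden-layer weights
b_hidden = np.zeros(4)
W_output = rng.normal(size=(1, 4))   # output-layer weights
b_output = np.zeros(1)

def forward(x):
    h = sigmoid(W_hidden @ x + b_hidden)     # hidden layer processes the input signals
    return sigmoid(W_output @ h + b_output)  # output node produces the prediction

print(forward(np.array([0.5, -1.0, 2.0])))
```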
The direction of information travel

1. Feedforward networks: networks in which the input signal is fed continuously in one direction, from connection to connection, until it reaches the output layer.
2. Feedback networks: networks that allow signals to travel in both directions using loops; these are also called recurrent networks.
The number of nodes in each layer

• Number of input nodes: the number of features in the input data.
• Number of output nodes: the number of outcomes to be modelled.
• Number of hidden nodes: left to the user to decide prior to training the model. The appropriate number depends on the number of input nodes, the amount of training data, the amount of noisy data, the complexity of the learning task, and many other factors.
The training algorithm

1. The training algorithm specifies how connection weights are set.
2. There are mainly two algorithms for learning a single perceptron:
• Perceptron rule: used when the training dataset is linearly separable
• Delta rule: used when the training dataset is not linearly separable
Training neural networks with backpropagation

What is Backpropagation?
Backpropagation is the essence of neural network training. It is the
method of fine-tuning the weights of a neural network based on the
error rate obtained in the previous epoch (i.e., iteration). Proper tuning
of the weights allows you to reduce error rates and make the model
reliable by increasing its generalization.

Backpropagation is an algorithm for supervised learning of artificial neural networks using gradient descent. Given an ANN and an error function, the method calculates the gradient of the error function with respect to the neural network's weights.
How the Backpropagation Algorithm Works

1. Inputs X arrive through the preconnected path.
2. The input is modelled using real weights W. The weights are usually randomly selected.
3. Calculate the output for every neuron from the input layer, through the hidden layers, to the output layer.
4. Calculate the error in the outputs:

Error = Actual Output – Desired Output

5. Travel back from the output layer to the hidden layers, adjusting the weights so that the error decreases.

Keep repeating the process until the desired output is achieved.


In its most general form, the backpropagation algorithm iterates through
many cycles of two processes. Each iteration of the algorithm is known as an
epoch. Because the network contains no a priori (existing) knowledge,
typically the weights are set randomly prior to beginning. Then, the algorithm
cycles through the processes until a stopping criterion is reached. The cycles
include:
1. forward phase
2. backward phase
A forward phase in which the neurons are activated in sequence from the input
layer to the output layer, applying each neuron's weights and activation function
along the way. Upon reaching the final layer, an output signal is produced.

A backward phase in which the network's output signal resulting from the
forward phase is compared to the true target value in the training data. The
difference between the network's output signal and the true value results in an
error that is propagated backwards in the network to modify the connection
weights between neurons and reduce future errors.
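The following Python/NumPy sketch shows one possible implementation of these two phases for a tiny 2-3-1 network; the XOR task, sigmoid activations, squared-error loss, learning rate, and epoch count are all illustrative assumptions rather than details from the slides.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny fully connected 2-3-1 network; weights start random because the
# network has no a priori knowledge.
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(size=(1, 3)), np.zeros(1)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # XOR inputs
T = np.array([[0], [1], [1], [0]], dtype=float)              # XOR targets
lr = 0.5  # learning rate

for epoch in range(10000):  # each pass through the cycles is one epoch
    for x, t in zip(X, T):
        # Forward phase: activate neurons from the input layer to the output layer.
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # Backward phase: propagate the error back and adjust the connection weights.
        delta_out = (y - t) * y * (1 - y)             # output-layer error signal
        delta_hid = (W2.T @ delta_out) * h * (1 - h)  # hidden-layer error signal
        W2 -= lr * np.outer(delta_out, h); b2 -= lr * delta_out
        W1 -= lr * np.outer(delta_hid, x); b1 -= lr * delta_hid

for x in X:
    print(x, "->", sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2).round(3))
```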
How does a Perceptron learn the appropriate weights using the delta rule?
The Perceptron is a fundamental type of neural network that classifies input data by adjusting its weights iteratively. With the delta rule, it learns appropriate weights based on the difference between the predicted output and the actual (target) output during training.
Here's a breakdown of how it works:
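A minimal sketch of the delta rule in Python, assuming a single linear unit trained on the logical OR task (the learning rate and epoch count are illustrative): each weight is nudged in proportion to the error times its input.

```python
import numpy as np

# Delta-rule (Widrow-Hoff) weight updates for a single perceptron-style unit.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0, 1, 1, 1], dtype=float)  # logical OR targets
w = np.zeros(2)
b = 0.0
eta = 0.1  # learning rate

for epoch in range(100):
    for x_i, t_i in zip(X, t):
        o = w @ x_i + b          # linear output of the unit
        error = t_i - o          # difference between target and predicted output
        w += eta * error * x_i   # delta rule: adjust each weight by error * input
        b += eta * error

# Threshold the learned linear output to obtain the class prediction.
for x_i in X:
    print(x_i, "->", int(w @ x_i + b >= 0.5))
```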

Support Vector Machine

• SVM is a supervised machine learning algorithm that can be employed for both classification and regression problems.
• But mostly it is used for classification problems.
• The main idea behind SVM is to find the hyperplane that best divides the dataset into two classes.
• Compared to newer algorithms like neural networks, SVMs have two main advantages: higher speed and better performance with a limited number of samples (in the thousands).

Applications

• Classification of microarray gene expression data in the field of bioinformatics to identify cancer or other genetic diseases
• Text categorization, such as identification of the language used in a document or organizing documents by subject matter
• The detection of rare yet important events like combustion engine failure, security breaches, or earthquakes

Classification with hyperplanes


• SVMs use a linear boundary called a hyperplane to partition data
into groups of similar elements, typically as indicated by the class
values. For example, the following figure depicts a hyperplane that
separates groups of circles and squares in two and three
dimensions. Because the circles and squares can be divided by
the straight line or flat surface, they are said to be linearly
separable.

• The task of the SVM algorithm is to identify a line that separates the two
classes. As shown in the following figure, there is more than one choice of
dividing line between the groups of circles and squares. Three such
possibilities are labelled a, b, and c. How does the algorithm choose?

Finding the maximum margin


• The answer to that question involves a search for the Maximum Margin
Hyperplane (MMH) that creates the greatest separation between the two
classes.
• The support vectors (indicated by arrows in the figure that follows) are the
points from each class that are the closest to the MMH; each class must
have at least one support vector, but it is possible to have more than one.
Using the support vectors alone, it is possible to define the MMH.
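As an illustration (assuming scikit-learn, which the slides do not mention, and a made-up toy dataset), the following sketch fits a linear SVM to separable data and inspects the fitted support vectors along with the weights w and bias b that define the MMH.

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable groups of points (toy data for the demo).
X = np.array([[1, 1], [2, 1], [1, 2],    # class -1 (the "circles")
              [4, 4], [5, 4], [4, 5]])   # class +1 (the "squares")
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large cost C approximates a hard margin on separable data.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print("support vectors:\n", clf.support_vectors_)  # the points closest to the MMH
print("weights w:", clf.coef_, "bias b:", clf.intercept_)
```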

The case of linearly separable data


• It is easiest to understand how to find the maximum margin under
the assumption that the classes are linearly separable. In this
case, the MMH is as far away as possible from the outer
boundaries of the two groups of data points. These outer
boundaries are known as the convex hull.
• The MMH is then the perpendicular bisector of the shortest line
between the two convex hulls.
• Sophisticated computer algorithms that use a technique known as quadratic optimization are capable of finding the maximum margin in this way.

• An alternative (but equivalent) approach involves a search through the space of every possible hyperplane in order to find a set of two parallel planes that divide the points into homogeneous groups yet are themselves as far apart as possible.
• To understand this search process, we'll need to define exactly what we mean by a hyperplane. In n-dimensional space, the following equation is used:

w · x + b = 0

• In the original notation, arrows above the letters indicate that they are vectors rather than single numbers. In particular, w is a vector of n weights, that is, {w1, w2, …, wn}, and b is a single number known as the bias.

• Since the training data is linearly separable, we can choose two parallel hyperplanes that separate the two classes of data such that the distance between them is as large as possible.
• Using this formula, the goal of the process is to find a set of weights that specify two hyperplanes, as follows:

w · xi + b ≥ +1 if yi = +1
w · xi + b ≤ −1 if yi = −1

• We will also require that these hyperplanes are specified such that all the points of one class fall above the first hyperplane and all the points of the other class fall beneath the second hyperplane. This is possible so long as the data are linearly separable.
• Vector geometry defines the distance between these two planes as:

2 / ||w||

• Here, ||w|| indicates the Euclidean norm (the distance from the origin to vector w). Therefore, in order to maximize the distance between the planes, we need to minimize ||w||. To facilitate finding the solution, the task is typically re-expressed as a set of constraints:

minimize (1/2) ||w||²  subject to  yi (w · xi + b) ≥ 1, ∀ i

• The idea is to minimize the previous formula subject to (s.t.) the condition that each of the yi data points is correctly classified. Note that y indicates the class value (transformed to either +1 or −1) and the upside-down "A" (∀) is shorthand for "for all."

The case of non-linearly separable data

• The SVM classifier described so far yields a solution only if the given two-class dataset is linearly separable.
• But in real-life problems, two-class datasets are only rarely linearly separable. In such cases, an additional variable called a slack variable (ξi) is introduced for each point, which stores its deviation from the margin.

• Given a two-class dataset of N points of the form
(x1, y1), (x2, y2), …, (xN, yN),
where each yi is either +1 or −1, we then find the weight vector w and the bias b which minimize a cost-sensitive objective.
• A cost value (denoted as C) is applied to all points that violate the constraints, and rather than finding the maximum margin, the algorithm attempts to minimize the total cost.
• We can therefore revise the optimization problem to:

minimize (1/2) ||w||² + C Σ ξi  subject to  yi (w · xi + b) ≥ 1 − ξi,  ξi ≥ 0, ∀ i

Kernel functions
• SVM algorithms use a set of mathematical functions that are defined as kernels.
• A function is said to be a kernel function if it has the form shown below, defined over pairs of n-dimensional vectors with special properties.
• These functions are used to obtain SVM-like classifiers for two-class datasets that are not linearly separable.
• For non-linear data, SVM finds it difficult to classify the data directly. The easy solution is to use the kernel trick.

• Kernel functions, in general, are of the following form. Here, the function denoted by the Greek letter phi, that is, ϕ(x), is a mapping of the data into another space:

K(xi, xj) = ϕ(xi) · ϕ(xj)

A few of the most commonly used kernel functions are listed as follows:

• Linear kernel: K(xi, xj) = xi · xj
• Polynomial kernel of degree d: K(xi, xj) = (xi · xj + 1)^d
• Sigmoid kernel: K(xi, xj) = tanh(κ xi · xj − δ)
• Gaussian RBF kernel: K(xi, xj) = exp(−||xi − xj||² / (2σ²))
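To see the kernels in action, here is a hedged scikit-learn sketch (the library, dataset, and parameter choices are assumptions, not part of the slides) comparing the four kernels above on a two-class dataset that is not linearly separable.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: a two-class dataset that is not linearly separable.
X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# The same classifier, with different kernel functions performing the mapping.
for kernel in ["linear", "poly", "sigmoid", "rbf"]:
    clf = SVC(kernel=kernel, C=1.0).fit(X_tr, y_tr)
    print(f"{kernel:>8}: test accuracy = {clf.score(X_te, y_te):.2f}")
```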

• The kernel functions of SVM map the p-dimensional input space into a much higher-dimensional space; there the non-linear data points become easier to classify, and different kernel functions are used for different kinds of data.
