ML-5TH UNIT
Neural Networks:
1. Introduction
2. Perceptron Learning
3. Backpropagation
4. Training and Validation
5. Parameter Estimation – MLE, MAP, Bayesian
Neural Networks:
1. Introduction:
1. Structure: A neural network consists of interconnected units
called neurons. These neurons send signals to one another. While
individual neurons are simple, when many of them work together in a
network, they can perform complex tasks.
2. Layers: A typical neural network has three main layers:
o Input Layer: Receives input data.
o Hidden Layer(s): One or more intermediate layers that process
information.
o Output Layer: Produces the final output or prediction.
3. Node (Neuron): Each node (neuron) in the network has its own
associated weight and threshold. If the output of a node exceeds the
specified threshold, it activates and passes data to the next layer.
4. Activation Function: After multiplying inputs by their weights and
summing them up, the output passes through an activation function. If the
result exceeds the threshold, the node fires, connecting to the next layer.
This process defines the neural network as a feedforward network.
5. Training: Neural networks learn from training data to improve accuracy
over time. Once fine-tuned, they excel in tasks like speech recognition,
image classification, and more.
A neural network learns from its environment through the following steps:
1. The neural network is stimulated by an environment.
2. The free parameters of the neural network are changed as a result of this stimulation.
3. The neural network then responds in a new way to the environment because of the changes in its free parameters.
Working of a Neural Network
Neural networks are complex systems that mimic some features of how the human brain functions. A network is composed of an input layer, one or more hidden layers, and an output layer, each made up of interconnected artificial neurons. The basic process has two stages: forward propagation and backpropagation.
Forward Propagation
• Input Layer: Each feature in the input layer is represented by a node on
the network, which receives input data.
• Weights and Connections: The weight of each neuronal connection
indicates how strong the connection is. Throughout training, these weights
are changed.
• Hidden Layers: Each hidden layer neuron processes inputs by multiplying
them by weights, adding them up, and then passing them through an
activation function. By doing this, non-linearity is introduced, enabling the
network to recognize intricate patterns.
• Output: The final result is produced by repeating the process until the
output layer is reached.
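The forward pass above can be illustrated with a minimal NumPy sketch, assuming a single hidden layer with sigmoid activations; the layer sizes and random weights are illustrative only, not part of these notes.
```python
import numpy as np

def sigmoid(z):
    # Activation function that introduces non-linearity
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative network: 3 input features, 4 hidden neurons, 1 output neuron
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights and biases
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights and biases

def forward(x):
    # Hidden layer: weighted sum of inputs plus bias, passed through the activation
    h = sigmoid(x @ W1 + b1)
    # Output layer: repeat the same process on the hidden activations
    return sigmoid(h @ W2 + b2)

x = np.array([0.5, -1.2, 3.0])   # one input example with 3 features
print(forward(x))                # the network's prediction for this input
```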
Backpropagation
• Loss Calculation: The network’s output is evaluated against the real goal
values, and a loss function is used to compute the difference. For a
regression problem, the Mean Squared Error (MSE) is commonly used as
the cost function.
Loss Function: MSE = (1/n) ∑ (predicted_i – actual_i)², the average of the squared differences between the predicted and actual values.
• Gradient Descent: The network then uses gradient descent to reduce the loss. To lower the error, each weight is adjusted based on the derivative of the loss with respect to that weight.
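A hedged sketch of this update rule: the weight moves a small step against the gradient; `grad` is assumed to have been computed already (for example, by backpropagation).
```python
def gradient_descent_step(weight, grad, learning_rate=0.1):
    # Move the weight a small step in the direction that reduces the loss
    return weight - learning_rate * grad

# Example: if dLoss/dw = 0.8 at w = 0.5, one step lowers w
w = gradient_descent_step(0.5, 0.8)
print(w)   # 0.42
```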
2. Perceptron Learning:
The Perceptron model is treated as one of the simplest types of artificial neural networks. It is a supervised learning algorithm for binary classifiers. Hence, we can consider it a single-layer neural network with four main parameters: input values, weights and bias, net sum, and an activation function.
Frank Rosenblatt invented the perceptron model as a binary classifier, which contains three main components. These are as follows:
o Input Nodes or Input Layer:
This is the primary component of the Perceptron, which accepts the initial data into the system for further processing. Each input node contains a real numerical value.
o Weight and Bias:
The weight parameter represents the strength of the connection between units. This is another important Perceptron component. Weight is directly proportional to the strength of the associated input neuron in deciding the output. Further, bias can be considered as the intercept in a linear equation.
o Activation Function:
These are the final and important components that help to determine whether the
neuron will fire or not. Activation Function can be considered primarily as a step
function.
o Sign function
o Step function, and
o Sigmoid function
The data scientist chooses the activation function based on the problem statement and the desired outputs. The choice of activation function (e.g., sign, step, or sigmoid) in a perceptron model can also depend on whether the learning process is slow or suffers from vanishing or exploding gradients.
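The three activation functions named above can be written as small Python functions; the following is only an illustrative sketch.
```python
import numpy as np

def step(z):
    # Step function: fires (1) only when the weighted sum exceeds the threshold 0
    return 1 if z > 0 else 0

def sign(z):
    # Sign function: outputs +1 or -1 depending on the sign of the weighted sum
    return 1 if z > 0 else -1

def sigmoid(z):
    # Sigmoid: smooth output between 0 and 1
    return 1.0 / (1.0 + np.exp(-z))
```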
Step-1
In the first step, multiply all input values with their corresponding weight values and add them to determine the weighted sum. A special term called bias 'b' is added to this weighted sum to improve the model's performance. Mathematically, the weighted sum is calculated as:
∑wi*xi + b
Step-2
In the second step, an activation function 'f' is applied to the weighted sum, which gives the output:
Y = f(∑wi*xi + b)
A multi-layer perceptron model has greater processing power and can process
linear and non-linear patterns. Further, it can also implement logic gates such as
AND, OR, XOR, NAND, NOT, XNOR, NOR.
Perceptron Function
The perceptron function f(x) is obtained by multiplying the input 'x' by the learned weight coefficient 'w', adding the bias 'b', and applying a threshold:
f(x) = 1 if w.x + b > 0
f(x) = 0 otherwise
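The following is a minimal sketch of this perceptron function together with the classic perceptron learning rule; the AND-gate data, learning rate, and epoch count are illustrative assumptions, not part of these notes.
```python
import numpy as np

def predict(x, w, b):
    # f(x) = 1 if w.x + b > 0, otherwise 0
    return 1 if np.dot(w, x) + b > 0 else 0

def train_perceptron(X, y, epochs=10, lr=0.1):
    # Perceptron learning rule: nudge weights and bias toward misclassified examples
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            error = target - predict(xi, w, b)
            w += lr * error * xi
            b += lr * error
    return w, b

# Illustrative AND-gate data (linearly separable, so a single perceptron can learn it)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([predict(xi, w, b) for xi in X])   # expected: [0, 0, 0, 1]
```
A single perceptron can only learn linearly separable functions such as AND, OR, and NOT; XOR requires the multi-layer perceptron mentioned above.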
Characteristics of Perceptron
The perceptron model has the following characteristics.
Future of Perceptron
The future of the Perceptron model is bright and significant, as it helps to interpret data by building intuitive patterns and applying them in the future. Machine learning is a rapidly growing and continuously evolving branch of Artificial Intelligence; hence, perceptron technology will continue to support and facilitate analytical behaviour in machines, which in turn adds to the efficiency of computers.
3. Backpropagation
• In machine learning, backpropagation is an effective algorithm used
to train artificial neural networks, especially in feed-forward neural
networks.
• Backpropagation is an iterative algorithm that helps to minimize the cost function by determining which weights and biases should be adjusted. During every epoch, the model learns by adapting the weights and biases so as to minimize the loss, moving down along the gradient of the error. It is therefore paired with an optimization algorithm such as gradient descent or stochastic gradient descent.
• Computing the gradient in the backpropagation algorithm helps to
minimize the cost function and it can be implemented by using the
mathematical rule called chain rule from calculus to navigate through
complex layers of the neural network.
• Scalability: Backpropagation scales well with the size of the dataset and the complexity of the network. This scalability makes it suitable for large-scale machine learning tasks, where training data and network size are significant factors.
• Forward pass
• Backward pass
• In forward pass, initially the input is fed into the input layer. Since the
inputs are raw data, they can be used for training our neural network.
• The inputs and their corresponding weights are passed to the hidden layer. The hidden layer performs the computation on the data it receives. If there are two hidden layers in the neural network, say h1 and h2, the output of h1 is used as the input of h2. Before applying the activation function, the bias is added.
• In the backward pass, the error is transmitted back through the network, which helps the network improve its performance by learning and adjusting its internal weights.
• To find the error generated through the process of forward pass, we can use
one of the most commonly used methods called mean squared error which
calculates the difference between the predicted output and desired output.
The formula for mean squared error
is: Mean squared error = (predicted output – actual output)²
• Once we have done the calculation at the output layer, we then propagate
the error backward through the network, layer by layer.
• The key calculation during the backward pass is determining the gradients
for each weight and bias in the network. This gradient is responsible for
telling us how much each weight/bias should be adjusted to minimize the
error in the next forward pass. The chain rule is used iteratively to calculate
this gradient efficiently.
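As a short sketch of this key chain-rule calculation for the output layer, assuming a sigmoid output neuron and the squared error above; the activations and target value are illustrative.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Assumed quantities from the forward pass (shapes are illustrative)
hidden = np.array([[0.2, 0.7]])   # activations feeding the output layer (1 sample, 2 units)
z_out = np.array([[0.4]])         # weighted sum at the output neuron
y_pred = sigmoid(z_out)           # predicted output
y_true = np.array([[1.0]])        # desired output

# Chain rule: dLoss/dW = dLoss/dy_pred * dy_pred/dz_out * dz_out/dW
dloss_dy = 2 * (y_pred - y_true)        # derivative of (predicted - actual)^2
dy_dz = y_pred * (1 - y_pred)           # derivative of the sigmoid
delta = dloss_dy * dy_dz                # error signal at the output neuron
grad_W = hidden.T @ delta               # gradient for the hidden-to-output weights
grad_b = delta.sum(axis=0)              # gradient for the output bias
```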
6. Training the Neural Network: The train method trains the neural network
using the specified number of epochs and learning rate. It iterates through
the training data, performs the feedforward and backward passes, and
updates the weights and biases accordingly.
7. XOR Dataset: The XOR dataset (X) is defined, which contains input pairs
that represent the XOR operation, where the output is 1 if exactly one of
the inputs is 1, and 0 otherwise.
8. Testing the Trained Model: After training, the neural network is tested
on the XOR dataset (X) to see how well it has learned the XOR function.
The predicted outputs are printed to the console, showing the neural
network’s predictions for each input pair.
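The code that points 6–8 describe is not reproduced in these notes, so the following is only a minimal sketch of such a network: a small feedforward network with sigmoid activations trained on the XOR dataset using the forward and backward passes described above. The 2-4-1 architecture, learning rate, and epoch count are illustrative assumptions.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_derivative(a):
    # Derivative of the sigmoid, written in terms of its output a = sigmoid(z)
    return a * (1 - a)

# XOR dataset: output is 1 if exactly one of the inputs is 1, and 0 otherwise
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# Illustrative 2-4-1 architecture with randomly initialized weights
rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))   # input -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))   # hidden -> output

learning_rate, epochs = 0.5, 10000
for _ in range(epochs):
    # Forward pass
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: propagate the error layer by layer using the chain rule
    delta_out = (out - y) * sigmoid_derivative(out)
    delta_h = (delta_out @ W2.T) * sigmoid_derivative(h)

    # Gradient-descent updates for every weight and bias
    W2 -= learning_rate * h.T @ delta_out
    b2 -= learning_rate * delta_out.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ delta_h
    b1 -= learning_rate * delta_h.sum(axis=0, keepdims=True)

# Testing the trained model: predictions should be close to [0, 1, 1, 0]
predictions = sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2)
print(predictions.round(3))
```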
4. Training and Validation
You will need new, unseen data to test your machine learning model after it has been created (using your training data). This data is known as testing data, and it may be used to assess the progress and efficiency of your algorithm's training as well as to modify or optimize it for better results. This dataset needs to be "unseen" and recent, because the training data has already been "learned" by your model. By observing how the model performs on fresh test data, you can decide whether it is operating successfully or whether it needs more training data to meet your standards. Test data provides a final, real-world check that the machine learning algorithm was trained correctly.
Features: Training Data vs. Testing Data
• Purpose:
o Training Data: The machine-learning model is trained using training data. The more training data a model has, the more accurate predictions it can make.
o Testing Data: Testing data is used to evaluate the model's performance.
• Exposure:
o Training Data: By using the training data, the model can gain knowledge and become more accurate in its predictions.
o Testing Data: Until evaluation, the testing data is not exposed to the model. This guarantees that the model cannot learn the testing data by heart and produce flawless forecasts.
• Use:
o Training Data: To stop overfitting, training data is utilized.
o Testing Data: By making predictions on the testing data and comparing them to the actual labels, the performance of the model is assessed.
Training data teaches a machine learning model how to behave, whereas testing
data assesses how well the model has learned.
Training is the process where a machine learning model learns from a dataset.
This dataset, known as the training set, contains input-output pairs where the
output (or label) is known. The model adjusts its parameters to minimize the error
between its predictions and the actual labels in the training set.
Validation
Validation is the process of evaluating the model on a separate dataset, called the validation set, during or after training. It is used to tune hyperparameters and to check how well the model generalizes to data it was not trained on.
Key Points:
1. Purpose:
o Training: To adjust model parameters to minimize the error on the
training data.
o Validation: To evaluate model performance and tune
hyperparameters, ensuring the model generalizes well to unseen
data.
2. Data:
o Training: Uses the training dataset.
o Validation: Uses the validation dataset, which is distinct from the
training data.
3. Process:
o Training: Involves learning from the data, often through
backpropagation and optimization algorithms.
o Validation: Involves monitoring performance metrics and making
decisions about model adjustments.
4. Frequency:
o Training: Continuous, until the model converges or training criteria
are met.
o Validation: Periodic, often after each epoch or a set number of
iterations.
5. Impact on Model:
o Training: Directly influences the model's weights and biases.
o Validation: Influences decisions on model tuning, such as
hyperparameter settings and stopping criteria.
6. Outcome:
o Training: A model that performs well on the training data.
o Validation: Insights into the model’s ability to generalize, guiding
improvements to prevent overfitting and underfitting.
Example Workflow
1. Split Data: Divide the dataset into training, validation, and test sets (e.g.,
70% training, 20% validation, 10% test).
2. Train Model: Use the training set to train the model.
3. Validate Model: Use the validation set to evaluate performance and adjust
hyperparameters.
4. Final Evaluation: After tuning, use the test set to assess the model's final
performance.
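A hedged sketch of the split step in this workflow, using scikit-learn's train_test_split twice to obtain roughly 70% training, 20% validation, and 10% test data; the feature matrix X and labels y are assumed to already exist.
```python
from sklearn.model_selection import train_test_split

# First split off the 30% of the data that will not be used for training
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=0)

# Then split that 30% into validation (20% of the total) and test (10% of the total)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=1/3, random_state=0)
```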
Training and Validation Process
Training Phase: the model's weights and biases are updated on the training set to minimize the loss.
Validation Phase: after each epoch (or a set number of iterations), performance is measured on the validation set.
Model Tuning: hyperparameters are adjusted based on the validation results, and training can be stopped early when validation performance stops improving, as sketched below.
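A minimal sketch of this loop, assuming hypothetical helpers train_one_epoch and evaluate and a model object; it shows training on the training set, periodic validation, and early stopping when the validation loss stops improving.
```python
# Hypothetical helpers (not defined in these notes):
# train_one_epoch(model, data) updates the model's weights on one pass over the data;
# evaluate(model, data) returns the loss on the given dataset.
best_val_loss = float("inf")
patience, bad_epochs = 5, 0

for epoch in range(100):
    train_one_epoch(model, train_data)        # Training phase: adjust weights on the training set
    val_loss = evaluate(model, val_data)      # Validation phase: measure performance on unseen data

    if val_loss < best_val_loss:              # Model tuning: keep track of the best model so far
        best_val_loss, bad_epochs = val_loss, 0
    else:
        bad_epochs += 1
        if bad_epochs >= patience:            # Stop early when validation stops improving
            break
```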
5. Parameter Estimation – MLE, MAP, Bayesian
Summary of Differences
• MLE: Focuses solely on the data, ignoring any prior information.
• MAP: Balances data and prior information, offering a compromise
between data fit and prior beliefs.
• Bayesian: Provides a comprehensive approach by considering the full
posterior distribution, leading to a more robust estimation that accounts for
uncertainty in the parameters.
Practical Considerations in Neural Networks
• Computational Complexity: Bayesian methods are generally more
computationally intensive than MLE and MAP due to the need to sample
or approximate the posterior distribution.
• Regularization: MAP estimation introduces regularization naturally
through the prior distribution, which can help in preventing overfitting.
• Uncertainty Quantification: Bayesian approaches allow for uncertainty
quantification in predictions, which can be particularly valuable in
applications where understanding confidence in predictions is important.
Q:
How does Bayesian parameter estimation differ from MLE and MAP,
and what are its advantages in neural network modelling?
1. Uncertainty Quantification:
o Bayesian methods provide a natural way to quantify uncertainty in
both the parameters and the predictions. This is particularly useful
in applications where understanding the confidence in predictions is
crucial (e.g., medical diagnosis, autonomous driving).
2. Regularization:
o The prior distribution in Bayesian estimation acts as a regularizer,
helping to prevent overfitting, especially in cases where data is
scarce. This is similar to MAP, but Bayesian methods take this a step
further by fully integrating over the posterior.
3. Robustness:
o Bayesian models are generally more robust to overfitting compared
to MLE, as they incorporate prior beliefs and account for
uncertainty.
4. Prediction Averaging:
o Bayesian methods can perform model averaging, where predictions are averaged over many possible models (parameter settings) weighted by their posterior probability (see the sketch after this list). This often leads to better generalization performance compared to single point estimates from MLE or MAP.
5. Flexibility:
o Bayesian frameworks are flexible and can naturally incorporate
various types of prior knowledge. This can be particularly
advantageous in complex neural network models where domain
knowledge can significantly improve model performance.
6. Adaptability to New Data:
o Bayesian methods can more easily adapt to new data through the
updating of the posterior distribution. This is in contrast to MLE and
MAP, which may require re-training from scratch or significant
adjustments when new data is introduced.
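To illustrate the prediction-averaging idea from point 4 above, here is a minimal sketch: given weight samples assumed to have been drawn from the posterior (for example by MCMC or variational inference), the Bayesian prediction is the average of the predictions made with each sample. The tiny one-neuron model and the hand-written samples are illustrative only.
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    # Prediction of one tiny "model" (a single logistic neuron) with parameters w, b
    return sigmoid(np.dot(w, x) + b)

# Assumed posterior samples of (w, b); in practice these would come from MCMC or
# variational inference rather than being written out by hand.
posterior_samples = [
    (np.array([1.0, -0.5]), 0.1),
    (np.array([0.8, -0.4]), 0.0),
    (np.array([1.2, -0.6]), 0.2),
]

x = np.array([0.5, 1.0])
# Bayesian prediction: average the per-sample predictions, weighting each sample equally
bayesian_prediction = np.mean([predict(x, w, b) for w, b in posterior_samples])
print(bayesian_prediction)
```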
Practical Considerations
• Computational Complexity:
o Bayesian methods can be computationally intensive due to the need
to sample from or approximate the posterior distribution, especially
in high-dimensional parameter spaces typical in neural networks.
Techniques like Markov Chain Monte Carlo (MCMC) or variational
inference are often used to address these challenges.
• Implementation:
o Implementing Bayesian neural networks can be more complex
compared to standard MLE or MAP-based networks. However,
advances in probabilistic programming and software libraries (e.g.,
Pyro, TensorFlow Probability) are making Bayesian approaches
more accessible.
Data (D):
• The starting point for all parameter estimation methods. Represents the
observed data used for training the neural network.
Likelihood P(D|θ):
• The probability of the data given the parameters (θ). This is the key
component for MLE and MAP.
MLE (θ_MLE):
• The parameter value that maximizes the likelihood, θ_MLE = argmax_θ P(D|θ). It relies only on the data and ignores any prior information.
Prior P(θ):
• In MAP and Bayesian estimation, prior information about the parameters
is incorporated through the prior distribution.
MAP (θ_MAP):
• The parameter value that maximizes the posterior, combining likelihood and prior: θ_MAP = argmax_θ P(D|θ)P(θ).
Bayesian Estimation:
• Instead of a single point estimate, the full posterior distribution P(θ|D) ∝ P(D|θ)P(θ) is computed or approximated, and predictions are obtained by averaging over it.
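As a concrete (non-neural-network) illustration of the three estimators, here is a sketch using a Bernoulli likelihood with a Beta prior, where all three quantities have simple closed forms; the coin-flip counts and prior parameters are illustrative assumptions.
```python
# Estimating the probability theta of heads from observed coin-flip data D
k, n = 7, 10          # observed: 7 heads in 10 flips
alpha, beta = 2, 2    # illustrative Beta(2, 2) prior over theta

# MLE: maximizes the likelihood P(D|theta) only
theta_mle = k / n                                      # 0.70

# MAP: maximizes P(D|theta) * P(theta), i.e. the posterior mode
theta_map = (k + alpha - 1) / (n + alpha + beta - 2)   # 8/12 ≈ 0.667

# Bayesian: keeps the whole posterior Beta(k + alpha, n - k + beta);
# its mean is one possible point summary, but the full distribution
# remains available for uncertainty quantification.
theta_posterior_mean = (k + alpha) / (n + alpha + beta)  # 9/14 ≈ 0.643

print(theta_mle, theta_map, theta_posterior_mean)
```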