
BCS602 | MACHINE LEARNING | SEARCH CREATORS

Module-4

Chapter – 01 - Bayesian Learning

Introduction to Probability-based Learning

Definition:

Probability-based learning combines prior knowledge or prior probabilities with observed data to make predictions about future events.

Role of Probability Theory:

Uses probability theory to model randomness, uncertainty, and noise in data.

Purpose:

It helps in predicting outcomes and learning from large datasets by using Bayes' rule to
infer unknown quantities.

Randomness vs. Determinism:

 Probabilistic Model: Involves randomness and uses probability distributions to find solutions.
 Deterministic Model: Does not involve randomness and will yield the same result
with identical initial conditions.

Bayesian Learning:

 A type of probabilistic learning that uses subjective probabilities, i.e., probabilities based on individual belief and interpretation about the outcome, which can change over time.
 Involves the inference of model parameters using subjective probabilities.


Bayesian Algorithms:

 Naive Bayes Learning: A simple probabilistic classifier based on Bayes' theorem with strong independence assumptions.
 Bayesian Belief Network (BBN): A graphical model that represents a set of
variables and their conditional dependencies, which is explained in Chapter 9.

Bayes' Rule: Forms the foundation of probabilistic learning and Bayesian learning
algorithms for inferring useful information.

Fundamentals of Bayes Theorem

The Naive Bayes Model relies on Bayes' theorem, which works on the principle of three kinds of probabilities: prior probability, likelihood probability, and posterior probability.

Prior Probability

It is the general probability of an uncertain event before an observation is seen or some evidence is collected.

It is the initial probability that is believed before any new information is collected.

Likelihood Probability

Likelihood probability is the relative probability of the observation occurring for each
class or the sampling density for the evidence given the hypothesis.

It is stated as P(Evidence | Hypothesis), which denotes the likeliness of the occurrence of the evidence given the parameters.

Posterior Probability

It is the updated or revised probability of an event taking into account the observations
from the training data.


P (Hypothesis | Evidence) is the posterior distribution representing the belief about the
hypothesis, given the evidence from the training data.

Therefore, informally: posterior probability = prior probability updated with new evidence. Formally, the posterior is proportional to the prior multiplied by the likelihood.
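
As a quick numeric illustration of this update (all numbers below are hypothetical), consider a diagnostic-test sketch in Python:

```python
# Bayes' rule on a hypothetical medical test:
# P(h | E) = P(E | h) * P(h) / P(E)

prior = 0.01           # P(disease): prior probability before any test
likelihood = 0.95      # P(positive | disease): test sensitivity
false_positive = 0.05  # P(positive | no disease)

# Total probability of the evidence (a positive test result)
evidence = likelihood * prior + false_positive * (1 - prior)

posterior = likelihood * prior / evidence  # P(disease | positive)
print(f"Posterior P(disease | positive) = {posterior:.3f}")  # ~0.161
```

Even a strong likelihood gives a modest posterior here because the prior is small, which is exactly the prior-times-likelihood interplay described above.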

Classification Using Bayes Model

Naive Bayes Classification is based on Bayes’ theorem, which calculates the posterior
probability using prior probabilities.

Bayes’ theorem determines the probability of a hypothesis (h) given evidence (E):

P(h | E) = [P(E | h) × P(h)] / P(E)

that is, Posterior Probability is proportional to Prior Probability × Likelihood Probability.

Bayes’ theorem helps calculate posterior probabilities for multiple hypotheses to select
the one with the highest probability, known as Maximum A Posteriori (MAP) Hypothesis.

MAP Hypothesis (h_MAP):

The hypothesis with the highest posterior probability among all candidates: h_MAP = argmax over h of P(h | E) = argmax over h of P(E | h) × P(h).


Maximum Likelihood (ML) Hypothesis (h_ML):

 When all hypotheses are equally probable, only the likelihood P(E | h) is considered.
 The hypothesis that maximizes the likelihood is selected: h_ML = argmax over h of P(E | h).

Bayes’ theorem ensures correctness by defining relationships between events within a sample space, enabling reliable probabilistic predictions.
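
A minimal sketch of MAP versus ML selection over a few candidate hypotheses (the priors and likelihoods are made-up values for illustration):

```python
# Hypothetical priors P(h) and likelihoods P(E | h) for three hypotheses
priors = {"h1": 0.6, "h2": 0.1, "h3": 0.3}
likelihoods = {"h1": 0.2, "h2": 0.9, "h3": 0.3}

# P(h | E) is proportional to P(E | h) * P(h), so the normalizer can be skipped
posterior_scores = {h: likelihoods[h] * priors[h] for h in priors}

h_map = max(posterior_scores, key=posterior_scores.get)  # uses priors
h_ml = max(likelihoods, key=likelihoods.get)             # ignores priors
print(h_map, h_ml)  # h1 h2 -- MAP and ML can disagree when priors are unequal
```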

Naïve Bayes Algorithm for Continuous Attributes

Purpose: Handle continuous features in Naïve Bayes classification by either discretizing them or applying a probability distribution.

Approaches:

 Discretization: Convert continuous features into discrete intervals or categories (e.g., binning).
 Gaussian Distribution: Assume continuous features follow a Gaussian (normal)
distribution.

Gaussian Naïve Bayes Algorithm:

 Assumes that the likelihood of the features P(x | h) follows a Gaussian (normal) distribution.
 The probability density function of the Gaussian distribution is:

P(x | h) = (1 / √(2πσ²)) × exp(−(x − μ)² / (2σ²))

where μ is the mean and σ² is the variance of the feature for the given class.

Steps in Gaussian Naïve Bayes:

Calculate the mean (μ) and variance (σ²) of each continuous feature for every class using the training data.

For a given test instance, compute the likelihood P(x∣h) for each feature using the
Gaussian formula.

Multiply the likelihoods of all features with the prior probabilities of the class to
calculate the posterior probability.

Select the class with the highest posterior probability as the predicted class.
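
A minimal sketch of these four steps, assuming the per-class means and variances have already been estimated from training data (all numbers are illustrative):

```python
import math

def gaussian_pdf(x, mu, var):
    """Gaussian likelihood P(x | h) for one continuous feature."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)

def predict(x, classes):
    """classes maps label -> (prior, [(mu, var) for each feature])."""
    posteriors = {}
    for label, (prior, params) in classes.items():
        likelihood = 1.0
        for xi, (mu, var) in zip(x, params):
            likelihood *= gaussian_pdf(xi, mu, var)   # step 2
        posteriors[label] = prior * likelihood        # step 3 (unnormalized)
    return max(posteriors, key=posteriors.get)        # step 4

# Step 1 would estimate these per-class (mu, var) pairs from training data;
# the numbers here are hypothetical.
classes = {
    "yes": (0.6, [(25.0, 16.0), (70.0, 25.0)]),
    "no":  (0.4, [(35.0,  9.0), (55.0, 36.0)]),
}
print(predict([28.0, 66.0], classes))  # -> "yes" for these made-up statistics
```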

Advantages:

 Efficient in handling continuous data.


 Works well if the assumption of Gaussian distribution is valid.

Limitations:

 Assumes independence between features.


 Performance may degrade if the data is not normally distributed.


Chapter – 02

Artificial Neural Networks

Introduction

 The human nervous system has billions of neurons that act as processing units,
enabling perception, hearing, vision, smell, and overall cognition.

 It helps humans understand themselves, their actions, their location, and their
surroundings, allowing them to remember, recognize, and correlate information.

 The system comprises functional units called neurons (nerve cells).

Divisions of the Nervous System:

 Central Nervous System (CNS): Includes the brain and spinal cord.
 Peripheral Nervous System (PNS): Comprises neurons located inside and outside
the CNS.

Types of Neurons:

Sensory Neurons:

o Gather information from different parts of the body and transmit it to the CNS.

Motor Neurons:

o Receive information from other neurons and send commands to the body parts.

Interneurons:

o Found in the CNS, they connect one neuron to another, transmitting information
between them.


Functionality of a Neuron:

Receiving: Collects information from other neurons or sensory inputs.

Processing: Interprets the received information.

Transmitting: Sends processed information to another neuron or a body part.

Biological Neurons
 A typical biological neuron has four parts: dendrites, soma, axon, and synapse. The body of the neuron is called the soma.
 Dendrites accept input information, which is processed in the cell body (soma). A single neuron is connected by its axon to around 10,000 other neurons, and through these axons the processed information is passed from one neuron to another.
 A neuron fires if the input information crosses a threshold value, transmitting signals to another neuron through a synapse.
 A synapse fires with electrical impulses called spikes, which are transmitted to the next neuron.
 A single neuron can receive synaptic inputs from one neuron or from multiple neurons.
 These neurons form a network structure that processes input information and gives out a response.
 The simple structure of a biological neuron is shown in Figure 10.1.


Artificial Neurons
 Artificial neurons, also called nodes, are modelled on biological neurons.
 A node can receive one or more pieces of input information and process them.
 Artificial neurons are connected to one another by connection links.
 Each connection link is associated with a synaptic weight.
 The structure of a single neuron is shown in Figure 10.2.

Simple Model of an Artificial Neuron


McCulloch & Pitts Neuron model can represent only a few Boolean functions. A Boolean
function has binary inputs and provides a binary output.

For example, an AND Boolean function neuron fires only when all the inputs are 1, whereas an OR Boolean function neuron fires even when only one input is 1.

Moreover, the weight and threshold values are fixed in this mathematical model.
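
A sketch of this model in Python; the fixed threshold plays the role of the fixed weight/threshold values mentioned above:

```python
def mp_neuron(inputs, threshold):
    """McCulloch & Pitts neuron: fires (1) when the sum of binary inputs
    reaches the threshold, otherwise stays silent (0)."""
    return 1 if sum(inputs) >= threshold else 0

# AND fires only when all inputs are 1; OR fires when at least one input is 1.
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "AND:", mp_neuron(x, threshold=2), "OR:", mp_neuron(x, threshold=1))
```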


Artificial Neural Network (ANN) Structure

Structure of ANN:

o The ANN is a network represented as a directed graph with neurons (nodes) and
connection weights (edges).
o The neurons are arranged in layers:
1. Input Layer: Receives the input data.
2. Hidden Layer: Processes the information from the input layer.
3. Output Layer: Provides the final output.

Working of the Network:

o Each neuron in the hidden layer performs computations based on weighted inputs
from the previous layer and fires if the weighted sum exceeds the threshold.
o Neurons use an activation function to map the weighted sum to a non-linear output.
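
A sketch of this per-neuron computation with a sigmoid activation (the weights, bias, and inputs are arbitrary example values):

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of inputs followed by a non-linear activation (sigmoid)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # squashes the weighted sum into (0, 1)

print(neuron([0.5, 0.2], weights=[0.8, -0.4], bias=0.1))
```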


Activation Functions

Role:

o Activation functions determine whether a neuron should fire or not based on the
input signals. They map the weighted input sum to an output value, typically
normalizing the value between 0 and 1 or -1 and +1.

Linear activation functions are used when the outputs can be separated into two groups, making them suitable for simple binary classification.

Non-linear activation functions (such as sigmoid, tanh, etc.) are used for more complex
data, like audio, video, or images, and allow the network to learn non-linear
relationships.

Common Activation Functions in ANNs:

Sigmoid: Maps inputs to a range between 0 and 1.

Tanh: Maps inputs to a range between -1 and +1.

ReLU (Rectified Linear Unit): Outputs the input if positive, otherwise outputs 0.

Leaky ReLU: Similar to ReLU, but allows a small negative slope for negative inputs.
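
Minimal Python definitions of these four functions for reference:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))   # range (0, 1)

def tanh(z):
    return math.tanh(z)                 # range (-1, +1)

def relu(z):
    return max(0.0, z)                  # 0 for negatives, identity otherwise

def leaky_relu(z, slope=0.01):
    return z if z > 0 else slope * z    # small negative slope below 0

for z in (-2.0, 0.0, 2.0):
    print(z, sigmoid(z), tanh(z), relu(z), leaky_relu(z))
```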

Perceptron and Learning Theory (A Linear Binary Classifier)


The Perceptron, developed by Frank Rosenblatt in 1958, is one of the first neural network
models designed for supervised learning.
The perceptron is a linear binary classifier and was an extension of the McCulloch & Pitts
Neuron Model, incorporating the Hebbian learning rule to adjust weights.

Elements of the Perceptron include:

1. Inputs from other neurons.


2. Weights and Bias associated with each input.


3. Net-sum: The weighted sum of inputs.
4. Activation function: Used to determine the output based on the net sum.

Mathematical Model of Perceptron
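
The original figure for this model is not reproduced here; in its standard form the perceptron computes a net-sum, net = Σ(w_i × x_i) + b, and outputs y = 1 if net ≥ 0 and y = 0 otherwise. A minimal sketch with the perceptron weight-update rule (the data, learning rate, and epoch count are illustrative):

```python
def step(z):
    return 1 if z >= 0 else 0

def train_perceptron(data, lr=0.1, epochs=10):
    """data: list of ((x1, x2), target). Returns learned weights and bias."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, t in data:
            y = step(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = t - y                  # difference between target and output
            w = [wi + lr * err * xi for wi, xi in zip(w, x)]
            b += lr * err
    return w, b

# AND is linearly separable, so the perceptron converges on it
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
print(train_perceptron(and_data))
```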


XOR Problem and the Perceptron

The perceptron model works for linearly separable Boolean functions but fails to solve
the XOR problem. The XOR function produces:

 1 if the two inputs are not equal,


 0 if the two inputs are equal.

Since XOR is not linearly separable, a single-layer perceptron cannot solve it. This limitation led to the development of the Multi-Layer Perceptron (MLP), which uses multiple layers to handle non-linearly separable problems.
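
XOR becomes solvable once a hidden layer is added. One classic construction, shown here with hand-picked weights purely for illustration, computes XOR as OR(x1, x2) AND NOT AND(x1, x2):

```python
def step(z):
    return 1 if z >= 0 else 0

def xor_mlp(x1, x2):
    h1 = step(x1 + x2 - 0.5)    # hidden unit computing OR
    h2 = step(x1 + x2 - 1.5)    # hidden unit computing AND
    return step(h1 - h2 - 0.5)  # output: OR and not AND = XOR

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, xor_mlp(*x))       # 0, 1, 1, 0
```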

Delta Learning Rule and Gradient Descent

In neural networks, learning is achieved by adjusting the weights of the network to minimize the difference between the actual output and the desired output.

This difference is measured using a cost function (or error function).

The Delta Rule, also known as the Widrow-Hoff rule or Adaline rule, is used to update the
weights. It is based on minimizing the error (difference between desired and actual
output).

The training error is given by:

E = (1/2) Σ over d of (t_d − o_d)²

where t_d is the desired output and o_d is the actual output for training example d.


Gradient Descent is an optimization method used to minimize the cost function by adjusting the weights in the direction of the negative gradient. The size of each step taken in the direction of the gradient is controlled by the learning rate.

Gradient descent is a key part of the backpropagation algorithm, which is the foundation for training Multi-Layer Perceptrons (MLPs).
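
A sketch of the delta rule update for a single linear unit, w ← w + η(t − o)x, derived from the error above (the toy data and learning rate are illustrative):

```python
# Delta (Widrow-Hoff) rule for a single linear unit
data = [((1.0, 0.0), 1.0), ((0.0, 1.0), 0.0), ((1.0, 1.0), 1.0)]  # toy set
w, b, lr = [0.0, 0.0], 0.0, 0.1

for epoch in range(50):
    for x, t in data:
        o = sum(wi * xi for wi, xi in zip(w, x)) + b   # linear output
        err = t - o                                    # t - o drives the step
        w = [wi + lr * err * xi for wi, xi in zip(w, x)]
        b += lr * err

print(w, b)  # weights drift toward a least-squares fit of the toy data
```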

Types of Artificial Neural Networks

General ANN Structure

o Consists of Input Layer, Hidden Layer(s), Output Layer.


o Types differ based on structure, activation function, learning rules.

Feed Forward Neural Network (FFNN)


o Simplest ANN with one-way data flow (no backpropagation).


o Used for simple classification, image processing.
o Types:
 Single-layered (no hidden layer)
 Multi-layered (has hidden layers)

Fully Connected Neural Network (FCNN)

Fully connected neural networks are the ones in which all the neurons in a layer are
connected to all other neurons in the next layer.

The model of a fully connected neural network is shown in Figure 10.8.

Multi-Layer Perceptron (MLP)

o Multiple layers (input, hidden, output), fully connected.


o Uses Backpropagation for learning.
o Suitable for complex tasks (speech recognition, medical diagnosis).


Feedback Neural Network (Recurrent Neural Networks - RNNs)

Feedback neural networks have feedback connections between neurons that allow
information flow in both directions in the network.

The output signals can be sent back to the neurons in the same layer or to the neurons
in the preceding layers.

Hence, this network is more dynamic during training. The model of a feedback neural
network is shown in Figure 10.10.


Learning in Multi-Layer Perceptron (MLP)

Structure

o At least three layers (input, hidden, output).


o Hidden layers extract features.

Training Process

o Forward Phase: Inputs pass through layers, output is generated.


o Backward Phase: Error is backpropagated to adjust weights.
o Stops when error reaches a threshold or after N epochs.

Overfitting Solution

o Cross-validation using a validation dataset.
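
A compact sketch of the two phases on a toy task, implemented with NumPy; the architecture, learning rate, and epoch count are illustrative, and a validation check would slot in at the end of each epoch:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task: XOR. Inputs X, targets T.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

# A 2-2-1 MLP with sigmoid activations
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)
W2, b2 = rng.normal(size=(2, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
lr = 0.5

for epoch in range(5000):
    # Forward phase: inputs pass through the layers
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    # Backward phase: backpropagate the squared-error gradient
    dY = (Y - T) * Y * (1 - Y)
    dH = (dY @ W2.T) * H * (1 - H)
    W2 -= lr * H.T @ dY
    b2 -= lr * dY.sum(axis=0)
    W1 -= lr * X.T @ dH
    b1 -= lr * dH.sum(axis=0)

# Training can occasionally stall in a poor local minimum; rerun with a
# different seed if the outputs have not separated.
print(Y.round(2))
```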


Radial Basis Function Neural Network (RBFNN)

1. Introduced by Broomhead & Lowe (1988).


2. Single Hidden Layer (unlike MLP with multiple).
3. Uses Radial Basis Functions (RBF) as activation functions.
4. Common RBF types:
o Gaussian RBF (decreases with distance).
o Multiquadric RBF (increases with distance).
5. Used for classification, time-series prediction, interpolation.
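
A sketch of the Gaussian RBF computation (the centres, width, and output weights below are made-up values):

```python
import math

def gaussian_rbf(x, centre, width):
    """Gaussian RBF: response decreases as x moves away from the centre."""
    dist2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
    return math.exp(-dist2 / (2 * width ** 2))

# Hidden layer: one Gaussian unit per centre; output is a weighted sum
centres = [(0.0, 0.0), (1.0, 1.0)]
out_weights = [0.7, -0.3]

def rbfnn(x, width=0.5):
    hidden = [gaussian_rbf(x, c, width) for c in centres]
    return sum(w * h for w, h in zip(out_weights, hidden))

print(rbfnn((0.2, 0.1)))
```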


Self-Organizing Feature Map (SOFM / Kohonen Network)

1. Developed by Teuvo Kohonen (1982).


2. Unsupervised Learning Model (clusters data).
3. High-dimensional data mapped to 2D space.
4. Network Structure:

o Two layers (Input Layer & Output Layer).


o No hidden layers.
5. Training Process:
o Uses Euclidean distance to find the winning neuron.
o Weights are updated to form clusters.
6. Applications:
o Pattern recognition, data clustering, dimensionality reduction.
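
A minimal sketch of one SOFM training step: find the winning neuron by Euclidean distance, then pull its weights toward the input (grid size and learning rate are illustrative; the neighbourhood update around the winner is omitted for brevity):

```python
import math
import random

random.seed(0)
# A 3x3 output grid; each neuron holds a weight vector of the input dimension
grid = [[[random.random() for _ in range(2)] for _ in range(3)] for _ in range(3)]

def winner(x):
    """Return grid coordinates of the neuron closest to input x."""
    best, best_d = None, float("inf")
    for i, row in enumerate(grid):
        for j, w in enumerate(row):
            d = math.dist(x, w)          # Euclidean distance
            if d < best_d:
                best, best_d = (i, j), d
    return best

def train_step(x, lr=0.3):
    i, j = winner(x)
    grid[i][j] = [w + lr * (xi - w) for w, xi in zip(grid[i][j], x)]

train_step([0.9, 0.1])
print(winner([0.9, 0.1]))
```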


Popular Applications of Artificial Neural Networks


 ANN learning mechanisms are used in many complex applications that involve modelling of non-linear processes.
 ANN is a useful model that can handle even noisy and incomplete data.
 ANNs are used to model complex patterns, recognize patterns, and solve prediction problems in many areas, such as image processing, speech recognition, medical diagnosis, pattern recognition, data clustering, and time-series prediction.

Advantages and Disadvantages of ANN

Advantages of ANN

1. Solves complex problems involving non-linear processes.


2. Learns and recognizes patterns like humans.
3. Supports parallel processing, making it fast.
4. Works even with incomplete or noisy data.
5. Scales well for large datasets and outperforms traditional learning mechanisms.

Limitations of ANN

1. Requires high computational power for training.


2. Functions as a “black box”, making it hard to interpret.
3. Development is complex and time-consuming.


4. Needs large datasets for effective learning.


5. More computationally expensive than traditional machine learning models.

Challenges of ANN

Training complexity – Requires extensive training data and computational power.

Overfitting/Underfitting – Can struggle with real-world data if not trained properly.

Generalization issues – Models trained on simulated data may not work well in real
applications.

Parameter tuning – Finding optimal weights and biases is difficult.
