Machine learning (ML)

Well-posed learning problem: "A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." (Tom Mitchell)

• Allows computers to learn and make decisions without being explicitly programmed.

• It involves feeding data into algorithms to identify patterns and make predictions on new
data.

• With the help of machine learning, computers can acquire knowledge and make hypotheses or judgments without explicit programming.

• It involves developing algorithms and models that identify data trends and connections so
that computers can draw inferences, make accurate predictions, and automate processes.
Supervised learning is a type of machine learning where a model is trained on labeled data, meaning each input is paired with the corresponding output label.

How Supervised Machine Learning Works?
A supervised learning algorithm works on input features and their corresponding output labels.

The process works through:

Training Data: The model is provided with a training dataset that includes input data (features) and corresponding output data (labels or target variables).
Learning Process: The algorithm processes the training data, learning the relationships between the input features and the output labels. This is achieved by adjusting the model's parameters to minimize the difference between its predictions and the actual labels.

• After training, the model is evaluated using a test dataset to measure its accuracy
and performance.
• Then the model's performance is optimized by adjusting parameters and using
techniques like cross-validation to balance bias and variance. This ensures the
model generalizes well to new, unseen data.
A supervised machine learning model is trained on a dataset to learn a mapping function between input and output, and the learned function is then used to make predictions on new data.
Types of Supervised Learning in Machine Learning
Now, Supervised learning can be applied to two main types of
problems:

Classification: Where the output is a categorical variable (e.g., spam vs. non-spam emails, yes vs. no).

Regression: Where the output is a continuous variable (e.g., predicting house prices, stock prices).
Figure A: It is a dataset of a shopping store that is useful in predicting whether a
customer will purchase a particular product under consideration or not based on his/
her gender, age, and salary.
Input: Gender, Age, Salary
Output: Purchased i.e. 0 or 1; 1 means yes the customer will purchase and 0 means
that the customer won't purchase it.

Figure B: It is a meteorological dataset that serves the purpose of predicting wind speed based on different parameters.
Input: Dew Point, Temperature, Pressure, Relative Humidity, Wind Direction
Output: Wind Speed
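For illustration, a minimal sketch (assuming scikit-learn) of how the two problem types are modeled; the tiny datasets below are made up and only mimic the shape of Figures A and B:

```python
# An illustrative sketch (assumes scikit-learn); the tiny datasets below are
# made up, only mimicking the shape of Figures A and B.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

# Classification (Figure A style): Gender (0/1), Age, Salary in thousands -> Purchased (0/1)
X_cls = np.array([[0, 25, 30], [1, 47, 80], [0, 35, 50], [1, 52, 110]])
y_cls = np.array([0, 1, 0, 1])
clf = LogisticRegression().fit(X_cls, y_cls)
print(clf.predict([[1, 40, 60]]))        # categorical output: 0 or 1

# Regression (Figure B style): two weather features -> continuous wind speed
X_reg = np.array([[10.0, 25.0], [12.5, 30.0], [8.0, 20.0], [15.0, 33.0]])
y_reg = np.array([3.2, 5.1, 2.4, 6.0])
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[11.0, 27.0]]))       # continuous output
```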
• In statistics, the term linear model refers to any model which assumes linearity in the system.

• The most common occurrence is in connection with regression models, and the term is often taken as synonymous with linear regression model.

• However, the term is also used in time series analysis with a different meaning.
In machine learning, a perceptron is a fundamental algorithm and a
basic building block of artificial neural networks. It's a type of linear
classifier that takes inputs, applies weights and a bias, sums them,
and then applies an activation function to produce an output,
essentially making a decision.

The perceptron model mimics the functioning of a biological neuron, enabling us to solve binary classification problems.

It is a type of linear classifier that predicts whether an input belongs to one of two classes, typically labeled as 0 and 1.

At its core, the perceptron algorithm performs a weighted sum of the input features, applies a threshold function, and outputs a predicted class label.
Key Components of the Perceptron:

Input Features: The perceptron algorithm takes a set of input features as its
initial input. These features can be numeric, categorical, or binary,
representing different aspects of the problem being solved.

Weights and Bias: Each input feature is associated with a weight, which
determines its importance in the classification process. Additionally, there is
a bias term that allows for adjusting the decision boundary.

Activation Function: The activation function is applied to the weighted sum of the input features plus the bias. It determines the output of the perceptron, indicating which class the input is predicted to belong to. Common activation functions used in perceptrons include the step function and the sigmoid function.
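Putting these components together, a minimal from-scratch sketch of a perceptron; the AND dataset, learning rate, and epoch count below are illustrative choices, not prescribed by the slides:

```python
# A minimal perceptron sketch: weighted sum + bias, step activation,
# and the classic perceptron update rule (updates only on mistakes).
import numpy as np

def step(z):
    return 1 if z >= 0 else 0          # threshold (step) activation

def train_perceptron(X, y, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])           # one weight per input feature
    b = 0.0                            # bias shifts the decision boundary
    for _ in range(epochs):
        for xi, target in zip(X, y):
            pred = step(np.dot(w, xi) + b)
            w += lr * (target - pred) * xi   # no change when prediction is correct
            b += lr * (target - pred)
    return w, b

# Example: learning the logical AND function (linearly separable)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([step(np.dot(w, xi) + b) for xi in X])   # expected: [0, 0, 0, 1]
```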
The support vector machine (SVM) method is a popular and effective
machine learning method that finds its application in a wide range of
different areas.

Furthermore, various modifications of this method are still being developed.

We can use this method for both classification and regression problems, but it's more common to use it for classification.
In short, the main idea behind this classification algorithm is to separate classes as
correctly as possible. For example, if we take the classification of red and blue dots
in the left image below, we can see that all three lines (generally hyperplanes)
correctly separate the class of red and blue dots.

However, the question arises as to which is the best solution in general for some
other points of these classes as well. The SVM solves this problem by dividing these
two classes in such a way that the hyperplane remains as far away as possible from
the two nearest points of both classes. We can see this in the right image below.

The maximum possible margin is constructed between these two classes as the space between two parallel gray boundary hyperplanes, while in their midst lies the separating hyperplane. These two parallel gray hyperplanes pass through one or more points that we call support vectors.
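As a quick illustration, assuming scikit-learn's SVC as the SVM implementation (the data points are made up), we can fit a maximum-margin classifier and inspect its support vectors:

```python
# Sketch: fit a linear maximum-margin classifier and read off its support vectors.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1, 2], [2, 3], [3, 3], [6, 5], [7, 8], [8, 6]])   # two small blobs
y = np.array([0, 0, 0, 1, 1, 1])

svm = SVC(kernel="linear", C=1.0).fit(X, y)
print(svm.support_vectors_)   # the points lying on the margin boundaries
print(svm.predict([[4, 4]]))  # classify a new point
```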
Key Differences
Although these two algorithms are very similar and solve the same problem, there are major differences between them.

1. Inspiration
As we mentioned above, the perceptron is a neural network type of model. The inspiration for creating the perceptron came from simulating biological neural networks. In contrast, SVM is a different type of machine learning model, which was inspired by statistical learning theory.

2. Training and Optimization
Generally, for training and optimizing the weights of perceptrons and neural networks, we use the backpropagation technique, which includes the gradient descent approach. Conversely, to maximize the margin of SVM, we need to solve quadratic equations using quadratic programming (QP).

For example, in the popular machine learning library Scikit-Learn, QP is solved by an algorithm called sequential minimal optimization (SMO).
3. Kernel Trick
The SVM algorithm uses one smart technique that we call the kernel
trick. The main idea is that when we can’t separate the classes in the
current dimension, we add another dimension where the classes may
be separable.

In order to do that, we don't just arbitrarily add another dimension; we use special transformations called kernels.

In contrast to SVM, the perceptron doesn't use the kernel trick and doesn't transform the data into a higher dimension. Consequently, if the data isn't easily separable with the current configuration of the perceptron, we can try to increase the number of neurons or layers in the model.
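A short sketch of the kernel trick, assuming scikit-learn and its make_circles toy dataset: the two rings are not linearly separable in two dimensions, but an RBF kernel separates them by implicitly working in a higher-dimensional feature space.

```python
# Sketch: a linear kernel fails on concentric rings, while an RBF kernel succeeds.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf").fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # poor, around 0.5
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0
```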
4. Multiclass Classification
SVM doesn’t support multiclass classification natively. Therefore, if
we want to separate multiple classes using SVM algorithms, there are
two indirect approaches:
• One-vs-One approach
• One-vs-Rest approach

One-vs-One (OvO) approach means that we break the multiclass problem into multiple binary classification problems. For example, if we have three classes with names X, Y, and Z, the OvO approach would divide it into three binary classification problems:
1. X vs Y
2. X vs Z
3. Y vs Z
Similarly, One-vs-Rest (OvR) approach breaks the multiclass problem
into multiple binary classifications where it tries to separate the
current class with all the other classes together. For example, if we
take the same classes as above, the OvR approach looks like this:
1. X vs [Y, Z]
2. Y vs [X, Z]
3. Z vs [X, Y]
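Both strategies are available as wrappers in scikit-learn; a brief sketch (the Iris dataset stands in for the three classes X, Y, and Z):

```python
# Sketch: OvO and OvR wrappers around a binary SVC for a three-class problem.
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)   # three classes

ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)

print(len(ovo.estimators_))   # 3 binary problems: X vs Y, X vs Z, Y vs Z
print(len(ovr.estimators_))   # 3 binary problems: each class vs the rest
```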
Although classic perceptron with one neuron requires the same logic
for solving multiclass classification problems, most of today’s
implementations of perceptron algorithms can directly predict the
probability for each of the classes. This is simply done using the
softmax activation function in the output layer.
5. Probability of Prediction
Finally, the SVM model doesn’t output probability natively. Therefore,
if we want to have a probability of prediction, we can get it indirectly
with the probability calibration method. One standard way is using
Platt scaling.

In contrast to SVM, a perceptron with a probabilistic activation function in the output layer directly predicts the probability for each class. The most common probabilistic functions are sigmoid and softmax.
Logistic Regression is a supervised machine learning algorithm used
for classification problems.

Unlike linear regression, which predicts continuous values, it predicts the probability that an input belongs to a specific class.

It is used for binary classification, where the output can be one of two possible categories such as Yes/No, True/False, or 0/1.

It uses the sigmoid function to convert inputs into a probability value between 0 and 1.
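A minimal sketch of the sigmoid mapping and a binary logistic regression fit, assuming scikit-learn (the hours-studied data below is made up):

```python
# Sketch: the sigmoid squashes any real number into (0, 1); LogisticRegression
# then uses it to turn a linear score into a class probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))  # ~[0.007, 0.5, 0.993]

# Hours studied -> pass (1) / fail (0)
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = LogisticRegression().fit(X, y)
print(model.predict_proba([[3.5]]))         # probability of each class
```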
Types of Logistic Regression
Logistic regression can be classified into three main types based on the nature of
the dependent variable:

Binomial Logistic Regression: This type is used when the dependent variable has
only two possible categories. Examples include Yes/No, Pass/Fail or 0/1. It is the
most common form of logistic regression and is used for binary classification
problems.

Multinomial Logistic Regression: This is used when the dependent variable has
three or more possible categories that are not ordered. For example, classifying
animals into categories like "cat," "dog" or "sheep." It extends the binary logistic
regression to handle multiple classes.

Ordinal Logistic Regression: This type applies when the dependent variable has
three or more categories with a natural order or ranking. Examples include ratings
like "low," "medium" and "high." It takes the order of the categories into account
when modeling.
Assumptions of Logistic Regression
Independent observations: Each data point is assumed to be independent of the others, meaning there should be no correlation or dependence between the input samples.
Binary dependent variable: The model assumes that the dependent variable is binary, meaning it can take only two values. For more than two categories, the softmax function is used.
Linear relationship between independent variables and log odds: The model assumes a linear relationship between the independent variables and the log odds of the dependent variable, which means the predictors affect the log odds in a linear way.
No outliers: The dataset should not contain extreme outliers (data points that deviate significantly from the rest of the observations in a dataset), as they can distort the estimation of the logistic regression coefficients.
Large sample size: It requires a sufficiently large sample size to produce reliable and stable results.
How to Evaluate Logistic Regression Model
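Common ways to evaluate a logistic regression classifier include accuracy, precision, recall, F1-score, the confusion matrix, and ROC AUC. A brief sketch using scikit-learn's metrics (the labels and probabilities below are placeholders, not real results):

```python
# Sketch: standard evaluation metrics for a binary classifier.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix, roc_auc_score)

y_test = [0, 1, 1, 0, 1, 0, 1, 1]                     # true labels (illustrative)
y_pred = [0, 1, 0, 0, 1, 0, 1, 1]                     # predicted labels
y_prob = [0.2, 0.9, 0.4, 0.1, 0.8, 0.3, 0.7, 0.95]    # predicted probabilities

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1-score :", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))
print("ROC AUC  :", roc_auc_score(y_test, y_prob))
```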
Neural networks are machine learning models that mimic
the complex functions of the human brain.

These models consist of interconnected nodes or neurons that process data, learn patterns, and enable tasks such as pattern recognition and decision-making.
Neural networks are pivotal in identifying complex patterns, solving
intricate challenges, and adapting to dynamic environments. Their
ability to learn from vast amounts of data is transformative,
impacting technologies like natural language processing, self-driving
vehicles, and automated decision-making.
Layers in Neural Network Architecture
Input Layer: This is where the network receives its input data. Each
input neuron in the layer corresponds to a feature in the input data.
Hidden Layers: These layers perform most of the computational
heavy lifting. A neural network can have one or multiple hidden
layers. Each layer consists of units (neurons) that transform the inputs
into something that the output layer can use.
Output Layer: The final layer produces the output of the model. The
format of these outputs varies depending on the specific task (e.g.,
classification, regression).
Shallow Neural Network?
A shallow neural network refers to a neural network that consists of
only one hidden layer between the input and output layers.

The term “shallow” refers to the minimal depth of the network due
to just one hidden layer between input and output.

During the training process, input data is fed into the network where
it is processed through weights and biases associated with neurons in
the hidden layer.

The processed information then moves to the output layer, which provides a prediction or classification based on the learned features.

The accuracy of these predictions is refined through a process called backpropagation and optimization algorithms like gradient descent, which adjust weights and biases to minimize errors.
Components of a Shallow Neural Network
Input Layer: This is where the network receives its input data. Each
neuron in this layer represents a feature of the input dataset.

Hidden Layer: The single hidden layer in a shallow network transforms the inputs into something that the output layer can use. The neurons in this layer apply a set of weights to the inputs and pass them through an activation function to introduce non-linearity to the process.

Output Layer: The final layer produces the output of the network. For
regression tasks, this might be a single neuron; for classification, it
could be multiple neurons corresponding to the classes.
How Do Shallow Neural Networks Work?
The functionality of shallow neural networks hinges on the
transformation of inputs through the hidden layer to produce outputs.
Here's a step-by-step breakdown:

Weighted Sum: Each neuron in the hidden layer calculates a weighted sum of the inputs.

Activation Function: The weighted sums are passed through an activation function (such as sigmoid or tanh) to introduce non-linearity, enabling the network to learn complex patterns.
Sigmoid: S-shaped function that maps input values to a range between 0 and 1.
Tanh (Hyperbolic Tangent): S-shaped function like sigmoid, but maps input values between -1 and 1.

Output Generation: The output layer integrates the signals from the
hidden layer, often through another set of weights, to produce the
final output.
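A compact sketch of these three steps for a one-hidden-layer network, written with NumPy (the layer sizes and random weights are illustrative):

```python
# Sketch: forward pass through a shallow network with one hidden layer.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))                        # 3 input features

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)    # hidden layer: 4 neurons
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)    # output layer: 1 neuron

z1 = W1 @ x + b1                     # 1) weighted sum in the hidden layer
a1 = np.tanh(z1)                     # 2) non-linear activation (tanh)
z2 = W2 @ a1 + b2                    # 3) output layer combines the hidden signals
y_hat = 1.0 / (1.0 + np.exp(-z2))    # sigmoid output for binary classification
print(y_hat)
```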
Training Shallow Neural Networks
Training a shallow neural network typically involves:

Forward Propagation: Calculating the output for a given input by passing it through the layers of the network.

Loss Calculation: Determining how far the network's output is from the actual desired output using a loss function.

Backpropagation: Calculating the gradient of the loss function with respect to each weight in the network, which informs how the weights should be adjusted to minimize the loss.

Weight Update: Adjusting the weights using an optimization algorithm like gradient descent.
Forward Propagation
• Forward propagation is the fundamental process in a neural network where input data passes through multiple layers to generate an output.

• In forward propagation, input data moves through each layer of the neural network, where each neuron applies a weighted sum, adds a bias, and passes the result through an activation function to produce its output and, ultimately, a prediction.
• It determines the output of neural network with a given set of inputs and current
state of model parameters (weights and biases).

• Understanding this process helps in optimizing neural networks for various tasks
like classification, regression and more.

x (input) -> [Layer 1] -> [Layer 2] -> ... -> [Layer n] -> ŷ (output)
1. Input Layer
• The input data is fed into the network through the input layer.
• Each feature in the input dataset represents a neuron in this layer.
• The input is usually normalized or standardized to improve model
performance.

2. Hidden Layers
• The input moves through one or more hidden layers where
transformations occur.
• Each neuron in hidden layer computes a weighted sum of inputs
and applies activation function to introduce non-linearity.
Each neuron receives inputs and computes: Z = WX + b
where:
W is the weight matrix
X is the input vector
b is the bias term
An activation function such as ReLU or sigmoid is then applied.
3. Output Layer
• The last layer in the network generates the final prediction.
• The activation function of this layer depends on the type of
problem:
• Softmax (for multi-class classification)
• Sigmoid (for binary classification)
• Linear (for regression tasks)
4. Prediction
• The network produces an output based on current weights and
biases.
• The loss function evaluates the error by comparing predicted
output with actual values.
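A sketch of this full forward pass in NumPy, with Z = WX + b at each layer, a softmax output for multi-class classification, and a cross-entropy loss comparing predictions with the actual labels (all shapes and values are illustrative):

```python
# Sketch: batch forward pass through input -> hidden (ReLU) -> output (softmax),
# followed by a cross-entropy loss against the true labels.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 5))                      # batch of 4 samples, 5 features each

W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)    # hidden layer
W2, b2 = rng.normal(size=(8, 3)), np.zeros(3)    # output layer: 3 classes

Z1 = X @ W1 + b1                                 # weighted sum + bias
A1 = np.maximum(0, Z1)                           # ReLU activation
Z2 = A1 @ W2 + b2
expZ = np.exp(Z2 - Z2.max(axis=1, keepdims=True))
Y_hat = expZ / expZ.sum(axis=1, keepdims=True)   # softmax probabilities

y_true = np.array([0, 2, 1, 0])                  # actual class labels
loss = -np.log(Y_hat[np.arange(4), y_true]).mean()   # cross-entropy loss
print(Y_hat.shape, loss)
```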
2. Loss Function
• A loss function is a mathematical way to measure how good or bad a model's predictions are compared to the actual results.
• It gives a single number that tells us how far off the predictions are. The smaller the number, the better the model is doing.
• Loss functions are used to train models.

L(ŷ, y) = measure of difference between prediction ŷ and true value y

Loss functions are important because they:

• Guide Model Training: During training, algorithms such as gradient descent use the loss function to adjust the model's parameters, trying to reduce the error and improve the model's predictions.
• Measure Performance: The difference between predicted and actual values can be used to evaluate the model's performance.
• Affect Learning Behavior: Different loss functions make the model learn in different ways, depending on which kinds of mistakes they penalize more heavily.
There are several types of loss functions, each suited to different tasks.
1. Regression Loss Functions
These are used when your model needs to predict a continuous
number such as predicting the price of a product or age of a person.
Popular regression loss functions are:

A. Mean Squared Error (MSE) Loss
• Mean Squared Error (MSE) Loss is one of the most widely used loss functions for regression tasks.
• It calculates the average of the squared differences between the predicted values and the actual values.
• It is simple to understand and sensitive to outliers, because the errors are squared, which can inflate the loss.

MSE = (1/n) ∑ᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
B. Mean Absolute Error (MAE) Loss
• Mean Absolute Error (MAE) Loss is another commonly used loss
function for regression.
• It calculates the average of the absolute differences between the
predicted values and the actual values.
• It is less sensitive to outliers compared to MSE. But it is not
differentiable at zero which can cause issues for some optimization
algorithms.
MAE = (1/n) ∑ᵢ₌₁ⁿ |yᵢ − ŷᵢ|

C. Huber Loss
• Huber Loss combines the advantages of MSE and MAE.
• It is less sensitive to outliers than MSE and differentiable
everywhere unlike MAE. It requires tuning of the parameter δ.
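Minimal sketches of these three regression losses, written directly from their definitions (the values below are made up):

```python
# Sketch: MSE, MAE, and Huber loss implemented from their definitions.
import numpy as np

def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)        # squaring amplifies outliers

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))       # more robust to outliers

def huber(y_true, y_pred, delta=1.0):
    err = y_true - y_pred
    small = np.abs(err) <= delta
    return np.mean(np.where(small,
                            0.5 * err ** 2,                        # MSE-like near zero
                            delta * (np.abs(err) - 0.5 * delta)))  # MAE-like far out

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.8, 5.4, 2.0, 10.0])          # last prediction is an outlier error
print(mse(y_true, y_pred), mae(y_true, y_pred), huber(y_true, y_pred))
```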
2. Classification Loss Functions
Classification loss functions are used to evaluate how well a
classification model's predictions match the actual class labels.

A. Binary Cross-Entropy Loss (Log Loss)
• Binary Cross-Entropy Loss is also known as Log Loss and is used for binary classification problems.
• It measures the performance of a classification model whose output is a probability value between 0 and 1.

B. Categorical Cross-Entropy Loss
• Categorical Cross-Entropy Loss is used for multiclass classification problems.
• It measures the performance of a classification model whose output is a probability distribution over multiple classes.
C. Sparse Categorical Cross-Entropy Loss
• Sparse Categorical Cross-Entropy Loss is similar to Categorical
Cross-Entropy Loss but is used when the target labels are integers
instead of one-hot encoded vectors.
• It is efficient for large datasets with many classes.
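Minimal sketches of binary and sparse categorical cross-entropy, written from their definitions (the probabilities below are illustrative):

```python
# Sketch: binary cross-entropy and sparse categorical cross-entropy from scratch.
import numpy as np

def binary_cross_entropy(y_true, p):
    p = np.clip(p, 1e-12, 1 - 1e-12)              # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

def sparse_categorical_cross_entropy(y_true, probs):
    # y_true holds integer class indices; probs holds one distribution per sample
    return -np.mean(np.log(probs[np.arange(len(y_true)), y_true]))

print(binary_cross_entropy(np.array([1, 0, 1]), np.array([0.9, 0.2, 0.6])))
print(sparse_categorical_cross_entropy(
    np.array([2, 0]),
    np.array([[0.1, 0.2, 0.7],
              [0.8, 0.1, 0.1]])))
```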
3. Ranking Loss Functions
Ranking loss functions are used to evaluate models that predict the
relative order of items. These are commonly used in tasks such as
recommendation systems and information retrieval.
A. Contrastive Loss
Contrastive Loss is used to learn embeddings such that similar items
are closer in the embedding space while dissimilar items are farther
apart. It is often used in Siamese networks.
B. Triplet Loss
Triplet Loss is used to learn embeddings by comparing the relative
distances between triplets: anchor, positive example and negative
example.
4. Image and Reconstruction Loss Functions
These loss functions are used to evaluate models that generate or
reconstruct images ensuring that the output is as close as possible to
the target images.

A. Pixel-wise Cross-Entropy Loss
Pixel-wise Cross-Entropy Loss is used for image segmentation tasks where each pixel is classified independently.

B. Dice Loss
Dice Loss is used for image segmentation tasks and is particularly
effective for imbalanced datasets. It measures the overlap between
the predicted segmentation and the ground truth.

C. Jaccard Loss (Intersection over Union, IoU)
Jaccard Loss, also known as IoU Loss, measures the intersection over union of the predicted segmentation and the ground truth.
3. Backpropagation
• Backpropagation is a supervised learning algorithm used for training artificial neural networks.

• It computes the gradient of the loss function with respect to each weight by the chain rule.

• This process allows the network to update its weights and biases, minimizing the error in predictions.

• Without backpropagation, training deep neural networks would be inefficient and impractical.

• Its goal is to reduce the difference between the model's predicted output and the actual output by adjusting the weights and biases in the network.
Backpropagation is all about efficiently computing how each
parameter in the network contributes to the overall error.

It does this by cleverly applying the chain rule of calculus, propagating the error gradient backwards through the network.

It works iteratively to adjust weights and biases to minimize the cost function. In each epoch the model adapts these parameters by reducing the loss, following the error gradient.

It often uses optimization algorithms like gradient descent or stochastic gradient descent.

The algorithm computes the gradient using the chain rule from calculus, allowing it to effectively navigate complex layers in the neural network to minimize the cost function.
Chain Rule
• The chain rule is a fundamental concept in calculus that is crucial
for backpropagation.
• It allows the computation of the derivative of a composite
function by breaking it down into simpler parts.
• In the context of neural networks, the chain rule helps in
computing the gradient of the loss function with respect to each
weight.

∂L/∂θ = (∂L/∂ŷ) * (∂ŷ/∂z) * (∂z/∂θ)


Where:
• L is the loss
• ŷ is the network output
• z is the input to the activation function
• θ is any weight or bias in the network
Back Propagation plays a critical role in how neural
networks improve over time.

• Efficient Weight Update: It computes the gradient of the loss function with respect to each weight using the chain rule, making it possible to update weights efficiently.

• Scalability: The backpropagation algorithm scales well to networks with multiple layers and complex architectures, making deep learning feasible.

• Automated Learning: With backpropagation, the learning process becomes automated and the model can adjust itself to optimize its performance.
The Back Propagation algorithm involves two main steps: the Forward
Pass and the Backward Pass.

1. Initial Calculation

a_j = ∑(w_i,j × x_i)
Where:
a_j is the weighted sum of all the inputs and weights at each node
w_i,j represents the weight between the ith input and the jth neuron
x_i represents the value of the ith input

o_j (output): After applying the activation function to a_j, we get the output of the neuron:
o_j = activation function(a_j)
2. Sigmoid Function
The sigmoid function returns a value between 0 and 1, introducing
non-linearity into the model.
y_j = 1 / (1 + e^(−a_j))

3. Computing Outputs
4. Error Calculation
The desired (target) output is 0.5, but we obtained 0.67.

To calculate the error we can use the formula below:

Error_j = y_target − y_5 = 0.5 − 0.67 = −0.17

Back Propagation
1. Calculating Gradients
The change in each weight is calculated as:
Δw_i,j = η × δ_j × O_j

Where:
• δ_j is the error term for each unit
• η is the learning rate
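A tiny end-to-end sketch of these formulas for a single sigmoid neuron: forward pass, error, chain-rule gradient, and the weight update. The numbers are illustrative, not those from the slide's figure, and a squared-error loss is assumed.

```python
# Sketch: one sigmoid neuron, squared-error loss, and one gradient-descent update.
import numpy as np

x = np.array([0.5, 0.3])        # inputs
w = np.array([0.4, 0.7])        # weights w_i,j
b = 0.1                         # bias
y_target = 0.5                  # desired output
eta = 0.5                       # learning rate

# Forward pass
a = np.dot(w, x) + b            # a_j = sum(w_i,j * x_i) + b
y = 1.0 / (1.0 + np.exp(-a))    # sigmoid output o_j

# Backward pass (chain rule): dL/dw = dL/dy * dy/da * da/dw
loss = 0.5 * (y - y_target) ** 2
delta = (y - y_target) * y * (1 - y)    # gradient of the loss w.r.t. a_j
grad_w = delta * x
grad_b = delta

# Weight update: equivalent to Δw = η * δ_j * O_j with δ_j = (target − y) * y * (1 − y)
w -= eta * grad_w
b -= eta * grad_b
print(loss, w, b)
```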
Stochastic gradient descent
• Stochastic Gradient Descent (SGD) is an optimization algorithm used in machine learning, particularly useful when dealing with large datasets.
• It is a variant of the traditional gradient descent
algorithm but offers several advantages in terms of
efficiency and scalability, making it the go-to method for
many deep-learning tasks.
• Gradient descent is an iterative optimization algorithm
used to minimize a loss function, which represents how
far the model’s predictions are from the actual values.
The main goal is to adjust the parameters of a model
(weights, biases, etc.) so that the error is minimized.
Need for Stochastic Gradient Descent

• For large datasets, computing the gradient using all data points can
be slow and memory-intensive.

• This is where SGD comes into play. Instead of using the full dataset
to compute the gradient at each step, SGD uses only one random
data point (or a small batch of data points) at each iteration.

• This makes the computation much faster.


• The insight of SGD is that the gradient is an expectation.

• The expectation may be approximately estimated using a small set of samples.

• Specifically, on each step of the algorithm, we can sample a minibatch of examples B = {x(1), . . . , x(m)} drawn uniformly from the training set.

• The minibatch size m is typically chosen to be a relatively small number of examples, ranging from one to a few hundred.

• Crucially, m is usually held fixed as the training set size grows.

• We may fit a training set with billions of examples using updates computed on only a hundred examples.
How It Works
Initialization: Start with random initial values for the model
parameters (weights and biases).

Iterative Updates:
• Randomly shuffle the dataset.
• For each data point (or mini-batch), compute the gradient of the
loss function with respect to the model parameters.
• Update the parameters using the formula: Θ = Θ − η ∇Θ J(Θ)
  where:
  • Θ: Model parameters (weights, biases, etc.)
  • η: Learning rate (step size)
  • ∇Θ J(Θ): Gradient of the loss function

Repeat: Continue until the loss converges or a stopping criterion (e.g., number of epochs) is met.
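A bare-bones sketch of this loop, fitting a simple linear model y ≈ w·x + b on made-up data with one randomly chosen example per update:

```python
# Sketch: SGD from scratch for a one-feature linear regression.
import numpy as np

rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=200)
y = 3.0 * X + 2.0 + rng.normal(0, 1, size=200)   # true w = 3, b = 2, plus noise

w, b = 0.0, 0.0          # initialization
eta = 0.01               # learning rate
for epoch in range(20):
    for i in rng.permutation(len(X)):            # shuffle each epoch
        err = (w * X[i] + b) - y[i]              # prediction error on one sample
        w -= eta * err * X[i]                    # Θ ← Θ − η ∇Θ J(Θ)
        b -= eta * err
print(w, b)   # should end up close to 3 and 2
```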
SGD Classifier:
• is a linear classification algorithm that aims to find the optimal
decision boundary (a hyperplane) to separate data points belonging
to different classes in a feature space.

• It operates by iteratively adjusting the model's parameters to minimize a cost function, often the cross-entropy loss, using the stochastic gradient descent optimization technique.

SGD Regressor:
• solves regression problems with a machine learning approach.

• The aim of SGD regression, a type of supervised learning, is to predict a continuous output variable (the dependent variable) from one or more inputs (the independent variables).

• The SGD Regressor reduces the discrepancy between target values and predicted values by optimizing the model's parameters.
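Both are available in scikit-learn's linear_model module; a brief sketch on built-in toy datasets (feature scaling is added because SGD is sensitive to feature magnitudes):

```python
# Sketch: SGDClassifier (default hinge loss) and SGDRegressor (squared error)
# wrapped in pipelines with standardization.
from sklearn.datasets import load_iris, load_diabetes
from sklearn.linear_model import SGDClassifier, SGDRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

X_c, y_c = load_iris(return_X_y=True)
clf = make_pipeline(StandardScaler(), SGDClassifier(max_iter=1000))
clf.fit(X_c, y_c)
print("classification accuracy:", clf.score(X_c, y_c))

X_r, y_r = load_diabetes(return_X_y=True)
reg = make_pipeline(StandardScaler(), SGDRegressor(max_iter=1000))
reg.fit(X_r, y_r)
print("regression R^2:", reg.score(X_r, y_r))
```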
Advantages of Stochastic Gradient Descent
Efficiency: Because it uses only one or a few data points to calculate
the gradient, SGD can be much faster, especially for large datasets.
Each step requires fewer computations, leading to quicker
convergence.

Memory Efficiency: Since it does not require storing the entire dataset
in memory for each iteration, SGD can handle much larger datasets
than traditional gradient descent.

Escaping Local Minima: The noisy updates in SGD, caused by the stochastic nature of the algorithm, can help the model escape local minima or saddle points, potentially leading to better solutions in non-convex optimization problems (common in deep learning).

Online Learning: SGD is well-suited for online learning, where the model is trained incrementally as new data comes in, rather than on a static dataset.
Applications of Stochastic Gradient Descent
SGD and its variants are widely used across various domains of machine learning:

Deep Learning: In training deep neural networks, SGD is the default optimizer due
to its efficiency with large datasets and its ability to work with large models. Deep
learning frameworks like TensorFlow and PyTorch typically use variants like Adam or
RMSprop, which are based on SGD.

Natural Language Processing (NLP): Models like Word2Vec and transformers are
trained using SGD variants to optimize large models on vast text corpora.

Computer Vision: For tasks such as image classification, object detection and
segmentation, SGD has been fundamental in training convolutional neural networks
(CNNs).

Reinforcement Learning: SGD is also used to optimize the parameters of models used in reinforcement learning, such as deep Q-networks (DQNs) and policy gradient methods.
