
UNIT III

SEMI-PARAMETRIC METHODS

Code: U18CST7002
Presented by: Nivetha R
Department: CSE
Contents

• Introduction to the Linear Model
• Generalizing the Linear Model
• Geometry of the Linear Discriminant
• Pairwise Separation
• Gradient Descent
Linear Model

• The linear model is one of the most straightforward models in machine learning. It is the building block for many complex machine learning algorithms, including deep neural networks.
• Linear models predict the target variable using a linear function of the input features. Two crucial linear models in machine learning are linear regression and logistic regression.
• Linear regression is used for regression tasks, whereas logistic regression is a classification algorithm.
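
As a minimal sketch (using scikit-learn; the synthetic dataset and its coefficients are purely illustrative), the two models share the same linear form but solve different tasks:

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # two input features

# Regression target: a noisy linear function of the features
y_reg = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)
reg = LinearRegression().fit(X, y_reg)
print(reg.coef_, reg.intercept_)         # recovers roughly [3, -2] and 0

# Classification target: the sign of the same linear function
y_clf = (3.0 * X[:, 0] - 2.0 * X[:, 1] > 0).astype(int)
clf = LogisticRegression().fit(X, y_clf)
print(clf.predict(X[:5]))                # class labels, not real values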
Likelihood vs. Discriminant-based Classification

• Likelihood-based: assume a model for p(x|Ci), then use Bayes' rule to calculate P(Ci|x).
• Discriminant-based: assume a model for gi(x|Φi); no density estimation.
• Estimating the boundaries is enough; there is no need to accurately estimate the densities inside the boundaries.

This approach is useful because it focuses on the boundaries between classes rather than trying to estimate the probability distribution within each class.
Discriminant-based Classification
A discriminant function is a mathematical function used
in classification tasks to determine which class a given data
point belongs to. It assigns a score to each class based on
the input features of the data point. The class with the
highest score is chosen as the predicted class for that data
point.
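
A minimal sketch of this idea in NumPy (the weights and biases below are made-up values for illustration): compute one linear score per class and predict the argmax.

import numpy as np

# One weight vector and bias per class -- illustrative values only
W = np.array([[ 1.0, -0.5],
              [-0.3,  0.8],
              [ 0.2,  0.2]])             # 3 classes, 2 features
b = np.array([0.1, -0.2, 0.0])

def predict(x):
    scores = W @ x + b                   # score g_i(x) for each class i
    return int(np.argmax(scores))        # class with the highest score

print(predict(np.array([2.0, 1.0])))     # index of the winning class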
Linear Discriminant

• In a linear discriminant, the final output is a weighted sum of the input attributes.
• The magnitude of a weight shows the importance of its attribute.
• The sign of a weight indicates whether the attribute's effect is positive or negative.
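
For example, with a hypothetical discriminant g(x) = 2.5·x1 - 0.4·x2 + 1.0, attribute x1 matters far more than x2 (weight magnitude 2.5 versus 0.4), x1's effect on the score is positive, and x2's effect is negative.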
Generalizing the Linear Model
This section discusses the generalized linear model in the context of classification, focusing on how we can extend linear models to handle more complex relationships between features.
Nonlinear basis functions help transform the original features into new, more complex forms, making it easier to apply linear models to data that has a non-linear relationship between input and output.
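
A minimal sketch of the idea in NumPy (the quadratic basis and the synthetic data are illustrative choices): expand each input x into basis functions φ(x), then fit an ordinary linear model on the expanded features.

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=200)
y = x**2 + rng.normal(scale=0.3, size=200)   # target is non-linear in x

# Basis expansion: phi(x) = [1, x, x^2]
Phi = np.column_stack([np.ones_like(x), x, x**2])

# Ordinary least squares on the expanded features
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(w)   # roughly [0, 0, 1]: the model is linear in w, non-linear in x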
Summary: Generalizing the Linear Model
Geometry of Linear Discriminant
Linear Discriminant Analysis (LDA) is one of the most popular dimensionality reduction techniques for supervised classification problems in machine learning. It is also used as a pre-processing step for modeling class differences in ML and in pattern classification applications.

LDA is considered the most common technique for solving such classification problems. For example, suppose we have two classes with multiple features and need to separate them efficiently: if we classify them using a single feature, the classes may overlap.
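
A minimal sketch with scikit-learn (the two-class Gaussian dataset is synthetic): LDA projects the features onto the single axis along which the classes separate best.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# Two Gaussian classes that overlap on either feature taken alone
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[2.0, 2.0], scale=1.0, size=(100, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 100 + [1] * 100)

lda = LinearDiscriminantAnalysis(n_components=1)
z = lda.fit_transform(X, y)              # 1-D projection of the 2-D data
print(lda.score(X, y))                   # accuracy of the LDA classifier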
Derivations for Geometry of Linear Discriminant
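
As background for these derivations, the standard geometry of a linear discriminant g(x) = w^T x + w0 is as follows: the decision boundary g(x) = 0 defines a hyperplane whose orientation is given by the weight vector w (w is normal to the hyperplane); the signed distance from any point x to the hyperplane is g(x)/||w||; and the distance of the hyperplane from the origin is |w0|/||w||. Thus w determines the orientation of the decision boundary and w0 its location.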
Pairwise Separation

• Pairwise separation is a method used when the classes in a dataset are not linearly separable. It breaks the problem down into smaller, more manageable parts by focusing on separating each pair of classes individually, as in the sketch below.
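
A minimal sketch using scikit-learn's one-vs-one wrapper around a linear classifier (the blob dataset is synthetic): one linear discriminant is trained per pair of classes, and their votes are combined at prediction time.

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# K classes -> K*(K-1)/2 pairwise linear discriminants, combined by voting
ovo = OneVsOneClassifier(LogisticRegression()).fit(X, y)
print(len(ovo.estimators_))   # 3 pairwise classifiers for 3 classes
print(ovo.predict(X[:5]))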
Gradient Descent
• Gradient descent is one of the most commonly used optimization algorithms for training machine learning models; it minimizes the error between actual and predicted results. It is also used to train neural networks.
• In mathematical terminology, an optimization algorithm performs the task of minimizing or maximizing an objective function f(x) parameterized by x. Similarly, in machine learning, optimization is the task of minimizing the cost function parameterized by the model's parameters.
• The main objective of gradient descent is to minimize a convex function through iterative parameter updates. Once optimized, these machine learning models can be used as powerful tools for artificial intelligence and various computer science applications.
What is Gradient Descent or Steepest Descent?

• Gradient descent is one of the most commonly used iterative optimization algorithms in machine learning, used to train machine learning and deep learning models. It helps find a local minimum of a function.
• The local minimum and local maximum of a function relate to the gradient as follows:
• Moving against the gradient (in the direction of the negative gradient) at the current point leads toward a local minimum of the function.
• Moving with the gradient (in the direction of the positive gradient) at the current point leads toward a local maximum of the function.
What is Gradient Descent or Steepest Descent?

The main objective of using a gradient descent algorithm is to minimize the cost function through iteration. To achieve this goal, it performs two steps iteratively, described on the next slide.
Gradient Descent
• The first-order derivative of the function is used to compute its gradient, or slope, at the current point.
• The parameters are then updated by moving in the direction opposite to the gradient.
• The size of each step is scaled by a factor alpha (α), the learning rate: a crucial tuning parameter that determines the step size taken during the optimization process.
• The learning rate controls how quickly or slowly the algorithm converges to the optimal solution.
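
As a worked sketch in Python (the quadratic objective f(w) = (w - 3)^2 and the learning rate are made-up choices for illustration), the two steps are one line each:

# Minimal gradient descent on f(w) = (w - 3)^2, whose minimum is at w = 3
w = 0.0          # initial parameter value
alpha = 0.1      # learning rate

for _ in range(100):
    grad = 2 * (w - 3)   # step 1: first-order derivative of f at w
    w -= alpha * grad    # step 2: move opposite to the gradient
print(w)                 # approximately 3.0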
What is a Cost Function?

• The cost function measures the difference, or error, between the actual values and the predicted values at the current position, expressed as a single real number.
• It helps improve machine learning efficiency by providing feedback to the model so that it can minimize the error and find the local or global minimum.
• The algorithm iterates along the direction of the negative gradient until the cost function approaches its minimum.
• At that point, the model stops learning further.
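
For instance, a common cost function for regression is the mean squared error (a minimal NumPy sketch; the tiny dataset is illustrative):

import numpy as np

def mse_cost(w, b, X, y):
    # Mean squared error of the linear model y_hat = X @ w + b
    y_hat = X @ w + b
    return np.mean((y - y_hat) ** 2)   # a single real number

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
print(mse_cost(np.array([2.0]), 0.0, X, y))   # 0.0: perfect fit
print(mse_cost(np.array([1.0]), 0.0, X, y))   # positive: error remains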
How does Gradient Descent work?
The main objective of gradient descent is to minimize the cost function, i.e., the error between the expected and actual values.

Learning Rate:

• The learning rate is the step size taken to reach the minimum, or lowest point. It is typically a small value that is evaluated and updated based on the behavior of the cost function. A high learning rate results in larger steps but risks overshooting the minimum; a low learning rate takes small steps, which compromises overall efficiency but gives the advantage of more precision.
Types of Gradient Descent
1. Batch Gradient Descent:

Batch gradient descent (BGD) computes the error for each point in the training set and updates the model only after evaluating all training examples. One such full pass is known as a training epoch. In simple words, it is a greedy approach where we need to sum over all examples for each update.

Advantages of batch gradient descent:

• It produces less noise in comparison to other types of gradient descent.
• It produces stable gradient descent convergence.
• It is computationally efficient, as all resources are used to process all training samples together.
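
A minimal sketch of batch gradient descent for linear regression in NumPy (the synthetic data, true weights [1.5, -0.7], and learning rate are illustrative choices):

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(scale=0.1, size=200)

w = np.zeros(2)
alpha = 0.1
for epoch in range(100):
    grad = -2 / len(X) * X.T @ (y - X @ w)   # gradient over ALL examples
    w -= alpha * grad                         # one update per epoch
print(w)   # close to [1.5, -0.7]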
Types of Gradient Descent
2. Stochastic Gradient Descent:

Stochastic gradient descent (SGD) is a type of gradient descent that processes one training example per iteration: it updates the parameters after each individual example rather than after the whole dataset. Because it requires only one training example at a time, it is easy to store in the allocated memory. However, its frequent updates cost it some computational efficiency per epoch compared to batch gradient descent, and because each update is based on a single example, it is also described as having a noisy gradient.

Advantages of stochastic gradient descent:

• It is easier to fit in the allocated memory.
• It is relatively fast to compute compared to batch gradient descent.
• It is more efficient for large datasets.
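
The same regression problem with stochastic updates (a minimal sketch that reuses the synthetic X, y, rng, and alpha from the batch example above):

# Stochastic gradient descent: one update per training example
w = np.zeros(2)
for epoch in range(20):
    for i in rng.permutation(len(X)):          # visit examples in random order
        xi, yi = X[i], y[i]
        grad = -2 * xi * (yi - xi @ w)         # gradient from ONE example
        w -= alpha * grad                      # noisy but frequent update
print(w)   # also close to [1.5, -0.7], reached via many small steps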


Types of Gradient Descent
3. Mini-Batch Gradient Descent:

Mini-batch gradient descent combines batch gradient descent and stochastic gradient descent. It divides the training dataset into small batches and performs an update on each batch separately. Splitting the training dataset into smaller batches strikes a balance between the computational efficiency of batch gradient descent and the speed of stochastic gradient descent. The result is a variant of gradient descent with high computational efficiency and a less noisy gradient.

Advantages of mini-batch gradient descent:

• It is easier to fit in the allocated memory.
• It is computationally efficient.
• It produces stable gradient descent convergence.
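
The mini-batch variant (again reusing X, y, rng, and alpha from the batch example; the batch size of 32 is an illustrative choice):

# Mini-batch gradient descent: one update per batch of 32 examples
w = np.zeros(2)
batch_size = 32
for epoch in range(50):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        Xb, yb = X[batch], y[batch]
        grad = -2 / len(Xb) * Xb.T @ (yb - Xb @ w)   # gradient over one batch
        w -= alpha * grad
print(w)   # close to [1.5, -0.7], with smoother steps than pure SGD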
