
CLASSIFICATION AND REGRESSION MODELS
Linear Segmentation and Decision region
Linear Segmentation

What it is: Imagine you have a bunch of data points on a graph, and you want to divide
them into groups (or "segments") based on their features. Linear segmentation uses
straight lines (in 2D) or hyperplanes (in higher dimensions) to create these divisions.

Why it's useful: It's a simple and efficient way to separate data when the groups have a
clear, linear boundary. Think of it like drawing a line to separate apples from oranges on a
table.

Examples:
• Image processing: Identifying edges in an image.
• Classification: Categorizing customers as likely to buy or not buy a product.
Linear Segmentation and Decision region (Contd.)
Decision Regions

What they are: The areas created by your segmentation lines. Each region represents a
specific category or class. If a new data point falls within a region, it's assigned to that
category.

How they work: The decision boundary (your line or hyperplane) is what separates the
regions. The goal is to have the decision boundary placed so that it correctly classifies as
many data points as possible.

Example: In a medical diagnosis scenario, one region might represent "healthy" and
another "needs further testing."
Linear Segmentation and Decision region (Contd.)
Key Concepts

•Linear Classifiers: Algorithms that use linear segmentation to create decision regions (e.g.,
Logistic Regression, Linear SVM).
•Feature Space: The space where your data points are plotted, with each axis representing
a feature.
•Training: The process of finding the best position for the decision boundary using labeled
data.
Important Notes
•Linear segmentation works best when the data is linearly separable (i.e., you can draw a
straight line to perfectly divide the groups).
•Real-world data is often more complex, requiring non-linear methods for accurate
segmentation.
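To make these concepts concrete, here is a minimal sketch, assuming scikit-learn and NumPy are available, of a linear classifier learning a straight-line decision boundary; the toy data points are invented purely for illustration.

```python
# A minimal sketch of a linear classifier learning a decision boundary that
# splits the feature space into two decision regions (toy data, invented).
import numpy as np
from sklearn.linear_model import LogisticRegression

# Two features per point (a 2D feature space), with labels 0 and 1.
X = np.array([[1.0, 1.2], [1.5, 0.8], [2.0, 1.0],   # class 0
              [4.0, 3.5], [4.5, 4.0], [5.0, 3.8]])  # class 1
y = np.array([0, 0, 0, 1, 1, 1])

# "Training" = finding the best position for the decision boundary.
clf = LogisticRegression().fit(X, y)

# The boundary is the line w1*x1 + w2*x2 + b = 0.
w1, w2 = clf.coef_[0]
b = clf.intercept_[0]
print(f"Decision boundary: {w1:.2f}*x1 + {w2:.2f}*x2 + {b:.2f} = 0")

# A new point is assigned to whichever decision region it falls in.
print(clf.predict([[1.8, 1.1]]))  # expected [0]
print(clf.predict([[4.2, 3.9]]))  # expected [1]
```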
Linear Segmentation and Decision region (Contd.)
Explanation:
•Data points: Each flower is represented by a point
on the graph. Blue circles are "Iris" flowers, and
red squares are "Not Iris" flowers.
•Features: The x-axis represents sepal length, and
the y-axis represents petal width.
•Decision boundary: The straight line is the
decision boundary. It's what our linear
segmentation creates.
•Decision regions: The area above the line is the
"Iris" decision region, and the area below is the
"Not Iris" decision region.
•Classification: If a new flower has a sepal length
and petal width that fall in the "Iris" region, we
classify it as "Iris." Otherwise, it's "Not Iris."
Linear Discriminants
Def: Linear discriminants are a fundamental concept in machine learning, particularly in
classification tasks. They provide a way to separate data points into different categories
using linear decision boundaries. Here's a breakdown of what they are and how they work:

What are Linear Discriminants?


•Decision Boundaries: Imagine you have data points belonging to different classes (e.g., cats
vs. dogs). A linear discriminant aims to find a straight line (in 2D) or a hyperplane (in higher
dimensions) that best separates these classes. This line or hyperplane is called the decision
boundary.
•Linear Combination of Features: The decision boundary is defined by a linear combination
of the features of your data points. For example, if you have two features (x1 and x2), the
decision boundary might be represented by an equation like: w1x1 + w2x2 + b = 0, where
w1 and w2 are weights, and b is a bias term.
•Classification: To classify a new data point, you simply plug its features into the equation. If
the result is positive, it belongs to one class; otherwise, it belongs to the other.
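As a small illustration of that classification rule, the sketch below plugs a point's features into w1x1 + w2x2 + b and checks the sign; the weights, bias, and points are assumed values, not taken from any particular dataset.

```python
# Classify a point by the sign of w1*x1 + w2*x2 + b (assumed weights and bias).
import numpy as np

w = np.array([0.8, -1.2])  # weights w1, w2 (assumed values)
b = 0.5                    # bias term (assumed value)

def classify(x):
    score = np.dot(w, x) + b
    return "class A" if score > 0 else "class B"

print(classify(np.array([2.0, 1.0])))   # score = 0.9 > 0  -> class A
print(classify(np.array([0.5, 2.0])))   # score = -1.5 < 0 -> class B
```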
Linear Discriminants (Contd.)
How do Linear Discriminants Work?
1.Training Data: You start with labeled data, where you know the class of each data point.
2.Finding the Best Boundary: The goal is to find the weights (w1, w2, etc.) and bias (b) that
define the decision boundary that best separates the classes. This is typically done using
optimization algorithms.
3.Maximizing Separation: The algorithm tries to maximize the distance between the classes
and the decision boundary. This helps to improve the classifier's ability to generalize to new,
unseen data.

Types of Linear Discriminants


•Fisher's Linear Discriminant (FLD): A classic method that finds the linear combination of
features that maximizes the separation between classes.
•Perceptron: A simple algorithm that learns a linear decision boundary by iteratively
adjusting the weights based on misclassified data points.
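The perceptron's weight-update idea can be sketched in a few lines of NumPy; the data, labels (+1/-1), and learning rate below are invented for illustration.

```python
# A minimal perceptron sketch: weights and bias are adjusted only when a
# point is misclassified. Data and learning rate are invented.
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [4.0, 5.0], [5.0, 4.0]])
y = np.array([-1, -1, 1, 1])

w = np.zeros(2)
b = 0.0
lr = 0.1  # learning rate

for epoch in range(20):
    for xi, yi in zip(X, y):
        # Misclassified if the predicted sign disagrees with the label.
        if yi * (np.dot(w, xi) + b) <= 0:
            w += lr * yi * xi
            b += lr * yi

print("weights:", w, "bias:", b)
print("prediction for [4.5, 4.5]:", np.sign(np.dot(w, [4.5, 4.5]) + b))
```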
Linear Discriminants (Contd.)
Advantages of Linear Discriminants
•Simple and Efficient: They are computationally inexpensive and easy to implement.
•Interpretability: The weights assigned to each feature provide insights into which features
are most important for classification.

Limitations of Linear Discriminants


•Linear Separability: They work best when the classes are linearly separable, meaning you
can draw a straight line or hyperplane to perfectly divide them.
•Complex Data: They may not perform well on complex datasets with non-linear
relationships between features and classes.

Applications of Linear Discriminants


•Pattern Recognition: Identifying objects in images or sounds.
•Medical Diagnosis: Classifying patients into different disease categories.
•Natural Language Processing: Categorizing text documents.
Linear Discriminants (Contd.)
Linear Discriminant Analysis (LDA) is a dimensionality reduction and
classification technique commonly used in machine learning and pattern
recognition. In the context of classification it aims to find a linear combination
of features that best separates different classes or categories of data. It seeks
to reduce the dimensionality of the feature space while preserving as much of
the class-separability information as possible.
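A brief sketch of LDA in practice, assuming scikit-learn is available, using its bundled Iris dataset purely as an example of both dimensionality reduction and classification:

```python
# LDA used as a dimensionality-reduction step and as a classifier.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

lda = LinearDiscriminantAnalysis(n_components=2)
X_reduced = lda.fit_transform(X, y)       # 4 features projected down to 2

print("original shape:", X.shape)          # (150, 4)
print("reduced shape:", X_reduced.shape)   # (150, 2)
print("training accuracy:", lda.score(X, y))  # typically around 0.98
```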
Linear Regression
Def: Linear regression is a fundamental and widely used algorithm in machine learning and
statistics. It's used for predicting a continuous outcome variable based on one or more
predictor variables.

What is Linear Regression?


•Predicting a Continuous Value: Linear regression aims to find the best-fitting linear
relationship between the predictor variables (also called independent variables or features)
and the outcome variable (also called the dependent variable or target). The outcome
variable is continuous, meaning it can take on any value within a range (e.g., house prices,
temperature, sales figures).
•Linear Relationship: The core assumption is that the relationship between the predictors
and the outcome can be modeled by a straight line (in simple linear regression with one
predictor) or a hyperplane (in multiple linear regression with more than one predictor).
•Finding the Best Fit: The algorithm learns the coefficients (weights) for each predictor
variable that minimize the difference between the predicted values and the actual values in
the training data. This difference is often measured using the mean squared error.
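As a quick illustration, the sketch below fits a simple linear regression with scikit-learn; the house-size and price numbers are invented so that the relationship is exactly price = 2 × size + 50.

```python
# Fitting a simple linear regression: the learned coefficient and intercept
# define the best-fit line. The numbers are invented for illustration.
import numpy as np
from sklearn.linear_model import LinearRegression

size = np.array([[50], [80], [110], [140], [170]])   # predictor (m^2)
price = np.array([150, 210, 270, 330, 390])          # outcome (in thousands)

model = LinearRegression().fit(size, price)
print("coefficient (slope):", model.coef_[0])    # ~2.0 per m^2
print("intercept:", model.intercept_)            # ~50
print("predicted price for 100 m^2:", model.predict([[100]])[0])  # ~250
```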
Linear Regression (Contd.)
Types of Linear Regression:
•Simple Linear Regression: One predictor variable. Example: Predicting house prices based
on the size of the house.
•Multiple Linear Regression: Two or more predictor variables. Example: Predicting house
prices based on size, number of bedrooms, and location.

Cost Function (Mean Squared Error):


The mean squared error (MSE) measures the average squared difference between
the predicted values and the actual values. The goal is to minimize the MSE.
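A tiny sketch of the MSE computation, with invented actual and predicted values:

```python
# Mean squared error: the average of the squared differences between
# actual and predicted values. Numbers are invented for illustration.
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])
y_predicted = np.array([2.5, 5.5, 6.0, 9.5])

# MSE = (1/n) * sum((y_actual - y_predicted)^2)
mse = np.mean((y_actual - y_predicted) ** 2)
print(mse)  # (0.25 + 0.25 + 1.0 + 0.25) / 4 = 0.4375
```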
Linear Regression (Contd.)
Advantages of Linear Regression:
•Simple and Easy to Understand: Linear regression is relatively easy to understand and
interpret.
•Computationally Efficient: Training and prediction are fast, even with large datasets.
•Widely Available: Linear regression is implemented in almost all statistical software and
machine learning libraries.

Limitations of Linear Regression:


•Linearity Assumption: It assumes a linear relationship between the predictors and the
outcome. If the relationship is non-linear, linear regression may not perform well.
•Sensitivity to Outliers: Outliers can significantly affect the regression line.
•Overfitting: With too many predictor variables, the model can overfit the training data and
not generalize well to new data.
Linear Regression (Contd.)
In the given figure,

X-axis = Independent variable

Y-axis = Output / dependent variable

Line of regression = Best fit line for a


model

Here, a line is plotted for the given data


points that suitably fit all the issues.
Hence, it is called the ‘best fit line.’ The
goal of the linear regression algorithm is to
find this best fit line seen in the above
figure.
Logistic Regression
Def: Logistic regression is a powerful and widely used algorithm in machine learning for
classification tasks. Unlike linear regression, which predicts continuous values, logistic
regression predicts the probability of an instance belonging to a certain class.

What is Logistic Regression?


•Classification: Logistic regression is used when the outcome variable is categorical,
meaning it belongs to a set of distinct categories (e.g., spam or not spam, cat or dog, disease
or no disease).
•Probability: The output of logistic regression is a probability between 0 and 1, representing
the likelihood of an instance belonging to a particular class.
•Sigmoid Function: The core of logistic regression is the sigmoid function, which takes any
real-valued number as input and outputs a value between 0 and 1. This function is what
allows us to interpret the output as a probability.
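A short sketch of the sigmoid function and how it maps any real-valued score to a probability (NumPy assumed):

```python
# The sigmoid function turns any real-valued score into a value in (0, 1).
import numpy as np

def sigmoid(z):
    # sigmoid(z) = 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))    # 0.5    (no evidence either way)
print(sigmoid(4))    # ~0.982 (strong evidence for the positive class)
print(sigmoid(-4))   # ~0.018 (strong evidence for the negative class)

# In logistic regression, z is a linear combination of the features:
# z = w1*x1 + w2*x2 + ... + b, and sigmoid(z) is the predicted probability.
```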
Logistic Regression (Contd.)
Types of Logistic Regression:
•Binary Logistic Regression: The outcome variable has only two possible classes (e.g., spam
or not spam).
•Multinomial Logistic Regression: The outcome variable has more than two possible classes
(e.g., classifying images as cat, dog, or bird).

Advantages of Logistic Regression:


•Simple and Easy to Understand: Logistic regression is relatively easy to understand and
interpret.
•Efficient: Training and prediction are fast, even with large datasets.
•Probabilistic Output: The output is a probability, which provides more information than
just a class label.
Logistic Regression (Contd.)

Limitations of Logistic Regression:


•Linearity Assumption: Logistic regression assumes a linear relationship between the
features and the log-odds of the outcome.
•Sensitivity to Outliers: Outliers can significantly affect the model.
•Overfitting: With too many features, the model can overfit the training data.

Applications of Logistic Regression:


•Medical Diagnosis: Predicting the likelihood of a patient having a certain disease.
•Marketing: Predicting customer churn or the likelihood of a customer clicking on an ad.
•Finance: Predicting loan defaults or credit card fraud.
•Natural Language Processing: Classifying text documents or identifying spam emails.
Decision Trees
Def: Decision trees are a widely used machine learning algorithm that can be used for both classification and regression tasks. These models work by repeatedly splitting the data into subsets based on feature values; each split represents a decision, and each leaf node gives a prediction. This splitting creates a tree-like structure. Decision trees are easy to interpret and visualize, which makes the decision-making process easy to understand.

Types of Decision Tree Algorithms


The different decision tree algorithms are listed below:
•ID3 (Iterative Dichotomiser 3)
•C4.5
•CART (Classification and Regression Trees)
Decision Trees (Contd.)
A decision tree is a simple diagram that shows different choices and their possible results, helping you make decisions easily. This section covers what decision trees are, how they work, their advantages and disadvantages, and their applications.

Understanding Decision Tree


A decision tree is a graphical representation of different options for solving a problem and shows how different factors are related. It has a hierarchical tree structure that starts with one main question at the top, called a node, which further branches out into different possible outcomes, where:
•Root Node is the starting point that represents the entire dataset.
•Branches: These are the lines that connect nodes. They show the flow from one decision to another.
•Internal Nodes are points where decisions are made based on the input features.
•Leaf Nodes: These are the terminal nodes at the end of branches that represent final outcomes or predictions.
Decision Trees (Contd.)
They also support decision-making by
visualizing outcomes. You can quickly
evaluate and compare the “branches” to
determine which course of action is best for
you.
Now, let’s take an example to understand the decision tree. Imagine you want to decide whether to drink coffee based on the time of day and how tired you feel. First the tree checks the time of day: if it’s morning, it asks whether you are tired. If you’re tired, the tree suggests drinking coffee; if not, it says there’s no need. Similarly, in the afternoon the tree again asks if you are tired. If you are, it recommends drinking coffee; if not, it concludes no coffee is needed.
Decision Trees (Contd.)
Classification of Decision Tree
We have mainly two types of decision tree based on the nature of the target
variable: classification trees and regression trees.
•Classification trees: They are designed to predict categorical outcomes, meaning they classify data into different classes. For example, they can determine whether an email is “spam” or “not spam” based on various features of the email.
•Regression trees: These are used when the target variable is continuous. They predict numerical values rather than categories. For example, a regression tree can estimate the price of a house based on its size, location, and other features.
Decision Trees (Contd.)
How Decision Trees Work?
 A decision tree starts with a main question known as the root node. This question is derived from the features of the dataset and serves as the starting point for decision-making.
 From the root node, the tree asks a series of yes/no questions. Each question is designed to split the data into subsets based on specific attributes.
 This branching continues through a sequence of decisions. As you follow each branch, you get more questions that break the data into smaller groups. This step-by-step process continues until there are no more helpful questions to ask.
 You reach the end of a branch, where you find the final outcome or decision. It could be a classification (like “spam” or “not spam”) or a prediction (such as an estimated price).
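As an illustration of this root-to-leaf questioning process, here is a minimal sketch assuming scikit-learn, trained on its bundled Iris dataset; the depth limit is an arbitrary choice for readability.

```python
# Train a decision tree classifier and print its question/answer structure.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each internal line below is a question on a feature; leaves carry the class.
print(export_text(tree, feature_names=load_iris().feature_names))

# Classifying a new flower follows one root-to-leaf path in that printout.
print(tree.predict([[5.1, 3.5, 1.4, 0.2]]))  # expected [0] (setosa)
```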
ID3 Algorithm
 The ID3 algorithm is a popular decision tree algorithm used in machine learning. It aims
to build a decision tree by iteratively selecting the best attribute to split the data based
on information gain. Each node represents a test on an attribute, and each branch
represents a possible outcome of the test. The leaf nodes of the tree represent the final
classifications.
 It is a greedy algorithm that builds a decision tree by recursively partitioning the data set
into smaller and smaller subsets until all data points in each subset belong to the same
class.
 The ID3 (Iterative Dichotomiser 3) algorithm is a classic decision tree algorithm used for classification tasks. ID3 deals primarily with categorical attributes, which means that it can efficiently handle data with a discrete set of values.
 One of the strengths of ID3 is its ability to generate interpretable decision trees. The
resulting tree structure is easily understood and visualized, providing insight into the
decision-making process.
ID3 Algorithm (Contd.)

 The ID3 algorithm works by building a decision tree, which is a hierarchical structure that
classifies data points into different categories and splits the dataset into smaller subsets
based on the values of the features in the dataset.
 The ID3 algorithm then selects the feature that provides the most information about the
target variable.
 The decision tree is built top-down, starting with the root node, which represents the
entire dataset.
 At each node, the ID3 algorithm selects the attribute that provides the most information
gain about the target variable.
 The attribute with the highest information gain is the one that best separates the data
points into different categories.
ID3 Algorithm (Contd.)
ID3 metrics

The ID3 algorithm utilizes metrics related to information theory, particularly entropy and information
gain, to make decisions during the tree-building process.
Information Gain and Attribute Selection
The ID3 algorithm uses a measure of impurity, entropy, to calculate the information gain of each attribute. Entropy is a measure of disorder in a dataset. A dataset with high entropy is one where the data points are evenly distributed across the different categories. A dataset with low entropy is one where the data points are concentrated in one or a few categories.

If entropy is low, data is well understood; if high, more information is needed. Preprocessing data
before using ID3 can enhance accuracy. In sum, ID3 seeks to reduce uncertainty and make informed
decisions by picking attributes that offer the most insight in a dataset.
ID3 Algorithm (Contd.)
Information gain assesses how much valuable information an attribute can provide. We select the
attribute with the highest information gain, which signifies its potential to contribute the most to
understanding the data. If information gain is high, it implies that the attribute offers a significant
insight. ID3 acts like an investigator, making choices that maximize the information gain in each step.
This approach aims to minimize uncertainty and make well-informed decisions, which can be further
enhanced by preprocessing the data.
ID3 Algorithm (Contd.)
What are the steps in the ID3 algorithm?

1. Determine the entropy of the overall dataset using the class distribution.

2. For each feature:

   a. Calculate the entropy for each of its categorical values.

   b. Assess the information gain obtained by splitting on that feature.

3. Choose the feature that generates the highest information gain.

4. Apply the above steps recursively to build the decision tree structure.
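The following rough sketch, assuming NumPy and pandas are available, implements the two quantities these steps rely on, entropy and information gain; the tiny weather-style table is invented for illustration.

```python
# Entropy of a label column and information gain of splitting on a feature,
# the two quantities ID3 uses to choose the next split. Data is invented.
import numpy as np
import pandas as pd

def entropy(labels):
    # Entropy = -sum(p_i * log2(p_i)) over the class proportions p_i.
    probs = labels.value_counts(normalize=True)
    return float(-(probs * np.log2(probs)).sum())

def information_gain(df, feature, target):
    # Gain = entropy(parent) - weighted average entropy of the child subsets.
    parent = entropy(df[target])
    weighted_children = sum(
        (len(subset) / len(df)) * entropy(subset[target])
        for _, subset in df.groupby(feature)
    )
    return parent - weighted_children

data = pd.DataFrame({
    "outlook": ["sunny", "sunny", "overcast", "rain", "rain", "overcast"],
    "windy":   ["no", "yes", "no", "no", "yes", "yes"],
    "play":    ["no", "no", "yes", "yes", "no", "yes"],
})

# ID3 would pick the feature with the highest gain as the next split.
for feature in ["outlook", "windy"]:
    print(feature, round(information_gain(data, feature, "play"), 3))
```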
ID3 Algorithm (Contd.)
How ID3 Works:
The ID3 algorithm is specifically designed for building decision trees from a given dataset. Its primary
objective is to construct a tree that best explains the relationship between attributes in the data and
their corresponding class labels.
1. Selecting the Best Attribute
 ID3 employs the concept of entropy and information gain to determine the attribute that best
separates the data. Entropy measures the impurity or randomness in the dataset.
 The algorithm calculates the entropy of each attribute and selects the one that results in the most
significant information gain when used for splitting the data.
2. Creating Tree Nodes
 The chosen attribute is used to split the dataset into subsets based on its distinct values.
 For each subset, ID3 recurses to find the next best attribute to further partition the data, forming
branches and new nodes accordingly.
3. Stopping Criteria
The recursion continues until one of the stopping criteria is met, such as when all instances in a
branch belong to the same class or when all attributes have been used for splitting.
ID3 Algorithm (Contd.)
4. Handling Missing Values
ID3 can handle missing attribute values by employing various strategies like attribute mean/mode
substitution or using majority class values.

5. Tree Pruning
Pruning is a technique to prevent overfitting. While not directly included in ID3, post-processing
techniques or variations like C4.5 incorporate pruning to improve the tree's generalization.
ID3 Algorithm (Contd.)
Mathematical Concepts of ID3 Algorithm
ID3 Algorithm (Contd.)
Advantages of ID3
•Simple and easy to understand.
•Requires little training data.
•Works well with data with discrete (categorical) attributes.

Disadvantages of ID3
•Can lead to overfitting.
•May not be effective with data with many attributes.

Applications of ID3
1.Fraud detection: ID3 can be used to develop models that can detect fraudulent transactions or
activities.
2.Medical diagnosis: ID3 can be used to develop models that can diagnose diseases or medical
conditions.
3.Customer segmentation: ID3 can be used to segment customers into different groups based on
their demographics, purchase history, or other factors.
4.Risk assessment: ID3 can be used to assess risk in a variety of different areas, such as insurance,
finance, and healthcare.
C4.5 Algorithm
Introduction:
 The C4.5 algorithm's fundamental building block, decision trees provide the framework
for its categorization procedure. These trees depict a structure that is hierarchical and
akin to a flowchart, with each internal node signifying an attribute test, each branch
designating the test's result, and every leaf node designating a class name.
 The decision tree in C4.5 is built iteratively, splitting the dataset at each stage based on the best attribute. The optimal attribute is chosen based on metrics such as information gain or gain ratio, which gauge how well an attribute reduces confusion about the class labels.
 Decision trees, however, are susceptible to overfitting, a phenomenon in which the model mistakes noise in its training data for real patterns. C4.5 uses pruning approaches to increase the tree's generalisation efficiency on unseen data and to lessen the impact of this problem.
C4.5 Algorithm (Contd.)
 C4.5 uses a modified version of information gain called the gain ratio to reduce the bias towards features with many values. The gain ratio is computed by dividing the information gain by the intrinsic information, which measures the amount of data required to describe an attribute's values:
   Gain Ratio = Information Gain / Intrinsic (Split) Information

 It addresses several limitations of ID3 including its inability to handle continuous


attributes and its tendency to overfit the training set. It handles continuous attributes by
first sorting the attribute values and then selecting the midpoint between adjacent values
as a potential split point. The split that maximizes information gain or gain ratio is chosen.
 It can also generate rules from the decision tree by converting each path from the root to
a leaf into a rule, which can be used to make predictions on new data.
 This algorithm improves accuracy and reduces overfitting by using gain ratio and post-
pruning. While effective for both discrete and continuous attributes, C4.5 may still
struggle with noisy data and large feature sets.
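Two of the ideas above, the gain ratio and the midpoint split points for continuous attributes, can be sketched as follows; this is an illustrative implementation with invented data, not the exact C4.5 code.

```python
# Gain ratio (information gain / split information) and candidate split
# points for a continuous attribute (midpoints between sorted adjacent values).
import numpy as np

def entropy(labels):
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return float(-(probs * np.log2(probs)).sum())

def gain_ratio(values, labels):
    # Information gain of splitting on a categorical attribute ...
    parent = entropy(labels)
    total = len(labels)
    children = 0.0
    split_info = 0.0
    for v in np.unique(values):
        mask = values == v
        weight = mask.sum() / total
        children += weight * entropy(labels[mask])
        split_info += -weight * np.log2(weight)
    info_gain = parent - children
    # ... divided by the split information (the attribute's intrinsic information).
    return info_gain / split_info if split_info > 0 else 0.0

def candidate_split_points(continuous_values):
    # C4.5 sorts the values and considers midpoints between adjacent ones.
    v = np.sort(np.unique(continuous_values))
    return (v[:-1] + v[1:]) / 2.0

labels = np.array(["yes", "yes", "no", "no", "yes"])
outlook = np.array(["sunny", "overcast", "sunny", "rain", "overcast"])
temperature = np.array([21.0, 25.0, 30.0, 18.0, 23.0])

print("gain ratio (outlook):", round(gain_ratio(outlook, labels), 3))
print("candidate thresholds:", candidate_split_points(temperature))
```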
C4.5 Algorithm (Contd.)
C4.5 Pruning Techniques:
1. Reduced Error Pruning:
With this method, the decision tree is traversed recursively from bottom to top, and the effect of removing each subtree is assessed on a validation dataset. A subtree is pruned (replaced with a single leaf node) if its removal results in better performance or no appreciable decline in performance.
2. Rule Post-Pruning:
Rather than directly pruning subtrees, C4.5 builds the decision tree and converts it into a set of rules. These rules are then simplified, using their estimated accuracy on a validation dataset as a basis. Pruning here means removing rules that cause overfitting or do not improve classification performance.
C4.5 Algorithm (Contd.)
3. The Minimum Description Length (MDL) principle:
When determining when to stop growing the decision tree, C4.5 can apply the MDL principle as a guideline. This principle strikes a compromise between the model's complexity and how well it fits the data: the tree is expanded only while additional partitioning meaningfully reduces the overall description length (a gauge of model complexity plus fit).
4. Subtree Replacement:
If a subtree's error rate is not appreciably higher than that of a single leaf node, C4.5 replaces the whole subtree with that leaf node. In doing so, the decision tree's simplicity and predictive accuracy are maintained.
C4.5 Algorithm (Contd.)
How the C4.5 Algorithm Operates:
1. Starting Point:
The method starts with the complete dataset, which it treats as the decision tree's root node. Every instance in the dataset is a data point with associated attributes (features) and a class label.
2. Selection of Attributes:
At each decision tree node, C4.5 determines which attribute is optimal for partitioning the dataset. For every attribute, it computes a metric, usually the information gain or gain ratio. These measures indicate how well an attribute reduces ambiguity regarding the class labels. For the present node, the splitting criterion is the attribute with the greatest information gain or gain ratio.
C4.5 Algorithm (Contd.)
3. Dividing the Dataset:
Following attribute selection, the dataset is partitioned into subsets according to the attribute's possible values. For a categorical attribute, each subset corresponds to a unique attribute value; for continuous attributes, C4.5 establishes an appropriate threshold to separate the data into subsets.
4. Recursive Tree Building:
The method applies attribute selection and dataset splitting recursively to every subset produced in the preceding stage. This procedure keeps going until any of the following requirements is satisfied:
 All instances inside a subset are members of the same class, in which case a leaf node is produced.
 There are no more attributes that can be used for splitting.
 The tree has reached a specified maximum depth.
 The number of instances in a subset falls below a specified threshold.
C4.5 Algorithm (Contd.)
5. Pruning:
Pruning is done once the tree is fully grown in order to lessen overfitting. It simplifies the tree by eliminating nodes or branches that do not considerably increase prediction accuracy. Reduced error pruning, rule post-pruning, and subtree replacement are common pruning methods.
6. Results:
A collection of categorization rules is represented by the decision tree that is produced.
Every leaf node in the structure of the tree matches a class label, and every internal node in
this tree represents a choice made in response to an attribute. The requirements needed to
categorise an instance are represented by the path that leads from a root node to the leaf
node.
C4.5 Algorithm (Contd.)
7. Classification:
To classify a new instance, C4.5 traverses the decision tree from the root node to a leaf node based on the instance's attribute values. The algorithm assesses the attribute condition at each internal node and proceeds down the relevant branch until it arrives at a leaf node. The instance is assigned the class label corresponding to the leaf node reached during traversal.
C4.5 Algorithm (Contd.)
Splitting Criteria:
1. Information Gain:
 Information gain quantifies how well an attribute reduces ambiguity regarding the class labels.
 It is computed by contrasting the dataset's entropy (impurity) before and after the split on the attribute.
 Entropy measures the dataset's degree of disorder and uncertainty. Lower entropy indicates greater homogeneity of class labels within the subsets.
 The following formula is used to compute information gain:
   Information Gain = Entropy (before split) − Weighted average of entropies (after split)
 For the present node, the splitting criterion is the attribute that yields the greatest information gain.
C4.5 Algorithm (Contd.)
2. Gain Ratio:
 Although information gain is useful, it favours attributes with a high number of values.
 The gain ratio, a modification of information gain that penalises attributes with many distinct values, addresses this bias.
 It is computed by dividing the information gain by the split information, a measure of the attribute's intrinsic information.
 To get the gain ratio, use this formula:
   Gain Ratio = Information Gain / Split Information
 The split information is computed from the entropy (or another impurity measure) of the attribute's own value distribution.
C4.5 Algorithm (Contd.)
Summary:
 To sum up, the C4.5 algorithm is an effective method for building decision trees in classification applications.
 Using splitting criteria such as information gain or gain ratio, it chooses attributes that maximise the decrease in ambiguity around class labels.
 With the use of pruning strategies to avoid overfitting and repeated dataset division, C4.5 produces interpretable decision trees that can effectively categorise cases.
 Notwithstanding its efficacy, C4.5 can be constrained by issues such as biased tree building and susceptibility to noisy data.
 Even so, it continues to be a fundamental algorithm in the field of machine learning,
helping to comprehend and create more sophisticated categorization methods.
K-Nearest Neighbor(KNN) Algorithm
Getting Started with K-Nearest Neighbors
K-Nearest Neighbors is also called a lazy learner algorithm because it does not learn from the training set immediately; instead, it stores the dataset and performs the computation at the time of classification.
As an example, consider the following table of data points containing two features:
K-Nearest Neighbor(KNN) Algorithm (Contd.)
The new point is classified as Category 2 because most of its closest neighbors are blue
squares. KNN assigns the category based on the majority of nearby points.

The image shows how KNN predicts the category of a new data point based on its closest
neighbours.
•The red diamonds represent Category 1 and the blue squares represent Category 2.
•The new data point checks its closest neighbours (circled points).
•Since the majority of its closest neighbours are blue squares (Category 2) KNN predicts the
new data point belongs to Category 2.

KNN works by using proximity and majority voting to make predictions.


K-Nearest Neighbor(KNN) Algorithm (Contd.)
What is ‘k’ in K-Nearest Neighbours?
In the k-Nearest Neighbours (k-NN) algorithm k is just a number that tells the algorithm
how many nearby points (neighbours) to look at when it makes a decision.

Example:
Imagine you’re deciding which fruit it is based on its shape and size. You compare it to fruits
you already know.
•If k = 3, the algorithm looks at the 3 closest fruits to the new one.
•If 2 of those 3 fruits are apples and 1 is a banana, the algorithm says the new fruit is an
apple because most of its neighbours are apples.
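The fruit example can be sketched directly in NumPy; the (size, roundness) feature values below are invented for illustration.

```python
# The k = 3 fruit example: find the 3 nearest known fruits by Euclidean
# distance and take a majority vote. Feature values are invented.
import numpy as np
from collections import Counter

known_features = np.array([[7.0, 0.9], [7.5, 0.95], [6.8, 0.85],  # apples
                           [18.0, 0.3]])                          # banana
known_labels = ["apple", "apple", "apple", "banana"]

new_fruit = np.array([7.2, 0.9])
k = 3

# Distance from the new fruit to every known fruit.
distances = np.linalg.norm(known_features - new_fruit, axis=1)

# Labels of the k closest fruits, then a majority vote.
nearest = np.argsort(distances)[:k]
votes = Counter(known_labels[i] for i in nearest)
print(votes.most_common(1)[0][0])  # expected "apple"
```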
K-Nearest Neighbor(KNN) Algorithm (Contd.)
How to choose the value of k for KNN Algorithm?
The value of k is critical in KNN as it determines the number of neighbors to consider when
making predictions. Selecting the optimal value of k depends on the characteristics of the
input data.

If the dataset has significant outliers or noise, a higher k can help smooth out the predictions and reduce the influence of noisy data. However, choosing a very high value can lead to underfitting, where the model becomes too simplistic.
K-Nearest Neighbor(KNN) Algorithm (Contd.)
Statistical Methods for Selecting k:

 Cross-Validation: A robust method for selecting the best k is to perform cross-validation. This involves splitting the data into several folds, training the model on some folds and testing it on the remaining one, repeating this for each fold. The value of k that results in the highest average validation accuracy is usually the best choice (see the sketch after this list).
 Elbow Method: In the elbow method we plot the model's error rate or accuracy for different values of k. As we increase k, the error usually decreases initially; however, after a certain point the error rate starts to decrease more slowly. The point where the curve forms an “elbow” is considered the best k.
 Odd Values for k: It’s also recommended to choose an odd value for k especially in
classification tasks to avoid ties when deciding the majority class.
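Here is a short sketch of the cross-validation approach from the first bullet, assuming scikit-learn and its bundled Iris dataset; the candidate values of k are arbitrary odd numbers.

```python
# Pick k by cross-validation: try several odd values of k and keep the one
# with the highest average validation accuracy.
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

scores = {}
for k in [1, 3, 5, 7, 9, 11]:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation: average accuracy over the held-out folds.
    scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(scores, key=scores.get)
print({k: round(v, 3) for k, v in scores.items()})
print("best k:", best_k)
```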
K-Nearest Neighbor(KNN) Algorithm (Contd.)
Distance Metrics Used in KNN Algorithm
KNN uses distance metrics to identify the nearest neighbours; these neighbours are then used for the classification or regression task. To identify the nearest neighbours we use the distance metrics below:
1. Euclidean Distance
Euclidean distance is defined as the straight-line distance between two points in a plane or
space. You can think of it like the shortest path you would walk if you were to go directly
from one point to another.
Euclidean distance (p=2): This is the most commonly used distance measure, and it is limited
to real-valued vectors. Using the below formula, it measures a straight line between the
query point and the other point being measured.
K-Nearest Neighbor(KNN) Algorithm (Contd.)
2. Manhattan Distance
This is the total distance you would travel if you could only move along horizontal and
vertical lines (like a grid or city streets). It’s also called “taxicab distance” because a taxi can
only drive along the grid-like streets of a city.

3. Minkowski distance:
This distance measure is the generalized form of Euclidean and Manhattan distance metrics.
The parameter, p, in the formula below, allows for the creation of other distance metrics.
Euclidean distance is represented by this formula when p is equal to two, and Manhattan
distance is denoted with p equal to one.
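All three metrics can be expressed through the single Minkowski formula; the sketch below assumes the standard definitions and uses two invented points.

```python
# Minkowski distance with parameter p: p = 1 gives Manhattan, p = 2 gives Euclidean.
import numpy as np

def minkowski(a, b, p):
    # (sum |a_i - b_i|^p)^(1/p)
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])

print("Euclidean:", minkowski(a, b, p=2))          # sqrt(3^2 + 4^2) = 5.0
print("Manhattan:", minkowski(a, b, p=1))          # |3| + |4| = 7.0
print("Minkowski (p=3):", round(minkowski(a, b, p=3), 3))
```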
