
U21AMG05 AI Fundamentals and Machine Learning

Unit 04 Notes
Supervised Learning:

Definition: Supervised learning is a machine learning paradigm where the algorithm learns from labeled training data,
making predictions or decisions based on input-output pairs.
Importance of Supervised Learning:

Predictive Modeling:
Supervised learning enables the creation of predictive models that can make accurate predictions on new, unseen
data.
Decision Making:
It assists in decision-making processes by providing insights and recommendations based on historical data.
Automation:
Supervised learning algorithms automate tasks such as classification, regression, and forecasting, reducing
manual effort.

Types of Supervised Learning:

Classification:
Predicting the class label of new observations based on a set of labeled training data.
Regression:
Predicting continuous numeric outcomes or quantities based on input variables.
Classification vs. Regression:

Classification:
Definition:
Classification is a type of supervised learning where the algorithm assigns new observations to predefined categories or classes.
Example:
Spam detection, sentiment analysis, image classification.
Output:
Discrete class labels or categories.
Evaluation Metrics:
Accuracy, precision, recall, F1-score.

Regression:
Definition:
Regression is a type of supervised learning where the algorithm predicts continuous numeric values or quantities.
Example:
Predicting house prices, stock prices, temperature forecasting.
Output:
Continuous numeric values.
Evaluation Metrics:
Mean squared error (MSE), mean absolute error (MAE), R-squared.

Advantages of Classification:
Interpretability:
Class labels provide clear interpretations and insights into the data.
Robustness:
Classification models can handle noisy data and outliers effectively.
Wide Applicability:
Classification algorithms are widely applicable across various domains, from healthcare to finance.

Advantages of Regression:
Flexibility:
Regression models can capture complex relationships between input and output variables.
Quantitative Predictions:
Regression provides precise numeric predictions, facilitating decision-making.
Feature Importance:
Regression models can identify the importance of different features in predicting outcomes.

Disadvantages of Classification:
Imbalanced Data:
Classification may struggle with imbalanced datasets, where one class dominates the others.
Boundary Ambiguity:
Class boundaries may be ambiguous in high-dimensional feature spaces, leading to classification errors.
Sensitive to Noise:
Classification models can be sensitive to noise and irrelevant features, affecting performance.

Disadvantages of Regression:
Overfitting:
Regression models may overfit the training data, capturing noise instead of underlying patterns.
Limited Interpretability:
Some regression models, especially complex ones, may lack interpretability, making it difficult to explain predictions.
Assumption Violation:
Linear regression, for example, assumes a linear relationship between variables, which may not hold true in all cases.
Decision Tree
A decision tree is a flowchart-like structure used for decision making and classification.
Components:
Root Node: Represents the entire dataset.
Internal Nodes: Represent the features.
Branches: Represent the decision rules.
Leaf Nodes: Represent the outcomes.
Entropy
Entropy is a measure of the impurity or uncertainty in a dataset.

$H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i$

where:
$H(S)$ is the entropy of the set $S$.
$c$ is the number of classes.
$p_i$ is the proportion of samples in class $i$.

Information Gain
Definition:
Information gain measures the reduction in entropy after a dataset is split on an attribute.

$IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|} H(S_v)$

where:
$IG(S, A)$ is the information gain of attribute $A$.
$H(S)$ is the entropy of set $S$.
$S_v$ is the subset of $S$ for which attribute $A$ has value $v$.
$|S_v|$ is the number of samples in subset $S_v$.
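As a minimal sketch of how these two formulas translate into code (the function names and list-based inputs are illustrative choices, not from the notes):

```python
import math
from collections import Counter

def entropy(labels):
    """H(S) = -sum over classes of p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    """IG(S, A) = H(S) - sum over values v of |S_v|/|S| * H(S_v)."""
    n = len(labels)
    subsets = {}
    for value, label in zip(attribute_values, labels):
        subsets.setdefault(value, []).append(label)
    remainder = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - remainder
```

A pure subset contributes zero entropy, so the attribute whose split produces the purest subsets yields the highest gain.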
Decision Tree Geometrical representation
Decision Tree Algorithm

➢ Calculate Entropy: Compute the entropy of the dataset.
➢ Compute Information Gain: For each attribute, compute the information gain.
➢ Select Attribute: Choose the attribute with the highest information gain.
➢ Split: Divide the dataset based on the selected attribute.
➢ Repeat: Apply the process recursively to the subsets.
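In practice a library handles this recursion. A hedged scikit-learn sketch (its trees are CART-style, but criterion="entropy" makes splits use the entropy reduction described above; the toy data is invented):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented, numerically encoded toy data: y is 1 only when both features are 1.
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 0, 1]

# criterion="entropy" chooses splits by entropy reduction (information gain).
clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(export_text(clf))       # textual view of the learned tree
print(clf.predict([[1, 1]]))  # -> [1]
```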

Advantages of Decision Tree Classification:

Interpretability:
Decision trees are easy to understand and interpret.
No Need for Feature Scaling:
Decision trees do not require normalization or scaling of features.

Disadvantages of Decision Tree Classification:

Overfitting:
Decision trees tend to overfit the training data, especially if they are deep with many branches.
Instability:
Small changes in the data can result in a completely different tree structure.
Bayesian Classifier
The Bayesian classifier is based on Bayes' Theorem.
It is used for classification tasks in machine learning, leveraging probabilistic approaches.
Commonly used variants include the Naive Bayes classifier, which assumes feature independence.
Importance of Bayesian Classifier
Simplicity:
Simple to implement and understand.
Requires relatively small amount of training data to estimate parameters.
Probabilistic Output:
Provides probabilistic predictions, offering confidence measures for classifications.
Efficiency:
Highly efficient for large datasets and real-time predictions.

Bayes' Theorem

Bayes' Theorem provides a way to update the probability estimate for a hypothesis as additional evidence is acquired.

$P(H \mid E) = \dfrac{P(E \mid H)\, P(H)}{P(E)}$

where $P(H \mid E)$ is the posterior probability of hypothesis $H$ given evidence $E$, $P(E \mid H)$ is the likelihood, $P(H)$ is the prior probability, and $P(E)$ is the probability of the evidence.
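A minimal sketch of a Bayesian classifier in code, using scikit-learn's Gaussian Naive Bayes (the two-feature training data is invented for illustration):

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Invented training data: two clusters, one per class.
X = np.array([[1.0, 2.0], [1.2, 1.8], [3.0, 4.0], [3.2, 4.1]])
y = np.array([0, 0, 1, 1])

model = GaussianNB().fit(X, y)

# Probabilistic output is the main selling point: posteriors P(class | features).
print(model.predict([[1.1, 2.1]]))        # -> [0]
print(model.predict_proba([[1.1, 2.1]]))  # high probability for class 0
```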
Bayesian Classifier Geometrical representation
Bayesian Classifier Algorithm
Advantages of Bayesian Classifier

Simplicity and Efficiency:
Easy to implement and computationally efficient.
Probabilistic Interpretation:
Outputs probabilistic predictions, useful for uncertainty estimation.
Performance:
Particularly effective for text classification and spam detection.

Disadvantages of Bayesian Classifier

Independence Assumption:
Naive Bayes assumes independence among features, which is often not true in real-world data.
Sensitivity to Data Quality:
Performance can degrade with noisy or irrelevant features.
Zero Probability Problem:
If a feature value was not observed in the training data, it leads to a zero probability estimate (commonly mitigated with Laplace smoothing).
Support Vector Machine (SVM)
➢ SVM is a powerful and versatile supervised learning algorithm used for classification and regression tasks.
➢ It works by finding the optimal hyperplane that best separates the classes in a high-dimensional space.
➢ SVM finds the hyperplane that maximizes the margin between different classes.
➢ The data points that are closest to the hyperplane are called support vectors.

Importance of Support Vector Machine (SVM)


Effective in High-Dimensional Spaces:
SVM is effective in cases where the number of dimensions is greater than the number of samples.
Robust to Overfitting:
SVM is effective when the number of features is large, and it employs regularization to reduce overfitting.
Versatile:
Can be used for both linear and non-linear classification using kernel functions.
Support Vector Machine – Geometrical Representation
Why SVM Uses Vector Space Instead of Geometrical Space

➢ SVM operates in a high-dimensional vector space where each feature represents a dimension.
➢ Vector space representation allows the use of dot products to calculate margins and distances efficiently.
➢ Geometrical space is limited in visualizing and calculating in higher dimensions, whereas vector space provides a
mathematical framework for handling high-dimensional data.

Hyperplane vs. Linear Line


Hyperplane:
➢ A hyperplane is a flat affine subspace of one dimension less than its ambient space.
➢ In 2D, it’s a line; in 3D, it’s a plane; in higher dimensions, it’s called a hyperplane.
Linear Line:
➢ In 2D, the decision boundary is a line, which is a special case of a hyperplane.
➢ SVM uses hyperplanes because it generalizes the concept of a line to higher dimensions, allowing for more complex
decision boundaries.
Support Vector Machine (SVM) - Algorithm
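Since the algorithm steps are presented as a figure in the notes, here is a hedged scikit-learn sketch of the core idea: fit a maximum-margin classifier and inspect its support vectors (the data is invented):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable, invented blobs.
X = np.array([[1, 1], [2, 1], [1, 2], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# kernel="linear" seeks the maximum-margin hyperplane; C controls how
# strongly margin violations are penalized. Other kernels (e.g. "rbf")
# give non-linear decision boundaries.
clf = SVC(kernel="linear", C=1.0).fit(X, y)

print(clf.support_vectors_)   # the training points closest to the hyperplane
print(clf.predict([[3, 3]]))  # which side of the hyperplane a new point falls on
```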
Advantages of SVM

Effective in High Dimensions:
Works well in high-dimensional spaces and is effective even when the number of features exceeds the number of samples.
Memory Efficient:
Uses a subset of training points (support vectors) in the decision function, making it memory efficient.
Versatile:
Different kernel functions can be specified for the decision function, providing flexibility in decision boundaries.

Disadvantages of SVM

Computational Complexity:
Training can be computationally intensive, especially with large datasets.

Parameter Tuning:
Choosing the right kernel and setting the parameters (e.g., C and kernel parameters) can be challenging and requires careful tuning.

Non-Probabilistic Output:
Does not provide probabilistic confidence scores directly, though techniques like Platt scaling can be used to obtain
probability estimates.
K-Nearest Neighbors (KNN) Classifier
➢ KNN is a simple, non-parametric, and instance-based learning algorithm used for classification and regression tasks.
➢ It makes predictions based on the k most similar training examples in the feature space.
➢ KNN stores all available cases and classifies new cases based on a similarity measure (distance function).
➢ For a given test instance, the algorithm identifies the 𝑘 nearest instances from the training data.
➢ The predicted class is the majority class among the 𝑘 nearest neighbors.
➢ Lazy Learner:
➢ KNN does not build a model during the training phase.
➢ It only stores the training instances and performs computation during the prediction phase.
➢ The algorithm defers the learning process until a query is made to the system, hence "lazy".

Importance of K-Nearest Neighbors (KNN)

Simplicity:
• Easy to understand and implement.
No Training Phase:
• No explicit training phase, making it straightforward to
deploy.
Versatility:
• Can be used for both classification and regression tasks.
K-Nearest Neighbors (KNN) – Geometrical representation
Distance Calculations

Euclidean Distance:

The straight-line distance between two points in Euclidean space.

$d(p, q) = \sqrt{\sum_{i=1}^{n} (p_i - q_i)^2}$

Manhattan Distance:

The sum of the absolute differences of their coordinates.

$d(p, q) = \sum_{i=1}^{n} |p_i - q_i|$

Note:
The choice of distance metric can affect the performance of KNN.
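Both metrics in code, as a small illustrative sketch:

```python
import math

def euclidean(p, q):
    """Straight-line distance: square root of summed squared differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
```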
K-Nearest Neighbors (KNN) Algorithm
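A hedged end-to-end sketch using scikit-learn's KNeighborsClassifier, which defaults to the Euclidean metric (the training data is invented):

```python
from sklearn.neighbors import KNeighborsClassifier

# Invented training set: two clusters of [feature1, feature2] points.
X = [[1, 1], [1, 2], [2, 1], [6, 6], [6, 7], [7, 6]]
y = [0, 0, 0, 1, 1, 1]

# "Training" only stores the data (lazy learner); prediction is a
# majority vote among the n_neighbors closest stored points.
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([[2, 2]]))  # -> [0]
```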
Advantages of KNN

➢ Simplicity and Ease of Implementation:
➢ KNN is easy to understand and implement with no assumptions about the underlying data distribution.
➢ No Training Phase:
➢ The absence of a training phase allows for immediate prediction using the training data.
➢ Versatility:
➢ Effective for both classification and regression problems.

Disadvantages of KNN

➢ Computationally Intensive:
➢ Prediction requires calculating the distance to all training instances, making it slow for large datasets.

➢ Storage Requirements:
➢ Requires storing all the training data, which can be memory-intensive.

➢ Sensitive to Irrelevant Features:
➢ Performance can degrade with irrelevant or redundant features; feature scaling and selection are crucial.
Linear Regression
➢ Linear Regression is a fundamental algorithm in machine learning used for predicting continuous target variables.
➢ It models the relationship between a dependent variable (target) and one or more independent variables (features)
by fitting a linear equation to observed data.

When to Use Linear Regression

➢ Continuous Outcome:
➢ When the target variable is continuous and linear relationships exist between the target and features.
➢ Simple Relationships:
➢ When the relationship between variables can be approximated well with a straight line.
➢ Feature-Target Relationships:
➢ To understand and quantify the impact of independent variables on the dependent variable.
Linear Regression Geometrical representation
Importance of Linear Regression

Simplicity:
Easy to understand and implement.
Interpretability:
Provides clear insight into the relationship between variables.
Efficiency:
• Computationally efficient for both training and prediction.

Geometric Space of Linear Regression

Data Representation:
Linear regression represents data points in a multi-dimensional space where each dimension corresponds to a
feature.
Best Fit Line:
The algorithm finds the line (or hyperplane in higher dimensions) that minimizes the distance between the
predicted values and actual values.
Mean Squared Error (MSE)

$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

where $y_i$ is the actual value, $\hat{y}_i$ is the predicted value, and $n$ is the number of samples.
Linear Regression Algorithm
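The algorithm itself appears as a figure in the notes; as a sketch, simple (one-feature) linear regression has a closed-form least-squares solution, slope = Sxy / Sxx and intercept = ȳ − slope · x̄:

```python
import numpy as np

def fit_line(x, y):
    """Closed-form least squares for y ≈ b0 + b1 * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    return b0, b1

b0, b1 = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8])  # invented data near y = 2x
print(f"y = {b0:.2f} + {b1:.2f} x")                    # -> y = 0.15 + 1.94 x
```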
Advantages of Linear Regression

Simplicity and Interpretability:
• Easy to understand relationships between variables.
Efficiency:
• Fast for training and making predictions.
Linearity:
• Effective for linear relationships and can be extended to polynomial regression for more complex relationships.

Disadvantages of Linear Regression


Assumption of Linearity:
• Assumes a linear relationship between features and target, which may not always hold.
Outliers:
• Sensitive to outliers, which can significantly affect the model.
Multicollinearity:
• High correlation between independent variables can distort results.
Logistic Regression Classifier
Logistic Regression is a statistical method for binary classification problems.
It models the probability that a given input point belongs to a certain class.

Geometric Space and Decision Boundary

Geometric Space:
Logistic regression works in the feature space
where data points are represented.

Decision Boundary:

➢ The decision boundary is a straight line (or a hyperplane in higher dimensions) that separates the classes.
➢ For a single feature, the decision boundary is a vertical line (a single threshold on that feature).
Logistic Regression Algorithm – geometrical representation
Logistic Regression Algorithm
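As a minimal sketch: the model passes a linear combination of the features through the sigmoid σ(z) = 1 / (1 + e^(−z)) to produce a probability. Using scikit-learn (the one-feature data is invented):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented binary data: small x -> class 0, large x -> class 1.
X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])
y = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression().fit(X, y)

# predict_proba applies the sigmoid to the fitted linear function of x;
# near the midpoint of the two groups the probabilities approach 0.5.
print(model.predict_proba([[4.5]]))
print(model.predict([[4.5]]))
```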
Advantages of Logistic Regression

Interpretability:
• Coefficients provide insight into the feature importance.
Probabilistic Output:
• Outputs probabilities, useful for understanding confidence levels.
Efficiency:
• Computationally efficient and scalable to large datasets.

Disadvantages of Logistic Regression

Linearity Assumption:
• Assumes a linear relationship between the features and the log-odds of the outcome, which may not always be true.
Binary Limitation:
• Primarily suited for binary classification problems; extensions are needed for multi-class classification.
Sensitivity to Outliers:
• Can be sensitive to outliers, which may affect the decision boundary.
Solved Examples
Decision Tree Problem

Day Weather Temperature Humidity Wind Play?


1 Sunny Hot High Weak No
2 Cloudy Hot High Weak Yes
3 Sunny Mild Normal Strong Yes
4 Cloudy Mild High Strong Yes
5 Rainy Mild High Strong No
6 Rainy Cool Normal Strong No
7 Rainy Mild High Weak Yes
8 Sunny Hot High Strong No
9 Cloudy Hot Normal Weak Yes
10 Rainy Mild High Strong No
Final Decision Tree
Based on the calculated information gains and subsequent splits, we can construct the decision tree as follows:
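The gain calculations are shown as images in the original notes; this hedged sketch reproduces them from the table above. Weather comes out with the highest information gain (0.4), so it becomes the root of the tree:

```python
import math
from collections import Counter

def H(labels):  # entropy of a list of class labels
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain(labels, attr):  # information gain of splitting labels by attr
    n = len(labels)
    groups = {}
    for v, lab in zip(attr, labels):
        groups.setdefault(v, []).append(lab)
    return H(labels) - sum(len(g) / n * H(g) for g in groups.values())

# Columns of the table, days 1-10.
play     = ["No","Yes","Yes","Yes","No","No","Yes","No","Yes","No"]
weather  = ["Sunny","Cloudy","Sunny","Cloudy","Rainy","Rainy","Rainy","Sunny","Cloudy","Rainy"]
temp     = ["Hot","Hot","Mild","Mild","Mild","Cool","Mild","Hot","Hot","Mild"]
humidity = ["High","High","Normal","High","High","Normal","High","High","Normal","High"]
wind     = ["Weak","Weak","Strong","Strong","Strong","Strong","Weak","Strong","Weak","Strong"]

for name, attr in [("Weather", weather), ("Temperature", temp),
                   ("Humidity", humidity), ("Wind", wind)]:
    print(name, round(gain(play, attr), 3))
# -> Weather 0.4, Temperature 0.115, Humidity 0.035, Wind 0.125
```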
Linear Regression Problem

Find the linear regression equation for the following two sets of data:

X 2 4 6 8
Y 3 7 5 10
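Working the least-squares formulas by hand: x̄ = 5, ȳ = 6.25, Sxy = Σ(x − x̄)(y − ȳ) = 19, Sxx = Σ(x − x̄)² = 20, so slope = 19/20 = 0.95 and intercept = 6.25 − 0.95 × 5 = 1.5, giving y = 1.5 + 0.95x. A quick NumPy check:

```python
import numpy as np

x = np.array([2, 4, 6, 8])
y = np.array([3, 7, 5, 10])

slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
print(f"y = {intercept:.2f} + {slope:.2f} x")  # -> y = 1.50 + 0.95 x

# Cross-check with NumPy's degree-1 polynomial fit: [slope, intercept].
print(np.polyfit(x, y, 1))                     # -> [0.95 1.5]
```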
K-Nearest Neighbors Problem

BMI Age Sugar
33.6 50 1
26.6 30 0
23.4 40 0
43.1 67 0
35.3 23 1

Assume K = 3. Predict Sugar for the test example with BMI = 43.6 and Age = 40.
Step 2: Calculate Euclidean Distances
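A sketch of the remaining steps: compute the Euclidean distance from the test point (43.6, 40) to every training row, take the K = 3 nearest, and vote. The three nearest neighbors lie at distances of about 14.14, 18.92, and 19.72 with Sugar labels 1, 1, and 0, so the majority vote predicts Sugar = 1:

```python
import math
from collections import Counter

# (BMI, Age) -> Sugar, from the table above.
train = [((33.6, 50), 1), ((26.6, 30), 0), ((23.4, 40), 0),
         ((43.1, 67), 0), ((35.3, 23), 1)]
test = (43.6, 40)

# Euclidean distance from each training point to the test point, sorted.
dists = sorted((math.dist(p, test), sugar) for p, sugar in train)
print(dists[:3])  # -> ~[(14.14, 1), (18.92, 1), (19.72, 0)]

# Majority vote among the K = 3 nearest neighbors.
votes = [sugar for _, sugar in dists[:3]]
print("Predicted Sugar:", Counter(votes).most_common(1)[0][0])  # -> 1
```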
