U21AMG05 AI and ML Unit 04 Notes
Supervised Learning:
Definition: Supervised learning is a machine learning paradigm where the algorithm learns from labeled training data,
making predictions or decisions based on input-output pairs.
Importance of Supervised Learning:
Predictive Modeling:
Supervised learning enables the creation of predictive models that can make accurate predictions on new, unseen
data.
Decision Making:
It assists in decision-making processes by providing insights and recommendations based on historical data.
Automation:
Supervised learning algorithms automate tasks such as classification, regression, and forecasting, reducing
manual effort.
Classification:
Predicting the class label of new observations based on a set of labeled training data.
Regression:
Predicting continuous numeric outcomes or quantities based on input variables.
Classification vs. Regression:
Classification:
Definition: Classification is a type of supervised learning where the algorithm assigns new observations to predefined categories or classes.
Example: deciding whether an email is spam or not spam.
Disadvantages:
Imbalanced Data: Classification may struggle with imbalanced datasets, where one class dominates the others.
Boundary Ambiguity: Class boundaries may be ambiguous in high-dimensional feature spaces, leading to classification errors.
Sensitive to Noise: Classification models can be sensitive to noise and irrelevant features, affecting performance.
Regression:
Definition: Regression is a type of supervised learning where the algorithm predicts continuous numeric values or quantities.
Example: predicting the price of a house from its features.
Disadvantages:
Overfitting: Regression models may overfit the training data, capturing noise instead of underlying patterns.
Limited Interpretability: Some regression models, especially complex ones, may lack interpretability, making it difficult to explain predictions.
Assumption Violation: Linear regression, for example, assumes a linear relationship between variables, which may not hold true in all cases.
Decision Tree
A decision tree is a flowchart-like structure used for decision making and classification.
Components:
Root Node: Represents the entire dataset.
Internal Nodes: Represent the features.
Branches: Represent the decision rules.
Leaf Nodes: Represent the outcomes.
Entropy
Entropy is a measure of the impurity or uncertainty in a dataset:
H(S) = -\sum_{i=1}^{c} p_i \log_2(p_i)
where:
H(S) is the entropy of the set S.
c is the number of classes.
p_i is the proportion of samples in class i.
Information Gain
Definition:
Information gain measures the reduction in entropy after a dataset is split on an attribute A:
IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)
where:
IG(S, A) is the information gain of attribute A.
H(S) is the entropy of set S.
S_v is the subset of S for which attribute A has value v.
|S_v| is the number of samples in subset S_v.
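As a sketch of how these formulas are computed in practice, a minimal Python implementation follows; the toy labels and attribute values are made up for illustration.

import math
from collections import Counter

def entropy(labels):
    # H(S) = -sum(p_i * log2(p_i)) over the classes present in `labels`.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, attribute_values):
    # IG(S, A): entropy of S minus the weighted entropy of the subsets
    # induced by each value of attribute A.
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, attribute_values):
        subsets.setdefault(value, []).append(label)
    weighted = sum((len(s) / n) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted

# Hypothetical play-tennis style data:
labels  = ["yes", "yes", "no", "no", "yes"]
outlook = ["sunny", "rain", "sunny", "rain", "rain"]
print(entropy(labels))                    # ~0.971
print(information_gain(labels, outlook))  # ~0.020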
Decision Tree Geometrical Representation
(Figure omitted: a decision tree splits the feature space into axis-parallel regions, one region per leaf node.)
Decision Tree Algorithm
Advantages:
Interpretability: Decision trees are easy to understand and interpret.
Disadvantages:
Overfitting: Decision trees tend to overfit the training data, especially if they are deep with many branches.
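A minimal training sketch, assuming scikit-learn is available; the Iris dataset and parameter values are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# criterion="entropy" makes the tree choose splits by information gain;
# max_depth limits tree depth to reduce overfitting.
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(X_train, y_train)
print(tree.score(X_test, y_test))  # accuracy on held-out data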
Bayes' Theorem
Bayes' theorem gives the probability of a hypothesis A given observed evidence B:
P(A | B) = \frac{P(B | A) \, P(A)}{P(B)}
The Naive Bayes classifier applies this theorem under the assumption that features are conditionally independent given the class.
Disadvantages of Naive Bayes:
Independence Assumption:
Naive Bayes assumes independence among features, which is often not true in real-world data.
Sensitivity to Data Quality:
Performance can degrade with noisy or irrelevant features.
Zero Probability Problem:
If a feature value was not observed in the training data, it leads to a zero probability for the whole class estimate (commonly mitigated with Laplace smoothing).
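A minimal sketch with scikit-learn's multinomial Naive Bayes; the count features and labels below are made up for illustration.

import numpy as np
from sklearn.naive_bayes import MultinomialNB

# Toy count features (e.g., word counts) for two classes.
X = np.array([[2, 1, 0],
              [3, 0, 1],
              [0, 2, 3],
              [1, 1, 4]])
y = np.array([0, 0, 1, 1])

# alpha=1.0 is Laplace (add-one) smoothing: it avoids the zero probability
# problem when a feature value never co-occurs with a class in training.
model = MultinomialNB(alpha=1.0)
model.fit(X, y)
print(model.predict(np.array([[1, 0, 2]])))
print(model.predict_proba(np.array([[1, 0, 2]])))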
Support Vector Machine (SVM)
➢ SVM is a powerful and versatile supervised learning algorithm used for classification and regression tasks.
➢ It works by finding the optimal hyperplane that best separates the classes in a high-dimensional space.
➢ SVM finds the hyperplane that maximizes the margin between different classes.
➢ The data points that are closest to the hyperplane are called support vectors.
➢ SVM operates in a high-dimensional vector space where each feature represents a dimension.
➢ Vector space representation allows the use of dot products to calculate margins and distances efficiently.
➢ Geometric intuition is limited for visualizing and calculating in higher dimensions, whereas the vector space view provides a
mathematical framework for handling high-dimensional data.
Disadvantages of SVM
Computational Complexity:
Training can be computationally intensive, especially with large datasets.
Parameter Tuning:
Choosing the right kernel and setting the parameters (e.g., the regularization parameter C and kernel parameters) can be challenging and requires careful
tuning.
Non-Probabilistic Output:
Does not provide probabilistic confidence scores directly, though techniques like Platt scaling can be used to obtain
probability estimates.
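A minimal sketch with scikit-learn's SVC; the synthetic dataset and parameter values are illustrative only.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C controls the margin/regularization trade-off; the RBF kernel implicitly
# maps the data into a higher-dimensional space. probability=True enables
# Platt scaling, so predict_proba returns probability estimates.
svm = SVC(kernel="rbf", C=1.0, probability=True, random_state=0)
svm.fit(X_train, y_train)
print(svm.score(X_test, y_test))
print(svm.predict_proba(X_test[:3]))
print(len(svm.support_vectors_))  # the support vectors define the margin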
K-Nearest Neighbors (KNN) Classifier
➢ KNN is a simple, non-parametric, and instance-based learning algorithm used for classification and regression tasks.
➢ It makes predictions based on the k most similar training examples in the feature space.
➢ KNN stores all available cases and classifies new cases based on a similarity measure (distance function).
➢ For a given test instance, the algorithm identifies the 𝑘 nearest instances from the training data.
➢ The predicted class is the majority class among the 𝑘 nearest neighbors.
➢ Lazy Learner:
➢ KNN does not build a model during the training phase.
➢ It only stores the training instances and performs computation during the prediction phase.
➢ The algorithm defers the learning process until a query is made to the system, hence the name "lazy".
Advantages of KNN
Simplicity:
• Easy to understand and implement.
No Training Phase:
• No explicit training phase, making it straightforward to
deploy.
Versatility:
• Can be used for both classification and regression tasks.
K-Nearest Neighbors Algorithm: Geometrical Representation
(Figure omitted: a query point is assigned the majority class among its k nearest neighbours in the feature space.)
Distance Calculations
Euclidean Distance:
d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}
Manhattan Distance:
d(x, y) = \sum_{i=1}^{n} |x_i - y_i|
Note:
The choice of distance metric can affect the performance of KNN.
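A minimal sketch of both metrics, assuming NumPy is available; the sample vectors are made up for illustration.

import numpy as np

def euclidean(x, y):
    # Straight-line distance: square root of the sum of squared differences.
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan(x, y):
    # City-block distance: sum of absolute coordinate differences.
    return np.sum(np.abs(x - y))

a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(a, b))  # 5.0
print(manhattan(a, b))  # 7.0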
Disadvantages of KNN
➢ Computationally Intensive:
➢ Prediction requires calculating the distance to all training instances, making it slow for large datasets.
➢ Storage Requirements:
➢ Requires storing all the training data, which can be memory-intensive.
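A minimal KNN sketch, assuming scikit-learn is available; the Iris dataset and the choice of k are illustrative only.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# n_neighbors is k; metric="euclidean" selects the distance function.
# fit() only stores the training data: KNN is a lazy learner.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean")
knn.fit(X_train, y_train)
print(knn.score(X_test, y_test))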
When to Use Linear Regression
➢ Continuous Outcome:
➢ When the target variable is continuous and linear relationships exist between the target and features.
➢ Simple Relationships:
➢ When the relationship between variables can be approximated well with a straight line.
➢ Feature-Target Relationships:
➢ To understand and quantify the impact of independent variables on the dependent variable.
Linear Regression Geometrical Representation
(Figure omitted: the best-fit line passes through the data cloud, minimizing the vertical distances between the points and the line.)
Importance of Linear Regression
Simplicity:
Easy to understand and implement.
Interpretability:
Provides clear insight into the relationship between variables.
Efficiency:
• Computationally efficient for both training and prediction.
Data Representation:
Linear regression represents data points in a multi-dimensional space where each dimension corresponds to a
feature.
Best Fit Line:
The algorithm finds the line (or hyperplane in higher dimensions) that minimizes the distance between the
predicted values and actual values.
Mean Squared Error (MSE)
MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2
where:
n is the number of samples.
y_i is the actual value and \hat{y}_i is the predicted value for sample i.
Linear Regression Algorithm
The algorithm fits the coefficients of the line (or hyperplane) that minimize the MSE, typically via the closed-form least-squares solution or gradient descent.
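A minimal sketch fitting a least-squares line and computing the MSE, assuming NumPy is available; the data values are made up for illustration.

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# np.polyfit with degree 1 returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)
y_pred = slope * x + intercept

# MSE = (1/n) * sum((y_i - y_hat_i)^2)
mse = np.mean((y - y_pred) ** 2)
print(slope, intercept, mse)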
Logistic Regression
Geometric Space:
Logistic regression works in the feature space where data points are represented.
Decision Boundary:
The model learns a linear decision boundary between the classes, located where the predicted probability equals 0.5.
Advantages of Logistic Regression
Interpretability:
• Coefficients provide insight into the feature importance.
Probabilistic Output:
• Outputs probabilities, useful for understanding confidence levels.
Efficiency:
• Computationally efficient and scalable to large datasets.
Disadvantages of Logistic Regression
Linearity Assumption:
• Assumes a linear relationship between the features and the log-odds of the outcome, which may not always be true.
Binary Limitation:
• Primarily suited for binary classification problems; extensions are needed for multi-class classification.
Sensitivity to Outliers:
• Can be sensitive to outliers, which may affect the decision boundary.
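A minimal sketch with scikit-learn's LogisticRegression; the synthetic dataset is illustrative only.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression()
clf.fit(X_train, y_train)

# Coefficients relate each feature to the log-odds of the positive class.
print(clf.coef_)
# predict_proba exposes the probabilistic output discussed above.
print(clf.predict_proba(X_test[:3]))
print(clf.score(X_test, y_test))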
Solved Examples
Decision Tree Problem
Given data:
x: 2, 4, 6, 8
y: 3, 7, 5, 10
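Assuming the exercise asks for a least-squares line through this data, a minimal sketch of the computation follows; the interpretation as a regression fit is an assumption.

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
y = np.array([3.0, 7.0, 5.0, 10.0])

# slope = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2) = 19/20 = 0.95
# intercept = y_mean - slope * x_mean = 6.25 - 0.95 * 5 = 1.5
slope, intercept = np.polyfit(x, y, 1)
print(slope, intercept)  # 0.95, 1.5  =>  y = 0.95x + 1.5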
K-Nearest Neighbors Algorithm