Unit III: Supervised Learning Techniques

3.1 Decision Trees

What is a Decision Tree?

A Decision Tree is a supervised learning model with a tree-like structure. It makes decisions or predictions by
repeatedly splitting the data into smaller and smaller groups. It can be used both for predicting categories such as
"yes/no" answers (classification) and for predicting numbers (regression).

How it Looks and Works:

• Nodes: Each "circle" or "box" in the tree is a "node."

o Internal Node: This is where a question is asked about a feature (like "Is age > 30?").

o Leaf Node: These are the very end points of the tree. They give the final answer or prediction (like
"Buy computer" or "Don't buy").

o Root Node: This is the starting node at the very top of the tree.

• Branches: The lines coming out of a node are "branches." They represent the possible answers to the question
(like "Yes" or "No").

• The tree keeps asking questions and splitting the data until it reaches a final answer. The goal is to make the
purest groups possible at the end.
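
Below is a minimal sketch of this idea in Python. The use of scikit-learn and the tiny "age and income vs. buys
computer" dataset are assumptions made only for illustration; the notes do not prescribe a specific library or data.

from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data: [age, income] -> buys computer (1) or not (0)
X = [[25, 30000], [45, 80000], [35, 60000], [50, 40000], [23, 20000], [40, 90000]]
y = [0, 1, 1, 0, 0, 1]

# The tree repeatedly asks questions such as "Is income > 50000?" to split the data
tree = DecisionTreeClassifier(max_depth=2)
tree.fit(X, y)

# A leaf node gives the final answer for a new person
print(tree.predict([[30, 70000]]))  # e.g. [1] meaning "buy computer"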

Key Ideas:

• A tree-like model for making decisions.

• Used for classification (categories) and regression (numbers).

• Made of nodes, branches, and leaves.

• Works by splitting data based on questions about features.

3.2 Naive Bayes Classification

What is Naive Bayes?

Naive Bayes is a classification algorithm that works on probability, using a rule called Bayes' Theorem. It is
"naive" because it assumes that all features (like a person's age, income, and job) are independent of each other,
given the class being predicted (like whether the person will buy a product). In other words, it assumes that one
feature does not affect another.

How it Works:

Even with this simple assumption, Naive Bayes often performs surprisingly well, especially with a lot of data. It's very
fast and efficient. It calculates the chance that a certain input belongs to each possible category, and then picks the
category with the highest chance.
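
As a rough sketch of this "pick the most probable class" idea, here is how it could look in Python using
scikit-learn's GaussianNB; the library choice and the small dataset are assumptions for illustration only.

from sklearn.naive_bayes import GaussianNB

# Hypothetical features: [age, income]; class: buys the product (1) or not (0)
X = [[25, 30000], [45, 80000], [35, 60000], [50, 40000], [23, 20000], [40, 90000]]
y = [0, 1, 1, 0, 0, 1]

model = GaussianNB()   # treats each feature as independent, given the class
model.fit(X, y)

# The model computes a probability for each class and picks the highest one
print(model.predict_proba([[30, 70000]]))
print(model.predict([[30, 70000]]))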

Key Ideas:

• It's a classification algorithm (predicts categories).

• Based on probability (Bayes' Theorem).

• Assumes features are independent (which is why it's "naive").

• Good for tasks like spam detection or sentiment analysis (deciding whether a piece of text is positive or negative).

• Simple, fast, and efficient.


3.3 Classification (General Concepts)

What is Classification?

Classification is a type of supervised learning where the computer learns to put things into predefined groups or
categories. The answer it gives is always one of these specific labels.

What Makes Classification Distinct:

• Output: The main thing about classification is that its output is always a category (like "yes" or "no", "cat" or
"dog").

• Learning: The model learns from data that already has these categories marked (labeled data).

Examples:

• Is this email spam or not spam?

• Is this picture a cat, a dog, or a bird?

• Does this person have a disease or not?

Key Points:

• Predicts discrete categories (labels).

• Learns from labeled data.

• Different from regression (which predicts numbers).

3.4 Support Vector Machines (SVMs)

What is a Support Vector Machine (SVM)?

An SVM is a powerful supervised learning algorithm used for both classification and regression, but most commonly
for classification. Its main goal is to find the best way to separate different groups of data.

How it Works (The "Hyperplane"):

Imagine you have data points scattered on a graph, and you want to draw a line to separate two different types of
points (like circles and squares).

• Hyperplane: The SVM tries to find the "best" line (or plane, if you have more features) that separates these
groups. This line is called a hyperplane.

• Maximum Margin: The "best" line is the one that has the largest possible gap (or "margin") between it and
the closest data points from each group.

• Support Vectors: The data points that are closest to this separating line are called "support vectors." These
are the critical points that "support" or define the position of the hyperplane. If you move these points, the
hyperplane might change.
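
A minimal sketch of these ideas, assuming Python and scikit-learn's SVC (the points below are invented for
illustration):

from sklearn.svm import SVC

# Two small hypothetical groups of 2-D points ("circles" = 0, "squares" = 1)
X = [[1, 1], [2, 1], [1, 2], [6, 5], [7, 7], [6, 6]]
y = [0, 0, 0, 1, 1, 1]

# A linear kernel looks for the straight line with the widest possible margin
clf = SVC(kernel="linear")
clf.fit(X, y)

# The points closest to the separating line are the support vectors
print(clf.support_vectors_)
print(clf.predict([[3, 2]]))  # which side of the hyperplane does this point fall on?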

Key Points:

• A supervised learning algorithm for classification (and regression).

• Goal: To divide datasets into classes.

• Finds a "hyperplane" (a line or plane) that maximizes the margin between classes.

• Support vectors are the data points closest to the hyperplane that influence its position.
3.5 Random Forest

What is a Random Forest?

A Random Forest is a very popular and powerful ML algorithm for both classification and regression. It's based on an
idea called "Ensemble Learning," which means combining many simpler models to get a better, more robust result.
As its name suggests, it builds a "forest" of decision trees.

How it Works:

• Instead of relying on just one decision tree, a Random Forest builds many decision trees using different
random subsets of the training data.

• Each tree makes its own prediction.

• For classification problems, the Random Forest then takes a "vote" from all the trees and chooses the
prediction that the majority of trees agreed on.

• For regression problems, it averages all the trees' predictions.

• Having many trees helps improve accuracy and prevents overfitting (where a single tree might be too specific
to the training data).
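
A minimal sketch of the voting idea, again assuming Python and scikit-learn (the data is invented for illustration):

from sklearn.ensemble import RandomForestClassifier

# Hypothetical labeled data: [age, income] -> class 0 or 1
X = [[25, 30000], [45, 80000], [35, 60000], [50, 40000], [23, 20000], [40, 90000]]
y = [0, 1, 1, 0, 0, 1]

# Build a "forest" of 100 trees, each trained on a random sample of the data
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)

# For classification, the prediction is the majority vote of all 100 trees
print(forest.predict([[30, 70000]]))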

Key Ideas:

• Combines multiple decision trees.

• Uses Ensemble Learning to improve performance.

• Takes majority vote for classification, averages for regression.

• Helps achieve higher accuracy and prevents overfitting.

3.6 Linear Regression for Regression Problems

What is Linear Regression?

Linear Regression is a fundamental statistical method used for regression problems. Its goal is to find the best straight
line that describes the relationship between an input feature (or features) and a continuous numerical output.

How it Works (Finding the "Best Fit Line"):

Imagine you have data points on a graph (like house size vs. house price). Linear regression tries to draw a straight
line that comes closest to all these points.

• Minimizing Errors: It does this by minimizing the "residuals" or "errors," which are the distances between
each actual data point and the line. Specifically, it tries to minimize the sum of the squared differences
between the actual values and the values predicted by the line. This is called the "Ordinary Least Squares
(OLS)" method.

• Equation of the Line: The final line has an equation like Y = mX + c (for one input), or more generally,
Y = β0 + β1X1 + β2X2 + ... + e for multiple inputs. Here, Y is the output, the X values are the inputs, the β values
are the coefficients (how much each input affects the output), and e is the error term.
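
A minimal sketch of fitting such a line in Python with scikit-learn's LinearRegression; the "house size vs. price"
numbers are invented for illustration.

from sklearn.linear_model import LinearRegression

# Hypothetical data: house size (square metres) -> price
X = [[50], [70], [90], [110], [130]]
y = [150000, 200000, 260000, 310000, 370000]

model = LinearRegression()   # fits Y = mX + c by ordinary least squares
model.fit(X, y)

print(model.coef_, model.intercept_)   # m (slope) and c (intercept)
print(model.predict([[100]]))          # predicted price for a 100 m^2 house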

Assumptions (Things that should be true for it to work well):

Before using Linear Regression, ideally, certain things should be true about your data:

• Linear Relationship: There should be a straight-line relationship between inputs and outputs.
• Independence of Errors: The errors (differences between predicted and actual values) should not be related
to each other.

• Constant Variance of Errors: The spread of errors should be roughly the same across all input values.

• Normal Distribution of Errors: The errors should follow a bell-shaped curve (normal distribution).

• No Multicollinearity: Input features should not be too highly correlated with each other.

Key Points:

• Used for regression problems (predicting numbers).

• Finds the best-fitting straight line through data points.

• Minimizes the sum of squared errors (Ordinary Least Squares).

• Has assumptions about the data for best results.

3.7 Ordinary Least Squares (OLS) Regression

What is OLS?

Ordinary Least Squares (OLS) is the most common method used in Linear Regression. It's how the "best-fitting line" is
actually found.

How it Works:

The main idea of OLS is to make the differences between the actual data points and the line as small as possible.

• Errors/Residuals: These are the vertical distances from each data point to the line.

• Sum of Squared Residuals: OLS doesn't just add up the errors (because positive and negative errors would
cancel out). Instead, it squares each error and then adds them up. This way, larger errors get penalized more.

• Minimizing This Sum: The OLS method finds the line (by picking the right slope and intercept) that makes
this "sum of squared residuals" as small as possible. This line is called the "Regression Line."

Key Points:

• A linear regression technique.

• Estimates the unknown parameters (coefficients) of the model.

• Relies on minimizing the sum of squared differences between actual and predicted values.

• The resulting line is the Regression Line.

3.8 Logistic Regression

What is Logistic Regression?

Despite "regression" in its name, Logistic Regression is primarily a classification algorithm. It's used when the output
you want to predict is a binary category (like "yes/no," "true/false," "spam/not spam"). It predicts the probability that
an input belongs to a certain class.

How it Works:

• Instead of fitting a straight line to the data (like linear regression), Logistic Regression uses a special S-shaped
curve called the "sigmoid function."

• This curve squashes any input value into a probability between 0 and 1.
• If the probability is above a certain threshold (e.g., 0.5), the input is assigned to one class; otherwise it is
assigned to the other.
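
A minimal sketch of the sigmoid and the 0.5 threshold in Python; the use of scikit-learn and the "hours studied vs.
passed" data are assumptions for illustration only.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    # The S-shaped curve: squashes any number into a probability between 0 and 1
    return 1 / (1 + np.exp(-z))

# Hypothetical data: hours of study -> passed the exam (1) or not (0)
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X, y)

p = model.predict_proba([[3.5]])[0, 1]      # probability of the "pass" class
print(p, "pass" if p > 0.5 else "fail")     # apply the 0.5 threshold

# The same probability, computed by hand through the sigmoid
z = model.coef_[0][0] * 3.5 + model.intercept_[0]
print(sigmoid(z))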

Key Points:

• A classification algorithm, not for predicting numbers directly.

• Used for binary outcomes (e.g., yes/no).

• Predicts the probability of belonging to a class.

• Uses the sigmoid (S-shaped) function.
