Machine Learning Models Overview

The document outlines various machine learning models, including Linear Regression, Logistic Regression, Decision Trees, Random Forests, Support Vector Machines, Neural Networks, and K-Nearest Neighbors, detailing their advantages and disadvantages. It also discusses evaluation metrics for classification and regression tasks, emphasizing the trade-offs between precision, recall, and accuracy. Additionally, it highlights the importance of selecting appropriate evaluation metrics based on specific application goals and data characteristics.


Linear Regression: predict continuous scores.

Logistic Regression: binary outcomes, e.g. pass or fail.

Decision Trees: a series of yes/no questions >> good for classification and
regression tasks
Random Forests: like gathering a bunch of friends (trees), each with their
own decision, then combining their votes into one decision
SVM: find the boundary (hyperplane) that separates the classes with the
widest possible margin.
Neural Networks: each unit processes a small bit of info; collectively they
make a decision.
K-Nearest Neighbors: look around for the closest data points; if most of
them belong to a category, predict that category.
Naïve Bayes: apply Bayes' theorem, assuming the features are independent.
Linear Regression
 Advantages:
o Simple and easy to understand.
o Good for predicting numerical outcomes.
 Disadvantages:
o Assumes a linear relationship between variables.
o Can be easily affected by outliers.
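As a sketch of what "predicting numerical outcomes" looks like, here is simple linear regression fit by the closed-form least-squares formulas; the data points are assumed toy values, not from the notes:

```python
# Minimal sketch: fit y = a*x + b by least squares (assumed toy data).
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); intercept follows from the means.
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - a * mean_x
    return a, b

a, b = fit_line([1, 2, 3, 4], [2, 4, 6, 8])  # perfectly linear data
print(a, b)  # slope 2.0, intercept 0.0
```

Note how a single extreme y-value would pull the slope noticeably, which is the outlier sensitivity mentioned above.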

Logistic Regression
 Advantages:
o Provides probabilities for outcomes.
o Good for binary classification.
 Disadvantages:
o Assumes a linear relationship between the log odds of the
outcome and predictor variables.
o Not suitable for complex relationships.
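The "provides probabilities" point can be sketched with the logistic (sigmoid) function, which maps a linear score (the log odds) to a probability; the weights and the pass/fail framing here are assumed toy values:

```python
import math

# Sketch: logistic regression maps a linear score (log-odds) to a probability.
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Assumed toy model: score = 2*hours_studied - 5; predict "pass" if p > 0.5.
def predict_pass(hours):
    p = sigmoid(2 * hours - 5)
    return p, p > 0.5

print(predict_pass(4))  # high probability -> pass
print(predict_pass(1))  # low probability -> fail
```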

Decision Trees
 Advantages:
o Easy to interpret and understand.
o Can handle both numerical and categorical data.
 Disadvantages:
o Prone to overfitting, especially with complex trees.
o Can be unstable, as small changes might result in a completely
different tree.

Random Forests
 Advantages:
o More accurate than a single decision tree.
o Handles overfitting well.
 Disadvantages:
o More complex and computationally intensive.
o Less interpretable than a single decision tree.

Support Vector Machines (SVM)


 Advantages:
o Effective in high dimensional spaces.
o Works well with clear margin of separation.
 Disadvantages:
o Requires careful tuning of parameters.
o Not suitable for large datasets.

Neural Networks
 Advantages:
o Extremely powerful, can model complex nonlinear
relationships.
o Good for a wide range of applications (image recognition, NLP,
etc.).
 Disadvantages:
o Requires a lot of data to train.
o Complex and hard to interpret.

K-Nearest Neighbors (KNN)


 Advantages:
o Simple and easy to implement.
o No assumption about data distribution.
 Disadvantages:
o Computationally expensive as the dataset grows.
o Sensitive to irrelevant or redundant features.
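KNN is simple enough to sketch in a few lines. This is a minimal one-feature version with assumed toy data: find the k closest training points and let them vote.

```python
from collections import Counter

# Minimal one-feature KNN sketch (assumed toy data): vote among the k nearest.
def knn_predict(train, x, k=3):
    # train: list of (feature, label) pairs
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

data = [(1, "fail"), (2, "fail"), (3, "fail"),
        (7, "pass"), (8, "pass"), (9, "pass")]
print(knn_predict(data, 2.5))  # neighbors are mostly "fail"
print(knn_predict(data, 8.5))  # neighbors are mostly "pass"
```

The full sort over the training set for every query is exactly why KNN gets expensive as the dataset grows.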

CLUSTERING
Unsupervised Learning
Finding Groups
Labels NOT Required
MODELS: K-Means, DBSCAN, Agglomerative Hierarchical

CLASSIFICATION
Supervised Learning
Predicting Categories
Pre-labeled data
MODELS: Logistic Regression, Decision Trees, Random Forest, SVM,
Naïve Bayes, KNN, Neural Networks
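To make the clustering side concrete, here is a minimal one-dimensional K-Means sketch with assumed toy data: alternate between assigning points to their nearest centroid and recomputing each centroid as its cluster's mean.

```python
# Minimal 1-D K-Means sketch (assumed toy data and starting centroids).
def kmeans_1d(points, centroids, iters=10):
    for _ in range(iters):
        clusters = [[] for _ in centroids]
        for p in points:
            # Assign each point to its nearest centroid.
            i = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[i].append(p)
        # Recompute each centroid as the mean of its cluster.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids

print(kmeans_1d([1, 2, 3, 10, 11, 12], [0.0, 5.0]))  # two clear groups
```

No labels appear anywhere in the loop: the algorithm discovers the two groups from the data alone, which is the "labels NOT required" point above.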
EVALUATION FOR CLASSIFICATION:
Accuracy:
ALL the correct results (TP & TN) over EVERYTHING
Precision:
True Positives over ALL the predicted positives (TP & FP), even the false positives!!
Recall:
True Positives over ALL the actual positives (TP & FN)
F1:
The harmonic mean of precision and recall = useful to balance the two
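The four metrics above can be computed directly from the confusion-matrix counts; the counts in the example call are assumed toy numbers:

```python
# Sketch: accuracy, precision, recall, and F1 from confusion-matrix counts.
def classification_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + tn + fp + fn)   # correct over everything
    precision = tp / (tp + fp)                   # TP over predicted positives
    recall = tp / (tp + fn)                      # TP over actual positives
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return accuracy, precision, recall, f1

# Assumed example counts: 40 TP, 50 TN, 10 FP, 0 FN.
print(classification_metrics(40, 50, 10, 0))
```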

Improving precision and recall can have a negative effect on accuracy.

Improving Precision: reducing False Positives (FP) >> might increase FN,
since the model must be absolutely sure it's true before predicting a T.
Improving Recall: reducing False Negatives (FN) >> might increase FP.
THEREFORE, those extra errors (FN or FP) shrink the count of correct
predictions, so Accuracy (TP & TN) gets smaller!!

It is important to choose evaluation metrics that align with the goals of the
specific application and the characteristics of the data being modeled.

 Precision-Recall Trade-off: There's often a trade-off between
precision and recall. Improving precision typically reduces recall and
vice versa. This is because increasing one generally involves making
the model more conservative or more liberal in predicting positives,
which can decrease or increase the other metric, respectively.
 Accuracy's Role: Accuracy might not always reflect changes in
precision and recall, especially in imbalanced datasets. For instance, a
model that always predicts the majority class can have high accuracy
but low recall and precision for the minority class.
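The conservative-vs-liberal point can be seen by moving the decision threshold over a fixed set of scored examples; the scores and labels here are assumed toy values:

```python
# Sketch (assumed toy scores): moving the decision threshold trades
# precision against recall on the same scored examples.
# Each pair is (model score, true label); label 1 = positive.
scored = [(0.95, 1), (0.8, 1), (0.6, 0), (0.55, 1), (0.3, 0), (0.1, 0)]

def precision_recall(threshold):
    tp = sum(1 for s, y in scored if s >= threshold and y == 1)
    fp = sum(1 for s, y in scored if s >= threshold and y == 0)
    fn = sum(1 for s, y in scored if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(precision_recall(0.9))   # conservative: high precision, low recall
print(precision_recall(0.5))   # liberal: lower precision, higher recall
```

The model itself never changes; only the threshold does, yet precision and recall move in opposite directions.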

Detecting spam emails >> whether to favor precision or recall depends on
which error is more costly: precision avoids flagging legitimate mail;
recall catches more spam.

EVALUATION FOR REGRESSION:


Mean Absolute Error (MAE): The average of the absolute errors between
predicted and actual values. It gives an idea of how wrong the predictions
are.
Mean Squared Error (MSE): The average of the squares of the errors. It
penalizes larger errors more than MAE.
R-squared (Coefficient of Determination): Represents the proportion of
the variance for the dependent variable that's explained by the independent
variables in the model.
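These three regression metrics follow directly from their definitions; the actual/predicted values in the example call are assumed toy numbers:

```python
# Sketch: MAE, MSE, and R-squared computed from their definitions.
def regression_metrics(actual, predicted):
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n          # average absolute error
    mse = sum(e * e for e in errors) / n           # penalizes large errors more
    mean_a = sum(actual) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((a - mean_a) ** 2 for a in actual)  # total variance (scaled)
    r2 = 1 - ss_res / ss_tot                       # proportion of variance explained
    return mae, mse, r2

print(regression_metrics([3, 5, 7, 9], [2.5, 5.0, 7.5, 9.0]))
```

Squaring in MSE is what makes one big miss cost more than several small ones of the same total size.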

Increasing Precision
 Email Spam Detection: Higher precision minimizes the risk of
important emails being incorrectly marked as spam, ensuring
important messages reach the inbox.
 Financial Fraud Detection: In banking, high precision helps in
accurately identifying fraudulent transactions while minimizing false
positives that could inconvenience customers through false alerts or
blocked transactions.
 Product Recommendation Systems: High precision ensures that
the recommendations are relevant to the user, enhancing user
satisfaction and engagement.

Increasing Recall
 Disease Screening: High recall is crucial to ensure that as many true
cases of a disease are identified as possible, minimizing the number of
cases that go undetected.
 Disaster Response: In disaster response scenarios, high recall in
identifying areas needing assistance ensures that help is dispatched to
as many affected areas as possible, even if it means some areas might
receive unnecessary aid.
 Content Moderation: For social media platforms, higher recall in
identifying and removing harmful content is vital to ensure a safe
environment, even if some non-harmful content is mistakenly removed.
