Choose Your Own Algorithm
Introduction
• Machine learning is part art and part science: when you look at machine learning algorithms, there is no single solution or approach that fits all problems.
• Several factors can affect your decision when choosing a machine learning algorithm.
• Some problems are very specific and require a unique approach. For example, a recommender system is a very common type of machine learning application, yet it solves a very specific kind of problem.
• Other problems are very open and need a trial-and-error approach.
• Supervised learning tasks such as classification and regression are very open: they could be used for anomaly detection, or to build more general sorts of predictive models.
• Besides, some of the decisions we make when choosing a machine learning algorithm have less to do with optimization or the technical aspects of the algorithm, and more to do with business decisions.
• Below, we look at some of the factors that can help you narrow down the search for your machine learning algorithm.
Steps
• Understand Your Data
• Know your data
• Clean your data
• Augment your data
• Categorize the problem
• Understand your constraints
• Find the available algorithms
Linear Regression
• Estimating the time to travel from one location to another
• Predicting sales of a particular product next month
• Impact of blood drug content on coordination
• Predicting monthly gift card sales to improve yearly revenue projections
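A minimal linear regression sketch for the sales-forecasting use case above, assuming scikit-learn is available (the monthly figures are made up for illustration):

```python
# Minimal linear regression sketch (assumes scikit-learn is installed).
# The monthly sales figures below are made-up illustration data.
import numpy as np
from sklearn.linear_model import LinearRegression

months = np.array([[1], [2], [3], [4], [5], [6]])  # month index as the single feature
sales = np.array([120, 135, 149, 162, 178, 190])   # units sold per month

model = LinearRegression().fit(months, sales)
print("Predicted sales for month 7:", model.predict([[7]])[0])
```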
Decision Trees
• Investment decisions
• Customer churn
• Bank loan defaulters
• Build vs Buy decisions
• Sales lead qualifications
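A minimal decision-tree sketch for the customer-churn case above, assuming scikit-learn; the [monthly_spend, support_tickets] features and labels are invented for illustration:

```python
# Minimal decision-tree sketch for a churn-style problem (scikit-learn assumed).
# Each row is a made-up customer: [monthly_spend, support_tickets].
from sklearn.tree import DecisionTreeClassifier

X = [[20, 5], [80, 0], [15, 7], [90, 1], [30, 4], [70, 0]]
y = [1, 0, 1, 0, 1, 0]  # 1 = churned, 0 = retained

clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print(clf.predict([[25, 6]]))  # a low-spend, high-ticket customer: likely churn
```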
K Nearest Neighbor
• Sometimes you don't have any labels, and your goal is to assign labels according to the features of objects. This is called a clustering task. Clustering algorithms can be used, for example, when there is a large group of users and you want to divide them into particular groups based on some common attributes (see the sketch below).
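The scenario above is unsupervised, so a minimal sketch uses scikit-learn's KMeans (one possible choice; the slide names no specific library) to segment users by common attributes; the [age, purchases_per_month] data is made up:

```python
# Minimal clustering sketch matching the user-segmentation scenario above.
# scikit-learn's KMeans is an assumed choice; the user features are made up.
import numpy as np
from sklearn.cluster import KMeans

users = np.array([[18, 2], [22, 3], [45, 10], [50, 12], [33, 6], [40, 9]])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(users)
print(kmeans.labels_)  # cluster assignment for each user
```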
SVM
• Detecting persons with common diseases such
as diabetes.
• Hand-written character recognition
• Text categorization, e.g. classifying news articles by topic
• Stock market price prediction
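A minimal SVM sketch for the hand-written character recognition use case above, assuming scikit-learn and its bundled digits dataset:

```python
# Minimal SVM sketch for hand-written digit recognition (scikit-learn assumed).
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="rbf", gamma=0.001).fit(X_train, y_train)
print("Test accuracy:", clf.score(X_test, y_test))
```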
Naïve Bayes
• Sentiment analysis and text classification
• Recommendation systems like Netflix, Amazon
• To mark an email as spam or not spam
• Face recognition
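A minimal naive Bayes sketch for the spam-filtering use case above, assuming scikit-learn; the tiny corpus is invented for illustration:

```python
# Minimal naive Bayes sketch for spam filtering (scikit-learn assumed).
# The tiny corpus below is made up for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = ["win a free prize now", "meeting at noon tomorrow",
         "free offer click now", "project update attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

vec = CountVectorizer()
X = vec.fit_transform(texts)
clf = MultinomialNB().fit(X, labels)
print(clf.predict(vec.transform(["free prize offer"])))  # expected: [1] (spam)
```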
Random Forest
• Random Forest can be used in real-world applications such as:
• Predicting patients at high risk
• Predicting part failures in manufacturing
• Predicting loan defaulters
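A minimal random-forest sketch for the loan-defaulter use case above, assuming scikit-learn; the [income, debt_ratio] features and labels are made up:

```python
# Minimal random-forest sketch for loan-default prediction (scikit-learn
# assumed; applicant features [income_k, debt_ratio] are made up).
from sklearn.ensemble import RandomForestClassifier

X = [[40, 0.8], [90, 0.2], [35, 0.9], [120, 0.1], [50, 0.6], [80, 0.3]]
y = [1, 0, 1, 0, 1, 0]  # 1 = defaulted, 0 = repaid

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(clf.predict_proba([[60, 0.5]]))  # default probability for a new applicant
```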
Cheat Sheet
Performance Metrics
• True Positive: You predicted positive, and it’s true.
• True Negative: You predicted negative, and it’s true.
• False Positive: (Type 1 Error): You predicted positive, and it’s false.
• False Negative: (Type 2 Error): You predicted negative, and it’s false.
• Accuracy: the proportion of the total number of predictions that were correct.
• Positive Predictive Value or Precision: the proportion of positive cases that were
correctly identified.
• Negative Predictive Value: the proportion of negative cases that were correctly
identified.
• Sensitivity or Recall: the proportion of actual positive cases which are correctly
identified.
• Specificity: the proportion of actual negative cases which are correctly identified.
• Rate: a measure derived from the confusion matrix. There are four such rates: TPR (true positive rate), FPR (false positive rate), TNR (true negative rate), and FNR (false negative rate). A sketch computing these metrics follows after this list.
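To make the definitions above concrete, here is a short sketch computing each metric from raw confusion-matrix counts (the counts themselves are illustrative values):

```python
# Sketch: computing the metrics defined above from raw confusion-matrix counts.
tp, fp, tn, fn = 40, 10, 35, 15  # made-up illustration counts

accuracy    = (tp + tn) / (tp + fp + tn + fn)
precision   = tp / (tp + fp)   # positive predictive value
npv         = tn / (tn + fn)   # negative predictive value
recall      = tp / (tp + fn)   # sensitivity, true positive rate (TPR)
specificity = tn / (tn + fp)   # true negative rate (TNR)
fpr         = fp / (fp + tn)   # false positive rate
fnr         = fn / (fn + tp)   # false negative rate

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```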
If the classifier predicts negative, you can trust it: the example is negative. Pay attention, however: if the example is actually negative, you can't be sure the classifier will predict it as negative (specificity = 78%).
If the classifier predicts positive, you can't trust it (precision = 33%). However, if the example is actually positive, you can trust the classifier to catch it (recall = 100%).
• Predicting everything as positive clearly can't be a good idea. However, because the population is imbalanced, the precision is relatively high; the recall is 100% because all the positive examples are predicted as positive. But the specificity is 0% because no negative example is predicted as negative.
The accuracy for the problem at hand comes out to be 88%. As you can see from the above two tables, the Positive Predictive Value is high, but the Negative Predictive Value is quite low. The same holds for Sensitivity and Specificity. This is primarily driven by the threshold value we have chosen. If we decrease our threshold value, the two pairs of starkly different numbers will come closer.
In general, we are concerned with one of the above-defined metrics. For instance, a pharmaceutical company will be more concerned with minimizing wrong positive diagnoses, and hence will care most about high Specificity. On the other hand, an attrition model will be more concerned with Sensitivity.
Confusion matrices are generally used only with class output models.
F1 Score
In the last section, we discussed precision and recall for classification problems and highlighted the importance of choosing between them based on our use case. What if, for a use case, we are trying to get the best precision and recall at the same time? The F1-Score addresses exactly this: it is the harmonic mean of precision and recall for a classification problem, F1 = 2 × (Precision × Recall) / (Precision + Recall).
Now, an obvious question that comes to mind is why we take a harmonic mean rather than an arithmetic mean. The reason is that the harmonic mean punishes extreme values more. Let us understand this with an example. Suppose we have a binary classification model with the following results:
Precision: 0, Recall: 1
Here, if we take the arithmetic mean, we get 0.5. It is clear that the above result comes from a dumb classifier that ignores the input and simply predicts one of the classes as output. Now, if we take the harmonic mean, we get 0, which is accurate, as this model is useless for all practical purposes.
This seems simple. There are situations, however, in which a data scientist would like to give more importance/weight to either precision or recall; the sketch below shows one standard way to do that.
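To make the harmonic-mean argument concrete, the sketch below computes F1 and the standard F-beta generalization (F-beta is not named on the slide; it is the usual way to weight precision or recall, with beta > 1 favoring recall):

```python
# Sketch: F1 as the harmonic mean of precision and recall, plus the standard
# F-beta generalization (beta > 1 weights recall, beta < 1 weights precision).
def f_beta(precision, recall, beta=1.0):
    if precision == 0 and recall == 0:
        return 0.0  # avoid division by zero for a totally failed classifier
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

print(f_beta(0.0, 1.0))          # 0.0  -- the "dumb classifier" example above
print((0.0 + 1.0) / 2)           # 0.5  -- arithmetic mean hides the failure
print(f_beta(0.5, 0.8, beta=2))  # recall-weighted F2 score
```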
• Area Under the ROC Curve (AUC-ROC)
• This is again one of the popular evaluation metrics used
in the industry. The biggest advantage of using the ROC
curve is that it is independent of the change in the
proportion of responders. This statement will get clearer
in the following sections.
• Let's first try to understand what the ROC (Receiver Operating Characteristic) curve is. If we look at the confusion matrix for a probabilistic model, we observe that we get different values for each metric depending on the probability threshold we choose.
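A minimal sketch of computing the ROC curve and AUC from predicted probabilities, assuming scikit-learn (the labels and scores are made up); each threshold over the scores yields a different confusion matrix, which is what the curve traces out:

```python
# Sketch: ROC curve and AUC from predicted probabilities (scikit-learn
# assumed; labels and scores below are made-up illustration values).
from sklearn.metrics import roc_auc_score, roc_curve

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5]

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # one point per threshold
print("AUC:", roc_auc_score(y_true, y_score))
```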
Clustering Metrics
• Silhouette Score
• The Silhouette Score and Silhouette Plot are used to measure the separation distance between clusters (a sketch computing both clustering metrics follows at the end of this section).
• Dunn Index
• Dunn's Index (DI) is another metric for evaluating clustering algorithms. It equals the minimum inter-cluster distance divided by the maximum intra-cluster distance (the largest cluster diameter). Large inter-cluster distances (better separation) and small cluster diameters (more compact clusters) lead to a higher DI value, so a higher DI implies better clustering. It assumes that better clustering means clusters that are compact and well separated from other clusters.
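A sketch computing both metrics, assuming scikit-learn for the silhouette score and NumPy/SciPy for a hand-rolled Dunn index (scikit-learn has no built-in Dunn function, so the helper below is an illustrative implementation of the definition above; the data is made up):

```python
# Sketch: silhouette score via scikit-learn, plus a hand-rolled Dunn index
# implementing the definition above (min inter-cluster distance divided by
# max intra-cluster diameter). Data points are made up for illustration.
import numpy as np
from scipy.spatial.distance import cdist, pdist
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1, 1], [1.2, 0.9], [0.9, 1.1], [5, 5], [5.1, 4.9], [4.8, 5.2]])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

print("Silhouette:", silhouette_score(X, labels))

def dunn_index(X, labels):
    clusters = [X[labels == k] for k in np.unique(labels)]
    # Minimum distance between points in different clusters (separation).
    min_inter = min(cdist(a, b).min()
                    for i, a in enumerate(clusters)
                    for b in clusters[i + 1:])
    # Maximum within-cluster diameter (compactness).
    max_diam = max(pdist(c).max() for c in clusters if len(c) > 1)
    return min_inter / max_diam

print("Dunn index:", dunn_index(X, labels))
```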