
Model Evaluation

Data Mining & Methodology

• Data mining is a process that uses a variety of data analysis tools to discover patterns and relationships in data that may be used to make valid predictions.
• A generic data mining process methodology:

The Model Evaluation Phase

• The model evaluation phase of a data mining process assesses how well a model fits the data and generalizes to new data.
• This phase is an essential step in the data mining process, as it helps to ensure that the model is accurate and reliable.

Precision, bias and accuracy

The quality of a measurement process and the resulting data is assessed by:
• Precision – the closeness of repeated measurements to one another
• Bias – a systematic variation of the measurements from the quantity being measured
• Accuracy – the closeness of the measurements to the true value of the quantity being measured
Choosing Measurement Criteria

• Data type of the target variable (prediction)
  • Interval / numerical
  • Nominal / categorical
• Type of prediction
  • Estimate – interval / numerical
  • Decision / classification – nominal / categorical

Model Quality Measurement
• Estimates
  • Variance/errors (examples: ASE/MSE, RASE)
  • Fit (examples: coefficient of determination, R2)
  • Methods: statistics

• Classification (decisions)
  • Error rates (examples: misclassification)
  • Accuracy (examples: accuracy, precision, specificity)
  • Methods: confusion matrix, gain, lift, ROC

Errors for Estimation

• Error measures indicate how close a regression line is to a set of points (actual values), by taking the distances (i.e. errors) from the points to the regression line and squaring them.
• The error is the difference (i.e. variance) between the estimated value and the actual value.
• Mean squared error (MSE) = SSE / DFE
• Average squared error (ASE) = SSE / N
  where SSE = sum of squared errors, N = total sample size, DFE = degrees of freedom for error
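
A minimal sketch (Python, with made-up actual and predicted values and an assumed two-parameter model) of how SSE, MSE and ASE relate:

```python
import numpy as np

actual = np.array([3.0, 5.0, 7.5, 9.0])      # hypothetical actual values
predicted = np.array([2.5, 5.5, 7.0, 9.5])   # hypothetical model estimates
n_params = 2                                  # assumed: intercept + 1 input attribute

sse = np.sum((actual - predicted) ** 2)       # SSE: sum of squared errors
n = len(actual)                               # N: total sample size
dfe = n - n_params                            # DFE: degrees of freedom for error

mse = sse / dfe                               # mean squared error = SSE / DFE
ase = sse / n                                 # average squared error = SSE / N
print(f"SSE={sse:.3f}, MSE={mse:.3f}, ASE={ase:.3f}")
```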
Model Evaluation for Regressions

Regression models can be evaluated and analysed based on several measurements:
• Errors (e.g. ASE or other error measures)
• F-test
• R-squared
• Misclassification
• ROC
R2 (R-Squared) for Linear Regression
• R2 is a statistical measure of how close the data are to the fitted regression line – the goodness of fit of the regression.
• R2 is also known as the coefficient of determination.
• R2 indicates how much better the function predicts the dependent variable than simply using the mean value of the dependent variable.
• The adjusted R2 is a statistic adjusted for the number of parameters in the equation and the number of data observations. It is a more conservative estimate of the percentage of variance explained.
• R2 measures the strength of the relationship between a model and the input attributes.
• R2 is not a formal significance test for the relationship. The F-test of overall significance is the hypothesis test for this relationship. If the overall F-test is significant, it can be concluded that the correlation between the model and the input attributes is statistically significant.
• R2 is always between 0 and 100% (or between 0 and 1):
  • 0% indicates that the model explains none of the variability of the target attribute data.
  • 100% indicates that the model explains all of the variability of the target attribute data.
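
A minimal sketch (Python with scikit-learn, synthetic data) of computing R2 and the adjusted R2, which corrects R2 for the number of parameters and observations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                              # hypothetical input attributes
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X, y)
r2 = r2_score(y, model.predict(X))                         # coefficient of determination

n, p = X.shape                                             # observations, number of inputs
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p - 1)              # adjusted R2
print(f"R-squared={r2:.3f}, adjusted R-squared={adj_r2:.3f}")
```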
Chi-Square for Logistic Regressions
• The chi-square statistic tests the overall significance of the regression model.
• The overall significance indicates whether the regression model provides a better fit to the data than a model that contains no input attributes (i.e. predicting with the mean/mode alone).
• If the p-value is less than the significance level (usually 0.01 or 0.05), the sample data provide sufficient evidence to conclude that the regression model fits the data statistically better than the model with no input attributes.
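
A minimal sketch (Python, synthetic data; not SAS Enterprise Miner output) of the likelihood-ratio chi-square test, comparing the fitted logistic model's log-likelihood against an intercept-only model that predicts the base rate:

```python
import numpy as np
from scipy.stats import chi2
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))                                    # hypothetical input attributes
y = (X[:, 0] + rng.normal(scale=1.0, size=200) > 0).astype(int)  # hypothetical binary target

model = LogisticRegression().fit(X, y)
p1 = model.predict_proba(X)[:, 1]
ll_model = np.sum(y * np.log(p1) + (1 - y) * np.log(1 - p1))     # log-likelihood, full model

p0 = y.mean()                                                    # intercept-only model: predict the base rate
ll_null = np.sum(y * np.log(p0) + (1 - y) * np.log(1 - p0))      # log-likelihood, no-input model

lr_stat = 2 * (ll_model - ll_null)                               # likelihood-ratio chi-square statistic
p_value = chi2.sf(lr_stat, df=X.shape[1])                        # df = number of input attributes
print(f"chi-square={lr_stat:.2f}, p-value={p_value:.4f}")
```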
F-value and P-value
• The F-value is a ratio of the variances of the predicted values and the residuals, while the p-value is a probability
value that indicates the likelihood of obtaining the observed results by chance.
• F-value: The F-value is a measure of the overall significance of the model. A large F-value indicates that the model
is able to explain a significant amount of the variation in the dependent variable. An F-value that is greater than the
critical value at a given significance level (typically 0.05 or 0.01) indicates that the model is statistically significant.
• P-value: The p-value is a measure of the probability of obtaining the observed results by chance. A small p-value
indicates that the results are unlikely to have occurred by chance. A p-value that is less than the significance level
(typically 0.05 or 0.01) indicates that the results are statistically significant.
• In general, a linear regression model with a large F-value and a small p-value is considered a good fit for the data. However, the F-value and p-value are only two measures of the model's fit and should not be used in isolation; other factors, such as the R-squared value, should also be considered when evaluating the model.
• The F-value and p-value do not provide any information about the significance of individual variables in the model. To assess the significance of individual variables, look at the t-values and p-values for those variables.
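
A minimal sketch (Python, synthetic data) of the overall F-test: the ratio of the variance explained by the model to the residual variance, with the p-value taken from the F distribution:

```python
import numpy as np
from scipy.stats import f as f_dist
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(120, 4))                          # hypothetical input attributes
y = 1.2 * X[:, 0] - 0.8 * X[:, 2] + rng.normal(scale=1.0, size=120)

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

n, p = X.shape
ssr = np.sum((y_hat - y.mean()) ** 2)                  # variation explained by the model
sse = np.sum((y - y_hat) ** 2)                         # residual (unexplained) variation

f_value = (ssr / p) / (sse / (n - p - 1))              # F = explained-variance ratio
p_value = f_dist.sf(f_value, p, n - p - 1)             # probability of a result this extreme by chance
print(f"F-value={f_value:.2f}, p-value={p_value:.4g}")
```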
Example Outputs of a Linear Regression Model
Example Outputs of a Logistic Regression Model
Example of Linear Regression Model Interpretation

The regression model has an adjusted R-squared value of 0.1525, meaning the input variables in the model explain 15.25% of the variability of the target variable, i.e. the Result.

Holding the other input variables constant, each coefficient gives the expected change in the result for a one-unit change in the corresponding input.

Model presentation:
result = 18.7833 + 0.7717(Medu=0) – 0.9251(Medu=1) – 0.7789(Medu=2) + 0.2555(Medu=3) – 0.4689(age) – 0.3951(goout) + 0.8672(studytime)
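
A small sketch applying the formula above to one student; the input values below are hypothetical and chosen only to illustrate how the coefficients are read:

```python
# Evaluate the regression formula above for one (hypothetical) student.
def predict_result(medu0, medu1, medu2, medu3, age, goout, studytime):
    return (18.7833 + 0.7717 * medu0 - 0.9251 * medu1 - 0.7789 * medu2
            + 0.2555 * medu3 - 0.4689 * age - 0.3951 * goout + 0.8672 * studytime)

# Hypothetical example: a 16-year-old with Medu = 3, goout = 2, studytime = 3
print(predict_result(medu0=0, medu1=0, medu2=0, medu3=1, age=16, goout=2, studytime=3))
```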
Decision Tree Evaluation: Performance and Complexity
• Misclassification/error in both training and validation keeps decreasing while more model training is performed.
• Once additional training no longer improves performance (i.e. the validation misclassification no longer decreases noticeably), further training leads to overfitting: the training misclassification/error keeps decreasing while the validation misclassification/error increases.

Optimal tree: a compromise with 5 leaves.
Observing Decision Tree Rules

Node id: 6 (best rule / highest % to identify B=1):
if Replacement: Gift Count 36 Months >= 2.5 or MISSING
AND Replacement: Gift Amount Last < 7.5
then there is a 64% chance B is 1 (i.e. a 36% chance B is 0).

Node id: 23 (best rule / highest % to identify B=0):
if Replacement: Time Since Last Gift >= 17.5 or MISSING
AND Replacement: Gift Count 36 Months >= 2.5 or MISSING
AND Replacement: Gift Amount Last >= 7.5 or MISSING
AND Replacement: Gift Amount Average Card 36 Months >= 14.415 or MISSING
then there is a 59% chance B is 0 (conversely a 41% chance B is 1).
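
A minimal sketch (Python with scikit-learn and a bundled sample dataset, not the fundraising data above) of how such if/then rules can be read off a fitted decision tree:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small tree (5 leaves, as in the optimal tree above) on a sample dataset.
data = load_breast_cancer()
tree = DecisionTreeClassifier(max_leaf_nodes=5, random_state=0).fit(data.data, data.target)

# Each printed path is an if/then rule; the class at the leaf plays the role of B=0 / B=1.
print(export_text(tree, feature_names=list(data.feature_names)))
```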

Validation dataset (positive target response is Churn = "Y")

ID   Age   Gender   Churn (Actual value)   Churn (Prediction)
 1    18     M              Y                      Y
 2    21     M              N                      Y
 3    30     F              N                      N
 4    25     M              Y                      Y
 5    50     F              N                      N
 6    28     F              Y                      Y
 7    22     F              Y                      N
 8    40     M              N                      Y
 9    32     F              N                      N
10    60     M              N                      N
Validation dataset (positive target response is Churn = "Y")

ID   Age   Gender   Churn (Actual value)   Churn (Prediction)   Outcome
 1    18     M              Y                      Y             Positive (TRUE)  = True Positive
 2    21     M              N                      Y             Positive (FALSE) = False Positive
 3    30     F              N                      N             Negative (TRUE)  = True Negative
 4    25     M              Y                      Y             Positive (TRUE)  = True Positive
 5    50     F              N                      N             Negative (TRUE)  = True Negative
 6    28     F              Y                      Y             Positive (TRUE)  = True Positive
 7    22     F              Y                      N             Negative (FALSE) = False Negative
 8    40     M              N                      Y             Positive (FALSE) = False Positive
 9    32     F              N                      N             Negative (TRUE)  = True Negative
10    60     M              N                      N             Negative (TRUE)  = True Negative
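
A minimal sketch (Python with scikit-learn) that reproduces the confusion-matrix counts from the ten validation records above:

```python
from sklearn.metrics import confusion_matrix

actual    = ["Y", "N", "N", "Y", "N", "Y", "Y", "N", "N", "N"]   # Churn (actual value)
predicted = ["Y", "Y", "N", "Y", "N", "Y", "N", "Y", "N", "N"]   # Churn (prediction)

# With labels=["N", "Y"], rows/columns are ordered negative then positive,
# so ravel() returns the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(actual, predicted, labels=["N", "Y"]).ravel()
print(f"TP={tp}, FP={fp}, TN={tn}, FN={fn}")   # TP=3, FP=2, TN=4, FN=1
```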
Evaluating Binary Models’ Predictive Accuracy
Prediction models with a binary response are assessed using four counts:
• True positive (TP): the number of observations predicted to be positive (1) that are in fact positive (1).
• True negative (TN): the number of observations predicted to be negative (0) that are in fact negative (0).
• False positive (FP): the number of observations incorrectly predicted to be positive (1) that are in fact negative (0).
• False negative (FN): the number of observations incorrectly predicted to be negative (0) that are in fact positive (1).

These four alternatives are illustrated in the confusion matrix.


Confusion Matrix and Measures

• Misclassification rate: (FP+FN)/Total – overall, how often is the model wrong?
• Accuracy: (TP+TN)/Total – overall, how often is the model correct?
• Precision (positive): TP/(TP+FP) – when it predicts yes, how often is it correct? (How often is the predicted 'true' target correct?)
• Precision (negative): TN/(TN+FN) – when it predicts no, how often is it correct? (How often is the predicted 'false' target correct?)
• Sensitivity (recall positive): TP/(TP+FN) – when it is actually yes, how often does it predict yes? The ability to find all relevant (positive) targets in a dataset.
• Specificity (recall negative): TN/(TN+FP) – when it is actually no, how often does it predict no? The ability to find all relevant (negative) targets in a dataset.
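
A minimal sketch expressing the measures above as plain functions of the four counts, applied to the churn example (TP=3, TN=4, FP=2, FN=1):

```python
def rates(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return {
        "accuracy":           (tp + tn) / total,   # overall, how often is the model correct?
        "misclassification":  (fp + fn) / total,   # overall, how often is the model wrong?
        "precision_positive": tp / (tp + fp),      # when it predicts yes, how often is it correct?
        "precision_negative": tn / (tn + fn),      # when it predicts no, how often is it correct?
        "sensitivity":        tp / (tp + fn),      # when it's actually yes, how often does it predict yes?
        "specificity":        tn / (tn + fp),      # when it's actually no, how often does it predict no?
    }

print(rates(tp=3, tn=4, fp=2, fn=1))   # counts from the churn validation dataset above
```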
Recall vs Precision
John witnessed an incident in which 10 terrorists attacked customers in a café.

What are the chances that John will be able to recall all of the terrorists precisely?

Say John names 15 suspects in order to finally cover the 10 correct terrorists.

Recall vs Precision
Then John's recall (the ability to find all relevant (positive) targets) will be 100%, but his precision (the ability to correctly identify 'true' targets among all his predictions) will only be 10/15 = 67%.
Calculate the rates, given TP = 100, FP = 10, FN = 5, TN = 50 (total = 165):

• Accuracy:
• Misclassification rate:
• Precision positive:
• Recall positive (Sensitivity):
• Recall negative (Specificity):
Got it?

• Accuracy = (100 + 50) / 165 = 0.9091
• Misclassification rate = (10 + 5) / 165 = 0.0909
• Precision = 100 / (100 + 10) = 0.9091
• Sensitivity = 100 / (100 + 5) = 0.9524
• Specificity = 50 / (50 + 10) = 0.8333

ROC Chart
• A receiver operating characteristic (ROC) curve provides an assessment of one or more binary classification models.
• Usually a diagonal line is plotted as a baseline, that is, where a random prediction would lie.
• For classification models that generate a single value, a single point can be plotted on the chart. A point above the diagonal line indicates a degree of accuracy that is better than a random prediction.
• Conversely, a point below the line indicates that the prediction is worse than a random prediction. The closer the point is to the top-left corner of the chart, the better the prediction.

ROC Chart and Index
[Figure: ROC curves plotted on axes from 0.0 to 1.0 – a weak model (ROC index < 0.6) stays near the diagonal, while a strong model (ROC index > 0.7) bows towards the top-left corner.]

• The closer the curve follows the left-hand border and then the top border of
the ROC space, the more accurate the test.
• The closer the curve comes to the 45-degree diagonal of the ROC space, the
less accurate the test.
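
A minimal sketch (Python with scikit-learn, using a bundled sample dataset and an assumed 70:30 split) of computing the ROC curve points and the ROC index (area under the curve) on validation data:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
scores = model.predict_proba(X_va)[:, 1]        # predicted probability of the positive class

fpr, tpr, _ = roc_curve(y_va, scores)           # ROC curve: false positive rate vs true positive rate
roc_index = roc_auc_score(y_va, scores)         # ROC index (AUC); 0.5 corresponds to the diagonal baseline
print(f"ROC index (AUC) = {roc_index:.3f}")
```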
SAS Enterprise Miner – Model Comparison

[Figures: example model comparison outputs from SAS Enterprise Miner.]
Contingency Table-based Measures
                           Actual
                  Orange (c1)   Apple (c2)   Plum (c3)
Predicted
  Orange (c1)         10            0            0        p1 = 10
  Apple (c2)           0            7            5        p2 = 12
  Plum (c3)            0            3            5        p3 = 8
                   a1 = 10      a2 = 10      a3 = 10
Accuracy = all correct predictions / total = 22/30 = 0.733
(CPi = correct predictions for class ci; pi = total predicted as ci; ai = total actual ci)
Precision (c1): CP1/p1 = 10/10 = 1
Precision (c2): CP2/p2 = 7/12 = 0.583
Precision (c3): CP3/p3 = 5/8 = 0.625
Recall (c1): CP1/a1 = 10/10 = 1
Recall (c2): CP2/a2 = 7/10 = 0.7
Recall (c3): CP3/a3 = 5/10 = 0.5
F(c1): (2*CP1)/(a1+p1) = (2*10)/(10+10) = 20/20 = 1
F(c2): (2*CP2)/(a2+p2) = (2*7)/(10+12) = 14/22 = 0.636
F(c3): (2*CP3)/(a3+p3) = (2*5)/(10+8) = 10/18 = 0.556
Overall F-measure (macro average): (1/number of classes) × (sum of all classes' F) = (1/3)(1 + 0.636 + 0.556) = 0.731
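
A minimal sketch (Python with scikit-learn) that rebuilds the orange/apple/plum table from label arrays with the same counts and recomputes the per-class precision, recall and F-measure, plus the macro-averaged overall F:

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

labels = ["orange", "apple", "plum"]
# Arrays chosen so the pairwise counts match the contingency table above.
actual    = ["orange"] * 10 + ["apple"] * 7 + ["plum"] * 5 + ["apple"] * 3 + ["plum"] * 5
predicted = ["orange"] * 10 + ["apple"] * 7 + ["apple"] * 5 + ["plum"] * 3 + ["plum"] * 5

# Note: sklearn puts actual classes in rows and predictions in columns
# (the transpose of the table above, which has predictions in rows).
print(confusion_matrix(actual, predicted, labels=labels))

precision, recall, f1, _ = precision_recall_fscore_support(actual, predicted, labels=labels)
print("precision:", precision.round(3))           # [1.    0.583 0.625]
print("recall:   ", recall.round(3))              # [1.    0.7   0.5  ]
print("F:        ", f1.round(3))                  # [1.    0.636 0.556]
print("overall (macro) F:", f1.mean().round(3))   # 0.731
```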
Example: Binary Classification Models Comparison
Analysis & Interpretation:
• The overall accuracy and error rate of the models are summarized in the accuracy and error metrics. In general, model C is the most accurate, followed by model B and then model A.
• The metrics also assess how well the models specifically predict positives, with model B performing the best based on the sensitivity score.
• Model C has the highest specificity score, indicating that this model is the best of the three at predicting negatives.

Note: these different metrics are used in different situations, depending on the goal of the specific project.
Generalization and Overfitting
• Generalization is the property of a model whereby the model applies to data that were not used to build it.
• Models are applied not just to the exact training set but to the general population (including data beyond the training set) from which the training data came.
• Overfitting is the tendency of data mining procedures to tailor models to the training data, at the expense of generalization to previously unseen data points.
• There is no single choice or procedure that will eliminate overfitting. The best strategy is to recognize overfitting and manage complexity in a principled way.
• The accuracy of a model depends on how complex we allow it to be. A model can be complex in different ways.
• Generally, there will be more overfitting as one allows the model to become more complex.
Underfitting and Overfitting
[Figure: training and validation error versus the number of nodes (model complexity); the underfitting region, the good compromise, and the overfitting region are marked.]

• Underfitting: when the model is too simple, both training and test errors are large.
• Overfitting: the test error rate begins to increase as the training error rate continues to decrease.
Over-fitting and Under-fitting
• Under-fitting
  • When the model is too simple, both training and validation errors/misclassification are large
• Over-fitting
  • Occurs when the learned function performs well on the data used during training but poorly on new data (i.e. validation)
  • Validation error rate begins to increase as the training error rate continues to decrease
  • An issue for all modelling algorithms
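
A minimal sketch (Python with scikit-learn, a sample dataset and an assumed 70:30 partition) of the under-/over-fitting pattern: training error keeps falling as tree depth grows, while validation error eventually turns back up:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.3, random_state=0)

for depth in range(1, 11):                      # model complexity = maximum tree depth
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    train_err = 1 - tree.score(X_tr, y_tr)      # training misclassification rate
    valid_err = 1 - tree.score(X_va, y_va)      # validation misclassification rate
    print(f"depth={depth:2d}  train error={train_err:.3f}  validation error={valid_err:.3f}")
```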

More Training ≠ Performance

Decision Tree Model 1: Partition = 50:50, Misclassification = 0.428
Decision Tree Model 2: Partition = 70:30, Misclassification = 0.417
Decision Tree Model 3: Partition = 90:10, Misclassification = 0.432

Complexity ≠ Performance

Model A: ASE = 11.311 (Validation)
Model B: ASE = 5.702 (Validation)

Complexity and Performance Evaluation
• Choose the right metrics
  • Different modelling techniques may have different performance metrics:
    • R-squared (e.g. linear regressions)
    • ASE for estimation (interval target)
    • Confusion matrix, ROC, misclassification/accuracy for classification (nominal target)
• Different modelling techniques may exhibit different characteristics of complexity:
  • Decision trees: depth/branches, leaf size
  • Regressions: number of attributes

Adjustments for Improving Model Performance/Complexity (and Reducing Bias)

Data preparation (two of these adjustments are sketched after the Modeling list below):
• Detect highly correlated attributes, i.e. collinearity
• Data transformation on variables with high data sparsity / outliers
• Imputation – replacing missing values
• Data deletion – missing values / outliers / duplication
• Variable importance / selection
• Replacing misclassified values
NOTE: Do not be surprised if you do not get better performance after applying certain data preparation techniques!
Modeling
• Different modelling techniques have different test settings
• Variable selection methods (backward/forward/stepwise)
• Tree pruning (for decision tree)
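
A minimal sketch (Python with pandas/scikit-learn, hypothetical column names and values) of two of the data preparation adjustments above: flagging highly correlated attribute pairs and imputing missing values:

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data with missing values.
df = pd.DataFrame({
    "income":  [50, 60, np.nan, 80, 75],
    "expense": [48, 59, 62, 79, np.nan],
    "age":     [25, 32, 41, np.nan, 52],
})

# Collinearity check: report attribute pairs whose absolute correlation exceeds 0.9.
corr = df.corr().abs()
for i, col_a in enumerate(corr.columns):
    for col_b in corr.columns[i + 1:]:
        if corr.loc[col_a, col_b] > 0.9:
            print(f"highly correlated: {col_a} vs {col_b} ({corr.loc[col_a, col_b]:.2f})")

# Imputation: replace missing values with the column mean (median/mode are alternatives).
imputed = pd.DataFrame(SimpleImputer(strategy="mean").fit_transform(df), columns=df.columns)
print(imputed)
```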
Model Presentation and Interpretation
Model presentation and explanation
• Decision tree modelling technique:
  o Present the complete rules or tree of the decision tree
  o Explain the best rule for a positive and a negative target:
    ▪ the rule description
    ▪ an explanation of the positive and negative target purity ratios
• Regression modelling technique:
  o Present the regression formula
  o Interpret the regression formula
Model complexity
• Present the number of input attributes used as predictors in the selected model
• Present the names of the input attributes included in the selected model
Model performance
• Indicate whether accuracy (classification) or estimation error is reported
• Present the value of the measurement used for model evaluation
• Indicate that the "validation" partition is used as the model selection criterion
