Chapter 5: Mind Map: Mathematical Functions

The document discusses overfitting and how to avoid it. It notes that overfitting occurs when a model memorizes the training data and does not generalize well to new data. Specifically, it can happen when a model becomes too complex, such as a decision tree with too many nodes. The document recommends measuring accuracy on both the training and test sets to evaluate overfitting. It also suggests techniques like cross-validation and pruning decision trees to find the right balance between model complexity and accuracy.

Uploaded by Amir Rashidee

Mathematical Functions
- Linear model: assumes classes are distinct and separable (e.g., the Iris model from an earlier chapter)
- Better fit = more attributes: adding more xi's makes the function more complex
- wi = a learned parameter (the weight on attribute xi)
- More complexity allows for flexibility when searching the data
- Patterns that do not generalize: over-fitting
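A linear classification function of this form can be sketched in a few lines. The weights below are illustrative values, not parameters learned from real data:

```python
# Minimal sketch of a linear classification function f(x) = w0 + sum(wi * xi).
# The weights are made up for illustration, not learned from data.

def linear_score(x, w, w0):
    """Weighted sum of the attributes; each wi is a learned parameter."""
    return w0 + sum(wi * xi for wi, xi in zip(w, x))

def classify(x, w, w0):
    """Separable classes: the decision boundary is the line f(x) = 0."""
    return "positive" if linear_score(x, w, w0) >= 0 else "negative"

# Adding more attributes (more xi's, hence more wi's) makes the function
# more complex and more flexible -- and more prone to over-fitting.
print(classify([2.0, 1.0], w=[1.5, -1.0], w0=-1.0))  # score 1.0 -> "positive"
```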

Over-fitting and its Avoidance

Generalization
- Performance on previously unseen data
- Measure accuracy on both the training set and the test set
- A model that does not fit other data is over-fit
- Sweet spot: the complexity just before the model starts to over-fit

Over-fitting in Tree Induction
- Growing trees until the leaves are pure is how to over-fit: keep sectioning the data to get "pure" leaves
- If a leaf is not pure: estimate based on the average
- Number of nodes = complexity of the tree
- The memorize-and-look-up approach = the table model: it memorizes the training data and doesn't generalize
- If it fails on new data, more realistic models will fail there too
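The table model mentioned above is easy to sketch: memorize every training example and predict by exact lookup. The tiny dataset below is invented purely to show the train/test accuracy gap:

```python
# Sketch of the "table model": memorize the training set, predict by lookup.
# Perfect on training data, poor on unseen data -- over-fitting in its purest form.
# The examples are made up for illustration.

def train_table_model(examples):
    """'Training' is just storing every (attributes, label) pair."""
    return {tuple(x): y for x, y in examples}

def predict(table, x, default="no"):
    """Exact lookup; fall back to a default for unseen attribute values."""
    return table.get(tuple(x), default)

def accuracy(table, examples):
    return sum(predict(table, x) == y for x, y in examples) / len(examples)

train = [([1, 0], "yes"), ([0, 1], "no"), ([1, 1], "yes")]
test  = [([0, 0], "yes"), ([1, 0], "yes")]

table = train_table_model(train)
print(accuracy(table, train))  # 1.0 -- the training set is memorized
print(accuracy(table, test))   # 0.5 -- no generalization to unseen data
```

Measuring accuracy on both sets, as the notes recommend, is exactly what exposes the gap.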

Overfitting
- The tendency to tailor models to the training data, at the expense of generalization
- All data-mining models could and do, do this: all models are susceptible to over-fitting effects
- Recognize and manage it in a principled way, based on how complex you allow the model to be
- Why is it bad? The model will pick up harmful spurious correlations (e.g., in the churn data-set)
- So you must mistrust accuracy measured on the training set

Fitting Graph
- Shows accuracy as a function of model complexity
- Over-fitting increases when you allow more flexibility

Generalization Performance
- Estimated performance on all data, not just the training set
- Estimated by comparing predicted values with hidden (held-out) true values
- Cross-validation: a more sophisticated estimate
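Cross-validation can be sketched without any library: split the data into k folds, hold each fold out in turn, train on the rest, and average the held-out accuracy. The "model" here is a trivial majority-class predictor and the data is invented, just to show the estimation loop:

```python
# Minimal k-fold cross-validation sketch. The learner is a trivial
# majority-class predictor; the point is the hold-out estimation loop.

def k_folds(data, k):
    """Split data into k roughly equal folds by striding."""
    return [data[i::k] for i in range(k)]

def majority_class(examples):
    labels = [y for _, y in examples]
    return max(set(labels), key=labels.count)

def cross_validate(data, k=3):
    folds = k_folds(data, k)
    scores = []
    for i in range(k):
        test = folds[i]                                   # held-out fold
        train = [ex for j, f in enumerate(folds) if j != i for ex in f]
        prediction = majority_class(train)                # "train" the model
        scores.append(sum(y == prediction for _, y in test) / len(test))
    return sum(scores) / k                                # average held-out accuracy

# Made-up data: 8 "yes" examples followed by 4 "no" examples.
data = [([i], "yes" if i < 8 else "no") for i in range(12)]
print(cross_validate(data, k=3))
```

Because every fold is held out exactly once, the averaged score estimates generalization performance rather than training-set fit.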

Avoidance
- Tree induction: stop growing the tree before it gets too complex, or grow it until it is too large, then prune it back
- Estimate the generalization performance of each model
- Find the right balance between complexity and accuracy
- Equations: parameter optimization
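The balance-finding step above can be sketched as a sweep over a complexity knob, picking the sweet spot where estimated generalization performance peaks. The two accuracy curves are made-up stand-ins for a real learner and a real hold-out estimate, and all three function names are illustrative:

```python
# Sketch of "find the right balance": sweep model complexity, estimate each
# model's generalization performance, pick the sweet spot before over-fitting.
# Both accuracy curves are invented for illustration, not measured.

def train_accuracy(complexity):
    """Training accuracy keeps rising with complexity (memorization)."""
    return min(1.0, 0.6 + 0.05 * complexity)

def holdout_accuracy(complexity):
    """Held-out accuracy peaks, then falls once the model over-fits."""
    return 0.6 + 0.05 * complexity - 0.01 * max(0, complexity - 4) ** 2

def pick_sweet_spot(max_complexity=10):
    """Choose the complexity with the best estimated generalization."""
    return max(range(1, max_complexity + 1), key=holdout_accuracy)

# A text-mode fitting graph: training vs. held-out accuracy by complexity.
for c in range(1, 11):
    print(c, round(train_accuracy(c), 2), round(holdout_accuracy(c), 2))
print("sweet spot:", pick_sweet_spot())
```

The same loop works whether the knob is tree depth, node count, or the number of attributes in an equation; only the estimator of generalization performance (e.g., cross-validation) needs to be real.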
