Memory Card: Bias-Variance Tradeoff, Ridge & Lasso, and Regularization
1. BIAS-VARIANCE TRADEOFF (ISLP 2.2.2)
Concept:
In supervised ML, the expected test error decomposes as Bias^2 + Variance + Irreducible Error (written out below).
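For a test point x0, ISLP 2.2.2 states the same decomposition as (notation only, nothing new):

```latex
E\left[(y_0 - \hat{f}(x_0))^2\right]
  = \mathrm{Var}\big(\hat{f}(x_0)\big)
  + \big[\mathrm{Bias}\big(\hat{f}(x_0)\big)\big]^2
  + \mathrm{Var}(\varepsilon)
```

Var(epsilon) is the irreducible error: even the best possible model cannot beat it on average.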
Trade-off:
- High Bias = Underfitting (too simple)
- High Variance = Overfitting (too complex)
- Goal: Balance both so the model generalizes to new data (illustrated in the code sketch after the analogy).
Analogy:
Juicer Machine:
- High Bias = one-speed juicer (same setting for every fruit, mediocre on all of them)
- High Variance = over-tuned juicer (readjusts to every quirk of each fruit, inconsistent output)
- Good Model = adaptive juicer (adjusts just enough to get good juice from most fruits)
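A minimal sketch of the tradeoff, assuming scikit-learn and NumPy are available; the noisy sine data and the polynomial degrees (1, 4, 15) are illustrative choices, not from ISLP:

```python
# Underfit (degree 1), reasonable (degree 4), and overfit (degree 15) polynomials
# on noisy data drawn from y = sin(x) + noise.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=200)  # signal + irreducible noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # Degree 1 underfits (high bias); degree 15 drives train MSE down but
    # typically has the worst test MSE (high variance); degree 4 balances both.
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")
```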
2. RIDGE & LASSO REGRESSION (ISLP 6.2-6.2.2)
Problem:
OLS coefficient estimates become unstable (high variance) when there are many predictors or the predictors are highly correlated; with more predictors than observations, OLS has no unique solution.
Solution: Regularization = Penalize large coefficients.
A. Ridge Regression (L2):
- Penalty: lambda times sum(coefficients^2)
- Shrinks all coefficients toward zero but never exactly to zero, so every feature is kept (see the sketch below).
Analogy: Compress clothes for travel, take all but tighter.
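A minimal ridge sketch, assuming scikit-learn; the synthetic data and the alpha value are illustrative, and sklearn's alpha plays the role of lambda:

```python
# Ridge (L2): coefficients shrink toward zero, but none become exactly zero.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.datasets import make_regression

# Synthetic data where only 5 of 20 features actually matter (illustrative only).
X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

ridge = Ridge(alpha=10.0)  # alpha is sklearn's name for the lambda penalty weight
ridge.fit(X, y)
print("non-zero coefficients:", np.sum(ridge.coef_ != 0))  # all 20 features kept
```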
B. Lasso Regression (L1):
- Penalty: lambda times sum of absolute values of coefficients
- Can shrink some coefficients exactly to 0, so it also performs feature selection (see the sketch below).
Analogy: Airline fee per item, take only essentials.
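A matching lasso sketch on the same kind of synthetic data (again an illustrative alpha, not a recommended value):

```python
# Lasso (L1): some coefficients become exactly zero, dropping those features.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0, max_iter=10_000)  # alpha again plays the role of lambda
lasso.fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0))  # typically close to 5
```

In practice lambda/alpha is chosen by cross-validation, e.g. with sklearn's RidgeCV or LassoCV.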
Comparison:
Aspect | Ridge (L2) | Lasso (L1)
------------- | ------------------ | ----------------------
Penalty | Sum of squares | Sum of absolute values
Coefficients | Shrink only | Shrink and zero-out
Use Case | Many relevant vars | Sparse selection
3. REGULARIZATION IN MACHINE LEARNING
What:
Regularization means adding a penalty term to the loss function that discourages large coefficients (overly complex models).
Why:
- Prevents overfitting
- Improves generalization
Types:
- L1 (Lasso): zeros out irrelevant features
- L2 (Ridge): shrinks but retains all
- ElasticNet: a weighted mix of the L1 and L2 penalties (see the sketch below)
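A minimal ElasticNet sketch, assuming scikit-learn; the alpha and l1_ratio values are illustrative:

```python
# ElasticNet: l1_ratio mixes the two penalties
# (l1_ratio=1.0 is pure lasso, l1_ratio=0.0 is pure ridge).
from sklearn.linear_model import ElasticNet
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=100, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

enet = ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=10_000)  # half L1, half L2
enet.fit(X, y)
print("non-zero coefficients:", (enet.coef_ != 0).sum())
```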
Analogy:
Model training is like a gym workout; regularization is the trainer who keeps you from overdoing it and getting hurt (overfitting).
KEY TAKEAWAYS
Concept | Insight
-------------- | -------------------------------------------
Bias-Variance | Balance model complexity
Ridge | Shrink but keep all features
Lasso | Shrink and select features
Regularization | Controls model complexity, improves generalization