Model Generalization
Legal Notices and Disclaimers
This presentation is for informational purposes only. INTEL MAKES NO WARRANTIES,
EXPRESS OR IMPLIED, IN THIS SUMMARY.
Intel technologies’ features and benefits depend on system configuration and may require
enabled hardware, software or service activation. Performance varies depending on system
configuration. Check with your system manufacturer or retailer or learn more at intel.com.
This sample source code is released under the Intel Sample Source Code License Agreement.
Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and/or other countries.
*Other names and brands may be claimed as the property of others.
Copyright © 2017, Intel Corporation. All rights reserved.
K Value Affects Decision Boundary
[Figure: two KNN decision boundaries on the same data, plotting Age (20-60) against Number of Malignant Nodes (0-20); one panel with K = 1, the other with K = 34]
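A sketch (not from the slides) of how K changes a KNN classifier's behavior; the data below is synthetic, standing in for the age and malignant-node features, and the label rule is invented:

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.uniform(low=[0, 20], high=[20, 60], size=(200, 2))  # columns: nodes, age (made up)
y = (X[:, 0] > 8).astype(int)                               # synthetic label rule

for k in (1, 34):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    # K = 1 memorizes the training points (training accuracy 1.0);
    # K = 34 averages over many neighbors, giving a smoother boundary
    print(k, knn.score(X, y))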
Choosing Between Different Complexities
[Figure: three panels fitting noisy samples of a true function with polynomial models of degree 1, 4, and 15; legend: Model, True Function, Samples]
How Well Does the Model Generalize?
[Figure: the same three polynomial fits, annotated:]
• Polynomial Degree = 1: poor at training, poor at predicting
• Polynomial Degree = 4: just right
• Polynomial Degree = 15: good at training, poor at predicting
Underfitting vs Overfitting
[Figure: the same three polynomial fits, annotated:]
• Polynomial Degree = 1: underfitting
• Polynomial Degree = 4: just right
• Polynomial Degree = 15: overfitting
Bias-Variance Tradeoff
[Figure: the same three polynomial fits, annotated:]
• Polynomial Degree = 1: high bias, low variance
• Polynomial Degree = 4: just right
• Polynomial Degree = 15: low bias, high variance
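A sketch of the degree comparison using numpy.polyfit; the true function and noise level are assumptions chosen to mimic the figure. The pattern to look for: degree 1 gives high error on both sets, degree 4 gives low error on both, and degree 15 typically gives low training error but a much higher error on fresh samples:

import numpy as np

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 30))
y = np.cos(1.5 * np.pi * x) + rng.normal(0, 0.1, 30)         # assumed true function + noise

x_new = np.sort(rng.uniform(0, 1, 30))                       # fresh samples to test generalization
y_new = np.cos(1.5 * np.pi * x_new) + rng.normal(0, 0.1, 30)

for degree in (1, 4, 15):
    coefs = np.polyfit(x, y, degree)                         # degree 15 may warn: poorly conditioned
    train_mse = np.mean((np.polyval(coefs, x) - y) ** 2)
    test_mse = np.mean((np.polyval(coefs, x_new) - y_new) ** 2)
    print(f"degree={degree:2d}  train MSE={train_mse:.3f}  test MSE={test_mse:.3f}")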
Training and Test Splits
[Diagram: the data is divided into Training Data and Test Data]
Using Training and Test Data
• Training Data: fit the model
• Test Data: measure performance
  - predict label with model
  - compare with actual value
  - measure error
Using Training and Test Data
[Figure: side-by-side scatter plots of the Training Data and Test Data (axes in units of 10^8), shown in three steps: fit the model on the training data, make predictions on the test data, then measure the error between predictions and the test values]
Fitting Training and Test Data
[Diagram: the Training Data supplies X_train and Y_train; the Test Data supplies X_test and Y_test]
model = KNN().fit(X_train, Y_train)            # fit the model on the training data
Y_predict = model.predict(X_test)              # predict labels for the test features
test_error = error_metric(Y_test, Y_predict)   # compare predictions with Y_test
Train and Test Splitting: The Syntax
Import the train and test split function:
from sklearn.model_selection import train_test_split
Split the data and put 30% into the test set:
train, test = train_test_split(data, test_size=0.3)
Other method for splitting data:
from sklearn.model_selection import ShuffleSplit
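In practice the features and the labels are usually split together; a minimal sketch with made-up arrays:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(20).reshape(10, 2)   # 10 samples, 2 features (made up)
y = np.arange(10)

# random_state makes the shuffle reproducible; 30% of the rows go to the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(X_train.shape, X_test.shape)   # (7, 2) (3, 2)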
Beyond a Single Test Set: Cross Validation
[Diagram: the data is divided into Training Data and Validation Data]
[Figure: training and test scatter plots (axes in units of 10^8) with the caption "Best model for this test set": a model tuned against one fixed split may be the best model only for that particular test set]
Beyond a Single Test Set: Cross Validation
[Diagram: the split is repeated, producing Training Data 1-4 and Validation Data 1-4, each round holding out a different portion of the data]
Beyond a Single Test Set: Cross Validation
[Diagram: 4-fold cross validation; each of the four rows splits the data into three Training Splits and one Test Split, with the Test Split rotating through all four positions; the per-fold results are added up and averaged]
Average cross validation results.
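This fold-and-average loop can be written out directly with scikit-learn's KFold; a sketch in which the data and the KNN regressor are stand-ins:

import numpy as np
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.1, size=100)   # made-up linear target + noise

scores = []
for train_idx, test_idx in KFold(n_splits=4, shuffle=True, random_state=0).split(X):
    model = KNeighborsRegressor(n_neighbors=3).fit(X[train_idx], y[train_idx])
    y_pred = model.predict(X[test_idx])
    scores.append(np.mean((y_pred - y[test_idx]) ** 2))   # per-fold MSE

print(np.mean(scores))   # average cross validation result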
Model Complexity vs Error
[Figure: error plotted against model complexity; $J_{cv}(\theta)$ is the cross validation error, $J_{train}(\theta)$ is the training error; the training error falls steadily with complexity while the cross validation error falls and then rises]
Model Complexity vs Error
Underfitting: training and cross validation error are both high.
[Figure: the degree-1 polynomial fit; legend: Model, True Function, Samples]
Model Complexity vs Error
Overfitting: training error is low, cross validation error is high.
[Figure: the degree-15 polynomial fit; legend: Model, True Function, Samples]
Model Complexity vs Error
Just right: training and cross validation errors are both low.
[Figure: the degree-4 polynomial fit; legend: Model, True Function, Samples]
Cross Validation: The Syntax
Import the cross validation score function:
from sklearn.model_selection import cross_val_score
Perform cross-validation with a given model:
cross_val = cross_val_score(KNN, X_data, y_data, cv=4,
                            scoring='neg_mean_squared_error')
Other methods for cross validation:
from sklearn.model_selection import KFold, StratifiedKFold
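Note that KNN must be an instantiated estimator, and that 'neg_mean_squared_error' reports scores as negative MSE so that larger is always better. A usage sketch with made-up data:

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_data = rng.normal(size=(100, 2))
y_data = X_data @ np.array([1.5, -2.0]) + rng.normal(scale=0.1, size=100)

KNN = KNeighborsRegressor(n_neighbors=3)           # an estimator instance
cross_val = cross_val_score(KNN, X_data, y_data, cv=4,
                            scoring='neg_mean_squared_error')
mse_per_fold = -cross_val                          # flip the sign back to MSE
print(mse_per_fold.mean())                         # average cross validation error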
Introduction to Linear Regression

$y_\beta(x) = \beta_0 + \beta_1 x$

where $y$ is the box office revenue, $x$ is the movie budget, and $\beta_0$, $\beta_1$ are the coefficients. For the example fit, $\beta_0 = 80$ million and $\beta_1 = 0.6$.
[Figure: scatter plot of BoxOffice vs. Budget, both in units of $10^8$, with a fitted line]
Predicting from Linear Regression

$y_\beta(x) = \beta_0 + \beta_1 x$, with $\beta_0 = 80$ million and $\beta_1 = 0.6$

Predict a 176 million gross for a 160 million budget: $80 + 0.6 \times 160 = 176$ million.
[Figure: reading the prediction off the fitted line]
Which Model Fits the Best?
[Figure: candidate fits to the BoxOffice vs. Budget data]
Calculating the Residuals

$y_\beta(x_{obs}^{(i)}) - y_{obs}^{(i)} = \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)}$

where $y_\beta(x_{obs}^{(i)})$ is the predicted value and $y_{obs}^{(i)}$ is the observed value.
[Figure: vertical distances between the fitted line and the observed points]
Mean Squared Error

$\frac{1}{m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$

[Figure: the BoxOffice vs. Budget data with a fitted line]
Minimum Mean Squared Error

$\min_{\beta_0, \beta_1} \frac{1}{m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$

[Figure: the line that minimizes the mean squared error]
Cost Function

$J(\beta_0, \beta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( \beta_0 + \beta_1 x_{obs}^{(i)} - y_{obs}^{(i)} \right)^2$

[Figure: the BoxOffice vs. Budget data with the best-fit line]
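The cost function translates directly into numpy; a sketch using made-up budget and gross figures (in units of 10^8 dollars):

import numpy as np

x_obs = np.array([0.2, 0.5, 1.0, 1.6])   # budgets (made up)
y_obs = np.array([0.9, 1.1, 1.5, 1.8])   # grosses (made up)

def cost(beta0, beta1):
    # J(beta0, beta1) = (1 / 2m) * sum((beta0 + beta1 * x - y)^2)
    residuals = beta0 + beta1 * x_obs - y_obs
    return (residuals ** 2).sum() / (2 * len(x_obs))

print(cost(0.8, 0.6))   # cost at beta0 = 0.8e8, beta1 = 0.6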
Modelling Best Practice
• Use the cost function to fit the model
• Develop multiple models
• Compare results and choose the best one
Other Measures of Error

Sum of Squared Error (SSE): $\sum_{i=1}^{m} \left( y_\beta(x^{(i)}) - y_{obs}^{(i)} \right)^2$

Total Sum of Squares (TSS): $\sum_{i=1}^{m} \left( \bar{y}_{obs} - y_{obs}^{(i)} \right)^2$

Coefficient of Determination ($R^2$): $1 - \frac{SSE}{TSS}$
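These three metrics are a few lines of numpy; the sketch below uses made-up observed and predicted values and checks the hand-computed $R^2$ against scikit-learn's r2_score:

import numpy as np
from sklearn.metrics import r2_score

y_obs = np.array([1.0, 2.0, 3.0, 4.0])        # observed values (made up)
y_pred = np.array([1.1, 1.9, 3.2, 3.8])       # model predictions (made up)

sse = ((y_pred - y_obs) ** 2).sum()           # Sum of Squared Error
tss = ((y_obs.mean() - y_obs) ** 2).sum()     # Total Sum of Squares
r2 = 1 - sse / tss
print(r2, r2_score(y_obs, y_pred))            # the two values agree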
Comparing Linear Regression and KNN

Linear Regression:
• Fitting involves minimizing a cost function (slow)
• Model has few parameters (memory efficient)
• Prediction involves a calculation (fast)

K Nearest Neighbors:
• Fitting involves storing the training data (fast)
• Model has many parameters (memory intensive)
• Prediction involves finding the closest neighbors (slow)
Linear Regression: The Syntax
Import the class containing the regression method:
from sklearn.linear_model import LinearRegression
Create an instance of the class:
LR = LinearRegression()
Fit the instance on the data and then predict the expected value:
LR = LR.fit(X_train, y_train)
y_predict = LR.predict(X_test)
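Putting the pieces together, a sketch on synthetic data standing in for the budget and box-office example (the true coefficients and the noise are made up):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0.1, 2.0, size=(100, 1))            # budgets, units of 1e8 (made up)
y = 0.8 + 0.6 * X[:, 0] + rng.normal(0, 0.1, 100)   # gross = beta0 + beta1 * budget + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
LR = LinearRegression().fit(X_train, y_train)
y_predict = LR.predict(X_test)
print(LR.intercept_, LR.coef_)                      # recovered beta0, beta1
print(mean_squared_error(y_test, y_predict))        # test error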
Advanced Linear Regression
Scaling is a Type of Feature Transformation
[Figure: two scatter plots of Age against Number of Surgeries (1-5) drawn at different axis scales, showing that the apparent spread of the data depends on the units of each feature]
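A sketch of the two standard scalers on made-up age and surgery-count columns (the scaler classes are from scikit-learn; the data is invented):

import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Made-up feature columns: age (years) and number of surgeries
X = np.array([[25., 1.], [40., 3.], [60., 5.], [33., 2.]])

# Standard scaling: zero mean, unit variance per column
print(StandardScaler().fit_transform(X))

# Min-max scaling: each column rescaled to [0, 1]
print(MinMaxScaler().fit_transform(X))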
Transformation of Data Distributions
• Predictions from linear regression models assume residuals are normally distributed
• Features and predicted data are often skewed
• Data transformations can solve this issue
from numpy import log, log1p
from scipy.stats import boxcox
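A sketch applying the imports above to made-up right-skewed data:

import numpy as np
from scipy.stats import boxcox

rng = np.random.default_rng(0)
skewed = rng.lognormal(mean=0.0, sigma=1.0, size=1000)   # right-skewed data (made up)

logged = np.log1p(skewed)    # log(1 + x): handles zeros, pulls in the long tail
bc, lam = boxcox(skewed)     # Box-Cox chooses the power transform lambda by MLE
print(lam)                   # lambda near 0 behaves like a log transform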
Feature Types
• Continuous: numerical values
• Nominal: categorical, unordered features (True or False)
• Ordinal: categorical, ordered features (movie ratings)

Feature Type Transformation
• Continuous → Standard Scaling, Min-Max Scaling
• Nominal → One-hot encoding (0, 1)
• Ordinal → Ordinal encoding (0, 1, 2, 3)
from sklearn.preprocessing import LabelEncoder, LabelBinarizer, OneHotEncoder
from sklearn.feature_extraction import DictVectorizer
from pandas import get_dummies
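A sketch of one-hot and ordinal encoding using pandas (the columns and the rating order are invented for illustration):

import pandas as pd

df = pd.DataFrame({
    "is_sequel": [True, False, True],    # nominal feature
    "rating": ["PG", "R", "PG-13"],      # ordinal feature, ordering assumed below
})

# One-hot encode the nominal column: one 0/1 column per category
print(pd.get_dummies(df["is_sequel"], prefix="is_sequel"))

# Ordinal encode the ordered column by mapping categories to 0, 1, 2
order = {"PG": 0, "PG-13": 1, "R": 2}
print(df["rating"].map(order))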
Addition of Polynomial Features
• Capture higher order features of the data by adding polynomial features
• "Linear regression" means linear combinations of features

$y_\beta(x) = \beta_0 + \beta_1 x + \beta_2 x^2$
$y_\beta(x) = \beta_0 + \beta_1 x + \beta_2 x^2 + \beta_3 x^3$
$y_\beta(x) = \beta_0 + \beta_1 \log(x)$

[Figure: curved fits to the BoxOffice vs. Budget data]

Addition of Polynomial Features
• Can also include variable interactions:
$y_\beta(x) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
• How is the correct functional form chosen? Check the relationship of each variable with the outcome.
Polynomial Features: The Syntax
Import the class containing the transformation method:
from sklearn.preprocessing import PolynomialFeatures
Create an instance of the class:
polyFeat = PolynomialFeatures(degree=2)
Create the polynomial features and then transform the data:
polyFeat = polyFeat.fit(X_data)
X_poly = polyFeat.transform(X_data)
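A usage sketch chaining the transform with LinearRegression in a pipeline; the quadratic target is made up:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X_data = rng.uniform(-2, 2, size=(100, 1))
y_data = 1.0 + 0.5 * X_data[:, 0] + 2.0 * X_data[:, 0] ** 2   # assumed quadratic target

# degree=2 adds the columns [1, x, x^2]; the regression stays linear in those features
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_data, y_data)
print(model.predict([[1.0]]))   # approx 1.0 + 0.5 + 2.0 = 3.5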