2.1 Linear Regression
REGRESSION
Linear regression is one of the easiest and most
popular Machine Learning algorithms.
WHAT IS LINEAR REGRESSION?
Linear regression makes predictions for continuous/real or numeric variables.
Common uses of regression analysis are:
1. Predicting share price
2. Analyzing the impact of price changes
LINEAR REGRESSION
Linear regression shows the linear relationship, which means it finds how the value of the dependent variable (y) changes according to the value of the independent variable (x).
The linear regression model provides a sloped
straight line representing the relationship
between the variables.
Mathematically, we can represent a linear regression as:

y = a0 + a1·x + ε

Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
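As a quick sketch (not from the original slides) of how a0 and a1 can be estimated from data, the sample values below are made up, and np.polyfit is just one convenient way to fit a degree-1 least-squares line:

import numpy as np

# Made-up sample data: x = independent variable, y = dependent variable
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])

# np.polyfit with degree 1 returns [a1, a0] for the least-squares line y = a1*x + a0
a1, a0 = np.polyfit(x, y, 1)
print(f"a0 (intercept) = {a0:.3f}, a1 (coefficient) = {a1:.3f}")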
TYPES OF LINEAR REGRESSION
Simple Linear Regression:
If a single independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Simple Linear
Regression.
Multiple Linear Regression:
If more than one independent variable is used to predict the value of a numerical
dependent variable, then such a Linear Regression algorithm is called Multiple Linear
Regression.
LINEAR REGRESSION LINE
Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive linear relationship.

Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.
COST FUNCTION
Goal: find the best-fit line, meaning the error between predicted values and actual values should be minimized. The best-fit line will have the least error.
Different values for the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best-fit line. To calculate this we use a cost function.
For Linear Regression, we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and actual values. It can be written as:

MSE = (1/N) Σ (yi - (a1·xi + a0))²

Where,
N = total number of observations
yi = actual value
(a1·xi + a0) = predicted value
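A minimal Python version of this cost function (the function and variable names are ours, not from the slides):

import numpy as np

def mse(x, y, a0, a1):
    # Average squared difference between actual values y and
    # predicted values (a1*xi + a0)
    y_pred = a1 * x + a0
    return np.mean((y - y_pred) ** 2)

print(mse(np.array([1., 2., 3.]), np.array([2., 4., 6.]), 0.0, 2.0))  # 0.0 for a perfect fit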
GRADIENT DESCENT
Gradient descent is a method of updating a0 and a1 to reduce the cost function (MSE). The idea is that we start with some initial values for a0 and a1 and then change these values iteratively to reduce the cost: at each step, each parameter is moved a small amount (set by the learning rate) in the direction opposite to the gradient of the cost function.
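A minimal sketch of these updates in Python (the learning rate lr and epoch count are illustrative choices, not values from the slides):

import numpy as np

def gradient_descent(x, y, lr=0.01, epochs=1000):
    # Start from some initial values and update them iteratively
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        error = y - (a1 * x + a0)                 # residuals: actual - predicted
        a0 += lr * (2.0 / n) * error.sum()        # a0 := a0 - lr * dMSE/da0
        a1 += lr * (2.0 / n) * (x * error).sum()  # a1 := a1 - lr * dMSE/da1
    return a0, a1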
GOODNESS OF FIT
A common measure of goodness of fit for a linear regression model is the coefficient of determination, R², which gives the proportion of the variance in the dependent variable that is explained by the model. An R² close to 1 indicates a good fit.
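A sketch of computing R² directly from its definition (our own illustration, not from the slides):

import numpy as np

def r_squared(y, y_pred):
    # 1 - (residual sum of squares) / (total sum of squares)
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot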
MULTIVARIATE LINEAR REGRESSION
More than one independent variable is used to predict the value of a numerical dependent variable.
y = f(x, z)
y = m1·x + m2·z + c

y is the dependent variable, i.e. the variable that needs to be estimated and predicted.
x is the first independent variable, i.e. a variable that is controllable. It is the first input.
m1 is the slope of x. It determines the angle of the line with respect to x.
z is the second independent variable, i.e. a variable that is controllable. It is the second input.
m2 is the slope of z. It determines the angle of the line with respect to z.
c is the intercept: a constant that determines the value of y when x and z are 0.
MULTIVARIATE LINEAR REGRESSION
A model with two input variables can be expressed as:
y = β0 + β1·x1 + β2·x2
In the machine learning world, there can be many dimensions. A model with three input variables can be expressed as:
y = β0 + β1·x1 + β2·x2 + β3·x3
A generalized equation for the multivariate regression model is:
y = β0 + β1·x1 + β2·x2 + … + βn·xn
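One standard way to estimate β0 … βn from data is ordinary least squares; a minimal NumPy sketch with made-up data:

import numpy as np

# Made-up data: 5 observations of two features (x1, x2) and a target y
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([8.0, 7.0, 15.0, 14.0, 20.0])

# Prepend a column of ones so the first fitted coefficient is the intercept β0
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)
print("β0, β1, β2 =", beta)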
MULTIVARIATE LINEAR REGRESSION
When we have multiple features and we want to train a model that can predict the price given those features, we can use multivariate linear regression. The model has to learn the parameters (θ0 to θn) on the training dataset such that, if we want to predict the price for a house that has not been sold yet, it gives a prediction close to what the house will actually sell for.
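As an illustration of this workflow, here is a sketch using scikit-learn's LinearRegression; the house features and prices below are made up, not the training dataset from the slides:

import numpy as np
from sklearn.linear_model import LinearRegression

# Made-up training set: [size_sqft, bedrooms, age_years] -> sale price
X_train = np.array([[2100, 3, 10], [1600, 2, 25], [2400, 4, 5], [1400, 2, 30]])
y_train = np.array([400_000, 250_000, 480_000, 210_000])

model = LinearRegression().fit(X_train, y_train)

# Predict the price of a house that has not been sold yet
print(model.predict([[2000, 3, 12]]))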
COST FUNCTION AND GRADIENT DESCENT FOR MULTIVARIATE LINEAR REGRESSION
For m training examples and parameters θ0 … θn, the hypothesis is hθ(x) = θ0 + θ1·x1 + … + θn·xn, and the MSE cost function is:

J(θ) = (1/2m) Σ (hθ(x^(i)) - y^(i))²

Gradient descent updates all parameters simultaneously, with learning rate α:

θj := θj - α · ∂J(θ)/∂θj  (for all j at once)
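A vectorised sketch of the same idea in Python (function names and hyperparameters are our own choices):

import numpy as np

def cost(X, y, theta):
    # J(theta) = (1/2m) * sum((X @ theta - y)^2); X includes a column of ones
    m = len(y)
    residual = X @ theta - y
    return (residual @ residual) / (2 * m)

def gradient_descent(X, y, alpha=0.01, epochs=1000):
    # Simultaneous update of all theta_j; alpha and epochs are illustrative
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        theta -= (alpha / m) * (X.T @ (X @ theta - y))
    return theta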
PRACTICAL IDEAS FOR MAKING GRADIENT DESCENT WORK WELL
Use feature scaling to help gradient descent converge faster. Get every feature into roughly the -1 to +1 range. It doesn't have to be exactly the -1 to +1 range, but it should be close, as in the sketch below.
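A sketch of mean normalisation, one simple way to get features into roughly that range (the function name is ours):

import numpy as np

def scale_features(X):
    # Mean normalisation: (x - mean) / (max - min) puts every feature
    # roughly into the -1 to +1 range
    return (X - X.mean(axis=0)) / (X.max(axis=0) - X.min(axis=0))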
MODEL INTERPRETATION
y = -85090 + 102.85·x1 + 43.79·x2 + 1.52·x3 - 37.91·x4 + 908.12·x5 + 364.33·x6

x1: With all other predictors held constant, if x1 is increased by one unit, the average price increases by $102.85.
x2: With all other predictors held constant, if x2 is increased by one unit, the average price increases by $43.79.
x3: With all other predictors held constant, if x3 is increased by one unit, the average price increases by $1.52.
x4: With all other predictors held constant, if x4 is increased by one unit, the average price decreases by $37.91 (length has a negative coefficient).
x5: With all other predictors held constant, if x5 is increased by one unit, the average price increases by $908.12.
x6: With all other predictors held constant, if x6 is increased by one unit, the average price increases by $364.33.
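The "one unit, all else held constant" reading can be checked numerically: bumping one predictor by 1 changes the prediction by exactly that predictor's coefficient. A small sketch using the coefficients above (the predictor values are made up):

import numpy as np

# Coefficients from the fitted model above: intercept, then x1..x6
intercept = -85090
coefs = np.array([102.85, 43.79, 1.52, -37.91, 908.12, 364.33])

def predict(x):
    return intercept + coefs @ x

x = np.array([10.0, 5.0, 100.0, 20.0, 3.0, 2.0])  # made-up predictor values
x_bumped = x.copy()
x_bumped[0] += 1.0                                 # increase x1 by one unit

print(predict(x_bumped) - predict(x))              # prints ~102.85, the x1 coefficient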
REGULARISATION
PROBLEM OF OVERFITTING
Regularisation is a technique used to reduce errors by fitting the function appropriately on the given training set and avoiding overfitting.
REGULARISATION IN ML
L1 regularisation
L2 regularisation
A regression model which uses the L1 regularisation technique is called LASSO (Least Absolute Shrinkage and Selection Operator) regression; a model which uses L2 regularisation is called Ridge regression.
REGULARISATION
Lasso Regression adds the "absolute value of magnitude" of the coefficients as a penalty term to the loss function (L).
During regularisation the output function (ŷ) does not change; the change is only in the loss function.
LOSS FUNCTION
With λ as the regularisation strength, the regularised loss functions are:

Lasso (L1): L = Σ (yi - ŷi)² + λ Σ |βj|
Ridge (L2): L = Σ (yi - ŷi)² + λ Σ βj²
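A sketch of both penalties using scikit-learn (the data is made up, and alpha plays the role of λ above):

import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Made-up data
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0], [5.0, 5.0]])
y = np.array([8.0, 7.0, 15.0, 14.0, 20.0])

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty: can shrink coefficients exactly to zero
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty: shrinks coefficients toward zero

print("Lasso coefficients:", lasso.coef_)
print("Ridge coefficients:", ridge.coef_)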
PROBLEM
A clinical trial gave the following data for BMI and cholesterol level for 10 patients. Predict the likely value of cholesterol level for someone who has a BMI of 27.

BMI   Cholesterol
17    140
21    189
24    210
28    240
14    130
16    100
19    135
22    166
15    130
18    170
SOLUTION:
Mean BMI = 19.4, mean cholesterol = 161
Σ(x - mean(x))(y - mean(y)) = 1522, Σ(x - mean(x))² = 172.4
Slope: a1 = 1522 / 172.4 ≈ 8.83
Intercept: a0 = 161 - 8.83 × 19.4 ≈ -10.27
Predicted cholesterol for BMI = 27: y = -10.27 + 8.83 × 27 ≈ 228
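The hand calculation can be checked in Python with NumPy's least-squares fit:

import numpy as np

bmi = np.array([17, 21, 24, 28, 14, 16, 19, 22, 15, 18], dtype=float)
chol = np.array([140, 189, 210, 240, 130, 100, 135, 166, 130, 170], dtype=float)

a1, a0 = np.polyfit(bmi, chol, 1)   # least-squares slope and intercept
print(f"a1 = {a1:.2f}, a0 = {a0:.2f}")               # ~8.83 and ~-10.27
print(f"prediction for BMI 27: {a0 + a1 * 27:.1f}")  # ~228.1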