Linear Regression in Machine Learning
Linear regression is one of the easiest and most popular Machine Learning
algorithms. It is a statistical method that is used for predictive analysis.
Linear regression makes predictions for continuous/real or numeric
variables such as sales, salary, age, product price, etc.
The linear regression algorithm models a linear relationship between a
dependent variable (y) and one or more independent variables (x), hence
the name linear regression. In other words, it finds how the value of the
dependent variable changes according to the value of the independent
variable.
The linear regression model provides a sloped straight line representing
the relationship between the variables. Consider the image below:
Mathematically, we can represent a linear regression as:
y = a0 + a1x + ε
Here,
y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
The observed values of the x and y variables form the training dataset
for the linear regression model.
Types of Linear Regression
Linear regression can be further divided into two types of algorithm:
o Simple Linear Regression:
If a single independent variable is used to predict the value of a
numerical dependent variable, then such a Linear Regression
algorithm is called Simple Linear Regression.
o Multiple Linear regression:
If more than one independent variable is used to predict the value
of a numerical dependent variable, then such a Linear Regression
algorithm is called Multiple Linear Regression.
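Both variants can be fitted with ordinary least squares. Here is a minimal sketch (not from the original article; the toy numbers are made up) using NumPy:

```python
import numpy as np

# Simple linear regression: one independent variable (toy data)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 4.9, 7.2, 8.8])
a1, a0 = np.polyfit(x, y, 1)  # slope a1 and intercept a0 of the best-fit line

# Multiple linear regression: two independent variables, solved by
# least squares on the design matrix [1, x1, x2]
x2 = np.array([2.0, 1.0, 4.0, 3.0])
X = np.column_stack([np.ones_like(x), x, x2])
coef, _, _, _ = np.linalg.lstsq(X, y, rcond=None)  # coef = [a0, a1, a2]
```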
Linear Regression Line
A straight line showing the relationship between the dependent and
independent variables is called a regression line. A regression line can
show two types of relationship:
o Positive Linear Relationship:
If the dependent variable increases on the Y-axis and independent
variable increases on X-axis, then such a relationship is termed as a
Positive linear relationship.
o Negative Linear Relationship:
If the dependent variable decreases on the Y-axis and independent
variable increases on the X-axis, then such a relationship is called a
negative linear relationship.
Finding the best fit line:
When working with linear regression, our main goal is to find the best-fit
line, which means the error between the predicted values and actual values
should be minimized. The best-fit line will have the least error.
Different values for the weights or coefficients of the line (a0, a1) give
different regression lines, so we need to calculate the best values for
a0 and a1 to find the best-fit line. To do this, we use a cost function.
Cost function-
o Different values for the weights or coefficients of the line (a0, a1)
give different regression lines, and the cost function is used to
estimate the values of the coefficients for the best-fit line.
o The cost function optimizes the regression coefficients or weights. It
measures how well a linear regression model is performing.
o We can use the cost function to find the accuracy of the mapping
function, which maps the input variable to the output variable. This
mapping function is also known as Hypothesis function.
For Linear Regression, we use the Mean Squared Error (MSE) cost
function, which is the average of the squared errors between the
predicted values and actual values. For the above linear equation, MSE
can be calculated as:

MSE = (1/N) * Σ (yi - (a1xi + a0))²

Where,
N = total number of observations
yi = actual value
(a1xi + a0) = predicted value
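To make the formula concrete, here is a small sketch with made-up numbers that computes the MSE of a fitted line:

```python
import numpy as np

# Hypothetical coefficients and toy data (not from the article)
a0, a1 = 1.0, 2.0
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.5, 4.5, 7.5])     # actual values yi

y_pred = a1 * x + a0              # predicted values (a1*xi + a0)
mse = np.mean((y - y_pred) ** 2)  # average of the squared errors
```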
Residuals: The distance between the actual values and predicted values is
called the residual. If the observed points are far from the regression
line, the residuals will be high, and so will the cost function. If the
scatter points are close to the regression line, the residuals will be
small, and hence so will the cost function.
Gradient Descent:
o Gradient descent is used to minimize the MSE by calculating the
gradient of the cost function.
o A regression model uses gradient descent to update the coefficients
of the line by reducing the cost function.
o It starts with randomly selected coefficient values and then iteratively
updates them until the cost function reaches its minimum.
R-squared method:
o R-squared determines the goodness of fit: a high value of R-squared
indicates less difference between the predicted values and actual
values and hence represents a good model.
o It is also called the coefficient of determination, or the coefficient
of multiple determination for multiple regression.
o It can be calculated from the formula:

R-squared = Explained variation / Total variation
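As a sketch with made-up predictions, R-squared can be computed directly from the residual and total variation:

```python
import numpy as np

y_actual = np.array([3.0, 5.0, 7.0, 9.0])  # toy actual values
y_pred = np.array([2.8, 5.1, 7.2, 8.9])    # toy predicted values

ss_res = np.sum((y_actual - y_pred) ** 2)           # unexplained variation
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)  # total variation
r_squared = 1 - ss_res / ss_tot  # close to 1 indicates a good model
```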
Assumptions of Linear Regression
Below are some important assumptions of Linear Regression. These are
formal checks to perform while building a linear regression model; they
help ensure the best possible results from the given dataset.
o Linear relationship between the features and target:
Linear regression assumes the linear relationship between the
dependent and independent variables.
o Small or no multicollinearity between the features:
Multicollinearity means high correlation between the independent
variables. Due to multicollinearity, it may be difficult to find the true
relationship between the predictors and the target variable. In other
words, it is difficult to determine which predictor variable is affecting
the target variable and which is not. So, the model assumes either
little or no multicollinearity between the features or independent
variables.
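One quick way to screen for multicollinearity is to look at the pairwise correlation between features. A minimal sketch, with made-up data where one feature is roughly twice the other:

```python
import numpy as np

# Hypothetical features: x2 is roughly 2*x1, so they are highly correlated
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

corr = np.corrcoef(x1, x2)[0, 1]
# |corr| close to 1 signals multicollinearity; consider dropping one feature
```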
o Homoscedasticity Assumption:
Homoscedasticity is a situation in which the error term has the same
variance for all values of the independent variables. With
homoscedasticity, there should be no clear pattern in the distribution
of the residuals in a scatter plot.
o Normal distribution of error terms:
Linear regression assumes that the error term should follow the
normal distribution pattern. If error terms are not normally
distributed, then confidence intervals will become either too wide or
too narrow, which may cause difficulties in finding coefficients.
This can be checked using a Q-Q plot. If the plot shows a straight
line without significant deviation, the errors are normally
distributed.
o No autocorrelations:
The linear regression model assumes no autocorrelation in error
terms. If there is any correlation among the error terms, it will
drastically reduce the accuracy of the model. Autocorrelation
usually occurs when there is a dependency between residual errors.
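A simple check for autocorrelation (a sketch, with made-up residuals) is the lag-1 correlation of the residual series; values near 0 suggest no autocorrelation:

```python
import numpy as np

# Hypothetical residuals from a fitted model; the alternating signs
# suggest negative autocorrelation
resid = np.array([0.5, -0.3, 0.2, -0.4, 0.1, -0.2])

# Correlate each residual with the next one (lag-1 autocorrelation)
lag1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]
```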
Example
Let's take the data of passenger vehicle sales in India. To keep things
simple, let's say only one variable, the GDP of the country, has an
impact on sales. In reality there are more factors, like auto loan
interest rates, but that's for the next article. For now, let's focus on
the linear equation of the number of vehicles sold in India with respect
to GDP.
Sample Data
I looked at passenger vehicle sales in India year-wise for the last few
years, and also checked the GDP for each year. While looking at both
datasets, it dawned upon me that the GDP in the 'current year' has an
effect on vehicle sales the 'next year'. Whichever year GDP was lower,
the following year's sales were lower, and when GDP increased, the next
year's vehicle sales also increased. Hence, they say preparing the data
for ML analytics is more important, and that's where most of the time
needs to be spent.
Let's have the equation as y = c + ax.
y = number of vehicles sold in the year
x = GDP of prior year.
We need to find c and a.
Below is the table of data. I saved this data in a file called
'vehicle_sale_data'. Please note that the number of vehicles sold is in
lakhs (1 lakh = 0.1 million).
year,GDP,4wheeler_passenger_vehicle_sale(in lakhs)
2011,6.2,26.3
2012,6.5,26.65
2013,5.48,25.03
2014,6.54,26.01
2015,7.18,27.9
2016,7.93,30.47
First column: year, which is not of much use in the code below.
Second column: GDP for the 'previous' year. This is x in the equation.
Third column: number of vehicles sold. This is what we want to predict,
maybe for next year if we know the GDP of the current year.
Model creation
We will use Python to create the model. Below are the steps.
Read the file. The 'gdp_sale' dictionary will have the GDP as key and
the sales as value.
def read_data():
    data = open("vehicle_sale_data", "r")
    gdp_sale = dict()
    for line in data.readlines()[1:]:  # skip the header row
        record = line.split(",")
        gdp_sale[float(record[1])] = float(record[2].replace('\n', ""))
    data.close()
    return gdp_sale
Calculate the step and get new values of 'c' and 'a'. The first time, we
pass the initial values of 'c' and 'a'. This function calculates the new
c and new a after moving one step. It needs to be called iteratively
till the values stabilize.
stepSize = 0.01  # learning rate (value assumed; not given in the original)

def step_cost_function_for(gdp_sale, constant, slope):
    global stepSize
    diff_sum_constant = 0  # sum of errors for constant 'c' in "c + ax"
    diff_sum_slope = 0     # sum of errors for slope 'a' in "c + ax"
    gdp_for_years = list(gdp_sale.keys())
    for year_gdp in gdp_for_years:  # for each year's GDP in the sample data
        # sale predicted by the current 'c' and 'a' for this record's GDP
        trg_data_sale = sale_for_data(constant, slope, year_gdp)
        a_year_sale = gdp_sale.get(year_gdp)  # actual sale for this record
        diff_sum_slope += (trg_data_sale - a_year_sale) * year_gdp  # slope's (h(x) - y) * x
        diff_sum_constant += trg_data_sale - a_year_sale  # constant's (h(x) - y)
    step_for_constant = (stepSize / len(gdp_sale)) * diff_sum_constant  # distance to be moved by c
    step_for_slope = (stepSize / len(gdp_sale)) * diff_sum_slope  # distance to be moved by a
    new_constant = constant - step_for_constant  # new c
    new_slope = slope - step_for_slope  # new a
    return new_constant, new_slope
Function to get the sales of vehicles given the values of c, a and x.
Used by the above function for each sample data point (GDP).
def sale_for_data(constant, slope, data):
return constant + slope * data # y = c + ax format
Iterate to get the optimum weights, i.e., the optimum values of c and a.
It stops when neither c nor a moves by more than 0.01 in the next
iteration.
def get_weights(gdp_sale):
    constant = 1
    slope = 1
    accepted_diff = 0.01
    while True:  # continue till we reach the local minimum
        new_constant, new_slope = step_cost_function_for(gdp_sale, constant, slope)
        # if the diff is small enough, we are done
        if (abs(constant - new_constant) <= accepted_diff) and \
                (abs(slope - new_slope) <= accepted_diff):
            print("done. Diff is less than " + str(accepted_diff))
            return new_constant, new_slope
        else:
            constant = new_constant
            slope = new_slope
            print("new values for constant and slope are " +
                  str(new_constant) + ", " + str(new_slope))
And of course, the main function:
def main():
    constant, slope = get_weights(read_data())
    print("constant: " + str(constant) + ", slope: " + str(slope))

if __name__ == '__main__':
    main()
I got the equation as:
y (vehicle sales) = 1.43 + 3.84 * x
where x is the value of GDP.
So if we have GDP as 7.5 this year, then passenger vehicle sales next
year will be 1.43 + 7.5 * 3.84 = 30.23 lakhs.
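As a cross-check (not in the original article), the same c and a can also be obtained in closed form with NumPy's least squares on the table above. Note that gradient descent with a loose stopping threshold of 0.01 can stop on a different line than the exact fit, so these numbers differ from 1.43 and 3.84:

```python
import numpy as np

# GDP (x) and sales in lakhs (y) from the sample table above
x = np.array([6.2, 6.5, 5.48, 6.54, 7.18, 7.93])
y = np.array([26.3, 26.65, 25.03, 26.01, 27.9, 30.47])

a, c = np.polyfit(x, y, 1)  # slope a and intercept c of y = c + a*x
prediction = c + a * 7.5    # projected sales if this year's GDP is 7.5
```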