Linear Regression

What is linear regression?

1) Linear regression is a data analysis technique that predicts the value of unknown data by using another related, known data value.
2) It mathematically models the relationship between the unknown or dependent variable and the known or independent variable as a linear equation.
(Linear Equation Definition: A linear equation is an algebraic equation in which each term has an exponent of 1, and whose graph is always a straight line. This is why it is called a 'linear' equation.)
Let us learn how to identify linear and non-linear equations with the help of the following examples.

Equations            Linear or Non-Linear
y = 8x - 9           Linear
y = x² - 7           Non-linear: the power of the variable x is 2
√y + x = 6           Non-linear: the power of the variable y is 1/2
y + 3x - 1 = 0       Linear
y² - x = 9           Non-linear: the power of the variable y is 2

3) For instance, suppose that you have data about your expenses and income for last year. Linear regression techniques analyze this data and determine that your expenses are half your income.
4) They can then estimate an unknown future expense by halving a known future income.

Linear Equation Formula


 The slope-intercept form of a linear equation is y = mx + c (where m = slope and c = y-intercept).
 The point-slope form of a linear equation is y - y₁ = m(x - x₁) (where m = slope and (x₁, y₁) is a point on the line).
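
For example (with illustrative numbers), to find the equation of the line with slope m = 2 through the point (1, 3), start from the point-slope form: y - 3 = 2(x - 1), which simplifies to the slope-intercept form y = 2x + 1, so c = 1.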
Linear Equations in Standard Form

The standard form or the general form of a linear equation in one variable is written as Ax + B = 0, where A and B are real numbers (A ≠ 0) and x is the single variable. The standard form of a linear equation in two variables is expressed as Ax + By = C, where A, B and C are real numbers (A and B not both zero) and x and y are the variables.
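
For example, the equation x - 2y = 2 used in the graphing example below is in standard form with A = 1, B = -2, and C = 2.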


Linear Equation Graph

The graph of a linear equation in one variable x is a vertical line parallel to the y-axis, and the graph of a linear equation in one variable y is a horizontal line parallel to the x-axis, whereas the graph of a linear equation in two variables x and y forms a straight line. Let us graph a linear equation in two variables with the help of the following example.

Example: Plot a graph for a linear equation in two variables, x - 2y = 2.

Let us plot the linear equation graph using the following steps.

 Step 1: The given linear equation is x - 2y = 2.

 Step 2: Convert the equation into the form y = mx + c. This gives: y = x/2 - 1.

 Step 3: Now, we can substitute different values for x and get the resulting values of y to create the coordinates.

 Step 4: When we put x = 0 in the equation, we get y = 0/2 - 1, i.e. y = -1. Similarly, if we substitute x = 2 in the equation y = x/2 - 1, we get y = 0.

 Step 5: Substituting x = 4 gives y = 1, and x = -2 gives y = -2. These pairs of values (x, y) satisfy the given linear equation y = x/2 - 1, so we list the coordinates in the following table.

x    0    2    4   -2
y   -1    0    1   -2

 Step 6: Finally, we plot the points (0, -1), (2, 0), (4, 1) and (-2, -2) on a graph and join them to get a straight line. This is how a linear equation is represented on a graph.
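
As a minimal sketch (assuming numpy and matplotlib are installed), you could reproduce this plot in Python:

import numpy as np
import matplotlib.pyplot as plt

# Points satisfying x - 2y = 2, i.e. y = x/2 - 1
x = np.array([-2, 0, 2, 4])
y = x / 2 - 1

plt.plot(x, y, marker="o")  # join the points to get a straight line
plt.xlabel("x")
plt.ylabel("y")
plt.title("Graph of x - 2y = 2")
plt.grid(True)
plt.show()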


A sloped straight line represents the linear regression model.

Best Fit Line for a Linear Regression Model


In the above figure,

X-axis = Independent variable

Y-axis = Output / dependent variable

Line of regression = Best fit line for a model

Here, a line is plotted for the given data points so that it suitably fits them all. Hence, it is called the ‘best fit line.’

The goal of the linear regression algorithm is to find this best fit line seen in the above figure.
Source: https://www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regression/

The following step-by-step walkthrough of linear regression in Python is adapted from: https://realpython.com/linear-regression-in-python/

Step 1: Import packages and classes

>>> import numpy as np


>>> from sklearn.linear_model import LinearRegression

Step 2: Provide data

>>> x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))


>>> y = np.array([5, 20, 14, 32, 22, 38])

This is how x and y look now:

>>> x
array([[ 5],
[15],
[25],
[35],
[45],
[55]])

>>> y
array([ 5, 20, 14, 32, 22, 38])
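
The call .reshape((-1, 1)) makes x two-dimensional, with one column and as many rows as necessary, because scikit-learn expects the input array to have the shape (n_samples, n_features). The output array y is left one-dimensional.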

Step 3: Create a model and fit it


The next step is to create a linear regression model and fit it using the existing data.

Create an instance of the class LinearRegression, which will represent the regression
model:


>>> model = LinearRegression()


This statement creates the variable model as an instance of LinearRegression. You can
provide several optional parameters to LinearRegression:

 fit_intercept is a Boolean that, if True, decides to calculate the intercept 𝑏₀ or, if False,
considers it equal to zero. It defaults to True.
 normalize is a Boolean that, if True, decides to normalize the input variables. It defaults to False, in which case it doesn’t normalize the input variables. (Note: this parameter has been deprecated and removed in recent scikit-learn releases; scale the inputs beforehand, for example with sklearn.preprocessing.StandardScaler, instead.)
 copy_X is a Boolean that decides whether to copy (True) or overwrite the input variables
(False). It’s True by default.
 n_jobs is either an integer or None. It represents the number of jobs used in parallel
computation. It defaults to None, which usually means one job. -1 means to use all available
processors.

Your model as defined above uses the default values of all parameters.
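
For illustration, here is a sketch of creating an equivalent instance with these parameters spelled out (explicit_model is a hypothetical name; normalize is omitted because it has been removed from recent scikit-learn releases; n_jobs=-1 requests all available processors):

>>> explicit_model = LinearRegression(fit_intercept=True, copy_X=True, n_jobs=-1)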

It’s time to start using the model. First, you need to call .fit() on model:


>>> model.fit(x, y)
LinearRegression()
With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁, using the existing
input and output, x and y, as the arguments. In other words, .fit() fits the model. It
returns self, which is the variable model itself. That’s why you can replace the last two
statements with this one:


>>> model = LinearRegression().fit(x, y)


This statement does the same thing as the previous two. It’s just shorter.

Step 4: Get results

Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and to interpret it.

You can obtain the coefficient of determination, 𝑅², with .score() called on model:


>>> r_sq = model.score(x, y)


>>> print(f"coefficient of determination: {r_sq}")
coefficient of determination: 0.7158756137479542
When you’re applying .score(), the arguments are also the predictor x and response y, and
the return value is 𝑅².

The attributes of model are .intercept_, which represents the coefficient 𝑏₀, and .coef_,
which represents 𝑏₁:


>>> print(f"intercept: {model.intercept_}")


intercept: 5.633333333333329

>>> print(f"slope: {model.coef_}")


slope: [0.54]
The code above illustrates how to get 𝑏₀ and 𝑏₁. Notice that .intercept_ is a scalar, while .coef_ is an array.

Note: In scikit-learn, by convention, a trailing underscore indicates that an attribute is estimated. In this example, .intercept_ and .coef_ are estimated values.
The value of 𝑏₀ is approximately 5.63. This illustrates that your model predicts the response
5.63 when 𝑥 is zero. The value 𝑏₁ = 0.54 means that the predicted response rises by 0.54
when 𝑥 is increased by one.
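
In equation form, the fitted model is approximately ŷ = 5.63 + 0.54x. For example, for x = 5 the predicted response is 5.63 + 0.54 · 5 ≈ 8.33, which matches the first value returned by .predict() in Step 5 below.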

You’ll notice that you can provide y as a two-dimensional array as well. In this case, you’ll
get a similar result. This is how it might look:


>>> new_model = LinearRegression().fit(x, y.reshape((-1, 1)))


>>> print(f"intercept: {new_model.intercept_}")
intercept: [5.63333333]

>>> print(f"slope: {new_model.coef_}")


slope: [[0.54]]
As you can see, this example is very similar to the previous one, but in this
case, .intercept_ is a one-dimensional array with the single element 𝑏₀, and .coef_ is a
two-dimensional array with the single element 𝑏₁.

Step 5: Predict response

Once you have a satisfactory model, then you can use it for predictions with either existing or
new data. To obtain the predicted response, use .predict():


>>> y_pred = model.predict(x)


>>> print(f"predicted response:\n{y_pred}")
predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]

When applying .predict(), you pass the regressor as the argument and get the
corresponding predicted response. This is a nearly identical way to predict the response:


>>> y_pred = model.intercept_ + model.coef_ * x


>>> print(f"predicted response:\n{y_pred}")
predicted response:
[[ 8.33333333]
[13.73333333]
[19.13333333]
[24.53333333]
[29.93333333]
[35.33333333]]
In this case, you multiply each element of x with model.coef_ and
add model.intercept_ to the product.

The output here differs from the previous example only in dimensions. The predicted
response is now a two-dimensional array, while in the previous case, it had one dimension.

If you reduce the number of dimensions of x to one, then these two approaches will yield the
same result. You can do this by replacing x with x.reshape(-1), x.flatten(),
or x.ravel() when multiplying it with model.coef_.
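
For instance, a minimal sketch of the same manual computation with x flattened to one dimension; the values match the .predict() output above:

>>> y_pred_1d = model.intercept_ + model.coef_ * x.reshape(-1)
>>> y_pred_1d
array([ 8.33333333, 13.73333333, 19.13333333, 24.53333333, 29.93333333,
       35.33333333])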

In practice, regression models are often applied for forecasts. This means that you can use
fitted models to calculate the outputs based on new inputs:


>>> x_new = np.arange(5).reshape((-1, 1))


>>> x_new
array([[0],
[1],
[2],
[3],
[4]])

>>> y_new = model.predict(x_new)


>>> y_new
array([5.63333333, 6.17333333, 6.71333333, 7.25333333, 7.79333333])
Here .predict() is applied to the new regressor x_new and yields the response y_new. This
example conveniently uses arange() from numpy to generate an array with the elements
from 0, inclusive, up to but excluding 5—that is, 0, 1, 2, 3, and 4.
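
As a quick check, the first element of y_new corresponds to x = 0, so it equals the intercept: 𝑏₀ + 𝑏₁ · 0 ≈ 5.63333333. Each subsequent prediction then increases by the slope, 0.54.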
