Linear Regression
1) Linear regression is a data analysis technique that predicts the value of unknown data by using another
related and known data value.
2) It mathematically models the unknown or dependent variable and the known or independent variable as
a linear equation.
(Linear Equation Definition: A linear equation is an algebraic equation in which each term has
an exponent of 1; when the equation is graphed, it always results in a straight line. This is
why it is called a 'linear' equation.)
Let us learn how to identify linear equations with the help of the following examples:
y = 8x - 9         (Linear)
y + 3x - 1 = 0     (Linear)
3) For instance, suppose that you have data about your expenses and income for last year. Linear
regression techniques analyze this data and determine that your expenses are half your income.
4) They then calculate an unknown future expense by halving a future known income.
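As a rough illustration of points 3) and 4), here is a minimal sketch that fits such an income-expense relationship with scikit-learn. The figures are made up for demonstration and are not from the original text:

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly figures; expenses are roughly half of income
income = np.array([1000, 2000, 3000, 4000, 5000]).reshape(-1, 1)
expenses = np.array([510, 980, 1500, 2010, 2490])

reg = LinearRegression().fit(income, expenses)
print(reg.coef_)              # slope close to 0.5 (expenses ~ half of income)
print(reg.predict([[6000]]))  # estimated future expense for a known future income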
The standard form or general form of a linear equation in one variable is written as Ax + B = 0,
where A and B are real numbers and x is the single variable. The standard form of a linear
equation in two variables is expressed as Ax + By = C, where A, B, and C are real numbers and x
and y are the two variables.
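To make the two-variable form concrete, here is a small added sketch (not part of the original notes) that rewrites Ax + By = C in slope-intercept form y = mx + k, assuming B is nonzero:

def to_slope_intercept(a, b, c):
    """Rewrite Ax + By = C as y = mx + k (requires B != 0)."""
    return -a / b, c / b  # slope m = -A/B, intercept k = C/B

# Example: x - 2y = 2, the equation graphed in the steps below
m, k = to_slope_intercept(1, -2, 2)
print(m, k)  # 0.5 -1.0, i.e. y = x/2 - 1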
The graph of a linear equation in one variable x forms a vertical line parallel to the y-axis and
vice-versa, whereas the graph of a linear equation in two variables x and y forms a straight line.
Let us plot the graph of a linear equation in two variables using the following steps.
Step 1: Consider the linear equation x - 2y = 2.
Step 2: Convert the equation to the form y = mx + b. This gives: y = x/2 - 1.
Step 3: Now, substitute different values for x and compute the resulting values of y.
Step 4: When we put x = 0 in the equation, we get y = 0/2 - 1, i.e. y = -1. Similarly, if we
substitute x = 2, we get y = 2/2 - 1, i.e. y = 0.
Step 5: If we substitute the value of x as 4, we get y = 1, and x = -2 gives y = -2. These pairs
of values of (x, y) satisfy the given linear equation y = x/2 - 1. Therefore, we can list them in
a table as shown below.
x:   0    2    4   -2
y:  -1    0    1   -2
Step 6: Finally, we plot the points (4, 1), (2, 0), (0, -1) and (-2, -2) on a graph and join them
to obtain the straight line representing y = x/2 - 1.
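The same steps can be reproduced programmatically. A minimal matplotlib sketch, added here for illustration (it assumes matplotlib is installed):

import numpy as np
import matplotlib.pyplot as plt

x = np.array([-2, 0, 2, 4])
y = x / 2 - 1               # Step 2: y = mx + b with m = 1/2, b = -1

plt.plot(x, y, "o-")        # plot the points and join them with a line
plt.xlabel("x")
plt.ylabel("y")
plt.title("y = x/2 - 1")
plt.show()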
In regression, a line is plotted through the given data points so that it suitably fits all of
them; hence it is called the 'best fit line.' The goal of the linear regression algorithm is to
find this best fit line.
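To show what "finding the best fit line" means computationally, here is a minimal ordinary least-squares sketch in NumPy. This is an added illustration, not from the original notes; it uses the same sample data as the scikit-learn example below:

import numpy as np

x = np.array([5, 15, 25, 35, 45, 55])
y = np.array([5, 20, 14, 32, 22, 38])

# Closed-form least-squares estimates:
# slope b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# intercept b0 = mean(y) - b1 * mean(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print(b0, b1)  # approximately 5.633 and 0.54, matching the fitted model below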
Source: https://www.analyticsvidhya.com/blog/2021/10/everything-you-need-to-know-about-linear-regression/
Details on linear regression in Python: https://realpython.com/linear-regression-in-python/
Following the Real Python tutorial above, first import the packages and provide the data. The
input x is reshaped into a column, because scikit-learn expects a two-dimensional input:
>>> import numpy as np
>>> from sklearn.linear_model import LinearRegression
>>> x = np.array([5, 15, 25, 35, 45, 55]).reshape((-1, 1))
>>> y = np.array([5, 20, 14, 32, 22, 38])
>>> x
array([[ 5],
       [15],
       [25],
       [35],
       [45],
       [55]])
>>> y
array([ 5, 20, 14, 32, 22, 38])
Create an instance of the class LinearRegression, which will represent the regression
model:
>>> model = LinearRegression()
fit_intercept is a Boolean that, if True, decides to calculate the intercept 𝑏₀ or, if False,
considers it equal to zero. It defaults to True.
normalize is a Boolean that, if True, decides to normalize the input variables. It defaults
to False, in which case it doesn’t normalize the input variables.
copy_X is a Boolean that decides whether to copy (True) or overwrite the input variables
(False). It’s True by default.
n_jobs is either an integer or None. It represents the number of jobs used in parallel
computation. It defaults to None, which usually means one job. -1 means to use all available
processors.
Your model as defined above uses the default values of all parameters.
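If you ever need non-default behavior, you can pass these parameters when creating the instance. A small illustrative sketch (the variable alt_model is hypothetical, not part of the tutorial):
>>> # Hypothetical alternative: force the intercept to zero, use all processors
>>> alt_model = LinearRegression(fit_intercept=False, n_jobs=-1)
The examples below continue with the default model defined above.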
It’s time to start using the model. First, you need to call .fit() on model:
>>> model.fit(x, y)
LinearRegression()
With .fit(), you calculate the optimal values of the weights 𝑏₀ and 𝑏₁, using the existing
input and output, x and y, as the arguments. In other words, .fit() fits the model. It
returns self, which is the variable model itself. That’s why you can replace the last two
statements with this one:
>>> model = LinearRegression().fit(x, y)
Once you have your model fitted, you can get the results to check whether the model works
satisfactorily and to interpret it.
You can obtain the coefficient of determination, 𝑅², with .score() called on model:
>>> r_sq = model.score(x, y)
>>> print(f"coefficient of determination: {r_sq}")
coefficient of determination: 0.7158756137479542
The attributes of model are .intercept_, which represents the coefficient 𝑏₀, and .coef_,
which represents 𝑏₁:
>>> print(f"intercept: {model.intercept_}")
intercept: 5.633333333333329
>>> print(f"slope: {model.coef_}")
slope: [0.54]
You’ll notice that you can provide y as a two-dimensional array as well. In this case, you’ll
get a similar result. This is how it might look:
>>> new_model = LinearRegression().fit(x, y.reshape((-1, 1)))
>>> print(f"intercept: {new_model.intercept_}")
intercept: [5.63333333]
>>> print(f"slope: {new_model.coef_}")
slope: [[0.54]]
Once you have a satisfactory model, then you can use it for predictions with either existing or
new data. To obtain the predicted response, use .predict():
>>> y_pred = model.predict(x)
>>> print(f"predicted response:\n{y_pred}")
predicted response:
[ 8.33333333 13.73333333 19.13333333 24.53333333 29.93333333 35.33333333]
When applying .predict(), you pass the regressor as the argument and get the
corresponding predicted response. This is a nearly identical way to predict the response:
>>> y_pred = model.intercept_ + model.coef_ * x
>>> print(f"predicted response:\n{y_pred}")
predicted response:
[[ 8.33333333]
 [13.73333333]
 [19.13333333]
 [24.53333333]
 [29.93333333]
 [35.33333333]]
The output here differs from the previous example only in dimensions. The predicted
response is now a two-dimensional array, while in the previous case, it had one dimension.
If you reduce the number of dimensions of x to one, then these two approaches will yield the
same result. You can do this by replacing x with x.reshape(-1), x.flatten(),
or x.ravel() when multiplying it with model.coef_.
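For instance (a brief added illustration), flattening x reproduces the one-dimensional output of .predict():
>>> model.intercept_ + model.coef_ * x.reshape(-1)
array([ 8.33333333, 13.73333333, 19.13333333, 24.53333333, 29.93333333,
       35.33333333])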
In practice, regression models are often applied for forecasts. This means that you can use
fitted models to calculate the outputs based on new inputs:
>>> x_new = np.arange(5).reshape((-1, 1))
>>> x_new
array([[0],
       [1],
       [2],
       [3],
       [4]])
>>> y_new = model.predict(x_new)
>>> y_new
array([5.63333333, 6.17333333, 6.71333333, 7.25333333, 7.79333333])