0% found this document useful (0 votes)

109 views

Regression PDF

Uploaded by

adhi

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

109 views

Regression PDF

Uploaded by

adhi

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 16

www.analyttica.

com

Predictive Modelling
using
Linear Regression

© Analy Datalab Inc.,

e 2016. All rights res rved.
https://leaps.analyttica.com

Table of Contents
Concept of Regression Analysis

Simple and Multiple Linear Regression

Evaluating a Linear Model

Variable Selection and Transformations

Page 2 of 16
https://leaps.analyttica.com

Concept of Regression Analysis

Regression analysis is a predictive modelling technique which estimates the relationship between
two or more variables. Recall that a correlation analysis makes no assumption about the causal
relationship between two variables. Regression analysis focusses on the relationship between a
dependent (target) variable and independent variable(s) (predictors). Here, dependent variable is
assumed to be the effect of the independent variable(s). The value of predictors is used to estimate
or predict the likely-value of the target variable.

For example, to describe the relationship between diesel consumption and industrial production, if
it is assumed that “diesel consumption” is the effect of “industrial production”, we can do a
regression analysis to predict value of “diesel consumption” for some specific value of “industrial
production”

To do this, we first try to assume a mathematical relationship between the target and the
predictor(s). The relationship can be a straight line (linear regression) or a polynomial curve
(polynomial regression) or a non-linear relationship (non-linear regression). This can be done
through various ways. The simplest and most popular way is to create a scatter plot of the target
variable and predictor variable. (Refer to Figure 1 and Figure 2)

Figure 1: Linear Relationship Figure 2: Polynomial Relationship

Once the type of relationship is established, we try to find the most-likely values of the coefficients
in the mathematical formula.

Regression analysis comprises of the entire process of identifying the target and predictors, finding
the relationship, estimating the coefficients, finding the predicted values of target, and finally
evaluating the accuracy of the fitted relationship.

Page 3 of 16
https://leaps.analyttica.com

Why do we use Regression Analysis?

As discussed above, regression analysis estimates the relationship between two or more variables.
More specifically, regression analysis helps one understand how the typical value of the dependent
variable changes when any one of the independent variables is varied, while the other independent
variables are held fixed.

Let’s say, we want to estimate the credit card spend of the customers in the next quarter. For each
customer, we have their demographic and transaction related data which indicate that the credit
card spend is a factor of age, credit limit and total outstanding balance on their loans. Using this
insight, we can predict future sales of the company based on current and past information.

What are the benefits of using Regression Analysis?

Regression explores significant relationships between dependent variable and independent
variable.

It indicates the strength of impact of multiple independent variables on a dependent variable and
helps to determine which variables in particular, are most significant predictors of the dependent
variable. Their influence is quantified by the magnitude and sign of the beta estimates, which is
nothing but the extent to which they impact the dependent variable.
It also allows us to compare the effect of variable measures on different scales and can consider
nominal, interval, or categorical variables for analysis.

The simplest form of the equation with one dependent and one independent variable is defined by
the formula:

y = c + b*x,
where
y = estimated dependent score,

c = constant,

b = regression coefficient, and

x = independent variable.

Page 4 of 16
https://leaps.analyttica.com

Types of Regression Techniques

For predictions, there are many regression techniques available. The type of regression technique
to be used is mostly driven by three metrics:

• Number of independent variables

• Type of dependent variables
• Shape of regression line
Let’s briefly discuss a few regression techniques. A more elaborate discussion of the most commonly
used regression techniques will be covered later in the module.

Linear Regression
Linear regression is one of the most commonly used predictive modelling techniques. It establishes
a relationship between dependent variable (Y) and one or more independent variables (X) using
a best fit straight line (also known as a regression line).
It is represented by an equation 𝑌 = 𝑎 + 𝑏𝑋 + 𝑒, where a is the intercept, b is the slope of the
line and e is the error term. This equation can be used to predict the value of a target variable based
on given predictor variable(s).

Logistic Regression
Logistic regression is used to explain the relationship between one dependent binary variable and
one or more nominal, ordinal, interval or ratio-level independent variables.

Polynomial Regression
A regression equation is a polynomial regression equation if the power of independent variable is
more than 1. The equation below represents a polynomial equation.
𝑌 = 𝑎 + 𝑏𝑋 + 𝑐𝑋2
In this regression technique, the best fit line is not a straight line. It is rather a curve that fits into the
data points.

Ridge Regression
Ridge regression is suitable for analyzing multiple regression data that suffers from multicollinearity.
When multicollinearity occurs, least squares estimates are unbiased, but their variances are large so
they may be far from the true value. By adding a degree of bias to the regression estimates, ridge
regression reduces the standard errors. It is hoped that the net effect will be to give estimates that
are more reliable.

Page 5 of 16
https://leaps.analyttica.com

Linear Regression
Linear Regression is a predictive modelling technique that establishes a relationship
between dependent variable (Y) and one or more explanatory variables denoted by X, using a best
fit straight line (also known as regression line).

It is represented by the equation, 𝑌 = 𝑎 + 𝑏 ∗ 𝑋 + 𝑒, where a is intercept, b is slope of the line and e

is error term.

This equation can be used to predict the value of target variable based on given predictor variable(s).
The case of one explanatory variable is called simple linear regression. For more than one
explanatory variables, the process is called multiple linear regression. In this technique, the
dependent variable is continuous, independent variable(s) can be continuous or discrete, and nature
of the regression line is linear.
The following sections discuss in detail, the process of developing and evaluating a regression
model. An important concept to recall at this point, is that of Data Splitting, which requires the data
to be randomly split into Training and Validation datasets. The rationale behind splitting the data is
that the model is built on one dataset (training) and its performance is evaluated on the validation
dataset to evaluate its performance on a new, unknown dataset.

In all following discussions, it is understood that the model building and evaluation process
(determining the best fitting line and estimating the accuracy of the model) is done on the training
dataset, and the model validation is done on the validation dataset.

Page 6 of 16
https://leaps.analyttica.com

Determining the Best Fitting Line

Consider we have a random sample of 20 students with their height (x) and weight (y) and we need
to establish a relationship between the two. One of the first and basic approach to fit a line through
the data points is to create a scatter plot of (x,y) and draw a straight line that fits the experimental
data.

Figure 3
Since there can be multiple lines that fit the data, the challenge arises in choosing the one that best
fits. As we already know, the best fit line can be represented as
𝑦̂𝑖 = 𝑏0 + 𝑏1 𝑥𝑖
Where,

• 𝑦 denotes the observed response for experimental unit i

• 𝑥𝑖 denotes the predictor value for experimental unit i
• 𝑦𝑖̂ is the predicted response (or fitted value) for experimental unit i

When we predict height using the above equation, the predicted value of the prediction wouldn't
be perfectly accurate. It has some "prediction error" (or "residual error"). This can be represented as
𝑒𝑖 = 𝑦𝑖 − 𝑦̂𝑖
A line that fits the data best will be one for which the n (i = 1 to n) prediction errors, one for each
observed data point, are as small as possible in some overall sense.

One way to achieve this goal is to invoke the "least squares criterion," which says to "minimize the
sum of the squared prediction errors."

Page 7 of 16
https://leaps.analyttica.com

The equation of the best fitting line is:

𝑦̂𝑖 = 𝑏0 + 𝑏1 𝑥𝑖
We need to find the values of b0 and b1 that make the sum of the squared prediction errors the
smallest i.e.
Residual Squares = ∑𝑛 𝑒𝑖 2 = ∑𝑛 (𝑦𝑖 − 𝑦̂𝑖 )2
𝑖=1 𝑖=1

Because the deviations are first squared, then added, there is no cancelling out between positive
and negative values.

Least Square Estimates

For the above equation 𝑏0 and 𝑏1 are determined using the following:
̂ ̅ ̂ ̅ ̅ ̅
∑�� (𝑋𝑖 −𝑋)(𝑌𝑖 −𝑌)
̂
𝑏0 = 𝑌 − 𝑏1𝑋 and 𝑏1 = 𝑖=1
∑𝑛 (𝑋 −𝑋̅)2
𝑖=1 𝑖

Because the formulas for b0 and b1 are derived using the least squares criterion, the resulting
equation, 𝑦̂𝑖 = 𝑏0 + 𝑏1 𝑥𝑖 , is often referred to as the "least squares regression line," or simply the
"least squares line." It is also sometimes called the "estimated regression equation."

What Does the Equation Mean?

The equation above is a physical interpretation of each of the coefficients and hence it is very
important to understand what the regression equation means.

• The coefficient 𝑏0, or the intercept, is the expected value of Y when X = 0

• The coefficient 𝑏1, or the slope, is the expected change in Y when X is increased by one unit.

Page 8 of 16
https://leaps.analyttica.com

The following figure explains the interpretations clearly.

Figure 4

Example of Linear Regression: Factors affecting Credit Card Sales

An analyst wants to understand what factors (or independent variables) affect credit card sales. Here,
the dependent variable is credit card sales for each customer, and the independent variables are
income, age, current balance, socio-economic status, current spend, last month’s spend, loan
outstanding balance, revolving credit balance, number of existing credit cards and credit limit. In
order to understand what factors affect credit card sales, the analyst needs to build a linear
regression model.
It is important to note that a linear regression cannot be applied to categorical variables, and is not
recommended for ordinal variables, hence, the analyst may also need to check the variable type
before running a model.

Module 1 Simulation 1: Learn & Apply a Simple Linear Regression Model

In this simulation, the learner is exposed to a sample dataset comprising of telecom customer
accounts and their annual income, age along with their average monthly revenue (dependent
variable). The learner is expected to apply the linear regression model using annual income as the
single predictor variable.

Page 9 of 16
https://leaps.analyttica.com

Evaluating a Linear Regression Model

Once we fit a linear regression model, we need to evaluate the accuracy of the model. In the
following sections, we will discuss the various methods used to evaluate the accuracy of the model
with respect to its predictive power.

For an in-depth understanding of all the topics covered in the coming sections, refer to the course
“Fundamentals of Data Analytics” on Analyttica TreasureHunt
(https://leaps.analyttica.com/courses/overview/Fundamentals-of-Data-Analytics). You can also
refer to any of the standard Statistics books for more information on the same.

F-Statistics and p-value

The F-Test indicates whether a linear regression model provides a better fit to the data than a model
that contains no independent variables. It consists of the null and alternate hypothesis and the test
statistic helps to prove or disprove the null hypothesis.

The null hypothesis here is “The target variable cannot be significantly predicted using the predictor
variable(s)”. To do this we look at the F-statistic and its p-value. Mathematically, the null hypothesis
we test here is “All slope parameters are 0” (note the number of slope parameters will be the same
as the number of independent variables in the model). Hence, if the null hypothesis is accepted (or
not rejected) then it means we cannot predict target variable using the predictor variables and hence
regression is not possible.

Page 10 of 16
https://leaps.analyttica.com

Coefficient of Determination
Next, we look at the R-squared value of the model, which is also called the “Coefficient of
Determination”. This statistic calculates the percentage of variation in target variable explained by
the model. The below illustration captures the explained vs. unexplained variation in data.

Figure 5
R-squared is calculated using the following formula:
𝐸𝑥𝑝𝑙𝑎𝑖𝑛𝑒𝑑 𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒 ∑𝑛 (𝑌̂𝑖 − 𝑌̅)2
𝑅2 = 𝑇𝑜𝑡𝑎𝑙 𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒 = ∑𝑖=1
𝑛 (𝑌𝑖 − 𝑌 ̅ )2
𝑖=1

R-squared is always between 0 and 100%. As a guideline, the more the R-squared, the better is the
model. The objective is not to maximize the R-squared, since the stability and applicability of the
model are equally important. Next, check the Adjusted R-squared value. Ideally, the R-squared and
adjusted R-squared values need to be in close proximity of each other. If this is not the case, then
the analyst may have over fitted the model and may need to remove the insignificant variables from
the model.

Module 1 Simulation 2: Learn & Apply the concept of R-Square

In this simulation, the learner is exposed to a sample dataset capturing telecom customer accounts
and their annual income, age, along with their average monthly revenue (dependent variable). The
dataset also contains predicted values of “average monthly revenue” from a regression model. The
learner is expected to apply the concept of calculation of coefficient of determination.

Page 11 of 16
https://leaps.analyttica.com

Check the p-value of the Parameter Estimates

The p-value for each variable tests the null hypothesis that the coefficient is equal to zero (no effect).
A low p-value (<0.05) indicates that we can reject the null hypothesis. In other words, a predictor
that has a low p-value can be included in the model because changes in the predictor's value are
related to changes in the response variable. Conversely, a larger (insignificant) p-value suggests that
changes in the predictor are not associated with changes in the response. This is an iterative process
and the analyst may need to re-run the model until only significant variables remain. If there are
hundreds of variables then the analyst may choose to automate the variable selection using the
forward, backward or stepwise techniques. Automated variable selection is however, not
recommended for small number of variables in the dataset.

Module 2 Simulation 1: Build a Multivariate Linear Regression Model and Evaluate Parameter
Significance

In this simulation, the learner is exposed to a sample dataset capturing the flight status of flights
with their delay in arrival, along with various possible predictor variables like departure delay,
distance, air time, etc. The learner is expected to build a multiple regression model where all the
variables are significant.

Residual Analysis
We can also evaluate a regression model based on various summary statistics on error or residuals.
Some of them are:
• Root Mean Square Error (RMSE): Where we find average of squared residuals as per the given
formula:
n 2
RMSE = 1 ∑(Y − ̂Y )
n i i
i=1

• Mean Absolute Percentage Error (MAPE): We find the average percentage deviation as per
the given formula:
n
ABS(Y − ̂Y )
i i
MAPE = ∑ 1
n Yi
i=1

We also often look at the distribution of absolute percentage deviation across all observations.

Page 12 of 16
https://leaps.analyttica.com

Rank Ordering
Observations are grouped based on predicted values of the target variable. The average of the actual
vs. predicted values of the target variable, across the groups, is observed to see if they move in the
same direction across the groups (increase or decrease). This is called the rank ordering check.

ATH Module 2 Simulation 2: Evaluating a Linear Model

In this simulation, the learner is exposed to a sample dataset capturing flight fare data. The objective
is to predict fare between two locations. The data contains actual fare along with a predicted average
fare. The learner is expected to evaluate the accuracy of the model using various statistics and
measures.

Assumptions of Linear Regression

There are some basic but strong underlying assumptions behind the linear regression model
estimation. After fitting a regression model, we should also test the validation of each of these
assumptions.
• There must be a causal relationship between the dependent and the independent variable(s)
which can be expressed as a linear function. A scatter plot of target variable vs. predictor
variable can help us validate this.
• Error term of one observation is independent of that of the other. Otherwise we say the data
has autocorrelation problem. We use Durbin-Watson test to check the presence of
autocorrelation.
• The mean (or expected value) of errors is zero.
• The variance of errors does not depend on the value of any predictor variable. This means,
errors have a constant variance along the regression line. This characteristic is often termed
as Homoscedasticity. Breausch-Pagan test helps us to test if the data is homoscedastic or
heteroscedastic.
• Errors follow normal distribution. We can use normality test on the errors here
The independent variables are not correlated or there is no multicollinearity in the data. Though this
is not a mandatory condition, the problem of multicollinearity makes the estimated values unstable.
Variance Inflation Factor (VIF) helps us to identify any multicollinearity.

Page 13 of 16
https://leaps.analyttica.com

Variable Selection and Transformations

Having multiple predictor variables introduces noise in the modeling process, affecting the estimates
of other variables. The principle of Occam’s Razor states that among several plausible explanations
for a phenomenon, the simplest is the best. Applied to regression analysis, this implies that the
smallest model that fits the data is the best.

There are various techniques available to select the best set of variables. Variable reduction
techniques vary depending on the kind of modeling technique used. For linear regression, we often
use any one or more of the following techniques:

• Box-Cox transformations
• Variable multicollinearity through Variance Inflation Factor (VIF)
• Principal Component Analysis
• Stepwise/ Forward/ Backward variable selection technique

Module 2 Simulation-3: Apply the concept of Variable Transformation

In this simulation, the learner is exposed to U. S. Census Bureau data on per capita retail sales along
with some socio-economic variables for the year 1992, of 845 US Standard Metropolitan Statistical
Areas (SMAS). The objective is to predict the “Per Capita Retail Sales” using other socio-economic
variables as possible predictors. The socio-economic variables are not always linearly related and
hence the learner is expected to try various transformations on the variables and try to see which
fits the model better.
Here we need to discuss one more important aspect of regression model fitting. Often, we find one
predictor variable to be exceptionally strong in a regression model compared to other predictors.
Such predictors also contribute to the extremely high accuracy of the model. This is known as the
problem of overfitting.
The main problem of these models is the fact that these models become too dependent on a single
variable. If there is any issue with the value of that specific variable the entire model fails. Sometimes,
the selected variables are actually a part of the target variable.

For Example: Suppose we are trying to fit a regression model to predict “Household Expenditure” of
a household using various predictor variables like “Household Income”, “Household Size”, “City Cost
of Living Index”, etc. Now, note that “Household Income” is expected to have a very high impact on

Page 14 of 16
https://leaps.analyttica.com

the model compared to any other predictor. Also, the model can be very high on efficiency. For a
linear model, the R-square can be as high as 98-99%. On removing the variable from the model, the
R-square will come down to may be 20% or 30%.

Now think about if “Household Income” is a right variable to predict “Household Expenditure”.
Expenditure is actually a part of income. Often, people are reluctant about revealing their actual
income, introducing high levels of impurity in the data. Hence, we should not have included the
variable in the model.
Also, 98% R-square is too high to believe in any real-life scenario. In general, any linear model with
an R-square more that 75% or 80% must be subjected to detailed inspection and checked for
overfitting. Models with an R-square of 40% - 50% are deemed acceptable in most practical cases.

Module 2 Simulation-4: Learn & Apply concepts of Variable Selection & Overfitting

In this simulation, the learner is exposed to a sample dataset capturing flight fares where the
objective is to predict fare between two locations. The learner is expected to select the significant
variable for the model first and then check if there is any problem of overfitting. If found, learner
should remove the requisite variable(s) and iterate through the variable selection process.

Page 15 of 16
https://leaps.analyttica.com

Write to us at
[email protected]

USA Address
Analyttica Datalab Inc.
1007 N. Orange St, Floor-4, Wilmington, Delaware - 19801
Tel: +1 917 300 3289/3325

India Address
Analyttica Datalab Pvt. Ltd.
702, Brigade IRV Centre,2nd Main Rd, Nallurhalli,
Whitefield, Bengaluru - 560066.
Tel : +91 80 4650 7300

Page 16 of 16

Regression Analysis
No ratings yet
Regression Analysis
12 pages
Econometrics: A Simple Introduction
From Everand
Econometrics: A Simple Introduction
K.H. Erickson
3.5/5 (5)
Predictive Modelling Using Linear Regression: © Analy Datalab Inc., 2016. All Rights Reserved
No ratings yet
Predictive Modelling Using Linear Regression: © Analy Datalab Inc., 2016. All Rights Reserved
16 pages
Regression Analysis
100% (2)
Regression Analysis
11 pages
Data Analytics Regression Unit III
No ratings yet
Data Analytics Regression Unit III
27 pages
Unit III
No ratings yet
Unit III
18 pages
Data Analytics Regression UNIT-III
No ratings yet
Data Analytics Regression UNIT-III
26 pages
UNIT - III
No ratings yet
UNIT - III
9 pages
Regression Techniques
No ratings yet
Regression Techniques
14 pages
ida unit-3.rtf
No ratings yet
ida unit-3.rtf
34 pages
Regression Coeffient
No ratings yet
Regression Coeffient
52 pages
Classical Machine Learning: Linear Regression: Ramesh S
No ratings yet
Classical Machine Learning: Linear Regression: Ramesh S
28 pages
Investigating Variables
No ratings yet
Investigating Variables
15 pages
UNIT II Regression
No ratings yet
UNIT II Regression
59 pages
Regression
No ratings yet
Regression
19 pages
Regression Unit-2
No ratings yet
Regression Unit-2
5 pages
Cs3351 Aiml Unit 3 Notes Eduengg
No ratings yet
Cs3351 Aiml Unit 3 Notes Eduengg
38 pages
Unit-3 Data Analysis
No ratings yet
Unit-3 Data Analysis
36 pages
Data Analytics Unit 2
No ratings yet
Data Analytics Unit 2
13 pages
UNIT-3NEW
No ratings yet
UNIT-3NEW
34 pages
Regression Analysis
No ratings yet
Regression Analysis
49 pages
(Revised) Simple Linear Regression and Correlation
No ratings yet
(Revised) Simple Linear Regression and Correlation
41 pages
Coding 2
No ratings yet
Coding 2
3 pages
Unit - II_DA
No ratings yet
Unit - II_DA
22 pages
Lecture 6 - Regression Analysis
No ratings yet
Lecture 6 - Regression Analysis
34 pages
DISCRETE MATH Chapter-8
No ratings yet
DISCRETE MATH Chapter-8
34 pages
Regression Analysis
No ratings yet
Regression Analysis
54 pages
Untitled 472
No ratings yet
Untitled 472
13 pages
Chapter 6
No ratings yet
Chapter 6
58 pages
Regression Course For Second Year (Chap 1-3)
No ratings yet
Regression Course For Second Year (Chap 1-3)
59 pages
DA-3rd unit
No ratings yet
DA-3rd unit
16 pages
PSAI Unit3
No ratings yet
PSAI Unit3
36 pages
Regression
No ratings yet
Regression
25 pages
MLT Unit 2
No ratings yet
MLT Unit 2
53 pages
Regression Analysis
No ratings yet
Regression Analysis
18 pages
Regression and Introduction To Bayesian Network
No ratings yet
Regression and Introduction To Bayesian Network
12 pages
1 - Stat-701 Regression
No ratings yet
1 - Stat-701 Regression
18 pages
DA2
No ratings yet
DA2
12 pages
What Is Linear Regression
No ratings yet
What Is Linear Regression
14 pages
Management Science Notes
No ratings yet
Management Science Notes
13 pages
Dr. Hussin Abdullah School of Economics, Finance and Banking, Uum Cob
No ratings yet
Dr. Hussin Abdullah School of Economics, Finance and Banking, Uum Cob
12 pages
MOD3_EDA
No ratings yet
MOD3_EDA
16 pages
SAIDS- Linear Least Squares.pptx
No ratings yet
SAIDS- Linear Least Squares.pptx
27 pages
Regression Analysis
No ratings yet
Regression Analysis
14 pages
Regression Analysis and Forecasting Models
No ratings yet
Regression Analysis and Forecasting Models
28 pages
lecture 9-10
No ratings yet
lecture 9-10
28 pages
IV Ai & Ds Al3451 Ml Unit2
No ratings yet
IV Ai & Ds Al3451 Ml Unit2
50 pages
Unit 2 Topic 1 REGRESSION
No ratings yet
Unit 2 Topic 1 REGRESSION
19 pages
Module 1 Notes
100% (1)
Module 1 Notes
73 pages
MODULE-3
No ratings yet
MODULE-3
34 pages
Module -05 Statistical Computing and r Programming
No ratings yet
Module -05 Statistical Computing and r Programming
53 pages
Regression Analysis: Post Mid Assignment Topic
No ratings yet
Regression Analysis: Post Mid Assignment Topic
8 pages
Linear Regression
No ratings yet
Linear Regression
7 pages
Unit 3 notes
No ratings yet
Unit 3 notes
35 pages
Regression
No ratings yet
Regression
14 pages
Chapter 6: How To Do Forecasting by Regression Analysis
No ratings yet
Chapter 6: How To Do Forecasting by Regression Analysis
7 pages
DA-MODULE-3
No ratings yet
DA-MODULE-3
54 pages
(Unit-04) Part-01 - ML Algo
No ratings yet
(Unit-04) Part-01 - ML Algo
49 pages
Correlation and Regression: Six Sigma Thinking, #8
From Everand
Correlation and Regression: Six Sigma Thinking, #8
Sumeet Savant
5/5 (1)
Process Performance Models: Statistical, Probabilistic & Simulation
From Everand
Process Performance Models: Statistical, Probabilistic & Simulation
Vishnuvarthanan Moorthy
No ratings yet
Applied Parameter Estimation For Chemical Engineers
100% (2)
Applied Parameter Estimation For Chemical Engineers
459 pages
1 4 2 Methodology 5 3 Data Collection 6 4 Regression Analysis 7 5 Findings and Interpretation 6 Conclusion 7 Annexures 8 References
No ratings yet
1 4 2 Methodology 5 3 Data Collection 6 4 Regression Analysis 7 5 Findings and Interpretation 6 Conclusion 7 Annexures 8 References
7 pages
My451 PDF
No ratings yet
My451 PDF
264 pages
Phiscis 1 Lab Exp 1.1
No ratings yet
Phiscis 1 Lab Exp 1.1
11 pages
Linder 2016
No ratings yet
Linder 2016
7 pages
Course Outcome - BCA - BU - Sep - 2023 - Update
No ratings yet
Course Outcome - BCA - BU - Sep - 2023 - Update
24 pages
Uploads Files ButtonDetails 85955
No ratings yet
Uploads Files ButtonDetails 85955
17 pages
UNIT-III Lecture Notes
No ratings yet
UNIT-III Lecture Notes
18 pages
Tarea 10nestadistica
No ratings yet
Tarea 10nestadistica
9 pages
Laptop Price Prediction Using Machine Learning (Abstract)
0% (1)
Laptop Price Prediction Using Machine Learning (Abstract)
3 pages
218 1629907997 PDF
No ratings yet
218 1629907997 PDF
12 pages
Abcde: Socio-Economic Classification
No ratings yet
Abcde: Socio-Economic Classification
15 pages
A Multivariate Analysis of Customer Satisfaction in a Local Street Food Industry (1)
No ratings yet
A Multivariate Analysis of Customer Satisfaction in a Local Street Food Industry (1)
51 pages
H Ng-Vân PM Asm1
No ratings yet
H Ng-Vân PM Asm1
27 pages
Crop Yield Prediction Based On Indian Agriculture Using Machine Learning
No ratings yet
Crop Yield Prediction Based On Indian Agriculture Using Machine Learning
5 pages
Syllabus Regression
No ratings yet
Syllabus Regression
2 pages
A-Level Further Mathematics PDF
No ratings yet
A-Level Further Mathematics PDF
20 pages
Module 3 Regression Notes
No ratings yet
Module 3 Regression Notes
3 pages
Bopm 141 MGT Maths Ii Course
No ratings yet
Bopm 141 MGT Maths Ii Course
3 pages
Post-Covid Changes in Consumer Preferences For Social Media Marketing in India PDF
No ratings yet
Post-Covid Changes in Consumer Preferences For Social Media Marketing in India PDF
20 pages
Multivariate Quality Control Thesis
100% (1)
Multivariate Quality Control Thesis
135 pages
MLA LabManual1
No ratings yet
MLA LabManual1
52 pages
70008
No ratings yet
70008
47 pages
Multivariate Regression, slides
No ratings yet
Multivariate Regression, slides
61 pages
ResearchProposalFinalVer1 4 33
No ratings yet
ResearchProposalFinalVer1 4 33
30 pages
Ebooks File Data Analysis For Business Decision Making: A Laboratory Manual 2nd Edition Andres Fortino All Chapters
100% (2)
Ebooks File Data Analysis For Business Decision Making: A Laboratory Manual 2nd Edition Andres Fortino All Chapters
52 pages
Introduction To Statistical Methods: Introduction Statististical Method (Open Elective)
No ratings yet
Introduction To Statistical Methods: Introduction Statististical Method (Open Elective)
22 pages
Hydroxyl Value of Base Polyol PDF
No ratings yet
Hydroxyl Value of Base Polyol PDF
9 pages
Everything You Need To Know About Linear Regression
No ratings yet
Everything You Need To Know About Linear Regression
23 pages
Martin Jones & Nancy Cartwright (Eds. 2005) - Idealization Xii - Correcting The Model - Idealization and Abstraction
100% (2)
Martin Jones & Nancy Cartwright (Eds. 2005) - Idealization Xii - Correcting The Model - Idealization and Abstraction
305 pages

Regression PDF

Uploaded by

Regression PDF

Uploaded by

www.analyttica.

© Analy Datalab Inc.,

Simple and Multiple Linear Regression

Evaluating a Linear Model

Variable Selection and Transformations

Concept of Regression Analysis

Figure 1: Linear Relationship Figure 2: Polynomial Relationship

Why do we use Regression Analysis?

What are the benefits of using Regression Analysis?

b = regression coefficient, and

Types of Regression Techniques

• Number of independent variables

It is represented by the equation, 𝑌 = 𝑎 + 𝑏 ∗ 𝑋 + 𝑒, where a is intercept, b is slope of the line and e

Determining the Best Fitting Line

• 𝑦 denotes the observed response for experimental unit i

The equation of the best fitting line is:

Least Square Estimates

What Does the Equation Mean?

• The coefficient 𝑏0, or the intercept, is the expected value of Y when X = 0

The following figure explains the interpretations clearly.

Example of Linear Regression: Factors affecting Credit Card Sales

Module 1 Simulation 1: Learn & Apply a Simple Linear Regression Model

Evaluating a Linear Regression Model

F-Statistics and p-value

Module 1 Simulation 2: Learn & Apply the concept of R-Square

Check the p-value of the Parameter Estimates

ATH Module 2 Simulation 2: Evaluating a Linear Model

Assumptions of Linear Regression

Variable Selection and Transformations

Module 2 Simulation-3: Apply the concept of Variable Transformation

You might also like