Lecture 1. Part 1-Regression Analysis. Correlation and SLRM

Uploaded by

Richelle Pausang

STT151A

Statistics for Research

Part 1: Regression Analysis

Time Frame: Weeks 1-4
1.1 Introduction
1.2 Least-Squares Regression and Correlation (SLRM and MLRM)
1.3 Model Validation & Remedial Measures, Outlier Detection, and
Transformations
1.4 Variable Selection & Model Building
1.5 Intrinsically Linear Models
** 1.6 Logistic Regression
1.7 Kaplan-Meier Survival Analysis
QUIZ #1
Course Syllabus Discussion
Download Statistica
https://helpdesk.dlsu.edu.ph/guides/software/statistica-installation-guide.asp
1.2 Least-Squares Regression and
Correlation
Simple Linear Regression Model and
Multiple Linear Regression Model
Correlation Analysis
From: https://www.mathsisfun.com/data/correlation.html
The word Correlation is made of Co- (meaning "together"), and Relation

• Correlation is Positive when the values increase together, and

• Correlation is Negative when one value decreases as the other increases

Correlation can have a value:

• 1 is a perfect positive correlation
• 0 is no correlation (the values don't seem linked at all)
• -1 is a perfect negative correlation
The value shows how good the correlation is (not how steep the line is), and if it is positive or negative.
We can easily see that warmer weather and higher sales go together. The relationship is good but not perfect.
Correlation Is Not Good at Curves
The correlation calculation only works properly for straight-line relationships.
Suppose it gets so hot that people stop going near the shop, and sales start dropping.
[Figure: ice cream sales against temperature, rising to a peak and then falling]

The calculated correlation value is now 0, which means "no correlation".
Yet the data follow a clear curve that reaches a peak around 25 °C.
The correlation calculation is simply not "smart" enough to see this.
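The effect can be reproduced with a small sketch. The numbers below are hypothetical (they are not the shop's data): sales are constructed to peak at 25 °C and fall off symmetrically, and `pearson_r` is an illustrative helper name, not part of the lecture.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: sales follow a perfect curve with its peak at 25 degrees C.
temps = [15, 20, 25, 30, 35]
sales = [500 - (t - 25) ** 2 for t in temps]

r_curve = pearson_r(temps, sales)
print(round(r_curve, 6))  # the rising and falling halves cancel out
```

Even though sales are completely determined by temperature here, the linear correlation comes out as zero, which is why the scatter plot has to be inspected as well.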
Linear Correlation
[Figure: scatter plots of Y against X contrasting linear and curvilinear relationships, strong and weak relationships, and no relationship]
Slide from: Statistics for Managers Using Microsoft® Excel 4th Edition, 2004 Prentice-Hall
Moral of the story: make a Scatter Plot, and look at it!
You may see a relationship that the calculation does not.
"Correlation Is Not Causation"
A common saying is "Correlation Is Not Causation".

• What it really means is that a correlation does not prove one thing causes the other:
  • One thing might cause the other
  • The other might cause the first to happen
  • They may be linked by a different thing
  • Or it could be random chance!
• There can be many reasons the data has a good correlation.
Example: Poor suburbs are more likely to have high pollution
Why?
• Do poor people make pollution?
• Are polluted suburbs the only place poor people can afford?
• Is it a common link, such as factories with low paying jobs and lots of
pollution?
Pearson Product-Moment Correlation

Temperature °C (x)   Ice Cream Sales (y)   x2        y2         xy
14.2                 215                   201.64    46225      3053
16.4                 325                   268.96    105625     5330
11.9                 185                   141.61    34225      2202
15.2                 332                   231.04    110224     5046
18.5                 406                   342.25    164836     7511
22.1                 522                   488.41    272484     11536
19.4                 412                   376.36    169744     7993
25.1                 614                   630.01    376996     15411
23.4                 544                   547.56    295936     12730
18.1                 421                   327.61    177241     7620
22.6                 445                   510.76    198025     10057
17.2                 408                   295.84    166464     7018
Total 224.1          4829                  4362.1    2118025    95507
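The column totals are the inputs to the computational form of Pearson's r. As a minimal sketch, assuming the standard formula r = (nΣxy − ΣxΣy) / sqrt((nΣx2 − (Σx)2)(nΣy2 − (Σy)2)), which the slide does not spell out, the totals can be plugged in directly (the variable names are illustrative):

```python
import math

# Column totals copied from the table (n = 12 temperature/sales pairs).
n = 12
sum_x, sum_y = 224.1, 4829
sum_x2, sum_y2, sum_xy = 4362.1, 2118025, 95507

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
r = numerator / denominator
print(round(r, 2))
```

With these totals r comes out at roughly 0.96, a strong positive linear relationship between temperature and ice cream sales. Because the totals in the table are rounded, the last digits may differ slightly from a calculation on the raw data.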
How to Perform Pearson Correlation (r) in Excel
How to open file in Statistica:
https://www.youtube.com/watch?v=uc_67xVZK8s

How to perform Pearson correlation in Statistica:
https://www.youtube.com/watch?v=Ev86DMtLXOk

Age (years)   BMI (kg/m2)
73            28
22            22
74            27
34            29
50            29
42            27
64            28
53            29
43            24
21            19
12            17

Excel formula: =PEARSON(array1, array2)
Coefficient (r): 0.761713
Correlation
• Quantification of the linear relationship between two QUANTITATIVE variables

• The quantity is called Pearson's correlation coefficient (r).

• -1 ≤ r ≤ 1
• (+) direct linear relationship
• (-) inverse linear relationship
Conclusion
There is a strong inverse linear relationship between water
temperature and decrease in pulse rate of children
How to Perform Pearson Correlation (r) in Statistica
Open data in Statistica:
https://www.youtube.com/watch?v=uc_67xVZK8s

Perform Pearson Correlation in Statistica:
https://www.youtube.com/watch?v=AvdKQyVr9FQ&list=PLsY7hM6ZLBNOBT9RPYo0oeIuFezrDNYXu&index=6
Correlation
Download FIES data from CANAVAS
Family Income and Expenditure Survey
Get the Pearson correlation between Total Income and all the variables except the categorical variables encoded as text.

Reference: https://www.kaggle.com/datasets/grosvenpaul/family-income-and-expenditure/discussion
Note: This data is trimmed to use as a tool for class discussion. Complete data is available upon request from
PSA with recommendation of thesis adviser.
Observations on the Correlation of Total Income and Other Variables
What observations can you get from the data?
Which variable has the highest correlation with Total Income?
Which variables are not significantly related to Total Income?
Which variables have a direct linear relationship with Total Income?
Which variables have an inverse linear relationship with Total Income?
Simple Linear Regression Model
(SLRM)
Introduction to Simple Linear Model
http://www.sthda.com/english/articles/40-regression-analysis/167-simple-linear-regression-in-r/
https://www.youtube.com/watch?v=owI7zxCqNY0
Simple linear regression is used to predict a quantitative outcome y on the basis of a single predictor variable x. The goal is to build a mathematical model (or formula) that defines y as a function of x.

Once we have built a statistically significant model, we can use it to predict future outcomes on the basis of new x values.
From the scatter plot, it can be seen that not all the data points fall exactly on the fitted regression line. Some of the points are above the blue line and some are below it; overall, the residual errors (e) have a mean of approximately zero.

The sum of the squares of the residual errors is called the Residual Sum of Squares or RSS.

The average variation of points around the fitted regression line is called the Residual Standard Error (RSE). This is one of the metrics used to evaluate the overall quality of the fitted regression model. The lower the RSE, the better the fit.

Mathematically, the beta coefficients (b0 and b1) are determined so that the RSS is as small as possible. This method of determining the beta coefficients is called least squares regression or ordinary least squares (OLS) regression.
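The quantities described above can be written out explicitly. This is a sketch of the standard textbook formulas rather than something stated on the slides; the notation n (number of observations) and x̄, ȳ (sample means) is assumed here for illustration.

```latex
\begin{aligned}
e_i &= y_i - (b_0 + b_1 x_i) \\
\mathrm{RSS}(b_0, b_1) &= \sum_{i=1}^{n} e_i^{2} \\
b_1 &= \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^{2}},
\qquad b_0 = \bar{y} - b_1 \bar{x} \\
\mathrm{RSE} &= \sqrt{\frac{\mathrm{RSS}}{n - 2}}
\end{aligned}
```

The expressions for b0 and b1 are the values that minimise RSS. The divisor n − 2 in the RSE reflects the two coefficients estimated from the data.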
Least Square Method using Excel
https://www.youtube.com/watch?v=P8hT5nDai6A
Least Square Method using Statistica
https://www.youtube.com/watch?v=VInW7mmxzOU&list=PLsY7hM6ZLBNOBT9RPYo0oeIuFezrDNYXu&index=7
Least Square Method using Statistica
Example 1.
Encode the table below, where the dependent variable is y and the
independent variable (predictor) is x.
x y
1 1.5
2 3.8
3 6.7
4 9.0
5 11.2
6 13.6
7 16
Based on the Least Square Method, the
line of best fit to the data is
𝑦 = −0.828571 + 2.414286x
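The fitted line can be checked without Statistica. The sketch below applies the usual closed-form least-squares estimates to the Example 1 table; `fit_line` is an illustrative helper name, not a Statistica or Excel function.

```python
# Data from the Example 1 table.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16.0]

def fit_line(xs, ys):
    """Return (b0, b1) that minimise the residual sum of squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx               # slope
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1

b0, b1 = fit_line(xs, ys)
print(f"y = {b0:.6f} + {b1:.6f}x")
```

The coefficients agree with the line reported above to six decimal places.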
Prediction: Example
𝑦 = −0.828571 + 2.414286x

𝑦 = −0.828571 + 2.414286(5) = 11.243

The EXPECTED value of y when x is 5 is 11.243.

-0.828571:
The EXPECTED value of y when the predictor, x, is 0 is -0.828571.

2.414286:
The EXPECTED increase in the value of y for every unit increase in x.
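These three interpretations can be read straight off a small prediction function. This is only a sketch: the coefficients are copied from the fitted line, and `predict` is a hypothetical helper name.

```python
# Coefficients copied from the fitted line for Example 1.
B0 = -0.828571   # intercept: expected y when x = 0
B1 = 2.414286    # slope: expected change in y per unit increase in x

def predict(x):
    """Expected value of y at a given x under the fitted line."""
    return B0 + B1 * x

y_at_5 = predict(5)
y_at_0 = predict(0)
step = predict(6) - predict(5)
print(round(y_at_5, 3), round(y_at_0, 6), round(step, 6))
```

`predict(5)` gives about 11.24, `predict(0)` returns the intercept, and the difference between predictions one unit apart is the slope.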
Model Validation using Coefficient of
Determination (R-squared)
https://www.youtube.com/watch?v=TCtDXmvXDUc
https://www.youtube.com/watch?v=igIT6xzAH8s

R-squared is a measure of goodness-of-fit.
R² closer to 1 indicates a good model.
R² closer to zero indicates a poor model.
Adjusted R² = 0.99876183
Interpretation: about 99.88% of the variation of y can be explained by x.
The model is significant if the p-value is less than 0.05 (p < 0.05).
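The adjusted R-squared for Example 1 can be reproduced by hand. The sketch below assumes the usual definitions, R² = 1 − RSS/TSS and Adjusted R² = 1 − (1 − R²)(n − 1)/(n − p − 1) with p = 1 predictor; these formulas are not shown on the slide, and the p-value is not computed here.

```python
# Data from the Example 1 table.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [1.5, 3.8, 6.7, 9.0, 11.2, 13.6, 16.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares coefficients.
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

# Residual and total sums of squares.
rss = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
tss = sum((y - mean_y) ** 2 for y in ys)

r_squared = 1 - rss / tss
p = 1  # number of predictors
adjusted_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(r_squared, 6), round(adjusted_r_squared, 6))
```

The adjusted value agrees with the 0.99876183 reported above; the unadjusted R² is slightly higher because it carries no penalty for the number of predictors.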
Example 2. Least Square Method using Statistica
Use FIES data, where the dependent variable is total income and the
independent variable (predictor) is Communication Expenditure.

Based on the Least Square Method, the line of best fit to the data is
𝑦 = 133241.9 + 27.9x
Model Validation using Coefficient of
Determination (R-squared)

Adjusted R² = 0.50428708
Interpretation: about 50.43% of the variation in total household income can be explained by communication expenditure.
ŷ = 133241.9 + 27.9x
Total Household Income = 133241.9 + 27.9 (Communication Expenditure)
The symbol ŷ means it is an estimate only. The symbol y means it is the actual population measurement.
Prediction:
1. The EXPECTED total household income when the communication expenditure is 0 is Php 133,241.90.
2. The EXPECTED increase in total household income for every one-unit increase in communication expenditure is Php 27.90.
3. A household with an annual communication expenditure of 10000 has an EXPECTED total household income of
Total Household Income = 133241.9 + 27.9(10000) = Php 412,241.90
