Part 2 Exploring Relationships Among Variables

This document discusses scatter plots and correlation analysis. It defines scatter plots and their use in analyzing relationships between two quantitative variables. Characteristics of scatter plots like linearity, slope, and strength are described. Correlation coefficients are introduced as a measure of both the direction and strength of association between two variables. The ranges and interpretations of correlation coefficients are explained. Examples are provided to illustrate positive and negative correlation as well as the effect of outliers on correlation.


6/1/2020

Ho Chi Minh City University of Technology – Bach Khoa

Applied Statistics in Construction Management
Lecturer: Nguyen Hoai Nghia (Jack), Ph.D.
Email: [email protected],vn
[email protected]

Content
 Part 2: Exploring relationships between variables
 2.1 Scatter plots
 2.2 Correlation
 2.3 Linear regression
 2.4 Multiple regression

2.1 Scatter plots
2.1.1 Definition
 A scatterplot is a graphic tool used to display the relationship between two quantitative variables.
 A scatterplot consists of an X axis (the horizontal axis), a Y axis (the vertical axis), and a series of dots.
 Each dot on the scatterplot represents one observation from a data set.
2.1 Scatter plots
2.1.2 Characteristics
 Scatterplots are used to analyze patterns in bivariate data. These patterns are described in terms of linearity, slope, and strength.
 Linearity refers to whether a data pattern is linear (straight) or nonlinear (curved).
 Slope refers to the direction of change in variable Y as variable X gets bigger  positive or negative.
 Strength refers to the degree of "scatter" in the plot. If the dots are widely spread, the relationship between the variables is weak. If the dots are concentrated around a line, the relationship is strong.

2.2 Correlation
2.2.1 Relationships
 Consider data collected on students' Height (in inches) and Weight (in pounds)  there is a positive association between the two.
 The form of the scatterplot is fairly straight as well.

2.2.2 Strength of correlation
 The green points in the upper right and lower left quadrants are consistent with a positive association, while the red points in the other two quadrants are consistent with a negative association.
2.2 Correlation
2.2.3 Correlation coefficients
 Correlation coefficients measure both the direction and strength of association between two variables.

 Population:

R = (1/N) · ∑ [ (X − µX) / σX ] · [ (Y − µY) / σY ]

 Sample:

r = ∑(x − x̄)(y − ȳ) / √[ ∑(x − x̄)² · ∑(y − ȳ)² ]
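The sample formula can be checked with a short computation. Below is a minimal sketch in Python; the height and weight values are made up for illustration and are not from the lecture's data set.

```python
import math

def sample_correlation(xs, ys):
    """Sample correlation from the definition:
    r = sum((x - xbar)(y - ybar)) / sqrt(sum((x - xbar)^2) * sum((y - ybar)^2))."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical height (inches) and weight (pounds) data for illustration.
heights = [61, 64, 66, 68, 70, 72]
weights = [105, 120, 128, 140, 152, 165]
print(round(sample_correlation(heights, weights), 3))
```

The value comes out close to 1, matching the fairly straight, positive scatterplot described in 2.2.1.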

2.2 Correlation
2.2.3 Correlation coefficients (cont.)
 A correlation coefficient ranges from -1 to 1.
 The greater the absolute value of a correlation coefficient, the stronger the linear relationship.
 The strongest linear relationship is indicated by a correlation coefficient of -1 or 1, when the data points fall exactly on a straight line.
 The weakest linear relationship is indicated by a correlation coefficient equal to 0.
 A positive correlation means that if one variable gets bigger, the other variable tends to get bigger.
 A negative correlation means that if one variable gets bigger, the other variable tends to get smaller.
 The correlation becomes weaker as the data points become more scattered.
 If the data points fall in a random pattern, the correlation is equal to zero.
 Correlation is sensitive to outliers. Compare the first scatterplot with the last scatterplot: the single outlier in the last plot greatly reduces the correlation (from 1.00 to 0.71).
 Correlation has no units.
 Correlation is not affected by changes in the center or scale of either variable. Changing the units or baseline of either variable has no effect on the correlation coefficient.
 Correlation depends only on the z-scores.
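Two of these properties are easy to verify numerically: correlation is unchanged by shifts or positive rescalings of either variable, and a single outlier can sharply reduce it. A small Python sketch with made-up data:

```python
import math

def corr(xs, ys):
    """Sample correlation computed directly from the definition."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r = corr(x, y)
# Changing units (e.g. inches -> centimeters) or baseline leaves r unchanged.
r_rescaled = corr([2.54 * xi + 100 for xi in x], y)

# A single outlier, here the point (6, 0.0), can sharply reduce the correlation.
r_outlier = corr(x + [6], y + [0.0])

print(r, r_rescaled, r_outlier)
```

Here r and r_rescaled agree to floating-point precision, while r_outlier drops far below the original value.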

2.2 Correlation
2.2.3 Correlation coefficients (cont.)
Question:
 Is it OK to say that a correlation of 0 means there is no relationship between two variables?

Question: A national consumer magazine reported the following correlations.
 The correlation between car weight and car reliability is -0.30.
 The correlation between car weight and annual maintenance cost is 0.20.
 Which of the following statements are true?
I. Heavier cars tend to be less reliable.
II. Heavier cars tend to cost more to maintain.
III. Car weight is related more strongly to reliability than to maintenance cost.
2.2 Correlation
2.2.4 Conditions for Correlation
 Quantitative Variables Condition  correlation applies only to quantitative variables.
 Straight Enough Condition  look at the scatterplot to see whether it looks reasonably straight. That's a judgment call, but not a difficult one.
 No Outliers Condition  outliers can distort the correlation dramatically, making a weak association look strong or a strong one look weak. Outliers can even change the sign of the correlation. But it is easy to see outliers in the scatterplot.

2.3 Linear regression
2.3.1 Definition
 In a cause-and-effect relationship, the independent variable is the cause, and the dependent variable is the effect.
 Least squares linear regression is a method for predicting the value of a dependent variable Y based on the value of an independent variable X.

Y = Β0 + Β1X

2.3 Linear regression
2.3.2 Conditions to apply simple linear regression
 The dependent variable Y has a linear relationship to the independent variable X  make sure that the XY scatterplot is linear and that the residual plot shows a random pattern.
 For each value of X, the probability distribution of Y has the same standard deviation σ. When this condition is satisfied, the variability of the residuals will be relatively constant across all values of X, which is easily checked in a residual plot.
 For any given value of X:
 The Y values are independent, as indicated by a random pattern on the residual plot.
 The Y values are roughly normally distributed (i.e., symmetric and unimodal). A little skewness is OK if the sample size is large. A histogram or a dotplot will show the shape of the distribution.
2.3 Linear regression
2.3.3 Least Squares Regression Line
 Linear regression finds the straight line, called the least squares regression line or LSRL, that best represents the observations in a bivariate data set.
 Population: Y = Β0 + Β1X
 Sample: ŷ = b0 + b1x

2.3.3 Least Squares Regression Line (cont.)
 Normally, you will use a computational tool - a software package or a graphing calculator - to find b0 and b1. You enter the X and Y values into your program or calculator, and the tool solves for each parameter.
 Or you can calculate the values of b0 and b1 manually using the equations:

b1 = r · (sy / sx)
b0 = ȳ − b1 · x̄
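The manual equations can be applied directly. The sketch below, in Python with made-up data lying exactly on the line y = 1 + 2x, recovers b1 = 2 and b0 = 1:

```python
import math

def lsrl(xs, ys):
    """Fit yhat = b0 + b1*x using b1 = r * (sy / sx) and b0 = ybar - b1 * xbar."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    sx = math.sqrt(sxx / (n - 1))  # sample standard deviation of X
    sy = math.sqrt(syy / (n - 1))  # sample standard deviation of Y
    b1 = r * sy / sx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data lying exactly on y = 1 + 2x.
b0, b1 = lsrl([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

Note that b1 = r·(sy/sx) is algebraically the same as ∑(x − x̄)(y − ȳ) / ∑(x − x̄)², so either form gives the same slope.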

2.3 Linear regression
2.3.4 Properties of the Regression Line
 The line minimizes the sum of squared differences between observed values (the y values) and predicted values (the ŷ values computed from the regression equation).
 The regression line passes through the mean of the X values (x̄) and through the mean of the Y values (ȳ).
 The regression constant (b0) is equal to the y-intercept of the regression line.
 The difference between an observed value and its predicted value is called its residual. The residual tells how well the model predicted the observed value at that point.
2.3 Linear regression
2.3.5 Coefficient of determination
 The coefficient of determination (denoted by R2) is a key output of regression analysis. It is interpreted as the proportion of the variance in the dependent variable that is predictable from the independent variable.
 The coefficient of determination ranges from 0 to 1.
 An R2 of 0 means that the dependent variable cannot be predicted from the independent variable.
 An R2 of 1 means the dependent variable can be predicted without error from the independent variable.
 An R2 between 0 and 1 indicates the extent to which the dependent variable is predictable. An R2 of 0.10 means that 10 percent of the variance in Y is predictable from X; an R2 of 0.20 means that 20 percent is predictable; and so on.
 If you know the linear correlation (r) between two variables, then the coefficient of determination (R2) is easily computed using the formula R2 = r2.
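The identity R2 = r2 can be checked by computing R2 a second way, as 1 − SSE/SST from the fitted least squares line. A Python sketch with made-up, roughly linear data:

```python
import math

def r_squared(xs, ys):
    """Compute R-squared two ways for simple linear regression:
    as r**2, and as 1 - SSE/SST from the fitted least squares line."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    syy = sum((y - ybar) ** 2 for y in ys)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    r = sxy / math.sqrt(sxx * syy)
    b1 = sxy / sxx              # least squares slope
    b0 = ybar - b1 * xbar       # least squares intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return r ** 2, 1 - sse / syy

# Made-up data: mostly linear with a little noise.
rsq_from_r, rsq_from_sse = r_squared([1, 2, 3, 4, 5], [2.0, 4.1, 5.8, 8.2, 9.9])
print(rsq_from_r, rsq_from_sse)
```

Both computations agree to floating-point precision, which is exactly the R2 = r2 identity on the slide.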

2.3 Linear regression
2.3.6 Standard error
 The standard error about the regression line (often denoted by SE) is a measure of the average amount by which the regression equation over- or under-predicts. The higher the coefficient of determination, the lower the standard error, and the more accurate the predictions are likely to be.
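The slide does not give a formula for SE; a common definition (an assumption here, not stated in the lecture) is SE = √(SSE / (n − 2)), i.e. the typical size of a residual. A Python sketch with made-up data:

```python
import math

def regression_se(xs, ys):
    """Standard error about the regression line, using the common
    definition SE = sqrt(SSE / (n - 2)). The slide gives no formula,
    so this particular definition is an assumption."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    b1 = sxy / sxx              # least squares slope
    b0 = ybar - b1 * xbar       # least squares intercept
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    return math.sqrt(sse / (n - 2))

# Points exactly on a line give SE = 0; noisier data give a larger SE.
print(regression_se([1, 2, 3, 4], [3, 5, 7, 9]))
print(regression_se([1, 2, 3, 4], [3.2, 4.6, 7.5, 8.9]))
```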
2.3 Linear regression
 All statistics packages make a table of results for a regression.

2.4 Multiple regression
2.4.1 Multiple regression
 A regression with two or more predictor variables is called a multiple regression:

ŷ = b0 + b1x1 + ... + bkxk

 We then find the residuals as

e = y − ŷ
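With more than one predictor, the coefficients are usually found by a least squares solve over a design matrix. A minimal sketch, assuming NumPy is available; the data are made up and generated without noise, so the known coefficients are recovered exactly:

```python
import numpy as np

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 (no noise),
# so the fitted coefficients should recover b0 = 1, b1 = 2, b2 = 3.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
y = 1 + 2 * x1 + 3 * x2

# Design matrix with a leading column of ones for the intercept b0.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least squares solve for [b0, b1, b2] in yhat = b0 + b1*x1 + b2*x2.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals e = y - yhat.
residuals = y - X @ coef
print(coef, residuals)
```

In practice a statistics package reports these coefficients (plus standard errors and R2) in its regression results table, as noted above.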
