Session 2 Inferential Statistics Slides

The document provides information about Dr. Muhammad Azam, an Associate Professor at Rabat Business School, including his contact details and office hours. It outlines the lecture contents and objectives for a course on Inferential Statistics, covering topics such as measures of central tendency, variability, and regression analysis. Additionally, it includes examples and explanations of statistical concepts like mean, median, mode, and various measures of variation.


Instructor Information

Dr. Muhammad Azam


Associate Professor
Rabat Business School

📧 Email: [email protected]

🕒 Office Hours:

Tuesday: 10:00 AM - 12:00 PM


Thursday: 10:00 AM - 12:00 PM
📍 Office Location: Room 307

Inferential Statistics (Lecture Contents)
 Describe measures of central tendency
 Illustrate the shape of a distribution
 Describe measures of variation
 Range, interquartile range
 Variance, standard deviation, coefficient of variation
 Grouped data discussion
 Variables discussion
 Inferences in multiple linear regression
 Overview of multiple linear regression and its application in predictive analysis
Lecture Objective

(continued)

After completing this lecture, you should be able to:

• Create and interpret graphs to describe categorical variables:
– central tendency: mean, median, range, etc.
• Create and solve exercise questions
• Create and interpret regression to describe numerical variables:
– regression analysis, coefficient interpretation, the role of t-values
• Describe appropriate ways to deal with business data analysis through manual solutions and Excel-based solutions
2.1 Measures of Central Tendency

Overview
Central Tendency

Mean:   x̄ = (Σ xᵢ) / n   (the arithmetic average)
Median: the midpoint of the ranked values
Mode:   the most frequently observed value (if one exists)
Example
• 3, 7, 8, 5, 12, 7, 7, 8

• Mean (Average):

• Add all the values: 3 + 7 + 8 + 5 + 12 + 7 + 7 + 8 = 57


• Divide by the number of values: 57 ÷ 8 = 7.125
• So, the mean is 7.125.
• Median (Middle value):

• First, sort the numbers in order: 3, 5, 7, 7, 7, 8, 8, 12
• Since there is an even number of values (8), take the average of the two middle numbers: 7 and 7.
• The median is (7 + 7) ÷ 2 = 7.
• Mode (Most frequent value):

• The number 7 appears three times, so the mode is 7.

Arithmetic Mean

• The arithmetic mean (mean) is the most


common measure of central tendency
– For a population of N values:

  μ = (Σᵢ₌₁ᴺ xᵢ) / N = (x₁ + x₂ + ⋯ + x_N) / N

  where the xᵢ are the population values and N is the population size

– For a sample of size n:

  x̄ = (Σᵢ₌₁ⁿ xᵢ) / n = (x₁ + x₂ + ⋯ + xₙ) / n

  where the xᵢ are the observed values and n is the sample size
Arithmetic Mean
(continued)

• The most common measure of central tendency


• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

  Data 1, 2, 3, 4, 5:  Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
  Data 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
Median

• In an ordered list, the median is the


“middle” number (50% above, 50% below)

  Data 1, 2, 3, 4, 5:  Median = 3
  Data 1, 2, 3, 4, 10: Median = 3

• Not affected by extreme values


Finding the Median

• The location of the median:


  Median position = (n + 1) / 2 th position in the ordered data

– If the number of values is odd, the median is the middle number
– If the number of values is even, the median is the average of the two middle numbers

• Note that (n + 1) / 2 is not the value of the median, only the position of the median in the ranked data
Mode

• A measure of central tendency


• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes

Example: in a data set where 9 occurs more often than any other value, Mode = 9; in a data set where no value repeats, there is no mode.
Review Example

• Five houses on a hill by the beach


House Prices:

  $2,000,000
  $500,000
  $300,000
  $100,000
  $100,000
Review Example: Summary Statistics

House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000 (Sum = $3,000,000)

• Mean: $3,000,000 / 5 = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
Which measure of location is the “best”?

• Mean is generally used, unless extreme


values (outliers) exist . . .
• Then median is often used, since the
median is not sensitive to extreme
values.
– Example: Median home prices may be
reported for a region – less sensitive to
outliers
Shape of a Distribution

• Describes how data are distributed


• Measures of shape
– Symmetric or skewed

Left-Skewed Symmetric Right-Skewed


Mean < Median Mean = Median Median < Mean
Geometric Mean

Used to measure the rate of change of a variable over time. The geometric mean is also used to calculate the average rate of return on an investment over time. It gives a more accurate idea of long-term growth, especially when returns vary.
  x_g = (x₁ · x₂ · ⋯ · xₙ)^(1/n) = ⁿ√(x₁ · x₂ · ⋯ · xₙ)

• Geometric mean rate of return


– Measures the average compound growth of an investment over time

  r_g = (x₁ · x₂ · ⋯ · xₙ)^(1/n) - 1

– where xᵢ = 1 + Rᵢ is the growth factor (one plus the rate of return) in time period i
Example

An investment of $100,000 rose to $150,000 at the


end of year one and increased to $180,000 at end
of year two:

X1 $100,000 X 2 $150,000 X3 $180,000

50% increase 20% increase

What is the mean percentage return over time?


Example
(continued)

Use the 1-year returns to compute the


arithmetic mean and the geometric mean:

Arithmetic mean rate of return:

  x̄ = (50% + 20%) / 2 = 35%   (misleading result)

Geometric mean rate of return:

  r_g = (x₁ · x₂)^(1/2) - 1
      = [(1.50)(1.20)]^(1/2) - 1
      = (1.80)^(1/2) - 1 = 1.3416 - 1 = 0.3416 = 34.16%   (accurate result)
Percentiles and Quartiles

• Percentiles and Quartiles indicate the position of a


value relative to the entire set of data
• Generally used to describe large data sets and to compare a value with others
• Example: scoring at the 90th percentile on an IQ test means your IQ is higher than 90% of the population, and 10% of the population has a higher IQ than you.

Pth percentile = the value located in the (P/100)(n + 1)th ordered position
Quartiles

• Quartiles split the ranked data into 4 segments with


an equal number of values per segment (note that
the widths of the segments may be different)

25% 25% 25% 25%

Q1 Q2 Q3
 The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger
 Q2 is the same as the median (50% are smaller, 50% are
larger)
 Only 25% of the observations are greater than the third
quartile
Quartile Formulas

Find a quartile by determining the value in the


appropriate position in the ranked data, where

First quartile position: Q1 = 0.25(n+1)

Second quartile position: Q2 = 0.50(n+1)


(the median position)

Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values


Quartiles

 Example: Find the first quartile


Sample Ranked Data: 11 12 13 16 16 17 18 21 22

(n = 9)
Q1 is in the 0.25(9+1) = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values:

Q1 = 12.5

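The positional rule used in this example can be written as a small Python helper (a sketch of the slides' (n+1)-position method; note that numpy and Python's statistics module use slightly different quantile conventions):

```python
# Positional quartile rule from the slides: Qk sits at position q*(n+1)
# in the ranked data, interpolating when the position is fractional.
def quartile(ranked, q):
    pos = q * (len(ranked) + 1)  # 1-based position in the ordered data
    lo = int(pos)                # index of the lower neighbour (1-based)
    frac = pos - lo
    if frac == 0:
        return ranked[lo - 1]
    return ranked[lo - 1] + frac * (ranked[lo] - ranked[lo - 1])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # n = 9, already ranked
print(quartile(data, 0.25))  # position 2.5 -> halfway between 12 and 13 = 12.5
```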
Five-Number Summary

The five-number summary refers to five descriptive


measures:
minimum
first quartile
median
third quartile
maximum

minimum < Q1 < median < Q3 < maximum

2.2 Measures of Variability

Variation can be measured by the:
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation

Measures of variation give information on the spread or variability of the data values. Two data sets can have the same center but different variation.
Range

• Simplest measure of variation


• Difference between the largest and the
smallest observations:
Range = Xlargest – Xsmallest

Example: for data with a minimum of 1 and a maximum of 14,

  Range = 14 - 1 = 13
Disadvantages of the Range

• Ignores the way in which data are distributed

Two differently shaped distributions over the values 7 to 12 both have Range = 12 - 7 = 5

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Interquartile Range

• Can eliminate some outlier problems by


using the interquartile range

• Eliminate high- and low-valued observations


and calculate the range of the middle 50% of
the data

• Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 - Q1


Interquartile Range

• The interquartile range (IQR) measures the


spread in the middle 50% of the data

• Defined as the difference between the


observation at the third quartile and the
observation at the first quartile

IQR = Q3 - Q1

Box-and-Whisker Plot

• A box-and-whisker plot is a graph that describes the


shape of a distribution
• Created from the five-number summary: the
minimum value, Q1, the median, Q3, and the
maximum
• The inner box shows the range from Q1 to Q3, with
a line drawn at the median
• Two “whiskers” extend from the box. One whisker is
the line from Q1 to the minimum, the other is the line
from Q3 to the maximum value
Box-and-Whisker Plot

The plot can be oriented horizontally or vertically

Example (horizontal plot annotated with the five-number summary):

  minimum = 12, Q1 = 30, median (Q2) = 45, Q3 = 57, maximum = 70

  Each of the four segments between these values contains 25% of the data.
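Putting the pieces together, a short Python sketch computes the five-number summary and the interquartile range for the ranked sample from the earlier quartile example (using the slides' (n+1)-position rule, which differs from numpy's default convention):

```python
def position_value(ranked, p):
    """Value at the (p/100)*(n+1)-th ordered position, interpolating as needed."""
    pos = (p / 100) * (len(ranked) + 1)
    lo = int(pos)
    frac = pos - lo
    if frac == 0:
        return ranked[lo - 1]
    return ranked[lo - 1] + frac * (ranked[lo] - ranked[lo - 1])

data = sorted([11, 12, 13, 16, 16, 17, 18, 21, 22])

five_number = {
    "minimum": data[0],
    "Q1": position_value(data, 25),
    "median": position_value(data, 50),
    "Q3": position_value(data, 75),
    "maximum": data[-1],
}
iqr = five_number["Q3"] - five_number["Q1"]   # spread of the middle 50%
print(five_number, iqr)  # Q1 = 12.5, median = 16, Q3 = 19.5, IQR = 7.0
```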
Population Variance

• Average of squared deviations of values from


the mean

– Population variance:

  σ² = Σᵢ₌₁ᴺ (xᵢ - μ)² / N

  where μ = population mean
        N = population size
        xᵢ = ith value of the variable x
Sample Variance

• Average (approximately) of squared


deviations of values from the mean
– Sample variance:

  s² = Σᵢ₌₁ⁿ (xᵢ - x̄)² / (n - 1)

  where x̄ = arithmetic mean
        n = sample size
        xᵢ = ith value of the variable x
Example: we have the following sample data points: 5, 7, 10, 12, 15

Step 1: Find the sample mean (x̄)
The mean is the sum of the values divided by the number of data points:

  x̄ = (5 + 7 + 10 + 12 + 15) / 5 = 49 / 5 = 9.8

Step 2: Find the differences from the mean
Subtract the mean (9.8) from each data point:
• 5 - 9.8 = -4.8
• 7 - 9.8 = -2.8
• 10 - 9.8 = 0.2
• 12 - 9.8 = 2.2
• 15 - 9.8 = 5.2

Step 3: Square the differences
• (-4.8)² = 23.04
• (-2.8)² = 7.84
• (0.2)² = 0.04
• (2.2)² = 4.84
• (5.2)² = 27.04

Step 4: Find the average of the squared differences
Sum all the squared differences: 23.04 + 7.84 + 0.04 + 4.84 + 27.04 = 62.8
Now divide by n - 1 = 5 - 1 = 4:

  Sample variance = 62.8 / 4 = 15.7
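The four steps above can be verified in a few lines of Python:

```python
import statistics

data = [5, 7, 10, 12, 15]
mean = sum(data) / len(data)                            # 49 / 5 = 9.8
squared_diffs = [(x - mean) ** 2 for x in data]         # 23.04, 7.84, 0.04, 4.84, 27.04
sample_variance = sum(squared_diffs) / (len(data) - 1)  # 62.8 / 4 = 15.7

# The standard library agrees:
print(round(sample_variance, 1), statistics.variance(data))  # 15.7 15.7
```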
Population Standard Deviation

• Most commonly used measure of


variation
• Shows variation about the mean
• Has the same units as the original data

– Population standard deviation:

  σ = √[ Σᵢ₌₁ᴺ (xᵢ - μ)² / N ]
Sample Standard Deviation

• Most commonly used measure of variation


• Shows variation about the mean
• Has the same units as the original data
– Sample standard deviation:

  s = √[ Σᵢ₌₁ⁿ (xᵢ - x̄)² / (n - 1) ]
Calculation Example:
Sample Standard Deviation

Sample
Data (xi) : 10 12 14 15 17 18 18 24
n = 8   Mean x̄ = 16

  s = √[ ((10 - x̄)² + (12 - x̄)² + (14 - x̄)² + ⋯ + (24 - x̄)²) / (n - 1) ]

    = √[ ((10 - 16)² + (12 - 16)² + (14 - 16)² + ⋯ + (24 - 16)²) / (8 - 1) ]

    = √(130 / 7) ≈ 4.3095

  A measure of the “average” scatter around the mean.
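The same calculation in Python (a quick check of the worked example):

```python
import math

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)
mean = sum(data) / n                       # 128 / 8 = 16
ss = sum((x - mean) ** 2 for x in data)    # sum of squared deviations = 130
s = math.sqrt(ss / (n - 1))                # sqrt(130 / 7)

print(round(s, 4))  # 4.3095
```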
Measuring variation

A small standard deviation means the data are concentrated close to the mean; a large standard deviation means the data are spread widely around it.
Comparing Standard Deviations

Mean = 15.5 for each data set (values between 11 and 21):

  Data A: s = 3.338 (compare to the two cases below)
  Data B: s = 0.926 (values are concentrated near the mean)
  Data C: s = 4.570 (values are dispersed far from the mean)
Advantages of Variance and
Standard Deviation

• Each value in the data set is used in


the calculation

• Values far from the mean are given


extra weight
(because deviations from the mean are
squared)

Coefficient of Variation

• Measures relative variation


• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more
sets of data measured in different units
Population coefficient of variation:   CV = (σ / μ) × 100%

Sample coefficient of variation:       CV = (s / x̄) × 100%
Comparing Coefficient of Variation

• Stock A:
  – Average price last year = $50
  – Standard deviation = $5

    CV_A = (s / x̄) × 100% = ($5 / $50) × 100% = 10%

• Stock B:
  – Average price last year = $100
  – Standard deviation = $5

    CV_B = (s / x̄) × 100% = ($5 / $100) × 100% = 5%

Both stocks have the same standard deviation, but stock B is less variable relative to its price.
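A minimal Python sketch of the comparison:

```python
def cv(std_dev, mean):
    """Sample coefficient of variation, in percent."""
    return std_dev / mean * 100

cv_a = cv(5, 50)    # Stock A: $5 / $50  -> 10 %
cv_b = cv(5, 100)   # Stock B: $5 / $100 -> 5 %
print(cv_a, cv_b)
```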
2.3 Weighted Mean and Measures of Grouped Data

• The weighted mean of a set of data is

  x̄ = (Σᵢ₌₁ⁿ wᵢxᵢ) / n = (w₁x₁ + w₂x₂ + ⋯ + wₙxₙ) / n

• where wᵢ is the weight of the ith observation and n = Σ wᵢ
• Use when data is already grouped into n classes, with wᵢ values in the ith class
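As an illustration, a weighted mean in Python; the scores and weights below are made up for the example, not taken from the slides:

```python
# Hypothetical example: a course grade built from weighted components
scores = [85, 90, 78]        # exam, homework, project scores (assumed values)
weights = [0.5, 0.3, 0.2]    # weights, summing to 1

weighted_mean = sum(w * x for w, x in zip(weights, scores)) / sum(weights)
print(round(weighted_mean, 1))  # 0.5*85 + 0.3*90 + 0.2*78 = 85.1
```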
Approximations for Grouped Data

Suppose data are grouped into K classes, with


frequencies f1, f2, . . ., fK, and the midpoints of the
classes are m1, m2, . . ., mK

• For a sample of n observations, the mean is


  x̄ = (Σᵢ₌₁ᴷ fᵢmᵢ) / n,   where n = Σᵢ₌₁ᴷ fᵢ
Approximations for Grouped Data

Suppose data are grouped into K classes, with


frequencies f1, f2, . . ., fK, and the midpoints of the
classes are m1, m2, . . ., mK

• For a sample of n observations, the variance is


  s² = Σᵢ₌₁ᴷ fᵢ(mᵢ - x̄)² / (n - 1)
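A short Python sketch of the grouped-data approximations; the class midpoints and frequencies below are assumed values, not from the slides:

```python
# Hypothetical grouped data: K = 4 classes
midpoints = [5, 15, 25, 35]    # class midpoints m_i
frequencies = [2, 4, 3, 1]     # class frequencies f_i

n = sum(frequencies)                                            # 10 observations
mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n   # (10+60+75+35)/10 = 18
variance = sum(f * (m - mean) ** 2
               for f, m in zip(frequencies, midpoints)) / (n - 1)

print(mean, variance)  # 18.0 90.0
```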
2.4 Measures of Relationships Between Variables

Two measures of the relationship between variables are:
• Covariance
– a measure of the direction of a linear
relationship between two variables

• Correlation Coefficient
– a measure of both the direction and the
strength of a linear relationship between two
variables
Covariance

• The covariance measures the direction of the linear relationship between two variables

• The population covariance:

  Cov(x, y) = σ_xy = Σᵢ₌₁ᴺ (xᵢ - μₓ)(yᵢ - μ_y) / N

• The sample covariance:

  Cov(x, y) = s_xy = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / (n - 1)

– Only concerned with the direction of the relationship
– No causal effect is implied
Interpreting Covariance

• Covariance between two variables:

Cov(x, y) > 0: x and y tend to move in the same direction
Cov(x, y) < 0: x and y tend to move in opposite directions
Cov(x, y) = 0: x and y have no linear association (zero covariance does not guarantee independence)
Coefficient of Correlation

• Measures the relative strength of the linear relationship


between two variables

• Population correlation coefficient:

  ρ = Cov(x, y) / (σ_X σ_Y)

• Sample correlation coefficient:

  r = Cov(x, y) / (s_X s_Y)
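A small Python sketch of both measures; the data set below is made up for illustration:

```python
import math

# Small illustrative data set (values assumed, not from the slides)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n   # 3.0
mean_y = sum(y) / n   # 4.0

# Sample covariance: sum of cross-deviations over (n - 1)
cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)

# Sample correlation: covariance divided by both standard deviations
s_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / (n - 1))
s_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / (n - 1))
r = cov_xy / (s_x * s_y)

print(cov_xy, round(r, 4))  # 1.5 0.7746 (positive, moderately strong)
```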
Features of Correlation Coefficient, r

• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear
relationship
• The closer to 1, the stronger the positive linear
relationship
• The closer to 0, the weaker any linear relationship

Scatter Plots of Data with Various Correlation
Coefficients

(Scatter plots of Y versus X illustrate r = -1, r = -0.6, r = 0, r = +1, and r = +0.3: the more tightly the points cluster around a straight line, the closer |r| is to 1, and the sign of r matches the direction of the slope.)
11.1 Overview of Linear Models

• An equation can be fit to show the best linear


relationship between two variables:

Y = β0 + β1X

Where Y is the dependent variable,
X is the independent variable,
β0 is the Y-intercept, and
β1 is the slope
Least Squares Regression

• Estimates for coefficients β0 and β1 are found


using a Least Squares Regression technique
• The least-squares regression line, based on sample data, is

yˆ b0  b1x
• Where b1 is the slope of the line and b0 is the y-intercept:

Cov(x, y)  s y  b0 y  b1x
b1  2
r  
sx  sx 

Ch. 11-58
Introduction to Regression Analysis

• Regression analysis is used to:


– Predict the value of a dependent variable based on
the value of at least one independent variable
– Explain the impact of changes in an independent
variable on the dependent variable
Dependent variable: the variable we wish to explain
(also called the endogenous variable)
Independent variable: the variable used to explain
the dependent variable
(also called the exogenous variable)
11.2 Linear Regression Model

• The relationship between X and Y is described by


a linear function
• Changes in Y are assumed to be influenced by
changes in X
• Linear regression population equation model

y i β0  β1x i  ε i
• Where 0 and 1 are the population model
coefficients and  is a random error term.

Ch. 11-60
Simple Linear Regression Model

The population regression model:


  yᵢ = β0 + β1xᵢ + εᵢ

  where yᵢ is the dependent variable, β0 the population Y-intercept,
  β1 the population slope coefficient, xᵢ the independent variable,
  and εᵢ the random error term. β0 + β1xᵢ is the linear component;
  εᵢ is the random error component.
Simple Linear Regression Model
(continued)

(Graphically: for a given xᵢ, the observed value Yᵢ = β0 + β1Xᵢ + εᵢ lies a random error εᵢ above or below the population line, which has intercept β0 and slope β1; the point on the line is the predicted value of Y for that xᵢ.)
Least Squares Coefficient Estimators

(continued)
• The slope coefficient estimator is

  b1 = Σᵢ₌₁ⁿ (xᵢ - x̄)(yᵢ - ȳ) / Σᵢ₌₁ⁿ (xᵢ - x̄)² = Cov(x, y) / s²ₓ = r (s_y / sₓ)

• And the constant or y-intercept is

  b0 = ȳ - b1x̄

• The regression line always goes through the mean (x̄, ȳ)
Computer Computation of Regression
Coefficients

• The coefficients b0 and b1 , and other


regression results in this chapter, will be found
using a computer
– Hand calculations are tedious
– Statistical routines are built into Excel
– Other statistical analysis software can be used

Interpretation of the Slope and the
Intercept

• b0 is the estimated average value of y


when the value of x is zero (if x = 0 is in
the range of observed x values)

• b1 is the estimated change in the average


value of y as a result of a one-unit change
in x

Simple Linear Regression Example

• A real estate agent wishes to examine the relationship


between the selling price of a home and its size
(measured in square feet)
• A random sample of 10 houses is selected
– Dependent variable (Y) = house price in $1000s
– Independent variable (X) = square feet

Sample Data for House Price Model

House Price in $1000s (Y)    Square Feet (X)
  245                          1400
  312                          1600
  279                          1700
  308                          1875
  199                          1100
  219                          1550
  405                          2350
  324                          2450
  319                          1425
  255                          1700
Graphical Presentation

• House price model: scatter plot


450
400
House Price ($1000s)

350
300
250
200
150
100
50
0
0 500 1000 1500 2000 2500 3000
Square Feet

Ch. 11-71
Regression Using Excel
(continued)
• Data / Data Analysis / Regression

Provide desired input:

Excel Output

Regression Statistics
  Multiple R          0.76211
  R Square            0.58082
  Adjusted R Square   0.52842
  Standard Error      41.33032
  Observations        10

The regression equation is:

  house price = 98.24833 + 0.10977 (square feet)

ANOVA
              df   SS          MS          F        Significance F
  Regression   1   18934.9348  18934.9348  11.0848  0.01039
  Residual     8   13665.5652  1708.1957
  Total        9   32600.5000

              Coefficients  Standard Error  t Stat   P-value  Lower 95%  Upper 95%
  Intercept    98.24833     58.03348        1.69296  0.12892  -35.57720  232.07386
  Square Feet  0.10977      0.03297         3.32938  0.01039  0.03374    0.18580
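The Excel coefficients can be reproduced by hand with the least-squares formulas from earlier in the lecture:

```python
# Least-squares fit of the house-price data from the slides,
# reproducing the Excel coefficients.
x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # square feet
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # price in $1000s

n = len(x)
mean_x = sum(x) / n   # 1715.0
mean_y = sum(y) / n   # 286.5

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)

b1 = s_xy / s_xx           # slope
b0 = mean_y - b1 * mean_x  # intercept

print(round(b0, 5), round(b1, 5))  # 98.24833 0.10977
```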
Interpretation of the Intercept, b0

house price 98.24833  0.10977 (square feet)

• b0 is the estimated average value of Y when


the value of X is zero (if X = 0 is in the
range of observed X values)
– Here, no houses had 0 square feet, so b0 =
98.24833 just indicates that, for houses within the
range of sizes observed, $98,248.33 is the portion
of the house price not explained by square feet
Interpretation of the Slope Coefficient, b1

house price 98.24833  0.10977 (square feet)

• b1 measures the estimated change in the


average value of Y as a result of a one-unit
change in X
Here, b1 = 0.10977 tells us that the average value of a house increases by 0.10977 × ($1000) = $109.77, on average, for each additional square foot of size.
Explanatory Power of a
11.4 Linear Regression Equation

• Total variation is made up of two parts:


  SST = SSR + SSE

  Total sum of squares:            SST = Σ (yᵢ - ȳ)²
  Regression sum of squares:       SSR = Σ (ŷᵢ - ȳ)²
  Error (residual) sum of squares: SSE = Σ (yᵢ - ŷᵢ)²

where:
  ȳ  = average value of the dependent variable
  yᵢ = observed values of the dependent variable
  ŷᵢ = predicted value of y for the given xᵢ value
Analysis of Variance

• SST = total sum of squares


– Measures the variation of the yi values around
their mean, y
• SSR = regression sum of squares
– Explained variation attributable to the linear
relationship between x and y
• SSE = error sum of squares
– Variation attributable to factors other than the
linear relationship between x and y
Analysis of Variance
(continued)
(Graphically: for each observation, the total deviation yᵢ - ȳ, whose squares sum to SST, splits into unexplained variation yᵢ - ŷᵢ, whose squares sum to SSE, and explained variation ŷᵢ - ȳ, whose squares sum to SSR.)
Coefficient of Determination, R2

• The coefficient of determination is the portion of


the total variation in the dependent variable that is
explained by variation in the independent variable
• The coefficient of determination is also called R-
squared and is denoted as R2
  R² = SSR / SST = regression sum of squares / total sum of squares

  note: 0 ≤ R² ≤ 1
Examples of Approximate r2 Values

r² = 1: a perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
Examples of Approximate r2 Values

0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.
Examples of Approximate r2 Values

r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
Excel Output

  R² = SSR / SST = 18934.9348 / 32600.5000 = 0.58082

  58.08% of the variation in house prices is explained by variation in square feet.

  (Regression Statistics: Multiple R 0.76211, R Square 0.58082, Adjusted R Square 0.52842, Standard Error 41.33032, Observations 10; the ANOVA and coefficient tables are the same as shown earlier.)
Correlation and R2

• The coefficient of determination, R², for a simple regression is equal to the simple correlation squared:

  R² = r²
Estimation of Model Error Variance

• An estimator for the variance of the population model


error is
  σ̂² = s²ₑ = Σᵢ₌₁ⁿ eᵢ² / (n - 2) = SSE / (n - 2)

• Division by n - 2 instead of n - 1 is because the simple regression model uses two estimated parameters, b0 and b1, instead of one

  sₑ = √(s²ₑ) is called the standard error of the estimate
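Continuing the house-price example, a Python sketch reproduces R² and the standard error of the estimate reported by Excel:

```python
import math

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]  # square feet
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]            # price in $1000s

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
     sum((xi - mean_x) ** 2 for xi in x)
b0 = mean_y - b1 * mean_x

y_hat = [b0 + b1 * xi for xi in x]                      # predicted values
sst = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # error sum of squares
ssr = sst - sse                                         # regression sum of squares

r_squared = ssr / sst            # proportion of variation explained
s_e = math.sqrt(sse / (n - 2))   # standard error of the estimate

print(round(r_squared, 5), round(s_e, 5))  # 0.58082 41.33032
```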
Excel Output

  From the Excel output: sₑ = Standard Error = 41.33032

  (The remaining Regression Statistics, ANOVA, and coefficient values are the same as shown earlier.)
Comparing Standard Errors

sₑ is a measure of the variation of observed y values from the regression line. A small sₑ means the points lie close to the line; a large sₑ means they scatter widely around it.

The magnitude of sₑ should always be judged relative to the size of the y values in the sample data; i.e., sₑ = $41.33K is moderately small relative to house prices in the $200K - $300K range.
Inference about the Slope: t Test
(continued)

Estimated regression equation (from the 10-house sample of price in $1000s versus square feet):

  house price = 98.25 + 0.1098 (sq. ft.)

The slope of this model is 0.1098. Does the square footage of the house significantly affect its sales price?
Inferences about the Slope: t Test Example

H0: β1 = 0
H1: β1 ≠ 0

From the Excel output:

              Coefficients  Standard Error  t Stat   P-value
  Intercept    98.24833     58.03348        1.69296  0.12892
  Square Feet  0.10977      0.03297         3.32938  0.01039

  t = (b1 - β1) / s_b1 = (0.10977 - 0) / 0.03297 = 3.32938
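The t statistic can also be reproduced from the raw data (a sketch; s_b1 = sₑ / √Sxx is the standard formula for the slope's standard error, not shown explicitly on these slides):

```python
import math

x = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
y = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
b1 = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / s_xx
b0 = mean_y - b1 * mean_x

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s_e = math.sqrt(sse / (n - 2))      # standard error of the estimate
s_b1 = s_e / math.sqrt(s_xx)        # standard error of the slope

t = (b1 - 0) / s_b1                 # test statistic for H0: beta1 = 0
print(round(s_b1, 5), round(t, 5))  # 0.03297 3.32938
```

With 3.32938 well above the usual critical value for n - 2 = 8 degrees of freedom (p = 0.01039 in the Excel output), square footage significantly affects sales price.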
11.8 Beta Measure of Financial Risk

• A Beta Coefficient is a measure of how the


returns of a particular firm respond to the
returns of a broad stock index (such as the
S&P 500)
• For a specific firm, the Beta Coefficient is the
slope coefficient from a regression of the
firm’s returns compared to the overall market
returns over some specified time period

Beta Coefficient Example

• The slope coefficient from this regression is the Beta Coefficient. The regression output also provides information about the quality of the model that produces the estimate of beta.
Questions and Answers
