Combined File
Copyright © 2018, 2012, 2007 Pearson Education, Inc. All Rights Reserved
Product moment correlation
From a sample of n observations on X and Y, the product moment correlation, r, can be calculated as:

r = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² Σ(Yi − Ȳ)²]   (sums over i = 1, …, n)

Dividing the numerator and denominator by n − 1 gives:

r = [Σ(Xi − X̄)(Yi − Ȳ)/(n − 1)] / [√(Σ(Xi − X̄)²/(n − 1)) √(Σ(Yi − Ȳ)²/(n − 1))]
  = COVxy / (Sx Sy)
• r varies between −1.0 and +1.0.
Table 22.1
Explaining attitude towards the city of
residence
The correlation coefficient may be calculated as follows:

X̄ = (10 + 12 + 12 + 4 + 12 + 6 + 8 + 2 + 18 + 9 + 17 + 2)/12 = 9.333
Ȳ = (6 + 9 + 8 + 3 + 10 + 4 + 5 + 2 + 11 + 9 + 10 + 2)/12 = 6.583

Σ(Xi − X̄)(Yi − Ȳ) = (10 − 9.33)(6 − 6.58) + (12 − 9.33)(9 − 6.58)
  + (12 − 9.33)(8 − 6.58) + (4 − 9.33)(3 − 6.58)
  + (12 − 9.33)(10 − 6.58) + (6 − 9.33)(4 − 6.58)
  + (8 − 9.33)(5 − 6.58) + (2 − 9.33)(2 − 6.58)
  + (18 − 9.33)(11 − 6.58) + (9 − 9.33)(9 − 6.58)
  + (17 − 9.33)(10 − 6.58) + (2 − 9.33)(2 − 6.58)
  = −0.3886 + 6.4614 + 3.7914 + 19.0814
  + 9.1314 + 8.5914 + 2.1014 + 33.5714
  + 38.3214 − 0.7986 + 26.2314 + 33.5714
  = 179.6668
Σ(Xi − X̄)² = (10 − 9.33)² + (12 − 9.33)² + (12 − 9.33)² + (4 − 9.33)²
  + (12 − 9.33)² + (6 − 9.33)² + (8 − 9.33)² + (2 − 9.33)²
  + (18 − 9.33)² + (9 − 9.33)² + (17 − 9.33)² + (2 − 9.33)²
  = 0.4489 + 7.1289 + 7.1289 + 28.4089
  + 7.1289 + 11.0889 + 1.7689 + 53.7289
  + 75.1689 + 0.1089 + 58.8289 + 53.7289
  = 304.6668

Σ(Yi − Ȳ)² = (6 − 6.58)² + (9 − 6.58)² + (8 − 6.58)² + (3 − 6.58)²
  + (10 − 6.58)² + (4 − 6.58)² + (5 − 6.58)² + (2 − 6.58)²
  + (11 − 6.58)² + (9 − 6.58)² + (10 − 6.58)² + (2 − 6.58)²
  = 0.3364 + 5.8564 + 2.0164 + 12.8164
  + 11.6964 + 6.6564 + 2.4964 + 20.9764
  + 19.5364 + 5.8564 + 11.6964 + 20.9764
  = 120.9168

Thus, r = 179.6668 / √[(304.6668)(120.9168)] = 0.9361
Decomposition of the total variation
r² = explained variation / total variation
   = SSreg / SSy
• When it is computed for a population rather than a sample, the
product moment correlation is denoted by ρ, the Greek letter
rho. The coefficient r is an estimator of ρ.

H0: ρ = 0
H1: ρ ≠ 0
The test statistic is:

t = r √[(n − 2)/(1 − r²)]

which has a t distribution with n − 2 degrees of freedom.
For the correlation coefficient calculated based on the
data given in Table 22.1,

t = 0.9361 × √[(12 − 2)/(1 − 0.9361²)]
  = 8.414

and the degrees of freedom df = 12 − 2 = 10. From the
t distribution table (Table 4 in the statistical appendix),
the critical value of t for a two-tailed test and
α = 0.05 is 2.228. Hence, the null hypothesis of no
relationship between X and Y is rejected.
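This worked example can be reproduced with a short script. The x and y vectors below are the twelve observations used in the calculations above; the variable names are illustrative:

```python
import math

# The twelve observations from the Table 22.1 worked example
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(x)

mean_x = sum(x) / n   # 9.333
mean_y = sum(y) / n   # 6.583

# Sums of squares and cross-products
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)          # product moment correlation
t = r * math.sqrt((n - 2) / (1 - r ** 2))  # test statistic, df = n - 2

print(round(r, 4), round(t, 2))  # 0.9361 8.41
```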
Partial correlation
Regression analysis
Regression analysis examines associative relationships
between a metric-dependent variable and one or more
independent variables in the following ways:
• Determine whether the independent variables explain a significant
variation in the dependent variable: whether a relationship exists.
• Determine how much of the variation in the dependent variable can
be explained by the independent variables: strength of the
relationship.
• Determine the structure or form of the relationship: the mathematical
equation relating the independent and dependent variables.
• Predict the values of the dependent variable.
• Control for other independent variables when evaluating the
contributions of a specific variable or set of variables.
Regression analysis is concerned with the nature and degree of
association between variables and does not imply or assume any
causality.
Statistics associated with bivariate
regression analysis
• Regression coefficient. The estimated parameter, b, is
usually referred to as the non-standardised regression
coefficient.
• Scattergram. A scatter diagram, or scattergram, is a plot
of the values of two variables for all the cases or
observations.
• Standard error of estimate. This statistic, SEE, is the
standard deviation of the actual Y values from the
predicted Ŷ values.
• Standard error. The standard deviation of b, SEb, is
called the standard error.
• Standardised regression coefficient. Also termed the
beta coefficient or beta weight, this is the slope obtained
by the regression of Y on X when the data are
standardised.
• Sum of squared errors. The distances of all the points
from the regression line are squared and added together
to arrive at the sum of squared errors, which is a
measure of total error, Σej².
• t statistic. A t statistic with n − 2 degrees of freedom can
be used to test the null hypothesis that no linear
relationship exists between X and Y, or:
H0: β1 = 0, where t = b/SEb
Conducting bivariate regression analysis
plot the scatter diagram
Figure 22.2
Conducting bivariate regression
analysis
Conducting bivariate regression analysis
formulate the bivariate regression model
In the bivariate regression model, the general form of a
straight line is: Y = β0 + β1X
where
Y = dependent or criterion variable
X = independent or predictor variable
β0 = intercept of the line
β1 = slope of the line.

Yi = β0 + β1Xi + ei, where ei is the error term associated with the i-th observation.
Figure 22.4
Which straight line is best?
Figure 22.5
Bivariate regression
Figure 22.6
Decomposition of the total variation in
bivariate regression
Conducting bivariate regression analysis
estimate the parameters
In most cases, β0 and β1 are unknown and are estimated
from the sample observations using the equation:

Ŷi = a + bXi

where Ŷi is the estimated or predicted value of Yi, and
a and b are estimators of β0 and β1, respectively.

b = COVxy / Sx²
  = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)²   (sums over i = 1, …, n)
  = (ΣXiYi − nX̄Ȳ) / (ΣXi² − nX̄²)
The intercept, a, may then be calculated using:

a = Ȳ − bX̄

For the data in Table 22.1,

ΣXi² = 10² + 12² + 12² + 4² + 12² + 6²
  + 8² + 2² + 18² + 9² + 17² + 2²
  = 1350
It may be recalled from earlier calculations of the simple correlation that:

X̄ = 9.333
Ȳ = 6.583

Given n = 12, b can be calculated as:

b = [917 − (12)(9.333)(6.583)] / [1350 − (12)(9.333)²]
  = 0.5897

a = Ȳ − bX̄
  = 6.583 − (0.5897)(9.333)
  = 1.0793
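The same estimates can be obtained directly from the raw observations. A minimal sketch, using the twelve observations from the earlier correlation example:

```python
# Least-squares estimates b (slope) and a (intercept)
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(x)

mean_x = sum(x) / n
mean_y = sum(y) / n

sum_xy = sum(xi * yi for xi, yi in zip(x, y))  # 917
sum_x2 = sum(xi ** 2 for xi in x)              # 1350

b = (sum_xy - n * mean_x * mean_y) / (sum_x2 - n * mean_x ** 2)
a = mean_y - b * mean_x

print(round(b, 4), round(a, 4))  # 0.5897 1.0793
```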
Conducting bivariate regression analysis
estimate the standardised regression
coefficient
• Standardisation is the process by which the raw data are
transformed into new variables that have a mean of 0 and a
variance of 1.
• When the data are standardised, the intercept assumes a
value of 0.
• The term beta coefficient or beta weight is used to denote
the standardised regression coefficient.
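In the bivariate case the beta coefficient equals the product moment correlation r. A small sketch that checks this on the earlier example data (the standardise helper is illustrative):

```python
import math

x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(x)

def standardise(v):
    """Transform to mean 0 and variance 1 (sample variance, n - 1)."""
    m = sum(v) / n
    s = math.sqrt(sum((vi - m) ** 2 for vi in v) / (n - 1))
    return [(vi - m) / s for vi in v]

zx, zy = standardise(x), standardise(y)

# Slope of the regression of standardised Y on standardised X;
# the intercept of this regression is 0
beta = sum(p * q for p, q in zip(zx, zy)) / sum(p * p for p in zx)
print(round(beta, 4))  # 0.9361 — equals r
```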
Conducting bivariate regression analysis
test for significance
The statistical significance of the linear relationship
between X and Y may be tested by examining the
hypotheses:
H0: β1 = 0
H1: β1 ≠ 0

A t statistic with n − 2 degrees of freedom can be
used, where t = b/SEb
SEb denotes the standard deviation of b and is called
the standard error.
Conducting bivariate regression analysis
test for significance
Conducting bivariate regression analysis
determine the strength and significance
of association
The total variation, SSy, may be decomposed into the variation
accounted for by the regression line, SSreg, and the error or
residual variation, SSerror or SSres, as follows:

SSy = SSreg + SSres

where
SSy = Σ(Yi − Ȳ)²
SSreg = Σ(Ŷi − Ȳ)²
SSres = Σ(Yi − Ŷi)²   (all sums over i = 1, …, n)
The strength of association may then be calculated as follows:

r² = SSreg / SSy
   = (SSy − SSres) / SSy

From the earlier calculations, the total variation is:

SSy = Σ(Yi − Ȳ)² = 120.9168
The predicted values (Ŷ) can be calculated using the regression
equation:

Ŷi = 1.0793 + 0.5897 Xi
Therefore,

SSreg = Σ(Ŷi − Ȳ)² = 105.9466

r² = SSreg / SSy
   = 105.9466 / 120.9168
   = 0.8762
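A sketch that verifies the decomposition numerically, assuming the intercept and slope (a = 1.0793, b = 0.5897) computed earlier:

```python
x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]
y = [6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2]
n = len(x)
a, b = 1.0793, 0.5897             # intercept and slope estimated earlier

mean_y = sum(y) / n
y_hat = [a + b * xi for xi in x]  # predicted values

ss_y   = sum((yi - mean_y) ** 2 for yi in y)                 # total variation
ss_reg = sum((yh - mean_y) ** 2 for yh in y_hat)             # explained variation
ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))     # residual variation

r2 = ss_reg / ss_y
print(round(ss_reg, 2), round(ss_res, 2), round(r2, 4))  # 105.95 14.96 0.8762
```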
Another, equivalent test for examining the significance of the
linear relationship between X and Y (significance of b) is the
test for the significance of the coefficient of determination.
The hypotheses in this case are:

H0: R²pop = 0
H1: R²pop > 0
The appropriate test statistic is the F statistic:

F = SSreg / [SSres/(n − 2)]

which has an F distribution with 1 and (n − 2) df. The F test is a generalised form
of the t test. If a random variable is t distributed with n degrees of freedom, then
t² is F distributed with 1 and n df. Hence, the F test for testing the significance of
the coefficient of determination is equivalent to testing the following hypotheses:

H0: β1 = 0
H1: β1 ≠ 0

or

H0: ρ = 0
H1: ρ ≠ 0
From Table 22.2, it can be seen that:

r² = 105.9522 / (105.9522 + 14.9644)
   = 0.8762

which is the same as the value calculated earlier. The value of the
F statistic is:

F = 105.9522 / (14.9644/10)
  = 70.8028

with 1 and 10 df. The calculated F statistic exceeds the critical value of
4.96 determined from Table 5 in the statistical appendix. Therefore, the
relationship is significant at α = 0.05, corroborating the results of the
t test.
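The F calculation, and its relation to the earlier t test, can be sketched as follows (the sums of squares are the values quoted from Table 22.2):

```python
ss_reg, ss_res, n = 105.9522, 14.9644, 12  # values quoted from Table 22.2

f = ss_reg / (ss_res / (n - 2))
print(round(f, 4))  # 70.8028

# F with (1, n - 2) df is the square of t with n - 2 df,
# so the F test reproduces the earlier t statistic
t = f ** 0.5
print(round(t, 2))  # 8.41
```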
Table 22.2
Bivariate regression
Conducting bivariate regression analysis
check prediction accuracy
To estimate the accuracy of predicted values, Ŷ, it is useful to
calculate the standard error of estimate, SEE:

SEE = √[ Σ(Yi − Ŷi)² / (n − 2) ]   (sum over i = 1, …, n)

or

SEE = √[ SSres / (n − 2) ]

or, more generally, if there are k independent variables,

SEE = √[ SSres / (n − k − 1) ]

For the data given in Table 22.2, the SEE is estimated as follows:

SEE = √[14.9644/(12 − 2)]
    = 1.22329
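A minimal sketch of the SEE calculation for these data:

```python
import math

ss_res = 14.9644   # residual sum of squares from Table 22.2
n, k = 12, 1       # sample size; one independent variable

see = math.sqrt(ss_res / (n - k - 1))
print(round(see, 5))  # 1.22329
```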
Assumptions
• While a frequency distribution describes one
variable at a time, a cross-tabulation describes
two or more variables simultaneously.
Table 20.3
Gender and internet usage
Two variables cross-tabulation
Table 20.4
Gender and internet usage – column
totals
Table 20.5
Gender and internet usage – row totals
Three variables cross-tabulation
refine an initial relationship
• As shown in Table 20.7, in the case of females, 60% of the
unmarried participants fall in the high-purchase category, as
compared to 25% of those who are married. On the other
hand, the percentages are much closer for males, with 40% of
the unmarried participants and 35% of the married participants
falling in the high-purchase category.
Table 20.6
Purchase of luxury branded clothing by
marital status
Table 20.7
Purchase of luxury branded clothing by
marital status and gender
Three variables cross-tabulation
initial relationship was spurious
Table 20.9
Ownership of expensive cars by
education and income levels
Three variables cross-tabulation
reveal suppressed association
• Table 20.10 shows no association between desire to travel abroad
and age.
• When gender was introduced as the third variable, Table 20.11
was obtained. Among men, 60% of those under 45 indicated a
desire to travel abroad, as compared to 40% of those 45 or older.
The pattern was reversed for women, where 35% of those under
45 indicated a desire to travel abroad as opposed to 65% of those
45 or older.
• Since the association between desire to travel abroad and age
runs in the opposite direction for males and females, the
relationship between these two variables is masked when the data
are aggregated across gender as in Table 20.10.
• But when the effect of gender is controlled, as in Table 20.11, the
suppressed association between desire to travel abroad and age
is revealed for the separate categories of males and females.
Table 20.10
Desire to travel abroad by age
Table 20.11
Desire to travel abroad by age and
gender
Statistics associated with
cross-tabulation
Statistics associated with
cross-tabulation chi-square
χ² = Σ over all cells of (fo − fe)² / fe
For the data in Table 20.3, the value of χ² is
calculated as:

χ² = 3.333
• The chi-square distribution is a skewed distribution whose
shape depends solely on the number of degrees of freedom.
As the number of degrees of freedom increases, the chi-square
distribution becomes more symmetrical.
• Table 3 in the statistical appendix contains upper-tail areas of
the chi-square distribution for different degrees of freedom. For
1 degree of freedom the probability of exceeding a chi-square
value of 3.841 is 0.05.
• For the cross-tabulation given in Table 20.3, there are
(2 − 1) × (2 − 1) = 1 degree of freedom. The calculated chi-
square statistic had a value of 3.333. Since this is less than the
critical value of 3.841, the null hypothesis of no association
cannot be rejected, indicating that the association is not
statistically significant at the 0.05 level.
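A sketch of the χ² computation. Table 20.3 itself is not reproduced in this text, so the observed counts below are hypothetical, chosen only to be consistent with the stated χ² value of 3.333 for a 2 × 2 table with equal margins:

```python
# Chi-square statistic for a 2x2 cross-tabulation.
# The observed counts are hypothetical (rows: gender, cols: usage level),
# chosen to reproduce the chi-square of 3.333 quoted in the text.
observed = [[5, 10],
            [10, 5]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected frequency
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(round(chi2, 3), df)  # 3.333 1 — below the critical value of 3.841
```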
Lecture 3
Descriptive Statistics and
Hypothesis Testing
Frequency distribution
Table 20.2
Frequency distribution of ‘Familiarity
with the internet’
Figure 20.1
Frequency histogram
Statistics associated with frequency
distribution measures of location
• The median of a sample is the middle value when the data are
arranged in ascending or descending order. If the number of data
points is even, the median is usually estimated as the midpoint
between the two middle values – by adding the two middle values
and dividing their sum by 2. The median is the 50th percentile.
Statistics associated with frequency
distribution measures of variability
• The variance is the mean squared deviation from the
mean. The variance can never be negative.
• The standard deviation is the square root of the
variance:

sx = √[ Σ(Xi − X̄)² / (n − 1) ]   (sum over i = 1, …, n)

• The coefficient of variation is the ratio of the
standard deviation to the mean expressed as a
percentage, and it is a unitless measure of relative
variability:

CV = sx / X̄
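These measures can be sketched as follows (the data vector is illustrative, not taken from any table in the text):

```python
import math

x = [10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2]  # illustrative data
n = len(x)
mean = sum(x) / n

variance = sum((xi - mean) ** 2 for xi in x) / (n - 1)  # sample variance
sd = math.sqrt(variance)                                # standard deviation
cv = sd / mean * 100                                    # coefficient of variation, %

print(round(sd, 3), round(cv, 1))  # 5.263 56.4
```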
Statistics associated with frequency
distribution measures of shape
Figure 20.2
Skewness of a distribution
Hypothesis testing
• A null hypothesis may be rejected, but it can never be
accepted based on a single test. In classical hypothesis
testing, there is no way to determine whether the null
hypothesis is true.
• The null hypothesis is formulated in such a way that its
rejection leads to the acceptance of the desired
conclusion. The alternative hypothesis represents the
conclusion for which evidence is sought.
H0: π ≤ 0.40
H1: π > 0.40
A general procedure for hypothesis testing
step 1: Formulate the hypothesis
A general procedure for hypothesis testing
step 2: Select an appropriate statistical
technique
• The test statistic measures how close the sample has
come to the null hypothesis.
• The test statistic often follows a well-known distribution,
such as the normal, t, or chi-square distribution.
• In our example, the z statistic, which follows the standard
normal distribution, would be appropriate.
z = (p − π) / σp

where

σp = √[ π(1 − π) / n ]
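A sketch of the z calculation. The hypothesised proportion π = 0.40 comes from the example hypotheses above; the sample proportion and sample size below are hypothetical:

```python
import math

pi0 = 0.40   # hypothesised population proportion (from the example)
p = 0.45     # hypothetical sample proportion
n = 500      # hypothetical sample size

sigma_p = math.sqrt(pi0 * (1 - pi0) / n)   # standard error under H0
z = (p - pi0) / sigma_p
print(round(z, 2))  # 2.28 — exceeds 1.645, the one-tailed critical value at alpha = 0.05
```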
A general procedure for hypothesis testing
step 3: Choose the level of significance, α
Type I error
• Type I error occurs when the sample results lead to the
rejection of the null hypothesis when it is in fact true.
• The probability of type I error (α) is also called the level
of significance.

Type II error
• Type II error occurs when, based on the sample results,
the null hypothesis is not rejected when it is in fact false.
• The probability of type II error is denoted by β.
• Unlike α, which is specified by the researcher, the
magnitude of β depends on the actual value of the
population parameter (proportion).
Figure 20.6
A broad classification of hypothesis
testing procedures
Measurement and Scaling
Measurement and scaling
Scaling involves creating a continuum upon
which measured objects are located.
Consider an attitude scale from 1 to 100. Each
respondent is assigned a number from 1 to 100,
with 1 = extremely unfavourable, and 100 =
extremely favourable. Measurement is the actual
assignment of a number from 1 to 100 to each
respondent. Scaling is the process of placing the
respondents on a continuum, for example, with
respect to their attitude towards Formula One
racing.
Figure 12.1
An illustration of primary scales of
measurement
Nominal scale
Ordinal scale
Interval scale
Figure 12.2
A classification of scaling techniques
A comparison of scaling techniques
Table 12.4
Basic non-comparative scales
Itemised rating scales
Figure 12.7
The Likert scale
Figure 12.8
Semantic differential scale
Figure 12.9
The Stapel scale
Table 12.6
Some commonly used scales in
marketing
Figure 12.13
Development of a multi-item scale
Figure 12.14
Scale evaluation
Measurement accuracy

The true score model: XO = XT + XS + XR
where
XO = the observed score
XT = the true score
XS = systematic error
XR = random error.
Potential sources of error in
measurement
Reliability
• Reliability can be defined as the extent to which
measures are free from random error, XR. If XR = 0, the
measure is perfectly reliable.
• Internal consistency reliability determines the extent
to which different parts of a summated scale are
consistent in what they indicate about the characteristic
being measured.
• In split-half reliability, the items on the scale are
divided into two halves and the resulting half scores are
correlated.
• The coefficient alpha, or Cronbach’s alpha, is the
average of all possible split-half coefficients resulting
from different ways of splitting the scale items. This
coefficient varies from 0 to 1, and a value of 0.6 or less
generally indicates unsatisfactory internal consistency
reliability.
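Coefficient alpha can be computed from the item variances and the variance of the summated (total) score. A sketch with hypothetical item responses:

```python
# Cronbach's alpha for a k-item summated scale:
# alpha = k/(k - 1) * (1 - sum of item variances / variance of total score)
# The responses below are hypothetical (rows = respondents, cols = items).
items = [
    [5, 4, 4],
    [3, 3, 4],
    [4, 4, 5],
    [2, 3, 2],
    [4, 5, 4],
]

def variance(v):
    """Sample variance (n - 1 denominator)."""
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

k = len(items[0])
item_vars = [variance(col) for col in zip(*items)]  # one variance per item
totals = [sum(row) for row in items]                # summated scale scores

alpha = k / (k - 1) * (1 - sum(item_vars) / variance(totals))
print(round(alpha, 3))  # 0.833 — above 0.6, so satisfactory for these data
```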
Validity