
MKT3602

Marketing Research

Week 8
MKT3602 Marketing Research
Module 4.1

Confidence Interval & Hypothesis Testing
We can only observe sample data because it is impossible or too costly to collect data on the population

• Population: The entire set of all observations that we are interested in and want to collect data on
• Sample: A subset of the population

We use the analytics results from the sample to make inferences about population characteristics
Types of inference

• Estimation: Estimate a population parameter (e.g., mean) and assess the reliability of the estimate
• Hypothesis testing: Assess a statement about a population parameter (e.g., mean)
Estimation of the mean

Population mean:
$$\mu = \frac{\sum_{i=1}^{N} X_i}{N}$$

Sample mean:
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

where:
$\mu$ = Population mean
$\bar{X}$ = Sample mean
$X_i$ = Variable value for individual $i$
$N$ = Population size
$n$ = Sample size
Estimation of the mean: Confidence interval
• Point estimate
  • $\bar{X}$ is a point estimate for $\mu$ (e.g., 4 hours)
  • But a single sample mean does not convey the uncertainty associated with the estimation
• Confidence interval: provides additional information about estimation uncertainty
  • Gives a range of values based on observations from one sample (e.g., 3.7-4.3 hours)
  • Gives information about the precision with which the unknown population mean is estimated
  • Stated in terms of level of confidence: typically 95%
Confidence interval calculation
CI = Point Estimate ± (Critical Value) × (Standard Error)

• Margin of Error = (Critical Value) × (Standard Error)
  • Reflects the uncertainty or precision of the estimation
  • Critical value: determined by the level of confidence
  • Standard error: reflects variability of the parameter estimate
• The interval runs from the Lower Confidence Limit to the Upper Confidence Limit and is symmetrical around the point estimate

Width of CI = 2 × Critical Value × Standard Error


An example
• Suppose a random sample of 100 people is selected
• Sample mean: $\bar{X} = 17$ hours
• Sample standard deviation: $s = 6$ hours
• 95% confidence interval calculation
  • Critical value: $z_{\alpha/2} = 1.96$
  • Standard error: $\frac{s}{\sqrt{n}} = \frac{6}{\sqrt{100}} = 0.6$ hours
  • Lower bound: $\bar{X} - z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 17 - 1.96 \cdot 0.6 = 15.8$ hours
  • Upper bound: $\bar{X} + z_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 17 + 1.96 \cdot 0.6 = 18.2$ hours

We are 95% confident that the true average hours spent on social media per week is between 15.8 hours and 18.2 hours
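The interval in the example above can be checked with a few lines of Python. This is only a sketch under the slide's assumptions (a z-based critical value and the sample figures 17, 6 and 100); the function name `confidence_interval` is made up for illustration and is not part of SPSS or the course material.

```python
import math
from statistics import NormalDist

def confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval of the mean."""
    # Critical value z_{alpha/2}; roughly 1.96 when confidence is 95%
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sd / math.sqrt(n)
    margin_of_error = z * standard_error
    return mean - margin_of_error, mean + margin_of_error

# Figures from the social media example: mean 17, s = 6, n = 100
lower, upper = confidence_interval(mean=17, sd=6, n=100)
print(f"95% CI: {lower:.1f} to {upper:.1f} hours")
```

Raising `confidence` to 0.99 widens the interval, because the critical value grows while the standard error stays the same.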
Hypothesis testing
• Again, the key idea is still to learn about the population characteristic from a sample
• Hypothesis testing, sometimes called significance testing, is an act in statistics whereby an analyst tests an assumption regarding a population parameter.
• Hypothesis testing is used to assess the plausibility of a hypothesis by using sample data.
• Example: whether a drug has an effect on blood pressure
• A good learning video: [Link]
Steps for hypothesis testing
1. Form the null hypothesis (a claim about a population parameter) and form
the alternative hypothesis (the opposite of null)

2. Set a level of significance α (usually set at 0.01, 0.05, 0.1)

3. Take a sample and calculate sample statistics

4. Use your sample data to perform the test (done by SPSS or other statistical
software): p-value

5. Interpret the test results and decide whether the null hypothesis should or
should not be rejected

Step 1: Hypothesis
• A hypothesis is a claim (assumption, initial statement) about a population parameter
• The Null Hypothesis, $H_0$, states the assumption (numerical) to be tested
• The Alternative Hypothesis, $H_A$ or $H_1$, states the opposite of $H_0$
• Example: on average people spend 18 hours per week on social media
  $H_0: \mu = 18$ (where $\mu$ denotes the population mean)
  $H_A: \mu \neq 18$ (because $\mu$ can be larger or smaller than 18, we call this alternative hypothesis a two-sided or two-tailed hypothesis)
Step 2: Level of significance
• Selected by the researcher (YOU) at the beginning of the test, denoted as α
• Level of significance = 1 − confidence level
• The significance level is the threshold at or below which a p-value is considered statistically significant (p ≤ α)
• Meaning: The researcher will reject the null hypothesis if, assuming the null hypothesis is true, a result at least as extreme as the observed one has a probability of at most α of occurring by chance

| α | What it means | When to use |
|---|---|---|
| 0.01 | p-values ≤ 0.01 are considered statistically significant | Recommended for a large sample size |
| 0.05 | p-values ≤ 0.05 are considered statistically significant | Used most frequently |
| 0.10 | p-values ≤ 0.10 are considered statistically significant | Recommended for a small sample size |
Step 3: Take a sample

• Sampling methods we have discussed
• Calculate the sample statistics: sample mean, sample standard deviation

$$\text{variance} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$

$$\text{standard deviation } s = \sqrt{\text{variance}}$$
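As a small illustration of the two formulas above, the sketch below computes the sample variance and standard deviation step by step. The five values in `hours` are made-up numbers for illustration only, not data from the course survey.

```python
import math
import statistics

# Hypothetical weekly hours on social media for five respondents (illustrative only)
hours = [15, 18, 12, 20, 20]

n = len(hours)
sample_mean = sum(hours) / n
# Sum of squared deviations from the sample mean, divided by n - 1
variance = sum((x - sample_mean) ** 2 for x in hours) / (n - 1)
s = math.sqrt(variance)

print(f"mean = {sample_mean}, variance = {variance}, s = {s:.2f}")
# The standard library uses the same n - 1 definition
print(statistics.variance(hours), statistics.stdev(hours))
```

Dividing by n − 1 rather than n is what makes this the sample variance, which is also what `statistics.variance` returns.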
Step 4: Perform the test
• The choice of the test depends on the null and alternative hypotheses
• In MR Module 4, we will review the following tests:
  • Compare proportions: Chi-square test
  • Compare means: T test and Analysis of Variance (ANOVA)
• We will perform the test in SPSS and obtain the p-value for the test
• P-value: given that the null hypothesis is right, how likely it is to draw a sample that deviates from the hypothesized population value by an amount equal to or greater than the observed sample value
• An example:
  Null hypothesis: The average height of male students at CityU is 1.70m
  We have a sample of 100 male students: sample mean = 1.8m
• A great learning video: [Link]
Step 5: Reach a conclusion

• If the p-value from the test is ≤ α, reject the Null Hypothesis
  • Suppose the null hypothesis that the average height of CityU male students is 1.7m is right
  • We now have a sample with sample mean = 1.8m. The test shows that the likelihood of drawing such a sample is 0.01 (given the null hypothesis is right)
  • Such a sample would be very unlikely if the null hypothesis were right, so we reject the null hypothesis
• If the p-value for the test is > α, do not reject the Null Hypothesis
  • Note: That does not mean we “accept” the null hypothesis
A quick example on comparing means
• Review the example on time spent on social media. The researcher would like to find out whether the average is 18 hours per week.
• Suppose the researcher collected data on a random sample of 100 people and found the sample mean to be 17 and the standard deviation to be 6.
• Step 1: $H_0: \mu = 18$
• Step 2: $H_A: \mu \neq 18$
• Step 3: α = 0.05
• Step 4: SPSS performs the test
  Test statistic $= \frac{\bar{X} - \mu_0}{s/\sqrt{n}} = \frac{17 - 18}{6/\sqrt{100}} = -1.67$
  p-value = 0.096
• Step 5: p-value > 0.05 → we do not reject the null hypothesis.
  • In other words, at the 0.05 significance level, there is not enough evidence to conclude that the average time spent on social media differs from 18 hours per week.
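The test statistic and p-value in this example can be reproduced with the sketch below. It assumes a two-sided test based on the normal approximation, which matches the figures on the slide; a t-based test on the same data would give a slightly different p-value. The function name `one_sample_z_test` is hypothetical.

```python
import math
from statistics import NormalDist

def one_sample_z_test(sample_mean, mu0, sd, n):
    """Two-sided test of H0: mu = mu0 using the normal approximation."""
    standard_error = sd / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # Two-sided p-value: probability of a statistic at least this far from 0
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value

z, p_value = one_sample_z_test(sample_mean=17, mu0=18, sd=6, n=100)
alpha = 0.05
decision = "reject H0" if p_value <= alpha else "do not reject H0"
print(f"z = {z:.2f}, p = {p_value:.3f}, decision: {decision}")
```

With α = 0.10 instead of 0.05, the same p-value would lead to rejecting the null hypothesis, which is why α has to be fixed before looking at the data.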
An example of one-sided hypothesis testing

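What Test to Use?

| The Test Variable | The Other Variable | Groups | Test |
|---|---|---|---|
| Categorical | Categorical | - | Chi-square Test |
| Categorical | Continuous | - | Switch the two variables |
| Continuous | Categorical | 2 groups | Independent Sample T-Test |
| Continuous | Categorical | 3+ groups | Analysis of Variance (ANOVA) |
| Continuous | Continuous | - | Correlation Test |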
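The decision tree above can also be written as a small lookup function. This is an illustrative sketch only: `choose_test` and its argument names are hypothetical, and the returned strings simply repeat the test names used on the slide.

```python
def choose_test(test_variable, other_variable, n_groups=None):
    """Pick a test from the two variable types ("categorical" or "continuous").

    n_groups is the number of categories of the categorical variable and is
    only needed when one variable is continuous and the other is categorical.
    """
    valid = {"categorical", "continuous"}
    if test_variable not in valid or other_variable not in valid:
        raise ValueError("variable types must be 'categorical' or 'continuous'")
    if test_variable == "categorical" and other_variable == "categorical":
        return "Chi-square Test"
    if test_variable == "continuous" and other_variable == "continuous":
        return "Correlation Test"
    if test_variable == "categorical":
        # Switch the two variables so the continuous one is the test variable
        return choose_test(other_variable, test_variable, n_groups)
    if n_groups is None or n_groups < 2:
        raise ValueError("n_groups must be at least 2")
    return "Independent Sample T-Test" if n_groups == 2 else "Analysis of Variance (ANOVA)"

print(choose_test("continuous", "categorical", n_groups=2))
print(choose_test("continuous", "categorical", n_groups=5))
```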
Examples from the survey

| Question | Variable 1 | Variable 2 | Which test to use? |
|---|---|---|---|
| Does brand awareness depend on respondents’ age group? | Brand awareness (Yes/No), nominal | Age group, ordinal | Chi-square Test |
| Does people’s willingness to pay vary by brand awareness? | Willingness to pay ($), ratio | Brand awareness (Yes/No), nominal | Independent Sample T-Test |
| Does people’s willingness to pay vary by income category? | Willingness to pay ($), ratio | Income category, ordinal (3+ levels) | Analysis of Variance (ANOVA) |
Case study

• A team of researchers is interested in understanding the house-sharing market in New York City.
• A data set is compiled on a random sample of the listings on Airbnb in New York City in 2019.
• A data sample and the variable dictionary are presented on the next slide.
Data and variable dictionary

Data

| id | host_id | borough | neighbourhood | room_type | price | number_of_reviews |
|---|---|---|---|---|---|---|
| 101 | 2787 | 2 | Kensington | 2 | 149 | 9 |
| 102 | 2845 | 3 | Midtown | 1 | 225 | 45 |
| 103 | 4632 | 3 | Harlem | 2 | 150 | 0 |
| 104 | 4869 | 2 | Clinton Hill | 1 | 89 | 270 |
| 105 | 7192 | 3 | East Harlem | 1 | 80 | 9 |
| 106 | 7322 | 3 | Murray Hill | 1 | 200 | 74 |
| 107 | 7356 | 2 | Bedford-Stuyvesant | 2 | 60 | 49 |
| 108 | 8967 | 3 | Hell's Kitchen | 2 | 79 | 430 |
| 109 | 7490 | 3 | Upper West Side | 2 | 79 | 118 |
| 110 | 7549 | 3 | Chinatown | 1 | 150 | 160 |
| 111 | 7702 | 3 | Upper West Side | 1 | 135 | 53 |
| 112 | 7989 | 3 | Hell's Kitchen | 2 | 85 | 188 |
| 113 | 9744 | 2 | South Slope | 2 | 89 | 167 |
| 114 | 11528 | 3 | Upper West Side | 2 | 85 | 113 |
| 116 | 15991 | 2 | Williamsburg | 1 | 140 | 148 |
| 117 | 17571 | 2 | Fort Greene | 1 | 215 | 198 |

Variable Dictionary

| Variable name | Description |
|---|---|
| id | The identifier for each listing unit (i.e., a home) |
| host_id | The identifier for host |
| borough | New York encompasses five administrative divisions (aka boroughs): 1 = "Bronx", 2 = "Brooklyn", 3 = "Manhattan", 4 = "Queens", 5 = "Staten Island" |
| neighbourhood | The name of neighborhood; each borough consists of multiple neighborhoods |
| room_type | The type of dwelling: 1 = "Entire home/apartment", 2 = "Private room" |
| price | Price in dollars |
| number_of_reviews | The number of reviews each listing received; -999 = Missing |
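To show how the variable dictionary is meant to be read, the sketch below decodes one listing from the data sample. The mappings are copied from the dictionary; the helper name `decode_listing` and the choice to represent a missing value as `None` are assumptions made for illustration.

```python
# Code-to-label mappings copied from the variable dictionary
BOROUGH_LABELS = {1: "Bronx", 2: "Brooklyn", 3: "Manhattan", 4: "Queens", 5: "Staten Island"}
ROOM_TYPE_LABELS = {1: "Entire home/apartment", 2: "Private room"}
MISSING_REVIEWS = -999  # code used for a missing number_of_reviews

def decode_listing(row):
    """Return a copy of a listing with coded values replaced by readable ones."""
    decoded = dict(row)
    decoded["borough"] = BOROUGH_LABELS[row["borough"]]
    decoded["room_type"] = ROOM_TYPE_LABELS[row["room_type"]]
    if row["number_of_reviews"] == MISSING_REVIEWS:
        # Treat -999 as missing rather than as an actual count
        decoded["number_of_reviews"] = None
    return decoded

# First row of the data sample
listing = {"id": 101, "host_id": 2787, "borough": 2, "neighbourhood": "Kensington",
           "room_type": 2, "price": 149, "number_of_reviews": 9}
decoded = decode_listing(listing)
print(decoded["borough"], "|", decoded["room_type"])
```

Recoding -999 before any analysis matters because leaving it in place would drag down averages of number_of_reviews.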
Research question 1

Do different room types have different price levels?

• Room type: Categorical (2 groups)
• Price: Continuous
• Test: Independent Sample T-Test

Null Hypothesis:
Different room types have the same price levels.

Alternative Hypothesis:
Different room types have different price levels.
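To make the comparison concrete, the sketch below computes a t statistic for price by room type using only the 16 listings displayed in the data sample, not the full data set, so the result is purely illustrative. It assumes the unequal-variance (Welch) form of the statistic; the p-value itself would come from the t distribution, which SPSS reports.

```python
import math
from statistics import mean, variance

# Prices of the 16 displayed listings, split by room_type
entire_home = [225, 89, 80, 200, 150, 135, 140, 215]  # room_type = 1
private_room = [149, 150, 60, 79, 79, 85, 89, 85]     # room_type = 2

def welch_t(a, b):
    """t statistic for the difference in means, without assuming equal variances."""
    standard_error = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error

t_stat = welch_t(entire_home, private_room)
print(f"mean price: {mean(entire_home)} vs {mean(private_room)}, t = {t_stat:.2f}")
```

A t statistic far from zero means the observed gap in mean prices is large relative to its standard error, which is what leads to a small p-value.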
Research question 2

Do different boroughs have different price levels?

• Borough: Categorical (5 groups)
• Price: Continuous
• Test: Analysis of Variance (ANOVA)

Null Hypothesis:
Different boroughs have the same price levels.

Alternative Hypothesis:
At least one borough has a different price level from other boroughs.
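For intuition about what ANOVA compares, the sketch below computes the one-way F statistic as the ratio of between-group to within-group variability. The three small groups are made-up numbers chosen for easy arithmetic, not Airbnb prices, and `one_way_f` is a hypothetical helper rather than an SPSS function.

```python
def one_way_f(groups):
    """F statistic for a one-way ANOVA over a list of groups of values."""
    all_values = [x for group in groups for x in group]
    grand_mean = sum(all_values) / len(all_values)
    k, n = len(groups), len(all_values)
    group_means = [sum(group) / len(group) for group in groups]
    # Variability of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variability of individual values around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Hypothetical data: three groups with means 2, 3 and 6
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat = one_way_f(toy_groups)
print(f"F = {f_stat:.1f}")
```

When all group means are equal, the between-group sum of squares is zero and so is F; the further the group means spread apart relative to the within-group noise, the larger F becomes.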
Research question 3

Do different boroughs have different room types?

• Borough: Categorical
• Room type: Categorical
• Test: Chi-square Test

Null Hypothesis:
Different boroughs have the same distribution of room types.

Alternative Hypothesis:
Different boroughs have different distributions of room types.
MKT3602 Marketing Research
Module 4.2

Crosstabs and Chi-square Test

Crosstabs
• Two-way frequency table
  • One variable is placed in columns, the other variable is placed in rows
  • May include: row percentage, column percentage, cell percentage
• Both variables are categorical measurements
  • Nominal or ordinal
  • E.g., Gender (nominal) vs. whether an iPhone user (nominal)
  • E.g., Income category (ordinal) vs. whether an iPhone user (nominal)
Practice
• Suppose we are studying whether female students prefer iPhone more than male students do. Please consider the following questions:
  • What are the variables being examined?
  • Is each variable categorical or continuous?
  • What are the row percentage and column percentage?

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 120 | 180 | 300 |
| Female | 80 | 20 | 100 |
| Column Total | 200 | 200 | 400 |
Calculate the row percentage

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 120 (120/300 = 40%) | 180 (180/300 = 60%) | 300 |
| Female | 80 (80/100 = 80%) | 20 (20/100 = 20%) | 100 |
| Column Total | 200 (200/400 = 50%) | 200 (200/400 = 50%) | 400 |

Divide by Row Total
Row percentages add up to 100% on each row
Calculate the column percentage

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 120 (120/200 = 60%) | 180 (180/200 = 90%) | 300 (300/400 = 75%) |
| Female | 80 (80/200 = 40%) | 20 (20/200 = 10%) | 100 (100/400 = 25%) |
| Column Total | 200 | 200 | 400 |

Divide by Column Total
Column percentages add up to 100% on each column
Calculate the cell percentage

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 120 (120/400 = 30%) | 180 (180/400 = 45%) | 300 |
| Female | 80 (80/400 = 20%) | 20 (20/400 = 5%) | 100 |
| Column Total | 200 | 200 | 400 |

Divide by Overall Total
Cell percentages add up to 100% across all cells
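The three kinds of percentages differ only in the denominator, which the sketch below makes explicit for the gender by phone table. The nested dictionary layout and the variable names are choices made for illustration.

```python
# Observed counts from the practice crosstab
counts = {
    "Male": {"iPhone": 120, "Android": 180},
    "Female": {"iPhone": 80, "Android": 20},
}

row_totals = {gender: sum(cells.values()) for gender, cells in counts.items()}
col_totals = {phone: sum(counts[g][phone] for g in counts) for phone in ["iPhone", "Android"]}
overall_total = sum(row_totals.values())

# Same counts, three different denominators
row_pct = {g: {p: 100 * n / row_totals[g] for p, n in cells.items()} for g, cells in counts.items()}
col_pct = {g: {p: 100 * n / col_totals[p] for p, n in cells.items()} for g, cells in counts.items()}
cell_pct = {g: {p: 100 * n / overall_total for p, n in cells.items()} for g, cells in counts.items()}

print("Row %:", row_pct["Female"])      # share of female students choosing each phone
print("Column %:", col_pct["Female"])   # share of each phone's users who are female
print("Cell %:", cell_pct["Female"])    # share of all 400 students
```

Row percentages answer "what do female students choose?", while column percentages answer "who uses iPhones?", so the choice depends on the question being asked.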
Consider the following questions
• Are these numbers enough to conclude a gender difference?
  • 80% of female students prefer iPhones while 40% of male students prefer iPhones
  • Mathematical difference? Managerial difference?
  • Statistical difference?
• Statistical difference:
  • Does this pattern happen by chance?
  • Or is it a systematic pattern in the overall population that female students are more likely to go with iPhones?
• Hypothesis testing can answer these questions
Hypothesis testing
1. Null hypothesis: There is no gender difference in the preference for iPhone vs. Android.
2. Alternative hypothesis: There is a gender difference in the preference for iPhone vs. Android.
3. Set a level of significance α (usually set at 0.05)
4. Use your sample data to perform the test → Typically done by SPSS.
5. Interpret the test results and conclude

In the appendix, we have a few slides for the process to understand what happens behind the scenes (not required)
Example for Chi-square test

Do different boroughs have different room types?

• Borough: Categorical
• Room type: Categorical
• Test: Chi-square Test

Null Hypothesis:
Different boroughs have the same distribution of room types.

Alternative Hypothesis:
Different boroughs have different distributions of room types.
Steps 1 through 4: Chi-square test
1. The Null Hypothesis, $H_0$
   Different boroughs have the same distribution of room types.
2. The Alternative Hypothesis, $H_A$ or $H_1$
   Different boroughs do not have the same distribution of room types.
3. Set the level of significance: 0.05
4. Perform the Chi-Square test
   • To perform a chi-square test, we need to first construct a crosstab
   • Row variable = borough
   • Column variable = room type
   • In SPSS, go to menu Analyze → Descriptive Statistics → Crosstabs
Step 5: Interpret the results

borough * room_type Crosstabulation (Count)

| borough | room_type 1 | room_type 2 | Total |
|---|---|---|---|
| 1 | 18 | 46 | 64 |
| 2 | 1393 | 888 | 2281 |
| 3 | 1563 | 798 | 2361 |
| 4 | 148 | 200 | 348 |
| 5 | 13 | 17 | 30 |
| Total | 3135 | 1949 | 5084 |

• The SPSS output includes a crosstab between borough (five levels) and room type (two levels)
• Within each cell, the number is the frequency count: there are 18 rooms of type 1 (i.e., entire home/apartment) in borough 1
• Null hypothesis: distribution of room types is the same among boroughs
• The p-value < 0.001 < 0.05
• We reject the null hypothesis and conclude that the distribution of room types is statistically significantly different among boroughs
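As a cross-check on the SPSS conclusion, the sketch below recomputes the chi-square statistic directly from the counts in the crosstab. It is only an illustration: the p-value line uses a closed-form expression that holds specifically for 4 degrees of freedom, which is what a 5 × 2 table has, and is not a general-purpose routine.

```python
import math

# Observed counts from the crosstab: rows are boroughs 1-5, columns are room types 1-2
observed = [[18, 46], [1393, 888], [1563, 798], [148, 200], [13, 17]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        # Expected count if room type were unrelated to borough
        e = row_totals[i] * col_totals[j] / grand_total
        chi_square += (o - e) ** 2 / e

df = (len(observed) - 1) * (len(observed[0]) - 1)  # (5 - 1)(2 - 1) = 4
# Upper-tail probability of a chi-square variable; this formula is valid only for df = 4
p_value = math.exp(-chi_square / 2) * (1 + chi_square / 2)

print(f"chi-square = {chi_square:.1f}, df = {df}, p < 0.001: {p_value < 0.001}")
```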
Step 5: Interpret the results
• Following the chi-square test, we can use the crosstab to find out HOW the distribution of room types varies by boroughs
• % within borough is called the row percentage
  • Borough 1: 28.1% of the units are room type 1 and 71.9% are room type 2
  • Borough 2: 61.1% of the units are room type 1 and 38.9% are room type 2
  • The differences we observe here are statistically significant (based on chi-square test from the previous page)
• % within room_type is called the column percentage
  • The interpretation is similar to the row percentages
  • 0.6% of room type 1 are from borough 1, etc.
SPSS
• Now, let us switch to SPSS
• Open SPSS on the lab computer
• Know about the windows and menus in SPSS
• Import data
What we learn today
• Confidence interval
• Hypothesis testing
• Chi-square test

See you next week!
Appendix:
Calculating Chi-square statistic and p-value

Chi-square test: Calculate the value of the test statistic
• Step 4.1: Calculate the expected value for each cell in the cross-tab table, if gender and phone use are NOT related

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | Expected? | ? | 300 |
| Female | ? | ? | 100 |
| Column Total | 200 | 200 | 400 |

Expected number of male students using iPhone:
$$400 \times \frac{300}{400} \times \frac{200}{400} = \frac{300 \times 200}{400} = 150$$
where $\frac{300}{400}$ is the probability of being male and $\frac{200}{400}$ is the probability of using iPhone.
Chi-square test: Calculate the value of the test statistic
• Step 4.1: Calculate the expected value for each cell in the cross-tab table (Row total × Column total / Overall total)

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 120 (expected: 200 × 300 / 400 = 150) | 180 (expected: 200 × 300 / 400 = 150) | 300 |
| Female | 80 (expected: 200 × 100 / 400 = 50) | 20 (expected: 200 × 100 / 400 = 50) | 100 |
| Column Total | 200 | 200 | 400 |

Within each cell:
• first number: observed frequency (O)
• number in parentheses: expected frequency (E)
Chi-square test: Calculate the value of the test statistic
• Step 4.2: Compute $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$ for each cell

| | iPhone | Android |
|---|---|---|
| Male | 120 (150): (120 − 150)² / 150 = 6 | 180 (150): (180 − 150)² / 150 = 6 |
| Female | 80 (50): (80 − 50)² / 50 = 18 | 20 (50): (20 − 50)² / 50 = 18 |
Chi-square test: Calculate the value of the test statistic
• Step 4.3: Sum over all cells

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{k} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$

| | iPhone | Android |
|---|---|---|
| Male | 120 (150): (120 − 150)² / 150 = 6 | 180 (150): (180 − 150)² / 150 = 6 |
| Female | 80 (50): (80 − 50)² / 50 = 18 | 20 (50): (20 − 50)² / 50 = 18 |

$\chi^2$ = 6 + 6 + 18 + 18 = 48

• p-value = [Link](48,1,1)) = 0.000 < 0.05 → Reject null
• With confidence of 95%, we reject the hypothesis that there is no relationship between gender and the phone choice
Chi-square test: Calculate the value of the test statistic
• Step 4.4: Calculate the degree of freedom (df)
  • Degree of freedom (df) = (# of rows − 1)(# of columns − 1)
  • Meaning: The number of values in the final calculation of a statistic that are free to vary
  • In this example (rows: Male, Female; columns: iPhone, Android), df = (2 − 1)(2 − 1) = 1
Chi-square test: Calculate the value of the test statistic
• Step 4.5: Calculate the p-value
  • p-value = 0.000 < 0.05
  • Meaning: If the null hypothesis (i.e., no gender difference) is true, the probability of seeing a Chi-square statistic of 48 or larger is < 0.001.
• Step 5: Interpret the result (any of the following three is okay)
  • With confidence of 95%, we reject the hypothesis that there is no relationship between gender and the phone choice
  • With confidence of 95%, we find that there is a significant relationship between gender and the phone choice
  • With confidence of 95%, we find female students are significantly more likely to choose iPhones than male students
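Steps 4.1 through 4.5 for the gender by phone table can be traced in a short script. This is a sketch of the hand calculation, not what SPSS runs internally: `chi_square_2x2` is a hypothetical helper, and the p-value line relies on the fact that with 1 degree of freedom the upper-tail chi-square probability equals `erfc(sqrt(x / 2))`.

```python
import math

def chi_square_2x2(observed):
    """Return (statistic, p_value) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    statistic = 0.0
    for i in range(2):
        for j in range(2):
            # Step 4.1: expected count = row total * column total / overall total
            expected = row_totals[i] * col_totals[j] / total
            # Steps 4.2 and 4.3: add (O - E)^2 / E for each cell
            statistic += (observed[i][j] - expected) ** 2 / expected
    # Steps 4.4 and 4.5: df = (2 - 1)(2 - 1) = 1, so the tail probability is erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Rows: Male, Female; columns: iPhone, Android
statistic, p_value = chi_square_2x2([[120, 180], [80, 20]])
print(f"chi-square = {statistic:.0f}, reject H0 at 0.05: {p_value < 0.05}")
```

The p-value here is far below 0.001, which is why SPSS displays it as 0.000.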
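Another example of Chi-square test

For each cell, compute $\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$, where the expected frequency is Row total × Column total / Overall total (e.g., 300 × 195 / 400 = 146.25 for male iPhone users).

| | iPhone | Android | Row Total |
|---|---|---|---|
| Male | 140 (146.25): (140 − 146.25)² / 146.25 = 0.27 | 160 (153.75): (160 − 153.75)² / 153.75 = 0.25 | 300 |
| Female | 55 (48.75): (55 − 48.75)² / 48.75 = 0.80 | 45 (51.25): (45 − 51.25)² / 51.25 = 0.76 | 100 |
| Column Total | 195 | 205 | 400 |

$\chi^2$ = 0.27 + 0.25 + 0.80 + 0.76 = 2.08

• p-value = [Link](2.08,1,1)) = 0.149 > 0.05 → do not reject null
• With confidence of 95%, we do not reject the hypothesis that there is no relationship between gender and the phone choice.
• In other words, we do not have enough evidence to conclude that there is a relationship between gender and the phone choice.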
