RM-EBBA-class-8-CH0-11-Quatitative-analysis
EBBA-EBDB Programme
Research Methodology
Class 8
Senior lecturer: Assoc. Prof. Dr. Le Thi My Linh
Chapter 14
Areas of Editing Concern
◼ Asking the proper questions
◼ Recording answers accurately
◼ Screening questions correctly
◼ Recording open-ended answers
completely and accurately
Getting the Data Ready for
Analysis: coding
▪ Coding variables: naming them, assigning values, and giving
labels to the variables and to their values
▪ The variable label can be called anything you like – this is
what will appear on any table or graph you make.
▪ Each variable can have many values, each of which can
have its own label
◼ Data coding: assigning a number to the participants’
responses so they can be entered into a database.
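The coding step described above can be sketched in a few lines of Python. This is a minimal illustration only: the variable names (`gender`, `satisf`), their labels, and the codes are hypothetical and do not come from the lecture.

```python
# Hypothetical codebook: variable name -> (variable label, value labels).
CODEBOOK = {
    "gender": ("Gender of respondent", {1: "Male", 2: "Female"}),
    "satisf": ("Overall job satisfaction", {1: "Low", 2: "Medium", 3: "High"}),
}


def code_response(variable, answer):
    """Return the numeric code whose value label matches the answer text."""
    _variable_label, value_labels = CODEBOOK[variable]
    for code, label in value_labels.items():
        if label.lower() == answer.strip().lower():
            return code
    raise ValueError(f"No code defined for {answer!r} in {variable!r}")


# One participant's raw answers, turned into numbers ready for data entry.
raw = {"gender": "female", "satisf": "High"}
coded = {variable: code_response(variable, answer) for variable, answer in raw.items()}
print(coded)  # {'gender': 2, 'satisf': 3}
```

Keeping the codebook in one place means the same value labels can be reused later on tables and graphs.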
DATA CODING: Tips for complex questions
Getting the Data Ready for
Analysis: Data entry
◼ Data entry: after responses have been
coded, they can be entered into a
database. Raw data can be entered
through any software program (e.g.,
Excel, SPSS)
Editing Data after Data Entry
◼ Check for mistakes made during data entry
◼ An example of an illogical response is an
outlier response. An outlier is an observation
that is substantially different from the other
observations.
◼ Inconsistent responses are responses that are
not in harmony with other information.
◼ Illegal codes are values that are not specified
in the coding instructions.
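The three kinds of problem listed above (illegal codes, inconsistent responses, outliers) can each be checked mechanically. The sketch below assumes a hypothetical data set: the variables, the legal codes, and the plausible age range of 15 to 100 are all invented for illustration.

```python
# Hypothetical coding instructions: the only legal codes for each variable.
LEGAL_CODES = {"gender": {1, 2}, "married": {0, 1}}
# Assumed plausible range for age; values outside it are flagged as outliers.
AGE_RANGE = (15, 100)

responses = [
    {"id": 1, "gender": 1, "married": 1, "years_married": 8, "age": 34},
    {"id": 2, "gender": 3, "married": 0, "years_married": 0, "age": 29},
    {"id": 3, "gender": 2, "married": 0, "years_married": 12, "age": 41},
    {"id": 4, "gender": 2, "married": 1, "years_married": 5, "age": 340},
]


def check(row):
    """Return a list of problems found in one entered response."""
    problems = []
    # Illegal codes: values not specified in the coding instructions.
    for variable, legal in LEGAL_CODES.items():
        if row[variable] not in legal:
            problems.append(f"illegal code in {variable}")
    # Inconsistent response: not in harmony with other information.
    if row["married"] == 0 and row["years_married"] > 0:
        problems.append("inconsistent: unmarried but years_married > 0")
    # Outlier: substantially different from what is plausible.
    low, high = AGE_RANGE
    if not low <= row["age"] <= high:
        problems.append("possible outlier in age")
    return problems


report = {row["id"]: check(row) for row in responses}
for respondent_id, problems in report.items():
    print(respondent_id, problems)
```

Flagged rows should be checked against the original questionnaire rather than corrected automatically.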
Types of statistical analysis
◼ Descriptive: describe the variables in a
data matrix
◼ Inferential: Make inferences about the
population’s characteristics based on the
sample data
◼ Differences: compare the mean of the responses
of one group to that of another group
◼ Associative: determine the strength and
direction of relationships between two or more
variables
◼ Predictive: make forecasts of future events
Statistical Analysis
◼ Every data set collected needs summary
information that describes the numbers
it contains:
◼ Central tendency and dispersion,
◼ Relationships of the sample data, and
◼ Hypothesis testing
Measures of
Central Tendency
◼ Mean: arithmetic average
◼ Mode: response most often given to a question
◼ Median: middle value of a rank-ordered distribution
Measures of
Central Tendency
◼ Each measure of central tendency
describes a distribution in its own
manner:
◼ for nominal data, the mode is the best
measure.
◼ for ordinal data, the median is generally
the best.
◼ for interval or ratio data, the mean is
generally used.
Measures of Dispersion
Describe how close the rest of the values fall to the mean
or other measure of central tendency
Descriptive statistics
◼ Measures of Central Tendency or
location
◼ Measures of
Variability/dispersion/spread
◼ Other measures
Measures of Central Tendency
◼ Mode (nominal): the value in a set of numbers
that occurs most often
◼ Median (ordinal): the value that lies in the middle
of a set of ordered values. Not influenced by extreme
data points. Its position in the ordered set is (n+1)/2
◼ E.g. 42,36,39,38,40,34,32,44: once ordered, the 4th and 5th
values are 38 and 39, so the median is 38.5
◼ Mean (interval or ratio): the arithmetic
average of a set of numbers. Subject to strong
influence by extreme data points, so always
quote the standard deviation with it
◼ E.g. 6,6.5,7,7,7,7.5,8,8.5,9: mode: 7; median: 7; mean:
7.39
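The two worked examples above can be checked with Python's standard `statistics` module. This is only a quick verification sketch; the variable names are chosen here for illustration.

```python
import statistics

# The two example data sets from the slide.
sample_a = [42, 36, 39, 38, 40, 34, 32, 44]
sample_b = [6, 6.5, 7, 7, 7, 7.5, 8, 8.5, 9]

# Even number of values: the median is the average of the two middle values.
median_a = statistics.median(sample_a)

mode_b = statistics.mode(sample_b)      # most frequent value
median_b = statistics.median(sample_b)  # middle value of the ordered set
mean_b = statistics.mean(sample_b)      # arithmetic average

print(median_a)                             # 38.5
print(mode_b, median_b, round(mean_b, 2))   # 7 7 7.39
```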
Skewed and Symmetric
Distributions
◼ Symmetric data – data sets whose values are evenly spread
around the center; the mean and median are equal.
◼ Skewed data – data sets that are not symmetric; the mean will
be larger or smaller than the median.
◼ Left-skewed (longer tail extends to left): Mean < Median < Mode
◼ Symmetric: Mean = Median = Mode
◼ Right-skewed (longer tail extends to right): Mode < Median < Mean
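A small numeric illustration of a skewed distribution may help. The scores below are hypothetical, chosen so that one large value pulls the mean to the right of the median; the helper `shape` is an illustrative name, and comparing mean and median is only a rough rule of thumb, not a formal test of skewness.

```python
import statistics

# Hypothetical scores with one extreme value on the right.
scores = [1, 1, 1, 2, 3, 4, 5, 6, 22]

mode = statistics.mode(scores)
median = statistics.median(scores)
mean = statistics.mean(scores)


def shape(mean_value, median_value):
    """Rough rule of thumb: compare the mean with the median."""
    if mean_value > median_value:
        return "right-skewed"
    if mean_value < median_value:
        return "left-skewed"
    return "symmetric"


print(mode, median, mean, shape(mean, median))
```

Here the mode is smaller than the median, which is smaller than the mean, matching the pattern for a longer right tail.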
Measures of Spread
◼ Range: the difference between the maximum and minimum
values in a set of numbers. Subject to strong influence by
extreme data points
◼ The interquartile range: the range of the middle
50% of scores, hence a measure of how spread out the
middle 50% of scores are. Not subject to influence by
extreme data points
◼ Variance is calculated by subtracting the mean from
each of the observations in the data set, taking the
square of this difference, and dividing the total of these
by the number of observations.
(The variance is the square of the standard deviation.)
◼ Standard deviation: the square root of the variance. It
indicates the degree of variation in a way that can be
translated into a bell-shaped curve distribution, and it
expresses the typical amount of variation around the mean in
the same units as the data. Because it is a square root, extreme
values (outliers) weigh on it less than on the variance
Range
◼ Simplest measure of variation
◼ Difference between the largest and the smallest
observations, hence a measure of how spread out the
data are.
Range = $x_{\max} - x_{\min}$
Example: on a number line marked from 0 to 14, the smallest
observation is 1 and the largest is 14
Range = 14 - 1 = 13
(a dispersion of 13 units)
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70
(25% of the observations lie between each pair of neighbouring values)
Interquartile range = Q3 - Q1 = 57 - 30 = 27
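The interquartile range in this example can be reproduced with Python's `statistics.quantiles`. As an assumption made purely for illustration, the five summary values (12, 30, 45, 57, 70) are treated here as a tiny data set of their own; with the "inclusive" method the quartiles then fall exactly on 30, 45 and 57. Other software may use slightly different quartile rules on real data.

```python
import statistics

# Assumption: the five summary values are used as a small data set.
values = [12, 30, 45, 57, 70]

# Range: largest minus smallest observation.
value_range = max(values) - min(values)

# Quartiles; "inclusive" treats the data as covering the full range of values.
q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")

# Interquartile range: spread of the middle 50% of the observations.
iqr = q3 - q1

print(value_range)       # 58
print(q1, q2, q3, iqr)   # 30.0 45.0 57.0 27.0
```

The range (58) uses only the two extreme values, while the interquartile range (27) ignores them.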
Variance
◼ Company A sold 30, 40, and 50 units of a
product during three months: Apr., May,
June
◼ Company B sold 10, 40, and 70 units of a
product during the same period
◼ Range?
◼ Variance?
Variance
◼ Variance for Co A:
◼ $\frac{(30-40)^2 + (40-40)^2 + (50-40)^2}{3} = 66.7$
◼ Variance for Co B:
◼ $\frac{(10-40)^2 + (40-40)^2 + (70-40)^2}{3} = 600$
◼ The variance is much larger for Co B than for Co A. It is
much more difficult for Co B to estimate how many
goods to stock than it is for Co A
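The comparison of the two companies can be reproduced with a short script. The helper name `population_variance` is chosen here for illustration; it divides by the number of observations, as in the definition given in the lecture, rather than by n - 1.

```python
import math


def population_variance(values):
    """Mean of the squared deviations from the mean (divides by n)."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


# Monthly sales (Apr., May, June) from the example.
sales = {"Co A": [30, 40, 50], "Co B": [10, 40, 70]}

summary = {}
for company, units in sales.items():
    variance = population_variance(units)
    summary[company] = {
        "range": max(units) - min(units),
        "variance": variance,
        "std_dev": math.sqrt(variance),  # square root of the variance
    }

for company, stats in summary.items():
    print(company, stats["range"], round(stats["variance"], 1), round(stats["std_dev"], 1))
```

Both companies have the same mean (40 units per month), so only the measures of spread reveal how much harder Co B's sales are to predict.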
Things to look for in a
histogram/boxplot
Visual inspection (histograms and/or box plots)
Histogram and boxplot
[Figure: histogram and box plot of the variable "Attractive"; N = 22, Mean = 4.0, Std. Dev = 0.76]
Modify variables
◼ Recode variable: from ratio/ordinal/nominal ->
ordinal/nominal vars
• Age -> age groups (children: age <= 15; adults: 15 < age < 60;
elders: age >= 60)
• Primary/ Secondary/ High/ Degree -> Below high school/
High school and above
• Ethnicity: Kinh/ Mong/ Dao/ Pa Ko -> Kinh/ Not Kinh
◼ Compute variable: ratio vars -> ratio vars
• Income = salary + bonus
• Revenue = price * quantity
• Lnhhsize = ln(hhsize)
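The recode and compute operations listed above might look like this in Python. The respondent record is hypothetical, and one boundary is an assumption: a respondent aged exactly 60 is placed in the "elders" group.

```python
import math


def age_group(age):
    """Recode a ratio variable (age in years) into an ordinal one."""
    if age <= 15:
        return "children"
    if age < 60:
        return "adult"
    return "elders"  # assumption: age 60 itself counts as elders


def kinh_or_not(ethnicity):
    """Collapse a nominal variable into two categories."""
    return "Kinh" if ethnicity == "Kinh" else "Not Kinh"


# Hypothetical respondent record.
respondent = {
    "age": 42, "ethnicity": "Dao",
    "salary": 900.0, "bonus": 100.0,
    "price": 5.0, "quantity": 20, "hhsize": 4,
}

# Recode: new categorical variables derived from existing ones.
respondent["age_group"] = age_group(respondent["age"])
respondent["kinh"] = kinh_or_not(respondent["ethnicity"])

# Compute: new ratio variables derived from existing ratio variables.
respondent["income"] = respondent["salary"] + respondent["bonus"]
respondent["revenue"] = respondent["price"] * respondent["quantity"]
respondent["lnhhsize"] = math.log(respondent["hhsize"])  # natural logarithm

print(respondent["age_group"], respondent["kinh"],
      respondent["income"], respondent["revenue"])  # adult Not Kinh 1000.0 100.0
```

Recoding loses information (the exact age cannot be recovered from the age group), so the original variable is kept alongside the new one.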
Descriptive statistics
2- Performing Descriptives
- Purpose: display the principal statistics of
variables
- Commands:
Analyze\Descriptive Statistics\Descriptives...
Designate: variables, summary, display,
save
- Descriptives can also be used to check the quality of the data
Exercise #1 On Frequency
Distributions
◼ Below is a tabulation of the
demographic data from the Frequency
distribution of a survey done by Ms.
Sandra Jones. Her sample consisted of
148 of a total of 3,700 clerical
employees in three service
organizations. Based on the tabulation
provided below, describe the sample
characteristics.
Table 1: Frequency Distributions of Sample (n = 148)
Master's degree = 36 (24%)

AGE                  # OF YEARS IN ORG.      MARITAL STATUS
< 20  = 10 (7%)      < 1 year = 5 (3%)       Single                 20 (14%)
20-30 = 20 (14%)     1-3      = 25 (17%)     Married                108 (73%)
31-40 = 30 (20%)     4-10     = 98 (66%)     Divorced               13 (9%)
> 40  = 88 (59%)     > 10     = 20 (14%)     Alternative lifestyle  7 (4%)
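The percentages in the table are simply each count divided by the sample size. As a quick check, the age column can be recomputed as follows; the dictionary keys are labels chosen here for illustration.

```python
# Age counts from Table 1.
age_counts = {"< 20": 10, "20-30": 20, "31-40": 30, "> 40": 88}

# The counts should add up to the sample size.
n = sum(age_counts.values())

# Relative frequency of each group, as a whole-number percentage.
age_percent = {group: round(100 * count / n) for group, count in age_counts.items()}

# Share of the 3,700 clerical employees that was sampled.
sampling_percent = round(100 * n / 3700)

print(n)                 # 148
print(age_percent)       # {'< 20': 7, '20-30': 14, '31-40': 20, '> 40': 59}
print(sampling_percent)  # 4
```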
Testing goodness of data
◼ Reliability
◼ Validity
Testing goodness of data
Reliability
◼ Cronbach’s alpha is a reliability coefficient that
indicates how well the items in a set are
positively correlated to one another
◼ Cronbach’s alpha is an adequate test of
internal consistency reliability
◼ The closer the Cronbach’s alpha is to 1, the
higher the internal consistency reliability
Cronbach’s alpha
◼ Cronbach’s alpha < 0.6: reliability is poor
◼ Cronbach’s alpha from 0.6 to 0.8: reliability is acceptable
◼ Cronbach’s alpha > 0.8: reliability is good
◼ Cronbach’s alpha > 0.9: reliability is very good
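For readers who want to see where the coefficient comes from, here is a minimal sketch of the usual formula, alpha = k/(k-1) × (1 − sum of item variances / variance of the total score), where k is the number of items. The function names and the item scores are hypothetical, and `interpret` is only one possible reading of the thresholds listed above.

```python
import statistics


def cronbach_alpha(items):
    """items: one list of scores per item, all from the same respondents."""
    k = len(items)
    item_variances = [statistics.variance(scores) for scores in items]
    # Total scale score of each respondent (sum across items).
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum(item_variances) / statistics.variance(totals))


def interpret(alpha):
    """One possible reading of the rule-of-thumb thresholds."""
    if alpha < 0.6:
        return "poor"
    if alpha <= 0.8:
        return "acceptable"
    if alpha <= 0.9:
        return "good"
    return "very good"


# Hypothetical answers of six respondents to a three-item scale (1-5).
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [3, 5, 3, 1, 4, 4],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), interpret(alpha))
```

Statistical packages report the same coefficient; the sketch is only meant to show that alpha rises when the items vary together.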
Reliability Analysis
- Commands:
◼ Analyze/Scale/Reliability Analysis/select
the items that make up the
scale/Model: Alpha/click Statistics/tick "Scale if
item deleted" under Descriptives
Example: reliability analysis for the
variable customer differentiation
Item-total Statistics

           Scale Mean    Scale Variance   Corrected      Alpha
           if Item       if Item          Item-Total     if Item
           Deleted       Deleted          Correlation    Deleted
CUSDIF1    10.04         5.473            .2437          .7454
CUSDIF2    9.7432        5.0176           .5047          .3293
CUSDIF3    9.6486        5.3754           .4849          .3722

Reliability Coefficients
N of Cases = 111.0    N of Items = 3
Alpha = .5878
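One way to read the "Alpha if Item Deleted" column is to compare each value with the overall alpha. The short sketch below does this with the figures from the output above; the variable names are illustrative, and whether an item should actually be dropped remains a judgement for the researcher.

```python
# Figures taken from the reliability output above.
overall_alpha = 0.5878
alpha_if_deleted = {"CUSDIF1": 0.7454, "CUSDIF2": 0.3293, "CUSDIF3": 0.3722}

# Items whose removal would raise the scale's alpha above its current value.
candidates = {
    item: alpha
    for item, alpha in alpha_if_deleted.items()
    if alpha > overall_alpha
}

print(candidates)  # {'CUSDIF1': 0.7454}
```

Only CUSDIF1 qualifies: it also has the lowest corrected item-total correlation (.2437), and without it the alpha of the remaining two items would be .7454.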
Example: how to write up the result of a
reliability test
◼ The reliability test was run for each dimension,
and the results indicated that the questions measuring
each factor are consistent and can be
accepted as reliable and suitable for further
analysis. The Cronbach's alpha of each factor is shown
in the rightmost column of Table 1. These values
are at 0.7 or higher, indicating that the measurement
model achieved reliability (Hair et al., 2009).
In particular, marketing innovation has a good result,
with a Cronbach's alpha above 0.8.
◼ Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (2009),
Multivariate Data Analysis, Prentice-Hall International, Inc.
Testing goodness of data
Validity
◼ Factorial validity can be established by
submitting the data for factor analysis. The
result of factor analysis (a multivariate
technique) will confirm whether or not the
theorized dimensions emerge
◼ When well-validated measures are used,
there is, of course, no need to establish their
validity again for each study
◼ The reliability of the items can be tested