Research Design - A framework or blueprint for conducting the marketing research project. It specifies
the details of the procedures necessary for obtaining the information needed to structure and/or solve
marketing research problems.
• Define the information needed
• Design the exploratory, descriptive, and/or causal phases of the research
• Specify the measurement and scaling procedures
• Construct and pretest a questionnaire (interviewing form) or an appropriate form for data
collection
• Specify the sampling process and sample size
• Develop a plan of data analysis
Exploratory research - One type of research design, which has as its primary objective the provision
of insights into and comprehension of the problem situation confronting the researcher.
• Provide insights to enhance comprehension
• Formulate a problem or define a problem more precisely.
• Identify alternative courses of action.
• Develop hypotheses.
• Isolate key variables and relationships for further examination.
• Gain insights for developing an approach to the problem.
• Establish priorities for further research.
Conclusive research - Research designed to assist the decision maker in determining, evaluating, and
selecting the best course of action to take in a given situation.
• To test (validate) specific hypotheses and examine relationships.
Descriptive Research
A type of conclusive research that has as its major objective the description of something, usually
market characteristics or functions. Descriptive research, in contrast to exploratory research, is
marked by a clear statement of the problem, specific hypotheses, and detailed information needs. A
survey conducted in the department store patronage project, which involved personal interviews, is
an example of descriptive research.
Reasons for Descriptive Research
• To describe the characteristics of relevant groups, such as consumers, salespeople,
organizations, or market areas.
• To estimate the percentage of units in a specified population exhibiting a certain behavior.
• To determine the perceptions of product characteristics.
• To determine the degree to which marketing variables are associated.
• To make specific predictions.
Cross-sectional design - A type of research design involving the collection of information from any
given sample of population elements only once.
Multiple cross-sectional design - A cross-sectional design in which there are two or more samples of
respondents, and information from each sample is obtained only once.
Cohort analysis - A multiple cross-sectional design consisting of a series of surveys conducted at
appropriate time intervals. The cohort refers to the group of respondents who experience the same
event within the same time interval.
Longitudinal designs - A fixed sample (or samples) of population elements is measured repeatedly on
the same variables. A longitudinal design differs from a cross-sectional design in that the sample or
samples remain the same over time. In other words, the same people are studied over time and the
same variables are measured. In contrast to the typical cross-sectional design, which gives a
snapshot of the variables of interest at a single point in time, a longitudinal study provides a series of
pictures that give an in-depth view of the situation and the changes that take place over time.
Causal research - A type of conclusive research where the major objective is to obtain evidence
regarding cause-and-effect (causal) relationships.
• To understand which variables are the cause (independent variables) and which variables are
the effect (dependent variables) of a phenomenon
• To determine the nature of the relationship between the causal variables and the effect to be
predicted
• Conditions for causality (see the sketch below)
o Concomitant variation: X and Y should vary together (correlation and covariance).
o Time order of occurrence: the cause (X) should precede, or occur simultaneously with, the effect (Y), consistently over time.
o Absence of other possible causal factors: competing explanations, including spurious correlation produced by a third variable, should be ruled out.
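As an illustrative sketch (not from the source; the variable names and data are hypothetical), the spurious-correlation condition can be examined by computing a partial correlation that removes the linear effect of a suspected third variable Z:

import numpy as np

def partial_corr(x, y, z):
    # Correlation between x and y after removing the linear effect of z
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical data: advertising (x) and sales (y) both driven by seasonality (z)
rng = np.random.default_rng(0)
z = rng.normal(size=200)
x = 1.5 * z + rng.normal(size=200)
y = 1.5 * z + rng.normal(size=200)
print(round(np.corrcoef(x, y)[0, 1], 2))   # sizeable raw correlation between x and y
print(round(partial_corr(x, y, z), 2))     # near zero once z is controlled for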
Potential Sources of Error
Total error - variation between the true mean value in the population of the variable of interest and
the observed mean value obtained in the marketing research project.
Random sampling error occurs because the particular sample selected is an imperfect
representation of the population of interest
Non-sampling errors can be attributed to sources other than sampling, and they may be random or
non-random. They result from a variety of reasons, including errors in problem definition, approach,
scales, questionnaire design, interviewing methods, and data preparation and analysis.
Nonresponse error arises when some of the respondents included in the sample do not respond.
The primary causes of nonresponse are refusals and not-at-homes. Nonresponse error is defined as
the variation between the true mean value of the variable in the original sample and the true mean
value in the net sample.
Response error arises when respondents give inaccurate answers or their answers are misrecorded
or misanalyzed. Response error is defined as the variation between the true mean value of the
variable in the net sample and the observed mean value obtained in the marketing research project.
Surrogate information error may be defined as the variation between the information
needed for the marketing research problem and the information sought by the researcher.
Measurement error may be defined as the variation between the information sought and the
information generated by the measurement process employed by the researcher.
Population definition error may be defined as the variation between the actual population
relevant to the problem at hand and the population as defined by the researcher.
Sampling frame error may be defined as the variation between the population defined by
the researcher and the population as implied by the sampling frame (list) used.
Data analysis error encompasses errors that occur while raw data from questionnaires are
transformed into research findings.
Respondent selection error occurs when interviewers select respondents other than those
specified by the sampling design or in a manner inconsistent with the sampling design.
Questioning error denotes errors made in asking questions of the respondents or in not
probing when more information is needed.
Recording error arises due to errors in hearing, interpreting, and recording the answers
given by the respondents.
Cheating error arises when the interviewer fabricates answers to a part or all of the interview.
Inability error results from the respondent’s inability to provide accurate answers.
Unwillingness error arises from the respondent’s unwillingness to provide accurate
information.
Primary data are originated by a researcher for the specific purpose of addressing the problem
at hand. The collection of primary data involves all six steps of the marketing research process.
Secondary data are data that have already been collected for purposes other than the problem at
hand. These data can be located quickly and inexpensively.
Secondary data can help you:
• Identify the problem.
• Better define the problem.
• Develop an approach to the problem.
• Formulate an appropriate research design (for example, by identifying the key variables).
• Answer certain research questions and test some hypotheses.
• Interpret primary data more insightfully.
Disadvantages of Secondary Data
Because secondary data have been collected for purposes other than the problem at hand, their
usefulness to the current problem may be limited in several important ways, including relevance and
accuracy. The objectives, nature, and methods used to collect the secondary data may not be
appropriate to the present situation.
Qualitative Versus Quantitative Research
• Nature of research: qualitative is exploratory; quantitative is conclusive.
• Objective: qualitative research aims to gain a qualitative understanding of the underlying reasons and motivations; quantitative research aims to quantify the data and generalize the results from the sample to the population of interest.
• Sample: qualitative uses a small number of nonrepresentative cases; quantitative uses a large number of representative cases.
• Data collection: qualitative is unstructured; quantitative is structured.
• Data analysis: qualitative is nonstatistical; quantitative is statistical.
• Outcome: qualitative develops an initial understanding; quantitative recommends a final course of action.
• Interpretation of insights: qualitative is interpretivist; quantitative is positivist.
• Nature of results: qualitative results are descriptive; quantitative results are specific.
Types of Research Methods
Focus Group Discussions
A focus group is an interview conducted by a trained moderator in a nonstructured and natural
manner with a small group of respondents. The moderator leads the discussion. The main purpose of
focus groups is to gain insights by listening to a group of people from the appropriate
target market talk about issues of interest to the researcher.
Depth interviews are another method of obtaining qualitative data. Like focus groups, depth
interviews are an unstructured and direct way of obtaining information, but unlike focus groups,
they are conducted on a one-on-one basis. A depth interview is an unstructured, direct,
personal interview in which a single respondent is probed by a highly skilled interviewer to uncover
underlying motivations, beliefs, attitudes, and feelings on a topic.
Laddering A technique for conducting depth interviews in which a line of questioning proceeds from
product characteristics to user characteristics. Laddering provides a way to probe into consumers’
deep underlying psychological and emotional reasons that affect their purchasing decisions.
Laddering requires interviewers to be trained in specific probing techniques in order to
develop a meaningful “mental map” of the consumer’s view of a target product. The ultimate
goal is to combine mental maps of consumers who are similar, which will lead to the reasons
why people purchase products.
Laddering travels from product attributes to the consequences of those attributes and finally unearths the consumer's underlying values.
Likert Scale
A measurement scale with five response categories ranging from “strongly disagree” to “strongly
agree,” which requires the respondents to indicate a degree of agreement or disagreement with
each of a series of statements related to the stimulus objects.
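An illustrative sketch of summated Likert scoring (the responses, coding, and the choice of which statement is reverse-scored are assumptions for illustration):

# Responses of one respondent to four Likert statements, coded 1 = strongly disagree ... 5 = strongly agree
responses = [4, 2, 5, 3]
reverse_items = {1}   # index of a negatively worded statement (hypothetical)

# Reverse-score negative statements (1<->5, 2<->4) so that higher always means more favorable
scored = [6 - r if i in reverse_items else r for i, r in enumerate(responses)]
total = sum(scored)   # summated Likert score for this respondent
print(scored, total)  # [4, 4, 5, 3] 16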
Semantic Differential Scale
The semantic differential is a 7-point rating scale with endpoints associated with bipolar labels
that have semantic meaning. In a typical application, respondents rate objects on a number of
itemized, 7-point rating scales bounded at each end by one of two bipolar adjectives, such as “cold”
and “warm.” It focuses on the direction and intensity of the feeling rather than on agreement or
disagreement.
The researcher must make six major decisions when constructing any of these scales.
• The number of scale categories to use
• Balanced versus unbalanced scale
• Odd or even number of categories
• Forced versus non-forced choice
• The nature and degree of the verbal description
• The physical form of the scale
A multi-item scale consists of multiple items, where an item is a single question or statement to be
evaluated. The Likert, semantic differential, and Stapel scales are examples of multi-item scales.
A construct is a specific type of concept that exists at a higher level of abstraction than everyday
concepts; examples of constructs are brand loyalty, product involvement, attitude, and satisfaction.
Next, the researcher must develop a theoretical definition of the construct that states the meaning of
the central idea or concept of interest.
Measurement error is the variation between the information sought by the researcher and the information
generated by the measurement process employed.
The true score model provides a framework for understanding the accuracy of measurement.
According to this model,
XO = XT + XS + XR
where
XO - the observed score or measurement
XT - the true score of the characteristic
XS - systematic error
XR - random error
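A minimal simulation of the true score model (all numbers are hypothetical) showing how a constant systematic error and zero-mean random error combine into the observed score:

import numpy as np

rng = np.random.default_rng(1)
x_true = 70.0                          # XT: true score of the characteristic
x_sys = 3.0                            # XS: systematic error (constant bias)
x_rand = rng.normal(0, 2.0, size=5)    # XR: random error with mean 0
x_obs = x_true + x_sys + x_rand        # XO = XT + XS + XR
print(x_obs.round(1))                  # repeated measurements scatter around 73, not 70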
Reliability refers to the extent to which a scale produces consistent results if repeated
measurements are made.
Test-retest reliability, respondents are administered identical sets of scale items at two different
times under as nearly equivalent conditions as possible. The time interval between tests or
administrations is, typically, two to four weeks. The degree of similarity between the two
measurements is determined by computing a correlation coefficient. The higher the correlation
coefficient, the greater the reliability.
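A sketch of test-retest reliability (hypothetical scores for five respondents measured twice); the Pearson correlation between the two administrations is the reliability estimate:

import numpy as np

time1 = np.array([3, 4, 5, 2, 4])    # scale scores at the first administration
time2 = np.array([3, 5, 5, 2, 3])    # same respondents, two to four weeks later
r = np.corrcoef(time1, time2)[0, 1]  # higher r implies greater test-retest reliability
print(round(r, 2))                   # prints 0.85 for these hypothetical data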
Alternative-forms reliability, two equivalent forms of the scale are constructed. The same
respondents are measured at two different times, usually two to four weeks apart, with a different
scale form being administered each time. The scores from the administration of the alternative-scale
forms are correlated to assess reliability.
Internal consistency reliability is used to assess the reliability of a summated scale where several
items are summed to form a total score. In a scale of this type, each item measures some aspect of
the construct measured by the entire scale, and the items should be consistent in what they indicate
about the characteristic.
The simplest measure of internal consistency is split-half reliability. The items on the scale are
divided into two halves and the resulting half scores are correlated. High correlations between the
halves indicate high internal consistency.
The coefficient alpha, or Cronbach’s alpha, is the average of all possible split-half coefficients
resulting from different ways of splitting the scale items. This coefficient varies from 0 to 1, and a
value of 0.6 or less generally indicates unsatisfactory internal consistency reliability. An important
property of coefficient alpha is that its value tends to increase with an increase in the number of
scale items. Therefore, coefficient alpha may be artificially, and inappropriately, inflated by including
several redundant scale items.
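A minimal sketch of coefficient alpha for a summated scale (hypothetical responses), using the standard formula alpha = k/(k - 1) x (1 - sum of item variances / variance of the total score):

import numpy as np

# Hypothetical data: rows = respondents, columns = the k items of a summated scale (1-5)
items = np.array([
    [4, 5, 4, 4, 5],
    [2, 3, 2, 3, 2],
    [5, 5, 4, 5, 5],
    [3, 2, 3, 3, 3],
    [4, 4, 5, 4, 4],
])

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)       # variance of each item
total_var = items.sum(axis=1).var(ddof=1)   # variance of the summated (total) score
alpha = k / (k - 1) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))                      # values of 0.6 or less suggest unsatisfactory reliability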
The validity of a scale may be defined as the extent to which differences in observed scale scores
reflect true differences among objects on the characteristic being measured, rather than systematic
or random error.
Content validity, sometimes called face validity, is a subjective but systematic evaluation of how well
the content of a scale represents the measurement task at hand. The researcher or someone else
examines whether the scale items adequately cover the entire domain of the construct being
measured.
Criterion validity reflects whether a scale performs as expected in relation to other variables
selected as meaningful criteria (criterion variables). Criterion variables may include demographic and
psychographic characteristics, attitudinal and behavioral measures, or scores obtained from other
scales.
Construct validity addresses the question of what construct or characteristic the scale is, in fact,
measuring. When assessing construct validity, the researcher attempts to answer theoretical
questions about why the scale works and what deductions can be made concerning the underlying
theory.
Convergent validity is the extent to which the scale correlates positively with other measures of the
same construct. It is not necessary that all these measures be obtained by using conventional scaling
techniques. Discriminant validity is the extent to which a measure does not correlate with other
constructs from which it is supposed to differ. It involves demonstrating a lack of correlation among
differing constructs. Nomological validity is the extent to which the scale correlates in theoretically
predicted ways with measures of different but related constructs.
To collect quantitative primary data, a researcher must design a questionnaire or an observation
form. A questionnaire has three objectives: it must translate the information needed into a set of
specific questions the respondents can and will answer; it must motivate respondents to complete
the interview; and it must minimize response error. Designing a questionnaire is an art rather than a
science. The process involves ten steps:
• Step 1: Specify the information needed.
• Step 2: Specify the type of interviewing method.
• Step 3: Decide on the content of individual questions.
• Step 4: Design the questions to overcome the respondents' inability and unwillingness to answer. Respondents may be unable to answer if they are not informed, cannot remember, or cannot articulate the response. Respondents may be unwilling to answer if the question requires too much effort, is asked in a situation or context deemed inappropriate, does not serve a legitimate purpose, or solicits sensitive information.
• Step 5: Decide on the question structure. Questions can be unstructured (open-ended) or structured to a varying degree; structured questions include multiple-choice and dichotomous questions and scales.
• Step 6: Determine the wording of each question. This involves defining the issue, using ordinary words, using unambiguous words, and using dual statements. The researcher should avoid leading questions, implicit alternatives, implicit assumptions, and generalizations and estimates.
• Step 7: Decide the order in which the questions will appear. Special consideration should be given to opening questions, type of information, difficult questions, and the effect on subsequent questions; the questions should be arranged in a logical order.
• Step 8: Determine the form and layout of the questions.
• Step 9: Reproduce the questionnaire. Important factors include appearance, use of booklets, fitting an entire question on a page, response category format, avoiding overcrowding, placement of directions, color coding, an easy-to-read format, and cost.
• Step 10: Pretest the questionnaire. Important issues are the extent of pretesting, nature of respondents, type of interviewing method, type of interviewers, sample size, protocol analysis and debriefing, and editing and analysis.
The design of observational forms requires explicit decisions about what is to be observed and how
that behavior is to be recorded. It is useful to specify the who, what, when, where, why, and way of
the behavior to be observed. The questionnaire should be adapted to the specific cultural environment
and should not be biased in terms of any one culture. It may also have to be suitable for administration
by more than one method, because different interviewing methods may be used in different countries.
Several ethical issues related to the researcher-respondent relationship and the researcher-client
relationship may have to be addressed. The Internet and computers can greatly assist the researcher
in designing sound questionnaires and observational forms.
Practices in framing a good questionnaire
- Use questions that test the hypothesis (IVs, DVs, and CVs)
- Do not sacrifice validity to parsimony
- Use multi-item measures
- Use predetermined scales for validity and reliability
- Questionnaire must have all possible IVs and DVs
- Measure the variables separately
- Order – DV followed by IV and then CV
- Avoid double-barrelled questions
- Avoid ambiguity
- Make questionnaire appear short
- Counterbalance the order of constructs, not the items within the construct.
Important qualitative factors that should be considered in determining the sample size include
• the importance of the decision,
• the nature of the research,
• the number of variables,
• the nature of the analysis,
• sample sizes used in similar studies,
• incidence rates and completion rates,
• resource constraints.
Assumptions in Analysis of Variance
The salient assumptions in analysis of variance can be summarized as follows.
• Ordinarily, the categories of the independent variable are assumed to be fixed. Inferences
are made only to the specific categories considered. This is referred to as the fixed-effects
model. Other models are also available. In the random-effects model, the categories or
treatments are considered to be random samples from a universe of treatments. Inferences
are made to other categories not examined in the analysis. A mixed-effects model results if
some treatments are considered fixed and others random.
• The error term is normally distributed, with a zero mean and a constant variance. The error is
not related to any of the categories of X. Modest departures from these assumptions do not
seriously affect the validity of the analysis. Furthermore, the data can be transformed to
satisfy the assumption of normality or equal variances.
• The error terms are uncorrelated. If the error terms are correlated (i.e., the observations are
not independent), the F ratio can be seriously distorted. In many data analysis situations,
these assumptions are reasonably met.
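A minimal one-way, fixed-effects ANOVA sketch (hypothetical store-sales data under three levels of promotion); scipy's f_oneway returns the F ratio and its p-value:

from scipy import stats

# Hypothetical weekly sales under three levels of in-store promotion
high = [10, 9, 10, 8, 9]
medium = [8, 8, 7, 9, 6]
low = [5, 6, 6, 4, 5]

f_stat, p_value = stats.f_oneway(high, medium, low)
print(round(f_stat, 2), round(p_value, 4))  # a large F (small p) indicates the group means differ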
N-Way ANOVA - An ANOVA model in which two or more factors (categorical independent variables) are examined simultaneously.
Interaction - An interaction occurs when the effect of one factor on the dependent variable depends on the level of another factor.
Omega Squared
A measure indicating the proportion of the variation in the dependent variable explained by a
particular independent variable or factor.
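A hedged sketch of omega squared using the standard fixed-effects formula; the sums of squares and mean square below are hypothetical values as they would appear in a one-way ANOVA table:

# omega^2 = (SS_x - df_x * MS_error) / (SS_total + MS_error)
ss_x, ss_total = 106.1, 185.9   # between-groups and total sums of squares (hypothetical)
df_x, ms_error = 2, 2.96        # factor degrees of freedom and error mean square (hypothetical)
omega_sq = (ss_x - df_x * ms_error) / (ss_total + ms_error)
print(round(omega_sq, 2))       # proportion of variation in Y explained by the factor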
Regression analysis is a powerful and flexible procedure for analyzing associative relationships
between a metric dependent variable and one or more independent variables. It can be used in the
following ways:
• Determine whether the independent variables explain a significant variation in the
dependent variable: whether a relationship exists
• Determine how much of the variation in the dependent variable can be explained by the
independent variables: strength of the relationship
• Determine the structure or form of the relationship: the mathematical equation relating the
independent and dependent variables
• Predict the values of the dependent variable
• Control for other independent variables when evaluating the contributions of a specific
variable or set of variables
• Coefficient of determination. The strength of association is measured by the coefficient of
determination, r2. It varies between 0 and 1 and signifies the proportion of the total
variation in Y that is accounted for by the variation in X.
• Estimated or predicted value. The estimated or predicted value of Yi is Ŷi = a + bXi, where Ŷi is the
predicted value of Yi, and a and b are estimators of β0 and β1, respectively.
• Regression coefficient. The estimated parameter b is usually referred to as the
nonstandardized regression coefficient.
• Scattergram. A scatter diagram, or scattergram, is a plot of the values of two variables for
all the cases or observations.
• Standard error of estimate. This statistic, SEE, is the standard deviation of the actual Y
values from the predicted values.
• Standard error. The standard deviation of b, SEb, is called the standard error.
• Standardized regression coefficient. Also termed the beta coefficient or beta weight, this
is the slope obtained by the regression of Y on X when the data are standardized.
• Sum of squared errors. The distances of all the points from the regression line are squared and
added together to arrive at the sum of squared errors, Σej², which is a measure of total error.
• t statistic. A t statistic with n - 2 degrees of freedom can be used to test the null hypothesis
that no linear relationship exists between X and Y, using t = b/SEb.
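A minimal bivariate regression sketch (the data are hypothetical); scipy's linregress returns the slope b, intercept a, correlation r, and the standard error of b, from which the statistics above follow:

import numpy as np
from scipy import stats

# Hypothetical data: X = duration of residence, Y = attitude toward the city
x = np.array([10, 12, 12, 4, 12, 6, 8, 2, 18, 9, 17, 2])
y = np.array([6, 9, 8, 3, 10, 4, 5, 2, 11, 9, 10, 2])

res = stats.linregress(x, y)
r_squared = res.rvalue ** 2                              # coefficient of determination
t_stat = res.slope / res.stderr                          # t with n - 2 df for H0: no linear relationship
y_hat = res.intercept + res.slope * x                    # estimated (predicted) values
see = np.sqrt(((y - y_hat) ** 2).sum() / (len(x) - 2))   # standard error of estimate
print(round(res.slope, 3), round(res.intercept, 3), round(r_squared, 3), round(t_stat, 2), round(see, 2))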
Assumptions
1. The error term is normally distributed. For each fixed value of X, the distribution of Y is
normal.
2. The means of all these normal distributions of Y, given X, lie on a straight line with slope b.
3. The mean of the error term is 0.
4. The variance of the error term is constant. This variance does not depend on the values
assumed by X.
5. The error terms are uncorrelated. In other words, the observations have been drawn
independently.
Statistics Associated with Multiple Regression
Adjusted R2. R2, the coefficient of multiple determination, is adjusted for the number of independent
variables and the sample size to account for diminishing returns. After the first few variables,
additional independent variables do not make much contribution.
Coefficient of multiple determination. The strength of association in multiple regression is measured
by the square of the multiple correlation coefficient, R2, which is also called the coefficient of
multiple determination.
Correlation matrix. A correlation matrix is a lower triangle matrix showing the simple correlations, r,
between all possible pairs of variables included in the analysis. The diagonal elements, which are all
1, are usually omitted.
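A minimal multiple regression sketch (hypothetical predictors and data) showing the correlation matrix, R2, and adjusted R2 computed as 1 - (1 - R2)(n - 1)/(n - k - 1):

import numpy as np

rng = np.random.default_rng(2)
n, k = 50, 2
x1 = rng.normal(size=n)                        # hypothetical predictor 1
x2 = 0.5 * x1 + rng.normal(size=n)             # hypothetical predictor 2, correlated with x1
y = 1.0 + 0.8 * x1 + 0.3 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])      # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # nonstandardized regression coefficients
y_hat = X @ beta
r2 = 1 - ((y - y_hat) ** 2).sum() / ((y - y.mean()) ** 2).sum()
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(np.corrcoef(np.column_stack([x1, x2, y]), rowvar=False).round(2))  # correlation matrix
print(round(r2, 3), round(adj_r2, 3))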
Statistics Associated with Factor Analysis
Communality. Communality is the amount of variance a variable shares with all the
other variables being considered. This is also the proportion of variance explained by the common
factors.
Eigenvalue. The eigenvalue represents the total variance explained by each factor.
Factor loadings. Factor loadings are simple correlations between the variables and the factors.
Factor loading plot. A factor loading plot is a plot of the original variables using the factor loadings as
coordinates.
Factor matrix. A factor matrix contains the factor loadings of all the variables on all the
factors extracted.
Factor scores. Factor scores are composite scores estimated for each respondent on the
derived factors.
Factor scores coefficient matrix. This matrix contains the weights, or factor score coefficients, used
to combine the standardized variables to obtain factor scores.
Factor analysis is done when there is multicollinearity among the independent variables (a minimal sketch follows the lists below):
- Start with a set of highly correlated IVs
- Run the factor analysis
- Determine the number of factors to extract
- Rotate the factors if necessary
- Name (interpret) the factors
A factor is
- A linear combination of the correlated IVs
- Latent (not directly observed)
- Reflected in the correlated IVs
- Fewer in number than the original variables (number of factors < number of variables)
- Expected to explain more than unit variance (eigenvalue greater than 1)
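A minimal factor-analysis sketch (hypothetical item data, principal-component extraction with numpy; a dedicated package such as factor_analyzer would normally be used for rotation) illustrating the eigenvalue-greater-than-one rule, factor loadings, and communalities:

import numpy as np

rng = np.random.default_rng(3)
n = 200
f1 = rng.normal(size=n)   # latent factor 1 (hypothetical)
f2 = rng.normal(size=n)   # latent factor 2 (hypothetical)
# Six observed, correlated IVs: three reflect f1 and three reflect f2, plus noise
X = np.column_stack([f1 + 0.4 * rng.normal(size=n) for _ in range(3)] +
                    [f2 + 0.4 * rng.normal(size=n) for _ in range(3)])

R = np.corrcoef(X, rowvar=False)                 # correlation matrix of the variables
eigvals, eigvecs = np.linalg.eigh(R)             # eigenvalues come out in ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
n_factors = int((eigvals > 1).sum())             # retain factors that explain more than unit variance
loadings = eigvecs[:, :n_factors] * np.sqrt(eigvals[:n_factors])   # factor loadings
communality = (loadings ** 2).sum(axis=1)        # variance each variable shares with the factors
print(n_factors, eigvals.round(2))
print(loadings.round(2))
print(communality.round(2))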