RESEARCH METHODS FOR
BUSINESS
A Skill-Building Approach
Prepared by Riña
PART 12
Measurement: Scaling,
Reliability and Validity.
CONTENTS
1 • Four types of scales.
Nominal scale, Ordinal scale, Interval scale, Ratio scale, Ordinal or Interval?
2 • Rating and Ranking scales.
Dichotomous, Category, Semantic differential, Numerical, Itemized rating, Likert, Fixed or constant sum, Stapel, Graphic rating, Consensus, and Other scales.
3 • Goodness of measures.
Item analysis, Validity, and Reliability.
4 • Reflective versus formative measurement scales.
What is a reflective scale? What is a formative scale and why do the items of a formative scale not necessarily hang together?
1. Four Types of Scales.
Scale – is a tool or mechanism by which individuals are distinguished as to how they differ from one
another on the variables of interest to our study. Scaling involves the creation of a continuum on which our
objects are located.
1. Nominal Scale
Is one that allows the researcher to assign subjects to certain categories or groups.
Nominal scales categorize individuals or objects into mutually exclusive and collectively exhaustive
groups.
With gender as the variable, for example, nominal scaling lets us calculate the percentage (or
frequency) of males and females in our sample of respondents.
Nominal scale gives some basic, categorical, gross information.
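The frequency and percentage calculation described above can be sketched as follows. This is a minimal illustration, assuming a small made-up sample; the data and the function name `category_percentages` are hypothetical and not part of the original material.

```python
from collections import Counter

def category_percentages(responses):
    """Return {category: (count, percentage)} for nominal-scaled responses."""
    counts = Counter(responses)
    total = len(responses)
    return {cat: (n, 100.0 * n / total) for cat, n in counts.items()}

# Hypothetical sample of ten respondents (illustrative data only).
sample = ["male", "female", "female", "male", "female",
          "female", "male", "female", "female", "male"]

for category, (count, pct) in sorted(category_percentages(sample).items()):
    print(f"{category}: {count} respondents ({pct:.1f}%)")
```

Counts and percentages are the only summaries computed here because nominal categories carry no order or distance.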
2. Ordinal scale
An ordinal scale not only categorizes the variables in such a way as to denote differences among the
various categories, it also rank-orders the categories in some meaningful way.
With any variable for which the categories are to be ordered according to some preference, the ordinal
scale would be used. The preferences would be ranked and numbered 1, 2, and so on.
If respondents rank job characteristics by importance, for example, the ordinal scale helps the
researcher to determine the percentage of respondents who consider interaction with others as
most important, those who consider using a number of different skills as most important, and so on.
3. Interval scale
In an interval scale, or equal interval scale, numerically equal distances on the scale represent equal
values in the characteristics being measured.
Whereas the nominal scale allows us only to qualitatively distinguish groups by categorizing them into
mutually exclusive and collectively exhaustive sets, and the ordinal scale to rank-order the preferences,
the interval scale allows us to compare differences between objects.
The clinical thermometer is a good example of an interval-scaled instrument.
The difference between any two neighboring values on the scale is identical to the difference between
any other two neighboring values. The interval scale is more powerful than the nominal and ordinal scales.
The interval scale, then, taps the differences, the order, and the equality of the magnitude of the
differences in the variable.
4. Ratio scale
The ratio scale overcomes the disadvantage of the arbitrary origin point of the interval scale, in that it
has an absolute (in contrast to an arbitrary) zero point, which is a meaningful measurement point.
Thus, the ratio scale not only measures the magnitude of the differences between points on the scale
but also taps the proportions in the differences.
It is the most powerful of the four scales because it has a unique zero origin and subsumes all the
properties of the other three scales.
The weighing balance is a good example of a ratio scale.
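The contrast between an arbitrary and an absolute zero point can be illustrated with a unit conversion. The sketch below assumes temperature (Celsius and Fahrenheit) as the interval-scaled variable and weight (kilograms and pounds) as the ratio-scaled variable; the values and function names are illustrative assumptions.

```python
def celsius_to_fahrenheit(c):
    """Convert a Celsius reading to Fahrenheit."""
    return c * 9.0 / 5.0 + 32.0

def kg_to_lb(kg):
    """Convert kilograms to pounds (approximate conversion factor)."""
    return kg * 2.20462

# Interval scale: the zero point is arbitrary, so ratios depend on the unit.
c_low, c_high = 10.0, 20.0
ratio_c = c_high / c_low
ratio_f = celsius_to_fahrenheit(c_high) / celsius_to_fahrenheit(c_low)

# Ratio scale: the zero point is absolute, so ratios survive a change of unit.
kg_low, kg_high = 40.0, 80.0
ratio_kg = kg_high / kg_low
ratio_lb = kg_to_lb(kg_high) / kg_to_lb(kg_low)

print(f"Temperature ratio in Celsius: {ratio_c:.2f}, in Fahrenheit: {ratio_f:.2f}")
print(f"Weight ratio in kg: {ratio_kg:.2f}, in lb: {ratio_lb:.2f}")
```

The temperature ratio changes with the unit, which is why "twice as hot" is not a meaningful statement on an interval scale, whereas "twice as heavy" holds in any unit of weight.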
Ordinal or interval?
- Likert scales are a commonly used way of measuring opinions and attitudes. Whether this scale is ordinal or
interval in nature is a subject of much debate. Some people argue that a Likert scale is ordinal in nature.
Nonetheless, Likert scales are generally treated as if they were interval scales, because this allows researchers
to calculate averages and standard deviations and to apply other, more advanced techniques.
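As a minimal sketch of what treating Likert responses as interval data permits, the example below computes an average and a standard deviation. The responses are made up for illustration, and the function name `summarize_likert` is an assumption.

```python
from statistics import mean, stdev

def summarize_likert(responses):
    """Return (mean, sample standard deviation) of numeric Likert responses."""
    return mean(responses), stdev(responses)

# Hypothetical responses of eight subjects to one statement on a five-point scale.
responses = [4, 5, 3, 4, 2, 5, 4, 3]

avg, sd = summarize_likert(responses)
print(f"mean = {avg:.2f}, standard deviation = {sd:.2f}")
```

These statistics are only interpretable if the distances between adjacent scale points are assumed to be equal, which is the assumption under debate.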
2. Rating Scales.
1. Dichotomous scale
Used to elicit a Yes or No answer. Note that a nominal scale is used to elicit the response.
2. Category scale
Uses multiple items to elicit a single response. This also uses the nominal scale.
3. Semantic differential scale
Used to assess respondents’ attitudes toward a particular brand, advertisement, object, or individual.
It is ordinal in nature. However, it is often treated as an interval scale.
4. Numerical scale
Is similar to the semantic differential scale, with the difference that numbers on a five-point or seven-
point scale are provided, with bipolar adjectives at both ends.
This scale is also often treated as an interval scale, although it is formally ordinal in nature.
5. Itemized rating scale
A five-point or seven-point scale with anchors, as needed, is provided for each item and the respondent
states the appropriate number on the side of each item, or circles the relevant number against each
item. The responses to the items are then summed.
This uses an interval scale.
6. Likert scale
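Is designed to examine how strongly subjects agree or disagree with statements on a five-point scale
with the following anchors: Strongly Disagree (1), Disagree (2), Neither Agree nor Disagree (3),
Agree (4), Strongly Agree (5).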
7. Fixed or constant sum scale
The respondents are here asked to distribute a given number of points across various items.
This is more in the nature of an ordinal scale.
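A constant sum response is only usable if the points add up to the given total. The check below is a minimal sketch, assuming a total of 100 points; the item names, the allocation, and the function name `validate_constant_sum` are hypothetical.

```python
def validate_constant_sum(allocation, total=100):
    """Check that allocated points are non-negative and add up to `total`."""
    if any(points < 0 for points in allocation.values()):
        return False
    return sum(allocation.values()) == total

# Hypothetical allocation of 100 points across five product attributes.
allocation = {
    "fragrance": 30,
    "color": 10,
    "shape": 5,
    "size": 15,
    "texture of lather": 40,
}

if validate_constant_sum(allocation):
    # Order the attributes from most to least points.
    for item, points in sorted(allocation.items(), key=lambda kv: -kv[1]):
        print(f"{item}: {points} points")
else:
    print("Invalid response: points do not add up to the required total.")
```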
8. Stapel scale
This scale simultaneously measures both the direction and intensity of the attitude toward the items
under study.
This gives an idea of how close or distant the individual response to the stimulus is.
This is an interval scale.
9. Graphic rating scale
A graphic representation helps the respondents to indicate on this scale their answers to a particular
question by placing a mark at the appropriate point on the line.
This is an ordinal scale.
10. Consensus scale
Scales can also be developed by consensus, where a panel of judges selects certain items, which in its
view measure the relevant concept.
The items are chosen particularly based on their pertinence or relevance to the concept.
Such a consensus scale is developed after the selected items have been examined and tested for their
validity and reliability.
This scale is rarely used for measuring organizational concepts because of the time necessary to develop
it.
Other scales
There are also some advanced scaling methods such as multidimensional scaling, where objects, people,
or both are visually scaled, and a conjoint analysis is performed.
This provides a visual image of the relationships in space among the dimensions of a construct.
It should be noted that Likert or some form of numerical scale is the most frequently used to measure
attitudes and behaviors in business research.
2. Ranking Scales.
Ranking scales
Are used to tap preferences between two or among more objects or items (ordinal in nature).
However, such ranking may not give definitive clues to some of the answers sought.
1. Paired comparison
Is used when, among a small number of objects, respondents are asked to choose between two objects
at a time.
This helps to assess preferences.
The greater the number of objects or stimuli, the greater the number of paired comparisons presented
to the respondents, and the greater the respondent fatigue.
Hence, paired comparison is a good method if the number of stimuli presented is small.
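How quickly the number of paired comparisons grows can be made concrete: n objects produce n(n-1)/2 pairs. The sketch below enumerates the pairs for a small set; the object labels and function names are illustrative assumptions.

```python
from itertools import combinations

def paired_comparisons(objects):
    """List every unordered pair a respondent would have to compare."""
    return list(combinations(objects, 2))

def number_of_pairs(n):
    """Number of paired comparisons for n objects: n(n-1)/2."""
    return n * (n - 1) // 2

# Hypothetical set of four products.
products = ["A", "B", "C", "D"]
for first, second in paired_comparisons(products):
    print(f"Which do you prefer: {first} or {second}?")

for n in (4, 10, 20):
    print(f"{n} objects -> {number_of_pairs(n)} paired comparisons")
```

Four objects require only 6 choices, but 10 objects already require 45, which is where respondent fatigue becomes a practical concern.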
2. Forced choice
Enables respondents to rank objects relative to one another, among the alternatives provided.
This is easier for the respondents, particularly if the number of choices to be ranked is limited.
3. Comparative scale
Provides a benchmark or a point of reference to assess attitudes toward the current object, event, or
situation under study.
Rating scales are used to measure most behavioral concepts. Ranking scales are used to make comparisons
or rank the variables that have been tapped on a nominal scale.
3. Goodness of measure.
Item analysis
Is carried out to see if the items in the instrument belong there or not. Each item is examined for its
ability to discriminate between those subjects whose total scores are high and those with low scores.
In item analysis, the means of the high-score group and the low-score group are compared, and t-values
are used to detect significant differences.
The items with a high t-value are then included in the instrument. Thereafter, tests for the reliability of
the instrument are carried out and the validity of the measure is established.
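The comparison of high-score and low-score groups can be sketched for a single item as follows. This is an illustration under stated assumptions: it uses the Welch form of the two-sample t statistic (the source does not say which form is used), the responses are made up, and the function name `welch_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(high_group, low_group):
    """t statistic for the difference between two group means (Welch form)."""
    standard_error = sqrt(
        variance(high_group) / len(high_group)
        + variance(low_group) / len(low_group)
    )
    return (mean(high_group) - mean(low_group)) / standard_error

# Hypothetical responses to one item (five-point scale) from respondents
# whose total scores are high and from those whose total scores are low.
item_high = [5, 4, 5, 4, 5]
item_low = [2, 3, 2, 1, 2]

t_value = welch_t(item_high, item_low)
print(f"t-value for this item: {t_value:.2f}")
```

An item that discriminates well between the two groups yields a large t-value; an item on which both groups answer similarly yields a t-value near zero and is a candidate for removal.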
Reliability – is a test of how consistently a measuring instrument measures whatever concept it is
measuring.
Validity – is a test of how well an instrument that is developed measures that particular concept it is
intended to measure.
In other words, validity is concerned with whether we measure the right concept, and reliability with
stability and consistency of measurement.
Validity and reliability of the measure attest to the scientific rigor that has gone into the research study.
Content validity
Ensures that the measure includes an adequate and representative set of items that tap the concept.
The more the scale items represent the domain or universe of the concept being measured, the greater
the content validity.
Content validity is a function of how well the dimensions and elements of a concept have been delineated.
Criterion-related validity
Is established when the measure differentiates individuals on a criterion it is expected to predict.
This can be done by establishing concurrent validity or predictive validity.
Concurrent validity - is established when the scale discriminates individuals who are known to be
different; that is, they should score differently on the instrument.
Predictive validity – indicates the ability of the measuring instrument to differentiate among individuals
with reference to a future criterion.
Construct validity
Testifies to how well the results obtained from the use of the measure fit the theories around which the
test is designed. This is assessed through convergent and discriminant validity.
Convergent validity – is established when the scores obtained with two different instruments measuring
the same concept are highly correlated.
Discriminant validity – is established when, based on theory, two variables are predicted to be
uncorrelated, and the scores obtained by measuring them are indeed empirically found to be so.
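Both checks come down to inspecting correlations between sets of scores. The sketch below computes a Pearson correlation by hand; the scores, variable names, and the function name `pearson_r` are illustrative assumptions, and no particular cut-off for "highly correlated" is implied.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariation / (spread_x * spread_y)

# Hypothetical scores of five respondents.
instrument_a = [10, 12, 15, 18, 20]   # concept measured with instrument A
instrument_b = [11, 13, 14, 19, 21]   # same concept measured with instrument B
unrelated = [3, 1, 2, 1, 3]           # a variable predicted to be uncorrelated

print(f"Convergent check (A vs B): r = {pearson_r(instrument_a, instrument_b):.2f}")
print(f"Discriminant check (A vs unrelated): r = {pearson_r(instrument_a, unrelated):.2f}")
```

A high correlation between the two instruments supports convergent validity, while a correlation near zero with the theoretically unrelated variable supports discriminant validity.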
4. Reflective versus Formative measurement scales.
What is a reflective scale?
In a reflective scale, the items (all of them!) are expected to correlate. Each item in a reflective scale is
assumed to share a common basis.
Hence, an increase in the value of the construct will translate into an increase in the value for all the
items representing the construct.
What is a formative scale and why do the items not necessarily hang together?
A formative scale is used when a construct is viewed as an explanatory combination of its indicators.
A scale that contains items that are not necessarily related is called a formative scale.
A good (that is, a valid) formative scale is one that represents the entire domain of the construct. This
means that a valid scale should represent all the relevant aspects of the construct of interest, even if
these aspects do not necessarily correlate.