APA Test Standards: Validity Recommendations
The occasion for this paper is the publication by Harold Bechtoldt (1959) of an eloquent attack on the category of construct validity. The philosophical problems upon which Bechtoldt takes issue with Cronbach and Meehl (1955) are far removed from the practical business of designing, validating, and selling tests, and are problems upon which competent philosophers are in disagreement. They are issues for which philosophy offers no orthodoxy upon which practitioners can depend, and issues which make little or no difference to the practicing scientist, as Hochberg (1959) has argued. In this situation, it would seem inappropriate if the eloquence of Bechtoldt's attack led to the removal of the category of construct validity from the next edition of Technical Recommendations for Psychological Tests and Diagnostic Techniques (APA, 1954). Instead, it is here argued that there should be a considerable strengthening of a set of precautionary requirements more easily classified under construct validity than under concurrent or predictive validity as presently described.

While not denying the presence of a serious philosophical disagreement nor its relevance to psychology, this paper will emphasize the common ground implicit in psychology's tradition of test validation efforts. The philosophical disagreement will remain, but it need not produce a lack of consensus about desirable evidence of test validity. Bechtoldt's argument is indeed more an objection to the role of construct validity in discussions of philosophy of science and psychological theory than an objection to specific statements of desirable evidence of test validity contained under that rubric. He probably would not claim, for example, that the presentation of concurrent, predictive, and content validity in the Technical Recommendations is an exhaustive statement of the desirable evidence of test validity.

Test validity and test reliability are not concepts belonging to the philosophy of science. Instead they are concepts which have developed in the course of the mutual criticisms of test constructors and test users, concepts which relate to the implicit and explicit claims of test constructors and test salesmen. Had test designers from the beginning been so abstemious as to merely present copies of their tests and those of others and to report correlation coefficients between them for specified populations on specified dates under specified administrators and conditions of administration, then no validity problem, no validity requirements, would ever have been developed.

The actual situation has always been different and, for the published tests to which the Technical Recommendations addresses itself, seems likely to continue to be so. In the labels given tests, in statements of intent and descriptive material, many explicit and implicit claims are made. These claims amount to assertions of empirical laws between the test and other possible operations. Requirements for evidence as to reliability and validity are requirements that some of these laws be examined and confirmed. Our insistence on the importance of such evidence comes from our cumulative experience, in which test constructors and users have frequently been misled. Test constructors and users as we have known them have generally been prone to reifying and hypostatizing, prone to assume that their tests were tapping dispositional syndromes with other symptoms than those utilized in the test. The requirements of validity demand that the implications of such hypostatizing be sampled and checked. Were the hypostatizing tendency to be effectively eradicated, such requirements would indeed become obsolete. If indeed the hypostatizations are unjustified, there is no better way of extinguishing them than attempting

1 This paper has been improved through extended comments on a previous draft made by H. P. Bechtoldt, L. J. Cronbach, D. W. Fiske, K. R. Hammond, and J. Loevinger; and I wish to express gratitude for their generous help. This is not to imply that any of them would completely agree with the paper in its present form.
546
CONSTRUCT, TRAIT, OR DISCRIMINANT VALIDITY 547
to verify them.2 Validation procedures are just this.

CONSTRUCT OR TRAIT VALIDITY IN THE HISTORY OF TEST VALIDATION EFFORTS

Validation efforts have thus naturally been related to the intents and claims of the tests in question. In some instances, as occasionally for personnel selection tests, there have been quantitative or dichotomous institutional discriminations which it was economically advantageous to be able to predict. Where tests were offered as predictors of these practical decisions, evidences of the accuracy of prediction were the relevant validity data. Such validity efforts are subsumed in the Technical Recommendations under the terms predictive and concurrent validity. The latter is usually but an inexpensive and presumptive substitute for the former, and together they might be called practical validity. Note the asymmetry of the validational correlation of this type: because of the socially institutionalized and valued nature of the "criterion," it is taken as an immutable given, even when, as in college grades or factory production records, it might be known to have many imperfections or sources of invalidity in itself, if judged from a theoretical point of view. Beginning with James McKeen Cattell's efforts to predict college grades from reaction time measures, many psychological tests have been validated and invalidated by this process.

But not all psychological tests have been designed solely to predict performance against extant institutional decision situations. There are, in fact, relatively few settings which produce such criteria; and these are often so patently complex in the determinants of success as to be uninteresting to the scientist, who would rather measure purer, more single-factored traits for which society produces no correspondingly pure criteria. A good half of the validation efforts for personality tests since 1920 have been of this latter type and cannot be readily subsumed under practical validity. They fit best under construct validity as first described in the 1954 Technical Recommendations, even though construct validity was described there primarily as a possibility for the future. Subsequent presentations (Cronbach & Meehl, 1955; Jessor & Hammond, 1957) have tended to tie construct validity to tests developed and validated in the context of explicit theoretical structures or "nomological nets." Such developed theory was usually lacking and was not typically employed in these older validation efforts, even where, as for the numerous introversion-extroversion tests, a theoretical background may have been present.

It may be wise, therefore, to distinguish two types of construct validity. The first of these can appropriately be given the old-fashioned name trait validity. It is applicable at that level of development still typical of most test development efforts, in which "theory," if any, goes no farther than indicating a hypothetical syndrome, trait, or personality dimension. The second type could be called nomological validity and would represent the very important and novel emphasis of Cronbach and Meehl on the possibility of validating tests by using the scores from a test as interpretations of a certain term in a formal theoretical network and, through this, to generate predictions which would be validating if confirmed when interpreted as still other operations and scores. When the Taylor Manifest Anxiety Scale is validated against psychiatrists' ratings (Taylor, 1956, pp. 316-317), trait validation is being illustrated. When it is validated by generating the correct predictions of performance in learning situations, with test scores interpreted as differences in D in the Hull-Spence learning theory (Taylor, 1956, pp. 307-310), nomological validation is shown. The desirability of going still further in designing tests in detailed consideration of formal theory is an aspect of nomological validity advocated by Jessor and Hammond (1957).

Among commercially published tests, nomological validity evidence is apt to be rare for some time to come, and it is therefore to trait validity criteria that this paper is primarily addressed. If one prefers to regard this as merely the old common-sense notion of validity, not needing any new label such as construct validity, this is acceptable. But the Technical Recommendations presentation of concurrent and predictive validity is not adequate to cover it. A number of distinctions between trait validity and practical validity can be noted. In trait validity, no a priori defining criterion is available as a perfect measure or defining operation against which to check the fallible

2 For a justification of validational requirements on psychological grounds, see Campbell (1959), especially pp. 172-179.
548 THE AMERICAN PSYCHOLOGIST
test. Instead, the validator seeks out some independent way of getting at "the same" trait. Thus he may obtain specially designed ratings for the purpose. This independent measure has no status as the criterion for the trait, nor is it given any higher status for validity than is the test. Both are regarded as fallible measures, often with known imperfections, such as halo effects for the ratings and response sets for the test. Validation, when it occurs, is symmetrical and equalitarian. The presumptive validity of both measures is increased by agreement. Starting from a test, the validating measure is selected or devised on the joint criteria of independence of method and relevance to the trait.

The Downey Will-Temperament Tests, the moral knowledge tests, the introversion-extroversion tests, the social intelligence tests, the empathy tests: all have been invalidated without recourse to correlating test scores with criterion variables. They have instead been invalidated by cumulative evidence of the trait-validity sort. Inspecting the classic surveys of the validity of personality tests on a study-by-study basis shows more than half of the validational efforts cited by Symonds (1931) and Vernon (1938) to be of this type. An even half of the items in Ellis' (1946) review are of this type. For personnel selection tests, the role of trait validation procedures would be much smaller, of course, but even here it is relevant. Trait validity is thus an important part of our cumulative experience in finding some tests worthless. It deserves to be represented among the precautionary standards attempting to prevent the needless publication and sale of worthless tests in the future.

Common to trait validity and practical validity is evidence of convergence or agreement between highly independent measures. Peculiar to trait validity considerations is the requirement of discriminant validity (Campbell & Fiske, 1959), the requirement that a test not correlate too highly with measures from which it is supposed to differ. Instances of invalidation by high correlation were already available when Symonds summarized the literature in 1931. He cites, for example, the moral knowledge tests. The interests of the 1920's had led several persons to develop such tests. These moral knowledge tests, it turned out, individually correlated more highly with intelligence tests than they did with each other, and on this ground were abandoned. The case of the social intelligence tests (Strang, 1930; Thorndike, 1936) is similar. Predictive or concurrent validity considerations would not have invalidated these tests, as they did indeed predict the practical criteria which a general intelligence test would predict, although perhaps not as well.

A ubiquitous class of cases in which high correlations have been invalidating consists of those instances of strong trait-irrelevant methods factors. These include the halo effects in ratings (Guilford, 1954; Thorndike, 1920), response sets (Cronbach, 1946, 1950) and social desirability factors (Edwards, 1957) in questionnaires, and stereotypes in interpersonal perception (Cronbach, 1958; Gage, Leavitt, & Stone, 1956). Where feasible procedures are available to check on the strength of these trait-irrelevant methods factors in their contribution to reliable test variance, such procedures should certainly be tried before offering the test for general use.

SUGGESTED ADDITIONS TO THE RECOMMENDED EVIDENCES OF VALIDITY

Upon the basis of psychology's experience, more exhaustively assembled and discussed elsewhere (as in previous and subsequent references), the following additions to the Technical Recommendations in the category of construct validity are suggested:

1. Correlation with intelligence tests. A new test, no matter what its content, should be correlated with an intelligence test of as similar format as possible (e.g., a group intelligence test for a group personality test, etc.). If correlations are reported with independent trait-appropriate or criterion measures, it should be demonstrated that the new test correlates better with these measures than does the intelligence test.

This requirement is already somewhat recognized. Some test manuals for empathy and for personality traits report low correlations with intelligence as evidence favorable to validity. One major challenge to the validity of the F Scale, for example, is its high correlation with intelligence and the fact that its correlations with ethnocentrism, social class, conformity, and leadership are correlations previously demonstrated for test intelligence (e.g., Christie, 1954).

2. Correlations with social desirability. A new test of the voluntary self-descriptive sort should
be correlated with some measure of the very general response tendency of describing oneself in a favorable light no matter what the trait-specific content of the items. If correlations are reported with trait-appropriate or criterion measures, then it should be demonstrated that the new test predicts these measures better than does the general social desirability factor. In lieu of this, construction features designed to eliminate the social desirability factor should be specified, as in the forced-choice pairing of items previously equated on social desirability. Edwards (1957) reviews the evidence necessitating this requirement.

3. Correlations with measures of acquiescence and other response sets. Tests of the voluntary self-description type employing responses with multiple levels of endorsement (e.g., L-D-I, A-a-?-d-D, etc.) should report correlations with external measures of acquiescence response set and other likely response sets. For check lists, the correlation with general frequency of checking items independent of content should be reported. It should be demonstrated that the tests predict trait-appropriate or criterion measures better than do the response set scores. In lieu of this, it should be demonstrated that the test construction and scoring procedures are such as to prevent response sets from being confounded with trait-specific content in the total score, as through the use of items worded in opposite directions in equal numbers, etc. Cronbach (1946, 1950) and others (e.g., Chapman & Bock, 1958) have illustrated the extent to which extant tests have in fact produced scores predominantly a function of such trait-irrelevant sources of variance. (This is not to rule out the deliberate utilization of response-set variance, where the intent to do so is made explicit.)

4. Self-description and stereotype keys for interpersonal perceptual accuracy tests. Measures of empathy, interpersonal perception, social competence, and the like should compare the results of efforts to replicate the scores of particular social targets with the results obtained by using self-descriptions and stereotype scores as predictors; or such scores should be based upon competence in differentiating among social targets rather than upon the absolute discrepancy in predictions for a single social target. Gage, Leavitt, and Stone (1956) and Cronbach (1958) have described how misleading scores can be without such checks. Similarly, Q-type correlations offered as validity data should be accompanied by control correlations based upon random matches, in the manner of Corsini (1956) and Silverman (1959).

5. Validity correlations higher than those for self-ratings. Advocates of personality tests implicitly or explicitly claim that their scores are better measures (in some situations at least) than much quicker and more direct approaches such as simple self-ratings. While correlations with self-ratings may in some circumstances be validating, it should also be demonstrated that the test scores predict independent trait-appropriate or criterion measures better than do self-ratings. The available evidence (as sampled, for example, by Campbell and Fiske, 1959) shows that this may only rarely be the case.

6. Multitrait-multimethod matrix. The demonstration of discriminant validity and the examination of the strength of method factors require a validational setting containing not only two or more methods of measuring a given trait, but also the measurement of two or more traits. This requirement is implicit in several of the points above and has been present in the range of validational evidence used in our field from the beginning (e.g., Symonds, 1931). It is frequently convenient to examine such evidence through a multitrait-multimethod matrix. This seems particularly desirable where the test publisher offers a multiple-score test or a set of tests in a uniform battery. Achievement and ability tests need this fully as much as do personality tests. A detailed argument for this requirement is presented elsewhere (Campbell & Fiske, 1959).

DEMURRERS FROM SOME CONNOTATIONS OF CONSTRUCT VALIDITY

It is believed that the originators of the term "construct validity" would find in the above description of trait validity, including its discriminant aspects and the suggested additions, nothing incompatible with construct validity as they originally intended it. For this reason and for reasons of economy of conceptualization, it seems desirable to emphasize the essential identity. There may be, nonetheless, several points at which the connotations of the original presentation are at variance with the emphases of this paper and of the orientation toward validity represented by the multitrait-multimethod matrix. These connotations may very
well have been inadvertent aspects of the illustrations used in the presentation rather than intended. In other cases they may be connotations elicited only in the minds of a few readers. A primary source has been informal conversations with hardheaded psychologists who have failed to see the need for the concept or who have felt that they disagreed with it. These demurrers are deliberately overstated here for purposes of expository clarity. As will be seen from the discussions subsumed under them, they include some complaints which the present writer feels are totally unjustified, as well as others upon which the presentation of construct validity may be in need of clarification.

1. Construct validity is new. Through the use of hypothetical illustrations rather than classic instances, and through the references to formal theory, the connotation has been created that construct validity was offered as a new type of validation procedure. Actually it is as old as the concept of test validity itself, and it (or trait validity) is needed in any inventory of the useful procedures by which tests have been shown to be invalid in the past.

2. Construct validity is only for tests developed in the context of formal theory. While the illustrations of the Technical Recommendations presentation clearly contradict this, the term "construct," the reference to nomological nets, and the accompanying argument from disputed positions within the philosophy of science have furthered this impression. The heterogeneity of validational approaches encompassable within construct validity is indeed so great that its subdivision into trait and nomological validation might well improve accuracy of communication at the practical level.

3. Construct validity confuses reliability and validity. While this criticism is perhaps accurately applied to some of the precursors to the concept of construct validity, and to some published claims of construct validity, details of the formal presentations belie this. However, Technical Recommendations is weak in making explicit the common denominator among all of the major validity notions, and their common difference from reliability. Reliability is agreement between measures maximally similar in method. The best examples of concurrent, predictive, and construct validity all represent agreement between highly different and independent measurement procedures. This essential common component in all validity is spelled out in more detail by Campbell and Fiske (1959).

4. Construct validity represents the abandonment of operationalism. Like all of the pragmatist and positivist calls for observable evidence as opposed to untestable metaphysical speculation, construct validation is a kind of operationalism, as the term is generally used. Where verifying operations against which to check tests are not automatically available (as they are for predictive and concurrent validity), it calls for the generation of independent operations for this purpose. The only kind of operationalism with which it is in disagreement is the totally unpracticed kind referred to by Bridgman in his original presentation (1927), as when he said: "if we have more than one set of operations we have more than one concept, and strictly there should be a separate name to correspond to each different set of operations" (p. 10). We may call this "exhaustive-definitional-operationalism" if it is taken as alleging that for every theoretical construct there is one perfect defining operation and that this operation exhaustively defines that theoretical construct.

Bridgman probably no longer holds to this extreme view, if he ever did. No theoretical psychologist who attempts to relate theory with data employs this exhaustive-definitional-operationalism. To take an illustration from the range of authorities Bechtoldt cites: it is clear, for example, that the Manifest Anxiety Scale (Taylor, 1951) was not introduced as the exhaustive definition of the Hull-Spence theoretical construct D, but rather as a tentative operational representation of D, not excluding the representation of D by hours of food deprivation, etc. in other studies. In the initial presentation (Taylor, 1951) it was considered an empirically meaningful question (rather than a matter of definition) to ask whether the MAS might not be representing sHR instead of D. Spence (1958) has clearly said that had the experiments using the MAS been negative, the other portions of the theory were sufficiently well confirmed "that we would have had no hesitancy about abandoning the A-Scale [MAS] as being related to D in our theorem." Further, MAS scores have never been assumed to be solely a function of D; rather, some degree of impurity is conceded (e.g., Taylor, 1956, p. 303).

The general spirit of operationalism as endorsed by such varied persons as Margenau (1950, pp.
232-242; 1954), Feigl (1945), and Frank (1946) is certainly compatible with construct validity. Where the emphasis is upon test operations, distinguishing operations, operational verification, multiple operations (Frank, 1946), or convergent operations (Garner, 1954; Garner, Hake, & Eriksen, 1956), the compatibility is particularly clear.

5. Construct validity makes possible pseudo-validation of invalid tests. Many of us identified with structured measurement techniques and hardheaded validational procedures are still smarting from having lost to projective techniques in the late 1930's and early 1940's a battle that was never fought. As we see it, the structured personality tests provided, in the form of scores, specific predictions verifiable against other measures, such as ratings by psychologists and peers, performance in experimental situations, etc. The methodological commitments of the field made certain that these predictions were checked, resulting in a disappointing collection of validity data, disappointing not only because of numerous .00 correlations but also because .30s and .40s looked like failure against expectations in the .80s and .90s. This record of "failure" led to the wholesale supplanting of structured tests with projective tests without any transition studies utilizing both types in competitive prediction against the same independent measures. On their part, the projective tests were surrounded by an interpretive framework which evaded validation. The scientific evidence justifying the introduction of projectives was solely the evidence of the "failure" of the structured approaches and not in the least evidence of the superior validity of the projectives. This seemed a very unfair victory. Now belatedly projective tests are being checked in ways similar to those that invalidated the structured tests, and the evidence for projectives looks even worse. It would seem very undesirable if construct validity provided a rationalization for continued evasion of this evidence.

That construct validity could do so in some instances would be made possible by the joint application of several features. The presentation of construct validity emphasized the wide variety of validational evidence, without prescribing any particular type of evidence for all users. This makes possible a highly opportunistic selection of evidence and the editorial device of failing to mention the construct validity probes that were not confirmatory. When the multiplicity of possible evidences is combined with the small samples available in clinical studies, capitalizing on chance sampling variations from zero validity becomes a very real possibility. The multiplicity of extenuating circumstances known to the sensitive clinician in specific situations and for particular patients further dilutes the applicability of statistical tests. These possibilities should certainly be discouraged in any new edition of Technical Recommendations and, of course, are neither inherent in nor limited to construct validation procedures.

It is one of the valuable by-products of the rigid and "cookbook" character of the multitrait-multimethod matrix that it forces the investigator to specify in advance the correlations that will be validating if high, forces him to examine others which will be invalidating if high, and provides a setting for examining the comparative validity of techniques. Likewise, where a detailed and explicit nomological net is employed in the validational procedure, such evasion of invalidation seems unlikely. It is also unlikely where the test constructor commits himself in advance to the most appropriate independent data series for validational purposes and then attempts to predict this series both with his new test and with simple self-ratings (or other rival devices).

When the evasion of invalidation is being considered, it seems well to note the evasion made possible by the combination of plausible a priori considerations, "face validity," and the kind of operationalism which says "intelligence is what the Stanford-Binet (1916 edition) measures." Contrasted with this, the approach of construct validation forces the test designer into checking out the implicit and explicit claims by which he convinces others that his test is worth buying and using.

6. Construct validity encourages the reification of traits. Bechtoldt has ridiculed the proponents of construct validity on this score. Such ridicule is telling because we have all been taught to avoid naive reification, and we hate to appear unsophisticated. Were it not for such reasons of vanity, the simplest answer would be that test constructors and buyers are in fact prone to such reification. Such reification implies laws which can and should be sample-checked in the validation process. There is no better cure for such reification than entering the trait or the test in a multitrait-multimethod matrix. Even for the more successful tests, this is
a humbling experience, generating modesty and caution.

Reifying tendencies are not limited to the trait-reifying construct validators, however. Still more pernicious is the score-reifying which has accompanied the popular use of intelligence, achievement, and vocational interest tests, for example. We are all occasionally appalled at the literal interpretations and assumptions of immutable three-digit perfection with which some users regard these test scores. Even test constructors are on occasion childishly naive in assuming their test scores to measure perfectly what they intended when they wrote the items. Often they use a pseudo-sophisticated operationalism to disguise this: what the test measures is the trait. Intent and achievement are slurred.

In such settings, one feels the need for a methodological perspective which emphasizes the imperfection inherent in scores: the variable effects of guessing, the biases in personality tests imposed by idiosyncrasies in vocabulary, the impurities contributed by token compliance, the misreading of items, the clerical errors in answering, the response sets, the inevitable factorial complexity, and the like. A perspective which exhaustively defines constructs in terms of obtained test scores can only inconsistently or indirectly admit such imperfection. It is, however, an essential part of the critical-realist position that all measurement is, to some degree at least, imperfect; and this feature is one that strongly recommends it as describing the orientation of the scientist to scientific truth. Note that the critical-realist position is not entirely out of favor with sophisticated philosophers of science and might even claim a plurality of those in the logical positivist camp.

7. Construct validity leads to a confusion of "significant correlation" with "identity." There is in current psychological writing a frequent lapse into an implicit assumption of construct purity on the part of tests that have been found to be "valid" and an implicit assumption of complete intersubstitutability on the part of different operations which have in one setting been found "significantly correlated." Too frequently there are inference sequences of this order: A correlates .50 with B, B correlates .50 with C, therefore A equals C. Such sequences are employed even when, with a little more work, the relation of A to C could be directly examined, and even though two correlations of .50 are mathematically compatible with any correlation between A and C from -.50 to +1.00. Some users of the concept of construct validity have been guilty of this identification, but it is by no means limited to them, nor necessarily incurred as a result of their validity rationale. Some such thinking is involved whenever the implicit assumption is made that all of the correlates of a given test involve the same sources of systematic variance from within the inherently complex test. As a matter of fact, the emphasis upon the presence of construct-irrelevant sources of variance in test scores should be a major deterrent to such lapses.3

3 A further possible demurrer may be briefly discussed. Bechtoldt gives the following example (taken from Johnson, 1954) of faulty inference presumably typical of construct validity discussions: "If a person has an over-compensated inferiority complex, he blusters, is aggressive, domineering, and dogmatic; this man blusters, is aggressive, domineering, and dogmatic; therefore, he has an inferiority complex." As a logical syllogism this is, of course, invalid, but inductive science is well recognized by philosophers since Hume to lack logical validity in this sense. Bechtoldt does not make explicit enough to the casual reader that this logical invalidity is present in all efforts to relate theory to observations. For example, the theoretical validity of the MAS takes essentially this form: increase in D in the Hull-Spence behavior theory predicts more rapid learning in Conditions A, B, and C, and slower learning in Conditions D, E, and F; high (vs. low) MAS scorers show more rapid learning in Conditions A, B, and C, and slower learning in Conditions D, E, and F; therefore MAS is (may be tentatively considered to be) a measure of D. All theories in science relate to observations in this logically unjustified manner and, concomitantly, are held tentatively. Note, however, that the "invalid" syllogism can rule out many theories, in that it sets up requirements which many theory-data sets do not meet. These are more stringent the more specific and numerous the implications of the theory. The scientific (as opposed to logical) validity of a theory becomes a matter of, first, the number and rigor of such tests to which the theory has been exposed and which it has survived and, second, the number of available rival theories which as efficiently subsume the same complex of data. Bechtoldt is, of course, correct in pointing out that many actual uses of the phrase "construct validity" in the literature have been in settings in which the stringency of the requirements is so weak and the number of available plausible rival hypotheses so great that the degree of confirmation of theory is trivial. But if his presentation connotes that this weakness is specific to construct validity, or to response-inferred constructs, this connotation is in serious error. The advocates of construct validity take second place to none in their insistence upon greater rigor in testing theory and in their demand for vigilance in seeking out data series which will distinguish between rival interpretations. Critical-realist aspirations in science lead directly to this emphasis, as opposed to a possible nominalist complacency which says that all you have anyway are operations and correlations and all efforts