Quantitative Psychology 83rd Annual Meeting of the
Psychometric Society, New York, NY 2018
Springer Proceedings in Mathematics & Statistics
This book series features volumes composed of selected contributions from
workshops and conferences in all areas of current research in mathematics and
statistics, including operations research and optimization. In addition to an overall
evaluation of the interest, scientific quality, and timeliness of each proposal at the
hands of the publisher, individual contributions are all refereed to the high quality
standards of leading journals in the field. Thus, this series provides the research
community with well-edited, authoritative reports on developments in the most
exciting areas of mathematical and statistical research today.
More information about this series at http://www.springer.com/series/10533
Marie Wiberg · Steven Culpepper · Rianne Janssen · Jorge González · Dylan Molenaar
Editors
Quantitative Psychology
83rd Annual Meeting of the Psychometric
Society, New York, NY 2018
Editors

Marie Wiberg
Department of Statistics, Umeå School of Business, Economics and Statistics
Umeå University
Umeå, Sweden

Steven Culpepper
Department of Statistics
University of Illinois at Urbana-Champaign
Champaign, IL, USA

Rianne Janssen
Faculty of Psychology and Educational Sciences
KU Leuven
Leuven, Belgium

Jorge González
Facultad de Matematicas
Pontificia Universidad Catolica de Chile
Santiago, Chile

Dylan Molenaar
Department of Psychology
University of Amsterdam
Amsterdam, The Netherlands
ISSN 2194-1009 ISSN 2194-1017 (electronic)
Springer Proceedings in Mathematics & Statistics
ISBN 978-3-030-01309-7 ISBN 978-3-030-01310-3 (eBook)
https://doi.org/10.1007/978-3-030-01310-3
Mathematics Subject Classification (2010): 62P15, 62-06, 62H12, 62-07
© Springer Nature Switzerland AG 2019
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part
of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations,
recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission
or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar
methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this
publication does not imply, even in the absence of a specific statement, that such names are exempt from
the relevant protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the
authors or the editors give a warranty, expressed or implied, with respect to the material contained
herein or for any errors or omissions that may have been made. The publisher remains neutral with regard
to jurisdictional claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
Preface
This volume represents presentations given at the 83rd annual meeting of the
Psychometric Society, organized by Columbia University and held in New York,
USA, during July 9–13, 2018. The meeting attracted 505 participants, and 286
papers were presented, of which 81 were part of a symposium. There were 106
poster presentations, 3 pre-conference workshops, 4 keynote presentations, 3
invited presentations, 2 career award presentations, 3 state-of-the-art presentations,
1 dissertation award winner, and 18 symposia.
Since the 77th meeting in Lincoln, Nebraska, Springer has published the proceedings
volume from the annual meeting of the Psychometric Society to allow presenters to
make their ideas available quickly to the wider research community while the
chapters still undergo a thorough review process. The first six volumes, from the
meetings in Lincoln, Arnhem, Madison, Beijing, Asheville, and Zurich, were well
received, and we expect an equally successful reception of these proceedings.
We asked the authors to use their presentation at the meeting as the basis of their
chapters, possibly extended with new ideas or additional information. The result is a
selection of 38 state-of-the-art chapters addressing a diverse set of psychometric
topics, including item response theory, multistage adaptive testing, and cognitive
diagnostic models.
Umeå, Sweden Marie Wiberg
Urbana-Champaign, IL, USA Steven Culpepper
Leuven, Belgium Rianne Janssen
Santiago, Chile Jorge González
Amsterdam, The Netherlands Dylan Molenaar
Contents
Explanatory Item Response Theory Models: Impact on Validity
and Test Development? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Susan Embretson
A Taxonomy of Item Response Models in Psychometrika . . . . . . . . . . . 13
Seock-Ho Kim, Minho Kwak, Meina Bian, Zachary Feldberg,
Travis Henry, Juyeon Lee, Ibrahim Burak Olmez, Yawei Shen,
Yanyan Tan, Victoria Tanaka, Jue Wang, Jiajun Xu and Allan S. Cohen
NUTS for Mixture IRT Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
Rehab Al Hakmani and Yanyan Sheng
Controlling Acquiescence Bias with Multidimensional
IRT Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
Ricardo Primi, Nelson Hauck-Filho, Felipe Valentini, Daniel Santos
and Carl F. Falk
IRT Scales for Self-reported Test-Taking Motivation of Swedish
Students in International Surveys . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
Denise Reis Costa and Hanna Eklöf
A Modification of the IRT-Based Standard Setting Method . . . . . . . . . . 65
Pilar Rodríguez and Mario Luzardo
Model Selection for Monotonic Polynomial Item Response Models . . . . 75
Carl F. Falk
TestGardener: A Program for Optimal Scoring and Graphical
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
Juan Li, James O. Ramsay and Marie Wiberg
Item Selection Algorithms in Computerized Adaptive Test
Comparison Using Items Modeled with Nonparametric
Isotonic Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
Mario Luzardo
Utilizing Response Time in On-the-Fly Multistage
Adaptive Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Yang Du, Anqi Li and Hua-Hua Chang
Heuristic Assembly of a Classification Multistage Test
with Testlets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
Zhuoran Wang, Ying Li and Werner Wothke
Statistical Considerations for Subscore Reporting in Multistage
Testing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Yanming Jiang
Investigation of the Item Selection Methods in Variable-Length
CD-CAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
Ya-Hui Su
A Copula Model for Residual Dependency in DINA Model . . . . . . . . . . 145
Zhihui Fu, Ya-Hui Su and Jian Tao
A Cross-Disciplinary Look at Non-cognitive Assessments . . . . . . . . . . . 157
Vanessa R. Simmering, Lu Ou and Maria Bolsinova
An Attribute-Specific Item Discrimination Index in Cognitive
Diagnosis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
Lihong Song and Wenyi Wang
Assessing the Dimensionality of the Latent Attribute Space
in Cognitive Diagnosis Through Testing for Conditional
Independence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
Youn Seon Lim and Fritz Drasgow
Comparison of Three Unidimensional Approaches to Represent
a Two-Dimensional Latent Ability Space . . . . . . . . . . . . . . . . . . . . . . . . 195
Terry Ackerman, Ye Ma and Edward Ip
Comparison of Hyperpriors for Modeling the Intertrait Correlation
in a Multidimensional IRT Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
Meng-I Chang and Yanyan Sheng
On Extended Guttman Condition in High Dimensional Factor
Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
Kentaro Hayashi, Ke-Hai Yuan and Ge (Gabriella) Jiang
Equivalence Testing for Factor Invariance Assessment
with Categorical Indicators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 229
W. Holmes Finch and Brian F. French
Canonical Correlation Analysis with Missing Values: A Structural
Equation Modeling Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Zhenqiu (Laura) Lu
Small-Variance Priors Can Prevent Detecting Important
Misspecifications in Bayesian Confirmatory Factor Analysis . . . . . . . . . 255
Terrence D. Jorgensen, Mauricio Garnier-Villarreal,
Sunthud Pornprasertmanit and Jaehoon Lee
Measuring the Heterogeneity of Treatment Effects with Multilevel
Observational Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 265
Youmi Suk and Jee-Seon Kim
Specifying Multilevel Mixture Selection Models in Propensity
Score Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 279
Jee-Seon Kim and Youmi Suk
The Effect of Using Principal Components to Create
Plausible Values . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Tom Benton
Adopting the Multi-process Approach to Detect Differential
Item Functioning in Likert Scales . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 307
Kuan-Yu Jin, Yi-Jhen Wu and Hui-Fang Chen
Detection of Differential Item Functioning via the Credible
Intervals and Odds Ratios Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319
Ya-Hui Su and Henghsiu Tsai
Psychometric Properties of the Highest and the Super
Composite Scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 331
Dongmei Li
A New Equating Method Through Latent Variables . . . . . . . . . . . . . . . 343
Inés Varas, Jorge González and Fernando A. Quintana
Comparison of Two Item Preknowledge Detection Approaches
Using Response Time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355
Chunyan Liu
Identifying and Comparing Writing Process Patterns
Using Keystroke Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
Mo Zhang, Mengxiao Zhu, Paul Deane and Hongwen Guo
Modeling Examinee Heterogeneity in Discrete Option Multiple
Choice Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 383
Nana Kim, Daniel M. Bolt, James Wollack, Yiqin Pan, Carol Eckerly
and John Sowles
Simulation Study of Scoring Methods for Various
Multiple-Multiple-Choice Items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 393
Sayaka Arai and Hisao Miyano
Additive Trees for Fitting Three-Way (Multiple Source)
Proximity Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 403
Hans-Friedrich Köhn and Justin L. Kern
A Comparison of Ideal-Point and Dominance Response Processes
with a Trust in Science Thurstone Scale . . . . . . . . . . . . . . . . . . . . . . . . 415
Samuel Wilgus and Justin Travis
Rumor Scale Development . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 429
Joshua Chiroma Gandi
An Application of a Topic Model to Two Educational Assessments . . . . 449
Hye-Jeong Choi, Minho Kwak, Seohyun Kim, Jiawei Xiong,
Allan S. Cohen and Brian A. Bottge
Explanatory Item Response Theory
Models: Impact on Validity and Test
Development?
Susan Embretson
Georgia Institute of Technology, Atlanta, GA 30328, USA
e-mail: [email protected]
Abstract Many explanatory item response theory (IRT) models have been devel-
oped since Fischer’s (Acta Psychologica 37:359–374, 1973) linear logistic test model
was published. However, despite their applicability to typical test data, actual impact
on test development and validation has been limited. The purpose of this chapter is
to explicate the importance of explanatory IRT models in the context of a frame-
work that interrelates the five aspects of validity (Embretson in Educ Meas Issues
Pract 35, 6–22, 2016). In this framework, the response processes aspect of validity
impacts other aspects. Studies on a fluid intelligence test are presented to illustrate
the relevancy of explanatory IRT models to validity, as well as to test development.
Keywords Item response theory · Explanatory models · Validity
1 Introduction
Since Fischer (1973) introduced the linear logistic test model (LLTM), many addi-
tional explanatory IRT models have been developed to estimate the impact of item
complexity on item parameters. These models include the linear partial credit model
(LPCM; Fischer & Ponocny, 1995), the linear logistic test model with response error
term (LLTM-R; Janssen, Schepers, & Peres, 2004), the constrained two parame-
ter logistic model (2PL-Constrained; Embretson, 1999) and the Rasch facet model
(Linacre, 1989). Explanatory IRT models also can include covariates for both items
and persons, as well as within-person interactions (De Boeck & Wilson, 2004).
Several models can detect strategy differences between persons, such as mixture
distribution models (Rost, 1990; Rost & von Davier, 1995) and mixed models that
include response time to detect strategies (Molenaar & De Boeck, 2018). Further,
hierarchical models can be used in an explanatory fashion, such as item family models
(Glas, van der Linden & Geerlings, 2010) and a criterion-referenced model (Janssen,
Tuerlinckx, Meulder & De Boeck, 2000). Multidimensional IRT models with defined
dimensions, such as the bifactor MIRT (Reise, 2012) or the multicomponent latent
trait model (MLTM; Embretson, 1984, 1997) also can be used as explanatory IRT
models. The Handbook of Item Response Theory (van der Linden, 2016) includes
several explanatory models. Janssen (2016) notes that explanatory IRT models have
been applied to many tests, ranging from mathematics, reading and reasoning to
personality and emotions.
However, despite the existence of these models for several decades and their
applicability to typical test data, actual impact on test development and validation has
been limited. The purpose of this chapter is to highlight the importance of explanatory
IRT models in test development. Studies on the development of a fluid intelligence
test are presented to illustrate the use of explanatory IRT models in test design and
validation. Prior to presenting the studies, background on the validity concept and a
framework that unifies the various aspects are presented.
1.1 Test Validity Framework
In the current Standards for Educational and Psychological Testing (2014), validity
is conceptualized as a single type (construct validity) with five aspects. First, the
content aspect of construct validity is the representation of skills, knowledge and
attributes on the test. It is supported by specified test content, such as blueprints
that define item skills, knowledge or attribute representation, as well as specifica-
tions of test administration and scoring conditions. Second, the response processes
aspect of validity consists of evidence on the cognitive activities engaged in by the
examinees. These cognitive activities are assumed to be essential to the meaning of
the construct measured by a test. The Standards for Educational and Psychological
Testing describes several direct methods to observe examinees’ processing on test
items, such as eye-tracker movements, videos, and concurrent and retrospective verbal
reports/observations, as well as response times to items or the whole test. Third,
the internal structure aspect of construct validity includes psychometric properties
of a test as relevant to the intended construct. Thus, internal consistency reliability,
test dimensionality and differential item functioning (DIF) are appropriate types of
evidence. Item selection, as part of test design, has a direct impact on internal struc-
ture. Fourth, the relationship to other variables aspect concerns how the test relates
to other traits and criteria, as well as to examinee background variables (i.e., demo-
graphics, prior experience, etc.). Evidence relevant to this aspect should be consistent
with the goals of measurement. Fifth, the consequences aspect of validity concerns
how test use has adverse impact on different groups of examinees. While the test
may not have significant DIF, studies may nonetheless show that the test has adverse
impact if used for selection or placement. Adverse impact is particularly detrimental
to test quality if based on construct-irrelevant aspects of performance.
Fig. 1 Unified framework for validity
The various aspects of validity can be conceptualized as a unified system
with causal interrelationships (Embretson, 2017). Figure 1 organizes the
five aspects into two general areas, internal and external, which concern test
meaning and test significance, respectively. Thus, the content, response processes
and internal structure aspects are relevant to defining the meaning of the construct
while the relationships to other variables and consequences aspects define the sig-
nificance of the test. Notice that the content and response processes aspect drive
the other aspects causally in this framework. Importantly, these two aspects can be
manipulated in test development. That is, item design, test specifications and testing
conditions can impact test meaning. Thus, understanding the relationship between
test content and response processes can be crucial in test development to measure
the intended construct.
Unfortunately, the methods for understanding response processes described in
the Standards have substantial limitations. Both eye-tracker data and talk aloud data
are typically expensive to collect and analyze, and both may alter the nature of
examinees' processing. Further, unless elaborated in the context of a model, the
utility of response time data may be limited to identifying guessing or inappropriate
responses. Importantly, explanatory IRT modeling can be applied to standard test data
with no impact on examinees' responses. Further, such models permit hypotheses to
be tested about the nature of response processes through relationships of item content
features and item responses.
2 Explanatory IRT Models in Item Design: Examples
from ART
The Abstract Reasoning Test (ART) was developed in the context of research on
response processes. ART is a test of fluid intelligence used to predict learning and
performance in a variety of settings (Embretson, 2017). ART consists of matrix
completion items as shown in Fig. 2. In these items, the examinee must identify the
figure that completes the matrix based on the relationships between the figures across
the rows and down the columns.
Fig. 2 Example of an ART item
2.1 Theory of Response Processes on Matrix Problems
Consistent with Carpenter, Just, and Shell's (1990) theory, it was hypothesized
that examinees process the various elements individually in the matrix entries to
find relationships. According to the theory, processing complexity is driven by the
number of unique objects (as counted in the first entry) and memory load in finding
relationships. Memory load depends on both the number and types of relationships,
which are hypothesized to be ordered by complexity as follows: 1 = Constant in
a Row (or column), the same figure appears in a row; 2 = Pairwise Progressions,
figures change in the same way in each row; 3 = Figure Addition/Subtraction, the
third column results from overlaying the first and second columns and subtracting
common figures; 4 = Distribution of Three, a figure appears once and only once
in each row and column and 5 = Distribution of Two, one figure is systematically
missing in each row and column. Figure 2 illustrates relationships #1, #4 and #5
(see key on right) and Fig. 4 illustrates relationship #3. Relationship #2 could be
illustrated by a change in object size across rows. Carpenter et al. (1990) postulate
that these relationships are tried sequentially by examinees, such that Constant in a
Row is considered before Pairwise Progressions and so forth. Thus, the Memory Load
score is highest for the Distribution of Two relationships. Figure 2 shows numerical
impact on Memory Load for three types of relationships. The difficulty of solving
matrix problems also is hypothesized to depend on perceptual complexity, which
is determined by Distortion, Fusion or Integration of objects in an entry. Figure 2
has none of these sources of perceptual complexity while Fig. 4 illustrates object
integration in the matrix on the right side. Each matrix item can be scored for the
processing and perceptual complexity variables. Item difficulty is postulated to result
from these variables because they drive cognitive complexity.
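To make the scoring concrete, the sketch below (in Python) shows one way a matrix item could be turned into the predictor scores used in the later LLTM analyses. The function and field names are illustrative only, and treating memory load as the sum of the complexity ranks of the relationships involved is an assumption made for illustration; the chapter specifies only that memory load depends on the number and types of relationships.

# Illustrative scoring of an ART-style matrix item into the complexity
# predictors described above. Summing relationship complexity ranks to obtain
# memory load is an assumed rule, not the chapter's exact scoring procedure.

# Complexity rank of each relationship type (1 = least complex, 5 = most).
RELATION_RANK = {
    "constant_in_a_row": 1,
    "pairwise_progression": 2,
    "figure_addition_subtraction": 3,
    "distribution_of_three": 4,
    "distribution_of_two": 5,
}


def score_item(relations, n_unique_objects, distortion=False,
               fusion=False, integration=False):
    """Return the predictor scores (unique objects, memory load, and the
    three perceptual complexity flags) for one matrix item."""
    memory_load = sum(RELATION_RANK[r] for r in relations)  # assumed rule
    return {
        "unique_objects": n_unique_objects,
        "memory_load": memory_load,
        "distortion": int(distortion),
        "fusion": int(fusion),
        "integration": int(integration),
    }


# The item in Fig. 2 combines relationships #1, #4, and #5 and has no
# perceptual complexity sources; the unique-object count here is made up.
print(score_item(["constant_in_a_row", "distribution_of_three",
                  "distribution_of_two"], n_unique_objects=3))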
2.2 Explanatory Modeling of Response Processes on ART
Matrix Problems
An explanatory modeling of ART item difficulty results from applying LLTM to
item response data, using the scores for matrix problem complexity. LLTM is given
as follows:
$$
P(\theta) = \frac{\exp\left(\theta_j - \sum_{k} \tau_k q_{ik} + \tau_0\right)}{1 + \exp\left(\theta_j - \sum_{k} \tau_k q_{ik} + \tau_0\right)} \tag{1}
$$
where $q_{ik}$ is the score for item $i$ on attribute $k$, $\tau_k$ is the weight of attribute $k$ in item
difficulty, $\tau_0$ is an intercept, and $\theta_j$ is the ability of person $j$.
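A minimal numerical sketch of Eq. (1) in Python is given below; the attribute scores, weights, and intercept in the usage example are hypothetical and serve only to show how the pieces fit together.

import math

def lltm_probability(theta, q_item, tau, tau0):
    """Probability of a correct response under the LLTM in Eq. (1).

    theta  -- ability theta_j of the person
    q_item -- attribute scores q_ik of the item (one per attribute k)
    tau    -- attribute weights tau_k
    tau0   -- intercept tau_0
    """
    # Item difficulty enters as the weighted sum of the item's attribute scores.
    logit = theta - sum(t * q for t, q in zip(tau, q_item)) + tau0
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical item scored on five attributes (unique elements, memory load,
# integration, distortion, fusion) with hypothetical weights and intercept.
print(lltm_probability(theta=0.5,
                       q_item=[3, 7, 0, 0, 0],
                       tau=[0.20, 0.15, 0.45, 0.70, 0.30],
                       tau0=1.0))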
LLTM was applied to model item responses for ART items, scored for the two
predictors of processing complexity and the three predictors of perceptual complexity.
For example, a sample of 705 Air Force recruits was administered a form of ART
with 30 items. The delta statistic, a likelihood ratio index of fit (Embretson,
1999) that is similar in magnitude to a multiple correlation, indicated that LLTM had strong
fit to the data (Δ = .78). The processing complexity variables had the strongest impact,
especially memory load, which supports the theory.
2.3 Impact of Explanatory Modeling on Item Design
for Matrix Problems
These results and the scoring system had direct impact on item and test design for
ART. An automatic item generator was developed for ART items. Abstract structures
were specified to define the objects within each cell of the 3 × 3 display and the
response options. Types of relationships, as described above, specify the changes
in objects (e.g., circles, arrows, squares) and/or their properties (e.g., shading,
borders, distortion, size) across columns and rows. LLTM results on military
samples indicated high predictability of item difficulty by the generating structure
(Δ = .90) and continued prediction by the five variables defining cognitive complexity
(Δ = .79).
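The following is a purely hypothetical sketch of what an abstract structure specification for such a generator might look like. The chapter does not reproduce the actual ART generator, so the class and field names below are assumptions used only to illustrate the idea of separating a scored structure from the concrete items rendered from it.

# Hypothetical abstract structure for generating 3 x 3 matrix items.
from dataclasses import dataclass, field
from typing import List

@dataclass
class ItemStructure:
    relations: List[str]                 # relationship types governing rows/columns
    objects: List[str]                   # object families, e.g. circles, arrows, squares
    varied_properties: List[str] = field(default_factory=list)  # e.g. shading, size
    distortion: bool = False
    fusion: bool = False
    integration: bool = False

# One structure can be rendered into many concrete items by sampling objects
# and properties, while its scored complexity (and hence its LLTM-predicted
# difficulty) stays fixed.
example = ItemStructure(
    relations=["constant_in_a_row", "distribution_of_three", "distribution_of_two"],
    objects=["triangle", "hourglass", "house"],
    varied_properties=["shading"],
)
print(example)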
3 Strategy Modeling in Test Design: Example from ART
Examinee differences in item-solving strategies and their potential impact on the various
aspects of validity were examined in two studies. In Study 1, ART was administered
with the original brief instructions. In Study 2, ART was administered with an
expanded version of the instructions with examples of each type of relationship. In
both studies, strategies were examined through mixture modeling.
3.1 Mixture Modeling to Identify Latent Classes
The mixture Rasch model (Rost & von Davier, 1995) can be applied to identify
classes of examinees that vary in item difficulty ordering, which is postulated to
arise from applying different item solving strategies. The mixture Rasch model is
given as follows:
$$
P(\theta) = \sum_{g} \pi_g \, \frac{\exp\left(\theta_{jg} - \beta_{ig}\right)}{1 + \exp\left(\theta_{jg} - \beta_{ig}\right)} \tag{2}
$$
where $\beta_{ig}$ is the difficulty of item $i$ in class $g$, $\theta_{jg}$ is the ability of person $j$ in class $g$,
and $\pi_g$ is the probability of class $g$.
model fit. However, class interpretation can be examined by follow-up explanatory
modeling (e.g., applying LLTM within classes) or by comparing external correlates
of ability.
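A minimal sketch of the class-weighted (marginal) response probability in Eq. (2) is given below; the class proportions, abilities, and item difficulties in the example are hypothetical.

import math

def rasch_probability(theta, beta):
    """Rasch probability of a correct response for ability theta and difficulty beta."""
    return 1.0 / (1.0 + math.exp(-(theta - beta)))

def mixture_rasch_probability(thetas_by_class, betas_by_class, class_probs):
    """Marginal probability under the mixture Rasch model in Eq. (2): a mixture
    over latent classes g with weights pi_g, where both the person ability and
    the item difficulty are class-specific."""
    return sum(pi * rasch_probability(theta_g, beta_g)
               for pi, theta_g, beta_g
               in zip(class_probs, thetas_by_class, betas_by_class))

# Hypothetical two-class example in which the same item is much harder in Class 2.
print(mixture_rasch_probability(thetas_by_class=[1.1, -0.4],
                                betas_by_class=[0.0, 1.5],
                                class_probs=[0.69, 0.31]))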
3.2 Study 1
Method. A form of ART with 30 items was administered to 803 Air Force recruits
who were completing basic training. The ART instructions concerned the nature of
matrix problems as defined by relationships in the rows and columns of the 3 × 3
matrices. However, the scope of relationships that could be involved was not covered.
ART was administered without time limits. Item parameters were estimated with
the Rasch model and with the mixture Rasch model. In both cases the mean item
parameter was set to zero.
Results from other tests were available on the examinees, including the Armed
Services Vocational Aptitude Battery (ASVAB).
Results. The test had moderate difficulty for the sample based on raw scores
(M = 18.097, SD = 5.784) and latent trait estimates (M = .636, SD = 1.228).
Racial-ethnic comparisons were made between groups with N > 50. Group differences in latent trait
estimates were significant (F(2, 743) = 8.722, p < .001, η² = .023), with standardized
differences of d = .452 for African Americans and d = .136 for Hispanics relative
to Caucasians.
The mixture Rasch model was applied with varying numbers of classes. Table 1
shows that while the log likelihood index (−2lnL) decreased successively from one to
three classes, the Bayesian Information Criterion (BIC) increased for three classes.
Thus, the two-class solution, with 68.7% and 31.2% of examinees in Class 1 and
Class 2, respectively, was selected for further study. The latent trait means differed
significantly between classes (F(1, 801) = 439.195, p < .001), with Class 1 (M = 1.143,
SD = .984) scoring higher than Class 2 (M = −.413, SD = .865). Significant racial-ethnic
differences were observed between the classes (χ²(1, N = 695) = 12.958, p < .001),
with 75.0% of Caucasians and 57.3% of African Americans in Class 1.
Table 1 Mixture Rasch modeling results
Number of classes Parameters −2lnL BIC
Study 1
1 31 25,472 25,680
2 62 25,146 25,567
3 93 25,044 25,680
Study 2
1 33 13,222 13,423
2 67 13,068 13,477
3 101 13,001 13,616
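As a rough check on the values reported in Table 1, BIC can be computed from −2lnL, the number of parameters, and the sample size; the sketch below assumes the standard definition BIC = −2lnL + (parameters) × ln(N). Exact agreement with the printed values is not expected, since the reported −2lnL figures are rounded and the software's counts may differ slightly.

import math

def bic(minus_two_log_lik, n_parameters, n_persons):
    # Standard Bayesian Information Criterion.
    return minus_two_log_lik + n_parameters * math.log(n_persons)

# Study 1 (N = 803): one-, two-, and three-class solutions from Table 1.
for n_classes, k, m2ll in [(1, 31, 25472), (2, 62, 25146), (3, 93, 25044)]:
    print(n_classes, round(bic(m2ll, k, 803)))
# -2lnL keeps dropping as classes are added, but the ln(N) penalty on the extra
# parameters makes BIC rise again at three classes, which is why the two-class
# solution was retained.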
Table 2 LLTM weights, standard errors and t values by class
Complexity source Class 1 (df = 572, Δ = .820) Class 2 (df = 229, Δ = .809)
Weight SE t value Weight SE t value
Unique elements .1922 .0113 16.95* .2681 .0185 14.50*
Memory load .1851 .0049 37.49* .0926 .0077 12.09*
Integration .4543 .0454 10.00* .5502 .0622 8.85*
Distortion .7434 .0654 11.36* −.0121 .1054 −.12
Fusion .3150 .0508 6.20* .0549 .0723 .76
Intercept −4.1809 .1018 −41.08* −2.2618 .1285 −17.61*
*p < .01
LLTM was applied within each class to determine the relative impact of the
sources of cognitive complexity. While the overall prediction, as indicated by the Δ
statistic (Embretson, 1999) shown in Table 2, was strong for both classes, the
LLTM weights for cognitive complexity differed. Typically, the strongest predictor
is Memory Load; however, the weight for Memory Load was significantly higher in
Class 1. Unique Elements was the strongest predictor in Class 2 and two of three
perceptual complexity variables were not significant.
Item difficulty also was modeled by the sources of memory load from the five
types of relationships. It was found that the number of Figure-Addition relationships
was correlated negatively with item difficulty for Class 1 (r = −.211) and positively for Class 2 (r =
.216). Items with Figure-Addition relationships were mostly more difficult for Class 2 (see
Fig. 3).
Finally, ART trait estimates were correlated with four factors of ASVAB: Ver-
bal, Quantitative, Perceptual Speed and Technical Information. Although significant
positive correlations were found with all factors except Perceptual Speed for Class
1, no significant correlations with ASVAB factors were found for Class 2.
Fig. 3 Item difficulties by class
Discussion. Two classes of examinees, with varying patterns of item difficulty,
were identified on the ART for fluid intelligence. Class 2 was characterized by
substantially lower trait levels and lack of significant correlations with other aptitude
measures (i.e., ASVAB factors). Further, item difficulty was less predictable for Class
2 from the memory load associated with ART items. An analysis of the relationship
types that contribute to memory load indicated that items with Figure-Addition rela-
tionships had substantially higher difficulty in Class 2. A possible explanation is that
examinees in this class were unfamiliar with the Figure-Addition relationships and
applied the much harder Distribution of Two relationship. Figure 4 shows examples
of these relationships. Notice that the item on the left requires two Distribution of
Two relationships (i.e., changes in the hourglass and house figures), as well as a Con-
stant in a Row (triangles). The item on the right, however, can be solved by either
three Figure-Addition (column 3 is the subtraction of column 2 from column 1) or
three Distribution of Two relationships.