
IRT from SSI:

BILOG-MG MULTILOG
PARSCALE TESTFACT

Edited by Mathilda du Toit


BILOG-MG, MULTILOG, PARSCALE, and TESTFACT are trademarks of Scientific
Software International, Inc.

General notice: Other product names mentioned herein are used for identification
purposes only and may be trademarks of their respective companies.

IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT.

Copyright © 2003 by Scientific Software International, Inc.

All rights reserved. Printed in the United States of America.

No part of this publication may be reproduced or distributed, or stored in a database or
retrieval system, or transmitted, in any form or by any means, without the prior written
permission of the publisher.

Edited by Mathilda du Toit.

Cover by Clint Smith of Esse Group. Based on a design by Louis Sullivan for
elevator grillwork in the Chicago Stock Exchange (1893).

1 2 3 4 5 6 7 8 9 0 08 07 06 05 04 03

Published by:

Scientific Software International, Inc.
7383 North Lincoln Avenue, Suite 100
Lincolnwood, IL 60712-1704
Tel: +1.847.675.0720
Fax: +1.847.675.2140
URL: http://www.ssicentral.com

ISBN: 0-89498-053-X
Preface
Software for item analysis and test scoring has long been an important subset of the products
published by SSI. In this new volume, four of the IRT programs that have previously been
published separately have been brought together for the first time. The four programs—BILOG-
MG, MULTILOG, PARSCALE, and TESTFACT—have been ported to the Windows platform.
In the case of BILOG-MG and MULTILOG, analyses can be set up and performed interactively
via dialog boxes from within the program. Interfaces for TESTFACT and PARSCALE do not
presently include dialog boxes to build syntax interactively. All programs offer extensive on-line
help, and BILOG-MG, MULTILOG, and PARSCALE also include an IRT graphing program,
capable of producing quality graphics.
The programs
BILOG-MG, an extension of the BILOG program for the analysis of dichotomous data, was
written by Michele Zimowski (National Opinion Research Center, Chicago), Eiji Muraki
(Tohoku University, Japan), Robert Mislevy (Educational Testing Service), and Darrell Bock
(University of Chicago). This program can also perform multiple-group analysis, allowing the
user to study both differential item functioning (DIF) and item parameter drift (DRIFT). The
documentation for the program, which has been
incorporated into Chapters 2, 7, 8, and 10 of this volume, was written by Darrell Bock and
Michele Zimowski, while Eiji Muraki and Robert Mislevy made major contributions in terms of
programming.
MULTILOG, written by David Thissen (University of North Carolina, Chapel Hill), is designed
to facilitate the analysis and scoring of items with multiple alternatives and makes use of logistic
response models, such as Samejima’s (1969) model for graded responses, Bock’s (1972) model
for nominal (non-ordered) responses, and Thissen and Steinberg’s (1984) model for multiple-choice items.
Documentation by David Thissen has been included in Chapters 4, 7, 8, and 12 of this volume.
Eiji Muraki and Darrell Bock wrote PARSCALE, a program for the analysis and scoring of
rating-scale data. The program, which has proven to be a very flexible tool over the years, can
also perform multiple-group and DIF analysis. Documentation for PARSCALE, provided by Eiji
Muraki, is included in Chapters 3, 7, and 8.
The fourth program, TESTFACT, was written by Robert Wood (Pearn Kandola Downs, Oxford,
England). Other contributors to the program are Darrell Bock, Robert Gibbons (University of
Illinois, Chicago), Steven Schilling (University of Michigan), Eiji Muraki, and Douglas Wilson
(London, England). TESTFACT performs classical test scoring, item analysis, and item factor
analysis. Documentation provided by Robert Wood has been included in Chapters 5, 7, and 8.
About this book
This volume can be divided into two sections: a setup and reference guide, and an applications
guide.
The first section contains a description of data preparation and reference guides for each of the
four programs. It also provides descriptions of the user interfaces (where applicable) and the
IRT graphing program.
Chapter 1, dealing with the preparation of data for use in the programs, was written by Leo Stam,
SSI’s president and IRT consultant. Chapters 2, 3, 4, and 5 provide reference guides to both
syntax and interface for BILOG-MG, PARSCALE, MULTILOG, and TESTFACT, respectively.
Chapter 6 deals with a new feature common to BILOG-MG, MULTILOG, and PARSCALE: the
new graphics module. This module can plot item characteristic curves, item and test information
curves, and a matrix plot that displays all item characteristic curves simultaneously. An option
to obtain a histogram of the estimated abilities has also been included.
The final two chapters of the first section describe the various models that may be fitted in
each program (Chapter 7), while Chapter 8 discusses the methods of estimation and their
implementation in each of the applications.
The applications guide, covering Chapters 9 to 13, starts with an overview of item response
theory and its current applications by Professor Darrell Bock, the cofounder and former
president of SSI and one of the main authors of the IRT software. Chapters 10 to 13 provide
annotated examples for the four programs. These chapters are meant as an aid both to setting up
command files and to interpreting the results obtained from IRT analyses. Each example gives a
description of the research problem as well as the program keywords used in the syntax file for
the analysis. I have also revised and, in a number of cases, added to the annotation of key
sections of the output files produced by each program.
Appendix A contains a paper by Darrell Bock, A brief history of item response theory. This
paper, which first appeared in Educational Measurement: Issues and Practice, has been reprinted
here with the kind permission of the journal editors and provides a fascinating overview of the
development of IRT to date.
Using the CD / Installing the programs
The software CD contains four IRT programs. Each one can be installed separately, and in each
case complete on-line help is provided. SSI provides technical support for all registered users of
its software, and it is recommended that the registration card, included in each shipment, be
returned to SSI for this purpose.
If the installation process does not begin automatically, locate and run setup.exe from the root
directory of your computer’s CD drive. Each of the IRT programs has a unique serial number
that appears on the CD jacket and/or shipment invoice; these should be retained for your records.
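
For example, assuming the CD drive has been assigned the drive letter D: (the actual letter varies from computer to computer), the installer may be started manually by selecting Run from the Windows Start menu and entering:

D:\setup.exe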
Although provision is made for a custom installation, the typical installation is recommended.
This installation includes the program files, the online help, and a subfolder with all the examples
discussed in the help file and in this volume. The default installation folder can be changed to
suit the user’s needs. The readme.txt and/or readme.wri files contain instructions on how to
create a desktop icon and shortcut for each program.
In addition to the IRT programs, the CD contains the most recent student editions of the LISREL
and HLM programs, which are also published by SSI. Other resources include this volume (in
PDF format) and a copy of Adobe Systems’ Acrobat® Reader®.
Acknowledgements
Invaluable contributions from Darrell Bock and David Thissen made this project possible. The
daunting task of porting the IRT programs to Windows and designing the new dialog boxes was
undertaken by Shyan Lam. All data sets and examples were carefully revised by Leo Stam.
Debugging the programs and writing the graphics module were the responsibilities of Stephen du
Toit, whose untiring work and support went a long way toward making this volume a reality.
Bola King and Gerhard Mels spent weeks patiently working through all the documentation,
proofreading and offering suggestions on how this volume could be made more consistent in
style and more useful to the user of IRT programs. Without the assistance of all of these people,
this volume would never have been anything more than a good idea. Lastly, I must mention that
a venture of this magnitude is bound to be imperfect; I accept responsibility for any errors or
omissions in this volume and look forward to constructive criticism that will make the next
version even better.

– Mathilda du Toit
Table of Contents
1 DATA PREPARATION ..........................................................................................................16

2 BILOG-MG...............................................................................................................................24

2.1 NEW FEATURES IN BILOG-MG ...........................................................................................24

2.2 PHASES OF THE ANALYSIS: INPUT, CALIBRATION AND SCORING ...........................................26

2.3 THE BILOG-MG INTERFACE ...............................................................................................37

2.3.1 File menu................................................................................................................................................... 38

2.3.2 Edit menu .................................................................................................................................................. 40

2.3.3 Setup menu............................................................................................................................................... 40

2.3.4 Data menu ................................................................................................................................................. 54

2.3.5 Technical menu ........................................................................................................................................ 64

2.3.6 Save menu ................................................................................................................................................ 80

2.3.7 Run menu.................................................................................................................................................. 81

2.3.8 Output menu ............................................................................................................................................. 82

2.3.9 View menu................................................................................................................................................. 82

2.3.10 Options menu ......................................................................................................................................... 83

2.3.11 Window menu ......................................................................................................................................... 85

2.3.12 Help menu ............................................................................................................................................... 85

2.3.13 Location of keywords in interface ........................................................................................................ 85

2.4 GETTING STARTED WITH BILOG-MG..................................................................................91

2.4.1 A first model: 2PL model for spelling data ............................................................................................ 92

2.4.2 A second model: DIF model for spelling data ..................................................................................... 100

2.5 SYNTAX..............................................................................................................................108

2.5.1 Data structures: ITEMS, TEST, GROUP and FORM commands ........................................................ 108

2.6 USING THE COMMAND LANGUAGE ......................................................................................113

2.6.1 Overview of syntax................................................................................................................................. 113

2.6.2 Order of commands ............................................................................................................................... 114

2.6.3 CALIB command .................................................................................................................................... 116

2.6.4 COMMENT command............................................................................................................................. 141

2.6.5 DRIFT command..................................................................................................................................... 142

2.6.6 FORM command..................................................................................................................................... 144

2.6.7 GLOBAL command ................................................................................................................................ 147

2.6.8 GROUP command .................................................................................................................................. 159

2.6.9 INPUT command..................................................................................................................................... 163

2.6.10 ITEMS command .................................................................................................................................. 182

2.6.11 LENGTH command............................................................................................................................... 185

2.6.12 PRIORS command ............................................................................................................................... 187

2.6.13 QUAD command................................................................................................................................... 193

2.6.14 QUADS command ................................................................................................................................ 196

2.6.15 SAVE command.................................................................................................................................... 199

2.6.16 SCORE command................................................................................................................................. 208

2.6.17 TEST command .................................................................................................................................... 224

2.6.18 TITLE command ................................................................................................................................... 234

2.6.19 Variable format statement ................................................................................................................... 235

2.6.20 Input and output files........................................................................................................................... 241

3 PARSCALE ............................................................................................................................257

3.1 THE PARSCALE INTERFACE .............................................................................................258

3.1.1 Main menu............................................................................................................................................... 258

3.1.2 Workspace .............................................................................................................................................. 259

3.1.3 Run menu................................................................................................................................................ 259

3.1.4 Output menu ........................................................................................................................................... 260

3.1.5 Font option.............................................................................................................................................. 260

3.1.6 Window menu ......................................................................................................................................... 261

3.2 COMMAND SYNTAX ............................................................................................................261

3.2.1 Order of commands ............................................................................................................................... 262

3.2.2 BLOCK command................................................................................................................................... 265

3.2.3 CALIB command .................................................................................................................................... 274

3.2.4 COMBINE command .............................................................................................................................. 285

3.2.5 COMMENT command............................................................................................................................. 287

3.2.6 FILES command ..................................................................................................................................... 288

3.2.7 INPUT command..................................................................................................................................... 292

3.2.8 MGROUP command ............................................................................................................................... 300

3.2.9 MRATER command ................................................................................................................................ 303

3.2.10 PRIORS command ............................................................................................................................... 305

3.2.11 QUADP command ................................................................................................................................ 308

3.2.12 QUADS command ................................................................................................................................ 310

3.2.13 SAVE command.................................................................................................................................... 312

3.2.14 SCORE command................................................................................................................................. 316

3.2.15 TEST/SCALE command....................................................................................................................... 325

3.2.16 TITLE command ................................................................................................................................... 330

3.2.17 Variable format statements ................................................................................................................. 331

3.3 INPUT FILES ........................................................................................................................333

3.3.1 Specification of input files .................................................................................................................... 333

3.3.2 Individual level data ............................................................................................................................... 333

3.3.3 Group-level data ..................................................................................................................................... 335

3.3.4 Key files................................................................................................................................................... 336

3.4 OUTPUT FILES.....................................................................................................................337

3.4.1 Format of output files ............................................................................................................................ 337

3.4.2 Combined score file ............................................................................................................................... 337

3.4.3 Fit statistics file ...................................................................................................................................... 338

3.4.4 Item parameter file ................................................................................................................................. 340

3.4.5 Item information file ............................................................................................................................... 342

3.4.6 Subject scores file.................................................................................................................................. 343

4 MULTILOG ...........................................................................................................................345

4.1 THE MULTILOG USER’S INTERFACE ................................................................................345

4.1.1 Main menu............................................................................................................................................... 346

4.1.2 Run menu................................................................................................................................................ 346

4.1.3 Output menu ........................................................................................................................................... 347

4.1.4 Window menu ......................................................................................................................................... 347

4.1.5 Font option.............................................................................................................................................. 348

4.2 CREATING SYNTAX USING THE MULTILOG SYNTAX WIZARD ..........................................349

4.2.1 New Analysis dialog box ....................................................................................................................... 349

4.2.2 Fixed Theta dialog box .......................................................................................................................... 350

4.2.3 Input Data dialog box............................................................................................................................. 351

4.2.4 Input Parameters dialog box................................................................................................................. 352

4.2.5 Test Model dialog box ........................................................................................................................... 354

4.2.6 Response Codes (Binary Data) dialog box.......................................................................................... 355

4.2.7 Response Codes (Non-Binary Data) dialog box ................................................................................. 356

4.3 GETTING STARTED WITH MULTILOG...............................................................................357

4.3.1 Two-parameter model for the skeletal maturity data.......................................................................... 357

4.3.2 Three-parameter (and guessing) model for the LSAT6 data.............................................................. 364

4.3.3 Generating syntax for a fixed-θ model ............................................................... 370

4.4 COMMAND SYNTAX ............................................................................................................375

4.4.1 Overview of syntax................................................................................................................................. 375

4.4.2 END command........................................................................................................................................ 378

4.4.3 EQUAL command................................................................................................................................... 379

4.4.4 ESTIMATE command ............................................................................................................................. 382

4.4.5 FIX command.......................................................................................................................................... 385

4.4.6 LABELS command ................................................................................................................................. 387

4.4.7 PROBLEM command ............................................................................................................................. 388

4.4.8 PRIORS command.................................................................................................................................. 393

4.4.9 SAVE command...................................................................................................................................... 395

4.4.10 START command ................................................................................................................................. 396

4.4.11 TEST command .................................................................................................................................... 398

4.4.12 TGROUPS command ........................................................................................................................... 401

4.4.13 TMATRIX command ............................................................................................................................. 403

4.4.14 Variable format statement ................................................................................................................... 405

5 TESTFACT.............................................................................................................................410

5.1 INTRODUCTION ...................................................................................................................410

5.2 THE TESTFACT INTERFACE .............................................................................................411

5.2.1 Main menu............................................................................................................................................... 411

5.2.2 Run menu................................................................................................................................................ 412

5.2.3 Output menu ........................................................................................................................................... 412

5.2.4 Window menu ......................................................................................................................................... 412

5.2.5 Font option.............................................................................................................................................. 412

5.3 COMMAND SYNTAX ............................................................................................................413

5.3.1 Order of commands ............................................................................................................................... 414

5.3.2 Overview of syntax................................................................................................................................. 415

5.3.3 BIFACTOR command............................................................................................................................. 418

5.3.4 CLASS command ................................................................................................................................... 424

5.3.5 COMMENT command............................................................................................................................. 426

5.3.6 CONTINUE command............................................................................................................................. 427

5.3.7 CRITERION command............................................................................................................................ 428

5.3.8 EXTERNAL command ............................................................................................................................ 430

5.3.9 FACTOR command ................................................................................................................................ 431

5.3.10 FRACTILES command ......................................................................................................................... 435

5.3.11 FULL command .................................................................................................................................... 437

5.3.12 INPUT command................................................................................................................................... 441

5.3.13 KEY command ...................................................................................................................................... 448

5.3.14 NAMES command ................................................................................................................................ 449

5.3.15 PLOT command.................................................................................................................................... 450

5.3.16 PRIOR command .................................................................................................................................. 452

5.3.17 PROBLEM command ........................................................................................................................... 454

5.3.18 RELIABILITY command ....................................................................................................................... 459

5.3.19 RESPONSE command ......................................................................................................................... 460

5.3.20 SAVE command.................................................................................................................................... 461

5.3.21 SCORE command................................................................................................................................. 474

5.3.22 SELECT command ............................................................................................................................... 480

5.3.23 SIMULATE command ........................................................................................................................... 482

5.3.24 STOP command.................................................................................................................................... 488

5.3.25 SUBTEST command............................................................................................................................. 489

5.3.26 TECHNICAL command......................................................................................................................... 491

5.3.27 TETRACHORIC command ................................................................................................................... 499

5.3.28 TITLE command ................................................................................................................................... 502

5.3.29 Variable format statement ................................................................................................................... 502

6 IRT GRAPHICS.....................................................................................................................505

6.1 INTRODUCTION ...................................................................................................................505

6.2 MAIN MENU ........................................................................................................................505

6.2.1 The ICC option........................................................................................................................................ 506

6.2.2 The Information option .......................................................................................................................... 507

6.2.3 The ICC and Info option......................................................................................................................... 508

6.2.4 The Total Info option.............................................................................................................................. 509

6.2.5 Matrix Plot option ................................................................................................................................... 510

6.2.6 The Histogram option ............................................................................................................................ 512

6.2.7 The Bivariate Plot option ....................................................................................................................... 513

6.2.8 The Exit option ....................................................................................................................................... 514

6.3 MANIPULATING AND MODIFYING GRAPHS ..........................................................................514

6.3.1 File menu................................................................................................................................................. 514

6.3.2 Edit menu ................................................................................................................................................ 515

6.3.3 Options menu ......................................................................................................................................... 515

6.3.4 Graphs menu .......................................................................................................................................... 516

6.3.5 Axis Labels dialog box .......................................................................................................................... 517

6.3.6 Bar Graph Parameters dialog box ........................................................................................................ 518

6.3.7 Legend Parameters dialog box............................................................................................................. 520

6.3.8 Line Parameters dialog box .................................................................................................................. 521

6.3.9 Plot Parameters dialog box ................................................................................................................... 522

6.3.10 Text Parameters dialog box ................................................................................................................ 522

6.4 ITEM CHARACTERISTIC CURVES ..........................................................................................523

6.5 ITEM INFORMATION CURVES ...............................................................................................524

6.6 TEST INFORMATION CURVES ...............................................................................................526

7 OVERVIEW AND MODELS ...............................................................................................528

7.1 OVERVIEW OF IRT PROGRAMS ...........................................................................................528

7.1.1 BILOG-MG ............................................................................................................................................... 528

7.1.2 PARSCALE.............................................................................................................................................. 528

7.1.3 MULTILOG............................................................................................................................................... 529

7.1.4 TESTFACT............................................................................................................................................... 529

7.2 MODELS IN BILOG-MG ....................................................................................................530

7.2.1 Introduction ............................................................................................................................................ 530

7.2.2 Multiple-group analyses ........................................................................................................................ 531

7.2.3 Technical details .................................................................................................................................... 538

7.2.4 Statistical tests ....................................................................................................................................... 543

7.3 MODELS IN PARSCALE....................................................................................................544

7.3.1 Introduction ............................................................................................................................................ 544

7.3.2 Samejima’s graded response model.................................................................................................... 546

7.3.3 Masters’ partial credit model ................................................................................................................ 550

7.3.4 Scoring function of generalized partial credit model ......................................................................... 557

7.3.5 Multiple-group polytomous item response models............................................................................ 560

7.3.6 Constraints for group parameters........................................................................................................ 560

7.3.7 Test of goodness-of-fit .......................................................................................................................... 561

7.3.8 Initial parameter estimates.................................................................................................................... 562

7.4 MODELS IN MULTILOG ...................................................................................................567

7.4.1 Introduction ............................................................................................................................................ 567

7.4.2 The graded model .................................................................................................................................. 567

7.4.3 The one- and two-parameter logistic models...................................................................................... 567

7.4.4 The multiple response model ............................................................................................................... 568

7.4.5 The multiple-choice model .................................................................................................................... 569

7.4.6 The three-parameter logistic model ..................................................................................................... 569

7.4.7 The nominal model................................................................................................................................. 570

7.4.8 Contrasts................................................................................................................................................. 570

7.4.9 Equality constraints and fixed parameters.......................................................................................... 575

7.5 OPTIONS AND STATISTICS IN TESTFACT ..........................................................................575

7.5.1 Introduction ............................................................................................................................................ 575

7.5.2 Classical item analysis and test scoring ............................................................................................. 575

7.5.3 Classical descriptive statistics ............................................................................................................. 576

7.5.4 Item statistics ......................................................................................................................................... 577

7.5.5 Fractile tables ......................................................................................................................................... 580

7.5.6 Plots......................................................................................................................................................... 582

7.5.7 Correction for guessing ........................................................................................................................ 582

7.5.8 Internal consistency .............................................................................................................................. 582

7.5.9 Tetrachoric correlations and factor analysis ...................................................................................... 583

7.5.10 IRT based item factor analysis ........................................................................................................... 584

7.5.11 Full information factor analysis.......................................................................................................... 585

7.5.12 Bifactor analysis................................................................................................................................... 586

7.5.13 Not-reached items in factor analysis ................................................................................................. 586

7.5.14 Constraints on item parameter estimates ......................................................................................... 586

7.5.15 Statistical test of the number of factors ............................................................................................ 587

7.5.16 Factor scores........................................................................................................................................ 588

7.5.17 Number of quadrature points.............................................................................................................. 589

7.5.18 Monte Carlo integration ....................................................................................................................... 591

7.5.19 Applications.......................................................................................................................................... 591

8 ESTIMATION ........................................................................................................................592

8.1 INTRODUCTION ...................................................................................................................592

8.1.1 Trait estimation with Item Response Theory....................................................................................... 593

8.1.2 Information.............................................................................................................................................. 597

8.2 ESTIMATION IN BILOG-MG ..............................................................................................599

8.2.1 Item calibration....................................................................................................................................... 599

8.2.2 Test scoring ............................................................................................................................................ 605

8.2.3 Test and item information ..................................................................................................................... 608

8.2.4 Effects of guessing ................................................................................................................................ 610

8.2.5 Aggregate-level IRT models .................................................................................................................. 610

8.3 ESTIMATION IN PARSCALE..............................................................................................611

8.3.1 Prior densities for item parameters...................................................................................................... 612

8.3.2 Rescaling the parameters ..................................................................................................................... 612

8.3.3 The information function ....................................................................................................................... 613

8.3.4 Warm’s weighted ML estimation of ability parameters ...................................................................... 615

8.4 ESTIMATION IN MULTILOG .............................................................................................616

8.4.1 Item parameter estimation .................................................................................................................... 616

9 USES OF ITEM RESPONSE THEORY .............................................................................618

9.1 INTRODUCTION ...................................................................................................................618

9.2 SELECTION TESTING ...........................................................................................................618

9.3 QUALIFICATION TESTING ....................................................................................................619

9.4 PROGRAM EVALUATION AND ASSESSMENT TESTING ...........................................................619

9.5 CLINICAL TESTING ..............................................................................................................619

9.6 MEASUREMENT METHODS AND RESEARCH .........................................................................620

9.7 APPROACHES TO ANALYSIS OF ITEM RESPONSE DATA .........................................................620

9.7.1 Test scoring ............................................................................................................................................ 621

9.7.2 Test generalizability ............................................................................................................................... 622

9.7.3 Item analysis........................................................................................................................................... 623

9.7.4 Estimating the population distribution ................................................................................................ 625

9.7.5 Differential item functioning ................................................................................................................. 626

9.7.6 Forms equating ...................................................................................................................................... 626

9.7.7 Vertical equating .................................................................................................................................... 627

9.7.8 Construct definition ............................................................................................................................... 629

9.7.9 Analysis and scoring of rated responses............................................................................................ 629

9.7.10 Matrix sampling .................................................................................................................................... 630

9.7.11 Estimating domain scores .................................................................................................................. 631

9.7.12 Adaptive testing ................................................................................................................................... 632

10 BILOG-MG EXAMPLES ...................................................................................................634

10.1 CONVENTIONAL SINGLE-GROUP IRT ANALYSIS ...............................................................634

10.2 DIFFERENTIAL ITEM FUNCTIONING ...................................................................................638

10.3 DIFFERENTIAL ITEM FUNCTIONING ...................................................................................650

10.4 EQUIVALENT GROUPS EQUATING ......................................................................................652

10.5 VERTICAL EQUATING........................................................................................................658

10.6 MULTIPLE MATRIX SAMPLING DATA .................................................................................666

10.7 ANALYSIS OF VARIANT ITEMS...........................................................................................670

10.8 GROUP-WISE ADAPTIVE TESTING ......................................................................................674

10.9 TWO-STAGE SPELLING TEST..............................................................................................679

10.10 ESTIMATING AND SCORING TESTS OF INCREASING LENGTH ............................................685

10.11 COMMANDS FOR PARALLEL-FORM CORRELATIONS .........................................................685

10.12 EAP SCORING OF THE NAEP FORMS AND STATE MAIN AND VARIANT TESTS..................686

10.13 DOMAIN SCORES.............................................................................................................688

11 PARSCALE EXAMPLES ...................................................................................................692

11.1 ITEM CALIBRATION AND EXAMINEE BAYES SCORING WITH THE RATING-SCALE GRADED MODEL .................692

11.2 EXAMINEE MAXIMUM LIKELIHOOD SCORING FROM EXISTING PARAMETERS .....................708

11.3 CALIBRATION AND SCORING WITH THE GENERALIZED PARTIAL CREDIT RATING-SCALE MODEL: COLLAPSING OF CATEGORIES .................709

11.4 TWO-GROUP DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS WITH THE PARTIAL CREDIT MODEL .................710

11.5 A TEST WITH 26 MULTIPLE-CHOICE ITEMS AND ONE 4-CATEGORY ITEM: THREE-PARAMETER LOGISTIC AND GENERALIZED PARTIAL CREDIT MODEL .................720

11.6 ANALYSIS OF THREE TESTS CONTAINING ITEMS WITH TWO AND THREE CATEGORIES: CALCULATION OF COMBINED SCORES .................722

11.7 RATER-EFFECT MODEL: MULTI-RECORD INPUT FORMAT WITH VARYING NUMBERS OF RATERS PER EXAMINEE .................723

11.8 RATER-EFFECT MODEL: ONE-RECORD INPUT FORMAT WITH SAME NUMBER OF RATERS PER EXAMINEE .................727

11.9 RATER-EFFECT MODEL: ONE-RECORD INPUT FORMAT WITH VARYING NUMBERS OF RECORDS PER EXAMINEE .................728

12 MULTILOG EXAMPLES ..................................................................................................730

12.1 ONE-PARAMETER LOGISTIC MODEL FOR A FIVE-ITEM BINARY-SCORED TEST (LSAT6).....730

12.2 TWO-PARAMETER MODEL FOR THE FIVE-ITEM TEST..........................................................732

12.3 THREE-PARAMETER (AND GUESSING) MODEL FOR THE FIVE-ITEM TEST ............................733

12.4 THREE-CATEGORY GRADED LOGISTIC MODEL FOR A TWO-ITEM QUESTIONNAIRE .............735

12.5 THREE-CATEGORY PARTIAL CREDIT MODEL FOR THE TWO-ITEM QUESTIONNAIRE ............738

12.6 FOUR-CATEGORY GRADED MODEL FOR A TWO-ITEM INTERVIEW SCALE ...........................740

12.7 A GRADED MODEL ANALYSIS OF ITEM-WORDING EFFECT ON RESPONSES TO AN OPINION SURVEY .................741

12.8 GRADED-MODEL SCORES FOR INDIVIDUAL RESPONDENTS ................................................748

12.9 FIVE-CATEGORY RATINGS OF AUDIOGENIC SEIZURES IN MICE IN FOUR EXPERIMENTAL CONDITIONS .................749

12.10 A NOMINAL MODEL FOR RESPONSES TO MULTIPLE-CHOICE ALTERNATIVES ....................751

12.11 A CONSTRAINED NONLINEAR MODEL FOR MULTIPLE-CHOICE ALTERNATIVES.................757

12.12 A NOMINAL MODEL FOR TESTLETS .................................................................................759

12.13 A CONSTRAINED NOMINAL MODEL FOR QUESTIONNAIRE ITEMS .....................................761

12.14 A CONSTRAINED GENERALIZED PARTIAL CREDIT MODEL................................................762

12.15 A MIXED NOMINAL AND GRADED MODEL FOR SELF-REPORT INVENTORY ITEMS .............765

12.16 A MIXED THREE-PARAMETER LOGISTIC AND PARTIAL CREDIT MODEL FOR A 26-ITEM TEST .................767

12.17 EQUIVALENT GROUPS EQUATING OF TWO FORMS OF A FOUR-ITEM PERSONALITY INVENTORY .................768

12.18 DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS OF EIGHT ITEMS FROM THE 100-ITEM SPELLING TEST .................770

12.19 INDIVIDUAL SCORES FOR A SKELETAL MATURITY SCALE BASED ON GRADED RATINGS OF OSSIFICATION SITES IN THE KNEE .................772

13 TESTFACT EXAMPLES....................................................................................................775

13.1 CLASSICAL ITEM ANALYSIS AND SCORING ON A GEOGRAPHY TEST WITH AN EXTERNAL CRITERION .................775

13.2 TWO-FACTOR NON-ADAPTIVE FULL INFORMATION FACTOR ANALYSIS ON A FIVE-ITEM TEST (LSAT7) .................778

13.3 ONE-FACTOR NON-ADAPTIVE FULL INFORMATION ITEM FACTOR ANALYSIS OF THE FIVE-ITEM TEST .................780

13.4 A THREE-FACTOR ADAPTIVE ITEM FACTOR ANALYSIS WITH BAYES (EAP) ESTIMATION OF FACTOR SCORES: 32 ITEMS FROM AN ACTIVITY SURVEY .................780

13.4.1 Discussion of output............................................................................................ 782

13.5 ADAPTIVE ITEM FACTOR ANALYSIS AND BAYES MODAL (MAP) FACTOR SCORE ESTIMATION FOR THE ACTIVITY SURVEY .................802

13.6 SIX-FACTOR ANALYSIS OF THE ACTIVITY SURVEY BY MONTE CARLO FULL INFORMATION ANALYSIS .................803

13.7 ITEM BIFACTOR ANALYSIS OF A 12TH-GRADE SCIENCE ASSESSMENT TEST........................804

13.7.1 Discussion of bifactor analysis output .............................................................. 805

13.8 CONVENTIONAL THREE-FACTOR ANALYSIS OF THE 12TH-GRADE SCIENCE ASSESSMENT TEST .................814

13.9 COMPUTING EXAMINEE GENERAL FACTOR SCORES FROM PARAMETERS OF A PREVIOUS BIFACTOR ANALYSIS .................815

13.10 ONE-FACTOR ANALYSIS OF THE 12TH-GRADE SCIENCE ASSESSMENT TEST .....................817

13.11 ITEM FACTOR ANALYSIS OF A USER-SUPPLIED CORRELATION MATRIX ............................818

13.12 SIMULATING EXAMINEE RESPONSES TO A THREE-FACTOR TEST WITH USER-SUPPLIED PARAMETERS .................819

13.13 SIMULATING EXAMINEE RESPONSES IN THE PRESENCE OF GUESSING AND NON-ZERO FACTOR MEANS .................820

13.14 THREE-FACTOR ANALYSIS WITH PROMAX ROTATION: 32 ITEMS FROM THE SCIENCE ASSESSMENT TEST .................823

13.15 PRINCIPAL FACTOR SOLUTION OF A FACTOR ANALYSIS ON SIMULATED DATA: NO GUESSING .................825

13.16 NON-ADAPTIVE FACTOR ANALYSIS OF SIMULATED DATA: PRINCIPAL FACTOR SOLUTION, NO GUESSING .................826

13.17 ADAPTIVE ITEM FACTOR ANALYSIS OF 25 SPELLING ITEMS FROM THE 100-ITEM SPELLING TEST .................827

13.18 CLASSICAL ITEM FACTOR ANALYSIS OF SPELLING DATA FROM A TETRACHORIC CORRELATION MATRIX .................828

14
14 APPENDIX A: A BRIEF HISTORY OF ITEM RESPONSE THEORY .......................830

14.1 ANTECEDENTS ..................................................................................................................830

14.2 CONNECTIONS ..................................................................................................................833

14.3 IRT TEST SCORING ...........................................................................................................836

14.4 IRT ITEM ANALYSIS .........................................................................................................840

14.5 CURRENT TRENDS.............................................................................................................844

15 REFERENCES .....................................................................................................................848


1 Data preparation¹

1.1 Data characteristics: What kind of data can I use?

The only type of data that the IRT programs currently can handle is fixed format with one or
more lines per record (case) and one-character response codes. Fixed format means that the
variables occupy the same column positions throughout the data file. The only acceptable values
in such a data file are the upper- and lowercase characters a through z, the digits 0 through 9,
and special characters such as +-.*&. Tab characters (^t) and other control characters that are
usually embedded in files from word processing (e.g., doc), database (e.g., dbf), spreadsheet
(e.g., xls), and statistical applications (e.g., sav) are not acceptable; data files with such
extraneous characters will produce unexpected program behavior that may be difficult to trace.
Section 1.5 illustrates the conversion of an Excel² file to a fixed format file.

In its simplest form the data file contains individual response data. Such a flat file usually has
one line per record, starting with a subject ID (identification field) and followed by a number of
one-character response codes for the items in the test. Spaces in between fields and/or items are
permitted, as long as those blanks maintain the column positions of the item responses through-
out the file.

Example:

John     abbac aaacc
Mary-Ann bcabb bbcaa

Mary-Ann selected response category a for items 3, 9, and 10, while John answered b, c, and c,
respectively.

The item response codes may represent right/wrong answers, selected response categories,
nominal category codes, ordinal variable values, ratings, etc. The maximum number of different
codes per item depends on the program used for analysis. BILOG-MG and TESTFACT analyze
binary (dichotomous) responses only. The data may be multiple-category (1,2,3,4 or a,b,c,d,e,
etc.), but the program reduces them to right/wrong data using the key of correct response codes
that the user provides. MULTILOG and PARSCALE can handle both binary and multiple-category
items or mixtures of those types.

¹ This section was contributed by Leo Stam.
² Excel 2000 was used in the examples.


Besides a subject ID with up to 30 characters and the single-character item response codes, other
fields that may be present in the records are:

 A case weight or a frequency
 A subtest number
 A group identifier
 A form number
 A rater code

The specific requirements for these fields can be found in the Command Reference section for
the different programs. For example, the group identifier in BILOG-MG has to be a single digit
(integer), starting with 1, while in TESTFACT it can be any single character (M, F, etc.), and in
PARSCALE it can be a name of up to eight characters.

Including the single-subject data described above, the programs allow the following data types:

 Single-subject data with or without case weights
 Number tried/number right data with or without case weights
 Response patterns with frequencies
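
For example, a small file of response patterns with frequencies might look as follows (the
patterns and counts are hypothetical), with each record giving a pattern of five item scores
followed by its frequency:

00000    18
10000     9
11000    12
11111    21

A format statement such as (5A1,1X,F5.0) would then read the five item responses and the
frequency of each pattern; format statements are discussed in Section 1.2.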

1.2 Format statement: How do I tell the program about my data?

The IRT programs are command-driven and are run in batch mode. That is to say that the user
prepares a command file (either directly in an editor or through a dialog-box user interface, if
present) and submits this command file to the program for execution (Run).

While it is true that command-driven programs were the standard before “point-and-click” user
interfaces (“GUI”) entered the computing scene, maintaining this standard for the current
programs was a deliberate choice. The dialog-box interfaces that have been added are merely a
so-called front-end for the convenience of the user in building such a command file. Despite the
progress that has been made with graphical user interfaces, in our experience users who run a
program routinely still prefer the ease of use of the command file. Moreover, such a file stores
the particulars of an analysis in a very succinct way, so that making small changes to an
analysis, retrieving an old analysis, or sharing the analysis with other users of the program
(including technical support) is a straightforward task. It is like giving somebody a map of how
to get from A to B instead of having to describe the route with “take the first street to the
right, then a left at the third traffic light”, and so on. Granted, learning and remembering the
commands, keywords, and options used in a program requires a considerable effort (like learning
how to read maps), while the point-and-click interface can lay claim to being intuitive for the
user. The dialog-box user interface is especially helpful in that learning process or as a means
to refresh the memory of the occasional user of the programs.


Besides the particular analysis specifications, the command file informs the program where the
data file can be found and how it should be read. The location of the data file to be analyzed is
simply a matter of specifying that location with a keyword.

For example:

>FILES … DFNAME=’C:\PARSCALE\DATA\EXAMPL01.DAT’;

or

>GLOBAL … DFNAME=’F:\BILOGMG\EXAMPLES\EXAMPL06.RAW’;

or

>INPUT … FILE=’D:\TESTFACT\DATAFILES\TEST01.DAT’;

or

>PROBLEM … DATA=’G:\MULTILOG\DATA\TEST04.RAW’;

This shows that each program has its own flavor of command file syntax but also that those
specifications are essentially the same and that it is fairly easy to tell a program where it can find
the data input. Note that the name of the data file must be enclosed in single quotes. The drive
and directory path should be included if the data file is not in the same folder as the command
file. It is also good practice to keep all the files, including the command file, for a particular
analysis together in a separate folder. In that case, all that is needed is the filename.

Now that the program knows where to find the data, it needs to be told how to read those data:
which part of a record holds the subject ID, in which column the response code for the first
item is found, where the group code (if any) is located, and so on. To that end, the user
includes a format statement in the command file.

Format statements are enclosed in parentheses. They are entered on a separate line in the com-
mand file and usually one line is all that is needed. However, if more lines are needed, the user
can indicate that with a keyword (e.g., NFMT=2 tells the program that the format statement occu-
pies two lines).

The format statement for the simple example above is: (8A1,1X,5A1,1X,5A1).

Here is the file again, with a column counter added above for convenience:

12345678901234567890
John     abbac aaacc
Mary-Ann bcabb bbcaa

As can be seen, the total length of each record in the file is 20 columns. The first eight
columns contain the ID field. This is specified in the format statement with “8A1”, which stands
for “eight alphanumeric characters of length one.” The “A” is a format code and stands for
alphanumeric. The 1 indicates the width and the 8 is a repeat count. Other possible format
codes are “F” (for floating point, used to read real numbers) and “I” (for integer).

The next element in the format statement is an example of an operator, in this case “X”. The “X”
is used to tell the program to skip one or more columns. The example specifies “1X” or skip one
column. Next follows a block of five item responses to be read as “5A1”. Then, we instruct the
program to skip another column and to read a second set of five alphanumeric characters: items 6
through 10. Thus, the complete format statement, (8A1,1X,5A1,1X,5A1), describes how to read
each of the twenty columns in a record. Because the format statement describes one data record
and that description is applied to the whole data file, all the records in the data file should look
identical: the essence of a fixed format.

Instead of the “X” operator, the “T” (tab) operator can be used with the same result. The tab op-
erator specifies the column position to tab to. Thus, the format statement (8A1,1X,5A1,1X,5A1)
becomes (8A1,T10,5A1,T16,5A1) when using the tab operator. Tabbing backward is also pos-
sible. That is often used when the examinee records have the examinee ID at the end of each line,
while the program wants it as the first thing being read. Here is our example in that format. The
first line is a column counter added for your convenience. It is not part of the actual data file.

12345678901234567890
abbac aaacc John
bcabb bbcaa Mary-Ann

With the format statement (T13,8A1,T1,5A1,1X,5A1) we instruct the program to read the eight-
character ID starting at column 13, then go back to column 1 and read two blocks of five items,
skipping a blank column in the middle. This example also illustrates that the “X” and “T”
operators can be used within the same format statement. Obviously, the “T” operator can also be
used to read the items in an order that is different from the order in the data file. For
example, with (T13,8A1,T7,5A1,T1,5A1) we read the second block of 5 items before the first
block of 5 items.

The final operator that the user of our IRT programs should know about is the “/” (slash) opera-
tor. It instructs the program to go to the next line of the data file. Oftentimes, users have data
where the record for each examinee spans more than one line in the data file. A simple example
is as follows (again, with the column counter added for convenience).

1234567890123456
John     1 abbac
John     2 aaacc
Mary-Ann 1 bcabb
Mary-Ann 2 bbcaa

Here, each block of five items is given on a separate line. This could easily result from two dif-
ferent data files (each with an examinee ID and five items) that were concatenated into one file,
then sorted on examinee ID. To keep the order of the item blocks the same for each examinee, a
block number was added to the original data files.


The format statement (8A1,T12,5A1,/,11X,5A1) will read the examinee ID from the first line
of the record (8A1), tab to column 12 and read the first five items (T12,5A1), then go to the next
line of the record (/), skip the first 11 columns and read columns 12—16 as the responses to the
second set of five items. Note that the examinee ID in the second line of each record is not
needed.

A special use of the forward slash operator is to read every first, second, third, etc. record of a
large data file. For example, (8A1,1X,20A1,/) reads every odd record of a data file, starting
with the first one, while (/,8A1,1X,20A1) reads every even record of a data file, starting with
the second one.

The examples that come with the programs use a variety of format statements and it is a good
idea to look for an example that resembles your data when in doubt about the right format state-
ment. The chapters in this book that describe the examples also offer further details on the use of
the format statement.

1.3 Telling right from wrong with the response key

When you are analyzing multiple-choice items that are either answered correctly or incorrectly,
the program needs to know the item response code for each item that represents a correct answer.
The user provides that information with a response key.

MULTILOG and TESTFACT require the response key in the command file as a string of item
codes for correct responses, while users of BILOG-MG should specify in the command file
where the response key can be found (unless the data are already coded as 1 for a right and 0 for
a wrong answer). Because it is slightly more complicated, let us look at a BILOG-MG example.

The response key is a record with the exact same format as the data records. It can be in its own
file, or it can be part of the data file. The latter option makes it easier to check that the format is
indeed identical.

key      acaab baaba
John     abbac aaacc
Mary-Ann bcabb bbcaa

The file has the response key as the first record. The word key is used in the ID field for conven-
ience. It is not needed and will not be read by the program. BILOG-MG will apply the response
key to the data records and it will convert John’s responses to 1001001100 and Mary-Ann’s re-
sponses to 0110110001.
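
To tell BILOG-MG where the key is, the command file names the file containing the key record.
A minimal sketch (the file name and item counts are hypothetical; see the BILOG-MG Command
Reference in Section 2.6 for the exact keywords):

>INPUT NTOTAL=10, NIDCHAR=8, KFNAME='EXAMPLE.DAT';

Here KFNAME names the file whose first record is the answer key; because the key is the first
record of the data file in the example above, KFNAME is simply set to the name of the data file.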

1.4 What about missing data?

In educational assessment, the reason for an item response in a data file to be coded as missing
is generally limited to two possibilities: the specific item was not presented to the examinee,
or the examinee did not respond to the specific item. The former occurs when examinees answer
different forms (selections of items) of the same test and all the items of the test are
included in the data file. The importance of differentiating the missing codes lies in the fact
that omitted items can be treated as a wrong response, a fractionally correct response, or the
same as a not-presented item, i.e., excluded from the calculations.

Using the simple example again, the data file with not-presented items could look like:

John     abbac xxxxx
Mary-Ann xxxxx bbcaa

John took a different form of the test than Mary-Ann. They both responded to the five items in
their form, and all ten items of the two forms are included in the data file. Although the
example uses the same not-presented code for all items, note that with BILOG-MG and PARSCALE
the not-presented (or omitted) item codes may vary among items.

The four programs handle missing codes differently; details can be found in the chapters
describing the programs. BILOG-MG and PARSCALE are similar and accommodate both omitted and
not-presented codes. TESTFACT allows only one value for all items to represent an omitted item
and another value for not-presented items. TESTFACT is also the only program that allows omitted
items to be differentiated into skipped items and not-reached items. The latter are defined as
all the omitted items after the last item the examinee responded to. This situation occurs when
tests are administered under a time restriction (speeded tests); such tests are not considered
appropriate ability measurements under the assumptions underlying the power test models used in
the other programs. MULTILOG does not distinguish between omitted and not-presented items; the
user can only assign one missing code per item.

The format of not-presented and omitted keys is as described in Section 1.3. Note that, if more
than one key is used as part of the data file, the keys should follow the order as described in the
Command Reference sections for the respective programs.
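
In BILOG-MG, for example, the omit and not-presented keys are supplied in the same way as the
answer key. A sketch (file names are hypothetical; see the Command Reference for the exact
keywords and the required order of the key records):

>INPUT NTOTAL=10, NIDCHAR=8, KFNAME='EXAMPLE.DAT',
       OFNAME='EXAMPLE.DAT', NFNAME='EXAMPLE.DAT';

Here OFNAME and NFNAME name the files containing the omit key and the not-presented key,
respectively, each a record in the same format as the data records.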

1.5 Data import: What if my data are different?

The IRT programs from SSI expect plain text (ASCII) data files with a fixed format. Because the
programs do not include an import facility to handle various file formats, the user whose data
are in another format faces the task of converting the dataset to the plain text, fixed format.
Spreadsheet, database, and statistical applications generally offer the user some form of data
export (or Save As) that includes the plain text format. In this section we illustrate such a
conversion with an Excel dataset as starting point. We selected Excel because it has a format
that other applications include in their export formats, and it is a widely used program. This
way, users who are unsure how to convert a specific data format to plain text may first convert
to Excel and then follow one of the two methods described below.

The user is advised always to use copies of the original dataset. With Excel, for example, the
Save As operation uses a format that can only save the active worksheet, so some of your work
may get lost.


1.5.1 Using the print format

Using this simple format can only be done with files up to 240 columns (after conversion). In
other words, if your Excel worksheet has more than 240 (minus maximum ID length minus pos-
sible form and/or group indicators) items, this method will not work.

In Excel, highlight all the columns with the item response codes and set the column width of the
highlighted columns to 1. This assumes that your response codes are already one-character
codes. If not, you should use the recode capabilities of Excel. For example, if a twelve-category
item is coded as 1 through 12, recode it as 1,2,3,4,5,6,7,8,9,A,B,C or as A through L. The col-
umn with the ID field should be set to the maximum length of the values appearing in that col-
umn. Form or group indicators are best coded as numbers, starting with one.

Now, save the data file as a “*.prn” file. Excel calls that a Formatted text (Space-delimited)
file. If you want your filename to have the extension dat (instead of the automatic prn
extension), use double quotation marks (") around the name of the file you want to save it to.
Answer Yes to the question about losing special formatting features.

The resulting file should look as shown below, where the first 8 columns are the ID field,
followed by 17 item responses. Note that the leading blanks in the first ID field are
automatically included because the column width in Excel was set to 8 and the ID itself has only
4 characters. The alignment of the item responses is preserved.

    John01010101010101010
Mary-Ann10101010101010101
....

1.5.2 Using the tab-delimited format

Another option in Excel is to Save As txt format, which produces a tab-delimited file. This
method has no limitations on the maximum record length. However, the IRT programs stumble over
tab characters (they do not know how to handle them), so the tabs have to be removed. You can do
that in MS Word, for example, by reading in the file as a plain text file, doing a global
replace of “^t” with either a blank or nothing, and then saving the file. This works well if the
ID field has the same number of characters in every record. Otherwise, you can move the ID
column to the end of the worksheet before you do the Save As operation.

A second problem occurs when your worksheet has cells with no entries at all (missing
responses). When exporting (Save As) this as a tab-delimited text file, a global replacement of
the tab character with a blank will throw off the column alignment. In that case, you should
first replace all instances of tab-tab with tab-space-tab.

To accommodate the user, SSI has included a NOTAB utility on the program CD that can filter
out unwanted tab characters correctly. This utility as well as a worked example can be found in
the dataprep folder on the “IRT from SSI” CD.


1.6 Data export: What if my data needs editing?

Going the other way, from a plain text, formatted data file to an Excel file has a number of
uses. Foremost is data editing. The first attempt at analysis may reveal several difficulties in
the data: values that are out of range, negative item-test correlations, group codes that are
coded with characters instead of numbers, etc. Importing the plain text data file into Excel or
a similar application provides the user with powerful tools for data editing and data cleaning.

From within Excel, select Get External Data from the Data menu, then Import Text File. Se-
lect the data file to import. The Text Import Wizard opens with a preview of the data file. Se-
lect Fixed width as the type that best describes the data, then click the Next button. In the Data
Preview box, use the mouse to set break lines separating the data into columns. Once satisfied,
click Next. The last step allows you to skip columns, if needed. Click Finish.


2 BILOG-MG

BILOG-MG is an extension of the BILOG program that is designed for the efficient analysis of
binary items, including multiple-choice or short-answer items scored right, wrong, omitted, or
not-presented. BILOG-MG is capable of large-scale production applications with unlimited
numbers of items or respondents. It can perform item analysis and scoring of any number of sub-
tests or subscales in a single program run. All the program output may be directed to text files for
purposes of selecting items or preparing reports of test scores.

The BILOG-MG program implements an extension of Item Response Theory (IRT) to multiple
groups of respondents. It has many applications in test development and maintenance.
Applications of multiple-group item response theory in educational assessment and other
large-scale testing programs include:

 Nonequivalent groups equating for maintaining the comparability of scale scores as new
forms of the test are developed.
 Vertical equating of test forms across school grades or age groups.
 Analysis of Differential Item Functioning (DIF) associated with demographic or other
group differences.
 Detecting and correcting for item parameter trends over time (DRIFT).
 Calibrating and scoring tests in two-stage testing procedures designed to reduce total test-
ing time.
 Estimating latent ability or proficiency distributions of students in schools, communities,
or other aggregations.

In addition, the BILOG-MG program provides for “variant items” that are inserted in tests for
the purpose of estimating item statistics, but that are not included in the scores of the
examinees.

2.1 New features in BILOG-MG

The most important change is that BILOG-MG is now a Windows application. Syntax can be
generated or adapted using menus and dialog boxes or, as before, with command files in text
format. The interface has menu options in the order the user would most generally use: model
specification is followed by data specification and technical specifications, etc. Each of the menu
options provides access to a number of dialog boxes in which the user can make specifications.
For an overview of the required and optional commands in BILOG-MG syntax, please see Sec-
tion 2.6.1. For more information on which dialog box to use to specify a specific keyword or op-
tion, please see the location of keywords in the interface discussed in Section 2.3.13.

 Filename length: All filenames with path may now extend to 128 characters. The file-
name must be enclosed in single quotes. Note that each line of the command file has a
maximum length of 80 characters. If the filename does not fit on one line of 80 characters,
the remaining characters should be placed on the next line, starting at column 1.


 Factor loadings: The item dispersion (reciprocal of the item slope) previously listed
among the parameter estimates has been replaced by the one-factor item factor loading
given by the expression Slope / √(1 + Slope²).
 Average measurement error and empirical reliability for each subtest: The mean-
square error and root-mean-square error for the sample cases are listed for each test. In
addition, the empirical reliability computed from the IRT scale score variance and the
mean-square error is listed.
 Note that for EAP and MAP estimated ability the formula for this reliability differs from
the formula for ML estimated ability (to account for the regression effect in EAP and
MAP estimation). If there are multiple test forms, these test statistics are averages over the
forms. If there are multiple groups, the statistics are listed for both the combined groups
and the separate groups.
 Reliabilities in connection with information plots: The reliabilities given by the pro-
gram in connection with the information plots of Phase 3 differ from empirical reliabilities
in that they assume a normal distribution of ability in the population. They depend only on
the parameters of the items and not on the estimated abilities in the sample. The program
now computes and lists these theoretical reliabilities for both combined and separated test
forms and sample groups. (For a discussion of empirical and theoretical reliability see
Bock & Zimowski (1999).)
 Information curves and reliabilities for putative test forms: It may be useful in test
development to preview the information and theoretical reliability of test forms that might
be constructed from items drawn from a calibrated item bank. (For a discussion of this
procedure, see Section 2.2.)
 GLOBAL command—PRNAME keyword: This keyword instructs the program to read
the provisional values of parameters of selected items in the test forms from the specified
file.
 SAVE command—PDISTRIB keyword: This keyword allows the user to save the
points and weights of the posterior latent distribution at the end of Phase 2. These quanti-
ties can be included as prior values following the SCORE command for later EAP estima-
tion of ability from previously estimated item parameters.
 TEST command—FIX keyword: This keyword allows the user to keep selected item pa-
rameters fixed at their starting values. Starting values may be entered on the SLOPE,
THRESHLD, and GUESSING keywords on the same command or read from an existing item
parameter file.
 CALIB command—NOADJUST option: BILOG-MG routinely rescales the origin and
scale of the latent distribution, even in the one-group case. This option may be used to
suppress this adjustment.
 CALIB command—CHI keyword: This keyword determines the number of items re-
quired and the number of intervals used for χ² computations.
 CALIB command—FIXED option: If this option is present, the prior distributions of
ability in the population of respondents are kept fixed at the values specified in the IDIST
keyword and/or the QUAD commands. It suppresses the updating of the means and standard
deviations of the prior distributions at each EM cycle in the multiple-group case.
 CALIB command—GROUP-PLOTS option: By default, the program item plots show
observed proportions of correct responses in the data combined for all groups. The
GROUP-PLOTS option provides plots for each separate group, along with the combined plot.
 CALIB command—RASCH option: If this option is specified, the parameter estimates
will be rescaled according to Rasch model conventions: that is, all the slopes will be re-
scaled so that their geometric mean equals 1.0, and the thresholds will be rescaled so that
their arithmetic mean equals 0.0. If the 1-parameter model has been specified, all slope pa-
rameters will therefore equal 1.0.
 PRIORS command—SMU and SSIGMA keywords: Prior values for slope parameter
means and sigmas are now entered in arithmetic units rather than natural log units. The
means in both forms (arithmetic and log units) are printed in the Phase 2 output, however.
The default for SMU is 1.0 (log SMU = 0.0) and for SSIGMA the default is 1.64872127 (log
SSIGMA = 0.5).
 SCORE command—MOMENTS option: Inserting the MOMENTS option in the SCORE
command causes the program to compute and list the coefficients of skewness and kurto-
sis of the ability estimates and of the latent distribution.
 SCORE command—DOMAIN keyword: BILOG-MG now includes a procedure for
converting the Phase 3 estimates of ability into domain scores if the user supplies a file
containing the item parameters for a sample of previously calibrated items from the do-
main. Weights can be applied to the items to improve the representation of the domain
specifications.
 SCORE command—FILE keyword: This keyword is used to supply the external file
used to calculate the domain scores (see above).

2.2 Phases of the analysis: input, calibration and scoring

Phase 1: INPUT

The input routine reads formatted data records. Data for each observation consist of subject iden-
tification, optional form number, optional group number, optional case weight, and item response
data. Item responses of individual examinees comprise one character for each of n items. The
answer key, not-presented, and omit codes are read in exactly the same format as the observa-
tions. For aggregate-level data, the “responses” consist of number of attempts and number cor-
rect for each item. If data are for the aggregate-level model, vectors of numbers of attempts and
correct responses to the items are read in decimal format.

Omits and attempts

Omits may be scored “wrong”, treated as fractionally correct, or omitted from calculations.

Items and forms

The INPUT routine accepts a list of numbers and corresponding names for all items to be read
from the data records. The order in which the items appear in the data records is specified in
one or more form keys. If the data are collected with a multiple-form test, the program accepts
a key for each form. Each respondent’s data record is identified by its form number.


Multiple groups

When multiple-group IRT analysis is requested, the INPUT routine accepts a list of item num-
bers or names identifying the items administered to each group. Each respondent’s data record is
identified by its group number. The Phase 1 program computes classical item statistics separately
for each group.

Subtests

The INPUT routine also accepts lists of item numbers or names, not necessarily mutually exclu-
sive, describing i subtests. It scores each subtest and creates a file containing the item scores,
item attempts, subtest scores, and other input information for each respondent. Each subtest is
calibrated separately in Phase 2. Each respondent is scored on all subtests in Phase 3.

Case weights

If there are case weights for respondents (because they were drawn in an allocation sample), the
item responses and item attempts are multiplied by the weight. If the data consist of response
patterns, the case weights are the frequencies of the patterns.

Samples

If there are a large number of respondents or aggregate-level records, the INPUT routine can be
instructed to select a random sample of a specified size to be passed to CALIBRATE (Phase 2).
The complete master file of cases will nevertheless be passed to Phase 3 for scoring.

Classical item statistics

While preparing the item-score file, the INPUT routine also accumulates, subtest by subtest, cer-
tain item and test statistics (accumulated from the sample file when the number of cases exceeds
the user-specified sampling level). These statistics consist of

 item facilities (percent correct),
 item-subscore correlations, and
 the number of respondents attempting each item.

These quantities are listed and passed to the Phase 2 and Phase 3 routines to provide starting val-
ues for item parameter and respondent scale-score estimation.

Phase 2: CALIBRATE

The CALIBRATE routine fits a logistic item-response function to each item of each subscale.
There are many options available to the user in this section of the program.


Item-response model

The response model may be the 1-, 2- or 3-parameter logistic response function. The scaling fac-
tor D = 1.7, employed to scale estimates in the normal metric, may be included or omitted at the
user’s option. Information that assists the user in model selection is provided in the marginal log
likelihood and goodness of fit indices and statistics for individual items. The user may request
plots of the observed and expected item-response curves.

Individual data or aggregate data

Item parameters may be estimated from either binary (right/wrong/omit) data or aggregate-level
frequency data (number of correct responses, number of attempts) input from Phase 1. If aggre-
gate-level data are used, it is assumed that each respondent in each group responds to only one
item per subscale, as required in matrix-sampling applications (see Mislevy, 1983). The aggre-
gate-level option can also be applied to individual data if weights are used and the binary re-
sponses take on fractional values. In this use of the aggregate-level option, each respondent re-
sponds to more than one item.

Marginal maximum likelihood (MML) estimation of item parameters

Estimation of item parameters by the method of marginal maximum likelihood is applicable to
tests of three or more items. The solution assumes the respondents are drawn randomly from a
population or populations of abilities, which may be assumed to have either a normal
distribution, an arbitrary distribution specified by the user, or an arbitrary distribution to
be estimated jointly with the item parameters. The empirical distributions of ability are
represented as discrete distributions on a finite number of points (histogram). In the case of
multiple groups, the CALIBRATE routine also provides estimates of the means and standard
deviations of the posterior distributions of ability.

The MML solution employs two methods of solving the marginal likelihood equations: the so-
called EM method and Newton-Gauss (Fisher scoring) iterations. The default number of cycles
for the EM algorithm is 10; the default for Newton steps is 2. Convergence in the EM steps is
hastened by the accelerator described in Ramsay (1975). Results of each cycle are displayed so
that the extent of convergence can be judged. The information matrix for all item parameters is
approximated during each Newton step and then used at convergence to provide large-sample
standard errors of estimation for the item parameter estimates.
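
Both iteration counts, along with the convergence criterion, can be changed on the CALIB
command. A sketch (the values shown are illustrative only; see Section 2.6 for the command
reference):

>CALIB NQPT=20, CYCLES=25, NEWTON=2, CRIT=0.001;

Here CYCLES sets the maximum number of EM cycles, NEWTON the number of Newton-Gauss steps,
NQPT the number of quadrature points, and CRIT the convergence criterion.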

Item statistics supplied by CALIBRATE

Phase 2 provides the item parameters in the form of the lower asymptote, the item intercept
(equal to minus the product of the slope and threshold), the so-called “slope” or
“discrimination” parameter, the item threshold (location), and the loading (one-factor item
factor loading = Slope / √(1 + Slope²)).


In the one-parameter solution, all slopes are equal. In both the one- and two-parameter solutions,
all lower asymptotes are zero. In the three-parameter solution with a common lower asymptote,
all lower asymptote parameters are equal. Otherwise, they are estimated separately for each item.

When an analysis of differential item functioning (DIF) is requested, the program provides esti-
mates of the unadjusted and adjusted threshold parameters for each group along with their stan-
dard errors. Estimates of group differences in the adjusted threshold parameters are also pro-
vided. When an item parameter drift (DRIFT) analysis is selected, the program provides esti-
mates of the coefficients of the linear or polynomial function.

In Phase 2, when there is a single group, the unit and origin of the scale on which the parameters
are expressed are based on the assumption that the latent ability distribution has zero mean and
unit variance. This is referred to as the “0, 1” metric. When there are multiple groups, the pro-
gram provides the option of setting the mean and standard deviation of the combined estimated
distributions of the group to zero and one.

The parameter estimates in Phase 3 can be rescaled according to scale conventions selected by
the user. If the one-parameter model has been selected, the item slope estimates are uniformly
1.0. In other cases, the scores can be scaled to a specified mean and standard deviation in the
sample. In both Phase 2 and Phase 3, the item parameter estimates can be saved before and after
rescaling, respectively, in formatted external files.

Maximum marginal a posteriori estimation of item parameters

When some items are extremely easy or extremely difficult, there may be insufficient informa-
tion in the sample to estimate their parameters accurately. This will be especially true if the
number of respondents is only moderate (250 or fewer). As an alternative to deleting these items,
prior distributions can be placed on the item parameters. The user may specify normal priors for
item thresholds, log-normal priors for slopes, and beta priors for lower asymptotes. Each item
may have a different specification for its prior.

Default specifications are for prior distributions on slopes under the two-parameter model, and
on slopes and lower asymptotes under the three-parameter model. By specifying tight priors on
selected item parameters, the user may hold these values essentially fixed while estimating
other item parameters. This feature is useful in linking studies, where new test items are to
be calibrated into an existing scale without changing parameter values for old items.
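
For example, tight normal priors on the thresholds of ten items might be specified as follows
(a sketch with illustrative values; see the PRIORS command in the Command Reference for the
exact keywords):

>PRIORS TMU=(0.0(0)10), TSIGMA=(0.1(0)10);

Here TMU and TSIGMA give the means and standard deviations of the normal priors on the item
thresholds; the small standard deviations hold the thresholds close to their starting values.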

Item fit statistics

Approximate χ² indices of fit are computed for each item following the final estimation cycle.
For the purpose of computing these χ² statistics, the scale score continuum is divided into a
number of successive intervals convenient for displaying the response proportions (maximum of
20). Each respondent is assigned to the interval that includes the EAP estimate (based on the
type of prior specified by the user) of his or her score. For the item in question, the expected
response probabilities corresponding to the average EAP estimate of ability of cases that fall
in the interval are used as the expected proportion for the interval.

A likelihood ratio χ² is then computed after combining extreme intervals so that the expected
frequency exceeds five. Degrees of freedom are equal to the number of combined intervals.
There is no reduction in degrees of freedom due to estimating the item parameters, because the
marginal maximum likelihood method does not place linear constraints on the residuals.
At the user’s request, observed and expected item-response curves are plotted for each item.

Test of improved fit if the number of parameters is increased

When the expected frequencies of the individual response patterns are too small to justify the
likelihood ratio test of goodness-of-fit, the change in the likelihood ratio χ² between the 1-
and 2-parameter models, or between the 2- and 3-parameter models, is a valid large-sample test
of the hypothesis that the added parameters are null. The degrees of freedom of each of these
change χ² statistics are equal to the number of items.

Test of overall fit when the number of items is 10 or less

If the sample size is large and the number of items is small, the overall fit of the response
functions of all items can be tested by comparing the observed frequencies of the patterns with
the expected marginal frequencies computed from the fitted functions. The data must be in the
form of response patterns and frequencies. The likelihood ratio χ² statistic for the test of
fit is

$$G^2 = 2 \sum_{i=1}^{2^n} r_i \log_e \frac{r_i}{N \hat{P}_i},$$

where 2ⁿ is the number of possible patterns of the n binary item scores, rᵢ is the observed
frequency of pattern i, N is the number of respondents, and P̂ᵢ is the estimated marginal
probability of pattern i.

The number of degrees of freedom is 2ⁿ − kn − 1, where k is the number of parameters in the
response model.
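
For example, for a test of n = 5 binary items fitted with the two-parameter model (k = 2), the
statistic has 2⁵ − 2·5 − 1 = 21 degrees of freedom.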

This test should be used only when the number of respondents is large relative to the number of
patterns. If a few patterns have zero observed frequency, ½ should be substituted as the fre-
quency for those patterns and corresponding ½s subtracted from the frequency of the most fre-
quent pattern (or 1 could be used for this purpose).


Phase 3: SCORE

The SCORE routine makes use of the master response file from Phase 1 and the item parameter
estimate files from Phase 2 to compute estimated scale scores for respondents. The user may se-
lect one of the three methods described below for estimating scale scores.

In each of these methods the user has the option of biweight robustification to protect the esti-
mates from spurious responses due to guessing or inattention. Because effects of guessing are
suppressed by the robustification, the lower asymptote is not incorporated in the response model
in Phase 3 when the biweight option is selected. Scores and standard errors for all subscales are
calculated simultaneously for each respondent. Results may be printed and/or saved on an exter-
nal file.

Maximum likelihood (ML)

ML estimates with or without robustification are computed by the Newton-Raphson method,
starting from a linear transformation of the logit of the percent-correct score for the subject.
In those rare cases where the Newton iterations diverge, an interval-bisection method is
substituted.

Estimates for respondents with all correct or all incorrect responses are attributed by the half-
item rule. That is, respondents who score all incorrect are assigned one-half a correct response to
the easiest item; respondents who score all correct are assigned one-half a correct response to the
hardest item. The estimate is then computed from this modified response pattern.

Standard errors are computed as the square root of the negative reciprocal of the expected second
derivative of the log likelihood at the estimate, i.e., the square root of the reciprocal Fisher in-
formation.

Bayes or expected a posteriori (EAP)

EAP estimates with or without robustification are computed by quadrature using a discrete dis-
tribution on a finite number of points as the prior. The user may select the number of points and
has the choice of a normal, locally uniform, or empirical prior. For the latter, the user may supply
the values of the points and the corresponding empirical weights or may use the empirical
weights generated in Phase 2.

The EAP estimate is the mean of the posterior distribution and the standard error is the standard
deviation of the posterior distribution.

Bayes modal or maximum a posteriori (MAP)

MAP estimates with or without robustification are also computed by the Newton-Gauss method.
This procedure always converges and gives estimates for all possible response patterns. A
normal prior distribution with user-specified mean and variance is assumed [the default is
N(0, 1)]. The estimate corresponds to the maximum of the posterior density function (mode);
the standard error is the square root of the negative reciprocal of the curvature of the
density function at the mode.

Estimated latent distribution

When EAP estimation is selected, the SCORE routine obtains an estimate of the population dis-
tribution of ability in the form of a discrete distribution on a finite number of points. This distri-
bution is obtained by accumulating the posterior densities over the subjects at each quadrature
point. These sums are then normalized to obtain the estimated probabilities at the points. The
program also computes the mean and standard deviation for the estimated latent distribution.
Sheppard’s correction for coarse grouping is used in calculating the standard deviation.
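
(With equally spaced quadrature points a distance h apart, Sheppard’s correction subtracts
h²/12 from the variance computed from the grouped distribution.)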

Rescaling

The ability estimates are calculated initially in the scale of the item parameter estimates from
Phase 2. In addition, however, rescaled estimates may be obtained by one of the following op-
tions:

 the mean and standard deviation of the sample distribution of score estimates are set to
arbitrary values specified by the user (default = 0, 1);
 a linear transformation of scale is provided by the user;
 if EAP estimation has been selected, the mean and standard deviation of the latent score
distribution may be set to arbitrary values by the user (default = 0, 1).

Any of these options may be applied to all subtests in the same computer run, or different
rescaling parameters may be used for each subtest. Parameter estimates and standard errors for
items from Phase 2 are rescaled for each subtest according to the selected option.
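
For instance, to report EAP scores on a scale with mean 250 and standard deviation 50, the
SCORE command might include the following (a sketch with illustrative values; see Section
2.6.16 for the exact keywords):

>SCORE METHOD=2, RSCTYPE=3, LOCATION=(250.0), SCALE=(50.0);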

Marginal probabilities of response patterns

When EAP estimation is selected, the marginal probability of each response pattern in the sam-
ple is calculated and printed along with the corresponding number-right score and scale score.

Item and test information tables and curves

BILOG-MG provides at the user’s request a number of indices and plots concerning item and
test information:

 Plots of test information and standard error curves for each subtest.
 Tables of item information indices, including the point and value of maximum informa-
tion.

Classical reliability

The classical definition of reliability is simply the ratio of the true score variance to the
observed score variance, which is the sum of the true score variance and the error variance. In
an IRT context, the true scores are the unobservable theta values that are estimated with a
specified standard error from item response patterns, as for example in Phase 3 of the
BILOG-MG program.

Classical reliability is implemented in BILOG-MG in two different ways according to how the
true score and error variances are estimated. To distinguish the two results, we refer to one as
“theoretical” reliability and the other as “empirical” reliability. The result for theoretical reliabil-
ity appears in connection with the test information plots in the Phase 3 output; the result for
“empirical” reliability appears following the display of the means, standard deviations, and aver-
age standard error of the scores earlier in the Phase 3 output. The computation of these two quan-
tities is carried out as follows.

Theoretical reliability

The theoretical reliability value applies to IRT scores estimated by the maximum likelihood
method (METHOD=1 of the SCORE command). It is based only on the item parameters passed from
Phase 2 and does not depend in any way on the ability scores computed in Phase 3. Instead, it
assumes that the true ability scores are distributed normally with mean zero and variance one in
the population of examinees. The test information function is integrated numerically with respect
to this assumed distribution to obtain the average information expected when the test is adminis-
tered in the population. The formulas for evaluating test information for any given value of abil-
ity, assuming a one, two, or three parameter logistic item response model, are as follows:

1PL:

$$S.E._{(1)}(\hat{\theta}) = \left\{ 1 \Big/ D^2 a^2 \sum_{j=1}^{n} P_{(1)j}(\hat{\theta}) \bigl[ 1 - P_{(1)j}(\hat{\theta}) \bigr] \right\}^{1/2}$$

2PL:

$$S.E._{(2)}(\hat{\theta}) = \left\{ 1 \Big/ D^2 \sum_{j=1}^{n} a_j^2 P_{(2)j}(\hat{\theta}) \bigl[ 1 - P_{(2)j}(\hat{\theta}) \bigr] \right\}^{1/2}$$

3PL:

$$S.E._{(3)}(\hat{\theta}) = \left\{ 1 \Big/ D^2 \sum_{j=1}^{n} a_j^2 \left[ \frac{1 - P_{(3)j}(\hat{\theta})}{P_{(3)j}(\hat{\theta})} \right] \left[ \frac{P_{(3)j}(\hat{\theta}) - g_j}{1 - g_j} \right]^2 \right\}^{1/2}$$

Although the formulas are expressed in terms of standard errors, the information values can be
obtained by taking the reciprocal of the squared standard error. Conversely, the reciprocal of the
average information with respect to the ability distribution is the harmonic mean of the error
variance. Since by assumption the variance of the true score (i.e., ability) distribution is equal to
one when expressed in the scale of the Phase 2 item parameter calibration, the theoretical reli-
ability is one divided by the quantity one plus the error variance.
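
In symbols: if Ī denotes the average information obtained by integrating the test information
function over the assumed N(0, 1) ability distribution, then

$$\bar{\sigma}_e^2 = 1/\bar{I}, \qquad \rho_{\text{theoretical}} = \frac{1}{1 + \bar{\sigma}_e^2}.$$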


In the program the theoretical reliability is computed for each form of the test when there are
multiple forms. Whether the analysis pertains to one group or multiple groups of examinees is
not relevant; because the theoretical reliability is a function only of the item parameters, the
presence of multiple groups has no effect on the results.

This version of BILOG-MG has provisions for computing information curves and reliability for
any set of item parameters supplied in Phase 1 as starting values for item parameter estimation. If
alternative forms are to be constructed from the items set, the user can insert forms commands
following the score command to indicate the item composition of the forms. See the documenta-
tion of these score-forms commands for instructions on how to set up these calculations
(REFERENCE, READF and NFORMS on the SCORE command discussed in Section 2.6.16).

Empirical reliability

The formulas for estimating the error and true score variances for calculating empirical reliability
differ depending on how the ability scores of the examinees in the sample (or in the samples in
the case of a multiple-group analysis) are estimated:

 For maximum likelihood scores (METHOD=1 on the SCORE command), the estimated error
variance is the reciprocal of the mean of the test information evaluated at the ability
estimates of all examinees in the sample or samples. The score variance is just the variance
of the maximum likelihood scores in the sample or samples. The true score variance can
therefore be estimated simply by subtracting the error variance from the score variance. The
empirical reliability in each sample is then given by that value for the true score variance
divided by the score variance.
 For Bayes EAP scores (METHOD=2 on the SCORE command), the estimate of the error variance
is the mean of the variances of the posterior distributions of ability for all examinees in
the sample or samples. Because the ability scores are regressed estimates in Bayes
estimation, the true score variance is estimated directly by the variance of the means of
the posterior distributions (i.e., the EAP scores) in the sample or samples. The empirical
reliability is therefore the true score variance divided by the sum of the true score
variance and the error variance. The formulas for computing, by numerical integration, the
means and variances of the examinee posterior distributions of ability are as follows.

The Bayes estimate is the mean of the posterior distribution of θ, given the observed response
pattern xᵢ (Bock & Mislevy, 1982). It can be approximated as accurately as required by the
Gaussian quadrature,

$$\bar{\theta}_i \cong \frac{\sum_{k=1}^{q} X_k \, P(\mathbf{x}_i \mid X_k) \, A(X_k)}{\sum_{k=1}^{q} P(\mathbf{x}_i \mid X_k) \, A(X_k)}.$$


This function of the response pattern xᵢ has also been called the expected a posteriori (EAP)
estimator. A measure of its precision is the posterior standard deviation (PSD), approximated
by

$$\mathrm{PSD}(\bar{\theta}_i) \cong \left[ \frac{\sum_{k=1}^{q} (X_k - \bar{\theta}_i)^2 \, P(\mathbf{x}_i \mid X_k) \, A(X_k)}{\sum_{k=1}^{q} P(\mathbf{x}_i \mid X_k) \, A(X_k)} \right]^{1/2}.$$

The weights, A(Xₖ), in these formulas depend on the assumed distribution of θ. Theoretical
weights, empirical weights A*(Xₖ), or subjective weights are possibilities.

The EAP estimator exists for any answer pattern and has a smaller average error in the
population than any other estimator, including the ML estimator. It is in general biased toward
the population mean, but the bias is small within ±3σ of the mean when the PSD is small (e.g.,
less than 0.2σ; see Bock & Mislevy, 1982).

Although the sample mean of the EAP estimates is an unbiased estimator of the mean of the la-
tent population, the sample standard deviation is in general smaller than that of the latent popula-
tion. In most applications, this effect is not apparent because the sample standard deviation is
adjusted arbitrarily when the scale scores are standardized. Thus, the bias is not a serious
problem unless respondents are compared using alternative test forms that have much different
PSDs. The same problem occurs, of course, when number-right scores from alternative forms
with differing reliabilities are used to compare respondents. Users of tests should avoid making
comparisons between respondents who have taken alternative forms that differ appreciably in
their reliability or precision. A further implication is that, if EAP estimates are used in computer-
ized adaptive testing, the trials should not terminate after a fixed number of items, but should
continue until a prespecified PSD is reached.

For Bayes MAP scores, the estimated error variance is the mean of the reciprocal of the test in-
formation at the modes of the posterior distributions of all examinees in the sample or samples.
Similarly the true score variance is estimated by the mean of the variances of the posterior distri-
butions at the mode. As in the case of Bayes EAP scores, the empirical reliability for the MAP
scores is equal to the true score variance divided by the sum of the true score variance and the
error variance. The formulas for computing the posterior mode and test information at the mode
are as follows.

Similar to the Bayes estimator, but with a somewhat larger average error, is the Bayes modal, or
so-called maximum a posteriori (MAP), estimator. It is the value of θ that maximizes the log
posterior density

$$\log_e P(\theta \mid \mathbf{x}_i) = \sum_{j=1}^{n} \left\{ x_{ij} \log_e P_j(\theta) + (1 - x_{ij}) \log_e [1 - P_j(\theta)] \right\} + \log_e g(\theta),$$

where g(θ) is the density function of a continuous population distribution of θ.


The stationary equation is

$$\sum_{j=1}^{n} \frac{x_{ij} - P_j(\theta)}{P_j(\theta)[1 - P_j(\theta)]} \cdot \frac{\partial P_j(\theta)}{\partial \theta} + \frac{\partial \log_e g(\theta)}{\partial \theta} = 0.$$

Analogous to the maximum likelihood estimate, the MAP estimate is calculated by Fisher scoring,
employing the posterior information

$$J(\theta) = I(\theta) - \partial^2 \log_e g(\theta) / \partial \theta^2,$$

where the rightmost term is the second derivative of the population log density of θ.

In the case of the 2PL model and a normal distribution of θ with variance σ², the posterior
information is

$$J(\theta) = \sum_{j=1}^{n} a_j^2 P_j(\theta)[1 - P_j(\theta)] + \frac{1}{\sigma^2}.$$

The PSD of the MAP estimate, θ̂, is approximated by

$$\mathrm{PSD}(\hat{\theta}) = \sqrt{1/J(\hat{\theta})}.$$

Like the EAP estimator, the MAP estimator exists for all response patterns but is generally bi-
ased toward the population mean.

Because empirical reliabilities are estimated from the results of test score estimation, they are
reported separately for each group of examinees in a multiple-group analysis. Note, however,
that the test forms are not distinguished in these computations. If there are multiple forms of the
test, the empirical reliabilities are aggregations over the test forms.

Information curves and reliabilities for putative test forms

It may be useful in test development to preview the information and theoretical reliability of test
forms that might be constructed from items drawn from a calibrated item bank. This can now be
done using the FIX keyword on the TEST commands. Starting values for the item parameters are
supplied to the program (see the definition of the FIX keyword in Section 2.6.17 for details), or the
parameters may be read from an IFNAME file. Then all of the items are designated as fixed using
the FIX keyword. If the INFO keyword appears in the SCORE command, the required information
and reliability analysis will be performed in Phase 3.

In order for this procedure to work, however, the program must have data to process in Phases 1
and 2 for at least a few cases. Some artificial response data can be used for this purpose. The
only calculations that will be performed in Phase 2 are preparations for the information analysis
in Phase 3. The number of EM cycles in the CALIB command can therefore be set to 1 and the
number of Newton cycles to 0. The NOADJUST option must also be invoked.
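
A command file for such a run might be structured as in the following sketch (the file names, the item count, the FIX coding, and the INFO value are illustrative assumptions; see Sections 2.6.3, 2.6.16, and 2.6.17 for the exact keyword definitions):

>GLOBAL DFNAME='FAKE.DAT', IFNAME='BANK.PRM', NPARM=2;
>LENGTH NITEMS=(5);
>INPUT NTOTAL=5, NIDCHAR=3;
>ITEMS;
>TEST TNAME=PREVIEW, INUMBERS=(1(1)5), FIX=(1,1,1,1,1);
(3A1,1X,5A1)
>CALIB CYCLES=1, NEWTON=0, NOADJUST;
>SCORE INFO=1;

Here FAKE.DAT would contain the few artificial response records mentioned above, and BANK.PRM the previously calibrated item parameters.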

Output files

Phase 1 results appear in the *.ph1 file. They include test and item identification and classical
item statistics.

Phase 2 results appear in the *.ph2 file. They include assumed prior distributions, estimated
item parameters, standard errors and goodness-of-fit statistics, DRIFT parameters, estimates of
differential item functioning, posterior distributions for the groups, group means and standard
deviations, and estimates of their standard errors.

Phase 3 results appear in the *.ph3 file. They include assumed prior distributions of the scale
scores for MAP and EAP estimation, correlations among the subtest scores, rescaling constants,
rescaled item parameters, scale scores for the subjects, test information plots, and parameters of
the rescaled latent distribution.

2.3 The BILOG-MG interface

When the BILOG-MG program is opened for the first time, a blank window is displayed with only
three active options: File, View and Help. Thereafter, however, BILOG-MG will by default open
with the last active syntax file displayed. In this case, or when a command file is opened, the main
menu bar shown below is displayed.

There are 12 menu titles available on the main menu bar. The main purpose of each is summa-
rized in Table 2.1.

Table 2.1: Menu titles on the main menu bar

Menu title Purpose

File Creating or opening files, printing files and exiting the program

Edit Standard Windows editing functions

Setup Model specification

Data Description of the data, with option to enter new data

Technical Specifying starting values and priors for calibration and/or scoring

Save Saving output to external files

Run Generating syntax and running one or all phases of the program; access-
ing the graphics procedure.

Output Viewing output files for the current analysis.

View Show or hide the tool bar and status bar

Options Changing program settings and user preferences

Window Switching between open files

Help Access to the online help, build number and contact information for SSI

2.3.1 File menu

The File menu provides the user with options to open an existing syntax or text file, to create a
new file, to save or to print files.

When the New or Open options are selected from the File menu, the user is prompted for the
name of a command file. This can be either a new file, in which case a new name is entered in
the File name field, or an existing file, in which case one can browse and select the previously
created command file to be used as the basis for the current analysis.

The Close option is used to close any file currently open in the main BILOG-MG window, while
the Save option is used to save any changes made to the file since it was opened. With the Save
As option a file may be saved under the same or a different name in a folder of the user’s choos-
ing.

The Print and Print Setup options represent the usual Windows printing options, while selec-
tion of the Print Preview option opens a new window, in which the current file is displayed in
print preview mode. Options to move between pages and to zoom in and out are provided. The
printing options are followed by the names of the last files opened, providing easy access to re-
cently used files. The Exit option is used to exit the program and return to Windows.

2.3.2 Edit menu

The Edit menu has the standard Windows options to select, cut, copy and paste contents of files.
In addition, the user can search for text strings and/or replace them with new text using the Find
and Replace options.

2.3.3 Setup menu

The Setup menu is used to provide general information to be used in the analysis. The three op-
tions on the Setup menu are:

 General: used for entering general information on the type of analysis required.
 Item Analysis: used to specify the allocation of items to forms, subtests, and/or groups
and to control the item parameter estimation procedure.
 Test Scoring: used to request the scoring of individual examinees or of response patterns,
item and test information and rescaling of scores.

The menu options are used to activate dialog boxes. The function of each dialog box is described
below.

Setup menu: General dialog box

The General dialog box has four tabs on which the job description, model, type of response and
test, group and item labels may be specified. The Job Description tab is shown below.

The top half of the Job Description tab on the General dialog box is used to provide a title and
additional comments for the analysis. Below these fields, the number of items, subtests, groups
and/or forms (if any), and the reference group in the case of a multiple-group analysis are en-
tered. On the images shown here, links between the fields and the corresponding keywords are
provided.

Related topics

 CALIB command: REFERENCE keyword (see Section 2.6.3)
 COMMENT command (see Section 2.6.4)
 GLOBAL command: NTEST keyword (see Section 2.6.7)
 INPUT command: NFORM, NGROUP, and NTOTAL keywords (see Section 2.6.9)
 TITLE command (see Section 2.6.18)

The second tab, Model, is used to select a 1-, 2-, or 3PL model and to specify the response func-
tion metric to be used. If variant items are to be included in the analysis, or a DIF or DRIFT mul-
tiple-group analysis is required, this is indicated in the Special Models group box.

Note that the selection of some models is dependent on the presence of other keywords in the
syntax. For example, in order to request Variant Item Analysis the NVTEST keyword on the
GLOBAL command should have a value larger than the default of 0, or the NVARIANT keyword on
the LENGTH command should have a non-zero entry.

Related topics

 GLOBAL command: LOGISTIC option, NPARM and NVTEST keywords (see Section 2.6.7)
 INPUT command: DIF and DRIFT options (see Section 2.6.9)
 LENGTH command: NVARIANT keyword (see Section 2.6.11)

The Response tab allows specification of the number of response alternatives, and codes for the
responses, not-presented and/or omitted items. In the case of a 3-PL model, the user may also
request that omitted responses be scored as fractionally correct. If the NPARM keyword on the
GLOBAL command is not set to 3 to indicate a 3-PL model (see the previous tab), any instructions
in the Omits will be scored group box will not be used.

Related topics

 GLOBAL command: OMITS option (see Section 2.6.7)
 INPUT command: NALT keyword (see Section 2.6.9)

Finally, the Labels tab provides the default item labels and group/test names. The user may enter
names in the respective fields, or import item labels from an external file by using the Browse
button next to the Item Label File Name field. After entering or selecting the file containing the
item labels, click the Import button. Alternatively, after completion of the Item Labels and Test
or Group fields, the user may save the labels to file using the Save button.

Related topics

 GROUP command: GNAME keyword (see Section 2.6.8)
 ITEMS command: INAMES keyword (see Section 2.6.10)
 TEST command: TNAME keyword (see Section 2.6.17)

Setup menu: Item Analysis dialog box

The Item Analysis dialog box has 5 tabs and is used to assign items to subtests, forms, and/or
groups. In addition, subtests to be calibrated are selected here. Calibration specifications control-
ling the iterative procedure are also entered on this dialog box.

On the Subtests tab shown below, labels for the subtests are entered in the first fields. The next
two fields are used to indicate the number of items per test. Note that variant items should also
be indicated here. The final column is used to select the subtests for which item parameter esti-
mation is required.

Related topics

 CALIB command: SELECT keyword (see Section 2.6.3)
 LENGTH command: NITEMS and NVARIANT keywords (see Section 2.6.11)

On the images below, links between the fields and the corresponding keywords are provided.

(The number of subtest-only items equals the subtest length minus the number of variant items.)

Table 2.2: Effect of test length

Add or remove entries in:

 the >LENGTH command: NITEMS
 the >PRIORS command: TMU, TSIGMA
 the variant test's >TEST command: INUMBER, INTERCPT, SLOPE, THRESHLD, GUESS, DISPERSN, FIX
 the subtest's >TEST command: INUMBER, INTERCPT, SLOPE, THRESHLD, GUESS, DISPERSN, FIX
 the >PRIORS command: SMU, SSIGMA, ALPHA, BETA

The Subtest Items tab allows the user to assign specific items to the main and variant tests. Note
that, if fewer items are selected here than were indicated on the Subtests tab, the information on
the Subtest Items tab will be adjusted accordingly (see table above for specific information).

 The Select and Unselect buttons may be used to include or exclude single items or sets of
items (selected by holding down the mouse button and dragging over a selection of items).
 Double-clicking a single item also reverses the state of the item.
 To reverse the state of a block of items, highlight the items and click the Toggle button.
 A variant item can only be selected when its corresponding subtest item is selected.
 Note that the table only supports rectangular blocks of items. There are two ways to high-
light a rectangular block of items:

Click and drag: Left-click on any one corner of the block you want to highlight, hold the mouse
button down and drag the mouse to the opposite corner of the block before releasing the mouse
button. All items bounded by the opposite corners used will be highlighted.

Click-Shift-Click: Left-click on any corner of the block you want to highlight. Press and hold
down the Shift key, move the mouse pointer to the opposite corner of the block and left-click.
All items bounded by the opposite corners used will be highlighted.

Related topics

 LENGTH command: NITEMS keyword (see Section 2.6.11)
 TEST command: INUMBERS keyword (see Section 2.6.17)

The next two tabs, Form Items and Group Items, are only available when a multiple-group or
multiple-form analysis was indicated on the Job Description tab of the General dialog box.
Both tabs have the same form and mode of operation as the Subtest Items tab previously dis-
cussed, the only difference being that information entered here is recorded on the FORM and
GROUP commands respectively. Both are used to indicate the length of the forms/groups and the
assignment of items to them.

Related topics

 FORM command: INAMES, INUMBERS, and LENGTH keywords (see Section 2.6.6)
 GROUP command: INAMES, INUMBERS, and LENGTH keywords (see Section 2.6.8)

The final tab of the Item Analysis dialog box is the Advanced tab, which controls the estimation of
item parameters. Most of the information pertains to the CALIB command. The number of itera-
tions and the convergence criterion are set at the top of the dialog box, while the number of items
and ability intervals for calculation of χ² item fit statistics are specified in the Chi-square Item
Fit Statistics group box. At the bottom of the dialog box, prior item constraints may be re-
quested, and the means of the prior distributions on the item parameters may be specified either
to be kept at a fixed value or to be estimated along with the parameters.

If a 3PL model is selected, all the prior check boxes in the Prior Item Constraints group box
will be enabled. In the case of a 2PL model, the Prior on Guessing check box is disabled, while
both the Prior on Guessing and Prior on Slope check boxes are disabled when a 1PL model is
fitted to the data.

Related topics

 CALIB command: CHI, CRIT, CYCLES, NEWTON, NQPT, FLOAT, EMPIRICAL, GPRIOR,
SPRIOR, and TPRIOR keywords (see Section 2.6.3)
 QUAD command: POINTS and WEIGHTS keywords (see Section 2.6.13)

Setup menu: Test Scoring dialog box

Information entered on the Test Scoring dialog box controls the type of scoring performed in
Phase 3 of the analysis. The General tab of this dialog box is used to select the method of scor-
ing and to import item parameters for scoring from previously saved files. In the latter case, the
Browse button at the bottom of the tab can be used to locate the file containing the item parame-
ters to be used for scoring.

Group-level fit statistics, the suppression of printing of scores to the output file when scores are
saved to an external file using the SCORE keyword on the SAVE command, and biweighted esti-
mates robust to isolated deviant responses are requested using the Group Level Fit Statistics,
List Scores, and Biweight Items radio buttons. On the images below, links between the fields
and the corresponding keywords are provided.

Related topics

 GLOBAL command: IFNAME and PRN keywords (see Section 2.6.7)
 SCORE command: BIWEIGHT, FIT, and NOPRINT options (see Section 2.6.16)
 SCORE command: IDIST and METHOD keywords

The Rescaling tab is associated with the RSCTYPE, LOCATION and SCALE keywords on the SCORE
command and is used to request the scaling of the ability scores according to user-specified val-
ues. Provision is made for different scaling options for different subtests.
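
A generated SCORE command might then read as follows (a sketch; the RSCTYPE code shown is an assumption, and the defined values are documented in Section 2.6.16):

>SCORE RSCTYPE=3, LOCATION=(50.0), SCALE=(10.0);

which would rescale the scores of a single subtest to a mean of 50 and a standard deviation of 10.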

Related topics

 SCORE command: LOCATION, RSCTYPE, and SCALE keywords (see Section 2.6.16)

2.3.4 Data menu

The Data menu is used to enter data or to provide information on the data file; type and number
of records in the data file; and answer, omit and not-presented keys if applicable (Item Keys op-
tion). A distinction is made between single-subject and group-level data (Examinee Data and
Group-level Data tabs respectively).

Data menu: Examinee Data dialog box

The Examinee Data dialog box deals with single-subject data. On the General tab of this dialog
box, the type and number of data records to be used in the analysis are specified. All of the en-
tries on this dialog box correspond to keyword values on the INPUT command, as indicated on
the image below. Note that when the check box labeled External Ability Criterion and Stan-
dard Error is checked, the External Ability and Ability S.E. data fields on the Data File tab
are enabled.

Related topics

 INPUT command: EXTERNAL, NIDCHAR, SAMPLE, TAKE, and TYPE keywords (see Section
2.6.9)
 INPUT command: PERSONAL option

The name of the raw data file and the format of the data are specified on the Data File tab. An
external data file may be selected using the Browse button at the top of the tab. Data may be dis-
played in the window below the Data File Name field by clicking the Show Data button.

Data can be read in free or fixed-column format. For fixed-format data, a format string is re-
quired to tell the program where in the file each data element is located. To ensure the accuracy
of the format information, the column locations of the various data elements can be determined
directly using the spreadsheet in which the data are displayed: clicking directly in the display
places a cursor whose exact position is shown by the Line: and Col: indicators.

Related topics

 GLOBAL command: DFNAME keyword (see Section 2.6.7)
 Input files (see Section 2.6.20)
 INPUT command: NFMT keyword (see Section 2.6.9)
 Variable format statement (see Section 2.6.18)

Data may be entered interactively on the third tab of the Examinee Data dialog box. The Ap-
pend button is used to add a new case to the end of the data file. The Insert button is used to in-
sert a new case at the current cursor location, while the Delete button is used to delete lines of
entered data. For example, if case 10 is highlighted in the table, pressing the Insert button will
insert a new case at case 10, and all cases starting from 10 will move one row down in the table.

In the Read as Fixed-Column Records group box, the user can indicate the number of data re-
cords per case and then fill in the information on the positions of the case ID, the form and group
numbers (if applicable), and the responses. The Set Format button is then clicked to create
automatically a format statement in the Format String data field. Alternatively, the format
statement may be entered directly in the Format String data field. Clicking the Set Fields button
will then automatically fill in the fields in the Read as Fixed-Column Records group box. Note
that with either method the response string must be continuous, that is, there can be no spaces in
the response string. Any attempt to specify non-continuous response data will result in incorrect
format and/or response information, and the data will not be read correctly. For example, if the
format string is "10A,1X,10A,1X,15A", clicking the Set Fields button will not set the field
information correctly.

Related topics

 GLOBAL command: DFNAME keyword (see Section 2.6.7)
 INPUT command: NFMT keyword (see Section 2.6.9)
 Variable format statement (see Section 2.6.18)

Data menu: Item Keys dialog box

The sole purpose of the Item Keys option on the Data menu is to provide the option to use an-
swer, not-presented or omit keys. The three tabs on the Item Keys dialog box are similar. The
possible key codes consist of the response codes entered in the Response Codes edit box on the
Response tab of the General dialog box on the Setup menu.

On the first tab, Answer Key, an answer key may be read from an external file using the Open
button and browsing for the file containing the answer key, or entered interactively in the win-
dow towards the top of the tab.

In the case of multiple forms, a separate answer key for each form should be provided. The for-
mat of the keys should be the same as that used for the raw response data. If a key is entered in-
teractively, the Save button may be used to save the entered information to an external file. The
file used as answer key is referenced by the KFNAME keyword on the INPUT command.
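
For example, for a four-item multiple-choice test whose responses are read with the format (3A1,1X,4A1), the INPUT command might reference the key file as follows (file name and key values hypothetical):

>INPUT NTOTAL=4, NIDCHAR=3, KFNAME='MYTEST.KEY';

where mytest.key contains a single record in the same format as the data, for example:

KEY ACBD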

Related topics

 INPUT command: KFNAME keyword (see Section 2.6.9)

The second tab is used for the not-presented key (if any) and information entered here is echoed
to the NFNAME keyword on the INPUT command.

Related topics

 INPUT command: NFNAME keyword (see Section 2.6.9)

The Omit Key tab is used for the omit key, if any. This tab corresponds to the OFNAME entry on
the INPUT command in the completed command file.

Related topics

 INPUT command: OFNAME keyword (see Section 2.6.9)

Data menu: Group-Level Data dialog box

The Group-Level Data dialog box is similar in purpose to the Examinee Data dialog box where
single-subject data may be entered. On this dialog box, however, information on the structure of
group-level data to be used in analysis is provided.

The General tab is used to provide information on the number of groups, group ID, and number
of data records and weights, if any, to use in analysis. All entries correspond to keywords on the
INPUT command.

Related topics

 INPUT command: EXTERNAL, NIDCHAR, SAMPLE, TAKE, and TYPE keywords (see Section
2.6.9)

The Browse button on the Data File tab allows the user to browse for the file containing the
group-level data. After clicking the Show Data button the contents of the selected file are dis-
played in the window below these buttons. The Format String field should be completed ac-
cording to the contents of the file. In contrast to item responses in the case of single-subject data,
which are read in “A” format, the frequencies in group-level data files are read in “F” format as
shown below.

Related topics

 GLOBAL command: DFNAME keyword (see Section 2.6.7)
 INPUT command: NFMT keyword (see Section 2.6.9)
 Variable format statement (see Section 2.6.18)

2.3.5 Technical menu

The first set of options on the Technical menu is used to assign starting values, prior constraints,
and information on prior latent distributions for both calibration and scoring during the analysis.
The last three options on this menu allow the user to exercise even more control over the
sampling and the EM and Newton cycles (Data Options); to request a Rasch model or plots per
group, or to prevent the adjustment of the latent distribution to a mean of 0 and S.D. of 1
(Calibration Options); and to calculate domain scores based on a user-supplied file containing
information on previously calibrated items (Score Options).

Technical menu: Item Parameter Starting Values dialog box

The Assign Item Parameter Starting Values option on the Technical menu may be used to
import starting values for item parameters from a saved item parameter or user-supplied file or,
alternatively, to enter starting values interactively.

The first tab on the Item Parameter Starting Values dialog box is used to select a previously
created file. To use an item parameter file created during a previous BILOG-MG analysis, check
the radio button next to the Import Saved Item Parameter File option. If starting values are
provided through a user-supplied file, check the radio button next to the Import User Supplied
File … option. The Browse button is used to locate the file.

Enter starting values for the item parameters on the Enter Values tab to set values for the corre-
sponding keywords on the TEST command. A subset of slope, threshold or asymptote parameters
may be selected by holding the mouse button down and dragging until the selection is complete.
Clicking the right mouse button will display a pop-up menu that can be used to assign values to
the parameters.

All selected parameters may be set to a specific value or to the default value. In addition, the user
may select one parameter and assign a value to this and all other parameters below it by selecting
the appropriate option from the pop-up menu. Alternatively, the Default Value or Set Value but-
tons may be used to assign values to the selected parameters.

There are two ways to select the cells for parameter values:

 Select a rectangular block of cells. Use either the click-and-drag or click-shift-click
method described in the discussion of the Item Analysis dialog box of the Setup menu.
Note that the selected cells must form one continuous rectangular block.
 Select one or more columns by clicking on the column header. The click-and-drag
method works when selecting a continuous block of columns. To select a disjoint block of
columns, press and hold the Ctrl key down when clicking the header.

Note that when selecting a block of cells, the Shift key is used. When selecting a block of col-
umns through clicking column headers, the Ctrl key is used.

Clicking the column header changes the selection state of the entire column. It toggles the items
in the column from the “selected” state to the “unselected” state and vice versa. The Save as
User Data option may be used to provide a name for the external file to which input is saved
with the file extension *.prm.

Related topics

 TEST command: DISPERSN, GUESS, INTERCPT, SLOPE, and THRESHLD keywords (see
Section 2.6.17)

Technical menu: Assign Fixed Items dialog box

This dialog box is associated with the FIX keyword on the TEST command, which is used to indi-
cate which items of a subtest are free to be estimated, and which are to be held fixed at their
starting values.

As with the Enter Values tab on the Item Parameter Starting Values dialog box discussed
above, cells may be selected in rectangular blocks or by columns. The same conventions for the
use of the Shift and Control keys apply. Additionally, double-clicking on any one cell under the
Fixed column also toggles the cell state: fixed to free or free to fixed.
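
The choices made here are written as a FIX list with one entry per item on the TEST command. A sketch for a four-item subtest, assuming that 1 marks a fixed item and 0 a free one, is:

>TEST TNAME=SPELL, INUMBERS=(1(1)4), FIX=(1,0,0,1);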

Related topics

 TEST command: FIX keyword (see Section 2.6.17)

Technical menu: Item Parameter Prior Constraints dialog box

The Item Parameter Prior Constraints dialog box is associated with the PRIORS command.
The number of tabs on the dialog box depends on the number of subtests – priors may be entered
for each subtest separately.

The user can set values by selecting an item or group of items and clicking the Set Value button.
A subset of cells in the displayed table may be selected by holding the mouse button down and
dragging until the selection is complete. Clicking the right mouse button will display a pop-up
menu, which can be used to assign values to the cells. All selected cells may be set to a specific
value or to the default value.

In addition, the user may select one parameter and assign a value to this and all other parameters
below it by selecting the appropriate option from the pop-up menu. A dialog box appears,
prompting the user to enter the value to be assigned.

Alternatively, the Default Value or Set Value buttons may be used to assign values to the se-
lected parameters. To set the priors of a selection of items to their default value, the Default
Value button may be used. Links between the fields on this dialog box and the corresponding
keywords on the PRIORS command are shown on the image below.
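
The entered values are written to the PRIORS command with one value per item for each selected keyword. A sketch for a four-item subtest (values hypothetical; TMU and TSIGMA denote the means and standard deviations of the threshold priors) is:

>PRIORS TMU=(0.0, 0.0, 0.0, 0.0), TSIGMA=(2.0, 2.0, 2.0, 2.0);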

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 PRIORS command: ALPHA, BETA, SMU, SSIGMA, TMU, and TSIGMA keywords (see Section
2.6.12)

Technical menu: Calibration Prior Latent Distribution dialog box

The Assign Calibration Prior Latent Distribution option provides the opportunity to assign
prior latent distributions, by subtest, to be used during item parameter estimation. This dialog
box is associated with the QUAD command(s). This option is only enabled when the IDIST key-
word is set to 1 or 2 on the CALIB command.

There is no interface option for setting this keyword; it must be set manually in the command file.
For assigning prior latent distributions to be used during scoring, see the Assign Scoring Prior
Latent Distribution dialog box.

The first image below shows the dialog box for a single group analysis. Quadrature points and
weights may be provided separately for each subtest. On the second image, the Calibration
Prior Latent Distribution dialog box for a multiple-group analysis is shown. Note that quadra-
ture points and weights may be entered per group and subtest, as a tab for each subtest is pro-
vided in this case, and that the set of positive fractions entered as Weights should sum to 1.0.
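
For example, a five-point discrete prior for one group and subtest could be supplied as follows (points and weights hypothetical; note that the weights sum to 1.0):

>QUAD POINTS=(-2.0, -1.0, 0.0, 1.0, 2.0), WEIGHTS=(0.1, 0.2, 0.4, 0.2, 0.1);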

The format of the table on the Calibration Prior Latent Distribution dialog box depends on the
values of the NTEST, NGROUP and IDIST keywords. Examples are shown below.

>GLOBAL NTEST=1, …
>INPUT NGROUP=1, …
>CALIB IDIST=1 or 2, …

>GLOBAL NTEST>1, …
>INPUT NGROUP=1, …
>CALIB IDIST=1, …

>GLOBAL NTEST>1, …
>INPUT NGROUP=1, …
>CALIB IDIST=2, …

>GLOBAL NTEST=1, …
>INPUT NGROUP>1, …
>CALIB IDIST=1 or 2, …

>GLOBAL NTEST>1, …
>INPUT NGROUP>1, …
>CALIB IDIST=1, …

>GLOBAL NTEST>1, …
>INPUT NGROUP>1, …
>CALIB IDIST=2, …

Related topics

 QUAD command: POINTS and WEIGHTS keywords (see Section 2.6.13)

Technical menu: Scoring Prior Latent Distribution dialog box

The Assign Scoring Prior Latent Distribution dialog box provides the opportunity to assign
prior latent distributions, by subtest, to be used during scoring. This dialog box is associated with
keywords on the SCORE and QUAD commands.

For assigning prior latent distributions to be used during the item parameter estimation phase, see
the Assign Calibration Prior Latent Distribution dialog box.

On the Normal tab of this dialog box, the type of prior distribution to be used for the scale scores
is the first information required. This tab is used when separate arbitrary discrete priors, for each
group or for each group and subtest, are to be read from QUAD commands. These options are
only available when the Expected A Posteriori (EAP) method of scale score estimation is used.

When maximum likelihood (ML) or Maximum A Posteriori (MAP) estimation is selected, these
options are disabled and the PMN and PSD keywords may be used to specify real-numbered values
for the means and standard deviations of the normal prior distributions. The default values of
these keywords for each group and subtest, 0 and 1 respectively, are displayed.

To provide alternative values for the PMN and PSD keywords, click in the fields and enter the new
values.
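
For example, scoring a single group and subtest against a normal prior with mean 0.5 and standard deviation 0.9 might generate a SCORE command such as the following (a sketch; the METHOD code for MAP estimation is an assumption documented in Section 2.6.16):

>SCORE METHOD=3, PMN=(0.5), PSD=(0.9);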

Information in the table below corresponds to numbers on the image shown overleaf.

 Set the number of entries in the POINTS and WEIGHTS keywords of the >QUAD commands to NGROUP * NTEST.
 Set the number of entries in the POINTS and WEIGHTS keywords of the >QUAD commands to NGROUP.

Related topics

 SCORE command: IDIST, PMN and PSD keywords (see Section 2.6.16)

The User Supplied tab allows the user to change the number of quadrature points to be used by
subtest. Different quadrature points and weights may be supplied for each group per subtest, as
shown in the image below where two subtests were administered to two groups of examinees.

Related topics

 QUAD command: POINTS and WEIGHTS keywords (see Section 2.6.13)


 SCORE command: NQPT keyword (see Section 2.6.16)

Technical menu: Data Options dialog box

To set the values for the random number generator seed used with the SAMPLE keyword on the
INPUT command, or to change the value of the acceleration constant used during the E-steps in
item calibration, the Data Options dialog box may be used. To use default values, the Set to De-
fault Value buttons may be clicked after which the program defaults will be displayed in the cor-
responding fields.

Note that:

 The Item Analysis and/or Scoring from Saved Master File Name section is the same as
the Master Data edit box in the Save Output to File dialog box.
 The dialog box does not read any data from the specified master file. The filename is sim-
ply copied to the MASTER keyword on the SAVE command.

Related topics

 CALIB command: ACCEL and NFULL keywords (see Section 2.6.3)
 INPUT command: ISEED and SAMPLE keywords (see Section 2.6.9)

Technical menu: Calibration Options dialog box

The Calibration Options dialog box is associated with keywords on the CALIB command.

To request a Rasch model, the One Parameter Logistic Model option should be checked. Sepa-
rate item plots for each group may be requested using the Separate Plot for Each Group check
box while adjustment of the latent distribution to a mean of 0 and S.D. of 1 may be suppressed
using the first check box. To keep the prior distributions of ability in the population of respon-
dents fixed at the value specified in the IDIST keyword and/or the QUAD commands, the Fixed
Prior Distribution of Ability check box should be checked. This corresponds to the FIXED op-
tion on the CALIB command.

Related topics

 CALIB command: FIXED, GROUP-PLOTS, NOADJUST and RASCH options (see Section 2.6.3)

Technical menu: Score Options dialog box

This dialog box allows the user to request the calculation of domain scores based on a user-
supplied file containing the item parameters for a sample of previously calibrated items for a
domain and to request the computation and listing of the coefficients of skewness and kurtosis of
the ability estimates and of the latent distribution.

Related topics

 SCORE command: DOMAIN and FILE keywords (see Section 2.6.16)
 SCORE command: MOMENTS option

2.3.6 Save menu

The Save Output to File dialog box is accessed through the Save menu. Various types of data
may be saved to external files using the SAVE command. On the image below, links are provided
between the fields of this dialog box and the corresponding keywords on the SAVE command.
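
For example, requesting that the item parameter estimates and the scale scores be saved might generate the following fragment (file names hypothetical):

>GLOBAL DFNAME='MYDATA.DAT', NPARM=2, SAVE;
>SAVE PARM='MYDATA.PAR', SCORE='MYDATA.SCO';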

Related topics

 GLOBAL command: SAVE option (see Section 2.6.7)
 SAVE command (see Section 3.2.13)

2.3.7 Run menu

The Run menu provides the necessary options to generate syntax from the input provided on the
dialog boxes accessed through the Setup, Data, and Technical menus (Build Syntax option) or
to run separate or all phases of the analysis. This menu is also used to access the graphics proce-
dure described in Chapter 6 via the Plot option. Note that this option is only enabled after com-
pletion of the three phases of analysis.

Select the Build Syntax option to generate a syntax or command file based on the contents of the
previous dialog boxes and menus. When the Initialize option is selected, changes made to an
existing command file in the syntax window are transferred to the dialog boxes and menus.

Run only the first phase of the analysis to obtain the classical statistics by selecting the Classical
Statistics Only option. The item parameter estimation may be performed next by selecting the
Calibration Only option, and scoring after that using the Scoring Only option. These options
are provided to allow the user to run and verify information in the output for each phase of the
analysis before continuing to the next step. When running the analysis phase by phase, the option
to run the next phase will only be enabled after successful completion of the previous phase.

Alternatively, the user can request to run all three phases in succession by selecting the Stats,
Calibration and Scoring option. A message indicating the normal or abnormal termination of
each phase will appear in the main window between phases to alert the user to possible problems
in a particular phase of the analysis. This message may be suppressed using the Options menu.

2.3.8 Output menu

To view the output obtained during any of the three phases of analysis, the options on the Out-
put menu may be used. Options will be enabled or disabled depending on the number of com-
pleted phases of the analysis. When any of these options is selected, the relevant output file will
be displayed. After inspection, the user may close this file to return to the main BILOG-MG
window, where the command file on which the analysis was based will be displayed.

2.3.9 View menu

The View menu allows the user to add or remove the status bar displayed at the bottom of the
main BILOG-MG window. The toolbar, allowing the standard Windows editing functions, is
displayed by default.

2.3.10 Options menu

The Options menu provides access to the Settings dialog box. This dialog box has three tabs:
General, Editor, and Server.

On the General tab as shown below, the size of the application window and document window
can be set. The user may opt to always open the last active document when opening BILOG-MG
(default) or to start with a blank screen instead by unchecking the Open last active document
on start check box.

To change the font in which the contents of the editor window are displayed, or to use tabs, the
Editor tab of the Settings dialog box may be used. Reminders of file changes and automatic re-
loading of externally modified documents may also be requested.

The Server tab of the Settings dialog box may be used to show or hide the windows in which
details of the analysis are displayed during the run. To open multiple command files, which can
then be accessed using the Window menu, check the box next to the Allow multiple command
file… option on this tab.

2.3.11 Window menu

The Window menu allows the user to arrange multiple windows or to switch between open files.
To open multiple command files simultaneously that may be accessed through this menu, use the
Server tab on the Settings dialog box accessed through the Options menu.

2.3.12 Help menu

The Help menu provides access to the BILOG-MG help file (Help Topics option) and to the
About BILOG-MG for Windows dialog box in which the version and build number of the ap-
plication are displayed. This box may also be used to directly e-mail SSI for technical support or
product information or to link to the SSI website.

2.3.13 Location of keywords in interface

Command and keyword Menu and option Tab on dialog box

TITLE command Setup, General Job Description
COMMENT command Setup, General Job Description
GLOBAL command:
DFNAME    Data, Examinee Data / Data, Group-Level Data    Data File / Enter Data
MFNAME - -

CFNAME - -

IFNAME Setup, Test Scoring General

NPARM Setup, General Model

NWGHT - -

NTEST Setup, General Job Description

NVTEST    Setup, General    Job Description

LOGISTIC Setup, General Model

OMITS Setup, General Response

SAVE Save -

PRNAME Setup, Test Scoring General
SAVE command:

MASTER Save -

CALIB Save -

PARM Save -

SCORE Save -

COVARIANCE Save -

TSTAT Save -

POST Save -

EXPECTED Save -

ISTAT Save -

DIF Save -

DRIFT Save -

PDISTRIB Save -
LENGTH command:

NITEMS Setup, Item Analysis Subtests

NVARIANT Setup, Item Analysis Subtests

INPUT command:

NTOTAL Setup, General Job Description

NFMT    Data, Examinee Data / Data, Group-Level Data    Data File

TYPE    Data, Examinee Data / Data, Group-Level Data    General

SAMPLE    Data, Examinee Data / Data, Group-Level Data    General
NALT Setup, General Response

NIDCHAR    Data, Examinee Data / Data, Group-Level Data    General

TAKE    Data, Examinee Data / Data, Group-Level Data    General
NGROUP Setup, General Job Description

NFORM Setup, General Job Description

DIAGNOSE - -

KFNAME Data, Item Keys Answer Key

NFNAME Data, Item Keys Not Presented Key

OFNAME Data, Item Keys Omit Key

DRIFT Setup, General Model

DIF Setup, General Model

PERSONAL Data, Examinee Data General

EXTERNAL    Data, Examinee Data / Data, Group-Level Data    General
ISEED Technical, Data Options -
ITEMS command:

INUMBERS - -

INAMES Setup, General Labels

TEST command:

TNAME Setup, General Labels

INUMBERS Setup, Item Analysis Subtest Items

INAMES Setup, General Labels

INTERCPT    Technical, Assign Item Parameter Starting Values    Import / Enter Values

SLOPE    Technical, Assign Item Parameter Starting Values    Import / Enter Values

THRESHLD    Technical, Assign Item Parameter Starting Values    Import / Enter Values

GUESS    Technical, Assign Item Parameter Starting Values    Import / Enter Values

DISPERSN    Technical, Assign Item Parameter Starting Values    Import / Enter Values
FIX Technical, Assign Fixed Items -

FORM command:

LENGTH Setup, Item Analysis Form Items

INUMBERS Setup, Item Analysis Form Items
INAMES Setup, General Labels
GROUP command:

GNAME Setup, General Labels

LENGTH Setup, Item Analysis Group Items

INUMBERS Setup, Item Analysis Group Items

INAMES Setup, General Labels
DRIFT command:

MAXPOWER - -

MIDPOINT - -
Variable format statement    Data, Examinee Data / Data, Group-Level Data    Data File
CALIB command:

CHI Setup, Item Analysis Advanced

NQPT Setup, Item Analysis Advanced

CYCLES Setup, Item Analysis Advanced

NEWTON Setup, Item Analysis Advanced

PRINT - -

CRIT Setup, Item Analysis Advanced

IDIST - -

PLOT - -

DIAGNOSIS - -

REFERENCE Setup, General Job Description

SELECT Setup, Item Analysis Subtests

RIDGE - -

ACCEL Technical, Data Options -

NSD - -

COMMON - -

EMPIRICAL Setup, Item Analysis Advanced

NORMAL - -

FIXED Technical, Calibration Options -

TPRIOR Setup, Item Analysis Advanced

SPRIOR Setup, Item Analysis Advanced

GPRIOR Setup, Item Analysis Advanced

NOTPRIOR Setup, Item Analysis Advanced

NOSPRIOR Setup, Item Analysis Advanced

NOGPRIOR Setup, Item Analysis Advanced

READPRIOR    Technical, Item Parameter Prior Constraints    -
FLOAT Setup, Item Analysis Advanced

NOFLOAT Setup, Item Analysis Advanced

GROUP-PLOTS Technical, Calibration Options -

NOADJUST Technical, Calibration Options -

RASCH Technical, Calibration Options -

NFULL Technical, Data Options -

QUAD command:

POINTS    Technical, Assign Calibration Prior Latent Distributions    -

WEIGHTS    Technical, Assign Calibration Prior Latent Distributions    -

PRIORS command:

TMU    Technical, Assign Item Parameter Prior Constraints    -

TSIGMA    Technical, Assign Item Parameter Prior Constraints    -

SMU    Technical, Assign Item Parameter Prior Constraints    -

SSIGMA    Technical, Assign Item Parameter Prior Constraints    -

ALPHA    Technical, Assign Item Parameter Prior Constraints    -

BETA    Technical, Assign Item Parameter Prior Constraints    -
SCORE command:

METHOD Setup, Test Scoring General

NQPT    Technical, Assign Scoring Prior Latent Distribution    User-Supplied

IDIST (values=0,3)    Setup, Test Scoring    General

IDIST (values=1,2)    Technical, Assign Scoring Prior Latent Distribution    Normal

PMN    Technical, Assign Scoring Prior Latent Distribution    Normal

PSD    Technical, Assign Scoring Prior Latent Distribution    Normal
RSCTYPE Setup, Test Scoring Rescaling

LOCATION Setup, Test Scoring Rescaling

SCALE Setup, Test Scoring Rescaling

INFO - -

BIWEIGHT Setup, Test Scoring General

FIT Setup, Test Scoring General

NOPRINT Setup, Test Scoring General

YCOMMON - -

POP - -

REFERENCE - -

READF - -

NFORMS - -

MOMENTS Technical, Score Options -

DOMAIN Technical, Score Options -

FILE Technical, Score Options -

QUADS command:

POINTS    Technical, Assign Scoring Prior Latent Distribution    User-Supplied

WEIGHTS    Technical, Assign Scoring Prior Latent Distribution    User-Supplied

Related topics

 Overview of required and optional commands.

2.4 Getting started with BILOG-MG

To illustrate the use of the interface in creating syntax files, the data file exampl01.dat in the
examples subfolder of the BILOG-MG installation folder is used. This problem is based on an
example in Thissen, Steinberg & Wainer (1993). Other examples based on the same data (see
complete description below) can be found in Chapter 10.

In the late 1980s, R. Darrell Bock created a “College Level Spelling Test” comprising a sample
of 100 words drawn from a large source list by simple random sampling. Data collected using
that test are the basis for the empirical example in the paper “IRT Estimation of Domain Scores”
(R.D. Bock, M.F. Zimowski, & D. Thissen, Journal of Educational Measurement, 1997, 34, 197-
211). Parameter estimates for the 2PL IRT model for the 100-item test are tabulated in that pa-
per. Bock created the script for conventional oral presentation of the test, and recorded the origi-
nal reading of the script (by Monica Marie Bock) on reel-to-reel magnetic tape. Subsequent cop-
ies onto cassette tape were used by Jo Ann Mooney in the collection of data from around 1000
University of Kansas undergraduates. We are using the file with 100 words (items) and 1000
records (examinees).

The words for the test were randomly selected from a popular wordbook for secretaries. Students
were asked to write the words as used in a sentence on the tape recording. Responses were
scored 1 if spelled correctly and 0 if spelled incorrectly. Because the items are scored 1/0, in
accordance with the defaults assumed by the program, an answer key is not required.

The purpose of this section is to give the new user a quick overview of the interface and the ab-
solute minimum input needed to run the program. In Chapter 11, the syntax and keywords of
each example are discussed in detail.

A few lines of the data in exampl01.dat are shown below:

11 0000
21 0001
31 1000
41 1001

162 1111

The first three characters in each line represent the examinee identification field. This is followed
by the responses to the four items in the test.

2.4.1 A first model: 2PL model for spelling data

As a first example, we wish to set up a simple 2-PL model for this data. To construct a command
file, begin by selecting the New option from the File menu. The Open dialog box is now acti-
vated.

Assign a name, with the *.blm file extension, to the command file. In this case, the command file
first.blm is created in the examples folder as shown below. Click Open when done to return to
the main BILOG-MG window.

Note that a number of options have been added to the main menu bar of the BILOG-MG win-
dow. Of interest for this example are the Setup, Data, Run and Output options. The Setup
menu is used to describe the model to be fitted to the data. As a first step, select the General op-
tion from this menu to access the General dialog box.

The General dialog box has four tabs, on which both required and optional keywords may be
set. On the Job Description tab below, the number of items in the test is indicated as 4. The type
of model is selected on the Model tab. As the default model fitted by BILOG-MG is a 2PL
model, this tab is not used now. Click OK to return to the main window.

The next step in specifying the analysis is to assign the items to be calibrated to the test. To do
this, select the Item Analysis option on the Setup menu to access the Item Analysis dialog box.

Change the default value of 1 under Subtest Length to 4 by clicking in this field and typing in
“4”. By default, all items will be analyzed, as indicated under the Analyze this run header. Click
OK when done.

This completes the model specification. All that remains to be done is to provide information on
the data. To do so, the Data menu is used. In this case, we have examinee data and thus the Ex-
aminee Data option is selected from the Data menu.

On the Examinee Data dialog box, enter the number of characters representing the examinee
identification (in this case 3) in the Number of Case ID Characters field. By default, all data
are used as shown below.

To provide information on the name and format of the data file, click the Data File tab.

 Use the Browse button to browse for the data file.
 Next, indicate that the data are in fixed format by clicking the Read as Fixed-Column
Records radio button.
 Complete the table in the Read as Fixed-Column Records group box by clicking in the
cell next to Case ID under the First header. Enter a “1” here to indicate that the examinee
identification starts in column 1 of the data. Next, enter a “3” under the Last header to in-
dicate the end of the examinee identification.
 Note that the Form Number, Group Number, etc. fields are disabled due to the information
we entered from the Setup menu. The only other fields to complete are the Response String
fields.
 The response to the first item is in column 5 of the data, so a “5” is entered in the cell next
to Response String under the First header. The response to the last item is in column 8,
and an “8” is thus entered in the Last column.
 By default, BILOG-MG assumes that there is one line of data for each examinee, as indi-
cated by the Number of Data Record per Case field. As this is the case for data in ex-
ampl01.dat, no further information is required. Click the Set Format button to write the
format statement (3A1,1X,4A1) to the Format String field. Click OK to return to the
main BILOG-MG window.

Having completed the specification in terms of model and data, the command file is created by
selecting the Build Syntax option from the Run menu. The syntax created by the program is
now displayed in the main window, as shown below. Note that no options are given on the ITEMS
and SCORE commands in this file, indicating that all program defaults will be used.
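
The generated command file should resemble the sketch below (the title lines and default test name may differ slightly from what the interface produces):

FIRST.BLM - 2PL MODEL FOR FOUR SPELLING ITEMS
EXAMINEE DATA IN EXAMPL01.DAT
>GLOBAL DFNAME='EXAMPL01.DAT', NPARM=2;
>LENGTH NITEMS=(4);
>INPUT NTOTAL=4, NIDCHAR=3;
>ITEMS;
>TEST TNAME=TEST0001, INUMBERS=(1(1)4);
(3A1,1X,4A1)
>CALIB;
>SCORE;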

Save the completed syntax to file by selecting the Save option on the File menu.

The analysis is now performed by using some of the other options on the Run menu. Although
the analysis can be done phase by phase (using the Classical Statistics Only, Calibration Only,
and Scoring Only options) all three phases can be run in sequence by selecting the Stats, Cali-
bration and Scoring option from this menu.

After successful completion of all three phases of the analysis, a message to this effect is dis-
played on the screen. If a problem was encountered during analysis, this message box will indi-
cate that all phases were not completed successfully. Access the output from the analysis through
the Output menu. Classical statistics are given in the *.ph1 file.

A section of this output file is shown below.

In the first.ph2 file, the results of the item calibrations are given. The item parameter estimates
for the four items in the test are shown below.

Scoring results are given in the first.ph3 file. The complete list of scores is printed to this file by
default. A section of this output, showing summary statistics for the score estimates, is shown
below.

2.4.2 A second model: DIF model for spelling data

The data analyzed in the previous example actually came from two groups of respondents. The
groups in this example are the two sexes. The same four items are presented to both groups on a
single test form. The group indicator is found in column 3 of the data records.

11 0000
21 0001
31 1000
41 1001

162 1111

The third column of the data contains either a 1 or a 2, indicating whether an examinee belonged
to group 1 (male) or group 2 (female).

The previous single-group analysis for this group, contained in the command file first.blm, is
modified to perform a DIF analysis for the two groups. As a first step, the General option on the
Setup menu is used to indicate the presence of multiple groups.

On the Job Description tab of the General dialog box, change the Number of Examinee
Groups from the default value of 1 to 2, as shown below.

In the case of a DIF model, a 1PL model is required. To change the model from the default 2PL
model previously used, click the Model tab and check the 1-Parameter Logistic (1PL) radio
button in the Response Model group box.

To request a DIF model, click the Differential Item Functioning (DIF) radio button in the Spe-
cial Models group box. By default, the first group will be used as reference group as indicated in
the Reference Group field.

Once this is done, all necessary changes to the General dialog box have been made. Click OK to
return to the main BILOG-MG window.

The allocation of items to be calibrated for each of the two groups is specified using the Item
Analysis option of the Setup menu. Once this option is selected, the Item Analysis dialog box is
displayed.

Leaving the Subtests tab as previously completed, click the Group Items tab. By default, all
items will be selected for the first group. This is indicated by the display of the item names in a
bold font in the first column of the table. To select all four items for the second group, click on
ITEM0001 in the second column. While holding the Shift button on the keyboard down, click on
ITEM0004. All four items are now highlighted. Click the Select button at the bottom left of the
dialog box to select all items.

This completes the model specification. Click OK to return to the main window.

The only remaining task is to revise the reading of the data file so that the group identification
field can be recognized and processed by the program. To do so, select the Examinee Data op-
tion from the Data menu.

On the General tab of the Examinee Data dialog box, the number of case identification charac-
ters is now decreased to 2, as shown below. (Recall that previously this field was set to 3: in ef-
fect, a combination of actual examinee ID and group ID was used to identify the cases in the
previous example.)

The format statement is now adjusted accordingly by changing the entries in the Read as Fixed-
Column Records group box:

 The Last value for the Case ID is set to 2.
 The Group Number field, now enabled due to our selection of a multiple-group analysis
from the Setup menu, is set to 3 under both the First and Last headers, as the group iden-
tification, given in column 3 of the data, is one character in length.
 Finally, these changes are made to the format statement by clicking the Set Format but-
ton. Note that the format statement now contains an additional “I1” indicating the group
number in integer format.

This completes the syntax specification. Return to the main window by clicking the OK button.

The revised command file is generated by selecting the Build Syntax option from the Run
menu.

After generating the syntax, it is saved to file using the Save As option on the File menu. The
revised syntax is saved in the file second.blm in the examples folder. Click the Save button after
specifying a name for and path to the new command file.

When the syntax displayed in the main BILOG-MG window is compared to the first example,
we note the addition of two GROUP commands and the NGROUP and DIF keywords on the INPUT
command. The revised format statement is also included. The NPARM keyword on the GLOBAL
command (not shown here) indicates that a 1-PL model is requested.
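
The revised command file should look something like the following sketch (title lines and names may differ):

SECOND.BLM - 1PL DIF MODEL FOR FOUR SPELLING ITEMS
TWO GROUPS OF EXAMINEES
>GLOBAL DFNAME='EXAMPL01.DAT', NPARM=1;
>LENGTH NITEMS=(4);
>INPUT NTOTAL=4, NGROUP=2, NIDCHAR=2, DIF;
>ITEMS;
>TEST TNAME=TEST0001, INUMBERS=(1(1)4);
>GROUP1 GNAME=MALES, LENGTH=4, INUMBERS=(1(1)4);
>GROUP2 GNAME=FEMALES, LENGTH=4, INUMBERS=(1(1)4);
(2A1,I1,1X,4A1)
>CALIB REFERENCE=1;
>SCORE;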

The three phases of the analysis can be run separately using the Classical Statistics Only, Cali-
bration Only, and Scoring Only options on the Run menu. To run the phases sequentially, se-
lect the Stats, Calibration, and Scoring option from this menu.

Output for the analysis is accessed as before using the Output menu from the main menu bar.

In the partial output from the second.ph1 file for this DIF analysis, classical item statistics are
provided by group. Similar statistics are also given for the combined group (not shown below).

The Phase 2 output in the second.ph2 file provides item parameter estimates, and DIF specific
output as shown below.

Although the second.ph3 file is created as shown below, no scoring is performed in the case of a
DIF analysis.

2.5 Syntax
2.5.1 Data structures: ITEMS, TEST, GROUP and FORM commands

In addition to conventional IRT analysis of one test administered to one group of examinees,
BILOG-MG is capable of analyzing data from test development and scoring applications in
which multiple alternative test forms, each consisting of multiple subtests or scales, are adminis-
tered to persons in one or more groups. BILOG-MG relies on a system of four commands,
ITEMS, TEST, FORM, and GROUP, describing the assignment of items to subtests, forms, and
groups. The syntax of these commands is discussed in detail in the syntax section. Here a de-
scription is given of how the commands work together to accommodate a wide range of applica-
tions.

Related topics

 The FORM command
 The GROUP command
 The ITEMS command
 The TEST commands
 How the FORM and GROUP commands work
 Setup menu: General dialog box
 Setup menu: Item Analysis dialog box

The ITEMS command

The ITEMS command attaches names and numbers to items of the test instrument. In the TEST,
FORM, and GROUP commands, the user can select items either by name or number. As a conven-
ience to the user, the program can automatically create sequences of eight-character item names
consisting of a user-supplied alphabetical section and a sequential numerical section. An
ITEMS command is required in all applications of the program. It lists the entire set of items ap-
pearing in the test instrument. In the TEST, FORM, and GROUP commands, the full set, or subsets of
it, may appear. The examples illustrate the use of the ITEMS command in a variety of applications
of the program.
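
For example (a sketch; the stem and count are arbitrary), the command

>ITEMS INAMES=(SPELL01(1)SPELL25);

generates the twenty-five item names SPELL01, SPELL02, ..., SPELL25.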

Related topics

•  The FORM command
•  The GROUP command
•  The TEST commands
•  How the FORM and GROUP commands work
•  Setup menu: General dialog box

The TEST command

The TEST commands describe the subtests (or scales) that will be scored in the test. There is a
separate TEST command for each subtest. A subtest may consist of a combination of items in the
instrument, including items that appear on different test forms making up the instrument. The
TEST commands identify the items belonging to each subtest.

In addition, when the LENGTH command indicates variant items are present in a particular subtest
(items that are included in the test to obtain item statistics for a subsequent form of the test but
are not used in computing test scores), the user identifies these items with the corresponding sub-
test by means of an additional TEST command that immediately follows the TEST command of
the subtest.

If the entire instrument is analyzed as a single subtest without variant items, the problem setup
requires a single TEST command that lists all the items in the test instrument. Chapter 10 illus-
trates this type of application.

If multiple subtests of items are selected for analysis, a separate TEST command is required for
each subscale. The example in Section 10.6 illustrates the problem setup for an analysis with
multiple subtests within a single test form. The example discussed in Section 10.8 shows the
setup for analysis with multiple subtests for an instrument and with multiple test forms. Section
10.7 illustrates the special TEST command setup for an instrument with variant items.
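As a rough sketch of the variant-item setup (the test names and item numbers here are hypo-
thetical; Section 10.7 gives the authoritative example), a subtest whose items 1 through 20 are
scored and whose items 21 through 25 are variant items might be specified with a pair of TEST
commands:

>TEST1 TNAME=MAIN, INUMBERS=(1(1)20);
>TEST1 TNAME=VARIANT, INUMBERS=(21(1)25);

with the variant TEST command immediately following the TEST command of its subtest, as de-
scribed above.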

Related topics

•  LENGTH command
•  Setup menu: General dialog box
•  Setup menu: Item Analysis dialog box
•  Technical menu: Assign Fixed Items dialog box
•  Technical menu: Item Parameter Starting Values dialog box


The FORM command

The FORM command controls the input of the response record. It lists the items in the order in
which they appear in the data records. Most applications of BILOG-MG require at least one FORM
command.

There are two arrangements in which multiple forms data can be supplied to the program. We
refer to them as the expanded format and the compressed format (see also the file structure speci-
fications):

Expanded format

The response record of each examinee spans the entire set of items appearing in the test instru-
ment. Each item of the test instrument has a unique location (column) in the input records. A not-
presented code appears in the locations of the items belonging to forms not presented to a given
examinee. Expanded format is convenient for users who store data in two-dimensional (row by
column) arrays typical of many database systems. This format requires only a single FORM com-
mand, even though the data arise from multiple forms. Note that the order of the items in the in-
put records, and thus the order of their listing on the FORM command, does not have to be the
same as that in the list of names and numbers in the ITEMS command (although ordinarily it
would be). Note also that a code to identify the form administered to a particular examinee is not
read by the program from an expanded format record.
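A minimal sketch of expanded format (assuming a hypothetical six-item instrument in which
form 1 carries items 1-3 and form 2 carries items 4-6, and with 9 standing in for the not-
presented code supplied through the not-presented key):

PERSON01 132999
PERSON02 999311

>FORM1 LENGTH=6, INUMBERS=(1(1)6);

The records carry no form indicator; the not-presented codes alone show which items each
examinee received.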

Compressed format

The data record for each examinee contains responses only to the items presented to that person,
and the responses appear in the same column field of each record (the number of columns is
equal to the number of items in the longest test form). Data entry in the compressed format is
easier than in the expanded format and results in smaller data files.

With compressed-format data, the locations of the items in the input records are not unique. An
item in one record may occupy the same column as a different item in another record. A separate
FORM command is therefore required for each test form in the instrument. In addition, each re-
sponse record must contain a number identifying the FORM command that applies to that record.
The number (1, 2, 3, etc.) refers to the order of the FORM command in the command file. The item
list of the corresponding FORM command gives the names or numbers of the items in the order
that they appear in the response field of the data records (see Section 2.6.18 for details). Inter-
nally, the program works by expanding the compressed records and inserting not-presented
codes in locations corresponding to the forms not administered to the examinee.

Related topics

•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Item Analysis dialog box


The GROUP command

GROUP commands are required whenever a multiple-group analysis is specified. The number of
commands is equal to the number of groups in the analysis. GROUP commands serve two pur-
poses. First, they identify groups of respondents for multiple-group analysis. Second, they iden-
tify the set of items administered to each group. Note that whenever a multiple-group analysis is
requested, each response record must contain a number identifying the GROUP command that ap-
plies to that record.

Related topics

•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Item Analysis dialog box

How the FORM and GROUP commands work

The FORM and GROUP commands control the input of the individual response records. How they
work together depends on the following:

•  the format of the response records (expanded or compressed)
•  the number of forms in the test instrument
•  the assignment of test forms to the respondents
•  the number of groups in the IRT analysis

The sections below describe how these factors determine the structure of the FORM and GROUP
commands.

Instruments with a single test form

When an instrument consists of a single test form, a single FORM command is assumed, and it ap-
plies to all records. The program reads the entire response records according to the specifications
on the command. If a FORM command is not included in the problem setup, the program reads the
response records according to the order of items in the ITEMS command list. As in all applica-
tions with a single FORM command, the response records do not contain a form indicator.

Single-group analysis

The examples in Sections 10.1 and 10.3 illustrate the simple case of a single-group analysis of a
single test form. The program reads all response records according to the specifications on the
FORM or ITEMS commands. GROUP commands are not required for the analysis.

Multiple-group analysis

In multiple-group analysis of a single test form, the groups may represent naturally occurring
subgroups within a population of respondents, or groups of respondents drawn from different
populations. In either case, the structure of the FORM and GROUP commands is the same. A single
FORM command applies to all response records and a separate GROUP command is required for
each group of respondents in the analysis. Because all respondents receive the same test form,
and thus respond to the same set of items, the lists of items in the GROUP commands are the same
for all groups in the analysis. The lists include all of the items specified in the FORM or ITEMS
commands.

The primary function of GROUP commands in applications of this type is to identify the groups of
respondents for multiple-group analysis. The example in Section 10.2 shows how this command
structure applies to examinations of differential item functioning in subgroups of a population.
Group differences in the latent distributions of ability may also be examined in this way.
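A minimal sketch of this command structure (the group names, item count, and ID-field width
are hypothetical) for a two-group DIF analysis of a single 20-item form:

>INPUT NTOTAL=20, NGROUP=2, DIF, NIDCHAR=10;
>ITEMS INUMBERS=(1(1)20);
>TEST1 TNAME=TOTAL, INUMBERS=(1(1)20);
>GROUP1 GNAME=MALES, INUMBERS=(1(1)20);
>GROUP2 GNAME=FEMALES, INUMBERS=(1(1)20);

Each response record would carry a group indicator (1 or 2) identifying the GROUP command
that applies to it.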

Instruments with multiple test forms

When an instrument consists of multiple test forms, the structure of the FORM and GROUP com-
mands depends in part on whether the forms are administered to equivalent or nonequivalent
groups of respondents. If the forms of the instrument are randomly assigned to respondents
drawn from a single population, the groups are equivalent, and the data may be analyzed with a
single-group IRT model. GROUP commands are not required in this case, but may be added to ex-
amine subgroup differences in item functioning. When test forms are administered to nonequiva-
lent groups of respondents, the forms must contain common “linking” items, and a multiple-
group analysis is necessary to place the items from the forms on the same scale. GROUP com-
mands are required in this case.

The number of GROUP commands corresponds to the number of groups in the analysis. In multi-
ple-form applications the response records may follow either of the two formats. The sections
below show how the structure of the FORM and GROUP commands depends on these formats.

Single-group analysis

When there are multiple forms in the test instrument but only one group of examinees, multiple
FORM commands are required if the compressed data format is used, but a GROUP command or a
group indicator on the response records is not required. Section 10.4 illustrates an application of
this type.

Multiple-group analysis

In the case of multiple forms and multiple groups, all such applications can be handled by ex-
panded format. Only one FORM command is then required and the data records will not contain a
forms indicator. Similarly, if the assignment of items to groups is performed in expanded format,
including the codes for items presented to a given examinee in a given group, the GROUP com-
mands require only the group names, not the item identifications. Specification of the items as-
signed to each group will, however, shorten the run time. The example in Section 10.5 illustrates
this type of data structure. The expanded style of data entry is mandatory in applications where
the test forms contain more than one subtest and the examinee is assigned to different groups for
different subtests. This can occur in complex two-stage testing designs.

In more typical applications, however, whole forms rather than subtests are assigned to groups.
In this case, the compressed style of data entry is suitable and may be more convenient. The
GROUP commands must then contain, in addition to the group name, a list of all items on all forms
assigned to the corresponding groups. The data records must include both a forms identifier and
a group identifier. The advantage of this method is that response records need not contain codes
for not-presented items. Examples illustrating this type of data input are discussed in Sections
10.4 and 10.8 respectively.
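As a sketch of the compressed arrangement (the group names, form lengths, and item numbers
are hypothetical), two groups each receiving their own 30-item form, with items 21-30 serving
as the common linking items, might be specified as:

>FORM1 LENGTH=30, INUMBERS=(1(1)30);
>FORM2 LENGTH=30, INUMBERS=(21(1)50);
>GROUP1 GNAME=POP1, INUMBERS=(1(1)30);
>GROUP2 GNAME=POP2, INUMBERS=(21(1)50);

Each data record would then carry both a form indicator and a group indicator in the columns
given by the variable format statement.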

Related topics

•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Item Analysis dialog box

2.6 Using the command language


2.6.1 Overview of syntax

BILOG-MG uses command lines employing the general syntax:

>NAME KEYWORD1=N, KEYWORD2=(list), …, OPTIONn;

The following rules apply:

•  A greater-than sign (>) must be entered in column 1 of the first line of a command and
   followed without a space by the command name.
•  All command names, keywords, options, and keyword values may be entered in upper
   and/or lower case.
•  Command names, keywords, options, and keyword values may be entered in full or
   abbreviated to the first three characters.
•  At least one space must separate the command name from any keywords or options.
•  Commas must separate all keywords and options.
•  The equals sign is used to set a keyword equal to a value, which may be integer, real, or
   character. A real value must contain a decimal point. A character value must be enclosed
   in single quotes if it:
   o  contains more than eight characters
   o  begins with a numeral
   o  contains embedded blanks, commas, slashes, or semi-colons

   For example: DFNAME='EXAMPL01.DAT', TNAME='20-ITEMS';

•  A keyword may be vector valued, i.e., set equal to a list of integer, real, or character
   constants, separated with commas or spaces, and enclosed in left and right parentheses (as
   KEYWORD2 above).
•  If the list is an arithmetic progression of integer or decimal numbers, the short form, first
   (increment) last, may be used. Thus, a selection of items 1, 3, 7, 8, 9, 10, 15 may be
   entered as 1,3,7(1)10,15. Real values may be entered in a similar way.
•  If the values in the list are equal, the form, value (0) number of values, may be used. Thus
   1.0, 1.0, 1.0, 1.0, 1.0 may be entered as 1.0(0)5.
•  The italic elements in the format description are variables that the user needs to replace.


•  Command lines may not exceed 80 columns. Continuation on one or more lines is
   permitted.
•  Each command terminates with a semi-colon (;). The semi-colon signals the end of the
   command and the beginning of a new command.
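Putting several of these rules together, the two lines below (a sketch with a hypothetical file
name) are read as one GLOBAL command: the command name and keywords are abbreviated to
their first three characters, the command continues onto a second line, and the semi-colon ends it.

>GLO DFN='EXAMPLE.DAT', NPA=2,
SAV;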

2.6.2 Order of commands

The table below lists all available BILOG-MG commands in their necessary order. Commands
marked as “Required” must appear in each problem in the order shown. All other commands are
optional. Note that, in the rest of this chapter, the descriptions of the commands follow alpha-
betical order. The data layout must be described in a variable format statement. This statement
is entered within parentheses.

Table 7.3: Keywords and options in BILOG-MG

Command          Keywords and options                                     Required

TITLE1                                                                       *
TITLE2                                                                       *
COMMENT

GLOBAL           CFNAME, DFNAME, IFNAME, LOGISTIC, MFNAME, NPARM,            *
                 NTEST, NVTEST, NWGHT, OMITS, SAVE, PRN;

SAVE             CALIB, COVARIANCE, DIF, DRIFT, EXPECTED, ISTAT,
                 MASTER, PARM, POST, SCORE, TSTAT, PDISTRIB;

LENGTH           NITEMS, NVARIANT;                                           *

INPUT            DIAGNOSE, DIF, DRIFT, KFNAME, NALT, NFMT, NFNAME,           *
                 NFORM, NGROUP, NIDCHAR, NTOTAL, OFNAME, PERSONAL,
                 SAMPLE, TAKE, TYPE, EXTERNAL;

ITEMS            INAMES, INUMBERS;                                           *

TESTi            DISPERSN, GUESS, INAME, INTERCPT, INUMBER, SLOPE,           *
                 THRESHLD, TNAME, FIX;

FORMj            INAMES, INUMBERS, LENGTH;

GROUPk           GNAME, INAMES, INUMBERS, LENGTH;

DRIFT            MAXPOWER, MIDPOINT;

(variable                                                                    *
format
statement)

CALIB            ACCEL, COMMON, CRIT, CYCLES, DIAGNOSIS, EMPIRICAL,          *
                 FIXED, FLOAT, GPRIOR, IDIST, NEWTON, NOFLOAT,
                 NOGPRIOR, NORMAL, NOSPRIOR, NOTPRIOR, NQPT, NSD,
                 PLOT, PRINT, READPRIOR, REFERENCE, RIDGE, SELECT,
                 SPRIOR, TPRIOR, CHI, GROUP-PLOTS, NOADJUST, RASCH,
                 NFULL;

QUADk            POINTS, WEIGHTS;
(for group k)

PRIORSi          ALPHA, BETA, SMU, SSIGMA, TMU, TSIGMA;
(for subtest i)

SCORE            BIWEIGHT, FIT, IDIST, INFO, LOCATION, METHOD,
                 NOPRINT, NQPT, PMN, POP, PSD, RSCTYPE, SCALE,
                 YCOMMON, MOMENTS, DOMAIN, FILE, REFERENCE, READF,
                 NFORMS;

QUADSk           POINTS, WEIGHTS;
(for group k)

Note that if there are no variant items in a subtest, there is one TEST command for that subtest.
If a subtest contains variant test items, there must be exactly two TEST commands for that
subtest. The first identifies the main test items while the second identifies the variant test items.
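A minimal command file containing only the required commands, in the required order, might
therefore look like the sketch below (the title lines, file name, test length, ID-field width, and
format statement are all hypothetical; the optional SCORE command is included only to show
where it would go):

ILLUSTRATIVE ANALYSIS OF A 20-ITEM TEST
SINGLE GROUP, SINGLE FORM
>GLOBAL DFNAME='MYTEST.DAT', NPARM=2;
>LENGTH NITEMS=(20);
>INPUT NTOTAL=20, NIDCHAR=5;
>ITEMS INAMES=(I1(1)I20);
>TEST1 TNAME=TOTAL, INUMBERS=(1(1)20);
(5A1, 20A1)
>CALIB NQPT=10, CYCLES=25;
>SCORE;

The two unlabeled lines at the top are the TITLE1 and TITLE2 lines.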

Related topics

•  Location of keywords in interface (see Section 2.3.13)


2.6.3 CALIB command

Purpose

To control the item parameter estimation procedure and the specification of prior distribu-
tions on the item parameters.

Format

>CALIB NQPT=n, CYCLES=n, NEWTON=n, PRINT=n, CRIT=n, IDIST=n,
       PLOT=n, DIAGNOSIS=n, REFERENCE=n, SELECT=(list), RIDGE=(a,b,c),
       ACCEL=n, NSD=n, COMMON, EMPIRICAL, NORMAL, FIXED, TPRIOR, SPRIOR,
       GPRIOR, NOTPRIOR, NOSPRIOR, NOGPRIOR, READPRI, NOFLOAT, FLOAT,
       NOADJUST, GROUP-PLOT, RASCH, NFULL, CHI=(a,b);

Examples

This example uses simulated responses to illustrate nonequivalent groups equating of two
forms of a 25-item multiple-choice examination administered to different populations. Sepa-
rate latent distributions are estimated for each population (EMPIRICAL option). The indeter-
minacy in location and scale of the distributions is resolved by setting the mean and standard
deviation of Group 1 to 0 and 1, respectively, with REF=1 on the CALIB command.

>CALIB NQPT=10, EMPIRICAL, CYCLES=25, NEWTON=5, CRIT=0.01, PLOT=0.05,
       REFERENCE=1, TPRIOR;

In the following example of vertical equating of test forms over three grade levels, students
at each of three grade levels were given grade-appropriate versions of an arithmetic examina-
tion. The distributions of ability are assumed to be normal at each grade level (NORMAL op-
tion). The second group serves as the reference group in the calibration of the items. A prior
is placed on the item thresholds by the addition of the TPRIOR option.

>CALIB NQPT=20, NORMAL, CYCLE=30, TPRIOR, NEWTON=2, CRIT=0.01,
       REFERENCE=2;

In the following example of a 3-PL model, the PLOT keyword has been set to 0.99 so that
virtually all item response functions will be plotted. The FLOAT option is added to request
MML estimation (under normal distribution assumptions) of the means of the prior distribu-
tions on the item parameters along with the parameters. This option should not be invoked
when the data set is small and the items few. The acceleration constant (ACCEL keyword) is
set to 0.5 instead of the default value of 1.0 for a single group analysis.

>CALIB NQPT=6, FLOAT, PLOT=0.99, CYCLES=15, NEWTON=3, ACCEL=0.5;

The next example, again of a 3-PL model, illustrates the command’s usage in the presence
of aggregate-level, multiple-matrix sampling data. In this case, the data come from eight
forms of a rather difficult, multiple-choice instrument. Since aggregate-level data are always
more informative than individual-level item responses, it is worthwhile to increase the num-
ber of quadrature points (NQPT), to set a stricter convergence criterion (CRIT), and to in-
crease the CYCLES limit. A prior on the thresholds (TPRIOR) and a ridge constant of 0.8
(RIDGE) are required for convergence with the exceptionally difficult second subtest.

Aggregate-level data typically have smaller slopes in the 0,1 metric than do person-level
data. Thus, the mean of the prior for log slopes is set to 0.5 with the READPRIOR option and
the succeeding PRIOR commands as shown.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);

Related topics

•  PRIORS command (see Section 2.6.12)
•  QUAD command (see Section 2.6.13)
•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Item Analysis dialog box
•  Technical menu: Calibration Options dialog box (see Section 2.3.5)
•  Technical menu: Data Options dialog box
•  Technical menu: Item Parameter Prior Constraints dialog box

ACCEL keyword (optional)

Purpose

To set the acceleration constant for the E-steps.

Format

ACCEL=n

Default

1.0 for NGROUP=1 and 0.5 for NGROUP>1.
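For example (a sketch), a smaller acceleration constant can be requested in a single-group
analysis:

>CALIB NQPT=10, CYCLES=20, ACCEL=0.5;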

Related topics

•  CALIB command: CYCLES keyword (see Section 2.6.3)
•  INPUT command: NGROUP keyword (see Section 2.6.9)
•  Technical menu: Data Options dialog box (see Section 2.3.5)


CHI keyword (optional)

Purpose

To specify the number of items required and the number of intervals used for χ² computations.

Format

CHI=(a,b)

where a is the number of items for computation of the χ² fit statistics, and b is the number
of intervals into which the score continuum will be divided for purposes of computation of
the χ² item fit statistics.

Default

CHI=(20,9).

Example

In the CALIB command shown below, the CHI keyword is used to request the calculation of
the χ² item fit statistics on 18 items and 7 intervals.

>CALIB CYCLE=30, TPRIOR, NEWTON=2, CRIT=0.01, CHI=(18,7);

Related topics

•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

COMMON option (optional)

Purpose

To estimate a common value for the lower asymptote of all items in the 3PL model.

Format

COMMON

Default

Separate values for the lower asymptotes.


Example

If the CALIB command

>CALIB NQPT=10, CYCLES=30, NEWTON=5, COMMON;

is used for a 3PL model, output as shown below is obtained. Note that the asymptote pa-
rameter is estimated at a common value of 0.031 for all items.

SUBTEST SIM ; ITEM PARAMETERS AFTER CYCLE 15

ITEM  INTERCEPT   SLOPE     THRESHOLD  LOADING   ASYMPTOTE  CHISQ DF
      S.E.        S.E.      S.E.       S.E.      S.E.       (PROB)
-------------------------------------------------------------------------
T01| 1.320 | 0.968 | -1.363 | 0.695 | 0.031 | 3.5 6.0
| 0.185* | 0.192* | 0.198* | 0.138* | 0.006* | (0.7490)
| | | | | |
T02| 1.516 | 0.984 | -1.541 | 0.701 | 0.031 | 3.2 6.0
| 0.212* | 0.197* | 0.212* | 0.140* | 0.006* | (0.7783)
| | | | | |
T03| 1.020 | 1.131 | -0.902 | 0.749 | 0.031 | 2.8 6.0
| 0.170* | 0.227* | 0.142* | 0.150* | 0.006* | (0.8294)
| | | | | |
T04| 0.603 | 0.787 | -0.766 | 0.619 | 0.031 | 8.2 8.0
| 0.118* | 0.134* | 0.155* | 0.105* | 0.006* | (0.4113)
| | | | | |
T05| 0.780 | 0.695 | -1.123 | 0.571 | 0.031 | 6.3 7.0
| 0.119* | 0.124* | 0.208* | 0.102* | 0.006* | (0.5066)

Related topics

•  GLOBAL command: NPARM keyword (see Section 2.6.7)

CRIT keyword (optional)

Purpose

To set the convergence criterion for EM and Newton iterations.

Format

CRIT=n

Default

0.01.


Example

Here, the convergence criterion has been set to the more restrictive value of 0.0050 in order
to deal with a more informative aggregate-level data set.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;

Related topics

•  CALIB command: CYCLES and NEWTON keywords (see Section 2.6.3)
•  Setup menu: Item Analysis dialog box

CYCLES keyword (optional)

Purpose

To set the maximum number of EM cycles. If CYCLES=0 and NEWTON=0, item parameter es-
timates will be calculated from the classical item statistics from Phase 1 or from the starting
values of the TEST command. The former will be corrected for guessing if the 3-parameter
model is selected.

Format

CYCLES=n

Default

10 (for all subtests).

Examples

In this example of vertical equating of test forms over three grade levels, a maximum of 30
EM cycles and 2 Newton-Gauss iterations are requested.

>CALIB NQPT=20,NORMAL,CYCLE=30,TPRIOR,NEWTON=2,CRIT=0.01,REFERENCE=2;

Here, the CYCLES limit is increased in order to deal with a more informative aggregate-level
data set.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050, NOFLOAT,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI;


Related topics

•  CALIB command: NEWTON keyword (see Section 2.6.3)
•  TEST command (see Section 2.6.17)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

DIAGNOSIS keyword (optional)

Purpose

To set the level of diagnostic printout.

Format

DIAGNOSIS=n

Default

0.

Example

When DIAGNOSIS is set to 1, for example, item parameter estimates are printed to the Phase
2 output file at each iteration.
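A sketch of such a request:

>CALIB NQPT=10, CYCLES=15, DIAGNOSIS=1;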

EMPIRICAL option (optional)

Purpose

To estimate the score distribution in the respondent population in the form of a discrete dis-
tribution on NQPT points. This empirical distribution is used in place of the prior in the MML
estimation of the item parameters.

If NGROUP >1, separate score distributions are estimated for each group.

Format

EMPIRICAL

Default

Not-empirical (FIXED) if NGROUP=1; empirical if NGROUP>1.


Example

For this example, which comes from a simulation of non-equivalent groups equating, the
EMPIRICAL option is used to estimate separate latent distributions for each population.

>CALIB NQPT=10, EMPIRICAL, CYCLES=25, NEWTON=5, CRIT=0.01, PLOT=0.05,
       REFERENCE=1, TPRIOR;

Related topics

•  CALIB command: FIXED option (see Section 2.6.3)
•  CALIB command: NQPT keyword
•  INPUT command: NGROUP keyword (see Section 2.6.9)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

FIXED option (optional)

Purpose

To keep the prior distributions of ability in the population of respondents fixed at the values
specified in the IDIST keyword and/or the QUAD commands.

Format

FIXED

Default

Same as for EMPIRICAL.
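As a sketch (the three-point distribution below is deliberately tiny and purely illustrative), a
user-supplied prior can be held fixed during estimation as follows:

>CALIB NQPT=3, CYCLES=20, FIXED, IDIST=2;
>QUAD1 POINTS=(-1.0, 0.0, 1.0), WEIGHTS=(0.25, 0.50, 0.25);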

Related topics

•  CALIB command: EMPIRICAL option (see Section 2.6.3)
•  CALIB command: IDIST keyword
•  QUAD command (see Section 2.6.13)
•  Technical menu: Calibration Options dialog box (see Section 2.3.5)

FLOAT option (optional)

Purpose

To estimate the means of the prior distributions on the item parameters by marginal maxi-
mum likelihood (under normal distribution assumptions), along with the parameters. To
keep the means of the prior distributions on the item parameters fixed at their specified val-
ues during estimation, the NOFLOAT option should be used.


Standard deviations of the priors are fixed in either case. The FLOAT option should not be in-
voked when the data set is small and the items few. The means of the item parameters may
drift indefinitely during the estimation cycles under these conditions.

Format

FLOAT

Default

NOFLOAT if NGROUP=1; FLOAT if NGROUP>1.

Example

In this example of a 3-PL model, the FLOAT option is added to request MML estimation
(under normal distribution assumptions) of the means of the prior distributions on the item
parameters along with the parameters. This option should not be invoked when the data set
is small and the items few. The acceleration constant (ACCEL keyword) is set to 0.5 instead
of the default value of 1.0 for a single group analysis.

>CALIB NQPT=6, FLOAT, PLOT=0.99, CYCLES=15, NEWTON=3, ACCEL=0.5;

Related topics

•  INPUT command: NGROUP keyword (see Section 2.6.9)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

GPRIOR/NOGPRIOR option (optional)

Purpose

To select or suppress prior distributions on the lower asymptote (guessing) parameter. Such
priors may be needed to give plausible values for easy items, which carry little or no infor-
mation about guessing. Priors on the slope parameters, by contrast, are sometimes required
to prevent Heywood cases.

Format

GPRIOR/NOGPRIOR

Default

•  1PL model, NOGPRIOR
•  2PL model, NOGPRIOR
•  3PL model, GPRIOR

Examples

For a 3PL model, priors on slopes and asymptote parameters are assumed. To remove these
priors, the CALIB command

>CALIB NQPT=10, CYCLES=15, NOSPRIOR, NOGPRIOR;

may be used.

To remove the default prior distribution on the asymptote parameters and use a prior distri-
bution on the thresholds instead, use

>CALIB NQPT=10, CYCLES=15, SPRIOR, NOGPRIOR, TPRIOR;

Related topics

•  CALIB command: SPRIOR/NOSPRIOR and TPRIOR/NOTPRIOR options
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

GROUP-PLOTS option (optional)

Purpose

To provide plots showing the proportions of correct responses for each separate group in a
multiple-group analysis. These plots may provide more information than the combined plot
provided by the PLOT keyword.

Format

GROUP-PLOTS

Default

Combined plots, if PLOT keyword is used.

Example

In the CALIB command from a two-group analysis below, the PLOT keyword has been set to
0.99 so that virtually all item response functions will be plotted. In order to obtain plots by
group, the GROUP-PLOT option has been added.

>CALIB NQPT=6, REFERENCE=2, PLOT=0.99, CYCLES=15, NEWTON=3, GROUP-PLOT;


Related topics

•  CALIB command: PLOT keyword
•  Technical menu: Calibration Options dialog box (see Section 2.3.5)

IDIST keyword (optional)

Purpose

To designate the type of prior distribution in the population of respondents.

Format

IDIST=n

n=0   standard normal approximation
n=1   separate, arbitrary discrete priors for each group for each subtest, read
      from the QUAD commands
n=2   separate, arbitrary discrete priors for each group, read from the QUAD
      commands

Default

0.

Example

This example illustrates how user-supplied priors for the latent distributions are specified
with IDIST=1 on the CALIB command. The points and weights for these distributions are
supplied in the QUAD commands. Note that with IDIST=1, there are separate QUAD commands
for each group for each subtest. Within each subtest, the points are the same for each group.
This is a requirement of the program. But as the example shows, the points for the groups
may differ by subtest.

>CALIB IDIST=1,READPR,EMPIRICAL,NQPT=16,CYCLE=25,TPRIOR,NEWTON=5,
CRIT=0.01,REFERENCE=1,NOFLOAT;
>QUAD1 POINTS=(-0.4598E+01 -0.3560E+01 -0.2522E+01 -0.1484E+01
-0.4453E+00 0.5930E+00 0.1631E+01 0.2670E+01 0.3708E+01
0.4746E+01),
WEIGHTS=(0.2464E-05 0.4435E-03 0.1724E-01 0.1682E+00
0.3229E+00 0.3679E+00 0.1059E+00 0.1685E-01 0.6475E-03
0.8673E-05);
>QUAD2 POINTS=(-0.4598E+01 -0.3560E+01 -0.2522E+01 -0.1484E+01
-0.4453E+00 0.5930E+00 0.1631E+01 0.2670E+01 0.3708E+01
0.4746E+01),
WEIGHTS=(0.2996E-04 0.1300E-02 0.1474E-01 0.1127E+00
0.3251E+00 0.3417E+00 0.1816E+00 0.2149E-01 0.1307E-02
0.3154E-04);
>PRIOR TSIGMA=(1.5(0)35);
>QUAD1 POINTS=(-0.4000E+01 -0.3111E+01 -0.2222E+01 -0.1333E+01
-0.4444E+00 0.4444E+00 0.1333E+01 0.2222E+01 0.3111E+01
0.4000E+01),
WEIGHTS=(0.1190E-03 0.2805E-02 0.3002E-01 0.1458E+00
0.3213E+00 0.3213E+00 0.1458E+00 0.3002E-01 0.2805E-02
0.1190E-03);
>QUAD2 POINTS=(-0.4000E+01 -0.3111E+01 -0.2222E+01 -0.1333E+01
-0.4444E+00 0.4444E+00 0.1333E+01 0.2222E+01 0.3111E+01
0.4000E+01),
WEIGHTS=(0.1190E-03 0.2805E-02 0.3002E-01 0.1458E+00
0.3213E+00 0.3213E+00 0.1458E+00 0.3002E-01 0.2805E-02
0.1190E-03);
>PRIOR TSIGMA=(1.5(0)35);

Related topics

•  QUAD command (see Section 2.6.13)

NEWTON keyword (optional)

Purpose

To specify the number of Gauss-Newton (Fisher-scoring) iterations following EM cycles.

If CYCLES=0 and NEWTON=0, item parameter estimates will be calculated from the classical
item statistics from Phase 1 or from the starting values of the TEST command. The former
will be corrected for guessing if the 3-parameter model is selected.

Format

NEWTON=n

Default

2.

Example

In this example, the value of NEWTON is increased to 4 in order to deal with a more informa-
tive aggregate-level data set.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;


Related topics

•  CALIB command: CYCLES keyword
•  TEST command (see Section 2.6.17)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

NFULL keyword (optional)

Purpose

To specify that the Fisher-scoring steps for estimating item parameters use the full informa-
tion matrix (if the number of items n is less than p) or the block-diagonal approximation to
the information matrix (if n is greater than or equal to p).

Format

NFULL=p

Default

p=20.

Example

The NFULL keyword is used on the CALIB command to request the use of the full information
matrix in the Newton steps for this data set where only 4 items were presented to subjects. In
the absence of the NFULL keyword, the block diagonal approximation to the information ma-
trix would have been used in this case, as NITEMS=4 is less than the default threshold of 20
items.

>CALIB TPRIOR,SPRIOR,NFULL=4;

Related topics

•  LENGTH command: NITEMS keyword (see Section 2.6.11)
•  Technical menu: Data Options dialog box (see Section 2.3.5)

NOADJUST option (optional)

Purpose

In multiple-group applications, each group has its own latent distribution. To resolve the
indeterminacy of origin and scale of measurement in the IRT analysis, the user can choose
to set the mean and standard deviation to 0.0 and 1.0 in a reference group specified by the
REF keyword of the CALIB command; alternatively, the user can choose to assign these

127
2 BILOG-MG REFERENCE

keyword of the CALIB command; alternatively, the user can choose to assign these values to
the combined distributions weighted by their sample sizes.

BILOG-MG routinely rescales the origin and scale of the latent distribution (i.e., linearly
transforms the quadrature points) exactly to these values even in the case of one group. The
item slopes and thresholds are then linearly transformed to match the adjusted scale.

This results in small differences between the values estimated in BILOG and BILOG-MG
because the posterior latent distribution has mean and standard deviation equal to only ap-
proximately zero and one. To obtain the BILOG values (when all other conditions of estima-
tion are identical), the user may include the option NOADJUST in the CALIB command, as in
the example below.

Format

NOADJUST

Default

Rescaling the origin and scale of the latent distribution.

Example

In the syntax below, a single subtest is analyzed in a single group analysis. The NOADJUST
option is used on the CALIB command to suppress the adjustment of the rescaling of the la-
tent distribution.

EXAMPLE 16: TRADITIONAL IRT ANALYSIS OF A FIFTEEN-ITEM TEST
PARAMETERS OF ITEMS 6 THROUGH 10 ARE FIXED

>CALIB CYCLES=15,NEWTON=3,NQPT=11,DIAGNOS=1,NOADJUST;

Related topics

•  CALIB command: REFERENCE keyword
•  Technical menu: Calibration Options dialog box (see Section 2.3.5)

NORMAL option (optional)

Purpose

To specify the estimation of the means and standard deviations of the prior distributions of
ability in the population of respondents by marginal maximum likelihood (under normal dis-
tribution assumptions) along with the item parameters. If NGROUP>1, separate means and
standard deviations are estimated for each group.


Format

NORMAL

Default

Same as for EMPIRICAL.

Example

In this example of vertical equating of test forms over three grade levels, the distributions of
ability are assumed to be normal at each grade level (NORMAL on the CALIB command).

>CALIB NQPT=20,NORMAL,CYCLE=30,TPRIOR,NEWTON=2,CRIT=0.01,REFERENCE=2;

Related topics

•  CALIB command: EMPIRICAL option
•  INPUT command: NGROUP keyword (see Section 2.6.9)

NQPT keyword (optional)

Purpose

To specify the number of quadrature points in MML estimation for each group.

Format

NQPT=n

Default

20 for each group when NGROUP>1; 10 (otherwise).

Examples

In this example of a nonequivalent groups equating analysis, the number of quadrature
points is set to 10 instead of the default of 20 for multiple-group analyses.

>CALIB NQPT=10, EMPIRICAL, CYCLES=25, NEWTON=5, CRIT=0.01, PLOT=0.05,
       REF=1, TPRIOR;

Here, the value of NQPT is increased to 30 in order to deal with a more informative aggre-
gate-level data set.


>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;

Related topics

•  INPUT command: NGROUP keyword (see Section 2.6.9)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

NSD keyword (optional)

Purpose

To specify the range of the prior distribution(s) for the population(s) in standard deviation
units.

Format

NSD=n

Default

8 standard deviation units (from –4.0 to 4.0).
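For example (a sketch), the range can be widened to 10 standard deviation units, i.e., from
-5.0 to 5.0:

>CALIB NQPT=20, NSD=10;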

PLOT keyword (optional)

Purpose

To specify the significance level for the goodness-of-fit of the item-response functions to be
plotted. All items for which the significance level is below the real-number value (decimal
fraction) provided will be plotted.

Format

PLOT=n

n = 0.0    no plots produced
n = 1.0    plots all items
n = 0.01   (for example) plots only those poor-fitting items for which
           the significance level is less than 0.01
Default

0.0.


Examples

Plots of the item-response functions of all items for which the goodness-of-fit statistic is less
than 0.05 are requested.

>CALIB NQPT=10, EMPIRICAL, CYCLES=25, NEWTON=5, CRIT=0.01, PLOT=0.05,
       REFERENCE=1, TPRIOR;

In this example of a 3-PL model, the PLOT keyword has been set to 1.0 so that all item
response functions will be plotted.

>CALIB NQPT=6, FLOAT, PLOT=1.0, CYCLES=15, NEWTON=3, ACCEL=0.5;

PRINT keyword (optional)

Purpose

To print provisional item parameter estimates at each iteration during the calibration phase.
If PRINT=1, provisional item parameter estimates are printed; if PRINT=0, printing is
suppressed.

Format

PRINT=n

Default

0.

Example

If the following CALIB command is used for a 2-group DIF analysis, only the information
shown below is printed concerning the iterative process:

>CALIB NQPT=10,CYCLES=15,CRIT=0.005,NEWTON=2,REFERENCE=1, PRINT=0;

[E-M CYCLES]
-2 LOG LIKELIHOOD = 3152.375
CYCLE 1; LARGEST CHANGE= 0.17572
-2 LOG LIKELIHOOD = 3128.806
CYCLE 2; LARGEST CHANGE= 0.15440
-2 LOG LIKELIHOOD = 3117.237

When the PRINT keyword is set to 1

>CALIB NQPT=10,CYCLES=15,CRIT=0.005,NEWTON=2,REFERENCE=1, PRINT=1;


the output provided in the Phase 2 output file is expanded and parameter estimates are given
for each group after each cycle. The output obtained for both groups after the third EM cycle
is given below as an example.

QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:


EM CYCLE: 3

GROUP 1 MALES ; ITEM PARAMETERS AFTER CYCLE 3


ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE
----------------------------------------------------------------
SP1 | 1.378 | 1.128 | -1.222 | 0.748 | 0.000
| 0.151* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP2 | 0.686 | 1.128 | -0.608 | 0.748 | 0.000
| 0.137* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP3 | -0.938 | 1.128 | 0.831 | 0.748 | 0.000
| 0.140* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP4 | 0.649 | 1.128 | -0.575 | 0.748 | 0.000
| 0.136* | 0.053* | 0.000* | 0.000* | 0.000*
----------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.101366

GROUP 2 FEMALES ; ITEM PARAMETERS AFTER CYCLE 3


ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE
----------------------------------------------------------------
SP1 | 1.795 | 1.128 | -1.591 | 0.748 | 0.000
| 0.144* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP2 | 0.582 | 1.128 | -0.516 | 0.748 | 0.000
| 0.117* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP3 | -1.069 | 1.128 | 0.947 | 0.748 | 0.000
| 0.124* | 0.053* | 0.000* | 0.000* | 0.000*
| | | | |
SP4 | -0.200 | 1.128 | 0.177 | 0.748 | 0.000
| 0.115* | 0.053* | 0.000* | 0.000* | 0.000*
----------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.101366

PARAMETER MEAN STN DEV


-----------------------------------
GROUP: 1 NUMBER OF ITEMS: 4
THRESHOLD -0.393 0.869
GROUP: 2 NUMBER OF ITEMS: 4
THRESHOLD -0.246 1.078
-----------------------------------

-2 LOG LIKELIHOOD = 3112.870


RASCH option (optional)

Purpose

To rescale the parameter estimates according to Rasch-model conventions. That is, all the
slopes will be rescaled so that their geometric mean equals 1.0, and the thresholds will be re-
scaled so that their arithmetic mean equals 0.0. If the 1-parameter model has been specified,
all slope parameters will therefore equal 1.0.

Because the threshold parameters are constrained in other ways in DIF and DRIFT analysis,
the RASCH option cannot be used with these models. The posterior latent distribution dis-
played in Phase 2 is not rescaled in the Rasch convention.

Format

RASCH

Default

No Rasch rescaling.

Example

In the syntax for a single-group analysis shown below, a 1-parameter model is fitted to the
data (NPARM=1 on the GLOBAL command). Rasch rescaling is requested on the CALIB command
through inclusion of the RASCH option, and all slope parameters will therefore equal 1.0.

>GLOBAL DFNAME='EXAMPL04.DAT',NIDCH=5,NPARM=1;

>CALIB CYCLE=10,TPRIOR,NEWTON=2,CRIT=0.01,RASCH;

Related topics

•  GLOBAL command: NPARM keyword (see Section 2.6.7)
•  Technical menu: Calibration Options dialog box (see Section 2.3.5)

READPRIOR option (optional)

Purpose

To specify that the prior distributions for selected parameters will be read from the ensuing
PRIORS command(s). Otherwise, default priors will be used for these parameters.


Format

READPRI

Default

•  thresholds: normal, mean = 0, SD = 2.0
•  log slope: normal, mean = 0, SD = 0.5
•  asymptote: beta, with parameters set so that the mean is 1/NALT with a weight of 20
   observations of respondents who are marking randomly.

Example

In this example, the mean of the prior for the log slopes has been set to 0.5 by use of the
READPRI option of the CALIB command and the following PRIORS commands.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);

Related topics

•  INPUT command: NALT keyword (see Section 2.6.9)
•  PRIORS command (see Section 2.6.12)
•  Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)

REFERENCE keyword (optional)

Purpose

To resolve the indeterminacy of the location and scale of the latent variable when NGROUP>1.

When the groups originally came from one population as, for example, in two-stage testing,
REFERENCE should be set to 0. When the groups represent separate populations, REFERENCE
should be set to the value of one of the group indicators. It specifies the reference group for
the DIF model and the reference cohort for the DRIFT model.

Format

REFERENCE=n

n=0   The mean and standard deviation of the combined estimated distributions
      of the groups, weighted by their sample sizes, are set to 0 and 1,
      respectively.

n>0   The mean and standard deviation of group n are set to 0 and 1,
      respectively.

Default

1.

Examples

In this example of a nonequivalent groups equating analysis, the indeterminacy in location
and scale of the distributions is resolved by using REF=1 to specify Group 1 as the reference
group. This sets the mean and standard deviation of Group 1 to 0 and 1, respectively.

>CALIB NQPT=10, EMPIRICAL, CYCLES=25, NEWTON=5, CRIT=0.01, PLOT=0.05,
       REFERENCE=1, TPRIOR;

Here, the second group serves as the reference group in the calibration of the items.

>CALIB NQPT=20,NORMAL,CYCLE=30,TPRIOR,NEWTON=2,CRIT=0.01,REFERENCE=2;

Related topics

•  INPUT command: DIF option (see Section 2.6.9)
•  INPUT command: DRIFT option
•  INPUT command: NGROUP keyword
•  Setup menu: General dialog box (see Section 2.3.3)

RIDGE keyword (optional)

Purpose

To add a ridge constant (if a = 2) to the diagonal elements of the information matrix to be
inverted during the EM cycles and Newton iterations. The ridge constant starts at the value 0
and is increased by b if the ratio of a pivot and the corresponding diagonal elements of the
matrix is less than c.

The old ridge option can be invoked with the RIDGE=1 specification. It is provided so users
may duplicate old results from BILOG. The present default is an improvement of the old
method.

Format

RIDGE=(a, b, c)


Default

(2, 0.1, 0.01).

Example

This example emanates from an analysis of aggregate-level data that includes some fairly
difficult items. A ridge constant of 0.8 is required for convergence as one of the subtests is
exceptionally difficult.

Aggregate-level data typically have smaller slopes in the 0,1 metric than do person-level
data. For this reason, the mean of the prior for the log slopes has been set to 0.5 by use of
the READPRI option of the CALIB command and the following PRIORS commands.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);

Related topics

•  CALIB command: CYCLES keyword
•  CALIB command: NEWTON keyword

SELECT keyword (optional)

Purpose

To select, with a vector of ones and zeros, subtests for which item-parameter calibration is
desired.

Format

SELECT=(n1, n2, ..., nNTEST)

where

•  n=0   Do not calibrate subtest i
•  n=1   Calibrate subtest i

Default

Calibrate all subtests.


Example

In this example with three subtests, only the second subtest is to be calibrated.

>TEST1 INUMBERS=(1(1)10);
>TEST2 INUMBERS=(11(1)30);
>TEST3 INUMBERS=(31(1)45);
(5A1,45A1)
>CALIB NQPT=10, CYCLES=25, NEWTON=5, SELECT=(0,1,0);

Related topics

•  GLOBAL command: NTEST keyword (see Section 2.6.7)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

SPRIOR/NOSPRIOR option (optional)

Purpose

The presence of these options selects or suppresses, respectively, prior distributions on the
slope parameters.

Priors on the slope parameters are sometimes required to prevent Heywood cases.

Format

SPRIOR/NOSPRIOR

Default

•  1PL model, NOSPRIOR
•  2PL model, SPRIOR
•  3PL model, SPRIOR

Examples

In the case of a 1PL model, no priors are used by default and thus the two CALIB commands

>CALIB NQPT=10, CYCLES=15, NOSPRIOR;

and

>CALIB NQPT=10, CYCLES=15;

are equivalent.


In order to assume a prior distribution on the slopes in the 1PL case, the CALIB command

>CALIB NQPT=10, CYCLES=15, SPRIOR;

may be used.

In a 2PL model, a prior is placed on the slopes by default and thus the commands

>CALIB NQPT=10, CYCLES=15, SPRIOR;

and

>CALIB NQPT=10, CYCLES=15;

are equivalent.

Related topics

•  CALIB command: GPRIOR/NOGPRIOR and TPRIOR/NOTPRIOR options
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

TPRIOR/NOTPRIOR option (optional)

Purpose

To select or suppress prior distributions on the threshold parameters. Although extreme
threshold values do not affect the estimation of ability adversely, a diffuse prior distribution
on the thresholds will keep their estimates within a reasonable range during the estimation
cycles.

Format

TPRIOR/NOTPRIOR

Default

•  1PL model, NOTPRIOR
•  2PL model, NOTPRIOR
•  3PL model, NOTPRIOR

Examples

In this example of vertical equating of test forms, a prior is placed on the item thresholds by
the addition of the TPRIOR option to the CALIB command.

>CALIB NQPT=20,NORMAL,CYCLE=30,TPRIOR,NEWTON=2,CRIT=0.01,REFERENCE=2;


This example emanates from an analysis of aggregate-level data that includes some fairly
difficult items. A prior on the thresholds is required for convergence as one of the subtests is
exceptionally difficult.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,
       RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
       NOFLOAT;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);

In the case of a 1PL model, no priors are used by default and thus the two CALIB commands

>CALIB NQPT=10, CYCLES=15, NOSPRIOR;

and

>CALIB NQPT=10, CYCLES=15;

are equivalent. In order to assume a prior distribution on the slopes in the 1PL case, the
CALIB command

>CALIB NQPT=10, CYCLES=15, SPRIOR;

may be used.

In a 2PL model, a prior is placed on the slopes by default and thus the commands

>CALIB NQPT=10, CYCLES=15, SPRIOR;

and

>CALIB NQPT=10, CYCLES=15;

are equivalent. In a 2PL model, the command

>CALIB NQPT=10, CYCLES=15, SPRIOR, TPRIOR;

indicates that an additional prior distribution should be assumed for the threshold parame-
ters.

For a 3PL model, priors on slopes and asymptote parameters are assumed. To remove these
priors, the CALIB command

>CALIB NQPT=10, CYCLES=15, NOSPRIOR, NOGPRIOR;

may be used.


In a 3PL model, to remove the default prior distribution on the asymptote parameters and
use a prior distribution on the thresholds instead, use

>CALIB NQPT=10, CYCLES=15, SPRIOR, NOGPRIOR, TPRIOR;

Related topics

•  CALIB command: SPRIOR/NOSPRIOR and GPRIOR/NOGPRIOR options
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)


2.6.4 COMMENT command

(Optional)

Purpose

To enter one or more lines of explanatory remarks into the program output stream. This line
and all subsequent lines preceding the GLOBAL command will be printed in the initial output
stream. The maximum length of each line is 80 characters. A semicolon to signal the end of
the command is not needed.

Format

>COMMENT
…text…
…text…

Example

EXAMPLE 4
SIMULATED RESPONSES TO TWO 20-ITEM PARALLEL TEST FORMS
>COMMENT
This example illustrates the equating of equivalent groups with the BILOG-
MG program. Two parallel test forms of 20 multiple-choice items were ad-
ministered to two equivalent samples of 200 examinees drawn from the same
population. There are no common items between the forms.
>GLOBAL DFNAME='EXAMPL04.DAT',NIDCH=5,NPARM=2;

Default

No comments.

Related topics

•  GLOBAL command (see Section 2.6.7)
•  Setup menu: General dialog box (see Section 2.3.3)


2.6.5 DRIFT command

(Required only if DRIFT is specified in the INPUT command)

Purpose

To provide the maximum degree of the polynomial item parameter drift model and a vector
of time points, n1, n2, ..., nn.

Format

>DRIFT MAXPOWER=a, MIDPOINT=(n1, n2, ..., nn);

Default

No DRIFT command.

Example

>DRIFT MAXPOWER=2, MIDPOINT=(-5,-2,0,2,4);

Related topics

•  INPUT command: DRIFT option (see Section 2.6.9)

MAXPOWER keyword (optional)

Purpose

To specify the maximum degree of the drift polynomial included in the model. The maxi-
mum degree must be less than the number of groups.

Format

MAXPOWER=n

Default

NGROUP-1.

Related topics

•  INPUT command: NGROUP keyword (see Section 2.6.9)


MIDPOINT keyword (optional)

Purpose

To specify a vector of time points (or midpoints of time intervals).

Format

MIDPOINT=(n1, n2, ..., nn)

Default

(1, 2, …, NGROUP-1)

Related topics

•  INPUT command: NGROUP keyword


2.6.6 FORM command

(Required only if the NFORM keyword appears in the INPUT command)

Purpose

To supply the order of the item responses in the data records. Each FORM command gives the
number of items in the form and lists the items in the order in which the item responses ap-
pear on the data records for that form. The items may be listed by name or number, but not
by both.

When NFORM > 1, the FORM commands require a form number in the data records. The form
numbers must range in value from 1 to the number of forms. The form indicator field fol-
lows the case ID field and is read with an integer (I) field in the variable format statement.
Because the same format statement is used to read the data records for all forms, the item
responses, the case ID and weight, and the form and group indicators must occupy the same
columns on all records. If the forms are of unequal length, the size of the item-response
field on the format statement should equal the number of items in the longest form.
Format

>FORM LENGTH=n, INUMBERS=(list), INAMES=(list);

Default

None.

Example

Form 1 consists of items 1, 2, 3, 4, and 6, and form 2 consists of items 1, 6, 7, 8, 9, and 10.
The data records are as follows:

SUBJECT001 1 21321
SUBJECT002 2 513122

SUBJECT999 1 21422

Responses to item 1 appear in column 14 of the data records for form 1 and at the end of the
data records for form 2. The FORM commands and format statement are as follows:

>FORM1 LENGTH=5, INUMBERS=(1(1)4,6);
>FORM2 LENGTH=6, INUMBERS=(6(1)10,1);
(10A1, 1X, I1, 1X, 6A1)


Related topics

•  INPUT command: NFORM keyword (see Section 2.6.9)
•  Variable format statement (see Section 2.6.18)
•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Item Analysis dialog box

INAMES keyword (optional)

Purpose

To specify the list of item names, as specified in the ITEMS command, in the order in which
the item responses appear on the data records for FORMj.

Format

INAMES=(n1, n2, ..., nLENGTH)

Default

When NFORM = 1, the sequence of items specified on the ITEMS command. When NFORM > 1,
no sequence is specified.

Example

Assume, in the previous example, that the command

>ITEMS INAMES=(I1(1)I10);

appears earlier in the command file to give the name Ix to item x. Then the FORM1 statement
could be replaced with

>FORM1 LENGTH=5, INAMES=(I1(1)I4,I6);

Note that if the item names are in a sequence, they can be specified using the variable list
format “first (increment) last”, as “I1(1)I4” is used here to specify items 1 through 4.

Related topics

•  ITEMS command
•  INPUT command: NFORM keyword
•  Setup menu: General dialog box


INUMBERS keyword (optional)

Purpose

To provide the list of item numbers, as specified in the ITEMS command, in the order in
which the item responses appear on the data records for FORMj.

Format

INUMBERS=(n1, n2, ..., nLENGTH)

Default

When NFORM = 1, the sequence of items specified on the ITEMS command. When NFORM > 1,
none.

Related topics

•  ITEMS command (see Section 2.6.10)
•  INPUT command: NFORM keyword (see Section 2.6.9)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)

LENGTH keyword (required)

Purpose

To specify the number of items in the form.

Format

LENGTH=n

Default

NTOTAL when NFORM = 1, none when NFORM > 1.

Related topics

•  INPUT command: NFORM and NTOTAL keywords (see Section 2.6.9)
•  ITEMS command (see Section 2.6.10)
•  Setup menu: Item Analysis dialog box (see Section 2.3.3)


2.6.7 GLOBAL command

(Required)

Purpose

To supply input filenames and other information used in the three phases of the program.
The GLOBAL keywords DFNAME, MFNAME, CFNAME, and IFNAME enable the user to assign spe-
cific names to the program's input files. A filename must be not more than 128 characters
long and may include a drive prefix, a path name, and an extension. The filename must be
enclosed in single quotes. Note that each line of the command file has a maximum length of
80 characters. If the filename does not fit on one line of 80 characters, the remaining charac-
ters should be placed on the next line, starting at column 1.

Format

>GLOBAL DFNAME=n, MFNAME=n, CFNAME=n, IFNAME=n, NPARM=n, NWGHT=n, NTEST=n,
        NVTEST=n, PRNAME=n, LOGISTIC, OMITS, SAVE;

Example

>GLOBAL DFNAME='EXAMPL04.DAT', NPARM=2;

Related topics

•  Data menu: Examinee Data dialog box (see Section 2.3.4)
•  Data menu: Group-Level Data dialog box
•  Save menu (see Section 2.3.6)
•  Setup menu: General dialog box (see Section 2.3.3)
•  Setup menu: Test Scoring dialog box

CFNAME keyword (optional)

Purpose

To supply the name of the previously created calibration file (if any) to be read in. If data are
read from a previously generated calibration file, DFNAME must not appear, and TYPE=0 must
appear in the INPUT command.

The PARM keyword of the SAVE command must be specified to save updated parameter esti-
mates to an external file.

Format

CFNAME=<'filename'>


Example

In a previous run, a calibration file was created as shown below. The calibration file was
saved to exampl03.cal using the CALIB keyword on the SAVE command. Note that a calibra-
tion file will be created only if the SAMPLE keyword is also specified on the INPUT command,
with a number less than the total number of examinees.

EXAMPLE:
CREATING A CALIBRATION FILE
>COMMENT
>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE CALIB='EXAMPL03.CAL';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NIDCHAR=5,
NALT=5,NFORM=2,TYPE=1;

The previously created calibration file is now used as data source through the use of the
CFNAME keyword on the GLOBAL command. Note that the TYPE keyword on the INPUT com-
mand is now set to 0, compared to 1 previously. The updated item parameter estimates are
saved to the file latest.prm using the PARM keyword on the SAVE command.

EXAMPLE:
USING A CALIBRATION FILE AS INPUT
>COMMENT
>GLOBAL CFNAME='EXAMPL03.CAL',NPARM=2, SAVE;
>SAVE PARM='LATEST.PRM';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,NIDCHAR=5,
NALT=5,NFORM=2,TYPE=0;

Related topics

•  GLOBAL command: DFNAME keyword (see Section 2.6.7)
•  INPUT command: TYPE keyword (see Section 2.6.9)
•  SAVE command: CALIB keyword (see Section 2.6.15)

DFNAME keyword (optional)

Purpose

To supply the name of the raw data file that contains the original data. The format for this
file is described in the section on input and output files.

Format

DFNAME=<'filename'>


Notes

The path to and filename of this file may be longer than 80 characters. However, as the
maximum length of any line in the command file is 80 characters, multiple lines may be
used. It is important to continue up to and including the 80th column when specifying a long
path and filename.

For example, suppose the data file exampl06.dat is in a folder named:

C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG-
VERSION1.2\EXAMPLES

The correct way to enter this information in the command file is to enclose the name and
path in single quotes, and continue until column 80 is reached. Then proceed in column 1 of
the next line as shown below:

>GLOBAL DFNAME='C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG
-VERSION1.2\EXAMPLES\EXAMPL06.DAT', NTEST=1, NVTEST=1, NPARM=2, SAVE;

If the data are stored in the same folder as the command file, it is sufficient to type

DFNAME='EXAMPL06.DAT'

Examples

This example shows the use of the external data file exampl03.dat.

>GLOBAL DFNAME='EXAMPL03.DAT';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,KFNAME='EXAMPL03.DAT',NIDCHAR=5,

Note that this file is referenced on both the GLOBAL command (DFNAME keyword) and on the
INPUT command (KFNAME keyword). This indicates that the answer key for correct responses
is given at the top of the data file, as shown below:

ANSWER KEY 1111111111111111111111111
person1 1111111112221212211111121
person2 2211111212222222222255222

Related topics

•  Data menu: Examinee Data dialog box (see Section 2.3.4)
•  Data menu: Group-Level Data dialog box
•  Input files (see Section 2.6.20)
•  Output files
•  Variable format statement (see Section 2.6.18)


IFNAME keyword (optional)

Purpose

To supply the name of the previously created item parameter file (if any) to be used as input.

The PARM keyword of the SAVE command must be specified to save updated parameter esti-
mates to an external file.

Format

IFNAME=<'filename'>

Example

The previously created item parameter file exampl03.par is used as the data source through the use
of the IFNAME keyword on the GLOBAL command. The updated item parameter estimates are
saved to the file latest.par using the PARM keyword on the SAVE command.

EXAMPLE:
USING AN ITEM PARAMETER FILE AS INPUT
>COMMENT
>GLOBAL IFNAME='EXAMPL03.PAR',NPARM=2, SAVE;
>SAVE PARM='LATEST.PAR';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,NIDCHAR=5,
NALT=5,NFORM=2;

Related topics

 GLOBAL command: IFNAME keyword


 INPUT command: TYPE keyword (see Section 2.6.9)
 SAVE command: PARM keyword (see Section 2.6.15)
 Setup menu: Test Scoring dialog box (see Section 2.3.3)

LOGISTIC option (optional)

Purpose

To assume the natural metric of the logistic response function in all calculations. Otherwise,
the logit is multiplied by D = 1.7 to obtain the metric of the normal ogive model.
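
As a point of reference (standard IRT notation, not program output), the two-parameter
response function in the natural logistic metric is P(θ) = 1/{1 + exp[−a(θ − b)]}; under
the default scaling, the argument becomes −1.7a(θ − b), which closely approximates the
normal ogive with the same item parameters.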

Format

LOGISTIC


Default

Normal ogive model.

Examples

For the 2-parameter model requested in this first GLOBAL command, the natural metric of the
logistic response function is assumed:

>GLOBAL NPARM=2, LOGISTIC, DFNAME='EXAMPLE.DAT';

while a similar normal ogive model can be obtained by using the command:

>GLOBAL NPARM=2, DFNAME='EXAMPLE.DAT';

Related topics

 Setup menu: General dialog box (see Section 2.3.3)

MFNAME keyword (optional)

Purpose

To supply the name of a previously created master file to be read in. If data are read from a
previously prepared master file, DFNAME must not appear, and TYPE=0 must appear in the
INPUT command. The PARM keyword of the SAVE command may be specified to save up-
dated parameter estimates to an external file.

Format

MFNAME=<'filename'>

Example

The previously created master file exampl03.mas is used as the data source through the use of
the MFNAME keyword on the GLOBAL command. Note that the TYPE keyword on the INPUT
command is now set to 0.

EXAMPLE:
USING A MASTER FILE AS INPUT
>GLOBAL MFNAME='EXAMPL03.MAS',NPARM=2;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,NIDCHAR=5,
NALT=5,NFORM=2,TYPE=0;


Related topics

 INPUT command: TYPE keyword (see Section 2.6.9)


 SAVE command: MASTER keyword (see Section 2.6.15)

NPARM keyword (optional)

Purpose

To indicate the number of item parameters in the model:

 1: 1-parameter logistic model


 2: 2-parameter logistic model
 3: 3-parameter logistic model

Format

NPARM=n

Default

NPARM=2.

Examples

The following GLOBAL commands are used to request a 1PL, 2PL and 3PL model respec-
tively.

>GLOBAL NPARM=1, NWGHT=3, LOGISTIC;

>GLOBAL DFNAME='EXAMPL03.DAT', NPARM=2;

>GLOBAL NPARM=3, DFNAME='EXAMPL07.DAT';

Related topics

 Setup menu: General dialog box (see Section 2.3.3)

NTEST keyword (optional)

Purpose

To indicate the number of subtests.


Format

NTEST=n

Default

NTEST=1.

Examples

In the GLOBAL command below, the NTEST keyword is used to indicate that two subtests are
used. Note the two TEST commands in the syntax. The LENGTH command is used to indicate
the length of the subtests.

>GLOBAL NPARM=3, NTEST=2, DFNAME='EXAMPL08.DAT';


>LENGTH NITEM=(8,8);
>INPUT NTOTAL=16;
>ITEMS INUMBER=(1(1)16), INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON, INUMBER=(1(1)8);
>TEST2 TNAME=ALGCON, INUMBER=(9(1)16);

Related topics

 GLOBAL command: NVTEST keyword


 LENGTH command (see Section 2.6.11)
 Setup menu: General dialog box (see Section 2.3.3)
 TEST command

NVTEST keyword (optional)

Purpose

To indicate the number of subtests with variant items.

Format

NVTEST=n

Default

NVTEST=0.

Example

In the example below, both a main and a variant test are used. In this case, NTEST is set to 1 to
indicate the main test, and the NVTEST keyword is used to indicate the presence of a variant
test. The first TEST command is that for the main test, while items for the variant test are se-
lected by name in the next TEST command (here named TESTV purely for convenience).


There are 20 main test items and 4 variant test items, selected from a total of 50 items in the
data file. The LENGTH command is used to indicate the length of the subtests.

>GLOBAL NPARM=3, NTEST=1, NVTEST=1, DFNAME='EXAMPL06.DAT';


>LENGTH NITEM=24, NVARIANT=4;
>INPUT NTOTAL=50, NIDCHAR=11;
>ITEMS INUMBERS=(1(1)50), INAMES=(I26(1)I75);
>TESTM TNAME=MAIN, INAMES=(I26, I27, I28, I29, I31, I33, I34, I35, I36,
I38, I39, I47, I48, I49, I50, I54, I60, I64, I68, I72);
>TESTV TNAME=VARIANT, INAMES=(I53, I59, I69,I73);
(11A1,T39,25A1/T13,25A1)

Related topics

 GLOBAL command: NTEST keyword


 LENGTH command (see Section 2.6.11)
 TEST command (see Section 2.6.17)

NWGHT keyword (optional)

Purpose

To specify the weighting of response records. A value larger than 0 is required when the
data are input in the form of response patterns and frequencies, or when the sampling proce-
dure requires the use of case weights. The data file type (the TYPE keyword on the INPUT
command) must also be set appropriately. See the information on format statements for the
data format with
weights in Section 2.6.18.

Format

NWGHT=n

The type of weighting associated with valid values of n is:

 0: none
 1: for classical item statistics only
 2: for IRT item calibration only
 3: for both statistics and calibrations.

Default

NWGHT=0.

Example

In this example, the data are accumulated into answer patterns. TYPE=2 and NWGHT=3 are in-
cluded to indicate this form of data.

>GLOBAL NPARM=1, NWGHT=3, LOGISTIC;


>LENGTH NITEMS=4;
>INPUT NTOTAL=4, NGROUPS=2, DIF, NIDCHAR=2, TYPE=2;

Related topics

 INPUT command: TYPE keyword (see Section 2.6.9)


 Variable format statement (see Section 2.6.18)

OMITS option (optional)

Purpose

To specify that omits are treated as fractionally correct when the 3-parameter model is em-
ployed. The fraction is the reciprocal of the number of alternatives in the multiple-choice
items (see the NALT keyword on the INPUT command, Section 2.6.9). Also see Section 2.6.20
for more information on the specification of an omit key using the OFNAME keyword on the
INPUT command.

Format

OMITS

Default

Omitted responses are treated as incorrect.

Examples

For the following 3-parameter model, an omitted response will be scored fractionally correct
with the fraction equal to 1/5 (NALT=5). The omit response key can be found in the data file.

>GLOBAL NPARM=3, LOGISTIC, DFNAME='EXAMPLE.DAT', OMITS;
>LENGTH NITEMS=40;
>INPUT NTOTAL=40, OFNAME='EXAMPLE.DAT', NALT=5;

In this example the omitted response will be scored fractionally correct with fraction 1/4.
The key for omitted responses can be found in a separate, external file.

>GLOBAL NPARM=3, LOGISTIC, DFNAME='EXAMPLE.DAT', OMITS;
>LENGTH NITEMS=40;
>INPUT NTOTAL=40, OFNAME='OMITKEY.DAT', NALT=4;

Related topics

 INPUT command: NALT and OFNAME keywords (see Section 2.6.9)


 Input files (see Section 2.6.20)


 Setup menu: General dialog box (see Section 2.3.3)

PRNAME keyword (optional)

Purpose

To specify the name of the file from which the provisional (i.e. starting) values of parame-
ters of selected items will be obtained. The values are read in space-delimited, free-format
form.

Format

PRNAME=<'filename'>

Contents

The contents of the file are as follows:

Line 1:

The number of selected items in each subtest.

Remaining lines:

The serial position of each item selected from the corresponding subtest, followed by the
slope, threshold, and chance success (guessing) probability of the item. If a two-parameter
model is assumed, the latter should be entered as 0.

Default

None.

Example

5 5
5 1.0 0.0 0.333
10 1.0 0.0 0.333
15 1.0 0.0 0.333
25 1.0 0.0 0.333
30 1.0 0.0 0.333
5 1.1 0.5 0.233
10 1.1 0.5 0.233
15 1.1 0.5 0.233
25 1.1 0.5 0.233
30 1.1 0.5 0.233


Provisional values will be assigned to five items in each of two subtests. In each subtest, the
5-th, 10-th, 15-th, 25-th, and 30-th item will be assigned the values in the corresponding
line.

The following is an example of a command file that will input these values. Note that PRINT
has been set to 1 on the CALIB command to print the item parameters at cycle zero and show
the assigned values.

EXAMPLE 15:
ASSIGNED STARTING VALUES FOR TWO SUBTESTS
>GLOBAL DFNAME='EXAMPL03.DAT',PRNAME='EXAMPL15.PRM',NPARM=2,
NTEST=2,SAVE;
>SAVE PDISTRIB='EXAMPL15.PST',SCORE='EXAMPL15.SCO';
>LENGTH NITEMS=(35,35);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',
NALT=5,NFORMS=2,NIDCHAR=5;
>ITEMS INUMBERS=(1(1)45),
INAME=(C01(1)C45);
>TEST1 TNAME=SUBTEST1, INAME=(C01(1)C15,C21(1)C40);
>TEST2 TNAME=SUBTEST2, INAME=(C06(1)C25,C31(1)C45);
>FORM1 LENGTH=25,INUMBERS=(1(1)25);
>FORM2 LENGTH=25,INUMBERS=(21(1)45);
>GROUP1 GNAME=POP1,LENGTH=25,INUMBERS=(1(1)25);
>GROUP2 GNAME=POP2,LENGTH=25,INUMBERS=(21(1)45);
(T28,5A1,T25,I1,T25,I1/45A1)
>CALIB IDIST=1,EMPIRICAL,NQPT=11,CYCLE=10,TPRIOR,NEWTON=1,
CRIT=0.01,REF=1,NOFLOAT,PRINT=1;
>SCORE IDIST=3,RSCTYPE=3,INFO=1,YCOMMON,POP,NOPRINT,MOMENTS;

Related topics

 CALIB command: PRINT keyword (see Section 2.6.3)


 Setup menu: Test Scoring dialog box (see Section 2.3.3)

SAVE option (optional)

Purpose

To indicate that a SAVE command will follow the GLOBAL command.

Format

SAVE

Default

No SAVE command to follow.


Example

In the syntax below, the item parameters and scale scores are saved to file through the use of
the SCORE and PARM keywords on the SAVE command. Note that, in order to use the SAVE
command, the SAVE option is added to the GLOBAL command.

>GLOBAL DFNAME='EXAMPLE.DAT', NPARM=2, SAVE;
>SAVE SCORE='EXAMPLE.SCO', PARM='EXAMPLE.PAR';
>LENGTH NITEMS=(40);

Related topics

 SAVE command (see Section 2.6.15)


 Save menu (see Section 2.3.6)


2.6.8 GROUP command

(Required when NGROUP > 1 on the INPUT command)

Purpose

To specify information about the items in each particular group. When the NGROUP keyword
on the INPUT command is greater than one, that same number of GROUP commands must fol-
low the FORM commands. Each GROUP command specifies the group’s name, the length of
the group’s form, and the items included in that form. Items may be identified by name or
number, but not by both.

The GROUP command requires a group number in the data record. The group numbers must
range in value from 1 to the number of groups. If NFORM > 1, the group indicator field fol-
lows the form indicator field. If NFORM = 1, the group indicator field follows the case ID
field. The group indicator field is INTEGER in the variable format statement. If the subtest
is personalized (the option PERSONAL is present in the INPUT command) there are NTEST
group indicators for each subject.

The order of the several GROUP commands corresponds to the number of the respective
group. If the same items are administered to all groups, the INUMBERS and INAMES lists are
the same as those in the ITEMS command.

Format

>GROUP GNAME=n, LENGTH=n, INUMBERS=(list), INAMES=(list);

Default

No groups assumed.

Example

If the form(s) for group 1 consists of items 1, 2, 4, and 5, and the form(s) for group 2 con-
sists of items 3 through 8, then the corresponding group commands are as follows:

>GROUP1 GNAME=GROUP1, LENGTH=4, INUMBERS=(1,2,4,5);


>GROUP2 GNAME=GROUP2, LENGTH=6, INUMBERS=(3(1)8);

Related topics

 GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)


 INPUT command: NFORM keyword (see Section 2.6.9)
 INPUT command: PERSONAL option
 ITEMS command (see Section 2.6.10)
 LENGTH command (see Section 2.6.11)


 Setup menu: General dialog box (see Section 2.3.3)


 Setup menu: Item Analysis dialog box
 Variable format statement (see Section 2.6.18)

GNAME keyword (optional)

Purpose

To specify the name of GROUPk (up to eight characters).

Format

GNAME=character string

Default

Blanks.
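
Example

The hypothetical pair of GROUP commands below assigns descriptive names to two groups;
the group names and item lists are purely illustrative:

>GROUP1 GNAME=MALES, LENGTH=4, INUMBERS=(1,2,4,5);
>GROUP2 GNAME=FEMALES, LENGTH=6, INUMBERS=(3(1)8);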

Related topics

 Setup menu: General dialog box (see Section 2.3.3)

INAMES keyword (optional)

Purpose

To specify the list of item names, as specified in the ITEMS command, for all items in all
forms administered to GROUPk.

Format

INAMES=( n1 , n2 ,..., nLENGTH )

Default

All names specified in the ITEMS command.

Example

Assume, in the previous example, that the command

>ITEMS INAMES=(I1(1)I8);

appears earlier in the command file to give the name Ix to item x. Then the two GROUP
statements could be replaced with


>GROUP1 GNAME=GROUP1, LENGTH=4, INAMES=(I1,I2,I4,I5);
>GROUP2 GNAME=GROUP2, LENGTH=6, INAMES=(I3(1)I8);

Note the use of the list notation in the GROUP2 statement to specify items I3 through I8.

Related topics

 GROUP command: LENGTH keyword


 INPUT command: NTOTAL keyword (see Section 2.6.9)
 ITEMS command (see Section 2.6.10)
 Setup menu: General dialog box (see Section 2.3.3)

INUMBERS keyword (optional)

Purpose

To provide a list of item numbers, as specified in the ITEMS command, for all items in all
forms administered to GROUPk.

Format

INUMBERS=( n1 , n2 ,..., nLENGTH )

Default

All items specified in the ITEMS command.

Example

In the following example, the INUMBERS keywords specify the item list for each group. Note,
again, the use of the “sequence” notation in the second statement to specify items 3 through
8.

>GROUP1 GNAME=GROUP1, LENGTH=4, INUMBERS=(1,2,4,5);


>GROUP2 GNAME=GROUP2, LENGTH=6, INUMBERS=(3(1)8);

Related topics

 GROUP command: LENGTH keyword


 INPUT command: NTOTAL keyword (see Section 2.6.9)
 ITEMS command (see Section 2.6.10)
 Setup menu: Item Analysis dialog box (see Section 2.3.3)


LENGTH keyword (optional)

Purpose

To specify the number of items in the test form(s) for GROUPk .

Format

LENGTH=n

Default

NTOTAL.

Example

In the following example, the LENGTH keyword in each GROUP statement specifies the num-
ber of items for each group.

>GROUP1 GNAME=GROUP1, LENGTH=4, INUMBERS=(1,2,4,5);


>GROUP2 GNAME=GROUP2, LENGTH=6, INUMBERS=(3(1)8);

Related topics

 INPUT command: NTOTAL keyword (see Section 2.6.9)


 Setup menu: General dialog box (see Section 2.3.3)


2.6.9 INPUT command

(Required)

Purpose

To provide the information which describes the raw data file. One or more variable format
statements describing the layout of the data must follow the FORM, GROUP, or DRIFT com-
mand, if present.

The keywords KFNAME, NFNAME, and OFNAME enable the user to assign specific names to the
program’s input files. A filename must be no more than 128 characters long and may in-
clude a drive prefix, a path name, and an extension. The filename must be enclosed in single
quotes. Note that each line of the command file has a maximum length of 80 characters. If
the filename does not fit on one line of 80 characters, the remaining characters should be
placed on the next line, starting at column 1.

Format

>INPUT NTOTAL=n, NFMT=n, TYPE=n, SAMPLE=n, NALT=n, NIDCHAR=n, TAKE=n,
NGROUP=n, NFORM=n, ISEED=n, DIAGNOSE=n, KFNAME=<'filename'>,
NFNAME=<'filename'>, OFNAME=<'filename'>, DRIFT, DIF, PERSONAL, EXTERNAL;

Examples

In the following example, responses from two groups are analyzed. There are two forms of a
25-item multiple-choice examination, with 5 items in common. In total, the responses of a
sample of 2000 respondents to the 45 items are considered.

>INPUT NTOTAL=45, SAMPLE=2000, NGROUP=2, NFORM=2;

The INPUT command below is used to request a DIF analysis on 4 items administered to two
groups.

>INPUT NTOTAL=4, DIF, NGROUP=2;

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 Data menu: Item Keys dialog box
 DRIFT command (see Section 2.6.5)
 FORM command (see Section 2.6.6)
 GROUP command (see Section 2.6.8)
 Setup menu: General dialog box (see Section 2.3.3)
 Technical menu: Data Options dialog box (see Section 2.3.5)
 Variable format statement (see Section 2.6.18)


DIAGNOSE keyword (optional)

Purpose

To specify a level of diagnostic printout for Phase 1. Larger values of n produce increasingly
detailed diagnostic output.

Format

DIAGNOSE=n

Default

No diagnostic printout.
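
Example

As a hypothetical illustration (the other keyword values are arbitrary), the following INPUT
command requests the first level of diagnostic printout during Phase 1:

>INPUT NTOTAL=45, NIDCHAR=5, DIAGNOSE=1;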

Related topics

 Phase 1: INPUT (see Section 2.2)

DIF option (optional)

Purpose

To specify a differential item functioning (DIF) analysis for multiple groups, which as-
sumes common slopes and guessing parameters for all groups.

Format

DIF

Default

No DIF analysis.

Example

In the syntax below, a 1-parameter DIF model is fitted to data from two groups of exami-
nees. DIF parameters are saved to the file exampl01.dif through use of the SAVE option on
the GLOBAL command and the DIF keyword on the SAVE command.

>GLOBAL NPARM=1,LOGISTIC,SAVE;
>SAVE PARM='EXAMPL01.PAR',DIF='EXAMPL01.DIF';
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NGROUPS=2,DIF,NIDC=2;


Related topics

 DRIFT option
 GROUP command (see Section 2.6.8)
 INPUT command: NGROUP keyword
 SAVE command: DIF keyword (see Section 2.6.15)
 Setup menu: General dialog box (see Section 2.3.3)

DRIFT option (optional)

Purpose

To specify an item parameter drift model for multiple groups. A DRIFT command must also
appear after the GROUP commands.

Format

DRIFT

Default

No DRIFT model.

Example

In the syntax below, a 1-parameter DRIFT model is fitted to data from two groups of exami-
nees. DRIFT parameters are saved to the file exampl01.drf by using the SAVE option on the
GLOBAL command and the DRIFT keyword on the SAVE command.

>GLOBAL NPARM=1,LOGISTIC,SAVE;
>SAVE PARM='EXAMPL01.PAR',DRIFT='EXAMPL01.DRF';
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NGROUPS=2,DRIFT,NIDC=2;

Related topics

 DRIFT command (see Section 2.6.5)


 GROUP command (see Section 2.6.8)
 INPUT command: NGROUP keyword
 SAVE command: DRIFT keyword (see Section 2.6.15)
 Setup menu: General dialog box (see Section 2.3.3)


EXTERNAL option (optional)

Purpose

To specify the computation of the item parameters with respect to an external variable, the
values of which are supplied in the data records, rather than to a latent variable inferred from
the item responses. When item parameters are estimated in this way and used to score test
data of any other groups of examinees, the resulting scores are the best predictors of the abil-
ity measured by the external variable.

In each record of the calibration data, each test in the analysis must be represented by a
value of the external variable and its corresponding standard error. These two quantities for
each test in the data record must precede the item responses in the same order as the tests
appear in their successive command lines. The columns of the data records devoted to these
pairs of scores and standard errors must be identified in the input variable format statement.

Format

EXTERNAL

Default

Calibration with respect to a latent variable inferred from the item responses.

Example

Suppose a group of students took an end-of-term reading test and math test routinely
administered to all students in a metropolitan school district. Suppose these students were
also part of the sample for a state assessment of reading and math achievement. If scores and
standard errors on the assessment tests for these students were available to the district, the
district tests could be calibrated to best predict the state reading and math scores of all
students in the district. For this purpose, the state test results would serve as the
external variables for calibrating items of the local tests to predict the state assessment’s
scores.
For the sake of generality, suppose also that there are three random parallel forms of the dis-
trict tests and that these forms are assigned at random to students in two successive school
grades. Then there will be two groups of students in the analysis and the record layout of the
data might be the following:

 Columns 1-4: Student ID


 Column 6: test form number
 Column 8: grade group number
 Columns 10-13: state reading test score
 Columns 15-18: state reading test standard error
 Columns 20-23: state math test score
 Columns 25-28: state math test standard error


 Columns 30-59: local reading test item responses


 Columns 60-89: local math test item responses

The format statement for reading the data records would be

(4A1,1X,I1,1X,I1,2(1X,F4.1,1X,F4.1),1X,60A1)

and the item parameter file from the calibration could be saved for use in scoring other
students.

Related topics

 Data, Examinee Data /Data, Group-Level Data dialog boxes (see Section 2.3.4)
 Variable format statement (see Section 2.6.18)

ISEED keyword (optional)

Purpose

To specify the seed for the random number generator used for sampling subjects.

By default, the same seed will always be used for sampling subjects when the SAMPLE key-
word on the INPUT command is used. ISEED may be used to change the seed, thus producing
a different random sample of subjects.

Format

ISEED=n

Default

ISEED=1.
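
Example

In the hypothetical command below (the seed value is arbitrary), a random sample of 500
respondents is drawn with a nondefault seed; rerunning with a different ISEED value would
produce a different random sample:

>INPUT NTOTAL=45, NIDCHAR=5, SAMPLE=500, ISEED=12345;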

Related topics

 INPUT command: SAMPLE and TAKE keywords


 Technical menu: Data Options dialog box (see Section 2.3.5)

KFNAME keyword (optional)

Purpose

To specify the name of the file which contains the answer key. This key consists of the cor-
rect response alternative for each item, in the same format as the corresponding response re-
cords. Any single ASCII character can be used as a response alternative. If the answer key is
in the same file as the item response data, the key must precede the first response record. If
KFNAME does not appear on the INPUT command, then the data are assumed to be scored 1
for correct and 0 for incorrect.

When NFORM > 1, separate answer, not-presented, and omit keys must be specified for each
form in the order of the forms to which they apply. Again, if they are in the same file as the
response data, all keys must precede the first response record.

Format

KFNAME=<’filename’>

Default

No answer key.

Notes

The path to and filename of this file may be longer than 80 characters. As the maximum
length of any line in the command file is 80 characters, multiple lines should be used. It is
important to continue up to and including the 80th column when specifying a long path and
filename.

For example, suppose the data file exampl06.dat is in a folder named:

C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG-
VERSION1.2\EXAMPLES

The correct way to enter this information in the command file is to enclose the name and
path in apostrophes, and continue until column 80 is reached. Then proceed in column 1 of
the next line as shown below:

>GLOBAL DFNAME='C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG
-VERSION1.2\EXAMPLES\EXAMPL06.DAT', NTEST=1, NVTEST=1, NPARM=2, SAVE;

If the data are stored in the same folder as the command file, it is sufficient to type

DFNAME='EXAMPL06.DAT'

Example

In the analysis of single subject data from the file exampl04.dat, the answer key appears at
the top of the file as indicated by the use of the KFNAME keyword.

>INPUT NTOTAL=40,NFORM=2,KFNAME='EXAMPL04.DAT',NALT=5;


As two forms are used, answer keys are given by form before the actual data, and in the
same format as the data records. The first few lines of exampl04.dat are as follows:

ANSWER KEY FORM 1 1
11111111111111111111
ANSWER KEY FORM 2 2
11111111111111111111
Samp1 12 1
11111111122212122111
Samp1 12 1
11222212221222222112

Related topics

 INPUT command: NFORM, NFNAME, and OFNAME keywords


 Data menu: Item Keys dialog box (see Section 2.3.4)

NALT keyword (optional)

Purpose

To specify the maximum number of response alternatives in the raw data. 1/NALT is used as
the automatic starting value for estimating lower asymptotes (guessing parameters) of the 3-
parameter model.

Format

NALT=n

Default

5 for the 3PL model; 1000 for the 1PL and 2PL models.

Examples

In the case of the following 2-parameter model, five response alternatives for each item are
given in the data file.

The correct response to each item is noted in the answer key, which appears at the top of the
data file (indicated by the KFNAME keyword on the INPUT command).

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NIDCHAR=5,
NALT=5,NFORM=2,TYPE=1;

When a 3-parameter model is fitted to the same data, 1/5 will be used as starting value for
the lower asymptote (guessing parameter) of each item.


>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=3;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NIDCHAR=5,
NALT=5,NFORM=2,TYPE=1;

In the following example, a 2-parameter model is fitted to the data. No answer key is given,
and it is assumed that the 2 response alternatives (NALT=2) are coded 1 for correct responses
and 0 for incorrect responses. If more than 2 response alternatives are present and no key is
given, all responses other than 1 will be assumed incorrect.

>GLOBAL DFNAME='EXAMPL04.DAT',NPARM=2;
>LENGTH NITEMS=(40);
>INPUT NTOTAL=40,NALT=2;

Related topics

 GLOBAL command: NPARM keyword (see Section 2.6.7)


 Setup menu: General dialog box (see Section 2.3.3)

NFMT keyword (optional)

Purpose

To specify the number of format records for reading the respondent data records.

Format

NFMT=n

Default

1.

Examples

In the format statement below, item responses are read from two lines: the first 25 responses
are read on the first line of data for each examinee and the second 25 on the second line of
data. Although responses are read over two lines, the format statement fits comfortably on
one line in the command file, and thus NFMT=1.

(11A1,T39,25A1/T13,25A1)

If, however, a large data file is used as input, and it becomes necessary to write the format
statement over multiple lines in the command file, the value assigned to NFMT should be ad-
justed to reflect this. For example, NFMT=2 for the following format statement in which 15
items are selected and columns between items are passed over using the “X” operator:


(11A1,1X,A1,2X,A1,1X,A1,3X,A1,1X,A1,2X,A1,1X,A1,3X,A1,1X,A1,2X,A1,1X,A1,
3X,A1,1X,A1,2X,A1,1X,A1)

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 Variable format statement (see Section 2.6.18)

NFNAME keyword (optional)

Purpose

To specify the name of the file which contains the not-presented key. This key must be
given in the same format as the corresponding response records. Any single ASCII character
can be used to represent a not-presented item. If the not-presented key is in the same file as
the item response data, the key must precede the first response record. If this key appears in
the same file as the answer key, it must appear in the file after the answer key. If NFNAME
does not appear on the INPUT command, then all items are assumed presented.

When NFORM > 1, separate answer, not-presented, and omit keys must be provided for each
form in the order of the forms to which they apply. Again, if they are in the same file as the
response data, all keys must precede the first response record.

Format

NFNAME=<’filename’>

Default

No not-presented key.

Examples

In the analysis of single subject data from the file exampl04.dat, the not-presented key ap-
pears at the top of the file as indicated below, using the NFNAME keyword.

>INPUT NTOTAL=40,NFORM=2,NFNAME='EXAMPL04.DAT',NALT=5;
>ITEMS INUMBERS=(1(1)40),INAME=(T01(1)T40);
>TEST TNAME=SIM;
>FORM1 LENGTH=20,INUMBERS=(1(1)20);
>FORM2 LENGTH=20,INUMBERS=(21(1)40);
(T28,5A1,T25,I1/40A1)

As two forms are used, the not-presented keys are given by form before the actual data, and
in the same format as the data records. The first few lines of exampl04.dat are as follows:


Not-P KEY FORM 1 1
aaaaaaaaaaaaaaaaaaaa
Not-P KEY FORM 2 2
aaaaaaaaaaaaaaaaaaaa
Samp1 12 1
11a11111122212122111
Samp1 12 1
112222122a1222222112

Alternatively, the lines

Not-P KEY FORM 1 1
aaaaaaaaaaaaaaaaaaaa
Not-P KEY FORM 2 2
aaaaaaaaaaaaaaaaaaaa

can be saved to a not-presented key file exampl04.nfn, and referenced as such in a revised
INPUT command:

>INPUT NTOTAL=40,NFORM=2,NFNAME='EXAMPL04.NFN',NALT=5;

If both a not-presented key and an omit key are used for the two forms, the following lines
should appear at the top of the data file when the data file is referenced by the NFNAME and
OFNAME keywords in the INPUT command:

>INPUT NTOTAL=40,NFORM=2,NFNAME='EXAMPL04.DAT',
OFNAME='EXAMPL04.DAT',NALT=5;

Not-P KEY FORM 1 1
aaaaaaaaaaaaaaaaaaaa
Omit KEY FORM 1 1
bbbbbbbbbbbbbbbbbbbb
Not-P KEY FORM 2 2
aaaaaaaaaaaaaaaaaaaa
Omit KEY FORM 2 2
bbbbbbbbbbbbbbbbbbbb

Related topics

 Data menu: Item Keys dialog box (see Section 2.3.4)


 GLOBAL command: DFNAME keyword (see Section 2.6.7)
 INPUT command: NFORM keyword
 INPUT command: KFNAME keyword
 INPUT command: OFNAME keyword


NFORM keyword (optional)

Purpose

To specify the number of test forms. If NFORM > 1, the response records must contain an in-
dicator specifying the form to which the examinee responded. This keyword is used in com-
bination with the FORM command and the variable format statement.

The NFORM keyword is required when multiple-form data is supplied to the program in com-
pressed form (see input file format discussed in Section 2.6.20 for more details). If the in-
strument consists of a single test form, or multiple-form data is supplied to the program in
expanded format, the NFORM keyword, with NFORM=1, is required by the program if the order
of items on the response records does not correspond to the order of items in the ITEMS
command list.

Format

NFORM=n

Default

No FORM commands will be read and the order of items in the response records is assumed to
be the same as that in the ITEMS command.

Example

In the following example, two forms were administered to two groups of examinees. As
both the NFORM and NGROUP keywords are used on the INPUT command, both FORM and
GROUP commands are given.

>INPUT NTOTAL=45,NGROUP=2,NIDCHAR=5,NALT=5,NFORM=2;
>ITEMS INUMBERS=(1(1)45), INAME=(C01(1)C45);
>TEST TNAME=CHEMISTRY;
>FORM1 LENGTH=25,INUMBERS=(1(1)25);
>FORM2 LENGTH=25,INUMBERS=(21(1)45);
>GROUP1 GNAME=POP1,LENGTH=25,INUMBERS=(1(1)25);
>GROUP2 GNAME=POP2,LENGTH=25,INUMBERS=(21(1)45);

Note that the format statement contains both a form and a group indicator.

(5A1,T25,I1,T25,I1,25A1)

Related topics

 FORM command (see Section 2.6.6)


 Input files (see Section 2.6.20)
 ITEMS command (see Section 2.6.10)


 Setup menu: General dialog box (see Section 2.3.3)


 Variable format statement (see Section 2.6.18)

NGROUP keyword (optional)

Purpose

To specify the number of groups or cohorts of respondents. If NGROUP > 1, the response re-
cords must contain an indicator specifying the group or cohort to which the respondent be-
longs. This keyword is used in combination with the GROUP command and the variable for-
mat statement, where a group indicator is added.

Format

NGROUP=n

Default

1.
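
Example

In the command below, adapted from the NFORM example above, two groups of respondents
are specified. Two GROUP commands must then appear in the command file, and the variable
format statement must include a group indicator field (here the second I1 field):

>INPUT NTOTAL=45,NGROUP=2,NIDCHAR=5,NALT=5,NFORM=2;
(5A1,T25,I1,T25,I1,25A1)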

Related topics

 FORM command (see Section 2.6.6)


 GROUP command (see Section 2.6.8)
 Setup menu: General dialog box (see Section 2.3.3)
 Variable format statement (see Section 2.6.18)

NIDCHAR keyword (required)

Purpose

To specify the number of characters in the respondent’s identification field. Valid values are
1 to 30.

Format

NIDCHAR=n

Default

30.

Example

Data from two groups, found on two forms, are analyzed in this example. The NIDCHAR
keyword is set to 5, indicating that the subject ID field is 5 columns in length. This corre-
sponds with the format statement, where the first entry, for the subject ID, is 5A1.

>INPUT NTOTAL=45,NGROUP=2,NIDCHAR=5,NALT=5,NFORM=2,TYPE=1;
(5A1,T25,I1,T25,I1/25A1)

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 Variable format statement (see Section 2.6.18)

NTOTAL keyword (optional)

Purpose

To specify the total number of unique items in the respondent data records. The number in-
cludes all main and variant test items on all forms.

Format

NTOTAL=n

Default

0.

Examples

In this example, responses from two groups are analyzed. There are two forms of a 25-item
multiple-choice examination, with 5 items in common. In total, the responses of a sample of
2000 respondents to the 45 items are considered.

>INPUT NTOTAL=45, SAMPLE=2000, NGROUP=2, NFORM=2;

The INPUT command below is used to request a DIF analysis on 4 items administered to two
groups.

>INPUT NTOTAL=4, DIF, NGROUP=2;

In the following example, responses to 50 items are read from the data file. From the 50, 20
are selected as Main Test items and 4 as Variant Test items. Items for the main test are se-
lected by name in the TESTM command; items for the variant test are selected by name in the
TESTV command.

>GLOBAL DFNAME='EXAMPL06.DAT', NTEST=1,NVTEST=1,NPARM=2;


>LENGTH NITEM=24,NVARIANT=4;
>INPUT NTOTAL=50,KFNAME='EXAMPL06.DAT',SAMPLE=200,NIDCH=11;


>ITEMS INUMBERS=(1(1)50),INAME=(I26(1)I75);
>TESTM TNAME=MAINTEST,
INAMES=(I26,I27,I28,I29,I31,I33,I34,
I35,I36,I38,I39,I47,I48,I49,I50,I54,I60,I64,I68,I72);
>TESTV TNAME=VARIANT,INAMES=(I53,I59,I69,I73);

Related topics

 Setup menu: General dialog box (see Section 2.3.3)

OFNAME keyword (optional)

Purpose

To specify the name of the file which contains the omit key. This key must be specified in
the same format as the response records. Any single ASCII character can be used to repre-
sent an omitted response. If the omit key is in the same file as the item response
data, the key must precede the first response record. If this key appears in the same file as
the answer and/or not-presented keys, it must appear in the file after both keys.

If OFNAME does not appear on the INPUT command, omits will not be distinguished from in-
correct responses. When NFORM > 1, separate answer, not-presented, and omit keys must be
provided for each form in the order of the forms to which they apply. Again, if they are in
the same file as the response data, all keys must precede the first response record.

Format

OFNAME=<'filename'>

Default

No omit key.

Examples

In the analysis of single subject data from the file exampl04.dat, the omit key appears at the
top of the file as indicated by the use of the OFNAME keyword.

>INPUT NTOTAL=40,NFORM=2,OFNAME='EXAMPL04.DAT',NALT=5;
>ITEMS INUMBERS=(1(1)40),INAME=(T01(1)T40);
>TEST TNAME=SIM;
>FORM1 LENGTH=20,INUMBERS=(1(1)20);
>FORM2 LENGTH=20,INUMBERS=(21(1)40);
(T28,5A1,T25,I1/40A1)

As two forms are used, omit keys are given by form before the actual data, and in the same
format as the data records. The first few lines of exampl04.dat are as follows:


Omit KEY FORM 1 1
bbbbbbbbbbbbbbbbbbbb
Omit KEY FORM 2 2
bbbbbbbbbbbbbbbbbbbb
Samp1 12 1
11a11111122212122111
Samp1 12 1
112222122a1222222112

Alternatively, the lines

Omit KEY FORM 1 1
bbbbbbbbbbbbbbbbbbbb
Omit KEY FORM 2 2
bbbbbbbbbbbbbbbbbbbb

can be saved to an omit key file exampl04.ofn, and referenced as such in a revised INPUT
command:

>INPUT NTOTAL=40,NFORM=2,OFNAME='EXAMPL04.OFN',NALT=5;

If both a not-presented key and an omit key are used for the two forms, the following lines
should appear at the top of the data file when the data file is referenced by the NFNAME and
OFNAME keywords in the INPUT command:

>INPUT NTOTAL=40,NFORM=2,NFNAME='EXAMPL04.DAT',
OFNAME='EXAMPL04.DAT',NALT=5;

Not-P KEY FORM 1 1
aaaaaaaaaaaaaaaaaaaa
Omit KEY FORM 1 1
bbbbbbbbbbbbbbbbbbbb
Not-P KEY FORM 2 2
aaaaaaaaaaaaaaaaaaaa
Omit KEY FORM 2 2
bbbbbbbbbbbbbbbbbbbb

Related topics

 Data menu: Item Keys dialog box (see Section 2.3.4)


 GLOBAL command: DFNAME and NPARM keywords (see Section 2.6.7)
 INPUT command: KFNAME, NFNAME, and NFORM keywords

PERSONAL option (optional)

Purpose

To specify the assumption that the group or cohort assignment of an examinee is personal-
ized by subtest. The response records must contain NTEST indicators, one for each subtest,
specifying the groups or group cohorts to which the respondent belongs. The NTEST group
indicators must be specified in the variable format statement in the same order as the sub-
tests.

The PERSONAL option is especially useful for two-stage tests that measure ability in more
than one area. Assignment to the second-stage booklets may differ among areas.

Format

PERSONAL

Default

None.
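
Example

The fragment below is a hypothetical setup (the file name and record layout are illustrative)
for two subtests with personalized group assignment. The two I1 fields of the format
statement read the two group indicators, in the same order as the subtests:

>GLOBAL DFNAME='TWOSTAGE.DAT', NTEST=2, NPARM=2;
>INPUT NTOTAL=40, NGROUP=2, PERSONAL, NIDCHAR=5;
(5A1,1X,I1,1X,I1,1X,40A1)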

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 GLOBAL command: NTEST keyword (see Section 2.6.7)
 Variable format statement (see Section 2.6.18)

SAMPLE keyword (optional)

Purpose

To specify the number of respondents to be randomly sampled from the raw data file.

Format

SAMPLE=n

Default

1000.

Example

Here data are read from the file exampl03.dat, which also contains the answer key (DFNAME
and KFNAME keywords). Although the data file contains only 400 records, a sample of 2000
is requested.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NIDCHAR=5,
NALT=5,NFORM=2,TYPE=1;


If the first few records of the data file are to be used, the TAKE keyword should be used in-
stead.

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 INPUT command: TAKE keyword

TAKE keyword (optional)

Purpose

To specify an analysis using only the first n respondents in the data file. This option is useful
for testing the problem setup on a smaller number of respondents when the sample size is
large. Note that the maximum value for this keyword is the actual number of respondents in
the data file. To obtain a random sample of the respondents, the SAMPLE keyword should be
used. TAKE and SAMPLE are mutually exclusive keywords.

Format

TAKE=n

Default

Take all data specified by SAMPLE.

Examples

In the following example, data are read from the file exampl03.dat, which also contains the
answer key (DFNAME and KFNAME keywords). Although the data file contains only 400 re-
cords, a sample of 2000 is requested.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NIDCHAR=5,NALT=5,TYPE=1;

If, however, only the first 100 records are to be used in the analysis, the modified INPUT
command

>INPUT NTOTAL=45,TAKE=100,NIDCH=5,NALT=5,TYPE=1;

should be used.


Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 INPUT command: SAMPLE keyword

TYPE keyword (optional)

Purpose

To specify the type of data file to be used in the analysis:

 0: no raw data to read in


 1: single-subject data to read in
 2: single-subject data with case weights
 3: number tried, number right data, no case weights
 4: number tried, number right data, case weights

Format

TYPE=n

Default

1.

Examples

In a preliminary run, an item parameter file was created as shown below. The item parame-
ter file was saved to exampl03.par using the PARM keyword on the SAVE command. As sin-
gle-subject data were used in this run, TYPE was set to 1 in the INPUT command.

EXAMPLE:
CREATING AN ITEM PARAMETER FILE
>COMMENT
>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE PARM='EXAMPL03.PAR';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NIDCH=5,
NALT=5,NFORM=2,TYPE=1;

The previously created item parameter file is now used as input through the use of the IFNAME
keyword on the GLOBAL command. Note that the TYPE keyword on the INPUT command is
now set to 0, compared to 1 previously. The updated item parameter estimates are saved to
the file latest.par using the PARM keyword on the SAVE command.


EXAMPLE:
USING AN ITEM PARAMETER FILE AS INPUT
>COMMENT
>GLOBAL IFNAME='EXAMPL03.PAR',NPARM=2, SAVE;
>SAVE PARM='LATEST.PAR';
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45,SAMPLE=2000,NGROUP=2,NIDCHAR=5,
NALT=5,NFORM=2,TYPE=0;

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)


 Data menu: Group-Level Data dialog box
 GLOBAL command: IFNAME keyword (see Section 2.6.7)
 SAVE command: PARM keyword (see Section 2.6.15)


2.6.10 ITEMS command

(Required)

Purpose

To specify the names and corresponding numbers for all items in the data records. The items
may be listed in any order, but the order in which the names appear must correspond with
the order of the numbers. The names and numbers specified in the ITEMS command are used
to refer to the items in the TEST, FORM, and GROUP commands.

Strings of consecutive numbers may be abbreviated as m(1)n, where m is the number of the
first item and n is the number of the last item. Strings of up to 8 character names including
consecutive numbers may be abbreviated as Xm(1)Xn, where X is a string of up to 4 letters of
the alphabet, m is the up-to-4 character integer number of the first item and n is the up-to-4
character integer number of the last item.

Format

>ITEMS INUMBERS=(list), INAMES=(list);

Default

None.

Examples

In the first example, 15 items are assigned the names MATH01 through MATH15.

>ITEMS INAME=(MATH01(1)MATH15);

In the syntax that follows, 16 items belonging to 2 subtests are identified. From the LENGTH
command, we see that each subtest has 8 items. The ITEMS command is used to first number
these items, and then to assign the names N1 through N8 to items belonging to the first sub-
test. Items belonging to the second subtest are named A1 through A8. On the TEST com-
mands, items are referenced by number. Referencing by the names assigned in the ITEMS
command is another option.

>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16,NALT=5,NIDCHAR=9,TYPE=3;
>ITEMS INUMBERS=(1(1)16),INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON,INUMBERS=(1(1)8);
>TEST2 TNAME=ALGCON,INUMBERS=(9(1)16);


Related topics

 FORM command (see Section 2.6.6)


 GROUP command (see Section 2.6.8)
 TEST command (see Section 2.6.17)
 Setup menu: General dialog box (see Section 2.3.3)

INAMES keyword (optional)

Purpose

To specify a list of NTOTAL unique names (up to eight characters each). Item names that do
not begin with letters must be enclosed in single quotes.

Strings of up to 8 character names including consecutive numbers may be abbreviated as
Xm(1)Xn, where X is a string of up to 4 letters of the alphabet, m is the up-to-4 character inte-
ger number of the first item and n is the up-to-4 character integer number of the last item.

Format

INAMES=( n1 , n2 ,..., nNTOTAL )

Default

1, 2, …, NTOTAL.
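
Example

The hypothetical commands below illustrate the two forms of the list; the item names are
purely illustrative. The first uses the abbreviated notation to assign the names MATH01
through MATH45; the second quotes names that do not begin with a letter:

>ITEMS INAMES=(MATH01(1)MATH45);
>ITEMS INAMES=('1A','2B','3C','4D');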

Related topics

 INPUT command: NTOTAL keyword (see Section 2.6.9)


 Setup menu: General dialog box (see Section 2.3.3)

INUMBERS keyword (optional)

Purpose

To specify the list of NTOTAL unique numbers. Strings of consecutive numbers may be ab-
breviated as m(1)n, where m is the number of the first item and n is the number of the last
item.

Format

INUMBERS=( n1 , n2 ,..., nNTOTAL )

Default

1, 2, …, NTOTAL.


Example

In the syntax that follows, 16 items belonging to 2 subtests are identified. From the LENGTH
command we see that each subtest has 8 items. The ITEMS command is used to first number
these items, and then to assign the names N1 through N8 to items belonging to the first sub-
test. Items belonging to the second subtest are named A1 through A8. On the TEST com-
mands, items are referenced by number. Referencing by the names assigned in the ITEMS
command is another option.

>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16,NALT=5,NIDCHAR=9,TYPE=3;
>ITEMS INUMBERS=(1(1)16),INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON,INUMBERS=(1(1)8);
>TEST2 TNAME=ALGCON,INUMBERS=(9(1)16);

Related topics

 INPUT command: NTOTAL keyword (see Section 2.6.9)


2.6.11 LENGTH command

(Required)

Purpose

To supply the number of items in subtests and the number of variant items in the subtests.

Format

>LENGTH NITEMS=(list), NVARIANT=(list);

Example

Consider two subtests. Subtest 1 has a total of 20 items; subtest 2 has a total of 15 items.
Five of the items in subtest 1 are variant items. None of the items in subtest 2 are variant
items.

Note that the number of variant tests has to be specified using the NVTEST keyword on the
GLOBAL command. The corresponding number of TEST commands must also be included in
the syntax.

>GLOBAL DFNAME='EXAMPL04.DAT',NTEST=2,NVTEST=1;

>LENGTH NITEMS=(20,15), NVARIANT=(5,0);

Related topics

 GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)


 Setup menu: Item Analysis dialog box (see Section 2.3.3)
 TEST command (see Section 2.6.17)

NITEMS keyword (required)

Purpose

To provide a list of the number of items in the successive subtests to be analyzed. If a sub-
test contains variant items, they are included in this count of items.

Format

NITEMS=( n1 , n2 ,..., nNTESTS )

Default

None.


Example

In the example below, 20 of the 24 items are selected as main test items and 4 as variant test
items. The number of variant tests is specified using the NVTEST keyword on the GLOBAL
command. The TEST command for the main test is followed by a TEST command in which
the variant items are specified by item number.

>GLOBAL DFNAME='example.dat', NTEST=1, NVTEST=1;


>LENGTH NITEM=24, NVARIANT=4;
>INPUT NTOTAL=24;
>ITEMS INUMBER=(1(1)24);
>TESTM TNAME=MAINTEST,
INUMBER=(1(1)20);
>TESTV TNAME=VARIANT,
INUMBER=(21(1)24);

Related topics

 GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)


 Setup menu: Item Analysis dialog box (see Section 2.3.3)
 TEST command (see Section 2.6.17)

NVARIANT keyword (optional)

Purpose

To specify the number of variant items, if any, in the successive subtests to be analyzed. Al-
though parameter estimates for these items will be obtained, these items are not used in scor-
ing of tests/forms.

Format

NVARIANT=( nv1 , nv2 ,..., nvNVTESTS )

Default

0.
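
Example

A hypothetical LENGTH command for a 24-item subtest of which the last 4 items are variant
items (compare the NITEMS example above):

>LENGTH NITEM=24, NVARIANT=4;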

Related topics

 GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)


 Setup menu: Item Analysis dialog box (see Section 2.3.3)
 TEST command (see Section 2.6.17)


2.6.12 PRIORS command

(Optional)

Purpose

To specify prior distributions for constrained estimation of the item parameters of the main
test and for the variant items, if any. This command is required when the READPR keyword
appears in the CALIB command.

There is one prior command for each subtest. Values are read in order of the items in the
subtest beginning with the main test items and ending with the variant test items. If
NGROUP>1, more than one set of prior means and standard deviations for the item thresholds
may be required when the DIF or DRIFT models are specified. See the TMU and TSIGMA
keywords below.

Format

>PRIORS TMU=(list), TSIGMA=(list), SMU=(list), SSIGMA=(list),


ALPHA=(list), BETA=(list);

Notes

If the same value applies to all items of the subtest, you may use the “repeat” form: “value
(0) number of values” (see Section 2.6.2).

For a mean of p with a weight of n observations for the beta prior distribution, set

ALPHA=np+1
BETA=n(1–p)+1
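
For example, to center the prior at p = 0.2 (the chance level for five-alternative items)
with a weight equivalent to n = 20 observations, set ALPHA = 20(0.2) + 1 = 5 and
BETA = 20(0.8) + 1 = 17.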

To set an item parameter to a fixed value, set the mean of the prior to the parameter value
and set the corresponding standard deviation to a very small value. Suitable values for
TSIGMA are 0.005, for SSIGMA, 0.001 and for ALPHA and BETA, n = 1000. The priors for free
parameters should be set to the default values above. The PRIORS command for each test
should appear immediately after the QUAD commands for that test.

Examples

The following example emanates from an analysis of aggregate-level, multiple-matrix sam-
pling data. Aggregate-level data typically have smaller slopes in the 0,1 metric than do per-
son-level data. For this reason, the mean of the prior for the log slopes has been set to 0.5 by
the use of the READPRI option on the CALIB command and the successive PRIOR commands.
The NOFLOAT option is used to keep the means of the prior distributions on the item parame-
ters fixed at their specified values during estimation.


>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050,


RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, TPRIOR, READPRI,
NOFLOAT;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);

The next example illustrates how user-supplied priors for the latent distributions are speci-
fied with IDIST=1 on the CALIB command. The points and weights for these distributions
are supplied in the corresponding QUAD commands. Note that with IDIST=1, there are sepa-
rate QUAD commands for each group for each subtest. Within each subtest the points are the
same for each group. This is a requirement of the program. But as the example shows, the
points for the groups may differ by subtest. The PRIOR command for each subtest is placed
after the QUAD commands for that subtest. In this example, only the prior for the standard de-
viations of the thresholds is supplied on the PRIOR command. Default values are used for the
other prior distributions. The means of the distributions are kept fixed at their specified val-
ues by using the NOFLOAT option on the CALIB command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2,NTEST=2;
>LENGTH NITEMS=(35,35);
>INPUT NTOT=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NALT=5,
NFORMS=2,NIDCHAR=5;
>ITEMS INUMBERS=(1(1)45), INAME=(C01(1)C45);
>TEST1 TNAME=SUBTEST1,INAME=(C01(1)C15,C21(1)C40);
>TEST2 TNAME=SUBTEST2,INAME=(C06(1)C25,C31(1)C45);
>FORM1 LENGTH=25,INUMBERS=(1(1)25);
>FORM2 LENGTH=25,INUMBERS=(21(1)45);
>GROUP1 GNAME=POP1,LENGTH=25,INUMBERS=(1(1)25);
>GROUP2 GNAME=POP2,LENGTH=25,INUMBERS=(21(1)45);
(T28,5A1,T25,I1,T25,I1/45A1)
>CALIB IDIST=1,READPR,EMPIRICAL,NQPT=16,CYCLE=25,TPRIOR,NEWTON=5,
CRIT=0.01,REFERENCE=1,NOFLOAT;
>QUAD1 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2464E-05,0.4435E-03,0.1724E-01,0.1682E+00,
0.3229E+00,0.3679E+00,0.1059E+00,0.1685E-01,0.6475E-03,
0.8673E-05);
>QUAD2 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2996E-04,0.1300E-02,0.1474E-01,0.1127E+00,
0.3251E+00,0.3417E+00,0.1816E+00,0.2149E-01,0.1307E-02,
0.3154E-04);
>PRIOR TSIGMA=(1.5(0)35);
>QUAD1 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,
-0.4444E+00,0.4444E+00,0.1333E+01,0.2222E+01,0.3111E+01,
0.4000E+01),
WEIGHTS=(0.1190E-03,0.2805E-02,0.3002E-01,0.1458E+00,
0.3213E+00,0.3213E+00,0.1458E+00,0.3002E-01,0.2805E-02,
0.1190E-03);
>QUAD2 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,
-0.4444E+00,0.4444E+00,0.1333E+01,0.2222E+01,0.3111E+01,
0.4000E+01),


WEIGHTS=(0.1190E-03,0.2805E-02,0.3002E-01,0.1458E+00,
0.3213E+00,0.3213E+00,0.1458E+00,0.3002E-01,0.2805E-02,
0.1190E-03);
>PRIOR TSIGMA=(1.5(0)35);

Suppose IDIST=1, NGROUP=2, and NTEST=2. The setup for the QUAD and PRIOR commands is
as follows:

>QUAD1 (specifications for Group 1, subtest 1)
>QUAD2 (specifications for Group 2, subtest 1)
>PRIOR1 (specifications for Groups 1 and 2, subtest 1)
>QUAD1 (specifications for Group 1, subtest 2)
>QUAD2 (specifications for Group 2, subtest 2)
>PRIOR2 (specifications for Groups 1 and 2, subtest 2)

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 GLOBAL command: NTEST keyword (see Section 2.6.7)
 INPUT command: NGROUP keyword (see Section 2.6.9)
 QUAD command (see Section 2.6.13)
 Technical menu: Item Parameter Prior Constraints dialog box

ALPHA keyword (optional)

Purpose

To specify the real-valued “alpha” parameters for the beta prior distribution of lower asymp-
tote (guessing) parameters.

Format

ALPHA=( n1 , n2 ,..., nN )

Default

20p+1.

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 GLOBAL command: NPARM keyword (see Section 2.6.7)
 PRIORS command: BETA keyword
 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)


BETA keyword (optional)

Purpose

To specify the real-valued “beta” parameters for the beta prior distribution of lower asymp-
tote (guessing) parameters.

Format

BETA=( n1 , n2 ,..., nN )

Default

20(1–p)+1.
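
Example

Following the rule given in the notes to the PRIORS command, a prior mean of p = 0.2 with a
weight of n = 20 observations gives ALPHA = 5 and BETA = 17. The hypothetical command below
assigns this prior to the lower asymptotes of all 45 items of a subtest, using the "repeat"
notation (the item count is illustrative):

>PRIORS ALPHA=(5.0(0)45), BETA=(17.0(0)45);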

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 GLOBAL command: NPARM keyword (see Section 2.6.7)
 PRIORS command: ALPHA keyword
 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)

SMU keyword (optional)

Purpose

To provide real-valued prior means for the item slopes.

Format

SMU=( n1 , n2 ,..., nN )

Default

1.0.

Example

In the following example, SMU is used to specify prior means for the item slopes.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050, NOFLOAT,


RIDGE=(2, 0.8000, 2.0000), ACCEL=1.0000, SPRIOR, READPRI;
>PRIORS1 SMU=(0.5000(0)8);
>PRIORS2 SMU=(0.5000(0)8);


Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)

SSIGMA keyword (optional)

Purpose

To specify real-valued prior standard deviations of the item slopes.

Format

SSIGMA=( n1 , n2 ,..., nN )

Default

1.64872127.

Example

In the calibration of a single subtest with 35 items, the following PRIOR command is used to
provide a real-valued prior standard deviation of 1.75 for the item slopes.

>CALIB READPRI, NQPT=16,CYCLE=25, NEWTON=5;


>PRIOR SSIGMA=(1.75(0)35);

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)

TMU keyword (optional)

Purpose

To specify real-valued prior means for the item thresholds (DIF) or polynomial coefficients
(DRIFT) including intercept.

Format

TMU=(n1, n2, ..., nMITM1, n1, n2, ..., nMITM2, ..., n1, n2, ..., nMITML)


Default

0.0.

Example

In the example below, PRIOR commands are used to specify prior distributions for the con-
strained estimation of the thresholds in the calibration of two subtests with 8 items each.

>CALIB NQPT=30, CYCLES=24, NEWTON=4, CRIT=0.0050, READPRI;


>PRIORS1 TMU=(2.0500(0)8);
>PRIORS2 TMU=(2.0500(0)8);

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 INPUT command: DIF option (see Section 2.6.9)
 INPUT command: DRIFT option
 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)

TSIGMA keyword (optional)

Purpose

To specify real-valued prior standard deviations of the threshold parameters.

If neither the DIF nor the DRIFT model is selected, L = 1. If the DIF model is selected, L =
NGROUP. If the DRIFT model is selected, L = MAXPOWER.

Format

TSIGMA=(n1, n2, ..., nMITM1, n1, n2, ..., nMITM2, ..., n1, n2, ..., nMITML)

Default

2.0.
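
Example

Following the note on fixed parameter values in the description of the PRIORS command, the
hypothetical command below fixes the thresholds of a 35-item subtest at 0.0 by pairing prior
means of 0.0 with the very small prior standard deviation 0.005:

>PRIOR TMU=(0.0(0)35), TSIGMA=(0.005(0)35);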

Related topics

 CALIB command: READPRIOR option (see Section 2.6.3)


 DRIFT command: MAXPOWER keyword (see Section 2.6.5)
 INPUT command: DIF option (see Section 2.6.9)
 INPUT command: DRIFT option
 INPUT command: NGROUP keyword
 Technical menu: Item Parameter Prior Constraints dialog box (see Section 2.3.5)


2.6.13 QUAD command

(Required if IDIST = 1 or 2 on CALIB command)

Purpose

To read in user-supplied quadrature points and weights, or points and ordinates of the dis-
crete finite representations of the prior distribution for the groups. This command follows di-
rectly after the CALIB command.

If:

 IDIST = 0: This command is not used.


 IDIST = 1: There must be a separate QUAD command for each group for each subtest. For
any subtest, the points for each group must have the same values.
 IDIST = 2: There must be a separate QUAD command for each group. The same set of QUAD
commands applies to all subtests. The points for each group must have the same values.

Format

>QUAD POINTS=(list), WEIGHTS=(list);

Example

This example illustrates how user-supplied priors for the latent distributions are specified with
IDIST=1 on the CALIB command. The points and weights for these distributions are supplied
in the QUAD commands. Note that with IDIST=1, there are separate QUAD commands for each
group for each subtest.

Within each subtest the points are the same for each group. This is a requirement of the pro-
gram. But as the example shows, the points for the groups may differ by subtest. The PRIOR
command for each subtest is placed after the QUAD commands for that subtest.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2,NTEST=2;
>LENGTH NITEMS=(35,35);
>INPUT NTOT=45,SAMPLE=2000,NGROUP=2,KFNAME='EXAMPL03.DAT',NALT=5,
NFORMS=2,NIDCHAR=5;
>ITEMS INUMBERS=(1(1)45), INAME=(C01(1)C45);
>TEST1 TNAME=SUBTEST1,INAME=(C01(1)C15,C21(1)C40);
>TEST2 TNAME=SUBTEST2,INAME=(C06(1)C25,C31(1)C45);
>FORM1 LENGTH=25,INUMBERS=(1(1)25);
>FORM2 LENGTH=25,INUMBERS=(21(1)45);
>GROUP1 GNAME=POP1,LENGTH=25,INUMBERS=(1(1)25);
>GROUP2 GNAME=POP2,LENGTH=25,INUMBERS=(21(1)45);
(T28,5A1,T25,I1,T25,I1/45A1)
>CALIB IDIST=1,READPR,EMPIRICAL,NQPT=16,CYCLE=25,TPRIOR,NEWTON=5,
CRIT=0.01,REFERENCE=1,NOFLOAT;
>QUAD1 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2464E-05,0.4435E-03,0.1724E-01,0.1682E+00,
0.3229E+00,0.3679E+00,0.1059E+00,0.1685E-01,0.6475E-03,
0.8673E-05);
>QUAD2 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2996E-04,0.1300E-02,0.1474E-01,0.1127E+00,
0.3251E+00,0.3417E+00,0.1816E+00,0.2149E-01,0.1307E-02,
0.3154E-04);
>PRIOR TSIGMA=(1.5(0)35);
>QUAD1 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,
-0.4444E+00,0.4444E+00,0.1333E+01,0.2222E+01,0.3111E+01,
0.4000E+01),
WEIGHTS=(0.1190E-03,0.2805E-02,0.3002E-01,0.1458E+00,
0.3213E+00,0.3213E+00,0.1458E+00,0.3002E-01,0.2805E-02,
0.1190E-03);
>QUAD2 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,
-0.4444E+00,0.4444E+00,0.1333E+01,0.2222E+01,0.3111E+01,
0.4000E+01),
WEIGHTS=(0.1190E-03,0.2805E-02,0.3002E-01,0.1458E+00,
0.3213E+00,0.3213E+00,0.1458E+00,0.3002E-01,0.2805E-02,
0.1190E-03);
>PRIOR TSIGMA=(1.5(0)35);

Related topics

• CALIB command: IDIST keyword (see Section 2.6.3)
• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• Technical menu: Calibration Prior Latent Distribution dialog box (see Section 2.3.5)

POINTS keyword (optional)

Purpose

To specify the location of quadrature points.

If:

• IDIST = 1: a set of NQPT real-numbered values (with decimal points) of the quadrature
  points must be supplied for each group for each subtest.
• IDIST = 2: one set of points is required for each group.

Format

POINTS=( n1 , n2 ,..., nNQPT )

Default

Supplied by program.

Example

See the example given above.

Related topics

• CALIB command: IDIST and NQPT keywords (see Section 2.6.3)
• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• Technical menu: Calibration Prior Latent Distribution dialog box (see Section 2.3.5)

WEIGHTS keyword (optional)

Purpose

To supply the weights for the quadrature points.

If:

• IDIST = 1 on the CALIB command: a set of NQPT positive fractions (with decimal points
  and summing to 1.0) for the weights of the quadrature points must be supplied for each
  group for each subtest.
• IDIST = 2: one set of weights is required for each group. This set of weights applies to
  all subtests.

Format

WEIGHTS=( n1 , n2 ,..., nNQPT )

Default

Supplied by program.

Related topics

• CALIB command: IDIST and NQPT keywords (see Section 2.6.3)
• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• Technical menu: Calibration Prior Latent Distribution dialog box (see Section 2.3.5)

2.6.14 QUADS command

(Required if IDIST = 1 or IDIST = 2 on the SCORE command)

Purpose

To supply arbitrary prior distributions of scale scores for the respondents when EAP estima-
tion is selected. This command follows directly after the SCORE command.

If:

• IDIST = 0: This command is not required.
• IDIST = 1: There must be a separate QUADSj command for each group for each subtest.
• IDIST = 2: There must be a separate QUADSj command for each group. The same set of
  QUADS commands applies to all subtests.
• IDIST = 3: This command is not required.
• IDIST = 4: This command is not required.

If there are multiple groups (NGROUPS > 1) and IDIST = 1 or 2, the POINTS must have the
same values for all groups. The WEIGHTS may differ by group, and the POINTS may differ by
subtest.

Format

>QUADS POINTS=(list), WEIGHTS=(list);

Example

In the 2-group example below, an illustration is given of the use of user-supplied priors for
the scale scores (IDIST=2) for the respondents when EAP estimation is selected (METHOD=2).
The points and weights for these distributions are supplied in the QUADS commands. Note
that with IDIST=2, there are separate QUADS commands for each group.

>SCORE NQPT=10, METHOD=2, IDIST=2, INFO=1, YCOMMON, POP;
>QUADS1 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2464E-05,0.4435E-03,0.1724E-01,0.1682E+00,
0.3229E+00,0.3679E+00,0.1059E+00,0.1685E-01,0.6475E-03,
0.8673E-05);
>QUADS2 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,
-0.4453E+00,0.5930E+00,0.1631E+01,0.2670E+01,0.3708E+01,
0.4746E+01),
WEIGHTS=(0.2996E-04,0.1300E-02,0.1474E-01,0.1127E+00,
0.3251E+00,0.3417E+00,0.1816E+00,0.2149E-01,0.1307E-02,
0.3154E-04);

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• SCORE command: IDIST keyword (see Section 2.6.16)
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

POINTS keyword (optional)

Purpose

To specify real-numbered values (with decimal points) for the NQPT points of the arbitrary
discrete prior distribution.

Format

POINTS=( n1 , n2 ,..., nNQPT )

Default

Supplied by program.

Example

See example above.

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• SCORE command: IDIST or NQPT keywords (see Section 2.6.16)
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

WEIGHTS keyword (optional)

Purpose

To specify real-numbered, non-negative values (with decimal points) for the NQPT weights
of the arbitrary discrete prior distribution. The sum of the weights must equal unity.

Format

WEIGHTS=( n1 , n2 ,..., nNQPT )

Default

Supplied by program.

Example

See the example above.

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• SCORE command: IDIST or NQPT keywords (see Section 2.6.16)
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

2.6.15 SAVE command

(Required when SAVE is specified on the GLOBAL command)

Purpose

This command is used to supply output filenames. The filenames must be less than 128
characters long and may contain a drive prefix, a path name, and an extension. The filename
must be enclosed in single quotes. Note that each line of the command file has a maximum
length of 80 characters. If the filename does not fit on one line of 80 characters, the remain-
ing characters should be placed on the next line, starting at column 1. All output files other
than the MASTER and CALIB files are saved in a formatted form. See Section 2.6.20 on output
files for more information. Note that, in order to use the SAVE command, the SAVE option
must be included in the GLOBAL command.

Format

>SAVE MASTER=n, CALIB=n, PARM=n, SCORE=n, COVARIANCE=n, TSTAT=n, POST=n,
      EXPECTED=n, ISTAT=n, DIF=n, DRIFT=n, PDISTRIB=n;

Example

In the syntax below, the item parameters and scale scores are saved to file through use of the
SCORE and PARM keywords on the SAVE command. Note that, in order to use the SAVE
command, the SAVE option is added to the GLOBAL command.

>GLOBAL DFNAME='EXAMPLE.DAT', NPARM=2, SAVE;
>SAVE SCORE='EXAMPLE.SCO', PARM='EXAMPLE.PAR';
>LENGTH NITEMS=(40);

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

CALIB keyword (optional)

Purpose

To specify a filename for the calibration data file that is to be saved. The original response
data are sampled and calibrated, then saved as a temporary binary file. If no sampling oc-
curs, this temporary file cannot be created. Upon normal termination of the program this
temporary file is deleted automatically. By assigning a specific name to the calibration data
file, the user can save and reuse it as a master data file in subsequent analyses.

Format

CALIB=<'filename'>

Default

Do not save.

Example

The calibration file is saved to exampl03.cal using the CALIB keyword on the SAVE com-
mand.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE CALIB='EXAMPL03.CAL';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

COVARIANCE keyword (optional)

Purpose

To specify a filename for the external file to which the covariances of item parameter esti-
mates for each item are written. This file is written automatically in the calibration phase
(Phase 2) as a temporary file, which passes necessary information to the scoring phase
(Phase 3). Normally, it is deleted at the termination of the program, but by assigning a spe-
cific name to this file the user can save it as a permanent file.

Format

COVARIANCE=<'filename'>

Default

Do not save.

Example

A covariance file from a previous calibration can be used to compute test information by
specifying the name of the file with the COVARIANCE keyword on the SAVE command. During
the scoring phase, the item information indices will be added to this file if requested. This
feature is intended for use when scoring is based on a previously created item parameter file.

It must be used in conjunction with an IFNAME specification on the GLOBAL command, as
shown below:

>GLOBAL IFNAME='EXAMPLE.PAR', SAVE;
>SAVE COV='EXAMPLE.COV';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• GLOBAL command: IFNAME keyword
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

DIF keyword (optional)

Purpose

To specify a filename for saving the DIF parameters if requested and computed during the
calibration phase (Phase 2) to an external file.

Format

DIF=<'filename'>

Default

Do not save.

Example

The DIF parameters are saved to the file exampl03.dif using the DIF keyword on the SAVE
command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE DIF='EXAMPL03.DIF';
>INPUT NGROUPS=2, DIF, …;

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• INPUT command: DIF option (see Section 2.6.9)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

DRIFT keyword (optional)

Purpose

To specify a filename for saving the DRIFT parameters computed during the calibration
phase (Phase 2) to an external file.

Format

DRIFT=<'filename'>

Default

Do not save.

Example

In the following example, the DRIFT parameters are saved to the file exampl03.dri using
the DRIFT keyword on the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE DRIFT='EXAMPL03.DRI';
>INPUT NGROUPS=2, DRIFT, …;

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• INPUT command: DRIFT option (see Section 2.6.9)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

EXPECTED keyword (optional)

Purpose

To specify the filename to which the expected frequencies of correct responses, attempts,
and proportions of correct responses for each item at each quadrature point, by group, will
be saved. This file will also contain standardized posterior residuals and model proportions
of correct responses.

Format

EXPECTED=<'filename'>

Default

Do not save.

Example

In the following example, the expected frequencies are saved to exampl03.frq using the
EXPECTED keyword on the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE EXPECTED='EXAMPL03.FRQ';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

ISTAT keyword (optional)

Purpose

To specify a filename for saving the classical item statistics computed in Phase 1 of the pro-
gram to an external file.

Format

ISTAT=<'filename'>

Default

Do not save.

Example

The classical item statistics are saved to the file exampl03.sta using the ISTAT keyword on
the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE ISTAT='EXAMPL03.STA';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

MASTER keyword (optional)

Purpose

To specify a filename for the master data file. The original response data are scored and
stored as a temporary binary file. Upon normal termination of the program this temporary
file is deleted automatically. By assigning a specific name to this master data file, the user
can save and reuse it as an input file in subsequent analyses.

Format

MASTER=<'filename'>

Default

Do not save.

Example

The master file is saved to exampl03.mas using the MAS keyword on the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE MAS='EXAMPL03.MAS';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• SAVE command: CALIB keyword
• Save menu (see Section 2.3.6)

PARM keyword (optional)

Purpose

To specify a filename for the item parameter file.

Item parameter estimates are saved in a formatted form as an external output file. This file
can be used as initial estimates of item parameters for further iterations or as final estimates
of the item parameters for scoring new data.

In either case, the user must specify the name of the previously created item parameter file
using the IFNAME keyword of the GLOBAL command.

Format

PARM=<'filename'>

Default

Do not save.

Example

In the syntax below, the item parameters are saved to file through use of the PARM keyword
on the SAVE command. Note that, in order to use the SAVE command, the SAVE option is
added to the GLOBAL command.

>GLOBAL DFNAME='EXAMPLE.DAT', NPARM=2, SAVE;
>SAVE PARM='EXAMPLE.PAR';
>LENGTH NITEMS=(40);

The use of this file as initial estimates for further iterations is illustrated in the syntax below:

>GLOBAL DFNAME='EXAMPLE.DAT', IFNAME='EXAMPLE.PAR', NPARM=2;
>LENGTH NITEMS=(40);

Related topics

• GLOBAL command: IFNAME keyword (see Section 2.6.7)
• GLOBAL command: SAVE option
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

PDISTRIB keyword (optional)

Purpose

To save the points and weights of the posterior latent distribution at the end of Phase 2 to an
external file. These quantities can be included as prior values following the SCORE command
for later EAP estimation of ability from previously estimated item parameters.

Format

PDISTRIB=<'filename'>

Default

Do not save.
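
Example

As a sketch, the points and weights of the posterior latent distribution could be saved as
follows; the output filename and its extension are arbitrary:

>GLOBAL DFNAME='EXAMPL03.DAT', NPARM=2, SAVE;
>SAVE PDISTRIB='EXAMPL03.PDI';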

Related topics

• SCORE command (see Section 2.6.16)
• Save menu (see Section 2.3.6)

POST keyword (optional)

Purpose

To save the case weight and marginal probability for each observation to an external output
file.

Format

POST=<'filename'>

Default

Do not save.

Example

The case weights and marginal probabilities are saved to the file exampl03.pos using the
POST keyword on the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE POST='EXAMPL03.POS';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

SCORE keyword (optional)

Purpose

To specify a filename when the score file is to be saved.

Format

SCORE=<'filename'>

Default

Do not save.

Example

In the following example, the score file is saved to exampl03.sco using the SCORE keyword
on the SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE SCORE='EXAMPL03.SCO';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

TSTAT keyword (optional)

Purpose

To specify a filename when the tables of test information statistics are to be saved.

Format

TSTAT=<'filename'>

Default

Do not save.

Example

The test information statistics file is saved to exampl03.tsa using the TSTAT keyword on the
SAVE command.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2, SAVE;
>SAVE TSTAT='EXAMPL03.TSA';

Related topics

• GLOBAL command: SAVE option (see Section 2.6.7)
• Output files (see Section 2.6.20)
• Save menu (see Section 2.3.6)

2.6.16 SCORE command

(Optional)

Purpose

To initiate the scoring of individual examinees or of response patterns; to compute item and
test information and plot information curves; to rescale scores to a specified mean and stan-
dard deviation in either the sample or the latent distribution.

Format

>SCORE METHOD=n, NQPT=(list), IDIST=n, PMN=(list), PSD=(list), RSCTYPE=n,
       LOCATION=(list), SCALE=(list), INFO=n, BIWEIGHT, FIT, NOPRINT,
       YCOMMON, POP, MOMENTS, DOMAIN=n, FILE=<'filename'>, READF,
       REFERENCE=n, NFORMS=n;

Examples

The aggregate scores for the following analysis of school-level data are estimated by the
EAP method using the empirical distributions from Phase 2. The number of quadrature
points is set to 12 per subtest.

The scores are rescaled to a mean of 250 and a standard deviation of 50 in the latent distri-
bution of schools (IDIST=3, LOCATION=250, SCALE=50). The fit of the data to the group-
level model is tested for each school (FIT).

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

The next SCORE command gives the specifications for a scoring phase that includes an in-
formation analysis (INFO=2) with expected information indices for a normal population
(POP). Rescaling of the scores and item parameters to mean 0 and standard deviation 1 in the
sample of scale score estimates has been requested (RSC=3). Printing of the students' scores on
the screen is suppressed (NOPRINT).

>SCORE NQPT=6, NOPRINT, RSCTYPE=3, INFO=2, POP;

In the following SCORE command, the EAP scale scores of Phase 3 are computed from the
responses to items in the main test as specified by setting METHOD to 2. Printing of scores is
suppressed (NOPRINT).

>SCORE METHOD=2,NOPRINT;

In this SCORE command, Maximum Likelihood estimates of ability (METHOD=1) are rescaled to
a mean of 250 and standard deviation of 50 in Phase 3 (RSCTYPE=3, LOCATION=250,
SCALE=50).

>SCORE METHOD=1, RSCTYPE=3, LOCATION=(250.0000), SCALE=(50.0000), INFO=1,
       NOPRINT;

Related topics

• Phase 2: CALIBRATE (see Section 2.2)
• Setup menu: Test Scoring dialog box (see Section 2.3.3)
• Technical menu: Score Options dialog box (see Section 2.3.5)
• Technical menu: Scoring Prior Latent Distribution dialog box

BIWEIGHT option (optional)

Purpose

To request the calculation of biweighted estimates robust to isolated deviant responses. (See
also Mislevy & Bock, 1982.)

Format

BIWEIGHT
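
Example

A minimal sketch, here assuming that biweighted estimation is combined with maximum
likelihood scoring (METHOD=1):

>SCORE METHOD=1, BIWEIGHT, NOPRINT;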

Related topics

• Setup menu: Test Scoring dialog box (see Section 2.3.3)

DOMAIN keyword (optional)

Purpose

To convert the Phase 3 estimates into domain scores if the user supplies a file containing the
item parameters for a sample of previously calibrated items. The FILE keyword on the
SCORE command is used to specify this parameter file. Weights can be applied to the items to
improve the representation of the domain specifications. This conversion may be useful as
an aid to the interpretation of test results (see Bock, Thissen, & Zimowski, 1997).

Note that the formula for the domain scores that appears in the paper cited here contains ty-
pographical errors. The computation of the domain scores in the program uses the corrected
formula. The domain scores will appear in the score listing following the test percent correct
score for each case in the Phase 3 output file.

Format

DOMAIN=n

where n represents the number of items in the domain.

Default

No domain scores.
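
Example

As a sketch, domain scores based on a domain of 40 previously calibrated items might be
requested as follows; the number of items and the filename are hypothetical:

>SCORE METHOD=2, DOMAIN=40, FILE='DOMAIN.PAR';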

Related topics

• SCORE command: FILE keyword
• Technical menu: Score Options dialog box (see Section 2.3.5)

FILE keyword (required if DOMAIN keyword is used)

Purpose

To specify the name of the file containing the item parameters to be used for the domain
score conversions.

The first line of the file referenced by the FILE keyword must contain a variable format
statement (in parentheses) describing the column layout of the weights and parameters in the
following lines of the file. The values must be read in order—item weight, slope, threshold,
and guessing parameter. The weights will be automatically scaled to sum to 1.0 by the pro-
gram. The domain score will appear in the score listing following the test percent correct
score for each case. Note that the parameter file produced by the SAVE command does not
have the layout described above.

Format

FILE=<'filename'>

Default

No domain scores or supplied file.
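
Example

A hypothetical parameter file for a four-item domain might look as follows. The first line is
the variable format statement; each remaining line gives an item's weight, slope, threshold,
and guessing parameter. The values are illustrative, and the equal weights will be rescaled
by the program to sum to 1.0:

(4F10.5)
   1.00000   1.27168   0.10504   0.14011
   1.00000   1.79009   0.10221   0.07543
   1.00000   0.81238   0.24523   0.22179
   1.00000   1.33017  -0.22387   0.15453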

Related topics

• SCORE command: DOMAIN keyword
• Technical menu: Score Options dialog box (see Section 2.3.5)

FIT option (optional)

Purpose

To request the computation of a likelihood ratio χ² goodness-of-fit statistic for each
response pattern. This statistic is intended only for use with aggregate-level data.

Format

FIT

Default

No fit statistic.

Example

The aggregate scores for this analysis of school-level data are estimated by the EAP method
using the empirical distributions from Phase 2. The fit of the data to the group-level model is
tested for each school (FIT).

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

Related topics

• Setup menu: Test Scoring dialog box (see Section 2.3.3)

IDIST keyword (optional)

Purpose

To designate the type of prior distribution of scale scores. IDIST = 0 applies to both MAP
and EAP estimation. IDIST = 1, 2, 3, or 4 applies only to EAP estimation.

Format

IDIST=n

• n = 0: Standard normal approximation.
• n = 1: Separate, arbitrary discrete priors for each group for each subtest, read from the
  QUADSj commands.
• n = 2: Separate, arbitrary discrete priors for each group, read from the QUADSj commands.
• n = 3: Empirical prior estimated during Phase 2.
• n = 4: 35-point rectangular prior on the interval ±3.5. These scores may be transformed
  to the 150 – 850 range by setting LOCATION = 500.0 and SCALE = 100.0, as in the sketch
  following the examples below.

Default

0.

Examples

In the following aggregate-level example, IDIST=3 is used to estimate scores by the EAP
method by using the empirical distributions from Phase 2.

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

In the next example, EAP estimates of ability are calculated (METHOD=2) using the informa-
tion in the posterior distributions from Phase 2 (IDIST=3). The ability estimates are rescaled
to a mean of 0 and standard deviation of 1 by specifying RSCTYPE=3 on the SCORE command.

>SCORE METHOD=2, IDIST=3, NOPRINT, RSCTYPE=3;
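
As a further sketch, the IDIST=4 rectangular prior could be combined with the 150 – 850
transformation mentioned in the keyword list above, presumably through the linear rescaling
option (RSCTYPE=1), since 100.0 × (±3.5) + 500.0 spans 150 to 850:

>SCORE METHOD=2, IDIST=4, RSCTYPE=1, LOCATION=(500.0), SCALE=(100.0);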

Related topics

• QUADS command (see Section 2.6.14)
• Setup menu: Test Scoring dialog box (see Section 2.3.3)
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

INFO keyword (optional)

Purpose

To select information output.

Format

INFO=n

• n = 0: none
• n = 1: test information curves
• n = 2: test information curves and table of information statistics

Default

0.

Examples

The following SCORE command gives the specifications for a scoring phase that includes an
information analysis (INFO=2) with expected information indices for a normal population
(POP).

>SCORE NQPT=6, NOPRINT, RSCTYPE=4, INFO=2, POP;

In the following SCORE command, Maximum Likelihood estimates of ability (METHOD=1) are
rescaled to a mean of 250 and standard deviation of 50 in Phase 3.

>SCORE METHOD=1, RSCTYPE=3, LOCATION=(250.0000), SCALE=(50.0000),
       INFO=1, NOPRINT;

Related topics

• SCORE command: POP and YCOMMON options

LOCATION keyword (optional)

Purpose

To specify real-valued location constants (with decimal points) for rescaling.

Format

LOCATION=( n1 , n2 ,..., nNTEST )

Default

0.0.

Examples

The scores are rescaled to a mean of 250 and a standard deviation of 50 in the latent distri-
bution of schools (IDIST=3, LOCATION=250, SCALE=50). The fit of the data to the group-
level model is tested for each school (FIT).

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

In the next SCORE command, Maximum Likelihood estimates of ability (METHOD=1) are
rescaled to a mean of 250 and standard deviation of 50 in Phase 3.

>SCORE METHOD=1, RSCTYPE=3, LOCATION=(250.0000), SCALE=(50.0000),
       INFO=1, NOPRINT;

Related topics

• SCORE command: RSCTYPE keyword
• SCORE command: SCALE keyword
• Setup menu: Test Scoring dialog box (see Section 2.3.3)

METHOD keyword (optional)

Purpose

To specify the method of estimating scale scores. If ML is selected, it is advisable to use the
PMN keyword to set bounds on the estimated scores. If EAP or MAP is selected, the PMN and
PSD keywords may be used to specify the means and standard deviations of the prior distri-
butions.

Format

METHOD=n

• n = 1: Maximum likelihood (ML)
• n = 2: Expected a posteriori (EAP; Bayes)
• n = 3: Maximum a posteriori (MAP; Bayes modal)

Default

2.

Examples

In this SCORE command, Maximum Likelihood estimates of ability (METHOD=1) are rescaled to
a mean of 250 and standard deviation of 50 in Phase 3 (RSCTYPE=3, LOCATION=250,
SCALE=50).

>SCORE METHOD=1, RSCTYPE=3, LOCATION=(250.0000), SCALE=(50.0000),
       INFO=1, NOPRINT;

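A MAP analysis (METHOD=3) might be sketched as follows; the prior mean and standard
deviation supplied through PMN and PSD are illustrative values:

>SCORE METHOD=3, PMN=(0.0), PSD=(1.0), NOPRINT;
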
Related topics

• SCORE command: PMN keyword
• SCORE command: PSD keyword
• Setup menu: Test Scoring dialog box (see Section 2.3.3)

MOMENTS option (optional)

Purpose

To request the computation and listing of the coefficients of skewness and kurtosis of the
ability estimates and of the latent distribution.

Format

MOMENTS

Default

No computation or listing.

Examples

The MOMENTS option on the SCORE commands below is used to obtain the coefficients of
skewness and kurtosis of the rescaled ability estimates.

>SCORE NQPT=11,RSCTYPE=3,LOCATION=250,SCALE=50,NOPRINT,INFO=1,
POP,MOMENT;

>SCORE IDIST=3,RSCTYPE=3,INFO=1,YCOMMON,POP,NOPRINT,MOMENTS;

Related topics

• Technical menu: Score Options dialog box (see Section 2.3.5)

NFORM keyword (optional)

Purpose

To indicate the number of additional FORM commands after the SCORE command. It is used
when scoring is to be performed using these additional form specifications. The reference
form for scoring is set using the REFERENCE keyword on the SCORE command.

Format

NFORMS=n

Default

No additional FORM commands are expected.

Example

In the example below, two additional FORM commands follow the SCORE command. The first
is the reference form (as set by the REFERENCE keyword), while the READF option instructs
the program to read and process the additional FORM commands.

>SCORE IDIST=3,RSCTYPE=3,INFO=1,YCOMMON,POP,NOPRINT,REF=1,NFORMS=2,READF;
>FORM1 LENGTH=25,INUM=(1(1)25);
>FORM2 LENGTH=25,INUM=(21(1)45);

Related topics

• FORM command (see Section 2.6.6)
• SCORE command: READF option
• SCORE command: REFERENCE keyword

NOPRINT option (optional)

Purpose

To suppress the display of the scores on screen and in the printed output of Phase 3.

To shorten the run time for scoring a large subject response file, it is advisable to specify an
external file using the SCORE keyword in the SAVE command, and the NOPRINT option. In this
way, scores for all subjects are computed but are stored only in the external file.

Format

NOPRINT

Default

Scores will appear both on screen and in the Phase 3 output file.

Examples

The EAP scale scores of Phase 3 are computed from the responses to items in the main test
as specified by setting METHOD to 2. Printing of scores is suppressed (NOPRINT).

>SCORE METHOD=2, NOPRINT;

Related topics

• Setup menu: Test Scoring dialog box (see Section 2.3.3)

NQPT keyword (optional)

Purpose

To set the number of quadrature points for each subtest when EAP estimation is selected by
the METHOD keyword.

To reduce computing time when there are not-presented items, use 2 × the square root of
the maximum number of items per respondent as the number of quadrature points.
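For example, if no respondent is presented more than 36 items, this rule suggests
2 × √36 = 12 quadrature points per subtest.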

Format

NQPT=( n1 , n2 ,..., nNTEST )

Default

Computed by program as a function of number of items in complete data.

Example

The aggregate scores for this analysis of school-level data are estimated by the EAP method
using the empirical distributions from Phase 2. The number of quadrature points is set to 12
per subtest.

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• SCORE command: METHOD keyword
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

PMN keyword (optional)

Purpose

To specify real-numbered means (with decimal points) of the normal prior distributions for
each group for each subtest.

Format

PMN=( n1,1, n1,2, ..., n1,NGROUP, n2,1, n2,2, ..., n2,NGROUP, ..., nNTEST,1, nNTEST,2, ..., nNTEST,NGROUP )

Default

0.0.

Example

In the following two-group analysis for one subtest, the PMN and PSD keywords are used on
the SCORE command to provide the means and standard deviations of the normal prior
distributions for each group.

>SCORE PMN=(0.00051,-0.16191), PSD=(0.00001,0.89707);

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• SCORE command: PSD keyword
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

POP option (optional)

Purpose

To request the calculation of the expected information for the population when INFO > 0.
This includes an estimate of the classical reliability coefficient for each subtest. The score
metric after rescaling is used in these calculations.

Format

POP

Default

No expected information calculated for population.

Example

This SCORE command gives the specifications for a scoring phase that includes an
information analysis (INFO=2) with expected information indices for a normal population
(POP). Rescaling of the scores and item parameters to mean 0 and standard deviation 1 in
the estimated latent distribution has been requested (RSC=4). Printing of the students'
scores on the screen is suppressed (NOPRINT).

>SCORE NQPT=6, NOPRINT, RSCTYPE=4, INFO=2, POP;

Related topics

• SCORE command: INFO keyword
• Phase 3: SCORING (see Section 2.2)

PSD keyword (optional)

Purpose

To specify real-numbered standard deviations (with decimal points) of the normal prior dis-
tributions for each group for each subtest.

Format

PSD=( n1,1, n1,2, ..., n1,NGROUP, n2,1, n2,2, ..., n2,NGROUP, ..., nNTEST,1, nNTEST,2, ..., nNTEST,NGROUP )

Default

1.0.

Example

In the following two-group analysis for one subtest, the PMN and PSD keywords are used on
the SCORE command to provide the means and standard deviations of the normal prior
distributions for each group.

>SCORE PMN=(0.00051, -0.16191), PSD=(0.00001, 0.89707);

Related topics

• GLOBAL command: NTEST keyword (see Section 2.6.7)
• INPUT command: NGROUP keyword (see Section 2.6.9)
• SCORE command: PMN keyword
• Technical menu: Scoring Prior Latent Distribution dialog box (see Section 2.3.5)

READF option (optional)

Purpose

To indicate the presence of multiple FORM commands after the SCORE command. It is used to
indicate that scoring is to be performed using this form specification. The reference form for
scoring is set using the REFERENCE keyword on the SCORE command.

Format

READF

Default

No additional FORM commands are expected.

Example

In the example below, two additional FORM commands follow the SCORE command. The first
is the reference form (as set by the REFERENCE keyword), while the READF option instructs
the program to read and process the additional FORM commands.

>SCORE IDIST=3,RSCTYPE=3,INFO=1,YCOMMON,POP,NOPRINT,REF=1,NFORMS=2,READF;
>FORM1 LENGTH=25,INUM=(1(1)25);
>FORM2 LENGTH=25,INUM=(21(1)45);

Related topics

• FORM command (see Section 2.6.6)
• SCORE command: REFERENCE and NFORM keywords

REFERENCE keyword (optional)

Purpose

To set the reference form for scoring when scoring is performed by forms, as specified with
the READF and NFORM keywords on the same command. Note that, if this keyword is omitted
while the READF and NFORM keywords are present, the reference form specified in the CALIB
command will be used.

Format

REFERENCE=n

Default

Set by REFERENCE keyword on CALIB command.

Example

In the example below, two additional FORM commands follow the SCORE command. The first
is the reference form (as set by the REFERENCE keyword), while the READF option instructs
the program to read and process the additional FORM commands.

>SCORE IDIST=3,RSCTYPE=3,INFO=1,YCOMMON,POP,NOPRINT,REF=1,NFORMS=2,READF;
>FORM1 LENGTH=25,INUM=(1(1)25);
>FORM2 LENGTH=25,INUM=(21(1)45);

Related topics

• CALIB command: REFERENCE keyword (see Section 2.6.3)
• SCORE command: READF option
• SCORE command: NFORM keyword

RSCTYPE keyword (optional)

Purpose

To specify the type of rescaling required.

Format

RSCTYPE=n

Uses the LOCATION and SCALE constants specified by the options below. Note that there is no
option 2.

• n = 0: no rescaling
• n = 1: linear transformation of scores: new score = SCALE × old score + LOCATION (see
  the sketch following the examples below)
• n = 3: rescale to SCALE and LOCATION in the sample of scale score estimates
• n = 4: only if EAP estimation has been selected: set the mean of the latent population
  distribution equal to LOCATION and the standard deviation equal to SCALE.

Default

0.

Examples

The aggregate scores for this analysis of school-level data are estimated by the EAP method
using the empirical distributions from Phase 2. The number of quadrature points is set to 12
per subtest. The scores are rescaled to a mean of 250 and a standard deviation of 50 in the
latent distribution of schools (IDIST=3, LOCATION=250, SCALE=50). The fit of the data to the
group-level model is tested for each school (FIT).

>SCORE NQPT=(12, 12), IDIST=3, RSCTYPE=4, LOCATION=(250.0000, 250.0000),
       SCALE=(50.0000, 50.0000), FIT;

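As a further sketch with illustrative constants, a linear transformation (RSCTYPE=1)
computing new score = 10 × old score + 50 could be requested as:

>SCORE RSCTYPE=1, LOCATION=(50.0), SCALE=(10.0);
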
Related topics

• SCORE command: LOCATION and SCALE keywords
• Setup menu: Test Scoring dialog box (see Section 2.3.3)

SCALE keyword (optional)

Purpose

To specify real-valued scale constants (with decimal points) for rescaling.

Format

SCALE=( n1 , n2 ,..., nNTEST )

Default

1.0.

Examples

In the following example, Maximum Likelihood estimates of ability (METHOD=1) are rescaled
to a mean of 250 and standard deviation of 50 in Phase 3 (RSCTYPE=3, LOCATION=250,
SCALE=50).

>SCORE METHOD=1, RSCTYPE=3, LOCATION=(250.0000), SCALE=(50.0000),
       INFO=1, NOPRINT;

Related topics

• SCORE command: LOCATION and RSCTYPE keywords
• Setup menu: Test Scoring dialog box (see Section 2.3.3)

YCOMMON option (optional)

Purpose

To specify that the test information curves for subtests should be expressed in comparable
units when INFO > 0. If YCOMMON is omitted, the curves for subtests are adjusted separately
to make each plot fill the available space.

Format

YCOMMON

Default

Plots adjusted separately.

Example

The following SCORE command specifies a scoring phase that includes an information analy-
sis (INFO=2) with expected information indices for a normal population (POP).

Test information curves for subtests will be expressed in comparable units and printed to the
Phase 3 output file.

>SCORE INFO=2, POP, YCOMMON;

Related topics

• SCORE command: INFO keyword
• SCORE command: POP option

2.6.17 TEST command

(Required)

Purpose

To identify the main test items and the variant test items (if any) in each of the NTEST sub-
tests. If the subtest contains only main test items, there is only one TEST command for that
subtest. If there are variant items in the subtest, two TEST commands are required for that
subtest. The first describes the main test items, while the second describes the variant test
items. There are as many TEST commands as there are main and variant subtests specified in
the NTEST and NVTEST keywords of the GLOBAL command.

Items may be identified by name or number, but not by both. The names or numbers must
correspond to those listed in the ITEMS command. If numbers are supplied, the program will
refer to the names supplied in the ITEMS command only for printing of item information.
Starting values for estimating the item parameters may also be supplied in the TEST com-
mand. Note that parameter estimation for variant items is non-iterative and does not require
starting values.

Format

>TEST TNAME=n, INUMBER=(list), INAME=(list), INTERCPT=(list),
      SLOPE=(list), THRESHLD=(list), GUESS=(list), DISPERSN=(list),
      FIX=(list);

Default

All items are used.

Examples

In the example below, two subtests are used, each with 8 items. The NTEST keyword on the
GLOBAL command indicates that two subtests are to be used, and two TEST commands follow
the ITEMS command. The TEST commands are assigned names through the TNAME keyword
and items are referenced by number.

>GLOBAL NPARM=3,NTEST=2,DFNAME='EXAMPL08.DAT';
>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16,NALT=5,NIDCHAR=9,TYPE=3;
>ITEMS INUMBERS=(1(1)16),INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON,INUMBERS=(1(1)8);
>TEST2 TNAME=ALGCON,INUMBERS=(9(1)16);

In the next example, the ITEMS command lists the four items in the order that they will be
read from the data records. The INAMES and INUMBERS keywords assign each item a name
and a corresponding number. Because there is only one form, the NFORM keyword is not
required in the INPUT command and a FORM command is not required. Because examinees in
both groups are presented all the items listed in the ITEMS command, the TEST command
need contain only the test name.

>GLOBAL NPARM=1,NWGHT=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NGROUPS=2,DIF,NIDCHAR=2,TYPE=2;
>ITEMS INAMES=(SP1(1)SP4),INUMBERS=(1(1)4);
>TEST TNAME=SPELL;
>GROUP1 GNAME=MALES;
>GROUP2 GNAME=FEMALES;

Related topics

• GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)
• ITEMS command (see Section 2.6.10)
• Setup menu: General dialog box (see Section 2.3.3)
• Setup menu: Item Analysis dialog box
• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)
• Technical menu: Assign Fixed Items dialog box

DISPERSN keyword (optional)

Purpose

To specify positive real-numbered starting values for dispersion (2- and 3-parameter models
only).

Starting values may be specified for slopes or for dispersions, but not for both.

Format

DISPERSN=( n1, n2, ..., nn(i) )

Default

1/slope.

Example

In the syntax below, starting values for the dispersion and intercepts of the four items con-
sidered in this 3-parameter model are provided on the TEST command.

EXAMPLE:
USING STARTING VALUES
>GLOBAL NPARM=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NIDCHAR=2;

>ITEMS INAME=(SP01,SP02,SP03,SP04),INUMBERS=(1(1)4);
>TEST TNAME=SPELL,
INTERCPT = (1.284,0.287,-1.912,-0.309),
DISPERSN=(0.957,0.623,0.545,0.620);

Related topics

• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)
• TEST command: SLOPE keyword

FIX keyword (optional)

Purpose

To specify whether the parameters of specific items are free to be estimated or are to be held
fixed at their starting values. This keyword appears in the j-th TEST command as
FIX=( n1, n2, ..., nLENGTH(j) ), where

• ni = 0 if the parameters of item i of test j are free to be estimated, or
• ni = 1 if these item parameters are to be held fixed at their starting values.

The starting values may be entered by the SLOPE, THRESHLD, and GUESS keywords of the
j-th TEST command; or read from an existing item parameter file, designated by
IFNAME=<'filename'> on the GLOBAL command and saved in a previous job by the
PARM=<'filename'> keyword on the SAVE command; or, alternatively, read from a file of
provisional item parameters, designated by the PRNAME=<'filename'> keyword on the
GLOBAL command. When only a few items are to be fixed, this method is the most conven-
ient. If all items are designated as fixed, and the INFO keyword appears on the SCORE com-
mand, the required information and reliability analysis will be performed in Phase 3.

In order for this procedure to work, however, the program must have data to process in
Phases 1 and 2 for at least a few cases. Some artificial response data can be used for this
purpose. The only calculations that will be performed in Phase 2 are preparations for the in-
formation analysis in Phase 3. The number of EM cycles in the CALIB command can there-
fore be set to 2 and the number of NEWTON cycles to 1. The NOADJUST option must also be in-
voked.

Format

FIX=( n1, n2, ..., nLENGTH(j) )

Default

Do not fix.

Example

The following command file shows the fixing of five items by specifying values in a PRNAME
file.

EXAMPLE 16: TRADITIONAL IRT ANALYSIS OF A FIFTEEN-ITEM TEST
PARAMETERS OF ITEMS 6 THROUGH 10 ARE FIXED
>GLOBAL NPARM=3, DFNAME='EXAMPL07.DAT',PRNAME='EXAMPL7f.PRM',SAVE;
>SAVE SCORE='EXAMPL7.SCO',PARM='EXAMPL7.PAR';
>LENGTH NITEMS=15;
>INPUT NTOTAL=15,NALT=5,KFNAME='EXAMPL07.KEY',SAMPLE=600,NIDCHAR=4;
>ITEMS INAME=(MATH01(1)MATH15);
>TEST TNAME=PRETEST,FIX=(0(0)5,1(0)5,0(0)5);
(2X,4A1,T25,15A1)
>CALIB CYCLES=15,NEWTON=3,NQPT=11,NOADJUST;
>SCORE NQPT=11,RSCTYPE=3,LOCATION=250,SCALE=50,NOPRINT,INFO=1,POP;

The exampl7f.prm file contains the following 6 lines:

5
6 1.27168 0.10504 0.14011
7 1.79009 0.10221 0.07543
8 0.81238 0.24523 0.22179
9 1.33017 -0.22387 0.15453
10 1.06557 0.58430 0.08921

DIAGNOS has been set equal to 1 to produce more detailed output, which shows that these
values do not change during the Phase 2 estimation cycles. They will, of course, be rescaled
along with those of the estimated items in Phase 3.

Related topics

• CALIB command: NOADJUST option (see Section 2.6.3)
• GLOBAL command: IFNAME and PRNAME keywords (see Section 2.6.7)
• SAVE command: PARM keyword (see Section 2.6.15)
• SCORE command: INFO keyword (see Section 2.6.16)
• TEST command: GUESS, SLOPE, and THRESHLD keywords
• Technical menu: Assign Fixed Items dialog box (see Section 2.3.5)

GUESS keyword (optional)

Purpose

To specify starting values for the lower asymptote (guessing) parameters (3-parameter
model only). These values should be positive fractional numbers with decimal points.

Format

GUESS=( n1, n2, ..., nn(i) )

Default

0.0.

Example

In the syntax below, starting values for the slopes and guessing parameters of the four items
considered in this 3-parameter model are provided on the TEST command.

EXAMPLE:
USING STARTING VALUES
>GLOBAL NPARM=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NIDCHAR=2;
>ITEMS INAME=(SP01,SP02,SP03,SP04),INUMBERS=(1(1)4);
>TEST TNAME=SPELL,
SLOPE=(1.045,1.604,1.836,1.613),
GUESS=(0.189,0.168,0.101,0.152);

Related topics

• GLOBAL command: NPARM keyword (see Section 2.6.7)
• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)

INAMES keyword (optional)

Purpose

To provide a list of names, as specified in the ITEMS command, for the items in this TEST
command. Item names that do not begin with letters must be enclosed in single quotes.

Format

INAME=( n1, n2, ..., nn(i) )

Default

If NTEST =1, and NVTEST = 0, all NTOTAL items are as specified in the INPUT command.
There is no default if NTEST > 1 or NVTEST ≠ 0.

Example

In the following example, responses to 50 items are read from those of 100 items in the data
file. From the 50, 20 are selected as Main Test items and 4 as Variant Test items. Items for
the main test are selected by name in the TESTM command; items for the variant test are se-
lected by name in the TESTV command. The item names correspond to the sequence numbers
in the original set of 100 items.

>GLOBAL DFNAME='EXAMPL06.DAT', NTEST=1,NVTEST=1,NPARM=2;
>LENGTH NITEM=24,NVARIANT=4;
>INPUT NTOTAL=50,KFNAME='EXAMPL06.DAT',SAMPLE=200,NIDCH=11;
>ITEMS INUMBERS=(1(1)50),INAME=(I26(1)I75);
>TESTM TNAME=MAINTEST,
INAMES=(I26,I27,I28,I29,I31,I33,I34,
I35,I36,I38,I39,I47,I48,I49,I50,I54,I60,I64,I68,I72);
>TESTV TNAME=VARIANT,INAMES=(I53,I59,I69,I73);

Related topics

• GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)
• INPUT command: NTOTAL keyword (see Section 2.6.9)
• ITEMS command (see Section 2.6.10)
• LENGTH command (see Section 2.6.11)
• Setup menu: General dialog box (see Section 2.3.3)

INTERCPT keyword (optional)

Purpose

To specify real-numbered starting values (with decimal points) for estimating the item inter-
cept. Starting values may be specified for intercepts or for thresholds, but not for both.

Format

INTERCPT=( n1, n2, ..., nn(i) )

Default

Supplied by the program.

Example

In the syntax below, starting values for the intercepts of the four items considered in this 3-
parameter model are provided on the TEST command.

EXAMPLE:
USING STARTING VALUES
>GLOBAL NPARM=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NIDCHAR=2;
>ITEMS INAME=(SP01,SP02,SP03,SP04),INUMBERS=(1(1)4);
>TEST TNAME=SPELL,INTERCPT = (1.284,0.287,-1.912,-0.309);

Related topics

• TEST command: THRESHLD keyword
• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)

INUMBERS keyword (optional)

Purpose

To provide a list of numbers, as specified in the ITEMS command, for the items in this TEST
command. If TEST refers to main test items, n(i) is the number of main test items. If TEST
refers to variant test items, n(i) is the number of variant test items.

The notation "first (increment) last" in these lists may be used when the item numbers form
an arithmetic progression.
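For example, INUMBERS=(1(1)8) expands to items 1 through 8, and INUMBERS=(1(2)9) would
denote the progression 1, 3, 5, 7, 9.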

Format

INUMBER=( n1, n2, ..., nn(i) )

Default

If NTEST=1 and NVTEST=0, all NTOTAL items as specified in the INPUT command. There is no
default if NTEST>1 or NVTEST ≠ 0.

Examples

For the case where NTEST=1 and NVTEST=1 in the GLOBAL command, NITEMS=10 and
NVARIANT=4 in the LENGTH command, and NTOT=10 in the INPUT command, the main test
items of subtest i might be specified in the first TEST command with
INUMBERS=(1,2,3,6,8,10). The variant test items of subtest i might be specified in the
second TEST command with INUMBERS=(4,5,7,9).

In the example below, two subtests are used, each with 8 items. The NTEST keyword on the
GLOBAL command indicates that two subtests are to be used, and two TEST commands follow
the ITEMS command. The subtests are assigned names through the TNAME keyword and items
are referenced by number.

>GLOBAL NPARM=3,NTEST=2,DFNAME='EXAMPL08.DAT';
>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16,NALT=5,NIDCHAR=9,TYPE=3;
>ITEMS INUMBERS=(1(1)16),INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON,INUMBERS=(1(1)8);
>TEST2 TNAME=ALGCON,INUMBERS=(9(1)16);

Related topics

• GLOBAL command: NTEST or NVTEST keywords (see Section 2.6.7)
• ITEMS command (see Section 2.6.10)
• LENGTH command (see Section 2.6.11)
• Setup menu: Item Analysis dialog box (see Section 2.3.3)

SLOPE keyword (optional)

Purpose

To provide starting values for slopes (2- and 3-parameter models only). These starting val-
ues should be positive, real numbers with decimal points. Starting values may be specified
for slopes or for dispersions, but not for both.

Format

SLOPE=( n1, n2, ..., nn(i) )

Default

1.0.

Example

In the syntax below, starting values for the intercepts and slopes of the four items considered
in this 3-parameter model are provided on the TEST command.

EXAMPLE:
USING STARTING VALUES
>GLOBAL NPARM=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NIDCHAR=2;
>ITEMS INAME=(SP01,SP02,SP03,SP04),INUMBERS=(1(1)4);
>TEST TNAME=SPELL,
INTERCPT = (1.284,0.287,-1.912,-0.309),
SLOPE=(1.045,1.604,1.836,1.613);

Related topics

• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)
• TEST command: DISPERSN keyword

THRESHLD keyword (optional)

Purpose

To specify real-numbered starting values (with decimal points) for estimating the item
thresholds. Starting values may be specified for intercepts or for thresholds, but not for both.

Format

THRESHLD=( n1, n2, ..., nn(i) )

Default

0.0.

Example

In the syntax below, starting values for the slopes and thresholds of the four items consid-
ered in this 3-parameter model are provided on the TEST command.

EXAMPLE:
USING STARTING VALUES
>GLOBAL NPARM=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NIDCHAR=2;
>ITEMS INAME=(SP01,SP02,SP03,SP04),INUMBERS=(1(1)4);
>TEST TNAME=SPELL, SLOPE=(1.045,1.604,1.836,1.613),
THRESHLD=(-1.229,-0.179,1.041,0.192);

Related topics

• Technical menu: Item Parameter Starting Values dialog box (see Section 2.3.5)
• TEST command: INTERCPT keyword

TNAME keyword (optional)

Purpose

To supply a name for subtest i (up to eight characters) if there are no variant test items in
subtest i, or a name for the main test items in subtest i if there are variant test items in
subtest i.

Format

TNAME=character string

Default

None.

Examples

In the example below, two subtests are used, each with 8 items. The NTEST keyword on the
GLOBAL command indicates that two subtests are to be used, and two TEST commands follow
the ITEMS command. The TEST commands are assigned names through the TNAME keyword
and items are referenced by number.

>GLOBAL NPARM=3,NTEST=2,DFNAME='EXAMPL08.DAT';
>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16,NALT=5,NIDCHAR=9,TYPE=3;
>ITEMS INUMBERS=(1(1)16),INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON,INUMBERS=(1(1)8);
>TEST2 TNAME=ALGCON,INUMBERS=(9(1)16);

In the next example, the ITEMS command lists the four items in the order that they will be
read from the data records. Because examinees in both groups are presented all the items
listed in the ITEMS command, the TEST command need contain only the test name.

>GLOBAL NPARM=1,NWGHT=3,LOGISTIC;
>LENGTH NITEMS=4;
>INPUT NTOTAL=4,NGROUPS=2,DIF,NIDCHAR=2,TYPE=2;
>ITEMS INAMES=(SP1(1)SP4),INUMBERS=(1(1)4);
>TEST TNAME=SPELL;
>GROUP1 GNAME=MALES;
>GROUP2 GNAME=FEMALES;

Related topics

• GLOBAL command: NTEST and NVTEST keywords (see Section 2.6.7)
• ITEMS command (see Section 2.6.10)
• Setup menu: General dialog box (see Section 2.3.3)
• TEST command: INUMBERS keyword

2.6.18 TITLE command

(Required)

Purpose

To provide a label that will be used throughout the output to identify the problem run. The
first two lines of the command file are always the title lines. If the title fits on one line, a
second, blank line must be entered before the next command starts.

The maximum length of each line is 80 characters. The text will be printed verbatim at the
top of each output section, as well as at the start of some output files. The two title lines are
required at the start of the command file. No special delimiters (> or ;) are required.

Format

…text…
…text…

Example

EXAMPLE 4
SIMULATED RESPONSES TO TWO 20-ITEM PARALLEL TEST FORMS

Related topics

• Setup menu: General dialog box (see Section 2.3.3)

2.6.19 Variable format statement

(Required)

Purpose

To supply variable format statements describing the column assignments of fields in the data
records.

Format

(aA1,nX,Ib,Ic,Fd.e,Tw,fA1)

where:

a is the number of columns in the ID field.
b is the number of columns in the form indicator field, if any.
c is the number of columns in the group indicator field, if any. If PERSONAL is present
  on the INPUT command, there will be multiple group indicator fields.
d is the number of columns in the case-weight or pattern-frequency field, if any.
e is the number of columns to the right of the decimal place in the case-weight or
  pattern-frequency field, if any.
f is the total number of items in the form when NFORM=1, and the total number of items
  in the longest form when NFORM>1.

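For instance, the hypothetical statement (5A1,I1,I1,F4.0,40A1) would read a 5-column case
ID, a 1-column form indicator, a 1-column group indicator, a 4-column case weight with no
decimal places, and 40 item responses.
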
Notes

Columns skipped between fields are indicated by nX, where n is the number of columns to
be passed over.

If the fields in the data records are not in the above order, the format tab designator (Tw) may
be inserted in front of any of the fields (w is the position of the first column of the field,
counting from column one). Check the input data carefully when left tabs are used.

A forward slash (/) means "skip to the next line". For example,

(5A1,5X,15A1/10X,15A1)

would read the case ID and 15 item responses from line 1; then, skip ten columns and read
15 item responses from line 2.

The variable format statement for aggregate-level data has the general form:

(aA1,Ib,Ic,Fd.e,f(Fg.h,Fi.j))

where:

• g is the number of columns in the "number tried" field.
• h is the number of columns to the right of the decimal point in the "number tried" field.
• i is the number of columns in the "number right" field.
• j is the number of columns to the right of the decimal point in the "number right" field.

Examples

The following example uses simulated responses to illustrate nonequivalent groups equating
of two forms of a 25-item multiple-choice examination administered to different popula-
tions. The two forms have five items in common: C21, C22, C23, C24, and C25. The items
for each group are specified in the GROUP1 and GROUP2 commands. Note that the item lists
on the GROUP commands are the same as those on the FORM commands. This is because
Group 1 took Form 1 of the examination and Group 2 took Form 2 of the examination.

As an answer key is provided in the raw data file (KFNAME=EXAMPL03.DAT on the INPUT
command), the answer key appears first. Note that, when multiple forms are used, an answer
key for each form should be provided. The answer key is in the same format as the data. For
each examinee, two lines of data are provided. The first line contains identifying information
and the second the item responses.

The first information read from the data file is the examinee's ID, which begins in column 35
(5A1). For the first examinee the ID is 00001, and for the last 00200. Using the T operator to
move to column 25, the form indicator is read next (I1). Because the values for form and
group are the same for any given subject, a single form/group indicator appears on each data
record. The indicator is read twice, first for forms and then for groups. The "/" operator is
used to move to the first column of the second line. The 25 item responses are then read as
(25A1).

>GLOBAL DFNAME='EXAMPL03.DAT', NPARM=2;
>LENGTH NITEMS=(45);
>INPUT NTOTAL=45, SAMPLE=2000, NIDCHAR=5, NALT=5, NGROUP=2,
NFORM=2, TYPE=1, KFNAME='EXAMPL03.DAT';
>ITEMS INAMES=(C01(1)C45), INUMBERS=(1(1)45);
>TEST1 TNAME='CHEMISTR', INUMBERS=(1(1)45);
>FORM1 LENGTH=25, INUMBERS=(1(1)25);
>FORM2 LENGTH=25, INUMBERS=(21(1)45);
>GROUP1 GNAME='POP1', LENGTH=25, INUMBERS=(1(1)25);
>GROUP2 GNAME='POP2', LENGTH=25, INUMBERS=(21(1)45);
(T35,5A1,T25,I1,T25,I1/25A1)

ANSWER KEY FORM 1 1
1111111111111111111111111
ANSWER KEY FORM 2 2
1111111111111111111111111
Samp1 GROUP1 11 1 00001
1111111112221212211111121
Samp1 GROUP1 11 2 00002
2211111212222222222255222
Samp2 GROUP2 22 2 00199
2422221211222211222221121

Samp2 GROUP2 22 100 00200
1111111111111111212111111

The following example illustrates the equating of equivalent groups with the BILOG-MG
program. Two parallel test forms of 20 multiple-choice items were administered to two
equivalent samples of 200 examinees drawn from the same population. There are no com-
mon items between the forms. Because the samples were drawn from the same population,
GROUP commands are not required. The FORM1 command lists the order of the items in Form
1 and the FORM2 command lists the order of the items in Form 2.

As in the previous example, two lines of data are provided for each examinee. The first line
contains identifying information and the second the item responses. The first information
read from the data file is the examinee's ID, which begins in column 35 (5A1). For the first
examinee the ID is 00001, and for the last 00200. Using the "T" operator to move to column
25, the form indicator is read next (I1). The "/" operator is used to move to the first column
of the second line. The 20 item responses per form are then read in.

>GLOBAL DFNAME='EXAMPL04.DAT', NIDCH=5, NPARM=2;
>LENGTH NITEMS=(40);
>INPUT NTOTAL=40, SAMPLE=2000, NALT=5, NIDCHAR=5, NFORM=2,
KFNAME='EXAMPL04.DAT';
>ITEMS INAMES=(T01(1)T40), INUMBERS=(1(1)40);
>TEST1 TNAME='SIM', INUMBERS=(1(1)40);
>FORM1 LENGTH=20, INUMBERS=(1(1)20);
>FORM2 LENGTH=20, INUMBERS=(21(1)40);
(T35,5A1,T25,I1/40A1)

ANSWER KEY FORM 1 1
11111111111111111111
ANSWER KEY FORM 2 2
11111111111111111111
Samp1 GROUP1 12 1 00001
11111111122212122111
Samp1 GROUP1 12 1 00002
11222212221222222112

Two hundred students at each of three grade levels, grades four, six, and eight, were given
grade-appropriate versions of a 20-item arithmetic examination. Items 19 and 20 appear in
the grade 4 and 6 forms; items 37 and 38 appear in the grade 6 and 8 forms. Because each
item is assigned a unique column in the data records, a FORM command is not required. Both
an answer key and a not-presented key are given at the top of the raw data file
(KFNAME=EXAMPL05.DAT, NFNAME=EXAMPL05.DAT on the INPUT command). In the answer key,
a “1” represents a correct response. In the not-presented key, a not-presented item is
indicated by a blank, " ".

As no FORM command is required, only a group indicator has to be read in. The case ID,
beginning in column 35, is read first (5A1), followed by the group indicator in column 25 (I1).
The 56 item responses are read from the second line of data (56A1) after using the
“/” operator to move to the start of this line.

>GLOBAL DFNAME='EXAMPL05.DAT',NPARM=2;
>LENGTH NITEMS=(56);
>INPUT NTOTAL=56,SAMPLE=2000,NGROUPS=3,KFNAME='EXAMPL05.DAT',
NFNAME='EXAMPL05.DAT',NIDCHAR=5;
>ITEMS INUMBERS=(1(1)56),INAME=(M01(1)M56);
>TEST TNAME=MATH;
>GROUP1 GNAME='GRADE 4',LENGTH=20,INUMBERS=(1(1)20);
>GROUP2 GNAME='GRADE 6',LENGTH=20,INUMBERS=(19(1)38);
>GROUP3 GNAME='GRADE 8',LENGTH=20,INUMBERS=(37(1)56);
(T35,5A1,T25,I1/56A1)

ANSWER KEY
11111111111111111111111111111111111111111111111111111111
NOT-PRESENTED KEY

Samp1 GROUP1 1 1 00001
11111112221211222212
Samp1 GROUP1 1 1 00002
21121211121111121212
Samp3 GROUP3 3 3 00199
12212212212211112121
Samp3 GROUP3 3 3 00200
11111111121111111111

The following example illustrates the use of the TYPE=3 specification on the INPUT com-
mand to analyze aggregate-level, multiple-matrix sampling data. The data in exampl08.dat
are numbers tried and numbers correct for items from eight forms of a matrix-sampled as-
sessment instrument. The groups are samples of eighth-grade students from 32 public schools.
The first record for each school contains the data for the items of a Number Concepts scale,
NUMCON, and the second record contains the data for items of an Algebra Concepts scale,
ALGCON. An answer key is not relevant for aggregate-level data in number-tried, number-
right summary form. Note the format statement for reading the two sets of eight number-
tried, number-right observations from the two data lines; it is decoded after the listing
below. Again, the “/” operator is used to move to the start of the second line of data for
each school.

>GLOBAL DFNAME='EXAMPL08.DAT', NPARM=3, NTEST=2;
>LENGTH NITEMS=(8, 8);
>INPUT NTOTAL=16, NALT=5, TYPE=3, NIDCHAR=9;
>ITEMS INAMES=(N1(1)N8, A1(1)A8), INUMBER=(1(1)16);
>TEST1 TNAME='NUMCON', INUMBERS=(1(1)8);
>TEST2 TNAME='ALGCON', INUMBERS=(9(1)16);
(1X,9A1,5X,8(2F3.0)/15X,8(2F3.0))

SCHOOL 1 NUM 1 0 3 2 2 1 4 4 3 2 2 1 4 3 4 1
SCHOOL 1 ALG 1 0 3 1 2 0 3 2 3 2 2 1 4 1 4 0
SCHOOL 2 NUM 5 3 4 4 3 2 3 3 2 2 4 3 4 3 5 3
SCHOOL 2 ALG 5 2 4 2 3 2 3 2 2 2 4 2 4 2 5 3
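
Read against the data above, the format statement decodes as follows (an editorial reading of
the field widths, not additional program output): 1X skips the first column; 9A1 reads the
nine-character school identification field (NIDCHAR=9 on the INPUT command); 5X skips past
the scale label; and 8(2F3.0) reads the eight pairs of number-tried and number-right values,
each in a three-column field. The “/” operator then advances to the second record, where 15X
skips the same 15 identification columns before the eight ALGCON pairs are read.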

The next example illustrates the use of BILOG-MG with multiple groups and multiple sub-
tests. Based on previous test performance, examinees are assigned to two groups for adap-
tive testing. Out of a set of 45 items, group 1 is assigned items 1 through 25, and group 2 is
assigned items 21 through 45; thus, there are 5 items linking the test forms administered to
the groups.

Twenty of the 25 items presented to group 1 belong to subtest 1 (items 1-15 and 21-25).
Twenty items also belong to subtest 2 (items 6-25). Of the 25 items presented to group 2, 20
belong to subtest 1 (items 21-40) and 20 to subtest 2 (items 21-25 and 31-45).

In all, there are 35 items from the set of 45 assigned to each subtest. (This extent of item
overlap between subtests is not realistic, but it illustrates that more than one subtest can be
scored adaptively provided they each contain link items between the test forms.)

Note that, in this case, the item responses on the second line of data for each examinee rep-
resent responses to different items. When we previously considered these data, the response
in the first column of the second line represented the response to item 1, regardless of group
membership. Here, that response would be the response to item 1 for a member of group 1,
but the response to item 21 for an examinee from group 2.

>GLOBAL DFNAME='EXAMPL03.DAT',NPARM=2,NTEST=2,SAVE;
>SAVE SCORE='EXAMPL09.SCO';
>LENGTH NITEMS=(35,35);
>INPUT NTOTAL=45, SAMPLE=2000, NGROUP=2, KFNAME='EXAMPL03.DAT', NALT=5,
NFORMS=2,NIDCHAR=5;
>ITEMS INUMBERS=(1(1)45), INAME=(C01(1)C45);
>TEST1 TNAME=SUBTEST1, INAME=(C01(1)C15,C21(1)C40);
>TEST2 TNAME=SUBTEST2, INAME=(C06(1)C25,C31(1)C45);
>FORM1 LENGTH=25,INUMBERS=(1(1)25);
>FORM2 LENGTH=25,INUMBERS=(21(1)45);
>GROUP1 GNAME=POP1,LENGTH=25,INUMBERS=(1(1)25);
>GROUP2 GNAME=POP2,LENGTH=25,INUMBERS=(21(1)45);
(T35,5A1,T25,I1,T25,I1/45A1)

ANSWER KEY FORM 1 1
1111111111111111111111111
ANSWER KEY FORM 2 2
1111111111111111111111111
Samp1 GROUP1 11 1 00001 1.0 .48900
1111111112221212211111121
Samp1 GROUP2 22 2 00002 1.0 -.92734
2211111212222222222255222

Default

None.

Related topics

 Data Structures (see Chapter 1)
 Data menu: Examinee Data dialog box (see Section 2.3.4)
 Data menu: Group-Level Data dialog box
 FORM command (see Section 2.6.6)
 GLOBAL command: DFNAME and NWGHT keywords (see Section 2.6.7)
 INPUT command: NFORM, TYPE and PERSONAL keywords (see Section 2.6.9)

2.6.20 Input and output files

Input files

The following data files contain problem information that must be supplied by the user as
needed. Any text editor that writes an ASCII file may be used to prepare these files.

File Keyword

Answer key KFNAME on INPUT command

Not-presented key NFNAME on INPUT command

Omit key OFNAME on INPUT command

Original data file DFNAME on GLOBAL command

Provisional starting values PRNAME on GLOBAL command

Note:

The assignment of specific names to these files, via the keywords on the INPUT and GLOBAL
commands listed above, causes the program to read external files.

These files may be combined into a single file, using the order above: construct an
arbitrarily named file consisting of the answer key, if any; the not-presented key, if any;
the omit key, if any; and the item-response data. Each of the keywords above is then set to
the name of that combined file. Section 10.5 illustrates the combination of an answer key
and not-presented key within the data file.

Format of the input records

 The keys and the data records must have the same fixed-column formats.
 The fields of the data records are read in the following order:
 The respondent identification field (up to 30 columns of characters as specified by the
NIDCHAR keyword on the INPUT command).
 The form number (only if NFORMS>1).
 The group number or numbers (integer) (only if specified by a value larger than 1 for the
NGROUP keyword of the INPUT command).
 A real-valued (with decimal point) case weight for the respondent or frequency for a re-
sponse pattern (only if specified by the NWGHT keyword of the GLOBAL command).
 The individual item-response records or patterns.
 The type of entries in the item-response field is determined by the TYPE keyword of the
INPUT command and by the presence or absence of the KFNAME keyword of the INPUT
command:
 if KFNAME is not present, the item responses are scored 1 = correct and 0 = not correct.
 if KFNAME is present, the item responses are arbitrary single ASCII characters, the correct
alternatives of which appear in the same columns of the answer key.
 In either of the above types of data, not-presented items may be coded by an arbitrary
character defined in the corresponding column of the not-presented key. (See the
NFNAME keyword of the INPUT command in Section 2.6.9.)
 Similarly, omitted items may be coded by another character defined in the corresponding
column of the omit key. (See the OFNAME keyword of the INPUT command.)
 The path to and filename of any of these files may be longer than 80 characters. As the
maximum length of any line in the command file is 80 characters, multiple lines should be
used. It is important to continue up to and including the 80th column when specifying a
long path and filename.

For example, suppose the data file exampl06.dat is in a folder named:

C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG-
VERSION1.2\EXAMPLES

The correct way to enter this information in the command file is to enclose the name and
path in apostrophes, and continue until column 80 is reached. Then proceed in column 1 of
the next line as shown below:

>GLOBAL DFNAME='C:\PROGRAM FILES\ITEM RESPONSE THEORY\IRT_2002\MARCH20\BILOG-MG
-VERSION1.2\EXAMPLES\EXAMPL06.DAT', NTEST=1, NVTEST=1, NPARM=2, SAVE;

If the data are stored in the same folder as the command file, it is sufficient to type

DFNAME='EXAMPL06.DAT'

Related topics

 Data menu: Examinee Data dialog box (see Section 2.3.4)
 Data menu: Group-Level Data dialog box
 Data menu: Item Keys dialog box
 GLOBAL command (see Section 2.6.7)
 INPUT command (see Section 2.6.9)

Output files

Through use of the keywords on the SAVE command, the following output files may be created.

 Ability score file
 Classical item statistics file
 DIF parameter file
 DRIFT parameter file
 Estimated covariance file
 Expected frequency file
 Item parameter file
 Marginal posterior probability file
 Test information file

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.6)

Ability score file

Keyword: SCORE

This file is created during Phase 3 of the program if SCORE is specified in the SAVE command. It
consists of the title records and two records per subtest for each respondent.

The format is as follows:

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created the
ability score file.

3+ Two records per subtest for each respondent, containing the following infor-
mation

Columns Format Description

First record

1–3 I3 group indicator

4–5 2X blank filler

6 – 35 30A1 respondent identification

Second record

1–6 F6.2 respondent case weight

7–7 A1 * if the subject is not calibrated; a blank otherwise

8 – 15 A8 subtest name

16 – 20 I5 number of attempts to items in the subtest

21 – 25 I5 number of correct responses to the subtest

26 – 35 F10.6 percent-correct score

36 – 47 F12.6 scale score estimate

48 – 59 F12.6 estimated standard error of scale score

60 – 60 A1 * if standard error of scale score is inestimable; a blank otherwise

61 – 70 F10.6 group fit probability, if requested

71 – 80 F10.6 marginal probability of response pattern if EAP scoring is chosen
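
Taken together, the column layout above implies that each record pair can be read back with a
single pair of Fortran formats; a minimal sketch, assuming the fields are used exactly as
tabulated:

(I3,2X,30A1)                              first record
(F6.2,A1,A8,2I5,F10.6,2F12.6,A1,2F10.6)   second record

The field widths of the second format sum to the 80 columns shown in the table.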

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Classical item statistics file

Keyword: ISTAT

This file contains all classical item statistics computed and printed by Phase 1 of the program.
The following items are written to this external file in the same format as used in the result out-
put from Phase 1, *.ph1:

 the title records in format (20A4/20A4)
 item facilities (percent correct)
 number of attempts and correct responses to each item
 item-subscore correlations

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

DIF parameter file

Keyword: DIF

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created the
DIF parameter file

3+ Three sets of item records for each subtest. The first set contains the unadjusted
item threshold parameters and s.e.s for each group.

The second set contains adjusted threshold parameters and s.e.s for each group.

The last set contains estimates of group differences in adjusted threshold pa-
rameters.

First set of item records:

Columns Format Description

First record

1–8 A8 test name

9 – 10 2X blank filler

11 - 18 A8 item name

19 – 20 2X blank filler

21 - 220 20(F10.5) unadjusted threshold parameters for groups

Second record

1 – 20 20X blank filler

21 – 220 20(F10.5) estimated s.e. of unadjusted threshold parameters for groups

Second set of item records:

First record

1–8 A8 test name

9 – 10 2X blank filler

11 – 18 A8 item name

19 – 20 2X blank filler

21 – 220 20(F10.5) adjusted threshold parameters for groups

Second record

1 – 20 20X blank filler

21 – 220 20(F10.5) s.e. of adjusted threshold parameters for group contrasts

Third set of item records:

First record

1–8 A8 test name

9 – 10 2X blank filler

11 – 18 A8 item name

19 – 20 2X blank filler

21 – 210 19(F10.5) group differences in threshold parameters for group contrasts

211 – 220 10X blank filler

Second record

1 – 20 20X blank filler

21 – 210 19(F10.5) s.e. of group differences for group contrasts

211 – 220 10X blank filler
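
For reference, the tabulated columns imply the following Fortran formats for reading the file
back (an inference from the layout above, not a specification given by the program): for the
first and second sets, (A8,2X,A8,2X,20F10.5) for the first record of each pair and
(20X,20F10.5) for the second; for the third set, whose last ten columns are blank filler,
(A8,2X,A8,2X,19F10.5,10X) and (20X,19F10.5,10X).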

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

DRIFT parameter file

Keyword: DRIFT

This file is saved during Phase 2 if DRIFT is specified on the SAVE command. It consists of
title records and two records for each item. The format is as follows:

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created the
DRIFT parameter file

3+ Two records for each item, containing the following information

Columns Format Description

First record

1–8 A8 item name

9 – 10 2X blank filler

11 – 21 F11.5 Intercept

22 – 32 F11.5 linear coefficient

33 – 43 F11.5 quadratic coefficient

44 – 54 F11.5 cubic coefficient

55 – 65 F11.5 quartic coefficient

66 – 76 F11.5 quintic coefficient

Second record

1 – 10 10X blank filler

11 – 21 F11.5 estimated s.e. of intercept

22 – 32 F11.5 estimated s.e. of linear coefficient

33 – 43 F11.5 estimated s.e. of quadratic coefficient

44 – 54 F11.5 estimated s.e. of cubic coefficient

55 – 65 F11.5 estimated s.e. of quartic coefficient

66 – 76 F11.5 estimated s.e. of quintic coefficient

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Estimated covariance file

Keyword: COVARIANCE

This file is created by Phase 2 of the program and passed to Phase 3, where item information
indices are added if requested. It contains title records and the item parameter estimates at
the conclusion of Phase 2 and the added item information indices at the conclusion of Phase
3. The format is as follows:

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created the
covariance file.

3+ Four records for each item, containing the following information:

Columns Format Description

First record

1–8 A8 item name

9 – 16 A8 subtest name

17 – 21 I5 group indicator

22 – 33 F12.6 slope estimate

34 – 45 F12.6 threshold estimate

46 – 57 F12.6 lower asymptote estimate

58 – 69 F12.6 estimation error variance for slope

70 – 81 F12.6 estimation error covariance for slope and threshold

Second record

1 – 17 17X blank filler

18 – 29 F12.6 estimation error variance for threshold

30 – 41 F12.6 estimation error covariance for slope and asymptote

42 – 53 F12.6 estimation error covariance for threshold and asymptote

54 – 65 F12.6 estimation error variance for lower asymptote

66 – 81 16X blank filler

Third record population-independent indices

1 – 17 17X blank filler

18 – 29 F12.5 value of maximum information

30 – 41 F12.5 estimated s.e. of value of maximum information

42 – 53 F12.5 point of maximum information

54 – 65 F12.5 estimated s.e. of point of maximum information

66 – 81 16X blank filler

Fourth record population-dependent indices

1 – 17 17X blank filler

18 – 29 F12.5 value of maximum effectiveness (info*density)

30 – 41 F12.5 point of maximum effectiveness

42 – 53 F12.5 average information

54 – 65 F12.5 reliability index (s.d./(s.d. + 1/(ave.info)²))

66 – 81 16X blank filler

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Expected frequency file

Keyword: EXPECTED

This file is created by Phase 2 of the program. It contains expected sample sizes, expected
number of correct responses, expected proportions of correct responses, standardized poste-
rior residuals and model proportions of correct responses. These values are evaluated at each
quadrature point and item. The format of each item and each of the quadrature points is as
follows:

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created the
expected file

3+ Seven records for each item, containing the following information

Column Format Description

First record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “POINT”

28 – 82 5(F10.5,1X) five values of quadrature points

Second record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “WEIGHT”

28 – 82 5(F10.5,1X) five values of quadrature weights

Third record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “TRIED”

28 – 82 5(F10.5,1X) five values of expected sample sizes

Fourth record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “RIGHT”

28 – 82 5(F10.5,1X) five values of expected number of correct responses

Fifth record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “PROPORTION”

28 – 82 5(F10.5,1X) five values of expected proportions of correct responses

Sixth record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “s.e.”

28 – 82 5(F10.5,1X) five values of standardized posterior residuals

Seventh record

1–8 A8 item name

9 – 10 2X blank filler

11 – 15 I5 group indicator

16 – 17 2X blank filler

18 – 27 A10 label “MODEL PROP”

28 – 82 5(F10.5,1X) five values of model proportions of correct responses

Remark:

If more than five quadrature points are used, each record is duplicated with the same format.
If there is more than one group, the item information is presented for each group. Sets of re-
cords within an item are separated by single-dashed lines. Sets of records between items are
separated by double-dashed lines.
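
Because all seven records share one layout, each can be read with the same Fortran format
inferred from the tabulated columns: (A8,2X,I5,2X,A10,5(F10.5,1X)). Per the remark above,
when more than five quadrature points are used, the same format applies to each duplicated
record.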

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Item parameter file

Keyword: PARM

This file is saved during Phase 2 of the program if PARM is specified in the SAVE command.
The file contains the item parameter estimates and other information. The format is as fol-
lows:

Records Description

1&2 In 20A4/20A4 format, the title records of the BILOG-MG run that created
the item parameter file

3 In 2I4 format, the number of subtests and the total number of items
appearing in this file

4 In 20I4 format, the numbers of items in the main and variant subtest on as
many records as necessary.

5+ One record for each item in the main and variant subtests (if any), contain-
ing the following information

Columns Format Description

1–8 A8 item name

9 – 16 A8 subtest name

17–26 F10.5 intercept parameter

27 – 36 F10.5 intercept s. e.

37 – 46 F10.5 slope parameter

47 – 56 F10.5 slope s. e.

57 – 66 F10.5 threshold parameter

67 – 76 F10.5 threshold s. e.

77 – 86 F10.5 dispersion parameter (reciprocal of slope)

87 – 96 F10.5 dispersion s. e.

97 – 106 F10.5 lower asymptote parameter

107 – 116 F10.5 lower asymptote s.e.

117 – 126 F10.5 DRIFT parameter

127 – 136 F10.5 DRIFT s. e.

137 – 146 F10.5 unused columns

147 – 150 I4 location of item in input stream

151 A1 answer key

152 I1 dummy values
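
Inferred from the column positions above, each item record can be read with the Fortran
format (2A8,13F10.5,I4,A1,I1), where the thirteen F10.5 fields cover the six
parameter/standard-error pairs plus the unused columns 137–146. This is a sketch of the
layout as tabulated, not an additional program option.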

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Marginal posterior probability file

Keyword: POST

This file is created by Phase 2 of the program. It contains title records, the respondent’s
identification and group numbers, the case weight, and the marginal posterior probability of
its response pattern. The format of each respondent’s record is as follows:

Records Description

1&2 In 20A4/20A4 format, the records of the BILOG-MG run that created the
posterior file

3+ Two records for each response pattern, containing the following information

Columns Format Description

First record

1–5 I5 group indicator

6 – 10 5X blank filler

11 – 40 30A1 respondent’s identification number

41 – 80 40X blank filler

Second record

1–8 A8 subtest name

9 – 10 2X blank filler

11 – 20 F10.3 case weight

21 – 25 5X blank filler

26 – 40 F15.10 marginal posterior probability of the response pattern

41 – 80 40X blank filler

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)

Test information file

Keyword: TSTAT

This file contains all summary item and test information computed and printed by Phase 3 of the
program. The following items are written to this external file in the same format as used in the
result output from Phase 3, *.ph3:

 the title records in format (20A4/20A4)
 correlations among subtest scale scores
 means and estimates of scale scores

The following items are written only if the appropriate INFO keyword on the SCORE command
has been specified:

 test information and standard error curves
 table of item information indices, including the point and value of maximum information
and the corresponding estimated standard errors for those indices.

Related topics

 SAVE command (see Section 2.6.15)
 Save menu (see Section 2.3.7)
 SCORE command (see Section 2.6.16)

3 PARSCALE

The PARSCALE program developed in the early 1990s by Eiji Muraki (then from Educational
Testing Service) and R. Darrell Bock (University of Chicago), implements a powerful extension
of Item Response Theory (IRT) measurement methods ranging from binary-item analysis to mul-
tiple-category and rating-scale items.

PARSCALE was originally developed with large-scale social surveys and educational assess-
ments in mind. More recently, however, the program has become a popular tool for a wider vari-
ety of applications, seeing use by governmental statistical agencies, marketing researchers, pol-
icy and management consultants, and investigators of the many different “classical” (psychologi-
cal, sociological, educational, medical) assessment studies. Its flexibility and the wealth of in-
formation it can provide have kept it in regular use by researchers around the world.

The program can handle a great diversity of data types. The simple survey is probably the most
common of these. In such a case, items are rated in a common set of categories (known to behav-
ioral scientists as a “Likert”-type scale). Whereas the original Likert approach assigned arbitrary,
successive integer values to the categories, the IRT procedures implemented in PARSCALE es-
timate optimal, empirical values for the boundaries between categories. These boundaries, as
well as item locations and respondent scores, can all be represented as points along the latent di-
mension of measurement. Tests that utilize this type of data might be behavioral surveys in
which the answers are “always,” “sometimes,” “often,” or “never”; expressions of opinion such
as “agree,” “undecided,” or “disagree”; or ratings of status, as perhaps a physician using
“critical,” “stable,” “improved,” or “symptom-free” as levels of evaluation.

For instruments of assessment, PARSCALE can also be used to analyze rating-scale items (such
as open-ended essay questions) and multiple-choice items. With multiple-choice, simple “right-
wrong” scoring and analysis is achieved by treating items as if only two categories are available
(collapsing all wrong choices into a single category). However, if more information is desired,
the choices can remain separated within each item so that the identity of the chosen alternative is
retained during the analysis. In this way, information on wrong responses can be recovered for
detailed analysis. The effects of guessing can also be included in the analysis.

Often an instrument will consist of a mixture of item types, some having common categories and
some with unique categories. PARSCALE handles this kind of diversity by allowing items to be
assigned to “blocks” within which the item categories are common. Any item that has unique
category definitions will be assigned to its own block. An educational test, for example, may
contain open-ended exercises rated in five categories in one block and multiple-choice items in
another block.

PARSCALE’s multiple-group capability adds the options of Differential Item Functioning (DIF)
analysis for trends between groups or over time, and Rater’s-Effect analysis in order to allow for
rater bias or differences in rater severity. PARSCALE for Windows allows for both easier ma-
nipulation of the command (syntax) file and more efficient review of the output files.

3.1 The PARSCALE interface

This section describes those elements in the user’s interface that may not be immediately clear to
the user or that behave in a somewhat nonstandard way.

 Main menu bar
 Workspace
 Run menu
 Output menu
 Font option
 Window menu

3.1.1 Main menu

At the center of the interface is the main menu bar, which adapts to the currently active function.
For example, when you start the program, the menu bar shows only the menu choices File, View,
and Help.

However, as soon as you open a PARSCALE output file or any other text file (by using the File
menu), the Window and Edit options show up on the menu bar. At the same time, the File
menu choices are expanded with selections like Save and Save As.
now includes a Font option following the Status Bar and Toolbar options.

The opening of an existing PARSCALE command (*.psl) file, or starting a new one, adds addi-
tional choices to the main menu bar: the Output, Run, and Workspace menus.

Note that you can open only one command file at a time. If you want to paste some part from an
existing command file in your current one, opening the old file will automatically close the cur-
rent one. After you copy the selection you want to the clipboard, you have to reopen the *.psl file
for pasting.

Note also that, by choosing “All Files (*.*)” in the Open File dialog box, score files, parameter
files, or other files created during the run can be reviewed.

3.1.2 Workspace

The Workspace option on the main menu bar provides access to a dialog box that shows the cur-
rent values that are reserved for the numeric and the character workspace.

The defaults are 50 KBytes for the character workspace and 200 KBytes for the numeric workspace.
Most problems will run with these settings. If there is insufficient workspace for an analysis to
finish, the program will alert you with a message box and you will find a message at the end of
the output file. For example:

***** NOT ENOUGH SPACE-- 1024 4 BYTES LOCATIONS EXHAUSTED *****

When you encounter such a message, increase the workspace and run the problem again. Re-
member that the changes remain in effect until you change the settings again. Allocating too
much workspace may slow down your analysis, or other programs that are running simultane-
ously, so increase the workspace in reasonable steps. If a run is successful, the program reports at
the end of the output file how much memory it actually used. The values are reported in bytes
and you should divide them by 1024 to arrive at the values for the numbers used in the Work-
space dialog box.
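
As a worked illustration of that conversion (the numbers are invented): if a successful run
reports that 307200 bytes of numeric workspace were used, then 307200/1024 = 300, and a
numeric workspace setting somewhat above 300 KBytes in the Workspace dialog box would be a
reasonable choice for similar problems.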

3.1.3 Run menu

The Run menu gives you the option to run All Phases of the program or to run them one at
a time. If you opt for the latter, remember that the different program phases build on each
other. In other words, you need calibration (Phase 2) before you can do the scoring (Phase 3).
That is why the program interface disallows running the phases out of order.

If you have a new (or changed) command file, initially only Phase 0, Phase 1 and All Phases are
enabled on the Run menu.

When you run an analysis by clicking on one of the options under the Run menu, the current
command file will first be saved, if you made any changes. You can easily tell if a command file
has changed by looking at the filename above the menu bar. An asterisk after the filename shows
that the current file has changed but has not been saved yet.

Once all phases have been completed, the Plot option, providing access to the graphics proce-
dure described in Chapter 6, is enabled.

3.1.4 Output menu

By using the Output menu, you can open the output files for the four different program phases,
named with the file extensions ph0, ph1, ph2, and ph3, respectively. Always check the end of
each output file to see if it reports: NORMAL END. If it does not, something went wrong and the
output file should include some information on that.

3.1.5 Font option

The Font option on the View menu displays the Font dialog box with the fonts that are available
on your system. You may use different fonts for command and output files. At installation, they
are both set to a special Arial Monospace font that ships with the program. To keep the tables in
the output aligned, you should always select a monospace or fixed pitch font where all the char-
acters in the font have the same width. Once you select a new font, that font becomes the default
font. This gives you the option to select a font (as well as font size and font style) for your com-
mand (*.psl) files that is different from the one for your output (*.ph*) files as a quick visual
reminder of the type of file.

3.1.6 Window menu

The Window menu is only available when you have at least one file open. You can use the Ctrl-
Tab key combination to switch between open files, or use the Window menu to arrange the open
files (cascade, tile). If you have several or all output (*.ph*) files open for a particular analysis,
you could use the Window menu to arrange them for convenient switching.

3.2 Command syntax

PARSCALE uses the command conventions of other IRT programs published by SSI. Com-
mands employ the general syntax:
>NAME KEYWORD1=n, KEYWORD2=(list), ..., OPTION1, ...;

The following general rules apply.

 A greater-than sign (>) must be entered in column 1 of the first line of a command and
followed without a space by the command name.
 All command names, keywords, options, and keyword values must be entered in UPPER
CASE.
 Command names, keywords, and options may be entered in full or abbreviated to the first
three characters.
 At least one space must separate the command name from any keywords or options.
 All keywords and options must be separated by commas.
 The equals sign is used to set a keyword equal to a value, which may be integer, real, or
character. A real value must contain a decimal point. A character string must be enclosed
in single quotes if:
 it contains more than eight characters
 it begins with a numeral
 it contains blanks, commas, semicolons, or slashes

Example:

DFNAME='EXAMPL01.DAT', TNAME='20-ITEMS'

A keyword may be vector valued; i.e., set equal to a list of integer, real, or character constants,
separated by commas or spaces, and enclosed in parentheses.

If the list is an arithmetic progression of integer or decimal numbers, the short form,
first(increment)last, may be used. Thus, a selection of items 1,3,7,8,9,10,15 may be entered as
1,3,7(1)10,15. Real values may be used in a similar way.

If the values in the list are equal, the form value(0)number-of-values may be used. Thus,
1.0,1.0,1.0,1.0,1.0 may be entered as 1.0(0)5.

 The italic elements in the command format description are variables that the user needs to
replace.
 Command lines may not exceed 128 columns. Continuation on one or more lines is per-
mitted. See Section 3.2.6 for more information.
 Filenames, including the directory path, may not exceed 128 characters.
 Each command terminates with a semicolon (;). The semicolon functions as the command
delimiter: it signals the end of one command and the beginning of the next.
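
To make these conventions concrete, the following is a minimal sketch of a complete command
file for a hypothetical five-item questionnaire with four common response categories; the
data file MYDATA.DAT, its layout, and all names are invented for illustration, and program
defaults are assumed for everything not shown:

RATING SCALE EXAMPLE: FIVE ITEMS, FOUR COMMON CATEGORIES
GRADED RESPONSE MODEL WITH DEFAULT ESTIMATION SETTINGS
>FILES DFNAME='MYDATA.DAT', SAVE;
>SAVE PARM='MYDATA.PAR', SCORE='MYDATA.SCO';
>INPUT NIDCH=5, NTOTAL=5, LENGTH=(5);
(5A1,1X,5A1)
>TEST TNAME=ATTITUDE, NBLOCK=1;
>BLOCK NITEMS=5, NCAT=4;
>CALIB GRADED, LOGISTIC, NQPT=30, CYCLES=(25);
>SCORE EAP;

The first two lines are the required title records; the variable format statement reads a
five-character ID, skips one column, and reads the five categorical responses.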

Related topics

For information on the order of commands and keywords associated with each command, please
see Section 3.2.1.

3.2.1 Order of commands

The table below lists all available PARSCALE commands in their necessary order. Commands
marked as “required” must appear in the command file for each problem setup. All other
commands are optional. In
other words, at a minimum the command file should start with two TITLE lines, followed by the
FILES, INPUT, TEST (or SCALE), BLOCK, CALIB, and SCORE command lines. Note that INPUT
and the variable format statement may be followed by data. The variable format statement is also
required in the command file when raw data are read in from an external file.

Note that, in the remainder of this chapter, the commands are discussed in alphabetical order, and
not in the required order as shown below.

Table 3.1: Order of PARSCALE commands

Required  Command      Keywords                             Options

   *      TITLE

          COMMENT

   *      FILES        DFNAME=<name>, MFNAME=<name>,        SAVE
                       CFNAME=<name>, IFNAME=<name>,
                       OFNAME=<name>, NFNAME=<name>

          SAVE         MASTER=<name>, CALIB=<name>,
                       PARM=<name>, SCORE=<name>,
                       FIT=<name>, COMBINE=<name>,
                       INFORMATION=<name>

   *      INPUT        NTEST=n, LENGTH=(list), NFMT=n,      WEIGHT, GROUPLEVEL
                       SAMPLE=n, TAKE=n, NIDCH=n,
                       NTOTAL=n, INOPT=n, COMBINE=n,
                       MGROUP/MRATER=n, NRATER=(list),
                       R-INOPT=n

   *      (variable format statement)

   *      TEST         TNAME=n, NBLOCK=n, ITEMS=(list),
                       INAME=(list), INTERCEPT=(list),
                       THRESHOLD=(list), SLOPE=(list)

   *      BLOCK        BNAME=(list), NITEMS=n, NCAT=n,      CSLOPE, NOCADJUST
                       ORIGINAL=(list), MODIFIED=(list),
                       CNAME=(list), CADJUST=n,
                       CATEGORY=(list), GPARM=(list),
                       GUESSING=(list), SCORING=(list),
                       REPEAT=n, SKIP=(list), RATER=(list)

          MGROUP       GNAME=(list), GCODE=(list),
                       DIF=(list), REFERENCE=n,
                       COMMON=(list)

          MRATER       RNAME=(list), RCODE=(list),
                       RATER=(list)

   *      CALIB        SCALE=n, NQPT=n, DIST=n,             GRADED/PARTIAL,
                       CYCLES=(list), CRIT=(list),          LOGISTIC/NORMAL,
                       DIAGNOSIS=n, QRANGE=(list),          POSTERIOR, FLOAT,
                       ITEMFIT=n, RIDGE=(list),             QPREAD, ESTORDER,
                       NEWTON=n, FREE=(list)                SPRIOR, TPRIOR,
                                                            GPRIOR, PRIORREAD,
                                                            NOCALIB, SKIPC,
                                                            ACCEL/NOACCEL,
                                                            CSLOPE, THRESHOLD,
                                                            NRATER

          QUADP        POINTS=(list), WEIGHTS=(list)

          PRIORS       TMU=(list), TSIGMA=(list),           SOPTION
                       SMU=(list), SSIGMA=(list),
                       GMU=(list), GSIGMA=(list)

   *      SCORE        NQPT=n, DIST=n, QRANGE=(list),       PRINT, QPREAD,
                       SMEAN=n, SSD=n, NAME=n, PFQ=n,       NOSCORE, SAMPLE,
                       ITERATION=(list), SCORING=(list)     EAP/WML/MLE,
                                                            RESCALE, NOADJUST,
                                                            FIT, NRATER

          QUADS        POINTS=(list), WEIGHTS=(list)

          COMBINE      NAME=n, WEIGHTS=(list)

Notes

 A series of commands from TEST to QUADS should be repeated for the number of subtests,
specified by the NTEST keyword in the INPUT command.
 The BLOCK command should be repeated for the number of blocks, specified by the
NBLOCK keyword in the TEST (or SCALE) command. Repetition of the BLOCK commands
can be shortened by utilizing the REPEAT keyword on the BLOCK command.
 The COMBINE command is optional and must be placed at the end of the PARSCALE com-
mand file.

Related topics

 Command syntax (Section 3.2)

3.2.2 BLOCK command

(Required)

Purpose

To provide a block name, and to identify the items that belong to block j in subtest or sub-
scale i.

Format

>BLOCK BNAME=(list), NITEMS=n, NCAT=n, ORIGINAL=(list),
MODIFIED=(list),CNAME=(list), CATEGORY=(list), CADJUST=n,
GPARM=(list),GUESSING=(n, FIX/ESTIMATE),SCORING=(list),
REPEAT=n, SKIP=(list), RATER=(list),CSLOPE, NOCADJUST;

Notes

 There should be as many BLOCK commands as the total number of blocks specified with
the NBLOCK keyword on each TEST (or SCALE) command. These BLOCK commands are re-
quired commands.
 Each of the BLOCK commands provides a block name (BNAME), the number of items in the
block (NITEMS), the number of categorical responses that those items share (NCAT), and the
identification of those items. Categorical responses of the raw data are assumed to be
coded as consecutive integers, such as 1, 2, 3, and so forth. (Notice that the first categori-
cal response is coded 1 instead of 0.) Use the ORIGINAL keyword to describe categorical
responses that are coded differently in the input file.
 The ORIGINAL and MODIFIED keywords may be used to re-order or concatenate the origi-
nal categorical responses in the block. See the examples in Chapter 11.
 The user may supply the initial values of the parameters for the estimation phase with the
CATEGORY keyword.
 Block names or category names that
 do not begin with a letter, or
 contain blanks and/or special (non-alphanumeric) symbols, or
 consist of more than 8 characters,

must be enclosed in single quotes.

Related topics

 Examples of BLOCK commands
 TEST/SCALE command: NBLOCK keyword (Section 3.2.15)

BNAME keyword

Purpose

To provide the block name, which may be up to eight characters in length. If the REPEAT
keyword is used, all values of the keywords including the block name are replicated for sub-
sequent blocks. A user can supply unique block names for those replicated blocks by using
the BNAME keyword.

Format

BNAME=(n1,n2,...,nREPEAT)

Default

Supplied by program

Related topics

 BLOCK command: REPEAT keyword
 Examples of BLOCK commands

CADJUST keyword

Purpose

To control the location adjustment: n sets the mean of the category parameters.

Format

CADJUST=n

Default

0.0.

Related topics

 Examples of BLOCK commands

CATEGORY keyword

Purpose

To provide initial category parameter values for the estimation process. If the CATEGORY
keyword is supplied, but no values are specified, then the constant values from “scores
for ordinal or ranked data” (Statistical Tables for Biological, Agricultural, and Medical Re-
search, R. A. Fisher & F. Yates, p. 66) are substituted as the default initial values of the
category parameters.

Format

CATEGORY=(n1,n2,...,nMODIFIED)

Default

Supplied by program.

Related topics

 BLOCK command: MODIFIED keyword
 Examples of BLOCK commands

CNAME keyword

Purpose

To provide a list of names for categories.

Format

CNAME=(n1,n2,...,nMODIFIED)

Default

Blanks.

Related topics

 BLOCK command: MODIFIED keyword
 Examples of BLOCK commands

CSLOPE option

Purpose

To request the estimation of a single common slope parameter for all items in the block.

Format

CSLOPE

Related topics

 Examples of BLOCK commands

GPARM keyword

Purpose

To provide guessing parameters that are used only for the correction of dichotomous item
response probabilities if GUESSING is specified. If GUESSING is specified, these guessing pa-
rameters are used for the initial parameter values.

Format

GPARM=(n1,n2,...,nMODIFIED)

Default

0.0.

Related topics

 BLOCK command: GUESSING keyword
 Examples of BLOCK commands

GUESSING keyword

Purpose

To request the use of the item-response model with a lower asymptote (guessing) parameter,
g: P* = g + (1 - g)P for the k-th response category and P* = (1 - g)P for the others. The lower
asymptote (guessing) parameters are estimated if ESTIMATE is specified; otherwise, the
probabilities of categorical responses are only corrected by fixed parameter values, which
are supplied by the item-parameter file or the keyword GPARM in the BLOCK command. (A
worked numerical illustration follows this entry.)

Format

GUESSING=(n,FIX/ESTIMATE)

Default

(2,FIX) (lower asymptote (guessing) parameters are not estimated)

Related topics

 BLOCK command: GPARM keyword
 Examples of BLOCK commands
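
As a worked numerical illustration of the correction above (the numbers are invented): with
g = 0.25 and an uncorrected probability P = 0.40 for the k-th response category,
P* = 0.25 + (1 - 0.25)(0.40) = 0.55, while another category with P = 0.40 would be rescaled
to P* = (1 - 0.25)(0.40) = 0.30.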

MODIFIED keyword

Purpose

To provide a list of integers corresponding to the original response codes. The first category
should correspond to n = 1, not n = 0. The number of arguments should be equal to NCAT.
The program computes automatically the number of response categories after the modifica-
tion, as specified by the MODIFIED keyword. If some categories are collapsed, making the
modified number less than NCAT, the modified number is used to read the keywords CNAME
and CATEGORY.

Format

MODIFIED=(n1,n2,...,nNCAT)

Default

1 through NCAT

Related topics

 BLOCK command: CNAME, CATEGORY, and NCAT keyword
 Examples of BLOCK commands

NCAT keyword

Purpose

To provide the number of response categories in the block.

Format

NCAT=n

Default

2.

Related topics

 Examples of BLOCK commands

NITEMS keyword

Purpose

To indicate the number of items in the block.

Format

NITEMS=n

Default

The number of items in the subtest (LENGTH on INPUT command).

Related topics

 Examples of BLOCK commands
 INPUT command: LENGTH keyword

NOCADJUST option

Purpose

To omit the adjustment provided by the CADJUST keyword during the calibration.

Format

NOCADJUST

Related topics

 BLOCK command: CADJUST keyword
 Examples of BLOCK commands

ORIGINAL keyword

Purpose

To provide a list of the original categorical response codes (up to four characters each). The
number of arguments should be equal to NCAT.

Format

ORIGINAL=(n1,n2,...,nNCAT)

Default

1 through NCAT

Related topics

 BLOCK command: NCAT keyword
 Examples of BLOCK commands

RATER keyword

Purpose

To provide the ratio of a rater variance and an error variance per item. This ratio is used for
the correction of the information function per item.

If n1 is specified, but no other values are specified, then the unspecified values default
to n1 (the first value).

Format

RATER=(n1,n2,...,nNITEMS)

Default

n1 = 0

Related topics

 Examples of BLOCK commands

REPEAT keyword

Purpose

To request the repetition of a BLOCK command. The ij-th BLOCK command will be automati-
cally repeated n times. This option may be used to estimate different category values for
each item (Samejima’s model).

Format

REPEAT=n

Default

0.

Related topics

 Examples of BLOCK commands

SCORING keyword

Purpose

To specify the scoring function of the partial credit models using scoring function values.
Values can be fractional.

Format

SCORING=(n1,n2,...,nMODIFIED)

Default

1.0, 2.0, 3.0, ....

Related topics

 BLOCK command: MODIFIED keyword
 Examples of BLOCK commands

SKIP keyword

Purpose

To skip the parameter estimation for this particular block and use the parameter values sup-
plied by a user or the program.

 n1 : If the estimation of the slope parameters needs to be skipped, set this value to one,
otherwise 0.
 n2 : If the estimation of the threshold parameters needs to be skipped, set this value to one,
otherwise 0.
 n3 : If the estimation of the category parameters needs to be skipped, set this value to one,
otherwise 0.
 n4 : If the estimation of the lower asymptote parameters needs to be skipped, set this value
to one, otherwise 0.

If the keyword SKIP appears without arguments, all of the parameter estimations are
skipped, that is, SKIP=(1,1,1,1). If no SKIP keyword appears, none of the parameter esti-
mations is skipped, that is, SKIP=(0,0,0,0).

Format

SKIP=(n1,n2,n3,n4)

Related topics

 Examples of BLOCK commands

Examples of BLOCK commands

 The four categorical responses are coded as A, B, C, and D and the user wants to concate-
nate the categories A and B as the first category. Note that NCAT specifies the number of
categories before the modification.

>BLOCK NCAT=4, ORIGINAL=(A,B,C,D), MODIFIED=(1,1,2,3);

 The four categorical responses are coded as 1, 2, 3, and 4 and the user wants to reverse the
order of the categories. The ORIGINAL keyword is not really needed in this case, because it
specifies the default. Note the single quotes around the specified block name, due to the
presence of the hyphen.

>BLOCK BNAME='OBS-RHET', NCAT=4, ORIGINAL=(1,2,3,4), MODIFIED=(4,3,2,1);

3.2.3 CALIB command

(Required)

Purpose

To control the item and category parameter estimation and to specify prior distributions on
the parameters for subtest or sub-scale i.

Format

>CALIB GRADED/PARTIAL, LOGISTIC/NORMAL, SCALE=n, NQPT=n, DIST=n,
CYCLES=(list), CRIT=(list), DIAGNOSIS=n, QRANGE=(list), ITEMFIT=n,
RIDGE=(list), NEWTON=n, POSTERIOR, FLOAT, QPREAD, ESTORDER,
SPRIOR,TPRIOR, GPRIOR, PRIORREAD, NOCALIB, SKIPC,
FREE=(t/NOADJUST,u/NOADJUST,REFERENCE/COMBINED,POSTERIOR/MLE),
ACCEL/NOACCEL, CSLOPE, THRESHOLD, NRATER;

Notes

 This is a required command.
 There should be as many CALIB commands as there are subtests, in the same order as the
TEST commands.
 The values for both the CYCLES and CRIT keywords are positional. To change a default
value after the first position, blanks must be supplied for the earlier positions (delimited
by commas).
 To use the same values for all the CRIT parameters, for instance 0.05, specify CRIT = 0.05,
without parentheses.
 The CYCLES keyword may be used to limit the iterations. One reason would be to check
the problem setup. Another reason to do so is to check the convergence of the estimates.
Sometimes the priors specified for slopes and thresholds are not strict enough, preventing
some estimates from converging. In that case, save the intermediate estimates in an exter-
nal file by specifying the PARM keyword in the SAVE command, and use these estimates as
starting values by specifying the IFNAME keyword in the FILES command in a following
run. Repeat this process until convergence of the parameters is reached (a sketch of this
restart sequence follows these notes).
 For the DIF model, the optimal estimation process is obtained by specifying FREE=(0,1)
and POSTERIOR.
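
The following fragments sketch the restart sequence described in the notes above; the
filenames are hypothetical and the other required commands are omitted. The first run saves
the intermediate item parameter estimates, and the follow-up run reads them back as starting
values:

First run:
>FILES DFNAME='MYDATA.DAT', SAVE;
>SAVE PARM='MYDATA.PAR';
>CALIB GRADED, LOGISTIC, CYCLES=(25);

Follow-up run:
>FILES DFNAME='MYDATA.DAT', IFNAME='MYDATA.PAR', SAVE;
>SAVE PARM='MYDATA2.PAR';
>CALIB GRADED, LOGISTIC, CYCLES=(25);

The positional convention for CYCLES and CRIT applies in either run; for example,
CRIT=( , ,0.0005) leaves the first two criteria at their defaults and changes only the third.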

Related topics

 FILES command: IFNAME keyword (Section 3.2.6)
 SAVE command: PARM keyword (Section 3.2.13)
 TEST/SCALE command (Section 3.2.15)

ACCEL/NOACCEL option

Purpose

To specify whether or not the acceleration routine should be used after each cycle of the EM
iterations. ACCEL specifies that it will be used, while NOACCEL specifies that it will not be
used.

Format

ACCEL/NOACCEL

Default

NOACCEL

CRIT keyword

Purpose

To control the parameters of the iterative procedure.

 j Convergence criterion for EM cycles (ACRIT). (Default 0.001)

 k Convergence criterion for inner EM cycles of category parameter esti-
mation. (Default equal to the ACRIT value above)
 l Convergence criterion for inner EM cycles of threshold parameter esti-
mation. (Default equal to the ACRIT value above)
 m Convergence criterion for inner EM cycles of slope parameter estima-
tion. (Default equal to the ACRIT value above)
 n Convergence criterion for inner EM cycles of guessing parameter esti-
mation. (Default equal to the ACRIT value above)
 o Convergence criterion for inner EM cycles of multiple-group parameter
estimation. (Default equal to the ACRIT value above)

Format

CRIT=(j,k,l,m,n,o)

CSLOPE option

Purpose

To request the estimation of a single common slope parameter for all items in the subtest.

Format

CSLOPE

CYCLES keyword

Purpose

To specify parameters for the EM cycles.

 d The maximum number of EM cycles. (Default 10, if LENGTH < 50 (see INPUT
command, Section 3.2.7); 5, otherwise)
 e The maximum number of inner EM iterations of item and category parameter
estimation. (Default 1)
 f The maximum number of inner EM iterations of category parameter estimation.
(Default 1)
 g The maximum number of inner EM iterations of item parameter estimation.
(Default 1)
 h The maximum number of inner EM iterations of the multiple rater parameter es-
timation. (Default 1)
 i The minimum number of the inner EM iterations of item and category parame-
ter estimation. (Default 1)

Format

CYCLES=(d,e,f,g,h,i)

Related topics

 INPUT command: LENGTH keyword (Section 3.2.7)

DIAGNOSIS keyword

Purpose

To request diagnostic output and to specify the level of diagnostic output, from 0 (no
diagnostic output) through 6 (maximum diagnostic output).

 n = 0: Minimum printout of parameter estimates after each calibration cycle.
 n = 1: Summary statistics of parameter estimates after each calibration cycle.
 n = 2: Intermediate parameter estimates after each calibration cycle.
 n = 3 to 6: Detailed diagnostic printout for checking the program or computations.

Diagnostic output of higher numbers includes the printout of the lower ones. n = 3 or higher
is not recommended for normal use.

Format

DIAGNOSIS=n

Default

0, no diagnostic output

DIST keyword

Purpose

To designate the type of prior distribution specified for the ability distribution in the popula-
tion of respondents.

 n = 1: Uniform distribution
 n = 2: Normal on equally spaced points
 n = 3: Normal on Gauss-Hermite points
 n = 4: User supplied

Format

DIST=n

Default

ESTORDER option

Purpose

To reverse the estimation order of the EM cycles. This implies that the item parameters will
be estimated before the category parameters, rather than the other way around.

Format

ESTORDER

FLOAT option

Purpose

To specify that the means of the prior distributions on the item parameters are estimated by
marginal maximum likelihood, along with the parameters. If this option does not appear, the
means are kept fixed at their specified values during estimation.

Format

FLOAT

Remark:

Standard deviations of the priors are fixed in either case. This option should not be invoked
when the data set is small and the items few. The means of the item parameters may drift in-
definitely during the estimation cycles under these conditions.

FREE keyword

Purpose

To specify the posterior latent distributions to be used.

 If the DIF model is chosen, a prior latent trait distribution is normally used for each sub-
group. If the FREE keyword is specified, the posterior distribution is substituted for the
prior distribution.
 If this keyword is specified with numerical values of t and u, the multiple posterior distri-
butions are rescaled to mean t and standard deviation u. If NOADJUST is specified for
either argument, no rescaling will be done with respect to mean or standard deviation or
both. The defaults are rescaling with t = 0.0 and u = 1.0.
 If the third argument is COMBINED, the multiple posterior distributions are combined and a
total distribution is rescaled. Otherwise, only the reference group is rescaled to mean t
and standard deviation u and other groups are adjusted accordingly.
 If the fourth argument is specified, the MLE scores are computed and used for the poste-
rior distributions. The MLE option is not generally recommended.

Format

FREE=(t/NOADJUST,u/NOADJUST, REFERENCE/COMBINED, POSTERIOR/MLE)

Default

t=0.0, u=1.0, REFERENCE, POSTERIOR

GPRIOR option

Purpose

To request the use of a Beta prior distribution on the guessing parameter.

Format

GPRIOR

GRADED/PARTIAL option

Purpose

To specify the response model to be used: GRADED specifies the graded response model, and
PARTIAL specifies the partial credit model.

Format

GRADED/PARTIAL

ITEMFIT keyword

Purpose

To specify the number of frequency score groups to be used for the computation of item-fit
statistics. If the ITEMFIT value specified is greater than NQPT, the NQPT value specified will
replace the ITEMFIT value.

Format

ITEMFIT=n

Default

None

Related topics

 CALIB command: NQPT keyword

LOGISTIC/NORMAL option

Purpose

To specify the response function metric to be used: LOGISTIC specifies that the natural met-
ric of the logistic response function is used in all calculations, while NORMAL specifies the
use of the metric of the normal response function (normal ogive model). This choice is ef-
fective only if the graded response model is used. For the partial credit model, only the lo-
gistic response function is available.

Format

LOGISTIC/NORMAL

NEWTON keyword

Purpose

To specify the maximum number of Newton-Gauss (Fisher scoring) iterations following the
EM cycles.

Format

NEWTON=n

Default

0.

NOCALIB option

Purpose

To request that the estimation of both the item and the category parameters be skipped.
This option permits tests to be scored from previously estimated parameters (see the
FILES and INPUT commands in Sections 3.2.6 and 3.2.7).

Format

NOCALIB

Related topics

 FILES command (Section 3.2.6)
 INPUT command (Section 3.2.7)

NQPT keyword

Purpose

To specify the number of quadrature points to be used in the EM and Newton estimation.

Format

NQPT=n

Default

30.

NRATER option

Purpose

To specify that the correction for the information function, specified with the RATER key-
word on the BLOCK command, is not to be used for calibration.

Format

NRATER

Related topics

 BLOCK command: RATER keyword (Section 3.2.2)

POSTERIOR option

Purpose

To specify the computation of the posterior distribution after the M-step in the EM cycle, in
addition to normally doing so after the E-step. This allows the expected proportions com-
puted in each succeeding E-step to be based on an updated posterior distribution.

Format

POSTERIOR

PRIORREAD option

Purpose

To specify the use of the slope, threshold, and category parameter priors specified by the
user in the PRIORS command.

Format

PRIORREAD

Related topics

 PRIORS command (Section 3.2.10)

QPREAD option

Purpose

To specify that quadrature points and weights are to be read from the following QUADP
command. Otherwise, the program supplies the quadrature points and weights (and no
QUADP command follows).

Format

QPREAD

Related topics

 QUADP command (Section 3.2.11)

QRANGE keyword

Purpose

To specify the lower (q) and upper (r) limits of the range of the quadrature points.

Format

QRANGE=(q,r)

Default

(-4.0, +4.0)

Note:

This keyword is effective only if DIST = 1 or 2 (see SCORE command, Section 3.2.14).

Related topics

 SCORE command: DIST keyword (Section 3.2.14)

RIDGE keyword

Purpose

To specify that a ridge constant is to be added to the diagonal elements of the information
matrix to be inverted during the EM cycles and the Newton iterations.

The ridge constant starts at the value of 0.0 and is increased by v if the ratio of a pivot
to the corresponding diagonal element of the matrix is less than w.

Format

RIDGE=(v,w)

Default

No ridge.

SCALE keyword

Purpose

To provide a scale constant for the item response model.

Format

SCALE=n

Default

1.0 for the normal ogive item response model; 1.7 for the logistic item response model.

SKIPC option

Purpose

To request the skipping of the calibration of the category parameters.

Format

SKIPC

SPRIOR option

Purpose

To request the use of a log-normal prior distribution on the slope parameter.

Format

SPRIOR

THRESHOLD option

Purpose

To specify that the item location parameter for a dichotomous item is to be estimated di-
rectly as a threshold. Otherwise, an intercept parameter is estimated and converted to a
threshold. It is only effective for dichotomously scored items.

Format

THRESHOLD

TPRIOR option

Purpose

To request the use of a normal prior distribution on the threshold parameter.

Format

TPRIOR

3.2.4 COMBINE command

(Optional)

Purpose

To provide the weighting coefficients of a combined subtest or subscale score.

Format

>COMBINE NAME=n, WEIGHTS=(list);

Notes

The keyword COMBINE on the INPUT command establishes the number of COMBINE com-
mands that should be inserted here, if any. Each of these COMBINE commands gives the name
for the combined score and the weights corresponding to the subscale scores. The number of
weight constants is the same as the total number of subscales (the total number of SCORE
commands). Specific subscores may be excluded from the combined score by entering a
zero for that subscore.

Combined score names that

 do not begin with a letter, or
 contain embedded blanks and/or special (non-alphanumerical) symbols, or
 consist of more than 8 characters,

must be enclosed in single quotes.
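
Example

A hypothetical COMBINE command (the name and weights are illustrative) that forms an
equally weighted composite of two subscale scores:

>COMBINE NAME=TOTAL, WEIGHTS=(0.5,0.5);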

Related topics

 INPUT command: COMBINE keyword (Section 3.2.7)


 SAVE command: COMBINE keyword (Section 3.2.13)
 SCORE command (Section 3.2.14)

NAME keyword

Purpose

To specify the name of the combined score (up to eight characters).

Format

NAME=character string


Default

Blank.

WEIGHTS keyword

Purpose

To specify the weights for combining of the subscores. The subscores are combined linearly.

 For sums and means: a set of positive fractions with decimal points, summing to 1.0, for
weights of subscale scores.
 For DIF: a set of fractions with decimal points, summing to 0.0, for weights of subscale
scores.

Format

WEIGHTS= ( n1 , n2 ,..., nn ) ;

Default

None.


3.2.5 COMMENT command

(Optional)

Purpose

To enter one or more lines of explanatory remarks into the program output stream.

Format

>COMMENT ...text...
...text...

Notes

This line and all subsequent lines preceding the FILES command will be printed verbatim in
the initial output stream. The maximum length of each line is 80 characters. A semicolon to
signal the end of the command is not needed. Comments are optional.

Example:

EXAMPLE 4. BIOLOGY LABORATORY PERFORMANCE ASSESSMENT PARTIAL CREDIT
MODEL

>COMMENT

Data for this example are from the study described by Doran, et al., in
the April, 1992, Science Teacher. The ratings of the student’s laboratory
reports with different numbers of graded categories are assigned to dif-
ferent blocks. Categories 1 and 2 of the item in block 4, which had low
frequency of use, were collapsed in the modified category assignments. Be-
cause of the limited … are estimated and saved.
>FILES DFNAME=’EXAMPLO4.DAT’,SAVE;

Default

No comments.

Related topics

 FILES command (see below)


3.2.6 FILES command

(Required)

Purpose

To assign names to the input files.

Format

>FILES DFNAME=<name>, MFNAME=<name>, CFNAME=<name>, IFNAME=<name>,


OFNAME=<name>, NFNAME=<name>, SAVE;

Notes

The master and calibration files are binary files created by the program. They can be
saved for reuse by specifying the MASTER and CALIB keywords on the SAVE command, re-
spectively. Otherwise, they are automatically deleted at the end of the analysis.

 Other files are ASCII (plain text) files and their specifications are described in Section
3.3.1 of the manual.
 FILES is a required command.
 If filenames are supplied, the files must already exist.
 Names must be enclosed in single quotes.
 The maximum length of filenames is 128 characters, including the directory path, if
needed. Note that each line of the command file has a maximum length of 80 characters.
If the filename does not fit on one line of 80 characters, the remaining characters should
be placed on the next line, starting at column 1.
 The original response data are recoded into a binary form and saved in the master file
(MFNAME). If the SAMPLE keyword on the INPUT command is specified, the additional bi-
nary file, the calibration file (CFNAME), is created and the responses of the randomly sam-
pled respondents are saved in this calibration file. The calibration file is used for the item
parameter estimation. For the scoring of respondents, however, the master file is used and
all respondents’ scores are computed. This option shortens the calibration stage, but still
computes all respondents’ scores. If only the sampled respondents need to be scored, the
user must specify the SAMPLE keyword on the SCORE command. If no SAMPLE keyword on
the INPUT command is specified, only the master file is created, and it is used for both the
calibration and scoring phases.
 To read data from a previously prepared master file, specify the MFNAME keyword instead
of the DFNAME keyword. If an existing item-parameter file is specified by the IFNAME key-
word, and the NOCALIB option is evoked in the CALIB command for the test, scores for the
test will be computed from the previously estimated parameters in the IFNAME file.


Example

>FILES DFNAME=’c89conv.dat’, IFNAME=’cap90ctl.if1’, SAVE;

Related topics

 CALIB command (Section 3.2.3)


 INPUT command (Section 3.2.7)
 SAVE command (Section 3.2.13)
 SCORE command (Section 3.2.14)
 Key files (Section 3.3.4)

CFNAME keyword

Purpose

To provide the name of the calibration file.

Format

CFNAME=<'filename'>

Default

Supplied by program.

DFNAME keyword

Purpose

To specify the name of the raw data file. This file contains the original data.

Format

DFNAME=<'filename'>

Default

Command file contains the raw data after the format code(s).

IFNAME keyword

Purpose

To specify the name of the item-parameter file.


Format

IFNAME=<'filename'>

Default

Supplied by program.

MFNAME keyword

Purpose

To provide the name of the master file.

Format

MFNAME=<'filename'>

Default

Supplied by program.

NFNAME keyword

Purpose

To specify the name of the not-presented-key file.

Format

NFNAME=<'filename'>

Default

Blank.

OFNAME keyword

Purpose

To specify the name of the omit-key file.

Format

OFNAME=<'filename'>


Default

Blank.

SAVE option

Purpose

To indicate that additional output files are requested. If this option is present, then the SAVE
command must follow the FILES command. Otherwise, the next command is the INPUT
command. In other words, this option has to be specified if you want to save any or all of the
intermediate output files; the specific output files are selected with the following SAVE
command.

Format

SAVE

Related topics

 SAVE command (Section 3.2.13)


3.2.7 INPUT command

(Required)

Purpose

To describe the original data file and to supply other information used in all three phases of
the program.

Format

>INPUT NTEST=n, LENGTH=(list),NFMT=n, SAMPLE=n, TAKE=n, NIDCH=n, NTOTAL=n,


INOPT=n, COMBINE=n, MGROUP/MRATER=n, WEIGHT, GROUPLEVEL,
NRATER=(list), R-INOPT=n;

Notes

 INPUT is a required command.


 The TAKE keyword is useful for testing the command file specifications on a small number
of respondents when the sample size is large.

Related topics

 Examples of INPUT commands

COMBINE keyword

Purpose

To specify the number of COMBINE commands that will be used to compute weighted score
combinations (see Section 3.2.4) in the case of multiple subtests or subscores.

Format

COMBINE=n

Default

No combined scores.

Related topics

 SCORE command (Section 3.2.14)


 COMBINE command (Section 3.2.4)
 Examples of INPUT commands


GROUPLEVEL option

Purpose

To indicate that group-level frequency data will be used as input instead of the default single
respondent data (see Section 3.3.1). Note that this option is not available for the Raters-
effect model.

Format

GROUPLEVEL

Related topics

 Examples of INPUT commands

INOPT keyword

Purpose

To specify the nature of group-level input records (note that this keyword applies only if
the GROUPLEVEL option has been specified). The possible values for INOPT are:

 1: Categorical responses
 2: Not-presented categorical responses plus frequencies
 3: Omit categorical responses plus frequencies
 4: Not-presented plus Omit categorical responses plus frequencies
 5: A series of categorical response codes, each followed by its frequency

Format

INOPT=n

Default

1.

Related topics

 Examples of INPUT commands


 INPUT command: GROUPLEVEL option


LENGTH keyword

Purpose

To specify the number of items in each subtest or subscale. If there is only one subtest (the
default), the format LENGTH=n may be used.

Format

LENGTH= ( n1 , n2 ,..., na )

Default

NTOTAL.

Related topics

 Examples of INPUT commands

MGROUP/MRATER keyword

Purpose

To specify the number of subgroups. The keyword MGROUP should be specified if the DIF
model is used. MGROUP is the number of subgroups. In this case, an MGROUP command should
also be present in the command file, after the BLOCK command(s) and before the CALIB
command.

Format

MGROUP/MRATER=n

Default

MGROUP=1 (a single group) for the multiple-group models; MRATER=0 (no Rater-effect model).

Notes

 Note that either MGROUP or MRATER can be specified, but not both.
 The keyword MRATER should be used if the Raters-effect model is used, in which case
MRATER specifies the number of raters. If MRATER is specified, an MRATER command must
be present after the BLOCK command(s) and before the CALIB command.


Related topics

 MGROUP command (Section 3.2.8)


 BLOCK command (Section 3.2.2)
 CALIB command (Section 3.2.3)
 Examples of INPUT commands

NFMT keyword

Purpose

To indicate the number of lines used for the format statement(s) that specify how to read the
original data records.

Format

NFMT=n

Default

1.

Related topics

 Examples of INPUT commands

NIDCHAR keyword

Purpose

To specify the number of characters in the respondent’s identification field, at least 1 and at
most 30 characters long.

Format

NIDCH=n

Default

30.

Related topics

 Examples of INPUT commands


NRATER keyword

Purpose

To specify the number of times each of k items is rated by each rater. Note that this keyword
can only be used when multiple raters rate examinees.

Note

When rater data are analyzed, data are read in a different format. See Section 3.2.17 for ex-
amples of variable format statements for such data.

Format

NRATER= ( n1 , n2 ,..., nk )

Default

1.

Related topics

 INPUT command: MRATER keyword


 INPUT command: R-INOPT keyword

NTEST keyword

Purpose

To indicate the number of subtests or subscales to be analyzed.

Format

NTEST=n

Default

1.

Related topics

 Examples of INPUT commands


NTOTAL keyword

Purpose

To specify the total number of items in the original data records. The items for particular
subtests or subscales are selected from these items using the TESTi (or SCALEi) commands.

Format

NTOTAL=n

Default

0.

Related topics

 Examples of INPUT commands


 TEST/SCALE command (Section 3.2.15)

R-INOPT keyword

Purpose

This keyword is exclusively used when examinees are rated by multiple raters. By default, it
is assumed that all the data for an examinee are given on the same line. If multiple lines are
used, n should be set to the number of lines containing information for an examinee.

Note

When rater data are analyzed, data are read in a different format. See Section 3.2.17 below
for examples of variable format statements for such data.

Format

R-INOPT=n

Default

R-INOPT=1.

Related topics

 INPUT command: MRATER keyword


 INPUT command: NRATER keyword


SAMPLE keyword

Purpose

To request a percentage (0-100) of respondents to be randomly sampled from the raw data
file.

Format

SAMPLE=n

Default

SAMPLE=100.

Related topics

 Examples of INPUT commands

TAKE keyword

Purpose

To request the analysis of only the first n respondents in the raw data file.

Format

TAKE=n

Default

Use all data.

WEIGHT option

Purpose

To indicate the presence of case weights. If this option is present, each input record has a
case weight. In each data record, the weight follows the case ID and precedes the item re-
sponses.

Format

WEIGHT


Related topics

 Examples of INPUT commands

Examples of INPUT commands

The following INPUT command specifies a 160-item test divided into 16 subtests of 10 items
each. The first fifteen characters of each record are used for identification purposes, and one
format statement follows, describing each record.

>INPUT NIDCH=15, NTOTAL=160, NTEST=16, NFMT=1,


LENGTH=(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10);
(15A1,5X,60A1,/,80A1,/,20A1)

The next example is a variation on the first in that the data are now weighted. The WEIGHT
option specifies that each record has a case weight. The weight follows immediately after the
case ID and, as the format statement shows, occupies a field of five columns (F5.0).

>INPUT WEIGHT, NIDCH=15, NTOTAL=160, NTEST=17,


LENGTH= (6,6,6,6,6,6,6,6,6,6,6,6,5,5,6,6,66);
(15A1,F5.0,12(2X,6A2),2(2X,5A1),2(2X,6A2),/,66A1)


3.2.8 MGROUP command

(Optional)

Purpose

To provide necessary information about the DIF model.

Format

>MGROUP GNAME=(list), GCODE=(list), DIF=(list), REFERENCE=n,


COMMON=(list);

Notes

 This command is required if the MGROUP keyword is specified in the INPUT command.
 Group names and group codes must be enclosed in single quotes if they do not begin with
a letter or if they contain blanks or special (non-alphanumeric) symbols. Note that group
codes in the data records do not need quotes, regardless of what characters are used.
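
Example

A hypothetical two-group DIF setup (names and codes are illustrative), with MGROUP=2 on the
INPUT command, in which only the thresholds differ between groups and the first group
serves as the reference:

>MGROUP GNAME=(REFGRP,FOCAL), GCODE=('1','2'), DIF=(0,1,0,0), REFERENCE=1;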

Related topics

 INPUT command: MGROUP/MRATER keyword

COMMON keyword

Purpose

To specify the positions of the common blocks for each subtest. Note that this can be used
only with the DIF model. A common block contains items for which the model parameters
are the same among the multiple groups in spite of the DIF model.

Format

COMMON= ( n1 , n2 ,...)

Default

None.


DIF keyword

Purpose

To specify the DIF model. If an entry in the list is 1, separate item parameters for multiple
subgroups are estimated; if it is 0, a common item parameter for the multiple subgroups is
obtained. Each position in the DIF argument list corresponds to a particular item parame-
ter:

 n1 : If slope parameter differs among groups, set to 1, otherwise 0.


 n2 : If threshold parameters differ among groups, set to 1, otherwise 0.
 n3 : If category parameters differ among groups, set to 1, otherwise 0.
 n4 : If lower asymptote parameters differ among groups, set to 1, otherwise 0.

Format

DIF= ( n1 , n2 , n3 , n4 )

Default

DIF=(0,1,0,0).

GCODE keyword

Purpose

To specify the subgroup identification code, which appears in the data field of the original
response file (DFNAME) in the same order as the group names, up to four characters.

Format

GCODE= ( n1 , n2 ,..., nMGROUP )

Default

GCODE=(‘nnn1’, ‘nnn2’, ...), where n is a blank character.

Related topics

 FILES command: DFNAME keyword (Section 3.2.6)


 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)


GNAME keyword

Purpose

To supply a list of names of subgroups, up to eight characters each.

Format

GNAME= ( n1 , n2 ,..., nMGROUP )

Default

GROUP 01, GROUP 02, …

Related topics

 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)

REFERENCE keyword

Purpose

To specify the position of the reference subgroup in the GCODE list (i.e., the subscript of the
reference group in the list n1, n2, ..., nMGROUP). The parameter values for the other
subgroups are adjusted to this reference subgroup. If REFERENCE=0, no reference sub-
group is set and no adjustment is performed. This keyword is used only for the DIF model.

Format

REFERENCE=n

Default

n=1.


3.2.9 MRATER command

(Optional)

Purpose

To provide necessary information about the Raters-effect model.

Format

>MRATER RNAME=(list), RCODE=(list), RATER=(list);

Notes

 This command is required if the MRATER keyword is specified on the INPUT command.
 Rater names and rater codes must be enclosed in single quotes if they do not begin with a
letter or if they contain blanks or special (non-alphanumeric) symbols. Note that rater
codes in the data records do not need quotes, regardless of what characters are used.
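
Example

A hypothetical specification (names and codes are illustrative), with MRATER=3 on the INPUT
command, for three equally weighted raters:

>MRATER RNAME=(RATERA,RATERB,RATERC), RCODE=('1','2','3'), RATER=(1.0,1.0,1.0);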

Related topics

 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)

RATER keyword

Purpose

To specify the raters’ weights. For the Raters-effect model, the ability score for each re-
spondent is computed for each subtest (or subscale) and each rater separately. A total score
of each respondent for each subtest (or subscale) is computed by summing those scores over
items within each subtest and all raters who have rated the respondent. The rater weights of
this keyword are used to compute the weighted subtest or subscale score for each re-
spondent.

Since the number of raters who rated each respondent’s responses may vary, the weights are
normalized (divided by their sum) for each respondent.

Format

RATER= ( n1 , n2 ,..., nMRATER ) .

Default

n = 1.0.


Related topics

 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)

RCODE keyword

Purpose

To specify the rater identification code, which appears in the data field of the original re-
sponse file (DFNAME) in the same order as the rater names, up to four characters.

Format

RCODE= ( n1 , n2 ,..., nMRATER ) .

Default

RCODE=(‘nnn1’, ‘nnn2’, ...), where n is a blank character.

Related topics

 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)


 FILES command: DFNAME keyword (Section 3.2.6)

RNAME keyword

Purpose

To supply a list of names of raters, up to eight characters each.

Format

RNAME= ( n1 , n2 ,..., nMRATER )

Default

RATER001, RATER002, ….

Related topics

 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)


3.2.10 PRIORS command

(Optional)

Purpose

To specify prior distributions for constrained estimation of item parameters of subtest or


subscale i.

Format

>PRIORS TMU=(list), TSIGMA=(list), SMU=(list), SSIGMA=(list),


GMU=(list), GSIGMA=(list), SOPTION;

Notes

 If the PRIORREAD option has been specified on the CALIB command, the PRIORS command
is required. Of course, since there should be as many CALIB commands as there are sub-
tests, the number and order of the PRIORS commands should mimic the CALIB commands.
The program assumes a normal prior distribution for the thresholds and a lognormal prior
distribution for the slopes.
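
Example

A hypothetical PRIORS command for a two-item subtest (all values are illustrative), to be
used together with the PRIORREAD option on the corresponding CALIB command:

>PRIORS TMU=(0.0,0.0), TSIGMA=(2.0,2.0), SMU=(1.0,1.0), SSIGMA=(1.5,1.5);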

Related topics

 CALIB command: PRIORREAD option (Section 3.2.3)

GMU keyword

Purpose

To specify the real-valued “alpha” parameters for the Beta prior distribution of the lower as-
ymptote (guessing) parameter.

Format

GMU= ( n1 , n2 ,..., nn )

Default

Supplied by the program.


GSIGMA keyword

Purpose

To specify the real-valued “beta” prior parameters for the Beta prior distribution of the
lower asymptote (guessing) parameter.

Format

GSIGMA= ( n1 , n2 ,..., nn )

Default

Supplied by the program.

SMU keyword

Purpose

To supply real-valued prior means for the item slopes.

Format

SMU= ( n1 , n2 ,..., nn )

Default

Supplied by the program.

SOPTION option

Purpose

To indicate that the means and the standard deviations for prior slopes are already in the
log(e) metric.

Format

SOPTION

Default

The regular arithmetic metric.


SSIGMA keyword

Purpose

To specify real-valued prior standard deviations of the item slopes.

Format

SSIGMA= ( n1 , n2 ,..., nn )

Default

Supplied by the program.

TMU keyword

Purpose

To specify real-valued prior means for the item thresholds.

Format

TMU= ( n1 , n2 ,..., nn )

Default

Supplied by the program.

TSIGMA keyword

Purpose

To specify real-valued prior standard deviations of the item thresholds.

Format

TSIGMA= ( n1 , n2 ,..., nn )

Default

Supplied by the program.


3.2.11 QUADP command

(Optional)

Purpose

To supply user-specified quadrature points and weights, that is, the points and ordinates of
the discrete finite representation of the prior ability distribution for subtest or subscale i.

Format

>QUADP POINTS=(list), WEIGHTS=(list);

Notes

If the QPREAD option has been specified on the CALIB command, the QUADP command is re-
quired. Of course, since there should be as many CALIB commands as there are subtests, the
number and order of the QUADP commands should mimic the CALIB commands.
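
Example

A hypothetical QUADP command (the values are illustrative), matching NQPT=5 and the
QPREAD option on the CALIB command; note that the weights sum to 1.0:

>QUADP POINTS=(-2.0,-1.0,0.0,1.0,2.0), WEIGHTS=(0.1,0.2,0.4,0.2,0.1);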

Related topics

 CALIB command: NQPT keyword (Section 3.2.3)


 CALIB command: QPREAD option

POINTS keyword

Purpose

To provide a set of NQPT (on CALIB command) real-numbered values (with decimal points)
of the quadrature points of the discrete distribution.

Format

POINTS= ( n1 , n2 ,..., nNQPT )

Default

Supplied by the program.

Related topics

 CALIB command: NQPT keyword (Section 3.2.3)


WEIGHTS keyword

Purpose

To supply a set of NQPT (on CALIB command) positive fractions (with decimal points and
summing to 1.0) for weights of probabilities of points in the discrete distribution.

Format

WEIGHTS= ( n1 , n2 ,..., nNQPT )

Default

Supplied by the program.

Related topics

 CALIB command: NQPT keyword (Section 3.2.3)


3.2.12 QUADS command

(Optional)

Purpose

To supply user-specified quadrature points and weights, that is, the points and ordinates of
the discrete step-function representation of the scale scores for the respondents on subtest or
subscale i.

Format

>QUADS POINTS=(list), WEIGHTS=(list);

Notes

If the QPREAD option has been specified on the SCORE command, the QUADS command is re-
quired. Of course, since there should be as many SCORE commands as there are subtests, the
number and order of the QUADS commands should mimic the SCORE commands.
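
Example

A hypothetical QUADS command (the values are illustrative), matching NQPT=5 and the
QPREAD option on the SCORE command; as with the QUADP command, the weights sum to 1.0:

>QUADS POINTS=(-2.0,-1.0,0.0,1.0,2.0), WEIGHTS=(0.1,0.2,0.4,0.2,0.1);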

Related topics

 SCORE command: NQPT keyword (Section 3.2.14)


 SCORE command: QPREAD option

POINTS keyword

Purpose

To specify a set of NQPT (on SCORE command) real-numbered values (with decimal points)
of the quadrature points of the discrete distribution.

Format

POINTS= ( n1 , n2 ,..., nNQPT )

Default

Supplied by the program.

Related topics

 SCORE command: NQPT keyword (Section 3.2.14)


WEIGHTS keyword

Purpose

To specify a set of NQPT (on SCORE command) positive fractions (with decimal points and
summing to 1.0) for weights of probabilities of points in the discrete distribution.

Format

WEIGHTS= ( n1 , n2 ,..., nNQPT )

Default

Supplied by the program.

Related topics

 SCORE command: NQPT keyword (Section 3.2.14)


3.2.13 SAVE command

(Optional)

Purpose

To specify the output files to be saved.

Format

>SAVE MASTER=<name>, CALIB=<name>, PARM=<name>, SCORE=<name>,


INFORMATION=<name>, FIT=<name>, COMBINE=<name>;

Notes

 The master and calibration data files are saved in a binary form. Other files are saved as
ASCII (plain text) files; their formats are described in Section 3.4.1.
 The SAVE command is required if the SAVE option on the FILES command has been en-
tered.
 There are no default filenames for this command.
 If a specific name is supplied with a keyword, then that particular output file will be saved
after the analysis is completed.
 If the same filename is used in both the FILES and the SAVE command, then the existing
file will be overwritten after it has been read. Thus, different filenames should be supplied
for the IFNAME keyword on the FILES command and the PARM keyword on the SAVE com-
mand to avoid replacing old item-parameter values with new values.
 Names must be enclosed in single quotes.
 The maximum length of filenames is 128 characters, including the path, if needed. See
Section 3.2.6 for more details.
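
Example

A hypothetical SAVE command (the filenames are illustrative) that writes the item parameters,
subject scores, and fit statistics to separate files; it presumes the SAVE option on the
preceding FILES command:

>SAVE PARM='MYTEST.PAR', SCORE='MYTEST.SCO', FIT='MYTEST.FIT';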

Related topics

 FILES command (Section 3.2.6)


 Format of output files (Section 3.4.1)

CALIB keyword

Purpose

To specify a calibration data filename.

Format

CALIB=<'filename'>


Default

None.

Related topics

 Format of output files (Section 3.4.1)

COMBINE keyword

Purpose

To specify a combined score filename.

Format

COMBINE=<'filename'>

Default

None.

Related topics

 Combined score file (COMBINE) (Section 3.4.2)

FIT keyword

Purpose

To specify a fit-statistics filename.

Format

FIT=<'filename'>

Default

None.

Related topics

 Fit statistics file (Section 3.4.3)


INFORMATION keyword

Purpose

To specify an item information filename.

Format

INFORMATION=<'filename'>

Default

None.

Related topics

 Item information file (Section 3.4.5)

MASTER keyword

Purpose

To specify a master data filename.

Format

MASTER=<'filename'>

Default

None.

Related topics

 Format of output files (Section 3.4.1)

PARM keyword

Purpose

To specify an item parameter filename.

Format

PARM=<'filename'>


Default

None.

Related topics

 Item parameter file (Section 3.4.4)

SCORE keyword

Purpose

To specify a subject scores filename.

Format

SCORE=<'filename'>

Default

None.

Related topics

 Subject scores file (Section 3.4.6)


3.2.14 SCORE command

(Required)

Purpose

To request the scoring of individual respondents or of response frequencies in group-level


data. There is a SCORE command for each subtest or subscale.

Format

>SCORE NQPT=n, DIST=n, QRANGE=(list), SMEAN=n, SSD=n, NAME=n, PFQ=n,


ITERATION=(list), PRINT, QPREAD, NOSCORE, SAMPLE, RESCALE,
SCORING=STANDARD/CALIBRATION, EAP/MLE/WML, NOADJUST, FIT, NRATER;

Notes

 There should be as many SCORE commands as there are subtests, in the same order as the
TEST commands.
 If a score file has been specified by the SCORE keyword on the SAVE command, all subject
scores will be printed to the output file, whether the PRINT option on the SCORE command
has been selected or not.
 If the option RESCALE is present, the keywords SMEAN and SSD are rescaling constants. Let
the rescaled score be θ * and the original score θ . Then, θ * = sθ + t , where s is the scaling
constant (SSD) and t is the location constant (SMEAN).
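
For example, with the hypothetical command below (the constants are illustrative), an
original score of θ = 1.0 would be reported as θ* = 10.0 × 1.0 + 50.0 = 60.0:

>SCORE RESCALE, SMEAN=50.0, SSD=10.0, EAP;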

Related topics

 Examples of SCORE commands


 SAVE command: SCORE keyword (Section 3.2.13)
 TEST command (Section 3.2.15)

DIST keyword

Purpose

To specify the type of prior distribution. This keyword is to be used when EAP scoring is se-
lected.

 n = 1: Uniform distribution
 n = 2: Normal on equally spaced points
 n = 3: Normal on Gauss-Hermite points


Format

DIST=n

Default

2.

Related topics

 Examples of SCORE commands


 SCORE command: EAP/MLE/WML option

EAP/MLE/WML option

Purpose

To specify a method of estimating scale scores.

 EAP: Expected a posteriori estimation (Bayes)


 MLE: Maximum likelihood estimation
 WML: Warm’s weighted maximum likelihood estimation

Format

EAP/MLE/WML

Default

EAP

Related topics

 Examples of SCORE commands

FIT option

Purpose

To request the printing of fit statistics for score estimates for group-level data. This
option is not effective for individual response data.

Format

FIT

Related topics

 Examples of SCORE commands


ITERATION keyword

Purpose

To stop the iterative solution in maximum likelihood scoring when the changes are less than
i or when the number of iterations exceeds j.

Format

ITERATION=(i,j)

Default

(0.01, 20).
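
Example

A hypothetical setting (the values are illustrative) that tightens the convergence criterion
to 0.001 and allows up to 50 iterations:

ITERATION=(0.001,50)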

Related topics

 Examples of SCORE commands

NAME keyword

Purpose

To specify a score name different from the subtest or subscale name.

Format

NAME=character string

Default

Test name

Related topics

 Examples of SCORE commands

NOADJUST option

Purpose

To suppress the calibration adjustment of the category parameter mean during scoring.


Format

NOADJUST

Related topics

 Examples of SCORE commands

NOSCORE option

Purpose

To suppress the scoring of responses.

Format

NOSCORE

Related topics

 Examples of SCORE commands

NQPT keyword

Purpose

To set the number of quadrature points if EAP scoring has been selected.

Format

NQPT=n

Default

30.

Related topics

 Examples of SCORE commands

NRATER option

Purpose

To stop the correction for the information function (specified with the RATER keyword on the
BLOCK command) from being used for scoring.


Format

NRATER

Related topics

 BLOCK command: RATER keyword (Section 3.2.2)


 Examples of SCORE commands

PFQ keyword

Purpose

To specify the percentage of responses to be moved to the immediately adjacent category
when the input data are group-level frequency data (see the INPUT command, Section 3.2.7)
and all item responses fall in the lowest or highest category; this editing makes the
computation of ML scale scores possible. The edited response records are printed if DIAG=2
or higher on the CALIB command.

Format

PFQ=n

Default

(n = 1 to 99).

Related topics

 CALIB command: DIAGNOSIS keyword (Section 3.2.3)


 Examples of SCORE commands
 INPUT command (Section 3.2.7)

PRINT option

Purpose

To request the printing of the subject scores to the output file.

Format

PRINT


QPREAD option

Purpose

To indicate that quadrature points and weights will be read from the QUADS command. Oth-
erwise, the program supplies the quadrature points and weights (and no QUADS command fol-
lows).

Format

QPREAD

Related topics

 Examples of SCORE commands


 QUADS command (Section 3.2.12)

QRANGE keyword

Purpose

To specify the lower (c) and upper (d) bounds of the range of quadrature points.

Format

QRANGE=(c,d)

Default

(-4.0, +4.0).

Related topics

 Examples of SCORE commands

RESCALE option

Purpose

To use the values specified for the keywords SMEAN and SSD as rescaling constants instead of
a mean and a standard deviation, respectively, of the sample distribution.

Format

RESCALE


Related topics

 Examples of SCORE commands


 SCORE command: SMEAN keyword
 SCORE command: SSD keyword

SAMPLE option

Purpose

To request that only the sampled subjects are scored (see the SAMPLE keyword on the INPUT
command, Section 3.2.7).

Format

SAMPLE

Related topics

 Examples of SCORE commands


 INPUT command: SAMPLE keyword (Section 3.2.7)

SCORING keyword

Purpose

To specify the scoring function to be used for scoring. STANDARD specifies that the standard
scoring function (1.0, 2.0,…) is to be used, even if a different function is used for calibra-
tion. CALIBRATION specifies that the calibration function specified in the BLOCK commands
is to be used for scoring.

Format

SCORING=STANDARD/CALIBRATION

Default

STANDARD.

Related topics

 BLOCK command (Section 3.2.2)


 Examples of SCORE commands


SMEAN keyword

Purpose

To request that the original scale scores be rescaled such that the mean equals n.

Format

SMEAN=n

Default

No rescale.

Related topics

 Examples of SCORE commands

SSD keyword

Purpose

To request that the original scale scores be rescaled such that the standard deviation equals
n.

Format

SSD=n

Default

No rescale.

Related topics

 Examples of SCORE commands

Examples of SCORE commands

This example shows how an existing item-parameter file is used for scoring observations.
Calibration is not needed; therefore, the NOCALIB option of the CALIB command has been invoked. Scoring
will be done with maximum likelihood estimation, and the score distribution will be ad-
justed to the mean and the standard deviation specified with the SMEAN and SSD keywords,
respectively.

>FILES DFNAME=’c89conv.dat’, IFNAME=’cap90ctl.ifl’, SAVE;


>SAVE SCORE = ’cap89.scr’ ;


>INPUT WEIGHT, NIDCH=15, NTOTAL=120, LENGTH=6, NTEST=1, NFMT=3;


(15A1,F5.0,2(7X,6A3,1X,19X,6A3,1X)
3(/,20X,2(7X,6A3,1X,19X,6A3,1X)),/,
20x,2(7X,6A3,1X, 19X,6A3,1X))
>TEST TNAME=AUTORHET, NBLOCK=1, ITEM=(1,3,5,11,17,19),
INAME=(A20R,A21R,A22R,A25R,A28R,A29R) ;
>BLOCK BNAME=’AUT-RHET’, NITEMS=6, NCATEGORIES=6, MODIFIED=(6,5,4,3,2,1) ;
>CAL NOCALIB ;
>SCORE SMEAN=254.182, SSD=66.496, MLE ;

Related topics

 CALIB command: NOCALIB option (Section 3.2.3)


 SCORE command: SSD keyword
 SCORE command: SMEAN keyword


3.2.15 TEST/SCALE command

(Required)

Purpose

To identify the test or scale, or subtest i or subscale i. The keyword NTEST on the INPUT
command supplies the number of subtests or subscales. The same number of TEST (or
SCALE) commands is expected. The order of these TEST (or SCALE) commands is the same as
the order in which the subtest lengths are specified on the INPUT command. If there is only
one test or scale, there is only one TEST command.

Location of the items, names of the items, and starting values for estimating the item pa-
rameters can also be supplied with the TEST (or SCALE) command.

Format

>TEST/SCALE TNAME=n, NBLOCK=n, ITEMS=(list), INAME=(list),


INTERCEPT=(list), THRESHOLD=(list), SLOPE=(list);

Notes

 One TEST command is required for each subtest as specified by the NTEST keyword on the
INPUT command. If there are no subtests (NTEST=1), only one TEST command is needed.
The order of the TEST commands is the same as the order used in the specification of the
length of each subtest on the INPUT command.
 If the keywords INTERCEPT, THRESHOLD, or SLOPE are given without any arguments, the
values 0.0, 0.0, and 1.0 are used for the initial intercept, threshold, and slope parameters,
respectively. In this case, no initial values are computed by the program.
 Test or item names that
o do not begin with a letter, or
o contain blanks and/or special (non-alphanumerical) symbols, or
o consist of more than 8 characters,

must be enclosed in single quotes.

Also see the section of examples of TEST/SCALE commands.

Related topics

 INPUT command: NTEST keyword (Section 3.2.7)


 Examples of TEST/SCALE commands


INAME keyword

Purpose

To specify a list of names (up to four characters each) for the items in this (sub)test or
(sub)scale.

Format

INAME= ( n1 , n2 ,..., nn1 )

Default

Supplied by the program.

Related topics

 Examples of TEST/SCALE commands

INTERCEPT keyword

Purpose

To provide real-numbered starting values (with decimal points) for estimating the item in-
tercepts. Starting values may be specified by INTERCEPT or THRESHOLD, but not by both.

Format

INTERCEPT= ( n1 , n2 ,..., nn1 )

Default

Supplied by the program.

Related topics

 Examples of TEST/SCALE commands

ITEMS keyword

Purpose

To supply a list of the serial position numbers of the items in the total response record.


Format

ITEMS= ( n1 , n2 ,..., nn1 )

Default

1 through LENGTH.

Related topics

 Examples of TEST/SCALE commands


 INPUT command: LENGTH keyword (Section 3.2.7)

NBLOCK keyword

Purpose

To indicate the number of blocks of items that share common categorical parameters. When
items are rated on a single Likert scale, for example, the number and meaning of their cate-
gories is the same and all may be assigned to the same block. The items must be selected or
rearranged so that all items in block 1 precede those in block 2, which precede those in
block 3, etc. (see the BLOCK command, discussed in Section 3.2.2).

Format

NBLOCK=n

Default

1.

Related topics

 BLOCK command (Section 3.2.2)


 Examples of TEST/SCALE commands
 TEST/SCALE command

SLOPE keyword

Purpose

To specify real-numbered starting values (with decimal points) for estimating the item
slopes.


Format

SLOPE= ( n1 , n2 ,..., nn1 )

Default

Supplied by the program.

Related topics

 Examples of TEST/SCALE commands

THRESHOLD keyword

Purpose

To specify real-numbered starting values (with decimal points) for estimating the item
thresholds. Starting values may be specified by INTERCEPT or THRESHOLD, but not by both.

Format

THRESHOLD= ( n1 , n2 ,..., nn1 )

Default

Supplied by the program.

Related topics

 Examples of TEST/SCALE commands


 TEST/SCALE command: INTERCEPT keyword

TNAME keyword

Purpose

To provide a name for the test or scale, subtest or subscale i, up to eight characters.

Format

TNAME=character string


Default

Supplied by the program.

Related topics

 Examples of TEST/SCALE commands

Examples of TEST/SCALE commands

The first TEST command describes a subtest with the name “AUTORHET” consisting of one
block of items with the serial positions 1, 3, 5, 7, 9, 11, 13, 15, 17, and 19, and with names
like “A20R.”

The command file will have sixteen TEST commands, as specified with NTEST on the INPUT
command. Note the order of the commands.

>INPUT NIDCH=15,NTOTAL=160, NTEST=16, GROUP, NFMT=2,


LENGTH=(10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10);
(15A1,5X,3(5(10A1,2X),3X),/,10A3)
>TEST1 TNAME=AUTORHET,NBLOCK=1,ITEM=(1(2)19),
INAME=(A20R,A21R,A22R,A23R,A24R, A25R,A26R,A27R,A28R,A29R);
>BLOCK BNAME=’AUT-HET’,NITEM=10, NCAT=6, MODIFIED=(6,5,4,3,2,1);
>CAL DIAGNOS=1, SCALE=1.7, LOGISTIC, CYCLES=50, CRITERION=0.0075,
SPRIOR, TPRIOR;
>SCORE NOSCORE,MLE;
>TEST2 TNAME=AUTOCONV,NBL=1,ITEM=(2(2)20),
....
....

Related topics

 INPUT command (Section 3.2.7)


 INPUT command: NTEST keyword


3.2.16 TITLE command

(Required)

Purpose

To provide a label that will be used throughout the output to identify the problem run.

Format

...text...
...text...

Notes

The first two lines of the command file are title lines. If the title fits on one line, a second,
blank line should be entered before the next command starts. The text will be printed verba-
tim at the top of each output section, as well as at the start of some output files. The two title
lines are required at the start of the command file. No special delimiters (> or ;) are required.

Example:

CALIFORNIA ASSESSMENT PROJECT ‘90 CALIBRATION OF ESSAY-TEST RATINGS


>COMMENT GROUP-LEVEL ANALYSIS


3.2.17 Variable format statements

The data layout must be described in a variable format statement. This statement is entered
within parentheses and immediately follows the INPUT command.

When data (labels, raw data, summary statistics) are used in fixed format, a format statement is
needed to instruct the program how to read the data.

The general form of such a statement is

(rCw) or (rCw.d),

where:

r Repeat count; if omitted, 1 is assumed.

C Format code:

A Code for character values

I Code for integer values

F Code for real numbers

w Field width, or number of columns.

d Number of decimal places (for F-format).

The format statement should be enclosed in parentheses. Blanks within the statement are ig-
nored: (r C w. d) is acceptable. Anything after the right parenthesis and on the same line
is also ignored by the program, thus comments may be placed after the format statement.

The following example shows three ways to read five integers, with the same result:

(5I1)
12345

(5I2)
1 2 3 4 5

(I1,I2,3I3)
1 2 3 4 5

The F-format requires the number of decimal places in the field description. If there are
none (and eight columns) specify (F8.0); (F8) is not allowed. However, if a data value con-


tains a decimal point, it overrides the location of the decimal point as specified by the
general field description. If the general field description is given by (F8.5), then 12345678
would result in the real number +123.45678, but the decimal point in -1234.56 would not
be changed. A field of all blanks results in the value zero. The plus sign is optional.

The “X” operator can be used to skip spaces or unused variables in the data file. For exam-
ple, (F7.4,8X,2F3.2) informs the program that the data file has 21 columns per record. The
first value can be found in the first seven columns (and there are four decimal places), then
eight columns should be skipped, and a second and third value are in columns 16 - 21, both
occupying three columns (with two decimal places). Note that the ITEMS keyword on the
TEST (or SCALE) command also allows selection and reordering of variables.

Another option is the use of the tabulator format descriptor T, followed by a column number
n. For example, (F8.5, T61, 2F5.1) describes three data fields; in columns 1 - 8, with five
decimal digits, next in columns 61 - 65 and 66 - 70, both with one decimal digit. If the num-
ber n is smaller than the current column position, left-tabbing results. Left tabs can be unre-
liable in PC systems and should be used cautiously. A forward slash (/) in a format statement
means “skip the rest of this line and continue on the next line.” Thus, (F10.3/5F10.3) or
(F10.3,/,5F10.3) instructs the program to read the first variable on the first line, then to
skip the remaining variables on that line and to read five variables on the next line.

For other uses of a format statement, a FORTRAN textbook should be consulted.

Related topics

 ITEMS keyword on the TEST/SCALE command (Section 3.2.15)


3.3 Input files


3.3.1 Specification of input files

The following types of data can be used as input for a PARSCALE analysis:

 Original response data for individual respondents


 Individual response data for a single-group model
 Individual response data for a DIF multiple-group model
 Individual response data for a rater-effect multiple-group model
 Original response data for group-level frequencies

In addition to these, item parameter files from previous analyses may be used as input. The use
of an omitted key file and not-presented key file is also permitted. Each of these data types will
now be discussed in turn.

3.3.2 Individual level data

Original response data for individual respondents (DFNAME)

Each record is read by a variable format statement supplied by the user. The following fields are
contained in each record.

Identification (format <a>A1): Required. <a> is specified by the NIDCHAR keyword in the
INPUT command.

Subgroup identification (format A<b>): For a single-group model, this field should be
omitted. For DIF multiple-group models, the subgroup code is read as characters; for the
Rater-effect model, the rater’s code is read as characters. The length of the characters (b)
must be less than eight, and the codes should be specified by the GCODE keyword on the
MGROUP command. The maximum number of subgroups should be specified with the MGROUP
keyword on the INPUT command.

Weight (format Fw.d): If the WEIGHT option appears in the INPUT command, this weight
field must be read, in floating point format.

Response vector (format <c>A<d>): <c> should correspond to the NTOTAL keyword on the
INPUT command; <d> can be specified by the user, but only the first four characters are
significant. These character responses are converted into integers according to the list of
response codes specified by the ORIGINAL keyword in the BLOCK command.


Notes

For a single group model and a DIF multiple-group model, each respondent’s responses are rep-
resented by a single response vector. For the DIF model, response vectors are not necessarily to
be sorted by subgroups. Response vectors of all subgroups can be mixed. If the identification
field is a blank or the end of file is reached, the program terminates the input procedure.

For a Rater-effect multiple-group model, the responses of a single respondent to constructed
items may be rated by more than one rater. The program assumes that the multiple rated
response vectors of each respondent are read consecutively, and that only the first record of
each respondent has a non-blank identification field. Subsequent response vectors, rated by
different raters for the same respondent, have a blank identification field and a non-blank
subgroup identification (rater identification) field. If both the respondent and rater
identifications are blank, or the end of file is reached, the program assumes that the end of
the input file has been reached. Some items (for example, objective items) are not rated;
responses to these items must be duplicated in each rater's response vector. If a constructed-
response item is not rated by a certain rater, the response should be coded as not-presented.

Related topics

 BLOCK command: ORIGINAL keyword (Section 3.2.2)


 INPUT command: NIDCH, NTOTAL, and MGROUP/MRATER keywords (Section 3.2.7)
 INPUT command: WEIGHT option
 MGROUP command: GCODE keyword (Section 3.2.8)
 Variable format statements (Section 3.2.17)

Individual response data for single-group model

In this case, the format of the data should be:

Respondent 1 [ID] <WEIGHT> [ITEM RESPONSES]


Respondent 2 [ID] <WEIGHT> [ITEM RESPONSES]
...
...
[Blank Record or End-of-file]

Individual response data for DIF multiple-group model

In this case, the format of the data should be:

Respondent 1 [ID.] [GROUP CODE] <WEIGHT> [ITEM RESPONSES]


Respondent 2 [ID.] [GROUP CODE] <WEIGHT> [ITEM RESPONSES]
...
...
[Blank Record or End-of-file]


Individual response data for rater-effect multiple-group model

In this case, the format of the data should be:

Respondent 1 [ID.] <WEIGHT> [RATER CODE,ITEM RESPONSE]....


[RATER CODE,ITEM RESPONSE]
Respondent 2 [ID.] <WEIGHT> [RATER CODE,ITEM RESPONSE]....
[RATER CODE,ITEM RESPONSE]
...
...
[Blank Record or End-of-file]

3.3.3 Group-level data

Original response data for group-level frequencies (DFNAME)

If the GROUPLEVEL option appears in the INPUT command, the input data are assumed to be
group-level frequencies of categorical responses. Each record is read by a format statement
supplied by the user. The following fields are contained in each record.

Identification (format <a>A1): Required. <a> is specified by the NIDCHAR keyword in the
INPUT command.

Subgroup identification (format A<b>): For a single-group model, this field should be
omitted. For DIF multiple-group models, the subgroup code is read as characters; for the
Rater-effect model, the rater’s code is read as characters. The length of the characters (b)
must be less than eight, and the codes should be specified by the GCODE keyword on the
MGROUP command. The maximum number of subgroups should be specified with the MGROUP
keyword on the INPUT command.

Weight (format Fw.d): If the WEIGHT option appears in the INPUT command, this weight
field must be read, in floating point format.

Response vector (integer format): The INOPT keyword on the INPUT command determines the
layout of the input vector:

INOPT=1: Frequencies of categorical responses [1][2][3]...[m_j]
INOPT=2: Frequencies of not-presented responses plus categorical responses [N-P][1][2]...[m_j]
INOPT=3: Frequencies of omitted responses plus categorical responses [Omit][1][2]...[m_j]
INOPT=4: Frequencies of not-presented, omitted, and categorical responses [N-P][Omit][1][2]...[m_j]
INOPT=5: A series of response codes and frequencies [Code][Freq][Code][Freq]...

If INOPT=5 is used, the response codes are read in as characters and
specified by the ORIGINAL keyword on the BLOCK command.

The distinctions between input data streams for the single-group and multiple-group (DIF and
Rater-effect) models are the same as those in the individual response data discussed earlier.

Related topics

 BLOCK command: ORIGINAL keyword (Section 3.2.2)


 INPUT command: INOPT, MGROUP/MRATER, and NIDCHAR keywords (Section 3.2.7)
 INPUT command: GROUPLEVEL and WEIGHT options
 MGROUP command: GCODE keyword (Section 3.2.8)
 Variable format statements (Section 3.2.17)

3.3.4 Key files

Item parameter file

See the format specification for PARM file in the SAVE command.

Omitted key file

This file should contain a single record in the same format as the individual response data. The
fields of identification, subgroup identification, and weight are not processed and do not need to
be occupied. The current version of PARSCALE treats omitted response as not-presented.


Not-presented file

This file should contain a single record in the same format as the individual response data. The
fields of identification, subgroup identification, and weight are not processed and do not need to
be occupied. For multiple-group models (DIF and Raters’ Effect), this file is particularly impor-
tant because for those situations, not all items are presented to all subgroups of respondents or
not all items are rated by all raters. If a not-presented code is present in the original data file and
this file is not specified, response records containing the code will be rejected.

Related topics

 FILES command: OFNAME and NFNAME keywords (Section 3.2.6)


 SAVE command: PARM keyword (Section 3.2.13)

3.4 Output files


3.4.1 Format of output files

Apart from the four standard list output files produced (*.ph0, *.ph1, *.ph2, and *.ph3), the user
can instruct the program to create the following additional output files, using keywords on the
SAVE command:

 combined score file (COMBINE)


 fit statistics file (FIT)
 item information file (INFORMATION)
 item parameter file (PARM)
 subject scores file (SCORE)

3.4.2 Combined score file

In the combined score file, the first eight records form the file’s title lines.

Format: ('1',//,T25,30'*',/,T32,'COMBINED SCORES',/,T25,30'*',///)

The following specifications are repeated for all respondents or sampled respondents.

Format Description

(<a>A1, Identification of respondent

T22,'|',2X,I7,2X The respondent number

A8,2X, Group name

F7.2) Weight for respondent


Finally, the next specifications are repeated for each subtest (from 1 through NTEST) within each
respondent (NTEST is the number of subtests specified by the NTEST keyword in the INPUT com-
mand)

Format Description

(1X,I3,2X, The subtest number

A8,2X, The subtest name

T22,'|',2X,F7.3,2X, Combined score

F7.3) S.E. of combined scores

Related topics

 INPUT command: NTEST keyword (Section 3.2.7)


 INPUT command: COMBINE keyword

3.4.3 Fit statistics file

The first four records of the fit statistics file describe the run as follows:

Records Format Description

1&2 (20A4,/,20A4) The title records from the PARSCALE run

3 (I4, The number of subtests


I4) The number of subgroups

4 (I4, The subtest number


A8, The subtest name

I4) The number of boundaries for the fit statistics computa-


tion (NBOUND)


The following information is repeated for each group (from 1 through MGROUP):

Format Description

(I4, Subgroup number

A8) Subgroup name

(8F10.5,/) Mean ability for NBOUND boundaries

The information shown below, together with its format description, is written to the fit statistics
file for each block (from 1 through NBLOCK) within each group, and for each item (from 1
through NITEMS) within each block.

Format Description

(I4, The block number

A8, The block name

I4, The number of categories

I4, The item number

A4) The item name

(8F10.5,/) Observed sample sizes for NBOUND boundaries

Finally, the following information is repeated for each response category (1 through NCAT) within
each block (NCAT is the number of response categories of the current block):

Format Description

(8F10.5,/) Observed frequencies for NBOUND boundaries

(8F10.5,/) Model based frequencies for NBOUND boundaries

Related topics

 TEST/SCALE command: NBLOCK keyword (Section 3.2.15)


 BLOCK command: NITEMS keyword (Section 3.2.2)
 BLOCK command: NCAT keyword
 SAVE command: FIT keyword (Section 3.2.13)


3.4.4 Item parameter file

Records 1 and 2 are the TITLE lines from the command file.

Format: (20A4,/,20A4)

The codes in record 3 describe the model as follows:

Format Description

(A8, The test name from the PARSCALE run

I5, The number of blocks (NBLOCK)

I5, The total number of items

I5, The model code

I5, The number of subgroups (MGROUP)

I5) The model code for multiple-groups (see below)

Notes:

The model codes are as follows:

 1: Normal ogive graded response model with item and category parameters separated
 2: Normal ogive graded response model with item-category parameters
 3: Logistic graded response model with item and category parameters separated
 4: Logistic graded response model with item-category parameters
 5: Normal ogive partial credit model with item and category parameters separated
 6: Normal ogive partial credit model with item-category parameters (not implemented)
 7: Logistic partial credit model with item and category parameters separated
 8: Logistic partial credit model with item-category parameters

The model codes for multiple-groups are:

 DIF model (the default for a single-group model)


 Raters’ effect model

Line 4 shows the number of items per block.


Format: (30I5)

The rest of the data show the parameters grouped by block within each group. For each group
(from 1 through MGROUP), the subgroup name is listed first formatted as (A8). Note that for a sin-
gle-group or Rater's-Effect model there will be only one group name.

Within each group, the following block information appears:

Format Description

(A8, Block name

I5, The number of categories

A4, Item name

F10.5, Slope parameter

F10.5, S.E. of slope parameter

F10.5, Location parameter

F10.5, S.E. of location parameter

F10.5, Guessing parameter

F10.5) S.E. of guessing parameter

(15F10.5,/) Category parameters for this block

(15F10.5,/) S.E. of category parameters for this block

Lastly, in the case of a Rater's-Effect model, the following rater information is provided for each
rater:

Format Description

(A8, Rater’s name

F10.5, Rater's-Effect parameter

F10.5) S.E. of Raters’ Effect parameter


Related topics

 BLOCK command: NITEMS keyword (Section 3.2.2)


 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)
 TEST/SCALE command: NBLOCK keyword (Section 3.2.15)
 SAVE command: PARM keyword (Section 3.2.13)

3.4.5 Item information file

The information file begins with the TITLE lines from the command file in records 1 and 2.

Format: (20A4,/,20A4)

In the remainder of the file, item information is listed as follows: the results are grouped by
quadrature points (1 through NQPT), within items (1 through NITEMS), within blocks (1 through
NBLOCK), within groups (1 through MGROUP, or just 1 for a single-group or Rater's-Effect model):

Format Description

(A8,2X, The test name

I4,2X, Group number

I4,2X, Block number

A8,2X, Block name

I4,2X, Item number

A4,2X, Item name

I2,2X, Node number

F10.5,2X Quadrature point value

F18.10) Item information at each quadrature node

Related topics

 TEST/SCALE command: NBLOCK keyword (Section 3.2.15)


 BLOCK command: NITEMS keyword (Section 3.2.2)
 CALIB command: NQPT keyword (Section 3.2.3)
 INPUT command: MGROUP/MRATER keyword (Section 3.2.7)
 SAVE command: INFORMATION keyword (Section 3.2.13)

342
INPUT AND OUTPUT FILES

3.4.6 Subject scores file

Records: 1

Format Description

(/,1X,I2, Subtest number

‘SUBTEST:',A8) Subtest name

Note that if the number of subtests is one, this record is skipped.

Repeat for all respondents or sampled respondents:

Format Description

(<a>A1, Identification of respondents


T22,'|',

I7,2X, Respondent number

A8,2X, Group name

F7.2) Weight for respondent

Notes

The length of identification specified by the NIDCHAR keyword in the INPUT command is auto-
matically supplied for <a>A1.

Repeat for each rater (from 1 through the number of response vectors, NVEC) within each
respondent. For a single-group or DIF model, this block repeats only once (NVEC=1); for the
Raters’ Effect model, it repeats once for each rater who rated this particular respondent.
Therefore, for the Raters’ Effect model, NVEC varies depending on the number of raters who
rated the respondent.

Format Description

(1X,I3,2X, Score number

A8,2X, Score name

I3,2X, Rater identification of this response vector


'|',2X,


F7.2,4X, Weight for this response vector

F7.2,2X, Mean category

F7.2,4X, The number of items attempted

F10.4,2X, Ability estimate

F10.4) S.E. of ability estimate

Note that if the DIF model is used, the rater identification is the subgroup identification.

Format Description
(T22,'|',2X,

F7.2,4X, Fit statistics

F7.2,2X, Probability of fit statistics

F7.2) Degree of freedom

Note that this record is saved only if the original response data is frequency (group-level) data
and the FIT option in the SCORE command has been given.

Related topics

 INPUT command: NIDCHAR keyword (Section 3.2.7)


 SCORE command: FIT option (Section 3.2.14)
 SAVE command: FIT option (Section 3.2.13)


4 MULTILOG

MULTILOG, written by David Thissen, is a computer program designed to facilitate item analy-
sis and scoring of psychological tests within the framework of Item Response Theory (IRT). As
the name implies, MULTILOG is for items with MULTIple alternatives and makes use of LOG-
istic response models, such as Samejima’s (1969) model for graded responses, Bock’s (1972)
model for nominal (non-ordered) responses, and Thissen & Steinberg’s (1984) model for multi-
ple-choice items. The commonly used logistic models for binary item response data are also in-
cluded, because they are special cases of the multiple category models. MULTILOG provides
Marginal Maximum Likelihood (MML) item parameter estimates for data in which the latent
variable of IRT is random, as well as Maximum Likelihood (ML) estimates for the fixed-effects
case. χ2 indices of the goodness-of-fit of the model are provided. In IRT, the item parameter
estimates are the focus of item analysis. MULTILOG also provides scaled score estimates of
the latent variable for each examinee or response pattern.

MULTILOG is best suited to the analysis of multiple-alternative items, such as those on multi-
ple-choice tests or Likert-type attitude questionnaires. It is the only widely available program
capable of fitting a wide variety of models to these kinds of data using optimal (MML) methods.
MULTILOG also facilitates refined model fitting and hypothesis testing through general provi-
sions for imposing equality constraints among the item parameters and for fixing item parame-
ters at a particular value. MULTILOG may also be used to test hypotheses about differential item
functioning (DIF; sometimes called “item bias”) with either multiple response or binary data,
through the use of its facilities to handle data from several populations simultaneously and test
hypotheses about the equality of item parameters across groups.

4.1 The MULTILOG user’s interface

Although MULTILOG syntax can still be created and submitted in batch mode as was done with
previous versions, MULTILOG version 7.0 has new features designed to make the program
more user-friendly.

 The user no longer has to create syntax using INFORLOG. The functionality of
INFORLOG and MULTILOG has been combined into a single executable file.
 In addition, the MULTILOG syntax wizard described in Section 4.2 can be used to create
a skeleton command file that can then be edited according to the user’s needs.

This document describes those elements in the user’s interface that may not be immediately clear
to the user or that behave in a somewhat nonstandard way. Each element will be discussed in
turn in the following sections.


4.1.1 Main menu

At the center of the interface is the menu bar, which adapts to the currently active function. For
example, when you start the program, the menu bar shows only the menu choices File, View,
and Help.

However, as soon as you open a MULTILOG output file (through the File menu), the Window
and Edit menu choices show up on the menu bar. At the same time, the File menu choices ex-
pand with selections like Save and Save As, and the View menu now has a Font option after the
Status bar and Toolbar choices.

Opening an existing MULTILOG command (*.mlg) file, or starting a new one, adds further
choices to the main menu bar: the Output and Run menus.

Note that you can open only one command file at a time. If you want to paste part of an existing
command file into your current one, be aware that opening the old file will automatically close
the current one. After copying the part you want to the clipboard, you have to reopen your *.mlg
file for pasting.

4.1.2 Run menu

The Run menu gives you the option to run the command file displayed in the main window.


When you run an analysis by clicking Run, the current command file will first be saved, if you
made any changes. You can easily tell if a command file has changed by looking at the filename
above the menu bar. An asterisk after the filename shows that the current file has changed, but
has not been saved yet. Once the analysis has been completed, the Plot option, providing access
to the graphics procedure, is enabled. For a description of the plots that can be produced, see
Chapter 6.

4.1.3 Output menu

Through the Output menu you can open the list output, named with the file extension out. Al-
ways check the end of each output file to see if it reports NORMAL END. If it does not, something
went wrong, and the output file should contain some information about the problem.

4.1.4 Window menu

The Window menu is only available when you have at least one file open. You can use the Ctrl-
Tab key combination to switch between open files, or use the Window menu to arrange the open
files (Cascade, Tile). If you have the output (*.out) file open for a particular analysis, you could
use the Window menu to arrange this file and the command file for convenient switching.


4.1.5 Font option

Clicking on the Font option on the View pull-down menu displays a dialog box with the fonts
that are available on your system.

You may use different fonts for command and output files. At installation, they are both set to a
special Arial Monospace font that ships with the program. To keep the tables in the output
aligned, you should always select a monospace or fixed pitch font where all the characters in the
font have the same width. Once you select a new font, that font becomes the default font. This
gives you the option to select a font (as well as font size and font style) for your command
(*.mlg) files that is different from the one for your list output (*.out) files as a quick visual re-
minder of the type of file.


4.2 Creating syntax using the MULTILOG syntax wizard

The MULTILOG syntax wizard, used to create new MULTILOG command files, uses succes-
sive dialog boxes to generate the syntax. The boxes displayed during the process depend on the
user’s choices in previous boxes.

The dialog boxes are described below, approximately in order of appearance.

4.2.1 New Analysis dialog box

The New Analysis dialog box is used to select the type of problem and/or to create a new
MULTILOG command file. This dialog box is activated when the File, New option is selected
from the main menu bar.

The type of problem is specified by selecting one of the three mutually exclusive options in the
Select type of problem group box:

 MML item parameter estimation (RANDOM option on PROBLEM command)
 Fixed-theta item parameter estimation (FIXED option on PROBLEM command)
 MLE or MAP computation (SCORES option on PROBLEM command).


Enter the folder location and the name for the MULTILOG command file in the Folder location
and File name edit box respectively.

If the Fixed-theta Item Parameter Estimation option or the MLE or MAP Computation op-
tion is chosen, clicking the OK button activates the Fixed Theta dialog box, in which you are
asked whether a fixed value of θ should be read with the data; it is followed by the Input Data
dialog box. Selecting MML Item Parameter Estimation will activate the Input Data dialog
box directly when OK is clicked.

If the Blank MULTILOG Command File option is selected, the Folder location and File
name for the new file should be provided in the appropriate fields of this dialog box. Clicking
OK in this case will open an editor window in which you can enter syntax manually.

Related topics

 RANDOM, FIXED and SCORES options on the PROBLEM command (Section 4.4.7)
 Fixed Theta dialog box (Section 4.2.2)
 Input Data dialog box (Section 4.2.3)

4.2.2 Fixed Theta dialog box

The Fixed Theta dialog box is activated when you select fixed-θ item parameter estimation or
the MLE or MAP computation option in the New Analysis dialog box. It is used to indicate
whether a fixed value of θ should be read with the data. If the Yes radio button is clicked, the
position of the fixed value to be read must be indicated using the Data Format field in the Input
Data dialog box. Clicking the Back button will return the user to the New Analysis dialog box
while clicking the Next button will activate the Input Data dialog box.

Related topics

 FIXED option on the PROBLEM command (Section 4.4.7)
 New Analysis dialog box (Section 4.2.1)
 Input Data dialog box (Section 4.2.3)
 Variable format statement (Section 4.4.14)


4.2.3 Input Data dialog box

The Input Data dialog box is used to specify the type and location of the data to be analyzed.
You can enter the name of the data file in the Data file name field provided, or use the Browse
button to browse for the file. The program automatically enters the name of the command
file (specified in the New Analysis dialog box) with the file extension dat as the default name.

MULTILOG can handle three types of data, each associated with one of the mutually exclusive
options in the Type of data group box:


 Counts of response patterns (PATTERN option on PROBLEM command)
 Individual item response vectors (INDIVIDUAL option on PROBLEM command)
 Fixed-effects table of counts (TABLE option on PROBLEM command).

In all cases, the format statement describing the data must be entered in the Data Format field.
Depending on the option selected, different versions of the Input Parameters dialog box, re-
flecting the selection made here, will be displayed when the Next button is clicked.

Related topics

 PATTERNS, INDIVIDUAL and TABLE options on the PROBLEM command (Section 4.4.7)
 DATA keyword on the PROBLEM command
 Input Parameters dialog box (Section 4.2.3)
 Variable format statement (Section 4.4.14)

4.2.4 Input Parameters dialog box

The Input Parameters dialog box is used to describe the contents of the data file to be analyzed.
The version of this dialog box displayed depends on the type of data specified in the Input Data
dialog box. In each case, the type of data previously selected is noted at the top of the Input Pa-
rameters dialog box. In general, this dialog box is used to indicate the number of items, groups,
tests, patterns, examinees and the number of characters in the ID field. You can use the Back
button to return to any of the previously completed dialog boxes. Clicking Next activates the
Test Model dialog box. All fields in these dialog boxes are associated with keywords on the
PROBLEM command, with the exception of the Number of tests field. This field is used by the
program to determine the number of tabs in the Test Model dialog box, displayed later in the
setup process.

When counts of response patterns are analyzed, the Input Parameters dialog box shown below
is used to provide the following information:

 The number of items. Previous limits on the number of items that can be analyzed have
been removed in the current version of MULTILOG (NITEMS keyword on PROBLEM com-
mand).
 The number of groups. Previously, a maximum of 10 groups could be used. This limit has
also been removed (NGROUPS keyword on PROBLEM command)
 The number of patterns. This field is only displayed when response pattern data are ana-
lyzed (NPATTERNS keyword on PROBLEM command)
 The number of characters in the ID field. By default, it is assumed to be zero (NCHAR key-
word on PROBLEM command)

In the case of analysis of individual item response vectors, the same options as described above
are available, with one exception: the Number of patterns field is replaced by the Number of
examinees field. The number of examinees for which response vectors are available should be
entered in this field.


Related topics

 NITEMS, NGROUPS, NEXAMINEES, NPATTERNS and NCHARS keywords on the PROBLEM com-
mand (Section 4.4.7)
 Input Data dialog box (Section 4.2.3)
 Test Model dialog box (Section 4.2.5)

For the analysis of a fixed-effects table of counts, only three fields need to be completed: the
number of items, the number of groups and the number of tests. The Input Parameters dialog
box for this type of analysis is shown below.


4.2.5 Test Model dialog box

The Test Model dialog box is used to specify details for a subtest. The number of tabs in the
Test Model dialog box depends on the value entered in the Number of tests field in the Input
Parameters dialog box.


The model to be fitted to the data is specified in the Test Model group box. One of six mutually
exclusive options may be selected:

 1-parameter logistic model, corresponding to the L1 option on the TEST command
 2-parameter logistic model, corresponding to the L2 option on the TEST command
 3-parameter logistic model, corresponding to the L3 option on the TEST command
 Graded model, corresponding to the GR option on the TEST command
 Nominal model, corresponding to the NO option on the TEST command
 Multiple-choice model, corresponding to the BS option on the TEST command.

In the Test Items group box, the items to be analyzed are described. By default, no items are se-
lected. Clicking the All check box under the Use header will invoke the ALL option on the TEST
command, and all items will be included in the analysis. Also, unchecking any of the items will
uncheck the All check box. Clicking the check box next to an item will select or deselect an item.
Such a selection corresponds to use of the ITEMS keyword on the TEST command. In the image
above, all items have been included in the analysis.

For each item, the number of response categories must be specified under the Category heading.
By default, it is assumed that each item has two response categories. The admissible number of
categories is between 2 and 10. This column is only available for the graded model, nominal
model, and multiple-choice model and corresponds to the NC keyword on the TEST command.

Finally, the number of the highest category has to be indicated in the case of a nominal model.
Select either A for Ascending or D for Descending under the Order header to indicate the order
of categories in each case. The Order column is only available for the nominal and multiple-
choice models and corresponds to the HIGH keyword on the TEST command.

Click the Back button to return to the Input Parameters dialog box and the Next button to pro-
ceed to the Response Codes dialog box.

Related topics

 L1, L2, L3, GR, NO and BS options on the TEST command (Section 4.4.11)
 NC, HIGH and ITEMS keywords and ALL option on the TEST command
 Input Parameters dialog box (Section 4.2.4)
 Response Codes (Binary Data) dialog box (Section 4.2.6)
 Response Codes (Non-Binary Data) dialog box (Section 4.2.7)

4.2.6 Response Codes (Binary Data) dialog box

The Response Codes dialog box for binary data is used to provide information on the response
and missing codes and the answer key for the data to be used in the analysis.

The Response Codes field is used to list all possible codes occurring in the data. The Correct
response codes field is used to provide an answer key for the total number of items to be ana-
lyzed. The Missing Code check box is checked when a value indicating “missing” for popula-
tion membership other than the default 9.0 assumed by the program is used. Use the drop-down
list box on the right to select the appropriate missing value code for the data.

After completing the dialog box, click Next to display a summary of the information entered.
Click Finish in this dialog box to generate the command file.

Related topics

 VAIM keyword on the ESTIMATE command (Section 4.4.4)
 Format of data (binary items) (Section 4.4.14)

4.2.7 Response Codes (Non-Binary Data) dialog box

The Response Codes dialog box for non-binary data is used to provide information on the re-
sponse codes and the answer key for the data to be used in analysis.

The Response Codes field is used to list all possible codes occurring in the data. The Correct
response code fields are used to provide an answer key for each of the items to be analyzed.

After completing the dialog box, click Next to display a summary of the information entered.
Click Finish in this dialog box to generate the command file.

Related topics

 Format of data (multiple response items) (Section 4.4.14)


4.3 Getting started with MULTILOG


4.3.1 Two-parameter model for the skeletal maturity data

In this example, the generation of syntax that includes the reading of an external criterion is illus-
trated. For a complete discussion of the problem, please see Section 12.19.

The first step in creating a new command file using the syntax wizard is to select the New option
from the File menu to activate the New Analysis dialog box.

The type of problem and the name and location of the new command file are defined using the
New Analysis dialog box. As we wish to score (estimate θ ) in this run, the MLE or MAP
Computation option is selected in the Select type of problem group box. This selection corre-
sponds to the SCORES option on the PROBLEM command.

The location in which the new command file is to be stored is specified next. By default, the
folder in which MULTILOG has been installed will be displayed. This can be changed by either
typing an alternative path in the Folder Location field or by using the Browse button to the right
of this field. Finally, the name of the command file is entered in the File name field. In this case,
we want to create the command file knee.mlg in the (default) mlgwin folder. Click OK to con-
tinue with the syntax specification.


The Fixed Theta dialog box is now displayed, allowing you to include the reading of a fixed
value with the data. Click the radio button next to the Yes option to add the CRITERION option to
the PROBLEM command. Then click Next to go to the Input Data dialog box.


The Input Data dialog box is used to provide information on the position and contents of the
raw data file. By default, the Data file name will be assumed to be in the same folder and to
have the same filename as the new command file. This may be changed by either correcting the
entry in this field or by using the Browse button to the right of this field.

The variable format statement describing the contents of this file must be entered in the Data
Format field. Recall that the data are in the format:

40 1 0.5 2112111111112111111111111111111111
33 1 1.0 3113211111112122111111111111111111
33 1 2.0 4333211111113122111111111111111111
29 1 3.0 4543211111113122111111011111111111

As the data file contains individual data identification in the first 10 columns, an identification
field is required as the first entry in the variable format statement. The format statement shown
below reflects the position of the examinee identification field (10A1), the 34 item responses
(34A1) and the criterion (F4.0). MULTILOG will read the chronological age of each individual
and use that as a starting value for the iterative modal estimation procedure. The “T” format is
used to tab to the correct positions of the respective fields. The first ten characters on each re-
cord, which are read as an identification field, are also used to assign a value to the NCHARS key-
word on the PROBLEM command through the Input Parameters dialog box (see later in the ex-
ample). Note that if the NCHARS keyword is set to 0, no ID field needs to be included in the for-
mat statement.
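
Although the completed Data Format field itself is not reproduced here, the statement has the
general shape shown below (a sketch only; the tab position c is a placeholder that depends on the
column in which the age actually begins in the data file):

(10A1,T11,34A1,Tc,F4.0)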

Select the Individual item response vectors option from the Type of data group box. The se-
lection made in this case will add the INDIVIDUAL option of the PROBLEM command, while the
entry in the Data file name field will be used in conjunction with the DATA keyword on the same
command. The variable format entered in the Data format field will be echoed to the generated
command file. For more on format specification rules please see Section 4.4.14.

The parameters for the 34 indicators are in a file called knee.par. This file was produced by
MULTILOG in a (previous) calibration run. As the parameters for the SAVE and other optional
commands cannot be set using the syntax wizard, these commands will be added after generating
the command file. Instructions concerning this can be found at the end of this example. Having
completed the Input Data dialog box, click Next to go to the Input Parameters dialog box.


The problem uses 34 items, and this is indicated by setting the value in the Number of items
field to 34. Data from 13 examinees are available, and this is specified using the Number of ex-
aminees field. Finally, the NCHAR keyword on the PROBLEM command is set to 10, as previously
indicated in the variable format statement description, using the Number of characters in ID
field.

Entries in this dialog box correspond to the following MULTILOG keywords on the PROBLEM
command:

Dialog box field                    Keyword in command file

Number of items                     NITEMS

Number of groups                    NGROUP

Number of tests                     None; used to set the number of tabs in the Test Model dialog box.

Number of patterns                  NPATTERNS

Number of characters in ID field    NCHARS

Click Next to go to the Test Model dialog box.


A graded model is used here, and is specified by clicking the radio button next to the Graded
model option in the Test model group box. This corresponds to the GR option on the TEST com-
mand.

The “test” has varying numbers of response categories for the 34 indicators, which are entered in
the NC list on the TEST command. As all items are used, the All check box in the Use column of
the Test Items group box is clicked, and the number of categories is set by item in the Category
column as shown below. Once the number of categories for each item has been indicated, click
OK to go to the Response Code (Non-Binary data) dialog box.


All possible responses in the data are entered in the Response Code field. The corresponding
correct response codes are entered in the Correct response codes group box. On each line, the
number of entries permitted corresponds to the number of items specified in the Input Parame-
ters dialog box.

Once the response codes (123450) are entered in the Response Code string field, these codes
appear as the first column of the Correct response code group box. For each response code and
each item, a category number is entered. Permissible values are 1, 2, …, NCAT, where NCAT de-
notes the total number of categories for a given item. In any row, a “0” indicates that the re-
sponse code value is excluded from the analysis. A valid entry for item 1, for example, is Code 1
= 5, Code 2 = 2, Code 3 = 3, Code 4 = 4, and Code 5 = 1. This entry specifies that a data value
of 1 is assigned to the fifth category of item 1, while a data value of 5 is assigned to the first
category.

Start by entering the correct responses for the first code, and press the Enter key on your key-
board when done to proceed to the next line of the window. Note that, if an attempt is made to
specify response codes not in agreement with previous selections, no value will appear in this
box. Only when valid codes are entered will the results be displayed. Once all codes have been
entered, click OK to go to the Project Settings dialog box. Entries in the Response Code dialog
box will appear after the END command in the generated command file.


The Project Settings dialog box displays a summary of all selections made up to this point. To
go back to any of the previous dialog boxes, the Back button may be used. To generate the syn-
tax, click Finish. Syntax generated using the wizard is now displayed in the main MULTILOG
window. Before running this problem, the following (optional) commands are added to the syn-
tax in this window by using standard Windows editing functions:

>START ALL, FORMAT, PARAM=’KNEE.PAR’;
>SAVE;

The START command is used to override the default starting values for all the item parameters
and enter others, in this case from the file knee.par.

Click the Run option on the main menu bar to start the analysis. Once the analysis has been
completed, the output generated may be viewed using the Output option on the same menu bar.
The output file will then be displayed in the main window, and the Window option may be used
to switch between syntax and output files.

4.3.2 Three-parameter (and guessing) model for the LSAT6 data

For a description of the problem for which syntax is generated here, please see Sections 12.1 to
12.3.

Select the New option from the File menu to activate the New Analysis dialog box. In the New
Analysis dialog box, the type of problem and the name and location of the new command file are
defined. For the LSAT data, we wish to perform MML item parameter estimation. Note that this
corresponds to the RANDOM option on the PROBLEM command. Click on the MML Item Parame-
ter Estimation option.


Next, the location in which the new command file is to be stored is specified. By default, the
folder in which MULTILOG has been installed will be displayed. This can be changed by either
typing an alternative path in the Folder Location field or by using the Browse button to the right
of this field. Finally, the name of the command file is entered in the File name field. In this case,
we want to create the command file lsat6_2.mlg in the (default) mlgwin folder. Click OK to go
to the Input Data dialog box.

The Input Data dialog box is used to provide information on the position and contents of the
raw data file. By default, the Data file name will be assumed to be in the same folder and to
have the same filename as the new command file. This may be changed by either correcting the
entry in this field or by using the Browse button to the right of this field.


The variable format statement describing the contents of this file must be entered in the Data
Format field. For this example, recall that the data are of the form

1 00000 3
2 00001 6
3 00010 2
4 00011 11

As the data file contains patterns and frequencies, no identification field is required. The format
statement entered reflects the position of the pattern (5A1) and frequency (F4.0) only. In each
row, the first 4 columns are skipped, and this is indicated by the value “4” in combination with
the “X” operator. Select the Counts of response patterns option from the Type of data group
box as shown below. The selection made in this case will add the PATTERN option of the PROBLEM
command, while the entry in the Data file name field will be used with the DATA keyword on the
same command. The variable format entered in the Data format field will be echoed to the gen-
erated command file. For more on format specification rules please see Section 4.4.14. After
completing the Input Data dialog box, click Next to go to the Input Parameters dialog box.
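
Under these assumptions the completed Data Format field would contain a statement along the
following lines (a sketch, derived directly from the description above):

(4X,5A1,F4.0)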

The number of items, groups, tests, patterns/examinees, and characters in the identification field
are specified using the Input Parameters dialog box. When the dialog box is first displayed, all
entries are set to 1, assumed to be the default. For this problem, we need only indicate the num-
ber of items (5) and the number of patterns (32) in the Number of items and Number of pat-
terns fields respectively. Note that the buttons to the right of each of these fields may be used to
increase or decrease the value displayed in a particular field.

Entries on this dialog box correspond to the following MULTILOG keywords on the PROBLEM
command:


Dialog box field                    Keyword in command file

Number of items                     NITEMS

Number of groups                    NGROUP

Number of tests                     None; used to set the number of tabs in the Test Model dialog box.

Number of patterns                  NPATTERNS

Number of characters in ID field    NCHARS

Click Next to go to the Test Model dialog box.

The Test Model dialog box is used to describe the items assigned to the test(s) and the model to
be fitted. All entries on this dialog box correspond to keywords/options on the TEST command:

 The Test Model group box corresponds to the choice of one of the following options:
L1/L2/L3/GR/NO/BS.
 Entries in the Use column of the Test items group box correspond to the ALL option and
ITEMS keyword on the TEST command.
 The Categories and Order columns of the Test items group box (not used in this exam-
ple) correspond to the NC and HIGH keywords respectively.

The number of tabs displayed at the top of this dialog box depends on the entry in the Number
of tests field in the Input Parameters dialog box. This problem requires the use of all 5 items in
the data on a single test, so the All check box at the top of the Use column is checked to select all
items simultaneously. To select single items, the check boxes next to the items selected for inclu-
sion should be clicked individually.

At this point, the data file and the model specification are complete. All that remains to be done
is to indicate the response codes. To do this, click Next to go to the Response Codes (Binary
Data) dialog box.

The response patterns in the data file consist of combinations of “0” and “1” values. These two
values are entered in the Response Codes field, which should reflect all possible response codes
present in the data. No missing code is used here, so the Missing code check box is left un-
checked.

The response to each of the items that indicates the correct response is entered in the Correct
response codes field. Note that the number of entries allowed in this field is equal to the number
of items specified in the Input Parameters dialog box.

All entries in this dialog box are echoed to the command file and can be found directly after the
END command that is automatically added to the command file, but before the variable format
statement that is also written to the command file.


Problem specification is now complete. When the Next button is clicked on the Response Codes
dialog box, a list of the options specified is displayed in the Project Settings dialog box. To go
back to any of the previous dialog boxes, click the drop-down list button next to the Back button
and select from the list that will be displayed. To generate the command file, click Finish.


Once the Finish button has been clicked in the Project Settings dialog box, you are returned to
the main MULTILOG window, where the generated syntax is displayed. In this example, no
changes are needed but, if additional optional commands are to be used, you can insert such
commands in this window using standard Windows editing functions. To run the generated
command file, click the Run option on the main menu bar.

4.3.3 Generating syntax for a fixed-θ model

This example illustrates user input for a fixed-θ analysis. For a discussion of these data, see the
previous section. To start the process, select the New option from the File menu. The New
Analysis dialog box shown below will be displayed.

On the New Analysis dialog box, the type of problem and the name and location of the new
command file are defined. As we wish to perform a fixed-θ analysis for the mouse data, the
Fixed-theta Item Parameter Estimation option is selected by clicking on it. Note that this cor-
responds to the FIXED option on the PROBLEM command.

Next, the location in which the new command file is to be stored is specified. By default, the
folder in which MULTILOG has been installed will be displayed. This can be changed by either
typing an alternative path in the Folder Location field or by using the Browse button to the right
of this field. Finally, the name of the command file is entered in the File name field. In this case,
we want to create the command file mouse.mlg in the (default) mlgwin folder. Click OK to pro-
ceed with the specification.


As the Fixed-theta Item Parameter Estimation option was selected in the New Analysis dia-
log box, the Fixed Theta dialog box is displayed next. It is used to indicate whether a fixed
value of θ should be read with the data. Clicking the Back button will return you to the New
Analysis dialog box, while clicking the Next button will activate the Input Data dialog box.

Leaving the default entry (No) as it is displayed on this dialog box, click Next.


Recall that the data are in a file called mouse.dat, which contains the following four lines:

1 7 0 2 11
0 6 0 6 10
0 2 0 5 11
3 10 2 0 2

Each of the four lines of data represents one of four groups of mice; each group of mice repre-
sents a cell of a 2 x 2 experimental design. The response variable (measured on an ordinal scale)
is the severity of audiogenic seizures. The column categories are “crouching”, “wild running”,
“clonic seizures”, “tonic seizures”, and “death”.

The variable format statement describing the contents of this file must be entered in the Data
Format field. As the data file contains frequencies for each cell of the table, no identification
field is required and the format statement entered reflects the frequency in each cell (5F3.0)
only.

Select the Fixed-effect table of counts option from the Type of data group box as shown below
to indicate that cell frequencies from a table are used as input. The selection made in this case
will add the TABLE option of the PROBLEM command, while the entry in the Data file name field
will be used in conjunction with the DATA keyword on the same command. The variable format
entered in the Data format field will be echoed to the end of the generated command file. For
more on format specification rules, please see Section 4.4.14.

Having completed the Input Data dialog box, click Next to go to the Input Parameters dialog
box.


The Input Parameters dialog box reflects the selections made in previous dialog boxes. Only
the numbers of items, groups, and tests need to be specified for this type of problem. When the
dialog box is first displayed, all entries are set to 1, assumed to be the default. For this problem,
you need only indicate the number of items (1) and the number of groups (4) in the Number of
items and Number of groups fields respectively. Note that the buttons to the right of each of
these fields may be used to increase or decrease the value displayed in a particular field.

Entries in this dialog box correspond to the following MULTILOG keywords on the PROBLEM
command:

Dialog box field      Keyword in command file

Number of items       NITEMS

Number of groups      NGROUP

Number of tests       None; used to set the number of tabs in the Test Model dialog box.

Click Next to go to the Test Model dialog box.


In the Test Model dialog box, only one tab is displayed. In addition, only one item is available
for inclusion on the test. This corresponds to the number of items and tests entered on the Input
Parameters dialog box.

As a graded model (corresponding to the GR option on the TEST command) is required, click the
Graded model radio button.


The item can be selected by either clicking the check box next to All or the check box next to “1”
in this case. The entries in the Use column correspond to the ALL option and ITEMS keyword re-
spectively.

In the case of a graded model, the number of categories must be specified. The presence of 5
categories is indicated using the buttons to the right of this field. This sets the value for the NC
keyword on the TEST command.

This completes the model specification, and clicking the Next button on the Test Model dialog
box now generates the syntax.

The generated syntax is displayed in the main MULTILOG window. To add the additional op-
tional commands

>TGROUPS NUMBER=4, MIDDLES=(1,1,1,-1);
>FIX ITEMS=1, AJ, VALUE=1.0;
>FIX ITEMS=1, BK=4, VALUE=0.4756;

to the syntax, use standard Windows editing functions. When done, click the Run option on the
main menu bar to start the analysis. The output generated during the analysis may be accessed
using the Output option after completion of the analysis.
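
Putting the pieces together, the core of the resulting command file might look something like
this (a sketch only; the wizard also writes title lines, and its exact output may differ slightly in
layout):

>PROBLEM FIXED, TABLE, NITEMS=1, NGROUP=4, DATA='MOUSE.DAT';
>TEST ALL, GRADED, NC=(5);
>TGROUPS NUMBER=4, MIDDLES=(1,1,1,-1);
>FIX ITEMS=1, AJ, VALUE=1.0;
>FIX ITEMS=1, BK=4, VALUE=0.4756;
>END;
(5F3.0)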

4.4 Command syntax


4.4.1 Overview of syntax

In the table below, the MULTILOG commands are listed in the order in which they should ap-
pear in the command file. MULTILOG command files should have a *.mlg suffix. In the rest of
this section, these commands are listed and discussed in alphabetical order.


Command    Required/Optional   Keyword / Option

TITLE      Required
PROBLEM    Required            RANDOM/FIXED/SCORE, PATTERNS/INDIVIDUAL/TABLE, NITEMS=n,
                               NGROUP=n, NPATTERNS=n/NEXAMINEES=n, NCHARS=n, CRITERION,
                               NOPOP, DATA=filename;
TEST       Required            ALL/ITEMS=(list), L1/L2/L3/GRADED/NOMINAL/BS, NC=(list),
                               HIGH=(list);
EQUAL      Optional            ALL/ITEMS=(list)/GROUPS=(list), WITH=(list),
                               AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;
ESTIMATE   Optional            NCYCLES=n, ITERATIONS=n, ICRIT=n, CCRIT=n, ACCMAX=n, VAIM=n;
END        Required
FIX        Optional            ALL/ITEMS=(list)/GROUPS=(list), VALUE=n,
                               AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;
LABELS     Optional            ALL/ITEMS=(list), NAMES=(‘lab1’,’lab2’,...);
PRIORS     Optional            ALL/ITEMS=(list)/GROUPS=(list), PARAMS=(n,n),
                               AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;
SAVE       Optional            FORMAT;
START      Optional            ALL/ITEMS=(list), PARAM=‘filename’, FORMAT;
TGROUPS    Optional            NUMBER=n, QP=(list), MIDDLE=(list);
TMATRIX    Optional            ALL/ITEMS=(list), AK/CK/DK, DEVIATION/POLYNOMIAL/TRIANGLE;


A basic command file may be created using the MULTILOG interface. Values can be assigned
to the following keywords in this way:

Command    Keyword / Option              Dialog box in which this is set

PROBLEM    RANDOM/FIXED/SCORE            New Analysis; Fixed Theta
PROBLEM    PATTERNS/INDIVIDUAL/TABLE     Input Data
PROBLEM    NITEMS=n                      Input Parameters
PROBLEM    NGROUP=n                      Input Parameters
PROBLEM    NPATTERNS=n/NEXAMINEES=n      Input Parameters
PROBLEM    NCHARS=n                      Input Parameters
PROBLEM    DATA=filename                 Input Data
TEST       ALL/ITEMS=(list)              Test Model
TEST       L1/L2/L3/GRADED/NOMINAL/BS    Test Model
TEST       NC=(list)                     Test Model
TEST       HIGH=(list)                   Test Model
ESTIMATE   VAIM=n                        Response Codes (Binary Data)
-          Variable format statement     Input Data


4.4.2 END command

(Required)

Purpose

Terminates command line entry.

Format

>END;


4.4.3 EQUAL command

(Optional)

Purpose

To impose equality constraints among the item parameters.

Format

>EQUAL ALL/ITEMS=(list)/GROUPS=(list), WITH=(list),
       AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;

AJ/BJ/CJ/BK/AK/CK/DK/MU/SD keyword

Purpose

The set of parameters is specified by one of the following mutually exclusive keywords: AJ,
BJ, CJ, BK=(list), AK=(list), CK=(list), DK=(List), MU, or SD.

 AJ is the slope for the graded model and the 3PL model
 BJ is the threshold for binary graded items and the 3PL model
 CJ is the lower asymptote for the 3PL model
 BK=(list) specifies the listed threshold parameters for the graded model
 AK=(list) specifies the listed contrasts among the a_k’s for the nominal and multiple-
choice models
 CK=(list) specifies the listed contrasts among the c_k’s for the nominal and multiple-
choice models
 DK=(list) specifies the listed contrasts among the d_k’s for the multiple-choice model
 MU is the mean of the population distribution for a group
 SD is the standard deviation of the population distribution for a group.

Format

AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD

Related topics

 TEST command: L1/L2/L3/GRADED/NOMINAL/BS options) (Section 4.4.11)


ALL/ITEMS/GROUPS keyword

Purpose

The set of items is specified with one of the following: ALL, ITEMS=(list), or
GROUPS=(list).

 ALL refers to all of the items in the data
 ITEMS refers to a subset of the items in the data
 GROUPS specifies groups, for reference to MU and SD

Format

ALL/ITEMS=(list)/GROUPS=(list)

WITH keyword

Purpose

This keyword specifies pairwise constraints, as illustrated in examples 2 and 3 below.

Format

WITH=(list)

Example 1

If the item parameters on the EQUAL command are to be set equal for all items in a set, the set
of items may be given as ALL if the equality constraint applies to all items on the test, or
ITEMS=(list) if the equality constraint is to be imposed for a subset of the items given in
the list. For example, the following sequence specifies the 1PL model:

>TEST ALL, L1;
>EQUAL AJ, ALL;

(It is easier to specify L1 on the TEST command).

Example 2

There are cases in which it is desirable to impose equality constraints between (a number of)
pairs of items; this is done by using WITH=(list) in conjunction with ITEMS=(list). A
one-to-one relationship between the items in the ITEMS list and the WITH list is required; the
parameters are made equal within the implied pairs. For example,

>EQUAL AJ, ITEMS=(2,4), WITH=(1,3);


has the effect of setting a(item 1) = a(item 2) and a(item 3) = a(item 4). When WITH is used in this way, it
must refer to the lower-numbered item of each pair; the form must be

>EQUAL parameter ITEMS=(higher numbers) WITH=(lower numbers);

Example 3

For the parameters of BS items, if the WITH list is identical to the ITEMS list, equality con-
straints are imposed on the specified contrasts among the parameters within each item.

For example,

>EQUAL DK=(1,2,3), ITEMS=(1,2,3,4), WITH=(1,2,3,4);

sets the first three contrasts among the d_k’s equal within each item for items 1–4. For four-
alternative multiple-choice items, such as those considered in Section 12.6, this would have
the effect of setting d_2 = d_3 = d_4; the identifiability constraint that the sum of the d’s must be
one would then give d_1 = 1 − d_2 − d_3 − d_4. Similar forms may be used to impose constraints
on the a_k’s and c_k’s. See Section 12.10 for further discussion of the use of the EQUAL com-
mand with the multiple-choice model.

The parameters of Gaussian population distributions may also be constrained if there are
several groups. The default arrangement fixes µ = 0 for the last group, as well as σ = 1 for
all groups. If there are three groups,

>EQUAL MU, GROUPS=(1,2);

constrains the means of the first two groups to be equal.

Related topics

 EQUAL command: ALL/ITEMS/GROUPS keyword (Section 4.4.3)


4.4.4 ESTIMATE command

(Optional)

Purpose

To reset internal program parameters controlling the estimation.

Format

>ESTIMATE NCYCLES=n, ITERATIONS=n, ICRIT=n, CCRIT=n, ACCMAX=n, VAIM=n;

ACCMAX keyword

Purpose

Specifies the maximum value for the acceleration parameter; more negative is more accel-
eration.

Format

ACCMAX=n

Default

0.0.

CCRIT keyword

Purpose

Specifies the convergence criterion for the EM-cycles.

Format

CCRIT=n

Default

0.001.


ICRIT keyword

Purpose

Specifies the convergence criterion for the M-step. It should always be smaller than CCRIT.

Format

ICRIT=n

Default

0.0001.

Related topics

 ESTIMATE command: CCRIT keyword

ITERATIONS keyword

Purpose

A control parameter for the number of iterations in the M-step; the actual number of itera-
tions is ITERATIONS x NP, where NP is the number of parameters being jointly estimated,
which usually means the number of parameters for a particular item. For very large prob-
lems, it may be useful (faster) to set ITERATIONS to 2.

Format

ITERATIONS=n

Default

4.

NCYCLES keyword

Purpose

Specifies the number of cycles of MML estimation.

Format

NCYCLES=n


Default

25.

VAIM keyword

Purpose

Defines the value indicating “missing” for population membership.

Format

VAIM=n

Default

9.0.

Related topics

 This keyword may be set through the Response Codes (Binary Data) dialog box (Section
4.2.6)
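
As an illustration, the following (hypothetical) command doubles the default number of EM
cycles and tightens their convergence criterion:

>ESTIMATE NCYCLES=50, CCRIT=0.0001;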


4.4.5 FIX command

(Optional)

Purpose

To fix item parameters at their starting values.

Format

>FIX ALL/ITEMS=(list)/GROUPS=(list), VALUE=n,
     AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;

 The set of items is specified with one of the following: ALL, ITEMS=(list), or
GROUPS=(list).
 The set of parameters is specified by one of the following mutually exclusive keywords:
AJ, BJ, CJ, BK=(list), AK=(list), CK=(list), DK=(List), MU, or SD.

AJ/BJ/CJ/BK/AK/CK/DK/MU/SD keyword

Purpose

The set of parameters is specified by one of the following mutually exclusive keywords: AJ,
BJ, CJ, BK=(list), AK=(list), CK=(list), DK=(List), MU, or SD.

 AJ is the slope for the graded model and the 3PL model
 BJ is the threshold for binary graded items and the 3PL model
 CJ is the lower asymptote for the 3PL model
 BK=(list) specifies the listed threshold parameters for the graded model
 AK=(list) specifies the listed contrasts among the a_k’s for the nominal and multiple-
choice models
 CK=(list) specifies the listed contrasts among the c_k’s for the nominal and multiple-
choice models
 DK=(list) specifies the listed contrasts among the d_k’s for the multiple-choice model
 MU is the mean of the population distribution for a group
 SD is the standard deviation of the population distribution for a group.

Format

AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD


ALL/ITEMS/GROUPS keyword

Purpose

The set of items is specified with one of the following: ALL, ITEMS=(list), or
GROUPS=(list).

 ALL refers to all of the items in the data
 ITEMS refers to a subset of the items in the data
 GROUPS specifies groups, for reference to MU and SD

Format

ALL/ITEMS=(list)/GROUPS=(list)

Related topics

 FIX command: MU/SD keywords

VALUE keyword

Purpose

This real constant is used to specify the value at which the parameter is to be fixed.

Format

VALUE=n.

For the 3PL model, the values are specified in “traditional 3PL, normal metric” form; for the
other models, the actual values of the parameters or contrasts must be used. The parameters
of Gaussian population distributions may also be fixed if there are several groups. The de-
fault arrangement fixes µ = 0 for the last group, as well as σ = 1 for all groups. If there are
three groups,

>FIX MU, GROUPS=(1,2), VALUE=0.0;

fixes the means of the first two groups at 0.0.
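
Similarly, the following command (an illustration; the item list and value are arbitrary) fixes
the lower asymptote of items 1 and 2 at 0.2, expressed in the traditional 3PL, normal metric:

>FIX ITEMS=(1,2), CJ, VALUE=0.2;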


4.4.6 LABELS command

(Optional)

Purpose

To enter item-labels for the output.

Format

>LABELS ALL/ITEMS=(list), NAMES=(‘lab1’,’lab2’, ...);

The set of items is specified with either the keyword ALL or ITEMS=(list).

ALL/ITEMS option

Purpose

The set of items for which labels are provided is specified with either the keyword ALL or
ITEMS=(list).

 The ALL option refers to all of the items in the data.
 The ITEMS keyword specifies a subset of the items.

Format

ALL/ITEMS=(list)

Related topics

 LABELS command: NAMES keyword

NAMES keyword

Purpose

These labels are entered as a list; each label must have 4 or fewer characters.

Format

NAMES=(‘lab1’,’lab2’,…).

Related topics

 LABELS command: ALL/ITEMS keyword
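
For example, the following command (with hypothetical labels) assigns names of four or fewer
characters to the first three items:

>LABELS ITEMS=(1,2,3), NAMES=('GEO1','GEO2','GEO3');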


4.4.7 PROBLEM command

(Required)

Purpose

To set up the problem and to specify the type of data MULTILOG is to expect.

Format

>PROBLEM RANDOM/FIXED/SCORE, PATTERNS/INDIVIDUAL/TABLE, NITEMS=n,
         NGROUP=n, NPATTERNS=n/NEXAMINEES=n, NCHARS=n, CRITERION, NOPOP,
         DATA=filename;

The class of the problem is specified by selecting one of the mutually exclusive options:
RANDOM/FIXED/SCORE. The type of input data is specified by selecting one of the mutually
exclusive options: PATTERNS/INDIVIDUAL/TABLE.

Related topics

 The RANDOM, FIXED and SCORES options in the New Analysis dialog box may be used to
select the type of problem (Section 4.2.1)
 The PATTERNS, INDIVIDUAL and TABLE options may be set using the Input Data dialog
box (Section 4.2.3)
 The NITEMS, NGROUP, NPATTERNS, NEXAMINEES, NCHARS and DATA keywords may be ac-
cessed via the Input Parameters dialog box (Section 4.2.4)
 The CRITERION option is added to the PROBLEM command by clicking “Yes” in the Fixed
Theta dialog box (Section 4.2.2)
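
As an illustration, the PROBLEM command for the LSAT6 pattern-count analysis of Section 4.3.2
would look something like this (a sketch; see that section for the details of the run):

>PROBLEM RANDOM, PATTERNS, NITEMS=5, NGROUP=1, NPATTERNS=32, DATA='LSAT6.DAT';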

CRITERION option

Purpose

If FIXED or SCORE is entered, you may specify, by including this option, that a fixed value
of θ is to be read with the data as the criterion for fixed-θ item parameter calibration or as a
starting value for computation of MLE[θ].

Format

CRITERION


Related topics

 PROBLEM command: FIXED/SCORES option (Section 4.4.7)
 The CRITERION option is added to the PROBLEM command by clicking “Yes” in the Fixed
Theta dialog box (Section 4.2.2)
 Input Parameters dialog box (Section 4.2.4)

DATA keyword

Purpose

This keyword is used to enter the name and location of the raw data file. The name may be up to
128 characters in length and must be enclosed in single quotes. Note that each line of the com-
mand file has a maximum length of 80 characters. If the filename does not fit on one line of 80
characters, the remaining characters should be placed on the next line, starting at column 1.

Format

DATA=<‘filename’>

Related topics

 The DATA keyword may be accessed via the Input Parameters dialog box (Section 4.2.4)

RANDOM/SCORE/FIXED option

Purpose

The class of problem is specified by selecting one of three mutually exclusive options.

 The FIXED option is used for fixed-θ item parameter estimation.
 The RANDOM option is used for MML item parameter estimation.
 The SCORES option is used for computation of MLE[θ] or MAP[θ].

Format

FIXED/RANDOM/SCORE

Related topics

 The RANDOM, FIXED and SCORES options in the New Analysis dialog box may be used to
select the type of problem (Section 4.2.1)


PATTERNS/INDIVIDUAL/TABLE option

Purpose

The type of input data is specified by selecting one of three mutually exclusive options.

 The PATTERNS option is used for pattern data (also see the NPATTERNS keyword on the
PROBLEM command). The NPATTERNS and NEXAMINEES keywords are mutually exclusive.
 INDIVIDUAL is used for individual item response vectors (also see the NEXAMINEES keyword
on the PROBLEM command).
 TABLE is used for a fixed-effects table of counts.

Format

PATTERNS/INDIVIDUAL/TABLE

Related topics

 The RANDOM, FIXED and SCORES options in the New Analysis dialog box may be used to
select the type of problem (Section 4.2.1)
 The type of data—pattern, individual or table—is specified on the Input Data dialog box
(Section 4.2.3)
 PROBLEM command: NPATTERNS/NEXAMINEES keywords

NCHARS keyword

Purpose

To specify the number of characters in the ID field for individual response or pattern count
data (see INDIVIDUAL/PATTERN/TABLE option on PROBLEM command).

Format

NCHARS=n

Related topics

 The NCHARS keyword may be accessed via the Input Parameters dialog box (Section
4.2.4).
 PROBLEM command: INDIVIDUAL/PATTERN/TABLE option


NEXAMINEES/NPATTERNS keyword

Purpose

Used to indicate the number of patterns or examinees for which responses are present in the
data.

 NPATTERNS specifies the number of response patterns tabulated for pattern data (also see
the PATTERNS option on the PROBLEM command).
 NEXAMINEES specifies the number of examinees for individual data; it is used with the
INDIVIDUAL option on the PROBLEM command.
 NPATTERNS and NEXAMINEES are mutually exclusive keywords.

Format

NPATTERNS=n/NEXAMINEES=n

Related topics

 The NITEMS, NGROUP, NPATTERNS, NEXAMINEES, NCHARS and DATA keywords may be ac-
cessed via the Input Parameters dialog box (Section 4.2.4)
 PROBLEM command: INDIVIDUAL/PATTERN/TABLE options
 PROBLEM command: NPATTERNS/NEXAMINEES keywords

NGROUP keyword

Purpose

Specifies the number of groups to be used in the analysis.

Format

NGROUP=n

Related topics

 The NGROUP keyword may be accessed via the Input Parameters dialog box (Section
4.2.4)

NITEMS keyword

Purpose

Specifies the number of items to be used in the analysis.


Format

NITEMS=n

Related topics

 The NITEMS keyword may be accessed via the Input Parameters dialog box (Section
4.2.4)

NOPOP option

Purpose

If SCORE is specified, the default is MAP estimation including the population distribution. If
no population distribution is desired, enter NOPOP. If NOPOP is entered, some MLE[ θ ]s may
not be finite and the program may stop.

Format

NOPOP

Related topics

 PROBLEM command: SCORE/FIXED/RANDOM option


4.4.8 PRIORS command

(Optional)

Purpose

To impose Gaussian prior distributions for the item parameters.

Format

>PRIORS ALL/ITEMS=(list)/GROUPS=(list), PARAMS=(n,n),
        AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD;

The set of items is specified with one of the following: ALL, ITEMS=(list), or
GROUPS=(list). The set of parameters is specified by one of the following mutually exclu-
sive keywords: AJ, BJ, CJ, BK=(list), AK=(list), CK=(list), DK=(List), MU, or SD.

The parameters of the Gaussian prior distribution are entered using the PARAMS keyword. In
the special case of CJ or DK=1, indicating the asymptote for the 3PL model, the prior must be
specified for the logit of the asymptote, which is the parameter in MULTILOG. A standard
deviation of 0.5 works well for the asymptote.

AJ/BJ/CJ/BK/AK/CK/DK/MU/SD keyword

Purpose

The set of parameters is specified by one of the following mutually exclusive keywords: AJ,
BJ, CJ, BK=(list), AK=(list), CK=(list), DK=(List), MU, or SD.

 AJ is the slope for the graded model and the 3PL model
 BJ is the threshold for binary graded items and the 3PL model
 CJ is the lower asymptote for the 3PL model
 BK=(list) specifies the listed threshold parameters for the graded model
 AK=(list) specifies the listed contrasts among the a_k’s for the nominal and multiple-
choice models
 CK=(list) specifies the listed contrasts among the c_k’s for the nominal and multiple-
choice models
 DK=(list) specifies the listed contrasts among the d_k’s for the multiple-choice model
 MU is the mean of the population distribution for a group
 SD is the standard deviation of the population distribution for a group.

Format

AJ/BJ/CJ/BK=(list)/AK=(list)/CK=(list)/DK=(list)/MU/SD


ALL/ITEMS/GROUPS option

Purpose

The set of items is specified with one of the mutually exclusive options ALL,
ITEMS=(list) or GROUP=(list).

 The ALL option refers to all of the items in the data.
 The ITEMS keyword specifies a subset of the items.
 The GROUPS keyword specifies groups, with reference to MU and SD.

Format

ALL/ITEMS=(list)/GROUPS=(list)

PARAMS keyword

Purpose

Specify the mean and standard deviation of the normal prior to be imposed on the item pa-
rameter(s) as (mean, standard deviation).

Format

PARAMS=(n,n)
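
For example, the following command (an illustration) places a prior with mean -1.1 and stan-
dard deviation 0.5 on the logit of the lower asymptote for all items; a mean of -1.1 corresponds
roughly to an asymptote of 0.25, as would be expected for four-alternative items:

>PRIORS ALL, CJ, PARAMS=(-1.1,0.5);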


4.4.9 SAVE command

Purpose

To instruct MULTILOG to write the parameter estimates in item calibration problems, or the
MLE[θ] or MAP[θ] in scoring problems, to external files.

Format

>SAVE FORMAT;

The saved parameters may be used to restart the program or to score examinees, with the pa-
rameters read after a START command. When item calibration is performed, the parameters
are saved to <jobname>.par. Scores obtained from scoring problems are saved to
<jobname>.sco.

FORMAT option

Purpose

The present form of the parameter file differs from that of previous versions of the program.
Users who wish to have MULTILOG save the estimated parameters in the previous style
may insert the FORMAT option on the SAVE command. The program will then write a parame-
ter file in the previous style, but the format of the parameter values will be 5F12.5 rather than
8F10.3; the new format must be used in formatted reading of the saved file.

If this option is not present, the file will be saved in free format.

Related topics

 START command (see below)


4.4.10 START command

Purpose

To override the default starting values for the item parameters and enter others.

Format

>START ALL/ITEMS=(list), PARAM=‘filename’, FORMAT;

The set of items is specified with either the ALL option or the ITEMS=(list) keyword.

ALL/ITEMS option

Purpose

The set of items is specified with one of the mutually exclusive options ALL or
ITEMS=(list).

 The ALL option refers to all of the items in the data.
 The ITEMS keyword specifies a subset of the items.

Format

ALL/ITEMS=(list)

FORMAT option

Purpose

The present form of the parameter files differs from that of previous versions of the pro-
gram. The FORMAT option is used in the processing of parameter files created with previous
versions. When this option is present on the START command, the next line of the command
file must contain the format statement for reading the parameter file. The statement
(8F10.3) is the required format for previous style parameter files. The filename is specified
using the PARAM keyword.

Format

FORMAT

Related topics

 START command: PARAM keyword


PARAM keyword

Purpose

This keyword is used to give the name and location of an external file containing parameter
values that should be used as starting values in the current analysis or for computing exami-
nee or pattern scores. The filename can be up to 128 characters in length and should be en-
closed in single quotes. Note that each line of the command file has a maximum length of 80
characters. If the filename does not fit on one line of 80 characters, the remaining characters
should be placed on the next line, starting at column 1. This keyword is used in combination
with the FORMAT option. If it does not appear, the parameters are assumed to be in
the command file immediately following the START command.

Format

PARAM=<‘filename’>

Related topics

 START command: FORMAT option
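
For example, the following commands (illustrative; the file name is a placeholder) read previ-
ous-style starting values for all items from an external parameter file:

>START ALL, FORMAT, PARAM='MYTEST.PAR';
(8F10.3)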


4.4.11 TEST command

(Required)

Purpose

To define the IRT model for a set of items. Note that ALL and ITEMS=(list) and L1, L2,
L3, GRADED, NOMINAL and BS are mutually exclusive options.

Format

>TEST ALL/ITEMS=(list), L1/L2/L3/GRADED/NOMINAL/BS, NC=(list),
      HIGH=(list);

Related topics

 The L1, L2, L3, GRADED, NOMINAL and BS options in the Test Model dialog box may
be used to select the type of model to be fitted to the data (Section 4.2.5).
 The ALL option and ITEMS, NC and HIGH keywords may be set by using the Test Model
dialog box (Section 4.2.5).

ALL option/ITEMS keyword

Purpose

One of the following mutually exclusive options may be selected:

 The ALL option will select all of the items in the data as indicated in the variable format
statement.
 The ITEMS keyword is used to select a subset of the items for inclusion in the subtest(s).

Format

ALL/ITEMS=(list)

Related topics

 The ALL option or ITEMS keyword may be set using the Test Model dialog box.

L1/L2/L3/GRADED/NOMINAL/BS option

Purpose

Used to select the type of model to be fitted to the data. Note that L1, L2, L3, GRADED,
NOMINAL and BS are mutually exclusive options.


•  L1 represents the one-parameter logistic (1PL).
•  L2 represents the two-parameter logistic (2PL).
•  L3 represents the three-parameter logistic (3PL).
•  GRADED represents Samejima’s (1969) graded model.
•  NOMINAL represents Bock’s (1972) nominal model.
•  BS represents the multiple-choice model of Thissen and Steinberg (1984), a version of
   proposals by Bock (1972) and Samejima (1979).

For the multiple-category models (GRADED, NOMINAL, and BS), it is also necessary to indicate
the number of categories of response for each item. This is done with the NC keyword. For
the nominal model, it is also necessary to specify that one of the categories is “HIGH”; this is
usually the correct response on an ability test.
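
For example, the following line (an illustrative setup, not one of the distributed example
files) fits Samejima’s graded model to a ten-item test with five response categories per item:

>TEST ALL, GRADED, NC=(5(0)10);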

Format

L1/L2/L3/GRADED/NOMINAL/BS

Related topics

 The L1, L2, L3, GRADED, NOMINAL and BS options in the Input Parameters dialog box may
be used to select the type of model to be fitted to the data (Section 4.2.4).
 TEST command: NC and HIGH keywords

HIGH keyword

Purpose

This keyword is used to enter the number of the highest category for each item for the nomi-
nal models; this is usually the correct response on an ability test.

Format

HIGH=(list)

Related topics

 The NC and HIGH keywords may be set by using the Test Model dialog box (Section 4.2.5)
 TEST command: L1/L2/L3/GRADED/NOMINAL/BS options
 TEST command: NC keyword

NC keyword

Purpose

This keyword is used to enter the number of response categories for each item. Note that the
nominal model cannot be used for binary (NC=2) data; use L2.


Format

NC=(list)
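
For instance (a hypothetical mixed-format test with two 5-category items, one 4-category
item, and seven binary items), one might enter:

NC=(5,5,4,2(0)7)

where the form n(0)m repeats the value n a total of m times.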

Default

(2(0)NITEMS).

Maximum

10 for any item.

Related topics

 The NC and HIGH keywords may be set using the Test Model dialog box (Section 4.2.5)
 TEST command: L1/L2/L3/GRADED/ NOMINAL/BS options
 PROBLEM command: NITEMS keyword (Section 4.4.7)


4.4.12 TGROUPS command

(Optional)

Purpose

To specify grouping on the θ-dimension: quadrature points for MML estimation, or fixed
groups for the fixed-effects model.

Format

>TGROUPS NUMBER=n, QP=(list), MIDDLE=(list);
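
For example (an illustrative specification; the values are hypothetical), five quadrature
points placed at equally spaced values of θ could be requested with:

>TGROUPS NUMBER=5, QP=(-2.0,-1.0,0.0,1.0,2.0);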

MIDDLE keyword

Purpose

Specifies the NUMBER of fixed groups at the values of θ as given in the list. It is used in the
context of fixed-effects estimation.

Format

MIDDLE=(list)

Related topics

 TGROUPS command: NUMBER keyword

NUMBER keyword

Purpose

Specifies the number of quadrature points for MML estimation, or the number of θ-groups
for fixed-effects estimation.

Format

NUMBER=n

Maximum

150.


QP keyword

Purpose

Specifies the NUMBER of quadrature points, placed at values of θ as given in the list. It is
used in the context of MML random-effects estimation.

Format

QP=(list)

Related topics

 TGROUPS command: NUMBER keyword


4.4.13 TMATRIX command

(Optional)

Purpose

Specifies the form of the T-matrices for the NOMINAL/BS model.

Format

>TMATRIX ALL/ITEMS=(list), AK/CK/DK DEVIATION/POLYNOMIAL/TRIANGLE;

The set of item-related options is specified with either the ALL or ITEMS=(list) keyword.
One of the three vectors of parameters of the nominal or multiple-choice model is specified
with one of the following options: AK, CK, or DK. One of the following three T-matrix op-
tions specifies the matrix: DEVIATION, POLYNOMIAL, or TRIANGLE (abbreviated DE, PO, or TR).

The vector of parameters estimated by MULTILOG is multiplied by the T-matrices listed
to give the ak’s, ck’s, and dk’s of the model.

For example, the version of Masters’ (1982) model in which the slopes are not constrained
to be equal for all of the items is given by the sequence

>TEST ALL, NOMINAL, NC=(m,m,m,...), HIGH=(m,m,m,...);
>TMATRIX ALL, CK, TRIANGLE;
>TMATRIX ALL, AK, POLYNOMIAL;
>FIX ALL, AK=(2,3,...,m-1), VALUE=0.0;

which identifies the c-contrasts as the crossover points, the slope contrasts as polynomial
and fixes the quadratic and higher terms to zero. If in addition you enter

>EQUAL ALL AK=1;

the constraint is added that the slopes are equal across items and it becomes the MML ver-
sion of Masters’ (1982) model.

Related topics

 TEST command: NOMINAL/BS options (Section 4.4.11)

ALL/ITEMS option

Purpose

The set of items is specified with one of the mutually exclusive options ALL or
ITEMS=(list).


•  The ALL option refers to all of the items in the data.
•  The ITEMS keyword specifies a subset of the items.

Format

ALL/ITEMS=(list)

AK/CK/DK option

Purpose

One of the three vectors of parameters of the nominal or multiple-choice model is specified
with one of the following mutually exclusive options: AK, CK, or DK.

•  AK refers to ak
•  CK refers to ck
•  DK refers to dk.

Format

AK/CK/DK

DEVIATION/POLYNOMIAL/TRIANGLE option

Purpose

One of the following three T-matrix options specifies the matrix: DEVIATION, POLYNOMIAL
or TRIANGLE.

•  DEVIATION specifies deviation contrasts, those used by Bock (1972)
•  POLYNOMIAL specifies polynomial contrasts
•  TRIANGLE gives Masters’ (1982) δ’s as parameters

Format

DEVIATION/POLYNOMIAL/TRIANGLE


4.4.14 Variable format statement

There are two formats for the key information: one is used if the items are all binary, and the
other is used if any items on the test have more than two response categories. The two types of
key entry will now be discussed in turn.

Key entry: binary items

1. The first line after the END command must contain a single integer that is the number of
   response codes in the data file. In this context “code” means a single alphanumeric charac-
   ter that appears in the data file to indicate a response; common codes are 0 and 1, or T and
   F, or Y and N.
2. The next line contains, beginning in column 1, in one-column fields, the codes them-
   selves, for instance 01, or TF, or YN.
3. The next line (or lines) contains the “key”: a list of the correct- or positive-response
   codes, beginning in column 1, in one-column fields. The key codes for up to 79 items go
   on a single line. If there are more than 79 items, the key codes for items 80 and higher go
   on the next line, up to the code for item 158. If there are more than 158 items, the key
   codes for items 159 and higher go on the following line, and so on (79 codes per line).
4. The next line of the binary-item key block contains N in column 1 if there are no explicitly
   specified codes that indicate missing-at-random (or “not-reached”) or Y if such a code ex-
   ists. Usually this line contains N, because any code in the data file that is not listed among
   the codes on the line described in (2), above, is also treated as missing-at-random, so it is
   generally easier to simply omit the missing data code(s) from the list of codes. However,
   if it is desirable to specify the missing data code among the codes, you can put Y on this
   line.
5. If the entry on the missing-data code line (4, above) is Y, then the final line of the key se-
   quence has the missing data code in column 1.
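
As a minimal sketch (a hypothetical three-item true/false test keyed T, T, F, with no explicit
missing-data code), the key block following the END command would read:

2
TF
TTF
N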

Related topics

 END command
 Response Codes (Binary Data) dialog box

Key entry: multiple response items

For tests that include one or more items with more than two response categories:

1. The first line after the END command must contain a single integer that is the number of
   response codes in the data file. In this context “code” means a single alphanumeric char-
   acter that appears in the data file to indicate a response; common codes are 0, 1, 2, 3, … or
   A, B, C, D, ….
2. The next line contains, beginning in column 1, in one-column fields, the codes them-
   selves, for instance 01234, or ABCDE. Note that the codes must be single characters, re-
   gardless of the number of item response categories. For example, after using the digits
   from 0-9 for ten response categories, the letters A, B, C, … are often used for the eleventh,
   twelfth, thirteenth, and so on, categories.

For multiple-category data, the code-line in (2) is followed by one line (or set of lines) for each
code, in the order the codes are typed in (2), above. The category numbers are in 1-column fields
if all of the items on the test have fewer than ten (10) response categories. Each line indicates,
beginning in column 1 for item 1, the number of the response category into which that data code
is to be placed. The lowest response category for MULTILOG models is numbered 1; the next
lowest is 2, and so on. Response category number 0 is reserved for missing data in MULTILOG.
For tests with no items with 10 or more categories, each item’s category-number for a given data
code occupies a single column. 79 items' categories fit on a single line; the next 79 go on the
next line, and so on.

For example: In the mixture.mlg file (derived from example 15), there are five (5) response
codes in the data: 0, 1, 2, 3, and 9. The first 26 items are binary; for those items, 0 is incorrect
and 1 is correct. The 27th item has three categories, coded in the data file 1, 2, and 3. 9 indicates
missing data.

For this example, the key sequence is:

5
01239
111111111111111111111111110
222222222222222222222222221
000000000000000000000000002
000000000000000000000000003
000000000000000000000000000

For items 1-26, this key sequence maps the response code 0 into category 1, and the response
code 1 into category 2. The response codes 1, 2, and 3 are placed in model categories 1, 2, and 3
for item 27. Unacceptable values, and 9s, are made missing by placement in category 0 (zero).

If any item on the test has ten or more response categories, the category number lines in the key
are all entered in two-column fields, right-justified. In this case, 40 category numbers are entered
on each line; for more than 40 items, additional lines are used for each data code.

Related topics

•  END command (Section 4.4.2)
•  Response Codes (binary data) dialog box (Section 4.2.6)

Following the key sequence, a format command is entered describing the layout of the data file.
This is described below:

The format begins (in column 1) with “(” and ends with “)”.


For PATTERN data

 NCHAR A1 for the ID field (optional; include only if NCHAR>0; NCHAR is the number of ID
characters entered on the PROBLEM command).
•  I1 (or I2, if there are 10 or more groups) to read the group number, from 1 to NGROUP
   (optional; include only if NGROUP>1 on the PROBLEM command).
 NITEMS A1 for the item responses.
 Fn.0 for the frequency corresponding to that response pattern, where n is the number of
columns devoted to the frequency in the data file.

For INDIVIDUAL data

 NCHAR A1 for the ID field (optional; include only if NCHAR>0; NCHAR is the number of ID
characters entered on the PROBLEM command).
•  I1 (or I2, if there are 10 or more groups) to read the group number, from 1 to NGROUP
   (optional; include only if NGROUP>1 on the PROBLEM command).
 NITEMS A1 for the item responses.
 Fn.d for a criterion if there is one (optional), where n is the number of columns in the data
file devoted to the criterion, and d is the number of places after the decimal point.
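
Putting these pieces together (a purely hypothetical layout, for illustration only): with
NCHAR=10, NGROUP=2, NITEMS=40, and a criterion occupying five columns with two decimal
places, the format statement might read:

(10A1,I1,40A1,F5.2)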

For TABLE data

The format is for one row of the θ-group × item-response table, giving the frequencies in each
θ-group responding in each category, in F-format.

FORTRAN “X” and “T” formats may be used as required.

Examples

An example for binary data is example 1:

MML PARAMETER ESTIMATION FOR THE 1PL MODEL, LSAT6 DATA
BINARY DATA
>PROBLEM RANDOM, PATTERN, NITEMS=5, NGROUP=1, NPATTERNS=32,
DATA=‘LSAT6.DAT’;
>TEST ALL, L1;
>END;
2
01
11111
N
(4X,5A1,F4.0)

The *.mlg file for the COURTAB2 example is shown below, illustrating keying for multiple
categories.

If the problem requires an item parameter file as starting values or for computing examinee or
pattern scores, the file is named in a similar way by the keyword PARAM=‘filename’ of the START
command. If this keyword does not appear, the parameters are assumed to be in the command
file immediately following the START command. The command file courtab2.mlg for version
7.0 provides an example of this type of problem setup:

SCORE LIBERAL CONSERVATIVE/COURTS HARSH
PARAMETER VALUES IN COMMAND FILE
>PROBLEM SCORE, INDIVIDUAL, NITEMS=2, NGROUP=2, NEXAMINEES=18,
NCHAR=2, DATA=‘COURTAB.DAT’;
>TEST ALL, GRADED, NCATS=(3,3);
>START ALL;
1.08 -2.86 -1.78
1.12 -0.93 0.97
-1.00 0.16 1.00
-1.00 0.00 1.00
>END;
3
123
11
22
33
(2X,2A1,T1,I1,1X,2A1)

Note the addition of DATA=‘COURTAB.DAT’ on the PROBLEM command, and the absence of a
format statement after >START ALL;. To simplify parameter input, the program default has been
changed to format-free reading. The values of the parameters must be space- or comma-
delimited; no format statement is required (or permitted). Free-format parameter values may
also be read using the keyword PARAM=‘filename’ to read the values from a file.

By default, when the estimated parameters are saved in the jobname.par file, the parameter
values in this file are suitable for format-free reading. (See the comments about the SAVE
command below.)

To use parameter files in the format used by previous versions of MULTILOG, the keyword
FORMAT has been added to the START command. This keyword is used only in conjunction with
the keyword PARAM=‘filename’, to read output files produced by MULTILOG with the keyword
FORMAT on the SAVE command line. Partial syntax for the Knee problem using this new setup is
shown below:

ESTIMATION OF SKELETAL MATURITY BY THE RWT (1975) METHOD
PARTIAL SYNTAX
>PRO SCORE, INDIVIDUAL, NOPOP, CRITERION, NEXAMINEES=13, NITEMS=34,
NCHARS=10, DATA=‘KNEE.DAT’;
>TEST ALL, GRADED,
NC=(5,5,5,3,2,2,2,3,2,3,3,3,5,5,2(0)12,3,3,5,2,3,2,2,4);
>START ALL, FORMAT, PARAM=‘KNEE.PAR’;
>END;

 If the SAVE command appears in an item calibration problem, the parameters are saved in
a file named jobname.par.


 If you wish to have the program save the estimated parameters in the previous style, you
may insert the keyword FORMAT in the SAVE command. This keyword causes an item
header line to be written ahead of the parameter values as previously. This item header
line contains essential information if the nominal or multiple-choice models are used;
free-format output is not useful for most purposes with those models.
 If FORMAT does not appear in the SAVE command, the item header line is omitted and the
parameters are written in format-free, space delimited form.
 If the SAVE command appears in an examinee or pattern scoring, the scores are saved in a
file named jobname.sco. If the SAVE command does not appear, the scores are listed in the
jobname.out file.

Related topics

•  END command (Section 4.4.2)
•  PROBLEM command: NCHARS/NGROUP/NITEMS keywords (Section 4.4.7)
•  START command: PARAM keyword (Section 4.4.10)
•  SAVE command (Section 4.4.9)
•  Input Data dialog box (Section 4.2.3)


5 TESTFACT

5.1 Introduction

The TESTFACT program implements all the main procedures of classical item analysis, test
scoring, and factor analysis of inter-item tetrachoric correlations. In addition, it performs modern
methods of factor analysis based on item response theory (IRT). The program also includes a fa-
cility for simulating responses to test items having difficulties and factor loadings specified by
the user.

New features in TESTFACT are all part of full information item factor analysis (FIFA). The
commands and procedures of classical item statistics and classical factor analysis of tetrachoric
correlation coefficients remain unchanged.

The changes to full information item factor analysis consist of a new and improved algorithm for
estimating the factor loadings and scores – specifically, new methods of numerical integration
used in the EM solution of the marginal maximum likelihood equations. Three different methods
of multidimensional numerical integration for the E-step of the EM algorithm are provided:
adaptive quadrature, non-adaptive quadrature, and Monte Carlo integration.

In exploratory item factor analysis, these methods make possible the analysis of up to fifteen fac-
tors and improve the accuracy of estimation when the number of items is large. The previous
non-adaptive method has been retained in the program as a user-selected option (NOADAPT), but
the adaptive method is the default. The maximum number of factors with adaptive quadrature is
10; with non-adaptive quadrature, 5; with Monte Carlo integration, 15. Bayes estimates of scores
for all factors can be estimated either by the adaptive or non-adaptive method. Estimation of the
classical reliability of the factor scores is also included.

TESTFACT includes yet another full information method that provides an important form of
confirmatory item factor analysis, namely “bifactor” analysis. The factor pattern in bifactor
analysis consists of a general factor on which all items have some loading, plus any number of
so-called “group factors” to which non-overlapping subsets of items, assigned by the user, are as-
sumed to belong. The subsets typically represent small numbers of items that pertain to a com-
mon stem such as a reading passage or problem-solving exercise. The bifactor solution provides
Bayes estimation of scores for the general factor, accompanied by standard errors that properly
account for association among responses attributable to the group factors.

Three new commands have been added:

 BIFACTOR invokes and controls the bifactor solution. The FACTOR and FULL commands
may not be used with BIFACTOR.
 TECHNICAL combines keywords and options of item factor analysis that would otherwise
have to be duplicated in the BIFACTOR, FACTOR, FULL, and SCORE commands.
 SIMULATE is now a separate command instead of a keyword of the SCORE command. It has
additional options for input of item parameters to specify the simulation. The parameters
may be entered either as item intercepts and factor slopes, or standard difficulties (i.e.,
normal deviates corresponding to the item percents correct) and factor loadings. The com-
mand now also allows the user to specify mean values for the factor scores. The default
values of the means are zero, as in the previous version of the program.

5.2 The TESTFACT interface

This document describes those elements in the user’s interface that may not be immediately clear
to the user or that behave in a somewhat nonstandard way.

•  Main menu
•  Run menu
•  Output menu
•  Window menu
•  Font option

5.2.1 Main menu

At the center of the interface is the main menu bar, which adapts to the currently active function.
For example, when you start the program, the menu bar shows only the menu choices File, View,
and Help.

However, as soon as you open a TESTFACT output file (through the File menu), the Window
and Edit menu choices show up on the menu bar. At the same time, the File menu choices are
expanded with selections like Save and Save As. And the View menu now has a Font op-
tion next to the Status bar and Toolbar choices.

Opening an existing TESTFACT command (*.tsf) file, or starting a new one, adds further
choices to the main bar: the Output and Run menus.

Note that you can open only one command file at a time. If you want to paste part of an
existing command file into your current one, opening the old file will automatically close the
current one. After you copy the part you want to the clipboard, you have to reopen the current
*.tsf file for pasting.


5.2.2 Run menu

The Run menu gives you the option to run the command file displayed in the main window.

If you made any changes, the current command file will first be saved when you run an analysis
by clicking Run. You can easily tell if a command file has been changed by looking at the file-
name above the menu bar. An asterisk after the filename shows that the current file has changed
but has not been saved yet.

5.2.3 Output menu

Through the Output menu, you can open the list output, named with the file extension *.out.
Always check the end of each output file to see if it reports: NORMAL END. If it does not, some-
thing went wrong and the output file should have some information on that.

5.2.4 Window menu

The Window menu is only available when you have at least one file open. You can use the Ctrl-
Tab key combination to switch between open files, or use the Window menu to arrange the open
files (cascade, tile).

5.2.5 Font option

Clicking on the Font option on the View menu displays a dialog box with the fonts that are
available on your system.

You may use different fonts for command and output files. At installation, they are both set to a
special Arial Monospace font that ships with the program. To keep the tables in the output
aligned, you should always select a monospace or fixed pitch font where all the characters in the
font have the same width. Once you select a new font, that font becomes the default font. This
gives you the option to select a font (as well as font size and font style) for your command (*.tsf)
files that is different from the one for your list output (*.out) file as a quick visual reminder of
the type of file.


5.3 Command syntax

TESTFACT uses the command conventions of other IRT programs published by SSI. Command
lines employ the general syntax:

>NAME KEYWORD1=n, KEYWORD2=(list), …, OPTIONn;

The following rules apply:

 Command lines may not exceed 80 columns. Continuation on one or more lines is permit-
ted.
 Each command must be terminated by a semicolon (;). The semicolon functions as the
command delimiter: it signals the end of the command and the beginning of a new com-
mand.
 A greater-than sign (>) must be entered in column 1 of the first line of a command and
followed without a space by the command name.
 Command names, keywords, and options may be entered in full or abbreviated to the first
three characters. Exceptions are the following keyword values and options, which must be
entered in full:
 VARIMAX in the FACTOR command (Section 5.3.9)
 PROMAX in the FACTOR command
 PATTERN in the INPUT command (Section 5.3.12)
 CASE in the INPUT command
 RECODE in the BIFACTOR and FULL commands (Sections 5.3.3 and 5.3.11)
 MISS in the BIFACTOR and FULL commands
 LORD in BIFACTOR and FULL commands


 LOADINGS and SLOPES in the SIMULATE command (Section 5.3.23)


 At least one space must separate the command name from any keywords or options.
 Commas must separate all keywords and options.
 The equal sign is used to set a keyword equal to a value, which may be integer, real, or
character. A real value must contain a decimal point. A character value must be enclosed
in single quotes if:
o it contains more than eight characters
o it begins with a numeral
o it contains embedded blanks, commas, semicolons, or slashes
 A keyword may be vector valued, i.e., set equal to a list of integer, real, or character con-
stants, separated by commas or spaces, and enclosed in left and right parentheses.
•  If the list contains an arithmetic progression, the abbreviated form first(increment)last
   may be used. Thus, a selection of items 1,3,7,8,9,10,15 may be entered as 1,3,7(1)10,15
   (see the example after this list). Real values may be used in a similar way. If the values
   in the list are equal, the form value(0)number-of-values may be used. Thus,
   1.0,1.0,1.0,1.0,1.0 may be entered as 1.0(0)5.
 Any number of problems may be stacked. Output from the problems will be concatenated
in files <problem>.* (the <problem> part of the filenames is equal to the name of the
command file being used).
 The STOP command indicates the end of the stack of problems.
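
For illustration, the progression shorthand allows items 1, 3, 7 through 10, and 15 to be
selected with a line such as the following (the item numbers are hypothetical):

>SELECT 1,3,7(1)10,15;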

Related topics

•  FACTOR command: ROTATE keyword (Section 5.3.9)
•  INPUT command: CASE and PATTERN keywords (Section 5.3.12)
•  LOADINGS and SLOPES in the SIMULATE command (Section 5.3.23)
•  LORD in the BIFACTOR and FULL commands (Sections 5.3.3 and 5.3.11)
•  MISS in the BIFACTOR and FULL commands
•  RECODE in the BIFACTOR and FULL commands

5.3.1 Order of commands

All available TESTFACT commands are given in their necessary order below. Required com-
mands (indicated with a “*”) must appear in the command file for each problem setup. All other
commands are optional. In the sections that follow, commands are arranged in alphabetical order.

TITLE (*)
PROBLEM (*)
COMMENT
NAMES
RESPONSE (*)
KEY (*)
SELECT
SUBTEST
CLASS
FRACTILE
EXTERNAL
CRITERION
RELIABILITY
PLOT
TETRACHORIC
BIFACTOR
FACTOR
FULL
PRIOR
SCORE
TECHNICAL
SAVE
SIMULATE
INPUT (*)
(Variable format statement) (*)
CONTINUE (*)
STOP (*)

Note:

INPUT and CONTINUE (or INPUT and STOP) must be the last two commands. The variable format
is required in the command file when raw data are read in from an external file.

5.3.2 Overview of syntax

Command Keywords Options

TITLE
PROBLEM NITEM=n, SELECT=n, NOTPRES
RESPONSE=n, SUBTEST=n,
CLASS=n, FRACTILES=n,
EXTERNAL=n, SKIP=n
COMMENT
NAMES
RESPONSE
KEY
SELECT
SUBTEST BOUNDARY=(list),
NAMES=(list)
CLASS IDENTITY=(list),
NAMES=(list)
FRACTILES BOUNDARY=(list) SCORE/PERCENTIL
EXTERNAL
CRITERION NAME=n, WEIGHTS=(list) EXTERNAL/SUBTESTS/CRITMARK
RELIABILITY KR20/ALPHA


PLOT BISERIAL/PBISERIAL,
NOCRITERION/CRITERION,
FACILITY/DELTA
TETRACHORIC NDEC=n RECODE/PAIRWISE/COMPLETE,
TIME, LIST, CROSS
BIFACTOR NIGROUPS=n, TIME, SMOOTH, RESIDUAL,
IGROUPS=(list), LIST=n, NOLIST
CPARMS=(list), NDEC=n,
OMIT=n, CYCLES=n, QUAD=n
FACTOR NFAC=n, NROOT=n, NIT=n, RESIDUAL, SMOOTH
ROTATE=(list), NDEC=n
FULL OMIT=n, FREQ=n, CYCLES=n, TIME
CPARMS=(list), QUAD=n
PRIOR SLOPE=n, INTER=(list)
SCORE NFAC=n, FILE=<name>, MISSING, TIME, CHANCE,
LIST=n, METHOD=n, PARAM=n, LOADINGS
SPRECISION=n
TECHNICAL ITER=(list), QUAD=n, NOADAPT, FRACTION, NOSORT
SQUAD=n, PRV=n, FREQ=n,
NITER=n, QSCALE=n,
QWEIGHT=n, IQUAD=n,
ITLIMIT=n, PRECISION=n,
NSAMPLE=n, ACCEL=n,
MCEMSEED=n
SAVE SCORE, MAIN, SUBTESTS,
CRITERION, CMAIN, CSUB,
CCRIT, CORRELAT, SMOOTH,
ROTATE, UNROTATE, FSCORES,
TRIAL, SORTED, EXPECTED, PARM
SIMULATE NFAC=n, NCASES=n, LOADINGS/SLOPES, CHANCE
SCORESEED=n, ERRORSEED=n,
GUESSSEED=n, FILE=<name>,
MEAN=(list), FORM=n,
GROUP=n, PARM=n
INPUT NIDCHAR=n, NFMT=n, SCORES/CORRELAT/FACTOR,
TRIAL=<name>, FORMAT/UNFORMAT, LIST, REWIND
WEIGHT=(list), FILE=<name>
CONTINUE
STOP


Note:

•  Keywords require a value after the equal sign.
•  Options operate if present; a forward slash (/) indicates a choice between two (or three)
   options, the first being the default.


5.3.3 BIFACTOR command

(Optional)

Purpose

To request full information estimation of loadings on a general factor in the presence of
item-group factors, and Bayes EAP estimation of scores on the general factor, including
standard errors allowing for conditional dependence introduced by the item-group factors.

Note

FACTOR and BIFACTOR are mutually exclusive commands. If RESIDUAL is not selected, it is
not necessary to compute the tetrachoric matrix.

Format

>BIFACTOR NIGROUPS=n, IGROUPS=(list), LIST=n, CPARMS=(list), NDEC=n,
          OMIT=RECODE/MISS/LORD, CYCLES=n, QUAD=n, TIME, SMOOTH,
          RESIDUAL, NOLIST;

Example

>BIFACTOR LIST=1, NDEC=2, OMIT=LORD;

Default

No bifactor analysis.

Related topics

 FACTOR command (Section 5.3.9)

CPARMS keyword

Purpose

To specify the probability of chance success on each item. If items have been specified in
the SELECT command, the corresponding probabilities will be selected from this list.

Format

CPARMS=(n1,n2,...,nn)
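
For example (an illustrative value, assuming a hypothetical 20-item multiple-choice test
with four alternatives per item):

CPARMS=(0.25(0)20)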


Default

All probabilities are set to zero.

Related topics

 SELECT command (Section 5.3.22)

CYCLES keyword

Purpose

To specify the number of EM cycles in the bifactor solution.

Format

CYCLES=n

Default

20.

IGROUPS keyword

Purpose

To assign the items to the item groups, numbered from 1 to n. Assign 0 to any item that
loads only on the general factor. If items have been specified in the SELECT command, the
corresponding IGROUPS numbers will be selected from this list.

Format

IGROUPS=(n1,n2,...,nn)

For purposes of comparing the results of a bifactor analysis with a one-factor analysis of the
same data, the user may assign all items to the general factor (i.e., all values of the IGROUPS
keyword are zero; in that case, NIGROUPS must also be set to zero).
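
As a sketch (a hypothetical nine-item test in which items 1-3, 4-6, and 7-9 form three item
groups, all items also loading on the general factor):

>BIFACTOR NIGROUPS=3, IGROUPS=(1,1,1,2,2,2,3,3,3);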

Default

None.

Related topics

•  BIFACTOR command: NIGROUPS keyword
•  SELECT command (Section 5.3.22)


LIST keyword

Purpose

To control the printout of the bifactor loadings as follows:

•  n = 0: no printout
•  n = 1: loadings will be listed in item order
•  n = 2: loadings will be listed in item group order
•  n = 3: loadings will be listed in both orders

If unrotated factor loadings are selected in the SAVE command, the loadings will be saved in
item order in the format of a conventional two-factor solution. The group assignments will
be included.

Format

LIST=n

Default

0.

Related topics

 SAVE command (Section 5.3.20)

NDEC keyword

Purpose

To specify the number of decimal places in the listing of a selected smoothed or residual
correlation computed from the bifactor solution.

Format

NDEC=n

Default

3.


NIGROUPS keyword

Purpose

To specify the number of item-group factors.

Format

NIGROUPS=n

Default

None.

NOLIST option

Purpose

To suppress the listing of the smoothed or residual matrix in the program output. These ma-
trices may be saved in an external file in either case (see the SAVE command, discussed in
Section 5.3.20).

Format

NOLIST

Related topics

 SAVE command (Section 5.3.20)

OMIT keyword

Purpose

To specify the treatment of omitted items. Note that the option selected should be given in
full.

•  If OMIT is set to RECODE, omitted items will be scored incorrect.
•  If OMIT is set to MISS, omitted items will be treated as not-presented.
•  If OMIT is set to LORD, F.M. Lord’s convention of scoring omits of multiple-choice items
   as fractionally correct will be observed. The fraction is the chance success parameter for
   the items. It is set to the reciprocal of the number of alternatives in the item. Only factor
   scores are affected.


Format

OMIT=RECODE/MISS/LORD

Default

RECODE.

QUAD keyword

Purpose

To control the number of quadrature points for the EM estimation of the parameters.

Format

QUAD=n

Default

9.

RESIDUAL option

Purpose

To compute the difference between the tetrachoric correlation matrix and the smoothed ex-
pected matrix. If RESIDUAL is not selected, it is not necessary to compute the tetrachoric ma-
trix.

Format

RESIDUAL

SMOOTH option

Purpose

To reproduce the expected correlation matrix from the bifactor solution; otherwise, the ma-
trix will not be computed.

Format

SMOOTH


TIME option

Purpose

To specify that omitted items following the last non-omitted item should be treated as not-
presented. Otherwise, they will be scored incorrect if the OMIT keyword is set to RECODE.

Format

TIME

Related topics

 BIFACTOR command: OMIT keyword


5.3.4 CLASS command

(Optional)

Purpose

To assign class codes if item statistics are to be estimated separately for each class (group)
of respondents in the sample.

Format

>CLASS IDENTITY=(list), NAMES=(list);

Examples

>CLASS IDENTITY=(‘1000’,‘2000’,‘3000’);

Classes are identified by numbers.

>CLASS IDENTITY=(N,S,E,W,C);

Classes are identified by letters.

>CLASS IDENTITY=(N,S,E,W,C), NAMES=(NORTH, SOUTH, EAST, WEST, CENTRAL);

Classes are identified and named.

Default

No classes.

IDENTITY keyword

Purpose

To specify a string of codes to identify classes of respondents.

•  Each code may be 1 to 4 characters.
•  Codes other than letters must be enclosed in single quotes.
•  Codes are separated by commas.
•  The string of codes for all the classes is enclosed in parentheses.

Format

IDENTITY=(n1,n2,...,nq)


Default

None.

NAMES keyword

Purpose

To specify a string of names to label classes of respondents.

•  Names are limited to 8 characters for each class name.
•  Names are listed in the same order as the codes.
•  The string of names for all the classes is enclosed in parentheses.

Format

NAMES=(n1,n2,...,nq)

Default

Blank names.


5.3.5 COMMENT command

(Optional)

Purpose

To enter one or more lines of comment to appear in the output.

Format

>COMMENT
…text…
…text…

Note

The COMMENT command is given on a line by itself and followed by as many lines as desired,
of 80 characters maximum, containing comments. A semicolon to end this command is not
needed.

Example

>COMMENT
20 ITEM TEST, THE TOTAL SCORE AS THE CRITERION SCORE.
THE ITEMS ARE TESTING THE FOLLOWING TOPICS.
STRUCTURE AND LANDFORMS.
EROSION, TRANSPORT AND DEPOSITION.
CLIMATE AND VEGETATION.
MINERAL RESOURCES.
AGRICULTURE AND INDUSTRY.
POPULATION AND TRANSPORT.
MISCELLANEOUS.
PERSONS SITTING TEST CLASSIFIED BY SEX, G=GIRL, B=BOY;
THE DATA CARDS ARE LAID OUT AS BELOW.
COLUMN 1 TO 12 INCLUSIVE – IDENTITY
COLUMN 13 – SEX
COLUMNS 14 TO 33 – ITEM RESPONSES
COLUMNS 36 TO 37 INCLUSIVE – CRITERION SCORE
>NAMES…

Default

No comments.


5.3.6 CONTINUE command

(Required)

Purpose

To terminate each problem in a set of stacked problems.

Format

>CONTINUE

Note

A semicolon to signal the end of this command is not needed.


5.3.7 CRITERION command

(Optional)

Purpose

To define a criterion score to supplement the main test score of each respondent.

Format

>CRITERION EXTERNAL/SUBTESTS/CRITMARK, NAME=n, WEIGHTS=(list);

Examples

>CRITERION CRITMARK, NAME=TESTMARK;

>CRITERION SUBTESTS, NAME=TOTAL, WEIGHTS=(0.3,0.2,0.5);

Default

No criterion score.

EXTERNAL/SUBTESTS/CRITMARK option

Format

EXTERNAL/SUBTESTS/CRITMARK

•  EXTERNAL: A linear combination of external variables (see the EXTERNAL command, dis-
   cussed in Section 5.3.8). In this case, weights are supplied by the user (see the WEIGHTS
   keyword, with w = t).
•  SUBTESTS: A linear combination of subtest scores (see the SUBTEST command, Section
   5.3.25). In this case, the calculation is based on weights supplied by the user (see the
   WEIGHTS keyword, with w = p).
•  CRITMARK: A score input with the item responses. No weights are required.

Default

EXTERNAL.

Related topics

•  EXTERNAL command (Section 5.3.8)
•  CRITERION command: WEIGHTS keyword
•  SUBTEST command (Section 5.3.25)


NAME keyword

Purpose

To provide a name of 1 to 8 characters for the resulting criterion. The rules for naming items
(see NAMES command, Section 5.3.14) apply to the criterion name.

Format

NAME=character string

Default

Blank name.

Related topics

 NAMES command (5.3.14)

WEIGHTS keyword

Purpose

To enter weights when the criterion score must be calculated as a linear combination of
other variables (see the EXTERNAL option).

•  Weights are separated by commas, and the list is enclosed in parentheses.
•  Weights must be real-valued (that is, they must have decimal points).

Format

WEIGHTS=(n1,n2,...,nw)

Default

1.0.

Related topics

 CRITERION command: EXTERNAL option


5.3.8 EXTERNAL command

(Optional)

Purpose

To provide names for the external variables.

•  Names may be 1 to 8 characters long.
•  Values of the external variables are entered from each data record. See the SCORES option
   on the INPUT command, discussed in Section 5.3.12.
•  The rules for naming items (see the NAMES command, Section 5.3.14) apply to the external
   variables.
•  This command is related to the use of the EXTERNAL keyword on the PROBLEM command
   (see Section 5.3.17).

Format

>EXTERNAL n1, n2, ..., nt;

Example

>EXTERNAL ARITH,ALGEBRA,TRIG,GEOMETRY;

Default

External variables are unnamed.

Related topics

•  INPUT command: SCORES option (Section 5.3.12)
•  NAMES command (Section 5.3.14)
•  PROBLEM command: EXTERNAL keyword (Section 5.3.17)


5.3.9 FACTOR command

(Optional)

Purpose

•  To control the item factor analysis.
•  To perform MINRES factor analysis of the conditioned tetrachoric correlation matrix.
•  To rotate the solution.
•  To obtain residual and smoothed correlations.

Format

>FACTOR NFAC=n, NROOT=n, NDEC=n, ROTATE=(VARIMAX/PROMAX,d,e), RESIDUAL,
        SMOOTH, NIT=n;

Note

VARIMAX and PROMAX may not be abbreviated in the FACTOR command.

Example

>FACTOR NFAC=5, ROTATE=(PROMAX,4,2), RESIDUAL;

Default

No factor analysis.

NDEC keyword

Purpose

To specify the number of decimal places printed for the residuals.

Format

NDEC=n

Default

3.

NFAC keyword

Purpose

To specify the number of factors to be extracted.


•  Must be less than NITEMS in the PROBLEM command.
•  Must not exceed 15 in MINRES factor analysis or Monte Carlo EM analysis.
•  Must not exceed 10 in adaptive full information factor analysis.
•  Must not exceed 5 in non-adaptive full information item factor analysis (see the
   TECHNICAL command for the NOADAPT option, discussed in Section 5.3.26).

Format

NFAC=n

Default

NITEM/2.

Related topics

•  PROBLEM command: NITEMS keyword (Section 5.3.17)
•  TECHNICAL command: NOADAPT option (Section 5.3.26)

NIT keyword

Purpose

To specify the number of iterations for the MINRES factor solution of the smoothed correla-
tion matrix.

Format

NIT=n

Default

NIT=3. Minimum value = 1.

NROOT keyword

Purpose

To specify the number of latent roots to be extracted. NROOT must be greater than or equal
to NFAC.

Format

NROOT=n


Default

NFAC.

Related topics

 FACTOR command: NFAC keyword

RESIDUAL option

Purpose

To request the computation of the residual correlation matrix. This matrix is computed as
the initial correlation matrix minus the final correlation matrix. The residual variance for
each item appears in the diagonal of this matrix.

Format

RESIDUAL

Default

No residual correlation matrix.

ROTATE keyword

Purpose

To request rotation of the factors. VARIMAX or PROMAX must be entered (in full) if rotation is
required; there is no default. d is the number of leading factors to be rotated, and must be
equal to or less than NFAC. e is the constant for PROMAX rotation and must be between 2
and 4, inclusive.

Format

ROTATE=([VARIMAX/PROMAX],d,e)

Default

d=NFAC, e=3.

Related topics

 FACTOR command: NFAC keyword


SMOOTH option

Purpose

To request the computation of an f-factor positive definite estimate of the latent response
process correlation matrix.

Format

SMOOTH

Note

The SMOOTH option affects only the output of the final smoothed correlation matrix. Initial
smoothing of the correlation matrix will take place whether the SMOOTH option is entered or
not. Off-diagonal elements of 1.0, -1.0, 9.0, or -9.0 in the initial tetrachoric correlation ma-
trix (caused by too-small cell or marginal frequencies in a contingency table) will be auto-
matically replaced by a new correlation coefficient estimated by the centroid method. The
positive-definite tetrachoric correlation matrix is then produced before the principal factor
analysis.


5.3.10 FRACTILES command

(Optional)

Purpose

To group scores into fractiles by score boundaries or percentiles. The number of fractiles
must be set in the PROBLEM command.

Format

>FRACTILES SCORE/PERCENTIL, BOUNDARY=(list);

Examples

>FRACTILES SCORE, BOUNDARY=(15,27,33,40,60);

Fractiles will correspond to the mutually exclusive score bands:

•  1 through 15
•  16 through 27
•  28 through 33
•  34 through 40
•  41 through 60

>FRACTILES PERCENTIL, BOUNDARY=(20,40,60,80,100);

Each fractile will correspond to 20 percent of the scores.

Default

No fractiles.

Related topics

 PROBLEM command (Section 5.3.17)

BOUNDARY keyword

Purpose

To specify the boundaries to be used.

•  If the SCORE option is selected, the boundaries consist of cumulative upper scores on the
   test bands. The scores are expressed in integers from 1 to NITEM.
•  If the PERCENTIL option is selected, the boundaries consist of the cumulative upper per-
   centages of the score distribution. The percentages are expressed in integers from 1 to

Format

BOUNDARY= ( n1 , n2 ,..., ns )

Related topics

•  PROBLEM command: NITEMS keyword (Section 5.3.17)
•  FRACTILES command: SCORE/PERCENTIL option (Section 5.3.10)

SCORE/PERCENTIL option

Purpose

To specify the fractiles used.

 If SCORE is selected, each fractile corresponds to a band of scores on the main test. If the
number of items is small, it is better to use score bands rather than the percentiles to de-
fine fractiles.
 If PERCENTIL is selected, each fractile corresponds to a percentile of scores on the main
test.

Format

SCORE/PERCENTIL

Default

SCORE.


5.3.11 FULL command

(Optional)

Purpose

To request full information item factor analysis, starting from the principal factor solution,
and the computation of the likelihood ratio χ² and the change of χ².

Format

>FULL OMIT=RECODE/MISS/LORD, CYCLES=n, CPARMS=(list), TIME,
      FREQ=n, QUAD=n;

Note

RECODE, MISS and LORD may not be abbreviated in the FULL command.

Example

>FULL CYCLES=20, OMIT=MISS, CPARMS=(0.25(0)10);

Default

No full information analysis.

CPARMS keyword

Purpose

To specify the values of the chance or so-called guessing parameters for items j = 1, 2, …, n.
Repeated values may be coded as n(0)m, where m is the number of repetitions of the value n
(0.0 < n < 1.0). TESTFACT regards these parameters as fixed when fitting the guessing
model.

Format

CPARMS=(n1,n2,...,nn)

Examples

CPARMS=(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,
0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,
0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1);


CPARMS=(0.1(0)32);

CPARMS=(0.05(0)16,0.1,0.1,0.15(0)14);

Default

0.0.

CYCLES keyword

Purpose

To specify the maximum number of EM cycles.

Format

CYCLES=n

Default

15.

FREQ keyword

Purpose

To list observed and expected response pattern frequencies and their differences.

Format

FREQ=0 or FREQ=1

Default

FREQ=0 (observed and expected response frequency table not written to the output file).

OMIT keyword

Purpose

To specify the treatment of omitted items.

•  If OMIT=RECODE, omits will be recoded as wrong responses.
•  If OMIT=MISS, omits will not be used at all.
•  If OMIT=LORD, F.M. Lord’s convention of scoring omits of multiple-choice items as
   fractionally correct will be observed. The fraction is the chance success parameter for the
   items. It is set to the reciprocal of the number of alternatives in the item. Only factor
   scores are affected.

Format

OMIT=RECODE/MISS/LORD

Default

RECODE.

QUAD keyword

Purpose

To control the number of quadrature points for the EM estimation of the parameters.

Format

QUAD=n

Default

When the NOADAPT option is specified on the TECHNICAL command, the default values for
QUAD are as follows:

Factors   Quad points
1         21
2         15
3         5
>3        3


Otherwise

Factors   Quad points
1         9
2         5
3         4
>3        3

Related topics

 TECHNICAL command: NOADAPT option (Section 5.3.26)

TIME option

Purpose

To treat omitted responses following the last-responded-to item as not-presented.

Format

TIME

Default

Omitted responses will be treated as the OMIT keyword specifies.

Related topics

 FULL command: OMIT keyword


5.3.12 INPUT command

(Required)

Purpose

To begin the input of data for item analysis or factor analysis.

Format

>INPUT NIDCHAR=n, NFMT=n, TRIAL=‘filename’, WEIGHT=(CASE,n)/PATTERN,
       FILE=‘filename’, SCORES/CORRELAT/FACTORS, FORMAT/UNFORMAT,
       LIST, REWIND;

Note

CASE and PATTERN may not be abbreviated in the INPUT command.

Examples

Example 1:

Item response data, unweighted and unformatted, are in the file with the name mydata.dat.
There are two groups, coded M and F, one form, and two external criteria. Because these are
raw data, the variable format record must be in the command file. End of command file:

>INPUT NIDCHAR=4,FILE=‘MYDATA.DAT’;
(4A1,3X,A1,2X,10A1,2F5.1)
>CONTINUE

The file mydata.dat:

0001 M 1324431223 261 372
0002 M 1332143214 193 301

2000 F 1334331214 176 280

Example 3:

A 5 × 5 correlation matrix for MINRES factor analysis, possibly including VARIMAX or
PROMAX rotation. Data are in the indicated file.

>INPUT CORRELAT, FILE=‘EXAMPL02.COR’;
>STOP

Data are read as one long string of numbers with decimal points – space-delimited and for-
mat-free. They can appear, for example, in the following form:


1.0000
.6231 1.0000
.5574 .5395 1.0000
.3746 .3952 .4871 1.0000
.3210 .3456 .3863 .5894 1.0000
>CONTINUE

Example 4:

A factor solution for VARIMAX and PROMAX rotation; the factor loadings are unformatted
output from the SAVE command and are in the file factor.dat. The first line of the external file
with the factor loadings contains the variable format statement.

>INPUT FACTORS, FILE=‘FACTOR.DAT’, UNF;
>CONTINUE

Example 5:

Item response data for full information factor analysis; trial values from a previous principal
factor analysis are input as starting values.

The trial values and the variable format statement describing the layout are in the file
pfact.tri. The raw data are in the external file survey.dat. The corresponding variable for-
mat statement is required in the command file.

Command file:

>INPUT NIDCHAR=10, FILE=‘SURVEY.DAT’, TRIAL=‘PFACT.TRI’;
(10A1,15A1)
>CONTINUE

The trial value file pfact.tri:

(I3,2X,3F10.5)
01 0.36512 -0.62143 0.01684

15

The data file survey.dat:

0001M24 101101110010100
0250F20 111101011010111

Example 6:

Item response data with case weights normalized to 1000; data are formatted in the file sur-
vey.dat. Two variable format records are in the command file.


>INPUT NIDCHAR=5, NFMT=2, WEIGHT=(CASE,1000.0), FILE=‘SURVEY.DAT’,
       REWIND;
(5A1,A1,T16,4A1,T14,2A1,T8,4A1,
T20,10A1,F6.3)
>CONTINUE

The file survey.dat:

00001ABDAECAEDBBCAEDEBACDEABC 0.632

01651ABDBEACEDCBBCDABEBDDCEAA 0.467

Related topics

 SAVE command (Section 5.3.20)

NIDCHAR keyword

Purpose

To specify the number (between 1 and 50) of characters in the subject identification field.

Format

NIDCHAR=n

Default

1.

FILE keyword

Purpose

To provide the filename of the data records (of response records, correlations, or factor load-
ings). It may contain up to 128 characters including path and extension and should be en-
closed in single quotes. The drive and directory path should be included if the data file is not
in the same folder as the command file. For more information, see Section 3.2.6.

Format

FILE=<‘filename’>

Default

Data are read from the command file.


FORMAT/UNFORMAT option

Purpose

To indicate whether or not the input data file is formatted:

 If the data file is formatted as described in a variable format statement, the FORMAT option
should be used.
 UNFORMAT is used when the data file is unformatted (binary).

To create the records of the unformatted file, the following WRITE statements may be used:

SCORES:
WRITE(FILE) (ID(I),I=1,NIDCHAR),C,(IR(I),I=1,NITEM),WT,(EXTV(I),I=1,EXT)

where C, WT, and EXTV(I) are optional. The variable IR is integer; the others are real, single
precision. ID is a case identifier, C a class indicator, IR an item response pattern, and WT a
weight (for either CASE or PATTERN).

CORRELAT:
WRITE(FILE) (CORR(I),I=1,NTRI)

where NTRI=(NITEMS x (NITEMS+1))/2 and CORR is real, single precision.

FACTOR:
WRITE(FILE) (A(I),I=1,NF)

where NF=(NITEMS x NFAC).

Format

FORMAT/UNFORMAT

Default

FORMAT.

Related topics

•  INPUT command: SCORES option
•  INPUT command: CASE/PATTERN option

LIST option

Purpose

To list in the output file, for all subjects:


 identification
 main test score
 subtest scores (if any).

Format

LIST

NFMT keyword

Purpose

To specify the number of variable format records (80 characters) describing the data records.
The format records must appear in the command file immediately following the INPUT
command.

Format

NFMT=n

Default

1.

REWIND option

Purpose

This is a program instruction to read the data file from the beginning for a subsequent prob-
lem in a stacked command file. The term rewind dates from mainframe days, when data files
were commonly read from a tape that needed to be rewound to the start of the file for a fresh
reading of the data.

Format

REWIND

SCORES/CORRELAT/FACTORS option

Purpose

To indicate the type of data being read from the specified file as follows:

Use the SCORES option to read subject response records. The data should conform to the fol-
lowing specifications:


•  a characters of case identification read as an A1 character field: (aA1), where a is the
   value specified with the NIDCHAR keyword in the INPUT command.
 Class code if CLASS is more than 0 in the PROBLEM command; read as (A1), (A2), (A3), or
(A4) characters.
•  One-character item responses read as (nA1) or, for example, (n1A1,10X,n2A1) where
   n1 + n2 = n; n being the value specified with the NITEMS keyword in the PROBLEM
   command.
 Weight, if the WEIGHT keyword is specified in the INPUT command; read as (Fw.d).
 r external variable values, if the EXTERNAL keyword on the PROBLEM command is set to
more than 0; read as (rFw.d), where r is the value specified with the EXTERNAL keyword.
 End of scores is indicated by one completely blank record.
 Use the CORRELAT option to use the input correlation matrix for MINRES factor analysis
(full information factor analysis requires item response data and cannot be carried out di-
rectly on the correlation matrix). Also see Example 3 of the INPUT command.
 NITEMxNITEM correlation matrix in lower triangular form. The NITEM(1+NITEM)/2 ele-
ments up to and including each diagonal element are read as one long string.
 Use the FACTORS option to read input factor loadings (for rotation only) from the file
specified by the FILE keyword. Also see Example 4 of the INPUT command.
 NITEMxNFAC matrix of factor loadings read by rows.
 One variable format record precedes the first row of this matrix and describes one row.

Format

SCORES/CORRELAT/FACTORS

Default

SCORES.

Related topics

•  FACTOR command: NFAC keyword (Section 5.3.9)
•  INPUT command: FILE and WEIGHT keywords
•  PROBLEM command: CLASS, NITEM, and EXTERNAL keywords (Section 5.3.17)

TRIAL keyword

Purpose

To specify the filename for input of trial values for the full information factor analysis. It
may contain up to 128 characters including path and extension and must be enclosed in sin-
gle quotes.

Each line of the trial values file must contain the item number followed by the intercept and
slope for each factor. The variable format record must appear in the first line of this file. See
Example 5 of the INPUT command. Trial values are saved in this form, with the format
statement, by the TRIAL option of the SAVE command.

If the trial values are in the command file, they must appear immediately after the data for-
mat records.

Format

TRIAL=<‘filename’>

Default

No trial values.

Related topics

 SAVE command: TRIAL option (Section 5.3.20)

WEIGHT keyword

Purpose

To specify the type of weight for a weighted analysis. The two options below may be used
with this keyword.

 (CASE,n): Each record includes a case weight (real, i.e., with decimal points in data re-
cords, or read in F-format). Real normalizing constant n must be specified if CASE option
is chosen.
 PATTERN: Each data record consists of an answer pattern with a frequency (integer, i.e.,
without a decimal point and read in I-format).
 CASE and PATTERN may not be abbreviated in the INPUT command.

Format

WEIGHT=(CASE,n)/PATTERN

Default

No weights.


5.3.13 KEY command

(Required)

Purpose

To specify correct-response codes for all the items on the main test, in their original, prese-
lected order.

Format

>KEY ccccccccccccccccccccccc;

Notes

Rules for entering the key to correct responses:

•  The key is a continuous string of correct-answer response codes.
•  Each response code is one character only, a letter or a number.
•  Blanks in the response key string are ignored. If a blank is used as a response code, repre-
   sent it by a comma (,) in the key string.
•  Codes may continue without break on up to ten continuation lines.

Example

>KEY AABCAEDCEDEACBD125342;

Default

If this command is omitted, the job will abort.


5.3.14 NAMES command

(Optional)

Purpose

To provide brief names for all of the items on the test in their original order.

Format

>NAMES n1, n2, ..., nn;

Notes

If items are selected and/or reordered using the SELECT command, their item names and the
answer key will be selected and/or reordered at the same time.

Rules for entering names of items:

•  Names may be no more than 8 characters each.
•  Names must be separated by commas and should normally start with an alphabetic charac-
   ter.
•  Names must be enclosed in single quotes if they:
   o  do not start with a letter
   o  contain a semicolon
   o  contain a forward slash
   o  contain a blank

Examples

>NAMES KNOW1, KNOW2, UNDER1, ANAL1, KNOW3, UNDER2, ANAL2, COMP, ANAL3;

>NAMES ‘100’,‘A100’,‘100B’,‘C-8’,‘D-9’,‘E/10’,‘F/20’;

Default

If this command is omitted, every item will be given a blank name.


5.3.15 PLOT command

(Optional)

Purpose

To produce a line plot of a measure of item difficulty against discriminating power.

Format

>PLOT BISERIAL/PBISERIAL, NOCRITERION/CRITERION, FACILITY/DELTA;

Use only one of the options in each pair.

Examples

>PLOT BISERIAL,CRITERION,DELTA;

>PLOT BISERIAL,CRITERION,FACILITY;

>PLOT PBISERIAL,NOCRITERION,DELTA;

Default

No plot.

BISERIAL/PBISERIAL option

Purpose

To indicate the choice of discrimination index. The discrimination index may be either of
these:

•  BISERIAL: biserial coefficient
•  PBISERIAL: point biserial coefficient

Format

BISERIAL/PBISERIAL

Default

BISERIAL.


FACILITY/DELTA option

Purpose

To indicate the frame of reference for the item difficulty. Item difficulty may be plotted in
terms of either of these:

•  FACILITY: item facility (percent correct)
•  DELTA: difficulty index

Format

FACILITY/DELTA

Default

FACILITY.

NOCRITERION/CRITERION option

Purpose

To define the discriminating power. Discriminating power may be with respect to either of
these:

•  NOCRITERION: internal test or subtest score
•  CRITERION: external criterion score

Format

NOCRITERION/CRITERION

Default

NOCRITERION


5.3.16 PRIOR command

(Optional)

Purpose

To constrain the maximum likelihood estimation of slope and intercept parameters using a
beta prior distribution on the uniquenesses and a normal prior distribution on the intercepts.

Format

>PRIOR SLOPE=n, INTER=(m,s);

Example

>PRIOR SLOPE=1.5, INTER=(0,4);

Default

None. If the PRIOR command does not appear, the ML estimation will not be constrained.

INTER keyword

Purpose

To define the mean (m) and standard deviation (s) of the normal distribution for intercept pa-
rameters, such that cj ∼ N(m, s).

Format

INTER=(m,s)

Default

m=0, s=2.

SLOPE keyword

Purpose

To define the parameter of the beta distribution for uniquenesses, such that uj ∼ β(n, 1). Lar-
ger values of n correspond to stronger priors.


Format

SLOPE=n

Default

1.2.


5.3.17 PROBLEM command

(Required)

Purpose

To specify overall characteristics of components included in each problem.

Format

>PROBLEM NITEM=n, SELECT=n, RESPONSE=n, SUBTEST=n, CLASS=n,
         FRACTILES=n, EXTERNAL=n, SKIP=n, NOTPRES;

Examples

>PROBLEM NIT=150;

In this case, defaults would be invoked for all remaining parameters.

>PROBLEM NITEMS=50, SELECT=30, RESPONSE=7, SUBTEST=3, CLASS=2,
         FRACTILES=5, EXTERNAL=0, SKIP=2;

Default

If this command does not appear, the job will abort.

CLASS keyword

Purpose

To specify the number of classes (n = 1 to 10) into which respondents will be divided. This
corresponds to the number of classes identified and named in the CLASS command.

Format

CLASS=n

Default

0.

Related topics

 CLASS command (Section 5.3.4)


EXTERNAL keyword

Purpose

To specify the number of external variates (n = 0 to 5). This should equal the number of
external variates named in the EXTERNAL command.

Format

EXTERNAL=n

Default

0.

Related topics

 EXTERNAL command (Section 5.3.8)

FRACTILES keyword

Purpose

To specify the number of fractiles (n = 1 to 10) into which scores will be divided.
Boundaries of the fractiles are defined in the FRACTILES command.

Format

FRACTILES=n

Default

1.

Related topics

 FRACTILES command (Section 5.3.10)

NITEMS keyword

Purpose

To specify the total number of test items. This should equal the number of item names
specified in the NAMES command.


Format

NITEMS=n

Default

None; must be specified.

Related topics

 NAMES command (Section 5.3.14)

NOTPRES option

Purpose

To indicate that one of the response codes identifies “not-presented” items. See the
RESPONSE command, discussed in Section 5.3.19.

Format

NOTPRES

Default

All items are presented to all respondents.

Related topics

 RESPONSE command (Section 5.3.19)

RESPONSE keyword

Purpose

To specify the number of response codes (n = 2 to 15). This should equal the number of
codes specified in the RESPONSE command.

Format

RESPONSE=n

Default

2.


Related topics

 RESPONSE command (Section 5.3.19)

SELECT keyword

Purpose

To specify the number of items selected for this run (n = 0 to NITEMS). This should equal
the number of items listed in the SELECT command.

Format

SELECT=n

Default

0; no selection; all items will be used in the original order.

Related topics

 PROBLEM command: NITEMS keyword


 SELECT command (Section 5.3.22)

SKIP keyword

Purpose

To specify which steps of the analysis should be performed.

Format

SKIP=n

n=0 Do not skip; perform item analysis and all subsequent steps.

n=1 Skip item analysis and proceed to calculation of tetrachoric correlations immediately
after data entry.

n=2 Proceed directly to factor analysis or rotation after input of the correlation matrix,
factor pattern, or provisional parameter estimates.

Default

Do not skip (n = 0).
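
For example, a run that bypasses the item analysis and tetrachoric correlations and proceeds
directly to factor extraction from a previously computed correlation matrix might begin as
follows (a minimal sketch; the matching INPUT specification for the matrix is omitted here):

>PROBLEM NITEMS=50, SKIP=2;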


SUBTEST keyword

Purpose

To indicate the number of boundaries and subtest names as specified in the SUBTEST
command.

Format

SUBTEST=n

Default

0.

Related topics

 SUBTEST command (Section 5.3.25)


5.3.18 RELIABILITY command

(Optional)

Purpose

To specify a measure of internal consistency for the main test (or all subtests).

Format

>RELIABILITY KR20/ALPHA;

Use only one of the two options:

 KR20: The default if the RELIABILITY command is used. The Kuder-Richardson formula
20 is calculated for each subtest (or for the main test when there are no subtests). Omits
are not allowed in computing KR20.
 ALPHA: Coefficient alpha is calculated for each subtest (or for the main test when there are
no subtests). Omits are permissible. The computer time required to calculate alpha may be
excessive if the number of items and respondents is large.
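
For reference, both measures follow the standard classical formulas. With k items, item
facility $p_j$ (and $q_j = 1 - p_j$), item score variance $\sigma_j^2$, and total test score
variance $\sigma_X^2$:

$$\mathrm{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma_X^2}\right),
\qquad
\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} \sigma_j^2}{\sigma_X^2}\right).$$

For dichotomously scored items with no omits, $\sigma_j^2 = p_j q_j$ and coefficient alpha
reduces to KR20.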

Example

>RELIABILITY KR20;

Default

No reliability measure.


5.3.19 RESPONSE command

(Required)

Purpose

To specify the response codes common to all items on the main test.

Format

>RESPONSE omit, n1, n2, ..., n(m-2), not-presented;

Notes

Rules for entering response codes:

 Codes must be 1 character each.
 Commas must separate codes.
 A code must be enclosed in single quotes if it is any non-alphabetic character.
 The first code must be for “omit”, and is required even if the data contain no omitted
responses.
 The last code must be for “not-presented” if the NOTPRES option is entered in the PROBLEM
command.
 The total number of codes must not exceed 15.

Examples

>RESPONSE '0',A,B,C,D;

In this example, there are 5 response codes on the main test (m = 5).

>RESPONSE ' ','1','2','3','4','-';

In this example, there are 6 response codes on the main test (m = 6). Omit is blank; items
not presented to respondents are coded “minus”.

Default

If this command is omitted, the job will abort.

Related topics

 PROBLEM command: NOTPRES option (Section 5.3.17)


5.3.20 SAVE command

(Optional)

Purpose

To save scores and/or item parameters in output files specified by the user.

Format

>SAVE SCORE, MAIN, SUBTESTS, CRITERION, CMAIN, CSUB, CCRIT,
      CORRELAT, SMOOTH, UNROTATE, ROTATE, PARM, FSCORES, TRIAL,
      SORTED, EXPECTED;

Notes

 The saved file for data simulation is described in the SIMULATE command.
 All results are saved in fixed-column text files; the first record of each file contains the
format statement describing the column layout.
 The saved files will have the jobname as default filename.

Example

>SAVE SCORE,MAIN,SUBTESTS,CRITERION,SMOOTH,FSCORES;

Default

Not saved.

Related topics

 SIMULATE command (Section 5.3.23)

CCRIT option

Purpose

To save the class item statistics based upon criterion score in the file <jobname>.ccr:

Format Description

(A8,1X Class name

A8,1X Criterion name

I4,1X Item number


A8,1X Item name

E10.3 Number of respondents attempting item (weights)

F6.2 Mean criterion score

F6.3 Standard deviation

F6.3 Mean criterion score of respondents giving correct response

F6.3 Facility

F6.2 Difficulty

F6.2 Biserial correlation

F6.2 Point biserial correlation

F6.3 Biserial correlation based on criterion score

F6.3 Point biserial correlation based on criterion score

Format

CCRIT

Default

Do not save.

CMAIN option

Purpose

To save separate estimates for each class based on the main test score in the file
<jobname>.cma:

Format Description

(A8,1X Class name

‘MAIN’,5X -

I4 Item number

A8,1X Item name



E10.3 Number of respondents attempting item (weights)

F6.2 Mean score

F6.2 Standard deviation

F6.2 Mean score of respondents giving correct response

F6.3 Facility

F5.2 Difficulty

F6.3 Biserial correlation

F6.3 Point biserial correlation

F6.3 Biserial correlation based on criterion score

Format

CMAIN

Default

Do not save.

CORRELAT option

Purpose

To save the tetrachoric correlation matrix in the file <jobname>.cor. This matrix may not
be positive-definite (diagonal and lower triangle only, NITEMxNITEM).

Output format

Output is 80-column, space-delimited format-free, in lower triangular form with line wrap.

Format

CORRELAT

Default

Do not save.


Related topics

 PROBLEM command: NITEMS keyword

CRITERION option

Purpose

To save the item statistics based upon criterion score in the file <jobname>.cri:

Format Description

(A8,1X Criterion name

I4,1X Item number

A8,1X Item name

E10.3 Number of respondents attempting item (weights)

F5.3 Facility

F6.2 Difficulty

F6.3 Biserial correlation

F6.3 Point biserial correlation

F6.2 Mean criterion score

F6.2 Mean criterion score of respondents giving correct response

F6.3 Biserial and point biserial based upon criterion score

Format

CRITERION

Default

Do not save.


CSUB option

Purpose

To save the item statistics of each class based upon subtest scores in the file
<jobname>.csu:

Format Description

(A8,1X Class name

A8,1X Subtest name

I4,1X Item number

A8,1X Item name

E10.3 Number of respondents attempting item (weights)

F6.2 Mean subtest score

F6.2 Standard deviation

F6.2 Mean score of respondents giving correct response

F6.3 Facility

F5.2 Difficulty

F6.3 Biserial correlation

F6.3 Point biserial correlation

Format

CSUB

Default

Do not save.

EXPECTED option

Purpose

To save the results of the final E-step of the full information item factor analysis in the file
<jobname>.exp:


1. counter designating quadrature point
2. quadrature point
3. weight for each quadrature dimension (the product of these weights is the weight for the
point)
4. expected number of correct attempts on each item at quadrature point
5. expected number of attempts at quadrature point. If all respondents attempt every item, all
of these numbers will be equal. If some items are not attempted by some respondents,
these numbers will change from item to item.

Format

EXPECTED

Note

This option applies only to the non-adaptive solution (NOADAPT option of the TECHNICAL
command).

Output format

(1X,fI2,n(/,1X,7F10.5))

f: the number of factors, as specified by the NFAC keyword in the FACTOR command.

n: number of lines required to print E-step results 2 through 4 above.

Default

Do not save.

Related topics

 FACTOR command: NFAC keyword (Section 5.3.9)


 TECHNICAL command: NOADAPT option (Section 5.3.26)

FSCORES option

Purpose

To save the factor scores and their posterior standard deviations with subject identification in
the file <jobname>.fsc. Output format is given in the first line of the factor score file.

Format

FSCORES


Default

Do not save.

MAIN option

Purpose

To save the following classical item statistics in the file <jobname>.mai:

Format Description

‘MAIN’,5X -

I4,1X Item number

A8,1X Item name

E10.3 Number of respondents (or weight) for item

F6.2 Mean test score for all respondents

F6.2 Standard deviation of score for all respondents

F6.2 Mean score of respondents answering item correctly

F5.3 Facility

F6.2 Difficulty

F6.3 Biserial correlation

F6.3 Point biserial correlation

Format

MAIN

Default

Do not save.


PARM option

Purpose

To save the item numbers, intercepts, factor slopes, and guessing parameters in a form
suitable for computing factor scores at a later time in the file <jobname>.par. If VARIMAX or
PROMAX is selected, these parameters will be saved after the VARIMAX rotation; otherwise,
they will be saved from the principal factor solution. If BIFACTOR is selected, the item
numbers, intercepts, and general and specific factor slopes will be saved. For scoring
purposes, set the FILE keyword of the SCORE command equal to <jobname>.par.

Output format

Output format is (I3,2X,F8.5,fF8.5), where f is the keyword value for NFAC in the
FACTOR command, or f = 2 for the BIFACTOR command. This format is given in the first line
of the PARM or TRIAL values file.

Format

PARM

Default

Do not save.
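
As an illustration of this two-step workflow (the jobname MYTEST and the factor settings are
hypothetical), a calibration run would include

>SAVE PARM;

and a later scoring-only run would read the saved parameters back with

>SCORE LIST=10, NFAC=2, PARAM=3, FILE='MYTEST.PAR';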

Related topics

 FACTOR command: ROTATE keyword (Section 5.3.9)


 FACTOR command: NFAC keyword
 BIFACTOR command (Section 5.3.3)
 SCORE command: FILE keyword (Section 5.3.21)
 SAVE command: TRIAL option

ROTATE option

Purpose

To save the rotated factor loadings (NITEMxNFAC) in the file <jobname>.rot.

Output format

The output format is (10F8.5).

468
SAVE COMMAND

Format

ROTATE

Default

Do not save.

Related topics

 PROBLEM command: NITEMS keyword (Section 5.3.17)


 FACTOR command: NFAC keyword (Section 5.3.9)

SCORE option

Purpose

To save the following case score information according to the status of WEIGHT in the INPUT
command in the file <jobname>.sco:

 case identification
 test form number
 case weight
 main test score
 subtest score
 criterion score

WEIGHT keyword     Number of     Format used by program
in INPUT command   test forms
yes                1             (aA1,3X,E10.3,3X,F7.0,3X,pF7.0,F9.2)
no                 1             (aA1,3X,F7.0,3X,pF7.0,F9.2)
yes                >1            (aA1,3X,I2,3X,E10.3,3X,F7.0,3X,pF7.0,F9.2)
no                 >1            (aA1,3X,I2,3X,F7.0,3X,pF7.0,F9.2)

 a: the number specified with the NIDCHAR keyword in the INPUT command.
 p: the number of subtests.


If p = 1, the main test and subtest scores will be identical. If there is no CRITERION
command, the criterion field will be null.

Format

SCORE

Default

Do not save.

Related topics

 INPUT command: NIDCHAR and WEIGHT keywords (Section 5.3.12)


 CRITERION command (Section 5.3.7)

SMOOTH option

Purpose

To save the “smoothed” NFAC common factor approximation to the correlation matrix in the
file <jobname>.smo.

Output format

This matrix will be positive-definite (diagonal and lower triangle only, NITEMxNITEM). The
output format is (10F8.5).

Format

SMOOTH

Default

Do not save.

Related topics

 PROBLEM command: NITEMS keyword (Section 5.3.17)

SORTED option

Purpose

To save the sorted file of identity, item responses, and weight in the file <jobname>.sor.


This applies only to the non-adaptive solution (NOADAPT option on TECHNICAL command).

Format

SORTED

Default

Do not save.

Related topics

 TECHNICAL command: NOADAPT option (Section 5.3.26)

SUBTESTS option

Purpose

To save the item subtest parameter estimates as follows in the file <jobname>.sub:

Format Description

A8,1X Subtest name

I4,1X Item number

A8,1X Item name

E10.3 Number of respondents attempting item (weight)

F6.2 Mean subtest score

F6.2 Standard deviation

F6.2 Mean score of respondents answering item correctly

F5.3 Facility

F6.2 Difficulty

F6.3 Biserial correlation

F6.3 Point biserial correlation

Format

SUBTESTS


Default

Do not save.

TRIAL option

Purpose

To save the item numbers, intercepts, factor slopes, and guessing parameters in the file
<jobname>.tri in a form suitable for performing additional EM parameter estimation cycles
at a later time. The trial values are saved at the end of the EM cycles and before
reorthogonalization or rotation. In BIFACTOR analysis, TRIAL and PARM are identical. To use
the saved trial values as the starting point for continued EM cycles, set the TRIAL keyword
of the INPUT command equal to <jobname>.tri.
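
A minimal sketch of this restart workflow (the jobname MYJOB is hypothetical, and the
remaining INPUT keywords describing the data are elided). The first run includes

>SAVE TRIAL;

and the continuation run reads the saved values back with

>INPUT TRIAL='MYJOB.TRI', ... ;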

Output format

The output format is (I3,2X,F8.5,fF8.5), where f is the (keyword) value for NFAC in the
FACTOR command, or f = 2 for the BIFACTOR command. This format is given in the first line
of the PARM or TRIAL values file.

Note that PARM and TRIAL cannot be used in the same BIFACTOR analysis.

Format

TRIAL

Default

Do not save.

Related topics

 FACTOR command: NFAC and ROTATE keywords (Section 5.3.9)


 BIFACTOR command (Section 5.3.3)
 INPUT command: TRIAL keyword (Section 5.3.12)
 SAVE command: PARM option

UNROTATE option

Purpose

To save the unrotated (principal) factor loadings (NITEMxNFAC) in the file <jobname>.unr.
Use UNROTATE to save BIFACTOR loadings.


Output format

The output format is (10F8.5).

Format

UNROTATE

Default

Do not save.

Related topics

 PROBLEM command: NITEMS keyword (Section 5.3.17)


 FACTOR command: NFAC keyword (Section 5.3.9)
 BIFACTOR command (Section 5.3.3)


5.3.21 SCORE command

(Optional)

Purpose

To obtain factor score estimates (EAP or MAP) and standard error estimates for each case
from estimated or supplied item parameters, the EAP score of the general factor of the
bifactor model, and estimates of the standard error of the general factor score allowing for
conditional dependence introduced by the group factors.

Format

>SCORE LIST=n, NFAC=n, FILE='filename', MISSING, TIME, CHANCE, LOADINGS,
       METHOD=n, PARAM=n, QUAD=n, SPRECISION=n;

Examples

>SCORE LIST=20;

>SCORE LIST=10,NFAC=6,FILE='NEWTEST.PAR',MISSING,TIME,CHANCE;

Default

No factor score estimates.

CHANCE option

Purpose

To specify the use of the guessing model in computing factor scores. When used in
conjunction with the SIMULATE command, the item parameter file must include the chance
parameters. This option has the same effect as CPARMS in the FULL command.

Format

CHANCE

Related topics

 FULL command: CPARMS keyword (Section 5.3.11) .


 SIMULATE command (Section 5.3.23)


FILE keyword

Purpose

To specify the name (enclosed in single quotes) of the file containing item parameters for
scoring. The name may include a path and a filename extension, but the total length may not
exceed 128 characters. The drive and directory path should be included if the data file is not
in the same folder as the command file. For more information, see Section 3.2.6.

This file has the same format as the trial values file produced by the TRIAL option in the
SAVE command, i.e., chance value, intercept, and slopes.

The layout of this file is as follows:

First record:

A variable format statement (in parentheses) describing the item parameter column
assignments.

Following records:

 Without chance parameter: intercept and factor slopes and/or standard difficulty loadings.
 With chance parameter: intercept or factor slopes, and standard difficulty loadings.

Format

FILE='filename'

Default

None.

Related topics

 SAVE command: TRIAL option (Section 5.3.20)

LIST keyword

Purpose

To specify the number of leading cases for which factor scores will be listed in the program
output. If FSCORES appears in the SAVE command, factor scores for all cases will be saved in
the file with the same name as the command file, but with the extension *.fsc.


Format

LIST=n

Default

Factor scores of all cases will be listed.

Related topics

 SAVE command: FSCORES option (Section 5.3.20)

LOADINGS keyword

Purpose

To specify that the parameter file contains item standard difficulties and factor loadings.

Format

LOADINGS

Default

The parameter file contains item intercepts and slopes.

METHOD keyword

Purpose

To specify the method of estimation.

Format

METHOD=n

n = 1: Maximum A Posteriori (MAP) estimation

n = 2: Bayes (EAP) estimation


MISSING option

Purpose

The OMIT keyword on the BIFACTOR command will be automatically set to MISSING if the
TIME option has been selected in the TETRACHORIC command.

Format

MISSING

Related topics

 TETRACHORIC command: TIME option (Section 5.3.27)

NFAC keyword

Purpose

To specify the number of factors when estimating factor scores from a user-supplied file of
parameter values. If the keyword NFAC appears, a parameter file must be designated by the
FILE keyword of the SCORE command and available for reading.

Format

NFAC=n

Default

Factor scores will be computed from parameters in the current specified analysis.

Related topics

 SCORE command: FILE keyword

PARAM keyword

Purpose

To specify the number of parameter values (intercept and factor slopes) supplied by the user
for estimating factor scores, where n = f + 1, with f being the number of factor loadings.

Format

PARAM=n


Note

 Required if scale score estimates for each subject are desired, without factor analysis.
 PARAM must not be used if the FACTOR command or the FULL command is included.
 If the PARAM keyword is invoked, the parameter file must be designated with the FILE
keyword in the SCORE command and available for reading.
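
For example, a scoring-only run for a three-factor solution (the file name is hypothetical,
and PARAM = 3 + 1) might use:

>SCORE LIST=10, NFAC=3, PARAM=4, FILE='OLDJOB.PAR';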

Related topics

 FACTOR command (Section 5.3.9)


 FULL command (Section 5.3.11)
 SCORE command: FILE keyword

SPRECISION keyword

Purpose

To control the EAP and MAP precision in the calculation of factor scores.

Format

SPRECISION=n

Default

0.0001.

QUAD keyword

Purpose

To specify the number of quadrature points in EAP estimation.

Format

QUAD=n

Default

 1 factor: 10
 2 factors: 5
 3 factors: 3
 etc.


TIME option

Purpose

To specify that omitted items after the last non-omitted item should be treated as not-
presented. Tetrachoric correlation coefficients will be computed with the TIME option, even
if TIME has not been specified in the TETRACHORIC command.

Format

TIME

Related topics

 TETRACHORIC command: TIME option (Section 5.3.27)


5.3.22 SELECT command

(Optional)

Purpose

To specify items to be selected and/or reordered for each problem. Requires the SELECT
keyword of the PROBLEM command to be set to n', the number of items selected.

Format

>SELECT item1, item2, ..., itemn';

Notes

If the SELECT command is used in a given problem, all commands following it in the same
problem will pertain only to the selected and/or reordered set of items. Rules for selecting
and reordering of items:

 Selection is made by listing the original order-numbers of the desired items. For example,
from the items 1, 2, 3, 4, 5, 6, 7, 8, 9, the items 1, 4, 5, 7, 8, 9 might be selected.
 The selected items can be in any order. For example, the items could have been selected in
the order 7, 5, 9, 1, 8, 4.
 If all the items (n) are to be reordered, n' = n, and the selection list will contain the
original n item numbers in the new order.
 Contiguous items may be entered with a “(1)” between the first and last item numbers. For
example, 10(1)34 would select all items numbered 10 through 34.
 To select every b-th item from a to c, write a(b)c. For example, 1(2)99 will select every
odd-numbered item from a 100-item test.
 Each item’s name and the answer key will be selected and/or reordered at the same time
as the item.

Example

>SELECT 10,9,11,2(1)5,15,14,13,12;

From an original list of 20 items, 11 items are to be selected (n = 20; n' = 11).

Default

All items will be included in the analysis in their original order.


Related topics

 PROBLEM command: SELECT keyword (Section 5.3.17)


5.3.23 SIMULATE command

(Optional)

Purpose

To simulate item response records of cases drawn from a multivariate latent distribution of
factor scores with user-specified vector mean and fixed correlation matrix. The user must
supply standard item difficulties and NFAC factor loadings (or intercepts and factor slopes)
for each item. If a model with chance correct responses is specified, the probabilities of
correct responses must also be supplied. The factor loadings must be orthogonal, e.g.,
principal factors. If desired, the means of the factors can be set to arbitrary values to
simulate group effects. The default mean value is 0.0.

Format

>SIMULATE NFAC=n, NCASES=n, SCORESEED=n, ERRORSEED=n, GUESSSEED=n,
          CHANCE, LOADINGS/SLOPES, FILE='filename', MEAN=(list),
          FORM=n, GROUP=n, PARM=n;

Notes

 The simulated item responses will be saved in the file with the name <jobname>.sim.
 The communalities of the factor loadings must be less than 1.0.
 For simulation, only the TITLE, PROBLEM, SIMULATE, and CONTINUE or STOP commands
are required; the NAMES command is optional.
 There must be no SAVE or INPUT command.
 Response codes in the simulated data are 1 for correct and 0 for incorrect.

Examples

>SIMULATE NFAC=1, NCASES=2500, SLOPES, FILE='SIM01.PAR';

>SIMULATE NFAC=3, NCASES=2500, LOADINGS, CHANCE, FILE='SIM01C1G.PAR',
          MEAN=(0.8,0.5,0.5);

Default

No simulation.

Related topics

 TITLE command (Section 5.3.28)


 PROBLEM command (Section 5.3.17)
 CONTINUE command (Section 5.3.6)
 STOP command (Section 5.3.24)
 NAMES command (Section 5.3.14)


 SAVE command (Section 5.3.20)


 INPUT command (Section 5.3.12)

CHANCE option

Purpose

To indicate that the model allowing for correct responses by chance is assumed. The chance
parameters must be present in the parameter file.

Format

CHANCE

Default

Non-chance model.

ERRORSEED keyword

Purpose

To provide the seed of the random number for generating the univariate normal independent
uniqueness distributions of the items.

The random number generator seed may be any number greater than 1 and less than
2147483647.

Format

ERRORSEED=n

Default

453612.

FILE keyword

Purpose

To specify the name (enclosed in single quotes) of the file containing item parameters of the
simulation model. This name may include a path and filename extension, but the total length
may not exceed 128 characters.


The simulation parameter file must have the following layout when the CHANCE option is not
present:

 First record: variable format statement describing the fixed-column layout of the file.
 NITEMS following records: standard difficulty and NFAC factor loadings, or standard
difficulty and NFAC slopes.

If the CHANCE option is present, the chance probabilities should precede the standard
difficulties.
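
A minimal sketch of such a parameter file for a three-item, one-factor model without the
CHANCE option (all values hypothetical); the first record is the variable format statement,
and each following record gives a standard difficulty followed by one factor loading:

(2F8.5)
 0.50000 0.70000
-0.25000 0.60000
 1.10000 0.55000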

Format

FILE='filename'

Default

None.

Related topics

 SIMULATE command: CHANCE option


 PROBLEM command: NITEMS keyword (Section 5.3.17)
 FACTOR command: NFAC keyword (Section 5.3.9)

FORM keyword

Purpose

To provide, solely for convenience, a test form identification following the case number in
the simulation records; n may be set to any natural number.

Format

FORM=n

Default

1.

GROUP keyword

Purpose

To provide, solely for convenience, a group identification following the case number in the
simulation records; n may be set to any natural number.


Format

GROUP=n

Default

1.

GUESSSEED keyword

Purpose

To specify the seed of the random number for generating the independent probability of
chance success on an item response.

The random number generator seed may be any natural number greater than 1 and less than
2147483647.

Format

GUESSSEED=n

Default

543612.

LOADINGS/SLOPES option

Purpose

To indicate the form in which the item parameters are provided:

 Select LOADINGS if the item parameters are supplied in the form of standard item difficulty
(i.e., the standard normal deviate corresponding to the population percent CORRECT),
followed by the NFAC factor loadings.
 Select SLOPES if the item parameters are in the form of standard item difficulty, followed
by NFAC factor slopes.

Format

LOADINGS/SLOPES

Default

LOADINGS.


Related topics

 FACTOR command: NFAC keyword (Section 5.3.9)

MEAN keyword

Purpose

To provide the population means of the factor scores from which the responses are
generated. These means will be added to the random standard normal deviates representing
the ability of each case on the corresponding factors. The maximum number of factors
allowed is 15.

Format

MEAN=(n1, n2, ..., nm)

Default

None.

NCASES keyword

Purpose

To specify the number of response records to be generated.

Format

NCASES=n

Default

1.

NFAC keyword

Purpose

To specify the number of factors in the model.

Format

NFAC=n


Default

1.

PARM keyword

Purpose

To specify the number of parameter values (intercept and factor loadings) supplied by the
user. n = f + 1, where f is the number of factor loadings.

Format

PARM=n

SCORESEED keyword

Purpose

To provide the seed of the random number for generating the multivariate normal ability
distribution. The mean and standard deviation of each variate are assumed to be zero and
one, respectively. The random number generator seed may be any natural number greater
than 1 and less than 2147483647.

Format

SCORESEED=n

Default

345261.


5.3.24 STOP command

(Required)

Purpose

To terminate the problem stack.

Format

>STOP

Note

A semicolon to signal the end of this command is not needed.


5.3.25 SUBTEST command

(Optional)

Purpose

To specify the partition of the main test into subtests and to assign names to the subtests.

Format

>SUBTEST BOUNDARY=(list), NAMES=(list);

Examples

>SUBTEST BOUNDARY=(10,20,30);

A test with 30 items will be partitioned into 3 subtests of 10 items each.

>SUBTEST BOUNDARY=(10,20,30), NAME=(BASIC, AVERAGE, ADVANCED);

A test with 30 items will be partitioned into 3 named subtests of 10 items each:

Subtest Name Item Boundaries

BASIC 1 through 10

AVERAGE 11 through 20

ADVANCED 21 through 30

Default

No subtests.

BOUNDARY keyword

Purpose

To specify the order number of the last item in each subtest. If the SELECT command is used
to reorder items before subtests are partitioned, boundaries are specified by the new order
numbers.

Format

BOUNDARY=(n1, n2, ..., np)

with n = order number of the last item in each subtest.


Default

None.

Related topics

 SELECT command (Section 5.3.22)

NAMES keyword

Purpose

To specify the names, of no more than 8 characters each, for the subtests. Note that the
rules for naming items also apply to naming subtests (see the NAMES command, Section 5.3.14).

Format

NAMES=(n1, n2, ..., np)

Default

No names.

Related topics

 NAMES command (Section 5.3.14)


5.3.26 TECHNICAL command

(Optional)

Purpose

To change the value of the default constants in the item factor analysis.

Format

>TECHNICAL QUAD=n, SCQUAD=n, ITER=(list), PRV=n, FREQ=n, NITER=(list),
           NSAMPLE=n, MCEMSEED=n, QSCALE=n, QWEIGHT=n, IQUAD=n,
           ITLIMIT=n, PRECISION=n, NOADAPT, FRACTION, ACCEL=n, NOSORT;

Example

>TECHNICAL QUAD=5, ITER=(20,3,0.001);

Default

All program defaults.

ACCEL keyword

Purpose

To request acceleration of the full information analysis by three-point extrapolation between
EM cycles; n is the proportional step size of the acceleration.

Format

ACCEL=n

Default

1.0.

FRACTION option

Purpose

To invoke a three-point quadrature with an 81-point fractional factorial design. This option
is only applicable in the case of adaptive quadrature with five factors. Otherwise, the full
243-point design is used.


Format

FRACTION

FREQ keyword

Purpose

To specify whether to print the pattern frequencies.

Format

FREQ=n

 n = 0: Do not print observed and expected response pattern frequencies.
 n = 1: Print observed and expected response pattern frequencies.

Default

0.

IQUAD keyword

Purpose

To specify the type of quadrature:

 n = 1: Gauss-Hermite quadrature.
 n = 2: Gauss-Hermite quadrature; the quadrature points as well as the weights are printed.
 n = 3: Quadrature using ordinates.
 n = 4: Quadrature using ordinates; the quadrature points as well as the weights are printed.

Format

IQUAD=n

Default

n = 4.

ITER keyword

Purpose

To specify the parameters for EM cycles.


Format

ITER=(c,d,e)

 c: maximum number of EM cycles (min = 3).
 d: maximum number of iterations within the M-step.
 e: convergence criterion in the M-step.

Note

d and e are used only in non-adaptive quadrature; there is only one M-step per EM-cycle in
adaptive quadrature.

Default

(c,d,e)=(15,5,0.005).

ITLIMIT keyword

Purpose

To specify the number of EM cycles prior to the fixing of the conditional distributions. In
adaptive quadrature, the means and covariances of the conditional distribution of the factor
variables for each case are computed only in the first ITLIMIT EM cycles. Thereafter, the
conditional distributions are held fixed for each case. The change in the log likelihood
between EM cycles is computed and displayed only after fixing has occurred. In Monte Carlo
EM, the sampled points are fixed at their values for each case in the ITLIMIT cycle.

Format

ITLIMIT=n

n : The number of EM cycles prior to fixing.

Default

Adaptive: n = 10

Monte Carlo: n = 15.

MCEMSEED keyword

Purpose

To specify the seed of the random number for generating the random multivariate normal
variables used in Monte Carlo integration in the full information EM solution (min = 2;
max = 2147483646). If this keyword appears, the quadratures in the E-step of the EM cycles
will be performed by Monte Carlo integration; otherwise, fixed-point quadrature is used.

Format

MCEMSEED=n

Default

Fixed-point quadrature.

NITER keyword

Purpose

To specify the parameters for communality improvements.

Format

NITER=(h,i)

where

 h: Maximum number of iterative communality improvements; must be between 0 and 5,
inclusive.
 i: Convergence criterion for communality improvements.

Default

(h,i)=(3, 0.001).

NOADAPT option

Purpose

To specify that non-adaptive quadrature be performed in the full information solution. Note
that this option can only be invoked if there are 5 or fewer factors; with more than 5 factors,
the option, if present, will be ignored and adaptive fractional quadrature will be performed
(with 3 points per dimension). If the NOADAPT option is not invoked, all quadrature is
adaptive.

Format

NOADAPT


NOSORT option

Purpose

To suppress the sorting of response patterns with respect to their number-correct scores. In
non-adaptive quadrature, such sorting can be used to speed computation. As it has no
advantage in adaptive quadrature, in Monte Carlo integration, or in the BIFACTOR solution,
NOSORT is always used in these solutions.

Format

NOSORT

Related topics

 BIFACTOR command

NSAMPLE keyword

Purpose

To specify the number of points sampled in the latent variable space when the numerical
integration used in the marginal maximum likelihood procedure is based on a fractional
factorial design. For example, if the number of factors equals 4, a fractional factorial design
requires 3^4 = 81 points. Likewise, when five factors are specified, the number of points is
3^5 = 243.

Format

NSAMPLE=n

Default

Set by program, value is written to the output file. Maximum = 243.

PRECISION keyword

Purpose

To specify the convergence criterion for change between EM cycles.

Format

PRECISION=n


Default

One-third of the maximum number of EM cycles (see the ITER keyword of the TECHNICAL
command).

Related topics

 TECHNICAL command: ITER keyword

PRV keyword

Purpose

To control the printing of provisional estimates.

Format

PRV=n

n = 0 No provisional estimates of slope and intercept parameters are printed.

n = 1 Provisional estimates of slope and intercept parameters are printed after each E-step.

n = 2 Provisional estimates of slope and intercept parameters are printed after each M-step
iteration.

n = 3 Provisional estimates of slope and intercept parameters are printed after each E-step
and M-step iteration.

n = 4 Provisional estimates of slope and intercept parameters and their corrections are
printed as in 3.

Default

0.

QSCALE keyword

Purpose

To set the value of the extreme points in adaptive quadrature when QUAD or SCQUAD equals 3.


n: absolute value of the extreme points (-n, 0.0, +n).

Format

QSCALE=n

Default

1.2.

Related topics

 TECHNICAL command: QUAD keyword
 TECHNICAL command: SCQUAD keyword

QUAD keyword

Purpose

To specify the number of quadrature points, 1 to 10, per dimension in the full information
solution.

Format

QUAD=n

Default

Depends on the number of factors; see program output.

QWEIGHT keyword

Purpose

To set the value of the weights of extreme points in three-point quadrature.

m: The weights (m, 1 − 2m, m) are assigned to the points; m must be fractional.

Format

QWEIGHT=m

Default

The weights are normal ordinates at (-n, 0.0, +n).


SCQUAD keyword

Purpose

To specify the number of quadrature points for EAP estimation of factor scores.

Format

SCQUAD=n

Default

Depends on number of factors; see program output.


5.3.27 TETRACHORIC command

(Optional)

Purpose

To specify how to form the count matrix that is used in calculating tetrachoric correlations.

Format

>TETRACHORIC RECODE/PAIRWISE/COMPLETE, TIME, LIST, CROSS, NDEC=n;

Examples

>TETRACHORIC COMPLETE, LIST, TIME;

>TETRACHORIC PAIRWISE, LIST, TIME;

>TETRACHORIC RECODE, NDEC=3, LIST, TIME;

Default

No correlations computed.

CROSS option

Purpose

To ensure the joint frequencies for each pair of items will appear in the output.

Format

CROSS

Default

No joint frequencies are listed.

LIST option

Purpose

To ensure that the matrix of tetrachoric correlations (and possibly warning messages) will
appear in the printed output. This correlation matrix may be saved even when it is not listed
(see the SAVE command).


Format

LIST

Default

No listing of the tetrachoric correlations is provided.

Related topics

 SAVE command (Section 5.3.20)

NDEC keyword

Purpose

To specify the number of decimal places printed in tetrachoric correlation coefficients.

Format

NDEC=n

Default

3.

RECODE/PAIRWISE/COMPLETE option

Purpose

To specify the treatment of observations that include omits. One of the following options
may be selected:

 RECODE: Omits will be coded as wrong responses.


 PAIRWISE: All observations will be used. Wherever an omit response occurs, the item will
be ignored. A pair of responses containing a “not-presented” item will be excluded from
the calculation.
 COMPLETE: Only observations with no omit responses will be used.

Format

RECODE/PAIRWISE/COMPLETE


Default

RECODE.

TIME option

Purpose

To specify that omitted items following the last non-omitted item be treated as
not-presented. All omitted items prior to the last non-omitted item will be recoded as
“wrong” if the guessing mode is not selected. If the guessing mode is selected, these items
will be scored “correct” with probability $g_j$ and “incorrect” with probability $1 - g_j$.

The TIME option does not affect RECODE, but if TIME is combined with COMPLETE or
PAIRWISE, different tetrachoric correlation coefficients will result.

Format

TIME

Related topics

 TETRACHORIC command: RECODE option


5.3.28 TITLE command

(Required)

Purpose

To provide a label that will be used throughout the output to identify the problem run.

Format

>TITLE
…text…
…text…

Notes

The TITLE command consists of three lines. The first line contains the TITLE command and
is followed by two lines of 80 characters maximum containing the title text. Using only one
title line will cause an error condition. If the title does not require two lines, leave the
second line blank.

A semicolon to end this command is not needed.

Example

>TITLE
ENGLISH LANGUAGE COMPREHENSION TEST
ITEM AND TEST STATISTICS
>PROBLEM…

Default

No default, title lines are required.

5.3.29 Variable format statement

The data layout must be described in a variable format statement. This statement is entered
within parentheses and follows immediately after the INPUT command.

When data (labels, raw data, summary statistics) are used in fixed format, a format statement is
needed to instruct the program how to read the data. The general form of such a statement is

(rCw) or (rCw.d),

where:

 r Repeat count; if omitted, 1 is assumed.


 C Format code
 w Field width, or number of columns
 d Number of decimal places (for F-format).

The following codes are used to indicate the type of value to be read:

 A Code for character values


 I Code for integer values
 F Code for real numbers

The format statement must be enclosed in parentheses. Blanks within the statement are ignored:
(rCw.d) is acceptable. The program also ignores anything after the right parenthesis and on the
same line. Thus, comments may be placed after the format statement.

Examples of format statements:

The labels HEIGHT, WEIGHT, AGE, and IQ could be read in fixed format as

(A6,A6,A3,A2)
HEIGHTWEIGHTAGEIQ

Or, with the same result, as

(4A6)
HEIGHTWEIGHT AGE IQ

Note that the first method lets the repeat count default to 1, and that it describes several different
fields, separated by commas, with one statement.

The following example shows three ways to read five integers, with the same result:

(5I1)
12345

(5I2)
1 2 3 4 5

(I1,I2,3I3)
1 2 3 4 5

The F-format requires that the number of decimal places be specified in the field description, so
if there are none (and eight columns) specify (F8.0); (F8) is not allowed. However, if a data
value contains a decimal point, then this overrides the location of the decimal point as specified
by the general field description. If the general field description is given by (F8.5), then
12345678 would result in the real number +123.45678, but the decimal point in –1234.56
would not change. Just a decimal point, or only blanks, will result in the value zero. The plus
sign is optional.


It is possible to use the format statement to skip over variables in the data file when they are not
needed in the analysis. For example, (F7.4,8X,2F3.2) informs the program that the data file
has 21 columns per record. The first value can be found in the first seven columns (and there are
four decimal places), then eight columns should be skipped, and a second and third value are in
columns 16 – 21, both occupying three columns (with two decimal places). Note that the SELECT
command allows selection and reordering of variables.

Another possibility is the use of the tabulator format descriptor T, followed by a column number
n. For example, (1F8.5,T60,2F5.1) describes three data fields: in columns 1 – 8, with five
decimal digits, next in columns 61 – 65 and 66 – 70, both with one decimal digit. If the number n
is smaller than the current column position, left tabbing results. A forward slash (/) in a
format means “skip the rest of this line and continue on the next line”. Thus, (F10.3/5F10.3)
instructs the program to read the first variable on the first line, then to skip the remaining
variables on that line and to read five variables on the next line.

Related topics

 SELECT command (Section 5.3.22)


 INPUT command (Section 5.3.12)


6 IRT graphics

6.1 Introduction

A new feature included with the IRT programs is the IRT GRAPHICS procedure. Item
characteristic curves, item and test information curves, and a histogram of the estimated abilities
may be plotted. A matrix plot showing all item characteristic curves simultaneously can also be
obtained. This feature is accessed via the Run menu on the main menu bar and becomes
available once the analysis has been completed. The plots are based on the contents of the
parameter files produced by the respective programs. In this chapter, an overview of the interface
and options of this feature is given.

6.2 Main menu

The Main window of the IRT GRAPHICS program is used to access the following graphics:

 Item characteristic curves through the ICC option


 Item information curves through the Information option
 ICC and item information curves on the same page through the ICC and Info option
 Total information curve through the Total Info option
 Simultaneous display of all Item Characteristic Curves (ICCs) through the Matrix Plot
option
 Regression of ability on the percentage correct through the Bivariate Plot option
 Histogram of estimated abilities through the Histogram option.

The graphs displayed may be selected, changed, saved to file, or printed using various options
and dialog boxes described in Section 6.3. To exit the program, click the Exit option on the
Main menu.


6.2.1 The ICC option

This option provides access to item characteristic curves for all the items in the test. In the image
below, the ICC for item 2 is displayed.

 As a nominal model was fitted in this case, the high category is displayed in red and a
message to this effect is displayed in the Category Legends box at the bottom of the
window. This field contains the legend for all categories plotted.
 The Next button provides access to following items, while the Prev button allows the user
to go back to previously viewed Item Characteristic Curves (ICCs).
 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options menus on the main menu bar. For more on the options available, see Section 6.3.


Related topics

 Manipulating and modifying graphs (see Section 6.3)


 Item characteristic curves (see Section 6.4)

6.2.2 The Information option

 This option provides access to item information curves for all the items in the test. In the
image below, the item information curve for the second item is displayed.
 The Scaling Information box at the bottom of the window contains information on the
scaling of the information axis. The item with the most information is indicated here for
all items in a test.
 The Next button provides access to following items, while the Prev button allows the user
to go back to previously viewed item information curves.
 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options options on the main menu bar. For more on the options available, see Section 6.3.


Related topics

 Manipulating and modifying graphs (see Section 6.3)


 Item information curves (see Section 6.5)

6.2.3 The ICC and Info option

When this option is selected from the Main menu, the ICC and item information curve for an
item are displayed simultaneously.

 As a nominal model was fitted in this case, the high category is displayed in red and a
message to this effect is displayed in the Category Legends box at the bottom of the
window. This field also contains information on the legend for all other categories plotted.
 The Next button provides access to following items, while the Prev button allows the user
to go back to previously viewed item curves.
 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options menus on the main menu bar. For more on the options available, see Section 6.3.


Related topics

 Manipulating and modifying graphs (see Section 6.3)


 Item characteristic curves (see Section 6.4)
 Item information curves (see Section 6.5)

6.2.4 The Total Info option

This option is used to access the test information and standard error curves.

 The total test information for a given scale score is read from the axis on the left of the
graph and is plotted in blue.
 The axis to the right of the graph is used for reading the standard error estimate for a given
scale score. The measurement error is shown in red.
 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The Next and Prev buttons may be used to access similar plots for multiple groups (if
any).
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options menus on the main menu bar. For more on the options available, see Section 6.3.


Related topics

 Manipulating and modifying graphs (see Section 6.3)


 Test information curves (see Section 6.6)

6.2.5 Matrix Plot option

This option provides an organized way of simultaneously looking at the item characteristic
curves of up to 100 items.

In the graph below, the ICCs of 35 items are plotted. As can be seen from the graph, models
fitted to the items range from the 1PL model to the nominal, graded, and multiple response
models. Item 1 is shown in the top left corner of the combined graph, as indicated by the item
numbers given to the right of the plots. The gray lines dividing each plot into four quadrants
are drawn at a probability of 0.5 (on the y-axis) and an ability of 0 (on the x-axis).


To take a closer look at item 20, to which a nominal response model was fitted, click and drag
the right mouse button to select the area for zooming as shown below.

Releasing the mouse button produces a closer look at the graph for item 20 as shown below.
Note that any part of the matrix of plots can be selected for zooming, and that the zoom option is
also available for already enlarged areas of the matrix such as that shown below.


Note that the high category is shown in red. To reset the image, double-click the right mouse
button.

 Up to 100 items can be simultaneously displayed. If the test contains more than 100 items,
return to the Main Menu and click the Matrix Plot button again for the next set of items.
 The graphs can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options menus on the main menu bar. For more on the options available, see Section 6.3.

Related topics

 Manipulating and modifying graphs (see Section 6.3)


 Item information curves (see Section 6.5)

6.2.6 The Histogram option

The Histogram option provides a histogram of the ability scores. This option is only available if
scoring has been requested and the scores have been saved to an external file.

As indicated in the legend box at the bottom of the window, abilities are rescaled to a mean of 0
and a standard deviation of 1. The area under the bell-shaped curve equals the total area of the
histogram.

 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The Next and Prev buttons may be used to access similar plots for multiple groups (if
any).
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options options on the main menu bar.


Related topics

 Manipulating and modifying graphs (see Section 6.3)

6.2.7 The Bivariate Plot option

The Bivariate Plot option provides a regression of ability on the percentage correct. This option
is only available if scoring has been requested and the scores have been saved to an external file.

 As with the matrix plots, segments of the plot may be inspected by zooming in. This is
done by clicking and dragging the mouse to select the area of interest.
 A 95% prediction interval for a new examinee is also shown on the plot.
 Use the Main Menu button at the bottom left of the window to return to the main menu.
 The graph can be selected, edited, saved, or printed using the File, Edit, Graphs, and
Options menus on the main menu bar. For more on the options available, see Section 6.3.
 If information is available for multiple groups, bivariate plots are available by group and
the Next and Prev buttons may be used to access the plots for following groups.


6.2.8 The Exit option

Use this option to exit the application.

6.3 Manipulating and modifying graphs

Displayed graphs can be modified, saved and printed by using menus available on the main
menu bar of the graph window.

6.3.1 File menu

The File menu controls the printing and saving of graphs.

 The Save as Metafile option is used to save the selected page or graph as a *.wmf
(Windows Metafile) for later use in other applications.
 Note that an entire page, including legend boxes, may be printed using the Print current
page option.
 Alternatively, the Show Selectors option on the Options menu may be used to select a
graph, after which the Print selected graph option of the File menu may be used to print
only the selected graph.
 The Printer Setup and Printing Options options provide access to the standard Windows
printing controls.


Related topics

 Options menu

6.3.2 Edit menu

The Edit menu is used for copying of graphs or entire pages to the Windows clipboard. To select
a graph, the Show Selectors option on the Options menu may be used.

Related topics

 Options menu

6.3.3 Options menu

The Options menu is used to enable graph selectors and to highlight a selected graph.

In the image below, both options have been enabled, and the selectors for the three areas of
the graph (the ICC, the item information curve, and the Category Legends box) are displayed
at the right of the window. The second graph has been selected, and this entire section of the
window is highlighted in dark red. This selected graph may now be saved or printed using
options on the File menu.


6.3.4 Graphs menu

The Graphs menu provides access to the Parameters and Fill Page options.

The Fill Page option is used to resize the graph to fill the entire window. The Parameters option
is used to change attributes of the graph displayed and is associated with the Graph Parameters
dialog box. This dialog box is used to change the position, size, and color of the currently
selected graph and its plotting area.

The following functions are defined:

 The Left, Top, Width, and Height edit controls allow the user to specify a new position
and size of the graph (relative to the page window) and of the plotting area (relative to the
graph window).
 The Color drop-down list boxes are used to specify the graph window color and the color
of the graph’s plotting area.
 If the Border check box is checked, the graph will have a border around it.
 If the Border check box is checked, the Border Attributes button leads to another
standard dialog box (the Line Parameters dialog box) that allows specification of the
thickness, color, and style of the borderline.

In addition to the Graph Parameters dialog box, a number of other dialog boxes may be used
to change attributes of graphs. The dialog boxes accessible depend on the type of graph
displayed. The dialog boxes are:

 Axis Labels dialog box
 Text Parameters dialog box
 Bar Graph Parameters dialog box
 Legend Parameters dialog box
 Line Parameters dialog box

The user may access any of these dialog boxes by double-clicking in the corresponding section
of the graph. For example, double-clicking in the legend area of the graph will activate the
Legend Parameters dialog box. Double-clicking on the title of the graph, on the other hand,
will provide access to the Text Parameters dialog box.

6.3.5 Axis Labels dialog box

This dialog box is used for editing axis labels and is activated by double-clicking on the axis
of a displayed graph.

The following functions are defined:

 The Labels Position group box controls the position of the labels relative to the axis or
plotting area.
 The Last Label group box allows manipulation of the last label drawing options. If On is
selected, the last label is displayed like the others. If Off is selected, it is not displayed. If
Text is selected, the text string entered in the edit box below will be displayed instead of
the last numerical label.
 The format of the numerical labels can be specified using the radio buttons in the Format
group box.
 The Date Parameters group box becomes active once the Date radio button is checked.
The Date Format box selects the date format to use for labels, while the Date Time Base
box selects the time base (minute, hour, day, week, month, year) for the date calculations.
The Starting Date drop-down list boxes specify the starting date that corresponds to the
axis value of 0. All dates are calculated relative to this value.
 If the Set Precision check box is not checked, the labels’ precision is determined
automatically. If it is checked, the number entered into the #Places field specifies the
number of digits after the decimal point.
 The Text Parameters button provides access to the Text Parameters dialog box (see
Section 6.3.10) that controls the font, size, and color of labels.

Related topics

 Text Parameters dialog box

6.3.6 Bar Graph Parameters dialog box

This dialog box is used for editing the parameters of all bars in a regular bar graph, or a selected
group member of grouped bar graphs. It is displayed when a bar in the histogram (Histogram
option on the Main menu) is double-clicked.


It operates as follows:

 If the Border check box is checked, the bars have a border around them. In this case, the
Border Attributes button leads to the Line Parameters dialog box that controls border
thickness, color, and style.
 The Data button leads to the spreadsheet-style window for editing plotted data points
(shown below)


 The Hatch Style drop-down list box allows the user to choose the hatch style for bars.
 The Bar Color scrolling bars control the bar RGB color.
 The Position radio buttons control the bar position relative to the independent variable
values.
 The Width string field allows the user to enter the bar width in units of the independent
variable.

6.3.7 Legend Parameters dialog box

This dialog box allows the editing of legends. It opens when the mouse button is double-clicked
while the cursor is anywhere inside the legend box, except over a symbol representing a plotting
object.


This dialog box operates as follows:

 The Left, Top, Width, and Height edit controls allow the user to specify a new position
and size of the legend-bounding rectangle relative to the graph window.
 The Color drop-down menu specifies the legend rectangle background color.
 If the Border check box is checked, the rectangle will have a border. In this case, the
Border Attributes button leads to the Line Parameters dialog box that controls border
thickness, color, and style of the border line.
 The multi-line text box in the lower left corner lists and allows editing of each of the
legend text strings.
 The Text Parameters button leads to the Text Parameters dialog box discussed earlier.

Related topics

 Text Parameters dialog box

6.3.8 Line Parameters dialog box

This dialog box is used for editing lines in the graph. It is accessed via the Plot Parameters
dialog box, which is activated when a curve in a graph is double-clicked.

It has the following functions:

 The Color drop-down list box controls the line color.


 The Style drop-down list box, visible when activated, allows selection of a line style.
 The Width control specifies the line width, in window pixels.

Related topics

 Plot Parameters dialog box

6.3.9 Plot Parameters dialog box

The Plot Parameters dialog box is accessed when a curve is double-clicked.

 The type of line to be displayed may be changed using the Type drop-down list box.
 To fill the area under the curve, the Fill Area check box may be used.
 The type of curve fitted (spline or not) is controlled by the Spline check box.
 The Data button provides direct access to the data used to plot the curve.
 The Line Attributes button provides access to the Line Parameters dialog box (shown to
the right of the Plot Parameters dialog box below). The Line Parameters dialog box is
discussed elsewhere in this section.

Related topics

 Line Parameters dialog box

6.3.10 Text Parameters dialog box

This dialog box is used for editing text strings, labels, titles, etc. It can be called from some of the other dialog boxes controlling graphic features. It may also be activated by double-clicking on any text in a displayed graph.

The following functions are defined:

 The Text edit control allows the user to edit the text string.
 The Font drop-down list box allows control of the typeface.
 The text color can be selected from the Color drop-down menu.
 The size of the fonts (in points) is controlled by the Size drop-down menu.
 The Bold, Italic and Underline check boxes control the text style.

6.4 Item characteristic curves

The item characteristic curve is a nonlinear function that portrays the regression of the item score on the trait or ability measured in a test. It shows the relationship of the probability of success on the item to the ability measured by the item set or test containing the item.

In the case of binary data, a single curve is used to portray this relationship, and the difficulty,
discrimination and guessing parameters (where applicable) are indicated on the graph. In poly-
tomous models such as the graded response model and nominal response model, a number of
item option curves are plotted. Each curve shows the selection probability of a category of the
item as a function of the ability.

For a description of the models for which item characteristic curves or item option curves may be
obtained, see

Binary data:

 The one-parameter (1PL, Rasch) model (see Section 7.2)
 The two-parameter (2PL, Birnbaum) model
 The three-parameter (3PL, guessing) model

Polytomous data:

 Masters’ Partial Credit model (see Section 7.3.3)
 Thissen and Steinberg’s (1984) multiple response model (see Section 7.4.4)
 The nominal model (see Section 7.4.7)
 Samejima’s (1969) graded model (see Section 7.3.2)
 The rating-scale model (see Section 7.3.4)

6.5 Item information curves

Item information functions are dependent on ability and provide valuable insight into the differences in the precision of measurement at different ability levels. They are of particular interest in test construction, where these curves can be used to ensure the inclusion of items that maximize the precision of measurement at different levels of θ in the test.

In the case of a 1PL model, the item information function is given by (Hambleton & Swaminathan, 1985, Table 6-1)

D^2 \{1 + \exp[-D(\theta - b_i)]\}^{-1} \{1 - [1 + \exp(-D(\theta - b_i))]^{-1}\}

The maximum value of the information is constant for the one-parameter model and is attained at the point θ = b_i.

For a 2PL model, the item information function is given by (Hambleton & Swaminathan, 1985, Table 6-1)

D^2 a_i^2 \{1 + \exp[-D a_i(\theta - b_i)]\}^{-1} \{1 - [1 + \exp(-D a_i(\theta - b_i))]^{-1}\}

with the maximum value directly proportional to the square of the item discrimination parameter, a. A larger value of a is associated with greater information. The maximum information is obtained at b_i.

For the three-parameter model, the information function is (Hambleton & Swaminathan, 1985, Table 6-1)

\frac{D^2 a_i^2 (1 - c_i)}{\{c_i + \exp[D a_i(\theta - b_i)]\}\,\{1 + \exp[-D a_i(\theta - b_i)]\}^2}

The maximum information is reached at

b_i + \frac{1}{D a_i} \ln\left\{\tfrac{1}{2} + \tfrac{1}{2}\sqrt{1 + 8 c_i}\right\}

An increase in information is associated with a decrease in c_i. The maximum information is obtained when c_i = 0.
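To make these expressions concrete, the following Python sketch evaluates the information functions numerically. It is illustrative only, not BILOG-MG code; the parameter values are invented, and the general form I(θ) = [P′(θ)]²/[P(θ)Q(θ)] used here reduces to the 1PL and 2PL expressions above when c_i = 0.

```python
import numpy as np

D = 1.7  # scaling factor placing the logistic models in the normal metric

def item_info(theta, a, b, c=0.0):
    """Item information I = P'^2 / (P Q) for the 3PL model;
    c = 0 gives the 2PL case, and a = 1, c = 0 the 1PL case."""
    psi = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))   # 2PL response function
    p = c + (1.0 - c) * psi                            # 3PL response function
    dp = D * a * (1.0 - c) * psi * (1.0 - psi)         # slope P'(theta)
    return dp ** 2 / (p * (1.0 - p))

theta = np.linspace(-3.0, 3.0, 7)
print(item_info(theta, a=1.0, b=0.0))          # 1PL: maximum D^2/4 at theta = b
print(item_info(theta, a=1.5, b=0.5, c=0.2))   # 3PL: maximum slightly above b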

The slope of the item response function and the conditional variance at each ability level θ play an important role in terms of the information provided by an item. An increase in the slope, together with a decrease in the variance, leads to more information being obtained. This in turn provides a smaller standard error of measurement. By assessing these curves, items with large standard errors of measurement may be identified and discarded.

The contributions of both item and test information curves are summarized by Hambleton &
Swaminathan (1985) as follows:

“The item and test information functions provide viable alternatives to the classical concepts of reliability and standard error. The information functions are defined independently of any specific group of examinees and, moreover, represent the standard error of measurement at any chosen ability level. Thus, the precision of measurement can be determined at any level of ability that is of interest. Furthermore, through the information function, the test constructor can precisely assess the contribution of each item to the precision of the total test and hence choose items in a manner that is not contradictory with other aspects of test construction.”

The item characteristic and item information curves for two items to which a 3PL model has been fitted are shown below. The discrimination parameter for item 24 is approximately twice that of item 25, and the effect of this can be seen in the corresponding item information curves. Both item information functions were plotted on the same scale; the item in the test with the most information determines the scale.

Related topics

 The Information option (see Section 6.2.2)
 The ICC and Info option (see Section 6.2.3)

6.6 Test information curves

The test information function summarizes the information function for a set of items or test. The
contribution of each item in the test to the total information is additive, as can be seen from the
definition of the test information function

I(\theta) = \sum_{i=1}^{n} \frac{[P_i'(\theta)]^2}{P_i(\theta)\, Q_i(\theta)}

where P_i(θ) denotes the probability of an examinee responding correctly to item i given an ability of θ, and Q_i(θ) = 1 − P_i(θ).

The function provides information for a set of items at each point on the ability scale and the
amount of information is influenced by the quality and number of test items. As was the case for
the item information function, the item slope and item variance play an important role. An in-
crease in the slope and a decrease in the item variance both lead to more information being ob-
tained. This in turn provides a smaller standard error of measurement. Also note that the contri-
bution of each test item is independent of the other items in the test.

The amount of information provided by a set of test items at an ability level is inversely related
to the error associated with ability estimates at the ability level. The standard error of the ability

estimates at ability level θ can be written as

SE(\theta) = \frac{1}{\sqrt{I(\theta)}}.

An example of test information and measurement error curves is shown below. Note that the ver-
tical axis to the left is used to find the information at a given ability while the vertical axis to the
right serves a similar purpose for the standard error curve.

Related topics

 The Total Info option (see Section 6.2.4)

7 Overview and models

7.1 Overview of IRT programs

In this section, a brief overview of each of the four IRT programs published by Scientific Software International, Inc., is given. In subsequent sections, the models available and the statistics produced by each program are discussed in detail.

7.1.1 BILOG-MG

BILOG-MG (Zimowski, Muraki, Mislevy & Bock, 1996) is an extension of the BILOG (Mislevy
& Bock, 1990) program that is designed for the efficient analysis of binary items, including mul-
tiple-choice or short-answer items scored right, wrong, omitted, or not-presented. The program
performs the same analyses as BILOG in the single-group case. The BILOG-MG program im-
plements an extension of Item Response Theory (IRT) to multiple groups of respondents.

BILOG-MG is capable of large-scale production applications with unlimited numbers of items or respondents; it can perform item analysis and scoring of any number of subtests or subscales in a single program run. All the program output may be directed to text files for purposes of selecting items or preparing reports of test scores.

The program provides 1, 2, and 3 parameter logistic models for binary scored responses and ac-
commodates both nonequivalent groups equating for maintaining the comparability of scale
scores as new forms of the test are developed, and vertical equating of test forms across school
grades or age groups.

Analysis of differential item functioning (DIF) with respect to item difficulty associated with
demographic or other group differences may be performed with BILOG-MG, and provision is
made for the detection and correction for item parameter trends over time (DRIFT). In addition,
the BILOG-MG program provides for “variant items” that are inserted in tests for purposes of
estimating item statistics, but are not included in the scores of the examinees.

The present version of BILOG-MG includes a fully developed Windows graphical interface. Syntax can be generated or adapted using menus and dialog boxes or, as before, in the form of command files in text format. The interface has menu options arranged in the order in which the user would generally proceed: model specification is followed by data specification, technical specifications, etc. Each of the menu options provides access to a number of dialog boxes on which specifications are entered by the user.

7.1.2 PARSCALE

PARSCALE, written by Muraki & Bock (1996), is a versatile IRT rating-scale program. PARSCALE is capable of large-scale production applications with unlimited numbers of items or respondents. The program can perform item analysis for both dichotomous and polytomous
data, and scoring of any number of subtests or subscales in a single program run. Up to 15 categories can be accommodated by PARSCALE. The user has the option to use the normal ogive or
the logistic response function.

This program includes options to make adjustments for differences in rater severity, and to investigate DIF in rating-scale items. PARSCALE has the ability to mix rating-scale and multiple-choice items, with or without guessing, and to handle multiple subtests and weighted combinations of subtest scores. The program also provides the option to use Samejima’s graded response model generalized for rating scales, or Masters’ partial credit model with or without discriminating power coefficients.

Program output may be directed to text files for purposes of selecting items or preparing reports
of test scores.

7.1.3 MULTILOG

The most versatile of the SSI IRT programs is MULTILOG (Thissen, 1991). It applies to both
binary and multiple category item scores and makes use of logistic response models, such as
Samejima’s (1969) model for graded responses, Bock’s (1972) model for nominal (non-ordered)
responses, and Thissen & Steinberg’s (1984) model for multiple-choice items. The commonly
used logistic models for binary item response data are also included, because they are special
cases of the multiple category models. MULTILOG provides Marginal Maximum Likelihood
(MML) item parameter estimates for data in which the latent variable of IRT is random, as well
as Maximum Likelihood (ML) estimates for the fixed-effects case. χ 2 indices of the goodness-
of-fit of the model are provided. In IRT, the item parameter estimates are the focus of item
analysis. MULTILOG also provides scaled scores: ML and Bayes modal (MAP) estimates of the
latent variable for each examinee or response pattern.

MULTILOG is best suited to the analysis of multiple-alternative items, such as those on multi-
ple-choice tests or Likert-type attitude questionnaires. It is the only widely available program
capable of fitting a wide variety of models to these kinds of data using optimal (MML) methods.
MULTILOG also facilitates refined model fitting and hypothesis testing through general provi-
sions for imposing equality constraints among the item parameters and for fixing item parame-
ters at a particular value. MULTILOG may also be used to test hypotheses about Differential
Item Functioning with either multiple response or binary data, through the use of its facilities to
handle data from several populations simultaneously and test hypotheses about the equality of
item parameters across groups. It is the only IRT program that handles all the major models: 1, 2,
and 3 parameter logistic models, multiple nominal categories, graded rating-scale model, partial
credit model, multiple-choice model, and constrained parameter models. In contrast to previous
versions, it now analyzes models of any size up to the limit of available memory.

7.1.4 TESTFACT

TESTFACT is a factor analysis program for binary scored items. This program, by Bock, Gib-
bons, Schilling, Muraki, Wilson and Wood, implements all the main procedures of classical item
analysis, test scoring, and factor analysis of inter-item tetrachoric correlations, and also modern
methods of factor analysis based on IRT. It handles item selection, multiple subtests, multiple groups of examinees and correlation without an external criterion. The user can also compute
tetrachoric correlations with or without omitted or not-presented items, perform MINRES princi-
pal factor analysis and full information item factor analysis with likelihood ratio test of the num-
ber of factors, compute Bayes estimates of factor scores from the multidimensional IRT model,
and simulate item response data for the multidimensional model.

New features in TESTFACT are all part of Full Information Item Factor Analysis (FIFA). The commands and procedures of classical item statistics and classical factor analysis of tetrachoric correlation coefficients remain unchanged. The changes to full information item factor analysis
consist of a new and improved algorithm for estimating the factor loadings and scores—
specifically, new methods of numerical integration are used in the EM solution of the marginal
maximum likelihood equations. Four different methods of multidimensional numerical integra-
tion for the E-step of the EM algorithm are provided: adaptive quadrature, fractional adaptive
quadrature, non-adaptive quadrature, and Monte Carlo integration.

In exploratory item factor analysis, these methods make possible the analysis of up to fifteen fac-
tors and improve the accuracy of estimation, especially when the number of items is large. The
previous non-adaptive method has been retained in the program as a user-selected option
(NOADAPT), but the adaptive method is the default. The maximum number of factors with adap-
tive quadrature is 10; with non-adaptive quadrature, 5; with Monte Carlo integration, 15. Bayes estimates of scores for all factors can be computed either by the adaptive or the non-adaptive method. Estimation of the classical reliability of the factor scores is also included.

TESTFACT includes yet another full information method that provides an important form of
confirmatory item factor analysis called “bifactor” analysis. The factor pattern in bifactor analy-
sis consists of a general factor on which all items have some loading, plus any number of so-
called “group factors” to which non-overlapping subsets of items, assigned by the user, are as-
sumed to belong. The subsets typically represent small numbers of items that pertain to a com-
mon stem such as a reading passage or problem-solving exercise. The bifactor solution provides
Bayes estimation of scores for the general factor, accompanied by estimated standard errors that
properly account for association among responses attributable to the group factors.

7.2 Models in BILOG-MG


7.2.1 Introduction

The central concept of item response theory is that of the item response model. These models are
mathematical expressions describing the probability of a correct response to a test item as a func-
tion of the ability (or proficiency) of the respondent. For binary data, the response functions most
often encountered in IRT applications are the normal ogive and the logistic models. These are
discussed in Section 7.2.3. Multiple-group applications are considered in Section 7.2.2.

7.2.2 Multiple-group analyses

(This section was contributed by Michele Zimowski.)

Background for multiple-group models

In the multiple-group case, it is assumed that the response function of any given item is the same for all groups of subjects. In the DIF and DRIFT applications, however, we allow the relative difficulties of the items to differ from one group to another or one occasion to another. In that case, the b_j parameters will differ between groups, and we will have to detect and estimate the differences. Even in the presence of DIF and DRIFT, however, it is assumed that the item discriminating powers are the same from one group to another. In the other applications, such as nonequivalent groups equating or two-stage testing, we assume that both the locations and the slopes of items common to more than one group are equal. To satisfy this assumption, we would perform a preliminary DIF analysis and not use, in equating, items showing appreciable DIF.

The main difference between the single-group and multiple-group case is in the assumption about the latent distribution. In most equating situations, it is reasonable to assume that the respondents in the sample groups are drawn from populations that are normal, but have different means and standard deviations (see Figure 7.1).

Figure 7.1: Normal latent densities in three populations

In that case, the item response data can be described completely by estimating the means and
standard deviations of the groups along with the item parameters. One must, however, again con-
tend with the arbitrary origin and unit of the latent continuum, and may resolve this indeter-
minacy either by setting the mean and standard deviation of one of the groups to any arbitrary
values, or by setting the overall mean and variance of the combined distributions to arbitrary val-
ues. Both options are provided in BILOG-MG. The procedure for simultaneous estimation of item parameters and latent distributions in more than one group is described in Bock & Zimowski (1995) and in Mislevy (1987).

In two-stage testing applications, the situation is different. The groups correspond to examinees
who have been selected on the basis of a first-stage test to receive second-stage test forms tai-
lored to the provisional estimate of ability based on the first-stage test. Typically, the second-
stage groups are determined by cutting points on the θ-scale of the pretest. Because the pretest score is a fallible criterion, the θ distributions of the second-stage groups may overlap to a con-
siderable extent, but they cannot be expected to be normal even when the population from which
the examinees originate is normal. More likely in these applications the latent distributions
would appear as in Figure 7.2.

To accommodate such arbitrary shapes of distributions, one must make use of the empirical es-
timation procedure (see the section on estimation in the next chapter). As in the single-group
case, these empirical distributions can be estimated along with the item parameters by marginal
maximum likelihood. Again, the indeterminacy of location and scale must be resolved, either by
setting the mean and standard deviation of one of the groups to convenient values, such as zero
and one, or setting the overall mean and standard deviation of the combined distributions to simi-
lar values. In DIF analysis of ethnic effects, for example, the usual approach is to assign the
mean and standard deviation of the reference group, which is usually the majority demographic
group.

Figure 7.2: Two-stage testing: latent densities of three second-stage groups

In two-stage testing applications, where the groups represent an arbitrary partition of the original
sample, assigning the overall mean and standard deviation is more reasonable. In vertical equat-
ing and DRIFT analysis, on the other hand, the groups correspond to distinct populations, so the
best solution would be to choose a reference group, perhaps the youngest-age group or the first-
year group, and assign the mean and standard deviation arbitrarily in that group. Comparing the
estimated means and standard deviations of the remaining groups with the reference group would
then show the trends in the mean and variability of test performance in successive age groups or
year groups.

Equivalent groups equating

Equivalent groups equating refers to the equating of parallel test forms by assigning them ran-
domly to examinees drawn from the same population. In educational applications, this type of
assignment is easily accomplished by packaging the forms in rotation and distributing them
across whatever seating arrangement exists in the classroom. Provided there are fewer forms than
students per classroom, it is justifiable to assume that the abilities of the examinees who receive the various forms are similarly distributed in the population. This is the assumption on which the
classical equi-percentile method of equating is based, and it applies also to IRT equating.

Indeed, the procedure is even simpler in IRT because the latent distribution of ability is invariant
with respect to the distribution of item difficulties in the forms (this is not true of the number-right score of classical test theory: the test score distribution in the population of respondents is an artifact of the distribution of item difficulties; see Lord & Novick, 1968, pp. 387–392). The
IRT scale scores computed from the various forms are therefore equated whenever their location
and scale are set in the same way for all forms. There is no necessity for common items between
forms, any more than there is for equi-percentile equating, but neither will they interfere with the
equivalent groups equating if present.

The method of carrying out equivalent groups equating is somewhat different, however, accord-
ing to whether common items between forms are or are not present. In both cases, the collection
of forms may be treated as if it were one test with length equal to the number of distinct items
over all forms. The data records are then subjected to a single-group IRT analysis and scoring.
When common items are not present, each form may also be analyzed as an independent test,
with the mean and standard deviation of the scale scores of all forms set to the same values dur-
ing the scoring phase.

Equivalent groups equating is especially well suited to matrix-sample educational assessment, where the multiple test forms are created by random assignment of items to forms within each of
the content and process categories of the assessment design, and the forms are distributed in ro-
tation in classrooms. Often as many as 30 forms are produced in this way in order to assure high
levels of generalizability of the aggregate scores for schools or other large groups of students.

Nonequivalent groups equating

Nonequivalent groups equating is possible only by IRT procedures and has no counterpart in
classical test theory. It makes stronger assumptions than equivalent groups equating, but it re-
mains attractive because of the economy it brings to the updating of test forms in long-term test-
ing programs. Either to satisfy item disclosure regulations or to protect the test from compro-
mise, testing programs must regularly retire and replace some or all of the items with others from
the same content and process domains. They then face the problem of equating the reporting
scales of the new and old forms so that the scores remain comparable.

Although equivalent groups equating will accomplish this, it requires a separate study in which
the new and old forms are administered randomly to examinees from the same population. A
more economical approach is to provide for a subset of items that are common to the old and
new forms, and to employ nonequivalent groups equating to place their scores on the same scale.
These common or “link” items are chosen from the old form on the basis of item analysis results.
Link items should have relatively high discriminating power, middle range difficulty, and should
be free of any appreciable DIF effect. With suitable common items included, the old and new
forms can be equated in data from the operational administration of the tests without an addi-
tional equating study. Only the BILOG-MG program can perform this type of equating.

Although the case records from the current administration of the new form and the earlier ad-
ministration of the old form are subjected to a single IRT item analysis in nonequivalent equat-
ing, the test form is identified on each case record and separate latent distributions are estimated
for examinees taking different forms. For typical applications of the procedure to unrestricted
samples of examinees, the latent distributions may reasonably be considered normal. In that case,
the estimation of the mean and standard deviation of each distribution jointly with the item pa-
rameters allows for the nonequivalence of the two equating groups. The common items provide
the link between the two samples of data so that we may fix the arbitrary origin and unit of a sin-
gle reporting scale. Simulation studies have shown that if the sample sizes for the two groups are
large enough to ensure highly precise estimation of the item parameters, as few as four anchor
items can accurately equate the reporting scales for the test forms (see Lord, 1980).

In the BILOG-MG procedure, this method of equating can be extended to nonequivalent groups
equating of any number of such forms, provided there are common items linking the forms to-
gether in an unbroken chain. An example of a plan for common item linking of a series of test
forms is shown in Figure 7.3.

Figure 7.3: An item linking design for test forms updating

Variant items

If total disclosure of the item content of an educational test is required, a slightly different strat-
egy is followed. Special items, called “variant” items, are included in each test form but not used
in scoring the form in the current year. It is not necessary that all test booklets contain the same
variant items; subsets of variant items may be assigned in a linked design to different test book-
lets in order to evaluate a large number of them without unduly increasing the length of a given
test booklet. These variant items provide the common items that appear among the operational
items of the new form, which itself includes other variant items in anticipation of equating to a
later form. The item calibration of the old and new form then includes, in total, the response data
in the case records for the operational items of the old form, for the linking variant items that ap-
peared on the old form, and for all operational items from the new form. In this way, all of the
items in the current test form can be released as soon as testing is complete.

Vertical equating

Vertical equating refers to the creation of a single reporting scale extending over a number of school grades or age groups. Because the general level of difficulty of items in tests intended for such groups must increase with the grade or age, the forms cannot be identical. There is little difficulty in finding items that are suitable for neighboring grades or age groups, however, and these provide the common items that can be used to link the forms together on a common scale. Inasmuch as these types of groups necessarily have different latent distributions, nonequivalent groups equating is required. BILOG-MG offers two methods for inputting the re-
sponse records. In the first method, each case record spans the entire set of items appearing in all
the forms, but the columns for the items not appearing in the test booklet of a given respondent
are ignored when the data are read by the program. All of the items thus have unique locations in
the input records and are selected from each record according to the group code on the record. In
the second method, the location of the items in the input records is not unique. An item in one
form may occupy the same column as a different item in another form. In this case, the items are
selected from the record according to the form and the group codes on the record. These methods
of inputting the response records apply in all applications of BILOG-MG. See Chapter 10 for ex-
amples of both types of data input.

Differential item functioning (DIF)

The purpose of differential item functioning analysis is to detect and estimate interactions be-
tween item difficulties and various subgroups within the population of respondents (see Thissen,
Steinberg, & Wainer, 1993). It is most often applied to interactions with respect to demographic
or ethnic groups and to gender, but any classification of the respondents could be investigated in
a similar manner. Specifically, it is the interactions of the item location parameters, b_j, reflecting the item difficulties, that are in question. DIF includes only the relative differences in difficulties between the groups. Any reduction of the item percents correct due to the average level of ability in the group, as indicated by the mean of the corresponding latent distribution, we attribute to the “adverse impact” of the test and do not regard it as DIF. Moreover, we assume that the differential item functioning does not extend to the item discriminating powers. The b_j parameters for the separate groups are estimated on the assumption that the slope parameters, a_j, are homogeneous across groups. (For an alternative form of DIF analysis that includes differential item discriminating power, see Bock, 1993.)

DIF analysis is similar to nonequivalent groups equating in the sense that different latent distri-
butions are assumed for the groups in question, but it differs because the same form of the test is
administered in all of the groups. It also provides a large sample standard error estimate of the
effect estimators. In addition, the program provides an overall marginal likelihood ratio test of
the presence of differential item functioning in the data. To perform this test, first analyze the data in a single group as if they came from the same population and note the marginal maximum log likelihood of the item parameters in the final iteration (labeled –2 LOG LIKELIHOOD in the output). Then, analyze the data in separate groups using the DIF model and again note the final log likelihood. Under the null hypothesis of no DIF effects on item locations, the difference in these log likelihoods is distributed in large samples as χ² with (n − 1)(m − 1) degrees of freedom, where n is the number of items and m is the number of groups. When this χ² is significant, there is evidence that differential item effects are present. Their interpretation usually becomes clear when the item content is examined in relation to the direction of the estimated contrasts in the b_j parameters, because these contrasts are interactions and must sum to zero (some are positive and others negative).

Item parameter drift (DRIFT)

As defined by Bock, Muraki & Pfeiffenberger (1988), DRIFT is a form of DIF in which item difficulty interacts with the time of testing. It can be expected to occur in educational tests when the same items appear in forms over a number of years and changes in the curriculum or instructional emphasis interact differentially with the item content (see Goldstein, 1983). Bock, Muraki & Pfeiffenberger found numerous examples of DRIFT among the items of a form of the College
Board’s Advanced Placement Test in Physics that had been administered annually over a ten-
year period (see Figure 7.4). DRIFT is similar to DIF in admitting only the item interaction:
changes in the means of the latent distributions of successive cohorts are attributed to changes in
the levels of proficiency of the corresponding population cohorts.

Figure 7.4: Drift of the location parameters of two items from a College Board Advanced
Placement Examination in Physics

DRIFT differs from DIF in that the interaction of item location with time is assumed to be a continuous process that can be modeled by a straight line or a polynomial regression of low degree. Thus, in place of estimating contrasts between groups, we estimate the coefficients of the linear or polynomial function in time that describes the DRIFT in the b_j parameters. The significance of the trends can be judged from the size of the estimated regression coefficient relative to its large sample standard error estimate. The overall presence of DRIFT can be tested in a marginal likelihood ratio test similar to that for DIF.

As implemented in BILOG-MG, DRIFT analysis does not require all items to be included in
each test form. The DRIFT regression functions are estimated for whatever time points are avail-
able for each item. In most DRIFT applications, it is satisfactory to assume that the latent distri-
butions of the yearly cohorts are normal. The corresponding means and standard deviations esti-
mated in the DRIFT analysis describe differences in the proficiencies of the cohorts.

Two-stage testing

Two-stage testing is a type of adaptive item presentation suitable for group administration. By
tailoring the difficulties of the test forms to the abilities of selected groups of examinees, it permits a reduction in test length by a third or a half without loss of measurement precision. The procedure employs some preliminary estimate of the examinees’ abilities, possibly from a short first-stage test or other evidence of achievement, to classify the examinees into three
or four levels of ability. Second-stage test forms in which the item difficulties are optimally cho-
sen are administered to each level. Forms at adjacent levels are linked by common items so that
they can be calibrated on a scale extending from the lowest to the highest levels of ability. Simu-
lation studies have shown that two-stage testing with well placed second-stage tests is nearly as
efficient as fully adaptive computerized testing when the second-stage test has four levels (see
Lord, 1980).

The IRT calibration of the second-stage forms is essentially the same as the nonequivalent forms
equating described above, except that the latent distributions in the second-stage groups cannot
be considered normal. This application therefore requires estimation of the location, spread, and
shape of the empirical latent distribution for each group jointly with the estimation of item pa-
rameters. During the scoring phase of the analysis, these estimated latent distributions provide
for Bayes estimation of ability combining the information from the examinee’s first-stage classi-
fication with the information from the second-stage test. Alternatively, the examinees can be
scored by the maximum likelihood method, which does not make use of the first-stage informa-
tion. The BILOG-MG program is capable of performing these analyses for the test as a whole, or
separately for each second-stage subtest and its corresponding first-stage test. For an example of
an application of two-stage testing in mathematics assessment see Bock & Zimowski (1989).

Estimating latent distributions

An innovative application of the BILOG-MG program is the estimation, from matrix sampled
assessment data, of the latent distributions for schools or other groups of students. Certain matrix
sampling designs, such as those employed by the National Assessment of Educational Progress,
include in each booklet a number of short scales, consisting of eight or nine items, in several sub-
ject-matter areas. These scales have too few items to permit reliable estimation of the profi-
ciencies of individual examinees in each subject matter, but they do allow estimation of the latent
distribution of each proficiency at the group-level if the number of respondents is sufficiently
large. There is a tradeoff between the number of items for each scale in each test booklet and the
number of respondents: the more items, the fewer respondents are needed for accurate estimation
of the group latent distribution.

If each booklet contains perhaps 48 items, the latent distributions for six content areas could be
estimated simultaneously. The results of the assessment could then be reported to the public in
terms of the means and standard deviations of the achievement levels of the schools or groups.
Alternatively, if achievement standards have been set in terms of IRT scale score levels, the per-
cent of students attaining or exceeding each level can be computed from the latent distribution
and reported. The latter form of reporting is often more easily understood than scale-dependent
statistics such as the mean and standard deviation. Because the BILOG-MG program allows
unlimited numbers of groups as well as unlimited numbers of items and respondents, it is well
suited to the estimation of latent distributions for this form of reporting. The shape of the latent
distributions may either be assumed normal or estimated empirically.

7.2.3 Technical details

The normal ogive model

A response to a binary test item j is indicated in these expressions by the item score,

x_j = 1 if the respondent answers correctly, or

x_j = 0 if the respondent answers incorrectly.

Let θ denote the ability of the person, and let the probability of a correct response to item j be represented by

P(x_j = 1 | θ) = P_j(θ);

and thus, the probability of an incorrect response is given by

P(x_j = 0 | θ) = 1 − P_j(θ).

In general, the response function also depends upon one or more parameters characteristic of the
item, the values of which must be estimated.

The normal ogive model is defined as:

P_j(\theta) = \frac{1}{\sqrt{2\pi}} \int_{-(\theta - b_j)/\sigma_j}^{\infty} e^{-t^2/2}\, dt,

where σ_j = 1/a_j is called the item dispersion, a_j is the item discriminating power and b_j is an item location parameter. The normal ogive model is conventionally represented as Φ_j(θ).

The logistic models for binary scored items

At present, the response models most widely used in applied work are the logistic models for bi-
nary scored items. The most important of these models are:

 The one-parameter (1PL, Rasch) model
 The two-parameter (2PL, Birnbaum) model
 The three-parameter (3PL, guessing) model

The one-parameter (1PL, Rasch) model

The one-parameter logistic model is defined as

P_{1j}(\theta) = \frac{1}{1 + \exp[-a(\theta - b_j)]}

where

exp(k) = e^k and e = 2.718 is the base of the natural logarithm,

a is a scale constant determining the units of θ, and

b_j is a location parameter related to the difficulty of item j (also referred to as the item “threshold”). Items with larger values of b_j are more difficult; those with smaller values are easier.

The two-parameter (2PL, Birnbaum) model

The two-parameter logistic model is defined as

P_{2j}(\theta) = \frac{1}{1 + \exp[-a_j(\theta - b_j)]}

where a_j is the item discriminating power, and b_j is an item location parameter as in the 1PL model.

The negative of the exponent in this model,

z_j = a_j(\theta - b_j),

is referred to as a logistic deviate, or logit. The logit can also be written as z_j = a_jθ + c_j, where c_j = −a_jb_j. In this form, a_j is referred to as the item slope and c_j as the item intercept (see Figure 7.5).

Figure 7.5: The two-parameter logistic model

The 2PL model is conventionally represented as

\Psi_j(\theta) = \frac{1}{1 + e^{-z_j}}

If all a_j are equal, the model reduces to a one-parameter logistic or Rasch model.

The three-parameter (3PL, guessing) model

In the case of multiple-choice items, an examinee who does not know the correct alternative may succeed in responding correctly by randomly guessing. If the examinee’s ability is θ, the probability that the examinee will not know the answer, but guesses correctly (with probability g_j), is g_j[1 − Ψ_j(θ)]. The probability that the examinee will respond correctly either by knowledge or by random guessing is therefore

P_{3j}(\theta) = g_j[1 - \Psi_j(\theta)] + \Psi_j(\theta) = g_j + (1 - g_j)\Psi_j(\theta),

where g_j is the probability of a correct response to a multiple-choice item as a result of guessing. If the correct response alternative is randomly assigned, and all of the examinees guess blindly, the value of g_j is equal to 1/A, where A is the number of alternatives of the multiple-choice item. If some of the examinees guess after eliminating one or more of the alternatives, g_j will be greater than 1/A by some amount that must be determined empirically along with the a_j and b_j or c_j parameters.

The parameter g_j corresponds to the lower asymptote of the item response function, P_{3j}(θ). This interpretation of g_j, as well as that of the other item parameters, is shown in Figure 7.6.

Figure 7.6: Three-parameter logistic model
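The behavior of the lower asymptote is easy to check numerically. The short Python sketch below, with made-up parameter values, shows P_{3j}(θ) approaching g_j = 1/A = 0.25 for a four-alternative item as ability decreases:

```python
import numpy as np

D = 1.7  # normal-metric scaling factor, introduced in the next subsection

def p3(theta, a, b, g):
    """3PL response function: g + (1 - g) * Psi(theta)."""
    psi = 1.0 / (1.0 + np.exp(-D * a * (theta - b)))
    return g + (1.0 - g) * psi

# a four-alternative item answered by blind guessing has g = 1/4
for t in (-4.0, -2.0, 0.0, 2.0, 4.0):
    print(f"theta = {t:+.1f}   P = {p3(t, a=1.0, b=0.0, g=0.25):.3f}")
# at very low ability the probability approaches the lower asymptote g
```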

Relationship between normal ogive and logistic models

The logistic item response models are closely related to the normal ogive model. In order to bring the logistic models into close agreement with the normal ogive model, the logit is multiplied by the factor D = 1.7. When D = 1.7 is used, the discrepancy between the normal response function and its logistic approximation is never greater than 0.01.

When the logit incorporates this factor, as in z_j = a_jD(θ − b_j), the models are said to be in the normal metric.
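This bound is easy to verify numerically; a quick check, assuming scipy is available:

```python
import numpy as np
from scipy.stats import norm

# maximum discrepancy between the normal ogive and its logistic
# approximation with D = 1.7, over a fine grid of the deviate z
z = np.linspace(-6.0, 6.0, 1201)
gap = np.abs(norm.cdf(z) - 1.0 / (1.0 + np.exp(-1.7 * z)))
print(gap.max())   # about 0.0095, i.e. never greater than 0.01
```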

Classical item statistics

BILOG-MG computes and uses classical item statistics as starting values for the iterative esti-
mation of the IRT parameters.

On the assumption that θ is distributed with zero mean and unit standard deviation in the popu-
lation of respondents, the normal ogive item parameters are related to the classical item statistics
as follows (see Lord & Novick, 1968, Sections 16.9 and 16.10).

Reliability index (item-trait correlation):

If one assumes a bivariate normal distribution of the population over the item and criterion vari-
ables, Richardson (1936) and Tucker (1946) have shown that

\rho_j = a_j / \sqrt{1 + a_j^2}, \qquad 0 \le \rho_j \le 1

where ρ_j is the biserial correlation between ability and item j. In classical item analysis, ρ_j is estimated by the item-test correlation (the correlation between response to the item scored 1 or 0 and number-right score for the test).

We see from the equation above that an item with slope 1 (in the normal metric) has a reliability index equal to 1/\sqrt{2} = 0.707. Items with slopes greater than 1 are more reliable (more discriminating measures of the trait represented by the test); those with slopes less than 1 but greater than zero are less reliable. Items with a negative slope are keyed in a direction opposite to that of the other items. The same relationships hold with good approximation for the logistic parameters expressed in the normal metric.

Item facility (p-value):

Tucker (1946) has expressed classical item difficulty p_j as a function of the item parameters a_j and b_j:

p_j = \Phi\left(-a_j b_j / \sqrt{1 + a_j^2}\right),

that is, p_j is the value of the standard normal distribution function at the point

\frac{-a_j b_j}{\sqrt{1 + a_j^2}} = -b_j \rho_j,

i.e., the area to the left of that point under the normal curve.

From the equations above it follows that

a_j = \frac{\rho_j}{\sqrt{1 - \rho_j^2}}

and

b_j = -\frac{z_j}{\rho_j},

where z_j is calculated using the inverse normal distribution with p_j = P(z \le z_j).
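These relations are easy to apply directly. The sketch below computes normal-metric starting values from a classical p-value and item-test correlation; it is an illustration of the formulas above, not the routine BILOG-MG itself uses, and the example statistics are invented.

```python
from math import sqrt
from scipy.stats import norm

def starting_values(p, rho):
    """Normal-metric starting values from classical item statistics:
    p = item facility (p-value), rho = item-test (biserial) correlation."""
    a = rho / sqrt(1.0 - rho ** 2)   # slope from the reliability index
    z = norm.ppf(p)                  # z_j such that p_j = P(z <= z_j)
    b = -z / rho                     # location (threshold)
    return a, b

a, b = starting_values(p=0.70, rho=0.50)
print(round(a, 3), round(b, 3))      # an easy item yields a negative threshold
```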

7.2.4 Statistical tests

Because BILOG-MG employs maximum likelihood estimation when fitting the IRT model,
large-sample statistical tests of alternative models are available, provided one model is nested
within the other. Two models are called “nested” when the larger model is formed from the
smaller by the addition of terms and parameters. For example, the one-parameter logistic model
is nested within the two-parameter model, which is in turn nested within the three-parameter
model. Similarly, the single-group model is nested within the two-group model, and so on. The
smaller of the nested models is referred to as the “null” model and the larger as the “alternative”.
The statistical test of the alternative model vs. the null model is equivalent to a test of the hy-
pothesis that the additional parameters in the alternative are all zero and that no significant im-
provement in fit is obtained by including them.

At the end of the estimation cycles in the calibration phase, BILOG-MG prints the negative of
the maximum marginal log likelihood. If the program is run, with the same data, once with the
null model and once with the alternative model, the negative of the log likelihood of the former
will always be larger than that of the latter. In large samples, the positive difference of these log likelihoods is distributed as χ² on the null hypothesis. Its number of degrees of freedom is equal
to the difference in the number of parameters in the null and alternative models. A model with
more parameters should be adopted only when this test statistic is clearly significant. Otherwise,
fitting of the additional parameters will needlessly reduce precision of estimation.
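As a sketch of this procedure, with invented −2 log likelihood values and parameter counts for a hypothetical 40-item test fitted by the 1PL (null) and 2PL (alternative) models:

```python
from scipy.stats import chi2

neg2ll_null, k_null = 25310.4, 40   # e.g. 1PL: 40 threshold parameters (invented)
neg2ll_alt,  k_alt  = 25248.9, 80   # e.g. 2PL: slopes and thresholds (invented)

lr = neg2ll_null - neg2ll_alt       # positive difference of the -2 log likelihoods
df = k_alt - k_null                 # difference in the number of parameters
print(f"chi-square = {lr:.1f} on {df} df, p = {chi2.sf(lr, df):.4f}")
# adopt the larger model only if this statistic is clearly significant
```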

BILOG-MG also provides a large-sample test of the goodness-of-fit of individual test items in
the analysis: this requires the test to have 20 or more items.

If the test is sufficiently long (more than 20 items), the respondents in a sample of size N can be
assigned with good accuracy to intervals on the θ -continuum on the basis of their estimated
value of θ (for this purpose, we use the EAP estimate with whatever prior is assumed for item
calibration; see the section on test and item information to follow). Then the number of those in
each interval who respond correctly to item j can be tallied from their item scores.

Finally, a likelihood ratio χ² test statistic can be used to compare the resulting frequencies of
correct and incorrect responses in the intervals with those expected from the fitted model:

X_j^2 = 2 \sum_{h=1}^{n_g} \left[ r_{hj} \log_e \frac{r_{hj}}{N_h P_j(\bar{\theta}_h)} + (N_h - r_{hj}) \log_e \frac{N_h - r_{hj}}{N_h (1 - P_j(\bar{\theta}_h))} \right]

where n_g is the number of intervals, r_{hj} is the observed frequency of correct responses to item j in interval h, N_h is the number of respondents assigned to that interval, and P_j(\bar{\theta}_h) is the value of the fitted response function for item j at \bar{\theta}_h, the average ability of respondents in interval h.

Because neither the MML nor the MAP method of fitting the response functions actually minimizes this χ², the residuals are not under linear constraints and there is no loss of degrees of freedom due to the fitting of the item parameters. The number of degrees of freedom is therefore equal to the number of intervals remaining after neighboring intervals are collapsed, if necessary, to avoid expected values less than 2.
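A sketch of the fit statistic itself, for one item tallied over five invented ability intervals; BILOG-MG additionally collapses sparse neighboring intervals before the computation, which this toy version omits:

```python
import numpy as np

def item_fit_chi2(r, n, p_hat):
    """Likelihood-ratio item-fit statistic defined above.
    r[h]: observed number correct in interval h; n[h]: respondents in h;
    p_hat[h]: fitted P_j at the interval's average ability."""
    r, n, p_hat = (np.asarray(x, dtype=float) for x in (r, n, p_hat))
    w = n - r
    t1 = r * np.log(r / (n * p_hat))              # correct-response term
    t2 = w * np.log(w / (n * (1.0 - p_hat)))      # incorrect-response term
    return 2.0 * np.sum(t1 + t2)

r = [12, 30, 55, 78, 92]            # invented observed frequencies
n = [40, 60, 80, 90, 95]            # invented interval sizes
p = [0.28, 0.49, 0.70, 0.86, 0.95]  # fitted response-function values
print(round(item_fit_chi2(r, n, p), 2))  # refer to chi-square with 5 df here
```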

7.3 Models in PARSCALE

(This section was contributed by Eiji Muraki.)


7.3.1 Introduction

Psychological, sociological, educational and medical data often consist of responses classified in
two or more predefined categories. The extra information contained in multiple-category re-
sponse classifications helps offset the greater cost (compared to machine scored multiple-choice
items) of ratings based on human judgments. Provided the readers are able to assign the catego-
ries consistently, multiple-category scoring is more informative than binary scoring, because it
contains multiple thresholds of difficulty corresponding to the boundaries between the catego-
ries. By discriminating among the respondents at more than one level, multiple-category scoring
of an extended response has the same advantages as adaptive testing with several binary-scored
items at different levels of difficulty.

The generalization of IRT to the multiple-category case provides a comprehensive methodology for the analysis and scoring of this type of data. It applies optimal procedures in place of the ad
hoc rules traditionally used in quantifying and weighting categorical data. In particular, it solves
the hitherto intractable problem of how to combine the information in ratings when different
items have varying numbers of differently defined categories. Utilizing maximum likelihood or
Bayes estimation derived from multiple-category item response models, the IRT approach em-
ploys all the information in the categorical data efficiently to assign quantitative scores to the
respondents.

Readers familiar with IRT in the binary case will find the generalization to the multiple-category
case quite straightforward. The concept of a latent dimension on which response probability functions are defined carries over from the binary case without changes, and parameters of the
response functions must still be estimated; the estimated parameters are then used to estimate
scores for the respondents. The only new element is the more general form of the response func-
tions and the greater number of parameters per item. The similarity of the two cases is apparent
in the parallel structure of the BILOG-MG and PARSCALE programs: both programs have a
data input, item calibration and test-scoring phase. Both are designed for efficient use in large-
scale testing programs based on instruments with many items and possible multiple subtests or
scales.

The current version of PARSCALE handles data in which the responses to a number of items are
classified in a common set of ordered categories. This is perhaps the most common type of data.
In the context of attitude measurement, this type of item is often treated as a so-called “Likert”
scale, where the categories are arbitrarily assigned successive integer values (Likert, 1932). In
contrast, the IRT procedures estimate optimal empirical values to be assigned to the boundaries
between the categories. Since all of the items are rated in the same categories, the number of
boundaries to be estimated equals one less than the number of categories. The boundaries, item
locations, and respondent scores are all represented as points on the latent dimension of meas-
urement.

Another common type of data is where each item has its own specific number and definition of
categories. The number of boundaries to be estimated is therefore equal to the sum of one less
than the number of categories for each item. In this case, the item locations are absorbed in the
category boundaries of the items and are not separately estimated.

Alternatively, the instrument or test may consist of a mixture of common-category and specific-
category items. PARSCALE handles this case by assigning items to “blocks”, with categories
common within blocks and different between blocks. Each specific-category item is its own
block. In the case of binary items, i.e., items with only two categories, the categories are com-
mon by definition and all belong to the same block. An educational test, for example, may con-
tain open-ended exercises scored in five or six categories in one block, and multiple-choice items
in another block. The presence of multiple-choice items introduces the additional problem of
guessing effects (which is often absent in rated items). These effects are estimated using a three-
parameter model in the binary case.

A case not handled by the current version is that of nominal categories. These categories each
represent a qualitatively distinct type of response to the stimulus and have no predefined ordinal
relationship. A common use of the nominal model is to extract all information in responses to all
alternatives of a multiple-choice item, beyond just the contrast of correct and incorrect alterna-
tives. At the present time, only MULTILOG (Thissen, 1991) handles both ordinal and nominal
category item response data. But MULTILOG does not allow for Likert-type items with a com-
mon set of response categories.

The response models in PARSCALE are derived from normal and logistic models in the binary
case. In this section Samejima’s Graded Response Model and Masters’ Partial Credit Model are
discussed. The scoring function of the Generalized Partial Credit Model, the Rater's-Effect model, the DIF model, and the Trend model for dichotomous item response models (see Bock,
Muraki & Pfeiffenberger, 1988) are then reviewed.

7.3.2 Samejima’s graded response model

If we define P^+_{jk}(θ) and P^+_{j,k+1}(θ) as the regressions of the binary item scores obtained by scoring 1 all response categories k and higher, and k + 1 and higher, respectively, for each item j, the operating characteristic (Samejima, 1972) of the graded item scoring for the latent trait variable θ is

P_{jk}(\theta) = P^+_{jk}(\theta) - P^+_{j,k+1}(\theta).

Samejima (1969) further defines P^+_{j0}(θ) and P^+_{j,m+1}(θ) so that

P^+_{j0}(\theta) = 1
P^+_{j,m+1}(\theta) = 0,

where m is the number of categories minus 1. Therefore,

P_{j0}(\theta) = P^+_{j0}(\theta) - P^+_{j1}(\theta) = 1 - P^+_{j1}(\theta)
P_{j1}(\theta) = P^+_{j1}(\theta) - P^+_{j2}(\theta)
\vdots
P_{jm}(\theta) = P^+_{jm}(\theta) - P^+_{j,m+1}(\theta) = P^+_{jm}(\theta),

and, in general,

P_{jk}(\theta) = P^+_{jk}(\theta) - P^+_{j,k+1}(\theta) \ge 0.

For the normal ogive model (Samejima, 1974), the formula for P^+_{jk}(θ) in the general case is given by

P^+_{jk}(\theta) = \int_{-\infty}^{a_j(\theta - b_{jk})} \phi(t)\, dt,

where a_j is a common slope parameter and b_{jk} is called an item-category threshold parameter. For each m + 1 category item, there are m category threshold parameters.

From this definition,

b_{j1} \le b_{j2} \le \ldots \le b_{jm}.

An extension of Samejima’s graded item response model suitable for Likert items is

P^+_{jk}(\theta) = \int_{-\infty}^{a_j(\theta - b_j + c_k)} \phi(t)\, dt,

where b_j is now the item-location parameter and c_k the category parameter. We refer to this extension as the “rating-scale” model.

In the same manner, we can write

P^+_{j,k+1}(\theta) = \int_{-\infty}^{a_j(\theta - b_j + c_{k+1})} \phi(t)\, dt.

If a_j > 0, then P^+_{jk}(\theta) - P^+_{j,k+1}(\theta) \ge 0, and c_k - c_{k+1} \ge 0.

From these results, we obtain the response function of a graded category under the normal ogive
model:

P_{jk}(\theta) = \int_{a_j(\theta - b_j + c_{k+1})}^{a_j(\theta - b_j + c_k)} \phi(t)\, dt.

The corresponding logistic form of the graded response model is

P_{jk}(\theta) = \frac{\exp[Da_j(\theta - b_j + c_k)]}{1 + \exp[Da_j(\theta - b_j + c_k)]} - \frac{\exp[Da_j(\theta - b_j + c_{k+1})]}{1 + \exp[Da_j(\theta - b_j + c_{k+1})]}
             = \frac{1}{1 + \exp[-Da_j(\theta - b_j + c_k)]} - \frac{1}{1 + \exp[-Da_j(\theta - b_j + c_{k+1})]}

where D = 1.7. Both models in these equations are response functions for items scored in successive categories. A major distinction between the two models is that in the rating-scale model, Samejima’s parameter b_{jk} is resolved into an item location parameter b_j and a category parameter c_k.
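A compact numerical sketch of this model: the category probabilities are formed as differences of adjacent cumulative logistic probabilities, with P^+_{j0} = 1 and P^+_{j,m+1} = 0. The parameter values below reproduce the four-category example of Figure 7.7 and are otherwise arbitrary; this is an illustration, not PARSCALE's code.

```python
import numpy as np

D = 1.7

def rating_scale_probs(theta, a, b, c):
    """Logistic rating-scale (graded) model: P_jk = P+_jk - P+_j,k+1,
    where c holds the category parameters c_1 .. c_m in decreasing order."""
    cum = [1.0]                                     # P+_j0 = 1
    for ck in c:
        cum.append(1.0 / (1.0 + np.exp(-D * a * (theta - b + ck))))
    cum.append(0.0)                                 # P+_j,m+1 = 0
    cum = np.array(cum)
    return cum[:-1] - cum[1:]                       # adjacent differences

# the example of Figure 7.7: a = 1.0, b = 0.0, c = (2.0, 0.0, -2.0)
probs = rating_scale_probs(theta=0.0, a=1.0, b=0.0, c=[2.0, 0.0, -2.0])
print(np.round(probs, 3), probs.sum())   # four category probabilities summing to 1
```

At θ = 0 the four probabilities come out symmetric, consistent with the discussion of Figure 7.7 below.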

If each item has its own response categories, which may differ in number m_j (j = 1, 2, …, n), the graded response model is required. Figures 7.7, 7.8 and 7.9 illustrate the meaning of the parameters a_j, b_j and c_k. All examples are of the logistic rating-scale model with four categorical responses. Therefore, the model contains three category parameters, c_k.

Figure 7.7: Graded response model: a = 1.0, b = 0.0, c = (2.0,0.0,-2.0)

In Figure 7.7, the values of the parameters are set as a_j = 1.0, b_j = 0.0, c_1 = 2.0, c_2 = 0.0, and c_3 = −2.0. Item category trace lines are drawn from left to right in the order P_{j0}, P_{j1}, P_{j2}, and P_{j3}. Since the distances between adjacent category thresholds are equal and the location parameter is zero, the trace lines are symmetric with respect to θ = 0.

If the slope a_j increases by 0.5 (a_j = 1.5) and the location is changed from b_j = 0 to b_j = 0.5, then all four trace lines become steeper and are shifted to the right on the θ scale, as shown in Figure 7.8. These mechanisms of the function behave the same as in the dichotomous item response model.

Figure 7.8: Graded Response Model: a = 1.5, b = 0.5, c = (2.0,0.0,-2.0)

If the distance between c_2 and c_3 becomes narrower by 0.5 (c_2 = −0.5), as shown in Figure 7.9, the trace lines of P_{j1} and P_{j2} are shifted to the right. In other words, these two categories become more difficult to be endorsed or attained. However, the trace lines of P_{j0} and P_{j3} stay the same, since these probabilities do not involve c_2. Since the trace lines of the extreme categories, P_{j0} and P_{j3}, are essentially cumulative probability functions, the slopes of these functions change only if the slope parameter, a_j, is altered. However, the slope of a middle category is affected by the distance between adjacent category thresholds. Therefore, the trace line of P_{j2}(θ) is not only shifted, but also becomes flatter.

Figure 7.9: Graded response model: a = 1.5, b = 0.5, c = (2.0,0.5,-2.0)

In the case of dichotomous item response models, the slope parameter is synonymous with
discriminating power. For a polytomous item response model, however, the discriminating
power of a specific categorical response depends on the width between adjacent category thresholds as
well as on the slope parameter. Because of this property, simultaneous estimation of the slope
parameter and of all m_j category parameters is not possible. If the model includes a slope
parameter for each item j, the location of the category parameters must be fixed. The natural choice
is to fix the mean of the category parameters c_1, …, c_m. The program provides the
keyword CADJUST on the BLOCK command to set this mean (the default is 0.0). The option
NOCADJUST causes the program to omit the adjustment during the calibration or scoring phase.

The relationships among parameters in the rating-scale model are expressed by

a_j(θ − b_j + c_k) = (a_j / s) [ sθ − (s b_j − t) + (s c_k − t) ],

where s is a scaling factor and t is a location constant. This equation shows that shifting the cen-
ter of the category metric results in a shift of b j in the same direction by the same units. If the
intervals of the category scale are expanded by the factor s and the scale of θ is held constant,
the b j will expand and the a j will contract by the same factor. Therefore, if the assumption that
more than two subsets of items measure the same ability is met and their ability distributions are
constrained to have the same mean and standard deviation, both the scale and location parame-
ters are determinate and estimable.

7.3.3 Masters’ partial credit model

Masters (1982) reformulated Andrich’s polytomous rating response model by utilizing the Rasch
dichotomous model, which does not contain a discriminating power parameter. It is quite legiti-
mate, however, to formulate the general model based on the two-parameter logistic response
model, following the same operating characteristic that Masters employs. Since the essential
mechanism for constructing the general model is shared with Masters’ partial credit model and
Andrich’s rating-scale model, the models constructed in this text can simply be called the gener-
alized partial credit model.

The generalized partial credit model is formulated based on the assumption that each probability
of choosing the k-th category over the (k–1)-th category is governed by the dichotomous re-
sponse model. To develop the partial credit model, let us denote Pjk as the specific probability of
choosing the k-th category from m j + 1 possible categories of item j. In the dichotomous model
( m j + 1 = 2 ), Pj 0 (θ ) + Pj1 (θ ) = 1. The conditional probability of choosing category 1 given the
probability of choosing categories 0 and 1 is then

P_{j1|0,1} = P_j1(θ) / [ P_j0(θ) + P_j1(θ) ] = P_j1(θ) = exp[ a_j(θ − b_j1) ] / ( 1 + exp[ a_j(θ − b_j1) ] ).

Therefore,

P_{j0|0,1}(θ) = 1 − P_j1(θ) = 1 / ( 1 + exp[ a_j(θ − b_j1) ] ).

In the polytomous response model, in which m j + 1 is more than 3 for item j,

P_j0(θ) + P_j1(θ) + … + P_{jm_j}(θ) = Σ_{k=0}^{m_j} P_jk(θ) = 1.

For each of the adjacent categories, the probability of the specific categorical response k over k –
1 is given by the above conditional probability:


C_jk = P_{jk|k−1,k}(θ)
     = P_jk(θ) / [ P_{j,k−1}(θ) + P_jk(θ) ]
     = exp[ a_j(θ − b_jk) ] / ( 1 + exp[ a_j(θ − b_jk) ] ),

where k = 1, 2, … , m j . Then,

P_jk(θ) = [ C_jk / (1 − C_jk) ] P_{j,k−1}(θ),

where

C_jk / (1 − C_jk) = P_{jk|k−1,k}(θ) / [ 1 − P_{jk|k−1,k}(θ) ]
                  = P_{jk|k−1,k}(θ) / P_{j,k−1|k−1,k}(θ)
                  = exp[ a_j(θ − b_jk) ].

This equation may be called the operating characteristic for the partial credit model. If we start
by determining

P_j0(θ) = 1 / G,

we obtain the following probabilities by applying the operating characteristic:

P_j1(θ) = exp[ a_j(θ − b_j1) ] / G

P_j2(θ) = exp[ a_j(θ − b_j1) + a_j(θ − b_j2) ] / G

⋮

P_jg(θ) = exp[ Σ_{v=1}^{g} a_j(θ − b_jv) ] / G

⋮

P_{jm_j}(θ) = exp[ Σ_{v=1}^{m_j} a_j(θ − b_jv) ] / G,

where g is a subscript for a specific categorical response k = g.


Since Σ_k P_jk(θ) = 1,

G = 1 + Σ_{c=1}^{m_j} exp[ Σ_{v=1}^{c} a_j(θ − b_jv) ].

Therefore, the partial credit model is given by

P_jk(θ) = exp[ Σ_{v=1}^{k} a_j(θ − b_jv) ] / ( 1 + Σ_{c=1}^{m_j} exp[ Σ_{v=1}^{c} a_j(θ − b_jv) ] )

        = exp[ Σ_{v=0}^{k} a_j(θ − b_jv) ] / ( Σ_{c=0}^{m_j} exp[ Σ_{v=0}^{c} a_j(θ − b_jv) ] ),

where b j 0 ≡ 0. The partial credit model reduces to the dichotomous item response model when
m j = 1 and k = 0, 1. Note that b j 0 is arbitrarily defined as 0.0. This quantity is not a location
constant and could be any value because the term containing this parameter cancels from both
numerator and denominator:

P_jk(θ) = exp[ Σ_{v=0}^{k} z_jv(θ) ] / ( Σ_{c=0}^{m_j} exp[ Σ_{v=0}^{c} z_jv(θ) ] )

        = exp[ z_j0(θ) ] exp[ Σ_{v=1}^{k} z_jv(θ) ] / ( exp[ z_j0(θ) ] + Σ_{c=1}^{m_j} exp[ z_j0(θ) + Σ_{v=1}^{c} z_jv(θ) ] )

        = exp[ Σ_{v=1}^{k} z_jv(θ) ] / ( 1 + Σ_{c=1}^{m_j} exp[ Σ_{v=1}^{c} z_jv(θ) ] ),

where z jk (θ ) = a j (θ − b jk ). Masters (1980) calls the quantity b jk in this equation an “item step”
parameter. It is the intersection point of Pjk (θ ) and Pj ,k +1 (θ ) expressed as the operating charac-
teristic. Thus, assuming a j > 0,


if θ = b_jk, then P_jk(θ) = P_{j,k+1}(θ);
if θ > b_jk, then P_jk(θ) < P_{j,k+1}(θ);
if θ < b_jk, then P_jk(θ) > P_{j,k+1}(θ).

It should be noted that b jk is not sequentially ordered within item j, because it represents the
relative magnitude of the adjacent probabilities Pjk (θ ) and Pj ,k +1 (θ ) . Furthermore, when all
probabilities Pjk (θ ) are equal, the values of b jk also become identical.
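
A compact way to see how the partial credit formulas operate is to evaluate them directly. The following sketch (Python; an illustration for this discussion, not code from any of the programs) computes the probabilities from the cumulative sums of z_jv(θ) = a_j(θ − b_jv):

    import numpy as np

    def gpcm_probs(theta, a, b):
        # Generalized partial credit probabilities P_jk(theta), k = 0..m,
        # with slope a and item step parameters b[0..m-1] (b_j1..b_jm).
        z = a * (theta - np.asarray(b, float))   # z_jv(theta), v = 1..m
        num = np.exp(np.concatenate(([0.0], np.cumsum(z))))
        return num / num.sum()

    # Parameter values of Figure 7.10: a = 1.0, b = (-2.0, 0.0, 2.0).
    # At theta = 0 the probabilities of categories 1 and 2 are equal,
    # since those two trace lines intersect at b_j2 = 0.
    print(gpcm_probs(0.0, 1.0, [-2.0, 0.0, 2.0]))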

While the item-category threshold parameter, b jk , in the graded response model determines the
steepest point of the trace line, the b jk parameter in the partial credit model is the intersection
point of P_jk(θ) and P_{j,k+1}(θ). These lines intersect exactly once on the θ scale. Figure
7.10 is the graph of the partial credit model with a j = 1.0, b j1 = -2.0, b j 2 = 0.0, and b j 3 = 2.0. If
b j1 and b j 2 are brought closer together by changing b j1 = -2.0 to -0.5, then the probability of
completing only the first step would decrease, as illustrated in Figure 7.11. If the slope parameter
is changed from 1.0 to 0.7, as shown in Figure 7.12, the intersection points of all trace lines are
left unchanged and the curves become flatter.

Figure 7.10: Partial credit model: a = 1.0, b = (-2.0,0.0,2.0)


Figure 7.11: Partial credit model: a = 1.0, b = (-0.5,0.0,2.0)

Figure 7.12: Partial credit model: a = 0.7, b = (-0.5,0.0,2.0)


Figure 7.13: Partial credit model: a = 0.7, b = (0.5,0.0)

When the second step is made easier than the first ( b j1 > b j 2 ), the trace lines of Pj 2 drop and
every person becomes less likely to complete only the first step. This is illustrated in Figure 7.13.
The trace line in Figure 7.13 is the partial credit model with three categorical responses. If we
add another category ( b j 3 = 2.0) to this model, the trace lines become more complicated.

However, the interpretation remains clear. The transition, or step, from P_{j,k−1}(θ) to P_jk(θ) is
governed by the item step parameter b_jk. Since the magnitudes of the b_jk are ordered as b_j3 (= 2.0),
b_j1 (= 0.5), and b_j2 (= 0.0), the step from P_j2(θ) to P_j3(θ) is the hardest, followed by the step from
P_j0(θ) to P_j1(θ). The easiest step is the transition from P_j1(θ) to P_j2(θ). Consequently, the
respondent becomes more likely to complete the first category, but less likely to complete the
second category. Therefore, as shown in Figure 7.14, the probability of the second categorical
response, P_j2(θ), appears dominant. If all item step parameters have the same value, all trace lines
intersect at the same point. Even though the values of the item step parameters are not sequentially
ordered, the partial credit model expresses the probabilities of ordered responses: the subsequent
steps can be completed only after the former ones are successfully completed. In other words, the
locations of the trace lines can never be interchanged, only their intersection points.


Figure 7.14: Partial credit model: a = 0.7, b = (0.5,0.0,2.0)

The Likert version of the partial credit model is the simple extension of the foregoing results,
namely,

P_jk(θ) = exp[ Σ_{v=0}^{k} a_j(θ − b_j + c_v) ] / ( Σ_{c=0}^{m_j} exp[ Σ_{v=0}^{c} a_j(θ − b_j + c_v) ] ),

where θ − b_j + c_0 ≡ 0 and c_0 ≡ 0, and the parameter b_jk is resolved into two parameters, b_j and c_k
(b_jk = b_j − c_k). Andrich (1978) first introduced this separation of the item location and the
category boundary parameters.

In the graded response model, the probability of responding in category k to a specific item is
obtained as the difference between the probability of responding in category k or above and the
probability of responding in category k + 1 or above. Since the probability of the categorical response
is determined by the distance between the boundaries of the category, the order of the boundaries is
fixed by the order of the categories.

In the partial credit model, the probability of responding in a category k to a specific item is ex-
pressed by the conditional probability of responding in category k, given the probability of re-
sponding in categories k – 1 and k. The models are constructed by recursively applying a di-
chotomous model to the probability of choosing category k over another adjacent category k – 1
for each pair of binary categories. Therefore, the probability of a specific categorical response is
determined by the number of the upper boundaries the person has passed and the combination of
their unique parameter values. The values of the item-category parameters, b jk , are not necessar-
ily in successive order on a scale like those of the graded response model. Since the item-cate-
gory parameters are not necessarily ordered within item j, the category parameters ( ck ) may not
be sequentially ordered for k = 1, 2, …, m. The parameter c_k is interpreted as the relative
difficulty of step k in comparison with the other steps.

7.3.4 Scoring function of generalized partial credit model

In the normal metric, the sum of z jv (θ ) above can be written as

z_jk^+(θ) = D a_j [ k(θ − b_j) + Σ_{v=0}^{k} c_v ].

The model can then be rewritten as

z +jk (θ ) = Da j Tk (θ − b j ) + K k 
.

Andrich (1978) calls Tk and K k the scoring function and the category coefficient, respectively.
For the partial credit model, the scoring function Tk is a linear integer scoring function, that is
T = (1, 2,3,..., m j + 1) where m j + 1 is the number of categories of item j.

The log-odds can be expressed by using the scoring function as

λ_{j,k|k−1} = D a_j [ (T_k − T_{k−1})(θ − b_j) + c_k ].

This shows that the log-odds is a monotonically increasing function of the latent trait only when
incremental scoring is used for successive categorical responses. The higher a subject's latent
trait value, the more likely he or she is to respond in the upper categories. In other words, the partial
credit model becomes a model for ordered categorical responses only when the scoring function
is increasing, that is, T_k > T_{k−1} for any k and a_j > 0.

Figure 7.15: Partial credit model: T=(1,2,3,4), a = 1.0, b = 0.0, d = (0.0,2.0,-1.0,-2.0)


Figure 7.15 is the partial credit model with four categorical responses, where a j = 1, b j = 0, c =
(0.0, 2.0, -1.0, -2.0), and T = (1, 2, 3, 4). The trace lines of these Pjk (θ ) s do not change if we use
T = (0, 1, 2, 3) or even T = (-3, -2, -1, 0), because the increment rate of both scoring functions is
identical, that is Tk − Tk −1 = 1. However, if we multiply the scoring by 2, that is, T ' = (2, 4, 6, 8),
the trace lines become steeper and their intersection points are –1.0, 0.5, and 1.0, respectively,
because

z +jk (θ ) = Da j Tk' (θ − b j ) + K k 
 K 
= 2 Da j Tk (θ − b j ) + k  .
 2 

This effect is similar to that of multiplying the slope parameter by the same factor.

The scoring function provides a convenient notation for collapsing or recoding categorical re-
sponses. For example, if the number of categorical responses of an item is five, then a scoring
function T can be specified as T = (1, 2, 3, 4, 5). If the original response categories are collapsed
by combining the first and second categories into one category, the scoring function T ' can be
written as T ' = (1, 1, 2, 3, 4). If these modified response categories are recoded by treating the
original fourth category as the fifth and the original fifth as the fourth, the scoring function can
be further modified as T '' = (1, 1, 2, 4, 3).
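
The bookkeeping behind this recoding is simple, as the following sketch shows (Python; the vectors are the T, T′, and T″ of the example above, and the function is our illustration, not program code):

    # T maps the original categories 1..5 to the scores used by the model.
    T_original  = [1, 2, 3, 4, 5]   # five ordered categories, scored as is
    T_collapsed = [1, 1, 2, 3, 4]   # first and second categories combined
    T_recoded   = [1, 1, 2, 4, 3]   # original fourth and fifth interchanged

    def apply_scoring(responses, T):
        # Replace each observed category (1-based) with its assigned score.
        return [T[r - 1] for r in responses]

    print(apply_scoring([1, 2, 3, 4, 5], T_recoded))   # [1, 1, 2, 4, 3]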

The generalized partial credit model can be expressed as a form of the nominal response model
(Bock, 1972):

exp  Da j Tk (θ − b j ) + K k  
Pjk (θ ) = .
∑ c=0 exp  Da j Tc (θ − b j ) + Kc  
mj

The partial credit model can likewise be written in the slope-intercept form of the nominal
response model:

P_jk(θ) = exp[ a*_jk θ + c*_jk ] / ( Σ_{c=0}^{m_j} exp[ a*_jc θ + c*_jc ] ),

where a*_jk = D a_j T_k and c*_jk = −D a_j ( T_k b_j − K_k ).

The nominal response model is the model in which the scoring function is constant over response
categories, that is, T_k = T for any k, and the discriminating power varies for each categorical
response. Alternatively, it can be said that the nominal response model is the model whose scoring
functions are unknown and treated as parameters to be estimated. If a common slope parameter is
used for all categorical responses, the trace lines become horizontal straight lines, since they are
independent of θ . Therefore, varied discriminating powers among categorical responses are es-
sential features of the nominal response model. Since the estimates of the discriminating powers
or slope parameters contain an indeterminacy, a constraint, such as making the sum of those
parameters zero, is commonly used (see Thissen & Steinberg, 1986).

We have observed that the scoring function determines the orderliness of categorical responses.
Thus, if we assign an identical scoring to two response categories, we can construct the partial
credit model (PCM) with partially unordered categorical responses. This model can be called a
partially unordered partial credit model (PUPCM). The basic difference between the PCM for the
collapsed categories and the PUPCM is that the item-category parameters for each of the original
categories are estimated for the PUPCM.

If the scorings T_k and T_{k′} are identical, the log-odds of these categorical responses is
independent of the latent trait θ; it becomes a function of the difference of the category
coefficients, that is,

λ_{j,k|k′} = D a_j [ K_k − K_{k′} ].

These odds are constant along the θ scale and the trace lines never intersect.

Figure 7.16 is the partial credit model with four categorical responses. The parameter values are
the same as the previous example, but the scoring, T = (1, 2, 2, 3) is used instead. In other words,
we impose the assumption that the second and third categories do not have an inherent ordering.
Since K 2 = 2.0 is larger than K3 = 1.0, Pj 2 (θ ) is always higher than Pj 3 (θ ) . The positions of
these two trace lines are reversed if K_2 < K_3. Notice again that imposing the same scoring on the
categories does not mean collapsing those categories. We simply eliminate the assumption of an
ordering among those categories; in other words, we nominalize the categories.

Figure 7.16: Partial credit model: T=(1,2,2,3), a = 1.0, b = 0.0, d = (0.0,2.0,-1.0,-2.0)


7.3.5 Multiple-group polytomous item response models

Muraki (1993) proposed several variations of polytomous item response models for the multi-
group settings: the Rater's-Effect (RE) model, the DIF model, and the Trend model. The DIF and
Trend models for the dichotomous item response models were also discussed by Bock, Muraki &
Pfeiffenberger (1988).

The model for differential item functioning (DIF) contains the following deviate Z_gjk(θ):

Z gjk (θ ) = Da j (θ − b j − d gj + c jk )

where d gj is a DIF (or item location deviate) parameter for group g and item j.

In a similar manner, the deviate for the Rater's-Effect (RE) model is expressed as

Z gjk (θ ) = Da j (θ − b j − d g + c jk )

where d g is a rater effect (or group) parameter for rater or rater group g.

Notice that the group parameter d_gj for the DIF model is nested within each item. For the DIF
model, it is assumed that only the item location parameters differ among groups, while the slope
and category parameters are common to the groups (this restriction can be relaxed in the program).
The subgroup identification may be gender, year, or some other covariate. In the RE model, on the
other hand, the group effect d_g is crossed with the item effect. This model is generally referred
to as a multifacet model (Linacre & Wright, 1993). The basic difference between the DIF
model and the RE model is whether the group parameter is nested within, or crossed with, the
item difficulty facet.

For the DIF model, a separate prior distribution is used for the members of each group, and the
prior distribution is updated after each estimation cycle, based on the posterior distribution from
the previous cycle. For the RE model, a single prior distribution is used for the responses rated by
multiple groups of raters. The prior distribution may or may not be updated after each estimation
cycle.

7.3.6 Constraints for group parameters

For the DIF model, it is assumed that different groups have different distributions with mean µ g
and standard deviation σ g . The distributions are not necessarily normal. These empirical poste-
rior distributions are estimated simultaneously with the estimation of the item parameters. To
obtain these estimates, we impose the following constraint for the DIF model:


Σ_{j=1}^{J} d_Rj = Σ_{j=1}^{J} d_Fj.

This constraint implies that the overall difficulty levels of a test or a set of common items are the
same for the reference group and the focal group (indicated by subscripts R and F, respectively).
Therefore, the item difficulty parameters for the focal groups are adjusted, and any overall
difference in test difficulty is attributed to a difference in ability level between the subgroups. The
ability-level differences among groups can then be estimated from the posterior distributions.

The constraint imposed for the group parameters for the RE model is

Σ_{g=1}^{G} d_g = 0.

Group weight coefficient

The weight coefficient w_g for group g is used only for the RE model. In the case of multiple
raters, it may be reasonable to assume that not all raters are equally reliable. If a reliability
index for each rater is computed by some method, the indices can be used as the coefficients.

7.3.7 Test of goodness-of-fit

The goodness-of-fit of the polytomous item response model can be tested item by item. Summa-
tion of the item fit can also be used for the goodness-of-fit for the test as a whole. If a test is suf-
ficiently long, the method used in BILOG-MG (Mislevy and Bock, 1990) can be used with slight
modifications.

In the method of Mislevy & Bock, the respondents in a sample of size N are assigned to H intervals
of the θ-continuum. The expected a posteriori (EAP) estimate is used as the estimator of each
respondent’s proficiency score. The EAP estimate is the mean of the posterior distribution of θ,
given the observed response pattern x_l (Bock & Mislevy, 1982). The EAP score for response
pattern x_l is approximated from the quadrature points, X_f, and the weights, A(X_f), that is,

θ_l = ( Σ_{f=1}^{F} X_f L_l(X_f) A(X_f) ) / ( Σ_{f=1}^{F} L_l(X_f) A(X_f) ),

where Ll ( X f ) is the probability of observing a particular response pattern xl . The posterior stan-
dard deviation (PSD) of the EAP score is approximated by


PSD(θ_l) = [ ( Σ_{f=1}^{F} (X_f − θ_l)² L_l(X_f) A(X_f) ) / ( Σ_{f=1}^{F} L_l(X_f) A(X_f) ) ]^{1/2}.
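
A small numerical sketch (Python; the quadrature points, weights, item parameters, and response pattern are invented for illustration, and the code is not taken from any of the programs) shows how these two formulas are evaluated:

    import numpy as np

    def eap_and_psd(L, X, A):
        # EAP estimate and its PSD from pattern likelihoods L(X_f)
        # evaluated at quadrature points X with weights A(X_f).
        posterior = L * A
        posterior /= posterior.sum()
        eap = np.sum(X * posterior)
        psd = np.sqrt(np.sum((X - eap) ** 2 * posterior))
        return eap, psd

    # Illustrative values: 21 points on [-4, 4], N(0,1)-shaped weights,
    # and the likelihood of one response pattern under three 2PL items.
    X = np.linspace(-4.0, 4.0, 21)
    A = np.exp(-0.5 * X**2); A /= A.sum()
    items = [(1.0, -0.5, 1), (1.2, 0.3, 1), (0.8, 1.0, 0)]  # (a, b, response)
    L = np.ones_like(X)
    for a, b, u in items:
        p = 1.0 / (1.0 + np.exp(-a * (X - b)))
        L *= p if u == 1 else (1.0 - p)
    print(eap_and_psd(L, X, A))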

After all respondents' EAP scores are assigned to any one of the predetermined H intervals on the
θ -continuum, the observed frequency of the k-th categorical responses to item j in interval h,
rhjk , and the number of respondents assigned to item j in the h-th interval, N hj , are computed.
The estimated θ s are rescaled so that the variance of the sample distribution equals that of the
latent distribution on which the MML estimation of the item parameters is based, which is usu-
ally set as N (0,1) as a default. Thus, we obtain the H by m j + 1 contingency table for each item j.
For each interval, we compute the interval mean, θ h , and the value of the fitted response func-
tion, Pjk (θ h ) . Finally, a likelihood ratio χ 2 statistic for each item is computed by

G_j² = 2 Σ_{h=1}^{H_j} Σ_{k=0}^{m_j} r_hjk ln[ r_hjk / ( N_hj P_jk(θ_h) ) ],

where H_j is the number of intervals left after neighboring intervals are merged, if necessary, to
avoid expected values N_hj P_jk(θ_h) less than 5. The number of degrees of freedom is equal to the
number of intervals, H_j, multiplied by m_j. The likelihood ratio χ² test statistic for the test as a
whole is simply the sum of the separate χ² statistics, and its number of degrees of freedom is the
sum of the degrees of freedom for the separate items. These fit statistics are useful in evaluating
the fit of models to the same response data when the models are nested in their parameters.
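
As an illustration, the likelihood ratio statistic for one item could be computed as follows (Python; the observed table and fitted probabilities are invented for the example, and the interval-merging step is omitted):

    import numpy as np

    def item_fit_g2(r, P):
        # Likelihood ratio chi-square for one item.
        # r: H x (m+1) observed frequencies r_hjk by interval and category.
        # P: H x (m+1) fitted probabilities P_jk at the interval means.
        r = np.asarray(r, float)
        N = r.sum(axis=1, keepdims=True)       # N_hj, respondents per interval
        expected = N * np.asarray(P)
        mask = r > 0                           # 0 * ln(0) taken as 0
        g2 = 2.0 * np.sum(r[mask] * np.log(r[mask] / expected[mask]))
        df = r.shape[0] * (r.shape[1] - 1)     # H_j * m_j
        return g2, df

    # Invented 3-interval, 3-category example
    r = [[20, 25, 5], [10, 30, 10], [5, 25, 20]]
    P = [[0.45, 0.45, 0.10], [0.20, 0.55, 0.25], [0.10, 0.50, 0.40]]
    print(item_fit_g2(r, P))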

7.3.8 Initial parameter estimates

Point biserial and biserial correlation coefficients

The point biserial correlation, r_PB,j, for item j is a computationally simplified Pearson r between
the dichotomously scored item j and the total score x. It is computed as

r_PB,j = [ (μ_j − μ_x) / σ_x ] √( p_j / q_j ),

where μ_j is the mean total score among examinees who have responded correctly to item j, μ_x
is the mean total score for all examinees, p_j is the item difficulty index for item j, q_j = 1 − p_j,
and σ_x is the standard deviation of the total score for all examinees.


The biserial correlation coefficient estimates the relationship between the total score and the hy-
pothetical normally distributed score on a continuous scale underlying the dichotomous item.
The biserial correlation between an item and the total score can be estimated from the p-value
and the point biserial correlation of the item:

r_B,j = r_PB,j √( p_j q_j ) / h(z_j),

where z_j is the z score that cuts off a proportion p_j of the cases for item j in the standard normal
distribution, and h(z_j) is the ordinate of the normal distribution at the point z_j.

Lord & Novick (1968) show that the slope and threshold parameters of the normal ogive model
for the item are functions of the biserial correlation coefficient:

a_j = r_B,j / √( 1 − r_B,j² )

and

b_j = −z_j / r_B,j = −√( 1 + a_j² ) z_j / a_j.
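
In code, the conversion from classical item statistics to normal ogive parameters might look like this (Python; a sketch of the preceding formulas under their stated sign conventions, using scipy for the normal quantile and density; not code from the programs):

    import numpy as np
    from scipy.stats import norm

    def classical_to_ogive(p, r_pb):
        # Ad hoc normal ogive slope and threshold from an item p-value
        # and point biserial correlation, per the formulas above.
        z = norm.ppf(p)                                  # z_j
        r_b = r_pb * np.sqrt(p * (1 - p)) / norm.pdf(z)  # biserial estimate
        a = r_b / np.sqrt(1 - r_b**2)                    # slope
        b = -z / r_b                                     # threshold
        return a, b

    print(classical_to_ogive(0.6, 0.45))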

Polyserial correlation coefficient

The point polyserial correlation, rPP , is simply the Pearson correlation between equally spaced
integers assigned to the successive categories (e.g., Likert scores). The relation between the point
polyserial correlation, and the polyserial correlation, rP , is

r_PP,j = r_P,j (1/σ_j) Σ_{k=0}^{m_j − 1} h(z_jk) ( T_{j,k+1} − T_jk ),

where T_jk is the scoring function for item j and category k, σ_j is the standard deviation of the
item scores y for item j, and z_jk is the z score corresponding to the cumulative proportion, p_jk,
of the k-th response category of item j. If consecutive integers are used for scoring (that is,
T_jk = 0, 1, …, m_j), then the relation expressed by this equation becomes

r_PP,j = r_P,j Σ_{k=0}^{m_j − 1} h(z_jk) / σ_j

or

r_P,j = r_PP,j σ_j / Σ_{k=0}^{m_j − 1} h(z_jk).

The polyserial correlation becomes the biserial correlation if the number of response categories
is two.

Olsson, Drasgow, & Dorans (1982) presented three estimation methods for the polyserial
correlation coefficient: the maximum likelihood estimator, the two-step estimator, and the ad hoc
estimator. The last is obtained by substituting sample statistics into the preceding equation. The
sample product-moment correlation of the total score and the polytomous item score is the point
polyserial correlation, r_PP,j; h(z_jk) is the normal ordinate corresponding to the proportion p_jk of
examinees with item scores less than or equal to T_jk.

Initial slope parameter

From results of a simulation study, Olsson et al. (1982) concluded that the ad hoc estimator was
sufficiently unbiased and accurate for applied research. Thus, we compute initial slope values by
using the ad hoc estimator

a_j = r_P,j / √( 1 − r_P,j² ).

This value applies to both the graded model and the generalized partial credit model.

To obtain the m_j − 1 initial category threshold parameters of the graded model, we compute the
item category cumulative proportions from the numbers of examinees n_jk responding in the
successive categories of item j:

p_jk = ( Σ_{v=0}^{k} n_jv ) / ( Σ_{v=0}^{m_j} n_jv ).


The corresponding deviates are obtained from the inverse normal distribution function:

z_jk = Φ^{−1}( p_jk ).

The threshold values are

b_jk = −z_jk / r_P,j = −√( 1 + a_j² ) z_jk / a_j.

For the partial credit model, the corresponding parameters are computed from the proportions of
examinees in the higher of the two successive categories,

p_jk = n_jk / ( n_{j,k−1} + n_jk ).

The quantities z jk and b jk are computed as above, but the latter must be adjusted to reflect the
average ability of examinees who respond in these two categories relative to all examinees in the
sample. The adjusted value is

b′_jk = b_jk + ( m_jk − m_jT ) / s_jT,

where m_jk is the mean test score (computed from item scores 0, 1, …, m − 1) of examinees
responding in categories k or k − 1, and m_jT and s_jT are the mean and standard deviation of the
test scores of all examinees responding to item j.
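
Putting these pieces together, the initial values for a graded item could be computed along the following lines (Python; a sketch of the ad hoc procedure with invented category frequencies, following the sign conventions of the formulas above; scipy supplies Φ⁻¹):

    import numpy as np
    from scipy.stats import norm

    def graded_initial_values(n, r_p):
        # Ad hoc initial slope and thresholds for a graded item from
        # category frequencies n[0..m] and polyserial correlation r_p.
        n = np.asarray(n, float)
        a = r_p / np.sqrt(1.0 - r_p**2)           # initial slope
        p_cum = np.cumsum(n)[:-1] / n.sum()       # cumulative proportions p_jk
        z = norm.ppf(p_cum)                       # deviates z_jk
        b = -z / r_p                              # initial thresholds b_jk
        return a, b

    # Invented frequencies for a four-category item, polyserial r = 0.55
    print(graded_initial_values([120, 260, 340, 180], 0.55))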

These initial values are printed for each item in the Phase 1 program output (see Chapter 11 for
an example). Note, however, that in the results the category parameters of both models are ex-
pressed as deviations about their mean value and therefore sum to zero. The mean value itself is
referred to as the item “location” and appears along with the item slope in the listing of items
within blocks. For two-category (binary) items, the category parameters are equal in absolute
value but opposite in sign, and the location parameter is just the threshold parameter of the normal
ogive or 2PL model.

Initial category parameters of the rating-scale model

Multiple items that appear within the same block (and thus have the same number of categories)
comprise a rating scale. Each item has a slope and a location parameter, the initial values of which
are computed as above. But all items have the same category parameters, which are therefore a
property of the block. The corresponding initial values are computed, first, by accumulating the
category response frequencies over the n items in the block to obtain

p_k = ( Σ_{j=1}^{n} Σ_{v=0}^{k} n_jv ) / ( Σ_{j=1}^{n} Σ_{v=0}^{m} n_jv )

or

p_k = ( Σ_{j=1}^{n} n_jk ) / ( Σ_{j=1}^{n} ( n_{j,k−1} + n_jk ) ).

Then

z_k = Φ^{−1}( p_k ),

and, for the graded rating-scale model,

b_k = −√( 1 + a_j² ) z_k / a_j,

where a_j here denotes the geometric mean of the slopes of the items within the block (i.e., the
n-th root of the product of the n slopes). The use of the geometric mean is justified by the
assumption that the slopes of the items in the rating-scale domain are log-normally distributed.

For the partial credit rating-scale model, the quantities used in computing the category adjust-
ment formula shown above are accumulated over all items in the block; that is,

b′_k = b_k + ( m_k − m_T ) / s_T,

where m_k = Σ_{j=1}^{n} m_jk is the mean test score of all examinees responding in category k or
k − 1 for all items, and m_T and s_T are the mean and standard deviation of the test scores of all
examinees responding to all items in the block.


7.4 Models in MULTILOG

(This section was contributed by David Thissen.)


7.4.1 Introduction

In this section we describe the item response (trace line) models available in MULTILOG. There
are two general models that can be fitted using MULTILOG: Samejima’s (1969) “graded” model
and Thissen and Steinberg’s (1984) multiple response model. Many other (seemingly) different
models are available, as constrained subsets of one or the other of these two. For an extended
discussion of the relationship among IRT models, see Thissen & Steinberg (1986).

7.4.2 The graded model

Samejima’s (1969) graded model, for ordered responses x = k, k = 1, 2, …, m, where response m
reflects the highest θ value, is defined as follows:

P(x = k) = 1 / ( 1 + exp[−a(θ − b_{k−1})] ) − 1 / ( 1 + exp[−a(θ − b_k)] )
         = P*(k) − P*(k + 1),

where a is the slope and b_k is threshold(k). P*(k) is the trace line describing the probability that
a response is in category k or higher, for each value of θ. For completeness of the model
definition, we note that P*(1) = 1 and P*(m + 1) = 0. The value of b_{k−1} is the point on the θ-axis
at which the probability that the response is in category k or higher passes 50%. The properties of
the model are extensively described by Samejima (1969). In the MULTILOG output, the
parameter a is labelled A, and b_k is labelled B(k). This model is obtained by using the GR option
on the TEST command in MULTILOG.

7.4.3 The one- and two-parameter logistic models

When there are only two possible responses to each item, as for binary items on a test of profi-
ciency (correct/incorrect) or forced-choice items on a personality measure, the graded model is
equivalent to the 2PL model, which is usually written in the following simpler form:

P(x = 2) = 1 / ( 1 + exp[−a(θ − b)] ).

For compatibility with the graded model, the key is used to recode the “higher” of the two re-
sponses (correct, positive) to have the internal value “2” in MULTILOG. The lower (incorrect,
negative) response has the value “1.” The 2PL model has two parameters for each item, a (la-
belled A in the output), and b (labelled B(1)).



If the additional constraint that a_j = a for all items j is imposed, the 2PL model is equivalent to
the 1PL model, sometimes referred to as the “Rasch model” (after Rasch, 1960). At times, the
term “Rasch model” refers simply to the constraint that the slopes are equal for all items in the
test; in that sense MULTILOG fits the “Rasch model,” using the marginal maximum likelihood
estimation algorithm described by Thissen (1982). At other times, the term “Rasch model” refers
to both the constraint and another method of parameter estimation (Conditional Maximum
Likelihood), which is not implemented in MULTILOG. The output is identical to that for the 2PL
model, except that the value of a (labelled A) is the same for all items.

Note that we do not include the scale factor 1.7 in the definition of either the 2PL or the 1PL
models. This model is obtained by using the L2 option on the TEST command in MULTILOG.

7.4.4 The multiple response model

A modified version of Samejima’s (1979) modification of Bock’s (1972) nominal model, for re-
sponses x = 1, 2, ..., m (or m +1), is Thissen and Steinberg’s (1984) multiple response model:

P(x = k) = ( h* exp[a_k θ + c_k] + h d_k exp[a_1 θ + c_1] ) / ( Σ_{i=1}^{m+1} exp[a_i θ + c_i] ),

in which two classes of constraints are required:

•  The parameters a_k and c_k are not identified with respect to location; either TRIANGLE
   contrasts define the parameters that are estimated, in which case a_1 = c_1 = 0, or DEVIATION or
   POLYNOMIAL contrasts among these parameters are estimated, in which case
   Σ a_k = Σ c_k = 0; or
•  The parameters represented by d_k are proportions, representing the proportion of those
   who “don’t know” who respond in each category on a multiple-choice item (see Thissen
   & Steinberg, 1984). Therefore, the constraint that Σ d_k = 1 (where the sum is from k = 1 to
   2 for binary data and from 2 to m + 1 for m > 2) is required. This is enforced by estimating
   d_k such that

   d_k = exp[d_k*] / Σ exp[d_k*],

   and contrasts among the d_k* are the parameters estimated.

The parameters h* and h are used to provide several different models, and are calculated by
MULTILOG. The value of h* is always 1 for items with m > 2.


7.4.5 The multiple-choice model

When m > 2 and h = 1, the Thissen and Steinberg model described in Section 7.4.4 becomes

P(x = k) = ( exp[a_k θ + c_k] + d_k exp[a_1 θ + c_1] ) / ( Σ_{i=1}^{m+1} exp[a_i θ + c_i] ),

which is the “Multiple-choice” model, as described by Thissen and Steinberg (1984). The data
should be keyed into categories 2, 3, …, m + 1, because category 1 in the program is the “0” or
“Don’t Know” latent category (see Section 12.10). MULTILOG prints the values of the
parameters a_k, c_k, and d_k, labelled A(K), C(K), and D(K), respectively. The values of the d_k
contrasts may be fixed at zero using the MULTILOG command language. This produces
Samejima’s (1979) version of the model, in which the “guessing proportions” are equal to 1/m:

P(x = k) = ( exp[a_k θ + c_k] + (1/m) exp[a_1 θ + c_1] ) / ( Σ_{i=1}^{m+1} exp[a_i θ + c_i] ).
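
A sketch of the multiple-choice trace lines follows (Python; the parameter values are invented for illustration and satisfy the identification constraints described above; this is not code from MULTILOG):

    import numpy as np

    def multiple_choice_probs(theta, a, c, d):
        # Observed-category probabilities P(x = k), k = 2..m+1; index 0
        # of a and c is the latent "don't know" category (category 1),
        # and d[k] is the proportion of "don't know" respondents who
        # choose observed category k.
        z = np.exp(np.asarray(a) * theta + np.asarray(c))
        return (z[1:] + np.asarray(d) * z[0]) / z.sum()

    # Invented values for a 4-alternative item (m + 1 = 5 categories):
    a = [-1.2, -0.4, 0.1, 0.3, 1.2]    # a_1..a_5, sum to zero
    c = [ 0.5, -0.8, 0.2, -0.4, 0.5]   # c_1..c_5, sum to zero
    d = [ 0.1, 0.3, 0.2, 0.4]          # d_2..d_5, sum to one
    print(multiple_choice_probs(0.0, a, c, d))   # sums to 1.0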

7.4.6 The three-parameter logistic model

For m = 2, with h* = 0 for category 1 (incorrect) and h* = 1 for category 2 (correct), this gives a
parameterization of the conventional 3PL model (Lord, 1980; see Thissen & Steinberg, 1986, for
a description of this conception of the 3PL), in which

P(x = 2) = ( exp[a_2 θ + c_2] + d_2 exp[a_1 θ + c_1] ) / ( Σ_{i=1}^{2} exp[a_i θ + c_i] )

         = ( exp[a_2 θ + c_2] + d_2 exp[a_1 θ + c_1] ) / ( exp[a_1 θ + c_1] + exp[a_2 θ + c_2] )

         = d_2 exp[a_1 θ + c_1] / ( exp[a_1 θ + c_1] + exp[a_2 θ + c_2] )
           + exp[a_2 θ + c_2] / ( exp[a_1 θ + c_1] + exp[a_2 θ + c_2] ).

The constraints described above require that a1 = −a2 and c1 = −c2 . Thus, the model is:

P(x = 2) = d_2 exp[−(a_2 θ + c_2)] / ( exp[−(a_2 θ + c_2)] + exp[a_2 θ + c_2] )
           + exp[a_2 θ + c_2] / ( exp[−(a_2 θ + c_2)] + exp[a_2 θ + c_2] )

         = d_2 ( 1 − 1 / ( 1 + exp[−2(a_2 θ + c_2)] ) ) + 1 / ( 1 + exp[−2(a_2 θ + c_2)] )

         = d_2 + (1 − d_2) [ 1 / ( 1 + exp[−2(a_2 θ + c_2)] ) ],


which is a fairly conventional form of the 3PL model. MULTILOG actually estimates the logit
of d_2 and the contrasts between a_1 and a_2, and between c_1 and c_2. These are printed in the
output, as well as the “traditional 3PL, normal metric” form of the parameters, labelled A, B, and
C, from the model written as

P(x = 2) = C + (1 − C) [ 1 / ( 1 + exp[−1.7 A(θ − B)] ) ].

This model is obtained by using the L3 option on the TEST command in MULTILOG.
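
The correspondence between the two parameterizations can be checked numerically (Python; a sketch under the stated constraints a_1 = −a_2 and c_1 = −c_2, with invented values; the conversion lines follow from equating the exponents of the two forms):

    import numpy as np

    def p_nominal_3pl(theta, a2, c2, d2):
        # 3PL as a constrained nominal model (a1 = -a2, c1 = -c2).
        z1, z2 = np.exp(-(a2 * theta + c2)), np.exp(a2 * theta + c2)
        return (z2 + d2 * z1) / (z1 + z2)

    def p_traditional_3pl(theta, A, B, C):
        # Traditional 3PL in the normal metric.
        return C + (1 - C) / (1 + np.exp(-1.7 * A * (theta - B)))

    # Invented values: a2 = 0.85, c2 = 0.51, d2 = 0.2
    a2, c2, d2 = 0.85, 0.51, 0.2
    A, B, C = 2 * a2 / 1.7, -c2 / a2, d2   # implied conversion
    theta = 0.3
    print(p_nominal_3pl(theta, a2, c2, d2), p_traditional_3pl(theta, A, B, C))

Both calls print the same probability.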

7.4.7 The nominal model

When h = 0, the model defined in Section 7.4.4 is equivalent to Bock’s (1972) “nominal model”:

P(x = k) = exp[a_k θ + c_k] / ( Σ_{i=1}^{m} exp[a_i θ + c_i] );

in this case, the data are keyed into categories 1, 2, …, m, and the parameters represented by d k
are not estimated. MULTILOG prints the values of the parameters ak and ck , labelled A(K) and
C(K) respectively. This model is obtained by using the NO option on the TEST command in
MULTILOG.

7.4.8 Contrasts

MULTILOG estimates the contrasts between the as, cs, and d*s; the unconstrained (estimated)
parameters are the αs, γs, and δs [denoted AK, CK, and DK, respectively, in the syntax, and
CONTRAST(k) FOR A, CONTRAST(k) FOR C, and CONTRAST(k) FOR D in the MULTILOG
output], where

a′ = α′ T_a,  c′ = γ′ T_c,  and  d*′ = δ′ T_d.

The default form of the T matrices consists of deviation contrasts, as suggested by Bock (1972).
For varying numbers of response categories, those matrices are printed here, along with the al-
ternative polynomial and triangle contrasts.
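
The deviation and triangle matrices listed below follow a simple pattern that can be generated directly (Python; a sketch reproducing the tabled values, with the polynomial contrasts omitted; this is not MULTILOG code):

    import numpy as np

    def deviation_T(m):
        # (m-1) x m deviation-contrast matrix: row k has (m-1)/m in
        # column k+1 and -1/m elsewhere.
        T = np.full((m - 1, m), -1.0 / m)
        for k in range(m - 1):
            T[k, k + 1] += 1.0
        return T

    def triangle_T(m):
        # (m-1) x m triangle-contrast matrix: row k has -1 in the last
        # m-k-1 columns and 0 elsewhere.
        T = np.zeros((m - 1, m))
        for k in range(m - 1):
            T[k, k + 1:] = -1.0
        return T

    print(np.round(deviation_T(4), 2))   # matches the 4-category table below
    print(triangle_T(4))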

DEVIATION T-matrices

2 Categories

–0.50 0.50


3 Categories

–0.33 0.67 –0.33


–0.33 –0.33 0.67

4 Categories

–0.25 0.75 –0.25 –0.25


–0.25 –0.25 0.75 –0.25
–0.25 –0.25 –0.25 0.75

5 Categories

–0.20 0.80 –0.20 –0.20 –0.20


–0.20 –0.20 0.80 –0.20 –0.20
–0.20 –0.20 –0.20 0.80 –0.20
–0.20 –0.20 –0.20 –0.20 0.80

6 Categories

–0.17 0.83 –0.17 –0.17 –0.17 –0.17


–0.17 –0.17 0.83 –0.17 –0.17 –0.17
–0.17 –0.17 –0.17 0.83 –0.17 –0.17
–0.17 –0.17 –0.17 –0.17 0.83 –0.17
–0.17 –0.17 –0.17 –0.17 –0.17 0.83

7 Categories

–0.14 0.86 –0.14 –0.14 –0.14 –0.14 –0.14


–0.14 –0.14 0.86 –0.14 –0.14 –0.14 –0.14
–0.14 –0.14 –0.14 0.86 –0.14 –0.14 –0.14
–0.14 –0.14 –0.14 –0.14 0.86 –0.14 –0.14
–0.14 –0.14 –0.14 –0.14 –0.14 0.86 –0.14
–0.14 –0.14 –0.14 –0.14 –0.14 –0.14 0.86

8 Categories

–0.13 0.88 –0.13 –0.13 –0.13 –0.13 –0.13 –0.13


–0.13 –0.13 0.88 –0.13 –0.13 –0.13 –0.13 –0.13
–0.13 –0.13 –0.13 0.88 –0.13 –0.13 –0.13 –0.13
–0.13 –0.13 –0.13 –0.13 0.88 –0.13 –0.13 –0.13
–0.13 –0.13 –0.13 –0.13 –0.13 0.88 –0.13 –0.13
–0.13 –0.13 –0.13 –0.13 –0.13 –0.13 0.88 –0.13
–0.13 –0.13 –0.13 –0.13 –0.13 –0.13 –0.13 0.88


9 Categories

–0.11 0.89 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11


–0.11 –0.11 0.89 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11
–0.11 –0.11 –0.11 0.89 –0.11 –0.11 –0.11 –0.11 –0.11
–0.11 –0.11 –0.11 –0.11 0.89 –0.11 –0.11 –0.11 –0.11
–0.11 –0.11 –0.11 –0.11 –0.11 0.89 –0.11 –0.11 –0.11
–0.11 –0.11 –0.11 –0.11 –0.11 –0.11 0.89 –0.11 –0.11
–0.11 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11 0.89 –0.11
–0.11 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11 –0.11 0.89

POLYNOMIAL T-matrices

2 Categories

–0.50 0.50

3 Categories

–1.00 0.00 1.00


0.58 –1.15 0.58

4 Categories

–1.50 –0.50 0.50 1.50


1.12 –1.12 –1.12 1.12
–0.50 1.50 –1.50 0.50

5 Categories

–2.00 –1.00 0.00 1.00 2.00


1.69 –0.85 –1.69 –0.85 1.69
–1.00 2.00 0.00 –2.00 1.00
0.38 –1.51 2.27 –1.51 0.38

6 Categories

–2.50 –1.50 –0.50 0.50 1.50 2.50


2.28 –0.46 –1.83 –1.83 –0.46 2.28
–1.56 2.18 1.25 –1.25 –2.18 1.56
0.79 –2.37 1.58 1.58 –2.37 0.79
–0.26 1.32 –2.64 2.64 –1.32 0.26


7 Categories

–3.00 –2.00 –1.00 0.00 1.00 2.00 3.00


2.89 0.00 –1.73 –2.31 –1.73 0.00 2.89
–2.16 2.16 2.16 0.00 –2.16 –2.16 2.16
1.28 –2.98 0.43 2.56 0.43 –2.98 1.28
–0.58 2.31 –2.89 0.00 2.89 –2.31 0.58
0.17 –1.04 2.61 –3.48 2.61 –1.04 0.17

8 Categories

–3.50 –2.50 –1.50 –0.50 0.50 1.50 2.50 3.50


3.50 0.50 –1.50 –2.50 –2.50 –1.50 0.50 3.50
–2.79 1.99 2.79 1.20 –1.20 –2.79 –1.99 2.79
1.83 –3.39 –0.78 2.35 2.35 –0.78 –3.39 1.83
–0.97 3.19 –2.36 –2.08 2.08 2.36 –3.19 0.97
0.40 –1.99 3.59 –1.99 –1.99 3.59 –1.99 0.40
–0.11 0.77 –2.32 3.87 –3.87 2.32 –0.77 0.11

9 Categories

–4.00 –3.00 –2.00 –1.00 0.00 1.00 2.00 3.00 4.00


4.12 1.03 –1.18 –2.50 –2.94 –2.50 –1.18 1.03 4.12
–3.45 1.72 3.20 2.22 0.00 –2.22 –3.20 –1.72 3.45
2.42 –3.64 –1.90 1.56 3.12 1.56 –1.90 –3.64 2.42
–1.43 3.94 –1.43 –3.22 0.00 3.22 1.43 –3.94 1.43
0.70 –2.96 3.83 0.17 –3.48 0.17 3.83 –2.96 0.70
–0.26 1.59 –3.70 3.70 –0.00 –3.70 3.70 –1.59 0.26
0.07 –0.55 1.91 –3.82 4.78 –3.82 1.91 –0.55 0.07

TRIANGLE T-matrices

(When used for the vector c, the constraint ∑c = 0 is replaced with the constraint c1 = 0 ).

2 Categories

0.00 –1.00

3 Categories

0.00 –1.00 –1.00


0.00 0.00 –1.00


4 Categories

0.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 –1.00

5 Categories

0.00 –1.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 0.00 –1.00

6 Categories

0.00 –1.00 –1.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 –1.00

7 Categories

0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 –1.00

8 Categories

0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 –1.00


9 Categories

0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00


0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 –1.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 –1.00 –1.00
0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 –1.00

7.4.9 Equality constraints and fixed parameters

MULTILOG permits any subset of the parameters to be constrained to be equal, or to retain
fixed values. The approach is very similar to that used in LISREL (Jöreskog & Sörbom, 1996).
Using these facilities, a wide variety of item response models may be specified. Some uses of
equality constraints are described in the examples given in Chapter 12.

7.5 Options and statistics in TESTFACT


7.5.1 Introduction

The TESTFACT program implements all the main procedures of classical item analysis, test
scoring, and factor analysis of inter-item tetrachoric correlations, and also modern methods of
item factor analysis based on item response theory (IRT). In addition, the program includes a fa-
cility for simulating responses to test items having difficulties and factor loadings specified by
the user. This section reviews the mathematical and statistical backgrounds of these procedures.

7.5.2 Classical item analysis and test scoring

(This section was contributed by Robert Wood.)

Classical item analysis aims at inferring the expected characteristics of responses of persons in
the population to whom the test will be administered. The data to be analyzed are assumed to
come from a sample of respondents from that population. The various sample indices—item dif-
ficulties based on item p-values, test reliability expressed as a ratio of true-score variances to
test-score variances, and test validity measured by the correlation of the test score with an exter-
nal criterion of performance—are all estimates of population statistics. Classical test theory em-
ploys such estimates to predict how the test will perform when administered to members of the
population in question.

Classical methods also give useful details about the discriminating power of the items. They pro-
vide various measures of the item-by-test score correlation, supplemented by tabulations of the
frequencies of responses to multiple-choice item alternatives at selected fractiles of the test-score
distribution. The latter show how the distractors function in discriminating among respondents of
differing levels of ability. This information, together with plots of item discriminating power vs.
item difficulty, guides the test constructor in choosing items that will be informative throughout
the population.

The TESTFACT program computes these statistics from the sample of item-response data and
displays them in tables and plots for ready interpretation. It also allows the user to specify subtests
of the main test and to analyze each subtest separately. Similarly, the user can assign respondents
to groups (by age, grade, or class, for example) and can analyze the groups independently.

In addition, TESTFACT provides a powerful data analytic tool in the form of item factor analy-
sis. As a preliminary to test construction, or in preparation for latent trait analysis with programs
such as BILOG (Mislevy & Bock, 1990), BILOG-MG (Zimowski, Muraki, Mislevy & Bock,
1996), or MULTILOG (Thissen, 1988), item factor analysis permits a more comprehensive and
detailed examination of item dimensionality than is currently available with any other procedure.
Because they are based on Thurstone’s multiple factor model, the results of the analysis—factor
loadings, orthogonal and oblique rotations, factor correlations, and factor scores—are familiar to
most users. Since the model is fitted by Bock & Aitkin’s (1981) marginal maximum likelihood
(MML) method, the analysis provides a rigorous test of the statistical significance of factors
added successively to the model.

Item factor analysis has other interesting uses besides those of test construction and exploration of
test dimensionality. The existence of more than one statistically significant factor implies a
similar number of distinguishable profiles of individual differences in the population of
respondents. By calling attention to the common features of items that participate in these
distinctions, item factor analysis gives clues to the cognitive basis of the item responses. It serves
as a “discovery procedure,” revealing often-unsuspected cognitive components of the test task. If
the sample size is large, factor analysis of item responses is often more productive than factor
analysis of test scores, because data for many distinct items are easier to obtain than data for a
comparable number of tests.

The factor analysis procedure in TESTFACT makes exploration of any type of binary scored
characteristics dependable and informative. The potential distortions of chance successes, not-
reached items, and Heywood cases are effectively controlled. Principal factor, VARIMAX and
PROMAX patterns are provided; Bayes estimates of scores for orthogonal factors can be com-
puted for each respondent.

The main features and statistical principles of the TESTFACT program are described in the re-
mainder of this section.

7.5.3 Classical descriptive statistics

Each item has a set of responses: right, wrong, omitted, or not-presented. For item j, the response
of person i can be written as


x_ij = 1 if the response is correct,
x_ij = 0 if the response is incorrect.

At the user’s option, omitted items can be considered either wrong or not-presented.

For a test of n items, the total main test score X i for person i would be

X_i = Σ_{j=1}^{n} x_ij.

If the main test is divided into K subtests of nk items each, the subtest scores are

X_ik = Σ_{j=1}^{n_k} x_ijk,   k = 1, …, K.

Given scores X_i or X_ik, the program provides estimates of means, standard deviations, and
correlations, whether the group of respondents is taken as a whole or split into classes. In
addition, histograms of main test and subtest scores are supplied to enable the user to check the
dispersion of each score. Product-moment correlations between the main and subtest scores and
external variates (where applicable) are also provided.

7.5.4 Item statistics

The most important item parameters for test construction are those that measure item difficulty
and discriminating power.

Difficulty

The number of respondents who answer item j correctly, expressed as a proportion p_j of the total
number attempting the item, is called the item facility. For a standard measure of item difficulty,
the delta statistic (∆) is available. Delta is a non-linear transformation of the proportion correct,
arranged to have a mean of 13 and a standard deviation of 4. Its effective range is 1 to 25. The
formula is

∆ = −4 Φ^{−1}(p) + 13,

where p is the proportion correct (or item facility) and Φ^{−1} is the inverse normal
transformation. (For details, see Henrysson, 1971, pp. 139-140.)


A transformation based upon proportions or percentages that fall on a non-linear scale can cause
misleading judgments about relative differences in difficulty. The difference in difficulty be-
tween items with facilities of .40 and .50 would be quite small, but the difference between items
with facilities of .10 and .20 would be quite large.

The delta scale, on the other hand, is assumed to be linear: the difference in difficulty between
items with deltas of 13 and 14 is assumed to be the same as the difference in difficulty between
items with deltas of 17 and 18. Figure 7.17 shows that a delta of 13 (i.e., Φ^{−1}(p) = 0)
corresponds to a facility of 0.50.

Figure 7.17: Relationship between delta and facility
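
The transformation itself is a one-liner (Python; a sketch using scipy's inverse normal, not TESTFACT code):

    from scipy.stats import norm

    def delta(p):
        # Delta difficulty index from an item facility p.
        return -4.0 * norm.ppf(p) + 13.0

    print(delta(0.50), delta(0.10), delta(0.90))   # 13.0, ~18.1, ~7.9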

Discriminating power

According to Marshall & Hales (1972), more than 60 different indices for measuring item dis-
criminating power have been proposed.

TESTFACT provides two classical indices, the point biserial and the biserial correlations. Both
call for calculation of the correlation between the score (1 or 0) on the item and the score on the
test as a whole. The higher the correlation between these two scores, the more effective the item
is in separating the test scores of the respondents. Naturally, this relationship is relative: a given
item could have a higher item-test correlation when included in one test than when included in a
different test.

The point biserial correlation, r_pbis, is a product-moment correlation between two variates, when
one of the variates is binary (the item score) and the other, the complete test or subtest score, is
continuously distributed. The formula for the sample point biserial correlation can be written as

r_pbis = [ (M_p − M) / S ] √( p / (1 − p) ),

where:

M_p is the mean score on the test for those subjects who get the item correct;
M is the mean score on the test for the entire group;
S is the standard deviation of test scores for the entire group;
p is the proportion that gets the item right (the item facility).

Evidently, r_pbis serves as a measure of separation through the action of the term

(M_p − M) / S.

Note that rpbis is also a function of item facility.

In principle, values of the point biserial lie between -1 and +1. But, as Wilmut (1975, p. 30) has
demonstrated, in item analysis it is unlikely ever to exceed 0.75 or to fall below -0.10. This
should be kept in mind when interpreting output.

Of the many classical discrimination indices, the only serious rival to the point biserial is the
biserial correlation. Unlike the point biserial, the biserial is not a product-moment correlation;
rather, it should be thought of as a measure of association between performance on the item and
performance on the test (or some other criterion). The biserial is less influenced by item diffi-
culty and tends to be invariant from one testing situation to another—advantages the point bise-
rial does not possess (see below).

Also distinguishing it from its rival is the biserial correlation’s assumption that a normally dis-
tributed latent variable underlies the right/wrong dichotomy imposed in scoring an item. This
variable may be thought of as representing the trait that determines success or failure on the item.

The formula for calculating the sample biserial correlation coefficient, rbis , is

Mp −M p
rbis = × .
S h( p )


Except for h( p ) , the terms are as before; h( p ) stands for the ordinate or elevation of the normal
curve at the point where it cuts off a proportion p of the area under the curve. As might be ex-
pected, h( p ) enters into the formula because of the assumption of a normally distributed underly-
ing variable.

The relationship between the biserial and point biserial formulas is straightforward:

r_pbis = r_bis × h(p) / √( p(1 − p) ).

The point biserial is equal to the biserial multiplied by a factor that depends only on the item
difficulty, so the point biserial will always be less than the biserial. In theory, the biserial can take
any value between −1 and +1, but values greater than 0.75 are rare, although the biserial can even
exceed 1 in exceptional circumstances, usually as a result of some peculiarity in the test score
or criterion distribution (Glass & Stanley, 1970, p. 171). In practice, negative values usually
indicate that the wrong answer has been keyed.

Lord & Novick (1968, p. 340) show that the point biserial can never attain a value as high as
0.80 of the biserial, and they present a table showing how the fraction varies according to item
difficulty (see also Bowers, 1972). They remark that the extent of biserial invariance is necessar-
ily a matter for empirical investigation, but present some results in support of the conclusion that
“biserial correlations tend to be more stable from group to group than point biserials”.

Bowers (1972) observes that as long as a markedly non-normal distribution of the criterion vari-
able is not anticipated, substantially the same items are selected or rejected no matter which sta-
tistic is used to evaluate discrimination. It is true that the point biserial is more dependent on the
level of item difficulty, but this is not serious, as it only leads to rejection of very easy or very
difficult items, which would be rejected anyway. Users who have not made up their minds on
this issue are advised to fasten on to one or another statistic, learn about its behavior, and stay
with it. Switching from one to the other or trying to interpret both simultaneously is likely to be
confusing. Note, however, that in the factor analysis procedure, factor loadings of the items serve
as discrimination indices.

7.5.5 Fractile tables

Although point biserial and biserial correlations are useful guides to the discriminating power of
an item, they cannot describe how respondents of differing levels of achievement or ability re-
spond to specific items. By defining fractiles of the distribution of test scores, and classifying
item responses according to membership in these fractiles, the user can observe the behavior of
items across the ability range and, in particular, keep an open eye for malfunctioning distractors.

Items may:

•  Fail to differentiate between respondents in the lower, and sometimes in the middle, fractile
   bands;
•  Function well over lower fractiles, but give little or no information about respondents in
   the higher fractiles;
•  Discriminate in a way that fluctuates wildly over fractiles.

By way of illustration, consider Table 7.1. The item that produced the data belonged to a 50-item
external examination in chemistry taken by 319 candidates. In many cases, of course, the sample
of candidates would be much larger than this.

The correct (starred) answer was option A, chosen by 146 candidates or, as the number under-
neath indicates, by 0.46 of the sample. The facility of this item is therefore 0.46 and the difficulty
( ∆ ) is 13.42.

Of the distractors, E was most popular (endorsed by 82 candidates, or 0.26 of the sample), fol-
lowed by options C, D, and B. Only two candidates omitted the item.

Table 7.1: Item responses classified by fractiles

SCORE        Response Frequencies & Proportions for Options       Total
              A*      B      C      D      E      O
            146*     13     54     22     82      2                 319
            0.46   0.04   0.17   0.07   0.26   0.01                1.00
0-18           9      6     18      7     21      2                  63
18-22         16      5     16      8     19      0                  64
22-29         30      1      7      7     19      0                  64
29-35         42      1      8      0     13      0                  64
35-47         49      0      5      0     10      0                  64
Mean
criterion   30.8   18.8   22.0   19.2   23.5   12.5               26.02

Turning to the body of the table, we see an evident pattern. Under the correct answer A, the
count increases as the score level rises. Under the distractors (excepting D, where the trend is
unclear), the gradient runs in the opposite direction. This is what we should see if the item is
discriminating in terms of total test score. The pattern we should not see is one in which the
counts under A are relatively equal, or, worse, one in which all the counts in the table tend to
equality.

As it is, the distribution of the responses tells us quite a lot. Relatively speaking, options B and C
are much more popular in the lowest score fractile, and in that fractile the correct answer was
barely more popular than B or D. In the higher score fractiles, however, B and D are almost to-
tally rejected.

In all, Table 7.1 supports the view that wrong answers are seldom, if ever, equally distributed
across the distractors, either in the sample as a whole or within fractiles. Nor is there any evidence
of blind guessing, an indication of which would be an inflated count in the cell for option A in the
0-18 score group (the one containing a 9), which could cause the gradient to flatten out at low
score levels, or even to run in the other direction.

In Table 7.1, the five fractiles (any number can be defined, but five is enough for a first look at
an item) have been constructed so as to contain equal or nearly equal numbers of candidates.
This means that, unless the distribution of scores is rectangular, the score intervals will always
be unequal. However, there is no reason why fractiles cannot be defined in terms of equal score
intervals or according to some assumption about the underlying score distribution. If, for exam-
ple, the user believes that the underlying score distribution is normal, the fractiles might be con-
structed so as to have greater numbers in the middle fractiles and smaller numbers in the outer
fractiles. The only problem with this strategy is that, given small numbers of respondents, any
untoward behavior in the tails of the distribution would be amplified or distorted. Also, interpre-
tation of the table might be prone to error because of the varying number of fractiles.
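The tabulation itself is easy to reproduce outside the program. The following is a minimal Python sketch (illustrative only, not TESTFACT syntax; the simulated scores and option choices are assumptions) that builds a fractile-by-option table of the kind shown in Table 7.1:

    import numpy as np
    import pandas as pd

    # Hypothetical data: total test scores and the option (A-E, or O for omit)
    # chosen on the item under study by each of 319 candidates.
    rng = np.random.default_rng(0)
    scores = rng.integers(0, 48, size=319)
    options = rng.choice(list("ABCDEO"), size=319,
                         p=[0.46, 0.04, 0.17, 0.07, 0.25, 0.01])

    # Five fractiles of (nearly) equal size; the score intervals will be unequal.
    fractile = pd.qcut(scores, q=5)

    # Cross-tabulate option choices by fractile, with row and column totals.
    print(pd.crosstab(fractile, options, margins=True))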

7.5.6 Plots

TESTFACT provides line plots of item difficulties (or facilities) vs. the point biserial (or bise-
rial) correlations. It is often the case that high item difficulty corresponds to low biserial values,
and vice versa. When evaluating item statistics, plotting item difficulty (or facility) against point
biserial (or biserial) correlations should enable the user to see which items need attention (but see
Section 7.5.5, above). The user can specify either measure of difficulty and either measure of
discrimination in the PLOT command.

7.5.7 Correction for guessing

Following the argument of Wood (1977), the initial part of TESTFACT does not correct for
guessing in the case of omitted items. In the factor analysis part of the program, the user may
elect to proceed under the 3-parameter multidimensional normal ogive model, which will pro-
vide for the effects of guessing. TESTFACT does not estimate guessing parameters, but does al-
low the user to specify these values, either a priori or as estimated by a program such as BILOG
(see Section 7.5.11, below).

7.5.8 Internal consistency

TESTFACT also provides measures of internal test consistency. It is important to understand that
internal consistency is not the same as homogeneity: a test may be internally consistent—an em-
pirical, statistical fact—even though it includes items that are patently dissimilar in content (see
Green, Lissitz & Mulaik, 1977). A measure of internal consistency is the intra-class correla-
tion coefficient of the test or subtest. The correlation is commonly known as coefficient α .

For an n-item test,

$$\alpha = \frac{\sigma^2}{\sigma^2 + \sigma_\varepsilon^2 / n},$$

where $\sigma^2$ is the variance component due to respondents, and $\sigma_\varepsilon^2$ is the residual or error variance.
Unlike many other programs, the calculation of α in TESTFACT allows for omits—technically
it is a variance components analysis in the unbalanced case (Harvey, 1970). Users should be
aware that the time taken to compute α is prohibitive for a large number of items or respondents.
We have, therefore, provided a simpler alternative, the Kuder-Richardson (KR20) coefficient:

$$KR_{20} = \frac{n}{n-1} \cdot \frac{S^2 - \sum_{j=1}^{n} p_j (1 - p_j)}{S^2},$$

where n is the number of items in the test, $p_j$ is the facility of item j, and $S^2$ is the variance of
the test scores. Note: if large numbers of respondents omit items, this can affect the estimate of
KR20.
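As a rough illustration of the KR20 formula, here is a minimal Python sketch; it assumes a complete persons-by-items 0/1 matrix and does not reproduce TESTFACT's variance-components treatment of omits:

    import numpy as np

    def kr20(X):
        """KR20 = n/(n-1) * (S^2 - sum p_j (1 - p_j)) / S^2 for a complete
        persons-by-items matrix X of 0/1 item scores."""
        n = X.shape[1]                   # number of items
        p = X.mean(axis=0)               # item facilities p_j
        s2 = X.sum(axis=1).var(ddof=1)   # variance S^2 of the total scores
        return n / (n - 1) * (s2 - np.sum(p * (1 - p))) / s2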

7.5.9 Tetrachoric correlations and factor analysis

The response to any particular item can be thought of in terms of an item threshold on the trait
continuum being assessed. Respondents with a response process greater than or equal to the
threshold will give the correct answer; otherwise, they will give a wrong answer. By assuming
that the processes are normally distributed, and knowing the proportion of cases that respond
correctly to both items in any pair, we can estimate tetrachoric correlations for all distinct
n(n − 1)/2 pairs of items. To calculate these correlations, TESTFACT uses Divgi's (1979)
method.

If one of the cells in the 2 × 2 table for a pair of items is empty (for example, because all respondents get either or both of the items correct), the tetrachoric correlation becomes ±1.
Because the presence of such values causes difficulties for the MINRES factor analysis of the
correlation matrix, TESTFACT uses a one-factor version of Thurstone’s (1947) centroid method
to estimate admissible values for these correlations.

As the final phase of item analysis, the matrix of tetrachoric correlations can be subjected to
principal factor analysis with communality iterations. This is equivalent to unweighted least-
squares (ULS) or MINRES factor analysis based on Thurstone’s (1947) multiple-factor model
(see Harman, 1976). The resulting principal factor pattern can be rotated orthogonally to the
varimax criterion (Kaiser, 1958). With the varimax solution as a target, the pattern can then be
rotated obliquely by the promax method of Hendrickson & White (1964). The latter pattern is
especially appropriate for item analysis, because it tends to identify clusters of items that form
unidimensional subsets within a heterogeneous collection of items.

In general, item tetrachoric correlation matrices are not positive-definite. This means that they
often cannot be used in any of the many statistical procedures that require positive-definiteness,
such as computing partial correlations among some of the items while holding others fixed.

In TESTFACT, this inconvenience can be avoided by listing and saving a smoothed positive
definite matrix of the item correlations. The smoothed matrix is computed from all the positive
roots (renormed to sum to n) of the original tetrachoric matrix. After the number of factors has
been determined, the smoothed matrix is reproduced from the MINRES factor solution.
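A minimal Python sketch of this kind of smoothing (an eigenvalue reconstruction under the stated renorming rule; TESTFACT's exact computation, which reproduces the matrix from the MINRES solution once the number of factors is chosen, is not replicated here):

    import numpy as np

    def smooth_correlations(R):
        """Rebuild a tetrachoric matrix R from its positive eigenvalues,
        renormed to sum to n, giving a positive (semi)definite matrix."""
        n = R.shape[0]
        vals, vecs = np.linalg.eigh(R)
        keep = vals > 0
        vals = vals[keep] * n / vals[keep].sum()   # renorm positive roots to sum to n
        return (vecs[:, keep] * vals) @ vecs[:, keep].T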

7.5.10 IRT-based item factor analysis7

Classical item analysis makes use of only that information in the examinees’ responses available
in the sample correct and incorrect occurrence frequencies for each item, together with the joint
correct and incorrect occurrence frequencies for all possible pairs of items. The estimation pro-
cedures of classical item statistics, including the MINRES factor loadings, are therefore referred
to as “partial” information methods. IRT estimation procedures, on the other hand, make use of
all of the information in each examinee’s pattern of correct and incorrect responses to the test
items (which is equivalent to the information in all possible occurrence and joint occurrence fre-
quencies of all orders, i.e., of item pairs, item triples, item quadruples, etc.) The IRT procedures
are therefore called “full” information methods.

In the TESTFACT program, IRT-based full information estimation procedures are only needed
in, and only applied to, item factor analysis. Their main advantage is that they are not affected by
the occurrence of zero- or 100-percent joint occurrence frequencies, for which tetrachoric corre-
lations cannot be estimated. Items with zero or 100-percent of correct responses in the sample,
which correspond to infinitely negative or positive item thresholds, do disturb IRT procedures,
however. They should therefore be eliminated from the response patterns before factor analysis
is attempted. For this reason, it is advisable to perform a preliminary classical item analysis be-
fore proceeding with the item factor analysis.

The full information procedure in TESTFACT maximizes the likelihood of the item factor load-
ings and standardized difficulties given the observed patterns of correct and incorrect responses.
It solves the corresponding likelihood equations by integrating over the latent distribution of fac-
tor scores assumed for the population of examinees (the so-called θ distribution). Because this
type of integration is called “marginalization” in the statistical literature, the estimation method
is called “marginal maximum likelihood” or MML. The definite integrals involved in this
method are computed numerically in a procedure referred to, for historical reasons, as “quad-
rature”. This version of TESTFACT makes use of recently developed innovations in quadrature
to make MML estimation feasible for fitting item response models in high-dimensional factor
spaces. It also includes, in addition to the preceding exploratory factor analysis, a confirmatory
factor analysis based on the bifactor model.

7
This section was contributed by R. Darrell Bock and Stephen G. Schilling.

7.5.11 Full information factor analysis

Bock & Aitkin (1981) introduced the marginal maximum likelihood (MML) method for estimating item
parameters of the 1- and 2-parameter normal ogive item response models. Their iterative solution
of these likelihood equations was based on the EM algorithm of Dempster, Laird & Rubin
(1977). This method can be applied straightforwardly to the estimation problem of item parame-
ters in the item response model with a guessing term and with more than one latent dimension of
ability, θ . Details are given in Bock, Gibbons & Muraki (1988).

In the multidimensional case, the normal ogive item response model with guessing is given by

$$P(x_{ij} = 1 \mid \boldsymbol{\theta}_i) = g_j + (1 - g_j)\,\Phi[z_j(\boldsymbol{\theta}_i)]
= g_j + \frac{1 - g_j}{\sqrt{2\pi}} \int_{-z_j(\boldsymbol{\theta}_i)}^{\infty} \exp(-t^2/2)\, dt,$$

where

$$z_j(\boldsymbol{\theta}_i) = c_j + a_{j1}\theta_{i1} + a_{j2}\theta_{i2} + \cdots + a_{jm}\theta_{im}.$$

The MML estimates of the factor loadings $\alpha_{jk}$, $k = 1, 2, \ldots, m$, and standardized difficulty, $\delta_j$, are then
calculated from the estimates of the slope parameters, $a_{jk}$, and the intercept parameter, $c_j$, as fol-
lows:

$$\alpha_{jk} = \frac{a_{jk}}{d_j}, \qquad \delta_j = \frac{c_j}{d_j}, \qquad k = 1, 2, \ldots, m,$$

where

$$d_j = \sqrt{1 + a_{j1}^2 + a_{j2}^2 + \cdots + a_{jm}^2}.$$
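In code, the conversion from slopes and intercepts to loadings and standardized difficulties is direct. A minimal Python sketch (the array names are assumptions):

    import numpy as np

    def slopes_to_loadings(a, c):
        """a: n-by-m matrix of slopes a_jk; c: length-n vector of intercepts c_j.
        Returns factor loadings alpha_jk and standardized difficulties delta_j."""
        d = np.sqrt(1.0 + (a ** 2).sum(axis=1))   # d_j = sqrt(1 + sum_k a_jk^2)
        return a / d[:, None], c / d              # alpha = a/d_j, delta = c/d_j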

Chance or guessing parameters, g j , are treated as known constants in TESTFACT. If the chance
parameter is not included in the item-response model, g j is set to zero for all items. Otherwise,
values of these parameters must be supplied as a part of the input data. When the guessing model
is invoked, the tetrachoric correlation coefficients are computed according to Carroll's (1945)
correction for chance successes.

Guessing parameters are the ordinates of the asymptote of the response function in the direction
of low ability. As such, they do not depend upon the form of the response curve at higher abili-
ties. For this reason, guessing parameters can be satisfactorily estimated by a one-dimensional
item response model such as that used in the BILOG program of Mislevy & Bock (1990). Oth-
erwise, the a priori value equal to 1 divided by the number of response alternatives can be used.

7.5.12 Bifactor analysis

Prior to Thurstone’s development of the multiple-factor model, Holzinger & Swineford (1937)
introduced the bifactor model to extend the Spearman (1904) one-factor model for intelligence
tests to include so-called “group” factors. By including these mutually uncorrelated factors they
were able to explain departures from one common factor when distinguishable items, such as
spatial or number series items, appeared in the tests. Their model also applies to educational
achievement tests containing more than one subject-matter content area - for example, a mathe-
matics test containing an algebra and a geometry section. Such tests are often scored for general
mathematics achievement effects, but the multiple content areas may induce group factors.

The bifactor model has special relevance for IRT, because it accounts for departures from condi-
tional independence of responses to groups of items that depend on a common stimulus such as a
reading passage or problem-solving task. This type of item has been called a “testlet” (see
Wainer, 1995). The presence of these items violates the assumption of conditional independence
and leads to under-estimation of the standard error of the test score.

Taking advantage of the fact that a common factor and uncorrelated group factors imply only
two non-zero factors per item, Gibbons & Hedeker (1992) showed that MML estimation for the
bifactor model requires quadratures in only two dimensions. This means that the conditional de-
pendence problem can be solved in a way that is computationally practical and easily extendable
to large numbers of testlets. Standard errors for scores on the common factor after integrating
over the group factor dimensions then correctly account for the presence of conditional depend-
ence within the item groups. Comparing marginal maximum likelihoods of the bifactor solution
and a one-factor exploratory solution also provides a statistical test for failure of conditional in-
dependence. Analysis based on the bifactor model is included in TESTFACT.

7.5.13 Not-reached items in factor analysis

Item factor analysis should be applied only to power tests. If the time limits of such a test are too
short, a substantial proportion of the respondents may not reach later items in the test. In ap-
praising ability, such items might be scored as incorrect. But to do so in the item factor analysis
would introduce a spurious factor associated with item position.

To minimize these effects in the factor analysis, TESTFACT provides an option called TIME in
the SCORE, FULL, and TETRACHORIC commands. When this option is invoked (for each respon-
dent), all items omitted after the last-responded-to item are scored as “not-presented”. Omitted
items prior to the last-responded-to item are scored “incorrect”, unless the guessing mode is se-
lected (CPARMS in the FULL command or CHANCE option in the SCORE command). In that case, the
latter items would be scored “correct” with the probability of chance success g j and “incorrect”
with probability 1 − g j .
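The deterministic part of this rule is simple to state in code. A minimal Python sketch (the response coding and function name are assumptions; the probabilistic scoring used under the guessing model is omitted):

    import numpy as np

    def apply_time_option(resp):
        """resp: one respondent's vector with 1 = correct, 0 = incorrect,
        -1 = omitted. Omits after the last answered item become
        not-presented (NaN); earlier omits are scored incorrect."""
        out = np.array(resp, dtype=float)          # copy; do not modify input
        answered = np.flatnonzero(out >= 0)
        if answered.size == 0:
            return np.full_like(out, np.nan)
        last = answered[-1]
        out[last + 1:] = np.where(out[last + 1:] < 0, np.nan, out[last + 1:])
        out[:last] = np.where(out[:last] < 0, 0.0, out[:last])
        return out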

7.5.14 Constraints on item parameter estimates

Unless otherwise constrained, maximum-likelihood factor analysis may encounter one or more
so-called Heywood cases (i.e., items for which the unique variance goes to zero). In these cases,
the iterative MML solution will not converge. When that happens, the user has the option of sup-
pressing Heywood cases by placing a stochastic constraint, in the form of a prior Beta distribu-
tion, on the uniqueness (1 – communality) of the items. The default values of the parameters of
the Beta distribution have been chosen so that the effect of the prior on the estimated factor load-
ings will be comparatively mild. The uniqueness will not become zero or negative, factor load-
ings will not go to ±1, and loadings of smaller absolute value will not be much affected.

TESTFACT also permits a normal prior with specified mean and variance to be placed on the
intercept parameter of the response function. This protects the maximum likelihood analysis
from excessively large or small item intercepts, corresponding to one hundred percent or zero
percent item facility.

7.5.15 Statistical test of the number of factors

If the sample size is sufficiently large that all $2^n$ possible response patterns have expected values
greater than one or two, the $\chi^2$ approximation for the likelihood ratio test of fit of the model
relative to the general multinomial alternative is

$$G^2 = 2 \sum_{l=1}^{2^n} r_l \ln \frac{r_l}{N \hat P_l},$$

where $r_l$ is the frequency of pattern l, $\hat P_l$ is computed from the maximum likelihood estimates of
the item parameters, and $N = \sum_l r_l$ is the number of cases. The number of degrees of freedom is

$$2^n - 1 - n(m + 1) + m(m - 1)/2,$$

where n is the number of items and m is the number of factors.

In this case, the goodness-of-fit test can be carried out after performing repeated full information
analyses, adding one factor at a time. When G 2 falls to insignificance, no further factors are re-
quired to explain association between item responses in the sample.
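A minimal Python sketch of this fit statistic and its degrees of freedom (it assumes pattern frequencies for all 2^n patterns are available, with zero-frequency patterns contributing nothing to the sum):

    import numpy as np

    def g2_fit(r, P_hat, n_items, m_factors):
        """r: observed frequencies of the 2**n response patterns; P_hat:
        fitted pattern probabilities from the MML estimates."""
        N = r.sum()
        nz = r > 0                                 # 0 * log 0 = 0 by convention
        G2 = 2.0 * np.sum(r[nz] * np.log(r[nz] / (N * P_hat[nz])))
        df = 2 ** n_items - 1 - n_items * (m_factors + 1) \
             + m_factors * (m_factors - 1) // 2
        return G2, df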

The degrees of freedom above do not apply to bifactor analysis. With that model, the number of
degrees of freedom is

$$2^n - 1 - 2n - (\text{the number of items assigned to group factors}).$$

Note that the term, $2^n - 1$, in the above formulas applies only to the case where all $2^n$ possible
patterns are represented in the data. It can only be used in situations where the patterns are en-
tered with frequency weights, some of which can be zero to account for patterns not appearing in
the data (see Section 13.2).

In situations where the data are entered as individual observations, the program replaces $2^n - 1$
with $N - 1$, the number of cases in the sample minus 1. This is a rather arbitrary expedient, how-
ever, since the degrees of freedom are determined by the number of distinct patterns, and any
given pattern may occur more than once even when there are a large number of items. Also, the
restriction that the probabilities of the patterns must sum to one, which eliminates one degree of
freedom, does not apply unless all possible patterns are represented in the data.

When the number of possible patterns is much larger than the sample size, many patterns will
have zero frequency in the data and many of their expected frequencies will be very small. The
above χ 2 or other approximation to the probability of the likelihood ratio statistics on the null
hypothesis will then be too inaccurate to be relied on as a goodness-of-fit test. Haberman (1977)
has shown, however, that the difference in these statistics for alternative models is distributed in
large samples as χ 2 , with degrees of freedom equal to the difference of respective degrees of
freedom, even when the frequency table is sparse. Thus, the contribution of the last factor added
to the model can be judged significant if the corresponding change of χ 2 value is statistically
significant, even when there are many patterns that do not occur in the sample. Since the term
N − 1 subtracts out of the difference in degrees of freedom when two models with different
number of factors are analyzed with the same data, the degrees of freedom printed by the pro-
gram can be subtracted to obtain the degrees of freedom for the difference of the corresponding
χ 2 values.

These statistics should be interpreted with caution, however: in large-scale studies based on re-
spondents from different sites, cluster effects may inflate the χ 2 statistics. To be conservative
about the number of factors that are identifiable in such studies, it is advisable to divide the χ 2
by a design factor of 2 or 3 before assessing its probability. Factors in large-scale studies that do
not show a significant χ 2 by this criterion are usually uninterpretable.

7.5.16 Factor scores

In TESTFACT, factor scores for the respondents can be computed by the Bayes/EAP (expected a
posteriori) method suggested by Bock & Aitkin (1981) (see also Muraki & Engelhard, 1985).
The posterior standard deviation (PSD) measuring the precision of each factor score estimate is
also computed. The factor scores are computed only for orthogonal solutions (principal factor or
varimax). Transformation to oblique factor scores (by the promax transformation, for example)
could be carried out subsequently, but there is no provision for that in the present version of the
program.

Factor scores may be computed either from standard difficulties and factor loadings estimated
within the program, or from standard difficulties and loadings supplied by the user from external
sources. Alternatively, item intercepts and slopes may be supplied. If the guessing model is se-
lected, chance success parameters must also be supplied.

The factor scores in TESTFACT are Bayes estimates computed on the assumption that the corre-
sponding ability factors are normally distributed in the population from which the sample of
examinees was drawn. That is, the score for each factor is the mean of the posterior (conditional)
distribution of ability, given the item response pattern of the examinee in question. The standard
deviation of the posterior distribution is also calculated and is interpreted as the standard error of
measurement for the score. The user can request the factor scores and corresponding standard
errors to be printed in the output listing and/or saved in an ASCII (plain text) file. The name of
that file will be the command filename with the extension *.fsc.

Following estimation of factor scores, the program will list their sample mean and variances, to-
gether with the mean-square and root-mean-square of the measurement errors. From the score
variance and the mean-square measurement error, the empirical reliability of the test in the par-
ticular sample of examinees is calculated and listed in the output. The expected value of the sum of each
factor score variance and the corresponding mean-square error is unity. If the sum of these listed
sample quantities varies widely from 1.0, it may be an indication of poor convergence or of the
presence of a near-Heywood case (see Bock & Zimowski, 1999, for details).

7.5.17 Number of quadrature points

The computations of MML item factor analysis can be time consuming when the number of
items and number of examinees are large, because the program must then evaluate the posterior
probability of each examinee’s response pattern at each point in the quadrature space. In con-
ventional multidimensional quadrature, i.e., “product” quadrature, the total number of points in
the full space is equal to the number of points in one dimension raised to the power of the num-
ber of dimensions. To avoid excessively long computing times in earlier versions of
TESTFACT, the total number of points was limited to 243. This meant that with three points per
dimension, the largest number of factors that could be accommodated in full information analysis
was five (while this is not true of classical partial information MINRES analysis, programming
restrictions allow only up to 15 factors in MINRES analysis).

With large numbers of items, perhaps 30 or more, the dispersion of the posterior distribution for
a given examinee can become small relative to that of the examinee population distribution in the
full factor space. In that situation, the quadratures tend to become inaccurate with only 243 points
in the full space, because too few of the points fall in the neighborhood of the location of the pos-
terior. Rather than increase the total number of points to avoid this problem, TESTFACT now
employs a form of quadrature that adapts the placement of the points to the region of the factor
space occupied by the posterior distribution corresponding to each pattern. With this method,
three points per dimension are quite adequate for accurate estimation of the factor loadings and
factor scores. It also makes possible a form of “fractional” quadrature based on a subset of points
in product space. The method of choosing these points is described below.

For the integrations by adaptive quadrature, the user has the option of full or fractional quadra-
ture for five factors. For one to four factors, the quadratures use all points in the product space;
for six to ten factors, the program uses successive one-third fractions of the points. Thus, the
number of points actually involved in the quadrature never exceeds 243. The points and weights
in these quadratures may be those for rectangular or for Gauss-Hermite quadrature, at the user’s
option. (Rectangular is the program default.) The program implementation of adaptive quadra-
ture is based on mathematical and statistical results of Naylor & Smith (1982), Schilling (1993),
Meng & Schilling (1996), Bock & Schilling (1997) and Schilling & Bock (1999). In addition to
increasing to 10 the program limit on the number of factors, adaptive quadrature improves accu-
racy of estimation, especially when the number of items is large. To allow for comparison be-
tween the two methods of quadrature, TESTFACT now includes a technical option, NOADAPT, to
invoke the non-adaptive procedure for up to five factors.

In fractional quadrature, a subset of the full set of points in product quadrature is selected in a
way that retains estimability of the parameters of the multiple factor model. Since factor analysis
is equivalent to determining the mean and covariance matrix of the latent factors (or an arbitrary
orthogonal transformation of the covariance matrix), any subset of points that allows means and
covariances to be estimated will be suitable for quadrature in MML item factor analysis. Designs
that have this property have been found for the formally equivalent problem of factorial experi-
ments in which main effects and two-way interactions must be estimable, but higher-way inter-
actions may be assumed null. Such designs exist for factorial experiments in 5 or more treatment
variables each with three equally spaced levels of treatment. Because these designs reduced the
total number of points by the one-third as an additional treatment is included, the total number of
treatment combinations remains fixed at 243 from six-treatment variable onward. With 5 treat-
ment variables, the one-third fraction contains 81 combinations.

The employment of these fractional factorial designs for multidimensional quadrature requires
only the choice of values corresponding to the treatment levels. Simulation studies by Schilling
& Bock (1998) have shown that - on the assumption that the latent factor score distribution is
multivariate normal - near optimum values are:

-1.2 0.0 1.2

These are the default values in TESTFACT, but they can be altered by the user if desired. What-
ever their values, the corresponding quadrature weights are the respective normal ordinates (den-
sities) constrained to sum to unity by dividing by their unconstrained total weight.
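A minimal Python sketch of the full product rule built from these defaults (the one-third fractional designs used for six or more factors are not reproduced here):

    import numpy as np
    from itertools import product

    # Default points per dimension and their weights: normal ordinates at the
    # points, constrained to sum to one.
    points = np.array([-1.2, 0.0, 1.2])
    w = np.exp(-points ** 2 / 2)          # proportional to the normal density
    w = w / w.sum()

    # Full product quadrature for m factors: 3**m points with product weights.
    m = 3
    grid = np.array(list(product(points, repeat=m)))
    weights = np.array([np.prod(c) for c in product(w, repeat=m)])
    assert np.isclose(weights.sum(), 1.0)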

In Monte Carlo integration with importance sampling, the adaptation is carried out in a similar
way, but the points are drawn randomly from the provisionally estimated posterior distribution
corresponding to each distinct response pattern, which is assumed multivariate normal. In both
fractional quadrature and Monte Carlo integration, the factor scores for each examinee’s re-
sponse pattern are estimated by the Bayes modal method (also called “maximum a posteriori”,
or MAP estimation). In this method, the multivariate mode of the posterior distribution serves as
the estimate of the examinee’s factor scores, and the inverse of the Fisher information at the
mode (i.e., the curvature of the posterior density at the mode) serves to estimate the variances
and covariances of modal values. In the adaptive solution, the MAP estimates are recomputed
during each of the earlier EM cycles. After a number of cycles set by the ITLIMIT keyword of
the TECHNICAL command (default equals one-third of the number of cycles set by the CYCLES
keyword of the FULL command) the posterior modes and information matrix for each examinee
remain fixed for the rest of the EM cycles.

7.5.18 Monte Carlo integration

The program now provides the option of Monte Carlo integration in the EM solution of the mar-
ginal maximum likelihood equations. The user may choose the number of random deviates to be
sampled from the assumed multivariate normal distributions corresponding to each response pat-
tern in the data, and choose also the seed for their generation. In principle, this method of solving
the likelihood equations applies to any number of factors, but programming restrictions in the
implementation limit the number to 15.

7.5.19 Applications

Interesting applications of the TESTFACT program are described in Muraki & Engelhard
(1985), Zimowski (1985), Zwick (1987), Bock, Gibbons & Muraki (1988), and Zimowski &
Bock (1987).

8 Estimation

8.1 Introduction8

Item Response Theory (IRT) has only recently come into widespread application in the fields of
educational and psychological measurement. However, the theory is not really new. Its roots can
be found in the work of Thurstone (1925), Thorndike (1927), Symonds (1927), Horst (1933), and
others. Its beginnings are in the pioneering efforts of Lawley (1943) and Lord (1952). In contrast
to traditional test theory, as presented in Gulliksen’s (1950) landmark text and elsewhere, IRT
provides the following features:

 Respondents may be scored on the same scale, even if they do not respond to the same set
of items.
 Respondents may be comparably scored on two or more forms of the same test.
 Short forms, long forms, easy forms, hard forms, parallel forms, and other alternate forms
are all treated in the same way.
 Tests can be tailored to proficiency, with easy questions for those who show low profi-
ciency and difficult questions for those who exhibit higher proficiency.

The magic of IRT arises in placing all of the test scores on the same scale after all of these
machinations, even if the respondents answer different sets of questions. IRT also permits the use
of all of the information included in an examinee’s response to a question or test item, even if
that response may be in one of three or more graded categories (as on a rating scale) or in one of
several strictly nominal categories (as among the four or five choices of a conventional multiple-
choice item). Responses on attitude measures are frequently graded, and on multiple-choice pro-
ficiency tests some distractors are usually “wronger” than others. IRT permits the use of the in-
formation in any choice of an item response to be used to estimate the value of that respondent’s
trait or proficiency.

The power of IRT is associated primarily with the phrase “estimate the value of the trait”.
Loosely speaking, we say that a test is “scored”. But strictly speaking, the test is not scored; one
does not simply count the positive responses, as is done in traditional test theory. One “estimates
the value of the trait” using the inferred relationships between the item responses and the trait
being measured. In the process, one finds that there is no longer an idea of “reliability” in many
cases; instead, there is information. An understanding of this estimation process and the idea of
information in the technical sense (after Fisher, 1925) are crucial for an appreciation of the the-
ory. Both are discussed in the following sections.

8
This section was contributed by David Thissen.

8.1.1 Trait estimation with Item Response Theory

Item response theory is concerned with the probabilistic relationship between the response to
some test item and the respondent’s attribute that the test item is intended to measure. Test items
may be problems on a proficiency test, questions on an attitude scale, or behaviors on a behavior
checklist. The attribute of the person may be a cognitive proficiency, an attitude, or a personality
construct (either “trait” or “state”). The attribute being measured by the test is usually called θ
and is usually arbitrarily placed on a z-score scale, so zero is average and θ -values range, in
practice, roughly from -3 to +3. Item response theory is used to convert item responses into an
estimate of θ , as well as to examine the properties of the items in item analysis.

In its simplest form, item response theory is concerned with the relationship between binary test
items (correct/incorrect, agree/disagree, yes/no) and θ , however it may be conceived. In a useful
binary test item, the relationship between the probability of a positive response and θ must be
more or less like the function in the top panel of Figure 8.1. As is illustrated there, the probability
of a positive response (on the y-axis) is plotted against θ : it is an increasing, S-shaped function,
indicating a low probability of a positive response among persons of low θ , moderate probabil-
ity for individuals of average θ , and a high probability of positive response for persons of high
θ.

Figure 8.1: Probabilities and joint relative likelihood of sequence of binary items

 Top panel: A trace line for a binary test item (referred to as item 1); that is, the probability
of a positive response plotted against the trait value ( θ ).
 Center panel: The probability of a negative response to a second item (referred to as item
2).
 Lower panel: The joint relative likelihood of the response sequence {positive, negative}
as a function of θ .

Computational aspects of item response theory usually require that the function in Figure 8.1
have some specified mathematical form; the normal ogive and logistic functions have frequently
been used (see Lawley, 1943; Lord, 1952; Rasch, 1960; Birnbaum, 1968). In either case, each
binary item has a curve like that in the top panel of Figure 8.1, sometimes called an Item Char-
acteristic Curve (ICC) or “trace line” (Lazarsfeld, 1950), which is defined by its “location” and
“slope”. The latter terminology will be used here.

In some of the simpler IRT models, the location parameter of the trace line is the point on the θ -
scale at which the curve crosses P = 0.5. So persons whose trait value exceeds the location pa-
rameter of the item have greater than a 50% chance of a positive response, while persons whose
θ values lie below that location have less than 50% chance of a positive response. In the context
of proficiency tests, the location of an item corresponds to its difficulty: the higher the location
parameter, the more proficiency is required before the examinee has a 50% chance of a correct
response.

The slope of a trace line reflects the rate at which the probability of a positive response changes
as θ increases. This is the classical discrimination parameter. The trace line for item 2 in Figure
8.1 (for a negative or incorrect response, since it decreases over θ ) changes more quickly as θ
changes than does the trace line for item 1. The item 2 curve drops from about 0.9 to about 0.1
between 0 and 2, while it takes the range from -2 to +2 for the trace line for item 1 to climb from
0.1 to 0.9. Item 2 is an item with a higher slope than item 1. The location of item 2 is also higher,
at θ = 1.

If the trace lines for items 1 and 2 are known, or, more precisely, if their parameters are known,
and an examinee responds positively to item 1 and negatively to item 2, that information may be
used to estimate the θ -value for that person. One way to make such an estimate uses the princi-
ple of Maximum Likelihood (ML). If the item responses are independent (conditional on θ ), then
the joint likelihood of the sequence {positive response, negative response} ({right, wrong},
{agree, disagree}, and so on) at any value of θ is the product of the item 1 and item 2 probabil-
ity values at that level of θ in Figure 8.1. That product has been computed and is labeled “Total”
at the bottom of Figure 8.1. The total likelihood is low for low values of θ , because it is unlikely
that a person there would respond positively to item 1, and it is low for high values of θ , be-
cause it is unlikely that a person there would respond negatively to item 2. The total likelihood of
the sequence {positive, negative} is highest at about θ = 0.4 so that is the Maximum Likelihood
Estimate (MLE) for θ , called MLE[ θ ]. So a person who responds {positive, negative} might
be assigned a trait value of 0.4 as a “test score” or measurement.
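The computation behind this can be sketched directly. The following Python fragment uses illustrative 2PL trace lines (the parameter values are assumptions, chosen so that item 2 is steeper and located at θ = 1; they are not the values used to draw Figure 8.1) and locates the maximum of the total likelihood on a grid:

    import numpy as np

    def trace(theta, a, b):
        """2PL trace line: probability of a positive response."""
        return 1.0 / (1.0 + np.exp(-a * (theta - b)))

    theta = np.linspace(-4, 4, 801)
    like1 = trace(theta, a=1.0, b=0.0)        # item 1: positive response
    like2 = 1.0 - trace(theta, a=2.0, b=1.0)  # item 2: negative response

    total = like1 * like2                     # joint likelihood of the sequence
    print(theta[np.argmax(total)])            # MLE[theta], roughly 0.3-0.4 here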

The MLE is the mode of the total likelihood in Figure 8.1. If desired, the average, called EAP[ θ ]
for Expected a Posteriori, or some other estimate may be used. The point estimate of location
provides a very limited summary of the total likelihood. In a subsequent section on test informa-
tion, we will consider the addition of indices of spread or width of that likelihood around its lo-
cation. At this point, it is sufficient to understand that “estimation” of θ uses the relative likeli-
hood of the observed response sequence as a function of θ , and consists of summarizing the dis-
tribution of the likelihood, like the “Total” in Figure 8.1, by one or more numbers - the first of
which is always its location.

The procedure is easily extended to more items. Figure 8.2 shows the trace lines associated with
a five-item test, for the sequence {negative, negative, positive, positive, positive}. Again, the to-
tal at the bottom is a plot of the likelihood of that particular sequence over values of θ . There is
almost no likelihood above zero, and MLE[ θ ] for this sequence of responses to this very easy
test is about -1.3. So the examinees that responded with this sequence to these five items might
be assigned -1.3 as a point estimate of their trait values.

If items 1 and 2 of Figure 8.1 and items 1 to 5 of Figure 8.2 came from a pool of items which
measured the same trait ( θ ) and their item parameters were known, the ML estimates of θ in the
two cases would be on the same scale and thus directly comparable.

This is true even though the examinee represented by Figure 8.1 responded to only two items and
the examinee represented by Figure 8.2 responded to five (different) items. This feature of item
response theory allows tests made up of different sets of items (like tests with missing data, short
and long forms, alternate forms, and so on) to be used to assign comparable trait estimates to ex-
aminees. The ML estimation of θ for each person takes into account the properties of each item
in constructing the total likelihood of the observed responses. Thus, the estimate of θ , which is
the value that has the highest likelihood of producing those responses, is comparable regardless
of the set of items.

Figure 8.2: Trace lines and total relative likelihood for sequence of 5 binary items

 Top five panels: Trace lines for five binary items, in sequence {negative, negative, posi-
tive, positive, positive}.
 Lower panel: Total relative likelihood for that sequence as a function of θ .

Item responses need not be binary. Samejima (1969) (see Sections 2.3.2 and 2.4.2) has devel-
oped item response models for graded items with 3, 4, or more ordered categories of response
(like disagree, moderately disagree, moderately agree, and agree). The trace lines for the highest
and lowest of the graded responses are like those for positive and negative binary responses; for
binary items, the graded models become identical to binary models using the same functions. For
more than two graded responses, the intermediate responses have increasing, then decreasing
trace lines: intermediate responses must be more likely at some moderate value of θ . Figure 8.3
shows trace lines for a test of three graded items, with the probability of responses 2 on item 1, 3
on item 2, and 1 on item 3 plotted. Samejima’s graded model has one slope parameter for each
item (very high for item 3, moderate for item 2, and low for item 1 in Figure 8.3). There are sev-
eral location parameters, called thresholds: one less than the number of response categories.
Each location parameter specifies the point on the θ scale at which a person has a 50% chance
of responding in some higher category than the one to which the threshold belongs.

As illustrated in Figure 8.3, there is still a total likelihood over θ for any given response pattern,
even when all (or some) of the items have more than two possible responses. θ may still be esti-
mated as the maximum of the total likelihood; it is -1.2 in the example in Figure 8.3.

Figure 8.3: Trace lines and joint relative likelihood of response sequence of 3-category
items

 Top panel: Trace line for a response in category 2 of a 3-category graded item.
 Second panel: Trace line for a response in category 3 of a 3-category item.
 Third panel: Trace line for a response of 1 on another 3-category item.
 Fourth panel: The joint relative likelihood of the response sequence as a function of θ .

There are sometimes test items that permit multiple responses that have no obvious order. Multi-
ple-choice vocabulary questions are such items. While one response may be “correct”, the others
may not be obviously graded with respect to “correctness”, although they may be chosen dif-
ferentially by examinees of different proficiency. Bock (1972) has proposed a logistic response
model for just such items. The parameters of that model are contrasts among the slopes and in-
tercepts of the trace lines. The model produces trace lines for each response category, which can
be combined with other trace lines and used in the ML estimation of θ . Thissen and Steinberg
(1984, see Section 7.4.4) describe an extended version of this model which is included in
MULTILOG.

Frequently, prior information is available or can be assumed about the distribution of θ in the
population of persons responding to a test. Such information, based on population membership,
is numerically equivalent to a test item to which all of the members of the population respond
identically. A N[0,1] prior – assuming that the examinees are drawn from a standard normal dis-
tribution – is equivalent to a trace line for a Gaussian distribution. The curve marked “Pop.” in
Figure 8.4 is a density for such a normal prior. Prior information of this sort may be combined
with the item responses just as though it represented an item on the test. The implicit test item is
“Do you belong in this population?” Yes = mean of population, No = missing data.

Item response theory is a flexible system for scoring tests and thus providing measurement of
individual differences. Alternative models are available for many types of item response data.
The reader is advised to examine the technical literature in this field or obtain competent advice
before applying any model (or set of models). Once the parameters are estimated, item response
theory should provide a satisfactory solution to any problem of psychological measurement.

8.1.2 Information

In the preceding section, we have made use of point estimates of the location of total likelihoods
as estimates of the unobserved trait value θ . When the total likelihood over θ derived from the
item responses is relatively flat, as it is for the five item responses in Figure 8.2, an unmitigated
version of this procedure should generate some discomfort in the reader. In Figure 8.2, there is a
substantial likelihood for the observed response pattern {00111} for all θ -values between –3.0
and 0.0. Under these circumstances, the point estimate MLE[ θ ] = –1.3 may imply too much pre-
cision.

An estimate of the width or spread of the total likelihood may be used to specify the precision
with which the MLE[ θ ] estimates θ . Since the form of the distribution of total likelihood is
roughly Gaussian, an estimate of the standard deviation of that distribution is a useful and widely
comprehensible index of spread. For the situation in which the item parameters are taken to be
fixed and known and the only parameter to be estimated is the trait-value θ for a particular ex-
aminee, the distribution is the sampling distribution of θ and its standard deviation is the stan-
dard error of MLE[ θ ].

Figure 8.4: Trace lines and joint relative likelihood for five binary items

 Top five panels: Trace lines for five binary test items {00100}.
 Sixth panel: N[0,1] population density.
 Lowest panel: The joint relative likelihood (posterior) as a function of θ .

It is possible to employ any variety of methods to estimate the spread of distributions such as the
total likelihood in Figure 8.2. One method, which is extremely convenient in the context of ML
estimation, makes use of the fact that the negative inverse of the expected value of the second
derivative of the log likelihood is approximately equal to the variance of the estimator (Fisher,
1925; Kendall & Stuart, 1961, p.10). While the term used to describe that number is unpleasantly
long, that value is a routine by-product of ML estimation. It is frequently used as an estimate of
the standard error to describe the spread of the total likelihood in terms that are interpretable in a
roughly Gaussian sense, i.e. a 95% confidence interval is MLE[ θ ] ± 2 standard errors.

The standard error estimated in this way for MLE[ θ ] = –1.3 from the likelihood at the bottom of
Figure 8.2 is 1.3. So the central (Gaussian) 68% confidence interval for θ would run from
-1.3 ± 1.3 = –2.6 to 0.0. Examination of Figure 8.2 reveals that, although the total likelihood is not
strictly Gaussian, the inflection points are very nearly at –2.6 and 0.0, as would be expected if
the distribution were Gaussian and 1.3 were the standard deviation.

Standard errors estimated in this way are different for different response patterns for the same
test. The likelihoods may be broad or narrow, depending on the relative locations of θ for the
individual and the item parameters. Since the standard error is the width of the total likelihood, it
varies. With some exceptions, there is a pattern to the standard errors: they are small for θ -loca-
tions near clusters of discriminating items and large far away, usually at the edges of the range of
the test. This variation is at odds with the concept of reliability ( ρ ), which is based on a model in
which all the estimates have the same error of estimate, equal to $\sqrt{1-\rho}$ for standardized tests.

So reliability is frequently not a useful characteristic of a test scored in this way. No single num-
ber characterizes the precision of the entire set of IRT trait-estimates made from a test. Instead,
the pattern of precision over the range of the test may be plotted. A plot of the standard error
against θ would serve this purpose, but the variable conventionally plotted is Information, which
is approximately equal to 1/(standard error)². This definition, due to Fisher (1925) and therefore
sometimes called Fisherian information, uses the word “information” in an intuitively obvious
way: if the standard error reflects our lack of knowledge about the parameter, then its inverse is
information. Information is used primarily because it is additive: each test item produces a fixed
quantity of information at each level of θ . The information function for a test is simply the sum
of the item information functions. This allows easy computation of information functions for
tests of varying compositions.
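A minimal Python sketch of this additivity for 2PL items (the parameter values are assumptions):

    import numpy as np

    def item_info(theta, a, b):
        """2PL item information: a^2 * P(theta) * (1 - P(theta))."""
        p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        return a ** 2 * p * (1.0 - p)

    theta = np.linspace(-4, 4, 161)
    a = [1.0, 1.5, 0.8, 2.0]
    b = [-1.0, 0.0, 0.5, 1.0]

    # The test information function is the sum of the item information
    # functions; 1/sqrt(information) is the corresponding standard error.
    info = sum(item_info(theta, aj, bj) for aj, bj in zip(a, b))
    se = 1.0 / np.sqrt(info)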

8.2 Estimation in BILOG-MG9


8.2.1 Item calibration

To make use of IRT for test scoring, the parameters of the model for each item of the test must
be estimated. Estimating the item parameters and checking the fit of the models is referred to as
item calibration. The calibration process requires item data from a sample of respondents who
have been administered the test under exactly the same conditions as those in which the test will
be used in practice. After a preliminary pilot study to select suitable items, the main item cali-
bration can be performed on data obtained in the first operational use of the test. Replacement of
items in subsequent administrations can then be carried out by one of the methods described in
the section on test equating.

Marginal Maximum Likelihood estimation (MML)

An approach to item parameter estimation that applies to all types of item response models and is
efficient for short and long tests is the method of marginal maximum likelihood (MML). (See
Bock & Aitkin, 1981; Harwell, Baker & Zwarts, 1988). Except in special cases, the MML
method assumes the conditional independence of responses to different items by persons of the
same ability θ . Because the joint probability of independent events is the product of the prob-
abilities of the separate events, this assumption makes it possible to calculate the probability of
observing a particular pattern of item scores,

$\mathbf{x} = (x_1, x_2, \ldots, x_n),$

in the responses of a person with ability θ .

9
This section was contributed by R. Darrell Bock.

This probability may be expressed as

$$P(\mathbf{x} \mid \theta) = \prod_{j=1}^{n} [P_j(\theta)]^{x_j}\,[1 - P_j(\theta)]^{1 - x_j}, \tag{8.1}$$

that is, as the continued product of $P_j(\theta)$ or $1 - P_j(\theta)$, according as the person responds correctly
or incorrectly to item j. This quantity is the probability of the pattern x , conditional on θ . It is
to be distinguished from the probability of observing the pattern x from a person of unknown
ability drawn at random from a population in which θ is distributed with a continuous density
$g(\theta)$. The latter is the unconditional probability given by the definite integral,

$$\bar P(\mathbf{x}) = \int_{-\infty}^{\infty} P(\mathbf{x} \mid \theta)\, g(\theta)\, d\theta. \tag{8.2}$$

This quantity is also referred to as the marginal probability of x . Because the ability, θ , has
been integrated out, this quantity is a function of the item parameters only.

In IRT applications, the integral in (8.2) cannot generally be expressed in closed form, but the
marginal probability can be evaluated as accurately as required by the Gaussian quadrature for-
mula

$$\bar P(\mathbf{x}) \approx \sum_{k=1}^{q} P(\mathbf{x} \mid X_k)\, A(X_k), \tag{8.3}$$

where $X_k$ is a quadrature point and $A(X_k)$ is a positive weight corresponding to the density func-
tion, $g(X_k)$. Tables giving quadrature points and corresponding weights are available for various
choices of $g(\theta)$ (see Stroud & Sechrest, 1966). We recommend twice the square root of the num-
ber of items as the maximum number of quadrature points.
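A minimal Python sketch of the quadrature formula (8.3) for the 2PL model (the points and N(0,1) weights shown are assumptions for illustration):

    import numpy as np

    def marginal_prob(x, a, b, Xk, Ak):
        """Approximate the marginal probability of a 0/1 pattern x, eq. (8.3),
        for a 2PL model with slopes a and locations b."""
        P = 1.0 / (1.0 + np.exp(-a * (Xk[:, None] - b)))       # P_j(X_k), (q, n)
        cond = np.prod(P ** x * (1.0 - P) ** (1 - x), axis=1)  # eq. (8.1) at each X_k
        return np.sum(cond * Ak)

    Xk = np.linspace(-4, 4, 21)                  # quadrature points
    Ak = np.exp(-Xk ** 2 / 2.0); Ak /= Ak.sum()  # normalized N(0,1) ordinates
    a = np.array([1.0, 1.5, 0.8]); b = np.array([-0.5, 0.0, 1.0])
    print(marginal_prob(np.array([1, 0, 1]), a, b, Xk, Ak))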

In the MML method, values for the item parameters are chosen so as to maximize the logarithm
of the marginal maximum likelihood function, defined as

$$\log L_M = \sum_{l=1}^{S} r_l \log_e \bar P(\mathbf{x}_l), \tag{8.4}$$

where rl is the frequency with which the pattern x l is observed in a sample of N respondents,
and S is the number of distinct patterns observed.

A necessary condition on the maximum of (8.4) for the 3PL model of item j is given by the like-
lihood equations

$$\sum_{k=1}^{q} \frac{\bar r_{jk} - \bar N_k P_j(X_k)}{P_j(X_k)[1 - P_j(X_k)]} \cdot \frac{\partial P_j(X_k)}{\partial (c_j, a_j, g_j)'} = \mathbf{0}, \tag{8.5}$$

where

$$\bar r_{jk} = \sum_{l=1}^{S} r_l\, x_{lj}\, P(\mathbf{x}_l \mid X_k)\, A(X_k) \big/ \bar P(\mathbf{x}_l)
\qquad \text{and} \qquad
\bar N_k = \sum_{l=1}^{S} r_l\, P(\mathbf{x}_l \mid X_k)\, A(X_k) \big/ \bar P(\mathbf{x}_l) \tag{8.6}$$

are, respectively, the posterior expectation of the number-correct and of the number of attempts
at point $X_k$ ($x_{lj}$ is the 0,1 score for item j in pattern l).

The so-called EM algorithm and Newton-Gauss (Fisher-scoring) methods are used to solve these
implicit equations. Details may be found in Bock and Aitkin (1981) and Thissen (1982). Stan-
dard errors and correlations of the parameter estimators are obtained by inverting the information
matrix in the Fisher-scoring solution.

Marginal Maximum A Posteriori estimation (MMAP)

MML estimation for the two- and three-parameter models is essentially a one-dimensional item
factor analysis. As such, it is subject to so-called Heywood cases in which a unique variance
goes to zero. The symptom of such a case is an indefinitely increasing slope during the EM and
Newton iterations of the maximization.

Because all items are fallible to some degree, zero unique variance is untenable. It is therefore
reasonable to avoid Heywood cases by placing a stochastic constraint on the item slopes to pre-
vent them from becoming indefinitely large. This may be done by adopting a Bayes procedure
called “marginal maximum a posteriori” (MMAP) estimation. In one form of this procedure, the
slopes (which must be positive) are assumed to have a log normal distribution in the domain
from which the items are drawn. Values for the item parameters are then chosen so as to maxi-
mize the logarithm of the product of the likelihood of the sample and the assumed log normal
“prior” distribution of the slopes. The parameters of this log normal distribution for slopes can be
either specified as a priori—the Bayes solution—or estimated from the data at hand—an empiri-
cal Bayes solution. This amounts to finding the maximum of the posterior distribution of the
slopes, given the data.

For the three-parameter model, a similar procedure is needed to keep the lower asymptote pa-
rameter, g j , in the open interval from 0 to 1. The beta distribution may be used for this purpose.
The intercept parameter can also be constrained to a plausible region, although this is less im-
portant than constraining the slope and asymptote. (See Mislevy, 1986, and Tsutakawa & Lin,
1986, for details).

Estimation of the latent distribution

It is possible to estimate the distribution of θ by MML. To do so it is necessary to solve the in-
determinacy of location and scale that is inherent in the item response models. This indetermi-
nacy arises because, in the logit,

$$z_j = a_j(\theta - b_j),$$

any change in the origin of θ can be absorbed in b j , and any change in the unit of θ can be ab-
sorbed in a j . A widely accepted convention is to fix location by setting the mean of the latent
distribution (of θ ) to 0 and to fix scale by setting the standard deviation of the distribution to 1.
The parameters are then said to be in the “0, 1” metric. To set the mean and standard deviation to
some other values, m and s, say, it is only necessary to change $b_j$ to

$$b_j^* = s b_j + m \tag{8.7}$$

and $a_j$ to

$$a_j^* = a_j / s. \tag{8.8}$$

If the logit is parameterized as $z_j = a_j \theta + c_j$, then the change of $c_j$ is

$$c_j^* = c_j - a_j m / s. \tag{8.9}$$

The asymptote parameter, g j , is not affected by these changes.
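A minimal Python sketch of these rescalings, eqs. (8.7)-(8.9):

    def rescale(a, b, c, m, s):
        """Move item parameters from the 0,1 metric to mean m and s.d. s:
        b* = s*b + m, a* = a/s, and c* = c - a*m/s when the logit is
        a*theta + c. The asymptote g is unchanged."""
        return a / s, s * b + m, c - a * m / s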

A convenient way to characterize an arbitrary latent distribution with finite mean and variance is
to compute the probability density at a finite number of suitably chosen values of θ and to nor-
malize the densities by dividing by their total. The result is a so-called “discrete distribution on a
finite number of points” (Mislevy, 1984). These normalized densities can be used as the weights,
A( X k ) in quadrature formulas such as (8.3).

This discrete representation of the latent distribution can be readily estimated from the item re-
sponse model. The expected frequency at point $X_k$, given the item data from a sample of N re-
spondents, is $\bar N_k$, the expected number of attempts defined by (8.6) above.

The estimated densities are these values divided by their total:

$$A^*(X_k) = \bar N_k \Big/ \sum_{h=1}^{q} \bar N_h. \tag{8.10}$$

They are called empirical or “posterior” weights, as distinguished from the theoretical or “prior”
weights, A( X k ) , assumed before the data are in hand.

Testing the goodness-of-fit of the IRT model

If data from a large sample of respondents is available, the fit of the model may be tested, either
for the test as a whole, or item by item. The method of examining fit depends upon the number
of items in the test.

(i) Very short tests (10 or fewer items)

If nearly all of the $2^n$ possible response patterns for an n-item test appear in the data, the overall
goodness-of-fit of the model can be tested directly. The distinct response patterns must be
counted to obtain the pattern frequencies $r_1, r_2, \ldots, r_{2^n}$.

If a few of these frequencies are zero, ½ may be substituted for each and the sum of these sub-
stitutions subtracted from the largest frequency. Then the likelihood ratio χ 2 statistic for the test
of fit is

$$G^2 = 2 \sum_{l=1}^{2^n} r_l \log_e \frac{r_l}{N \bar P(\mathbf{x}_l)}, \tag{8.11}$$

where $\bar P(\mathbf{x}_l)$ is the marginal probability of pattern $\mathbf{x}_l$ given by (8.3). This $\chi^2$ has degrees of
freedom $2^n - kn - 1$, where k is the number of item parameters in the model. Significantly large
values of the statistic indicate a failure of fit of one or more of the response models for the n
items.

(ii) Short tests (11 to 20 items)

No dependable, formal test of fit yet exists for all of this range. But useful information about the
fit of individual items may be obtained by inspecting standardized differences between the poste-
rior probability of correct response at selected values of θ and the probabilities at those points
computed from the corresponding fitted response model. These differences are called “standard-
ized posterior residuals”.

In terms of quantities defined above, the posterior probability of a correct response to item j at
the point $X_k$ is the ratio $\bar r_{jk} / \bar N_k$.

The corresponding standardized posterior residual can be expressed as follows:

$$\delta_{jk} = \frac{\sum_{l=1}^{S} W_{lk}\,[x_{lj} - P_j(X_k)]}{\left\{ \sum_{l=1}^{S} W_{lk}\,[x_{lj} - P_j(X_k)]^2 \right\}^{1/2}}, \tag{8.12}$$

where

$$W_{lk} = \frac{r_l\, P(\mathbf{x}_l \mid X_k)}{\bar P(\mathbf{x}_l)}. \tag{8.13}$$

Values of this residual greater than, say, 2.0 may be taken to indicate some failure of fit of the
model at the corresponding point. In interpreting such deviates, it is advisable to take into con-
sideration the posterior weight, A* ( X k ) , at the point, since a discrepancy in a region of θ with
very little probability in the population will have little effect on the performance of the model.

As an overall index of fit, we suggest the population root-mean-square of the posterior deviates.
Its formula is

$$RMS(\delta_j) = \left[ \sum_{k=1}^{q} \bar N_k\, \delta_{jk}^2 \Big/ \sum_{k=1}^{q} \bar N_k \right]^{1/2}.$$

Unfortunately, the posterior residuals seem to be too highly correlated to be successfully com-
bined into a χ 2 statistic for the item. Neither do they take into account the sampling variance of
Pj ( X k ) due to estimation of its item parameters, but this source of variation is presumably small.

(iii) Long tests (more than 20 items)

If the test is sufficiently long, the respondents in a sample of size N can be assigned with good
accuracy to intervals on the θ -continuum on the basis of their estimated value of θ . For this pur-
pose, we use the EAP estimate with whatever prior is assumed for item calibration. The esti-
mated θ ‘s are rescaled so that the variance of the sample distribution equals that of the latent
distribution on which the MML estimation of the item parameters is based. The number of re-
spondents in each interval who respond correctly to item j can be tallied from their item scores.
Finally, a likelihood ratio χ 2 statistic may be used to compare the resulting frequencies of cor-
rect and incorrect responses in the intervals with those expected from the fitted model at the in-
terval mean, θ h :


$$G_j^2 = 2\sum_{h=1}^{n_g}\left\{ r_{hj}\log_e \frac{r_{hj}}{N_h\,P_j(\bar{\theta}_h)} + (N_h - r_{hj})\log_e \frac{N_h - r_{hj}}{N_h\,[1 - P_j(\bar{\theta}_h)]}\right\}, \qquad (8.14)$$

where $n_g$ is the number of intervals, $r_{hj}$ is the observed frequency of correct response to item j in
interval h, $N_h$ is the number of respondents assigned to that interval, and $P_j(\bar{\theta}_h)$ is the value of
the fitted response function for item j at $\bar{\theta}_h$, the average ability of respondents in interval h.

Because neither the MML nor the MMAP method of fitting the response functions actually
minimizes this χ 2 , the residuals are not under linear constraints and there is no loss of degrees of
freedom due to the fitting of the item parameters. The number of degrees of freedom is therefore
equal to the number of intervals remaining after neighboring intervals are merged if necessary to
avoid expected values less than 5.
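A sketch of this procedure for a single item appears below; it is not the BILOG-MG implementation, and the left-to-right pooling rule used to keep expected counts at 5 or more is only one simple choice among several.

```python
import numpy as np

def item_fit_g2(r, N, P):
    """Likelihood-ratio item-fit statistic, equation (8.14).

    r : correct-response counts r_hj per theta interval
    N : respondent counts N_h per interval
    P : fitted values P_j(theta_h) at the interval means
    """
    r, N, P = list(r), list(N), list(P)
    h = 0
    while h < len(N):
        # Merge an interval into its neighbor if either expected count < 5.
        if min(N[h] * P[h], N[h] * (1 - P[h])) < 5 and len(N) > 1:
            j = h + 1 if h + 1 < len(N) else h - 1
            P[j] = (N[h] * P[h] + N[j] * P[j]) / (N[h] + N[j])  # pooled P
            r[j] += r[h]; N[j] += N[h]
            del r[h], N[h], P[h]
        else:
            h += 1
    r, N, P = map(np.asarray, (r, N, P))
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(r > 0, r * np.log(r / (N * P)), 0.0) + \
                np.where(N - r > 0, (N - r) * np.log((N - r) / (N * (1 - P))), 0.0)
    # Degrees of freedom equal the number of intervals remaining after merging.
    return 2.0 * terms.sum(), len(N)
```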

To diagnose cases of poor fit, one can inspect a plot of $r_{hj}/N_h$ compared to $P_j(\bar{\theta}_h)$. Ninety-five
percent tolerance intervals on these points are

$$\pm 2\sqrt{P_j(\bar{\theta}_h)\,[1 - P_j(\bar{\theta}_h)]/N_h}\,.$$

When the number of items is small, the standardized posterior deviates should be plotted instead.

8.2.2 Test scoring

Unlike classical test theory, IRT does not in general base the estimate of the respondent’s ability
(or other attribute) on the number-correct score. The only exception is the one-parameter logistic
model, in which the estimate is a non-linear function of that score. To distinguish IRT scores
from their classical counterparts, we refer to them as “scale” scores.

The main advantages of scale scores are that they

 remain comparable when items are added to or deleted from the tests,
 weight the individual items optimally according to their discriminating powers,
 have more accurate standard errors,
 provide more flexible and robust adjustments for guessing than the classical corrections, and
 are on the same continuum as the item locations.

There are three types of IRT scale score estimation methods now in general use:

 Maximum likelihood estimation
 Bayes estimation
 Bayes modal estimation.

The three types of IRT scale score estimation methods are discussed in the sections to follow.


Maximum likelihood estimation

The maximum likelihood (ML) estimate of the scale score of respondent i is the value of θ that
maximizes

$$\log L_i(\theta) = \sum_{j=1}^{n}\left\{x_{ij}\log_e P_j(\theta) + (1 - x_{ij})\log_e[1 - P_j(\theta)]\right\}, \qquad (8.15)$$

where Pj (θ ) is the fitted response function for item j.

The implicit likelihood equation to be solved is

$$\frac{\partial \log L_i(\theta)}{\partial\theta} = \sum_{j=1}^{n} \frac{x_{ij} - P_j(\theta)}{P_j(\theta)[1 - P_j(\theta)]}\cdot\frac{\partial P_j(\theta)}{\partial\theta} = 0.$$

The ML estimate, $\hat{\theta}$, is conveniently calculated by the Fisher-scoring method, which depends on
the so-called “Fisher information”,

$$I(\theta) = \sum_{j=1}^{n} a_j^2\,P_j(\theta)[1 - P_j(\theta)], \qquad (8.16)$$

in the case of the two-parameter logistic model. Similar formulas are available for the other
models. The iterations of the Fisher-scoring solution are

$$\hat{\theta}_{t+1} = \hat{\theta}_t + I^{-1}(\hat{\theta}_t)\left[\frac{\partial \log L_i(\theta)}{\partial\theta}\right]_{\theta = \hat{\theta}_t}.$$

The standard error of the ML estimator is the reciprocal square root of the information at $\hat{\theta}$:

$$\mathrm{S.E.}(\hat{\theta}) = 1\big/\sqrt{I(\hat{\theta})}. \qquad (8.17)$$

Unlike the classical standard error of measurement, which is a constant, the IRT standard error
varies across the scale-score continuum. It is typically smaller towards the center of the scale
where more items are located and larger at the extremes where there are fewer items. A disad-
vantage of the ML estimate is that it is not defined for the response patterns in which all items
are correct or all items are incorrect (and occasionally for other unfavorable patterns near the
chance level when the three-parameter model is used). These problems do not arise in the other
two methods of estimation.
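A minimal sketch of the Fisher-scoring solution for the 2PL model in the logistic metric follows; the names, starting value, and convergence tolerance are our own choices.

```python
import numpy as np

def ml_score_2pl(x, a, b, max_iter=20, tol=1e-6):
    """Fisher-scoring ML estimate of theta for the 2PL model (8.15-8.17).

    x : 0/1 item responses;  a, b : item slopes and thresholds.
    Returns the estimate and its standard error 1/sqrt(I(theta)).
    Not defined for all-correct or all-incorrect patterns.
    """
    x, a, b = map(np.asarray, (x, a, b))
    theta = 0.0
    for _ in range(max_iter):
        P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        score = np.sum(a * (x - P))             # gradient of log L
        info = np.sum(a**2 * P * (1.0 - P))     # Fisher information (8.16)
        step = score / info
        theta += step
        if abs(step) < tol:
            break
    return theta, 1.0 / np.sqrt(info)
```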


Bayes estimation

The Bayes estimate is the mean of the posterior distribution of θ , given the observed response
pattern $\mathbf{x}_i$ (Bock & Mislevy, 1982). It can be approximated as accurately as required by
Gaussian quadrature (see the section on MML estimation):

$$\bar{\theta}_i \cong \frac{\sum_{k=1}^{q} X_k\,P(\mathbf{x}_i \mid X_k)\,A(X_k)}{\sum_{k=1}^{q} P(\mathbf{x}_i \mid X_k)\,A(X_k)}.$$

This function of the response pattern xi has also been called the expected a posteriori (EAP) es-
timator. A measure of its precision is the posterior standard deviation (PSD) approximated by

$$\mathrm{PSD}(\bar{\theta}_i) \cong \left[\frac{\sum_{k=1}^{q} (X_k - \bar{\theta}_i)^2\,P(\mathbf{x}_i \mid X_k)\,A(X_k)}{\sum_{k=1}^{q} P(\mathbf{x}_i \mid X_k)\,A(X_k)}\right]^{1/2}.$$

The weights, $A(X_k)$, in these formulas depend on the assumed distribution of $\theta$. Theoretical
weights, empirical weights, $A^*(X_k)$, or subjective weights are possibilities.
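The quadrature formulas translate almost directly into code. The sketch below (ours) computes the EAP estimate and PSD for a 2PL test, using equally spaced points with standard normal weights as a stand-in for whatever weights $A(X_k)$ are actually assumed.

```python
import numpy as np

def eap_2pl(x, a, b, n_points=40):
    """EAP scale score and PSD for a 2PL test by quadrature.

    Uses equally spaced points on [-4, 4] with standard normal weights
    as the A(X_k); the programs' actual quadrature may differ.
    """
    x, a, b = map(np.asarray, (x, a, b))
    X = np.linspace(-4.0, 4.0, n_points)
    A = np.exp(-0.5 * X**2)
    A /= A.sum()                                       # normalized weights A(X_k)
    P = 1.0 / (1.0 + np.exp(-a * (X[:, None] - b)))    # P_j(X_k), shape (q, n)
    L = np.prod(np.where(x == 1, P, 1.0 - P), axis=1)  # P(x_i | X_k)
    post = L * A
    theta = np.sum(X * post) / post.sum()              # EAP estimate
    psd = np.sqrt(np.sum((X - theta) ** 2 * post) / post.sum())
    return theta, psd
```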

The EAP estimator exists for any answer pattern and has a smaller average error in the popula-
tion than any other estimator, including the ML estimator. It is in general biased toward the
population mean, but the bias is small within ±3 σ of the mean when the PSD is small (e.g., less
than 0.2 σ ). Although the sample mean of EAP estimates is an unbiased estimator of the mean of
the latent population, the sample standard deviation is in general smaller than that of the latent
population. This is not a serious problem if all the respondents are measured within the same
PSD. But it could be a problem if respondents are compared using alternative test forms that
have much different PSDs. The same problem occurs, of course, when number-right scores from
alternative test forms with differing reliabilities are used to compare respondents. Tests adminis-
trators should avoid making comparisons between respondents who have taken alternative forms
that differed appreciably in their psychometric properties. A further implication is that, if EAP
estimates are used in computerized adaptive testing, the trials should not terminate after a fixed
number of items, but should continue until a prespecified PSD is reached.

Bayes modal estimation

Similar to the Bayes estimator, but with a somewhat larger average error, is the Bayes modal or
so-called maximum a posteriori (MAP) estimator. It is the value of θ that maximizes


$$\log_e P(\theta \mid \mathbf{x}_i) = \sum_{j=1}^{n}\left\{x_{ij}\log_e P_j(\theta) + (1 - x_{ij})\log_e[1 - P_j(\theta)]\right\} + \log_e g(\theta),$$

where $g(\theta)$ is the density function of a continuous population distribution of $\theta$. The likelihood
equation is

$$\sum_{j=1}^{n} \frac{x_{ij} - P_j(\theta)}{P_j(\theta)[1 - P_j(\theta)]}\cdot\frac{\partial P_j(\theta)}{\partial\theta} + \frac{\partial \log_e g(\theta)}{\partial\theta} = 0.$$

Analogous to the maximum likelihood estimate, the MAP estimate is calculated by Fisher scor-
ing, employing the posterior information,

$$J(\theta) = I(\theta) - \frac{\partial^2 \log_e g(\theta)}{\partial\theta^2},$$

where the right-most term is the negative of the second derivative of the population log density of $\theta$.

In the case of the 2PL model and a normal distribution of $\theta$ with variance $\sigma^2$, the posterior in-
formation is

$$J(\theta) = \sum_{j=1}^{n} a_j^2\,P_j(\theta)[1 - P_j(\theta)] + \frac{1}{\sigma^2}.$$

The PSD of the MAP estimate, $\hat{\theta}$, is approximated by

$$\mathrm{PSD}(\hat{\theta}) = 1\big/\sqrt{J(\hat{\theta})}.$$

Like the EAP estimator, the MAP estimator exists for all response patterns, but is generally bi-
ased toward the population mean.
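Only a small change to the earlier ML Fisher-scoring sketch is needed for the MAP estimate: the prior's contributions are added to the score and to the information. A hedged sketch for the 2PL with a normal prior follows; names and defaults are ours.

```python
import numpy as np

def map_score_2pl(x, a, b, sigma=1.0, max_iter=20, tol=1e-6):
    """Bayes modal (MAP) estimate for the 2PL with a N(0, sigma^2) prior.

    Adds the prior's gradient -theta/sigma^2 to the score and 1/sigma^2
    to the information, so the estimate exists for every response pattern.
    """
    x, a, b = map(np.asarray, (x, a, b))
    theta = 0.0
    for _ in range(max_iter):
        P = 1.0 / (1.0 + np.exp(-a * (theta - b)))
        score = np.sum(a * (x - P)) - theta / sigma**2
        J = np.sum(a**2 * P * (1.0 - P)) + 1.0 / sigma**2  # posterior info
        step = score / J
        theta += step
        if abs(step) < tol:
            break
    return theta, 1.0 / np.sqrt(J)   # PSD of the MAP estimate
```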

8.2.3 Test and item information

According to classical theory, the standard error of measurement (SEM) is a function only of the
test reliability and the variance of the score distribution. But this is an oversimplification. Actu-
ally, the error standard deviation of a score on a test of finite length—whether the classical num-
ber-right score or an IRT scale score—also depends upon the level of the score itself.

When the maximum likelihood estimator is used to obtain an IRT scale score, the SEMs of the
three logistic models expressed in the normal metric are as follows:


1PL:

$$\mathrm{S.E.}_{(1)}(\hat{\theta}) = 1\bigg/\left\{D^2 a^2 \sum_{j=1}^{n} P_{(1)j}(\theta)\,[1 - P_{(1)j}(\theta)]\right\}^{1/2} \qquad (8.18)$$

2PL:

$$\mathrm{S.E.}_{(2)}(\hat{\theta}) = 1\bigg/\left\{D^2 \sum_{j=1}^{n} a_j^2\,P_{(2)j}(\theta)\,[1 - P_{(2)j}(\theta)]\right\}^{1/2} \qquad (8.19)$$

3PL:

$$\mathrm{S.E.}_{(3)}(\hat{\theta}) = 1\bigg/\left\{D^2 \sum_{j=1}^{n} a_j^2\,\frac{1 - P_{(3)j}(\theta)}{P_{(3)j}(\theta)}\left[\frac{P_{(3)j}(\theta) - g_j}{1 - g_j}\right]^2\right\}^{1/2} \qquad (8.20)$$

Although these formulas are more realistic than the classical standard error of measurement, they
are nevertheless approximations. Strictly speaking, they are exact only as the number of items
becomes indefinitely large. But in general, they are good approximations for tests with as few as
ten or twenty items. Although they neglect the errors due to estimating the item parameters, these
errors are inconsequential if the calibration sample is large.

Because the terms that are summed in the information functions (8.18), (8.19), and (8.20) can be
regarded as the information functions of the items, they show how the SEM depends upon the
item slopes, locations and lower asymptotes. By plotting the item information functions of the
items against the test information, the test constructor can see which items are contributing most
to increasing the test information in relevant regions of the scale, and thus to decreasing the
SEM. The plots show where additional items are needed to improve the precision of measure-
ment locally. Generally, the aim is to produce a test information function that is high and flat over the
range of θ in which accurate measurement is required.
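As a sketch of how such plots can be prepared, the function below (ours) evaluates the 3PL item information terms of (8.20) on a grid of scale scores; summing across items gives the test information, whose reciprocal square root is the SEM curve.

```python
import numpy as np

D = 1.7  # adjustment from the logistic to the normal metric

def info_3pl(theta_grid, a, b, g):
    """Item information functions of the 3PL model, one column per item."""
    t = np.asarray(theta_grid, dtype=float)[:, None]
    a, b, g = map(np.asarray, (a, b, g))
    P = g + (1.0 - g) / (1.0 + np.exp(-D * a * (t - b)))
    return D**2 * a**2 * ((1.0 - P) / P) * ((P - g) / (1.0 - g)) ** 2

# Test information and SEM over the grid:
# sem = 1.0 / np.sqrt(info_3pl(grid, a, b, g).sum(axis=1))
```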

It is evident in the information functions for the logistic models that, as Pj (θ ) goes to 1 or 0 (or
to g j for the 3PL model), the information goes to zero and the standard error to infinity. Thus,
the ML estimator is effective only over a finite range. As a result, it is necessary to set some
limit, perhaps ±5 standard deviations of the latent distribution, as upper and lower bounds of θ .

The posterior information for the Bayes modal (MAP) estimator has properties similar to those
of the Fisher information of the ML estimator except that, when the prior is suitably chosen (e.g.,
normal) the posterior information does not go to zero as θ becomes extreme. Rather, for a nor-
mal prior, the posterior information goes to 1/ σ 2 , and the SEM goes to the population standard
deviation, σ , which means that nothing is known about θ except that it is very large or very
small, depending on the sign of θ .

The squared inverse posterior standard deviation (PSD) of the Bayes (EAP) estimator does not
have the convenient additive property of the Fisher and posterior information. But because of the
equivalence of the EAP and MAP estimators as the number of items becomes large, ML infor-
mation analysis of items can be applied to the EAP estimation for most practical purposes of test
construction.

8.2.4 Effects of guessing

Guessing in response to multiple-choice items has a deleterious effect on any estimator of ability,
classical or IRT. For the three-parameter model, the average effect of guessing, and thus the size of
the asymptote parameter, $g_j$, can be reduced by instructing the examinees to omit the item rather
than make a blind guess. But when the three-parameter model is used in scoring, it does not dis-
tinguish between those examinees who omit and those who ignore the instructions and guess.

Two methods of improving the accuracy of scale score estimation in the presence of mixed omit-
ting and guessing have been proposed. One method is to assign to the omitted responses a prob-
ability equal to the asymptote parameter, g j , or to 1/A, where A is the number of alternatives of
the multiple-choice item (Lord, 1980, p. 229). In effect, the omitted responses are replaced by
guessed responses and scored fractionally correct.
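A sketch of this first method, fractional scoring of omits, follows; coding omits as NaN and the function name are our own conventions, and the fractional-exponent treatment is one possible reading of Lord's proposal.

```python
import numpy as np

def pattern_prob_with_omits(x, theta, a, b, g, n_alt=None):
    """Response-pattern probability with omits scored fractionally correct.

    Omits are coded as np.nan; each contributes a fractional "score" equal
    to the asymptote g_j, or 1/A when the number of alternatives n_alt is
    given (one reading of Lord, 1980, p. 229).
    """
    x, a, b, g = map(lambda v: np.asarray(v, dtype=float), (x, a, b, g))
    P = g + (1.0 - g) / (1.0 + np.exp(-1.7 * a * (theta - b)))
    frac = (1.0 / n_alt) if n_alt else g
    xs = np.where(np.isnan(x), frac, x)        # fractional scoring of omits
    return np.prod(P**xs * (1.0 - P) ** (1.0 - xs))
```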

The other method is to score omits as incorrect, but suppress the effects of guessing by giving
reduced weight to unlikely correct responses in the response pattern. A technique of robust data
analysis, called “biweighting”, has been proposed for this purpose (Mislevy & Bock, 1982).
Simulation studies have shown that such robustifying procedures improve the accuracy of esti-
mating ability in the presence of chance successes in response to multiple-choice items.

8.2.5 Aggregate-level IRT models

In some forms of educational assessment, scores are required for groups of students (schools,
for example) rather than for individual students (Mislevy, 1983). In these appli-
cations, IRT scale scores for the groups can be estimated directly from matrix sampling data if
the following conditions are met:

 The assessment instrument consists of 15 or more randomly parallel forms, each of which
contains exactly one item from each content element to be measured.
 The forms are assigned in rotation to students in the groups being assessed and adminis-
tered under identical conditions.

Under these conditions, it may be reasonable to assume that the ability measured by each scale is
normally distributed within the groups. In that case, the proportion of students in the groups who
respond correctly to each item of a scaled element will be well approximated by a logistic model
ment appears on a different form, these responses will be experimentally independent.

An aggregate-level IRT model can therefore be used to analyze data for the groups summarized
as the number of attempted responses, N hj , and the number of correct responses, rhj , to item j in
group h. The probability of these response frequencies for the n items of the element, given the
mean ability of the group, θ h , is then

$$P(\mathbf{r}_h \mid \mathbf{N}_h, \theta_h) = \prod_{j=1}^{n} \frac{N_{hj}!}{(N_{hj} - r_{hj})!\,r_{hj}!}\,[\Psi_j(\theta_h)]^{r_{hj}}\,[1 - \Psi_j(\theta_h)]^{N_{hj} - r_{hj}}. \qquad (8.21)$$

Using (8.21) in place of the individual-level pattern probability used in MML estimation,

$$P(\mathbf{x} \mid \theta) = \prod_{j=1}^{n} [P_j(\theta)]^{x_j}\,[1 - P_j(\theta)]^{1 - x_j},$$

we can carry out MML estimation of item parameters for the aggregate-level IRT model in the
same manner as estimation for the individual-level model. Scale scoring of the pattern of fre-
quencies of attempts and correct responses is performed by a similar substitution in (8.15), (8.16), or (8.17).
All other aspects of the IRT analysis are unchanged.

Unlike the individual-level analysis, the aggregate-level analysis permits a rigorous test of fit of the re-
sponse pattern for the group. Because the response frequencies for the items of a scaled element
are binomially distributed and independent, a likelihood ratio or Pearsonian χ 2 test statistic may
be computed to test the fit of the model within each group.
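As a sketch of the group-level computation, the function below evaluates the log of (8.21), with a logistic curve standing in for the generic response function $\Psi_j$; maximizing it over $\theta_h$ gives the group scale score. Names are ours.

```python
import numpy as np
from scipy.special import gammaln

def aggregate_loglik(r, N, theta_h, a, b):
    """Log of the binomial pattern probability (8.21) for one group,
    assuming a logistic Psi_j(theta) = 1/(1 + exp(-a(theta - b)))."""
    r, N, a, b = map(np.asarray, (r, N, a, b))
    Psi = 1.0 / (1.0 + np.exp(-a * (theta_h - b)))
    log_binom = gammaln(N + 1) - gammaln(r + 1) - gammaln(N - r + 1)
    return np.sum(log_binom + r * np.log(Psi) + (N - r) * np.log(1.0 - Psi))
```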

The starting values computed in the input phase and used in item parameter estimation in the
calibration phase in BILOG-MG are generally too high for aggregate-level models. The user
should reduce these values by substituting other starting values in the TEST command.

8.3 Estimation in PARSCALE
(This section was contributed by Eiji Muraki.)

PARSCALE estimates the parameters of the response models by marginal maximum likelihood
assuming either a normal or empirically estimated latent distribution with mean zero and stan-
dard deviation one (see Muraki, 1990). The EM algorithm is used in the solution of the likeli-
hood equations starting from the initial values described previously. The current version includes
the Newton cycles used in BILOG-MG to improve the EM results.

Because of the potentially wide spacing of category boundary locations on the latent dimension,
it is advisable to use a greater number of quadrature points than in BILOG-MG. Thirty points is
the default. Simulation studies show that with smaller numbers of points the item slopes are in-
creasingly underestimated. The effect tends to be proportional, however, and is hardly apparent
in the test scores when they are rescaled to an assigned standard deviation in the sample.

Despite the greater number of parameters in the multiple-category models as opposed to the bi-
nary, the greater information in the data allows stable estimation in similarly sized samples.
Sample sizes around 250 are marginally acceptable in research applications, but 500 or 1000
should be required in operational use. Beyond 1000, the additional precision may not justify the
additional computing time.

8.3.1 Prior densities for item parameters

For a slope parameter, we assume that the natural logarithm of the parameter, $\ln(a_j)$, is distrib-
uted as $N(\mu_a, \sigma_a^2)$:

$$f(a_j) = \frac{1}{\sigma_a (2\pi)^{1/2}} \exp\left[\frac{-(\ln a_j - \mu_a)^2}{2\sigma_a^2}\right].$$

We assume that a threshold parameter $b_j$ is distributed $N(\mu_b, \sigma_b^2)$:

$$f(b_j) = \frac{1}{\sigma_b (2\pi)^{1/2}} \exp\left[\frac{-(b_j - \mu_b)^2}{2\sigma_b^2}\right].$$

For a guessing or lower asymptote parameter $g_j$, a beta density is used:

$$f(g_j) = \frac{g_j^{\alpha-1}\,(1 - g_j)^{\beta-1}}{B(\alpha, \beta)},$$

where $B(\alpha, \beta)$ is the beta function.
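These three densities enter estimation as a log-prior term added to the marginal log likelihood. A sketch of their joint evaluation, with placeholder hyperparameter values that are not PARSCALE's defaults:

```python
import numpy as np
from scipy.stats import norm, beta

def log_prior(a_j, b_j, g_j, mu_a=0.0, s_a=0.5, mu_b=0.0, s_b=2.0,
              alpha=6.0, beta_par=16.0):
    """Joint log prior for one item: lognormal-style prior on the slope
    (normal on ln a_j), normal on the threshold, beta on the asymptote."""
    return (norm.logpdf(np.log(a_j), mu_a, s_a)
            + norm.logpdf(b_j, mu_b, s_b)
            + beta.logpdf(g_j, alpha, beta_par))
```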

8.3.2 Rescaling the parameters

The graded response model and the partial credit model contain the element

$$z_{jk}(\theta) = D a_j(\theta - b_{jk}) = D a_j(\theta - b_j + c_k).$$

Let us change the location and scaling of the original $\theta$ scale by

$$\theta^* = A\theta - B,$$


where $\theta \sim N(m, s^2)$ and $\theta^* \sim N(m^*, s^{*2})$; then $A = s^*/s$ and $B = Am - m^*$. Then, the element
$z_{jk}(\theta^*)$ on the new $\theta^*$ scale is expressed by

$$z_{jk}(\theta^*) = D a_j^*(\theta^* - b_{jk}^*) = D a_j^* A\left(\theta - \frac{b_{jk}^* + B}{A}\right),$$

where D is the adjustment for a normal metric. We then obtain the following relations:

$$a_j^* = \frac{a_j}{A}$$

and

$$b_{jk}^* = A b_{jk} - B.$$

We define $b_{jk} = b_j - c_k$ and constrain the category parameters by

$$\sum_{k=0}^{m_j} c_k = \sum_{k=0}^{m_j} c_k^* = 0.$$

Therefore, the location shift, B, is absorbed by the item location parameter, $b_j$. Consequently, we
obtain $b_j^* = A b_j - B$ and $c_k^* = A c_k$.
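The rescaling relations are simple enough to apply by hand, but a short sketch (names ours) makes the bookkeeping explicit:

```python
def rescale(a, b_jk, c, m, s, m_star, s_star):
    """Carry item parameters from the theta scale N(m, s^2) to the
    theta* scale N(m*, s*^2) via a* = a/A, b* = A*b - B, c* = A*c."""
    A = s_star / s
    B = A * m - m_star
    a_star = [aj / A for aj in a]
    b_star = [A * bjk - B for bjk in b_jk]
    c_star = [A * ck for ck in c]
    return a_star, b_star, c_star
```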

8.3.3 The information function

The item information function, $I_j(\theta)$, is the information contributed by a specific item j. The
item information for the polytomous item response model as proposed by Samejima (1974) is

$$I_j(\theta) = \sum_{k=0}^{m_j} A_{jk}(\theta) = \sum_{k=0}^{m_j} \frac{\left[\dfrac{\partial}{\partial\theta} P_{jk}(\theta)\right]^2}{P_{jk}(\theta)},$$

where $A_{jk}(\theta)$ is called the basic function of the item response model.


For the normal ogive form of the graded response model, the basic function $A_{jk}(\theta)$ is written as

$$A_{jk}(\theta) = D^2 a_j^2\,\frac{[\varphi_{jk}(\theta) - \varphi_{j,k+1}(\theta)]^2}{P_{jk}(\theta)},$$

where $\varphi_{jk}(\theta)$ is the normal ordinate for $P_{jk}^+(\theta)$. For the logistic graded response model, the basic
function becomes

$$A_{jk}(\theta) = D^2 a_j^2\,\frac{\left\{P_{jk}^+(\theta)[1 - P_{jk}^+(\theta)] - P_{j,k+1}^+(\theta)[1 - P_{j,k+1}^+(\theta)]\right\}^2}{P_{jk}(\theta)}.$$
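A sketch evaluating the logistic graded-model item information at one $\theta$ by summing the basic functions over categories; augmenting the boundary curves with 1 and 0 at the two extremes is our own bookkeeping convention.

```python
import numpy as np

D = 1.7

def graded_item_info(theta, a_j, boundaries):
    """Item information of the logistic graded response model at one theta.

    boundaries holds the ordered category boundary locations b_jk; the
    boundary curves P+_jk are augmented with 1 and 0 at the extremes.
    """
    b = np.asarray(boundaries, dtype=float)
    Pplus = np.concatenate(
        ([1.0], 1.0 / (1.0 + np.exp(-D * a_j * (theta - b))), [0.0]))
    Pk = Pplus[:-1] - Pplus[1:]        # category probabilities P_jk
    W = Pplus * (1.0 - Pplus)          # P+(1 - P+) at each boundary
    return np.sum(D**2 * a_j**2 * (W[:-1] - W[1:]) ** 2 / Pk)
```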

The item information for the partial credit model is

$$I_j(\theta) = D^2 a_j^2 \left[\sum_{c=0}^{m_j} T_c^2\,P_{jc}(\theta) - \left(\sum_{c=0}^{m_j} T_c\,P_{jc}(\theta)\right)^2\right].$$

For the case of dichotomous item responses, the equation simplifies to

$$I_j(\theta) = D^2 a_j^2\,(T_0 - T_1)^2\,P_{j0}(\theta)\,P_{j1}(\theta),$$

where $P_{j1}(\theta) = 1 - P_{j0}(\theta)$.

Bock (1972) proposed the information due to the response in category k of item j as the partition
of the item information, that is,

$$I_{jk}(\theta) = P_{jk}(\theta)\,I_j(\theta).$$

This result may be called the item’s response information function, according to Samejima’s
term, although she formulated it slightly differently.

The item information function may also be expressed by the summation of the response infor-
mation functions:

$$I_j(\theta) = \sum_{k=0}^{m_j} I_{jk}(\theta).$$


Finally, the test information function is defined as the summation of item information functions:

$$I(\theta) = \sum_{j=1}^{n} I_j(\theta).$$

8.3.4 Warm’s weighted ML estimation of ability parameters

Warm’s (1989) weighted maximum likelihood (WML) estimator is obtained by maximizing the
likelihood weighted by a square root of the test information function. The likelihood of a par-
ticular response vector ( U jk ) given θ is

$$L[(U_{jk}) \mid \theta] = \prod_{j=1}^{n}\prod_{k=0}^{m_j} [P_{jk}(\theta)]^{U_{jk}}.$$

The weighted likelihood is given by

$$L^*[(U_{jk}) \mid \theta] = f(\theta)\,L[(U_{jk}) \mid \theta].$$

Taking a natural logarithm of the weighted likelihood above yields

$$\ln L^*[(U_{jk}) \mid \theta] = \ln f(\theta) + \ln L[(U_{jk}) \mid \theta].$$

A weighted maximum likelihood estimator WML( θ ) is the value that maximizes the weighted
likelihood above. If f (θ ) is a positive constant, WML( θ ) is a maximum likelihood estimate of
θ . If f (θ ) is a square root of the test information function I (θ ) , it is called Warm’s weighted
maximum likelihood estimate, WML( θ ). This is not a Bayesian estimator of a latent trait
since f (θ ) is not a prior probability, but a reciprocal of the standard error of MLE( θ ).

The logarithm of the weighted likelihood for WML( θ ) becomes

$$\ln L^*[(U_{jk}) \mid \theta] = \frac{1}{2}\ln\left[\sum_{j=1}^{n}\sum_{k=0}^{m_j} P_{jk}(\theta)\,I_{jk}(\theta)\right] + \sum_{j=1}^{n}\sum_{k=0}^{m_j} U_{ijk}\ln[P_{jk}(\theta)].$$

The Newton-Raphson technique is used to obtain the MLE($\theta$) or WML($\theta$) via an iterative pro-
cedure. The Newton-Raphson estimation equation is given by

$$\hat{\theta}_{q+1} = \hat{\theta}_q - \left[\frac{\partial^2 \ln L^*[(U_{jk}) \mid \theta]}{\partial\theta^2}\right]^{-1}\left[\frac{\partial \ln L^*[(U_{jk}) \mid \theta]}{\partial\theta}\right].$$

The PARSCALE program can also compute EAP( θ ) scores in addition to MLE( θ ) and
WML( θ ).
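Because the derivative of $\tfrac{1}{2}\ln I(\theta)$ is tedious to write out for every model, the sketch below simply maximizes the weighted log likelihood on a grid, a robust stand-in for the Newton-Raphson iterations; the callables and grid limits are our own choices.

```python
import numpy as np

def wml_score(loglik, info, lo=-4.0, hi=4.0, n=401):
    """Warm's weighted ML estimate by grid maximization.

    loglik(theta) and info(theta) are callables returning ln L and the
    test information; the objective is 0.5*ln I(theta) + ln L(theta).
    """
    grid = np.linspace(lo, hi, n)
    obj = np.array([0.5 * np.log(info(t)) + loglik(t) for t in grid])
    return grid[np.argmax(obj)]
```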


8.4 Estimation in MULTILOG


8.4.1 Item parameter estimation

In the discussion in Section 3.1, we have assumed that the item parameters were known. Usually,
they are not. Current practice requires estimation of the item parameters using empirical data.
Usually it is desirable to use a sample sufficiently large that the standard errors of the estimated
item parameters are small, and can be ignored in future use of the parameters. Such a sample is
called a calibration sample.

The sample size required for useful item calibration varies widely, depending on the format of
the response and the strength of the relationship between the item responses and the trait. Con-
straints of the item parameters, in the form of equality constraints or prior information incorpo-
rated using Bayes' rule, facilitate estimation with relatively small samples. Large numbers of un-
constrained item parameters require relatively large samples. An example of a highly constrained
model is the Rasch (1960) 1PL model. Only a few hundred examinees may serve to calibrate a
test under this model. On the other hand, a model with three unconstrained parameters per item
(as in the 3PL: slope, location and “pseudo-guessing” parameters) may require tens of thousands
of examinees to calibrate successfully (Thissen and Wainer, 1982). The relationship of the item
responses to the trait is crucial: the parameters of items that are strongly influenced by the
trait may be estimated precisely with few observations, while weakly related items may require
more. No general guidelines are therefore available; in each case the standard errors of the
estimated parameters must be examined to determine whether the precision of estimation is satisfactory.

Item parameter calibration problems fall into two broad categories: those in which θ is consid-
ered fixed and those in which it is considered random. The random- θ problem is the situation
most commonly encountered in psychological measurement, but the fixed- θ case is simpler, so
we will discuss that first.

If θ is assumed to have fixed values, two more alternative conditions arise: the fixed values of θ
may also be taken to be known, or the values of θ may be taken to be unknown parameters. In
the former case, item parameter calibration is simply a problem in nonlinear regression of the
item responses on θ . An example of IRT item calibration in this sort of problem is provided in
the Roche, Wainer and Thissen (1975) measurement of skeletal age. The Roche et al. system
makes use of graded indicators of skeletal maturity that are observable on radiographs. Thus,
each indicator is like a test item, and skeletal age is estimated as was θ above. To calibrate the
indicators, or items, Roche et al. defined skeletal age to be linearly related to chronological age
in the population from birth to maturity. Then the parameters of the graded response model were
estimated by nonlinear regression of the observed indicator grades on the ages of those meas-
ured. This procedure depends on the existence of an observed variable linearly related to the trait
being measured, called a criterion.

Estimation is complicated somewhat if θ is taken to have fixed but unknown values for several
pre-defined groups of examinees. Bock (1983) considered such a problem in an item calibration
context very similar to the skeletal age problem, in which the items to be calibrated were 96
questions from the Stanford-Binet and the examinees were again classified by age. Each age
group was assumed to have some fixed mean developmental age ( θ ), which was estimated si-
multaneously with the item parameters. Bock (1976) and Kolakowski and Bock (1981) discuss
an algorithm for the simultaneous estimation of the fixed values of θ and the item parameters in
general, for any item response model. This procedure depends on the existence of a division of
the examinees into homogeneous groups with respect to the trait being measured. An example,
amounting to a very simple fixed effects, unknown- θ -calibration of a single “item”, is given in
Chapter 5.

When θ is assumed to be an unobserved random variable, the only fixed parameters to be esti-
mated are the item parameters, but their estimation is numerically complex. For categorical item
response models, Bock and Lieberman (1970) provided a theoretically satisfying but impractical
algorithm; Bock and Aitkin (1981) provided the workable algorithm used in MULTILOG. Its
workings are explained there, and by Thissen (1982) for the 1PL model and Thissen & Steinberg
(1984) for the multiple-choice model.

MULTILOG reports the closest approximation to reliability available in the context of IRT, the
so-called marginal reliability. This value is (effectively) an average reliability over levels of θ ;
it is an accurate characterization of the precision of measurement only if the test information is
relatively uniform.


9 Uses of Item Response Theory
(This section was contributed by R. Darrell Bock.)

9.1 Introduction

The development of item response theory (IRT) has reached a point where testing applications,
whether in educational or psychological testing programs or in research, can be carried out en-
tirely with IRT methods. These methods have significant advantages over those of classical test
theory: they improve the quality of the tests and scales produced, handle a wider range of re-
sponse modes, facilitate test equating, allow adaptive test administration to reduce testing time,
and offer important economies in labor and cost of test construction and maintenance. The more
limited methods that grew from classical theory were strongly conditioned by the rudimentary
data processing capabilities available in the formative years from 1910 to 1950. The more flexi-
ble and efficient, but computationally intensive, IRT methods could not develop and find practi-
cal use until electronic computation became widely accessible. Although classical and IRT
methods now exist side-by-side in computer implemented form, IRT uses the power of com-
puters in more varied and effective ways.

For the benefit of readers who have studied and worked with classical methods prior to or along
with IRT, this chapter contrasts these approaches to item response data in various areas of appli-
cation. In terms of present uses of tests and scales, the following five areas perhaps cover most
possibilities:

 Selection testing
 Qualification testing
 Program evaluation and assessment testing
 Clinical testing
 Measurement methods and research

These five areas are discussed in Sections 9.2 to 9.6. In Section 9.7 various approaches to analy-
sis of item response data are considered.

9.2 Selection testing

Selection tests are administered to persons competing for a limited number of positions in some
organization or program. Examples are college entrance examinations, employment tests, civil
service examinations, military enlistment tests, etc. A test as an aid to selection is valuable if it
predicts with acceptable accuracy some criterion of the person’s later performance. First-year
college grade-point average, success in a job training program, and on-the-job productivity are
typical examples of performance criteria. By suitable choice of item content and operating char-
acteristics, tests can be constructed to maximize correlation between the test scores and some
measure of the criterion. In most applications, prediction is further improved by use of multiple
regression procedures to combine the test score with other information about the applicant. A
major economic role of selection tests is to reduce losses incurred when a person selected for a
position proves untrainable or unable to perform work assignments satisfactorily.

9.3 Qualification testing

Qualification test results are used in connection with education or training as evidence that a
person has attained an acceptable level of knowledge or skills. Examples are tests required in
school promotion or graduation, licensing examinations of persons going into professions such
as law or medicine, pre-service testing of public school teachers, etc. In these applications a
clear-cut criterion of later performance rarely exists. The rational justification of the test is that it
samples the domain of competence. The percent of items on the test that the examinee responds
to satisfactorily is assumed to estimate the percent of mastery of the domain, which must be suf-
ficiently high to “pass” the test. Because qualification tests often have high stakes for persons
taking them, they must be carefully constructed to ensure that they represent fairly the domain of
competence and give consistent results from one test form to another.

9.4 Program evaluation and assessment testing

Evaluation tests are administered to persons in some program of education or training, not for
individual qualification, but to evaluate whether the program, or institution conducting the pro-
gram, is achieving its instructional goals. In the evaluation of schools or school systems, this test-
ing is now referred to as assessment. The objective of assessment is to stimulate and guide
change and improvements in instruction when they are needed. An important requirement of as-
sessment is, therefore, that it include measures of outcomes in all main areas of instruction; oth-
erwise, under pressure to obtain favorable results, schools may concentrate instruction on areas
that are tested at the expense of those that are not. Because assessment programs are often car-
ried out on a very large scale at state or national levels, and are intended to measure achievement
trends over a period of years, attention to the efficiency of the test forms to deliver accurate and
stable results is of the greatest importance.

9.5 Clinical testing

Clinical tests in fields such as counseling psychology, pediatrics, and psychiatry help in the iden-
tification of learning difficulties and behavioral disorders. The Binet and Wechsler I.Q. scales are
well-known examples of tests administered to children to determine whether they are learning
and reasoning at the level expected for their chronological age. The Minnesota Multiphasic Per-
sonality Inventory (MMPI) is the leading self-report device for obtaining information about per-
sonal adjustment problems and neuroses. Clinical tests are administered and interpreted only by
qualified professionals, usually on a one-to-one basis with the client. Ideally, they produce a pro-
file of scores exhibiting patterns that aid diagnosis of the behavioral problem. Because these tests
are limited to controlled clinical settings, they are in little danger from overexposure or compro-
mise and can remain in the same form over a period of years.


9.6 Measurement methods and research

In fields devoted to the study of human behavior—including psychology, education, sociology,
political science, and market research—tests and scales are instrumental in making behavioral
phenomena accessible to quantitative analysis. In so doing, they serve to operationalize con-
structs derived from some theory. Measurement methodology becomes relevant when the in-
struments consist of multiple items, responses to which must be combined into a score or profile
of scores for each observational unit (usually a person). Responses of a sample of persons to
items that can be interpreted in isolation—for example, opinion about a particular public issue, or
preference for a certain consumer product—can be analyzed by statistical methods for aggregate-
level categorical data and do not require classical test theory or IRT. If there are many alternative
items that could represent the construct, however, the question will arise as to how well the par-
ticular items in a test or scale serve in this role. The fundamental question is whether the items
represent one construct or more than one construct, and if the latter, how many and how charac-
terized. The item factor analysis procedures discussed in Section 7.5 provide some relevant an-
swers.

Once a construct is identified and items representing it are in hand, the questions focus on the
measurement characteristics of the resulting test or scale:

 can the full range of variation in the population of potential respondents be measured with
acceptable precision?
 what is the measurement error variance at various points on the scale?
 are scores obtained at different sites of application, or on different occasions in time, sta-
ble and consistent?
 if ratings based on human judgments are involved in scoring, are results sufficiently re-
producible between judges or between teams of judges recruited and trained at different
sites and times?

Classical test theory, especially generalizability theory, answers these questions in an average
sense, while IRT test information analysis gives a more detailed account of measurement error
throughout the range of scores. In addition, IRT test equating facilitates the construction of paral-
lel test forms measuring the construct, and these in turn can serve as “multiple indicators” for
structural equation modeling to validate the construct through its relationships with external vari-
ables.

9.7 Approaches to analysis of item response data

Once the test has been administered to persons in some population of interest and the item re-
sponses are in hand, certain analysis operations must be performed to put the information in the
data in usable form. Many testing programs and research organizations still perform these opera-
tions with procedures based entirely on classical test theory, others rely on a mixture of classical
and IRT methods, and a few others use IRT methods exclusively.


To give some idea of how day-to-day work of data analysis may change in a shift from classical
to IRT procedures, this section compares the two approaches in the areas of application detailed
previously. It also serves as a guide to the references for further reading at the end of the subsec-
tions. Although no two persons would likely agree on which or how many such aspects of data
analysis deserve attention, twelve topics frequently appearing in the current literature are dis-
cussed in the following sections. They are test scoring, test generalizability, item analysis, esti-
mating the population distribution, differential item functioning, forms equating, vertical equat-
ing, construct definition, analysis and scoring of rated responses, matrix sampling, estimating
domain scores, and adaptive testing.

9.7.1 Test scoring

A given of classical test theory is that the score on a test in which the responses are marked cor-
rect or incorrect is the number correct or percent correct. Minor variations are the score on a mul-
tiple-choice test corrected for guessing (number correct minus the quantity: number incorrect
divided by the number of choice alternatives less one), or arbitrary scoring formulas in which
some items count for more than others. It is an interesting fact that the number correct score was
not part of the first rationally developed standardized test—the Binet-Simon Intelligence Scale,
first published in 1909. The test consisted of an age-graded series of tasks and questions pre-
sented successively to a child by a test administrator. The score on the test, called the child’s
“mental age”, corresponded to the highest of several age-graded items that the administrator
found the child could complete successfully. The child’s “I.Q.” was defined as this mental age
divided by chronological age. There is a sense then in which measurement methodology has
come full circle, for, except in some special cases, number-correct is not a summary statistic used
by IRT in computing the score of a person taking the test. IRT uses instead the person’s total pat-
tern of correct and incorrect responses to the test items to estimate a score on the construct scale.
The result is referred to as a “full information” estimate: it makes use of all information in the
answer pattern, not just that in the number-correct count. Finding this IRT “scale score” is much
like locating the test taker’s position on the Binet-Simon scale, except that the continuum on
which it is expressed is not an external variable, such as age, but is a construct inferred from the
internal consistency of item responses within the sample data. During the early period when IRT
was oriented primarily toward selection testing, this construct was called “ability”. Later, as
qualification and program evaluation became more prominent in testing, the term “proficiency”
was introduced. In other areas of application—consumer research, for example—“preference” or
“propensity” would be apposite. “Proficiency” is used in the present writing.

The process of inferring a proficiency continuum is based on a mathematical model expressing
the probability of a correct response to an item as a function of the scale position of the person
taking the test and the values of certain parameters specific to the item. In effect, the continuum
is constructed so that the patterns of correct and incorrect responses in the sample data are best
accounted for by the item models. The models most commonly used for this purpose are the so-
called normal ogive model, based on the normal distribution function, and the very similar but
mathematically simpler logistic function.


These models have so-called threshold parameters that are related to the difficulty of the item
and determine where the item is located on the inferred scale. They also have slope parameters,
related to the discriminating power of the items, that determine how much each will influence
estimation of the proficiency scores. The score on the test is that point on the scale where, when
the person’s score is substituted in the item response models, the person’s pattern of correct and
incorrect responses is best accounted for. Scores determined in this way can be represented,
along with the item thresholds, on the proficiency scale in the same manner that mental ages and
age-graded items appear on the Binet intelligence scale.

The IRT method of extracting information about the person from the item responses, although
more intricate than the simple number-correct count, has several important advantages in testing
practice. First, the person’s scale score is little affected by adding or deleting items from the test.
The precision with which the scale point is located may change, but the meaning of the scale and
its units of measurement remain the same. This is not true of the number-correct score or even
the percent-correct score: they vary depending on the difficulties of the items added or removed;
if the average difficulties of the items change, the difficulty of the test as a whole is altered. This
does not happen with the IRT scale score because differences in item difficulty are accounted for
by the threshold parameters of the item response model.

Second, the IRT scale scores have smaller relative measurement error than number-right scores
because the influence of the items on the estimate is adjusted by the discriminating power pa-
rameters to minimize error. Finally, the IRT scale-score concept generalizes in a direct and con-
sistent way to other response modes—for example, extended responses to open-ended items
scored in graded categories by raters. Classical test theory has no comparable capability; it
merely resorts to arbitrary assignment of numerical values to the various grades and summing the
values to provide a score.

Apart from some remarks and references to examples in connection with item analysis and
analysis of rated responses, technical particulars of how item parameters and scale scores are es-
timated from item response data are beyond the scope of this section. Computer programs that
implement the IRT methods of estimation are described in the chapters to follow.

9.7.2 Test generalizability

A fundamental concept of both classical test theory and IRT is that the items of a test are a sam-
ple from some larger domain of items or tasks, any of which might equally well have been pre-
sented to the test taker. A score from any such test therefore raises the essential question of sam-
pling—namely, how much error variation in the test score must be attributed to the sampling
process. In classical test theory, this question is posed in terms of the so-called true score model,
in which the observed test score is assumed to be the sum of a true score component and error
component. The two components are defined to be statistically independent, such that the vari-
ance of the test score in the population of persons to be tested equals the sum of the variances of
the components. These variances can be estimated in test data by giving correct responses a score
of 1 and incorrect responses a score of 0, and carrying out a person-by-items analysis of vari-
ance. On the assumptions underlying this model, the square root of the estimated variance of the
error component is the standard error of measurement of the test. It can be used, for example, to
place an approximate 90 percent confidence interval on the true score (i.e., the observed score
plus or minus 2 times the standard error). The variance estimates can also be used to calculate a
generalizability index for the test as the ratio of the true score variance to the sum of the true
score variance and the error variance. This index is variously referred to as coefficient α , Kuder-
Richardson reliability, or test reliability. It can be modified to predict the coefficient of gener-
alizability of a test in which the number of items sampled is increased n times merely by dividing
the error variance component in the ratio by that factor. The resulting formula is equivalent to the
Spearman-Brown prophecy formula of classical test theory.

A more penetrating treatment of the classical model, called strong true score theory, shows the
preceding results to be an oversimplification. The standard error of a test score is not constant
but varies with the value of the test score. IRT results take this into account by providing, not just
one error estimate, but an error function, computable from the item parameters, that yields an es-
timate of the error variance specific to every point on the scale score. This function typically
shows the test to have the highest precision in the region of the scale where the item locations are
most dense. For a test in which the greater part of the item set is in the middle range of difficulty,
the error function tends to be “U” shaped.

A related, very useful concept is that of the test information function, which is the reciprocal of
the error function. The information function shows the relative precision of the test at every point
on the scale. High values of the information correspond to high precision of the scale score and
low values to low precision. The important property of the test information function is that it is
the sum of corresponding information functions of the items. Item information functions depend
on both the item location and its discriminating power. The maximum of the function occurs for
the normal and logistic models, for example, at the location of the item threshold, and the height
of the function increases and decreases with item discriminating power. The test information thus
shows in quantitative detail how the measurement precision of a test can be adapted to a particu-
lar application by the placement of items of differing difficulty and discriminating power. A test
can be made highly informative in a narrow score range by concentrating items in that interval,
or made uniformly informative over a wide range by spacing items evenly over the range. Incor-
porating effects of both item thresholds and discriminating powers, plots of item information
functions play the same role in IRT that plots of item difficulty vs. part-whole correlation play in
classical test theory.

9.7.3 Item analysis

In classical test theory, the estimation of item difficulties, part-whole correlations and other char-
acteristics, such as distractor use in multiple-choice items, is referred to as item analysis. The
corresponding operations of IRT theory are called item calibration. For the normal ogive or lo-
gistic model calibration involves the estimation of the item thresholds and discriminating powers
from samples of item response data. If the test contains multiple-choice items, then a modifica-
tion of these models that accounts for chance successes may be used. These so-called three-
parameter normal and logistic models require estimation of item-specific probabilities of correct
response due to guessing in addition to the threshold and slope parameters. In most instances,
only the more difficult items of a test, with their greater frequency of wrong responses, require a
three-parameter model.


An important purpose of item analysis, both in classical and IRT methodology, is to check the
extent to which each item belonging to some larger set represents the construct that the set is in-
tended to measure. The classical item statistic that conveys this information is the part-whole
correlation (computed as the biserial or point biserial correlation between the item 0, 1 score and
the test score). Of course, this correlation succeeds in this role only if the preponderant majority
of the items in the test are validly construct-related. When this condition is satisfied and a small
minority of items depart from the construct or are in some way ambiguous, their part-whole cor-
relations will be low. The IRT statistic that functions in a similar way is the slope parameter of
the item response model—high slopes correspond to high part-whole correlations, and vice
versa. In fact, the slope statistics can be converted into a correlation index, similar to part-whole
correlation, that measures the relationship between the item response and the inferred construct.
An operational difference between classical and IRT procedures, however, is that during score
estimation the presence of a very low slope parameter will automatically nullify the influence of
the item, whereas the item must be specifically excluded from the classical number-correct score.

The other essential statistic of classical item analysis is item difficulty or, more accurately, item
facility—namely, the percent or proportion of correct responses to the item when the test is ad-
ministered to a sample of persons representing the relevant population. It is well known from
classical and IRT theory that an item is most informative about a particular person when that per-
son’s probability of responding correctly is in the neighborhood of one-half. Although this prob-
ability will differ considerably among persons in the population, it is advisable from the stand-
point of minimizing the average measurement error that test items be selected so that an appre-
ciable proportion of persons has an intermediate probability of correct response. Near zero or
100 percent chances of correct response across the population as a whole are of no help in meas-
urement. The IRT statistic that measures item difficulty is the item-threshold parameter, located
at or near the point on the scale where a person with that scale score will have probability one-
half of responding correctly. This parameter is also sometimes referred to as the item location. It
is not related in a simple way to the percent of persons in the population expected to respond cor-
rectly to the item, but if the origin or units of the IRT scale are chosen suitably, the threshold pa-
rameter conveys similar information. The appropriate scaling convention for this purpose is to
set respectively values 0 and 1, to the mean and standard deviation of the distribution of scale
values in the sample data. If, as is often the case, this distribution is approximately normal, the
item thresholds are on a scale in which their values correspond to normal deviates in the popula-
tion of persons.

At the core of any IRT item analysis is the algorithm for estimating parameters of the response
models from a sample—preferably a large sample—of data obtained by administering the test to
persons in some population of interest. Fitting models to such data is referred to as item calibra-
tion. The most general and robust procedures for this purpose, applicable to any well-identified,
twice-differentiable model, are based on the statistical techniques of maximum marginal likeli-
hood or Bayes estimation. These methods give a single best estimate of each parameter of each
item, and also an interval estimate indicating the effect of sampling variation. They also provide
for statistical tests of the improvement of fit to the data when additional parameters are included.
With the multiple-group IRT models discussed below, more general forms of these methods also
estimate the proficiency distributions of the populations corresponding to the groups.


As mentioned above, IRT theory gives more precision to item selection criteria by combining the
information in the item slopes and thresholds into item information functions that accumulate
additively to form the test information function. A plot of item and test information functions on
the same scale as the sample score distribution conveys clearly how the items or tests will per-
form in the population of interest. The same approach applies to a classical statistic pertaining to
multiple-choice items—namely, the percent of responses in the sample falling into each of the
alternatives of the multiple-choice item. In IRT, the nominal categories model gives the prob-
ability of the correct response and of each of the distractors as a function of the scale score. It is
easy to identify in these plots the distractors that are not functioning as desired in various regions
of the range. In addition, the analysis under this model shows the amount of construct related in-
formation in the distractors as well as in the correct response. In many cases, plausible distractors
contain information that can improve the precision of estimated scale scores and can be recov-
ered by the IRT scoring procedure based on the model. This model is implemented in the
MULTILOG program.

9.7.4 Estimating the population distribution

For purposes of norming test results, it is necessary to estimate the distribution of test scores in
the population of interest. This presents a problem for classical test theory for two reasons. First,
the number-correct test scores contain both true score variation and measurement error variation;
since the measurement error variance is a function of test length, the variance of the score distri-
bution therefore depends on an arbitrary choice in test construction. Second, the shape of the test
score distribution depends arbitrarily upon the distribution of item difficulties in the test; tests
with severely skewed distributions of item difficulties will produce skewed distributions of test
scores in the population.

Classical test theory sidesteps these problems by expressing norms as population percentiles,
which are invariant with respect to the spread or shape of the score distribution. Further analysis
of the test scores by statistical methods that assume a normal distribution may still be affected,
however. IRT theory is more favorable in this respect in that the shape of the observed scale
score distribution is relatively little influenced by the distribution of item difficulties. If the true
score distribution is approximately normal, for example, the scale score distribution will be also.
The variance of the latter is still increased by measurement error, but as is also true of test scores,
the effect can be largely suppressed independent of test length by computing so-called “re-
gressed” or “shrunken” estimates as a function of test reliability. The Bayes (EAP) regressed and
Bayes modal (MAP) scores provided by the programs are regressed estimates.

IRT can handle this problem more rigorously, however, by estimating an inferred latent distribu-
tion of proficiency scores. The shape of the latent distribution is estimated directly from the pat-
terns of correct and incorrect responses to the test items and does not involve the test scores. If
there is only one sample group in the analysis, the location and dispersion of the latent distribu-
tion are indeterminate and must be set arbitrarily (e.g., to 0 and 1). If there are multiple sample
groups in the analysis, locations and dispersions of their latent distributions can be set relative to
a designated reference group or relative to arbitrarily set values in the combined groups. Multi-
ple-group analysis is implemented in the BILOG-MG and MULTILOG programs.


9.7.5 Differential item functioning

Almost any population of potential test takers will consist of identifiable subpopulations—
different age groups, the two sexes, urban or rural residents, education levels, ethnic and lan-
guage groups, etc. Relevant information on group membership may be available from back-
ground questionnaires administered along with the test. If so, the data will allow investigation of
whether persons in one such group experience differences in item difficulty or discriminating
power relative to those in other groups when all groups have equal mean scores on the test as a
whole. When this is the case, the test is said to exhibit differential item functioning (DIF). DIF is
essentially item by group interaction in item difficulty or discriminating power. If at the same
time, the groups show unequal mean test scores, the test is said to have adverse impact on the
groups that perform more poorly. Adverse impact can, of course, also occur in the absence of
DIF. Since DIF in effect alters the substantive meaning of the test score from one group to an-
other, it is undesirable and should be eliminated if possible. An English language vocabulary test
with words of Latin or Germanic origin, for example, will tend to show DIF with respect to Eng-
lish or Spanish as first language acquired. If only a few items of the test exhibit DIF, they usually
can be removed without impairing measurement of the intended construct.

The problem for the data analyst is how to detect DIF in tests that may also show adverse impact.
There are both classical and IRT approaches to this problem. The classical methods look for dif-
ferences in item difficulty among persons from different background groups whose tests scores
are equal or fall in a narrow score interval. A summary statistic for these differences over the
scores or score intervals provides a measure of DIF; an associated statistical test establishes its
presence. Based on a log-linear model of item by group interaction, a similar analysis can be car-
ried out with the so-called Mantel-Haenszel statistic.

The IRT treatment of DIF is an example of multiple-group analysis in which item thresholds or
discriminating power are estimated separately in each group and jointly with the group latent dis-
tributions, under the restriction that the means of the item thresholds must be equal in all groups.
The item guessing parameters, if any, are also restricted to be equal among groups. IRT estima-
tion of DIF effects includes standard errors that can be used to assess statistical significance of
effects for individual items. In addition, a test of DIF in all items jointly is provided by compari-
son of the goodness-of-fit of the response model when different thresholds or discriminating
power are assumed vs. the fit when a single set of thresholds or discriminating power is esti-
mated in the combined data. The IRT method of analyzing DIF is in general more sensitive than
its classical counterparts, especially with shorter tests, because IRT better defines the latent con-
struct measured by the test. DIF in item difficulty is implemented in BILOG-MG, PARSCALE
and MULTILOG. DIF in discriminating power is implemented in PARSCALE and
MULTILOG.

9.7.6 Forms equating

Many testing programs must update their test at regular intervals to prevent overexposure and
compromise of the item content. This creates the problem of keeping the reporting scores for
successive forms comparable so that a person is neither advantaged nor disadvantaged by taking
one form rather than another. Somehow, the reported results must allow for the differences in overall difficulty of the forms that inevitably occur when items are changed. Classical test theory
solves this problem by equivalent-groups equating. This method requires the alternative forms to
be assigned randomly to persons in some large sample. The randomization ensures that persons
taking different forms will have essentially the same true score distribution (provided that the
successor forms are of the same length as the preceding form and have similar distributions of
item difficulties and discriminating powers). If these conditions are met, the test scores for the
new forms can be expressed on the same scale as the old forms by assigning them to the corre-
sponding points of their respective observed score distributions. This is the equipercentile
method of keeping the score reports comparable to one another through successive generations
of test forms. If the distribution of item difficulties within the forms is more or less normally dis-
tributed and well centered for the population, the test score distributions will be approximately
normal. In that case, a nearly equivalent equating can be obtained merely by standardizing the scores of the various forms—that is, by setting the means and standard deviations of their respective distributions to any convenient fixed values. This method is called linear equating.
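
As an illustration of the arithmetic, the sketch below (Python; the form means and standard deviations are invented) re-expresses a new-form score on the old form's scale:

# Hypothetical sketch of linear equating: match the means and standard
# deviations of the two forms' score distributions.
mean_old, sd_old = 48.2, 9.5     # invented old-form statistics
mean_new, sd_new = 45.7, 10.1    # invented new-form statistics

def linear_equate(x_new):
    """Re-express a new-form number-correct score on the old-form scale."""
    z = (x_new - mean_new) / sd_new   # standardize on the new form
    return mean_old + sd_old * z      # rescale to the old-form distribution

print(round(linear_equate(50), 1))   # a new-form score of 50, equated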

Since IRT scale scores are much more likely than number-right scores to approximate a normal distribution, there is less need for equipercentile equating in IRT applications. Linear equating suffices, and it happens automatically if the origin and unit of the IRT scale are set so that the scale scores have a specified mean and standard deviation in the sample. In addition, IRT is unique in allowing the equating of forms administered to non-equivalent groups—i.e., groups with different true
score distributions. This type of equating requires, however, that the test forms share a certain
number of common “linking” items. Provided the linking items do not exhibit DIF with respect
to the groups, multiple-group IRT analysis of all forms together automatically produces a single
IRT scale on which the reported scores are comparable. The multiple-group procedure estimates
separate latent distributions of the groups jointly with the item parameters of all the forms. The
advantage of this method is that it does not require a separate administration of the forms to
some group of persons for purposes of equivalent groups equating. Forms can, for example, be
updated in the course of operational administrations of the test in which a certain proportion of
items from the previous year’s forms is carried over to the current year’s forms. A random sam-
ple of examinees from the previous year’s operational testing provides data for one of the groups,
and a similar sample from the current year provides data for the other. The resulting scale scores are linearly equated to those of the previous year by setting the mean and standard deviation of the latent distribution of the first group to their previous-year values. Estimates of change in the
mean and standard deviation between years are a by-product of the equating. If desired, non-
equivalent groups equating can be carried back more than one year, provided linking items exist
between at least adjacent pairs of forms. Multiple-groups forms equating is implemented in
BILOG-MG and MULTILOG.

9.7.7 Vertical equating

In school systems with a unified primary and secondary curriculum, there is often interest in
monitoring individual children’s growth in achievement from Kindergarten through eighth grade.
A number of test publishers have produced articulated series of tests covering this range for sub-
ject matter such as reading, mathematics, language skills, and, more recently, science. The tests
are scored on a single scale so that each child's gains in these subjects can be measured. The analytical procedure for placing results from the grade-specific test forms on a common scale for
this purpose is referred to as vertical equating.

The most widely used classical method of vertical equating is the transformation of test scores
into so-called grade equivalents. In essence, the number-correct scores for each year are scaled
in such a way that the mean score for the age group is equal to the numerical values of the grades
zero through eight. This convention permits a child’s performance on any test in the series to be
described in language similar to that used with the Binet mental age scale. One may say of a
child whose reading score exceeds the grade mean, for example, that he or she is “reading above
grade level”.

For IRT, vertical equating is merely another application of non-equivalent groups equating in
which the children administered particular grade-specific tests correspond to the groups. As in
the equating of updated forms mentioned above, linking items between at least consecutive
forms in the series are required. They must be provided in each subject matter included in the
graded series. (Note that grade-equivalent scaling does not require linking items.)

The two methods produce quite different scales. Grade equivalents are of course linear in school
grade. They treat the average gain between first and second grade, for example, as if it were
equal to that between seventh and eighth. On this scale, the amount of variation between chil-
dren’s scores appears to increase as the cohort moves through the grades, and there is a corre-
sponding positive correlation between a child’s average score level over the years and the child’s
average gain. In other words, children who begin at a lower level appear to gain less overall than
those who begin at a higher level. This so-called “fan-spread effect” is regularly seen in all sub-
ject matters.

On an IRT vertically equated scale, average gains are generally greatest at the earlier grade levels
and decrease with increasing grade. Within grade, standard deviations are fairly uniform, and the correlation between children's average score levels and their gains is small, or even slightly negative in some subject matters.

Unfortunately, there is no objective basis for deciding which of these scales better represents a
child’s true course of growth in knowledge and skills during the school years. Different IRT
models assuming other transformations of the proficiency scale could be made to fit the item response data equally well and yet exhibit quite different relationships between grade level and average score or average gain. Extrinsic considerations would have to be brought to
bear on the question to determine a preferred scale. For example, if one wished to compare an-
nual average gains in test performance of children in different classrooms when assignment to
classrooms is non-random, the scale that showed zero correlation between level and gain would
be most advantageous. IRT vertical equating comes much closer to this ideal than grade equiva-
lents, but might require some further transformation, possibly subject matter and site specific, to
attain complete independence of level and gain. (See Bock, Wolfe & Fisher, 1996, for a discus-
sion of this topic).


9.7.8 Construct definition

The discussion up to this point assumes that all items in the test measure the same underlying
construct. When it is not clear that the item set is homogeneous in this sense, steps must be taken
to explore the construct dimensionality of the set. The classical approach to this problem is to
perform, in a large sample of test data, a multiple factor analysis of the matrix of tetrachoric cor-
relations between all pairs of items. The more familiar Pearson product-moment correlation of
item responses assigned different numerical values if correct or incorrect (phi coefficient) is not
generally satisfactory for this purpose because variation in item difficulties introduces spurious
factors in the results. Random guessing on multiple-choice items has a similar effect that must
also be allowed for. Tetrachoric correlations with corrections for guessing are largely free of
these problems, but they have others of their own. One of these is computational instability that
appears when the correlations have large positive or negative values and the item difficulties are
very low or high; in these cases, it is often necessary to replace the correlation in the matrix with
an attributed default value. The other problem is that factor analysis of tetrachoric correlation
matrices almost always produces a certain number of small, unreal factors that are meaningless
and must be discarded.

IRT improves on this procedure by a method of full information item factor analysis that oper-
ates directly on the patterns of correct and incorrect responses without intervening computation
of correlation coefficients. In effect, this method fits a multidimensional item response model to
the patterns in the sample data. Full information item factor analysis is robust in the presence of
omitted or not-presented items and is free of the artifacts of the tetrachoric method. It also pro-
vides a statistical test of the number of factors that can be detected in the data.

The objective of both classical and IRT item factor analysis is the identification of items with
similar profiles of factor loadings—an indication that they arise from the same cognitive or af-
fective sources underlying the responses of persons taking the test. Objective methods of rotating
the factor structure, such as orthogonal varimax rotation and non-orthogonal promax rotation, are
especially effective in picking out clusters of items that identify these implicit constructs. The
presence of significant multiple factors in the data means that there are corresponding dimen-
sions of variation in the population of persons. In some cases, actual subgroups in the population
associated with particular factors can be identified by including demographic variables in the
analysis. Alternatively, they may be found by conventional multiple regression analysis of factor
scores for the persons, which are also provided by IRT full information item factor analysis. Full
information item factor analysis is implemented in the TESTFACT program.

9.7.9 Analysis and scoring of rated responses

When tests contain items or exercises that cannot be scored mechanically, the responses are often
rated on a graded scale that indicates quality or degree of correctness. For individually adminis-
tered intelligence tests the grading is done by the test administrator at the time the response is
recorded. For group administered open-ended exercises and essay questions, written responses
are graded later by trained raters. In both cases, the additional information conveyed, beyond that
provided by correct-incorrect scoring, provides better justification of the considerable cost of
graded scoring.


In addition to problems that may arise in preparing the rating protocol and training the raters for
graded scoring, analysis of the resulting data presents other difficulties not encountered with cor-
rect-incorrect scoring. How to combine the ratings into an overall score in a rational way is not at
all clear in classical test theory—especially so if the test also includes multiple-choice items. The
classical approach never goes much beyond mere assignment of arbitrary numerical values to the
scale categories and summing these values to obtain the test score. The arbitrariness of this
method, and the fact that items with different numbers of rating categories receive different
weights in the sum, has always proved troublesome.

In this respect, IRT methods are a very considerable advance. Item response models now exist
that express the probability of a response falling in a given graded category as a function of 1) the
respondent’s position on the IRT scale, 2) parameters for the spacing of the categories, and 3) the
difficulty and discriminating power of the item. Models for items with different numbers of rat-
ing categories and models for dichotomously scored responses can be mixed in any order when
analyzing items or scoring tests; arbitrary assignments of score points are not required. The IRT
test scoring based on these models makes use of the information in the pattern of ratings in a way
that is internally consistent in the data and minimizes measurement error. The IRT approach to
graded data allows tests to have more interesting and varied item formats and makes them acces-
sible to IRT methods of construction and forms equating. Provision for graded scores is included
in PARSCALE and MULTILOG.
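
To make the structure of such models concrete, the sketch below (Python) evaluates category response probabilities under a logistic graded model of the Samejima type; the discrimination and boundary values are invented for illustration and do not come from any example in this volume:

import math

def boundary(theta, a, b):
    """Logistic probability of responding in category k or above."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def graded_probs(theta, a, bs):
    """Category probabilities for an item with ordered boundaries bs."""
    # The probability of each category is the difference of adjacent
    # boundary curves; the probabilities sum to 1 over categories.
    p_star = [1.0] + [boundary(theta, a, b) for b in bs] + [0.0]
    return [p_star[k] - p_star[k + 1] for k in range(len(bs) + 1)]

# Invented example: discrimination 1.2, three boundaries, four categories
print(graded_probs(0.5, 1.2, [-1.0, 0.0, 1.5]))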

9.7.10 Matrix sampling

Testing at the state and national level plays a part not only in counseling or qualification of indi-
vidual students, but also in evaluating the effectiveness of instructional programs, schools, or
school systems. The objective is to compare instructional programs and schools with respect to
their strength and weaknesses in promoting student achievement in various categories of the cur-
riculum. Testing used in this way is referred to as assessment to distinguish it from student-
oriented achievement testing. Educational assessment is typically carried out in large-scale sur-
veys, often on a sampling basis rather than a total census of schools and students. The sampling
approach consists of drawing a probability sample of schools and, within these schools, testing a
random sample of students. To minimize the burden on schools and students alike there is an at-
tempt to test as many curricular categories as possible in a limited amount of time, usually one
class period. This is accomplished by assigning randomly to the selected students one of 20 or 30
different test forms each containing only a small number of the items representing a category.
Usually the categories are main topic areas within subject matters. The total sampling design can
be laid out as a table in which the rows correspond to schools and students tested and the col-
umns correspond to items sampled for the test forms and the categories within forms. This ar-
rangement is referred to as a matrix sample.

In the original conception of a matrix-sampled assessment, the score to be assigned to programs, schools, states, or demographic groups is the average aggregated percent-correct for the items in
each subject topic. As an aid to interpretation of differences between groups or between assess-
ment years, statistical theory provides formulas for standard errors for these average scores under
the assumptions of matrix sampling. This treatment of the data is within the framework of the
number-correct score concept of classical test theory, although no explicit scores for individual students are computed.

A problem with average-percent correct reporting occurs, however, if the assessment aims at
monitoring trends in average achievement over successive years. When the time comes to update
the items of the assessment instrument, new items substituted for old inevitably introduce
changes in average scores at higher levels of aggregation—changes which may be larger than the
expected differences between years, programs, or schools. Although scores on the successive
instruments can be made comparable by equivalent groups equating, very large sample groups
are required to bring the equating errors below the size of the smallest difference that would have
policy implications.

IRT nonequivalent groups equating, which can be done in the full operational samples, is much
more cost effective in this situation. It requires only that a certain proportion of items from the
previous assessment be carried over into the update to serve as links between successive forms.
Typically, one-third of the items are retained as links. A large random sample of cases from the
two assessments are then analyzed in a multiple-group IRT calibration that estimates the latent
distributions for the two assessment samples jointly with the new set of item parameters. The
link items serve to set the origin and unit of scale equal to those of the previous assessment.

Paralleling the average percent-correct approach, IRT can also estimate scores at the group-level
without intervening score estimation for individual students. This can be done in one of two
ways. If the interest is only in comparing mean scores among schools or higher level aggregates,
these quantities can be estimated directly from counts of the number of times each item is pre-
sented to a student in the group, and of these, the proportion correct. The group means are estimated on a scale that is standardized by setting to convenient fixed values the mean of the estimated group means, weighted by the numbers of persons tested in the respective groups, and the standard deviation of the estimated group means, calculated in a similar weighted form. Standard errors with respect to the sampling of students within schools are available for the estimated school means and the higher-order aggregate means.

If it is also of interest, however, to know something about the distribution of student achievement
within the aggregate groups, multiple-group IRT analysis can be used to estimate the latent dis-
tributions within the groups directly, without estimating scores for individual respondents. The
procedure is more efficient for a definite form of latent distribution, such as the normal or other
distribution that depends on a relatively small number of parameters. If a completely general
form is assumed, a nonparametric procedure, possibly involving computer simulations, may be
necessary.

9.7.11 Estimating domain scores

Both classical test theory and item response theory have to contend with the arbitrary nature of
test scores as measurements. As mentioned above, the classical number-correct score, and even
the length-independent percent-correct score, depend arbitrarily upon the difficulties of the items
selected for the test. The IRT scale score, although relatively free of that problem, is nevertheless
expressed on a scale of arbitrary origin and unit. The earliest and still most widely used method
of removing this arbitrariness is to scale the scores relative to their distribution in some large sample of persons taking the test. This is most commonly done by expressing the scores as per-
centiles of the distribution or as standardized scores, i.e., subtracting the mean of the distribution
from the observed score and dividing by the distribution standard deviation. This approach to
reporting test scores is called norm referencing; it assumes that comparison between persons is
the object of the testing, which undeniably it is in selection testing.

In the context of qualification testing, however, a more relevant objective is whether a person
taking the test shows evidence of having learned or mastered a satisfactory proportion of the
knowledge and skills required for qualification. Similarly, in program evaluation the objective is
whether a sufficient proportion of students in a program has reached a satisfactory level of learn-
ing or mastery. Reporting test results in these terms is referred to as domain referencing, or in a
somewhat similar usage, criterion referencing. For domain referencing to be realizable in prac-
tice, some reasonably large pool of items or exercises must exist to define the domain operation-
ally. Particular tests containing items or exercises from the pool may then be selected for pur-
poses of estimating domain scores.

The classical method of domain score estimation is to assume that items of the test are a random
sample of the pool. In that case, the test percent-correct directly estimates the domain percent-
correct, and its standard error can be computed from the test’s generalizability coefficient. IRT
can improve upon this estimate if response models for items in the pool have been calibrated in
data from a relevant population of examinees. With this information available, the items selected
for a particular test do not need to be a random sample of the pool. They need only be link items
in tests calibrated by non-equivalent groups equating. In that case, one estimates the domain
score by first estimating the person’s IRT scale score, then substituting the score in the model for
each test item to compute the person’s corresponding probability of correct response: the IRT
estimated domain score is the sum of these probabilities divided by the number of items on the
test. Domain scores estimated in this way are more accurate than classical estimates because they
take into account the varying difficulty and discriminating power of the items making up the test.
These methods of estimation can be carried out with multidimensional as well as unidimensional
response models. Domain scores are implemented in the BILOG-MG program.
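
A minimal sketch of this computation (Python; the 3-parameter item values and the scale score are invented, and the 1.7 constant is the usual normal-metric scaling) is:

import math

def p_correct(theta, a, b, c):
    """3-parameter logistic probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

# Invented calibrated (slope, threshold, asymptote) values for three items
items = [(1.1, -0.5, 0.2), (0.8, 0.3, 0.2), (1.4, 1.0, 0.2)]

theta_hat = 0.4   # the person's estimated IRT scale score (invented)
domain_score = sum(p_correct(theta_hat, a, b, c)
                   for a, b, c in items) / len(items)
print("Estimated domain proportion-correct:", round(domain_score, 3))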

9.7.12 Adaptive testing

Adaptive testing is a method of test administration in which items are chosen that are maximally
informative for each individual examinee. Among items with acceptable discriminating power,
those selected are at a level of difficulty that affords the examinee a roughly 50 percent probabil-
ity of correct response. This corresponds to minimum a priori knowledge of the response, and
thus maximum information gain from its observation.

The two main forms of adaptive test administration are two-stage testing and sequential item
testing. In the two-stage method, which is suitable for group administration, a brief first-stage
test is administered in order to obtain a rough provisional estimate of each examinee’s profi-
ciency level. At a later time, a longer second-stage test form is administered at a level of diffi-
culty adapted to the provisional score of each examinee. In sequential adaptive testing, usually
carried out by computer, a new provisional estimate of the examinee’s proficiency is calculated
after each item presentation, and a most informative next item is chosen based on that estimate.


The presentation sequence begins with an item of median difficulty in the population from which
the examinee is drawn. Depending on whether the response to that item is correct or incorrect,
the second item chosen is harder or easier. The presentations continue in this manner until the
successive provisional estimates of proficiency narrow in on a final value with acceptably small measurement error. Unlike two-stage testing, this method of administration requires the adaptive process to be carried out during the testing session. For this reason, computer administration is possible only if the items are machine scorable.
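
The item-selection step can be sketched as follows (Python; a 2-parameter logistic item bank with invented parameters). Each unused item's Fisher information is evaluated at the current provisional proficiency estimate, and the most informative item is presented next:

import math

def info_2pl(theta, a, b):
    """Fisher information of a 2PL item at proficiency theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

bank = {1: (1.2, -0.8), 2: (0.9, 0.1), 3: (1.5, 0.6), 4: (1.1, 1.4)}
administered = {1}    # items already presented
theta_hat = 0.3       # current provisional estimate (invented)

next_item = max((i for i in bank if i not in administered),
                key=lambda i: info_2pl(theta_hat, *bank[i]))
print("Next item to present:", next_item)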

When IRT scale scores are used to obtain the provisional estimates of proficiency in computer-
ized adaptive testing, the presented items must be calibrated beforehand in data obtained non-
adaptively. Once the system is in operation, however, items required for routine updating can be
calibrated “on line”. For this purpose, new items that are not part of the adaptive process must be
presented to examinees at random, usually in the early presentations. Responses to all items in
the sequence are then saved and assembled from all testing sites and sessions. A special type of
IRT calibration called variant item analysis is applied in which parameters are estimated for the
new “variant” items only; parameters of the old items are kept at the values used in the adaptive
testing. Because IRT calibration as well as scoring can be carried out on different arbitrary subsets of items presented to respondents, the parameters of the variant items are correctly estimated
in the calibration even though the old items have been presented non-randomly in the adaptive
process. Variant item analysis is implemented in the BILOG-MG program.

With different examinees presented items of differing difficulty in adaptive testing, the number-
correct score is not appropriate for comparing proficiency levels among examinees. For this rea-
son, no treatment of adaptive testing appeared within classical test theory, and hardly any discus-
sion of the topic arose until item response theory made it possible to estimate comparable scores
from arbitrary item subsets. That development, combined with the availability of computer ter-
minals and microcomputers, has made sequential testing a practical possibility. Significant appli-
cations of computerized adaptive testing have followed, particularly in the area of selection test-
ing. Apart from its logistical and operational convenience, the primary benefit of this method of
test administration is in reducing testing time. As little as one-third of the time required for a
non-adaptive test suffices for a fully adaptive sequential test of equal precision.


10 BILOG-MG examples

10.1 Conventional single-group IRT analysis

This example illustrates how the BILOG-MG program can be used for traditional IRT analyses.
The data are responses to 15 multiple-choice mathematics items that were administered to a
sample of eighth-grade students. The answer key and the omitted response key are in files called
exampl01.key and exampl01.omt, respectively (defined on the INPUT command).

The data lines, of which the first few lines are shown below, contain 15 item responses. This is
the simplest form in which raw data can be read from file: there is one line of data for each ex-
aminee, and the response to item 1, for example, can always be found in column 6. All items are
used on the single subtest. Item responses start in column 6 as reflected in the format statement
(4A1,1X,15A1).

1 242311431435242
2 243323413213131
3 142212441212312
4 341211323253521

Exampl01.key contains a single line:

KEY 341421323441413

With such a short test (15 items), item chi-squares are not reliable. For illustration purposes the
minimum number of items needed for chi-square computations has been reduced from the de-
fault of 20 to the number of items in this test, using the CHI keyword on the CALIB command.
With the item chi-squares computed, the PLOT=1 specification can now be used to plot all the
item response functions.

Note that the ICCs produced with the IRTPLOT program in the Windows version of BILOG-
MG display the χ²-test statistics, degrees of freedom, and probability, as well as the observed
response probabilities only for those items that have a significance level below the value speci-
fied with the PLOT keyword.

The scoring phase includes an information analysis (INFO=2) with expected information indices
for a normal population (POP). Rescaling of the scores and item parameters to mean 0 and stan-
dard deviation 1 in the estimated latent distribution has been requested (RSC=4). Printing of the
students' scores on the screen is suppressed (NOPRINT), because that information is saved in the
exampl01.sco file.

EXAMPL01.BLM - TRADITIONAL IRT ANALYSIS OF A FIFTEEN-ITEM PRETEST FROM
A TWO-STAGE TEST OF MATHEMATICS AT THE EIGHTH-GRADE LEVEL
>GLOBAL DFNAME='EXAMPL01.DAT', NPARM=3, SAVE;
>SAVE PARM='EXAMPL01.PAR', SCORE='EXAMPL01.SCO';
>LENGTH NITEMS=15;
>INPUT NTOTAL=15, NALT=5, NIDCHAR=4,
KFNAME='EXAMPL01.KEY', OFNAME='EXAMPL01.OMT';
>ITEMS INAMES=(MATH01(1)MATH15);
>TEST1 TNAME='PRETEST', INUMBER=(1(1)15);
(4A1,1X,15A1)
>CALIB NQPT=31, CYCLES=25, NEWTON=10, CRIT=0.001, ACCEL=0.0, CHI=15, PLOT=1;
>SCORE NOPRINT, RSCTYPE=4, INFO=2, POP;

Phase 1 output

This is a standard 3-parameter, one-form, single-group analysis of a 15-item test. The Phase 1
classical item statistics for the first 5 items are as follows.

ITEM STATISTICS FOR SUBTEST PRETEST


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
------------------------------------------------------------------------
1 MATH01 1000.0 844.0 84.4 -0.99 0.274 0.415
2 MATH02 1000.0 972.0 97.2 -2.09 0.112 0.285
3 MATH03 1000.0 696.0 69.6 -0.49 0.356 0.468
4 MATH04 1000.0 503.0 50.3 -0.01 0.442 0.553
5 MATH05 1000.0 594.0 59.4 -0.22 0.477 0.603
------------------------------------------------------------------------

Phase 2 output

No new features are illustrated in the Phase 2 analysis, except that the plot criterion has been set
to include all items.

>CALIB NQPT=31, CYCLES=25, NEWTON=10, CRIT=0.001, ACCEL=0.0, CHI=15, PLOT=1;

The first and last item response function plots are shown below. The first item is extremely easy
and the last extremely difficult. These plots were produced using the IRTGRAPH procedure,
which is accessed via the Plot option on the Run menu after completion of the analysis. Note
that the Phase 2 output file also contains similar line plots.


Phase 3 output

With this short, wide range test, ten quadrature points are sufficient for scoring. The item parameters are rescaled so that the scores have mean zero and standard deviation one in the latent
distribution estimated from the full sample of 1000 examinees. Population characteristics of the score information, including the IRT estimate of test reliability (equal to [score variance − 1/average information] / score variance), are shown with the information plot.
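
In code form, the quoted reliability expression amounts to the following sketch (Python; the average-information value is invented, while the unit score variance matches the rescaling used in this run):

# IRT estimate of test reliability from the score variance and the
# average test information in the population.
score_variance = 1.0        # rescaled scores: mean 0, SD 1
average_information = 4.2   # invented average of the test information curve

reliability = (score_variance - 1.0 / average_information) / score_variance
print(round(reliability, 3))   # about 0.762 for these invented values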

>SCORE, NOPRINT, RSCTYPE = 4, INFO = 2, POP;

QUAD RESCALING CONSTANTS
TEST NAME POINTS SCALE LOCATION
-------------------------------------------
1 PRETEST 10 1.000 0.000
-------------------------------------------

ITEM INFORMATION STATISTICS FOR TEST PRETEST FORM 1
FOR A NORMAL POPULATION WITH MEAN = 0.000 AND S.D. = 1.000

MAXIMUM POINT OF MAXIMUM AVERAGE
INFORMATION MAX INFORMATION EFFECTIVENESS INFORMATION
ITEM STANDARD STANDARD POINT OF INDEX OF
ERROR * ERROR * MAX EFFECTIVENESS * RELIABILITY
-------------------------------------------------------------------------
MATH01| 0.2142 | -1.3703 | 0.0587 | 0.1206
| 0.0579* | 0.2226* | -0.5284* | 0.1076*

(Similar output omitted)

MATH15| 1.0608 | 2.5110 | 0.0375 | 0.0476
| 0.8638* | 0.3388* | -1.9279* | 0.0454*
-------------------------------------------------------------------------

Using the Plot option on the Run menu to access the IRTGRAPH program, the following plot of
test information is obtained:


10.2 Differential item functioning

This example is based on an example in Thissen, Steinberg & Wainer (1993). The data are drawn from a 100-word spelling test administered by tape recorder to psychology students at a large university. The words for the test were randomly selected from a popular word book for secretaries. Students were asked to write the words as used in a sentence on the tape recording. Responses were scored 1 if spelled correctly and 0 if spelled incorrectly. Because the items are scored 1,0, an answer key is not required. A complete description of these data is given in Section 2.4.1.

The groups in this example are the two sexes, as indicated by the NGROUP keyword on the INPUT command. The same four items are presented to both groups on a single test form. The format statement following the second GROUP command describes the position and order of data in exampl02.dat. The group indicator is found in column 3 of the data records and is read in integer format. A form indicator is not required in the data records because there is only one form. The data have been sorted into answer patterns, and the frequencies are found in columns 10-11 of the data (F2.0). These frequencies serve as case weights in the analysis. The TYPE=2 and NWGHT=3 keywords describe this type of data. The value assigned to the keyword NWGHT requests the use of weighting in both the statistics and calibration (by default, no weights would be applied).

A 1-parameter logistic model is requested using the NPARM keyword on the GLOBAL command.
The LOGISTIC option on the GLOBAL command indicates that the natural metric of the logistic
response function will be assumed in all calculations. If this keyword is not present, the logit is,
by default, multiplied by 1.7 to obtain the metric of the normal response function.

The SAVE option on the GLOBAL command indicates that a SAVE command will follow directly
after the GLOBAL command. On the SAVE command, the item parameter estimates are saved to an
external file exampl02.par and the DIF analysis results are written to an external file ex-
ampl02.dif.

The total number of unique items is described using the NTOTAL keyword on the INPUT command
while the NITEMS keyword on the LENGTH command is set to 4 to indicate that all 4 items are to
be used in the single subtest.

The ITEMS command lists the four items in the order that they will be read from the data records.
The INAMES and INUMBERS keywords assign each item a name and a corresponding number. Be-
cause there is only one form, the NFORM keyword is not required in the INPUT command and a
FORM command is not required. Because examinees in both groups are presented all the items
listed in the ITEMS command, the TEST and GROUP commands need contain only the test name
and the group names, respectively.

A DIF analysis is requested through the use of the DIF option on the INPUT command.

The REFERENCE=1 keyword on the CALIB command designates males as the reference group. The
convergence criterion is set to 0.005 instead of the default 0.01 using the CRIT keyword.

When NGROUP > 1, 20 quadrature points will be used by default for each group. Setting the NQPT keyword to 10 implies that 10 points will be used for each group, as fewer points are needed when the number of items is small.

No SCORE command is included in the command file, as DIF models cannot be scored.

EXAMPL02.BLM - MALE VS FEMALE DIFFERENTIAL ITEM FUNCTIONING
SPELLING, GIRDER ITEM 4, OTHER 3 ITEMS 1-3
>GLOBAL NPARM=1, LOGISTIC, SAVE, NWGHT=3, DFNAME='EXAMPL02.DAT';
>SAVE PARM='EXAMPL02.PAR', DIF='EXAMPL02.DIF';
>LENGTH NITEMS=4;
>INPUT NTOTAL=4, NGROUPS=2, DIF, NIDCHAR=2, TYPE=2;
>ITEMS INAMES=(SP1(1)SP4), INUMBERS=(1(1)4);
>TEST TNAME=SPELL;
>GROUP1 GNAME=MALES;
>GROUP2 GNAME=FEMALES;
(2A1,I1,T10,F2.0,T5,4A1)
>CALIB NQPT=10, CYCLES=15, CRIT=0.005, NEWTON=2, REFERENCE=1, PLOT=1;

Phase 1 output

The title and additional comments (if the optional COMMENT command has been used) are echoed
to the output file. Immediately after that, Phase 1 commands and specifications of the analysis
are given. Under FILE ASSIGNMENT, relevant information as read in from the GLOBAL, SAVE, LENGTH, and TEST commands is listed.

EXAMPLE 02: MALE VS FEMALE DIFFERENTIAL ITEM FUNCTIONING
SPELLING, GIRDER ITEM 4, OTHER 3 ITEMS 1-3
>GLOBAL NPARM=1,LOGISTIC,SAVE,NWGHT=3, DFNAME=’EXAMPL02.DAT’;

FILE ASSIGNMENT AND DISPOSITION
===============================

SUBJECT DATA INPUT FILE EXAMPL02.DAT
BILOG-MG MASTER DATA FILE MF.DAT WILL BE CREATED FROM DATA FILE
CALIBRATION DATA FILE CF.DAT WILL BE CREATED FROM DATA FILE
ITEM PARAMETERS FILE IF.DAT WILL BE CREATED THIS RUN
CASE SCALE-SCORE FILE SF.DAT
CASE WEIGHTING FOR SUBJECT STATISTICS AND
ITEM CALIBRATION
ITEM RESPONSE MODEL 1 PARAMETER LOGISTIC
LOGIT METRIC (I.E., D = 1.0)

>SAVE PARM='EXAMPL02.PAR',DIF='EXAMPL02.DIF';

BILOG-MG SAVE FILES

[OUTPUT FILES]
ITEM PARAMETERS FILE EXAMPL02.PAR
DIF PARAMETER FILE EXAMPL02.DIF

>LENGTH NITEMS=4;


TEST LENGTH SPECIFICATIONS
==========================

MAIN TEST LENGTHS: 4

>INPUT NTOTAL=4,NGROUP=2,DIF,NIDCHAR=2,TYPE=2;

Specifications of the input-related keywords are echoed in the next section. The data are entered as item-score patterns (right = 1, wrong = 0) and frequencies (case weights).

DATA INPUT SPECIFICATIONS
=========================

NUMBER OF FORMAT LINES 1
NUMBER OF ITEMS IN INPUT STREAM 4
NUMBER OF RESPONSE ALTERNATIVES 1000
NUMBER OF SUBJECT ID CHARACTERS 2
NUMBER OF GROUPS 2
NUMBER OF TEST FORMS 1
TYPE OF DATA SINGLE-SUBJECT DATA, CASE WEIGHTS
MAXIMUM SAMPLE SIZE FOR ITEM CALIBRATION 10000000
ALL SUBJECTS INCLUDED IN RUN

>ITEMS INAMES=(SP1(1)SP4),INUMBERS=(1(1)4);

TEST SPECIFICATIONS
===================

>TEST TNAME=SPELL;

The following lines indicate the assignment of items to the single subtest, utilizing the informa-
tion on both the TEST and ITEMS commands.

TEST NUMBER: 1 TEST NAME: SPELL


NUMBER OF ITEMS: 4

ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM
NUMBER NAME NUMBER NAME NUMBER NAME NUMBER NAME
---------------------------------------------------------------------
1 SP1 2 SP2 3 SP3 4 SP4
---------------------------------------------------------------------

Information on the forms and groups is given next. The definition of the male and female groups,
and the use of the same four items for both groups are reflected below. It is also noted that a DIF
model is to be employed in this analysis.

FORM SPECIFICATIONS
===================
ITEMS READ ACCORDING TO SPECIFICATIONS ON THE ITEMS COMMAND

>GROUP1 GNAME=MALES;
>GROUP2 GNAME=FEMALES;


MULTIPLE GROUP SPECIFICATIONS
=============================
DIFFERENTIAL ITEM FUNCTIONING MODEL IS EMPLOYED.

GROUP NUMBER: 1 GROUP NAME: MALES


TEST NUMBER: 1 TEST NAME: SPELL
NUMBER OF ITEMS: 4

ITEM ITEM
NUMBER NAME
------------------
1 SP1
2 SP2
3 SP3
4 SP4
------------------

GROUP NUMBER: 2 GROUP NAME: FEMALES


TEST NUMBER: 1 TEST NAME: SPELL
NUMBER OF ITEMS: 4

ITEM ITEM
NUMBER NAME
------------------
1 SP1
2 SP2
3 SP3
4 SP4
------------------

Following is the format statement used in reading the data and the answer, omit, and not-present
keys (if any). Data for this example are item scores and they are complete; keys are not required.
The case ID is read in the first 2 columns (2A1), followed by the group indicator (I1). After read-
ing the weights (F2.0), the 4 item responses are read (4A1).

FORMAT FOR DATA INPUT IS:

(2A1,I1,T10,F2.0,T5,4A1)

The first two cases are echoed to the output file so that the user can verify the input.

OBSERVATION # 1 WEIGHT: 22.0000 ID : 1

SUBTEST #: 1 SPELL
GROUP #: 1 MALES

TRIED RIGHT
4.000 0.000

ITEM 1 2 3 4
TRIED 1.0 1.0 1.0 1.0
RIGHT 0.0 0.0 0.0 0.0


OBSERVATION # 2 WEIGHT: 10.0000 ID : 2

SUBTEST #: 1 SPELL
GROUP #: 1 MALES

TRIED RIGHT
4.000 1.000

ITEM 1 2 3 4
TRIED 1.0 1.0 1.0 1.0
RIGHT 0.0 0.0 0.0 1.0

Classical item statistics for the total sample and each group sample follow. #TRIED designates
the number of examinees responding to the item. For completeness, both the Pearson and biserial
item-test correlations are shown. The latter has smaller bias when the percent right is extreme.
The item statistics are given by group and then for the total group.

Item means, initial slope estimates, and Pearson and polyserial item-test correlations are given in
the next table.

Pearson

The point biserial correlation $r_{PB,j}$ for item j is a computationally simplified Pearson's r between the dichotomously scored item j and the total score x. It is computed as

$$ r_{PB,j} = \frac{\mu_j - \mu_x}{\sigma_x} \sqrt{\frac{p_j}{q_j}} $$

where $\mu_j$ is the mean total score among examinees who have responded correctly to item j, $\mu_x$ is the mean total score for all examinees, $p_j$ is the item difficulty index for item j, $q_j$ is $1 - p_j$, and $\sigma_x$ is the standard deviation of the total scores for all examinees.
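
As a numerical check on the formula, the sketch below (Python; the tiny item and total-score vectors are invented) computes the point biserial directly:

# Hypothetical sketch: point-biserial correlation of item j with total score.
item = [1, 0, 1, 1, 0, 1, 0, 1]    # invented 0/1 scores on item j
total = [7, 3, 6, 8, 2, 5, 4, 9]   # invented total test scores

n = len(total)
mu_x = sum(total) / n                                        # mean total score
mu_j = sum(t for y, t in zip(item, total) if y) / sum(item)  # mean if correct
p = sum(item) / n                                            # difficulty index
q = 1.0 - p
sigma_x = (sum((t - mu_x) ** 2 for t in total) / n) ** 0.5

r_pb = (mu_j - mu_x) / sigma_x * (p / q) ** 0.5
print(round(r_pb, 3))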

Polyserial correlation

The polyserial correlation $r_{P,j}$ can be expressed in terms of the point polyserial correlation as

$$ r_{P,j} = \frac{r_{PP,j}\,\sigma_j}{\sum_{k=1}^{m-1} h(z_{jk})} $$

where $z_{jk}$ is the z-score corresponding to the cumulative proportion $p_{jk}$ of the k-th response category of item j, $\sigma_j$ is the standard deviation of the item scores y for item j, $r_{PP,j}$ is the point-polyserial correlation, and $h(\cdot)$ is the ordinate of the standard normal density.

The biserial correlation estimates the relationship between the total score and the hypothetical
score on the continuous scale underlying the (dichotomous) item. The biserial correlation also
assumes a normal distribution of the hypothetical scores. The reason for reporting these correla-
tions separately for each group is that the appearance of large discrepancies between groups for a
given item would suggest that the assumption of a common slope is untenable. Note that, if a
biserial correlation more negative than –0.15 is detected by the program during this phase of the
analysis, the item in question will be assumed miskeyed and will be omitted in the Phase 2
analysis.

ITEM STATISTICS FOR GROUP: 1 MALES


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT PEARSON BISERIAL
-----------------------------------------------------------------------
1 SP1 285.0 215.0 0.754 -1.12 0.243 0.332
2 SP2 285.0 181.0 0.635 -0.55 0.351 0.450
3 SP3 285.0 91.0 0.319 0.76 0.364 0.474
4 SP4 285.0 179.0 0.628 -0.52 0.360 0.461
----------------------------------------------------------------------

ITEM STATISTICS FOR GROUP: 2 FEMALES


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT PEARSON BISERIAL
-----------------------------------------------------------------------
1 SP1 374.0 305.0 0.816 -1.49 0.254 0.370
2 SP2 374.0 230.0 0.615 -0.47 0.295 0.376
3 SP3 374.0 109.0 0.291 0.89 0.231 0.307
4 SP4 374.0 171.0 0.457 0.17 0.306 0.385
-----------------------------------------------------------------------

ITEM STATISTICS FOR MULTIPLE GROUPS SPELL


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT PEARSON BISERIAL
-----------------------------------------------------------------------
1 SP1 659.0 520.0 0.789 -1.32 0.239 0.337
2 SP2 659.0 411.0 0.624 -0.51 0.320 0.409
3 SP3 659.0 200.0 0.303 0.83 0.291 0.383
4 SP4 659.0 350.0 0.531 -0.12 0.324 0.406
-----------------------------------------------------------------------

Phase 2 output

During calibration, a logistic item response function is fitted to each item of each subscale. In
this example, a 1-parameter logistic response function is fitted (NPARM=1 on GLOBAL).

Echoing of the Phase 2 commands and specification of the analysis starts the listing of Phase 2
output.

>CALIB NQPT=10,CYCLES=15,CRIT=0.005,NEWTON=2,REFERENCE=1;


Under CALIBRATION PARAMETERS, the definitions of the calibration-related keywords for this analysis are given:

CALIBRATION PARAMETERS
======================

MAXIMUM NUMBER OF EM CYCLES: 15


MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION: 0.0050
ACCELERATION CONSTANT: 0.5000

LATENT DISTRIBUTION: EMPIRICAL PRIOR FOR EACH GROUP


ESTIMATED CONCURRENTLY
WITH ITEM PARAMETERS
REFERENCE GROUP: 1
PLOT EMPIRICAL VS. FITTED ICC’s: YES, FOR ITEMS WITH FIT PROBABILITY
LESS THAN 1.00000
DATA HANDLING: DATA ON SCRATCH FILE
CONSTRAINT DISTRIBUTION ON SLOPES: NO
CONSTRAINT DISTRIBUTION ON THRESHOLDS: NO

MML estimation is used when tests of three or more items are specified. The solution assumes that the respondents are drawn randomly from a population or populations of ability, each assumed to have a normal distribution. The empirical distribution of ability is represented as a discrete distribution on a finite number of points. The quadrature points and weights used for MML estimation of the item parameters for the two groups are printed next.

METHOD OF SOLUTION:
EM CYCLES (MAXIMUM OF 15)
FOLLOWED BY NEWTON-RAPHSON STEPS (MAXIMUM OF 2)

QUADRATURE POINTS AND PRIOR WEIGHTS: GROUP 1 MALES


1 2 3 4 5
POINT -0.4000E+01 -0.3111E+01 -0.2222E+01 -0.1333E+01 -0.4444E+00
WEIGHT 0.1190E-03 0.2805E-02 0.3002E-01 0.1458E+00 0.3213E+00

6 7 8 9 10
POINT 0.4444E+00 0.1333E+01 0.2222E+01 0.3111E+01 0.4000E+01
WEIGHT 0.3213E+00 0.1458E+00 0.3002E-01 0.2805E-02 0.1190E-03

QUADRATURE POINTS AND PRIOR WEIGHTS: GROUP 2 FEMALES


1 2 3 4 5
POINT -0.4000E+01 -0.3111E+01 -0.2222E+01 -0.1333E+01 -0.4444E+00
WEIGHT 0.1190E-03 0.2805E-02 0.3002E-01 0.1458E+00 0.3213E+00

6 7 8 9 10
POINT 0.4444E+00 0.1333E+01 0.2222E+01 0.3111E+01 0.4000E+01
WEIGHT 0.3213E+00 0.1458E+00 0.3002E-01 0.2805E-02 0.1190E-03
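
To illustrate how such a discrete representation is used, the sketch below (Python) recovers the mean and standard deviation of the Group 1 prior from the points and weights printed above; for this standard-normal prior they come out at approximately 0 and 1:

# Mean and SD of a discrete latent distribution, computed from the
# quadrature points and prior weights printed above for Group 1.
points = [-4.0, -3.111, -2.222, -1.333, -0.4444,
          0.4444, 1.333, 2.222, 3.111, 4.0]
weights = [0.1190e-3, 0.2805e-2, 0.3002e-1, 0.1458, 0.3213,
           0.3213, 0.1458, 0.3002e-1, 0.2805e-2, 0.1190e-3]

total = sum(weights)
mean = sum(w * x for w, x in zip(weights, points)) / total
var = sum(w * (x - mean) ** 2 for w, x in zip(weights, points)) / total
print(round(mean, 4), round(var ** 0.5, 4))   # approximately 0 and 1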

The MML solution employs both the EM method and Newton-Gauss iterations to solve the mar-
ginal likelihood equations. On the CALIB command, a maximum of 15 EM cycles and 2 Newton-
Gauss iterations were requested. Results for each iteration are displayed so that the extent of
convergence can be judged.


In the case of nested models on the same data, the −2 log likelihood values at convergence can be used to evaluate the fit of the models. Refitting this example, for instance, as a single-group analysis will allow the comparison of the non-DIF and DIF models for these data. In that way, it can be determined whether differential item functioning effects are present.

[E-M CYCLES]

-2 LOG LIKELIHOOD = 3152.375

CYCLE 1; LARGEST CHANGE= 0.17572


-2 LOG LIKELIHOOD = 3128.806

...

CYCLE 8; LARGEST CHANGE= 0.00486

The information matrix for all item parameters is approximated during each Newton step and then used at convergence to provide large-sample standard errors for the item parameter estimates.

[FULL NEWTON CYCLES]


-2 LOG LIKELIHOOD: 3110.3990
CYCLE 9; LARGEST CHANGE= 0.00416

In Phase 2, when there is a single group, the unit and origin of the scale on which the parameters are expressed are based on the assumption that the latent ability distribution has zero mean and unit variance (the so-called “0,1” metric). In the case of multiple groups, the program provides the option of setting the mean and standard deviation of one group to 0,1, as shown here. The user may set the mean and standard deviation of the combined estimated distribution of the groups to
0 and 1 by setting the REFERENCE keyword on the CALIB command to zero. The parameter esti-
mates can be rescaled in Phase 3 according to scale conventions selected by the user (using the
RSCTYPE, SCALE and LOCATION keywords on the SCORE command). In a DIF model, no scoring is
done, so use of the REFERENCE=0 specification is not pursued here.

Estimated item parameters for the two groups are given next. The INTERCEPT column contains the estimates of the item intercepts, which are defined as the negative of the product of each item's slope and threshold. This is followed by the slope or discrimination parameters and the item threshold or location parameters. The LOADING column represents the one-factor item factor loadings given by the expression

$$ \text{loading} = \frac{\text{slope}}{\sqrt{1.0 + \text{slope}^2}}. $$

In a 1PL model, no asymptote or guessing parameters are estimated, and all slopes are equal. In DIF analyses, the assumption is made that slopes are equal over the groups. This implies that items will discriminate equally well in all groups. Note that, in this example, the common slope of all items in both groups is estimated as 1.285.


GROUP 1 MALES ; ITEM PARAMETERS AFTER CYCLE 9

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
SP1 | 1.489 | 1.285 | -1.159 | 0.789 | 0.000 | 2.7 7.0
| 0.168* | 0.096* | 0.130* | 0.059* | 0.000* | (0.9146)
| | | | | |
SP2 | 0.749 | 1.285 | -0.583 | 0.789 | 0.000 | 20.0 7.0
| 0.152* | 0.096* | 0.119* | 0.059* | 0.000* | (0.0056)
| | | | | |
SP3 | -1.008 | 1.285 | 0.784 | 0.789 | 0.000 | 36.2 5.0
| 0.109* | 0.096* | 0.085* | 0.059* | 0.000* | (0.0000)
| | | | | |
SP4 | 0.709 | 1.285 | -0.552 | 0.789 | 0.000 | 26.6 7.0
| 0.150* | 0.096* | 0.117* | 0.059* | 0.000* | (0.0004)
-------------------------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.004158 95.3 12.0
(0.0000)
GROUP 2 FEMALES ; ITEM PARAMETERS AFTER CYCLE 9

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
SP1 | 1.887 | 1.285 | -1.468 | 0.789 | 0.000 | 11.3 7.0
| 0.168* | 0.096* | 0.131* | 0.059* | 0.000* | (0.1261)
| | | | | |
SP2 | 0.617 | 1.285 | -0.480 | 0.789 | 0.000 | 34.6 7.0
| 0.136* | 0.096* | 0.106* | 0.059* | 0.000* | (0.0000)
| | | | | |
SP3 | -1.113 | 1.285 | 0.866 | 0.789 | 0.000 | 24.5 7.0
| 0.144* | 0.096* | 0.112* | 0.059* | 0.000* | (0.0009)
| | | | | |
SP4 | -0.203 | 1.285 | 0.158 | 0.789 | 0.000 | 43.0 7.0
| 0.133* | 0.096* | 0.104* | 0.059* | 0.000* | (0.0000)
-------------------------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.004158 95.3 12.0
(0.0000)
NOTE: ITEM FIT CHI-SQUARES AND THEIR SUMS MAY BE UNRELIABLE
FOR TESTS WITH LESS THAN 20 ITEMS

The item parameter estimates for each group are followed by the averages of the group thresholds. The mean threshold of the female group (Group 2) is 0.146 above that of the male or reference group. DIF is item by group interaction under the constraint that the mean thresholds of the groups are equal. The threshold adjustment for the reference group is therefore 0.000, and the thresholds of the female group are accordingly adjusted by 0.146. The unadjusted and adjusted mean thresholds for the two groups form the next section of the Phase 2 output file.

PARAMETER MEAN STN DEV
-----------------------------------
GROUP: 1 NUMBER OF ITEMS: 4
THRESHOLD -0.377 0.823
GROUP: 2 NUMBER OF ITEMS: 4
THRESHOLD -0.231 0.991
-----------------------------------

646
10 BILOG-MG EXAMPLES

THRESHOLD MEANS
GROUP ADJUSTMENT
------------------------
1 0.000
2 0.146
------------------------

MODEL FOR GROUP DIFFERENTIAL ITEM FUNCTIONING:
ADJUSTED THRESHOLD VALUES
ITEM GROUP
1 2
----------------------------------
SP1 | -1.159 | -1.614
| 0.130* | 0.131*
| |
SP2 | -0.583 | -0.626
| 0.119* | 0.106*
| |
SP3 | 0.784 | 0.720
| 0.085* | 0.112*
| |
SP4 | -0.552 | 0.012
| 0.117* | 0.104*
----------------------------------
*STANDARD ERROR

The adjusted threshold values are followed by the group differences of the constrained values.
The standard errors for the differences are computed as

$$ \text{s.e.}_{G2-G1} = \sqrt{\operatorname{var}(G2) + \operatorname{var}(G1)}. $$

ITEM GROUP
2 - 1
-----------------------
SP1 | -0.455
| 0.185*
|
SP2 | -0.043
| 0.159*
|
SP3 | -0.065
| 0.141*
|
SP4 | 0.564
| 0.156*
-----------------------
*STANDARD ERROR
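
The tabled standard errors can be verified from this formula. For item SP1, for example (Python sketch, using the per-group standard errors printed in the adjusted-threshold table above):

# Standard error of the group threshold difference for item SP1.
se_g1, se_g2 = 0.130, 0.131                # per-group standard errors
se_diff = (se_g1 ** 2 + se_g2 ** 2) ** 0.5
print(round(se_diff, 3))                   # 0.185, matching the table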

The estimated latent distributions of the groups are given next, with the origin and unit of scale set so that the mean of the reference group is 0 and the standard deviation is 1.

GROUP: 1 MALES QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:

1 2 3 4 5
POINT -0.3578E+01 -0.2788E+01 -0.1998E+01 -0.1208E+01 -0.4180E+00
POSTERIOR 0.1972E-03 0.4485E-02 0.4394E-01 0.1737E+00 0.2780E+00


6 7 8 9 10
POINT 0.3720E+00 0.1162E+01 0.1952E+01 0.2742E+01 0.3532E+01
POSTERIOR 0.2647E+00 0.1724E+00 0.5526E-01 0.7020E-02 0.3483E-03

MEAN 0.00000
S.E. 0.00000

S.D. 1.00000
S.E. 0.00000

GROUP: 2 FEMALES QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:

1 2 3 4 5
POINT -0.3724E+01 -0.2934E+01 -0.2144E+01 -0.1354E+01 -0.5642E+00
POSTERIOR 0.2099E-03 0.4246E-02 0.3608E-01 0.1456E+00 0.3067E+00

6 7 8 9 10
POINT 0.2258E+00 0.1016E+01 0.1806E+01 0.2596E+01 0.3386E+01
POSTERIOR 0.3161E+00 0.1525E+00 0.3473E-01 0.3598E-02 0.1624E-03

MEAN -0.16191
S.E. 0.06907

S.D. 0.89707
S.E. 0.00845

A plot of the two estimated latent distributions is shown below. The solid line represents the estimated distribution of the male group.

BILOG-MG is also capable of producing graphic representations of a number of item and test
characteristics. Using the PLOT keyword on the CALIB command, it is possible to obtain plots of
the item-response functions with a significance level below the value assigned to the PLOT keyword. By default, PLOT=0 and no plots are produced. On the other hand, setting PLOT to 1.0 will
lead to the display of all item response functions in the output file. One such plot, for the fourth
item administered to the female group, is shown below.

The plot also shows 95% tolerance intervals for the observed percent correct among respondents in corresponding EAP groups, assuming the percent-correct predicted by the model is correct. Note that similar plots may be obtained using the IRTGRAPH program, accessed via the Plot option on the Run menu in BILOG-MG for Windows.

GROUP: 2 FEMALES
SUBTEST: SPELL

ITEM: SP4 CHISQ = 43.0 DF = 7.0 PROB< 0.0000


1.00+------------------------------------------------------------+
| |
| |
0.90| |
| ...|
| ... |
0.80| ... |
| X .. |
| | .. |
0.70| | .. |
| |.. |
| .| |
0.60| .. | |
| .. | |
| . |
0.50| | .. |
| | .. |
| |.. |
0.40| | |
| ..| |
| .. | |
0.30| | .. X |
| | ... |
| | .. |
0.20| .|. |
| ... | |
| .... | |
0.10| ...... X |
|.. |
| |
0.00| |
+--+-----+-----+-----+-----+-----+-----+-----+-----+-----+---+
THETA -1.86 -1.48 -1.11 -0.73 -0.36 0.02 0.40 0.77 1.15 1.53

By saving the item parameter estimates to an external file, the estimates can also be used in external packages to produce additional plots. Below, the item response functions for both groups are plotted by item.


10.3 Differential item functioning

The data from example 2 are analyzed here as a single group. Thus no NGROUP keyword is pro-
vided on the INPUT command and, by default, the program assumes there is only one group. No
GROUP commands follow the TEST command, and the group indicator has been removed from the
variable format statement.

The acceleration factor on the CALIB command has been set to its default value of 0.5
(ACCEL=0.5). The difference in the log likelihoods from the two-group and single-group solu-
tions can be examined to determine if differential item functioning effects are present. The item
parameter file obtained in the previous section is specified in the GLOBAL command to provide
starting values for parameter estimation in Phase 2.


EXAMPL03.BLM - MALE VS. FEMALE DIFFERENTIAL ITEM FUNCTIONING
SPELLING, GIRDER ITEM 4, OTHER 3 ITEMS 1-3
>GLOBAL NPARM=1, NWGHT=3, LOGISTIC, IFNAME='EXAMPL02.PAR',
DFNAME='EXAMPL02.DAT';
>LENGTH NITEMS=4;
>INPUT NTOTAL=4, NIDCHAR=2, TYPE=2;
>ITEMS INAME=(SP01,SP02,SP03,SP04), INUMBERS=(1(1)4);
>TEST TNAME=SPELL;
(2A1,T10,F2.0,T5,4A1)
>CALIB EMPIRICAL, NQPT=31, CRIT=0.005, ACCEL=0.5;

Phase 1 output

EXAMPLE 02: MALE VS. FEMALE DIFFERENTIAL ITEM FUNCTIONING
SPELLING, GIRDER ITEM 4, OTHER 3 ITEMS 1-3

The Phase 1 output for this example is the same as that obtained in Section 10.2, except that clas-
sical item statistics are computed only for the total sample.

Phase 2 output

The main interest in this example is the comparison of the fit of the DIF and non-DIF models. The difference in −2 log likelihood, 3138.4122 − 3110.3990 = 28.0132, distributed as χ² on four degrees of freedom, indicates significantly better fit of the DIF model.
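
The significance of this difference can be checked directly (Python sketch; for 4 degrees of freedom the chi-square survival function has a simple closed form):

# Likelihood-ratio comparison of the non-DIF and DIF models.
import math

lr = 3138.4122 - 3110.3990   # difference in -2 log likelihood
half = lr / 2.0
# For 4 degrees of freedom, P(chi-square > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-half) * (1.0 + half)
print(round(lr, 4), p_value)   # 28.0132, p on the order of 1e-5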

-2 LOG LIKELIHOOD: 3138.4122
CYCLE 4; LARGEST CHANGE= 0.00439

SUBTEST SPELL ; ITEM PARAMETERS AFTER CYCLE 4

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
SP01 | 1.688 | 1.234 | -1.368 | 0.777 | 0.000 | 15.4 3.0
| 0.124* | 0.094* | 0.101* | 0.059* | 0.000* | (0.0015)
| | | | | |
SP02 | 0.662 | 1.234 | -0.536 | 0.777 | 0.000 | 33.6 3.0
| 0.105* | 0.094* | 0.085* | 0.059* | 0.000* | (0.0000)
| | | | | |
SP03 | -1.069 | 1.234 | 0.866 | 0.777 | 0.000 | 17.9 3.0
| 0.111* | 0.094* | 0.090* | 0.059* | 0.000* | (0.0005)
| | | | | |
SP04 | 0.169 | 1.234 | -0.137 | 0.777 | 0.000 | 32.2 3.0
| 0.102* | 0.094* | 0.082* | 0.059* | 0.000* | (0.0000)
-------------------------------------------------------------------------------
* STANDARD ERROR

LARGEST CHANGE = 0.004551 99.0 12.0
(0.0000)

NOTE: ITEM FIT CHI-SQUARES AND THEIR SUMS MAY BE UNRELIABLE
FOR TESTS WITH LESS THAN 20 ITEMS


10.4 Equivalent groups equating

This example illustrates the equating of equivalent groups with the BILOG-MG program. Two
parallel test forms of 20 multiple-choice items were administered to two equivalent samples of
200 examinees drawn from the same population. There are no common items between the forms.
Because the samples were drawn from the same population, GROUP commands are not required.
The FORM1 command lists the order of the items in Form 1 and the FORM2 command lists the order of the items in Form 2. These commands follow directly after the TEST command, as indicated by the NFORM=2 keyword on the INPUT command. As only one test is used, the vector of items per subtest given by the NITEMS keyword on the LENGTH command contains only one entry.

The SAVE option on the GLOBAL command is used in combination with the SAVE command to
save item parameter estimates and scores to the external files exampl04.par and exampl04.sco
respectively.

In this example, 40 unique item responses are given in the data file. The first few lines of the data file are shown below. The answer keys for the two forms always appear first, in the same format as the data. The first record shown after the keys belongs to an examinee who took Form 1; its responses correspond to items 1 through 20. For an examinee who responded to the second form, responses in the same positions in the data file correspond to items 21 through 40. Keep in mind that the number of items read by the format statement is the total number of items in the form when NFORM=1, and the total number of items in the longest form when NFORM>1.

1 11111111111111111111
2 11111111111111111111
1 001 11111111122212122111
1 002 11222212221222222112
1 003 12121221222222221222
1 004 11212212222222212222

2 198 11112211111222212211
2 199 21122222222222222122
2 200 11111111111111221111
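To make the position-to-item correspondence concrete, the following minimal Python sketch (illustrative only; the function and names are hypothetical and not part of BILOG-MG) reproduces the mapping described above:

FORM_ITEMS = {1: list(range(1, 21)),    # Form 1 carries items 1-20
              2: list(range(21, 41))}   # Form 2 carries items 21-40

def item_number(form: int, position: int) -> int:
    # Item answered at a 1-based response position within a form's record.
    return FORM_ITEMS[form][position - 1]

assert item_number(1, 5) == 5    # 5th response on Form 1 is item 5
assert item_number(2, 5) == 25   # same position on Form 2 is item 25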

The FLOAT option is used on the CALIB command to request the estimation of the means of the prior distributions of item parameters along with the parameters themselves. This option should not be used when the data set is small and the items few; under those conditions, the means of the item parameters may drift indefinitely during the estimation cycles. In the CALIB command, the FIXED option is also required to keep the prior distributions of ability fixed during the EM cycles of this example. In multiple-group analysis, the default is “not fixed”.

ML estimates of ability are rescaled to a mean of 250 and standard deviation of 50 in Phase 3 (METHOD=1, RSCTYPE=3, LOCATION=250, SCALE=50). By setting INFO to 1 on the SCORE command, the printing of test information curves to the Phase 3 output file is requested. To request the calculation of expected information for the population, the POP option may be added to this command. In the case of multiple subtests, the further addition of the YCOMMON option will request that the test information curves for the subtests be expressed in comparable units.

EXAMPL04.BLM - EQUIVALENT GROUPS EQUATING


SIMULATED RESPONSES TO TWO 20-ITEM PARALLEL TEST FORMS
>GLOBAL DFNAME='EXAMPL04.DAT', NPARM=2, SAVE;
>SAVE SCORE='EXAMPL04.SCO', PARM='EXAMPL04.PAR';
>LENGTH NITEMS=40;
>INPUT NTOT=40, NFORM=2, KFNAME='EXAMPL04.DAT', NALT=5, NIDCHAR=5;
>ITEMS INUM=(1(1)40), INAME=(T01(1)T40);
>TEST TNAME=SIM;
>FORM1 LENGTH=20, INUM=(1(1)20);
>FORM2 LENGTH=20, INUM=(21(1)40);
(5A1,T1,I1,T7,20A1)
>CALIB FIXED, FLOAT, NQPT=31, TPRIOR, PLOT=.05;
>SCORE METHOD=1, RSCTYPE=3, LOCATION=250, SCALE=50, NOPRINT, INFO=1;

Phase 1 output

Because all examinees are drawn from the same population, all responses are combined in the results. Since there are no common items between forms, the number tried for each item is 200. If there had been common items, their number tried would have been 400. Results for the first 5 items are shown below.

400 OBSERVATIONS READ FROM FILE: EXAMPL04.DAT


400 OBSERVATIONS WRITTEN TO FILE: MF.DAT

ITEM STATISTICS FOR SUBTEST SIM


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
----------------------------------------------------------------------
1 T01 200.0 165.0 82.5 -0.91 0.446 0.658
2 T02 200.0 171.0 85.5 -1.04 0.416 0.642
3 T03 200.0 150.0 75.0 -0.65 0.524 0.715
4 T04 200.0 138.0 69.0 -0.47 0.448 0.588
5 T05 200.0 149.0 74.5 -0.63 0.391 0.531

Phase 2 output

Item parameter estimation assumes a common latent distribution for the random equivalent
groups administered the respective test forms. Empirical prior distributions are assumed for the
slope and threshold parameters. The means of these priors are estimated concurrently with the
item parameters.

CALIBRATION PARAMETERS
======================

MAXIMUM NUMBER OF EM CYCLES: 20


MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION: 0.0100
ACCELERATION CONSTANT: 1.0000
LATENT DISTRIBUTION: NORMAL PRIOR FOR EACH GROUP
PLOT EMPIRICAL VS. FITTED ICC’s: YES, FOR ITEMS WITH FIT
PROBABILITY
LESS THAN 0.05000
DATA HANDLING: DATA ON SCRATCH FILE


CONSTRAINT DISTRIBUTION ON SLOPES: YES


CONSTRAINT DISTRIBUTION ON THRESHOLDS: YES
SOURCE OF ITEM CONSTRAINT DISTRIBUTION
MEANS AND STANDARD DEVIATIONS: PROGRAM DEFAULTS
ITEM CONSTRAINTS IF PRESENT
WILL BE UPDATED EACH CYCLE

Final iterations of the solution and some of the results are as follows. Indeterminacy of the origin and unit of the ability scale is resolved in Phase 2 by setting the mean and standard deviation of the latent distribution to zero and one, respectively.

-2 LOG LIKELIHOOD = 8297.415

UPDATED PRIOR ON LOG SLOPES; MEAN & SD = -0.23882 0.50000


UPDATED PRIOR ON THRESHOLDS; MEAN & SD = -0.01801 2.00000

CYCLE 5; LARGEST CHANGE= 0.00752

[NEWTON CYCLES]

UPDATED PRIOR ON LOG SLOPES; MEAN & SD = -0.23457 0.50000


UPDATED PRIOR ON THRESHOLDS; MEAN & SD = -0.01751 2.00000

-2 LOG LIKELIHOOD: 8297.4560

CYCLE 6; LARGEST CHANGE= 0.00489

After assigning cases to the intervals (shown below) on the basis of the EAP estimates of their scale scores, the program computes the expected number of correct responses in each interval by multiplying the interval counts by the response model probability at the indicated θ. The χ² is computed in the usual way from the differences between the observed and expected counts.

The counts are displayed so that the user can judge whether there are enough cases in each group to justify computing a χ² statistic. If not, the user should reset the number of intervals.
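A Pearson-type form of such an item-fit statistic, shown here only as an illustration (the program's exact variant may differ in detail), is

$$ \chi_j^2 = \sum_{h=1}^{H} \frac{\left[ r_{jh} - N_h P_j(\bar\theta_h) \right]^2}{N_h\, P_j(\bar\theta_h)\left[ 1 - P_j(\bar\theta_h) \right]} , $$

where N_h is the count of cases in interval h, r_jh is the observed number of correct responses to item j in that interval, and the interval average θ serves as the evaluation point.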

INTERVAL COUNTS FOR COMPUTATION OF ITEM CHI-SQUARES


-----------------------------------------------------------------------
15. 30. 36. 52. 70. 69. 48. 36. 44.
-----------------------------------------------------------------------

INTERVAL AVERAGE THETAS


-----------------------------------------------------------------------
-2.000 -1.520 -1.076 -0.648 -0.191 0.235 0.620 1.100 1.724
-----------------------------------------------------------------------

SUBTEST SIM ; ITEM PARAMETERS AFTER CYCLE 6


ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
T01 | 1.339 | 1.000 | -1.338 | 0.707 | 0.000 | 2.3 5.0
| 0.192* | 0.206* | 0.194* | 0.146* | 0.000* | (0.8044)
T02 | 1.488 | 0.961 | -1.549 | 0.693 | 0.000 | 4.4 6.0
| 0.211* | 0.199* | 0.218* | 0.144* | 0.000* | (0.6179)


(Similar output omitted) | | | |


T39 | 0.508 | 0.911 | -0.557 | 0.673 | 0.000 | 1.8 6.0
| 0.119* | 0.172* | 0.126* | 0.127* | 0.000* | (0.9334)
T40 | 0.525 | 0.675 | -0.777 | 0.559 | 0.000 | 5.4 7.0
| 0.107* | 0.130* | 0.175* | 0.108* | 0.000* | (0.6055)
-------------------------------------------------------------------------------
* STANDARD ERROR

LARGEST CHANGE = 0.004890 176.3 243.0


(0.9996)

PARAMETER MEAN STN DEV


-----------------------------------
SLOPE 0.809 0.153
LOG(SLOPE) -0.230 0.189
THRESHOLD -0.019 0.975

Phase 3 output

For purposes of reporting test scores, the ability scale is set so that the mean of the score distribution in the sample of examinees is 250 and the standard deviation is 50. The item parameters are rescaled accordingly.

>SCORE METHOD = 1, RSCTYPE = 3,LOCATION = 250, SCALE = 50,NOPRINT, INFO = 1;

PARAMETERS FOR SCORING, RESCALING, AND TEST AND ITEM INFORMATION


METHOD OF SCORING SUBJECTS: MAXIMUM LIKELIHOOD
SCORES WRITTEN TO FILE EXAMPL04.SCO
TYPE OF RESCALING: IN THE SAMPLE DISTRIBUTION
REFERENCE GROUP FOR RESCALING: GROUP: 0

Before rescaling, the sample mean score is essentially the same as that in the Phase 2 latent distribution. The standard deviation is larger, however, because the score distribution includes measurement error variance.

Summary statistics for each group include the following.

• The correlation matrix of the test scores (when there is more than one test).
• The mean, standard deviation, and variance of the θ score estimates:
  – Maximum Likelihood (ML) estimate
  – Bayes Modal (Maximum A Posteriori, MAP) estimate
  – Bayes (Expected A Posteriori, EAP) estimate

The summary of the error variation depends on the type of estimate:

• Maximum Likelihood – Harmonic Root-Mean-Square standard errors: The error variance for each case is the reciprocal of the Fisher information at the likelihood maximum for the case. The standard error is the square root of the harmonic mean of these variances, that is, the reciprocal square root of the average of the information values.
• MAP – Root-Mean-Square posterior standard deviation: The error variance for each case is the reciprocal of the posterior information at the maximum of the posterior probability density of θ, given the response pattern of the case. The standard error is the square root of the average of these variances.
• EAP – Root-Mean-Square posterior standard deviation: The error variance for each case is the variance of the posterior distribution of θ, given the response pattern of the case. The standard error is the square root of the average of these variances.

The empirical reliability of the test is the θ score variance divided by the sum of that variance
and the error variance.

Note:

The expected value of the sum of the θ score variance and the error variance is the variance of
the latent distribution of the group. The sum of the corresponding sample variances should tend
to that value as the sample size increases.
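In symbols, if \hat\sigma_\theta^2 denotes the variance of the θ score estimates and \bar\sigma_e^2 the average error variance, the empirical reliability is

$$ \hat\rho = \frac{\hat\sigma_\theta^2}{\hat\sigma_\theta^2 + \bar\sigma_e^2} . $$

For ML estimates, whose sample variance already includes the error variance, the printed value appears to rest on the equivalent decomposition (\hat\sigma^2 - \bar\sigma_e^2)/\hat\sigma^2: with the output below, (1.3054 - 0.1767)/1.3054 ≈ 0.8647, which reproduces the EMPIRICAL RELIABILITY reported there.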

SUMMARY STATISTICS FOR SCORE ESTIMATES


======================================

CORRELATIONS AMONG TEST SCORES

SIM
SIM 1.0000

MEANS, STANDARD DEVIATIONS, AND VARIANCES OF SCORE ESTIMATES

TEST: SIM
MEAN: 0.0057
S.D.: 1.1426
VARIANCE: 1.3054

HARMONIC ROOT-MEAN-SQUARE STANDARD ERRORS OF THE ML ESTIMATES

TEST: SIM
RMS: 0.4203
VARIANCE: 0.1767

EMPIRICAL RELIABILITY: 0.8647

RESCALING WITH RESPECT TO SAMPLE DISTRIBUTION


---------------------------------------------------

RESCALING CONSTANTS
TEST SCALE LOCATION
SIM 43.762 249.749

The scaled scores are saved on an external file and their printing is suppressed in all but the first
two cases.


GROUP SUBJECT IDENTIFICATION


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E.
----------------------------------------------------------------
1 1 | |
1.00 SIM 20 14 70.00 | 282.5091 17.5097 |
1 1 | |
1.00 SIM 20 6 30.00 | 217.0505 16.8979 |
----------------------------------------------------------------

The magnitudes of the rescaled item parameters reflect the new origin and unit of the scale. The thresholds center around 250 and the slopes are smaller by a factor of about 50. The slopes are printed here to only three decimal places but appear with full accuracy in the saved item parameter file. If the saved parameters are used to score other examinees, the results will be expressed on the scale determined in the present sample.
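The transformation involved can be sketched as follows. If S and L are the SCALE and LOCATION constants reported above, the rescaling θ* = Sθ + L carries the item parameters into

$$ b_j^{*} = S\, b_j + L , \qquad a_j^{*} = a_j / S , \qquad c_j^{*} = -a_j^{*}\, b_j^{*} , $$

where b, a, and c denote the threshold, slope, and intercept. For item T01, for example, b* = 43.762 × (-1.338) + 249.749 ≈ 191.2 and a* = 1.000/43.762 ≈ 0.023, in agreement with the table below.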

TEST SIM ; RESCALED ITEM PARAMETERS


ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE
S.E. S.E. S.E. S.E. S.E.
---------------------------------------------------------------
T01 | -4.371 | 0.023 | 191.189 | 0.707 | 0.000
| 1.190* | 0.005* | 8.501* | 0.146* | 0.000*
| | | | |
T02 | -3.994 | 0.022 | 181.956 | 0.693 | 0.000
| 1.157* | 0.005* | 9.357* | 0.144* | 0.000*
| | | | |
(Similar output omitted)
| | | | |
T39 | -4.691 | 0.021 | 225.362 | 0.673 | 0.000
| 0.988* | 0.004* | 5.524* | 0.127* | 0.000*
| | | | |
T40 | -3.327 | 0.015 | 215.726 | 0.559 | 0.000
| 0.748* | 0.003* | 7.651* | 0.108* | 0.000*
---------------------------------------------------------------

PARAMETER MEAN STN DEV


-----------------------------
SLOPE 0.018 0.003
LOG(SLOPE) -4.009 0.189
THRESHOLD 248.921 42.657

MEAN & SD OF SCORE ESTIMATES AFTER RESCALING: 250.000 50.000

Results of the information analysis are depicted in the following line-printer plot. Points indicated by + and * represent the information and measurement error functions, respectively. This plot applies to all 40 items and not to the separate test forms. Because the item thresholds are normally distributed, with mean and standard deviation similar to those of the score distribution, the precision of the item set is greatest toward the middle of the scale.
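For the two-parameter model used in this example, the two plotted functions are related in the usual way (written here in the logistic metric):

$$ I(\theta) = \sum_j a_j^2\, P_j(\theta)\left[ 1 - P_j(\theta) \right] , \qquad SE(\theta) = \frac{1}{\sqrt{I(\theta)}} , $$

so the * (standard error) curve is the reciprocal square root of the + (information) curve.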


TEST INFORMATION CURVE FOR TEST: SIM FORM: 1


STANDARD INFOR-
ERROR MATION
------------------------------------------------------------------
1.48| * +++++ * | 7.1341
| + ++ |
1.41| ++ + | 6.7774
| * * |
1.33| + + | 6.4207
| + + |
1.26| * + * | 6.0640
| + |
1.19| + * | 5.7073
| * + |
1.11| + * | 5.3506
| * + |
1.04| + * | 4.9939
| * + |
0.96| * | 4.6372
| * + + * |
0.89| | 4.2805
| * + + * |
0.82| * | 3.9237
| * + + |
0.74| * | 3.5670
| * + * |
0.67| * + * | 3.2103
| * + * |
0.59| + + | 2.8536
| * ** |
0.52| + * * + | 2.4969
| ** ** + |
0.44| + ** ** | 2.1402
| + **** **** + |
0.37| *********** + | 1.7835
| + + |
0.30| + + | 1.4268
| + + |
0.22| + + | 1.0701
| + ++ |
0.15| ++ + | 0.7134
| ++ +++ |
0.07|+++ +| 0.3567
| |
0.00| | 0.0000
-+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
-4.00 -3.00 -2.00 -1.00 0.00 1.00 2.00 3.00 4.00

10.5 Vertical equating

Two hundred students at each of three grade levels (grades four, six, and eight) were given grade-appropriate versions of a 20-item arithmetic examination. Items 19 and 20 appear in the grade 4 and grade 6 forms; items 37 and 38 appear in the grade 6 and grade 8 forms. Because each item is assigned a unique column in the data records, a FORM command is not required.

The data file contains the answer key, the not-presented key, and the raw data. Two lines of information are given per examinee, as shown below. The answer key contains 56 entries, each equal to 1. If an item was not presented, it is indicated in the data by a blank (' ').

KEY 11111111111111111111111111111111111111111111111111111111
NOT
001 1 11111112221211222212
002 1 21121211121111121212
003 1 11112112211222212212
004 1 11111112121111111211
005 1 21111112221212121222

No items are assigned to the TEST command using the INAMES or INUMBERS keywords. By default, all items are assumed to be assigned to the test. Although the test name (TNAME=MATH) is not enclosed in single quotes, the group names are, because they contain blanks as part of the name.

The distributions of ability are assumed to be normal at each grade level (NORMAL on the CALIB command). Grade 6 serves as the reference group in the calibration of the items (REFERENCE=2). EAP estimates of ability are calculated using the information in the posterior distributions from Phase 2. The ability estimates are rescaled to a mean of 0 and standard deviation of 1 by specifying RSCTYPE=3 on the SCORE command.

EXAMPL05.BLM - VERTICAL EQUATING OF TEST FORMS OVER THREE GRADE LEVELS

>GLOBAL DFNAME='EXAMPL05.DAT', NPARM=2, SAVE;


>SAVE SCORE='EXAMPL05.SCO', PARM='EXAMPL05.PAR';
>LENGTH NITEMS=56;
>INPUT NTOT=56, NGROUPS=3, NIDCH=3,
KFNAME='EXAMPL05.DAT', NFNAME='EXAMPL05.DAT';
>ITEMS INUM=(1(1)56), INAME=(M01(1)M56);
>TEST TNAME=MATH;
>GROUP1 GNAME='GRADE 4', LENGTH=20, INUM=(1(1)20);
>GROUP2 GNAME='GRADE 6', LENGTH=20, INUM=(19(1)38);
>GROUP3 GNAME='GRADE 8', LENGTH=20, INUM=(37(1)56);
(3A1,1X,I1,1X,56A1)
>CALIB NQPT=51, NORMAL, CYCLE=30, TPRIOR, REFERENCE=2;
>SCORE METHOD=2, IDIST=3, NOPRINT, RSCTYPE=3;

Phase 1 output

In this example, items assigned to the three groups of examinees are selected from the following
set. The items are selected in such a way that two items are common to groups 1 and 2 and two
other items are common to groups 2 and 3. The groups, corresponding to school grades four, six,
and eight are non-equivalent and require separate classical item statistics. The fact that classical
item statistics are not invariant with respect to sampling from different populations is illustrated
by the different results for common items in different groups.

MULTIPLE GROUP SPECIFICATIONS


=============================

MULTIPLE GROUPS ARE DEFINED,


BUT NEITHER DIF MODEL NOR PARAMETER DRIFT MODEL IS EMPLOYED.


GROUP NUMBER: 1 GROUP NAME: GRADE 4


TEST NUMBER: 1 TEST NAME: MATH
NUMBER OF ITEMS: 20

ITEM ITEM
NUMBER NAME
------------------
1 M01
2 M02

20 M20
------------------

GROUP NUMBER: 2 GROUP NAME: GRADE 6


TEST NUMBER: 1 TEST NAME: MATH
NUMBER OF ITEMS: 20

ITEM ITEM
NUMBER NAME
------------------
19 M19
20 M20

38 M38
------------------

GROUP NUMBER: 3 GROUP NAME: GRADE 8


TEST NUMBER: 1 TEST NAME: MATH
NUMBER OF ITEMS: 20

ITEM ITEM
NUMBER NAME
------------------
37 M37

56 M56
------------------

600 OBSERVATIONS READ FROM FILE: EXAMPL05.DAT


600 OBSERVATIONS WRITTEN TO FILE: MF.DAT

SUBTEST 1 MATH
GROUP 1 GRADE 4 200 OBSERVATIONS
GROUP 2 GRADE 6 200 OBSERVATIONS
GROUP 3 GRADE 8 200 OBSERVATIONS

Item statistics for the first 5 items of each subtest are shown below. Similar output is produced for grades 6 and 8, and for the multiple-group subtest MATH, which, in this case, contains the statistics for all the grades.


SUBTEST 1 MATH
ITEM STATISTICS FOR GROUP: 1 GRADE 4
ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
-----------------------------------------------------------------------
1 M01 200.0 138.0 0.690 -0.47 0.470 0.616
...
19 M19 200.0 95.0 0.475 0.06 0.520 0.652
20 M20 200.0 67.0 0.335 0.40 0.475 0.615
----------------------------------------------------------------------

ITEM STATISTICS FOR GROUP: 2 GRADE 6


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
----------------------------------------------------------------------
19 M19 200.0 138.0 0.690 -0.47 0.431 0.565
20 M20 200.0 106.0 0.530 -0.07 0.512 0.643
...
37 M37 200.0 104.0 0.520 -0.05 0.379 0.475
38 M38 200.0 62.0 0.310 0.47 0.497 0.651
------------------------------------------------------------------------

ITEM STATISTICS FOR GROUP: 3 GRADE 8


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
------------------------------------------------------------------------
37 M37 200.0 135.0 0.675 -0.43 0.420 0.546
38 M38 200.0 96.0 0.480 0.05 0.594 0.745
...
55 M55 200.0 90.0 0.450 0.12 0.471 0.592
56 M56 200.0 111.0 0.555 -0.13 0.529 0.665
------------------------------------------------------------------------

Phase 2 output

In vertical equating over a range of age levels, the ability distributions of the groups may be
widely spaced. For that reason, it is desirable to use a large number of quadrature points – in this
case, 51.

The origin and unit of the ability scale can be fixed in the calibration either by setting the mean and standard deviation of a reference group to zero and one, respectively, or, similarly, by setting the mean and standard deviation of the combined groups. In this example, group 2 is selected as the reference group.

>CALIB NQPT=51, NORMAL, CYCLE=30, TPRIOR, REFERENCE=2;

CALIBRATION PARAMETERS
======================
MAXIMUM NUMBER OF EM CYCLES: 30
MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION: 0.0100
ACCELERATION CONSTANT: 1.0000
LATENT DISTRIBUTION: NORMAL PRIOR FOR EACH GROUP
GROUP MEANS AND SDS
ESTIMATED CONCURRENTLY
WITH ITEM PARAMETERS
REFERENCE GROUP: 2


PLOT EMPIRICAL VS. FITTED ICC'S: NO


DATA HANDLING: DATA ON SCRATCH FILE

MEANS AND STANDARD DEVIATIONS: PROGRAM DEFAULTS
ITEM CONSTRAINTS IF PRESENT
WILL BE UPDATED EACH CYCLE

The iterative estimation procedures typically converge more slowly with nonequivalent-groups data than with single-group or equivalent-groups data. The last few iterations are shown here, along with some of the resulting parameter estimates. The means of the prior distributions on item thresholds and slopes are also listed.

CYCLE 19; LARGEST CHANGE= 0.02538


-2 LOG LIKELIHOOD = 13246.111
UPDATED PRIOR ON LOG SLOPES; MEAN & SD = -0.23806 0.50000
UPDATED PRIOR ON THRESHOLDS; MEAN & SD = 0.08303 2.00000

CYCLE 20; LARGEST CHANGE= 0.00812

[NEWTON CYCLES]
UPDATED PRIOR ON LOG SLOPES; MEAN & SD = -0.23533 0.50000
UPDATED PRIOR ON THRESHOLDS; MEAN & SD = 0.08308 2.00000
-2 LOG LIKELIHOOD: 13245.9542

CYCLE 21; LARGEST CHANGE= 0.00699

INTERVAL COUNTS FOR COMPUTATION OF ITEM CHI-SQUARES


-------------------------------------------------------------------------
19. 32. 56. 83. 93. 109. 82. 60. 66.
-------------------------------------------------------------------------

INTERVAL AVERAGE THETAS


-------------------------------------------------------------------------
-2.695 -1.942 -1.448 -0.866 -0.356 0.145 0.607 1.193 1.989
-------------------------------------------------------------------------

SUBTEST MATH ; ITEM PARAMETERS AFTER CYCLE 21


ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------
M01 | 1.218 | 0.805 | -1.512 | 0.627 | 0.000 | 3.2 5.0
| 0.194* | 0.155* | 0.163* | 0.121* | 0.000* | (0.6741)
| | | | | |
M02 | 1.149 | 0.707 | -1.626 | 0.577 | 0.000 | 4.4 6.0
| 0.169* | 0.129* | 0.186* | 0.105* | 0.000* | (0.6249)
| | | | | |
[Similar output omitted]
M55 | -0.584 | 0.707 | 0.826 | 0.577 | 0.000 | 3.9 6.0
| 0.129* | 0.126* | 0.143* | 0.103* | 0.000* | (0.6847)
| | | | | |
M56 | -0.319 | 0.849 | 0.376 | 0.647 | 0.000 | 1.1 5.0
| 0.127* | 0.144* | 0.125* | 0.110* | 0.000* | (0.9547)
-------------------------------------------------------------------------
* STANDARD ERROR

LARGEST CHANGE = 0.007897 188.0 296.0


(1.0000)


PARAMETER MEAN STN DEV


-----------------------------------
SLOPE 0.802 0.138
LOG(SLOPE) -0.235 0.172
THRESHOLD 0.083 0.775

The within-group latent distributions are assumed normal. Their means and standard deviations are estimated relative to the reference group. In these data, the means increase over the grades (-0.723, 0.000, 0.569), but the standard deviations are relatively constant (1.069, 1.000, 1.126).

GROUP: 1 GRADE 4 QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:


1 2 3 4 5
POINT -0.4275E+01 -0.4105E+01 -0.3935E+01 -0.3765E+01 -0.3594E+01
POSTERIOR 0.4299E-03 0.7062E-03 0.1119E-02 0.1717E-01 0.2558E-02

[Similar output omitted]

47 48 49 50 51
POINT 0.3552E+01 0.3722E+01 0.3892E+01 0.4062E+01 0.4232E+01
POSTERIOR 0.1899E-04 0.9879E-05 0.3535E-05 0.1816E-05 0.9055E-06

MEAN -0.72298
S.E. 0.11260

S.D. 1.06880
S.E. 0.12631

GROUP: 2 GRADE 6 QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:


1 2 3 4 5
POINT -0.4275E+01 -0.4105E+01 -0.3935E+01 -0.3765E+01 -0.3594E+01
POSTERIOR 0.1136E-04 0.2278E-04 0.4596E-04 0.8712E-04 0.1599E-03

[Similar output omitted]

47 48 49 50 51
POINT 0.3552E+01 0.3722E+01 0.3892E+01 0.4062E+01 0.4232E+01
POSTERIOR 0.1172E-03 0.6346E-04 0.3291E-04 0.1689E-04 0.8409E-05

MEAN 0.00000
S.E. 0.00000

S.D. 1.00000
S.E. 0.00000

GROUP: 3 GRADE 8 QUADRATURE POINTS, POSTERIOR WEIGHTS, MEAN AND S.D.:


1 2 3 4 5
POINT -0.4275E+01 -0.4105E+01 -0.3935E+01 -0.3765E+01 -0.3594E+01
POSTERIOR 0.4219E-05 0.7809E-05 0.1793E-04 0.3292E-04 0.5918E-04

[Similar output omitted]


47 48 49 50 51
POINT 0.3552E+01 0.3722E+01 0.3892E+01 0.4062E+01 0.4232E+01
POSTERIOR 0.1837E-02 0.1230E-02 0.8192E-03 0.5316E-03 0.3268E-03

MEAN 0.56861
S.E. 0.11855

S.D. 1.12577
S.E. 0.14026

Phase 3 output

With nonequivalent groups, Bayes (EAP) and Bayes Modal (MAP) estimation of test scores should be carried out with respect to the Phase 2 latent distribution of the group to which the examinee belongs. Specify IDIST=3 on the SCORE command.

>SCORE METHOD=2,IDIST=3,NOPRINT, RSCTYPE=3;

PARAMETERS FOR SCORING, RESCALING, AND TEST AND ITEM INFORMATION


METHOD OF SCORING SUBJECTS: EXPECTATION A POSTERIORI
(EAP; BAYES ESTIMATION)
TYPE OF PRIOR: EMPIRICAL, FROM ITEM CALIBRATION
TYPE OF RESCALING: IN THE SAMPLE DISTRIBUTION
REFERENCE GROUP FOR RESCALING: GROUP: 2

QUAD
TEST NAME GROUP POINTS
---------------------------
1 MATH 1 51
1 MATH 2 51
1 MATH 3 51
---------------------------

RESCALING CONSTANTS
TEST NAME SCALE LOCATION
------------------------------------
1 MATH 1.000 0.000
------------------------------------

In this example, the scores are rescaled so that their mean and standard deviation in the total
sample are zero and one, respectively. The parameter estimates are rescaled accordingly.

RESCALING WITH RESPECT TO SAMPLE DISTRIBUTION


---------------------------------------------------

RESCALING CONSTANTS
TEST SCALE LOCATION
MATH 1.066 0.003


GROUP SUBJECT IDENTIFICATION MARGINAL


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E. PROB
-------------------------------------------------------------------------
1 1 | |
1.00 MATH 20 11 55.00 | -0.3055 0.3598 | 0.000000
1 1 | |
1.00 MATH 20 13 65.00 | -0.0653 0.3620 | 0.000000
-------------------------------------------------------------------------

TEST MATH ; RESCALED ITEM PARAMETERS

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE


S.E. S.E. S.E. S.E. S.E.
---------------------------------------------------------------
M01 | 1.216 | 0.755 | -1.610 | 0.627 | 0.000
| 0.194* | 0.145* | 0.173* | 0.121* | 0.000*
| | | | |
M02 | 1.148 | 0.663 | -1.732 | 0.577 | 0.000
| 0.169* | 0.121* | 0.198* | 0.105* | 0.000*
| | | | |
[Similar output omitted]
| | | | |
M55 | -0.566 | 0.670 | 0.845 | 0.581 | 0.000
| 0.127* | 0.120* | 0.151* | 0.104* | 0.000*
| | | | |
M56 | -0.298 | 0.805 | 0.370 | 0.651 | 0.000
| 0.125* | 0.136* | 0.132* | 0.110* | 0.000*
---------------------------------------------------------------

PARAMETER MEAN STN DEV


-----------------------------
SLOPE 0.752 0.130
LOG(SLOPE) -0.299 0.172
THRESHOLD 0.092 0.806

MEAN & SD OF SCORE ESTIMATES AFTER RESCALING

GROUP MEAN SD
-----------------------------
1 -0.776 1.067
2 0.000 1.000
3 0.608 1.118
-----------------------------

MEAN & SD OF LATENT DISTRIBUTIONS AFTER RESCALING

GROUP MEAN SD
-----------------------------
1 -0.776 1.149
2 0.000 1.074
3 0.608 1.201
-----------------------------


10.6 Multiple matrix sampling data

This example illustrates the use of the TYPE=3 specification on the INPUT command to analyze aggregate-level, multiple-matrix sampling data. The data in exampl06.dat are numbers tried and numbers correct for items from eight forms of a matrix-sampled assessment instrument. The groups are selected 8th-grade students from 32 public schools. The first record for each school contains the data for the items of a Number Concepts scale, NUMCON, and the second record contains the data for items of an Algebra Concepts scale, ALGCON. Data for the first two schools are shown below.

SCHOOL 1 NUM 1 0 3 2 2 1 4 4 3 2 2 1 4 3 4 1
SCHOOL 1 ALG 1 0 3 1 2 0 3 2 3 2 2 1 4 1 4 0
SCHOOL 2 NUM 5 3 4 4 3 2 3 3 2 2 4 3 4 3 5 3
SCHOOL 2 ALG 5 2 4 2 3 2 3 2 2 2 4 2 4 2 5 3

An answer key is not required for aggregate-level data in number-tried, number-right summary form. Note the format statement for reading the two sets of eight number-tried, number-right observations. For more information on how to set up the variable format statement for this type of data, see Section 2.6.18.
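As an informal illustration of that layout, the pair of records for one school can be unpacked as in the Python sketch below (the helper is hypothetical and assumes strictly fixed-width records; it is not part of BILOG-MG):

def parse_school(line1: str, line2: str):
    # (9A1,T15,8(2F3.0)/T15,8(2F3.0)): columns 1-9 hold the school ID;
    # eight (tried, right) pairs begin in column 15, 3 columns per value.
    school_id = line1[:9].strip()
    def pairs(line: str):
        field = line[14:14 + 8 * 6]              # T15 -> 0-based offset 14
        return [(float(field[i:i + 3]), float(field[i + 3:i + 6]))
                for i in range(0, 8 * 6, 6)]     # 8 repetitions of 2F3.0
    return school_id, pairs(line1), pairs(line2) # NUMCON pairs, ALGCON pairs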

The items are multiple-choice and fairly difficult, so the 3PL model is needed. Because aggregate-level data are always more informative than individual-level item responses, it is worthwhile in the CALIB command to increase the number of quadrature points (NQPT), to set a stricter criterion for convergence (CRIT), and to increase the CYCLES limit. A prior on the thresholds (TPRIOR) and a ridge constant of 0.8 (RIDGE) are required for convergence with the exceptionally difficult ALGCON subtest. Aggregate-level data typically have smaller slopes in the 0,1 metric than do person-level data. For this reason, the mean of the prior for the log slopes has been set to 0.5 by use of the READPRIOR option of the CALIB command and the following PRIOR commands.

The aggregate scores for the schools are estimated by the EAP method using the empirical distributions from Phase 2. The number of quadrature points is set the same as in Phase 2.

The scores are rescaled to a mean of 250 and a standard deviation of 50 in the latent distribution
of schools (IDIST=3, LOCATION=250, SCALE=50). The fit of the data to the group-level model is
tested for each school (FIT). The NUMCON items have fairly homogeneous slopes and might
be favorable for a one-parameter model.

EXAMPL06.BLM - MULTIPLE-MATRIX SAMPLING DATA


AGGREGATE-LEVEL MODEL
>GLOBAL NPARM=3, NTEST=2, DFNAME='EXAMPL06.DAT';
>LENGTH NITEMS=(8,8);
>INPUT NTOTAL=16, NALT=5, NIDCHAR=9, TYPE=3;
>ITEMS INUM=(1(1)16), INAMES=(N1(1)N8,A1(1)A8);
>TEST1 TNAME=NUMCON, INUM=(1(1)8);
>TEST2 TNAME=ALGCON, INUM=(9(1)16);
(9A1,T15,8(2F3.0)/T15,8(2F3.0))
>CALIB NQPT=51, CYCLES=50, NEWTON=10, CRIT=0.005, TPRIOR,
READPRIOR, NOFLOAT, RIDGE=(2,0.8,2.0), CHI=8, PLOT=1;
>PRIORS1 SMU=(0.5(0)8);
>PRIORS2 SMU=(0.5(0)8);


>SCORE NQPT=(12,12), IDIST=3, RSCTYPE=4,


LOCATION=(250.0,250.0), SCALE=(50.0,50.0), FIT;

Phase 1 output

Group-level data consist of number-tried and number-right frequencies for each item in each
group. The program reads them as values rather than characters and conversion to item scores is
not required.

OBSERVATION # 1 WEIGHT: 1.0000 ID : SCHOOL 1

SUBTEST #: 1 NUMCON
GROUP #: 1

TRIED RIGHT
23.000 14.000
ITEM 1 2 3 4 5 6 7 8
TRIED 1.0 3.0 2.0 4.0 3.0 2.0 4.0 4.0
RIGHT 0.0 2.0 1.0 4.0 2.0 1.0 3.0 1.0

SUBTEST #: 2 ALGCON
GROUP #: 1

TRIED RIGHT
22.000 7.000
ITEM 1 2 3 4 5 6 7 8
TRIED 1.0 3.0 2.0 3.0 3.0 2.0 4.0 4.0
RIGHT 0.0 1.0 0.0 2.0 2.0 1.0 1.0 0.0

OBSERVATION # 2 WEIGHT: 1.0000 ID : SCHOOL 2

SUBTEST #: 1 NUMCON
GROUP #: 1

TRIED RIGHT
30.000 23.000
ITEM 1 2 3 4 5 6 7 8
TRIED 5.0 4.0 3.0 3.0 2.0 4.0 4.0 5.0
RIGHT 3.0 4.0 2.0 3.0 2.0 3.0 3.0 3.0

SUBTEST #: 2 ALGCON
GROUP #: 1

TRIED RIGHT
30.000 17.000
ITEM 1 2 3 4 5 6 7 8
TRIED 5.0 4.0 3.0 3.0 2.0 4.0 4.0 5.0
RIGHT 2.0 2.0 2.0 2.0 2.0 2.0 2.0 3.0

Classical item statistics are computed for each subtest. Biserial correlations cannot be computed
with group-level data.


ITEM STATISTICS FOR SUBTEST NUMCON


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
-------------------------------------------------------------------------
1 N1 260.0 160.0 61.5 -0.28 0.637 0.000
2 N2 268.0 162.0 60.4 -0.25 0.682 0.000
3 N3 260.0 163.0 62.7 -0.31 0.663 0.000
4 N4 261.0 137.0 52.5 -0.06 0.637 0.000
5 N5 271.0 129.0 47.6 0.06 0.699 0.000
6 N6 271.0 154.0 56.8 -0.16 0.656 0.000
7 N7 270.0 157.0 58.1 -0.19 0.656 0.000
8 N8 266.0 170.0 63.9 -0.34 0.781 0.000
-------------------------------------------------------------------------

ITEM STATISTICS FOR SUBTEST ALGCON

ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
-------------------------------------------------------------------------
1 A1 259.0 120.0 46.3 0.09 0.636 0.000
2 A2 267.0 81.0 30.3 0.49 0.606 0.000
3 A3 241.0 94.0 39.0 0.26 0.669 0.000
4 A4 245.0 121.0 49.4 0.01 0.687 0.000
5 A5 263.0 96.0 36.5 0.33 0.669 0.000
6 A6 263.0 166.0 63.1 -0.32 0.746 0.000
7 A7 267.0 71.0 26.6 0.60 0.667 0.000
8 A8 262.0 90.0 34.4 0.38 0.683 0.000
-------------------------------------------------------------------------

Phase 2 output

The set-up for group-level item calibration differs somewhat from examinee-level analysis: more quadrature points and more iterations are required for the solution. Prior distributions for all parameters are necessary, the means should be kept fixed (the default, NOFLOAT), and the mean of the priors for slopes should be set lower than the examinee-level default.

>PRIORS1 SMU = (0.5000(0)8);

CONSTRAINT DISTRIBUTIONS ON ITEM PARAMETERS


(THRESHOLDS, NORMAL; SLOPES, LOG-NORMAL; GUESSING, BETA)

THRESHOLDS SLOPES ASYMPTOTES


ITEM MU SIGMA MU SIGMA ALPHA BETA
----------------------------------------------------------------------
N1 0.000 2.000 0.500 1.649 5.00 17.00
N2 0.000 2.000 0.500 1.649 5.00 17.00
N3 0.000 2.000 0.500 1.649 5.00 17.00
N4 0.000 2.000 0.500 1.649 5.00 17.00
N5 0.000 2.000 0.500 1.649 5.00 17.00
N6 0.000 2.000 0.500 1.649 5.00 17.00
N7 0.000 2.000 0.500 1.649 5.00 17.00
N8 0.000 2.000 0.500 1.649 5.00 17.00
----------------------------------------------------------------------


Group-level item parameter estimates for the first 3 items in subtest NUMCON are as follows.

SUBTEST NUMCON ; ITEM PARAMETERS AFTER CYCLE 12

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF


S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
N1 | 0.030 | 0.190 | -0.156 | 0.186 | 0.232 | 5.7 6.0
| 0.194* | 0.066* | 1.026* | 0.065* | 0.094* | (0.4521)
| | | | | |
N2 | 0.046 | 0.279 | -0.163 | 0.268 | 0.218 | 3.8 6.0
| 0.222* | 0.107* | 0.801* | 0.103* | 0.093* | (0.7025)
| | | | | |
N3 | 0.126 | 0.313 | -0.404 | 0.299 | 0.212 | 3.2 5.0
| 0.224* | 0.120* | 0.735* | 0.115* | 0.091* | (0.6638)
-------------------------------------------------------------------------------
* STANDARD ERROR

LARGEST CHANGE = 0.003146 42.8 53.0


(0.8397)
NOTE: ITEM FIT CHI-SQUARES AND THEIR SUMS MAY BE UNRELIABLE
FOR TESTS WITH LESS THAN 20 ITEMS

PARAMETER MEAN STN DEV


-----------------------------------
ASYMPTOTE 0.210 0.041
SLOPE 0.306 0.099
LOG(SLOPE) -1.223 0.290
THRESHOLD 2.241 1.515

Phase 3 output

Computing scores at the group level is essentially the same as at the examinee level. Note that the selection of EAP estimation based on the empirical latent distribution from Phase 2 overrides the number of quadrature points chosen here. Because of the small number of items, the standard deviation of the estimated scores is considerably smaller than that of the latent distribution. Portions of the Phase 3 output are listed below.

>SCORE NQPT = (12, 12),IDIST = 3,RSCTYPE = 4,


LOCATION = (250.0000, 250.0000), SCALE = (50.0000, 50.0000),FIT;

PARAMETERS FOR SCORING, RESCALING, AND TEST AND ITEM INFORMATION


METHOD OF SCORING SUBJECTS: EXPECTATION A POSTERIORI
(EAP; BAYES ESTIMATION)
TYPE OF PRIOR: EMPIRICAL, FROM ITEM CALIBRATION
SUBJECT FIT PROBABILITIES: YES
TYPE OF RESCALING: IN THE ESTIMATED LATENT
DISTRIBUTION
REFERENCE GROUP FOR RESCALING: GROUP: 1

QUAD RESCALING CONSTANTS


TEST NAME POINTS SCALE LOCATION
-------------------------------------------
1 NUMCON 51 50.000 250.000
2 ALGCON 51 50.000 250.000
--------------------------------------------


The scores are rescaled so that the mean and standard deviation of the Phase 3 latent distribution are 250 and 50, respectively. Scores for all 32 schools are computed and printed. Because the data are binomial rather than binary, a χ² index of fit on 8 degrees of freedom can be calculated for each school. The corresponding probabilities are shown in the output.
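As an illustration, a Pearson-type version of such a school-level index (the exact form used by the program may differ) is

$$ \chi^2 = \sum_{j=1}^{8} \frac{\left( r_j - n_j P_j(\hat\theta) \right)^2}{n_j\, P_j(\hat\theta)\left[ 1 - P_j(\hat\theta) \right]} , $$

where n_j and r_j are the numbers tried and right for item j in a school and θ̂ is the school's scale score; with eight items per subtest, the statistic is referred to χ² on 8 degrees of freedom.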

RESCALING WITH RESPECT TO LATENT DISTRIBUTION


--------------------------------------------------

RESCALING CONSTANTS
TEST SCALE LOCATION
NUMCON 58.462 251.342
ALGCON 56.462 251.127

GROUP SUBJECT IDENTIFICATION FIT MARGINAL


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E. PROB PROB
-------------------------------------------------------------------------
1 SCHOOL 1 | |
1.00 NUMCON 23 14 60.87 | 246.5104 43.5894 | 0.1539 0.0000
1 SCHOOL 1 | |
1.00 ALGCON 22 7 31.82 | 243.1547 47.4683 | 0.3197 0.0000
[Similar output omitted]
1 SCHOOL 32 | |
1.00 NUMCON 181 100 55.25 | 221.6762 21.9655 | 0.0166 0.0000
1 SCHOOL 32 | |
1.00 ALGCON 179 77 43.02 | 273.1747 21.8821 | 0.5242 0.0000
-------------------------------------------------------------------------

MEAN & SD OF SCORE ESTIMATES AFTER RESCALING: 250.000 31.149


MEAN & SD OF LATENT DISTRIBUTION AFTER RESCALING: 250.000 50.000

10.7 Analysis of variant items

In this example, responses to 50 of the 100 items in the data file are read using the format statement

(10A1,T38,25A1,1X,25A1).

The first few lines of the data file are shown below. In contrast to previous examples, each position in the item response fields corresponds to the same item for every examinee. In the earlier examples, the association between response and item depended on the group/form membership of an examinee.

The answer key (KFNAME keyword on the INPUT command) comes first, occupying the first two lines of the raw data file in the same format as the item responses.

KEY 00000000000000000000000000000000000000000000000000000000000000000000000000…
0102111900 00000401020100002001101002024030005001000000000233004002014062000012000100…
0104112200 10101200210100000100010230110030013000000100103021014000002042001012001000…
0105121900 11012041110200000010002230131010122101000000013123000002001042101012001300…

Of the 50, 20 are selected as Main Test items and 4 as Variant Test items. This is indicated by setting NITEMS to 24 and NVARIANT to 4 on the LENGTH command. Items for the main test are selected by name in the TESTM command; items for the variant test are selected by name in the TESTV command. The item names correspond to the sequence numbers in the original set of 100 items. Here the short form of naming and numbering is used: the set of items forms an arithmetic progression of integer or decimal numbers, allowing use of the short form (first (increment) last). A similar abbreviation may be used for consecutive item names (INAMES keyword on the ITEMS command).

The analysis is performed on a sample of 200 students randomly drawn from the original sample
of 660 (SAMPLE=200 on the INPUT command). The EAP scale scores of Phase 3 are computed
from the responses to items in the main test.

EXAMPL07.BLM - ANALYSIS OF VARIANT ITEMS IN A SPELLING TEST OF RANDOMLY


SELECTED WORDS; SUBJECTS: 660 UNDERGRADUATE STUDENTS; 2PL MODEL.
>GLOBAL DFNAME='EXAMPL07.DAT', NTEST=1, NVTEST=1, NPARM=2, SAVE;
>SAVE PARM='EXAMPL07.PAR', SCORE='EXAMPL07.SCO';
>LENGTH NITEM=24, NVARIANT=4;
>INPUT NTOTAL=50, KFNAME='EXAMPL07.DAT', SAMPLE=200, NIDCHAR=10;
>ITEMS INUMBERS=(1(1)50), INAME=(I26(1)I75);
>TESTM TNAME=MAINTEST, INAMES=(I26,I27,I28,I29,I31,I33,I34,
I35,I36,I38,I39,I47,I48,I49,I50,I54,I60,I64,I68,I72);
>TESTV TNAME=VARIANT, INAMES=(I53,I59,I69,I73);
(10A1,T38,25A1,1X,25A1)
>CALIB NQPT=31, CRIT=.005, CYCLES=10, NEWTON=2, FLOAT, ACCEL=0.5;
>SCORE METHOD=2, NOPRINT;

Phase 1 output

Phase 1 lists the test specifications and the assignment of items to the main test and the variants.

>ITEMS INUMBERS=(1(1)50), INAME=(I26(1)I75);

TEST SPECIFICATIONS
===================

>TESTM TNAME=MAINTEST,
INAMES=(I26,I27,I28,I29,I31,I33,I34,
I35,I36,I38,I39,I47,I48,I49,I50,I54,I60,I64,I68,I72);

TEST NUMBER: 1 TEST NAME: MAINTEST


NUMBER OF ITEMS: 20

ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM


NUMBER NAME NUMBER NAME NUMBER NAME NUMBER NAME
-----------------------------------------------------------------------
1 I26 9 I34 23 I48 43 I68
2 I27 10 I35 24 I49 47 I72
3 I28 11 I36 25 I50
4 I29 13 I38 29 I54
6 I31 14 I39 35 I60
8 I33 22 I47 39 I64
-----------------------------------------------------------------------

>TESTV TNAME=VARIANT,
INAMES=(I53,I59,I69,I73);


TEST NUMBER: 2 TEST NAME: VARIANT


NUMBER OF ITEMS: 4

ITEM ITEM ITEM ITEM ITEM ITEM ITEM ITEM


NUMBER NAME NUMBER NAME NUMBER NAME NUMBER NAME
-----------------------------------------------------------------------
28 I53 34 I59 44 I69 48 I73
-----------------------------------------------------------------------

Responses of 660 examinees are read from the data records, but only 200 randomly sampled cases are included in the Phase 1 and Phase 2 analyses. The classical item statistics are shown separately for main and variant items. The item-test correlations are based on test scores computed from the main test items only.

660 OBSERVATIONS READ FROM FILE: EXAMPL07.DAT


660 OBSERVATIONS WRITTEN TO FILE: MF.DAT

REPORT ON SUBJECT SAMPLING:


LEVEL OF SAMPLING = 0.3030
660 SUBJECTS READ FROM FILE: MF.DAT
200 SUBJECTS WRITTEN TO FILE: CF.DAT

ITEM STATISTICS FOR SUBTEST MAINTEST


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
------------------------------------------------------------------------
1 I26 200.0 134.0 67.0 -0.42 0.188 0.244
2 I27 200.0 102.0 51.0 -0.02 0.421 0.527
3 I28 200.0 78.0 39.0 0.26 0.294 0.374
4 I29 200.0 147.0 73.5 -0.60 0.444 0.598
...
------------------------------------------------------------------------

ITEM STATISTICS FOR SUBTEST VARIANT


ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
-------------------------------------------------------------------------
1 I53 200.0 139.0 69.5 -0.48 0.454 0.596
2 I59 200.0 135.0 67.5 -0.43 0.456 0.594
3 I69 200.0 53.0 26.5 0.60 0.379 0.510
4 I73 200.0 50.0 25.0 0.65 0.069 0.094
-------------------------------------------------------------------------

Phase 2 output

Calibration of the main test items is computed as in the other examples. Without altering the item parameter estimates of those items, parameter estimates for the variants are computed with respect to the latent dimension determined by the main items.


SUBTEST MAINTEST; ITEM PARAMETERS AFTER CYCLE 6

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF


S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------
I26| 0.451 | 0.360 | -1.254 | 0.339 | 0.000 | 8.6 8.0
| 0.648* | 0.088* | 1.775* | 0.083* | 0.000* | (0.3784)
| | | | | |
I27| 0.028 | 0.753 | -0.037 | 0.602 | 0.000 | 3.2 6.0
| 0.691* | 0.152* | 0.918* | 0.121* | 0.000* | (0.7857)
| | | | | |
(Similar output omitted)
I72| -0.018 | 0.726 | 0.025 | 0.587 | 0.000 | 13.6 6.0
| 0.684* | 0.149* | 0.942* | 0.121* | 0.000* | (0.0347)
-------------------------------------------------------------------------
* STANDARD ERROR

LARGEST CHANGE = 0.002542 106.4 131.0


(0.9439)
PARAMETER MEAN STN DEV
-----------------------------------
SLOPE 0.616 0.220
LOG(SLOPE) -0.548 0.368
THRESHOLD 0.143 1.256

******************************
CALIBRATION OF VARIANT ITEMS
VARIANT
******************************

-2 LOG LIKELIHOOD = 4545.542

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF


S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------
I53| 0.587 | 0.613 | -0.957 | 0.523 | 0.000 | 0.0 0.0
| 0.104* | 0.111* | 0.202* | 0.094* | 0.000* | (1.0000)
| | | | | |
I59| 0.519 | 0.603 | -0.860 | 0.517 | 0.000 | 0.0 0.0
| 0.101* | 0.109* | 0.195* | 0.093* | 0.000* | (1.0000)
| | | | | |
I69| -0.702 | 0.549 | 1.280 | 0.481 | 0.000 | 0.0 0.0
| 0.107* | 0.109* | 0.263* | 0.095* | 0.000* | (1.0000)
| | | | | |
I73| -0.668 | 0.231 | 2.886 | 0.225 | 0.000 | 0.0 0.0
| 0.098* | 0.064* | 0.857* | 0.062* | 0.000* | (1.0000)
-------------------------------------------------------------------------

Phase 3 output

In Phase 3, scores for all 660 examinees are computed from the main test item responses and saved to an external file. Printing of the scores is suppressed, except for the first three cases. The latent distribution estimated from all 660 cases is computed and printed. Scores are based on the unrescaled Phase 2 parameters, which are then saved to an external file.

>SCORE METHOD=2,NOPRINT;


SCORES WILL NOT BE COMPUTED FOR VARIANT ITEM SUBTESTS


PARAMETERS FOR SCORING, RESCALING, AND TEST AND ITEM INFORMATION

METHOD OF SCORING SUBJECTS: EXPECTATION A POSTERIORI


(EAP; BAYES ESTIMATION)
TYPE OF PRIOR: NORMAL
SCORES WRITTEN TO FILE EXAMPL07.SCO

GROUP SUBJECT IDENTIFICATION MARGINAL


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E. PROB
-------------------------------------------------------------------------
1 0102111900 | |
1.00 MAINTEST 20 8 40.00 | -0.4065 0.3645 | 0.000000
1 0104112200 | |
1.00 MAINTEST 20 8 40.00 | -0.4091 0.3641 | 0.000000
1 0105121900 | |
1.00 MAINTEST 20 3 15.00 | -1.2316 0.4637 | 0.000000
-------------------------------------------------------------------------

SUMMARY STATISTICS FOR SCORE ESTIMATES


======================================

CORRELATIONS AMONG TEST SCORES

MAINTEST
MAINTEST 1.0000

MEANS, STANDARD DEVIATIONS, AND VARIANCES OF SCORE ESTIMATES

TEST: MAINTEST
MEAN: 0.0915
S.D.: 0.8940
VARIANCE: 0.7992

ROOT-MEAN-SQUARE POSTERIOR STANDARD DEVIATIONS

TEST: MAINTEST
RMS: 0.4493
VARIANCE: 0.2019

EMPIRICAL RELIABILITY: 0.7984

MARGINAL LATENT DISTRIBUTION(S)


===============================

MARGINAL LATENT DISTRIBUTION FOR TEST MAINTEST


MEAN = 0.092
S.D. = 0.974

10.8 Group-wise adaptive testing

This example illustrates the use of BILOG-MG with multiple groups and multiple subtests. It is designed to illustrate some of the more complicated features of the program, including user-specified priors on the latent distributions and priors on the item parameters.


Based on previous test performance, examinees are assigned to two groups for adaptive testing.
Out of a set of 45 items, group 1 is assigned items 1 through 25, and group 2 is assigned items 21
through 45. Thus, there are 5 items linking the test forms administered to the groups.

Twenty of the 25 items presented to group 1 belong to subtest 1 (items 1-15 and 21-25); twenty
items also belong to subtest 2 (items 6-25). Of the 25 items presented to group 2, 20 belong to
subtest 1 (items 21-40) and 20 to subtest 2 (items 21-25 and 31-45).

In all, there are 35 items from the set of 45 assigned to each subtest. (This extent of item overlap between subtests is not realistic, but it illustrates that more than one subtest can be scored adaptively, provided each contains link items between the test forms.)

This example also illustrates how user-supplied priors for the latent distributions are specified with IDIST=1 on the CALIB command. The points and weights for these distributions are supplied in the QUAD commands. Note that with IDIST=1, there are separate QUAD commands for each group for each subtest. Within each subtest, the points must be the same for each group; this is a requirement of the program. But, as the example shows, the points for the groups may differ by subtest. If IDIST is set to 2, sets of weights have to be supplied by group, and the single set of points then applies to all subtests.
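When points and weights are supplied by hand, it is easy to check the latent mean and standard deviation they imply. A minimal Python sketch (the helper is hypothetical, not part of BILOG-MG):

import math

def latent_moments(points, weights):
    # Mean and SD implied by the POINTS and WEIGHTS of a QUAD command.
    total = sum(weights)                    # renormalize; supplied weights
    probs = [w / total for w in weights]    # need not sum exactly to one
    mean = sum(p * x for p, x in zip(probs, points))
    var = sum(p * (x - mean) ** 2 for p, x in zip(probs, points))
    return mean, math.sqrt(var)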

The PRIOR command for each subtest is placed after the QUAD commands for that subtest. The presence of the PRIOR commands is indicated by the READPRIOR option on the CALIB command. In this example, only the prior for the standard deviation of the thresholds is supplied on the PRIOR command. Default values are used for the other prior distributions. The means of the distributions are kept fixed at their specified values by using the NOFLOAT option on the CALIB command.

The score distribution in the respondent population is estimated in the form of a discrete distribution on NQPT=31 points by adding the EMPIRICAL option to the CALIB command. This discrete distribution is used in place of the prior in the MML estimation of the item parameters. When NGROUP>1, separate score distributions are estimated for the groups. The first group serves as the reference group (REFERENCE=1). If the REFERENCE keyword is omitted, the first group is used as the reference group by default. When NGROUP>1, the FLOAT option is the default. By using NOFLOAT here, the means of the prior distributions on item parameters are kept fixed at their specified values during estimation.

In the scoring phase, the empirical prior from phase 2 is used as prior distribution for the scale
scores (IDIST=3). Rescaling of scores to the scale and location in the sample of scale score esti-
mates is requested by setting RSCTYPE to 3. The presence of the INFO keyword indicates that in-
formation output is required. In this case INFO=1 and test information curves will be printed to
the phase 3 output file. In combination with the YCOMMON and POP options, the test information
curves will be expressed in comparable units and an estimate of the classical reliability coeffi-
cient, amongst other information, will be calculated for each subtest.


EXAMPL08.BLM -
GROUP-WISE ADAPTIVE TESTING WITH TWO SUBTESTS
>GLOBAL DFNAME='EXAMPL08.DAT', NPARM=2, NTEST=2, SAVE;
>SAVE SCORE='EXAMPL08.SCO';
>LENGTH NITEMS=(35,35);
>INPUT NTOT=45, SAMPLE=2000, NGROUP=2, KFNAME='EXAMPL08.DAT', NALT=5,
NFORMS=2, NIDCH=5;
>ITEMS INUM=(1(1)45), INAME=(C01(1)C45);
>TEST1 TNAME=SUBTEST1, INAME=(C01(1)C15,C21(1)C40);
>TEST2 TNAME=SUBTEST2, INAME=(C06(1)C25,C31(1)C45);
>FORM1 LENGTH=25, INUM=(1(1)25);
>FORM2 LENGTH=25, INUM=(21(1)45);
>GROUP1 GNAME=POP1, LENGTH=25, INUM=(1(1)25);
>GROUP2 GNAME=POP2, LENGTH=25, INUM=(21(1)45);
(5A1,T1,I1,T1,I1,T7,25A1)
>CALIB IDIST=1, READPRIOR, EMPIRICAL, NQPT=31, CYCLE=25, TPRIOR, NEWTON=5,
CRITERION=0.01, REFERENCE=1, NOFLOAT;
>QUAD1 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,-0.4453E+00,
0.5930E+00, 0.1631E+01, 0.2670E+01, 0.3708E+01, 0.4746E+01),
WEIGHTS=(0.2464E-05, 0.4435E-03, 0.1724E-01, 0.1682E+00, 0.3229E+00,
0.3679E+00, 0.1059E+00, 0.1685E-01, 0.6475E-03, 0.8673E-05);
>QUAD2 POINTS=(-0.4598E+01,-0.3560E+01,-0.2522E+01,-0.1484E+01,-0.4453E+00,
0.5930E+00, 0.1631E+01, 0.2670E+01, 0.3708E+01, 0.4746E+01),
WEIGHTS=(0.2996E-04, 0.1300E-02, 0.1474E-01, 0.1127E+00, 0.3251E+00,
0.3417E+00, 0.1816E+00, 0.2149E-01, 0.1307E-02, 0.3154E-04);
>PRIOR TSIGMA=(1.5(0)35);
>QUAD1 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,-0.4444E+00,
0.4444E+00, 0.1333E+01, 0.2222E+01, 0.3111E+01, 0.4000E+01),
WEIGHTS=(0.1190E-03, 0.2805E-02, 0.3002E-01, 0.1458E+00, 0.3213E+00,
0.3213E+00, 0.1458E+00, 0.3002E-01, 0.2805E-02, 0.1190E-03);
>QUAD2 POINTS=(-0.4000E+01,-0.3111E+01,-0.2222E+01,-0.1333E+01,-0.4444E+00,
0.4444E+00, 0.1333E+01, 0.2222E+01, 0.3111E+01, 0.4000E+01),
WEIGHTS=(0.1190E-03, 0.2805E-02, 0.3002E-01, 0.1458E+00, 0.3213E+00,
0.3213E+00, 0.1458E+00, 0.3002E-01, 0.2805E-02, 0.1190E-03);
>PRIOR TSIGMA=(1.5(0)35);
>SCORE IDIST=3, RSCTYPE=3, INFO=1, YCOMMON, POP, NOPRINT;

Phase 1 output

Phase 1 echoes the assignment of items to subtests, forms, and groups. Classical item statistics
are computed for each subtest in each group. Output for subtest 1 and group 1 (POP1) is given
below.

SUBTEST 1 SUBTEST1
GROUP 1 POP1 200 OBSERVATIONS
GROUP 2 POP2 200 OBSERVATIONS

SUBTEST 2 SUBTEST2
GROUP 1 POP1 200 OBSERVATIONS
GROUP 2 POP2 200 OBSERVATIONS


SUBTEST 1 SUBTEST1
ITEM STATISTICS FOR GROUP: 1 POP1
ITEM*TEST CORRELATION
ITEM NAME #TRIED #RIGHT PCT LOGIT/1.7 PEARSON BISERIAL
------------------------------------------------------------------------
1 C01 200.0 170.0 0.850 -1.02 0.408 0.625
2 C02 200.0 164.0 0.820 -0.89 0.396 0.580
3 C03 200.0 154.0 0.770 -0.71 0.451 0.626
4 C04 200.0 143.0 0.715 -0.54 0.400 0.532
5 C05 200.0 140.0 0.700 -0.50 0.586 0.772
6 C06 200.0 135.0 0.675 -0.43 0.441 0.574
...
19 C24 200.0 83.0 0.415 0.20 0.590 0.746
20 C25 200.0 76.0 0.380 0.29 0.558 0.711
------------------------------------------------------------------------

Phase 2 output

Phase 2 estimates empirical latent distributions for each group and item parameters for each subtest. The arbitrary mean and standard deviation of reference group 1 determine the origin and unit of the ability scales.

>CALIB IDIST=1, READPRIOR, EMPIRICAL, NQPT=31, CYCLE=25, TPRIOR, NEWTON=5,


CRITERION=0.01, REFERENCE=1, NOFLOAT;

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF


S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
C01 | 1.435 | 0.930 | -1.542 | 0.681 | 0.000 | 8.5 6.0
| 0.196* | 0.187* | 0.211* | 0.137* | 0.000* | (0.2037)
| | | | | |
C02 | 1.196 | 0.823 | -1.453 | 0.635 | 0.000 | 7.7 6.0
| 0.162* | 0.163* | 0.215* | 0.126* | 0.000* | (0.2580)
| | | | | |
C03 | 1.028 | 0.922 | -1.115 | 0.678 | 0.000 | 5.8 6.0
| 0.160* | 0.169* | 0.153* | 0.124* | 0.000* | (0.4441)
...
| | | | | |
C38 | -0.962 | 1.098 | 0.876 | 0.739 | 0.000 | 6.7 6.0
| 0.164* | 0.182* | 0.115* | 0.122* | 0.000* | (0.3520)
| | | | | |
C39 | -1.144 | 0.879 | 1.302 | 0.660 | 0.000 | 1.7 5.0
| 0.173* | 0.170* | 0.169* | 0.128* | 0.000* | (0.8927)
| | | | | |
C40 | -1.044 | 0.632 | 1.652 | 0.534 | 0.000 | 3.0 6.0
| 0.133* | 0.123* | 0.268* | 0.104* | 0.000* | (0.8143)
-------------------------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.008756 171.9 233.0
(0.9990)

PARAMETER MEAN STN DEV


-----------------------------------
SLOPE 0.862 0.154
LOG(SLOPE) -0.165 0.180
THRESHOLD -0.164 0.908


Phase 3 output

The only new feature in Phase 3 is the use of the YCOMMON option to place the information plots for the subtests on the same scale. This permits visual comparison of the relative precision of the subtests according to the heights of the information curves. To illustrate, the test information curve for subtest 1, form 1, is given below. The POP option also provides an IRT-estimated reliability for each subtest.

TEST INFORMATION CURVE FOR TEST: SUBTEST1 FORM: 1


STANDARD INFOR-
ERROR MATION
------------------------------------------------------------------
4.41| |10.0923
| |
4.19| | 9.5876
| |
3.97| ++++ | 9.0830
| + + |
3.75| + + | 8.5784
| + + |
3.53| + + | 8.0738
| |
3.31| + + | 7.5692
| |
3.08| + + | 7.0646
| |
2.86| + + | 6.5600
| |
2.64| + | 6.0554
| + |
2.42| + | 5.5507
| + |
2.20| + *| 5.0461
| + * |
1.98|* + | 4.5415
| + + * |
1.76| * * | 4.0369
| * + + |
1.54| * * | 3.5323
| + + * |
1.32| * * | 3.0277
| ** + + * |
1.10| * + + * | 2.5231
| * + ** |
0.88| ** + + * | 2.0185
| +* +* |
0.66| + ** ** + | 1.5138
| + *** *** + |
0.44| + ****** ***** ++ | 1.0092
| ++ **************** ++ |
0.22| +++ +++ | 0.5046
|+++ ++++|
0.00| | 0.0000
-+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
-4.00 -3.00 -2.00 -1.00 0.00 1.00 2.00 3.00 4.00

MAXIMUM INFORMATION APPROXIMATELY 0.9046D+01 AT 0.0000


FOR A NORMAL POPULATION WITH MEAN 0.000 AND S.D. 1.000
AVERAGE INFORMATION= 0.7232D+01 AND RELIABILITY INDEX= 0.879
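The printed index is consistent with the usual relation between average information Ī and reliability in a population with unit latent variance:

$$ \rho \approx \frac{1}{1 + 1/\bar I} = \frac{7.232}{8.232} \approx 0.879 . $$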


10.9 Two-stage spelling test

This example is based on a study by Bock and Zimowski (1998). The full document is available on the Internet from the American Institutes for Research. As a small computing example, we simulated two-stage testing in the data for the “One-Hundred Word Spelling Test” previously analyzed by Bock, Thissen, and Zimowski (1997). A complete description of these data is given in Section 2.4.1.

On the basis of item parameters they report, we selected 12 first-stage items and 12 items for
each of three levels of the second-stage test.

Because of the limited number of items in the pool, we could not meet exactly the requirements of the prototype design, but the resulting test illustrates well enough the main features of the analysis. The item numbers in this and a later example correspond to the words presented in Bock, Thissen, and Zimowski's (1997) Table 1 in the NAEP report. All computations in the analysis were carried out with the BILOG-MG program of Zimowski, Muraki, Mislevy, and Bock (1996). The program command files as well as the data file (with N = 660) are included in the twostage folder of the BILOG-MG installation folder.

For assigning the cases in the data to second-stage levels under conditions that would apply in an operational assessment, we re-estimated the parameters for the 12 first-stage items, computed Bayes estimates of proficiency scale scores, and rescaled the scores to mean 0 and standard deviation 1 in the sample. The command file step0.blm, shown below, contains the necessary commands.

STEP0.BLM - A SIMULATED TWO-STAGE SPELLING TEST - Prototype 1 computing


example. Estimation of the 12 first-stage item parameters.
>COMMENTS
From: "Feasibility Studies of Two-Stage Testing in Large-Scale Educational
Assessment: Implications for NAEP" by R. Darrel Bock and Michele F. Zimowski,
May 1998, American Institutes for Research.

Based on the 100-word spelling test data. N = 1000


(See Bock, Thissen and Zimowski, 1997).

According to page 35 of the NAEP study, we first establish group membership by


recalibrating the parameters for the 12 first-stage items and compute EAP esti-
mates of the proficiency scale scores, rescaled to mean 0 and standard devia-
tion 1 in the sample of 1000. Next, we assign group membership based on scores
at or below -0.67 (group 1), at or above +0.67 (group 3), and the remaining
scores (group 2).

The resulting score file was manipulated per these instructions(see result in
the STEP0.EAP file) and the assigned group membership added to the original
data file as column 12 (before empty). The resulting split is: group 1 236,
group 2 531, group 3 233.

>GLOBAL NPARM=2, DFNAME='SPELL1.DAT', SAVE;


>SAVE PARM='STEP0.PAR', SCORE='STEP0.SCO';
>LENGTH NITEMS=12;
>INPUT NTOTAL=100, NIDCH=11, TYPE=1, SAMPLE=1000, KFNAME='SPELL1.DAT';
>ITEMS INUM=(1(1)100), INAME=(SPELL001(1)SPELL100);


>TEST TNAME=SPELLING, INUM=(1,4,8,10,23,25,28,29,39,47,59,87);
(11A1,1X,25A1,1X,25A1/12X,25A1,1X,25A1)
>CALIB NQPT=20, CRIT=0.001, CYCLES=100, NEWTON=2, NOFLOAT;
>SCORE IDIST=3, METHOD=2, NOPRINT, INFO=1, POP;

Cases with scores at or below -0.67 were assigned to group 1. Those at or above +0.67 were
assigned to group 3, and the remainder to group 2. Of the 1000 cases in the original study, 274,
451, and 275 were assigned to groups 1, 2, and 3, respectively. With these assignment codes
inserted in the case records, the latent distributions were estimated using the command file for
the first-stage analysis shown below (step1.blm in the twostage folder); a scripted sketch of the
assignment rule follows the command file.

STEP1.BLM - ANALYSIS 1: A SIMULATED TWO-STAGE SPELLING TEST
Estimation of first-stage item parameters and latent distributions.
>GLOBAL DFNAME='SPELL2.DAT', NPARM=2, SAVE;
>SAVE SCORE='STEP1.SCO', PARM='STEP1.PAR';
>LENGTH NITEMS=12;
>INPUT NTOT=100, SAMPLE=1000, NGROUP=3, KFNAME='SPELL2.DAT', NIDCHAR=11,
TYPE=1;
>ITEMS INUMBERS=(1(1)100), INAMES=(SPELL001(1)SPELL100);
>TEST TNAME=SPELLING, INUM=(1,4,8,10,23,25,28,29,39,47,59,87);
>GROUP1 GNAME=GROUP1, LENGTH=12, INUM=(1,4,8,10,23,25,28,29,39,47,59,87);
>GROUP2 GNAME=GROUP2, LENGTH=12, INUM=(1,4,8,10,23,25,28,29,39,47,59,87);
>GROUP3 GNAME=GROUP3, LENGTH=12, INUM=(1,4,8,10,23,25,28,29,39,47,59,87);
(11A1,I1,25A1,1X,25A1,/T13,25A1,1X,25A1)
>CALIB FIX, NOFLOAT, NQPT=20, CYCLE=35, SPRIOR, NEWTON=2, CRIT=0.001, REF=0;
>SCORE IDIST=3, METHOD=2, NOPRINT, INFO=1, POP;
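
The group-assignment step described above is easy to script. The following sketch (not part of
BILOG-MG; the two-column layout of the rescaled score file is an assumption made for
illustration) applies the cut points and tallies the three second-stage levels:

# Hypothetical sketch of the second-stage assignment rule: cases at or
# below -0.67 go to level 1, at or above +0.67 to level 3, the rest to 2.
def assign_level(score):
    if score <= -0.67:
        return 1
    if score >= 0.67:
        return 3
    return 2

counts = {1: 0, 2: 0, 3: 0}
with open("step0.eap") as scores:       # assumed layout: "id score" per line
    for line in scores:
        case_id, score = line.split()[:2]
        counts[assign_level(float(score))] += 1
print(counts)                           # e.g. {1: 274, 2: 451, 3: 275}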

For the second-stage analysis, we used the latent distributions estimated in the first-stage analysis
as the prior distributions for maximum marginal likelihood analysis of the combined first- and
second-stage data. The points and weights representing the distributions are shown in the corre-
sponding BILOG-MG command file.

Inasmuch as there are no second-stage link items in this example, we use the first-stage items as
an anchor test. The six easiest of these items provide the links between levels 1 and 2; the six
most difficult provide the links between levels 2 and 3.

The syntax for this analysis is given in step2.blm, as shown below.

STEP2.BLM - ANALYSIS 2: A SIMULATED TWO-STAGE SPELLING TEST. Estimated link
and second-stage item parameters, and latent distributions.
>COMMENTS
The points and weights are the posterior estimates from STEP1.PH2.

>GLOBAL DFNAME='SPELL2.DAT', NPARM=2, SAVE;
>SAVE SCORE='STEP2.SCO', PARM='STEP2.PAR';
>LENGTH NITEMS=48;
>INPUT NTOT=100, SAMPLE=1000, NGROUP=3, KFNAME='SPELL2.DAT', NIDCHAR=11,
TYPE=1;
>ITEMS INUM=(1(1)100), INAME=(SPELL001(1)SPELL100);
>TEST TNAME=SPELLING,
INUM=( 1, 4, 5, 6, 7, 8, 9,10,12,14,15,17,20,23,24,25,
26,27,28,29,33,34,35,38,39,46,47,48,49,50,53,54,
59,60,64,68,69,72,73,77,78,84,85,86,87,90,95,97);
>GROUP1 GNAME=GROUP1, LENGTH=18,
INUM=( 1, 4, 5,14,24,26,29,38,39,46,53,59,68,78,85,87,90,95);


>GROUP2 GNAME=GROUP2, LENGTH=24,
INUM=( 1, 4, 8, 9,10,15,20,23,25,27,28,29,33,34,39,47,48,49,
50,54,59,64,72,87);
>GROUP3 GNAME=GROUP3, LENGTH=18,
INUM=( 6, 7, 8,10,12,17,23,25,28,35,47,60,69,73,77,84,86,97);
(11A1,I1,25A1,1X,25A1,/T13,25A1,1X,25A1)
>CALIB IDIST=1, FIX, NOFLOAT, CYCLE=35, SPRIOR, NEWTON=2, CRIT=0.001,
NQPT=20, REF=0, PLOT=1.0, ACC=0.0;
>QUAD1 POINT=(-0.4081E+01, -0.3652E+01, -0.3222E+01, -0.2792E+01,
-0.2363E+01, -0.1933E+01, -0.1504E+01, -0.1074E+01,
-0.6443E+00, -0.2147E+00, 0.2150E+00, 0.6446E+00,
0.1074E+01, 0.1504E+01, 0.1933E+01, 0.2363E+01,
0.2793E+01, 0.3222E+01, 0.3652E+01, 0.4082E+01),
WEIGHT= (0.2345E-03, 0.1159E-02, 0.4738E-02, 0.1624E-01,
0.4605E-01, 0.1077E+00, 0.2023E+00, 0.2785E+00,
0.2311E+00, 0.9390E-01, 0.1678E-01, 0.1320E-02,
0.4924E-04, 0.9717E-06, 0.8556E-12, 0.0000E+00,
0.0000E+00, 0.0000E+00, 0.0000E+00, 0.0000E+00);
>QUAD2 POINT=(-0.4081E+01, -0.3652E+01, -0.3222E+01, -0.2792E+01,
-0.2363E+01, -0.1933E+01, -0.1504E+01, -0.1074E+01,
-0.6443E+00, -0.2147E+00, 0.2150E+00, 0.6446E+00,
0.1074E+01, 0.1504E+01, 0.1933E+01, 0.2363E+01,
0.2793E+01, 0.3222E+01, 0.3652E+01, 0.4082E+01),
WEIGHT=(0.0000E+00, 0.0000E+00, 0.0000E+00, 0.3055E-05,
0.7882E-04, 0.1170E-02, 0.1119E-01, 0.6218E-01,
0.1820E+00, 0.2791E+00, 0.2502E+00, 0.1451E+00,
0.5407E-01, 0.1271E-01, 0.1945E-02, 0.2046E-03,
0.8579E-06, 0.0000E+00, 0.0000E+00, 0.0000E+00);
>QUAD3 POINT=(-0.4081E+01, -0.3652E+01, -0.3222E+01, -0.2792E+01,
-0.2363E+01, -0.1933E+01, -0.1504E+01, -0.1074E+01,
-0.6443E+00, -0.2147E+00, 0.2150E+00, 0.6446E+00,
0.1074E+01, 0.1504E+01, 0.1933E+01, 0.2363E+01,
0.2793E+01, 0.3222E+01, 0.3652E+01, 0.4082E+01),
WEIGHT= (0.0000E+00, 0.0000E+00, 0.0000E+00, 0.0000E+00,
0.0000E+00, 0.3914E-11, 0.1006E-05, 0.5966E-04,
0.1650E-02, 0.1943E-01, 0.9720E-01, 0.2237E+00,
0.2717E+00, 0.2051E+00, 0.1111E+00, 0.4735E-01,
0.1652E-01, 0.4763E-02, 0.1128E-02, 0.2324E-03);
>SCORE IDIST=3, METHOD=2, NOPRINT, INFO=1, POP;

Since the spelling data contain responses of all cases to all items, we can compare the accuracy
of the estimates based on the 24 items per case in the two-stage data with that of estimates based
on 48 items per case in a conventional one-stage test. The syntax for this analysis is given in
step3.blm, shown below.

STEP3 - ANALYSIS 3: A SIMULATED TWO-STAGE SPELLING TEST
Estimation of 48 one-stage item parameters, and latent distributions.
>GLOBAL DFNAME='SPELL2.DAT', NPARM=2, SAVE;
>SAVE SCORE='STEP3.SCO', PARM='STEP3.PAR';
>LENGTH NITEMS=48;
>INPUT NTOTAL=100, SAMPLE=1000, KFNAME='SPELL2.DAT', NIDCHAR=11, TYPE=1;
>ITEMS INUMBERS=(1(1)100), INAMES=(SPELL001(1)SPELL100);
>TEST TNAME=SPELLING, INUM=(1,4,5,6,7,8,9,10,12,14,15,17,20,23,24,25,
26,27,28,29,33,34,35,38,39,46,47,48,49,50,53,54,59,60,64,68,69,72,73,
77,78,84,85,86,87,90,95,97);
(11A1,1X,25A1,1X,25A1,/T13,25A1,1X,25A1)
>CALIB IDIST=0, FIX, NOFLOAT, CYCLE=35, SPRIOR, NEWTON=2, CRIT=0.001,
NQPT=20, REF=0, PLOT=1.0, ACC=0.0;
>SCORE IDIST=3, METHOD=2, NOPRINT, INFO=1, POP;


The two-stage and one-stage estimates are compared in Table 10.1. Despite the small number of
items and relatively small sample size in this computing example, the agreement between the
estimates is reasonably good for the majority of items. There are notable exceptions, however,
among the second-stage items: of these, items 6, 7, 77, and 84 show discrepancies in both slope
and threshold; all of these are from level 3 and have extremely high thresholds in the one-stage
analysis, well beyond the +1.5 maximum we are assuming for second-stage items. Items 12 and
17 from level 3 are discrepant only in slope, as are items 26 and 38 from level 2, and items 50
and 64 from level 1.

Table 10.1: Comparison of two-stage and one-stage item parameter estimates in the
spelling data (shown for first 10 items)

Item          Two-stage                 One-stage
            Slope      Threshold     Slope      Threshold
            (S.E.)     (S.E.)        (S.E.)     (S.E.)

SPELL001    0.74191    -0.22896      0.84646    -0.32964
            0.10040     0.07910      0.08642     0.06612
SPELL004    0.64140    -0.45195      0.71193    -0.54128
            0.08831     0.09150      0.07347     0.08305
SPELL005    0.68036    -1.47582      0.69276    -1.40895
            0.19351     0.16286      0.07525     0.13561
SPELL006    0.87969     1.51254      0.29534     2.15957
            0.24184     0.13287      0.04648     0.37184
SPELL007    0.78362     2.59105      0.32823     3.76009
            0.24146     0.37885      0.06116     0.67776
SPELL008    0.51257     0.52107      0.54531     0.59135
            0.07726     0.11154      0.06226     0.10754
SPELL009    0.98121    -0.28826      0.68981    -0.25449
            0.19997     0.08066      0.06884     0.07895
SPELL010    0.94877     0.45341      0.91421     0.50198
            0.10159     0.06703      0.08021     0.06909
SPELL012    0.87810     1.41514      0.78199     1.41415
            0.23453     0.11948      0.09203     0.13032
SPELL014    1.00579    -1.99060      0.72159    -1.94803
            0.28436     0.20872      0.10121     0.20793


In all cases the two-stage slope is larger than the one-stage slope. This effect is balanced,
however, by the tendency of the first-stage items, 1, 4, 8, 10, 23, 25, 28, 29, 39, 47, 59, and 87,
to show smaller slopes in the two-stage analysis. As a result, the average slope in the two-stage
results is only slightly larger than the one-stage average.

The average thresholds also show only a small difference. In principle, the parameters of a
two-parameter logistic response function can be calculated from probabilities at any two
distinct, finite values on the measurement continuum. Similarly, those of the three-parameter
model can be calculated from three such points. This suggests that, with fallible data, estimation
must improve, even in the two-stage case, as sample size increases. Some preliminary
simulations we have attempted suggest that with sample sizes on the order of 5 or 10 thousand,
and better placement of the items, the discrepancies we see in the prototype 1 results largely
disappear.

The latent distributions estimated with items from both stages are depicted in Figure 10.1. The
distributions for the three assignment groups are shown normalized to unity. The estimated
population distribution, which is the sum of the distributions for the individual groups weighted
proportionally to sample size, is constrained to mean 0 and standard deviation 1 during
estimation of the component distributions. It is essentially normal and almost identical to the
population distribution estimated in the one-stage analysis.

Figure 10.1. Prototype 1: estimated latent distributions from two-stage and one-stage spell-
ing data

One may infer the measurement properties of the simulated two-stage spelling test from the
information and efficiency calculations shown in Figure 10.2 and Figure 10.3, respectively.
When interpreting information curves, the following rules of thumb are helpful. An information
value of 5 corresponds to a measurement error variance of 1/5 = 0.2. In a population in which
the score variance is set to unity, the reliability of a score with this error variance is
1.0 - 0.2 = 0.8. Similarly, the reliability corresponding to an information value of 10 is 0.9. In
the context of low-stakes score reporting, we are aiming for reliabilities anywhere between
these figures. As is apparent in Figure 10.2, this range of reliability is achieved in the two-stage
results for spelling over much of the latent distribution.
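
In symbols, with the score variance set to unity as above, these rules of thumb amount to

    \sigma_e^2(\theta) = \frac{1}{I(\theta)}, \qquad
    \rho(\theta) \approx 1 - \sigma_e^2(\theta) = 1 - \frac{1}{I(\theta)},

so that I(θ) = 5 gives ρ ≈ 0.8 and I(θ) = 10 gives ρ ≈ 0.9.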

Figure 10.2. Prototype 1: two-stage spelling test

Figure 10.3. Prototype 1: efficiencies of the two-stage spelling tests


Finally, the efficiency curves in Figure 10.3 for the three levels show us the saving of test length
and administration time, including both first- and second-stage testing, due specifically to the
two-stage procedure in comparison with a one-stage test of the same length and item content.
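
Read in the usual IRT sense (the definition is not restated in this section), the efficiency
plotted in Figure 10.3 is the ratio of test information functions,

    \mathrm{Eff}(\theta) = \frac{I_{\text{two-stage}}(\theta)}{I_{\text{one-stage}}(\theta)},

so a value of 2.0 at some θ means that, to measure equally well there, a one-stage test would
need roughly twice the length of the two-stage test.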

In this case we hope to see efficiencies greater than 2.0, at least away from the population mean
where conventional tests with peaked centers typically have reduced precision. The prototype 1
design and analysis meet this criterion.

To increase generalizability of group-level mean scores in assessment applications of the
prototype 1 design, the second-stage tests will of course have to exist in multiple stratified
randomly parallel forms. As with matrix sampling designs, these forms will be administered in
random rotation to the examinees in each second-stage level. The sample data will then be
suitable for equivalent-groups equating of the second-stage forms.

10.10 Estimating and scoring tests of increasing length

This example illustrates commands for estimating item parameters and for computing the
means, standard deviations, variances, average standard errors, error variances, and
inverse-information reliabilities of maximum likelihood estimates of ability.

Note: to obtain the same results for EAP estimation, set METHOD=2 in the SCORE command; for
MAP estimation, set METHOD=3.

EXAMPL10.BLM - MML estimation of item parameters
ML estimation of case scores
>GLOBAL DFNAME='SIM01C0.SIM',NPARM=2,NTEST=6,SAVE;
>SAVE SCORE='MLEVAL1.SCO';
>LENGTH NITEMS=(4,8,16,32,64,128);
>INPUT NTOTAL=128,NIDCH=5,SAMPLE=3000;
>ITEMS INUMBERS=(1(1)128),INAME=(ITEM001(1)ITEM128);
>TEST1 TNAME=LENGTH4,INUMBERS=(1(1)4);
>TEST2 TNAME=LENGTH8,INUMBERS=(1(1)8);
>TEST3 TNAME=LENGTH16,INUMBERS=(1(1)16);
>TEST4 TNAME=LENGTH32,INUMBERS=(1(1)32);
>TEST5 TNAME=LENGTH64,INUMBERS=(1(1)64);
>TEST6 TNAME=LENG128,INUMBERS=(1(1)128);
(11A1,1X,128A1)
>CALIB NQPT=40,CYCLE=25,TPRIOR,NEWTON=3,CRIT=0.001,NOSPRIOR,NOADJUST;
>SCORE METHOD=1,INFO=1,YCOMMON,POP,NOPRINT;

Related topics

• SCORE command: METHOD keyword

10.11 Commands for parallel-form correlations

This example contains the syntax used for computing parallel-form correlations and
between-test correlations for tests of different lengths. Set METHOD equal to 1, 2, or 3 in the
SCORE command to obtain correlations for ML, EAP, and MAP estimated abilities, respectively.


EXAMPL11.BLM - Correlation of independent ML estimates

>GLOBAL DFNAME='SIM01C0.SIM',NPARM=2,NTEST=12,SAVE;
>SAVE SCORE='MAPCOR1.SCO';
>LENGTH NITEMS=(4,4,8,8,16,16,32,32,64,64,128,128);
>INPUT NTOTAL=504,NIDCH=5,SAMPLE=3000;
>ITEMS INUMBERS=(1(1)504),INAME=(ITEM001(1)ITEM504);
>TEST1 TNAME=LENGTH4a,INUMBERS=(1(1)4);
>TEST2 TNAME=LENGTH4b,INUMBERS=(5(1)8);
>TEST3 TNAME=LENGTH8a,INUMBERS=(9(1)16);
>TEST4 TNAME=LENGTH8b,INUMBERS=(17(1)24);
>TEST5 TNAME=LEN16a, INUMBERS=(25(1)40);
>TEST6 TNAME=LEN16b, INUMBERS=(41(1)56);
>TEST7 TNAME=LEN32a, INUMBERS=(57(1)88);
>TEST8 TNAME=LEN32b, INUMBERS=(89(1)120);
>TEST9 TNAME=LEN64a, INUMBERS=(121(1)184);
>TEST10 TNAME=LEN64b, INUMBERS=(185(1)248);
>TEST11 TNAME=LEN128a, INUMBERS=(249(1)376);
>TEST12 TNAME=LEN128b, INUMBERS=(377(1)504);
(11A1,1X,504A1)
>CALIB NQPT=40,CYCLE=25,NEWTON=3,CRIT=0.001,NOSPRIOR,NOADJUST;
>SCORE METHOD=1,INFO=1,YCOMMON,POP,NOPRINT;

Related topics

• SCORE command: METHOD keyword

10.12 EAP scoring of the NAEP forms and state main and variant tests

The syntax in this example was used to score NAEP forms and state main and variant tests. It is
included here as an example of a more complicated analysis and contains numerous TEST and
FORM commands.

The use of the INUMBERS keyword on the FORM commands to assign items to the various forms is
of interest, as is the naming convention used with the INAMES keyword on the ITEMS command.
Finally, note that none of the tests are calibrated (SELECT=0 for all tests on the CALIB command).
Scoring is done according to a previously generated item parameter file, gr4fin.par, read with
the IFNAME keyword on the GLOBAL command.

EXAMPLE12.BLM - Scoring of main and variant tests
Grade 4 Reading
>COMMENTS
******************************************************************************
This example is for illustration purposes only.
The actual data to run the command file is not available.
******************************************************************************

The syntax in this example was used to score NAEP forms and state main and
variant tests. It is included here as an example of a more complicated analysis
and contains numerous TEST and FORM commands.

The use of the INUMBERS keyword on the FORM commands to assign items to the
various forms is of interest, as is the naming convention used with the INAMES
keyword on the ITEMS command. Finally, note that none of the tests are cali-
brated (SELECT=0 for all tests on the CALIB command). Scoring is done according
to a previously generated item parameter file (gr4fin.par) read with the IFNAME
keyword on the GLOBAL command.

The variant items in a test are not intended to be scored as a test. They are
included in the analysis to obtain preliminary information on their item char-
acteristics with respect to the latent variable measured by the main test.

>GLOBAL NPARM=3, NTEST=6, NVTEST=6, DFNAME='GR4FIN.DAT',
IFNAME='GR4FIN.PAR', SAVE;
>SAVE SCORE='GR4FIN.SCO';
>LENGTH NITEMS=(82,20,56,82,25,47), NVARIANT=(82,20,56,82,25,47);
>INPUT NTOTAL=230, SAMPLE=3000, NIDCH=10, NFORM=16,
KFNAME='GR4FIN.DAT', OFNAME='GR4FIN.DAT', NFNAME='GR4FIN.DAT';
>ITEMS INUMBERS=(1(1)230), INAME=
(MC01,MC02,MC03, OEtw04,OEtw05,OEtw06, OEth07,OEth08, OEfo09,
MC10(1)MC14, OEtw15(1)OEtw20, OEfo21,
MC22(1)MC28, OEtw29(1)OEtw31, OEfo32,
MC33(1)MC37, OEtw38(1)OEtw41, OEfo42,
MC43(1)MC46, OEtw47(1)OEtw51, OEfo52,
MC53(1)MC55, OEth56(1)OEth60, OEfo61,
MC62(1)MC64, OEtw65, OEth66(1)OEth69, OEfo70,
MC71(1)MC76, OEtw77(1)OEtw81, OEfo82,
READ07,READ08,READ16,READ17,READ24,READ25,
LIST04,LIST05,LIST06,LIST08,LIST09,LIST12,LIST16,
LIST17,LIST18,LIST19,LIST20,
WSAM01,WSAM02,
DRP01(1)DRP56,
READ01,READ02,READ03,READ04,READ05,READ06,READ09,READ10,
READ11(1)READ15,READ18(1)READ23,
LIST01,LIST02,LIST03,LIST07,LIST10,LIST11,LIST13(1)LIST15,
WRIT01(1)WRIT45);
>TEST01 TNAME=NAEP1, INUMB=(1(1)82);
>TEST02 TNAME=LISTV, INAME=(LIST01(1)LIST20);
>TEST03 TNAME=LISTM, INAME=(LIST01(1)LIST20);
>TEST04 TNAME=NAEP2, INUMB=(1(1)82);
>TEST05 TNAME=DRPV, INAME=(DRP01(1)DRP56);
>TEST06 TNAME=DRPM, INAME=(DRP01(1)DRP56);
>TEST07 TNAME=NAEP3, INUMB=(1(1)82);
>TEST08 TNAME=READV, INAME=(READ01(1)READ25);
>TEST09 TNAME=READM, INAME=(READ01(1)READ25);
>TEST10 TNAME=NAEP4, INUMB=(1(1)82);
>TEST11 TNAME=WRITEV, INAME=(WRIT01(1)WRIT45,WSAM01,WSAM02);
>TEST12 TNAME=WRITEM, INAME=(WRIT01(1)WRIT45,WSAM01,WSAM02);
>FORM01 LEN=169, INUM=(83(1)230,10(1)14, 1(1)3,15(1)20, 4(1)6, 7, 8, 21,9);
>FORM02 LEN=168, INUM=(83(1)230, 1(1)3,22(1)28, 4(1)6,29(1)31, 7, 8, 9,32);
>FORM03 LEN=168, INUM=(83(1)230,22(1)28,62(1)64,29(1)31,65, 66(1)69,32,70);
>FORM04 LEN=169, INUM=(83(1)230,62(1)64,10(1)14,65, 15(1)20,66(1)69,70,21);
>FORM05 LEN=171, INUM=(83(1)230,10(1)14,22(1)28, 15(1)20,29(1)31,21,32);
>FORM06 LEN=166, INUM=(83(1)230, 1(1)3,62(1)64, 4(1)6,65, 7, 8,66(1)69,9,70);
>FORM07 LEN=170, INUM=(83(1)230,33(1)37,71(1)76,38(1)41,77(1)81, 42,82);
>FORM08 LEN=170, INUM=(83(1)230,71(1)76,43(1)46,77(1)81,47(1)51, 82,52);
>FORM09 LEN=167, INUM=(83(1)230,43(1)46,53(1)55,47(1)51,56(1)60, 52,61);
>FORM10 LEN=167, INUM=(83(1)230,53(1)55,33(1)37,38(1)41,56(1)60, 61,42);
>FORM11 LEN=168, INUM=(83(1)230,33(1)37,43(1)46,38(1)41,47(1)51, 42,52);
>FORM12 LEN=169, INUM=(83(1)230,71(1)76,53(1)55,77(1)81,56(1)60, 82,61);
>FORM13 LEN=170, INUM=(83(1)230,43(1)46,10(1)14,47(1)51,15(1)20, 52,21);
>FORM14 LEN=166, INUM=(83(1)230,53(1)55, 1(1)3, 4(1)6, 56(1)60, 7, 8,61, 9);
>FORM15 LEN=169, INUM=(83(1)230,22(1)28,33(1)37,29(1)31,38(1)41,32,42);
>FORM16 LEN=169, INUM=(83(1)230,62(1)64,71(1)76,65,77(1)81,66(1)69,70,82);
(10A1,2X,I2/15X,19A1,1X,129A1/15X,23A1)


>CALIB SELECT=(0(0)6);
>SCORE METHOD=2, NOPRINT, NQPT=(25(0)6);

Related topics

• CALIB command: SELECT keyword
• GLOBAL command: IFNAME keyword
• FORM command
• FORM command: INUMBERS keyword
• ITEMS command: INUMBERS keyword
• TEST command

10.13 Domain scores

This is an attempt to reconstruct the domain scores demonstration application reported in “The
Domain Score Concept and IRT: Implications for Standards Setting” by Bock, Thissen &
Zimowski (2001). We use the dataset spell.dat as included with the TESTFACT program (see
Chapter 13). All 100 items of the 100-word spelling test seem to be there, but there are only 660
records (instead of the 1,000 that Bock et al. report). In a first run (spell1.blm), we calibrate all
100 items and save the parameters in an external file. The syntax is shown below.

SPELL1.BLM - CALIBRATION OF THE 100 WORD SPELLING TEST
TWO-PARAMETER MODEL
>COMMENTS
We are trying first to reproduce the table with slope and location parameters
for the 100 words as Bock et al. report in "The Domain Score Concept and IRT:
Implications for Standards Setting."

The SCORE command is included to obtain the percent correct for each examinee
(= the true domain scores).

>GLOBAL DFNAME='SPELL.DAT', NPARM=2, SAVE;
>SAVE PARM='SPELL1.PAR';
>LENGTH NITEMS=(100);
>INPUT NTOTAL=100, NIDCHAR=10, KFNAME='SPELL.DAT';
>ITEMS INAMES=(S001(1)S100);
>TEST1 TNAME='SPELLING', INUMBERS=(1(1)100);
(10A1,1X,25A1,1X,25A1,1X,25A1,1X,25A1)
>CALIB NQPT=31, CYCLES=100, CRIT=0.001, NOFLOAT;
>SCORE;

The item parameters of the first 5 items, as reported in the saved item parameter file
spell1.par, are shown in Table 10.2.


Table 10.2: Selected item parameters from spell1.par

Item      Slope      S.E.       Threshold    S.E.

S001      0.79494    0.07978    -0.34466     0.06899
S002      0.38723    0.07299    -3.53823     0.61667
S003      0.24041    0.04784    -3.04033     0.61821
S004      0.72020    0.07353    -0.54159     0.08115
S005      0.69253    0.07367    -1.41137     0.13523

The parameter values are in close agreement with Table 1 from Bock et al. (results for the first
5 items are shown in Table 10.3 below), showing also that we have the correct dataset, with the
items in the order of that table, albeit not all of the records.

Table 10.3: Selected item parameters from Bock et. al.

Item      Slope     Threshold

S001      0.843     -0.339
S002      0.351     -3.623
S003      0.239     -3.073
S004      0.785      0.727
S005      0.269      2.273

In a second run (spell2.blm), we let the program compute the expected domain scores for all 660
examinees from the saved parameter file. The DOMAIN and FILE keywords on the SCORE com-
mand are used. We skip the calibration phase with the SELECT keyword on the CALIB command.
The scores are saved to file by using the SCORE keyword on the SAVE command.
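
Although the formula is not restated here, the quantity requested by DOMAIN is the expected
domain score of Bock, Thissen & Zimowski (2001): for an examinee of ability θ and a domain
of n calibrated items, the model-based estimate of the percent-correct score is

    \hat{d}(\theta) = \frac{100}{n} \sum_{j=1}^{n} P_j(\theta),

where P_j(θ) is the fitted response function of item j; the column of weights in the FILE
parameter file allows the items to enter this sum unequally when desired.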

The contents of spell2.blm are shown below. All the command files and data discussed here are
available to the user in the domscore subfolder of the BILOG-MG installation folder.

SPELL2.BLM - CALIBRATION OF THE 100 WORD SPELLING TEST
TWO-PARAMETER MODEL
>COMMENTS
In a second step, we test the "DOMAIN" keyword on the score command. The item
parameter file from the SPELL1.BLM analysis has been edited and saved as
SPELL2.PAR in accordance with the FILE keyword format requirements. We save the
score file.


>GLOBAL DFNAME='SPELL.DAT', NPARM=2, SAVE;
>SAVE SCORE='SPELL2.SCO';
>LENGTH NITEMS=(100);
>INPUT NTOTAL=100, NIDCHAR=10, KFNAME='SPELL.DAT';
>ITEMS INAMES=(S001(1)S100);
>TEST1 TNAME='SPELLING', INUMBERS=(1(1)100);
(10A1,1X,25A1,1X,25A1,1X,25A1,1X,25A1)
>CALIB SELECT=(0);
>SCORE DOMAIN=100, FILE='SPELL2.PAR', METHOD=2;

The parameter file that we read in through the FILE keyword on the SCORE command had to be
created from the parameter file (spell1.par) saved in the spell1.blm run. First we deleted
everything before the first line with parameter estimates. Then we deleted all columns that were
not slope, threshold, or guessing parameters, leaving just those three columns, in that order.
Then we added a column of weights as the first column, in the same format; we used 1.0000
because we want all items weighted equally. Finally, we added the variable format statement
(4F10.5) as the first line of the file and saved the result as spell2.par. A scripted sketch of
these steps follows.
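
The sketch below is hypothetical: it assumes the slope/threshold/guessing triples have already
been extracted from the saved file (the values shown are those of Table 10.2), and it writes a
file in the layout just described.

rows = [                         # (slope, threshold, guessing) per item
    (0.79494, -0.34466, 0.0),    # S001, from Table 10.2
    (0.38723, -3.53823, 0.0),    # S002
    # ... one row for each remaining item
]
with open("spell2.par", "w") as out:
    out.write("(4F10.5)\n")                          # variable format line
    for slope, threshold, guessing in rows:
        fields = (1.0, slope, threshold, guessing)   # 1.0 = equal weight
        out.write("".join(f"{v:10.5f}" for v in fields) + "\n")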

The estimated expected domain scores from spell2.blm recover the observed percent-correct
scores fairly well, as spell2.ph3 shows. Here are the results for the first five examinees:

GROUP SUBJECT IDENTIFICATION DOMAIN SCORE S.E. MARGINAL


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E. PROB
-------------------------------------------------------------------------
1 01021119001 64.89 4.92 |
1.00 * SPELLING 100 65 65.00 | -0.1501 0.4187 | 0.000000
1 01041122001 57.14 5.43 |
1.00 * SPELLING 100 56 56.00 | -0.7839 0.4321 | 0.000000
1 01051219001 54.25 5.40 |
1.00 * SPELLING 100 57 57.00 | -1.0132 0.4269 | 0.000000
1 01061219001 71.52 1.80 |
1.00 * SPELLING 100 69 69.00 | 0.4499 0.1768 | 0.000000
1 01071219001 80.77 2.68 |
1.00 * SPELLING 100 81 81.00 | 1.5475 0.4000 | 0.000000
-------------------------------------------------------------------------

If the estimated expected domain scores were not close to the observed percent-correct scores,
something would probably be wrong, so this is a good check.

In a third and final step (spell3.blm), we take a random sample of 20 items, adapt the parameter
file (spell3.par, prepared as described previously), and produce a new score file (spell3.sco).

The contents of spell3.blm are as follows:

SPELL3.BLM - CALIBRATION OF THE 100 WORD SPELLING TEST
TWO-PARAMETER MODEL
>COMMENTS
In this third step we use a random sample of 20 items from the 100-word spell-
ing test to score the examinees with the item parameters from the first step.
The score file is saved.

>GLOBAL DFNAME='SPELL.DAT', NPARM=2, SAVE;
>SAVE SCORE='SPELL3.SCO';
>LENGTH NITEMS=20;


>INPUT NTOTAL=100, NIDCHAR=10, KFNAME='SPELL.DAT';
>ITEMS INAMES=(S001(1)S100);
>TEST1 TNAME='SPELLING', INUMBERS=(4, 9, 10, 13, 22, 26, 36, 51, 55, 65,
69, 73, 74, 82, 83, 88, 89, 91, 94, 97);
(10A1,1X,25A1,1X,25A1,1X,25A1,1X,25A1)
>CALIB SELECT=(0);
>SCORE DOMAIN=20, FILE='SPELL3.PAR', METHOD=2;

These are the results for the first five examinees:

GROUP SUBJECT IDENTIFICATION DOMAIN SCORE S.E. MARGINAL


WEIGHT TEST TRIED RIGHT PERCENT ABILITY S.E. PROB
-------------------------------------------------------------------------
1 01021119001 72.51 6.95 |
1.00 * SPELLING 20 14 70.00 | 0.4109 0.7214 | 0.000000
1 01041122001 62.76 7.94 |
1.00 * SPELLING 20 12 60.00 | -0.4985 0.6850 | 0.000000
1 01051219001 63.42 7.90 |
1.00 * SPELLING 20 12 60.00 | -0.4414 0.6870 | 0.000000
1 01061219001 74.25 6.61 |
1.00 * SPELLING 20 14 70.00 | 0.5971 0.7300 | 0.000000
1 01071219001 79.14 5.32 |
1.00 * SPELLING 20 17 85.00 | 1.2047 0.7604 | 0.000000
-------------------------------------------------------------------------

As can be seen, the “population domain scores” are recovered reasonably well from the random
sample of only 20 items.

Related topics

• CALIB command: SELECT keyword
• SAVE command: SCORE keyword
• SCORE command: DOMAIN keyword
• SCORE command: FILE keyword


11 PARSCALE examples

11.1 Item calibration and examinee Bayes scoring with the rating-scale graded
model

This example illustrates the calibration and scoring of a test or scale containing 20
multiple-category items. The simulated data represent responses of 1000 examinees drawn
randomly from a population with a mean trait score of 0.0 and standard deviation of 1.0.

Data are read from the file exampl01.dat in the examples folder using the DFNAME keyword on
the FILES command. The first few lines of the data file are shown below. The generating trait
value of each examinee is the second column of information in the data file. The case ID, given
at the beginning of each line, is 4 characters long and is indicated as such using the NIDCHAR
keyword on the INPUT command. It is also reflected in the format statement as 4A1.

0001 .44739 42444232223343433332
0002 -.93465 12221121122324121432
0003 -.56465 32212212213342314121
0004 -.58622 13222111113224221111
0005 -.35223 21211122313132312131

All 20 items are used in a single test (NTEST=1 on INPUT command, with LENGTH=20). All 20
items have common categories and are assigned to the same BLOCK (NBLOCK=1 on TEST;
NITEMS=20 on BLOCK).

All items have four categories (NCAT=4 on the BLOCK command) and varying difficulties and
discriminating powers. The graded model is assumed (GRADED on the CALIB command), and a
logistic response model (LOGISTIC on the CALIB command) is requested. The choice between a
logistic or normal response function metric is effective only if the graded response model is
used. The response function of the graded model can be either the normal ogive or its logistic
approximation; the normal ogive is the default. If logistic is selected, the item parameters can be
in the natural or the logistic metric. Natural is the default. For the normal metric, set SCALE
equal to 1.7. Neither LOGISTIC nor SCALE is needed when PARTIAL is selected. Because the
generalized model allows for varying item discriminating powers, both a slope and a threshold
are estimated for each item. The CADJUST keyword on the BLOCK command is used to set the
mean of the category parameters to 0, since simultaneous estimation of the slope parameter and
all category parameters is not possible.

The ITEMFIT keyword is used to set the number of frequency score groups for the computation
of item fit statistics to 10. Note that there is no default value for the ITEMFIT keyword.

The CYCLES keyword specifies 25 EM iterations, with maximum 2 inner EM iterations for the
item and category parameter estimation. Five Newton-Gauss iterations are requested (NEWTON=5
on CALIB). A convergence criterion of 0.005 is specified by using the CRIT keyword on CALIB.

Thirty quadrature points are to be used in the EM and Newton estimation instead of the default
of 10 that applies when LENGTH is less than or equal to 50 on the INPUT command. The
calibration procedure depends on the evaluation of integrals using Gauss-Hermite quadrature.
In general, the accuracy of numerical integration increases with the number of quadrature
points used.
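
The program-generated prior shown later in the Phase 0 output can be mimicked in a few lines;
this sketch is an illustration, not PARSCALE's actual routine. It places 30 equally spaced points
on [-4, 4] and normalizes normal-density ordinates to unit total weight:

import math

n = 30
points = [-4.0 + 8.0 * k / (n - 1) for k in range(n)]
dens = [math.exp(-0.5 * x * x) for x in points]     # normal ordinates
total = sum(dens)
weights = [d / total for d in dens]                 # normalize to 1.0
print(round(points[14], 4), round(weights[14], 4))  # -0.1379 0.109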

The score estimation method is specified (EAP option on SCORE command). Scale scores for each
subtest are estimated by the Bayes (EAP) method, and their posterior standard deviations serve
as standard errors.

The scores, which are rescaled to zero mean and unit standard deviation in the sample (SMEAN
and SSD on SCORE), are saved in the file exampl01.sco using the SCORE keyword on the SAVE
command. The PFQ keyword is also specified; it is usually used to make ML scores computable
for extreme response patterns, but it improves the EAP estimates somewhat as well. In addition,
the estimated item parameters are saved in the file exampl01.par (PARM keyword on the SAVE
command).

The command file is shown below, with comments omitted.

EXAMPL01.PSL: ARTIFICIAL EXAMPLE: MONTE CARLO DATA
GRADED RATING SCALE MODEL, NORMAL RESPONSE FUNCTION: EAP SCALE SCORES
>FILES DFNAME='EXAMPL01.DAT',SAVE;
>SAVE PARM='EXAMPL01.PAR',SCORE='EXAMPL01.SCO';
>INPUT NIDCHAR=4,NTOTAL=20,NTEST=1,LENGTH=(20),NFMT=1;
(4A1,10X,20A1)
>TEST1 TNAME=SCALE1,ITEM=(1(1)20),NBLOCK=1;
>BLOCK1 BNAME=SBLOCK1,NITEMS=20,NCAT=4, CADJUST=0.0;
>CAL GRADED,LOGISTIC,SCALE=1.7,NQPTS=30,CYCLE=(25,2,2,2,2),
NEWTON=5,CRIT=0.005,ITEMFIT=10;
>SCORE EAP,NQPTS=30,SMEAN=0.0,SSD=1.0,NAME=EAP,PFQ=5;

Phase 0 output

At the beginning of the output for Phase 0, the command file is echoed. Information on the num-
ber of tests, items, and type of model to be fitted as interpreted by PARSCALE is also given.

SINGLE MAIN TEST IS USED.


NUMBER OF ITEMS: 20

FORMAT OF DATA INPUT IS


(4A1,10X,20A1)

>TEST1 TNAME=SCALE1,ITEM=(1(1)20),NBLOCK=1;

BLOCK CARD: 1
>BLOCK1 BNAME=SBLOCK1,NITEMS=20,NCAT=4,CADJ=0.0;
>CAL GRADED,LOGISTIC,SCALE=1.7,NQPTS=30,CYCLE=(25,2,2,2,2),
NEWTON=5,CRIT=0.005,ITEMFIT=10;

MODEL SPECIFICATIONS
======================

LOGISTIC - GRADED ITEM RESPONSE MODEL IS SPECIFIED.


SCALE CONSTANT 1.70 FOR SLOPE PARAMETERS.


This section of the output file contains information on the settings to be used during the item pa-
rameter estimation in Phase 2.

CALIBRATION PARAMETERS
======================

MAXIMUM NUMBER OF EM CYCLES: 25


MAXIMUM INNER EM CYCLES: 2
MAXIMUM CATEGORY ESTIMATION CYCLES: 2
MAXIMUM ITEM PARAMETER ESTIMATION CYCLES: 2
MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION FOR EM CYCLES: 0.0050
CONVERGENCE CRITERION FOR SLOPE: 0.0050
CONVERGENCE CRITERION FOR THRESHOLD: 0.0050
CONVERGENCE CRITERION FOR CATEGORY: 0.0050
CONVERGENCE CRITERION FOR GEUSSING: 0.0050
ORDER OF INNER EM CYCLES: CATEGORY - ITEM PARAMETERS
ESTIMATION ACCELERATOR: NO (DEFAULT)
RIDGE METHOD: NO (DEFAULT)

No prior distribution was requested in the CALIB command, and consequently the default prior, a
normal distribution on equally spaced points, will be used (DIST=2 on CALIB). The number of
quadrature points to be used during item parameter estimation was set to 30 (NQPT on CALIB).
The program-generated quadrature points and weights are printed to the Phase 0 output file, as
shown below.

THE FIXED PRIOR DISTRIBUTION FOR LATENT TRAITS


MEAN : 0.0000
S.D. : 1.0000

QUADRATURE POINTS AND PRIOR WEIGHTS (PROGRAM-GENERATED NORMAL APPROXIMATION):

1 2 3 4 5
POINT -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
WEIGHT 0.3692E-04 0.1071E-03 0.2881E-03 0.7181E-03 0.1659E-02

6 7 8 9 10
POINT -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
WEIGHT 0.3550E-02 0.7042E-02 0.1294E-01 0.2205E-01 0.3481E-01

11 12 13 14 15
POINT -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
WEIGHT 0.5093E-01 0.6905E-01 0.8676E-01 0.1010E+00 0.1090E+00

16 17 18 19 20
POINT 0.1379E+00 0.4138E+00 0.6897E+00 0.9655E+00 0.1241E+01
WEIGHT 0.1090E+00 0.1010E+00 0.8676E-01 0.6905E-01 0.5093E-01

21 22 23 24 25
POINT 0.1517E+01 0.1793E+01 0.2069E+01 0.2345E+01 0.2621E+01
WEIGHT 0.3481E-01 0.2205E-01 0.1294E-01 0.7042E-02 0.3550E-02


26 27 28 29 30
POINT 0.2897E+01 0.3172E+01 0.3448E+01 0.3724E+01 0.4000E+01
WEIGHT 0.1659E-02 0.7181E-03 0.2881E-03 0.1071E-03 0.3692E-04

TOTAL WEIGHT: 1.00000


MEAN : 0.00000
S.D. : 0.99970

The control settings to be used during calibration are followed by settings to be used during the
scoring phase (Phase 3). The EAP method of scoring is requested (EAP option) and, as in the
calibration phase, 30 quadrature points were requested. Since no prior distribution was requested
using the DIST keyword, by default a normal distribution on equally spaced points will be used
(DIST=2 on SCORE). Note that the DIST keyword applies only when EAP scoring has been se-
lected.

>SCORE EAP,NQPTS=30,SMEAN=0.0,SSD=1.0,NAME=EAP,PFQ=5;

PARAMETERS FOR SCORING AND TEST AND ITEM INFORMATION


====================================================

METHOD OF SCORING SUBJECTS: EXPECTATION A POSTERIORI


(EAP; BAYES ESTIMATES)

TYPE OF PRIOR: NORMAL APPROXIMATION

NUMBER OF QUADRATURE POINTS 30


SCORES WRITTEN TO FILE EXAMPL01.SCO

QUADRATURE POINTS AND PRIOR WEIGHTS (PROGRAM-GENERATED NORMAL APPROXIMATION):

1 2 3 4 5
POINT -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
WEIGHT 0.3692E-04 0.1071E-03 0.2881E-03 0.7181E-03 0.1659E-02

6 7 8 9 10
POINT -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
WEIGHT 0.3550E-02 0.7042E-02 0.1294E-01 0.2205E-01 0.3481E-01

11 12 13 14 15
POINT -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
WEIGHT 0.5093E-01 0.6905E-01 0.8676E-01 0.1010E+00 0.1090E+00

16 17 18 19 20
POINT 0.1379E+00 0.4138E+00 0.6897E+00 0.9655E+00 0.1241E+01
WEIGHT 0.1090E+00 0.1010E+00 0.8676E-01 0.6905E-01 0.5093E-01

21 22 23 24 25
POINT 0.1517E+01 0.1793E+01 0.2069E+01 0.2345E+01 0.2621E+01
WEIGHT 0.3481E-01 0.2205E-01 0.1294E-01 0.7042E-02 0.3550E-02

26 27 28 29 30
POINT 0.2897E+01 0.3172E+01 0.3448E+01 0.3724E+01 0.4000E+01
WEIGHT 0.1659E-02 0.7181E-03 0.2881E-03 0.1071E-03 0.3692E-04


TOTAL WEIGHT: 1.00000


MEAN : 0.00000
S.D. : 0.99970

The values assigned to the rescaling constants SMEAN and SSD in the SCORE command are shown:

SET NUMBER : 1
SCORE NAME : EAP
NUMBER OF ITEMS : 20
RESCALE CONSTANT: MEAN = 0.00 S.D. = 1.00

ITEMS : 1 2 3 4 5 6 7 8 9 10
11 12 13 14 15 16 17 18 19 20

0001 0002 0003 0004 0005 0006 0007 0008 0009 0010
0011 0012 0013 0014 0015 0016 0017 0018 0019 0020

Input and output files as requested with the DFNAME keyword on the FILES command and the
PARM and SCORE keywords on the SAVE command are listed:

FILE ASSIGNMENTS AND DISPOSITIONS


=================================

[INPUT FILES]

SUBJECT DATA INPUT FILE EXAMPL01.DAT


SINGLE-SUBJECT DATA
NO CASE WEIGHTS

[OUTPUT FILES]

ITEM PARAMETERS FILE EXAMPL01.PAR


SUBJECT SCALE-SCORE FILE EXAMPL01.SCO

[SCRATCH FILES]

PARSCALE SYSTEM BINARY DATA FILE Exampl01.MFL


TEMPORARY FILE Exampl01.T99
TEMPORARY FILE Exampl01.T98
TEMPORARY FILE Exampl01.T97
TEMPORARY FILE Exampl01.T96

To allow the user to verify that data have been read in correctly from the raw data file, the first
two records from the data file are echoed in the output. The INPUT RESPONSES fields give the
original responses while the RECODED RESPONSES reflect any recoding of the responses. Re-
coding of responses is controlled by the ORIGINAL and MODIFIED keywords on the BLOCK com-
mand.

INPUT AND RECODED RESPONSE OF FIRST AND SECOND OBSERVATIONS

OBSERVATION # 1
GROUP: 1
ID: 0001


INPUT RESPONSES: 4 2 4 4 4 2 3 2 2 2 3 3 4 3 4 3 3 3 3 2
RECODED RESPONSES:4 2 4 4 4 2 3 2 2 2 3 3 4 3 4 3 3 3 3 2

OBSERVATION # 2
GROUP: 1
ID: 0002
INPUT RESPONSES: 1 2 2 2 1 1 2 1 1 2 2 3 2 4 1 2 1 4 3 2
RECODED RESPONSES:1 2 2 2 1 1 2 1 1 2 2 3 2 4 1 2 1 4 3 2

Finally, the number of observations to be used in the analysis is recorded. By default, all obser-
vations will be used. The number of observations to be used can be manipulated using the
SAMPLE or TAKE keywords on the INPUT command.

1000 OBSERVATIONS READ FROM FILE: PSLDAT\EXAMPL01.DAT


1000 OBSERVATIONS WRITTEN TO FILE: Exampl01.MFL

Phase 1 output

The title given in the TITLE command and name assigned to the test in the TEST command in the
command file are echoed in the output file.

EXAMPLE 1: ARTIFICIAL EXAMPLE: MONTE CARLO DATA


GRADED MODEL, NORMAL METRIC: EAP SCALE SCORES

MAINTEST: SCALE1

The master file created during Phase 0 is used as input. Note that the master file exampl01.mfl
may be saved using the MASTER keyword on the SAVE command for use as input in a subsequent
analysis (MFNAME keyword on the FILES command). The keywords TAKE and SAMPLE on the
INPUT command control the number of records read from the raw data file. As the default value
of SAMPLE is 100%, neither keyword was used and all data were used by default.

1000 OBS.(WEIGHTS: 1000.000) WERE READ FROM Exampl01.MFL

Summary item statistics for the 20 items are given next. Since no not-represented (NFNAME on
FILES) or omit key (OFNAME on FILES) was used, no frequencies or percentages are reported un-
der the “NOT PRESENT” or “OMIT” headings. Under the “CATEGORIES” heading, frequencies and
percentages of responses for each of the 4 categories are given item-by-item. Cumulative fre-
quencies and percentages for the categories over all items are given at the end of the table.

Note that, if empty categories are encountered, the affected items must be recoded before
proceeding with the analysis.

BLOCK NO.: 1 NAME: SBLOCK1


---------------------------------------------------------------
ITEM | TOTAL NOT OMIT | CATEGORIES
| PRESENT |
| | 1 2 3 4
---------------------------------------------------------------
0001 | |
FREQ.| 1000 0 0| 194 303 313 190
PERC.| 0.0 0.0| 19.4 30.3 31.3 19.0


| |
0002 | |
FREQ.| 1000 0 0| 204 284 310 202
PERC.| 0.0 0.0| 20.4 28.4 31.0 20.2

0020 | |
FREQ.| 1000 0 0| 305 211 212 272
PERC.| 0.0 0.0| 30.5 21.1 21.2 27.2
| |
---------------------------------------------------------------
CUMMUL.| |
FREQ.| | 4844 5186 5204 4766
PERC.| | 24.2 25.9 26.0 23.8
---------------------------------------------------------------

Item means, initial slope estimates, and Pearson and polyserial item-test correlations are shown
in the next table.

Pearson

The sample product-moment correlation of the test score,

    t_i = \sum_{j=1}^{J} s_{ij},

and the m-category polytomous item score, s_{ij} = 1, 2, \ldots, m, is the point-polyserial
correlation r_{PP,j}, where

    r_{PP,j} = \frac{\sum_{i=1}^{n} t_i s_{ij} - n\,\bar{t}\,\bar{s}_j}
               {\sqrt{\left(\sum_{i=1}^{n} t_i^2 - n\bar{t}^2\right)
                      \left(\sum_{i=1}^{n} s_{ij}^2 - n\bar{s}_j^2\right)}},

where n is the sample size, \bar{t} is the mean test score, and \bar{s}_j the mean item score.
In this example n = 1000. For item 1,

    \sum_i s_{i1} = (1 \times 194) + (2 \times 303) + (3 \times 313) + (4 \times 190) = 2499,

so that

    \bar{s}_1 = \frac{\sum_i s_{i1}}{n} = \frac{2499}{1000} = 2.499.

Also

    \sum_i s_{i1}^2 = (1^2 \times 194) + (2^2 \times 303) + (3^2 \times 313)
                      + (4^2 \times 190) = 7263,

so that

    S.D.(\text{item 1}) = \sqrt{\frac{7263 - (1000 \times 2.499^2)}{1000}} = 1.009.

Polyserial correlation

The polyserial correlation r_{P,j} can be expressed in terms of the point-polyserial
correlation as

    r_{P,j} = \frac{r_{PP,j}\,\sigma_j}{\sum_{k=1}^{m-1} h(z_{jk})},

where

• z_{jk} is the normal deviate corresponding to the cumulative proportion p_{jk} of the first k
  response categories of item j (for item 1, for example, the cumulative proportions are 0.194,
  0.497, and 0.810 for categories 1, 2, and 3), \sigma_j is the standard deviation of the item
  scores for item j (1.009 for item 1), and r_{PP,j} is the point-polyserial correlation;
• h(z_{jk}) is the ordinate of the normal distribution at the point z_{jk}; that is,

    h(z_{jk}) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{1}{2} z_{jk}^2\right).
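
As a check on these formulas, the following sketch (illustrative only) reproduces the Phase 1
values for item 1 from the category frequencies and the rounded point-polyserial value quoted
above:

from statistics import NormalDist

freqs = [194, 303, 313, 190]              # item 1 category frequencies
n = sum(freqs)
mean = sum((k + 1) * f for k, f in enumerate(freqs)) / n       # 2.499
sd = (sum((k + 1) ** 2 * f for k, f in enumerate(freqs)) / n
      - mean ** 2) ** 0.5                                      # 1.009

nd = NormalDist()
cum = [sum(freqs[:k + 1]) / n for k in range(3)]   # 0.194, 0.497, 0.810
h = [nd.pdf(nd.inv_cdf(p)) for p in cum]           # ordinates at the z_jk

r_pp = 0.778                     # point-polyserial from the Phase 1 table
r_p = r_pp * sd / sum(h)         # 0.831 here; the table prints 0.830,
print(round(r_p, 3))             # the difference being rounding in r_pp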

Initial slopes and location

The polyserial correlation estimates the item factor loading, \alpha_j, say. If the arbitrary
scale of the item latent variable, y_j, is chosen so that the variance of y_j equals 1, then

    y_j = \alpha_j(\theta - b_{jk}) + \varepsilon_j,

where \theta is the factor score with mean 0 and variance 1, and the error, \varepsilon_j, has
mean 0 and variance 1 - r_{P,j}^2.

For purposes of MML parameter estimation in IRT, it is convenient to rescale the item latent
variable so that the error variance equals 1. The factor loading then becomes the item slope,

    a_j = \frac{r_{P,j}}{\sqrt{1 - r_{P,j}^2}}.
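
For item 1, for example, the polyserial value 0.830 from the Phase 1 table gives

    a_1 = \frac{0.830}{\sqrt{1 - 0.830^2}} = 1.488,

in agreement with the INITIAL SLOPE entry for that item.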


This provisional estimate of the slope is then used as the starting value in the iterative EM solu-
tions of the marginal maximum likelihood equations for estimating the parameters of the poly-
tomous item response models. The initial locations shown in the last column of the table are the
averages of the category thresholds for each item.

Initial item-category threshold parameters

Item-category threshold parameters can be calculated once the polyserial coefficients have been
obtained. The expression for the threshold parameter in terms of the cumulative category
proportions and the biserial correlation coefficient (Lord & Novick, 1968) is

    b_{jk} = \frac{z_{jk}}{r_{B,j}},

with r_{B,j} the biserial correlation for item j and z_{jk} the z score that cuts off the
proportion p_{jk} of the cases to item j in a unit-normal distribution; that is,

    p_{jk} = \frac{n_{jk}}{\sum_{v=1}^{m} n_{jv}},

where n_{jk} is the frequency of the categorical response for item j and category k. These
provisional thresholds of the categories serve as starting values in MML estimation of the
corresponding item parameters. For the rating-scale model, whether or not all items have the
same thresholds, the category proportions are computed from frequencies accumulated over all
items; i.e.,

    p_k = \frac{\sum_{j=1}^{n} n_{jk}}{\sum_{j=1}^{n} \sum_{k=1}^{m} n_{jk}}.

In Muraki’s (1990) formulation of the rating-scale model, the category threshold parameter,
c_k, is expressed as a deviation from the item threshold parameter, b_j; that is,

    y_j = \alpha(\theta - b_j + c_k) + \varepsilon_j

under the constraint that \sum_{k=1}^{m-1} c_k = 0.


In the context of the rating-scale model, b_j is referred to as a “location” parameter. The
INITIAL LOCATION column provides the average of the category thresholds for each item.
---------------------------------------------------------------------------
BLOCK | RESPONSE TOTAL SCORE | PEARSON & | INITIAL INITIAL
ITEM | MEAN MEAN | POLYSERIAL | SLOPE LOCATION
| S.D.* S.D.* | CORRELATION |
---------------------------------------------------------------------------
SBLOCK1 | | |
1 0001 | 2.499 49.892 | 0.778 | 1.488 -0.017
| 1.009* 14.754* | 0.830 |
2 0002 | 2.510 49.892 | 0.797 | 1.628 -0.036
| 1.030* 14.754* | 0.852 |
3 0003 | 2.481 49.892 | 0.785 | 1.545 0.013
| 1.031* 14.754* | 0.839 |
4 0004 | 2.515 49.892 | 0.805 | 1.695 -0.053
| 1.037* 14.754* | 0.861 |
5 0005 | 2.511 49.892 | 0.811 | 1.739 -0.038
| 1.032* 14.754* | 0.867 |
6 0006 | 2.137 49.892 | 0.728 | 1.293 0.837
| 1.037* 14.754* | 0.791 |
7 0007 | 2.118 49.892 | 0.735 | 1.336 0.855
| 1.033* 14.754* | 0.801 |
8 0008 | 2.144 49.892 | 0.754 | 1.426 0.758
| 1.029* 14.754* | 0.819 |
9 0009 | 2.136 49.892 | 0.736 | 1.329 0.830
| 1.029* 14.754* | 0.799 |
10 0010 | 2.128 49.892 | 0.730 | 1.293 0.882
| 1.002* 14.754* | 0.791 |
11 0011 | 2.870 49.892 | 0.645 | 0.985 -1.168
| 1.041* 14.754* | 0.702 |
12 0012 | 2.874 49.892 | 0.655 | 1.029 -1.094
| 1.071* 14.754* | 0.717 |
13 0013 | 2.874 49.892 | 0.690 | 1.144 -1.017
| 1.053* 14.754* | 0.753 |
14 0014 | 2.831 49.892 | 0.673 | 1.072 -0.953
| 1.057* 14.754* | 0.731 |
15 0015 | 2.847 49.892 | 0.679 | 1.114 -0.938
| 1.094* 14.754* | 0.744 |
16 0016 | 2.492 49.892 | 0.590 | 0.839 0.010
| 1.161* 14.754* | 0.643 |
17 0017 | 2.541 49.892 | 0.548 | 0.738 -0.173
| 1.125* 14.754* | 0.594 |
18 0018 | 2.463 49.892 | 0.589 | 0.834 0.102
| 1.152* 14.754* | 0.641 |
19 0019 | 2.470 49.892 | 0.573 | 0.798 0.085
| 1.160* 14.754* | 0.624 |
20 0020 | 2.451 49.892 | 0.583 | 0.830 0.048
| 1.184* 14.754* | 0.639 |
---------------------------------------------------------------------------
CATEGORY | | MEAN | S.D. | PARAMETER
1 | | 36.116 | 10.656 | 0.927
2 | | 46.091 | 11.156 | 0.002
3 | | 54.107 | 11.165 | -0.930
4 | | 63.427 | 10.739 | 0.000
----------------------------------------------------------------------------


At the end of this table, descriptive statistics for the raw total scores of examinees who re-
sponded in each of the 4 categories are given. The highest average total score of 63.427 was for
respondents who responded in the 4th category.

Phase 2 output

An MML approach is used for estimation, and either a normal or empirical latent distribution
with mean 0 and standard deviation 1 is assumed. The type of distribution used is controlled by
the DIST keyword on the CALIB command. By default, a normal distribution with equally spaced
points is used and, for analyses where the LENGTH keyword on the INPUT command is set to a
value less than or equal to 50, 10 quadrature points will be used.

Because of the potentially wide spacing of category boundary parameters on the latent dimen-
sion, it is advisable to use a greater number of quadrature points than in BILOG-MG. In this ex-
ample, the number of quadrature points was set to 30 (NQPT on the CALIB command).

The EM algorithm is used in the solution of the maximum likelihood equations for parameters,
starting from the initial values described in the Phase 1 output. At each iteration, the -2 ln L is
given, along with information on the parameter for which the largest change between cycles was
observed. The number of EM cycles is controlled by the CYCLE keyword on the CALIB command,
and the convergence criterion may be set using the CRIT keyword on the same command. By de-
fault, 10 EM cycles would be performed when LENGTH ≤ 50 on the INPUT command. In this ex-
ample, 25 EM cycles with a maximum of 2 inner EM iterations for the item and category pa-
rameter estimation were specified. The default convergence criterion is 0.001. For this example,
it was set to 0.005.

[E-M CYCLES] GRADED RESPONSE MODEL

CATEGORY AND ITEM PARAMETERS AFTER CYCLE 0

LARGEST CHANGE= 0.000

-2 LOG LIKELIHOOD = 46371.421

CATEGORY AND ITEM PARAMETERS AFTER CYCLE 1

LARGEST CHANGE= 0.636 ( -1.168-> -0.532) at Location of Item: 11 0011

-2 LOG LIKELIHOOD = 44229.018

CATEGORY AND ITEM PARAMETERS AFTER CYCLE 2

LARGEST CHANGE= 0.033 ( 0.989-> 1.022) at Slope of Item: 13 0013

-2 LOG LIKELIHOOD = 44224.943

The EM algorithm converged after 3 cycles were completed. After reaching either the maximum
number of EM cycles or convergence, the program will perform the Newton-Gauss (Fisher
scoring) cycles requested through the NEWTON keyword on the CALIB command. In this example,
NEWTON was set to 5. The information matrix for all item parameters is approximated during
each Newton step and then used at convergence to provide large-sample standard errors of
estimation for the item parameter estimates.

[NEWTON CYCLES] GRADED RESPONSE MODEL

CATEGORY AND ITEM PARAMETERS AFTER CYCLE 0

LARGEST CHANGE= 0.000


-2 LOG LIKELIHOOD = 44224.833

CATEGORY AND ITEM PARAMETERS AFTER CYCLE 1

LARGEST CHANGE= 0.004 ( -0.536-> -0.533) at Location of Item: 11 0011

The Newton cycles converged after 2 iterations. As all items were assigned to the same BLOCK,
only one table is printed to the output file.

At the top of the table, the estimated category parameters are given. For each m-category item,
there are m - 1 category threshold parameters with

    b_{j1} \le b_{j2} \le \cdots \le b_{j,m-1}.

For a polytomous item response model, the discriminating power of a specific categorical
response depends on the width of the adjacent category thresholds as well as a slope parameter.
Because of this property, the simultaneous estimation of the slope parameter and all m_j
category parameters is not obtainable. If the model includes the slope parameter for each item
j, as in this example, the location of the category parameters must be fixed. The CADJUST
keyword on the BLOCK command was set to 0, and thus the mean of the category parameters is 0.

For each item, the slope and location parameters, along with corresponding standard errors, are
given. All guessing parameters are zero for this model.

ITEM BLOCK 1 SBLOCK1

CATEGORY PARAMETER : 1.024 0.005 -1.030


S.E. : 0.011 0.009 0.011

+------+-----+---------+---------+---------+---------+---------+---------+
| ITEM |BLOCK| SLOPE | S.E. |LOCATION | S.E. |GUESSING | S.E. |
+======+=====+=========+=========+=========+=========+=========+=========+
| 0001 | 1 | 1.486 | 0.063 | 0.006 | 0.042 | 0.000 | 0.000 |
| 0002 | 1 | 1.526 | 0.067 | -0.012 | 0.040 | 0.000 | 0.000 |
| 0003 | 1 | 1.472 | 0.065 | 0.022 | 0.041 | 0.000 | 0.000 |

[Similar output omitted]

| 0019 | 1 | 0.699 | 0.030 | 0.048 | 0.060 | 0.000 | 0.000 |


| 0020 | 1 | 0.665 | 0.029 | 0.085 | 0.062 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+


The average parameter estimates over all 20 items are given next. If the items are regarded as
random samples from a real or hypothetical universe, these quantities estimate the means and
standard deviations of the parameters. They could serve as item parameter priors in future item
calibrations in this universe.

SUMMARY STATISTICS OF PARAMETER ESTIMATES

+----------+---------+---------+----+
|PARAMETER | MEAN | STN DEV | N |
+==========+=========+=========+====+
|SLOPE | 1.111| 0.317| 20|
|LOG(SLOPE)| 0.065| 0.296| 20|
|THRESHOLD | 0.003| 0.370| 20|
|GUESSING | 0.000| 0.000| 0|
+----------+---------+---------+----+

The estimated latent distribution is given next. This distribution is the sum of the posterior
distributions of θ for all respondents in the sample. It is represented here as point masses,
scaled to sum to 1.0, at 30 equally spaced points on the θ dimension. If the population
distribution is normal and the test is sufficiently informative over the range of θ, the
posterior distributions of the individual respondents, and with them the latent distribution,
will approach normality.

1 2 3 4 5
POINT -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
WEIGHT 0.6912E-04 0.1967E-03 0.5110E-03 0.1201E-02 0.2420E-02

6 7 8 9 10
POINT -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
WEIGHT 0.4662E-02 0.7645E-02 0.1189E-01 0.2005E-01 0.3585E-01

11 12 13 14 15
POINT -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
WEIGHT 0.5568E-01 0.7094E-01 0.8078E-01 0.9708E-01 0.1104E+00

16 17 18 19 20
POINT 0.1379E+00 0.4138E+00 0.6897E+00 0.9655E+00 0.1241E+01
WEIGHT 0.1086E+00 0.9806E-01 0.8301E-01 0.6999E-01 0.5416E-01

21 22 23 24 25
POINT 0.1517E+01 0.1793E+01 0.2069E+01 0.2345E+01 0.2621E+01
WEIGHT 0.3797E-01 0.2403E-01 0.1328E-01 0.6619E-02 0.2962E-02

26 27 28 29 30
POINT 0.2897E+01 0.3172E+01 0.3448E+01 0.3724E+01 0.4000E+01
WEIGHT 0.1197E-02 0.4451E-03 0.1547E-03 0.5062E-04 0.1563E-04

TOTAL WEIGHT: 1.00000


MEAN : 0.00000
S.D. : 0.99970

The goodness-of-fit of the polytomous item response model can be tested item by item.
Summation of the item fit statistics can also be used to assess the goodness-of-fit of the test
as a whole. The fit statistics are useful in evaluating the fit of models to the same response
data when the models are nested in their parameters.

Respondents are assigned to H intervals on the θ-continuum. The number of intervals is set
using the ITEMFIT keyword on the CALIB command. The expected a posteriori (EAP) score of
each respondent is used for assigning respondents to the H intervals. The observed frequency
r_{hjk} of the k-th category response to item j in interval h, and N_{hj}, the number of
respondents assigned to item j in the h-th interval, are computed. The estimated θs are
rescaled so that the variance of the sample distribution equals that of the latent distribution
on which the MML estimation of the parameters is based.

Thus an H \times m_j contingency table is obtained for each item j. In order to avoid expected
values less than 5, neighboring intervals and/or categories may be merged. For each interval,
the interval mean, \bar{\theta}_h, and the value of the fitted response function,
P_{jk}(\bar{\theta}_h), are computed.

Finally, a likelihood-ratio χ²-statistic for each item is computed as

    G_j^2 = 2 \sum_{h=1}^{H_j} \sum_{k=1}^{m_j} r_{hjk}
            \ln \frac{r_{hjk}}{N_{hj} P_{jk}(\bar{\theta}_h)},

where H_j is the number of intervals left after neighboring intervals are merged. The degrees
of freedom is \sum_{h=1}^{H_j} (m_j^* - 1), where m_j^* is the number of categories left after
merging.

The likelihood-ratio χ²-statistic for the test as a whole is simply the sum of the separate
χ²-statistics; its degrees of freedom is likewise the sum of the degrees of freedom for the
individual items.
for each item.

ITEM FIT STATISTICS

-----------------------------------------------
| BLOCK | ITEM | CHI-SQUARE | D.F. | PROB. |
-----------------------------------------------
| SBLOCK1 | 0001 | 25.00714 | 20. | 0.201 |
| | 0002 | 23.18082 | 20. | 0.280 |
| | 0003 | 25.66873 | 20. | 0.177 |
| | 0004 | 31.56813 | 19. | 0.035 |
| | 0005 | 19.88483 | 19. | 0.339 |
| | 0006 | 13.51922 | 22. | 0.918 |

| | 0019 | 12.51549 | 25. | 0.982 |
| | 0020 | 25.25502 | 25. | 0.448 |
-----------------------------------------------
| TOTAL | | 492.43930 | 442. | 0.049 |
-----------------------------------------------


The null hypothesis tested here is that there is no difference between the expected and
observed frequencies. A significant χ²-statistic indicates that the item parameters differ
across the score groups and that the assumed model is not appropriate for the data. In this
case, no item showed poor fit to the assumed model.

Phase 3 output

The first information given in the output from the scoring phase concerns the scoring function
used for scaling. The default function is STANDARD, and thus the standard scoring function
(1.0, 2.0, ...) will be used even though a different scoring function may have been used for
calibration. The scoring function may also be set to CALIBRATION (SCORING keyword on the
SCORE command) to use the calibration scoring function specified on the BLOCK command
instead. Note that the scoring function applies only to the partial credit model.

SCORING FUNCTION FOR SCALING

BLOCK: 1 SBLOCK1

1 1.000
2 2.000
3 3.000
4 4.000

Bayes estimates are computed for each examinee with respect to his or her group's latent
distribution (controlled by the EAP option on the SCORE command used here). A discrete
distribution on a finite number of points (see below) is used as the prior. The user may select
the number of points and the type of prior using the NQPT and DIST keywords on the SCORE
command.
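
In outline, the EAP estimate and its standard error are the mean and standard deviation of the
discrete posterior formed from the prior weights and the response-pattern likelihood. A sketch
with made-up likelihood values:

def eap(points, prior, likelihood):
    post = [a * l for a, l in zip(prior, likelihood)]
    total = sum(post)
    post = [p / total for p in post]                 # normalized posterior
    mean = sum(x * p for x, p in zip(points, post))  # EAP estimate
    var = sum((x - mean) ** 2 * p for x, p in zip(points, post))
    return mean, var ** 0.5                          # estimate, posterior S.D.

points = [-2.0, -1.0, 0.0, 1.0, 2.0]
prior = [0.054, 0.244, 0.403, 0.244, 0.054]          # normal-shaped weights
like = [0.002, 0.050, 0.300, 0.500, 0.148]           # hypothetical L(X_k)
print(eap(points, prior, like))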

[EAP SUBJECT ESTIMATION]

QUADRATURE POINTS AND PRIOR WEIGHTS:

1 2 3 4 5
POINT -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
WEIGHT 0.3692E-04 0.1071E-03 0.2881E-03 0.7181E-03 0.1659E-02

6 7 8 9 10
POINT -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
WEIGHT 0.3550E-02 0.7042E-02 0.1294E-01 0.2205E-01 0.3481E-01

11 12 13 14 15
POINT -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
WEIGHT 0.5093E-01 0.6905E-01 0.8676E-01 0.1010E+00 0.1090E+00

16 17 18 19 20
POINT 0.1379E+00 0.4138E+00 0.6897E+00 0.9655E+00 0.1241E+01
WEIGHT 0.1090E+00 0.1010E+00 0.8676E-01 0.6905E-01 0.5093E-01


21 22 23 24 25
POINT 0.1517E+01 0.1793E+01 0.2069E+01 0.2345E+01 0.2621E+01
WEIGHT 0.3481E-01 0.2205E-01 0.1294E-01 0.7042E-02 0.3550E-02

26 27 28 29 30
POINT 0.2897E+01 0.3172E+01 0.3448E+01 0.3724E+01 0.4000E+01
WEIGHT 0.1659E-02 0.7181E-03 0.2881E-03 0.1071E-03 0.3692E-04

MEANS AND STANDARD DEVIATIONS OF ABILITY DISTRIBUTIONS

SCORE MEAN STANDARD TOTAL


NAME DEVIATION FREQUENCIES
---------------------------------------------
EAP 0.000 0.985 1000.00
---------------------------------------------

In this example, the keywords SMEAN and SSD were set to 0 and 1 respectively on the SCORE
command. As a result, the following output reflects the rescaling constants (0.000 and 1.015)
used in this particular case.

RESCALING DONE WITH RESPECT TO USER SUPPLIED LINEAR TRANSFORMATION

SCORE LOCATION SCALING TOTAL


NAME CONSTANT CONSTANT FREQUENCIES
---------------------------------------------
EAP 0.000 1.015 1000.00
---------------------------------------------

Scores are saved to an external file (SCORE keyword on the SAVE command), but the first three
score records are printed to the output file for checking purposes. When EAP is used for
scoring, the S.E. column contains the posterior standard deviation.

SUBJECT IDENTIFICATION WEIGHT/FREQUENCY


SCORE NAME GROUP WEIGHT MEAN CATEGORY ATTEMPTS ABILITY S.E.
----------------------------------------------------------------------------
.447 | 1 GROUP 01 1.00
1 EAP 1 | 1.00 3.00 1.00 0.6435 0.2193
-----------------------------------------------------------------------------
-.934 | 2 GROUP 01 1.00
1 EAP 1 | 1.00 1.95 1.00 -0.7442 0.2164
-------------------------------------------------------------------------
-.564 | 3 GROUP 01 1.00
1 EAP 1 | 1.00 2.10 1.00 -0.4392 0.2115
-----------------------------------------------------------------------------

MEANS AND STANDARD DEVIATIONS OF ABILITY DISTRIBUTIONS

SCORE MEAN STANDARD TOTAL


NAME DEVIATION FREQUENCIES
---------------------------------------------
EAP 0.000 1.000 1000.00
---------------------------------------------

When EAP is selected, an estimate of the population distribution of ability in the form of a dis-
crete distribution of a finite number of points is obtained by accumulating the posterior densities
over the subjects at each quadrature point. These sums are then normalized to obtain the esti-
mated probabilities at the points. Improved estimates of the latent distribution may be obtained
after one more iteration of the solution.

The program also computes the mean and standard deviation for the estimated latent distribution.
Sheppard’s correction for coarse grouping is used in the calculation of the standard deviation.
The EAP estimate is the mean of the posterior distribution while the standard error is the stan-
dard deviation of the posterior distribution. Posterior weights are only given when EAP is used.
Note that the estimated distribution is based on all cases, not just on those used in calibration.
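
The accumulation just described can be sketched in a few lines (illustrative Python, not
PARSCALE code); the Sheppard correction follows the description above, with the spacing between
quadrature points taken as the class width:

import numpy as np

def latent_distribution(points, prior, likelihoods):
    # likelihoods: array of shape (n_subjects, n_points).
    post = prior * likelihoods                     # unnormalized posteriors
    post = post / post.sum(axis=1, keepdims=True)  # one posterior per subject
    weights = post.sum(axis=0)                     # accumulate over subjects
    weights = weights / weights.sum()              # estimated probabilities
    mean = np.sum(points * weights)                # grouped-data mean
    var = np.sum((points - mean) ** 2 * weights)
    var = var - (points[1] - points[0]) ** 2 / 12.0  # Sheppard's correction
    return weights, mean, np.sqrt(var)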

QUADRATURE POINTS AND POSTERIOR WEIGHTS: SCORE SET # 1


1 2 3 4 5
POINT -0.4000E+01 -0.3724E+01 -0.3448E+01 -0.3172E+01 -0.2897E+01
WEIGHT 0.6822E-04 0.1942E-03 0.5048E-03 0.1187E-03 0.2494E-02

6 7 8 9 10
POINT -0.2621E+01 -0.2345E+01 -0.2069E+01 -0.1793E+01 -0.1517E+01
WEIGHT 0.4662E-02 0.7591E-02 0.1180E-01 0.1987E-01 0.3555E-01

11 12 13 14 15
POINT -0.1241E+01 -0.9655E+00 -0.6897E+00 -0.4138E+00 -0.1379E+00
WEIGHT 0.5541E-01 0.7082E-01 0.8069E-01 0.9694E+00 0.1105E+00

16 17 18 19 20
POINT 0.1379E+00 0.4138E+00 0.6897E+00 0.9655E+00 0.1241E+01
WEIGHT 0.1088E+00 0.9832E+00 0.8323E-01 0.7015E-01 0.5431E-01

21 22 23 24 25
POINT 0.1517E+01 0.1793E+01 0.2069E+01 0.2345E+01 0.2621E+01
WEIGHT 0.3809E-01 0.2411E-01 0.1333E-01 0.6645E-02 0.2974E-02

26 27 28 29 30
POINT 0.2897E+01 0.3172E+01 0.3448E+01 0.3724E+01 0.4000E+01
WEIGHT 0.1202E-03 0.4470E-03 0.1554E-04 0.5083E-04 0.1569E-05

TOTAL WEIGHT: 1.00000


MEAN : 0.00012
S.D. : 1.01246

The mean and standard deviation of the latent posterior distribution calculated from posterior
weights at quadrature points are also given. In these calculations, the formulas for the variance of
grouped data are used, with quadrature points as class marks and posterior weights as class fre-
quencies.

11.2 Examinee maximum likelihood scoring from existing parameters

In this example, the item parameter estimates from Section 11.1, saved in the exampl01.par
file, are used to score the simulated examinees by the maximum likelihood method (MLE).
The item parameter file is used as input (IFNAME keyword on the FILES command), and calibra-
tion is suppressed with the NOCALIB option of the CALIB command.

Comparison of the results in files exampl01.ph3 (see Section 11.1, Phase 3 output) and ex-
ampl02.ph3 (not shown here) shows that, when the scores are scaled to match the mean and
standard deviation of the generating distribution, both the EAP and MLE estimates recover the
generating values with good accuracy.

EXAMPL02.PSL - ARTIFICIAL EXAMPLE (MONTE CARLO DATA)


GRADED MODEL - MLE SCALE SCORES
>FILES DFNAME='EXAMPL01.DAT', IFNAME='EXAMPL01.PAR', SAVE;
>SAVE SCORE='EXAMPL02.SCO';
>INPUT NIDCHAR=4, NTOTAL=20, LENGTH=20;
(4A1,10X,20A1)
>TEST1 TNAME=SCALE1, ITEM=(1(1)20), NBLOCK=1;
>BLOCK1 BNAME=SBLOCK1, NITEMS=20, NCAT=4;
>CALIB GRADED, LOGISTIC, SCALE=1.7, NQPTS=30, CYCLES=(100,1,1,1,1),
CRIT=0.005, NOCAL;
>SCORE MLE, SMEAN=0.0, SSD=1.0, NAME=MLE, PFQ=5;

11.3 Calibration and scoring with the generalized partial credit rating-scale
model: collapsing of categories

This example calibrates and scores the data of Section 11.1, assuming the partial credit model
with the standard scoring function. The command file is shown below.

To illustrate the situation where two types of items are involved, the four categories for the sec-
ond ten items are collapsed into two categories, thus making those items effectively binary. Two
blocks are required (each with ten items), and the MODIFIED list in the BLOCK2 command speci-
fies the collapsing.

The standard scoring function assumes that 4 is the highest category, so no response modification
is required in BLOCK1. In BLOCK2, the SCORE keyword is used to supply the scoring function values
for the two collapsed categories. CADJUST is not used with the partial credit model, nor is SCALE
on the CALIB command. Because the data are now less informative, the number of quadrature points
for calibration can be reduced (NQPTS=15 instead of the 30 used previously).

Despite the different model and the partition of the items into two blocks, the estimated trait
scores in exampl03.sco agree well with the estimates from Sections 11.1 and 11.2 after rescaling
in the sample.

EXAMPL03.PSL - ARTIFICIAL EXAMPLE (MONTE CARLO DATA)


GENERALIZED PARTIAL CREDIT MODEL - EAP SCALE SCORES
>FILES DFNAME='EXAMPL01.DAT', SAVE;
>SAVE SCORE='EXAMPL03.SCO';
>INPUT NIDCHAR=4, NTOTAL=20, NTEST=1, LENGTH=20;
(4A1,10X,20A1)
>TEST TNAME='SCALE1', ITEM=(1(1)20), NBLOCK=2;
>BLOCK1 BNAME='SBLOCK1', NITEMS=10, NCAT=4, SCORING=(1,2,3,4);
>BLOCK2 BNAME='SBLOCK2', NITEMS=10, NCAT=4, MODIFIED=(1,1,2,2), SCORE=(1,2);
>CALIB PARTIAL, LOGISTIC, NQPTS=15, CYCLE=(100,1,1,1,1), NEWTON=2,
CRIT=0.01;
>SCORE MLE, SMEAN=0.0, SSD=1.0, NAME='PCR_MLE', PFQ=5;

11.4 Two-group differential item functioning (DIF) analysis with the partial
credit model

This example illustrates differential item functioning (DIF) analysis of multiple category item
responses. The SCORE command is required and thus included in the command file. For the DIF
model, however, no scoring is done and there is no Phase 3 output.

Raw data are read from the file exampl04.dat using the DFNAME keyword on the FILES com-
mand. The data file contains responses to 6 items, as indicated on the INPUT command, where
NTOTAL is set to 6. Each record contains the examinee ID and the sample group code (1 or 2),
then the responses to the 6 items, and finally the generating trait value for the examinee. The
first few lines of the data file are shown below.

0001 1 233113 .43930


0002 1 113111 -.94251
0003 1 313112 -.57257
0004 1 113131 -.59414
0005 1 313131 -.36019

The format statement includes information on three fields in the raw data file. The subject ID
(4A1) and group identification field (1A1) are read first, followed by the 6 item responses (6A1).

One test, 6 items in length, is considered. The MGROUP keyword on the INPUT command requests
a multiple-group analysis for two groups. Note that the MGROUP keyword is used in combination
with the MGROUP command, which must follow directly after the BLOCK command(s).

On the TEST command, a name for the test is provided using the TNAME keyword. The items on
this test are listed using the ITEMS keyword, while the INAMES keyword is used to provide names
for the items. Finally, by setting NBLOCK to 6, it is indicated that 6 BLOCK commands will follow
the TEST command.

In this example, each block contains one item with three categories, originally coded 1, 2, and 3,
as indicated by the NITEMS, NCAT, and ORIGINAL keywords respectively. Because the rating-scale
model is not used here, separate category parameters are estimated for each item, and the REPEAT
keyword indicates that the BLOCK command should be repeated six times.

The second value (1) assigned to the DIF keyword of the MGROUP command requests a DIF analy-
sis of the item threshold parameters. All other values in this keyword are equal to zero, indicating
that only thresholds are allowed to differ between the groups. The GNAME and GCODE keywords
are used to assign names and codes to the two groups. By default, the first group will be used as
the reference group. To change the reference group, the REFERENCE keyword on the MGROUP
command may be used.

A partial credit model with logistic response function is requested through the use of the
PARTIAL and LOGISTIC options on the CALIB command. The default number of quadrature points
is 30. In this case, NQPT is set to 25, because fewer points are needed when the number of items
is small. By setting the CYCLES keyword to 100, a maximum of 100 EM cycles will be per-
formed, followed by two Newton cycles (NEWTON=2). The convergence criterion is somewhat re-
laxed by setting CRIT to 0.01 instead of using the default convergence criterion of 0.001. Finally,
the POSTERIOR option is added to the CALIB command. By default, the posterior distribution is
computed during the E-step as a by-product of the expected proportions, so the expected sample
sizes and expected frequencies of the categorical responses are based on the posterior distribution
from the previous EM cycle. Adding the POSTERIOR option forces the program to compute the
posterior distribution again after the M-step, so that the expected proportions in the E-step are
based on an updated posterior distribution. This option was added for consistency with the
BILOG-MG program in the two-category case.

The command file is as follows.

EXAMPL04.PSL - DIF ANALYSIS USING PARTIAL CREDIT MODEL


ARTIFICAL DATA, TWO SAMPLES (EACH WITH N=500, N=(0,1)), 6 ITEMS
>FILES DFNAME='EXAMPL04.DAT';
>INPUT NIDCHAR=4, MGROUP=2, NTOTAL=6;
(4A1,1X,1A1,1X,6A1)
>TEST TNAME='PARV3E', ITEM=(1,2,3,4,5,6),
INAME=('I001','I002','I003','I004','I005','I006'), NBLOCK=6;
>BLOCK1 REPEAT=6, NIT=1, NCAT=3, ORIGINAL=(1,2,3);
>MGROUP DIF=(0,1,0,0), GNAME=('MALE','FEMALE'), GCODE=('1','2');
>CALIB LOGISTIC, PARTIAL, NQPT=25, CYCLES=(100,1,1,1,1,1), NEWTON=20,
CRIT=0.01, POSTERIOR;
>SCORE ;

Phase 0 output

When the MGROUP keyword and MGROUP command are used, or when multiple TEST/BLOCK commands
are used, additional information is written to the Phase 0 output file.

NUMBER OF SUBGROUPS: 2
FORMAT OF DATA INPUT IS
(4A1,1X,1A1,1X,6A1)

>TEST TNAME=PARV3E, ITEM=(1,2,3,4,5,6),


INAME=('I001','I002','I003','I004','I005','I006'), NBLOCK=6 ;

BLOCK CARD: 1
>BLOCK1 REPEAT=6, NIT=1, NCAT=3, ORIGINAL=(1,2,3) ;

BLOCK CARD: 2 IS COPIED FROM BLOCK 1


BLOCK CARD: 3 IS COPIED FROM BLOCK 1
BLOCK CARD: 4 IS COPIED FROM BLOCK 1
BLOCK CARD: 5 IS COPIED FROM BLOCK 1
BLOCK CARD: 6 IS COPIED FROM BLOCK 1

>MGROUP DIF=(0,1,0,0), GNAME=(MALE,FEMALE), GCODE=('1','2') ;

In the next few lines, the program echoes the information on parameters allowed to be different
between groups as specified with the DIF keyword: in this case, only the thresholds are allowed
to differ between the two groups. The MALE group will be used as reference group.

GROUP PARAMETER FOR SLOPE: NO


GROUP PARAMETER FOR THRESHOLD: YES
GROUP PARAMETER FOR CATEGORY: NO
GROUP PARAMETER FOR GUESSING: NO
REFERENCE GROUP FOR DIF MODEL: 1 MALE

SUBGROUP NAME AND CODE


======================

1 MALE 1
2 FEMALE 2

DIF OR COMMON BLOCK


===================
1 BLOCK DIF BLOCK
2 BLOCK DIF BLOCK
3 BLOCK DIF BLOCK
4 BLOCK DIF BLOCK
5 BLOCK DIF BLOCK
6 BLOCK DIF BLOCK

Phase 1 output

The only difference between the Phase 1 output for a single group analysis and for a multiple-
group analysis is that the summary item statistics are first given by subgroup and then for the to-
tal group. The output for the first item is shown below for all three cases. We see that females
were more likely to respond in category 3 and less likely to respond in category 1 than the males.
Overall, 76% of the total responses were in category 3.

1 SUBGROUP: MALE

BLOCK NO.: 1 NAME: BLOCK


-------------------------------------------------------
ITEM | TOTAL NOT OMIT | CATEGORIES
| PRESENT |
| | 1 2 3
-------------------------------------------------------
I001 | |
FREQ.| 500 0 0| 152 13 335
PERC.| 0.0 0.0| 30.4 2.6 67.0
-------------------------------------------------------

2 SUBGROUP: FEMALE

BLOCK NO.: 1 NAME: BLOCK


-------------------------------------------------------
ITEM | TOTAL NOT OMIT | CATEGORIES
| PRESENT |
| | 1 2 3
-------------------------------------------------------
I001 | |
FREQ.| 500 0 0| 69 6 425
PERC.| 0.0 0.0| 13.8 1.2 85.0
-------------------------------------------------------

TOTAL

BLOCK NO.: 1 NAME: BLOCK


-------------------------------------------------------
ITEM | TOTAL NOT OMIT | CATEGORIES
| PRESENT |
| | 1 2 3
-------------------------------------------------------
I001 | |
FREQ.| 1000 0 0| 221 19 760
PERC.| 0.0 0.0| 22.1 1.9 76.0
-------------------------------------------------------

Item means, initial slope estimates, and Pearson and polyserial item-test correlations are given in
the next table. For a detailed discussion of the measures shown here, refer to the discussion of
the Phase 1 output of Section 11.1.

----------------------------------------------------------------------------
BLOCK | RESPONSE TOTAL SCORE | PEARSON & | INITIAL INITIAL
ITEM | MEAN MEAN | POLYSERIAL | SLOPE LOCATION
| S.D.* S.D.* | CORRELATION |
---------------------------------------------------------------------------
BLOCK | | |
1 I001 | 2.539 13.162 | 0.714 | 1.000 0.000
| 0.831* 3.765* | 0.976 |
----------------------------------------------------------------------------
CATEGORY | SCORING | MEAN | S.D. | PARAMETER
1 | 1.000 | 8.190 | 2.235 | 0.000
2 | 2.000 | 11.263 | 2.899 | -0.155
3 | 3.000 | 14.655 | 2.735 | 1.596
----------------------------------------------------------------------------

Phase 2 output

For the DIF model, a separate prior distribution is used for each group, and each prior dis-
tribution is updated after every estimation cycle, based on the posterior distribution from the pre-
vious cycle.

For the DIF model, it is assumed that the groups have different distributions, with mean $\mu_g$
and standard deviation $\sigma_g$ for group $g$; the distributions are not necessarily normal. These
empirical posterior distributions are estimated simultaneously with the estimation of the item
parameters. To obtain those parameters, the following constraint is imposed for the DIF model:

$$\sum_{j=1}^{J} d_{Rj} = \sum_{j=1}^{J} d_{Fj}$$

This constraint implies that the overall difficulty levels of a test, or of a set of common items
given to both the reference group and the focal group (indicated by subscripts R and F, respectively),
are the same. The item difficulty parameters for the focal group are therefore adjusted, and any
overall difference in test difficulty is attributed to a difference in ability level between the sub-
groups. The ability-level difference among groups can then be estimated from the posterior distri-
butions.

The first difference between the output file discussed here and the Phase 2 output for Section
11.1 concerns the scoring function and step parameters for the multiple blocks. As no scoring
function was specified on the CALIB command, the default scoring function 1, 2 will be used.

Under the partial credit model, the step parameters, also known as the item step difficulties or
category intersections, correspond to the points on the ability scale where two successive item
response category characteristic curves (IRCCCs) intersect. Higher values of the step parameters
indicate steps that are more difficult relative to the other steps within an item. In this example,
where each item has 3 categories, 2 “steps” are needed to move from the first category to the
third: a respondent first moves from category 1 to category 2, and a second step takes the
respondent from category 2 to category 3. The second step parameters of items 1 and 2 (see below)
show that, for the male respondents, moving from category 2 to category 3 is harder for item 2.

The IRCCCs for items 1 and 5 are shown below. Vertical lines were added to indicate the trait
level at which the curves for step 0 and step 1 intersect. The most likely response for a male with
trait level of -2 would be to complete 0 steps in both cases. For a male with trait level of ap-
proximately 1.5, completing the step from category 2 to category 3 would be more likely in the
case of item 5. Although there is little difference between the two graphs, it would appear that
completing the first step is somewhat easier for item 1 than for item 5, while completing the sec-
ond step is easier for item 5. This is in agreement with the second step parameters for these
items: 1.769 for item 1 and 1.517 for item 5.

MULTIPLE GROUP MODEL [DIF (TREND) MODEL]


---------------------------------------

[GROUP: 1 MALE ]

ITEM BLOCK 1 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.743 1.743
S.E. : 0.000 0.168 0.163

ITEM BLOCK 2 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.733 1.733
S.E. : 0.000 0.155 0.156

ITEM BLOCK 3 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.733 1.733
S.E. : 0.000 0.146 0.140

ITEM BLOCK 4 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.755 1.755
S.E. : 0.000 0.149 0.147

ITEM BLOCK 5 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.658 1.658
S.E. : 0.000 0.171 0.154

ITEM BLOCK 6 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.974 1.974
S.E. : 0.000 0.177 0.186

The step parameter information is followed by the item parameter estimates for the male group.
Standard errors are computed from the empirical information matrix in the final Newton cycle.

+------+-----+---------+---------+---------+---------+---------+---------+
| ITEM |BLOCK| SLOPE | S.E. |LOCATION | S.E. |GUESSING | S.E. |
+======+=====+=========+=========+=========+=========+=========+=========+
| I001 | 1 | 0.846 | 0.054 | -0.590 | 0.070 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I002 | 2 | 0.948 | 0.060 | 0.519 | 0.066 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I003 | 3 | 0.628 | 0.034 | -0.542 | 0.076 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I004 | 4 | 0.615 | 0.034 | 0.544 | 0.077 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I005 | 5 | 0.414 | 0.025 | -0.666 | 0.098 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I006 | 6 | 0.344 | 0.021 | 0.658 | 0.110 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+

Similar information for the female group is given next. Note that the slope for each item is com-
mon across the two groups. This implies that the same item discrimination is assumed over the
groups.

[GROUP: 2 FEMALE ]

ITEM BLOCK 1 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.743 1.743
S.E. : 0.000 0.168 0.163

ITEM BLOCK 2 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.733 1.733
S.E. : 0.000 0.155 0.156

ITEM BLOCK 3 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.773 1.773
S.E. : 0.000 0.146 0.140

ITEM BLOCK 4 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.755 1.755
S.E. : 0.000 0.149 0.147

ITEM BLOCK 5 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.658 1.658
S.E. : 0.000 0.171 0.154

ITEM BLOCK 6 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 -1.974 1.974
S.E. : 0.000 0.177 0.186

+------+-----+---------+---------+---------+---------+---------+---------+
| ITEM |BLOCK| SLOPE | S.E. |LOCATION | S.E. |GUESSING | S.E. |
+======+=====+=========+=========+=========+=========+=========+=========+
| I001 | 1 | 0.846 | 0.054 | -0.615 | 0.085 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I002 | 2 | 0.948 | 0.060 | 0.644 | 0.057 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I003 | 3 | 0.628 | 0.034 | 0.010 | 0.075 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I004 | 4 | 0.615 | 0.034 | -0.348 | 0.084 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I005 | 5 | 0.414 | 0.025 | -0.645 | 0.118 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| I006 | 6 | 0.344 | 0.021 | 0.877 | 0.098 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+

DIF contrasts are given next. In the table below, the CONTRAST column gives the difference in
item locations between the two groups, with the associated standard error in parentheses. The STD
column contains standardized contrasts, obtained by dividing each contrast by its standard error.
The probability that a standard normal variate exceeds the absolute value of the standardized
contrast is also given. This is a one-sided test.

CONTRAST OF ITEM LOCATIONS:


GROUP 2: FEMALE MINUS REFERENCE GROUP 1: MALE
+---------------------------------+
|ITEM |BLOCK| CONTRAST | STD |
| | | (S.E.) | (PROB. ) |
+=====+=====+==========+==========+
|I001 | 1 | -0.025 | -0.230 |
| | |( 0.110)|( 0.409)|
| | | | |
|I002 | 2 | 0.125 | 1.433 |
| | |( 0.087)|( 0.076)|
| | | | |
|I003 | 3 | 0.552 | 5.176 |
| | |( 0.107)|( 0.000)|
| | | | |
|I004 | 4 | -0.892 | -7.798 |
| | |( 0.114)|( 0.000)|
| | | | |
|I005 | 5 | 0.021 | 0.138 |
| | |( 0.153)|( 0.445)|
| | | | |
|I006 | 6 | 0.219 | 1.494 |
| | |( 0.147)|( 0.068)|
+---------------------------------+

χ²-test statistics for the item location contrasts are given in the next section of the output file. In
this case, with only one degree of freedom, χ² = (std. difference)². This is a two-sided test. The
table below summarizes these χ²-statistics and their exceedance probabilities.
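
Both tables can be checked by hand; small discrepancies arise because the program works with
unrounded values. Using the item I002 rows as an example, a Python sketch (assuming SciPy):

from scipy.stats import chi2, norm

contrast, se = 0.125, 0.087       # I002 row of the contrast table
std = contrast / se               # 1.44 (table: 1.433)
p_one_sided = norm.sf(abs(std))   # about 0.075 (table: 0.076)

chisq = std ** 2                  # about 2.06 on 1 d.f. (table: 2.052)
p_two_sided = chi2.sf(chisq, 1)   # about 0.15 (table: 0.148)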

CHI-SQUARES OF ITEM LOCATION CONTRASTS:

+---------------------------------+
|ITEM BLOCK CHI-SQRS D.F. |
| PROB. |
+=====+=====+==========+==========+
|I001 | 1 | 0.053 | 1. |
| | | | 0.803 |
+---------------------------------+
|I002 | 2 | 2.052 | 1. |
| | | | 0.148 |
+---------------------------------+
|I003 | 3 | 26.789 | 1. |
| | | | 0.000 |
+---------------------------------+
|I004 | 4 | 60.814 | 1. |
| | | | 0.000 |
+---------------------------------+
|I005 | 5 | 0.019 | 1. |
| | | | 0.861 |
+---------------------------------+
|I006 | 6 | 2.231 | 1. |
| | | | 0.131 |
+---------------------------------+
|TOTAL| | 91.958 | 6. |
| | | | 0.000 |
+---------------------------------+

When the summary statistics for the two groups are compared, we see that only the standard devia-
tion of the thresholds differs. Recall that for this example, the DIF keyword on the MGROUP com-
mand was used to allow only the threshold parameters to differ between the groups. Overall, no
large difference between the groups across items is observed.

SUMMARY STATISTICS OF PARAMETER ESTIMATES


1 GROUP NAME: MALE
+----------+---------+---------+----+
|PARAMETER | MEAN | STN DEV | N |
+==========+=========+=========+====+
|SLOPE | 0.633| 0.235| 6|
|LOG(SLOPE)| -0.520| 0.394| 6|
|THRESHOLD | -0.013| 0.645| 6|
|GUESSING | 0.000| 0.000| 0|
+----------+---------+---------+----+

2 GROUP NAME: FEMALE


+----------+---------+---------+----+
|PARAMETER | MEAN | STN DEV | N |
+==========+=========+=========+====+
|SLOPE | 0.633| 0.235| 6|
|LOG(SLOPE)| -0.520| 0.394| 6|
|THRESHOLD | -0.013| 0.648| 6|
|GUESSING | 0.000| 0.000| 0|
+----------+---------+---------+----+

The final output is the estimated latent distributions by group. The origin and unit of the scale are
set so that the mean and standard deviation of the reference group are 0 and 1 respectively.

A plot of the estimated latent distributions is given below. The solid line represents the distribu-
tion for the male group. If there is appreciable DIF, the latent distributions do not represent the
same latent variable and no meaningful comparison of the two distributions is possible. If there is
no DIF, significant differences between the latent distributions represent real differences between
the populations sampled.

QUADRATURE POINTS AND POSTERIOR WEIGHTS:


GROUP 1 GROUP NAME: MALE

1 2 3 4 5
POINT -0.4000E+01 -0.3667E+01 -0.3333E+01 -0.3000E+01 -0.2667E+01
WEIGHT 0.5988E-04 0.2137E-03 0.6808E-03 0.1934E-02 0.4887E-02

6 7 8 9 10
POINT -0.2333E+01 -0.2000E+01 -0.1667E+01 -0.1333E+01 -0.1000E+01
WEIGHT 0.1096E-01 0.2172E-01 0.3790E-01 0.5826E-01 0.8000E-01

11 12 13 14 15
POINT -0.6667E+00 -0.3333E+00 0.3331E-15 0.3333E+00 0.6667E+00
WEIGHT 0.1009E+00 0.1178E+00 0.1249E+00 0.1190E+00 0.1034E+00

16 17 18 19 20
POINT 0.1000E+01 0.1333E+01 0.1667E+01 0.2000E+01 0.2333E+01
WEIGHT 0.8257E-01 0.5917E-01 0.3742E-01 0.2086E-01 0.1029E-01

21 22 23 24 25
POINT 0.2667E+01 0.3000E+01 0.3333E+01 0.3667E+01 0.4000E+01
WEIGHT 0.4516E-02 0.1766E-02 0.6167E-03 0.1924E-03 0.5368E-04

TOTAL WEIGHT: 1.00000


MEAN : 0.00000
S.D. : 0.99974

QUADRATURE POINTS AND POSTERIOR WEIGHTS:

GROUP 2 GROUP NAME: FEMALE

1 2 3 4 5
POINT -0.4000E+01 -0.3667E+01 -0.3333E+01 -0.3000E+01 -0.2667E+01
WEIGHT 0.1485E-04 0.5381E-04 0.1748E-03 0.5093E-03 0.1331E-02

6 7 8 9 10
POINT -0.2333E+01 -0.2000E+01 -0.1667E+01 -0.1333E+01 -0.1000E+01
WEIGHT 0.3120E-02 0.6569E-02 0.1248E-01 0.2175E-01 0.3608E-01

11 12 13 14 15
POINT -0.6667E+00 -0.3333E+00 0.3331E-15 0.3333E+00 0.6667E+00
WEIGHT 0.5834E-01 0.8712E-01 0.1130E+00 0.1320E+00 0.1437E+00

16 17 18 19 20
POINT 0.1000E+01 0.1333E+01 0.1667E+01 0.2000E+01 0.2333E+01
WEIGHT 0.1360E+00 0.1059E+00 0.6927E-01 0.3922E-01 0.1955E-01

21 22 23 24 25
POINT 0.2667E+01 0.3000E+01 0.3333E+01 0.3667E+01 0.4000E+01
WEIGHT 0.8653E-02 0.3410E-02 0.1199E-02 0.3764E-03 0.1056E-04

TOTAL WEIGHT: 1.00000


MEAN : 0.00000
S.D. : 0.99974

11.5 A test with 26 multiple-choice items and one 4-category item: three-
parameter logistic and generalized partial credit model

This example illustrates a test consisting primarily of machine-scorable multiple-choice items,
but also containing one open-ended item scored in three categories. The open-ended item appears
in the middle of the test.

The item responses are from several test forms, and items not represented on a particular form
are assigned the not-presented code 9. The not-presented key appears in the exampl05.npc file.
The codes 0 and 1 for incorrect and correct responses to the multiple-choice items must be re-
coded 1 and 2, respectively, for the PARSCALE analysis. This is accomplished through use of
the ORIGINAL and MODIFIED keywords on the BLOCK commands.

The first few lines of the file exampl05.dat are shown below. The data and command files can
be found in the examples folder of the PARSCALE installation.

1 110000000000199999999999999
2 110000000011199999999999999
3 011001000001199999999999999
4 110000100000199999999999999
5 101011010011199999999999999

The contents of exampl05.npc are shown below. This file is specified in the syntax by the NFNAME
keyword on the FILES command.

KEY 999999999999999999999999999

The first information read according to the format statement shown below is the case ID, which
is read in the format “3A1”. The NIDCHAR keyword is set to 3 to indicate that the case ID is six
characters in length. The response to the first item is in column 5, and the format (“27A1”) that
follows after skipping of the fourth column using the “X” operator indicates that 27 items are
read from each line.
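
As an illustration only (PARSCALE reads the file itself), the format (3A1,1X,27A1) carves up a
record of exampl05.dat as follows in Python:

line = "  1 110000000000199999999999999"  # first record of exampl05.dat

case_id = line[0:3]      # 3A1 : three ID characters (NIDCHAR=3)
                         # 1X  : column 4 is skipped
responses = line[4:31]   # 27A1: one response character per item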

The 3-parameter logistic model (3PL) is assumed for the multiple-choice items, and the partial
credit model is assumed for the open-ended item. Because the parameters of the 3PL model dif-
fer from one item to another, each item must be assigned to a separate block. This is facilitated
by the REPEAT keyword of the BLOCK command, which indicates the number of successive items
that have the same block specifications. In the present example, the first block specification ap-
plies to the first 12 multiple-choice items, the second applies to the open-ended item, and the
third applies to the remaining 14 multiple-choice items. Note also the assignment of separate
block names using the BNAME keyword.

The use of the SPRIOR and GPRIOR options on the CALIB command requests the use of a log-
normal prior distribution and a normal prior distribution on the slope and guessing parameters
respectively.

Bayes (EAP) estimates of the respondents' scale scores are computed and saved; this is requested
with the EAP option on the SCORE command.

EXAMPL05.PSL - A TEST WITH 26 MULTIPLE CHOICE ITEMS AND ONE 4-CATEGORY ITEM
THREE-PARAMETER LOGISTIC AND GENERALIZED PARTIAL CREDIT MODEL
>FILE DFNAME='EXAMPL05.DAT', NFNAME='EXAMPL05.NPC', SAVE ;
>SAVE PARM='EXAMPL05.PAR', SCORE='EXAMPL05.SCO' ;
>INPUT NIDCHAR=3, NTOTAL=27, NTEST=1, LENGTH=27;
(3A1,1X,27A1)
>TEST1 TNAME=SOCSCI, ITEM=(1(1)27), NBLOCK=27 ;
>BLOCKS BNAME=(MC01,MC02,MC03,MC04,MC05,MC06,MC07,MC08,MC09,MC10,MC11,MC12),
NITEMS=1, NCAT=2, ORIGINAL=(0,1), MODIFIED=(1,2),
REPEAT=12, GUESSING=(2,ESTIMATE) ;
>BLOCK BNAME=OE, NITEMS=1, NCAT=3, SCORING=(1,2,3) ;
>BLOCKS BNAME=(MC13,MC14,MC15,MC16,MC17,MC18,MC19,MC20,MC21,MC22,MC23,MC24,
MC25,MC26),
NITEMS=1, NCAT=2, ORIGINAL=(0,1), MODIFIED=(1,2),
REPEAT=14, GUESSING=(2,ESTIMATE) ;
>CALIB PARTIAL, LOGISTIC, NQPTS=15, CYCLE=(50,1,1,1,1), NEWTON=2,
CRIT=0.01, SPRIOR, GPRIOR ;
>SCORE EAP, SMEAN=0.0, SSD=1.0, NAME=SOCSCI ;

11.6 Analysis of three tests containing items with two and three categories:
calculation of combined scores

A partial credit model based on artificial data is discussed in this example. Six items, with either
2 or 3 categories each, are assigned to three subtests. Guessing parameters are estimated for all
of the binary items.

The data file used is exampl06.dat in the examples subfolder of the PARSCALE installation.
The first few lines of the data file are shown below.

0001 113211 -1.98194


0002 222211 .07151
0003 222211 -.06528
0004 211221 -.72716
0005 222211 -.35792
0006 323212 .73036
0007 212212 -.53729
0008 211211 -1.40260
0009 221212 .09829
0010 122111 -.75451

The case identification is given in the first four columns of each line. Responses to the six items
are recorded in columns 6 to 11. At the end of each line, the generating trait value is given. This
value is not used in the analysis. The format statement used to read these data is:

(4A1,1X,6A1)

The items are analyzed in different ways in three subtests (NTEST=3 on INPUT). The LENGTH
keyword on the INPUT command indicates the length of each of the three subtests. The COMBINE
keyword on the INPUT command indicates that 3 COMBINE commands follow the SCORE com-
mand, while the SAVE option indicates that a SAVE command will follow directly after the FILES
command. On the SAVE command, names for external files to which subject scores and combined
scores will be saved are provided.

The first subtest consists of six items analyzed in six distinct blocks. The REPEAT keyword of the
first block indicates that the first three blocks each contain one 3-category item with item-
specific step parameters. The remaining blocks contain multiple-choice items with various guess-
ing parameters. The GPARM keyword supplies guessing parameters that adjust the dichotomous item
response probabilities when the GUESSING keyword is present. These values serve as initial
parameter values and default to zero. The value (2,ESTIMATE) assigned to the GUESSING
keywords indicates that the second category is the correct response and that a guessing
parameter is to be estimated.

In the second subtest, the first 3 items are analyzed separately. In the third subtest, the last 3
items are analyzed separately. The convergence criterion for the iteration procedure is somewhat
relaxed for this test calibration (from 0.005 to 0.01) to obtain convergence.

Scores for the three subtests are combined in the scoring phase. These scores are saved to the ex-
ternal file exampl06.eap as specified on the SAVE command. They are combined as specified by
the COMBINE keyword in the INPUT command and the COMBINE commands following the last
SCORE command. The WEIGHT keywords on these commands have as values sets of positive frac-
tions summing to 1. These values are used as the weights for the subscale scores. Subscores are
combined linearly. In this example, three different combinations of the scores from the subtests
are requested. These scores are saved to the external file exampl06.cmb.
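
The combination itself is a weighted sum. A sketch in Python (the subtest score values are
hypothetical; the weights are those of the COMBINE commands below):

import numpy as np

subtest_scores = np.array([0.52, -0.31, 0.08])  # hypothetical TEST1-TEST3 scores
weights = {
    "SUM1": (0.50, 0.25, 0.25),
    "SUM2": (1.00, 0.00, 0.00),
    "SUM3": (0.00, 0.50, 0.50),
}
combined = {name: float(np.dot(w, subtest_scores)) for name, w in weights.items()}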

The command file exampl06.psl is shown below.

EXAMPLE 6: PARTIAL CREDIT MODEL


ARTIFICAL DATA, ONE SAMPLE (N=1000, N=(0,1)), 6 ITEMS
>FILES DFNAME='EXAMPL06.DAT', SAVE;
>SAVE SCORE='EXAMPL06.EAP', COMBINE='EXAMPL06.CMB';
>INPUT NIDCHAR=4, NTOT=6, NTEST=3, LENGTH=(6,3,3), COMBINE=3;
(4A1,1X,6A1)
>TEST1 TNAME=TEST1, ITEM=(1,2,3,4,5,6),
INAME=('P011','P012','P013','D011','D012','D013'), NBLOCK=6;
>BLOCK1 REPEAT=3, NIT=1, NCAT=3, ORIGINAL=(1,2,3);
>BLOCK4 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.0, GUESS=(2,ESTIMATE);
>BLOCK5 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.1, GUESS=(2,ESTIMATE);
>BLOCK6 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.3, GUESS=(2,ESTIMATE);
>CALIB LOGISTIC, PARTIAL, NQPT=21, CYCLES=(100,1,1,1,1,1), NEWTON=2,
CRIT=0.005, SCALE=1.7;
>SCORE EAP, NAME=TEST1;
>TEST2 TNAME=TEST2, ITEM=(1,2,3), INAME=('P021','P022','P023'), NBLOCK=3;
>BLOCK1 REPEAT=3, NIT=1, NCAT=3, ORIGINAL=(1,2,3);
>CALIB LOGISTIC, PARTIAL, NQPT=21, CYCLES=(100,1,1,1,1,1), NEWTON=2,
CRIT=0.005, SCALE=1.7;
>SCORE EAP,NAME=TEST2;
>TEST3 TNAME=TEST3, ITEM=(4,5,6), INAME=('D031','D032','D033'), NBLOCK=3;
>BLOCK4 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.0, GUESS=(2,ESTIMATE);
>BLOCK5 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.1, GUESS=(2,ESTIMATE);
>BLOCK6 NIT=1, NCAT=2, ORIGINAL=(1,2), GPARM=0.3, GUESS=(2,ESTIMATE);
>CALIB LOGISTIC, PARTIAL, NQPT=21, CYCLES=(100,1,1,1,1,1), NEWTON=2,
CRIT=0.01, SCALE=1.7;
>SCORE EAP, NAME=TEST3;
>COMBINE1 NAME=SUM1, WEIGHT=(0.5,0.25,0.25);
>COMBINE2 NAME=SUM2, WEIGHT=(1.0,0.0,0.0);
>COMBINE3 NAME=SUM3, WEIGHT=(0.0,0.5,0.5);

11.7 Rater-effect model: multi-record input format with varying numbers of
raters per examinee

This example illustrates the parameter estimation for multiple raters. The analysis is based on
data in the file exampl07.dat in the raters folder of the PARSCALE installation folder. The first
few lines of the data are shown below.

00001 12 11 32 32
00001 22 21 42 42
00002 12 11 32 32
00002 22 22 43 42
00003 12 12 31 31
00003 23 22 43 41
00004 12 12 33 32
00004 22 22 42 42
00005 11 11 31 31
00005 22 21 41 42

The data contain the rating on four items administered to each examinee by four raters. The first
5 columns of each line of data contain the examinee ID. After two blank columns, the rater ID is
given, directly followed by the rating on the first item. Similar combinations of rater ID and rat-
ing for the other three items follow. As can be seen from the data above, the first line of data is
associated with examinee 00001 and contains the ratings for raters 1 and 3. The second line of
data, associated with the same examinee, contains the ratings for raters 2 and 4.

The data are read using the format statement

(5A1,4(2X,2A1))

where “5A1” is the format of the examinee ID, and “2X,2A1” the format for reading of one rater
ID/rating combination. The latter is repeated four times, using the notation “4( )”. Note that,
since the data for each examinee are given on two lines, R-INOPT=2 could have been specified on
the INPUT command and the format statement changed to

(5A1,4(2X,2A1),/T6,4(2X,2A1)).

The MRATER keyword on the INPUT command requests a rater-effect analysis and indicates the
number of raters. The MRATER command provides the necessary information about the four raters.

The estimated parameters and scores are saved to external output files using the SAVE option on
the FILES command and the PARM and SCORE keywords on the SAVE command.

The command file for a partial credit model based on these data is shown below.

EXAMPL07.PSL - ARTIFICIAL EXAMPLE: MONTE CARLO DATA


GENERALIZED PARTIAL CREDIT MODEL: RATERS’ EFFECT MODEL [NESTED DESIGN]
>FILES DFNAME='EXAMPL07.DAT', SAVE;
>SAVE PARM='EXAMPL07.PAR', SCORE='EXAMPL07.SCO';
>INPUT NIDCHAR=5, NTOT=4, LENGTH=4, NTEST=1, NFMT=1, MRATER=4;
(5A1,4(2X,2A1))
>TEST TNAME=RATERN, ITEM=(1,2,3,4), NBLOCK=4;
>BLOCK REPEAT=4, NITEMS=1, NCAT=3, ORIGINAL=('1','2','3'), MODIFIED=(1,2,3);
>MRATER RNAME=(RaterA,RaterB,RaterC,RaterD), RCODE=('1','2','3','4');
>CAL LOGISTIC, PARTIAL, NQPT=21, CYCLES=(100,1,1,1,1,1), NEWTON=2,
CRIT=0.05, DIAG=0, ITEMFIT=10, SCALE=1.7;
>SCORE EAP;

Phase 0 output

In addition to the standard Phase 0 output discussed elsewhere, information on the raters’ names,
codes, and the weight assigned to each is echoed to the output file.

The MRATER command used here only assigns names and codes to the rater. By default, the RATER
keyword, not included in the MRATER command shown here, assumes the value (1,1,1,1). The ar-
guments of this keyword are the raters’ weights. For the rater-effect model, the ability score
for each respondent is computed for each subtest (or subscale) and each rater separately. A total
score of each respondent for each subtest (or subscale) is computed by summing those scores
over items within each subtest and all raters who have rated the respondent. The rater weights of
this keyword are used to compute the weighted subtest or subscale score for each respondent.
Since the number of raters who rated each respondent’s responses varies, the weights are normal-
ized (divided by their sum) for each respondent.
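
A sketch of this normalization (illustrative Python with hypothetical inputs, not PARSCALE code;
raters who did not rate the respondent simply do not appear):

def weighted_score(scores_by_rater, rater_weights):
    # scores_by_rater: {rater: score}; rater_weights: the RATER keyword values.
    w = {r: rater_weights[r] for r in scores_by_rater}  # raters actually used
    total = sum(w.values())
    return sum(scores_by_rater[r] * w[r] / total for r in w)

# Example: a respondent rated by RaterA and RaterC only, all weights 1.0.
print(weighted_score({"RaterA": 0.61, "RaterC": 0.47},
                     {"RaterA": 1.0, "RaterB": 1.0, "RaterC": 1.0, "RaterD": 1.0}))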

>MRATER RNAME=(RaterA,RaterB,RaterC,RaterD), RCODE=('1','2','3','4');

MULTIPLE GROUP MODEL: RATER’s EFFECT MODEL

RATER’s NAME, CODE, AND WEIGHT


==============================

1 RaterA 1 1.00
2 RaterB 2 1.00
3 RaterC 3 1.00
4 RaterD 4 1.00

Also included in the Phase 0 output is a listing of the first two observations, showing the input
and recoded responses. The raters responsible for each rating are also listed. This information is
provided so that the user can check that the data are read in correctly. If not, the variable format
statement (or the data) should be corrected.

INPUT AND RECODED RESPONSE OF FIRST AND SECOND OBSERVATIONS


OBSERVATION # 1
GROUP: 1
ID: 00001
INPUT RESPONSES: 2 1 2 2
RECODED RESPONSES: 2 1 2 2
RECODED RATERS : 1 1 3 3

OBSERVATION # 2
GROUP: 1
ID: 00001
INPUT RESPONSES: 2 1 2 2
RECODED RESPONSES: 2 1 2 2
RECODED RATERS : 2 2 4 4

The Phase 0 output also reports that 2000 lines of data were read from the data file, and indicates
that these 2000 observations are associated with 1000 examinees.

[MAIN TEST: RATERN ]


2000 OBSERVATIONS READ FROM FILE: EXAMPL08.DAT
2000 OBSERVATIONS WRITTEN TO FILE: exampl08.MFL

MULTIPLE RATERS DATA


1000 CASES READ FROM FILE: EXAMPL08.DAT

Phase 1 Output

The Phase 1 output file contains no additional information in this type of analysis. As usual, fre-
quencies and percentages for items nested within blocks are reported here. Information for the
first block/item is shown below.

SUMMARY ITEM STATISTICS


=======================

BLOCK NO.: 1 NAME: BLOCK


-------------------------------------------------------
ITEM | TOTAL NOT OMIT | CATEGORIES
| PRESENT |
| | 1 2 3
-------------------------------------------------------
0001 | |
FREQ.| 2000 0 0| 235 1145 620
PERC.| 0.0 0.0| 11.8 57.2 31.0
| |
-------------------------------------------------------

Phase 2 Output

The Phase 2 output file shows the standard output for category parameters and item parameters
at convergence. This is followed by rater parameters and their associated standard errors as
shown below.

ITEM BLOCK 1 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 1.243 -1.243
S.E. : 0.000 0.056 0.041

ITEM BLOCK 2 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 1.183 -1.183
S.E. : 0.000 0.037 0.049

ITEM BLOCK 3 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000

STEP PARAMTER : 0.000 1.298 -1.298

S.E. : 0.000 0.079 0.064

ITEM BLOCK 4 BLOCK


SCORING FUNCTION : 1.000 2.000 3.000
STEP PARAMTER : 0.000 1.273 -1.273
S.E. : 0.000 0.062 0.077

+------+-----+---------+---------+---------+---------+---------+---------+
| ITEM |BLOCK| SLOPE | S.E. |LOCATION | S.E. |GUESSING | S.E. |
+======+=====+=========+=========+=========+=========+=========+=========+
| 0001*| 1 | 0.814 | 0.041 | -0.515 | 0.039 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| 0002*| 2 | 0.935 | 0.047 | 0.410 | 0.037 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| 0003*| 3 | 0.491 | 0.027 | -0.502 | 0.051 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
| 0004*| 4 | 0.505 | 0.028 | 0.508 | 0.050 | 0.000 | 0.000 |
+------+-----+---------+---------+---------+---------+---------+---------+
RATER’s EFFECT PARAMETER

RATER’s NAME PARAMETER S.E.


---------------------------------
RaterA -0.008 0.030
RaterB -0.006 0.030
RaterC 0.069 0.044
RaterD -0.055 0.044

NOTE: RATED ITEMS ARE MARKED BY "*"

From the output above, we see a marked difference between the raters, in particular between
RaterC and RaterD. The raters differ appreciably in severity.

11.8 Rater-effect model: one-record input format with same number of raters
per examinee

This example illustrates another option for rater data input (R-INOPT=1). The data in ex-
ampl07.dat (see Section 11.7) were reformatted so that the rated responses for each respondent
are on a single record. This input option requires the NRATER keyword on the INPUT command to
indicate the number of times each item was rated. The number of raters is indicated with the
MRATER keyword on the same command.

EXAMPL08.PSL - ARTIFICIAL EXAMPLE (MONTE CARLO DATA)


GENERALIZED PARTIAL CREDIT MODEL: RATERS' EFFECT MODEL [NESTED DESIGN]
>FILE DFNAME='EXAMPL08.DAT', SAVE;
>SAVE PARAM='EXAMPL08.PAR', SCORE='EXAMPL08.SCO';
>INPUT R-INOPT=2, NIDCHAR=5, NTOT=4, LENGTH=4, NTEST=1, NFMT=1,
MRATER=4, NRATER=(2(0)4);
(5A1,8(2X,2A1))
>TEST TNAME=RATERN, ITEM=(1,2,3,4), NBLOCK=4;
>BLOCK REPEAT=4, NIT=1, NCAT=3, ORIGINAL=('1','2','3'), MOD=(1,2,3);
>MRATER RNAME=(RaterA,RaterB,RaterC,RaterD), RCODE=('1','2','3','4');
>CAL LOGISTIC, PARTIAL, NQPT=21, CYCLES=(100,1,1,1,1,1), NEWTON=2,
CRIT=0.05, DIAG=0, ITEMFIT=10, SCALE=1.7;
>SCORE EAP;

11.9 Rater-effect model: one-record input format with varying numbers of
raters per examinee

This example illustrates another form of data input for multiple ratings. It is requested by setting
R-INOPT=1 on the INPUT command to indicate one line of data per examinee. The number of
items in the test is given in the LENGTH keyword.

The data in exampl09.dat (given in the raters folder) are formatted so that a rater ID code pre-
cedes each rating of the examinee’s response to an item. The INPUT command must include the
NRATER keyword to indicate the number of times each item has been rated. The MRATER keyword
is used to give the maximum number of raters for each of the items in the test. If any given item
of any particular case record has fewer than the maximum number of raters, the not-presented
code must be inserted for the rater code of each missing rater.

If an item is multiple-choice or is objectively scored, then the number of raters for the item in the
NRATER list must be set to zero. For those items, only the response code appears in the case re-
cord.

The total number of responses, NTOTAL, to all items in the data is equal to the number of
multiple-choice items plus the sum of the numbers of raters in the NRATER list. The INPUT command
must also contain the MRATER keyword, giving the number of different raters in the data. The
codes that identify the raters in the data must appear in the MRATER command. Labels for the rat-
ers in the output listing may be supplied with the RNAME keyword on the MRATER command.

The following is an example of a data record in exampl09.dat. There are 5 open-ended items,
but any given examinee is presented only 2 of these items. Rater codes and ratings for the re-
maining items are assigned the not-presented code 0. There are no multiple-choice items.

14 3 2 10 3 0 0 0 0 5 3 12 2 0 0 0 0 0 0 0 0

Examinee 14 was presented items 1 and 3. The response to item 1 was scored by rater 3, who
assigned it category 2, and by rater 10, who assigned it category 3. The response to item 3 was
scored by rater 5, who assigned it category 3, and by rater 12, who assigned it category 2.

The not-presented key must have the same format as the data records. In this case:

NPKY 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The raters are nested within items in these data; i.e., any given rater scores one, and only one,
response of any given examinee.

The complete command file for this analysis is given below.

EXAMPL09.PSL - DATA FROM A STUDY OF MULTIPLE RATINGS OF PERFORMANCE EXERCISES


GENERALIZED PARTIAL CREDIT MODEL: RATER EFFECTS MODEL [NESTED DESIGN]
>FILE DFNAME='EXAMPL09.DAT',NFNAME='EXAMPL09.DAT',SAVE;
>SAVE PARAM='EXAMPL09.PAR',SCORE='EXAMPL09.SCO';
>INPUT R-INOPT=2,NIDCHAR=4,NTOT=5,LENGTH=5,NTEST=1,NFMT=1,MRATER=10,
NRATER=(2(0)5);
(4A1,5(2X,A2,1X,A1,1X,A2,1X,A1))
>TEST TNAME=RATERN,ITEM=(1,2,3,4,5),NBLOCK=5;
>BLOCK REPEAT=4,NIT=1,NCAT=4,ORIGINAL=('1','2','3','4'),
MOD=(1,2,3,4);
>BLOCK NIT=1,NCAT=3,ORIGINAL=('1','2','3'),
MOD=(1,2,3);
>MRATER RNAME=(R3,R4,R5,R6,R7,R8,R9,R10,R11,R12),
RCODE=(' 3',' 4',' 5',' 6',' 7',' 8',' 9','10','11','12');
>CAL LOGISTIC,PARTIAL,NQPT=21,CYCLES=(50,1,1,1,1,1),NEWTON=2,
CRIT=0.05,DIAG=0,ITEMFIT=10,SCALE=1.7;
>SCORE EAP;

12 MULTILOG examples

12.1 One-parameter logistic model for a five-item binary-scored test (LSAT6)

The so-called LSAT Section 6 data include the responses of 1000 examinees to five binary
items in a short section of the Law School Admission Test. The data have been analyzed by Bock
& Lieberman (1970), Andersen & Madsen (1977), Bock & Aitkin (1981), Thissen (1982), and
others; the 1PL model with a Gaussian population distribution fits quite well.

Contents of the data file are shown below. Note that a frequency of 0 was obtained for the 11th
and 13th patterns.

1 00000 3
2 00001 6
3 00010 2
4 00011 11
5 00100 1
6 00101 1
7 00110 3
8 00111 4
9 01000 1
10 01001 8
11 01010 0
12 01011 16
13 01100 0
...
31 11110 28
32 11111 298

The examples in Sections 12.2 and 12.3 fit these data with the 2PL and 3PL models. The
PROBLEM command identifies the problem as RANDOM θ (requiring MML estimation) using
PATTERN-count data input. The TEST command defines the test as ALL L1, specifying the 1PL
model. The data, in the file exampl01.dat, have the response patterns defined by [0-1] strings,
with 1 coded for a correct response. The format reads the 5A1 item responses, followed by F4.0
to read the frequency.

Results are saved to an output file called exampl01.out. The first few pages of the MULTILOG
output give information about the problem; those are omitted from the selected output repro-
duced here. The results are on the final three pages of the output and are included here.

The parameters correspond to those given by Thissen (1982, p. 180). The values in parentheses
adjacent to each parameter are approximate standard errors. The final page of the MULTILOG
output for PATTERN input has two parts: the left section describes goodness-of-fit, and the right
section characterizes the distribution of θ for each response pattern. In the goodness-of-fit sec-
tion, the observed and expected frequencies are printed, as well as the standardized residuals:

(observed − expected)/√expected.
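
For instance, the pattern 11122 in the table on the final page has an observed count of 11.0 and
an expected count of 8.2. A one-line check in Python (the printed value 0.96 differs slightly
because the expected count shown is rounded):

obs, exp = 11.0, 8.2
std_res = (obs - exp) / exp ** 0.5   # about 0.98; MULTILOG prints 0.96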

The EAP (Expected A Posteriori) estimate of θ for each pattern is also printed, as well as the
posterior standard deviation. At the bottom of the table, the likelihood ratio χ² goodness-of-fit
statistic is printed. The command file exampl01.mlg is shown below.

EXAMPL01.MLG -
MML PARAMETER ESTIMATION FOR THE 1PL MODEL, LSAT6 DATA
>PROBLEM RANDOM, PATTERN, NITEMS=5, NGROUP=1, NPATTERNS=32,
DATA='EXAMPL01.DAT';
>TEST ALL, L1;
>END;
2
01
11111
N
(4X,5A1,F4.0)

Selected output is shown below. Parameter estimates for item 1, with standard errors in parenthe-
ses, are given. These may be used to test whether a parameter differs significantly from zero (t =
estimate/S.E.).

ITEM 1: 2 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 6 0.76 (0.05)
B( 1) 1 -3.61 (0.29)

The next section of output provides information on the contribution of item 1 to the total infor-
mation.

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.135 0.130 0.124 0.116 0.109 0.100 0.092 0.084
-1.4 - 0.0 0.076 0.068 0.061 0.054 0.048 0.043 0.037 0.033
0.2 - 1.6 0.029 0.025 0.022 0.019 0.016 0.014 0.012 0.011
1.8 - 3.0 0.009 0.008 0.007 0.006 0.005 0.004 0.004

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2
OBS. FREQ. 76 924
OBS. PROP. 0.0760 0.9240
EXP. PROP. 0.0760 0.9240

Total information and standard errors, computed as 1/√information.

TOTAL TEST INFORMATION

@THETA: INFORMATION:
-3.0 - -1.6 1.548 1.566 1.581 1.593 1.600 1.604 1.603 1.599
-1.4 - 0.0 1.590 1.578 1.561 1.541 1.519 1.493 1.466 1.437
0.2 - 1.6 1.407 1.376 1.346 1.316 1.287 1.259 1.233 1.208
1.8 - 3.0 1.186 1.165 1.145 1.128 1.112 1.098 1.086

@THETA: POSTERIOR STANDARD DEVIATION:


-3.0 - -1.6 0.804 0.799 0.795 0.792 0.790 0.790 0.790 0.791
-1.4 - 0.0 0.793 0.796 0.800 0.805 0.811 0.818 0.826 0.834
0.2 - 1.6 0.843 0.852 0.862 0.872 0.881 0.891 0.901 0.910
1.8 - 3.0 0.918 0.927 0.934 0.942 0.948 0.954 0.960

MARGINAL RELIABILITY: 0.2924

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :

3.0( 2.4) 0.41 : -1.91 ( 0.80) : 11111


6.0( 5.5) 0.23 : -1.43 ( 0.80) : 11112
2.0( 2.5) -0.30 : -1.43 ( 0.80) : 11121
11.0( 8.2) 0.96 : -0.94 ( 0.81) : 11122
1.0( 0.9) 0.16 : -1.43 ( 0.80) : 11211

NEGATIVE TWICE THE LOGLIKELIHOOD= 21.8
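
The posterior standard deviations printed above can be recovered directly from the total test
information values. A quick check in Python:

info = 1.548        # total test information at theta = -3.0
se = info ** -0.5   # 0.804, the first entry of the posterior S.D. table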

12.2 Two-parameter model for the five-item test

The second example of MULTILOG fits the LSAT6 data with the 2PL model. A 3PL model is
fitted to the data in Section 12.1.

The test is redefined as L2 (for the 2PL model) on the TEST command. The results follow, in the
same format as before. The only differences are that each item has a different estimated slope
(A) value, and the value of the likelihood ratio statistic indicates a very slight improvement in fit,
from 21.8 for the 1PL model to 21.2 for the 2PL model.
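
Because the 1PL is nested within the 2PL, the two values of negative twice the loglikelihood can
be compared directly. A Python sketch, assuming the 2PL adds four free parameters (five item
slopes in place of one common slope):

from scipy.stats import chi2

delta_g2 = 21.8 - 21.2            # difference in -2 loglikelihood
delta_df = 4                      # assumed difference in free parameters
p = chi2.sf(delta_g2, delta_df)   # about 0.96: no significant improvement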

The command file exampl02.mlg for this analysis is shown below, followed by the output ob-
tained from this run.

EXAMPL02.MLG -
MML PARAMETER ESTIMATION FOR THE 2PL MODEL, LSAT DATA
>PROBLEM RANDOM, PATTERN, NITEMS=5, NGROUP=1, NPATTERNS=32,
DATA='EXAMPL01.DAT';
>TEST ALL, L2;
>END;
2
01
11111
N
(4X,5A1,F4.0)

Portions of the output are shown below.

ITEM SUMMARY
MML PARAMETER ESTIMATION FOR THE 2PL MODEL, LSAT DATA

ITEM 1: 2 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 0.82 (0.18)
B( 1) 2 -3.36 (0.62)

ITEM 2: 2 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 3 0.72 (0.11)
B( 1) 4 -1.37 (0.21)

MML PARAMETER ESTIMATION FOR THE 2PL MODEL, LSAT DATA

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
3.0( 2.3) 0.48 : -1.90 ( 0.80) : 11111
6.0( 5.9) 0.06 : -1.47 ( 0.80) : 11112
2.0( 2.6) -0.37 : -1.45 ( 0.80) : 11121
11.0( 8.9) 0.69 : -1.03 ( 0.81) : 11122
1.0( 0.7) 0.36 : -1.33 ( 0.80) : 11211
1.0( 2.6) -1.00 : -0.90 ( 0.81) : 11212
3.0( 1.2) 1.67 : -0.88 ( 0.81) : 11221
4.0( 6.0) -0.80 : -0.44 ( 0.82) : 11222
1.0( 1.8) -0.62 : -1.43 ( 0.80) : 12111
8.0( 6.4) 0.63 : -1.00 ( 0.81) : 12112
0.0( 2.9) -1.70 : -0.98 ( 0.81) : 12121
16.0( 13.6) 0.66 : -0.55 ( 0.82) : 12122
0.0( 0.9) -0.92 : -0.86 ( 0.81) : 12211
3.0( 4.4) -0.66 : -0.42 ( 0.82) : 12212
2.0( 2.0) 0.00 : -0.40 ( 0.82) : 12221
15.0( 13.9) 0.29 : 0.05 ( 0.84) : 12222
10.0( 9.5) 0.16 : -1.37 ( 0.80) : 21111
29.0( 34.6) -0.95 : -0.94 ( 0.81) : 21112
14.0( 15.6) -0.40 : -0.92 ( 0.81) : 21121
81.0( 76.5) 0.51 : -0.48 ( 0.82) : 21122
3.0( 4.7) -0.78 : -0.79 ( 0.81) : 21211
28.0( 25.0) 0.60 : -0.35 ( 0.82) : 21212
15.0( 11.5) 1.04 : -0.33 ( 0.82) : 21221
80.0( 83.5) -0.38 : 0.12 ( 0.84) : 21222
16.0( 11.2) 1.42 : -0.90 ( 0.81) : 22111
56.0( 56.1) -0.01 : -0.46 ( 0.82) : 22112
21.0( 25.6) -0.91 : -0.44 ( 0.82) : 22121
173.0( 173.5) -0.04 : 0.01 ( 0.83) : 22122
11.0( 8.5) 0.88 : -0.31 ( 0.82) : 22211
61.0( 62.5) -0.19 : 0.15 ( 0.84) : 22212
28.0( 29.1) -0.20 : 0.17 ( 0.84) : 22221
298.0( 296.6) 0.08 : 0.65 ( 0.86) : 22222

NEGATIVE TWICE THE LOGLIKELIHOOD= 21.2


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.3 Three-parameter (and guessing) model for the five-item test

The third run of MULTILOG fits the LSAT6 data with the 3PL model. The test is redefined as
L3 (for the 3PL) on the TEST command. 1PL and 2PL models for these data are discussed in Sec-
tions 12.1 and 12.2. This example also illustrates the use of Bayesian priors for some of the item
parameters. Specifically, the PRIORS command indicates that for all five items
[ITEMS=(1,2,3,4,5)] the parameter DK=1, which is the logit of the lower asymptote, should be
estimated with a Gaussian prior distribution with a mean of –1.4 and a standard deviation of 1.0.
The value –1.4 is chosen for the mean because that is the logit of 0.2, and the items of LSAT6
were five-alternative multiple-choice items. The complete command file exampl03.mlg is shown
below.

EXAMPL03.MLG -MML PARAMETER ESTIMATION


3PL MODEL WITH PRIORS ON THE ASYMPTOTES, LSAT DATA
>PROBLEM RANDOM, PATTERN, NITEMS=5, NGROUP=1, NPATTERNS=32,
DATA='EXAMPL01.DAT';
>TEST ALL,L3;
>PRIORS ITEMS=(1,2,3,4,5), DK=1, PARAMS=(-1.4,1.0);
>END;
2
01
11111
N
(4X,5A1,F4.0)
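
Incidentally, the prior mean of –1.4 can be verified directly; it is the logit of the chance
success rate 1/5 for five-alternative items:

import math
print(math.log(0.2 / 0.8))   # -1.386..., i.e. approximately -1.4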

The presentation of the parameter estimates is different in this third (3PL) output from
MULTILOG. MULTILOG interprets the 1PL and 2PL models as binary versions of Samejima’s
(1969) graded model, giving the output form in the first two runs. The 3PL is estimated as a bi-
nary form of the multiple-choice model; so contrasts between the two slopes (correct and incor-
rect) and intercepts are estimated, as well as the logit of the lower asymptote. For convenience,
the three parameters are transformed into the more commonly used “Traditional 3PL, normal
metric” form on the first line for each item.

The results indicate that there is very little information about the lower asymptote parameters for
these very easy items; all of the estimated values of the lower asymptote are very near their prior
expected value of 0.2. The most difficult of the five items (item 3) has an estimated asymptote of
0.18. The likelihood ratio statistic indicates that this model does not fit quite as well as the 2PL
model did. That is true, although it seems odd. The Maximum Likelihood estimates (computed
with no priors) for the 3PL model for these data are identical to the 2PL estimates: all of the as-
ymptotes are estimated to be zero. The prior holds the estimates of the asymptotes near 0.2, and
does not fit quite as well. It does not fit particularly worse, either; there is very little information
available for estimating the lower asymptotes for these items.

Selected output for this run follows (only item 3 is shown here), followed by the total information,
the observed and expected frequencies, and the value of −2 ln L:

ITEM 3: 2 NOMINAL CATEGORIES, 2 HIGH


TRADITIONAL 3PL, NORMAL METRIC: A B C
0.70 0.19 0.18

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C D
CONTRAST P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.]
1 7 1.18 (0.25) 8 -0.22 (0.33) 9 -1.54 (0.63)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.003 0.005 0.007 0.010 0.015 0.022 0.032 0.045
-1.4 - 0.0 0.062 0.082 0.106 0.133 0.161 0.189 0.082 0.233
0.2 - 1.6 0.245 0.249 0.245 0.233 0.216 0.194 0.171 0.148
1.8 - 3.0 0.126 0.106 0.088 0.072 0.059 0.048 0.038

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2
OBS. FREQ. 447 553
OBS. PROP. 0.4470 0.5530
EXP. PROP. 0.4469 0.5531

TOTAL TEST INFORMATION

@THETA: INFORMATION:
-3.0 - -1.6 1.275 1.297 1.317 1.337 1.355 1.373 1.391 1.409
-1.4 - 0.0 1.427 1.446 1.465 1.484 1.500 1.514 1.522 1.522
0.2 - 1.6 1.514 1.497 1.471 1.438 1.400 1.360 1.318 1.278
1.8 - 3.0 1.240 1.206 1.175 1.148 1.124 1.104 1.087

@THETA: POSTERIOR STANDARD DEVIATION:


-3.0 - -1.6 0.885 0.878 0.871 0.865 0.859 0.853 0.848 0.843
-1.4 - 0.0 0.837 0.832 0.826 0.821 0.816 0.813 0.811 0.811
0.2 - 1.6 0.813 0.817 0.824 0.834 0.845 0.858 0.871 0.885
1.8 - 3.0 0.898 0.911 0.923 0.933 0.943 0.952 0.959

MARGINAL RELIABILITY: 0.3084

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
3.0( 2.0) 0.70 : -1.77 ( 0.78) : 11111
6.0( 5.5) 0.20 : -1.39 ( 0.78) : 11112
2.0( 2.5) -0.34 : -1.40 ( 0.79) : 11121
11.0( 8.9) 0.69 : -1.00 ( 0.79) : 11122
1.0( 0.8) 0.23 : -1.45 ( 0.85) : 11211

NEGATIVE TWICE THE LOGLIKELIHOOD= 21.5


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)
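
The “Traditional 3PL, normal metric” line can be recovered from the printed contrasts. The
relations below are our reconstruction for illustration (assuming the usual normal-metric scaling
constant 1.7 and a logistic transform of the asymptote contrast), not program documentation;
they reproduce the printed values for item 3 up to two-decimal rounding:

import math

a_con, c_con, d_con = 1.18, -0.22, -1.54    # contrasts printed for item 3

a = a_con / 1.7                             # normal-metric slope: 0.69 (printed 0.70)
b = -c_con / a_con                          # difficulty: 0.19
c = 1.0 / (1.0 + math.exp(-d_con))          # lower asymptote: logistic(-1.54) = 0.18
print(round(a, 2), round(b, 2), round(c, 2))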

12.4 Three-category graded logistic model for a two-item questionnaire

Clogg & Goodman (1984) analyzed a set of data for two responses (six weeks apart) to a three-
alternative graded questionnaire item about “happiness.” Some of their data are analyzed here
with ordered latent trait models. The data are in a file called exampl04.dat; there are three
response codes: 1 = very happy, 2 = pretty happy, and 3 = not too happy.

In this example, we fit these data with Samejima’s (1969) graded model. In the next section, we
estimate the parameters of a version of Masters’ (1982) partial credit model for the same data.
Another example of a graded model can be found in Section 12.7. The TEST command defines
the model as “GRADED,” with 3 categories for each of the two items. The items are labeled “PRE”
and “POST” on the LABELS command. The slope parameters are constrained to be equal. The long
form of key entry is required for multiple-category items: each response code in the data must be
assigned to a category of the model.

The graded model assumes that the highest category corresponds to the highest value of the trait
(here, happiness), so response code 1 (“very happy”) is placed in category 3 for both items, code 2
in category 2, and code 3 in category 1.


The data file exampl04.dat and command file exampl04.mlg are shown below.

11 46
12 31
13 8
21 20
22 68
23 12
31 1
32 12
33 11

EXAMPL04-1.MLG - MML ESTIMATION, SAMEJIMA’S GRADED MODEL


FOR THE 2ND-YEAR HAPPINESS DATA
>PROBLEM RANDOM, PATTERNS, NITEMS=2, NGROUPS=1, NPATTERNS=9,
DATA=‘EXAMPL04.DAT’;
>TEST ALL, GRADED, NC=(3,3);
>LABELS ITEMS=(1,2), NAMES=(‘PRE’ , ‘POST’);
>EQUAL ALL, AJ;
>END;
3
123
33
22
11
(2A1,F4.0)

In the MULTILOG output, the estimated parameters are printed in a format similar to those for
the 1PL and 2PL models in Sections 12.1 and 12.2, except that there are two thresholds for each
of the three-category items. As before, the goodness-of-fit statistics and EAP[ θ ]s are printed on
the final page of the MULTILOG listing. The model fits these data satisfactorily: the likelihood
ratio χ² statistic is 7.4 on 3 d.f., p = 0.07. Selected output is given below.

ITEM 1: PRE 3 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 1.83 (0.19)
B( 1) 2 -1.65 (0.20)
B( 2) 3 0.32 (0.12)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.240 0.324 0.426 0.542 0.660 0.765 0.837 0.864
-1.4 - 0.0 0.846 0.798 0.744 0.707 0.700 0.727 0.798 0.830
0.2 - 1.6 0.862 0.852 0.796 0.702 0.587 0.469 0.361 0.270
1.8 - 3.0 0.197 0.142 0.101 0.072 0.050 0.035 0.025

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3
OBS. FREQ. 24 100 85
OBS. PROP. 0.1148 0.4785 0.4067
EXP. PROP. 0.1165 0.4773 0.4062


ITEM 2: POST 3 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 1.83 (0.19)
B( 1) 4 -1.44 (0.18)
B( 2) 5 0.63 (0.14)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.173 0.238 0.321 0.423 0.538 0.656 0.761 0.833
-1.4 - 0.0 0.860 0.840 0.789 0.727 0.680 0.664 0.840 0.732
0.2 - 1.6 0.794 0.844 0.860 0.829 0.753 0.646 0.528 0.413
1.8 - 3.0 0.313 0.232 0.168 0.120 0.085 0.060 0.042

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3
OBS. FREQ. 31 111 67
OBS. PROP. 0.1483 0.5311 0.3206
EXP. PROP. 0.1478 0.5284 0.3238

ITEM 3: GRP1, N[MU: 0.00 SIGMA: 1.00]


P(#);(S.E.): 7; (0.00) 8; (0.00)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
-1.4 - 0.0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
0.2 - 1.6 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
1.8 - 3.0 1.000 1.000 1.000 1.000 1.000 1.000 1.000

TOTAL TEST INFORMATION


@THETA: INFORMATION:
-3.0 - -1.6 1.413 1.562 1.748 1.965 2.198 2.421 2.597 2.697
-1.4 - 0.0 2.706 2.639 2.533 2.434 2.381 2.391 2.639 2.563
0.2 - 1.6 2.655 2.696 2.656 2.531 2.340 2.115 1.888 1.683
1.8 - 3.0 1.511 1.374 1.269 1.192 1.136 1.095 1.067

@THETA: POSTERIOR STANDARD DEVIATION:


-3.0 - -1.6 0.841 0.800 0.756 0.713 0.674 0.643 0.620 0.609
-1.4 - 0.0 0.608 0.616 0.628 0.641 0.648 0.647 0.616 0.625
0.2 - 1.6 0.614 0.609 0.614 0.629 0.654 0.688 0.728 0.771
1.8 - 3.0 0.814 0.853 0.888 0.916 0.938 0.956 0.968

MARGINAL RELIABILITY: 0.5762

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
46.0( 44.3) 0.25 : 1.09 ( 0.70) : 33
31.0( 36.8) -0.95 : 0.32 ( 0.64) : 32
8.0( 3.8) 2.15 : -0.27 ( 0.72) : 31
20.0( 21.5) -0.33 : 0.31 ( 0.64) : 23
68.0( 61.3) 0.85 : -0.34 ( 0.61) : 22
12.0( 16.9) -1.19 : -0.96 ( 0.64) : 21
1.0( 1.8) -0.62 : -0.22 ( 0.75) : 13
12.0( 12.3) -0.09 : -0.93 ( 0.65) : 12
11.0( 10.2) 0.25 : -1.67 ( 0.66) : 11

NEGATIVE TWICE THE LOGLIKELIHOOD= 7.4


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)


12.5 Three-category partial credit model for the two-item questionnaire

In this example, we estimate the parameters of a version of Masters’ (1982) partial credit model
for these same “happiness” data considered in Section 12.4 where Samejima’s (1969) graded
model was fitted. A description of the data file can also be found in that section. The model for
the test items is redefined to be NOMINAL, with 3 categories for each item, and category 3 is
“HIGH.” The sequence

>TMATRIX ALL, AK, POLYNOMIAL;


>EQUAL ALL, AK=1;
>FIX ALL, AK=2, VALUE=0.0;

specifies that POLYNOMIAL contrasts are to be used for the ak parameters of the NOMINAL model,
with the linear contrasts constrained to be equal for the two items and the quadratic contrasts
FIXED at zero. The command

>TMATRIX ALL, CK, TRIANGLE;

specifies the “TRIANGLE” contrast matrix for the ck parameters of the NOMINAL model. Thissen
& Steinberg (1986) show that this parameterization of the NOMINAL model is equivalent to
Masters’ (1982) partial credit model; the only difference between the model as fitted here and that
fitted by Masters is the inclusion here of the Gaussian population distribution. This model does
not fit these data quite as well as Samejima’s (1969) graded model.

With this parameterization, the parameter values printed by MULTILOG are the slope contrast,
which is the slope of the trace lines relative to the unit standard deviation of the population dis-
tribution, and the c-contrasts, which are equivalent to Masters’ δs: the points at which the suc-
cessive ordered trace lines cross (a small sketch following the output listing verifies this). A
property of the partial credit model is that it is a “Rasch-type” model; response patterns with the
same total raw score have the same posterior distribution of θ. This means, for instance, that the
response patterns that total 5 (32 and 23) have the same EAP[θ], 0.34, with the same standard
deviation, 0.65. This property of raw-score sufficiency for θ is not obtained with the Samejima
graded model, even when the slopes are constrained to be equal, as in the preceding run. It is only
obtained with this model when, as here, the slopes are constrained to be equal for all items.

The command file exampl05.mlg is given below, followed by selected output for item 1 only.

EXAMPL05.MLG - MML ESTIMATION, PARTIAL CREDIT MODEL


2ND YEAR HAPPINESS DATA
>PROBLEM RANDOM, PATTERNS, NITEMS=2, NGROUP=1, NPATTERNS=9,
DATA=‘EXAMPL04.DAT’;
>TEST ALL, NOMINAL, NC=(3,3), HIGH=(3,3);
>LABELS ITEMS=(1,2), NAMES=(‘PRE’ , ‘POST’);
>TMATRIX ALL, AK, POLYNOMIAL;
>EQUAL ALL, AK=1;
>FIX ALL, AK=2, VALUE=0.0;
>TMATRIX ALL, CK, TRIANGLE;
>END;
3


123
33
22
11
(2A1,F4.0)

ITEM 1: PRE 3 NOMINAL CATEGORIES, 3 HIGH


CATEGORY(K): 1 2 3
A(K) -1.64 0.00 1.64
C(K) 0.00 2.66 2.19

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[POLY.] P(#) COEFF.[ TRI.]
1 1 1.64 (0.19) 2 -2.66 (0.40)
2 6 0.00 (0.00) 3 0.47 (0.21)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.233 0.304 0.387 0.480 0.576 0.667 0.741 0.792
-1.4 - 0.0 0.815 0.818 0.809 0.799 0.798 0.805 0.818 0.818
0.2 - 1.6 0.802 0.761 0.693 0.607 0.511 0.416 0.329 0.255
1.8 - 3.0 0.193 0.145 0.107 0.079 0.058 0.042 0.030

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3
OBS. FREQ. 24 100 85
OBS. PROP. 0.1148 0.4785 0.4067
EXP. PROP. 0.1153 0.4770 0.4077

TOTAL TEST INFORMATION


@THETA: INFORMATION:
-3.0 - -1.6 1.409 1.536 1.688 1.864 2.051 2.236 2.398 2.518
-1.4 - 0.0 2.586 2.604 2.589 2.563 2.544 2.545 2.604 2.582
0.2 - 1.6 2.583 2.547 2.463 2.332 2.166 1.983 1.802 1.636
1.8 - 3.0 1.493 1.375 1.282 1.209 1.154 1.113 1.082

@THETA: POSTERIOR STANDARD DEVIATION:


-3.0 - -1.6 0.843 0.807 0.770 0.733 0.698 0.669 0.646 0.630
-1.4 - 0.0 0.622 0.620 0.621 0.625 0.627 0.627 0.620 0.622
0.2 - 1.6 0.622 0.627 0.637 0.655 0.680 0.710 0.745 0.782
1.8 - 3.0 0.818 0.853 0.883 0.909 0.931 0.948 0.961

MARGINAL RELIABILITY: 0.5700

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
46.0( 43.5) 0.38 : 1.08 ( 0.71) : 33
31.0( 38.0) -1.14 : 0.34 ( 0.65) : 32
8.0( 3.7) 2.24 : -0.33 ( 0.63) : 31
20.0( 22.1) -0.45 : 0.34 ( 0.65) : 23
68.0( 60.4) 0.97 : -0.33 ( 0.63) : 22
12.0( 17.1) -1.24 : -0.97 ( 0.63) : 21
1.0( 1.5) -0.44 : -0.33 ( 0.63) : 13
12.0( 12.3) -0.08 : -0.97 ( 0.63) : 12
11.0( 10.3) 0.23 : -1.65 ( 0.66) : 11

NEGATIVE TWICE THE LOGLIKELIHOOD= 8.4

(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)
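
Using the estimates printed above for item 1 (PRE), the crossing points of the successive trace
lines can be computed directly (a minimal sketch; under the nominal model each trace line is
proportional to exp(a(k)·θ + c(k))):

# Nominal-model parameters for item 1, from the output above
a = [-1.64, 0.00, 1.64]
c = [0.00, 2.66, 2.19]

# Adjacent trace lines cross where a[k]*theta + c[k] = a[k+1]*theta + c[k+1]
for k in range(2):
    print(round((c[k] - c[k + 1]) / (a[k + 1] - a[k]), 2))   # -1.62 and 0.29

The crossings are the printed c-contrasts (−2.66 and 0.47) divided by the slope contrast (1.64),
which is the sense in which the c-contrasts correspond to Masters’ δs.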


12.6 Four-category graded model for a two-item interview scale

Klassen & O’Connor (1989) conducted a prospective study of predictors of violence in adult
male mental health admissions. One combination of possible predictors of subsequent violence
involved data readily available in mental health center records: The number of prior (inpatient)
admissions and age at the first such admission. Both a large number of previous admissions and
a young age at first admission are considered possible predictors of subsequent violence, pre-
sumably because they both reflect more serious psychopathology.

In acquiring the interview data, Klassen & O’Connor (1989) divided both age at first admission
and number of prior admissions into four ordered categories. The two variables do not really ap-
pear to be test items. But they are related to each other, in an obvious sort of way: Those whose
first admission was at a relatively young age tend to have had more previous admissions
[χ²(9) = 16.4, p = 0.05 for independence].

From the point of view of item response theory, the fact that these two “items” are not independ-
ent is explained by their common relationship to an underlying variable: the “long-term nature”
or “seriousness” of the mental health problems for which the person is being admitted. From the
point of view of the researchers attempting to predict subsequent behavior, estimates of individ-
ual values on that underlying continuum may be more useful than either of the two observed
variables alone. Thissen (1991) describes fitting these data with Samejima’s (1969) graded
model, and the consequences for estimating individual scores. This example illustrates the use of
MULTILOG for this purpose. The data are given in the file exampl06.dat. Additional graded-
model examples are discussed in Sections 12.4 and 12.7.

The command file used for this analysis is exampl06.mlg:

EXAMPL06.MLG -ADMISSIONS DATA


UNCONSTRAINED GRADED MODEL
>PROBLEM RANDOM, PATTERNS, NITEMS=2, NGROUPS=1, NPATTERNS=16,
DATA=‘EXAMPL06.DAT’;
>TEST ALL, GRADED, NC=(4,4);
>END;
4
0123
11
22
33
44
(1X,2A1,F5.0)

Selected output is given below.

ITEM 1: 4 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 0.87 (0.16)
B(1) 2 -1.95 (0.37)
B(2) 3 -0.19 (0.17)
B(3) 4 2.57 (0.48)


MARGINAL RELIABILITY: 0.3359

ADMISSIONS DATA, UNCONSTRAINED GRADED MODEL

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
28.0( 28.0) 0.00 : -1.01 ( 0.86) : 11
15.0( 15.1) -0.03 : -0.46 ( 0.81) : 12
8.0( 6.2) 0.70 : -0.17 ( 0.84) : 13
5.0( 6.7) -0.65 : 0.08 ( 0.90) : 14
35.0( 35.3) -0.04 : -0.64 ( 0.81) : 21
23.0( 24.7) -0.34 : -0.17 ( 0.77) : 22
12.0( 11.5) 0.15 : 0.10 ( 0.79) : 23
15.0( 13.4) 0.43 : 0.36 ( 0.84) : 24
43.0( 40.0) 0.47 : -0.23 ( 0.83) : 31
35.0( 36.9) -0.32 : 0.20 ( 0.78) : 32
19.0( 20.5) -0.33 : 0.49 ( 0.78) : 33
29.0( 28.5) 0.09 : 0.79 ( 0.83) : 34
6.0( 8.9) -0.98 : 0.10 ( 0.89) : 41
14.0( 10.1) 1.23 : 0.54 ( 0.82) : 42
6.0( 6.6) -0.24 : 0.85 ( 0.81) : 43
11.0( 11.4) -0.13 : 1.24 ( 0.86) : 44

NEGATIVE TWICE THE LOGLIKELIHOOD= 4.2


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.7 A graded model analysis of item-wording effect on responses to an opinion survey

In this example, we illustrate the use of MULTILOG with data from an experiment conducted
during the 1974 General Social Survey. The data involve two questions. The first question (in
form A) was, “In general, do you think the courts in this area deal too harshly or not harshly
enough with criminals?”; the responses used here (with their codes) are “Courts too harsh” (1),
“About right” (2), and “Not harsh enough” (3). The second question produced a classification of
the respondents into the three categories “Liberal” (1), “Moderate” (2), and “Conservative” (3).
The first question was asked in different wordings on two forms. The first wording is given
above; the second wording (used on form B) was “In general, do you think the courts in this area
deal too harshly or not harshly enough with criminals, or don’t you have enough information
about the courts to say?” The two forms were randomly assigned to the respondents to the sur-
vey. The point of the split-ballot experiment was to determine the effect of the explicitly offered
“don’t know” alternative in form B. About 7% of the group one (form A) respondents said they
“didn’t know,” and about 29% of the group two (form B) respondents said they “didn’t know.”
Thus, as expected, explicit provision of “don’t know” as an alternative increased the probability
of that response.

Here, we consider only the data from the respondents who chose one of the three (coded) sub-
stantive alternatives listed above. Setting aside the people (differing numbers in the two groups)
who said they “didn’t know,” we consider the hypothesis that the structure of the responses to
the two questions is the same for both wordings. To do this, we hypothesize that a single under-
lying latent variable (in this case, political liberalism-conservativism) accounts for the observed
covariances between the responses to the two questions. We fit the data with Samejima’s (1969)
graded item response model, and consider the goodness-of-fit, the trace lines, and the conse-
quences of the model for inferences about the political attitudes of the respondents.

The data are in the file exampl07.dat. Contents of the command file exampl07.mlg are shown
below. The command lines entered here indicate that the problem is one involving RANDOM
(MML) item parameter estimation, using response-PATTERN data, for 2 items, and 2 groups. The
GRADED model is used, with 3 response categories for each item.

EXAMPL07.MLG - ITEM 2: LIB,MOD,CONS; ITEM 1: COURTS HARSH--NOT;


2 FORMS
>PROBLEM RANDOM, PATTERN, NITEMS=2, NGROUP=2, NPATTERN=18,
DATA=’EXAMPL07.DAT’;
>TEST ALL, GRADED, NCATS=(3,3);
>END;
3
123
11
22
33
(I1,1X,2A1,F4.0)

The data file is shown below. The first column contains 1 for form A, and 2 for form B. Columns
3 and 4 contain codes (1, 2, and 3) for the responses to the two items. The frequencies for each
response pattern for each group are in columns 6-8.

This example illustrates MULTILOG’s use of numbers from 1 to the number of groups (in this
case, 2) to denote group membership. When there is only one group, no group number is read in
the data.

1 11 16
1 12 16
1 13 5
1 21 24
1 22 29
1 23 13
1 31 122
1 32 224
1 33 185
2 11 21
2 12 7
2 13 3
2 21 16
2 22 11
2 23 11
2 31 112
2 32 152
2 33 126

Annotated output is given below. On the first page of the output, MULTILOG reports on the
state of its internal control codes. This information is used mostly for trouble-shooting.


EXAMPL07.MLG - ITEM 2: LIB,MOD,CONS; ITEM 1: COURTS HARSH--NOT


2 FORMS
DATA PARAMETERS:
NUMBER OF LINES IN THE DATA FILE: 18
NUMBER OF CATEGORICAL-RESPONSE ITEMS: 2
NUMBER OF CONTINUOUS-RESPONSE ITEMS, AND/OR GROUPS: 2
TOTAL NUMBER OF ‘ITEMS’ (INCLUDING GROUPS): 4
NUMBER OF CHARACTERS IN ID FIELDS: 0
MAXIMUM NUMBER OF RESPONSE-CODES FOR ANY ITEM: 3

THE MISSING VALUE CODE FOR CONTINUOUS DATA: 9.0000


RESPONSE-PATTERN FREQUENCIES WILL BE READ

THE DATA WILL BE STORED IN MEMORY

ESTIMATION PARAMETERS:
THE ITEMS WILL BE CALIBRATED--
BY MARGINAL MAXIMUM LIKELIHOOD ESTIMATION
MAXIMUM NUMBER OF EM CYCLES PERMITTED: 25
NUMBER OF PARAMETER-SEGMENTS USED IS: 1
NUMBER OF FREE PARAMETERS IS: 7
MAXIMUM NUMBER OF M-STEP ITERATIONS IS 4 TIMES
THE NUMBER OF PARAMETERS IN THE SEGMENT
THE M-STEP CONVERGENCE CRITERION IS: 0.000100
THE EM-CYCLE CONVERGENCE CRITERION IS: 0.001000
THE RK CONTROL PARAMETER (FOR THE M-STEPS) IS: 0.9000
THE RM CONTROL PARAMETER (FOR THE M-STEPS) IS: 1.0000
THE MAXIMUM ACCELERATION PERMITTED IS: 0.0000

THETA-GROUP LOCATIONS WILL REMAIN UNCHANGED

IN-CORE CATEGORICAL DATA STORAGE AVAILABLE FOR N= 10000, 10000 WORDS.

QUADRATURE POINTS FOR MML,


AT THETA:
-4.500
-3.500
-2.500
-1.500
-0.500
0.500
1.500
2.500
3.500
4.500

The key and format for the data, and the values for the first observation are printed to help de-
termine that the data have been read properly. The values printed next to NORML are the internal
representation of group membership: 0 means “in group 1” and 9 means “not in group 2.” The
value printed for WT/CR is the frequency (weight). Below, we note that the MML estimation al-
gorithm has essentially converged, since the maximum change between estimation cycles for any
parameter is less than 0.004.


ITEM 2: LIB, MOD, CONS; ITEM 1:COURTS HARSH-NOT HARSH; TWO FORMS

READING DATA...
KEY-
CODE CATEGORY
11
22
33

FORMAT FOR DATA-


(I1,1X,2A1,F4.0)

FIRST OBSERVATION AS READ-

ITEMS 11
NORML 0.000 9.000
WT/CR 16.00

18 WORDS USED OUT OF 10000 AVAILABLE FOR RESPONSE PATTTERNS


294 WORDS USED OUT OF 40000 AVAILABLE FOR TABLES

FINISHED CYCLE 25
MAXIMUM INTERCYCLE PARAMETER CHANGE= 0.00367 P( 6)

The Maximum Likelihood estimates of the item parameters are printed here: one value for the
slope (A) and two thresholds (B) for each item.

ITEM 2: LIB, MOD, CONS; ITEM 1:COURTS HARSH-NOT HARSH; TWO FORMS

ITEM 1: 3 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 1.08 (0.13)
B(1) 2 -2.86 (0.34)
B(2) 3 -1.78 (0.20)

@THETA:
-2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0

I(THETA):
0.34 0.30 0.25 0.19 0.13 0.08 0.05 0.03 0.02

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3
OBS. FREQ. 68 104 921
OBS. PROP. 0.06 0.10 0.84

GROUP 1:
EXP. PROP. 0.06 0.09 0.85

GROUP 2:
EXP. PROP. 0.07 0.10 0.83


ITEM 2: 3 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 4 1.12 (0.10)
B(1) 5 -0.93 (0.11)
B(2) 6 0.97 (0.11)

@THETA:
-2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0

I(THETA):
0.22 0.29 0.33 0.35 0.35 0.35 0.34 0.29 0.23

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3
OBS. FREQ. 311 439 343
OBS. PROP. 0.28 0.40 0.31

GROUP 1:
EXP. PROP. 0.27 0.40 0.33

GROUP 2:
EXP. PROP. 0.30 0.40 0.29

Beneath the parameter estimates for each item, MULTILOG prints the information I[ θ ] for that
item at nine values of θ from –2 to 2, and the observed and expected frequencies for each re-
sponse alternative.

ITEM 3: GRP1, N[MU: 0.16 SIGMA: 1.00]


P(#);(S.E.): 7; (0.05) 1996; (0.00)

@THETA:
-2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0

I(THETA):
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

ITEM 4: GRP2, N[MU: 0.00 SIGMA: 1.00]


P(#);(S.E.): 8; (0.00) 1995; (0.00)

@THETA:
-2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0

I(THETA):
1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00

TOTAL TEST INFORMATION

FOR GROUP 1:
@THETA: -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
I(THETA): 1.6 1.6 1.6 1.5 1.5 1.4 1.4 1.3 1.2
SE(THETA):0.80 0.79 0.79 0.81 0.82 0.83 0.85 0.87 0.89

MARGINAL RELIABILITY: 0.3126


TOTAL TEST INFORMATION

FOR GROUP 2:
@THETA: -2.0 -1.5 -1.0 -0.5 0.0 0.5 1.0 1.5 2.0
I(THETA): 1.6 1.6 1.6 1.5 1.5 1.4 1.4 1.3 1.2
SE(THETA): 0.80 0.79 0.79 0.81 0.82 0.83 0.85 0.87 0.89

MARGINAL RELIABILITY: 0.3196

In this case, the population distributions of the two groups are assumed to be normal.
MULTILOG prints the estimated or fixed means (MU) and the standard deviations. It also prints
the total test information I[ θ ] for each group, its inverse square root SE[ θ ], and the marginal
reliability.
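
The SE[θ] rows are the inverse square roots of the information rows; this can be checked directly
(a minimal sketch using the one-decimal group-1 values, so agreement is only to within that
rounding):

# Total test information printed for group 1 at theta = -2.0, ..., 2.0
info = [1.6, 1.6, 1.6, 1.5, 1.5, 1.4, 1.4, 1.3, 1.2]
print([round(i ** -0.5, 2) for i in info])
# [0.79, 0.79, 0.79, 0.82, 0.82, 0.85, 0.85, 0.88, 0.91], close to the printed SE row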

ITEM 2: LIB, MOD, CONS; ITEM 1:COURTS HARSH-NOT HARSH; TWO FORMS

GROUP 1
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
16.0( 17.3) -0.31 : -1.21 ( 0.84) : 11
16.0( 13.7) 0.62 : -0.55 ( 0.81) : 12
5.0( 6.1) -0.46 : -0.04 ( 0.88) : 13
24.0( 23.8) 0.04 : -0.97 ( 0.79) : 21
29.0( 22.8) 1.30 : -0.40 ( 0.77) : 22
13.0( 11.2) 0.54 : 0.09 ( 0.84) : 23
122.0( 131.6) -0.84 : -0.33 ( 0.84) : 31
224.0( 217.8) 0.42 : 0.22 ( 0.80) : 32
185.0( 189.6) -0.34 : 0.84 ( 0.87) : 33

NEGATIVE TWICE THE LOGLIKELIHOOD= 3.5


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

ITEM 2: LIB, MOD, CONS; ITEM 1:COURTS HARSH-NOT HARSH; TWO FORMS

GROUP 2
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
21.0( 15.6) 1.36 : -1.32 ( 0.84) : 11
7.0( 11.1) -1.23 : -0.66 ( 0.81) : 12
3.0( 4.6) -0.74 : -0.17 ( 0.89) : 13
16.0( 20.6) -1.02 : -1.07 ( 0.78) : 21
11.0( 18.0) -1.65 : -0.50 ( 0.77) : 22
11.0( 8.2) 0.99 : -0.03 ( 0.84) : 23
112.0( 102.9) 0.90 : -0.44 ( 0.84) : 31
152.0( 155.5) -0.28 : 0.11 ( 0.80) : 32
126.0( 122.4) 0.32 : 0.72 ( 0.86) : 33

NEGATIVE TWICE THE LOGLIKELIHOOD= 10.2

(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

TOTAL, NEGATIVE TWICE THE LOGLIKELIHOOD, ALL GROUPS= 13.7

These tables summarize the goodness-of-fit of the model to the data. Observed and expected val-
ues are printed for each response pattern in each group, as well as the standardized residual,
which may be taken to be approximately normally distributed with mean zero and variance one
for diagnostic purposes. The χ² statistics indicate that the model-fit is satisfactory (on 18 − 2
[group totals] − 7 [parameters fitted] = 9 degrees of freedom). The tables also include EAP[θ]
for each response pattern, and the corresponding standard deviation.
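
The standardized residuals can be reproduced as Pearson residuals, (observed − expected)/√expected;
this is our reconstruction of the printed quantity, but it matches the group-1 values:

import math

# Observed and expected frequencies for patterns 11 and 22 of group 1
for obs, expected in [(16.0, 17.3), (29.0, 22.8)]:
    print(round((obs - expected) / math.sqrt(expected), 2))   # -0.31 and 1.30, as printed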

Figure 12.1 shows the trace lines computed from the item parameters in the MULTILOG output.
We note that the “Liberal-Conservative” question divides the respondents approximately equally
into three “centered” groups, while the question on the courts has trace lines crossing each other
far on the left. Only the most liberal respondents consider the courts sufficiently harsh. As ex-
pected, the questions are strongly related. For further discussion of part of these data, see Thissen
& Steinberg (1988). A scoring run on these data is discussed in the next section.


Figure 12.1: Trace lines for first two items

12.8 Graded-model scores for individual respondents

Having concluded that the model fits the data satisfactorily, we set up a “scoring run” in which
we compute MAP[ θ ] for each response pattern, as though each line of the input data file repre-
sented an individual observation. This sequence of events represents the normal use of
MULTILOG: the item analysis and individual scoring are done in two separate runs of the com-
puter program. Frequently, several (sometimes even many) item analysis runs are performed be-
fore a satisfactory model is selected. Only after this is accomplished does it make sense to com-
pute estimates of θ for each respondent.

To set up a scoring run for the data described in the previous section, the syntax in exampl08.mlg
is used.
tion of SCORE for INDIVIDUAL data, for 2 items, and 2 groups. The GRADED model is used, with 3
response categories for each item. The START command is used to enter the item parameters from
the previous run. The parameters are entered in the order that they are printed, following a user-
supplied format. Usually, these parameters are read from a file previously saved by MULTILOG.
Note that the user should provide information about the key and data format. In the data format,
the first 2A1 refers to the NCHARS=2 characters of ID information; in this case, that reads the re-
sponse pattern as the label.

EXAMPL08.MLG - SCORE LIBERAL CONSERVATIVE/COURTS HARSH


PARAMETER VALUES IN COMMAND FILE
>PROBLEM SCORE, INDIVIDUAL, NITEMS=2, NGROUP=2, NEXAMINEES=18,
NCHARS=2, DATA=‘EXAMPL07.DAT’;
>TEST ALL, GRADED, NCATS=(3,3);
>START ALL;


1.08 -2.86 -1.78


1.12 -0.93 0.97
-1.00 0.16 1.00
-1.00 0.00 1.00
>END;
3
123
11
22
33
(2X,2A1,T1,I1,1X,2A1)

The numbers in the column marked THETAHAT in the output file obtained for this analysis are the
values of MAP[ θ ] for each response pattern; the response patterns are used as the ID fields on
the right. The estimated standard errors are tabulated, as well as the number of iterations required
by the Newton-Raphson algorithm. Comparing these values for each response pattern to the cor-
responding EAPs, we note very little difference.

THETAHAT S.E. ITER ID FIELD


-1.229 0.830 6 11
-0.561 0.804 2 12
-0.028 0.883 2 13
-1.003 0.771 4 21
-0.430 0.762 4 22
0.072 0.841 4 23
-0.375 0.828 4 31
0.192 0.794 4 32
0.826 0.853 4 33
-1.339 0.829 6 11
-0.665 0.804 2 12
-0.153 0.886 3 13
-1.098 0.767 4 21
-0.522 0.759 3 22
-0.041 0.840 3 23
-0.484 0.822 4 31
0.092 0.791 4 32
0.710 0.852 3 33
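
The THETAHAT values can be reproduced from the item parameters entered in the START command.
The following is a hedged sketch of the computation (Newton-Raphson on the log posterior of the
graded model under the N(µ, 1) group priors); it is our reconstruction for illustration, not
MULTILOG’s internal code:

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

# (slope a, thresholds b1, b2) for the two items, from the START command above
items = [(1.08, -2.86, -1.78), (1.12, -0.93, 0.97)]

def grad(pattern, theta, mu):
    """Gradient of the log posterior for a graded-model response pattern."""
    g = -(theta - mu)                       # N(mu, 1) prior term
    for (a, b1, b2), x in zip(items, pattern):
        p1 = logistic(a * (theta - b1))     # P(response >= 2)
        p2 = logistic(a * (theta - b2))     # P(response = 3)
        if x == 3:
            g += a * (1.0 - p2)
        elif x == 2:                        # d/dtheta of log(p1 - p2)
            g += a * (p1 * (1.0 - p1) - p2 * (1.0 - p2)) / (p1 - p2)
        else:
            g += -a * p1
    return g

def map_theta(pattern, mu):
    theta = mu
    for _ in range(25):                     # Newton-Raphson, numerical second derivative
        g = grad(pattern, theta, mu)
        h = (grad(pattern, theta + 1e-4, mu) - g) / 1e-4
        theta -= g / h
    return round(theta, 3)

print(map_theta((3, 3), mu=0.16))           # ~0.826: pattern 33, group 1
print(map_theta((3, 3), mu=0.00))           # ~0.710: pattern 33, group 2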

12.9 Five-category ratings of audiogenic seizures in mice in four experimental conditions

Bock (1975, pp. 512-547) describes the graded model, a model for ordered categorical data, in
detail and includes an application to a set of behavioral data. The data are in a file called ex-
ampl09.dat, which contains the following four lines:

1 7 0 2 11
0 6 0 6 10
0 2 0 5 11
3 10 2 0 2

Each of the four lines of data represents one of four groups of mice; each group of mice repre-
sents a cell of a 2 x 2 experimental design. The response variable is a classification of the mice in
each group according to the ordered severity of audiogenic seizures they exhibit; the column-
categories are “crouching,” “wild running,” “clonic seizures,” “tonic seizures,” and “death.”


Bock (1975) uses a model for responses in ordered categories, formally identical to Samejima’s
graded item response model, to relate the categorical response to effects of the experimental
conditions. This example reproduces the estimates for Bock’s “main class and interaction”
model.

The algorithm used in MULTILOG is very different from that described by Bock, and requires
different treatment of the required constraints of location. Bock’s system constrains the group
means (called µ there, and θ here) to total zero; he estimates three contrasts among the four
group values. That is impossible in MULTILOG, so all four group locations ( θ ) are estimated,
and one of the thresholds, called BK=4, is fixed at the value Bock obtains (0.4756). With this con-
straint, the results obtained with MULTILOG match those printed in the original source. Of
course, in real data analysis, one would not have such a value and one of the thresholds would be
fixed at some arbitrary value, like zero. If BK=4 had been fixed at zero in the current example, all
of the values of θ and the other three thresholds would have been shifted 0.4756 from the values
in the text.

The command file illustrates user input for FIXED- θ analysis, with data in the form of the table
given above. In this case, there is a single item, and the rows of the table are the groups, so
NGROUP=4. The TGROUPS command specifies four starting values (1,1,1,–1) for the four values of
θ , one for each group. These values must be entered manually. The slope is fixed at a value of 1
and BK=4 is fixed at 0.4756.

The contents of the command file exampl09.mlg are given below. To see how to generate this
command file using the syntax wizard, please see Section 4.3.3.

EXAMPL09.MLG -AUDIOGENEIC SEIZURES IN MICE;


BOCK, P. 512FF
>PROBLEM FIXED, TABLE, NITEM=1, NGROUP=4, DATA=‘EXAMPL09.DAT’;
>TEST ALL, GRADED, NC=5;
>TGROUPS NUMBER=4, MIDDLES=(1,1,1,-1);
>FIX ITEMS=1, AJ, VALUE=1.0;
>FIX ITEMS=1, BK=4, VALUE=0.4756;
>END;
(5F3.0)

In the MULTILOG output, the values of the thresholds, corresponding to those tabulated on p.
547 by Bock, are printed as the values of B(K) in the item summary; their estimated standard er-
rors differ slightly from those in the original analysis, because MULTILOG uses a somewhat
less precise algorithm for computing estimated standard errors. For each θ -group, the estimated
value of θ is printed before the word DATA, e.g. 0.32 for group 1. Those four values correspond
to the four µs on page 546 in Bock (1975). The remainder of the table gives the observed and
expected counts (proportions and probabilities in parentheses) for each cell in the 4 x 5 table, and
the values of the likelihood ratio and Pearson goodness-of-fit statistics. A selection of the output
follows.


ITEM SUMMARY

AUDIOGENEIC SEIZURES IN MICE; BOCK, P. 512FF

ITEM 1: 5 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 7 1.00 (0.00)
B(1) 1 -3.40 (0.53)
B(2) 2 -0.51 (0.21)
B(3) 3 -0.37 (0.22)
B(4) 8 0.48 (0.00)

ITEM 1
TH-GROUP CATEGORY
1 2 3 4 5
0.32 DATA 1.0(0.05) 7.0(0.33) 0.0(0.00) 2.0(0.10) 11.0(0.52)
EXPECTED 0.5(0.02) 5.9(0.28) 0.7(0.03) 4.3(0.20) 9.7(0.46)

0.41 DATA 0.0(0.00) 6.0(0.27) 0.0(0.00) 6.0(0.27) 10.0(0.45)


EXPECTED 0.5(0.02) 5.8(0.26) 0.7(0.03) 4.4(0.20) 10.6(0.48)

1.07 DATA 0.0(0.00) 2.0(0.11) 0.0(0.00) 5.0(0.28) 11.0(0.61)


EXPECTED 0.2(0.01) 2.9(0.16) 0.4(0.02) 2.9(0.16) 11.6(0.64)

-1.80 DATA 3.0(0.18) 10.0(0.59) 2.0(0.12) 0.0(0.00) 2.0(0.12)


EXPECTED 2.8(0.17) 10.5(0.62) 0.4(0.02) 1.7(0.10) 1.6(0.09)

PEARSON CHI-SQUARE= 15.078 L.R.CHI-SQUARE= 15.885

TOTAL TEST CHI-SQUARES

PEARSON CHI-SQUARE= 15.078; LIKELIHOOD RATIO CHI-SQUARE= 15.885

D.F.= 9 (IF TEST ALL CATEGORICAL AND THERE ARE NO EMPTY TH-GROUPS)

WARNING- 7 OF THE CELLS INCLUDED IN THESE


CHI-SQUARES HAVE EXPECTED VALUES LESS THAN ONE;
THE STATISTICS MAY BE MEANINGLESS.
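
The EXPECTED rows can be reproduced from the fitted parameters (a minimal sketch of the graded-
model category probabilities at the estimated θ for the first θ-group):

import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

a = 1.0                                    # slope, fixed in this run
b = [-3.40, -0.51, -0.37, 0.4756]          # thresholds B(1)-B(4)
theta = 0.32                               # estimated location for theta-group 1

# Boundary curves P(response >= k+1), bracketed by 1 and 0
star = [1.0] + [logistic(a * (theta - bk)) for bk in b] + [0.0]
print([round(star[k] - star[k + 1], 2) for k in range(5)])
# [0.02, 0.28, 0.03, 0.2, 0.46], matching the EXPECTED proportions for group 1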

12.10 A nominal model for responses to multiple-choice alternatives

The multiple-choice model (Thissen & Steinberg, 1984) includes a separate trace line for each
alternative response—the key and all of the distractors—on a multiple-choice item. The model is
a development of suggestions by Bock (1972) and Samejima (1979). For this reason, it is re-
ferred to as the “BS” model. The procedures involved differ from those used when the responses
on multiple-choice items are made binary by scoring correct or incorrect before the item analy-
sis. The data are more complex: for 4 four-alternative multiple-choice items, there are 4⁴ = 256
possible response patterns; if the data are made binary, there are only 2⁴ = 16 response patterns.
The model is more complex: the multiple-choice model has eleven free parameters for each four-
alternative item, while the 3PL model has only three. The model, its estimation, and its interpre-
tation are described by Thissen & Steinberg (1984) and Thissen, Steinberg & Fitzpatrick (1989).
The interested reader is referred to those sources.


The first 12 lines of the data file exampl10.dat are shown below.

1 1111
2 1113
1 1121
1 1133
1 1134
2 1143
1 1144
2 1222
2 1232
1 1233
2 1242
5 1243

This example shows the MULTILOG output for an item parameter estimation problem. The data
are the responses of 976 examinees to 4 four-alternative multiple-choice vocabulary items. The
model is fitted with constraints described by Thissen & Steinberg (1984) as the “ABCD(C),
ABCD(D)” model. In addition to the constraints on ak giving the “ABCD(C), ABCD(D)” model, two
of the relatively ill-determined c-contrasts are fixed at zero, increasing the precision of estima-
tion of the entire model, without damaging the fit.

On the PROBLEM command, the four-choice items are defined as having five response categories
[NC=(5(0)4)], because the Multiple-choice (“BS”) model appends an additional latent response
category to each item. This category is denoted DK (for “Don’t Know”) by Thissen & Steinberg
(1984), and must be category 1 in MULTILOG; i.e. the “real” responses are keyed into catego-
ries 2, 3, 4, and 5. The correct answers for these four items are [D,C,C,D], so HIGH =
(5,4,4,5).

The EQUAL commands impose the constraint that the proportions of the “DK” curve distributed
into each of the observed response categories are the same within the pairs of items with the
same keyed correct response. The two c-contrasts are fixed at zero, because we have found since
the publication of the original paper that the model is better-conditioned with the addition of
such constraints, and there is no apparent damage to the fit. As a matter of fact, this run pro-
duces a better fit than that reported for the slightly less constrained ABCD(C), ABCD(D) model in
the original paper, because this version of MULTILOG appears to converge somewhat more
completely than the version of MULTILOG (4.0) that provided the findings reported in the pa-
per. The two parameters fixed at zero had estimated standard errors several times larger than
their absolute values when they were estimated. Because of the substantial error covariances
among the parameters of this model, their estimation induced large standard error estimates in
several of the other parameters. Fixing the two cks produces much more stable results. The pa-
rameter estimates are printed in the following selection of the MULTILOG output, both in con-
trast form and as the aks, cks, and dks. The goodness-of-fit statistics and EAP[θ]s for each ob-
served pattern are printed on the final pages of the output.

The more heavily constrained model also runs faster in MULTILOG. In general, poorly identi-
fied models require much more computing time than more highly constrained models.


The parameterization

The relationship among the unconstrained (or contrast) parameters estimated by MULTILOG
and the constrained parameters of the model is fairly complex; here we provide illustrations,
based on item 1 of the example.

The model is

P(x = k) = \frac{\exp[a_k \theta + c_k] + d_k \exp[a_1 \theta + c_1]}{\sum_{i=1}^{m+1} \exp[a_i \theta + c_i]}

in which k = 2, 3, 4, 5 represent responses A, B, C, and D, respectively, for this multiple-choice
item. Response category 1 is used (internally) by MULTILOG to represent the latent “Don’t
know” category. The slope parameters ak, in the vector a, are computed as

\mathbf{a}' = \boldsymbol{\alpha}' \mathbf{T}_\alpha ,

where α contains the unconstrained parameters estimated by MULTILOG. For item 1, this is
(with the vectors transposed to fit on the page):

\begin{bmatrix} -2.98 \\ -5.01 \\ 3.91 \\ -0.66 \\ 4.75 \end{bmatrix}^{\!\prime} =
\begin{bmatrix} -2.03 \\ 6.89 \\ 2.33 \\ 7.73 \end{bmatrix}^{\!\prime}
\begin{bmatrix}
-0.20 &  0.80 & -0.20 & -0.20 & -0.20 \\
-0.20 & -0.20 &  0.80 & -0.20 & -0.20 \\
-0.20 & -0.20 & -0.20 &  0.80 & -0.20 \\
-0.20 & -0.20 & -0.20 & -0.20 &  0.80
\end{bmatrix}

The estimates of the parameters ak , in the vector a , are printed in the row marked A(K) in the
MULTILOG output, and the estimates of the (unconstrained) parameters in the vector α are
printed in the column marked CONTRAST COEFFICIENTS FOR A. Using the (default) deviation
contrasts in T , there is a fairly straightforward scalar interpretation of the parameters:

a1 = –2.98 = –0.20 [–2.03 + 6.89 + 2.33 + 7.73 ]

and

a2 = –2.98 + (–2.03) = –5.01,

a3 = –2.98 + 6.89 = 3.91,

a4 = –2.98 + 2.33 = –0.66,


and

a5 = –2.98 + 7.73 = 4.75,

where α ' = [–2.03 6.89 2.33 7.73 ] contains the parameters estimated by MULTILOG. This
has a direct bearing on the imposition of equality constraints using the MULTILOG command-
language. If, for instance, one wanted to constrain a3 and a5 to be equal, one would enter the
command

>EQUAL ITEMS=1, WITH=1, AK=(2,4);

because this would set the second and fourth contrasts among the as equal (they are currently
estimated as 6.89 and 7.73); the consequence of this would be that a3 and a5 would be equal.
Any constraints involving a1 are different: To constrain a1 and a2 to be equal, for instance, one
would enter the command

>FIX ITEMS=1, AK=1, VA=0.0;

which would have the effect of fixing the first contrast among the as (currently estimated to be –
2.03) at a value of zero. If that is true, a1 = a2 . The computation of the cs is parallel in all re-
spects to that for the as. Note that, in the example as printed, the command

>FIX ITEMS=1, CK=3, VALUE=0.0;

has the effect of imposing the constraint that c1 = c4 = −0.74.

The use of different T -matrices (Polynomial or Triangle) changes the relationship between
the unconstrained parameters estimated by MULTILOG and the as. However, MULTILOG
commands to FIX or EQUAL parameters always refer to the unconstrained contrast parameters,
and algebraic manipulation similar to that described here is necessary to obtain any desired con-
straints on the as or cs themselves.

The relationship between the ds and the unconstrained parameters estimated by MULTILOG is
somewhat more complex, because the parameters represented by dk are proportions (represent-
ing the proportion of those who “don’t know” who respond in each category on a multiple-
choice item; see Thissen & Steinberg, 1984). Therefore, the constraint that ∑ dk = 1 is required.
This is enforced by estimating dk such that

d_k = \frac{\exp[d_k^*]}{\sum \exp[d_k^*]}


and

\mathbf{d}^{*\prime} = \boldsymbol{\delta}' \mathbf{T}_d .

The elements of the vector δ are the parameters estimated by MULTILOG, and printed in the
column marked CONTRAST COEFFICIENTS FOR D; in the case of item 1 of this example,

δ ' = [0.76 –0.13 1.28].

These values are used (internally) by MULTILOG to compute the values of dk*. In this case, they
are:

\begin{bmatrix} -0.47 \\ 0.28 \\ -0.61 \\ 0.80 \end{bmatrix}^{\!\prime} =
\begin{bmatrix} 0.76 \\ -0.13 \\ 1.28 \end{bmatrix}^{\!\prime}
\begin{bmatrix}
-0.25 &  0.75 & -0.25 & -0.25 \\
-0.25 & -0.25 &  0.75 & -0.25 \\
-0.25 & -0.25 & -0.25 &  0.75
\end{bmatrix}

Then

\sum \exp[d_k^*] = \exp[-0.47] + \exp[0.28] + \exp[-0.61] + \exp[0.80]
                = 0.625 + 1.323 + 0.543 + 2.226
                = 4.717.

So

d_1 = \frac{0.625}{4.717} = 0.13, \quad d_2 = \frac{1.323}{4.717} = 0.28,

d_3 = \frac{0.543}{4.717} = 0.12, \quad \text{and} \quad d_4 = \frac{2.226}{4.717} = 0.47.

The four proportions [0.13, 0.28, 0.12, 0.47] are printed as D(K) in the MULTILOG output, in
columns 2, 3, 4 and 5 because those columns represent the parameters for the observed item re-
sponses.
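
A short numerical sketch tying these computations together (plain Python; the deviation-contrast
matrices are those displayed above, and the printed item-1 values are reproduced):

import math

alpha = [-2.03, 6.89, 2.33, 7.73]          # a-contrasts printed for item 1
delta = [0.76, -0.13, 1.28]                # d-contrasts printed for item 1

def deviation_T(n_contrasts, n_categories):
    """Deviation-contrast matrix: row k has (n-1)/n in column k+1, -1/n elsewhere."""
    off = -1.0 / n_categories
    T = [[off] * n_categories for _ in range(n_contrasts)]
    for k in range(n_contrasts):
        T[k][k + 1] = 1.0 - 1.0 / n_categories
    return T

def times(v, T):
    """Row vector times matrix."""
    return [sum(v[i] * T[i][j] for i in range(len(v))) for j in range(len(T[0]))]

a = times(alpha, deviation_T(4, 5))
print([round(x, 2) for x in a])            # [-2.98, -5.01, 3.91, -0.66, 4.75]

d_star = times(delta, deviation_T(3, 4))
total = sum(math.exp(x) for x in d_star)
print([round(math.exp(x) / total, 2) for x in d_star])   # [0.13, 0.28, 0.12, 0.47]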

The example illustrates the imposition of equality constraints on the ds between items. To im-
pose equality constraints on the ds within an item, the procedure is parallel to that described pre-
viously for imposing within-item equality constraints on as and cs. For instance, to impose the
constraint that d 2 and d3 should be equal, one would enter the command

>EQUAL ITEMS=1, WITH=1, DK=(1,2);


because this would set the first and second contrasts among the ds equal (they are currently esti-
mated as 0.76 and –0.13). The consequence of this would be that d 2 and d3 would be equal.
Any constraints involving d1 are different: To constrain d1 and d3 to be equal, for instance, one
would enter the command

>FIX ITEMS=1, DK=2, VALUE=0.0;

which would have the effect of fixing the second contrast among the ds (currently estimated to
be –0.13) at a value of zero. If that is true, then d1 = d3 .

The command file exampl10.mlg is shown below. Another example of the fitting of a multiple-
choice model is given in Section 12.11.

EXAMPL10.MLG -
ABCD(C) ABCD(D) WITH TWO C(K)S FIXED AT ZERO
>PROBLEM RANDOM, PATTERNS, NITEMS=4, NGROUP=1, NPATTERNS=156,
DATA=‘EXAMPL10.DAT’;
>TEST ALL, BS, NC=(5(0)4), HIGH=(5,4,4,5);
>EQUAL ITEMS=(1,4), DK=(1,2,3);
>EQUAL ITEMS=(2,3), DK=(1,2,3);
>FIX ITEMS=1, CK=3, VALUE=0.0;
>FIX ITEMS=2, CK=2, VALUE=0.0;
>SAVE;
>ESTIMATE NC=25;
>TGROUPS NUMBERS=10, QP=(-4.5(1.00)4.5);
>END;
4
1234
2222
3333
4444
5555
(10X,4A1,T3,F4.0)

Selected output follows.

ABCD(C) ABCD(D) WITH TWO C(K)S FIXED AT ZERO

ITEM 1: 5 NOMINAL CATEGORIES, 5 HIGH


CATEGORY(K): 1 2 3 4 5
A(K) -2.98 -5.01 3.91 -0.66 4.75
C(K) -0.74 -5.88 3.80 -0.74 3.56
D(K) 0.13 0.28 0.12 0.47

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C D
CONTRAST P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.]
1 1 -2.03 (1.00) 5 -5.15 (2.02) 8 0.76 (0.29)
2 2 6.89 (3.45) 6 4.53 (1.68) 9 -0.13 (0.45)
3 3 2.33 (0.99) 37 0.00 (0.00) 10 1.28 (0.26)
4 4 7.73 (3.46) 7 4.29 (1.68)


TOTAL TEST INFORMATION

@THETA: INFORMATION:
-3.0 - -1.6 2.520 2.883 3.164 3.272 3.163 2.879 2.525 2.224
-1.4 - 0.0 2.088 2.257 3.024 4.791 5.243 3.507 2.811 3.882
0.2 - 1.6 6.688 8.513 6.862 4.443 2.957 2.194 1.787 1.550
1.8 - 3.0 1.400 1.300 1.231 1.181 1.144 1.116 1.095

@THETA: POSTERIOR STANDARD DEVIATION:


-3.0 - -1.6 0.630 0.589 0.562 0.553 0.562 0.589 0.629 0.671
-1.4 - 0.0 0.692 0.666 0.575 0.457 0.437 0.534 0.596 0.508
0.2 - 1.6 0.387 0.343 0.382 0.474 0.582 0.675 0.748 0.803
1.8 - 3.0 0.845 0.877 0.901 0.920 0.935 0.947 0.956

MARGINAL RELIABILITY: 0.6903

ABCD(C) ABCD(D) WITH TWO C(K)S FIXED AT ZERO


OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
1.0( 0.1) 2.33 : -2.23 ( 0.63) : 2222
2.0( 0.6) 1.78 : -2.20 ( 0.63) : 2224
1.0( 0.2) 2.16 : -1.53 ( 0.58) : 2232
1.0( 0.8) 0.26 : -0.96 ( 0.56) : 2244
1.0( 0.4) 0.97 : -1.01 ( 0.58) : 2245
2.0( 1.5) 0.38 : -1.27 ( 0.57) : 2254
1.0( 0.8) 0.20 : -1.32 ( 0.58) : 2255
2.0( 0.5) 2.08 : -1.63 ( 0.49) : 2333
2.0( 0.4) 2.64 : -1.25 ( 0.54) : 2343
1.0( 0.7) 0.35 : -1.33 ( 0.51) : 2344
2.0( 1.0) 1.05 : -1.47 ( 0.47) : 2353
5.0( 1.9) 2.22 : -1.51 ( 0.45) : 2354

Many similar lines omitted here

NEGATIVE TWICE THE LOGLIKELIHOOD= 243.2


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.11 A constrained nonlinear model for multiple-choice alternatives

Thissen, Steinberg & Fitzpatrick (1989) described the use of the multiple-choice model with four
items from a nation-wide tryout of achievement test items conducted in 1987 by CTB/McGraw-
Hill. The data comprised the responses of 959 examinees that responded to four items on a single
page of one of the tryout forms. The items are included in the report by Thissen, Steinberg &
Fitzpatrick (1989).

The data for the analysis were the observed counts of examinees giving each of the 4⁴ = 256
possible response patterns to the four items. Fitting the 256-cell contingency table with the mul-
tiple-choice model with no constraints, the likelihood ratio G² with 211 d.f. was 226.0, which
indicates a satisfactory fit. However, examination of the item parameters, the trace lines, and the
items themselves lead us to impose a number of constraints on the model. Using MULTILOG
subscripts, where category 1 =“don’t know,” and the observed responses are in categories 2-5:


 For items 2, 3, and 4, we constrained dk = 0.25 for all four alternatives with >FIX
ITEMS=(2,3,4), DK=(1,2,3), VALUE=0.0; fixing all of the d-contrasts at zero makes the four
dks equal, hence 0.25 each (see the check following this list).
 For item 1, we constrained d1 = d3 = d4 with >FIX ITEMS=1, DK=(2,3), VALUE=0.0.
 For items 1 and 2, we constrained a2 = a1 with >FIX ITEMS=1, AK=1, VALUE=0.0 and
>FIX ITEMS=2, AK=1, VALUE=0.0.
 For item 2, we constrained a3 = a5 with >EQUAL ITEMS=2, WITH=2, AK=(2,4); for item
3, we constrained a3 = a4 with >EQUAL ITEMS=3, WITH=3, AK=(2,3); and for item 4, we
constrained a2 = a3 = a5 with >EQUAL ITEMS=4, WITH=4, AK=(1,2,4).
 For item 3, we constrained a2 = a5 with >EQUAL ITEMS=3, WITH=3, AK=(1,4).
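
The softmax parameterization of the ds makes the first constraint transparent (a minimal check):

import math

# With all three d-contrasts fixed at zero, d* = (0, 0, 0, 0), so each of the four
# observed categories receives the same share of the "don't know" curve
d_star = [0.0] * 4
total = sum(math.exp(x) for x in d_star)
print([math.exp(x) / total for x in d_star])   # [0.25, 0.25, 0.25, 0.25]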

These constraints reduce the number of parameters (contrasts) estimated from 44 to 26. The
goodness-of-fit statistic under all of the constraints is χ²(229) = 236.9, which is very close to
expectation. The overall test of significance of the 18 contrasts among the parameters eliminated
in these constraints is χ²(18) = 236.9 − 226.0 = 10.9.

Thus no significant differences among the trace lines have been eliminated in the imposition of
these constraints. However, the remaining parameters are much more precisely estimated and the
corresponding trace lines are smoother than those involving many parameters that are not well-
specified by the data.

On the following pages we illustrate the use of MULTILOG to compute the estimates. Note that
we increased the number of quadrature points (with the TGROUPS command) from the default 10
to 13. This increases the usefulness of the approximate standard errors. We also impose a gentle
Bayesian prior on d-contrast 1 for item 1 (the only estimated d-contrast); as with the 3PL model,
weak priors on the d-contrasts are usually helpful.

Syntax for this run, as shown below, is given in exampl11.mlg while the data file is ex-
ampl11.dat.

EXAMPL11.MLG -
"CALORIC CONSUMPTION ITEMS", TSF, JEM 89
>PROBLEM RANDOM, PATTERNS, NITEMS=4, NGROUP=1, NPATTERNS=148,
DATA=‘EXAMPL11.DAT’;
>TEST ALL, BS, NC=(5(0)4), HIGH=(3,4,5,4);
>SAVE;
>TGROUPS NUMBERS=13, QP=(-4.5(0.75)4.5);
>FIX ITEMS=1, AK=1, VA=0.0;
>FIX ITEMS=2, AK=1, VA=0.0;
>EQUAL ITEMS=2, WITH=2, AK=(2,4);
>EQUAL ITEMS=3, WITH=3, AK=(1,4);
>EQUAL ITEMS=3, WITH=3, AK=(2,3);
>EQUAL ITEMS=4, WITH=4, AK=(1,2,4);
>FIX ITEMS=1, DK=(2,3), VALUE=0.0;
>FIX ITEMS=(2,3,4), DK=(1,2,3), VALUE=0.0;
>PRIOR ITEMS=1, DK=1, PA=(0.0,1.0);
>ESTIMATE NC=100;
>END;
4


1234
2222
3333
4444
5555
(4A1,F4.0)

Selected output for this run follows.

ITEM 1: 5 NOMINAL CATEGORIES, 3 HIGH

CATEGORY(K): 1 2 3 4 5
A(K) -1.94 -1.94 1.31 0.68 1.88
C(K) 0.05 -0.29 1.17 1.74 -2.68
D(K) 0.12 0.65 0.12 0.12

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C D
CONTRAST P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.]
1 27 0.00 (0.00) 4 -0.34 (0.26) 8 1.71 (0.35)
2 1 3.26 (0.44) 5 1.12 (0.41) 28 0.00 (0.00)
3 2 2.63 (0.40) 6 1.69 (0.32) 29 0.00 (0.00)
4 3 3.83 (0.84) 7 -2.73 (1.19)

MARGINAL RELIABILITY: 0.6005

‘CALORIC CONSUMPTION ITEMS’, TSF, JEM 89

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
2.0( 3.0) -0.56 : -1.53 ( 0.60) : 2223
8.0( 6.7) 0.52 : -1.16 ( 0.70) : 2224
1.0( 2.0) -0.69 : -1.64 ( 0.58) : 2225
6.0( 6.1) -0.05 : -1.12 ( 0.61) : 2232
7.0( 6.9) 0.06 : -1.09 ( 0.60) : 2233
28.0( 22.9) 1.06 : -0.79 ( 0.53) : 2234

NEGATIVE TWICE THE LOGLIKELIHOOD= 236.4


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.12 A nominal model for testlets

In their description of the use of latent class models for the validation of the structure of knowl-
edge domains, Bergan & Stone (1985) report a number of analyses of the data in this example.
The data were collected as the responses to four items measuring the numerical knowledge of a
sample of preschool children in the Head Start program. The first two items required the children
to identify numerals (3 and 4), and the second two items required the children to match the cor-
rect numeral (again, 3 or 4) represented by a number of blocks.

In an analysis reported in Thissen & Steinberg (1988), the items were redefined as two pseudo-
items, each of which has four response categories. The first of these pseudo-items is denoted
“Identify,” which has four categories of response: correctly identifying neither numeral, only 3,
only 4, or both correct. The second pseudo-item is called “Match,” with the same four response


categories. The pseudo-items are logically equivalent to testlets described by Wainer & Kiely
(1987): They are clusters of items between which conditional independence may reasonably be
expected.

The trace line model used here is Bock’s (1972) nominal model. Equality constraints are im-
posed among the parameters: for “Identify,” a2 = a1; for “Match,” a3 = a2 and c3 = c2. Given
the use of “Triangle” T-matrices, these constraints are imposed by fixing a- and c-contrasts at
zero, because those contrasts represent the differences between successive as and cs (a small
sketch following the output illustrates this). This example also illustrates entry of starting values;
MULTILOG’s default starting values do not perform well in this example. The fit of the model is
quite good: χ² = 8.4, p = 0.2.

Syntax for this model, as shown below, is contained in the file exampl12.mlg and is based on
data in exampl12.dat. Additional examples of nominal models are given in the next two sec-
tions.

EXAMPL12.MLG -
BERGAN & STONE DATA ON PRESCHOOLERS AND ‘3 AND 4’
>PROBLEM RANDOM, PATTERNS, NITEMS=2, NGROUPS=1, NPATTERNS=16,
DATA=‘EXAMPL12.DAT’;
>TEST ALL, NOMINAL, NC=(4,4), HIGH=(4,4);
>TMATRIX ALL, AK, TRIANGLE;
>TMATRIX ALL, CK, TRIANGLE;
>FIX ITEMS=1, AK=1, VALUE=0.0;
>FIX ITEMS=2, AK=2, VALUE=0.0;
>FIX ITEMS=2, CK=2, VALUE=0.0;
>START ITEMS=(1,2), PARAMS=‘EXAMPL12.PRM’;
>END;
4
N34B
11
22
33
44
(2A1,F4.0)

Selected output is given below.

ITEM 1: 4 NOMINAL CATEGORIES, 4 HIGH


CATEGORY(K): 1 2 3 4
A(K) 0.00 0.00 1.19 2.95
C(K) 0.00 -0.86 -0.67 0.69

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[ TRI.] P(#) COEFF.[ TRI.]
1 10 0.00 (0.00) 3 0.86 (0.14)
2 1 -1.19 (0.33) 4 -0.19 (0.27)
3 2 -1.76 (0.35) 5 -1.36 (0.21)

TOTAL TEST INFORMATION


@THETA: INFORMATION:
-3.0 - -1.6 1.048 1.058 1.071 1.090 1.118 1.159 1.220 1.311
-1.4 - 0.0 1.448 1.650 1.940 2.335 2.819 3.315 3.681 3.775
0.2 - 1.6 3.566 3.152 2.678 2.245 1.894 1.629 1.437 1.301
1.8 - 3.0 1.206 1.141 1.097 1.066 1.045 1.031 1.021

MARGINAL RELIABILITY: 0.5409

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
71.0( 70.2) 0.09 : -0.99 ( 0.70) : 11
34.0( 32.8) 0.20 : -0.84 ( 0.67) : 12
30.0( 32.8) -0.50 : -0.84 ( 0.67) : 13
38.0( 37.1) 0.15 : -0.18 ( 0.58) : 14
30.0( 29.6) 0.07 : -0.99 ( 0.70) : 21
13.0( 13.9) -0.23 : -0.84 ( 0.67) : 22
15.0( 13.9) 0.31 : -0.84 ( 0.67) : 23
15.0( 15.6) -0.16 : -0.18 ( 0.58) : 24
13.0( 15.1) -0.53 : -0.49 ( 0.61) : 31
4.0( 8.3) -1.48 : -0.37 ( 0.59) : 32
15.0( 8.3) 2.34 : -0.37 ( 0.59) : 33
19.0( 19.4) -0.09 : 0.22 ( 0.58) : 34
43.0( 42.1) 0.15 : 0.11 ( 0.58) : 41
30.0( 28.0) 0.37 : 0.22 ( 0.58) : 42
25.0( 28.0) -0.57 : 0.22 ( 0.58) : 43
197.0( 196.9) 0.01 : 0.91 ( 0.69) : 44

NEGATIVE TWICE THE LOGLIKELIHOOD= 8.4


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)
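
Because the “Triangle” contrasts are the differences between successive parameters, the printed
a(k) and c(k) for item 1 can be rebuilt from the contrasts by cumulative subtraction (a minimal
sketch of our reading of the parameterization, with category 1 as the reference fixed at zero):

a_contrasts = [0.00, -1.19, -1.76]    # printed A contrasts for item 1
c_contrasts = [0.86, -0.19, -1.36]    # printed C contrasts for item 1

def from_triangle(contrasts):
    values = [0.0]                     # category 1 is the reference
    for t in contrasts:
        values.append(values[-1] - t)  # contrast k is value(k) - value(k+1)
    return [round(v, 2) for v in values]

print(from_triangle(a_contrasts))      # [0.0, 0.0, 1.19, 2.95]
print(from_triangle(c_contrasts))      # [0.0, -0.86, -0.67, 0.69]

Fixing a triangle contrast at zero therefore forces the two adjacent parameters to be equal, which
is how the constraints listed above are imposed.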

12.13 A constrained nominal model for questionnaire items

This example illustrates the computations involved in the analysis of the “life satisfaction” data
described by Thissen & Steinberg (1988). The data consist of the counts of respondents in a 33
cross-classification based on the responses of 1472 respondents to the 1975 General Social Sur-
vey (Davis, 1975), to three questions concerning satisfaction with family (F), hobbies (H), and
residence (R). In the original data, there were seven responses available. In previous analyses,
Clogg (1979) re-classified the data into three categories, and Masters (1985) used the trichoto-
mized data. Better data analysis would probably be obtained with the original seven-category
data, or at least a more sensible reduction; Muraki (1984), for instance, used a different four-
category system for the same seven responses. However, the analysis illustrated here corresponds
to that described by Thissen & Steinberg (1988) and uses the trichotomized data.

In this illustration, we again use Bock’s (1972) nominal model. This model for the trace lines is
extremely flexible; however, it is frequently too flexible and some additional constraints on the
item parameters are required to give a satisfactory solution. When fitted without constraints, for
item F, the difference between a1 and a2 is nearly zero. For items H and R, the difference be-
tween a1 and a2 is small and similar; and the difference between a1 and a3 is about the same for
all three items. In this example, we impose equality constraints to make these small differences
exactly zero. Using the (default) deviation contrasts, this is done with >FIX ITEMS=1, AK=1,
VALUE=0.0 [to set a1 = a2 for item 1], >EQUAL ITEMS=(2,3), AK=1 [to set (a1 − a2) equal for
items 2 and 3], and >EQUAL ITEMS=(1,2,3), AK=2 [to set (a1 − a3) equal for all three items].
Imposing these equality constraints gives a version of the nominal model that (barely) fits:
χ²(18) = 28.3, p = 0.06.

Additional examples of nominal models are given in Sections 12.10 and 12.12. The contents of
the command file exampl13.mlg are shown below.

EXAMPL13.MLG:
SATISFACTION DATA FOR THE PARAMETERS IN TABLE 7, T&S 88
>PROBLEM RANDOM, PATTERNS, NITEMS=3, NGROUP=1, NPATTERNS=27,
DATA=‘EXAMPL13.DAT’;
>TEST ALL, NOMINAL, NC=(3,3,3), HIGH=(3,3,3);
>FIX ITEMS=1, AK=1, VALUE=0.0;
>EQUAL ITEMS=(2,3), AK=1;
>EQUAL ITEMS=(1,2,3), AK=2;
>SAVE;
>END;
3
123
111
222
333
(1X,3A1,F4.0)

Selected output is given below.

ITEM 1: 3 NOMINAL CATEGORIES, 3 HIGH


CATEGORY(K): 1 2 3
A(K) -0.52 -0.52 1.04
C(K) -1.37 -0.33 1.69

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.]
1 9 0.00 (0.00) 2 1.04 (0.12)
2 1 1.55 (0.08) 3 3.06 (0.14)

MARGINAL RELIABILITY: 0.4670

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN


RES. : :
15.0( 6.5) 3.31 : -1.53 ( 0.72) : 111
11.0( 11.1) -0.03 : -1.33 ( 0.70) : 112
7.0( 4.6) 1.12 : -0.78 ( 0.68) : 113
3.0( 9.2) -2.04 : -1.33 ( 0.70) : 121
12.0( 16.9) -1.20 : -1.13 ( 0.69) : 122

NEGATIVE TWICE THE LOGLIKELIHOOD= 28.3

(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.14 A constrained generalized partial credit model

In this example, we consider the responses of 3866 examinees to a 4-passage, 22-item test of
reading comprehension. For a complete description of the data and the analysis, see Thissen,
Steinberg & Mooney (1989). The reading passages were of varying lengths, and they were fol-

762
12 MULTILOG EXAMPLES

lowed by varying numbers of questions about their content, from three to eight questions. Instead
of considering the test to be comprised of 22 binary items, we considered it to be made up of four
testlets (Wainer & Kiely, 1987). Each testlet has q questions (q = 7, 4, 3, 8), and the four testlet
responses for each examinee are the number of questions correct for each of the four passages.
Thus the seven questions following the first passage constitute a single testlet, with responses
x=0, 1, 2, …, 7.

The model we used for the number-correct for each passage was the nominal model (Bock,
1972). We reparameterized the model using centered polynomials of the associated scores to rep-
resent the category-to-category change in the aks and cks (with TMATRIX … POLYNOMIAL). This-
sen & Steinberg (1986) showed that the polynomial-contrast version of the nominal model is
equivalent to Masters’ (1982) “partial credit” model for ordered item responses when the con-
trasts among the as are restricted to be linear, and constant for all items. We did not expect that
such a simple model would fit the data; for instance, we did not expect a priori that the testlets
would be equally related to proficiency, so we permitted the linear contrast among the as to vary
over items. Guessing may cause a score of one on a multi-question passage to reflect little more
proficiency than a score of zero, but higher scores should be more ordered. The linear-plus-quad-
ratic polynomial for the a-contrasts was intended to produce as that may be similar for scores of
zero and one, and increasing for higher scores. The polynomial parameterization for the cs is in-
tended to capture the smoothness in the distribution of response proportions for adjacent scores.

To improve the stability of estimation of the item parameters, we located the lowest-degree
polynomials, which provided a satisfactory fit to the data. We used the likelihood ratio statistics
to evaluate the models. For the unconstrained nominal model twice the negative log likelihood
was 1048.3 (this is not distributed as χ 2 with any clear degrees of freedom; only 652 of the 1440
cells of the 4-way contingency table are non-zero). Upon reducing the rank of the polynomials
for the as to one (linear in number-correct) for testlets 1, 3, and 4, and to two (quadratic in num-
ber-correct) for testlet 2, we obtained a value of 1082.2; the likelihood ratio test for the signifi-
cance of this reduction is χ 2 (17) = 33.9, p = 0.01. While this value is significant, it is not
highly significant given the sample size (3866). No individual term among those eliminated was
extremely significant. The significance arose from moderately large χ 2s for two or three rela-
tively high-order polynomial terms (e.g., χ 2s of about 5 for fourth- and seventh-degree terms).
Upon finding that any further reduction in the rank of the a-parameterization induced a highly
significant change in the goodness of fit, we settled on linear as for testlets 1, 3, and 4 and quad-
ratic as for testlet 2.
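
The arithmetic of such a comparison is easy to reproduce outside MULTILOG. The following
minimal Python sketch (not part of MULTILOG; it assumes only the two deviance values quoted
above) computes the likelihood ratio statistic and its p-value:

from scipy.stats import chi2

# -2 log likelihood values quoted above for the two nested models
deviance_full = 1048.3       # unconstrained nominal model
deviance_reduced = 1082.2    # reduced-rank (linear/quadratic) model
df = 17                      # number of a-contrasts eliminated

g2 = deviance_reduced - deviance_full    # 33.9
print(g2, chi2.sf(g2, df))               # p is approximately 0.01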

Using the reduced-rank as, we then reduced the rank of the polynomials for the cs to {3, 3, 2, 4}
for the four testlets; χ 2 (10) = 12.0, p = 0.3 for the ten high-order polynomial terms eliminated.
Any further reduction caused a highly significant change in the goodness-of-fit. On the following
pages, we fit the model to the 8 x 5 x 4 x 9 cross-classification of observed response-pattern fre-
quencies, with constraints imposed to give the final model.

The data are given in exampl14.dat and the syntax for this run (exampl14.mlg) is given below.

EXAMPL14.MLG -
READING COMPREHENSION AS 4 TESTLETS, FINAL MODEL, TSM 89 JEM
>PROBLEM RANDOM, PATTERNS, NITEMS=4, NGROUPS=1, NPATTERNS=652,
DATA=‘EXAMPL14.DAT’;
>TEST ALL, NOMINAL, NC=(8,5,4,9), HIGH=(8,5,4,9);
>SAVE;
>TMATRIX ALL, AK, POLYNOMIAL;
>TMATRIX ALL, CK, POLYNOMIAL;
>FIX ITEMS=1, AK=(2(1)7), VALUE=0.0;
>FIX ITEMS=2, AK=(3,4), VALUE=0.0;
>FIX ITEMS=3, AK=(2,3), VALUE=0.0;
>FIX ITEMS=4, AK=(2(1)8), VALUE=0.0;
>FIX ITEMS=1, CK=(4,5,6,7), VALUE=0.0;
>FIX ITEMS=2, CK=4, VALUE=0.0;
>FIX ITEMS=3, CK=3, VALUE=0.0;
>FIX ITEMS=4, CK=(5,6,7,8), VALUE=0.0;
>END;
9
123456789
1111
2222
3333
4444
5505
6006
7007
8008
0009
(4A1,F5.0)

Selected output is given below.

READING COMPREHENSION AS 4 TESTLETS, FINAL MODEL, TSM 89 JEM

ITEM 1: 8 NOMINAL CATEGORIES, 8 HIGH


CATEGORY(K): 1 2 3 4 5 6 7 8
A(K) -1.94 -1.39 -0.83 -0.28 0.28 0.83 1.39 1.94
C(K) -4.00 -1.68 0.14 1.39 2.00 1.90 1.01 -0.75

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[POLY.] P(#) COEFF.[POLY.]
1 1 0.56 (0.02) 2 0.50 (0.03)
2 18 0.00 (0.00) 3 -0.68 (0.02)
3 19 0.00 (0.00) 4 -0.05 (0.02)
4 20 0.00 (0.00) 24 0.00 (0.00)
5 21 0.00 (0.00) 25 0.00 (0.00)
6 22 0.00 (0.00) 26 0.00 (0.00)
7 23 0.00 (0.00) 27 0.00 (0.00)

ITEM 2: 5 NOMINAL CATEGORIES, 5 HIGH


CATEGORY(K): 1 2 3 4 5
A(K) -0.88 -0.64 -0.20 0.44 1.28
C(K) -1.71 -1.21 0.05 1.25 1.62

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[POLY.] P(#) COEFF.[POLY.]
1 5 0.54 (0.03) 7 0.91 (0.04)
2 6 0.12 (0.02) 8 -0.03 (0.03)
3 35 0.00 (0.00) 9 -0.16 (0.02)
4 36 0.00 (0.00) 37 0.00 (0.00)

MARGINAL RELIABILITY: 0.6570

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
1.0( 0.2) 1.77 : -2.89 ( 0.64) : 1111
1.0( 0.2) 2.11 : -2.57 ( 0.63) : 1121
1.0( 0.4) 1.02 : -2.05 ( 0.60) : 1123
1.0( 0.1) 3.95 : -1.31 ( 0.57) : 1135
1.0( 0.3) 1.29 : -2.26 ( 0.61) : 1213

NEGATIVE TWICE THE LOGLIKELIHOOD= 1094.2


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.15 A mixed nominal and graded model for self-report inventory items

In research concerned with eating disorders among college women, Irving (1987) used a ques-
tionnaire called the BULIT, a 36-item index created to identify individuals with, or at risk for
developing, bulimia (Smith & Thelen, 1984). All of the items on the scale have five response al-
ternatives; most are “Likert-type” items. The questionnaire was developed to be scored by add-
ing the numbers (from 1 to 5) associated with each response; high scores imply high risk. But
the BULIT also includes items for which the responses are not so obviously ordered.

In this example, we illustrate the use of MULTILOG to fit different models to different items of
the same scale, as described by Thissen (1991). We use Bock’s (1972) nominal model for item 1
of exampl15.dat, while we use Samejima’s (1969) graded model for items 2 and 3. For the 5 x 5
x 5 table arising from the cross-classification based on the three items described by Thissen
(1991), the graded model for items 2 and 3 and the nominal model for item 1 give
χ 2 (108) = 99.9, p = 0.6.
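
As a reminder of what the graded model fits here, the sketch below (plain Python, with the item 2
estimates taken from the output later in this section) evaluates Samejima's category probabilities
as differences of adjacent 2PL curves:

import numpy as np

def graded_probs(theta, a, b):
    # P(X = k | theta), k = 1..len(b)+1, under Samejima's graded model
    p_star = 1.0 / (1.0 + np.exp(-a * (theta - np.asarray(b, float))))
    p_star = np.concatenate(([1.0], p_star, [0.0]))   # P(X >= k) curves
    return p_star[:-1] - p_star[1:]

# Item 2 of exampl15: a = 2.47, thresholds b = (0.22, 0.98, 1.67, 2.49)
print(graded_probs(0.0, 2.47, [0.22, 0.98, 1.67, 2.49]).round(3))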

Syntax for this run, from the file exampl15.mlg, is shown below.

EXAMPL15.MLG -
HYBRID GRADED-NOMINAL SET OF ITEMS FROM THE BULIT
>PROBLEM RANDOM, PATTERNS, NITEMS=3, NGROUP=1, NPATTERNS=69,
DATA=‘EXAMPL15.DAT’;
>TEST ITEMS=1, NOMINAL, NC=5, HIGH=5;
>TEST ITEMS=(2,3), GRADED, NC=(5,5);
>FIX ITEMS=1, AK=(1,3), VALUE=0.0;
>END;
5
12345
111
222
333
444
555
(1X,3A1,F5.0)

Selected output is shown below.

ITEM 1: 5 NOMINAL CATEGORIES, 5 HIGH


CATEGORY(K): 1 2 3 4 5
A(K) -0.39 -0.39 0.24 -0.39 0.94
C(K) 2.02 -1.49 -0.26 0.57 -0.83

CONTRAST-COEFFICIENTS (STANDARD ERRORS)


FOR: A C
CONTRAST P(#) COEFF.[ DEV.] P(#) COEFF.[ DEV.]
1 17 0.00 (0.00) 3 -3.51 (0.34)
2 1 0.63 (0.26) 4 -2.28 (0.26)
3 18 0.00 (0.00) 5 -1.45 (0.13)
4 2 1.33 (0.26) 6 -2.85 (0.31)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.006 0.007 0.008 0.010 0.012 0.014 0.017 0.020
-1.4 - 0.0 0.024 0.029 0.035 0.042 0.051 0.061 0.029 0.090
0.2 - 1.6 0.108 0.130 0.155 0.183 0.214 0.247 0.280 0.312
1.8 - 3.0 0.339 0.360 0.373 0.375 0.368 0.351 0.326

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2 3 4 5
OBS. FREQ. 302 9 34 71 32
OBS. PROP. 0.6741 0.0201 0.0759 0.1585 0.0714
EXP. PROP. 0.6740 0.0201 0.0759 0.1584 0.0716

ITEM 2: 5 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 7 2.47 (0.24)
B( 1) 8 0.22 (0.07)
B( 2) 9 0.98 (0.08)
B( 3) 10 1.67 (0.12)
B( 4) 11 2.49 (0.22)

TOTAL TEST INFORMATION


@THETA: INFORMATION:
-3.0 - -1.6 1.049 1.073 1.109 1.162 1.240 1.350 1.502 1.699
-1.4 - 0.0 1.940 2.210 2.489 2.769 3.058 3.373 2.210 4.008
0.2 - 1.6 4.226 4.348 4.407 4.432 4.409 4.321 4.177 3.998
1.8 - 3.0 3.803 3.619 3.466 3.315 3.103 2.802 2.450

MARGINAL RELIABILITY: 0.6768

OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
72.0( 76.9) -0.56 : -1.11 ( 0.68) : 111
47.0( 47.8) -0.11 : -0.50 ( 0.54) : 112
36.0( 33.5) 0.43 : -0.19 ( 0.53) : 113
15.0( 14.3) 0.20 : 0.05 ( 0.54) : 114
10.0( 12.5) -0.70 : 0.26 ( 0.60) : 115

NEGATIVE TWICE THE LOGLIKELIHOOD= 99.9


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.16 A mixed three-parameter logistic and partial credit model for a 26-item test

In this example, we illustrate the use of MULTILOG for item analysis for a test comprising 26
conventional multiple-choice items (scored dichotomously: correct or incorrect), and a 27th item
with three response categories. We use the 3PL model for items 1-26, and Bock’s (1972) nomi-
nal model (with constraints making it equivalent to Masters’ partial credit model) for item 27.
Note that, as in the previous section, the specification of two distinct item response models is
done with two TEST commands.

In this example, we use Bayesian prior distributions for all three parameters of the 3PL model:
we assume that the slopes (as) are distributed normally with an average value of 1.7 (equal to a
slope of 1.0 in the usual “normal metric” of the 3PL) and a standard deviation of 1. We assume
that the bs are distributed normally with mean zero and standard deviation 2 (this serves only to
limit the bs for very easy or very difficult items); and we assume that the logit of the lower as-
ymptote is normally distributed with an average of –1.4 and a standard deviation of 0.5. The
TMATRIX commands establish the partial credit parameterization for item 27.
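
A minimal sketch of the model and priors just described may make the specification concrete.
This is illustrative only (it is not how MULTILOG computes its estimates), and it assumes the
usual 1.7 scaling between the normal and logistic metrics:

import numpy as np
from scipy.stats import norm

D = 1.7   # constant relating the "normal metric" slope to the logistic slope

def p3pl(theta, a, b, c):
    # 3PL trace line, with a in the normal metric reported by MULTILOG
    return c + (1.0 - c) / (1.0 + np.exp(-D * a * (theta - b)))

def log_prior(a, b, c):
    # priors as specified above: D*a ~ N(1.7, 1), b ~ N(0, 2),
    # logit(c) ~ N(-1.4, 0.5)
    return (norm.logpdf(D * a, 1.7, 1.0)
            + norm.logpdf(b, 0.0, 2.0)
            + norm.logpdf(np.log(c / (1.0 - c)), -1.4, 0.5))

# item 1 estimates from the output below: a = 0.78, b = -1.28, c = 0.21
print(round(p3pl(0.0, 0.78, -1.28, 0.21), 3))
print(round(log_prior(0.78, -1.28, 0.21), 3))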

Using MULTILOG, there is no problem combining item response models to analyze and score
items with different kinds of responses on the same test. The data file exampl16.dat is used in
this example and the command file (exampl16.mlg) is shown below.

EXAMPL16.MLG -
MIXTURE OF 26 3PL ITEMS AND ONE PARTIAL CREDIT ITEM
>PROBLEM RANDOM, INDIVIDUAL, NITEMS=27, NGROUP=1, NEXAMINEES=668,
DATA=‘EXAMPL16.DAT’;
>TEST ITEMS=(1(1)26), L3;
>TEST ITEMS=27, NOMINAL, NC=3, HIGH=3;
>PRIORS ITEMS=(1(1)26), AJ, PARAMS=(1.7,1.0);
>PRIORS ITEMS=(1(1)26), BJ, PARAMS=(0.0,2.0);
>PRIORS ITEMS=(1(1)26), CJ, PARAMS=(-1.4,0.5);
>TMATRIX ITEMS=27, AK, POLYNOMIAL;
>TMATRIX ITEMS=27, CK, TRIANGLE;
>FIX ITEMS=27, AK=2, VALUE=0.0;
>SAVE ;
>END;
5
01239
111111111111111111111111110
222222222222222222222222221
000000000000000000000000002
000000000000000000000000003
000000000000000000000000000
(12A1,2X,15A1)

Selected output follows.

ITEM 1: 2 NOMINAL CATEGORIES, 2 HIGH


TRADITIONAL 3PL, NORMAL METRIC: A B C
0.78 -1.28 0.21

ITEM 26: 2 NOMINAL CATEGORIES, 2 HIGH


TRADITIONAL 3PL, NORMAL METRIC: A B C
0.72 1.21 0.21

ITEM 27: 3 NOMINAL CATEGORIES, 3 HIGH


CATEGORY(K): 1 2 3
A(K) -2.09 0.00 2.09
C(K) 0.00 0.59 -0.93

TOTAL TEST INFORMATION


@THETA: INFORMATION:
-3.0 - -1.6 1.127 1.184 1.267 1.383 1.547 1.776 2.091 2.515
-1.4 - 0.0 3.071 3.774 4.634 5.656 6.843 8.179 3.774 11.023
0.2 - 1.6 12.245 13.022 13.097 12.418 11.228 9.867 8.568 7.426
1.8 - 3.0 6.452 5.612 4.862 4.177 3.561 3.028 2.590

MARGINAL RELIABILITY: 0.8453

NEGATIVE TWICE THE LOGLIKELIHOOD= 2629.5


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

12.17 Equivalent groups equating of two forms of a four-item personality inventory

In an attempt to link the study of social norms and the study of personality, Stouffer & Toby
(1951) devised three forms of a questionnaire designed to measure a personality disposition to-
ward “particularistic” (as opposed to “universalistic”) solutions to social-role conflicts. Form A
of their questionnaire consisted of four vignettes designed to invoke social role conflict and the
items elicited particularistic or universalistic responses. The four items are reproduced by This-
sen & Steinberg (1988), along with a discussion of the data analysis in this example. In Form B,
the stories were worded so that a friend of the respondent faced the role conflict and items meas-
ured expectations for particularistic or universalistic actions on the part of friends.

Here, we consider the fit of the 2PL model to these data. In the data, the item responses for Form
A are in columns 3-6 (as items 1-4), and the item responses for Form B are in columns 7-10 (as
items 5-8). The trace lines have been fitted with the constraint that the slopes are the same for a
given item on the two forms [using >EQUAL AJ, ITEMS=(5,6,7,8), WITH=(1,2,3,4)], but the
thresholds may vary between forms. The respondents were randomly assigned to the different
forms; therefore we constrained the population means of the two groups to be equal. Because the
mean for group 2 is fixed at zero as an identifiability constraint, this is done by fixing the mean
for group 1 at zero as well. The model fits the data adequately; the goodness-of-fit likelihood ra-
tio statistic is 21.9 on 18 d.f., p = 0.2.

In the output, note that when there are two (or more) groups, MULTILOG prints the observed
frequencies and proportions in each response category for the entire sample, but the expected
proportions are computed separately for each group.

The command file exampl17.mlg, using the data file exampl17.dat, is given below.

EXAMPL17.MLG - STOUFER-TOBY, FORMS A&B,
MODEL FOR PARAMETERS IN TABLE 2, T&S 88
>PROBLEM RANDOM, PATTERNS, NITEMS=8, NGROUP=2, NPATTERNS=32,
DATA=‘EXAMPL17.DAT’;
>TEST ALL, L2;
>EQUAL AJ, ITEMS=(5,6,7,8), WITH=(1,2,3,4);
>FIX MU, GROUPS=1, VALUE=0.0;
>END;
2
+-
++++++++
N
(I1,1X,8A1,F3.0)

Selected output is given below.

ITEM SUMMARY

ITEM 1: 2 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 1 1.26 (0.16)
B( 1) 2 1.37 (0.19)

TOTAL TEST INFORMATION

FOR GROUP 1:
@THETA: INFORMATION:
-3.0 - -1.6 1.238 1.309 1.402 1.522 1.675 1.868 2.108 2.399
-1.4 - 0.0 2.745 3.148 3.602 4.095 4.590 5.019 3.148 5.316
0.2 - 1.6 5.087 4.667 4.168 3.684 3.262 2.915 2.632 2.397
1.8 - 3.0 2.196 2.020 1.865 1.728 1.608 1.503 1.414

MARGINAL RELIABILITY: 0.7221

TOTAL TEST INFORMATION

FOR GROUP 2:
@THETA: INFORMATION:
-3.0 - -1.6 1.238 1.309 1.402 1.522 1.675 1.868 2.108 2.399
-1.4 - 0.0 2.745 3.148 3.602 4.095 4.590 5.019 3.148 5.316
0.2 - 1.6 5.087 4.667 4.168 3.684 3.262 2.915 2.632 2.397
1.8 - 3.0 2.196 2.020 1.865 1.728 1.608 1.503 1.414

MARGINAL RELIABILITY: 0.7221

Note: In this situation, MULTILOG “thinks” there are eight items when, in fact, each respon-
dent answered only four. MULTILOG computes TOTAL TEST INFORMATION and MARGINAL
RELIABILITY assuming that each respondent answered all (eight) items; as a result, these values
are not correct for the four-item tests that were actually administered. MULTILOG cannot know
the difference between real “missing data” and this kind of artificial “missing data.” In situations
like this, the TOTAL TEST INFORMATION and MARGINAL RELIABILITY values printed cannot be
used.
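
The reason is the additivity of Fisher information over items. The sketch below (plain Python;
only item 1's parameters are taken from the output above, the other values are invented for
illustration) shows how summing information over eight “items” overstates the precision of a
four-item form. The printed totals also appear to include 1.0 contributed by the unit-normal
population distribution, as their values at extreme theta suggest:

import numpy as np

def info_2pl(theta, a, b):
    # Fisher information of a 2PL item at theta
    p = 1.0 / (1.0 + np.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

a = np.array([1.26, 1.00, 1.10, 0.90])   # item 1's values are from the
b = np.array([1.37, 0.50, 0.80, 1.00])   # output; the rest are invented

four_items = 1.0 + info_2pl(0.0, a, b).sum()       # one four-item form
eight_items = 1.0 + 2 * info_2pl(0.0, a, b).sum()  # the inflated 8-item sum
print(four_items, eight_items)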

GROUP 1
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
20.0( 21.9) -0.42 : 1.27 ( 0.71) : 22220000
9.0( 8.9) 0.04 : 0.83 ( 0.66) : 21220000
6.0( 4.0) 0.99 : 0.31 ( 0.61) : 22120000
2.0( 1.7) 0.24 : 0.57 ( 0.63) : 22210000
2.0( 1.3) 0.66 : 0.21 ( 0.61) : 21210000
...
NEGATIVE TWICE THE LOGLIKELIHOOD= 11.6
(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

GROUP 2
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :
20.0( 24.8) -0.96 : 1.23 ( 0.71) : 00002222
23.0( 17.4) 1.34 : 0.79 ( 0.66) : 00002122
4.0( 4.1) -0.03 : 0.27 ( 0.61) : 00002212
3.0( 1.9) 0.77 : 0.53 ( 0.63) : 00002221
3.0( 2.5) 0.32 : 0.18 ( 0.60) : 00002121

NEGATIVE TWICE THE LOGLIKELIHOOD= 10.3
(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

TOTAL, NEGATIVE TWICE THE LOGLIKELIHOOD, ALL GROUPS= 21.9

12.18 Differential item functioning (DIF) analysis of eight items from the 100-Item Spelling Test

Thissen, Steinberg & Wainer (1993) illustrated the application of a number of likelihood-based
procedures for the detection of Differential Item Functioning (DIF) using a set of data derived
from a conventional orally-administered spelling test, with data obtained from 659 undergradu-
ates at the University of Kansas. A description of these data is given in Section 2.4.1.

The reference group included the male students (N = 285), and the focal group was made up of
the female students (N = 374). The original test had 100 words, but only four (infidelity, pano-
ramic, succumb, and girder) are used here. The words infidelity, panoramic, and succumb were
selected to comprise an “anchor” (a set of items believed to involve no DIF) with information
over a range of the θ -continuum. The word girder is the “studied” item. It was selected because
it shows substantial differential difficulty for the two groups in these data.

Thissen, Steinberg & Wainer (1993) included (in an appendix) a description of the procedures
followed to compute the estimates using MULTILOG version 5. In this section, the same analy-
sis is reproduced using version 7. The item responses for the males are read as items 1-4, and
those for the females as items 5-8.

Syntax from the file exampl18.mlg is given below. This analysis is based on the data in ex-
ampl18.dat.

EXAMPL18.MLG -‘GIRDER’ DIF
3-ITEM ANCHOR (ITEMS 5,4,25), 1PL
>PROBLEM RANDOM, PATTERNS, NITEMS=8, NGROUP=2, NPATTERNS=32,
DATA=‘EXAMPL18.DAT’;
>TEST ALL, L1;
>EQUAL BJ, ITEMS=(5,6,7), WITH=(1,2,3);
>END;
2
01
11111111
N
(I1,8A1,F3.0)

Selected output follows.

ITEM 1: 2 GRADED CATEGORIES


P(#) ESTIMATE (S.E.)
A 6 1.25 (0.08)
B( 1) 1 -1.34 (0.13)

@THETA: INFORMATION: (Theta values increase in steps of 0.2)


-3.0 - -1.6 0.155 0.187 0.222 0.259 0.296 0.331 0.360 0.380
-1.4 - 0.0 0.390 0.387 0.373 0.349 0.318 0.282 0.387 0.208
0.2 - 1.6 0.174 0.143 0.117 0.094 0.076 0.060 0.048 0.038
1.8 - 3.0 0.030 0.023 0.018 0.014 0.011 0.009 0.007

OBSERVED AND EXPECTED COUNTS/PROPORTIONS IN


CATEGORY(K): 1 2
OBS. FREQ. 70 215
OBS. PROP. 0.2456 0.7544

GROUP 1:
EXP. PROP. 0.2093 0.7907

GROUP 2:
EXP. PROP. 0.2135 0.7865

ITEM 9: GRP1, N[MU: 0.03 SIGMA: 1.00]


P(#);(S.E.): 7; (0.07) 9; (0.00)

TOTAL TEST INFORMATION

FOR GROUP 1:
@THETA: INFORMATION:
-3.0 - -1.6 1.556 1.683 1.831 1.999 2.184 2.383 2.589 2.792
-1.4 - 0.0 2.985 3.158 3.301 3.409 3.477 3.504 3.158 3.439
0.2 - 1.6 3.354 3.239 3.099 2.939 2.763 2.577 2.388 2.202
1.8 - 3.0 2.024 1.860 1.714 1.585 1.475 1.383 1.306

MARGINAL RELIABILITY: 0.6651

TOTAL TEST INFORMATION

FOR GROUP 2:
@THETA: INFORMATION:
-3.0 - -1.6 1.556 1.683 1.831 1.999 2.184 2.383 2.589 2.792
-1.4 - 0.0 2.985 3.158 3.301 3.409 3.477 3.504 3.158 3.439
0.2 - 1.6 3.354 3.239 3.099 2.939 2.763 2.577 2.388 2.202
1.8 - 3.0 2.024 1.860 1.714 1.585 1.475 1.383 1.306

MARGINAL RELIABILITY: 0.6658

GROUP 1
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :

29.0( 32.3) -0.57 : -1.30 ( 0.72) : 00001111


7.0( 7.3) -0.12 : -0.68 ( 0.69) : 00001112
50.0( 50.4) -0.05 : -0.68 ( 0.69) : 00002111
30.0( 24.3) 1.16 : -0.09 ( 0.68) : 00002112
15.0( 17.9) -0.69 : -0.68 ( 0.69) : 00001211
...

NEGATIVE TWICE THE LOGLIKELIHOOD= 14.7


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)

‘GIRDER’ DIF; 3-ITEM ANCHOR (ITEMS 5,4,25), 1PL

GROUP 2
OBSERVED(EXPECTED) STD. : EAP (S.D.) : PATTERN
RES. : :

22.0( 20.7) 0.30 : -1.40 ( 0.71) : 11110000


10.0( 10.4) -0.14 : -0.80 ( 0.68) : 11120000
30.0( 28.0) 0.37 : -0.80 ( 0.68) : 21110000
27.0( 29.6) -0.48 : -0.22 ( 0.68) : 21120000
13.0( 10.0) 0.96 : -0.80 ( 0.68) : 12110000
...

NEGATIVE TWICE THE LOGLIKELIHOOD= 10.5


(CHI-SQUARE FOR SEVERAL TIMES MORE EXAMINEES THAN CELLS)
TOTAL, NEGATIVE TWICE THE LOGLIKELIHOOD, ALL GROUPS= 25.2

12.19 Individual scores for a skeletal maturity scale based on graded ratings of ossification sites in the knee

Roche, Wainer, & Thissen (1975) calibrated 34 “indicators” (items) of skeletal maturity using
Samejima’s (1969) graded model; a description of the model and methods used is in Chapter V
of that volume. The parameters estimated for the males are used here to “score” (estimate θ =
skeletal age) using the following data in the file exampl19.dat:

40 1 0.5 2112111111112111111111111111111111
33 1 1.0 3113211111112122111111111111111111
33 1 2.0 4333211111113122111111111111111111
29 1 3.0 4543211111113122111111011111111111
8 1 5.0 5553211211114222112111111111323111
10 1 6.0 5553211211115322121111121111323011
23 1 7.0 5553211311115322111001111111323011
26 1 8.0 5553212221115322221001221111423211
35 1 9.0 5553211321115422222111211111423111
10 1 12.0 5553212321115522222111222111523021
23 1 14.0 5553210320115522022100220011523121
24 1 16.0 5553222323105522222222222221523222
46 1 18.0 5553222323025522222202222202523224

The parameters for the 34 indicators are in a file called exampl19.prm. This file was produced
by MULTILOG in a (previous) calibration run. Note that the parameters in the Roche et al.
(1975) table (in which the thresholds are called τ and the slopes are called d) are in years, instead
of the usual standard units, so the results appear in years.

The MULTILOG command file includes instructions to SCORE INDIVIDUAL data on the PROBLEM
command, as well as to use no population distribution, because RWT skeletal ages are not nor-
mally computed using a population distribution. We also use CRITERION, which instructs
MULTILOG to read the chronological age of each individual to use as a starting value for the
iterative modal estimation procedure. The first ten characters (NCHARS=10) on each record are
read as an identification field; using T-format, the age in that field is also read later as the
CRITERION. The “test” has varying numbers of response categories for the 34 indicators, which
are entered in the NC list on the TEST command. The command file exampl19.mlg is shown be-
low. To see how to generate this command file using the syntax wizard, see Section 4.3.1.

EXAMPL19.MLG -
ESTIMATION OF SKELETAL MATURITY BY THE RWT (1975) METHOD
>PROBLEM SCORE, INDIVIDUAL, CRITERION, NEXAMINEES=13, NITEMS=34, NCHARS=10,
DATA=‘EXAMPL19.DAT’;
>TEST ALL, GRADED,
NC=(5,5,5,3,2,2,2,3,2,3,3,3,5,5,2(0)12,3,3,5,2,3,2,2,4);
>START ALL, FORMAT, PARAM=‘EXAMPL19.PRM’;
>SAVE;
>END;
6
123450
1111111111111111111111111111111111
2222222222222222222222222222222222
3333333333333333333333333333333333
4444444444444444444444444444444444
5555555555555555555555555555555555
0000000000000000000000000000000000
(10A1,1X,34A1,T7,F4.0)

The RWT estimates of skeletal age are modal estimates of θ , labeled THETAHAT on the last page
of the MULTILOG output. Their estimated standard errors, the number of iterations required to
compute each, and the contents of the ID field are also printed there. When using MULTILOG,
modal estimates of θ are always computed in this way, in a subsequent run after the item pa-
rameters have been estimated. Frequently, several item analysis runs are required with a set of
item-response data before a satisfactory set of item parameters is obtained; only then is it useful
to score the individuals. Selected output for this run follows.

ESTIMATION OF SKELETAL MATURITY BY THE RWT (1975) METHOD

SCORING DATA...

THETAHAT S.E. ITER ID FIELD


-2.306 0.432 5 40 1 0.5
-1.730 0.386 3 33 1 1.0
-1.352 0.361 4 33 1 2.0
-1.243 0.367 4 29 1 3.0
-0.507 0.328 4 8 1 5.0
-0.276 0.331 4 10 1 6.0
-0.340 0.346 4 23 1 7.0
0.376 0.337 3 26 1 8.0
0.155 0.330 4 35 1 9.0
0.814 0.358 3 10 1 12.0
1.012 0.397 3 23 1 14.0
1.963 0.422 4 24 1 16.0
2.383 0.465 6 46 1 18.0
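
Modal estimation of this kind is easy to sketch. The fragment below is not the RWT procedure
itself; the two items and their parameters are invented for illustration. It maximizes a graded-
model likelihood in θ for a single examinee, starting the search near a criterion value, as
MULTILOG does here with the CRITERION option:

import numpy as np
from scipy.optimize import minimize_scalar

def graded_logL(theta, responses, a, b):
    # log-likelihood of one examinee's graded responses; b[i] holds the
    # thresholds of item i (hypothetical values below)
    logL = 0.0
    for x, ai, bi in zip(responses, a, b):
        ps = 1.0 / (1.0 + np.exp(-ai * (theta - np.asarray(bi))))
        ps = np.concatenate(([1.0], ps, [0.0]))
        logL += np.log(ps[x - 1] - ps[x])
    return logL

a, b = [1.2, 0.8], [[-1.0, 0.5], [-0.5, 0.4, 1.3]]   # two invented items
fit = minimize_scalar(lambda t: -graded_logL(t, (2, 3), a, b),
                      bracket=(0.5, 1.0))   # start the search near the criterion
print(round(fit.x, 3), fit.nit)             # modal theta and iteration count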

13 TESTFACT examples

13.1 Classical item analysis and scoring on a geography test with an external criterion

The geography test discussed in this example consists of 20 items. The total score on the test is
used as the criterion score. The items test the following topics:

• Structure and landforms
• Erosion, transport, and deposition
• Climate and vegetation
• Mineral resources
• Agriculture and industry
• Population and transport
• Miscellaneous

This example illustrates the running of stacked problems. The two problems use the same data,
but with different variable format statements. The same data are found in two identical data files,
exampl01.da1 and exampl01.da2. The reason for the duplication is that, in the case of stacked
problems, the same data file cannot be opened more than once during the analysis. The first ten
lines of the data files exampl01.da1 and exampl01.da2 are shown below.

  1201903390B32325251253531212145 62531
  2201903400B12223111431231122312 02535
  3201903410B12123432542455323111 92231
  4201903420B15323121415431524135 91827
  5201903430B43123221153531522151 81220
  6201903440B45124321343431512313101121
  7201903450B14523224514521123411 81826
  8201903460B45125422444211421213 51217
  9201903470B34423221541453322131122638
 10201903480B44423525451431313114121628

The persons sitting for the test are classified by sex, with “G” denoting a girl, and “B” a boy. Col-
umns 1 to 3 inclusive contain the case identification, while the gender classification is given in
column 13. These fields are denoted by “3A1” and “A1” in the variable format statement. Note
that the “X” operator is used to skip from column 3 to column 13. The width of the case identifi-
cation field is also indicated by the NIDCHAR keyword on the INPUT command.

The variable format statement for the first problem is

(3A1,9X,A1,20A1,F2.0)

The item responses are given in columns 14 to 33 and are represented by “20A1” in the format
statement. Finally, the criterion score is given in columns 34 and 35. Note that this score is read
as a real number with format “F2.0”.
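
For readers more comfortable with column slicing than with Fortran formats, the following
Python fragment (an illustration, independent of TESTFACT) extracts the same fields from the
first data record:

record = "  1201903390B32325251253531212145 62531"
case_id = record[0:3]             # 3A1 : columns 1-3
gender = record[12]               # A1  : column 13 (9X skips columns 4-12)
responses = record[13:33]         # 20A1: columns 14-33
criterion = float(record[33:35])  # F2.0: columns 34-35
print(case_id, gender, responses, criterion)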

The format statement for the second analysis is

(3A1,9X,1X,20A1)

and is the same as for the first analysis, with the exceptions of the omission of the criterion score
specification and the omission of the gender classification.

In the first problem, an external criterion score is used. The PROBLEM command specifies that 20
items, with 6 responses each, are to be analyzed in the first problem (NITEM=20 and
RESPONSE=6). To obtain estimated item statistics for the two gender groups, the responses are
divided into two classes (CLASS=2) and the definition of the two classes is given in the CLASS
command. The INPUT command indicates that the data are in the external data file exampl01.da1
(FILE keyword) and that it consists of scores (SCORES option).

The external criterion score used is a score input with item responses (CRITMARK option on the
CRITERION command) named “TWENTY” (NAME keyword). By specifying the ALPHA option on the
RELIABILITY command, the calculation of coefficient alpha is requested. Alternatively, the
Kuder-Richardson formula 20, which is the default reliability measure, may be requested using
the KR20 option.
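
Coefficient alpha itself is easy to compute directly. The sketch below uses plain Python with
fabricated 0/1 scores; for dichotomous items, alpha coincides with Kuder-Richardson formula 20:

import numpy as np

def coefficient_alpha(X):
    # X: persons-by-items matrix of 0/1 item scores
    n_items = X.shape[1]
    item_vars = X.var(axis=0, ddof=1).sum()
    total_var = X.sum(axis=1).var(ddof=1)
    return n_items / (n_items - 1.0) * (1.0 - item_vars / total_var)

rng = np.random.default_rng(0)
X = (rng.random((40, 20)) < 0.6).astype(float)   # invented 40 x 20 data
print(round(coefficient_alpha(X), 3))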

The PLOT command requests line plots of the point biserial coefficient (PBISERIAL option) as
discrimination index and with discriminating power with respect to the external criterion
(CRITERION option). The measure of item difficulty is plotted in terms of the item facility (per-
cent correct; default FACILITY option).

Note that the use of the CONTINUE command in the case of stacked problems is optional.

In the second part of exampl01.tsf the geography test is split into 2 subtests. This is indicated by
the use of the SUBTEST keyword on the PROBLEM command and the SUBTEST command in which
the BOUNDARY keyword is used to indicate that the 12th item is the last item in the first subtest,
and the 20th item is the last item in subtest 2. The subtests are named using the NAME keyword on
this command. The subtests are composed of items testing the following abilities:

• Factual recall, comprehension and application
• Analysis, evaluation and synthesis.

The reordering of the items is indicated by the SELECT keyword on the PROBLEM command. The
reordering is specified on the SELECT command, which lists the items in the order in which they
are to be used.

The fractile option is used to investigate the behavior of items across the ability spectrum. The
FRACTILES command is used to group scores into fractiles by score boundaries (SCORES option).
The boundaries, consisting of the cumulative upper scores on the test bands, are defined using
the BOUNDARY keyword on the FRACTILES command. The FRACTILES keyword on the PROBLEM
command indicates that 3 fractiles will be used for score divisions. The INPUT command indi-
cates that, as in the first analysis, scores are used as input. In addition, the LIST option requests
the listing, for all subjects, of the identification, main and subtest scores in the output file.

Each TESTFACT run produces output under headings labelled Phase 0 to Phase 7. The Phase 1
to Phase 4 output contains data description, plots, basic statistics, and item statistics. These are
discussed in detail in Section 13.4.1. In the present example, the Phase 1 to Phase 4 output is
suppressed by setting the SKIP keyword on the PROBLEM command to 1. Phase 5 output
provides information about tetrachoric correlations, while Phase 6 and 7 output are only pro-
duced if a FACTOR or BIFACTOR command is present in the command file.

>TITLE
EXAMPL01.TSF- GEOGRAPHY TEST WITH EXTERNAL CRITERION SCORE
ITEM AND TEST STATISTICS
>PROBLEM NITEM=20,RESPONSE=6,CLASS=2;
>NAMES MISCELL1,MISCELL2,EROSION1,EROSION2,EROSION3,
STRUCTU1,MINERAL1,MINERAL2,MINERAL3,AGRICUL1,
MISCELL3,STRUCTU2,EROSION4,CLIMATE1,CLIMATE2,
MINERAL4,AGRICUL2,AGRICUL3,POPULAT1,STRUCTU3;
>RESPONSE ‘0’,’1’,’2’,’3’,’4’,’5’;
>KEY 14423321441435112111;
>CLASS IDEN=(G,B),NAME=(GIRLS,BOYS);
>CRITERION CRITMARK, NAME=’TWENTY’;
>RELIABILITY ALPHA;
>PLOT PBISERIAL,CRITERION,FACILITY;
>INPUT NIDCHAR=3,SCORES,FILE=‘EXAMPL01.DA1’;
(3A1,9X,A1,20A1,F2.0)
>TITLE
GEOGRAPHY TEST SPLIT INTO 2 SUBTESTS AND USE OF FRACTILES
ITEMS REORDERED
>PROBLEM NITEM=20,RESPONSE=6,SELECT=20,SUBTEST=2,FRACTILES=3;
>NAMES MISCELL1,MISCELL2,EROSION1,EROSION2,EROSION3,
STRUCTU1,MINERAL1,MINERAL2,MINERAL3,AGRICUL1,
MISCELL3,STRUCTU2,EROSION4,CLIMATE1,CLIMATE2,
MINERAL4,AGRICUL2,AGRICUL3,POPULAT1,STRUCTU3;
>RESPONSE ‘0’,’1’,’2’,’3’,’4’,’5’;
>KEY 14423321441435112111;
>SELECT 3,4,7(1)12,16(1)19,1,2,5,6,13,14,15,20;
>SUBTEST BOUNDARY=(12,20),NAME=(RECALL,ANALYSIS);
>FRACTILE SCORE,BOUNDARY=(7,13,20);
>INPUT NIDCHAR=3,SCORES,LIST,FILE=‘EXAMPL01.DA2’;
(3A1,9X,1X,20A1)
>STOP

Portions of the Phase 5 output are shown below. The first part of the output contains, for each
selected item, the number of cases, % correct, % omitted, % not reached and % not-presented.
The summary shows that 2.5% of the respondents omitted item number 6.

MAIN TEST MISSING RESPONSE INFORMATION


-------------------------------------------------------------------------------

ITEM NUMBER PERCENT PERCENT PERCENT PERCENT


OF CASES CORRECT OMITTED NOT REACHED NOT PRESENTED
-------------------------------------------------------------------------------
1. EROSION1 40 32.5 0.0 0.0 0.0
2. EROSION2 40 97.5 0.0 0.0 0.0
3. MINERAL1 40 57.5 0.0 0.0 0.0
4. MINERAL2 40 77.5 0.0 0.0 0.0
5. MINERAL3 40 62.5 0.0 0.0 0.0
6. AGRICUL1 39 50.0 2.5 0.0 0.0


7. MISCELL3 40 52.5 0.0 0.0 0.0
8. STRUCTU2 40 65.0 0.0 0.0 0.0
9. MINERAL4 40 22.5 0.0 0.0 0.0
10. AGRICUL2 40 47.5 0.0 0.0 0.0
11. AGRICUL3 40 60.0 0.0 0.0 0.0
12. POPULAT1 40 65.0 0.0 0.0 0.0
13. MISCELL1 40 70.0 0.0 0.0 0.0
14. MISCELL2 40 55.0 0.0 0.0 0.0
15. EROSION3 40 85.0 0.0 0.0 0.0
16. STRUCTU1 40 7.5 0.0 0.0 0.0
17. EROSION4 40 67.5 0.0 0.0 0.0
18. CLIMATE1 40 32.5 0.0 0.0 0.0
19. CLIMATE2 40 30.0 0.0 0.0 0.0
20. STRUCTU3 40 42.5 0.0 0.0 0.0
------------------------------------------------------------------------------

Use is made of the n(n - 1)/2 (n = number of items) 2 × 2 frequency tables to calculate the tetra-
choric correlations. Since n equals 20, the number of possible tables is 20 × 19/2 = 190. In this
case, there are only 162 valid pairs of tables. Examples of non-valid pairs are given in Section
13.4.1.
The output below shows a listing of the first four non-valid pairs, followed by the average tetra-
choric correlation.

-->ITEM PAIR( 2, 1):


CELL FREQUENCIES ARE TOO SMALL FOR A MEANINGFUL RESULT;
RTET=-1.00 SUBSTITUTED.
-->ITEM PAIR( 3, 2):
CELL FREQUENCIES ARE TOO SMALL FOR A MEANINGFUL RESULT;
RTET= 1.00 SUBSTITUTED.
-->ITEM PAIR( 4, 2):
CELL FREQUENCIES ARE TOO SMALL FOR A MEANINGFUL RESULT;
RTET= 1.00 SUBSTITUTED.

-->ITEM PAIR( 5, 2):


CELL FREQUENCIES ARE TOO SMALL FOR A MEANINGFUL RESULT;
RTET= 1.00 SUBSTITUTED.

AVERAGE TETRACHORIC CORRELATION = 0.0862


STANDARD DEVIATION = 0.2817
NUMBER OF VALID ITEM PAIRS = 162

13.2 Two-factor non-adaptive full information factor analysis on a five-item test (LSAT7)

In this example, a non-adaptive full information item factor analysis is performed on 5 items,
with 3 responses each, from Section 7 of the LSAT data (Bock & Lieberman, 1970).
The number of items and responses are indicated by the NITEMS and RESPONSE keywords on the
PROBLEM command. The data are in the file exampl02.dat, and have the following layout:

• Columns 1 to 2: Pattern number (ID)
• Columns 3 to 7: Item responses
• Columns 11 to 13: Weight (number of occurrences of pattern)

The variable format statement

(2A1,5A1,3X,I3)

lists these three fields in the same order, and the “X” operator is used to skip from column 7 to
column 11.

The INPUT command indicates that item scores are used as input (SCORES option) and that each
data record starts with an identification field 2 characters in length (NIDCHAR=2). The WEIGHT
keyword is set to PATTERN to indicate that each data record consists of an answer pattern with a
frequency. Note that the frequency is read as an integer (I3) in the variable format statement.

The three responses are listed on the RESPONSE command, while the KEY command indicates that
a “1” is the correct response to all 5 items. By default, the RECODE option will be used on the
TETRACHORIC command, and thus all omits will be recoded as wrong responses.

The TETRACHORIC command specifies details concerning the tetrachoric correlation matrix. Co-
efficients will be printed to 3 decimal places (NDEC=3) and the matrix of tetrachoric correlations
will appear in the printed output (LIST option). This matrix may also be saved to an external file
if the CORRELAT option is included on the (optional) SAVE command.

The FACTOR and FULL commands are used to specify parameters for the full information item
factor analysis. Two factors and 3 latent roots are to be extracted, as indicated by the NFAC and
NROOT keywords respectively. A PROMAX rotation is requested. Note that this keyword may not be
abbreviated in the FACTOR command. The residual correlation matrix will be computed as the
initial correlation matrix minus the final correction matrix (RESIDUAL option). An f-factor posi-
tive definite estimate of the latent response process correlation matrix will be computed (SMOOTH
option). This option affects only the output of the final smoothed correlation matrix. A maximum
of 20 EM cycles will be performed (CYCLES keyword on the FULL command).

The NOADAPT option on the TECHNICAL command specifies that non-adaptive quadrature should
be used to obtain the full information solution. Note that, if NFAC > 5, the presence of this option
will be ignored and adaptive fractional quadrature will be performed.

The smoothed correlation matrix, rotated factor loadings and item parameters are saved to exter-
nal files (exampl02.smo, exampl02.rot and exampl02.par respectively) using the SMOOTH,
ROTATE and PARM options on the SAVE command.

>TITLE
EXAMPL02.TSF- LSAT DATA NON-ADAPTIVE FULL INFORMATION ITEM FACTOR ANALYSIS
COUNTED RESPONSE PATTERNS
>PROBLEM NITEM=5,RESPONSE=3;
>NAMES ITEM1,ITEM2,ITEM3,ITEM4,ITEM5;
>RESPONSE ‘8’, ‘0’, ‘1’;
>KEY 11111;
>TETRACHORIC NDEC=3,LIST;
>FACTOR NFAC=2,NROOT=3,ROTATE=PROMAX,RESIDUAL,SMOOTH;
>FULL CYCLES=20;
>TECHNICAL NOADAPT;
>SAVE SMOOTH,ROTATE,PARM;
>INPUT NIDCHAR=2,SCORES,WEIGHT=PATTERN, FILE=’EXAMPL02.DAT’;
(2A1,5A1,3X,I3)
>STOP;

13.3 One-factor non-adaptive full information item factor analysis of the five-item test

In this example, the LSAT data of Section 13.2 are analyzed assuming a one-factor model. The
purpose of the analysis is to compare the goodness-of-fit with that of the two-factor model, and
to use the change in χ 2 between the models as a test of statistical significance of the second fac-
tor. The computation of classical item statistics is skipped (SKIP=1), and the factor loadings are
not rotated or saved.

>TITLE
EXAMPL03.TSF- LSAT DATA NON-ADAPTIVE FULL INFORMATION ITEM FACTOR ANALYSIS
TEST OF FIT
>PROBLEM NITEM=5,RESPONSE=3,SKIP=1;
>NAMES ITEM1,ITEM2,ITEM3,ITEM4,ITEM5;
>RESPONSE ‘8’,‘0’,‘1’;
>KEY 11111;
>TETRACHORIC NDEC=3,LIST;
>FACTOR NFAC=1,NROOT=3;
>FULL CYCLES=16;
>TECHNICAL NOADAPT;
>INPUT NIDCHAR=2,SCORES,WEIGHT=PATTERN,FILE=’EXAMPL02.DAT’;
(2A1,5A1,3X,I3)
>STOP;

13.4 A three-factor adaptive item factor analysis with Bayes (EAP) estimation of factor scores: 32 items from an activity survey

This example analyzes 32 items selected from the 48-item version of the Jenkins Activity Survey
for Health Prediction, Form B (Jenkins, Rosenman, & Zyzanski, 1972). The data are responses
of 598 men from central Finland drawn from a larger survey sample. Most of the items are rated
on three-point scales representing little or no, occasional, or frequent occurrence of the activity
or behavior in question. For purposes of the present analysis, the scales have been dichotomized
near the median. Wording in the positive or negative direction varies from item to item as fol-
lows (item numbers are those of the original pool of items from which those of the present form
were selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,+Q251,
+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,+Q262,
+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,-Q273,
-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,+Q310,
+Q311,-Q312,-Q313,-Q314.

The first 7 lines of the data file exampl04.dat are shown below.

201000220122112221022212202112211101122112222000
001221211011100111111111111110111102211111211020
0010.02100222122021221222112112212.0011111222001
002020220212012120011112112221221022211111222202
201000221000211221221112012211122112211111222000
001001221022011120022222212222211101121112222101
102100111022112120021212212221121212111022200021

The first 10 columns of each record are used as case identification and are read first. Starting
again in the first column by using the “T” operator, the responses to the 48 items are read as
single-character fields (48A1).

(10A1,T1,48A1)

The SELECT keyword on the PROBLEM command indicates that 32 items are selected from the
original 48 items. The SELECT command provides the selected items in the order in which they
will be used. The RESPONSE command lists the 5 responses indicated on the PROBLEM command
(RESPONSE keyword) and the KEY command provides the correct responses for each of the 48
items. The NOTPRESENTED option on the PROBLEM command is required if one of the response
codes identifies not-presented items. The “.” code on the RESPONSE command identifies these
responses.

The TETRACHORIC command requests the printing of the coefficients to 3 decimal places
(NDEC=3) in the printed output file (LIST option). The tetrachoric correlation matrix, item pa-
rameters, rotated factor loadings, and the factor scores will be saved in the files exampl04.cor,
exampl04.par, example04.rot, and exampl04.fsc, respectively, as specified on the SAVE com-
mand.

The FACTOR and FULL commands are used to specify parameters for the full information item
factor analysis. Three factors and ten latent roots are to be extracted, as indicated by the NFAC
and NROOT keywords respectively. A VARIMAX rotation is requested. Note that this keyword may
not be abbreviated in the FACTOR command. A maximum of 80 EM cycles will be performed
(CYCLES keyword on the FULL command). The convergence criterion for the EM cycles is given
by the PRECISION keyword on the TECHNICAL command.

Cases will be scored by EAP (Expected A Posteriori, or Bayes) estimation with adaptive quad-
rature (METHOD=2 on the SCORE command). Posterior standard deviations will also be computed.
Results will be saved in the exampl04.fsc file (FSCORES option on the SAVE command). The fac-
tor scores for the first 20 cases will be listed in the output file (LIST=20). See Section 13.5 for
MAP (Maximum A Posteriori, or Bayes Modal) estimation for the same cases.

>TITLE
EXAMPL04.TSF-ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
>PROBLEM NITEMS=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE ‘8’, ‘0’, ‘1’, ‘2’, ‘.’;
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE,ROTATE;
>INPUT NIDCHAR=10,SCORES,FILE=‘EXAMPL04.DAT’;
(10A1,T1,48A1)
>STOP

13.4.1 Discussion of output

The first part of the output lists the name of the command file (exampl04.tsf) and the name of
the output file (exampl04.out). Each TESTFACT run produces output under one or more of the
following headings, depending on the type of analysis.

• Phase 0: input commands
• Phase 1: data description, histogram, and basic statistics
• Phase 2: item statistics
• Phase 3: item difficulty × discrimination plot
• Phase 4: class item statistics
• Phase 5: tetrachoric correlations / response-by-fractile tables
• Phase 6: factor analysis / bifactor analysis
• Phase 7: general bifactor EAP scores / factor score EAP estimates / factor score MAP estimates

The analysis specified in exampl04.tsf produces Phase 0, Phase 1, Phase 2, Phase 5, Phase 6, and
Phase 7 output.

Phase 0: Input commands

Regardless of the type of analysis, a Phase 0 output is produced, being an echo of the input com-
mands in the *.tsf file.

PHASE 0: INPUT COMMANDS


ITEMS FROM THE JENKINS ACTIVITY SURVEY
ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
>PROBLEM NITEM=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
This example analyzes 32 items selected from the 48-item version
of the Jenkins Activity Survey for Health Prediction, Form B (Jenkins,
Rosenman, and Zyzanski, 1972). The data are responses of 598 men from
central Finland drawn from a larger survey sample. Most of the items
are rated on three-point scales representing little or no, occasional,
or frequent occurence of the activity or behavior in question. For
purposes of the present analysis, the scales have been dichotomized
near the median. Wording in the positive or negative direction varies
from item to item as follows (item numbers are those of the original
pool of items from which those of the present form were selected):

-Q156,-Q157,+Q158,-Q165,-Q166,-Q167,+Q247,+Q248,-Q249,-Q250,
+Q251,+Q252,+Q253,+Q254,+Q255,+Q256,+Q257,-Q258,-Q259,+Q260,+Q261,
+Q262,+Q263,+Q264,+Q265,-Q266,+Q267,+Q268,+Q269,+Q270,+Q271,+Q272,
-Q273,-Q274,-Q275,+Q276,+Q277,+Q278,-Q279,-Q280,+Q307,+Q308,+Q309,
+Q310,+Q311,-Q312,-Q313,-Q314.

The tetrachoric correlation matrix, item parameters, rotated factor


loadings, and the factor scores will be saved in the files EXAMPL03.COR,
EXAMPL03.PAR, EXAMPL03.ROT, and EXAMPL03.FSC, respectively.Cases will be
scored by EAP (Expected A Posteriori, or Bayes) estimation with adaptive quad-
rature (Method 2). Posterior standard deviations will also be computed. Re-
sults will be saved in the EXAMPL03.FSC file. See Exampl3a.tsf for MAP (Maxi-
mum A Posteriori, or Bayes Modal) estimation for the same cases.

>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE ‘8’, ‘0’, ‘1’, ‘2’, ‘.’;
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=3,NROOT=10,ROTATE=VARIMAX;
>FULL CYCLES=80;
>TECHNICAL PRECISION=0.005;
>SCORE METHOD=2,LIST=20;
>SAVE CORR,PARM,FSCORE,ROTATE;
>INPUT NIDCHAR=10,SCORES,FILE=‘EXAMPL04.DAT’;

DATA FILENAME IS EXAMPL04.DAT

DATA FORMAT=
(10A1,T1,48A1)

Phase 1: Data description

Values of the response categories (8, 0, 1, 2, .), the answer key, contents of the first observation,
the sum of weights and number of records are given. This information enables you to verify that
the data values were read correctly from the data file exampl04.dat. The response categories in-
dicate a code of “8” for omitted responses (first value) and a code of “.” for not-presented items
(last value).

Thirty-two items were selected from the 48-item test. Based on the answer key values, a total
score for each of the 598 respondents is computed. Each item has a set of responses: right,
wrong, omit, or not-presented. For item j, j = 1, 2, …, 32, the response of person i, i = 1, 2, …,
598 can be written as

x_{ij} = 1 if the response is correct, and

x_{ij} = 0 if the response is incorrect.

At your option, omitted items can be considered either wrong or not presented. The total test
score X_i for person i is

X_i = \sum_{j=1}^{32} x_{ij}.

Respondent 1, for example, has a total score of 19 correct out of a possible 32 as shown below.

Answer key:

20020222220022222022222002002200

Respondent 1:

10020221121022212021121101211200

EXAMPL04.TSF- ITEMS FROM THE JENKINS ACTIVITY SURVEY


ADAPTIVE ITEM FACTOR ANALYSIS AND FACTOR SCORE ESTIMATION
------------------------------------------------------------
RESPONSE CATEGORIES: 8 0 1 2 .
ANSWER KEY: 20020222220022222022222002002200

CONTENTS OF FIRST OBSERVATION:


ID=2010002201
WEIGHT= 1
ITEM RESPONSES= 201000220122112221022212202112211101122112222000
ITEM RESPONSES AFTER SELECTION =
10020221121022212021121101211200

SUM OF WEIGHTS = 598


NUMBER OF RECORDS= 598

Using this information, a frequency table of the score distribution is calculated and presented
graphically.

PHASE 1: HISTOGRAM AND BASIC STATISTICS

NUMBER OF OBSERVATIONS AT EACH SCORE


SCORE COUNT FREQ | SCORE COUNT FREQ | SCORE COUNT FREQ
0 0 0.0 | 11 35 5.9 | 22 21 3.5
1 0 0.0 | 12 40 6.7 | 23 10 1.7
2 0 0.0 | 13 38 6.4 | 24 8 1.3
3 0 0.0 | 14 52 8.7 | 25 6 1.0
4 1 0.2 | 15 54 9.0 | 26 1 0.2
5 2 0.3 | 16 54 9.0 | 27 1 0.2
6 1 0.2 | 17 56 9.4 | 28 0 0.0
7 5 0.8 | 18 57 9.5 | 29 0 0.0
8 7 1.2 | 19 36 6.0 | 30 0 0.0
9 18 3.0 | 20 43 7.2 | 31 0 0.0
10 20 3.3 | 21 32 5.4 | 32 0 0.0

MAIN TEST HISTOGRAM

FREQUENCY :
|
|
| **
| ****
| *****
8.0+ *****
| *****
| *****
| ***** *
| * ***** *
| *********
| **********
| ***********
| ***********
| ***********
4.0+ ***********
| ***********
| *************
| **************
| **************
| **************
| ***************
| ****************
| *******************
| *******************
0.0+-----+----+----+----+----+----+----+----+----+----+----+----+----+--
0. 5. 10. 15. 20. 25. 30.
SCORES

The last portion of the Phase 1 output gives the mean (15.9) and standard deviation (4.0) of the
Total Scores.

TEST RECORD NUMBER MEAN S.D. PROPORTION S.D.


MAIN 598 598 15.9 4.0 0.497 0.500

The proportion of correct responses, p, is

p = \sum_{i=1}^{598} \sum_{j=1}^{32} x_{ij} / (32 \times 598) = 0.497,

with a standard deviation of

\sqrt{p(1 - p)} = 0.5.

Phase 2: Item statistics

For each item, eight statistics are produced. The Number, Mean and S.D. for item 2, for example,
are 590, 15.92, and 4.03 respectively. These values are obtained by “deleting” each row of the
data if a not-presented code is encountered for item 2. Since 8 rows contain not-presented codes,
the mean and standard deviation of the Total Scores is calculated for the remaining 590 cases.
Note, for example, that item 1 was presented to all 598 persons, while item 4 was presented to
592 persons.

PHASE 2: ITEM STATISTICS

MAIN TEST ITEM STATISTICS

ITEM NUMBER MEAN S.D. RMEAN FACILITY DIFF BIS P.BIS


1 Q158 598 15.91 4.01 14.46 0.206 16.29 -0.262 -0.185
2 Q166 590 15.92 4.03 17.13 0.653 11.43 0.532 0.413
3 Q167 596 15.90 4.01 16.35 0.790 9.77 0.305 0.215
4 Q247 592 15.93 4.01 16.71 0.694 10.97 0.384 0.292
5 Q249 594 15.92 4.01 15.89 0.466 13.34 -0.008 -0.006
6 Q251 598 15.91 4.01 17.16 0.532 12.68 0.417 0.332
7 Q252 598 15.91 4.01 17.39 0.490 13.10 0.451 0.360
8 Q253 598 15.91 4.01 18.16 0.410 13.91 0.591 0.467
9 Q254 597 15.91 4.02 18.99 0.203 16.33 0.551 0.387
10 Q257 597 15.92 4.01 17.99 0.449 13.51 0.585 0.466
...
31 Q313 597 15.91 4.02 16.31 0.843 8.98 0.349 0.231
32 Q314 594 15.93 4.02 16.86 0.586 12.13 0.351 0.278

The mean score for those subjects who get a specific item correct is denoted by RMEAN. For ex-
ample, since 385 respondents selected the correct response for item 2, RMEAN for item 2 is calcu-
lated as the mean of the corresponding 385 Total Scores and equals 17.13.

The item facility (FACILITY) is the proportion correct response for a specific item. For example,
385 of the 590 respondents presented with item 2 selected the correct response, and hence

p_2 = 385/590 = 0.653.

The delta statistic (Δ, or DIFF) is calculated as

Δ = -4Φ^{-1}(p) + 13,

where p is the item facility and Φ^{-1} denotes the inverse normal transformation. This statistic has
an effective range of 1 to 25, with a mean and standard deviation of 13 and 4 respectively.

The last 2 statistics are the biserial (BIS) and point biserial (P.BIS) correlations. The formula for
the sample point biserial correlation is

P.BIS = \frac{RMEAN - MEAN}{S.D.} \sqrt{\frac{facility}{1 - facility}}.

For item 8, for example,

P.BIS = \frac{18.16 - 15.91}{4.01} \sqrt{\frac{0.41}{0.59}} = 0.467.
4.01 0.59

The point biserial correlation is the correlation between the item score and the total score, or sub-
test score. Theoretically −1 ≤ P.BIS ≤ 1 but in practice −0.20 ≤ P.BIS ≤ 0.75. Therefore, 0.467 indi-
cates a relatively strong association between item 8 and the Total Score.

The formula for calculating the sample biserial correlation coefficient, BIS, is

BIS = \frac{RMEAN - MEAN}{S.D.} \times \frac{facility}{h(facility)}.

Consider, for example, the item 3 facility, which equals 0.790. From the inverse normal tables,
this corresponds to a z_p-value of 0.8062, and

h(facility) = \frac{1}{\sqrt{2\pi}} \exp(-\tfrac{1}{2} z_p^2) = 0.399 \times 0.723 = 0.29.

For item 3,

BIS = \frac{16.35 - 15.90}{4.01} \times \frac{0.79}{0.29} = 0.305.
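
Both computations can be checked directly from the Phase 2 table. In the Python sketch below,
small discrepancies against the printed values (0.467 and 0.305) reflect only the rounding of the
table entries used as inputs:

from math import exp, pi, sqrt

def point_biserial(rmean, mean, sd, facility):
    return (rmean - mean) / sd * sqrt(facility / (1.0 - facility))

def biserial(rmean, mean, sd, facility, z):
    h = exp(-0.5 * z * z) / sqrt(2.0 * pi)   # normal density at z
    return (rmean - mean) / sd * facility / h

print(round(point_biserial(18.16, 15.91, 4.01, 0.41), 3))    # item 8
print(round(biserial(16.35, 15.90, 4.01, 0.79, 0.8062), 3))  # item 3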

Phase 5: Tetrachoric correlations

The first part of the output contains, for each selected item, the Number of Cases, Percent
Correct, Percent Omitted, Percent Not Reached and Percent Not Presented.

PHASE 5: TETRACHORIC CORRELATIONS

MAIN TEST MISSING RESPONSE INFORMATION


----------------------------------------------------------------------------
ITEM NUMBER PERCENT PERCENT PERCENT PERCENT
OF CASES CORRECT OMITTED NOT REACHED NOT PRESENTED
----------------------------------------------------------------------------
1. Q158 598 20.6 0.0 0.0 0.0
2. Q166 590 64.4 0.0 0.0 1.3
3. Q167 596 78.8 0.0 0.0 0.3
4. Q247 592 68.7 0.0 0.0 1.0
5. Q249 594 46.3 0.0 0.0 0.7

31. Q313 597 84.1 0.0 0.0 0.2
32. Q314 594 58.2 0.0 0.0 0.7
----------------------------------------------------------------------------

This summary indicates that there were no omitted codes in the data and that all 598 respondents
could complete the test. The percent Not Presented varies from 0.0 to a maximum of 1.3 for
item 2. For item 2, this percentage is calculated as

 598 − 590 
  ×100 = 1.3%.
 598 

Note that the Percent Correct is calculated here as the number of respondents who selected the
correct answer, divided by the total number of cases. For item 2

PERCENT CORRECT = \frac{385}{598} \times 100 = 64.38%.

This value differs from the facility estimate (385/590) given under Phase 2 of the output.

Display 1: Tetrachoric correlation matrix

The tetrachoric correlation coefficient is widely used as a measure of association between two
dichotomous items. Tetrachoric correlations are obtained by hypothesizing, for each item, the
existence of a continuous “latent” variable underlying the “right-wrong” dichotomy imposed in
scoring. It is additionally hypothesized that, for each pair of items, the corresponding two con-
tinuous “latent” variables have a bivariate normal distribution.
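
The following Python sketch (illustrative; it is not TESTFACT's algorithm, and the 2 × 2 table is
invented) finds the tetrachoric correlation as the ρ for which the bivariate-normal probability of
the (right, right) quadrant matches the observed proportion:

from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def tetrachoric(n11, n10, n01, n00):
    n = n11 + n10 + n01 + n00
    p1, p2, p11 = (n11 + n10) / n, (n11 + n01) / n, n11 / n
    t1, t2 = norm.ppf(1 - p1), norm.ppf(1 - p2)   # "right" thresholds

    def gap(rho):
        cov = [[1.0, rho], [rho, 1.0]]
        # P(Z1 > t1, Z2 > t2) = Phi2(-t1, -t2; rho) by symmetry
        return multivariate_normal.cdf([-t1, -t2], mean=[0, 0], cov=cov) - p11

    return brentq(gap, -0.999, 0.999)

print(round(tetrachoric(40, 10, 12, 38), 3))   # invented frequencies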

AVERAGE TETRACHORIC CORRELATION = 0.0654


STANDARD DEVIATION = 0.2384
NUMBER OF VALID ITEM PAIRS = 496

DISPLAY 1. TETRACHORIC CORRELATION MATRIX

1 2 3 4 5 6
Q158 Q166 Q167 Q247 Q249 Q251
1 Q158 1.000
2 Q166 -0.383 1.000
3 Q167 -0.145 0.124 1.000
4 Q247 -0.535 0.368 0.054 1.000
5 Q249 0.106 -0.019 0.016 -0.161 1.000
6 Q251 -0.065 0.017 0.019 0.016 -0.126 1.000
...

In TESTFACT, use is made of the n(n - 1)/2 (n = number of items) 2 × 2 frequency tables to calcu-
late the tetrachoric coefficients. From the computer output, the number of valid item pairs is 496.
Since the number of items equals 32, and 32(32 - 1)/2 = 496, this data set contains no non-valid
pairs. Non-valid pairs have zero off-diagonal or marginal frequencies; for example, 2 × 2 tables
such as the following, in which 0 marks an empty cell and * a non-zero frequency:

         R   W              R   W
    R    0   *         R    0   0
    W    0   *         W    *   *

The average tetrachoric correlation equals 0.0654. Since the output has both negative and posi-
tive correlation coefficients, the average value does not shed much light on the actual strength of
association between item pairs. Note that tetrachoric correlation matrices are not necessarily
positive definite.

Phase 6: Factor analysis

Display 2: The positive latent roots of the correlation matrix

By definition, a symmetric matrix is positive definite if all its characteristic roots are positive.
From the output below, it is seen that only the first 31 of the 32 roots are positive, and therefore
the 32 × 32 matrix of tetrachoric correlations is not positive definite. This problem can be cor-
rected by replacing the negative roots of the matrix by zero or a small non-zero quantity.

DISPLAY 2. THE POSITIVE LATENT ROOTS OF THE CORRELATION MATRIX

1 2 3 4 5 6
1 7.491350 3.442602 2.592276 1.745235 1.576302 1.442306

7 8 9 10 11 12
1 1.248438 1.118638 1.015248 0.971235 0.908476 0.835705

13 14 15 16 17 18
1 0.768426 0.719607 0.657375 0.638227 0.631485 0.555802
19 20 21 22 23 24
1 0.514488 0.461871 0.398661 0.375292 0.349726 0.312994

25 26 27 28 29 30
1 0.292964 0.243591 0.218973 0.183170 0.167582 0.117183

31
1 0.055375

Display 3: Number of items and sum of latent roots and their ratio

This section of the output shows the sum of positive roots and the ratio with which each root has
to be multiplied to obtain a sum of “corrected roots” which equals the number of items. To illus-
trate, consider a 5 × 5 correlation matrix with latent roots 3, 1, 0.8, 0.3, and –0.1. The sum of the
roots equals 5. In general, for any correlation matrix based on n items, the sum of roots equals n.

Suppose the value of –0.1 is replaced by 0.0001, then the new sum of roots equals 5.1001. How-
ever, by multiplying each root by the ratio 5/5.1001 = 0.9804, a “corrected” set of roots is ob-
tained in the sense that their sum equals 5.

From the Display 3 part of the output, the ratio required to obtain a corrected set of latent roots
equals 0.9984211. The corrected set is given under the Display 4 heading.

DISPLAY 3. NUMBER OF ITEMS AND SUM OF LATENT ROOTS AND THEIR RATIO

32 32.0506033 0.9984211

Display 4: Corrected latent roots

DISPLAY 4. THE CORRECTED LATENT ROOTS OF THE CORRELATION MATRIX

1 2 3 4 5 6
1 7.479522 3.437167 2.588184 1.742479 1.573814 1.440029
...

Display 5: Initial smoothed inter-item correlation matrix

A tetrachoric correlation matrix is not necessarily positive definite and in TESTFACT it is re-
placed by a so-called smoothed inter-item correlation matrix. For the reader familiar with matrix
algebra, a short description of the smoothing procedure follows.

Any symmetric matrix can be decomposed as

R = VDV',

where D is a diagonal matrix whose diagonal elements are the characteristic roots of R. As men-
tioned previously, if all roots are positive, that is, all the diagonal elements of D are positive, R
is a positive definite matrix. When this is not the case, a “smoothed” correlation matrix R* may
be obtained by replacing the elements of D with the corrected roots and negative roots with ei-
ther 0 or some small positive quantity, so that

R* = VD*V',

where the columns of V are eigenvectors and the elements of D* the corrected latent roots. The
elements of the smoothed correlation matrix for the first 6 of the 32 items are given below.

DISPLAY 5. INITIAL SMOOTHED INTER-ITEM CORRELATION MATRIX

1 2 3 4 5 6
Q158 Q166 Q167 Q247 Q249 Q251
1 Q158 1.000
2 Q166 -0.383 1.000
3 Q167 -0.145 0.124 1.000
4 Q247 -0.534 0.368 0.054 1.000
5 Q249 0.106 -0.019 0.016 -0.161 1.000
6 Q251 -0.066 0.017 0.019 0.016 -0.126 1.000
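
The correction-and-smoothing sequence of Displays 2 through 5 can be sketched compactly. The
fragment below illustrates the procedure as described above; it is not TESTFACT's exact code,
and in particular the final rescaling to a unit diagonal is an assumption suggested by the 1.000
entries of Display 5:

import numpy as np

def smooth_corr(R, floor=1.0e-4):
    n = R.shape[0]
    roots, V = np.linalg.eigh(R)
    roots = np.where(roots <= 0.0, floor, roots)  # replace negative roots
    roots *= n / roots.sum()                      # corrected roots sum to n
    R_star = V @ np.diag(roots) @ V.T
    d = np.sqrt(np.diag(R_star))
    return R_star / np.outer(d, d)                # restore unit diagonal

# the 5-root illustration above: 3, 1, 0.8, 0.3, -0.1
roots = np.array([3.0, 1.0, 0.8, 0.3, -0.1])
roots = np.where(roots <= 0.0, 1.0e-4, roots)
print(roots.sum(), 5.0 / roots.sum())             # 5.1001 and 0.9804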

Display 6: Iterated communality estimates

A communality is defined as the squared multiple correlation between an observed variable and
the set of factors. The output below shows the estimated communalities for iterations 1, 2, 3, and
4. Note the small changes in the estimated values going from iteration 3 to iteration 4.

At iteration 1, the squared multiple correlation of an item with all other items is calculated for
each of the 32 items. The MINRES method (see Display 7) is subsequently used to obtain post-
solution improvements to these initial multiple regression communality estimates.
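
The iteration-1 values are squared multiple correlations (SMCs), which can be computed from
the inverse of the correlation matrix. A minimal Python sketch, with an invented 3 × 3 matrix:

import numpy as np

def smc(R):
    # squared multiple correlation of each variable with the others
    return 1.0 - 1.0 / np.diag(np.linalg.inv(R))

R = np.array([[1.0, 0.4, 0.3],
              [0.4, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
print(smc(R).round(3))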

DISPLAY 6. ITERATED COMMUNALITY ESTIMATES

1 2 3 4
1 Q158 0.413 0.373 0.371 0.371
2 Q166 0.370 0.325 0.323 0.322
3 Q167 0.156 0.116 0.115 0.115
4 Q247 0.516 0.471 0.466 0.465
5 Q249 0.142 0.088 0.087 0.087
6 Q251 0.351 0.269 0.257 0.255

31 Q313 0.477 0.422 0.415 0.414
32 Q314 0.458 0.396 0.387 0.386

Display 7: The NROOT largest latent roots of the correlation matrix

TESTFACT uses the minimum squared residuals (MINRES) method to extract factors from the
smoothed correlation matrix R* .

Let e_{ij} denote the difference between a smoothed correlation coefficient r*_{ij} and the correspond-
ing estimated correlation coefficient p_{ij}. These estimated coefficients are functions of the factor
loadings and unique variances. The MINRES method minimizes the residual sum of squares,
\sum e_{ij}^2, using ordinary least squares. A more technical description, which may be skipped, fol-
lows.
lows.

The MINRES method minimizes the sum of squares of the residuals in a matrix Δ, where

Δ_{p×p} = R* - (ΛΛ' + D_u),

Λ is a p × k common factor matrix, and the diagonal elements u_{ii} of D_u are the unique vari-
ances, i = 1, 2, …, p. If ρ_i^2 denotes the communality for item i, then u_{ii} equals 1 - ρ_i^2.

The sum of squares of the residuals is expressed as a statistical function (see, e.g., Tucker &
MacCallum, 1997), which is minimized by the determination of the matrix of factor loadings Λ
and the unique variances Du.

In this part of the output, the NROOT largest roots of the matrix

R* − Du

are reported. Note that, since uii equals 1 − ρi², the characteristic roots are actually obtained from
the smoothed correlation matrix with the unit diagonal elements replaced by the communalities.
In general, the matrix R* − Du will not be positive definite and hence a subset of the roots will be
negative.

If one replaces NROOT=10 in the FACTOR command with, for example, NROOT=20, the output
shows that roots 16 and higher are all negative. An empirical rule for the
selection of the number of factors, k, is to set k equal to the number of latent roots larger than 1.
For the present example it appears as if 3 or 4 factors are appropriate. Usually, the number of
factors is selected on the basis of some theoretical framework concerning the items included in
the analysis.

DISPLAY 7. THE NROOT LARGEST LATENT ROOTS OF THE CORRELATION MATRIX

1 2 3 4 5 6
1 6.886994 2.861018 1.961481 1.149766 0.934423 0.738751

7 8 9 10
1 0.582337 0.423875 0.326571 0.270941

Display 8: MINRES principal factor loadings

The estimated factor loadings at convergence of the MINRES method are given below. These
values are used to obtain starting values for the marginal maximum likelihood procedure speci-
fied in the FULL (full information) command.

Note that each communality is equal to the sum of squares of the corresponding factor loadings.


For example, for item 12, the 3 factor loadings are 0.406, 0.275, and 0.555. Hence,

ρ12² = 0.406² + 0.275² + 0.555² = 0.549

(see Display 6 at iteration 4 as given in the complete output file).

DISPLAY 8. MINRES PRINCIPAL FACTOR LOADINGS

1 2 3
1 Q158 -0.579 0.189 0.022
2 Q166 0.519 -0.230 -0.001
3 Q167 0.246 0.215 -0.091
4 Q247 0.535 -0.420 -0.049
5 Q249 -0.152 -0.022 -0.251
6 Q251 0.250 0.245 0.364
...
31 Q313 0.431 -0.478 -0.018
32 Q314 0.338 -0.511 0.105

Display 9: Initial intercept and slope estimates

The intercept and slope estimates are functions of the item facility and factor loadings. If the
ROTATE keyword is omitted in the FACTOR command, the factor loadings are the MINRES factor
loadings (see Display 8). Otherwise the initial rotated factor loadings are used (not shown in the
output).

Suppose the factor loadings for item 1 and a 3-factor solution are denoted by f11 , f12 , and f13 ,
respectively. Let

c1 = √(1 − f11² − f12² − f13²),

and denote the slopes corresponding to item 1 by s11 , s12 , and s13 respectively. Then

s11 = f11 / c1,   s12 = f12 / c1,   s13 = f13 / c1.

Intercepts are computed as zi / ci, where

ci = √(1 − (fi1² + fi2² + fi3²))

and zi is the z-value corresponding to an area under the N(0,1) curve equal to the item i facility.


For item 1, for example, facility equals 0.206 and the corresponding z-value is –0.8202. For item
1, c1 = 0.791 and therefore the item 1 intercept estimate is

INTERCEPT = −0.8202 / 0.791 = −1.036.

Conversely, factor loadings are related to the slopes. Let fij and sij respectively denote the j-th
factor loading and slope of item i, j = 1, 2, …, nfac. Then

fij = sij / ki,

where

ki = √(1 + si1² + si2² + si3²).

The initial intercept and slope values are used as initial estimates for the full information maxi-
mum likelihood procedure specified by the FULL command.
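
These two conversions can be sketched in a few lines of Python (ours, not TESTFACT's code).
The check at the end reproduces the item 1 intercept of –1.036 computed above; scipy's norm.ppf
supplies the z-value corresponding to the item facility.

import numpy as np
from scipy.stats import norm

def loadings_to_intercept_slopes(loadings, facility):
    f = np.asarray(loadings)
    c = np.sqrt(1.0 - np.sum(f ** 2))      # c_i = sqrt(1 - sum of squared loadings)
    return norm.ppf(facility) / c, f / c   # intercept = z_i / c_i, slopes = f_ij / c_i

def slopes_to_loadings(slopes):
    s = np.asarray(slopes)
    k = np.sqrt(1.0 + np.sum(s ** 2))      # k_i = sqrt(1 + sum of squared slopes)
    return s / k                           # f_ij = s_ij / k_i

# Item 1: z = norm.ppf(0.206) = -0.8202, and -0.8202 / 0.791 = -1.036
print(norm.ppf(0.206) / 0.791)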

DISPLAY 9. INITIAL INTERCEPT AND SLOPE ESTIMATES


INTERCEPT SLOPES
1 2 3
1 Q158 -1.036 0.387 0.636 0.191
2 Q166 0.476 -0.285 -0.609 -0.156
3 Q167 0.858 -0.341 0.023 -0.115
4 Q247 0.695 -0.245 -0.900 -0.033
5 Q249 -0.088 -0.030 0.092 0.293
6 Q251 0.092 -0.097 0.025 -0.576
...
31 Q313 1.313 -0.083 -0.837 0.020
32 Q314 0.277 0.107 -0.784 -0.045

Display 10: The EM estimation of parameters

This part of the output shows that parameter estimates will be based on the EM (Expectation
Maximization) method and that the number of quadrature points equals 4. Quadrature is a nu-
meric integration method that is often used in practice to calculate the value of an integral, when
no closed-form solution exists.

For the interested reader, a brief description of the quadrature method to calculate the log-
likelihood function is presented next.


For a one-factor analysis, for example, the log-likelihood function can be expressed as

∑_{α=1}^{N} log ∫_{−∞}^{∞} gα(θ, x) dx

where N denotes the number of cases, and θ a set of unknown parameters.

The integrals, or so-called marginal probabilities, are approximated by


∫_{−∞}^{∞} gα(θ, x) dx ≈ ∑_{k=1}^{NQUAD} wk gα(θ, xk),

where wk denote the weights and xk the quadrature points.

Display 11: Quadrature points and weights

The numeric values of the 4 quadrature points and weights are listed. Note that the weights are
always positive and that the quadrature points are symmetric about zero.

DISPLAY 11. 4 QUADRATURE POINTS AND WEIGHTS:

1 -2.334414 0.045876
2 -0.741964 0.454124
3 0.741964 0.454124
4 2.334414 0.045876
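
In code, the quadrature sum is a one-liner. The fragment below is an illustration rather than the
TESTFACT routine: g stands for a hypothetical user-supplied function returning gα(θ, x) for the
response pattern of case α, and the points and weights are those printed in Display 11.

import numpy as np

points  = np.array([-2.334414, -0.741964, 0.741964, 2.334414])
weights = np.array([ 0.045876,  0.454124, 0.454124, 0.045876])

def marginal_probability(g, theta):
    # Approximate the integral of g(theta, x) over x by sum_k w_k * g(theta, x_k)
    return sum(w * g(theta, x) for w, x in zip(weights, points))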

The next part of the output shows the progress of the iterative procedure. At each cycle, -2 x
LOG-LIKELIHOOD is reported as well as the maximum change in the intercept and slope values.
For example, the maximum change in the slope 1 estimates equals 0.098630. In other words,
starting from the initial slope 1 values given in Display 9 (0.387 for item 1, −0.285 for item 2,
…, 0.107 for item 32), the differences between these values and the revised cycle 1 slope 1
estimates are at most 0.098630 units.

Small maximum changes in intercept and slope estimates are therefore an indication of conver-
gence.

Note that, starting from cycle 6, the difference between –2 log L of the previous cycle and the
present cycle is reported. At cycle 19, for example, this value, reported as CHANGE, is 0.0726.

SUM OF MARGINAL PROBABILITIES = 0.17040D-02

CYCLE 1 - 2 X MARGINAL LOG LIKELIHOOD = 0.2084060567D+05

MAXIMUM CHANGE OF ESTIMATES


INTERCEPT = 0.038118 SLOPE = 0.098630
0.056828
0.037478


Number of patterns with zero probability = 0


...

SUM OF MARGINAL PROBABILITIES = 0.17930D-02

CYCLE 32 - 2 X MARGINAL LOG LIKELIHOOD = 0.2080175353D+05


CHANGE = -0.3000105835D-02

MAXIMUM CHANGE OF ESTIMATES


INTERCEPT = 0.002038 SLOPE = 0.005042
0.001369
0.003811

Number of patterns with zero probability = 0

Display 12: χ² and degrees of freedom

The χ²-statistic reported below is calculated as

χ² = ∑_{j=1}^{N_R} Wj log[ Wj / (WT × pj) ],

where N_R denotes the number of unique observed response patterns, Wj the sum of weights for
pattern j, WT the total sum of weights, and pj the marginal probability (marginal likelihood
function) for pattern j.

The degrees of freedom, ndf, equal

ndf = (N_R − 1) − [(nfac + 1)n − nfac(nfac − 1)/2].

For this example, N_R = 598, nfac = 3, and n (the number of items) equals 32. Hence

ndf = 597 − [128 − 3] = 472.

This χ²-statistic can be used to test hypotheses of the form:

H0: A k-factor model provides an adequate description of the data.
H1: A (k + 1)-factor model provides an adequate description of the data.

The resultant test statistic is the difference between the χ² under H0 and the χ² under H1, with
degrees of freedom equal to the difference in degrees of freedom for H0 and H1.


If we replace the NFAC=3 keyword in the FACTOR command with NFAC=2, then

χ0² = 13498.63,  ndf = 502.

From the output below, χ1² = 13155.03 with 472 degrees of freedom. The χ² for a 2-factor ver-
sus a 3-factor model is 13498.63 – 13155.03 = 343.60 with 502 – 472 = 30 degrees of freedom.
Since this value is highly significant, we reject the 2-factor model in favor of the 3-factor model.

DISPLAY 12. CHI-SQUARE = 13155.03 DF = 472.00 P = 0.000
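
The difference test described above takes only a few lines to carry out; the sketch below uses
scipy's chi-square survival function for the p-value.

from scipy.stats import chi2

chisq_h0, df_h0 = 13498.63, 502   # 2-factor model (NFAC=2)
chisq_h1, df_h1 = 13155.03, 472   # 3-factor model (NFAC=3)

diff, df_diff = chisq_h0 - chisq_h1, df_h0 - df_h1   # 343.60 on 30 degrees of freedom
p_value = chi2.sf(diff, df_diff)                     # essentially zero: reject the 2-factor model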

Display 13: Untransformed item parameters

The output below shows the estimated intercept and slope estimates after convergence is at-
tained, or alternatively, after the maximum number of cycles specified is used. The number of
EM cycles can be specified by one of the following commands:

>FULL CYCLES = ncycles;


>TECHNICAL ITER(a,b,c);

DISPLAY 13. UNTRANSFORMED ITEM PARAMETERS


INTERCEPT SLOPE ESTIMATES
1 2 3
1 Q158 -1.048 0.280 0.620 0.264
2 Q166 0.482 -0.244 -0.562 -0.181
3 Q167 0.868 -0.294 0.038 -0.135
4 Q247 0.693 -0.141 -0.853 -0.125
5 Q249 -0.086 -0.028 0.083 0.277
6 Q251 0.093 -0.066 0.015 -0.591
...
31 Q313 1.361 0.063 -0.863 -0.133
32 Q314 0.278 0.115 -0.757 -0.069

Display 14: Standardized difficulty, communality, and principal factors

Each communality is equal to the sum of squared factor loadings for the corresponding item. For
example, for item 1 the factor loadings are –0.553, –0.194, and 0.069, so that the communality
equals (–0.553)² + (–0.194)² + 0.069² = 0.348. The standardized difficulty for item i is calculated
as –intercept / ki, where (see the comments for Display 9)

ki = √(1 + si1² + si2² + si3²)

and sij denotes the j-th slope for item i. For item 1, for example,

s11² + s12² + s13² = 0.279² + 0.621² + 0.264² = 0.533,

so that k1 = √1.533 = 1.238.


Hence, the standardized difficulty for item 1 = –( –1.048)/1.238 = 0.846.

An item with a standardized difficulty of 0 can be regarded as an item with “average” difficulty.
Standardized difficulty scores above 0 are associated with the more difficult items and a value of
1.0, for example, indicates that examinees can be expected to find this item more difficult to an-
swer than an item with standardized difficulty of less than 1. On the other hand, items with stan-
dardized difficulty of less than 0 (for example item 31) can be expected to be much easier to an-
swer correctly.

As mentioned earlier (see Display 9), the relationship between slopes and unrotated factor load-
ings is given by

fij = sij / ki,

where i is the item number, j the slope number and ki as defined above.

The principal factor loadings given below are obtained as follows. Let F be the n × nfac matrix of
factor loadings with typical element fij = sij / ki, and define S as the n × n symmetric matrix FF'
with column rank equal to the number of factors, nfac. This implies that S has a maximum of
nfac non-zero characteristic roots c1, c2, ..., cnfac. If we denote the corresponding eigenvectors
by e1, e2, ..., enfac, then the principal factor loadings shown in the output below are computed as
f1* = e1√c1, f2* = e2√c2, and f3* = e3√c3, where the elements of fj* are the factor loadings for
the j-th factor, j = 1, 2, 3.
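
The following numpy sketch (ours, not TESTFACT's code) reproduces these calculations. The
item 1 values come from Display 13, and principal_factors implements the eigenvector scaling
just described; note that principal factor loadings are determined only up to a sign change per
column.

import numpy as np

def difficulty_and_loadings(intercept, slopes):
    s = np.asarray(slopes)
    k = np.sqrt(1.0 + np.sum(s ** 2))   # k_i = sqrt(1 + sum of squared slopes)
    return -intercept / k, s / k        # standardized difficulty and factor loadings

def principal_factors(F):
    # Eigenvectors of S = FF' scaled by the square roots of the nfac largest roots
    roots, vectors = np.linalg.eigh(F @ F.T)
    top = np.argsort(roots)[::-1][: F.shape[1]]
    return vectors[:, top] * np.sqrt(roots[top])

diff, load = difficulty_and_loadings(-1.048, [0.279, 0.621, 0.264])   # item 1
print(diff)   # about 0.846, as in Display 14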

DISPLAY 14. STANDARDIZED DIFFICULTY, COMMUNALITY, AND PRINCIPAL FACTORS

DIFF. COMM. FACTORS

1 2 3
1 Q158 0.846 0.348 -0.553 -0.194 0.069
2 Q166 -0.406 0.290 0.496 0.208 -0.030
3 Q167 -0.825 0.096 0.215 -0.215 0.062
4 Q247 -0.522 0.433 0.512 0.410 -0.050
5 Q249 0.083 0.078 -0.146 0.032 0.235
6 Q251 -0.080 0.261 0.246 -0.242 -0.377
...
31 Q313 -1.024 0.434 0.419 0.487 -0.145
32 Q314 -0.221 0.372 0.341 0.488 -0.131


Display 15: Percent of variance explained

The percentage variance explained by factor j is calculated as

(cj / n) × 100%,   j = 1, 2, …, nfac,

where c j is the j-th characteristic root of FF ' (see Display 14) and n the number of items.

From the values reported in the output, it is seen that 20.31% of the total variance is explained by
the first factor, 8.64% by the second and 5.68% by the third factor. Since

20.31 = (c1 / 32) × 100,

it follows that c1 = 6.50.

DISPLAY 15. PERCENT OF VARIANCE

1 2 3
1 20.31014 8.64630 5.68340

Display 16: Standardized difficulty, communality and VARIMAX factors

The output below contains the VARIMAX rotated factors.

Let Λ be an n × k matrix of factor loadings. This matrix represents the relationships between the
original n items and k linear combinations of these items. To illustrate, suppose the number of
items (n) is 4 and the number of factors (k) equals 2:

F1 = λ11 Item1 + λ21 Item2 + λ31 Item3 + λ41 Item4


F2 = λ12 Item1 + λ22 Item2 + λ32 Item3 + λ42 Item4

where F1 and F2 are uncorrelated and the variances of F1 and F2 are the so-called eigenvalues.
The factor loadings {λij } are only unique up to a rotation in k-dimensional space. A suitable rota-
tion of these factor loadings can result in a simplified structure between the factors and items if
the new set of factor loadings {λij*} are either relatively large or small. Rotations may be found
by minimizing the criterion (see, e.g. Lawley and Maxwell (1971))

V = ∑_{j=1}^{k} ∑_{i=1}^{n} (λij*)⁴ − (γ/n) ∑_{j=1}^{k} [ ∑_{i=1}^{n} (λij*)² ]²,


where the constant γ gives a family of rotations with γ = 1 giving VARIMAX rotations and
γ = 0 QUARTIMAX rotations.
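
The criterion transcribes directly into code. The sketch below merely evaluates V for a given
matrix of loadings; rotation software searches over rotations in k-dimensional space for the
loadings that optimize it.

import numpy as np

def orthomax_criterion(L, gamma=1.0):
    # V = sum_j sum_i L_ij^4 - (gamma / n) * sum_j (sum_i L_ij^2)^2
    # gamma = 1 gives the VARIMAX criterion, gamma = 0 QUARTIMAX
    n = L.shape[0]
    return np.sum(L ** 4) - (gamma / n) * np.sum(np.sum(L ** 2, axis=0) ** 2)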

Note that the standardized difficulty and communality estimates are the same as those given in
Display 14. To determine which items are associated with a specific factor, one may select, for
each item, the column with the highest loading (ignoring the sign of the loading). The following
items appear to be indicators of Factor 2, for example: items 1, 2, 4, 8, 20, 24, 25, 26, 31 and 32.

DISPLAY 16. STANDARDIZED DIFFICULTY, COMMUNALITY, AND VARIMAX FACTORS


DIFF. COMM. FACTORS
1 2 3
1 Q158 0.846 0.348 0.261 0.499 0.175
2 Q166 -0.406 0.290 -0.234 -0.470 -0.117
3 Q167 -0.825 0.096 -0.287 0.043 -0.110
4 Q247 -0.522 0.433 -0.138 -0.641 -0.058
5 Q249 0.083 0.078 -0.005 0.091 0.263
6 Q251 -0.080 0.261 -0.092 -0.006 -0.503
...
31 Q313 -1.024 0.434 0.014 -0.654 -0.075
32 Q314 -0.221 0.372 0.063 -0.605 -0.035

Phase 7: Factor scores using EAP estimates

The factor scores are Bayes estimates computed under the assumption that the corresponding
ability factors are normally distributed in the population from which the sample of examinees
was drawn.

Let θik denote the k-th ability score, k = 1, 2, …, nfac, for examinee i, i = 1, 2, …, N. The fac-
tor scores are then E(θik | xi1, xi2, ..., xin), where xij is the item j score for examinee i (see the
discussion of the output in Section 13.7 for more details).

Display 17: Quadrature points and weights

To obtain these conditional expectations, a 5-point quadrature formula is employed. The points
and weights are shown below.

DISPLAY 17. 5 FACTOR SCORE QUADRATURE POINTS AND WEIGHTS:

1 -2.856970 0.011257
2 -1.355626 0.222076
3 0.000000 0.533333
4 1.355626 0.222076
5 2.856970 0.011257
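
For a single factor, the EAP computation can be sketched as follows (an illustration, not the
multidimensional TESTFACT routine). Here likelihood is a hypothetical function giving the
probability of the observed response pattern at ability t, and the points and weights are those
listed above.

import numpy as np

points  = np.array([-2.856970, -1.355626, 0.000000, 1.355626, 2.856970])
weights = np.array([ 0.011257,  0.222076, 0.533333, 0.222076, 0.011257])

def eap_score(likelihood):
    # Posterior at the quadrature points, proportional to likelihood times weight
    post = weights * np.array([likelihood(t) for t in points])
    post = post / post.sum()
    mean = np.sum(post * points)                         # the EAP estimate
    se = np.sqrt(np.sum(post * (points - mean) ** 2))    # posterior s.d., reported as S.E.
    return mean, se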


Display 18: Factor scores and standard error estimates

The command file contains the command

>SCORE METHOD=2, LIST=20;

This command requests that the factor ability scores for the first 20 cases should be listed as part
of the output. The full set of factor scores is written to the file exampl03.fsc. For each case, the
case ID, number of items presented, percent correct and percent omitted are reported. Below
these values, the ability scores for each factor, with estimated standard errors marked with an
asterisk, are given. Case 3, for example, was presented with 30 items of which 13 were answered
correctly. Hence the percentage correct for this case is

(13 / 30) × 100 = 43.3.

Case 10 answered 84.4% of the items correctly and had factor scores of 0.898, 1.234 and 1.710 re-
spectively. Since the means of the 598 factor scores (see the last part of the output) are approxi-
mately 0 with standard deviations of 0.86, 0.86 and 0.82 respectively, it can be concluded that
examinee 10 attained factor scores that are at least one standard deviation above average.

Factor scores are not unique in the sense that multiplication of any column of factor scores by –1
does not affect the validity of the estimates. It may therefore happen that negative scores are as-
sociated with above average percent responses and vice versa for below average responses.
TESTFACT attempts to reverse the signs in such a way that scores above zero are usually as-
sociated with above-average achievement.

DISPLAY 18. FACTOR SCORES AND STANDARD ERRORS (S.E.)

CASE NUMBER PERCENT PERCENT CASE ID


PRESENTED CORRECT OMITTED
SCORES: 1 2 3
S.E.*
==============================================================
1 32 59.4 0.0 2010002201
0.264 1.018 0.120
0.560* 0.543* 0.576*
2 32 12.5 0.0 0012212110
-1.329 -0.100 -1.495
0.483* 0.469* 0.645*
3 30 43.3 0.0 0010.02100
-0.572 0.346 0.035
0.420* 0.511* 0.527*
4 32 43.8 0.0 0020202202
-0.612 -1.378 0.584
0.530* 0.521* 0.587*
5 32 37.5 0.0 2010002210
-0.901 -0.061 -0.123
0.446* 0.482* 0.541*

7 32 59.4 0.0 1021001110
0.548 -1.653 1.132


0.456* 0.532* 0.611*


8 32 34.4 0.0 0010012210
-0.156 -0.332 -0.817
0.436* 0.484* 0.574*
9 32 28.1 0.0 2010011100
-0.204 -0.590 -0.597
0.433* 0.478* 0.556*
10 32 84.4 0.0 ...
0.898 1.234 1.710
0.699* 0.654* 0.709*
...

13.5 Adaptive item factor analysis and Bayes modal (MAP) factor score estimation
for the activity survey

This example analyzes 32 items selected from the 48-item version of the Jenkins Activity Survey
for Health Prediction, Form B (Jenkins, Rosenman, & Zyzanski, 1972). The data are responses
of 598 men from central Finland drawn from a larger survey sample. Most of the items are rated
on three-point scales representing little or no, occasional, or frequent occurrence of the activity
or behavior in question. For purposes of the present analysis, the scales have been dichotomized
near the median. For a complete discussion of the contents of the data file and variable format
statement used to read these data, see Section 13.4. In Section 13.4, EAP factor score estimation
was performed. This example, illustrating MAP factor score estimation, imports the ex-
ampl04.par file from the previous example (FILE keyword on the SCORE command) to score the
respondents to the survey using the VARIMAX rotated factor pattern.

The PROBLEM, RESPONSE, KEY, SELECT and INPUT commands are the same as used in Section
13.4, with the exception of the SKIP keyword added to the PROBLEM command. SKIP=2 indicates
that both the classical item analysis and the item factor analysis are skipped; accordingly, the
TETRACHORIC, FACTOR and FULL commands have been removed.
command is still present, but only used to save factor scores to the file exampl04.fsc (FSCORES
option on the SAVE command).

The SCORE command now indicates the use of MAP estimation (METHOD=3). The FILE keyword
indicates the parameter file to be used while the NFAC keyword specifies the number of factors
used when estimating the factor scores (recall that in the previous example 3 factors were ex-
tracted). Factor scores for the first 20 cases are to be written to the output file (LIST=20) and the
convergence for the MAP iterations is set by the SPRECISION keyword. Cases will be scored by
the MAP (Maximum A Posteriori, or Bayes Modal) method. Standard error estimates will be
computed from the posterior information at the estimated values.

>TITLE
EXAMPL05.TSF-ITEMS FROM THE JENKINS ACTIVITY SURVEY
SCORING THE RESPONDENTS (MAP METHOD)
>PROBLEM NITEMS=48,SELECT=32,RESPONSES=5,NOTPRESENTED,SKIP=2;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8', '0', '1', '2', '.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>SAVE FSCORES;


>SCORE METHOD=3,LIST=20,NFAC=3,SPRECISION=0.0001,
FILE='EXAMPL04.PAR';
>INPUT NIDCHAR=10,SCORES,FILE='EXAMPL04.DAT';
(10A1,T1,48A1)
>STOP

13.6 Six-factor analysis of the activity survey by Monte Carlo full information
analysis

This example illustrates a six-dimensional analysis by the Monte Carlo version of adaptive EM
estimation. The same 32 items selected from the 48-item version of the Jenkins Activity Survey
for Health Prediction, Form B (Jenkins, Rosenman, and Zyzanski, 1972) as in the previous 2 ex-
amples are used. For a complete discussion of the contents of the data file and variable format
statement used to read these data, see Section 13.4.

The TETRACHORIC command requests the printing of the coefficients to 3 decimal places
(NDEC=3) in the printed output file (LIST option). The FACTOR and FULL commands are used to
specify parameters for the full information item factor analysis. Six factors and six latent roots
are to be extracted, as indicated by the NFAC and NROOT keywords respectively. A PROMAX rota-
tion is requested. Note that this keyword may not be abbreviated in the FACTOR command. A
maximum of 24 EM cycles will be performed (CYCLES keyword on the FULL command).

In place of the default method of integration by fractional quadrature of the posterior distribu-
tions, the program performs Monte Carlo integration in the corresponding number of dimensions.
Random points are drawn at each E-step from the provisional posterior distribution for each case,
which is assumed multivariate normal in the number of factors. After the specified iteration limit
is reached, the points for each case at the iteration limit are saved and used in all subsequent EM
cycles. The seed for the random number generator used in the Monte Carlo EM solution is set to
a value of 4593 by the MCEMSEED keyword on the TECHNICAL command.

Monte Carlo integration is also used in computing EAP factor scores (METHOD=2 on the SCORE
command). Factor scores for the first 20 cases are to be written to the output file (LIST=20).

>TITLE
EXAMPL06.TSF- ITEMS FROM THE JENKINS ACTIVITY SURVEY
SIX-FACTOR ANALYSIS BY MONTE CARLO EM FULL INFORMATION ANALYSIS
>PROBLEM NITEMS=48,SELECT=32,RESPONSES=5,NOTPRESENTED;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>RESPONSE '8', '0', '1', '2', '.';
>KEY 002000220022222220022222202222220002220022222000;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>TETRACHORIC LIST, NDEC=3;
>FACTOR NFAC=6,NROOT=6,ROTATE=PROMAX;
>FULL CYCLES=24;
>SCORE METHOD=2,LIST=20;
>TECHNICAL MCEMSEED=4593;
>INPUT NIDCHAR=10,SCORES,FILE='EXAMPL04.DAT';
(10A1,T1,48A1)
>STOP


13.7 Item bifactor analysis of a 12th-grade science assessment test

Data for this example are based on 32 items from a science assessment test in the subjects of bi-
ology, chemistry, and physics administered to twelfth-grade students near the end of the school
year. The items were classified by subject matter for purposes of the bifactor analysis.

The first five cases from the data file exampl07.dat are shown below. The FILE keyword on the
INPUT command denotes this file as the data source and the SCORES option indicates that it con-
tains item scores.

Case001 14523121312421534414334135131545
Case002 34283328312821524114338184145848
Case003 14543223322131554134331134134441
Case004 24423324322421524134315254134242
Case005 24523221122421544514333115131241

The case identification is given in the first 7 columns, and is listed first in the variable format
statement. The length of this field is also indicated by the NIDCHAR keyword on the INPUT com-
mand. After using the “T” operator to tab to column 11, the 32 item responses are read as single
characters (32A1).

(7A1,T11,32A1)

32 items from the science test are used as indicated by the NITEMS keyword on the PROBLEM com-
mand, and the RESPONSE keyword denotes the number of possible responses. The six responses
are listed in the RESPONSE command. Naming of the items is done using the NAMES command,
while the KEY command lists the correct response to each item.

The BIFACTOR command is used to request full information estimation of loadings on a general
factor in the presence of item-group factors. Three item-group factors are present (NIGROUP=3),
with allocation of the items to these groups as specified with the IGROUPS keyword. The CPARMS
keyword lists the probabilities of chance success on each item. By setting the LIST keyword to 3,
the bifactor loadings will be printed in both item and in item-group order in the output file. A
total of 30 EM cycles (CYCLES=30) will be performed in the bifactor solution.

The SCORE command is used to obtain, for each distinct pattern, the EAP score of the general
factor of the bifactor model and to obtain the standard error estimate of the general factor score
allowing for conditional dependence introduced by the group factors. Factor scores for the first
10 cases will be printed to the output file (LIST=10) and the guessing model will be used in the
computation of the factor scores (CHANCE option).

>TITLE
EXAMPL07.TSF- ITEM BIFACTOR ANALYSIS OF A TWELFTH-GRADE SCIENCE ASSESSMENT TEST
THE GENERAL FACTOR WILL BE SCORED
>PROBLEM NITEMS=32,RESPONSE=6;
>NAMES CHEM01,PHYS02,CHEM03,PHYS04,PHYS05,CHEM06,BIOL07,CHEM08,
BIOL09,BIOL10,BIOL11,PHYS12,BIOL13,PHYS14,BIOL15,CHEM16,
BIOL17,BIOL18,PHYS19,PHYS20,BIOL21,BIOL22,PHYS23,BIOL24,
PHYS25,PHYS26,BIOL27,PHYS29,CHEM29,PHYS30,BIOL31,CHEM32;


>RESPONSE 8,1,2,3,4,5;
>KEY 14523121312421534414334135131545;
>BIFACTOR NIGROUP=3,LIST=3,CYCLES=30,QUAD=9,
IGROUPS=(2,3,2,3,3,2,1,2,1,1,1,3,1,3,1,2,1,1,3,3,1,1,
3,1,3,3,1,3,2,3,1,2),
CPARMS=(0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,
0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,
0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1,0.1);
>SCORE LIST=10,CHANCE;
>SAVE PARM,FSCORES;
>INPUT NIDCHAR=7,SCORES,FILE='EXAMPL07.DAT';
(7A1,T11,32A1)
>STOP;

13.7.1 Discussion of bifactor analysis output

Exampl07.tsf illustrates the extension of a one-factor model to a so-called bifactor model by the
inclusion of group factors. The bifactor model is applicable when an achievement test contains
more than one subject matter content area. The data set exampl07.dat consists of the results of a
32-item science assessment test in the subjects biology, chemistry, and physics. Items are classi-
fied according to subject matter where 1 = biology, 2 = chemistry and 3 = physics (see the
IGROUPS keyword in the BIFACTOR command, discussed in Section 5.3.3). Note that TESTFACT
does not estimate guessing parameters, but does allow the user to specify the values (see CPARMS
keyword) in which case a 3-parameter model that provides for the effect of guessing is fitted to
the data.

The analysis specified in exampl07.tsf produces Phase 0, Phase 1, Phase 2, Phase 6 and Phase 7
output. The interpretation of Phases 0, 1, and 2 is omitted here, since a detailed discussion of
these parts of the output is given elsewhere.

Phase 6: Bifactor analysis

Display 1 lists the chance and initial intercept and slope estimates. Note that the initial intercept
estimates are set equal to zero, while the initial slope estimates are set to 1.414 for the general
factor and 1.00 for the group factors. These initial values are routinely used in TESTFACT for bifactor
models.

DISPLAY 1. CHANCE AND INITIAL INTERCEPT AND SLOPE ESTIMATES


CHANCE INTERCEPT SLOPES
1 2
1 CHEM01 0.100 0.000 1.414 1.000
2 PHYS02 0.100 0.000 1.414 1.000
3 CHEM03 0.100 0.000 1.414 1.000
4 PHYS04 0.100 0.000 1.414 1.000
5 PHYS05 0.100 0.000 1.414 1.000
6 CHEM06 0.100 0.000 1.414 1.000
...
31 BIOL31 0.100 0.000 1.414 1.000
32 CHEM32 0.100 0.000 1.414 1.000

One may optionally include the TETRACHORIC command (see exampl03.tsf) when fitting a bi-
factor model. This command is required if a printout of residuals is requested. If a TETRACHORIC


command is used, tetrachoric correlations are computed pairwise for the 32 × (32 − 1) / 2 = 496
pairs of items. There are a total of 20 item pairs that cannot be used since their corresponding
2 × 2 frequency tables contain zero or near-zero off-diagonal or marginal frequencies. In these
cases, a tetrachoric correlation of 1 is substituted in the matrix of tetrachoric correlations.

The inclusion or exclusion of the TETRACHORIC command has no effect on the estimation proce-
dure, since the starting values for the marginal maximum likelihood procedure are fixed, and do
not depend on the matrix of tetrachoric coefficients.

Display 2-3: EM estimation and quadrature points and weights

The bifactor procedure uses the 9 quadrature points and weights listed below. MML estimation
for the bifactor model requires quadrature in only two dimensions. For a more detailed discus-
sion, see the Phase 7 part of the output.

DISPLAY 2. THE EM ESTIMATION OF PARAMETERS

9 QUADRATURE POINTS

DISPLAY 3. 9 QUADRATURE POINTS AND WEIGHTS

1 -4.000000 0.000134
2 -3.000000 0.004432
3 -2.000000 0.053991
4 -1.000000 0.241971
5 0.000000 0.398942
6 1.000000 0.241971
7 2.000000 0.053991
8 3.000000 0.004432
9 4.000000 0.000134

The number of cycles for the EM algorithm is set equal to 30 (CYCLES=30 on the BIFACTOR
command). At each cycle, the value of –2 log L as well as the maximum change in the intercept
and slope parameters are given. At cycle 30 the maximum change in intercept is 0.0050. The
general factor slope estimates for the 32 items changed at most by 0.0047 while the correspond-
ing value for the group factor equals 0.0095. These values indicate that, although convergence
was not reached within the specified 30 cycles, the solution after 30 cycles is probably accept-
able for all practical purposes.

CYCLE 30 -2 X MARGINAL LOG-LIKELIHOOD = 0.1882667932D+05


CHANGE = 0.4390039691D-01

MAXIMUM CHANGE OF ESTIMATES


INTERCEPT = 0.004952 SLOPE = 0.004758
0.009507

Display 4: χ² and degrees of freedom

DISPLAY 4. CHI-SQUARE = 11150.36 DF = 503.00 P = 0.000


The χ²-value is 11150.36 with 503 degrees of freedom. The number of degrees of freedom is cal-
culated as

df = N − 1 − 2n − ng

where N is the number of distinct patterns, n is the number of items, and ng is the number of
items assigned to group factors. For this example, N = 600, n = 32 and, since all the items are
assigned to group factors, ng = 32.

The χ²-statistic is only correct when all possible 2ⁿ response patterns are observed. For the present sam-
ple, since N ≪ 2ⁿ, the χ²-statistic is too inaccurate to be used as a goodness-of-fit test statistic.
The difference in the χ²-statistics for alternative models, however, yields a valid test statistic for
judging whether the inclusion of additional parameters results in a significant improvement of
model fit.

Example

It is hypothesized that the 12 physics items are indicators of a general factor only, while the biol-
ogy and chemistry items are indicators of a general and two uncorrelated group factors.

We wish to test

H0: The 32 items are indicators of a general factor, but the 13 biology and 7 chemistry
    items are also indicators of two uncorrelated group factors.
H1: The 32 items are indicators of a general as well as three uncorrelated group factors.

To obtain the χ²-statistic and degrees of freedom under H0, the BIFACTOR command is modi-
fied as follows:

>BIFACTOR NIGROUP=2, LIST=3, CYCLES=30,


IGROUPS=(2,0,2,0,0,2,1,2,1,1,1,0,1,0,1,2,1,1,0,0,1,1,0,1,0,0,1,0,2,0,1,2),
CPARMS=(0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1,
0.1, 0.1, 0.1, 0.1, 0.1);

Note that the NIGROUP keyword is set equal to 2 and that each value of 3, corresponding to the
position of the physics items in the data set, is substituted by a value of 0 in the IGROUPS key-
word. A “0” symbol indicates that the corresponding items are not assigned to any group factors.
A graphical presentation of the H 0 model is shown below.


If we run exampl07.tsf with the changes to the BIFACTOR command as discussed above, the χ²-
statistic value and degrees of freedom shown below are obtained.

DISPLAY 4. CHI-SQUARE = 11179.71 DF=515.00 P=0.000

To test H0 against H1, one computes the difference in the corresponding χ²-statistics and degrees
of freedom. Hence χ² = 11179.71 – 11150.36 = 29.35 with 515 – 503 = 12 degrees of freedom.

Since P(χ²(12) ≥ 29.35) = 0.0034, H0 is rejected and it is concluded that items from all 3 sub-
jects should be used for the group factors.

Display 5: Untransformed item parameters

The estimates for the intercept and slope parameters are listed below.


DISPLAY 5. UNTRANSFORMED ITEM PARAMETERS


CHANCE INTERCEPT SLOPES

1 2
1 CHEM01 0.100 -1.054 0.709 0.417
2 PHYS02 0.100 0.126 1.019 0.548
3 CHEM03 0.100 -1.360 1.265 -0.182
4 PHYS04 0.100 -0.578 0.469 0.377
5 PHYS05 0.100 0.263 0.635 0.337
6 CHEM06 0.100 -2.729 1.706 0.308
...
31 BIOL31 0.100 1.608 1.447 -0.005
32 CHEM32 0.100 -1.522 0.190 0.066

An alternative way to present these estimated parameters is shown below for the first 10 items.

Item Chance Intercept General Group1 Group2 Group3


--------------------------------------------------------------------
1 CHEM01 0.100 -1.054 0.709 0.000 0.417 0.000
2 PHYS02 0.100 0.126 1.019 0.000 0.000 0.548
3 CHEM03 0.100 -1.360 1.265 0.000 -0.182 0.000
4 PHYS04 0.100 -0.578 0.469 0.000 0.000 0.377
5 PHYS05 0.100 0.263 0.635 0.000 0.000 0.337
6 CHEM06 0.100 -2.729 1.706 0.000 0.308 0.000
7 BIOL07 0.100 0.839 0.586 0.636 0.000 0.000
8 CHEM08 0.100 -2.220 1.144 0.000 0.929 0.000
9 BIOL09 0.100 1.287 0.212 0.476 0.000 0.000
10 BIOL10 0.100 -0.464 0.762 0.444 0.000 0.000

Display 6: Percent of variance

DISPLAY 6. PERCENT OF VARIANCE


----------------------------
GENERAL 0 31.7580
ITEM GROUP 1 3.8018
ITEM GROUP 2 2.7716
ITEM GROUP 3 2.9551
UNIQUENESS 58.7134
----------------------------

The percentage variance explained by each of the four factors is calculated as follows. Let sij
denote the j-th slope parameter for item i, i = 1, 2, …, 32. If we define

ki = √(1 + ∑j sij²),

then slopes are transformed to factor loadings (see Display 9 in the discussion of the Section
13.4 output) using the relationship

fij = sij / ki.


Example

For item 7,

k7 = √(1 + 0.586² + 0.636²) = 1.322.

The item 7 loadings are therefore 0.586/1.322 = 0.443 and 0.636/1.322 = 0.481 respectively. Let
F be a 32 × 4 matrix of factor loadings with elements (see Display 7)

 0.5475 0.0000 0.3222 0.0000 


 0.6663 0.0000 0.0000 0.3585

F=     .
 
 0.8227 −0.0029 0.0000 0.0000 
 0.1859 0.0000 0.0649 0.0000 

The percentage variance explained by factor j is calculated as

(cj / n) × 100%,   j = 1, 2, 3, 4,

where n = 32 and cj is the j-th characteristic root of FF' (see also the discussion of the output,
Display 15, in Section 13.4).

The uniqueness component is calculated as

[(n − ∑j cj) / n] × 100%.
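
A sketch of the whole computation (ours, not TESTFACT's code) follows; it assumes the slopes
are arranged as an n × 4 matrix with zeros in the columns of factors to which an item is not
assigned.

import numpy as np

def percent_variance(S):
    # S: n x 4 slope matrix (general factor plus three group factors)
    k = np.sqrt(1.0 + np.sum(S ** 2, axis=1))   # k_i per item
    F = S / k[:, None]                          # loadings f_ij = s_ij / k_i
    roots = np.sort(np.linalg.eigvalsh(F @ F.T))[::-1][: S.shape[1]]
    n = S.shape[0]
    percents = 100.0 * roots / n                # one value per factor
    uniqueness = 100.0 * (n - roots.sum()) / n
    return percents, uniqueness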

Display 7: Standardized difficulties, communalities and bifactor loadings

The bifactor loadings are derived from the slope estimates using the formula fij = sij / ki (see Dis-
play 6 above). The standardized item i difficulty equals −intercept / ki .

Example

For item 7, k7 = 1.322 so that the standardized difficulty is −0.839 /1.322 = −0.635. Communal-
ities are equal to the sum of the squares of the factor loadings. For example, the item 1 commu-
nality is equal to

0.5475² + 0.3222² = 0.4036.


DISPLAY 7. BIFACTOR RESULTS IN SEQUENTIAL ITEM ORDER


ITEM GROUP DIFFICULTY COMMUNALITY GENERAL SPECIFIC
----------------------------------------------------------------
1 CHEM01 2 0.8138 0.4036 0.5475 0.3222
2 PHYS02 3 -0.0821 0.5725 0.6663 0.3585
3 CHEM03 2 0.8380 0.6204 0.7797 -0.1120
4 PHYS04 3 0.4952 0.2659 0.4022 0.3226
...
31 BIOL31 1 -0.9140 0.6768 0.8227 -0.0029
32 CHEM32 2 1.4923 0.0388 0.1859 0.0649
----------------------------------------------------------------

Display 8: Bifactor results in item group order

The printout below shows the same information as for Display 7, except that the items are re-or-
dered by group number. All 32 items have positive loadings on the general factor, while the
group factor loadings for BIOL31, CHEM03 and PHYS30 are negative, but relatively small.

DISPLAY 8. BIFACTOR RESULTS IN ITEM GROUP ORDER


ITEM GROUP DIFFICULTY COMMUNALITY GENERAL SPECIFIC
----------------------------------------------------------------
7 BIOL07 1 -0.6347 0.4277 0.4430 0.4812
9 BIOL09 1 -1.1417 0.2136 0.1882 0.4221
10 BIOL10 1 0.3483 0.4372 0.5713 0.3330
11 BIOL11 1 -2.0915 0.3887 0.5390 0.3133
13 BIOL13 1 -0.3328 0.3918 0.5249 0.3411
15 BIOL15 1 -0.8412 0.4683 0.5493 0.4081
17 BIOL17 1 -1.7506 0.3391 0.5395 0.2192
18 BIOL18 1 0.5366 0.7089 0.8358 0.1022
21 BIOL21 1 -1.3192 0.2177 0.2152 0.4140
22 BIOL22 1 -1.4669 0.3685 0.5956 0.1175
24 BIOL24 1 -0.5344 0.3935 0.5922 0.2069
27 BIOL27 1 -1.0345 0.5244 0.7042 0.1686
31 BIOL31 1 -0.9140 0.6768 0.8227 -0.0029

1 CHEM01 2 0.8138 0.4036 0.5475 0.3222


3 CHEM03 2 0.8380 0.6204 0.7797 -0.1120
6 CHEM06 2 1.3636 0.7504 0.8524 0.1540
8 CHEM08 2 1.2465 0.6847 0.6422 0.5219
16 CHEM16 2 0.3761 0.3262 0.4820 0.3065
29 CHEM29 2 0.6132 0.6825 0.5533 0.6135
32 CHEM32 2 1.4923 0.0388 0.1859 0.0649

2 PHYS02 3 -0.0821 0.5725 0.6663 0.3585


4 PHYS04 3 0.4952 0.2659 0.4022 0.3226
5 PHYS05 3 -0.2137 0.3408 0.5156 0.2739
12 PHYS12 3 0.3828 0.0437 0.0999 0.1836
14 PHYS14 3 -0.5195 0.4537 0.4988 0.4527
19 PHYS19 3 -0.0128 0.2476 0.4967 0.0289
20 PHYS20 3 -1.0889 0.4226 0.6200 0.1954
23 PHYS23 3 0.7102 0.3164 0.4571 0.3279
25 PHYS25 3 0.5005 0.4205 0.4814 0.4345
26 PHYS26 3 0.2243 0.6128 0.7500 0.2242
28 PHYS29 3 0.0337 0.3544 0.5948 0.0255
30 PHYS30 3 0.3034 0.0979 0.2912 -0.1146
----------------------------------------------------------------


Phase 7: General bifactor score: EAP estimate

The factor scores are so-called expected a-posteriori estimates of the general ability factor under
the assumption of normality (see Phase 7, exampl03.out).

Let θi denote the general ability for examinee i. The EAP score is the conditional expectation
E(θi | xi1, xi2, ..., xin), where xij is the item j score for examinee i. It can be shown that this condi-
tional expectation follows as the solution of a two-dimensional integral that is approximated by a
Gauss quadrature formula. A brief description is provided below for the interested reader.

From well-known results for conditional distributions it follows that

E(θi | xi1, xi2, ..., xin) = ∫ θi f(θi | xi1, xi2, ..., xin) dθi,

where

f(θi | xi1, xi2, ..., xin) = f(xi1, xi2, ..., xin, θi) / f(xi1, xi2, ..., xin)
                          = f(xi1, xi2, ..., xin | θi) g(θi) / f(xi1, xi2, ..., xin).

The marginal probability function f(xi1, xi2, ..., xin) is obtained in the EM step using a two-
dimensional quadrature formula. Suppose yi1, yi2, ..., yin denotes the set of item scores ordered by
the three groups. Under the assumption of uncorrelated group factors, it follows from the model
of Section 13.7 that

f(yi1, yi2, ..., yi32 | θi, θi1, θi2, θi3) = f(yi1, yi2, ..., yi13 | θi, θi1) × f(yi14, yi15, ..., yi20 | θi, θi2)
                                           × f(yi21, yi22, ..., yi32 | θi, θi3),

where θi denotes the general ability for examinee i, and θi1, θi2, and θi3 denote the group 1 (biol-
ogy), group 2 (chemistry), and group 3 (physics) abilities respectively. Note that (see Display 9)
yi1 = xi7, yi2 = xi9, …, yi32 = xi30.

Under the independence assumption, it follows that

f(yi1, yi2, ..., yi32) = f(yi1, yi2, ..., yi13) × f(yi14, yi15, ..., yi20) × f(yi21, yi22, ..., yi32).

Each term in this product can be expressed as a two-dimensional integral.


The first term, for example, can be evaluated from

f(yi1, yi2, ..., yi13) = ∫∫ f(yi1, yi2, ..., yi13 | θ, θ1) g(θ, θ1) dθ dθ1.

This integral can be approximated by

∑k ∑l wk wl f*θ,θ1(xk, xl),

where wk and wl are the weights and xk and xl the points shown as Display 9 of the output.
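
In code, the product-rule double sum looks as follows; f_star is a hypothetical function standing
for the integrand evaluated at a pair of quadrature points.

def marginal_2d(f_star, points, weights):
    # sum_k sum_l w_k * w_l * f_star(x_k, x_l)
    return sum(wk * wl * f_star(xk, xl)
               for wk, xk in zip(weights, points)
               for wl, xl in zip(weights, points))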

Display 10: General factor scores and standard errors

The ability scores for each case and corresponding standard error estimates are tabulated. Ex-
aminee 5, for example, selected the correct answers to 22 of the 32 items. Therefore, the percent-
age correct is 22 / 32 × 100 = 68.8%.

The estimated ability score for this candidate is 0.8 with a standard error of 0.411. Candidate 7
also had correct answers to 22 of the 32 items, but the ability estimate is 0.691. It is evident that
the estimated ability depends not only on the number of correct items but also on which subset of
items was answered correctly.

DISPLAY 10. GENERAL FACTOR SCORE AND STANDARD ERROR (S.E.)

CASE HEADER:
CASE NUMBER PERCENT PERCENT CASE ID
PRESENTED CORRECT OMITTED
SCORE AND S.E.
============================================================================
1 32 100.0 0.0 Case001
2.507 0.591
2 32 53.1 0.0 Case002
-0.066 0.323
3 32 56.2 0.0 Case003
0.032 0.380
4 32 50.0 0.0 Case004
-0.559 0.505
5 32 68.8 0.0 Case005
0.800 0.411
6 32 62.5 0.0 Case006
0.342 0.491
7 32 68.8 0.0 Case007
0.691 0.469
8 32 65.6 0.0 Case008
0.286 0.463
9 32 28.1 0.0 Case009
-1.434 0.518
10 32 46.9 0.0 Case010
-0.965 0.342


Summary statistics for score estimates

The number of cases scored is equal to 600, with a mean of –0.0258 and standard deviation of
0.9011. Note that the ability scores are estimated under the assumption that the general factor
ability has a normal distribution with mean 0 and standard deviation 1. For large data sets, one
ideally wants the estimated ability scores to have mean 0 and standard deviation 1.

The root-mean-square posterior standard deviations are calculated as

RMS = √[(SE1² + SE2² + ... + SEN²) / N],

where SE1 = 0.591, SE2 = 0.323, etc., are the standard errors of the individual cases.

The RMS value of 0.4344 is relatively large, and indicates that, in general, 95% confidence in-
tervals for the estimated scores will be wide. For example, a 95% confidence interval for exami-
nee 5 is

0.8 ± 1.96(0.411) = (−0.006; 1.606).

The empirical reliability is a measure of how close the observed scores are to the true, but unob-
served, scores. A reliability of 1, for example, implies that one can safely substitute the observed
test scores for the unknown true scores.
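
The summary quantities reported below can be reproduced from the scores and standard errors.
The reliability formula in the sketch (score variance divided by score variance plus mean squared
standard error) is our reading of the output rather than documented TESTFACT code, but it is
consistent with the reported values: 0.8119 / (0.8119 + 0.1887) = 0.8114.

import numpy as np

def score_summary(scores, ses):
    scores, ses = np.asarray(scores), np.asarray(ses)
    mean_sq_se = np.mean(ses ** 2)
    rms = np.sqrt(mean_sq_se)                # root-mean-square posterior s.d.
    var = scores.var()
    reliability = var / (var + mean_sq_se)   # empirical reliability (assumed form)
    return rms, reliability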

SUMMARY STATISTICS FOR SCORE ESTIMATES


======================================
CASES SCORED 600
MEAN: -0.0258
S.D.: 0.9011
VARIANCE: 0.8119

ROOT-MEAN-SQUARE POSTERIOR STANDARD DEVIATIONS


RMS: 0.4344
VARIANCE: 0.1887

EMPIRICAL
RELIABILITY: 0.8114

13.8 Conventional three-factor analysis of the 12th-grade science assessment test

Data for this example are based on 32 items from a science assessment test in the subjects of bi-
ology, chemistry, and physics administered to twelfth-grade students near the end of the school
year. For a description of the data file and variable format statement, see the example discussed
in Section 13.7.


Although the items are classified by subject matter for purposes of a bifactor analysis (see Sec-
tion 13.7), an item factor analysis is specified here. The FULL command specifies that a maximum of
24 EM cycles (CYCLES=24) is to be performed in the full information item factor analysis in
which 3 factors and 3 latent roots are to be extracted (NFAC=3, NROOT=3 on FACTOR command).
Non-adaptive quadrature is requested through the use of the NOADAPT option on the TECHNICAL
command.

The SAVE command is used to write the unrotated factor loadings to exampl08.unr (UNROTATE
option). See Section 13.14 for more details on the use of this file.

>TITLE
EXAMPL08.TSF-THREE FACTOR ANALYSIS OF A TWELFTH-GRADE SCIENCE ASSESSMENT TEST
UNROTATED FACTOR LOADINGS ARE SAVED FOR USE IN EXAMPL14.
>PROBLEM NITEM=32,RESPONSE=6;
>NAMES CHEM01,PHYS02,CHEM03,PHYS04,PHYS05,CHEM06,BIOL07,CHEM08,BIOL09,
BIOL10,BIOL11,PHYS12,BIOL13,PHYS14,BIOL15,CHEM16,BIOL17,BIOL18,
PHYS19,PHYS20,BIOL21,BIOL22,PHYS23,BIOL24,PHYS25,PHYS26,BIOL27,
PHYS29,CHEM29,PHYS30,BIOL31,CHEM32;
>RESPONSE 8,1,2,3,4,5;
>KEY 14523121312421534414334135131545;
>TETRACHORIC NDEC=3,LIST;
>FACTOR NFAC=3,NROOT=3;
>FULL CYCLES=24;
>TECHNICAL NOADAPT;
>SAVE UNROTATE;
>INPUT NIDCHAR=7,SCORES,FILE='EXAMPL07.DAT';
(7A1,T11,32A1)
>STOP

13.9 Computing examinee general factor scores from parameters of a previous
bifactor analysis

This example illustrates bifactor scoring from saved parameters. Data for this example are based
on 32 items from a science assessment test in the subjects of biology, chemistry, and physics
administered to twelfth-grade students near the end of the school year. For a description of the
data file and variable format statement, see the example discussed in Section 13.7.

The assignment of items to group factors is not included in the parameter file exampl07.par read
using the FILE keyword on the SCORE command; it must therefore be supplied in the BIFACTOR
command, which is used to request full information estimation of
loadings on a general factor in the presence of item-group factors. Three item-group factors are
present (NIGROUP=3), with allocation of the items to these groups as specified with the IGROUPS
keyword. By setting the LIST keyword to 3, the bifactor loadings will be printed in both item and
in item-group order in the output file. A total of 30 EM cycles (CYCLES=30) will be performed in
the bifactor solution. The chance parameters are supplied in the file and do not need to be re-
entered in the command.


For the purpose of scoring from supplied parameters, the number of factors (NFAC) is set to 1 in
the SCORE command. Factor scores for the first 10 students will be printed to the output file
(LIST=10). The factor scores are also saved to the file exampl09.fsc (FSCORES on the SAVE com-
mand). The guessing model will be used in the computation of the factor scores (CHANCE option).

>TITLE
EXAMPL09.TSF- ITEM BIFACTOR ANALYSIS OF A TWELFTH-GRADE SCIENCE
ASSESSMENT TEST: THE GENERAL FACTOR WILL BE SCORED
>PROBLEM NITEM=32,RESPONSE=6;
>NAMES CHEM01,PHYS02,CHEM03,PHYS04,PHYS05,CHEM06,BIOL07,CHEM08,
BIOL09,BIOL10,BIOL11,PHYS12,BIOL13,PHYS14,BIOL15,CHEM16,
BIOL17,BIOL18,PHYS19,PHYS20,BIOL21,BIOL22,PHYS23,BIOL24,
PHYS25,PHYS26,BIOL27,PHYS29,CHEM29,PHYS30,BIOL31,CHEM32;
>RESPONSE 8,1,2,3,4,5;
>KEY 14523121312421534414334135131545;
>BIFACTOR NIGROUP=3,LIST=3,CYCLES=30,
IGROUPS=(2,3,2,3,3,2,1,2,1,1,1,3,1,3,1,2,1,1,3,3,1,1,
3,1,3,3,1,3,2,3,1,2);
>SCORE NFAC=1,LIST=10,CHANCE,FILE='EXAMPL07.PAR';
>SAVE FSCORES;
>INPUT NIDCHAR=7,SCORES,FILE='EXAMPL07.DAT';
(7A1,T11,32A1)
>STOP;

A graphical presentation of the general factor scores using a bifactor analysis with 3 groups is
shown below.


13.10 One-factor analysis of the 12th-grade science assessment test

The saved parameters from Section 13.7 are used in scoring the general factor by adaptive quad-
rature. Data for this example are based on 32 items from a science assessment test in the subjects
of biology, chemistry, and physics administered to twelfth-grade students near the end of the
school year. For a description of the data file and variable format statement, see Section 13.7.
The PROBLEM, KEY, RESPONSE, SAVE, and INPUT commands are also the same as those used in
Section 13.7.

Conditional dependence due to the group factors is not accounted for. A one-factor analysis is
requested by replacing the BIFACTOR command used in Section 13.7 with the FACTOR command
shown here.

EAP factor scores are requested (METHOD=2 on the SCORE command). The first ten cases are also
printed to the output file (LIST=10) and factor scores for all cases are saved to the file ex-
ampl10.fsc (FSCORES on SAVE command). As before, the guessing model will be used in the
computation of the factor scores (CHANCE option).

>TITLE
EXAMPL10.TSF-ONE-FACTOR ANALYSIS OF A TWELFTH-GRADE SCIENCE ASSESSMENT TEST
ADAPTIVE SCORING OF GENERAL FACTOR FROM SUPPLIED BIFACTOR PARAMETERS
>PROBLEM NITEM=32,RESPONSE=6;
>NAMES CHEM01,PHYS02,CHEM03,PHYS04,PHYS05,CHEM06,BIOL07,CHEM08,
BIOL09,BIOL10,BIOL11,PHYS12,BIOL13,PHYS14,BIOL15,CHEM16,
BIOL17,BIOL18,PHYS19,PHYS20,BIOL21,BIOL22,PHYS23,BIOL24,
PHYS25,PHYS26,BIOL27,PHYS29,CHEM29,PHYS30,BIOL31,CHEM32;
>RESPONSE 8,1,2,3,4,5;
>KEY 14523121312421534414334135131545;
>FACTOR NFAC=1;
>SCORE METHOD=2,NFAC=1,LIST=10,CHANCE,FILE='EXAMPL07.PAR';
>SAVE FSCORES;
>INPUT NIDCHAR=7,SCORES,FILE='EXAMPL07.DAT';
(7A1,T11,32A1)
>STOP;

A histogram of the factor scores obtained from the one-factor model is shown below. The distri-
bution of scores follows a bell-shaped curve with mean –0.028 and standard deviation 0.933. In
contrast, the bifactor solution (see the previous section) yields scores for the general factor
which exhibit much less variation (standard deviation = 0.273) about the mean.


13.11 Item factor analysis of a user-supplied correlation matrix

This example illustrates a MINRES factor analysis of a correlation matrix imported from the file
exampl04.cor saved in Section 13.4.

The import file is named using the FILE keyword on the INPUT command. The CORRELAT option
on this command indicates that the input is a correlation matrix for MINRES factor analysis (full
information factor analysis requires item response data and cannot be carried out directly on the
correlation matrix). In this instance, the matrix contains item tetrachoric correlations, but a corre-
lation matrix from any source could be analyzed.

For convenience in handling large correlation matrices, the tetrachoric correlation matrix is
saved and imported in format-free space delimited form. Note that names are supplied in the
NAMES command for the variables represented in the correlation matrix.

The 48 items are from the Jenkins Activity Survey; 32 of them are selected by means of the
SELECT keyword on the PROBLEM command and the SELECT command, which specifies the items
and the order of selection. SKIP=2 on the PROBLEM command bypasses the calculation and print-
ing of classical item statistics.

The FACTOR command specifies the extraction of 3 factors and 6 roots (NFAC=3, NROOT=6). A
PROMAX rotation is requested and the rotated factor loadings will be saved in the file ex-
ampl03.rot (ROTATE option on the SAVE command). Note that the PROMAX option may not be ab-
breviated on the FACTOR command.


>TITLE
EXAMPL11.TSF- ITEMS FROM THE JENKINS ACTIVITY SURVEY
ITEM FACTOR ANALYSIS OF A USER-SUPPLIED CORRELATION MATRIX
>PROBLEM NITEM=48,SELECT=32,SKIP=2;
>NAMES Q156,Q157,Q158,Q165,Q166,Q167,Q247,Q248,Q249,Q250,Q251,Q252,
Q253,Q254,Q255,Q256,Q257,Q258,Q259,Q260,Q261,Q262,Q263,Q264,
Q265,Q266,Q267,Q268,Q269,Q270,Q271,Q272,Q273,Q274,Q275,Q276,
Q277,Q278,Q279,Q280,Q307,Q308,Q309,Q310,Q311,Q312,Q313,Q314;
>SELECT 3,5,6,7,9,11(1)14,17(1)23,25(1)30,32,33,35,36,39(1)42,47,48;
>FACTOR NFAC=3,NROOT=6,ROTATE=PROMAX;
>SAVE ROTATE;
>INPUT CORRELAT,FILE='EXAMPL04.COR';
>STOP

13.12 Simulating examinee responses to a three-factor test with user-supplied
parameters

This example illustrates the simulation of a sample of 1500 responses to 32 items. Sampling is
from a multivariate latent distribution of factor scores with user-specified vector mean and fixed
correlation matrix. A three-factor model is assumed. The user must supply standardized item diffi-
culties and NFAC factor loadings (or, alternatively, an intercept and NFAC factor slopes) for each item.

Note that the PROBLEM command only indicates the number of items, and that the syntax contains
no INPUT command, but only the SIMULATE command. The NFAC keyword on this command indi-
cates the use of a three-factor model, and NCASES denotes the required sample size. The presence
of the SLOPES option indicates that the item parameters provided are the intercept and the NFAC
slopes. These parameter values are read in from the file exampl12.prm using the FILE keyword.

The MEAN keyword indicates the population means of the factor scores from which the responses
are generated. These means will be added to the random standard normal deviates representing
the ability of each case on the corresponding factors. If the MEAN keyword is omitted, zero means
are assumed and written to the *.sim file.

The simulated responses are written to a file with file extension *.sim, in this case ex-
ampl12.sim. The first line of each new record contains the case number, group number, form
number and factor means. The next line is the set of responses, where 0 indicates an incorrect
answer and 1 a correct answer. The GROUP keyword is set to its default value of 1. Similarly, test
form identification may be requested using the FORM keyword. By default, all records will be as-
sumed to belong to the same test form.

>TITLE
EXAMPL12.TSF- SIMULATE RESPONSES TO 32 ITEMS
THREE FACTOR MODEL; FACTOR SLOPES; SAMPLE SIZE= 1500
>PROBLEM NITEM=32;
>SIMULATE NFAC=3,NCASES=1500,FORM=2,GROUP=1,SLOPES,
MEAN=(0,0.0,0.0),FILE='EXAMPL12.PRM';
>STOP

The first few records of the exampl12.prm file are:


(6X,4F8.3)
1 1.041 -0.675 0.246 -0.049
2 0.480 0.585 -0.261 0.024
3 0.868 0.240 0.230 -0.063

13.13 Simulating examinee responses in the presence of guessing and non-zero
factor means

This example illustrates the simulation of a sample of 1500 responses to 32 items. Sampling is
from a multivariate latent distribution of factor scores with user-specified vector mean and fixed
correlation matrix. A three-factor model is assumed and simulation is with guessing and non-
zero factor means. The user must supply standardized item difficulties and NFAC factor loadings
(or, alternatively, an intercept and NFAC factor slopes) for each item.

Note that the PROBLEM command only indicates the number of items, and that the syntax contains
no INPUT command, but only the SIMULATE command. The NFAC keyword on this command indi-
cates the use of a three-factor model, and NCASES denotes the required sample size. The presence
of the CHANCE and LOADINGS options indicates that each item has a guessing, standardized diffi-
culty and three factor loading parameters. The parameters are read in from the file exampl3.prm
using the FILE keyword.

The MEAN keyword indicates the population means of the factor scores from which the responses
are generated. These means will be added to the random standard normal deviates representing
the ability of each case on the corresponding factors. If the MEAN keyword is omitted, zero means
are assumed and written to the *.sim file.

The simulated responses are written to a file with file extension *.sim, in this case ex-
ampl13.sim. The first line of each new record contains the case number, group number, form
number and factor means. The next line is the set of responses, where 0 indicates an incorrect
answer and 1 a correct answer. The GROUP keyword is set to its default value of 1. Similarly, test
form identification may be requested using the FORM keyword. By default, all records will be as-
sumed to belong to the same test form.

The SCORESEED keyword specifies the random number generator seed for the simulation of mean
abilities, the GUESSSEED keyword the seed for the simulation of chance parameters with popula-
tion values specified in the exampl13.prm file, and the ERRORSEED keyword the seed associated with
the simulation of the binary responses based on the difficulty and slope parameters.

>TITLE
EXAMPL13.TSF-SIMULATE RESPONSES TO 32 ITEMS WITH GUESSING AND NON-ZERO
FACTOR MEANS; THREE-FACTOR MODEL; FACTOR LOADINGS; N=1500
>PROBLEM NITEM=32;
>SIMULATE NFAC=3, NCASES=1500, FORM=2, GROUP=3, LOADINGS, ERRORSEED=1231,
SCORESEED=71893, GUESSSEED=3451, FILE='EX7SIM.PAR', CHANCE,
MEAN=(0.5,-0.5,1.0);
>STOP


The first few records of the exampl13.prm file are:

(6X,5F8.3)
1 0.200 0.844 0.552 -0.197 0.046
2 0.200 -0.405 0.497 0.215 -0.019
3 0.200 -0.824 0.222 -0.216 0.065

Discussion of simulation output

This example illustrates how to simulate data under the assumption that there are 32 binary items
that measure three ability factors. The model considered allows for guessing and for non-zero
factor means. It is assumed that for each item, the population values for the guessing, standard-
ized difficulty and factor loadings are known. These values are stored in the file ex7sim.par.

Phase 0: Input commands

The COMMENT command is used to show the format statement and the parameter values for the
first 5 of the 32 items.

(6X,5F8.3)
1 0.200 0.844 -0.552 -0.197 0.046
2 0.200 -0.405 0.497 0.215 -0.019
3 0.200 -0.824 0.222 -0.216 0.065
4 0.200 -0.520 0.508 0.415 -0.030
5 0.200 0.083 -0.145 0.026 0.244

The LOADINGS option on the SIMULATE command specifies that the population parameters are
standardized difficulties and factor loadings. Note that FORM=2, GROUP=3, ERRORSEED=1231,
GUESSSEED=3451, and SCORESEED=71893 are optional keywords.

The values of the chance, difficulty, and factor loadings for each item (the contents of ex-
ampl13.prm) are given below.

NUMBER OF ITEMS = 32
NUMBER OF CASES = 1500
NUMBER OF FACTORS = 3
CHANCE MODEL

VALUES OF CHANCE, DIFFICULTY, AND FACTOR LOADINGS


ITEM CHANCE DIFFICULTY FACTOR LOADINGS

ITEM 1 0.200 0.844 -0.552 -0.197 0.046


ITEM 2 0.200 -0.405 0.497 0.215 -0.019
ITEM 3 0.200 -0.824 0.222 -0.216 0.065
ITEM 4 0.200 -0.520 0.508 0.415 -0.030
ITEM 5 0.200 0.083 -0.145 0.026 0.244
ITEM 6 0.200 -0.080 0.245 -0.232 -0.384
ITEM 7 0.200 0.026 0.323 -0.158 -0.412
ITEM 8 0.200 0.220 0.524 0.226 -0.191
ITEM 9 0.200 0.842 0.544 -0.002 -0.312
ITEM 10 0.200 0.125 0.492 -0.273 -0.261
ITEM 11 0.200 0.133 0.342 -0.197 -0.234
ITEM 12 0.200 -0.080 0.398 -0.280 -0.525


ITEM 13 0.200 0.032 0.604 -0.016 0.198
ITEM 14 0.200 -0.494 0.741 0.070 0.483
ITEM 15 0.200 -0.259 0.745 0.012 0.423
ITEM 16 0.200 0.980 -0.543 0.186 -0.109
ITEM 17 0.200 -0.740 0.506 -0.201 -0.002
ITEM 18 0.200 0.456 -0.288 0.120 -0.296
ITEM 19 0.200 -0.609 0.323 -0.663 -0.127
ITEM 20 0.200 -1.096 -0.238 -0.502 -0.085
ITEM 21 0.200 -0.410 0.391 -0.562 0.074
ITEM 22 0.200 -0.443 0.539 -0.259 0.159
ITEM 23 0.200 -0.072 0.025 -0.031 -0.292
ITEM 24 0.200 0.845 0.521 0.233 -0.244
ITEM 25 0.200 -0.143 0.478 0.286 -0.046
ITEM 26 0.200 0.136 -0.487 -0.463 0.109
ITEM 27 0.200 0.809 -0.368 0.202 -0.268
ITEM 28 0.200 0.996 -0.321 0.268 -0.229
ITEM 29 0.200 0.328 0.572 -0.127 0.064
ITEM 30 0.200 0.565 0.345 -0.142 0.121
ITEM 31 0.200 -1.022 0.413 0.491 -0.112
ITEM 32 0.200 -0.219 0.333 0.497 -0.121

The simulated data are written to the file exampl13.sim. The first line of the *.sim file gives the
case, form and group number, as well as the simulated abilities for the three factors (1.056, 0.605
and 0.873 for case 1). Note that, if the keywords FORM=f and GROUP=g are omitted from the
SIMULATE command, default values of one are written to the *.sim file. The values of the simu-
lated abilities will change if a different value for the keyword SCORESEED is used. The default
value is 345261. By changing both or either of the ERRORSEED and GUESSSEED values, one will
obtain a new set of simulated responses. The GUESSSEED default value is 543612 while the
ERRORSEED default value is 453612. Note that the GUESSSEED parameter only has an effect if a
chance model is simulated. It determines the sequence of the simulated values from a normal
population with a mean equal to the chance parameter (in the case of exampl13.tsf this value is
0.2 for each item).

1 2 3 1.056 0.605 0.873
11110011000111100011010010000111
2 2 3 0.233 0.795 0.078
11111011111011101011101110010011
3 2 3 0.744 0.181 0.215
01110010011101101011111110101011
4 2 3 1.070 -0.490 0.611
01111111010101101011111111111111
5 2 3 0.934 0.457 0.824
01010001010011110110111110000111
6 2 3 0.440 -0.561 0.881
11111111010011001011111010001100
7 2 3 0.286 -2.143 1.583
00100110010111101011111100001111
8 2 3 -0.786 0.282 1.838
11101010010001111111101010000010
9 2 3 -0.740 -0.866 0.929
10101110110100101111010101001010
10 2 3 0.495 -0.182 -0.244
11110011110111101111111011100011
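
The paired records shown above are easy to read back for further analysis. The short Python
sketch below parses a *.sim file as described (an identification line followed by a 0/1
response string); the function name and the assumption that records never wrap across lines
are ours, not part of TESTFACT.

def read_sim(path="exampl13.sim"):
    # Return (case, form, group, abilities, responses) tuples.
    with open(path) as f:
        lines = [ln.strip() for ln in f if ln.strip()]
    records = []
    for ident, resp in zip(lines[0::2], lines[1::2]):
        fields = ident.split()
        case, form, group = (int(x) for x in fields[:3])
        abilities = [float(x) for x in fields[3:]]
        records.append((case, form, group, abilities, [int(ch) for ch in resp]))
    return records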


Means and standard deviations of ability variables

The values below are based on the simulated ability for each factor. For example, the mean of
factor 1 is computed as

(1.056 + 0.233 + 0.744 + 1.070 + ...) / 1500 = 0.513.

Note that the means are close to the assumed population values of 0.5, -0.5 and 1.0, respectively.

The correlations between the simulated ability variables are close to zero, showing that the
simulated factor abilities are, for practical purposes, uncorrelated.
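
These summary statistics are easy to verify outside TESTFACT. The sketch below reuses the
hypothetical read_sim() helper from the earlier sketch and lets NumPy compute the factor
means, standard deviations, and correlations directly.

import numpy as np

theta = np.array([rec[3] for rec in read_sim("exampl13.sim")])
print(theta.mean(axis=0))                 # should be near (0.5, -0.5, 1.0)
print(theta.std(axis=0))                  # near 1.0 for each factor
print(np.corrcoef(theta, rowvar=False))   # off-diagonal entries near zero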

MEANS AND STANDARD DEVIATIONS OF ABILITY VARIABLES


FACTOR MEAN S.D.
1 0.513 0.984
2 -0.521 0.972
3 1.025 0.958

CORRELATION COEFFICIENT MATRIX OF CASE ABILITIES VARIABLES

FACTOR      1       2       3
   1    1.000
   2   -0.005   1.000
   3    0.000   0.039   1.000

13.14 Three-factor analysis with PROMAX rotation: 32 items from the science assessment test

In this example, a PROMAX rotation is performed, using a three-factor model and 32 items from
a science assessment test in the subjects of biology, chemistry, and physics administered to
twelfth-grade students near the end of the school year. For a description of the data file and vari-
able format statement, see Section 13.7.

As input, a factor pattern from the 3-factor analysis of these data discussed in Section 13.8
is used (FILE keyword on the INPUT command). The FACTOR option on the same command indicates
that the input is in the form of factor loadings. This option is used for rotation only.

The first few records of the exampl08.unr file are:

(15X,5F10.6,2(/15X,5F10.6))

1 CHEM01 0.368242 -0.149557 -0.028675
2 PHYS02 0.475550 -0.008526 0.150567
3 CHEM03 0.439431 -0.017829 -0.100431
4 PHYS04 0.285406 -0.197091 0.186737
5 PHYS05 0.425255 0.003244 0.155182


The SKIP keyword on the PROBLEM command is set to 2, and TESTFACT will thus proceed di-
rectly to rotation after input of the factor pattern. The rotation is specified by the ROTATE key-
word on the FACTOR command, while the NFAC keyword confirms this to be a three-factor model.

>TITLE
EXAMPL14.TSF- PROMAX ROTATION FOR 32 ITEMS
3-FACTOR MODEL
>PROBLEM NITEM=32,SKIP=2;
>NAMES CHEM01,PHYS02,CHEM03,PHYS04,PHYS05,CHEM06,BIOL07,CHEM08,
BIOL09,BIOL10,BIOL11,PHYS12,BIOL13,PHYS14,BIOL15,CHEM16,
BIOL17,BIOL18,PHYS19,PHYS20,BIOL21,BIOL22,PHYS23,BIOL24,
PHYS25,PHYS26,BIOL27,PHYS29,CHEM29,PHYS30,BIOL31,CHEM32;
>FACTOR NFAC=3,ROTATE=PROMAX;
>INPUT FACTOR,FILE=‘EXAMPL08.UNR’;
>STOP

Each row of factor loadings can be viewed as a point in multidimensional space, so that each
factor corresponds to a coordinate axis. A factor rotation is equivalent to rotating those
axes, resulting in a new set of factor loadings. There are various rotation methods: some
(e.g., VARIMAX) leave the axes orthogonal, while others are so-called oblique methods that
change the angles between the axes. The oblique method used in TESTFACT is called PROMAX. This
method often produces a simpler structure, in the sense that the loadings on each factor are
either large or small. Note that with an oblique rotation, the factors are no longer
constrained to be uncorrelated.
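
The geometry can be made concrete in a few lines of Python. The sketch below applies an
arbitrary orthogonal rotation to a small hypothetical loading matrix (not TESTFACT output) and
verifies that each item's communality is unchanged; an oblique method such as PROMAX uses a
non-orthogonal transformation instead, which is why obliquely rotated factors may correlate.

import numpy as np

L = np.array([[0.6, 0.2],      # hypothetical loadings: 3 items, 2 factors
              [0.5, 0.3],
              [0.1, 0.7]])
a = np.deg2rad(30)             # rotate the coordinate axes by 30 degrees
T = np.array([[np.cos(a), -np.sin(a)],
              [np.sin(a),  np.cos(a)]])
L_rot = L @ T                  # loadings on the rotated axes

# Communalities (row sums of squared loadings) are invariant under an
# orthogonal rotation:
assert np.allclose((L**2).sum(axis=1), (L_rot**2).sum(axis=1))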

Sections of the output are shown below.

DISPLAY 2. PROMAX ROTATED FACTOR LOADINGS

1 2 3
1 CHEM01 -0.021 0.271 0.170
2 PHYS02 0.235 0.381 -0.035
3 CHEM03 0.105 0.119 0.300
4 PHYS04 -0.006 0.491 -0.162
5 PHYS05 0.229 0.352 -0.030
6 CHEM06 0.014 0.332 0.286
7 BIOL07 0.469 -0.106 -0.023
8 CHEM08 -0.019 0.138 0.056
9 BIOL09 0.419 -0.027 -0.056
10 BIOL10 0.275 0.142 0.001
11 BIOL11 0.301 0.041 0.016
12 PHYS12 -0.117 0.275 0.002
13 BIOL13 0.361 0.090 -0.013
14 PHYS14 0.127 0.577 -0.050
15 BIOL15 0.332 -0.146 0.034
16 CHEM16 -0.051 0.298 0.042
17 BIOL17 0.232 0.018 0.034
18 BIOL18 0.113 0.086 0.068
19 PHYS19 -0.022 0.123 0.073
20 PHYS20 0.208 0.309 0.000
21 BIOL21 0.485 0.016 -0.086
22 BIOL22 -0.022 -0.180 0.121
23 PHYS23 -0.177 0.378 0.049
24 BIOL24 0.239 -0.023 0.043
25 PHYS25 -0.115 0.397 0.045
26 PHYS26 0.110 0.409 0.009
27 BIOL27 0.297 0.092 0.017


28 PHYS29 0.168 0.284 0.007
29 CHEM29 -0.144 0.267 0.075
30 PHYS30 0.105 -0.018 0.027
31 BIOL31 0.185 0.277 0.017
32 CHEM32 0.035 0.018 0.003

From the output of PROMAX factor loadings we conclude that there are only two interpretable
factors, Biology (factor 1) and Chemistry-physics (factor 2). Except for item 3 (CHEM03), all
items have larger loadings on one of the first two factors than on the third factor.

DISPLAY 3. PROMAX FACTOR CORRELATIONS


1 2 3
1 1.000
2 0.409 1.000
3 0.453 0.694 1.000

The correlation between factors 2 and 3 equals 0.694. This relatively high correlation may ex-
plain why two factors appear to be sufficient.

13.15 Principal factor solution of a factor analysis on simulated data: no guessing

32 items from the simulated data set in the file exampl15.dat are used, as indicated by the
NITEM keyword on the PROBLEM command and the FILE keyword on the INPUT command. The input
consists of subject records containing item scores (SCORES option). The RESPONSE keyword gives
the number of possible responses, and the three response codes are listed on the RESPONSE
command. The items are named on the NAMES command, while the KEY command lists the correct
response to each item.

The TETRACHORIC command requests the recoding of omits to wrong responses (RECODE option),
prior to the computation of the tetrachoric correlation coefficients. The FACTOR and FULL com-
mands are used to specify parameters for the factor analysis. A two-factor model will be fitted to
the data (NFAC=2) and the first 6 characteristic roots of the smoothed correlation matrix
(NROOT=6) will be written to the output file. A maximum of 10 EM cycles will be performed
(CYCLES keyword on the FULL command). The OMIT keyword on this command indicates re-
coding of omits to wrong responses. The QUAD keyword sets the number of quadrature points for
the EM estimation of the parameters to 9, instead of the default of 15 for the 2-factor case when
the NOADAPT option is selected. Non-adaptive quadrature will be performed (NOADAPT option on
the TECHNICAL command).
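
To see what a fixed grid of this kind looks like, the sketch below builds a two-dimensional
grid with 9 Gauss-Hermite points per factor using NumPy. It only illustrates how the QUAD
value controls the number of evaluation points; TESTFACT's actual grid construction and
weighting may differ.

import numpy as np

# Probabilists' Gauss-Hermite rule: nodes and weights for the N(0,1) kernel.
nodes, weights = np.polynomial.hermite_e.hermegauss(9)
weights = weights / weights.sum()          # normalize the weights to sum to 1

g1, g2 = np.meshgrid(nodes, nodes)         # cross the rule over the two factors
grid = np.column_stack([g1.ravel(), g2.ravel()])
w = np.outer(weights, weights).ravel()
print(grid.shape, w.sum())                 # (81, 2) and 1.0: 9**2 fixed points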

Trial intercept and slope estimates after 10 cycles will be saved in exampl15.tri as indicated by
the TRIAL option on the SAVE command.

>TITLE
EXAMPL15.TSF- 2-FACTOR MODEL. SIMULATED DATA: PRINCIPAL FACTOR SOLUTION, NO
GUESSING. NON-ADAPTIVE QUADRATURE. SAVE TRIAL VALUES FOR CONTINUED EM CYCLES.
>PROBLEM NITEM=32,RESPONSE=3;
>NAMES I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,
I11,I12,I13,I14,I15,I16,I17,I18,I19,I20,
I21,I22,I23,I24,I25,I26,I27,I28,I29,I30,


I31,I32;
>RESPONSE ‘8’, ‘0’, ‘1’;
>KEY 11111111111111111111111111111111;
>TETRACHORIC RECODE;
>FACTOR NFAC=2,NROOT=6;
>FULL CYCLES=10,OMIT=RECODE,QUAD=9;
>TECHNICAL NOADAPT;
>SAVE TRIAL;
>INPUT NIDCHAR=3,SCORES,FILE=‘EXAMPL15.DAT’;
(3A1,T31,32A1)
>STOP

13.16 Non-adaptive factor analysis of simulated data: principal factor solution, no guessing

32 items from the simulated data set in the file exampl15.dat are again used as input (see
Section 13.15), as indicated by the NITEM keyword on the PROBLEM command and the FILE keyword
on the INPUT command. The input consists of subject records containing item scores (SCORES
option). Trial intercept and slope estimates will be read from the previously saved file
exampl15.tri; item numbers are required in this file. The first few lines of the trial values
file are:

(15X,6F9.5,2(/24X,5F9.5))
1 I1 0.02547 0.68219 -0.55756
2 I2 0.01425 0.64937 -0.88348
3 I3 -0.00925 0.80883 -0.84193

The RESPONSE keyword denotes the number of possible responses. The three responses are listed
in the RESPONSE command. Naming of the items is done using the NAMES command, while the
KEY command lists the correct response to each item. The inclusion of the SKIP=1 keyword on
the PROBLEM command indicates that the classical and item analysis phase should be skipped.
The program will proceed to the calculation of tetrachoric correlations immediately after data
entry.

The FACTOR and FULL commands are used to specify parameters for the factor analysis. Two fac-
tors and six latent roots are to be printed, as indicated by the NFAC and NROOT keywords respec-
tively. The OMIT keyword on the FULL command indicates recoding of omits to wrong responses.
The QUAD keyword sets the number of quadrature points for the EM estimation of the parameters
to 9, instead of the default of 15 for the 2-factor case when the NOADAPT option is selected. Non-
adaptive quadrature will be performed (NOADAPT option on the TECHNICAL command). The
parameters assigned to the ITER keyword request a maximum of 15 EM cycles, with a maximum
of 5 iterations and a convergence criterion of 0.001 for the M-step. Trial values will be saved
again in case further EM cycles are necessary. The trial values and the intercepts, factor slopes,
and guessing parameters (in a form suitable for computing factor scores at a later time) are saved
in exampl16.tri and exampl16.par as indicated by the TRIAL and PARM options on the SAVE
command.
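
The ITER=(15,5,0.001) setting amounts to a doubly nested loop. The Python sketch below shows
only that control flow; e_step and m_step_update are trivial placeholders, not TESTFACT's
estimation equations.

def e_step(params):
    return params                       # placeholder posterior expectations

def m_step_update(params, expected):
    return [0.9 * p for p in expected]  # placeholder parameter update

def em(params, max_cycles=15, m_iters=5, m_crit=0.001):
    for _ in range(max_cycles):         # at most 15 EM cycles
        expected = e_step(params)
        for _ in range(m_iters):        # at most 5 M-step iterations per cycle
            new = m_step_update(params, expected)
            change = max(abs(n - p) for n, p in zip(new, params))
            params = new
            if change < m_crit:         # M-step convergence criterion 0.001
                break
    return params

print(em([1.0, -0.5]))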


>TITLE
EXAMPL16.TSF- 2-FACTOR MODEL SIMULATION: PRINCIPAL FACTOR SOLUTION, NO
GUESSING. NON-ADAPTIVE QUADRATURE. CONTINUE WITH AN ADDITIONAL 15 CYCLES.
>PROBLEM NITEM=32,RESPONSE=3,SKIP=1;

>NAMES I1,I2,I3,I4,I5,I6,I7,I8,I9,I10,
I11,I12,I13,I14,I15,I16,I17,I18,I19,I20,
I21,I22,I23,I24,I25,I26,I27,I28,I29,I30,
I31,I32;
>RESPONSE ‘8’, ‘0’, ‘1’;
>KEY 11111111111111111111111111111111;
>FACTOR NFAC=2,NROOT=6;
>FULL OMIT=RECODE,QUAD=9;
>TECHNICAL NOADAPT,ITER=(15,5,0.001);
>SAVE PARM, TRIAL;
>INPUT TRIAL=‘EXAMPL15.TRI’,NIDCHAR=3,SCORES,FILE=‘EXAMPL15.DAT’;
(3A1,T31,32A1)
>STOP

13.17 Adaptive item factor analysis of 25 spelling items from the 100-Item
Spelling Test

Data from a 100-word spelling test are used in this example. A complete description of these
data is given in Section 2.4.1.

The data file exampl17.dat contains individual responses to all 100 items, of which 25 are used
here. Data are read using the FILE keyword on the INPUT command. The SCORES option indicates
that the data file contains item scores, and the NIDCHAR keyword specifies that the case
identification is 11 characters wide.

The first 11 columns of every line of data contain the case identification, which is represented
by “11A1” in the variable format statement given below. Responses to the first 25 items start in
column 13; the “X” operator is used to skip over the 12th column after the case identification
has been read. The next set of 25 responses is contained in columns 39 to 63 inclusive and is
read in the same format as the previous set (25A1). The third set of responses follows after one
blank column, which is again skipped using the “X” operator. The final set of 25 items is
likewise separated from the previous set by a single blank column (1X,25A1).

(11A1,1X,25A1,1X,25A1,1X,25A1,1X,25A1)
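
In terms of zero-based Python string slices, this format statement reads a record as follows;
the sketch shows only the column arithmetic and is not SSI code.

def parse_record(line):
    case_id = line[0:11]             # 11A1: columns 1-11
    blocks = [line[12:37],           # 1X,25A1: skip column 12, read 13-37
              line[38:63],           # 1X,25A1: skip column 38, read 39-63
              line[64:89],           # 1X,25A1: skip column 64, read 65-89
              line[90:115]]          # 1X,25A1: skip column 90, read 91-115
    return case_id, "".join(blocks)  # all 100 item responses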

The number of items read by the variable format statement corresponds to the number of items
indicated on the PROBLEM command (NITEM keyword). The 12 possible responses to each item are
listed on the RESPONSE command, and the RESPONSE keyword on the PROBLEM command indicates
their total number. The answer key, given in the KEY command, indicates that a “0” is the
correct response to all 100 items.

SELECT=25 on the PROBLEM command indicates that only 25 items will be used in the analysis.
These items are listed, in the order in which they are to be used, on the SELECT command.


The TETRACHORIC command specifies how the count matrix to be used in the calculation of the
tetrachoric correlations is to be formed. By using the (default) RECODE option, all omitted
responses will be recoded as wrong responses. The matrix of tetrachoric correlations, with
elements printed to 3 decimal places (NDEC keyword), is listed in the output (LIST option) and
saved to the file exampl17.cor through the use of the CORRELAT option on the SAVE command.
Factor scores and their posterior standard deviations are saved to exampl17.fsc with the
FSCORES option on the SAVE command.

The FACTOR command requests and controls the parameters for the item factor analysis. Two fac-
tors (NFAC=2) are to be extracted, along with 6 latent roots (NROOT=6). The ROTATE keyword is
used to request a PROMAX rotation. Note that this keyword may not be abbreviated in the FACTOR
command. By default, NFAC leading factors will be rotated and the constant for the PROMAX
rotation is equal to 3. The FULL command is used to request full information item factor analysis,
starting from the principal factor solution. The OMIT keyword is set to RECODE, and omitted re-
sponses are thus recoded as wrong responses (similar to the request on the TETRACHORIC com-
mand). Note that RECODE may not be abbreviated in the FULL command.

The SCORE command specifies that the factor scores for 100 cases are to be listed in the output.

>TITLE
EXAMPL17.TSF- ITEM FACTOR ANALYSIS OF 25 SPELLING ITEMS SELECTED FROM
THE 100 WORD SPELLING TEST. USING TETRACHORIC OPTION
>PROBLEM NITEM=100,RESPONSE=12,SELECT=25;
>NAMES S01,S02,S03,S04,S05,S06,S07,S08,S09,S10,S11,S12,S13,S14,S15,S16,
S17,S18,S19,S20,S21,S22,S23,S24,S25,S26,S27,S28,S29,S30,S31,S32,
S33,S34,S35,S36,S37,S38,S39,S40,S41,S42,S43,S44,S45,S46,S47,S48,
S49,S50,S51,S52,S53,S54,S55,S56,S57,S58,S59,S60,S61,S62,S63,S64,
S65,S66,S67,S68,S69,S70,S71,S72,S73,S74,S75,S76,S77,S78,S79,S80,
S81,S82,S83,S84,S85,S86,S87,S88,S89,S90,S91,S92,S93,S94,S95,S96,
S97,S98,S99,S100;
>RESPONSE ‘ ’,‘0’, ‘1’, ‘2’, ‘3’, ‘4’, ‘5’, ‘6’, ‘7’, ‘8’, ‘9’, ‘A’;
>KEY 00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000;
>SELECT 1,4,6,8,9,10,15,20,23(1)29,33,34,36,39,48,49,54,59,64,72;
>TETRACHORIC RECODE,NDEC=3,LIST;
>FACTOR NFAC=2,NROOT=6,NIT=(5,0.02),ROTATE=PROMAX;
>FULL ITER=(8,3,0.01),OMIT=RECODE;
>SCORE LIST=100;
>SAVE CORRELAT,FSCORES;
>INPUT NIDCHAR=11,SCORES,FILE=‘EXAMPL17.DAT’;
(11A1,1X,25A1,1X,25A1,1X,25A1,1X,25A1)
>STOP

13.18 Classical item factor analysis of spelling data from a tetrachoric correlation matrix

The analysis in this example is based on the spelling data used in Section 13.17. For a discussion
of the data, variable format statement, and INPUT command, see the previous section.

A classical analysis is carried out on all 100 items in the data. Thus the SELECT keyword previ-
ously used is omitted from the PROBLEM command, which only indicates the total number of


items (NITEM) and the total number of possible responses (RESPONSE). All 12 responses are listed
on the RESPONSE command, and the KEY command contains the answer key for all the items. The
TETRACHORIC command specifies how the count matrix, to be used in the calculation of the tetra-
choric correlations, is to be formed. By using the (default) RECODE option, all omitted responses
will be recoded as wrong responses.
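
For a single pair of items, the idea behind the tetrachoric correlation can be sketched in a
few lines of Python: thresholds come from the margins of the 2x2 count matrix, and r is chosen
so that the bivariate-normal probability of the both-correct quadrant matches the observed
proportion. The function below is our own illustration; TESTFACT's numerical method may differ.

import numpy as np
from scipy.stats import norm, multivariate_normal
from scipy.optimize import brentq

def tetrachoric(n11, n10, n01, n00):
    # n11: both correct; n10, n01: exactly one correct; n00: both wrong.
    n = n11 + n10 + n01 + n00
    h = norm.ppf((n01 + n00) / n)          # threshold for item 1
    k = norm.ppf((n10 + n00) / n)          # threshold for item 2
    p11 = n11 / n                          # observed both-correct proportion

    def both_correct(r):                   # P(Z1 > h, Z2 > k) given correlation r
        F = multivariate_normal.cdf([h, k], mean=[0, 0], cov=[[1, r], [r, 1]])
        return 1 - norm.cdf(h) - norm.cdf(k) + F

    return brentq(lambda r: both_correct(r) - p11, -0.999, 0.999)

print(tetrachoric(40, 10, 10, 40))         # a positive association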

The FACTOR command requests and controls the parameters for the item factor analysis. Two fac-
tors (NFAC=2) are to be printed, along with 6 latent roots (NROOT=6). The ROTATE keyword is used
to request a PROMAX rotation. Note that this keyword may not be abbreviated in the FACTOR com-
mand. By default, NFAC leading factors will be rotated and the constant for the PROMAX rota-
tion is equal to 3. The NIT keyword specifies the number of iterations for the MINRES factor
solution and the convergence criterion. A value of 0.01, for example, implies that if the largest
change in factor loadings is less than 0.01, the iteration procedure will terminate. The default
values are 3 and 0.0001 respectively.

Matrix plots of the biserial coefficient (BISERIAL option) and item facility (percent correct;
FACILITY option) against discriminating power are requested using the PLOT command. By de-
fault, the internal test score is used as discriminating power. To use an external criterion score,
the CRITERION option should be included on the PLOT command.

>TITLE
EXAMPL18.TSF- CLASSICAL ANALYSIS OF SPELLING DATA: 100 ITEMS
USING TETRACHORIC OPTION AND PLOT
>PROBLEM NITEM=100,RESPONSE=12;
>RESPONSE ‘ ’,‘0’,‘1’,‘2’,‘3’,‘4’,‘5’,‘6’,‘7’,‘8’,‘9’,‘A’;
>KEY 00000000000000000000000000000000000000000000000000
00000000000000000000000000000000000000000000000000;
>PLOT BISERIAL,FACILITY;
>TETRACHORIC RECODE;
>FACTOR NFAC=2,NROOT=6,NIT=(5,0.01),ROTATE=PROMAX;
>INPUT NIDCHAR=11,SCORES,FILE=‘EXAMPL17.DAT’;
(11A1,4(1X,25A1))
>CONTINUE
>STOP


References
Aitchison, J., & Silvey, S. D. (1960). Maximum-likelihood estimation procedures and associated
tests of significance. Journal of the Royal Statistical Society, Series B, 22, 154-171.

Andersen, E. B. (1973). Asymptotic properties of conditional maximum-likelihood estimators. Journal of the Royal Statistical Society, Series B, 32, 283-301.

Andersen, E. B., & Madsen, M. (1977). Estimating the parameters of a latent population distribu-
tion. Psychometrika, 42, 357-374.

Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43,
561-573.

Baker, F. B. (1992). Item response theory: Parameter estimation techniques. New York: Marcel Dekker.

Bartholomew, D. J. (1980). Factor analysis for categorical data. Journal of the Royal Statistical
Society, Series B, 42, 293-321.

Bergan, J. R., & Stone, C. A. (1985). Latent class models for knowledge domains. Psychological
Bulletin, 98, 166-184.

Binet, A., & Simon, T. (1905). Methods nouvelles pour le diagnostic du niveau intellectuel des
anormaux. Année Psychologique, 11, 191-244.

Birnbaum, A. (1957). Efficient design and use of tests of a mental ability for various decision
making problems. Series Report No. 58-16. Project No. 7755-23, USAF School of Aviation
Medicine, Randolph Air Force Base, Texas.

Birnbaum, A. (1958a). On the estimation of mental ability. Series Report No. 15. Project No.
7755-23, USAF School of Aviation Medicine, Randolph Air Force Base, Texas.

Birnbaum, A. (1958b). Further considerations of efficiency in tests of a mental ability. Technical Report No. 17. Project No. 7755-23, USAF School of Aviation Medicine, Randolph Air Force Base, Texas.

Birnbaum, A. (1967). Statistical theory for logistic mental test models with a prior distribution of
ability. Research Bulletin, No. 67-12. Princeton, NJ: Educational Testing Service.

Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In
F. M. Lord & R. M. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Ad-
dison-Wesley.


Bliss, C. I. (1935). The calculation of the dosage mortality curve (Appendix by R. A. Fisher).
Annals of Applied Biology, 22, 134-167.

Bock, R. D. (1966). Estimating multinomial response relations. Research Memorandum, No. 5. Chicago: University of Chicago Educational Statistics Laboratory.

Bock, R. D. (1970). Estimating multinomial response relations. In R. C. Bose, et al. (Eds.), Con-
tributions to statistics and probability. Chapel Hill, NC: University of North Carolina Press, 111-
132.

Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in
two or more nominal categories. Psychometrika, 37, 29-51.

Bock, R. D. (1975). Multivariate statistical methods in behavioral research. New York: McGraw-Hill. (2nd edition, 1985. Chicago: Scientific Software International.)

Bock, R. D. (1976). Basic issues in the measurement of change. In D. N. M. de Gruijter & L. J. T. van der Kamp (Eds.), Advances in Psychological and Educational Measurement. London: Wiley & Sons, 75-76.

Bock, R. D. (1983a). Within-subject experimentation in psychiatric research. In R. D. Gibbons & Dysken (Eds.), Statistical and methodological advances in psychiatric research. New York: SP Medical & Scientific Books, 59-90.

Bock, R. D. (1983b). The mental growth curve re-examined. In D. Weiss (Ed.), New horizons in
testing. New York: Academic Press, 205-219.

Bock, R. D. (1983c). The discrete Bayesian. In H. Wainer & S. Messick (Eds.), Principles of
psychometrics. Hillsdale, NJ: Erlbaum, 103-115.

Bock, R. D. (1989). Measurement of human variation: a two-stage model. In R. D. Bock (Ed.), Multilevel analysis of educational data. New York: Academic Press, 319-342.

Bock, R. D. (1993). Different DIFs. In: P. W. Holland & H. Wainer (Eds.), Differential item
functioning. Hillsdale, NJ: Erlbaum, 115-122.

Bock, R. D. (1997). The nominal categories model. In W. J. van der Linden & R. K. Hambleton
(Eds.), Handbook of Modern Item Response Theory. New York: Springer Verlag, 33-65.

Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: application of an EM algorithm. Psychometrika, 46, 443-459.

Bock, R. D., Gibbons, R. D., & Muraki, E. (1988). Full information item factor analysis. Applied
Psychological Measurement, 12, 261-280.


Bock, R. D., & Jones, L. V. (1968). The measurement and prediction of judgment and choice.
San Francisco: Holden-Day.

Bock, R. D., & Lieberman, M. (1970). Fitting a response model for n dichotomously scored
items. Psychometrika, 35, 179-197.

Bock, R. D., & Mislevy, R. J. (1981). An item response model for matrix-sampling data: the
California Grade Three Assessment. In D. Carlson (Ed.), Testing in the states: beyond account-
ability. San Francisco: Jossey-Bass, 65-90.

Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer
environment. Applied Psychological Measurement, 6, 431-444.

Bock, R. D., Muraki, E., & Pfiffenberger, W. (1988). Item pool maintenance in the presence of
item parameter drift. Journal of Educational Measurement, 25, 275-285.

Bock, R. D., Thissen, D., & Zimowski, M. F. (1997). IRT estimation of domain scores. Journal
of Educational Measurement 34, 197-211.

Bock, R. D., Wolfe, R., & Fisher, T. H. (1996). A Review and analysis of the Tennessee Value-
Added Assessment System. Nashville, TN: Office of Education Accountability, State of Tennes-
see, Comptroller of the Treasury.

Bock, R. D., & Zimowski, M. F. (1989). Duplex Design giving students a stake in educational
assessment. Chicago: Methodology Research Center NORC.

Bock, R. D., & Zimowski, M. F. (1995). Multiple group IRT. In W. van der Linden & R. Ham-
bleton (Eds.), Handbook of item response theory. New York: Springer-Verlag.

Bock, R. D., & Zimowski, M. F. (1997). Multiple group IRT. In W. J. van der Linden & R. K.
Hambleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer Verlag,
433-448.

Bock, R. D., & Zimowski, M. F. (1998). Feasibility Studies of Two-Stage Testing in Large-Scale
Educational Assessment: Implications for NAEP, 34-41. Commissioned by the NAEP Validity
Studies (NVS) Panel. May 1998.

Bock, R. D., & Zimowski, M. F. (1999). Application of disattenuation analysis to correlations between matrix-sample assessment results and achievement test scores. Addendum to D. H. McLaughlin, R. D. Bock, E. A. Arenson & M. F. Zimowski. Palo Alto, CA: American Institutes for Research.

Bowers, J. (1972). A note on comparing r-biserial and r-point biserial. Educational and Psycho-
logical Measurement, 32, 771-775.


Bradley, R. A., & Terry, M. E. (1952). Rank analysis of incomplete block designs. I. Method of
paired comparisons. Biometrika, 39, 324-345.

Browne, M. W., & du Toit, S. H. C. (1992). Automated fitting of nonstandard models. Multi-
variate Behavioral Research, 27, 269-300.

Burt, C. (1921). Mental and scholastic tests. London: P. S. King & Son.

Carroll, J. B. (1945). The effect of difficulty and chance success on correlations between items or
between tests. Psychometrika, 10, 1-19.

Clogg, C. C. (1979). Some latent structure models for the analysis of Likert-type data. Social
Science Research, 8, 287-301.

Clogg, C. C., & Goodman, L. A. (1984). Latent structure analysis of a set of multi-dimensional contingency tables. Journal of the American Statistical Association, 79, 762-771.

Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of behavioral measurements: Theory of generalizability for scores and profiles. New York: Wiley.

Davis, J. A. (1975). Codebook for the Spring 1976 General Social Survey. Chicago: NORC.

De Leeuw, J., & Verhelst, N. (1986). Maximum likelihood estimation in generalized Rasch
models. Journal of Educational Statistics, 11, 193-196.

Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete
data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 39, 1-38.

Divgi, D. R. (1979). Calculation of the tetrachoric correlation coefficient. Psychometrika, 44, 169-172.

Fechner, G. T. (1860). Elemente der Psychophysik. Leipzig: Breitkopf & Hartel.

Finney, D. J. (1952). Probit analysis: A statistical treatment of the sigmoid response curve, 2nd
ed. London: Cambridge University Press.

Fisher, R. A. (1925). Theory of statistical estimation. Proceedings of the Cambridge Philosophical Society, 22, 699-725.

Fisher, R. A., & Yates, F. (1938). Statistical tables for biological, agricultural and medical re-
search. New York: Hafner.

Follman, D. (1988). Consistent estimation in the Rasch model based on nonparametric margins.
Psychometrika, 53, 553-562.


French, J. L., & Hale, R. L. (1990). A history of the development of psychological and educa-
tional testing. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and
educational assessment of children. New York: Guilford Press, 3-28.

Gibbons, R. D., & Hedeker, D. R. (1992). Full information item bi-factor analysis. Psycho-
metrika, 57, 423-436.

Glas, C. A. W. (1996). Detection of differential item functioning using Lagrange Multiplier tests. Research Report, No. 96-02. Enschede: University of Twente, Faculty of Educational Science and Technology.

Glass, G. V., & Stanley, J. C. (1970). Statistical Methods in Education and Psychology. Engle-
wood Cliffs, NJ: Prentice-Hall.

Goldstein, H. (1983). Measuring changes in educational attainment over time. Journal of Educa-
tional Measurement, 20, 369-377.

Green, B. F. (1951). A general solution for the latent class model of latent structure analysis.
Psychometrika, 16, 151-166.

Green, B. F. (1952). Latent structure analysis and its relation to factor analysis. Journal of the
American Statistical Association, 47, 71-76.

Green, S. B., Lissitz, R. W., & Mulaik, S. A. (1977). Limitations of coefficient alpha as an index
of test unidimensionality. Educational and Psychological Measurement, 37, 827-838.

Guilford, J. P. (1954). Psychometric Methods. (2nd Ed.) New York: McGraw-Hill.

Gulliksen, H. (1950). Theory of mental tests. New York: Wiley.

Haberman, S. J. (1977). Log-linear models and frequency tables with small expected cell counts.
Annals of Statistics, 5, 1148-1169.

Haberman, S. J. (1979). Analysis of qualitative data, Vol. 2. New developments. New York:
Academic Press.

Hambleton, R. K., & Jurgensen, C. (1990). Criterion referenced assessment of school achieve-
ment. In C. R. Reynolds & R. W. Kamphaus (Eds.), Handbook of psychological and educational
assessment of children. New York: Guilford Press, 456-477.

Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory. Principles and applica-
tions. Boston: Kluwer.

Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response
theory. Newbury Park, CA: Sage.


Harman, H. H. (1976). Modern Factor Analysis. Chicago: The University of Chicago Press.

Harvey, W. R. (1970). Estimation of variance and covariance components in the mixed model.
Biometrics, 26, 485-504.

Harwell, M. R., Baker, F. B., & Zwarts, M. (1988). Item parameter estimation via marginal
maximum likelihood and an EM algorithm: A didactic. Journal of Educational Statistics, 13,
243-271.

Hendrickson, E. A., & White, P. O. (1964). Promax: A quick method for rotation to oblique sim-
ple structure. British Journal of Mathematical and Statistical Psychology, 17, 65.

Henryssen, S. (1971). Gathering, analyzing, and using data on test items. In R. L. Thorndike
(Ed.), Educational Measurement, 2nd ed., Washington, DC: American Council on Education.

Hively, W. (1974). Domain-referenced testing. Englewood Cliffs, NJ: Educational Technology Publications.

Holland, P. W., & Rubin, D. B. (Eds.) (1982). Test equating. Hillsdale, NJ: Erlbaum.

Holland, P. W., & Wainer, H. (1993). Differential Item Functioning. Hillsdale, NJ: Erlbaum.

Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41-54.

Horst, P. (1933). The difficulty of a multiple-choice test item. Journal of Educational Psychol-
ogy, 24, 229-232.

Irving, L. M. (1987). Mirror images: Effects of the standard of beauty on women’s self and body
esteem. Unpublished Masters Thesis, University of Kansas.

Jenkins, C. D., Rosenman, R. H., & Zyzanski, S. J. (1972). The Jenkins Activity Survey of Health Prediction. New York: The Psychological Corporation.

Johnson, N. L. (1949). Systems of frequency curves generated by methods of translation. Biometrika, 36, 149-176.

Jöreskog, K. G., & Sörbom, D. (1996). LISREL 8: User’s Reference Guide. Chicago: Scientific
Software International, Inc.

Kaiser, H. F. (1958). The varimax criterion for analytic rotation in factor analysis. Psycho-
metrika, 23, 187-200.

Kelley, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.

Kendall, M., & Stuart, A. (1961). Inference and Relationship, Vol. 2 of The Advanced Theory of Statistics, first ed. London: Charles Griffin & Company.


Kiefer, J., & Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Annals of Mathematical Statistics, 27, 887-906.

Klassen, D., & O'Connor, W. A. (1989). Assessing the risk of violence in released mental pa-
tients: A cross-validation study. Psychological Assessment: A Journal of Consulting and Clinical
Psychology, 1, 75-81.

Kolakowski, D., & Bock, R. D. (1981). A multivariate generalization of probit analysis. Biometrics, 37, 541-551.

Lawley, D. N. (1943). On problems connected with item selection and test construction. Proceedings of the Royal Society of Edinburgh, 61A, 273-287.

Lazarsfeld, P. F. (1950). The logical and mathematical foundation of latent structure analysis. In
S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star & J. A. Clausen, Meas-
urement and prediction. Princeton, NJ: Princeton University Press, 362-412.

Likert, R. (1932). A technique for the measurement of attitude. Archives of Psychology, 140.

Linacre, J. M., & Wright, B. D. (1993). FACETS: Many-facet Rasch analysis with FACFORM Data Formatter. Chicago: MESA Press.

Linn, R. L., & Hambleton, R. K. (1991). Customized sets and customized norms. Applied Meas-
urement in Education, 4, 185-207.

Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data. New York: Wiley.

Longford, N. T. (1989). Fisher scoring algorithm for variance component analysis of data with
multilevel structure. In R. D. Bock (Ed.), Multilevel analysis of educational data. San Diego:
Academic Press, 297-310.

Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, No. 7.

Lord, F. M. (1953). An application of confidence intervals and of maximum likelihood to the estimation of an examinee’s ability. Psychometrika, 18, 57-76.

Lord, F. M. (1971). A theoretical study of two-stage testing. Psychometrika, 36, 227-242.

Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale,
NJ: Erlbaum.


Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores (with contri-
butions by A. Birnbaum). Reading, MA: Addison-Wesley.

Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm.
Journal of the Royal Statistical Society, Series B, 44, 226-233.

Luce, R. D. (1959). Individual choice behavior. New York: Wiley.

Mantel, N. (1966). Models for complex contingency tables and polytomous dosage response
curves. Biometrics, 22, 83-95.

Marshall, J. C., & Hales, L. W. (1972). Essentials of Testing. Reading, MA: Addison-Wesley.

Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.

Masters, G. N. (1985). A comparison of latent trait and latent class analyses of Likert-type data.
Psychometrika, 50, 69-82.

Meng, X. L., & Schilling, S. (1996). Fitting full information factor models and an empirical in-
vestigation of bridge sampling. Journal of the American Statistical Association, 91, 1254-1267.

Mislevy, R. J. (1983). Item response models for grouped data. Journal of Educational Statistics,
8, 271-288.

Mislevy, R. J. (1984). Estimating latent distributions. Psychometrika, 49, 359-381.

Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195.

Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the estimation of item
parameters. Applied Psychological Measurement, 11, 81-91.

Mislevy, R. J., & Bock, R. D. (1982). Biweight estimates of latent ability. Journal of Educa-
tional and Psychological Measurement, 42, 725-737.

Mislevy, R. J., & Bock, R. D. (1983). BILOG: Analysis and scoring of binary items and one-,
two-, and three-parameter logistic models. Chicago: Scientific Software International, Inc.

Mislevy, R. J., & Bock, R. D. (1990). BILOG 3: Item Analysis and Test Scoring with Binary Lo-
gistic Models. Chicago: Scientific Software International, Inc.

Mislevy, R. J., Johnson, E. G., & Muraki, E. (1992). Scaling procedures in NAEP. Journal of
Educational Statistics, 17, 131-154.

Muraki, E. (1990). Fitting a polytomous item response model to Likert-type data. Applied Psy-
chological Measurement, 14, 59-71.


Muraki, E. (1992). A generalized partial credit model: application of an EM algorithm. Applied


Psychological Measurement, 16, 159-176.

Muraki, E. (1993). Variations of polytomous item response models: Raters’ effect model, DIF
model, and trend model. Paper presented at the Annual Meeting of the American Educational
Research Association, Atlanta, GA.

Muraki, E. (1997). The generalized partial credit model. In W. J. van der Linden & R. K. Ham-
bleton (Eds.), Handbook of Modern Item Response Theory. New York: Springer Verlag, 153-
164.

Muraki, E., & Bock, R. D. (1997). PARSCALE 3: IRT based test scoring and item analysis for
graded items and rating scales. Chicago: Scientific Software International, Inc.

Muraki, E., & Engelhard, G. (1985). Full information item factor analysis: applications of EAP
scores. Applied Psychological Measurement, 9, 417-430.

Naylor, J. C., & Smith, A. F. M. (1982). Applications of a method for the efficient computation
of posterior distributions. Applied Statistics, 31, 214-225.

Neyman, J. A., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1-22.

Olsson, U., Drasgow, F., & Dorans, N. J. (1982). The polyserial correlation coefficient. Psycho-
metrika, 47(3), 337-347.

Owen, R. J. (1969). A Bayesian approach to tailored testing. Research Bulletin No. 69-92.
Princeton, NJ: Educational Testing Service.

Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and equating. In R. L.
Linn (Ed.), Educational Measurement (3rd edition). New York: American Council on Educa-
tion-Macmillan, 221-262.

Ramsay, J. O. (1975). Solving implicit equations in psychometric data analysis. Psychometrika, 40, 337-360.

Rasch, G. (1960; reprinted 1980). Probabilistic models for some intelligence and attainment
tests. Chicago: University of Chicago Press.

Rasch, G. (1961). On general laws and the meaning of measurement in psychology. Proceedings
of the fourth Berkeley symposium on mathematical statistics and probability, 4, 321-324.

Richardson, M. W. (1936). The relationship between difficulty and the differential validity of a
test. Psychometrika, 1, 33-49.


Roche, A. F., Wainer, H., & Thissen, D. (1975). Skeletal Maturity: The knee joint as a biological
indicator. New York: Plenum.

Samejima, F. (1969). Estimation of latent trait ability using a response pattern of graded scores.
Psychometrika Monograph Supplement, No. 17.

Samejima, F. (1972). A general model for free-response data. Psychometrika Monograph Sup-
plement, No. 18.

Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimen-
sional latent space. Psychometrika, 39, 111-121.

Samejima, F. (1979). A new family of models for the multiple-choice item. Research Report,
No. 79-4, Department of Psychology, University of Tennessee.

Schilling, S. (1993). Advances in Full Information Item Factor Analysis using the Gibbs Sam-
pler. Unpublished doctoral dissertation, University of Chicago.

Schilling, S. G., & Bock, R. D. (1999). High-dimensional maximum marginal likelihood item
factor analysis. (In press.)

Schultz, M. E., & Nicewander, W. A. (1997). Grade equivalent and IRT representations of growth. Journal of Educational Measurement, 34(4), 315-332.

Smith, M. C., & Thelen, M. H. (1984). Development and validation of a test for bulimia. Journal
of Consulting and Clinical Psychology, 52, 863-872.

Spearman, C. (1904). General intelligence, objectively determined and measured. American Journal of Psychology, 15, 201-293.

Stouffer, S. A., & Toby, J. (1951). Role conflict and personality. American Journal of Sociology,
56, 395-406.

Stroud, A. H., & Secrest, D. (1966). Gaussian Quadrature Formulas. Englewood Cliffs, NJ:
Prentice-Hall.

Symonds, P. M. (1929). Choice of items for a test on the basis of difficulty. Journal of Educa-
tional Psychology, 20, 481-493.

Thissen, D. (1982). Marginal maximum likelihood estimation for the one-parameter logistic
model. Psychometrika, 47, 175-186.

Thissen, D. (1991). MULTILOG: multiple category item analysis and test scoring using item re-
sponse theory. Chicago: Scientific Software International, Inc.


Thissen, D., & Steinberg, L. (1984). A response model for multiple-choice items. Psychometrika,
49, 501-519.

Thissen, D., & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51,
566-577.

Thissen, D., & Steinberg, L. (1988). Data analysis using item response theory. Psychological
Bulletin, 104, 385-395.

Thissen, D., Steinberg, L., & Fitzpatrick, A. R. (1989). Multiple-choice models: The distractors
are also part of the item. Journal of Educational Measurement, 26, 161-176.

Thissen, D., Steinberg, L., & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247-260.

Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of DIF using the parameters of item
response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning. Hillsdale,
NJ: Erlbaum, 67-113.

Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika,
47, 397-412.

Thorndike, E. L. (1927). The measurement of intelligence. New York: Teachers College, Columbia University.

Thorndike, R. L. (1982). Applied Psychometrics. Boston: Houghton-Mifflin.

Thurstone, L. L. (1925). A method of scaling psychological and educational tests. Journal of Educational Psychology, 16, 433-451.

Thurstone, L. L. (1930). The learning function. Journal of General Psychology, 3, 469-493.

Thurstone, L. L. (1947). Multiple-factor analysis. Chicago: University of Chicago Press.

Tsutakawa, R. K. (1992). Prior distribution for item response curves. British Journal of Mathe-
matical and Statistical Psychology, 45, 51-71.

Tsutakawa, R. K., & Lin, H. Y. (1986). Bayesian estimation of item response curves. Psycho-
metrika, 51, 251-267.

Tucker, L. R. (1946). Maximum validity of a test with equivalent items. Psychometrika, 11, 1-13.

Urban, F. M. (1908). The application of statistical methods to the problems of psychophysics. Philadelphia: Psychological Clinic Press.


Van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory.
New York: Springer-Verlag.

Verhulst, P.-F. (1844). Recherches mathématiques sur la loi d'accroissement de la population. Mémoires de l'Académie royale de Belgique, 18.

Wainer, H. (Ed.) (1990). Computerized adaptive testing: a primer. Hillsdale, NJ: Erlbaum.

Wainer, H. (1995). Precision and differential item functioning on a testlet based test: The 1991
Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-187.

Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for
testlets. Journal of Educational Measurement, 24, 185-201.

Warm, T. (1989). Weighted likelihood estimation of ability in item response theory. Psycho-
metrika, 54, 427-450.

Wilmut, J. (1975). Objective test analysis: some criteria for item selection. Research in Educa-
tion, 13, 27-56.

Wilson, D. T., Wood, R., & Gibbons, R. (1991). TESTFACT: Test scoring, item statistics, and
item factor analysis. Chicago: Scientific Software International, Inc.

Wood, R. (1977). Inhibiting blind guessing: the effect of instructions. Journal of Educational
Measurement, 13, 297-307.

Zimowski, M. F. (1985). Attributes of spatial test items that influence cognitive processing. Un-
published doctoral dissertation, University of Chicago.

Zimowski, M. F., & Bock, R. D. (1987). Full information item factor analysis of test forms from
the ASVAB CAT pool. Chicago: NORC.

Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (1996). BILOG-MG: Multiple-group
IRT analysis and test maintenance for binary items. Chicago: Scientific Software International,
Inc.

Zwick, R. (1987). Assessing the dimensionality of NAEP reading data. Journal of Educational
Measurement, 24, 293-308.

Index

1PL model, 51, 101, 123, 137-139, 355, 367, 379-380, 398-400, 510, 524, 539-540, 543, 567-568, 605, 616-617, 645, 730, 732, 843
2PL model, 36, 51, 92-93, 101, 123, 137-139, 169, 355, 367, 379, 398-400, 524, 539-540, 565, 567-568, 606, 608, 732-734, 736, 768, 842
3PL model, 268, 355, 367, 379, 398-400, 523, 538, 540, 569

A

Ability score file
  saving, 242-243
ACCEL keyword
  on CALIB command, 78, 89, 115-117, 123, 650
  on TECHNICAL command, 491
ACCEL option
  on CALIB command, 264, 274-275
Acceleration
  for full information factor analysis, 491
  using or suppressing of routine, 275
  value of constant, 77, 117, 382
ACCMAX keyword
  on ESTIMATE command, 382
Adaptive quadrature, 410, 491, 493, 495, 530, 589, 781, 815, 817, 825-826, 842
  for score estimation, 410, 530
  invoking 3-point, 491
  setting values of extreme points, 496
  setting values of extreme weights, 497
Adaptive testing, 632
Advanced tab
  on Item Analysis dialog box, 51, 88-89
Aggregate-level
  fit statistic, 210
  IRT models, 610
Aitchison, J., 844
Aitkin, M., 576, 585, 588, 599, 601, 617, 730, 842, 843
AJ keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
AK keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
AK option
  on TMATRIX command, 404
ALL keyword
  on EQUAL command, 381
  on FIX command, 386
  on GROUPS command, 381
  on ITEMS command, 381
ALL option
  on LABELS command, 387
  on PRIORS command, 394
  on START command, 396
  on TEST command, 355, 367, 398, 730
  on TMATRIX command, 403
ALPHA keyword
  on PRIORS command, 47, 90, 115, 187, 189-190
ALPHA option
  on RELIABILITY command, 776
Analysis
  display of details, 84
  indication of successful termination of, 82
  selecting steps of, 457
  specifying in MULTILOG, 388-389
Andersen, E.B., 730, 840
Andrich, D., 550, 556-557, 846
Answer key
  using, 26, 59, 92, 149, 167-171, 178-179, 236, 238, 241-242, 254, 355-356, 448-449, 480, 634, 652, 658, 666, 670, 783, 827, 829
Answer Key tab
  on Item Keys dialog box, 59, 87
Assessment testing, 618-619
Assign Calibration Prior Latent Distribution dialog box, 75

Assign Calibration Prior Latent Distribution option
  on Technical menu, 70, 75, 90
Assign Fixed Items dialog box, 68, 109, 225, 227
Assign Fixed Items option
  on Technical menu, 68, 88, 109, 225, 227
Assign Item Parameter Prior Constraints option
  on Technical menu, 89-90
Assign Item Parameter Starting Values dialog box
  Import/Enter Values tab, 87-88
Assign Item Parameter Starting Values option
  on Technical menu, 65, 87-88
Assign Scoring Prior Latent Distribution dialog box, 70, 75
  Normal tab, 75, 90
  User-Supplied tab, 76, 90-91
Assign Scoring Prior Latent Distribution option
  on Technical menu, 70, 75, 90-91
Axis labels
  editing in plot, 517
Axis Labels dialog box, 517

B

Bar Graph Parameters dialog box, 517-518
Bars, 520
  editing characteristics of, 518
Bartholomew, D.J., 843
Bayes estimation, 31, 34-35, 214, 317, 410, 418, 476, 529, 530, 537, 544, 576, 588, 590, 601, 605, 607, 609-610, 616, 624-625, 655, 664, 679, 692-693, 706, 721, 780-781, 800, 802, 837-838, 841, 843-844, 847
Bayes Modal or Maximum A Posteriori (MAP), 25, 31, 35-37, 75, 211, 214, 349-350, 357, 389, 392, 395, 474, 476, 529, 544, 590, 605, 607-610, 625, 655, 664, 685, 748-749, 781-782, 802, 837-838, 841, 843
Bayes or Expected A Posteriori (EAP), 25, 29, 31-37, 75, 196, 205, 208, 211-212, 214, 216-217, 221, 244, 264, 316-317, 319, 418, 474, 476, 478, 498, 543, 561-562, 588, 594, 604, 607-608, 610, 615, 625, 649, 654-656, 659, 664, 666, 669, 671, 685-686, 693, 695, 705-709, 721, 731, 736, 738, 747, 752, 780-782, 800, 802-804, 812, 817, 837
Bergan, J.R., 759
Beta
  function, 612
  parameter for distribution, 452
  prior on uniquenesses, 452
  supplying priors for distribution, 278
BETA keyword
  on PRIORS command, 47, 90, 115, 187, 189, 190
Bifactor analysis, 410, 418, 782, 804, 815
  assigning items to item groups, 419
  number of cycles, 419
  number of decimals in solution, 420
  printing of loadings, 420
  printing of residual matrix, 421
  printing of smoothed matrix, 421
BIFACTOR command, 410, 418, 468, 472-473, 495, 804
  CPARMS keyword, 418, 804-805
  CYCLES keyword, 419, 804, 806, 815
  IGROUPS keyword, 419, 804-805, 815
  LIST keyword, 420, 804, 815
  LORD option, 413-414
  MISS option, 413-414
  NDEC keyword, 420
  NIGROUPS keyword, 419, 421, 804, 815
  NOLIST option, 421
  OMIT keyword, 421, 423, 477
  QUAD keyword, 422
  RECODE option, 413-414
  RESIDUAL option, 422
  SMOOTH option, 422
  TIME option, 423
Bifactor loadings, 810
Bifactor solution, 420
  number of EM cycles, 419
dialog box, 517-518


  reproducing expected correlation matrix, 422
BILOG-MG
  allocating items, 40
  calibration phase, 27
  example of 2PL model, 92
  example of DIF analysis, 100
  format statement, 57, 59, 64, 88, 114, 145, 149, 155, 160, 163, 167, 170-171, 173-175, 178, 235
  input files, 57, 149, 156, 173, 241
  input of response record, 110
  input phase, 26
  new features, 24
  opening or creating syntax file, 38
  options on main menu bar, 38
  order of commands, 114
  output files, 149, 199-207, 241-242
  providing item and group labels, 41
  providing job description, 41
  scoring phase, 31
  specifying input files, 147, 163
  specifying model, 41
  specifying number of parameters, 152
  specifying Rasch model, 133
  specifying type of analysis, 40
  user interface, 37
  using calibration file as input, 147
  using item parameter file as input, 150, 200, 204
  using master file as input, 151
  using provisional values file as input, 156
  using raw data file as input, 148
  using the interface, 91
  using weighted response records, 154
Binet, A., 616, 619, 621-622, 628, 830-832
Birnbaum model, 539-540, 606, 842
Birnbaum, A., 523, 538-539, 594, 837, 840-841
Biserial coefficient
  as discrimination index, 450
Biserial correlation, 462-465, 467, 471, 562-563, 579, 667, 700, 786-787
BISERIAL option
  on PLOT command, 450
Bivariate Plot option
  Graphics procedure, 505, 513
Bivariate plots, 505, 513
  displaying, 513
  editing and saving, 513
BIWEIGHT option
  on SCORE command, 53, 91, 115, 208, 209
Biweighting, 31, 52, 209
BJ keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
BK keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
Bliss, C.I., 833
BLOCK command, 263, 265, 295, 322, 327, 692
  BNAME keyword, 263, 265, 266, 721
  CADJUST keyword, 263, 265-266, 270, 692, 703
  CATEGORY keyword, 263, 265-266, 269
  CNAME keyword, 263, 265, 267, 269
  CSLOPE option, 263, 265, 267
  GPARM keyword, 263, 265, 268-269, 722
  GUESSING keyword, 263, 265, 268, 722
  MODIFIED keyword, 263, 265, 267, 269, 272, 696, 709, 720
  NCAT keyword, 263, 265, 269-271, 339, 692, 710
  NITEMS keyword, 263, 265, 270, 339, 342, 692, 710
  NOCADJUST option, 263, 265, 270
  ORIGINAL keyword, 263, 265, 270, 333, 335-336, 696, 710, 720
  RATER keyword, 263, 265, 271, 281, 320
  REPEAT keyword, 263, 265-266, 271, 710, 721-722
  SCORING keyword, 263, 265, 272, 709
  SKIP keyword, 263, 265, 272


Blocks
  assigning names, 265-266
  number of categories in, 269
  number of items in, 270
  of common items for DIF analysis, 300
  repeating, 271
  requesting common slope for all items, 267, 275
  required number of, 265
  skipping estimation of, 272
BNAME keyword
  on BLOCK command, 263, 265-266, 721
Bock
  main class and interaction model, 750
Bock, R.D., 25, 34-35, 209, 257, 345, 399, 404, 528-529, 531, 535-537, 546, 558, 560-561, 568, 570, 576, 584-585, 588-591, 597, 599, 601, 607, 610, 614, 616-618, 628, 679, 688, 730, 749-751, 760-761, 763, 765, 767, 778, 830, 835, 838-839, 841-847
Boundaries
  before partitioning into subtests, 489
  number of, 458
BOUNDARY keyword
  on FRACTILES command, 435, 776
  on SUBTEST command, 489, 776
Bowers, J., 580
Bradley, R.A., 835
Browne, M.W., 844
BS option
  on TEST command, 355, 367, 379, 398-400, 403, 752
Build Syntax option
  on Run menu, 82, 97, 105
Burt, C., 830, 832

C

CADJUST keyword
  on BLOCK command, 263, 265-266, 270, 692, 703
CALIB command, 51, 78, 88, 115-116, 264, 274, 289, 295
  ACCEL keyword, 78, 89, 115-117, 123, 650
  ACCEL option, 264, 274-275
  CHI keyword, 25, 51, 88, 115-116, 118, 634
  COMMON option, 89, 115-116, 118
  CRIT keyword, 51, 88, 115-117, 119, 264, 274-275, 639, 666, 692, 711
  CSLOPE option, 264, 274-275
  CYCLES keyword, 51, 88, 115-117, 120, 126-127, 136, 226, 264, 274, 276, 666, 692, 702, 711
  DIAGNOSIS keyword, 89, 115-116, 121, 264, 274, 276, 320
  DIST keyword, 264, 274, 277, 702
  EMPIRICAL option, 51, 89, 115-116, 122, 129, 675


  ESTORDER option, 264, 274, 277
  FIXED option, 25, 78, 79, 89, 115-116, 122, 652
  FLOAT option, 51, 89, 115-116, 122, 264, 274, 277, 652, 675
  FREE keyword, 264, 274, 278
  GPRIOR option, 51, 89, 115-116, 123, 138, 140, 264, 274, 278, 721
  GRADED option, 264, 274, 279, 692
  GROUP-PLOTS option, 25, 79, 89, 115-116, 124
  IDIST keyword, 88, 115-116, 122, 125, 189, 193-195, 675
  ITEMFIT keyword, 264, 274, 279, 692
  LOGISTIC keyword, 710
  LOGISTIC option, 264, 274, 279, 692
  NEWTON keyword, 51, 88, 115-116, 120-121, 126, 136, 226, 264, 274, 280, 692, 703, 711
  NFULL keyword, 78, 89, 115-116, 127
  NOACCEL option, 264, 274-275
  NOADJUST option, 25, 37, 79, 89, 115-116, 127-128, 226-227
  NOCALIB option, 264, 274, 280, 288, 324, 708
  NOFLOAT option, 89, 115-116, 122-123, 187-188, 675
  NOGPRIOR option, 89, 115-116, 123, 138, 140
  NORMAL option, 89, 115-116, 128-129, 264, 274, 279, 659
  NOSPRIOR option, 89, 115-116, 124, 137, 140
  NOTPRIOR option, 89, 115-116, 124, 138-139
  NQPT keyword, 51, 88, 115-117, 121-122, 129, 194-195, 264, 274, 279-280, 308-309, 342, 639, 666, 692, 702, 709-710
  NRATER option, 264, 274, 281
  NSD keyword, 89, 115-116, 130
  PARTIAL keyword, 710
  PARTIAL option, 264, 274, 279
  PLOT keyword, 88, 115-116, 124-125, 130-131, 634
  POSTERIOR option, 264, 274, 281, 711
  PRINT keyword, 88, 115-116, 131, 157
  PRIORREAD option, 264, 274, 281, 305
  QPREAD option, 264, 274, 282, 308
  QRANGE keyword, 264, 274, 282
  RASCH option, 26, 79, 89, 115-116, 133
  READPRIOR option, 70, 89, 115-117, 133, 187, 189-192, 666, 675
  REFERENCE keyword, 42, 89, 115-116, 128, 134-135, 220, 638, 659, 675
  RIDGE keyword, 89, 115-117, 135, 264, 274, 283, 666
  SCALE keyword, 264, 274, 283, 692
  SELECT keyword, 47, 89, 115-116, 136, 686
  SKIPC option, 264, 274, 283
  SPRIOR option, 51, 89, 115-116, 124, 137, 140, 264, 274, 721
  THRESHOLD option, 264, 274, 284
  TPRIOR option, 51, 89, 115-117, 124, 138-139, 264, 274, 284, 666
Calibration, 34, 40, 47, 70, 75, 82, 116, 276, 288, 349-350, 364, 371, 389, 599, 611, 694, 742, 752
  controlling iterative procedure, 46, 51, 274, 382
  information on prior distributions, 64
  sample, 609, 616
  selecting blocks for, 272
  selecting subtests for, 46, 136
  skipping of, 280, 283
  suppressing adjustment for category parameters, 318
  suppressing correction for information function, 281
  threshold parameters, 546, 564, 700, 703
  user-supplied parameter values, 272
Calibration file
  as input in BILOG-MG, 147
  naming of, 289
  saving, 199, 312
Calibration Only option
  on Run menu, 82
Calibration Options dialog box, 78, 117, 122, 125, 128, 133
Calibration Options option
  on Technical menu, 78, 89, 117, 122, 125, 128, 133
Calibration Prior Latent Distribution dialog box, 70, 194, 195
Calibration Prior Latent Distribution option
  on Technical menu, 70, 194, 195
Carroll, J.B., 585
CASE option
  on INPUT command, 413-414
Case score information
  saving to file, 469
Case weights, 27, 298
Categories
  assigning names to, 267
  average of threshold parameters, 700-701
  collapsing, 269-270
  number of, 399
  number of highest, 355, 399
  number of in graded model, 375
  number of per block, 269
for information
function, 281


Category coefficient
  generalized partial credit model, 557
CATEGORY keyword
  on BLOCK command, 263, 265-266, 269
Category parameters, 341
  providing initial values, 266
  setting mean of, 266, 270
  skipping estimation of, 280, 283
  suppressing calibration adjustment, 318
CCRIT keyword
  on ESTIMATE command, 383
CCRIT option
  on SAVE command, 461
CFNAME keyword
  on FILES command, 263, 288-289
  on GLOBAL command, 85, 114, 147
CHANCE option
  on SAVE command, 816-817
  on SCORE command, 474, 804
  on SIMULATE command, 483-484, 820
Changing
  attributes of graphs, 517
  size of window display, 83
CHI keyword
  on CALIB command, 25, 51, 88, 115-116, 118, 634
Chi-square, 634
  ability intervals for, 25, 51, 118
  change in likelihood ratio, 30
  likelihood ratio, 437, 731-732, 736, 747, 750, 763, 768, 780, 796
  test statistic, 796, 807
CJ keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
CK keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393
CK option
  on TMATRIX command, 404
CLASS command, 424, 776
  IDENTITY keyword, 424
  NAMES keyword, 425
Class item statistics
  saving to file, 461
CLASS keyword
  on PROBLEM command, 446, 454, 776
Classes
  identifying, 424
  number of, 454
  saving item statistics for file, 465
  saving separate estimates, 462
Classical item statistics, 27, 37, 82, 106, 120, 126, 154, 203, 242, 244, 410, 530, 541, 584, 635, 642, 651, 659, 667, 672, 676, 780, 818
  saving to file, 203, 242, 244, 467
Classical reliability, 32-33
Classical Statistics Only option
  on Run menu, 82
Clinical testing, 618-619
Clogg, C.C., 735, 761
CMAIN option
  on SAVE command, 462
CNAME keyword
  on BLOCK command, 263, 265, 267, 269
Code
  class identification, 424
  identifying raters, 304
  identifying subgroups, 301
  not-presented items, 43
  omitted items, 43
  responses, 43
Coefficient
  alpha, 459
  of kurtosis, 26, 79, 214-215
  of skewness, 26, 79, 214-215
Collapsing
  of interval for test statistic, 544, 562, 605, 705
COMBINE command, 264, 285, 292, 722
  NAME keyword, 264, 285
  WEIGHTS keyword, 264, 285, 286, 723
51, 118 WEIGHTS keyword,
264, 285, 286, 723

COMBINE keyword
  on INPUT command, 263, 285, 292, 338, 722
  on SAVE command, 263, 285, 312, 313, 337
Combined
  subscale scores, 285-286
Combined score file
  saving, 313, 337
Commands
  BIFACTOR, 410, 418, 468, 472-473, 495, 804
  BLOCK, 263, 265, 295, 322, 327, 692
  CALIB, 51, 78, 88, 115-116, 264, 274, 289, 295
  CLASS, 424, 776
  COMBINE, 264, 285, 292, 722
  COMMENT, 42, 85, 114, 141, 263, 287, 426
  CONTINUE, 427, 482, 775
  CRITERION, 428, 470
  DRIFT, 88, 114, 163, 165
  END, 378, 405-406, 409
  EQUAL, 379, 752, 754, 758, 761, 768
  ESTIMATE, 382
  EXTERNAL, 428, 430, 455
  FACTOR, 410, 418, 431, 468, 478
  FILES, 263, 280, 287-288, 312
  FIX, 738, 754, 758, 761
  FORM, 88, 108-112, 114, 159, 163, 173-174, 183, 215-216, 219-220, 236, 238, 240, 652
  FRACTILES, 435, 455
  FULL, 410, 437, 478
  GLOBAL, 85, 114, 141, 199, 242
  GROUP, 88, 108-112, 114, 163, 165, 173-174, 183, 236
  INPUT, 55, 62, 86, 114, 241-242, 263, 280, 289, 291-292, 320, 329, 331, 441, 482, 504
  ITEMS, 87, 108, 111-112, 114, 145-146, 159-161, 173, 182, 224-225, 228-233
  KEY, 448, 779, 781, 804
  LABELS, 387
  LENGTH, 86, 109, 114, 153-154, 159, 185, 229, 231
  MGROUP, 263, 294, 295, 300, 710, 711
  MRATER, 263, 294, 303, 728
  NAMES, 429-430, 449, 455-456, 482, 490, 804
  PLOT, 450
  PRIORS, 69, 90, 115, 117, 134, 187, 193, 264, 281, 305, 393, 452, 675
  PROBLEM, 388, 435, 454, 482
  QUAD, 75, 90, 115, 117, 122, 126, 187-189, 193, 675
  QUADP, 264, 282, 308
  QUADS, 75, 91, 115, 196, 211-212, 264, 310, 321
  RELIABILITY, 459
  RESPONSE, 456-457, 460, 781, 804
  SAVE, 80, 86, 114, 158, 199, 242-244, 247-248, 250, 253, 255-256, 263, 289, 291, 312, 337, 395, 409, 420-421, 442, 461, 482, 499-500, 652, 722
  SCORE, 75, 90, 115, 206, 208, 264, 285, 289, 292, 316, 410, 474
  SELECT, 418-419, 457, 480, 489-490, 504, 781, 818
  SIMULATE, 410, 461, 474, 482, 819
  START, 395-396, 748
  STOP, 482, 488
  SUBTEST, 428, 458, 489
  TECHNICAL, 410, 491
  TEST, 66, 87, 108-109, 114, 121, 127, 153-154, 183, 185-186, 224, 263, 274, 297, 316, 325, 327, 398
  TETRACHORIC, 499, 806
  TGROUPS, 401, 403, 750, 758
  TITLE, 42, 85, 114, 234, 263, 330, 482, 502
  TMATRIX, 754, 767
COMMENT command, 42, 85, 114, 141, 263, 287, 426
COMMON keyword
  on MGROUP command, 263, 300
COMMON option
  on CALIB command, 89, 115-116, 118
Communality
  estimates, 791-792, 810
Communality improvements
  convergence criterion for, 494
COMPLETE option
  on TETRACHORIC command, 500-501
Components
  specifying characteristics, 454
Constraints
  equal slopes, 738
  equality, 761
  for group parameters, 560
  imposing in MULTILOG, 379, 735, 754, 758, 760, 768
  of location parameters, 750
  pairwise, 380
  using in MULTILOG, 757
Construct definition, 629
Contingency table
  per item, 562
CONTINUE command, 427, 482, 775
Contrasts
  deviation, 404
  in MULTILOG, 379, 385, 393, 570
  standardized, 717
Convergence criterion, 51, 117, 120, 382-383, 493, 495, 638, 692, 702, 711, 781, 826, 829
  for communality improvements, 494
  for M-step, 383
  specifying in BILOG-MG, 119
Copying
  of graphs, 515
CORRELAT option
  on INPUT command, 445, 818
  on SAVE command, 463
Count matrix
  for calculating tetrachoric correlations, 499
Covariance file
  saving, 200, 242, 248
COVARIANCE keyword
  on SAVE command, 86, 114, 199-201, 248
CPARMS keyword
  on BIFACTOR command, 418, 804-805
  on FULL command, 437, 474
CRIT keyword
  on CALIB command, 51, 88, 115-117, 119, 264, 274-275, 639, 666, 692, 711
Criterion
  convergence of EM cycles, 383
  reading from file, 359
  referencing, 632
CRITERION command, 428, 470
  CRITMARK option, 428, 776
  EXTERNAL option, 428, 429
  NAME keyword, 429
  SUBTEST option, 428
  WEIGHTS keyword, 428-429
CRITERION option
  on PLOT command, 451, 776
  on PROBLEM command, 358, 388, 773
  on SAVE command, 464
Criterion score
  defining in TESTFACT, 428
  external, 451, 776, 829
  mean, 462, 464
  naming, 429
  saving item statistics based on, 464
CRITMARK option
  on CRITERION command, 428, 776
Cronbach, L.J., 837
CROSS option
  on TETRACHORIC command, 499
CSLOPE option
  on BLOCK command, 263, 265, 267
  on CALIB command, 264, 274, 275
CSUB option
  on SAVE command, 465
CYCLES keyword
  on BIFACTOR command, 419, 804, 806, 815
  on CALIB command, 51, 88, 115-117, 120, 126-127, 136, 226, 264, 274, 276, 666, 692, 702, 711
  on FULL command, 438, 779, 781, 794, 803, 815, 825
D

Data
  aggregate-level, 210
  as input in BILOG-MG, 148, 163
  counts of response patterns, 352, 366, 407
  DIF model, 334
  entering interactively, 58
  fixed-effects table of counts, 352-353, 407
  fixed-format, 56
  for item or factor analysis, 441
  format of, 56
  group-level, 54, 62-63, 208, 211, 213, 221, 293, 320, 333, 335, 344, 537, 631, 666-669, 685, 841
  in compressed format, 110, 112
  in expanded format, 110, 112, 173
  individual level, 333
  matrix sampling, 116, 187, 238, 537, 610, 630, 666
  multiple-group Rater's-Effect, 333, 335
  number of records, 391
  rewinding, 445
  sampling from file, 178, 298
  single-group model, 334
  single-subject, 54, 55, 62, 64, 180
  specifying in MULTILOG, 388-390
  specifying order of items, 144
  specifying type of, 180, 445
  using subset of records, 179, 298
  using weighted, 154
Data file
  providing information on, 54, 289
Data File tab
  on Examinee Data dialog box, 55
Data File/Enter Data tab
  on Examinee Data dialog box, 85-86, 88
  on Group-Level Data dialog box, 85-86, 88
DATA keyword
  on PROBLEM command, 352, 359, 389
Data menu
  Examinee Data dialog box, 55, 95, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
  Examinee Data option, 85-94, 103
  Group-Level Data dialog box, 62, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
  Group-Level Data option, 85-88
  Item Keys dialog box, 59, 163, 169, 172, 177, 242
  Item Keys option, 87
Data Options dialog box, 77, 117, 127, 163, 167
Data Options option
  on Technical menu, 77, 87, 89, 117, 127, 163, 167
Davis, J.A., 761
Decimals
  number for tetrachorics, 500
Degrees of freedom, 30, 535, 543, 544, 562, 587-588, 603, 605, 634, 651, 670, 705, 747, 763, 796-797, 806-808
Delta statistic, 577-578, 786
Dempster, A.P., 585, 842
DEVIATION option
  on TMATRIX command, 404
DFNAME keyword
  on FILES command, 263, 288-289, 301, 304, 333, 692, 710
  on GLOBAL command, 57, 59, 64, 85, 114, 147-149, 151, 172, 177, 240-241
DIAGNOSE keyword
  on INPUT command, 87, 114, 163, 164
DIAGNOSIS keyword
  on CALIB command, 89, 115-116, 121, 264, 274, 276, 320
Dialog boxes
  Assign Calibration Prior Latent Distribution, 75
  Assign Fixed Items, 68, 109, 225, 227
  Assign Scoring Prior Latent Distribution, 70, 75
  Axis Labels, 517
  Bar Graph Parameters, 517-518
  Calibration Options, 78, 117, 122, 125, 128, 133
  Calibration Prior Latent Distribution, 70, 194-195
  Data Options, 77, 117, 127, 163, 167
  Examinee Data, 55, 95, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
  Fixed Theta, 350, 358, 371, 377, 388-389
  General, 41, 59, 93, 108-111, 113, 117, 135, 141, 145, 147, 151-153, 156, 160-163, 165, 170, 174, 176, 183, 225, 229, 233-234
  Graph Parameters, 516
  Group-Level Data, 62, 147, 149, 163, 167, 171, 175, 178-179, 180-181, 240, 242
  Input Data, 350-353, 358, 365, 371, 377, 388, 390
  Input Parameters, 352-355, 359, 362, 366-368, 372-374, 377, 388-392, 398-399
  Item Analysis, 46, 94, 102, 108-113, 117-118, 120-124, 127, 130, 137-138, 140, 145-146, 160-161, 185-186, 225, 231
  Item Keys, 59, 163, 169, 172, 177, 242
  Item Parameter Prior Constraints, 69, 117, 134, 189
  Item Parameter Starting Values, 65, 109, 225, 226, 228, 230, 232
  Item Prior Constraints, 189-192
  Legend Parameters, 517, 520
  Line Parameters, 516-517, 519, 521-522
  New Analysis, 349-350, 357, 364, 370-371, 377, 388-390
  Plot Parameters, 521-522
  Project Settings, 362, 369
  Response Codes (Binary Data), 355-356, 368, 384, 405-406
  Response Codes (Non-Binary Data), 355, 361
  Save Output to File, 78, 80
  Score Options, 79, 209-210, 215
  Scoring Prior Latent Distribution, 197-198, 209, 212, 217-218, 219
  Settings, 83, 85
  Test Model, 352-354, 360, 367, 373, 377, 398-400
  Test Scoring, 147, 150, 157, 209, 211-214, 216, 221-222
  Text Parameters, 517-518, 521-522
DIF, 24, 29, 42-43, 86-87, 100-101, 106-108, 114, 131, 133-135, 155, 163-165, 175, 187, 191-192, 199, 201, 225, 233, 242, 245, 257, 263, 274, 278, 286, 294, 300-302, 333-337, 340, 343-345, 528-529, 531-533, 535-536, 546, 560-561, 626-627, 638, 645-646, 651, 710-713, 717-719, 770, 844-845, 858
  and common blocks of items, 300
  and Rasch model, 133
  saving parameters to file, 201
DIF keyword
  on MGROUP command, 263, 300-301, 710, 711
  on SAVE command, 86, 114, 164-165, 199, 201, 245, 638
DIF option
  on INPUT command, 43, 87, 114, 135, 163-164, 192, 638
DIF parameter file
  saving, 242, 245
Difficulty
  of item, 539
Difficulty index, 451, 462-465, 467, 471, 577
  of item, 562, 642
Divgi, D.R., 583
Discriminating power, 575, 622, 632
  defining, 451
  of item in 2PL model, 531, 535, 538-539, 576, 578, 623, 692, 840
  plot against item difficulty, 450
Discrimination parameter, 594
Dispersion
  starting values for, 225
DISPERSN keyword
  on TEST command, 47, 68, 88, 114, 224-225, 232
Displaying
  bivariate plots, 513
  histogram of ability scores, 512
  item characteristic curve, 506, 508, 510
  item information curve, 507-508
  test information, 509
DIST keyword
  on CALIB command, 264, 274, 277, 702
  on SCORE command, 264, 282, 316, 695, 706
DK keyword
  on EQUAL command, 379
  on FIX command, 385
  on PRIORS command, 393, 733
DK option
  on TMATRIX command, 404
DOMAIN keyword
  on SCORE command, 26, 79, 91, 115, 208-210
Domain referencing, 632, 846
Domain scores, 26, 79, 209-210, 621, 631-632, 688-691, 846
  calculating, 65
Dorans, N.J., 564
Drasgow, F., 564
DRIFT, 24, 29, 37, 42-43, 86-88, 114, 123, 133-135, 142, 163, 165, 187, 191-192, 199, 202, 242, 247, 254, 278, 528, 531-532, 536, 652, 844, 845
  and Rasch model, 133
  requesting analysis, 165
  saving parameters to file, 202
  specifying polynomial, 142
  specifying time points, 143
DRIFT command, 88, 114, 163, 165
  MAXPOWER keyword, 88, 114, 142, 192
  MIDPOINT keyword, 88, 114, 142, 143
DRIFT keyword
  on SAVE command, 86, 114, 165, 199, 202, 247
DRIFT option
  on INPUT command, 43, 87, 114, 135, 142, 163, 165, 192, 202
DRIFT parameter file
  saving, 242, 247
du Toit, S.H.C., 844

E

EAP
  controlling precision for factor scores, 478
  estimating scores on general factor, 418
EAP option
  on SCORE command, 264, 316-317, 693, 707, 721
Edit menu, 40, 515
Editing
  axis labels of graphs, 517
  bar parameters, 518
  bivariate plots, 513
  histogram of ability scores, 512
  item characteristic curve, 506, 508, 512
  item information curve, 507-508
  legends of graphs, 520
  lines in graph, 521
  plot parameters, 522
  test information curve, 509
  text in graph, 522
Editor tab
  on Settings dialog box, 83
EM
  algorithm, 28, 644
  reversing estimation order, 277
EMPIRICAL option
  on CALIB command, 51, 89, 115-116, 122, 129, 675
END command, 378, 405-406, 409
Engelhard, G., 588, 591
Enter Data tab
  on Examinee Data dialog box, 58
Enter Values tab
  on Item Parameter Starting Values dialog box, 66
EQUAL command, 379, 752, 754, 758, 761, 768
  AJ keyword, 379
  AK keyword, 379
  ALL keyword, 381
  BJ keyword, 379
  BK keyword, 379
  CJ keyword, 379
  CK keyword, 379
  DK keyword, 379
  GROUPS keyword, 381
  ITEMS keyword, 381
  MU keyword, 379
  SD keyword, 379
Equality constraints
  of item parameters, 379, 575
Equating
  equivalent groups, 532-533, 627, 631, 652, 685, 768, 845
  linear, 627, 838, 845
  non-equivalent groups, 122, 627-628, 632
  of forms, 626
  vertical, 24, 116, 120, 129, 138, 528, 532, 534, 621, 627-628, 658, 661, 845
Equipercentile method, 533, 627
Equivalent groups
  equating, 532-533, 627, 631, 652, 768
ERRORSEED keyword
  on SIMULATE command, 483, 820, 822
E-step
  methods of integration, 410, 530
  saving results of final, 465
ESTIMATE command, 382
  ACCMAX keyword, 382
  CCRIT keyword, 383
  ICRIT keyword, 383
  ITERATIONS keyword, 383
  NCYCLES keyword, 383
  VAIM keyword, 356, 384
Estimated error variance, 32-35, 249, 271, 583, 620, 623, 625, 655-656, 684-685, 699, 838, 843
Estimates
  a-posteriori, 812
  provisional, 496
Estimating
  common value for lower asymptote, 118
  means of prior distributions, 123
  score distribution as discrete distribution, 121
Estimation
  Bayes, 31, 34-35, 214, 317, 410, 418, 476, 529-530, 537, 544, 576, 588, 590, 601, 605, 607, 609-610, 616, 624-625, 655, 664, 679, 692-693, 706, 721, 780-781, 800-802, 837-838, 841, 843-844, 847
  Bayes modal, 605, 607, 841, 843
  marginal maximum likelihood, 28, 116, 121, 123, 345, 349-350, 364, 389, 401-403, 407, 529, 544, 562, 576, 584-587, 589-590, 599-602, 604-605, 607, 611, 644, 675, 699-700, 702, 705, 730, 742-743, 806, 842-843
  maximum likelihood, 25, 28, 30-31, 33-36, 75, 122, 128, 208, 213-214, 222, 277, 317-318, 320, 323, 345, 410, 452, 495, 529-530, 532, 537, 543-544, 564, 568, 576, 584-585, 586-587, 591, 594-595, 597-600, 606-611, 615, 652, 655, 685, 693, 700, 702, 708, 734, 744, 792, 794, 806, 833-834, 836-838, 840-844, 847
  maximum marginal a posteriori, 25, 29, 31, 35-37, 75, 211, 214, 349-350, 357, 389, 392, 395, 474, 476, 529, 544, 590, 607-610, 625, 655, 664, 685, 748-749, 781-782, 802, 837-838
  minimized squared residuals (MINRES), 791, 793
  Newton-Gauss, 28, 31, 120, 601, 644, 692, 702, 833-834
  Newton-Raphson, 31, 615, 749, 833-834
  reversing order of EM, 277
  Warm's weighted maximum likelihood, 264, 316-317, 615
ESTORDER option
  on CALIB command, 264, 274, 277
Examinee Data dialog box, 55, 95, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
  Data File tab, 55
  Data File/Enter Data tab, 85-86, 88
  Enter Data tab, 58
  General tab, 55, 86-87, 103
Examinee Data option
  on Data menu, 55, 85-88, 94-95, 103, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
Example, 359, 362
  2PL model with BILOG-MG, 92
  3PL model, 364, 733
  DIF analysis with BILOG-MG, 100
  fixed-theta model, 370
  reading of an external criterion, 357
Expected correlations
  in bifactor solution, 422
Expected frequencies, 438
  marginal, 30
  saving to file, 202, 242, 250
EXPECTED keyword
  on SAVE command, 86, 114, 199, 202, 250
EXPECTED option
  on SAVE command, 465
EXTERNAL command, 428, 430, 455
EXTERNAL keyword
  on INPUT command, 55, 63, 87, 114, 163, 166
  on PROBLEM command, 430, 446, 455
EXTERNAL option
  on CRITERION command, 428-429
External variables
  in computation of item parameters, 166
  naming, 430
  number of, 455
Extreme points
  setting values for adaptive quadrature, 496
Extreme weights
  setting values for adaptive quadrature, 497

F

Facilities
  plot, 582
Facility, 27, 244, 345, 410, 529, 542, 575, 578, 581-583, 787-788, 793-794
FACILITY option
  on PLOT command, 451, 776
Factor analysis
  changing values of default constants, 491
  classical, 410, 530
  controlling, 431
  exploratory, 584
  full information, 410, 432, 437, 442, 446, 530, 629, 778, 818
  latent response process correlation matrix, 434
  MINRES, 431
  not-reached items, 586
  number of factors, 431, 486
  of inter-item tetrachoric correlations, 410, 529, 575
  principal, 434, 442, 530, 583
  requesting rotation, 433
  statistical test of number of factors, 587
FACTOR command, 410, 418, 431, 468, 478
  NDEC keyword, 431
  NFAC keyword, 431-433, 446, 466, 468, 469, 473, 484, 486, 779, 781, 797, 803, 815, 818, 824-825
  NIT keyword, 432
  NROOT keyword, 432, 779, 781, 792, 803, 815, 818, 825
  PROMAX option, 413
  RESIDUAL option, 433, 779
  ROTATE keyword, 433, 468, 472, 779, 781, 793, 803, 818, 824
  ROTATE option, 414
  SMOOTH option, 434, 779
  VARIMAX option, 413
Factor correlations, 576
Factor loadings, 410-411, 442-443, 446-477, 482, 484, 485, 487, 530, 575-576, 580, 584-589, 629, 645, 699, 779-781, 792-794, 797-798, 809-811, 815, 818-821, 823, 829, 842-843
  MINRES principal, 792
  parameter file as input, 476
  saving to file, 468, 472
  unrotated, 420, 798
Factor scores, 410-411, 421, 439, 468, 474-475, 477-478, 482, 498, 530, 576, 584, 588-590, 629, 780-781, 800-804, 812-813, 815-817, 819-820, 826, 828
  controlling EAP/MAP precision, 478
  number of leading cases in output, 475
  number of with user-supplied file, 477
  providing population means for, 486
  saving to external file, 466
  using guessing model, 474
Factors
  number of, 410
  number of item-group, 421
FACTORS option
  on INPUT command, 445, 823
FILE keyword
  on INPUT command, 443, 446, 776, 804, 818, 823, 825
  on SCORE command, 26, 79, 91, 115, 208-210, 468, 475, 477-478, 690, 802, 815
  on SIMULATE command, 483, 819-820
File menu, 38, 54, 514-515
  Print current page option, 514
  Print selected graph option, 514
  Printer Setup option, 514
  Printing Options option, 514
  Save as Metafile option, 514
  Save option, 97
  Show Selectors option, 514
Filename
  for input in TESTFACT, 443
Files
  opening multiple syntax, 84-85
FILES command, 263, 280, 287-288, 312
  CFNAME keyword, 263, 288-289
  DFNAME keyword, 263, 288-289, 301, 304, 333, 692, 710
  IFNAME keyword, 263, 274, 288-289, 312, 708
  MFNAME keyword, 263, 288, 290
  NFNAME keyword, 263, 288, 290, 337
  OFNAME keyword, 263, 288, 290, 337
  SAVE keyword, 722, 724
  SAVE option, 263, 288, 291, 312
Fill Page option
  Graphs menu, 516
Finney, D.J., 834
Fisher information, 31, 590, 606, 609, 655, 834, 837
Fisher scoring, 28, 31, 36, 120, 126-127, 280, 601, 606, 608, 644, 692, 702, 833-834, 841-842
  requesting use of full information matrix, 127
Fisher, R.A., 28, 31, 36, 126-127, 267, 280, 590, 592, 598-599, 601, 606, 608-610, 628, 655, 702, 833-834, 837, 841-842, 844, 845
Fit
  group-level statistics, 208, 211, 213, 221, 317, 666
  likelihood ratio test, 30, 530, 535-536, 587, 705, 763
  probability for group, 244
  root-mean-square of posterior deviates, 604
  test for small number of items, 30, 436, 605, 710, 841
  test of improved, 30
FIT keyword
  on SAVE command, 263, 312-313, 337, 339, 344
FIT option
  on SCORE command, 53, 91, 115, 208, 210, 264, 316-317, 344, 666
Fit statistics
  group-level, 317
Fit statistics file
  saving, 313, 338
Fitzpatrick, A.R., 751, 757
FIX command, 738, 754, 758, 761
  AJ keyword, 385
  AK keyword, 385
  ALL keyword, 386
  BJ keyword, 385
  BK keyword, 385
  CJ keyword, 385
  CK keyword, 385
  DK keyword, 385
  GROUPS keyword, 386
  ITEMS keyword, 386
  MU keyword, 385-386
  SD keyword, 385-386
  VALUE keyword, 386
FIX keyword
  on TEST command, 25, 36, 47, 68, 88, 114, 224, 226
Fixed
  number of groups, 401
  parameters, 226, 386, 575
FIXED option
  on CALIB command, 25, 78, 79, 89, 115-116, 122, 652
  on PROBLEM command, 349-350, 370, 389, 392
Fixed Theta dialog box, 350, 358, 371, 377, 388-389
Fixed-theta
  parameter estimation, 349, 350, 371, 401
Fixed-theta parameter estimation, 750
FLOAT option
  on CALIB command, 51, 89, 115-116, 122, 264, 274, 277, 652, 675
Follman, D., 840, 843
Font
  changing, 83
Form
  of T-matrix in MULTILOG, 403-404
FORM command, 88, 108-112, 114, 159, 163, 173-174, 183, 215-216, 219-220, 236, 238, 240, 652
  INAMES keyword, 50, 88, 114, 144, 145
  INUMBERS keyword, 50, 88, 114, 144, 146, 686
  LENGTH keyword, 50, 88, 114, 144, 146
Form Items tab
  on Item Analysis dialog box, 49, 88
Format
  ability score file, 242-243
  classical item statistics file, 242, 244
  combined score file, 313, 337
  covariance file, 242, 248
  DIF parameter file, 242, 245
  DRIFT parameter file, 242, 247
  expected frequencies file, 242, 250
  fit statistics file, 313, 338
  for multiple raters, 297
  item information file, 314, 342
  item parameter file, 242, 253, 315, 336, 340
  marginal posterior probability file, 243, 255
  not-presented key, 337
  of binary item data, 356
  of multiple response item data, 356
  omit key, 336
  output files, 312-314, 337
  subject scores file, 315, 337
FORMAT option
  on INPUT command, 444
  on SAVE command, 395
  on START command, 396-397
Format statement, 56-59, 64, 88, 96, 104, 106, 114, 144-145, 149, 154-155, 159-160, 163, 166-167, 170-175, 178, 210, 235-236, 238, 262-263, 296-297, 299, 331-336, 350, 352, 359-360, 366, 368, 372, 396, 398, 405, 408, 415, 442, 444, 447, 461, 475, 484, 502-504, 634, 638, 641, 652, 666, 670, 690, 692, 710, 721-722, 724-725, 775-776, 779, 802-804, 814-815, 817, 821, 823, 827-828
  number of records, 170, 295
  PARSCALE, 263, 331, 334, 336
Forms
  allocating items to, 40
  as reference for scoring, 215, 219
  equating, 626
  for scoring, 219
  length of, 49
  number of, 42, 173
  position of ID, 58
  vertical equating, 24
FORMS keyword
  on SIMULATE command, 484, 819, 820
Fractile tables, 580
Fractiles, 435-436, 455, 576, 580-582, 776
  grouping scores into, 435, 776
  number of, 435, 455, 582
  specifying, 436
FRACTILES command, 435, 455
  BOUNDARY keyword, 435, 776
  PERCENTIL option, 435, 436
  SCORES option, 435, 436, 776
FRACTILES keyword
  on FRACTILES command, 776
  on PROBLEM command, 455
FRACTION option
  on TECHNICAL command, 491
Fractional
  factorial design, 495, 590
  quadrature, 494, 589, 590, 779, 803
FREE keyword
  on CALIB command, 264, 274, 278
French, J.L., 833
FREQ keyword
  on FULL command, 438
FREQ option
  on TECHNICAL command, 492
Frequencies
  expected, 30
  expected response pattern, 492
  joint per pair, 499
FSCORES option
  on SAVE command, 466, 476, 781, 802, 816-817
FULL command, 410, 437, 478
  CPARMS keyword, 437, 474
  CYCLES keyword, 438, 779, 781, 794, 803, 815, 825
  FREQ keyword, 438
  LORD option, 413-414, 437-438
  MISS option, 413-414, 437-438
  OMIT keyword, 438, 440, 825
  QUAD keyword, 439, 825
  RECODE option, 413-414, 437-438
  TIME option, 440
Full information factor analysis, 410, 432, 437, 442, 446, 530, 629, 778, 818
  acceleration, 491
  and omits, 421, 438
  input of trial values, 446
  number of quad points per dimension, 497
  saving result of E-step, 465
  use of non-adaptive quadrature, 494
Full information matrix
  use in Fisher scoring, 127
Full information procedures, 584

G

Gauss-Hermite quadrature, 492, 589, 693, 841
GCODE keyword
  on MGROUP command, 263, 300-302, 334-336
General dialog box, 41, 59, 93, 108-111, 113, 117, 135, 141, 145, 147, 151-153, 156, 160-163, 165, 170, 174, 176, 183, 225, 229, 233, 234
  Job Description tab, 42, 49, 85-89, 93, 100
  Labels tab, 44, 87-88
  Model tab, 42, 85, 87, 93, 101
  Response tab, 43, 59, 86
General option
  on Setup menu, 40-41, 59, 85-89, 93, 100, 108-111, 113, 117, 135, 141, 145, 147, 151-153, 156, 160-163, 165, 170, 174, 176, 183, 225, 229, 233-234
General tab
  on Examinee Data dialog box, 55, 86-87, 103
  on Group-Level Data dialog box, 62, 86-87
  on Settings dialog box, 83
  on Test Scoring dialog box, 52, 85-86, 90-91
Gibbons, R.D., 529, 585-586, 591, 843
Glas, C.A.W., 844
Glass, G.V., 580
GLOBAL command, 85, 114, 141, 199, 242
  CFNAME keyword, 85, 114, 147
  DFNAME keyword, 57, 59, 64, 85, 114, 147-149, 151, 172, 177, 240-241
  IFNAME keyword, 53, 85, 114, 147, 150, 180-181, 201, 204-205, 226-227, 686
  LOGISTIC option, 43, 85, 114, 147, 150, 638
  MFNAME keyword, 85, 114, 147, 151
  NPARM keyword, 43, 85, 114, 119, 133, 147, 152, 170, 177, 189-190, 228, 638
  NTEST keyword, 42, 85, 114, 137, 147, 152-154, 159, 175, 178, 185-186, 189, 194-195, 197-198, 217-219, 224-225, 229, 230-231, 233
  NVTEST keyword, 43, 85, 114, 147, 153, 159, 175, 185-186, 224-225, 229-233
  NWGHT keyword, 85, 114, 147, 154, 240-241
  OMITS option, 44, 86, 114, 147, 155
  PRNAME keyword, 25, 53, 86, 114, 147, 156-157, 226-227, 241
  SAVE option, 80, 86, 114, 147, 157, 164-165, 199-207, 638, 652
GMU keyword
  on PRIORS command, 264, 305
GNAME keyword
  on GROUP command, 46, 88, 114, 159-160
  on MGROUP command, 263, 300, 302
Goldstein, H., 536
Goodman, L.A., 735
Goodness-of-fit
  test statistics, 37, 279, 705, 717, 730, 732, 736, 746, 750, 752, 758, 763, 768, 780, 796, 807-808
Goodness-of-fit test, 561
  collapsing of intervals, 544, 562, 605, 705
GPARM keyword
  on BLOCK command, 263, 265, 268-269, 722
GPRIOR option
  on CALIB command, 51, 89, 115-116, 123, 138, 140, 264, 274, 278, 721
GR option
  on TEST command, 355, 361, 367, 374, 379, 398-400
Grade equivalents, 628, 832
Graded category
  response function, 547
GRADED option
  on CALIB command, 264, 274, 279, 692
  on TEST command, 735, 742, 748
Graded response model, 279, 340, 355, 367, 379, 398-400, 523, 529, 546-549, 553, 556, 612, 614, 616, 692, 735, 738, 740, 742, 748, 765, 772
  discriminating power, 450
  listing threshold parameters, 379, 385, 393
  logistic form, 547
  metric, 279
  number of categories, 375
  slopes, 379, 385, 393
Graded responses, 345, 529, 596
Graphics procedure, 505
  Bivariate Plot option, 505, 513
  Histogram option, 505, 512, 518
  ICC and Info option, 505, 508, 526
  ICC option, 505-506
  Information option, 505, 507, 526
  Main menu, 505
  Matrix Plot option, 505, 510
  Total Info option, 505, 509, 527
Graphs
  bivariate, 505, 513
  changing attributes of, 517
  copying of, 515
  editing axis labels, 517
  editing bar characteristics, 518
  editing bar parameters, 518
  editing legends, 520
  editing lines, 521
  editing plot parameters, 522
  editing text, 522
  item characteristic curve, 505, 507, 509, 523
  item difficulty against discriminating power, 450
  item information curve, 505, 508-509, 512, 524
  matrix, 829
  measurement error, 527
  modifying, 514
  printing of, 514
  resizing, 516
  saving and printing of, 514
  selecting of, 515
  total information curve, 505
Graphs menu, 516
  Fill Page option, 516
  Parameters option, 516
Graphs Parameters dialog box, 516
Green, B.F., 836
Green, S.B., 582
Group
  identification code, 301
Group box
  Test Model, 367
GROUP command, 88, 108-109, 111, 114, 163, 165, 173-174, 183, 236
  GNAME keyword, 46, 88, 114, 159-160
  INAMES keyword, 50, 88, 114, 159-160
  INUMBERS keyword, 50, 88, 114, 159, 161
  LENGTH keyword, 50, 88, 114, 159, 161-162
Group Items tab
  on Item Analysis dialog box, 49, 88, 102
GROUP keyword
  on SIMULATE command, 484, 819-820
Group-level data, 335, 667
  as input, 293
  printing fit statistics, 317
  scoring of, 316
Group-Level Data dialog box, 62, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
  Data File/Enter Data tab, 85-86, 88
  General tab, 62, 86-87
Group-Level Data option
  on Data menu, 62, 85-88, 147, 149, 163, 167, 171, 175, 178-181, 240, 242
GROUPLEVEL option
  on INPUT command, 263, 292-293, 336
GROUP-PLOTS option
  on CALIB command, 25, 79, 89, 115-116, 124
Groups
  adjustment of item difficulty, 561, 713
  allocating items to, 40
  assigning codes, 424
  assigning items, 46
  assigning items for general factor, 419
  assigning names, 44, 302
  assumed prior distributions, 37
  different quad points and weights, 76
  estimates of means, 37
  estimates of standard errors, 37
  fit probability, 244
  identification in simulation of records, 484
  identifying respondents, 111
  identifying sets of items, 111
  length of, 49
  listing of item names, 160-161
  listing of item numbers, 160-161
  mean of population distribution, 379, 385, 393
  naming of, 160
  number of, 42, 62, 174, 294, 352, 360, 367, 373, 391
  number of fixed, 401
  parameter constraints, 560
  position of ID, 58, 62
  posterior distributions for, 37
  providing labels for, 41
  providing quadrature points and weights by, 70
  separate item plots, 78
  setting reference, 302
  setting reference for scoring, 220
  specifying means of normal prior, 217
  specifying number of items, 162
  specifying quadrature points, 129
  specifying standard deviation of normal prior, 218
  standard deviation of population distribution, 379, 385, 393
GROUPS command, 112
GROUPS keyword
  on FIX command, 386
GROUPS option
  on PRIORS command, 394
GSIGMA keyword
  on PRIORS command, 264, 305-306
GUESS keyword
  on TEST command, 68, 87, 114, 224, 226-227
GUESSING keyword
  on BLOCK command, 263, 265, 268, 722
Guessing model, 523, 538, 540, 569
  and computing factor scores, 474
  and simulating data, 483
Guessing parameter, 28, 29, 31, 118, 123, 137, 169, 189-190, 249, 254, 268, 273, 301, 305-306, 379, 385, 393, 474-475, 483, 540-541, 585, 601, 609, 612, 733-734, 767, 815, 820, 822, 843
  in TESTFACT, 437, 582, 805
  requesting beta prior, 278
  requesting in PARSCALE, 268
  selecting prior, 123
  starting values for, 227, 268
GUESSSEED keyword
  on SIMULATE command, 485, 822
Gulliksen, H., 592

H

Haberman, S.J., 588, 835
Hale, R.L., 833
Hales, L.W., 578
Half-item rule
  for extreme cases, 31
Hambleton, R.K., 524-525, 846
Harman, H.H., 583
Harvey, W.R., 583
Harwell, M.R., 599
Hedeker, D.R., 586
Help menu, 85
Hendrickson, E.A., 583
Henryssen, S., 577
Heywood case, 123, 137, 576, 587, 589, 601, 843
HIGH keyword
  on TEST command, 367, 399, 738, 752
Histogram
  estimated abilities, 505
  of ability scores, 512
  scores in TESTFACT, 577
Histogram option
  Graphics procedure, 505, 512, 518
Hively, W., 846
Holland, P.W., 844
Holzinger, K.J., 586
Horst, P., 592

I

ICC and Info option
  Graphics procedure, 505, 508, 526
ICC option
  Graphics procedure, 505-506
ICRIT keyword
  on ESTIMATE command, 383
IDENTITY keyword
  on CLASS command, 424
IDIST keyword
  on CALIB command, 88, 115-116, 122, 125, 189, 193-195, 675
  on SCORE command, 75, 78, 90, 115, 196-198, 208, 211, 213, 666, 675
IFNAME keyword
  on FILES command, 263, 274, 288-289, 312, 708
  on GLOBAL command, 53, 85, 114, 147, 150, 180-181, 201, 204-205, 226-227, 686
  on INPUT command, 650
IGROUPS keyword
  on BIFACTOR command, 419, 804-805, 815
Import/Enter Values tab
  Assign Item Parameter Starting Values dialog box, 65, 87-88
INAMES keyword
  on FORM command, 50, 88, 114, 144-145
  on GROUP command, 50, 88, 114, 159-160
  on ITEMS command, 46, 87, 114, 159, 182, 183, 638, 686
  on TEST command, 87, 114, 224, 228, 263, 325-326, 710
Indeterminacy, 531, 549, 703
INDIVIDUAL option
  on PROBLEM command, 352, 359, 390-391, 748, 773
INFO keyword
  on SCORE command, 36, 90, 115, 208, 212, 218, 222-223, 226-227, 256, 634, 675
Information, 25, 28, 31, 33-36, 208, 212, 218, 222, 226, 243, 248, 250, 256, 282, 342, 510, 524, 525, 527, 592, 599, 601, 606, 683, 685, 703, 745, 842-843, 847
  curves, 678
  expected, 652
  maximum value of, 524
Information axis
  scaling of, 507
Information curves, 25, 36
Information function, 271, 524-526, 599, 609, 613-614, 623, 625
  correcting, 319
INFORMATION keyword
  on SAVE command, 263, 312, 314, 337, 342
Information option
  Graphics procedure, 505, 507, 526
Information statistics
  requesting, 212
Initial slope parameter, 564
Initialize option
  on Run menu, 82
INOPT keyword
  on INPUT command, 263, 292-293, 335-336
Input
  counts of response patterns, 352, 366, 407
  data for item or factor analysis, 441
  file in TESTFACT, 443
  fixed-effects table of counts, 352-353, 407
  trial values for full information factor analysis, 446
INPUT command, 55, 62, 86, 114, 241-242, 263, 280, 289, 291-292, 320, 329, 331, 441, 482, 504
  CASE option, 413-414
  COMBINE keyword, 263, 285, 292, 338, 722
  CORRELAT option, 445, 818
  DIAGNOSE keyword, 87, 114, 163-164
  DIF option, 43, 87, 114, 135, 163-164, 192, 638
  DRIFT option, 43, 87, 114, 135, 142, 163, 165, 192, 202
  EXTERNAL keyword, 55, 63, 87, 114, 163, 166
  FACTORS option, 445, 823
  FILE keyword, 443, 446, 776, 804, 818, 823, 825
  FORMAT option, 444
  GROUPLEVEL option, 263, 292-293, 336
  IFNAME keyword, 650
  INOPT keyword, 263, 292, 293, 335, 336
  ISEED keyword, 78, 87, 114, 163, 167
  KFNAME keyword, 59, 87, 114, 149, 163, 167, 169, 172, 177, 236, 241, 670
  LENGTH keyword, 263, 270, 276, 292, 294, 327, 692, 702, 722, 728
  LIST option, 444, 777
  MGROUP keyword, 263, 292, 294, 300-304, 334-336, 339-342, 710
  MRATER keyword, 263, 292, 294, 296, 300-304, 334, 336, 342, 724, 727-728
  NALT keyword, 44, 86, 114, 134, 155, 163, 169
  NFMT keyword, 57, 59, 64, 86, 114, 163, 170, 263, 292, 295, 445
  NFNAME keyword, 60-61, 87, 114, 163, 169, 171, 177, 241-242
  NFORM keyword, 42, 87, 114, 144-146, 159, 163, 168-169, 171-173, 176-177, 235, 240-241
  NFORMS keyword, 652
  NGROUP keyword, 42, 87, 114, 117, 121, 122-123, 128-130, 135, 142-143, 159, 163, 165, 173-174, 187, 189, 192, 194-198, 218-219, 241, 638
  NIDCHAR keyword, 55, 63, 87, 114, 163, 174, 241, 263, 292, 295, 333-336, 343-344, 443, 446, 469, 692, 721, 775, 779, 804
  NRATER keyword, 263, 292, 296, 727-728
  NTEST keyword, 263-264, 292, 296, 325, 329, 338, 692, 722
  NTOTAL keyword, 42, 86, 114, 146, 161, 162-163, 175, 183-184, 229-230, 263, 292, 297, 333-334, 638, 710, 728
  NWEIGHT keyword, 638
  OFNAME keyword, 61, 62, 87, 114, 155, 163, 169, 172, 176-177, 241-242
  PATTERN option, 413, 730
  PERSONAL option, 55, 87, 114, 159, 163, 177, 240
  REWIND option, 445
  R-INOPT keyword, 263, 292, 296-297, 724, 727-728
  SAMPLE keyword, 55, 63, 77-78, 86, 114, 148, 163, 167, 178-180, 263, 288, 292, 298, 671, 697
  SCORES option, 430, 444-445, 776, 779, 781, 804
  TAKE keyword, 55, 63, 87, 114, 163, 167, 179, 263, 292, 298, 697
  TRIAL keyword, 446, 472
  TYPE keyword, 55, 63, 86, 114, 147, 148, 150-152, 154-155, 163, 180, 238, 240-241, 638, 666
  UNFORMAT option, 444
  WEIGHT keyword, 441, 444, 446-447, 470, 779
  WEIGHT option, 263, 292, 298, 333-336
Input data
  type of, 390
Input Data dialog box, 350-353, 358, 365, 371, 377, 388, 390
Input files
  answer key, 167
  BILOG-MG, 57, 149, 156, 173, 241
  calibration file in BILOG-MG, 147
  in PARSCALE, 288-290
  item parameter file in BILOG-MG, 150, 200, 204
  item parameters for scoring, 475
  item provisional values file in BILOG-MG, 156
  item standard difficulties, 476
  master file in BILOG-MG, 151
  not-presented key, 171
  omit key, 176
  PARSCALE, 289, 333
  raw data in BILOG-MG, 148
  specifying in BILOG-MG, 147, 163
Input Parameters dialog box, 352-355, 359, 362, 366-368, 372-374, 377, 388-392, 398-399
Instruments
  multiple test forms, 112
  single test form, 111
INTER keyword
  on PRIOR command, 452
  on SLOPE command, 452
INTERCEPT keyword
  on TEST command, 263, 325-326, 328
Intercepts, 539
  normal prior distribution, 452
  starting values for, 229, 326
INTERCPT keyword
  on TEST command, 47, 68, 87, 114, 224, 226, 229, 232
Internal consistency, 459, 582
  measure of, 459
Intervals
  assigning respondents to, 543, 604
  assigning scores to, 562
  for displaying response proportions, 29
  tolerance, 605, 649
Intra-class correlation coefficient, 582
INUMBERS keyword
  on FORM command, 50, 88, 114, 144, 146, 686
  on GROUP command, 50, 88, 114, 159, 161
  on ITEMS command, 87, 114, 159, 182-183, 638
  on TEST command, 47, 48, 87, 114, 224, 230, 233
IQUAD keyword
  on TECHNICAL command, 492
Irving, L.M., 765
ISEED keyword
  on INPUT command, 78, 87, 114, 163, 167
ISTAT keyword
  on SAVE command, 86, 114, 199, 203, 244
Item Analysis dialog box, 46, 94, 102, 108-111, 113, 117-118, 120-124, 127, 130, 137-138, 140, 145-146, 160-161, 185-186, 225, 231
  Advanced tab, 51, 88-89
  Form Items tab, 49, 88
  Group Items tab, 49, 88, 102
  Subtest Items tab, 48, 87
  Subtests tab, 46, 86, 89
Item Analysis option
  on Setup menu, 40, 46, 86-89, 94, 102, 108-111, 113, 117-118, 120-124, 127, 130, 137-138, 140, 145-146, 160-161, 185-186, 225, 231
Item characteristic curves, 505-509, 515, 523, 525-526, 594, 644, 653
  displaying, 506, 508
  displaying simultaneously, 510
  editing and saving, 506, 508, 512
Item difficulty, 482, 819-820
  as input, 476
  plot, 582
  plot against discriminating power, 450
Item dimensionality, 576
Item facility, 451, 462-465, 467, 471, 577, 579, 587, 624, 776, 786, 793, 829
Item factor analysis, 410, 432, 465, 530, 575-576, 584, 586, 589-601, 620, 629, 778-781, 802-803, 815, 818, 827-829, 840, 842
Item fit statistics, 29
Item information, 32, 40, 200, 224, 248, 253, 256, 314, 337, 342, 505, 507-508, 515, 524-526, 543, 599, 608-609, 613-615, 623, 625, 847
  suppressing correction for, 281
Item information curves, 505, 507-509, 512, 515, 524-525
  displaying, 507-508
  editing and saving, 507-508
Item information file
  saving, 314, 342
Item information function, 524-526, 599, 609, 613-615, 623, 625
Item Keys dialog box, 59, 163, 169, 172, 177, 242
  Answer Key tab, 59, 87
  Not Presented Key tab, 87
  Omit Key tab, 61, 87
Item Keys option
  on Data menu, 59, 87, 163, 169, 172, 177, 242
Item location parameter, 284, 535, 538, 547, 560, 613
  estimating as threshold, 284
Item option curves, 523
Item parameter file
  as input in BILOG-MG, 150, 200, 204
  naming of, 204, 289
  saving, 242, 253, 314-315, 336, 340
Item Parameter Prior Constraints dialog box, 69, 117, 134, 189-192
Item Parameter Prior Constraints option
  on Technical menu, 69, 117, 134, 189-192
Item Parameter Starting Values dialog box, 65, 109, 225-226, 228, 230, 232
  Enter Values tab, 66
  Import/Enter Values tab, 65
Item Parameter Starting Values option
  on Technical menu, 65, 109, 225-226, 228, 230, 232
Item parameters
  as input for simulation model, 483
  fixing to, 386
  form provided in, 485
  imposing priors, 393
  prior densities for, 612
  saving to file, 395, 461
  starting values, 396, 397
  untransformed, 808
Item plots
  by group, 78
Item statistics, 461, 465
  output in TESTFACT, 785
  saving, 464
Item-category threshold parameter, 546, 553, 700
ITEMFIT keyword
  on CALIB command, 264, 274, 279, 692
Items
  adjustment of difficulty parameters, 561, 713
  assigning for general factor, 419
  assigning starting values, 64
  assigning to item groups, 419
  assigning to subtests, forms, groups, 46
  assigning to tests, 224
  assigning variant, 48
  common response codes, 460
  computation using external variable, 166
  constraining in MULTILOG, 379
  contingency tables, 562
  controlling estimation of, 51
  difficulty index, 562, 642
  discriminating power, 531, 535, 538-539, 575, 578, 622-623, 632, 692, 840
  entering starting values interactively, 65-66
  estimated parameters, 25, 37, 205, 616, 693
  fixing at starting values, 68, 226
  fixing parameters, 51, 385
  free to be estimated, 68
  guessing parameter, 540
  importing labels from file, 44
  importing parameters for scoring, 52
  importing starting values, 65
  intercept parameter, 539
  joint frequencies per pair, 499
  naming and numbering in BILOG-MG, 108
  naming in MULTILOG, 387
  naming of, 449
  not-reached in factor analysis, 586
  number in block, 270
  number in test, 185, 294
  number of, 42, 270, 391
  number of times rated, 296
  number of variant, 186
  printing provisional parameter estimates, 131
  prior constraints, 51
  probability of chance success, 418
  requesting single common slope parameter, 267, 275
  response curves, 28, 30
  saving covariances to file, 200
  saving DIF parameters, 201
  saving DRIFT parameters, 202
  saving subtest parameter estimates, 471
  selecting or re-ordering, 480
  skipping estimation of parameters, 280
  slope parameter, 539
  specifying as free or fixed, 226
  specifying names, 183
  specifying number on form, 146
  specifying numbers, 183
  specifying order in data, 144
  specifying prior distributions, 29
  threshold, 539, 624
  total number of, 175, 455
ITEMS command, 87, 108, 110-112, 114, 145-146, 159-161, 173, 182, 224-225, 228-231, 233
  INAMES keyword, 46, 87, 114, 159, 182-183, 638, 686
  INUMBERS keyword, 87, 114, 159, 182-183, 638
ITEMS keyword
  on FIX command, 386
  on LABELS command, 387
  on PRIORS command, 394, 733
  on START command, 396
  on TEST command, 263, 325-326, 332, 355, 367, 398, 710
  on TMATRIX command, 403
Item-test correlation, 542, 578, 642, 672, 698, 713
Item-trait correlation, 542
ITER keyword
  on TECHNICAL command, 492, 496
ITERATION keyword
  on SCORE command, 264, 316, 318
Iterations
  number of, 51, 120, 126, 276, 280, 383, 419, 438, 493, 496, 702
  number of for MINRES, 432, 829
  number of prior to fixing of conditional, 493
  stopping at criterion, 318
ITERATIONS keyword
  on ESTIMATE command, 383
ITLIMIT keyword
  on TECHNICAL command, 493

J

Jenkins, C.D., 780, 802, 818
Job Description tab
  on General dialog box, 42, 49, 85-89, 93, 100
Johnson, E.G., 845
Johnson, N.L., 843
Jones, L.V., 835, 840
Jurgensen, C., 846

K

Kaiser, H.F., 583
Kelley, T.L., 837
Kendall, M., 598
KEY command, 448, 779, 781, 804
Key file
  answer, 26, 59, 92, 149, 167-171, 178-179, 236, 238, 241-242, 254, 355-356, 449, 480, 634, 652, 658, 666, 670, 783, 827, 829
  not-presented, 54, 60, 171-172, 176-177, 238, 241-242, 333, 658, 728
  omit, 61, 155, 168, 171-172, 176-177, 241-242, 634, 697
KFNAME keyword
  on INPUT command, 59, 87, 114, 149, 163, 167, 169, 172, 177, 236, 241, 670
Kiefer, J., 840, 843
Kiely, G.L., 760, 763
Klassen, D., 740
Kolakowski, D., 617
Kuder-Richardson
  formula 20, 459, 776
L

L1 option
  on TEST command, 355, 367, 379, 398-400, 730
L2 option
  on TEST command, 355, 367, 379, 398-400, 732
L3 option
  on TEST command, 355, 367, 379, 398-400, 733
Labels
  importing from external file, 44
  providing for items, 41, 44
LABELS command, 387
  ALL option, 387
  ITEMS keyword, 387
  NAMES keyword, 387, 735
Labels tab
  on General dialog box, 44, 87-88
Latent distributions
  assigning by test, 70, 75
  estimating, 32, 537
  preventing adjustment of, 65
  rescaling, 37
Latent roots, 779, 781, 789-791, 803, 815, 826, 828-829
  number of, 432, 792
Latent variable space
  number of points sampled, 495
Lawley, D.N., 592, 594, 835
Lazarsfeld, P.F., 594, 836, 841
Legend Parameters dialog box, 517, 520
Legends
  editing of, 520
Length
  of ID field, 352, 360, 367, 443
LENGTH command, 86, 109, 114, 153-154, 159, 185, 229, 231
  NITEMS keyword, 47, 48, 86, 114, 127, 185, 638, 652, 671
  NVARIANT keyword, 43, 47, 86, 114, 185-186, 671
LENGTH keyword
  on FORM command, 50, 88, 114, 144, 146
  on GROUP command, 50, 88, 114, 159, 161-162
  on INPUT command, 263, 270, 276, 292, 294, 327, 692, 702, 722, 728
Lieberman, M., 617, 730, 778, 841-842
Likelihood ratio test, 30, 530, 535-536, 587, 705, 763
Likert scale
  analysis, 257, 327, 345, 529, 545, 547, 556, 563, 765, 851
Likert, R., 327, 345, 529, 545
Lin, H.Y., 602
Linacre, J.M., 560
Line Parameters dialog box, 516-517, 519, 521-522
Linear equating, 627, 838, 845
Linking
  of forms, 534
Linn, R.L., 846
LIST keyword
  on BIFACTOR command, 420, 804, 815
  on SCORE command, 475, 801-804, 816-817
LIST option
  on INPUT command, 444, 777
  on TETRACHORIC command, 499, 779, 781, 803
Little, R.J.A., 845
LOADINGS keyword
  on SCORE command, 476
LOADINGS option
  on SIMULATE command, 414, 485, 820-821
Location
  adjustment for category parameters, 266, 270
  constant, 316, 549, 552
  parameter of 1PL model, 539
  parameter of 2PL model, 539
  specifying in rescaling, 213
LOCATION keyword
  on SCORE command, 53, 54, 90, 115, 208, 213, 221-222, 652, 666
Location parameter
  of rating-scale model, 701
Log likelihood, 543
Logistic model
  1PL, 539-540, 543, 567, 605, 843
  2PL, 539-540, 567, 606, 842
  3PL, 523, 538, 540, 569
  and relationship to normal ogive, 541
LOGISTIC option
  on CALIB command, 264, 274, 279, 692, 710
  on GLOBAL command, 43, 85, 114, 147, 150, 638
Logit, 31, 150, 393, 539, 541, 570, 602, 734, 834, 842
Longford, N.T., 842
LORD option
  on BIFACTOR command, 413-414
  on FULL command, 413-414, 437-438
Lord, F.M., 421, 438, 533, 534, 537, 541, 563, 569, 580, 592, 594, 610, 700, 835-838, 840-842, 845, 847
Louis, T.A., 830, 842
Lower asymptote
  3PL model, 393
Luce, R.D., 835

M

Madsen, M., 730, 843
MAIN option
  on SAVE command, 467
Mantel, N., 626, 835
Marginal
  reliability, 617, 746
Marginal maximum likelihood (MML), 28, 116, 121, 123, 345, 349-350, 364, 389, 401-403, 407, 529, 544, 562, 576, 584-587, 589-590, 599-602, 604-605, 607, 611, 644, 675, 699-700, 702, 705, 730, 742-743, 806, 842-843
Marginal probability, 32
  of pattern, 30, 32, 244, 600-603, 796, 812, 836
  saving to file, 206, 243, 255
Marshall, J.C., 578
Master file
  as input in BILOG-MG, 151
  naming of, 204, 290
  saving, 314
MASTER keyword
  on SAVE command, 78, 86, 114, 152, 199, 204, 263, 288, 312, 314, 337
Masters, G.N., 403, 404, 523, 529, 545, 550, 552, 735, 738, 761, 763, 767, 842, 846
Matrix plot, 829
Matrix Plot option
  Graphics procedure, 505, 510
Matrix sampling data, 116, 187, 238, 537, 610, 630, 666
Maximum
  effectiveness point, 250
Maximum information, 32, 249, 256, 524, 632
Maximum Likelihood (ML), 25, 28, 30-36, 75, 122, 128, 208, 213-214, 222, 277, 317-318, 320, 323, 345, 410, 452, 495, 529-530, 532, 537, 543-544, 564, 568, 576, 584-587, 591, 594-595, 597-600, 606-611, 615, 652, 655, 685, 693, 700, 702, 708, 734, 744, 792, 794, 806, 833-834, 836-838, 840-844, 847
  and Warm's weighted, 264, 316-317, 615
Maximum marginal a posteriori (MAP), 25, 29, 31, 35-37, 75, 211, 214, 349-350, 357, 389, 392, 395, 474, 476, 529, 544, 590, 607-610, 625, 655, 664, 685, 748-749, 781, 782, 802, 837-838
  controlling precision for factor scores, 478
MAXPOWER keyword
  on DRIFT command, 88, 114, 142, 192
MCEMSEED keyword
  on TECHNICAL command, 493, 803
Mean
  criterion score, 462, 464
  of normal distribution for intercepts, 452
MEAN keyword
  on SIMULATE command, 486, 819-820
Means
  estimates for groups, 37
  of population distribution, 379, 385, 393
  of population of factor scores, 486
  of prior distributions, 51, 123, 277, 394, 652
Mean-square
  of measurement errors, 589
Measurement
  standard error, 525-526, 589, 606, 608-609, 622
Measurement error
  graph, 527
  mean-square, 589
  root-mean-square, 589
Meng, X.L., 590
Menu
  Edit, 515
  File, 514-515
  Graphs, 516
  Options, 515
METHOD keyword
  on SCORE command, 90, 115, 208, 213-214, 216-217, 476, 652, 666, 685, 781, 801-804, 817
Method of estimation
  for scoring, 214, 317
  in TESTFACT, 410, 530
MFNAME keyword
  on FILES command, 263, 288, 290
  on GLOBAL command, 85, 114, 147, 151
MGROUP command, 263, 294-295, 300, 710, 711
  COMMON keyword, 263, 300
  DIF keyword, 263, 300, 301, 710-711
  GCODE keyword, 263, 300-302, 334-336
  GNAME keyword, 263, 300, 302
  REFERENCE keyword, 263, 300, 302, 710
MGROUP keyword
  on INPUT command, 263, 292, 294, 300-302, 304, 334-336, 339-342, 710
MIDDLE keyword
  on TGROUPS command, 401
MIDPOINT keyword
  on DRIFT command, 88, 114, 142-143
Minimized squared residuals (MINRES) method, 791, 793
MINRES
  factor analysis, 431-432, 829
Mislevy, R.J., 28, 34-35, 209, 528, 531, 561, 576, 585, 602, 607, 610, 679, 838-839, 843-845, 847
MISS option
  on BIFACTOR command, 413-414
  on FULL command, 413-414, 437-438
MISSING option
  on SCORE command, 477
Missing value
  code in MULTILOG, 356, 384
ML scores
  moving to adjacent category, 320
MLE option
  on SCORE command, 264, 316-317
MML, 277
  number of quadrature points, 129
Model
  defining in MULTILOG, 398
Model tab
  on General dialog box, 42, 85, 87, 93, 101
MODIFIED keyword
  on BLOCK command, 263, 265, 267, 269, 272, 696, 709, 720
Modifying
  graphs, 514
MOMENTS option
  on SCORE command, 26, 79, 91, 115, 208, 214-215
Monte Carlo integration, 410, 494, 530, 590-591, 803
  generating random multivariate normal variables, 493
MRATER command, 263, 294, 303, 728
  RATER keyword, 263, 303
  RCODE keyword, 263, 303-304
  RNAME keyword, 263, 303-304, 728
MRATER keyword
  on INPUT command, 263, 292, 294, 296, 300-304, 334, 336, 342, 724, 727-728
M-step
  convergence criterion in MULTILOG, 383
  number of iterations, 383
MU keyword
  on EQUAL command, 379
  on FIX command, 385-386
  on PRIORS command, 393
MULTILOG
  class of problem, 389
  contrasts, 570
  convergence for M-step, 383
  example of syntax generation, 357, 364, 370, 733
  Fixed-theta analysis, 349-350, 371, 401
  imposing priors, 393
  listing contrasts, 379, 385, 393
  missing value code, 356
  naming items, 387
  number of groups, 352, 391
  number of items, 352, 391
  number of patterns, 352
  order of commands, 375
  specifying data file, 389
  T-matrix, 403
  type of data, 388, 390
  user interface, 345
Multiple-category models, 399, 612
Multiple-choice model, 355, 367, 379, 398-400, 403, 751, 757
Multiple-group analysis, 34, 36, 42, 70, 104, 111-112, 124, 300, 531, 625-626, 652, 710, 712, 844
  DRIFT analysis, 165
  identifying groups, 111
  identifying sets of items, 111
  resolving indeterminacy, 134
  setting reference, 302
  suppressing rescaling in, 25, 128
Multiple-group model
  response data, 334
Multiple-response model, 510, 523, 567-568
Muraki, E., 16, 257, 528-529, 536, 544, 546, 560, 576, 585, 588, 591, 611, 679, 700, 761, 843-846

N

NALT keyword
  on INPUT command, 44, 86, 114, 134, 155, 163, 169
NAME keyword
  on COMBINE command, 264, 285
  on CRITERION command, 429
  on SCORE command, 264, 316, 318
Names
  assigning to categories, 267
  assigning to groups, 302
  assigning to raters, 304
NAMES command, 429-430, 449, 455-456, 482, 490, 804
NAMES keyword
  on CLASS command, 425
  on LABELS command, 387, 735
  on SUBTEST command, 490
Natural metric
  logistic response function, 150, 279, 306
Naylor, J.C., 589
NBLOCK keyword
  on SCORE command, 264
  on TEST command, 263-265, 325, 327, 339-340, 342, 692
NC keyword
  on PROBLEM command, 752
  on TEST command, 367, 375, 399, 738, 773
NCASES keyword
  on SIMULATE command, 486, 819-820
NCAT keyword
  on BLOCK command, 263, 265, 269-271, 339, 692, 710
NCHAR keyword
  on PROBLEM command, 352
NCHARS keyword
  on PROBLEM command, 353, 359-360, 367, 390, 409, 748, 773
NCYCLES keyword
  on ESTIMATE command, 383
NDEC keyword
  on BIFACTOR command, 420
  on FACTOR command, 431
  on TETRACHORIC command, 500, 779, 781, 803
Nested models, 543
New Analysis dialog box, 349-350, 357, 364, 370-371, 377, 388-390
NEWTON keyword
  on CALIB command, 51, 88, 115-116, 120-121, 126, 136, 226, 264, 274, 280, 692, 703, 711
Newton-Gauss, 31, 280, 833
Newton-Raphson estimation, 31, 615, 749, 833-834
NEXAMINEES keyword
  on PROBLEM command, 353, 390-391
Neyman, J.A., 840
NFAC keyword
  on FACTOR command, 431-433, 446, 466, 468, 469, 473, 484, 486, 779, 781, 797, 803, 815, 818, 824-825
  on SCORE command, 477, 802, 816
  on SIMULATE command, 486, 819, 820
NFMT keyword
  on INPUT command, 57, 59, 64, 86, 114, 163, 170, 263, 292, 295, 445
NFNAME keyword
  on FILES command, 263, 288, 290, 337
  on INPUT command, 60-61, 87, 114, 163, 169, 171, 177, 241, 242
NFORM keyword
  on INPUT command, 42, 87, 114, 144-146, 159, 163, 168-169, 171-173, 176-177, 235, 240-241
  on SCORE command, 91, 115, 208, 215, 219-220
NFORMS keyword
  on INPUT command, 652
NFULL keyword
  on CALIB command, 78, 89, 115-116, 127
NGROUP keyword
  on INPUT command, 42, 87, 114, 117, 121-123, 128-130, 135, 142-143, 159, 163, 165, 173-174, 187, 189, 192, 194-198, 218-219, 241, 638
  on PROBLEM command, 360, 367, 373, 391, 409
NGROUPS keyword
  on PROBLEM command, 352, 353
NIDCHAR keyword
  on INPUT command, 55, 63, 87, 114, 163, 174, 241, 263, 292, 295, 333-336, 343-344, 443, 446, 469, 692, 721, 775, 779, 804
NIGROUPS keyword
  on BIFACTOR command, 419, 421, 804, 815
NIT keyword
  on FACTOR command, 432
NITEMS keyword
  on BLOCK command, 263, 265, 270, 339, 342, 692, 710
  on LENGTH command, 47-48, 86, 114, 127, 185, 638, 652, 671
  on PROBLEM command, 352-353, 360, 367, 373, 391-400, 409, 432, 436, 446, 455, 457, 464, 469, 470, 473, 484, 776, 778, 804, 825
NITER keyword
  on TECHNICAL command, 494
NO option
  on TEST command, 355, 367, 379, 398-400, 403
NOACCEL option
  on CALIB command, 264, 274-275
NOADAPT option, 815
  on TECHNICAL command, 432, 439-440, 466, 471, 494, 779, 825
NOADJUST option
  on CALIB command, 25, 37, 79, 89, 115-116, 127-128, 226-227
  on SCORE command, 264, 316, 318
NOCADJUST option
  on BLOCK command, 263, 265, 270
NOCALIB option
  on CALIB command, 264, 274, 280, 288, 324, 708
NOCRITERION option
  on PLOT command, 451
NOFLOAT option
  on CALIB command, 89, 115-116, 122-123, 187-188, 675
NOGPRIOR option
  on CALIB command, 89, 115-116, 123, 138, 140
NOLIST option
  on BIFACTOR command, 421
Nominal model, 345, 355, 367, 379, 398-400, 403, 506, 508, 511, 523-524, 529, 545, 558, 568, 570, 625, 738, 751, 759-763, 765, 767, 841
  scoring function, 558
NOMINAL option
  on TEST command, 738
Non-adaptive quadrature, 410, 493-495, 530, 779
  for score estimation, 410, 530
  in full information solution, 494
Non-equivalent groups
  equating, 24, 122, 533, 627-628, 632
NOPOP option
  on PROBLEM command, 392
NOPRINT option
  on SCORE command, 53, 91, 115, 208, 216, 634
Normal
  generating ability distribution, 487
  prior on intercepts, 452
Normal metric, 28, 386, 541-542, 557, 570, 608, 692, 734, 767
  scaling factor, 28, 549, 613
  using, 279
Normal ogive model, 150, 151, 279, 538, 541, 546-547, 563, 582, 621, 832, 834, 837, 840-842
  and relationship to logistic, 541
NORMAL option
  on CALIB command, 89, 115-116, 128-129, 264, 274, 279, 659
Normal prior
  specifying, 394
Normal tab
  on Assign Scoring Prior Latent Distribution dialog box, 75, 90
NOSCORE option
  on SCORE command, 264, 316, 319
NOSORT option
  on TECHNICAL command, 495
NOSPRIOR option
  on CALIB command, 89, 115-116, 124, 137, 140
Not Presented Key tab
  on Item Keys dialog box, 87
NOTPRES option
  on PROBLEM command, 456, 460
Not-presented
  codes for, 43
Not-presented key
  format of file, 337
  using, 54, 59-60, 171, 172, 176-177, 238, 241-242, 290, 333, 658, 728
NOT-PRESENTED option
  on PROBLEM command, 781
NOTPRIOR option
  on CALIB command, 89, 115-116, 124, 138-139
Not-reached items
  in factor analysis, 586
Novick, M.R., 533, 541, 563, 580, 700, 837-838, 841
NPARM keyword
  on GLOBAL command, 43, 85, 114, 119, 133, 147, 152, 170, 177, 189-190, 228, 638
NPATTERNS keyword
  on PROBLEM command, 352-353, 360, 367, 390-391
NQPT keyword
  on CALIB command, 51, 88, 115-117, 121-122, 129, 194-195, 264, 274, 279-280, 308-309, 342, 639, 666, 692, 702, 709-710
  on SCORE command, 77, 90, 115, 197-198, 208, 216, 264, 310-311, 316, 319, 706
NRATER keyword
  on INPUT command, 263, 292, 296, 727-728
NRATER option
  on CALIB command, 264, 274, 281
  on SCORE command, 264, 316, 319
NROOT keyword
  on FACTOR command, 432, 779, 781, 792, 803, 815, 818, 825
NSAMPLE keyword
  on TECHNICAL command, 495
NSD keyword
  on CALIB command, 89, 115-116, 130
NTEST keyword
  on GLOBAL command, 42, 85, 114, 137, 147, 152-154, 159, 175, 178, 185-186, 189, 194-195, 197-198, 217-219, 224-225, 229-231, 233
  on INPUT command, 263-264, 292, 296, 325, 329, 338, 692, 722
NTOTAL keyword
  on INPUT command, 42, 86, 114, 146, 161-163, 175, 183-184, 229-230, 263, 292, 297, 333-334, 638, 710, 728
Number
  of boundaries, 458
  of cases generated, 486
  of categories for graded model, 375
  of classes, 454
  of COMBINE commands, 292
  of cycles of MML estimation, 383
  of cycles prior to fixing, 493
  of decimals for residuals, 431
  of decimals for tetrachorics, 500
  of EM cycles, 493, 496
  of examinees, 391
  of examinees in MULTILOG, 352
  of external variates, 455
  of factors, 431, 477, 486
  of factors to be extracted, 431
  of format records, 170
  of forms, 42
  of fractiles, 435, 455, 582
  of generated response records, 486
  of groups, 42, 62, 174, 294, 352, 360, 367, 373, 391
  of groups in MULTILOG, 352, 353
  of item-group factors, 421
  of items, 42, 183, 270, 352, 391
  of items in form, 146, 162
  of items in MULTILOG, 352, 353
  of items in test, 185, 294
  of iterations, 51, 120, 276, 383, 438, 493, 496, 702
  of iterations for MINRES, 432, 829
  of iterations in the M-step, 383, 493
  of iterative communality improvements, 494
  of latent roots, 432, 792
  of parameter values, 487
  of parameters in BILOG-MG model, 152
  of patterns, 352, 360, 366, 367, 391
  of points sampled, 495
  of quadrature points, 76, 117, 129, 208, 216-217, 221, 280, 319, 401, 422, 439, 478, 497-498, 600, 611, 666, 669, 693, 694, 702, 709, 710, 758, 794, 825, 826
  of quadrature points for EAP estimation, 498
  of records in data file, 54-55, 58, 62, 391
  of response alternatives, 43, 169
  of response categories, 399
  of response categories in MULTILOG, 355
  of response codes, 405, 456
  of response patterns, 390-391
  of response patterns in MULTILOG, 352
  of selected items, 457
  of subtests, 42, 152-153, 296
  of test forms, 173
  of tests, 24, 325, 352, 354, 360, 367, 373, 528
  of the highest category, 355, 399
  of times item is rated, 296
  of unique items, 175
  of variable format records, 445
NUMBER keyword
  on TGROUPS command, 401-402
NVARIANT keyword
  on LENGTH command, 43, 47, 86, 114, 185-186, 671
NVTEST keyword
  on GLOBAL command, 43, 85, 114, 147, 153, 159, 175, 185-186, 224-225, 229-231, 233
NWEIGHT keyword
  on INPUT command, 638
NWGHT keyword
  on GLOBAL command, 85, 114, 147, 154, 240-241

O

O'Connor, W.A., 740
Oblique rotation, 588
Observed
  frequencies of patterns, 438
OFNAME keyword
  on FILES command, 263, 288, 290, 337
  on INPUT command, 61-62, 87, 114, 155, 163, 169, 172, 176-177, 241-242
Olsson, U., 564
Omit key
  format of file, 336
  using, 59, 61, 155, 168, 171-172, 176-177, 241-242, 290, 634, 697
Omit Key tab
  on Item Keys dialog box, 61, 87
OMIT keyword
  on BIFACTOR command, 421, 423, 477
  on FULL command, 438, 440, 825
Omits
  codes for, 43
  scoring fractionally correct, 43, 155, 169
  specifying treatment of, 500
  treatment of in full information factor analysis, 421, 438
OMITS option
  on GLOBAL command, 44, 86, 114, 147, 155
Options menu, 82-83, 515
  Settings dialog box, 83, 85
Order of commands
  BILOG-MG, 114
  MULTILOG, 375
  PARSCALE, 262
  TESTFACT, 413-414
ORIGINAL keyword
  on BLOCK command, 263, 265, 270, 333, 335-336, 696, 710, 720
Orthogonal rotation, 482, 576, 588, 590, 629
Output
  diagnostic, 164, 276
  number of leading cases with factor scores, 475
  printing tetrachoric correlation matrix to file, 499
Output files, 37, 149, 199-207, 242, 337
  BILOG-MG, 149, 199-207, 241-242
  format of, 312-314, 337
  PARSCALE, 312-313, 337
  requesting, 199
  saving case weights, 206
  saving item parameter file, 204
  saving marginal probabilities, 206
  saving master file, 204
  saving score file, 206
  saving to, 461
  viewing of, 82
Output menu, 82, 98
Owen, R.J., 847

P

Pairwise constraints, 380
PAIRWISE option
  on TETRACHORIC command, 500, 501
PARAM keyword
  on SCORE command, 477
  on START command, 409
Parameters option
  on Graphs menu, 516
PARAMS keyword
  on PRIORS command, 394
PARM keyword
  on SAVE command, 86, 114, 147-148, 150, 158, 180-181, 199, 204, 226-227, 253, 263, 274, 312, 314, 336-337, 342, 638, 693, 724, 779
  on SIMULATE command, 487
  on START command, 396-397
PARM option
  on SAVE command, 468
PARSCALE
  DIF analysis, 301
  format statement, 263, 331, 334, 336
  input files, 288-289, 333
  order of commands, 262
  output files, 312-313, 337
  overview of syntax, 261
  requesting guessing parameter, 268
  ridge constant, 282
  setting workspace, 259
  user interface, 258
Partial credit model, 279, 340, 529, 550, 552-559, 565, 612, 614, 706, 709-710, 714, 720, 722, 724, 735, 738, 763, 767, 842
  category coefficient, 557
  generalized, 550, 557, 558, 564, 762
  intersection of trace lines, 552-553, 555, 558
  Likert version, 556
  operating characteristic, 551
  partially unordered, 559
  scoring function for generalized, 557, 558
  specifying scoring function for, 272
PARTIAL keyword
  on CALIB command, 710
PARTIAL option
  on CALIB command, 264, 274, 279
PATTERN option
  on INPUT command, 413, 730
  on PROBLEM command, 352, 390-391, 742
Patterns
  number of, 352, 360, 366, 367, 391
  printing frequencies of, 492
PBISERIAL option
  on PLOT command, 450, 776
PDISTRIB keyword
  on SAVE command, 25, 86, 114, 199, 205
Pearson product-moment correlation, 629
PERCENTIL option
  on FRACTILES command, 435, 436
PERSONAL option
  on INPUT command, 55, 87, 114, 159, 163, 177, 240
Personalizing
  by subtest, 177
Pfiffenberger, W., 536, 844
PFQ keyword
  on SCORE command, 264, 316, 320, 693
PLOT command, 450
  BISERIAL option, 450
  CRITERION option, 451, 776
  FACILITY option, 451, 776
  NOCRITERION option, 451
  PBISERIAL option, 450, 776
PLOT keyword
  on CALIB command, 88, 115-116, 124-125, 130-131, 634
Plot Parameters dialog box, 521-522
Plots
  editing legends, 520
  facilities, 582
  item and test information, 208
  item by group, 78
  item difficulties, 582
  of proportion correct responses, 124
  specifying significance level, 130
PMN keyword
  on SCORE command, 75, 90, 115, 208, 214, 217, 219
Point biserial coefficient
  as discrimination index, 450
Point biserial correlation, 462-467, 471, 786
Point polyserial correlation, 563, 642, 698-699
Points
  for quadrature, 70, 128, 194-195, 250-251, 253, 282, 308, 310, 321, 342, 402, 492, 561, 589, 600, 639, 644, 668, 692, 695, 795, 800, 806
POINTS keyword
  on QUAD command, 51, 75, 77, 90, 115, 193-194
  on QUADP command, 264, 308
  on QUADS command, 91, 115, 196-197, 264, 310
POLYNOMIAL option
  on TMATRIX command, 404, 738, 763
Polyserial correlation coefficient, 563-564, 642-643, 699
POP option
  on SCORE command, 91, 115, 208, 213, 218, 223, 634, 653, 675
Population distribution
  excluding from scoring, 392
Population percentiles, 625
Position
  of case ID, 58
POST keyword
  on SAVE command, 86, 114, 199, 206, 255
Posterior
  distribution, 28, 31, 34, 35, 37, 212, 278, 281, 560-561, 589-590, 601, 607, 656, 704, 708, 711, 713-714, 738, 803, 837
  distribution after M-step, 281
  distributions for groups, 37
  information, 36, 608, 609-610, 655, 802
  probability, 255-256, 589, 603, 655
  saving points and weights of distribution, 25, 205
  saving residuals to file, 202
  standard deviation, 35, 36, 75, 90, 115, 208, 214, 217-219, 466, 561, 588, 607-608, 610, 655-656, 693, 707, 731, 814, 828
  standardized residuals, 202, 250, 252, 603-604
POSTERIOR option
  on CALIB command, 264, 274, 281, 711
PRECISION keyword
  on TECHNICAL command, 495, 781
Print current page option
  on File menu, 514
PRINT keyword
  on CALIB command, 88, 115-116, 131, 157
PRINT option
  on SCORE command, 264, 316, 320
Print selected graph option
  on File menu, 514
Printer Setup option
  on File menu, 514
Printing, 514
  bifactor loadings, 420
  of graphs, 514
  pattern frequencies, 492
  provisional item parameter estimates, 131, 496
  subject scores to output file, 320
Printing Options option
  on File menu, 514
PRIOR command, 675
PRIORREAD option
  on CALIB command, 264, 274, 281, 305
Priors
  beta on uniquenesses, 452
  by subtest, 69
  constraints on items, 51
  densities for item parameters, 612
  distribution information for calibration, 64
  distribution information for scoring, 64
  distributions assumed for groups, 37
  estimating means, 277
  estimating means of distribution, 51, 128, 652
  estimating means of distributions, 123
  for guessing parameter, 123
  for item parameters, 393
  for slope parameter, 123, 137
  for threshold parameter, 138
  keeping distributions fixed, 122
  mean and s.d. for normal, 394
  normal for intercepts, 452
  normal for threshold parameter, 284
  parameters for item slopes, 190, 306
  parameters for item thresholds, 191, 307
  providing arbitrary discrete, 75
  requesting reading of, 281
  requesting use of beta for guessing, 278
  slopes in natural metric, 306
  specifying constraints, 64
  specifying distribution for items, 29
  specifying distributions for items, 187, 305
  specifying means of normal, 217
  specifying range of, 130
  specifying standard deviation of normal, 218
  specifying type, 125, 211, 277
  standard deviations for item slopes, 191, 307
  standard deviations for item thresholds, 192
  type of distribution for scoring, 75, 316
  user-supplied, 125, 188, 193, 675
PRIORS command, 69, 90, 115, 117, 134, 187, 193, 264, 281, 305, 393, 452
  AJ keyword, 393
  AK keyword, 393
  ALL option, 394
  ALPHA keyword, 47, 90, 115, 187, 189-190
  BETA keyword, 47, 90, 115, 187, 189-190
  BJ keyword, 393
  BK keyword, 393
  CJ keyword, 393
  CK keyword, 393
  DK keyword, 393, 733
  GMU keyword, 264, 305
  GROUPS option, 394
  GSIGMA keyword, 264, 305-306
  INTER keyword, 452
  ITEMS keyword, 394, 733
  MU keyword, 393
  PARAMS keyword, 394
  SD keyword, 393
  SLOPE keyword, 452
  SMU keyword, 26, 47, 70, 90, 115, 187, 190, 264, 305-306
  SOPTION option, 264, 305, 306
  SSIGMA keyword, 26, 47, 70, 90, 115, 187, 191, 264, 305, 307
  TMU keyword, 47, 70, 90, 115, 187, 191, 264, 305, 307
  TSIGMA keyword, 47, 70, 90, 115, 187, 192, 264, 305, 307
PRNAME keyword
  on GLOBAL command, 25, 53, 86, 114, 147, 156-157, 226-227, 241
Probability
  marginal of pattern, 30, 32, 244, 600, 603, 796, 812, 836
  observed response, 634
  of chance success, 418
  posterior, 255-256, 589, 603, 655
PROBLEM command, 388, 435, 454, 482
  CLASS keyword, 446, 454, 776
  CRITERION option, 358, 388, 773
  DATA keyword, 352, 359, 389
  EXTERNAL keyword, 430, 446, 455
  FIXED option, 349, 350, 370, 389, 392
  FRACTILES keyword, 455, 776
  INDIVIDUAL option, 352, 359, 390-391, 748, 773
  NC keyword, 752
  NCHARS keyword, 359-360, 367, 390, 409, 748, 773
  NEXAMINEES keyword, 390-391
  NGROUP keyword, 360, 367, 373, 391, 409
  NITEMS keyword, 360, 367, 373, 391, 400, 409, 432, 436, 446, 455, 457, 464, 469-470, 473, 484, 776, 778, 804, 825
  NOPOP option, 392
  NOTPRES option, 456, 460
  NOT-PRESENTED option, 781
  NPATTERNS keyword, 360, 367, 390-391
  PATTERN option, 352, 390-391, 742
  RANDOM option, 349-350, 389, 392, 730, 742
  RESPONSE keyword, 456, 776, 778, 781, 804, 825
  SCORE option, 748, 773
  SCORES option, 349-350, 357, 389, 392
  SELECT keyword, 457, 480-481, 776, 781, 818
  SKIP keyword, 457, 780, 802, 818, 824
  SUBTEST keyword, 458, 776
  TABLE option, 352, 390-391
Program
  evaluation, 618-619
  information, 85
Project Settings dialog box, 362, 369
PROMAX option
  on FACTOR command, 413
PROMAX rotation, 413, 431, 433, 441-442, 468, 576, 583, 588, 629, 779, 803, 818, 823, 828-829
Provisional estimates
  controlling printing of, 496
Provisional values file
  as input in BILOG-MG, 156
PRV keyword
  on TECHNICAL command, 496
PSD keyword
  on SCORE command, 75, 90, 115, 208, 214, 217, 218-219

Q

QP keyword
  on TGROUPS command, 402
QPREAD option
  on CALIB command, 264, 274, 282, 308
  on SCORE command, 264, 310, 316, 321
QRANGE keyword
  on CALIB command, 264, 274, 282
  on SCORE command, 264, 316, 321
QSCALE keyword
  on TECHNICAL command, 496
QUAD command, 75, 90, 115, 117, 122, 126, 187-189, 193, 675
  POINTS keyword, 51, 75, 77, 90, 115, 193-194
  WEIGHTS keyword, 51, 75, 77, 90, 115, 193, 195
QUAD keyword
  on BIFACTOR command, 422
  on FULL command, 439, 825
  on SCORE command, 478
  on TECHNICAL command, 496-497
QUADP command, 264, 282, 308
  POINTS keyword, 264, 308
  WEIGHTS keyword, 264, 308-309
Quadrature
  adaptive, 410, 491, 493, 495, 530, 589, 781, 815, 817, 825-826, 842
  fractional, 494, 589-590, 779, 803
  Gauss-Hermite, 492, 589, 693, 841
  non-adaptive, 410, 493-495, 530, 779
  points, 70, 128, 194, 195, 250-251, 253, 282, 308, 310, 321, 342, 402, 492, 561, 589, 600, 639, 644, 668, 692, 695, 795, 806
  points and weights, 70, 308, 310, 800
  type of, 492
  weights, 251, 590
Quadrature points, 76, 117, 129, 208, 216-217, 221, 280, 282, 319, 321, 401, 422, 439, 478, 497-498, 600, 611, 666, 669, 693-694, 702, 709, 710, 758, 794, 825, 826
  user-supplied, 193-194, 196-197
Quadrature weights
  user-supplied, 193, 195-197
QUADS command, 75, 91, 115, 196, 211-212, 264, 310, 321
  POINTS keyword, 91, 115, 196-197, 264, 310
  WEIGHTS keyword, 91, 115, 196-197, 264, 310-311
Qualification testing, 618-619
QWEIGHT keyword
  on TECHNICAL command, 497

R

Random number seed, 77, 167, 483, 485, 487, 820
RANDOM option
  on PROBLEM command, 349-350, 389, 392, 730, 742
Range
  of prior distribution, 130
  of quadrature points, 321
Rasch model, 26, 65, 78, 539-540, 543, 568, 605, 843
  and DIF/DRIFT, 133
  specifying in BILOG-MG, 133
RASCH option
  on CALIB command, 26, 79, 89, 115-116, 133
Rasch, G., 26, 65, 78, 133, 523, 538-540, 550, 568, 594, 616, 738, 840
RATER keyword
  on BLOCK command, 263, 265, 271, 281, 320
  on MRATER command, 263, 303
Raters
  identification code, 304
  naming of, 304
Rater's Effect model
  adjusting for differences in severity, 529
  data, 333, 335
  weights, 561
Rating-scale model, 257, 524, 528-529, 547, 549-550, 565-566, 692
RCODE keyword
  on MRATER command, 263, 303-304
READF option
  on SCORE command, 91, 115, 208, 215, 216, 219-220
READPRIOR option
  on CALIB command, 70, 89, 115-117, 133, 187-192, 666, 675
RECODE option
  on BIFACTOR command, 413-414
  on FULL command, 413-414, 437-438
  on TETRACHORIC command, 500-501, 825
Reference group
  multiple-group analysis, 302
REFERENCE keyword
  on CALIB command, 42, 89, 115-116, 128, 134-135, 220, 638, 659, 675
  on MGROUP command, 263, 300, 302, 710
  on SCORE command, 91, 115, 208, 215-216, 219-220
Referencing
  criterion, 632
  domain, 632, 846
Relative difficulty, 557
Reliability, 25, 35, 36, 459, 575, 607-608, 623, 625, 637, 684-685, 840
  classical, 32-33
  empirical, 25, 34-36, 589, 656, 814
  index, 542
  marginal, 617, 746
  theoretical, 25, 33-34, 36
Reliability coefficient, 675
RELIABILITY command, 459
  ALPHA option, 776
REPEAT keyword
  on BLOCK command, 263, 265-266, 271, 710, 721-722
  on SCORE command, 264
  on TEST command, 264
RESCALE option
  on SCORE command, 264, 316, 321
Rescaling, 323, 652
  constants, 321
  specifying, 208, 634
  specifying location constant, 213
  specifying scale constant, 221
  suppressing of in multiple-group analysis, 25, 128
  type of, 221
Rescaling tab
  on Test Scoring dialog box, 53, 90
Residual matrix
  computing, 433
  number of decimals, 420
  printing for bifactor solution, 421
RESIDUAL option
  on BIFACTOR command, 422
  on FACTOR command, 433, 779
Residuals
  number of decimal places, 431
  standardized posterior, 202, 250, 252, 603-604
Response
  all correct or incorrect, 31
  codes common to all items, 460
  graded, 345, 529, 596
  graded category function, 547
  individual respondents, 333-334
  marginal probabilities of patterns, 32
  metric of function, 42, 279, 692
  number of alternatives, 169
  number of codes, 405, 456
  number of patterns, 390
  plots of proportion correct, 124
  specifying number of alternatives, 43
Response Codes (Binary Data) dialog box, 355-356, 368, 384, 405-406
Response Codes (Non-Binary Data) dialog box, 355, 361
RESPONSE command, 456-457, 460, 781, 804
RESPONSE keyword
  on PROBLEM command, 456, 776, 778, 781, 804, 825
Response pattern
  goodness-of-fit statistic, 210
  listing observed and expected frequencies, 438
  suppressing sorting of, 495
Response records
  number to be generated, 486
Response tab
  on General dialog box, 43, 59, 86
REWIND option
  on INPUT command, 445
Rewinding
  of data file, 445
Richardson, M.W., 459, 542, 583, 623, 776
Ridge constant
  specifying in BILOG-MG, 135
  specifying in PARSCALE, 282
RIDGE keyword
  on CALIB command, 89, 115-117, 135, 264, 274, 283, 666
R-INOPT keyword
  on INPUT command, 263, 292, 296-297, 724, 727-728
RNAME keyword
  on MRATER command, 263, 303-304, 728
Roche, A.F., 616, 772-773
Root-mean-square
  of measurement errors, 589
  of posterior deviates, 604
Rosenman, R.H., 780, 802-803
ROTATE keyword
  on FACTOR command, 414, 433, 468, 472, 779, 781, 793, 803, 818, 824
ROTATE option
  on SAVE command, 468, 779, 818
Rotated factor loadings
  saving to file, 468
Rotation
  oblique, 576, 588
  of factors, 433
  orthogonal, 482, 576, 588, 590, 629
  PROMAX, 413, 431, 433, 441-442, 468, 576, 583, 588, 629, 779, 803, 818, 823, 828-829
  VARIMAX, 413, 431, 433, 441-442, 468, 576, 583, 588, 629, 781, 799, 802
RSCTYPE keyword
  on SCORE command, 53-54, 90, 115, 208, 213, 221-222, 652, 659, 675
Rubin, D.B., 585, 844
Run menu, 81
  Build Syntax option, 82, 97, 105
  Calibration Only option, 82
  Classical Statistics Only option, 82
  Initialize option, 82
  Scoring Only option, 82
  Stats, Calibration and Scoring option, 82, 98, 106

S

Samejima, F., 271, 345, 399, 524, 529, 545-547, 567-569, 596, 613-614, 734-735, 738, 740-741, 750-751, 765, 772, 835, 841, 846
Sample
  calibration, 609, 616
  matrix, 238, 537, 630, 666
  of records, 27, 167, 178, 298
SAMPLE keyword
  on INPUT command, 55, 63, 77-78, 86, 114, 148, 163, 167, 178-180, 263, 288, 292, 298, 671, 697
SAMPLE option
  on SCORE command, 264, 316, 322
Save as Metafile option
  on File menu, 514
SAVE command, 80, 86, 114, 158, 199, 242-244, 247-248, 250, 253, 255-256, 263, 289, 291, 312, 337, 395, 409, 420-421, 442, 461, 482, 499-500, 652, 722
  CALIB keyword, 86, 114, 148, 199, 204, 263, 288, 312, 337
  CCRIT option, 461
  CHANCE option, 816-817
  CMAIN option, 462
  COMBINE keyword, 263, 285, 312-313, 337
  CORRELAT option, 463
  COVARIANCE keyword, 86, 114, 199-201, 248
  CRITERION option, 464
  CSUB option, 465
  DIF keyword, 86, 114, 164-165, 199, 201, 245, 638
  DRIFT keyword, 86, 114, 165, 199, 202, 247
  EXPECTED keyword, 86, 114, 199, 202, 250
  EXPECTED option, 465
  FIT keyword, 263, 312, 313, 337, 339, 344
  FORMAT option, 395
  FSCORES option, 466, 476, 781, 802, 816, 817
  INFORMATION keyword, 263, 312, 314, 337, 342
  ISTAT keyword, 86, 114, 199, 203, 244
  MAIN option, 467
  MASTER keyword, 78, 86, 114, 152, 199, 204, 263, 288, 312, 314, 337
  PARM keyword, 86, 114, 147-148, 150, 158, 180-181, 199, 204, 226-227, 253, 263, 274, 312, 314, 336-337, 342, 638, 693, 724, 779
  PARM option, 468
  PDISTRIB keyword, 25, 86, 114, 199, 205
  POST keyword, 86, 114, 199, 206, 255
  ROTATE option, 468, 779, 818
  SCORE keyword, 52, 86, 114, 158, 199, 206, 207, 216, 243, 263, 312, 315-316, 337, 693, 707, 724
  SCORES option, 469
  SMOOTH option, 470, 779
  SORTED option, 470
  SUBTESTS option, 471
  TRIAL option, 447, 468, 472, 475, 825
  TSTAT keyword, 86, 114, 199, 207, 256
  UNROTATE option, 472, 815
SAVE keyword
  on FILES command, 722, 724
Save menu, 80, 147, 158, 199-207, 243-244, 247-248, 250, 253, 255-256
Save option
  on File menu, 97
  on main menu, 86
SAVE option
  on FILES command, 263, 288, 291, 312
  on GLOBAL command, 80, 86, 114, 147, 157, 164-165, 199-207, 638, 652
Save Output to File dialog box, 78, 80
Save Output to File option
  on Technical menu, 80
Saving
  ability score file, 242-243
  bivariate plots, 513
  calibration file, 199, 312
  case score information, 469
  case weights, 206
  class item statistics, 461
  classical item statistics to file, 203, 242, 244, 467
  combined score file, 313, 337
  command factor approximation, 470
  covariance file, 200, 242, 248
  DIF parameter file, 201, 242, 245
  DRIFT parameter file, 202, 242, 247
  estimates for classes, 462
  expected frequencies to file, 202, 242, 250
  factor scores, 466
  factor scores and posterior s.d., 466
  fit statistics file, 313, 338
  histogram of ability scores, 512
  in TESTFACT, 461
  item characteristic curve, 506, 508, 512
  item information curve, 507-508
  item information file, 314, 342
  item parameter file, 242, 253, 314-315, 336, 340, 395, 461, 468
  item statistics for classes, 465
  item statistics from criterion score, 464
  marginal probabilities, 206, 243, 255
  master data file, 314
  output files, 80
  points and weights, 25, 205
  results of E-step of fifa, 465
  results of final E-step, 465
  rotated factor loadings, 468
  scores to external file, 52, 395, 461
  separate estimates for each class, 462
  sorted file, 470
  standardized posterior residuals, 202
  subject scores file, 315, 337
  test information curve, 509
  test information statistics to file, 207
  tetrachoric correlation matrix, 463
  trial values, 472
  unrotated factor loadings to file, 472
Scale
  specifying in rescaling, 221
Scale constant, 283
SCALE keyword
  on CALIB command, 264, 274, 283, 692
  on SCORE command, 53-54, 90, 115, 208, 213, 221, 652, 666
Scaling factor, 28, 549, 613
Schilling, S., 529, 584, 589-590, 842
Schultz, M.E., 845
SCORE command, 75, 90, 115, 206, 208, 264, 285, 289, 292, 316, 410, 474
  BIWEIGHT option, 53, 91, 115, 208-209
  CHANCE option, 474, 804
  DIST keyword, 264, 282, 316, 695, 706
  DOMAIN keyword, 26, 79, 91, 115, 208-210
  EAP option, 264, 316-317, 693, 707, 721
  FILE keyword, 26, 79, 91, 115, 208-210, 468, 475, 477-478, 690, 802, 815
  FIT option, 53, 91, 115, 208, 210, 264, 316-317, 344, 666
  IDIST keyword, 75, 78, 90, 115, 196-198, 208, 211, 213, 666, 675
  INFO keyword, 36, 90, 115, 208, 212, 218, 222-223, 226-227, 256, 634, 675
  ITERATION keyword, 264, 316, 318
  LIST keyword, 475, 801-804, 816, 817
  LOADINGS keyword, 476
  LOCATION keyword, 53-54, 90, 115, 208, 213, 221-222, 652, 666
  METHOD keyword, 90, 115, 208, 213-214, 216-217, 476, 652, 666, 685, 781, 801-804, 817
  MISSING option, 477
  MLE option, 264, 316-317
  MOMENTS option, 26, 79, 91, 115, 208, 214-215
  NAME keyword, 264, 316, 318
  NBLOCK keyword, 264
  NFAC keyword, 477, 802, 816
  NFORM keyword, 91, 115, 208, 215, 219, 220
  NOADJUST option, 264, 316, 318
  NOPRINT option, 53, 91, 115, 208, 216, 634
  NOSCORE option, 264, 316, 319
  NQPT keyword, 77, 90, 115, 197, 198, 208, 216, 264, 310-311, 316, 319, 706
  NRATER option, 264, 316, 319
  PARAM keyword, 477
  PFQ keyword, 264, 316, 320, 693
  PMN keyword, 75, 90, 115, 208, 214, 217, 219
  POP option, 91, 115, 208, 213, 218, 223, 634, 653, 675
  PRINT option, 264, 316, 320
  PSD keyword, 75, 90, 115, 208, 214, 217-219
  QPREAD option, 264, 310, 316, 321
  QRANGE keyword, 264, 316, 321
  QUAD keyword, 478
  READF option, 91, 115, 208, 215-216, 219-220
  REFERENCE keyword, 91, 115, 208, 215-216, 219-220
  REPEAT keyword, 264
  RESCALE option, 264, 316, 321
  RSCTYPE keyword, 53-54, 90, 115, 208, 213, 221-222, 652, 659, 675
  SAMPLE option, 264, 316, 322
  SCALE keyword, 53, 54, 90, 115, 208, 213, 221, 652, 666
  SCORING keyword, 264, 316, 322, 706
  SMEAN keyword, 264, 316, 323, 693, 696, 707
  SPRECISION keyword, 478, 802
  SSD keyword, 264, 316, 323, 693, 696, 707
  TIME option, 479
  WML option, 264, 316-317
  YCOMMON option, 91, 115, 208, 213, 222, 653, 675
Score estimates
  summary statistics, 814
Score file
  naming of, 206
SCORE keyword
  on SAVE command, 52, 86, 114, 158, 199, 206-207, 216, 243, 263, 312, 315-316, 337, 693, 707, 724
SCORE option
  on PROBLEM command, 748, 773
Score Options dialog box, 79, 209-210, 215
Score Options option
  on Technical menu, 79, 91, 209-210, 215
Scores
  assigning different name, 318
  assigning to intervals, 29
  assumed prior distributions, 37
  Bayes MAP, 35
  combining using weights, 285-286, 292
  EAP estimation on general factor, 418
  estimating, 31, 410, 530
  grouping into fractiles, 435, 776
  identifying criterion, 428
  method of estimating, 214, 317
  ML, 34
  requesting for individuals or response patterns, 40
  requesting printing of, 320
  rescaling, 29, 32, 40, 53, 90, 128, 208, 218, 612, 634, 675
  saving to external file, 52, 395, 461, 465
  scaling to user-supplied values, 53
  setting reference group, 220
  specifying number of frequency groups, 279
  specifying of in TESTFACT, 474
  specifying reference form for, 215, 219
  specifying type of, 52
  suppressing printing, 52, 216
  type of prior distribution, 75, 316
  using multiple forms, 219
  variance, 25, 32, 34-35, 575, 589, 623, 637, 656, 684, 844
SCORES option
  on FRACTILES command, 435-436, 776
  on INPUT command, 430, 444-445, 776, 779, 781, 804
  on PROBLEM command, 349, 350, 357, 389, 392
  on SAVE command, 469
SCORESEED keyword
  on SIMULATE command, 487, 820, 822
Scoring, 82
  defining, 208
  excluding population distribution, 392
  importing item parameters, 52
  information on prior distributions, 64
  input of item parameters for, 475
  method of, 52, 695
  of group-level data, 316
  of respondents, 316
  suppressing correction for information function, 319
  suppressing of, 319
  test, 605, 621
Scoring function, 322
  generalized partial credit model, 557-558
  nominal model, 558
  specifying for partial credit model, 272
SCORING keyword
  on BLOCK command, 263, 265, 272, 709
  on SCORE command, 264, 316, 322, 706
Scoring Only option
  on Run menu, 82
Scoring Prior Latent Distribution dialog box, 197-198, 209, 212, 217-219
Scoring Prior Latent Distribution option
  on Technical menu, 197-198, 209, 212, 217-219
Scott, E.L., 840
SD keyword
  on EQUAL command, 379
  on FIX command, 385-386
  on PRIORS command, 393
SELECT command, 418-419, 457, 480, 489-490, 504, 781, 818
SELECT keyword
  on CALIB command, 47, 89, 115-116, 136, 686
  on PROBLEM command, 457, 480-481, 776, 781, 818
Selection
  of graphs, 515
  of items, 480
  testing, 618
Sequential item testing, 632
Server tab
  on Settings dialog box, 84, 85
Settings dialog box, 83, 85
  Editor tab, 83
  General tab, 83
  Server tab, 84-85
Settings option
  on Options menu, 83, 85
Setup menu, 40
  General dialog box, 41, 59, 93, 108-111, 113, 117, 135, 141, 145, 147, 151-153, 156, 160-163, 165, 170, 174, 176, 183, 225, 229, 233-234
  General option, 40, 85-89, 93, 100
  Item Analysis dialog box, 46, 94, 102, 108-111, 113, 117-118, 120-124, 127, 130, 137-138, 140, 145-146, 160-161, 185-186, 225, 231
  Item Analysis option, 40, 86-89, 94, 102
  Test Scoring dialog box, 52, 147, 150, 157, 209, 211-214, 216, 221-222
  Test Scoring option, 40, 85-86, 90-91
Sheppard's correction, 32, 708
Show Selectors option
  on File menu, 514
Silvey, S.D., 844
Simon, T., 621, 830-831
SIMULATE command, 410, 461, 474, 482, 819
  CHANCE option, 483, 484, 820
  ERRORSEED keyword, 483, 820, 822
  FILE keyword, 483, 819-820
  FORMS keyword, 484, 819-820
  GROUP keyword, 484, 819-820
  GUESSSEED keyword, 485, 822
  LOADINGS option, 414, 485, 820-821
  MEAN keyword, 486, 819-820
  NCASES keyword, 486, 819-820
  NFAC keyword, 486, 819-820
  PARM keyword, 487
  SCORESEED keyword, 487, 820, 822
  SLOPES option, 414, 485, 819
Simulation of responses, 410, 575
  and guessing parameters, 483
  form of item parameters provided, 485
  means of population of scores, 486
  number to be generated, 486
  user-supplied item parameters, 483
Single-group model
  response data, 334
SKIP keyword
  on BLOCK command, 263, 265, 272
  on PROBLEM command, 457, 780, 802, 818, 824
SKIPC option
  on CALIB command, 264, 274, 283
Skipping
  steps in analysis, 457
SLOPE keyword
  on TEST command, 47, 68, 87, 114, 224, 226, 227, 231, 263, 325, 327
Slopes, 539
  for graded model, 379, 385, 393
  initial parameter, 564
  requesting single common, 267, 275
  selecting prior, 123, 137
  starting values for, 231
  supplying priors for, 190-191, 306-307
SLOPES option
  on SIMULATE command, 414, 485, 819
SMEAN keyword
  on SCORE command, 264, 316, 323, 693, 696, 707
Smith, A.F.M., 589
Smith, M.C., 765
SMOOTH option
  on BIFACTOR command, 422
  on FACTOR command, 434, 779
  on SAVE command, 470, 779
Smoothed matrix, 584, 791-792
  difference from tetrachoric correlations, 422
  number of decimals, 420
  printing for bifactor solution, 421
SMU keyword
  on PRIORS command, 26, 47, 70, 90, 115, 187, 190, 264, 305-306
SOPTION option
  on PRIORS command, 264, 305-306
Sorted file
  saving, 470
SORTED option
  on SAVE command, 470
Sorting
  of response patterns, 495
Spearman, C., 586, 623
SPRECISION keyword
  on SCORE command, 478, 802
SPRIOR option
  on CALIB command, 51, 89, 115-116, 124, 137, 140, 264, 274, 721
SQUAD keyword
  on TECHNICAL command, 496-498
SSD keyword
  on SCORE command, 264, 316, 323, 693, 696, 707
SSIGMA keyword
  on PRIORS command, 26, 47, 70, 90, 115, 187, 191, 264, 305, 307
Standard deviation
  of normal distribution for intercepts, 452
  of population distribution, 379, 385, 393
  posterior, 35-36, 75, 90, 115, 208, 214, 217-219, 561, 588, 607-608, 610, 655-656, 693, 707, 731, 814, 828
Standard error, 31, 586, 598, 601, 631, 716
Standard error of measurement, 525-526, 589, 606, 608-609, 622
Standardized difficulties, 810
Standardized difficulty, 797-798
Stanley, J.C., 580
START command, 395-396, 748
  ALL option, 396
  FORMAT option, 396, 397
  ITEMS keyword, 396
  PARAM keyword, 409
  PARM keyword, 396, 397
Starting values, 397
  assigning, 64
  entering interactively, 65-66
  fixing items to, 68, 226, 385
  for category parameters, 266
  for dispersion, 225
  for guessing parameter, 227
  for intercept, 229, 326
  for slopes, 231
  for thresholds, 232, 328
  importing from file, 65
  user-supplied, 396-397
Statistical test
  of number of factors, 587
Stats, Calibration and Scoring option
  on Run menu, 82, 98, 106
Status bar
  adding or removing, 82
Steinberg, L., 91, 345, 399, 523, 529, 535, 559, 567-569, 597, 617, 638, 738, 747, 751-752, 754, 757, 759, 761-763, 768, 770, 835
Stone, C.A., 759
STOP command, 482, 488
Stouffer, S.A., 768, 836
Stroud, A.H., 600
Stuart, A., 598
Subject scores file
  saving, 315, 337
SUBTEST command, 428, 458, 489
  BOUNDARY keyword, 489, 776
  NAMES keyword, 490
Subtest Items tab
  on Item Analysis dialog box, 48, 87
SUBTEST keyword
  on CRITERION command, 428
  on PROBLEM command, 458, 776
SUBTEST option
  on SAVE command, 471
Subtests
  boundaries, 489
  histogram of scores, 577
  naming, 490
  number of, 42
  partitioning of main test, 489
Subtests tab
  on Item Analysis dialog box, 46, 86, 89
Summary statistics
  score estimates, 814
Swaminathan, H., 524-525
Swineford, F., 586
Symonds, P.M., 592
Syntax
  generating from input, 81-82, 357, 364, 370, 733
  generating MULTILOG, 357, 364, 370
  opening multiple files, 84-85
  transferring changes to dialog boxes, 82

T

TABLE option
  on PROBLEM command, 352, 390-391
TAKE keyword
  on INPUT command, 55, 63, 87, 114, 163, 167, 179, 263, 292, 298, 697
TECHNICAL command, 410, 491
  ACCEL keyword, 491
  FRACTION option, 491
  FREQ option, 492
  IQUAD keyword, 492
  ITER keyword, 492, 496
  ITLIMIT keyword, 493
  MCEMSEED keyword, 493, 803
  NITER keyword, 494
  NOADAPT option, 432, 439, 440, 466, 471, 494, 779, 815, 825
  NOSORT option, 495
  NSAMPLE keyword, 495
  PRECISION keyword, 495, 781
  PRV keyword, 496
  QSCALE keyword, 496
  QUAD keyword, 496, 497
  QWEIGHT keyword, 497
  SQUAD keyword, 496-498
Technical menu, 64
  Assign Calibration Prior Latent Distribution dialog box, 75
  Assign Calibration Prior Latent Distribution option, 70, 90
  Assign Fixed Items option, 88
  Assign Fixed Items dialog box, 68, 109, 225, 227
  Assign Item Parameter Prior Constraints option, 89-90
  Assign Item Parameter Starting Values option, 65, 87-88
  Assign Scoring Prior Latent Distribution dialog box, 70, 75
  Assign Scoring Prior Latent Distribution option, 90-91
  Calibration Options dialog box, 78, 117, 122, 125, 128, 133
  Calibration Options option, 89
  Calibration Prior Latent Distribution dialog box, 70, 194-195
  Data Options dialog box, 77, 117, 127, 163, 167
  Data Options option, 87, 89
  Item Parameter Prior Constraints dialog box, 69, 117, 134, 189-192
  Item Parameter Starting Values dialog box, 65, 109, 225-226, 228, 230, 232
  Save Output to File dialog box, 80
  Score Options dialog box, 79, 209-210, 215
  Score Options option, 91
  Scoring Prior Latent Distribution dialog box, 197-198, 209, 212, 217-219
Technical support, 85
Terry, M.E., 835
TEST command, 66, 87, 108-109, 114, 121, 127, 153-154, 183, 185-186, 224, 263, 274, 297, 316, 325, 327, 398
  ALL option, 355, 367, 398, 730
  BS option, 355, 367, 379, 398-400, 403, 752
  DISPERSN keyword, 47, 68, 88, 114, 224-225, 232
  FIX keyword, 25, 36, 47, 68, 88, 114, 224, 226
  GR option, 355, 361, 367, 374, 379, 398-400
  GRADED option, 735, 742, 748
  GUESS keyword, 68, 87, 114, 224, 226-227
  HIGH keyword, 367, 399, 738, 752
  INAMES keyword, 87, 114, 224, 228, 263, 325-326, 710
  INTERCEPT keyword, 263, 325-326, 328
  INTERCPT keyword, 47, 68, 87, 114, 224, 226, 229, 232
  INUMBERS keyword, 47-48, 87, 114, 224, 230, 233
  ITEMS keyword, 263, 325-326, 332, 355, 367, 398, 710
  L1 option, 355, 367, 379, 398-400, 730
  L2 option, 355, 367, 379, 398-400, 732
  L3 option, 355, 367, 379, 398-400, 733
  NBLOCK keyword, 263-265, 325, 327, 339-340, 342, 692
  NC keyword, 367, 375, 399, 738, 773
  NO option, 355, 367, 379, 398-400, 403
  NOMINAL option, 738
  REPEAT keyword, 264
  SLOPE keyword, 47, 68, 87, 114, 224, 226-227, 231, 263, 325, 327
  THRESHLD keyword, 47, 68, 87, 114, 224, 226-227, 230, 232
  THRESHOLD keyword, 263, 325-326, 328
  TNAME keyword, 46, 87, 114, 224, 232, 263, 325, 328, 710
Test generalizability, 622
Test information, 32, 33, 35, 37, 40, 200, 207, 212, 222, 256, 509, 525-527, 594, 608-609, 615, 617, 620, 623, 625, 652, 675
  curves, 505, 525
  displaying, 509
  editing and saving, 509
  expressing in comparable units, 222
  function, 33, 525-526, 615, 623, 625
  plotting, 208, 212
  saving statistics to file, 207
Test Model dialog box, 352-354, 360, 367, 373, 377, 398-400
Test Model group box, 355, 367
Test scoring, 605, 621
Test Scoring dialog box, 52, 147, 150, 157, 209, 211-214, 216, 221-222
  General tab, 52, 85-86, 90-91
  Rescaling tab, 53, 90
Test Scoring option
  on Setup menu, 40, 52, 85-86, 90-91, 147, 150, 157, 209, 211-214, 216, 221-222
Test validity, 575
TESTFACT
  and guessing parameters, 582, 805
  bifactor analysis, 418, 804, 815
  classical factor analysis, 410, 530
  criterion score, 429
  exploratory factor analysis, 584
  full information factor analysis, 410, 432, 437, 442, 446, 530, 778, 818
  group indicator in simulation of records, 484
  input filename, 443
  item statistics, 785
  methods of estimation, 410, 530
  new features, 410
  order of commands, 413, 414
  simulating responses, 410, 575
  specifying guessing parameters, 437
  test form identification in simulation, 484
  using external variables, 430
  weighted analysis, 447
Testing
  adaptive, 632
  assessment, 618-619
  clinical, 618-619
  qualification, 618-619
  selection, 618
  sequential item, 632
Tests
  allocating items to, 40
  assigning items, 46, 224
  assigning names, 44
  assigning prior latent distributions, 70, 75
  assigning variant items, 48
  different quad points and weights, 76
  different score scaling options, 53
  form indicator in simulation of records, 484
  indicating number of, 152-153, 296
  naming and numbering items in BILOG-MG, 108
  naming of, 232
  norming results, 625
  number of, 24, 325, 352, 354, 360, 367, 373, 528
  number of items, 185, 294
  number of quadrature points, 216
  number of variant items, 186
  obtaining information in comparable units, 222
  partitioning into subtests, 489
  personalizing of, 177
  providing quadrature points and weights by, 70
  providing separate priors, 69
  saving item parameter estimates, 471
  selecting for calibration, 46, 136
  total number of items, 455
TETRACHORIC command, 499, 779, 806
  COMPLETE option, 500-501
  CROSS option, 499
  LIST option, 499, 779, 781, 803
  NDEC keyword, 500, 779, 781, 803
  PAIRWISE option, 500-501
  RECODE option, 500-501, 779, 825
  TIME option, 477, 479, 501
Tetrachoric correlation matrix, 788
  and positive definiteness, 584
  printing to output, 499
  saving to file, 463
Tetrachoric correlations, 457, 499, 500, 530, 583-584, 629, 779, 782, 787, 789, 806, 818, 826, 828-829, 840, 842
  and MINRES factor analysis, 431
  difference from smoothed, 422
  in factor analysis, 410, 529, 575
  number of decimals, 500
  specifying count matrix, 499
Text Parameters dialog box, 517-518, 521-522
TGROUPS command, 401, 403, 750, 758
  MIDDLE keyword, 401
  NUMBER keyword, 401-402
  QP keyword, 402
Thissen, D., 91, 209, 345, 399, 523, 529, 535, 545, 559, 567-569, 576, 597, 601, 616-617, 638, 679, 688, 730, 738, 740, 747, 751-752, 754, 757, 759, 761-762, 763, 765, 768, 770, 772, 835, 843, 846
Thorndike, E.L., 592
THRESHLD keyword
  on TEST command, 47, 68, 87, 114, 224, 226-227, 230, 232
THRESHOLD keyword
  on TEST command, 263, 325-326, 328
THRESHOLD option
  on CALIB command, 264, 274, 284
Thresholds
  category parameters, 546, 564, 700, 703
  for binary graded items, 379, 385, 393
  for graded model, 379, 385, 393
  for item-category, 546, 553, 700
  item parameter, 624
  of item in 1PL model, 539
  of item in 2PL model, 539
  requesting prior for, 284
  selecting prior, 138
  starting values for, 232, 328
  supplying priors for, 191-192, 307
Thurstone, L.L., 576, 583, 586, 592, 830-832, 835, 842
TIME option
  on BIFACTOR command, 423
  on FULL command, 440
  on SCORE command, 479
  on TETRACHORIC command, 477, 479, 501
Title
  of analysis, 42, 234, 330, 502
TITLE command, 42, 85, 114, 234, 263, 330, 482, 502
T-matrix
  form of, 403-404
TMATRIX command, 754, 767
  AK option, 404
  ALL option, 403
  CK option, 404
  DEVIATION option, 404
  DK option, 404
  ITEMS keyword, 403
  POLYNOMIAL option, 404, 738, 763
  TRIANGLE option, 404
TMU keyword
  on PRIORS command, 47, 70, 90, 115, 187, 191, 264, 305, 307
TNAME keyword
  on TEST command, 46, 87, 114, 224, 232, 263, 325, 328, 710
Toby, J., 768
Tolerance intervals, 605, 649
Total Info option
  Graphics procedure, 505, 509, 527
Total information curve, 505, 526
Total test information, 509, 746
TPRIOR option
  on CALIB command, 51, 89, 115-117, 124, 138-139, 264, 274, 284, 666
Trace line, 548-549, 553, 555, 558-559, 567, 593-594, 595-597, 738, 742, 747, 751, 757-758, 760-761, 768
Trait estimation, 593
Trend model, 546, 560
TRIAL keyword
  on INPUT command, 446, 472
TRIAL option
  on SAVE command, 447, 468, 472, 475, 825
Trial values
  input for full information factor analysis, 446
TRIANGLE option
  on TMATRIX command, 404
TSIGMA keyword
  on PRIORS command, 47, 70, 90, 115, 187, 192, 264, 305, 307
TSTAT keyword
  on SAVE command, 86, 114, 199, 207, 256
Tsutakawa, R.K., 602, 843
Tucker, L.R., 542, 792, 840
Two-stage testing, 24, 112, 134, 531-532, 536-537, 632, 679
Type
  of analysis in MULTILOG, 389
  of data file used as input, 54, 55, 180, 445
  of model, 398
  of quadrature, 492
  of rescaling of scores, 221
TYPE keyword
  on INPUT command, 55, 63, 86, 114, 147-148, 150-155, 163, 180, 238, 240-241, 638, 666

U

UNFORMAT option
  on INPUT command, 444
Uniquenesses, 792
  beta prior for, 452
  random number for generating from distribution, 483
UNROTATE option
  on SAVE command, 472, 815
Unrotated factor loadings
  saving to file, 472
Updating
  of posterior distribution, 281
User-supplied file
  specifying number of factors for scoring, 477
User-Supplied tab
  on Assign Scoring Prior Latent Distribution dialog box, 76, 90-91

V

VAIM keyword
  on ESTIMATE command, 356, 384
VALUE keyword
  on FIX command, 386
Variable format statement, 350, 352, 359, 366, 372, 405, 415, 442, 502
Variance
  estimated error, 32-35, 249, 271, 583, 620, 623, 625, 655-656, 684-685, 699, 838, 843
  percentage explained by factors, 809
  score, 25, 32, 34-35, 575, 589, 623, 637, 656, 684, 844
Variant items, 24, 42, 46, 109, 115, 153, 185-187, 224, 528, 534, 633, 670, 672
  assigning to test, 48
  number in test, 186
VARIMAX option
  on FACTOR command, 413
VARIMAX rotation, 413, 431, 433, 441-442, 468, 576, 583, 588, 629, 781, 799, 802
Verhelst, N., 843
Verhulst, P-F., 834
Vertical equating, 24, 116, 120, 129, 138, 528, 532, 534, 621, 627-628, 658, 661, 845
View menu, 82

W

Wainer, H., 91, 535, 586, 616, 638, 760, 763, 770, 772, 844, 847
Warm, T., 615
Warm's
  weighted ML estimation, 264, 316-317, 615
WEIGHT keyword
  on COMBINE command, 723
  on INPUT command, 441, 444, 446-447, 470, 779
WEIGHT option
  on INPUT command, 263, 292, 298, 333-336
Weights
  for calculating criterion score, 429
  for combining subscale scores, 285, 286
  for quadrature, 70, 251, 308, 310, 590, 800
  for Rater's Effect model, 561
  providing information on, 62
  specifying in BILOG-MG, 154
  type of, 447
WEIGHTS keyword
  on COMBINE command, 264, 285-286
  on CRITERION command, 428-429
  on QUAD command, 51, 75, 77, 90, 115, 193, 195
  on QUADP command, 264, 308-309
  on QUADS command, 91, 115, 196-197, 264, 310-311
White, P.O., 583
Wilmut, J., 579
Wilson, D.T., 529
Window menu, 85
WML option
  on SCORE command, 264, 316-317
Wolfowitz, J., 840, 843
Wood, R., 529, 575, 582, 592, 599, 611
Workspace
  setting in PARSCALE, 259
Wright, W.D., 560

Y

Yates, F., 267, 834
YCOMMON option
  on SCORE command, 91, 115, 208, 213, 222, 653, 675

Z

Zimowski, M.F., 25, 209, 528, 531, 537, 576, 589, 591, 679, 688, 844-846
Zwarts, M., 599
Zwick, R., 591
Zyzanski, S.J., 780, 802-803