Chapter 6 Writing and Evaluating Test Items

The document discusses various formats for writing and evaluating test items, including dichotomous, polytomous, multiple-choice, and Likert formats, along with their definitions and examples. It emphasizes the importance of item analysis, covering aspects such as item difficulty, discriminability, and methods for evaluating test items. Additionally, it highlights the significance of criterion-referenced tests and the limitations of item analysis procedures in assessing student learning outcomes.


PSYCHOLOGICAL ASSESSMENT

In Partial Fulfilment of the Requirements in


PSY 312 (2194)
Submitted to:

LLOYD SAJOL, MPsy

Submitted by:

Anito, Christine Jade


Bantulo, Elaiza
Deligencia, Andrea Marie
Tan, Diether James L.
Wines, Weinslet F.
CHAPTER 6: WRITING AND EVALUATING TEST ITEMS

TERM DEFINITION REMARKS


6.1 Item Writing
6.1.1 Item Formats
Dichotomous Format ● Offers two alternatives for each item.
● Usually a point is given for the selection of one of the alternatives.
● The most common example of this format is the true-false
examination.

True-False Examination ● The student’s task is to determine which statements are true and
which are false.

Polytomous Format ● Sometimes called polychotomous.


● Resembles the dichotomous format except that each item has
more than two alternatives.
● Typically, a point is given for the selection of one of the
alternatives, and no point is given for selecting any other choice.

Multiple-Choice Examination ● Is the polytomous format you have likely encountered most often.
● Are easy to score, and the probability of obtaining a correct
response by chance is lower than it is for true-false items.

Distractors ● Incorrect choices.


● As we shall demonstrate in the section on item analysis, the
choice of distractors is critically important.
● Poorly written distractors can adversely affect the quality of the
test.

Corrected Score = R - W/(n - 1) ● The formula to correct for guessing on a test.
● R = the number of right responses.
● W = the number of wrong responses.
● n = the number of choices for each item.
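A minimal sketch of this correction in Python, using made-up numbers; the function name and example values are illustrative only, not from the chapter:

def corrected_score(right, wrong, n_choices):
    # Guessing correction: R - W / (n - 1)
    # right     = number of correct responses (R)
    # wrong     = number of incorrect responses (W)
    # n_choices = number of alternatives per item (n)
    return right - wrong / (n_choices - 1)

# Example: 30 right and 10 wrong on a four-choice test
# gives 30 - 10/3, or roughly 26.7
print(corrected_score(30, 10, 4))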
Likert Format ● One popular format for attitude and personality scales requires
that a respondent indicate the degree of agreement with a
particular attitudinal question.
● It was used as part of Likert’s (1932) method of attitude scale
construction.
Category Format ● A technique that is similar to the Likert format but that uses an
even greater number of choices.
● A category scale need not have exactly 10 points; it can have
either more or fewer categories.
10-Point Scale ● Although common in psychological research and everyday conversation,
controversy exists regarding when and how it should be used.
Adjective Checklists ● One format common in personality measurement is the adjective
checklist (Gough, 1960).
● With this method, a subject receives a long list of adjectives and
indicates whether each one is characteristic of himself or herself.
● Can be used for describing either oneself or someone else.
● Requires subjects either to endorse such adjectives or not, thus
allowing only two choices for each item.

Q-sort ● Increases the number of categories.


● Can be used to describe oneself or to provide ratings of
others (Stephenson, 1953).
● With this technique, a subject is given statements and
asked to sort them into nine piles.

6.1.2 Other Possibilities


● The forced-choice (such as multiple-choice and Q-sort) and
Likert formats are clearly the most popular in contemporary tests
and measures.
6.2 Item Analysis
Item Analysis ● A general term for a set of methods used to evaluate test items; it is
one of the most important aspects of test construction.
● The basic methods involve assessment of item difficulty and item
discriminability.

6.2.1 Item Difficulty


Item Difficulty ● Defined by the number of people who get a particular item
correct.
● How hard should items be in a good test? This depends on the
uses of the test and the types of items.
● The first thing a test constructor needs to determine is the
probability that an item could be answered correctly by chance
alone.
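
A minimal sketch in Python, with hypothetical response data, of item difficulty as the proportion of examinees who answer the item correctly, along with the chance level for the item format:

# Hypothetical responses to one item: 1 = correct, 0 = incorrect
responses = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]

difficulty = sum(responses) / len(responses)
print(difficulty)  # 0.7, i.e., 70% of examinees got the item right

# Probability of a correct answer by chance alone depends on the format:
# 0.5 for a true-false item, 0.25 for a four-choice item
chance = 1 / 4
print(chance)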

6.2.2 Discriminability
Discriminability ● Another way to evaluate items is to examine the relationship between
performance on particular items and performance on the whole test.

Item Discriminability ● Determines whether the people who have done well on particular
items have also done well on the whole test.

The Extreme Group Method ● This method compares people who have done well with those
who have done poorly on a test.
● For example, you might find the students with test scores in the
top third and those in the bottom third of the class. Then you
would find the proportions of people in each group who got each
item correct.

Discrimination Index ● The difference between these proportions.
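
A minimal sketch of the extreme group method in Python, with hypothetical item and total-score data; the function name and the use of thirds follow the example above:

def discrimination_index(item_correct, total_scores):
    # Sort examinees by total score, take the bottom and top thirds,
    # and subtract the proportions who got the item right.
    pairs = sorted(zip(total_scores, item_correct))
    third = len(pairs) // 3
    bottom = [correct for _, correct in pairs[:third]]
    top = [correct for _, correct in pairs[-third:]]
    return sum(top) / len(top) - sum(bottom) / len(bottom)

item_correct = [0, 0, 1, 0, 1, 1, 1, 1, 1]           # 1 = got the item right
total_scores = [12, 15, 18, 20, 22, 25, 27, 29, 30]  # total test scores
print(discrimination_index(item_correct, total_scores))  # 1.00 - 0.33, about 0.67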

The Point Biserial Method ● Another way to examine the discriminability of items is to find the
correlation between performance on the item and performance on
the total test.

Point Biserial Correlation ● The correlation between a dichotomous (two- category) variable
and a continuous variable.
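
A minimal sketch in Python: the point biserial correlation can be computed as an ordinary Pearson correlation in which one variable is the 0/1 item score; the data below are hypothetical:

from statistics import mean, pstdev

def point_biserial(item, total):
    # Pearson correlation between a dichotomous item score (0/1)
    # and the continuous total test score
    mx, my = mean(item), mean(total)
    cov = mean((x - mx) * (y - my) for x, y in zip(item, total))
    return cov / (pstdev(item) * pstdev(total))

item = [0, 0, 1, 0, 1, 1, 1, 1, 1]                   # item score (0/1)
total = [12, 15, 18, 20, 22, 25, 27, 29, 30]         # total test score
print(round(point_biserial(item, total), 2))         # about 0.76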

6.2.3 Pictures of Item Characteristics


Item Characteristic Curve ● A valuable way to learn about items is to graph their
characteristics, which you can do with the item characteristic
curve.
● For particular items, one can prepare a graph for each individual
test item.
● On these individual item graphs, the total test score is plotted on
the horizontal (X) axis and the proportion of examinees who get
the item correct is plotted on the vertical (Y) axis.

Drawing the Item Characteristic Curve ● To draw the item characteristic curve, we need to define
discrete categories of test performance.
● If the test has been given to many people, we might choose to
make each test score a single category (65, 66, 67, and so on).
● However, if the test has been given to a smaller group, then we
might use a smaller number of class intervals (such as 66–68, 69–71).
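
A minimal sketch in Python of the points that would be plotted for one item: hypothetical total scores are grouped into three-point class intervals, and the proportion of examinees in each interval who get the item correct is computed:

from collections import defaultdict

item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1, 1, 1]   # 1 = item answered correctly
total = [61, 63, 64, 66, 67, 68, 70, 71, 72, 74, 75, 77]

bins = defaultdict(list)
for correct, score in zip(item, total):
    low = (score // 3) * 3                    # 3-point class interval, e.g., 66-68
    bins[low].append(correct)

# X axis: class interval of total score; Y axis: proportion passing the item
for low in sorted(bins):
    proportion = sum(bins[low]) / len(bins[low])
    print(f"{low}-{low + 2}: {proportion:.2f}")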

Item Response Theory (IRT) ● Newer approaches to testing based on item analysis consider the
chances of getting particular items right or wrong. These
approaches, now known as item response theory (IRT), make
extensive use of item analysis (DeMars, 2010; DeVellis, 2012).
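
As an illustration of this idea only (the model and numbers are not given in the chapter), a widely used IRT model, the two-parameter logistic, expresses the chance of a correct response as a function of the examinee's ability and the item's difficulty and discrimination:

import math

def p_correct(theta, a, b):
    # Two-parameter logistic model: probability of a correct response
    # given ability theta, item discrimination a, and item difficulty b
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# For an item of average difficulty (b = 0), the probability of a correct
# response rises with ability
for theta in (-2, -1, 0, 1, 2):
    print(theta, round(p_correct(theta, a=1.2, b=0.0), 2))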

Classical Test Theory ● A score is derived from the sum of an individual’s responses to
various items, which are sampled from a larger domain that
represents a specific trait or ability.

External Criteria ● You can use similar procedures to compare performance on an item with
performance on an external criterion.
● Rarely used in practice (Linn, 1994a, 1994b).

6.2.4 Linking Uncommon Measures


● One challenge in test applications is how to determine linkages
between two different measures. There are many cases in which
linkages are needed.
National Assessment of Educational Progress (NAEP) ● Problems in test linkages became important in the late 1990s
when the National Assessment of Educational Progress (NAEP) program was proposed.

6.2.5 Items for Criterion-Referenced Tests


Criterion-Referenced Test ● Compares performance with some clearly defined criterion for
learning.
● This approach is popular in individualized instruction programs.
● For each student, a set of objectives is defined that state exactly
what the student should be able to do after an educational
experience.
● The criterion-referenced test would be used to determine whether these objectives had been
achieved.

Antimode ● The least frequent score.


Cutting Score or Point ● This point divides those who have been exposed to the unit from
those who have not been exposed and is usually taken as the
cutting score or point.
● It marks the point of decision.

6.2.6 Limitations for Item Analysis


● The growing interest in criterion-referenced tests has posed new
questions about the adequacy of item-analysis procedures. The
main problem is this: though statistical methods for item analysis
tell the test constructor which items do a good job of separating
students, they do not help the students learn.
CHAPTER 6: WRITING AND EVALUATING TEST ITEMS
SUMMARY
● Dichotomous Format- It offers two alternatives for each item and the most common example of this format is the true-false examination.
● True-False Examination- The student’s task is to determine which statements are true and which are false.
● Polytomous Format- Also called polychotomous. Each item has more than two alternatives; typically, a point is given for selecting one of the alternatives and no point for selecting any other choice.
● Multiple-Choice Examination- Are easy to score, and the probability of obtaining a correct response by chance is lower than it is for true-false items
● Distractors- Incorrect Choices
● Corrected Score = R - W/(n - 1) - The formula to correct for guessing on a test.
● Likert Format- One popular format for attitude and personality scales requires that a respondent indicate the degree of agreement with a particular attitudinal
question.
● Category Format- A category scale need not have exactly 10 points; it can have either more or fewer categories.
● 10-Point Scale- Although it is frequently employed in psychological research and regular conversation, there is debate over whether and how to apply it.
● Adjective Checklists- A common format in personality measurement in which a subject receives a long list of adjectives and indicates whether each one is
characteristic of himself or herself.
● Q-sort- With this technique, a subject is given statements and asked to sort them into nine piles.
● Item analysis- The basic methods involve assessment of item difficulty and item discriminability.
● Item Difficulty – Defined by the number of people who answer a particular item correctly. How difficult items should be depends on how the test is used and what
kinds of items are included.
● Discriminability- Another way to evaluate items is to examine the relationship between performance on particular items and performance on the whole test.
● Item Discriminability- Determines whether the people who have done well on particular items have also done well on the whole test.
● The Extreme Group Method- Compares people who have done well with those who have done poorly on a test.
● Discrimination Index- The difference between the proportions of high and low scorers who answer an item correctly.
● The Point Biserial Method- Another way to examine discriminability is to find the correlation between performance on the item and performance on the
total test.
● Point Biserial Correlation- The correlation between a dichotomous (two- category) variable and a continuous variable.
● Item Characteristic Curve- Graphing item characteristics is a valuable way to learn about them. A graph can be prepared for each test item, with the
total test score plotted on the horizontal axis and the proportion of examinees who get the item correct plotted on the vertical axis.
● Drawing the Item Characteristic Curve- To draw the item characteristic curve, we need to define discrete categories of test performance. If the test has been
given to many people, each test score can be a single category; if it has been given to a smaller group, we might use a smaller number of class intervals.
● Item Response Theory (IRT)- Newer approaches to testing based on item analysis consider the chances of getting particular items right or wrong.
● Classical Test Theory- A score is calculated by combining an individual's responses to items from a larger domain that represents a specific trait or ability.
● External Criteria- Using similar procedures to compare performance on an item with performance on an external criterion; rarely used in practice.
● National Assessment of Educational Progress (NAEP)- Problems in test linkages became important in the late 1990s when the National Assessment of
Educational Progress (NAEP) program was proposed.
● Criterion-Referenced Test - Compares performance with some clearly defined criterion for learning.
● Antimode- The least frequent score.
● Cutting Score or Point- This point divides those who have been exposed to the unit from those who have not been exposed and is usually taken as the
cutting score or point.
● Limitations for Item Analysis- The increasing interest in criterion-referenced tests raises questions about the effectiveness of item-analysis procedures, as
statistical methods for item analysis only provide information about which items effectively separate students, not their learning outcomes.
CHAPTER 6: MULTIPLE CHOICE QUESTION

1. It offers two alternatives for each item. Usually a point is given for the selection of one of the alternatives. The most common example of this format is the true-
false examination.
a. Research Format c. Multiple Choice Format
b. Dichotomous Format d. Polytomous Format
2. A point is given for the selection of one of the alternatives, and no point is given for selecting any other choice. It is a popular method of measuring academic
performance in large classes; an example is the multiple-choice examination.
a. Research Format c. Quantitative Format
b. Dichotomous Format d. Polytomous Format
3. When taking a multiple-choice examination, you must determine which of several alternatives is “correct.” Incorrect choices are called:
a. Destruction c. Distraction
b. Distracted d. Distractors
4. Describes the chances that a low-ability test taker will obtain each score.
a. Likert format c. Distractors
b. Guessing Threshold d. Visual Analogue Scale
5. If 84% of the people taking a particular test get item 24 correct, then the difficulty level for that item is .84.
a. Checklist and Q-sorts c. discrimination index
b. Discriminability d. Item difficulty
6. Newer approaches to testing based on item analysis consider the chances of getting particular items right or wrong.
a. Item response theory c. Item Characteristics Curve
b. External criteria d. Criterion referenced
7. If you were building a test to select airplane pilots, you might want to evaluate how well the individual items predict success in pilot training or flying performance.
What type of criteria is this?
a. internal criteria c. internal external criteria
b. external internal criteria d. external criteria
8. This method compares people who have done well with those who have done poorly on a test.
a. the point biserial method c. the extreme group method
b. item characteristic curve d. discriminability
9. What is the most common example of dichotomous format?
a. false- true examination c. true- true examination
b. true- false examination d. false-false examination
10. A general term for a set of methods used to evaluate test items; it is one of the most important aspects of test construction.
a. item analysis c. item difficulty
b. item discriminability d. item curve
11. A popular approach in individualized instruction programs that compares performance with some clearly defined criterion for learning.
a. Criteria-Reference Test c. Criterion-Referenced Test
b. Criterion-Test References d. Criteria-Based Test
12. Criterion-referenced tests offer many advantages to newer educational approaches such as:
a. Computer-Assisted Instruction c. Criterion-Based Test
b. Assistant Instruction d. All of the above
13. Popular for attitude scales.
a. Likert Format c. Research Format
b. Dichotomous Format d. Polytomous Format
14. Are among the many item formats used in personality research. These methods require people to make judgments about whether or not certain items describe
themselves or others.
a. Checklists and Q-sorts c. Criterion-Referenced Test
b. Checkers and L-Sorts d. T-Test and Q-Test
15. A valuable way to learn about items is to graph their characteristics.
a. Item Character Curve c. Item Characteristic Curve
b. Item Curve d. None of the Above
IDENTIFICATION

1. Offers two alternatives for each item. Usually a point is given for the selection of one of the alternatives. The most common example of this format is the true-
false examination.

2. The student’s task is to determine which statements are true and which are false.

3. Sometimes called polychotomous. Resembles the dichotomous format except that each item has more than two alternatives. Typically, a point is given for the
selection of one of the alternatives, and no point is given for selecting any other choice.

4. Is the polytomous format you have likely encountered most often. Are easy to score, and the probability of obtaining a correct response by chance is lower
than it is for true-false items.

5. Incorrect choices. As we shall demonstrate in the section on item analysis, the choice of distractors is critically important. Poorly written distractors can adversely
affect the quality of the test.

6. The formula to correct for guessing on a test. R = the number of right responses. W = the number of wrong responses. n = the number of choices for each
item.

7. One popular format for attitude and personality scales requires that a respondent indicate the degree of agreement with a particular attitudinal question.

8. A technique that is similar to the Likert format but that uses an even greater number of choices.

9. Common in psychological research and everyday conversation, controversy exists regarding when and how it should be used.

10. With this method, a subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself. Can be used for describing
either oneself or someone else.

11. Increases the number of categories. Can be used to describe oneself or to provide ratings of others. With this technique, a subject is given statements and
asked to sort them into nine piles.

12. A general term for a set of methods used to evaluate test items, is one of the most important aspects of test construction. The basic methods involve assessment
of item difficulty and item discriminability.

13. Defined by the number of people who get a particular item correct. The first thing a test constructor needs to determine is the probability that an item could be
answered correctly by chance alone.

14. Another way is to examine the relationship between performance on particular items and performance on the whole test.
15. Determines whether the people who have done well on particular items have also done well on the whole test.

16. This method compares people who have done well with those who have done poorly on a test to find the proportions of people in each group who got each
item correct.

17. The difference between the proportions.

18. Another way to examine the discriminability of items is to find the correlation between performance on the item and performance on the total test.

19. The correlation between a dichotomous (two- category) variable and a continuous variable.

20. A valuable way to learn about items is to graph their characteristics with particular items, one can prepare a graph for each individual test item. On these
individual item graphs, the total test score is plotted on the horizontal (X) axis and the proportion of examinees who get the item correct is plotted on the vertical (Y) axis.

21. Newer approaches to testing based on item analysis consider the chances of getting particular items right or wrong. It makes extensive use of item analysis

22. A score is derived from the sum of an individual’s responses to various items, which are sampled from a larger domain that represents a specific trait or ability.

23. Is rarely used in practice and you can use similar procedures to compare performance on an item with performance on an external criterion.

24. The program whose proposal in the late 1990s made problems in test linkages important.

25. Compares performance with some clearly defined criterion for learning. This approach is popular in individualized instruction programs and would be used to
determine whether this objective had been achieved.

26. The least frequent score.

27. This point divides those who have been exposed to the unit from those who have not been exposed and is usually taken as the_______.
ANSWER KEY: CHAPTER 6

MCQ
1. B    6. A    11. C
2. D    7. D    12. A
3. D    8. C    13. A
4. B    9. B    14. A
5. D    10. A   15. C

Identification
1. Dichotomous Format 14. Discriminability
2. True-False Examination 15. Item Discriminability
3. Polytomous Format 16. Extreme Group Method
4. Multiple-Choice Examination 17. Discrimination Index
5. Distractors 18. The Point Biserial Method
6. Corrected Score = R - W/(n - 1) 19. Point Biserial Correlation
7. Likert Format 20. Item Characteristic Curve
8. Category Format 21. Item Response Theory (IRT)
9. 10-Point Scale 22. Classical Test Theory
10. Adjective Checklist 23. External Criteria
11. Q-Sort 24. National Assessment of Educational Progress (NAEP)
12. Item Analysis 25. Criterion-Referenced Test
13. Item Difficulty 26. Antimode
27. Cutting Score or Point
