EDUR 8331 14a Item Analysis

This document discusses using item analysis to improve test items. It explains that item analysis involves analyzing student responses to test items to identify issues and opportunities for revision. Key components of item analysis include examining overall test performance, calculating item difficulty, measuring item discrimination, and conducting distractor analysis. Item difficulty refers to the percentage of students answering correctly, while item discrimination measures an item's ability to distinguish between more and less knowledgeable students. The document provides an example to illustrate calculating item difficulty and discrimination.


Van Blerkom Chapter 11: Using Item Analysis to Improve Test Items

We learned about item analysis for scales. Item analysis for tests is similar, but it differs in some respects because of the types of test items available. Recall that item analysis refers to procedures used to analyze responses to test items. Through these analyses, one may revise items to produce more reliable and more valid tests. A complete item analysis has several components, each of which is described below.

1. Test Analysis
When examining a test, one may first note how students performed on the test as a whole. If most students
perform very well on a test, that could mean
• the test content was taught well, students learned the content, and scored high on the test – this is an ideal situation; or
• the test was too easy and could not adequately assess student content understanding.

If student test performance, overall, was poor, this could mean


• instruction was poor, and this is reflected in poor performance by students on the test;
• material was more difficult than expected, so student performance was lower than hoped;
• test content did not align well with instruction or objectives, so the test lacks content validity;
• the test was simply too difficult; or
• a combination of several issues listed above.

2. Item Analysis: Item Difficulty


Item difficulty refers to the percentage or proportion of students who correctly responded to an item. If, for
example, 43% of the students correctly responded to item X, then X has an item difficulty of 43% or .43. Usually item
difficulty will be presented in proportion, not percentage, format.

For items scored dichotomously – scored as 1 = correct and 0 = incorrect – the item difficulty is simply the number of students who answered the item correctly divided by the number of students who completed the test. For example, if 15 students answered an item, but only 8 students answered the item correctly, the item difficulty would be 8/15 = .53 or 53%.
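To make this concrete, here is a minimal Python sketch (not from the chapter); the 0/1 scores below are hypothetical:

# Hypothetical 0/1 scores for one item: 8 of 15 students answered correctly
item_scores = [1, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 0]

# Item difficulty = proportion of students who answered the item correctly
difficulty = sum(item_scores) / len(item_scores)
print(round(difficulty, 2))   # 0.53, i.e., 53%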

3. Item Analysis: Item Discrimination


Item discrimination, another component of item analysis, refers to the ability of an item to discriminate between more and less knowledgeable students. Item discrimination is calculated by finding the difference in item difficulty between two sets of students: those with above-average performance and those with below-average performance. Performance here is defined by each student's total test score, i.e., the sum of all item scores or the overall percentage correct on the test.

Ideally, the two groups are defined such that the top-performing 25% or 1/3 (33%) of the class on the test represents the above-average group and the bottom 25% or 1/3 (33%) the below-average group. In most cases for classroom tests, the number of students will be small, usually fewer than 30, so it is best to use the top 1/3 and bottom 1/3 for calculating item discrimination.

Item discrimination for a given item would be:

item discrimination = item difficulty for top-performing group - item difficulty for bottom-performing group

If the top 1/3 of students all correctly answered the chosen item, their item difficulty would be 1.00, and if 40% of the bottom 1/3 answered the item correctly, their item difficulty would be .40. The item discrimination would be:

item discrimination = 1.00 - .40 = .60



The larger the discrimination index, the better the item performs in terms of distinguishing between more and less knowledgeable students for the content domain and skill level sampled by the item. For classroom, teacher-constructed tests, discrimination indexes of .20 and greater are good when using the upper 1/4 vs. lower 1/4, and one should expect lower levels of discrimination when using larger groups as the basis of the upper and lower groups, such as upper 1/3 vs. lower 1/3, or upper 1/2 vs. lower 1/2. Some measurement specialists recommend using only the top 10 and bottom 10 performers, even if a hundred (or more) students took the test. As these recommendations reveal, there is no uniform agreement on how to define the top and bottom groups, so for your own practice, work with whatever seems best for you. The key is to have a large sample for both groups, i.e., 10 or more students in each group.
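The grouping and the discrimination calculation can be sketched in a few lines of Python; the function and the class data below are hypothetical, using the top-third vs. bottom-third split recommended above:

def discrimination(records):
    """Discrimination index using top-third vs. bottom-third groups.

    records: (item score, total test score) pairs, one per student.
    """
    ordered = sorted(records, key=lambda r: r[1], reverse=True)
    n = len(ordered) // 3                                  # size of each comparison group
    p_top = sum(item for item, _ in ordered[:n]) / n       # item difficulty, top group
    p_bottom = sum(item for item, _ in ordered[-n:]) / n   # item difficulty, bottom group
    return p_top - p_bottom

# Hypothetical class of 15: the top third all answered the item correctly
# (difficulty 1.00) and the bottom third was 40% correct (difficulty .40)
scores = [(1, 95), (1, 92), (1, 90), (1, 88), (1, 85),
          (1, 80), (0, 78), (1, 75), (0, 72), (1, 70),
          (0, 65), (1, 62), (0, 60), (1, 58), (0, 55)]
print(discrimination(scores))   # 1.00 - 0.40 = 0.60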

The discrimination index ranges from -1.00 to 1.00. Negative values indicate that less knowledgeable students are
answering the item correctly more often than more knowledgeable students. This is a signal that something is wrong
with the item and it is deficient. A common problem with such items is ambiguity, i.e., the item has more than one
correct response or no correct response.

Another method for calculating the discrimination index is to calculate the correlation between the item score and the total test score across the sample of students. This correlation is called the item-total correlation and is the same item-total correlation learned for item analysis with scales. This method is especially useful when one wishes to calculate discrimination for an item that is scored with multiple points or partial credit, such as essay or short-answer items.

Normally, one will use all students, not just the top and bottom groups, to calculate the item-total correlation. As with the discrimination index, the item-total correlation ranges from -1.00 to 1.00. The better the item discriminates, the larger the positive item-total correlation will be. Like negative discrimination indices, negative correlations indicate that the item is not discriminating properly and signal a problematic item.
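A minimal Python sketch of the item-total correlation, using the plain Pearson r formula and hypothetical scores (no statistics library assumed):

import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Item-total correlation: each student's item score vs. total test score
item = [1, 1, 0, 1, 0, 0, 1, 0]           # hypothetical 0/1 item scores
total = [92, 88, 85, 81, 77, 74, 70, 63]  # hypothetical total test scores
print(round(pearson_r(item, total), 2))   # about .44; positive, so the item discriminates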

To illustrate both methods for calculating discrimination, listed below are the total test scores and the item
performances for two groups of students. Note that all information is sorted by Total Test Score, so it is easy to identify
top and bottom performers.

Table 1
Example Discriminations: Note that one should sort the table or spreadsheet by Total Test Score

Student          Group         Item 1: Multiple Choice   Item 20: Essay          Item 20: Essay         Total Test Score
Identification                 (1 = correct,             (Partial Score out of   (Proportion Correct    (% Correct)
                               0 = incorrect)            possible total 5)       out of 5 Points)

Bill             Top 1/3       1                         4.5                     4.5/5 = .90            97
Beth             Top 1/3       0                         4.0                     4.0/5 = .80            93
Bertha           Top 1/3       1                         4.5                     4.5/5 = .90            89
Brian            Top 1/3       1                         3.5                     3.5/5 = .70            87
Brittany         Middle 1/3    1                         4.0                     4.0/5 = .80            85
Bailey           Middle 1/3    1                         3.5                     3.5/5 = .70            80
Brenda           Middle 1/3    0                         4.0                     4.0/5 = .80            79
Bonnie           Middle 1/3    1                         2.5                     2.5/5 = .50            76
Bryan            Bottom 1/3    0                         3.0                     3.0/5 = .60            75
Bart             Bottom 1/3    0                         3.5                     3.5/5 = .70            74
Barney           Bottom 1/3    1                         2.5                     2.5/5 = .50            73
Bernie           Bottom 1/3    0                         2.0                     2.0/5 = .40            66
The Item 1 difficulty for the top group is 3/4 = .75, and 1/4 = .25 for the bottom group. The item discrimination for Item 1 is:

Item 1 discrimination = .75 - .25 = .50

The item-total correlation for all 12 students uses the 1/0 scoring for Item 1 and the Total Test Score. The Pearson r = .36, which, while lower, corresponds with the above discrimination index. Both the discrimination index of .50 and the correlation of .36 tell us that students with higher test scores also performed better on Item 1, which is what we hope to find with a well-performing item.

For Item 20, the essay item, discrimination can be found by calculating the difference in mean proportion correct
between the two groups. For the Top 1/3 group, the mean item difficulty is (.90+.80+.90+.70)/4 = .825. For the bottom
1/3 group the mean item difficulty is (.60+.70+.50+.40)/4 = .55.

Item 20 discrimination = .825 - .55 = .275

The item-total correlation for Item 20 is found by correlating the Item 20 score, either raw score or proportion correct, with the Total Test Score for all students. Since Pearson r is invariant to linear transformations, it does not matter whether the raw score or the proportion score is used. For Item 20, the item-total correlation is .853. Both the item discrimination and the item-total correlation are positive, which indicates the essay item functions as it should by discriminating between more and less knowledgeable students.
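As a check, here is a minimal sketch reproducing the Table 1 figures, reusing the pearson_r function from the earlier sketch:

# Table 1 data, already sorted by Total Test Score (highest first)
item1 = [1, 0, 1, 1, 1, 1, 0, 1, 0, 0, 1, 0]                    # 0/1 multiple choice
item20 = [0.9, 0.8, 0.9, 0.7, 0.8, 0.7, 0.8, 0.5,
          0.6, 0.7, 0.5, 0.4]                                   # essay, proportion of 5 points
totals = [97, 93, 89, 87, 85, 80, 79, 76, 75, 74, 73, 66]

top, bottom = slice(0, 4), slice(8, 12)                         # top 1/3 and bottom 1/3

# Discrimination = mean difficulty of top group minus mean difficulty of bottom group
print(sum(item1[top]) / 4 - sum(item1[bottom]) / 4)             # .75 - .25 = .50
print(round(sum(item20[top]) / 4 - sum(item20[bottom]) / 4, 3)) # .825 - .55 = .275

# Item-total correlations across all 12 students
print(round(pearson_r(item1, totals), 2))                       # .36
print(round(pearson_r(item20, totals), 3))                      # .853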

Finally, there is a close relationship between item difficulty and item discrimination. The more extreme the item difficulty, whether very easy or very hard, the less the item can discriminate. When item difficulty approaches the half-way point (.50), item discrimination will most likely be maximized.
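One way to see why, offered as an illustration rather than something from the chapter: if the class is split into equal upper and lower halves, the discrimination index can be no larger than 2 x min(p, 1 - p) for an overall item difficulty of p, and that bound peaks at p = .50.

# Best case for an upper-half vs. lower-half split at overall difficulty p:
# all correct answers fall in the top half first, so the maximum possible
# discrimination is 2 * min(p, 1 - p), which is largest when p = .50
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(p, round(2 * min(p, 1 - p), 2))   # 0.2, 0.6, 1.0, 0.6, 0.2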

4. Item Analysis: Distractor Analysis


Distractor analysis enables one to determine the pattern of responses across all options of a multiple-choice
item. This is useful because it allows one to learn which are the more successful distractors. If an item is performing
appropriately, then the upper performing group should choose the correct option more frequently than any other
option, and, as noted in the discrimination discussion, should choose the correct option more often than the lower
performing group. Additionally, the lower performing group should choose distractors more often than the upper
performing group.

Table 2
Distractor Analysis of Item 1 (Numbers are percentages)

Item 1          A     B*    C     Omit
Upper 1/4       25    75    0     0
Lower 1/4       25    25    50    0
All Students    25    60    15    0

Note. The Omit category represents students who did not answer the item.
*Correct response.

Note that Item 1, illustrated in Table 2, behaves accordingly. That is, the upper group chose B more often than the lower
group, and the lower group chose distractors (options A and C) more often than the upper group.

Should the pattern illustrated in Table 2 not occur, then the item will probably need some revision. For example, if the lower group chooses the correct response more often than the upper group, then the item is most likely ambiguous. Or, if the upper group chooses one of the distractors more often than the lower group (but a larger percentage of the upper group still chooses the correct response), then that distractor needs revision. The last type of problem that may be observed is the case in which more students (both upper and lower) choose a distractor rather than the correct option. When this occurs, students view the distractor as more correct than the correct option, and the item should be carefully reviewed.
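A minimal Python sketch of tallying a distractor analysis from raw responses; the student responses and groups below are hypothetical:

from collections import Counter

# Hypothetical responses: (group, option chosen); "Omit" marks an unanswered item
responses = [("upper", "B"), ("upper", "B"), ("upper", "B"), ("upper", "A"),
             ("lower", "C"), ("lower", "B"), ("lower", "A"), ("lower", "C")]

options = ["A", "B", "C", "Omit"]
for group in ("upper", "lower"):
    counts = Counter(opt for g, opt in responses if g == group)
    n = sum(counts.values())
    row = {opt: round(100 * counts[opt] / n) for opt in options}
    print(group, row)   # percentage of the group choosing each option

With B the correct option here, the upper group favors B while the lower group spreads across the distractors, which is the healthy pattern described above.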
Distractor analysis is beneficial in learning which skills and content students are having the greatest and least success
with. Oftentimes items will contain distractors that represent common mistakes, and when students select such
distractors with great frequency, this is a clear indication that further instruction is necessary. Thus, proper
interpretation of distractors may lead to significant alterations to instruction.

Note: The three components of item analysis (difficulty, discrimination, and distractor analysis) should be viewed
cautiously when one has a small sample of students. Ideally large groups, say 50 or greater, are needed for reliable
analysis. But when small numbers are present, an analysis should be viewed as preliminary, and one should collect more
data from additional classes over time.

5. Detecting Ambiguities That are Causing Students Difficulty


The item difficulty index indicates which items are causing students difficulty. However, a more detailed
examination of item analysis may reveal additional problems.

5a. Interpreting an Item Analysis


If an item is causing some difficulty, then one must next determine whether the item is nevertheless able to discriminate between more and less knowledgeable students. The third step is to examine the distractor analysis. Usually one need not examine the distractor analysis in detail if the item has suitable difficulty and discrimination. In sum, to interpret an item analysis, one must first examine item difficulty, then item discrimination, and finally the distractor analysis.

When an item has low discrimination and moderate difficulty, then the item is most likely ambiguous and should be
revised, or perhaps instruction was less than adequate and should be corrected. When examining an item, one should
consider all aspects: stem, correct option, and distractors. Moreover, one should also ensure that the item reflects the
desired capability and performance objective.

5b. Using Student Input to Interpret an Item Analysis


One should first perform an item analysis of all items on the test. Once items are identified as questionable, based upon the item analysis, the instructor should next survey students to find out why they selected various distractors, or why the item posed difficulty for them. Often this discussion will reveal either student misunderstandings or problems with instruction. In addition, this communication may reveal problems inherent in the item.

6. Item Analysis for Item Formats Other than Multiple-Choice


It is possible to perform an item analysis for (a) Completion and Short-Answer Items, (b) Essay Items, and (c)
Alternate-Choice Items. The item analysis will typically consist of difficulty and discrimination. Distractor analysis is
unique to multiple-choice items.

6a. Completion and Short-Answer Items


Since these items can usually be scored as either 1 (correct) or 0 (incorrect), calculation of difficulty and
discrimination is the same as discussed earlier. If one assigns partial points for short-answer items, then one may
calculate both difficulty and discrimination using the procedures described below for essay items.

6b. Essay Items


With essay items one may incorporate partial grades (e.g., 4 out of 5 possible points), and this makes calculation
of item difficulty more cumbersome.

One method for calculating difficulty is to determine a minimally acceptable response level, in terms of points awarded, and calculate the percentage of students who scored above (or below) this level. For example, an essay item may be worth a total of 10 points, yet one may decide that a minimum of 6 points is needed to be considered acceptable performance on the essay. One then calculates the percentage of students who scored 6 or more points, and this percentage represents the item difficulty. Another approach is that illustrated above in section 3 – calculate the mean proportion of points earned for all students who completed the essay. For example, if the essay is worth 10 points, and the mean points earned across all students is 7.3, then the item difficulty is 7.3/10 = .73.
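Both difficulty methods for a partial-credit item can be sketched in a few lines of Python; the essay scores below are hypothetical:

# Hypothetical essay scores out of 10 possible points
essay_scores = [9, 8, 8, 7.5, 7, 7, 6, 5, 5, 4]
max_points, cutoff = 10, 6   # 6 or more points counts as acceptable performance

# Method 1: proportion of students at or above the acceptable level
print(sum(s >= cutoff for s in essay_scores) / len(essay_scores))   # 7 of 10 = .70

# Method 2: mean proportion of points earned
print(sum(essay_scores) / (len(essay_scores) * max_points))         # 66.5/100 = .665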

Item discrimination may be calculated using the discrimination technique described earlier. After determining the top and bottom performing groups for the test as a whole, one may then calculate, for each group, the percentage of students awarded 6 or more points on the essay item. Thus, for example, the top performing group may have had 85% receive 6 or more points for the essay, and the bottom group may have had 53% receive 6 or more, so the item discrimination would be .85 - .53 = .32. Another alternative is to use the approach illustrated above in section 3, Table 1. Find the mean proportion correct for both upper and lower groups and then find the difference. For Item 20 in Table 1, the upper 1/3 of students had a mean difficulty of (.90+.80+.90+.70)/4 = .825 and the bottom 1/3 group had a mean difficulty of (.60+.70+.50+.40)/4 = .55, so the discrimination index was .825 - .55 = .275.

A better approach, however, is to use the item-total correlation. That is, calculate the correlation between the points
received for the essay item by each student with the total score received by each student on the test. The higher the
positive correlation, the better the item discriminates.
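The cutoff-based discrimination can likewise be sketched briefly; the group scores below are hypothetical:

# Hypothetical essay scores (out of 10) for the top and bottom test-score groups
top_group = [9, 8, 7, 7, 6, 5]
bottom_group = [7, 6, 5, 5, 4, 3]
cutoff = 6

# Proportion of each group reaching the acceptable level, then the difference
p_top = sum(s >= cutoff for s in top_group) / len(top_group)            # 5/6, about .83
p_bottom = sum(s >= cutoff for s in bottom_group) / len(bottom_group)   # 2/6, about .33
print(round(p_top - p_bottom, 2))                                       # 0.50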

6c. Alternate-Choice Items


Since these items can usually be scored as either 1 (correct) or 0 (incorrect), calculation of difficulty is the same
as discussed for multiple-choice items, and the procedure for calculating discrimination is identical to that discussed
above for multiple-choice formats.

Self-Test: Using Item Analysis to Improve Test Items

Table 3
An item analysis for two items.

Item 1          A     B     C*    D     E     Omit
Upper 1/3       4     6     74    5     11    0
Lower 1/3       21    12    38    25    4     0
All Students    12    10    54    15    9     0

54%, or 15 of 28 students taking the test, correctly answered the item; 36 percentage points separate the upper and lower groups on the correct answer (option C).

Item 2          A     B*    C     D     E     Omit
Upper 1/3       28    42    15    0     15    0
Lower 1/3       13    51    10    0     26    0
All Students    20    46    13    0     21    0

46%, or 13 of 28 students taking the test, correctly answered the item; -9 percentage points separate the upper and lower groups on the correct answer (option B).

*Correct response.

Use Table 3 to answer items 1 through 14.

1. What is the difficulty of item 1?


2. What is the difficulty of item 2?
3. What is the discrimination of item 1?
4. What is the discrimination of item 2?
5. Which options, distractors, within item 1 are performing appropriately?
6. Which options within item 2 are performing appropriately?

Items 7 through 14 are interpretations of the information in Table 3. Indicate (yes or no) whether each
interpretation is correct.

7. Item 1 is easier than item 2.


8. Item 1 is more ambiguous than item 2.
9. The analysis suggests the discrimination of item 1 would be improved if option E were revised.
10. The analysis suggests that the discrimination of item 2 would be improved if option E were revised.
11. If reviewing the wording of item 1 indicates that option A is a good distractor, the teacher should evaluate the
adequacy of instruction relevant to this item.
12. If reviewing the wording of item 2 indicates that option A is a good distractor, the teacher should evaluate the
adequacy of instruction relevant to this item.
13. Overall, item 1 appears to be well written.
14. Overall, item 2 appears to be well written.

Items 15 through 19 suggest some benefits of using student input to help interpret an item analysis. Indicate
(yes or no) whether each of these statements represents a benefit.

15. Test scores can be more readily corrected for student guessing.
16. Ambiguity within a test item can be identified.
17. Ambiguity within instruction given students can be identified.
18. The reliability of the test can be estimated.
19. Misconceptions learned by students can be addressed.
