Unit 4: Qualities of A Good Test: Validity, Reliability, and Usability
The document discusses key qualities of a good test: validity, reliability, and usability. It defines four types of validity (face, content, criterion-related including concurrent and predictive, and construct validity) and how each ensures a test accurately measures the desired objectives. Reliability refers to a test's consistency and is categorized into scorer, content, and temporal reliability. Usability considers practical factors like a test's availability, cost, administration, scoring, and interpreting results.
I/ Validity
Meaning
• Validity refers to the capacity of a test to generate accurate and relevant information (data) about what is emphasized in the objectives of interest.
• In short, validity is the accuracy of a test in measuring the desired objectives (learning outcomes, in the context of the teaching-learning process).
• For example, a physics test should accurately measure the desired knowledge, skills, or competence in physics rather than unrelated abilities such as language ability.
• There are four types of validity: face validity, content validity, criterion-related validity, and construct validity.
Face Validity
• This refers to the reasonableness of the items (questions) of a test with regard to the background of the test takers it is meant for.
• Here reasonableness is judged by the relevance of the items to the background of the test takers, the adequacy of the items in generating the desired information, and the coverage of the test.
• A test may have face validity without having actual validity.
Content Validity
• Content validity is concerned with the extent to which the items in a test are related to the contents of the course, chapter, or unit covered in the teaching-learning process.
• There should be no items from content areas that were not touched upon in instruction.
Criterion-Related Validity
• This type of validity is concerned with the extent to which scores on a test correlate with scores on another test.
• Here there are two tests.
• One is called the predictor and the other the criterion.
• There are two types of criterion-related validity: concurrent validity and predictive validity.
Concurrent Validity
• This refers to the extent to which a newly developed test in a certain area of interest is related to the existing (old) test in the same area. In this case the existing test is the criterion.
• This type of validity is determined when an existing test that has high validity and reliability needs to be replaced by a new one because of difficulties in using it.
Predictive Validity
• This is concerned with the capacity of a test to predict the future success of an individual in a certain area of interest.
• For example, in the Ethiopian context, results on the ESLCE may predict how successful high-scoring students will be in their academic careers in colleges and universities.
• Here the ESLCE is the predictor, whereas a test given later in the universities to assess the students' academic performance is the criterion.
• The higher the correlation between scores on the predictor and scores on the future criterion test, the higher the predictive validity of the predictor.
• Predictive validity can therefore be determined only after a long period of time.
Construct Validity
• This type of validity is concerned with the degree to which a test result tells us something meaningful about a person's trait (relatively permanent behavior), such as achievement motivation, degree of sociability, or introversion.
• Construct validity is determined by correlating measures of the construct with established measures of an observable criterion to which the construct is highly related.
• For example, a researcher interested in the construct validation of a test of achievement motivation might correlate scores from that test with scores from a test of scholastic aptitude, since achievement motivation is known to be highly related to academic achievement.
II/ Reliability
• Reliability refers to the capacity of a test to produce consistent (invariable) results when it is taken by different norm groups or by the same group at different times.
• There are three types of reliability: scorer reliability, content reliability, and temporal reliability.
• Scorer Reliability: This type of reliability is concerned with the agreement between two markers scoring the same answer (also called inter-scorer reliability) or with the consistency of the scores given to the same answer by the same scorer on different occasions (also called intra-scorer reliability).
• Content Reliability: This refers to the capacity of the items in a test to measure related or similar areas of behavior.
• Temporal Reliability: This is concerned with the consistency (stability) of test results over time.
• N.B. The reliability of a test can be determined by one or a combination of the following five common methods: the test-retest method, the parallel-form method, the split-half method, the Kuder-Richardson formula 20 and 21 methods, and Cronbach's alpha method.
III/ Usability of a Test
• This refers to all the practical considerations that go into our decision to use one particular test rather than another.
• Such practical considerations include the availability of the test, its cost, its mode of administration, its scoring procedures, and the interpretation of test scores.
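Several of the coefficients named in sections I and II can be computed directly from score data. The Python sketch below is only an illustration under stated assumptions: the function names, the data layout (one row per test taker, one column per item), and the odd/even split of items are choices made here, not part of the original unit. It shows a Pearson correlation (usable for predictive validity or the test-retest method), a split-half estimate with the Spearman-Brown correction, Kuder-Richardson formula 20, and Cronbach's alpha.

```python
from statistics import mean, pvariance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores.

    Can relate predictor scores to criterion scores (predictive validity)
    or first and second administrations (test-retest reliability).
    """
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half(item_scores):
    """Split-half reliability with the Spearman-Brown correction.

    Assumption: items are split into odd- and even-positioned halves.
    """
    first = [sum(row[0::2]) for row in item_scores]
    second = [sum(row[1::2]) for row in item_scores]
    r = pearson_r(first, second)
    return 2 * r / (1 + r)


def cronbach_alpha(item_scores):
    """Cronbach's alpha from per-item variances and total-score variance."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[i] for row in item_scores]) for i in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(item_scores):
    """Kuder-Richardson formula 20 for items scored 0 (wrong) or 1 (right)."""
    n, k = len(item_scores), len(item_scores[0])
    p = [sum(row[i] for row in item_scores) / n for i in range(k)]
    pq = sum(pi * (1 - pi) for pi in p)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - pq / total_var)


# Hypothetical data: 5 test takers, 4 right/wrong items.
scores = [
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
print(split_half(scores), kr20(scores), cronbach_alpha(scores))
```

For items scored 0 or 1, KR-20 and Cronbach's alpha give the same value when both use the population variance, as this sketch does. The sketch assumes the scores vary across test takers; if every test taker has the same total, the variance is zero and the formulas are undefined.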