Arts Assessment Glossary
This glossary was developed by the members of the SCASS/Arts Education Assessment Consortium. The terms below are words and phrases in general use in the field of assessment.
Accommodations
approved/standardized administrative or scoring adjustments (e.g., large print or Braille test booklets, individual or small group administrations, reading the test to the student) made for special populations taking standardized assessments.

Accountability testing
Using student achievement tests to measure the effectiveness of an educational program. Usually summative in nature and in the form of a state or other large-scale test designed to conform to psychometric standards, an accountability test purports to assign responsibility for the success or failure of an educational program or system by demanding that schools demonstrate the impact and effectiveness of educational programs in order to justify the money invested in education. Accountability testing is designed to provide achievement data that are used to evaluate and presumably improve the system.

Achievement test
a test designed to measure students' "school taught" learning, as opposed to their initial aptitude or intelligence.

Alternative assessment
assessments other than traditional multiple-choice tests; most often used to describe performance assessments or other assessments that provide more feedback about student learning than whether the answer is correct or incorrect. (Also see Accommodations.)

Analytic scoring
A method of scoring performance assessments that yields multiple scores for the same task/performance. Performance is separated into major components, traits, or dimensions, and each is independently scored (e.g., a particular sample of a student's writing may be assessed as grammatically correct at the same time it is assessed as poorly organized). Analytic scoring is especially effective as a diagnostic tool.

Anchor
(also called exemplars or benchmarks) a sample of student work (product or performance) used to illustrate each level of a scoring rubric; critical for training scorers of performances since it serves as a standard against which other student work is compared.

Aptitude test
a test which uses past learning and ability to predict what a person can do in the future; aptitude tests depend heavily on out-of-school experiences rather than in-school learning. (Also see intelligence test.)

Assessment
The process of collecting and analyzing data for the purpose of evaluation. The assessment of student learning involves describing, collecting, recording, scoring, and interpreting information about performance. A complete assessment of student learning should include measures with a variety of formats as developmentally appropriate. Assessments and the tests they use are usually classified by how the data are used: formative, benchmark or interim, or summative.

Authentic assessments
assessments that emulate the performance that would be required of the student in real-life situations.

Benchmarks
identifiable points on a continuum toward a goal or standard. The term may be used to describe content standards when interim targets (benchmarks) have been set by age, grade, or developmental level; the term is also used interchangeably with "anchor" papers or performances which illustrate points of progress on an assessment scale (i.e., student works which exemplify the different levels of a scoring rubric).

CIA
acronym for curriculum, instruction, and assessment
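As a rough illustration of the analytic scoring entry above, the sketch below scores a single music performance on separate dimensions and also collapses the same ratings into a single overall number; the dimension names, the 1-4 point scale, and the simple average are invented for this example and are not part of any SCASS rubric.

```python
# Illustrative only: analytic scoring assigns an independent score to each
# dimension of one performance; a single overall impression score behaves
# more like the holistic method defined later in this glossary.
# Dimension names and the 1-4 scale are hypothetical.

RUBRIC_DIMENSIONS = ["tone quality", "rhythmic accuracy", "interpretation"]

def analytic_score(ratings: dict[str, int]) -> dict[str, int]:
    """Return one score per rubric dimension (1 = beginning ... 4 = exemplary)."""
    return {dim: ratings[dim] for dim in RUBRIC_DIMENSIONS}

def overall_score(ratings: dict[str, int]) -> float:
    """Collapse the same ratings into a single overall number."""
    return sum(ratings.values()) / len(ratings)

performance = {"tone quality": 4, "rhythmic accuracy": 2, "interpretation": 3}
print(analytic_score(performance))  # {'tone quality': 4, 'rhythmic accuracy': 2, 'interpretation': 3}
print(overall_score(performance))   # 3.0
```

The separate dimension scores are what make analytic scoring useful diagnostically; the single averaged number hides the weak rhythmic accuracy that a teacher would want to address.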
Cohort
a group of students whose progress is followed and measured at different points in time.

Competency test
a test intended to verify that a student has met standards (usually minimal) of skills and knowledge and therefore should be promoted, graduated, or perhaps deemed competent.

Constructed-response assessment
a form of assessment that calls for the student to generate the entire response to a question, rather than choosing an answer from a list (e.g., paper-and-pencil responses on essay or short answer tests, or performances which may be drawn, danced, acted out, performed musically, or provided in any other way to exhibit particular skills or knowledge). (Also referred to as open-response and open-ended assessments.)

Context
the surrounding circumstances or environment in which an assessment takes place (e.g., embedded in the instruction or under standardized conditions, such as part of a large-scale assessment).

Cornerstone assessment tasks
curriculum-embedded assessment tasks that are intended to engage students in applying their knowledge and skills in an authentic context. These tasks are described by their originator, Jay McTighe, as:
• curriculum embedded (as opposed to externally imposed);
• recurring across the grades, becoming increasingly sophisticated over time;
• establishing authentic contexts for performance;
• calling for understanding and transfer via genuine performance;
• used as rich learning activities or assessments;
• integrating 21st century skills (e.g., critical thinking, technology use, teamwork) with subject area content;
• evaluating performance with established rubrics;
• engaging students in meaningful learning while encouraging the best teaching;
• providing content for student portfolios so that students graduate with a resume of demonstrated accomplishments rather than simply a transcript of courses taken.

Criteria
(sometimes used as a synonym for traits or attributes) the rules or guidelines used for categorizing or judging; in arts assessment, the rules or guidelines used to judge the quality of a student's performance. (Also see rubric, scoring guide, and scoring criteria.)

Criterion-referenced assessment
an assessment designed to measure performance against a set of clearly defined criteria. Such assessments are used to identify student strengths and weaknesses with regard to specified knowledge and skills (which are the goals or standards of the instruction). Synonyms include standards-based or standards-referenced, objective-referenced, content-referenced, domain-referenced, and universe-referenced.

Curricular alignment
the degree to which a curriculum's scope, sequence, and content match standards, instruction, assessment, or instructional resources.

Cut score
(also called performance standard) a performance level or numerical score established by the assessment system to describe how well the student performed. The cut score can be manipulated to increase or decrease the number "passing" or "failing" a test. (Also see standard-setting.)

Descriptors
explanations that define the levels of scoring scales. (Also see criteria.)

Dimension
specific traits, characteristics, or aspects of performance which are fairly independent of each other and can be scored separately (e.g., rhythm and melody can be scored separately for the same musical performance).

Disaggregate
(as in disaggregated data) pulling information apart (e.g., looking at the performance of various sub-groups instead of only the performance of the large group).

Educational outcome
an educational goal, expectation, or result that occurs at the end of an educational program or event (usually a culminating activity, product, or other measurable performance).

Enhanced/extended multiple-choice assessments
selected-response assessments with additional parts (for more points); the additional part often requires students to justify their answers, show their work, or explain why they marked a particular option.

Essay test
a paper-and-pencil test that requires students to construct their entire brief or extensive responses to the question(s); should be limited to measuring higher levels of learning.

Extended-response assessments
an essay question or performance assessment which requires an elaborated or graphic response that expresses ideas and their interrelationships in a literate and organized manner.

Evaluation
a judgment about the worth or quality of something. In education, data from tests, tasks, or performances are used to make judgments about the success of the student or program.

Formative Assessment
(Sometimes referred to as Assessment for Learning) A process used by teachers and students during instruction that provides feedback to adjust ongoing teaching and learning to improve students' achievement of intended instructional outcomes. Formative assessments are short-interval and usually classroom-based, providing immediate information that teachers and students use to inform the instructional process and determine what comes next in the learning process.
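As a small, hypothetical illustration of the cut score entry above, the sketch below applies two different cut scores to the same set of results to show how the choice of cut point changes the passing rate; the scores and cut points are invented for the example.

```python
# Hypothetical scale scores for ten students; not real assessment data.
scores = [310, 325, 340, 355, 360, 372, 388, 395, 401, 420]

def passing_rate(scores: list[int], cut_score: int) -> float:
    """Percentage of students scoring at or above the cut score."""
    passing = [s for s in scores if s >= cut_score]
    return 100 * len(passing) / len(scores)

# The same performances, judged against two different cut scores.
print(passing_rate(scores, 350))  # 70.0
print(passing_rate(scores, 380))  # 40.0
```

Nothing about the student work changes between the two lines; only the standard-setting decision does, which is why the glossary notes that a cut score can be manipulated to raise or lower the number passing.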
Generalizability
the degree to which the performances measured by a set of assessment items/tasks are representative of the entire domain being assessed (e.g., is one performance assessment sufficient for drawing conclusions about a student's ability to critique works of art?); may also be an issue in drawing a sample of students from a population (i.e., the degree to which a sample of students is representative of the population from which it is drawn).

Grade equivalent
a score, available from some standardized tests, which describes the performance of students according to how it resembles the performance of students in various grades. A GE of 5.5 means that the student is performing like a student in the fifth month of the fifth grade.

Grading
a rating system for evaluating student work; grades are usually letters or numbers, and their meaning varies widely across teachers, subjects, and systems.

High-stakes testing
any testing program for which the results have highly significant consequences for students, teachers, schools, and/or districts. These summative tests are frequently used as accountability devices to determine effectiveness or success.

Median
a measure of central tendency which identifies the point on the scale that separates a group of scores so that there is an equal number of scores above and below it.

Metacognition
the ability to think about one's own thinking; the knowledge that individuals have of their own thinking processes and strategies and their ability to monitor and regulate those processes.

Multiple-choice test
a test consisting of items (questions or incomplete statements) followed by a list of choices from which students have to select the correct or best response.

Multiple measures
the use of a variety of assessments to evaluate performance in a subject area (e.g., using multiple-choice items, short answer questions, and performance tasks to assess student achievement in a subject); the use of multiple measures is advocated to obtain a fair and comprehensive measurement of performance.

Mode
a measure of central tendency which identifies the most frequent score in a group of scores (e.g., in the group of scores 1, 2, 8, 9, 9, 10, the mode is 9).
Holistic method
a scoring method which assigns a single score based on an overall appraisal or impression of performance rather than analyzing the various dimensions separately. A holistic scoring rubric can be specifically linked to focused (written) or implied (general impression) criteria. Some forms of holistic assessment do not use written criteria at all but rely solely on anchor papers for training and scoring.

Intelligence tests
tests designed to measure general cognitive functioning; group or individually administered tests used to determine mental age as compared to chronological age (MA/CA x 100 = IQ [intelligence quotient]); i.e., the "average" IQ of the population is 100. Some intelligence tests do not calculate mental age but compare an individual's performance to the performance of a norm group at various developmental levels, generating verbal and performance scores with a mean or "average" score of 100.

Item analysis
a statistical analysis of the items on a selected-response test to determine the relationship of each item to the test's validity and reliability as a whole. The number and nature of the students selecting each option are analyzed.

Matrix sampling
a process used to estimate the performance of large groups through testing a representative sample of the students. Each student in the sample may be given only a small segment of the total assessment.

Mean
the arithmetic average of a group of scores; one of three measures of central tendency, a way to describe a group of scores with a single number.

Norm
the midpoint or "average" score for the group of students to which a norm-referenced test was initially administered (the norm group). By design, 50% of the students score below and 50% above this score.

Norm group
a group of students that is first administered a standardized norm-referenced test by its developers in order to establish scores for interpreting the performance of future test-takers.

Norm-referenced test
a standardized test which compares the performance of students to an original group that took the test (the norm group); results are usually reported in terms of percentile scores (e.g., a score of 90 means that the student did better than 90% of the norm group).

Normal curve equivalent (NCE)
a normalized standard score used to compare scores across tests with different scales and/or between students on the same test (since arithmetic manipulations should not use percentiles); it has a mean of 50 and a standard deviation of 21.06 and is often required for reporting by federal funding agencies such as Title I.

Open-ended assessments
constructed assessments (frequently tasks or problems) that require students to generate a solution to a problem for which there is no single correct answer (e.g., create a drawing that uses symbols of the Renaissance).

Open-response assessments
constructed-response assessments (ones for which students must construct the entire answer and show their work) that have a single correct answer but multiple possible methods of solution.
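Because the NCE entry above rests on the idea that percentile ranks should not be averaged directly, the sketch below shows the conventional conversion of a percentile rank to a normal curve equivalent (NCE = 50 + 21.06 x z, where z is the normal deviate for that percentile). Treat the formula and the example values as illustrative background rather than as part of this glossary.

```python
from statistics import NormalDist

def percentile_to_nce(percentile: float) -> float:
    """Convert a percentile rank (1-99) to a normal curve equivalent.

    NCEs are normalized scores with a mean of 50 and a standard deviation
    of 21.06, so equal NCE intervals represent equal distances on the
    normal curve.
    """
    z = NormalDist().inv_cdf(percentile / 100)  # normal deviate for the percentile
    return 50 + 21.06 * z

print(round(percentile_to_nce(50), 1))  # 50.0 (the 50th percentile stays at the middle)
print(round(percentile_to_nce(90), 1))  # 77.0
print(round(percentile_to_nce(10), 1))  # 23.0
```

Unlike raw percentiles, these normalized values can be meaningfully averaged across students or tests, which is why agencies such as Title I have required NCEs for reporting.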
Percentile
a statistic provided by standardized norm-referenced tests which describes the performance of a student as compared to that of the norm group. The range is 1 to 99, with 50 denoting average performance. A student scoring at the 65th percentile performed better than, or as well as, 65% of the norm group.

Performance assessment
a task/event/performance designed to measure a student's ability to directly demonstrate particular knowledge and skills. E.g., a student may be asked to demonstrate some physical or artistic achievement: play a musical instrument, create or critique a work of art, or improvise a dance or a scene. These kinds of assessments (e.g., tasks, projects, portfolios, etc.) are scored using rubrics: established criteria for acceptable performance.

Portfolio
a purposeful collection of student work across time which exhibits a student's efforts, progress, or level of proficiency. Types of portfolios include showcase (best work), instructional, assessment (used to evaluate the student), and process or project (shows all phases in the development of a product or performance).

Primary trait scoring
A type of rubric scoring constructed to assess a specific trait, skill, or format, or the impact on a designated audience. (Also see analytic scoring.)

Project
a type of performance assessment which is complex, usually requiring more than one type of activity, process, or product for completion.

Quartile
a way of describing the position of a score on a norm-referenced test; the score falls in one of four groups: 0-25th percentile, 26th-50th percentile, etc.

Quintile
a way of describing the position of a score on a norm-referenced test; the score falls in one of five groups: 0-20th percentile, 21st-40th percentile, etc.

Range
the most rudimentary method of describing how much a group of scores vary; range is determined by subtracting the lowest from the highest score in the group.

Rating scale
a scale used to evaluate student learning using a gradation of numbers or labels; a Likert rating scale is frequently used to measure attitudes or perceptions.

Reliability
a measure of the consistency of an assessment across time, judges, and subparts of the assessment (assuming no real change in what is being measured).

Rubric
(sometimes referred to as a scoring guide or scoring criteria) an established, ordered set of criteria for judging student performance/products; it includes performance descriptors of student work at various levels of achievement.

Sampling
a way to get information about a large group by examining a smaller representative number of the group (the sample).

Scale score
a score indicating an individual's performance on a standardized test, which allows comparisons across sub-groups and time. (E.g., one could use scale scores to compare test results among classes, schools, and districts, or across grades from year to year.)

Scaffolded assessments
a set of context-dependent assessments which are sequenced to measure ascending levels of learning; the set usually contains a variety of item formats (from multiple-choice to performance tasks) about a single stimulus (e.g., a specific set of materials: a particular situation, scenario, problem, or event). Since these kinds of assessments can measure a variety of kinds of learning, they provide the opportunity for diagnosis of instruction and identification of student strengths and weaknesses.

Scoring criteria
the rules or guidelines used to assign a score (a number or a label) indicating the quality of a performance; in the analytic scoring of a performance, different rules may be applied to different dimensions or traits of the performance.

Scoring guide
directions for scoring and/or interpreting scores; the guide may include general instructions for raters, training notes, rating scales, rubrics, and student work.

Selected-response items
a kind of test item for which students have to select the best or correct answer from a list of options (multiple-choice, etc.) or indicate the truth or falsity of a statement.

Self-assessment
collecting data about one's own performance for the purpose of evaluating it. Self-evaluation may include the comparison of one's own performance against established criteria, change in performance over time, and/or a description of current performance.

Standard deviation
a measure of the variability of a group of scores. When the standard deviation is high, students are performing very differently from each other; when it is low, students are performing similarly to one another.

Standard error of measurement
a statistic used to indicate the consistency and reliability of a measurement instrument; a large standard error of measurement indicates that we have less confidence in the obtained score.
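To connect the range, standard deviation, reliability, and standard error of measurement entries above, the sketch below computes each for a small invented score set. The formula used for the standard error of measurement (standard deviation x the square root of 1 minus reliability) is the usual classical test theory estimate, which this glossary does not spell out, and the reliability coefficient shown is purely hypothetical.

```python
import math
import statistics

# Invented scores for one class; not real assessment data.
scores = [62, 70, 74, 78, 81, 85, 88, 94]

score_range = max(scores) - min(scores)  # highest score minus lowest score
std_dev = statistics.pstdev(scores)      # variability of this group of scores

# Classical test theory estimate of the standard error of measurement,
# assuming a (hypothetical) reliability coefficient for the instrument.
reliability = 0.90
sem = std_dev * math.sqrt(1 - reliability)

print(score_range)        # 32
print(round(std_dev, 2))  # 9.63
print(round(sem, 2))      # 3.05
```

Reading the last number against the reliability entry above: the higher the reliability, the smaller the standard error of measurement, and the more confidence we can place in any single obtained score.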
Standards
Three types of educational standards are frequently used in education today:
1. Content standards specify what students should know and be able to do in a specific content area---the essential knowledge, skills, processes, and procedures students must learn and be able to demonstrate. They answer the question: "What should be learned in this subject?" Student standards have been developed for periods of time ranging from individual grade levels to lifelong learning.
2. Performance standards specify the degree or quality of learning students are expected to demonstrate in the subject. They answer the question: "How good is good enough?" The national standards for the arts use the term "achievement standards" to avoid confusion between arts performance and performance assessment. (Some states refer to established levels of proficiency instead of performance standards.)
3. Opportunity-to-learn standards specify what schools must provide to enable students to meet content and performance standards.

Standards-based instruction
instruction designed, taught, and assessed using student standards (achievement targets).

Stanine
A standard 9-point scale used to report the results of norm-referenced tests in order to allow comparison of scores across students, schools, districts, tests, grades, etc. The mean is 5 and the standard deviation approximately 2. Stanines of 1-3 are considered below average; 4-6, average; and 7-9, above average.

Standardized test
A test administered to a group of persons under the same specific conditions so student results can be fairly compared.

Summative Assessment
The effort to summarize student learning at a particular point in time, such as the end of a chapter, unit, grading period, semester, year, or course.

Test
A sample of behavior or performance administered in order to provide a basis for inferences about a larger subject area or domain of study. E.g., a teacher may administer a 30-minute test to provide evidence of the student's learning for the last two weeks or for a particular unit of instruction. The test may be norm- or criterion-referenced, traditional (e.g., multiple-choice, short answer, essay, etc.), or performance-based. A teacher-made test is one prepared and administered by the teacher, usually for use in the classroom.

Validity
A characteristic of a measure which refers to its ability to measure what it is intended to measure AND do so reliably (i.e., measure consistently across time, judges, and sub-parts). A valid measure is both accurate and consistent; e.g., a bathroom scale may record 100 pounds every time a person gets on it, but if he or she actually weighs 120, the scale is reliable but not valid. Types of validity include:

Content validity—The assessment has content validity if it measures the content or area it intends to measure.

Concurrent validity—The assessment has concurrent validity if it is correlated with other measures of that particular content or area.

Predictive validity—The assessment has predictive validity if it predicts later actual performance of the individual in that subject or area. Predictive validity is related to generalizability.
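As a hypothetical illustration of the concurrent validity entry above, the sketch below correlates scores from a new arts assessment with scores from an established measure of the same content for the same students. The data are invented, and the working assumption that a higher positive correlation supports concurrent validity comes from common measurement practice, not from this glossary.

```python
from statistics import correlation  # Pearson correlation (Python 3.10+)

# Invented paired scores for eight students: a new arts assessment and an
# established measure of the same content area.
new_assessment = [55, 62, 68, 71, 75, 80, 86, 92]
established_measure = [58, 60, 70, 69, 78, 82, 84, 95]

r = correlation(new_assessment, established_measure)
print(round(r, 2))  # ~0.98: the two measures rank students very similarly,
                    # which supports the new assessment's concurrent validity
```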