Lecture: Evaluation Methods & Outcome Measures
Dr. Mahdi
Outcome measures
An outcome measure is a tool used to assess a patient’s current status. Outcome
measures may provide a score, an interpretation of results and at times a risk
categorization of the patient. Prior to providing any intervention, an outcome
measure provides baseline data. The initial results may help determine the course
of treatment intervention. Once treatment has commenced, the same tool may be
used in serial assessments to determine whether the patient has demonstrated
change. With the move towards Evidence Based Practice (EBP) in health care,
outcome measures provide credible and reliable justification for treatment on an
individual patient level. The results from outcome measures may also be grouped
for aggregated analysis focused on determining quality of care. When outcome
measures are used in an aggregated data situation to compare results, a risk
adjustment process is required to fairly compare results.
Classification
Outcome measures that we use in clinical practice are divided into four categories:
1. Self-report measures
2. Performance-based measures
3. Observer-reported measures
4. Clinician-reported measures
Self-report measures are typically captured in the form of a questionnaire. The
questionnaires are scored by applying a predetermined point system to the
patient's responses. Although self-report measures seem subjective in nature, self-
report measures objectify a patient's perception. Historically, the questionnaires
required that either a therapist interviewed the patient or the patient
independently completed the questionnaire. Self-report outcome measures that
use paper and pencil for completion are considered a fixed-form questionnaire.
Computer-based or electronic self-report measures are also available. Electronic
measures may be fixed-form or adaptive. Computerized adaptive testing is a
method of testing that selects each subsequent question based on the
patient's previous responses. Questionnaires in which the patient reports on
health or physical function are known as patient-reported outcomes (PROs).
PROs can be categorized as disease specific or generic. PROs have been defined as
"any report of the status of a patient's health condition that comes directly from
the patient, without interpretation of the patient's response by a clinician or
anyone else."
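The adaptive idea described above (each next question depends on the previous responses) can be illustrated with a minimal sketch. This is a simplified heuristic for teaching purposes, not the item response theory models that real computerized adaptive tests use. The item bank, the 0-100 difficulty values, the step sizes and the function names are all hypothetical.

```python
def next_item(item_bank, asked, ability):
    """Pick the unused item whose difficulty is closest to the current estimate."""
    remaining = [name for name in item_bank if name not in asked]
    return min(remaining, key=lambda name: abs(item_bank[name] - ability))


def run_adaptive_test(item_bank, answer_fn, max_items=3, start=50.0, step=20.0):
    """Ask up to max_items questions, adapting to each response.

    item_bank: dict mapping item text -> hypothetical difficulty (0-100)
    answer_fn: callable returning True if the patient reports they can do the item
    """
    ability = start
    asked = []
    for _ in range(min(max_items, len(item_bank))):
        item = next_item(item_bank, asked, ability)
        asked.append(item)
        # Move the estimate up after a "can do" response, down otherwise,
        # and halve the step so the estimate settles.
        ability += step if answer_fn(item) else -step
        step /= 2
    return asked, ability


# Hypothetical item bank; difficulties are made up for illustration.
bank = {
    "walk across a room": 10,
    "climb one flight of stairs": 30,
    "walk one mile": 50,
    "run a short distance": 65,
    "take part in strenuous sports": 90,
}

# Simulated patient who can do every item with difficulty 40 or less.
asked, estimate = run_adaptive_test(bank, lambda item: bank[item] <= 40)
print(asked, estimate)
```

A fixed-form questionnaire would ask all five items in the same order for every patient; the adaptive version asks fewer items and concentrates them near the patient's level.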
Performance-based measures require the patient to perform a set of
movements or tasks. Scores for performance-based measures can be based on
either an objective measurement (e.g., time to complete a task) or a qualitative
assessment that is assigned a score (e.g., normal or abnormal mechanics for a
given task).
Performance-based measures and patient reported measures both capture a
current status. These measures do not typically equate with each other.
Performance-based measures tend to bring to light physiologic factors. Patient
reported outcome measures may capture a patient's perception, beliefs, social
factors and/or health factors.
Observer-reported measures are measurements completed by a parent,
caregiver or someone who regularly observes the patient on a daily basis.
Clinician-reported measures are measurements that are completed by a
health care professional. The professional uses clinical judgement and reports on
patient behaviors or signs that are observed by the professional.
Statistical Analysis
The psychometric properties of an outcome measure need to be taken into account
whenever the measure is used.
Psychometric properties are the intrinsic properties of an outcome measure.
Ideally, the psychometric properties of an outcome measure used in practice
should have been developed and tested through a series of research studies. These
properties include validity, inter-rater reliability, intra-rater reliability,
responsiveness, ceiling effects, floor effects and minimal clinically
important difference.
Validity refers to how accurately the test measures what it is supposed to
measure.
Inter-rater reliability is the consistency of the results of the measure when
two different people evaluate the same subject. With performance-based
measures, if two physiotherapists scored the performance, high inter-rater
reliability would mean that both arrived at similar scores for the performance
evaluated. Intra-rater reliability is the consistency of results when the same
rater scores on repeated occasions. For patient-reported outcome measures, high
intra-rater reliability indicates that the patient responds consistently and
attains the same results. (This is most relevant with serial testing when there
has been no intervention or change in status. Intra-rater reliability falls
under test-retest reliability.)
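As a rough illustration of inter-rater agreement, the sketch below computes Cohen's kappa for two raters who each classify the same performances as normal or abnormal. Cohen's kappa is one common agreement statistic chosen here for simplicity; the lecture does not prescribe a particular statistic, and the ratings are invented for the example.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same, non-empty set of subjects")
    n = len(rater_a)
    # Proportion of subjects on which the two raters gave the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Invented ratings from two physiotherapists for the same ten patients.
physio_1 = ["normal", "normal", "abnormal", "abnormal", "normal",
            "abnormal", "normal", "normal", "abnormal", "normal"]
physio_2 = ["normal", "normal", "abnormal", "normal", "normal",
            "abnormal", "normal", "abnormal", "abnormal", "normal"]

kappa = cohens_kappa(physio_1, physio_2)
print(round(kappa, 2))
```

Here the raters agree on 8 of 10 patients, but kappa is lower than 0.8 because some of that agreement would be expected by chance alone.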
Responsiveness refers to the ability of the measure to capture change in
status. A ceiling effect occurs when the majority of patients score within the
highest range of the measurement. (The test is too easy and does not capture
their full capability.) A floor effect occurs when the majority of patients
score within the lowest range of the measurement. (The test is too hard and
does not have enough easier items to distinguish varying levels of status.)
When determining whether a change is relevant, statistical significance (the
p-value) alone is not informative. For outcome measures, the clinician needs to
know the minimal important difference: the amount of change that is relevant
from the patient's perspective (clinical meaningfulness).
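The ideas of floor effect, ceiling effect and minimal important difference can be sketched in a few lines. The example below is a minimal illustration: the function names, the sample scores and the minimal important difference of 10 points are hypothetical, and "within the highest or lowest range" is simplified to scoring exactly at the maximum or minimum.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return the proportion of patients at the lowest and highest possible score."""
    if not scores:
        raise ValueError("At least one score is required")
    n = len(scores)
    floor = sum(s == min_score for s in scores) / n
    ceiling = sum(s == max_score for s in scores) / n
    return floor, ceiling


def is_meaningful_change(baseline, follow_up, minimal_important_difference):
    """True if the change is at least as large as the minimal important difference."""
    return abs(follow_up - baseline) >= minimal_important_difference


# Invented scores on a 0-100 measure for ten patients.
scores = [100, 100, 95, 100, 80, 100, 100, 60, 100, 100]
floor, ceiling = floor_ceiling_proportions(scores, min_score=0, max_score=100)
print(floor, ceiling)  # 7 of 10 patients sit at the maximum: a ceiling effect

# Hypothetical minimal important difference of 10 points.
print(is_meaningful_change(40, 52, 10))
print(is_meaningful_change(40, 45, 10))
```

With a majority of patients at the maximum, the measure cannot show further improvement in this group, so a different or harder measure would be needed.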
Clinical Utility of an Outcome Measure
Choosing appropriate outcome measures for your patients is critical to
understanding their status and progress over time.
Guide to Selecting Outcome Measures
Why Am I Using the Outcome Measure?
Identifying the impact of a disorder on an individual?
Establishing a baseline measure from which to monitor changes over time?
Evaluating the impact of an intervention?
Evaluating the needs of those attending a service?
What Am I Aiming to Measure?
Impairments of body structure and function?
Activity limitations?
Participation restrictions?
Quality of life?
Have the Clinimetric Properties of this Tool Been Measured in a
Population Similar to Mine? Consider:
Do the study samples have the same condition?
Is the study sample similar in disease severity?
Is the study sample similar in disease-specific factors?
Is the Outcome Measure Reliable?
Do I know the rate of error detected with scores?
Do I know the minimum detectable change?
Is the Outcome Measure Valid?
Does it measure what I want it to measure?
Is the Outcome Measure Responsive to Change?
Is there a known minimum clinically important difference?
Financial Considerations
What is the cost of this test?
Is a licence required?
Is equipment required?
Therapist Implementation
Is the measure easy for a clinician to conduct?
Is special training required/available?
Are there clear standardised instructions on how to carry out and score the
measure?
How long does it take to carry out the measure?
How long does it take to record results?
Resources
Is special equipment or are special forms required?
Is space sufficient for this measure to be carried out?
Client
How much time does it take for the person to complete?
Is the task difficult?
Is privacy required?
Patient-Reported Outcome Measures (PROM)
Is face-to-face contact required or can this measure be completed in the
waiting room?
Does the questionnaire cover sensitive personal issues?
Is there a specific reading level required?
Is the measure available in other languages?
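The reliability questions in the guide above ask about the rate of error in scores and the minimum detectable change. One widely used way to estimate these, offered here as a sketch rather than as the method intended by the guide, is the standard error of measurement, SEM = SD x sqrt(1 - ICC), and the minimum detectable change at 95% confidence, MDC95 = 1.96 x sqrt(2) x SEM. The standard deviation and ICC values below are illustrative only and do not come from any study.

```python
import math


def standard_error_of_measurement(sd, icc):
    """SEM = SD * sqrt(1 - ICC), where ICC is a reliability coefficient (0 to 1)."""
    if not 0 <= icc <= 1:
        raise ValueError("ICC must be between 0 and 1")
    return sd * math.sqrt(1 - icc)


def minimum_detectable_change_95(sd, icc):
    """MDC95 = 1.96 * sqrt(2) * SEM: smallest change exceeding measurement error."""
    return 1.96 * math.sqrt(2) * standard_error_of_measurement(sd, icc)


# Illustrative values only: SD of 10 points and a test-retest ICC of 0.91.
sem = standard_error_of_measurement(sd=10, icc=0.91)
mdc95 = minimum_detectable_change_95(sd=10, icc=0.91)
print(round(sem, 2), round(mdc95, 2))
```

A change smaller than the MDC95 cannot be distinguished from measurement error, regardless of whether it would matter to the patient.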
Examples:
Shoulder - Outcome Measures
Western Ontario Rotator Cuff (WORC) Index
The Western Ontario Rotator Cuff (WORC) Index is a questionnaire that was
purposely developed to help understand the particular signs, symptoms, and
functional limitations associated with an RC tendinopathy.
Method of Use
The WORC Index is a self-administered health questionnaire.
It has 21 items, exploring 5 different domains:
1. Physical symptoms
2. Sports and recreation
3. Work
4. Social function
5. Emotions
Each question uses a visual analogue scale (VAS), which is a straight line
representing a 100-point scale, ranging from 0 to 100.
The maximum score is 2100 (worst possible symptoms). Zero (0) represents no
symptoms at all.
To make the final score more clinically friendly, some minor math is involved. The
score can be reported as a percentage by subtracting the total from 2100, dividing
by 2100, and multiplying by 100. This will give you an overall percentage. Total
final WORC scores can, therefore, range from 0% (the lowest functional status)
to 100% (the highest functional status).
Difficulty of administration: Easy.
Clarity: Very clear directions.
Scoring: Using the VAS, measure the distance from the left end of the line to
the patient's mark. Each item is scored out of 100 (record the value to the
nearest 0.5 mm), and the item scores are summed to give a total out of 2100.
Final score = (2100 (max) - total score) / 2100 x 100%.
For example: A total score of 1625.
(2100 - 1625) / 2100 x 100 = 22.6% (therefore, overall low functional status).
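The percentage conversion described above (subtract the total from 2100, divide by 2100, multiply by 100) can be sketched as a small helper. The function and constant names are hypothetical and are used only for illustration.

```python
WORC_ITEMS = 21          # number of questions in the WORC Index
ITEM_MAX = 100           # each VAS item is scored from 0 to 100
RAW_MAX = WORC_ITEMS * ITEM_MAX  # 2100, the worst possible total


def worc_percentage_from_total(raw_total):
    """Convert a raw WORC total (0-2100) to a percentage of normal function."""
    if not 0 <= raw_total <= RAW_MAX:
        raise ValueError("Raw total must be between 0 and 2100")
    return (RAW_MAX - raw_total) / RAW_MAX * 100


def worc_percentage(item_scores):
    """Sum the 21 item scores and convert the total to a percentage."""
    if len(item_scores) != WORC_ITEMS:
        raise ValueError("The WORC Index has exactly 21 items")
    if any(not 0 <= s <= ITEM_MAX for s in item_scores):
        raise ValueError("Each item score must be between 0 and 100")
    return worc_percentage_from_total(sum(item_scores))


# The worked example from the text: a raw total of 1625.
print(round(worc_percentage_from_total(1625), 1))  # 22.6
```

A higher percentage means better function, which reverses the direction of the raw total, where a higher number means worse symptoms.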
Reliability refers to how consistent the tool is at measuring your
outcome of interest and how free it is of error.
The WORC Index is highly reliable and has excellent test-retest reliability,
with intraclass correlation coefficients ranging from 0.85 to 0.99.
Lumbar Spine - Outcome Measures
https://www.physio-pedia.com/Category:Lumbar_Spine_-_Outcome_Measures
Hand - Outcome Measures
https://www.physio-pedia.com/Category:Hand_-_Outcome_Measures
Hip - Outcome Measures
https://www.physio-pedia.com/Category:Hip_-_Outcome_Measures
Knee - Outcome Measures
https://www.physio-pedia.com/Category:Knee_-_Outcome_Measures
Outcome Measures (All)
https://www.physio-pedia.com/Category:Outcome_Measures