DOT/FAA/RD-93/5 (AD-A280477)

REPORT DOCUMENTATION PAGE                              OMB No. 0704-0188
Public reporting burden for this collection of information is estimated to average 1 hour per response, including the time for reviewing instructions, searching existing data sources, gathering and maintaining the data needed, and completing and reviewing the collection of information. Send comments regarding this burden estimate or any other aspect of this collection of information, including suggestions for reducing this burden, to Washington Headquarters Services, Directorate for Information Operations and Reports, 1215 Jefferson Davis Highway, Suite 1204, Arlington, VA 22202-4302, and to the Office of Management and Budget, Paperwork Reduction Project (0704-0188), Washington, DC 20503.
1. AGENCY USE ONLY (Leave blank)
2. REPORT DATE: July 1993
5. FUNDING NUMBERS: FA3E2/A3093
6. AUTHOR(S): Kim Cardosi and M. Stephen Huntley
7. PERFORMING ORGANIZATION NAME(S) AND ADDRESS(ES): U.S. Department of Transportation, Research and Special Programs Administration, Cambridge, MA 02142-1093
8. PERFORMING ORGANIZATION REPORT NUMBER: DOT-VNTSC-FAA-93-4
10. SPONSORING/MONITORING AGENCY: Washington, DC 20591
12. DISTRIBUTION/AVAILABILITY: This document is available to the public through the National Technical Information Service, Springfield, VA 22161
13. ABSTRACT (Maximum 200 words)
14. SUBJECT TERMS: Human Factors, Cockpit, Automation, Display Design, Human Performance, Human Engineering, Perception, Sensation, Attention, Workload, Evaluation
15. NUMBER OF PAGES: 416
17. SECURITY CLASSIFICATION OF REPORT: Unclassified
18. SECURITY CLASSIFICATION OF THIS PAGE: Unclassified
19. SECURITY CLASSIFICATION OF ABSTRACT: Unclassified
NOTICE
This document is disseminated under the sponsorship
of the Department of Transportation in the interest
of information exchange. The United States Government
assumes no liability for its contents or use thereof.
NOTICE
The United States Government does not endorse
products or manufacturers. Trade or manufacturers'
names appear herein solely because they are considered
essential to the object of this report.
U.S. Department
of Transportation
John A. Volpe
National Transportation
Systems Center
Kendall Square
Cambridge Massachusetts 02142
Research and
Special Programs
Administration
May 1994
ERRATA
Due to an oversight on the part of the printer, certain last-minute changes to this Final Report
were not added. As a result, it is necessary for us to enclose a revised copy of page 44. Please
accept our apologies. In the future, we intend to print a new edition of Human Factors for
Flight Deck Certification Personnel that contains these additional changes.
that color discriminations that depend on S cones will be impaired if the image
is sufficiently small to fall only on the center of the fovea. This is illustrated by
Figure 3.3. When viewed close, so that the visual angle of each circle subtends
several degrees, it is easy for an individual with normal color vision to
discriminate the yellow vs. white and red vs. green. Viewed from a distance of
several feet, however, the yellow and white will be indiscriminable. This is
called small-field tritanopia, because tritanopes are individuals who completely
lack S cones. A tritanope would not be able to discriminate the yellow from the
white in Figure 3.3 regardless of their sizes. With certain small fields, even
normal individuals behave like tritanopes. Notice that even from a distance, the
red-green pair is still discriminable because S cones are not necessary for this
discrimination. Thus, the small-field effect is limited to discriminations that
depend on S cones. (Note: Due to technical difficulties in reproducing colors,
individuals with normal color vision may still be able to discriminate the yellow
and white semicircles at a distance.)
Figure 3.3. Colors (yellow and white) not discriminable at a distance due to small-field tritanopia.
METRIC/ENGLISH CONVERSION FACTORS

[Standard conversion chart for length, area, volume, mass/weight, and temperature; largely illegible in this copy. Representative legible entries: 1 meter = 1.1 yards; 1 kilometer = 0.6 mile; °F = [(9/5)y + 32], where y is °C; °C = [(x - 32)(5/9)], where x is °F.]

For more exact and other conversion factors, see NBS Miscellaneous Publication 286, Units of Weights and Measures. Price $2.50. SD Catalog No. C13 10286.
Preface
Flight test pilots who perform aircraft certification and evaluation
functions for the FAA are frequently required to make important
decisions regarding the human factors aspects of cockpit design. Such
decisions require a thorough understanding of the operational conditions
under which cockpit systems are used as well as the performance limits
and capabilities of the flight crews who will use these systems. In the
past, the limits of control and display technology and test pilots'
familiarity with the knobs and dials of traditional aircraft have provided
useful references from which to judge the safety and utility of cockpit
displays and controls. Today, however, with the advent of the
automated cockpit, and the almost limitless information configurations
possible with CRT and LCD displays, evaluators are being asked to go far
beyond their personal experience to make certification judgments.
A survey of human factors handbooks, advisory circulars and even formal
human factors courses revealed little material on human performance
that was formatted in a fashion that would provide useful guidelines to
certification personnel for human factors evaluations in the cockpit.
Most sources of human factors information are of limited use in
evaluating advanced technology cockpits because they are out of date
and do not consider the operational and cockpit context within which the
newly designed controls and displays are to be used.
It will be some time before the human factors issues concerning
interacting with electronic cockpits are well defined and there is
sufficient information and understanding available to support the
development of useful handbooks. In lieu of such guidance, a series of
one-week seminars on human factors issues relevant to cockpit display
design was conducted for approximately 120 FAA certification personnel.
The lectures were given by researchers and practitioners working in the
field. The lectures included material on the special abilities and
limitations of the human perceptual and cognitive system, concepts in
display design, testing and evaluation, and lessons learned from the
designers of advanced cockpit display systems. The contents of this
document were developed from the proceedings of the seminars.
I wish to thank a number of my friends and colleagues for their
important contribution to this document. I am deeply indebted to Dr.
Kim Cardosi, Dr. Peter D. Eimas, Mr. Delmar M. Fadden, Dr. Richard F.
Gabriel, Dr. Christopher D. Wickens, and Dr. John Werner, the principal
CONTENTS

Chapter 1
    Effects of Exposure
    Sound Localization

Chapter 2
    The Eye
    Accommodation
    Aging and Presbyopia
    Ocular Media Transmission and Aging

Chapter 3
    Flicker
    Motion
    Adaptation
    Chromatic Adaptation
    Variation Under Normal Conditions

Chapter 4

Chapter 5
    Attention
    Expectation
    Memory
    The Sensory Store
    Short-Term Memory
    Long-Term Memory

Chapter 6
    Attention
    Focused Attention
    Divided Attention
    Selective Attention
    Head-Up Displays
    HUD Optics
    Physical Characteristics
    Symbology
    Attention Issues

Chapter 7

Chapter 8

Chapter 9
    Introduction
    Definition
    Conclusions
    Recommendations

Chapter 10
    Requirements
    Design
    Evaluation
    Operation

Chapter 11

Chapter 12
    Introduction

REFERENCES ................ R-1
INDEX ..................... Index-1

[Most section titles and page numbers in the original table of contents are not legible in this copy; only the entries above could be recovered.]
LIST OF FIGURES

[Figure titles and page numbers are largely illegible in this copy. The list covers Figures 1.1-1.4, 2.1-2.25, 3.1-3.22, 4.1-4.20, 5.1-5.3, 6.1-6.6, 7.1-7.8, 8.1-8.15, 9.1-9.6, 10.1-10.6, and 11.1-11.11. Recoverable titles include Figure 8.2, "Model of workload," and Figure 11.7, "Evaluation Questionnaire."]
LIST OF TABLES

[Table titles and page numbers are largely illegible in this copy. The list covers Tables 1.1, 3.1, 5.1, 7.1, 8.1-8.2, 9.1-9.14, and 11.1-11.8. Recoverable titles include Table 11.4, "Flight Procedure Workload Data Summary, Chicago to St. Louis Flight Totals," and Table 11.5, "Line Operation Visual Activity Time Demand, Average Percent of Time Available Devoted to Visual Tasks."]
[Source credits for figures and tables. Entries whose credit text is not legible in this copy are omitted; uncredited figures are original.]

Figure 2.8: From Werner, J.S., Peterzell, D. & Scheetz, A.J. Light, vision, and aging: A brief review. (Copyright 1990) Optometry and Vision Science, 67, 214-229. Reproduced by permission of Williams & Wilkins, American Academy of Optometry.

Figure 2.9: From Wyszecki, G. & Stiles, W.S. Color Science: Concepts and Methods, Quantitative Data and Formulae (2nd ed.). (Copyright 1982) New York: John Wiley & Sons, Inc. Reproduced by permission of John Wiley & Sons, Inc.

Figure 3.4: From Werner, J.S. & Steele, V.G. Sensitivity of human foveal color mechanisms throughout the life span. (Copyright 1988) Washington, D.C.: Journal of the Optical Society of America A, 5, 2122-2130. Reproduced by permission of the publisher.

Figure 3.19: From Wyszecki, G. & Stiles, W.S. Color Science: Concepts and Methods, Quantitative Data and Formulae (2nd ed.). (Copyright 1982) New York: John Wiley & Sons, Inc. Reproduced by permission of John Wiley & Sons, Inc.

Figure 7.9: From Braune, R.J. The common/same type rating: human factors and other issues. Reproduced with permission from SAE Paper No. 892229, copyright 1989, Society of Automotive Engineers, Inc.

Figure 8.4: From Groce, J.L. & Boucek, G.P., Jr. Air transport crew tasking in an ATC data link environment. Reproduced with permission from SAE Paper No. 871764, copyright 1987, Society of Automotive Engineers, Inc.

Figure 8.7a: From Roscoe, A.H. The original version of this material was first published by the Advisory Group for Aerospace Research and Development, North Atlantic Treaty Organisation (AGARD/NATO) in AGARDograph AG-282, "The practical assessment of pilot workload," in 1987.

Figure 8.7b: From Cooper, G.E. & Harper, R.P., Jr. The original version of this material was first published by the Advisory Group for Aerospace Research and Development, North Atlantic Treaty Organisation (AGARD/NATO) in AGARDograph AG-567, "The use of pilot rating in the evaluation of aircraft handling qualities," in 1969.

Figures 8.8 and 8.9: From Wilson, G.F., Skelly, J., & Purvis, G. The original version of this material was first published by the Advisory Group for Aerospace Research and Development, North Atlantic Treaty Organisation (AGARD/NATO) in Conference Proceedings CP-458, "Reactions to emergency situations in actual and simulated flight," in Human Behavior in High Stress Situations in Aerospace Operations, 1989.

Table 9.3: From Barnett, A. & Higgins, M.K. Airline safety: the last decade. Management Science, 35, January 1989, 1-21 (copyright 1989). Reprinted by permission of The Institute of Management Sciences, 290 Westminster St., Providence, Rhode Island 02903, USA.
Executive Summary
A series of one-week seminars was developed to provide FAA certification
Chapter 1
Auditory Perception
by John S. Werner, Ph.D., University of Colorado at Boulder
Hearing, like vision, provides information about objects and events at a
distance. There are some important practical differences between hearing and
vision. For example, the stimulus for vision, light, cannot travel through solid
objects, but many sounds can. Unlike vision, hearing is not entirely dependent
on the direction of the head. This makes auditory information particularly
useful as a warning system. A pilot can process an auditory warning regardless
of the direction of gaze, and while processing other critical information through
the visual channel. Auditory information is also less degraded than visual
signals by turbulence during flight, making auditory warnings an appropriate
replacement for some visual display warnings (Stokes & Wickens, 1988). No
doubt these considerations formed the basis of FAA voluntary guidelines on the
use of aural signals as part of aircraft alerting systems (RD-81/38,II, page 89).
Figure 1.1.
Changes in air pressure shown for two sound waves differing in frequency
and amplitude (top). When added together (bottom), the two pure tones form
a complex sound. (original figure)
Sound pressure levels of familiar sounds:

    dB      Example
    --      -------
    0       Threshold of Hearing
    10      Normal Breathing
    20      Leaves Rustling
    30      Empty Office
    50      Quiet Restaurant
    60      Two-Person Conversation
    70      Busy Traffic
    80      Noisy Auto
    90      City Bus
    100     Subway Train
    140     Threshold of Pain
    160     Wind Tunnel
Figure 1.2 shows the same note from the musical scale played by three different
musical instruments. Below each waveform is shown the amplitude of each frequency in
the sound. That is, each complex sound was broken down into a set of sine
waves of different frequencies using a method called Fourier analysis.
The three instruments sound different because they contain different amplitude
spectra (amplitudes as a function of frequency).
Figure 1.2. Waveforms and amplitude spectra (amplitude as a function of frequency, 0-3000 Hz) for the same note played on three different instruments.
Typically, the pitch we hear in a complex sound corresponds to the pitch of the
lowest frequency component of that sound. This component is called the
fundamental frequency. Frequency components higher than the fundamental are
called harmonics, and these harmonics affect the quality, or timbre, of the
sound. Two musical instruments, say a trumpet and piano, playing the same
note will generate the same fundamental. However, their higher frequency
components, or harmonics, will differ, as illustrated in Figure 1.2. These
harmonics produce the characteristic differences in quality between different
instruments. If we were to remove the harmonics, leaving only the fundamental,
a trumpet and a piano playing the same note would sound identical.
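This decomposition can be sketched numerically. The sketch below is illustrative only (the 262 Hz fundamental and harmonic amplitudes are assumed values, not data from Figure 1.2): it synthesizes a complex tone from a fundamental plus two harmonics and then recovers each component's amplitude with a single-bin discrete Fourier transform.

```python
import math

def tone(freq_amps, sample_rate=8000, duration=0.5):
    """Synthesize a complex tone as a sum of sine components."""
    n = int(sample_rate * duration)
    return [sum(a * math.sin(2 * math.pi * f * t / sample_rate)
                for f, a in freq_amps)
            for t in range(n)]

def amplitude_at(signal, freq, sample_rate=8000):
    """Fourier amplitude of `signal` at `freq` (single-bin DFT)."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * t / sample_rate)
             for t, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * t / sample_rate)
             for t, s in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# Assumed example: a 262 Hz fundamental (roughly middle C) with two
# harmonics of decreasing amplitude.
components = [(262, 1.0), (524, 0.5), (786, 0.25)]
signal = tone(components)

for f, a in components:
    print(f"{f} Hz: amplitude ~ {amplitude_at(signal, f):.2f}")
```

Removing the two harmonic entries from `components` leaves only the fundamental, which is the numerical analogue of the trumpet and piano becoming indistinguishable.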
Figure 1.3. The audibility curve (absolute threshold of hearing) and equal loudness curves plotted against sound frequency (20 to 10,000 Hz) for a young adult, from the threshold of hearing up to the threshold of feeling near 120 dB; conversational speech occupies intermediate frequencies and levels. (from Fletcher & Munson, 1933)
Figure 1.3 includes equal loudness contours for standards of 40 and 80 dB above threshold. Note that the shape of the contour
changes with increasing intensity. That is, the increase in the loudness of a
sound with increasing intensity occurs at different rates for different
frequencies. Thus, we are much more sensitive to intermediate frequencies of
sound than to extremes in frequency. However, with loud sounds, indicated by
higher intensity standards in Figure 1.3, this difference in our sensitivity to
various frequencies decreases.
Sensitivity to loudness depends on the sound frequency in a way that changes
with the level of sound intensity. You have probably experienced this
phenomenon when listening to music. Listen to the same piece of music at high
and low volumes. Attend to how the bass and treble become much more
noticeable at the higher volume. Some high-fidelity systems compensate for this
change by providing a loudness control that can boost the bass and treble at
low volume. The fact that the loudness of a tone depends not only on its
intensity but also on its frequency is a further illustration that physical and
perceptual descriptions are not identical.
While pitch depends on frequency, as mentioned, it also depends on intensity.
When we increase the intensity of a low frequency sound, its pitch decreases.
When we increase the intensity of a high frequency sound, its pitch increases.
The Effects of Aging
The frequency range for an individual observer is commonly measured by
audiologists and is known as an audiogram. Figure 1.3 showed that the
frequency sensitivity of a young adult ranged from about 20 to 20,000 Hz. This
range diminishes with increasing age, however, so that few people over age 30
can hear above approximately 15,000 Hz. By age 50 the high frequency limit is
about 12,000 Hz and by age 70 it is about 6,000 Hz (Davis & Silverman,
1960). This loss with increasing age is known as presbycusis, and is usually
greater in men than in women.
The cause of presbycusis is not known. As with all phenomena of aging, there
are large individual differences in the magnitude of high frequency hearing loss.
One possibility is that changes in vasculature with increasing age limit the
blood supply to sensitive neural processes in the ear. Another possibility is that
there is some cumulative pathology that occurs with age. For example, cigarette
smokers have a greater age-related loss in sensitivity than nonsmokers (Zelman,
1973) and this may be due to the interfering effects of nicotine on blood
circulation. There are other possibilities, but perhaps the most important to
consider is the cumulative effect of sound exposure.
Effects of Exposure
Sudden loud noises have been known to cause hearing losses. This is a common
problem for military personnel exposed to gun shots. Even a small firecracker
can cause a permanent loss in hearing under some conditions (Ward &
Glorig, 1961).
Exposure to continuous sound is common in modern industrial societies. Even
when the sounds are not sufficiently intense to cause immediate damage,
continuous exposure may produce loss of hearing, especially for high
frequencies. Unprotected workers on assembly lines or airports have hearing
losses that are correlated with the amount of time on the job (Taylor, 1965).
Similar studies have shown deleterious effects of attending loud rock concerts.
The potentially damaging effects of sound exposure on hearing depend on both
the intensity and duration of the sounds. Thus, cumulative exposure to sound
over the life span might be related to presbycusis.
Sound Localization
The separated locations of our ears allow us to judge the source of a sound.
We use incoming sound from a single source to localize sounds in space in two
different ways. To begin with, suppose a tone above 1,200 Hz is sounded
directly to your right, as illustrated in Figure 1.4.
The intensity of high frequency sounds will be less in the left ear than the right
because your head blocks the sounds before they reach your left ear. This
intensity difference only exists for sounds above 1,200 Hz, however. At lower
frequencies, sound can travel around your head without any significant
reduction in intensity.
Whenever a sound travels farther to reach one ear or the other, a time difference
exists between the arrival of the sound at each ear. Thus, if the sound source is
closer to one ear, the pulsations in air pressure will hit that ear first and the
other a bit later. We can use a time difference as small as 10 microseconds
between our two ears to localize a sound source (Durlach & Colburn, 1978),
but this information is only useful for low frequency sounds. Thus, localization
of high frequency sounds depends primarily on interaural intensity differences,
but low frequency sounds are localized by interaural time differences.
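The time-difference cue is easy to put numbers on. The sketch below assumes a path-length difference of about 0.23 m for a sound directly to one side and a speed of sound of 343 m/s; both figures are commonly cited values, not numbers given in the text.

```python
SPEED_OF_SOUND = 343.0  # m/s, in air at about 20 degrees C (assumed)

def interaural_time_difference(path_difference_m):
    """Extra travel time, in microseconds, to the farther ear."""
    return path_difference_m / SPEED_OF_SOUND * 1e6

# Sound directly to one side: assume ~0.23 m of extra path around the head.
itd = interaural_time_difference(0.23)
print(f"ITD for a sound directly to one side: ~{itd:.0f} microseconds")

# The 10-microsecond resolution quoted in the text corresponds to a
# path difference of only a few millimeters:
d_mm = 10e-6 * SPEED_OF_SOUND * 1000
print(f"10 microseconds corresponds to ~{d_mm:.1f} mm of path difference")
```

The first figure, several hundred microseconds, is the largest time difference the head geometry can produce, so the 10-microsecond discrimination limit quoted above implies very fine angular resolution for low frequency sources.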
Figure 1.4. The way high and low frequency sound waves travel from a sound source to the two ears of a person's head; high frequency waves are blocked by the head, while low frequency waves travel around it.
Chapter 2
Basic Visual Processes
by John S. Werner, Ph.D., University of Colorado at Boulder
Vision is our dominant sensory channel, not only in guiding aircraft, but also in
most tasks of everyday life. For example, we can recognize people in several
ways -- by their appearance, their voice, or perhaps even their odor. When we
rearrange stimuli in the laboratory so that what one hears or feels conflicts with
what one sees, subjects consistently choose responses based on what they saw
rather than on what their other senses told them (Welch and Warren, 1980).
Most of us apparently accept the idea that "seeing is believing."
Physical Properties of Light
Light is a form of electromagnetic energy that is emitted from a source in small,
indivisible packets called quanta (or photons). A quantum is the smallest unit of
light. As with sound energy, the movement of light energy through space is in a
sinusoidal pattern. Sound waves were described in terms of their frequency, but
light waves are more commonly described in terms of the length of the waves
(i.e., the distance between two successive peaks). This description is equivalent
to one based on frequency because wavelength and frequency are inversely
related. Figure 2.1 illustrates two waves differing in their length.

Figure 2.1. Regions of the electromagnetic spectrum and their corresponding wavelengths, from gamma rays (about 10^-14 m) through X rays, ultraviolet, the visible band, infrared, radar, FM, TV, and AM radio, to AC electricity; within the visible band, violet lies near 400 nm, followed by blue, green, yellow, and red near 700 nm.

As can be seen
in the figure, the electromagnetic spectrum encompasses a wide range, but our
eyes are sensitive only to a small band of radiation which we perceive as light.
Normally, we can see quanta with wavelengths between about 400 and 700
nanometers
(nm; 1 nm is one billionth of a meter). Thus, the two major
physical variables for discussing light are quanta and wavelength. The number
of quanta falling on an object describes the light intensity, whereas the
wavelength tells us where the quanta lie in the spectrum. Most naturally
occurring light sources emit quanta of many wavelengths (or a broadband of
the spectrum), but in a laboratory, we use specialized instruments that emit
only a narrow band of the spectrum called monochromatic lights. If a person
with normal color vision were to view monochromatic lights in a dark room,
the appearance would be violet at 400 nm, blue at 470 nm, green at 550 nm,
yellow at 570 nm, and red at about 680 nm. Note that this description is for
one set of conditions; later we will illustrate how the appearance can change
for the same monochromatic lights when viewed under other conditions.
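Because wavelength and frequency are inversely related through the speed of light, either description can be converted to the other. A minimal sketch (the specific wavelengths chosen are just the band limits and a mid-band value from the text):

```python
C = 299_792_458.0  # speed of light in a vacuum, m/s

def wavelength_to_frequency(nm):
    """Convert a wavelength in nanometers to a frequency in hertz."""
    return C / (nm * 1e-9)

# The visible band described in the text, roughly 400-700 nm:
for nm in (400, 550, 700):
    print(f"{nm} nm -> {wavelength_to_frequency(nm):.2e} Hz")
```

Note that the short-wavelength (violet) end of the band corresponds to the highest frequency, which is the inverse relation the text describes.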
Figure 2.2 shows the distribution of energy for some familiar light sources,
fluorescent lamps. The four different curves show four different types of lamp.
Figure 2.2. Relative energy plotted as a function of wavelength (400-700 nm) for four types of fluorescent lamp.
While they all may be called "white," they differ in their relative distribution of
energy. They also appear different in their color, although this is not always
noticed unless they are placed side-by-side. Variations in the intensity and
spectral distribution of energy can sometimes be quite large without affecting
our color perception. Indeed, Figure 2.3 shows the energy of sunlight plotted as
a function of wavelength for a surface facing away from the sun or toward the
sun. If these two light distributions were placed side-by-side you would say that
one is bluish and the other yellowish, but if either one was used to illuminate a
whole scene by itself, you would most likely call this illuminant white and
objects would appear to have their usual color. Objects usually do not change
their color with these changes in the source of illumination. This perceptual
phenomenon is called color constancy.
When light travels from one medium to another, several things can happen.
First, some or all of the quanta can be lost by absorption, and the energy in the
absorbed quanta is converted into heat or chemical energy. Second, when
striking another medium, some or all of the quanta can bounce back into the
initial medium, a familiar phenomenon known as reflection. Third, the light can
be transmitted through the new medium.
Figure 2.3. Sunlight energy plotted as a function of wavelength (300-700 nm) for a surface facing away from the sun or toward the sun (80 degree solar altitude). (from Walraven et al., 1990)
The Eye
Figure 2.4 is a diagram of the human eye. The eyeball is surrounded by a
tough, white tissue called the sclera, which becomes the clear cornea at the
front. Light that passes through the cornea continues on through the pupil, a
hole formed by a ring of muscles called the iris. It is the outer, pigmented layer
of the iris that gives our eyes their color.
Contraction and expansion of the iris opens or closes the pupil to adjust the
amount of light entering the eye. Light then passes through the lens and strikes
the retina, several layers of cells at the back of the eye. The retina includes
receptors that convert energy in absorbed quanta into neural signals. One part of
the retina, called the fovea, contains the highest number of receptors per unit
area. When we want to look at an object, or fixate it, we move our head and
eyes so that the light will travel along the visual axis and the image of the
object will fall on the fovea.
The sizes of visual stimuli are
often specified in terms of the
region of the retina that they
subtend (cover). This concept is
illustrated in panel (b) of Figure
2.4. Consider what happens when we look at an object such as a tree. The
optics of the eye bend the light so that the image of the tree on the retina is
upside down and reversed left to right.

Figure 2.4. Cross section of the human eye, and the visual angle subtended by an object. (from Cornsweet, 1970)

The area of
the retina covered by the image is called the visual angle, which is measured in
degrees. The angle depends on the object's size and distance from us. In Figure
2.4, we can deduce that smaller and smaller trees at closer and closer distances
could all subtend the same visual angle. The same principles hold for two
equally sized objects at differing distances; they will produce different visual
angles and appear as different sizes. This relation is such that as the distance of
an object from the eye doubles, the size of the image produced by the object is
halved. Artists use this information to create an illusion of three-dimensional
space on a flat surface by making background figures smaller than foreground
figures.
The visual angle x is calculated by x = arctan(size/distance), and is specified
in degrees. (Note that the distance is measured from the object to the cornea,
plus the distance between the cornea and point 'p' in Figure 2.4. The latter
value is seven mm.) By definition, one degree equals 60 minutes of arc, and one
minute of arc equals 60 seconds of arc. A rough rule of thumb (no pun
intended) is that the visual angle 'x' of your thumbnail at arm's length is about
2°.
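As a numerical check of the formula above, the short Python sketch below
evaluates the visual angle for a thumbnail-sized object at arm's length. (The
report contains no code; the function name and the example dimensions are
illustrative assumptions, not values from the text.)

```python
import math

def visual_angle_deg(size_m, distance_m, nodal_offset_m=0.007):
    """Visual angle in degrees: arctan(size / distance), where the distance
    includes the ~7 mm from the cornea to point 'p' noted in the text."""
    return math.degrees(math.atan(size_m / (distance_m + nodal_offset_m)))

# A thumbnail of roughly 1.7 cm viewed at roughly 0.5 m (assumed values)
# subtends about 2 degrees, matching the rule of thumb in the text.
angle = visual_angle_deg(0.017, 0.5)
```

For small angles, doubling the viewing distance approximately halves the
computed visual angle, consistent with the relation described above.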
Figure 2.5.	Image formation in emmetropic (normal), myopic, and
hypermetropic eyes.
Figure 2.6.	(plotted as a function of age in years)
Figure 2.7.	(plotted as a function of wavelength, 200 to 800 nm)
Figure 2.8.	Optical density of the lens plotted as a function of age (years).

The optical density of the lens increases markedly with advancing age. It can be
deduced from the solid line fit to the data that the average 70-year-old eye
transmits about 22 times less light at 400 nm (1.34 optical density difference)
than does the eye of the average 1-month-old infant. This difference between
young and old diminishes with increasing wavelength.
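The factor of 22 cited above follows directly from the definition of optical
density: transmitted light falls by a factor of ten for each unit of optical
density. A minimal Python sketch (the function name is illustrative; the report
itself contains no code):

```python
def transmission_ratio(delta_od):
    """Factor by which transmitted light differs for a given difference in
    optical density; T = 10**(-OD), so a difference of delta_od density
    units corresponds to a factor of 10**delta_od in transmitted light."""
    return 10 ** delta_od

# The 1.34 optical density difference at 400 nm reported in the text
# corresponds to roughly a 22-fold difference in transmitted light.
factor = transmission_ratio(1.34)
```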
Because the lens increases its absorption with age, the visual stimulus arriving
at the receptors will be less intense with age. In addition, for stimuli with a
broad spectrum of wavelengths, there will be a change in the relative
distribution of light energy because the short wavelengths will be attenuated
more than middle or long wavelengths. Since the stimulus at the retina is
changing with age, there will be age-related decreases in the ability to detect
short wavelengths of light. The amount of light absorbed by the lens will also
directly influence our ability to discriminate short wavelengths (blue hues).
Thus, the large range of individual variation in the lens leads to large
individual differences in discrimination of blue hues and in how a specific blue
light will appear to different observers.
While an increase in the absorption of light with advancing age is considered
normal, some individuals experience an excessive change which leads to a lens
opacity known as a cataract. A cataractous lens severely impairs vision and is
typically treated by surgical removal and implantation of a plastic, artificial lens.
These artificial lenses eliminate the ability to accommodate, but in most cases of
cataract the individual is above about 55 or 60 years of age and has lost this
ability anyway.
Figure 2.9.	Various cell types in the primate retina. (original figure from
Dowling & Boycott, 1966)
Figure 2.10.	The densities of rods and cones plotted as a function of retinal
eccentricity. (data from Osterberg, 1935; figure from Cornsweet, 1970)

The rods and cones are not uniformly distributed across the retina, as shown in
Figure 2.10. The cones are most densely packed in the
fovea. To look at something directly or fixate on it, we turn our eyes so that
the object's image falls directly on the fovea. This is advantageous because the
fovea contains the greatest number of cones, providing us with our best visual
acuity, or ability to see fine details. Outside the fovea, where the density of the
cones decreases, there is a corresponding decrease in visual acuity. The density
of rods is greatest about 20° from the fovea and decreases toward the periphery.
The periphery has many more rods than cones, but a careful reading of the
figure shows that there are as many as 7,500 cones per square mm even in the
peripheral retina.
When light falls on the rods and cones, they send signals to other retinal cells,
the horizontal, bipolar, and amacrine cells located in different retinal layers
(Figure 2.9). These cells organize incoming information from receptors in
complex ways. For example, one of these cells can receive information from
many receptors as well as from other retinal cells. Then these cells send their
information on to ganglion cells, which can further modify and reorganize the
neural information. The activity of these ganglion cells is sent to the brain
along neural fibers called axons. Thus, the only information that our brain can
process must be coded in the signals from the ganglion cells. The interactions
among the different retinal cell types provide the physiological basis for many
important perceptual phenomena.
The axons of ganglion cells form a bundle of approximately one million fibers
called the optic nerve. These fibers leave the eyeball in the region termed the
optic disc. Because this area is devoid of receptors, it is called the blind spot. As
can be seen in Figure 2.10, the blind spot is located at about 15° on the side of
the nose (or nasal retina) from the fovea.
As a practical matter, one can now see why FAA guidelines (see RD-81/38,11,
page 40) stress the importance of placing master visual alerts within 15° of each
pilot's normal line of sight, as illustrated by Figure 2.11. This is the area of the
visual field with best visual acuity and typically the center of attention. By
placing high priority signals in this area, they will be detected more quickly
than if they are placed more peripherally.

Figure 2.11.	High priority area for master visual alert signals and area for
secondary signals, relative to the pilot's normal line of sight.
Spectral Sensitivity
The functional difference between rods
and cones was discovered in 1825, when
the Czech medical doctor Purkinje
realized that he was most sensitive to a
part of the spectrum in complete
darkness that was different from the part
he was most sensitive to in daylight.
From this, he hypothesized the existence
of different receptors for day (photopic)
vision and night (scotopic) vision. Shortly thereafter, it was confirmed that
the rods and cones differ in their sensitivity to some wavelengths. The
wavelength of maximal sensitivity for the cones (555 nm) is different than for
the rods. This photopic spectral sensitivity function is shown in Figure 2.12.

Figure 2.12.	Log relative sensitivity plotted separately for rods and cones
as a function of wavelength (in millimicrons).

Not only do the cones differ from the rods in their spectral
sensitivity, but they produce different perceptual experiences. Under photopic or
daylight conditions, we can see different hues as wavelength varies. Thus,
perception of hue is dependent on cone receptors.
Figure 2.13.	Threshold decrease during dark adaptation, showing the cone
(top) branch and the rod (bottom) branch, plotted as a function of time in the
dark (min).

Most of us have groped around in a dark movie theater until our eyes adjusted
to the dim level of illumination. This process is called dark adaptation, and it
occurs, in part, because our receptors need time to achieve their maximum
sensitivity, or minimum threshold. If we were to measure the minimum amount
of light required to see at various times, i.e., our threshold, after we entered a
darkened room, we could plot a dark adaptation curve such as that shown in
Figure 2.13 (reprinted by permission from C.H. Graham, Ed., Vision and Visual
Perception, John Wiley & Sons, Inc., New York, NY, 1965, p.
75). This curve indicates that the eye becomes progressively more sensitive in
the dark, but notice that the curve has two distinct phases. The first phase,
which lasts about seven minutes, is attributed to the cone system, and the
second phase, to the rod system. When we first enter the dark our cones are
more sensitive than the rods, but after about seven minutes, the rods become
more sensitive.
What explains the greater sensitivity of rods over cones in a dark theater? Part
of the answer is related to the fact that there are many more rods than cones.
Second, because the rods contain more photopigment than cones, they absorb
more quanta. To consider the third explanation for the difference in scotopic
and photopic sensitivity, we must look at the connections of rods and cones to
other neural elements in the retina. Several cones are often connected to a
single bipolar cell. This is termed convergence because the signals from several
cones come together at one cell. The more receptors converging on a single cell,
the greater chances are of activating that cell.
A dim light that produces a weak signal in many rods has a greater chance of
being detected because many rods summate their signals on another cell. Their
combined effects can produce a signal strong enough for visual detection.
Detection of a weak signal by cones is less likely because their spatial
summation of signals occurs over much smaller regions than rods. Convergence,
a structural property of many neural-sensory systems, thus enhances sensitivity.
(Signals in receptors can also be added together over time, a process known as
temporal summation, and this occurs over longer durations for rods than for
cones.)
While it may seem advantageous to summate visual signals over a wide region
of the retina to enhance sensitivity, it should be noted that this is associated
with a loss of resolution or acuity. That is, whenever signals are combined,
information about which receptors generated the signals is lost. Conversely, if
information from receptors is separated, there is a greater possibility of
localizing which receptors are activated and thereby resolving the locus of
stimulation. Thus, there is a trade-off between sensitivity and resolution.
Because rods pool their signals over larger retinal regions than cones, they
enhance sensitivity at the cost of spatial resolution. Cones, on the other hand,
summate information over small regions of retina and favor high resolution at
the expense of sensitivity.
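The sensitivity benefit of convergence described above can be illustrated with
a deliberately simple simulation. This is a toy model, not a physiological one:
each receptor is treated as absorbing a quantum with some small probability,
and a pooled cell "fires" when the summed signal reaches a threshold. All
parameter values are illustrative assumptions.

```python
import random

random.seed(1)

def detection_rate(pool_size, p_absorb=0.2, threshold=2, trials=2000):
    """Toy convergence model: a cell pooling pool_size receptors fires
    when at least `threshold` of them absorb a quantum. Larger pools
    detect a dim light more reliably (at the cost of spatial resolution,
    since the pooled cell cannot say which receptor was stimulated)."""
    hits = 0
    for _ in range(trials):
        signal = sum(1 for _ in range(pool_size)
                     if random.random() < p_absorb)
        if signal >= threshold:
            hits += 1
    return hits / trials

rod_like = detection_rate(pool_size=20)   # high convergence: sensitive
cone_like = detection_rate(pool_size=2)   # low convergence: less sensitive
```

With these assumed parameters the highly convergent ("rod-like") pool detects
the dim stimulus far more often than the sparsely connected ("cone-like")
pool, mirroring the sensitivity/resolution trade-off in the text.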
Visual acuity, or resolution, is often defined in terms of the smallest detail that
an observer can see. This is measured by the familiar eye chart with varying
letter sizes viewed at a fixed distance. Visual acuity tested with such a chart is
defined by the smallest letter that can be read. When an individual has, for
example, an acuity of 20/40, or 0.5, it means that at a distance of 20 feet, the
individual just resolves a gap in a letter that would subtend 1 minute of arc at
a distance of 40 feet (see Riggs, 1965 for other details). In many states, a
person is legally blind if visual acuity is 20/400 or worse.
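The Snellen notation described above converts directly to a decimal acuity and
to a minimum resolvable angle. A brief Python sketch (function names are
illustrative; the 1 arc-minute reference for 20/20 is taken from the text):

```python
def decimal_acuity(test_distance_ft, letter_distance_ft):
    """Snellen fraction expressed as a decimal, e.g. 20/40 -> 0.5."""
    return test_distance_ft / letter_distance_ft

def min_resolvable_arcmin(snellen_decimal):
    """A 20/20 observer just resolves a 1 arc-minute gap; the minimum
    resolvable angle scales inversely with decimal acuity."""
    return 1.0 / snellen_decimal

acuity = decimal_acuity(20, 40)        # 0.5, as in the example above
gap = min_resolvable_arcmin(acuity)    # 2 arc minutes
```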
Visual acuity varies with luminance, as shown in Figure 2.14. In the scotopic
range, visual acuity is dependent on rods and is very poor. As light intensity
increases into the photopic range, visual acuity is more dependent on cones and
dramatically improves. Note, however, that even after cones "take over," visual
acuity continues to vary with light intensity. The data in Figure 2.14 represent
more or less ideal conditions. When a stimulus is moving or the display is
vibrating (as in turbulence), visual acuity may be considerably reduced.
Figure 2.14.	Visual acuity plotted as a function of log luminance, log L (mL).
Damage Thresholds
The human visual system is extremely sensitive to light -- so sensitive that when
light is very intense, the receptors of the retina can be permanently damaged. A
common example of this is the blindness that occurs subsequent to viewing a
solar eclipse.
Ham and colleagues (1982) conducted experiments with rhesus monkeys (who
had their lenses surgically removed) to determine which wavelengths of light
are most damaging to the retina. Because rhesus monkeys have a retina that is
nearly identical to that of humans, the results of these experiments can be
generalized to humans. The results are presented in Figure 2.15 in terms of
relative sensitivity to damage as a function of wavelength.
Damage observed by Ham et al. occurred to the receptors as well as to the cells
behind the receptors that are necessary for receptor function, cells in the layer
known as the retinal pigment epithelium. The data points in Figure 2.15
indicate that any wavelength of light, in sufficient intensity, can be damaging to
the retina. Note, however, that the short wavelengths in the visible spectrum
(ca. 450 nm) and the ultraviolet wavelengths (300 to 400 nm) are most
damaging.

Figure 2.15.	Relative sensitivity of the retina to damage plotted as a
function of wavelength (nm). (data from Ham et al., 1982)
As discussed earlier, visual acuity varies with retinal eccentricity, and so we rely on a smaller portion of the retina
for processing detailed information. In particular, the act of fixation involves
head and eye movements that position the image of objects of interest onto the
fovea. This can be seen by recording the eye movements of someone who is
viewing a picture. Figure 2.17 shows some recordings made by Yarbus (1967)
of eye movements while viewing pictures. Notice that the eye moves to a point,
fixates momentarily (producing a small dot on the record), and then jumps to
another point of interest.

Figure 2.16.	Areas seen by both eyes and by the left and right eyes.

Notice also that much of the fixation occurs to
features or in areas of light-dark change. Homogeneous areas normally do not
evoke prolonged inspection. For information to be recognized or identified
quickly and accurately, movements of the eye must be quick and accurate. This
is accomplished by six muscles that are attached to the outside of each eye.
These muscles are among the fastest in the human body.
There are two general classes of eye movements: vergence and conjunctive.
Movements of the two eyes in different directions -- for example, when both
eyes turn inward toward the nose -- are called vergence movements. These
movements are essential for fixating objects that are close. The only way both
eyes can have a near object focused on both foveas is by moving them inward.
Eye movements that displace the two eyes together relative to the line of sight
are known as conjunctive eye movements. There are three types of conjunctive
eye movements: saccadic, pursuit, and vestibular. Saccadic eye movements are
easily observed when asking a person to change fixation from one point in
space to another. A fast ballistic movement is engaged to move the eye from
one point to the next. Careful measurements show that the delay between
presentation of a peripheral stimulus and a saccade to that stimulus is on the
order of 180 to 250 msec. The movement itself requires only about 100 msec
for the eyes to travel a distance of 40° (Alpern, 1971). Saccadic movements of
the eyes are necessary to extract information from our environment. For
Figure 2.17.	Eye movements while viewing pictures; small dots are fixations.
(from Yarbus, 1967)
example, during reading we may make as many as four small saccades across a
line of type and one large saccade to return the eye to the beginning of the
next line. We engage in many thousands of saccadic eye movements each day.
One of the great mysteries in eye movement research has to do with why we
don't notice our eye movements. If the visual image in a motion picture were
moved around the way the eyes move the visual image, it would be very
disconcerting. The same motion of the image due to movement of the eye
results in the appearance of a stable world. Part of the reason that saccadic eye
movements are not disruptive has to do with an active suppression of visual
sensitivity for about 50 msec before and after a saccadic eye movement
(Volkmann, 1962). A similar reduction in visual sensitivity also occurs during
blinks (Riggs, Volkmann & Moore, 1981), which is probably why we do not
notice "the lights dimming" for one-third second, every four seconds, which is
about the duration and frequency of eye blinking. The light is reduced by about
99% during a blink, but this change is seldom noticed.
Still another reason we may fail to notice blurring during a saccadic eye
movement is due to visual masking. When two stimuli are presented in quick
succession, one stimulus may interfere with seeing the other. For example,
threshold for detecting a weak visual stimulus will increase if a more intense
stimulus is presented just before or just after the weak stimulus is presented.
Similarly, the sharp images seen just before and after an eye movement may
mask the blurred stimulus created during the saccade (Campbell & Wurtz,
1978).
While saccadic eye movements allow the eye to "jump" from one point to
another, pursuit eye movements allow the eye to move slowly and steadily to
fixate a moving object. These movements are very different from saccades and
are controlled by different mechanisms in the brain. Saccadic eye movements
are programmed to move the eye between two points with no changes in the
direction of movement once the saccade has begun. Pursuit movements require
brain mechanisms to determine the direction and velocity of a moving object for
accurate tracking. Indeed, accurate tracking for slow moving objects is possible,
but the accuracy decreases with increasing target speed.
Vestibular movements of the eye are responsible for maintaining fixation when
the head or body moves. To maintain fixation during head movement, there
must be compensatory changes in the eyes. The movement of the head is
detected by a specialized sensory system called the vestibular system, and
head-position information is relayed from the vestibular system to the brainstem
areas controlling eye movements. Although we are seldom aware of vestibular
eye movements, they are essential for normal visual perception. Some antibiotics
Figure 2.18.	Eye movement records; typical intercone size shown for scale.
(from Ditchburn)

Figure 2.19.	Retinal blood vessels over the fovea. (from Polyak, 1941)

Flicker
Figure 2.20.
Critical flicker fusion for a centrally viewed stimulus plotted as a function of log
luminance. Different curves show different stimulus sizes. (from Hecht & Smith,
1936)
A computer screen may appear steady with direct viewing, but not in your
periphery. A flickering stimulus (e.g., part of a display) in the periphery can be
very distracting because it efficiently attracts attention.
In Figure 2.21, data are presented showing how sensitivity to flicker varies with
the intensity of the stimulus and retinal location (Hecht & Verrijp, 1933). Data
were obtained with a 2° stimulus that was viewed foveally (0° in the figure)
and at 5° and 15° eccentric to the fovea. It appears that a single curve can
account for the changes in CFF with intensity in the fovea, but to describe CFF
at more peripheral locations requires curves with two branches. From the data
in the figure, one can see that the relationship between CFF and retinal
location is complex. At high light levels there is a decrease in CFF from the
fovea to the periphery, whereas the reverse is true at low light levels. Flicker
sensitivity declines rather markedly as a function of increasing observer age, as
shown in Figure 2.22. This is to be expected at least in part because the light
transmitted by the lens decreases with age, and flicker sensitivity is dependent
on light level. It is still not entirely clear whether there are additional neural
changes associated with age-related changes in CFF or whether these changes in
flicker sensitivity are secondary to changes in light level alone (Weale, 1982). In
any case, this means that a display may appear to be flickering for one observer
while an older observer would see the display as steady, i.e., no flicker.
Figure 2.21.	CFF plotted as a function of log intensity for a 2° stimulus
viewed at 0° (fovea), 5°, and 15° eccentric to the fovea. (from Hecht &
Verrijp, 1933)
The CFF measurements discussed so far were obtained with a stimulus that was
either completely on or completely off. If we were to draw a graph of the
intensity over time it would look like shape 1 illustrated in Figure 2.23, and is
known as a square wave. With specialized equipment, deLange (1958) also
measured flicker sensitivity using other waveforms (changes in light intensity
over time) that are shown by the inset in Figure 2.23, and at three different
light levels. In each case, the stimulus was repeatedly made brighter and
dimmer at the frequency specified on the horizontal axis. The vertical axis plots
the "ripple ratio," or amplitude of modulation, which refers to the amount that
the light must be increased and decreased relative to the average light level to
just detect flicker. Figure 2.23 thus illustrates our sensitivity to flicker at all
different frequencies. It can be seen that we are most sensitive to flicker at
about 10 Hz. At higher frequencies, the amplitude of modulation must be
increased in order for flicker to be detected.
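The "ripple ratio" described above expresses the luminance swing relative to
the mean level. One common way to formalize this is Michelson modulation; the
sketch below uses that form as an assumption, not as deLange's exact
definition.

```python
def modulation_depth(l_max, l_min):
    """Michelson modulation: the amplitude of the luminance swing
    relative to the mean level, (Lmax - Lmin) / (Lmax + Lmin).
    A steady light has depth 0; full on/off flicker has depth 1."""
    return (l_max - l_min) / (l_max + l_min)

# A light oscillating between 120 and 80 units (mean 100, amplitude 20)
# has a modulation depth of 0.2.
m = modulation_depth(120.0, 80.0)
```

Near the most sensitive frequency (about 10 Hz in Figure 2.23), flicker is
detectable at a much smaller modulation depth than at higher frequencies.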
We noted in our discussion of hearing that the response of the human auditory
system to complex sounds can be predicted by decomposing the complex tones
into a set of pure tones, or sinusoidal waveforms. deLange (1958) applied this
approach to the different waveforms of his flickering stimuli by mathematically
analyzing them in terms of a set of sine-wave components (using Fourier
analysis). Figure 2.23 shows a plot of the amplitude of modulation for the
Figure 2.22.
Critical flicker fusion plotted as a function of age for six different studies which
used different stimulus conditions. (from Weale, 1982)
Figure 2.23.	Amplitude of modulation ("ripple ratio") required to detect
flicker, plotted as a function of frequency (cps) for different waveform shapes
and light levels. (from deLange, 1958)
the fluid rose.... In a room where more than two other people
were walking she felt very insecure and unwell, and usually left
the room immediately, because 'people were suddenly here or
there but I have not seen them moving.'... She could not cross
the street because of her inability to judge the speed of a car,
but she could identify the car itself without difficulty. 'When I'm
looking at the car first, it seems far away. But then, when I want
to cross the road, suddenly the car is very near.'
(From Zihl, von Cramon & Mai, 1983, p. 315).
Figure 2.24 shows a square comprised of dots that are arranged in a random
order, and a set of dots arranged so that they spell a word. If the two sets of
dots are printed on a transparent sheet and superimposed, no word can be read
and one observes only a set of dots. However, if one sheet moves relative to the
other, the dots that move together form a clearly legible word. Structure
Figure 2.24.	If the two sets of dots are superimposed, no pattern can be
detected. However, if one set of dots moves relative to the other, the word
"motion" will be clearly visible.
emerges from the motion information. This illustrates one of the many functions
of motion -- to separate figure and ground. When an object moves relative to a
background, the visual system separates the scene into figure and ground.
Our perception of motion is influenced by many factors. Our perception of
motion speed is affected by the sizes of moving objects and background.
Measures of motion thresholds indicate that we can detect changes of an object
on a stationary background on the order of 1 to 2 minutes of arc per second.
However, when the background cues are removed, motion thresholds increase
by about a factor of ten (see Graham, 1965b). These thresholds also depend on
the size of the moving object and background. For example, Brown (1931)
compared movement of circles inside rectangles of different size, as illustrated
by Figure 2.25. Observers were asked to adjust the speed of one of the dots to
match the experimenter-controlled speed of the other. He found that in the
large rectangle, the spot had to move much faster than in the small rectangle to
be perceived as moving at the same speed. As a general rule, when different
size objects are moving at the same speed, the larger one will appear to be
moving more slowly than the small one. Leibowitz (1983) believes that this is
the reason for the large number of fatalities at railroad crossings. Large
locomotives are easily seen from the road, but they are perceived to be moving
more slowly than they really are. As a consequence, motorists misjudge the
amount of time they have to cross the tracks.
Most of the motion that we observe involves actual displacement of objects over
time, but this is not a necessary condition for the experience of motion. For
example, a compelling sense of motion occurs if we view two lights, separated
in space, that alternately flash on and off with a brief time interval between the
flashes (about 60 msec). This movement is called stroboscopic motion, and it is
Figure 2.25.
Illustration of experiment by Brown (1931). The left circle must move faster than
the one on the right for the two to be perceived as moving at the same speed.
very important for motion pictures because films are merely a set of still
pictures flashed in quick succession. Stroboscopic movement is also important
for understanding how motion is actually perceived because it demonstrates that
it is.a perceptual quality of its own, rather than a derivative of our sense of
time and space.
In early studies of stroboscopic movement, Wertheimer (1912) discovered that
the apparent movement of two spots of light in the above demonstration goes
through several different stages depending on the time interval between the
flashes. If the interval was less than 30 msec, no movement was detected.
Between about 30 and 60 msec there was partial or jerky movement, while at
about 60 msec intervals the movement appeared smooth and continuous.
Between about 60 and 200 msec, movement could be perceived, but the form of
the object could not (objectless movement). Above about 200 msec, no
movement was detected. Of course, these values depend on the distance
between the two stimuli, but at all distances the different stages could be
identified.
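Wertheimer's stages can be summarized as a simple mapping from flash
interval to percept. The band boundaries below are the approximate values
given in the text (and, as the text notes, depend on the distance between the
stimuli); the exact cutoffs in this sketch are illustrative assumptions.

```python
def apparent_motion_stage(interval_msec):
    """Approximate perceptual stage of stroboscopic motion as a function
    of the interval between two flashes (after Wertheimer, 1912).
    Boundaries are the rough values from the text, not exact thresholds."""
    if interval_msec < 30:
        return "no movement"
    if interval_msec < 55:
        return "partial or jerky movement"
    if interval_msec <= 65:          # "at about 60 msec"
        return "smooth, continuous movement"
    if interval_msec <= 200:
        return "objectless movement"
    return "no movement"
```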
Still another type of movement perception occurs without actual movement of
the object. For example, induced movement occurs when a background moves in
the presence of a stationary object, but it is the object not the background that
is seen as moving. You may have had this experience looking at the moon
when clouds were moving quickly in the wind; it is not unusual to have the
experience of the moon moving across the sky.
On a clear and quiet night looking at a star against a dark sky you may also
have experienced illusory movement of the star. The effect is easily
demonstrated by looking at a small light on a dark background. It may start to
move, even though it is rigidly fixed in place. This illusory movement is known
as the autokinetic effect. It is not well understood, but some researchers believe
it may be due to drifting movements of the eyes (Matin & MacKinnon, 1964).
Whatever the cause, one can imagine practical situations in which the
autokinetic effect has the potential to cause errors in judgment.
Chapter 3
Color Vision
by John S. Wemer, Ph.D., University of Colorado at Boulder
Color Mixture
From the scotopic spectral sensitivity curve (Figure 2.12, p. 22) it is clear that
rods are not equally sensitive to all wavelengths. Why, then, do all wavelengths
look the same to us when they stimulate only the rods? The answer is that a
rod can only produce one type of signal regardless of the wavelength that
stimulates it. That is, all absorbed quanta have the same effect on a single
receptor, and, therefore, it can only pass on one type of signal to the brain.
Thus, even though some wavelengths are more easily absorbed than others,
once absorbed they all have the same effect.
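This is the principle of univariance: a receptor's output depends only on how
many quanta it absorbs, not on which wavelength delivered them. A minimal
Python illustration (the sensitivity values and function name are illustrative
assumptions, not measured data):

```python
def receptor_response(sensitivity, intensity):
    """Univariance in miniature: the receptor's signal is determined by
    the quanta absorbed (sensitivity x intensity). The output is a single
    scalar, so wavelength identity is lost once the light is absorbed."""
    return sensitivity * intensity

# A dim light at a well-absorbed wavelength and a brighter light at a
# poorly absorbed wavelength can produce identical receptor signals,
# so a single receptor type cannot tell them apart.
r1 = receptor_response(sensitivity=0.8, intensity=100)
r2 = receptor_response(sensitivity=0.2, intensity=400)
```

This is why, as discussed next, color vision requires comparing the outputs of
several receptor types with different spectral sensitivities.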
If each receptor cell has only one type of response, what explains how we use
our cones to see color? The answer is that we have three different types of
cones. They differ because each type contains a different photopigment.
Figure 3.1 shows the absorption spectra -- plots of relative absorption as a
function of wavelength -- for the three types of photopigment contained in
human cones. Note that each type is capable of absorbing over a broad
wavelength range. One type maximally absorbs quanta at about 440 nm,
another at about 530 nm, and the third type at about 560 nm. We call these
three types of receptors short-, middle-, and long-wave cones, based on their
wavelength of maximal sensitivity.
Figure 3.1.	Absorption spectra of the three types of human cone
photopigments.

Because S cones are essentially absent from the very center of the fovea, color
discriminations that depend on S cones will be impaired if the image
is sufficiently small to fall only on the center of the fovea. This is illustrated by
Figure 3.3. When viewed close, so that the visual angle of each circle subtends
several degrees, it is easy for an individual with normal color vision to
discriminate the various pairs; yellow vs. white, blue vs. green, and red vs.
green. Viewed from a distance of several feet, however, the yellow and white,
as well as the blue and green, pairs will be indiscriminable. This is called
small-field tritanopia, because tritanopes are individuals who completely lack S
cones. A tritanope would not be able to discriminate the yellow from the white
in Figure 3.3 regardless of their sizes. With certain small fields, even normal
individuals behave like tritanopes. Notice that with the small field condition, the
red-green pair is still discriminable because S cones are not necessary for this
discrimination. Thus, the small-field effect is limited to discriminations that
depend on S cones.
Figure 3.3.	Colors (yellow and white) not discriminable at a distance due to
small-field tritanopia.
Table 3.1
Congenital Color Vision Deficiencies

Type            Abnormality               Prevalence (Males / Females)
Tritanomaly     Shifted S-cone pigment    ? / ?
Deuteranomaly   Shifted M-cone pigment    5.1% / 0.5%
Protanomaly     Shifted L-cone pigment    1.0% / 0.02%
Tritanopia      Missing S-cone pigment    0.0007% / 0.0007%
Deuteranopia    Missing M-cone pigment    1.1% / 0.01%
Protanopia      Missing L-cone pigment    1.0% / 0.02%
Not all deficiencies of color vision are congenital; some are acquired in later
life. Unlike congenital deficiencies, which are due to abnormalities at the level
of the photopigments, acquired deficiencies can be due to disruption of
processing at any level of the visual system. For example, on rare occasions
following a stroke, an individual may experience damage to a particular region
of the brain involved in color processing that will render him or her
permanently color blind. Such a case was reported for a customs official who
Figure 3.4.	Sensitivity of the short-, middle-, and long-wave cones,
measured psychophysically, plotted as a function of observer age. (data from
Werner & Steele, 1988; figure from Werner et al., 1990)

Testing
The most definitive way to measure color vision is through color matching. A
yellow light (590 nm) can be matched with a mixture of a yellowish red (670
nm) and a yellowish green (545 nm). The stimulus used for such a test is
illustrated by Figure 3.5 and is produced by an instrument called an
anomaloscope. Deuteranomalous and protanomalous individuals will differ from
normal in the ratio of the two light intensities in the mixture that is required to
match the yellow. Deuteranopes and protanopes can match the yellow using
only one of the two lights simply by adjusting the intensity. Other wavelength
mixtures can be used to diagnose deficiencies of the short-wave cones.
48
. ..
.. ..
'
' ,
-J
r.
Color Vision
670 nm
Figure 3.5.
,-
'..
. .
07
0.,
*~ ~
.g@
,0
ih
4~
**.kw
*~
~0.
.v
*
L~
@*4
* ,"0 V Iih!i
Figure 3.6:
Figure 3.7.
.-
Color Vision
Color Appearance
Color is defined by three properties) brightness, hue, and saturation. It would
be convenient for engineers if these three psychological properties were related
in one-to-one correspu-dence to physical properties of light, but they are not.
Imagine that you are sitting in a dark room viewing a moderately bright
monochromatic light of 550 nm. A normal trichromat would say it is yellowish
green. If we increased the number of quanta the light emits, you would say that
the light is now brighter. What you experience as brightness increases with the
light intensity, but before you conclude that brightness depends only on light
intensity, look at Figure 3.8, which demonstrates simultaneous brightness contrast.
The two central patches are identical, but their brightness is influenced by the
surroundings. All things being equal, brightness increases with intensity, but it
is also affected by other factors.
Figure 3.8.
As we increase the intensity of our 550 nm light, you will probably notice that
what appeared as green with just a tinge of yellow now has a much more vivid
yellow component. You might say that the color has changed, but this change
in appearance is described more precisely as a change in hue. Hue refers to our
chromatic experience with light, such as redness and greenness. Many people
think that particular wavelengths produce definite hues, but this is not entirely
correct. Wavelength is related to hue, but one must consider other variables as
well, such as intensity. In our example, a single wavelength produced somewhat
different hues at different intensities.
A third change in the appearance of our 550 nm light as we increase the
intensity is that the tinge of whiteness that was detectable at low intensities has
now become clearer. The whiteness or blackness component is another
dimension of our color experience known as saturation. A light with little white
is said to be highly saturated and appears vivid; a light with more whiteness is
less saturated and appears more "washed out."
Thus, there are three dimensions of color experience: brightness, hue, and
saturation. These dimensions are related to, but not completely determined by,
the physical dimensions of intensity and wavelength. As we increased the
intensity of our 550 nm light while holding wavelength constant, we saw a clear
change in brightness, hue, and saturation.
Figure 3.9. [Percentage of red or green and of blue or yellow reported by a group of observers, plotted as a function of wavelength (400-700 nm).]
In these graphs, the percentage red or green is plotted on the left from 0 to
100, and the percentage blue or yellow is plotted on the right from
100 to 0. The data could be plotted in this way because, when describing a
uniform patch of the visual field, observers do not use the terms red and green
simultaneously, that is, they do not call it "reddish green," nor do they use the
terms blue and yellow simultaneously ("bluish yellow"). The arrows in the
graphs indicate the wavelengths perceived to be uniquely blue, green, or yellow.
Hering further argued that by studying our color experiences carefully, we could
discover other properties of how the brain codes for hue. For example, while
we can experience red in combination with either yellow (to produce orange)
or blue (to produce violet), we cannot experience red and green at the same
time and place. When red and green lights are combined, they cancel each
other. The same is true of blue and yellow lights. Hering proposed that this
happens because red and green (as blue and yellow) are coded by a single
Figure 3.10. [Schematic of opponent color coding; W = white, Bk = black.]
Figure 3.11. [Data for individual observers plotted as a function of retinal illuminance (log trolands).]
Figure 3.12. [Zones in the visual field of the right eye in which colors can be seen. (from Hurvich, 1981)]
There are three points that should be noted about these color zones in the
visual field. First, it is evident that the same visual stimulus can be perceived
differently depending on the area of visual field that is stimulated. For example,
at the fovea, a stimulus might appear orange or reddish yellow, at about 40°
away from the fovea it might be yellow, and at 70° it may appear gray. Second,
the figure again illustrates that red and green are linked, as are yellow and
blue. The linkage is through an opponent code as discussed earlier. Third, these
zones were measured under one condition; under other conditions, such as
larger fields, they will change somewhat.
Wavelength Discrimination and Identification
Discriminating color requires an observer to compare two lights and to decide
whether they are the same or different. Identification involves an absolute
judgment about a color name or category that must be made regardless of
whether other colors are present.
Range of Discrimination
To measure wavelength discrimination, the experimenter typically uses a split
field such as that shown by the inset of Figure 3.13. One half-field is
illuminated by a standard wavelength and the other half-field by a variable
wavelength. If the two half-fields are seen as different, the experimenter
increases or decreases the intensity of the variable wavelength to determine
whether it is discriminable at all intensities. If there is any intensity at which
the fields are indiscriminable, it is said that the observer does not discriminate
the wavelength pairs. Thus, when we say that two wavelengths can be
discriminated, it is implied that this discrimination is made independent of
intensity. The object of such an experiment is to find the minimum wavelength
difference, or Δλ, that can be discriminated.
Figure 3.13. [Wavelength discrimination (Δλ), independent of intensity, plotted as a function of wavelength (400-700 nm). Inset: photometric field, 2 degrees, approximately 70 trolands.]
Simultaneous Contrast
Figure 3.15 illustrates a spatial, color-contrast effect. The thin bars in the two
patterns are identical, but they look different when surrounded by different
colors. This is called simultaneous color contrast because it occurs
instantaneously. The color induced into the focal area is opposite to that of the
surround. This is attributable to opponent processes that operate over space; the
neural activity in one region of the retina produces the opponent response in
adjacent regions. While the effect noticed here is primarily from the surround
altering the appearance of the bars, the opposite also occurs.
Through simultaneous contrast we can experience many colors that are not seen
when viewing spectral lights. For example, the color brown is experienced only
under conditions of color contrast. If a yellow spot of light is surrounded by a
dim white ring of light it will look yellow. As the luminance of the surround is
increased (without changing the luminance of the center), there will be
Figure 3.14.
Figure 3.15.
corresponding changes in the central color. First it will look beige or tan, then
light brown, followed by dark brown (Fuld et at., 1983). If the ring is still
further increased in luminance, the central spot will look black. The color black
is different from the other fundamental colors because it arises only from the
indirect influence of light. That is, like brown, the color black is a contrast
color and is only perceived under conditions of contrast. Any wavelength can be
used in the center or surround and if the luminance ratio is sufficiently high,
the center will appear black (Werner et al., 1984).
Assimilation
Sometimes a pattern and background of different colors will not oppose each
other as in simultaneous contrast, but will seem to blend together. This is
known as assimilation or the Bezold spreading effect and is illustrated by Figure
3.16 (reprinted from Evans, R.M., An Introduction to Color, Plate XI, p. 192,
© John Wiley & Sons, Inc., New York, NY). Here we see that the saturation of the
Figure 3.16. [A demonstration of assimilation. (from Evans, 1948)]
red background of the top left and center looks different depending on whether
it is interlaced with white or black patterns, even though the background is
physically the same in the two sections. The lower illustration shows the effect
it is known that it cannot be explained by light scatter from one region of the
image to another. The phenomenon arises from the way in which colors are
processed by the brain.
Adaptation
We have already seen from the dark adaptation curve that the visual system
changes its sensitivity according to the surrounding level of illumination. We
have also seen that visual acuity increases with increased light level. Here we
shall briefly discuss some of the changes in color perception that occur with
changes in ambient light.
Chromatic Adaptation

Figure 3.17. [Chromaticity coordinates of test lights under different states of chromatic adaptation.]
Colors take on a tinge complementary to that of the adapting field color. For
example, white letters may be tinged
with yellow when viewed on a blue background or tinged with green when the
observer has adapted to a red background. These effects of chromatic
adaptation can be altered to work in favor of color identification or detection.
For example, detection of a yellow stimulus may be enhanced by presenting it
on a blue background.
Variation Under Normal Conditions
The effects of ambient light in altering the state of adaptation are not
fundamentally different from those already shown in Figure 3.17. However,
since most ambient lights contain a broad distribution of wavelengths, the
receptors are not adapted as selectively as in laboratory experiments.
One important consideration in evaluating changes in ambient illumination
under natural conditions is that in addition to altering the perceptual state of
an observer, there often can be substantial changes in the display itself. CRT
screens typically reflect a high percentage of incident light. The light emitted
from the display is therefore seen against this background of ambient light.
Figure 3.18 shows how sunlight alters the spectral composition of the colors
available on a display. As sunlight is added to the display, the gamut of
chromaticities shrinks, as illustrated by the progressively smaller triangles
(Viveash & Laycock, 1983). To an observer this would be experienced as a
desaturation or "wash out" of the display colors as well as a shift in hue that
accompanies changes in saturation, called the Abney effect (see Kurtenbach,
Sternheim & Spillmann, 1984). Some colors that were previously discriminable
may no longer be so. Finally, not illustrated by the figure is the substantial
reduction in luminance contrast.
Figure 3.18. [CIE chromaticity diagram showing the shrinking gamut of a display as sunlight is added.]
Color Specification
There are many situations in which it is useful to have an objective method for
specifying color. Since color perception of a fixed spectral distribution depends
upon many conditions, a system of color specification could be based on
appearance or on some physical or psychophysical description of the stimulus.
Each system of color specification has advantages and disadvantages.
CIE System
We have seen that a normal trichromat can match any wavelength (or any
mixture of wavelengths) by some combination of three other wavelengths, or
primaries.
Figure 3.19. [The CIE color-matching functions for a 2° standard observer, plotted as a function of wavelength. (from Wyszecki & Stiles, 1982)]

These color-matching functions are designated x̄, ȳ, and z̄, and are known as the
spectral tristimulus values. Among the nuances of this system, the ȳ tristimulus
value is identical to the Vλ function (the photopic sensitivity of the standard
observer). Thus, when the ȳ tristimulus value is integrated with the energy
distribution (by multiplying the energy by ȳ at each wavelength and summing), we
have the total value of the Y primary, which is equal to the luminance. It should
also be mentioned that the CIE actually developed two sets of tristimulus values,
one for 2° stimuli and one for 10° stimuli.
To specify the chromaticity of a particular color in the CIE system, the energy at
each wavelength is multiplied by the x̄ tristimulus value at each wavelength and
the products are summed across wavelengths to yield the tristimulus value (not
to be confused with the spectral tristimulus values) designated as X. Similarly,
the energy across wavelengths is convolved with the ȳ and z̄ tristimulus values
to yield Y and Z. The X, Y, Z values can be quite useful in specifying a color.
For example, given the values for a color of interest, we can be certain that it
Figure 3.20. [The CIE (x, y) chromaticity diagram.]
The spectral colors all plot around the perimeter of the diagram, a region known as the spectrum
locus. The area inside the diagram represents all physically realizable mixtures
of color. Given the chromaticity coordinates of a color, a perfect match can be
made by various mixtures determined using the chromaticity diagram. If we also
wanted the match to include information about luminance, we would have to
specify Y as well as the x,y coordinates.
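The tristimulus computation described above can be sketched in a few lines of Python. The five-sample spectra below are illustrative stand-ins, not the actual CIE tables (which are tabulated at fine wavelength steps):

```python
# Sketch of the CIE tristimulus computation: multiply the energy at each
# wavelength by each color-matching function (xbar, ybar, zbar) and sum.
# All spectral values below are illustrative placeholders, NOT CIE data.

wavelengths = [450, 500, 550, 600, 650]   # nm
energy      = [0.2, 0.5, 1.0, 0.8, 0.3]   # toy spectral energy distribution

xbar = [0.34, 0.005, 0.43, 1.06, 0.28]    # toy stand-ins for the
ybar = [0.04, 0.32, 0.99, 0.63, 0.11]     # spectral tristimulus values
zbar = [1.77, 0.27, 0.009, 0.001, 0.0]

X = sum(e * cx for e, cx in zip(energy, xbar))
Y = sum(e * cy for e, cy in zip(energy, ybar))  # Y is proportional to luminance
Z = sum(e * cz for e, cz in zip(energy, zbar))

# chromaticity coordinates normalize out the overall energy level
x = X / (X + Y + Z)
y = Y / (X + Y + Z)
print(f"X={X:.3f} Y={Y:.3f} Z={Z:.3f}  x={x:.3f} y={y:.3f}")
```

Two lights with equal (x, y) but different Y have the same chromaticity at different luminances, which is why a full match requires Y as well.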
A useful property of the CIE chromaticity diagram stems from the fact that a
mixture of two lights always plots on a straight line that connects the points
representing the lights within the diagram. The position along the line that
represents the mixture depends on the energy ratio of the two lights. Thus, if
we plot the points representing the chromaticity coordinates of three phosphors
on a color display, we can connect the points to create a triangle representing
the color gamut of the display. This triangle would represent all chromaticities
that can be generated by the display.
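The gamut-triangle idea lends itself to a direct check: a chromaticity can be generated by the display exactly when it lies inside the triangle formed by the three phosphor chromaticities. A sketch, using hypothetical phosphor coordinates (not measurements of any particular display):

```python
# Point-in-triangle test for a display gamut in the (x, y) chromaticity diagram.
# Phosphor coordinates below are hypothetical examples.

def cross(o, a, b):
    """z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_gamut(p, red, green, blue):
    """True if chromaticity p lies inside (or on) the phosphor triangle."""
    d1 = cross(red, green, p)
    d2 = cross(green, blue, p)
    d3 = cross(blue, red, p)
    has_neg = (d1 < 0) or (d2 < 0) or (d3 < 0)
    has_pos = (d1 > 0) or (d2 > 0) or (d3 > 0)
    return not (has_neg and has_pos)   # all same sign => inside

R, G, B = (0.63, 0.34), (0.29, 0.60), (0.15, 0.07)  # hypothetical phosphors
print(in_gamut((0.33, 0.33), R, G, B))  # near-white point: True
print(in_gamut((0.10, 0.80), R, G, B))  # outside the triangle: False
```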
The CIE chromaticity diagram is useful for specifying color in many
applications, but it does have some drawbacks. Perhaps the most important
problem is that equal distances between sets of points in the diagram are not
necessarily equal distances in perceptual space. To rectify this problem the CIE
developed a new chromaticity diagram, shown in Figure 3.21, in an attempt to
provide more uniform color spacing. The coordinates of this diagram are called
u',v' and can be obtained by a simple transformation from the x,y coordinates
(for further details see Wyszecki & Stiles, 1982). The smaller triangle in Figure
3.21 shows the gamut of many typically used displays while the larger triangle
shows the maximum envelope of
currently used displays.
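The transformation itself is compact; a sketch of the standard CIE 1976 formulas (see Wyszecki & Stiles, 1982):

```python
# Convert CIE 1931 (x, y) chromaticity to CIE 1976 (u', v') coordinates
# using the standard CIE formulas.

def xy_to_uv(x, y):
    denom = -2.0 * x + 12.0 * y + 3.0
    return 4.0 * x / denom, 9.0 * y / denom

# the equal-energy white point (x = y = 1/3) maps to about (0.2105, 0.4737)
u, v = xy_to_uv(1/3, 1/3)
print(f"u'={u:.4f} v'={v:.4f}")
```

Equal distances in (u′, v′) correspond more nearly, though still imperfectly, to equal perceptual differences than distances in (x, y).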
Figure 3.21. [The CIE u′,v′ chromaticity diagram with display gamut triangles.]

Munsell System
Color can also be specified with appearance-based systems, the best-known
system being the one developed by Munsell in 1905. In its current form,
the Munsell System consists of a series of colored paint chips arranged in an
orderly array, as illustrated by Figure 3.22. Each entry is characterized by three
numbers that specify hue, blackness and whiteness from 0 to 10 (called
lightness), and ratio of chromatic and achromatic content (called chroma). Hue
is represented by a circular arrangement in 40 steps that are intended to be
equal in perceptual space. Lightness varies from bottom to top in nine equally
spaced steps from black to white. Chroma, or saturation, represents the hue and
lightness ratios in 16 steps that vary from the center outward. To use this
system, one merely finds the chip that most closely matches an item of interest.
Each chip is specified by three parameters: hue, lightness, and chroma. Since
the steps between chips are nearly equal, the Munsell system can be useful in
the selection of colors that are equal distances in perceptual space.
While the Munsell system is easy to use and the arrangement corresponds more
closely to color appearance than the
CIE system, it still has many
limitations. The influences of
surrounding colors and state of
adaptation which are important for
color appearance are not taken into
account by the Munsell designations.
Thus, the appearance and
discriminability of colors expected
from a Munsell designation may not
be obtained when the conditions of
viewing are altered.
Figure 3.22. [A Munsell page of constant hue (5R), with lightness varying vertically and chroma varying horizontally.]
Color coding can draw attention to critical information while at the same time providing a
good basis for grouping or
organizing information on a display
which may help display operators segregate multiple types of information and
reduce clutter. For example, an experiment by Carter (1979) showed that when
the number of display items was increased from 30 to 60, search time increased
by 108% when only one color was used, but increased by only 17% for
redundant color-coded displays.
There are severe constraints on the effective usage of color information (see
also Walraven, 1985). The attention-getting value of a color is dependent on its
being used sparingly. Only a limited number of colors should be used in order
to avoid overtaxing the ability of an observer to classify colors. If each color is
to have meaning, only about six or seven can be utilized effectively.
In addition, we have seen that perception of a fixed stimulus will be changed as
a function of many variables including the intensity, surrounding conditions,
temporal parameters, and state of adaptation of an observer. If color is a
redundant code, these problems, as well as loss of color due to aging of the
display, will have substantially less impact on operator performance.
The choice of colors can be facilitated by considering the physiological
principles by which hues are coded -- red opposes green and blue opposes
yellow. These colors are also separated well in CIE chromaticity diagrams.
Colors that are barely discriminable at low ambient conditions may not be at all
discriminable at high ambient conditions because of a physical change in the
color gamut.
The use of blue stimuli can be problematic for displaying characters requiring
good resolution. The blue phosphors on many displays only produce relatively
low luminances, but the main difficulty is a physiological problem in processing
short wavelengths. One problem already mentioned that might result from using
small blue stimuli is related to small-field tritanopia. Because the short-wave
cones are distributed more sparsely across the retina, they contribute very little
to detail vision. Short-wave cone signals are not used in defining borders or
contours (Boynton, 1978). In addition, focusing of short-wavelength stimuli is
not as easily achieved as for middle- and long-wave stimuli, making blue a
color to avoid in displaying thin lines and small symbols. A major advantage of
blue and yellow is that our sensitivity to these colors extends further out in the
visual field than our sensitivity to red and green. Blue hues also provide good
contrast with yellow. Thus, while blue may be a good color to avoid when
legibility is a consideration, it may be a good color to use for certain
backgrounds on displays.
Chapter 4
Form and Depth
by John S. Werner, Ph.D., University of Colorado at Boulder
We have already discussed how two objects of different sizes, placed at different
distances from us, can cast images of identical size and shape on our retina.
Despite this, we can still tell that one is small and close and the other is large
and far away. How do we do this? Either we have additional information
about physical distance or we know something about the physical size.
We encounter another aspect of the same perceptual problem when we consider
the fact that as an object changes position with respect to us, because either it
is moving or we are moving, the retinal image formed by the object
continuously changes shape and size. These changes depend on both the
object's distance and our angle of view. For example, an object moving away
"grows" smaller. Or the image of a square on our retina may become in turn a
rectangle or a trapezoid depending on our angle of view. The amazing fact in
the face of such retinal contortions is that our perceptions of the object's shape
and size remain relatively constant; we still see a square. These perceptual
other words, our concepts about the physical world. It is important to realize
that we are usually unaware of this process when perceiving size and distance;
we do it automatically. In this section we will discuss some of the ways in
which form and depth information are processed.
Figure 4.1.
Figure 4.2. [Top: photograph of a luminance ramp. Bottom: its luminance profile plotted as a function of position; D and B mark the dark and bright Mach bands.]
The bottom panel of Figure 4.2 shows a luminance profile in which there is an
increase in intensity from left to right. The photograph above shows a stimulus
that changes according to this luminance distribution, but notice that our
perception does not follow it exactly. Rather, at the border one perceives small
bands of exaggerated darkness and brightness, labelled D and B in the
photograph. These are called Mach bands in honor of Ernst Mach (1865) who
first described them. The pattern we perceive exaggerates the abrupt light-dark
transitions.
There are many other phenomena in which the brightness or darkness of a
region depends on border contrast or on changes in contrast over time (see
Fiorentini et al., 1990). These phenomena reveal the visual system's attempt to
extract information at the borders because borders and edges define objects or
parts of objects.
Contrast Sensitivity
The forms of objects are defined by contrast. It is, therefore, important to
characterize the sensitivity of the visual system to contrast. One approach to this
problem is to measure contrast sensitivity using grating stimuli in which the
luminance is varied sinusoidally as illustrated by Figure 4.3. If one were to
Figure 4.3. [Grating stimuli (left) and their luminance profiles plotted as a function of position (right).]
measure the intensity of the stimuli on the left, by passing a light meter across
it, the sinusoidal luminance profile on the right would be found. The profile of
the stimuli could be characterized by the contrast, which was defined above by
the difference between the luminance maximum and minimum, divided by the
average luminance. The frequency of oscillation of the sine wave is defined in
terms of the number of cycles per degree of visual angle (cpd). For example,
the stimulus on the top of Figure 4.4 has a lower spatial frequency than the
one on the bottom.
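As a rough numerical sketch of these definitions (the luminance values are illustrative):

```python
import math

# A one-dimensional sinusoidal grating: L(x) = L0 * (1 + C * sin(2*pi*f*x)),
# where L0 is the mean luminance, C the contrast, f the spatial frequency in
# cycles per degree (cpd), and x the position in degrees of visual angle.
L0, C, f = 50.0, 0.5, 4.0                       # illustrative values
xs = [i / 1000.0 for i in range(1000)]          # one degree of visual field
lum = [L0 * (1 + C * math.sin(2 * math.pi * f * x)) for x in xs]

# Michelson contrast, (Lmax - Lmin) / (Lmax + Lmin), recovers C for this
# grating: it equals the amplitude divided by the mean luminance.
lmax, lmin = max(lum), min(lum)
michelson = (lmax - lmin) / (lmax + lmin)
print(f"contrast = {michelson:.3f}")            # prints contrast = 0.500
```

Contrast sensitivity is then simply 1 divided by the smallest C at which the grating is detectable.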
Contrast threshold is measured by determining the minimum contrast required
for detection of a grating having a particular spatial frequency (usually
generated on a CRT display). Contrast sensitivity is the reciprocal of contrast
threshold. Thus, the contrast sensitivity function represents the sensitivity of an
individual to sine-wave
gratings plotted as a function
of their spatial frequency.
Figure 4.4 shows a typical contrast sensitivity function.

Figure 4.4. [Contrast sensitivity plotted as a function of spatial frequency (c/deg).]
It can be deduced from the contrast sensitivity function that we are not equally
sensitive to the contrast of objects of all sizes. High spatial frequency sensitivity
is related to visual acuity; both are a measure of resolution, or the finest detail
that can be seen. When spatial vision is measured by an optometrist or
ophthalmologist, only visual acuity is typically measured. While a more
complete evaluation of spatial vision would include contrast sensitivity
measurements over a range of spatial frequencies, it is the high frequency
sensitivity that is most impaired by optical blur (Westheimer, 1964). Thus, high
frequency sensitivity is what is improved by spectacle corrections.
One explanation for our contrast sensitivity is that cells in the visual cortex
respond selectively to a small band of spatial frequencies. The contrast
sensitivity function may thus represent the envelope of sensitivity of these cells.
This is analogous to the photopic spectral sensitivity function representing the
relative activity of three classes of cones. In the case of contrast sensitivity, the
model implies that different cells respond selectively to stimuli of different sizes.
A demonstration consistent with this idea is presented in Figure 4.5. Notice that
Figure 4.5.
the two patterns on the right are of identical spatial frequency. Now, stare at
the bar between the two gratings on the left, allowing your eyes to move back
and forth along the bar. This scanning prevents the buildup of a traditional
afterimage. It is intended to fatigue cells responsive to gratings of a particular
size. After about 45 seconds of fixating along the bar on the left, shift your
gaze to the small bar on the right. The two patterns on the right will now
appear to have different spatial frequencies. According to theory (Blakemore &
Sutton, 1969), size-selective cells responsive to gratings on the left were
fatigued during fixation. This shifted the balance of activity when looking at the
patterns on the right compared to the activity produced by the gratings prior to
adaptation.
Variation with Luminance
The effects on contrast sensitivity of changing the space average luminance of
the stimulus were systematically investigated by DeValois, Morgan and
Snodderly (1974). Their data are shown in Figure 4.6. Contrast sensitivity is
plotted as a function of spatial frequency. Different symbols and curves from
top to bottom correspond to luminance decreases in steps of 1.0 log unit. This
figure shows that overall contrast sensitivity is reduced as luminance decreases,
but the reduction in sensitivity is much greater for high than low spatial
frequencies. This shifts the peak of the function to lower frequencies with
reduced luminance. In general, high spatial frequency sensitivity decreases as a
function of the square root of the luminance (Kelly, 1972).
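Kelly's square-root relation can be illustrated numerically; the function below is a sketch of that single rule, not a model of the full contrast sensitivity function:

```python
import math

# If high spatial frequency sensitivity grows as sqrt(luminance), each
# 1.0 log-unit (10x) drop in luminance cuts sensitivity by sqrt(10), about 3.16x.

def relative_sensitivity(luminance, reference_luminance):
    """High-spatial-frequency sensitivity relative to a reference level."""
    return math.sqrt(luminance / reference_luminance)

for log_drop in (0, 1, 2, 3):
    rel = relative_sensitivity(10.0 ** -log_drop, 1.0)
    print(f"{log_drop} log-unit drop -> sensitivity x {rel:.3f}")
```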
Figure 4.6. [Contrast sensitivity plotted as a function of spatial frequency at several luminance levels. (from De Valois, Morgan & Snodderly, 1974)]
in the retina and at higher levels in the brain (see Wilson et al., 1990). When
larger stimuli were used to compensate for these factors, Rovamo et al. obtained
the results shown in the panel on the right side of Figure 4.7. These results are
important because they show how stimuli can be scaled in size to be equally
visible at all eccentricities.
78
10-a
oO
01
S"c
.1-010
~0
O\
0,o.
7.
tI
c0
16
32
Figure 4.7.
64
')
5,,
200
L
-
IOo
50
2 o0's -S
20
3
40's
~40'
c_
\0 50's
o-
6o's
7
.70's
-8os
N 91
--
0 5
Spatial frequency,
16
16
c/deg
80
Figure 4.9.
Sine-wave (left) and square-wave (right) gratings of the same spatial frequency.
(from De Valois & De Valois, 1988)
As more components are added, the sum of the sine waves looks more and more
like a square wave. While a mathematically perfect
square wave requires an infinite number of sine waves, only the frequencies to
which we are sensitive (as defined by the contrast sensitivity function) need be
used. This can be demonstrated by producing a set of sine waves and adding
various components until the complex wave becomes indiscriminable from a
true square wave.
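This synthesis can be sketched directly; the partial sums below use the textbook Fourier series for a square wave:

```python
import math

# A square wave built from its odd sine-wave harmonics:
# sq(x) ~ (4/pi) * sum over odd n of sin(n*x) / n.
# Adding components makes the sum look progressively more square.

def partial_square_wave(x, n_harmonics):
    """Sum of the first n odd harmonics (f, 3f, 5f, ...) at point x."""
    total = 0.0
    for k in range(n_harmonics):
        n = 2 * k + 1
        total += math.sin(n * x) / n
    return (4.0 / math.pi) * total

# With more harmonics, the value at x = pi/2 (a flat part of the wave)
# approaches the square wave's value of 1.
for n in (1, 3, 10, 100):
    print(n, round(partial_square_wave(math.pi / 2, n), 4))
```

In practice only the harmonics within the contrast sensitivity function matter, which is why a finite sum can be indiscriminable from a true square wave.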
Figure 4.10. [A square wave and its synthesis from sine-wave components: f, f + 3f, f + 3f + 5f, f + 3f + 5f + 7f.]
Figure 4.11.
The size of objects can sometimes indicate their relative depth. If several similar
items are presented together, the larger items will be judged as closer. For
example, the series of circles in Figure 4.12 appears to be receding into the
distance. This makes sense because, in fact, the size of an object's image on the
retina becomes progressively smaller as it moves away.
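The geometry behind this can be sketched with the visual-angle formula (the sizes and distances below are arbitrary illustrative values):

```python
import math

# The retinal image size of an object is set by its visual angle, which
# shrinks as the object moves away: angle = 2 * atan(size / (2 * distance)).
# Size and distance must be in the same units.

def visual_angle_deg(object_size, distance):
    """Visual angle (degrees) subtended by an object at a given distance."""
    return math.degrees(2.0 * math.atan(object_size / (2.0 * distance)))

# for small angles, doubling the distance roughly halves the visual angle
for d in (10.0, 20.0, 40.0):
    print(f"distance {d}: {visual_angle_deg(1.0, d):.3f} deg")
```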
Figure 4.12. [A series of circles decreasing in size appears to recede into the distance.]
The ability to infer distance from image size often depends on familiarity with
the true size of the objects. At great distances, such as looking down from an
airplane, we perceive objects to be smaller than when they are near. In this
situation, our familiarity with objects and their constancy of size serve as a
source of information about distance. Although from the air a house seems like
a toy, our knowledge about the actual size of houses informs us that the house
is only farther away, not smaller.
The relation between size and distance can lead not only to faulty inferences
about distance, as illustrated by Figure 4.12, but assumptions about distance
can also lead to faulty inferences about size. When we are misinformed about
distance, our perceptions of size and shape will be affected. You have probably
noticed, for example, how much larger the moon appears when it is low on the
horizon than high in the evening sky. This is called the moon illusion. The
change in the moon's appearance is only slightly affected by atmospheric
phenomena; by far the greatest effect is perceptual. Our retinal image of the
moon is the same size in both positions. You can prove this by holding at arm's
length a piece of cardboard just large enough to block the moon from view.
The same piece of cardboard blocks the moon at the horizon and at its zenith
equally. Though they look different, they measure the same. The moon illusion
seems to be caused by inaccurate distance information about very far objects
(Kaufman & Rock, 1962). Because we see intervening objects on the earth's
surface when we look at the moon near the horizon, our internal distance
analyzers apparently cue us that the moon is farther away than when it is at its
zenith. An object analyzed as more distant has to be larger to produce an image
of the same size. Thus we perceive the moon as larger on the horizon than
when it is at its zenith.
The relationship between size and distance is important to understanding not
only harmless illusions such as the size of the moon, but also in situations of
more significance. As mentioned above, changing fixation from a head-up
display to distant objects often requires a change in the state of
accommodation. Change in the focus of the eye is accompanied by a change in
the apparent visual angle of distant objects. Thus, when a pilot shifts fixation
from a HUD to a distant surface in the outside world, the objects in the
distance may appear smaller and more distant than they really are (Iavecchia,
Iavecchia & Roscoe, 1988). While the resultant spatial errors in perception are
temporary, Iavecchia et al. believe they could introduce a significant safety hazard
under some conditions.
Any ambiguity about relative distance in relation to size can be rectified when
one object partially occludes another, as shown in Figure 4.13. We perceive the
partially occluded object as being more distant. This cue to depth is called
interposition.
Figure 4.13
If a distant object is not partially occluded, we may still be able to judge its
distance using linear perspective. When you look at a set of parallel lines, such
as railroad tracks going off into the distance, the retinal images of these lines
converge because the visual angle formed by two points a fixed distance apart
decreases as the points are farther away. This cue to depth is so powerful that
it may cause objects of the same size to be perceived as different, as in Figure
4.14.
Figure 4.14
Illustration of how linear perspective makes the same size objects appear to
be different sizes. (from Sekuler & Blake, 1985)
If you look at a textured surface such as a lawn, two blades of grass the same
distance apart would be separated by a smaller distance in the retinal image the
farther away they are because they cover a smaller visual angle. Most surfaces
have a certain pattern, grain, or texture such as pebbles on the beach or the
grain of a wood floor. Whatever the texture, it becomes denser with distance.
This information can provide clear indications of distances (Newman, Whinham
& MacRae, 1973). Figure 4.15 shows how discontinuities in the texture also
indicate a change such as an edge or corner.
Of special relevance to aircraft pilots is the depth cue known as aerial
perspective. As light travels through the atmosphere, it is scattered by particles
in the air such as dust and water. The images of more distant objects are thus
less clear. Under different atmospheric conditions, the perceived distance of an
object of fixed size may vary. For example, an airport will appear farther away
on a hazy day than on a clear day.
Some monocular cues to depth are not static, but are dependent on relative
movement. When we are moving, objects appear to move relative to the point
Figure 4.15. [Illustration of texture gradients indicating edges and corners.]
Figure 4.16.
Figure 4.17.
Ocular Convergence
When fixating a distant object, the image of the object will fall on the fovea of
each eye. As the object is brought nearer, maintenance of fixation will require
that the two eyes move inward or converge. This information about
convergence can be used to gauge the absolute distance of objects, provided the
objects are not more than about 10 feet away. Beyond this distance, the
convergence angle of the two eyes approaches zero.
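The geometry of convergence can be sketched numerically. This is a rough illustration, and the 6.5 cm interpupillary distance is a typical assumed value, not a figure from the text; it shows why convergence is informative only at near distances.

```python
import math

IPD_CM = 6.5  # assumed typical interpupillary distance

def convergence_angle_deg(distance_cm):
    """Total angle through which the two eyes rotate inward
    to fixate a point at the given distance."""
    return math.degrees(2 * math.atan(IPD_CM / (2 * distance_cm)))

for feet in (1, 3, 10, 30, 100):
    cm = feet * 30.48
    print(f"fixation at {feet:>3} ft -> {convergence_angle_deg(cm):.2f} deg")
```

Beyond about 10 feet the angle is near one degree and changes very little with further distance, which is why convergence carries little distance information there.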
disparity. Thus, we will perceive the subset of dots within the square as lying in
front of or behind the other dots. The ability of the visual system to correlate all
of these random dots shows that retinal disparity does not require a comparison
of specific forms or features of objects. One possible basis for extracting the
information in the two eyes quickly might be for the visual system to process
the spatial frequency content in the two images (Frisby & Mayhew, 1976).
Chapter 5
Information Processing
by Kim M. Cardosi, Ph.D., Volpe Center
based on material presented by Peter D. Eimas, Ph.D., Brown University
What Is the Mind?
An important belief shared by cognitive psychologists is that the mind has many
components that perform different functions. We can measure the time it takes
for the different parts of the mind to do their jobs, even though our experience
of information processing or of any cognitive function is that it happens
instantaneously. In laboratory research, psychologists can parcel our mental
processes into component parts and measure the time it takes for each
component task to be accomplished.
Figure 5.1
A model of information processing: the sensory systems and sensory buffers feed
attention and perception (pattern recognition), which draw on long-term memory
(knowledge, autobiographical memories, rules, expectations, goals, and desires)
and lead to the response system (decisions, response execution, consciousness?),
which produces responses.
make a decision and initiate a response. Decision making and response selection
will be covered in Chapter 7.
We can classify our stored knowledge as explicit or implicit. Explicit knowledge
is knowledge that you have direct and immediate access to. This includes your
name, your phone number, what you do for a living, who your spouse is, what
your children's names are, all the knowledge you have about your expertise and
your profession, etc. All of these are explicit forms of knowledge that you can
describe in detail as well as use for many tasks of a cognitive nature.
Implicit knowledge is knowledge that you have, but are not able to describe;
that is to say, you do not have direct access to this knowledge. Good examples
of implicit knowledge are things like riding a bicycle, playing tennis, catching a
baseball, the syntactic rules of your language, etc. Most likely, unless you're a
physicist, you have no explicit knowledge of the laws of physics that you use
when doing such things as riding a bicycle. Nevertheless, you can do them
properly. Your implicit knowledge enables you to do so -- it is available for
certain tasks, but it is not available to consciousness.
Figure 5.1 breaks things up rather neatly, as if these processes occur separately,
taking a lot of time. However, information processing, perception, speaking, and
listening go on very, very quickly. The diagram shows mental activity occurring
in accord with a serial processing system; that is, we do one thing at a time.
However, there is the belief that parallel processing (doing more than one thing
at a time) also occurs. It is difficult to substantiate that parallel processing goes
on in the mind because the measuring instruments are limited. We can measure
the electrical activity of someone's brain and say that the brain is working
because we see the blips on the electroencephalogram. We can be much more
precise and say that certain areas are working. What appears to be true is that
some of those areas are working in parallel. Indeed, if we think of all the
events that must occur during perception of visual scenes or spoken language,
parallel processing would seem to be absolutely necessary if we are to explain
how these processes, these mental activities, could occur so quickly.
Another box in Figure 5.1 is attention. Attention is simply the part of the mental
system that directs us to one sort of information rather than another. We are
able to attend to a particular stimulus even in the presence of an enormous
amount of other stimulation. This ability to selectively attend to specific
information will be discussed in detail in Chapter 8.
Information processing takes time, as noted above. The time required to process
information depends upon many factors. In most cases, information will be
processed only to the extent that is required by the task. The more complex the
task, the more time will be required. For example, in any array of colored
numbers, more time will be needed to count the blue numbers than to decide
whether or not blue numbers are present in the display. Still more time will be
needed to add the blue numbers. This type of difference in the level of
processing is referred to as "depth" of processing. The more, or "deeper," the
information is processed, the easier it will be to remember (Craik and Lockhart,
1972). For example, a controller is more likely to remember "seeing" an aircraft
that he or she has communicated with several times than one with which no
communication was required. In our previous example of an array of colored
numbers, the person who added the blue numbers would have more success in
recalling them than the person who counted the same numbers. Information
that is not specifically attended to is not likely to be remembered. The more
attentional resources spent on processing the information, the more accurately
the information will be remembered. This has implications for complex tasks in
which it is important to remember certain pieces of information. We can
maximize the chances of being able to remember information by requiring that
the information be used or processed in some way. Information that is not
actively attended to will not be easily recalled from memory when needed.
Attention
In Principles of Psychology (1890), William James defined attention as "the mind
taking possession, in clear and vivid form, of one of what seems several
simultaneous possible objects or trains of thought. It implies withdrawal from
some things to deal effectively with others." In processing information we can
focus on specific information at the expense of other information, and we can
shift our attention from one thing to another. What are the costs of focused
attention? How do you move attention around?
Attention directs us to something particular. Some researchers consider human
mental processing to be, for the most part, a serial processing system like the
central processing system of most computers. Computers do one thing at a time,
but they do them very, very quickly; performing millions of operations per
second. Our neurons are not as fast. In fact, they're incredibly slow. So what we
probably do is group great masses of them together to do things and use
parallel processing. One mass of neurons in one section of the brain does one
thing, while another mass in another section does another thing.
The attention mechanism that directs our processing energy works both within
a sensory modality (i.e., within vision or within audition) and across
modalities. There may be two types of attention: one that directs you to a
modality, and one that works within a modality. Alternatively, there may be a
single central processor in the mind that is responsible for prioritizing incoming
information.
Selective Attention
Some of the early scientific work on attention began around 1950 in a group
led by Donald Broadbent in England. It took its impetus from a phenomenon
that came to be called "the cocktail party effect." If you go to a cocktail party
where there are only a couple people and the noise level is not bad, it's easy to
understand the person you're talking to. After 150 people have arrived, the
noise level is overpowering; if you recorded it, it would sound like gibberish. It
would be very difficult to pick out one conversation on the tape and pay
attention to it. However, an individual at the cocktail party can begin to and
continue to attend to a speaker and understand what that speaker is saying
despite distracting noise.
One factor that makes this possible is the distinctiveness of the voice of the
person who is speaking to you; it is easier to attend to an individual if the
voice is distinctive in some way. For example, it would be easier to attend to a
woman's voice when the distracting voices were men's voices, because the
pitch of a woman's voice tends to be very different from a man's. Another factor
is the direction of the voice. You can focus on a voice by virtue of the direction
it comes from: a voice coming from a certain direction hits one ear earlier than
the other by a very precise amount of time. Other factors that allow you to
attend to a particular voice at a noisy cocktail party are the coherence or
meaning of the speech, the nature of the voice, and the emphasis given to the
words. These kinds of simple matters were related by Broadbent to what was
called "picking up a channel of information and staying attached to it."
Neisser and Becklen (1975) performed interesting experiments that show the
power of selective attention. They showed videotapes of games to subjects and
had them perform simple tasks. In one tape, three men bounced a basketball
back and forth to each other. The subjects' task was to count the bounces.
Then, Neisser showed a tape of two people playing a handslapping game. The
subjects' task here was to count the number of hits. If either task was
performed alone, counting accuracy was near perfect. When the two tapes were
superimposed, it was still quite easy to count either the number of ball bounces
or the number of hand slaps. Trying to count both at the same time, however,
was quite difficult. It was so difficult that the subjects failed to notice the "odd"
events of the ball disappearing or the men being replaced by women. This is
one example of the filtering of information. We can attend to and process
response time to it will be faster than if eye movements are required to fixate,
or focus on, the information. Similarly, if the stimulus appears within the
person's visual field, but in the periphery rather than at the fixation point,
response time will be lower than when an eye movement is required, but higher
than when the target appears at the fixation point. While we usually move our
eyes when we shift attention, this is not always necessary. We can shift our
mental focus, or internal attention. Even when shifting internal attention does
not involve eye movements, it does take time. The time required to shift
internal attention increases with the distance from the fixation point and travels
at a velocity of about 1 degree per 8 msec (Tsal, 1983). Furthermore, some
information that is presented during the time that it takes to make this shift
may not be processed (Reeves and Sperling, 1986).
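Tsal's figure lends itself to a back-of-the-envelope estimate. The sketch below simply applies the roughly 8 ms-per-degree rate to a few eccentricities (the specific eccentricities chosen are illustrative, not from the text):

```python
MS_PER_DEGREE = 8.0  # Tsal (1983): internal attention shifts at ~1 degree per 8 msec

def shift_time_ms(eccentricity_deg):
    """Estimated time to shift internal attention from the fixation
    point to a target at the given eccentricity (in degrees)."""
    return eccentricity_deg * MS_PER_DEGREE

for ecc in (2, 5, 10, 20):
    print(f"target {ecc:>2} deg from fixation -> ~{shift_time_ms(ecc):.0f} ms")
```

Even a shift of 20 degrees costs on the order of 160 ms, during which briefly presented information may be missed.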
Automatic and Controlled Processing
Automatic processing occurs in highly practiced activities like driving a car,
riding a bike, etc. You do it without necessarily being aware of what you're
doing. It just happens. Automatic processing is fast. It appears to be parallel,
that is, you can do more than one thing at a time, and it is fairly effortless.
Controlled processing means voluntary, one-step-at-a-time processing. It is a
rather slow process. It requires focusing attention on specific parts of complex
tasks. Acquiring controlled processing can be done simply by saying "Pay
attention to this." Acquiring automatic processing, on the other hand, may be
very slow or fast, depending on the task. At very low levels where the
distinctions are being made by simple kinds of physical stimuli, e.g., search for
a red object among varying colored objects, search for a curved line among all
straight lines, automatic processing can be achieved quickly. Things tend to
jump out. It's called the popout effect. If you ask subjects, "How did you find the
red square?" they say, "Well, it was kind of there. It popped out at me."
Whether there were one, two, or four targets to search for, it really didn't
make any difference. The target just seemed to show up to
them. They had to do less processing. It popped out. Something was
"automatically" happening to them. If you have to do high-level processing, such
as searching for particular letters in a field of other letters over and over again,
achieving automatic processing is much more difficult and takes much more
time. Automatic processing allows for development of fast, highly skilled
behaviors without eating up attentional resources.
Many things we do acquire a quality of automaticity, which is to say we do
these things automatically, without thinking much about them. For example,
learning to drive a car is a complex, difficult task. It is attentionally taxing and
even simple conversation is very distracting. An experienced driver, however,
can drive and carry on a conversation with ease. This is what we mean by
automatic processing. You can perform your primary task (e.g., driving) and
simultaneously perform another task (e.g., conversing) and do each as well as if
you were doing it alone. And, you are doing both without a great deal of stress
and effort because one of these tasks is being done automatically.
There are many examples of complex, difficult tasks becoming easier and less
taxing with practice. Any difficult task is, at first, attentionally all consuming;
extraneous or unexpected information is not likely to be processed. Sufficient
practice, however, can make even the most difficult tasks easy enough that we
can deal with other incoming information. This is the advantage of automaticity.
When tasks or parts of tasks (subtasks), such as flying straight and level, are
performed automatically, resources are available to perform other tasks
simultaneously. While it is easy to see the benefits of automaticity, it is
important to be aware of the hidden costs. One of these costs is commonly
referred to as complacency. Since we devote less attention to tasks we can
perform automatically, it is easy to miss some incoming information - even
when this information is important (such as a subtle course deviation or a new
stop sign on a road traveled daily). We are most likely to miss or misinterpret
information when what we expect to see or hear differs only slightly from what
is actually there.
Expectations
Expectations are powerful shapers of perception. We are susceptible - particularly under high workload - to seeing what we expect to see and hearing
what we expect to hear. Even when we do notice the difference between the
expected and the actual message, there is a price to pay; it takes much longer
to process the correct message when another one is expected than when the
correct one is expected or when there are no expectations.
Scharf, Quigley, Aoki, Peachey, and Neeves (1987) demonstrated that even the
simplest of information processing shows a detrimental effect of a discrepancy
between the expected and actual information. They played a pure tone between
600 and 1500 Hz that was just barely audible and told subjects that this tone
would be played again during one of two time intervals. No tone was played in
the other interval. The subjects' task was to decide in which interval the tone
was played. When the tone that the subjects had to listen for (the target) was
the same frequency as the one they had heard first (the prime), subjects were
90% correct in identifying the interval that contained the tone. When the
frequency of the target was changed, performance suffered. For example, when
a 600 Hz tone was expected and a 600 Hz tone was the target, performance
was near perfect with 90% accuracy. When a 1000 Hz tone was expected and a
600 Hz tone was the target, performance was near chance with subjects
guessing which interval contained the tone with only 55% accuracy. The same
was true when the target tone was 1500 Hz and the prime was 1000 Hz. Even
a difference of only 75 Hz (with targets of 925 and 1075 Hz) resulted in a
drop in accuracy from 90% to 64%. This supports transfer of training principles
which will be discussed in detail in Chapter 7. The closer auditory warnings are
to what is expected (e.g., from simulator training or experience in other
aircraft), the easier it will be to "hear," all other things being equal.
The powers of expectancy are even more obvious in higher level processing,
such as speech perception. If you quickly read aloud, "the man went to a
restaurant for dinner and ordered state and potatoes," chances are any listeners
would hear "the man went to a restaurant for dinner and ordered steak and
potatoes." It is not surprising that there have been many ASRS reports of pilots
accepting clearances not intended for them after requesting higher or lower
altitudes. Again, we are most likely to make such mistakes when what we
expect to hear is only slightly different from what should be heard (as with
similar call signs).
Pattern Recognition
Pattern recognition is one of the components of our model of information
processing (Figure 5.1). The word "pattern" refers to anything we see or hear or
really sense by any means. Our ability to perceive and identify patterns - whether words or objects - depends heavily on our ability to match the pattern
that we see or hear with the representations of patterns that are stored in
memory. We refer to this matching as pattern recognition.
There have been many theories of pattern recognition. The template theory states
that there are entire patterns stored in our brains as whole patterns. When we
see or hear something, we match this to one of the stored patterns to identify
it. The problem with this theory is that we would need an infinite number of
templates to match the innumerable ways in which an object may be presented
to us -- one stored pattern for each different pattern in a different size and
orientation. For example, consider an individual letter "Z." This letter may be
presented to us in print (in either upper or lower case) or handwritten by many
different writers. While no one template would fit all of these Z's, we usually
have no trouble recognizing Z's as such.
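The template theory's weakness can be made concrete with a toy sketch. The 5x5 bit-grid letter and the helper functions below are illustrative assumptions, not from the text; the point is that an exact template fails as soon as the same letter is merely shifted one column.

```python
# A "Z" as a 5x5 bit grid, one string per row.
Z = ["11111",
     "00010",
     "00100",
     "01000",
     "11111"]

def shift_right(glyph):
    """Shift every row one column to the right (toy transformation)."""
    return [("0" + row)[:len(row)] for row in glyph]

def template_match(a, b):
    """Exact template matching: the patterns must agree cell for cell."""
    return a == b

print(template_match(Z, Z))               # True: an identical copy matches
print(template_match(Z, shift_right(Z)))  # False: a mere shift defeats the template
```

A strict template theory would therefore need a separate stored template for every position, size, and orientation, which is the objection raised above.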
A similar theory, the feature theory states that incoming information is broken
down into its component physical characteristics or features and their relations.
A "Z" for example, can be broken down into two horizontal parallel lines, an
oblique line, and two acute angles. There is some physiological evidence to
suggest that our brains do process some information in this way. There are
brain cells that respond only to horizontal lines, others that respond only to
vertical lines, etc. But that does not mean that we process all information in
this way. In fact, it would be difficult to explain the identification of most real
world objects in this way. For example, by what features do we recognize a
dog? There are barkless dogs, tailless dogs, dogs with three legs, hairless dogs,
etc. Whatever feature we might consider using to define "dog," we are sure to
think of an exception.
The template and feature theories both assume a "bottom-up" mode of
information processing. That is, they assert that we process information by
beginning with the physical aspects of the stimulus and working up to its
meaning. In a "top-down" approach, the meaning is accessed first or at least in
parallel with other information (usually with the aid of contextual cues), and
then that information helps us process the physical features. For example, none
of the characters in Figure 5.2(a) appear at all ambiguous; there is a clearly
definable "A", "B", "C", and "D". However, in a different context, the "B"
Figure 5.2
Example of the use of context to identify an ambiguous signal (or ambiguous
figure).
now appears to be a 13. If you only saw Figure 5.2(b), you wouldn't think that
any of those numbers were ambiguous, yet the "13" and the "B" are exactly the
same. This is an example of the use of contextual cues to identify an ambiguous
signal. When surrounded by the letters "A", "C", and "D", we see a "B"; when
surrounded by the numbers "12", "14", and "15", we see a "13". There are many
studies that show that an appropriate context aids our ability to identify visual
stimuli. For example, lines are easier to identify when they are presented in the
context of an object, such as a box, than when they are presented alone (e.g.,
Weisstein and Harris, 1974). Letters are easier to identify when they are
presented in a word than when they are presented alone (Reicher, 1969).
Palmer (1975) showed subjects pictures of a loaf of bread, a mailbox, and a
drum. The bread and the mailbox were physically very similar. The subject's
task was to decide which of the three pictures they saw. The subjects saw the
pictures for such a short period of time that they could not be sure of which
picture they saw. Sometimes, before seeing one of these pictures, subjects were
presented with a scene such as a kitchen scene (i.e., a picture of a kitchen
counter with utensils, food, etc.). When subjects saw a scene that was
appropriate for the target picture (such as seeing the kitchen scene before
seeing the loaf of bread), accuracy was significantly better than when they saw
nothing before seeing the target. Performance suffered when subjects were "led
down the garden path" with an inappropriate context and a target object that
was physically similar to an appropriate object. For example, after seeing the
kitchen scene, many subjects were sure they had seen the loaf of bread even if,
in fact, they had been shown the mailbox.
In most cases, context helps or hurts us by setting the stage for expectations.
When what we see or hear is compatible with what we expect, we process the
information quickly and accurately. When it is incompatible, performance
suffers. Examples of this can be found in videotapes of simulation studies where
pilots say what they are thinking throughout the session. In an early TCAS
simulation study, one pilot saw the traffic display and was so convinced that a
"climb" advisory would follow that he never heard the many repetitions of the
"descend" command (see pp. 313-314 for a detailed discussion).
Our pattern recognition system is set into motion every time our senses perceive
something. It is the first step toward processing complex information and
problem solving. It is important to understand that pattern recognition cannot
be considered in isolation. When we want to know how easy it will be to see
or hear a particular stimulus (whether a simple line or tone or a complex
message), we must consider the physical attributes of the stimulus, the context
in which it will be presented, and the knowledge or expectancies of the
perceiver.
Speech Perception
One example of complex pattern recognition is the comprehension of speech.
Speech perception is a very interesting problem. Almost any small computer is
capable of producing intelligible speech with the appropriate software and
hardware. Nevertheless, it is incredibly difficult to get even the most
sophisticated supercomputer to understand what the small, stupid one said.
These computers fail almost completely when they listen to a variety of human
speakers say a variety of different things.
The French equivalent of Bell Laboratories has developed an automatic
telephone where the caller speaks the number into the phone rather than
dialing it. It works remarkably well with one notable exception. The phone does
not usually work for Americans or other non-native French speakers, even
though they may speak French very well. It appears totally unable to process
the call. Why can't this computer recognize American French as well as French
people can? The speech recognition systems that work best are "trained" to
individual speakers who use a limited vocabulary. The speaker says the words
to be used into the computer several times. The computer system then "learns"
to recognize this limited set of words under ideal conditions. One necessary
condition is a quiet environment since the computer can't differentiate between
speech sounds and similar noises. Once the speech recognition system is trained
to a speaker, it cannot tolerate much change in the speaker's voice, such as the
rise in pitch that is often induced by stress.
To understand why speech recognition is so difficult, we must first examine the
complexities of the speech signal. A spectrogram is a physical representation of
the speech signal. It plots the frequencies (in Hz) of the speech sounds as a
function of time. An examination of a spectrogram of normal speech reveals
that it is impossible to say where syllables begin and end; words can only be
differentiated when they are separated by silent pauses and these pauses do not
always exist in natural speech which is quite rapid. This presents a problem for
computers, since they are limited to the physical information in processing
speech. We, on the other hand, use our knowledge of language to help parse
the acoustic signal into comprehensible units such as words.
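The idea of a spectrogram - frequency content computed frame by frame over time - can be sketched with a naive short-time DFT. This is a toy illustration under stated assumptions: real analyzers use windowed FFTs, and the synthetic two-tone signal standing in for a "syllable" is invented for the example.

```python
import cmath, math

def spectrogram(signal, frame_len=64, step=32):
    """Magnitude spectrum of overlapping frames via a naive DFT."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, step):
        frame = signal[start:start + frame_len]
        mags = [abs(sum(x * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n, x in enumerate(frame)))
                for k in range(frame_len // 2)]
        frames.append(mags)
    return frames

# A signal whose pitch steps up halfway through, standing in for speech:
rate = 800
sig = [math.sin(2 * math.pi * 50 * t / rate) for t in range(400)] + \
      [math.sin(2 * math.pi * 150 * t / rate) for t in range(400)]
spec = spectrogram(sig)
# The dominant frequency bin moves between the first and last frames:
print(spec[0].index(max(spec[0])), spec[-1].index(max(spec[-1])))
```

Even in this clean synthetic case, frames straddling the transition mix both tones; in real speech there are no silent boundaries at all, which is the segmentation problem described above.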
Another problem for speech recognition systems is the tremendous amount of
variability in the speech signal. Ask one person to say "ba" five times, and these
five simple sounds will all be slightly different (e.g., in terms of how long
before the vocal folds vibrate after the initial release of the sound -- the initial
opening of the vocal tract at the region of the lips). When these sounds are
produced in context, they are even more variable. The "ba" in "back," for
example, is slightly different than the "ba" in "bag."
Memory
Memory is a key component in our information processing system. Simple
recognition requires that the pattern in front of us match a pattern in memory
and most complex problem solving requires applying information stored in
memory to the task at hand.
The Memory Stores

                       SENSORY         SHORT-TERM (WORKING)    LONG-TERM

Information input      Automatic       Requires attention      --
Information duration   0.5 to 2 sec    20-30 sec               Decades
Information capacity   Large           7 +/- 2 items           No known limit
Figure 5.3.
Short-Term Memory
be able to recall between seven and ten of them. If the same letters were
read in logical groupings, such as "FBI, CIA, JFK, MTV, UHF", you would
probably be able to recall all of them. Fifteen items have been "chunked" or
grouped into a meaningful set of five. Similarly, "2, 0, 6, 3, 8, 4, 5, 7, 9, 1"
will be easier to recall as "206, 384, 5791", particularly if it is a familiar
phone number. These two examples illustrate two points. First, the capacity
Memory is also constructive in the sense that we not only store information
that is given directly to us, but we also store whatever can be derived
from that information. Bransford, Barclay, and Franks (1972) read many
sentences to subjects in their experiments and later asked them if the test
sentences were ones they had heard before. They found that subjects could
not distinguish between the sentences they heard and ones that could be
logically inferred from the ones they heard. It was the processed meaning of
the sentences, not the specific words, that was stored in long-term memory.
There is some physiological evidence for the existence of short- and long-term memories as separate, distinct structures in the brain. The following
case illustrates this point. H.M. incurred brain damage as the result of an
accident. Because of this damage to the temporal lobes, H.M. was unable to
transfer information from short-term into long-term memory. The
information stored in long-term memory before the accident remained intact
and could easily be recalled. This, along with his functioning short-term
memory, enabled H.M. to carry on normal conversations with his doctor and
others. Without the ability to transfer the information to long-term memory,
however, the conversations were forgotten. If the doctor left the room and
returned minutes later, there was no evidence that H.M. had any memory of
the conversation that took place just minutes before.
With diseases such as Alzheimer's, there is also evidence of a separation of
short- and long-term memory. In the beginning stage of the disease,
transferring information from short-term memory into long-term memory is
problematic. Later, long-term memory degenerates and eventually the disease
invades so deeply into the memory system that even language can be
forgotten.
In the absence of brain disease or damage, there are things we can do to
help store information effectively in long-term memory. If the material to be
learned can be organized around existing knowledge structures (i.e.,
information already known), then it will be more efficiently stored and,
thus, easier to recall. It is easier to learn more about something you already
know than to learn the same amount of material about something totally
foreign or to learn it as isolated facts. Cognitive effort can also help to store
information in long-term memory. This effort can be intentional or
incidental. We can study to memorize facts (intentional) or we can use
information so often, e.g., a phone number, that we learn it whether or not
we intend to do so. On the other hand, information that we would like to
keep easily accessible (such as memory items on a checklist) may not be
readily available without regular review. Our memory for important,
complex information that is not used regularly but does need to be quickly
accessed requires periodic maintenance - particularly if this information is
expected to be recalled in stressful situations.
Chapter 6
Display Compatibility and
Attention
by Christopher D. Wickens, Ph.D., University of Illinois
Display Compatibility
As we follow the sequence through which information is processed by the pilot,
the first critical stage is that of perception, that is, interpreting or understanding
displayed information. However, there are features in display design that can
allow this interpretation to proceed automatically and correctly or, alternatively,
to require more effort with the possibility of error. This is the issue of the
compatibility between displayed information (stimulus) and its cognitive
interpretation. Based on that understanding, a response is triggered.
Compatibility generally refers to the relationship between a display's
representation, the way in which the display's meaning is interpreted, and the
way in which the response is carried out. S-C compatibility refers to the
relationship between how a stimulus changes on a display, and how it is to be
cognitively interpreted. S-R compatibility refers to the relation between displayed
stimulus change and the appropriate response. The important design issue in S-C compatibility, which we shall now consider, is whether the change in a
display state naturally fosters the correct cognitive interpretation. We provide
several examples below.
Color is one important component of display compatibility. When a display
changes color, does the color on that display immediately give the correct
interpretation to the pilot of what that color is supposed to mean? The meaning
of certain colors is related to population stereotypes, which must be kept in mind
by designers. A designer might think "I have a meaning I want to convey, what
color should I use to convey that meaning?" This is really working backwards
because it does not address other population stereotypes a color might have.
What the designer really wants to do is say: "When a color appears on a
display, what will the pilot automatically interpret it to mean?" The problem
occurs when colors have multiple stereotypes, and so the pilot may instinctively
interpret one that is different from what the designer intended. Red has a
stereotype of both "danger" and "stop" or "retard speed." Now a pilot sees red,
in the context of airspeed control. Does it mean "slow down" or does it mean
that "airspeed is already too slow and there is danger of a stall?" Possible
conflicts of color stereotypes must be carefully thought through by the designer,
to make sure that a given color has an association that can't possibly confuse or
be confused and trigger the incorrect interpretation.
The second component of display compatibility is the spatial interpretation of
display orientation and movement. This relates to the movement of a display
and how a pilot interprets what that movement signals. Roscoe (1968) cited
two principles that define display compatibility. The first is the principle of
pictorial realism. The spatial layout of a display, that is, the picture of a display,
should be an analogical representation of the information it is supposed to
represent. The second principle that helps define display compatibility is the
principle of the moving part. The moving element of a display should move in
the same orientation and direction as the pilot's mental model of systems
moving in the real world.
A good way to illustrate these two principles is with examples of hypothetical
airspeed indicator designs as shown in Figure 6.1. These are not necessarily the
ideal ways of designing an airspeed indicator, but they either confirm or violate
the principle of pictorial realism or the principle of the moving part. A pilot's
mental model of airspeed is something with a "high" and "low" value.
Therefore, according to the principle of pictorial realism, a vertical
representation is more compatible than the circular one as shown in design (d).
Figure 6.1. Hypothetical airspeed indicator designs (a) through (d): moving-scale and moving-pointer variants.
Also, our mental model has high airspeed at the top and low airspeed at the
bottom. So a fixed scale moving pointer indicator with the high airspeed
represented at the top, as shown in design (a), is compatible with the principle
of pictorial realism, whereas a moving scale indicator with the high airspeed
represented at the bottom violates that principle (d).
Consider the display of altitude as another example. There has been a good
deal of research suggesting that pilots think of the aircraft as the moving
element through the stable airspace, not as the stable element in a moving
airspace (Johnson and Roscoe, 1969). So when an aircraft gains altitude, it is
compatible with both the principle of the moving part and the principle of
pictorial realism for the moving part of the display to move upwards, and when
the plane descends, for the moving part to move downwards. This is exactly
what we get with a fixed scale moving pointer display (a). You have high
altitudes at the top, low altitudes at the bottom, and your moving pointer is in
a direction of motion that is compatible with the pilot's mental representation
of what is happening in the environment. That is, it conforms to the principle
of the moving part. With a fixed pointer moving tape display, there are two
possible design orientations. The situation in design (b) has the high altitude at
the top of the tape and the low altitude at the bottom; again, conforming to
the principle of pictorial realism. But an increase in altitude is signaled by a
downward movement on the display--a violation of the principle of the moving
part. The alternative is to present the low altitude at the bottom of the tape
and the high altitude at the top. In that case, when the plane climbs, the tape
moves upwards, and you've satisfied the principle of the moving part but
violated the principle of pictorial realism. This is one of those cases of
competing principles.
While it would seem therefore that the fixed scale display is ideal because it
conforms to both Roscoe's principles, it turns out that even this is not
necessarily the ideal solution because of a problem with scale resolution. For
variables like altitude, you can't print the whole scale unless it is printed so
small it is nearly impossible to read. That is the nice thing about moving scale
displays. They can accommodate a much longer scale because they are not
constrained by space. A compromise solution which could be adopted here is
called "frequency separation," in which the pointer moves rapidly across a fixed,
partially exposed scale to reflect high frequency changes. But lower frequency,
longer duration changes that will require exposing a different scale range are
accomplished by moving the scale.
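The split between fast pointer motion and slow scale motion described above can be sketched with a simple low-pass filter. This is a minimal illustration, not an actual avionics implementation; the class name, smoothing factor, and window size are assumptions.

```python
# Sketch of "frequency separation": the scale tracks slow trends via a
# low-pass filter, while the pointer shows the fast residual against the
# currently exposed portion of the scale.

class FrequencySeparatedDisplay:
    def __init__(self, alpha=0.05, window=1000.0):
        self.alpha = alpha        # low-pass smoothing factor (assumed value)
        self.window = window      # span of scale exposed at one time (assumed)
        self.scale_center = None  # low-frequency component drives the scale

    def update(self, value):
        """Return (scale_center, pointer_deflection) for a new sensed value."""
        if self.scale_center is None:
            self.scale_center = value
        # Slow component: exponential moving average moves the scale.
        self.scale_center += self.alpha * (value - self.scale_center)
        # Fast component: pointer deflection within the exposed scale window.
        pointer = value - self.scale_center
        return self.scale_center, pointer
```

With a small `alpha`, a rapid oscillation shows up almost entirely as pointer motion, while a sustained climb gradually drags the exposed scale range along with it.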
Attention
Attention may be characterized as a limited-capacity resource for processing
information. Our discussion of attention here will lead in two directions:
discussing the principles of multi-element display design, and the use of
head-up displays. Then in Chapter 11, we shall discuss the issue of dividing
attention when trying to perform several tasks at once, and measuring the
attention demands of tasks: the issue of pilot workload. The issue of attention
can really be divided into three different aspects of human abilities. One aspect
is focused attention--how easily we can focus on one source of information and
ignore the distraction of other information. Successful focus is the opposite of
distraction. Another aspect of attention is divided attention--how easily we can
divide attention between two activities and do two things at once, or process
two display channels at once. These activities could involve the pilot flying at
the same time he or she is communicating, or perceiving vertical velocity at the
same time that heading is perceived. Finally, we have the aspect called selective
attention, and this describes how easily and how carefully the pilot selects
particular channels of information to be processed at the right time (e.g., is the
pilot sampling an instrument when he should be looking outside, or attending
to data entry on the FMC when he should be attending to airspeed control).
Focused Attention
A discussion of focused attention and distraction leads to consideration of the
electronic display issue. One of the things that we know from basic psychology
is that all information that falls in about one degree of visual angle is going to
get processed whether you want it processed or not. We know in aviation
displays that clutter is going to be an inevitable consequence of putting more
information in a smaller and smaller space. This will be important in the
discussion of head-up displays to follow. The issue now is to minimize the
confusion caused by clutter, and images that are too close together in the visual
field. How can we increase the pilot's ability to focus attention on one
displayed item and ignore other things that may not be relevant? We are
finding in research that color is an extremely useful technique for segregating
different sources of information. Coloring all of one type of information in one
color and different information in a different color can allow us to focus in on,
say, all of the information that is of one type, and ignore the information that
is of the other, even if they are in the same spatial location.
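The one-degree rule mentioned above can be made concrete with the standard visual-angle formula. The threshold of exactly 1.0 degree is the approximation quoted in the text; the function names are illustrative.

```python
import math

def visual_angle_deg(size, distance):
    """Visual angle subtended by an object of a given size at a given
    viewing distance (same units for both), in degrees."""
    return math.degrees(2.0 * math.atan(size / (2.0 * distance)))

def within_one_degree(separation, distance):
    """Rough check of the 'one degree' rule: two display elements this far
    apart will both fall inside the region that gets processed."""
    return visual_angle_deg(separation, distance) <= 1.0
```

At a typical viewing distance of about 57 cm, one centimeter of separation subtends roughly one degree, so symbols packed more tightly than that cannot be selectively ignored.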
With auditory messages too, the issue of confusion and distraction is relevant.
How do we allow the pilot to focus attention on one auditory channel of
information (say a synthesized voice message from the cockpit), while filtering
out conversation from the copilot or controller, so that the latter will not get
confused with the cockpit alert? The answer here is again in terms of physical
differences, in this case making messages sound as different from each other as
possible -- perhaps by purposefully making computer-driven messages sound
artificial.
Divided Attention
When we consider divided attention, particularly attention divided between
different aspects of a display, designers are interested in creating for the pilot a
sense that two (or more) parts of a display that are to be related can be
perceived at the same time. This objective can sometimes be achieved by
bringing them close together in space. This, of course, is the principle
underlying the development of the head-up display. Also, any sort of static
display ought to have the labels of an indicator very close to the indicator's
actual moving part. In fact, the analysis of the USS Vincennes incident when
the Navy ship shot down the Iranian airliner, revealed that the label on the
Navy's radar system that indicated whether the altitude was increasing or
decreasing was considerably separated from the actual indicator of XY position
itself. So the separation of these two pieces of information may, in part, have
caused the controllers on the radar display to misinterpret what that altitude
trend information was showing, assuming that it represented a descending
attacking fighter, rather than a climbing neutral airliner.
Of course, spatial closeness can be overdone. As we noted above, too much
closeness can create display clutter and thereby be counterproductive. Thus,
relative closeness between related display channels is probably more important
than absolute closeness.
In addition to spatial closeness, it is also possible to use a common color to
bring together in the mind two things that may be spatially separated, and
make it easier to divide attention between them. As we note in the next
chapter, for example, it may be useful to use a common color to show the
relationship between a display and its associated control, when these are not
colocated; or, in an air traffic status display, to code all aircraft with similar
characteristics (e.g., common altitude) with the same color. Because color can
be processed in parallel with other features of a display, it is often useful to use
the color coding of an object to facilitate divided attention.
A third display feature that can improve the ability to divide attention between
two indicators is to present them as two dimensions of a single object. Perhaps
the best example of this is the attitude display indicator (ADI) that represents
two independent dimensions of flight control. It represents both pitch and roll
as the vertical location and the angle of the horizon. That design feature greatly
improves the ability to divide attention between those two critical pieces of
flight information for integrated lateral and vertical flight control.
Another important way of designing displays to facilitate parallel processing is
through the creation of emergent features. These are perceptual characteristics of
a set of displays that are not the property of any single display. A good
example of an emergent feature is the imagined horizontal line that connects
the tops of four vertical column engine indicators on a four-engine aircraft
when all engines are running at the same level, as shown in Figure 6.2.
Figure 6.2. An emergent feature: the aligned tops of four engine power indicators, plotted by engine number.
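The value of this emergent feature is that a single global property (the level line) stands in for four separate comparisons. A minimal sketch of the perceptual check it replaces, with an assumed alignment tolerance:

```python
def emergent_level_line(powers, tol=0.02):
    """True if the tops of the engine power bars align, i.e., the 'imagined
    horizontal line' emergent feature is present. The tolerance is an
    assumed value, expressed as a fraction of the highest bar."""
    lo, hi = min(powers), max(powers)
    return (hi - lo) <= tol * max(hi, 1e-9)
```

When any one engine deviates, the line visibly "breaks," which is exactly the condition this function detects numerically.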
Two other characteristics that will improve the ability to process information in
parallel will be discussed in more detail in our later section on workload. These
are the automaticity with which information is perceived (the more
automatically we process one symbol or piece of information, the better we can
do so in parallel with other display processing), and the use of separate
modalities of information display (i.e., auditory and visual channels).
Selective Attention

The pilot's ability to select information that is needed on the display at the
appropriate time can be improved by three factors. First, and most obviously,
training can improve selective attention. There is reasonably good evidence that
pilots' scan patterns (good indices of what is being attended when) change as a
function of their skill level, indicating an evolution of selective attention ability.
Second, display organization provides a good way of enabling the pilot to find
(look at) the information needed at the right time. One can contrast the more
organized display in Figure 6.3a, with the less organized one in Figure 6.3b, to
see the difference. However, it is important that the physical organization of the
display be compatible with the mental organization that defines the pilot's
inormation needs. That is, displays that are clustered or grouped together
would be those that are also used together.
Figure 6.3. (a) A more organized display layout; (b) a less organized layout.
Display consistency is a third variable that affects the pilot's ability to selectively
attend to the right sources of information at the right time. Where possible,
similar types of displays should be located at similar places, across different
viewing opportunities. This applies both for display locations across different
cockpits, and for multifunction displays across different pages that may contain
similar material. Finally, as we described above, display clutter will be a
hindrance to effective selective attention. It is difficult to visually find what you
want on a cluttered display.
Head-Up Displays

The design and use issues of the head-up display highlight many of the issues
of attention discussed in the previous pages. Figure 6.4 shows a sample of a
head-up display (HUD) developed by Flight Dynamics, Incorporated. It is flown
in Alaskan Airlines planes. The HUD was designed primarily to bring visual
channels closer together in space so as to improve the ability to divide attention
between them. Instead of having critical flight instrumentation physically
separate from the outside world, the HUD overlays certain aspects of this
information on the view of the outside world. The goal of the head-up display
Figure 6.4. A head-up display (Flight Dynamics, Incorporated). Labeled symbology includes the wind shear warning, artificial horizon, wind shear guidance, speed error tape, flight path and flight path command, acceleration, AOA limit, baro altitude, glideslope, airspeed, wind vector and magnitude, vertical speed, radio altitude, and groundspeed.
is twofold. One, as noted above, is to reduce the need for visual scanning
between instruments and the outside world. The second goal is to portray
certain critical pieces of information that conform with the environment so they
can be directly superimposed on that environment. These would include,
certainly, the runway symbol, the horizon line, a flight path representation, and
a symbol of the aircraft's current and predicted position. This conformal
symbology, then, can be interpreted by the pilot as belonging at locations along
his or her line of sight beyond the HUD.
HUD display development and research has a very long history in the military.
There are a number of issues in the military, like flying inverted and getting out
of high-G combat situations, that are less relevant for the design of civil
aircraft. On the other hand, it has been recently introduced and successfully
flown in Alaskan Airlines planes and has had very good reception (Steenblik,
1990). The first category three landing was at Seattle-Tacoma Airport in late
1989. The pilots who have flown with it generally have liked it and have found
that it does a good job of allowing maneuvering and landing in very low
visibility conditions. At the same time, it keeps them actively involved in the
control loop rather than turning over control to automatic landing systems,
thereby maintaining a level of involvement which pilots generally value. Flight
tests with the HUD have been quite successful. Figure 6.5 shows an example of
the "footprints" of landing touchdowns made on a series of category one and
category two landings done in simulations with and without a HUD. It shows
greater touchdown dispersion without the HUD than with it. It also tells us that
there were six go-arounds in the approach without the HUD and no go-arounds
with the HUD. Desmond (1986) reviewed the development of the HUD and its
implementation in the aircraft.
The critical issues in HUD design relate not so much to whether they are a
good thing or bad thing, although some researchers have phrased it that way,
but rather to the appropriate design guidelines to follow, how HUDs can be
improved, and to identification of the potential pitfalls in HUD use (Weintraub
& Ensing, 1992).
In the analysis of HUDs, there are three conceptually different domains. One
domain has to do with the optics of the HUD, that is, how they are collimated,
how the lenses are configured, and where they are located (the visual angle
between the HUD instrumentation and the line of sight out the cockpit toward
the runway during approach). A second is the symbology of the HUD. What
exactly should be placed on the HUD, and in what format? How much of this
should be nonconformal symbology? The third domain concerns the whole issue
of pilot attention in the HUD. How does human attention switch back and forth
between the HUD instrumentation and distant objects in the far environment?
How well can human attention be divided between instrumentation and things
in the far domain? What are the consequences of focusing attention on the near
HUD and ignoring information that is out there in the environment?
In addition to these three issues of HUD research, there are four important
categories of differences between typical HUDs and conventional flight
instruments. First, HUDs are, of course, displaced upwards to overlap the visual
scene. Second, conventional displays are presented at a short optical distance.
HUDs are typically collimated out to near optical infinity. Third, there are
significant differences in the symbology between conventional instruments,
which often, although not necessarily, have an older round dial symbology, and
HUD instrumentations which typically have a much more novel symbology.
Fourth, the different symbologies represent the movement of the airplane
differently. Most conventional instrumentation for presenting guidance
Figure 6.5. Touchdown comparison for CAT I, CAT II, and non-precision approaches: touchdown dispersions without HGS (46 flights, 6 go-arounds), plotted from the runway threshold out to 5000 feet.
information is based on the relationship of the airplane to the air mass. Some
HUD symbology (e.g., that used by Flight Dynamics), in contrast, may be based
on the inertial guidance of the plane and therefore provides information with
respect to the ground surface. Differences in flight test performance between
HUD and conventional instrumentation could result from any or all of these
differences in design features.
When we view objects up close, the light rays from the object hit the eyeball in
a converging orientation. They are not parallel. The muscles surrounding the
lens must activate or "refract" to bring that image into focus. For objects more
than five or six meters away, the light rays travel in a roughly parallel
orientation. The lens relaxes its shape and the more distant object is brought
into focus.
the HUD information, then would suddenly present the runway information,
and determine how long it took the pilot to confirm appropriate altitude and
airspeed, and then make the decision about whether the runway was open or
closed. Essentially they were asking the pilot to switch attention from the near
domain (the airspeed and altitude) to the far domain, and then make a
response of whether there was an X present or not. In one condition of their
experiment, the instrumentation was presented head down, and optically close.
Therefore the pilots not only had to switch attention from the near to the far,
but they had to accommodate from the HUD to the distant runway.
Figure 6.6 shows the results from this condition. The solid line represents the
state of accommodation, changing from the near to the far symbology. This is
Figure 6.6. Switching from a static HUD (near) to a static runway (far): the accommodative response and the runway response plotted over time from stimulus onset.
called the accommodative response. The important point to note in this figure is
that the time to make this decision is influenced partially by how far they have
to accommodate, but also they can make the response well before they have
completely reaccommodated to the greater distance. This finding suggests that
you don't need to have perfect visual information in the far domain before you
are able to process it and use it. Nevertheless, this was the first experiment that
keeping that imagery close to optical infinity rather than close in.
Weintraub, Haines and Randall also varied the visual angle between the HUD
and the runway information. They compared two conditions. In both conditions,
the HUD imagery was collimated to optical infinity. In one condition, the HUD
imagery was overlapping the runway and "head-up." In the other condition, it
was not overlapping the runway and "head-down." In the head-down condition,
the imagery was still optically far, but was no longer superimposed on the
runway. Instead, it was positioned at the same location as the true conventional
instrumentation. So to get information from the runway and HUD in the head-down condition, the pilot still had to visually scan up and down, but didn't
have to reaccommodate. The investigators found almost no difference in
performance between the head-up and head-down conditions in terms of the
ability with which judgments could be made. These results suggest that the
advantages in the head-up display may be more in the symbology on the one
hand, and in lessening the need to reaccommodate, than in the fact that there
is overlapping imagery.
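The accommodation benefit at stake here can be quantified with the standard optics convention of diopters (not given in the text): accommodation demand is the reciprocal of viewing distance in meters, so collimating imagery toward optical infinity drives the demand toward zero. The example distances below are assumptions for illustration.

```python
def accommodation_demand_diopters(distance_m):
    """Accommodation demand in diopters: the reciprocal of viewing
    distance in meters. Optical infinity approaches zero demand."""
    return 1.0 / distance_m

# A head-down panel at roughly arm's length (~0.7 m, assumed) demands
# about 1.4 D; imagery collimated beyond ~6 m demands under 0.17 D,
# which is why collimated HUD symbology spares the reaccommodation step.
```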
In addition to the physical and optical placement issues, there are a set of other
physical characteristics of the HUD that are worth noting. Many of these are
taken from a series of guidelines presented by Richard Newman, who did a
fairly extensive review for the Air Force, and whose findings are applicable to
civil aviation as well (Newman, 1985). One of the guidelines concerns the eye
reference point. It turns out that in viewing a HUD, the imagery changes and the
ability to interpret it changes a little bit, depending on where the eye is
positioned relative to the HUD. Newman argues very strongly that the HUD
positioning should be adjustable to allow different seating postures, so it could
be moved when the pilot is scrunched forward or sitting back. A second issue
concerns the field of view. That is, how much of the outside world should the
HUD incorporate? A lot of technological effort has been put into designing
HUDs that can present a wide field of view. One of the guidelines is that the
field of view should be at least wide enough so that when you are landing into
a crosswind with a very substantial crab angle, the runway is still visible on the
HUD, even as the aircraft is crabbed maximally into the wind. This difference
between aircraft heading and velocity vector indicates how wide the field of
view should be on the HUD.
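The field-of-view requirement implied by crosswind crab can be estimated with the standard wind-triangle relation, sin(crab) = crosswind / true airspeed. The function names and the numeric values in the usage note are illustrative assumptions.

```python
import math

def crab_angle_deg(crosswind_kt, true_airspeed_kt):
    """Wind-correction (crab) angle needed to hold a ground track
    against a direct crosswind component."""
    return math.degrees(math.asin(crosswind_kt / true_airspeed_kt))

def fov_keeps_runway_visible(fov_deg, crosswind_kt, true_airspeed_kt):
    """Is half the HUD field of view wide enough to keep the runway
    symbol visible at the expected maximum crab angle?"""
    return fov_deg / 2.0 >= crab_angle_deg(crosswind_kt, true_airspeed_kt)
```

For example, a 20-knot crosswind on a 120-knot approach implies a crab angle of just under 10 degrees, so the HUD would need roughly a 20-degree total field of view to keep the runway centered-to-visible through the crab.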
Another issue that isn't well-resolved concerns what happens when conformal
symbology on the HUD moves out of the field of view. Suppose a pilot is flying
directly towards the runway, and then changes course so that now the runway
symbol slides off to the side of the HUD. Should it disappear, or just freeze on
the side so the pilot still clearly perceives that it is off to the left or the right,
but now underestimates the magnitude of the deviation?
Another physical characteristic concerns the frequency with which the HUD is
updated. For analog information on the HUD, a guideline is that the variables
should be refreshed at around 10 to 12 hertz, sufficient to give good
performance. For digital information on the other hand, you certainly don't
want that fast updating, because digits tend to be unreadable. Therefore,
something like 3 to 4 hertz is probably appropriate.
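The two-rate guideline above can be sketched as a per-element refresh throttle. The 10 to 12 Hz and 3 to 4 Hz figures come from the text; the class shape and timing API are invented for illustration.

```python
# Analog elements (pointers, tapes) refresh fast for smooth motion;
# digital readouts refresh slowly so the digits stay readable.
ANALOG_HZ = 10.0   # guideline: ~10-12 Hz for analog symbology
DIGITAL_HZ = 3.0   # guideline: ~3-4 Hz for digital readouts

class ThrottledReadout:
    def __init__(self, rate_hz):
        self.period = 1.0 / rate_hz
        self.last_update = None
        self.shown = None

    def sample(self, t, value):
        """Accept a newly sensed value at time t (seconds); refresh the
        displayed value only when a full refresh period has elapsed."""
        if self.last_update is None or t - self.last_update >= self.period:
            self.shown = value
            self.last_update = t
        return self.shown
```

An airspeed digit box built as `ThrottledReadout(DIGITAL_HZ)` would hold each value for about a third of a second even if the underlying sensor updates every frame.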
The symbology issue can be broken into two major domains. The first relates to
some of the sensory factors that relate to issues in visual and auditory
perception. For example, what should be the intensity of the HUD imagery?
How bright should it be? What is the necessary intensity to perceive across the
conditions ranging from night viewing, in which you can get by with fairly low
intensity, to incredibly bright snow cover? Is a single fixed intensity adequate,
or should there be automatic or manual intensity control? A related issue
concerns the transmittance. Newman has recommended that no less than 70
percent of the outside world light should be transmitted through the HUD.
Weintraub argues instead that it should really be more like 90 percent
(Weintraub & Ensing, 1992). In fact, the Flight Dynamic HUD used by Alaskan
Airlines has about 90 percent transmittance.
Color is another issue in HUD design. The current HUD designs tend to be
monochrome (green). One of the reasons is that the monochrome display
transmits a lot more light than a color HUD. Color of course has benefits, but
color, as viewed on the HUD, may have some real problems in terms of
interpretation, particularly when several colors are to be used. Under the varied
conditions of illumination in which a HUD may be used, any more than four or
five colors will create a real risk of confusion.
Cognitive issues in the design of HUD symbology are also relevant. The Air
Force has done some good research in terms of the nature of the HUD
symbology and how that can be best interpreted (Weinstein, 1990). The nature
of the pitch ladder is one example. How do you make the pitch ladder as
unambiguous as possible in depicting whether the aircraft is nose-up or nose-down? Here is where color comes in. One of the problems with the HUD is that
its graphic representation of what is up and down is not as good as the colored
representation on the typical Attitude Display Indicator using blue and brown.
There may be a role for color in HUDs to help make the simple discrimination
of what is above and what is below the horizon.
The use of the inertia guidance system is an important cognitive issue. Its
importance is suggested by the fact that the evaluation of the HUD flown in
Alaskan Airlines revealed that the characteristic that pilots seemed to like most
is the fact the guidance given by the HUD is based on inertial guidance rather
than air mass guidance. In other words, the pilot actually gets a representation
on the HUD instrumentation of where the plane is heading relative to the
ground, rather than relative to the air mass through which it is flying. So this
indicates that possibly the major benefits may be in what the HUD presents
rather than where it is physically presented.
Some issues have to do with the development of flight director displays on
HUDs. These correlate very closely with the same issues of the flight director
for presenting head-down information. What is the appropriate tuning? What
are the appropriate rules to guide the flight director?
One major symbology issue concerns how much information should be on the
HUD. Should a HUD only present the necessary conformal flight information,
the things that are necessary for actual flight path guidance, and, therefore,
conform to (and can be superimposed on) the world outside? Should the HUD
also present different kinds of flight parameter and alerting information, and if
so, how much? As we see below, this impacts the issue of display clutter.
Finally, there is the issue of multimode operations. Some HUD designs present a
lot of information in a relatively small space. If this is viewed as a problem,
then designers often recommend that the pilots be given the option of calling
up alternative forms of information. However, any time the designer creates
multimode situations, problems of menu selection arise, forcing
the pilot into computer keyboard operation. Such operations have a number of
potential dangers at critical high workload times during the flight, when the
HUDs are likely to be in use.
Attention Issues
The initial goal of the HUD was to resolve the problems of divided attention by
superimposing the two images. Once that decision was made, then there
followed the issue of how to improve the symbology, and the decision to
collimate the images at optical infinity. The real question is whether or not
simply superimposing images of nonconformal symbology does address the
problems of divided attention, or whether it creates the potential for other
problems.
There are three possible attention problems that are created by superimposing
visual images. One is whether or not the resulting clutter disrupts the ability to
focus attention. Are there problems trying to focus attention on the far world,
(the runway out there) when there is a large amount of symbology in the near
domain that may be partially obscuring it? Can these problems be addressed by
reducing HUD intensity? The second problem, a related one, concerns divided
attention and confusion. If a pilot is actually trying to process the far-world
information and the near-world symbology simultaneously, is there a
possibility of confusion? For example, when the aircraft moves and the far-world
runway then moves relative to the HUD, could the motion of the runway
be misinterpreted as being part of the movement of analog symbology on the
HUD? The third problem, related to attentional tunneling or fixation, we now
discuss in some detail, in the context of research at NASA Ames.
One of the few studies that has been conducted with a dynamic head-up display
to examine attentional issues has received a fair amount of publicity, although
it has some methodological problems. It is a study done by Fischer, Haines, and
Price (1980). Ten pilots flew a simulated instrument landing approach. The
HUD was compared with conventional head-down instrumentation (not
collimated). Although most of the landings were normal, on the very last trial,
there was a runway incursion. As the pilot was approaching the simulated
runway, another aircraft pulled onto the runway. The investigators found that,
although the HUD gave better performance under normal landing conditions, a
significant number of pilots failed to notice the plane coming onto the runway
when flying with the HUD. Furthermore, those that did notice the runway
incursion took longer to notice it when they were flying with the HUD than
when they were flying with conventional head-down instrumentation. However,
this finding was not replicated in a more carefully controlled study by Wickens,
Martin-Emerson, and Larish (1993).
The way the NASA investigators interpreted the fixation data was to state that
in flying with conventional instrumentation, there is a very regular scan pattern
required to check the clearance of the runway; but with the HUD, the imagery
may obscure the distant runway, and the scan pattern is disrupted in a way that
doesn't allow the routine and automatic examination of the imagery out in the
far domain. In the evaluation by Steenblik (1989) of the operational use of the
HUD in Alaskan Airlines, some pilots report that in the last few seconds of the
approach, coming into and through the flare, they find the imagery on the HUD
distracting. They have a tendency to tunnel attention exclusively on that
imagery and, therefore, they prefer to turn off the HUD to avoid this tunneling.
Also, earlier evaluations done by NASA indicate a substantial problem with
tunneling in on the HUD instrumentation and potentially ignoring the outside
world. Finally, some research on military applications of the HUD done by
Opatek indicated a lot of problems, at least with early HUD designs, that arose
from them being too cluttered, so that pilots had a tendency to turn them off.
A summary of the attentional issues highlights the following points. First, the
distinction between conformal and nonconformal symbology is critical.
Conformal symbology will not create clutter and clearly is desirable to be
presented head-up, particularly when driven by inertial guidance information.
Nonconformal symbology, whether digital or analog, may lead to clutter and
confusion, and its addition to a HUD, while reducing scanning, should be
considered only with caution. Secondly, attentional tunneling on either
conformal or nonconformal symbology, to the exclusion of attention to the far
domain, is a potentially real problem. Consideration should be given as to how
to "break through" the tunnel (e.g., by turning off the HUD or reducing its
intensity). Third, there is some suggestion that the tunneling problem may be
exacerbated in head-up rather than head-down presentations.
In conclusion, there has been some debate in the aviation psychology literature
regarding whether the HUD is an advancement or a detriment to aviation
safety. One way of addressing this debate is to point to the strong endorsements
provided by pilots who have flown with the current versions. A second way is
to consider what HUD has done. It has pushed the performance envelope of
aircraft into a whole new domain, and clearly in that new domain there are
going to be more chances for risk and accidents, for example, flying lower to
the ground in low to zero visibility. In this sense, it is analogous to headlights
which, by allowing night driving, have placed the driver in a consistently more
dangerous environment (Weintraub & Ensing, 1992).
Chapter 7
Decision Making
by Christopher D. Wickens, Ph.D., University of Illinois
The Decision-Making Process
Figure 7.1 shows a model of information processing. This is similar to the
model presented in Chapter 5, Figure 5.1. In the preceding chapters, the
discussion focused on basic characteristics of the senses, how the eyes and ears
perceive stimuli, and how information from the world around us is perceived or
understood. This chapter deals with the decision-making process that takes
place after the sensory information is perceived.
Figure 7.1 provides a framework for discussing the decision-making process. A
pilot senses a stimulus, for example, the VASI on a runway. That information
becomes an understood piece of knowledge when the pilot recognizes the visual
Figure 7.1. A model of information processing (from Wickens). Stimuli reaching the eyes and ears pass through perception to decision and response selection and then response execution, producing responses; working memory, long-term memory, and a feedback loop support these stages.
second class has direct relevance to cockpit design issues, and this will lead us
to a discussion of the transfer between different designs on different aircraft.
Pilot Judgment
When we talk about decision making, we begin with the concept of uncertainty.
Decisions can be made with certainty or with uncertainty. A pilot's decision to
lower the landing gear, for example, is made with certainty. The pilot knows he
or she must lower the landing gear to touch down on the runway and the
consequences of the decision are well known in advance. On the other hand, a
decision to proceed with a flight in bad weather or to carry on with a landing
where the runway is not visible is a decision with uncertainty, because of the
uncertain consequences of the actions. What will happen if the pilot continues
with the flight in bad weather can't be predicted.
A lot of the conclusions in decision making that will be discussed here come
directly from studies and experiments that have not been related to pilot
judgment. There are, of course, databases about aviation accidents and incidents
that attribute a large percentage of these to poor pilot judgment and faulty
decisions (Jensen, 1977; Nagel, 1988). The problem, of course, is going back
after the fact of an accident or incident. It is easy to attribute a particular
disaster to poor judgment when, in fact, there may be, and usually are, a lot of
other causes. Poor judgment may have been only one of a large number of
contributing causes all of which cannot be identified. For this reason, it is
helpful to study judgment and decision making in other fields besides aviation,
like the nuclear power industry, or to draw inferences from some experimental
laboratory research. Much of the information in this section is based upon
conclusions from these other nonaviation areas.
Figure 7.2 (Wickens and Flach, 1988) shows a general model of human
decision making that highlights the information-processing components which
are relevant to the decision process. To the left of the figure, we represent the
pilot sampling, processing and integrating a number of cues or sources of
information. If it is a judgment about flying into instrument meteorological
conditions, these cues may be weather reports, direct observation of the
weather, anecdotal reports from other pilots in the air, etc. All of the cues help
Figure 7.2. A model of the decision-making process (from Wickens and Flach, 1988), showing perception and attention, working memory, hypothesis generation and diagnosis, criterion setting and risk assessment, and action generation, together with the associated biases and heuristics: the salience bias, confirmation bias, anchoring heuristic, as-if heuristic, availability heuristic, representativeness heuristic, and framing bias.
through with a given action or decision. In aviation, the criterion setting is very
often based upon risk assessment. What is the risk of continuing in bad
weather? What is guiding our choice? What are the consequences of failure?
And then in the final box in the model in Figure 7.2 we perform an action, and
observe its consequences which themselves generate more cues.
Biases in Situation Assessment
The model in Figure 7.2 includes codes (S, R, As, etc.) in small boxes which
represent biases that can cause errors in human decision making. Some of these
biases are also called heuristics: shortcuts or mental "rules of thumb" that
people use to approximate the correct way of making a decision because it
takes less mental effort (Kahneman, Slovic, & Tversky, 1982).
Salience Bias
The first of these biases is called a salience bias (S). The salience bias means
that when someone is forming a hypothesis based on a lot of different cues of
perceptual information, he or she tends to pay more attention to the most
salient cue. For example, a pilot may be processing various sources of auditory
information including weather reports, reports from air traffic control and other
pilots, conversation from the first officer, etc., to form a hypothesis. The
salience bias is reflected in the fact that it is often the loudest sound or loudest
voice that has the most influence. Another example of the salience bias occurs
in dealing with a multi-element display. We tend to pay most attention to
information displayed at the center of the display rather than the information at
the bottom. These are physical characteristics of a display that aren't necessarily
related to how important that information is. The brightness of lights creates a
bias: the brighter the light, the more we tend to pay attention to it in making
our situation assessment.
Confirmation Bias

Anchoring Heuristic
base rate data, (if you don't know what those overall probabilities are), and
you have a lot of weather forecasts and a lot of good observations, you should
pay more attention to the degree of similarity between the hypothesis and the
existing conditions.
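The combination of base rate and similarity described above is what Bayes' rule formalizes. The sketch below is illustrative only; all of the probabilities are invented for the example and do not come from the text.

```python
# Illustrative only: Bayes' rule combining a base rate (prior) with the
# diagnosticity of the current evidence. All numbers are hypothetical.

def posterior(prior, p_evidence_given_h, p_evidence_given_alt):
    """P(hypothesis | evidence) for a two-hypothesis diagnosis."""
    joint_h = prior * p_evidence_given_h
    joint_alt = (1.0 - prior) * p_evidence_given_alt
    return joint_h / (joint_h + joint_alt)

# Base rate: severe weather occurs in this region 30% of the time.
# The sky "looks clear" 20% of the time when weather will turn severe,
# and 90% of the time when it will stay benign.
p_severe_given_clear_sky = posterior(0.30, 0.20, 0.90)
print(round(p_severe_given_clear_sky, 3))  # 0.087
```

Attending only to how the sky looks, while dropping the 30-percent base rate out of the calculation, is exactly the kind of shortcut the heuristics in this section describe.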
Availability Heuristic
There are two very important heuristics that we use to approximate the base
rate and the similarity of the data to the hypothesis. These are availability
and representativeness.

Representativeness Heuristic
We have said that people should rest their belief in part on the base rate
probability. We have also said that the way people actually use base rate
probability is not by the true probability, but by how easily they can recall
instances of an event. However, it seems that people frequently do not use
probability at all in making diagnoses. Instead, they attend only to the similarity
or representativeness of the current evidence or data to one hypothesis or
another. The representativeness heuristic further states that the only time we
use base rate is when there isn't much data to go on. For example, a pilot may
be flying in a particular area, and it is highly probable that the local weather
conditions may be severe, based upon past data. If the present weather actually
looks clear outside the cockpit, the pilot would tend to ignore the base rate
information which might state that in this particular region, at this particular
time of year, the weather is likely to degrade. The representativeness heuristic
thus favors the similarity of the immediate evidence over the base rate.
Overconfidence Bias
In understanding where you are, what your situation is, and what you should
do next, the overconfidence bias can be at work. This seems to be a fairly
pervasive bias that underlies performance of both novices and experts in a lot
where it was found that pilots are more confident that their judgments are
correct than they really have a right to be (Wickens et al., 1988).
For the pilot, the consequence of overconfidence in the correctness of a decision
that he or she has just made, is that the next course of action will be taken
without adequately considering the alternative actions, should the decision in
fact be the wrong one, and will be taken without adequately monitoring the
evolving consequences of the decision just made.
Risk Assessment
A characteristic of many judgments both on the ground and in the air is the
need to choose between a risky option and a sure thing option. A risky option
has two possible outcomes, neither one of them assured. A sure thing option
has only one, certain, outcome. It is almost guaranteed. The classic example of
choosing between a risky option and a sure thing option is delaying takeoff on
a flight. The sure thing option is that you are going to sit on the ground for a
long period of time and nothing is going to happen except a certain delayed
flight. The risky option involves going ahead with the takeoff into potentially
uncertain weather, a decision with two possible outcomes: an accident or
incident due to severe weather, or a safe trip. With the sure thing option,
staying on the ground, it is highly probable that everything will be fine, and
the consequences of the decision will be generally good (safe, but with a
delay). The risky option really has a relatively high probability that things will
go very well (a safe flight but no delay), but a very severe negative
consequence if the bad weather leads to disaster.
How do people make these choices? Do they tend to go for the sure thing or
the risky option? These sorts of decision problems can be expressed intuitively
in terms of gambling choices. Here's the choice: you can receive five dollars
guaranteed, or you can flip a coin and either win ten dollars or nothing at all.
This is really a choice between two positive outcomes with the same expected
value in the long run. One is keeping the five dollars, a sure thing. The other is
that you have a 50/50 chance of getting something good, ten dollars, or
nothing at all. With either option, you have everything to gain and nothing to
lose. In contrast, we can also represent these two decision choices in terms of
negative outcomes. So I can say, I will take five dollars from you, or you can
flip a coin and have a 50/50 chance of either losing ten dollars or nothing at
all.
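The two pairs of options just described have identical expected values, which a few lines of Python can confirm (a sketch using the dollar amounts from the text):

```python
# Expected value of each option in the gambling example.

def expected_value(outcomes):
    """outcomes: iterable of (probability, payoff) pairs."""
    return sum(p * v for p, v in outcomes)

sure_gain = expected_value([(1.0, 5)])               # keep $5
risky_gain = expected_value([(0.5, 10), (0.5, 0)])   # coin flip for $10
sure_loss = expected_value([(1.0, -5)])              # give up $5
risky_loss = expected_value([(0.5, -10), (0.5, 0)])  # coin flip to lose $10

print(sure_gain, risky_gain)   # 5.0 5.0
print(sure_loss, risky_loss)   # -5.0 -5.0
```

Because the expected values within each pair are equal, any systematic preference for one option over the other reflects the framing, not the arithmetic.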
The research in psychology has studied people confronted with these gambling
choices (including also trained business people making investments). The results
reveal that whether people choose the risky option or the sure thing option
depends upon whether the choice is between two positive outcomes as in the
first example, or two negative outcomes as in the second example. Given a
choice between two positive outcomes, people tend to take the sure thing. They
tend to be averse to risk and, as the expression goes, they "take the money and
run." So more people would be likely to take the five dollars than to take the
bet of getting more or nothing at all. But given the choice between two
negative outcomes, people tend to be risk-seeking. The expression for them is,
they "throw good money after bad." They are more likely to take the gamble
and hope they come out with no loss rather than accepting a guaranteed loss.
This difference in choice preference is called framing of decisions, because the
way in which a decision is made depends on how it is framed: Whether it is a
choice between positives or a choice between negatives (Kahneman, Slovic, &
Tversky, 1982).
Consider, for example, a physician making choices between a sure thing medical
treatment and risky treatment. Investigations have found that the physician
recommendations are very much influenced by whether words are phrased in
terms of saving the patient, or the probability that the patient will die. Saving
the patient is the positive outcome; the probability of death is the negative
outcome.
How do we translate framing into an aviation-relevant example? Again, let's
consider a decision between, say, canceling or delaying a flight and taking off
into bad weather. We can talk about the sure thing characteristics of delaying
or canceling the flight. There is a certain good characteristic to delay or
cancellation, and that is you are guaranteeing safety. A certain bad
characteristic is that you are guaranteeing a lot of irate passengers, a disrupted
crew schedule, etc. The risky option of flying into bad weather has a good (but
uncertain) outcome: it is likely that you are going to proceed in a more timely
fashion. It also has a potentially bad characteristic: with a low probability, it
could happen that there is going to be severe delay and possibly disaster. The
issue here is that the bias towards one choice or the other can be based on the
way in which the positive outcomes are framed or emphasized. Say the decision
is between guaranteeing a safe flight or a high probability of getting a timely
flight to the destination. That is a decision framed in terms of a positive sure
thing and a positive risk. The framing bias suggests that under these
circumstances, the bias would be towards delaying the flight and just staying on
the ground. Whereas, if the decision were framed in terms of negatives, a sure
thing of delay with irate passengers or a relatively small possibility of a crisis
because of being in the air in bad weather, there would be a greater bias
towards choosing the risky option.
something he or she has experienced before, and that's the diagnosis. The pilot
has carried out an action that worked before under those same conditions, so
the pilot carries it out again and doesn't go through a time-consuming process
of risk evaluation and calculated action choice.
Both the research at Illinois (Wickens et al., 1988) and a lot of the research that
Klein (1989) has done with tank crew commanders and with fire fighters
indicate that this type of decision making seems to be much more resistant to
stress. Finally, it has been found that people's ability to evaluate the risk of
different options, again, does not appear to be degraded by stress. There isn't a
tendency to be more risky or less risky under stress.
Lessening Bias in Decision Making
So where does all this lead to? What steps can be taken to address bias
problems in decision making? Clearly, training and developing expertise is one
step. Experts tend to use decision strategies that are based more on directly and
rapidly retrieving the right action or diagnosis from long-term memory, on the
basis of similarity with past experience, rather than using working memory to
generate or ponder the alternatives in an effortful manner (Klein, 1989).
Another step that can be taken is de-biasing. There has been some successful
research in de-biasing, that is, making pilots or decision makers aware of the
kind of biases already mentioned in this chapter. Weather forecasters, for
example, if given explicit training about the tendency to be overconfident in
their forecasts, can learn to calibrate those forecasts quite accurately. Planning,
that is, rehearsing alternatives in advance of a crisis situation, is another step in
addressing the bias problem. Effective pilot training naturally strives to get the
student to plan for alternative courses of action, and their consequences in
different possible circumstances. Finally, one of the more controversial
means used to deal with bias, one that is emerging in the commercial flight
deck and is already used in the military flight deck, is expert systems. Expert
systems can, at least according to some scientists, replace some of the pilot
decision making necessary in the cockpit, or at least can recommend judgments
The first factor that affects response selection speed is the decision complexity.
The complexity of a decision is literally the number of possible alternatives.
Think of a two-choice decision. You are accelerating for takeoff. Do you rotate
or abort the takeoff? There are two possible choices available. A more complex
example is a choice between four possible alternatives. A TCAS warning might
tell you to turn right or left, or to climb or descend. It might even present more
detailed choices: right and descend, left and descend, etc. The response time
increases with the number of possible response alternatives. In fact, we have a
nice equation that can be used to express how long the response time will be
as a function of the number of possible alternatives that are available.
RT = a + b log2N
You can plot this function to show that each time we double those alternatives,
we get a constant increase in response time (and an increase in the probability
of making a mistake). Again, simple choices are easier and made more rapidly
than complex ones.
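A minimal sketch of how the equation behaves; the coefficients a and b are illustrative placeholders, not values from the text:

```python
# Hick-Hyman law: RT = a + b * log2(N). Each doubling of the number of
# alternatives N adds a constant increment b to the response time.
import math

def reaction_time(n_alternatives, a=0.2, b=0.15):
    """Response time in seconds for a choice among N alternatives."""
    return a + b * math.log2(n_alternatives)

for n in (2, 4, 8):
    print(n, round(reaction_time(n), 3))  # 0.35, 0.5, 0.65 seconds
```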
tend to perceive and respond very fast to things we expect, and take a long time
to respond to (or not perceive at all) things that we do not expect. For example,
in accelerating for takeoff, the pilot very much expects the conditions to be
favorable to rotating and going through with the takeoff. He does not expect
conditions that will force an abandonment of takeoff procedures. Coming in for
a landing, the pilot expects a clear and open runway, and does not expect an
obstacle to appear on the runway. We have a formula for the effect of
expectancy or probability on reaction time (Hyman, 1953).
RT = a + b log2[1/p(a)]
The lower the probability of the event (a), the less frequent it is, and the
longer is the reaction time. These equations provide some evidence, which
psychologists are always seeking, for fairly well-defined mathematical laws of
human performance. To some extent and in some circumstances, these laws can
be balanced against the very strong mathematical laws of engineering
performance.
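The expectancy form of the law can be sketched the same way (again with illustrative placeholder coefficients):

```python
# Expectancy version of the law (Hyman, 1953): RT = a + b * log2(1/p).
# Rare events (low p) produce longer reaction times.
import math

def reaction_time(p_event, a=0.2, b=0.15):
    """Response time in seconds for an event with probability p_event."""
    return a + b * math.log2(1.0 / p_event)

print(round(reaction_time(0.5), 3))   # expected event:   0.35
print(round(reaction_time(0.05), 3))  # surprising event: 0.848
```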
respond more slowly if the context makes the event unlikely than if the context
makes the event a probable one. So a crew will respond to a windshear alert
reach the next phase of flight as soon as possible, will be more likely to make
an error. There is an interesting application of the speed-accuracy trade-off in
nuclear power plant design. Designers have found that operators, under
emergency conditions, tend to put a self-imposed time stress on themselves.
When warning signals start to go off, they want to respond very rapidly. The
consequences have been, in a couple of accidents, that people respond fast and
make mistakes. Of course, mistakes in dealing with a crisis are the last things
you want to have happen. There have been some implicit recommendations in
this country, and explicit recommendations in Germany in the nuclear industry,
to tell operators that when something starts to go wrong, not to respond
immediately. Germany has actually given them a time in which they cannot do
anything until they form an understanding of exactly what is happening. In
other words, control room operators have been given instructions that combat
the tendency to respond fast and make more errors in times of crisis.
Signal and Response Discriminability
what the manufacturer calls "human-engineered" direct input pushbutton
control, in which all of the controls look identical. This is exactly the opposite
of good human engineering principles where you would want to have a high
degree of discriminability between one control and another.
Practice
Practice is still another influence on response time. The more practiced we are
at responding in certain ways under specific conditions, the more rapidly those
responses will be.
human factor guidelines that are relevant. A major point in checklist design is
Figure 7.3.
Response Feedback
Another issue in response selection, particularly relevant to making several
responses in a row, is the issue of feedback from the responses. There are two
different classes of feedback. Extrinsic feedback is separate from the act of
making the response itself. Extrinsic feedback is often visual. For example, when
you press a key on a CDU (control-display unit), you see a visual indicator on
the display corresponding with the key that was pressed. Intrinsic feedback, on
the other hand, is directly tied to the act itself. It may be tactile feedback where
you press a button and feel the click as it makes contact, and perhaps you hear
a click. Intrinsic feedback is very useful if it is immediate; that is, if it occurs
immediately after the action. For example, pushbutton phones that give you a
tone each time you press a button provide better intrinsic feedback than those
that don't. There is a great advantage to making sure any keyboard design
includes this intrinsic, more immediate feedback.
On the other hand, it is clear that delayed feedback is harmful, particularly for
novices. It disrupts the ability to make sequential responses, particularly when
that feedback is attended to and is necessary, or particularly if it is intrinsic.
One of the things we have known for a long time is that delayed auditory
feedback has a tremendously disruptive effect. If you are hearing your own
voice and it is delayed by as little as a quarter second, the voice transmission is
very profoundly degraded. Looking toward the future, design considerations for
the data-link system between pilots and air traffic controllers will need to be
concerned with feedback issues, as the pilot communicates through the
computer interface with the ground using various forms of non-natural displays
and non-natural controls (i.e., keyboard controls, computer-based voice
recognition, and voice synthesis).
Display-Control (Stimulus-Response) Compatibility
The compatibility between a display and its associated control has two
components. One relates to the relative location of the control and display; the
second to how the display reflects (or commands) control movement.
In its most general form, the principle of location compatibility says that the
location of a control should correspond to the location of a display. But there
are several ways of describing this correspondence. Most directly this is satisfied
by the principle of colocation, which says that each display should be located
adjacent to its appropriate control. But this is not always possible in cockpit
design when the displays themselves may be grouped together. Then the
compatibility principle of congruence takes over, which states that the spatial
arrangement of a set of two or more displays should be congruent with the
arrangement of their controls. Unfortunately, some aviation systems violate the
congruence principle (Hartzell et al., 1980). In the traditional helicopter, for
example, the collective, controlled with the left hand, controls altitude which is
displayed to the right; whereas the cyclic, controlled by the right hand, affects
airspeed which is displayed to the left.
The distinction between "left" and "right" in designing for compatibility can be
expressed either in relative terms (the airspeed indicator is to the left of the
altitude indicator), or in absolute terms, relative to some prominent axis. This
axis may be the body midline (i.e., left hand, right hand), or it may be a
prominent axis of symmetry in the aircraft, like that bisecting the ADI on an
instrument panel, or that bisecting the cockpit on a twin seat design. Care
should be taken that compatibility mappings are violated in neither relative nor
absolute terms. For example, in the Kegworth crash in the United Kingdom in
1989, in which pilots shut down the remaining, working (right) engine on a
Boeing 737, there is some suggestion that they did so because the diagnostic
Figure 7.4. Panels (a)-(d) showing control-display movement relations in the X-Y, X-Z, Y-Z, and Y-X planes.
Figure 7.5. Panels (a)-(h) illustrating control-display movement relations.
with any rotary control, the arc of the rotating element that is closest to the
moving display is assumed to move in the same direction as that display.
Looking at (c) in Figure 7.5, we see that rotating the control clockwise is
assumed to move the needle to the right, while rotating it counterclockwise is
assumed to move the needle to the left. It is as if the human's "mental model" is
that there is a mechanical linkage between the rotating object and the moving
element, even though that mechanical linkage may not really be there.
The important point is that it is very easy to come up with designs of control
display relations that conform to one principle and violate another. A good
example is (e). It shows a moving vertical scale display with a rotating
indicator. If the operator wants to increase the quantity, he or she grabs the
dial and rotates it clockwise. That will move the needle on the vertical scale up,
thus violating the proximity-of-movement stereotype. You can almost hear the
grinding of teeth as one part moves down while the adjacent part moves up.
How do we solve the confusion? Simply by putting the rotary control on the
right side rather than the left side of a display. We have now created a display
control relationship that conforms to both the proximity of movement stereotype
as well as the clockwise-to-increase stereotype. Simply by improving the control-to-display relationship, designers can reduce the sorts of blunder errors that
may occur when an operator inadvertently sets out to, say, increase an air speed
bug by doing what seems to be a compatible movement, and instead moves it in the
opposite direction.
The third component of movement compatibility relates to congruence. Just as
we saw with location compatibility, so movement compatibility is also preserved
when controls and displays move in a congruent fashion: linear controls parallel
to linear displays [(f), but not (g)], and rotary controls congruent with rotary
displays [(b) and (h). Note, however, that (h) violates proximity of movement].
When displays and controls move in orthogonal directions, as in (g), the
movement relation between them is ambiguous. Such ambiguity, however, can
often be reduced by placing a modest "cant" on either the control or display
surface, so that some component of the movement axes are parallel, as shown
in Figure 7.6.
As we have seen with the proximity of movement principle, movement
compatibility is often tied to a pilot's "mental model" of the quantity being
controlled and displayed. Figure 7.7 shows one particular example of display-to-control compatibility that indicates how consideration of the mental model can
increase the complexity of compatibility relations. This example is taken from
an aircraft manual on a vertical speed window. It is a thumbwheel control
mounted in the panel, and to adjust the speed down, you rotate the wheel
upward. The label next to the thumbwheel shows an arrow pointing up to
Figure 7.6. A modest cant (e.g., θ = 60° or θ = 45°) applied to the control or display surface so that a component of the movement axes is parallel.
bring down (DN) vertical speed and an arrow pointing down to bring vertical
speed up (UP). From the human factors point of view, this is an incompatible
relationship between control and display. If you want to go down, you should
push something down, not up. If you want to go up, you should push
something up. However, consideration of the mental model makes the relation
more compatible than it first appears. If you think about this as a vertical
wheel, mounted into the cockpit along the longitudinal axis, you are basically
Figure 7.7. Example of display-control compatibility on a vertical speed window: an UP/DN vertical speed selector mounted below the glareshield sets vertical speed in the vertical speed window.
rotating the nose of the aircraft down or up. So moving it up rotates the nose
of the aircraft down, thereby creating a descent. How pilots think of this is not
altogether clear, but it illustrates an important principle: a pilot's mental
model of what a control is doing has tremendous implications for whether that
control will be activated in the correct or incorrect direction.
Compatibility concerns also address the issue of how a toggle switch should
move to activate or provide power to a system. To configure a control mounted
on a front panel in a way that its movement will increase the quantity of
something or activate it, we might well have it move to the right or upward. If
it is mounted along a side panel, we might want it to move forward to increase
(on) and backward to decrease (off). What happens when we have it mounted
on a panel which is at an angle between the right side and the front? We now
have a competition between whether this panel is being viewed as closer to the
forward position, in which case an increase should be to the right, or closer to
the sideward position, in which case an increase should be forward--but in the
opposite direction. Which way should this control go to increase? An answer is:
Why fight the stereotypes? Why not instead go with the one direction that is
unambiguous; that is, make sure that upward increases. If there is a zone of
ambiguity, where you have one stereotype fighting against the other stereotype,
good human factors should consider that battle and take advantage of designs
that resolve it.

Figure 7.8. Replacing an ambiguous switch arrangement with the safer forward-on arrangement (from Hawn, 1967).

We are seeing in the military more and more voice-activated controls
replacing manual controls.
Certain guidelines seem to exist
that suggest that voice control is well-suited (compatible) for certain kinds of
cognitive tasks, but poorly suited (incompatible) for other kinds of tasks. The
voice is very good for making categorical output, describing a state. On the
other hand, using the voice for any sort of tracking task, describing the location
of things, or movement of things in space, is relatively poor. One reason for this
is that our understanding of space is directly connected with our manipulation
of the hands. Therefore, the hands, whether using a key or joystick are much
more appropriate for continuous analog control when responding to continuous
analog displays. The one possible benefit for voice control of continuous
variables would occur if the hands were already heavily involved with other
manual control activities. (See Chapter 8.)
Stress and Action Selection
As we have mentioned before, high stress tends to shift one towards fast but
inaccurate performance. People tend to react rapidly, but they tend to make
more mistakes. It is also clear that under stress, people shift to the most
compatible habits and actions. This is probably the strongest reason for keeping
stimulus-response compatibility high. Under low stress, people can be effective
using an incompatible design like an overhead switch that goes back to turn
something on. However, the data suggests that under high levels of stress, the
incompatible design is likely to cause an accident, even for the skilled pilot.
Somebody wants to turn it off, so by habit they move it backward (which is
really on). So compatibility is most beneficial under stress, and, of course, the 1
percent of the time when stress is high is when we are most concerned about
good cockpit design, because this is the period in which the environment may
be least forgiving of human error.
Stress also has other effects on action selection. It biases operators to perform
the best learned habits, in place of more recently learned habits. Stress leads to
a sort of "action tunneling," which is analogous to the cognitive tunneling we
discussed above. In action tunneling, the pilot may repeat the same
(unsuccessful) action over and over. Because stress reduces the capacity of
working memory, it may have a particularly degrading effect on multimode
systems--like a multimode autopilot--in which the pilot must remember what
mode of operation a system is in, in order to select an appropriate action. (We
discuss these systems again under the topic of human error in the next
chapter.) If the memory fails (because of stress), the multimode system
becomes particularly vulnerable to an inappropriate action.
Finally, stress has implications for voice control, where either a pilot or air
traffic controller is talking to voice recognition systems. A major concern in the
research on voice control is the extent to which high levels of stress distort the
voice quality and, therefore, distort the computer's ability to recognize and
categorize the voice message. This has been one of the biggest bottlenecks to
the use of voice control in military systems. What happens when a pilot comes
under stress when talking to the aircraft, and the aircraft does not recognize his
voice commands?
Decision Making
Negative Transfer
The topic of stress and action selection is closely related to the issue of
negative transfer. Negative transfer is the bringing of habits used in one
system into the operation of another, particularly when the appearance of the
new design is the same as or similar to the old. Table 7.1 presents a matrix of
the transfer of previous learning and experience. Almost any task that a pilot
must perform can be characterized by some perceived information read from a
display and a required action. The matrix portrays whether the perceived
information and the required action are the same between the old and the new
systems.
Table 7.1.
Matrix Showing Error Probability Due to Transfer (from Braune, 1989)

Case     Perceived     Required     Transfer of Previous      Error Probability
         Information   Action       Learning and Experience   Due to Transfer
Case 1   Same          Same         Maximum Positive          None
Case 2   Different     Same         Positive                  Intermediate
Case 3   Different     Different    Little or None            Low
Case 4   Same          Different    Negative                  High
In Case 1 in Table 7.1, the perceived information is the same and the required
action is the same. With two identical systems, therefore, everything that was
learned in the old system is going to transfer to performance in the new system.
There is going to be a maximum positive transfer of previous learning and
experience from the old system to the new. There is really no possibility for
errors in the transfer.
Case 2 is where there is a different representation of the perceived information,
but the same required action. For example, the old system might have an
analog display and the new system has a digital CRT display. The information is
perceived differently because it is presented in two different formats but the
required action is the same. The transfer of previous learning and experience
will be positive. Error probability is intermediate, so that some errors will occur
but not a great many.
In the Case 3 example, both the displays and the controls are different.
Therefore, there is little or no transfer of previous learning and experience. The
probability of error due to transfer in Case 3 is low. In Case 4, the perceived
information is the same, but there is a different required action. This was the
situation in the DC-9 crash. The same mode switch in two cockpits performed
different actions. The mode switch had to be set differently in the old system
than in the new system, and here is where the transfers of previous learning
and experience are highly negative. These are the "red flags" for potential error
in transferring from one design to the other.
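The logic of Table 7.1 reduces to a lookup on two binary dimensions. As an illustrative sketch (the function name and the return labels simply transcribe the table; they are not from any published tool):

```python
# Illustrative lookup of Table 7.1: transfer of learning between an old
# and a new design, keyed on whether the perceived information and the
# required action are the same across the two systems.

def transfer_case(same_information: bool, same_action: bool) -> dict:
    """Return the transfer of previous learning and the error probability."""
    table = {
        (True, True):   {"case": 1, "transfer": "maximum positive", "error": "none"},
        (False, True):  {"case": 2, "transfer": "positive",         "error": "intermediate"},
        (False, False): {"case": 3, "transfer": "little or none",   "error": "low"},
        (True, False):  {"case": 4, "transfer": "negative",         "error": "high"},
    }
    return table[(same_information, same_action)]

# The DC-9 example: an identical-looking mode switch (same perceived
# information) but a different required setting (different action)
# falls in Case 4, the "red flag" cell.
print(transfer_case(True, False))
```

The DC-9 scenario discussed below lands in Case 4, while two identical cockpits land in Case 1.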
It is important to note that the potential for negative transfer is greatest when
the required action is actually similar, but incompatible with the old action. In
the DC-9 crash described above for example, the identically appearing rotary
switch was turned in both cases; only the turn was to a different position in
the old and new (two incompatible responses). The nature of the transfer
relationship shown in the matrix is such that negative transfer may sometimes
be avoided by making the appearance of the new response device substantially
different from the old (e.g., a pushbutton select, rather than a rotary control, in
the above case). One of the greatest problems with the different aircraft
manufacturers doing their own thing is the extent to which there is a lack of
standardization of those kinds of display-action relations across aircraft. In
particular, there is a lack of consistency in the relationship between computer
systems and control that leads operators to make errors when transferring from
one to the other.
Chapter 8
Timesharing, Workload, and
Human Error
by Christopher D. Wickens, Ph.D., University of Illinois
Divided Attention and Timesharing
In Chapter 6, we talked about attention in terms of ability to divide attention
between two different sources of displayed information. We talk now of
attention in the broader sense of being able to divide attention between a large
number of different tasks such as between flying and communicating, between
navigating and talking, or between understanding the airspace and diagnosing
a system problem, all at the same time.
Effective timesharing is being able to attend to the right thing at the right
time. Much of your ability to take notes at a lecture is based on your ability to
write when the speaker is not saying anything important, then switch your
attention to listening when the speaker is saying something important. A lot of
research on selective attention, on being able to attend to the right place at the
right time, particularly in aviation, has focused on the visual world and pilots'
successful ability to look at the right instrument at the right time. The general
conclusion of research at NASA Langley is that pilots are fairly good at
attending to the right place at the right time.
On the other hand, there is also some good evidence that task scheduling and
information sampling is not always optimal. Accident reports may be cited in
which pilots have clearly "tunneled" their attention onto tasks of lower priority,
while neglecting those of higher priority (e.g., maintaining stability and safe
altitude). The Eastern Airlines crash into the Florida Everglades in 1972 is
perhaps the most prominent example. Furthermore, experiments done at Illinois
find that student pilots do not adequately postpone lower priority tasks when
workload becomes high.
There is some interesting research that Gopher (1991) has done with the Israel
Air Force which looks at ways to train pilots to better allocate their attention
flexibly between tasks. This training device was found to be fairly effective in
qualifying pilots for fighter aircraft duty.
Confusion
A second cause of poorly divided attention in doing two things at the same
time relates to confusion, a topic discussed in our section on HUDs. You can
think of two channels of information, and two responses, but the responses that
should have been made for B show up in A, and the responses that should have
been made for A show up in B. Recall our discussion of a pilot flying a HUD.
There is motion of the outside runway scene because the plane changes attitude,
and the pilot interprets that motion as being motion on the HUD. This is an
example of confusion. One possible way of avoiding confusion between HUD
imagery and the far domain is by the use of color. Certainly confusion often
occurs in verbally dependent environments where there are two verbal messages
arriving at once; for example, a pilot listening to a copilot and simultaneously
listening to an air traffic controller. There is confusion when a message coming
from one person gets attributed to the other person, or when the digits or the
words in the two messages get confused. The main guideline to avoid confusion
is to maximize the differences between the voices. You are less likely to confuse
the voice of the copilot with the voice of the controller if one is male and the
other is female than if both are male or both are female. The same thing could
probably be said regarding digital voice messages. Make sure the voice quality
of the digital message is very distinctive and very clear, perhaps by making it
sound mechanical, which differs markedly from the voices typically heard on the
flight deck. Differences that help us to distinguish between voices include
location (or source) and pitch.
Resources
The third mechanism that is involved in timesharing and attention when doing
several things at a time is the concept of resources. We have limited capacity,
resources, or a supply of "mental effort" that is available for different tasks.
Because this limitation exists, the concept of processing resources is important
to the issue of pilot workload prediction and assessment, a topic to be discussed
later in the chapter. We allocate our limited attentional resources to tasks; as
we try to do two tasks at once, for example, fly and communicate, one task gets
a certain amount of resources and another task receives the remainder. Our
ability to do the two activities at once depends upon the demand of the task for
resources and the available supply. In discussing task demand and supply of
resources, psychologists describe a function that relates the level of performance
on a given task to the amount of resources that are invested in that task. This
function is known as the performance-resource function. If you take a very
difficult task, for example, flying through heavy turbulence and landing under
low visibility conditions, it requires a full investment of all of one's resources.
One hundred percent of the resources are required to obtain a given level of
performance, and that level of performance isn't very good. However, if you
consider an easy task, like cruising through clear weather, one can obtain very
good performance by only investing half of the attentional resources; and trying
harder (investing more resources) can't improve performance any further. You
can get maximum performance by giving only a small amount of your resources.
Figure 8.1 presents the performance-resource functions for an easy task (top), a
difficult task (bottom), and one of intermediate difficulty. The difference
between the bottom and top curve is important not only in the level of
performance that is attainable, but also in the amount of "residual resources"
that are available to devote to a second (concurrent) task. For the difficult task,
as for the intermediate one, any diversion of resources to a secondary task will
sacrifice its performance. But for the easy task, a good portion of resources can
be diverted with no loss in performance.
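The performance-resource function can be sketched numerically. In this illustration the saturating-exponential curve, the demand values, and the 95 percent performance target are assumptions chosen to reproduce the qualitative shape of Figure 8.1, not a published model:

```python
import math

def performance(resources: float, demand: float) -> float:
    """Performance (0-1) as a saturating function of resources invested (0-1).

    An easy task (low demand) approaches maximum performance with only a
    fraction of the resource supply; a difficult task (high demand) requires
    the full supply and still performs worse. The exponential shape and the
    scaling constant 3.0 are illustrative assumptions.
    """
    return 1.0 - math.exp(-3.0 * resources / demand)

def residual_resources(demand: float, target: float = 0.95) -> float:
    """Resources left over after reaching `target` performance (0 if unreachable)."""
    needed = -demand * math.log(1.0 - target) / 3.0  # invert the curve
    return max(0.0, 1.0 - min(needed, 1.0))

easy, hard = 0.3, 1.5
print(residual_resources(easy))  # large reserve left for a concurrent task
print(residual_resources(hard))  # no reserve: full investment required
```

The reserve computed for the easy task is what a second, concurrent task can draw on without any loss on the first.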
The curves in Figure 8.1 are also related to training. Extensive practice on any
given task will shift the performance resource function from the bottom, to the
middle, to the top curve. As the task can be performed with fewer resources, we
say that its performance has become automadized. Compare the middle and top
Hg
Resources
Allocated to
Primary Task
Fo
Resources
Allocated to
Secondary Task
&Iur6.. Grqii of hw pslfomulis a tww ion of the cMliill of prinwy and aecondiny
Usk& (If= Wickau 1992)
Our discussion of attention and timesharing in the previous section has set the
stage for the treatment of workload here. Figure 8.2 is one representation of
the relationship between the capacity of a human operator and the demands of
a system.
[Figure 8.2. Model of workload: a human operator, guided by strategy and investing effort, exerts control over a system within an environment; the operator outputs behavior and workload, and the system outputs performance.]
That
human operator interacts with the system in two ways. First, he or she is
involved with control--doing things to it and watching what happens. Second,
he or she is also involved with putting effort into this performance, and the
system itself draws effort from the operator. The human and the system
together work under the influence of an environment. The human outputs
behavior. The system outputs performance. For example, in an aircraft, the
human is doing things to the control yoke, and the aircraft is performing (i.e.,
following some flight profile). The human also outputs workload which is the
experience of the effort involved in controlling or monitoring the system. This is
what we measure when we measure workload, and these are the factors that
basically drive workload.
There are a number of important case studies in which pilot workload has
played a major role. Right now a major issue in the Army is whether one or
two pilots should fly the LHX Light Attack Helicopter. That is very much of a
workload issue. Can one crew member manage the task load requirement with
sufficiently low workload to make it fly satisfactorily with sufficient residual
resources to handle the unexpected? An analogous choice was posed around
1980 regarding two- versus three-person flight crews on the generation of more
automated commercial aircraft (e.g., the Boeing 757). The President established
a workload task force to look at the issue of whether the flight engineer was
necessary. The decision came down to allow two-crew operations, in part,
because the mental workload was deemed to be allowable with this
complement. FAR 25.1523, with its Appendix D, talks about certifying aircraft for their
workload. In such certification, workload estimations are used to compare
systems. Does the old system impose less workload or more workload than the
new system? Workload is also relevant in examining the impact of data-link
based automation versus traditional communications with the air traffic control.
Finally, there is the issue of using workload measures to examine the level of
training of a pilot. As we saw in the previous section, although two pilots may
fly the mission at the same level, if one flies with a lot less workload than the
other, does that make a difference in predicting how the pilots will do later on
or how well the pilots may transition from simulator training to the air?
What exactly is workload? How does workload relate to performance? How a
plane performs in terms of its landing or deviation from the flight path tells you
a good deal, but doesn't tell you all there is to know about the cost imposed on
pilot workload by flying the aircraft. A good metaphor for workload is of a
"dipstick to the brain." If workload depends upon this reservoir of resources we
have, as shown in Figure 7.1, we would like to be able to push a little dipstick
into the brain, find out how much workload there is, then just pull it out like
we measure the amount of oil in a car. We'd like to be able to say the
workload of this task is a 0.8 relative to some absolute capacity. This measure
of absolute workload is a goal we are a long way from achieving. We will
probably never be able to achieve it with a high degree of accuracy. Far more
realistic is being able to make judgments of relative workload; for example that
the workload of the new system is less than or greater than the workload of
the old system. This is different than saying the workload is excessive or not
excessive.
In addition to the distinction between absolute and relative workload measures,
a second distinction is between workload prediction and workload assessment. A
major objective of design is to be able to predict workload of an aircraft before
flying a mission, as opposed to assessing the workload of the pilot actually
flying. In this chapter we shall first contrast these two approaches: prediction
and assessment. While our discussion in these sections will focus on conditions
of overload (is workload excessive?), we will then turn to the other extreme of
work underload, and the closely allied issue of sleep disruption.
Timeline Analysis
The simplest model or technique for predicting workload is the timeline model.
The timeline model is based on the assumption that during any flight task, the
pilot, over time, performs a number of different tasks, and each task has some
particular time duration. Therefore, we can estimate the workload on the pilot
as being the proportion of total time that he or she has been occupied doing
something. When applying this method, it doesn't matter what the difficulty of
that task is. The only thing that matters is how long it takes to carry out the
task. It doesn't make much of a difference whether two tasks are done at the
same time or done at different periods of time. Timeline analysis has been
developed extensively in the work that Parks and Boucek (1989) have done at
Boeing, where they have developed specialized software for doing such analysis.
As shown in Figure 8.3, the Timeline Analysis Program (TLAP) simply codes a
time record by lines, whose vertical position indicates the type of task, and
whose length indicates the duration of time each task segment is performed.
The time line is divided up into lengths of equal duration. Then the program
sums within each unit of time the total amount of time the tasks are being
done and the total time available. It computes the fraction of the time required
to do each task and divides that by the time available within the interval. From
that, the software comes up with a workload score for each interval.
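The computation just described can be sketched in a few lines. The task intervals, the 10-second window, and the function name below are hypothetical; this is the spirit of TLAP, not Boeing's actual software:

```python
# A minimal sketch of timeline-analysis workload scoring: tasks are
# (start, end) intervals in seconds, and workload within each fixed-length
# window is total busy time divided by the time available. Task data and
# the 10-second window are hypothetical examples.

def timeline_workload(tasks, total_time, window=10.0):
    """Return a workload percentage for each window of the timeline."""
    scores = []
    t = 0.0
    while t < total_time:
        t_end = min(t + window, total_time)
        busy = 0.0
        for start, end in tasks:
            overlap = min(end, t_end) - max(start, t)  # time this task occupies the window
            if overlap > 0:
                busy += overlap
        scores.append(100.0 * busy / (t_end - t))  # time required / time available
        t = t_end
    return scores

# Overlapping tasks can push a window past the 100 percent "red line".
tasks = [(0, 8), (5, 18), (12, 30)]
for i, wl in enumerate(timeline_workload(tasks, 30.0)):
    flag = "  <-- exceeds red line" if wl > 100.0 else ""
    print(f"{i * 10:3.0f}-{(i + 1) * 10:3.0f} s: {wl:5.1f}%{flag}")
```

Intervals whose score exceeds the chosen red line are the epochs a designer would examine first.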
The program can generate a chart for a particular activity that shows peaks and
valleys. Figure 8.3 shows an example of a workload time history profile. Using
such a technique, it is possible to establish a "red line" of absolute workload
level, a workload you would say is "excessive." Then you can determine where
design problems are in the epochs when the task demands exceed the red line.
As one example, Parks and Boucek (1989) carried out an analysis of the
implications of the data-link system for flight crew workload. The scenario
they fabricated was one with a weather deviation, an approach to landing, some
major weather, a wind shear warning, a missed approach, and a number of other
events. They first traced out the pattern of activities carried out by the
pilot-flying and the pilot-not-flying under the conventional instrumentation
and the conventional interaction with controllers. The task analysis was then
repeated assuming their conception of the data-link system, which posited a
data-link display in which a message board at the bottom of the CDU presented
the necessary information from the data-link.
[Figure 8.3. Timeline Analysis Program workload histogram (crewmember: Captain; flight phase: engine start and taxi), where workload (WL) = Time Required / Time Available within each interval.]
[Figure 8.4. Predicted workload time history for the advanced (data-link) flight deck in the weather scenario, with the dashed "red line" marking overload (from Groce & Boucek, 1987).]
There are some other examples of timeline analysis. For example, McDonnell
Douglas has a slightly different version of a timeline program. Either version
provides a good way of auditing what the tasks are and where the potential
periods of peak overload may be. The technique has certain limitations however
because it assumes that the workload of a task is only defined by how long it
takes and not how intensive or demanding it is. We all know intuitively that
there is a difference between how long something takes and how much demand
it imposes on our mental process. For example, the pilot may have to retain
three digits of information from ATC in short-term memory for five seconds, or
seven digits of information in short-term memory for five seconds. Either way,
that task takes five seconds, but certainly keeping seven digits in mind is more
demanding on our mental resources than keeping three digits in mind.
Similarly, flight control with an easily controlled system may involve just as
much stick activity but a lot less cognitive demand than flight control with a
system that has long lags and is very difficult to predict. Timeline analysis
doesn't really take into account the demand of the tasks.
A second problem is that the way timeline analysis is derived, the definition of
a task is usually something you can see the operator doing, and it doesn't
handle very well the sort of cognitive thinking activities that pilots go through
(planning, problem solving), although timeline analysis is beginning to address
them.
A third problem is that timeline analysis doesn't account for the fact that certain
tasks can be timeshared more easily than others. Pilots can do a fairly good job
of controlling the stick at the same time they are listening. Visual and vocal
activity can be timeshared very easily. Visual and manual activities can be less
easily shared. In other words, scanning the environment at the same time as
entering information into a keyboard is much more difficult than speaking to a
controller while looking outside the cockpit. Rehearsing digits is also quite
difficult while talking or listening. Timeline analysis does not account for the
fact that certain tasks are easy to timeshare and others are hard. These
differences in timesharing will be elaborated below when we discuss multiple
resources.
Finally, a fourth problem is that timeline analysis is fairly rigid. It sets up a
timeline in advance and sees where different tasks will be performed, but in
reality, pilots do a fairly good job of scheduling and moving tasks around. So if
two tasks overlap in time according to the timeline set up by the analyst, pilots
may simply postpone one in a way that avoids overlap.
Elaborations of Timeline Analysis
There are a number of more sophisticated workload prediction techniques that
address some of these limitations of timeline analysis. Table 8.1 shows workload
component scales for the UH-60A mission/task/workload analysis. It is an
attempt by Aldrich, Szabo, and Bierbaum (1989), who have been working with
the Army on helicopter design, to code the tasks in terms of how demanding
or how difficult they are. The left column has a number for the difficulty scale
of the task. A higher number means the task is more difficult. The first task on
the list is "Visually Register/Detect (Detect Occurrence of Image)." It has a
174
difficulty value of 1. The authors have also defined six channels of task
demand, analogous in some respects to the different channels used by Boeing.
Another way of accounting for the demands of a task is through a demand
checklist. That is, if you do an analysis of the task that a pilot has to do, there
are certain characteristics of any given task that influence whether it is difficult
or easy, independent of how long it takes. Consider, for example, the signal-to-noise
ratio. It obviously is a lot easier to search for a runway if it is clearly
defined than if it is partially masked by poor visibility. Other characteristics that
influence display processing demand are the discriminability between different
display symbols, the clutter on a display, the compatibility between a display and
its meaning, as discussed in the earlier chapter, and the consistency of
symbology across displays. Variables that influence the demand for central
processing resources are the number of modes in which a system may operate,
the requirements for prediction, the need for mental rotation (as a pilot must
often do when using an approach plate to plan a south-flying approach), the
amount of working memory demands (time and number of chunks), and the
need to follow unprompted procedures. Demands on response processes are
imposed by low S-R compatibility, the absence of feedback from action, and the
need for precision of action.
Table 8.1
Workload Component Scales for the UH-60A Mission/Task/Workload Analysis
(from Aldrich, Szabo, & Bierbaum, 1989)

Scale Value   Descriptors

Visual
1.0   Visually Register/Detect (Detect Occurrence of Image)

Kinesthetic
1.0   Detect Discrete Activation of Switch (Toggle, Trigger, Button)
2.0   Detect Preset Position or Status of Object
4.2   Detect Discrete Adjustment of Switch (Discrete Rotary or Discrete Lever)
4.3   Detect Serial Movements (Keyboard Entries)
4.9   Detect Kinesthetic Cues Conflicting with Visual Cues
6.6   Detect Continuous Adjustment of Switches (Rotary Rheostat, Thumbwheel)
7.0   Detect Continuous Adjustment of Controls

Cognitive
1.0   Automatic (Simple Association)
1.2   Alternative Selection
3.7   Sign/Signal Recognition
4.6   Evaluation/Judgment (Consider Single Aspect)
5.3   Encoding/Decoding, Recall
6.8   Evaluation/Judgment (Consider Several Aspects)
7.0   Estimation, Calculation, Conversion

Psychomotor
1.0   Speech
2.2   Discrete Actuation (Button, Toggle, Trigger)
2.6   Continuous Adjustive (Flight Control, Sensor Control)
7.0   Manipulative
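Demand scales like those in Table 8.1 suggest a natural extension of timeline analysis: weight each task's occupancy in an interval by its demand value within each channel. The sketch below is a hypothetical combination of the two ideas, not the Army's or Boeing's actual tool; the task tuples are illustrative.

```python
# Hypothetical sketch: weight each task's occupancy by a demand value
# (on the 1.0-7.0 scales of Table 8.1), kept separate per channel, so that
# a long but easy task no longer scores the same as a long, demanding one.

def channel_demand(tasks, t0, t1):
    """Average demand-weighted load per channel over the interval [t0, t1)."""
    load = {}
    for start, end, channel, demand in tasks:
        overlap = min(end, t1) - max(start, t0)
        if overlap > 0:
            load[channel] = load.get(channel, 0.0) + demand * overlap / (t1 - t0)
    return load

tasks = [
    (0, 10, "visual",      5.9),  # e.g., reading a display
    (0, 10, "cognitive",   1.2),  # alternative selection
    (4, 10, "psychomotor", 2.6),  # continuous adjustive control
]
print(channel_demand(tasks, 0, 10))
```

Keeping the channels separate anticipates the multiple-resources view discussed later in the chapter.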
These are a series of guidelines that can be used to predict the amount of load
on a task. There are other approaches to predicting task demand as well. Parks
and Boucek have used an information complexity measure for computing task
demands. However, what has been discussed up to now has still been a view of
attention that really assumes that there is one pool of resources that are used
for all tasks, or a series of separate and completely independent channels. That
assumption of how the attentional system works is not in line with the fact that
not all of the interference between tasks can be accounted for by difficulty. For
example, entering data into a keyboard interferes a lot more with flying
performance when it is done manually than when it is done by voice. When we
change the structure of the task like this we can sometimes find a large
difference in the amount of interference with flying. We also find another
characteristic of dual task performance which indicates that not all tasks
compete for the same resources, and this is called difficulty insensitivity. This is a
situation in which increasing the difficulty of a task does not increase its interference with
another task. Given the assumption that there is one pool of resources, then if
we make one task more difficult, we pull resources away from the other task,
and the performance of the other task ought to decline. But there are situations
when this doesn't happen. For example, we can increase the difficulty of flying
and a pilot's ability to communicate will not change much unless the flying
becomes very, very difficult.
Multiple Resources
The above findings and others suggest that there is not a single pool of
resources, but rather that there are multiple resources. So to the extent that two
tasks share many common characteristics, and therefore common resources, the
amount of interference between them will increase. For example, if we have
two tasks that both demand the same resource, like controlling aircraft stability
while adjusting a navigational instrument, there will be a trade-off in
performance between them. However, if we have one task that demands
resource A, and a second task that demands resource B, like listening, while
flying a coordinated turn, there will be little or no mutual interference. As an
analogy, if you have one home that relies on gas, and another home that relies
on oil, there is not going to be any competition for heating resources between
these homes if, say, the demand for gas suddenly increases.
When a task is made more difficult, it places greater demands on a
specific type of resource. If this resource is also shared with concurrent tasks,
the difficulty increase will be more likely to lead to a loss of performance. In
other words if two tasks demand the same resources, there will be a trade-off
between the difficulty of one and performance of the other. If they use different
resources, we can change the demand of one and not affect the performance of
the other.
We have argued elsewhere that there are three distinctions that define
resources. First, auditory resources are different from visual resources.
Therefore, it is easier to divide attention between the eye and ear than between
messages from two visual sources or two auditory sources. Second, the
resources that are used in perceptual and cognitive processes in seeing, hearing,
and understanding the world are different from the resources that are involved
in responding, whether with the voice or with the hands. Third, we have
contrasted spatial and verbal resources.
As we are perceiving words on a printed page or spoken words, we are using
verbal resources. When employed in central processing, we use verbal resources
for logical problem solving, rehearsal of digits or words, and mental arithmetic.
For a pilot this could involve rehearsing navigational frequencies given by ATC
or computing fuel problems. Anything that has to do with the voice uses verbal
response resources.
In perceiving spatial information, we do a variety of things. We do visual
search; we process analog quantities like moving tapes or moving meter
displays. We also process flow fields, that is, estimate the velocity over the
ground, from the flow of texture past the aircraft. We recognize spatial patterns
on maps, to help form a guidance of where to fly. Spatial central processing
involves imagining the airspace, or mentally rotating maps from say a north-up
to a heading-up orientation. Spatial responses are anything that involves
manually guiding the hands, fingers, feet or eyes through space: using the
control yoke, the rudder pedals, and the keyboards or engaging in visual search.
Thus the idea behind multiple resources models is that you can predict how
tasks will interfere with each other or how much workload will be experienced
not only by how long those tasks take to perform and by how demanding those
tasks are, but also by the extent to which two tasks demand common resources.
There are now a number of different efforts in the research design community,
more directly focused on military systems, that have elaborated upon versions of
multiple resources theories to come up with computation models that will take
a timeline and a task demand coding, and make predictions of the workload on
the pilot. Both Honeywell and the Boeing people have been involved in
developing a model of this sort (North & Riley, 1989).
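The core idea can be sketched as a toy conflict score: code each task on the three dichotomies (modality, processing code, and stage) and count the dimensions on which two concurrent tasks demand the same resource. The scoring rule below is my illustration, not the Honeywell/Boeing computational model itself:

```python
# A toy multiple-resources conflict score: each task is coded on three
# dichotomies -- modality (auditory/visual), processing code (spatial/verbal),
# and stage (perceptual-cognitive/response). Predicted interference between
# two concurrent tasks grows with the number of dimensions on which they
# demand the same resource. The scoring rule is illustrative only.

def conflict(task_a: dict, task_b: dict) -> int:
    """Count the resource dimensions shared by two concurrent tasks (0-3)."""
    return sum(task_a[dim] == task_b[dim] for dim in ("modality", "code", "stage"))

flying    = {"modality": "visual",   "code": "spatial", "stage": "response"}
tracking  = {"modality": "visual",   "code": "spatial", "stage": "response"}
listening = {"modality": "auditory", "code": "verbal",  "stage": "perceptual"}

print(conflict(flying, tracking))   # 3: heavy competition for the same resources
print(conflict(flying, listening))  # 0: little mutual interference expected
```

A fuller model would also add each task's demand value, so that total predicted workload reflects duration, demand, and resource overlap together.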
Workload Assessment
A framework for understanding workload assessment is presented in Figure 8.5
which shows a graph that presents across the bottom line the resources
demanded by a task or set of tasks. The farther to the right on this axis, the
more tasks the pilot must perform, or the more difficult those tasks are. The
pilot has available multiple resources that can be given to those tasks.
[Figure 8.5. Resources supplied as a function of resources demanded: up to the maximum supply there is reserve capacity (underload); beyond it, demands exceed the supply and the pilot is in overload.]
If you look at how well the pilot is performing the task at hand when
demands are on the left side of the graph (e.g., maintaining the flight path),
what you will see is fewer and fewer reserve resources available to do other things.
As we push the demand beyond the maximum supply at the middle of the
figure, the pilot is getting into the "overload" region. There is an excess of
demands and the pilot needs more than he can give. As a result, performance of
the task of interest is going to begin to deteriorate. The measurement of
workload requires looking across this whole range of task demands, from
underload to overload. This suggests that how we measure workload may vary
depending on where the pilot falls in the underload and overload regions. At
the left, we must measure residual resources. At the right, we may measure
performance directly. Four major techniques of measuring workload are
generally proposed: measuring the primary task itself, measuring performance
on a secondary task, taking subjective measurements, and recording
physiological measurements.
Primary Task Performance Measures
In aviation, the critical primary task is flight performance. How well is a pilot
actually doing keeping the plane in the air along a predefined flight path
trajectory? The direct measure of primary task performance might be some
measure of error or deviations off of that trajectory. However, it is also
important to measure not only performance, but some index of control activity;
that is, how much effort the pilot is putting into keeping the plane on the
trajectory. We need to measure control activity because we can get two aircraft
that fly the same profile with the same error, but one requires a lot of control
activity and one needs very little control activity. It turns out that one good
measure of control activity is the open loop gain, which is the ratio of the pilot's
control output (yoke displacement) to a given flight path deviation.
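As a rough numerical sketch of the gain idea above (only the ratio itself comes from the text; the function name and sample data are illustrative assumptions, not from the report), the same flight path error can be produced with very different control effort:

```python
# Hypothetical sketch: open-loop gain as the ratio of control output
# (yoke displacement) to flight path deviation. Signal names and the
# averaging scheme are illustrative assumptions.

def open_loop_gain(yoke_displacements, path_deviations):
    """Ratio of mean absolute control output to mean absolute
    flight path error over a recording interval."""
    mean_control = sum(abs(y) for y in yoke_displacements) / len(yoke_displacements)
    mean_error = sum(abs(e) for e in path_deviations) / len(path_deviations)
    return mean_control / mean_error

# Two aircraft flying the same profile with the same error can differ
# in effort: the second pilot works twice as hard for the same deviation.
low_effort = open_loop_gain([0.1, 0.2, 0.1, 0.2], [1.0, 1.2, 0.8, 1.0])
high_effort = open_loop_gain([0.2, 0.4, 0.2, 0.4], [1.0, 1.2, 0.8, 1.0])
```

The point is that error alone cannot distinguish these two cases; the gain does.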
Figure 8.6 shows the relationship between gain (effort) and error. The upper
left box represents a timeline of a pilot flying a particular profile under low
workload because there is little error and little control effort being made. This
is an unambiguous measure of low workload; performance (flight path error) is
good and effort is low. In the upper right box, we have a situation where the
error is low but the pilot is putting in a lot of control activity to maintain that
low error. We would see there is a high gain or high effort invested in the
flight performance. This is probably a high workload situation and suggests that
there is some sort of control problem. That is, some sort of problem in the way
the information is represented or the handling of the aircraft, so it is taking a
lot of effort to keep the plane flying steadily. This situation may also reflect
flying in high turbulence.
Figure 8.6. The relationship between gain (effort) and error. Low error with low gain indicates low workload; low error with high gain indicates high workload with control problems; high error with low gain indicates high workload with neglect of flight control; high error with high gain indicates very high workload.
In the lower left box is represented the opposite situation in which there is not
much control activity going on, but there is a fairly high amount of error. It is
almost as if the plane is flying through turbulence and the pilot is not doing
anything with the stick. This pattern may very well signal neglect, where the
pilot is neglecting the flight control and allocating resources to something
else: system problems or problems with other aspects of the aircraft. It is also
an indicator that there is high workload, but the high workload is not associated
with the flight control itself, but with some aspect of the aircraft environment.
Finally, the lower right box shows the worst situation, in which the pilot is
producing a lot of control activity and is still generating a lot of error for
whatever reason. Thus there is very high workload in this situation.
The important point illustrated in this figure is that looking at performance of
the primary task itself as an indicator of workload is not sufficient. You have to
look jointly at performance of the system and at the behavior of the pilot.
Secondary Task Measures of Workload

The secondary task technique assesses the extent to which the pilot has enough residual resources
to perform another, secondary task at the same time as a primary task without
letting performance in the primary task drop. When doing a difficult primary
task, if we give the pilot a secondary task, he is going to either have no
resources for that secondary task or, if resources are diverted, performance on
the primary task is going to drop (Wickens, 1991).
One example of a secondary task is time estimation. Suppose the pilot is flying
along and is asked to give a voice report every time he thinks 10 seconds has
passed. Time estimation generally becomes more variable and the intervals
longer as the workload increases. Another secondary task that has received a
fair amount of interest is the task of a memory comparison. While flying along,
the pilot hears a series of probe signals. Maybe they represent call signs. Every
time he hears the call sign of his own aircraft, he presses a button. Every time
he hears the call sign of another aircraft, he does nothing. So he compares each
call sign to his memory. If it matches he responds. This task is sometimes called
the Sternberg Task. The response time to acknowledge call signs is longer with
higher levels of workload. Random number generation is another possible
secondary task. The pilot is asked to generate a series of random numbers and
the more difficult the primary task, the less random the numbers become.
Another secondary task is the critical instability tracking task, in which a second
tracking task is built into the pilot's primary flight control loop. Error on this
task directly reflects the difficulty of the flight dynamics of the primary task.
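The time-estimation task above can be scored in a few lines. This is a hypothetical sketch (the function name, report format, and data values are assumptions, not from the report), showing how both the lengthening and the increased variability of the produced intervals might be indexed:

```python
# Illustrative sketch: scoring the time-estimation secondary task.
# As workload rises, produced 10-second intervals tend to become
# longer and more variable.
import statistics

def time_estimation_index(report_times, target=10.0):
    """Mean interval bias and variability computed from the pilot's
    verbal reports of elapsed 10-second intervals."""
    intervals = [b - a for a, b in zip(report_times, report_times[1:])]
    mean_interval = statistics.mean(intervals)
    variability = statistics.stdev(intervals)
    return mean_interval - target, variability

# Low workload: intervals near 10 s.  High workload: longer and more variable.
low_bias, low_var = time_estimation_index([0, 10.2, 20.1, 30.3, 40.2])
high_bias, high_var = time_estimation_index([0, 12.5, 26.0, 37.0, 51.5])
```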
All of these types of secondary tasks have various problems. One problem they
have in common is that they are all sensitive to multiple resources. If you have
a secondary task that demands resources that are different from the primary
task, you are going to underestimate workload. If you have a primary task that
is heavy, in terms of perceptual-cognitive load--rehearsing digits would be a
good example--and you have a secondary task that is heavily motor, like
performing a critical tracking task, it is as if you are looking in one corner of a
room for something that exists in a different part of the room. So you need to
have your secondary tasks demand the same resources as the primary task.
Perhaps even more critical, at least for in-flight secondary task measures of
mental workload, is this problem of intrusiveness. We can all imagine the
resistance that a pilot would give if he were trying to fly the aircraft through
high workload conditions, and at the same time had to generate a continuous
stream of random numbers, or had to continuously control a side-tracking task.
He simply wouldn't want to do it. This is the biggest bottleneck towards the
introduction and the use of secondary tasks--they tend to be intrusive into the
primary task and disrupt the primary task; and this is a major problem when
the primary task is one involving a high-risk environment (i.e., in-flight
recording, rather than simulation).
A solution to the problem of intrusiveness is a technique called the embedded
secondary task; that is, use of a secondary task which is an officially designated
part of the pilot's primary responsibilities, but is fairly low in the hierarchy of
importance for the pilot. In flying, there is a certain intrinsic task priority
hierarchy. For example, there is the standard command hierarchy to aviate,
navigate, and communicate in that order of priority. With more precision we
can further rank order tasks in terms of those that have very high priority, say
maintaining stability of the aircraft, those of extremely low priority, like
answering service calls from the back of the aircraft, and those things in
between. The idea behind this prioritization scheme is that as the workload
increases from low to high, the lowest priority tasks are going to drop out, so
when the workload is very, very high, the only thing that will be left to do is
the highest priority task. Thus good embedded measures of secondary tasks are
those tasks that are naturally done but are lower down in the priority hierarchy.
An example might be acknowledging call signs. To the extent that this is a
legitimate part of the communication channel, one can measure how long it
takes the pilot to acknowledge the call sign as an embedded secondary task.
Our research has indicated that airspeed control is a good embedded secondary
task. The control of airspeed around some target is of lower priority, or at least
seems to be reduced in its accuracy more, when the demands for the control of
the inner-loop flight path errors (heading and altitude errors) become excessively
difficult. So as the demand goes up, the airspeed errors seem to increase, more
so than do the other types of errors.
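One plausible way to score the airspeed embedded secondary task described above is a simple RMS error around the target speed. This sketch is not the authors' actual metric; the function name and the sample values are illustrative assumptions:

```python
# Hedged sketch of the embedded-secondary-task idea: the airspeed
# tracking error (a lower-priority, naturally occurring task) is scored
# as an index of spare capacity.
import math

def rms_error(samples, target):
    """Root-mean-square deviation of airspeed samples from a target speed."""
    return math.sqrt(sum((s - target) ** 2 for s in samples) / len(samples))

# When inner-loop demands (heading/altitude) rise, airspeed error tends
# to grow faster than the primary flight path errors themselves.
quiet_leg = rms_error([249, 251, 250, 252, 248], target=250)
busy_leg = rms_error([245, 258, 242, 260, 251], target=250)
```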
Subjective Measures of Workload
The third category of workload measures, which is often the most satisfactory to
the pilot, is the subjective measure. There are a number of different techniques
of subjective workload measurement. One is a unidimensional scale. An example
of this is the Bedford Scale shown in Figure 8.7a, and involves a decision tree
logic. There are a series of questions: Was workload satisfactory without
reduction? Was workload tolerable for the task? Was it possible to complete the
task? Depending on whether the answer is yes or no, you go on up to higher levels that
eventually allow you to categorize the workload of a task on a 10-point scale.
Similar to the Bedford Scale is the modified Cooper-Harper Scale (Figure 8.7b),
which is taken more directly from the Cooper-Harper scale of flight handling
quality, but now has questions phrased in terms of workload. The important
point is that you can get a single number, and that number is guided by a
series of verbal decision rules about how you ought to interact with
the task.

Figure 8.7a. The Bedford Scale: a decision tree (Was workload satisfactory without reduction? Was workload tolerable for the task? Was it possible to complete the task?) leading to a 10-point workload rating.

Figure 8.7b. The modified Cooper-Harper Scale, adapted from the Cooper-Harper scale of flight handling qualities, with the decision-tree questions phrased in terms of workload.

Both of these unidimensional scales, the Bedford and the modified
Cooper-Harper, are simple. Because they are simple, they have a certain amount
of ambiguity. It is not always clear why a task is rated difficult, because the
scale won't tell you if it is difficult, for example, because it had difficult
response characteristics, or because the displays were hard to interpret, or there
was heavy time pressure or heavy cognitive demands, etc.
Multidimensional scales, in contrast, assume that there are several dimensions
underlying subjective workload, and reveal what these dimensions are. The two
major candidates for multidimensional scales are the Subjective Workload
Assessment Technique (SWAT) and the NASA TLX Scale. The SWAT, which was
developed for the Air Force at Wright Patterson AFB, assumes that we
experience workload in terms of three dimensions: the time demands of the
task, the effort of the task, and the stress the task imposes on us. It asks the
pilot to indicate for each of these scales, on a three-point rating, whether the
time, effort, and stress levels are low, medium, or high. By a fairly elaborate
procedure which uses all 27 possible workload ratings derived from low,
medium, and high combinations for each of these three scales, it is possible to
determine which scale is more important for a particular pilot. This procedure is
used as a way of coming up with a single measure of workload from these
three ratings on each of the different scales.
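The full SWAT procedure derives an interval scale from the pilot's sort of all 27 time/effort/stress combinations; the sketch below substitutes a much simpler additive approximation with invented weights, just to show the shape of the computation (the weights, names, and mapping are assumptions, not the actual SWAT scaling):

```python
# Simplified stand-in for SWAT's conjoint-scaling step. The real
# procedure derives an interval scale from a 27-card sort; here an
# additive approximation with illustrative weights is used instead.
from itertools import product

LEVELS = {"low": 0, "medium": 1, "high": 2}

def swat_like_score(time, effort, stress, weights=(0.4, 0.3, 0.3)):
    """Map three 3-level ratings onto a single 0-100 workload value."""
    raw = (weights[0] * LEVELS[time]
           + weights[1] * LEVELS[effort]
           + weights[2] * LEVELS[stress])
    return 100 * raw / 2  # 2 is the maximum level value

# All 27 combinations of low/medium/high on the three scales:
all_ratings = list(product(LEVELS, repeat=3))
```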
Two major problems have been found with the SWAT technique. The sorting
procedure it uses, which seems to be a mandatory part of SWAT, is time-consuming. The other problem has to do with the scale resolution; that is,
SWAT only allows you to say that workload is low, medium, or high on each
scale. If you consider your own flight experience, you are able to give a lot
more precision to workload than three levels. You have more power of
discrimination between the resource demands of the task than simply low,
medium, and high. What happens when only three rating levels are available is
that people tend to choose the middle level, and pretty soon you don't get
much resolution at all.
A different technique, offered as an alternative to the SWAT, is the NASA Task Load
Index, or TLX scale. This was developed by Sandra Hart at NASA and assumes
that there really are six dimensions of subjective workload: mental demand,
physical demand, temporal (time) demand, the level of performance the pilot
thinks he or she has achieved, amount of effort, and frustration level with the
task. For each of these, there is a verbal description of what it means, and,
furthermore each of these different demand levels can be rated on a 13-point
scale. You do it by putting a mark on a piece of paper somewhere along the
13-point scale. The scale gives the pilots more freedom and flexibility to rate on
different dimensions without a lot of extra effort, and probably provides more
information. In fact, some comparisons of how well the two different scales
have differentiated loads indicate that the TLX scale does a better job than the
SWAT. TLX also has a procedure that allows the six dimensions to be combined
into a single workload rating. For many purposes, the single-dimensional rating
scales are probably adequate for picking up most of what there is in workload.
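A sketch of how the six TLX ratings might be combined into one number: in the published procedure the weights come from 15 pairwise comparisons of the dimensions, while here the weights and ratings are invented for illustration:

```python
# Hedged sketch of combining the six TLX dimension ratings into one
# score. The weight values below are hypothetical tally counts from
# pairwise comparisons (they sum to 15), not real data.

DIMENSIONS = ["mental", "physical", "temporal", "performance",
              "effort", "frustration"]

def tlx_overall(ratings, weights):
    """Weighted average of the six dimension ratings."""
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(ratings[d] * weights[d] for d in DIMENSIONS) / total_weight

ratings = {"mental": 70, "physical": 20, "temporal": 60,
           "performance": 40, "effort": 65, "frustration": 30}
weights = {"mental": 5, "physical": 0, "temporal": 4,
           "performance": 2, "effort": 3, "frustration": 1}
overall = tlx_overall(ratings, weights)
```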
There are really three problems with subjective workload measures. One of
them is response bias. If you are simply asking for a rating of workload, we all
know there are individual differences among pilots. One may not ever admit
that the workload is greater than three, no matter how difficult things are.
Another may be very quick to admit to high levels of workload whether they
exist or not. A second problem with subjective workload measures is related to
memory. An example would be if we were evaluating two tasks, flown on two
different systems, and the pilot is asked to compare their workload. Since the
pilot's memory for the first one may have degraded, he may not be able to
make an accurate judgment based on memory. The third problem with
subjective workload measures is that they do not always agree with
performance. It sometimes happens that when two systems are compared, one
gives better performance than the other. However, the one that gives better
performance is, in fact, shown to have higher measures of subjective workload.
Which measure should then be trusted by the designer?
Physiological Measures of Workload
The fourth category of workload measures are physiological measures. Several
of these have been proposed: heart rate (both mean rate and variability), visual
scanning, blinking and various measures of electroencephalogram (EEG) that
can measure fatigue and, finally, the evoked potential, the momentary changes
in the EEG that are caused by a discrete event, like the sudden onset of a light
or a tone. The prevailing view is that most of these techniques have some uses,
but as far as being reliable measures of pilot workload, particularly in civil
aviation, there are more problems than there are benefits. The most successful
measures appear to be those that relate to heart rate. Here, there are two
specific measures. There is the mean heart rate. That is, the number of beats
per minute. The faster the heart beat, presumably the higher the level of mental
workload. That does hold true more or less, but there are other factors,
unrelated to mental workload that cause the heart to beat fast. Certainly two of
these are arousal and stress. Another one is simply physical load. So in a
physically taxing environment, even though the mental workload may be low,
the heart rate may still be very rapid. Thus the mean heart rate is not a terribly
good indicator of mental workload by itself.
A better measure of cognitive load is the variability of the heart beat interval
(Vicente et al., 1987). It has been found that as the workload gets higher, the
variability of the heart rate gets lower. Figure 8.8 shows some data taken at Wright
Patterson (Wilson et al., 1988). It is a timeline of two minutes which plots, at
the bottom, the interval between each heartbeat. The fact that the curve
oscillates suggests that the heartbeat interval is itself variable.

Figure 8.8. A two-minute timeline of mean heart rate (top) and the interval between heartbeats (bottom).

Some periods
the beats are close together, then they get slower, then they get faster, then
they get slower. So this oscillation represents variability in the inter-beat
interval. The overall level represents the overall inter-beat interval or the mean
heart rate, plotted at the top. When the level is low, that means the heart is
beating very fast. In the figure, note that at 35 seconds into the flight test, a
bird struck the windshield. This was a fairly traumatic event, and you can see
very dramatically an increase in heart rate (decrease in the inter-beat interval)
and a reduction in the variability. So both emotional stress and the cognitive
load of dealing with this unexpected event made the heartbeat faster and
caused much less variation. Figure 8.9 (top) shows another case of relatively
low variability in heartbeat, indicating high workload. Figure 8.9 (bottom)
shows the change from high to low variability (low to high workload) with
little corresponding change in emotional load.
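The two heart-rate measures discussed here can be derived from the same list of beat timestamps. This is an illustrative sketch (the data values and function name are assumptions) showing how higher workload would appear as a faster rate and a suppressed inter-beat variability:

```python
# Illustrative sketch: deriving mean heart rate and inter-beat-interval
# variability from beat timestamps (in seconds).
import statistics

def heart_measures(beat_times):
    """Return (mean heart rate in beats/min, standard deviation of
    the inter-beat interval in seconds)."""
    ibis = [b - a for a, b in zip(beat_times, beat_times[1:])]
    mean_hr = 60.0 / statistics.mean(ibis)
    variability = statistics.stdev(ibis)
    return mean_hr, variability

# Relaxed segment: slower, more variable beats.
relaxed_hr, relaxed_var = heart_measures([0.0, 0.9, 1.7, 2.7, 3.6, 4.6])
# High-workload segment: faster beats with suppressed variability.
loaded_hr, loaded_var = heart_measures([0.0, 0.6, 1.2, 1.8, 2.4, 3.0])
```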
Collectively, it is hard to say which technique of workload measure is best. In
civil aviation tests by both Airbus Industrie and Douglas there has been some
success with the physiological measures. The best approach is probably one that
Figure 8.9. Graph plotting inter-beat time intervals for heartbeats over a two-minute period. Note the reduction in variability at t = 40, with no corresponding change in emotional load.
of workload are again represented. But there are also a set of fairly
sophisticated cognitive activities, assumed to be carried out by the pilot. These
include planning, setting priorities, establishing a schedule, allocating effort,
focusing attention on certain tasks, ignoring others, etc. As a result of this
adjustment, the pilot experiences some mental and physical demands, which we
call workload, but the workload experienced at one moment in time is used to
continuously adjust performance, establish priorities, and change task
Figure 8.10. A closed-loop model of workload. Task requirements, available resources, and time available feed the pilot's planning and activities; the resulting actions produce outcomes, which in turn modify the task requirements.
then express it. Instead, if they experience workload, and the workload is too
high, they drop tasks. If the workload is too low, they assume tasks.
Unfortunately, we really do not have a very strong database on how well
people conform to this model. For example, there aren't good data regarding
how good a job people do at shedding tasks appropriately, and knowing
whether optimal task shedding is done well under normal conditions, or done
poorly under stress. A program of research at NASA and the Air Force is
beginning to examine this issue, and there are similar research programs
elsewhere. A second point, not yet well addressed, is that as people become
underloaded they will tend to assume "pick up" tasks. The goal of a pilot is
not to minimize workload, but rather to keep workload at some moderate,
stable, intermediate level. This obviously has long-term implications for the
system designer who is considering the appropriate level of automation. The
goal of automation should not be to eliminate the pilot and reduce the pilot's
workload to zero, but rather to support an appropriate level of tasking within
the pilot's repertoire. The problems of excessively low workload, and their
close relation to issues of sleep disruption, will now be addressed.
Underload
The flip side of high workload is underload. As we discuss underload in this
section, it refers to situations of long periods of relative inactivity. Transoceanic
flights or long cross-continental flights are examples of underload, where very
little is actually happening. It is not surprising that very long periods of low
workload really are not optimal. The pilot will try to create some level of
workload, whether it is flight-related or not, in order to avoid sleeping. Some
interesting studies of air traffic controllers by Paul Stager in Canada found that
a predominance of ATC errors seems to occur at relatively low workloads rather
than at periods of high workload.
One of the things we know about low workload periods is that these interact
negatively with sleep loss. Pilots under sleep loss conditions are much more
likely to perform poorly under low workload periods than pilots who are well
rested, and so we now turn to a discussion of this important topic.
Sleep Disruption
There have not been many systematic studies of the effects of sleep deprivation
on pilots' performance. Perhaps the best of these was a study carried out by
Farmer and Green (1985) in the UK, in which they worked with 16 pilots. The
pilots were deprived of one night's sleep, by being kept awake for 24 straight
hours. Then they did a series of in-flight maneuvers, with a wide-awake check
pilot to make sure that nothing disastrous happened. Farmer and Green looked
at the kind of errors that were made, and found that the errors occurred mostly
during the low activity portion of the flight, at the times when not much was
going on, except for an occasional need to respond to, for example,
unpredictable and infrequent warning signals. These are what psychologists call
vigilance tasks.

Because we know that sleep loss has consequences that are harmful in low
workload environments, it is important to understand some of the characteristics
of sleep. We have two different forms of sleep. One is rapid eye movement
(REM) sleep in which the eyes are twitching, there is a lot of dreaming, and
there is actually a fairly high level of brain activity. The other is slow wave
sleep, so named because the EEG is very slowly changing during this type of
sleep. The brain is very quiescent during slow wave sleep. There is not much
dreaming activity going on. REM sleep takes place later in the night. Slow wave
sleep takes place predominately during the first part of the night. There is good
evidence that both kinds of sleep are important for the overall health of the
individual.
The whole sleep wake cycle is defined not only in terms of staying awake and
being asleep, but also by a set of body rhythms, called circadian rhythms, that
reflect different characteristics of the efficiency of performance. These circadian
rhythms run on a 24-hour cycle and can be defined by body temperature, the
depth of sleep, sleep latency, and performance. Figure 8.11 shows the average
duration of sleep episodes and the body temperature of a person during a 48-hour time period.
What the function shows is that temperature is lowest in the night and the very
early morning period. It begins to climb during the day, reaches its peak in the
late afternoon and evening, then declines at night. The graph of temperature
coincides with the bar graph that plots the duration of sleep. This graph shows
that if you go to sleep sometime in the early morning hours, your sleep
duration will be relatively short. If you go to sleep during the evening, your
duration of sleep will be longer.

Figure 8.11. Duration of sleep episodes, mean sleep latency, and body temperature over a 48-hour period.
A third characteristic of the circadian rhythms has to do with sleep latency.
Figure 8.12 shows a graph of the mean sleep latency of subjects who received
the Sleep Latency Test. Sleep latency is how long it takes you to fall asleep. If
there is a long latency, it means you are wide awake, and so you are not about
to nod off to sleep. If there is a relatively short latency, it means you are very
prone to fall into a deep sleep. Figure 8.12 covers results of a 24-hour period
from 9:30 am to 9:30 am. Eight 21-year-old subjects and eight 70-year-old
subjects received the Mean Sleep Latency Test (MSLT), while awake, during the
day, followed by four brief awakenings at 2-hour intervals during the night
(shaded).

Figure 8.12. Mean sleep latency for 21-year-olds and 70-year-olds. (From Richardson et al.)

In the afternoon, there is a "post-lunch dip" which indicates that in
the afternoon we tend to fall asleep and drop off rapidly. Sleep latency gets
longer in the evening time (it takes longer to fall asleep), but then again
becomes very short in the morning, and rises again during the daytime. The
measures of temperature and sleep duration show only one cycle during the
day, while sleep latency has the same general cycle but with this little extra dip
in it in the afternoon.
Performance is the all-important measure related to sleep deprivation. Figure
8.13 shows how human performance on various tasks changes during the day.
The performance tends to correspond with body temperature, but also shows
hints of the "post lunch dip" characteristic of sleep latency. One graph shows
psychomotor performance, like a tracking task. You do progressively better
during the day, best in the early afternoon, and do relatively poorly at night
and in the early morning hours. The other graphs show the measurement of
reaction time, and of ability to do symbol cancellation and digit summation.
The collective implication revealed by all of these effects is that we have a
regularly entrained rhythm that describes how fast we go to sleep, how long we
sleep, our body temperature, and the level of performance, all of which show a
very pronounced dip in the time from midnight until about six in the morning.
The data strongly suggest that when possible, flight schedules ought to be
arranged to take advantage of the capacity for sleep. Flight schedules that allow
pilots to sleep at times when they go to sleep fastest and sleep for the longest
are better than those that give pilots the opportunity to sleep at times when
they have a hard time sleeping because their sleep latency is long.
Sleep Disruption in Pilots
A lot of the research on sleep disruption has either been based upon subjects
that were not pilots, or were military pilots, so there are not a lot of data that
generalize directly to civil aviation. There are two important studies that were
carried out at NASA that do have a direct bearing on the civilian piloting
community (Graeber 1988). One of these is a short-haul study in which a large
number of pilots were evaluated during a series of domestic short hauls. They
flew for three or four days before returning to the home base. Out of that study
came the first systematic conclusions of the effects of sleep cycle on the short
haul. First, the pilots began the trip with a sleep loss, because they were apt to
sleep less than the normal amount the night before they took off for the first
leg. Thus they started out behind the eight ball. This is interesting, because it is
precisely the opposite of a concept that has proved to be an effective antidote
against sleep loss, the concept of prophylactic sleep. This is defined as getting
extra sleep in advance of a period of time when you are going to miss a lot of
sleep. It can do a very good job of compensating for the later loss of sleep.

Figure 8.13. Graphs showing how human performance (psychomotor performance, reaction time, symbol cancellation, and digit summation) varies during the day with a rhythm corresponding to body temperature. (From Klein et al., 1972)
A second finding from the short-haul study was that sleep loss each night is
greater on layovers than at home. Generally, the pilots were sleeping less per
night on the layovers. The sleep was also more fragmented during the layovers.
Graeber also examined the buildup of fatigue across the four days of flying, and
found that this buildup (measured by the pilots' subjective rating of how tired
they were), was really greatest after the first day of the trip, with a more
modest increase in fatigue after the third and fourth days.
Now consider what each day of the trip is like. Some days are very fragmented
and consist of three or four different legs on different aircraft -- up to seven or
eight takeoffs and landings at different airports. Other days may involve only
one flight with a fairly long layover. Thus we can distinguish between busy
days and relatively nonbusy days in terms of takeoff and landings. Graeber's
third conclusion was that sleep was better following a busy day than following
a relatively light day. That is not altogether surprising. The busier the day, the
more takeoffs and landings, the more fatigue within a day, and, therefore, the
better the sleep will be after that day is over. A fourth conclusion from
Graeber's study is that down-line changes of schedules are bad for sleep planning.
If, after the second or third day into the short haul, the pilot was informed of a
sudden change in the flight schedule, this change seriously disrupted the pilot's
sleep schedules. It was almost as if the crews could preprogram themselves for
how much sleep they were going to need each night into the short haul.
However, if that schedule was suddenly disrupted by a change, that change
disrupted the preprogramming. For pilots who have done operational flying for
commercial airlines, most of these conclusions are probably not surprising. The
important point is, for the first time, they are firmly documented in an objective
study with data.
Our internal clock actually runs on a cycle of
about 25 hours. Studies of people who have gone into caves, where they have
no sense of the natural day/night cycle, reveal that these subjects
tend to adopt a 25-hour schedule rather than a 24-hour schedule. There are
interesting reasons why this is the case, but it is very clear that our natural
schedules tend to be longer than the daylight cycle forces us into. When left to our
own devices during the week, we tend to stay up later and later each night,
and we tend to be late stayers more than early risers. What happens,
nevertheless, when we go into a long-haul flight is that we have suddenly
moved to a situation where the day/night cycle in the environment where we
land, is different from the day/night cycle that our brain has adapted to when
we took off. This phenomenon is called desynchronization.
Desynchronization is represented by Figure 8.14. The upper graph represents
the westbound flight and the lower graph represents the eastbound flight. The
dotted line is the natural circadian rhythm that was formed when we left our
home base. So it is the same no matter whether we are flying west or east. The
solid line for the west- and eastbound flights is the circadian rhythm at the
destination. As we fly west, we are flying with the sun, and initially undergo a
very long day.

Figure 8.14. The natural circadian rhythm (dotted line) and the day/night cycle at the destination (solid line) for westbound (top) and eastbound (bottom) flights.

As we reach the new destination, now we have a day/night
cycle, but it is shifted ahead of what our natural cycle is. So when our brain
thinks it's night, it is still afternoon. When we are flying east, on the other
hand, we have a very fast day initially. When we reach our destination, the
local day/night cycle is again shifted relative to our natural cycle, this time in the opposite direction.
The data in either case obviously suggest that there is a mismatch between our
circadian rhythms and the post flight day/night cycle.
The data also suggest that it is considerably easier to adapt to westbound
flights than eastbound flights. When flying west, the natural rhythms have an
easier time lengthening themselves to get in synchrony with the local day/night
cycle. On the other hand, when flying east, it is as if the rhythms don't know
whether to contract and make a very short day, or expand to make a doubly
long day. There are also data indicating that the eastbound
flights, which condense the day, are worse than the westbound flights which
stretch the day. These data come, in part, from examining the way in which
different characteristics of the physiological systems adapt to the new rhythms.
In other words, you have got a natural rhythm which was in existence when
you left, and you acquire a new rhythm which you should take on when you
reach your destination. The longer you stay at your destination, the more the
old rhythm is going to shift into phase with the new rhythm. We can then plot
how rapidly that shift takes place.
Table 8.2 shows the shift rates for different variables after transmeridian flights,
either westbound or eastbound.
Table 8.2
Shift Rates after Transmeridian Flights for Some Biological and Performance
Functions

[The body of this table is garbled in the source. The recoverable row labels
are adrenaline, noradrenaline, heart rate, body temperature, and 17-OHCS, with
shift rates tabulated separately for westbound and eastbound flights; the
legible values range from roughly 32 to 160, with faster resynchronization
after westbound flights.]
Complete resynchronization probably takes more like five to six days after an
eastbound flight, and perhaps three to four days after a westbound flight.
Figure 8.15 shows some more data
[Figure 8.15: Average resynchronization, based on the means of 8 variables,
over the first 8 days after a transmeridian flight.]
representing this shift. It shows how much resynchronization took place for
different variables (body temperature, performance, etc.) after the first through
the eighth day. Notice that even after eight days, subjects still haven't
completely resynchronized with the new rhythms, although most of the
resynchronization took place by the second and third days. The bottom-line
question, of course, is whether this desynchronization leads to a higher number
of pilot-induced accidents or poorer pilot performance. At this point, there isn't
a good database to suggest that is the case. In other words, there aren't
accidents that have been directly attributed to the resynchronization problem,
but there are certainly suggestions that it may have been a contributing cause
in some instances.
Recommendations
There are a number of recommendations that have come out of the research on
sleep resynchronization, and these are, again, taken from Graeber's work
(Graeber, 1988, 1989). His chapter recommends that pilots should sleep when it
is most effective and do so within the natural cycle. Where possible, sleep ought
to be scheduled during late-night, early-morning hours, in phase with the
rhythms to which the body is accustomed. Extra sleep, rather than deprivation
prior to a short haul, is advised. Following a long transoceanic flight, Graeber
argues it is better not to sleep immediately after one's arrival, but simply try to
stay awake until the local bedtime, particularly if one is going to be adjusting
for some time to new rhythms. During any 24-hour period of a layover, sleep is
relatively more effective before a takeoff than after a landing. Following a
landing, then, sleep is going to be better when taken just prior to the
subsequent takeoff. This is consistent with the idea of prophylactic sleep,
sleeping in advance of a period where one knows sleep deprivation is likely to
occur. Prophylactic sleep is helpful and much more restorative than sleeping just
after a period of time without sleep.
A somewhat more controversial issue, but one that is certainly receiving some
research interest, concerns controlled napping. How effective is controlled
napping in flight, assuming, obviously, that somebody else is awake at the
controls? The studies that have been done of napping indicate there are really
two sorts of napping. First, there is micro-sleep, where one may doze off for a
couple of seconds or a very short period of time. There is very little evidence
that micro-sleep, in itself, is effective in restoring sleep loss. Then there is a
bona fide nap. There is a minimum amount of time, about 10 minutes, before a
nap can be effective in terms of restoring some sort of sleep loss.
Another phenomenon that relates to naps is the concept of sleep inertia. It's
something that is intuitively familiar to all of us. Sleep inertia describes the
cognitive inertia we experience immediately after waking up. In fact, for 10
minutes or so after one wakes up, there is an inertia that inhibits our ability to
respond quickly, think fast, and so forth. This is well-documented in the
research of Chuck Czeisler at Harvard, which suggests that any program of
controlled napping has got to be one in which the wake-up time is well in
advance of the time one may have to carry out some sort of high-level cognitive
activity or rapid action. For a pilot flying a transoceanic route, this means you don't
want to wake up just before you start making the important decisions required
on the approach, but rather with sufficient time to dissipate that sleep inertia
before such decisions are required.
In conclusion, it should be noted that the findings and recommendations
reported here result from pooling information from a lot of data sources, many
of them not taken from aviation. Furthermore, the causal links between the
different forms of sleep disruption and pilot error have not always been
conclusively established. Nevertheless, it is prudent to assume that such links
exist and to act accordingly.
Human Error
Anytime one talks about human error, there is a tendency to do a lot of
finger-pointing. Pilot error is frequently raised as a red flag, cited as the cause
of a disaster or accident. Training in engineering psychology, however, leads one
to conclude that when errors do occur, they rarely occur as a result of a mistake
made exclusively by the pilot. Typically, errors are caused by some
training-induced, schedule-induced, or design-induced factor that made the error
almost an inevitable consequence--something that was bound to happen sooner or
later. This is actually a positive philosophy, for it suggests that there are usually
steps that can be taken to reduce the likelihood of error.
A number of studies that have looked at pilot errors have tried to categorize the
nature of the various errors in terms of where they occurred, how they
occurred, and what they were the result of. The approach to pilot error
classification that is consistent with the information processing model presented
in Chapter 7 is one that identifies four major kinds of errors. In this model of
information processing, there are the stages of perception and understanding
the situation (situation awareness or diagnosis), formulating some intention for
action, (deciding what to do about it and making a choice), and finally
executing the action. When taking an action, we often rely upon our memory,
both short- and long-term, to help us recall the rules of what it is we are
supposed to do. Within this context, two researchers, Norman (1988) of the
U.S., and Reason (1990) of the U.K. have come up with similar ways of
classifying errors. Classification is important because the different kinds of
errors seem to have different remediations, or different fixes. This classification
is nicely applied to aviation in Nagel's chapter in Wiener and Nagel's book,
Human Factors in Aviation (Academic Press, 1988).
the situation, you act with a high degree of certainty, even though you are
acting incorrectly. As Reason says, one's actions are "strong but wrong."
A general characteristic of slips is that they are "strong but wrong." An operator
commits to the action, and usually does it with the same degree of certainty as
the correct action. Fortunately, we are usually fairly good at detecting our own
slips just as they are made. As we type or enter data into a CDU, it is very
obvious when we make a slip, as if the finger knows before the brain knows
that it has gone to the wrong place or setting. With a particular switch in an
aircraft, you may know immediately that you made the wrong choice.
The fact that we are good at catching ourselves making slips has some
important implications for how we remediate them. Remediation of slips is a
major issue in system design. Since slips usually occur when attention is
directed elsewhere, slips usually occur in sequences of behavior that are
well learned by highly trained operators. So remediation lies not so much
in training as in system design--remediation includes such things as
avoiding designs in which similar controls with similar physical actions
must be used in similar conditions. Good design avoids circumstances where
you have two similar switches that are flipped in similar conditions but
for different purposes. Always try to adhere to S-R (stimulus-response)
compatibility.
One of the major culprits causing slips is the incompatible response mapping,
discussed in Chapter 7. Here without paying attention, the pilot may have a
tendency to move something in the wrong direction because the right direction
was an incompatible response.
Error Remediation and Safeguards
In this section we review and present a series of recommendations that
psychologists have proposed to remediate human error -- eliminate it, or reduce
the likelihood of its unpleasant consequences. First, there is the issue of
allowing for reversibility of actions. Such an allowance creates what we call a
forgiving system. As we noted, operators are usually pretty good at monitoring
their own performance and detecting their own errors if there are slips. Once
you've made an error, it is nice to have a chance to correct it. Some systems
have an "error capture" mechanism, which captures and delays the response a
little bit before its consequences can affect the system. That's not always a
feasible design option, but there are situations in which it can be made feasible.
There are computer systems that, whenever you press a button that involves
deleting a major file, will come back with a message that says, "Are you sure
you want to delete this?" That is like capturing your behavior before it gets
passed on to the system. Slips often involve throwing things away. Don
Norman, the author of The Psychology of Everyday Things, keeps all of the trash
from his office for 24 hours in a separate room before it is emptied. If
someone in the office realizes the next day that they inadvertently threw out
something important, they can go into the room and pull out the information.
This is a forgiving system. On the other hand, if, on an airplane, you slip some
paperwork into the seatback pocket, then forget it when you exit from the
plane, your chances of getting it back are slim. As soon as the plane is empty,
the maintenance crew will almost immediately clean out the seatbacks
and destroy it. That is not a forgiving system; it fails to acknowledge the fact that
people do have lapses of this sort.
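The error-capture idea described above can be illustrated in code. The following is a minimal sketch, not from the report, of a "forgiving" store in which deletions are held for a grace period (in the spirit of Norman's 24-hour trash room) and can be undone before they become permanent; all class and method names here are hypothetical:

```python
import time

class ForgivingStore:
    """A store whose delete operations are captured and held for a
    grace period, during which they can still be reversed."""

    def __init__(self, grace_seconds=24 * 3600):
        self.items = {}
        self.pending = {}  # name -> (item, time the deletion was requested)
        self.grace = grace_seconds

    def add(self, name, item):
        self.items[name] = item

    def delete(self, name):
        # Capture the action instead of destroying the item outright.
        self.pending[name] = (self.items.pop(name), time.time())

    def undo_delete(self, name):
        # Reversal is possible until the grace period expires.
        item, _ = self.pending.pop(name)
        self.items[name] = item

    def purge(self):
        # Deletions become permanent only after the grace period.
        now = time.time()
        for name in [n for n, (_, t) in self.pending.items()
                     if now - t >= self.grace]:
            del self.pending[name]

store = ForgivingStore()
store.add("flight_plan", "KLAX-KSFO")
store.delete("flight_plan")       # a slip: deleted by mistake
store.undo_delete("flight_plan")  # caught and reversed in time
```

A confirmation prompt ("Are you sure?") plays the same role at the moment of action; the grace period extends that protection after the action has already been taken.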
The idea of reversible actions, or forgiving systems, where a slip can be reversed
and undone before it is passed on to the system has led to a philosophy of
human error that is somewhat of a marked departure from an earlier
philosophy. That earlier philosophy was that human errors are bad, and
whenever they occur, we ought to try to remediate them. Therefore, we ought
to try to redesign the system to make sure an error doesn't occur in the first
place. This philosophy has led to two approaches. One is called "bandaids." In
the bandaid approach, the system gets more and more complex, because every
human error is a cause for another design feature (i.e. a bandaid) that tries to
eliminate the human error. This correction, by making the system more
complex, very often creates conditions conducive for another error (mistakes
become more likely with more complex systems) and doesn't acknowledge the
fact that errors are probably always going to happen to some extent in any
case; any fix for one sort of error may be likely to produce another error. The
second approach characterizing the old philosophy that all human errors are
bad is one which pushes automation as an ideal because of the belief that a
computer can perform better than a human if there is a mistake. The problem
with automation is that the designer is usually transferring the responsibility for
human error to someone else. For example, this responsibility may be
transferred from the pilot to the computer programmer who is just as likely to
make the errors as the pilot.
In contrast to the earlier philosophy, the proponents of forgiving systems make
two assertions about errors. They say that an error is, first of all, unpredictable
and inevitable. No matter how we design the system, and patch it with
bandaids, errors are always going to occur to some extent. Furthermore, they
say that error is sometimes a necessary consequence of the fact that the human
is a flexible performer. It is that very flexibility that makes us want to keep
humans involved in the first place. Pilots have flexible problem-solving skills,
and that's good. There is an inevitable cost to that flexibility, and that
sometimes is going to lead to the wrong action in inappropriate circumstances,
but we still want to maintain that flexibility because of its positive qualities. We
have to accept the consequences, which are the occasional errors; therefore, our
philosophy of redesigning the system should be one that says errors are going
to occur but let's design the system in a way in which they can be tolerated.
This is the philosophy for envr toeinwt system.
In this vein, Earl Wiener has discussed the concept of the electronic cocoon. The
idea here is that a pilot ought to be free to make a lot of different responses,
some of which may be incorrect. The appropriate role of automation would be
to simply monitor the performance envelope of the aircraft, and only intervene
if the errors are serious enough to bring about a serious consequence. The idea
is to have some master computer monitoring the pilot, but allow the pilot a lot
of opportunities to make errors and to correct them before things get bad. Bill
Rouse and his associates have done a lot of work on this concept for the Air
Force, as part of the Pilot's Associate program, designing electronic copilots that
can monitor the pilot's performance and act as a cooperative crew member.
Their concept is that of an intelligent system which can monitor human
performance and infer the intentions of the human control actions. You have a
pilot interacting with a task under intelligent monitoring. The pilot's behavior is
providing information to the monitor. The monitor, in turn, can take a series of
actions in the face of the pilot's behavior, if the monitor detects that the pilot
might be making mistakes. Rather than just simply taking over for the pilot,
Rouse and his colleagues suggest that this intelligent monitor might go through
a hierarchy of guidance. At the very first level, if the intelligent monitor infers
that the pilot is doing something that is amiss, it might do nothing more than
increase vigilance. If there is continued evidence that the pilot's behavior is
inappropriate, the intelligent monitoring system might say some things to the
pilot, like "Are you sure you want to do this? Are you watching your airspeed?"
If the error worsens, the monitoring system might prompt the operator with
some advice like lowering or increasing airspeed, etc. Only under the most
serious error circumstances will the intelligent monitor assume command
automatically and correct the error.
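The hierarchy of guidance described above can be sketched as a simple escalation policy. The function name and the evidence thresholds below are hypothetical, chosen only to illustrate the four levels, and are not taken from the Pilot's Associate program:

```python
def monitor_response(error_evidence):
    """Map the monitor's accumulated evidence of pilot error (0 to 1)
    to an escalating level of intervention."""
    if error_evidence < 0.25:
        return "increase vigilance"   # watch the pilot more closely
    elif error_evidence < 0.50:
        return "query the pilot"      # "Are you sure you want to do this?"
    elif error_evidence < 0.75:
        return "offer advice"         # e.g., suggest an airspeed change
    else:
        return "assume command"       # only for the most serious errors

# Evidence grows as the inappropriate behavior continues unchecked.
for evidence in (0.1, 0.4, 0.6, 0.9):
    print(f"{evidence:.1f} -> {monitor_response(evidence)}")
```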
Error in a Systems Context
somewhere (like the pathogen). All that was needed was for one operator to
"trigger the system, and make these inevitable errors occur. Furthermore, he
argues that there are a large number of potential causes of these catastrophes
within complex systems. Rather than pointing a finger of blame at a particular
operator who commits the final triggering error, Reason argues that the real
remediation should be accomplished by considering a number of underlying
factors that made the disaster a nearly inevitable consequence of a triggering
human error.
One of these factors is the collection of hardware defects related to poor human
factors concerns of design, construction, and location. System goals that are
incompatible with safety also contribute to errors. Very often in industry, system
goals are designed towards production rather than safety. These two goals are
not always totally compatible. Poor operating conditions have a tremendous
impact on the extent to which the goals are or are not compatible with safety.
Inadequate training is another factor. Just checking off a box and saying
somebody has been through the simulator is inadequate. Poor maintenance
procedures is an additional factor that creates conditions for error. The Three
Mile Island disaster was a case where maintenance procedures were sloppily
carried out, and it wasn't clear to the control room personnel on duty what
systems were and were not in operational status. Finally, management attitudes
(or lack of guidance) can lead to violations by operators that will help
propagate unsafe acts. The operators at Chernobyl provided a nice example:
the people at the plant simply did things that they knew they weren't
supposed to do, because the prevailing attitudes suggested it was all right to do so. We
are all committing violations every time we exceed the speed limit. We know we
are going over the speed limit by a few MPH, but we don't have much incentive
not to do so.
Reason's final point is that sometimes even though a system is very well
designed from a human factors point of view, following the sort of prescriptions
we have discussed here, there will still be human errors because of the failures
at all of these other levels. This is a systemwide approach to human error.
Cockpit Automation
by Richard F. Gabriel, McDonnell-Douglas, retired
Introduction
The Federal Aviation Administration (FAA) has a direct and pervasive influence
on aircraft design through its certification process, and on operations through
its design and operation of the Air Traffic Control System (ATC). In spite of the
FAA's broad regulatory administrative role, it is difficult for rules and
regulations to keep pace with rapid technological advances in aircraft design
and operation. It is therefore important that FAA personnel have an
understanding of the impact that advanced technology (automation) may have
on those who operate these systems, so that the benefits of automation can be
realized without unacceptable side effects.
In recent years, increasing levels of automation have shaped and changed the
aviation industry. These effects include:
"*Economic impacts - growth in passenger demand, increase in fuel prices
and other operating costs, increased competition among airlines;
"*Changes in airspace and airiort configuration - capacity limitations, huband-spoke concepts, air traffic control requirements;
Effects on eouivment - increased equipment reliability, increase in aircraft
longevity, aircraft design and performance improvements, increased
automation of flight decks;
"* Effects on operators - reductions in crew size, reduced emphasis by airlines
on training, changes in crew qualifications and availability.
This review will consider the human factors issues of automation from the
operator's standpoint. Although the discussion is relevant to ATC as well as
flight crews, emphasis will be on cockpit applications.
Definition
Automation has been defined as "the incorporation or use of a system in which
many or all of the processes ... are automatically performed by self-operating
machinery [and] electronic devices" (Webster's New World Dictionary, 1970).
Figure 9.1 depicts the progress of automation in aircraft and indicates
automation has been increasing since the origin of heavier-than-air flight.
Automation is not an all-or-nothing proposition. Sheridan (1980) has identified
ten levels of automation, from totally manual (100 percent human controlled),
to systems in which a computer makes and implements a decision if it feels it
should and the human may not even be informed (100 percent computer
controlled). Current systems generally fall between these extremes, but the trend
is to reduce the role of the human and move away from human control even in
decisionmaking. Self-correcting systems are becoming commonplace in newer
aircraft. Table 9.1 presents Sheridan's levels of automation.
[Figure 9.1: Milestones in the development of aircraft automation, from Sir
Hiram Maxim's patent for a gyroscopic stabilizer, R&D leading to the Wright
brothers' patent for a stability augmentation system, augmentation work by
Taplin, and Sperry's flight demonstration of a two-axis coupled gyroscopic
stabilizer, through the Sperry automatic pilot and electronic autopilots with
flight director devices, to flight performance management systems (MD-80,
B-767) and flight envelope protection systems (A-320).]
result of disuse.
Some of these automation concerns are illustrated by the following scenario:
A pilot of average skill is captain of an advanced, highly automated
aircraft. The captain has flown in this type of aircraft for some years and has recently
upgraded to his position. The crew flies in the automatic modes most of
the time. They are making an automated approach and landing, when, at
the middle marker, a major electrical failure causes the aircraft to revert
back to its basic characteristics. The crew has to take over control of the
aircraft, make the correct decisions, and take appropriate actions.
Additional factors may complicate their decisionmaking: night, bad
weather, the start of a bid cycle, fatigue and other plausible and realistic
influences.
The ultimate question for designers, manufacturers, operators, and certifiers is
whether safety will be enhanced by incorporating a specific automated
capability. The answer lies in the crew's ability to interact with the automated
system effectively and to take over in the event of a failure or a situation not
foreseen by the designers.
Table 9.1
The Spectrum of Automation in Decision Making (Sheridan, 1980)

100% HUMAN CONTROL
1. The human considers alternatives, makes the decision, and implements it.
2. The computer offers a complete set of decision alternatives.
3. The computer narrows the alternatives down to a few.
4. The computer suggests one alternative.
5. The computer executes the suggested alternative if the human approves.
6. The computer allows the human a limited time to veto before automatic execution.
7. The computer executes automatically, then necessarily informs the human.
8. The computer informs the human after execution only if asked.
9. The computer informs the human after execution only if it decides to.
10. The computer decides everything and acts autonomously; the human may not be informed.
100% COMPUTER CONTROL
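Sheridan's scale lends itself naturally to a lookup structure. Below is a minimal sketch; the level wording is paraphrased from Sheridan's published scale, and the helper function is hypothetical:

```python
# Sheridan's ten levels, paraphrased, from 100% human control (1)
# to 100% computer control (10).
SHERIDAN_LEVELS = {
    1: "human considers alternatives, decides, and acts",
    2: "computer offers a complete set of alternatives",
    3: "computer narrows the alternatives to a few",
    4: "computer suggests one alternative",
    5: "computer executes the suggestion if the human approves",
    6: "computer allows the human limited time to veto before acting",
    7: "computer acts, then necessarily informs the human",
    8: "computer acts, and informs the human only if asked",
    9: "computer acts, and informs the human only if it decides to",
    10: "computer decides and acts on its own; the human may not be informed",
}

def human_decides(level):
    """Through level 5, the human still makes or approves the decision."""
    return level <= 5

print(SHERIDAN_LEVELS[6])
print(human_decides(5), human_decides(7))
```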
Experience in Nonaviation Environments
Experience gained in the design and operation of automated systems in
nonaviation environments is often relevant for aircraft systems. Process plants
such as power plants and oil refineries, as well as factories and offices, have
adopted various levels of automation.
Nuclear Power Studies
Designers of nuclear power plants have incorporated many automated features
in their control systems to avoid catastrophic human error. This is because they
fear that the human operator may not be able to respond to system emergencies
that occur with manual systems. They believe that automation, because of the
complexity of nuclear power plant design and processes, can solve this problem.
Yet, it has been found that automated safety systems aren't necessarily the
answer. An evaluation of 30,000 nuclear plant incidents revealed that 50
percent occurred through unique combinations of machine and human error
(Woods, 1987).
The Three Mile Island accident is a case in point. The initial blame for this
incident was assigned to humans. Investigators found, however, that the design
of the human interface was greatly deficient. Designers had not considered the
human functions systematically. They had paid little attention to display/control
design or work station layout. The control room was filled with banks of almost
identical controls and displays that made it difficult to identify the appropriate
information source or required response to system problems. For some
functions, the operator could not see the display and the corresponding controls
simultaneously. To compound these problems, training of control room
operators had been inadequate.
After the Three Mile Island accident, the response of managers and designers
was to further divorce the human operator from system control through even
more automation. Extensive programs for redesigning the displays and controls
were initiated. One involved changing the warning system from a tile (e.g.,
legend light) system to a computer-based system. The purpose was to automate
the alarm system and reduce display clutter. The result was disappointing. The
computer-based system wasn't programmed to anticipate all the possible
combinations of events that could occur; the operators lost the ability to
integrate display information by recognizing patterns of lights and thus gain
insight into the fundamental problem.
213
Office Automation
Research in office automation has shown that no system, even a very simple
one, is ever completely defined by designers. One reason is that the system is
not always used for the purpose initially intended. A screwdriver offers a simple
example. It was designed to drive or loosen screws. But it is also used to open
lids of cans, scrape surfaces, clean fingernails, and even as a weapon. Similarly,
a wire coat hanger may be used to help open a locked car.
The same variability in application is found with automated systems. Inventory
systems may be used differently as business grows, shrinks, and/or conditions
change. Accounting systems may have to be altered as tax laws change. Even
office electronic mail systems may be used variably as security needs or capacity
requirements change (Card, 1987).
According to articles in the public press, many of the increases in productivity
anticipated from office automation have not been realized. Moreover, the costs
in personal satisfaction and well-being have been high. Worker motivation has
suffered as jobs have been changed and depersonalized.
Table 9.2 offers some conclusions various authorities have reached after
studying automation in arenas other than aviation.
Table 9.2
Conclusions Based on Research in Nonaviation Automation
"* Human
is degraded when automated systems perform very well
(Rouse, performance
1977).
"* In situations that require strict vigilance, information sampling and transfer is
done better by humans than by automated systems (Crossman, Cooke, and
Beishon, 1974).
"* Automated systems usually solve simple problems but fall down in more
complex cases (Roth, Bennett, and Woods, 1987).
"* We need to complement the design for prevention of trouble with the design
for management of trouble (Roth, Bennett, and Woods, 1987).
"* Computer systems should be designed as a tool, not as a replacement for the
human (Roth, Bennett, and Woods, 1987).
Accident Data
Errors on the part of the flight crew have historically been cited as a primary
cause in most accidents. Figure 9.2 presents data tabulated by Boeing
Commercial Aircraft Company and cited by Nagel (Nagel, 1989). It shows that
flight crews have been identified as a primary cause for accidents about 65
percent of the time. The next largest primary cause--airframe, power plant, or
aircraft system failure--accounts for less than 20 percent of accidents.
As shown in Figure 9.2, the flight crew has remained a primary cause of
accidents at about the same frequency over the years since 1957. The reason for
the overall improvement in system safety is probably not a result of any single
factor. Reliability of equipment, better knowledge of weather, and almost
universal availability of instrument landing systems have undoubtedly
contributed. The largest gain in safety of air travel was made during the
1977-1981 period. (This was before the introduction of third-generation jets
that dramatically increased automation in the cockpit.) The following period
[Figure 9.2: Primary cause factors for commercial jet accidents, 1959-1986 and
for the last 10 years (1977-1986): flight crew (441 of 674 known-cause
accidents), airplane (airframe, aircraft systems, powerplant), weather,
airport/ATC, maintenance, and miscellaneous/other; totals including accidents
with unknown causes or awaiting reports are also shown. Source: Boeing
Statistical Summary, cited in Nagel (1989).]
(during which the MD-80, 757, and 767 were introduced) suggests a slight
reduction in safety, but this change may not be statistically significant. Even
though flight crew error rate as a cause of accidents has remained constant,
flight crew performance probably has improved through better training (use of
simulators, for example), better human factors engineering, and other
performance enhancements.
The trend in commercial aviation has been toward dramatic improvements in
safety. Table 9.3 shows accident trends in terms of the probability that an
individual will be killed due to an accident on any nonstop flight in the United
States in 5-year increments since 1957, the date jet service was initiated. The
data indicate that a traveller is approximately 10 times safer now than in the
1950s.
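The roughly tenfold improvement can be checked directly from the per-flight figures in Table 9.3, and those figures can be compounded to estimate risk over many flights. A quick sketch follows; the 10,000-flight career is an assumed illustration, not a figure from the report:

```python
# Per-flight probability of a passenger being killed, from Table 9.3.
risk_1957_61 = 1 / 1.0e6   # 1 in 1.0 million
risk_1982_86 = 1 / 10.2e6  # 1 in 10.2 million

improvement = risk_1957_61 / risk_1982_86
print(f"Relative improvement: {improvement:.1f}x")  # ~10.2x

# Independent per-flight risks compound: the probability of at least
# one fatal outcome over n flights is 1 - (1 - p) ** n.
n_flights = 10_000
career_risk = 1 - (1 - risk_1982_86) ** n_flights
print(f"Risk over {n_flights:,} flights: about 1 in {1 / career_risk:,.0f}")
```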
Table 9.3
Probability of an Individual Being Killed on a Non-Stop U.S. Domestic
Trunkline Flight

PERIOD     RISK LEVEL
1957-61    1 in 1.0 million
1962-66    1 in 1.1 million
1967-71    1 in 2.1 million
1972-76    1 in 2.6 million
1977-81    1 in 11.0 million
1982-86    1 in 10.2 million
Incident Data
An incident has been described as an accident that didn't happen--an event that
could have resulted in an accident but did not because the crew recovered
(avoidance maneuver) or other factors intervened. Since incidents occur more
frequently than accidents, they provide sufficient data to identify trends that
may allow detection of unsafe conditions and allow corrective measures to be
initiated before accidents occur.
The Aviation Safety Reporting System (ASRS) was established by NASA to
provide an incident database. The ASRS database includes data from all
segments of aviation, including commercial aviation, general aviation, and air
traffic control. It is interesting that ASRS incident data presented in Figure 9.3
mirror almost exactly the proportion of human error depicted in Figure 9.2.
A NASA study on classification and reduction of pilot error used the ASRS
database to identify problems associated with Control-Display Units (CDUs) in
cockpits (Rogers, Logan, and Boley, 1989). The CDU is a common feature of
automated systems and is a common source of crew errors. It allows the
operator to program and observe the state of automated equipment. In modern
cockpits, it generally consists of a cathode ray tube (CRT) and a related
keyboard.
Of the approximately 29,000 reports in the ASRS database at the time of the
NASA study, 309 involved CDUs. Table 9.4 provides some specific problems
found with CDUs. This analysis of CDUs shows that both human and machine
error occurred, with human error predominant. Clearly, humans make errors
even in automated systems.
[Figure 9.3: Percentage of ASRS-reported incidents attributed to human error
(approximately 70 percent), mirroring the proportion in Figure 9.2. Source:
Rogers, Logan, and Boley, 1989.]

[Table 9.4: Specific problems reported with CDUs (Rogers, Logan, and Boley,
1989). The body of this table is largely illegible in the source; recoverable
entries include insufficient time to program, crossing restrictions during
descent, holding, and an incorrectly programmed runway change.]
One potential weakness of the ASRS is that reports are voluntary. Not everyone
experiencing an unsafe condition reports it. Aircraft equipment malfunctions
probably occur more frequently than are reported to the ASRS, particularly
when they do not lead to incidents or near accidents.
However, the FAA requires significant equipment problems to be reported as
Service Difficulty Reports (SDRs). An analysis of SDRs for DC-9/MD-80 aircraft
during one time period is provided in Table 9.5. This table reveals that of the
445 events included, 201 required crew intervention. Of these, 160 required an
unscheduled landing or aborted takeoff. Only four SDRs involved cockpit crew
error. In focusing on accidents, it is easy to forget just how significant the
crew's role is in averting accidents caused by equipment malfunction.
Data from the Douglas Aircraft Accident/Incident Database support this
conclusion. Table 9.5 shows that of the 736 reports in the McDonnell-Douglas
database, 65 percent are related to equipment malfunction. Only 12 percent are
related to crew error.
Pilot Opinion
The opinion of the flight crews operating the aircraft provides an important
source of information on cockpit design. Although crew opinion is subject to
many sources of bias and is not by itself adequate for design decisions, it is a
rich source for hypotheses about design advantages, disadvantages, and areas
needing intensive study.
Table 9.5
Analysis of DC-9/MD-80 Service Difficulty Reports

[The body of this table is garbled in the source. Per the accompanying text,
of 445 events, 201 required crew intervention; 160 of these required an
unscheduled landing or aborted takeoff, and only four involved cockpit crew
error.]
This brief review suggests that many of the same difficulties encountered in
non-aviation environments are experienced in automated cockpits.
Table 9.6
[Table title and opening entries not legible in this copy; legible fragments include "requirements" and "Reduced costs." The legible entries follow.]
" By taking away the eas.parts of the task automation can make the
operator's task more difficult.
"* The classic aim of automation is to replace human manual control, planning
and problem solving by automated devices. But even highly automated
systems need humans for supervision, adjustment, maintenance, expansion,
improvement, etc.
"* The more advanced a system is, the more crucial may be the contribution of
the human operator.
"* Designers may view the human operator as unreliable and inefficient, to be
eliminated if possible. There are two (2) ironies in this: design error can be
a major source of operating problems; and designers seeking to eliminate the
human operator still leave7hinVher to do the tasks which the designers can't
automate.
o Skills and knowledge are maintained only through frequency of use. (Consider
any course which you passed and haven't thought about since.) Knowledge
about the abnormal condition develops only through use and feedback. Yet the
operator is expected to cope with such situations when the reliability of the
automated system is the justification for acquisition.
"* Current automated systems work because they are being monitored and
"* A paradox is that with some automated systems the human operator is given
a task which is only possible for someone who has on-line control.
"* If a human is not involved in on-line control, he does not have detailed
may be more critical because of the extent of the control exerted by the
automation. For example, multimode displays and keyboards may require
more disciplined cross checking and special procedures to assure the desired
mode is selected before control actions are taken.
On the other hand, many errors attributed to humans are facilitated by
poorly designed crew interfaces such as difficult-to-use displays and controls.
In fact, many of the early problems that supported the development of
human factors engineering as a separate design discipline were knob and
dial problems. Design-induced error became recognized as a real contributor
to human error. Automation does not completely eliminate this type of
error, and, in some cases, may facilitate it.
Display design is one of the areas where automation may contribute to
human error. The common sense approach so often used in the past by
display designers will certainly not be adequate to evaluate the varieties of
new format designs possible with electronic presentation. What is common
sense to a designer sitting at his desk may not be common sense to a pilot
flying in a crisis environment. For example, electronic displays introduced to
date have presented information in formats similar to those available in
conventional cockpits. These may need to be augmented by displays more
suitable for the specific monitoring function required.
Developing displays with formats that facilitate quick, accurate
understanding and aid problemsolving and decisionmaking can greatly
enhance crew performance and acceptance of automated systems. Designing
control systems that allow the crew to accurately insert information and/or
control the aircraft is essential for reducing error in programming systems
for normal operation as well as for making effective responses in an
emergency or abnormal situation.
Design Practices
The information provided to this point indicates that although automation is
advancing rapidly, it has not always lived up to its promises. One reason for
this lack of complete success arises from the design processes followed. This
section will consider what might be considered a typical engineering design
process. While specific design teams differ in many respects, and both
personnel and practices constantly change, designers historically have had
The process described above has a number of weaknesses. One is the ability of
design engineers to fully understand and weigh all the factors that influence
their designs (See Table 9.8). Research into actual engineering practices has
revealed a number of areas where design teams depart from the ideal (Meister,
1987). Designers often deviate from a deliberate, logical process. Behavioral
data, even if available to a designer, may be ignored. Managers may reject
designers' recommendations if they believe these make no difference in
traditional aircraft performance parameters--reliability, cost, or development
time.
Table 9.8
Cognitive Factors Influencing Design Elements (Meister, 1987)
customers wanted it. There has also been an interest in reducing crew workload
through automation of various flight functions. The latter area received
particular emphasis during the development of the MD-80 and B-757/67 designs
in order to justify a two-person crew. Recently, as cockpit automation has
developed and its impact on safety has generated concern, more attention has
been devoted to identifying a philosophy of automation. Boeing has published a
paper illustrating its philosophy for some recent aircraft (Fadden and Weener,
1984).
A primary approach to reducing flight deck workload has been to simplify
system design to make the aircraft easier to operate. As an example, the number
of fuel tanks has been reduced to simplify fuel transfer procedures. System
redundancy has been the next most common approach to increasing flight
safety. Automation has been incorporated only if design goals cannot be
achieved otherwise. Table 9.9 provides reasons commonly used to justify
automation in Boeing's view.
Figure 9.4 illustrates Boeing's process for determining the level of crew
involvement in flight deck operations. A number of automation philosophies
have been proposed for making such determinations. Table 9.10 lists some of
them and their limitations.
Although none of these philosophies seems to be completely adequate at
present, there appears to be growing support for the concept of human
centered automation, as evidenced by the conclusions of the NASA conference
attendees cited later in this discussion. It should be apparent that if an
Table 9.9
Boeing's Automation Philosophy
(Reasons to Automate)
[Table 9.9 entries and the Figure 9.4 flowchart are not legible in this copy. Legible flowchart fragments include "Analyze Mission," "Define System Functions," "Design," "Evaluate," and "Concept Test."]
Some of the task changes identified were a decreased need for computations by
flight crews, reduced opportunity to practice motor skills, less active systems
monitoring, and more evenly balanced workload between the pilot flying (PF)
and pilot not flying (PNF). In an advanced cockpit, the PF has more of a
managerial function than previously, while the PNF does more work but less
active systems monitoring.
Table 9.10
Design Philosophies

PHILOSOPHY                    LIMITATION
Human-centered automation     [remaining rows and limitations not legible in this copy]
Table 9.12 presents participants' conclusions as to how the crew role has been altered in
recent aircraft design.
Table 9.11
Functions of the Human Operator

PERCEPTUAL PROCESSES
  Specific behaviors: DETECTS, INSPECTS, OBSERVES, READS, RECEIVES, SCANS,
  SURVEYS; DISCRIMINATES, IDENTIFIES, LOCATES

MEDIATIONAL PROCESSES
  INFORMATION PROCESSING: CATEGORIZES, CALCULATES, CODES, COMPUTES,
  INTERPOLATES, ITEMIZES, TABULATES, TRANSLATES
  PROBLEMSOLVING AND DECISIONMAKING: ANALYZES, CALCULATES, CHOOSES,
  COMPARES, COMPUTES, ESTIMATES, PLANS

COMMUNICATION PROCESSES
  ADVISES, ANSWERS, COMMUNICATES, DIRECTS, INDICATES, INFORMS, INSTRUCTS,
  REQUESTS, TRANSMITS

MOTOR PROCESSES
  COMPLEX-CONTINUOUS: ADJUSTS, ALIGNS, REGULATES, SYNCHRONIZES, TRACKS
  SIMPLE-DISCRETE: ACTIVATES, CLOSES, CONNECTS, DISCONNECTS, JOINS, MOVES,
  PRESSES, SETS
Human Factors
Human factors may be defined as the application of knowledge about human
characteristics to the design, operation, and maintenance of systems. This
discipline gained recognition during World War II when the military recognized
that performance and safety could be enhanced by improving the harmony
between machine and human characteristics.
Initial human factors interest was largely in knobs and dials--in improving
displays, such as altimeters, and controls such as levers, knobs, and cranks.
Fundamental principles such as control-display compatibility, and color and
position coding, are products of this work. Many of the early contributors to
human factors were experimental psychologists drawn from academia and
employed by the armed forces to study specific problems. At the end of the war,
most of these professionals returned to civilian status. A few remained in
government laboratories.
Table 9.12
Crew Role

                               HISTORICALLY                      [second column heading not legible]
PRIMARY RESPONSIBILITY         SAFETY                            SAME
PRIMARY FUNCTIONS              AVIATE                            SAME
                               NAVIGATE                          SAME
                               COMMUNICATE                       SAME
                               OPERATE                           SAME
PRIMARY TASK CHARACTERISTICS   DIRECT CONTROL                    INDIRECT CONTROL
                               MANAGER, OPERATOR                 MANAGER, MONITOR
                               DIRECT INVOLVEMENT CONTINUOUSLY   [not legible]
                               MULTIPLE SOURCES OF INFORMATION   INFORMATION GENERALLY AVAILABLE
                               PERCEPTUAL/PSYCHOMOTOR SKILLS USED FREQUENTLY   [not legible]
As system complexity increased, the U.S. Air Force recognized the need for more
emphasis in human factors and mandated contractors to employ specialists in
this area. Few schools offered courses in the discipline and companies found it
difficult to employ properly qualified people. There was uncertainty also
regarding the role and organizational placement of human factors specialists.
Often they became internal consultants who were used to make
recommendations or perform studies to solve problems after these were
identified.
Because solutions to these problems called for consideration of many issues and
an adequate database was not available, the human factors specialists
"* Judgment that the system will benefit more from an additional
engineer from one of the traditional engineering disciplines than from
a human factors specialist.
Figure 9.5 [figure not legible in this copy: performance as a function of workload, with an upper limit indicated]
Figure 9.6 Beneficial automation [figure not fully legible in this copy: workload plotted across the mission scenario (takeoff, climbout, cruise, descent, approach, landing), with annotations to add work in low-workload phases, subtract work in high-workload phases, and reduce work to provide a safety margin]
The Air Transport Association, for example, has not only established a standing task
force to identify human factors issues and promote their resolution, it has
encouraged the elevation of human factors to a core discipline in aircraft design,
commensurate with such engineering disciplines as aerodynamics.
SoaW" Sciences and do Need for Tesing
One of the reservations many organizations have about human factors is that
they are supposedly based on "soft" sciences. This perception is not accurate.
Research into human characteristics has generated a great deal of "hard"
information. Sensory processes are reasonably well understood, and a great deal
is known about perception, learning, memory, motivation and emotion. Useful
data are also available regarding decisionmaking.
It is true, however, that in spite of the amount of data available, few theories
are available to integrate these data into useful human factors applications.
Adding to the difficulty is the fact that many interacting variables may influence
a person's performance in unpredictable ways at any specific time. These include:
o Arousal
o Influence of learning/practice on perception (anticipation)
o Stress
o Warm-up decrement
o Inhibition (Hernandez-Peon)
o Attention
o Isolation
o Transfer of training
o Vigilance
o Overload
o Biases in decisionmaking
o Sensation and perception
Offsetting this concern is the fact that many "hard" disciplines such as aerodynamics and meteorology
also have similar problems. In all of these disciplines, there is a need for
extensive testing to determine the efficacy of a particular design or model.
Testing is heavily emphasized in most aircraft design. Aircraft structure is
stressed to destruction at a cost of many millions of dollars to demonstrate that
design requirements are met. Millions of dollars a week are spent on wind tunnel testing during some phases of design. In contrast, simulator tests of
cockpit design have not been as frequently or effectively used as other modes of
aircraft design testing. This seems inconsistent in view of the much greater
confidence in the "hard" data of the more traditional disciplines and the
identification of human error as a major contributor to accidents.
may lack an understanding of the line pilot's environment, such as flying the
same aircraft for years, flying many legs late at night, or flying long
intercontinental flights.
Ideally, design decisions should be based on criteria related to overall system
performance, but designers have generally deemed human performance difficult
to assess. Part of the difficulty may arise from the designers' relatively poor
understanding of human factors testing. It seems apparent that more attention
should be devoted to valid, reliable human performance measures.
In addition, if critical human tasks involve reprogramming and/or taking over
for automated systems in the event of a significant failure, it should always be
demonstrated that representative crews can perform adequately under
representative (including worst-case) scenarios. It is also desirable that human
performance be tested near the limits of its capabilities to assure adequate
safety margins.
Conclusions
This review far from exhausts the relevant information regarding cockpit
automation. Training issues have not been addressed at all, for example. Many
years of further study and of industry experience will be required for designers
to be fully confident in how to design automated systems that are compatible
with human characteristics. Several preliminary conclusions seem appropriate,
however:
"* Automation will continue to increase.
"* Successful automation depends on proper integration of human
capabilities.
"* The discipline of human factors has a store of knowledge and methods
which can be useful to good systems design.
"o Review, critique, assess, and enrich manufacturer's cockpit development plan
"o Participate in selected development activities to assure adequacy
"o Review and assess cockpit relevant reports of tests, analyses, etc, submitted
by manufacturer
Chapter 10
Display Design
by Delmar M. Fadden, Chief Engineer--Flight Deck, Boeing Commercial Airplane
Group
The rapid and reliable display of visual information in the flight deck requires a
thorough understanding of the functions being supported, thoughtful
application of available human performance knowledge, and careful selection of
the appropriate display media. This chapter explores some display characteristics
of special relevance to achieving highly effective human performance in flight
situations.
The measure of a truly effective display is how well it supports consistent
accomplishment of the tasks assigned to the person who will be using it.
Display design is as concerned with task design as it is with presentation
symbology and display devices. The process of identifying the full range of tasks
and the associated information requirements for a modern, highly integrated
display can be formidable indeed.
The core of the design process usually involves resolving contentions between
system functional requirements and operator capabilities and limitations.
Through actual design examples, this chapter illustrates the issues associated
with balancing system and human needs. The examples are based on display
development work at Boeing Commercial Airplane Group in support of the 757,
767, and 747-400 airplanes.
The display development process (outlined in Figure 10.1) provides a
useful basis for discussing the fundamental elements of display design. Some
steps require considerably more effort than others, depending largely on the
scope and phase of the project. Some of the steps can be accomplished using
traditional engineering tools and methods; others are better suited to techniques
more commonly associated with the sciences of psychology, operations research,
and human factors. Many successful displays have been developed without
explicit attention to this process, though their development histories often show
evolutionary improvements that can be mapped to these steps.
Requirements
Displays exist to provide information to a human being who is asked to achieve
some objective. Accurately recognizing that objective in terms of required
outcomes is crucial to successful display design. Once the top level objectives
are identified, the focus shifts to determination of the detailed tasks necessary
to accomplish the objective and the information requirements that support those
tasks.
There is an understandable tendency to skip the formal definition of the
detailed tasks and associated information requirements and start the design by
developing display formats. Working on display formatting can be a useful aid
in initiating an understanding of the information requirements. However, the
understanding gained by first developing information requirements from the
related tasks is virtually always more accurate and complete. There are two
significant side effects that can follow a design which begins with display
formatting selection. The format selected likely will be based on the similarity
of information content to that of other displays rather than any actual linkage
to the tasks this specific display supports. The conceptualizations of the
information required and its organization will be well established before the full
range of task possibilities has been explored. Together these effects can result in
excessive display complexity, more operator errors, and less efficient task
performance by the pilot.
Figure 10.1 Display development process. [Flowchart not fully legible in this copy. The recoverable steps are: recognize a new or modified task, function, or technology; Requirements (is the task properly defined? if not, perform a detailed task analysis; define information requirements; is the information content adequately defined?); Design (does required symbology exist for the task? if not, develop candidate symbols and formats; is suitable display technology available? is it legible?); Evaluation (does the symbology support the primary task? are the display and symbology compatible with other flight tasks and symbology?); Operation (is the required task performance achieved?).]
In most circumstances it is best to trace the task from the top level mission
objectives. While this is tedious work, it provides essential insight into the
interrelationships between tasks and provides a basis for meaningful discussions
with those who will use the system. The top-down task analysis also yields a
complete description of the steps believed necessary to execute each particular
task. To some, it may seem premature to prepare a task analysis before the
hardware is designed. However, the system designer knows conceptually what
has to take place. Specific switches and controls are not yet known, neither are
the overhead tasks which will be necessary to operate the system, so the initial
analysis must be done at a top level. As the design develops, the analysis can
be expanded to a more detailed level. Proceeding in this fashion provides a
good check on the correctness and efficiency of the specific design. By
comparing the detailed analysis with the initial top level analysis, the designer
determines how much overhead has been added and checks to see that the
functional design remains consistent with the stated objectives.
The task analysis should identify at least the following information:
o the objective of the task, stated in measurable terms;
o the timing for the task (any task initiation dependencies should be
defined, along with constraints on execution or completion time);
o expectations about task performance, including accuracy, consistency,
and completeness;
o possible errors and related consequences (be sure to consider errors of
omission along with errors of commission);
o task dependencies, other than those associated with timing already
identified (dependencies might include other tasks, specific events,
combinations of flight conditions, etc.);
o criticality of the task objective in relationship to the safety and
efficiency of the flight.
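The checklist above lends itself to a simple record structure that a design team could use to keep task-analysis entries complete and auditable. The sketch below is illustrative only: the field names, the example task, and the completeness check are assumptions for demonstration, not artifacts from the report.

```python
from dataclasses import dataclass, field

@dataclass
class TaskAnalysisEntry:
    """One task from a top-down task analysis; fields mirror the checklist above."""
    objective: str = ""            # stated in measurable terms
    timing: str = ""               # initiation dependencies, execution/completion constraints
    performance: str = ""          # expected accuracy, consistency, completeness
    errors: list = field(default_factory=list)        # possible errors (omission and commission)
    dependencies: list = field(default_factory=list)  # other tasks, events, flight conditions
    criticality: str = ""          # relationship to flight safety and efficiency

def incomplete_fields(entry):
    """Return the names of checklist items left empty, so gaps are caught early."""
    checks = {
        "objective": entry.objective,
        "timing": entry.timing,
        "performance": entry.performance,
        "errors": entry.errors,
        "criticality": entry.criticality,
    }
    return [name for name, value in checks.items() if not value]

# A hypothetical entry, deliberately left partially filled in:
entry = TaskAnalysisEntry(
    objective="Capture localizer within 1/2 dot before 1,000 ft AGL",
    timing="Initiated on approach clearance; complete before final approach fix",
    performance="Tracking within 1/2 dot, no more than one correction reversal",
)
print(incomplete_fields(entry))  # the error and criticality items are still undocumented
```

Keeping the analysis in a checkable form makes it straightforward to compare the detailed analysis against the initial top-level one, as the text recommends.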
Having completed a detailed task analysis, the tasks can be linked with the
information necessary to accomplish each task. This would normally include:
"o definition of information requirements
"o the accuracy and range needed
o the context within which that information will be used
"o the necessary dynamic response
"o any special relationships with other events or information
At this point, it is useful to examine similar tasks and the information necessary
to support them. Task similarities provide valuable insight into the range of
human performance that can be expected. Where tasks are new or involve a
change in the required precision or dynamics, the designer will have to turn to
rapid prototyping, part task simulation, experimental tests, or other human
factors testing to identify and quantify the specific information requirements.
Tasks involving continuous dynamic control are further complicated by the
complex interaction between the dynamics of the control device, the vehicle
dynamics, the dynamics of the displayed information, and sometimes the
dynamics of the pilots' response. In difficult cases, this step will be iterated
many times in a series of progressive refinements until satisfactory performance
is achieved. Often these iterations are accomplished in conjunction with
iterations of the previously discussed task analysis and the symbology
development step that follows.
Not all information requirements need to be satisfied through on-board displays.
There are various other sources for required information that can be just as
effective. One of these sources is the knowledge that the pilot carries in his or
her mind through previous experience or training. Also, information can be
carried on board with the pilot or the pilot can derive it from other information
available on the flight deck. Information from an alternate source may be easier
for the pilot to integrate with the task than if it were contained in a flight deck
display. Taking the time to examine alternate sources for required information
can simplify display design considerably and aid the pilot by simplifying access
to the required information.
Design
Once the information requirements for a display have been defined, the next
step is to determine how to present the information. Symbology selection
determines how specific information elements will be represented within the
display. (In this context, symbology encompasses any form of character, graphic,
or textual entity.) By contrast, display format selection determines the
conceptual framework within which the information will be presented. The two
selections are closely related. For highly integrated displays, the selection of
formatting will be heavily influenced by the top level tasks the display supports,
while the symbology selection often will be guided by specific requirements of
the detailed tasks. The necessity for joint and iterative refinement of symbology
and formatting frequently increases as display complexity increases.
It is standard practice to pay particular attention to how information has been
represented and related to tasks in similar successful displays. Building on past
successes has numerous advantages. Training can be simplified, if the pilot is
familiar with a significant portion of the display. The risks of introducing a new
display can be reduced, if the human performance expectations are based on
operational use of a similar display. These benefits are often perceived to be of
sufficient value as to preclude serious consideration of alternative symbology
and formats. However, examine the underlying tasks carefully, since subtle
differences in the current tasks may require that different information be
portrayed or that formatting be adjusted to highlight different relationships.
Changes in the technology used for display can force a change in the selection
of symbology even when the task and information requirements remain the
same. This would be the case when the change in technology alters important
characteristics used in creating symbology. For example, line widths that can be
presented using practical CRT technology are considerably thicker than those
which can be produced using print technology. This changes the amount of
detail that can be presented successfully in a given area. In effect, print media
have a much greater upper limit for information density when compared with a
CRT display. Another difference concerns the manner in which displays generate
brightness. Since CRTs emit light, the overall brightness of a CRT display will
be a direct function of the information content and a reverse function of the
ambient light. Reflective displays, on the other hand, change brightness as a
direct function of ambient light with a much smaller contribution based on the
information content.
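The emissive-versus-reflective behavior described above can be sketched with a deliberately idealized contrast model. The reflectance and luminance figures below are illustrative assumptions, not values from the report:

```python
import math

def emissive_contrast(l_emitted, ambient_lux, reflectance=0.05):
    """Contrast ratio of an emissive (CRT-like) display. Ambient light
    reflected off the faceplate adds equally to 'on' and 'off' areas,
    so contrast falls as ambient illuminance rises."""
    l_reflected = reflectance * ambient_lux / math.pi  # reflected luminance, cd/m^2
    return (l_emitted + l_reflected) / l_reflected

def reflective_contrast(r_symbol, r_background):
    """Contrast ratio of a purely reflective display. Both areas scale
    with ambient light, so the ratio is set by the reflectances alone."""
    return r_background / r_symbol

# An emissive display washes out as ambient light grows...
dim_cockpit = emissive_contrast(200, 100)        # night flight deck
bright_cockpit = emissive_contrast(200, 100_000) # direct sunlight
# ...while a reflective display keeps the same ratio in any ambient light.
paper_like = reflective_contrast(0.05, 0.80)
print(round(dim_cockpit), round(bright_cockpit, 2), paper_like)
```

The model captures the text's point: the CRT's effective contrast is a direct function of emitted (information) luminance and an inverse function of ambient light, while the reflective display's contrast is independent of ambient level.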
When technology changes are involved, it should not be assumed that
symbology that has been successful in the past will carry over equally
successfully to the new display. Each display technology has unique
characteristics or capabilities that can be exploited to enhance the effectiveness
of information transfer. The common ground for assessing the impact of any
limitations and the value of any enhancements is the task performance
achievable. Objective evaluation of these issues has a profound impact on the
decision about which technology to use.
If there isn't an existing presentation format for the task, new symbols and
formats must be created. Simplicity, quick recognition, and directness are
characteristics of proven value in effective symbology. Regardless of how the
symbol is conceived, there needs to be an appropriate performance measure
(agreed to in advance) to determine how well the symbol performs its job. User
preference is a significant factor in the development of symbology. If the users
don't like a symbol, there is little to be gained by continuing its use. However,
just because the users like a symbol does not mean that they can use it
effectively. The only way to know that a symbol really works is to have the
pilot use it and to measure the resulting performance.
As in all human performance testing, the test engineer is faced with the
challenge of obtaining an appropriate performance measurement yardstick. In
this case, it comes from the detailed task analysis. How much tracking accuracy
does the pilot have to achieve? What probability of error can be tolerated?
How quickly do decisions have to be made? These questions can be quantified
based on the pilot's top level task and the details of the task analysis.
Finally, designers have to look at factors of legibility, so that the displayed
information can be seen in the operating environment. Legibility is a complex
issue in a modern airplane. Several factors contribute to the potential for less
than optimal viewing: the geometrical requirements for the aerodynamic shape
of the flight deck, external environmental influences, and the large vision
variability between pilots. Vision is one of the more variable of human
capabilities. It is not unusual for otherwise similar pilots to have quite different
visual capability. Corrective lenses can reduce the effects of individual acuity
differences; however, accommodation time, color perception, and critical flicker
fusion frequency remain highly variable individual characteristics. The pilot's
external environment varies from virtually pitch black to extremely bright
sunlight. The distance between the pilot's eyes and the display is generally
greater than a person would choose to read a book or a newspaper. Accordingly,
the size of text and graphics must be increased to compensate. The pilot and
the displays vibrate at different rates when the airplane is in turbulence. The
resulting relative motion can severely hamper readability, particularly the
readability of small symbols or fine detail. If all or a portion of the information
must be read in turbulence, both the design and the legibility testing must take
that into account.
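The size compensation for viewing distance mentioned above is straightforward geometry: a symbol must be tall enough to subtend the required visual angle at the design eye position. As a sketch (the 20-arcminute character size and the specific viewing distances are illustrative assumptions, not requirements from the report):

```python
import math

def symbol_height_mm(viewing_distance_mm, visual_angle_arcmin):
    """Height a symbol must have to subtend a given visual angle
    at a given viewing distance: h = 2 * d * tan(theta / 2)."""
    theta = math.radians(visual_angle_arcmin / 60.0)  # arcminutes -> radians
    return 2 * viewing_distance_mm * math.tan(theta / 2)

# At a flight-deck viewing distance of roughly 30 in (762 mm), a character
# subtending 20 arcminutes must be about 4.4 mm tall; at a book-reading
# distance of about 400 mm, the same visual angle needs only about 2.3 mm.
print(round(symbol_height_mm(762, 20), 1), round(symbol_height_mm(400, 20), 1))
```

Legibility testing would still have to cover the other factors the text lists (vibration, ambient light extremes, pilot vision variability), since visual angle alone does not guarantee readability.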
Evaluation
The first part of the evaluation cycle determines whether the primary task
performance defined in the early requirements phase has been achieved. As with
the early development work, it is having clearly identified and measurable
performance criteria that makes efficient testing possible. Once it has been
determined that the expected performance can be achieved for the intended
task, it is important to determine that the performance of other tasks has not
been degraded. This second portion of the evaluation process is generally more
difficult.
Knowledge of the various mechanisms that have contributed to performance
degradation in the past is a good place to start in developing an evaluation
strategy. Typical conflict mechanisms include the following:
"o Apparent symbol motion caused by actual motion of nearby symbols.
"o Poor recognition of a symbol or alphanumeric caused by excessive
dominance of an unrelated nearby symbol. Such dominance may be
due to relative size, color, brightness, or shape differences.
"o Symbol uses or format interpretations that are inconsistent with pilot
expectations. The pilot's expectation derive from many sources
including: other associated tasks or displays, his mental
conceptualization of the situation, cultural influences, training, or
previous experiences.
"o Similar symbols that support different tasks but can be confused. This
problem is particularly difficult to identify if the information is
identical or highly similar but the task or task performance level is
subtly different.
Integrated displays present a great deal of information, and have many tasks
associated with them. Therefore, if a new task is being added to an already
complex display, it is important to confirm that the required level of
performance for previous tasks can still be achieved. Once task performance has
been confirmed for all tasks associated with the integrated display, the check
should be expanded to examine all applicable task-display combinations on the
flight deck.
Operation
The final phase in the display development process is operational follow-up.
Comments about problems or concerns are readily available from both
certification and airline personnel. Over the life of a typical display system, it is
not unusual to find that some of the tasks for which the display was originally
designed get redefined in subtle ways. This may be due to changes in the
operating environment, changes in the skill or knowledge base of the pilots
using the display, or it may be the result of refinement of a partially understood
task. In any case it is important that user comments be recognized and
evaluated against the design intent. The quality of future decisions about use of
the display, associated training, or operational enhancements depends on
accurate understanding of the pilots' tasks and how they are supported by
displayed information.
General Design Issues
Opportunities for Standardization
Although the information requirements were indeed the same for the engine-related tasks on the two airplanes, the task
execution strategies the pilots preferred were distinctly different.
For the four-engine 747-400, the pilot monitored for an engine anomaly by
comparing the same parameter on all four engines and focusing on the engine
whose parameter was inconsistent with the other three. For the twin-engine
767, the strategy involved comparing the parameters for each engine with the
pilot's expectations and his knowledge of past performance. In this case, the
pilot was concerned with relating the different parameters for a single engine.
Cross comparisons for the twin-engine airplane would be inconclusive for many
failure conditions.
Understanding this difference in task execution strategy provides a good basis
for understanding why there was such a clear difference in the display format
selection for the two airplanes. Where task differences do exist, the issue of
standardization can be reduced to comparing the cost saving which might result
from standard display hardware and software with the cost of the associated
degradation in performance and the added compensatory training that would be
necessary.
Flight functions that are common across many airplane types come under
significant market forces that, over time, promote de facto standardization. This
tends to apply to functions that are well known and quite stable. As would be
expected, the bulk of industry attention is focused on functions that are new,
incompletely understood, and rapidly changing. It should be possible to achieve
a reasonably high level of display standardization provided that detailed tasks
can be standardized. The crucial factor is whether the tasks are truly common.
That is a difficult question to answer in a business climate involving intense
competition and rapid technological change both on the flight deck and in the
ATC environment. In many ways, it is a tribute to the entire industry that the degree
of standardization that exists now has been achieved at all.
An example illustrates the subtlety of the pilot's use of dynamic symbology. The
primary instrument arrangement for the Boeing 767 has the map display
directly below the primary attitude display. The localizer deviation display is at
the bottom of the ADI. Since the track scale is at the top of the map display,
there is no need for repeating any heading information on the ADI. The Boeing
747-400 has larger CRT displays in a side-by-side arrangement. In this case, the
track scale is separated from the localizer deviation. Since this altered the "basic
T" instrument arrangement, it was decided to place a heading scale at the
bottom of the primary flight display (PFD). The initial format for this
information was selected to emphasize airplane heading, thus maintaining a
strong link with past HSI displays. The scale at the top of the navigation
display (ND) is track oriented, as it is on most 767 airplanes. The two different
orientations were believed to match a difference between the localizer capture
and runway alignment tasks. In separate applications, both of these orientations
had been in widespread use for an extended period of time, each with highly
successful operational histories. During initial 747-400 flight testing, it was
found that a significant number of pilots were having difficulty with the
transition between instrument and visual conditions during initial departure and
the final phase of ILS approaches. Having the two scales in close proximity and
with a different orientation was suspected as contributing to the problem, since
the basic information contents of the displays on the 767 and the 747-400 are
identical. Identification of the specific sources of the performance difficulty was
done by a team led by John Wiedemann. The steps they accomplished in
resolving the difficulty provide an interesting perspective on the complexity of
designing highly integrated displays.
Figure 10.2 shows the original 747-400 heading and track symbology on the
primary flight display (left side of the figure) and navigation display (right side
Figure 10.2 Initial 747-400 PFD and ND Heading and Track Symbology. (original figure; the problem shown is heading/track symbology inconsistency between the PFD and the ND, with identified sources of confusion: ND heading bug, PFD track bug)
of the figure). On the navigation display, track is fixed at the top of the display
and heading is shown by a modified, triangular pointer which moves along the
compass scale.
Figure 10.3 Navigation Display Heading Pointer Shape Change. (original figure; identified sources of confusion: ND heading bug, readout box, ND track line, selected heading bug)
This simple change did not solve the problem. At this point, a thorough review
of task information relationships was accomplished beginning with an
assessment of how these tasks were supported by earlier displays. This review
confirmed that the information content was correct but indicated three areas of
potential confusion brought about by the close proximity of the PFD and ND
presentations. The next step involved changes in each of the three areas (shown
in figure 10.4):
o make both heading pointers the same shape, but put them outside the
compass scale circle,
o locate both digital readout boxes at the top of the display,
o add a moveable track line on the PFD, analogous to the fixed track
line on the ND.
Figure 10.4 Consistent Shapes for Heading and Track Pointers. (original figure; remaining sources of confusion: readout boxes, PFD track bug, selected heading bug)
Performance with this format was better; however, there was now confusion
associated with the digital readouts and the track information. The results of
simulator testing suggested three more changes (shown in Figure 10.5):
o remove the digital readout box from the PFD, so there is no read-out
confusion;
o add a tick to the PFD track line to strengthen the association with the
ND track line;
o move the selected heading split rectangle to the inside of the compass
arc to avoid conflict with the heading triangle.
Figure 10.5 Consistent 747-400 PFD and ND Heading and Track Symbology. (original figure)
This combination performed well. Clearly the success of this symbology suggests
that the actual task for which the pilots use the scale on the PFD is closer to
the capture and track task associated with the map display than the runway
alignment task that had been presumed. This interpretation follows from the
relatively small change in the ND format compared with previous map displays
and the much more significant changes to the PFD when compared with
previous HSI presentations. Note that no information was added or removed
from either display. All seven of the changes involved symbology and formatting
only. The number of changes and the sequential manner in which they were
identified emphasizes the high degree of interaction among the symbols in these
two displays.
Use of Color
The first commercial CRT displays developed by Boeing (originally intended for
the Boeing SST) were integrated in the NASA TCV airplane after the SST
program was canceled in the early 1970s. (The NASA Terminal Configured
Vehicle, TCV, is a Boeing 737 airplane with a reconfigurable research flight
deck.)
Eye Fatigue
The use of CRT displays introduced new opportunities for eye fatigue. To
minimize this potential, several characteristics of the displays were carefully
controlled. Eye fatigue results when the muscles controlling the eye are subject
to overuse.
The muscles that change the shape of the lens respond to the sharpness of the
edges in the image falling on the retina. For conventional mechanical displays,
edge sharpness is very high. The manner by which a CRT image is created
produces a Gaussian-like distribution of light across each line in the display. If
line widths, along with phosphor dot arrangement and spacing, are not
carefully selected, the resulting soft edges can cause excessive refocusing and
eventual eye fatigue.
Laboratory testing with a variety of pilots revealed that the optimum line widths
for color CRT displays were significantly wider than for monochromatic CRTs
and that the desired widths varied with the color of the line. This latter finding
appears, at least in part, to be related to the fact that misconvergence can cause
color fringing along those lines composed of two or more primary colors.
Eye fatigue can also result from fixating in one location for an extended time.
Fortunately the distributed nature of information in a modern flight deck
encourages the pilot to change his point-of-regard frequently. When the large
format CRTs were first proposed for the 767, there was concern that the
novelty of the display along with the large amount of information they
contained would result in much greater dwell time on these instruments than
was true of previous displays. The original performance criteria for the displays
included graphic symbology that could be interpreted quickly by the pilot. Eye
track records confirmed that dwell times remained quite similar to those
associated with conventional displays.
A third potential source of eye fatigue is the apparent motion in a display
caused by flicker. Rapid motion is a powerful means of attracting visual
attention. This is true for any visual scene, real or created. The motion-attention
response is so automatic that it is not under the conscious control of the pilot
in most situations. The human visual system's sensitivity to flicker is not
uniform throughout the visual field. For most people, it is greatest in the
peripheral region between 45 and 60 degrees away from the eye
point-of-regard. In this region, the critical flicker fusion frequency generally will
not be less than 45 Hz nor more than 62 Hz. This is significantly higher than
in the foveal region where critical flicker fusion frequencies below 30 Hz are
common.
Unfortunately the zone of greatest flicker sensitivity overlaps the location of the
other pilot's displays in most side-by-side two-pilot flight decks. Thus the
required refresh frequency for flight displays is set by the flight deck geometry.
For displays used on the 757 and 767, the nominal refresh rate is 80 Hz. This
is allowed to drop, under high data presentation conditions, to as low as 65 Hz.
Below that frequency, a message appears alerting the pilot to the data overload
condition and allowing him to deselect unneeded information.
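The refresh-rate policy described above reduces to a simple threshold check. The sketch below is an illustrative reconstruction, not actual avionics logic; the function name and message wording are assumptions, while the 80 Hz nominal rate, 65 Hz floor, and 62 Hz worst-case peripheral flicker fusion frequency come from the text.

```python
# Refresh-rate thresholds taken from the text (757/767 displays).
NOMINAL_REFRESH_HZ = 80.0     # normal refresh rate
MINIMUM_REFRESH_HZ = 65.0     # lowest rate allowed under heavy data load
PERIPHERAL_CFF_MAX_HZ = 62.0  # highest critical flicker fusion frequency
                              # reported for the 45-60 degree peripheral zone

def refresh_status(achievable_hz: float) -> str:
    """Classify an achievable refresh rate against the flicker limits."""
    if achievable_hz >= MINIMUM_REFRESH_HZ:
        # Still above the worst-case peripheral CFF (62 Hz), so flicker
        # should not be perceptible even in the other pilot's field of view.
        return "ok"
    # Below 65 Hz: alert the pilot so unneeded information can be deselected.
    return "data overload: deselect unneeded information"

print(refresh_status(NOMINAL_REFRESH_HZ))  # ok
print(refresh_status(66.0))                # ok
print(refresh_status(60.0))                # data overload message
```

Note that the 65 Hz floor sits above the 62 Hz worst-case peripheral fusion frequency, which is why the degraded rate is still acceptable without an alert.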
Though glare and reflections do not cause eye fatigue directly, they are likely to
be reported as such if the pilot becomes aware of them on a continuing basis.
The use of an anti-reflective coating on the external surface of the display along
with careful matching of the index of refraction for the various layers of the
display face plate greatly reduces the opportunity for perceived reflections.
Finally, the flight deck geometry is established to ensure that sunlight on the
pilot's white shirt will not reflect off the screen and into his eyes in the
normal seated position.
Attention to all of these details has resulted in displays that pilots regard as
highly readable and with which they achieve consistently high performance.
Future technology changes will likely alter the specific requirements
characteristic of current displays. Even some of the areas of concern might
change. However, by understanding the factors that influence both perceptions
and performance, it will be possible to ensure that the next display technology
evolution is at least as successful as the transition to CRTs has been.
A variety of performance data is available when the pilot takes deliberate action
that indicates such data would be useful. For example, the normal procedure for
changing altitude is to select the new altitude on the mode select panel and
then initiate a climb or descent, as appropriate. The two actions generate a
prediction of how far ahead of the current position the aircraft will be when
the new altitude is reached. This prediction is shown as a green arc on the map
display. Once the new altitude has been captured, the prediction is no longer
meaningful and it is automatically removed from the display.
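The green-arc prediction lends itself to a small worked example. The text does not give the actual 767 computation, so the kinematic estimate below is a hedged sketch; the function and parameter names are illustrative.

```python
# Sketch of an altitude-capture distance prediction like the green arc
# described above: assuming current vertical speed and ground speed hold,
# how far ahead of the airplane will the selected altitude be reached?

def altitude_capture_distance_nm(current_alt_ft: float,
                                 selected_alt_ft: float,
                                 vertical_speed_fpm: float,
                                 ground_speed_kt: float) -> float:
    """Distance ahead, in nautical miles, at which the selected
    altitude will be reached at constant rates."""
    altitude_to_go_ft = selected_alt_ft - current_alt_ft
    time_to_capture_min = altitude_to_go_ft / vertical_speed_fpm
    # ground speed (NM/hr) times time (hr) gives distance in NM
    return ground_speed_kt * time_to_capture_min / 60.0

# Climb from 10,000 ft to 16,000 ft at 2,000 ft/min and 360 kt ground
# speed: 3 minutes to capture, so the arc is drawn 18 NM ahead.
print(altitude_capture_distance_nm(10_000, 16_000, 2_000, 360))  # 18.0
```

Once the difference between current and selected altitude reaches zero, the prediction has no meaning, which matches the automatic removal behavior described above.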
A similar strategy is used to support the temporary engine exhaust gas
temperature (EGT) limit that applies during engine start. A red radial is shown
on the EGT gauge at the start limit value from the time the start is initiated by
the pilot until the start cycle is completed. If the start operation occurs while
the airplane is in flight, additional information is needed to ensure that enough
air flow is available to complete the start. In this case, appropriate information
about the airspeeds necessary for an unassisted start are displayed near the
primary engine indicators when the engine is not running during flight. If the
airplane is not at a speed sufficient for an unassisted start, the need for
cross-bleed assistance is shown directly on the appropriate engine rpm indicator.
The time sharing illustrated by these examples would not have been possible
without the flexibility of a general purpose display device like the CRT. The
obvious benefit obtained from the engine and performance time sharing
discussed above is the heightened awareness of the time shared data that occurs
during the interval when that data is significant to the pilot. The corollary
benefit may not be so obvious but is, nevertheless, one of the fundamental
operational reasons for considering time sharing. This benefit can best be
illustrated by noting that the most effective displays are those kept simple.
Every extra display element takes time to interpret and introduces additional
opportunities for misinterpretation and error. Further, the errors will not be
confined to the extra data. As noted in the section discussing evaluation, the
presence of nearby symbols, particularly dynamic symbols, can be a significant
enabling factor for error.
A human characteristic that points toward the desirability of simple displays is
the notion of selective attention, or "tunneling." In essence, under certain
conditions many people have a tendency to fixate on selected data or tasks and
ignore others. The circumstances that trigger this phenomenon are highly
individual; but excessive workload, high stress, fatigue, or fear are often
precursors. The task that is attended to may or may not be the most
appropriate for the existing circumstances. Indeed, if tunneling continues for
any significant time, it is likely that the data that would aid the pilot in
recognizing the need for a priority change has, itself, been biased by the lack of
attention. The simpler the normal displays are, the more likely they are to avoid
the tunneling phenomenon. If tunneling does occur and the displays are kept
simple, there is a greater chance that the pilot will see only high priority
information.
Another aspect of human perception that may play a part in the decision to
time share is our human tendency to see what we expect to see. If data are
continuously presented and are normal for an extended time, it is likely that the
threshold at which a pilot will recognize that an abnormality exists will become less
precise. Many tools are available to deal with this characteristic. Most depend
on some form of alerting triggered by a parameter exceeding a limit value. Two
examples illustrate ways of dealing with this phenomenon.
Exhaust gas temperature (EGT) is a basic engine health parameter on most jet
engines. As such, EGT is required to be displayed in the flight deck. It has no
other operational use. The actual value of EGT varies with engine power setting
and altitude in a rather complex way. Thus, over a typical flight, the pilot can
expect to see the EGT value vary from some low value to quite close to the
limit value. Thus, proximity to the limit is not necessarily a concern, but
exceeding the limit is. The reliability of modern jet engines suggests that, on
average, a pilot would see an over limit condition not more than once every
few years. That represents many hours of seeing normal values for every case of
an abnormal value.
Simple limit values can usually be sensed precisely and reliably by the
instrumentation system. That is the case for EGT. Several elements of the EGT
presentation change color when the established EGT limit is exceeded. The
color change affects the EGT pointer, the related EGT digital readout, and the
box drawn around the digital readout. Since the majority of the Engine
Indicating and Crew Alerting System (EICAS) display is white-on-black, this
change to red-on-black is highly visible. With the color change, there is no
doubt that the limit has been exceeded and which engine has the problem.
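The threshold-to-color mapping just described can be sketched in a few lines. The limit value here is hypothetical (real limits differ by engine type, as the text notes below); the text specifies only that the pointer, the digital readout, and the readout box all switch from white to red together.

```python
# Minimal sketch of the EGT limit-exceedance color logic described above.
EGT_LIMIT_C = 950.0  # hypothetical limit; actual values vary by engine type

def egt_display_colors(egt_c: float, limit_c: float = EGT_LIMIT_C) -> dict:
    """Return the color of each EGT display element for the current value."""
    color = "red" if egt_c > limit_c else "white"
    # All three elements change together, so an exceedance is unmistakable
    # against the otherwise white-on-black EICAS format.
    return {"pointer": color, "digital_readout": color, "readout_box": color}

print(egt_display_colors(930.0))  # everything white: value below limit
print(egt_display_colors(960.0))  # everything red: limit exceeded
```

Because the logic keys on the limit rather than on a memorized number, the display behaves identically regardless of which engine's limit value applies, which is the point made in the following paragraph.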
There are three engine types available for the 767 and the 757. A common type
rating was planned between the two airplanes. None of the engines have
exactly the same values for their limits, but they are displayed in exactly the
same way. Therefore, the pilot doesn't have to memorize a new number when
transitioning between airplane types. Instead, he uses the display exactly the
same way on both airplanes. This is one of the versatile things that can be
done with a CRT instrument.
The secondary engine instruments present a slightly different challenge. In this
case, there are five or more parameters per engine. The values of some of these
parameters are only subtly linked to the pilot's operation of the engine. They
may or may not have limits associated with them. Fundamentally, these
parameters are used for long-term engine performance assessment, for backup if
a primary indication fails, or for maintenance assessment if abnormal engine
operation is encountered.
These secondary indications are grouped on the lower EICAS display. The
design of this display is such that the data can be turned off without loss of
limit indication. The computer monitors track those parameters that have limits
and pop up the appropriate information on the display if a parameter goes out
of limits.
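The pop-up behavior just described can be sketched as a monitor that keeps checking limits even while the display is blanked. Parameter names and limit values below are illustrative only; the text specifies the behavior, not the data.

```python
# Sketch of the lower-EICAS secondary-display behavior described above:
# when the display is blanked, limit monitoring continues and any
# out-of-limit parameter pops up automatically.

# Hypothetical (low, high) limits for two illustrative parameters.
LIMITS = {"oil_temp_c": (-40.0, 165.0), "oil_press_psi": (25.0, 620.0)}

def lower_display_content(values: dict, display_selected: bool) -> dict:
    """Return what the lower display should show for one engine."""
    if display_selected:
        return dict(values)  # pilot has called up all secondary data
    # Display blanked: show only parameters that are out of limits.
    popped_up = {}
    for name, value in values.items():
        low, high = LIMITS.get(name, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            popped_up[name] = value
    return popped_up

# Blanked display stays empty while everything is in limits...
print(lower_display_content({"oil_temp_c": 90.0, "oil_press_psi": 300.0}, False))
# ...but an exceedance pops up without any pilot action.
print(lower_display_content({"oil_temp_c": 180.0, "oil_press_psi": 300.0}, False))
```

The key design property is that blanking the display removes data, not protection: the limit indication survives regardless of what the pilot has selected.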
Recommended usage of this feature for most engine-airframe combinations is to
have the lower display active during engine start and then to blank the display
for normal flight operations. Of course the pilot should activate the lower
display any time he wishes to check any of the secondary data. The flexibility of
use of this feature allows airlines and pilots to tailor operations to fit their
particular operating style. At the same time, the availability of the feature
recognizes that it is unreasonable to expect that all pilots will be properly
attentive to displayed information regardless of the circumstances and the
quantity of data actively displayed.
All the time sharing discussed to this point has involved changes to the data
content of an existing display. In all cases, the basic conceptual framework for
each display remains intact. The most general form of time sharing involves
conceptual changes in the content of the display. In the extreme, this could
mean that the display surface is used sequentially for totally independent tasks
involving completely different information.
Successful implementation of this type of time sharing requires careful attention
to the details of all related tasks and for the circumstances under which
switching from one task to another will occur. Recognizing and supporting all
the task linkages that can occur, particularly those associated with non-normal
operation, is a key prerequisite for success.
Selecting the various modes of a time shared display will be most successful if
the conceptual model used to implement the switching, matches the pilots'
understanding of system usage. For complex systems, this is a difficult task since
the level of system usage understanding will likely be different from pilot to
pilot. Understanding will also be different for a single pilot as his skill with the
system evolves from novice to expert. For example, a tree-structured selection
concept is often preferred during initial training but shifts as experience is
gained.
The corresponding predictive information on the map (see figure 10.6) consists
of a variable radius circular arc symbol whose radius varies with the current
turn rate. In this case, the pilot can see that he has selected the proper bank
angle when the arc is tangent to the desired path or when it passes through the
desired point ahead of the aircraft. A fixed straight line from the airplane
symbol to the top of the display shows the path the airplane would follow if
the turn rate were zero. The rate of closure between this symbol and the
desired path line or target way point and this fixed line provide the position
and rate information the pilot needs to select and control his roll out to level
flight again. These predictions are very simple but very powerful.
Figure 10.6 (original figure) shows the map display with the track line, desired path, curved trend vector, and current position.
The length of the curved trend vector is proportional to the airplane ground
speed. Gaps in the curved trend vector show where the airplane will be 30, 60,
and 90 seconds ahead of current position. Of course the pilot can get some
sense of speed from how fast the map information is moving beneath the
airplane symbol. However, the fixed time intervals of the arc symbol provide the
pilot with a relative time reference to use in interpreting the rest of the display
information.
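The trend vector geometry described above can be made concrete with a short sketch. The text says the arc radius varies with turn rate, the length scales with ground speed, and gaps mark 30, 60, and 90 seconds ahead; the construction below is one plausible realization of that, with illustrative names throughout.

```python
import math

def trend_vector_points(ground_speed_kt: float, turn_rate_deg_s: float):
    """Predicted positions 30, 60, and 90 seconds ahead, in NM, as
    (cross-track, along-track) pairs, assuming constant ground speed
    and turn rate."""
    speed_nm_s = ground_speed_kt / 3600.0
    points = []
    for t in (30.0, 60.0, 90.0):
        if abs(turn_rate_deg_s) < 1e-9:
            # Zero turn rate: the prediction lies on the fixed track line.
            points.append((0.0, speed_nm_s * t))
            continue
        omega = math.radians(abs(turn_rate_deg_s))
        radius = speed_nm_s / omega         # turn radius in NM
        angle = omega * t                   # heading change after t seconds
        x = math.copysign(radius * (1.0 - math.cos(angle)), turn_rate_deg_s)
        points.append((x, radius * math.sin(angle)))
    return points

# Standard-rate turn (3 deg/s) at 240 kt ground speed: after 30 seconds
# the airplane has turned 90 degrees, so the first gap sits abeam the
# current position, one turn radius out and one turn radius ahead.
print(trend_vector_points(240.0, 3.0))
```

The zero-rate case collapses onto the fixed straight track line, matching the text's description of the relationship between the arc and the track line symbol.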
The predictive information does not directly tell the pilot when to maneuver nor
does it demand a particular maneuvering strategy. The pilot must make these
Chapter 11
Workload Assessment
by Delmar M. Fadden, Chief Engineer-Flight Deck, Boeing Commercial Airplane
Group
Workload assessment became a formal part of the certification of large
commercial transports with the adoption of Appendix D to FAR Part 25. While
Appendix D identifies the need for such assessment it does not define the
means. In retrospect, the fortuitous lack of rigidly defined methodology
prompted considerable research and development that otherwise might not have
occurred. The expansion of workload understanding and of the methods for
assessing workload has enabled the industry to keep pace with the rapidly
evolving character of crew workload over the last quarter century.
Operational differences in airplanes, such as the 737, 757/767, and 747-400,
cause changes in the workload the pilot experiences. The nature of these
changes has led to changes in the tools used to assess workload. On the 737,
the workload of primary concern was the shift of system management
responsibility to the two pilots. The flying task assigned to the pilots did not
change significantly between the 727 and the 737. The physical layout of the
column and wheel, primary flight displays, and the cockpit windows remained
very similar to the 727. The tasks that did change were those associated with
engines and systems management. The engine management tasks were subtly
different, reflecting the twin engine configuration of the 737. The systems
underwent substantial change to bring them into conformity with the two-pilot
operating concept.
By the time the 767 design was initiated, extensive experience had been
obtained from a wide range of two-crew operations around the world. This
experience confirmed the soundness of the basic principles underlying the
design of systems for two-crew operation. However, airline desire for improved
operating efficiency, coupled with the increasing complexity of the air traffic
control environment, argued for significant enhancements to the primary flight
information. A new flight management system concept was devised featuring
cathode ray tube (CRT) flight instruments and digital computers handling many
of the navigation, flight planning, and performance assessment calculations.
These changes altered the pilots' tasks in ways that achieved improved efficiency
and greater overall situational awareness. These changes produced
corresponding changes in the pilots' experience of workload.
The 747-400 incorporates both the systems enhancements that had been
pioneered on the 737 and the flight management capabilities first introduced on
the 757 and 767. In addition, the primary instrument panel is modified
permitting the use of larger CRT displays. Finally, a number of new information
management features assist the pilot in coping with the increasing quantity of
flight, engine, and systems information available. These changes, along with a
complete redesign of the airplane systems, made it possible to change the crew
size from three, as it had been on previous 747 models, to two. The workload
concerns in this case focused on the integration effectiveness of the overall
flight deck design.
This chapter reviews the evolving techniques that have been found useful for
assessing workload in modern jet transports. Emphasis is placed on workload
assessment in the early stages of design, since that is the time where
quantitative workload data is the most effective in shaping the product. The
techniques that have been developed to add structure to the subjective
assessments of the evaluation pilots are described. Several issues that have
significant effect on workload and the workload certification process are
presented. The chapter concludes with a discussion of pilot error and a glimpse
at future workload issues.
Workload Methods
Commercial aviation, during the jet age, has established an excellent record for
safety. The skills of many pilots have been a vital factor in that achievement.
Nevertheless, when accidents do occur, history indicates that some type of pilot
error will be involved in over 70% of the cases. Any work that leads to a
reduction in the consequences of pilot error has the potential to improve the
future accident record. While pilot workload, per se, has never been cited as the
cause of an accident, there is a common perception that workload and error are
related in some fashion.
Workload on a commercial airliner seldom, if ever, reaches the absolute limits
of the flight crew. However, circumstances do arise which result in a significant
elevation of workload. Whether or not such increases are large enough to cause
concern about the potential for error is one of the reasons for doing workload
assessment. The general relationship between workload and error is not well
understood, even within the human engineering community. There is general
agreement that error increases at both extremely low and extremely high
workload levels. In between, evidence for any direct relationship is weak or
nonexistent. Individual differences between pilots contribute to the difficulty of
establishing a useful working relationship for workload and error. There
appear to be significant variations in the level at which workload is considered
extremely high or extremely low from one individual to another and, even, for
the same individual under different personal and environmental circumstances.
Regulations applicable to commercial aircraft treat workload as a series of
factors that must be considered for each of the primary flight functions. The
workload factors, identified in Appendix D to FAR Part 25, constitute several of
the key dimensions through which a pilot experiences workload. The
characteristics describing these factors remain reasonably consistent for any one
pilot across a variety of vehicles and flight conditions. Differences among
individuals, however, tend to be large. The workload functions, also identified
in Appendix D, encompass the major functional tasks normally assigned to the
pilot. The details of these tasks, the related specific performance objectives, and
the relative task priorities, vary considerably from one aircraft type to another.
Workload assessment plays a dual role in the design and development process.
During the design cycle, workload assessment provides insights about the design
that identify opportunities for improving the pilot interface. Workload
assessment during the certification process provides a structured method for
examining the various workload issues that are relevant to the particular
aircraft type under scrutiny. Because it is very difficult to change the
fundamental factors that establish crew workload after the airplane is built,
Commercial Aircraft Workload
Commercial aircraft workload can be divided into two broad regimes: normal
and nonnormal. The former constitutes all the tasks associated with planned
operation of the aircraft, including:
o all allowable flight operations,
o all certified weather operations,
o certified minimum crew size,
o selected equipment unavailability under the minimum equipment list,
and
o normal flight operations following probable equipment fault or failure
conditions (excluding tasks associated directly with management of the
fault or failure).
Normal workload presumes compliance with all operating and performance
requirements along with adherence to all restrictions, limitations and established
policies. Under nonnormal conditions, strict compliance with normal operating
requirements can be relaxed, as long as aircraft or personnel safety is not
further compromised. In addition, through appropriate coordination, it may be
possible to relax adherence to certain externally imposed restrictions or
performance standards. Such relaxation plays a significant part in mitigating
additional workload that might otherwise accrue from nonnormal events.
All remaining tasks are considered nonnormal. Both the consequences of
occurrence and the probability of occurrence are considered in determining
which nonnormal tasks are identified with specific procedures in the operational
documentation and the training the pilot receives. During design, assessments
are made of all possible ways in which safety hazards can occur. In this
manner, the relevance of every nonnormal event is determined. Experience
shows that particular attention is needed for events that are associated with:
o other than normal flight conditions,
o incapacitation of a required crew member,
o management of equipment fault or failure conditions,
(original figure: new airplane development timeline from go-ahead through product development, roll out, certification, and delivery, showing reference mission analysis, selective comparative analysis, subsystems part-task simulation, full-mission simulation, avionics and airplane testing, new airplane verification, and certification flight test)
A typical new airplane development program spans about six years. The
fundamental decisions that shape the basic airplane itself are
frequently made in the first 12 to 24 months of the design activity. Structured
workload assessments usually begin about 50 months before certification. The
assessment tools selected at this point provide useful insight even though many
of the details of the design are not yet finished. Where a reasonable degree of
task similarity exists, comparative analyses based on these tasks can provide an
anchored reference. Where the task or the information presented is new and
cannot be quantitatively linked to previous designs, some form of laboratory
assessment, part-task simulation, or even experimental flight test may be
necessary, particularly if the task is important to the safety or operating success
of the airplane.
The costs of using these tools, particularly simulation or flight test, are not
limited to dollars but extend to the time and human resources they absorb.
Since committing resources reduces their availability for other developmental
work, those issues selected for this type of testing are carefully considered and
prioritized.
The first step in many new airplane workload assessment programs is a
comparative analysis of the internal airplane systems: electrical power,
hydraulic, pneumatic, environmental control, and fuel being the most important.
This analysis depends on system knowledge but requires little detail about
events external to the airplane. Any impacts associated with external events or
inter-system effects will be incorporated in subsequent analyses. Negative or
neutral analytic results indicate where to focus further design attention. As with
all analytic methods, these analyses provide visibility based on known or
hypothesized relationships. Additional testing must be done when the possibility
of unanticipated relationships between design elements or crew tasks cannot
otherwise be reduced to an acceptable level.
During design of the 767, the analytic workload assessment process resulted in
two additional design optimization cycles for the hydraulic system and one
added cycle for the pneumatic system. These cycles occurred well before
hardware was built at a time when significant design flexibility remained.
Similarly, the fuel system of the 757 was changed from a five-tank to a
three-tank configuration based on workload considerations. The fuel tank issue
is particularly interesting because it illustrates the complexity of achieving truly
effective designs.
Fuel is a major element of weight in a long range airplane. The distribution of
that weight in the wing affects the stress each portion of the wing will
experience during flight. The structural weight of the wing is directly related to
these stresses. Naturally, the more the structure weighs, the more fuel must be
carried to lift the extra weight. It is advantageous to reduce the bending stress
by having more weight remain within the outboard portion of the wing than
the inboard portion as fuel is burned during flight. Consideration of these
factors for an airplane the size of the 757 suggested that the best structural
design solution would involve five fuel tanks: two outboard, two mid-wing, and
a center tank. However, such a system would require additional routine actions
to manage the flow of fuel to each engine.
On a three-tank system, the center tank pumps normally operate at higher
output pressure than the wing tank pumps. This ensures that center fuel will be
used first. Operating procedures can be kept very simple; turn on all pumps
before take-off and turn off the center pumps when center fuel is exhausted.
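The pressure-priority scheme just described can be illustrated with a small simulation sketch. The tank quantities, burn rate, and function names here are invented for illustration; this is not Boeing data or logic, only the behavior the text describes: center fuel is drawn first, and the crew's only action is one switch event.

```python
def fuel_source(center_qty, center_pumps_on):
    """Return which tank feeds the engines under pressure priority.

    Because the center-tank pumps deliver higher output pressure than the
    wing-tank pumps, the engines draw center fuel whenever the center pumps
    are on and the center tank still holds fuel.
    """
    if center_pumps_on and center_qty > 0:
        return "center"
    return "wing"


def fly(center_qty, wing_qty, burn_per_step, steps):
    """Simulate fuel burn; the only crew action is a single switch event."""
    center_pumps_on = True  # procedure: all pumps on before takeoff
    for _ in range(steps):
        if fuel_source(center_qty, center_pumps_on) == "center":
            center_qty = max(0.0, center_qty - burn_per_step)
            if center_qty == 0.0:
                # procedure: center pumps off when center fuel is exhausted
                center_pumps_on = False
        else:
            wing_qty = max(0.0, wing_qty - burn_per_step)
    return center_qty, wing_qty, center_pumps_on
```

Running `fly(10.0, 50.0, 1.0, 20)` drains the center tank for the first ten steps, switches the center pumps off, and draws from the wing tanks thereafter, mirroring the single turn-off action in the operating procedure.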
Managing a five-tank system is more complicated for the pilot, unless a system
is added to sequence the fuel automatically. Considering the criticality of the
fuel system and the additional complexity that would be necessary to
compensate for new failure modes, such added automation would result in an
increase in electronics weight, several new nonnormal procedures, and increased
maintenance requirements. Several design iterations addressed each of these
issues and resulted in a revised fuel system that achieved lower total weight
and the simplicity of a three-tank design. Reaching this decision required
agreement across several, otherwise independent, functional groups within the
design organization, the regulatory organization, and the airlines, thereby
adding considerable time and effort to the design. The in-service results suggest
that the effort was worthwhile.
This example points out how important the early workload estimates are.
Redesigning the tank layout would not have been practical had the workload
assessment been delayed until a full mission simulation or a flight test vehicle
was available. In this case, the workload concern was identified by the
manufacturer who then took timely steps to resolve it. Had the issue been a
regulatory concern, it would have been equally important to identify early.
Workload Assessment Criteria
Should analytic workload techniques be used for certification? It is convenient
for the manufacturer if they are, because the manufacturer has already applied
them, and based a design upon them. If the regulatory agency and the
manufacturer both agree on the scope and validity of such methods, then they
can be highly useful.
Boeing starts with a subsystem analysis program called Subsystems Workload
Assessment Tool (SWAT). The SWAT program assesses both normal and
non-normal procedures. The primary purpose of this program is to relate the
operating procedures, the display and control devices, and the geometry of the
cockpit using a common measure. The subsystem analyses are not related to a
specific mission, so all the normal and nonnormal procedures are accomplished
serially. The analysis encompasses time and motion assessments for hand and
eye tasks. Such ergonomic data is essential for ensuring that displays and
controls are properly located within the system panel. Time and complexity
assessments for aural, verbal, eye, hand, and cognitive tasks are also examined.
The complexity score is a method of estimating the mental effort related to
gathering information. It characterizes the information content of the displays
and the number of discrete operating choices available to the pilot using a
logarithmic measure (BITs). SWAT generates summary statistics for each system
and for all systems.
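The BITs measure follows standard information theory: a choice among n equally likely discrete alternatives carries log2(n) bits. A minimal sketch of how a complexity score might be accumulated on that basis follows; the step names, tallies, and equal-likelihood assumption are illustrative, since the actual SWAT weighting is not described in detail here.

```python
import math


def bits(n_choices):
    """Information content, in bits, of selecting among n discrete options."""
    return math.log2(n_choices)


def procedure_complexity(steps):
    """Sum the information content over the discrete choices in a procedure.

    `steps` maps each step name to the number of discrete alternatives the
    pilot must discriminate among (switch positions, menu entries, display
    states, and so on).
    """
    return sum(bits(n) for n in steps.values())


# Hypothetical example: a two-position switch, a four-position selector,
# and a display field with eight distinguishable states.
score = procedure_complexity(
    {"guard switch": 2, "mode selector": 4, "status field": 8}
)
# 1 + 2 + 3 = 6 bits
```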
Table 11.1 is a systems workload data summary comparing 767-200 normal
inflight procedures with those for the 737. The data reflect that the 767-200
systems require only that the pilot switch off the two center tank fuel pumps
when the center tank is depleted. The 737 requires a few more hand and eye
tasks.
Table 11.1
Subsystems Workload Data Summary
Normal Inflight Procedures (Boeing, 1982)

[Table cells are too garbled in the source to reconstruct fully. The table
compares 737 and 767-200 figures for the eye and hand activity channels:
motion (degrees of eye motion, inches of hand motion), time (seconds),
complexity (BITs), and number of tasks. Legible 737 vs. 767-200 pairs include
212 vs. 1, 17 vs. 1, 32 vs. 3, 32 vs. 2, 17 vs. 3, and 14 vs. 2 tasks.]
Table 11.2
Subsystems Workload Data Summary
Nonnormal Inflight Procedures (Boeing, 1982)

[Table cells are too garbled in the source to reconstruct fully. The table
compares 737 and 767-200 figures across the same channels and measures as
Table 11.1. Legible 737 vs. 767-200 pairs include 2,214 vs. 1,348; 330 vs.
183; 458 vs. 385; 348 vs. 297; 169 vs. 126 (tasks); 183 vs. 154; 140 vs. 134;
and 88 vs. 81.]
The two-crew 747-400 has a lower average number of tasks and lower average
time to complete the tasks than the three-crew 747-200. These graphs show that the 747-200 results
are similar to the results for the 737 and 767. These comparisons give the
designer an initial indication of how the workload associated with a new design
will compare with other airplanes. The normal procedure eye analysis (upper
left graph) shows that the 747-400 has a larger average number of tasks, but
that on average, the tasks take less time to do than on the 737. The goal in
this case is ensuring that total task time is similar to the total task time of
another airplane having a good operating history. The 737 has been used as a
reference by Boeing since the development of the 767, because the 737 has an
excellent safety record and is flown by more customers, in more environments,
using a wider diversity of pilots, than any other Boeing airplane. Experience
indicates that the 737 is highly tolerant of pilot error and that it supports many
different operating strategies.
A workload assessment summary for nonnormal procedures is shown in figure
11.3. Numeric totals are not particularly interesting by themselves because, even
under the worst of circumstances, the pilot will use only a small percentage of
the nonnormal procedures at any one time. Minimizing the number of tasks per
procedure is considered desirable. While the 767 nonnormal workload is
consistently the lowest, the corresponding 747-400 workload is significantly
closer to that of the 737 on these logarithmic graphs than to the three-crew
747-200.
In successfully reducing workload, designers can establish circumstances where
the crew has limited opportunities to experience certain events. If the crew
[Figure 11.2: four log-log plots of average time (seconds), average eye motion
(degrees), and average hand motion (inches) against average number of tasks
for systems normal procedures, comparing the 737, 767, 747-200, and 747-400.]

Figure 11.2. Systems Normal Procedures Workload Results for Various Airplanes
(Boeing, 1989)
[Figure 11.3: log-log plots of average time (seconds), average eye motion
(degrees), and average hand motion (inches) against average number of tasks
for systems nonnormal procedures, comparing the 737, 767, 747-200, and
747-400.]

Figure 11.3. Systems Nonnormal Procedures Workload Results for Various
Airplanes (Boeing)
Relatively little crew activity occurs during the longer, cruise phase of the
flight. Consequently, statistics are focused on the
arrival and departure phases of flight or on each of the 300-second blocks. Four
separate channels of activity are examined: visual (eyes), motor (hands), aural,
and verbal. Modal initiation and execution times for each task are recorded and
each task is assumed to require 100% channel capacity for the duration of that
task. Tasks are shifted as necessary to avoid overlap. Recent research results
indicate that the 100% channel capacity assumption is significantly more
conservative than necessary. However, it has proven useful in the relatively
benign workload environment of commercial aviation by ensuring early
identification of potential problem areas.
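The shifting rule under the 100%-capacity assumption amounts to a serialization pass over one channel. The data layout and function name below are invented for illustration; the actual Boeing timeline tooling is not public.

```python
def shift_to_avoid_overlap(tasks):
    """Serialize tasks on one channel under the 100%-capacity assumption.

    `tasks` is a list of (nominal_start, duration) pairs in seconds.
    Returns (actual_start, duration) pairs with no overlap: because each
    task is assumed to demand the channel's full capacity, a task may not
    begin until the channel is free again.
    """
    scheduled = []
    channel_free_at = 0
    for nominal_start, duration in sorted(tasks):
        start = max(nominal_start, channel_free_at)  # shift if channel busy
        scheduled.append((start, duration))
        channel_free_at = start + duration
    return scheduled
```

For example, a 4-second task nominally starting at t = 3, while a 5-second task occupies the channel from t = 0, is pushed back to start at t = 5.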
The decision to keep the four channels separate has a similar expediency basis.
From the design point of view, knowledge of the specific channel workload is
essential if any adjustments are required. Thus, a combined statistic would
serve only as an intermediate step toward getting to the specific channel
workload.
Combining the channel workload data immediately raises the question of the
basis for the combination. With the exception of the aural-verbal pair,
experience indicates that all the pairings can overlap successfully most of the
time. The circumstances where complete overlap may not work appear to
involve task events unfamiliar to the pilot or tasks of unusual complexity. The
idiosyncratic nature of these circumstances makes a rule for identifying them
difficult to develop and even harder to defend. The reason usually given for
wanting a single workload number is to simplify the decision of whether the
overall workload is acceptable. The lack of a firm basis for combining the
channels has led Boeing to focus on the individual channel statistics.
Timeline analysis provides visibility of both dwell time and transition time.
Dwell time is the time taken to read or operate the specific control or display
device; for example: adjusting a control, reading information from a display,
entering a way point name, or selecting a new switch position. Transition time
is the time taken to switch from one activity to another. Examples of transition
time are: moving the eye-point-of-regard from one display to another, moving
the hand from the control column to the throttles, or changing from looking
inside the flight deck to looking outside. If the dwell times for the design
are high, then the system designer needs to consider
using alternative display formats or control devices. If the transition times are
high, the flight deck designer is prompted to examine alternate physical
arrangements of the various controls and displays. Table 11.3 is a flight
procedure workload data summary for a Chicago to St. Louis flight depicting
dwell and transition times for eye and hand activities. These are total dwell and
transition times needed for the entire flight. These particular data were
generated early in the 767 development as a gross indication of the design
progress.
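Totals of the kind shown in Table 11.3 can be accumulated directly from a recorded task sequence. This sketch uses an invented record layout rather than the format of the actual Boeing timeline program.

```python
def channel_totals(events):
    """Accumulate total dwell and transition time per activity channel.

    `events` is a sequence of records:
        (channel, dwell_seconds, transition_seconds)
    e.g. ("eye", 1.5, 0.5) for reading a display after moving the eyes to it.
    Returns {channel: (total_dwell, total_transition)}.
    """
    totals = {}
    for channel, dwell, transition in events:
        d, t = totals.get(channel, (0.0, 0.0))
        totals[channel] = (d + dwell, t + transition)
    return totals


# Invented fragment of a flight's task trace:
flight = [("eye", 1.5, 0.5), ("eye", 0.5, 0.25), ("hand", 2.0, 0.75)]
# channel_totals(flight) -> {"eye": (2.0, 0.75), "hand": (2.0, 0.75)}
```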
Table 11.3
Flight Procedure Workload Data Summary
Chicago to St. Louis Flight Totals (Boeing, 1979)

[Row labels are partially illegible in the source; legible 737 vs. 767-200
values are listed below.]

Captain
  Eye Activity Channel: Dwell Time (sec) 550 vs. 372; Transition Time (sec)
  24 vs. 20; BITs 2,271 vs. 1,811
  Hand Activity Channel: 199 vs. 161; 510 vs. 331; 119 vs. 91; 629 vs. 370

First Officer
  Eye Activity Channel: Transition Time (sec) 30 vs. 26; BITs 2,423 vs. 1,844
  Hand Activity Channel: 274 vs. 196; 181 vs. 136; 895 vs. 488
Other useful statistics generated by the timeline analysis program include the
average amount of dwell time spent on a particular instrument and the
probability of transitioning between various instrument pairs. Samples of these
statistics are shown in Table 11.4. These two statistics are very useful in
developing the most effective flight deck layout. Where these statistics depart
significantly from those associated with current airplanes, the designer has
reason to conduct more detailed studies.
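Both statistics can be derived from an ordered fixation trace. The trace format, instrument names, and values below are invented for illustration; the real analysis draws on the full timeline data.

```python
from collections import defaultdict


def scan_statistics(fixations):
    """Average dwell per instrument and transition probabilities between pairs.

    `fixations` is an ordered list of (instrument, dwell_seconds) records
    from a visual scan trace.
    """
    dwell_sum = defaultdict(float)
    dwell_n = defaultdict(int)
    trans_count = defaultdict(int)
    from_count = defaultdict(int)
    for (inst, dwell), nxt in zip(fixations, fixations[1:] + [None]):
        dwell_sum[inst] += dwell
        dwell_n[inst] += 1
        if nxt is not None:  # count the transition to the next fixation
            trans_count[(inst, nxt[0])] += 1
            from_count[inst] += 1
    avg_dwell = {i: dwell_sum[i] / dwell_n[i] for i in dwell_sum}
    trans_prob = {pair: c / from_count[pair[0]]
                  for pair, c in trans_count.items()}
    return avg_dwell, trans_prob


trace = [("ADI", 1.0), ("HSI", 0.8), ("ADI", 1.5), ("Airspeed", 0.5)]
avg, prob = scan_statistics(trace)
# avg["ADI"] == 1.25; prob[("ADI", "HSI")] == 0.5
```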
The next two tables show the activity demands on the captain and first officer
during each five-minute block of the one-hour, Chicago to St. Louis, flight. The
total flight time is divided into 300-second (5-minute) blocks beginning at
brake release and the time demands during each interval are shown as a
percentage. The purpose of this form of data presentation is to examine the
distribution of workload throughout the flight. Several characteristics are of
interest in these tables. While none of the following trigger levels should be
Table 11.4
In-Flight Instrument Visual Scan
Dwell Time and Transition Probability Summary (Boeing, 1979)

Average Dwell Time (Seconds)

Instrument      Takeoff/Climb   Descent/Land
ADI             1.17            1.11
HSI             0.81            1.05
[Altimeter]     0.64            0.68
[Airspeed]      0.47            0.50

[Bracketed instrument labels are illegible in the source and inferred from the
transition rows. The transition-probability portion is only partially legible;
recoverable fragments include "HSI to ADI: 0.78", "ADI to HSI: 0.36, 0.31",
and "ADI to [Airspeed]: 0.80, 0.28, 0.36", plus unassigned values 0.90, 0.87,
0.86, 0.79, 0.25, and 0.23.]
Table 11.5
Line Operation Visual Activity Time Demand (Boeing, 1982)
Average Percent of Time Available Devoted to Visual Tasks

Time Interval        Captain          First Officer
in Seconds           767    737       767    737
1 - 300              23     28        19     30
301 - 600            19     20        --     --
601 - 900            --     --        --     --
901 - 1200           12     --        --     --
1201 - 1500          10     16        --     --
1501 - 1800          12     24        10     13
1801 - 2100          15     12        12     19
2101 - 2400          13     17        13     --
2401 - 2700          18     26        10     13
2701 - 3000          15     12        --     --
3001 - 3300          15     17        13     15
3301 - 3600          10     14        17     --

[Dashes mark cells illegible in the source; for rows with missing cells, the
column assignment of the surviving values is uncertain. The original margins
mark the flight phases Takeoff, Climb, Cruise, Descent, and Land along the
interval column. A footnote, "*Excludes flight...", is truncated in the
source.]
Table 11.6
[Caption garbled in the source; by position this is the companion activity
time demand table to Table 11.5.]

Time Interval        Captain          First Officer
in Seconds           767    737       767    737
1 - 300              17     20        15     27
301 - 600            11     10        21     --
601 - 900            --     --        --     --
901 - 1200           --     --        --     --
1201 - 1500          11     --        --     --
1501 - 1800          11     14        11     --
1801 - 2100          10     13        --     --
2101 - 2400          20     12        --     --
2401 - 2700          11     11        12     --
2701 - 3000          12     --        --     --
3001 - 3300          11     11        15     --
3301 - 3600          10     <1        --     --

[Dashes mark cells illegible in the source; column assignments for incomplete
rows are uncertain.]
[Figure 11.4: line graphs of the percent of visual-channel time busy for the
captain (upper panel) and first officer (lower panel), comparing the 737-200
and 747-400 along the route from takeoff at Chicago past waypoint ROBERT to
top of climb and top of descent.]

Figure 11.4. Visual Activity Time Demand, 747-400 and 737-200 (Boeing, 1988)
The final type of timeline analysis is a comparative plot of the total workload
time for each of the four channels: eyes, hands, verbal, and auditory. An
example is shown in figure 11.5 for both the 767-200 and the 737. The white
bar represents the 767-200. The black bar represents the 737. This figure
involves data for the same flight scenario used to generate the data in tables
11.5 and 11.6.
[Figure 11.5: horizontal bar chart of total workload time in seconds (0 to
5,000) for each of the four channels (eyes, hands, verbal, auditory), shown
separately for captain and first officer and compared against the mission
duration. Analysis based on takeoff brake release at Chicago (O'Hare) to
touchdown at St. Louis (Lambert).]
Task-Time Probability
Another technique for examining task demands on the crew is called task-time
probability. This method estimates the probability that the pilot will be busy
with a task at each point along the flight. Since the method is probabilistic, it is
possible to account for a range of pilot performance. Each task is associated
with separate initiation and execution times, as was true for timeline analysis.
However, in this case, instead of being assigned discrete values, these two
times are assigned probability densities. Tasks are allowed to overlap.
Task-time probability is computed for each one-second interval along the
flight path.
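Under these assumptions, the per-second busy probability can be approximated by Monte Carlo sampling of the two triangular densities. The task parameters below are invented, and this sketch is not the actual Boeing analysis; it only illustrates the computation the text describes.

```python
import random


def busy_probability(tasks, horizon, trials=2000, seed=1):
    """Estimate P(pilot busy) for each one-second interval of a timeline.

    Each task is (init_lo, init_mode, init_hi, dur_lo, dur_mode, dur_hi):
    nonsymmetrical triangular densities for when the task starts and how
    long it runs, per the first-estimate assumption in the text. Tasks are
    allowed to overlap; an interval counts as busy if any task covers it.
    """
    rng = random.Random(seed)
    busy_counts = [0] * horizon
    for _ in range(trials):
        covered = [False] * horizon
        for lo, mode, hi, dlo, dmode, dhi in tasks:
            start = rng.triangular(lo, hi, mode)
            end = start + rng.triangular(dlo, dhi, dmode)
            for t in range(max(0, int(start)), min(horizon, int(end) + 1)):
                covered[t] = True
        for t in range(horizon):
            busy_counts[t] += covered[t]
    return [c / trials for c in busy_counts]


# One invented task starting near t = 5 s and lasting about 3 s,
# evaluated over a 20-second horizon:
p = busy_probability([(4.0, 5.0, 7.0, 2.0, 3.0, 5.0)], horizon=20)
```

Intervals before the earliest possible start show probability zero, intervals every realization covers show probability one, and the shoulders take intermediate values reflecting the spread of the distributions.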
The probability density functions are centered on the modal initiation and
execution times. As a first estimate, a nonsymmetrical, triangular distribution is
assumed unless more specific test data are available to support a different
distribution. The nonsymmetrical distribution for task initiation or completion
Table 11.7
Probability of Being Busy with a Visual Task* (Boeing, 1982)
Root-Mean-Square Probability

Time Interval      Captain          First Officer
in Seconds         767    737       767    737
1 - 300            .41    .49       .38    .52
301 - 600          .26    .40       .25    .43
601 - 900          .15    .19       .12    .22
901 - 1200         .16    .26       .12    .30
1201 - 1500        .29    .37       .19    .27
1501 - 1800        .31    .47       .30    .34
1801 - 2100        .35    .32       .32    .42
2101 - 2400        .27    .33       .37    .31
2401 - 2700        .38    .47       .26    .31
2701 - 3000        .25    .35       .16    .31
3001 - 3300        .35    .38       .32    .35
3301 - 3600        .20    .30       .36    .40

[The original margins mark the flight phases Takeoff, Climb, Cruise, Descent,
and Land along the interval column; the footnote text is not legible in the
source.]
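The root-mean-square probability reported for each 300-second block in Table 11.7 can be computed from per-second busy probabilities such as those produced by a task-time probability analysis. The input values in this sketch are invented.

```python
import math


def rms_by_block(p_busy, block=300):
    """Root-mean-square of per-second busy probabilities over fixed blocks.

    `p_busy` holds one probability per second of flight; the result is one
    RMS value per `block`-second interval, the form tabulated in Table 11.7.
    """
    out = []
    for start in range(0, len(p_busy), block):
        chunk = p_busy[start:start + block]
        out.append(math.sqrt(sum(p * p for p in chunk) / len(chunk)))
    return out


# 600 s of invented data: busy with probability 0.5 throughout the first
# block, idle throughout the second.
p = [0.5] * 300 + [0.0] * 300
# rms_by_block(p) -> [0.5, 0.0]
```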
[Table number and caption are garbled in the source; this is the companion to
Table 11.7, presumably the probability of being busy with a manual task
(Boeing, 1982).]
Root-Mean-Square Probability

Time Interval      Captain          First Officer
in Seconds         767    737       767    737
1 - 300            .39    .42       .35    .50
301 - 600          .21    .30       .29    .4[?]
601 - 900          .15    .18       .22    .27
901 - 1200         .15    .16       .14    .22
1201 - 1500        .26    .26       .23    .31
1501 - 1800        .28    .31       .37    .32
1801 - 2100        .28    .21       .23    .34
2101 - 2400        .19    .22       .41    .30
2401 - 2700        .27    .29       .30    .28
2701 - 3000        .16    .17       .19    .31
3001 - 3300        .18    .29       .31    .35
3301 - 3600        .26    .31       .00    .16

[The original margins mark the flight phases Takeoff, Climb, Cruise, Descent,
and Land along the interval column; the cell marked .4[?] is truncated in the
source.]
Unless unresolved questions remain after the simulator testing, it should not be
necessary to conduct instrumented flight tests simply to verify the analysis.
Flight testing for quantitative time demand workload is extremely difficult to
accomplish and is easily confounded by external circumstances beyond the
control of the test conductor.
Quantitative workload testing in the actual airplane is much more difficult than
in the simulator. The single biggest contributor to the difficulty is the
unpredictability of the actual flight environment. At the same time, the actual
flight environment improves the pilot's conscious sensitivity to variations in his
experience of workload. That sensitivity can be focused and standardized using
a well designed, structured workload questionnaire. Sample pages from a
Boeing questionnaire for assessing pilot workload on the 757/767 airplanes are
shown in Figures 11.6 to 11.10. By completing the questionnaire, the evaluation
pilots indicate their experience of workload while operating either airplane. The
specific workload functions and factors are related to those identified in FAR
25, Appendix D.
The questionnaire is structured to ensure that the pilot specifically thinks about
the departure and the arrival phases of the flight, each type of activity that
occurred, and each dimension of workload. Becoming consciously aware of the
various aspects of workload requires training. Figure 11.6 provides descriptions
of workload function and factor combinations that each pilot is asked to
evaluate. A copy of this matrix is reviewed before each flight and is available
with the questionnaire at the end of each flight leg. At the end of the
questionnaire there is space for comments the pilot may have concerning any
aspect of the questionnaire or the flight. After completion of each flight
sequence an analyst reviews the completed questionnaire with the pilot and
solicits more detailed information about any unusual events or any particularly
high or low workload experiences.
The bottom section in Figure 11.12 shows the part of the questionnaire where
the pilot specifies the reference airplane used in his or her evaluation of the
test airplane. Currency in the reference airplane is established by indicating
whether the pilot has flown the reference airplane within the preceding 90
days. The identification of a reference airplane by the evaluation pilot
serves to anchor the pilot's ratings and comments. It also helps to temper any
biases a pilot may have for or against an individual design or design feature.
[Figure 11.6 reproduces the questionnaire's matrix of workload functions and
factors for the departure (arrival) phase. Workload functions such as engine
and airplane systems operation, FMS operation, and manual flight are crossed
with the workload factors: mental effort, physical difficulty, time required,
understanding of horizontal position, time available, and usefulness of
information. Each cell asks the pilot to compare the test airplane with the
reference airplane.]

Figure 11.6. Descriptions of Workload Evaluation Function and Factor
Combinations (Boeing)
[Figure 11.7 shows the questionnaire's background data sheet: pilot name,
organization (Boeing, FAA Flight Test, or other), crew position (captain or
first officer), airplane model (757 or 767), airplane number, flight and test
numbers, date and local time, and a checklist for the reference airplane (737,
727, 747, 707, DC-9, DC-9-80, DC-10, L-1011, or other) with a yes/no item on
whether the pilot has flown that airplane in the last 90 days. A
representative of the 757/767 Flight Deck Integration organization (8-8765)
collects the completed questionnaire.]

Figure 11.7. Evaluation Background Data Sheet, Pilot Subjective Evaluation
Questionnaire (Boeing)
[Figure 11.8 shows the questionnaire's departure information data sheet.
Legible items include the departure airport; weather as VMC, IMC, or mixed;
precipitation and turbulence ratings (none, light, moderate, heavy or severe);
VFR or IFR with assigned route or radar vectors; traffic level; approach aids
(VOR/ILS); autopilot mode (CWS or CMD); LNAV and VNAV usage; and full-time,
part-time, or not-used entries for several systems.]

Figure 11.8. Departure Information Data Sheet, Pilot Subjective Evaluation
Questionnaire (Boeing)
[Sample questionnaire rating page (presumably Figure 11.9; its caption is not
legible in the source): bipolar "Less ... More" rating scales, marked relative
to the reference airplane, for mental effort, physical difficulty, time
required, understanding of horizontal position, time available, and usefulness
of information.]
[Figure 11.10 shows the non-normal operations section of the questionnaire
(5.0, Non-Normal Procedures Workload Factors; one section is completed for
each non-normal procedure, marked planned or unplanned). For the alerting
indications, the pilot rates attention-getting quality and mental effort to
understand the problem; for the procedure itself, the pilot rates complexity,
physical difficulty, and ease of maintaining other piloting functions, each on
a "Less ... More" scale.]

Figure 11.11. Rating Scales Used in Boeing Pilot Subjective Evaluation
Questionnaire (Boeing, 1982)
gave significant weight to those tasks that have a high conscious activity
content. These choices were then subjected to review during the simulator
validation of the questionnaire.
While the core content of the form applies equally well to any commercial
airplane, the unique features of any new model might warrant special
consideration. For example, the 767 included a CRT map display and a full-time
flight management system. There was some concern that these devices would
add workload. To understand better the total impact of these devices, two
questions were added to the questionnaire dealing with the information
supplied by these systems. These questions were integrated into the
questionnaire and appear in the far right column of Figure 11.6. They are
titled "Understanding of Horizontal Position" and "Usefulness of Information."
The nonnormal operations portion of the questionnaire (Figure 11.10) provides
additional workload information about equipment failures or abnormal flight
conditions. These events always involve two aspects: recognition of the event
or condition and accomplishment of any special handling required to restore
normal operations. The questionnaire asks for two ratings regarding the alerting
indications and three ratings about the nonnormal procedure itself. In this case,
the mental effort rating is titled "Complexity" and the time required rating is
titled "Ease of Maintaining Other Piloting Functions." These enhancements
resulted from discussions with pilots who found these titles easier to relate to
the specific events of a nonnormal procedure.
During flight test operations, there is the possibility that actual equipment
failures or nonnormal flight environments will occur. Even though these
unplanned events are not specified in the test plan, they are included in the
nonnormal portion of the questionnaire process. Where possible, simulated
inflight faults are introduced in a way that will produce the appropriate alerting
and recognition indications to the pilot. These events are also treated as
unplanned on the questionnaire, since they appear to be unplanned from the
viewpoint of the evaluation pilot. Safety concerns limit the failure event realism
that can be simulated inflight. Where such concerns come into play, the
alerting indications will be missing or incorrect. However, the procedure
portion of the
questionnaire is still valid and useful.
Normally, the questionnaire is completed by the evaluation pilots for both the
departure and arrival phases of the current flight leg immediately after landing
and before any discussion takes place. On occasion, the departure sheet can be
completed once the aircraft reaches cruise altitude; however, the requirements
of the test program generally place heavy demands on the pilots while airborne.
The post flight debriefing involving the evaluation pilots and a human
performance analyst is an important element in the total process. Through this
opportunities for random error. In the end, however, ensuring that the pilot can
detect that an error has occurred and can do something about it, is the best
means of preventing the error from compounding into a more serious situation.
This is the essence of error tolerance--detection and effective action.
Early detection of an error reduces the likelihood of the error sequence
continuing. Often the reaction will
be to accomplish some physical action. Under other circumstances the
appropriate reaction may be a change in planning or strategy for the remainder
of the flight. Recognizing the full range of possible responses is the key step in
ensuring that the pilot is provided with the appropriate controls, information,
knowledge, and skills to react effectively.
One of the most difficult aspects of pilot error is recognizing what errors are
most likely. It is nearly impossible for one human being to imagine how
another human being could understand and interpret the same circumstances
differently, yet evidence abounds that such is the case. Add to this that pilots
vary considerably in their decision-making styles and it is evident that
understanding error is a team effort. Collective wisdom is consistently one of
the more effective means of seeking out possible error patterns and their causes.
For collective wisdom to work, it must be nonjudgmental with an emphasis on
understanding as many ways of interpreting the display or control device as
possible. There are no wrong answers, except to believe that one interpretation
is correct and the others are wrong. The goal must be to help all pilots catch
their mistakes.
Pilot error can be triggered by unrecognized and subtle mismatches between the
information that is presented and the tasks that information is meant to
support. It is easy for the pilot to assume that if the information presented is
the same, then the associated tasks must be the same as well. Conditions where
identical indications are used to support different tasks are an invitation to
error. To make matters worse, error detection by the pilot under these
circumstances is particularly difficult. Making the design error-tolerant means
that the possibility of this error is acted upon during design. If the assumption
that the tasks are the same is false, the simplest design solution is selection of
different display formats, indications, or controls.
As an example, the hydraulic systems of the 757 and 767 are slightly different
operationally, because of different load assignments to the individual hydraulic
systems. Slight differences in system management and post-failure planning
result from this difference. Because of the task difference, the hydraulic system
control panels on the two airplanes are intentionally different. Even though the
same number of control devices is required on both airplanes, the types of
switches and the physical layout of the panel are different.
Boredom, fatigue, and time-of-day are among the factors that influence pilot
attentiveness. Their effects will normally vary during a single flight. Given these
facts, it is obvious that the pilot cannot be at maximum attentiveness all the
time. The design of nonnormal procedures can be made error-tolerant by
ensuring that the pilot has extra time to recognize and respond to situations
that, from his perspective, are new or unexpected. Once alerted to the
possibility of a problem or unusual condition, virtually all pilots can achieve
significantly increased levels of attention within a short time. This heightened
attention can then be sustained, if the circumstances warrant, for much longer
than it took to reach the heightened attention initially.
Future Workload Issues
In the future, crew workload will be influenced strongly by the strategy used to
prioritize flight deck information. Pilots are expected to look at, and be aware
of, an ever increasing array of information. Human beings can be exceptionally
versatile at handling large quantities of information. However, the time pressure
of flight can lead to impromptu prioritizing strategies that may not be well
suited to the actual circumstances. While certifying an individual system, the
composite effect of that system on the total flight deck information load may
not be evident. Yet the overall flight deck information management issues can
only be addressed by managing the contribution of each system. This means
that everyone involved in development and certification of specific equipment or
systems must share responsibility for the impact of those systems on the overall
pilot-airplane interface effectiveness.
A related issue is the potential for information overload that could follow the
addition of a general purpose data link capability to the airplane. Conceptually,
such systems could allow the nearly limitless information sources stored in
ground-based computers to be available in the cockpit. The potential for good is
great but so is the potential for excessive information management workload.
The knowledge and the tools are available to ensure that realistic consideration
is made of the pilot's human capabilities and limitations. The question is:
how will we, as an industry, use this information to ensure that new data
sources are managed in a manner that improves the effectiveness of the pilot
and protects the aviation system from new human error risks?
In the future, it is conceivable that the basic reliability of some of the control
and display equipment will approach the lifetime of the airplane. This implies
that many crews will go through their entire careers without seeing certain first
failure conditions on the actual airplane. While this will reduce nonnormal
workload, it presents some interesting challenges for selecting appropriate fault
management strategies and training. Certainly the strategies of today, based on
memorized or highly practiced procedures, will be inefficient and may not be
effective. The assistance of computer-based expert systems may be desirable.
Alternatively, it may be better to create designs that ensure the pilot will have
time to develop a suitable response by applying his knowledge of the system or
event.
A final issue concerns the increasing performance demands placed on pilots and
systems by the increasing need for aviation system efficiency. Many of the
improvements in efficiency are likely to result from better matching of: the
information available to the pilot, the procedures established for the various
tasks, and the training the pilot receives. To avoid any unnecessary increase in
pilot workload, coordination of these improvements will require more
communication and understanding among all the organizations and agencies
involved. It will take foresight and initiative to weld the traditionally
independent domains of aviation equipment and operations into a team that
enables the American aviation system to remain the best in the world.
Human Factors Testing and Evaluation
Introduction
Many different types of questions are best answered with the results of a
human factors test. Some of the most common human factors questions include:
- Which of two or more proposed designs (of displays, controls, training
programs, etc.) is best from a human factors standpoint?
- What performance benefits are achieved from a specified design change?
- How long will it take for an operator to perform a task, or part of a task,
with a new system?
A classic example of a problematic implementation is the case of a major air carrier that wanted to give the flight
attendants a cue as to when sterile cockpit was in effect. The airline installed a
small indicator light above the cockpit door that was to be illuminated when
sterile cockpit was in effect. Problems arose because the light that was chosen
was green. In most cultures, the color green is not associated with "stop" or "no
admittance." The lights had to be changed to red, at no small expense to the
airline. In that case, a human factors test was not needed to predict the
problems that were experienced by the airline; it is common knowledge how a
green light is likely to be interpreted by a crewmember. However, most
questions about training, displays, controls, and how the operators may use or
abuse them are much more complex and require controlled testing to be
answered effectively.
The findings of basic research, such as information about our sensory and
cognitive capabilities and limitations, can steer us away from what is known to
be troublesome and can help us to identify desirable design options. However,
each specific application of a technology, training program, or procedure should
be evaluated under the same or similar conditions as it will be used, by the
same type of operator that will be using it, and while the operators are
performing the same types of tasks that actual operations require.
When a human factors evaluation of a system or subsystem is warranted, it
should be designed by both a human factors specialist and an operations
specialist. Operations specialists are intimately familiar with the operational
environment (e.g., a specific cockpit or ATC facility). They represent the
potential users and are usually operators (e.g., pilots or controllers) themselves.
As long as they are operationally current (i.e., knowledgeable of current issues,
procedures, and practices), they are the most appropriate source for information
on user preferences and suggestions for symbology, terminology, display layout,
etc. However, even the most experienced users should not be solely responsible
for the user-machine interface. In fact, many years of experience can
occasionally be a liability in making such decisions, since the skills and
knowledge that develop with extensive experience can often compensate for
design flaws that may then remain unnoticed. For these and other reasons, it is
important for operations specialists to work with human factors specialists in
the planning and conduct of a human factors test. Human factors specialists are
intimately familiar with the capabilities and limitations of the human system,
testing methods, and appropriate data analysis techniques. They can point to
potential problems that operational specialists might otherwise overlook. While
working together, the two specialists can predict problems and head them off
before they occur in actual operations. Together, they are best equipped to
decide exactly what needs to be tested and how it should be tested.
Factors That Affect Response Time
There are many factors that are known to affect human performance, and
hence, response time. Some of these factors are characteristics of the stimulus,
that is, of the visual or auditory display. Others are characteristics of the
operator, such as, previous experience, skill, fatigue, etc. Still others are
characteristics of the test or operational environment, such as workload,
consequences of errors, etc. Each of these factors needs to be considered from
the test design to the interpretation of the results and controlled as much as
possible during a test.
Stimulus Factors
Factors that influence detection of visual signals include location in the visual
field, and presentation format (e.g., blinking vs. steady text, brightness, etc.).
(See Chapters 1 and 2 of this text.) Response time will be faster if the signal is
presented in the center of the visual field, as opposed to out on the periphery.
If it is presented in the periphery, but flickering, detection time will be faster
than if it is in the periphery but steady. (This is one reason why a flickering
display can be distracting.) Intensity is also an important factor. Within limits,
a higher intensity stimulus will attract attention more efficiently than a less
intense stimulus. In the visual domain, intensity translates into brightness
(although other factors, such as contrast, are also critical). For auditory displays
(e.g., a tone or spoken warning message), intensity translates into loudness,
with frequency as a critical variable. The frequencies that are contained in the
ambient noise must be considered in deciding which frequencies should be
contained in the alert. The relative intensity of a message (tone or voice) must
always be measured in the environment in which it will be used. A warning
signal that sounds very loud on the bench may be inaudible in a 727 with the
windshield wipers on. In fact, the original Traffic Alert and Collision Avoidance
System (TCAS) voice alerts passed the bench test, but were found to be
unusable in the cockpit (Boucek, personal communication).
Another factor that can affect how quickly a signal can be recognized and
interpreted is how meaningful the signal is. Personally meaningful stimuli, such
as one's own name, and culturally meaningful stimuli, such as the color red or
a European siren (both of which are associated with danger) will attract
attention more efficiently than other stimuli of equal intensity. One exception to
this, however, is if one of these "meaningful" signals is presented repeatedly
without accompanying important information (as with false alarms). In this
case, it is not difficult to learn to ignore a signal that previously attracted
attention efficiently.
Ease of Interpretation
Another factor that affects response time is how intuitive the meaning of the
symbol is to the user. For example, one of the first TCAS prototypes used a red
arrow to convey to the pilot the urgency of the alert (red) and the direction in
which the pilot should fly. Even after training, some pilots felt that there could
be instances in which pilots would be unsure as to whether a red arrow
pointing up meant that they should climb or that the traffic was above them.
The arrow was changed to a red or amber arc on the IVSI (Instantaneous
Vertical Speed Indicator), with instructions to the pilot to keep the IVSI
needle out of the red arc.
Expectations and Context
Expectations and context have a strong influence on response time. Responses
to a stimulus that occurs very frequently, or one that we expect to occur, will
be faster than to one that occurs once every month. However, expectations may
also lead to inaccurate responses, when what is expected is not what occurs. In
many situations, particularly ambiguous ones, we see what we expect to see and
we hear what we expect to hear.
The following ASRS report (October, 1989) entitled "Something Blue" illustrates
the power of expectation:
"On a clear, hazy day with the sun at our backs we were being vectored for
an approach...at 6000' MSL. Approach advised us of converging IFR traffic at
10 o'clock, 5000, NE bound. After several checks in that position I finally
spotted him maybe 10 seconds before he passed beneath us... When I looked
up again I saw the small cross-section and very bright landing light of a jet
fighter at exactly 12:00 at very close range at our altitude... I overrode the
autopilot and pushed the nose over sharply. As I was pulling back the thrust
levers and cursing loudly, the "fighter" turned into a silver mylar balloon with
a blue ribbon hanging from it! I could see what it was when it zipped just
over our heads and the sunlight no longer reflected directly back in my eyes
(the landing light). I was convinced it was a military fighter, complete with
the usual trail of dark smoke coming out the back (the blue ribbon?)!
Then -- I remembered the traffic directly below us! I pulled the nose up just
as sharply as before. Fortunately, everyone was seated in the back, and there
were no injuries or damage... Our total altitude deviation was no more than
200'."
In this case, the expectation or "set" to spot traffic led to a false identification
of an object and, consequently, an inappropriate response to it.
Another good example of the powers of expectation is seen in the videotapes
that Boeing made of their original TCAS simulation studies. In this study, the
pilots had the traffic information display available to them and often tried to
predict what TCAS was going to do. In one case with a crew of two
experienced pilots, the pilot flying looked at the traffic alert (TA) display and
said, "I think we'll have to go above these two guys" (meaning other aircraft).
This set up the expectation for both crewmembers for a "climb" advisory. The
crew started to climb when they received their first TCAS message, "Don't
climb." The pilot flying told the pilot not flying to call Air Traffic Control (ATC)
and tell them what action they were taking. Without reservation, the pilot not
flying called ATC, said that in response to a TCAS alert, they were climbing to
avoid traffic. He also requested a block altitude. He then told the pilot flying
that they were cleared to climb. Meanwhile, as the climb was being executed,
"Descend" was repeated in the background over 25 times. Eventually, the pilot
not flying said, "I think it's telling us to go down." The next thing that is heard
on the tape is "[expletive], it changed. What a mess." Crash. (Boucek, personal
communication).
Anyone could have made a similar mistake. It is human nature to assess a
situation and form expectations. In support of the pilot's expectation, and
perhaps because of it, he didn't hear the first syllable, which was "don't" - he
heard the action word "climb". The idea was then cemented. It takes much more
information to change an original thought than it does to induce a different
original thought.
Practice
Another factor that affects response time is how practiced the response is. If the
response is a highly-practiced one, then response times will be quicker than if it
is a task that isn't performed very often.
User Confidence
Another important factor is trust in the system. This may, or may not, develop
with exposure to the system. Response time will increase with the time required
to evaluate the validity of the advisory. Confidence in the system and a
willingness to follow it automatically will result in shorter response times.
Number of Response Alternatives
Another factor that influences the decision component of response time is the
number of response alternatives. In the Ground Proximity Warning System
(GPWS), for example, once you decide to respond, there is only one possible response: to
climb. In TCAS II there are two response alternatives: to climb, or to descend.
With TCAS III, there are at least four alternatives: climb, descend, turn right, or
turn left. Studies have shown that the response time increases with the number
of response alternatives (see Boff and Lincoln, 1988 p. 1862 for a review).
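The growth of response time with the number of alternatives is commonly modeled by the Hick-Hyman law, RT = a + b log2(n + 1). The report does not give this formula, and the constants below are purely illustrative; the sketch only shows the qualitative trend for the GPWS/TCAS examples above.

```python
import math

def hick_hyman_rt(n_alternatives, a=0.2, b=0.15):
    """Predicted choice response time (seconds) under the Hick-Hyman law.

    RT = a + b * log2(n + 1). The constants a and b are illustrative
    placeholders, not values measured in this report.
    """
    return a + b * math.log2(n_alternatives + 1)

# One alternative (GPWS), two (TCAS II), four (TCAS III):
for system, n in [("GPWS", 1), ("TCAS II", 2), ("TCAS III", 4)]:
    print(f"{system}: {n} alternative(s) -> {hick_hyman_rt(n):.2f} s")
```

With any positive slope b, predicted response time rises with each added alternative, which is the pattern the Boff and Lincoln review describes.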
One relevant measure is the total time required to successfully communicate a
time-critical message. This was measured from the beginning of the controller's transmission
to the end of the pilot's correct acknowledgement (and included "say agains"
and other requests for repeats). This total time ranged from four to 40 seconds
and averaged 10 seconds. Ninety percent of the transmissions were successfully
completed within 17 seconds. Interestingly, times required to complete similar,
but not time-critical transmissions, such as turns issued by controllers for
reasons other than traffic avoidance, were very similar. The time required for
successful transmission of such calls ranged from four to 52 seconds with a
mean of 10 seconds.
Finally, it is interesting to note that many pilots' (and controllers') perception is
that a pilot's responses to GPWS and to time-critical calls are immediate. While
this is largely true, analysis of the data shows that even the immediate takes
time.
What Method of Testing Should Be Used?
The testing method of choice depends on the specific problem or question
under investigation and the available resources. Most importantly, the method
must be appropriate to the issue. For example, one would not consider a
questionnaire for measuring the time required to complete a small task, nor
would one collect data on pilot eye movements by asking the pilots where and
when they moved their eyes. Another necessary consideration is the amount and
type of testing resources available. Often, the most desirable type of test is too
expensive and many compromises are necessary. These compromises need to be
recognized, as do their implications for the interpretation of the test results.
Field Observations
One evaluation technique that is often used is field observation. This includes
any over-the-shoulder evaluations, such as sitting behind the pilot and observing
a specific pilot activity or sitting behind a controller team and observing their
interactions. One advantage to this method is that it allows investigators to
make observations in the most natural setting possible. It can increase our
understanding of the nature of processes and problems in the work
environment. Specifically, valuable insights can be gained as to where problems
might occur with a specific system or procedure and why they might occur.
One task in which field observations are helpful is in trying to determine the
information or cues that people use in performing a task. We, as humans, are
rarely aware of all of the information that we use in performing a task. This is
illustrated in a "problem" that Boeing Commercial Airplanes once had with one
of their engineering simulators. After flying the simulator, one pilot reported
that, "It felt right last week, but it just doesn't feel right this week." The
mechanics examined everything that could possibly affect the handling qualities
of the simulator. They took much of it apart and put it back together. They
fine-tuned a few things, but made no substantive changes. The pilot flew the
simulator again, but again reported that it still didn't "feel right." It seemed a
little better, but it just wasn't right. Someone finally realized that the engine
noise had inadvertently been turned off. The engine noise was turned back on
and suddenly, the simulator once again "handled" like the aircraft (Fadden,
personal communication).
While field observations are often useful as initial investigations into a problem,
their limitations often preclude objective conclusions. Their findings may be
more subjective than objective, are dependent on the conditions under which
the observations were made and can actually be affected by the observation
process itself.
One factor that affects the reliability of findings based on field observations is
the number of observations made. For example, a conclusion based on 10 test
flights is going to be more reliable (i.e., more repeatable) than one based on
three flights. Furthermore, the findings based on field observations are
condition-dependent. That is, the findings must be qualified with respect to the
specific conditions under which the observations were made. For example, if
you observed five test flights and they all happened to be in good weather,
with no malfunctions, et cetera, you may have observed only low or moderate
workload flights. Any findings based on these flights cannot then be
generalized to high-workload conditions.
Generally, the sooner after the relevant experience a questionnaire is administered, the more useful the results are likely to be. One
exception to this rule is a questionnaire that is used to examine the
effectiveness of a training program. That is, how much of the information that
is presented in training is retained over a given period of time. For such a
"test," a significant time interval (e.g., one month or longer) between exposure
to the training and the questionnaire would be useful. A test with such a delay
would be more effective than a test with no delay in predicting what
information will be remembered and accessible for use when needed in actual
operations.
Rating Scales
Rating scales are often very useful. Most scales offer five or seven choices.
Fewer than five choices is confining; more than seven makes it difficult to
define the differences between consecutive numbers on the scale.
For example, pilots reported preferring the Horizontal Situation Indicator (HSI), until they used the electronic map display (Boucek, personal
communication).
It has also been the case that pilots have preferred one thing on the ground
(e.g., a display with lots of high-tech options and information) and something
else (usually a simpler, less cluttered, version) once they tried to use it in actual
operations.
Even simple behaviors do not lend themselves to accurate judgments about our
own actions. As part of an evaluation of a prototype navigation display, the
Boeing flight deck integration team monitored pilots' eye movements as they
used a prototype navigation display. The team also asked the pilots to report
where they thought they were spending most of their time looking. There was
no systematic relation between where the pilots thought they were looking the
most and where the data actually showed that they were looking most (Fadden,
personal communication).
Laboratory Experiments
It is difficult, if not impossible, to investigate issues by manipulating factors in
actual operations. Such control is usually only available in a laboratory setting.
The goal of an experiment is to manipulate the variables under investigation
while keeping everything else constant. This careful manipulation of the key
variables allows investigators to determine which of them has an effect.
One common type of a laboratory experiment is a part-task simulation. Part-task
simulations are useful for studying simple questions, such as: "How long does it
take to notice a particular change in the display?" or "Will the user immediately
know what the symbols mean?" A part-task simulation is an ideal way to
conduct an in-depth test of a new display. It allows attention to be focused on
the details of the display before it is tested operationally in a full-mission
simulation. In addition to providing valuable results, a part-task simulation
often points to specific areas that should be tested in a full-mission simulation.
The full-mission simulation is, of course, a very desirable type of test because it
preserves the most realism, and thus, yields results that are easy to generalize to
the real world. Full-mission simulation can give the same degree of control as a
laboratory experiment, with the added benefits afforded by the realism.
The major drawback of full-mission simulation is that it is very expensive. The
costs for computer time, simulator time, the salary for the pilots and/or
controllers who participate, in addition to the other costs of research, can be
prohibitive for all but the largest, and most well-funded, of projects. Also, there
are only a few places in the country that have the capability to conduct
full-mission simulation studies.
Another limitation of simulation studies that must be considered when
interpreting the results is the priming effect. When pilots walk into a simulator
knowing that they are going to participate in a test of Warning System X, they
are expecting to see that system activated. They will see System X activated
more times in one hour than they are likely to see in an entire day of line
flying. This expectation leads to a priming effect which yields faster response
times than can be expected when the activation of System X is not anticipated.
For this reason, the response times obtained in simulations are faster than can
be expected in the real world and must be considered as examples of best-case
performance. How much faster the response times will be in simulation than in
actual operations is difficult to say as it depends on a variety of factors,
particularly the specific task. In addition to response times being faster, they are
also more homogeneous in simulation studies than would be expected in actual
operations. This reduced variability can result in a higher likelihood of
obtaining a statistically significant difference between two groups or conditions
in a simulation study than in actual operations. However, since data obtained in
actual operations are rarely obtainable, data from realistic simulation studies are
a good alternative.
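The link between reduced variability and statistical significance can be illustrated with a small sketch. The data below are hypothetical, not from this report: both comparisons have the same 0.3-second mean difference in response time, but the tightly clustered "simulator" samples yield a much larger t statistic than the widely spread "line operations" samples.

```python
import math
import random
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic for the difference between two sample means."""
    ma, mb = statistics.mean(sample_a), statistics.mean(sample_b)
    va, vb = statistics.variance(sample_a), statistics.variance(sample_b)
    se = math.sqrt(va / len(sample_a) + vb / len(sample_b))
    return (ma - mb) / se

random.seed(1)
# Same true 0.3 s difference between conditions, different spreads.
sim_a = [random.gauss(1.0, 0.1) for _ in range(20)]   # simulator: low variability
sim_b = [random.gauss(1.3, 0.1) for _ in range(20)]
ops_a = [random.gauss(1.0, 0.5) for _ in range(20)]   # operations: high variability
ops_b = [random.gauss(1.3, 0.5) for _ in range(20)]

print(abs(welch_t(sim_a, sim_b)))   # large t: the difference is easy to detect
print(abs(welch_t(ops_a, ops_b)))   # small t: same difference may not reach significance
```

The same mean difference is therefore more likely to test as "significant" in a simulation study, which is exactly why simulator results should be read as best-case estimates.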
Experimental Validity and Reliability
The goal of any evaluation is to have reliable and valid results. Reliability refers
to the repeatability of the results. If another investigator were to run the same
test with the same equipment and same type of test participants, what are the
chances that they would get the same results? In order to have repeatable
results, the results obtained need to be due to the factors that were
manipulated, and not to extraneous factors, chance, or anything peculiar to the
testing situation or individuals tested.
In any experiment, it is necessary to carefully manipulate the factors that will
be examined in the study and control all other variables (if only by keeping
them constant). Careful controls help to ensure that the results of the study are,
in fact, due to the factors examined and not to extraneous factors.
Validity refers to measuring what the test purports to measure. A classic
example of this is the IQ test. Does it really measure one's ability to learn? Do
the Scholastic Aptitude Tests (SATs) actually measure one's ability to succeed
in college? If the answer to this type of question is "no," then the test is not
valid.
One way to help ensure that the results of the study are valid and reliable, is to
employ careful controls of critical factors of interest and of extraneous factors
(such as fatigued participants) that may influence the results of the study. This
is easier said than done because it is often very difficult to even identify all of
the factors that may contribute to your results. However, careful selection of
test participants and testing conditions, in addition to a sound experimental
design, will help to ensure valid and reliable results. A sound experimental
design ensures that an adequate number of test participants ("subjects") are
properly selected and tested (in an appropriate number and order of conditions)
and that careful controls of the variables are included in the test.
Subject Pool
How closely the test subjects must match the target population for valid
results depends upon the task. These differences can be quite subtle, but
important.
For example, one approach to studying the similar call-sign problem might
involve determining which numbers are most likely to be confused when
presented auditorily. A sample research question would be "Is 225 more
confusable with 252, or 235?". This is a relatively simple task and the results
would comprise a confusability matrix. Because this is a simple auditory task,
pilots would not be expected to perform much differently than college students
(with the exception of the differences attributable to hearing loss due to age
and exposure to noise). In this case, performance depends solely on the ability
to hear the differences between numbers and results of experiments performed
with college students as subjects are likely to be applicable to pilots.
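A confusability matrix of the kind described above can be tallied directly from (presented, reported) trial pairs. The call signs and trial data below are hypothetical, chosen only to mirror the "Is 225 more confusable with 252, or 235?" example.

```python
from collections import Counter

def confusability_matrix(trials):
    """Tally how often each presented number was reported as each alternative.

    `trials` is a list of (presented, reported) pairs; the result maps each
    presented item to a dict of report counts.
    """
    counts = Counter(trials)
    presented = sorted({p for p, _ in trials})
    reported = sorted({r for _, r in trials})
    return {p: {r: counts[(p, r)] for r in reported} for p in presented}

# Hypothetical trial data, not results from this report.
trials = [("225", "225"), ("225", "252"), ("225", "252"),
          ("225", "235"), ("252", "252"), ("252", "225")]
matrix = confusability_matrix(trials)
print(matrix["225"])   # {'225': 1, '235': 1, '252': 2}
```

Reading across a row shows which alternative a presented number was most often confused with; here "225" was misheard as "252" twice but as "235" only once.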
Now consider a superficially similar, but technically very different, task. If the
experimental task was to look at the effect of numerical grouping on memory
for air traffic control messages, subjects might listen to messages with numerical
information presented sequentially (e.g., "Descend and maintain one, zero
thousand. Reduce speed to two two zero. Contact Boston Approach one one
niner point six five"), and messages with numerical information presented in
grouped form (e.g., "Descend and maintain ten thousand. Reduce speed to two
twenty. Contact Boston Approach one nineteen point sixty-five.") Since a pilot's
memory for that type of information is going to be very different from a college
student's memory of that information (mostly because it is meaningful to the
pilot), results obtained by using college students would probably not be directly
applicable to pilot populations.
One important aspect in which subjects should be representative of the target
population is in terms of skill level. It is highly unlikely that a test pilot can
successfully train himself to react or think like a line pilot. A below-average
pilot (or an average pilot on a bad day) is likely to experience more difficulties
with a new system than a skilled test pilot, or an Aircraft Evaluation Group
(AEG) pilot. It is very difficult for a highly experienced operator to predict how
people without prior knowledge or specific experiences will perform a certain
task or what mistakes they are likely to make. Exceptional skill can enable an
operator to compensate for design flaws - flaws which, because of the skill, may
go unnoticed.
Controlling Subject Bias
While it is important that the people used as subjects are as similar as possible
to the people to whom you want to generalize the results, it is also important
that the subjects' biases don't affect the results of the test. If the participants
have their own ideas as to how the results should come out, it is possible for
them to influence the results, either intentionally or unintentionally. It is not
unusual for subjects to be able to discern the "desirable" test outcome and
respond accordingly. To prevent this, investigators must take steps to control
subject bias. For example, studies designed to test the efficacy of a new drug
often employ a control group that receives a placebo (sugar pill). None of the
subjects knows whether he or she is in the group receiving the new drug or in
the group given the placebo. Some studies are conducted "double-blind"
meaning that even the experimenters who deal with the subjects do not know
who is receiving the placebo and who is receiving the drug.
In aviation applications, it is usually impossible to conduct a test (e.g., of new
equipment) without the participants knowing the purpose of the test.
Furthermore, this is often undesirable, since subjects' opinions (e.g., of the new
display) can be a vital component of the data. One solution to the problem of
controlling or balancing the effects of biases and expectations is the use of a
control group. This group of subjects is tested under the same conditions (and
presumably would have the same expectations) as the experimental group, but
is not exposed to the tested variable.
For example, consider a test designed to examine the effectiveness of a new
training program for wind shear (e.g., on the time required to maneuver based
on a recognition of wind shear, number of simulated crashes, etc.). If the new
training program is to be compared to an existing program, then the
performance of pilots who were trained in each program could be compared.
Pilots trained in the new program would be the experimental group and pilots
trained in the existing program would constitute the control group. If the
training program was a prototype and there was no such comparison to be
made, then the performance of pilots trained with the new program
(experimental group) could be compared to that of pilots who did not receive
this training (control group). In this case, however, it would be important to
control for test expectations. If, for example, the test wind shear scenarios were
presented within days of the training, then the pilots would naturally expect
wind shear to occur in the simulation sessions. This expectation would be
expected to improve their performance over what it would be if wind shear was
not anticipated. In this case, for the comparison between the two groups to be
meaningful, pilots in both groups would need to be informed of the purpose of
the test or be caught by surprise.
Another way to control subject bias is with careful subject selection. A good
example of this is illustrated in a test conducted at the FAA Civil Aeromedical
Institute to look at low-visibility minimums for passive auto-land systems
(Huntley, unpublished study). The Air Transport Association (ATA) wanted
lower minimums than the Air Line Pilots' Association (ALPA) thought was safe.
Clearly, both of these groups had a stake in the outcome of the test. When the
simulation study was conducted, a portion of the subject pilots came from ALPA
and an equal number of pilots came from ATA (Huntley, personal
communication). While it is impossible to get rid of the biases that people bring
to a test, it is usually possible to balance them out.
Representative Test Conditions
It is important to test under a representative range of
conditions so that the results of the test can be generalized to actual operations.
Important conditions may include (but are not limited to): varied workload
levels, weather conditions, ambient illumination levels (i.e., lighting conditions),
ambient noise conditions, traffic levels, etc. For example, if a data input device
is designed to be used in the cockpit, then it is important to ensure that it is
easily used in a wide variety of lighting conditions and in turbulence (when it
is difficult to keep a steady hand).
It is often important to include the "worst-case" scenario in addition to
representative conditions in a test. Most human factors evaluations must include
a worst case test condition, since it is the worst case (e.g., combination of
failures) that often results in a dangerous outcome. For example, if it is
important that a time-critical warning system be usable in all conditions, then
the operator response time that is assumed by the software's algorithm needs to
take this into account. In this case, in addition to measuring how long it will
take the average person under average conditions to respond to the system, the
longest possible response time, or response time at the 95th or 99th percentile,
should also be measured. Such "worst case" response times should be obtained
under "worst case" conditions.
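A worst-case criterion such as the 95th or 99th percentile response time can be computed directly from measured samples. The timing data below are hypothetical; the sketch uses simple linear interpolation between ranks, one of several common percentile conventions.

```python
def percentile(values, p):
    """p-th percentile (0-100) using linear interpolation between ranks."""
    ordered = sorted(values)
    k = (len(ordered) - 1) * p / 100
    lo, hi = int(k), min(int(k) + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (k - lo)

# Hypothetical response times (seconds) measured under worst-case conditions.
times = [1.2, 1.4, 1.5, 1.6, 1.8, 2.0, 2.1, 2.4, 3.0, 4.8]
print(percentile(times, 50))   # typical performance
print(percentile(times, 95))   # worst-case criterion for a warning algorithm
print(percentile(times, 99))   # even more conservative criterion
```

Note how strongly the tail dominates: the slowest few responses pull the 95th and 99th percentiles well above the median, which is why an alerting algorithm tuned to the average responder can fail the slowest ones.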
Counter-balancing
One control that is not necessary in the engineering world but can be critical in
the human factors world is counter-balancing. When measuring the noise level
of two engines, it doesn't matter which one is tested first; the test of the first
engine will not affect the outcome of the test of the second engine. When
testing human performance, however, such order effects are common.
There are two ways in which human performance can change during the
course of a test: it can get better or worse. Performance may improve because
exposure to the first system gives subjects some information that helps them in
using the second system. This is called positive transfer. For example, in a test
of two data input devices, it would be reasonable to have pilots use each of
them and measure the time required to perform specific tasks (response time)
with each system. The number of errors made in the data input process
(response accuracy) would also be measured. Performance with System A could
be compared to perform3nce with System B to determine which of the two
systems is preferable. If the procedures for two systems are similar (e.g., in
terms of keypad layout, the required order of the information input, etc.) but
new to the pilot, then the practice acquired during test of System A might
improve his or her performance with System B over what it would have been
without the experience gained during the first test.
However, if the two systems are physically similar, but require different
procedures to operate, then the experience acquired with the use (test) of
System A would probably impair performance with System B. Performance with
System B would have been better with no previous experience with System A.
This phenomenon is referred to as negative transfer.
One way to avoid the possibility of positive or negative transfer influencing test
results is to balance the order of conditions. For example, in a comparison of
two navigation displays, a test could be conducted in which half of the pilots
are tested with one display and half the pilots are tested using the other
display. In this case, it is particularly important to ensure that there are no
important differences in the two pilot populations (e.g., in terms of skill level).
Alternatively, each pilot could be tested using both displays, with half of them
using Display A first and half of them using Display B first. This is referred to
as "counter-balancing."
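The assignment scheme described above is easy to automate. The sketch below (using hypothetical subject identifiers) alternates the order of the two display conditions across subjects so that any transfer effect is balanced:

```python
def counterbalance(subjects):
    """Assign each subject an order of conditions, alternating A-first
    and B-first so that transfer effects are balanced across subjects."""
    orders = {}
    for i, subject in enumerate(subjects):
        orders[subject] = ("A", "B") if i % 2 == 0 else ("B", "A")
    return orders

# Hypothetical pilot identifiers; half see Display A first, half Display B.
assignments = counterbalance(["P1", "P2", "P3", "P4"])
```

With a larger subject pool, subjects should also be assigned to the two orders at random, within the constraint that each order occurs equally often.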
There is another reason why performance may deteriorate over the course of a
test. If the test is extremely long or the task is very tedious, performance may
suffer due to a fatigue effect. When fatigue may be a factor in a test, careful
controls (such as the use of an appropriate control group or balancing the order
of conditions) must be considered. One study of the effects of fatigue on flight
crews illustrates this point. Foushee, Lauber, Baetge, and Acomb (1986)
investigated the effects of fatigue on flight crew errors. They had two groups of
active line pilots fly a LOFT-type scenario in a full-mission simulation. Ten
flightcrews flew the scenario within two to three hours after completing a
three-day, high density, short-haul duty cycle. The other ten flightcrews flew the
test scenario after a minimum of three days off. The results showed that while
the "Post-Duty" crews were more fatigued than the "Pre-Duty" crews, their
performance was better, presumably because they had the benefit of recent
operating experience together as a crew.
Measures of Central Tendency
The mean. The mean is the arithmetic average of a set of scores. If, for
example, a set of response times (in msec) were:
200, 225, 275, 300, 400, 400, 500, 1450
then the mean would be (200 + 225 + 275 + 300 + 400 + 400 + 500 +
1450)/8 or 469 msec. The mean is considered to be the fulcrum of a data set
because the deviations in scores above it balance the deviations in scores below
it. The sum of the deviations about the mean is always zero. Because of this,
the mean is very sensitive to outlying scores, that is, scores that are very
different from the rest. A very high or very low score will tend to pull the mean
in the direction of that score. In our example data set cited above, the mean of
the first seven scores is 329 msec (compared to the mean of 469 with the score
of 1450). While the mean is cited more frequently than the median or the
mode, for this reason it is not always appropriate to cite the mean alone.
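The pull of an outlying score on the mean is easy to verify with the example response times from the text:

```python
from statistics import mean

scores = [200, 225, 275, 300, 400, 400, 500, 1450]  # response times in msec

print(round(mean(scores)))       # 469: mean of all eight scores
print(round(mean(scores[:-1])))  # 329: mean without the outlying 1450
```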
The median. The median is the score above which 50 percent of the scores fall
and below which the other 50 percent fall. With an odd number of
scores, the median is the score in the middle when the scores are arranged from
lowest to highest. With an even number of scores, the median is the average of
the two middle scores. In the example array of data cited above, the median
would be the average of 300 and 400 or 350 msec. One advantage of the
median is that it is less sensitive to outlying data points. When there are a few
scores that are very different from the rest, then the median should be
considered as well as the mean.
The mode. The mode is the most frequently occurring score. In our example
data set, the mode is 400, since it is the only score that occurs more than once.
It is possible, especially with very small data sets, to have no mode. In
very large data sets, it is possible to have multiple modes. While the mode is
the most easily computed measure of central tendency, it is also less stable than
the mean or median, and hence, usually not as useful.
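Both of these measures can be computed directly for the same example data set:

```python
from statistics import median, mode

scores = [200, 225, 275, 300, 400, 400, 500, 1450]  # response times in msec

# With an even number of scores, the median is the average of the two
# middle scores (here, 300 and 400).
print(median(scores))  # 350.0

# The mode is the only score that occurs more than once.
print(mode(scores))    # 400
```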
Measures of Variability
A measure of central tendency, when presented in isolation, cannot fully
describe the test results. In addition to the mean or median, we also need to
know how close or disparate the scores were. In other words, how
homogeneous were the scores as a group? For example, did half of the pilots
take five seconds to perform the task and half of them require ten seconds or
did they all take about 7.5 seconds? To answer this type of question, we need
to compute a measure of variability, also known as a measure of dispersion.
The most commonly used measure of dispersion is the standard deviation. The
standard deviation takes into account the number of scores and how close the
scores are to the mean.
The standard deviation (abbreviated as "s" or "s.d.") is the square root of the
variance. The variance (s²) equals the sum of the squared deviations of each
score from the mean divided by the number of scores minus one. One equation
for computing the variance is as follows:

    s² = Σ(X − X̄)² / (n − 1)

Where:
Σ is the summation sign,
X represents each score,
X̄ equals the mean of the distribution, and
n equals the number of scores in the distribution.
To compute the standard deviation in this way, we subtract each score from the
mean, square each difference, add the squares of the differences, divide this
sum by the number of scores (or the number of scores minus one), and take
the square root of the result. Relatively small standard deviation values are
indicative of a homogeneous set of scores. If all of the scores are the same, for
example, the standard deviation equals zero. In our sample set of data used to
compute the mean, the standard deviation equals 383 msec.
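Both forms of the computation can be checked directly. Note that the 383 msec cited here corresponds to dividing the sum of squared deviations by n; dividing by n − 1 instead gives a slightly larger value:

```python
from statistics import pstdev, stdev

scores = [200, 225, 275, 300, 400, 400, 500, 1450]  # response times in msec

print(round(pstdev(scores)))  # 383: squared deviations divided by n
print(round(stdev(scores)))   # 409: squared deviations divided by n - 1
```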
Another use of the standard deviation is that it helps us to determine what
scores, if any, we are justified in discarding from the data set. Studies in visual
perception, for example, often use stimuli that are presented for very brief
exposure durations (e.g., less than one-half of a second). In this case, a sneeze,
lapse in attention, or other chance occurrence, could produce an extraordinarily
long response time. This data point would not be representative of the person's
performance, nor would it be useful to the experimenter. What objective
criterion could be used to decide whether this data point should be included in
the analysis?
In the behavioral sciences, it is considered acceptable to discard any score that
is at least three standard deviations above or below the mean. In our sample set
of data, if we discard the outlying score of 1450, the standard deviation
becomes 100. Under the convention of discarding scores at least three standard
deviations above or below the mean, however, leaving this score out of the
analysis would not be acceptable. In this example, only scores above 1635 would be
legitimately left out of the analysis. (In this case, it is impossible to have a
score three standard deviations below the mean, because it would indicate a
negative response time.)
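The three-standard-deviation criterion can be applied mechanically. In this sketch the full data set's own mean and standard deviation (computed with the n divisor, as in the text) define the cutoff, and the 1450 msec score survives it:

```python
from statistics import mean, pstdev

scores = [200, 225, 275, 300, 400, 400, 500, 1450]  # response times in msec
m, sd = mean(scores), pstdev(scores)

# Keep only scores within three standard deviations of the mean.
kept = [x for x in scores if abs(x - m) <= 3 * sd]

print(kept == scores)  # True: even 1450 falls below the cutoff
```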
Correlation
Correlation is a commonly used descriptive statistic that describes the relation
between two variables. A correlation coefficient is reported as "r = x", where "x"
equals some number between negative one and one. When two variables are
unrelated (e.g., number of rainy days per month in Kansas and cost of airline
fares), the correlation coefficient is near zero. A high positive "r" indicates that
high values in one variable are associated with high values in the other
variable. A high negative "r" indicates that high values in one variable are
associated with low values in the other variable. A correlation of .7 or greater
(or -.7 or less) is usually regarded as indicative of a strong relation between the
two factors. An important note about correlation is that even a very high
correlation (e.g., r =.90) does not imply causality or a cause-effect relationship.
A correlation coefficient merely indicates the degree to which two factors varied
together, perhaps, as a result of a third variable that remains to be identified.
Another way in which the correlation coefficient is useful is that, when squared,
it indicates the percentage of the variance that is accounted for by the
manipulated factors. For example, with a correlation coefficient of .7, the
factors that were examined in the analysis account for only 49 percent of the
variance (i.e., the variability in the data). The other 51 percent is due to chance
or things that were not controlled.
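A Pearson correlation coefficient, and the proportion of variance it accounts for, can be computed in a few lines; the workload and error values below are invented purely for illustration:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

workload = [1, 2, 3, 4, 5]  # hypothetical workload ratings
errors = [2, 3, 5, 4, 7]    # hypothetical error counts

r = pearson_r(workload, errors)
print(round(r, 2))      # the correlation coefficient, about 0.90
print(round(r * r, 2))  # proportion of variance accounted for, about 0.82
```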
The statistics discussed above describe the test results and are, therefore,
referred to as "descriptive statistics." Inferential statistics are used to determine
whether two or more samples of data are significantly different, for example, if
performance on System A is significantly better or worse than performance on
System B.
The most commonly cited inferential statistics are the t-test and analysis of
variance. Each method of analysis has an underlying set of assumptions. If these
assumptions are seriously violated, or the analysis is inappropriate for the
experimental design, then the conclusions based on the analysis are
questionable.
Student's t Ratio
The t-test is used to compare two sets of scores, for example, the
performance of two groups of pilots - one using System A and the other
using System B. When both sets of scores are taken from the same group of
people, Student's t ratio for correlated samples is appropriate. When the scores
of two different groups of people are examined, Student's t ratio for
independent samples is appropriate. The formulas for computing a t-ratio (and
all of the statistics discussed in this chapter) can be found in Experimental
Statistics (Natrella, 1966) and in most statistics textbooks. Both types of t-tests
look at the differences between the two groups of scores with reference to the
variability found within the groups. They provide an indication as to whether or
not the difference between the two groups of scores is statistically significant.
The results of a t-test are typically reported in the following format:
t(df) = x, p < .xx
Where:
"df" equals the number of degrees of freedom,
"x" equals the computed t-value, and
".xx" equals the probability value.
For example: t(20) = 3.29, p < .01.
Degrees of freedom (df) refers to the number of values that are free to vary,
once we have placed certain restrictions on the data. In the case of a t-test for
correlated samples, the number of degrees of freedom equals the number of
subjects minus one. For independent samples, df equals the number of subjects
in one group added to the number of subjects in the other group minus two. In
both cases, as the number of subjects increases (and, hence, the number of df
increases), a lower t-value is required to achieve significance.
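A minimal sketch of Student's t ratio for independent samples, using the pooled-variance formula and invented response data; the degrees of freedom follow the rule described above:

```python
from math import sqrt
from statistics import mean, variance

def t_independent(a, b):
    """Student's t ratio for two independent samples (pooled variance)."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2  # t value and degrees of freedom

# Hypothetical error counts for two independent groups of pilots.
system_a = [1, 2, 3, 4, 5]
system_b = [3, 4, 5, 6, 7]

t, df = t_independent(system_a, system_b)
print(round(t, 2), df)  # -2.0 with 8 degrees of freedom
```

The computed t would then be compared against a tabled critical value for 8 degrees of freedom to decide whether the difference is statistically significant.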
Statistical Significance
The p value relates to the probability that this specific result was achieved by
chance. This is true not only for the t-values, but for all other statistics as well.
A "p < .01" indicates that the probability that this result would be achieved by
chance (and not due to the manipulated factors) is less than one in 100. When
the results are significant at the .05 level (i.e., p ≤ .05), the chances of the
results occurring by chance are 5 in 100, or less. Very often, the statistic is cited
at the end of a statement of the results. For example, "The number of errors
was significantly higher in the high workload condition than in the low
workload condition (t(15) = 2.25, p < .05)." It can also be used to show that
there were no statistically significant differences between two conditions. For
example, "The number of errors in the high workload conditions was not
significantly different from the number of errors in the low workload
conditions."
Analysis of Variance
When the effects of two or more factors are examined at once, analysis of
variance is used. Consider, for example, a hypothetical test of the effects of
communication delay (delay versus no delay) and frequency congestion (low,
moderate, or high) on the number of step-ons. One possible result is that the
delay condition produces more step-ons than the no-delay condition. Graphically,
this possibility might look like the following:
[Figure: number of step-ons as a function of frequency congestion (low,
moderate, high), plotted separately for the delay and no-delay conditions.]
Another possible result is that the only significant effect was due to frequency
congestion. This could mean that the number of step-ons increased with
frequency congestion regardless of the delay condition. Graphically, this
possibility might look like this:
[Figure: number of step-ons increasing with frequency congestion, with the
delay and no-delay curves following the same pattern.]
[Figure: a third possible pattern of results: number of step-ons as a function
of frequency congestion, plotted separately for the delay and no-delay
conditions.]
These are only examples of the types of results that may produce significant
effects.
Regression Analysis
A special case of analysis of variance that is often used is regression analysis.
Regression analysis fits the data to a mathematical function. The
function may be a straight line, a parabola, or any other function. The analysis
provides an indication of how well the data fits that particular function.
One of the advantages of regression analysis is that it is very forgiving of empty
cells in an experimental design (i.e., conditions in the design that do not have
as many data points as the other conditions). For example, if we wanted to test
how many mistakes pilots were likely to make with a certain system, but were
most interested in the number of errors to be expected under conditions of high
workload, then we might run a test with the majority of responses being in
high workload conditions. Perhaps some pilots would only be tested in the high
workload condition. Because of this asymmetry of data points in the high and
335
experiment to situations that were not included in the test. In our hypothetical
example, regression analysis would be appropriate if communication delays of 0
msec., 250 msec., and 500 msec. were tested and we wanted an estimate of the
number of step-ons that could be expected at delays of 300 or 600 msec.
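A straight-line least-squares fit of this kind can be sketched as follows; the delay and step-on values are invented for illustration, and the fitted line is then used to estimate the result at a 300 msec delay that was never tested:

```python
delays = [0, 250, 500]  # hypothetical communication delays (msec)
step_ons = [2, 5, 8]    # hypothetical numbers of step-ons observed

n = len(delays)
mx, my = sum(delays) / n, sum(step_ons) / n

# Least-squares estimates of slope and intercept for y = a + b*x.
slope = (sum((x - mx) * (y - my) for x, y in zip(delays, step_ons))
         / sum((x - mx) ** 2 for x in delays))
intercept = my - slope * mx

# Interpolate to a delay that was not included in the test.
predicted = intercept + slope * 300
print(round(predicted, 1))  # about 5.6 step-ons expected at 300 msec
```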
A final note about data analysis concerns the differences between statistically
significant and operationally significant results. Most statisticians only seriously
consider results that are statistically significant at the .05 level or better. This
enables the investigator to be reasonably certain that the findings were not due
to chance. A statistically significant difference may, however, be very small as
long as it is consistent. This may or may not be operationally useful. This
difference between statistical significance and operational significance is often
overlooked. A difference in response times of half of a second may be
statistically significant, but may not be operationally important, depending upon
the task.
On the other hand, when the experimental focus is actual operations, results
that are not statistically significant at the .05 level may still be important. For
example, if the focus of the experiment is serious operator errors that could
lead to an accident, even differences that do not reach statistical significance
may warrant further attention.
REFERENCES
Chapters 1-4
Albers, J. (1975) Interaction of Color. New Haven, CT: Yale University Press.
Alpern, M. (1971) Effector mechanisms in vision. In J.W. Kling & L.A. Riggs
(Eds.) Experimental Psychology. New York: Holt, Rinehart & Winston, 367-394.
Appelle, S. (1972) Perception and discrimination as a function of stimulus
orientation: The oblique effect in man and animals. Psychological Bulletin, 78,
266-278.
Bedford, R.E. & Wyszecki, G. (1958) Wave length discrimination for point
sources. Journal of the Optical Society of America, 48, 129-135.
Békésy, G. von & Rosenblith, W.A. (1951) The mechanical properties of the ear.
In S.S. Stevens (Ed.), Handbook of Experimental Psychology. New York: John Wiley
& Sons, Inc., 1075-1115.
Blakemore, C.B. & Sutton, P. (1969) Size adaptation: A new aftereffect. Science,
166, 245-247.
Boettner, E.A. (1967) Spectral Transmission of the Eye (data only). Report for
contract AF41(609)-2966. University of Michigan. Prepared for USAF School of
Aerospace Medicine, Aerospace Medical Division (AFSC), Brooks Air Force Base,
Texas.
Bouman, M.A. & Walraven, P.L. (1957) Some color naming experiments for red
and green monochromatic lights. Journal of the Optical Society of America, 47,
834-839.
Boynton, R.M. (1978) Color in contour and object perception. In E.C. Carterette
& M.P. Friedman (Eds.), Handbook of Perception. New York: Academic Press.
Bowmaker, J.L., Dartnall, H.J.A. & Mollon, J.D. (1980) Microspectrophotometric
demonstration of four classes of photoreceptors in an old world primate Macaca
fascicularis. Journal of Physiology, 298, 131-143.
Brown, J.F. (1931) The thresholds for visual velocity. Psychologische Forschung,
14, 249-268.
Brown, J.L. (1965) Flicker and intermittent stimulation. In C.H. Graham (Ed.),
Vision and Visual Perception. New York: John Wiley & Sons, Inc., 251-320.
Campbell, F.W. & Robson, J.G. (1968) Application of Fourier analysis to the
visibility of gratings. Journal of Physiology, 197, 551-566.
Campbell, F.W. & Westheimer, G. (1960) Dynamics of accommodation responses
of the human eye. Journal of Physiology, 151, 285-295.
Campbell, F.W. & Wurtz, R.H. (1978) Saccadic omission: Why we do not see a
grey-out during a saccadic eye movement. Vision Research, 18, 1297-1303.
Carter, R.C. (1979) Visual search and color coding. Proceedings of the Human
Factors Society 23rd Annual Meeting, 369-373.
Chapanis, A. (1965) Color names for color space. American Scientist, 53,
327-346.
Christ, R.E. (1975) Review and analysis of color coding research for visual
displays. Human Factors, 17, 542-570.
Cole, B.L. & Macdonald (1988) Defective Colour Vision Can Impede Information
Acquisition from Colour Coded Video Displays. Department of Aviation Report
No. 3, Melbourne, Australia.
Coren, S., Porac, C., & Ward, L.M. (1984) Sensation and Perception. (2nd edit.)
Orlando, FL: Harcourt Brace & Company.
Cornsweet, T.N. (1970) Visual Perception. New York: Academic Press.
Craik, F.I.M. & Lockhart, R.S. (1972) Levels of processing: A framework for
memory research. Journal of Verbal Learning and Verbal Behaviour, 11, 671-684.
Davis, R.G. & Silverman, S.R. (1960) Hearing and Deafness. New York: Holt,
Rinehart & Winston.
DeHaan, W.V. (1982) The Optometrist's and Ophthalmologist's Guide to Pilot's
Vision. Boulder, CO: American Trend Publishing Company.
deLange, H. (1958) Research into the nature of the human fovea-cortex systems
with intermittent and modulated light. I. Attenuation characteristics with white
and colored light. Journal of the Optical Society of America, 48, 777-784.
DeValois, R.L. & DeValois, K.K. (1988) Spatial Vision. New York: Oxford
University Press.
Fiorentini, A., Baumgartner, G., Magnussen, S., Schiller, P.H. & Thomas, J.P.
(1990) The perception of brightness and darkness: Relations to neuronal
receptive fields. In L. Spillmann & J.S. Werner (Eds.), Visual Perception: The
Neurophysiological Foundations. New York: Academic Press, 129-161.
Fletcher, H. (1953) Speech and Hearing in Communication. (2nd edit.) New
York: Van Nostrand Company.
Fletcher, H. & Munson, W.A. (1933) Loudness, its definition, measurement and
calculation. Journal of the Acoustical Society of America, 5, 82-108.
Frisby, J.P. & Mayhew, J.E.W. (1976) Rivalrous texture stereograms. Nature,
264, 53-56.
Fuld, K., Werner, J.S. & Wooten, B.R. (1983) The possible elemental nature of
brown. Vision Research, 23, 631-637.
Gibson, J.J. (1966) The Senses Considered as Perceptual Systems. Boston:
Houghton Mifflin.
Hunt, G. (1982) Cathode ray tubes. In R.D. Bosman (Ed.) Modern Display
Technologies and Applications. AGARD Advisory Report No. 169, 37-53.
Hurvich, L.M. (1981) Color Vision. Sunderland, MA: Sinauer Associates.
Iavecchia, J.H., Iavecchia, H.P. & Roscoe, S.N. (1988) Eye accommodation to
head-up virtual images. Human Factors, 30, 689-702.
Jameson, D. & Hurvich, L.M. (1972) Color adaptation: Sensitivity, contrast,
after-images. In D. Jameson & L.M. Hurvich (Eds.), Handbook of Sensory
Physiology, Vol. VII/4 (pp. 568-581). Berlin: Springer-Verlag.
Judd, D.B. (1951) Basic correlates of the visual stimulus. In S.S. Stevens (Ed.),
Handbook of Experimental Psychology. New York: John Wiley & Sons, Inc.,
811-867.
Julesz, B. (1971) Foundations of Cyclopean Perception. Chicago: University of
Chicago Press.
Kaiser, P.K. (1968) Color names of very small fields varying in duration and
luminance. Journal of the Optical Society of America, 58, 849-852.
Kardon, D. (1989) Electroluminescent backlights for liquid crystal displays.
Information Display, 6, 17-21.
Kaufman, L. (1974) Sight and Mind: An Introduction to Visual Perception. New
York: Oxford University Press.
Kaufman, L. & Rock, I. (1962) The moon illusion. Scientific American, 207,
120-132.
Kelly, D.H. (1972) Adaptation effects on spatiotemporal sine wave thresholds.
Vision Research, 12, 89-101.
Kelly, D.H. (1974) Spatio-temporal frequency characteristics of color-vision
mechanisms. Journal of the Optical Society of America, 64, 983-990.
Kinney, J.A.S. (1983) Brightness of colored self-luminous displays. Color
Research and Application, 8, 82-89.
Klingberg, C.S., Elworth, C.S. & Filleau, C.R. (1970) Image quality and
detection performance of military photointerpreters. Boeing Company Report
D162-10323-1.
Krupka, D.C. & Fukui, H. (1973) The determination of relative critical flicker
frequencies of raster-scanned CRT displays by analysis of phosphor persistence
characteristics. Proceedings of the Society for Information Display, 14, 89-91.
Kuo, K.H. & Kaimanash, M.H. (1984) Automatic chrominance compensation for
cockpit color displays. SID 84 Digest, 65-67.
Kurtenbach, W., Sternheim, C.E. & Spillmann, L. (1984) Change in hue of
spectral colors by dilution with white light (Abney effect). Journal of the
Optical Society of America, A, 1, 365-372.
Laycock, J. (1982) Recommended colours for use in airborne displays. Technical
Report 82110, Royal Aircraft Establishment, Farnborough, UK.
Leibowitz, H.W. (1983) A behavioral and perceptual analysis of grade crossing
accidents. Operation Lifesaver National Symposium 1982. Chicago: National
Safety Council.
Mach, E. (1865) Über die Wirkung der räumlichen Verteilung des Lichtreizes
auf die Netzhaut. Sitzungsberichte der mathematisch-naturwissenschaftlichen
Classe der kaiserlichen Akademie der Wissenschaften, 52, 303-322.
Marc, R. E. & Sperling, H. G. (1977) Chromatic organization of primate cones.
Sdence, 196, 454-456.
Matlin, M.W. (1983) Sensation and Perception. Boston: Allyn and Bacon.
Nickerson, D. & Newhall, S.M. (1943) A psychological color solid. Journal of the
Optical Society of America, 33, 419-422.
Osterberg, G. (1935) Topography of the layer of rods and cones in the human
retina. Acta Ophthalmologica, Supplement 6.
Owsley, C., Sekuler, R. & Siemsen, D. (1983) Contrast sensitivity throughout
adulthood. Vision Research, 23, 689-699.
Pearlman, A.L., Birch, J. & Meadows, J.C. (1979) Cerebral color blindness: An
acquired defect in hue discrimination. Annals of Neurology, 5, 253-261.
Pettigrew, J.D. (1972) The neurophysiology of binocular vision. Scientific
American, 227, 84-96.
Pitts, D.G. (1982) The effects of aging on selected visual functions: Dark
adaptation, visual acuity, stereopsis and brightness contrast. In R. Sekuler, D.W.
Kline and K. Dismukes (Eds.), Aging and Human Visual Function. New York:
Liss.
Pokorny, J., Smith, V.C., Verriest, G. & Pinckers, A.J.L.G. (1979) Congenital and
Acquired Color Vision Defects. New York: Grune & Stratton.
Polyak, S. (1941) The Retina. Chicago: University of Chicago Press.
Purdy, D.M. (1931) Spectral hue as a function of intensity. American Journal of
Psychology, 43, 541-559.
Regan, D., Beverly, K. & Cynader, M. (1979) The visual perception of motion in
depth. Scientific American, 241, 136-151.
Richards, W. (1970) Stereopsis and stereoblindness. Experimental Brain
Research, 10, 380-388.
Riggs, L.A. (1965) Visual acuity. In C.H. Graham (Ed.), Vision and Visual
Perception. New York: John Wiley & Sons, Inc., 321-349.
Riggs, L.A., Ratliff, F., Cornsweet, J.C. & Cornsweet, T.N. (1953) The
disappearance of steadily fixated visual test objects. Journal of the Optical
Society of America, 43, 495-501.
Riggs, L.A., Volkmann, F.C. & Moore, R.K. (1981) Suppression of the blackout
due to blinks. Vision Research, 21, 1075-1079.
Schlam, E. (1988) Thin-film electroluminescent displays. Information Display, 4,
10-13.
Sekuler, R. (1974) Spatial vision. Annual Review of Psychology, 25, 195-232.
Sekuler, R. & Blake, R. (1985) Perception. New York: McGraw Hill, Inc.
Sherr, S. (1979) Electronic Displays. New York: John Wiley & Sons, Inc.
Snyder, H.L. (1980) Human Visual Performance and Flat Panel Display Image
Quality. Technical Report HFL-80-1/ONR-80-1.
Snyder, H.L. (1988) Image Quality. In M. Helander (Ed.), Handbook of
Human-Computer Interaction. Amsterdam: Elsevier Science Publishers.
Society of Automotive Engineers (1988) Human Engineering Considerations in
the Application of Color to Electronic Aircraft Displays. ARP4032. Warrendale,
PA: SAE.
Stokes, A.F. & Wickens, C.D. (1988) Aviation displays. In E.L. Wiener & D.C.
Nagel (Eds.), Human Factors in Aviation. New York: Academic Press.
Taylor, C.A. (1965) The Physics of Musical Sounds. New York: Elsevier.
Teichner, W.H. (1979) Color and visual information coding. Proceedings of the
Society for Information Display, 3, 38.
Vision Committee, National Research Council (1981) Procedures for Testing
Color Vision. Report of Working Group 41. Washington, D.C.: National
Academy Press.
Viveash, J.P. & Laycock, J. (1983) Computation of the resultant chromaticity
4, 17-23.
Volbrecht, V.J., Aposhyan, H.M. & Werner, J.S. (1988) Perception of electronic
display colours as a function of retinal illuminance. Displays, 9, 56-64.
Volkmann, F.C. (1962) Vision during voluntary saccadic eye movements. Journal
of the Optical Society of America, 52, 571-578.
Ward, W.D. & Glorig, A. (1961) A case of fire-cracker induced hearing loss.
Laryngoscope, 71, 1590-1596.
Weale, R.A. (1982) A Biography of the Eye. London: H.K. Lewis & Co.
Weale, R.A. (1988) Age and the transmittance of the human crystalline lens.
Journal of Physiology, 395, 577-587.
Welch, R.B. & Warren, D.H. (1980) Immediate perceptual response to
intersensory discrepancy. Psychological Bulletin, 88, 638-667.
Miyawaki, K., Strange, W., Verbrugge, R., Liberman, A.M., Jenkins, J.J., &
Fujimura, O. (1975) An effect of linguistic experience: The discrimination of [r]
and [l] by native speakers of Japanese and English. Perception & Psychophysics,
18, 331-340.
Nadler, E., Meigert, P., Sussman, E.D., Grossberg, M., Salomon, A., & Walker,
K., (unpublished manuscript). Effect of Binaural Delays in the Communication
Channel Linking Radar and Data Controller.
Neisser, U. (1967) Cognitive Psychology. New York: Appleton-Century-Crofts.
Neisser, U. & Becklen, R. (1975) Selective Looking: Attending to visually
specified events. Cognitive Psychology, 7, 480-494.
Palmer, S. E. (1975) The effects of contextual scenes on the identification of
objects. Memory and Cognition, 3, 519-526.
Peterson, G. E. & Barney, H.L. (1952) Control methods used in the study of
vowels. Journal of the Acoustical Society of America, 24, 175-184.
Reicher, G. M. (1969) Perceptual recognition as a function of meaningfulness of
stimulus material. Journal of Experimental Psychology, 81, 275-280.
Scharf, B., Quigley, S., Aoki, C., Peachey, N., & Reeves, A. (1987) Focused
attention and frequency selectivity. Perception & Psychophysics, 42, 215-221.
Sperling, G. S. (1960) The information available in brief visual presentations.
Psychological Monographs, 74, 1-29.
Tsal, Y. (1983) Movements of attention across the visual field. Journal of
Experimental Psychology: Human Perception and Performance, 9, 523-530.
Warren, R.M. (1970) Perceptual Restoration of Missing Speech Sounds. Science,
167, 392-393.
Warren, R.M. & Obusek, C.J. (1971) Speech Perception and Phonemic
Restorations. Perception and Psychophysics, 9 (3B), 358-362.
controls. Proceedings of the 28th Annual Conference on Manual Control, Wright-Patterson AFB, Dayton, OH.
Hawkins, F.H. (1987) Human Factors in Flight. Brookfield, VT: Gower Technical
Press.
Hyman, R. (1953) Stimulus information as a determinant of reaction time.
Journal of Experimental Psychology, 45, 423-432.
Jensen, R.J. & Benel, R. (1977) Judgment evaluation and instruction in civil
pilot training. Final Report FAA-RD-78-24, Springfield, VA, National Technical
Information Service.
Johnson, S.L. & Roscoe, S.N. (1972) What moves, the airplane or the world?
Human Factors, 14, 107-129.
Kahneman, D., Slovic, P., & Tversky, A. (Eds.), (1982) Judgment Under
Uncertainty: Heuristics and Biases, New York: Cambridge University Press.
Klein, G.A. (1989a) Do decision biases explain too much? Human Factors
Society Bulletin, 32, 1-3.
Klein, G.A. (1989b) Recognition-primed decisions. In W. Rouse (Ed.) Advances
in Man-Machine Systems Research, Vol. 5, Greenwich, CT: JAI Press, 47-92.
Klein, K.G., Wegmann, H.M. & Hunt, B.I. (1972) Desynchronization of body
temperature and performance circadian rhythms as a result of outgoing and
homegoing transmeridian flights. Aerospace Medicine, 43, 119-132.
Nagel, D.C. (1988) Human error in aviation operations. In E. Weiner & D.
Nagel (Eds.), Human Factors in Aviation, New York: Academic Press, 263-303.
Newman, R.L. (1987) Improvement of head-up display standards. Volumes I, II,
III. (AFWAL-TR-3055). Wright-Patterson Air Force Base, OH: Flight Dynamics
Laboratory.
Norman, D. (1988) The Psychology of Everyday Things, New York: HarperCollins.
North, R.A., & Riley, V.A. (1989) A predictive model of operator workload. In
G.R. McMillan, D. Beevis, E. Salas, M.H. Strub, R. Sutton, & L. Van Breda
(Eds.), Applications of Human Performance Models to System Design, New York:
Plenum Publishing Corp., 81-99.
Parks, D.L. & Boucek, G.P., Jr. (1989) Workload prediction, diagnosis and
continuing challenges. In G.R. McMillan et al. (Eds.), Applications of Human
Performance Models to System Design, New York: Plenum Publishing Corp.,
47-63.
Reason, J. (1990) Human Emr, New York: Cambridge University Press.
Richardson et al. (1982) Circadian Variation in Sleep Tendency in Elderly and
Young Subjects. Sleep, 5 (suppl. 2), A.P.S.S., 82.
Roscoe, A.H. (1987) The practical assessment of pilot workload. NATO:
AGARD-AG-282. Loughton, U.K.: Specialized Printing Services Ltd.
Roscoe, S.N. (1968) Airborne displays for flight and navigation. Human Factors,
10, 321-322.
Steenblik, J.W. (1989, December) Alaska Airlines' HGS. Air Line Pilot, 10-14.
Vicente, K.J., Thornton, D.C., & Moray, N. (1987) Spectral analysis of sinus
arrhythmia: a measure of mental effort. Human Factors, 29(2), 171-182.
Wegmann, H.M., Gundel, A., Naumann, M., Samel, A., Schwartz, E. & Vejvoda,
M. (1986) Sleep, sleepiness, and circadian rhythmicity in aircrews operating on
transatlantic routes. Aviation, Space, and Environmental Medicine, 57 (12,
suppl.), B53-B64.
Weiner, E.L. (1988) Cockpit automation. In E.L. Weiner & D.C. Nagel (Eds.),
Human Factorsin Aviation, New York: Academic Press, 263-303.
Weinstein, L.F. (1990) The reduction of central-visual overload in the cockpit.
In Psychology in the Department of Defense.
Wilson, G.F., Skelly, J., & Purvis, G. (1989). Reactions to emergency situations
in actual and simulated flight. In Human Behavior in High Stress Situations in
Aerospace Operations. NATO: AGARD-CP-458. Loughton, U.K.: Specialized
Printing Services Ltd.
Wright, P. (1977) Presenting technical information: a survey of research
findings. Instructional Science, 6, 93-134.
Chapters 9-11
Bainbridge, L. (1987) Ironies of automation. In New Technology and Human
Error. Ed. J. Rasmussen, K. Duncan, and J. Leplat. West Sussex, U.K.: John
Wiley & Sons Ltd.
Barnett, A. & Higgins, M.K. (1989) Airline safety: the last decade. Management
Science, 35, January 1989, 1-21.
Broadbent, D.E. (1971) Decision and Stress. London: Academic Press.
Card, J.K. (1987) Conversation with Richard Gabriel, Palo Alto, California.
Corwin, W.H., Sandry-Garza, D.L., Biferno, M.H., Boucek, G.P., Logan, A.L.,
Jonsson, J.E., & Metalis, S.A. (1989) Assessment of crew workload measurement
methods, techniques, and procedures (WRDC-TR-89-7006). Wright-Patterson Air
Force Base, Wright Research and Development Center, Dayton, Ohio.
Crossman, E.R., Cooke, J.E., & Beishon, R.J. (1974) Visual attention and
sampling of displayed information in process control. In The Human Operator in
Process Control. Ed. E. Edwards and F.P. Lees, 25-50.
Curry, R.E. (1985) The introduction of new cockpit technology: a human factors
study. NASA Technical Memorandum 86659. Moffett Field, California: NASA
Ames Research Center.
Department of Defense. (1985) Human engineering guidelines for management
information systems (DOD-HDBK-761). Wright-Patterson Air Force Base, Air Force
Systems Command, Dayton, Ohio.
Fadden, D.M., & Weener, E.F. (1984) Selecting effective automation. Paper
presented at ALPA Air Safety Workshop, Washington, D.C.
Gabriel, R.F. (1987) Internal memorandum. Society of Automotive Engineers
(SAE) Committee on Human Behavioral Technology.
Woods, D.D. (1987) Technology alone is not enough.
Chapter 12
Berson, B.L., Po-Chedley, D.A., Boucek, G.P., Hanson, D.C., Leffler, M.F.,
& Wasson, R.L. (1981) Aircraft Alerting Systems Standardization Study, Volume II:
Aircraft Alerting System Design Guidelines. Report No. DOT/FAA/RD-81/II.
Boff, K.R., & Lincoln, J.E. (Eds.) (1988) Engineering Data Compendium: Human
Perception and Performance, Volume III. Harry G. Armstrong Aerospace
Medical Research Laboratory, Wright-Patterson Air Force Base, Ohio, 1862.
Boucek, G.P., Erickson, J.B., Berson, B.L., Hanson, D.C., Leffler, M.F., &
Po-Chedley, D.A. Aircraft Alerting System Standardization Study, Phase I.
Report No. DOT/FAA/RD-82/49.
Cardosi, K. & Boole, P. (1991) Analysis of Pilot Response Time to Time-Critical
Air Traffic Control Calls. Report No. DOT/FAA/RD-91/20.
Cardosi, K. & Huntley, M.S. (1988) Cockpit and Cabin Crew Coordination.
Report No. DOT/FAA/FS-88/1.
Foushee, C., Lauber, J., Baetge, M., & Acomb, D. (1986) Crew Factors in Flight
Operations III: The Operational Significance of Exposure to Short-Haul Air
Transport Operations.
INDEX
Absorption, 13
increase with cataracts, 19
spectra, 42
Accident data, see automation, Workload assessment
Accommodation, 16, 126, 249
as a consideration in Display design, 249
effects of aging on, 126
to a display panel, 16
to a head-up display (HUD), 16, 126
Adaptation, 8, 24, 63-65
chromatic, 63-65
to dark, 24
to sound, 8
Afterimage, see color contrast effects
Amacrine cells, see retina
Ambient noise, 8, 106
Amplitude,
spectra, 4
Angle of incidence, 14
Anomaloscope, see Color discrimination tests
Arctan, see retina
Assimilation, see color contrast effects
Attention, 96-97, 118-122, 165-169
automaticity in information display, 121
color as a focusing mechanism for, 119-120
divided, 119-121
early selection, 99
effect of display consistency on, 122
effect of display organization on, 121
effects of display clutter on, 122
electronic display issues of, 119
emergent display features of, 120
focused, 119
improvement with training in, 121
internal, 100
late selection of, 99
limited capacity of, 134
selective, 119, 121-122, 261
separate modalities in information display, 121
timesharing issues of, 165-169
two types of, 97
lack of objective human factors criteria in FARs and design specifications, 235
loss of proficiency as a result of, 223
loss of situation awareness as a result of, 223
main drivers of new airliner development, 226
manufacturer use of human factors in, 227
NASA conference on cockpit automation issues, 230-232
NASA studies of advanced aircraft acceptance, 220
NASA study of CDU problems, 217-218
need for FAA human factors specialists in certification, 240
need for situation dominance in advanced cockpits, 231
nuclear power studies of, 213
office applications of, 214
operator functions for task-related activities, 230
overconfidence as a result of, 224
pilot opinion as a source for cockpit design information, 219
pilot's role defined, 229-230
problem of non-specific human factors criteria for, 239
reasons cited for, 220-221
reduced job satisfaction as a result of, 223
role of human as a systems monitor in, 235
Sheridan's ten levels of, 210
traditional design practices in, 226-227
variability in applications of, 214
weaknesses of traditional design approach, 227
workload effects in relation to cockpit design, 236
Axons, 21
Bezold Spreading Effect, see color contrast effects
Bezold-Brucke hue shift, see hue
Bifocal lenses, see presbyopia
Binaural unmasking, 9
Bipolar cells, see retina
Blind spot, see optic disc
Brightness, 24, 51
saturation, 52, 257
simultaneous brightness contrast, 51
Broadbent, Donald, 98, 235
Cataract, 19
Cathode Ray Tube (CRT), see automation
Certification, 170, 272, 301-303
human factors criteria for, 240
issues raised by new display technology, 267
methodology requirements for Workload assessment, 272
color matching, 48
Farnsworth-Munsell 100-Hue test, 47
pseudoisochromatic plates, 49
Color identification, 59
optimum number of colors for visual displays, 59
Color specification, 65-69
advantages of blue stimuli, 70
CIE spectral tristimulus value, 66
CIE spectrum locus, 68
CIE tristimulus value, 66
CIE V function, 66
constraints on use of colors, 70
implications for displays, 69-70
Munsell chroma, 69
Munsell color chip parameters, 69
Munsell system, 68-69
problems with blue stimuli, 70
search time for display items, 69-70
Color vision deficiencies, 45-51
blue-yellow color defects, 47
color blindness, 47
deuteranomaly, 45-46
deuteranopia, 45-46
diabetes-related, 47
drug-related, 47
glaucoma, 47
occurrence in pilot population, 51
protanomaly, 45-46
protanopia, 45-46
red-green color defects, 47
tritanomaly, 45-46
tritanopia, 45-46
Color, see hue
Complex sound, 2
Cones, 20, 42-43
long-wave, 42
middle-wave, 42
retinal asymmetry in distribution of, 43
S cones, 43
short-wave, 42
Control Display Unit (CDU), see automation
Convergence, 24, 89
ocular, 89
to bipolar cells, 24
Cornea, 14
Critical flicker fusion (CFF), see flicker
Cycles, 2
Decibels (dB), 3
Decision making, 133-163
"broad/shallow" FMC menus, 149
"narrow/deep" FMC menus, 149
Depth perception, 83-92
binocular rivalry, 91
chromostereopsis, 91
color stereopsis, 91
cues used by pilots, 87
interposition, 85
linear perspective, 85
monocular cues in relation to size and distance, 84-86
monocular cues in relation to size, 84
monocular depth cues, 83
moon illusion, 84
motion parallax, 87
motion perspective, 87
occurrence of strabismus in population, 91
optic flow patterns, 87
perception of texture, 86
random-dot stereograms, 90
spatial errors in, 85
stereo imagery on displays, 92
stereopsis, 89
strabismus, 91
use of binocular cues in aerial surveillance, 91
Dichromats, 45
Display compatibility, 115-118
applications to aviation, 116-118
meaning of colors, 116, 233, 258
multiple stereotypes, 116
perception of displayed information, 115
population stereotypes, 116, 258
principle of pictorial realism, 116
principle of the moving part, 116, 233
principles of multi-element display design, 118-122
S-C compatibility, 116
S-R compatibility, 116
spatial interpretation, 116
Display design, 243-267
advantages of building on past successes, 248
analysis of alternate sources for required information in, 247
analysis of continuous dynamic control tasks in, 247
analysis of new tasks in, 247
analysis of similar tasks in, 247
benefits of top-down task analysis for, 246
certification issues raised by new technology, 267
characteristics of proven value in symbology, 249
command information in, 264-265
reconstructive, 111-112
sensory, 94, 108-111
short-term memory capacity, 110
short-term memory interference, 110
short-term, 95, 108, 110-111
tip-of-tongue phenomenon, 111
working, 95, 108, 134
see also short-term memory
Memory, 107-113, see also information processing
Middle-wave cone, see cone
Monochromatic lights, 12
Monochromats, 45
cone, 46
rod, 46
Motion perception, 37-39
functions of, 37
illusions of, 38-39
of figure, 37
of ground, 37
stroboscopic, 37
thresholds of, 37
Motion perspective, see Depth perception
Munsell system, see Color specification
Myopia, 16
Nanometers, 12
Nasal retina, see retina
Neurons, 94
Nystagmus, 31
Ocular media transmission, 17
Optic disc, 21
blind spot of, 21
Optic nerve, 21
Optical density, 17
Parallel processing, see information processing
Pattern recognition, see information processing
Pertinence Theory, see Attention
Photons, 11
Photopic spectral sensitivity, see spectral sensitivity
Photopic vision, 22, 46
Photopigments, 20
Physiological nystagmus, see nystagmus
Sine wave, 2
Situational awareness, see decision making, workload assessment
Size constancy, 72
Sleep cycle, 191
circadian rhythms of, 191
defining characteristics of, 190
desynchronization, 195
Mean Sleep Latency Test (MSLT)
resynchronization, 198-199
sleep latency, 192-193
Sleep disruption, 190-199
characteristics of sleep, 190
controlled napping as an antidote to, 199
desynchronization, 195
in pilots, 193-198
micro-sleep, 199
NASA long-haul study of, 195-198
NASA short-haul study of, 193-195
performance as a measure of, 193
prophylactic sleep as an antidote to, 193
rapid eye movement (REM) sleep, 190
shift rates of biological and performance functions after transmeridian flights, 197
sleep inertia as a phenomenon of, 199
sleep resynchronization, 198-199
slow wave sleep, 191
Sound adaptation, 8
Sound exposure, 7
Sound habituation, 8
stimulus, 8
Sound intensity, 2
interaural differences in, 7
Sound masking, 8, 106
Sound sensitivity, 5-6
absolute, 5
loss in, 6
Sound time differences, 7
Spectral sensitivity, 22
function, 23
Spectrum, 17
broadband of, 12
ultraviolet portion of, 17
visible portion of, 17
Speech perception, see information processing
Speed of sound, 2
Stabilized retinal image, see retina
Statistical tests and concepts, see human factors testing
Stereograms, see Depth perception, binocular depth cues
Stereopsis, see Depth perception
Stereovision, see Depth perception, binocular depth cues
Strabismus, see Depth perception
Stroboscopic motion, see motion perception
Temporal retina, see retina
Temporal vision, 32
Timbre, 4
Time differences, see sound time differences
Timesharing, 165-169
automatized performance in, 168
confusion in verbally dependent environments, 166-167
confusion in, 166
importance of voice quality in, 167
NASA Langley research in, 166
performance resource function in, 167
residual resources in, 167, 179
resources and, 167
sampling and scheduling in, 166
Trichromats, 45
Tunneling, see decision making, Display design
Tympanic membrane, 2
Ultraviolet radiation, 27
hazardous effects of, 27
Visual acuity, 20, 25, 72, 249
as a measure of resolution, 72
in Display design, 249
loss of, 25