Kristin E. Schaefer
10.1 Introduction
Prior meta-analyses have identified trust antecedents relating to the human, the robot, and the environment (Hancock et al. 2011; Schaefer et al. 2014). The key findings from these meta-analyses point to the fact that there is still much to learn about how trust develops. However,
what is prevalent in the literature is the finding that until trust between a human and a robot is solidly established, robotic partners will continue to be underutilized or unused, thereby providing little to no opportunity for trust to develop in the first place (Lussier et al. 2007). This is partly because one of the most significant challenges for successful collaboration between humans and robots is the development of appropriate levels of mutual trust (Desai et al. 2009; Groom and Nass 2007). So, regardless of the domain of application, the
environment, or the task, a human’s trust in their non-human collaborator is an
essential element required to ensure that any functional relationship will ultimately
be effective.
Research has continued to address the creation and validation of successful
evaluation methods for a wide spectrum of HRI issues, including human-robot trust (Steinfeld et al. 2006). Yet, a persistent limitation in the field has been the accurate measurement of trust specific to the unique nature of HRI. Human-robot trust is currently measured through subjective assessment. However, previous studies have been limited by measurement tools that consist of a single self-report item (e.g., How much do you trust this robot?) or an adapted human-interpersonal or human-automation trust scale. The concern with this methodology is that neither option truly assesses the full scope of human-robot trust, which calls into question the accuracy of the resulting trust scores.
There has been one notable exception: Yagoda and Gillan (2012) developed a
subjective human-robot trust scale that is specific to military applications. However, the changing vision of HRI continues to press for the inclusion of robotic technologies in multiple contextual domains that incorporate varying levels of autonomy, intelligence, and interaction. This creates a need for additional trust measurement tools specific to the changing HRI environment.
This chapter summarizes research that was conducted to produce a reliable and
validated subjective measure: the Trust Perception Scale-HRI (see also Schaefer
2013). The goal of this research was to design a subjective tool specific to the
measurement of human-robot trust that could be expressed as an overall percentage
of trust. In addition, this scale was designed to effectively measure trust perceptions
over time, across robotic domains, by individuals in all the major roles of HRI
(operator, supervisor, mechanic, peer, or bystander, as defined by Scholtz 2003),
and across various levels of system autonomy and intelligence (see also Beer et al.
2014). To ensure that this new scale was valid, each part of scale development was
constructed using the widely-accepted procedures discussed in DeVellis (2003) and
Fink (2009). These procedures followed the protocol of large item pool creation,
statistical item pool reduction, content validity assessment, and task-based validity
testing.
The first step in creating the Trust Perception Scale-HRI was to create an Item
Pool. An Item Pool is a collection of relevant phrases or items that are associated
with trust development. To meet this end, over 700 articles in the areas of human-
robot trust, human-automation trust, and human-interpersonal trust were reviewed
and analyzed. Theoretical, qualitative, and quantitative relationships were recorded.
Potential items were then organized in relation to the Three Factor Model of Human-
Robot Trust (Hancock et al. 2011). This model was then updated to incorporate
potential antecedents of trust (see Fig. 10.1). Specific items to be included in the
initial Item Pool were first drawn from these large scale literature reviews.
One major trust-specific finding from these reviews was the importance of
design as it related to the robot’s physical form and functional capability. While
some research had focused on the functional capabilities, limited experimental
study had been conducted specifically related to the impact of robot form on trust
development. Therefore, two initial experiments were conducted to assess this gap
in the literature and further develop the initial Item Pool.
The purpose of the first study was to determine the relationship between physical
form and trustworthiness, devoid of any direct information regarding the functional
capabilities of the system. One hundred sixty-one participants rated 63 images of
real-world industry, military, medical, service, social, entertainment, and therapy
robots. These ratings included the degree to which the robot was perceived to be a
machine, a robot, and an object, as well as its perceived intelligence (PI), level of
Fig. 10.1 Updated Three Factor Model of Human-Robot Trust following an extended literature review of trust in the interpersonal, automation, and robot domains
automation (LOA), trustworthiness, and the degree to which the participant would
be likely to use or interact with the robot.
A multiple regression correlation analysis with stepwise entry of variables
was conducted to determine the factors that predicted trustworthiness from per-
ceived robot form alone. This was achieved by regressing trustworthiness onto
human-related factors (gender, race, age, year in school), personality traits (agree-
ableness, extroversion, conscientiousness, intellect, neuroticism), negative attitudes
toward robots (negative attitudes toward emotions in interactions, negative social
influence, and negative situational influence), as well as self-report items of
robot form (perceived intelligence, perceived level of automation (LOA), robot
classification). The final model included perceived intelligence (PI), robot classi-
fication (RC), and negative social influence (SI) as predictors of trustworthiness,
Ŷ = 0.825 + 0.651(PI) + 0.256(RC) − 0.164(SI). The model accounted for a significant 45.1 % of the variance (R² = 0.451), F(3,156) = 42.70, p < 0.001.
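As an illustration, the reported model can be applied directly to new ratings. The following is a minimal Python sketch; the function name and example input values are hypothetical, while the coefficients are taken from the equation above.

```python
# Sketch: applying the final regression model reported above.
# Coefficients come from the chapter; input values are hypothetical ratings
# of perceived intelligence (PI), robot classification (RC), and negative
# social influence (SI).

def predicted_trustworthiness(pi: float, rc: float, si: float) -> float:
    """Predicted trustworthiness: Y = 0.825 + 0.651*PI + 0.256*RC - 0.164*SI."""
    return 0.825 + 0.651 * pi + 0.256 * rc - 0.164 * si

# Example: a robot rated high on PI and RC, low on negative social influence.
print(predicted_trustworthiness(pi=5.0, rc=4.0, si=2.0))  # -> 4.776
```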
These results suggested that preconceived ideas regarding the level of intelli-
gence of a robot are form-dependent and assessed prior to interaction, in much the
same way as one individual will assess another individual as a potential teammate.
Further, negative social influence (e.g., capabilities, functions, etc.) plays a key
role in expectation-setting similar to stereotypes of human teammates. Overall,
the results of the above-mentioned study provided support that physical form is
important to the trust that develops prior to HRI (for additional findings see also
Schaefer et al. 2012).
The follow-up study was designed to identify which perceived robot attributes
could impact the trustworthiness ratings. Robot attributes were assessed through
a subset of the Godspeed questionnaire (Bartneck et al. 2009), a standardized HRI measurement tool for interactive robots, specifically looking at items related
to anthropomorphism (Powers et al. 2007), animacy (Lee et al. 2005), likeability
(Monahan 1998), and perceived intelligence (Warner and Sugarman 1996). Over
200 participants rated a subset of the previous study’s stimuli (two that were
previously rated low on the robot classification scale, two that were rated high on the
robot classification scale, and 14 that had diverse ratings on the robot classification
scale). As anticipated, there was a significant relationship between how individuals
rated the robot image on the robot classification scale and their perceived level of
trustworthiness in the robot, r(2910) = 0.307, p < 0.001. The higher the rating of
a robot to actually be classified as a robot, the more likely it was to be rated as
trustworthy. The main purpose of this study was to determine if specific attributes
could be identified to account for this relationship. Overall, results showed that each
robot had different attributes that were important to classification. Therefore, it was
decided to include all attribute items in the initial Item Pool.
Following the literature review and the two studies mentioned above, previously developed and referenced trust scales in the robot, automation, and interpersonal domains were fully reviewed to refine the items. This resulted in a
review of 51 new scales (with a total of 487 trust items), 22 adapted versions
of previously developed scales, and 13 previously developed scales (see also
Table 10.1).
The final Item Pool resulted in the creation of 156 initial items. Between two and
four items were created for each antecedent, representing an equal number of positively and negatively worded (or oppositely related) items. Initial scale items were written
out as full sentences and referred to a general statement regarding “most robots” on
a 7-point Likert-type scale (see Fig. 10.2).
The second step in the scale development procedure was to reduce the size of
the initial Item Pool using statistical procedures. These procedures began with a
Principal Component Analysis (PCA) to identify potential groupings of items, as
well as items that were not included in the groupings. Secondary analysis was
conducted using paired samples t-tests to determine if the positively and negatively
worded items were equal and thus could be reduced from the initial Item Pool.
One hundred fifty-nine undergraduate students (65 males, 94 females) from the
University of Central Florida took part in this study via online participation
(SurveyMonkey.com). Following informed consent, participants completed the 156
randomized initial trust items. Participants then completed the demographics ques-
tionnaire that included gender, age, a mental model question, and prior experience
questions. The study took approximately 30 min to complete. Participants’ prior
experience with robots was assessed to understand previous exposure to robotic
technologies. Prior experience has been shown to be related to how an individual
forms a mental model of the robot and anticipates future HRI. As expected,
the sample population had prior exposure to media representations (N = 156); some minor interaction with real-world robots (N = 36); and some opportunity to control (N = 34) or build (N = 11) a real-world robot during school or club related
requirements. Table 10.2 presents results of these questions.
To assess the participants’ mental model of a robot, they were asked to describe
what a robot looks like with an open-ended question. Mental models refer to
structured, organized knowledge that humans possess which describe, explain, and
predict a system’s purpose, form, function, or state (Rouse and Morris 1986).
The responses were coded into categories (see Table 10.3). Seventeen participants
directly referenced specific robots from movies or television (e.g., R2D2, C-3PO, iRobot, AI, and Terminator; N = 14); the video game Mass Effect 3 (N = 1); real-world military robots (e.g., Predator, N = 2); and a robotic arm (N = 2).
All data were analyzed using IBM SPSS v.19 (SPSS 2010), with an alpha level set
to 0.05, unless otherwise indicated. These findings were important as they provided a rationale for retaining or rejecting specific items from the initial Item Pool.
PCA was performed on the 156 initial trust items. Initial extraction identified 43 components (using the Kaiser criterion of eigenvalues > 1 for truncation), accounting for 79.63 % of the variance. Following review of the scree plot, four components were retained. The un-rotated solution was then subjected to orthogonal varimax rotation, with loadings suppressed below |0.30|. In the rotated model, the four components accounted for 30.64 % of the variance. In looking at the loadings
in the Rotated Component Matrix, 22 items with high loadings (>0.60) were
located in Component 1. Based on the loadings of trust items on each of the four
components, interpretations can be made about the factors themselves. Component
1 seemed to represent performance-based functional capabilities of the robot.
Component 2 seemed to represent robot behaviors and communication. Component
3 may represent task or mission specific items. Finally, Component 4 seemed to
represent feature-based descriptors of robots. These components supported the
theory addressed by the descriptive Three Factor Model of Human-Robot Trust
(first described by Hancock et al. 2011). Following PCA, 26 items were considered
for immediate removal from the Item Pool.
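As a reproducibility aid, a minimal NumPy sketch of this reduction pipeline (principal components from the item correlation matrix, Kaiser-criterion count, four retained components, varimax rotation, and suppression of loadings below |0.30|) is given below. The data matrix and random seed are placeholders, not the study's responses.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a loading matrix (standard algorithm)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    d = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3
                          - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0)))
        )
        rotation = u @ vt
        d_new = np.sum(s)
        if d > 0 and d_new / d < 1 + tol:
            break
        d = d_new
    return loadings @ rotation

# Hypothetical responses: 159 participants x 156 Likert items (1-7).
rng = np.random.default_rng(0)
X = rng.integers(1, 8, size=(159, 156)).astype(float)

corr = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1]                     # descending eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

n_kaiser = int(np.sum(eigvals > 1.0))                 # Kaiser criterion count
loadings = eigvecs[:, :4] * np.sqrt(eigvals[:4])      # 4 components per scree plot

rotated = varimax(loadings)
rotated[np.abs(rotated) < 0.30] = 0.0                 # suppress loadings below |0.30|
print(n_kaiser, rotated.shape)
```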
Means, standard deviations, normality (skewness and kurtosis), correlations, z-scores, and paired samples t-tests were computed to further assess items for retention or removal. To be retained in the scale, items should be approximately normally distributed. Therefore, 62 items with significant skew and 20 items with significant kurtosis were considered for removal from the Item Pool. In addition, paired samples t-tests were conducted on all of the paired items (positively and negatively worded items) to determine if they were interchangeable, thus reducing the item pool. This assessment identified 39 paired items that did not differ significantly from each other, providing a rationale for reducing the scale by an additional 39 items.
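The normality screening and pair reduction can be sketched as follows with SciPy. This example is illustrative only: the data are random placeholders, and the significance tests on skew and kurtosis use standard large-sample approximations, which may differ from the exact procedure used in the study.

```python
import numpy as np
from scipy import stats

# 'responses' is a participants x items matrix of 7-point ratings; 'pairs'
# lists hypothetical index pairs of positively and negatively worded versions
# of the same antecedent.
rng = np.random.default_rng(1)
responses = rng.integers(1, 8, size=(159, 156)).astype(float)
pairs = [(0, 1), (2, 3)]

n = len(responses)
# Large-sample z-tests on skewness and kurtosis (SE ~ sqrt(6/N), sqrt(24/N)).
skew_z = stats.skew(responses, axis=0) / np.sqrt(6.0 / n)
kurt_z = stats.kurtosis(responses, axis=0) / np.sqrt(24.0 / n)
flag_skew = np.abs(skew_z) > 1.96     # candidates for removal: significant skew
flag_kurt = np.abs(kurt_z) > 1.96     # candidates for removal: significant kurtosis

# Paired t-tests: reverse-score the negative item (8 - x on a 7-point scale),
# then test whether the pair differs; non-significant pairs are redundant.
for pos, neg in pairs:
    t, p = stats.ttest_rel(responses[:, pos], 8.0 - responses[:, neg])
    if p >= 0.05:
        print(f"items {pos}/{neg} interchangeable (t = {t:.2f}, p = {p:.3f})")
```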
Even though some elements might have been considered for removal, the
following items were retained for subject matter expert (SME) review due to their
importance to trust theory: move quickly, move slowly, require frequent mainte-
nance, autonomous, led astray by unexpected changes in the environment, work
in close proximity with people, possess adequate decision-making capability, make
sensible decisions, openly communicate, and communicate only partial information.
Following the various statistical assessments (PCA, normality assessment, and
paired samples t-tests), the Item Pool was reduced from 156 items to 73 items.
Two major changes were made to the scale following this study. First, there were
some potential issues that arose with the wording of the items. Two main types of
item formation were included in the above version of the scale. Items either began
with “Most robots” or “I.” This may have impacted the factor creation. Therefore,
all items were reduced to a single word or short phrase prior to subject matter expert
(SME) review. Secondly, the scale was modified from a 7-point Likert-type scale
to a percentage scale with 10 % increments. This change was related to the larger purpose of developing a scale that provided a trust rating from no trust (0 %) to complete trust (100 %). It was supported by research, especially in the interpersonal and e-commerce domains, suggesting that trust and distrust are related but separate constructs with differing effects on
behavior, consequences, and outcomes (Lewicki et al. 1998; McKnight et al. 2004;
Wildman 2011; Wildman et al. 2011).
The third step in the process to create a reliable and valid subjective scale was
content validation. In this step, the goal was to survey SMEs in the area of trust
and robotics in order to determine if each item should be retained or removed from
the Item Pool. This two-phase semantic analysis included item relevance (content
validation) using the protocols described by Lawshe (1975), and the identification of
the hypothetical range of differences (e.g., no trust and complete trust differences)
for each item.
Eleven SMEs were included from the United States Army Research Laboratory,
United States Air Force Research Laboratory, and faculty members from university
research laboratories. All SMEs were considered experts in the fields of trust
research, robotics research, or HRI. Table 10.4 provides the SMEs’ years of
experience across a variety of robot, automation, and research topics.
SMEs were contacted via email. Upon agreement to participate, they were pro-
vided a link to complete an online survey. All data for this experiment were collected
through an online tool (SurveyMonkey.com). SMEs were provided background
information, purpose, and a brief review of trust theory prior to beginning the multi-
part study. In Part 1, SMEs
completed an expertise questionnaire. In Part 2, SMEs were given instructions to
complete the 73 item Trust Scale with the instructions “Please rate the following
items on how a person with little or no trust in a robot would rate them.” In
Part 3, SMEs were given instructions to complete the 73 item Trust Scale with
the instructions “Please rate the following items on how a person with complete
trust in a robot would rate them.” All items in Part 2 and Part 3 were randomized.
Part 4 was the Content Validation questionnaire based on Lawshe’s (1975) content
analysis protocols. SMEs rated each item on a 3-point Likert-type scale as either
“extremely important to include in scale,” “important to include in scale,” or “should
not be included in scale.” SMEs could also mark if they felt an item was domain
specific (e.g., military robotics, social robotics, etc.). A comment box was available
to provide any clarification about why they rated the item a specific way, to provide
additional recommendations to the scale design, or to suggest items that may be
missing from the scale. The total survey took approximately 30 min to complete.
Items were analyzed using the Content Validity Ratio (CVR) developed by Lawshe (1975). The CVR, depicted in Eq. 10.1, is a commonly used method of analyzing scale items (see also Yagoda and Gillan 2012):

CVR = (n_e − N/2) / (N/2)    (10.1)

where n_e is the number of SMEs rating an item as important to include and N is the total number of SMEs. The CVR rating was derived from a 3-point Likert scale (1 = Should not be included in scale, 2 = Might be important to include in scale, and 3 = Extremely important to include in scale).
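A short computational sketch of Eq. 10.1 follows; the function name is illustrative. With the chapter's panel of 11 SMEs, it reproduces the CVR values reported in Table 10.6 (11, 10, or 9 endorsements yield 1.00, 0.82, and 0.64, respectively).

```python
def content_validity_ratio(n_endorsing: int, n_panelists: int) -> float:
    """Lawshe's (1975) CVR, Eq. 10.1: n_e panelists rate the item as
    important to include, out of N panelists. Ranges from -1 to +1."""
    return (n_endorsing - n_panelists / 2) / (n_panelists / 2)

for n_e in (11, 10, 9):   # endorsements from the 11-SME panel
    print(n_e, round(content_validity_ratio(n_e, 11), 2))
# 11 1.0
# 10 0.82
# 9 0.64
```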
Table 10.6 The 37 “Important Items” separated by retained and removed items. M (SD) columns give SME ratings under complete trust and no trust; t and p test the difference between the two (the hypothetical range).

Item | CVR | Complete trust M (SD) | No trust M (SD) | t | p
Items retained
1. Operate in an integrated team environment | 1.00 | 75.00 (21.73) | 28.00 (25.73) | 3.62 | 0.006
2. Autonomous | 1.00 | 69.09 (20.23) | 38.89 (23.15) | 3.18 | 0.013
3. Good teammate | 0.82 | 87.27 (11.04) | 11.82 (11.68) | 16.60 | <0.001
4. Performs a task better than a novice human user | 0.82 | 69.09 (20.71) | 24.55 (24.23) | 6.69 | <0.001
5. Led astray by unexpected changes in the environment | 0.82 | 71.00 (19.12) | 26.00 (15.78) | 5.78 | <0.001
6. Know the difference between friend and foe | 0.82 | 71.00 (27.67) | 14.55 (16.95) | 5.89 | <0.001
7. Make sensible decisions | 0.82 | 84.00 (11.74) | 21.00 (20.79) | 8.62 | <0.001
8. Clearly communicate | 0.82 | 83.00 (11.60) | 19.00 (15.24) | 12.30 | <0.001
9. Warn people of potential risks in the environment | 0.82 | 83.00 (11.60) | 23.00 (18.29) | 7.75 | <0.001
10. Incompetent | 0.82 | 85.45 (14.40) | 39.00 (33.15) | 5.24 | 0.001
11. Possess adequate decision-making capability | 0.82 | 71.11 (16.91) | 20.00 (21.60) | 5.57 | 0.001
12. Are considered part of the team | 0.82 | 79.00 (12.87) | 35.00 (31.36) | 3.36 | 0.010
13. Will act as part of the team | 0.82 | 74.00 (19.55) | 30.00 (29.06) | 3.28 | 0.010
14. Perform many functions at one time | 0.82 | 68.00 (23.48) | 36.00 (25.47) | 4.40 | 0.002
15. Protect people | 0.82 | 77.00 (24.52) | 23.00 (22.14) | 3.92 | 0.003
16. Openly communicate | 0.82 | 80.00 (15.81) | 34.00 (28.36) | 3.79 | 0.005
17. Responsible | 0.82 | 66.36 (30.42) | 27.27 (34.67) | 2.76 | 0.020
18. Built to last | 0.82 | – | – | – | –
19. Work in close proximity with people | 0.82 | 65.00 (19.58) | 35.56 (26.03) | 2.34 | 0.047
20. Supportive | 0.64 | 66.00 (16.47) | 18.00 (11.35) | 8.67 | <0.001
21. Work best with a team | 0.64 | 71.00 (18.53) | 34.00 (28.36) | 3.41 | 0.008
22. Tell the truth | 0.64 | 86.36 (21.57) | 46.00 (31.69) | 2.90 | 0.018
23. Keep classified information secure | 0.64 | 84.00 (15.06) | 55.45 (36.43) | 2.69 | 0.025
24. Require frequent maintenance | 0.64 | 74.00 (20.66) | 44.00 (28.75) | 2.37 | 0.042
Items removed from scale
25. Responsive | 1.00 | 84.00 (9.66) | 32.22 (21.67) | 7.50 | <0.001
26. Poor teammate | 0.82 | 84.55 (9.66) | 45.45 (36.43) | 3.64 | 0.005
27. Are assigned tasks that are critical to mission success | 0.82 | 58.00 (32.25) | 25.00 (35.67) | 1.84 | 0.098
(continued)
Twenty-four items were retained for further scale validation. The remaining 13 items were removed from the scale for the following reasons: eight items showed non-significant findings on the paired samples t-tests, and SME comments supported the removal of the remaining five items. The first comment was a general observation that some items (e.g., easy/difficult to maintain) were repetitive in nature; to address it, only one of each set of repetitive items was included in the revised scale, and responsive, easy to maintain, and poor teammate were removed. In addition, instill fear in people was removed because it reflects distrust more than trust. Finally, the item likeable was considered too general for the scale. A further comment suggested that two items (are given complete responsibility for the completion of the mission and are assigned tasks that are critical to mission success) represented situational factors that may be a separate issue from trust. Even though CVR analysis revealed that SMEs felt these might be important items to include in the scale, their responses to the theoretical range of scores did not show a significant change in trust. This added support for the SMEs’ comments recommending removal of these two items.
Twenty-two items did not meet the CVR criterion and were reviewed for removal from the scale. Of these 22 items, the four items regarding movement (move quickly, move slowly, mobile, move rigidly) were removed; supporting this decision, one SME commented that trust and speed were orthogonal. SMEs further recommended removal of three items representing robot personality (kind, caring, have a relationship with their human users or operators). An additional nine of these items (human-like, fake, alive, dead, offensive, organic, ignorant, apathetic, and make decisions that affect me personally) were removed from the scale due to the lack of change in scores on the theoretical range of responses. However, out of those 22 items that did not meet the CVR criterion, a number of SMEs felt that some items may have particular importance for specific domains.
The .avi file was converted into the .mp4 file format to add auditory feedback. The robot stated “target detected” in a synthetic male computer voice. There were eight possible targets included in each scenario. No false alarms were included.
The study was conducted in a single session. Participants were provided a copy
of the informed consent while the experimenter read it aloud. Participants then
viewed an image of the Talon robot and completed the 42 Item Trust Scale (Time
1; pre-interaction). Next, participants were instructed about the human-robot task
they were about to monitor. Following Video 1, participants completed the 42 Item
Trust Scale (Time 2; post-interaction, 100 % reliable feedback). Participants then
received the second task instructions, monitored Video 2, and completed the 42
Item Trust Scale (Time 3; post-interaction, 25 % reliable feedback). Following
these monitoring tasks, participants completed the ITS, NARS, and demographics
questionnaire. The study took approximately 90 min in its entirety.
All data were analyzed using IBM SPSS Statistics v.19 (SPSS 2010), with an alpha
level set to 0.05, unless otherwise indicated.
The first step of analysis was to determine if each item changed over time. A one-
way within-subjects repeated measures analysis of variance was conducted for each
of the 42 items of the Trust Scale. The factor was “time” and the dependent variable
was the individual score. Thirty-four items showed a significant mean difference
for the condition of “time.” Post-hoc analysis found that six items had significant differences only between Time 1 (pre-interaction) and Time 2 (post-interaction). These items included: lifelike, perform many functions at one time,
friendly, know the difference between friend and foe, keep classified information
secure, and work in close proximity to people. These results may have occurred due
to a significant change in the mental model from pre- to post-interaction; therefore,
the items were retained in the scale. Confidence interval analysis was conducted
on the remaining two items (operate in an integrated team environment, built to
last) and showed no significant change between Time 1, Time 2, or Time 3. These
items were removed from the scale. Additional analyses were conducted using the
40 retained items.
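A sketch of the per-item analysis using statsmodels is shown below; the data frame contents are hypothetical placeholders for one item's scores across the three time points.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One-way within-subjects ANOVA for a single trust item, with Time as the
# repeated factor. Values below are hypothetical placeholder scores.
long = pd.DataFrame({
    "participant": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time": ["T1", "T2", "T3"] * 3,
    "score": [50, 70, 40, 60, 80, 30, 55, 75, 45],
})

result = AnovaRM(long, depvar="score", subject="participant",
                 within=["time"]).fit()
print(result)  # F test for the within-subjects effect of time
```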
First, a general score of trust was created for each of the three time periods.
Following reverse coding of specific items, the 40 items were summed and divided by the total number of items to yield an overall percentage trust score for each time period (see Fig. 10.3).

Fig. 10.3 Bar graph representing significant mean differences of trust over time (Time 1: pre-interaction; Time 2: post-interaction, 100 % reliable; Time 3: post-interaction, 25 % reliable), with 95 % confidence interval error bars
Fig. 10.4 Trust scores for the 40 item and the 14 item scale across time (Time 1, Time 2, Time 3)
Additional analyses were also conducted to identify the differences between the 40
Item Trust Scale and the 14 Item SME recommended scale. A 2 Scale (40 items, 14 items) × 3 Time (Time 1, Time 2, Time 3) repeated measures analysis of variance was conducted (see Fig. 10.4).
Results showed a significant effect of Time, F(2,240) = 186.59, p < 0.001, ηp² = 0.609; of Scale, F(1,240) = 273.61, p < 0.001, ηp² = 0.533; and an interaction between Time and Scale, F(2,240) = 108.84, p < 0.001, ηp² = 0.476. Review of the
confidence intervals showed a significant difference between the scales at Time 1
and Time 2, but not Time 3. While findings revealed significant differences between
the two scales, graphical representations showed similar patterns in the results.
Taking into account both the individual analyses of each item measured over Time,
as well as the comparative results of the two scales, it appeared that the total trust
score of the 40 Item Trust Scale provided a finer level of granularity and thus a more
accurate trust rating.
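For reference, the reported partial eta squared values can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = F·df_effect / (F·df_effect + df_error); the short sketch below reproduces the three values above.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

print(round(partial_eta_squared(186.59, 2, 240), 3))  # 0.609 (Time)
print(round(partial_eta_squared(273.61, 1, 240), 3))  # 0.533 (Scale)
print(round(partial_eta_squared(108.84, 2, 240), 3))  # 0.476 (Time x Scale)
```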
This study marked the final validation experiment for the 40 Item Trust Scale.
It used a Same-Trait approach (Campbell and Fiske 1959) to validate that this
scale measured trust and not an alternative construct. Same-trait validity was evaluated through a comparison of the developed 40 Item Trust Scale and the well-established Checklist for Trust between People and Automation (Jian et al. 1998). Human-robot interaction was accomplished through computer-
based simulation of a joint navigation task.
It was first hypothesized that there would be a strong positive correlation
between the 40 Item Scale, the 14 Item SME selected subscale, and Checklist
for Trust between People and Automation (Jian et al. 1998) trust scales. The
second hypothesis was that the three change scores in the post-interaction conditions
(20 % robot navigation errors minus 80 % robot navigation errors) for the types of trust
scales (i.e., 40 item scale, 14 item scale, and Checklist for Trust) would not show
significant mean differences. However, it was anticipated that the 40 Item and 14
Item scales would change from pre-interaction measurement to post-interaction
measurement, as shown in the prior experiment.
and the task conditions. The task was to assist an autonomous robot from a set
location to a rendezvous point. Participants controlled a Soldier avatar through the Middle Eastern town using a keyboard and mouse interface. It was possible for the robot to become stuck on an obstacle and require the participant’s assistance. Participants could move certain obstacles out of the way of the robot
by simply walking into the obstacle. The mission ended when both the Soldier and
robot reached the rendezvous location. In Simulation A, the robot autonomously
navigated around four out of the five obstacles. In Simulation B, the robot only
navigated around one of the obstacles. Each simulation was approximately 1 min in
length. The order of simulation presentation was counterbalanced and determined
prior to participation.
Following completion of informed consent, participants completed three ques-
tionnaires: the demographics questionnaire, the Mini-IPIP personality inventory,
and the ITS. Participants were then shown a picture of the Talon robot and completed
the 40 Item Trust Scale, the Checklist for Trust between People and Automation,
and the DSSQ to acquire baseline information. Participants then completed the two
simulated tasks, followed by completion of the two trust scales and the post-task
DSSQ following each task. The simulated tasks were recorded using the FRAPS® real-time video capture and benchmarking program at 30 frames/s in .avi format.
Video was recorded from the Soldier character’s perspective. It was saved with
a unique identifier to maintain participant confidentiality. The entire study took
approximately 1 h to complete.
All data were analyzed using IBM SPSS Statistics v.19 (SPSS 2010), with an
alpha level set to 0.05, unless otherwise indicated. Initial analyses were conducted
to assess changes in human states over time. Results demonstrated no significant
difference in mood state or motivation subscales. The thinking style subscales of
self-focused attention and concentration showed a significant difference between
pre-interaction and post-interaction, but no difference between the two post-
interaction conditions. A similar result was found for the thinking content subscale,
task interference. Due to these findings, no additional analyses were conducted
assessing human states. The Same-Trait methodology compared the developed 40 Item Trust Scale, the SME-recommended 14 Item Trust Scale, and the Checklist for Trust between People and Automation.
The first step in this validation was to identify the relationships between the three
trust scales. In support of Hypothesis 1, significant positive Pearson correlations
were found between all three scales (see Table 10.7).
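Computationally, this step is a set of pairwise Pearson correlations over per-participant trust scores; a minimal sketch with placeholder arrays (not the study data) follows.

```python
import numpy as np
from scipy import stats

# Pairwise Pearson correlations among per-participant trust scores from the
# three scales. Arrays are hypothetical placeholders, not the study data.
scale_40 = np.array([72.0, 65.0, 80.0, 58.0, 69.0, 75.0])
scale_14 = np.array([75.0, 60.0, 84.0, 55.0, 71.0, 78.0])
checklist = np.array([70.0, 62.0, 78.0, 60.0, 66.0, 72.0])

pairs = {"40 item vs 14 item": (scale_40, scale_14),
         "40 item vs Checklist": (scale_40, checklist),
         "14 item vs Checklist": (scale_14, checklist)}
for name, (a, b) in pairs.items():
    r, p = stats.pearsonr(a, b)
    print(f"{name}: r = {r:.3f}, p = {p:.3f}")
```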
The second step in this validation process was to determine if there was a
significant mean difference between the post-interaction change scores (20 % robot navigation errors minus 80 % robot navigation errors) for the three scales. A within-subjects repeated measures analysis of variance was conducted. In support of Hypothesis 2, there was not a significant mean difference between the post-interaction change scores for the three trust scales, F(2,19) = 2.64, p = 0.097. This
result provided additional support that the developed scale measures the construct
of trust.
To further explore these scale differences, a 3 Trust Scale (40 item, 14 item, and Checklist for Trust) × 3 Condition (pre-interaction, 20 % robot error, and 80 % robot error) repeated measures analysis of variance was conducted. There was a main effect of scale, F(2,59) = 105.16, p < 0.001, ηp² = 0.781, but not of condition (p = 0.191). Results are depicted in Fig. 10.5. Confidence interval analysis of the
mean trust scores demonstrated that there was no significant difference between the
three scales that were recorded pre-interaction. This finding suggested that all three
scales provide similar trust scores prior to HRI. Further, there were no significant
differences between the 14 Item Trust Scale and the well-established Checklist
for Trust between People and Automation (Jian et al. 1998). This finding is not
surprising as the items from both the 14 Item Trust Scale and the Jian et al. scale
referenced the capability of the system. The important finding was the significant
differences between the 40 item Trust Scale and the Jian et al. scale during the two
post-interaction conditions.
Fig. 10.5 Differences between the 40 Item, 14 Item, and Checklist for Trust between People and Automation (Jian et al. 1998) trust scales at pre-interaction, post-interaction (20 % errors), and post-interaction (80 % errors)
This study demonstrated that the developed trust scale assessed the construct of
trust. In addition, it provided support for additional benefits of the developed 40
Item Trust Scale above and beyond previously used scales (i.e., Checklist for Trust
between People and Automation; Jian et al. 1998). First, there were strong positive
correlations between the three scales. Second, mean analysis showed significant
differences between the 40 item and Jian et al. scale in the post-interaction
conditions. Both the 40 Item and the 14 Item scales showed a significant change
in trust from pre-interaction to post-interaction; however, the Checklist for Trust did not change. This change in trust mirrored the previous study’s results and is supported by trust theory. Therefore, it can be postulated that the developed
Trust Scale accounted for the relationship between the change in mental models
and trust development that occurs after HRI. In addition, findings from this study, together with those from the previous study, support the conclusion that the 40 Item Trust Scale provided more accurate trust scores than both the 14 Item SME recommended scale and the Checklist for Trust scale.
10.7 Conclusion
The goal was to develop a trust perception scale that focused on the antecedents
and measurable factors of trust specific to the human, robot, and environmental
elements. This resulted in the creation of the 40 item Trust Perception Scale-HRI
and the 14 item sub-scale. The finalized scale was designed as a pre-post interaction
measure used to assess changes in trust perception specific to HRI. The scale was also designed to be used as a post-interaction measure to compare changes in trust
across multiple conditions. It was further designed to be applicable across all robot
domains. Therefore, this scale can benefit future robotic development specific to the
interaction between humans and robots.
The scale provided an overall percentage score across all items. Items were preceded
by the question “What percentage of the time will this robot…” followed by a list of the items. The finalized 40 item scale is provided in Table 10.8, and took between
5 and 10 min to complete.
When the scale is used as a pre-post interaction measure, the participants should
first be shown a picture of the robot they will be interacting with or provided a
description of the task prior to completing the pre-interaction scale. This accounts
for any mental model effects of robots and allows for comparison specific to the
robot at hand. For post-interaction measurement, the scale should be administered
directly following the interaction. To create the overall trust score, 5 items must first
be reverse coded. The reverse coded items are denoted in the above table. All items
are then summed and divided by the total number of items (40). This provides an
overall percentage of trust score.
While use of the 40 Item scale is recommended, a 14 Item subscale can be used to
provide rapid trust measurement specific to measuring changes in trust over time, or
during assessment with multiple trials or time restrictions. This subscale is specific
to functional capabilities of the robot, and therefore may not account for changes in
trust due to the feature-based antecedents of the robot. Trust score is calculated by
first reverse coding the ‘have errors,’ ‘unresponsive,’ and ‘malfunction’ items, and
then summing the 14 item scores and dividing by 14. The 14 items are marked in
Table 10.8.
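A scoring sketch covering both the 40 item scale and the 14 item subscale is given below. The reverse-coded item positions are placeholders and must in practice be taken from the items marked in Table 10.8; reverse coding a 0-100 percentage item is assumed here to be 100 minus the rating.

```python
import numpy as np

def trust_score(ratings, reversed_idx):
    """Overall trust as a percentage: reverse-code the marked items
    (assumed here as 100 - rating), then average across all items."""
    r = np.asarray(ratings, dtype=float)
    r[reversed_idx] = 100.0 - r[reversed_idx]
    return r.mean()

# Hypothetical positions of the reverse-coded items (use Table 10.8):
REVERSED_40 = [9, 17, 25, 31, 38]   # the 5 marked items of the 40 item scale
REVERSED_14 = [2, 7, 11]            # 'have errors', 'unresponsive', 'malfunction'

ratings_40 = np.full(40, 70.0)      # example respondent: 70 % on every item
ratings_14 = np.full(14, 70.0)
print(trust_score(ratings_40, REVERSED_40))  # 65.0
print(trust_score(ratings_14, REVERSED_14))  # ~61.4
```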
This scale was developed to provide a means to subjectively measure trust per-
ceptions over time and across robotic domains. In addition, it can be used by
individuals in all the major roles of HRI: operator, supervisor, mechanic, peer, or
bystander. Therefore, there are many potential avenues for future research using the Trust Perception Scale-HRI. Current and near-term research studies highlight the
expansion of human-robot trust antecedents, which were first described in the
Three Factor Model of Human-Robot Trust (Hancock et al. 2011; Schaefer et al.
2014). These include the exploration of human-robot trust as it relates to: Soldier,
bystander, and robot proximity in high-risk military tasking; transparent system
communication and feedback for Soldier-robot teaming in high-risk environments;
the development of natural language processing in human-robot teams; and dual-
task engagement with an autonomous vehicle designed for on-base personnel
transport. While a number of these studies are currently under review or in press, a
few preliminary results are discussed below.
First, Sanders et al. (2014) used the Trust Perception Scale-HRI to assess the impact of the amount of communication feedback from the robot (constant, contextual only, minimal) and the modality of information (visual, text, audio) on trust development. Their initial results showed a greater increase in the change in trust (post-interaction minus pre-interaction) for a constant stream of information than for contextual-only or minimal information, across all three communication modalities. The task was a Soldier-robot team surveillance mission in which an area was cleared of weapons and civilians were located for safe evacuation from a hostile zone. These researchers are continuing to use
the Trust Perception Scale-HRI for future studies in transparent communication,
human roles (team member, bystanders), and social dynamics including proxemics,
as part of the US Army Research Laboratory’s Robotics Collaborative Technology
Alliance tasking related to determinants of shared cognition and social dynamics in
future Soldier-robot teams.
Second, a recent study using the Trust Perception Scale-HRI was conducted
where participants monitored a simulated and a live robot surveying an environment,
locating an object, and touching the object (Schafer et al. 2015). Results supported
previous findings that individuals trust a reliable robot significantly more than an unreliable one. The results also showed that the scale is effective for measuring
trust in both simulated and live HRI experimentation.
A third area of on-going and near-term work using the Trust Perception Scale-HRI explores trust development with respect to robotic passenger vehicles and transparent passenger user interfaces (Schaefer 2015). The
design of this set of computer-based simulation studies is in line with the goals of
the US Army Tank Automotive Research, Development and Engineering Center’s
ARIBO (Applied Robotics for Installation and Base Operations) project for alter-
native transportation options for on-base wounded Soldier transit (Marshall 2014).
The ultimate goal is to provide a means for Soldiers to schedule an on-demand
autonomous robotic passenger vehicle that provides door-to-door transport between the Medical Barracks and the on-base medical facilities. The benefit of this work
is to understand the levels of trust that will enhance usage and effective human-
robot interaction. Results of this set of studies will provide additional insight into
the impact of trust antecedents (e.g., transparency, cueing, human characteristics)
on trust development, as well as explore how the trust relationship changes as the human role transitions from driver, to safety rider, to supervisor external to the vehicle, to passenger. Initial findings advance current trust theory by
demonstrating significant relationships with working memory capacity and coping
style related to driving, distress, workload, and task performance (Schaefer and Scribner 2015). Future work on this project is twofold: (1) understanding the effects
of the availability of driver control interfaces (i.e., steering and speed control versus
automation engage/disengage buttons) on usability, performance, and trust; and (2)
exploring the effects of transparent user interfaces design on trust development and
calibration for passengers.
References
Army Research Laboratory (2012) Robotics Collaborative Technology Alliance Annual Program
Plan. U.S. Army Research Laboratory, Aberdeen Proving Ground
Bartneck C, Kulić D, Croft E, Zoghbi S (2009) Measurement instruments for the anthropomorphism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int J Soc Robot 1:71–81. doi:10.1007/s12369-008-0001-3
Beer JM, Fisk AD, Rogers WA (2014) Toward a framework for levels of robot autonomy in human-
robot interaction. J Hum Robot Interact 3(2):74–99. doi:10.5898/JHRI.3.2.Beer
Blomqvist K (1997) The many faces of trust. Scand J Manag 13(3):271–286
Campbell DT, Fiske DW (1959) Convergent and discriminant validation by the multitrait-
multimethod matrix. Psychol Bull 56:81–105
Chen JYC, Terrence PI (2009) Effects of imperfect automation and individual differences on con-
current performance of military and robotics tasks in a simulated multi-tasking environment.
Ergonomics 52(8):907–920. doi:10.1080/00140130802680773
Desai M, Stubbs K, Steinfeld A, Yanco H (2009) Creating trustworthy robots: lessons and
inspirations from automated systems. In: Proceedings of the AISB convention: new frontiers in
human-robot interaction, Edinburgh. Retrieved from https://www.ri.cmu.edu/pub_files/2009/4/
Desai_paper.pdf
DeVellis RF (2003) Scale development theory and applications, vol 26, 2nd edn, Applied social
research methods series. Sage, Thousand Oaks
Donnellan MB, Oswald FL, Baird BM, Lucas RE (2006) The Mini-IPIP scales: tiny-yet-
effective measures of the Big Five factors of personality. Psychol Assess 18(2):192–203.
doi:10.1037/1040-3590.18.2.192
Fink A (2009) How to conduct surveys: a step-by-step guide, 4th edn. Sage, Thousand Oaks
Gonzalez JP, Dodson W, Dean R, Kreafle G, Lacaze A, Sapronov L, Childers M (2009) Using
RIVET for parametric analysis of robotic systems. In: Proceedings of 2009 ground vehicle
systems engineering and technology symposium (GVSETS), Dearborn
Groom V, Nass C (2007) Can robots be teammates? Benchmarks in human-robot teams. Interact
Stud 8(3):483–500. doi:10.1075/is.8.3.10gro
Hancock PA, Billings DR, Schaefer KE, Chen JYC, Parasuraman R, de Visser E (2011) A meta-
analysis of factors affecting trust in human-robot interaction. Hum Factors 53(5):517–527.
doi:10.1177/0018720811417254
Jian J-Y, Bisantz AM, Drury CG, Llinas J (1998) Foundations for an empirically determined scale
of trust in automated systems (report no. AFRL-HE-WP-TR-2000-0102). Air Force Research
Laboratory, Wright-Patterson AFB
Lawshe CH (1975) A quantitative approach to content validity. Pers Psychol 24(4):563–575.
doi:10.1111/j.1744-6570.1975.tb01393.x
Lee KM, Park N, Song H (2005) Can a robot be perceived as a developing creature? Effects of a
robot’s long-term cognitive developments on its social presence and people’s social responses
toward it. Hum Commun Res 31(4):538–563. doi:10.1111/j.1468-2958.2005.tb00882.x
Lewicki RJ, McAllister DJ, Bies RJ (1998) Trust and distrust: new relationships and realities. Acad
Manage Rev 23(3):438–458. doi:10.5465/AMR.1998.926620
Lussier B, Gallien M, Guiochet J (2007) Fault tolerant planning for critical robots. In: Proceedings
of the 37th annual IEEE/IFIP international conference on dependable systems and networks,
pp 144–153. doi:10.1109/DSN.2007.50
Marshall P (2014) Army tests driverless vehicles in ‘living lab.’ GCN technology, tools, and
tactics for public sector IT. Retrieved from http://gcn.com/Articles/2014/07/16/ARIBO-Army-
TARDEC.aspx?Page=1
Matthews G, Joyner L, Gilliland K, Campbell SE, Falconer S, Huggins J (1999) Validation of a
comprehensive stress state questionnaire: towards a state “Big Three”. In: Mervielde I, Dreary
IJ, DeFruyt F, Ostendorf F (eds) Personality psychology in Europe, vol 7. Tilburg University
Press, Tilburg, pp 335–350
McAllister DJ (1995) Affect- and cognition-based trust as foundations for interpersonal coopera-
tion in organizations. Acad Manage J 38(1):24–59
McKnight DH, Kacmar CJ, Choudhury V (2004) Dispositional trust and distrust distinctions in
predicting high- and low-risk internet expert advice site perceptions. e-Service J 3(2):35–58.
Retrieved from http://www.jstor.org/stable/10.2979/ESJ.2004.3.2.35
Merritt SM, Ilgen DR (2008) Not all trust is created equal: dispositional and
history-based trust in human-automation interactions. Hum Factors 50(2):194–210.
doi:10.1518/001872008X288574
Monahan JL (1998) I don’t know it but I like you—the influence of non-conscious affect on person
perception. Hum Commun Res 24(4):480–500. doi:10.1111/j.1468-2958.1998.tb00428.x
Nomura T, Kanda T, Suzuki T, Kato K (2004) Psychology in human-robot communication: an
attempt through investigation of negative attitudes and anxiety toward robots. In: Proceedings
of the 2004 IEEE international workshop on robot and human interactive communication,
Kurashiki, Okayama, pp 35–40. doi:10.1109/ROMAN.2004.1374726
Powers A, Kiesler S (2006) The advisor robot: tracing people’s mental model from a robot’s
physical attributes. In: 1st ACM SIGCHI/SIGART conference on Human-robot interaction,
Salt Lake City, Utah, USA
Rotter JB (1967) A new scale for the measurement of interpersonal trust. J Pers 35(4):651–665. doi:10.1111/j.1467-6494.1967.tb01454.x
Rouse WB, Morris NM (1986) On looking into the black box: prospects and limits in the search for mental models. Psychol Bull 100(3):349–363. doi:10.1037/0033-2909.100.3.349
Sanders TL, Wixon T, Schafer KE, Chen JYC, Hancock PA (2014) The influence of modality and
transparency on trust in human-robot interaction. In: Proceedings of the fourth annual IEEE
CogSIMA conference, San Antonio
Schaefer KE (2013) The perception and measurement of human-robot trust. Dissertation, Univer-
sity of Central Florida, Orlando
Schaefer KE (2015) Perspectives of trust: research at the US Army Research Laboratory. In: R
Mittu, G Taylor, D Sofge, WF Lawless (Chairs) Foundations of autonomy and its (cyber)
threats: from individuals to interdependence. Symposium conducted at the 2015 Association
for the Advancement of Artificial Intelligence (AAAI), Stanford University, Stanford
Schaefer KE, Sanders TL, Yordon RE, Billings DR, Hancock PA (2012) Classification of robot
form: factors predicting perceived trustworthiness. Proc Hum Fact Ergon Soc 56:1548–1552.
doi:10.1177/1071181312561308
Schaefer KE, Billings DR, Szalma JL, Adams JK, Sanders TL, Chen JYC, Hancock PA (2014) A meta-analysis of factors influencing the development of trust in automation: implications for human-robot interaction (report no ARL-TR-6984). U.S. Army Research Laboratory, Aberdeen Proving Ground
Schafer KE, Sanders T, Kessler TA, Wild T, Dunfee M, Hancock PA (2015) Fidelity & validity in
robotic simulation. In: Proceedings of the fifth annual IEEE CogSIMA conference, Orlando
Schaefer KE, Scribner D (2015) Individual differences, trust, and vehicle autonomy: a pilot study. Proc Hum Fact Ergon Soc 59(1):786–790. doi:10.1177/1541931215591242
Scholtz J (2003) Theory and evaluation of human robot interactions. In: Proceedings from the 36th
annual Hawaii international conference on system sciences. doi:10.1109/HICSS.2003.1174284
Steinfeld A, Fong T, Kaber D, Lewis M, Scholtz J, Schultz A, Goodrich M (2006) Common metrics
for human-robot interaction. In: Proceedings of the first ACM/IEEE international conference
on human robot interaction, Salt Lake City, pp 33–40. doi:10.1145/1121241.1121249
Warner RM, Sugarman DB (1996) Attributes of personality based on physical appearance, speech,
and handwriting. J Pers Soc Psychol 50:792–799
Wildman JL (2011) Cultural differences in forgiveness: fatalism, trust violations, and trust repair
efforts in interpersonal collaboration. Dissertation, University of Central Florida, Orlando
Wildman JL, Fiore SM, Burke CS, Salas E, Garven S (2011) Trust in swift starting action teams: critical considerations. In: Stanton NA (ed) Trust in military teams. Ashgate, London, pp 71–88
Yagoda RE, Gillan DJ (2012) You want me to trust a robot? The development of a human-robot
interaction trust scale. Int J Soc Robotics 4(3):235–248