


Chapter 10
Measuring Trust in Human Robot Interactions: Development of the “Trust Perception Scale-HRI”

Kristin E. Schaefer

10.1 Introduction

Robotics technology has advanced rapidly in recent years, leading to improved
functional capability, robust intelligence, and system autonomy. Along with the
benefits of these technical advancements, however, come changes in the way
humans will use or interact with such systems. The most prevalent change can be
seen in the vision for robot design and development for future human-robot
interaction (HRI). This vision is now directed toward a greater prevalence of
robotic technologies in context-driven tasks that require social-based interaction.
More specifically, robots are beginning to shed their passive, tool-based roles
and move toward being active, integrated team members (Chen and Terrence
2009). The intricacies of HRI are bound to change in order to accommodate the
integration of a robot as it becomes more of a companion, friend, or teammate,
rather than strictly a machine. This change in direction recasts the human role as
less of an operator and more of a team member or even a bystander. The
individual’s trust in the robot thereby takes a prominent role in the success of any
interaction, including the future use of that robot.
This chapter works in conjunction with the rest of the book such that the prevalent
focus is on the intersection of robust intelligence (RI) and trust in robotic systems.
Therefore, to reduce redundancy, we will limit the background to point to the
difficulties relating to trust development and the potential issues that occur when
trust is not developed appropriately. First, it is important to note that there are a
number of factors that influence trust development. Large scale literature reviews
of both HRI and human-automation interaction point to the importance of trust


antecedents relating to the human, the robot, and the environment (Hancock et al.
2011; Schaefer et al. 2014). The key findings from the associated meta-analyses
point to the fact that there is still much to learn about how trust develops. However,
what is prevalent in the literature is the finding that until trust between a human
and a robot is solidly established, robotic partners will continue to be underutilized
or unused, therefore providing little to no opportunity for trust to develop in the
first place (Lussier et al. 2007). This is in part due to the fact that one of the
most significant challenges for successful collaboration between humans and robots
is the development of appropriate levels of mutual trust in robots (Desai et al.
2009; Groom and Nass 2007). So, regardless of the domain of application, the
environment, or the task, a human’s trust in their non-human collaborator is an
essential element required to ensure that any functional relationship will ultimately
be effective.
Research has continued to address the creation and validation of successful
evaluation methods for a wide spectrum of HRI issues, including this issue of
human-robot trust (Steinfeld et al. 2006). Yet a persistent limitation in the field has
been the lack of accurate trust measurement specific to the unique nature of HRI.
Human-robot trust is currently measured through subjective assessment. However,
previous studies have been limited to measurement tools that are either a single
self-report item (e.g., How much do you trust this robot?) or an adapted
human-interpersonal or human-automation trust scale. The concern with this
methodology is that neither of those options truly assesses the full scope of
human-robot trust, which calls into question the accuracy of the resulting trust scores.
There has been one notable exception: Yagoda and Gillan (2012) developed a
subjective human-robot trust scale that is specific to military application. However,
the changing vision of HRI continues to press the inclusion of robotic technologies
into multiple contextual domains that incorporate varying levels of autonomy,
intelligence, and interaction. This calls forth the need for the development of
additional trust measurement tools specific to the changing HRI environment.
This chapter summarizes research that was conducted to produce a reliable and
validated subjective measure: the Trust Perception Scale-HRI (see also Schaefer
2013). The goal of this research was to design a subjective tool specific to the
measurement of human-robot trust that could be expressed as an overall percentage
of trust. In addition, this scale was designed to effectively measure trust perceptions
over time, across robotic domains, by individuals in all the major roles of HRI
(operator, supervisor, mechanic, peer, or bystander, as defined by Scholtz 2003),
and across various levels of system autonomy and intelligence (see also Beer et al.
2014). To ensure that this new scale was valid, each part of scale development was
constructed using the widely-accepted procedures discussed in DeVellis (2003) and
Fink (2009). These procedures followed the protocol of large item pool creation,
statistical item pool reduction, content validity assessment, and task-based validity
testing.

10.2 Creation of an Item Pool

The first step in creating the Trust Perception Scale-HRI was to create an Item
Pool. An Item Pool is a collection of relevant phrases or items that are associated
with trust development. To this end, over 700 articles in the areas of human-
robot trust, human-automation trust, and human-interpersonal trust were reviewed
and analyzed. Theoretical, qualitative, and quantitative relationships were recorded.
Potential items were then organized in relation to the Three Factor Model of Human-
Robot Trust (Hancock et al. 2011). This model was then updated to incorporate
potential antecedents of trust (see Fig. 10.1). Specific items to be included in the
initial Item Pool were first drawn from these large scale literature reviews.
One major trust-specific finding from these reviews was the importance of
design as it related to the robot’s physical form and functional capability. While
some research had focused on the functional capabilities, limited experimental
study had been conducted specifically related to the impact of robot form on trust
development. Therefore, two initial experiments were conducted to assess this gap
in the literature and further develop the initial Item Pool.
The purpose of the first study was to determine the relationship between physical
form and trustworthiness, devoid of any direct information regarding the functional
capabilities of the system. One hundred sixty-one participants rated 63 images of
real-world industry, military, medical, service, social, entertainment, and therapy
robots. These ratings included the degree to which the robot was perceived to be a
machine, a robot, and an object, as well as its perceived intelligence (PI), level of

Fig. 10.1 Updated Three Factor Model of Human-Robot Trust following an extended literature
review of trust in the interpersonal, automation, and robot domains

automation (LOA), trustworthiness, and the degree to which the participant would
be likely to use or interact with the robot.
A multiple regression correlation analysis with stepwise entry of variables
was conducted to determine the factors that predicted trustworthiness from per-
ceived robot form alone. This was achieved by regressing trustworthiness onto
human-related factors (gender, race, age, year in school), personality traits (agree-
ableness, extroversion, conscientiousness, intellect, neuroticism), negative attitudes
toward robots (negative attitudes toward emotions in interactions, negative social
influence, and negative situational influence), as well as self-report items of
robot form (perceived intelligence, perceived level of automation (LOA), robot
classification). The final model included perceived intelligence (PI), robot
classification (RC), and negative social influence (SI) as predictors of trustworthiness:
Ŷ = 0.825 + 0.651(PI) + 0.256(RC) − 0.164(SI). The model accounted for a significant
45.1 % of the variance (R² = 0.451), F(3, 156) = 42.70, p < 0.001.
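As a concrete illustration, the final regression equation can be evaluated directly; the minimal Python sketch below uses the coefficients reported above with hypothetical input ratings (the original predictor scaling and centering are not reproduced here, so treat it as illustrative only).

```python
# A minimal sketch of the final regression model reported above.
# Coefficients come from the chapter; input values are hypothetical.

def predicted_trustworthiness(pi: float, rc: float, si: float) -> float:
    """Y-hat = 0.825 + 0.651*PI + 0.256*RC - 0.164*SI

    pi: perceived intelligence rating
    rc: robot classification rating
    si: negative social influence rating
    """
    return 0.825 + 0.651 * pi + 0.256 * rc - 0.164 * si

# Example with hypothetical ratings: high perceived intelligence and robot
# classification, low negative social influence.
print(predicted_trustworthiness(pi=6.0, rc=5.0, si=2.0))
```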
These results suggested that preconceived ideas regarding the level of intelli-
gence of a robot are form-dependent and assessed prior to interaction, in much the
same way as one individual will assess another individual as a potential teammate.
Further, negative social influence (e.g., capabilities, functions, etc.) plays a key
role in expectation-setting similar to stereotypes of human teammates. Overall,
the results of the above-mentioned study provided support that physical form is
important to the trust that develops prior to HRI (for additional findings see also
Schaefer et al. 2012).
The follow-up study was designed to identify which perceived robot attributes
could impact the trustworthiness ratings. Robot attributes were assessed through
a subset of the Godspeed questionnaire (Bartneck et al. 2009), a standardized
measurement tool for HRI for interactive robots, specifically looking at items related
to anthropomorphism (Powers et al. 2007), animacy (Lee et al. 2005), likeability
(Monahan 1998), and perceived intelligence (Warner and Sugarman 1996). Over
200 participants rated a subset of the previous study’s stimuli (two that were
previously rated low on the robot classification scale, two that were rated high on the
robot classification scale, and 14 that had diverse ratings on the robot classification
scale). As anticipated, there was a significant relationship between how individuals
rated the robot image on the robot classification scale and their perceived level of
trustworthiness in the robot, r(2910) = 0.307, p < 0.001. The more strongly an
image was classified as a robot, the more likely it was to be rated as
trustworthy. The main purpose of this study was to determine if specific attributes
could be identified to account for this relationship. Overall, results showed that each
robot had different attributes that were important to classification. Therefore, it was
decided to include all attribute items in the initial Item Pool.
Following the literature review and the two studies mentioned above, previously
developed and referenced trust scales in the robot, automation, and interpersonal
domains were reviewed in full to refine the items. This resulted in a
review of 51 new scales (with a total of 487 trust items), 22 adapted versions
of previously developed scales, and 13 previously developed scales (see also
Table 10.1).

Table 10.1 Number of trust scales and trust items reviewed

Number of trust scales assessed    Robot   Automation   Interpersonal
Created new scales                   9         30             12
Minimum number of items              1          1              1
Maximum number of items             45         31             29
Adapted previous scales              5         14              3
Previously developed scales          2          9              2
Scales not discussed                 2         11              4

Fig. 10.2 Example items included in the initial Item Pool. Each statement was rated on a
7-point Likert-type scale (Strongly Disagree, Disagree, Slightly Disagree, Neutral, Slightly
Agree, Agree, Strongly Agree):
• Most robots make poor teammates.
• Most robots possess adequate decision-making capability.
• Most robots are pleasant towards people.
• Most robots are not precise in their actions.

The final Item Pool consisted of 156 initial items. Between two and four items were
created for each antecedent, with equal numbers of positively and negatively worded
(or oppositely keyed) items. Initial scale items were written out as full sentences and
referred to a general statement regarding “most robots” on a 7-point Likert-type
scale (see Fig. 10.2).

10.3 Initial Item Pool Reduction

The second step in the scale development procedure was to reduce the size of
the initial Item Pool using statistical procedures. These procedures began with a
Principal Component Analysis (PCA) to identify potential groupings of items, as
well as items that were not included in the groupings. Secondary analysis was
conducted using paired samples t-tests to determine if the positively and negatively
worded items were equal and thus could be reduced from the initial Item Pool.

10.3.1 Experimental Method

One hundred fifty-nine undergraduate students (65 males, 94 females) from the
University of Central Florida took part in this study via online participation
(SurveyMonkey.com). Following informed consent, participants completed the 156
randomized initial trust items. Participants then completed the demographics ques-
tionnaire that included gender, age, a mental model question, and prior experience
questions. The study took approximately 30 min to complete. Participants’ prior
experience with robots was assessed to understand previous exposure to robotic
technologies. Prior experience has been shown to be related to how an individual
forms a mental model of the robot and anticipates future HRI. As expected,
the sample population had prior exposure to media representations (N = 156);
some minor interaction with real-world robots (N = 36); and some opportunity to
control (N = 34) or build (N = 11) a real-world robot through school or club related
requirements. Table 10.2 presents results of these questions.
To assess the participants’ mental model of a robot, they were asked to describe
what a robot looks like with an open-ended question. Mental models refer to
structured, organized knowledge that humans possess which describe, explain, and
predict a system’s purpose, form, function, or state (Rouse and Morris 1986).
The responses were coded into categories (see Table 10.3). Seventeen participants
directly referenced specific robots from movies or television (e.g., R2D2, C-3PO,
iRobot, AI, and Terminator; N = 14); the video game Mass Effect 3 (N = 1); real-
world military robots (e.g., Predator; N = 2); and a robotic arm (N = 2).

Table 10.2 Participants’ prior experience with robots

Prior experience questions                                                Yes (%)  No (%)
Have you ever watched a movie or television show that includes robots?     98       2
  • 1–5 shows (N = 87)
  • 6–10 shows (N = 31)
  • Over 10 shows (N = 18)
Have you ever interacted with a robot?                                     23      77
  • Museum or theme park animatronics (N = 5)
  • Toys such as Furby (N = 8)
  • Robot vacuum (N = 2)
  • Classroom robots or Battlebots (N = 8)
  • Everyday items such as cell phone, computer, ATM, or Xbox (N = 12)
  • Unclassified (N = 1)
Have you ever built a robot?                                                7      93
  • Classroom or robotics club robots
Have you ever controlled a robot?                                          21      79
  • Teleoperation or remote control (N = 21)
  • Speech, Gesture, Commands (N = 3)
  • Computer programmed (N = 6)

Table 10.3 Coding categories of mental model of a robot

Coding description                                         N     %
Machine-like (machine, metallic, silver)                  121   76.1
Human-like (human-like or specific human features)         49   30.8
Varied (multiple descriptions or ranges of robots)         28   17.6
Tool                                                        4    2.5
Task, Function, or Interaction                             30   18.9
Internal Form: computer, electronics, wires, buttons       25   15.7
External Form: shape, size, rigid, durable                 34   21.4
Capabilities: movement                                     33   20.8
Communication: language                                     7    4.4
Other (helpful, intelligent, cameras, robot, alien)         7    4.4

10.3.2 Experimental Results

All data were analyzed using IBM SPSS v.19 (SPSS 2010), with an alpha level set
to 0.05, unless otherwise indicated. These findings were important because they
provided grounds for retaining or rejecting specific items from the initial Item
Pool.
PCA was performed on the 156 initial trust items. Extraction identified
43 components (using the Kaiser criterion of eigenvalue > 1 for truncation),
accounting for 79.63 % of the variance. Following review of the scree plot,
four components were retained. The un-rotated solution was subjected to
orthogonal varimax rotation with loadings suppressed below |0.30|. In the
rotated model, the four components accounted for 30.64 % of the variance.
In the Rotated Component Matrix, 22 items with high loadings (>0.60) were
located in Component 1. Based on the loadings of trust items on each of the four
components, interpretations can be made about the factors themselves. Component
1 seemed to represent performance-based functional capabilities of the robot.
Component 2 seemed to represent robot behaviors and communication. Component
3 may represent task or mission specific items. Finally, Component 4 seemed to
represent feature-based descriptors of robots. These components supported the
theory addressed by the descriptive Three Factor Model of Human-Robot Trust
(first described by Hancock et al. 2011). Following PCA, 26 items were considered
for immediate removal from the Item Pool.
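For readers who want to reproduce this kind of extraction step, the sketch below shows a comparable PCA truncation in Python. It assumes a hypothetical `responses` array of participant-by-item Likert ratings; note that the chapter's analysis was run in SPSS with varimax rotation, which scikit-learn's PCA does not perform, so this covers only the extraction and Kaiser truncation.

```python
# Sketch of the extraction/truncation step under stated assumptions:
# `responses` is an (n_participants x n_items) array of Likert ratings.
# Varimax rotation (used in the chapter) is not included here.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def kaiser_truncated_loadings(responses: np.ndarray) -> np.ndarray:
    """Standardize items, fit PCA, keep components with eigenvalue > 1,
    and return the (n_items x n_kept) loading matrix."""
    z = StandardScaler().fit_transform(responses)
    pca = PCA().fit(z)
    keep = pca.explained_variance_ > 1.0            # Kaiser criterion
    # Loadings = eigenvectors scaled by the square root of the eigenvalues
    return pca.components_[keep].T * np.sqrt(pca.explained_variance_[keep])

# Items whose largest absolute loading falls below |0.30| would then be
# flagged for removal, mirroring the suppression threshold used above.
```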
Means, standard deviations, normality (skewness and kurtosis), correlations,
z-scores, and paired samples t-tests were conducted to further assess items for
retention or removal. To be retained in the scale, items needed to exhibit normality.
Therefore, 62 items with significant skew and 20 items with significant kurtosis
were considered for removal from the Item Pool. In addition, paired samples t-tests
were conducted on all of the paired items (positive and negatively worded items) to
determine if they were interchangeable, thus reducing the item pool. This
assessment identified 39 paired items that were not significantly different from
each other. These results provided a rationale for reducing the scale
by an additional 39 items.
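That pair-equivalence check can be expressed compactly; the sketch below assumes hypothetical arrays `pos` and `neg` holding each participant's ratings of a positively worded item and its (already reverse-scored) negatively worded counterpart.

```python
# Sketch of the positive/negative item-pair check. `pos` and `neg` are
# hypothetical per-participant rating arrays for the two wordings of an item
# (the negative item already reverse-scored onto the same scale).
import numpy as np
from scipy import stats

def pair_is_redundant(pos: np.ndarray, neg: np.ndarray, alpha: float = 0.05) -> bool:
    """A non-significant paired t-test suggests the two wordings are
    interchangeable, so one item of the pair can be dropped."""
    _t, p = stats.ttest_rel(pos, neg)
    return p >= alpha
```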
Even though some elements might have been considered for removal, the
following items were retained for subject matter expert (SME) review due to their
importance to trust theory: move quickly, move slowly, require frequent mainte-
nance, autonomous, led astray by unexpected changes in the environment, work
in close proximity with people, possess adequate decision-making capability, make
sensible decisions, openly communicate, and communicate only partial information.
Following the various statistical assessments (PCA, normality assessment, and
paired samples t-tests), the Item Pool was reduced from 156 items to 73 items.

10.3.3 Key Findings and Changes

Two major changes were made to the scale following this study. First, there were
potential issues with the wording of the items. Two main types of item formation
were included in the above version of the scale: items began with either “Most
robots” or “I.” This may have impacted the factor creation. Therefore, all items
were reduced to a single word or short phrase prior to subject matter expert (SME)
review. Second, the scale was modified from a 7-point Likert-type scale to a
percentage scale with 10 % increments. This change was made in service of the
larger purpose of developing a scale that provided a trust rating from no trust (0 %)
to complete trust (100 %). The change was supported by research, especially in the
interpersonal and e-commerce domains, suggesting that trust and distrust are
related but separate constructs with differing effects on behavior, consequences,
and outcomes (Lewicki et al. 1998; McKnight et al. 2004; Wildman 2011;
Wildman et al. 2011).

10.4 Content Validation

The third step in the process to create a reliable and valid subjective scale was
content validation. In this step, the goal was to survey SMEs in the area of trust
and robotics in order to determine if each item should be retained or removed from
the Item Pool. This two-phase semantic analysis included item relevance (content
validation) using the protocols described by Lawshe (1975), and the identification of
the hypothetical range of differences (e.g., no trust and complete trust differences)
for each item.

Table 10.4 Years of experience for the subject matter experts

SME      Robot    Robot     Robot            Automation  Automation  Trust
         design   operator  research   HRI   design      research    research
SME 1      0        0         8         8       0           8           0
SME 2      5        4         3         0       0           0           0
SME 3      4        0         4         0       2           2           0
SME 4      7        0         7         0       0           0           0
SME 5      4        8         8         8       0           0           0
SME 6      0        0         0         0       0           7           7
SME 7     11        0        11        11       0           0           3
SME 8      0        0         0         8       0           8           6
SME 9      0        0         4         0       0           4           0
SME 10     7        0         7         0       0           0           0
SME 11     0        0        10        10      20          30          15

All results are reported in years of experience

10.4.1 Experimental Method

Eleven SMEs were included from the United States Army Research Laboratory,
United States Air Force Research Laboratory, and faculty members from university
research laboratories. All SMEs were considered experts in the fields of trust
research, robotics research, or HRI. Table 10.4 provides the SMEs’ years of
experience across a variety of robot, automation, and research topics.
SMEs were contacted via email. Upon agreement to participate, they were provided
a link to complete an online survey. All data for this experiment were collected
through an online tool (SurveyMonkey.com). SMEs were provided background
information, purpose, and a brief review of trust theory prior to beginning the
multi-part study. In Part 1, SMEs completed an expertise questionnaire. In Part 2,
SMEs completed the 73 item Trust Scale with the instructions “Please rate the
following items on how a person with little or no trust in a robot would rate them.”
In Part 3, SMEs completed the 73 item Trust Scale with the instructions “Please
rate the following items on how a person with complete trust in a robot would rate
them.” All items in Part 2 and Part 3 were randomized. Part 4 was the Content
Validation questionnaire based on Lawshe (1975) content
analysis protocols. SMEs rated each item on a 3-point Likert-type scale as either
“extremely important to include in scale,” “important to include in scale,” or “should
not be included in scale.” SMEs could also mark if they felt an item was domain
specific (e.g., military robotics, social robotics, etc.). A comment box was available
to provide any clarification about why they rated the item a specific way, to provide
additional recommendations to the scale design, or to suggest items that may be
missing from the scale. The total survey took approximately 30 min to complete.

10.4.2 Experimental Results

Items were analyzed using the Content Validity Ratio developed by Lawshe (1975).
The Content Validity Ratio (CVR), depicted in Eq. 10.1, is a commonly used method
of analyzing scale items (see also Yagoda and Gillan 2012). The CVR was derived
from ratings on a 3-point Likert scale (1 = Should not be included in scale, 2 = Might
be important to include in scale, and 3 = Extremely important to include in scale).

CVR = (n_e − N/2) / (N/2)     (10.1)

where n_e is the number of SMEs indicating that an item is Extremely Important and
N is the total number of SMEs. Lawshe’s protocol specifies that, with 11 SMEs, a
criterion of 0.59 is needed to ensure that SME agreement is unlikely to be due to
chance. The formula yields values ranging from +1 to −1; positive values indicate
that at least half of the SMEs rated the item as Extremely Important. Table 10.5
reports the CVR values
for the 14 items recommended by the SMEs.
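A small sketch of Eq. 10.1 in Python, using hypothetical SME responses; the 0.59 criterion and the resulting 0.64 value match those reported above for 11 SMEs.

```python
# Sketch of Lawshe's Content Validity Ratio (Eq. 10.1). `ratings` holds one
# item's SME responses on the 3-point scale; 3 = "extremely important".
def content_validity_ratio(ratings: list[int]) -> float:
    n_e = sum(1 for r in ratings if r == 3)  # SMEs rating "extremely important"
    n = len(ratings)
    return (n_e - n / 2) / (n / 2)

# Hypothetical example with 11 SMEs: 9 of 11 "extremely important" ratings
# give (9 - 5.5) / 5.5 = 0.64, which meets the 0.59 criterion (cf. Table 10.5).
print(round(content_validity_ratio([3] * 9 + [2, 1]), 2))  # 0.64
```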
CVR values were also calculated for the items that were rated as “important to
include in the scale.” This resulted in 37 additional items to consider for inclusion
in the finalized scale. The scores from the hypothetical range of differences were
used to further evaluate these 37 items. The hypothetical range of differences was
assessed from the SMEs’ completion of the Trust Scale in Part 2 and Part 3 of the
study. Paired samples t-tests were conducted to identify the hypothetical range of
differences (see Table 10.6).

Table 10.5 The 14 items recommended by SMEs as “Extremely Important”

Item                                    CVR
1. Function successfully                1.00
2. Act consistently                     1.00
3. Reliable                             1.00
4. Predictable                          1.00
5. Dependable                           1.00
6. Follow directions                    0.82
7. Meet the needs of the mission        0.82
8. Perform exactly as instructed        0.82
9. Have errorsᵃ                         0.82
10. Provide appropriate information     0.82
11. Malfunctionᵃ                        0.64
12. Communicate with people             0.64
13. Provide feedback                    0.64
14. Unresponsiveᵃ                       0.64

CVR ≥ 0.59
ᵃ Represents reverse coded items

Table 10.6 The 37 “Important Items” separated by retained and removed items

                                                                  Complete trust    No trust        Range
Item                                                        CVR   Mean    SD       Mean    SD      t      p
Items retained
1. Operate in an integrated team environment                1.00  75.00   21.73    28.00   25.73   3.62   0.006
2. Autonomous                                               1.00  69.09   20.23    38.89   23.15   3.18   0.013
3. Good teammate                                            0.82  87.27   11.04    11.82   11.68  16.60   <0.001
4. Performs a task better than a novice human user          0.82  69.09   20.71    24.55   24.23   6.69   <0.001
5. Led astray by unexpected changes in the environment      0.82  71.00   19.12    26.00   15.78   5.78   <0.001
6. Know the difference between friend and foe               0.82  71.00   27.67    14.55   16.95   5.89   <0.001
7. Make sensible decisions                                  0.82  84.00   11.74    21.00   20.79   8.62   <0.001
8. Clearly communicate                                      0.82  83.00   11.60    19.00   15.24  12.30   <0.001
9. Warn people of potential risks in the environment        0.82  83.00   11.60    23.00   18.29   7.75   <0.001
10. Incompetent                                             0.82  85.45   14.40    39.00   33.15   5.24   0.001
11. Possess adequate decision-making capability             0.82  71.11   16.91    20.00   21.60   5.57   0.001
12. Are considered part of the team                         0.82  79.00   12.87    35.00   31.36   3.36   0.010
13. Will act as part of the team                            0.82  74.00   19.55    30.00   29.06   3.28   0.010
14. Perform many functions at one time                      0.82  68.00   23.48    36.00   25.47   4.40   0.002
15. Protect people                                          0.82  77.00   24.52    23.00   22.14   3.92   0.003
16. Openly communicate                                      0.82  80.00   15.81    34.00   28.36   3.79   0.005
17. Responsible                                             0.82  66.36   30.42    27.27   34.67   2.76   0.020
18. Built to last                                           0.82
19. Work in close proximity with people                     0.82  65.00   19.58    35.56   26.03   2.34   0.047
20. Supportive                                              0.64  66.00   16.47    18.00   11.35   8.67   <0.001
21. Work best with a team                                   0.64  71.00   18.53    34.00   28.36   3.41   0.008
22. Tell the truth                                          0.64  86.36   21.57    46.00   31.69   2.90   0.018
23. Keep classified information secure                      0.64  84.00   15.06    55.45   36.43   2.69   0.025
24. Require frequent maintenance                            0.64  74.00   20.66    44.00   28.75   2.37   0.042
Items removed from scale
25. Responsive                                              1.00  84.00    9.66    32.22   21.67   7.50   <0.001
26. Poor teammate                                           0.82  84.55    9.66    45.45   36.43   3.64   0.005
27. Are assigned tasks that are critical to mission success 0.82  58.00   32.25    25.00   35.67   1.84   0.098
28. Communicated only partial information                   0.82  53.00   28.30    35.00   28.77   1.03   0.331
29. Instill fear in people                                  0.73  83.64   18.59    50.91   34.48   3.13   0.011
30. Likeable                                                0.64  60.00   33.54    14.00   10.75   4.61   0.001
31. Easy to maintain                                        0.64  60.00   33.54    35.00   31.71   2.89   0.018
32. Responsible for its own actions                         0.64  64.00   32.04    32.73   36.63   2.22   0.054
33. Given complete responsibility for the completion of a mission 0.64 59.00 37.55 19.00  37.84   2.16   0.059
34. Monitored during a mission                              0.64  49.00   29.61    20.00   28.67   2.08   0.067
35. Are considered separate from the team                   0.64  68.00   24.40    36.00   32.39   1.92   0.091
36. Difficult to maintain                                   0.64  70.00   15.63    50.00   34.64   1.60   0.143
37. Work best alone                                         0.64  48.00   23.94    41.00   33.48   0.69   0.506

Twenty-four items were retained for further scale validation. The remaining
13 items were removed from the scale for the following reasons: eight items showed
non-significant findings on the paired samples t-tests, and SME comments supported
the removal of the remaining five items. One general comment stated that some
items (e.g., easy/difficult to maintain) were repetitive in nature; to address this, only
one of each set of repetitive items was included in the revised scale, and “responsive,”
“easy to maintain,” and “poor teammate” were removed. In addition, “instill fear in
people” was removed because it captures distrust more than trust, and “likeable” was
considered too general an item for the scale. A further comment suggested that the
two items “given complete responsibility for the completion of the mission” and
“are assigned tasks that are critical to mission success” represented situational
factors that may be a separate issue from trust. Even though the CVR analysis
revealed that SMEs felt these might be important items to include in the scale, their
responses to the theoretical range of scores did not show a significant change in
trust. This added support for the SMEs’ comments recommending removal of these
two items.
Twenty-two items did not meet the CVR criterion and were reviewed for removal
from the scale. Of these 22 items, the four items regarding movement (move quickly,
move slowly, mobile, move rigidly) were removed; supporting this decision, one
SME commented that trust and speed were orthogonal. SMEs further recommended
removal of three items representing robot personality (kind, caring, have a
relationship with their human users or operators). An additional nine items
(human-like, fake, alive, dead, offensive, organic, ignorant, apathetic, and make
decisions that affect me personally) were removed from the scale due to the lack of
change in scores on the theoretical range of responses. However, of the 22 items
that did not meet the CVR criterion, a number of SMEs felt that some may have
particular importance to trust development as robots advance further into socially
relevant relationships. Therefore, the following four items were retained in the
scale: friendly, pleasant, conscious, and lifelike.
Semantic analysis of the trust scale items reduced the scale from 73 items to 42
items. It further identified 14 items that were extremely important to trust measure-
ment, with an additional 24 items that could be important to trust measurement. Four
domain specific items (friendly, pleasant, conscious, and lifelike) were also retained
on the scale based on SME recommendations. No additional items or item revisions
were recommended by the SMEs. In addition, no changes to the scale design were
recommended.

10.5 Task-Based Validity Testing: Does the Score Change Over Time with an Intervention?

Trust has typically been measured following an interaction. However, trust is
dynamic in nature, as ongoing interactions and relational history continuously
influence trust levels at any given point in time. Consequently, trust before, during,
and after an interaction may not be identical. Further, trust in the same partner will
likely change over time as the relationship progresses (Blomqvist 1997). Research
in the areas of automation (see Merritt and Ilgen 2008), as well as interpersonal
trust (see McAllister 1995), recommended that trust should be measured at multiple
times, specifically before and after the task or interaction. Within the Three Factor
Model of Human-Robot Trust, pre-interaction measurement has been used to
identify initial trust perceptions that are influenced by human traits, robot features,
and the individual’s perception of the environment and perceived robot capabilities.
Post-interaction measurement was used to identify changes in trust related to human
states and trust perceptions following interactions. Obtaining the most reliable and
accurate reflection of the dynamic nature of trust in an interaction may necessitate
measuring trust multiple times.
In order to examine this concept of dynamic trust, an experiment was designed
to assess the 42 Item Trust Scale’s capability to measure changes in perceived trust.
Computer-based simulation was used to develop a monitoring task specific to a
“screen the back” scenario for a Soldier-robot team (Army Research Laboratory
2012). Trust, as measured using the 42 Item Trust Scale, was assessed both pre-
interaction, and post-interaction in the high trust condition (the robot provided
100 % reliable feedback on target detection), as well as post-interaction in the low
trust condition (the robot provided 25 % reliable feedback). It was hypothesized
that mean trust would differ over time in response to changes in robot reliability.
More specifically, trust would increase from pre-interaction to after experiencing
a 100 % reliable interaction, and would decrease after first experiencing a 100 %
reliable interaction and then experiencing a 25 % reliable interaction.

10.5.1 Experimental Method

Participants included 81 undergraduate students (25 males, 56 females;
M = 22.57 years, SD = 3.95) from an undergraduate psychology course in Science
and Pseudoscience at the University of Central Florida. Participants had varied
backgrounds with respect to robot familiarity. All participants had previously
watched movies or television shows that incorporated robots: 1–5 shows (N = 30),
6–10 shows (N = 9), and over 10 shows (N = 42). Forty-four participants had
previously interacted with robots, ranging from the Roomba™ vacuum cleaner to
a bomb disposal robot. Participants also reported previously controlling robots
through a variety of modalities: voice (N = 11), game controller (N = 24), gestures
or pictures (N = 3), or a radio control (RC) system (N = 35). In addition, five
participants
previously built robots for class-based projects.
All materials were administered in paper-and-pencil versions. The Negative
Attitudes toward Robots Scale (NARS; Nomura et al. 2004) and a demographics
questionnaire were included to identify potential individual difference ratings. The
NARS has three subscales: NARS_S1 represents negative attitudes toward situations
of interaction with robots; NARS_S2 represents negative attitudes toward social
influence of robots; and NARS_S3 represents negative emotions in interactions
with robots. Trait-based trust was measured through the Interpersonal Trust Scale
(ITS; Rotter 1967). State-based trust was measured through the 42 item Trust Scale,
administered pre- and post-interaction. Participants’ states were not assessed in this
experiment due to the nature of the task (monitoring only).
All HRI scenarios were developed using the Robotic Interactive Visualization &
Experimental Technology (RIVET) computer-based simulation system developed
by General Dynamics Robotic Systems (GDRS; Gonzalez et al. 2009), in collab-
oration with the US Army Research Laboratory. RIVET uses an adapted Torque
Software Development Kit (SDK) development and runtime environment through
a Client/Server networking model. Development of the virtual environment (VE)
was accomplished through the TorqueScript language, which is similar to C++. The VEs
used a base environment developed previously for Army research activities. This
included the layout of the physical environment (e.g., ground, roadways, buildings,
and lighting), as well as inclusion of the Soldier and terrorist non-player characters.
Task-specific customization of the environment was accomplished through scripting
syntax. Examples of specific customization included entering objects, obstacles, and
creation of paths.
An experimental scenario was created using the RIVET computer-based sim-
ulation system of a Soldier-Talon™ robot team completing a “screen the back”
mission (Army Research Laboratory 2012). Participants monitored the Talon robot
as it navigated to the back of the building, repositioned behind a barrel, monitored
the back of the building for human targets, and provided a speech-based response
when a target was detected. A single video was created from the camera view on
the Talon™ robot. The video of the simulation was created using the FRAPS®
real-time video capture and benchmarking program as a 30 frames/s .avi file.

The .avi file was converted into the .mp4 file format to add auditory feedback. The
robot stated “target detected” in a male computer synthetic voice. There were eight
possible targets included in each scenario. No false alarms were included.
The study was conducted in a single session. Participants were provided a copy
of the informed consent while the experimenter read it aloud. Participants then
viewed an image of the Talon robot and completed the 42 Item Trust Scale (Time
1; pre-interaction). Next, participants were instructed about the human-robot task
they were about to monitor. Following Video 1, participants completed the 42 Item
Trust Scale (Time 2; post-interaction, 100 % reliable feedback). Participants then
received the second task instructions, monitored Video 2, and completed the 42
Item Trust Scale (Time 3; post-interaction, 25 % reliable feedback). Following
these monitoring tasks, participants completed the ITS, NARS, and demographics
questionnaire. The study took approximately 90 min in its entirety.

10.5.2 Experimental Results

All data were analyzed using IBM SPSS Statistics v.19 (SPSS 2010), with an alpha
level set to 0.05, unless otherwise indicated.

10.5.2.1 Individual Item Analysis

The first step of analysis was to determine if each item changed over time. A one-
way within-subjects repeated measures analysis of variance was conducted for each
of the 42 items of the Trust Scale. The factor was “time” and the dependent variable
was the individual score. Thirty-four items showed a significant mean difference
for the condition of “time.” Post-hoc analysis found that six items had significant
differences only between Time 1 (pre-interaction) and Time 2 (post-
interaction). These items included: lifelike, perform many functions at one time,
friendly, know the difference between friend and foe, keep classified information
secure, and work in close proximity to people. These results may have occurred due
to a significant change in the mental model from pre- to post-interaction; therefore,
the items were retained in the scale. Confidence interval analysis was conducted
on the remaining two items (operate in an integrated team environment, built to
last) and showed no significant change between Time 1, Time 2, or Time 3. These
items were removed from the scale. Additional analyses were conducted using the
40 retained items.

10.5.2.2 Trust Score Validation

First, a general score of trust was created for each of the three time periods.
Following reverse coding of specific items, the 40 items were summed and divided
by 40 to formulate a score between 0 and 100. To assess the impact of time on
trust development, a one-way within-subjects repeated measures analysis of
variance was conducted. The results indicated a significant effect of time,
F(2, 79) = 119.10, p < 0.001, ηp² = 0.75. Post-hoc analysis using the Fisher LSD
revealed significant mean differences across all three administrations of the trust
scale: trust was significantly greater at Time 2 (post-interaction, 100 % reliable
condition) than at Time 1 (pre-interaction) and Time 3 (post-interaction, 25 %
reliable condition), thus supporting the hypothesis. In addition, mean trust scores
at Time 1 were significantly greater than at Time 3 (see Fig. 10.3).

Fig. 10.3 Bar graph of mean trust scores (%), with 95 % confidence interval error bars, at
Time 1 (pre-interaction), Time 2 (post-interaction, 100 % reliable), and Time 3
(post-interaction, 25 % reliable)
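For illustration, the one-way within-subjects ANOVA reported above can be run with statsmodels; the sketch below uses a hypothetical long-format layout, hypothetical column names, and made-up scores for three participants.

```python
# Sketch of the one-way within-subjects (repeated measures) ANOVA on the
# overall trust score. Column names and values are hypothetical.
import pandas as pd
from statsmodels.stats.anova import AnovaRM

df = pd.DataFrame({
    "subject": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "time":    ["T1", "T2", "T3"] * 3,          # the within-subjects factor
    "trust":   [62.0, 78.5, 41.0, 55.0, 70.0, 38.5, 60.0, 81.0, 45.0],
})

result = AnovaRM(df, depvar="trust", subject="subject", within=["time"]).fit()
print(result)  # F-test for the effect of time on the trust score
```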

Fig. 10.4 Mean trust scores (%) for the 40 item scale and the 14 item scale at Time 1, Time 2,
and Time 3

10.5.2.3 40 Items Versus 14 Items

Additional analyses were conducted to identify the differences between the 40
Item Trust Scale and the 14 Item SME-recommended scale. A 2 Scale (40 items, 14
items) × 3 Time (Time 1, Time 2, Time 3) repeated measures analysis of variance
was conducted (see Fig. 10.4).
Results showed a significant effect of Time, F(2, 240) = 186.59, p < 0.001,
ηp² = 0.609; a significant effect of Scale, F(1, 240) = 273.61, p < 0.001, ηp² = 0.533;
and an interaction between Time and Scale, F(2, 240) = 108.84, p < 0.001,
ηp² = 0.476. Review of the confidence intervals showed a significant difference
between the scales at Time 1 and Time 2, but not Time 3. While the findings
revealed significant differences between the two scales, graphical representations
showed similar patterns in the results. Taking into account both the individual
analyses of each item measured over time and the comparative results of the two
scales, the total trust score of the 40 Item Trust Scale provided a finer level of
granularity and thus a more accurate trust rating.

10.6 Task-Based Validity Testing: Does the Scale Measure Trust?

This study marked the final validation experiment for the 40 Item Trust Scale.
It used a Same-Trait approach (Campbell and Fiske 1959) to validate that this
scale measured trust and not an alternative construct. Same-trait validity was
evaluated through a comparison of the developed 40 Item Trust Scale and the
well-established Checklist for Trust between People and Automation (Jian et al.
1998), a trust in automation scale. Human-robot interaction was accomplished
through computer-based simulation of a joint navigation task.
It was first hypothesized that there would be strong positive correlations
between the 40 Item Scale, the 14 Item SME-selected subscale, and the Checklist
for Trust between People and Automation (Jian et al. 1998). The second hypothesis
was that the change scores between the two post-interaction conditions (20 % robot
navigation errors minus 80 % robot navigation errors) would not show significant
mean differences across the three trust scales (i.e., 40 item scale, 14 item scale, and
Checklist for Trust). However, it was anticipated that the 40 Item and 14 Item
scales would change from pre-interaction measurement to post-interaction
measurement, as shown in the prior experiment.

10.6.1 Experimental Method

Twenty-one undergraduate students from the University of Central Florida (12
males, 9 females) participated in two Soldier-robot team-based computer
simulations to provide the next level of task-based validity testing. Multiple scales
were included to measure subjective trust, personality traits, demographics, and
human states. Subjective trust was measured through the developed 40 Item Trust
Scale; a partial measure represented by the 14 items recommended by SMEs
was also assessed. The well-established Checklist for Trust between People and
Automation (Jian et al. 1998) was included for Same-Trait analysis; items including
the word ‘automation’ were adapted to ‘robot.’ The Interpersonal Trust Scale (Rotter
1967) was also included. The 7-point Mini-IPIP personality assessment (Donnellan
et al. 2006) was used to measure the Big 5 personality traits: agreeableness,
extraversion, intellect, conscientiousness, and neuroticism. The Dundee Stress State
Questionnaire (DSSQ; Matthews et al. 1999) was included to measure human states
(i.e., mood state, motivation, workload, and thinking style) before and after a task.
The virtual environment (VE) for the present procedure was developed in
RIVET and used a base environment of a Middle Eastern town developed by
GDRS in collaboration with the US Army Research Laboratory. Independent task-
specific customization of the physical environment was accomplished through
scripting syntax. Specific customization included entering objects, obstacles, and
creation of paths, etc. Scripting files were created for both the training session

and the task conditions. The task was to assist an autonomous robot in traveling from a set
location to a rendezvous point. Participants controlled a Soldier avatar throughout
the Middle Eastern town using a keyboard and mouse interface. It was possible
that the robot could become stuck on an obstacle and required the participant’s
assistance. Participants could move certain obstacles out of the way of the robot
by simply walking into the obstacle. The mission ended when both the Soldier and
robot reached the rendezvous location. In Simulation A, the robot autonomously
navigated around four out of the five obstacles. In Simulation B, the robot only
navigated around one of the obstacles. Each simulation was approximately 1 min in
length. The order of simulation presentation was counterbalanced and determined
prior to participation.
Following completion of informed consent, participants completed three ques-
tionnaires: the demographics questionnaire, the Mini-IPIP personality inventory,
and the ITS. Participants were then shown a picture of the Talon robot and completed
the 40 Item Trust Scale, the Checklist for Trust between People and Automation,
and the DSSQ to acquire baseline information. Participants then completed the two
simulated tasks, followed by completion of the two trust scales and the post-task
DSSQ after each task. The simulated tasks were recorded using the FRAPS®
real-time video capture and benchmarking program as a 30 frames/s .avi file.
Video was recorded from the Soldier character’s perspective. It was saved with
a unique identifier to maintain participant confidentiality. The entire study took
approximately 1 h to complete.

10.6.2 Experimental Results

All data were analyzed using IBM SPSS Statistics v.19 (SPSS 2010), with an
alpha level set to 0.05, unless otherwise indicated. Initial analyses were conducted
to assess changes in human states over time. Results demonstrated no significant
difference in mood state or motivation subscales. The thinking style subscales of
self-focused attention and concentration showed a significant difference between
pre-interaction and post-interaction, but no difference between the two post-
interaction conditions. A similar result was found for the thinking content subscale,
task interference. Due to these findings, no additional analyses were conducted
assessing human states. The Same-Trait methodology compared the developed 40
Item Trust Scale, the SME-recommended 14 Item Trust Scale, and the Checklist
for Trust between People and Automation.

10.6.2.1 Correlation Analysis of the Three Scales

The first step in this validation was to identify the relationships between the three
trust scales. In support of Hypothesis 1, significant positive Pearson correlations
were found between all three scales (see Table 10.7).
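As a minimal illustration of this step, Pearson correlations among the three measures can be computed from per-participant scores; the column names and values below are hypothetical.

```python
# Sketch: Pearson correlations among the three trust measures, assuming a
# hypothetical DataFrame with one column of scores per scale.
import pandas as pd

scores = pd.DataFrame({
    "scale_40":  [60.2, 55.0, 71.3],   # hypothetical participant scores
    "scale_14":  [72.0, 64.5, 80.1],
    "checklist": [70.9, 66.0, 78.2],
})
print(scores.corr(method="pearson"))   # correlation matrix, cf. Table 10.7
```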

Table 10.7 Same-trait trust scale correlations over time

                                                M      SD      1        2        3
Pre-interaction trust
1. 40 Item Scale                              60.60  13.58     1
2. 14 Item Scale                              72.45  12.70   0.829**    1
3. Checklist for Trust between People         71.71  16.27   0.620**  0.745**    1
   and Automation
Post-interaction (20 % errors)
1. 40 Item Scale                              49.92  19.59     1
2. 14 Item Scale                              60.61  20.19   0.918**    1
3. Checklist for Trust between People         71.54  16.22   0.857**  0.854**    1
   and Automation
Post-interaction (80 % errors)
1. 40 Item Scale                              46.70  22.43     1
2. 14 Item Scale                              57.42  24.93   0.934**    1
3. Checklist for Trust between People         72.27  17.16   0.855**  0.852**    1
   and Automation

** Significant at the 0.01 level

The second step in this validation process was to determine if there was a
significant mean difference between the post-interaction change scores (20 % robot
navigation errors minus 80 % robot navigation errors) for the three scales. A within-
subjects repeated measures analysis of variance was conducted. In support of
Hypothesis 2, there was not a significant mean difference between the post-
interaction change scores for the three trust scales, F(2, 19) = 2.64, p = 0.097. This
result provided additional support that the developed scale measures the construct
of trust.

10.6.2.2 Pre-post Interaction Analysis

Additional analyses were conducted to determine the differences between the
pre- and post-interaction scores for the three scales. Paired samples t-tests were
conducted to assess the change in trust from pre- to post-interaction. A significant
change in trust was found between the pre- and post-interaction trust measurements
for the 40 Item Trust Scale, t(40) = 3.87, p < 0.001, and the 14 Item Trust Scale,
t(40) = 3.86, p < 0.001. However, the Checklist for Trust between People and
Automation did not change (p = 0.932). This finding suggested that the developed
trust scale does indeed measure something beyond the previously developed
trust scale.

10.6.2.3 Differences Across Scales and Conditions

To further explore these scale differences, a 3 Trust Scale (40 item, 14 item, and
Checklist for Trust) × 3 Condition (pre-interaction, 20 % robot error, and 80 %
robot error) repeated measures analysis of variance was conducted. There was a
main effect of scale, F(2, 59) = 105.16, p < 0.001, ηp² = 0.781, but not of condition
(p = 0.191). Results are depicted in Fig. 10.5. Confidence interval analysis of the
mean trust scores demonstrated that there was no significant difference between the
three scales when recorded pre-interaction, suggesting that all three scales provide
similar trust scores prior to HRI. Further, there were no significant differences
between the 14 Item Trust Scale and the well-established Checklist for Trust
between People and Automation (Jian et al. 1998). This finding is not surprising, as
the items from both the 14 Item Trust Scale and the Jian et al. scale reference the
capability of the system. The important finding was the significant differences
between the 40 Item Trust Scale and the Jian et al. scale during the two
post-interaction conditions.

Fig. 10.5 Mean trust scores (%) for the 40 Item, 14 Item, and Checklist for Trust between
People and Automation (Jian et al. 1998) trust scales at pre-interaction, post-interaction (20 %
errors), and post-interaction (80 % errors)

10.6.3 Experimental Discussion

This study demonstrated that the developed trust scale assessed the construct of
trust. In addition, it provided support for additional benefits of the developed 40
Item Trust Scale above and beyond previously used scales (i.e., Checklist for Trust
between People and Automation; Jian et al. 1998). First, there were strong positive
correlations between the three scales. Second, mean analysis showed significant
differences between the 40 item and Jian et al. scale in the post-interaction
conditions. Both the 40 Item and the 14 Item scales showed a significant change
in trust from pre-interaction to post-interaction; however, the Checklist for Trust
did not change. This change in trust mirrored the previous study and is supported
by trust theory. Therefore, it can be postulated that the developed Trust Scale
accounted for the relationship between the change in mental models and trust
development that occurs after HRI. In addition, findings from this study, together
with the findings from the previous study, provide support that the 40 Item Trust
Scale produced more accurate trust scores than both the 14 Item SME-recommended
scale and the Checklist for Trust scale.

10.7 Conclusion

The goal of this research was to develop a trust perception scale focused on the
antecedents and measurable factors of trust specific to the human, robot, and
environmental elements. This resulted in the creation of the 40 item Trust Perception
Scale-HRI and the 14 item sub-scale. The finalized scale was designed as a pre-post
interaction measure used to assess changes in trust perception specific to HRI. The
scale was also designed to be used as a post-interaction measure to compare changes
in trust across multiple conditions. It was further designed to be applicable across all
robot domains. Therefore, this scale can benefit future robotic development specific
to the interaction between humans and robots.

10.7.1 The Trust Perception Scale-HRI

The scale provided an overall percentage score across all items. Items were preceded
by the question “What percentage of the time will this robot…” followed by a list
of the items. The finalized 40 item scale is provided in Table 10.8 and takes between
5 and 10 min to complete.
Table 10.8 Finalized Trust Perception Scale-HRI

Each item is rated on an 11-point percentage scale: 0 %, 10 %, 20 %, 30 %, 40 %, 50 %, 60 %,
70 %, 80 %, 90 %, 100 %.

What % of the time will this robot be…
1. Considered part of the team
2. Responsible
3. Supportive
4. Incompetentᵃ
5. Dependableᵇ
6. Friendly
7. Reliableᵇ
8. Pleasant
9. Unresponsiveᵃ,ᵇ
10. Autonomous
11. Predictableᵇ
12. Conscious
13. Lifelike
14. A good teammate
15. Led astray by unexpected changes in the environment

What % of the time will this robot…
16. Act consistentlyᵇ
17. Protect people
18. Act as part of the team
19. Function successfullyᵇ
20. Malfunctionᵃ,ᵇ
21. Clearly communicate
22. Require frequent maintenanceᵃ
23. Openly communicate
24. Have errorsᵃ,ᵇ
25. Perform a task better than a novice human user
26. Know the difference between friend and foe
27. Provide feedbackᵇ
28. Possess adequate decision-making capability
29. Warn people of potential risks in the environment
30. Meet the needs of the mission/taskᵇ
31. Provide appropriate informationᵇ
32. Communicate with peopleᵇ
33. Work best with a team
34. Keep classified information secure
35. Perform exactly as instructedᵇ
36. Make sensible decisions
37. Work in close proximity with people
38. Tell the truth
39. Perform many functions at one time
40. Follow directionsᵇ

ᵃ Represents the reverse coded items for scoring
ᵇ Represents the 14 item sub-scale items

10.7.2 Instruction for Use

When the scale is used as a pre-post interaction measure, participants should
first be shown a picture of the robot they will be interacting with, or be provided
with a description of the task, prior to completing the pre-interaction scale. This
accounts for any mental model effects of robots and allows for comparison specific
to the robot at hand. For post-interaction measurement, the scale should be
administered directly following the interaction. To create the overall trust score,
five items must first be reverse coded; these items are denoted in Table 10.8. All
items are then summed and divided by the total number of items (40), yielding an
overall trust score expressed as a percentage.
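To make the scoring concrete, a minimal sketch in Python follows. It assumes the
40 ratings are stored in item order as percentages (0 to 100) and that reverse
coding maps a rating r to 100 - r; both conventions are assumptions adopted for
illustration rather than requirements stated by the scale.

    # Illustrative scoring for the 40-item Trust Perception Scale-HRI.
    # Assumptions: ratings are percentages (0-100) stored in item order,
    # and reverse coding maps a rating r to 100 - r.

    REVERSE_CODED = {4, 9, 20, 22, 24}  # items marked (a) in Table 10.8

    def overall_trust(ratings):
        """Overall trust score (a percentage) across all 40 items."""
        if len(ratings) != 40:
            raise ValueError("expected 40 item ratings")
        coded = [100 - r if item in REVERSE_CODED else r
                 for item, r in enumerate(ratings, start=1)]
        return sum(coded) / len(coded)

For example, a respondent who rates every item at 70 % (which becomes 30 % on the
five reverse-coded items) would receive an overall score of (35 × 70 + 5 × 30)/40
= 65 %.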
While use of the full 40-item scale is recommended, a 14-item sub-scale can be used
to provide rapid trust measurement, specifically for measuring changes in trust over
time or during assessments with multiple trials or time restrictions. This sub-scale
is specific to the functional capabilities of the robot, and therefore may not account
for changes in trust due to the feature-based antecedents of the robot. The sub-scale
trust score is calculated by first reverse coding the ‘have errors,’ ‘unresponsive,’
and ‘malfunction’ items, and then summing the 14 item scores and dividing by 14. The
14 items are marked in Table 10.8.
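Under the same assumptions as the sketch above, the sub-scale can be scored
analogously; the item numbers below follow the (b) markers in the reconstruction
of Table 10.8.

    # Illustrative scoring for the 14-item sub-scale (items marked (b)).
    SUBSCALE_ITEMS = [5, 7, 9, 11, 16, 19, 20, 24, 27, 30, 31, 32, 35, 40]
    SUBSCALE_REVERSED = {9, 20, 24}  # unresponsive, malfunction, have errors

    def subscale_trust(ratings):
        """14-item sub-scale trust score (a percentage)."""
        coded = [100 - ratings[i - 1] if i in SUBSCALE_REVERSED
                 else ratings[i - 1]
                 for i in SUBSCALE_ITEMS]
        return sum(coded) / len(SUBSCALE_ITEMS)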

10.7.3 Current and Future Applications

This scale was developed to provide a means to subjectively measure trust per-
ceptions over time and across robotic domains. In addition, it can be used by
individuals in all the major roles of HRI: operator, supervisor, mechanic, peer, or
bystander. Therefore, there are many potential avenues for future research using
the Trust Perception Scale-HRI. Current and near-term research studies highlight the
expansion of human-robot trust antecedents, which were first described in the
Three Factor Model of Human-Robot Trust (Hancock et al. 2011; Schaefer et al.
2014). These include the exploration of human-robot trust as it relates to: Soldier,
bystander, and robot proximity in high-risk military tasking; transparent system
communication and feedback for Soldier-robot teaming in high-risk environments;
the development of natural language processing in human-robot teams; and dual-
task engagement with an autonomous vehicle designed for on-base personnel
transport. While a number of these studies are currently under review or in press, a
few preliminary results are discussed below.
First, Sanders et al. (2014) used the Trust Perception Scale-HRI to assess the
impact of the amount of communication feedback from the robot (constant, contextual
only, minimal) and the modality of that information (visual, text, audio) on trust
development. The task was a Soldier-robot team surveillance mission: clearing an
area of weapons and locating civilians to be safely evacuated from a hostile zone.
Initial results showed a greater increase in trust (post-interaction score minus
pre-interaction score) for a constant stream of information than for contextual-only
or minimal information, across all three communication modalities. These researchers
are continuing to use the Trust Perception Scale-HRI in studies of transparent
communication, human roles (team member, bystander), and social dynamics including
proxemics, as part of the US Army Research Laboratory’s Robotics Collaborative
Technology Alliance tasking on determinants of shared cognition and social dynamics
in future Soldier-robot teams.
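For pre-post designs of this kind, the quantity compared across conditions is
simply each participant’s post-interaction score minus the pre-interaction score.
A brief sketch follows; the condition labels mirror the study, but the scores are
invented for illustration.

    # Hypothetical change-in-trust computation for a pre-post design.
    # Scores are invented for illustration, not taken from Sanders et al.
    pre_post = {
        "constant":   (62.5, 74.0),
        "contextual": (61.0, 65.5),
        "minimal":    (63.0, 60.5),
    }
    change_in_trust = {cond: post - pre
                       for cond, (pre, post) in pre_post.items()}
    # change_in_trust == {"constant": 11.5, "contextual": 4.5, "minimal": -2.5}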
Second, a recent study using the Trust Perception Scale-HRI was conducted in which
participants monitored both a simulated and a live robot surveying an environment,
locating an object, and touching the object (Schafer et al. 2015). Results supported
previous findings that individuals trust a reliable robot significantly more than an
unreliable one. The results also showed that the scale is effective for measuring
trust in both simulated and live HRI experimentation.
A third area of ongoing and near-term work using the Trust Perception Scale-HRI
explores trust development with respect to robotic passenger vehicles and
transparent passenger user interfaces (Schaefer 2015). The design of this set of
computer-based simulation studies is in line with the goals of the US Army Tank
Automotive Research, Development and Engineering Center’s ARIBO (Applied Robotics
for Installation and Base Operations) project for alternative transportation options
for on-base wounded Soldier transit (Marshall 2014). The ultimate goal is to provide
a means for Soldiers to schedule an on-demand
autonomous robotic passenger vehicle that drives door-to-door between the Medical
Barracks and the on-base medical facilities. The benefit of this work is in
understanding the levels of trust that will enhance usage and effective human-robot
interaction. Results of this set of studies will provide additional insight into
the impact of trust antecedents (e.g., transparency, cueing, human characteristics)
on trust development, and will explore how the trust relationship changes as the
human role transitions from driver, to safety rider, to supervisor external to the
vehicle, and finally to passenger. Initial findings advance current trust theory by
demonstrating significant relationships between trust and working memory capacity,
coping style related to driving, distress, workload, and task performance (Schaefer
and Scribner 2015). Future work on this project is twofold: (1) understanding the
effects of the availability of driver control interfaces (i.e., steering and speed
control versus automation engage/disengage buttons) on usability, performance, and
trust; and (2) exploring the effects of transparent user interface design on trust
development and calibration for passengers.

Acknowledgments This research is a continuation of the author’s dissertation work supported
in part by the US Army Research Laboratory (Cooperative Agreement Number W911NF-10-2-
0016) and in part by an appointment to the US Army Research Postdoctoral Fellowship Program
administered by the Oak Ridge Associated Universities through a cooperative agreement with
the US Army Research Laboratory (Cooperative Agreement Number W911NF-12-2-0019).
The views and conclusions contained in this document are those of the author and should not
be interpreted as representing the official policies, either expressed or implied, of the Army
Research Laboratory or the US Government. The US Government is authorized to reproduce
and distribute reprints for Government purposes notwithstanding any copyright notation herein.
Special acknowledgments hereby include the author’s dissertation committee: Drs. Peter A.
Hancock, John D. Lee, Florian Jentsch, Peter Kincaid, Deborah R. Billings, and Lauren Reinerman.
Additional acknowledgments are made to internal technical reviewers from the US Army Research
Laboratory: Dr. Susan G. Hill, Dr. Don Headley, Mr. John Lockett, Dr. Kim Drnec, and Dr.
Katherine Gamble.

References

Army Research Laboratory (2012) Robotics Collaborative Technology Alliance Annual Program
Plan. U.S. Army Research Laboratory, Aberdeen Proving Ground
Bartneck C, Kulić D, Croft E, Zoghbi S (2009) Measurement instruments for the anthropomor-
phism, animacy, likeability, perceived intelligence, and perceived safety of robots. Int J Soc
Robot 1:71–81. doi:10.1007/s12369-008-0001-3
Beer JM, Fisk AD, Rogers WA (2014) Toward a framework for levels of robot autonomy in human-
robot interaction. J Hum Robot Interact 3(2):74–99. doi:10.5898/JHRI.3.2.Beer
Blomqvist K (1997) The many faces of trust. Scand J Manage 13(3):271–286
Campbell DT, Fiske DW (1959) Convergent and discriminant validation by the multitrait-
multimethod matrix. Psychol Bull 56:81–105
Chen JYC, Terrence PI (2009) Effects of imperfect automation and individual differences on con-
current performance of military and robotics tasks in a simulated multi-tasking environment.
Ergonomics 52(8):907–920. doi:10.1080/00140130802680773
Desai M, Stubbs K, Steinfeld A, Yanco H (2009) Creating trustworthy robots: lessons and
inspirations from automated systems. In: Proceedings of the AISB convention: new frontiers in
human-robot interaction, Edinburgh. Retrieved from https://www.ri.cmu.edu/pub_files/2009/4/
Desai_paper.pdf
DeVellis RF (2003) Scale development theory and applications, vol 26, 2nd edn, Applied social
research methods series. Sage, Thousand Oaks
Donnellan MB, Oswald FL, Baird BM, Lucas RE (2006) The Mini-IPIP scales: tiny-yet-
effective measures of the Big Five factors of personality. Psychol Assess 18(2):192–203.
doi:10.1037/1040-3590.18.2.192
Fink A (2009) How to conduct surveys: a step-by-step guide, 4th edn. Sage, Thousand Oaks
Gonzalez JP, Dodson W, Dean R, Kreafle G, Lacaze A, Sapronov L, Childers M (2009) Using
RIVET for parametric analysis of robotic systems. In: Proceedings of 2009 ground vehicle
systems engineering and technology symposium (GVSETS), Dearborn
Groom V, Nass C (2007) Can robots be teammates? Benchmarks in human-robot teams. Interact
Stud 8(3):483–500. doi:10.1075/is.8.3.10gro
Hancock PA, Billings DR, Schaefer KE, Chen JYC, Parasuraman R, de Visser E (2011) A meta-
analysis of factors affecting trust in human-robot interaction. Hum Factors 53(5):517–527.
doi:10.1177/0018720811417254
Jian J-Y, Bisantz AM, Drury CG, Llinas J (1998) Foundations for an empirically determined scale
of trust in automated systems (report no. AFRL-HE-WP-TR-2000-0102). Air Force Research
Laboratory, Wright-Patterson AFB
Lawshe CH (1975) A quantitative approach to content validity. Pers Psychol 24(4):563–575.
doi:10.1111/j.1744-6570.1975.tb01393.x
Lee KM, Park N, Song H (2005) Can a robot be perceived as a developing creature? Effects of a
robot’s long-term cognitive developments on its social presence and people’s social responses
toward it. Hum Commun Res 31(4):538–563. doi:10.1111/j.1468-2958.2005.tb00882.x
Lewicki RJ, McAllister DJ, Bies RJ (1998) Trust and distrust: new relationships and realities. Acad
Manage Rev 23(3):438–458. doi:10.5465/AMR.1998.926620
Lussier B, Gallien M, Guiochet J (2007) Fault tolerant planning for critical robots. In: Proceedings
of the 37th annual IEEE/IFIP international conference on dependable systems and networks,
pp 144–153. doi:10.1109/DSN.2007.50
Marshall P (2014) Army tests driverless vehicles in ‘living lab.’ GCN technology, tools, and
tactics for public sector IT. Retrieved from http://gcn.com/Articles/2014/07/16/ARIBO-Army-
TARDEC.aspx?Page=1
Matthews G, Joyner L, Gilliland K, Campbell SE, Falconer S, Huggins J (1999) Validation of a
comprehensive stress state questionnaire: towards a state “Big Three”. In: Mervielde I, Deary
IJ, De Fruyt F, Ostendorf F (eds) Personality psychology in Europe, vol 7. Tilburg University
Press, Tilburg, pp 335–350
McAllister DJ (1995) Affect- and cognition-based trust as foundations for interpersonal coopera-
tion in organizations. Acad Manage J 38(1):24–59
McKnight DH, Kacmar CJ, Choudhury V (2004) Dispositional trust and distrust distinctions in
predicting high- and low-risk internet expert advice site perceptions. e-Service J 3(2):35–58.
Retrieved from http://www.jstor.org/stable/10.2979/ESJ.2004.3.2.35
Merritt SM, Ilgen DR (2008) Not all trust is created equal: dispositional and
history-based trust in human-automation interactions. Hum Factors 50(2):194–210.
doi:10.1518/001872008X288574
Monahan JL (1998) I don’t know it but I like you—the influence of non-conscious affect on person
perception. Hum Commun Res 24(4):480–500. doi:10.1111/j.1468-2958.1998.tb00428.x
Nomura T, Kanda T, Suzuki T, Kato K (2004) Psychology in human-robot communication: an
attempt through investigation of negative attitudes and anxiety toward robots. In: Proceedings
of the 2004 IEEE international workshop on robot and human interactive communication,
Kurashiki, Okayama, pp 35–40. doi:10.1109/ROMAN.2004.1374726
Powers A, Kiesler S (2006) The advisor robot: tracing people’s mental model from a robot’s
physical attributes. In: Proceedings of the 1st ACM SIGCHI/SIGART conference on human-robot
interaction, Salt Lake City
Rotter JB (1967) A new scale for the measurement of interpersonal trust. J Pers 35(4):651–665.
doi:10.1111/j.1467-6494.1967.tb01454.x
Rouse WB, Morris NM (1986) On looking into the black box: prospects and limits in the search
for mental models. Psychol Bull 100(3):349–363. doi:10.1037/0033-2909.100.3.349
Sanders TL, Wixon T, Schafer KE, Chen JYC, Hancock PA (2014) The influence of modality and
transparency on trust in human-robot interaction. In: Proceedings of the fourth annual IEEE
CogSIMA conference, San Antonio
Schaefer KE (2013) The perception and measurement of human-robot trust. Dissertation, Univer-
sity of Central Florida, Orlando
Schaefer KE (2015) Perspectives of trust: research at the US Army Research Laboratory. In: R
Mittu, G Taylor, D Sofge, WF Lawless (Chairs) Foundations of autonomy and its (cyber)
threats: from individuals to interdependence. Symposium conducted at the 2015 Association
for the Advancement of Artificial Intelligence (AAAI), Stanford University, Stanford
Schaefer KE, Sanders TL, Yordon RE, Billings DR, Hancock PA (2012) Classification of robot
form: factors predicting perceived trustworthiness. Proc Hum Fact Ergon Soc 56:1548–1552.
doi:10.1177/1071181312561308
Schaefer KE, Billings DR, Szalma JL, Adams JK, Sanders TL, Chen JYC, Hancock PA (2014)
A meta-analysis of factors influencing the development of trust in automation: implications for
human-robot interaction (report no. ARL-TR-6984). U.S. Army Research Laboratory, Aberdeen
Proving Ground
Schafer KE, Sanders T, Kessler TA, Wild T, Dunfee M, Hancock PA (2015) Fidelity & validity in
robotic simulation. In: Proceedings of the fifth annual IEEE CogSIMA conference, Orlando
Schaefer KE, Scribner D (2015) Individual differences, trust, and vehicle autonomy: a pilot
study. Proc Hum Fact Ergon Soc 59:786–790. doi:10.1177/1541931215591242
Scholtz J (2003) Theory and evaluation of human robot interactions. In: Proceedings from the 36th
annual Hawaii international conference on system sciences. doi:10.1109/HICSS.2003.1174284
Steinfeld A, Fong T, Kaber D, Lewis M, Scholtz J, Schultz A, Goodrich M (2006) Common metrics
for human-robot interaction. In: Proceedings of the first ACM/IEEE international conference
on human robot interaction, Salt Lake City, pp 33–40. doi:10.1145/1121241.1121249
Warner RM, Sugarman DB (1986) Attributions of personality based on physical appearance, speech,
and handwriting. J Pers Soc Psychol 50:792–799
Wildman JL (2011) Cultural differences in forgiveness: fatalism, trust violations, and trust repair
efforts in interpersonal collaboration. Dissertation, University of Central Florida, Orlando
Wildman JL, Fiore SM, Burke CS, Salas E, Garven S (2011) Trust in swift starting action teams:
critical considerations. In: Stanton NA (ed) Trust in military teams. Ashgate, London, pp 71–88
Yagoda RE, Gillan DJ (2012) You want me to trust a robot? The development of a human-robot
interaction trust scale. Int J Soc Robot 4(3):235–248
