
VR Enhancements in Shoulder Surfing Research

This document discusses using virtual reality to augment lab-based shoulder surfing research on authentication systems. The authors conducted a study where participants observed user authentications in VR across different scenarios. They found that immersive VR led to a more realistic experience and higher sense of presence compared to 2D video, but did not impact observation performance.

Virtual Reality Observations: Using Virtual Reality to Augment Lab-Based Shoulder Surfing Research

In Proceedings of the IEEE Conference on Virtual Reality and 3D User Interfaces (VR)
Christchurch, New Zealand, March 2022 (IEEE VR 2022)

Florian Mathis, University of Glasgow, United Kingdom
[Link]@[Link]
Joseph O’Hagan, University of Glasgow, United Kingdom
[Link].1@[Link]
Mohamed Khamis, University of Glasgow, United Kingdom
[Link]@[Link]
Kami Vaniea, University of Edinburgh, United Kingdom
kvaniea@[Link]

This is the author’s final accepted version. It is posted here for your personal use. Not for
redistribution. The definitive version was accepted to the IEEE Conference on Virtual Reality
and 3D User Interfaces (VR) (IEEE VR 2022) and will be available in the IEEE Xplore library.
Virtual Reality Observations: Using Virtual Reality to Augment Lab-Based
Shoulder Surfing Research
Florian Mathis* Joseph O’Hagan† Mohamed Khamis‡ Kami Vaniea§
University of Glasgow University of Glasgow University of Glasgow University of Edinburgh
University of Edinburgh

Figure 1: We explore the use of virtual reality (VR) for shoulder surfing research in the authentication research domain. We
compare the impact of non-immersive/immersive VR observations on participants’ observation performance and behaviour while
shoulder surfing authentications. We demonstrate the strengths of VR-based shoulder surfing research by exploring three different
authentication scenarios: (➊) automated teller machine (ATM), (➋) smartphone PIN, and (➌) smartphone pattern authentication.

ABSTRACT

Given the difficulties of studying the shoulder surfing resistance of authentication systems in a live setting, researchers often ask study participants to shoulder surf authentications by watching two-dimensional (2D) video recordings of a user authenticating. However, these video recordings do not provide participants with a realistic shoulder surfing experience, creating uncertainty in the value and validity of lab-based shoulder surfing experiments. In this work, we exploit the unique characteristics of virtual reality (VR) and study the use of non-immersive/immersive VR recordings for shoulder surfing research. We conducted a user study (N=18) to explore the strengths and weaknesses of such a VR-based shoulder surfing research approach. Our results suggest that immersive VR observations result in a more realistic shoulder surfing experience, in a significantly higher sense of being part of the authentication environment, in a greater feeling of spatial presence, and in a higher level of involvement than 2D video observations without impacting participants’ observation performance. This suggests that studying shoulder surfing in VR is advantageous in many ways compared to currently used approaches, e.g., participants can freely choose their observation angle rather than being limited to a fixed observation angle as done in current methods. We discuss the strengths and weaknesses of using VR for shoulder surfing research and conclude with four recommendations to help researchers decide when (and when not) to employ VR for shoulder surfing research in the authentication research domain.

Keywords: Virtual Reality, Shoulder Surfing, Authentication

* e-mail: [Link]@[Link]
† e-mail: [Link].1@[Link]
‡ e-mail: [Link]@[Link]
§ e-mail: kvaniea@[Link]
¹ We use the terminology by Freina and Ott [28] where non-immersive refers to a computer-based environment that simulates places in the real or imagined worlds. Immersive takes the idea even further by providing the perception of being physically present in the non-physical world.

1 INTRODUCTION

Accessing private data has become a fundamental part of most people’s daily life. Examples include, but are not limited to, checking emails on smartphones, accessing the account balance through online banking apps, or withdrawing cash at automated teller machines (ATMs). In many of these situations, users are required to authenticate (e.g., to enter a PIN), which puts them at risk of getting observed (referred to as shoulder surfing [23]). Consequently, researchers looked into the shoulder surfing resistance of a large variety of authentication schemes (e.g., [11, 18, 20, 39, 63]). A common approach in human-centred security research is to study such systems’ security by inviting participants to the lab, showing them two-dimensional (2D) video recordings, and asking them to guess the observed PIN (e.g., [20, 41]). These recordings show user authentications from pre-defined observation angles, with researchers’ intention to simulate a “best-case scenario” for an attacker that shoulder surfs the user. Although 2D video recordings form a suitable baseline for shoulder surfing research [9], it remains unclear a) how (if at all) researchers empirically define the observation perspective, b) if the selected perspective indeed represents a best-case scenario for attackers, and c) if 2D video recordings can provide realistic shoulder surfing experiences. While studying shoulder surfing in a live setting is possible, it is often challenging [82] and in some cases even infeasible. For example, studying shoulder surfing on ATM authentications in the real world is close to impossible due to ethical and legal constraints [19, 77].

As a result, we explore in this work how virtual reality (VR) can support shoulder surfing research by enabling researchers to study shoulder surfing in settings that are challenging to replicate in the lab and infeasible to research in the real world. Ideally, researchers would be able to assess a system’s shoulder surfing resistance in a variety of contexts without much effort. Through the use of VR as a research platform, we enable researchers to a) evaluate the shoulder surfing resistance of authentications in situ instead of in lab settings (e.g., [18, 22]), and b) investigate participants’ observation strategies in much more detail than what can be achieved in traditional lab settings. To explore the potential of VR for shoulder surfing research on authentication systems, we conducted a lab-based VR user study (N=18). We exposed participants to user authentications in three different contexts: ATM, smartphone PIN, and smartphone pattern authentication. We then ran a comparison of participants’ perception when shoulder surfing user authentications using commonly used 2D video recordings (i.e., 2DVO, our baseline), and non-immersive¹ (i.e., 3DO) and immersive VR recordings (i.e., VRO). Our findings
show that there is a significant difference in participants’ observation performance between VRO and 3DO. However, in line with Aviv et al.’s findings [9], 2DVO already provide a suitable baseline measure for shoulder surfing research, especially when assessing a system’s resilience against observations. Participants’ observation performance is highest in VRO (M=93.14%, SD=25.35%), followed by 2DVO (M=89.35%, SD=30.92%) and 3DO (M=81.40%, SD=39.01%), with no evidence of a significant difference between VRO and 2DVO. VRO resulted in a higher level of sense of being there, in a higher level of spatial presence, and increased participants’ involvement and experienced realism compared to 2DVO and 3DO. This, together with participants’ observation performance, suggests that VRO are suitable for shoulder surfing research and are to be preferred in situations where researchers’ aim is to a) provide participants with more realistic shoulder surfing experiences and b) study participants’ observation strategies in much more detail than what 2DVO are capable of.

Based on our findings, we contribute four lessons learned, such as accounting for real-world factors (e.g., proxemics [32]) and the importance of introducing participants to novel observation methods, to support researchers in their decision when (and when not) to employ VR as a research method for authentication and shoulder surfing research. In sum, the contribution of our work is three-fold: (1) We propose the use of non-immersive and immersive VR observations for shoulder surfing research on authentication systems and explore their strengths and weaknesses. (2) We demonstrate through three different authentication scenarios how VR can contribute towards more realistic shoulder surfing research. (3) Finally, we discuss our findings in the light of prior works and provide four recommendations to support researchers when leveraging VR for authentication and shoulder surfing investigations.

2 BACKGROUND AND RELATED WORK

To contextualise our work, we review shoulder surfing and authentication research, and works that used VR as a research platform.

2.1 Shoulder Surfing and Authentication Research

The literature of shoulder surfing ranges from works that collected shoulder surfing stories in the wild [23], to more system-focused research that explored the shoulder surfing resistance of novel privacy-protecting (e.g., [15, 57]) and security systems (e.g., [20, 21, 78]). In authentication research, which is considered to be a major theme in human-centred security and privacy [29], most shoulder surfing evaluations rely on either a) two-dimensional video recordings or b) live observations [13, 82]. Roth et al. [61] exposed participants to video recordings that showed both the authentication scheme and the user’s interactions. De Luca et al. [20] located a camera opposite to their participants and an additional one at participants’ back to run post-hoc shoulder surfing evaluations. There is a significantly larger body of work that relied on video recordings for shoulder surfing evaluations (e.g., [11, 18, 39, 63]).

Others conducted shoulder surfing research through live observations where participants observed authentications in real time and could choose a viewing position on their own. Zakaria et al. [86] simulated live shoulder surfing by letting participants observe user authentications performed by the experimenter (who acted as “the victim”). Mathis et al. [48] equipped participants with a smartphone to then let them freely move around and record the user’s authentication. Saad et al. [62] used 360° real-world videos to better understand users’ shoulder surfing gaze behaviour and argued such a virtual scenario “brings us one step closer to the goal of understanding shoulder surfing”. Other works investigated the impact of multiple simultaneous shoulder surfers on a system’s resistance [40] or proposed a model for modern shoulder surfing where authentications are divided into minimal human observations [83]. Bošnjak and Brumen [13] even argued that shoulder surfing evaluation methods are often “devised based on expert knowledge and general intuition, [but] method design should instead be driven by well-established experimental evaluation” [13]. In many shoulder surfing studies, it has been argued that the systems are evaluated “under optimal conditions for the attacker” [61], that “opting for an expert attack represents a worst-case-scenario that provides a good estimate of the security of an authentication mechanism” [20], and that corresponding threat models assumed a “best case scenario for the attacker” [48]. However, Wiese and Roth [82] recommended studying live observations and taking attackers’ observation strategies into account when studying security systems empirically [82]. Aviv et al. [9] went one step further and analysed the claims that video recordings offer a suitable alternative for shoulder surfing research. Although they concluded that 2D video recordings can provide a suitable baseline measure for shoulder surfing [9], they also highlighted the importance of not overclaiming findings of such evaluations as they can, in fact, greatly underestimate the threat of an attacker in a live setting [9].

2.2 VR Studies for Human-centred Research

Several research communities recently began using VR as a research platform for human-centred research. The Human-computer Interaction (HCI) community investigated using VR as a research methodology to evaluate smart artefacts [76] and pedestrian navigation methods [65]. Voit et al.’s comparison of five empirical research methods [76] (i.e., online, VR, augmented reality, lab, in situ) suggested that VR and in situ provide similar insights when evaluating standardised questionnaires such as SUS [14] or AttrakDiff [35]. Weiß et al. [81] showed that alternative empirical research methods (e.g., VR) might be used to infer insights about in situ studies and that the evaluation of situated visualisations is not necessarily dependent on the empirical research method.

In the human-centred security domain, Mathis et al. [47] conducted a replication study to evaluate a real-world authentication scheme in VR. While their work, along with George et al.’s initial comments on using VR as a testbed [30], is the first that validated the potential of VR for human-centred security research, Mathis et al. [47] also argued that their investigation lays only the groundwork. In particular, they noted that follow-up research is required to validate the use of VR for the broader research field and establish VR studies as a complementary research method for real-world investigations. For example, Mathis et al. [47] did not study VR’s unique affordances of non-immersive and immersive VR observations for shoulder surfing research. VR studies can also be particularly helpful at times where physical spaces are challenging to access or even prohibited (e.g., during a pandemic) [44]. Rebelo et al. [58] argued that VR enables researchers to develop realistic-looking environments that come with greater control of experimental conditions than lab settings and that users’ experience can benefit from using VR as a research methodology. Thomas Parsons [53] showed that virtual environments can enhance ecological validity in the clinical, affective, and social neurosciences through evaluation paradigms that combine the experimental control of laboratory measures with emotionally engaging background narratives.

VR studies were also proposed as a new social psychological research tool to overcome the existing problems around control–mundane realism trade-off, lack of replication, and unrepresentative sampling [12]. Fiore et al. [27] proposed VR studies in the environmental policy research domain to provide a bridge over the methodological gap between lab and field studies and concluded that VR has the potential to combine the internal validity of controlled lab experiments with the external validity of field experiments.

2.3 Lessons Learned from Prior Work

From the literature, we learned that live shoulder surfing (instead of video recordings) should be preferred when conducting shoulder
surfing research (e.g., [8, 82]). However, human-centred security researchers often rely on video recordings due to the difficulties of running these evaluations in real time (e.g., requiring researchers to simulate real-world adversaries [82]). It is worth mentioning that video recordings offer consistency across the entire study sample, which is not necessarily the case in a real-time setting [82]. Prior work showed that VR setups enable researchers to simulate hard-to-reach or safety-critical physical locations in an affordable and effortless way [44, 54]. This is particularly interesting for the human-centred security domain where private and sensitive contexts are often challenging to study [19, 77]. We also noticed that VR has already been successfully applied in several other research domains (e.g., Human-computer Interaction [44, 47, 52, 76], Information Visualisation [81, 85]).

To draw on the success of previous VR studies and to close the gap between commonly used 2D video recordings and the often hard-to-conduct real-time shoulder surfing evaluations, we build upon previous works that used 2D video recordings (e.g., [8, 9, 47]). As such, we investigate the strengths of VR for in situ shoulder surfing research and participants’ performance when using three-dimensional VR-based observations. While Mathis et al. [47] ran a comparison between 2D videos recorded in VR (2DVO, our baseline) and 2D real-world videos, we extend their work by investigating for the first time the impact of 3D non-immersive and immersive VR observations on participants’ shoulder surfing performance and behaviour. Saad et al. [62] proposed 360° real-world videos for shoulder surfing research, but there is a lack of an evaluation of a) the impact of such recordings on participants’ performance when observing different authentication schemes in different contexts and b) users’ observation strategies and their movement behaviour (e.g., positioning, adhering to social proxemics [32]). Furthermore, due to the lack of a baseline condition in the work by Saad et al. [62] (e.g., 2D videos [11, 18, 63]), it remains unclear how participants’ performance differs in comparison to the use of traditional 2D videos. We fill this gap through an in-depth comparison between three-dimensional VR observations (3DO and VRO, see Sect. 3.2) and the de facto standard approach (2D Video Observations) to evaluate authentication systems and their resilience against shoulder surfing.

Our work provides promising insights into the use of VR for authentication and shoulder surfing research. It demonstrates how such a research approach enables researchers to study users’ movement behaviour when observing user authentications in different environments and on different authentication schemes and opens the door for the research community to leverage VR’s unique affordances to further advance human-centred security research.

3 STUDIED AUTHENTICATION SCENARIOS: APPARATUS AND IMPLEMENTATION

We simulated in this work three scenarios that all take place in public spaces: 1) ATM authentication, 2) smartphone PIN authentication, 3) smartphone pattern authentication. We studied these three scenarios for several reasons: First, a survey by Eiband et al. [23] showed that shoulder surfing is most prominent in public spaces, especially when using smartphones. Second, ATMs are often found in public spaces, are frequently visited by people (e.g., De Luca et al. [19] reported widespread ATM usage), and are particularly challenging to research in the real world [19, 77]. Running a similar study in front of a real-world ATM is close to impossible in the detail required for our research. Furthermore, shoulder surfing forms an important threat vector in authentication research and both studied schemes (i.e., PIN and pattern) form a popular security baseline in the human-centred security field (e.g., for PINs: [8, 20, 31, 39], for patterns: [8, 20, 31]).

To evaluate the suitability of VR-based three-dimensional observations for shoulder surfing research, we first had to collect recordings of users authenticating. We implemented three authentication scenarios using Unity 3D (C#), see Fig. 1. We used a Leap Motion for the hand tracking [50] and an abstract avatar design that comes with a head, body, legs, eyes, and hands. Note that the abstract avatar’s dimensions and movements were mapped to a human in the real world. Previous research showed that shoulder surfing studies conducted in virtual environments do not necessarily require highly realistic full-body avatars [45, 47]. Using a more abstract avatar also contributes to making VR studies [44, 49, 51] more accessible to the broader research community [46] as it does not require additional expertise in hardware (e.g., tracking systems) and avatar-building expertise. We used the same avatar (see Fig. 1 and Fig. 3) for all three authentication systems, authentication environments, and observation methods to contribute to high internal validity. To track users’ smartphone in the virtual environment, we attached an HTC VIVE tracker to the back of a real smartphone, similar to Amano et al. [7, Figure 5]. We then prepared 2D video recordings and non-immersive/immersive VR recordings for the actual user study (see Sect. 3.2). We enriched participants’ shoulder surfing experience with realistic environmental sounds that match the virtual environment (e.g., traffic sounds, birds twittering).

3.1 Authentication Scenarios and Environments

We used a low-polygon styled city package [38], a 3D model of an ATM [1], and a smartphone 3D model [2] that we slightly modified by replacing the lock screen with our authentication schemes (PIN and pattern). For the PIN-based authentication, we used Unity’s OnCollisionEnter method, which triggers after another object (i.e., the user’s finger) collides. To implement a realistic pattern-based authentication scheme, we used Unity’s Line Renderer component [3], which takes an array of two or more points in 3D space to then draw a straight line between each one. In the smartphone authentication scenarios the UI of the authentication scheme (i.e., the PIN/pattern layout) was only visible for the duration of the authentication. The authentication scheme disappeared as soon as a 4-symbol PIN/pattern was entered. This simulates a real-world smartphone authentication where the user lands on the home screen after authenticating (e.g., when unlocking the device).

3.2 Authentication Recordings

Two-dimensional video recordings (our baseline) are typically recorded from pre-defined observation angles with the aim to provide attackers with a best-case scenario, i.e., a clear sight on a mobile device’s screen and input (e.g., [11, 39, 63]). We used VR capture [60] to create such 2D video recordings of both the user’s input and the authentication scheme. Fig. 1 shows the three authentication systems participants observed. We used an observation position that presents participants with a “best case scenario”. The observation perspective for the 2D video recordings has been determined through pilot tests. For the three-dimensional recordings, we built upon Ultimate Replay [74], a state-based replay system that records the scene using “snapshots” at regular intervals that reconstruct the scene during playback. We implemented additional scripts to track mesh changes and to keep track of the different states of Unity’s Line Renderer component. Participants then experienced the authentications (∼2-3.5 seconds, similar to previous PIN/pattern-based research [6, 18]) using state-of-the-art 2D Video Observations, 3D Observations, and VR Observations.

2D Video Observations (2DVO, baseline). Our baseline depicts the scenario where both the user’s input and the authentication scheme were recorded using an angle that provides a shoulder surfer with a “best-case” scenario, similar to how prior shoulder surfing evaluations were conducted (e.g., [11, 39, 63, 78]). Participants performed their shoulder surfing observations on video recordings on a computer screen and could not manipulate the observation position and orientation. Note that we recorded the authentications through virtual cameras in the virtual environment. Previous work showed
that shoulder surfers’ observation performance on VR-based two-dimensional video material matches to a great extent with findings from a video-based real-world shoulder surfing study [47].

3D Observations (3DO, non-immersive). Participants’ initial observation view was positioned so that the camera points towards the user’s back. We did this to ensure that our participants come up with individual observation strategies and are required to change their position and perspective. The initial position did not provide them with a clear line of sight on the authentication scheme. Participants navigated in the environment using a traditional mouse-keyboard configuration, which we borrowed from previous work on direct manipulations in non-immersive VR environments (e.g., [25, 59]). Participants used the keyboard to simulate walking (i.e., translation along the x/y/z-axis) and the mouse to simulate head movements (i.e., rotations along the x/y/z-axis), and watched the authentications on a traditional computer monitor after setting up their preferred observation position/orientation. Participants were not restricted to physical real-world conditions. We aimed to investigate if participants exploit the unique affordances of such a 3D observational approach in a virtual environment (e.g., being independent of gravitational force).

VR Observations (VRO, immersive). Participants were wearing a VR headset (i.e., HTC VIVE) and could freely move around and change their observation perspective and position as they wished. This depicts a scenario which is closest to in situ observations where a bystander can freely move around in a physical space and shoulder surf a user authenticating.

4 METHODOLOGY

We conducted a series of 1.5-hour in-the-lab investigations where participants (in the role of observers) observed overall 648 authentications (18 participants × 12 PINs/patterns × 3 authentication scenarios). We reached out to potential participants using social media postings and word of mouth (outside of a university environment). We recruited a sample of 18 participants (5 male, 13 female). Participants were on average 32.44 years (min=18, max=61, SD=12.22). All participants reported that they have used an ATM before and that they own a smartphone that they use on a daily basis. Slightly more than half of our participants (N=11) mentioned that they have used VR before. Participants observed authentications in all three authentication scenarios: 1) 4-digit PIN entries on an ATM, 2) 4-digit PIN entries on a smartphone, and 3) 4-symbol pattern entries on a smartphone. All participants went through all three observation methods (within-subject design). Conditions were counter-balanced using a Latin Square. As independent variables, we had the observation type (three levels: 2DVO (our baseline), 3DO, VRO), and the threat model (two levels: single-view and repeated-view observations, both threat models are frequently used when evaluating a system’s security [39, 41, 47]). While in single-view observations participants could observe the user authenticating only once, in repeated-view observations participants could replay the authentication. The type of attack was alternating, similar to [41]. We had four dependent variables: Observation Performance: Participants’ observation performance, the number of successful PIN/pattern guesses. Levenshtein Distance: the minimum number of single-digit edits between participants’ best guess and the correct PIN/pattern, which is commonly used in shoulder surfing research (e.g., [4, 21, 31]). Sense of Presence: Participants’ sense of presence experienced when using the different observation methods, measured using the standard IPQ questionnaire [66]. Perceived workload: Participants’ perceived workload when using the different observation methods, measured using the NASA-TLX questionnaire [33].

Demographic questions (including age, gender, VR experience) were asked using Qualtrics [56]. We used additional in-VR questionnaires [26] to measure participants’ perceived workload (NASA-TLX [34]) and presence (IPQ [66]). We did this to ensure a consistent VR experience and not break participants’ focus [55].

4.1 Study Procedure

We first explained a) the different authentication scenarios and authentication schemes, b) the different observation methods, and c) what participants’ task is (i.e., observing 4-digit PIN authentications). In advance of the observation task, participants went through an example authentication (e.g., “1234” PIN entry). We did this to familiarise them with the observation methods and the authentication schemes. Participants then started with the first observation method (e.g., 2DVO) and observed four authentications for each authentication context. Participants were not allowed to clip through the virtual avatar in 3DO and VRO as this would not be possible in the real world. However, we did not restrict them from positioning themselves, for example, in front of the virtual avatar because a) this could happen in the real world as well (e.g., standing at a bus station) and b) we aimed to investigate if participants make use of proxemics [32] (e.g., do they maintain a certain social distance to the user authenticating? Are they aware that such observations are likely noticeable by the user authenticating?). For each observation, participants could provide up to three PIN/pattern guesses. Participants then filled in the NASA-TLX [34] and the IPQ questionnaire [66]. We concluded with semi-structured interviews (available in Appendix A in our supplementary material) about participants’ perceived performance and their observation experience when using the different observation methods.

4.2 Ethical Considerations and Compensations

Our research has been reviewed and approved by the College of Science and Engineering Ethics Committee at the University of Glasgow. The study was conducted in Austria due to COVID-19. Participants were paid €15 (€10/h) and took part in a lottery to win an additional €15. Participants were made aware in advance of the study that the chances of winning increase with the number of successfully observed PINs/patterns. We did this to motivate them to perform well in their shoulder surfing task (similar to [41, 48]).

5 RESULTS

We first report participants’ observation performance, represented through the percentages of successful observations and the mean Levenshtein distances. We then report participants’ sense of presence and perceived workload when using 2DVO, 3DO, and VRO. Finally, we provide a qualitative analysis of the semi-structured interviews along participants’ observation strategies. Unless otherwise stated, we performed an aligned rank transformation on our data to correct for violations of normality using ART by Wobbrock et al. [84] and ART-C [24] for post-hoc pairwise comparisons. We report ηp² (partial eta squared) as an effect size statistic for our ART analysis (0.01 = small, 0.06 = medium, 0.14 = large [16, 17]). Appendices C and D in our supplementary material provide a full overview of the F-ratios, together with effect sizes, means, and standard deviations.

5.1 Observation Performance and Levenshtein Distance

Participants’ observations in VRO resulted in overall more successful observations (M=93.14%, SD=25.34%) than in 2DVO (M=89.35%, SD=30.92%) and 3DO (M=81.40%, SD=39.01%). We calculated the mean Levenshtein distances between participants’ best guess and the correct PIN/pattern to proceed with a statistical analysis and to gain better insights into how close participants’ guesses are to the entered PINs/patterns.

ATM Authentication: Participants’ observation performance was M=94.44% (SD=15.94%) for 2DVO, M=83.33% (SD=23.90%) for 3DO, and M=95.59% (SD=14.40%) for VRO. There was a significant effect of observation method (F(1,83) = 4.584, p < 0.05, ηp² = 0.10) and threat model (F(1,83) = 4.526, p < 0.05, ηp² = 0.05) on
[Figure 2 plots: IPQ scores per IPQ dimension (PRES, SP, INV, REAL) and NASA-TLX scores per NASA-TLX dimension (mental, physical, temporal, performance, effort, frustration), each shown for the methods 2D, 3D, and VR.]

Figure 2: VRO led to a significantly higher sense of being there, higher spatial presence, higher involvement, and higher experienced realism
than 2DVO and 3DO. There were no statistically significant differences in participants’ perceived workload when using the different observation
methods. Error bars denote the 95% confidence interval (CI).

participants’ guesses and their distance to the correct PIN. There was also an interaction effect between threat model × observation method (F(1,83) = 3.319, p < 0.05, $\eta_p^2 = 0.07$). Post-hoc Bonferroni adjusted analysis did not confirm the interaction effect, with all pairwise comparisons being not significant (p > 0.05). Follow-up analysis on the main effect of observation method revealed that participants’ guesses on ATM authentications were closer to the correct PIN when using VRO (M=0.074, SD=0.250) and 2DVO (M=0.097, SD=0.288) compared to 3DO (M=0.278, SD=0.470) (p < 0.05).
Smartphone PIN Authentication: Participants’ observation performance was M=77.78% (SD=30.34%) for 2DVO, M=69.44% (SD=36.41%) for 3DO, and M=83.82% (SD=26.74%) for VRO. There was a significant effect of observation method (F(1,83) = 4.95, p < 0.05, $\eta_p^2 = 0.11$) and threat model (F(1,83) = 6.69, p < 0.05, $\eta_p^2 = 0.07$) on the mean Levenshtein distance. Participants’ guesses in VRO were closer to the correct PIN (M=0.265, SD=0.448) than in 3DO (M=0.648, SD=0.867) (p < 0.05). There were no significant differences between the other pairs (2DVO: M=0.403, SD=0.685).
Pattern Smartphone Authentication: Participants’ observation performance was M=95.83% (SD=14.02%) for 2DVO, M=91.67% (SD=22.36%) for 3DO, and M=100.00% (SD=0.00%) for VRO. There was a significant effect of observation method (F(1,83) = 3.21, p < 0.05, $\eta_p^2 = 0.07$) and threat model (F(1,83) = 25.53, p < 0.05, $\eta_p^2 = 0.24$) on the mean Levenshtein distance. Participants’ guesses in VRO were closer to the correct pattern (M=0.00, SD=0.00) than in 3DO (M=0.139, SD=0.371) (p < 0.05). There were no significant differences between the other pairs (2DVO: M=0.083, SD=0.305).

Summary: Observation Performance
The Levenshtein distances confirmed the differences in participants’ observation performance between VRO and 3DO, but not between VRO and 2DVO. VRO resulted in the most accurate observations, followed by 2DVO.

5.2 Sense of Presence (IPQ)
There was a significant effect of observation method on the overall IPQ scores (F(2,34) = 71.429, p < 0.05, $\eta_p^2 = 0.81$). Post-hoc analysis confirmed that the sense of presence was significantly higher in VRO (M=4.22, SD=1.76) than in 3DO (M=2.28, SD=1.93) and 2DVO (M=1.55, SD=1.77) (p < 0.05). The difference between 3DO and 2DVO was also significant (p < 0.05). Fig. 2 shows an overview of the results, featuring the subscales 1) sense of being there (PRES), 2) spatial presence (SP), 3) involvement (INV), 4) experienced realism (REALISM). We followed up with a more nuanced analysis on the level of each subscale.
Sense of being there. The observation methods elicited statistically significant changes in participants’ sense of being there (F(2,34) = 31.932, p < 0.05, $\eta_p^2 = 0.65$). Post-hoc analysis revealed a statistically significantly lower sense of being there in 2DVO (M=0.88, SD=1.45) and in 3DO (M=2.33, SD=2.14) compared to VRO (M=4.78, SD=1.55) (p < 0.05). The difference between 2DVO and 3DO was also statistically significant (p < 0.05).
Spatial presence. Participants’ experienced spatial presence differed statistically significantly between the different observation methods (F(2,34) = 59.61, p < 0.05, $\eta_p^2 = 0.78$). Post-hoc analysis revealed statistically significant differences in participants’ spatial presence in 2DVO (M=1.28, SD=1.48) and in 3DO (M=2.34, SD=1.99) compared to VRO (M=5.03, SD=1.18) (p < 0.05). The difference between 2DVO and 3DO was also significant (p < 0.05).
Involvement. Participants’ experienced involvement was statistically significantly different in the different observation methods (F(2,34) = 20.592, p < 0.05, $\eta_p^2 = 0.55$). Post-hoc analysis revealed statistically significant differences in 2DVO (M=2.11, SD=2.15) and in 3DO (M=2.35, SD=1.91) compared to VRO (M=4.32, SD=1.46) (p < 0.05). There is no evidence that participants’ experienced involvement differed statistically between 2DVO and 3DO.
Experienced Realism. Participants’ experienced realism was statistically significantly different between the different observation methods (F(2,34) = 23.944, p < 0.05, $\eta_p^2 = 0.58$). Post-hoc analysis revealed statistically significant differences in participants’ experienced realism in 2DVO (M=1.50, SD=1.64) and in 3DO (M=2.13, SD=1.83) compared to VRO (M=2.96, SD=1.98) (p < 0.05). The difference between 2DVO and 3DO was also significant (p < 0.05).

Summary: Sense of Presence
VRO led to a significantly higher sense of being part of the virtual environment, to a higher spatial presence, and to a higher feeling of involvement and experienced realism than 2DVO and 3DO.

5.3 Perceived Workload (NASA-TLX)
Shapiro-Wilk tests of normality indicated that participants’ perceived workload when experiencing the different observation methods follows a normal distribution on the level of each observation method. Therefore, we did not perform an aligned rank transformation. Mauchly’s test of sphericity indicated that the assumption of sphericity had not been violated, $\chi^2(2) = 3.255$, p=0.196. Participants’ perceived workload was statistically significantly different between the observation methods, F(2,34) = 4.715, p < 0.05, $\eta_p^2 = 0.217$, but post-hoc analysis with Bonferroni adjustment did not confirm the significant differences (p > 0.05). The mean values of participants’ perceived workload are M=28.15 (SD=15.77) for 2DVO, M=27.31 (SD=14.61) for 3DO, and M=18.98 (SD=17.62) for VRO. Fig. 2 shows the mean NASA-TLX values for each dimension.

Summary: Perceived Workload
There is no evidence that VRO or 3DO led to a higher workload than 2DVO, suggesting that participants’ differences in perceived workload when using 2DVO, VRO, and 3DO are negligible.
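
As an illustration of two quantities reported throughout Sect. 5, the following minimal Python sketch computes the Levenshtein distance between a guess and the correct PIN/pattern, and recovers partial eta squared from an F-ratio and its degrees of freedom. It is not the analysis code used in the study: the function names are hypothetical, representing PINs and patterns as strings of symbols is an assumption, and the conversion $\eta_p^2 = (F \cdot df_{effect}) / (F \cdot df_{effect} + df_{error})$ is the standard textbook identity rather than something stated in the paper.

```python
def levenshtein(guess: str, target: str) -> int:
    """Minimum number of single-symbol insertions, deletions, or
    substitutions needed to turn `guess` into `target`."""
    # previous[j] holds the distance between the processed prefix of
    # `guess` and the first j symbols of `target`.
    previous = list(range(len(target) + 1))
    for i, g in enumerate(guess, start=1):
        current = [i]
        for j, t in enumerate(target, start=1):
            cost = 0 if g == t else 1
            current.append(min(
                previous[j] + 1,         # delete g from the guess
                current[j - 1] + 1,      # insert t into the guess
                previous[j - 1] + cost,  # substitute g with t (or keep)
            ))
        previous = current
    return previous[-1]


def partial_eta_squared(f_ratio: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F-ratio (assumed identity)."""
    return (f_ratio * df_effect) / (f_ratio * df_effect + df_error)


if __name__ == "__main__":
    # Hypothetical guess with the last two digits swapped.
    print(levenshtein("1234", "1243"))
    # F(2,34) = 71.429 as reported for the overall IPQ scores.
    print(round(partial_eta_squared(71.429, 2, 34), 2))
```

Under these assumptions, a guess with two swapped digits has a distance of 2, and the F-ratio reported for the overall IPQ scores in Sect. 5.2 yields the reported effect size of 0.81.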
5.4 Semi-structured Interviews
We concluded our study with semi-structured interviews to a) shed more light on participants’ perception and performance when using the different observation methods and b) better understand their perceived differences to shoulder surfing in the wild. We transcribed the interview data and split participants’ statements into meaningful excerpts. This process resulted in overall N=292 participant statements, which we then systematically clustered using an affinity diagram. The initial clustering was performed by the lead researcher. A second researcher then performed an independent review of the clustering and added tags to clusters that required another iteration. Both researchers then met to discuss the clustering and to resolve any discussion points that came up during the review process. Through this process, we identified five themes: 1) Observation Methods’ Unique Characteristics, 2) VRO for More Realistic Shoulder Surfing Experiences, 3) Lab vs Real-World Observations, 4) The Differences Between the Authentication Scenarios, and 5) General Comments. Below, we discuss those that are particularly relevant for the scope of our research in more detail. Reporting the number of participants who shared certain opinions would be inaccurate due to the use of a semi-structured interview approach and the study’s exploratory nature. Thus, we do not include frequencies. Quotes are translated from German to English where necessary.

5.4.1 Observation Methods’ Unique Characteristics
We noticed that VRO contributed to a close-to-reality looking over someone’s shoulder experience. Although 3DO provided participants with a more realistic shoulder surfing experience than 2DVO, the mouse-keyboard interaction impacted participants’ observation performance. Consequently, the “plug-and-play” characteristic of 2DVO resulted in observations being easier than 3DO. P11 mentioned that in VRO “[they] could position [themselves] in a way how they wanted it and it was super easy to select the position; this was more difficult with keyboard/mouse” (P11). Others mentioned that in VRO “[you] just need to walk to a specific position” (P17). Regarding 3DO, participants mentioned that their experience was closer-to-reality than 2DVO because “it felt more like that [they] really want to look over someone’s shoulder” (P15). P7 mentioned that “they could experiment a bit like in the real world where you can observe [the authentication] from different perspectives.” (P7). Although the lack of manipulations was raised by some participants in 2DVO, there was a general consensus that it was easier to observe authentications in 2DVO than in 3DO. Participants mentioned that the observation position + angle provided them with a clear line of sight and that their only task was to watch the authentication recording. In fact, some participants mentioned they found the videos more realistic because they used VRO and 3DO “in a way to really abuse them” (P9), resulting in some unusual observation positions.

5.4.2 VRO for More Realistic Shoulder Surfing Experiences
In VRO, P3 voiced that “the [real] environment would be completely irrelevant; it does not matter if [they are] in a basement, in an attic, outside, or at the sea” (P3), and that they did not feel like being part of an experiment. Others mentioned that “with the VR headset [they] moved within the environment and it felt on a physical way more realistic” (P4). For 3DO, participants voiced that they did not feel being part of the environment to the same extent as in VRO because of the presence of reality and that they were “aware of everything that surrounded [them] in the reality” (P15). P3 explained this based on the fact that they were “sitting in front of the PC and could see stuff on the left and right side that is not related to the [authentication scheme]” (P3). For 2DVO, participants voiced that their task was only to “watch” the authentications and that they were “very conscious that there is a technical device between [them] and the environment” (P4). The overall qualitative feedback suggests that there are two extremes: While VRO contributed towards a reasonably realistic in situ shoulder surfing experience, 2DVO and 3DO were considered to be observations from “another world”.

Figure 3: ➊ shows the reference position + orientation of 2DVO. Participants made use of the absence of physical constraints in 3DO (see ➋). From the immersive VR observations we noticed that social factors (e.g., the proximity to the user authenticating) lose relevance in such a virtual environment, which we discuss further in Sect. 6.4. ➌ shows a VR observation in which the participant pretended to tie their shoes while performing the observation (a); (b) shows the observation position through another perspective.

5.4.3 Lab vs Real-World Observations
Participants reported that they would perform real-world observations similarly to how they did using VRO, e.g., “I can imagine that [real-world observations] work exactly how I did it in VR” (P12). However, across all participants the message was that they would respect the social distances to the user more in the real world. P9 mentioned that “[they] would probably stay further away and do it less conspicuously” (P9). Others voiced that they completely ignored the social factor during the study and “only optimised [their] viewing point” (P10). P4 added that in the real world “there would be other people [and that they] would probably feel being observed” (P4). P13 voiced that in the ATM scenario “the user who withdraws cash probably already acts precautiously – so you would realise when someone stays that close to you.” (P13).
In summary, we noticed that while VRO contributed to more realistic shoulder surfing experiences than 2DVO, participants mentioned that users would sense if someone is close to them. In our study, participants did not necessarily consider the social factor (i.e., proxemics [32]) in their observations (see participants’ tracked observation positions in Fig. 3, visualised through black dots), which arguably takes on an important role in real-world observations [15].

6 DISCUSSION
We explored how the use of VR can contribute to advanced shoulder surfing research. We found that VRO provided participants with a reasonably realistic shoulder surfing experience without negatively impacting their shoulder surfing performance (see Sect. 5.1). Our study showed that VRO contribute to a significantly higher sense of being in the environment, a greater feeling of spatial presence, a higher level of involvement, and a higher experienced realism than 2DVO (baseline). While this is an expected finding with the benefits of immersive VR in terms of presence being well-known to the VR community (e.g., [73, 75]), the advantages of VRO over 2DVO are particularly interesting for shoulder surfing research. Our findings imply that previous shoulder surfing studies using 2D videos were not necessarily capable of providing participants with a close-to-reality shoulder surfing experience; therefore, impacting the often
desired high ecological validity of usable security research studies [46]. Despite the advantages of VRO, our results suggest that 2DVO are sufficient to assess a system’s resilience against observations (see Sect. 5.1). This confirms Aviv et al.’s findings when comparing 2D video recordings with live observations [9]. In all three authentication contexts, there is no evidence that VRO were more accurate than 2DVO. Below, we discuss the impact of 3DO on shoulder surfing experiments together with participants’ observation behaviour in more detail. Participants’ observation behaviour was similar across the authentication scenarios. Therefore, we moved the smartphone PIN/pattern visualisations to Appendix B in our supplementary material and discuss participants’ observation behaviour on ATM authentications in more detail in Sect. 6.1.

6.1 VR-based Observation Methods: A Blessing and a Curse for Shoulder Surfing Research
From participants’ shoulder surfing behaviour (see Fig. 3), we noticed that in 3DO participants made use of the unique characteristics of non-immersive VR. This is apparent in our study as follows: In 3DO, participants positioned themselves in several different positions, many of which are challenging to reach in VRO due to physical constraints. Although some of these positions seem to be unrealistic at first glance, such observations can indeed happen in the real world using, for example, drones equipped with cameras [79] or surveillance cameras on the corner of a building. In our study, some participants linked their observations to other real-world actions. P7 brought up the example of observing ATM input in an unobtrusive way while tying shoes (Fig. 3-3a). As such, VR-based shoulder surfing studies using VRO and 3DO enable researchers to study different observation strategies in much more detail than what can be achieved with traditional 2DVO.
While our findings suggest that a VR-based research approach can provide researchers with insights into participants’ observation strategies, doing this is not necessarily in favour of a critical security evaluation at times where the observation method deviates from a realistic observation (e.g., mouse-keyboard manipulations in 3DO). Fig. 3 and the qualitative feedback suggest that participants made use of the affordances of 3DO (e.g., being physically independent), but using 2DVO and VRO led to more accurate observations (see Sect. 5.1). This means that at times where VR-based observation methods are introduced for authentication research (e.g., 3DO) and the shoulder surfing resilience of a system is at the centre of the investigation, participant-defined observation positions can greatly overestimate a system’s resilience against observations. Taking 3DO and ATM authentication as an example, someone could conclude that observations on ATM authentications are successful in “only” 83.33% of observations, while both the de facto standard evaluation approach (i.e., 2DVO [20,47]) and VRO resulted in noticeably more successful observations (2DVO: 94.44%, VRO: 95.59%). Therefore, researchers risk being misled into thinking that the system is more resilient against observations than it actually is.

6.2 VR Observation Methods and Their Use Cases
The literature discussed how participants’ lack of experience can lead to an under-estimation of risk [82] and emphasised the importance of participants’ familiarity with the authentication methods (e.g., [18, 20, 39, 43, 48]). Building upon these discussions, we argue that participants’ experience is particularly important when researchers introduce novel observation methods for shoulder surfing research. As evidenced by our semi-structured interviews, VRO were perceived as highly realistic. However, the interaction with alternative methods, which differ from participants’ real-world observation experiences (e.g., mouse-keyboard manipulations, 3DO), can have a negative impact on shoulder surfing evaluations and corresponding security conclusions of authentication systems. Still, in cases where the focus is more on an exploratory shoulder surfing evaluation such as studying participants’ observation behaviour and their observation strategies, shoulder surfing methods such as 3DO can be particularly helpful because they enable researchers to study situations that are challenging to research using other means.

6.3 VR Studies and Research in the Wild
It is important to acknowledge that VR studies should not, at any point, replace traditional real-world lab or field studies, but rather complement them [44,47,49]. As put by Mäkelä et al. [44], “VR field studies situate between lab studies and real-world field studies, being closer to field studies in ecological validity, and closer to lab studies with regards to their required effort”. Virtual simulations make it “easy to experiment with different physical display configurations, e.g., layouts, shapes, sizes and locations” [44]. In a similar vein, we showed how VR enables researchers to study human shoulder surfing on authentication schemes in several contexts without much additional effort. Studying all three authentication contexts in the wild would require a significant amount of additional hardware (e.g., tracking sensors, cameras) and is often infeasible to do due to the nature of private and sensitive contexts [19]. While the usable security community often expects in the wild research to increase the generalisability and the ecological validity of research findings [46], it has been argued that “we [as a community] just need to be a little bit more open to what sort of solutions/evaluations we are expecting out of [something] that has not actually been deployed in the real world.” [46]. VR studies [44,49] can be particularly helpful to further contribute to more realistic authentication research and studies of that type can be particularly promising when researchers aim to run a large number of consecutive experiments. It is often easier to maintain such virtual environments and make adjustments (e.g., change lighting conditions, replace authentication systems). Virtual artefacts are also easier to store, reuse, deploy, and share because they do not require physical storage space [44, 46].
We believe that VR replications are particularly promising for usable security and privacy research when the targeted real-world space is not available, which is not unlikely when conducting research in relatively sensitive and private contexts (e.g., studying ATM authentication behaviour [19] or security systems at airports [64]).

6.4 Lessons Learned and Recommendations
We outline four lessons learned and recommendations to support and guide researchers in future VR-based shoulder surfing studies.
Recommendation #1: Account For Real-World Factors if They are of Relevance and Consider How the Corresponding Research Findings Transfer to the Real World. The use of VR can greatly advance shoulder surfing research by enabling researchers to get insights into participants’ observation strategies. However, results from such VR studies also highly depend on how well reality is emulated (e.g., proxemics [32], additional bystanders [40]). We encourage researchers to control for proxemics [32] in virtual environments if social factors are of relevance to the research question. Contrary to prior work that found users’ perception of personal space in the real world is similar to that in a virtual environment [10, 36], we noticed that at times where participants optimise their shoulder surfing observations, social factors and the proximity to the user authenticating lose relevance and may even be ignored by participants. There are several directions where future work is called for. For example, we encourage future work to consider detection mechanisms that inform participants during their observations when they are in the user’s field of view. In cases where the user authenticating would be aware of an observation, participants may want to reconsider their observation position to perform less conspicuous observations (as reported by P9 and P10 in Sect. 5.4.3). At this point, it is important to consider the existing community discussions when aiming for close-to-reality shoulder surfing behaviour in virtual environments. Slater [69] argued that
the effect of both “place illusion” and “plausibility illusion” (PI) can contribute to realistic behaviour in virtual environments and that improved visual realism can enhance realistic behavioural responses [70]. Skarbez et al. [67, 68] argued that PI is “essentially the extent to which a scenario complies with a user’s expectations”. As put by Weber et al. [80], “there is only little research about the effects of perceived realism in VR and the conducted studies generally show that higher realism goes along with stronger presence”. It is important to note that the effect of perceived realism in VR is often relatively small [80] and that a high level of realism does not necessarily imply strong presence [37].
We demonstrated how VR increases participants’ perceived shoulder surfing realism, but it is important to keep in mind that hinting at similar behaviour to the real world is, due to the introduced challenges when conducting security and privacy research in the wild [19, 46], often only possible using qualitative research methods (as done in Sect. 5.4 or in [19,23]). Conducting similar shoulder surfing studies in the real world (e.g., in different private and sensitive contexts) would go beyond what is ethically and legally possible.
Recommendation #2: Consider How Participants Can Best Be Familiarised With VR Observation Methods. Participants’ lack of experience w.r.t. novel shoulder surfing methods can significantly impact their experience, preference, and performance when observing authentications. Even traditional input systems (e.g., mouse-keyboard manipulations) can have a negative impact on participants’ experience and performance. Consequently, it is important to introduce participants to novel (VR-based) shoulder surfing methods prior to the data collection as their lack of experience can significantly impact the outcome of a system’s shoulder surfing evaluation (e.g., see Sect. 5.1).
Recommendation #3: Consider a VR-Based Shoulder Surfing Approach When the Aim is to Contribute Towards Reasonably “Realistic” Shoulder Surfing Experiences, but Keep 2DVO as a Baseline Measure. As evidenced through our participants’ qualitative feedback and the IPQ scores (see Sect. 5.4 and Sect. 5.2), VRO leads to more realistic shoulder surfing experiments compared to using 2DVO. However, traditional 2DVO already provide a suitable baseline measure for a system’s resilience against observations [9]. While novel shoulder surfing methods (e.g., 3DO, VRO) may be used to contribute towards more realistic shoulder surfing experiences and increase participants’ sense of being part of the shoulder surfing environment, they do not necessarily outperform traditional 2DVO. It is important to set clear expectations and identify at the beginning of the research whether or not it is useful to employ a VR-based research approach when studying shoulder surfing. In situations where investigations in the wild are infeasible, VR-based shoulder surfing research can be particularly promising, but to make results more tangible, and to support replication studies and comparisons to prior works, we recommend keeping state-of-the-art 2D video observations (i.e., 2DVO) as a baseline condition.
Recommendation #4: Use VR to Study Shoulder Surfing in Contexts that are Challenging to Access in the Real World. VR-based shoulder surfing studies are not an alternative to real-world research, but rather complement and advance lab studies by enabling researchers to study scenarios that are otherwise challenging to access (e.g., ATM authentication [18, 19, 22]). In such situations, using VR for human-centred shoulder surfing research can be particularly valuable as such a research approach does not require having physical access to private and sensitive contexts and gives researchers more control of the study environments (e.g., high internal validity, more consistency across participants). Virtual environments are often more affordable and faster to build, deploy, and evaluate than corresponding real-world scenarios [44]. The use of VR as a testbed for human-centred research can be particularly promising at times where pandemics (e.g., COVID-19) significantly impact the safety and well-being of people. While our initial investigation of using VR for shoulder surfing research on authentication systems took place in the lab, we encourage future work to look at more distributed research approaches [49]. While remote (virtual/augmented reality) experiments introduce practical and ethical concerns [71], they can, as put by Steed et al. [72], “continue to forge forward with experimental work”.

7 FUTURE RESEARCH DIRECTIONS
We explored the strengths and weaknesses of 3D VR recordings for shoulder surfing research, which we compared to state-of-the-art shoulder surfing evaluations using 2D video recordings. We were particularly interested in participants’ shoulder surfing behaviour and how participants exploit VR’s unique affordances when performing observation attacks on user authentications. However, we did not account for the many additional factors (e.g., shoulder surfing users when interacting with different devices such as tablets [57], or situations in which shoulder surfing defense strategies are applied [42]). We leave this to future work. Similar to the work by Aviv et al. [8] we did not study text-based authentication, mainly because traditional PIN and pattern authentications are the most commonly used baseline measures in shoulder surfing and authentication research (e.g., [20, 31, 39]). Future research may apply 3D VR recordings for the evaluation of multimodal authentication schemes (e.g., gaze + touch/mid-air [5, 39]). Furthermore, we used a non-vivid environment (e.g., no additional bystanders) to immerse participants into different authentication scenarios. We did this because one key factor of shoulder surfing research on authentication systems is to provide participants (in the role of observers) with a best-case scenario when observing authentications (e.g., [11,39,63,78]). More vivid contexts may lead to an even more realistic atmosphere, which forms an interesting future research direction. Finally, a photorealistic VR environment may further increase the visual realism of such a virtual environment. However, recording such sensitive and private contexts as studied in our work is often infeasible to do in the wild. For example, creating 360° real-world recordings as done in the work by Saad et al. [62] introduces ethical and legal challenges in the context of ATM authentication. Such recordings are also limited to what is actually possible to stage/record in the real world. Virtual replications are particularly promising at this point because they provide researchers with more flexibility in changing parts of the environment [44] and enable researchers to study scenarios that are challenging (or even impossible) to access in the real world.

8 CONCLUSION
We introduced non-immersive and immersive VR observations to advance lab-based shoulder surfing research. We demonstrated how VR and its unique affordances can be applied in the human-centred security research domain to study shoulder surfing in different authentication scenarios. We showed that immersive VR recordings provide participants with a reasonably realistic human shoulder surfing experience without impacting their observation performance compared to commonly used 2D video recordings. Through our investigation of using VR for shoulder surfing research, we hope to contribute to more realistic human-centred security research in the long run and encourage future work to find ways to further improve lab-based usable security and privacy research using VR.

ACKNOWLEDGMENTS
We thank all participants for taking part in our research. We also thank all reviewers whose comments significantly improved the paper. This publication was supported by the University of Edinburgh and the University of Glasgow jointly funded PhD studentships, and partially by the EPSRC (EP/V008870/1) and the PETRAS National Centre of Excellence for IoT Systems Cybersecurity, which is also funded by the EPSRC (EP/S035362/1).

REFERENCES atm. In Communication by Gaze Interaction (COGAIN), 2008.
[23] M. Eiband, M. Khamis, E. von Zezschwitz, H. Hussmann, and F. Alt.
[1] 3d atm model, 2019. [Link] Understanding shoulder surfing in the wild: Stories from users and ob-
accessed 04 November 2021. servers. In Proc. of the SIGCHI Conf. on Human Factors in Computing
[2] 3d smartphone model, 2021. [Link] Systems, CHI ’17. ACM, New York, NY, USA, 2017.
[Link], accessed 04 November 2021. [24] L. A. Elkin, M. Kay, J. J. Higgins, and J. O. Wobbrock. An aligned
[3] U. 3D. User manual, 2021. [Link] rank transform procedure for multifactor contrast tests, 2021.
[Link], accessed 04 November 2021. [25] U. Erra, D. Malandrino, and L. Pepe. Virtual reality interfaces for
[4] Y. Abdelrahman, M. Khamis, S. Schneegass, and F. Alt. Stay cool! interacting with three-dimensional graphs. International Journal of
understanding thermal attacks on mobile-based user authentication. In Human–Computer Interaction, 2019.
Proc. of the 2017 CHI Conf. on Human Factors in Computing Systems, [26] M. Feick, N. Kleer, A. Tang, and A. Krüger. The virtual reality ques-
CHI ’17. ACM, New York, NY, USA, 2017. tionnaire toolkit. UIST Adjunct. ACM, New York, NY, USA, 2020.
[5] Y. Abdrabou, M. Khamis, R. M. Eisa, S. Ismael, and A. Elmougy. En- [27] S. M. Fiore, G. W. Harrison, C. E. Hughes, and E. E. Rutström. Virtual
gage: Resisting shoulder surfing using novel gaze gestures authentica- experiments and environmental policy. Environmental Economics and
tion. In Proc. of the 17th International Conf. on Mobile and Ubiquitous Management, 2009.
Multimedia. ACM, New York, NY, USA, 2018. [28] L. Freina and M. Ott. A literature review on immersive virtual reality
[6] Y. Abdrabou, M. Khamis, R. M. Eisa, S. Ismail, and A. Elmougy. Just in education: state of the art and perspectives. In The international
gaze and wave: Exploring the use of gaze and gestures for shoulder- scientific Conf. elearning and software for education, 2015.
surfing resilient authentication. In Proc. of the ACM Symp. on Eye [29] S. Garfinkel and H. R. Lipford. Usable security: History, themes, and
Tracking Research & Applications. ACM, New York, NY, USA, 2019. challenges. Synthesis Lectures on Information Security, Privacy, and
[7] T. Amano, S. Kajita, H. Yamaguchi, T. Higashino, and M. Takai. Trust, 2014.
Smartphone applications testbed using virtual reality. In Proc. of [30] C. George, M. Khamis, D. Buschek, and H. Hussmann. Investigating
the 15th EAI International Conf. on Mobile and Ubiquitous Systems: the third dimension for authentication in immersive virtual reality and
Computing, Networking and Services, MobiQuitous ’18. ACM, New in the real world. In 2019 IEEE Conf. on Virtual Reality and 3D User
York, NY, USA, 2018. Interfaces (VR), March 2019.
[8] A. J. Aviv, J. T. Davin, F. Wolf, and R. Kuber. Towards baselines [31] C. George, M. Khamis, E. von Zezschwitz, M. Burger, H. Schmidt,
for shoulder surfing on mobile authentication. In Proc. of the 33rd F. Alt, and H. Hussmann. Seamless and secure vr: Adapting and
Annual Computer Security Applications Conference, ACSAC 2017. evaluating established authentication systems for virtual reality. In
ACM, New York, NY, USA, 2017. Network and Distributed System Security Symposium (NDSS 2017),
[9] A. J. Aviv, F. Wolf, and R. Kuber. Comparing video based shoul- USEC ’17. NDSS, February 2017.
der surfing with live simulation. In Proc. of the Computer Security [32] E. T. Hall. The hidden dimension. Garden City, NY: Doubleday, 1966.
Applications Conf., ACSAC ’18. ACM, New York, NY, USA, 2018. [33] S. Hart and L. Staveland. Development of NASA-TLX (Task Load
[10] J. N. Bailenson, J. Blascovich, A. C. Beall, and J. M. Loomis. Equi- Index): Results of empirical and theoretical research. In Human mental
librium theory revisited: Mutual gaze and personal space in virtual workload, 1988.
environments. Presence, 2001. [34] S. G. Hart. Nasa-task load index (nasa-tlx); 20 years later. In Proc.
[11] A. Bianchi, I. Oakley, and D. S. Kwon. Spinlock: A single-cue haptic of the human factors and ergonomics society annual meeting. Sage
and audio pin input technique for authentication. In Haptic and Audio publications Sage CA: Los Angeles, CA, 2006.
Interaction Design. Springer, Berlin, Heidelberg, 2011. [35] M. Hassenzahl, M. Burmester, and F. Koller. Attrakdiff: A question-
[12] J. Blascovich, J. Loomis, A. C. Beall, K. R. Swinth, C. L. Hoyt, and J. N. naire to measure perceived hedonic and pragmatic quality. In Mensch
Bailenson. Immersive virtual environment technology as a method- & Computer, 2003.
ological tool for social psychology. Psychological Inquiry, 2002. [36] H. Hecht, R. Welsch, J. Viehoff, and M. R. Longo. The shape of
[13] L. Bošnjak and B. Brumen. Shoulder surfing experiments: A systematic personal space. Acta Psychologica, 2019.
literature review. Computers & Security, 2020. [37] M. Hofer, T. Hartmann, A. Eden, R. Ratan, and L. Hahn. The role of
[14] J. Brooke. Sus: a ”quick and dirty” usability. 1996. plausibility in the experience of spatial presence in virtual environments.
[15] F. Brudy, D. Ledo, S. Greenberg, and A. Butz. Is Anyone Looking? Frontiers in Virtual Reality, 2020.
Mitigating Shoulder Surfing on Public Displays through Awareness [38] T. hundred fifty-five (255) pixel studios. City package, 2021.
and Protection. In Proc. of The International Symposium on Pervasive [Link]
Displays, PerDis ’14. ACM, New York, NY, USA, 2014. package-107224, accessed 04 November 2021.
[16] J. Cohen. Eta-squared and partial eta-squared in fixed factor anova [39] M. Khamis, F. Alt, M. Hassib, E. von Zezschwitz, R. Hasholzner, and
designs. Educational and Psychological Measurement, 1973. A. Bulling. Gazetouchpass: Multimodal authentication using gaze
[17] J. Cohen. Statistical power analysis for the behavioral sciences. Aca- and touch on mobile devices. In Proc. of the 34th Annual ACM Conf.
demic press, 2013. Extended Abstracts on Human Factors in Computing Systems, CHI EA
[18] A. De Luca, K. Hertzschuch, and H. Hussmann. Colorpin: Securing ’16. ACM, New York, NY, USA, 2016.
pin entry through indirect input. In Proc. of the SIGCHI Conf. on [40] M. Khamis, L. Bandelow, S. Schick, D. Casadevall, A. Bulling, and
Human Factors in Computing Systems, CHI ’10. ACM, New York, NY, F. Alt. They are all after you: Investigating the viability of a threat
USA, 2010. model that involves multiple shoulder surfers. In Proc. of the 16th
[19] A. De Luca, M. Langheinrich, and H. Hussmann. Towards understand- International Conf. on Mobile and Ubiquitous Multimedia, MUM ’17.
ing atm security: A field study of real world atm use. In Proc. of the ACM, New York, NY, USA, 2017.
6th Symposium on Usable Privacy and Security, SOUPS ’10. ACM, [41] M. Khamis, L. Trotter, V. Mäkelä, E. v. Zezschwitz, J. Le, A. Bulling,
New York, NY, USA, 2010. and F. Alt. Cueauth: Comparing touch, mid-air gestures, and gaze
[20] A. De Luca, E. von Zezschwitz, N. D. H. Nguyen, M.-E. Maurer, for cue-based authentication on situated displays. Proc. ACM Interact.
E. Rubegni, M. P. Scipioni, and M. Langheinrich. Back-of-device Mob. Wearable Ubiquitous Technol., Dec. 2018.
authentication on smartphones. In Proc. of the SIGCHI Conf. on [42] H. Khan, U. Hengartner, and D. Vogel. Evaluating attack and defense
Human Factors in Computing Systems, CHI ’13. ACM, New York, NY, strategies for smartphone pin shoulder surfing. In Proc. of the 2018
USA, 2013. CHI Conf. on Human Factors in Computing Systems. ACM, New York,
[21] A. De Luca, E. von Zezschwitz, L. Pichler, and H. Hussmann. Using NY, USA, 2018.
Fake Cursors to Secure On-Screen Password Entry. In Proc. of the [43] L. Kraus, R. Schmidt, M. Walch, F. Schaub, and S. Möller. On the
SIGCHI Conf. on Human Factors in Computing Systems, CHI ’13. use of emojis in mobile authentication. In S. De Capitani di Vimercati
ACM, New York, NY, USA, 2013. and F. Martinelli, eds., ICT Systems Security and Privacy Protection.
[22] P. Dunphy, A. Fitch, and P. Olivier. Gaze-contingent passwords at the Springer International Publishing, Cham, 2017.
[44] V. Mäkelä, S. R. R. Rivu, S. Alsherif, M. Khamis, C. Xiao, L. M. Borchert, A. Schmidt, and F. Alt. Virtual Field Studies: Conducting Studies on Public Displays in Virtual Reality. In Proc. of the 38th Annual ACM Conf. on Human Factors in Computing Systems, CHI ’20. ACM, New York, NY, USA, 2020.
[45] F. Mathis, K. Vaniea, and M. Khamis. Observing virtual avatars: The impact of avatars’ fidelity on identifying interactions. In Proc. of the 24th International Conf. on Academic Mindtrek, AcademicMindtrek ’21. ACM, New York, NY, USA, 2021.
[46] F. Mathis, K. Vaniea, and M. Khamis. Prototyping usable privacy and security systems: Insights from experts. International Journal of Human–Computer Interaction, 2021.
[47] F. Mathis, K. Vaniea, and M. Khamis. Replicueauth: Validating the use of a lab-based virtual reality setup for evaluating authentication systems. In Proc. of the 39th Annual ACM Conf. on Human Factors in Computing Systems, CHI ’21. ACM, New York, NY, USA, 2021.
[48] F. Mathis, J. H. Williamson, K. Vaniea, and M. Khamis. Fast and secure authentication in virtual reality using coordinated 3d manipulation and pointing. ACM Trans. Comput.-Hum. Interact., Jan. 2021.
[49] F. Mathis, X. Zhang, J. O’Hagan, D. Medeiros, P. Saeghe, M. McGill, S. Brewster, and M. Khamis. Remote xr studies: The golden future of hci research? In CHI 2021 Workshop on XR Remote Research, 2021.
[50] L. Motion. Leap motion, 2019. accessed 04 November 2021.
[51] J. O’Hagan and J. R. Williamson. Reality aware vr headsets. In Proc. of the 9TH ACM International Symposium on Pervasive Displays, PerDis ’20. ACM, New York, NY, USA, 2020.
[52] J. O’Hagan, J. R. Williamson, M. McGill, and M. Khamis. Safety, power imbalances, ethics and proxy sex: Surveying in-the-wild interactions between vr users and bystanders. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2021.
[53] T. D. Parsons. Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Frontiers in Human Neuroscience, 2015.
[54] S. Pedram, R. Skarbez, S. Palmisano, M. Farrelly, and P. Perez. Lessons learned from immersive and desktop vr training of mines rescuers. Frontiers in Virtual Reality, 2021.
[55] S. Putze, D. Alexandrovsky, F. Putze, S. Höffner, J. D. Smeddinck, and R. Malaka. Breaking the experience: Effects of questionnaires in vr user studies. In Proc. of the 2020 CHI Conf. on Human Factors in Computing Systems, CHI ’20. ACM, New York, NY, USA, 2020.
[56] Qualtrics. Qualtrics, 2005. accessed 04 November 2021.
[57] K. Ragozin, Y. S. Pai, O. Augereau, K. Kise, J. Kerdels, and K. Kunze. Private Reader: Using Eye Tracking to Improve Reading Privacy in Public Spaces. In Proc. of the 21st International Conf. on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’19. ACM, New York, NY, USA, 2019.
[58] F. Rebelo, P. Noriega, E. Duarte, and M. Soares. Using virtual reality to assess user experience. Human Factors, 2012.
[59] G. Robertson, S. Card, and J. Mackinlay. Three views of virtual reality: nonimmersive virtual reality. Computer, 1993.
[60] RockVR. Vr capture, 2021. [Link] video/vr-capture-75654, accessed 04 November 2021.
[61] V. Roth, K. Richter, and R. Freidinger. A pin-entry method resilient against shoulder surfing. In Proc. of the 11th ACM Conf. on Computer and Communications Security. ACM, New York, NY, USA, 2004.
[62] A. Saad, J. Liebers, U. Gruenefeld, F. Alt, and S. Schneegass. Understanding bystanders’ tendency to shoulder surf smartphones using 360-degree videos in virtual reality. 2018.
[63] H. Sasamoto, N. Christin, and E. Hayashi. Undercover: Authentication usable in front of prying eyes. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems. ACM, New York, NY, USA, 2008.
[64] M. A. Sasse. Red-eye blink, bendy shuffle, and the yuck factor: A user experience of biometric airport systems. IEEE Security Privacy, 2007.
[65] G.-L. Savino, N. Emanuel, S. Kowalzik, F. Kroll, M. C. Lange, M. Laudan, R. Leder, Z. Liang, D. Markhabayeva, M. Schmeißer, N. Schütz, C. Stellmacher, Z. Xu, K. Bub, T. Kluss, J. Maldonado, E. Kruijff, and J. Schöning. Comparing pedestrian navigation methods in virtual reality and real life. In 2019 International Conf. on Multimodal Interaction, ICMI ’19. ACM, New York, NY, USA, 2019.
[66] T. Schubert, F. Friedmann, and H. Regenbrecht. The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 2001.
[67] R. Skarbez, F. P. Brooks, Jr., and M. C. Whitton. A survey of presence and related concepts. ACM Comput. Surv., Nov. 2017.
[68] R. Skarbez, J. Gabbard, D. A. Bowman, T. Ogle, and T. Tucker. Virtual replicas of real places: Experimental investigations. IEEE Transactions on Visualization and Computer Graphics, 2021.
[69] M. Slater. Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society B: Biological Sciences, 2009.
[70] M. Slater, P. Khanna, J. Mortensen, and I. Yu. Visual realism enhances realistic response in an immersive virtual environment. IEEE computer graphics and applications, 2009.
[71] A. Steed, S. Friston, M. M. Lopez, J. Drummond, Y. Pan, and D. Swapp. An ‘in the wild’ experiment on presence and embodiment using consumer virtual reality equipment. IEEE Transactions on Visualization and Computer Graphics, 2016.
[72] A. Steed, F. Ortega, A. Williams, E. Kruijff, W. Stuerzlinger, A. Batmaz, A. Won, E. Rosenberg, A. Simeone, and A. Hayes. Evaluating immersive experiences during covid-19 and beyond. 2020.
[73] A. Steed and R. Schroeder. Collaboration in immersive and non-immersive virtual environments. In Immersed in Media. 2015.
[74] TI. Ultimatereplay, 2021. [Link] camera/ultimate-replay-2-0-178602, accessed 04 November 2021.
[75] S. Ventura, E. Brivio, G. Riva, and R. M. Baños. Immersive versus non-immersive experience: Exploring the feasibility of memory assessment through 360 technology. Frontiers in psychology, 2019.
[76] A. Voit, S. Mayer, V. Schwind, and N. Henze. Online, VR, AR, Lab, and In-Situ: Comparison of Research Methods to Evaluate Smart Artifacts. In Proc. of the 2019 CHI Conf. on Human Factors in Computing Systems, CHI ’19. ACM, New York, NY, USA, 2019.
[77] M. Volkamer, A. Gutmann, K. Renaud, P. Gerber, and P. Mayer. Replication study: A cross-country field observation study of real world PIN usage at atms and in various electronic payment scenarios. In Symposium on Usable Privacy and Security (SOUPS), 2018.
[78] E. von Zezschwitz, A. De Luca, B. Brunkow, and H. Hussmann. Swipin: Fast and secure pin-entry on smartphones. In Proc. of the 33rd Annual ACM Conf. on Human Factors in Computing Systems, CHI ’15. ACM, New York, NY, USA, 2015.
[79] Y. Wang, H. Xia, Y. Yao, and Y. Huang. Flying eyes and hidden controllers: A qualitative study of people’s privacy perceptions of civilian drones in the us. Proc. on Privacy Enhancing Tech., 2016.
[80] S. Weber, D. Weibel, and F. W. Mast. How to get there when you are there already? defining presence in virtual reality and the importance of perceived realism. Frontiers in Psychology, 2021.
[81] M. Weiß, K. Angerbauer, A. Voit, M. Schwarzl, M. Sedlmair, and S. Mayer. Revisited: Comparison of empirical methods to evaluate visualizations supporting crafting and assembly purposes. IEEE Transactions on Visualization and Computer Graphics, 2020.
[82] O. Wiese and V. Roth. Pitfalls of shoulder surfing studies. In NDSS Workshop on Usable Security, 2015.
[83] O. Wiese and V. Roth. See you next time: A model for modern shoulder surfers. In Proc. of the 18th International Conf. on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’16. ACM, New York, NY, USA, 2016.
[84] J. O. Wobbrock, L. Findlater, D. Gergle, and J. J. Higgins. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’11. ACM, New York, NY, USA, 2011.
[85] E. Wu, M. Piekenbrock, T. Nakumura, and H. Koike. Spinpong - virtual reality table tennis skill acquisition using visual, haptic and temporal cues. IEEE Transactions on Visualization and Computer Graphics, 2021.
[86] N. H. Zakaria, D. Griffiths, S. Brostoff, and J. Yan. Shoulder surfing defence for recall-based graphical passwords. In Proc. of the 7th Symp. on Usable Privacy and Security, SOUPS ’11. ACM, New York, NY, USA, 2011.
