VR Enhancements in Shoulder Surfing Research
This is the author’s final accepted version. It is posted here for your personal use. Not for
redistribution. The definitive version was accepted to the IEEE Conference on Virtual Reality
and 3D User Interfaces (VR) (IEEE VR 2022) and will be available in the IEEE Xplore library.
Virtual Reality Observations: Using Virtual Reality to Augment Lab-Based
Shoulder Surfing Research
Florian Mathis* Joseph O’Hagan† Mohamed Khamis‡ Kami Vaniea§
University of Glasgow University of Glasgow University of Glasgow University of Edinburgh
University of Edinburgh
Figure 1: We explore the use of virtual reality (VR) for shoulder surfing research in the authentication research domain. We
compare the impact of non-immersive/immersive VR observations on participants’ observation performance and behaviour while
shoulder surfing authentications. We demonstrate the strengths of VR-based shoulder surfing research by exploring three different
authentication scenarios: (➊) automated teller machine (ATM), (➋) smartphone PIN, and (➌) smartphone pattern authentication.
ABSTRACT
Given the difficulties of studying the shoulder surfing resistance of authentication systems in a live setting, researchers often ask study participants to shoulder surf authentications by watching two-dimensional (2D) video recordings of a user authenticating. However, these video recordings do not provide participants with a realistic shoulder surfing experience, creating uncertainty in the value and validity of lab-based shoulder surfing experiments. In this work, we exploit the unique characteristics of virtual reality (VR) and study the use of non-immersive/immersive VR recordings for shoulder surfing research. We conducted a user study (N=18) to explore the strengths and weaknesses of such a VR-based shoulder surfing research approach. Our results suggest that immersive VR observations result in a more realistic shoulder surfing experience, in a significantly higher sense of being part of the authentication environment, in a greater feeling of spatial presence, and in a higher level of involvement than 2D video observations without impacting participants’ observation performance. This suggests that studying shoulder surfing in VR is advantageous in many ways compared to currently used approaches, e.g., participants can freely choose their observation angle rather than being limited to a fixed observation angle as done in current methods. We discuss the strengths and weaknesses of using VR for shoulder surfing research and conclude with four recommendations to help researchers decide when (and when not) to employ VR for shoulder surfing research in the authentication research domain.
Keywords: Virtual Reality, Shoulder Surfing, Authentication
1 INTRODUCTION
Accessing private data has become a fundamental part of most people’s daily life. Examples include, but are not limited to, checking emails on smartphones, accessing the account balance through online banking apps, or withdrawing cash at automated teller machines (ATMs). In many of these situations, users are required to authenticate (e.g., to enter a PIN), which puts them at risk of getting observed (referred to as shoulder surfing [23]). Consequently, researchers looked into the shoulder surfing resistance of a large variety of authentication schemes (e.g., [11, 18, 20, 39, 63]). A common approach in human-centred security research is to study such systems’ security by inviting participants to the lab, showing them two-dimensional (2D) video recordings, and asking them to guess the observed PIN (e.g., [20, 41]). These recordings show user authentications from pre-defined observation angles, with researchers’ intention to simulate a “best-case scenario” for an attacker that shoulder surfs the user. Although 2D video recordings form a suitable baseline for shoulder surfing research [9], it remains unclear a) how (if at all) researchers empirically define the observation perspective, b) if the selected perspective indeed represents a best-case scenario for attackers, and c) if 2D video recordings can provide realistic shoulder surfing experiences. While studying shoulder surfing in a live setting is possible, it is often challenging [82] and in some cases even infeasible. For example, studying shoulder surfing on ATM authentications in the real world is close to impossible due to ethical and legal constraints [19, 77].
As a result, we explore in this work how virtual reality (VR) can support shoulder surfing research by enabling researchers to study shoulder surfing in settings that are challenging to replicate in the lab and infeasible to research in the real world. Ideally, researchers would be able to assess a system’s shoulder surfing resistance in a variety of contexts without much effort. Through the use of VR as a research platform, we enable researchers to a) evaluate the shoulder surfing resistance of authentications in situ instead of in lab settings (e.g., [18, 22]), and b) investigate participants’ observation strategies in much more detail than what can be achieved in traditional lab settings. To explore the potential of VR for shoulder surfing research on authentication systems, we conducted a lab-based VR user study (N=18). We exposed participants to user authentications in three different contexts: ATM, smartphone PIN, and smartphone pattern authentication. We then ran a comparison of participants’ perception when shoulder surfing user authentications using commonly used 2D video recordings (i.e., 2DVO, our baseline), and non-immersive¹ (i.e., 3DO) and immersive VR recordings (i.e., VRO). Our findings
* e-mail: [Link]@[Link]
† e-mail: [Link].1@[Link]
‡ e-mail: [Link]@[Link]
§ e-mail: kvaniea@[Link]
¹ We use the terminology by Freina and Ott [28] where non-immersive
[Figure 2 plots: two bar-chart panels comparing the methods 2D, 3D, and VR. Panel 1: IPQ scores per IPQ dimension (PRES, SP, INV, REAL). Panel 2: NASA-TLX scores per NASA-TLX dimension (mental, physical, temporal, performance, effort, frustration).]
Figure 2: VRO led to a significantly higher sense of being there, higher spatial presence, higher involvement, and higher experienced realism
than 2DVO and 3DO. There were no statistically significant differences in participants’ perceived workload when using the different observation
methods. Error bars denote the 95% confidence interval (CI).
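The observation performance analysis in Sect. 5.1 compares participants' guesses with the correct PIN or pattern using the Levenshtein distance. The following minimal sketch shows how such a distance and a per-condition mean could be computed. The PIN, the guesses, and the helper names (`levenshtein`, `mean_distance`) are hypothetical illustrations, and the averaging over individual guesses is an assumption, since the exact aggregation used in the study is not described in this section.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def mean_distance(guesses: list[str], correct: str) -> float:
    """Average Levenshtein distance of all guesses to the correct input
    (assumed aggregation, for illustration only)."""
    return sum(levenshtein(g, correct) for g in guesses) / len(guesses)


# Hypothetical data: one correct PIN and four guesses from one condition.
correct_pin = "1234"
guesses = ["1234", "1243", "1334", "1234"]

success_rate = sum(g == correct_pin for g in guesses) / len(guesses)
print(f"success rate: {success_rate:.2%}")
print(f"mean Levenshtein distance: {mean_distance(guesses, correct_pin):.3f}")
```

A distance of 0 corresponds to a fully correct guess, so the mean distance captures near misses that a plain success rate would treat the same as completely wrong guesses.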
participants’ guesses and their distance to the correct PIN. There was also an interaction effect between threat model × observation method (F(1,83) = 3.319, p < 0.05, ηp² = 0.07). Post-hoc Bonferroni adjusted analysis did not confirm the interaction effect, with all pairwise comparisons being not significant (p > 0.05). Follow-up analysis on the main effect of observation method revealed that participants’ guesses on ATM authentications were closer to the correct PIN when using VRO (M=0.074, SD=0.250) and 2DVO (M=0.097, SD=0.288) compared to 3DO (M=0.278, SD=0.470) (p < 0.05).
Smartphone PIN Authentication: Participants’ observation performance was M=77.78% (SD=30.34%) for 2DVO, M=69.44% (SD=36.41%) for 3DO, and M=83.82% (SD=26.74%) for VRO. There was a significant effect of observation method (F(1,83) = 4.95, p < 0.05, ηp² = 0.11) and threat model (F(1,83) = 6.69, p < 0.05, ηp² = 0.07) on the mean Levenshtein distance. Participants’ guesses in VRO were closer to the correct PIN (M=0.265, SD=0.448) than in 3DO (M=0.648, SD=0.867) (p < 0.05). There were no significant differences between the other pairs (2DVO: M=0.403, SD=0.685).
Pattern Smartphone Authentication: Participants’ observation performance was M=95.83% (SD=14.02%) for 2DVO, M=91.67% (SD=22.36%) for 3DO, and M=100.00% (SD=0.00%) for VRO. There was a significant effect of observation method (F(1,83) = 3.21, p < 0.05, ηp² = 0.07) and threat model (F(1,83) = 25.53, p < 0.05, ηp² = 0.24) on the mean Levenshtein distance. Participants’ guesses in VRO were closer to the correct pattern (M=0.00, SD=0.00) than in 3DO (M=0.139, SD=0.371) (p < 0.05). There were no significant differences between the other pairs (2DVO: M=0.083, SD=0.305).
Summary: Observation Performance
The Levenshtein distances confirmed the differences in participants’ observation performance between VRO and 3DO, but not between VRO and 2DVO. VRO resulted in the most accurate observations, followed by 2DVO.
5.2 Sense of Presence (IPQ)
There was a significant effect of observation method on the overall IPQ scores (F(2,34) = 71.429, p < 0.05, ηp² = 0.81). Post-hoc analysis confirmed that the sense of presence was significantly higher in VRO (M=4.22, SD=1.76) than in 3DO (M=2.28, SD=1.93) and 2DVO (M=1.55, SD=1.77) (p < 0.05). The difference between 3DO and 2DVO was also significant (p < 0.05). Fig. 2 shows an overview of the results, featuring the subscales 1) sense of being there (PRES), 2) spatial presence (SP), 3) involvement (INV), 4) experienced realism (REALISM). We followed up with a more nuanced analysis on the level of each subscale.
Sense of being there. The observation methods elicited statistically significant changes in participants’ sense of being (F(2,34) = 31.932, p < 0.05, ηp² = 0.65). Post-hoc analysis revealed a statistically significantly lower sense of being in 2DVO (M=0.88, SD=1.45) and in 3DO (M=2.33, SD=2.14) compared to VRO (M=4.78, SD=1.55) (p < 0.05). The difference between 2DVO and 3DO was also statistically significant (p < 0.05).
Spatial presence. Participants’ experienced spatial presence differed statistically significantly between the different observation methods (F(2,34) = 59.61, p < 0.05, ηp² = 0.78). Post-hoc analysis revealed statistically significant differences in participants’ spatial presence in 2DVO (M=1.28, SD=1.48) and in 3DO (M=2.34, SD=1.99) compared to VRO (M=5.03, SD=1.18) (p < 0.05). The difference between 2DVO and 3DO was also significant (p < 0.05).
Involvement. Participants’ experienced involvement was statistically significantly different in the different observation methods (F(2,34) = 20.592, p < 0.05, ηp² = 0.55). Post-hoc analysis revealed statistically significant differences in 2DVO (M=2.11, SD=2.15) and in 3DO (M=2.35, SD=1.91) compared to VRO (M=4.32, SD=1.46) (p < 0.05). There is no evidence that participants’ experienced involvement differed statistically between 2DVO and 3DO.
Experienced Realism. Participants’ experienced realism was statistically significantly different between the different observation methods (F(2,34) = 23.944, p < 0.05, ηp² = 0.58). Post-hoc analysis revealed statistically significant differences in participants’ experienced realism in 2DVO (M=1.50, SD=1.64) and in 3DO (M=2.13, SD=1.83) compared to VRO (M=2.96, SD=1.98) (p < 0.05). The difference between 2DVO and 3DO was also significant (p < 0.05).
Summary: Sense of Presence
VRO led to a significantly higher sense of being part of the virtual environment, to a higher spatial presence, and to a higher feeling of involvement and experienced realism than 2DVO and 3DO.
5.3 Perceived Workload (NASA-TLX)
Shapiro-Wilk tests of normality indicated that participants’ perceived workload when experiencing the different observation methods follows a normal distribution on the level of each observation method. Therefore, we did not perform an aligned rank transformation. Mauchly’s test of sphericity indicated that the assumption of sphericity had not been violated, χ²(2) = 3.255, p=0.196. Participants’ perceived workload was statistically significantly different between the observation methods, F(2,34) = 4.715, p < 0.05, ηp² = 0.217, but post-hoc analysis with Bonferroni adjustment did not confirm the significant differences (p > 0.05). The mean values of participants’ perceived workload are M=28.15 (SD=15.77) for 2DVO, M=27.31 (SD=14.61) for 3DO, and M=18.98 (SD=17.62) for VRO. Fig. 2 shows the mean NASA-TLX values for each dimension.
Summary: Perceived Workload
There is no evidence that VRO or 3DO led to a higher workload than 2DVO, suggesting that participants’ differences in perceived workload when using 2DVO, VRO, and 3DO are negligible.
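The effect sizes reported in Sect. 5.2 and 5.3 are partial eta squared (ηp²) values. As a small worked check, ηp² can be recovered from an F statistic and its degrees of freedom via the standard identity ηp² = (F · df_effect) / (F · df_effect + df_error). The sketch below applies this identity to the F(2,34) tests reported above. The function name and the dictionary layout are illustrative assumptions, and the section does not state how the authors computed their values.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F statistic:
    SS_effect / (SS_effect + SS_error) = F*df1 / (F*df1 + df2)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# (F, df_effect, df_error, reported effect size) as stated in Sect. 5.2 and 5.3.
reported = {
    "IPQ overall":         (71.429, 2, 34, 0.81),
    "Sense of being":      (31.932, 2, 34, 0.65),
    "Spatial presence":    (59.61,  2, 34, 0.78),
    "Involvement":         (20.592, 2, 34, 0.55),
    "Experienced realism": (23.944, 2, 34, 0.58),
    "NASA-TLX workload":   (4.715,  2, 34, 0.217),
}

for name, (f_value, df1, df2, stated) in reported.items():
    computed = partial_eta_squared(f_value, df1, df2)
    print(f"{name}: computed {computed:.3f}, reported {stated}")
```

Under this identity, the recomputed values agree with the reported ones after rounding, which makes it a quick consistency check when transcribing ANOVA results.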
5.4 Semi-structured Interviews
We concluded our study with semi-structured interviews to a) shed
more light on participants’ perception and performance when using
the different observation methods and b) better understand their
perceived differences to shoulder surfing in the wild. We transcribed
the interview data and split participants’ statements into meaning-
ful excerpts. This process resulted in overall N=292 participant
statements, which we then systematically clustered using an affinity
diagram. The initial clustering was performed by the lead researcher.
A second researcher then performed an independent review of the
clustering and added tags to clusters that required another iteration.
Both researchers then met to discuss the clustering and to resolve any
discussion points that came up during the review process. Through
this process, we identified five themes: 1) Observation Methods’
Unique Characteristics, 2) VRO for More Realistic Shoulder Surfing
Experiences, 3) Lab vs Real-World Observations, 4) The Differences
Between the Authentication Scenarios, and 5) General Comments. Below, we discuss those that are particularly relevant for the scope of our research in more detail. Reporting the number of participants who shared certain opinions would be inaccurate due to the use of a semi-structured interview approach and the study’s exploratory nature. Thus, we do not include frequencies. Quotes are translated from German to English where necessary.
Figure 3: ➊ shows the reference position + orientation of 2DVO. Participants made use of the absence of physical constraints in 3DO (see ➋). From the immersive VR observations we noticed that social factors (e.g., the proximity to the user authenticating) lose relevance in such a virtual environment, which we discuss further in Sect. 6.4. ➌ shows a VR observation in which the participant pretended to tie their shoes while performing the observation (a); (b) shows the observation position through another perspective.
5.4.1 Observation Methods’ Unique Characteristics
We noticed that VRO contributed to a close-to-reality looking over someone’s shoulder experience. Although 3DO provided participants with a more realistic shoulder surfing experience than 2DVO, the mouse-keyboard interaction impacted participants’ observation performance. Consequently, the “plug-and-play” characteristic of 2DVO resulted in observations being easier than 3DO. P11 mentioned that in VRO “[they] could position [themselves] in a way how they wanted it and it was super easy to select the position; this was more difficult with keyboard/mouse” (P11). Others mentioned that in VRO “[you] just need to walk to a specific position” (P17). Regarding 3DO, participants mentioned that their experience was closer-to-reality than 2DVO because “it felt more like that [they] really want to look over someone’s shoulder” (P15). P7 mentioned that “they could experiment a bit like in the real world where you can observe [the authentication] from different perspectives.” (P7). Although the lack of manipulations was raised by some participants in 2DVO, there was a general consensus that it was easier to observe authentications in 2DVO than in 3DO. Participants mentioned that the observation position + angle provided them with a clear line of sight and that their only task was to watch the authentication recording. In fact, some participants mentioned they found the videos more realistic because they used VRO and 3DO “in a way to really abuse them” (P9), resulting in some unusual observation positions.
5.4.2 VRO for More Realistic Shoulder Surfing Experiences
In VRO, P3 voiced that “the [real] environment would be completely irrelevant; it does not matter if [they are] in a basement, in an attic, outside, or at the sea” (P3), and that they did not feel like being part of an experiment. Others mentioned that “with the VR headset [they] moved within the environment and it felt on a physical way more realistic” (P4). For 3DO, participants voiced that they did not feel being part of the environment to the same extent as in VRO because of the presence of reality and that they were “aware of everything that surrounded [them] in the reality” (P15). P3 explained this based on the fact that they were “sitting in front of the PC and could see stuff on the left and right side that is not related to the [authentication scheme]” (P3). For 2DVO, participants voiced that their task was only to “watch” the authentications and that they were “very conscious that there is a technical device between [them] and the environment” (P4). The overall qualitative feedback suggests that there are two extremes: While VRO contributed towards a reasonably realistic in situ shoulder surfing experience, 2DVO and 3DO were considered to be observations from “another world”.
5.4.3 Lab vs Real-World Observations
Participants reported that they would perform real-world observations similarly as done using VRO, e.g., “I can imagine that [real-world observations] work exactly how I did it in VR” (P12). However, across all participants the message was that they would respect the social distances to the user more in the real world. P9 mentioned that “[they] would probably stay further away and do it less conspicuously” (P9). Others voiced that they completely ignored the social factor during the study and “only optimised [their] viewing point” (P10). P4 added that in the real world “there would be other people [and that they] would probably feel being observed” (P4). P13 voiced that in the ATM scenario “the user who withdraws cash probably already acts precautiously – so you would realise when someone stays that close to you.” (P13).
In summary, we noticed that while VRO contributed to more realistic shoulder surfing experiences than 2DVO, participants mentioned that users would sense if someone is close to them. In our study, participants did not necessarily consider the social factor (i.e., proxemics [32]) in their observations (see participants’ tracked observation positions in Fig. 3, visualised through black dots), which arguably takes on an important role in real-world observations [15].
6 DISCUSSION
We explored how the use of VR can contribute to advanced shoulder surfing research. We found that VRO provided participants with a reasonably realistic shoulder surfing experience without negatively impacting their shoulder surfing performance (see Sect. 5.1). Our study showed that VRO contribute to a significantly higher sense of being in the environment, a greater feeling of spatial presence, a higher level of involvement, and a higher experienced realism than 2DVO (baseline). While this is an expected finding with the benefits of immersive VR in terms of presence being well-known to the VR community (e.g., [73, 75]), the advantages of VRO over 2DVO are particularly interesting for shoulder surfing research. Our findings imply that previous shoulder surfing studies using 2D videos were not necessarily capable of providing participants with a close-to-reality shoulder surfing experience, thereby impacting the often
desired high ecological validity of usable security research studies [46]. Despite the advantages of VRO, our results suggest that 2DVO are sufficient to assess a system’s resilience against observations (see Sect. 5.1). This confirms Aviv et al.’s findings when comparing 2D video recordings with live observations [9]. In all three authentication contexts, there is no evidence that VRO were more accurate than 2DVO. Below, we discuss the impact of 3DO on shoulder surfing experiments together with participants’ observation behaviour in more detail. Participants’ observation behaviour was similar across the authentication scenarios. Therefore, we moved the smartphone PIN/pattern visualisations to Appendix B in our supplementary material and discuss participants’ observation behaviour on ATM authentications in more detail in Sect. 6.1.
6.1 VR-based Observation Methods: A Blessing and a Curse for Shoulder Surfing Research
From participants’ shoulder surfing behaviour (see Fig. 3), we noticed that in 3DO participants made use of the unique characteristics of non-immersive VR. This is apparent in our study as follows: In 3DO, participants positioned themselves in several different positions, many of which are challenging to reach in VRO due to physical constraints. Although some of these positions seem to be unrealistic at first glance, such observations can indeed happen in the real world using, for example, drones equipped with cameras [79] or surveillance cameras on the corner of a building. In our study, some participants linked their observations to other real-world actions. P7 brought up the example of observing ATM input in an unobtrusive way while tying shoes (Fig. 3-3a). As such, VR-based shoulder surfing studies using VRO and 3DO enable researchers to study different observation strategies in much more detail than what can be achieved with traditional 2DVO.
While our findings suggest that a VR-based research approach can provide researchers with insights into participants’ observation strategies, doing this is not necessarily in favour of a critical security evaluation at times where the observation method deviates from a realistic observation (e.g., mouse-keyboard manipulations in 3DO). Fig. 3 and the qualitative feedback suggest that participants made use of the affordances of 3DO (e.g., being physically independent), but using 2DVO and VRO led to more accurate observations (see Sect. 5.1). This means that at times where VR-based observation methods are introduced for authentication research (e.g., 3DO) and the shoulder surfing resilience of a system is at the centre of the investigation, participant-defined observation positions can greatly overestimate a system’s resilience against observations. Taking 3DO and ATM authentication as an example, someone could conclude that observations on ATM authentications are successful in “only” 83.33% of observations, while both the de facto standard evaluation approach (i.e., 2DVO [20,47]) and VRO resulted in noticeably more successful observations (2DVO: 94.44%, VRO: 95.59%). Therefore, researchers risk being misled into thinking that the system is more resilient against observations than it actually is.
6.2 VR Observation Methods and Their Use Cases
The literature discussed how participants’ lack of experience can lead to an under-estimation of risk [82] and emphasised the importance of participants’ familiarity with the authentication methods (e.g., [18, 20, 39, 43, 48]). Building upon these discussions, we argue that participants’ experience is particularly important when researchers introduce novel observation methods for shoulder surfing research. As evidenced by our semi-structured interviews, VRO were perceived as highly realistic. However, the interaction with alternative methods, which differ from participants’ real-world observation experiences (e.g., mouse-keyboard manipulations, 3DO), can have a negative impact on shoulder surfing evaluations and corresponding security conclusions of authentication systems. Still, in cases where the focus is more on an exploratory shoulder surfing evaluation such as studying participants’ observation behaviour and their observation strategies, shoulder surfing methods such as 3DO can be particularly helpful because they enable researchers to study situations that are challenging to research using other means.
6.3 VR Studies and Research in the Wild
It is important to acknowledge that VR studies should not, at any point, replace traditional real-world lab or field studies, but rather complement them [44,47,49]. As put by Mäkelä et al. [44], “VR field studies situate between lab studies and real-world field studies, being closer to field studies in ecological validity, and closer to lab studies with regards to their required effort”. Virtual simulations make it “easy to experiment with different physical display configurations, e.g., layouts, shapes, sizes and locations” [44]. In a similar vein, we showed how VR enables researchers to study human shoulder surfing on authentication schemes in several contexts without much additional effort. Studying all three authentication contexts in the wild would require a significant amount of additional hardware (e.g., tracking sensors, cameras) and is often infeasible to do due to the nature of private and sensitive contexts [19]. While the usable security community often expects in the wild research to increase the generalisability and the ecological validity of research findings [46], it has been argued that “we [as a community] just need to be a little bit more open to what sort of solutions/evaluations we are expecting out of [something] that has not actually been deployed in the real world.” [46]. VR studies [44,49] can be particularly helpful to further contribute to more realistic authentication research and studies of that type can be particularly promising when researchers aim to run a large number of consecutive experiments. It is often easier to maintain such virtual environments and make adjustments (e.g., change lighting conditions, replace authentication systems). Virtual artefacts are also easier to store, reuse, deploy, and share because they do not require physical storage space [44, 46].
We believe that VR replications are particularly promising for usable security and privacy research when the targeted real-world space is not available, which is not unlikely when conducting research in relatively sensitive and private contexts (e.g., studying ATM authentication behaviour [19] or security systems at airports [64]).
6.4 Lessons Learned and Recommendations
We outline four lessons learned and recommendations to support and guide researchers in future VR-based shoulder surfing studies.
Recommendation #1: Account For Real-World Factors if They are of Relevance and Consider How the Corresponding Research Findings Transfer to the Real World. The use of VR can greatly advance shoulder surfing research by enabling researchers to get insights into participants’ observation strategies. However, results from such VR studies also highly depend on how well reality is emulated (e.g., proxemics [32], additional bystanders [40]). We encourage researchers to control for proxemics [32] in virtual environments if social factors are of relevance to the research question. Contrary to prior work that found users’ perception of personal space in the real world is similar to that in a virtual environment [10, 36], we noticed that at times where participants optimise their shoulder surfing observations, social factors and the proximity to the user authenticating lose relevance and may even be ignored by participants. There are several directions where future work is called for. For example, we encourage future work to consider detection mechanisms that inform participants during their observations when they are in the user’s field of view. In cases where the user authenticating would be aware of an observation, participants may want to reconsider their observation position to perform less conspicuous observations (as reported by P9 and P10 in Sect. 5.4.3). At this point, it is important to consider the existing community discussions when aiming for close-to-reality shoulder surfing behaviour in virtual environments. Slater [69] argued that
the effect of both “place illusion” and “plausibility illusion” (PI) can contribute to realistic behaviour in virtual environments and that improved visual realism can enhance realistic behavioural responses [70]. Skarbez et al. [67, 68] argued that PI is “essentially the extent to which a scenario complies with a user’s expectations”. As put by Weber et al. [80], “there is only little research about the effects of perceived realism in VR and the conducted studies generally show that higher realism goes along with stronger presence”. It is important to note that the effect of perceived realism in VR is often relatively small [80] and that a high level of realism does not necessarily imply strong presence [37].
We demonstrated how VR increases participants’ perceived shoulder surfing realism, but it is important to keep in mind that hinting at similar behaviour to the real world is, due to the introduced challenges when conducting security and privacy research in the wild [19, 46], often only possible using qualitative research methods (as done in Sect. 5.4 or in [19,23]). Conducting similar shoulder surfing studies in the real world (e.g., in different private and sensitive contexts) would go beyond what is ethically and legally possible.
Recommendation #2: Consider How Participants Can Best Be Familiarised With VR Observation Methods. Participants’ lack of experience w.r.t. novel shoulder surfing methods can significantly impact their experience, preference, and performance when observing authentications. Even traditional input systems (e.g., mouse-keyboard manipulations) can have a negative impact on participants’ experience and performance. Consequently, it is important to introduce participants to novel (VR-based) shoulder surfing methods prior to the data collection as their lack of experience can significantly impact the outcome of a system’s shoulder surfing evaluation (e.g., see Sect. 5.1).
Recommendation #3: Consider a VR-Based Shoulder Surfing Approach When the Aim is to Contribute Towards Reasonably “Realistic” Shoulder Surfing Experiences, but Keep 2DVO as a Baseline Measure. As evidenced through our participants’ qualitative feedback and the IPQ scores (see Sect. 5.4 and Sect. 5.2), VRO leads to more realistic shoulder surfing experiments compared to using 2DVO. However, traditional 2DVO already provide a suitable baseline measure for a system’s resilience against observations [9]. While novel shoulder surfing methods (e.g., 3DO, VRO) may be used to contribute towards more realistic shoulder surfing experiences and increase participants’ sense of being part of the shoulder surfing environment, they do not necessarily outperform traditional 2DVO. It is important to set clear expectations and identify at the beginning of the research whether or not it is useful to employ a VR-based research approach when studying shoulder surfing. In situations where investigations in the wild are infeasible, VR-based shoulder surfing research can be particularly promising, but to make results more tangible, and to support replication studies and comparisons to prior works, we recommend keeping state-of-the-art 2D video observations (i.e., 2DVO) as a baseline condition.
[...] impact the safety and well-being of people. While our initial investigation of using VR for shoulder surfing research on authentication systems took place in the lab, we encourage future work to look at more distributed research approaches [49]. While remote (virtual/augmented reality) experiments introduce practical and ethical concerns [71], they can, as put by Steed et al. [72], “continue to forge forward with experimental work”.
7 FUTURE RESEARCH DIRECTIONS
We explored the strengths and weaknesses of 3D VR recordings for shoulder surfing research, which we compared to state-of-the-art shoulder surfing evaluations using 2D video recordings. We were particularly interested in participants’ shoulder surfing behaviour and how participants exploit VR’s unique affordances when performing observation attacks on user authentications. However, we did not account for the many additional factors (e.g., shoulder surfing users when interacting with different devices such as tablets [57], or situations in which shoulder surfing defense strategies are applied [42]). We leave this to future work. Similar to the work by Aviv et al. [8] we did not study text-based authentication, mainly because traditional PIN and pattern authentications are the most commonly used baseline measures in shoulder surfing and authentication research (e.g., [20, 31, 39]). Future research may apply 3D VR recordings for the evaluation of multimodal authentication schemes (e.g., gaze + touch/mid-air [5, 39]). Furthermore, we used a non-vivid environment (e.g., no additional bystanders) to immerse participants into different authentication scenarios. We did this because one key factor of shoulder surfing research on authentication systems is to provide participants (in the role of observers) with a best-case scenario when observing authentications (e.g., [11,39,63,78]). More vivid contexts may lead to an even more realistic atmosphere, which forms an interesting future research direction. Finally, a photorealistic VR environment may further increase the visual realism of such a virtual environment. However, recording such sensitive and private contexts as studied in our work is often infeasible to do in the wild. For example, creating 360° real-world recordings as done in the work by Saad et al. [62] introduces ethical and legal challenges in the context of ATM authentication. Such recordings are also limited to what is actually possible to stage/record in the real world. Virtual replications are particularly promising at this point because they provide researchers with more flexibility in changing parts of the environment [44] and enable researchers to study scenarios that are challenging (or even impossible) to access in the real world.
8 CONCLUSION
We introduced non-immersive and immersive VR observations to advance lab-based shoulder surfing research. We demonstrated how VR and its unique affordances can be applied in the human-centred security research domain to study shoulder surfing in different authentication scenarios. We showed that immersive VR recordings provide participants with a reasonably realistic human shoulder surf-
Recommendation #4: Use VR to Study Shoulder Surf- ing experience without impacting their observation performance
ing in Contexts that are Challenging to Access in the Real compared to commonly used 2D video recordings. Through our
World. VR-based shoulder surfing studies are not an alternative to investigation of using VR for shoulder surfing research, we hope to
real-world research, but rather complement and advance lab studies contribute to more realistic human-centred security research in the
by enabling researchers to study scenarios that are otherwise chal- long run and encourage future work to find ways to further improve
lenging to access (e.g., ATM authentication [18, 19, 22]). In such lab-based usable security and privacy research using VR.
situations, using VR for human-centred shoulder surfing research
can be particularly valuable as such a research approach does not ACKNOWLEDGMENTS
require having physical access to private and sensitive contexts and We thank all participants for taking part in our research. We also
gives researchers more control of the study environments (e.g., high thank all reviewers whose comments significantly improved the pa-
internal validity, more consistency across participants). Virtual en- per. This publication was supported by the University of Edinburgh
vironments are often more affordable and faster to build, deploy, and the University of Glasgow jointly funded PhD studentships, and
and evaluate than corresponding real-world scenarios [44]. The use partially by the EPSRC (EP/V008870/1) and the PETRAS National
of VR as a testbed for human-centred research can be particularly Centre of Excellence for IoT Systems Cybersecurity, which is also
promising at times where pandemics (e.g., COVID-19) significantly funded by the EPSRC (EP/S035362/1).
REFERENCES
[1] 3d atm model, 2019. [Link] accessed 04 November 2021.
[2] 3d smartphone model, 2021. [Link] [Link], accessed 04 November 2021.
[3] U. 3D. User manual, 2021. [Link] [Link], accessed 04 November 2021.
[4] Y. Abdelrahman, M. Khamis, S. Schneegass, and F. Alt. Stay cool! understanding thermal attacks on mobile-based user authentication. In Proc. of the 2017 CHI Conf. on Human Factors in Computing Systems, CHI ’17. ACM, New York, NY, USA, 2017.
[5] Y. Abdrabou, M. Khamis, R. M. Eisa, S. Ismael, and A. Elmougy. Engage: Resisting shoulder surfing using novel gaze gestures authentication. In Proc. of the 17th International Conf. on Mobile and Ubiquitous Multimedia. ACM, New York, NY, USA, 2018.
[6] Y. Abdrabou, M. Khamis, R. M. Eisa, S. Ismail, and A. Elmougy. Just gaze and wave: Exploring the use of gaze and gestures for shoulder-surfing resilient authentication. In Proc. of the ACM Symp. on Eye Tracking Research & Applications. ACM, New York, NY, USA, 2019.
[7] T. Amano, S. Kajita, H. Yamaguchi, T. Higashino, and M. Takai. Smartphone applications testbed using virtual reality. In Proc. of the 15th EAI International Conf. on Mobile and Ubiquitous Systems: Computing, Networking and Services, MobiQuitous ’18. ACM, New York, NY, USA, 2018.
[8] A. J. Aviv, J. T. Davin, F. Wolf, and R. Kuber. Towards baselines for shoulder surfing on mobile authentication. In Proc. of the 33rd Annual Computer Security Applications Conference, ACSAC 2017. ACM, New York, NY, USA, 2017.
[9] A. J. Aviv, F. Wolf, and R. Kuber. Comparing video based shoulder surfing with live simulation. In Proc. of the Computer Security Applications Conf., ACSAC ’18. ACM, New York, NY, USA, 2018.
[10] J. N. Bailenson, J. Blascovich, A. C. Beall, and J. M. Loomis. Equilibrium theory revisited: Mutual gaze and personal space in virtual environments. Presence, 2001.
[11] A. Bianchi, I. Oakley, and D. S. Kwon. Spinlock: A single-cue haptic and audio pin input technique for authentication. In Haptic and Audio Interaction Design. Springer, Berlin, Heidelberg, 2011.
[12] J. Blascovich, J. Loomis, A. C. Beall, K. R. Swinth, C. L. Hoyt, and J. N. Bailenson. Immersive virtual environment technology as a methodological tool for social psychology. Psychological Inquiry, 2002.
[13] L. Bošnjak and B. Brumen. Shoulder surfing experiments: A systematic literature review. Computers & Security, 2020.
[14] J. Brooke. Sus: a “quick and dirty” usability. 1996.
[15] F. Brudy, D. Ledo, S. Greenberg, and A. Butz. Is Anyone Looking? Mitigating Shoulder Surfing on Public Displays through Awareness and Protection. In Proc. of The International Symposium on Pervasive Displays, PerDis ’14. ACM, New York, NY, USA, 2014.
[16] J. Cohen. Eta-squared and partial eta-squared in fixed factor anova designs. Educational and Psychological Measurement, 1973.
[17] J. Cohen. Statistical power analysis for the behavioral sciences. Academic press, 2013.
[18] A. De Luca, K. Hertzschuch, and H. Hussmann. Colorpin: Securing pin entry through indirect input. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’10. ACM, New York, NY, USA, 2010.
[19] A. De Luca, M. Langheinrich, and H. Hussmann. Towards understanding atm security: A field study of real world atm use. In Proc. of the 6th Symposium on Usable Privacy and Security, SOUPS ’10. ACM, New York, NY, USA, 2010.
[20] A. De Luca, E. von Zezschwitz, N. D. H. Nguyen, M.-E. Maurer, E. Rubegni, M. P. Scipioni, and M. Langheinrich. Back-of-device authentication on smartphones. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’13. ACM, New York, NY, USA, 2013.
[21] A. De Luca, E. von Zezschwitz, L. Pichler, and H. Hussmann. Using Fake Cursors to Secure On-Screen Password Entry. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’13. ACM, New York, NY, USA, 2013.
[22] P. Dunphy, A. Fitch, and P. Olivier. Gaze-contingent passwords at the atm. In Communication by Gaze Interaction (COGAIN), 2008.
[23] M. Eiband, M. Khamis, E. von Zezschwitz, H. Hussmann, and F. Alt. Understanding shoulder surfing in the wild: Stories from users and observers. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’17. ACM, New York, NY, USA, 2017.
[24] L. A. Elkin, M. Kay, J. J. Higgins, and J. O. Wobbrock. An aligned rank transform procedure for multifactor contrast tests, 2021.
[25] U. Erra, D. Malandrino, and L. Pepe. Virtual reality interfaces for interacting with three-dimensional graphs. International Journal of Human–Computer Interaction, 2019.
[26] M. Feick, N. Kleer, A. Tang, and A. Krüger. The virtual reality questionnaire toolkit. UIST Adjunct. ACM, New York, NY, USA, 2020.
[27] S. M. Fiore, G. W. Harrison, C. E. Hughes, and E. E. Rutström. Virtual experiments and environmental policy. Environmental Economics and Management, 2009.
[28] L. Freina and M. Ott. A literature review on immersive virtual reality in education: state of the art and perspectives. In The international scientific Conf. elearning and software for education, 2015.
[29] S. Garfinkel and H. R. Lipford. Usable security: History, themes, and challenges. Synthesis Lectures on Information Security, Privacy, and Trust, 2014.
[30] C. George, M. Khamis, D. Buschek, and H. Hussmann. Investigating the third dimension for authentication in immersive virtual reality and in the real world. In 2019 IEEE Conf. on Virtual Reality and 3D User Interfaces (VR), March 2019.
[31] C. George, M. Khamis, E. von Zezschwitz, M. Burger, H. Schmidt, F. Alt, and H. Hussmann. Seamless and secure vr: Adapting and evaluating established authentication systems for virtual reality. In Network and Distributed System Security Symposium (NDSS 2017), USEC ’17. NDSS, February 2017.
[32] E. T. Hall. The hidden dimension. Garden City, NY: Doubleday, 1966.
[33] S. Hart and L. Staveland. Development of NASA-TLX (Task Load Index): Results of empirical and theoretical research. In Human mental workload, 1988.
[34] S. G. Hart. Nasa-task load index (nasa-tlx); 20 years later. In Proc. of the human factors and ergonomics society annual meeting. Sage publications Sage CA: Los Angeles, CA, 2006.
[35] M. Hassenzahl, M. Burmester, and F. Koller. Attrakdiff: A questionnaire to measure perceived hedonic and pragmatic quality. In Mensch & Computer, 2003.
[36] H. Hecht, R. Welsch, J. Viehoff, and M. R. Longo. The shape of personal space. Acta Psychologica, 2019.
[37] M. Hofer, T. Hartmann, A. Eden, R. Ratan, and L. Hahn. The role of plausibility in the experience of spatial presence in virtual environments. Frontiers in Virtual Reality, 2020.
[38] T. hundred fifty-five (255) pixel studios. City package, 2021. [Link] package-107224, accessed 04 November 2021.
[39] M. Khamis, F. Alt, M. Hassib, E. von Zezschwitz, R. Hasholzner, and A. Bulling. Gazetouchpass: Multimodal authentication using gaze and touch on mobile devices. In Proc. of the 34th Annual ACM Conf. Extended Abstracts on Human Factors in Computing Systems, CHI EA ’16. ACM, New York, NY, USA, 2016.
[40] M. Khamis, L. Bandelow, S. Schick, D. Casadevall, A. Bulling, and F. Alt. They are all after you: Investigating the viability of a threat model that involves multiple shoulder surfers. In Proc. of the 16th International Conf. on Mobile and Ubiquitous Multimedia, MUM ’17. ACM, New York, NY, USA, 2017.
[41] M. Khamis, L. Trotter, V. Mäkelä, E. v. Zezschwitz, J. Le, A. Bulling, and F. Alt. Cueauth: Comparing touch, mid-air gestures, and gaze for cue-based authentication on situated displays. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Dec. 2018.
[42] H. Khan, U. Hengartner, and D. Vogel. Evaluating attack and defense strategies for smartphone pin shoulder surfing. In Proc. of the 2018 CHI Conf. on Human Factors in Computing Systems. ACM, New York, NY, USA, 2018.
[43] L. Kraus, R. Schmidt, M. Walch, F. Schaub, and S. Möller. On the use of emojis in mobile authentication. In S. De Capitani di Vimercati and F. Martinelli, eds., ICT Systems Security and Privacy Protection. Springer International Publishing, Cham, 2017.
[44] V. Mäkelä, S. R. R. Rivu, S. Alsherif, M. Khamis, C. Xiao, L. M. Borchert, A. Schmidt, and F. Alt. Virtual Field Studies: Conducting Studies on Public Displays in Virtual Reality. In Proc. of the 38th Annual ACM Conf. on Human Factors in Computing Systems, CHI ’20. ACM, New York, NY, USA, 2020.
[45] F. Mathis, K. Vaniea, and M. Khamis. Observing virtual avatars: The impact of avatars’ fidelity on identifying interactions. In Proc. of the 24th International Conf. on Academic Mindtrek, AcademicMindtrek ’21. ACM, New York, NY, USA, 2021.
[46] F. Mathis, K. Vaniea, and M. Khamis. Prototyping usable privacy and security systems: Insights from experts. International Journal of Human–Computer Interaction, 2021.
[47] F. Mathis, K. Vaniea, and M. Khamis. Replicueauth: Validating the use of a lab-based virtual reality setup for evaluating authentication systems. In Proc. of the 39th Annual ACM Conf. on Human Factors in Computing Systems, CHI ’21. ACM, New York, NY, USA, 2021.
[48] F. Mathis, J. H. Williamson, K. Vaniea, and M. Khamis. Fast and secure authentication in virtual reality using coordinated 3d manipulation and pointing. ACM Trans. Comput.-Hum. Interact., Jan. 2021.
[49] F. Mathis, X. Zhang, J. O’Hagan, D. Medeiros, P. Saeghe, M. McGill, S. Brewster, and M. Khamis. Remote xr studies: The golden future of hci research? In CHI 2021 Workshop on XR Remote Research, 2021.
[50] L. Motion. Leap motion, 2019. accessed 04 November 2021.
[51] J. O’Hagan and J. R. Williamson. Reality aware vr headsets. In Proc. of the 9TH ACM International Symposium on Pervasive Displays, PerDis ’20. ACM, New York, NY, USA, 2020.
[52] J. O’Hagan, J. R. Williamson, M. McGill, and M. Khamis. Safety, power imbalances, ethics and proxy sex: Surveying in-the-wild interactions between vr users and bystanders. In 2021 IEEE International Symposium on Mixed and Augmented Reality (ISMAR), 2021.
[53] T. D. Parsons. Virtual reality for enhanced ecological validity and experimental control in the clinical, affective and social neurosciences. Frontiers in Human Neuroscience, 2015.
[54] S. Pedram, R. Skarbez, S. Palmisano, M. Farrelly, and P. Perez. Lessons learned from immersive and desktop vr training of mines rescuers. Frontiers in Virtual Reality, 2021.
[55] S. Putze, D. Alexandrovsky, F. Putze, S. Höffner, J. D. Smeddinck, and R. Malaka. Breaking the experience: Effects of questionnaires in vr user studies. In Proc. of the 2020 CHI Conf. on Human Factors in Computing Systems, CHI ’20. ACM, New York, NY, USA, 2020.
[56] Qualtrics. Qualtrics, 2005. accessed 04 November 2021.
[57] K. Ragozin, Y. S. Pai, O. Augereau, K. Kise, J. Kerdels, and K. Kunze. Private Reader: Using Eye Tracking to Improve Reading Privacy in Public Spaces. In Proc. of the 21st International Conf. on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’19. ACM, New York, NY, USA, 2019.
[58] F. Rebelo, P. Noriega, E. Duarte, and M. Soares. Using virtual reality to assess user experience. Human Factors, 2012.
[59] G. Robertson, S. Card, and J. Mackinlay. Three views of virtual reality: nonimmersive virtual reality. Computer, 1993.
[60] RockVR. Vr capture, 2021. [Link] video/vr-capture-75654, accessed 04 November 2021.
[61] V. Roth, K. Richter, and R. Freidinger. A pin-entry method resilient against shoulder surfing. In Proc. of the 11th ACM Conf. on Computer and Communications Security. ACM, New York, NY, USA, 2004.
[62] A. Saad, J. Liebers, U. Gruenefeld, F. Alt, and S. Schneegass. Understanding bystanders’ tendency to shoulder surf smartphones using 360-degree videos in virtual reality. 2018.
[63] H. Sasamoto, N. Christin, and E. Hayashi. Undercover: Authentication usable in front of prying eyes. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems. ACM, New York, NY, USA, 2008.
[64] M. A. Sasse. Red-eye blink, bendy shuffle, and the yuck factor: A user experience of biometric airport systems. IEEE Security Privacy, 2007.
[65] G.-L. Savino, N. Emanuel, S. Kowalzik, F. Kroll, M. C. Lange, M. Laudan, R. Leder, Z. Liang, D. Markhabayeva, M. Schmeißer, N. Schütz, C. Stellmacher, Z. Xu, K. Bub, T. Kluss, J. Maldonado, E. Kruijff, and J. Schöning. Comparing pedestrian navigation methods in virtual reality and real life. In 2019 International Conf. on Multimodal Interaction, ICMI ’19. ACM, New York, NY, USA, 2019.
[66] T. Schubert, F. Friedmann, and H. Regenbrecht. The experience of presence: Factor analytic insights. Presence: Teleoperators & Virtual Environments, 2001.
[67] R. Skarbez, F. P. Brooks, Jr., and M. C. Whitton. A survey of presence and related concepts. ACM Comput. Surv., Nov. 2017.
[68] R. Skarbez, J. Gabbard, D. A. Bowman, T. Ogle, and T. Tucker. Virtual replicas of real places: Experimental investigations. IEEE Transactions on Visualization and Computer Graphics, 2021.
[69] M. Slater. Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society B: Biological Sciences, 2009.
[70] M. Slater, P. Khanna, J. Mortensen, and I. Yu. Visual realism enhances realistic response in an immersive virtual environment. IEEE computer graphics and applications, 2009.
[71] A. Steed, S. Frlston, M. M. Lopez, J. Drummond, Y. Pan, and D. Swapp. An ‘in the wild’ experiment on presence and embodiment using consumer virtual reality equipment. IEEE Transactions on Visualization and Computer Graphics, 2016.
[72] A. Steed, F. Ortega, A. Williams, E. Kruijff, W. Stuerzlinger, A. Batmaz, A. Won, E. Rosenberg, A. Simeone, and A. Hayes. Evaluating immersive experiences during covid-19 and beyond. 2020.
[73] A. Steed and R. Schroeder. Collaboration in immersive and non-immersive virtual environments. In Immersed in Media. 2015.
[74] TI. Ultimatereplay, 2021. [Link] camera/ultimate-replay-2-0-178602, accessed 04 November 2021.
[75] S. Ventura, E. Brivio, G. Riva, and R. M. Baños. Immersive versus non-immersive experience: Exploring the feasibility of memory assessment through 360 technology. Frontiers in psychology, 2019.
[76] A. Voit, S. Mayer, V. Schwind, and N. Henze. Online, VR, AR, Lab, and In-Situ: Comparison of Research Methods to Evaluate Smart Artifacts. In Proc. of the 2019 CHI Conf. on Human Factors in Computing Systems, CHI ’19. ACM, New York, NY, USA, 2019.
[77] M. Volkamer, A. Gutmann, K. Renaud, P. Gerber, and P. Mayer. Replication study: A cross-country field observation study of real world PIN usage at atms and in various electronic payment scenarios. In Symposium on Usable Privacy and Security (SOUPS), 2018.
[78] E. von Zezschwitz, A. De Luca, B. Brunkow, and H. Hussmann. Swipin: Fast and secure pin-entry on smartphones. In Proc. of the 33rd Annual ACM Conf. on Human Factors in Computing Systems, CHI ’15. ACM, New York, NY, USA, 2015.
[79] Y. Wang, H. Xia, Y. Yao, and Y. Huang. Flying eyes and hidden controllers: A qualitative study of people’s privacy perceptions of civilian drones in the us. Proc. on Privacy Enhancing Tech., 2016.
[80] S. Weber, D. Weibel, and F. W. Mast. How to get there when you are there already? defining presence in virtual reality and the importance of perceived realism. Frontiers in Psychology, 2021.
[81] M. Weiß, K. Angerbauer, A. Voit, M. Schwarzl, M. Sedlmair, and S. Mayer. Revisited: Comparison of empirical methods to evaluate visualizations supporting crafting and assembly purposes. IEEE Transactions on Visualization and Computer Graphics, 2020.
[82] O. Wiese and V. Roth. Pitfalls of shoulder surfing studies. In NDSS Workshop on Usable Security, 2015.
[83] O. Wiese and V. Roth. See you next time: A model for modern shoulder surfers. In Proc. of the 18th International Conf. on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’16. ACM, New York, NY, USA, 2016.
[84] J. O. Wobbrock, L. Findlater, D. Gergle, and J. J. Higgins. The aligned rank transform for nonparametric factorial analyses using only anova procedures. In Proc. of the SIGCHI Conf. on Human Factors in Computing Systems, CHI ’11. ACM, New York, NY, USA, 2011.
[85] E. Wu, M. Piekenbrock, T. Nakumura, and H. Koike. Spinpong - virtual reality table tennis skill acquisition using visual, haptic and temporal cues. IEEE Transactions on Visualization and Computer Graphics, 2021.
[86] N. H. Zakaria, D. Griffiths, S. Brostoff, and J. Yan. Shoulder surfing defence for recall-based graphical passwords. In Proc. of the 7th Symp. on Usable Privacy and Security, SOUPS ’11. ACM, New York, NY, USA, 2011.