
Characterizing Personality from Eye-Tracking: The Role of Gaze and Its Absence in Interactive Search Environments

Jiaman He (ORCID 0009-0007-2817-7675), RMIT University, Naarm/Melbourne, Australia; Marta Micheli (ORCID 0009-0003-4562-0334), University of Turin, Torino, Italy; Damiano Spina (ORCID 0000-0001-9913-433X), RMIT University, Naarm/Melbourne, Australia; Dana McKay (ORCID 0000-0001-7522-1842), RMIT University, Naarm/Melbourne, Australia; Johanne R. Trippas (ORCID 0000-0002-7801-0239), RMIT University, Naarm/Melbourne, Australia; and Noriko Kando (ORCID 0000-0002-2133-0215), National Institute of Informatics, Tokyo, Japan
(2026)
Abstract.

Personality traits influence how individuals engage, behave, and make decisions during the information-seeking process. However, few studies have linked personality to observable search behaviors. This study aims to characterize personality traits through a multimodal time-series model that integrates eye-tracking data and gaze missingness, that is, periods when the user’s gaze is not captured. This approach is based on the idea that people often look away when they think, signaling disengagement or reflection. We conducted a user study with 25 participants, who used an interactive application on an iPad that allowed them to engage with digital artifacts from a museum. We rely on raw gaze data from an eye tracker, minimizing preprocessing so that behavioral patterns can be preserved without substantial data cleaning. From this perspective, we trained models to predict personality traits using gaze signals. Our results from a five-fold cross-validation study demonstrate strong predictive performance across all five dimensions: Neuroticism (Macro F1 = 77.69%), Conscientiousness (74.52%), Openness (77.52%), Agreeableness (73.09%), and Extraversion (76.69%). The ablation study examines whether the absence of gaze information affects model performance, demonstrating that incorporating missingness improves multimodal time-series modeling. The full model, which integrates both time-series signals and missingness information, achieves 10–15% higher accuracy and macro F1 scores across all Big Five traits compared to the model without time-series signals and missingness. These findings provide evidence that personality can be inferred from search-related gaze behavior and demonstrate the value of incorporating missing gaze data into time-series multimodal modeling.

Eye Tracking, Personality Prediction, Interactive Search
Journal year: 2026. Copyright: cc. Conference: 2026 ACM SIGIR Conference on Human Information Interaction and Retrieval (CHIIR ’26), March 22–26, 2026, Seattle, WA, USA. DOI: 10.1145/3786304.3788842. ISBN: 979-8-4007-2414-5/2026/03. CCS: Information systems, Users and interactive retrieval.

1. Introduction

Consider the different ways people behave when searching for information. Some are easily distracted, while others display a strong sense of curiosity. Certain individuals prefer to skim ahead to the end, whereas others move back and forth through the material to piece together a deeper understanding. The motivations for searching also vary: some seek confirmation of what they already believe (Boonprakong et al., 2025), while others are driven by the desire to learn something new (Kang et al., 2009). Exploring these different behaviors is essential for advancing areas such as recommendation systems (Hu and Pu, 2011; Onori et al., 2016; Yusefi Hafshejani et al., 2018; Dhelim et al., 2022), personalization (Ho et al., 2008; Javadi et al., 2026), understanding confirmation bias (Melinder et al., 2020), or LLM-based simulations (Ma et al., 2025; He et al., 2025a; Leng et al., 2025; Zerhoudi et al., 2026). Among the many factors that shape these behaviors, personality traits stand out as a foundational influence, impacting human patterns of thought, preference, and decision-making across contexts (Karumur et al., 2018).

Personality traits fundamentally shape how and why people seek information. Curious and open individuals often seek intellectual engagement, while those high in neuroticism or intolerance of uncertainty tend to seek reassurance in negative contexts (Jach et al., 2022). The Big Five model (McCrae and Costa, 1987), which includes Openness, Conscientiousness, Extraversion, Agreeableness, and Neuroticism, is widely used in personalization research. Studies link personality to information seeking (Heinström, 2003) and web search behaviors (Ashkanasy et al., 2007), showing its potential for personalized information retrieval (IR). However, while the importance of psychological dimensions in interaction and search has long been acknowledged as a major challenge in IR (Belkin, 1990), the systematic exploration of the impact of personality on IR behaviors, and its data-driven modeling, remain in their early stages.

One direction for modeling personality in IR lies in eye-tracking data. With mobile and wearable devices (e.g., Tobii Pro Glasses 3 (Pro, 2025), Apple Vision Pro (Inc., 2025)) now featuring eye-tracking capabilities (Krafka et al., 2016), it is possible to collect such data in real-world settings. Eye movements provide insights into cognitive and attentional processes even before a user makes a decision (He et al., 2025b). Previous work has investigated links between personality and gaze patterns (Hoppe et al., 2018; Berkovsky et al., 2019), though studies focusing specifically on personality inference from eye movements during digital search interactions remain limited (Chen et al., 2023; Millecamp et al., 2021; Woods et al., 2022). Eye-tracking data also present challenges: data missingness occurs frequently when users are not looking at the screen, and noise often requires complex preprocessing. We address these issues by interpreting missingness (the absence of gaze) as an informative signal and by using raw gaze data obtained directly from the eye tracker (Section 3.1.3), avoiding additional cleaning steps in order to capture behavioral patterns without extensive preprocessing.

Building on this perspective, we investigate the role of viewing behaviors in recognizing users’ Big Five personality traits during search tasks. Our user study uses a GUI-based application displaying museum items, enabling the collection of rich data on interactive search behaviors, including gaze patterns and engagement dynamics. Our main contributions are as follows:

  (1) We propose a multimodal time-series model that incorporates missing eye-tracking data as behavioral cues, integrating gaze and pupil signals to capture both attention patterns and underlying cognitive states.

  (2) We show that users’ Big Five personality traits can be predicted from eye-tracking data in complex search environments, achieving strong classification performance across all five dimensions: Neuroticism (Macro F1 = 77.69%), Conscientiousness (74.52%), Openness (77.52%), Agreeableness (73.09%), and Extraversion (76.69%) (see Table 1).

  (3) We present findings that characterize how users search and explore digitized museum collections, based on a controlled user study ($N=25$) involving a graphical digital interface.

2. Related Work

2.1. Personality Traits and Search Behavior

Personality traits describe stable patterns in how individuals think, feel, and behave (John et al., 1999). These traits shape emotional and motivational processes, influencing how people seek, process, and use information (Sanderson and Dumais, 2007). Prior research indicates that Big Five personality traits are associated with differences in information search behaviors (Jach et al., 2022).

Curiosity reflects a drive to explore novelty, complexity, or ambiguity (Kashdan et al., 2018). It can be expressed as deprivation sensitivity, where individuals are motivated to close knowledge gaps, or as joyous exploration, where the act of learning itself is intrinsically rewarding (Kashdan et al., 2018). By contrast, intolerance of uncertainty describes the tendency to experience uncertain situations as threatening, distressing, or undesirable (Carleton et al., 2007).

Curiosity—particularly joyous exploration—is usually understood as part of openness/intellect, which captures differences in creativity, imagination, and intellectual engagement (Silvia and Christensen, 2020; Kashdan et al., 2018). Intolerance of uncertainty aligns more closely with neuroticism, reflecting vulnerability to negative affect and worry (Jach and Smillie, 2019; Belkin, 1990). Together with extraversion, agreeableness, and conscientiousness, these dimensions form the Big Five model of personality (Markon, 2009).

These traits play an important role in information-seeking behavior. Individuals high in joyous exploration may search broadly and openly, driven by intrinsic interest, while those high in deprivation sensitivity may pursue information more urgently to resolve perceived knowledge gaps. Conversely, individuals with high intolerance of uncertainty may engage in searching defensively, focusing on information that reduces ambiguity or confirms certainty, while avoiding open-ended exploration. In this way, personality traits can be seen as probabilistic predictors of how people approach information environments: shaping whether they explore widely, focus narrowly, avoid uncertainty, or embrace it.

From a dynamic perspective, personality traits should not be viewed as fixed responses but as distributions of tendencies across time and context (Fleeson and Gallagher, 2009). For example, someone high in intolerance of uncertainty will not resist ambiguity in every instance but will do so more frequently and with greater intensity than someone lower on that trait. Similarly, individuals high in curiosity will tend to experience more frequent and stronger motivation to seek new information (Jach et al., 2022). Thus, understanding personality traits provides valuable insight into the variability of information-seeking strategies across individuals.

In our study, we incorporate the Big Five personality traits (McCrae and Costa, 1987), commonly summarized by the acronym OCEAN: Openness (O), Conscientiousness (C), Extraversion (E), Agreeableness (A), and Neuroticism (N). We use eye-tracking data to examine how these traits relate to user behavior.

2.2. Eye Tracking and Personality Traits

Eye-tracking has been widely used to investigate people’s cognitive states and their relationship to information-seeking behavior (Eickhoff et al., 2015; Cole et al., 2013; He et al., 2025b). Research suggests that individuals with different personality traits may follow distinct eye movement patterns when searching for information (Al-Samarraie et al., 2017). More broadly, prior work has established that personality traits can influence gaze control and visual attention (Matsumoto et al., 2010; Rauthmann et al., 2012; Al-Samarraie et al., 2017; Sarsam et al., 2023). This has motivated a series of studies exploring whether personality can be inferred directly from eye-tracking data.

Hoppe et al. (2018) examined the Big Five traits during everyday activities. Participants wore head-mounted eye trackers while walking and shopping, and the resulting gaze data were used to train random forest classifiers. Their models achieved above-chance prediction of Extraversion, Neuroticism, Agreeableness, and Conscientiousness. Berkovsky et al. (2019) extended this work to controlled settings, proposing a framework for predicting psychological characteristics—including the Big Five—based on passive viewing of images and videos. Using data from 21 participants, they applied supervised machine learning techniques and achieved encouraging levels of accuracy.

Other studies focused on interactive systems. Chen et al. (2023) collected gaze data during product selection tasks with recommender systems, applying multiple classifiers and feature selection methods to predict personality. Millecamp et al. (2021) studied gaze patterns in a music recommender system, with models predicting traits such as Openness to Experience, though their accuracy was not yet sufficient for practical applications.

Expanding to immersive contexts, Khatri et al. (2022) examined user behavior in a virtual reality shop, incorporating three-dimensional gaze features to predict the Big Five. Woods et al. (2022) investigated social media use, tracking participants as they browsed their Facebook News Feeds. Using only 20 seconds of gaze data per user, classification produced acceptable results for Extraversion and Conscientiousness, though performance was weaker for other traits.

Collectively, these studies highlight the dual role of eye tracking: it not only provides insights into cognitive and information-seeking processes but also offers a potential signal for inferring personality. Personality-related differences in gaze behavior suggest that personality-aware models may help explain why individuals vary in how they explore, attend to, and process information.

A practical challenge is that eye-tracking data often contain noise, such as blinks or missing samples when users look away (Ji et al., 2024). Previous studies have typically handled these gaps by interpolation, replacement, or removal (Winn et al., 2018; Franzen et al., 2022; Mathôt et al., 2018; Blumenfeld, 2002; He et al., 2025b). However, we argue that instances of looking away are themselves informative, as they may reveal valuable aspects of human behavior. To leverage this, we propose a novel method that preserves these missing segments. Our approach, detailed in Section 3.2, integrates the missing data into a time-series framework to better capture behavioral patterns.

2.3. Time-Series Modeling and Missing Data

Cole et al. (2015) analyzed time-series activity patterns to study search behavior. Similarly, eye-tracking data can be represented as a multivariate time series, capturing variables such as gaze coordinates, pupil size, and gaze velocity continuously over time (Längkvist et al., 2014). At each timestep, a vector of measurements is recorded, and the sequence of these vectors traces the dynamics of visual attention as it evolves moment by moment.

Most prior work on personality inference from gaze has reduced these streams to aggregate features, such as fixation counts, blink rates, or average saccade lengths (Berkovsky et al., 2019; Hoppe et al., 2018; Millecamp et al., 2021; Woods et al., 2022). While useful, such handcrafted summaries discard the fine-grained temporal patterns, such as prolonged fixations, systematic scanning, or rhythmic shifts in attention, that may carry diagnostic information about personality. By contrast, sequence models such as recurrent neural networks (Hochreiter and Schmidhuber, 1997), convolutional temporal encoders (Wang et al., 2017), and more recently Transformers (Lim et al., 2021) are designed to capture dependencies across time, making them well suited for modeling how gaze evolves during complex tasks.

To motivate this shift, it is helpful to look at parallel domains where similar challenges arise. In human activity recognition (HAR), wearable sensors generate continuous multivariate streams (e.g., accelerometer, gyroscope) that are best understood as sequences (Ordóñez and Roggen, 2016). Likewise, in healthcare, physiological signals such as ECG or PPG are modeled as multivariate time series to capture patterns in heart rhythms or respiration (Hong et al., 2020). In both cases, moving from handcrafted features to sequence modeling has led to significant improvements in predictive accuracy and robustness (Holmqvist et al., 2011). These domains demonstrate the value of treating behavioral signals as structured temporal data—an approach we adopt for eye-tracking in information retrieval.

Another challenge with eye-tracking is the prevalence of missing data. Blinks, momentary loss of calibration, or glances away from the screen introduce gaps in the signal. Earlier studies often discarded these segments or filled them in with simple interpolation (Hoppe et al., 2018; Millecamp et al., 2021). However, research in healthcare time series has shown that missingness can itself be informative (Lipton et al., 2016): the duration and frequency of gaps may reflect underlying behavioral or physiological states. Modern approaches augment sequential inputs with masking vectors and temporal gap features, allowing models to distinguish between transient noise or short-term dropouts and more prolonged periods of missingness in the signal (Che et al., 2018).

Despite the success of these methods in other domains, they have not been applied to eye-tracking for personality prediction. Our study builds on these insights by proposing a missing-data-aware sequence modeling framework, which treats gaze not only as a multivariate temporal signal but also as a behavioral record where the absence of data may carry meaningful information about individual differences in attention and personality.

3. Methodology

3.1. Experimental Setup

Experiments were conducted using the Minpaku Guide (Shoji et al., 2021), an iPad application developed for the National Museum of Ethnology in Osaka, Japan. For this study, the English version of the app was employed. The application provides extensive content related to the museum’s artifacts and is based on an ostensive search model (Campbell and Van Rijsbergen, 1996). The interface follows the same structural and functional principles as a search engine results page (SERP): users begin by browsing a grid of artifact photos, which they can select to view detailed information and scroll to explore related items (Shoji et al., 2021). The app consists of multiple page types, including (i) a modular image grid (shown in Figure 1(a)), (ii) an individual page, which contains a description of a single museum object (shown in Figure 1(b)), and (iii) map-based views.

Figure 1. Overview of the application interfaces: (a) example of image grid page; (b) example of description page.

3.1.1. Procedure

The study followed a structured procedure: participants gave informed consent, completed a pre-task questionnaire and short training, performed the search task, and concluded with a post-questionnaire and brief interview with stimulus recall.

This paper uses the data collected in the following two phases:

  (1) Personality Assessment. As part of the pre-task questionnaire, participants completed the BFI-44 questionnaire (Donahue et al., 1991) in English to measure their Big Five personality traits. This self-report instrument has been shown to be reliable (Arterberry et al., 2014).

  (2) Search Task. Participants explored the Minpaku Guide and selected the five items they liked most, marking them as favorites. No time limit was imposed to avoid biasing exploratory behavior. In addition to the user’s eye movements (see Section 3.1.3), the application logged detailed interaction data, including item offsets and vertical scroll positions, since all the pages were scrollable.

This study was reviewed and approved by the Institutional Review Board (IRB) of the National Institute of Informatics (NII), Japan.

In preparing the application for the task, we ensured that the quality and arrangement of the images on the first page were appropriate for the experiment. This page, which participants encountered first when using the Minpaku Guide, displayed multiple artifacts in a grid layout. To maximize heterogeneity, we selected images from a larger set using an agglomerative clustering method ($K=50$) applied to image features. The features were extracted with ResNet50 (He et al., 2016), a deep convolutional neural network widely used in image processing research. The items displayed in the Minpaku Guide were identical for all participants.

3.1.2. Participants

Twenty-five participants took part in the experiment, representing different nationalities and ranging in age from 18 to 34 years. Participants were recruited using a snowball sampling strategy. Participants were informed about the purpose and procedures of the study, including its expected duration, potential risks, and policies regarding data storage and usage to safeguard privacy. Participants were also explicitly told that they could withdraw from the study at any point without penalty. All participants provided written informed consent prior to the experiment.

3.1.3. Apparatus

Eye tracking was conducted using a Tobii Pro Nano at a sampling rate of 60 Hz. The device, connected to a Dell Precision 7550 computer, was positioned above an iPad Pro 2022 to avoid obstructing participants during interaction. Each participant sat approximately 50 cm from the iPad.

Because a video capture card was unavailable, the screen activity of the iPad application, along with participants’ gaze trajectories, was recorded using a Logitech C920 webcam. The webcam feed was shown on a separate monitor that was not visible to the participant.

Tracking was performed for both eyes and their recorded data was used for analysis. Recordings had a resolution of 1024 × 576 pixels with an average accuracy of 0.52°. Ambient lighting was kept consistent across participants to avoid effects on pupil dilation.

3.2. Missing-Data-Aware Network for Personality Prediction

We propose a missing-data-aware framework for predicting personality traits from eye-tracking signals. First, raw gaze coordinates, pupil size, and gaze velocity are represented as multivariate time series segmented into fixed-length windows (Section 3.2.1). Second, missingness is modeled with binary masks and temporal gap features, treating absent data as informative signals rather than noise (Section 3.2.2). Finally, the augmented sequences are processed by a bidirectional LSTM, which captures temporal dependencies and produces window-level predictions (Section 3.2.3). Together, these components form a pipeline that leverages both eye-movement patterns and missing data for personality prediction. An overview of the full pipeline is presented in Algorithm 1.

Algorithm 1 Training Missing-Data-Aware Network for Personality Prediction
Require: Raw eye-tracking sequence $\{(g_{t}^{x},g_{t}^{y},p_{t})\}_{t=1}^{T}$, sampling period $\Delta t$, window length $L$, number of epochs $E$
Ensure: Trained BiLSTM parameters $\theta=\{W,b\}$
1: Compute features:
  • Compute gaze velocity $v_{t}=\|\mathbf{g}_{t}-\mathbf{g}_{t-1}\|_{2}/\Delta t$ ($v_{1}=0$).
  • Normalize gaze coordinates: $\bar{g}_{t}^{x},\bar{g}_{t}^{y}\in[-1,1]$.
  • Standardize pupil diameter $p_{t}$ and velocity $v_{t}$ with z-scores.
2: Handle missingness:
  • Replace NaN with 0.
  • Construct binary mask $\mathbf{m}_{t}\in\{0,1\}^{d}$.
  • Update temporal gaps $\boldsymbol{\Delta t}_{t}$ recursively.
3: Build augmented input $\mathbf{f}_{t}=[\tilde{\mathbf{x}}_{t},\mathbf{m}_{t},\boldsymbol{\Delta t}_{t}]\in\mathbb{R}^{k}$.
4: Partition sequence into overlapping windows $\mathbf{W}^{(n)}_{k}\in\mathbb{R}^{L\times k}$.
5: for epoch $=1$ to $E$ do
6:   for each window $\mathbf{W}^{(n)}_{k}$ with label $y^{(n)}$ do
7:     Encode window with BiLSTM $\rightarrow\{\mathbf{h}_{t}\}_{t=1}^{L}$.
8:     Summarize: $\mathbf{z}=[\overrightarrow{\mathbf{h}}_{L};\overleftarrow{\mathbf{h}}_{1}]$.
9:     Predict with softmax: $\hat{y}=\text{softmax}(W\mathbf{z}+b)$.
10:    Compute loss: $\mathcal{L}=-\sum_{c=1}^{C}\mathbf{1}[y=c]\log\hat{y}_{c}$.
11:    Update parameters $\theta$ with gradient descent.
12:   end for
13: end for
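Steps 8 to 10 of Algorithm 1 (summary vector, softmax prediction, cross-entropy loss) can be illustrated numerically. The following is a minimal pure-Python sketch and not the authors' implementation: the summary vector `z`, the weights `W`, the bias `b`, and the choice of two classes are toy values assumed only for illustration, and class labels are 0-based in the code whereas the algorithm indexes classes from 1.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(W, z, b):
    """Compute W z + b, where W is a list of rows."""
    return [sum(w * x for w, x in zip(row, z)) + bias
            for row, bias in zip(W, b)]


def cross_entropy(probs, label):
    """Step 10: negative log-probability assigned to the true class."""
    return -math.log(probs[label])


# Toy values (assumptions). z stands in for the concatenation of the last
# forward hidden state and the first backward hidden state of the BiLSTM.
z = [0.5, -1.0, 0.25, 2.0]
W = [[0.1, 0.2, -0.3, 0.4],
     [-0.2, 0.1, 0.5, -0.1]]
b = [0.0, 0.1]

y_hat = softmax(linear(W, z, b))      # step 9
loss = cross_entropy(y_hat, label=0)  # step 10, true class assumed to be 0
print(y_hat, loss)
```

Because the indicator in the loss selects a single class, the sum over classes reduces to the log-probability of the true label, which is what `cross_entropy` computes.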

3.2.1. Eye-Tracking Data as a Multivariate Time Series

Let $f_{s}$ denote the sampling frequency (Hz), and $\Delta t=1/f_{s}$ the sampling period. That is, the eye tracker records $f_{s}$ samples per second, with $\Delta t$ seconds between two consecutive recordings.

For a recording session of length $T$ samples, we represent the eye-tracking stream as a multivariate time series, where each sample is represented as follows:

\[ \mathbf{x}_{t}=\big[g_{t}^{x},\;g_{t}^{y},\;p_{t},\;v_{t}\big]^{\top}\in\mathbb{R}^{d},\quad t=1,\dots,T,\qquad d=4, \]

where $g_{t}^{x},g_{t}^{y}$ are the horizontal and vertical gaze coordinates on the display, $p_{t}$ is pupil diameter, and $v_{t}$ is gaze velocity. Each $\mathbf{x}_{t}$ is therefore a four-dimensional snapshot of eye behavior at time $t$.

The gaze velocity is computed from the gaze position vector $\mathbf{g}_{t}=(g_{t}^{x},g_{t}^{y})^{\top}$ as

\[ v_{t}\;=\;\frac{\|\mathbf{g}_{t}-\mathbf{g}_{t-1}\|_{2}}{\Delta t},\quad t\geq 2, \]

with $v_{1}=0$ by convention. This captures how far the eyes moved between consecutive samples, divided by the elapsed time.
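The velocity definition above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the study's code: the function name `gaze_velocity` and the toy coordinates are hypothetical, and letting a missing sample (NaN) propagate to the velocity is an assumption about how gaps are handled at this stage. The 60 Hz rate matches the apparatus described in Section 3.1.3.

```python
import math


def gaze_velocity(gx, gy, dt):
    """Return v_t = ||g_t - g_{t-1}||_2 / dt for each sample.

    The first sample gets velocity 0 by convention. If either of two
    consecutive samples is NaN, the resulting velocity is NaN as well.
    """
    if not gx:
        return []
    v = [0.0]
    for t in range(1, len(gx)):
        step = math.hypot(gx[t] - gx[t - 1], gy[t] - gy[t - 1])
        v.append(step / dt)
    return v


fs = 60.0      # sampling frequency in Hz
dt = 1.0 / fs  # sampling period in seconds

# Toy gaze trace in pixels (assumed values); the fourth sample is missing.
gx = [100.0, 103.0, 103.0, math.nan, 110.0]
gy = [200.0, 204.0, 204.0, math.nan, 210.0]

v = gaze_velocity(gx, gy, dt)
print(v)
```

In this toy trace the second sample lies 5 pixels from the first, so its velocity is about 300 pixels per second, while the two velocities that touch the missing sample are NaN and would later be flagged by the validity mask.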

Session and window representation.

We collect the full session as

\[ \mathbf{X}=\big[\mathbf{x}_{1},\mathbf{x}_{2},\dots,\mathbf{x}_{T}\big]\in\mathbb{R}^{d\times T}, \]

with timestamps $\tau_{t}=(t-1)\Delta t$. Here, $\mathbf{X}$ is arranged so that each column corresponds to one moment in time ($t=1,\dots,T$), and each row corresponds to one of the $d=4$ recorded features (horizontal gaze $g^{x}$, vertical gaze $g^{y}$, pupil size $p$, velocity $v$). Thus, $\mathbf{X}$ can be seen as a compact table of the entire session: moving across columns follows the sequence of time steps, while moving down rows inspects the different measurements collected at that time.

For model training, this long sequence is divided into overlapping subsequences (windows) of fixed length $L$:

\[ \mathbf{W}_{k}=\big[\mathbf{x}_{s_{k}},\mathbf{x}_{s_{k}+1},\dots,\mathbf{x}_{s_{k}+L-1}\big]\in\mathbb{R}^{d\times L},\quad k=1,\dots,K, \]

where $s_{k}$ is the start index of the $k$-th window and $K$ is the total number of extracted windows. Each $\mathbf{W}_{k}$ is therefore a short clip of the session that spans $L$ consecutive timesteps but keeps all $d$ features at each step. In practice, this sliding-window procedure allows the model to learn from many local fragments of behavior, capturing recurring gaze patterns that may be predictive of personality traits.
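The sliding-window partition can be sketched as follows. The text does not specify how the start indices are chosen, so the fixed `stride` parameter below is an assumption, as are the function name `sliding_windows` and the toy window length. The sketch stores each window time-major (a list of $L$ feature vectors), which is the transpose of the $d\times L$ layout used in the equations.

```python
def sliding_windows(samples, length, stride):
    """Split per-timestep feature vectors into overlapping windows.

    Start indices are 0, stride, 2*stride, ... (0-based). Windows that
    would run past the end of the session are dropped.
    """
    if length <= 0 or stride <= 0:
        raise ValueError("length and stride must be positive")
    last_start = len(samples) - length
    return [samples[s:s + length] for s in range(0, last_start + 1, stride)]


# Toy session (assumed values): T = 10 timesteps, d = 4 features each.
session = [[float(t), 0.0, 0.0, 0.0] for t in range(10)]

# A stride smaller than the window length makes consecutive windows overlap.
windows = sliding_windows(session, length=4, stride=2)
print(len(windows), [w[0][0] for w in windows])
```

With 10 timesteps, a window length of 4, and a stride of 2, the windows start at timesteps 0, 2, 4, and 6, so each pair of neighbouring windows shares two samples.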

Normalization.

Let $(W,H)$ denote the screen width and height in pixels, and $(g_{t}^{x},g_{t}^{y})$ the raw gaze position recorded at time $t$ in pixel coordinates. To remove dependence on the specific display size, gaze coordinates are rescaled to a zero-centered system spanning $[-1,1]$ on each axis:

\[ \bar{g}_{t}^{x}\;=\;\frac{2g_{t}^{x}}{W}-1,\qquad\bar{g}_{t}^{y}\;=\;\frac{2g_{t}^{y}}{H}-1, \]

so that $(\bar{g}_{t}^{x},\bar{g}_{t}^{y})\in[-1,1]^{2}$. After this transformation, the center of the screen corresponds to $(0,0)$, the left and right edges correspond to $-1$ and $+1$ on the $x$-axis, and the top and bottom edges correspond to $+1$ and $-1$ on the $y$-axis. This mapping ensures that gaze positions are expressed in a common reference frame across devices and participants.

For the pupil diameter, let $p_{t}$ be the raw measurement at time $t$, $\mu_{p}$ the mean pupil diameter across the training set, and $\sigma_{p}$ the corresponding standard deviation. We standardize pupil diameter using z-score normalization:

\[ \tilde{p}_{t}\;=\;\frac{p_{t}-\mu_{p}}{\sigma_{p}}. \]

Similarly, let $v_{t}$ denote the raw gaze velocity, with $\mu_{v}$ and $\sigma_{v}$ the mean and standard deviation of velocity over the training set. Velocity is standardized in the same way:

\[ \tilde{v}_{t}\;=\;\frac{v_{t}-\mu_{v}}{\sigma_{v}}. \]

This standardization rescales $p_{t}$ and $v_{t}$ to have zero mean and unit variance, so that the model focuses on relative fluctuations (e.g., dilations above or below a typical pupil size, or faster versus slower gaze shifts) rather than absolute raw values, which may vary considerably across individuals.
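A minimal sketch of the two normalization steps is given below. Several details are assumptions rather than statements from the text: the helper names, the toy pupil values, the use of the population standard deviation, and the choice to skip NaN entries when estimating the training statistics. The display size of 1024 × 576 pixels is taken from the recording resolution reported in Section 3.1.3.

```python
import math


def normalize_gaze(g, size):
    """Map a pixel coordinate in [0, size] to [-1, 1] via 2 * g / size - 1."""
    return 2.0 * g / size - 1.0


def fit_zscore(train_values):
    """Mean and population standard deviation of the non-NaN training values."""
    obs = [v for v in train_values if not math.isnan(v)]
    mu = sum(obs) / len(obs)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in obs) / len(obs))
    return mu, sigma


def apply_zscore(values, mu, sigma):
    """Standardize values with training statistics; NaN stays NaN."""
    return [(v - mu) / sigma for v in values]


W_px, H_px = 1024, 576  # recording resolution in pixels

center_x = normalize_gaze(512.0, W_px)   # horizontal screen center
left_x = normalize_gaze(0.0, W_px)       # left edge
right_x = normalize_gaze(1024.0, W_px)   # right edge

# Toy pupil diameters (assumed values); one sample is missing.
pupil_train = [3.0, 3.5, math.nan, 4.0]
mu_p, sigma_p = fit_zscore(pupil_train)
pupil_std = apply_zscore(pupil_train, mu_p, sigma_p)
print(center_x, left_x, right_x, pupil_std)
```

Estimating `mu_p` and `sigma_p` on the training split only, and reusing them for held-out data, keeps the standardization consistent with the "training set" wording above and avoids leaking test statistics into the model.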

Final feature vector.

Unless otherwise noted, the input at each timestep $t$ is the normalized feature vector

\[ \tilde{\mathbf{x}}_{t}\;=\;\big[\bar{g}_{t}^{x},\;\bar{g}_{t}^{y},\;\tilde{p}_{t},\;\tilde{v}_{t}\big]^{\top}\in\mathbb{R}^{4}, \]

which stacks together the four processed measurements: normalized horizontal gaze position $\bar{g}_{t}^{x}$, normalized vertical gaze position $\bar{g}_{t}^{y}$, standardized pupil diameter $\tilde{p}_{t}$, and standardized gaze velocity $\tilde{v}_{t}$.

3.2.2. Modeling Missingness in Eye-Tracking Data

Eye-tracking data collected under naturalistic conditions often contain substantial missing values. This missingness arises when the participant blinks, looks away from the screen, or when the tracker momentarily loses calibration. In the raw data, such missing entries are recorded as NaN. Since the model cannot operate directly on NaN values, we replace them with zeros before training. However, the value 0 can also be a valid observation (for example, a gaze coordinate of 0 corresponds to the center of the screen). To prevent confusion between true zeros and placeholders for missing values, we introduce an explicit binary validity mask.

Binary validity masks.

For each feature dimension $j\in\{1,\dots,d\}$ at time $t$, we define a binary indicator variable

\[ m_{t}^{j}=\begin{cases}1,&\text{if feature $j$ has a valid observed value at time $t$},\\ 0,&\text{if feature $j$ is missing (recorded as NaN in the raw data)},\end{cases} \]

where $d=4$ in our case (horizontal gaze, vertical gaze, pupil size, and velocity). These indicators are then collected into a mask vector

\[ \mathbf{m}_{t}=[m_{t}^{1},m_{t}^{2},\dots,m_{t}^{d}]^{\top}\in\{0,1\}^{d}. \]

The role of this mask is to preserve the distinction between two different situations: (1) a feature truly takes the value 0 (e.g., $\bar{g}_{t}^{x}=0$ meaning the gaze is exactly at the horizontal center of the screen), and (2) the original data at that position was missing and has been replaced with 0 only as a placeholder so the model can process the input. Without the mask, these two cases would be indistinguishable.

Concretely, if the horizontal gaze coordinate is missing at time $t$, we set its value in $\tilde{\mathbf{x}}_{t}$ to 0 but also set $m_{t}^{1}=0$. If the gaze is genuinely at the screen center, then the value is also 0 but the mask records $m_{t}^{1}=1$. Thus, the pair $(\tilde{\mathbf{x}}_{t},\mathbf{m}_{t})$ allows the model to differentiate between a true zero measurement and a zero that only indicates missingness.
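The distinction between a true zero and a placeholder zero can be made concrete with a short sketch. The function name `mask_and_fill` and the two toy feature vectors are hypothetical; the zero fill and the 0/1 mask follow the definitions above.

```python
import math


def mask_and_fill(x):
    """Return (filled, mask) for one feature vector.

    mask[j] is 1 where feature j was observed and 0 where it was NaN;
    filled[j] replaces each NaN with the placeholder value 0.0.
    """
    mask = [0 if math.isnan(v) else 1 for v in x]
    filled = [0.0 if math.isnan(v) else v for v in x]
    return filled, mask


# Feature order: [gaze x, gaze y, pupil, velocity] (toy values).
at_center = [0.0, 0.3, -0.5, 1.2]       # gaze truly at horizontal center
x_missing = [math.nan, 0.3, -0.5, 1.2]  # horizontal gaze not captured

filled_a, mask_a = mask_and_fill(at_center)
filled_b, mask_b = mask_and_fill(x_missing)
print(filled_a, mask_a)
print(filled_b, mask_b)
```

After filling, the two value vectors are identical, and only the first mask entry (1 versus 0) tells the model which zero is a real observation.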

Temporal gap encoding.

In addition to the binary masks, we record how long each feature has been continuously missing. For feature jj at time tt, we define a temporal gap variable Δttj\Delta t_{t}^{j} that is updated recursively as

Δttj={0,if mtj=1 (feature j is observed at time t),Δtt1j+Δt,if mtj=0 (feature j is missing at time t),\Delta t_{t}^{j}=\begin{cases}0,&\text{if $m_{t}^{j}=1$ (feature $j$ is observed at time $t$)},\\ \Delta t_{t-1}^{j}+\Delta t,&\text{if $m_{t}^{j}=0$ (feature $j$ is missing at time $t$)},\end{cases}

where Δt\Delta t is the sampling period.

Thus, Δttj\Delta t_{t}^{j} counts how long feature jj has been missing up to time tt. Whenever a valid measurement is observed (mtj=1m_{t}^{j}=1), the gap resets to 0. When the feature remains missing across consecutive timesteps (mtj=0m_{t}^{j}=0), the gap grows by Δt\Delta t each time step.

This variable provides temporal context for the missingness:

  • If $\Delta t_{t}^{j}$ is small (close to 0), the absence is likely due to a short interruption such as a blink or a brief calibration error.

  • If $\Delta t_{t}^{j}$ grows large, it indicates a prolonged disengagement, for example the participant looking away from the screen for several seconds.

By including $\Delta t_{t}^{j}$ as an explicit feature, the model can distinguish between short, transient dropouts and longer episodes of missing data, which may carry different behavioral meanings.
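The recursive gap update can be sketched as follows. The sketch assumes the gap starts at 0 before the first sample, which the text does not specify; `temporal_gaps` is a hypothetical helper name.

```python
def temporal_gaps(mask, sampling_period):
    """Gap (in seconds) that each sample's feature has been continuously missing."""
    gaps = []
    gap = 0.0  # assumption: no missingness has accumulated before the first sample
    for m in mask:
        gap = 0.0 if m == 1 else gap + sampling_period  # reset on observation, else grow
        gaps.append(gap)
    return gaps

mask = [1, 0, 0, 0, 1, 0]                         # 1 = observed, 0 = missing
gaps = temporal_gaps(mask, sampling_period=1 / 60)  # 60 Hz sampling
print([round(g, 3) for g in gaps])  # [0.0, 0.017, 0.033, 0.05, 0.0, 0.017]
```

A three-sample dropout therefore reaches a gap of 0.05 s before the next valid measurement resets it.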

Extended feature representation.

At each timestep $t$, we construct an augmented input vector

\[
\mathbf{f}_{t}=\big[\tilde{\mathbf{x}}_{t},\;\mathbf{m}_{t},\;\boldsymbol{\Delta t}_{t}\big],
\]

where:

  • $\tilde{\mathbf{x}}_{t}\in\mathbb{R}^{d}$ are the normalized feature values (horizontal and vertical gaze coordinates, standardized pupil size, and standardized velocity),

  • $\mathbf{m}_{t}\in\{0,1\}^{d}$ is the binary validity mask, indicating for each feature whether the value at time $t$ was truly observed ($1$) or was originally missing and replaced by a placeholder ($0$),

  • $\boldsymbol{\Delta t}_{t}=[\Delta t_{t}^{1},\dots,\Delta t_{t}^{d}]^{\top}\in\mathbb{R}^{d}$ contains the temporal gap variables, which record how long each feature has been continuously missing.

This yields an augmented representation of dimension $k=3d$: for each of the $d$ base features, we include its normalized value, a validity flag, and a temporal gap duration.
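As a rough illustration of the $k=3d$ layout, the sketch below assembles one augmented vector for $d=4$ features. It assumes missing values are given as `None` and measures gaps in units of samples; `augment_step` and the input values are hypothetical.

```python
def augment_step(raw, prev_gaps, sampling_period):
    """Build f_t = [x_tilde_t, m_t, gap_t] for one timestep, plus the updated gaps."""
    values = [0.0 if v is None else float(v) for v in raw]  # placeholder 0 when missing
    mask = [0 if v is None else 1 for v in raw]             # validity flags
    gaps = [0.0 if m else g + sampling_period for m, g in zip(mask, prev_gaps)]
    return values + mask + gaps, gaps

# Hypothetical timestep: gaze x, gaze y, pupil (missing for the third sample in a row), velocity.
raw = [0.2, -0.1, None, 0.5]
f_t, gaps = augment_step(raw, prev_gaps=[0.0, 0.0, 2.0, 0.0], sampling_period=1.0)
print(len(f_t))  # 12
print(f_t)       # [0.2, -0.1, 0.0, 0.5, 1, 1, 0, 1, 0.0, 0.0, 3.0, 0.0]
```

The returned `gaps` list is carried forward as `prev_gaps` for the next timestep.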

The motivation for this augmentation is threefold:

  1. The model can ignore invalid entries by using the mask $\mathbf{m}_{t}$ while still retaining information about which features were missing.

  2. The temporal gap variables $\boldsymbol{\Delta t}_{t}$ allow the model to capture behavioral patterns such as the difference between a short blink (a brief gap) and sustained disengagement (a long gap).

  3. By combining observed values with missingness structure, the model can potentially learn personality-related regularities, for example, that certain individuals tend to look away more often or for longer periods, which may be predictive of traits like neuroticism or conscientiousness.

3.2.3. Sequential Modeling for Personality Prediction

Personality prediction from eye-tracking is framed as a sequence-to-label task: the model receives a short sequence of eye-tracking data and must predict the participant’s personality traits.

Given an augmented feature window

\[
\mathbf{W}^{(n)}_{k}=[\mathbf{f}_{s_{k}},\mathbf{f}_{s_{k}+1},\dots,\mathbf{f}_{s_{k}+L-1}]^{\top}\in\mathbb{R}^{L\times k},
\]

we stack $L$ consecutive augmented vectors $\mathbf{f}_{t}\in\mathbb{R}^{k}$ (normalized features, validity masks, and temporal gaps). Here, $n$ indexes the participant and $s_{k}$ is the start index of the $k$-th window. Note that the subscript $k$ on $\mathbf{W}$ and $s$ is a window index, whereas the $k$ in $\mathbb{R}^{L\times k}$ and $\mathbb{R}^{k}$ denotes the feature dimension ($k=3d$).

The prediction target is $y^{(n)}\in\{1,2,3\}^{5}$, a 5-dimensional vector for the Big Five traits (Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism), where each entry takes values $1,2,3$ for Low, Medium, and High. For the classification setup, participants’ continuous personality scores (originally measured on a 1–5 scale) were partitioned into three groups using a quantile-based procedure following Saboundji et al. (2024). For each trait, cut points were set at the 33rd and 66th percentiles of its empirical distribution, so that each group contained roughly the same number of participants. This procedure yielded balanced class sizes and provided clear separation among low, medium, and high scorers. All windows from participant $n$ share the same label $y^{(n)}$, so the task is to learn a mapping from temporal sequences $\mathbf{W}^{(n)}_{k}$ to the trait categories.
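The quantile-based labeling can be sketched as below. The scores are hypothetical, the percentile uses linear interpolation, and ties at a cut point are assigned to the lower group; none of these details are specified in the text.

```python
def percentile(sorted_values, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)

def tercile_labels(scores):
    """Map continuous trait scores to 1 (Low), 2 (Medium), 3 (High)."""
    ordered = sorted(scores)
    low_cut, high_cut = percentile(ordered, 33), percentile(ordered, 66)
    return [1 if s <= low_cut else 2 if s <= high_cut else 3 for s in scores]

# Hypothetical scores for one trait on the 1-5 scale, one per participant.
scores = [2.1, 3.4, 4.0, 2.8, 3.9, 4.6, 3.1, 2.5, 3.6]
labels = tercile_labels(scores)
print(labels)  # [1, 2, 3, 1, 3, 3, 2, 1, 2]
```

With these nine scores each class receives three participants, which is the balance the procedure aims for.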

Sequential representation learning.

Prior work often summarized windows into handcrafted statistics, such as average fixation duration, blink counts, or velocity variance. While useful, such summary features discard the temporal ordering of the signal. Here we instead retain the sequence itself, enabling the model to exploit fine-grained temporal dependencies—such as sustained fixations, repeated scanning motions, or short pupil dilations—that may be characteristic of personality traits.

BiLSTM encoder.

To capture temporal dependencies in the data, we employ a bidirectional Long Short-Term Memory (BiLSTM) network. An LSTM is a type of recurrent neural network that is designed to process sequences one element at a time, while retaining information from previous steps through a hidden state. This makes it well suited to modeling time-series signals such as eye movements, where the current behavior depends strongly on what came before.

At each timestep $t$, the hidden state is updated as

\[
\mathbf{h}_{t}=\text{BiLSTM}(\mathbf{f}_{t},\mathbf{h}_{t-1}),\quad\mathbf{h}_{t}\in\mathbb{R}^{h},
\]

where $\mathbf{f}_{t}$ is the augmented feature vector at time $t$ and $h$ is the dimensionality of the hidden state.

Unlike a standard LSTM, which only processes the sequence forward in time, a BiLSTM maintains two parallel chains of hidden states: a forward chain $\overrightarrow{\mathbf{h}}_{t}$ that processes the sequence from $1\rightarrow L$, and a backward chain $\overleftarrow{\mathbf{h}}_{t}$ that processes it in reverse from $L\rightarrow 1$. At each timestep, the two are concatenated as $\mathbf{h}_{t}=\big[\overrightarrow{\mathbf{h}}_{t};\,\overleftarrow{\mathbf{h}}_{t}\big]$.

This bidirectional setup is important because the meaning of an event often depends on its temporal context. For example, a brief period of missing data could be interpreted as a blink if it is followed immediately by normal gaze behavior, but the same gap might indicate disengagement if it is preceded and followed by long stretches of missingness. By looking both backward and forward in time, the BiLSTM can capture such context more effectively than a unidirectional model.

Window-level representation and prediction.

After processing a window of $L$ timesteps, the BiLSTM produces a sequence of hidden states $\{\mathbf{h}_{t}\}_{t=1}^{L}$, each encoding information about the input at time $t$ and its surrounding context. To obtain a fixed-length representation for the entire window, we concatenate the last forward hidden state $\overrightarrow{\mathbf{h}}_{L}$ (which summarizes information up to the end of the window) with the last backward hidden state $\overleftarrow{\mathbf{h}}_{1}$ (which has read the window in reverse, from its end back to its start): $\mathbf{z}=[\overrightarrow{\mathbf{h}}_{L};\,\overleftarrow{\mathbf{h}}_{1}]$.

This vector $\mathbf{z}$ serves as a compact summary of the whole sequence, capturing both past and future dependencies.

The representation $\mathbf{z}$ is passed to a classification head, which applies a linear transformation followed by a softmax activation, outputting a probability distribution over the three categories for each personality trait.
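To make the window-level summary concrete, here is a small pure-Python sketch of a bidirectional LSTM pass followed by a linear softmax head for a single trait. It is illustrative only: the weights are random and untrained, the hidden size is 4 rather than the 64 used in the experiments, the input window is synthetic, and all function names are hypothetical.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def make_cell(input_dim, hidden_dim, rng):
    """Random, untrained LSTM weights: per gate, one row [W_x | W_h | bias] per unit."""
    width = input_dim + hidden_dim + 1
    return {gate: [[rng.uniform(-0.5, 0.5) for _ in range(width)]
                   for _ in range(hidden_dim)]
            for gate in ("i", "f", "o", "g")}

def lstm_step(cell, x, h, c):
    v = list(x) + list(h) + [1.0]  # input, previous hidden state, bias term
    def affine(gate):
        return [sum(w * a for w, a in zip(row, v)) for row in cell[gate]]
    i = [sigmoid(a) for a in affine("i")]    # input gate
    f = [sigmoid(a) for a in affine("f")]    # forget gate
    o = [sigmoid(a) for a in affine("o")]    # output gate
    g = [math.tanh(a) for a in affine("g")]  # candidate cell update
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def final_state(cell, sequence, hidden_dim):
    h, c = [0.0] * hidden_dim, [0.0] * hidden_dim
    for x in sequence:
        h, c = lstm_step(cell, x, h, c)
    return h

def encode_window(window, fwd_cell, bwd_cell, hidden_dim):
    h_fwd = final_state(fwd_cell, window, hidden_dim)        # forward state at t = L
    h_bwd = final_state(bwd_cell, window[::-1], hidden_dim)  # backward state at t = 1
    return h_fwd + h_bwd                                     # z = [h_fwd; h_bwd]

def softmax(logits):
    shift = max(logits)
    exps = [math.exp(a - shift) for a in logits]
    total = sum(exps)
    return [e / total for e in exps]

rng = random.Random(0)
L, k, hidden = 100, 12, 4
window = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(L)]  # synthetic input
fwd_cell, bwd_cell = make_cell(k, hidden, rng), make_cell(k, hidden, rng)
z = encode_window(window, fwd_cell, bwd_cell, hidden)

# Linear head for one trait: three rows of [weights | bias], then softmax.
head = [[rng.uniform(-0.5, 0.5) for _ in range(len(z) + 1)] for _ in range(3)]
probs = softmax([sum(w * a for w, a in zip(row, z + [1.0])) for row in head])
print(len(z), round(sum(probs), 6))  # 8 1.0
```

The backward chain is obtained simply by feeding the reversed window to a second cell, so its final state corresponds to position $t=1$ of the original window.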

Training objective.

Model training minimizes the categorical cross-entropy loss:

\[
\mathcal{L}=-\sum_{c=1}^{C}\mathbf{1}[y=c]\,\log\hat{y}_{c},
\]

which penalizes the divergence between the predicted distribution $\hat{y}$ and the true label $y$. Here, $C=3$ is the number of categories (Low, Medium, High), and $\mathbf{1}[y=c]$ is an indicator function that equals $1$ when the ground-truth label is $c$ and $0$ otherwise. Intuitively, this loss encourages the model to assign high probability to the correct class while discouraging probability mass on incorrect categories.
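As a worked example with hypothetical numbers (natural logarithm assumed), suppose the model predicts $\hat{y}=(0.7,\,0.2,\,0.1)$ for one trait. Because the indicator keeps only the true-class term, the loss depends on which label is correct:

```latex
\[
y=1:\quad \mathcal{L}=-\log 0.7\approx 0.357,
\qquad
y=3:\quad \mathcal{L}=-\log 0.1\approx 2.303 .
\]
```

The same prediction is penalized roughly six times more heavily when the true class is the one that received the least probability.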

4. Experimental Evaluation

4.1. Classifier Training

We train a missing-data-aware bidirectional LSTM classifier using a 12-dimensional input feature vector at each timestep: (1) horizontal gaze coordinate $g_{x}$, (2) its validity mask $m_{g_{x}}$, (3) temporal gap $\Delta t_{g_{x}}$, (4) vertical gaze coordinate $g_{y}$, (5) mask $m_{g_{y}}$, (6) gap $\Delta t_{g_{y}}$, (7) gaze velocity $v$, (8) mask $m_{v}$, (9) gap $\Delta t_{v}$, (10) pupil diameter $p$, (11) mask $m_{p}$, and (12) gap $\Delta t_{p}$. Sequences are segmented with a sliding window of length $L=100$ and overlap $=50$ (stride $=50$) (Shoaib et al., 2015), which corresponds to a window duration of $1.67\,\text{s}$ with a $0.83\,\text{s}$ stride at a $60\,\text{Hz}$ sampling rate. The network uses hidden size $64$, $2$ layers, dropout $0.3$, and bidirectionality, followed by a two-layer classifier head; weights are initialized with orthogonal (LSTM) and Xavier (linear) schemes. Training uses cross-entropy loss with Adam optimization (learning rate $10^{-3}$, weight decay $10^{-5}$), gradient clipping, and a ReduceLROnPlateau scheduler (patience $=10$, factor $=0.5$), for up to $100$ epochs with early stopping after $15$ epochs without validation improvement. We evaluate performance using stratified 5-fold cross-validation (preserving the Low/Medium/High class distribution) and report mean accuracy and macro-averaged F1 scores with standard deviations. To prevent data leakage, we adopt a leakage-free data-splitting strategy in which data are split at the segment level: for each participant, continuous signals are first divided into non-overlapping contiguous segments, which are then assigned to one of the five folds. Sliding windows are generated only within segments after fold assignment, ensuring no overlap in raw data or windows between training and test folds.
We also conduct participant-stratified cross-validation, assigning each participant entirely to one fold to test generalizability to unseen individuals, and report average macro F1 with standard deviations. Leave-one-subject-out (LOSO) was not feasible because each participant only provided data for one class. As a result, test folds contained a single class only, making accuracy and F1 computation infeasible.
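The segment-level splitting can be sketched as follows for one participant. The segment length (600 samples), the round-robin fold assignment, and the dropping of a trailing partial segment are assumptions made for illustration; the text does not state these details, and the function names are hypothetical.

```python
def sliding_windows(length, window=100, stride=50):
    """(start, end) index pairs, end exclusive, of full windows inside a sequence."""
    return [(start, start + window) for start in range(0, length - window + 1, stride)]

def segment_level_folds(n_samples, segment_len, n_folds=5, window=100, stride=50):
    """Assign contiguous segments to folds first, then window within each segment."""
    folds = {fold: [] for fold in range(n_folds)}
    for seg in range(n_samples // segment_len):  # trailing partial segment is dropped
        offset = seg * segment_len
        fold = seg % n_folds                     # round-robin assignment (assumption)
        for start, end in sliding_windows(segment_len, window, stride):
            folds[fold].append((offset + start, offset + end))
    return folds

# Hypothetical recording: 6000 samples (100 s at 60 Hz), 600-sample segments.
folds = segment_level_folds(n_samples=6000, segment_len=600)
print({fold: len(windows) for fold, windows in folds.items()})
# {0: 22, 1: 22, 2: 22, 3: 22, 4: 22}
```

Because windows never cross a segment boundary, two overlapping windows always land in the same fold, so no raw sample is shared between training and test data.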

4.2. Results

Table 1. Classification performance (mean ± standard deviation) across five folds for each Big Five trait using the full pipeline (time series + masks + temporal gaps). The last column reports participant-stratified cross-validation (by-participant) macro F1 averaged across folds. Values are percentages; the % sign is omitted for readability.

| Trait | Accuracy | Macro F1 | By-Participant F1 |
|---|---|---|---|
| Openness | 77.96 ± 2.42 | 77.52 ± 2.63 | 66.23 ± 7.92 |
| Conscientiousness | 74.86 ± 2.34 | 74.52 ± 2.32 | 65.99 ± 25.12 |
| Extraversion | 78.17 ± 1.49 | 76.69 ± 1.24 | 70.89 ± 13.92 |
| Agreeableness | 73.76 ± 1.86 | 73.09 ± 2.05 | 63.13 ± 15.20 |
| Neuroticism | 79.19 ± 2.70 | 77.69 ± 2.92 | 74.33 ± 23.85 |

Table 2. Ablation study results across Big Five traits. “Full” uses time series + masks + temporal gaps features; “TS+Temporal Gap” uses time series + temporal gap features; “TS Only” uses raw time series features; and “Statistical” uses handcrafted statistical features with a Random Forest. Results are reported as mean accuracy and macro F1 (%) with standard deviation.

| Metric | Model | Openness | Conscientiousness | Extraversion | Agreeableness | Neuroticism |
|---|---|---|---|---|---|---|
| Accuracy (%) | Full | 77.96 ± 2.42 | 74.86 ± 2.34 | 78.17 ± 1.49 | 73.76 ± 1.86 | 79.19 ± 2.70 |
| Accuracy (%) | TS+Temporal Gap | 69.96 ± 1.50 | 70.43 ± 1.48 | 73.49 ± 2.65 | 67.15 ± 1.70 | 71.80 ± 0.99 |
| Accuracy (%) | TS Only | 69.23 ± 2.48 | 66.86 ± 0.74 | 72.67 ± 2.32 | 65.75 ± 1.36 | 73.44 ± 0.44 |
| Accuracy (%) | Statistical | 67.59 ± 1.84 | 67.15 ± 2.18 | 70.97 ± 3.51 | 65.56 ± 1.67 | 73.73 ± 2.08 |
| Macro F1 (%) | Full | 77.52 ± 2.63 | 74.52 ± 2.32 | 76.69 ± 1.24 | 73.09 ± 2.05 | 77.69 ± 2.92 |
| Macro F1 (%) | TS+Temporal Gap | 69.87 ± 1.54 | 70.38 ± 1.52 | 70.29 ± 3.12 | 66.21 ± 1.68 | 69.60 ± 2.58 |
| Macro F1 (%) | TS Only | 68.73 ± 2.45 | 66.51 ± 0.71 | 70.64 ± 2.12 | 65.56 ± 1.46 | 71.56 ± 0.37 |
| Macro F1 (%) | Statistical | 66.40 ± 1.80 | 66.99 ± 2.27 | 67.85 ± 3.30 | 64.62 ± 1.57 | 70.31 ± 2.90 |

Table 1 summarizes the classification performance for each of the Big Five traits. Overall, the proposed missing-data-aware BiLSTM achieves stable and moderately strong performance across all traits, with mean macro F1 scores ranging from 73.09% (Agreeableness) to 77.69% (Neuroticism) and mean accuracies ranging from 73.76% to 79.19% for the same two traits. Neuroticism (77.69%) and Openness (77.52%) exhibited relatively higher macro F1 scores. Extraversion (76.69%) and Conscientiousness (74.52%) also achieved competitive results. Agreeableness remains the most challenging trait, with a macro F1 score of 73.09%. Figure 2 depicts the confusion matrices for each of the Big Five traits.

In addition to fold-level cross-validation, we evaluated performance using participant-stratified cross-validation to test generalizability to unseen individuals. While performance naturally decreased in this stricter setting (e.g., macro F1 score of 63.1% for Agreeableness and 66.2% for Openness), the model still achieved reasonable performance, with Extraversion (70.9%) and Neuroticism (74.3%) showing the strongest generalization.

4.3. Ablation Study

To examine whether incorporating missingness information affects the model performance, we conducted a set of ablation experiments where different subsets of features and modeling strategies were retained. Here, binary masks denote missingness indicators (1 = valid, 0 = missing).

  1. Full Pipeline. The proposed missing-data-aware BiLSTM trained with the complete representation: time-series features (horizontal/vertical gaze, pupil diameter, gaze velocity), their binary validity masks, and temporal gap encodings.

  2. Time Series + Temporal Gap. The BiLSTM is trained using the four time-series features together with temporal gap ($\Delta t$) encodings, but without missingness indicators. In this setting, the model retains information about the elapsed time since the last observation, but does not explicitly encode whether a value is observed or missing.

  3. Time Series Only. The BiLSTM trained solely on the four raw time-series features, without masks or temporal gaps. This serves as a baseline for sequential modeling without any explicit missingness indicators.

  4. Non-sequential Baseline. A non-sequential model trained on handcrafted features. For each of the four base signals (gaze $x$, gaze $y$, pupil, gaze velocity), we compute five descriptive statistics (minimum, maximum, mean, standard deviation, median), yielding a 20-dimensional feature vector. A random forest classifier is then trained on this representation, providing a feature-engineered baseline without temporal modeling.
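The handcrafted representation of the non-sequential baseline can be sketched as below. The sketch assumes windows with no missing values and uses the population standard deviation; the text does not say how missing samples are handled or which estimator is used. The dictionary keys and sample values are hypothetical, and the random forest itself is not shown.

```python
import statistics

SIGNALS = ("gaze_x", "gaze_y", "pupil", "velocity")  # hypothetical key names

def window_statistics(signal):
    """Minimum, maximum, mean, standard deviation, and median of one signal."""
    return [min(signal), max(signal), statistics.fmean(signal),
            statistics.pstdev(signal), statistics.median(signal)]

def handcrafted_features(window):
    """Concatenate the five statistics of each of the four signals (20 values)."""
    features = []
    for name in SIGNALS:
        features.extend(window_statistics(window[name]))
    return features

# Hypothetical four-sample window.
window = {
    "gaze_x": [1.0, 2.0, 3.0, 4.0],
    "gaze_y": [0.5, 0.5, 0.7, 0.9],
    "pupil": [3.1, 3.2, 3.0, 3.3],
    "velocity": [10.0, 12.0, 8.0, 30.0],
}
features = handcrafted_features(window)
print(len(features))  # 20
```

Each window is thus reduced to an order-free summary, which is exactly the temporal information the sequential models retain and this baseline discards.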

Table 2 summarizes the results of the ablation study across the Big Five traits. The full pipeline, which combines time-series features, explicit missingness masks, and temporal gap encodings, consistently achieves the best performance across traits, with accuracies ranging from approximately 74% (Agreeableness) to 79% (Neuroticism) and macro F1 scores between 73% and 78%. Removing missingness masks while retaining temporal gap encodings (TS+Temporal Gap) leads to consistent performance degradation across all traits, suggesting that explicit missingness indicators provide complementary information beyond temporal gap encodings alone. Using only raw time-series features (TS Only) further reduces accuracy for four of the five traits (Neuroticism is the exception), indicating that modeling both irregular sampling and missing data is generally important for effective temporal representation learning. The statistical baseline, which relies on handcrafted features and a Random Forest classifier, has the lowest scores on three of the five traits (Openness, Extraversion, and Agreeableness), highlighting the benefit of sequential models augmented with explicit temporal and missingness representations. Overall, the ablation results indicate that the full combination of components yields the most robust personality trait prediction.

Figure 2. Confusion matrices for each personality trait: (a) Openness, (b) Conscientiousness, (c) Extraversion, (d) Agreeableness, (e) Neuroticism.

5. Discussion

The results in Table 1 show that it is indeed possible to classify personality traits from temporal behavioral features, and that this approach works best when the full modeling pipeline is used – that is, when raw time-series signals are combined with missingness and temporal gap information. The consistently strong scores suggest that eye-tracking data contains rich signals about stable psychological traits. At the same time, the differences in performance across the Big Five traits reveal how personality is expressed unevenly in gaze behavior. These patterns provide not only theoretical insights into the behavioral expression of personality but also practical guidance for how to build computational models.

Among the five traits, Neuroticism (N) and Openness (O) showed relatively stronger and more consistent performance, in line with prior work linking these traits to distinctive gaze variability and exploratory viewing behaviors (Rauthmann et al., 2012; Perlman et al., 2009; Agnoli et al., 2015; Risko et al., 2012). Extraversion (E) and Conscientiousness (C) achieved moderate performance, suggesting that attentional cues are informative but less distinctive in solitary tasks (Le Bras et al., 2024; Hoppe et al., 2018; Tsigeman et al., 2024). In contrast, Agreeableness (A) remained the most difficult trait to predict, consistent with evidence that it is primarily expressed in social interaction contexts (Rauthmann et al., 2012; Wu et al., 2014). Overall, traits tied to internal regulation (attention, emotion) were easier to detect from gaze than socially oriented traits, which may require richer contexts or additional modalities.

When examining generalization, the by-participant cross-validation revealed a different picture. Scores were lower and more variable across individuals, especially for Agreeableness and Conscientiousness. This highlights a major challenge: while group-level models capture broad tendencies, predicting personality consistently across different individuals is more complex. One reason is the heterogeneity of cognitive strategies: individuals may differ in how they allocate attention, process information, or regulate effort during a task. For example, some people naturally adopt systematic scanning routines, while others rely on more opportunistic or intuitive search behaviors (Hoppe et al., 2018; Berkovsky et al., 2019), leading to divergent gaze dynamics even if they share the same personality trait (MacFarlane et al., 2017). Another possible explanation is the limited sample size: with only 25 participants, the distribution of Big Five personality traits may differ substantially from that observed in much larger populations (Srivastava et al., 2003). This discrepancy can influence performance at the individual-participant level and limit the model’s ability to generalize. Increasing the number and diversity of participants would likely lead to a more representative trait distribution and improve generalization performance.

Behavioral habits and situational factors further complicate generalization. Gaze patterns are shaped not only by personality but also by momentary states such as fatigue, stress, or task engagement (Zargari Marandi et al., 2018; Fleeson and Gallagher, 2009). An individual high in Conscientiousness might display high attentional stability in one session but exhibit more variability under distraction or cognitive load. Similarly, Agreeableness – already subtle in solitary tasks – may only surface in socially interactive contexts, making it less consistently observable across participants (Wu et al., 2014).

These considerations suggest that variability is not simply “noise” but reflects meaningful person–context interactions. Capturing these nuances may require models that account for both stable dispositions and dynamic behavioral states. Approaches such as domain adaptation, personalized modeling, or hierarchical frameworks could help bridge the gap, enabling models to distinguish between trait-driven regularities and state-driven fluctuations. More broadly, these findings underscore the importance of integrating cognitive and behavioral theories into computational modeling: personality is not expressed uniformly, but filtered through the individual’s strategies, habits, and situational context (Fleeson and Gallagher, 2009).

The ablation results in Table 2 emphasize why temporal context matters. The complete pipeline (time series + masks + temporal gaps) consistently outperformed both raw time series alone and handcrafted statistical features. Notably, the inclusion of masks and temporal gap information added substantial value. This suggests that personality-relevant information is not just in the sequence of gaze coordinates but also in irregularities – when people look away, how long gaps last, and how missing data is structured. This aligns with recent research in time-series modeling that treats missingness as informative rather than as noise (Lipton et al., 2016; Che et al., 2018). Ignoring such irregularities risks discarding psychologically meaningful signals.

From a cognitive perspective, irregularities reflect the underlying processes of attention and information processing. Long gaps can signal lapses in sustained attention or shifts in cognitive focus, often linked to high Neuroticism or lower engagement with the task (Doherty-Sneddon and Phelps, 2005). Doherty-Sneddon and Phelps (2005) noted that people tend to look away when under high cognitive load. By contrast, more frequent gaze shifts and less tightly constrained fixation patterns may reflect exploratory viewing strategies, which are characteristic of individuals high in Openness, who tend to engage flexibly with stimuli and seek novel information. Thus, the timing of gaze interruptions offers indirect but meaningful cues about attention and control.

Behaviorally, missingness patterns capture differences in task approach. Some individuals show bursts of exploratory scanning interspersed with pauses – consistent with curiosity and Openness – while others show repetitive fixations and fewer breaks, reflecting more rigid or habitual engagement (Agnoli et al., 2015). This means irregular timing is not just a random error but a behavioral marker of personality-linked cognitive styles.

The methodological implications are equally important. Traditional feature engineering often assumes that missing data should be interpolated or discarded, but our findings indicate that the very presence and distribution of missingness can serve as predictive features. This perspective aligns with a growing body of behavioral informatics research, which argues that “what is absent” in behavioral traces can be as informative as “what is present” (Che et al., 2018). For personality modeling, this means that gaze dynamics should be analyzed not only for their visible sequences but also for their absences, interruptions, and irregular rhythms.

In summary, our findings show that eye-tracking holds promise for inferring personality, albeit with important boundaries. Traits tied to attention and emotion, such as Openness and Neuroticism, are especially well captured, while socially expressed traits such as Agreeableness may require richer data sources such as language, physiology, or social interaction (Taib et al., 2020; Tsigeman et al., 2024). Task design also plays a critical role: social or collaborative contexts may amplify expressions of traits that remain muted in solitary tasks.

In general, the temporal structure of eye movements, including where gaps occur, provides meaningful cues about personality. However, not all traits are equally predictable, and performance varies across individuals. Recognizing these nuances is crucial both for advancing personality science and for building adaptive systems that personalize interactions. Combining eye-tracking with multimodal data and more diverse task designs could lead to models that are both more comprehensive and more generalizable.

6. Conclusion

This study demonstrates that eye-tracking data, especially when analyzed as time-based sequences that include gaps and missing points, can provide useful insights into personality. We found that traits related to attention and emotion regulation, such as Neuroticism and Openness, are captured well in gaze patterns. In contrast, socially driven traits like Agreeableness are harder to detect in tasks performed alone. This highlights both the promise and the limits of using eye movements alone to infer personality.

Beyond prediction accuracy, our work makes two broader contributions. First, we demonstrate that irregularities in gaze, moments when people look away or pause, are not just noise but carry meaningful signals about attention control, disengagement, and exploration. Second, we highlight that personality is shaped by both stable traits and changing contexts, which means models must account for individual differences while also adapting to situational influences.

Limitations

Our study focused exclusively on museum settings using picture-based search systems. Additionally, different eye-tracking devices may yield varying results due to differences in processing capabilities and other influencing factors, such as sampling frequency, camera resolution, and calibration quality.

Future Work

Future research could extend this work to a wider range of interactive settings. Additionally, combining eye-tracking data with other modalities such as electrodermal activity (EDA), electroencephalography (EEG), or electrocardiography (ECG) could enrich multimodal models and offer deeper insights into how personality traits influence information-seeking behavior.

Acknowledgements.
This research is supported by the National Institute of Informatics (NII) Internship Program, the Japan Society for the Promotion of Science (JSPS, https://www.jsps.go.jp/english/, Grant #23K28375), and the Australian Research Council (ARC, https://www.arc.gov.au/) Centre of Excellence for Automated Decision-Making and Society (ADM+S, Grant #CE200100005).

References

  • S. Agnoli, L. Franchin, E. Rubaltelli, and G. E. Corazza (2015) An eye-tracking analysis of irrelevance processing as moderator of openness and creative performance. Creativity Research Journal 27 (2), pp. 125–132. Cited by: §5, §5.
  • H. Al-Samarraie, A. Eldenfria, and H. Dawoud (2017) The impact of personality traits on users’ information-seeking behavior. Information Processing & Management 53 (1), pp. 237–247. Cited by: §2.2.
  • B. J. Arterberry, M. P. Martens, J. M. Cadigan, and D. Rohrer (2014) Application of generalizability theory to the big five inventory. Personality and individual differences 69, pp. 98–103. Cited by: item 1.
  • N. Ashkanasy, P. L. Bowen, F. H. Rohde, and C. Y. A. Wu (2007) The effects of user characteristics on query performance in the presence of information request ambiguity. Journal of Information Systems 21 (1), pp. 53–82. Cited by: §1.
  • N. J. Belkin (1990) The cognitive viewpoint in information science. Journal of information science 16 (1), pp. 11–15. Cited by: §1, §2.1.
  • S. Berkovsky, R. Taib, I. Koprinska, E. Wang, Y. Zeng, J. Li, and S. Kleitman (2019) Detecting personality traits using eye-tracking data. In Proceedings of the 2019 CHI conference on human factors in computing systems, pp. 1–12. Cited by: §1, §2.2, §2.3, §5.
  • H. Blumenfeld (2002) Neuroanatomy through clinical cases. Sinauer. Cited by: §2.2.
  • N. Boonprakong, B. Tag, J. Goncalves, and T. Dingler (2025) How do hci researchers study cognitive biases? a scoping review. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp. 1–20. Cited by: §1.
  • I. Campbell and K. Van Rijsbergen (1996) The ostensive model of developing information needs. In CoLIS 2: second international conference on conceptions of library science: integration in perspective (Copenhague, October 13-16, 1996), pp. 251–268. Cited by: §3.1.
  • R. N. Carleton, M. P. J. Norton, and G. J. Asmundson (2007) Fearing the unknown: a short version of the intolerance of uncertainty scale. Journal of anxiety disorders 21 (1), pp. 105–117. Cited by: §2.1.
  • Z. Che, S. Purushotham, K. Cho, D. Sontag, and Y. Liu (2018) Recurrent neural networks for multivariate time series with missing values. Scientific reports 8 (1), pp. 6085. Cited by: §2.3, §5, §5.
  • L. Chen, W. Cai, D. Yan, and S. Berkovsky (2023) Eye-tracking-based personality prediction with recommendation interfaces. User Modeling and User-Adapted Interaction 33 (1), pp. 121–157. Cited by: §1, §2.2.
  • M. J. Cole, J. Gwizdka, C. Liu, N. J. Belkin, and X. Zhang (2013) Inferring user knowledge level from eye movement patterns. Information Processing & Management 49 (5), pp. 1075–1091. Cited by: §2.2.
  • M. J. Cole, C. Hendahewa, N. J. Belkin, and C. Shah (2015) User activity patterns during information search. ACM Transactions on Information Systems (TOIS) 33 (1), pp. 1–39. Cited by: §2.3.
  • S. Dhelim, N. Aung, M. A. Bouras, H. Ning, and E. Cambria (2022) A survey on personality-aware recommendation systems. Artificial Intelligence Review 55 (3), pp. 2409–2454. Cited by: §1.
  • G. Doherty-Sneddon and F. G. Phelps (2005) Gaze aversion: a response to cognitive or social difficulty?. Memory & cognition 33 (4), pp. 727–733. Cited by: §5.
  • E. Donahue, O. John, and R. Kentle (1991) The big five inventory-versions 4a and 54, berkeley. CA: University of California, Berkeley, Institute of Personality and Social Research. Cited by: item 1.
  • C. Eickhoff, S. Dungs, and V. Tran (2015) An eye-tracking study of query reformulation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’15, pp. 13–22. External Links: ISBN 9781450336215 Cited by: §2.2.
  • W. Fleeson and P. Gallagher (2009) The implications of big five standing for the distribution of trait manifestation in behavior: fifteen experience-sampling studies and a meta-analysis.. Journal of personality and social psychology 97 (6), pp. 1097. Cited by: §2.1, §5, §5.
  • L. Franzen, A. Cabugao, B. Grohmann, K. Elalouf, and A. P. Johnson (2022) Individual pupil size changes as a robust indicator of cognitive familiarity differences. PloS one 17 (1), pp. 1–22. Cited by: §2.2.
  • J. He, Z. Leng, D. McKay, D. Spina, and J. R. Trippas (2025a) Can we hide machines in the crowd? quantifying equivalence in llm-in-the-loop annotation tasks. In Proceedings of the 2025 Annual International ACM SIGIR Conference on Research and Development in Information Retrieval in the Asia Pacific Region, pp. 426–436. Cited by: §1.
  • J. He, Z. Leng, D. McKay, J. R. Trippas, and D. Spina (2025b) Characterising topic familiarity and query specificity using eye-tracking data. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2602–2606. Cited by: §1, §2.2, §2.2.
  • K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778. Cited by: §3.1.1.
  • J. Heinström (2003) Five personality dimensions and their influence on information behaviour. Information research 9 (1), pp. 9–1. Cited by: §1.
  • S. Y. Ho, M. J. Davern, and K. Y. Tam (2008) Personalization and choice behavior: the role of personality traits. ACM SIGMIS Database: The DATABASE for Advances in Information Systems 39 (4), pp. 31–47. Cited by: §1.
  • S. Hochreiter and J. Schmidhuber (1997) Long short-term memory. Neural computation 9 (8), pp. 1735–1780. Cited by: §2.3.
  • K. Holmqvist, M. Nyström, R. Andersson, R. Dewhurst, H. Jarodzka, and J. Van de Weijer (2011) Eye tracking: a comprehensive guide to methods and measures. oup Oxford. Cited by: §2.3.
  • S. Hong, Y. Zhou, J. Shang, C. Xiao, and J. Sun (2020) Opportunities and challenges of deep learning methods for electrocardiogram data: a systematic review. Computers in biology and medicine 122, pp. 103801. Cited by: §2.3.
  • S. Hoppe, T. Loetscher, S. A. Morey, and A. Bulling (2018) Eye movements during everyday behavior predict personality traits. Frontiers in human neuroscience 12, pp. 328195. Cited by: §1, §2.2, §2.3, §2.3, §5, §5.
  • R. Hu and P. Pu (2011) Enhancing collaborative filtering systems with personality information. In Proceedings of the fifth ACM conference on Recommender systems, pp. 197–204. Cited by: §1.
  • Apple Inc. (2025) Apple Vision Pro. Accessed: 2025-01-17. Cited by: §1.
  • H. K. Jach, C. G. DeYoung, and L. D. Smillie (2022) Why do people seek information? The role of personality traits and situation perception. Journal of Experimental Psychology: General 151 (4), pp. 934. Cited by: §1, §2.1, §2.1.
  • H. K. Jach and L. D. Smillie (2019) To fear or fly to the unknown: tolerance for ambiguity and big five personality traits. Journal of Research in Personality 79, pp. 67–78. Cited by: §2.1.
  • V. S. Javadi, F. Róg, A. Aksa, J. R. Trippas, S. Vakulenko, and L. Flek (2026) CHARISMA: Character-Based Interaction Simulation with Multi-LLM Agents Toward Computational Social Psychology. In Proceedings of the ACM Conference on Information Interaction and Retrieval (CHIIR’26), pp. 1–5. Cited by: §1.
  • K. Ji, D. Hettiachchi, F. D. Salim, F. Scholer, and D. Spina (2024) Characterizing information seeking processes with multiple physiological signals. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’24, pp. 1006–1017. External Links: ISBN 9798400704314 Cited by: §2.2.
  • O. P. John, S. Srivastava, et al. (1999) The big-five trait taxonomy: history, measurement, and theoretical perspectives. Cited by: §2.1.
  • M. J. Kang, M. Hsu, I. M. Krajbich, G. Loewenstein, S. M. McClure, J. T. Wang, and C. F. Camerer (2009) The wick in the candle of learning: epistemic curiosity activates reward circuitry and enhances memory. Psychological science 20 (8), pp. 963–973. Cited by: §1.
  • R. P. Karumur, T. T. Nguyen, and J. A. Konstan (2018) Personality, user preferences and behavior in recommender systems. Information Systems Frontiers 20 (6), pp. 1241–1265. Cited by: §1.
  • T. B. Kashdan, M. C. Stiksma, D. J. Disabato, P. E. McKnight, J. Bekier, J. Kaji, and R. Lazarus (2018) The five-dimensional curiosity scale: capturing the bandwidth of curiosity and identifying four unique subgroups of curious people. Journal of Research in Personality 73, pp. 130–149. Cited by: §2.1, §2.1.
  • J. Khatri, J. Marín-Morales, M. Moghaddasi, J. Guixeres, I. A. C. Giglioli, and M. Alcañiz (2022) Recognizing personality traits using consumer behavior patterns in a virtual retail store. Frontiers in psychology 13, pp. 752073. Cited by: §2.2.
  • K. Krafka, A. Khosla, P. Kellnhofer, H. Kannan, S. Bhandarkar, W. Matusik, and A. Torralba (2016) Eye tracking for everyone. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2176–2184. Cited by: §1.
  • M. Längkvist, L. Karlsson, and A. Loutfi (2014) A review of unsupervised feature learning and deep learning for time-series modeling. Pattern recognition letters 42, pp. 11–24. Cited by: §2.3.
  • T. Le Bras, B. Allibe, and K. Doré-Mazars (2024) The way we look at an image or a webpage can reveal personality traits. Scientific Reports 14 (1), pp. 15488. Cited by: §5.
  • Z. Leng, M. Thukral, Y. Liu, H. Rajasekhar, S. K. Hiremath, J. He, and T. Plötz (2025) AgentSense: virtual sensor data generation using LLM agents in simulated home environments. External Links: arXiv 2506.11773. Cited by: §1.
  • B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister (2021) Temporal fusion transformers for interpretable multi-horizon time series forecasting. International journal of forecasting 37 (4), pp. 1748–1764. Cited by: §2.3.
  • Z. C. Lipton, D. C. Kale, R. Wetzel, et al. (2016) Modeling missing data in clinical time series with RNNs. Machine Learning for Healthcare 56 (56), pp. 253–270. Cited by: §2.3, §5.
  • C. Ma, Z. Xu, Y. Ren, D. Hettiachchi, and J. Chan (2025) PUB: an LLM-enhanced personality-driven user behaviour simulator for recommender system evaluation. In Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’25, pp. 2690–2694. Cited by: §1.
  • A. MacFarlane, G. Buchanan, A. Al-Wabil, G. Andrienko, and N. Andrienko (2017) Visual analysis of dyslexia on search. In Proceedings of the 2017 Conference on Conference Human Information Interaction and Retrieval, pp. 285–288. Cited by: §5.
  • K. E. Markon (2009) Hierarchies in the structure of personality traits. Social and Personality Psychology Compass 3 (5), pp. 812–826. Cited by: §2.1.
  • S. Mathôt, J. Fabius, E. Van Heusden, and S. Van der Stigchel (2018) Safe and sensible preprocessing and baseline correction of pupil-size data. Behavior research methods 50, pp. 94–106. Cited by: §2.2.
  • K. Matsumoto, S. Shibata, S. Seiji, C. Mori, and K. Shioe (2010) Factors influencing the processing of visual information from non-verbal communications. Psychiatry and clinical neurosciences 64 (3), pp. 299–308. Cited by: §2.2.
  • R. R. McCrae and P. T. Costa (1987) Validation of the five-factor model of personality across instruments and observers. Journal of personality and social psychology 52 (1), pp. 81. Cited by: §1, §2.1.
  • A. Melinder, T. Brennen, M. F. Husby, and O. Vassend (2020) Personality, confirmation bias, and forensic interviewing performance. Applied Cognitive Psychology 34 (5), pp. 961–971. Cited by: §1.
  • M. Millecamp, C. Conati, and K. Verbert (2021) Classifeye: classification of personal characteristics based on eye tracking data in a recommender system interface. In Joint Proceedings of the ACM IUI 2021 Workshops co-located with 26th ACM Conference on Intelligent User Interfaces (ACM IUI 2021), Vol. 2903. Cited by: §1, §2.2, §2.3, §2.3.
  • M. Onori, A. Micarelli, and G. Sansonetti (2016) A comparative analysis of personality-based music recommender systems. In EMPIRE@RecSys, pp. 55–59. Cited by: §1.
  • F. J. Ordóñez and D. Roggen (2016) Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors 16 (1), pp. 115. Cited by: §2.3.
  • S. B. Perlman, J. P. Morris, B. C. Vander Wyk, S. R. Green, J. L. Doyle, and K. A. Pelphrey (2009) Individual differences in personality predict how people look at faces. PLoS ONE 4 (6), pp. e5952. Cited by: §5.
  • Tobii Pro (2025) Tobii Pro Glasses 3. Accessed: 2025-01-17. Cited by: §1.
  • J. F. Rauthmann, C. T. Seubert, P. Sachse, and M. R. Furtner (2012) Eyes as windows to the soul: gazing behavior is related to personality. Journal of Research in Personality 46 (2), pp. 147–156. Cited by: §2.2, §5.
  • E. F. Risko, N. C. Anderson, S. Lanthier, and A. Kingstone (2012) Curious eyes: individual differences in personality predict eye movement behavior in scene-viewing. Cognition 122 (1), pp. 86–90. Cited by: §5.
  • R. R. Saboundji, K. B. Faragó, and V. Firyaridi (2024) Prediction of attention groups and big five personality traits from gaze features collected from an outlier search game. Journal of Imaging 10 (10). External Links: ISSN 2313-433X Cited by: §3.2.3.
  • M. Sanderson and S. Dumais (2007) Examining repetition in user search behavior. In European Conference on Information Retrieval, pp. 597–604. Cited by: §2.1.
  • S. M. Sarsam, H. Al-Samarraie, and A. I. Alzahrani (2023) Influence of personality traits on users’ viewing behaviour. Journal of Information Science 49 (1), pp. 233–247. Cited by: §2.2.
  • M. Shoaib, S. Bosch, H. Scholten, P. J. M. Havinga, and O. D. Incel (2015) Towards detection of bad habits by fusing smartphone and smartwatch sensors. In 2015 IEEE International Conference on Pervasive Computing and Communication Workshops (PerCom Workshops), pp. 591–596. Cited by: §4.1.
  • Y. Shoji, K. Aihara, N. Kando, Y. Nakashima, H. Ohshima, S. Takidaira, M. Ueta, T. Yamamoto, and Y. Yamamoto (2021) Museum experience into a souvenir: generating memorable postcards from guide device behavior log. In 2021 ACM/IEEE Joint Conference on Digital Libraries (JCDL), pp. 120–129. Cited by: §3.1.
  • P. J. Silvia and A. P. Christensen (2020) Looking up at the curious personality: individual differences in curiosity and openness to experience. Current Opinion in Behavioral Sciences 35, pp. 1–6. Cited by: §2.1.
  • S. Srivastava, O. P. John, S. D. Gosling, and J. Potter (2003) Development of personality in early and middle adulthood: set like plaster or persistent change? Journal of personality and social psychology 84 (5), pp. 1041. Cited by: §5.
  • R. Taib, S. Berkovsky, I. Koprinska, E. Wang, Y. Zeng, and J. Li (2020) Personality sensing: detection of personality traits using physiological responses to image and video stimuli. ACM Transactions on Interactive Intelligent Systems (TiiS) 10 (3), pp. 1–32. Cited by: §5.
  • E. Tsigeman, V. Zemliak, M. Likhanov, K. A. Papageorgiou, and Y. Kovas (2024) AI can see you: machiavellianism and extraversion are reflected in eye-movements. PLOS ONE 19 (8), pp. e0308631. Cited by: §5, §5.
  • Z. Wang, W. Yan, and T. Oates (2017) Time series classification from scratch with deep neural networks: a strong baseline. In 2017 International joint conference on neural networks (IJCNN), pp. 1578–1585. Cited by: §2.3.
  • M. B. Winn, D. Wendt, T. Koelewijn, and S. E. Kuchinsky (2018) Best practices and advice for using pupillometry to measure listening effort: an introduction for those who want to get started. Trends in hearing 22, pp. 1–32. Cited by: §2.2.
  • C. Woods, Z. Luo, D. Watling, and S. Durant (2022) Twenty seconds of visual behaviour on social media gives insight into personality. Scientific Reports 12 (1), pp. 1178. Cited by: §1, §2.2, §2.3.
  • D. W. Wu, W. F. Bischof, N. C. Anderson, T. Jakobsen, and A. Kingstone (2014) The influence of personality on social attention. Personality and Individual Differences 60, pp. 25–29. Cited by: §5, §5.
  • Z. Yusefi Hafshejani, M. Kaedi, and A. Fatemi (2018) Improving sparsity and new user problems in collaborative filtering by clustering the personality factors. Electronic Commerce Research 18 (4), pp. 813–836. Cited by: §1.
  • R. Zargari Marandi, P. Madeleine, Ø. Omland, N. Vuillerme, and A. Samani (2018) Eye movement characteristics reflected fatigue development in both young and elderly individuals. Scientific Reports 8 (1), pp. 13148. Cited by: §5.
  • S. Zerhoudi, A. Roegiest, and J. R. Trippas (2026) Simulation of Interactive Information Retrieval: A Guided Tour. In Proceedings of the ACM Conference on Information Interaction and Retrieval (CHIIR’26). Cited by: §1.