\tocauthor

Adithya Parthasarathy ¹¹institutetext: IEEE member,
California, USA
¹¹email: [email protected]

¹¹institutetext: IEEE Senior Member, USA
¹¹email: [email protected] ²²institutetext: IEEE Senior Member, USA
³³institutetext: IEEE Senior Member, USA ⁴⁴institutetext: IEEE Senior Member, USA ⁵⁵institutetext: NTT Data, USA ⁶⁶institutetext: IEEE Senior Member, USA ⁷⁷institutetext: IEEE Senior Member, USA ⁸⁸institutetext: East West Bank, USA

Scalable Sequential Recommendation under Latency and Memory Constraints

Adithya Parthasarathy Adithya Parthasarathy Aswathnarayan Muthukrishnan Kirubakaran Vinoth Punniyamoorthy Nachiappan Chockalingam Lokesh Butra Kabilan Kannan Abhirup Mazumder Sumit Saha

Abstract

Sequential recommender systems must model long-range user behavior while operating under strict memory and latency constraints. Transformer-based approaches achieve strong accuracy but suffer from quadratic attention complexity, forcing aggressive truncation of user histories and limiting their practicality for long-horizon modeling. This paper presents HoloMambaRec, a lightweight sequential recommendation architecture that combines holographic reduced representations for attribute-aware embedding with a selective state space encoder for linear-time sequence processing. Item and attribute information are bound using circular convolution, preserving embedding dimensionality while encoding structured metadata. A shallow selective state space backbone, inspired by recent Mamba-style models, enables efficient training and constant-time recurrent inference. Experiments on Amazon Beauty and MovieLens-1M datasets demonstrate that HoloMambaRec consistently outperforms SASRec and achieves competitive performance with GRU4Rec under a constrained 10-epoch training budget, while maintaining substantially lower memory complexity. The design further incorporates forward-compatible mechanisms for temporal bundling and inference-time compression, positioning HoloMambaRec as a practical and extensible alternative for scalable, metadata-aware sequential recommendation.

keywords:

Recommender Systems, Sequence Modeling, Representation Learning, Scalable Machine Learning, Deep Learning

1 Introduction

Sequential recommendation aims to infer a user’s next intent from an evolving interaction history while operating under strict latency and memory constraints. In real-world production systems, user histories often span thousands of interactions across months or years; however, practical deployments routinely truncate these sequences to $50$ – $100$ events to accommodate model efficiency. This truncation eliminates long-term behavioral signals such as periodic preferences, seasonal interests, and delayed item resurgence, all of which are critical for retention, personalization, and re-engagement.

Transformer-based architectures have become the dominant paradigm for sequential recommendation due to their ability to model global dependencies. Models such as SASRec [1] and BERT4Rec [2] leverage self-attention to capture non-Markovian user behavior and long-range correlations. Despite their effectiveness, these models incur quadratic $\mathcal{O}(L^{2})$ memory and compute complexity with respect to sequence length $L$ , driven by attention matrices and key–value caches. As a result, scaling Transformers to thousand-step histories remains impractical on common accelerators such as T4 or A100 GPUs, forcing aggressive truncation and limiting their applicability in long-horizon recommendation settings. Recurrent architectures such as GRU4Rec [3] alleviate the quadratic memory burden but suffer from sequential training constraints and gradual information decay, making it difficult to preserve very long-term context.

Recent advances in state space models (SSMs) provide a promising alternative for scalable sequence modeling. Structured SSMs such as S4 [4] and selective SSMs such as Mamba [5] reformulate sequence processing as discretized continuous-time dynamics with input-dependent transitions. These models achieve linear $\mathcal{O}(L)$ time and memory complexity while retaining strong expressive power, enabling efficient modeling of long sequences with constant-time recurrent inference. Their hardware-aware scan implementations further improve throughput and latency, making SSMs attractive for production-grade sequential recommendation workloads.

In parallel, recommender systems increasingly rely on rich item metadata such as categories, genres, brands, or tags to improve representation quality and cold-start robustness. However, naïvely incorporating attributes via concatenation or projection increases embedding dimensionality and memory footprint. Hyperdimensional computing and holographic reduced representations (HRR) [6, 7] offer an alternative algebraic framework in which entities and attributes are bound using circular convolution, preserving dimensionality while encoding structured relationships. This binding operation directly addresses the representation expansion problem and provides a principled mechanism for attribute-aware modeling without increasing parameter count.

HoloMambaRec integrates these complementary ideas into a unified sequential recommendation architecture. The model employs holographic binding to fuse item identifiers with discrete attributes in a fixed-dimensional embedding space, and processes the resulting sequence using a shallow selective state space encoder [8]. The overall design is intentionally lightweight, targeting reproducibility on a single commodity GPU [9], while remaining forward-compatible with temporal bundling mechanisms that compress long interaction histories before encoding. This separation of representation binding and sequence dynamics enables efficient long-horizon modeling without sacrificing interpretability or deployment feasibility. Code and replication assets are available at [10].

Contributions:

•

We propose a holographic embedding layer that binds item identifiers with discrete attributes via circular convolution, enabling attribute-aware representations without dimensionality expansion.
•

We design a shallow (2–3 layer) selective state space encoder for sequential recommendation that operates in linear time and supports constant-time recurrent inference.
•

We introduce a unified preprocessing and evaluation pipeline and empirically validate the proposed architecture on Amazon Beauty [11] and MovieLens-1M [12], demonstrating consistent gains over SASRec and superior performance to GRU4Rec on MovieLens-1M under a constrained 10-epoch training budget.

2 Background and Related Work

Sequential recommendation and the memory wall:

Sequential recommendation has evolved from Markovian and session-based formulations to models capable of capturing long-range user dependencies. Self-attention mechanisms, popularized by SASRec [13] and BERT4Rec [14], enable each interaction to attend to all preceding items, effectively modeling non-Markovian behavior and complex temporal dependencies. This global receptive field significantly improves recommendation quality, particularly for users with diverse and evolving preferences. However, the expressiveness of self-attention comes at the cost of quadratic $\mathcal{O}(L^{2})$ memory and compute complexity with respect to sequence length $L$ , driven by attention matrices and key–value (KV) caches.

Earlier convolutional and session-based architectures, such as Caser [15] and NARM [16], improve efficiency by restricting context to fixed-size windows or localized patterns. While these models reduce computational overhead, they are inherently limited in their ability to capture long-term dependencies and recurring behavioral cycles. In practice, production systems using Transformer-based recommenders frequently truncate user histories to avoid excessive memory consumption and inference latency, discarding slow-moving tastes, seasonal preferences, and item re-engagement patterns that are critical for personalization.

Recurrent neural network (RNN) based approaches, most notably GRU4Rec, avoid quadratic attention costs by processing sequences sequentially. Although such models scale linearly in memory, they suffer from vanishing gradients, limited parallelism during training, and progressive information decay over long sequences. As a result, none of these paradigms simultaneously satisfy the requirements of long-horizon modeling, low latency, and efficient resource utilization in large-scale recommender systems.

State space models:

State space models (SSMs) have recently re-emerged as a compelling alternative for long-sequence modeling. Structured state space architectures such as S4 reformulate continuous-time linear dynamical systems into discrete sequence models using diagonal-plus-low-rank parameterizations and FFT-accelerated convolutions [17]. These designs enable linear $\mathcal{O}(L)$ complexity while retaining the ability to model long-range dependencies.

Mamba further advances this paradigm by introducing selective state space models with input-conditioned transition matrices $\mathcal{B}_{t}$ , output matrices $\mathcal{C}_{t}$ , and adaptive step sizes $\Delta_{t}$ , combined with hardware-aware scan operations [18]. This formulation achieves both efficient parallel training and constant-time recurrent inference, addressing key limitations of Transformers and RNNs alike. Such properties make SSMs particularly attractive for sequential recommendation, where latency constraints, long interaction histories, and throughput requirements intersect. HoloMambaRec adopts this selective SSM formulation while avoiding specialized kernels, maintaining hardware portability and implementation simplicity.

Holographic and hyperdimensional binding:

In parallel with advances in sequence modeling, representation learning for recommender systems increasingly incorporates auxiliary item attributes and side information. Traditional approaches typically concatenate attribute embeddings with item embeddings, increasing dimensionality and parameter count. Hyperdimensional computing and holographic reduced representations (HRR) provide an alternative algebraic framework in which symbols are bound via circular convolution, producing composite representations that preserve dimensionality while encoding relational structure [19].

Holographic Embeddings (HolE) demonstrate that circular convolution can effectively replace tensor products in relational learning, achieving comparable expressiveness with significantly lower memory requirements. Beyond efficiency, holographic binding offers a principled solution to the neural “binding problem” by tightly associating attributes with entities in a single vector space. Superposition further enables multiple bound pairs to coexist within one representation, at the cost of controlled interference. These properties underpin HoloMambaRec’s attribute-aware embedding strategy and its planned window-level temporal compression, where multiple interactions can be bound and superimposed to reduce effective sequence length without expanding embedding dimensionality.

3 Problem Setting and Notation

Let $\mathcal{U}$ denote the set of users and $\mathcal{I}$ the set of items. For a given user $u\in\mathcal{U}$ , we observe an interaction sequence $S_{u}=[i_{1},i_{2},\dots,i_{L}],$ where $i_{t}\in\{1,\dots,|\mathcal{I}|\}$ denotes the item interacted with at time step $t$ . Each interaction is associated with a discrete item attribute, yielding an aligned attribute sequence $A_{u}=[a_{1},a_{2},\dots,a_{L}],$ where $a_{t}\in\{1,\dots,|\mathcal{A}|\}$ and $\mathcal{A}$ denotes the attribute vocabulary.

Given the historical interaction–attribute pair $(S_{u},A_{u})$ , the objective of sequential recommendation is to predict the next item $i_{L+1}$ that the user is most likely to interact with. The model outputs a probability distribution over $\mathcal{I}$ conditioned on the observed sequence, and training proceeds by maximizing the likelihood of the ground-truth next item.

We denote the embedding dimensionality by $d$ and the internal state size of the state space model by $d_{\mathrm{state}}$ . To support minibatch training with variable-length sequences, all sequences are left-padded to a fixed maximum length $L$ , with padding index $0$ reserved for both item and attribute identifiers. Padding positions are masked during loss computation and evaluation to prevent information leakage.

4 Method

4.1 Data pipeline and preprocessing

We adopt a unified and reproducible data pipeline across all datasets to ensure fair comparison and consistent preprocessing. Raw interaction logs are ingested in a streaming fashion to minimize memory overhead: compressed JSON files for Amazon Beauty and DAT files for MovieLens-1M are parsed into tabular form. Users with fewer than five interactions are removed to reduce noise and mitigate extreme sparsity.

User and item identifiers are remapped to contiguous integer indices, with index $0$ reserved for padding. Each item is associated with a single discrete attribute to study the effect of holographic binding under controlled conditions. For Amazon Beauty, attributes are generated via a synthetic hash $(\mathrm{item\_id}\bmod 50)+1$ to simulate a moderately sized attribute vocabulary. For MovieLens-1M, the first listed genre is used as the attribute.

For each user $u$ , interactions are sorted chronologically. The final item $i_{L+1}$ is held out as the prediction target, and the preceding interactions form the input sequence. Sequences are truncated to a maximum length $L=50$ and left-padded for efficient minibatch processing. During training, teacher forcing is employed by shifting targets by one position, whereas evaluation uses the full prefix to predict the held-out item.

4.2 Holographic item–attribute binding

Let $i_{t}$ denote the item interacted with at time step $t$ , and let $a_{t}$ denote its associated attribute. Item and attribute identifiers are embedded into a shared $d$ -dimensional space using learnable embedding matrices $E^{(i)}\in\mathbb{R}^{|\mathcal{I}|\times d}$ and $E^{(a)}\in\mathbb{R}^{|\mathcal{A}|\times d}.$ The corresponding embeddings are given by

\displaystyle e^{(i)}_{t}

\displaystyle=E^{(i)}[i_{t}],\qquad e^{(a)}_{t}=E^{(a)}[a_{t}].

(1)

To incorporate attribute information without increasing dimensionality, we employ holographic binding via circular convolution. The bound representation is computed as

\tilde{e}_{t}=\mathrm{LayerNorm}\!\left(e^{(i)}_{t}+\alpha\left(e^{(i)}_{t}\circledast e^{(a)}_{t}\right)\right),

(2)

where $\alpha$ is a learnable scalar controlling the contribution of attribute binding. Circular convolution $\circledast$ is defined element-wise as

(x\circledast y)_{j}=\sum_{k=0}^{d-1}x_{k}\,y_{(j-k)\bmod d}.

(3)

In practice, convolution is implemented efficiently in the frequency domain using the Fast Fourier Transform (FFT), exploiting the identity $\mathcal{F}(x\circledast y)=\mathcal{F}(x)\odot\mathcal{F}(y),$ followed by an inverse FFT. This operation preserves the embedding dimensionality, avoiding the parameter growth and memory overhead associated with concatenation-based attribute fusion while encoding structured item–attribute relationships.

4.3 Selective state space encoder

The sequence of bound tokens $\tilde{E}=[\tilde{e}_{1},\ldots,\tilde{e}_{L}]$ is processed by a stack of selective state space blocks inspired by Mamba-style architectures. At each timestep $t$ , the block computes input-conditioned parameters governing the state update:

	$\displaystyle[\hat{x}_{t},g_{t}]$	$\displaystyle=W_{\mathrm{in}}\tilde{e}_{t},\qquad u_{t}=\mathrm{Conv1D}(\hat{x}_{t}),$		(4)
	$\displaystyle\Delta_{t},B_{t},C_{t}$	$\displaystyle=\mathrm{split}\!\left(W_{\mathrm{ssm}}u_{t}\right),\qquad\Delta_{t}=\mathrm{softplus}(\Delta_{t}),$		(5)

where $\Delta_{t}$ denotes the adaptive step size, and $B_{t}$ and $C_{t}$ are input-conditioned transition and output parameters, respectively.

The latent state $h_{t}\in\mathbb{R}^{d_{\mathrm{state}}}$ is updated according to

h_{t}=\exp(-\Delta_{t}A)\odot h_{t-1}+\Delta_{t}\odot B_{t}\odot u_{t},

(6)

with diagonal state matrix $A$ . The output at timestep $t$ is computed as

y_{t}=W_{\mathrm{out}}\left(\mathrm{SiLU}(g_{t})\odot(C_{t}\odot h_{t}+D\odot u_{t})\right),

(7)

where $D$ is a learnable skip parameter and $\odot$ denotes element-wise multiplication.

The encoder consists of two blocks for Amazon Beauty and three blocks for MovieLens-1M, each equipped with pre-LayerNorm and residual connections. The selective scan is executed explicitly over timesteps, preserving linear-time $\mathcal{O}(L)$ complexity without relying on specialized CUDA or Triton kernels.

4.4 Prediction head, masking, and loss

For each timestep, the encoder output is projected to item logits via

z_{t}=y_{t}W_{\mathrm{cls}}^{\top},

(8)

where $W_{\mathrm{cls}}\in\mathbb{R}^{|\mathcal{I}|\times d}$ is the classification matrix. Training minimizes the masked cross-entropy loss

\mathcal{L}=-\frac{1}{\sum_{t}\mathbf{1}[i_{t+1}\neq 0]}\sum_{t}\mathbf{1}[i_{t+1}\neq 0]\log\frac{\exp(z_{t,\,i_{t+1}})}{\sum_{j=1}^{|\mathcal{I}|}\exp(z_{t,\,j})},

(9)

where padding positions (index $0$ ) are excluded. During evaluation, logits corresponding to padding are set to $-\infty$ prior to ranking to prevent leakage.

4.5 Complexity

Holographic binding via FFT incurs $\mathcal{O}(d\log d)$ cost per timestep. The selective state space encoder operates in $\mathcal{O}(Ld_{\mathrm{state}})$ time and memory, in contrast to the $\mathcal{O}(L^{2}d)$ complexity of self-attention. With benchmark defaults $d=96$ and $d_{\mathrm{state}}=16$ , the overall architecture is well suited for single-GPU training and inference with modest memory requirements.

4.6 Design for temporal bundling (future activation)

The architecture is forward-compatible with temporal bundling, in which $k$ consecutive bound tokens are superimposed using learnable positional role vectors. This reduces the effective sequence length from $L$ to $\lceil L/k\rceil$ , enabling scalable modeling of ultra-long histories. Although bundling is disabled in the current experiments to isolate the effects of binding and selective dynamics, it provides a principled pathway for future compression-aware training and inference.

Algorithm 1 HoloMambaRec training loop

1: Download and parse dataset (DataManager); map users/items to contiguous IDs; assign attributes.

2: Build training batches: left-pad sequences to length

L

, align attributes, and create shifted targets.

3: for epoch

=1

E

4: for batch

(\mathbf{i},\mathbf{a},\mathbf{y})

\tilde{E}\leftarrow\mathrm{LayerNorm}(E^{(i)}[\mathbf{i}]+\alpha(E^{(i)}[\mathbf{i}]\circledast E^{(a)}[\mathbf{a}]))

Y\leftarrow\mathrm{SelectiveScan}(\tilde{E})

(Alg. inside MambaBlock)

\hat{Y}\leftarrow\mathrm{Linear}(Y)

8: Mask padding logits at index

0

; compute

\mathcal{L}\leftarrow\mathrm{CrossEntropy}(\hat{Y},\mathbf{y})

9: Backpropagate and update with AdamW

10: end for

11: Evaluate HR@10 / NDCG@10 on held-out last item per user

12: end for

Refer to caption — Figure 1: HoloMambaRec architecture

5 Experimental Setup

5.1 Datasets and preprocessing

We evaluate the proposed approach on two widely used benchmarks for sequential recommendation, following a consistent preprocessing pipeline to ensure comparability.

Amazon Beauty: User reviews are parsed from compressed JSON files in a streaming fashion to reduce memory overhead. Users with fewer than five interactions are removed to mitigate extreme sparsity. Item identifiers are reindexed to a contiguous range, and each item is assigned a single synthetic attribute computed as $(\mathrm{item\_id}\bmod 50)+1$ . This construction yields a moderately sized attribute vocabulary and allows controlled evaluation of holographic binding behavior under multiple attribute buckets.

MovieLens-1M: Rating logs and movie metadata are merged, and movie identifiers are remapped to contiguous indices. The first listed genre for each movie is used as the discrete attribute, reflecting a realistic categorical side signal commonly used in recommender systems.

For both datasets, user interactions are sorted chronologically and truncated or left-padded to a maximum sequence length of $L=50$ . We adopt a leave-one-out evaluation protocol in which the final interaction of each user sequence is held out as the test target and the remaining prefix serves as input. Padding index $0$ is consistently masked during training and evaluation to prevent information leakage.

5.2 Metrics

We report standard top- $K$ ranking metrics computed at the final prediction position, namely Hit Rate (HR@K) and Normalized Discounted Cumulative Gain (NDCG@K):

	$\displaystyle\mathrm{HR@K}$	$\displaystyle=\frac{1}{\|\mathcal{B}\|}\sum_{u\in\mathcal{B}}\mathbf{1}\{i_{L+1}^{u}\in\mathrm{TopK}(u)\},$		(10)
	$\displaystyle\mathrm{NDCG@K}$	$\displaystyle=\frac{1}{\|\mathcal{B}\|}\sum_{u\in\mathcal{B}}\frac{1}{\log_{2}(1+\mathrm{rank}_{u}(i_{L+1}^{u}))}\mathbf{1}\{\mathrm{rank}_{u}(i_{L+1}^{u})\leq K\},$		(11)

where $\mathcal{B}$ denotes the evaluation batch and $\mathrm{rank}_{u}(i_{L+1}^{u})$ is the rank of the ground-truth next item within the model’s scored item list for user $u$ . During evaluation, logits corresponding to the padding index $0$ are masked to eliminate padding-induced bias.

5.3 Baselines, hyperparameters, and runtime profile

We compare HoloMambaRec against two representative baselines: SASRec, a Transformer-based sequential recommender, and GRU4Rec, a recurrent baseline with linear memory complexity. All models are trained and evaluated under matched conditions using dataset-specific tuned hyperparameters.

Across both datasets, we set the embedding dimension to $d=96$ and the state size to $d_{\mathrm{state}}=16$ . Training uses a batch size of $64$ , the AdamW optimizer with learning rate $10^{-3}$ , and a fixed budget of $10$ epochs. The number of layers is set to $n_{\mathrm{layers}}=2$ for Amazon Beauty and $3$ for MovieLens-1M to account for differences in data density. Dropout is configured consistently across models but disabled within the custom selective state space block to maintain parity with the baselines and isolate architectural effects.

5.4 Ablations and logging

To quantify the contribution of holographic binding, we evaluate an item-only Mamba variant in which attribute embeddings and binding operations are removed. Learning curves are logged for all models, and HR@10 and NDCG@10 are recorded at each epoch. Final reported results correspond to the last training epoch for both datasets. The experimental framework also includes hooks for future ablations on temporal window compression, which will be activated once holographic bundling is jointly trained and evaluated.

6 Results and Analysis

Table 1 summarizes the final-epoch performance of all models under the fixed 10-epoch training budget. Overall, HoloMambaRec consistently outperforms the Transformer-based SASRec baseline on both datasets and achieves the strongest results on MovieLens-1M, while GRU4Rec remains competitive on the sparser Amazon Beauty dataset.

On Amazon Beauty, GRU4Rec attains the highest HR@10 and NDCG@10, reflecting the strength of recurrent inductive bias in highly sparse and noisy interaction sequences. HoloMambaRec nevertheless improves over SASRec by a clear margin, indicating that selective state space dynamics are more effective than quadratic self-attention under constrained sequence length and training budgets. On MovieLens-1M, HoloMambaRec achieves the best overall performance, improving HR@10 by $+0.0338$ and NDCG@10 by $+0.0238$ relative to SASRec, and outperforming GRU4Rec by an even larger margin.

Table 1: HR@10 and NDCG@10 comparison

Model	Amazon Beauty		ML-1M
	HR@10	NDCG@10	HR@10	NDCG@10
HoloMamba	0.0467	0.0290	0.1722	0.0966
SASRec	0.0340	0.0203	0.1384	0.0728
GRU4Rec	0.0653	0.0388	0.1323	0.0650

These results highlight three key observations. First, selective state space modeling provides a favorable trade-off between expressiveness and efficiency, particularly on denser datasets where long-range dependencies are more informative. Second, the superior NDCG@10 achieved by HoloMambaRec on MovieLens-1M indicates stronger ranking quality beyond simple hit accuracy, suggesting improved modeling of relative item preferences. Third, the narrowed performance gaps across models under tuned hyperparameters emphasize that dataset characteristics such as interaction density, regularity, and noise strongly influence whether recurrent or selective state space inductive biases dominate.

Fig. 2 reports learning curves for HR@10 and NDCG@10 across epochs. HoloMambaRec exhibits stable and monotonic improvements during training, converging faster than SASRec on both datasets. On MovieLens-1M, HoloMambaRec maintains a consistent advantage throughout training, whereas on Amazon Beauty the gap between HoloMambaRec and GRU4Rec narrows but does not fully close within the 10-epoch budget. This suggests that longer training schedules or richer side information may further benefit the proposed architecture.

6.1 Ablation: binding vs. item-only

To isolate the effect of holographic binding, Table 2 reports an ablation comparing HoloMambaRec against an item-only Mamba encoder in which attribute embeddings and binding operations are removed. Both variants are trained for 10 epochs under a shared configuration with $n_{\mathrm{layers}}{=}2$ and batch size $128$ on both datasets.

Table 2: Ablation on holographic binding

Model	Amazon Beauty		ML-1M
	HR@10	NDCG@10	HR@10	NDCG@10
HoloMamba (binding)	0.0431	0.0268	0.1695	0.0945
Mamba (item-only)	0.0424	0.0268	0.1728	0.0976

The ablation indicates that holographic binding provides limited benefit under the current experimental setup. On Amazon Beauty, binding yields near-identical performance to the item-only variant, while on MovieLens-1M it slightly trails in both HR@10 and NDCG@10. This outcome is consistent with the use of a single synthetic or coarse-grained attribute, which provides limited additional signal beyond item identity. The results suggest that the full potential of holographic binding is likely to emerge when richer metadata or multiple attributes per item are incorporated, or when temporal bundling is jointly trained to exploit superposition and compression effects.

7 Practical Notes

Circular convolution enables attribute-aware item representations while preserving the original embedding dimensionality, ensuring that the encoder’s parameter count and memory footprint remain fixed as additional metadata is introduced. The selective state space scan is executed explicitly over timesteps, which makes the recurrent dynamics transparent and interpretable while retaining linear $\mathcal{O}(L)$ computational complexity. All experiments are conducted using a consistent and reproducible configuration with embedding dimension $d=96$ , state size $d_{\mathrm{state}}=16$ , and $n_{\mathrm{layers}}{=}2$ for Amazon Beauty and $3$ for MovieLens-1M, optimized with AdamW at a learning rate of $10^{-3}$ , batch size $64$ , and a fixed training budget of $10$ epochs. Although the architecture supports temporal compression through holographic windowed bundling, this mechanism is disabled in the main experiments to isolate the impact of holographic binding and selective state space dynamics on recommendation quality.

8 Limitations and Future Work

This study focuses on establishing feasibility and is intentionally limited in scope. The current design binds only a single discrete attribute per item, whereas production recommender systems typically rely on richer and multi-valued metadata. Extending holographic binding to multiple attributes and analyzing interference when several roles are superimposed are important directions for future work, as is understanding whether selective state space models can reliably recover individual roles from such superpositions.

Efficiency and evaluation also present limitations. While temporal bundling offers clear potential for reducing latency and VRAM, its accuracy trade-offs without joint training remain unresolved. Enabling bundling during training and adapting prediction targets to operate at the window level will be necessary to realize practical compression benefits. In addition, the selective scan is currently implemented in Python; future work should explore fused CUDA or Triton kernels and comparisons with optimized Transformer variants.

Finally, evaluation is restricted to HR@10 and NDCG@10 on two datasets with a short training schedule. Broader metrics, longer training horizons, larger catalogs, and deployment-oriented measures such as throughput, cold-start robustness, and online latency behavior will be required to fully characterize the strengths and limitations of HoloMambaRec.

9 Conclusion

This paper presents HoloMambaRec, a hybrid sequential recommendation architecture that integrates holographic reduced representations with a selective state space encoder to overcome the scalability and memory constraints of attention-based models. Item identifiers are bound with discrete attributes via circular convolution, and sequence dynamics are modeled using linear-time selective state spaces, enabling attribute-aware representations without increasing embedding dimensionality or incurring quadratic attention overhead.

Empirical evaluation on Amazon Beauty and MovieLens-1M under a constrained 10-epoch training budget shows that HoloMambaRec consistently outperforms SASRec on both datasets and achieves the strongest overall results on MovieLens-1M. Compared to GRU4Rec, the model remains competitive on sparse data and demonstrates clear advantages on denser interaction sequences, highlighting the effectiveness of selective state space dynamics for long-horizon modeling under practical resource constraints. Learning curves further indicate stable optimization behavior and faster convergence relative to Transformer baselines.

Ablation results suggest that single-attribute holographic binding provides limited gains in the current setting, underscoring that richer metadata or multi-attribute fusion is likely required to fully realize the benefits of holographic representations. Importantly, the architecture is designed to be forward-compatible: temporal bundling, multi-attribute binding, and optimized kernel implementations can be activated and trained end-to-end to further reduce latency and memory usage while expanding representational capacity.

Overall, HoloMambaRec constitutes a practical step toward scalable and deployment-ready sequential recommendation systems, enabling long-horizon user modeling, efficient integration of metadata, and operation within realistic latency and memory constraints.

References

[1] W.C. Kang and J. McAuley, “Self-Attentive Sequential Recommendation,” in 2018 IEEE International Conference on Data Mining (ICDM), 2018, pp. 197–206. doi: 10.1109/ICDM.2018.00035.
[2] F. Sun, J. Liu, J. Wu, C. Pei, X. Lin, W. Ou, and P. Jiang, “BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer,” arXiv preprint arXiv:1904.06690, 2019. [Online]. Available: https://arxiv.org/abs/1904.06690
[3] B. Hidasi, A. Karatzoglou, L. Baltrunas, and D. Tikk, “Session-based recommendations with recurrent neural networks,” in International Conference on Learning Representations (ICLR), 2015.
[4] A. Gu, K. Goel, and C. Ré, “Efficiently modeling long sequences with structured state spaces,” in International Conference on Learning Representations (ICLR), 2021.
[5] A. Gu and T. Dao, “Linear-time sequence modeling with selective state spaces,” in International Conference on Learning Representations (ICLR), 2024.
[6] T. A. Plate, “Holographic reduced representations,” IEEE Transactions on Neural Networks, vol. 6, no. 3, pp. 623–641, 1995. doi: 10.1109/72.377968.
[7] M. Nickel, L. Rosasco, and T. Poggio, “Holographic Embeddings of Knowledge Graphs,” in Proc. AAAI Conf. Artif. Intell., 2016. [Online]. Available: https://arxiv.org/abs/1510.04935
[8] A. Chharia, W. Gou, and H. Dong, “MV-SSM: Multi-View State Space Modeling for 3D Human Pose Estimation,” in Proc. IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR), 2025, pp. 11590–11599. doi: 10.1109/CVPR52734.2025.01082.
[9] A. Muthukrishnan Kirubakaran, N. Saksena, S. Malempati, S. Saha, S. K. R. Carimireddy, A. Mazumder, and R. S. Bodala, “Federated Multi-Modal Learning Across Distributed Devices,” International Journal of Innovative Research in Technology, vol. 12, no. 7, pp. 2852–2857, 2025, doi: 10.5281/zenodo.17892974.
[10] A. Parthasarathy, “HoloMambaRec: Code and replication assets,” GitHub repository, 2025. [Online]. Available: https://github.com/adithyap/holo-mamba-rec
[11] J. McAuley and J. Leskovec, “Hidden factors and hidden topics: Understanding rating dimensions with review text,” in 7th ACM Conference on Recommender Systems, 2013.
[12] F. M. Harper and J. A. Konstan, “The MovieLens datasets: History and context,” ACM Transactions on Interactive Intelligent Systems, vol. 5, no. 4, pp. 1–19, 2015.
[13] X. Jiang, “Multi-Layer Self-Attentive Sequential Recommendation,” in Proc. 9th Int. Conf. Intelligent Computing and Signal Processing (ICSP), 2024, pp. 1205–1210. doi: 10.1109/ICSP62122.2024.10743339.
[14] K. Song, “Efficient Recommendation Systems for Movies Based on BERT4Rec,” in Proc. 3rd Int. Signal Processing, Communications and Engineering Management Conf. (ISPCEM), 2023, pp. 789–793. doi: 10.1109/ISPCEM60569.2023.00149.
[15] J. Tang and K. Wang, “Personalized top-N sequential recommendation via convolutional sequence embedding,” in 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018.
[16] J. Li, P. Ren, Z. Chen, Z. Ren, and J. Ma, “Neural Attentive Session-based Recommendation,” in Proc. Int. Conf. Information and Knowledge Management (CIKM), 2017. [Online]. Available: https://arxiv.org/abs/1711.04725
[17] J. Geng, X. Yi, S. Wang, W. Xiong, H. Yin, and X. Li, “S4-NeRF: Neural Radiation Fields Based on Structured State Space Sequence Model,” in Proc. 6th Int. Academic Exchange Conf. Science and Technology Innovation (IAECST), 2024, pp. 88–92. doi: 10.1109/IAECST64597.2024.11117273.
[18] S. Verma, N. Sebe, and S. Raman, “GMOT-Mamba: Mamba-Based Model Prediction for Generic Multiple Object Tracking,” in Proc. IEEE Int. Conf. Image Processing (ICIP), 2025, pp. 1253–1258. doi: 10.1109/ICIP55913.2025.11084714.
[19] L. Orzo, Z. Gorocs, I. Szatmari, and S. Tokes, “GPU implementation of volume reconstruction and object detection in Digital Holographic Microscopy,” in Proc. 12th Int. Workshop on Cellular Nanoscale Networks and their Applications (CNNA), 2010, pp. 1–4. doi: 10.1109/CNNA.2010.5430246.