BLADE: Behavior-Level Anomaly Detection
Abstract—With their widespread popularity, web services have become the main targets of various cyberattacks. Existing traffic anomaly detection approaches focus on flow-level attacks, yet […]

[…] such as injection, scanning, and brute-force attacks [7]–[11]. These attacks are often triggered by malicious payloads or abnormal communication patterns observable within a single […]
arXiv:2511.05193v1, 7 Nov 2025
Then, each feature vector is resized to a fixed length of L by reserving the first L values when longer or padding it with zeros when shorter. Therefore, the feature sequence F ∈ R^{N×L} extracted from each flow can be represented as

F = [f^1; · · · ; f^n; · · · ; f^N].  (1)

Therein, f^n ∈ R^{1×L} is a fixed-length feature vector that encodes the n-th attribute in the L consecutive packets. Moreover, each flow is associated with a first-seen timestamp τ ∈ R_{≥0}.

Since an access behavior consists of multiple network flows, we aggregate W consecutive flows of an individual user in timestamp order into a multi-flow sample B. Specifically, a non-overlapping window with a size of W is applied on a series of feature sequences extracted from each user's traffic. In this way, a multi-flow sample B can be represented as

B = [τ_1, · · · , τ_w, · · · , τ_W; F_1, · · · , F_w, · · · , F_W].  (2)

Here, τ_w represents the timestamp of the w-th flow and F_w denotes its feature sequence. As indicated in Eq. (2), the multi-flow sample B contains packet-level attributes of multiple flows, enabling BLADE to detect both flow-level and behavior-level attacks.

Training Dataset Construction. In anomaly detection, only benign traffic is collected and used during the training phase. We define B as our training dataset. Therefore, B can be represented as

B = {B_1, · · · , B_m, · · · , B_M},  (3)

where B_m represents the m-th multi-flow sample and M represents the total number of training samples.

C. Flow Autoencoder

After extracting a multi-flow sample, BLADE exploits a customized autoencoder to learn the latent feature representation and reconstruction losses of each network flow. The flow autoencoder has an encoder and a decoder.

Encoder Architecture. The encoder is designed to learn a latent feature representation of each feature sequence F, using a bidirectional gated recurrent unit (BiGRU) followed by a self-attention layer. The BiGRU processes the feature sequence F along the time step. Each feature vector f^n corresponds to a hidden state h^n in the BiGRU output. The BiGRU learns the temporal relationships within each feature vector by processing it in both forward and backward directions, capturing dependencies from both past and future context within each flow. These hidden states are then concatenated to form the BiGRU output H, which is expressed as H = BiGRU(F). Following the BiGRU, a self-attention layer is performed on H, and the attention result is added back to H through a residual connection. Then, a multi-layer perceptron (MLP) is leveraged to produce the final latent feature representation Z as Z = MLP(H + Attention(H)).

Decoder Architecture. The decoder reconstructs the feature sequence F from the latent feature representation Z using time-aware upsampling. The decoder is designed as the reverse process of the encoder, ensuring that the spatial and temporal dimensions of the latent feature representation are effectively upsampled to their original form. For simplicity, we denote Decoder(·) as the entire decoding process. It generates a reconstructed version of F as

F̂ = Decoder(Z) = [f̂^1; · · · ; f̂^n; · · · ; f̂^N],  (4)

where F̂ ∈ R^{N×L} represents the reconstructed feature sequence and f̂^n ∈ R^{1×L} is the reconstructed version of f^n.

Reconstruction Loss. The reconstruction loss is calculated for each feature vector in F. Given the feature vector f^n in the n-th channel of F, we compute the mean squared error (MSE) between the original and reconstructed vectors as

L^n = (1/L) ‖f^n − f̂^n‖_2^2,  (5)

where ‖·‖_2 denotes the L2 norm. Note that L^n will be used as the input data for the subsequent anomaly score estimation.
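As a concrete illustration, the encoder and the per-channel reconstruction loss of Eq. (5) can be sketched in PyTorch (the stack the paper itself uses). The layer sizes, the number of attention heads, and the mean-pooling over time steps are our own illustrative assumptions; the paper does not specify them, and it lays F out as N×L whereas the sketch feeds the GRU tensors shaped (batch, L, N).

```python
import torch
import torch.nn as nn


class FlowEncoder(nn.Module):
    """Sketch of the flow encoder: BiGRU -> self-attention with a
    residual connection -> MLP. All sizes are illustrative."""

    def __init__(self, num_attrs: int = 4, hidden: int = 64,
                 latent: int = 32, heads: int = 4):
        super().__init__()
        # Input shape (batch, L, num_attrs): the N attribute channels
        # are the GRU input features at each of the L time steps.
        self.bigru = nn.GRU(num_attrs, hidden, batch_first=True,
                            bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads,
                                          batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * hidden, hidden),
                                 nn.ReLU(),
                                 nn.Linear(hidden, latent))

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        h, _ = self.bigru(f)        # H = BiGRU(F), shape (batch, L, 2*hidden)
        a, _ = self.attn(h, h, h)   # Attention(H)
        z = self.mlp(h + a)         # Z = MLP(H + Attention(H))
        return z.mean(dim=1)        # pool over time steps (assumed choice)


def per_channel_mse(f: torch.Tensor, f_hat: torch.Tensor) -> torch.Tensor:
    """Eq. (5): L^n = (1/L) * ||f^n - f_hat^n||_2^2 per attribute.

    f, f_hat: (batch, L, N); returns (batch, N)."""
    return ((f - f_hat) ** 2).mean(dim=1)
```

Averaging the squared differences over the L packets is exactly the (1/L)‖·‖² of Eq. (5), so the returned tensor holds one loss per attribute channel, ready for the calibration step.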
Autoencoder Training. During the training phase, the flow autoencoder utilizes all feature vectors in the training dataset B to capture the latent feature representations of legitimate network flows. Thus, the training loss of the flow autoencoder can be represented as

L_auto = (1/M) Σ_{m=1}^{M} (1/N) Σ_{n=1}^{N} L_m^n,  (6)

where L_m^n denotes the MSE of the n-th attribute in the m-th flow. In the detection phase, given a new multi-flow sample B, the flow autoencoder processes all feature vectors within it, producing corresponding latent feature representations and MSE losses.

D. Pseudo Operation Label Assignment

This component takes a latent feature representation Z as input and assigns a pseudo operation label to it. The reason behind this is that application-layer data, i.e., payload, is typically encapsulated into a network-layer packet using certain encryption schemes; thus, it is difficult to determine an exact operation when a user accesses web services based on encrypted network traffic. Despite this, different user operations exhibit unique communication patterns that correspond to distinguishable latent representations, which allows operational pattern discovery via an unsupervised clustering method.

Latent Feature Transformation. Before feature clustering, we apply a series of transformations on each latent feature representation to enhance clustering robustness. First, we perform low-variance filtering on Z to eliminate uninformative feature dimensions. For each dimension across all training samples, we compute its empirical variance and retain only those dimensions with variance above a predefined threshold […]

[…] features without a predefined cluster number. Specifically, HDBSCAN is first performed on all transformed latent feature representations, and the centers and boundaries of all clusters are determined. For each transformed latent feature representation Z_w, HDBSCAN assigns a pseudo operation label O as

O = HDBSCAN(Z_w).  (8)

It is worth noting that the pseudo operation label O is just a cluster ID and corresponds to a certain web operation in the application layer.

E. Anomaly Score Estimation

This component takes the reconstruction losses between F and F̂ as input and outputs an anomaly score that indicates the anomaly degree of each network flow.

In the feature sequence F, different feature vectors have different value ranges and units. This fact could result in some feature vectors dominating the overall reconstruction loss of the entire flow. To address this issue, BLADE evaluates the anomaly degree of a new flow by computing its upper-tail probability in the historical distribution of reconstruction losses of each type of feature vector in the training dataset. This ensures that different anomaly scores are balanced and normalized to the same range. Finally, all anomaly scores are aggregated into an overall anomaly score for this flow.

Probabilistic Calibration. Let L^n denote the collection of reconstruction losses belonging to the n-th attribute in all training feature sequences. The calibration process transforms these MSE loss values into probabilistic anomaly scores. To do this, the empirical cumulative distribution function for each channel is constructed as […]
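Although the closing formulas are not recoverable here, the described calibration (an empirical CDF per channel, with a new loss scored by its upper-tail probability) can be sketched as follows. The −log scoring form and the probability floor δ are assumptions on our part, not the paper's exact equations:

```python
import numpy as np


def make_channel_scorer(train_losses, delta=1e-8):
    """Build a scorer for one attribute channel from its training
    reconstruction losses (the collection L^n). A sketch only."""
    sorted_losses = np.sort(np.asarray(train_losses, dtype=float))

    def score(loss: float) -> float:
        # Empirical CDF: fraction of training losses <= loss.
        cdf = np.searchsorted(sorted_losses, loss, side="right") \
            / sorted_losses.size
        # Upper-tail probability, floored at delta to avoid log(0).
        p_tail = max(1.0 - cdf, delta)
        # Assumed scoring form: rarer (larger) losses -> larger score.
        return float(-np.log(p_tail))

    return score
```

Because every channel's score is a monotone function of a probability, losses with very different raw magnitudes end up on a comparable scale, which is the balancing property the text describes.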
[…] α that reflects the overall anomaly degree of this flow. Given the reconstruction losses of all flows in the multi-flow sample B, this component calculates their anomaly scores, providing a measure of deviation from legitimate behavior patterns.

TABLE I
SPECIFICATIONS OF BLADE'S HYPERPARAMETERS

Component                           Hyperparameter                      Value
[…]
Pseudo Operation Label Assignment   Low-Variance Threshold (θ)          0.01
                                    Log-Transformation Parameter (ε)    10^-8
Anomaly Score Estimation            Probability Parameter (δ)           10^-8
F. Multi-Flow Anomaly Detection

For each multi-flow sample B, this component first aggregates its pseudo operation labels and anomaly scores into a behavior sample S. Then, a feature extractor encodes S into a latent feature representation, and a one-class support vector machine (OCSVM) is exploited to detect traffic anomalies.

First, a behavior sample S can be expressed as

S = [τ_1, · · · , τ_w, · · · , τ_W; α_1, · · · , α_w, · · · , α_W; O_1, · · · , O_w, · · · , O_W],  (13)

where τ_w, α_w, and O_w represent the timestamp, anomaly score, and pseudo operation label of the w-th flow in the multi-flow sample B, respectively. Then, we use the behavior samples extracted from the training dataset B to train an autoencoder, which has the same structure as the flow autoencoder. Next, we take its encoder part as a feature extractor. This design aims to capture the contextual dependencies within S. The output of the extractor X is denoted as

X = Extractor(S).  (14)

The behavior representation X is then passed to an OCSVM. The OCSVM is used to distinguish between legitimate and malicious traffic by building the decision boundary of representations of benign data. Since this method works in an unsupervised manner, it does not require labeled anomalous data in the training phase. During the detection phase, the learned OCSVM processes the behavior representation and outputs a detection result Y as

Y = OCSVM(X).  (15)

In summary, BLADE is an unsupervised and adaptive traffic anomaly detection system that effectively detects both flow-level and behavior-level attacks in diverse web systems. This approach enables efficient and robust anomaly detection in environments where labeled data is scarce or unavailable.

IV. EXPERIMENTAL EVALUATION

A. Evaluation Methodology

We evaluate BLADE on both our custom dataset and the publicly available CIC-IDS2017 dataset [14].

Custom Dataset. To the best of our knowledge, no public dataset provides labeled traffic under flow-level and behavior-level attacks. Hence, we build a reproducible blog-style web testbed with a fixed number of web operations. DPDK [15] is used to capture network traffic via 10G optical modules for data acquisition. To collect benign traffic, ten volunteers are recruited to interact normally with the web testbed using different clients. In this condition, the traffic of each user's session is captured. To collect anomalous traffic, we launch both flow-level and behavior-level attacks on the web testbed. The flow-level attacks include DoS attacks, injection attacks, brute-force attacks, and scanning attacks. As for behavior-level attacks, active session attacks, web bot bulk operations, and malicious data harvesting attacks are implemented. In this way, a total of 597,296 benign traffic flows and 123,474 malicious flows are collected. For each flow, packet-level attributes, including packet size, inter-arrival time, and TCP control flags, are extracted. As a result, this dataset contains 11,965 benign multi-flow samples and 2,480 malicious ones.

CIC-IDS2017 Dataset. The CIC-IDS2017 dataset [14] is a well-known benchmark dataset for intrusion detection and provides labeled network traffic under benign conditions and multiple types of real-world attacks, such as DoS, port scans, brute force, botnet, and web attacks. Several studies have reported issues in the CIC-IDS2017 dataset, such as packet misordering, packet duplication, and mislabeling. To deal with these issues, we adopt the correction methods suggested in the literature [16], [17]. To generate multi-flow samples, flows that have the same IP address and port number are considered to be from the same user and associated in chronological order. In this way, we obtain 22,753 benign multi-flow samples and 2,642 malicious ones. Because the CIC-IDS2017 dataset does not involve behavior-level attacks, it is only used to evaluate BLADE's detection performance on flow-level attacks.

Training and Testing. We implement BLADE on a workstation equipped with an Intel Core i7-13700K CPU, an NVIDIA GeForce RTX 4090 GPU, and 256 GB RAM, running Ubuntu 20.04.6. All components are developed using Python 3.12 and PyTorch 2.5.0. Table I lists the specifications of BLADE's hyperparameters in our experiments. For model training and testing, the benign samples are partitioned into training and testing samples with a ratio of 7:3 in each of the two datasets. Moreover, all malicious samples are used for testing.

Evaluation Metrics. We use the following metrics to evaluate BLADE's performance.
• Precision. It is the ratio of correctly identified malicious […]
TABLE II
PERFORMANCE OF BLADE ON CUSTOM DATASET

Level            Attack Type            Precision   Recall   F1 Score
Flow-Level       DoS                    […]         […]      […]
                 Scan                   0.9296      0.9965   0.9619
                 Injection              0.9708      0.9677   0.9693
                 Brute Force            0.9759      0.9179   0.9460
Behavior-Level   Data Harvesting        0.9661      0.9907   0.9782
                 Active Session         0.9894      0.9993   0.9943
                 Web Botnet Operation   0.9635      1.0000   0.9814
Average          —                      0.9659      0.9814   0.9732

[Fig. 4: (a) t-SNE visualization of flow-level attacks vs. benign behaviors; (b) t-SNE visualization of behavior-level attacks vs. benign behaviors; per-cluster silhouette scores; (c) Calinski-Harabasz index and (d) Davies-Bouldin index of candidate clustering algorithms.]

Variants                            Precision   Recall   F1 Score
#1: w/o Anomaly Score Estimation    0.3414      0.5432   0.4193
#2: w/o Pseudo Labels               0.4574      0.6428   0.5345
#3: w/o Anomaly Scores              0.6386      0.6157   0.6269
BLADE (Complete Model)              0.9659      0.9814   0.9732
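The behavior-level detection stage described above (training a one-class SVM only on benign behavior representations, then predicting +1 for legitimate and −1 for anomalous behavior at detection time) can be sketched with scikit-learn; the synthetic representations, kernel, and ν setting are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Stand-in benign behavior representations X (e.g., extractor outputs);
# the OCSVM learns their boundary without any labeled anomalies.
rng = np.random.default_rng(42)
X_benign = rng.normal(0.0, 0.1, size=(200, 8))

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_benign)

# Detection: +1 = legitimate behavior, -1 = anomalous behavior.
y_in = ocsvm.predict(np.zeros((1, 8)))       # near the benign mass
y_out = ocsvm.predict(np.full((1, 8), 5.0))  # far outside it
```

Because the decision boundary is fit purely on benign data, this step keeps the whole pipeline unsupervised, matching the paper's training assumptions.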
[Fig. 5: bar charts of F1 scores for Ours, BAE, MemAE, LSTM-AE, and FS-Net across attack types (DoS, Scan, Botnet, WebAttack, Brute Force).]

Fig. 5. Performance comparison between BLADE and flow-level baselines on the custom and CIC-IDS2017 datasets.

[…] density-based approach that effectively discovers clusters with different point densities, making it well-suited for grouping feature points of benign traffic, as shown in Fig. 4 (a). These results justify the adoption of HDBSCAN in our system for latent feature clustering.

Comparison with Single-Flow Baselines. Because no existing approach exploits network traffic for detecting behavior-level attacks in web services, we conduct a comparative study against four representative single-flow baselines for flow-level attack detection. The selected baselines include BAE [18], MemAE [9], LSTM-AE [8], and FS-Net [7]. We evaluate their anomaly detection performance on both the custom and CIC-IDS2017 datasets. As presented in Fig. 5, BLADE shows superior performance on most of the attack vectors in the two datasets. As depicted in Fig. 5 (a), the F1 score of BLADE is at least 0.09 higher than that of the baselines in injection attacks and 0.04 higher in brute-force attacks on the custom dataset. When it comes to the CIC-IDS2017 dataset, FS-Net achieves the best performance among the four baselines, and even has a higher F1 score in web attacks than BLADE. However, BLADE outperforms FS-Net in the other four types […]

[…] variants are at least 0.32, 0.36, and 0.34 lower than those of BLADE, respectively. Specifically, the first variant has the worst performance, with a low precision of 0.3414. This indicates that the raw reconstruction losses are of little help in anomaly detection. The second variant suffers a precision decrease of more than 0.5, suggesting that pseudo operation labels are helpful in characterizing meaningful patterns of legitimate web service traffic. The third variant obtains the highest performance among all variants, but still presents a significant performance degradation. The above results show the high effectiveness of each component proposed in BLADE.

V. RELATED WORK

Traffic Anomaly Detection. Traffic anomaly detection is widely adopted to protect the security of web services. Liu et al. [7] proposed the flow sequence network (FS-Net), which extracts representative features from raw network flows and classifies them from a single-flow perspective. Qing et al. [10] proposed RAPIER, which leverages the distinct distributions of legitimate and malicious traffic flows in the feature space to augment training data. Additionally, there are some methods based on mixed traffic, where a burst refers to a significant and sudden increase in the flow size within a given time window, without the need to classify individual flows. Cheng et al. [19] proposed BurstDetector, incorporating the definition of across-period bursts and employing a two-stage detection process. However, these approaches mainly focus on single-flow characteristics and fail to capture behavior-level attacks that span multiple flows. In contrast, BLADE is the first traffic anomaly detection system that exploits multi-flow features and is capable of detecting not only flow-level but also behavior-level attacks.

User Behavior Analysis. User behavior analysis is also used to secure web services. Luo et al. [12] proposed BotGraph, using a pre-obtained sitemap to convert user behaviors from log data into subgraphs for behavior classification. Prinakaa et al. [13] proposed a real-time API abuse detection system that utilizes behavioral analysis of API logs to identify and mitigate security threats. Because log data is hard to access in many cases, researchers are beginning to investigate the feasibility of user behavior analysis using network traffic. Mengmeng et al. [20] proposed Enmob, a multi-flow-based behavioral traffic classification method designed to uncover application behaviors using encrypted application traffic. However, this method requires predefined behavior patterns and labeled traffic. Differently, BLADE focuses on anomaly detection and can automatically generate pseudo operation labels in an unsupervised manner.

VI. CONCLUSION

This paper presents BLADE, a novel traffic anomaly detection system that can detect both flow-level and behavior-level attacks in web services. We observe that application-layer operations of web services exhibit distinctive communication patterns at the network layer from a multi-flow perspective. BLADE generates a pseudo operation label and an anomaly score for each flow and learns the behavior patterns of legitimate web users using features within multiple flows, thus facilitating the detection of single-flow and multi-flow anomalies. We implement BLADE and evaluate it on our custom dataset and the public CIC-IDS2017 dataset. The evaluation results demonstrate that BLADE achieves an average F1 score of 0.9732 against both flow-level and behavior-level attacks. In addition, BLADE outperforms four flow-level baselines, showing the superiority of multi-flow anomaly detection over traditional single-flow approaches.

ACKNOWLEDGEMENT

This work was supported in part by the National Natural Science Foundation of China under Grant 62301499 and the Henan Association for Science and Technology under Grant 2025HYTP037.

REFERENCES

[1] A. Technologies, "State of apps and API security 2025: How AI is shifting the digital terrain," [Link]state-of-apps-and-api-security-2025-how-ai-is-shifting-the-digital-terrain/, 2025, accessed: 2025-09-14.
[2] A. Nascita, G. Aceto, D. Ciuonzo, A. Montieri, V. Persico, and A. Pescapé, "A survey on explainable artificial intelligence for internet traffic classification and prediction, and intrusion detection," IEEE Communications Surveys & Tutorials, 2024.
[3] W. Dong, J. Yu, X. Lin, G. Gou, and G. Xiong, "Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review," Neurocomputing, vol. 617, p. 128444, 2025.
[4] F. Alotaibi and S. Maffeis, "Mateen: Adaptive ensemble learning for network anomaly detection," in Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses (RAID '24). New York, NY, USA: Association for Computing Machinery, 2024, pp. 215–234.
[5] D. Han, Z. Wang, W. Chen, K. Wang, R. Yu, S. Wang, H. Zhang, Z. Wang, M. Jin, J. Yang et al., "Anomaly detection in the open world: Normality shift detection, explanation, and adaptation," in NDSS, 2023.
[6] C. Fu, Q. Li, and K. Xu, "Flow interaction graph analysis: Unknown encrypted malicious traffic detection," IEEE/ACM Transactions on Networking, vol. 32, no. 4, pp. 2972–2987, 2024.
[7] C. Liu, L. He, G. Xiong, Z. Cao, and Z. Li, "FS-Net: A flow sequence network for encrypted traffic classification," in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1171–1179.
[8] M. Said Elsayed, N.-A. Le-Khac, S. Dev, and A. D. Jurcut, "Network anomaly detection using LSTM based autoencoder," in Proceedings of the 16th ACM Symposium on QoS and Security for Wireless and Mobile Networks, 2020, pp. 37–45.
[9] B. Min, J. Yoo, S. Kim, D. Shin, and D. Shin, "Network anomaly detection using memory-augmented deep autoencoder," IEEE Access, vol. 9, pp. 104695–104706, 2021.
[10] Y. Qing, Q. Yin, X. Deng, Y. Chen, Z. Liu, K. Sun, K. Xu, J. Zhang, and Q. Li, "Low-quality training data only? A robust framework for detecting encrypted malicious network traffic," arXiv preprint arXiv:2309.04798, 2023.
[11] W. Liu, W. Cui, B. Wang, H. Pan, W. She, and Z. Tian, "Decentralized traffic detection utilizing blockchain-federated learning with quality-driven aggregation," Computer Networks, vol. 262, p. 111179, 2025.
[12] Y. Luo, G. She, P. Cheng, and Y. Xiong, "BotGraph: Web bot detection based on sitemap," arXiv preprint arXiv:1903.08074, 2019.
[13] S. Prinakaa, V. Bavanika, S. Sanjana, S. Srinivasan, and V. Sarasvathi, "A real-time approach to detecting API abuses based on behavioral patterns," in 2024 8th International Conference on Cryptography, Security and Privacy (CSP). IEEE, 2024, pp. 24–28.
[14] I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani et al., "Toward generating a new intrusion detection dataset and intrusion traffic characterization," in ICISSP, vol. 1, 2018, pp. 108–116.
[15] L. Foundation, "Data Plane Development Kit (DPDK)," 2015. [Online]. Available: [Link]
[16] M. Lanvin, P.-F. Gimenez, Y. Han, F. Majorczyk, L. Mé, and É. Totel, "Errors in the CICIDS2017 dataset and the significant differences in detection performances it makes," in Risks and Security of Internet and Systems, S. Kallel, M. Jmaiel, M. Zulkernine, A. Hadj Kacem, F. Cuppens, and N. Cuppens, Eds. Cham: Springer Nature Switzerland, 2023, pp. 18–33.
[17] L. Liu, G. Engelen, T. Lynar, D. Essam, and W. Joosen, "Error prevalence in NIDS datasets: A case study on CIC-IDS-2017 and CSE-CIC-IDS-2018," in 2022 IEEE Conference on Communications and Network Security (CNS). IEEE, 2022, pp. 254–262.
[18] D. Wang, M. Nie, and D. Chen, "BAE: Anomaly detection algorithm based on clustering and autoencoder," Mathematics, vol. 11, no. 15, p. 3398, 2023.
[19] Z. Cheng, G. Gao, H. Huang, Y.-E. Sun, Y. Du, and H. Wang, "BurstDetector: Real-time and accurate across-period burst detection in high-speed networks," in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 2024, pp. 2338–2347.
[20] G. Mengmeng, F. Ruitao, L. Likun, Y. Xiangzhan, S. Vinay, X. Xiaofei, and L. Yang, "Enmob: Unveil the behavior with multi-flow analysis of encrypted app traffic," Cybersecurity, vol. 8, no. 1, p. 26, 2025.