
BLADE: Behavior-Level Anomaly Detection

The document presents BLADE, an unsupervised traffic anomaly detection system designed to identify both flow-level and behavior-level attacks in web services. BLADE utilizes a flow autoencoder to learn latent feature representations from network traffic, assigns pseudo operation labels through unsupervised clustering, and computes anomaly scores based on reconstruction losses. Experimental results demonstrate BLADE's effectiveness, achieving high F1 scores and outperforming traditional detection methods on multiple datasets.
BLADE: Behavior-Level Anomaly Detection Using Network Traffic in Web Services


Zhibo Dong, Yong Huang*, Shubao Sun, Wentao Cui, Zhihua Wang
School of Cyber Science and Engineering, Zhengzhou University, Zhengzhou 450001, China
Email: {dongzhibo, wentao}@[Link], {yonghuang, zhwang}@[Link], sunzzu@[Link]

* The corresponding author is Yong Huang (yonghuang@[Link]).
arXiv:2511.05193v1 [[Link]] 7 Nov 2025

Abstract: With their widespread popularity, web services have become the main targets of various cyberattacks. Existing traffic anomaly detection approaches focus on flow-level attacks, yet fail to recognize behavior-level attacks, which appear benign in individual flows but reveal malicious intent across multiple network flows. To overcome this limitation, we propose a novel unsupervised traffic anomaly detection system, BLADE, capable of detecting not only flow-level but also behavior-level attacks in web services. Our key observation is that application-layer operations of web services exhibit distinctive communication patterns at the network layer from a multi-flow perspective. BLADE first exploits a flow autoencoder to learn a latent feature representation and calculates its reconstruction losses per flow. Then, the latent representation is assigned a pseudo operation label using an unsupervised clustering method. Next, an anomaly score is computed based on the reconstruction losses. Finally, the triplets of timestamps, pseudo labels, and anomaly scores from multiple flows are aggregated and fed into a one-class classifier to characterize the behavior patterns of legitimate web operations, enabling the detection of flow-level and behavior-level anomalies. BLADE is extensively evaluated on both the custom dataset and the CIC-IDS2017 dataset. The experimental results demonstrate BLADE’s superior performance, achieving high F1 scores of 0.9732 and 0.9801, respectively, on the two datasets, and outperforming traditional single-flow anomaly detection baselines.

Index Terms: traffic anomaly detection, multi-flow analysis, unsupervised learning, behavioral patterns, web service security.

I. INTRODUCTION

Web services play a pivotal role in today’s digital landscape and enable seamless communication and data exchange between different applications and systems. With their increasing prevalence, web services have become prime targets for a wide range of cyberattacks. It is reported that in 2024, attacks on web services surpassed 311 billion and resulted in approximately 87 billion dollars in global losses [1]. Malicious traffic analysis is a critical component in safeguarding web services, enabling timely threat detection and mitigation [2], [3]. Since malicious traffic represents a negligible share of the total network traffic and is evolving at a rapid pace, a popular strategy is to formulate malicious traffic analysis as an anomaly detection task [4]–[6].

Despite growing attempts and extensive endeavors, existing traffic anomaly detection approaches focus on verifying a single network flow and limit themselves to flow-level attacks such as injection, scanning, and brute-force attacks [7]–[11]. These attacks are often triggered by malicious payloads or abnormal communication patterns observable within a single network flow. However, these approaches struggle to detect behavior-level attacks, which rarely reveal malicious intent within a single flow and instead exploit vulnerabilities of application-layer rules and configurations through multiple flows. Typical examples include constraint violations of application programming interfaces (APIs), active session attacks, web path traversal, and data harvesting. Behavior-level attacks are generally detected through log data analysis, but this approach inherently suffers from response latency issues [12], [13]. Because behavior-level attacks are increasing in prevalence and coexist with flow-level attacks, a traffic anomaly detection system must recognize not only flow-level but also behavior-level attacks.

To fill this gap, we propose BLADE, a Behavior-Level Anomaly DEtection system that relies on multi-flow traffic patterns in web services. Our key observation is that web services typically expose a finite set of operations through their application-layer interfaces. These operations exhibit distinctive communication patterns at the network layer, and these patterns can be captured from a multi-flow perspective, enabling differentiation of network flows generated by different web operations.

BLADE is a fully unsupervised and adaptive system. It requires only unlabeled benign traffic data for an unsupervised training process and can quickly adapt to various web systems. BLADE consists of four core components: 1) A flow autoencoder maps individual network flows into latent feature representations and generates per-flow reconstruction losses; 2) A pseudo operation label assignment component applies unsupervised clustering to discover web operational patterns and assign a pseudo operation label to each flow; 3) An anomaly score estimation component evaluates the anomaly degree of each flow using empirical cumulative distribution functions (ECDFs) on reconstruction losses; 4) A multi-flow anomaly detection component aggregates the triplets of timestamps, pseudo labels, and anomaly scores from consecutive flows, then employs a feature extractor as well as a one-class support vector machine (OCSVM) for traffic anomaly detection.

The contributions of this work are summarized as follows.
• This paper is among the first to allow both flow-level and behavior-level attack detection in web services using network traffic. The enabling observation is that application-layer operations of web services exhibit distinctive communication patterns at the network layer from a multi-flow perspective.
• This paper proposes a novel traffic anomaly detection system, BLADE, that is fully unsupervised and adaptive. The BLADE system comprises a set of novel schemes, including pseudo operation label assignment, anomaly score estimation, and multi-flow anomaly detection.
• BLADE is evaluated on both custom and public datasets, demonstrating BLADE’s superior performance with an F1 score of more than 0.97, outperforming state-of-the-art baselines and showing its effectiveness in detecting flow-level and behavior-level attacks in web services.

II. THREAT MODEL

The threat model considered in this paper is shown in Fig. 1.

[Fig. 1. Threat model. Diagram labels: Attacker, Legitimate User, Internet, Attack Vectors, Multi Access Flow, Gateway, Firewall, Out-of-Band Deployment, Protected Assets (Web Server, Application Server, Database Server).]

Protected Assets. In this paper, protected assets are the computer systems that provide web services, facilitating communication between user clients and servers. Typically, these assets consist of web servers, application servers, database servers, and other supporting network infrastructures, which collaboratively ensure the seamless delivery of web services. The protected assets can be viewed as one or a set of IP addresses and ports in cyberspace.

Legitimate Users. A legitimate user can be represented as an IP address that accesses the protected assets through the Internet. A single transmission control protocol (TCP) connection initiated by a user client to the protected assets is considered one access action, and the traffic generated by this TCP connection is referred to as a network flow. An access action can be viewed as an operation on the protected assets. When interacting with a web system, a user client typically performs continuous access actions. We refer to the sequence of actions within a time window as an access behavior, and these actions often exhibit contextual dependencies. For instance, a user might first perform authentication through a login page, then navigate to a dashboard, subsequently access specific resources, and finally perform logout operations. These sequential actions form coherent behavioral patterns that reflect legitimate usage workflows.

Attackers and Attack Vectors. The attackers are external individuals or groups that come from outside the protected assets and aim to compromise the security of the web system. Their objectives typically include disrupting web services, stealing sensitive information, or gaining unauthorized access. These attackers can trigger both flow-level and behavior-level attacks. In flow-level attacks, an attack vector is contained in a single flow. Typical flow-level attacks comprise injection vectors exploiting application logic to compromise databases, denial-of-service (DoS) attacks disrupting service availability, botnet-driven attack automation, and reconnaissance scanning targeting vulnerability enumeration. However, behavior-level attack vectors orchestrate multiple benign-appearing flows that collectively compromise system security. This category encompasses adversarial scraping operations, including data harvesting, fraudulent bulk transactions, and resource hijacking, resulting in systemic service degradation and illegitimate infrastructure access.

Defense Objectives. To defend against flow-level and behavior-level attacks, we propose a novel traffic anomaly detection technology called BLADE. BLADE is deployed out-of-band at the web service provider’s network gateway. It monitors network traffic via port mirroring, leaving the benign data flow intact. Taking an access behavior as a basic detection unit, BLADE expands the capabilities of existing malicious traffic detection methods by simultaneously detecting both flow-level and behavior-level attacks in web services.

III. SYSTEM DESIGN OF BLADE

A. System Overview

This paper presents BLADE, a novel traffic anomaly detection framework that identifies not only flow-level but also behavior-level attacks in web services. As shown in Fig. 2, BLADE consists of four key components: Flow Autoencoder, Pseudo Operation Label Assignment, Anomaly Score Estimation, and Multi-Flow Anomaly Detection. Initially, the flow autoencoder maps the feature sequence of each flow into a latent feature representation and generates a reconstructed feature sequence. Next, each latent feature representation is assigned a pseudo operation label using an unsupervised clustering algorithm. The anomaly score is then calculated using the difference between the original and reconstructed feature sequence per flow. Finally, the triplets of timestamps, pseudo operation labels, and anomaly scores from consecutive flows are transformed by a feature extractor, followed by a one-class classifier for anomaly detection. The main advantage of BLADE is that it requires only unlabeled benign traffic data to automatically learn legitimate traffic patterns in an unsupervised manner. Thus, BLADE is fully unsupervised and adaptive, and can be easily deployed in various web systems.

B. Data Parsing

The first step of BLADE is to extract multi-flow samples, i.e., the basic units for anomaly detection, to characterize traffic patterns when a legitimate user accesses the protected assets.

Multi-Flow Sample Extraction. For each captured network flow, we first extract $N$ feature vectors, each of which consists of one type of packet-level attribute extracted from this flow.
Then, each feature vector is resized to a fixed length of $L$ by keeping the first $L$ values when it is longer or padding it with zeros when it is shorter. Therefore, the feature sequence $F \in \mathbb{R}^{N \times L}$ extracted from each flow can be represented as

$$F = \left[ f^{1}; \cdots; f^{n}; \cdots; f^{N} \right]. \tag{1}$$

Therein, $f^{n} \in \mathbb{R}^{1 \times L}$ is a fixed-length feature vector that encodes the $n$-th attribute in the $L$ consecutive packets. Moreover, each flow is associated with a first-seen timestamp $\tau \in \mathbb{R}_{\geq 0}$.

Fig. 2. Framework of BLADE.

Since an access behavior consists of multiple network flows, we aggregate $W$ consecutive flows of an individual user in timestamp order into a multi-flow sample $B$. Specifically, a non-overlapping window with a size of $W$ is applied on a series of feature sequences extracted from each user’s traffic. In this way, a multi-flow sample $B$ can be represented as

$$B = \begin{bmatrix} \tau^{1}, \cdots, \tau^{w}, \cdots, \tau^{W} \\ F^{1}, \cdots, F^{w}, \cdots, F^{W} \end{bmatrix}. \tag{2}$$

Here, $\tau^{w}$ represents the timestamp of the $w$-th flow and $F^{w}$ denotes its feature sequence. As indicated in Eq. (2), the multi-flow sample $B$ contains packet-level attributes of multiple flows, enabling BLADE to detect both flow-level and behavior-level attacks.

Training Dataset Construction. In anomaly detection, only benign traffic is collected and used during the training phase. We define $\mathcal{B}$ as our training dataset. Therefore, $\mathcal{B}$ can be represented as

$$\mathcal{B} = \left\{ B^{1}, \cdots, B^{m}, \cdots, B^{M} \right\}, \tag{3}$$

where $B^{m}$ represents the $m$-th multi-flow sample and $M$ represents the total number of training samples.

C. Flow Autoencoder

After extracting a multi-flow sample, BLADE exploits a customized autoencoder to learn the latent feature representation and reconstruction losses of each network flow. The flow autoencoder has an encoder and a decoder.

Encoder Architecture. The encoder is designed to learn a latent feature representation of each feature sequence $F$, using a bidirectional gated recurrent unit (BiGRU) followed by a self-attention layer. The BiGRU processes the feature sequence $F$ along the time step. Each feature vector $f^{n}$ corresponds to a hidden state $h^{n}$ in the BiGRU output. The BiGRU learns the temporal relationships within each feature vector by processing it in both forward and backward directions, capturing dependencies from both past and future context within each flow. These hidden states are then concatenated to form the BiGRU output $H$, which is expressed as $H = \mathrm{BiGRU}(F)$. Following the BiGRU, a self-attention layer is applied to $H$, and its output is added back to $H$ through a residual connection. Then, a multi-layer perceptron (MLP) is leveraged to produce the final latent feature representation $Z$ as $Z = \mathrm{MLP}(H + \mathrm{Attention}(H))$.

Decoder Architecture. The decoder reconstructs the feature sequence $F$ from the latent feature representation $Z$ using time-aware upsampling. The decoder is designed as the reverse process of the encoder, ensuring that the spatial and temporal dimensions of the latent feature representation are effectively upsampled to their original form. For simplicity, we denote $\mathrm{Decoder}(\cdot)$ as the entire decoding process. It generates a reconstructed version of $F$ as

$$\hat{F} = \mathrm{Decoder}(Z) = \left[ \hat{f}^{1}; \cdots; \hat{f}^{n}; \cdots; \hat{f}^{N} \right], \tag{4}$$

where $\hat{F} \in \mathbb{R}^{N \times L}$ represents the reconstructed feature sequence and $\hat{f}^{n} \in \mathbb{R}^{1 \times L}$ is the reconstructed version of $f^{n}$.

Reconstruction Loss. The reconstruction loss is calculated for each feature vector in $F$. Given the feature vector $f^{n}$ in the $n$-th channel of $F$, we compute its mean squared error (MSE) between the original and reconstructed vectors as

$$\mathcal{L}^{n} = \frac{1}{L} \left\| f^{n} - \hat{f}^{n} \right\|_{2}^{2}, \tag{5}$$

where $\| \cdot \|_{2}$ denotes the L2 norm. Note that $\mathcal{L}^{n}$ will be used as the input data for the subsequent anomaly score estimation.
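The data-parsing steps and the per-attribute loss of Eq. (5) can be illustrated with a minimal, standard-library sketch. The function names (`resize_feature_vector`, `build_feature_sequence`, `multi_flow_samples`, `channel_mse`) are hypothetical and not taken from the paper. The reconstruction is a stand-in rather than the output of the BiGRU decoder, and dropping a trailing window with fewer than W flows is an assumption, since the text does not say how incomplete windows are handled.

```python
from typing import List, Sequence, Tuple

# A flow is a (first-seen timestamp, N x L feature sequence) pair.
Flow = Tuple[float, List[List[float]]]


def resize_feature_vector(values: Sequence[float], length: int) -> List[float]:
    """Keep the first `length` values, or zero-pad on the right when shorter."""
    head = list(values[:length])
    return head + [0.0] * (length - len(head))


def build_feature_sequence(attributes: Sequence[Sequence[float]], length: int) -> List[List[float]]:
    """Stack N resized attribute vectors into the N x L feature sequence F of Eq. (1)."""
    return [resize_feature_vector(a, length) for a in attributes]


def multi_flow_samples(flows: Sequence[Flow], window: int) -> List[List[Flow]]:
    """Group one user's flows, in timestamp order, into non-overlapping windows of W flows.

    Assumption: a trailing group with fewer than `window` flows is dropped.
    """
    ordered = sorted(flows, key=lambda flow: flow[0])
    return [ordered[i:i + window] for i in range(0, len(ordered) - window + 1, window)]


def channel_mse(f: Sequence[float], f_hat: Sequence[float]) -> float:
    """Eq. (5): squared L2 distance between f^n and its reconstruction, divided by L."""
    if len(f) != len(f_hat):
        raise ValueError("original and reconstructed vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(f, f_hat)) / len(f)


# Toy flow with N = 2 attributes and L = 4.
packet_sizes = [60.0, 1500.0, 40.0]                # shorter than L: zero-padded
inter_arrival = [0.0, 0.01, 0.02, 0.5, 0.1, 0.3]   # longer than L: truncated
F = build_feature_sequence([packet_sizes, inter_arrival], length=4)

# Stand-in for the decoder output: every value is off by 1.
F_hat = [[value + 1.0 for value in row] for row in F]
losses = [channel_mse(f, f_hat) for f, f_hat in zip(F, F_hat)]
print(losses)  # one loss per attribute, each close to 1.0
```

Keeping one loss per attribute, rather than a single loss per flow, is what later allows each attribute to be calibrated against its own training distribution.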
Autoencoder Training. During the training phase, the flow autoencoder utilizes all feature vectors in the training dataset $\mathcal{B}$ to capture the latent feature representations of legitimate network flows. Thus, the training loss of the flow autoencoder can be represented as

$$\mathcal{L}_{\mathrm{auto}} = \frac{1}{M} \sum_{m=1}^{M} \frac{1}{N} \sum_{n=1}^{N} \mathcal{L}^{n}_{m}, \tag{6}$$

where $\mathcal{L}^{n}_{m}$ denotes the MSE of the $n$-th attribute in the $m$-th flow. In the detection phase, given a new multi-flow sample $B$, the flow autoencoder processes all feature vectors within it, producing corresponding latent feature representations and MSE losses.

D. Pseudo Operation Label Assignment

This component takes a latent feature representation $Z$ as input and assigns a pseudo operation label to it. The reason behind this is that application-layer data, i.e., payload, is typically encapsulated into a network-layer packet using certain encryption schemes; thus, it is difficult to determine an exact operation when a user accesses web services based on encrypted network traffic. Despite this, different user operations exhibit unique communication patterns that correspond to distinguishable latent representations, which allows operational pattern discovery via an unsupervised clustering method.

Latent Feature Transformation. Before feature clustering, we apply a series of transformations on each latent feature representation to enhance clustering robustness. First, we perform low-variance filtering on $Z$ to eliminate uninformative feature dimensions. For each dimension across all training samples, we compute its empirical variance and retain only those dimensions with variance above a predefined threshold $\theta$, effectively removing features that show little variation and contribute minimal discriminative information. Next, to reduce feature dimensionality while decorrelating the remaining features, principal component analysis (PCA) with the whitening operation is applied. We retain the smallest number of components that collectively capture at least 95% of the total variance. These transformations can generate compact and uncorrelated latent feature representations, thus improving clustering performance. We use $Z'$ to represent the filtered version of $Z$, and $\mathrm{PCA}_{\mathrm{white}}(\cdot)$ to denote the PCA whitening process. $Z_{w}$ represents the latent representation after the above transformations, which can be expressed as

$$Z_{w} = \mathrm{PCA}_{\mathrm{white}}(Z'). \tag{7}$$

Latent Feature Clustering. In practice, the types of legitimate user operations are hard to determine based on network traffic, and different web services have different operations. Under these conditions, an unsupervised clustering method is needed. To meet this need, the hierarchical density-based spatial clustering of applications with noise (HDBSCAN) is selected to perform clustering analysis on the latent feature representations. HDBSCAN is a density-based clustering algorithm that automatically identifies clusters of noisy input features without a predefined cluster number. Specifically, HDBSCAN is first performed on all transformed latent feature representations, and the centers and boundaries of all clusters are determined. For each transformed latent feature representation $Z_{w}$, HDBSCAN assigns a pseudo operation label $O$ as

$$O = \mathrm{HDBSCAN}(Z_{w}). \tag{8}$$

It is worth noting that the pseudo operation label $O$ is just a cluster ID and corresponds to a certain web operation in the application layer.

E. Anomaly Score Estimation

This component takes the reconstruction losses between $F$ and $\hat{F}$ as input and outputs an anomaly score that indicates the anomaly degree of each network flow.

In the feature sequence $F$, different feature vectors have different value ranges and units. This fact could result in some feature vectors dominating the overall reconstruction loss of the entire flow. To address this issue, BLADE evaluates the anomaly degree of a new flow by computing its upper-tail probability in the historical distribution of reconstruction losses of each type of feature vector in the training dataset. This ensures that different anomaly scores are balanced and normalized to the same range. Finally, all anomaly scores are aggregated into an overall anomaly score for this flow.

Probabilistic Calibration. Let $\mathbb{L}^{n}$ denote the collection of reconstruction losses belonging to the $n$-th attribute in all training feature sequences. The calibration process transforms these MSE loss values into probabilistic anomaly scores. To do this, the empirical cumulative distribution function for each channel is constructed as

$$\mathrm{ECDF}^{n}(x) = \frac{1}{|\mathbb{L}^{n}|} \sum_{i=1}^{|\mathbb{L}^{n}|} \mathbb{I}\left[ \log(\mathcal{L}^{n}_{i} + \epsilon) \leq x \right], \tag{9}$$

where $|\mathbb{L}^{n}|$ denotes the total number of training losses for the $n$-th attribute, $x$ is the query point, i.e., a log-transformed loss value, $\mathcal{L}^{n}_{i}$ represents the $i$-th reconstruction loss for the $n$-th attribute in the training dataset, $\epsilon$ is a small constant to avoid zeros, and $\mathbb{I}[\cdot]$ is the indicator function, which outputs the value 1 when the condition is satisfied, and 0 otherwise. Moreover, the log-transformation is applied to smooth and standardize the reconstruction losses.

During testing, given a new reconstruction loss $\mathcal{L}^{n}_{\mathrm{test}}$, the anomaly score is computed as the negative value of the log-transformation of its upper-tail probability

$$p^{n} = 1 - \mathrm{ECDF}^{n}\left( \log(\mathcal{L}^{n}_{\mathrm{test}} + \epsilon) \right), \tag{10}$$

$$a^{n} = -\log(p^{n} + \delta), \tag{11}$$

where $p^{n}$ represents the upper-tail probability for the $n$-th attribute, $a^{n}$ is the corresponding anomaly score, and $\delta$ is used to prevent zero probabilities. A higher anomaly score indicates a greater deviation from the benign traffic pattern.

Anomaly Score Aggregation. To derive a final anomaly score for each flow, we use the LogSumExp function to
aggregate the anomaly scores computed across all attributes as

$$\alpha = \log\left( \sum_{n=1}^{N} \exp(a^{n}) \right). \tag{12}$$

This function is a smooth approximation of the maximum and is closely related to the softmax function: it places greater emphasis on larger anomaly scores while reducing the influence of smaller ones. The result is a final anomaly score $\alpha$ that reflects the overall anomaly degree of this flow. Given the reconstruction losses of all flows in the multi-flow sample $B$, this component calculates their anomaly scores, providing a measure of deviation from legitimate behavior patterns.

TABLE I
SETTINGS OF BLADE’S HYPERPARAMETERS IN OUR EXPERIMENT

Component                           Hyperparameter                               Value
Data Parsing                        Length of Feature Vector ($L$)               50
Data Parsing                        Size of Behavior Window ($W$)                50
Data Parsing                        Number of Packet-level Attributes ($N$)      3
Flow Autoencoder                    Hidden State Size                            128
Flow Autoencoder                    Latent Representation Dimension              64
Flow Autoencoder                    Number of BiGRU Layers                       2
Pseudo Operation Label Assignment   Low-Variance Threshold ($\theta$)            0.01
Anomaly Score Estimation            Log-Transformation Parameter ($\epsilon$)    $10^{-8}$
Anomaly Score Estimation            Probability Parameter ($\delta$)             $10^{-8}$
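Eqs. (9) to (12), with the $\epsilon$ and $\delta$ settings listed in Table I, can be sketched in a few lines of standard-library Python. This is an illustrative reading of the equations and not the authors' code. `ChannelCalibrator` and `aggregate_scores` are hypothetical names, the training losses are made-up numbers, and the max-shift inside the LogSumExp is a standard numerical-stability device that the paper does not mention.

```python
import bisect
import math
from typing import Sequence

EPSILON = 1e-8  # log-transformation parameter (Table I)
DELTA = 1e-8    # probability parameter (Table I)


class ChannelCalibrator:
    """ECDF over the log-transformed training losses of one attribute, Eq. (9)."""

    def __init__(self, training_losses: Sequence[float], eps: float = EPSILON) -> None:
        if not training_losses:
            raise ValueError("at least one training loss is required")
        self.eps = eps
        self.sorted_logs = sorted(math.log(loss + eps) for loss in training_losses)

    def ecdf(self, x: float) -> float:
        # Fraction of log-transformed training losses that are <= x.
        return bisect.bisect_right(self.sorted_logs, x) / len(self.sorted_logs)

    def anomaly_score(self, test_loss: float, delta: float = DELTA) -> float:
        p = 1.0 - self.ecdf(math.log(test_loss + self.eps))  # upper-tail probability, Eq. (10)
        return -math.log(p + delta)                          # Eq. (11)


def aggregate_scores(scores: Sequence[float]) -> float:
    """LogSumExp of the per-attribute scores, Eq. (12), shifted by the max for stability."""
    top = max(scores)
    return top + math.log(sum(math.exp(s - top) for s in scores))


# Toy training losses for N = 3 attributes (values are made up).
training = [
    [0.10, 0.20, 0.30, 0.40],
    [1.0, 2.0, 3.0, 4.0],
    [0.01, 0.02, 0.03, 0.04],
]
calibrators = [ChannelCalibrator(losses) for losses in training]

# One test flow: a mid-range loss, a loss far above the training range, and one below it.
test_losses = [0.25, 10.0, 0.005]
scores = [c.anomaly_score(loss) for c, loss in zip(calibrators, test_losses)]
flow_score = aggregate_scores(scores)
print(scores, flow_score)
```

In this toy run the second attribute alone drives the flow score: its loss exceeds every training loss, so $p^{n} = 0$ and $a^{n} = -\log \delta$, while the other two attributes contribute almost nothing to the LogSumExp. Because $\delta$ bounds every score by $-\log \delta$, no single attribute can produce an unbounded score.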
F. Multi-Flow Anomaly Detection

For each multi-flow sample $B$, this component first aggregates its timestamps, pseudo operation labels, and anomaly scores into a behavior sample $S$. Then, a feature extractor encodes $S$ into a latent feature representation, and a one-class support vector machine is exploited to detect traffic anomalies.

First, a behavior sample $S$ can be expressed as

$$S = \begin{bmatrix} \tau^{1}, \cdots, \tau^{w}, \cdots, \tau^{W} \\ \alpha^{1}, \cdots, \alpha^{w}, \cdots, \alpha^{W} \\ O^{1}, \cdots, O^{w}, \cdots, O^{W} \end{bmatrix}, \tag{13}$$

where $\tau^{w}$, $\alpha^{w}$, and $O^{w}$ represent the timestamp, anomaly score, and pseudo operation label of the $w$-th flow in the multi-flow sample $B$, respectively. Then, we use the behavior samples extracted from the training dataset $\mathcal{B}$ to train an autoencoder, which has the same structure as the flow autoencoder. Next, we take its encoder part as a feature extractor. This design aims to capture the contextual dependencies within $S$. The output of the extractor $X$ is denoted as

$$X = \mathrm{Extractor}(S). \tag{14}$$

The behavior representation $X$ is then passed to an OCSVM. The OCSVM is used to distinguish between legitimate and malicious traffic by building the decision boundary of representations of benign data. Since this method works in an unsupervised manner, it does not require labeled anomalous data in the training phase. During the detection phase, the learned OCSVM processes the behavior representation and outputs a detection result $Y$ as

$$Y = \mathrm{OCSVM}(X). \tag{15}$$

In summary, BLADE is an unsupervised and adaptive traffic anomaly detection system that effectively detects both flow-level and behavior-level attacks in diverse web systems. This approach enables efficient and robust anomaly detection in environments where labeled data is scarce or unavailable.

IV. EXPERIMENTAL EVALUATION

A. Evaluation Methodology

We evaluate BLADE on both our custom dataset and the publicly available CIC-IDS2017 dataset [14].

Custom Dataset. To the best of our knowledge, no public dataset provides labeled traffic under flow-level and behavior-level attacks. Hence, we build a reproducible blog-style web testbed with a fixed number of web operations. DPDK [15] is used to capture network traffic via 10G optical modules for data acquisition. To collect benign traffic, ten volunteers are recruited to interact normally with the web testbed using different clients. In this condition, the traffic of each user’s session is captured. To collect anomalous traffic, we launch both flow-level and behavior-level attacks on the web testbed. The flow-level attacks include DoS attacks, injection attacks, brute-force attacks, and scanning attacks. As for behavior-level attacks, active session attacks, web bot bulk operations, and malicious data harvesting attacks are implemented. In this way, a total of 597296 benign traffic flows and 123474 malicious flows are collected. For each flow, packet-level attributes, including packet size, inter-arrival time, and TCP control flags, are extracted. In this way, this dataset contains 11965 benign multi-flow samples and 2480 malicious ones.

CIC-IDS2017 Dataset. The CIC-IDS2017 [14] dataset is a well-known benchmark dataset for intrusion detection and provides labeled network traffic under benign conditions and multiple types of real-world attacks, such as DoS, port scans, brute force, botnet, and web attacks. Several studies have reported some issues in the CIC-IDS2017 dataset, such as packet misordering, packet duplication, and mislabeling. To deal with these issues, we adopt the correction methods suggested in the literature [16], [17]. To generate multi-flow samples, the flows that have the same IP address and port number are considered to be from the same user and associated in chronological order. In this way, we obtain 22753 benign multi-flow samples and 2642 malicious ones. Because the CIC-IDS2017 dataset does not involve behavior-level attacks, this dataset is only used to evaluate BLADE’s detection performance on flow-level attacks.

Training and Testing. We implement BLADE on a workstation equipped with an Intel Core i7-13700K CPU, NVIDIA GeForce RTX 4090 GPU, and 256 GB RAM, running Ubuntu 20.04.6. All components are developed using Python 3.12 and PyTorch 2.5.0. Table I lists the specifications of BLADE’s hyperparameters in our experiment. For model training and testing, the benign samples are partitioned into training and testing samples with a ratio of 7:3 in each of the two datasets. Moreover, all malicious samples are used for testing.

Evaluation Metrics. We use the following metrics to evaluate BLADE’s performance.
• Precision. It is the ratio of correctly identified malicious
samples to all samples predicted as malicious.
• Recall. It is the ratio of correctly identified malicious samples to all actual malicious samples.
• F1 Score. It is the harmonic mean of precision and recall, providing a balanced measure of system performance.
• Silhouette Coefficient. It measures how similar an object is to its own cluster compared with other clusters.
• Calinski-Harabasz Index. It is a metric that evaluates the ratio of the sum of inter-cluster dispersion to that of intra-cluster dispersion.
• Davies-Bouldin Index. It is based on the average similarity ratio of each cluster to its most-similar cluster.
The first three metrics measure the anomaly detection performance of BLADE, while the latter three are used to evaluate the performance of unsupervised clustering involved in BLADE.

B. Experimental Results

Anomaly Detection on Custom Dataset. First, we present BLADE’s anomaly detection performance on the custom dataset. As shown in Table II, BLADE achieves consistently high performance against the four flow-level attacks. In particular, it obtains a precision of 0.9661, an exceptional recall of 0.9975, and a high F1 score of 0.9816 for DoS attacks. As for behavior-level attacks, BLADE has the best performance when detecting active session attacks, with a precision of 0.9894, a high recall of 0.9993, and a near-perfect F1 score of 0.9943. On average, BLADE achieves a precision of 0.9659, a recall of 0.9814, and an F1 score of 0.9732, indicating its high effectiveness in recognizing both flow-level and behavior-level web service attacks.

TABLE II
PERFORMANCE OF BLADE ON CUSTOM DATASET

Attack Type      Attack Vector          Precision   Recall   F1 Score
Flow-Level       DoS                    0.9661      0.9975   0.9816
Flow-Level       Scan                   0.9296      0.9965   0.9619
Flow-Level       Injection              0.9708      0.9677   0.9693
Flow-Level       Brute Force            0.9759      0.9179   0.9460
Behavior-Level   Data Harvesting        0.9661      0.9907   0.9782
Behavior-Level   Active Session         0.9894      0.9993   0.9943
Behavior-Level   Web Botnet Operation   0.9635      1.0000   0.9814
Average          -                      0.9659      0.9814   0.9732

Anomaly Detection on CIC-IDS2017 Dataset. Then, we report BLADE’s performance on the CIC-IDS2017 dataset for detecting flow-level attacks in Table III. BLADE achieves the highest F1 score of 0.9995 with a perfect recall for port scan detection. As for Botnet attacks, our system presents an F1 score of 0.9943 and a high recall of 0.9993. The system obtains the lowest performance against web attacks with a precision of 0.9214. Despite that, BLADE achieves a precision of 0.9679, a recall of 0.9929, and an F1 score of 0.9801 on average, validating its robustness across various flow-level attacks.

TABLE III
PERFORMANCE OF BLADE ON CIC-IDS2017 DATASET

Attack Vector   Precision   Recall   F1 Score
DoS             0.9661      0.9907   0.9782
Botnet          0.9894      0.9993   0.9943
Port Scan       0.9991      1.0000   0.9995
Web Attack      0.9214      0.9744   0.9472
Brute Force     0.9635      1.0000   0.9814
Average         0.9679      0.9929   0.9801

Behavioral Embedding Visualization. To highlight the differences between malicious and benign traffic, we apply t-distributed stochastic neighbor embedding (t-SNE) to the samples from our custom dataset. Fig. 3 visualizes high-dimensional embeddings from the multi-flow anomaly detection module. Overall, the projection results show high distinguishability between benign and malicious traffic in the latent space. In Fig. 3 (a), flow-level attacks are clearly separated from benign ones, indicating that triplet-sequence modeling is effective in capturing temporal and contextual cues. In Fig. 3 (b), the latent embeddings of behavior-level attacks are distant from those of benign traffic. The strong separability stems from distinctive multi-flow operation sequences (e.g., bulk operations, data harvesting) that are invisible per flow but revealed by our pseudo-operation labels and anomaly-score integration. These clear discriminating boundaries confirm that BLADE can distinguish both flow-level and behavior-level anomalies within a unified latent space of multi-flow features.

[Fig. 3. t-SNE visualization of malicious and benign traffic. Panels: (a) Flow-Level Attacks vs Benign Behaviors; (b) Behavior-Level Attacks vs Benign Behaviors. Axes: t-SNE Component 1, t-SNE Component 2, t-SNE Component 3.]

Impact of Clustering Algorithms. We evaluate eight representative clustering algorithms covering four categories: 1) probabilistic model-based methods, including Bayesian Gaussian mixture (BGM) and Gaussian mixture model with Bayesian information criterion (GMM-BIC); 2) prototype-based methods, including X-means and balanced iterative reducing and clustering using hierarchies (BIRCH); 3) density-based methods, consisting of HDBSCAN and ordering points to identify the clustering structure (OPTICS); 4) graph-based approaches, including spectral clustering with eigenvector (SCE) and affinity propagation (AP). Fig. 4 (a) illustrates the latent embeddings derived from benign traffic in our custom dataset. Six non-overlapping clusters can be clearly observed. These clusters correspond to different application-layer operations, which lay the foundation for applying clustering algorithms to distinguish differences in operations. According to Fig. 4 (b)-(d), HDBSCAN achieves the best clustering performance, with the highest Silhouette score of 0.653, the lowest Davies-Bouldin index of 0.501, and the highest Calinski-Harabasz index of 2192. HDBSCAN’s superior performance stems from its
density-based approach that effectively discovers clusters with different point densities, making it well-suited for grouping feature points of benign traffic, as shown in Fig. 4 (a). These results justify the adoption of HDBSCAN in our system for latent feature clustering.

[Figure 4 panels: (a) t-SNE of Flow Latent Space (Dimension 1 vs. Dimension 2; Clusters 0-5 and Outlier); (b) Silhouette Score; (c) Calinski-Harabasz Index; (d) Davies-Bouldin Index]
Fig. 4. Performance of different clustering algorithms.

Comparison with Single-Flow Baselines. Because no existing approach exploits network traffic for detecting behavior-level attacks in web services, we conduct a comparative study against four representative single-level baselines for flow-level attack detection. The selected baselines include BAE [18], MemAE [9], LSTM-AE [8], and FS-Net [7]. We evaluate their anomaly detection performance on both the custom and CIC-IDS2017 datasets. As presented in Fig. 5, BLADE shows superior performance on most of the attack vectors in the two datasets. As depicted in Fig. 5 (a), the F1 score of BLADE is at least 0.09 higher than that of the baselines in injection attacks and 0.04 higher than that in brute force attacks on the custom dataset. When it comes to the CIC-IDS-2017 dataset, FS-Net achieves the best performance among the four baselines, and even has a higher F1 score in web attacks than BLADE. However, BLADE outperforms FS-Net in the other four types of attacks. These results validate the superiority of multi-flow anomaly detection over traditional single-flow approaches.

[Figure 5 panels: (a) F1-Score Comparison on Custom Dataset (attack types: DoS, Scan, Injection, Brute Force); (b) F1-Score Comparison on CIC-IDS-2017 (attack types: DoS, Scan, Botnet, WebAttack, Brute Force); legend: Ours, BAE, MemAE, LSTM-AE, FS-Net]
Fig. 5. Performance comparison between BLADE and flow-level baselines on the custom and CIC-IDS2017 datasets.

Ablation Study. Finally, we conduct an ablation study to validate the effectiveness of each component in BLADE. For this purpose, we build three variants by systematically removing one component from BLADE at a time: 1) The first variant removes the anomaly score estimation process and directly feeds raw reconstruction losses into multi-flow anomaly detection with pseudo operation labels and timestamps; 2) The second variant excludes pseudo operation labels, utilizing only anomaly scores and timestamps for anomaly detection; 3) The third variant removes anomaly scores, employing only pseudo operation labels and timestamps in multi-flow anomaly detection. We evaluate the three variants on our custom dataset and report their anomaly detection results in Table IV. As the table shows, the precisions, recalls, and F1 scores of all variants are at least 0.32, 0.36, and 0.34 lower than those of BLADE, respectively. Specifically, the first variant has the worst performance, with a low precision of 0.3414. This indicates that the raw reconstruction losses are of little help in anomaly detection. The second variant suffers a precision decrease of more than 0.5, suggesting that pseudo operation labels are helpful in characterizing meaningful patterns of legitimate web service traffic. The third variant obtains the highest performance among all variants, but still presents a significant performance degradation. The above results show the high effectiveness of each component proposed in BLADE.

TABLE IV
ABLATION STUDY OF BLADE
Variants                            Precision   Recall   F1 Score
#1: w/o Anomaly Score Estimation    0.3414      0.5432   0.4193
#2: w/o Pseudo Labels               0.4574      0.6428   0.5345
#3: w/o Anomaly Scores              0.6386      0.6157   0.6269
BLADE (Complete Model)              0.9659      0.9814   0.9732

V. RELATED WORK
Traffic Anomaly Detection. Traffic anomaly detection is widely adopted to protect the security of web services. Liu et al. [7] proposed the flow sequence network (FS-Net), which extracts representative features from raw network flows and classifies them from a single-flow perspective. Qing et al. [10] proposed RAPIER that leverages the distinct distributions of legitimate and malicious traffic flows in the feature space to augment training data. Additionally, there are some methods based on mixed traffic, where a burst refers to a significant and sudden increase in the flow size within a given time
window, without the need to classify individual flows. Cheng et al. [19] proposed BurstDetector, incorporating the definition of across-period bursts and employing a two-stage detection process. However, these approaches mainly focus on single-flow characteristics and fail to capture behavioral-level attacks that span multiple flows. In contrast, BLADE is the first traffic anomaly detection system that exploits multi-flow features and is capable of detecting not only flow-level but also behavior-level attacks.
User Behavior Analysis. User behavior analysis is also used to secure web services. Luo et al. [12] proposed BotGraph, using a pre-obtained sitemap to convert user behaviors from log data into subgraphs for behavior classification. Prinakaa et al. [13] proposed a real-time API abuse detection system that utilizes behavioral analysis of API logs to identify and mitigate security threats. Because log data is hard to access in many cases, researchers are beginning to investigate the feasibility of user behavior analysis using network traffic. Mengmeng et al. [20] proposed Enmob, a multi-flow-based behavioral traffic classification method, which is designed to uncover application behaviors using encrypted application traffic. However, this method requires predefined behavior patterns and labeled traffic. Differently, BLADE focuses on anomaly detection and can automatically generate pseudo operation labels in an unsupervised manner.

VI. CONCLUSION
This paper presents BLADE, a novel traffic anomaly detection system that can detect both the flow-level and behavior-level attacks in web services. We observe that application-layer operations of web services exhibit distinctive communication patterns at the network layer from a multi-flow perspective. BLADE generates a pseudo operation label and an anomaly score for each flow and learns behavior patterns of legitimate web users using features within multiple flows, thus facilitating the detection of single-flow and multi-flow anomalies. We implement BLADE and evaluate it on our custom and public CIC-IDS2017 datasets. The evaluation results demonstrate that BLADE achieves an average F1 score of 0.9732 against both the flow-level and behavior-level attacks. In addition, BLADE outperforms four flow-level baselines, showing the superiority of multi-flow anomaly detection over traditional single-flow approaches.

ACKNOWLEDGEMENT
This work was supported in part by the National Natural Science Foundation of China with Grant 62301499 and the Henan Association for Science and Technology with Grant 2025HYTP037.

REFERENCES
[1] A. Technologies, “State of apps and api security 2025: How ai is shifting the digital terrain,” [Link] state-of-apps-and-api-security-2025-how-ai-is-shifting-the-digital-terrain/, 2025, accessed: 2025-09-14.
[2] A. Nascita, G. Aceto, D. Ciuonzo, A. Montieri, V. Persico, and A. Pescapé, “A survey on explainable artificial intelligence for internet traffic classification and prediction, and intrusion detection,” IEEE Communications Surveys & Tutorials, 2024.
[3] W. Dong, J. Yu, X. Lin, G. Gou, and G. Xiong, “Deep learning and pre-training technology for encrypted traffic classification: A comprehensive review,” Neurocomputing, vol. 617, p. 128444, 2025.
[4] F. Alotaibi and S. Maffeis, “Mateen: Adaptive ensemble learning for network anomaly detection,” in Proceedings of the 27th International Symposium on Research in Attacks, Intrusions and Defenses, ser. RAID ’24. New York, NY, USA: Association for Computing Machinery, 2024, p. 215–234. [Online]. Available: [Link]
[5] D. Han, Z. Wang, W. Chen, K. Wang, R. Yu, S. Wang, H. Zhang, Z. Wang, M. Jin, J. Yang et al., “Anomaly detection in the open world: Normality shift detection, explanation, and adaptation.” in NDSS, 2023.
[6] C. Fu, Q. Li, and K. Xu, “Flow interaction graph analysis: Unknown encrypted malicious traffic detection,” IEEE/ACM Transactions on Networking, vol. 32, no. 4, pp. 2972–2987, 2024.
[7] C. Liu, L. He, G. Xiong, Z. Cao, and Z. Li, “Fs-net: A flow sequence network for encrypted traffic classification,” in IEEE INFOCOM 2019 - IEEE Conference on Computer Communications, 2019, pp. 1171–1179.
[8] M. Said Elsayed, N.-A. Le-Khac, S. Dev, and A. D. Jurcut, “Network anomaly detection using lstm based autoencoder,” in Proceedings of the 16th ACM symposium on QoS and security for wireless and mobile networks, 2020, pp. 37–45.
[9] B. Min, J. Yoo, S. Kim, D. Shin, and D. Shin, “Network anomaly detection using memory-augmented deep autoencoder,” IEEE Access, vol. 9, pp. 104 695–104 706, 2021.
[10] Y. Qing, Q. Yin, X. Deng, Y. Chen, Z. Liu, K. Sun, K. Xu, J. Zhang, and Q. Li, “Low-quality training data only? a robust framework for detecting encrypted malicious network traffic,” arXiv preprint arXiv:2309.04798, 2023.
[11] W. Liu, W. Cui, B. Wang, H. Pan, W. She, and Z. Tian, “Decentralized traffic detection utilizing blockchain-federated learning with quality-driven aggregation,” Computer Networks, vol. 262, p. 111179, 2025.
[12] Y. Luo, G. She, P. Cheng, and Y. Xiong, “Botgraph: Web bot detection based on sitemap,” arXiv preprint arXiv:1903.08074, 2019.
[13] S. Prinakaa, V. Bavanika, S. Sanjana, S. Srinivasan, and V. Sarasvathi, “A real-time approach to detecting api abuses based on behavioral patterns,” in 2024 8th International Conference on Cryptography, Security and Privacy (CSP). IEEE, 2024, pp. 24–28.
[14] I. Sharafaldin, A. H. Lashkari, A. A. Ghorbani et al., “Toward generating a new intrusion detection dataset and intrusion traffic characterization.” ICISSp, vol. 1, no. 2018, pp. 108–116, 2018.
[15] L. Foundation, “Data plane development kit (DPDK),” 2015. [Online]. Available: [Link]
[16] M. Lanvin, P.-F. Gimenez, Y. Han, F. Majorczyk, L. Mé, and É. Totel, “Errors in the cicids2017 dataset and the significant differences in detection performances it makes,” in Risks and Security of Internet and Systems, S. Kallel, M. Jmaiel, M. Zulkernine, A. Hadj Kacem, F. Cuppens, and N. Cuppens, Eds. Cham: Springer Nature Switzerland, 2023, pp. 18–33.
[17] L. Liu, G. Engelen, T. Lynar, D. Essam, and W. Joosen, “Error prevalence in nids datasets: A case study on cic-ids-2017 and cse-cic-ids-2018,” in 2022 IEEE Conference on Communications and Network Security (CNS). IEEE, 2022, pp. 254–262.
[18] D. Wang, M. Nie, and D. Chen, “Bae: Anomaly detection algorithm based on clustering and autoencoder,” Mathematics, vol. 11, no. 15, p. 3398, 2023.
[19] Z. Cheng, G. Gao, H. Huang, Y.-E. Sun, Y. Du, and H. Wang, “Burstdetector: Real-time and accurate across-period burst detection in high-speed networks,” in IEEE INFOCOM 2024 - IEEE Conference on Computer Communications, 2024, pp. 2338–2347.
[20] G. Mengmeng, F. Ruitao, L. Likun, Y. Xiangzhan, S. Vinay, X. Xiaofei, and L. Yang, “Enmob: Unveil the behavior with multi-flow analysis of encrypted app traffic,” Cybersecurity, vol. 8, no. 1, p. 26, 2025.
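
SUPPLEMENTARY SKETCH: CLUSTERING METRICS
As an illustration of the three clustering metrics used in the evaluation (Silhouette Coefficient, Calinski-Harabasz Index, and Davies-Bouldin Index), the following minimal Python sketch computes them from their standard textbook definitions using Euclidean distance. It is not BLADE's implementation: the six toy points, the two labelings, and all function names are assumptions chosen only to show that a well-separated clustering yields a higher Silhouette and Calinski-Harabasz value and a lower Davies-Bouldin value than a poor one.

```python
from math import dist
from statistics import fmean


def group_by_label(points, labels):
    """Map each cluster label to the list of points assigned to it."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return clusters


def centroid(points):
    """Coordinate-wise mean of a list of points."""
    return tuple(fmean(axis) for axis in zip(*points))


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points."""
    clusters = group_by_label(points, labels)
    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(fmean(dist(point, q) for q in other)
                for key, other in clusters.items() if key != label)
        scores.append((b - a) / max(a, b))
    return fmean(scores)


def calinski_harabasz(points, labels):
    """Inter-cluster dispersion over intra-cluster dispersion, each per degree of freedom."""
    clusters = group_by_label(points, labels)
    overall = centroid(points)
    n, k = len(points), len(clusters)
    between = sum(len(m) * dist(centroid(m), overall) ** 2 for m in clusters.values())
    within = sum(dist(p, centroid(m)) ** 2 for m in clusters.values() for p in m)
    return (between / (k - 1)) / (within / (n - k))


def davies_bouldin(points, labels):
    """Average, over clusters, of the worst similarity ratio to another cluster."""
    clusters = group_by_label(points, labels)
    centers = {key: centroid(m) for key, m in clusters.items()}
    spread = {key: fmean(dist(p, centers[key]) for p in m) for key, m in clusters.items()}
    worst = [max((spread[i] + spread[j]) / dist(centers[i], centers[j])
                 for j in clusters if j != i)
             for i in clusters]
    return fmean(worst)


# Toy data (assumed): two tight, well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good_labels = [0, 0, 0, 1, 1, 1]  # matches the visible grouping
bad_labels = [0, 1, 0, 1, 0, 1]   # mixes the two groups

for name, metric in [("Silhouette", silhouette),
                     ("Calinski-Harabasz", calinski_harabasz),
                     ("Davies-Bouldin", davies_bouldin)]:
    print(f"{name}: good={metric(points, good_labels):.3f} "
          f"bad={metric(points, bad_labels):.3f}")
```

Higher is better for the first two metrics and lower is better for the third, which matches how the clustering results for HDBSCAN are read in the evaluation.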
