
WhoFi: Deep Person Re-Identification via Wi-Fi Channel Signal Encoding

Danilo Avola, Emad Emam, Dario Montagnini, Daniele Pannone, and Amedeo Ranaldi

Department of Computer Science, La Sapienza University of Rome
{avola, emam, montagnini, pannone, ranaldi}@di.uniroma1.it

Abstract. Person Re-Identification is a key and challenging task in video surveillance. While traditional methods rely on visual data, issues like poor lighting, occlusion, and suboptimal angles often hinder performance. To address these challenges, we introduce WhoFi, a novel pipeline that utilizes Wi-Fi signals for person re-identification. Biometric features are extracted from Channel State Information (CSI) and processed through a modular Deep Neural Network (DNN) featuring a Transformer-based encoder. The network is trained using an in-batch negative loss function to learn robust and generalizable biometric signatures. Experiments on the NTU-Fi dataset show that our approach achieves competitive results compared to state-of-the-art methods, confirming its effectiveness in identifying individuals via Wi-Fi signals.

Keywords: Person Re-Identification · CSI · Deep Neural Networks · Transformers · Wi-Fi Signals · Radio Biometric Signature

1 Introduction

Person Re-Identification (Re-ID) plays a central role in surveillance systems, aiming to determine whether two representations belong to the same individual across different times or locations. Traditional Re-ID systems typically rely on visual data such as images or videos, comparing a probe (the input to be identified) against a set of stored gallery samples by learning discriminative biometric features. Most commonly, these features are based on appearance cues such as clothing texture, color, and body shape. However, visual-based systems suffer from a number of known limitations, including sensitivity to changes in lighting conditions [4], occlusions [6], background clutter [20], and variations in camera viewpoints [12]. These challenges often result in reduced robustness, especially in unconstrained or real-world environments. To overcome these limitations, an alternative research direction explores non-visual modalities, such as Wi-Fi-based person Re-ID. Wi-Fi signals offer several advantages over camera-based approaches: they are not affected by illumination, they can penetrate walls and occlusions, and most importantly, they offer a privacy-preserving mechanism for sensing. The core insight is that as a Wi-Fi signal propagates through an environment, its waveform is altered by the presence and physical characteristics

of objects and people along its path. These alterations, captured in the form of Channel State Information (CSI), contain rich biometric information. Unlike optical systems that perceive only the outer surface of a person, Wi-Fi signals interact with internal structures, such as bones, organs, and body composition, resulting in person-specific signal distortions that act as a unique signature.

Earlier wireless sensing methods primarily relied on coarse signal measurements such as the Received Signal Strength Indicator (RSSI) [11], which proved insufficient for fine-grained recognition tasks. More recently, CSI has emerged as a powerful alternative [17]. CSI provides subcarrier-level measurements across multiple antennas and frequencies, enabling a detailed and time-resolved view of how radio signals interact with the human body and surrounding environment. By learning patterns from CSI sequences, it is possible to perform Re-ID by capturing and matching these radio biometric signatures. Despite the promising nature of Wi-Fi-based Re-ID, the field remains underexplored, especially in terms of developing scalable deep learning methods that can generalize across individuals and sensing environments. In this paper, we propose WhoFi, a deep learning pipeline for person Re-ID using only CSI data. Our model is trained with an in-batch negative loss to learn robust embeddings from CSI sequences. We evaluate multiple backbone architectures for sequence modeling, including Long Short-Term Memory (LSTM), Bidirectional LSTM (Bi-LSTM), and Transformer networks, each designed to capture temporal dependencies and contextual patterns. The main contributions of this work are:

– We propose a modular deep learning pipeline for person Re-ID that relies solely on Wi-Fi CSI data, without requiring visual input;
– We perform a comparative study across three widely used backbone architectures (LSTM, Bi-LSTM, and Transformer networks) to assess their ability to encode biometric signatures from CSI;
– We adopt an in-batch negative loss training strategy, which enables scalable and effective similarity learning in the absence of labeled pairs;
– We conduct extensive experiments on the public NTU-Fi dataset to demonstrate the accuracy and generalizability of our approach;
– We perform an ablation study to evaluate the impact of preprocessing strategies, input sequence length, model depth, and data augmentation.

By leveraging non-visual biometric features embedded in Wi-Fi CSI, this study offers a privacy-preserving and robust approach for Wi-Fi-based Re-ID, and it lays the foundation for future work in wireless biometric sensing.

2 Related Work

2.1 Person Re-Identification via Visual Data

In the field of computer vision, person Re-ID has long been of major importance. Earlier methods primarily relied on RGB images or videos to track people across camera views. Handcrafted descriptors such as Local Binary Patterns

(LBP), color histograms, and Histograms of Oriented Gradients (HOG) were widely used to capture low-level visual cues like texture and silhouette. With the advent of deep learning, Convolutional Neural Networks (CNNs) became the dominant approach, enabling hierarchical spatial feature learning [7]. Training strategies like triplet loss, cross-entropy with label smoothing, and center loss were adopted to optimize embedding-space separability [5, 19]. Recent models often integrate attention mechanisms [10] and part-based representations [13] to handle misalignment and occlusion. Despite strong benchmark performance, these systems rely heavily on high-quality visual input and careful manual tuning, limiting their applicability in uncontrolled environments.

2.2 Person Identification and Re-ID via Wi-Fi Sensing


Several works have extensively investigated human identification and authentication through Wi-Fi CSI, focusing on features such as amplitude, phase, and heatmap variations [3]. Early methods include line-of-sight waveform modeling combined with PCA or DWT for classification [15], or gait-based identification through handcrafted features [18]. CAUTION [14] introduced a dataset and a few-shot learning approach for user recognition via downsampled CSI representations. More recent methods leverage deep learning models to enhance generalization capabilities [16]. A recent approach [1] proposed a dual-branch architecture that combines CNN-based processing of amplitude-derived heatmaps with LSTM-based modeling of phase information for re-identification. However, the use of private datasets in such work limits replicability and hinders direct comparison. In contrast, our study relies on a widely available public benchmark, enabling reproducibility and fair evaluation across different architectures.

3 Method
This section presents the data pre-processing and augmentation steps, together with the proposed deep architecture.

3.1 Data Pre-processing


Data extracted from the CSI complex matrix must first be pre-processed to remove noise and sampling offsets before meaningful biometric features can be extracted.

Channel State Information (CSI): Wi-Fi transmission relies on electromagnetic waves that carry information from a transmitting antenna (TX) to a receiving one (RX). Modern systems adopt Multiple-Input Multiple-Output (MIMO), involving multiple TX/RX antennas, and Orthogonal Frequency-Division Multiplexing (OFDM), a modulation technique that transmits data across orthogonal subcarriers spanning nearly the entire frequency band. The integration of MIMO and OFDM enables sampling of the Channel Frequency Response (CFR) at subcarrier granularity in a CSI matrix. The CSI measurement for each subcarrier k ∈ K represents the CFR H^{(θ,γ)}_k between the receiving antenna (RX) θ ∈ Θ and the transmitting antenna (TX) γ ∈ Γ, and is given by:

H^{(\theta,\gamma)}_{k} = |H^{(\theta,\gamma)}_{k}| \, e^{j \angle H^{(\theta,\gamma)}_{k}}, \label{eq:CSI} (1)

where |H^{(θ,γ)}_k| denotes the signal amplitude and ∠H^{(θ,γ)}_k the signal phase. By collecting the responses across all TX/RX antenna pairs, a CSI complex matrix of size Θ × Γ × K is formed, representing the CFR across all subcarriers in K.
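To make the notation concrete, the following minimal sketch (in Python/NumPy, with a synthetic CSI matrix standing in for the output of a real CSI extraction tool; the dimensions are chosen to match the NTU-Fi setup described later) derives the amplitude and phase used in the rest of the pipeline:

```python
import numpy as np

# Hypothetical dimensions: Theta = 3 RX antennas, Gamma = 1 TX antenna,
# K = 114 subcarriers (matching the NTU-Fi setup described in Sect. 4.1).
Theta, Gamma, K = 3, 1, 114

# A synthetic complex CSI matrix H of shape (Theta, Gamma, K); in practice
# this comes from the Wi-Fi NIC's CSI extraction tool.
rng = np.random.default_rng(0)
H = rng.normal(size=(Theta, Gamma, K)) + 1j * rng.normal(size=(Theta, Gamma, K))

amplitude = np.abs(H)    # |H_k^(theta,gamma)|, cf. Eq. (2)
phase = np.angle(H)      # angle of H_k^(theta,gamma), cf. Eq. (7)
```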

Amplitude Filtering: Signal amplitude represents the strength of the received signal. For a subcarrier k ∈ K, receiver antenna θ ∈ Θ, and transmitter antenna γ ∈ Γ, the signal amplitude A^{(θ,γ)}_k is defined as:

A^{(\theta ,\gamma )}_{k} = |H^{(\theta ,\gamma )}_{k}| = \sqrt {\text {real}(H^{(\theta ,\gamma )}_{k})^2 + \text {img}(H^{(\theta ,\gamma )}_{k})^2}, \label {eq:amp_eq} (2)

which corresponds to the magnitude of the CSI measurement. In this work, signal amplitudes are cleaned of outliers using the Hampel filter [2], which identifies outliers based on the median of a local window and the Median Absolute Deviation (MAD). Given a sequence of amplitude values across p packets, the local window W^{p,k} of size w (set to 5) centered on packet p is defined as:

W^{p,k} = \mathrm{sort}\left( \left\{ A^{p - \lfloor w/2 \rfloor}_k, \ldots, A^{p + \lfloor w/2 \rfloor}_k \right\} \right), (3)

\text {median}(W^{p,k}) = W^{(p,k)}_{\left \lfloor w/2 \right \rfloor }, (4)

\text {MAD}(W^{p,k}) = \text {median}(|W^{p,k}_{i} - \text {median}(W^{p,k})|) \quad \forall i, \; 1 \le i \le w, (5)
where W^{p,k} denotes the vector containing the w neighboring data packets centered at packet p, sorted in ascending order, for the k-th subcarrier. An amplitude value is classified as an outlier if its deviation from the local median exceeds a fixed threshold. Specifically, any value outside the range:

\text {limit}_{p,k} = \text {median}(W^{p,k}) \pm \xi \cdot \text {MAD}(W^{p,k}), \label {eq:mad_calc} (6)

with ξ set to 3, is considered an outlier and removed.
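A minimal sketch of this filtering step, assuming a per-subcarrier 1-D amplitude series across packets; the common Hampel variant that replaces an outlier with the local median is used here in place of outright removal:

```python
import numpy as np

def hampel_filter(a, w=5, xi=3.0):
    """Clean a 1-D amplitude series (one subcarrier across packets) using
    the local median and the Median Absolute Deviation, Eqs. (3)-(6)."""
    a = a.copy()
    half = w // 2
    for p in range(half, len(a) - half):          # edge packets are left as-is
        window = a[p - half:p + half + 1]
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        if np.abs(a[p] - med) > xi * mad:         # outside limit_{p,k}, Eq. (6)
            a[p] = med                            # replace outlier with median
    return a
```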

Phase Sanitization: Signal phase represents the temporal shift of a signal. It is calculated as the arctangent of the ratio between the imaginary and real parts of the CFR:

P^{(\theta ,\gamma )}_{k} = \tan ^{-1} \left ( \frac {\text {img} (H^{(\theta ,\gamma )}_{k})}{\text {real}(H^{(\theta ,\gamma )}_{k})}\right ). (7)

To remove phase shifts caused by imperfect synchronization between the transmitter and receiver hardware, we apply a standard linear phase sanitization technique. The estimated phase ∠Ĥ(f)_k at frequency f from the CSI measurements is expressed as:

\angle \hat{H}(f)_{k} = \angle H(f)_{k} + 2\pi \frac{m_k}{N} \Delta t + \beta + Z, (8)

where ∠H(f)_k is the actual phase, Δt is a time offset due to delays in signal arrival and reception, β is an unknown phase offset, and Z is a noise term. Since the delay term is a linear function of the subcarrier index m_k, the phase slope a and offset b can be estimated as:

a = \frac {\angle \hat {H}(f)_{K} - \angle \hat {H}(f)_{1}}{m_K - m_1}, (9)

b = \frac {1}{K} \sum _{k=1}^{K} \angle \hat {H}(f)_{k}. (10)

Therefore, the calibrated phase ∠H′(f)_k for each subcarrier k ∈ K can be estimated by subtracting this linear term from the raw phase:

\angle H'(f)_{k} = \angle \hat{H}(f)_{k} - a m_k - b. (11)
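The sanitization in Eqs. (9)-(11) amounts to fitting a line over the subcarrier index and subtracting it. A minimal sketch, assuming the raw per-subcarrier phases and indices m_k are available as arrays:

```python
import numpy as np

def sanitize_phase(raw_phase, m):
    """Linear phase sanitization, Eqs. (9)-(11).
    raw_phase: measured phase per subcarrier, shape (K,)
    m:         subcarrier indices m_k, shape (K,)"""
    phase = np.unwrap(raw_phase)                 # remove 2*pi wrap-around jumps
    a = (phase[-1] - phase[0]) / (m[-1] - m[0])  # slope, Eq. (9)
    b = phase.mean()                             # offset, Eq. (10)
    return phase - a * m - b                     # calibrated phase, Eq. (11)
```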

3.2 Data Augmentation

To enhance model sensitivity and overall robustness against noise and minor signal fluctuations, we apply several data augmentation techniques during training. These transformations are performed on the extracted amplitude features rather than directly on the raw CSI data. For each amplitude entry, one augmentation is applied with a 90% probability, leaving the remaining 10% unmodified. The first augmentation adds Gaussian noise n(t) ∼ N(0, σ²) to the amplitude value A^{(θ,γ)}_k(t) at each time step t, where σ = 0.02, simulating realistic signal fluctuations and improving generalization in noisy environments. The second augmentation scales the amplitude by a random factor uniformly sampled in [0.9, 1.1], modeling small variations in signal strength due to environmental or device-related factors. Finally, a time shift offsets the amplitude sequence forward or backward by a random integer t′ ∈ [−5, 5] within a sequence of length P = 100. Any value shifted outside the sequence bounds is replaced with the mean amplitude of the original signal, simulating delays or de-synchronizations in signal acquisition.

3.3 Deep Neural Network Architecture

In the proposed pipeline, a DNN is designed to generate a biometric signature from the processed CSI features. The architecture is composed of an Encoder Module (Me) and a Signature Module (Ms), as shown in Figure 1.

Fig. 1: Overview of the proposed framework. The system takes an input signal (e.g., person-sensing data) and processes it through an encoder that extracts meaningful latent representations. These features are passed to a signature model that computes a compact signature vector s. To ensure consistency and comparability, the output signature is normalized via l2 normalization. The resulting signature serves as a unique identifier for the individual based on the input signal characteristics.

Encoder Module: The encoder module produces a fixed-size vector that captures signature-relevant information from the provided CSI measurements. It extracts a low-dimensional encoding of the high-dimensional, sequential input, namely the amplitude or phase extracted from the CSI measurements of the wireless channel while a specific person is present between transmitter and receiver. This work evaluates three types of encoder architectures suited to sequential data: an LSTM encoder, a Bi-LSTM encoder, and the encoder part of a Transformer model:

1. LSTM Encoder: LSTMs capture temporal dependencies in input sequences, enabling the model to recognize recurrent patterns. The LSTM encoder consists of l stacked layers, where the output of the i-th layer is passed to the (i+1)-th layer. Dropout layers with probability pd are interleaved between LSTM layers to improve robustness and reduce overfitting during training. The final hidden state H^l from the last LSTM layer serves as the encoded output.
2. Bi-LSTM Encoder: Bi-LSTMs capture correlations between time steps by processing the input sequence in both forward and backward directions, allowing the model to use context from both past and future time steps. As in the LSTM encoder, l stacked Bi-LSTM layers with interleaved dropout layers are used to avoid overfitting. The last hidden states from the forward and backward passes are concatenated to form the output encoding H^l.
3. Transformer Encoder: The encoder from the Transformer architecture can detect correlations between elements at distant time steps in the input sequence. The encoder contains l identical layers, each with a multi-head self-attention sub-layer and a position-wise feed-forward sub-layer. Standard, non-trainable sinusoidal positional encodings are added to the input embeddings to retain sequence-order information. Residual connections and layer normalization are applied after each sub-layer, and a dropout layer with drop probability pd is used between encoder layers as a regularization technique. The output of the final Transformer layer acts as the encoded representation (a minimal sketch of this variant is given after the list).
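As referenced above, here is a minimal PyTorch sketch of the Transformer variant. The input dimension (3 antennas × 114 subcarriers = 342) matches the NTU-Fi setup described later, while d_model, the head count, and the mean pooling over time are illustrative assumptions rather than the paper's exact choices:

```python
import math
import torch
import torch.nn as nn

class CSITransformerEncoder(nn.Module):
    """Sketch of the Transformer-based encoder: fixed sinusoidal positional
    encodings plus l stacked self-attention layers with dropout pd."""
    def __init__(self, in_dim=342, d_model=128, n_heads=4, l=1, pd=0.1, max_len=2000):
        super().__init__()
        self.proj = nn.Linear(in_dim, d_model)
        # Non-trainable sinusoidal positional encodings.
        pe = torch.zeros(max_len, d_model)
        pos = torch.arange(max_len).unsqueeze(1).float()
        div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
        pe[:, 0::2] = torch.sin(pos * div)
        pe[:, 1::2] = torch.cos(pos * div)
        self.register_buffer("pe", pe)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dropout=pd, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=l)

    def forward(self, x):                     # x: (batch, P, in_dim)
        h = self.proj(x) + self.pe[: x.size(1)]
        h = self.encoder(h)                   # (batch, P, d_model)
        return h.mean(dim=1)                  # pooled fixed-size vector (assumption)
```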

Signature Module: The Signature Module takes the fixed-size vector produced by the encoder and generates the final biometric signature. It consists of a linear layer and an l2 normalization function. The linear layer is a fully connected layer that maps the encoder output to the desired s-dimensional signature space. The normalization function then rescales the output vector to unit l2 norm. This ensures that the signatures lie on a hypersphere, which simplifies the similarity computations used in the loss function and thus speeds up the training phase.

3.4 Loss Function

The training phase requires a loss function that pulls signatures from the same person close together in the embedding space while pushing apart signatures from different people. While contrastive loss and triplet loss operate on pairs or triplets, they may not effectively exploit all available negative samples. To this end, the pipeline uses the in-batch negative loss [8], which is widely used in retrieval tasks. During training, a custom batch sampler constructs batches composed of two lists of samples, a query list Bq = {Xi}, i = 1, …, N, and a gallery list Bg = {Xj}, j = 1, …, N, where the Xi are CSI measurements and N is the batch size. The i-th sample in Bq and the j-th sample in Bg belong to the same person if and only if i = j. The entire batch with both Bq and Bg is fed into the DNN, yielding two lists of biometric signatures, Sq = DNN(Bq) and Sg = DNN(Bg). A similarity matrix sim(q, g) of size N × N is then computed between query and gallery signatures using cosine similarity. Due to the l2 normalization in the Signature Module, this simplifies to the dot product:

sim(q, g) = S_q · S^T_g. (12)

In the similarity matrix shown in Figure 2, diagonal elements indicate similarities between each query signature and its corresponding positive gallery signature (same person), while off-diagonal elements correspond to negative pairs (different people). We apply cross-entropy loss across each row to maximize diagonal (positive) scores and minimize off-diagonal (negative) ones. For each query Sq,i, the softmax-normalized row is encouraged to peak at the i-th position. This drives the matrix toward an identity structure, promoting separation between individuals and clustering of same-person signatures.
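Because the target structure is the identity matrix, the loss reduces to a cross-entropy over rows of the similarity matrix with the row index as the target class. A minimal sketch under that formulation:

```python
import torch
import torch.nn.functional as F

def in_batch_negative_loss(s_q, s_g):
    """s_q, s_g: l2-normalized signatures of shape (N, s), where row i of the
    query and gallery batches belongs to the same person. Cosine similarity
    reduces to a dot product; cross-entropy pushes the matrix toward identity."""
    sim = s_q @ s_g.T                                    # (N, N), Eq. (12)
    targets = torch.arange(s_q.size(0), device=s_q.device)
    return F.cross_entropy(sim, targets)                 # maximize diagonal scores
```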

Fig. 2: Example of the similarity matrix used by the in-batch negative loss function.

Table 1: Results of each model on the NTU-Fi test set.

Model        Rank-1          Rank-3          Rank-5          mAP
LSTM         0.777 ± 0.032   0.897 ± 0.014   0.933 ± 0.005   0.568 ± 0.010
Bi-LSTM      0.845 ± 0.045   0.934 ± 0.022   0.958 ± 0.013   0.612 ± 0.026
Transformer  0.955 ± 0.013   0.981 ± 0.006   0.991 ± 0.000   0.884 ± 0.012

4 Experimental Results and Discussion


4.1 Dataset
Experiments are conducted on the NTU-Fi dataset [14, 16]. This dataset was created for Wi-Fi sensing applications and includes samples for both Human Activity Recognition (HAR) and Human Identification (HID). We use only the HID part to evaluate person Re-ID. The dataset collects CSI measurements of 14 different subjects. For each subject, 60 samples were collected while they performed a short walk inside the designated test area. The samples were collected in three different scenarios: subjects wearing only a T-shirt; a T-shirt and a coat; and a T-shirt, coat, and backpack, respectively. The data were recorded using two TP-Link N750 routers. The transmitter router has a single antenna, while the receiver has three. CSI amplitude data were collected across 114 subcarriers per antenna pair and recorded over 2000 packets per sample. As a result, each sample has a dimensionality of 3 × 114 × 2000. The publicly available dataset provides only the amplitude values already extracted from the CSI, with no access to the original complex CSI matrices. The dataset is pre-divided into training and test sets, containing 546 and 294 samples respectively. To allow for evaluation during training, a 3-fold cross-validation strategy is employed, using an 80% training and 20% validation split within each fold.
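For illustration, one plausible way to arrange such a sample for a sequence encoder (packets as time steps, with the antenna and subcarrier axes flattened into features) is sketched below; the file name and layout are hypothetical, as the paper does not prescribe a specific featurization:

```python
import numpy as np

# One NTU-Fi HID sample: 3 RX antennas x 114 subcarriers x 2000 packets of
# CSI amplitudes (file name is hypothetical).
sample = np.load("ntufi_sample.npy")          # shape (3, 114, 2000)
seq = sample.reshape(3 * 114, 2000).T         # (2000 packets, 342 features)
```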

4.2 Implementation Details


We train our models on an AMD Ryzen 7 processor with 8 cores (16 virtual cores), 64 GB of RAM, and an NVIDIA GeForce RTX 3090 GPU with 24 GB of VRAM.

Table 2: Performance comparison of different models with and without amplitude filtering. Metrics reported are Rank-1 accuracy and mean Average Precision (mAP). The results highlight the impact of amplitude filtering on retrieval performance across LSTM, Bi-LSTM, and Transformer-based models.

              Without filter                  With filter
Model         Rank-1          mAP             Rank-1          mAP
LSTM          0.777 ± 0.032   0.568 ± 0.010   0.755 ± 0.038   0.587 ± 0.018
Bi-LSTM       0.845 ± 0.045   0.612 ± 0.026   0.786 ± 0.036   0.675 ± 0.018
Transformers  0.955 ± 0.013   0.884 ± 0.012   0.930 ± 0.025   0.851 ± 0.035

Table 3: Effect of varying packet sizes on model performance. Results are reported for
LSTM and Transformer architectures across different packet counts (100 to 2000), using
Rank-1, Rank-3, Rank-5 accuracy, and mean Average Precision (mAP) as evaluation
metrics. The table illustrates how performance trends shift with input granularity for
both model types.

Model Packets Rank-1 Rank-3 Rank-5 mAP
LSTM 100 0.805 ± 0.050 0.918 ± 0.029 0.939 ± 0.022 0.597 ± 0.002
LSTM 200 0.777 ± 0.032 0.897 ± 0.014 0.933 ± 0.005 0.568 ± 0.010
LSTM 500 0.777 ± 0.065 0.906 ± 0.028 0.939 ± 0.017 0.592 ± 0.040
LSTM 1000 0.794 ± 0.048 0.991 ± 0.019 0.947 ± 0.011 0.592 ± 0.046
LSTM 2000 0.799 ± 0.029 0.915 ± 0.019 0.943 ± 0.013 0.579 ± 0.028
Transformers 100 0.952 ± 0.021 0.983 ± 0.006 0.990 ± 0.005 0.871 ± 0.041
Transformers 200 0.955 ± 0.013 0.981 ± 0.006 0.991 ± 0.000 0.884 ± 0.012
Transformers 500 0.937 ± 0.020 0.976 ± 0.012 0.984 ± 0.011 0.840 ± 0.033
Transformers 1000 0.960 ± 0.013 0.984 ± 0.005 0.988 ± 0.001 0.896 ± 0.020
Transformers 2000 0.960 ± 0.014 0.982 ± 0.011 0.990 ± 0.008 0.850 ± 0.054

The models are implemented using the PyTorch framework. For training, 300 epochs are performed for each model with a batch size of 8. The Adam optimizer [9] is used with an initial learning rate of 0.0001, and a StepLR learning rate scheduler decreases the learning rate by a factor of 0.95 every 50 epochs.
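A sketch of this training configuration in PyTorch; `model` and `loader` are placeholders for any of the three networks and the custom query/gallery batch sampler, and the loss is the in_batch_negative_loss sketched in Sect. 3.4:

```python
import torch

# Optimizer and scheduler as described above.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.95)

for epoch in range(300):
    for batch_q, batch_g in loader:       # custom batch sampler, batch size 8
        loss = in_batch_negative_loss(model(batch_q), model(batch_g))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()                      # lr *= 0.95 every 50 epochs
```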

4.3 Person Re-Identification Evaluation


To evaluate the performance of our Re-ID model, mean Average Precision (mAP) is used together with Rank-k accuracy, defined as follows:

\text {Rank}(k) = \frac {1}{N} \sum _{i=1}^{N} \delta (r_i \le k), (13)

where N is the number of queries, r_i is the rank of the correct match for the i-th query, and δ(·) is the indicator function; Rank-k thus gives the probability of finding the target subject among the top k predicted identities. The results obtained during the tests are shown in Table 1.

Table 4: Impact of data augmentation on model performance. Comparison of Rank-1 accuracy and mean Average Precision (mAP) for LSTM, Bi-LSTM, and Transformer models, evaluated with and without data augmentation.

              Without augmentation            With augmentation
Model         Rank-1          mAP             Rank-1          mAP
LSTM          0.777 ± 0.032   0.568 ± 0.010   0.808 ± 0.038   0.587 ± 0.018
Bi-LSTM       0.845 ± 0.045   0.612 ± 0.026   0.889 ± 0.017   0.668 ± 0.016
Transformers  0.955 ± 0.013   0.884 ± 0.012   0.949 ± 0.014   0.860 ± 0.043

Table 5: Evaluation of encoder type and layer depth on performance. Rank-1, Rank-3,
Rank-5 accuracy, and mean Average Precision (mAP) are reported for LSTM, BiLSTM,
and Transformer models with 1 and 3 encoder layers.

Model Layers Rank-1 Rank-3 Rank-5 mAP
LSTM 1 0.777 ± 0.032 0.897 ± 0.014 0.933 ± 0.005 0.568 ± 0.010
LSTM 3 0.822 ± 0.026 0.909 ± 0.004 0.941 ± 0.004 0.585 ± 0.001
Bi-LSTM 1 0.845 ± 0.045 0.934 ± 0.022 0.958 ± 0.013 0.612 ± 0.026
Bi-LSTM 3 0.825 ± 0.042 0.919 ± 0.012 0.955 ± 0.003 0.632 ± 0.043
Transformers 1 0.955 ± 0.013 0.981 ± 0.006 0.991 ± 0.000 0.884 ± 0.012
Transformers 3 0.919 ± 0.028 0.970 ± 0.008 0.984 ± 0.003 0.658 ± 0.026

As demonstrated, the model using the Transformer encoder outperforms both the LSTM and Bi-LSTM variants, achieving 95.5% Rank-1 accuracy and an mAP of 88.4%. The self-attention mechanism of the Transformer makes it more accurate and robust at capturing the discriminative, long-range temporal patterns within the Wi-Fi amplitude sequences relevant for Re-ID, compared to the LSTM-based models.
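For reference, a minimal sketch of the two metrics under the batch convention used here (query i matches gallery i); treating each query as having a single correct match, so that AP reduces to the reciprocal rank, is a simplifying assumption rather than the paper's exact protocol:

```python
import numpy as np

def rank_k(sim, k):
    """sim: (N, N) similarity matrix where query i's true match is gallery i.
    Returns the fraction of queries whose true match appears in the top k."""
    order = np.argsort(-sim, axis=1)      # gallery indices by descending similarity
    ranks = np.array([np.where(order[i] == i)[0][0] for i in range(len(sim))])
    return float(np.mean(ranks < k))      # Eq. (13)

def mean_ap(sim):
    """With one correct gallery match per query, AP is the reciprocal of the
    true match's (1-indexed) rank; mAP is its mean over all queries."""
    order = np.argsort(-sim, axis=1)
    ranks = 1 + np.array([np.where(order[i] == i)[0][0] for i in range(len(sim))])
    return float(np.mean(1.0 / ranks))
```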

4.4 Ablation Study

Regarding amplitude filtering, Table 2 shows that models trained without the amplitude filtering pre-processing step achieved better performance. This suggests that the filtering process may have inadvertently removed useful signal variations essential for learning highly discriminative biometric signatures. As for data augmentation, Table 4 indicates that the applied transformations improved generalization for both LSTM and Bi-LSTM architectures. In contrast, the Transformer encoder did not benefit significantly, although it consistently outperformed the other two models even without augmentation. With respect to packet size, Table 3 reveals that LSTM performance remained mostly stable or slightly degraded with longer sequence lengths, likely due to vanishing-gradient issues and limited context modeling. Conversely, the Transformer benefited from extended input sequences, thanks to its self-attention mechanism that allows efficient modeling of long-range dependencies. Only LSTM and Transformer models were evaluated in this experiment, due to the increased computational cost associated with longer inputs. Finally, we compared shallow (1-layer) and deeper (3-layer) variants of each encoder in Table 5. The Transformer achieved its best performance with a single layer, as deeper configurations led to overfitting and optimization instability. For LSTM and Bi-LSTM models, stacking layers resulted in marginal performance gains but introduced slower convergence and reduced training stability. These findings reinforce the overall robustness and efficiency of the Transformer encoder within the proposed framework.

5 Conclusion
In this paper, we presented a pipeline to address the problem of person Re-ID using Wi-Fi CSI. The proposed approach leverages a DNN that generates biometric signatures from CSI-derived features. These signatures are then compared to a gallery of known subjects to perform re-identification through similarity matching. We evaluated three encoder architectures, LSTM, Bi-LSTM, and Transformer, on the publicly available NTU-Fi dataset, with the Transformer-based model delivering the best overall performance. By applying a unified and reproducible pipeline to a public benchmark, this work establishes a valuable baseline for future research in CSI-based person re-identification. The encouraging results confirm the viability of Wi-Fi signals as a robust and privacy-preserving biometric modality, and position this study as a meaningful step forward in the development of signal-based Re-ID systems.

Acknowledgements. This work was supported by the “Smart unmannEd AeRial vehiCles for Human l

References
1. Avola, D., Cascio, M., Cinque, L., Fagioli, A., Petrioli, C.: Person re-identification through Wi-Fi extracted radio biometric signatures. IEEE Transactions on Information Forensics and Security 17, 1145–1158 (2022). https://doi.org/10.1109/TIFS.2022.3158058
2. Davies, L., Gather, U.: The identification of multiple outliers. Journal of the American Statistical Association 88(423), 782–792 (1993)
3. Duan, P., Diao, X., Cao, Y., Zhang, D., Zhang, B., Kong, J.: A comprehensive survey on Wi-Fi sensing for human identity recognition. Electronics 12(23) (2023)
4. Feng, Z., Lai, J., Xie, X.: Learning modality-specific representations for visible-infrared person re-identification. IEEE Transactions on Image Processing 29, 579–590 (2019)
5. Hermans, A., Beyer, L., Leibe, B.: In defense of the triplet loss for person re-identification. arXiv preprint arXiv:1703.07737 (2017)
6. Hou, R., Ma, B., Chang, H., Gu, X., Shan, S., Chen, X.: VRSTC: Occlusion-free video person re-identification. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 7176–7185 (2019). https://doi.org/10.1109/CVPR.2019.00735
7. Jalali, A., Mallipeddi, R., Lee, M.: Sensitive deep convolutional neural network for face recognition at large standoffs with small dataset. Expert Systems with Applications 87, 304–315 (2017)
8. Karpukhin, V., Oguz, B., Min, S., Lewis, P.S., Wu, L., Edunov, S., Chen, D., Yih, W.t.: Dense passage retrieval for open-domain question answering. In: EMNLP (1). pp. 6769–6781 (2020)
9. Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. In: 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA (2015)
10. Li, W., Zhu, X., Gong, S.: Harmonious attention network for person re-identification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. pp. 2285–2294 (2018)
11. Oguchi, K., Maruta, S., Hanawa, D.: Human positioning estimation method using received signal strength indicator (RSSI) in a wireless sensor network. Procedia Computer Science 34, 126–132 (2014). https://doi.org/10.1016/j.procs.2014.07.066
12. Sun, X., Zheng, L.: Dissecting person re-identification from the viewpoint of viewpoint. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 608–617 (2019). https://doi.org/10.1109/CVPR.2019.00070
13. Sun, Y., Zheng, L., Yang, Y., Tian, Q., Wang, S.: Beyond part models: Person retrieval with refined part pooling (and a strong convolutional baseline). In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 480–496 (2018)
14. Wang, D., Yang, J., Cui, W., Xie, L., Sun, S.: CAUTION: A robust WiFi-based human authentication system via few-shot open-set gait recognition. IEEE Internet of Things Journal 9 (2022). https://doi.org/10.1109/JIOT.2022.3156099
15. Xin, T., Guo, B., Wang, Z., Li, M., Yu, Z.: FreeSense: Indoor human identification with WiFi signals (2016)
16. Yang, J., Chen, X., Zou, H., Lu, C.X., Wang, D., Sun, S., Xie, L.: SenseFi: A library and benchmark on deep-learning-empowered WiFi human sensing. Patterns 4(3), 100703 (2023). https://doi.org/10.1016/j.patter.2023.100703
17. Yang, Z., Zhou, Z., Liu, Y.: From RSSI to CSI: Indoor localization via channel response. ACM Computing Surveys (CSUR) 46(2), 1–32 (2013)
18. Zeng, Y., Pathak, P.H., Mohapatra, P.: WiWho: WiFi-based person identification in smart spaces. In: 2016 15th ACM/IEEE International Conference on Information Processing in Sensor Networks (IPSN). pp. 1–12 (2016). https://doi.org/10.1109/IPSN.2016.7460727
19. Zheng, Z., Yang, X., Yu, Z., Zheng, L., Yang, Y., Kautz, J.: Joint discriminative and generative learning for person re-identification. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2133–2142 (2019). https://doi.org/10.1109/CVPR.2019.00224
20. Zhou, S., Wang, F., Huang, Z., Wang, J.: Discriminative feature learning with consistent attention regularization for person re-identification. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 8040–8049 (2019)
