Measurement
journal homepage: www.elsevier.com/locate/measurement
Keywords: Risk assessment; Joint detection; Data enhancement; Generative adversarial networks; YOLOv8

Abstract: X-ray security imaging technology is vital for public safety as passenger traffic increases, resulting in a heavier workload for security personnel and heightened security risks. To address this challenge and improve inspection efficiency, this paper introduces a joint detection risk assessment algorithm leveraging data enhancement and facial expression analysis of X-ray images of prohibited items. The authors propose a Generative Adversarial Network with a Two-Stage Attention Mechanism (TSAM-GAN) to generate images of common contraband, such as guns, knives, lighters, and toxic liquids. By fusing X-ray images of contraband with package images, this study achieves effective data enhancement. Utilizing facial expressions from the CK+ dataset, particularly natural and frightened expressions, this study employs You Only Look Once version 8 (YOLOv8) for simultaneous detection of faces and enhanced X-ray images. The results are processed through a risk assessment network. Experiments confirmed the algorithm's effectiveness, with both qualitative and quantitative results showing improvements in data enhancement. Real-time experiments conducted at security checkpoints demonstrated that the proposed algorithm effectively detected and assessed risks based on facial expressions and X-ray security images.
* Corresponding author.
E-mail address: [email protected] (C. Chen).
https://doi.org/10.1016/j.measurement.2025.117422
Received 7 December 2024; Received in revised form 23 March 2025; Accepted 26 March 2025
Available online 1 April 2025
0263-2241/© 2025 Elsevier Ltd. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
X. Yu et al. Measurement 253 (2025) 117422
images necessitates specific datasets for effective contraband detection. Among available public datasets, the SIXray dataset [10] features over 1 million pseudo-color X-ray images, with nearly 9,000 prohibited items manually annotated. The HiXray dataset [11], gathered from real-world security checks at international airports, also offers valuable data. While these datasets provide a solid foundation for detection algorithms, they still exhibit limitations in the diversity and quantity of contraband items, highlighting the need for further data generation. Additionally, passengers' facial expressions during security checks can provide insights into potential concerns. Expressions such as panic, nervousness, calmness, and happiness may indicate underlying issues. Research in facial expression detection has also gained traction. For example, [12] explored self-supervised contrastive learning in facial expression recognition, presenting strategies to enhance expression-specific representations while minimizing the impact of identity and facial style. Furthermore, methods like SimFLE [13] have been proposed to improve facial expression recognition by learning effective facial marker encodings without the need for extensive labeling. Overall, integrating these insights into contraband detection could enhance security measures at checkpoints.

To enhance the detection accuracy of unconventional contraband, this paper introduces a joint detection method that analyzes both the internal contents of packages and the facial microexpressions of their owners. Given the challenges associated with collecting real-world X-ray contraband security images, this research focuses on data enhancement techniques to maximize the utility of limited datasets. Generative adversarial networks (GANs) serve as an effective approach for image generation [14]. GANs consist of two models: a generative model G that learns the data distribution and a discriminative model D that assesses the probability that a given sample originates from the training data rather than from G. The objective of training G is to maximize the likelihood of D making incorrect predictions, which creates a minimax two-player game. In an ideal setting, both models converge such that G perfectly replicates the training data distribution while D outputs a probability of 1/2 for all inputs. When both models are defined using multi-layer perceptrons, the entire system can be trained via backpropagation, eliminating the need for Markov chains or complex inference networks. Numerous scholars have since proposed improvements to GANs, significantly enhancing image generation quality. Variants such as conditional GANs (cGAN) [15], Deep Convolutional GANs (DCGAN) [16], and Wasserstein GANs (WGAN) [17] have been explored for generating X-ray contraband images. These advancements enable the generation of single X-ray images of prohibited items and the application of fusion strategies for effective data augmentation [18,19]. Specific details of these studies will be elaborated in the following chapter.

The proposed algorithm implemented joint detection and risk assessment of facial microexpressions and items within packages, utilizing enhanced X-ray prohibited item security image data. To achieve this, a limited dataset of X-ray security inspection images was established, which included parcel images from various scenarios such as airports and railways. The contraband items were collected in real time through purchases, but acquiring a large volume of X-ray images typically required significant time and resources. To enhance efficiency, a GAN with a hybrid attention mechanism was introduced to facilitate data augmentation of X-ray contraband images. A collection platform was created where cameras captured pedestrian expressions during security inspections, while X-ray images were obtained from security machines. The expression dataset was categorized into abnormal and natural expressions, while the X-ray contraband dataset featured four types of items: knives, guns, lighters, and liquids. The joint detection and risk assessment algorithm employed the YOLOv8 [7] object detection model to identify targets in both facial expression images and X-ray images. Based on the field data collected, an experiment on actual security check work was carried out to verify the algorithm's effectiveness. A risk assessment strategy based on a CNN was used to classify the results, enabling a comprehensive risk evaluation. In summary, the contributions of the study included:

(1) This paper proposed a GAN with a hybrid attention mechanism for X-ray contraband image generation. The generator and discriminator were designed for adversarial training, and a hybrid network module that combined a spatial attention mechanism network and a self-attention mechanism was utilized to generate high-quality single contraband images.

(2) Based on the principle of X-ray imaging, a fusion strategy for X-ray contraband images and package background images was designed to inject contraband into the images, thereby obtaining X-ray security inspection images with contraband and achieving data enhancement.

(3) YOLOv8 was used to jointly detect X-ray security images and facial expression images. Additionally, a risk assessment and classification network was proposed to automatically classify the detection results and realize risk assessment.

2. Related work

The integration of facial expression recognition and X-ray contraband detection for risk assessment represents a technological and theoretical innovation, with no existing literature on the subject. To provide a comprehensive background, this section is organized around three key points: image enhancement, facial expression recognition, and X-ray image contraband detection.

2.1. Data augmentation with generative adversarial networks

The high performance of detection models is heavily reliant on training with a large number of labeled samples. However, obtaining extensive, high-quality annotated data in practical scenarios proved to be challenging, limiting application in specific fields. To address this issue, data augmentation became essential for expanding the sample size. Common methods include image translation, flipping, cropping, and adding noise. While these techniques increase the quantity of samples, they do not enhance data diversity, necessitating the generation of new target images. The advent of GANs [14] offered innovative solutions to these challenges. The DCGAN [15], a variant specifically designed to use CNNs for both the generator and discriminator, enabled the production of high-quality, realistic images.

In the context of X-ray contraband security images, significant research has been conducted. For example, Li Da-shuang (2021) [18] utilized GANs to synthesize X-ray security images containing multiple contraband items from semantic label images. His Res2Net architecture effectively captured multi-scale features, achieving a mean average precision (mAP) of 0.825 when using the Single Shot MultiBox Detector (SSD) for detection. Dongming Liu (2022) [19] proposed an X-ray Wasserstein GAN with gradient penalty to generate various types of prohibited goods images, employing a synthesis strategy to combine generated contraband images with background data. Jian Liu (2023) [20] implemented three GANs to synthesize prohibited X-ray security images. He used a pixel-to-pixel GAN to convert real contraband images into X-ray images, followed by background generation. Evaluating these methods with state-of-the-art object detection algorithms like YOLOv5 demonstrated improvements of 4.6 % in mAP@0.5 and 15.9 % in mAP@0.5–0.95.

2.2. Facial expression detection

Micro-expression recognition is a complex and significant area within affective computing, involving the detection of subtle facial movements that are often imperceptible to the human eye within brief timeframes. Recent advancements in this field have underscored its growing importance. One notable development is Micron-BERT, which utilized diagonal micro-attention (DMA) to identify minute differences between frames [21]. This method incorporated a new Point of Interest
(PoI) module to localize and emphasize micro-expressions while minimizing background noise and interference, achieving high accuracy on an unseen facial micro-expression dataset. Another innovative architecture, LGAttNet, employed a dual-attention network for frame-level automatic detection of micro-expressions [22]. It featured two convolutional neural network modules: a sparse module for initial feature extraction and a feature enhancement module. Experiments conducted with publicly available databases demonstrated that LGAttNet surpassed state-of-the-art methods in both robustness and performance. Additionally, a study focused on high accuracy and interpretability eliminated the global displacement caused by head movement by aligning images based on the nose tip position [23]. It selected fourteen regions of interest to capture subtle facial movements and employed a peak detection technique for precise localization of motion intervals. Evaluations on the CAS(ME)2 and SAMM Long Video databases indicated that this method achieved high accuracy at a relatively low computational cost. Lastly, a new feature refinement method for micro-expression recognition was proposed, based on expression feature learning and fusion [24]. This method consisted of an expression module with an attention mechanism and a classification branch, aiming to extract and predict specific expression features effectively.

2.3. X-ray contraband object detection

A key challenge in X-ray security inspection is detecting contraband in overlapping areas of suitcases. Many existing methods aim to enhance model robustness against object overlap by improving visual information such as color and edges. However, this approach often overlooks situations where objects share similar visual cues with the background or overlap with one another. To address this, [25] proposed an X-ray security image detection algorithm based on an improved YOLOv4 model, focusing on enhancing the recognition accuracy of small contraband items in complex backgrounds. This algorithm incorporates deformable convolution to bolster the model's detection capabilities for small objects. Training and testing on X-ray security datasets demonstrated that the improved algorithm achieves high accuracy and real-time recognition, even amidst complex backgrounds, significant scale changes, and mutual occlusions. Additionally, [26] introduced the CLCXray dataset to further this research and proposed a novel label-aware mechanism (LA) to tackle the object overlap issue. Extensive experiments confirmed that LA effectively and robustly detects overlapping objects, showcasing its generalization ability against state-of-the-art methods. Furthermore, [27] developed a contraband detection algorithm called "TB-YOLOv5," addressing the low accuracy and omission issues associated with small object detection in X-ray datasets. Experimental results indicated a significant performance boost, with improvements of up to 14.9 % over YOLOv5 and 23.4 % in average accuracy at 0.5 IoU, particularly for small objects. Moreover, [28] introduced a scale interaction module to enhance feature perception by allowing interactions between neighboring scales, alongside a cross-image weakly supervised semantic analysis model to differentiate similar and distinct objects. Lastly, [29] proposed the PIXDet model, which integrates feature fusion and local–global semantic dependency interaction, achieving excellent detection results across various datasets, including SIXray, OPIXray, CLCXray, and PIDray.

3. Method

The study introduced a data augmentation algorithm based on generative adversarial networks and a joint detection and risk assessment method using face and security images. Specifically, it included: (1) the construction of a security image dataset; (2) X-ray contraband security image data enhancement based on TSAM-GAN; and (3) a joint detection and risk assessment algorithm based on face and security images.

3.1. Construction of X-ray security image dataset

Common X-ray transmission technologies include single energy penetration technology and dual energy penetration technology. This study utilized dual energy transmission technology. The system employed two transmission modules and generated two ray beams of different energy levels from a single ray source, subsequently producing two independent images. A pseudo-color image was generated by searching a lookup table that mapped each substance to its pseudo-color. Dual energy X-ray projection technology was the main method employed for security image generation. According to the Beer-Lambert law, when an X-ray passes through an object, its energy is attenuated, as described in Equation (1):

$K = K_0 \cdot e^{-\omega(\sigma)}$ (1)

Here, $K_0$ represented the initial X-ray intensity, $K$ the attenuated X-ray intensity, and $\omega$ the attenuation function of the X-ray, which depended on the cross-sectional area of each atom. It can be seen from Equation (1) that when the attenuation function is large, the X-ray attenuation increases and the generated image is darker; when the attenuation function is small, the X-ray attenuation decreases and the generated image is brighter. For a compound, the cross-sectional area of the molecule is the sum over its atoms:

$\sigma = \sum_i \sigma_i$ (2)

The cross-sectional area of each atom was not a constant; it varied with the energy of the X-ray photon. There were three main absorption processes when the X-ray beam interacted with matter: (1) the photoelectric effect; (2) continuous scattering; and (3) discontinuous scattering (Compton scattering). The cross-sectional area at each energy level could be divided into three parts, $\sigma_{pe}$, $\sigma_{cs}$, and $\sigma_{is}$, corresponding to the proportions of the photoelectric effect, continuous scattering, and discontinuous (Compton) scattering, respectively. When the energy was lower than 200 keV, continuous scattering disappeared. For low-energy X-rays with an energy of 60 keV, the photoelectric effect dominated, and the atomic cross-sectional area was determined accordingly. Compton scattering was dominant when the high-energy X-ray energy was 160 keV, and the atomic cross-sectional area was determined accordingly. The intensities of the high-energy and low-energy rays, $K_{high}$ and $K_{low}$, were respectively:

$K_{high} = K_0 \cdot e^{-\omega(\sigma_{is})}$, $K_{low} = K_0 \cdot e^{-\omega(\sigma_{pe})}$ (3)

The final objective was to obtain a false-color image, so this paper also involved generating a false-color image from a low-energy RAW image. The dual energy value $R$ was a crucial parameter in dual energy X-ray transmission technology, as it represented the recognition capability at low energy and high relative atomic number. The calculation formula for $R$ is given in Equation (4):

$R = \ln(K_0/K_{high}) / \ln(K_0/K_{low})$ (4)

When the X-ray penetrated the object, the material distribution of the substance was assumed to be uniform, which could be represented by the effective relative atomic number $Z_{eff}$. By fitting the traditional $R$–$Z_{eff}$ curve, $Z_{eff}$ was calculated as shown in Equation (5):

$Z_{eff} = \lambda_1 \exp(\xi_1 R) - \lambda_2 \exp(-\xi_2 R)$ (5)

Here, $\lambda_i$ and $\xi_i$ (i = 1, 2) are constants depending on the high and low energy values. The dual-energy X-ray pseudo-color image was colorized according to the distribution of the effective relative atomic number $Z_{eff}$, and finally the X-ray security check pseudo-color image was
obtained. The dataset is shown in Fig. 1.

3.2. Data augmentation for X-ray contraband security images

In this paper, a GAN incorporating a two-stage attention mechanism [30,31,32] was proposed to generate synthetic X-ray contraband images. Subsequently, a fusion strategy for combining contraband and background images was designed, based on the principles of X-ray imaging, to produce the final X-ray contraband security images for data augmentation purposes. The algorithm flow of the TSAM-GAN is illustrated in Fig. 2. A noisy image was input into the generator, which then produced a synthetic X-ray contraband image. Both real and synthetic images were fed into the discriminator, which was trained through a true–false game. Ultimately, high-quality X-ray contraband images were obtained.

X-ray images differ from traditional visible light images in that they are transmission images, revealing the placement of items within the scanned package. Consequently, these images contain a wealth of information and often feature numerous overlapping areas. The TSAM generator was structured into six image up-sampling modules. The first five modules comprised an image deconvolution layer, a two-stage attention mechanism module, a batch normalization layer, and a ReLU activation function. The sixth module included an image deconvolution layer and a Tanh activation function, with skip connections established between each layer. Finally, 128×128 target X-ray contraband images were randomly generated from 1×1 noise.

The X-ray security image was composed of a foreground, which represented the item being inspected, and a background, which consisted of the suitcase or package where the item was stored. To focus on the information-rich foreground and occlusion areas, a two-stage attention mechanism module with good performance was incorporated into the network. The structure of this network is shown in Fig. 3. The spatial attention mechanism module performed global and average pooling on the input feature maps and concatenated the results along the channel dimension. Additionally, a two-headed self-attention mechanism module was connected to obtain the context feature information of the X-ray images [33,34,35]. This module established relationships between different parts of the image and effectively extracted the features of the key regions (Fig. 4).

The TSAM discriminator was down-sampled using a CNN and was structured into six image down-sampling modules. The first five modules comprised an image convolutional layer, a batch normalization layer, and a Leaky-ReLU activation function. The sixth module consisted of an image convolutional layer and a Sigmoid activation function. This setup allowed for the assessment of the authenticity of both generated and real images [36,37,38,39,40]. The loss functions of the TSAM discriminator and generator are described by Equations (6) and (7):

$F_D = \mathbb{E}_{x_r \sim P_{real}}[(D(x, x_r) - 1)^2] + \mathbb{E}_{x_f \sim P_{fake}}[(D(G(x, x_f)) - 0)^2]$ (6)

$F_G = \mathbb{E}_{x_f \sim P_{fake}}[(D(G(x, x_f)) - 1)^2] + \mathbb{E}_{x_r \sim P_{real}}[(D(x, x_r) - 0)^2]$ (7)

where $x_r$ represented the real output image, $x_f$ the generated image, and $x$ the input image.

In order to ensure the similarity between the generated X-ray image and the real image, the $F_{L1}$ loss function was added on top of the adversarial loss, as shown in Equation (8):

$F_{L1} = \mathbb{E}_{x, x_r, x_f}[\lVert x_r - G(x, x_f) \rVert_1]$ (8)

The final loss function $L_{TSAM\text{-}GAN}$ is shown in Equation (9):

$L_{TSAM\text{-}GAN} = F_G + \lambda F_{L1}$ (9)

After the X-ray contraband image was generated, it was necessary to fuse it with a package image that did not contain contraband to obtain the final X-ray security image. The fusion was performed according to the X-ray imaging principle. The relationship between the gray value $G$ of the X-ray image and the intensity $K$ of the X-ray was established as follows:

$G = AK + B$ (10)

where $A$ and $B$ were the relevant weights. The grayscale representation of the synthesized image was:

$G_{syn} = AK_{syn} + B$ (11)

According to Equations (10) and (11), the mathematical relationship between the gray value $G_{syn}$ of the synthesized image, the gray value $G_{con}$ of the contraband image, and the gray value $G_{back}$ of the parcel background image can be obtained as follows:

$G_{syn} = (G_{con} - B) \cdot (G_{back} - B)/G_0 + B$ (12)

Here, $G_0$ was the intensity received by the detector when the X-ray beam was idle (no object present). According to Equation (12), the X-ray image containing contraband can be obtained.
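The dual-energy quantities of Section 3.1 (Equations (4)–(5)) and the gray-value fusion rule of Equation (12) can be sketched together as follows. This is a minimal illustration, not the paper's implementation: the fit constants `lam1`, `xi1`, `lam2`, `xi2` and the calibration values `B` and `G0` are placeholder values; in practice they come from fitting the scanner's R–Zeff curve and gray-scale response.

```python
import math
import numpy as np

def dual_energy_R(K0, K_high, K_low):
    """Dual energy value, Eq. (4): R = ln(K0/K_high) / ln(K0/K_low)."""
    return math.log(K0 / K_high) / math.log(K0 / K_low)

def z_eff(R, lam1=1.0, xi1=1.0, lam2=1.0, xi2=1.0):
    """Effective relative atomic number, Eq. (5):
    Zeff = lam1*exp(xi1*R) - lam2*exp(-xi2*R).
    The constants here are hypothetical placeholders, not fitted values."""
    return lam1 * math.exp(xi1 * R) - lam2 * math.exp(-xi2 * R)

def fuse_xray(g_con, g_back, B=0.0, G0=255.0):
    """Gray-value fusion, Eq. (12):
    G_syn = (G_con - B) * (G_back - B) / G0 + B, applied per pixel."""
    g_con = np.asarray(g_con, dtype=float)
    g_back = np.asarray(g_back, dtype=float)
    return (g_con - B) * (g_back - B) / G0 + B

# Source intensity 100 attenuated to 50 (high energy) and 10 (low energy):
R = dual_energy_R(100.0, 50.0, 10.0)  # ln(2)/ln(10), about 0.301
# A fully transparent contraband pixel (gray value G0, with B = 0) leaves
# the parcel background unchanged, as expected for a transmission image:
patch = fuse_xray([255.0, 128.0], [200.0, 200.0])
```

Note that the fusion is multiplicative in shifted gray-value space: darker (more attenuating) contraband pixels darken the parcel background, while fully transparent pixels leave it untouched.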
3.3. Joint detection and risk assessment based on face and security images
Fig. 2. TSAM-GAN.
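The least-squares adversarial losses of Equations (6)–(9), which drive the training loop sketched in Fig. 2, can be written out numerically. This is an illustrative sketch only; the arrays below stand in for the discriminator's scores on real and generated batches, and the weight `lam` is a placeholder since the paper does not state its value.

```python
import numpy as np

def f_d(d_real, d_fake):
    """Discriminator loss, Eq. (6): real scores pushed to 1, fake to 0."""
    d_real, d_fake = np.asarray(d_real), np.asarray(d_fake)
    return float(np.mean((d_real - 1.0) ** 2) + np.mean((d_fake - 0.0) ** 2))

def f_g(d_fake, d_real):
    """Generator loss, Eq. (7) as printed: fake scores pushed to 1."""
    d_fake, d_real = np.asarray(d_fake), np.asarray(d_real)
    return float(np.mean((d_fake - 1.0) ** 2) + np.mean((d_real - 0.0) ** 2))

def f_l1(x_real, x_gen):
    """Reconstruction term, Eq. (8): mean L1 distance between images."""
    return float(np.mean(np.abs(np.asarray(x_real) - np.asarray(x_gen))))

def loss_tsam_gan(d_fake, d_real, x_real, x_gen, lam=1.0):
    """Total generator objective, Eq. (9): L = F_G + lambda * F_L1."""
    return f_g(d_fake, d_real) + lam * f_l1(x_real, x_gen)

# With a perfectly fooled discriminator (all scores 0.5), F_D = F_G = 0.5.
```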
Fig. 3. Two-stage attention mechanism module.

Fig. 4. Joint detection and risk assessment network based on X-ray contraband case images and face images.

$O = CLA[\mathrm{yolov8}(I)]$ (13)

$CLA(x) = \begin{cases} 0, & \text{if } x \text{ is changed} \\ 1, & \text{if } x \text{ is invariable} \end{cases}$ (14)

where $I$ was the input expression image. After being processed by the YOLOv8 network, the image was classified using the CLA (Classification) function to obtain the output result $O$. When there was no change in expression, $O$ was equal to 1, and when the expression changed, $O$ was equal to 0. The classification results are presented in Fig. 5.

For the target detection of X-ray prohibited item security images, the YOLOv8 detection network was also used to locate and detect prohibited items in the image. The length, width, and position of the target detection box were obtained through softmax classification, as illustrated in Fig. 6. The coordinates (x, y) represented the center of the prohibited items detected by YOLOv8, while (h, w) denoted the height and width of the contraband area. Similarly, the cases where contraband was detected were classified by the CLA function and set to 0, and the cases where contraband was not detected were set to 1.

The risk assessment network was composed of an encoding path and a decoding path. The encoding path was used to extract multi-level features at different depths to retain rich information. The decoding path utilized up-sampling and convolution modules to generate high-quality images. The down-sampling part consisted of five convolutional modules. Among these, the first four down-sampling modules each comprised two 3×3 convolutional layers, a batch normalization layer, and a max pooling layer. The fifth down-sampling module comprised two 3×3 convolutional layers, a ReLU activation function, a batch normalization layer, and a residual network. The up-sampling part consisted of four deconvolution layers plus 3×3 convolutional layers, and skip connections [41] were used to combine the up- and
Fig. 6. YOLOv8 detects the location, width, and height of contraband.

down-sampled feature maps, in order to better output the risk assessment results.

4. Experiment

4.1. Experiment deployment

(1) The dataset. This paper primarily investigated two aspects: the enhancement of X-ray contraband safety images and the risk assessment based on facial and X-ray contraband images, which were reflected in the corresponding dataset. The study utilized the 6550 dual-energy X-ray inspection equipment developed by the Shenyang Fault Diagnosis Center of Northeastern University. The collected data were categorized into X-ray single contraband images and X-ray packet images. For the risk assessment involving facial and X-ray contraband images, the researchers employed the publicly available CK+ dataset for facial expression detection, along with a data-enhanced X-ray contraband safety image dataset as experimental support. The CK+ dataset is a comprehensive resource for facial expression recognition, containing numerous images labeled with seven distinct expressions. It includes 593 expression sequences from 123 participants, capturing a range from neutral to peak expressions. Of these sequences, 327 were labeled with emotional tags representing seven basic emotions: anger, contempt, disgust, fear, happiness, sadness, and surprise. In this study, the researchers specifically focused on the data for angry, sad, fearful, and calm expressions. To enhance recognition accuracy, data augmentation was performed on the CK+ dataset [42,43,44], increasing each expression image count to 10,000. The enhancement methods included rotation, scaling, and image blurring, aimed at improving the algorithm's ability to accurately recognize subtle changes in facial expressions.

(2) Experimental process. In the X-ray prohibited item security image data augmentation experiment, the TSAM-GAN algorithm was trained for 100 epochs with a learning rate of 1e-3. The Adam optimizer was used with β1 = 0.9, and the batch size was set to 48. In the face and X-ray prohibited image risk assessment experiments, the epochs were set to 100, the learning rate was 1e-4, and the Adam optimizer was similarly used with β1 = 0.9. The batch size was set to 24. To reduce the training time, two 3060 GPUs were utilized, and the training process took 6 h.

4.2. Data augmentation for X-ray contraband security images

To achieve the risk assessment goal, the data volume of X-ray contraband security images was considered very important. Data augmentation experiments were conducted on these images, divided into single X-ray contraband image generation and X-ray contraband security image fusion. Qualitative and quantitative experiments were carried out on X-ray contraband image generation. The experimental results are introduced and analyzed in detail next.

4.2.1. X-ray contraband image generation qualitative and quantitative experiments

A single X-ray contraband image was generated through the two-stage GAN. Excellent studies from recent years were selected for experimental comparison, and the comparison results are shown in Fig. 7. X-ray images of four types of prohibited goods, including guns, knives, lighters, and harmful liquids, were generated. After comparison, it was observed that the proposed algorithm could generate images with integrity in shape and fewer noise points.

To more clearly and intuitively compare the generation quality of X-ray contraband images, image generation indicators were used to objectively evaluate the images. Fréchet Inception Distance (FID) was chosen as the evaluation index for quantitative analysis [45,46]. FID quantifies the distribution difference between two groups of images in the feature space. A smaller FID value indicates that the feature distribution of the generated images is closer to that of the real images, meaning the generated images are of higher quality. In this study, the FID of the four types of contraband was evaluated separately, and the results were averaged. The FID evaluation results are shown in Table 1. After comparison, it was found that the generation quality for toxic liquid containers was somewhat lower than that of the other prohibited items, while pure solids such as guns and knives were easier to generate. The X-ray contraband images generated by the proposed algorithm had the smallest FID value, indicating that the proposed algorithm generated the highest quality images.

4.2.2. Fusion of X-ray contraband security images

After generating the X-ray contraband images, the contraband items could be injected at any position into an X-ray security image that originally contained no contraband. As shown in Fig. 8, the four categories of generated contraband items (guns, knives, lighters, and toxic liquid containers) were fused [47,48,49] into a parcel image, resulting in a realistic X-ray security inspection image. The experimental results demonstrated that the X-ray prohibited item security images, after data enhancement, could be utilized for subsequent target detection tasks.

4.3. Implementation of joint detection and risk assessment based on face and X-ray contraband images

4.3.1. Test experiment based on joint detection and risk assessment of face and X-ray contraband images

After data enhancement of the X-ray prohibited goods security images, they were jointly detected along with images showing calm and fear expressions from the CK+ dataset [50,51] using YOLOv8. The detection results were then sent as input to the risk assessment network for training, ultimately yielding the risk assessment results. The experimental results are presented in Fig. 9. YOLOv8 was able to accurately identify changes in facial expressions and the presence of contraband in the X-ray security images. In this study, a lighter was chosen as the contraband item due to its small size and difficulty of detection, and it was used to predict danger or safety in conjunction with changes in facial expressions. The risk assessment network, depicted in Fig. 10, predicted the situation inside the final package. It considered the scenario dangerous when there was a change in facial expression and the package contained contraband. Conversely, it judged the situation as
Fig. 7. Comparison of the proposed X-ray contraband image generation algorithm with other algorithms.

safe when the expression was calm and there was no contraband in the package. Table 2 presents the accuracy of the network evaluation for 100 assessments across four categories of contraband. It can be observed that when the contraband was a lighter, the accuracy was slightly lower, but it still remained above 0.95. Therefore, the proposed algorithm was able to achieve accurate risk assessment.

Table 1. Experimental results of FID evaluation (FID per contraband category; lower is better).

| Algorithm | Gun | Knife | Lighter | Fluid container | Average |
| --- | --- | --- | --- | --- | --- |
| Da-shuang L | 25.89 | 17.52 | 89.25 | 105.64 | 59.58 |
| Dongming L | 19.58 | 12.43 | 72.14 | 98.54 | 50.67 |
| Jian Liu | 15.29 | 9.87 | 64.25 | 88.74 | 44.54 |
| Proposed | 11.65 | 7.52 | 50.85 | 85.55 | 38.89 |

4.3.2. Combined detection of face and X-ray contraband images under realistic conditions

This paper applied the proposed algorithm to real-world security check operations and conducted experiments in actual scenarios to validate its effectiveness. Specifically, the authors performed real-time
detection of individuals’ facial expressions and the contents of packages expression changed and the package contained prohibited items, the
in a subway environment. During training, they synthesized contraband scenario was assessed as dangerous; conversely, if the expression
images and package images from real security check scenes. For facial remained unchanged and the package did not contain contraband, it was
expression recognition, this paper selected the publicly available CK + deemed safe.
dataset and enhanced its data for training. In the testing phase, they However, the algorithm also had limitations. In security screening
evaluated X-ray contraband images and facial expression images using situations, if a pedestrian’s face was obscured or the captured image was
individuals and objects in real security scenarios. Participants in the not frontal, there could be errors in expression detection. As illustrated
facial expression recognition experiment were carefully selected pe in Fig. 12, when the pedestrian was in profile or there was hand
destrians whose expressions were likely to change as they passed obstruction, the facial detection bounding box might be misaligned,
through security. Additionally, the packages were ensured not to resulting in incorrect expression recognition. Future research should
contain prohibited items, and contraband was not randomly placed. The focus on enhancing the diversity of facial expression data to better adapt
experimental results were illustrated in Fig. 11. The algorithm effec to more complex and variable application scenarios.
tively detected changes in pedestrians’ facial expressions and identified
whether packages contained contraband. When a pedestrian’s
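The dangerous/safe rule described above can be written as a simple predicate over the two detector outputs. The sketch below is illustrative only: the class names and the hard AND rule are assumptions standing in for the paper's learned risk assessment network, which may score the mixed cases differently.

```python
# Illustrative joint decision rule; the contraband class list and the
# hard AND logic are assumptions, not the paper's learned network.
CONTRABAND_CLASSES = {"gun", "knife", "lighter", "fluid_container"}


def assess_risk(expression, detected_items):
    """Return 'dangerous' when the facial expression deviates from neutral
    AND the X-ray view contains a contraband class; 'safe' when the
    expression is calm and no contraband is found. Mixed cases default
    to 'safe' here, whereas a learned network could grade them."""
    expression_changed = expression != "neutral"
    has_contraband = bool(CONTRABAND_CLASSES & set(detected_items))
    return "dangerous" if expression_changed and has_contraband else "safe"


print(assess_risk("fear", {"lighter", "bottle"}))  # dangerous
print(assess_risk("neutral", {"umbrella"}))        # safe
```

The rule is deliberately conservative: only the conjunction of an expression change and a detected contraband class raises an alarm, mirroring the two conditions stated in the text.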
Fig. 10. Experimental results of joint detection and risk assessment based on facial expression and X-ray contraband security images.
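For reference, the FID scores reported in Table 1 measure the Fréchet distance between Gaussian fits of Inception features of real and generated images: FID = ||mu1 - mu2||^2 + Tr(Sigma1 + Sigma2 - 2(Sigma1 Sigma2)^(1/2)). A minimal NumPy sketch of this computation, assuming the feature means and covariances have already been extracted, is:

```python
import numpy as np


def frechet_distance(mu1, cov1, mu2, cov2):
    """FID between two Gaussians N(mu1, cov1) and N(mu2, cov2).
    Uses Tr((cov1 cov2)^(1/2)) = Tr((C cov2 C)^(1/2)) with C = cov1^(1/2),
    so only symmetric eigendecompositions are needed."""
    vals, vecs = np.linalg.eigh(cov1)
    sqrt_cov1 = vecs @ np.diag(np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T
    inner = sqrt_cov1 @ cov2 @ sqrt_cov1
    inner_vals = np.linalg.eigvalsh(inner)
    tr_covmean = np.sum(np.sqrt(np.clip(inner_vals, 0.0, None)))
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(cov1) + np.trace(cov2)
                 - 2.0 * tr_covmean)


# Identical distributions give 0; shifting one mean by a unit vector gives 1.
mu, cov = np.zeros(2), np.eye(2)
print(frechet_distance(mu, cov, mu, cov))
print(frechet_distance(mu, cov, np.array([1.0, 0.0]), cov))
```

Lower scores mean the generated feature statistics are closer to the real ones, which is why the smaller numbers in Table 1 indicate better generation quality.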
contraband map security check image to achieve data augmentation. In this paper, the CK+ public face dataset was selected as the expression detection data, and the YOLOv8 algorithm was used to jointly detect the enhanced X-ray contraband security images and facial expression data, obtaining detection information on facial expression changes and contraband types. A hazard assessment network with an autoencoder structure was designed, and the jointly detected information was input into it to obtain the hazard assessment information. Finally, real-time detection and risk assessment were carried out on the collected on-site security data. The experiments demonstrated that the proposed algorithm was effective and could be applied to actual security work.

This method represented a new attempt in security inspection. Although joint face and contraband detection and risk assessment could be achieved, there was still room for improvement and further research. The study had only two outcomes, danger and safety, which is relatively coarse. In future studies, it is hoped to refine the danger level according to more complex situations, which would better support the judgment of security personnel. Additionally, beyond facial expressions, there is a plan to add detection of abnormal behaviors such as fighting, pushing, and gathering, so that security work can be more comprehensive. These aspects are therefore intended to be studied in future work.

CRediT authorship contribution statement

Xizhuo Yu: Writing – original draft. Chunyang Chen: Writing – review & editing. Shu Cheng: Data curation. Jingming Li: Software.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work is partially supported by the National Natural Science Foundation of China (52072414) and the Postgraduate Scientific Research Innovation Project of Central South University (2023XQLH067).

Data availability

The data that has been used is confidential.

References

[1] B.G. Ross, D. Jeff, D. Trevor, M. Jitendra, et al., Rich feature hierarchies for accurate object detection and semantic segmentation[J], Comput. Res. Reposit. 2014 (1) (2014) 580–587.
[2] G. Ross, Fast R-CNN[C], 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448.
[3] S. Ren, K. He, B.G. Ross, J. Sun, et al., Faster R-CNN: towards real-time object detection with region proposal networks[J], IEEE Trans. Pattern Anal. Mach. Intell. 39 (6) (2017) 1137–1149.
[4] L. Wei, A. Dragomir, E. Dumitru, S. Christian, R. Scott, F. Cheng-Yang, C.B. Alexander, et al., SSD: single shot MultiBox detector[J], Lect. Notes Comput. Sci. (2016) 21–37.
[5] B. Alexey, W. Chien-Yao, M.L. Hong-Yuan, YOLOv4: optimal speed and accuracy of object detection[J], CoRR (2020).
[6] Z. Fangbo, Z. Huailin, N. Zhen, Safety helmet detection based on YOLOv5[C], 2021 IEEE International Conference on Power Electronics, Computer Applications (ICPECA), 2021, pp. 6–11.
[7] L. Yan, M. Yue, L. Ying, Protocol for assessing neighborhood physical disorder using the YOLOv8 deep learning model[J], STAR Protoc. 5 (1) (2024) 102778.
[8] D. Janhavi, N. Preeth, S. P., K. Soumitra, et al., Threat detection in X-ray images using CNN[J], Int. J. Imaging Robot. 20 (1) (2020) 19–27.
[9] Y. Fenghong, J. Runqing, Y. Yan, X. Jing-Hao, W. Biao, W. Hanzi, et al., Dual-mode learning for multi-dataset X-ray security image detection[J], IEEE Trans. Inform. Forensics Secur. 19 (2024) 3510–3524.
[10] M. Caijing, X. Lingxi, W. Fang, S. Chi, L. Hongye, J. Jianbin, Y. Qixiang, et al., SIXray: a large-scale security inspection X-ray benchmark for prohibited item discovery in overlapping images[J], Comput. Res. Reposit. (2019) 2114–2123.
[11] T. Renshuai, W. Yanlu, J. Xiangjian, L. Hainan, Q. Haotong, W. Jiakai, M. Yuqing, Z. Libo, L. Xianglong, et al., Towards real-world X-ray security inspection: a high-quality benchmark and lateral inhibition module for prohibited items detection[C], IEEE Int. Conf. Comput. Vision (2021) 10903–10912.
[12] S. Yuxuan, G. Xiao, Y. Guang-Zhong, L. Benny, et al., Revisiting self-supervised contrastive learning for facial expression recognition[C], Brit. Mach. Vision Conf. (2022).
[13] M. Jiyong, S. P., SimFLE: simple facial landmark encoding for self-supervised facial expression recognition in the wild[J], IEEE Trans. Affect. Comput. (2024) 1–16.
[14] J.G. Ian, P. Jean, M. Mehdi, X. Bing, W. David, O. Sherjil, C. Aaron, B. Yoshua, et al., Generative adversarial nets[J], Comput. Sci. 29 (5) (2017) 177.
[15] M. Mehdi, O. Simon, Conditional generative adversarial nets[J], Comput. Res. Reposit. (2014).
[16] R. Alec, M. Luke, C. Soumith, Unsupervised representation learning with deep convolutional generative adversarial networks[C], Int. Conf. Learn. Representat. (2015) abs/1511.06434.
[17] H. Md. Nazmul, J. Sana Ullah, K. Insoo, Wasserstein GAN-based digital twin-inspired model for early drift fault detection in wireless sensor networks[J], IEEE Sens. J. 23 (12) (2023) 13327–13339.
[18] L. Da-shuang, H. Xiao-bing, Z. Hai-gang, Y. Jin-feng, et al., A GAN based method for multiple prohibited items synthesis of X-ray security image[J], Optoelectron. Lett. 17 (2) (2021) 112–117.
[19] L. Dongming, L. Jianchang, Y. Peixin, Y. Feng, et al., A data augmentation method for prohibited item X-ray pseudocolor images in X-ray security inspection based on Wasserstein generative adversarial network and spatial-and-channel attention block[J], Comput. Intell. Neurosci. 2022 (2022) 8172466.
[20] L. Jian, T.H.L. Lin, A framework for the synthesis of X-ray security inspection images based on generative adversarial networks[J], IEEE Access 11 (2023) 63751–63760.
[21] N. Xuan-Bac, N.D. Chi, L. Xin, G. Susan, S. Han-Seok, L. Khoa, et al., Micron-BERT: BERT-based facial micro-expression recognition[J], Comput. Res. Reposit. (2023) 1482–1492.
[22] A.T. Madhumita, T. Selvarajah, R. Sutharshan, C. Zenon, X. Min, Y. John, et al., LGAttNet: automatic micro-expression detection using dual-stream local and global attentions[J], Knowl.-Based Syst. 212 (2021) 106566.
[23] H. Yuhong, X. Zhongliang, M. Lin, L. Haifeng, et al., Micro-expression spotting based on optical flow features[J], Pattern Recogn. Lett. 163 (2022) 57–64.
[24] Z. Ling, M. Qirong, H. Xiaohua, Z. Feifei, Z. Zhihong, et al., Feature refinement: an expression-specific feature learning and fusion method for micro-expression recognition[J], Pattern Recogn. 122 (1) (2022) 108275.
[25] Z. Cheng, X. Hui, Y. Bicai, Y. Weichao, Z. Chenwei, et al., X-ray security inspection image detection algorithm based on improved YOLOv4[C], 2021 IEEE 3rd Eurasia Conference on IOT, Communication and Engineering (ECICE), 2021, pp. 546–550.
[26] Z. Cairong, Z. Liang, D. Shuguang, D. Weihong, W. Liang, et al., Detecting overlapped objects in X-ray security imagery by a label-aware mechanism[J], IEEE Trans. Inf. Forensics Secur. 17 (2022) 998–1009.
[27] W. Muchen, Z. Yueming, L. Yongkang, D. Huiping, et al., X-ray small target security inspection based on TB-YOLOv5[J], Secur. Commun. Netw. 2022 (2022) 1–16.
[28] L. Dongsheng, T. Yan, X. Zhaocheng, J. Guotang, et al., Handling occlusion in prohibited item detection from X-ray images[J], Neural Comput. Applic. 34 (22) (2022) 20285–20298.
[29] X. Yan, Z. Qiyuan, S. Qian, L. Yu, et al., PIXDet: prohibited item detection in X-ray image based on whole-process feature fusion and local–global semantic dependency interaction[J], IEEE Trans. Instrum. Meas. 72 (2023) 1–17.
[30] A. Fatemeh, J.N.N. Jean-Francois, P. Sebastian, R. Federico, H. Jörn, D. Andreas, et al., Spatial transformer networks for curriculum learning[J], Comput. Sci. Res. Reposit. abs/2108.09696 (2021) 1–7.
[31] H. Jie, S. Li, S. Gang, Squeeze-and-excitation networks[J], IEEE Trans. Pattern Anal. Mach. Intell. (2018) 7132–7141.
[32] L. Xiang, W. Wenhai, H. Xiaolin, Y. Jian, et al., Selective kernel networks[C], Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2019) 510–519.
[33] L. Tsung-Yu, R. Aruni, M. Subhransu, Bilinear CNN models for fine-grained visual recognition[C], IEEE International Conference on Computer Vision (2015) 1449–1457.
[34] W. Sanghyun, P. Jongchan, L. Joon-Young, S.K. In, CBAM: convolutional block attention module[C], Comput. Vis. – ECCV 2018, Pt VII 11211 (2018) 3–19.
[35] P. Jongchan, W. Sanghyun, L. Joon-Young, S.K. In, et al., BAM: bottleneck attention module[C], British Machine Vision Conference (2018).
[36] L. Xinying, Z. Doudou, X. Jianli, A hybrid model for traffic incident detection based on generative adversarial networks and transformer model[J], Comput. Sci. Res. Reposit. (2024) abs/2403.01147.
[37] S. Jiaze, B. Binod, C. Zhixiang, K. Tae-Kyun, et al., SeCGAN: parallel conditional generative adversarial networks for face editing via semantic consistency[J], arXiv preprint, 2021, abs/2111.09298.
[38] Y. Guanglei, T. Hao, S. Humphrey, D. Mingli, S. Nicu, T. Radu, V.G. Luc, R. Elisa, et al., Global and local alignment networks for unpaired image-to-image translation[J], arXiv preprint, 2021, abs/2111.10346.
[39] W. Chao, N. Wenjie, J. Yufeng, Z. Haiyong, Y. Zhibin, G. Zhaorui, Z. Bing, et al., Discriminative region proposal adversarial network for high-quality image-to-image translation[J], Int. J. Comput. Vis. 128 (10–11) (2019) 2366–2385.
[40] H. Ligong, R.M. Martin, S. Anastasis, T. Yu, G. Ruijiang, K. Asim, M. Dimitris, et al., Dual projection generative adversarial networks for conditional image generation[C], IEEE Int. Conf. Comput. Vis. (2021) 14418–14427.
[41] L. Fenglin, R. Xuancheng, Z. Zhiyuan, S. Xu, Z. Yuexian, et al., Rethinking skip connection with layer normalization in transformers and ResNets[J], arXiv (Cornell University), 2021, abs/2105.07205.
[42] S. Connor, T.M. Khoshgoftaar, A survey on image data augmentation for deep learning[J], J. Big Data 6 (1) (2019).
[43] C. Phillip, M. Hang, V. Nym, D. Jason, H. Lois, H. Annette, et al., A review of medical image data augmentation techniques for deep learning applications[J], J. Med. Imaging Radiat. Oncol. 65 (5) (2021) 545–563.
[44] Z. Chunling, B. Nan, S. Hang, L. Hong, L. Jing, Q. Wei, Z. Shi, et al., A deep learning image data augmentation method for single tumor segmentation[J], Front. Oncol. 12 (2022) 782988.
[45] J. Sadeep, R. Srikumar, V. Andreas, G. Daniel, C. Ayan, K. Sanjiv, et al., Rethinking FID: towards a better evaluation metric for image generation[J], Comput. Res. Reposit. (2023) abs/2401.09603.
[46] K. Tuomas, K. Tero, A. Miika, A. Timo, L. Jaakko, et al., The role of ImageNet classes in Fréchet inception distance[J], ICLR (2023).
[47] D. Xin, Z. Yutong, X. Mai, G. Shuhang, D. Yiping, et al., Deep coupled feedback network for joint exposure fusion and image super-resolution[J], IEEE Trans. Image Process. 30 (2021) 3098–3112.
[48] V. Divya, H. Taimur, D. Ernesto, W. Naoufel, et al., Recent advances in baggage threat detection: a comprehensive and systematic survey[J], ACM Comput. Surv. 55 (8) (2022) 1–38.
[49] À.P. Robin Riz, S. Yanik, S. Adrian, How realistic is threat image projection for X-ray baggage screening?[J], Sensors 22 (6) (2022) 2220.
[50] L. Patrick, F.C. Jeffrey, K. Takeo, M.S. Jason, A. Zara, M. Iain, The extended Cohn-Kanade dataset (CK+): a complete dataset for action unit and emotion-specified expression[C], IEEE Conference on Computer Vision and Pattern Recognition Workshops (2010) 94–101.
[51] S. Rhea, R. Harshit, P. Preksha, T. Ankit, A comparative study of machine learning techniques for emotion recognition[J], Emerg. Res. Comput. Inform. Commun. Appl., Adv. Intell. Syst. Comput. (2019) 459–464.