Automated Threat Objects Detection With Synthetic Data For Real-Time X-Ray Baggage Inspection
Abstract— With the recent surge in threats to public safety, the security focus of several organizations has moved towards enhanced intelligent screening systems. Conventional X-ray screening, which relies on the human operator, is the best use of this technology, allowing for the more accurate identification of [...]

[...] non-threat objects. It is a challenging process as it incorporates the two most important tasks: image classification and localization of objects with bounding box coordinates. The advent of deep learning techniques using Convolutional Neural [...]
2021 International Joint Conference on Neural Networks (IJCNN) | 978-1-6654-3900-8/21/$31.00 ©2021 IEEE | DOI: 10.1109/IJCNN52387.2021.9533928
Authorized licensed use limited to: University of Technology Sydney. Downloaded on October 01, 2024 at 10:14:25 UTC from IEEE Xplore. Restrictions apply.
II. RELATED WORKS

This section briefly addresses previous methodologies for X-ray security imagery and approaches for generating synthetic data. Security screening has become a major concern for organizations across the world due to a massive increase in the influx and efflux of passengers, goods, and cargo over the past decades. Owing to this recent surge of interest in the field of X-ray security imagery [4], conventional machine learning approaches for classification [5][6][7], segmentation [8][9], and detection [10] have been extensively studied to automate the process. A majority of previous classification work was carried out using the bag of visual words (BoVW) [5] approach with an SVM classifier, and sparse representations [6]. A transfer-learning based approach [11] was proposed for threat object classification. Hu et al. [12] proposed the Security X-ray Multi-label Classification Network (SXMNet) to deal with overlapping objects in X-ray image classification. However, object detection is more beneficial than classification for the detailed analysis of the contents of X-ray baggage imagery: automated localization with bounding box dimensions and identification of threat objects assists security officers in averting human-made disasters. Franzel et al. [13] conducted experiments on multiple-view X-ray imagery and compared it to single-view detection, demonstrating superior detection efficiency for handguns. With the advent of deep learning in object detection, CNN-based frameworks replaced approaches based on hand-crafted features. Akcay and Breckon [14] introduced region-based object detection methodologies, namely Faster RCNN [15], for this purpose, but the analysis does not take into account the speed of the detector, which is important for rapid threat detection. Dhiraj and Jain [16] conducted experiments with different object detection frameworks on X-ray scans of the GRIMA X-ray Database (GDXray) proposed by Mery et al. [3], covering instances of guns, shuriken, razor blades, and knives.

However, training deep learning models becomes challenging due to the restricted availability of private security screening datasets and the skewed ratio of non-threatening to threatening objects. Different strategies have been adopted to compensate for the data scarcity in X-ray security imagery by generating synthetic datasets using Generative Adversarial Networks (GANs) [17][18][19], Threat Image Projection [10], and the Logarithmic X-ray Imaging Model [20]. Yang et al. [17] and Zhu et al. [18] used GANs that are limited to generating synthetic instances of foreground objects such as handguns and knives, ignoring complex baggage information. These approaches do not take into consideration cluttered background situations, including noise, or the use of non-threatening objects. Most of the recent literature on GANs [21][22] revolves around learning a mapping from a single image source, i.e. image translation, which restricts full control over rotation, scaling, radiopacity, occlusion, etc. This paper generates simulated X-ray images using real background and object instances, with operations that allow full user control over rotation, scaling, radiopacity, occlusion, etc. on the foreground object to preserve both realism and domain adaptation.

III. METHODOLOGY

Due to the lack of available data, the proposed approach (AXSD) aims to generate accurate synthesized images from an existing dataset, GDXray. The generated images are evaluated using different object detection methods.

A. AXSD: Automated X-ray Synthetic Data Generation

This section presents automated X-ray synthetic data generation (AXSD) to minimize the empirical limits imposed by the need for enormous training data in deep-learning-based object detection. AXSD consists of two components: data collection and object superposition.

i. Data Collection

In the first component, X-ray images of empty baggage and instances of threat/non-threat objects are randomly sampled to perform the experiments in this paper. These object instances are extracted using the methodology shown in Fig. 1.

Edge Detection: As the first step, for foreground/background segmentation, we use a cellular nonlinear network-based [23] edge detector to predict a fine edge-map from the foreground X-ray image. The network takes three arguments: the input image, the number of iterations, and a combination of 19 parameters defined as the cloning template. These parameters are iteratively adjusted so that the network's output converges to the output of an ideal edge detector. After template learning, the optimised detector slides over the pixels, computing operations to evaluate edges. The predicted edge-map is the union of the edge sets detected along the x and y directions, respectively.

Noise Removal: A pre-processing technique such as median blur is applied to the predicted edge-map to eliminate noise. This step is necessary before contour detection, which generates a hierarchy of contours.

Foreground Extraction: Further, the graph-based GrabCut algorithm [24] is applied to the contour with the largest area to extract the foreground mask, which in our case belongs to the instance of the threat/non-threat item.

Fig. 1. Illustration of the proposed foreground/background methodology for the extraction of the instances of prohibited items. (Pipeline: input instance → edge detection using a cellular neural network → median blur + contour detection → GrabCut on the contour with the largest area → foreground mask.)
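As a rough illustration of the three extraction steps above, the sketch below uses plain NumPy. It is not the paper's implementation: thresholded image gradients stand in for the learned cellular-network edge detector, and a bounding-box fill over the largest connected edge component stands in for the GrabCut refinement; all function names are ours.

```python
import numpy as np

def median_blur(img, k=3):
    """Noise removal: k x k median filter."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.empty_like(img)
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            out[i, j] = np.median(padded[i:i + k, j:j + k])
    return out

def edge_map(img, thresh=0.2):
    """Edge detection: union of the edge sets along the x and y directions
    (a simple stand-in for the cellular-network edge detector)."""
    gx = np.abs(np.diff(img, axis=1, prepend=img[:, :1]))
    gy = np.abs(np.diff(img, axis=0, prepend=img[:1, :]))
    return (gx > thresh) | (gy > thresh)

def largest_component(binary):
    """Mask of the largest 8-connected True region
    (the 'contour with the largest area')."""
    h, w = binary.shape
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, cur = 0, 0, 0
    for si in range(h):
        for sj in range(w):
            if binary[si, sj] and labels[si, sj] == 0:
                cur += 1
                labels[si, sj] = cur
                stack, size = [(si, sj)], 0
                while stack:
                    i, j = stack.pop()
                    size += 1
                    for di in (-1, 0, 1):
                        for dj in (-1, 0, 1):
                            ni, nj = i + di, j + dj
                            if (0 <= ni < h and 0 <= nj < w
                                    and binary[ni, nj] and labels[ni, nj] == 0):
                                labels[ni, nj] = cur
                                stack.append((ni, nj))
                if size > best_size:
                    best_label, best_size = cur, size
    return labels == best_label

def extract_foreground(img):
    """Pipeline: noise removal -> edge map -> largest contour -> coarse mask.
    (GrabCut [24] would refine this region into a tight object mask.)"""
    edges = edge_map(median_blur(img))
    comp = largest_component(edges)
    ys, xs = np.nonzero(comp)
    mask = np.zeros(img.shape, dtype=bool)
    mask[ys.min():ys.max() + 1, xs.min():xs.max() + 1] = True
    return mask
```

On a toy image containing one bright object plus salt noise, the median blur removes the isolated noise pixel and the returned mask covers the object's bounding region.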
Fig. 2. A detailed explanation of our approach for a) Data collection (sample background image; sample random threat and non-threat items; foreground extraction) and b) Object superposition (radiopacity, rotation, scaling, occlusion).
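The object-superposition operations summarized in Fig. 2b can be sketched in plain NumPy as follows. This is a simplified illustration, not the paper's code: names are ours, rotation is reduced to 90-degree steps (the paper uses arbitrary angles in [-180, 180]), and radiopacity is modeled as simple alpha blending.

```python
import numpy as np

rng = np.random.default_rng(0)

def iou(a, b):
    """Intersection over Union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def rescale(img, s):
    """Scaling: nearest-neighbour resize by a factor s in (0, 1]."""
    h, w = img.shape
    nh, nw = max(1, int(h * s)), max(1, int(w * s))
    ri = np.arange(nh) * h // nh
    ci = np.arange(nw) * w // nw
    return img[np.ix_(ri, ci)]

def blend(bg, fg, mask, y, x, radiopacity):
    """Radiopacity: alpha-blend fg into bg; a higher value makes the
    object less radiopaque (more transparent), as in Fig. 3."""
    out = bg.copy()
    h, w = fg.shape
    region = out[y:y + h, x:x + w]
    region[mask] = radiopacity * region[mask] + (1 - radiopacity) * fg[mask]
    return out

def superimpose(bg, items, max_iou=0.75, tries=20):
    """Paste (fg, mask) items at random positions, capping pairwise box
    overlap at IoU <= max_iou (the Occlusion rule)."""
    out, boxes = bg.copy(), []
    for fg, mask in items:
        s = rng.uniform(0.5, 1.0)                      # Scaling
        fg = rescale(fg, s)
        mask = rescale(mask.astype(float), s) > 0.5
        k = int(rng.integers(4))                       # Rotation (90-deg steps)
        fg, mask = np.rot90(fg, k), np.rot90(mask, k)
        h, w = fg.shape
        for _ in range(tries):
            y = int(rng.integers(0, bg.shape[0] - h + 1))
            x = int(rng.integers(0, bg.shape[1] - w + 1))
            box = (x, y, x + w, y + h)
            if all(iou(box, b) <= max_iou for b in boxes):
                alpha = rng.uniform(0.2, 0.8)          # Radiopacity
                out = blend(out, fg, mask, y, x, alpha)
                boxes.append(box)
                break
    return out, boxes
```

Pasting two dark 8x8 "objects" onto a bright 32x32 "baggage" background returns the composite and the placed bounding boxes, with every pairwise overlap at most IoU 0.75.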
ii. Object Superposition

The extracted foreground items are superimposed randomly within a small section of the luggage in the background X-ray images. Contour detection is used to determine the boundary regions of the luggage. Due to the limited data, data augmentation techniques are applied to the foreground items to generate synthetic images. The main steps of the proposed method are shown in Fig. 2b and described as follows:

Scaling: A random scale with values ranging from 0.5 to 1 is chosen based on the size of the foreground object and the background luggage image.

Rotation2D: Different 2D viewpoints of the same object, with an angle of rotation from -180 to 180 degrees, are possible inside the baggage, as the position of the object does not matter.

Non-Threat Items: Multiple distractors are superimposed onto the baggage X-ray image to simulate real-world scenarios and to prevent bias in the training algorithm.

Radiopacity: The relative inability of electromagnetic radiation to pass through a material varies with the thickness and form of the material. These conditions are simulated by adjusting the alpha channel as shown in Fig. 3.

Fig. 3. The radiopacity parameter is adjusted to intermediate values between 0 and 1. The higher the value, the less radiopaque the object will be.

Occlusion: Partial overlapping, with a maximum Intersection over Union (IoU) of 0.75 between different threat items and non-threat items, is used when superimposing them on the background image to generate a genuine simulated image. The use of occlusion together with the radiopacity principle on both threat objects and distractors increases the X-ray image's scene realism while maintaining global consistency with the existing dataset. A summary of the proposed framework is shown in Fig. 4.

Fig. 4. Overview of our proposed approach. We first use object superimposition to generate realistic data samples. Then, we use a combination of real and generated synthetic data for the object detection pipeline.

B. Evaluate the synthetic data using object detection methods

We evaluated different object detection strategies, following the two conventional categories of object detection systems: one evaluates regional proposals and then classifies each proposal as a threat or non-threat element; the other uses a unified regression/classification framework to evaluate bounding-box coordinates and classification probabilities jointly.

i. Region Proposal based Architectures

Faster RCNN, proposed by Ren et al. [15], substitutes the selective search algorithm with the Region Proposal Network (RPN). It assigns a score to each anchor box, of varying size and scale, generated by sliding a fixed-sized window
over the activation maps generated by the base network's last convolutional layer. After applying an ROI pooling layer, the generated proposals are passed to the RCNN head, which uses two fully connected layers to output the bounding box and to classify whether a suspicious entity is present. Any redundant detection (overlapping bounding box) is further eliminated using non-maximum suppression (NMS).

Cascade RCNN [25], developed by Cai et al., is a multi-staged framework based on the popular RCNN. At each stage, cascaded bounding box regression combined with a sequence of detectors operates by refining the bounding boxes to be more selective against the close false positives used in the next stage of training. This reduces the risk of overfitting during training and overcomes a quality mismatch at inference, ensuring higher-quality detection than Faster RCNN when items are placed next to one another in the luggage.

ii. Classification/Regression based Architectures

The Single Shot MultiBox Detector (SSD) [26] by Liu et al. uses a unified framework for both localizing and classifying objects, eliminating the need for a proposal stage. It also resolves the issue of low-resolution object detection within the YOLO [27] framework by making predictions for bounding boxes and their labels at each feature map, faster than Faster RCNN [15]. Feature maps of varying scales allow the SSD to detect objects of different scales and sizes.

RetinaNet [28] addresses the class imbalance (foreground/background) problem faced by unified object detection frameworks with a reconfiguration of the cross-entropy loss. Focal loss decreases the share of high-probability detections that can otherwise dominate the model's performance. It incorporates a feature pyramid network, a regression subnetwork that evaluates bounding box coordinates using a regression head, and a classification subnetwork that allocates classes to these boxes.

IV. EXPERIMENT RESULTS AND DISCUSSIONS

In this section, a comprehensive study is carried out on different combinations of synthesized data and human-annotated data using region-based and classification/regression-based CNN frameworks. Average precision (AP) is used as the evaluation criterion for the different frameworks, and precision and recall are determined by thresholding the Intersection-over-Union (IoU) of detections. The dataset is split in a ratio of 70:30 for training and testing, respectively. The number of epochs is fixed to 50 and the batch size to 4. The PyTorch framework is used for all the experiments. We first describe the characteristics of the dataset. We then report the results of our method using different proportions of synthetic images mixed
Table 1. Comparison of the GDXray dataset and the GDXray + synthetic dataset based on average precision (AP) metrics, using architectures based on the two-stage framework and the one-stage framework.

Model (Backbone)         Dataset   Mobile  Chip   Shuriken  Revolver  Gun    Knife  Blade  mAP
Faster RCNN (ResNet50)   GDXray    0.907   0.907  0.917     0.996     0.909  0.905  0.909  0.921
                         10% syn   0.910   0.924  0.925     0.979     0.899  0.847  0.913  0.913
                         20% syn   0.915   0.925  0.931     0.979     0.902  0.873  0.920  0.920
                         30% syn   0.915   0.919  0.932     0.981     0.909  0.898  0.921  0.925
                         40% syn   0.903   0.908  0.931     0.999     0.952  0.948  0.921  0.937
                         50% syn   0.917   0.915  0.929     0.968     0.909  0.901  0.922  0.923
Cascade RCNN (ResNet50)  GDXray    0.908   0.904  0.909     0.995     0.906  0.907  0.913  0.920
                         10% syn   0.891   0.891  0.902     0.984     0.960  0.899  0.901  0.918
                         20% syn   0.899   0.902  0.903     0.989     0.969  0.903  0.901  0.922
                         30% syn   0.907   0.901  0.914     0.991     0.959  0.937  0.911  0.931
                         40% syn   0.904   0.907  0.929     0.999     0.961  0.954  0.923  0.939
                         50% syn   0.901   0.906  0.911     0.999     0.961  0.939  0.913  0.932
SSD512 (VGG16)           GDXray    0.899   0.888  0.897     0.941     0.928  0.834  0.907  0.889
                         10% syn   0.894   0.857  0.897     0.901     0.914  0.817  0.878  0.879
                         20% syn   0.903   0.833  0.901     0.898     0.925  0.851  0.879  0.884
                         30% syn   0.912   0.878  0.915     0.906     0.918  0.863  0.901  0.899
                         40% syn   0.903   0.903  0.907     0.964     0.907  0.859  0.909  0.907
                         50% syn   0.904   0.903  0.897     0.901     0.906  0.858  0.909  0.896
RetinaNet (ResNet50)     GDXray    0.905   0.893  0.909     0.953     0.933  0.842  0.909  0.906
                         10% syn   0.908   0.875  0.906     0.907     0.904  0.864  0.874  0.891
                         20% syn   0.912   0.884  0.913     0.903     0.931  0.844  0.918  0.900
                         30% syn   0.925   0.915  0.912     0.922     0.926  0.856  0.924  0.911
                         40% syn   0.911   0.907  0.929     0.970     0.907  0.877  0.919  0.917
                         50% syn   0.915   0.916  0.915     0.927     0.926  0.871  0.927  0.913
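The focal loss that RetinaNet (Sec. III-B) uses to down-weight easy, high-probability detections can be written in a few lines. This NumPy sketch uses the commonly published form with γ = 2 and α = 0.25; these defaults are assumptions on our part, not values stated in this paper.

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t).

    p: predicted foreground probabilities in (0, 1)
    y: binary labels (1 = foreground, 0 = background)
    With gamma = 0 and alpha = 0.5 this reduces to half the usual
    cross-entropy; for gamma > 0, easy examples contribute far less."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)            # prob of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)

# An easy, confident background anchor (p = 0.01 for a negative) contributes
# almost nothing, while a hard one (p = 0.9) dominates the loss:
easy = focal_loss(np.array([0.01]), np.array([0]))
hard = focal_loss(np.array([0.9]), np.array([0]))
assert hard[0] / easy[0] > 1000
```

This is exactly the mechanism the text describes: the (1 − p_t)^γ factor shrinks the share of the many easy background anchors so they cannot dominate training.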
Table 2. Comparison results of the architectures trained on the synthetic dataset, with 10% and 20% of the GDXray dataset used as test samples.
Table 3. Comparison of average precision (AP) on the GDXray + extended synthetic dataset, consisting of additional classes of prohibited items.
with real images. Finally, we compare our proposed method with the state-of-the-art.

A. Datasets

The Grima X-ray dataset (GDXray) was collected by Mery et al. [3]. The dataset consists of five groups of X-ray scans with more than 21,100 images: castings, welds, baggage, nature, and settings. For this research work, we used the baggage group, consisting of 1371 X-ray images of prohibited objects. The prohibited items cover five classes, i.e., shuriken, knives, blades, firearms (guns and revolvers), and electronic items (chips and mobiles). Chips and mobiles are taken into consideration as they are restricted in high-risk environments such as defense areas.

In this work, an exhaustive set of experiments is conducted to choose the best-performing synthetic dataset. For this purpose, we evaluate the effectiveness of combining a fixed ratio of the synthetic dataset with the real dataset. We synthesized two types of datasets of the same size as the GDXray dataset, using the same generation procedure discussed earlier, and randomly took samples in ratios of 50%, 40%, 30%, 20%, and 10% from the first and 30%, 40%, and 50% from the second. The first synthetic dataset is composed of synthetically generated instances of only the classes from the GDXray dataset, while the second (extended) synthetic dataset additionally

B. Results

As shown in Table 1 and Table 3, multiclass detection is performed, and the performance is compared in terms of average precision (AP) for each class of prohibited item using different architectures, i.e., Faster RCNN, Cascade RCNN, and RetinaNet configured with ResNet50, and SSD512 configured with VGG16 [29]. Table 1 investigates and compares the GDXray dataset and the GDXray + synthetic dataset, and Table 3 evaluates the effectiveness of the GDXray + extended synthetic dataset, comprising ten classes of prohibited items including three additional classes. Table 1 shows that the networks trained on GDXray + synthetic samples achieve a performance gain of up to ~2% over the GDXray dataset. As shown in Table 1, the combination of GDXray and 40% synthetic data resulted in better and more robust performance than a network trained on GDXray alone, for all the architectures.

Moreover, the performance of the models is significant, as shown in Table 2, when using a synthetic dataset as the training set and the real dataset as the test set. We used the synthetic dataset for training and randomly chose 10% and 20% of GDXray as the test set. One of the crucial factors observed was the domain
adaptation when using the synthetic dataset. The results show that the generalization learned by the model trained on only synthetic data overcomes the issue of domain adaptation on the test set. The second (extended) synthetic dataset consists of three additional classes, namely pliers, scissors, and hammers; the GDXray + extended synthetic dataset thus contains the largest number of classes of prohibited items available.

Table 3 shows that the two-stage frameworks, namely Faster RCNN and Cascade RCNN, reported higher mAP with GDXray + 40% of the extended synthetic dataset, while the one-stage frameworks achieved better mAP with GDXray + 50% of the extended synthetic dataset. This result is attributed to the increased amount of data for the classes pliers, scissors, and hammers, which otherwise resulted in inefficiently trained networks. Also, the source background image and target foreground image for the additional classes of the extended synthetic dataset were obtained using a different X-ray film scanner. Nonetheless, the networks achieved superior performance over the base synthetic dataset + GDXray. The Cascade RCNN architecture with ResNet50 achieved the best performance on the GDXray + 40% extended synthetic dataset, with a mAP of 0.933, while SSD512 performed the worst with a mAP of 0.885. A maximal AP of 0.910 for the class gun was observed with RetinaNet. The visual results for the different object detection strategies on the GDXray and synthesized datasets are shown in Fig. 6.

Fig. 6. Performance of the different threat detection pipelines on exemplar images from a) the GDXray dataset and b) the synthesized dataset. It was observed that the frameworks achieved superior results for cluttered scenes, occluded objects, and illumination changes.

Table 4 shows the results obtained by omitting different settings of our synthetic data generation process. A minor decrease in mAP for all the architectures was observed with the exclusion of the radiopacity principle. However, the decrease in mAP was significant when augmentation techniques such as rotation and scaling were not applied. The results also emphasize the importance of distractors, which include non-threat objects, to force the network to learn to ignore nearby patterns.

Table 4. Comparison of average precision (AP) obtained by omitting different settings on the GDXray + 40% extended synthetic dataset.

Model          No Radiopacity  No Augmentation  No Non-Threat Object  All Operations
Faster RCNN    0.931           0.899            0.926                 0.937
Cascade RCNN   0.928           0.902            0.924                 0.939
SSD512         0.899           0.879            0.896                 0.907
RetinaNet      0.911           0.891            0.910                 0.917

Further, we evaluate and compare the different object detectors on the GDXray + 40% extended synthetic dataset using the precision-recall curve. Comparing the region-based detectors and the classification/regression-based detectors, both Faster RCNN and Cascade RCNN outperformed SSD and RetinaNet in terms of mAP. We also evaluate and report the inference speed of the different models using an Nvidia RTX 2080 Ti on the GDXray dataset in Table 5. SSD512 based on VGG16 outperformed Faster RCNN, Cascade RCNN, and RetinaNet in speed; however, Faster RCNN, at 21.8 fps, surpassed RetinaNet in both accuracy and speed, which is crucial for security screening. Although Cascade RCNN is comparable to Faster RCNN in terms of mAP, it is not feasible to use due to its low inference speed. Also, it was observed that training on the synthesized dataset is challenging due to the diverse nature of the prohibited objects against diverse cluttered backgrounds, which enabled the network trained on this dataset to generalize better.

Table 5. Comparison of different models in terms of inference speed on the GDXray dataset using an Nvidia RTX 2080 Ti with batch size 4.

Model          Backbone  Inf (fps)
Faster RCNN    ResNet50  21.8
Cascade RCNN   ResNet50  13.7
SSD512         VGG16     26.1
RetinaNet      ResNet50  16.5

C. Comparison with generative models

The proposed method, AXSD, is compared with state-of-the-art generative adversarial models [21][22]. The officially released code is used to produce the results of the GAN models.

BigGAN [22] is a class-conditional GAN model for high-fidelity natural image synthesis. The model emphasizes the benefits of scaling up both the batch size and the model size for high-quality image generation. However, there is a lack of
diversity in the generated images, which is attributed to the fact that the model truncates large values of the noise vector to improve image quality [30].

SinGAN [21] is an unconditional image generation technique that uses a single image for serial training. The model consists of a pyramid of fully convolutional GANs that is trained using a multi-resolution, multi-stage approach. It uses the internal patch distribution within the single image to generate diversified samples. However, this method does not consider the relationship between two images, thus ignoring distribution variations between the images.

Here, we use the Fréchet Inception Distance (FID) [31] as a measure of image quality. It compares the statistics of generated samples to those of ground-truth samples. A lower FID score indicates better model performance, i.e., generated images more similar to the real images. As shown in Table 6, the proposed method outperforms the state-of-the-art GAN-based methods with the lowest FID score. The GAN-based techniques are also resource-intensive and less amenable to automated, controlled synthesis for complex data.

Table 6. Comparison with the state-of-the-art GAN-based methods. Lower FID indicates better image quality.

Method        FID
BigGAN [22]   334.3
SinGAN [21]   80.6
Ours          57.9
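For reference, the FID scores in Table 6 come down to a Gaussian-to-Gaussian distance between feature sets: ||μ₁ − μ₂||² + Tr(Σ₁ + Σ₂ − 2(Σ₁Σ₂)^½). A minimal NumPy sketch, operating on precomputed Inception features (which we do not recompute here), might look like this:

```python
import numpy as np

def _sqrtm_psd(a):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    w, v = np.linalg.eigh(a)
    return v @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ v.T

def fid(x, y):
    """Frechet (Inception) Distance between two feature sets (rows = samples).

    Uses Tr(sqrtm(S1 @ S2)) = Tr(sqrtm(S1^1/2 @ S2 @ S1^1/2)), so only
    symmetric square roots are needed."""
    mu1, mu2 = x.mean(axis=0), y.mean(axis=0)
    s1 = np.cov(x, rowvar=False)
    s2 = np.cov(y, rowvar=False)
    s1h = _sqrtm_psd(s1)
    covmean = _sqrtm_psd(s1h @ s2 @ s1h)
    return float(np.sum((mu1 - mu2) ** 2) + np.trace(s1 + s2 - 2.0 * covmean))
```

Two identical feature sets give FID ≈ 0, and a pure mean shift of 2 in each of d dimensions gives 4d, matching the closed form.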
V. CONCLUSIONS

This paper proposes a novel approach for generating synthetic datasets under different settings to address the scarcity of data in the X-ray imaging domain. The proposed approach generates highly realistic, diversified synthetic X-ray scans at lower cost and time, complementing human-annotated datasets for training robust neural networks. The performance of combinations of the GDXray and synthesized datasets is validated through an extensive set of experiments on two different detection pipelines. The results obtained are superior to those of models trained on the real dataset alone. This paper also demonstrates that the performance of models trained purely on the synthetic dataset is comparable to that of models trained on a mixture of the real and synthesized datasets. In future, the proposed approach can be extended to other existing classification and detection problems for further improvements in the efficiency and robustness of the trained networks. It can also aim to generate realistic multi-view radiographs in the medical domain and incorporate few-shot learning techniques to reduce dependency on huge amounts of data.

REFERENCES

[1] S. Akcay and T. Breckon, "Towards Automatic Threat Detection: A Survey of Advances of Deep Learning within X-ray Security Imaging," Jan. 2020, arXiv:2001.01293.
[2] S. Akcay, M. E. Kundegorski, M. Devereux, and T. P. Breckon, "Transfer learning using convolutional neural networks for object classification within X-ray baggage security imagery," in 2016 IEEE International Conference on Image Processing (ICIP), Sep. 2016, pp. 1057–1061, doi: 10.1109/ICIP.2016.7532519.
[3] D. Mery, V. Riffo, U. Zscherpel, G. Mondragón, I. Lillo, I. Zuccar, et al., "GDXray: The Database of X-ray Images for Nondestructive Testing," J. Nondestruct. Eval., vol. 34, no. 4, pp. 1–12, Dec. 2015, doi: 10.1007/s10921-015-0315-7.
[4] D. Mery, D. Saavedra, and M. Prasad, "X-Ray Baggage Inspection With Computer Vision: A Survey," IEEE Access, vol. 8, pp. 145620–145633, 2020, doi: 10.1109/ACCESS.2020.3015014.
[5] M. Baştan, "Multi-view object detection in dual-energy X-ray images," Mach. Vis. Appl., vol. 26, no. 7–8, pp. 1045–1060, Nov. 2015, doi: 10.1007/s00138-015-0706-x.
[6] D. Mery, E. Svec, and M. Arias, "Object Recognition in Baggage Inspection Using Adaptive Sparse Representations of X-ray Images," 2016, pp. 709–720.
[7] D. Mery, V. Riffo, I. Zuccar, and C. Pieringer, "Object recognition in X-ray testing using an efficient search algorithm in multiple views," Insight - Non-Destructive Test. Cond. Monit., vol. 59, no. 2, pp. 85–92, Feb. 2017, doi: 10.1784/insi.2017.59.2.85.
[8] M. Singh and S. Singh, "Image segmentation optimisation for x-ray images of airline luggage," in Proceedings of the 2004 IEEE International Conference on Computational Intelligence for Homeland Security and Personal Safety (CIHSPS 2004), pp. 10–17, doi: 10.1109/CIHSPS.2004.1360198.
[9] G. Heitz and G. Chechik, "Object separation in x-ray image sets," in 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Jun. 2010, pp. 2093–2100, doi: 10.1109/CVPR.2010.5539887.
[10] N. Bhowmik, Q. Wang, Y. F. A. Gaus, M. Szarek, and T. P. Breckon, "The Good, the Bad and the Ugly: Evaluating Convolutional Neural Networks for Prohibited Item Detection Using Real and Synthetically Composited X-ray Imagery," Sep. 2019, arXiv:1909.11508.
[11] D. Mery, E. Svec, M. Arias, V. Riffo, J. M. Saavedra, and S. Banerjee, "Modern Computer Vision Techniques for X-Ray Testing in Baggage Inspection," IEEE Trans. Syst. Man, Cybern. Syst., vol. 47, no. 4, pp. 682–692, Apr. 2017, doi: 10.1109/TSMC.2016.2628381.
[12] B. Hu, C. Zhang, L. Wang, and Q. Zhang, "Multi-label X-ray Imagery Classification via Bottom-up Attention and Meta Fusion."
[13] T. Franzel, U. Schmidt, and S. Roth, "Object Detection in Multi-view X-Ray Images," 2012, pp. 144–154.
[14] S. Akcay and T. P. Breckon, "An evaluation of region based object detection strategies within X-ray baggage security imagery," in 2017 IEEE International Conference on Image Processing (ICIP), Sep. 2017, pp. 1337–1341, doi: 10.1109/ICIP.2017.8296499.
[15] S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks," Jun. 2015, arXiv:1506.01497.
[16] Dhiraj and D. K. Jain, "An evaluation of deep learning based object detection strategies for threat object detection in baggage security imagery," Pattern Recognit. Lett., vol. 120, pp. 112–119, Apr. 2019, doi: 10.1016/j.patrec.2019.01.014.
[17] J. Yang, Z. Zhao, H. Zhang, and Y. Shi, "Data Augmentation for X-Ray Prohibited Item Images Using Generative Adversarial Networks," IEEE Access, vol. 7, pp. 28894–28902, 2019, doi: 10.1109/ACCESS.2019.2902121.
[18] Y. Zhu, H. Zhang, J. An, and J. Yang, "GAN-based data augmentation of prohibited item X-ray images in security inspection," Optoelectron. Lett., vol. 16, no. 3, pp. 225–229, May 2020, doi: 10.1007/s11801-020-9116-z.
[19] D. Saavedra, S. Banerjee, and D. Mery, "Detection of threat objects in baggage inspection with X-ray images using deep learning," Neural Comput. Appl., 2020, doi: 10.1007/s00521-020-05521-2.
[20] D. Mery and A. K. Katsaggelos, "A Logarithmic X-Ray Imaging Model for Baggage Inspection: Simulation and Object Detection," in 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Jul. 2017, pp. 251–259, doi: 10.1109/CVPRW.2017.37.
[21] T. R. Shaham, T. Dekel, and T. Michaeli, "SinGAN: Learning a Generative Model From a Single Natural Image," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Oct. 2019, pp. 4569–4579, doi: 10.1109/ICCV.2019.00467.
[22] A. Brock, J. Donahue, and K. Simonyan, "Large Scale GAN Training for High Fidelity Natural Image Synthesis," Sep. 2019, arXiv:1809.11096.
[23] L. O. Chua and L. Yang, "Cellular neural networks: theory," IEEE Trans. Circuits Syst., vol. 35, no. 10, pp. 1257–1272, Oct. 1988, doi: 10.1109/31.7600.
[24] C. Rother, V. Kolmogorov, and A. Blake, "'GrabCut,'" ACM Trans. Graph., vol. 23, no. 3, p. 309, Aug. 2004, doi: 10.1145/1015706.1015720.
[25] Z. Cai and N. Vasconcelos, "Cascade R-CNN: High Quality Object Detection and Instance Segmentation," Jun. 2019, arXiv:1906.09756.
[26] W. Liu et al., "SSD: Single Shot MultiBox Detector," Dec. 2015, doi: 10.1007/978-3-319-46448-0_2.
[27] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You Only Look Once: Unified, Real-Time Object Detection," Jun. 2015, arXiv:1506.02640.
[28] T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollár, "Focal Loss for Dense Object Detection," Aug. 2017, arXiv:1708.02002.
[29] K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," Sep. 2014, arXiv:1409.1556.
[30] A. Razavi, A. van den Oord, and O. Vinyals, "Generating Diverse High-Fidelity Images with VQ-VAE-2," Jun. 2019, arXiv:1906.00446.
[31] M. Heusel, H. Ramsauer, T. Unterthiner, B. Nessler, and S. Hochreiter, "GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium," Jun. 2017, arXiv:1706.08500.