Deep Learning for Video Anomaly Detection
Review
Review on Deep Learning Approaches for Anomaly Event
Detection in Video Surveillance
Sabah Abdulazeez Jebur 1,2, Khalid A. Hussein 3, Haider Kadhim Hoomod 3, Laith Alzubaidi 4,* and José Santamaría 5
Abstract: In the last few years, due to the continuous advancement of technology, human behavior detection and recognition have become important scientific research topics in the field of computer vision (CV). However, one of the most challenging problems in CV is anomaly detection (AD), because of the complex environment and the difficulty of extracting a particular feature that correlates with a particular event. As the number of cameras monitoring a given area increases, it becomes vital to have systems capable of learning from the vast amounts of available data to identify any potentially suspicious behavior. The introduction of deep learning (DL) has brought new development directions for AD. In particular, DL models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have achieved excellent performance in AD tasks, as well as in other challenging domains like image classification, object detection, and speech processing. In this review, we aim to present a comprehensive overview of research methods that use DL to address the AD problem. Firstly, different classifications of anomalies are introduced; then, the DL methods and architectures used for video AD are discussed and analyzed, respectively. The reviewed contributions are categorized by network type, architecture model, datasets, and the performance metrics used to evaluate these methodologies. Moreover, several applications of video AD are discussed. Finally, we outline the challenges and future directions for further research in the field.

Keywords: deep learning; anomaly detection; human behavior; video surveillance

Citation: Jebur, S.A.; Hussein, K.A.; Hoomod, H.K.; Alzubaidi, L.; Santamaría, J. Review on Deep Learning Approaches for Anomaly Event Detection in Video Surveillance. Electronics 2023, 12, 29.
and reckless driving [18]. It is also used to detect abnormal behavior in specific places, such as petrol stations [19] and elevators [20]. As previously stated, an anomaly is an irregular scene in a particular time and place. For example, a crowd at a market on an average day is regarded as normal, while the same crowd in the same place during a curfew is considered abnormal. Similarly, a crowd at a marathon would not be regarded as an anomaly, but a crowd in front of a building would be. In other words, the definition of anomaly can evolve over time, and the current concept of normal or anomalous behavior might not be properly represented in the future. In addition, challenges such as the diversity of scenarios, noisy videos, the low probability of occurrence of anomalies, and the infrequent and low availability of labelled data for anomalous activity all make AD a challenging task for Artificial Intelligence (AI) [21]. Usually, machine learning (ML) algorithms need reliable features to function properly, in order to both characterize the input data and classify the output results. Therefore, correctly recognizing behaviors relies on well-designed features, which have a direct impact on classification accuracy. Classification accuracy may decrease if feature extraction is based only on empirical experience. Unlike ML, deep learning (DL) uses neural network (NN) models to automatically identify and extract features from input data without requiring separate feature extraction stages. With the help of DL, a specific method is available for classifying data that can scale to enormous amounts of data and incorporate complex features. One of the main benefits of DL is that it eliminates the need for any kind of preprocessing before acquiring feature descriptions. During training, the NN can automatically determine a large number of unknown parameters. Training takes a lot of time; however, the results achieved pay off in the end [22]. Finally, the success of DL methods in tackling the AD problem is due to their ability to extract valuable and complex features from videos using nonlinear transformations. Furthermore, these kinds of methods can detect anomalies in both time and space. Here, the localization method finds each frame that is abnormal and explains which portion of the frame is unusual, while detection focuses on the video fragments that contain anomalies across all videos [1,2]. DL models such as convolutional neural networks (CNNs), auto-encoders (AEs), generative adversarial networks (GANs), and recurrent neural networks (RNNs) have achieved remarkable performance in addressing the AD problem. This paper provides the following main contributions:
• Review of the most relevant state-of-the-art contributions in the last four years dealing
with DL applied to the AD problem.
• Detailed categorization of the existing methods in AD by classifying the approaches
according to the specific DL methods and the adopted architectural models for AD.
• A comprehensive analysis of the DL architectures used in AD has been introduced to
make it easy for a researcher to choose which approach may be more appropriate for
the particular AD application.
• A performance evaluation of the methodologies is discussed in terms of datasets and performance measures.
• A discussion of the current challenges and needs in the domain of DL applicable to
AD is put forth.
• A description of those new trends in DL-based AD are discussed to provide several
interesting ideas to be considered in future research.
This review is organized as follows: Section 2 introduces the classification of anomalies
in video streaming. Section 3 deals with DL methods for AD. Section 4 presents various
architectural models that are utilized for AD. Benchmarked datasets and performance
metrics are reviewed in Sections 5 and 6, respectively. Several applications and research
challenges in DL-based AD approaches are discussed in Sections 7 and 8, respectively. In addition, this paper outlines future directions and research opportunities in Section 9. The conclusions of our work are drawn in Section 10.
Survey Methodology
The majority of the significant research papers reviewed in this article were published during 2019–2022. The main focus was on papers from the most reputed publishers, such as IEEE, Springer, ArXiv, Elsevier, and MDPI. We have reviewed more than 100 papers on the topic of anomaly detection in video. The main keywords used as search criteria for this review were (“Video Anomaly Detection” OR “Video Anomaly Recognition”), (“Abnormal Human Behavior”), (“Deep Learning” AND “Anomaly Detection”), (“CNN” AND “Anomaly Detection”), (“Applications of Video Anomaly Detection”), and (“Challenges in Anomaly Detection”).
3. DL Methods of AD
The nature of the input data is the basic factor that determines which deep neural network (DNN) is employed for the AD task. DL techniques used for AD can be classified into the following categories based on the extent of label availability: (1) supervised video AD, (2) semi-supervised video AD, (3) unsupervised video AD, (4) transfer learning-based video AD, (5) deep active learning-based video AD, (6) deep reinforcement learning-based video AD, and (7) deep hybrid models.
such as CNNs and RNNs. In addition, the CNN category includes VGG16, VGG19, and the YOLO series of algorithms, while the RNN category includes long short-term memory (LSTM) and gated recurrent unit (GRU) algorithms [27,28]. The key advantage of these techniques is the ability to gather data and produce output from prior knowledge. Furthermore, they are simpler and achieve higher performance compared to other DL techniques. On the other hand, the disadvantage of these techniques is that decision boundaries might be overstrained when the training set lacks examples belonging to a class. In addition, they require precise labels for a variety of normal and anomalous classes, which are often not available [25].
model performance; it also tackles the issues of an unbalanced dataset, concept drift (where data patterns are constantly changing), and the high labeling cost [33,34].
4. DL Architectures for AD
A DNN’s architecture specifies its layering, width, depth, and node types. Many network structures have been proposed for extracting features and identifying actions. For behavior recognition in videos, DL networks need to take into account more than just spatial information extraction, as is the case with image-based systems. Without temporal information, the motion of an action cannot be differentiated; for example, the act of opening a door looks similar to that of closing one. Action recognition in video can therefore be improved by making use of temporal motion data, which is clearly connected to video action detection. Table 1 presents state-of-the-art DL methods used for anomaly detection in videos during the last three years. The DL architectures used to find video anomalies can be roughly grouped as follows:
Table 1. State-of-the-art DL methods used for anomaly detection in videos.

| [Ref.], Year | Type of Network | Proposed Architecture | Dataset (Accuracy) | Examples of Anomalies |
|---|---|---|---|---|
| [5], 2020 | CNN | Human skeleton, YOLOv3, multi-scale information fusion network | UCF-101, HMDB51, and camera (96.3%) | Run, fall, fight |
| [4], 2020 | CNN | VGGNet-19 pretrained network, binary SVM | UMN (97.44%), UCSD-Ped1 (86.69%) | Carts, bikers, skateboarders, running, person walking over the grass |
| [17], 2020 | CNN, RNN | Combined CNN-RNN | NAHFE (89.5%) | Drug addiction, autism, criminal mentality |
| [6], 2020 | CNN | Canny edge detection algorithm, 3D-ConvNet | HMDB51 and Hollywood2 (93%) | Climbing, fighting, falling |
| [11], 2020 | CNN, RNN | ConvLSTMs | Hockey (99%), Violent Flow (93.75%), RLV (96.74%) | Violence |
| [18], 2020 | CNN | YOLOv2 | Camera (99.8%) | Reckless driving |
| [21], 2021 | LSTM, AEs | Convolutional AE and sequence-to-sequence LSTM | UMN (87%) | Sudden running |
| [12], 2021 | CNN, GAN | 3D-ConvNet | CUHK Avenue (68.94%), ShanghaiTech (88.26%) | Crime |
| [8], 2021 | CNN | 3D-ConvNet | Behave (91.75%), Caviar (92.86%) | Robbery, fight |
| [10], 2021 | GRU, FFN | Human skeleton, GRU-FFN | ShanghaiTech (82.6%), Avenue (91.7%) | Running, falling down, robbing, fighting |
| [38], 2021 | CNN, LSTM | Human skeleton, ConvLSTM | Weizmann (73.1%), KTH (93.4%), private (86.5%) | Punching, kicking |
| [39], 2021 | RNN | Human skeleton, LSTM, GRU | UR Fall Detection and Fall Detection (98.2%) | Fall |
| [7], 2022 | RNN | LSTM and GRU | Camera (84%) | Fall, fight |
| [13], 2022 | RNN, CNN | 3D-ConvNet, LSTM | RLVS (96.5%), Hockey (97%), Violent Flow (93.2%) | Violence |
| [20], 2022 | CNN | Human skeleton, ConvLSTMs | Camera (85%) | Door blocking, door picking |
| [9], 2022 | CNN | ConvLSTM | Abnormal Activities (97.64%) | Robbery, fight, hijack, harassment |
| [40], 2022 | CNN, LSTM | YOLOv5, ConvLSTM | Hockey fight (93.5%), Cigarette smoker (90%), Playing cards (93.8%) | Smoking, playing cards, fighting |
| [19], 2022 | CNN | YOLOv5 | Private (91%) | Not wearing safety helmet, entering dangerous area, smoking |
| [41], 2022 | CNN, LSTM | ConvLSTM | Abnormal Activities (96.19%) | Begging, drunkenness, fight, harassment, hijack, knife hazard, robbery, terrorism |
| [42], 2022 | CNN | Human skeleton, YOLOv3, VGG16 pre-trained network | Camera (95%) | Walking, hugging, fighting |
4.1. Two-Stream Convolutional Architecture (Dual-Stream CNNs)
Simonyan et al. [43] proposed a two-stream CNN, also known as a dual-stream CNN, to capture spatial and temporal information, respectively. This model contains two networks (as shown in Figure 1) to capture the space and time information of video [5]. One network takes a single-frame image as input and obtains the spatial-domain information by extracting the features hidden in the image, whereas the input of the other network is a certain frame and the n frames behind it in the video; this network is responsible for processing the optical-flow information in the video by stacking consecutive frames to extract temporal features. Finally, the outputs of the two networks are fused to obtain the classification result. Several researchers implemented two-stream CNN architectures for anomaly detection [43–46], and they were shown to produce state-of-the-art results.
Figure 1. Two-stream network architecture [43].
Figure 2. 3D-convolution operation.
4.3. ConvLSTM Architecture
ConvLSTM is a CNN combined with an LSTM network. It is like an LSTM, but convolutional operations are performed during the layer transitions [41]. ConvLSTM operates on time-dependent data such as video. As a result, the network is able to detect temporal and spatial correlations at the local level. ConvLSTM determines the future state of a particular cell in the grid using the inputs and past states of its local neighbors. By combining CNN and LSTM, AD is possible in multiple dimensions, including the spatial and temporal ones.

Figure 3. Structure of ConvLSTM [41].
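The ConvLSTM cell sketched above can be written out explicitly. The following is the standard formulation introduced by Shi et al., where $*$ denotes convolution and $\circ$ the Hadamard product; it is the widely used form and may differ in minor details from the variant adopted in [41]:

```latex
\begin{aligned}
i_t &= \sigma\!\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\!\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\!\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\!\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}
```

Because the state-to-state and input-to-state transitions are convolutions, the hidden state $H_t$ and cell state $C_t$ remain spatial grids, which is what lets the cell capture local spatio-temporal correlations.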
4.4. Using Human Skeleton Data
Existing AD methods suffer from recognizing patterns in complex environments with background variation, lighting changes, changes in pedestrian clothing, and a lack of dimensional information, all of which work against the efficacy of interactive behavior detection systems. Human skeleton data is a high-level abstraction of the body and can deal with such interference quite well [39]. Specifically, methods based on human key points are used to detect anomalies in video because they can effectively eliminate background noise and extract human key points in crowded video scenes [13,38]. Multiple human skeleton-based methods have been proposed for action detection and recognition, such as OpenPose, MediaPipe, and AlphaPose. The studies in [20,49–51] used the OpenPose method for human body extraction and recognition to provide a good basis for action detection in video. The OpenPose algorithm was the first real-time solution for identifying key points on the human body, feet, hands, and face. It has also been added to the OpenCV library [52]. Figure 4 presents the 18 joint points of the human body estimated by OpenPose. The MediaPipe and AlphaPose methods have been used in [10,53], respectively, to extract and detect key points on the human body. MediaPipe is an end-to-end, cross-platform skeletal software tool that works in real time [53]. Furthermore, the Microsoft Kinect sensor is one of the most widely used approaches to estimating a human's 3D position. This technique works by transforming 2D image detections from several camera views into 3D images [54,55]. Based on what has been found in the literature, features of the human skeleton can be a good way to recognize human behavior.

Figure 5. The combined CNN-RNN architecture.
5. Benchmark Datasets
Evaluating and comparing system performance is greatly aided by benchmark datasets. Having a complete and reliable dataset allows us to evaluate a system's performance in a variety of ways. Many key factors must be taken into account when building a dataset, such as the availability of labeled data, activity type, sample size, test environments, the diversity of the captured video, etc. Most researchers divide a dataset into two groups, training data and testing data, with a certain percentage for each group, such as 70% and 30% [6,56] or 80% and 20% [11,17,38] of the samples for training and testing, respectively. A few researchers divide their dataset into three sections: training, testing, and validation. Refs. [9,41] divided the dataset into 70% training, 20% testing, and 10% validation. Similarly, in ref. [40], the dataset is split into 80% training, 10% validation, and 10% testing. On the other hand, some researchers trained their models on a specific dataset and then tested them on a completely different one [57,58]. Table 2 lists the most widely used AD datasets that have been used to benchmark DL approaches in the academic literature. Furthermore, it provides the most relevant information needed while working with DL methods, such as the main reference, description, number of videos, examples of anomalies, and access details. A more detailed description of the existing datasets used for AD is presented in reference [59]. Figure 6 shows examples of different anomalies in the UCF-Crime dataset.
Table 2. An overview of common datasets used for AD.
Figure 6. Examples of different anomalies in the UCF-Crime dataset: (a) Abuse; (b) Arson; (c) Explosion; (d) Fight; (e) Road Accident; (f) Shooting.
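The 70/20/10-style splits discussed above can be sketched as follows. This is a minimal, framework-free illustration over a hypothetical list of clip names; real pipelines typically also shuffle with a fixed seed and stratify by anomaly class so rare events appear in every subset.

```python
import random

def split_dataset(items, train=0.7, test=0.2, val=0.1, seed=42):
    """Shuffle and partition items into train/test/validation subsets."""
    assert abs(train + test + val - 1.0) < 1e-9
    items = list(items)
    random.Random(seed).shuffle(items)  # reproducible shuffle
    n_train = int(len(items) * train)
    n_test = int(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

clips = [f"clip_{i:03d}.mp4" for i in range(100)]  # hypothetical clip names
train_set, test_set, val_set = split_dataset(clips)
# With 100 clips this yields 70 / 20 / 10 items per subset.
```

The cross-dataset protocol mentioned above ([57,58]) simply replaces the held-out subsets with an entirely different benchmark, which gives a much harder test of generalization.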
6. Anomaly Detection Approach Performance Metrics
The effectiveness of AD systems has been evaluated in many ways by researchers. AD models aim to achieve low false positive (FP) and false negative (FN) rates; on the other side, the true positive (TP) and true negative (TN) rates should be high. How many negative (normal) and positive (anomaly) examples are correctly labeled is denoted by TN and TP, respectively, while the FP and FN counts indicate how many instances were incorrectly labeled as positive or negative. Table 3 presents the most significant metrics used for evaluating AD model performance [27,73–76].
Table 3. Evaluation metrics used for AD approaches.

| Metric | Definition | Equation |
|---|---|---|
| Accuracy | Measures the number of anomalous and normal instances that are successfully classified with respect to the overall dataset. Accuracy can be a useful measure if the dataset is reasonably balanced. | (TP + TN) / (TP + TN + FP + FN) |
| Equal Error Rate (EER) | Evaluates the proportion of anomalous and normal instances that are misclassified with respect to the overall dataset. It is used to characterize biometric performance. | (FP + FN) / (TP + TN + FP + FN) |
| Recall (Sensitivity, True Positive Rate) | The ratio of detected anomalies to total anomalies. Recall is useful when an event that has already occurred must be classified correctly. | TP / (TP + FN) |
| Precision (Detection Rate) | Compares the number of real anomalies discovered to the total number of detections. It measures the accuracy of the true positives. | TP / (TP + FP) |
| Specificity (True Negative Rate) | The percentage of samples correctly labeled as normal. Specificity is important when the objective is to minimize the number of negative examples that are incorrectly classified. | TN / (FP + TN) |
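The formulas in Table 3 can be computed directly from the four confusion-matrix counts; the sketch below uses hypothetical evaluation counts and follows the table's definition of EER (overall misclassification proportion), which differs from the ROC-threshold definition used elsewhere in the biometrics literature.

```python
def ad_metrics(tp, tn, fp, fn):
    """Compute the Table 3 metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy":    (tp + tn) / total,
        "eer":         (fp + fn) / total,  # misclassified proportion, per Table 3
        "recall":      tp / (tp + fn),     # sensitivity / true positive rate
        "precision":   tp / (tp + fp),     # detection rate
        "specificity": tn / (tn + fp),     # true negative rate
    }

# Hypothetical counts from evaluating an AD model on 100 test clips.
m = ad_metrics(tp=40, tn=50, fp=5, fn=5)
# Here accuracy == 0.9 and recall == precision == 40/45.
```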
7.4. Medical AD
Medical analysis depends on the diagnosis task, which in turn is related to AD in the physiological data of patients, since such data captures each patient's unique features. Thus, the identification of anomalies in medical data is considered a sensitive task in this field [93]. AD systems for medical tests face extra challenges because they are directly related to human life and health. Further, many patient-specific characteristics, such as age and gender, should be taken into account when designing these systems, as they lead to variations in the data samples. For these reasons, supervised learning algorithms are mostly used when developing medical AD models, due to their high ability to distinguish between normal and abnormal samples [94].
problems of sparsely or moderately crowded video scenes and complex environments such
as background variation, lighting changes, and noisy video [13,38].
9. Future Directions
9.1. Aerial Surveillance
Activity recognition and AD in aerial videos are considered important research domains due to their contribution to many vital tasks, such as search and rescue and aerial surveillance. Unmanned aerial vehicles, commonly known as "drones", are usually used to capture aerial video data. Several challenges make it hard to find anomalies in aerial videos, such as when the drone moves in the opposite direction from the object, or when the object and the drone move at different speeds [103,104]. To overcome these difficulties, future research will need to develop innovative algorithms specifically for aerial videos.
10. Conclusions
This review aimed to make a significant research contribution to the study of DL in the intelligent surveillance domain by analyzing and summarizing the DL techniques utilized in AD for video streaming. In particular, our broad study used two categories to classify AD: the first considered the number of frames used during detection, while the second considered the number of anomalies in a scene. Moreover, our study analyzed the efficacy of many popular DL techniques for detecting anomalies, categorizing them according to network type and architectural design. The benchmark datasets and performance metrics used to evaluate the effectiveness of DL approaches were also listed in detail. Furthermore, our contribution highlighted the applications as well as the key issues in DL-based AD approaches that are still open and need to be addressed for efficient AD. Finally, we are convinced that researchers working on this topic will find this review helpful in gaining a better understanding of this crucial area of research. Ultimately, our main goal was to encourage researchers to carry out more research in this area so that it can move forward in the near future.
Author Contributions: Conceptualization, S.A.J., K.A.H. and H.K.H.; methodology, S.A.J. and L.A.;
software, S.A.J. and L.A.; validation, S.A.J., K.A.H., H.K.H., L.A. and J.S.; data curation, S.A.J., K.A.H.,
H.K.H., L.A. and J.S.; writing—original draft preparation, S.A.J. and L.A.; writing—review and
editing, S.A.J., K.A.H., H.K.H., L.A. and J.S.; project administration, S.A.J., K.A.H., H.K.H., L.A. and
J.S. All authors have read and agreed to the published version of the manuscript.
Funding: Laith Alzubaidi would like to acknowledge the support received through the following
funding schemes of Australian Government: ARC Industrial Transformation Training Centre (ITTC)
for Joint Biomechanics under grant IC190100020.
Data Availability Statement: All relevant dataset links were provided in the main review paper content.
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Dávila-Montero, S.; Dana-Lê, J.A.; Bente, G.; Hall, A.T.; Mason, A.J. Review and Challenges of Technologies for Real-Time Human
Behavior Monitoring. IEEE Trans. Biomed. Circuits Syst. 2021, 15, 2–28. [CrossRef] [PubMed]
2. Ren, J.; Xia, F.; Liu, Y.; Lee, I. Deep Video Anomaly Detection: Opportunities and Challenges. In Proceedings of the 2021
International Conference on Data Mining Workshops (ICDMW), Auckland, New Zealand, 7–10 December 2021; pp. 959–966.
3. Ruff, L.; Kauffmann, J.R.; Vandermeulen, R.A.; Montavon, G.; Samek, W.; Kloft, M.; Dietterich, T.G.; Müller, K.-R. A Unifying
Review of Deep and Shallow Anomaly Detection. Proc. IEEE 2021, 109, 756–795. [CrossRef]
4. Al-Dhamari, A.; Sudirman, R.; Mahmood, N.H. Transfer Deep Learning along with Binary Support Vector Machine for Abnormal
Behavior Detection. IEEE Access 2020, 8, 61085–61095. [CrossRef]
5. Yuan, J.; Wu, X.; Yuan, S. A Rapid Recognition Method for Pedestrian Abnormal Behavior. In Proceedings of the 2020 International
Conference on Computer Vision, Image and Deep Learning (CVIDL), Chongqing, China, 10–12 July 2020; pp. 241–245.
6. Bian, C.; Wang, L.; Gu, H.; Zhou, F. Abnormal Behavior Recognition Based on Edge Feature and 3D Convolutional Neural
Network. In Proceedings of the 2020 35th Youth Academic Annual Conference of Chinese Association of Automation (YAC),
Zhanjiang, China, 16–18 October 2020; pp. 1–6.
7. Gorodnichev, M.G.; Gromov, M.D.; Polyantseva, K.A.; Moseva, M.S. Research and Development of a System for Determining
Abnormal Human Behavior by Video Image Based on Deepstream Technology. In Proceedings of the 2022 Wave Electronics
and its Application in Information and Telecommunication Systems (WECONF), Sankt Petersburg, Russia, 31 May–4 June 2022;
pp. 1–9.
8. Cao, B.; Xia, H.; Liu, Z. A Video Abnormal Behavior Recognition Algorithm Based on Deep Learning. In Proceedings of the
2021 IEEE 4th Advanced Information Management, Communicates, Electronic and Automation Control Conference (IMCEC),
Chongqing, China, 18–20 June 2021; Volume 4, pp. 755–759.
9. Vrskova, R.; Hudec, R.; Kamencay, P.; Sykora, P. Recognition of Human Activity and Abnormal Behavior Using Deep Neural
Network. In Proceedings of the 2022 14th International Conference Elektro, Krakow, Poland, 23–26 May 2022; pp. 1–4.
10. Fan, B.; Li, P.; Jin, S.; Wang, Z. Anomaly Detection Based on Pose Estimation and GRU-FFN. In Proceedings of the 2021 IEEE
Sustainable Power and Energy Conference (iSPEC), Nanjing, China, 23–25 December 2021; pp. 3821–3825.
11. Traoré, A.; Akhloufi, M.A. Violence Detection in Videos Using Deep Recurrent and Convolutional Neural Networks. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics (SMC), Toronto, ON, Canada, 11–14 December 2020; pp. 154–159.
12. Emad, M.; Ishack, M.; Ahmed, M.; Osama, M.; Salah, M.; Khoriba, G. Early-Anomaly Prediction in Surveillance Cameras for Security Applications. In Proceedings of the 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 26–27 May 2021; pp. 124–128.
13. Chexia, Z.; Tan, Z.; Wu, D.; Ning, J.; Zhang, B. A Generalized Model for Crowd Violence Detection Focusing on Human Contour
and Dynamic Features. In Proceedings of the 2022 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing
(CCGrid), Taormina, Italy, 16–19 May 2022; pp. 327–335.
14. Zhang, W.; Miao, Z.; Xu, W. A Video Anomalous Behavior Detection Method Based on Multi-Task Learning. In Proceedings of
the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; pp.
396–400.
15. Alkanat, T.; Groot, H.G.J.; Zwemer, M.; Bondarev, E.; de Peter, H.N. Towards Scalable Abnormal Behavior Detection in Automated
Surveillance. In Proceedings of the 2021 4th International Conference on Artificial Intelligence for Industries (AI4I), Laguna Hills,
CA, USA, 20–22 September 2021; pp. 21–24.
Electronics 2023, 12, 29 19 of 22
16. Tang, X.; Astle, Y.S.; Freeman, C. Deep Anomaly Detection with Ensemble-Based Active Learning. In Proceedings of the 2020
IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 1663–1670.
17. Kabir, M.M.; Safir, F.B.; Shahen, S.; Maua, J.; Binte Awlad, I.A.; Mridha, M.F. Human Abnormality Classification Using Combined
CNN-RNN Approach. In Proceedings of the HONET 2020—IEEE 17th International Conference on Smart Communities:
Improving Quality of Life using ICT, IoT and AI, Charlotte, NC, USA, 14–16 December 2020; pp. 204–208. [CrossRef]
18. Heo, T.; Nam, W.; Paek, J.; Ko, J. Autonomous Reckless Driving Detection Using Deep Learning on Embedded GPUs. In Proceedings of the 2020 IEEE 17th International Conference on Mobile Ad Hoc and Sensor Systems (MASS), Delhi, India, 10–13 December 2020; pp. 464–472.
19. Xiao, Y.; Wang, Y.; Li, W.; Sun, M.; Shen, X.; Luo, Z. Monitoring the Abnormal Human Behaviors in Substations Based on
Probabilistic Behaviours Prediction and YOLO-V5. In Proceedings of the 2022 7th Asia Conference on Power and Electrical
Engineering (ACPEE), Hangzhou, China, 15–17 April 2022; pp. 943–948.
20. Shi, Y.; Guo, B.; Xu, Y.; Xu, Z.; Huang, J.; Lu, J.; Yao, D. Recognition of Abnormal Human Behavior in Elevators Based on CNN.
In Proceedings of the 2021 26th International Conference on Automation and Computing (ICAC), Portsmouth, UK, 2–4 September
2021; pp. 1–6.
21. Pawar, K.; Attar, V. Application of Deep Learning for Crowd Anomaly Detection from Surveillance Videos. In Proceedings of the
2021 11th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 28–29 January
2021; pp. 506–511.
22. Wang, Z.; Jiang, K.; Hou, Y.; Dou, W.; Zhang, C.; Huang, Z.; Guo, Y. A Survey on Human Behavior Recognition Using Channel
State Information. IEEE Access 2019, 7, 155986–156024. [CrossRef]
23. Li, J.; Xie, H.; Zang, Z.; Wang, G. Real-Time Abnormal Behavior Recognition and Monitoring System Based on Panoramic Video.
In Proceedings of the 2020 39th Chinese Control Conference (CCC), Shenyang, China, 27–29 July 2020; pp. 7129–7134.
24. Marsiano, A.F.D.; Soesanti, I.; Ardiyanto, I. Deep Learning-Based Anomaly Detection on Surveillance Videos: Recent Advances.
In Proceedings of the 2019 International Conference of Advanced Informatics: Concepts, Theory and Applications (ICAICTA),
Yogyakarta, Indonesia, 20–21 September 2019; pp. 1–6.
25. Chalapathy, R.; Chawla, S. Deep Learning for Anomaly Detection: A Survey. arXiv 2019, arXiv:1901.03407.
26. Pawar, K.; Attar, V. Deep Learning Approaches for Video-Based Anomalous Activity Detection. World Wide Web 2019, 22, 571–601.
[CrossRef]
27. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.;
Farhan, L. Review of Deep Learning: Concepts, CNN Architectures, Challenges, Applications, Future Directions. J. Big Data 2021,
8, 1–74. [CrossRef]
28. Nayak, R.; Pati, U.C.; Das, S.K. A Comprehensive Review on Deep Learning-Based Methods for Video Anomaly Detection. Image
Vis. Comput. 2021, 106, 104078. [CrossRef]
29. Alzubaidi, L.; Al-Shamma, O.; Fadhel, M.A.; Farhan, L.; Zhang, J.; Duan, Y. Optimizing the Performance of Breast Cancer
Classification by Employing the Same Domain Transfer Learning from Hybrid Deep Convolutional Neural Network Model.
Electronics 2020, 9, 445. [CrossRef]
30. Ali, L.R.; Jebur, S.A.; Jahefer, M.M.; Shaker, B.N. Employing Transfer Learning for Diagnosing COVID-19 Disease. Int. J. Online Biomed. Eng. 2022, 18, 31–42. [CrossRef]
31. Alzubaidi, L.; Al-Amidie, M.; Al-Asadi, A.; Humaidi, A.J.; Al-Shamma, O.; Fadhel, M.A.; Zhang, J.; Santamaría, J.; Duan, Y. Novel
Transfer Learning Approach for Medical Imaging with Limited Labeled Data. Cancers 2021, 13, 1590. [CrossRef] [PubMed]
32. Alzubaidi, L.; Fadhel, M.A.; Al-Shamma, O.; Zhang, J.; Santamaría, J.; Duan, Y.; Oleiwi, S.R. Towards a Better Understanding of
Transfer Learning for Medical Imaging: A Case Study. Appl. Sci. 2020, 10, 4523. [CrossRef]
33. Liu, Y.; Li, Z.; Zhou, C.; Jiang, Y.; Sun, J.; Wang, M.; He, X. Generative Adversarial Active Learning for Unsupervised Outlier
Detection. IEEE Trans. Knowl. Data Eng. 2019, 32, 1517–1528. [CrossRef]
34. Pimentel, T.; Monteiro, M.; Veloso, A.; Ziviani, N. Deep Active Learning for Anomaly Detection. In Proceedings of the 2020
International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–8.
35. Pang, G.; van den Hengel, A.; Shen, C.; Cao, L. Deep Reinforcement Learning for Unknown Anomaly Detection. arXiv 2020,
arXiv:2009.06847.
36. Aberkane, S.; Elarbi, M. Deep Reinforcement Learning for Real-World Anomaly Detection in Surveillance Videos. In Proceedings
of the 2019 6th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria,
24–25 November 2019; pp. 1–5.
37. Zhao, Y.; Deng, B.; Shen, C.; Liu, Y.; Lu, H.; Hua, X.-S. Spatio-Temporal Autoencoder for Video Anomaly Detection. In Proceedings
of the 25th ACM international conference on Multimedia, Mountain View, CA, USA, 23–27 October 2017; pp. 1933–1941.
38. Naik, A.J.; Gopalakrishna, M.T. Deep-Violence: Individual Person Violent Activity Detection in Video. Multimed. Tools Appl. 2021,
80, 18365–18380. [CrossRef]
39. Lin, C.-B.; Dong, Z.; Kuan, W.-K.; Huang, Y.-F. A Framework for Fall Detection Based on OpenPose Skeleton and LSTM/GRU Models. Appl. Sci. 2021, 11, 329. [CrossRef]
40. Khayrat, A.; Malak, P.; Victor, M.; Ahmed, S.; Metawie, H.; Saber, V.; Elshalakani, M. An Intelligent Surveillance System for
Detecting Abnormal Behaviors on Campus Using YOLO and CNN-LSTM Networks. In Proceedings of the 2022 2nd International
Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), Cairo, Egypt, 8–9 May 2022; pp. 104–109.
41. Vrskova, R.; Hudec, R.; Kamencay, P.; Sykora, P. A New Approach for Abnormal Human Activities Recognition Based on
ConvLSTM Architecture. Sensors 2022, 22, 2946. [CrossRef]
42. Ali, M.A.; Hussain, A.J.; Sadiq, A.T. Deep Learning Algorithms for Human Fighting Action Recognition. Int. J. Online Biomed.
Eng. 2022, 18, 71–87.
43. Simonyan, K.; Zisserman, A. Two-Stream Convolutional Networks for Action Recognition in Videos. Adv. Neural Inf. Process.
Syst. 2014, 27, 1–11.
44. Huang, X.; He, P.; Rangarajan, A.; Ranka, S. Intelligent Intersection: Two-Stream Convolutional Networks for Real-Time
near-Accident Detection in Traffic Video. ACM Trans. Spat. Algorithms Syst. 2020, 6, 1–28. [CrossRef]
45. Hao, W.; Zhang, R.; Li, S.; Li, J.; Li, F.; Zhao, S.; Zhang, W. Anomaly Event Detection in Security Surveillance Using Two-Stream
Based Model. Secur. Commun. Netw. 2020, 2020, 8876056. [CrossRef]
46. Jamadandi, A.; Kotturshettar, S.; Mudenagudi, U. Two Stream Convolutional Neural Networks for Anomaly Detection in
Surveillance Videos. In Smart Computing Paradigms: New Progresses and Challenges; Springer: Singapore, 2020; pp. 41–48.
47. Tran, D.; Bourdev, L.; Fergus, R.; Torresani, L.; Paluri, M. Learning Spatiotemporal Features with 3D Convolutional Networks.
In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 4489–4497.
48. Abdali, A.-M.R.; Al-Tuma, R.F. Robust Real-Time Violence Detection in Video Using CNN and LSTM. In Proceedings of the 2019
2nd Scientific Conference of Computer Sciences (SCCS), Baghdad, Iraq, 27–28 March 2019; pp. 104–108.
49. Lin, F.-C.; Ngo, H.-H.; Dow, C.-R.; Lam, K.-H.; Le, H.L. Student Behavior Recognition System for the Classroom Environment
Based on Skeleton Pose Estimation and Person Detection. Sensors 2021, 21, 5314. [CrossRef] [PubMed]
50. Li, S.; Yi, J.; Farha, Y.A.; Gall, J. Pose Refinement Graph Convolutional Network for Skeleton-Based Action Recognition.
IEEE Robot. Autom. Lett. 2021, 6, 1028–1035. [CrossRef]
51. Ali, M.A.; Hussain, A.J.; Sadiq, A.T. Human Fall Down Recognition Using Coordinates Key Points Skeleton. Int. J. Online Biomed.
Eng. 2022, 18, 88–104.
52. Lathifah, N.; Lin, H.-I. A Brief Review on Behavior Recognition Based on Key Points of Human Skeleton and Eye Gaze to Prevent
Human Error. In Proceedings of the 2022 13th Asian Control Conference (ASCC), Jeju Island, Republic of Korea, 4–7 May 2022;
pp. 1396–1403.
53. Zhang, F.; Bazarevsky, V.; Vakunov, A.; Tkachenka, A.; Sung, G.; Chang, C.-L.; Grundmann, M. Mediapipe Hands: On-Device
Real-Time Hand Tracking. arXiv 2020, arXiv:2006.10214.
54. Jia, J.-G.; Zhou, Y.-F.; Hao, X.-W.; Li, F.; Desrosiers, C.; Zhang, C.-M. Two-Stream Temporal Convolutional Networks for
Skeleton-Based Human Action Recognition. J. Comput. Sci. Technol. 2020, 35, 538–550. [CrossRef]
55. Agahian, S.; Negin, F.; Köse, C. An Efficient Human Action Recognition Framework with Pose-Based Spatiotemporal Features.
Eng. Sci. Technol. Int. J. 2020, 23, 196–203. [CrossRef]
56. Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; Serre, T. HMDB: A Large Video Database for Human Motion Recognition.
In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 2556–2563.
57. Patil, P.W.; Murala, S. MSFgNet: A Novel Compact End-to-End Deep Network for Moving Object Detection. IEEE Trans. Intell.
Transp. Syst. 2018, 20, 4066–4077. [CrossRef]
58. Patil, P.W.; Biradar, K.M.; Dudhane, A.; Murala, S. An End-to-End Edge Aggregation Network for Moving Object Segmentation.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 8149–8158.
59. Patil, N.; Biswas, P.K. A Survey of Video Datasets for Anomaly Detection in Automated Surveillance. In Proceedings of the 2016
Sixth International Symposium on Embedded Computing and System Design (ISED), Patna, India, 15–17 December 2016; pp.
43–48.
60. Fisher, R.B. The PETS04 Surveillance Ground-Truth Data Sets. In Proceedings of the 6th IEEE International Workshop on
Performance Evaluation of Tracking and Surveillance, Prague, Czech Republic, 10 May 2004; pp. 1–5.
61. Mehran, R.; Oyama, A.; Shah, M. Abnormal Crowd Behavior Detection Using Social Force Model. In Proceedings of the 2009
IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 935–942. [CrossRef]
62. Mahadevan, V.; Li, W.; Bhalodia, V.; Vasconcelos, N. Anomaly Detection in Crowded Scenes. In Proceedings of the 2010 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp.
1975–1981.
63. Blunsden, S.; Fisher, R.B. The BEHAVE Video Dataset: Ground Truthed Video for Multi-Person Behavior Classification.
Ann. BMVA 2010, 4, 4.
64. Bermejo Nievas, E.; Deniz Suarez, O.; Bueno García, G.; Sukthankar, R. Violence Detection in Video Using Computer Vision
Techniques. In Computer Analysis of Images and Patterns; Springer: Cham, Switzerland, 2011; pp. 332–339.
65. Hassner, T.; Itcher, Y.; Kliper-Gross, O. Violent Flows: Real-Time Detection of Violent Crowd Behavior. In Proceedings of the 2012
IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, Providence, RI, USA, 16–21 June
2012; pp. 1–6.
66. Soomro, K.; Zamir, A.R.; Shah, M. UCF101: A Dataset of 101 Human Actions Classes from Videos in the Wild. arXiv 2012,
arXiv:1212.0402.
67. Lu, C.; Shi, J.; Jia, J. Abnormal Event Detection at 150 FPS in MATLAB. In Proceedings of the IEEE International Conference on
Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 2720–2727.
68. Caba Heilbron, F.; Escorcia, V.; Ghanem, B.; Carlos Niebles, J. Activitynet: A Large-Scale Video Benchmark for Human Activity
Understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12
June 2015; pp. 961–970.
69. Kay, W.; Carreira, J.; Simonyan, K.; Zhang, B.; Hillier, C.; Vijayanarasimhan, S.; Viola, F.; Green, T.; Back, T.; Natsev, P. The Kinetics
Human Action Video Dataset. arXiv 2017, arXiv:1705.06950.
70. Luo, W.; Liu, W.; Gao, S. A Revisit of Sparse Coding Based Anomaly Detection in Stacked RNN Framework. In Proceedings of the
IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 341–349.
71. Sultani, W.; Chen, C.; Shah, M. Real-World Anomaly Detection in Surveillance Videos. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6479–6488.
72. Soliman, M.M.; Kamal, M.H.; Nashed, M.A.E.-M.; Mostafa, Y.M.; Chawky, B.S.; Khattab, D. Violence Recognition from Videos
Using Deep Learning Techniques. In Proceedings of the 2019 Ninth International Conference on Intelligent Computing and
Information Systems (ICICIS), Cairo, Egypt, 8–10 December 2019; pp. 80–85.
73. Mandal, M.; Vipparthi, S.K. An Empirical Review of Deep Learning Frameworks for Change Detection: Model Design, Experimental Frameworks, Challenges and Research Needs. IEEE Trans. Intell. Transp. Syst. 2022, 23, 6101–6122. [CrossRef]
74. Lindemann, B.; Maschler, B.; Sahlab, N.; Weyrich, M. A Survey on Anomaly Detection for Technical Systems Using LSTM
Networks. Comput. Ind. 2021, 131, 103498. [CrossRef]
75. Kalsotra, R.; Arora, S. A Comprehensive Survey of Video Datasets for Background Subtraction. IEEE Access 2019, 7, 59143–59171.
[CrossRef]
76. Wu, P.; Liu, J.; Shen, F. A Deep One-Class Neural Network for Anomalous Event Detection in Complex Scenes. IEEE Trans. Neural
Netw. Learn. Syst. 2019, 31, 2609–2622. [CrossRef]
77. Zhao, Y.; Man, K.L.; Smith, J.; Guan, S.-U. A Novel Two-Stream Structure for Video Anomaly Detection in Smart City Management.
J. Supercomput. 2022, 78, 3940–3954. [CrossRef]
78. Ullah, W.; Ullah, A.; Hussain, T.; Muhammad, K.; Heidari, A.A.; Del Ser, J.; Baik, S.W.; De Albuquerque, V.H.C. Artificial
Intelligence of Things-Assisted Two-Stream Neural Network for Anomaly Detection in Surveillance Big Video Data. Future Gener. Comput. Syst. 2022, 129, 286–297. [CrossRef]
79. Ohgushi, T.; Horiguchi, K.; Yamanaka, M. Road Obstacle Detection Method Based on an Autoencoder with Semantic Segmentation.
In Proceedings of the Asian Conference on Computer Vision, Kyoto, Japan, 30 November–4 December 2020.
80. Nitsch, J.; Itkina, M.; Senanayake, R.; Nieto, J.; Schmidt, M.; Siegwart, R.; Kochenderfer, M.J.; Cadena, C. Out-of-Distribution Detection for Automotive Perception. In Proceedings of the 2021 IEEE International Conference on Intelligent Transportation Systems (ITSC), Indianapolis, IN, USA, 19–22 September 2021; pp. 2938–2943. [CrossRef]
81. Ryan, C.; Murphy, F.; Mullins, M. End-to-End Autonomous Driving Risk Analysis: A Behavioural Anomaly Detection Approach.
IEEE Trans. Intell. Transp. Syst. 2020, 22, 1650–1662. [CrossRef]
82. Lindemann, B.; Fesenmayr, F.; Jazdi, N.; Weyrich, M. Anomaly Detection in Discrete Manufacturing Using Self-Learning
Approaches. Procedia CIRP 2019, 79, 313–318. [CrossRef]
83. Maschler, B.; Knodel, T.; Weyrich, M. Towards Deep Industrial Transfer Learning for Anomaly Detection on Time Series Data.
In Proceedings of the 2021 26th IEEE International Conference on Emerging Technologies and Factory Automation (ETFA),
Vasteras, Sweden, 7–10 September 2021; pp. 1–8.
84. Aboah, A. A Vision-Based System for Traffic Anomaly Detection Using Deep Learning and Decision Trees. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4207–4212.
85. Samuel, D.J.; Cuzzolin, F. Unsupervised Anomaly Detection for a Smart Autonomous Robotic Assistant Surgeon (SARAS) Using
a Deep Residual Autoencoder. IEEE Robot. Autom. Lett. 2021, 6, 7256–7261. [CrossRef]
86. Breitenstein, J.; Termöhlen, J.-A.; Lipinski, D.; Fingscheidt, T. Corner Cases for Visual Perception in Automated Driving: Some
Guidance on Detection Approaches. arXiv 2021, arXiv:2102.05897.
87. Ferreira, R.S.; Guérin, J.; Guiochet, J.; Waeselynck, H. SiMOOD: Evolutionary Testing Simulation with Out-Of-Distribution Images.
In Proceedings of the 27th IEEE Pacific Rim International Symposium on Dependable Computing (PRDC 2022), Beijing, China,
22 November–1 December 2022.
88. Siddique, A.; Afanasyev, I. Deep Learning-Based Trajectory Estimation of Vehicles in Crowded and Crossroad Scenarios.
In Proceedings of the 2021 28th Conference of Open Innovations Association (FRUCT), Moscow, Russia, 27–29 January 2021; pp.
413–423.
89. Prati, A.; Shan, C.; Wang, K.I.-K. Sensors, Vision and Networks: From Video Surveillance to Activity Recognition and Health
Monitoring. J. Ambient Intell. Smart Environ. 2019, 11, 5–22.
90. Bakunah, R.A.; Baneamoon, S.M. A Hybrid Technique for Intelligent Bank Security System Based on Blink Gesture Recognition.
J. Phys. Conf. Ser. 2021, 1962, 012001. [CrossRef]
91. Rego, A.; Ramírez, P.L.G.; Jimenez, J.M.; Lloret, J. Artificial Intelligent System for Multimedia Services in Smart Home Environments. Cluster Comput. 2022, 25, 2085–2105. [CrossRef]
92. Roth, K.; Pemula, L.; Zepeda, J.; Schölkopf, B.; Brox, T.; Gehler, P. Towards Total Recall in Industrial Anomaly Detection.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June
2022; pp. 14318–14328.
93. Fernando, T.; Gammulle, H.; Denman, S.; Sridharan, S.; Fookes, C. Deep Learning for Medical Anomaly Detection–A Survey.
ACM Comput. Surv. 2021, 54, 1–37. [CrossRef]
94. Fernando, T.; Denman, S.; Ahmedt-Aristizabal, D.; Sridharan, S.; Laurens, K.R.; Johnston, P.; Fookes, C. Neural Memory Plasticity
for Medical Anomaly Detection. Neural Netw. 2020, 127, 67–81. [CrossRef]
95. Xu, K.; Jiang, X.; Sun, T. Anomaly Detection Based on Stacked Sparse Coding with Intraframe Classification Strategy. IEEE Trans.
Multimed. 2018, 20, 1062–1074. [CrossRef]
96. Akilan, T.; Wu, Q.J.; Safaei, A.; Huo, J.; Yang, Y. A 3D CNN-LSTM-Based Image-to-Image Foreground Segmentation. IEEE Trans.
Intell. Transp. Syst. 2019, 21, 959–971. [CrossRef]
97. Maschler, B.; Weyrich, M. Deep Transfer Learning for Industrial Automation: A Review and Discussion of New Techniques for
Data-Driven Machine Learning. IEEE Ind. Electron. Mag. 2021, 15, 65–75. [CrossRef]
98. Vu, H.; Phung, D.; Nguyen, T.D.; Trevors, A.; Venkatesh, S. Energy-Based Models for Video Anomaly Detection. arXiv 2017,
arXiv:1708.05211.
99. Miki, D.; Chen, S.; Demachi, K. Unnatural Human Motion Detection Using Weakly Supervised Deep Neural Network.
In Proceedings of the 2020 Third International Conference on Artificial Intelligence for Industries (AI4I), Irvine, CA, USA,
21–23 September 2020; pp. 10–13.
100. Mehmood, A. LightAnomalyNet: A Lightweight Framework for Efficient Abnormal Behavior Detection. Sensors 2021, 21, 8501.
[CrossRef] [PubMed]
101. Osifeko, M.O.; Hancke, G.P.; Abu-Mahfouz, A.M. SurveilNet: A Lightweight Anomaly Detection System for Cooperative IoT
Surveillance Networks. IEEE Sens. J. 2021, 21, 25293–25306. [CrossRef]
102. Chang, S.; Li, Y.; Shen, S.; Feng, J.; Zhou, Z. Contrastive Attention for Video Anomaly Detection. IEEE Trans. Multimed. 2021, 24,
4067–4076. [CrossRef]
103. Mandal, M.; Kumar, L.K.; Vipparthi, S.K. MOR-UAV: A Benchmark Dataset and Baselines for Moving Object Recognition in UAV Videos. In Proceedings of the 28th ACM International Conference on Multimedia, New York, NY, USA, 12–16 October 2020; pp.
2626–2635.
104. Chen, X.; Li, Z.; Yang, Y.; Qi, L.; Ke, R. High-Resolution Vehicle Trajectory Extraction and Denoising from Aerial Videos.
IEEE Trans. Intell. Transp. Syst. 2020, 22, 3190–3202. [CrossRef]
105. Jiang, C.; Paudel, D.P.; Fofi, D.; Fougerolle, Y.; Demonceaux, C. Moving Object Detection by 3d Flow Field Analysis. IEEE Trans.
Intell. Transp. Syst. 2021, 22, 1950–1963. [CrossRef]
106. Fang, Z.; Jain, A.; Sarch, G.; Harley, A.W.; Fragkiadaki, K. Move to See Better: Self-Improving Embodied Object Detection. arXiv
2020, arXiv:2012.00057.
107. Xu, D.; Xiao, J.; Zhao, Z.; Shao, J.; Xie, D.; Zhuang, Y. Self-Supervised Spatiotemporal Learning via Video Clip Order Prediction.
In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June
2019; pp. 10334–10343.
108. Han, T.; Xie, W.; Zisserman, A. Video Representation Learning by Dense Predictive Coding. In Proceedings of the IEEE/CVF
International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27 October–2 November 2019.
109. Al-amri, R.; Murugesan, R.K.; Man, M.; Abdulateef, A.F.; Al-Sharafi, M.A.; Alkahtani, A.A. A Review of Machine Learning and
Deep Learning Techniques for Anomaly Detection in IoT Data. Appl. Sci. 2021, 11, 5320. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.