A Method For Detecting Abnormal Behavior of Ships

This research article presents a novel method for detecting abnormal ship behavior by integrating multi-dimensional density distance and an isolation mechanism, addressing both position and speed anomalies. The proposed method utilizes the AMDL and MDDBSCAN algorithms to improve trajectory data processing and anomaly threshold determination, achieving better accuracy and efficiency compared to traditional methods. Experimental results demonstrate significant improvements in detecting ship position and speed anomalies using historical AIS data from Xiamen port.


MBE, 20(8): 13921–13946.

DOI: 10.3934/mbe.2023620
Received: 07 March 2023
Revised: 07 May 2023
Accepted: 29 May 2023
Published: 20 June 2023
http://www.aimspress.com/journal/MBE

Research article

A method for detecting abnormal behavior of ships based on multi-dimensional density distance and an abnormal isolation mechanism

Lixiang Zhang1, Yian Zhu1,*, Jie Ren1, Wei Lu2 and Ye Yao1

1 School of Computer Science, Northwestern Polytechnical University, Xi’an 710072, China
2 School of Information, Xi’an University of Finance and Economics, Xi’an 710100, China

* Correspondence: Email: [email protected].

Abstract: Abnormal ship behavior detection is essential for maritime navigation safety. Most existing methods build only a position outlier detection model for ship trajectories; however, a speed outlier detection model is equally important for navigation safety. In addition, most existing methods rely on an anomaly threshold, and an unsuitable threshold means that the risk to the ship is not minimized. In this paper, we propose an abnormal ship behavior detection method based on distance measurement and an isolation mechanism. First, because traditional trajectory compression and density clustering methods use only ship position information, this study uses the minimum description length principle based on acceleration (AMDL) algorithm and the Multi-Dimensional Density Clustering (MDDBSCAN) algorithm. These algorithms consider not only the position of the ship but also its speed. Second, because the anomaly threshold is difficult to determine, we introduce a method for determining it based on the relationship between the velocity weights and noise points of the MDDBSCAN algorithm. Finally, to address the randomness of the segmentation value selected in iForest, we propose a strategy of selectively constructing isolated trees, which further improves the efficiency of abnormal ship behavior detection. Experimental results on the historical automatic identification system (AIS) data set of Xiamen port demonstrate the practicality and effectiveness of the proposed method. In terms of the accuracy of ship position anomaly detection, the proposed method achieves an improvement of about 10% over the trajectory outlier detection based on the local outlier fraction method and about 14% over the isolation-based online anomalous trajectory method; in terms of the accuracy of ship speed anomaly detection, it achieves an improvement of about 3% over the feature fusion method. The method also improves algorithm efficiency by about 5% compared with the traditional isolation forest anomaly detection algorithm.

Keywords: AIS data; outlier detection; MDDBSCAN; isolated forest algorithm

1. Introduction

Maritime safety has always been a focus of navigation, and the rapid growth of marine traffic has made it imperative [1]. To ensure the safety of ships during navigation, we need to monitor their navigation information, such as position and speed, in real time. At present, the automatic identification system (AIS) [2,3] installed on most ships records this information in real time. It includes the ship’s unique identification number (the Maritime Mobile Service Identity), longitude, latitude, speed, course, etc. This information helps us analyze the navigation state of a ship and detect abnormal behavior [4].
With the development of big data and artificial intelligence technologies in recent years [5,6], trajectory outlier detection has been well studied in trajectory data mining. There are many offline trajectory outlier detection methods, such as the density-based method, isolation-based anomalous trajectory detection (iBAT) [7] and time-dependent popular routes-based trajectory outlier detection (TPRO) [8]. There are also online methods, such as isolation-based online anomalous trajectory detection (iBOAT) [9], time-dependent popular routes-based real-time trajectory outlier detection (TPRRO) [10], driving behavior-based trajectory outlier detection [11] and gravity vector [12]. Among these methods, iBAT and iBOAT are based on abnormal isolation mechanisms, TPRO and TPRRO are based on time-dependent popular routes, and the gravity vector is based on distance measurement. Most of them, whether offline or online, consider only the position anomalies of ship behavior and ignore other anomaly information. Moreover, their anomaly thresholds are difficult to determine, which leads to inconsistent detection results. To solve these problems, this paper proposes a ship outlier detection method based on distance measurement and an isolation mechanism. The method provides a reasonable basis for determining the threshold for judging a speed as abnormal, is suitable for online outlier detection, and detects outliers in both ship position and velocity information.
First, the method uses the minimum description length principle based on acceleration (AMDL) algorithm to compress ship trajectories. We chose this algorithm because it is based on the minimum description length (MDL) algorithm and has strong applicability. By contrast, the shape of the trajectory output by other trajectory compression algorithms, such as the Douglas-Peucker algorithm, depends on the choice of a threshold. Because the motion state and direction of a ship may change at any time, compression methods that rely on a fixed threshold are inefficient and may not compress ship trajectories well.

Mathematical Biosciences and Engineering Volume 20, Issue 8, 13921–13946.



Second, to accurately extract the normal behavior model of ships, the AIS data must be preprocessed (i.e., trajectory clusters identified and noise points removed). At the initial stage of AIS data processing, the distribution characteristics of the unprocessed data are unknown. Therefore, we use the Multi-Dimensional Density Clustering (MDDBSCAN) algorithm, which is based on Density-Based Spatial Clustering of Applications with Noise (DBSCAN), to identify ship trajectory clusters, remove noise points from the original AIS data and extract the ship’s normal behavior model for detecting position outliers. The DBSCAN algorithm does not require prior knowledge of the number of trajectory clusters, and it can discover trajectory clusters of any shape.
Third, the method builds the iForest algorithm from selectively constructed isolated trees (iTrees). The resulting algorithm is efficient and suitable for the online detection of abnormal ship behavior. In addition, when extracting the correct speed set for ships and removing speed outliers, the algorithm does not need to consider the distribution of the original data. Finally, establishing the relationship between the velocity weights in MDDBSCAN and the anomaly thresholds provides a suitable anomaly threshold for detecting ship speed outliers.
The main contributions of this paper are as follows:
1) Regarding the issue of the MDL algorithm only considering trajectory position information,
this paper presents the AMDL algorithm, which preserves not only position information but also
speed information, unlike the MDL algorithm. The AMDL algorithm, based on the MDL algorithm,
forcibly retains the points where the acceleration changes from positive to negative or the
acceleration changes from negative to positive.
2) In response to the problem of traditional density clustering only using ship position
information as a similarity measure, the MDDBSCAN algorithm was developed in this study.
Compared with the traditional density clustering-based ship anomaly detection algorithm, it takes
into account the ship’s speed factor in the similarity measure of the trajectory cluster, so the ship
behavior modeling in the trajectory cluster is more accurate, thus improving the detection of
abnormal ship behavior.
3) Due to the randomness issue of the selected segmentation values in iForest, we propose a
strategy of selectively constructing isolated trees, which improves the detection efficiency of the
isolation forest algorithm for abnormal data compared with the traditional isolation forest algorithm.
The strategy can maximize the difference between the number of nodes in the left sub-tree and the
right sub-tree to improve the convergence speed for the iForest algorithm.
4) In response to the difficulty of determining the anomaly threshold, we analyzed the relationship between the velocity weights and noise points of the MDDBSCAN algorithm and established the connection between velocity weights and anomaly thresholds, which provides a reasonable basis for determining anomaly thresholds. Compared with grid search or choosing the anomaly threshold by experience, this method is more efficient, and the selected threshold is easier to interpret.
The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 describes basic concepts of sub-trajectory similarity measurement and trajectory compression. Section 4 presents the abnormal ship behavior detection algorithm. Section 5 discusses the experimental setup and results. Section 6 concludes the article and outlines future work.


2. Related works

2.1. Definition of abnormal ship behavior

This section first introduces the general definitions of abnormal ship behavior and the current methods for detecting it, and then relates them to the main contributions of this paper. There are many definitions of abnormal ship behavior, but no unified one. Martineau and Roy [13] were among the first to define and classify abnormal ship behavior, but they divided it into only two categories: motion anomaly and position anomaly.
Portnoy et al. [14] defined abnormal behavior according to the difference between the mathematical
model of normal ship behavior and the ship's data to be detected. Zhang and Tang [15] defined the
abnormal behavior of a ship as the ship’s motion that did not conform to the normal navigation activity
law. Lane et al. [16] classified abnormal ship behavior into five categories according to AIS data:
deviation from the normal route, abnormal activity of the ship AIS, abnormal arrival of the ship,
abnormal distance among ships and an abnormal navigation zone. Laxhammar [17] defined abnormal
behavior of ships as the abnormal deviation of ships from the channel and course, sudden acceleration,
sudden deceleration, and appearance in areas that should not be entered. It can be seen from the above that different experts place different emphases on the definition of abnormal ship behavior.
The detection of abnormal ship behavior is the detection of abnormal trajectories and speeds, and a ship trajectory is composed of trajectory points. Therefore, we detect abnormal ship behavior by checking the position and speed of ship trajectory points. Combining the above definitions and analysis, we define abnormal ship behavior as the occurrence of a position outlier or a velocity outlier among the trajectory points of a ship. A position outlier is a deviation of the ship trajectory from the historical channel, while a speed outlier is a speed that does not conform to the general speed of ships in the particular area the ship has entered.

2.2. Research on detection methods for abnormal behavior of ships

In recent years, there has been much research on abnormal ship behavior detection, including collaborative computing and distributed methods, deep learning methods, statistical methods, distance measurement methods, outlier isolation methods, and approaches that integrate knowledge-based and data-driven techniques [18]. In methods based on distance measurement, trajectories that are far away from most normal trajectories are regarded as outliers. To address possible skewness in the distribution of raw AIS data, Bao and Du [12] extracted a mathematical model from trajectory clusters obtained by density clustering (DBSCAN) to detect abnormal ship behavior. Because the traditional Trajectory Outlier Detection (TRAOD) algorithm cannot detect outliers in locally dense trajectories, Luan et al. [19] combined the Local Outlier Factor algorithm with TRAOD to detect trajectory anomalies. Noting the lack of serious studies on outlier detection for trajectory data, Liang et al. [20] used the trajectory outlier detection based on the local outlier fraction (TODLOF) algorithm to detect outliers in a trajectory dataset. Among deep learning approaches, Belhadi et al. [21] compared deep learning methods with data mining, machine learning and other methods, and demonstrated the advantages of the Convolutional Neural Network and Region Convolutional Neural Network algorithms for outlier detection. Among statistical, collaborative computing and distributed methods, Szarmach and Czarnowski [22] proposed using a wavelet transform to detect incorrect AIS data, and Chen et al. [23] adopted Spark to improve the efficiency of outlier detection. Among methods based on isolating outliers, Hu et al. [24] used the idea of an isolation forest to isolate outliers and thereby avoid tricky parameters in their trajectory outlier detection model. In other related research on ship anomaly detection, Belhadi et al. [25] compared current outlier detection methods and analyzed various trajectory outlier detection methods in depth, making them easier to understand. Riveiro et al. [26] provided an overview of state-of-the-art research on maritime anomaly detection from the perspective of data, methods, systems and user aspects.
The methods above are generally designed for offline detection; that is, they cannot detect abnormal ship behavior in real time. Although they detected outliers well in experiments, they cannot be applied in practical engineering. Methods based on distance measurement and mathematical modeling have been widely used for online abnormal ship behavior detection. Both judge whether an object is an outlier by measuring the distance between the object to be detected and the correct object. However, the choice of distance threshold significantly affects this judgment. In addition, detection based on mathematical modeling has poor scalability, and most other online methods mine only position outliers, not speed outliers. The anomaly threshold is also difficult to determine in these methods, which leads to unstable anomaly detection. Finally, approaches based on deep learning lack explanatory power for detecting abnormal ship behavior.
All in all, the traditional methods above have certain problems: they cannot perform online detection, their results rely heavily on the choice of thresholds, they scale poorly, they mine only abnormal position information from ship trajectories, or they lack interpretability.

2.3. Advantage of the method based on distance measurement and an isolation mechanism

Based on the above problems, we propose an abnormal ship behavior detection method based on distance measurement and an isolation mechanism, which detects both the position outliers and the speed outliers of ship trajectory points in real time. The method improves the MDL algorithm to obtain more accurate compressed ship trajectories, and it provides a reasonable basis for determining the threshold for judging abnormal speed. Finally, a strategy of selectively constructing isolated trees is proposed to improve the efficiency of detecting abnormal ship behavior.
For outlier detection on ship position information, the AIS data are first processed and compressed using the acceleration-based minimum description length [27] (AMDL) algorithm, which reflects the real navigation information of ships with as little AIS data as possible. Then the ship position information model is extracted from the trajectory clusters produced by multi-dimensional density clustering (MDDBSCAN) [28,29]. By comparing ship trajectory points with the ship position information model, the position outliers of a ship can be detected in real time.


For outlier detection on speed information, this method uses an isolation forest algorithm [30]. First, a functional relationship between the speed weights and the abnormal speed judgment threshold is established. Second, the speed outliers are detected and eliminated with the isolation forest algorithm to obtain the correct ship speed set in a given area. Finally, the speed to be detected is added to the correct ship speed set and its anomaly score is calculated; the score is used to judge whether the speed value is an outlier. The advantage of the isolation forest algorithm is that it is efficient and can meet the needs of online detection. The algorithm also extends well to outlier detection for ship behavior. On this basis, selectively constructing isolated trees improves the efficiency of the traditional isolation forest algorithm, resulting in faster detection of abnormal ship speeds. The proposed method is not only applicable to online anomalous behavior detection for ships, but can also provide a theoretical reference for establishing anomalous behavior detection models for other moving targets.

3. The definition of basic concepts

In this section, some related terms and formal expressions are defined first, which mainly
include the relevant definitions of sub-trajectory similarity measurement [31] and trajectory
compression.

3.1. Sub-trajectory similarity measurement method

There are three types of distances between trajectory segments: vertical distance (𝑑⊥ ), parallel distance (𝑑|| ), and angular distance (𝑑𝜃 ). These three distances are used to measure the similarity of trajectory segments. Figure 1 illustrates them.

Figure 1. Trajectory segment distance.

It is assumed that there are two trajectory segments in space, namely 𝐿𝑗 = 𝑠𝑗 𝑒𝑗 and 𝐿𝑖 = 𝑠𝑖 𝑒𝑖 , where 𝑠𝑖 and 𝑒𝑖 are the two endpoints of the segment 𝐿𝑖 , and 𝑠𝑗 and 𝑒𝑗 are the two endpoints of the segment 𝐿𝑗 . Here, it is assumed that the segment 𝐿𝑗 is shorter than 𝐿𝑖 .
The vertical distance of 𝐿𝑖 and 𝐿𝑗 is defined as Formula (1), where the two endpoints (𝑠𝑗 and
𝑒𝑗 ) of segment 𝐿𝑗 are projected as 𝑝𝑠 and 𝑝𝑒 on the segment 𝐿𝑖 . At the same time, the Euclidean
distance from the point 𝑠𝑗 to 𝑝𝑠 is 𝑙⊥1 , and the Euclidean distance from the point 𝑒𝑗 to 𝑝𝑒 is 𝑙⊥2 .


$$ d_{\perp}(L_i, L_j) = \frac{l_{\perp 1}^{2} + l_{\perp 2}^{2}}{l_{\perp 1} + l_{\perp 2}} \tag{1} $$

The parallel distance of 𝐿𝑖 and 𝐿𝑗 is defined by Formulas (2)–(4), where the two endpoints (𝑠𝑗 and 𝑒𝑗 ) of segment 𝐿𝑗 are projected as 𝑝𝑠 and 𝑝𝑒 on the segment 𝐿𝑖 . Here 𝑙||1 is the smaller of the Euclidean distances from 𝑝𝑠 to the endpoints 𝑠𝑖 and 𝑒𝑖 , and 𝑙||2 is the smaller of the Euclidean distances from 𝑝𝑒 to 𝑠𝑖 and 𝑒𝑖 ; 𝑑 denotes the Euclidean distance between two points.

$$ d_{\parallel}(L_i, L_j) = \min(l_{\parallel 1}, l_{\parallel 2}) \tag{2} $$
$$ l_{\parallel 1} = \min(d(s_i, p_s), d(e_i, p_s)) \tag{3} $$
$$ l_{\parallel 2} = \min(d(e_i, p_e), d(s_i, p_e)) \tag{4} $$

The angular distance is defined by Formula (5). The angle between 𝐿𝑖 and 𝐿𝑗 is 𝜃 (0 ≤ 𝜃 ≤ 𝜋). Generally, 𝜃 is taken as the smaller angle between 𝐿𝑖 and 𝐿𝑗 , and |𝐿𝑗 | represents the length of the line segment 𝐿𝑗 .

$$ d_{\theta} = \begin{cases} |L_j| \times \sin\theta, & 0 \le \theta \le \frac{\pi}{2} \\ |L_j|, & \frac{\pi}{2} \le \theta \le \pi \end{cases} \tag{5} $$

The angular distance in Formula (5) is usually used for trajectory segments with direction. For trajectory segments without direction, the angular distance can be defined simply as |𝐿𝑗 | × sin 𝜃.
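The three distances in Formulas (1)–(5) can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, points are assumed to be planar `(x, y)` tuples, the second segment is assumed to be the shorter one, and the projection is taken onto the infinite line through the longer segment.

```python
import math

def _dist(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def _project(p, a, b):
    """Project point p onto the line through a and b (assumed, not clamped to the segment)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    denom = dx * dx + dy * dy
    if denom == 0.0:
        return a
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / denom
    return (a[0] + t * dx, a[1] + t * dy)

def segment_distances(si, ei, sj, ej):
    """Return (d_perp, d_par, d_theta) for L_i = (si, ei) and the shorter L_j = (sj, ej)."""
    ps, pe = _project(sj, si, ei), _project(ej, si, ei)
    # Formula (1): perpendicular ("vertical") distance.
    l1, l2 = _dist(sj, ps), _dist(ej, pe)
    d_perp = 0.0 if l1 + l2 == 0.0 else (l1 ** 2 + l2 ** 2) / (l1 + l2)
    # Formulas (2)-(4): parallel distance.
    l_par1 = min(_dist(si, ps), _dist(ei, ps))
    l_par2 = min(_dist(ei, pe), _dist(si, pe))
    d_par = min(l_par1, l_par2)
    # Formula (5): angular distance for directed segments.
    len_i, len_j = _dist(si, ei), _dist(sj, ej)
    if len_i == 0.0 or len_j == 0.0:
        d_theta = 0.0  # assumption: a degenerate segment contributes no angular distance
    else:
        dot = (ei[0] - si[0]) * (ej[0] - sj[0]) + (ei[1] - si[1]) * (ej[1] - sj[1])
        theta = math.acos(max(-1.0, min(1.0, dot / (len_i * len_j))))
        d_theta = len_j * math.sin(theta) if theta <= math.pi / 2 else len_j
    return d_perp, d_par, d_theta
```

For example, with `L_i` from `(0, 0)` to `(10, 0)` and a parallel `L_j` from `(2, 1)` to `(4, 1)`, both endpoints lie one unit from the line, so the perpendicular distance is 1, the parallel distance is 2 and the angular distance is 0.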

3.2. Trajectory compression technique

The purpose of trajectory compression is to describe the characteristics of a trajectory with as few points as possible. This work adopts the AMDL algorithm, which is based on the MDL algorithm. In addition to the points selected by MDL, it forcibly retains the points where the acceleration changes from positive to negative or from negative to positive. We chose this algorithm because it has strong applicability and the shape of the output trajectory does not depend on the choice of a threshold.
The AMDL segmentation cost is 𝐴𝑀𝐷𝐿𝑝𝑎𝑟 = 𝐿(𝐻) + 𝐿(𝐷|𝐻), with the two terms given by Formulas (6) and (7). Here 𝑝𝑖 represents a trajectory point, 𝑝𝑐𝑖 represents a point selected by the AMDL algorithm, and 𝑙𝑒𝑛(𝑝𝑖 , 𝑝𝑗 ) represents the Euclidean distance between two trajectory points.

$$ L(H) = \sum_{i=1}^{\mathit{par}_i - 1} \log_2\left(\mathit{len}(p_{c_i}, p_{c_{i+1}})\right) \tag{6} $$

$$ L(D|H) = \sum_{i=1}^{\mathit{par}_i - 1} \sum_{k=c_i}^{c_{i+1}-1} \left\{ \log_2\left(d_{\perp}(p_{c_i} p_{c_{i+1}}, p_k p_{k+1})\right) + \log_2\left(d_{\theta}(p_{c_i} p_{c_{i+1}}, p_k p_{k+1})\right) \right\} \tag{7} $$


The no-segmentation cost, 𝐴𝑀𝐷𝐿𝑛𝑜𝑝𝑎𝑟 , is the total length of the trajectory from point 𝑝𝑖 to point 𝑝𝑗 , as given by Formula (8).

$$ \mathit{AMDL}_{nopar} = \sum_{i=1}^{j-1} \mathit{len}(p_i, p_{i+1}) \tag{8} $$
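The two costs can be sketched for the case used during compression, where a run of points `points[i..j]` is replaced by the single segment from `points[i]` to `points[j]`. The sketch below is illustrative only: the function names are hypothetical, points are assumed to be `(x, y)` tuples, and distances below 1 are clamped to 1 before taking the logarithm so that `log2(0)` never occurs (an assumption the text does not state).

```python
import math

def length(p, q):
    """Euclidean distance between two (x, y) points."""
    return math.hypot(q[0] - p[0], q[1] - p[1])

def d_perp(a, b, c, d):
    """Formula (1): perpendicular distance of segment (c, d) from the line through (a, b)."""
    ab = length(a, b)
    if ab == 0.0:
        l1, l2 = length(a, c), length(a, d)
    else:
        # Point-to-line distance via the 2D cross product.
        l1 = abs((b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])) / ab
        l2 = abs((b[0] - a[0]) * (d[1] - a[1]) - (b[1] - a[1]) * (d[0] - a[0])) / ab
    return 0.0 if l1 + l2 == 0.0 else (l1 * l1 + l2 * l2) / (l1 + l2)

def d_theta(a, b, c, d):
    """Formula (5): angular distance, treating (c, d) as the shorter segment."""
    ab, cd = length(a, b), length(c, d)
    if ab == 0.0 or cd == 0.0:
        return 0.0
    dot = (b[0] - a[0]) * (d[0] - c[0]) + (b[1] - a[1]) * (d[1] - c[1])
    theta = math.acos(max(-1.0, min(1.0, dot / (ab * cd))))
    return cd * math.sin(theta) if theta <= math.pi / 2 else cd

def log2_clamped(x):
    """Assumed convention: values below 1 cost 0 bits."""
    return math.log2(max(x, 1.0))

def amdl_par(points, i, j):
    """L(H) + L(D|H), Formulas (6)-(7), for the single segment points[i] -> points[j]."""
    a, b = points[i], points[j]
    l_h = log2_clamped(length(a, b))
    l_dh = sum(
        log2_clamped(d_perp(a, b, points[k], points[k + 1]))
        + log2_clamped(d_theta(a, b, points[k], points[k + 1]))
        for k in range(i, j)
    )
    return l_h + l_dh

def amdl_nopar(points, i, j):
    """Formula (8): total path length from points[i] to points[j]."""
    return sum(length(points[k], points[k + 1]) for k in range(i, j))
```

On a straight run such as `[(0, 0), (4, 0), (8, 0)]` the deviation terms vanish, so the segmentation cost reduces to `log2(8) = 3` while the no-segmentation cost is the path length 8.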

4. Abnormal ship behavior detection method

This paper presents an abnormal ship behavior detection method based on multi-dimensional density clustering and an abnormal isolation mechanism. First, the massive AIS data must be compressed, for which this study adopts the AMDL algorithm. Second, the MDDBSCAN algorithm is run on the compressed data, and each trajectory cluster is divided into 10 grids. A position information model of the ship trajectory is then extracted on each grid. By measuring the distance between the point to be detected and the correct model, the method judges whether the ship’s position is abnormal. Third, in each grid, the isolation forest algorithm based on selectively constructed isolated trees is used to remove the abnormal speed points of ships and extract the correct speed set. By calculating the anomaly score of the speed to be detected within the speed set, the method judges whether the ship speed is abnormal. During this process, the connection between the velocity weights in MDDBSCAN and the anomaly thresholds is established, providing a reasonable basis for determining speed anomaly thresholds. The detection flow chart is shown in Figure 2.

Figure 2. Flow chart of abnormal ship behavior detection.

4.1. Ship trajectory compression

This paper’s data compression method is based on the AMDL algorithm, whose core idea is to extract feature points from a trajectory. Trajectory compression by this method has two desirable properties: accuracy and simplicity. Accuracy means that the trajectory after segmentation retains the characteristics of the trajectory before segmentation as far as possible. Simplicity means that as few feature points as possible are extracted from the original trajectory. The AMDL algorithm therefore consists of two main parts: judging whether the segmented trajectory point set contains points where the acceleration changes sign between the preceding and following steps, and judging whether to segment the trajectory. In the AMDL algorithm, 𝑝i represents a trajectory point, 𝑝vi represents a trajectory point at which the acceleration changes sign, and 𝑝𝑐i represents a characteristic point of the ship trajectory selected by the AMDL algorithm.

Algorithm 1 AMDL
Input: One trajectory (p1, p2, …, pn)
Output: All feature points of the trajectory (pc1, pc2, …, pc_par_i)
1: Add p1 into the set CP; /* the start point */
2: startIndex = 1, length = 1;
3: while startIndex + length ≤ n do
4:     currIndex = startIndex + length;
5:     Add all points from startIndex to currIndex to the set temp;
6:     Check if 𝑝vi exists in the temp set;
7:     if non-existent then
8:         costpar = AMDLpar(p_startIndex, p_currIndex);
9:         costnopar = AMDLnopar(p_startIndex, p_currIndex);
10:        if costpar > costnopar then
11:            Add the p_(currIndex-1) point to the set CP;
12:            startIndex = currIndex - 1, length = 1;
13:        else
14:            length = length + 1;
15:    else
16:        mark the point as p_currIndex;
17:        Add the p_currIndex point to the set CP;
18:        startIndex = currIndex;
19:        length = 1;
20: Add pn to the set CP

In Algorithm 1, the first two lines are the initialization operations of the algorithm. The process
from the third line to the end aims to find characteristic points of the ship trajectory. During the
process, the AMDL algorithm calculates the partition cost (costpar) and the no partition cost
(costnopar) for each trajectory point. If costpar is greater than costnopar, then the previous point of
that point will be selected as a characteristic point. Meanwhile, 𝑝vi will also be selected as the
characteristic point.
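The control flow of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' code. `speeds` is a hypothetical list of speed values parallel to `points`, and sampling is assumed to be uniform so that the sign of consecutive speed differences stands in for the sign of the acceleration. The two cost functions are passed in as callables with the assumed signature `cost(points, start, curr)`. Indices are 0-based, and a guard is added so that a split never re-selects the current start point.

```python
def sign_change_indices(speeds):
    """Indices where the speed difference before and after the point have opposite signs."""
    flips = set()
    for k in range(1, len(speeds) - 1):
        before = speeds[k] - speeds[k - 1]
        after = speeds[k + 1] - speeds[k]
        if before * after < 0:
            flips.add(k)
    return flips

def amdl_compress(points, speeds, cost_par, cost_nopar):
    """Return the indices of the characteristic points chosen in the manner of Algorithm 1."""
    n = len(points)
    if n == 0:
        return []
    flips = sign_change_indices(speeds)
    cp = [0]                       # line 1: the start point is always kept
    start, length = 0, 1           # line 2
    while start + length < n:      # line 3, with 0-based indices
        curr = start + length
        if curr in flips:
            # Lines 15-19: an acceleration sign change is forcibly retained.
            cp.append(curr)
            start, length = curr, 1
        elif (cost_par(points, start, curr) > cost_nopar(points, start, curr)
              and curr - 1 > start):
            # Lines 10-12: keep the previous point and restart from it.
            cp.append(curr - 1)
            start, length = curr - 1, 1
        else:
            length += 1            # line 14: extend the candidate segment
    if cp[-1] != n - 1:
        cp.append(n - 1)           # line 20: the end point is always kept
    return cp
```

With costs that never trigger a split, only the endpoints and the sign-change points survive: for speeds `[1, 2, 3, 2, 1]` the ship accelerates up to index 2 and then decelerates, so the result is `[0, 2, 4]`.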

4.2. Position outlier detection for ship trajectory points

Detecting position outliers of ship trajectory points requires multi-dimensional density clustering of the compressed AIS data, for which the MDDBSCAN algorithm is proposed. The trajectory cluster then needs to be meshed, and a correct model of ship position information is extracted on each grid.
The traditional DBSCAN algorithm only considers the Euclidean distance between points. The clustering objects of the MDDBSCAN algorithm in this paper are sub-trajectories, and the clustering process adopts the idea of DBSCAN [28,29]. A sub-trajectory is written as 𝑆𝑢𝑏𝑖 = (𝑝𝑐𝑖 𝑝𝑐𝑗 ), and its velocity is the mean of the velocities at its two endpoints:

$$ V_{Sub_i} = \frac{1}{2}\left(V_{p_{c_i}} + V_{p_{c_j}}\right) $$

The similarity distance of sub-trajectories is calculated by

$$ Dist(Sub_i, Sub_j) = \omega_{\perp} d_{\perp}(Sub_i, Sub_j) + \omega_{\parallel} d_{\parallel}(Sub_i, Sub_j) + \omega_{\theta} d_{\theta}(Sub_i, Sub_j) + \omega_{v} V_{Sub_i} $$
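As a small worked example of the weighted similarity distance, the sketch below combines precomputed component distances with a velocity term exactly as the formula above is written. The function names and the default weights of 1.0 are assumptions made for illustration; the text does not give weight values at this point.

```python
def sub_trajectory_speed(v_start, v_end):
    """Mean of the speeds at the two endpoints of a sub-trajectory."""
    return 0.5 * (v_start + v_end)

def mdd_distance(d_perp, d_par, d_theta, v_sub,
                 w_perp=1.0, w_par=1.0, w_theta=1.0, w_v=1.0):
    """Weighted sum of the three segment distances and the velocity term.

    The default weights are placeholders, not values from the paper.
    """
    return w_perp * d_perp + w_par * d_par + w_theta * d_theta + w_v * v_sub
```

For instance, with component distances 1, 2 and 0, endpoint speeds 4 and 6 (so a sub-trajectory speed of 5) and a velocity weight of 0.5, the distance is 1 + 2 + 0 + 2.5 = 5.5.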

The algorithm flow of MDDBSCAN is as follows.

Algorithm 2 MDDBSCAN
Input: (1) Sub-trajectory set D = {Sub1, Sub2, …, Subn}
       (2) Neighborhood radius ε, minimum number of entities (MinSubs)
Output: Clusters set S = {s1, s2, …, sn}
/* STEP 1 */
1: clusterID = 0; /* one initial id */
2: Mark all sub-trajectories as unclassified;
3: for each (Subi ∈ D) do
4:     if Subi is not classified then
5:         Compute Nε(Subi); /* find sub-trajectory Subi neighborhood */
6:         if |Nε(Subi)| ≥ MinSubs then
7:             allocate clusterID to Subi and Nε(Subi);
8:             put Nε(Subi) − Subi into queue Q;
/* STEP 2 */
9:             ExpandCluster(Q, clusterID, ε, MinSubs);
10:            clusterID = clusterID + 1;
11:        else
12:            mark Subi as noised sub-trajectory;
/* STEP 3 */
13: for each (si ∈ S) do
14:     if |si| < MinSubs then
15:         remove si from S;
/* STEP 2 find density connection set */
16: ExpandCluster(Q, clusterID, ε, MinSubs) {
17:     while Q ≠ ∅ do
18:         Define M as the first sub-trajectory to be checked in Q;
19:         Compute Nε(M);
20:         if |Nε(M)| ≥ MinSubs then
21:             for each (X ∈ Nε(M)) do
22:                 if X is not classified or is noised then
23:                     allocate clusterID to X;
24:                 if X is not classified then
25:                     put X into queue Q;
26:         remove M from queue Q;
27: }


In Algorithm 2, STEP 1 includes two parts: algorithm initialization and searching for the
neighborhood of sub-trajectories. STEP 2 aims to find the density connection set and STEP 3 aims to
form clusters and remove noise data.
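The three steps can be sketched as a generic density clustering routine that accepts any distance function, so a weighted sub-trajectory distance could be plugged in. This is an illustrative sketch rather than the authors' implementation: the function name `md_dbscan` and the label conventions are assumptions, and unlike a literal reading of the pseudocode it never relabels an item that already belongs to an earlier cluster.

```python
from collections import Counter, deque

NOISE = -1

def md_dbscan(items, dist, eps, min_subs):
    """Return one label per item: a cluster id (0, 1, ...) or NOISE."""
    n = len(items)
    labels = [None] * n            # None means "unclassified"

    def neighbors(i):
        return [j for j in range(n) if dist(items[i], items[j]) <= eps]

    cluster_id = 0
    for i in range(n):             # STEP 1: seed clusters from dense items
        if labels[i] is not None:
            continue
        neigh = neighbors(i)
        if len(neigh) < min_subs:
            labels[i] = NOISE
            continue
        queue = deque()
        for j in neigh:
            if labels[j] is None or labels[j] == NOISE:
                if labels[j] is None and j != i:
                    queue.append(j)
                labels[j] = cluster_id
        while queue:               # STEP 2: expand the density-connected set
            m = queue.popleft()
            m_neigh = neighbors(m)
            if len(m_neigh) >= min_subs:
                for x in m_neigh:
                    if labels[x] is None:
                        labels[x] = cluster_id
                        queue.append(x)
                    elif labels[x] == NOISE:
                        labels[x] = cluster_id
        cluster_id += 1
    # STEP 3: drop clusters with fewer than min_subs members.
    sizes = Counter(label for label in labels if label != NOISE)
    return [label if label != NOISE and sizes[label] >= min_subs else NOISE
            for label in labels]
```

As a toy usage example with one-dimensional values, `md_dbscan([0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 20.0], lambda a, b: abs(a - b), 0.15, 2)` groups the first three values, groups the next three, and marks the isolated value 20.0 as noise.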
After completing the multi-dimensional density clustering of sub-trajectories, a correct model of ship position information must be established in each cluster. First, the trajectory cluster is meshed. In this work, each sub-trajectory cluster is divided into 10 grids, so 10 detection models are generated. Then a center vector (CV) is established in each grid and used as the benchmark for judging whether position information is abnormal. The center vector is defined as 𝐶𝑉 = (𝑎𝑣𝑔𝑋, 𝑎𝑣𝑔𝑌, 𝑚𝑒𝑑𝑖𝑢𝑚𝐷), where 𝑎𝑣𝑔𝑋 and 𝑎𝑣𝑔𝑌 denote the average X and Y coordinates of all trajectory points in the grid, and 𝑚𝑒𝑑𝑖𝑢𝑚𝐷 denotes the medium distance, computed as the mean distance from the trajectory points to (𝑎𝑣𝑔𝑋, 𝑎𝑣𝑔𝑌). Let all sub-trajectories in the grid be represented as the set {sub1, sub2, …, subn}, and let the set {𝑝𝑐1, 𝑝𝑐2, …, 𝑝𝑐𝑛} denote all trajectory points. The components of the CV are then calculated as follows, where 𝑙𝑒𝑛 denotes the Euclidean distance between two points.

1) average X coordinate: $avgX = \frac{\sum_{i=1}^{n} p_{c_i}.x}{n}$;
2) average Y coordinate: $avgY = \frac{\sum_{i=1}^{n} p_{c_i}.y}{n}$;
3) medium distance: $mediumD = \frac{\sum_{i=1}^{n} \mathit{len}(p_{c_i}, (avgX, avgY))}{n}$.
After the center vector is determined in the grid of each sub-trajectory, the abnormal position of
ship trajectory points can be judged. The detection idea is to measure the relative distance between
the point to be detected and the center vector. If the relative distance exceeds the threshold range, it is
considered that the position of the trajectory point is abnormal. If the relative distance is within the
threshold, the position of the current point is considered normal. The formula of the relative distance
between the point to be detected and the center vector is given by Formula (9).

$$CRD(p, CV) = \frac{len(p, (CV.avgX, CV.avgY))}{CV.mediumD} \tag{9}$$

In Formula (9), p is the point to be detected. When CRD > 1, the distance from the point to be detected to the center vector is greater than the average distance from all points in the grid area to the center vector. When CRD = 1, the distance from the point to be detected to the center vector equals the average distance from all points in the grid area to the center vector. When CRD < 1, the distance from the point to be detected to the center vector is less than the average distance from all points in the grid area to the center vector.
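The relative distance can be illustrated with a short sketch. The function name `crd` and the numeric center vector below are assumptions made for the example only.

```python
import math


def crd(p, avg_x, avg_y, medium_d):
    """Relative distance of point p = (x, y) to a center vector, as in Formula (9)."""
    if medium_d <= 0:
        raise ValueError("mediumD must be positive")
    return math.hypot(p[0] - avg_x, p[1] - avg_y) / medium_d


# Hypothetical center vector: center (1, 1), mediumD = 2.
near = crd((1.0, 2.0), 1.0, 1.0, 2.0)  # distance 1, so CRD < 1
far = crd((7.0, 1.0), 1.0, 1.0, 2.0)   # distance 6, so CRD > 1
```

Here `near` is closer to the center than the average grid point, while `far` is three times farther away than average.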
To determine the threshold for CRD(p, CV), we assume that the distances from all points in the sub-trajectory grid to the center vector approximately follow a normal distribution. The three standard deviations criterion is then used to determine the threshold, as given by Formula (10). When $y_{p_i} = 0$, the position of the point to be detected is normal. When $y_{p_i} = 1$, the position of the point to be detected is abnormal.


$$y_{p_i} = \begin{cases} 0, & \bar{D} - 3\sigma \le D_i \le \bar{D} + 3\sigma \\ 1, & \text{else} \end{cases} \tag{10}$$
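The three standard deviations rule can be sketched as follows. The sketch assumes that the mean and standard deviation are estimated from the relative distances of the grid's own reference points and that each candidate is then compared against that band; the text does not spell this out, so `position_flags` and the sample values are illustrative only.

```python
import statistics


def position_flags(reference, candidates):
    """Return y = 0 (normal) or y = 1 (abnormal) for each candidate distance.

    Assumption: the mean and sigma come from the reference distances of one grid.
    """
    mean = statistics.fmean(reference)
    sigma = statistics.pstdev(reference)
    low, high = mean - 3 * sigma, mean + 3 * sigma
    return [0 if low <= d <= high else 1 for d in candidates]


reference = [0.8, 0.9, 1.0, 1.1, 1.2]        # made-up relative distances
flags = position_flags(reference, [1.05, 1.6, 0.2])
```

With these values the accepted band is roughly 0.58 to 1.42, so only the first candidate is treated as normal.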

The MDDBSCAN algorithm takes the ship's speed into account in the similarity measure used for trajectory clustering. As a result, the ship behavior model extracted from each sub-trajectory cluster is more accurate, and the iForest algorithm used to detect abnormal ship speed achieves better results because the noise data have already been removed with the speed factor taken into account.

4.3. Speed outlier detection for ship trajectory points

For ship speed outlier detection, an isolation forest algorithm based on selectively constructing isolated trees is used to extract the correct ship speed set. The algorithm is efficient and is suitable for the online detection of abnormal ship behavior. Moreover, when extracting the correct speed set and removing speed outliers, the algorithm does not need to consider the distribution of the original data. The isolation forest algorithm is better suited to small data sets [32]. As described in the previous section, the trajectory cluster has been divided into ten grids; to help the isolation forest algorithm achieve better results, each of those grids is further divided into four grids. In this way, we can remove the abnormal ship speeds in each grid to obtain the correct speed set for that grid. To check whether a ship's speed is abnormal, the method adds the speed to be detected to the correct speed set of the grid at the corresponding position and determines whether the speed is abnormal by calculating its anomaly score.

Algorithm 3 Selective construction of isolated trees -- iTree(X, e, l)

Input: X, e, l.
Output: an iTree
1: if e ≥ l or |X| ≤ 1 then
2:   return exNode{|X|}; /*return size of data set*/
3: else
4:   Choose any value p between the maximum and minimum values of X;
5:   X_l ← filter(X, x_i < p)
6:   X_r ← filter(X, x_i ≥ p)
7: end if
8: if |X_l|/|X_r| ≥ Ratio && the first division then
9:   view the tree as a bad isolated tree, so not to build;
10: end if
11: return InNode{Left ← iTree(X_l, e + 1, l), Right ← iTree(X_r, e + 1, l), splitValue ← p}; /*non-leaf node*/

iForest is similar to a decision tree and a random forest. An iForest is composed of isolated trees (iTrees), each of which is a random binary tree in which every node either has two child nodes or is a leaf node. Constructing each isolated tree from a random sample of the data ensures that the trees differ from one another. To build an isolated tree, we select a feature (speed is selected here) and randomly select a segmentation value to recursively segment the data set until the maximum height limit of the tree is met or a tree node contains only one sample. The maximum height limit ($h$) of the tree is related to the number of sub-samples ($\phi$): $h = \lceil \log_2 \phi \rceil$.
Generally, when dividing left and right sub-trees, the isolation forest algorithm randomly selects
a number between the minimum and maximum values from the data set as the segmentation value.
Samples smaller than the segmentation value will be divided into the left sub-tree, and samples larger
than the segmentation value will be divided into the right sub-tree. Due to the randomness of the
selected segmentation value, there will be differences in the ability of each isolated tree to
distinguish outliers. For the identification of abnormal data, it is expected that the segmentation value
can maximize the difference between the number of nodes in the left sub-tree and the right sub-tree,
to improve the convergence speed of the algorithm. Therefore, we propose an algorithm for
selectively building an isolated tree. The algorithm flow for constructing the isolated tree of ship speed is shown in Algorithm 3. In Algorithm 3, Ratio represents the ratio of the number of samples divided into the left (right) sub-tree to the number of samples divided into the right (left) sub-tree during the first division, X represents the input data set, e denotes the current height of the tree, and l denotes the maximum height limit of the tree.
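A minimal Python sketch of selective tree construction is given below. It applies the first-division test literally as printed in Algorithm 3 (reject when |X_l|/|X_r| ≥ Ratio). Since the surrounding prose favors first splits that separate the two sub-trees unevenly, the direction of that comparison is an assumption and may need to be flipped. The dictionary layout, `build_forest`, `max_attempts` and the speed values are all hypothetical.

```python
import math
import random


def build_itree(xs, e, limit, ratio, rng):
    """Build one isolated tree over speed values xs; None means the tree is rejected."""
    if e >= limit or len(xs) <= 1:
        return {"size": len(xs)}                 # external node stores |X|
    p = rng.uniform(min(xs), max(xs))            # random segmentation value
    left = [x for x in xs if x < p]
    right = [x for x in xs if x >= p]            # never empty, because p <= max(xs)
    # Rejection rule as printed in Algorithm 3, applied to the first division only.
    if e == 0 and len(left) / len(right) >= ratio:
        return None
    return {"split": p,
            "left": build_itree(left, e + 1, limit, ratio, rng),
            "right": build_itree(right, e + 1, limit, ratio, rng)}


def build_forest(xs, n_trees, sample_size, ratio, seed=0, max_attempts=10_000):
    """Draw sub-samples until n_trees accepted trees exist or the attempts run out."""
    rng = random.Random(seed)
    limit = math.ceil(math.log2(sample_size))    # h = ceiling(log2(phi))
    forest = []
    for _ in range(max_attempts):
        if len(forest) == n_trees:
            break
        tree = build_itree(rng.sample(xs, sample_size), 0, limit, ratio, rng)
        if tree is not None:
            forest.append(tree)
    return forest


speeds = [10.0 + 0.1 * i for i in range(40)] + [25.0]   # made-up speeds
forest = build_forest(speeds, n_trees=20, sample_size=16, ratio=4.0)
```

Rejected trees are simply redrawn from a fresh sub-sample, so the forest still reaches the requested size while skipping trees whose first division fails the test.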
It is usually necessary to build 100 such isolated trees to construct an isolated forest (iForest).
The judgment of outliers in iForest is based on the average height of the outlier on 100 trees (i.e.,
path length). The average height of outliers in iForest is usually low. For iForest, given a data set
containing n samples, the average path length of the tree is as given by Formula (11).

$$c(n) = 2H(n-1) - \frac{2(n-1)}{n} \tag{11}$$

H(i) is a harmonic number, which can be estimated as ln(i) + 0.5772156649 (Euler's constant). c(n) is the average path length for a given number of samples n, which is used to standardize the path length h(x) of sample x. The path length h(x) of sample point x is the number of edges from the root node to the leaf node of the iTree. The algorithm flow for calculating h(x) is as follows.

Algorithm 4 Calculate the length of the sample in the tree -- PathLength(x, T, e)

Input: x is a sample. T denotes an iTree. e denotes the current height of x on the iTree (the initial value is 0).
Output: the height of x on the iTree.
1: if T is a leaf node then
2:   return e + c(T.size)
3: end if
4: if x < T.splitValue then
5:   return PathLength(x, T.left, e + 1)
6: else
7:   return PathLength(x, T.right, e + 1);
8: end if
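Algorithm 4 and Formula (11) can be sketched together in Python. The tree is assumed to be a nested dictionary with `split`, `left` and `right` keys for internal nodes and a `size` key for leaves; this layout and the hand-built tree are hypothetical. Returning 0 for n ≤ 1 is also an assumption, since Formula (11) is undefined there.

```python
import math

EULER_GAMMA = 0.5772156649


def c(n):
    """Average path length for n samples, Formula (11); 0 for n <= 1 (assumption)."""
    if n <= 1:
        return 0.0
    harmonic = math.log(n - 1) + EULER_GAMMA     # estimate of H(n - 1)
    return 2.0 * harmonic - 2.0 * (n - 1) / n


def path_length(x, tree, e=0):
    """Number of edges from the root to x's leaf, plus c(size) for an unsplit leaf."""
    if "size" in tree:                           # leaf node
        return e + c(tree["size"])
    child = tree["left"] if x < tree["split"] else tree["right"]
    return path_length(x, child, e + 1)


# Hand-built tree: one fast speed is isolated at depth 1.
tree = {"split": 20.0,
        "left": {"split": 12.0, "left": {"size": 3}, "right": {"size": 2}},
        "right": {"size": 1}}
short = path_length(25.0, tree)
long = path_length(11.0, tree)
```

The isolated speed 25.0 reaches its leaf after one edge, whereas 11.0 travels two edges and then receives the c(3) adjustment for the three samples left unsplit in its leaf.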

Outlier detection using iForest is performed by calculating the score of sample x, which is defined by Formula (12).


$$s(x, n) = 2^{-\frac{E(h(x))}{c(n)}} \tag{12}$$

E(h(x)) represents the average path length of sample x over all iTrees in the isolated forest. c(n) is the average path length for a given number of samples n, which is used to standardize the path length h(x) of sample x. The relationship between the score s and E(h(x)) is shown in Figure 3.

Figure 3. The relationship between the score s and E(h(x)).

It can be seen from Figure 3 that when E(h(x)) → c(n), s → 0.5; that is, when the average path length of sample x is close to the average path length of the iTrees, it is difficult to tell whether x is an outlier. When E(h(x)) → 0, s → 1; a sample whose score is close to 1 is determined to be abnormal. When E(h(x)) → n − 1, s → 0, and the sample is determined to be normal.
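As a worked illustration of these three regimes, take n = 256. This sub-sample size is assumed for the example and is not stated for the experiment. The path lengths 4 and 16 are likewise made up.

```latex
\begin{aligned}
c(256) &= 2\bigl(\ln 255 + 0.5772156649\bigr) - \frac{2 \cdot 255}{256} \approx 10.245,\\
E(h(x)) = 4 &:\quad s = 2^{-4/10.245} \approx 0.763,\\
E(h(x)) = c(256) &:\quad s = 2^{-1} = 0.5,\\
E(h(x)) = 16 &:\quad s = 2^{-16/10.245} \approx 0.339.
\end{aligned}
```

A sample isolated after about four splits therefore scores well above 0.5, while one that needs sixteen splits scores well below it.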
Whether a data point is judged to be an outlier therefore depends on the threshold value. If the threshold is too high, some speed outliers in the data set will go undetected. If the threshold is too low, normal data may be misjudged as abnormal. Here, the threshold is determined from the relationship between the speed weight and the number of noise points in multi-dimensional density clustering.
Multi-dimensional density clustering is mainly used to measure the position differences of ship sub-trajectories, and its speed factor has little effect on the positions of trajectory points. Therefore, the velocity weight should generally not exceed the reciprocal of the number of dimensions. If the number of noise points decreases markedly as the speed weight increases, the ship speeds in the data set are relatively uniform, so the speed weight should take a smaller value and the threshold for judging a speed to be an outlier should not be too large. If the ship speed changes significantly in the data set, the speed weight can be increased appropriately, but it should still not exceed the reciprocal of the number of dimensions of multi-dimensional density clustering. After an appropriate speed weight has been selected, the relationship between the threshold and the speed weight is defined by Formula (13).

$$score(\omega) = \frac{k}{1 + e^{-\omega}} \tag{13}$$


Here ω (written as w in the text) is the velocity weight and k is the harmonic number. When the anomaly score of the ship speed at a certain moment is greater than score(w), the ship speed is abnormal; otherwise, it is normal.

$$y_{v_i} = \begin{cases} 1, & score(v_i) > score(\omega) \\ 0, & \text{else} \end{cases} \tag{14}$$

When $y_{v_i} = 1$, the speed is abnormal. When $y_{v_i} = 0$, the speed is normal. After the abnormal speed points have been removed and the correct speed set has been extracted in each grid, Formula (14) judges whether a ship speed is abnormal from the anomaly score of that speed calculated within the correct speed set.
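The threshold and decision rule can be sketched as below. The function names are hypothetical, and k = 1.0 is an illustrative value only, since no numeric value of k is given in the text.

```python
import math


def score_threshold(w, k):
    """Formula (13): threshold score(w) = k / (1 + exp(-w))."""
    return k / (1.0 + math.exp(-w))


def speed_flag(anomaly_score, w, k):
    """Formula (14): 1 (abnormal) if the score exceeds score(w), else 0 (normal)."""
    return 1 if anomaly_score > score_threshold(w, k) else 0


# k = 1.0 is an assumption made for this example.
threshold = score_threshold(0.15, 1.0)
abnormal = speed_flag(0.70, 0.15, 1.0)
normal = speed_flag(0.45, 0.15, 1.0)
```

Because the logistic term grows with w, a larger speed weight yields a higher threshold under this formula; with k = 1 the threshold stays between 0.5 and about 0.56 for w between 0 and 0.25.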
In summary, Formula (13) provides a reasonable basis for choosing the threshold used to judge a speed as abnormal. Meanwhile, the strategy of selectively constructing iTrees accelerates the detection of abnormal ship speed.

5. Experiments and results analysis

During the experiments, the original AIS data were first preprocessed (data quality improvement, data compression, etc.), and then multiple detection methods for abnormal ship behavior (ship position outliers and speed outliers) were compared from four perspectives (recall, precision, F1 score and accuracy). Finally, we carried out ablation experiments. The experimental hardware environment in this study was an Intel® Core™ i7-8700 octa-core CPU (3.20 GHz) with 8 GB RAM; the software environment was Windows 10, Python 3.8 and JDK 1.8.

5.1. Data Preparation

The data [33] selected in the experiment were the AIS data of a passenger ship near Xiamen port (the whole journey is about 18 km), a total of 40011 data points. The spatial area spans longitudes 117.77 to 118.63 and latitudes 24.09 to 24.69, and the time range is from November 29, 2018 to January 3, 2019. However, unprocessed raw AIS data may have data quality issues, which can affect the construction of abnormal ship behavior detection models [34]. Therefore, we addressed the quality of the raw AIS data through interpolation of trajectory breakpoints using the cubic spline method and through identification and removal of abnormal AIS data (abnormal stop points, abnormal acceleration points, abnormal drift points, abnormal turning points), which enhances the continuity and integrity of the AIS data and improves their quality [3]. Figure 4 shows the number of ship trajectory points after data preprocessing, MDL compression and AMDL compression.
As seen in Figure 4, the number of data points before and after data preprocessing varies greatly. If the abnormal data caused by AIS equipment faults are not removed, the analysis of abnormal ship behavior will be significantly affected. The AMDL compression algorithm is an improvement of the MDL algorithm. Building on the MDL algorithm, it forcibly retains the points where there is a positive or negative acceleration transformation. Therefore, the AMDL algorithm better reflects the real characteristics of ship motion. A comparison of a trajectory before and after compression is shown in Figure 5. It can be seen that the compressed trajectory strikes a good balance between accuracy and simplicity.
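One possible reading of the retained points is sketched below: a point is kept when the acceleration changes sign there. This is an interpretation rather than the AMDL definition, and the function name, timestamps and speeds are made up. Timestamps are assumed to be strictly increasing.

```python
def acceleration_sign_changes(times, speeds):
    """Indices of points where acceleration switches between positive and negative."""
    acc = [(speeds[i + 1] - speeds[i]) / (times[i + 1] - times[i])
           for i in range(len(speeds) - 1)]
    keep = []
    for i in range(1, len(acc)):
        # acc[i - 1] covers the segment ending at point i, acc[i] the segment starting there.
        if acc[i - 1] * acc[i] < 0:
            keep.append(i)
    return keep


times = [0, 10, 20, 30, 40, 50]               # seconds (made up)
speeds = [8.0, 9.0, 10.0, 9.5, 9.0, 9.8]      # knots (made up)
kept = acceleration_sign_changes(times, speeds)
```

In this example the ship stops speeding up at index 2 and stops slowing down at index 4, so those two points would be protected from compression.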


Figure 4. Comparison of the number of points before and after data processing.

(a) Before compression (b) After compression

Figure 5. Comparison of results before and after trajectory compression.

5.2. Detection and analysis of abnormal ship behavior

Ship behavior anomaly detection was analyzed in terms of recall, precision, F1 score and accuracy. Since there is no official standard AIS data set, the data were labeled as accurately as possible as follows: the noise points of multi-dimensional density clustering were marked as abnormal trajectory points, and the rest were marked as normal trajectory points. There were 935 abnormal trajectory points and 14798 normal trajectory points. The trajectory outlier detection method of this paper (MDDBSCAN) is compared with the TODLOF trajectory outlier detection method [20] and the isolation-based trajectory outlier detection (IBTOD) method [24] in terms of detection rate and false alarm rate. The confusion matrices of the detection results for normal and abnormal trajectory points are shown in Figures 6–8. The MDDBSCAN method was also compared with TODLOF, IBTOD, the graph attention network (GAT) [35], Long Short-Term Memory (LSTM) and feature fusion methods [36]; the results are shown in Figure 9 and Table 1.
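For reference, the four metrics follow directly from the cells of a confusion matrix. The sketch below uses the standard definitions; the counts are made up and are not taken from Figures 6–8, and which class is treated as positive is left to the caller.

```python
def metrics(tp, fp, fn, tn):
    """Recall, precision, F1 and accuracy from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, f1, accuracy


# Made-up counts for illustration only.
recall, precision, f1, accuracy = metrics(tp=90, fp=10, fn=30, tn=870)
```

With these counts, recall is 0.75 and precision is 0.90, which gives an F1 score of about 0.818 and an accuracy of 0.96.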


Figure 6. The result of IBTOD algorithm.

Figure 7. The result of MDDBSCAN algorithm.

Figure 8. The result of TODLOF algorithm.


Table 1. The ship position anomaly detection results for different methods.

Methods          Recall   Precision   F1       Accuracy
MDDBSCAN         0.9803   1           0.9901   0.9814
TODLOF           0.8752   0.9964      0.9319   0.8796
IBTOD            0.8437   0.9826      0.9078   0.8389
GAT              0.9135   0.8906      0.9013   0.9022
LSTM             0.7214   0.8049      0.7131   0.7651
Feature Fusion   0.9700   0.9650      0.9600   0.9600

[Bar chart comparing MDDBSCAN, TODLOF, IBTOD, GAT, LSTM and Feature Fusion on Recall, Precision, F1 and Accuracy (vertical axis: value); the plotted values are those listed in Table 1.]

Figure 9. Ship position anomaly detection results for different methods.

Table 1 and Figure 9 show that the MDDBSCAN method outperforms the other methods. The reasons are as follows. TODLOF is based on the local outlier factor algorithm, which requires an obvious density difference in the detected data. However, for a ship trajectory with a fixed round-trip destination, an obvious density difference cannot always be ensured, which limits the application scenarios of the algorithm. IBTOD is based on the isolation forest algorithm, which works best on small data sets: with a large number of samples, normal samples interfere with the isolation process and reduce the ability of the isolation forest to isolate outliers. The algorithm also assumes that abnormal samples make up only a small proportion of the data, so its application scenarios are relatively limited as well. The GAT, LSTM and feature fusion methods are all based on deep learning. Their detection capability depends on the quality of the training data set and on suitable hyper-parameters, so their detection performance is unstable.
For the detection and analysis of ship speed outliers, 246 ship speed values in a grid area were selected in this study. Five of the values were marked as outliers, and the rest were marked as normal values. Figure 10(a)–(d) each show how the number of noise points varies with four different weight values. For example, in the experiment in Figure 10(a), when the velocity weight was set to 0.2, the other three weights were each set to one-third of 0.8, and when the velocity weight was set to 0.25, the other three weights were each set to one-third of 0.75.
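The weight assignment in this example can be written as a small helper. It assumes four clustering dimensions (speed plus three others, as the example implies) and enforces the bound that the speed weight should not exceed the reciprocal of the number of dimensions; the function name is hypothetical.

```python
def dimension_weights(speed_weight, n_dims=4):
    """Return [speed weight, other weights...] with the remainder shared equally."""
    if not 0 <= speed_weight <= 1 / n_dims:
        raise ValueError("speed weight should not exceed 1 / n_dims")
    other = (1 - speed_weight) / (n_dims - 1)
    return [speed_weight] + [other] * (n_dims - 1)


weights_020 = dimension_weights(0.2)    # the other three share 0.8
weights_025 = dimension_weights(0.25)   # all four weights are equal
```

At the upper bound of 0.25 all four dimensions carry the same weight, which is why the speed weight never exceeds the weights of the other dimensions.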

(a) area 1 (b) area 2

(c) area 3 (d) area 4

Figure 10. The number of noise points under different weights.

Four speed weight values were selected to measure the relationship between speed weight and noise points. As seen from Figure 10, the number of noise points shows a clear downward trend as the speed weight increases, compared with increasing the weights of the other three dimensions. This indicates that the ship speeds in the data are relatively uniform, so the speed weight should not exceed the weights of the other three dimensions. Next, a different threshold for judging the speed as abnormal was calculated for each speed weight value. The confusion matrices of the detection results are shown in Figures 11–13.
It can be seen from Figures 11–13 that score(w) with w ≤ 0.25 provides a more appropriate anomaly threshold for judging whether the speed is abnormal. From Figures 11–13, the recall, precision, accuracy and F1 values of the model can be calculated under the different anomaly thresholds; the method was also compared with feature fusion, and the results are shown in Figure 14 and Table 2.


Figure 11. The result of score(0.05).

Figure 12. The result of score(0.15).

Figure 13. The result of score(0.25).

Table 2. The ship speed anomaly detection results for different thresholds and methods.

Threshold/Method   Recall   Precision   F1       Accuracy
score(0.05)        0.9959   1           0.9979   0.9959
score(0.15)        1        1           1        1
score(0.25)        1        1           1        1
0.75               1        0.9877      0.9938   0.9879
0.8                1        0.9797      0.9897   0.9797
Feature Fusion     0.9600   0.9700      0.9600   0.9600


[Bar chart comparing score(0.05), score(0.15), score(0.25), the fixed thresholds 0.75 and 0.8, and Feature Fusion on Recall, Precision, F1 and Accuracy (vertical axis: value); the plotted values are those listed in Table 2.]

Figure 14. Ship speed anomaly detection results for different thresholds and methods.

Table 2 and Figure 14 show that when w ≤ 0.25, the threshold given by the score(w) formula allows the model to detect abnormal ship speed more reliably. It can be seen that score(w) can be used to determine a more appropriate anomaly threshold for judging whether the speed is abnormal. When the anomaly threshold > 0.75, the detection capability begins to deteriorate, because the speeds of ships at sea are relatively uniform: a high anomaly threshold makes it difficult to identify the anomalous data accurately in a data set with a low degree of dispersion. The feature fusion method is based on a deep learning algorithm, and its detection capability depends on suitable hyper-parameters and on the quality of the training set, so it is not stable enough. Next, the improved iForest algorithm (which adopts the strategy of selectively constructing isolated trees) is compared with the traditional iForest algorithm in terms of algorithm efficiency; the results are shown in Table 3.

Table 3. The comparison of the efficiency of different methods of detection.

Method             Number of data points detected   Running time (ms)   Average detection time for a single data point (ms)
iForest            246                              1750                7.11
Improved iForest   246                              1662                6.76

As can be seen from Table 3, the iForest algorithm takes 7.11 ms to detect a single data point, while the improved iForest algorithm takes 6.76 ms. In terms of algorithmic efficiency, the improved iForest method is about 5% faster than the traditional iForest algorithm. The reason is that the improved iForest algorithm selectively constructs isolated trees: when the ratio of the number of samples divided into the left sub-tree to the number of samples divided into the right sub-tree is not large, it stops constructing that tree, so it is more efficient than the iForest algorithm.
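The per-point times and the relative saving follow directly from the running times in Table 3:

```latex
\begin{aligned}
\text{iForest:}\quad & 1750 / 246 \approx 7.11\ \text{ms per point},\\
\text{improved iForest:}\quad & 1662 / 246 \approx 6.76\ \text{ms per point},\\
\text{relative saving:}\quad & (1750 - 1662) / 1750 = 88 / 1750 \approx 5.0\%.
\end{aligned}
```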


Finally, we carried out ablation experiments to verify the accuracy of the iForest algorithm in detecting abnormal ship speed after noise removal by the MDDBSCAN algorithm. In this experiment, the threshold used to detect whether the ship speed is abnormal was set to score(0.15). The results are shown in Figures 15 and 16 and Table 4. It can be seen from Figures 15 and 16 and Table 4 that the detection capability of MDDBSCAN-improved iForest is better than that of Improved iForest alone. The reason is that the MDDBSCAN algorithm provides a global anomaly detection scenario for the iForest algorithm, which has high accuracy on such data sets.

Figure 15. The result of MDDBSCAN-improved iForest.

Figure 16. The result of Improved iForest.

Table 4. The comparison of Improved iForest and MDDBSCAN-improved iForest.

Method                      Recall   Precision   F1       Accuracy
Improved iForest            0.9834   0.9916      0.9875   0.9756
MDDBSCAN-improved iForest   1        1           1        1


6. Conclusions

This method separates the detection of ship behavior outliers into three steps. The first step is data preprocessing and data compression, which makes the description of ship trajectories both accurate and simple. Second, the position information modeling scheme detects the ship position outliers. In a comparison with five other trajectory outlier detection methods, the method in this paper achieved a better detection effect. Finally, the isolation forest algorithm is used to detect the ship's speed outliers, and the functional relationship between the speed weight of multi-dimensional density clustering and the threshold for determining the speed as abnormal has been established. Experiments showed that the threshold selected by score(w) gave good results for detecting ship speed outliers. The abnormal ship behavior detection method in this paper is suitable for online detection and can also be used to mine abnormal ship information other than speed, such as ship acceleration and heading. Meanwhile, as computing power continues to improve, establishing a more efficient and accurate abnormal ship behavior detection model is a promising direction.

Use of AI tools declaration

The authors declare they have not used Artificial Intelligence (AI) tools in the creation of this
article.

Acknowledgments

This research was funded by the Key Research and Development Program of China, grant
number 2021YFC2802503; Key Research and Development Program of Shaanxi Province, grant
number 2021ZDLGY05-05 and 2019ZDLGY12G07.

Conflict of interest

The authors declare no conflict of interest.

References

1. C. Claramunt, C. Ray, E. Camossi, A. Jousselme, M. Hadzagic, G. Andrienko, et al., Maritime data integration and analysis: recent progress and research challenges, in 20th International Conference on Extending Database Technology, 2017.
2. T. Lv, C. He, J. Zhang, Z. Song, Massive AIS data storage and query based on Hadoop platform, J. Phys. Conf. Ser., 1948 (2021), 012016. https://doi.org/10.1088/1742-6596/1948/1/012016
3. L. Zhang, Y. Zhu, W. Lu, J. Wen, A detection and restoration approach for vessel trajectory anomalies based on AIS, J. Northwest. Polytech. Univ., 39 (2021), 119–125. https://doi.org/10.1051/jnwpu/20213910119
4. K. Wolsing, L. Roepert, J. Bauer, K. Wehrle, Anomaly detection in maritime AIS tracks: A review of recent approaches, J. Mar. Sci. Eng., 10 (2022), 112. https://doi.org/10.3390/jmse10010112


5. C. Tian, Y. Yuan, S. Zhang, C. Lin, W. Zuo, D. Zhang, Image super-resolution with an enhanced group convolutional neural network, Neural Netw., 153 (2022), 373–385. https://doi.org/10.1016/j.neunet.2022.06.009
6. C. Tian, Y. Zhang, W. Zuo, C. Lin, D. Zhang, Y. Yuan, A heterogeneous group CNN for image super-resolution, IEEE Trans. Neural Netw. Learn. Syst., 13 (2022). https://doi.org/10.1109/TNNLS.2022.3210433
7. D. Zhang, L. Nan, Z. Zhou, C. Chen, L. Sun, S. Li, iBAT: Detecting anomalous taxi trajectories from GPS traces, in UbiComp 2011: Ubiquitous Computing, 13th International Conference, (2011), 99–108. https://doi.org/10.1145/2030112.2030127
8. J. Zhu, W. Jiang, A. Liu, G. Liu, L. Zhao, Time-dependent popular routes based trajectory outlier detection, in International Conference on Web Information Systems Engineering, 9418 (2015). https://doi.org/10.1007/978-3-319-26190-4_2
9. C. Chen, D. Zhang, P. Castro, N. Li, L. Sun, S. Li, et al., iBOAT: isolation-based online anomalous trajectory detection, IEEE Trans. Intell. Trans. Syst., 14 (2013), 806–818. https://doi.org/10.1109/TITS.2013.2238531
10. J. Zhu, W. Jiang, A. Liu, G. Liu, L. Zhao, Effective and efficient trajectory outlier detection based on time-dependent popular route, World Wide Web, 20 (2017), 111–134. https://doi.org/10.1007/s11280-016-0400-6
11. W. Hao, W. Sun, B. Zheng, A fast trajectory outlier detection approach via driving behavior modeling, in Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, (2017), 837–846. https://doi.org/10.1145/3132847.3132933
12. L. Bao, M. Du, A distance-based trajectory outlier detection method on maritime traffic data, in 2018 4th International Conference on Control, Automation and Robotics (ICCAR), 2018. https://doi.org/10.1109/ICCAR.2018.8384697
13. E. Martineau, J. Roy, Maritime anomaly detection: domain introduction and review of selected literature, Defense Res. Develop. Canada, 2011.
14. L. Portnoy, E. Eskin, S. Stolfo, Intrusion detection with unlabeled data using clustering, ACM Workshop Data Mining Appl., 2001.
15. S. Zhang, Q. Tang, Abnormal vessel behavior detection based on AIS Data, Artif. Intell. Rob. Res., 04 (2015), 23–31. https://doi.org/10.12677/airr.2015.44004
16. R. Lane, D. Nevell, S. Hayward, T. W. Beaney, Maritime anomaly detection and threat assessment, in 2010 13th International Conference on Information Fusion, (2010), 1–8. https://doi.org/10.1109/ICIF.2010.5711998
17. R. Laxhammar, Anomaly detection for sea surveillance, in International Conference on Information Fusion, (2008), 1–8.
18. Y. Wang, J. Liu, R. Liu, Y. Liu, Z. Yuan, Data-driven methods for detection of abnormal ship behavior: Progress and trends, Ocean Eng., 271 (2023). https://doi.org/10.1016/j.oceaneng.2023.113673
19. F. Luan, Y. Zhang, K. Cao, Q. Li, Based local density trajectory outlier detection with partition-and-detect framework, in 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), (2017), 1708–1714. https://doi.org/10.1109/FSKD.2017.8393023


20. B. Liang, S. Wu, W. Chen, Z. Zhu, Trajectory outlier detection based on partition-and-detection framework, in 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), 2017. https://doi.org/10.1109/FSKD.2017.8393071
21. A. Belhadi, Y. Djenouri, D. Djenouri, T. Michalak, J. C. Lin, Deep learning versus traditional solutions for group trajectory outliers, IEEE Trans. Cybernetics, 6 (2020), 1–12. https://doi.org/10.1109/TCYB.2020.3029338
22. M. Szarmach, I. Czarnowski, Multi-Label classification for AIS data anomaly detection using wavelet transform, IEEE Access, 10 (2022), 109119–109131. https://doi.org/10.1109/ACCESS.2022.3214217
23. Y. Chen, J. Yu, G. Yong, Detecting trajectory outliers based on spark, in 2017 25th International Conference on Geoinformatics, (2017), 1–5. https://doi.org/10.1109/GEOINFORMATICS.2017.8090919
24. K. Hu, P. Duan, B. Hu, Q. Duan, IBTOD: An isolation-based method to detect outlying sub-trajectories on multi-factors, in IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conference, 2018. https://doi.org/10.1109/IMCEC.2018.8469416
25. A. Belhadi, Y. Djenouri, C. Lin, Comparative study on trajectory outlier detection algorithms, in 2019 International Conference on Data Mining Workshops (ICDMW), (2019), 415–423. https://doi.org/10.1109/ICDMW.2019.00067
26. M. Riveiro, G. Pallotta, M. Vespe, Maritime anomaly detection: A review, Wiley Interdiscip. Rev. Data Mining Knowl. Discovery, 8 (2018), 8. https://doi.org/10.1002/widm.1266
27. S. Papadimitriou, H. Kitagawa, P. Gibbons, C. Faloutsos, LOCI: fast outlier detection using the local correlation integral, in Proceedings 19th International Conference on Data Engineering, (2003), 315–326. https://doi.org/10.1109/ICDE.2003.1260802
28. G. Pallotta, M. Vespe, K. Bryan, Vessel pattern knowledge discovery from AIS data: A framework for anomaly detection and route prediction, Entropy, 15 (2013), 2218–2245. https://doi.org/10.3390/e15062218
29. W. Dai, C. Zhang, X. Su, S. Cao, Trajectory outlier detection based on DBSCAN and velocity entropy, in 2020 International Conferences on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData) and IEEE Congress on Cybermatics (Cybermatics), (2020), 550–557. https://doi.org/10.1109/iThings-GreenCom-CPSCom-SmartData-Cybermatics50389.2020.00097
30. Z. Cheng, C. Zou, J. Dong, Outlier detection using isolation forest and local outlier factor, in Proceedings of the Conference on Research in Adaptive and Convergent Systems, (2019), 161–168. https://doi.org/10.1145/3338840.3355641
31. F. Luan, Y. Zhang, K. Cao, Q. Li, Based local density trajectory outlier detection with partition-and-detect framework, in 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), (2017), 1708–1714. https://doi.org/10.1109/FSKD.2017.8393023
32. F. T. Liu, K. M. Ting, Z. Zhou, Isolation forest, in Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, (2008), 413–422. https://doi.org/10.1109/ICDM.2008.17
33. Historical AIS Data Services (accessed on 10 December 2018). Available from: http://www.vtexplorer.com/


34. C. Iphar, C. Ray, A. Napoli, Data integrity assessment for maritime anomaly detection, Expert Syst. Appl., 147 (2020), 3. https://doi.org/10.1016/j.eswa.2020.113219
35. H. Liu, Y. Liu, Z. Zong, Research on ship abnormal behavior detection method based on graph neural network, in 2022 IEEE International Conference on Mechatronics and Automation (ICMA), (2022), 834–838. https://doi.org/10.1109/ICMA54519.2022.9856198
36. G. Huang, S. Lai, C. Ye, H. Zhou, Ship trajectory anomaly detection based on multi-feature fusion, in 2021 IEEE International Conference on Smart Data Services (SMDS), (2021), 72–81. https://doi.org/10.1109/SMDS53860.2021.00020

©2023 the Author(s), licensee AIMS Press. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0)
