Reachable Distance for KNN Classification
Shichao Zhang, Senior Member, IEEE, Jiaye Li and Yangding Li
Abstract—The distance function is a principal metric for measuring the affinity between two data points in machine learning. Extant distance functions often provide unreachable distance values in real applications, which can lead to incorrect measurement of the affinity between data points. This paper proposes a reachable distance function for KNN classification. The reachable distance function is not a geometric straight-line distance between two data points; it takes the class attribute of a training dataset into account when measuring the affinity between data points. Concretely speaking, the reachable distance between data points comprises their class-center distance and their real distance. Its shape looks like a “Z”, so we also call it the Z distance function. In this way, the affinity between data points in the same class is always stronger than that between data points in different classes; that is, intraclass data points are always closer than interclass data points. We evaluated the reachable distance with experiments and demonstrated that the proposed distance function achieves better performance in KNN classification.
1 INTRODUCTION
In machine learning applications, we must measure the affinity between data points in order to carry out data clustering or data classification. A natural way of measuring the affinity between data points is the Euclidean distance function or one of its variants. It is true that Euclidean distance is geometrically the shortest distance between two data points, but it is only a spatially straight-line measure [1]. However, the Euclidean distance function and its variants often provide unreachable distance values in real applications. This can lead to incorrect measurement of the affinity between data points. In other words, although the Euclidean distance function and its variants have been widely adopted in data analysis and processing applications, they do not give a truly reachable distance in real applications. We illustrate this with two cases as follows.
Case I. There is a gap, such as a frontier, a river/sea, or a mountain, between two points; see Fig. 1. This gap can often be unbridgeable. In other words, the two points cannot reach each other along the straight-line (Euclidean) path.
Case II. Two data points are packed in different bags/shelves, as illustrated in Fig. 2. For example, in a clinic, doctors always put the medical records of benign tumors into one bag and all medical records of malignant tumors into another bag. This indicates that a medical record concerning a benign tumor is very far from any medical record of a malignant tumor.
From the above Cases I and II, the Euclidean distance between two data points cannot be the reachable distance. In other words, the reachable distance between two data points has been an open problem. However, in big data mining, we must measure the affinity between data points with proper metrics. These observations motivate this research.
Fig. 2. Medical records with different classes are often packed in different bags (A: malignant tumors; B: benign tumors).
that in different classes. In this way, the intraclass data points are always closer than the interclass data points in a training dataset. We evaluate the reachable distance with experiments and demonstrate that the proposed distance function achieves better performance in KNN classification.
The rest of this paper is organized as follows. Related work and some concepts are recalled in Section 2. The Z distance (reachable
distance) is proposed in Section 3. The reachable distance is evaluated with experiments in Section 4. This paper is concluded in
Section 5.
2 RELATED WORK
This section first recalls traditional distance functions. Then, some new distance functions are briefly discussed. Finally, we briefly discuss why Euclidean distance is often not the reachable distance in real applications.
[12], Mahalanobis distance [13], Bhattacharyya distance [14], Kullback-Leibler divergence [15], Hamming distance [16] and cosine distance [17]. Next, we introduce these distance functions in detail.

The traditional distance function is defined as follows:

d(a, b) = [Σ_{j=1}^{d} |aj − bj|^p]^{1/p}   (1)

where S is the covariance matrix. It can be seen from Eq. (4) that if the covariance matrix is an identity matrix, the Mahalanobis distance is the same as the Euclidean distance. If the covariance matrix is a diagonal matrix, then the Mahalanobis distance is the same as the standardized Euclidean distance. It should be noted that the Mahalanobis distance requires the number of samples to be greater than the number of dimensions, so that the inverse of the covariance matrix S exists. Its disadvantage is computational instability caused by the covariance matrix S.

The Bhattacharyya distance is a measure of the similarity between two probability distributions, as shown in the following formula:

dB(p, q) = − ln(BC(p, q))   (5)

where p and q are two probability distributions on data X. For a discrete probability distribution, BC(p, q) = Σ_{x∈X} √(p(x)q(x)); for a continuous probability distribution, BC(p, q) = ∫ √(p(x)q(x)) dx.

The KL (Kullback-Leibler) divergence is similar to the Bhattacharyya distance. It can also measure the distance or similarity of two probability distributions, as shown in the following formulas:

KL(p‖q) = Σ_{x} p(x) log (p(x)/q(x))   (6)

KL(p‖q) = ∫ p(x) log (p(x)/q(x)) dx   (7)

Eqs. (6) and (7) are for discrete and continuous probability distributions, respectively. KL divergence has a wider range of applications than Mahalanobis distance.
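The distance functions recalled above can be sketched in pure Python. This is a minimal illustration (function names are our own); it shows Eq. (1) with p = 1 and p = 2, a 2-D Mahalanobis distance with an explicitly given inverse covariance matrix, and the discrete forms of the Bhattacharyya distance and KL divergence:

```python
import math

def minkowski(a, b, p):
    """Eq. (1): d(a, b) = (sum_j |a_j - b_j|^p)^(1/p)."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)

def mahalanobis_2d(a, b, S_inv):
    """Mahalanobis distance sqrt((a-b)^T S^{-1} (a-b)) for 2-D points,
    with the 2x2 inverse covariance matrix S_inv given explicitly."""
    dx, dy = a[0] - b[0], a[1] - b[1]
    q = (dx * (S_inv[0][0] * dx + S_inv[0][1] * dy)
         + dy * (S_inv[1][0] * dx + S_inv[1][1] * dy))
    return math.sqrt(q)

def bhattacharyya(p, q):
    """Eq. (5), discrete case: d_B = -ln(sum_x sqrt(p(x) q(x)))."""
    return -math.log(sum(math.sqrt(px * qx) for px, qx in zip(p, q)))

def kl_divergence(p, q):
    """Eq. (6), discrete case: KL(p||q) = sum_x p(x) log(p(x)/q(x))."""
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

a, b = (1.0, 2.0), (4.0, 6.0)
print(minkowski(a, b, 2))   # 5.0 (Euclidean)
print(minkowski(a, b, 1))   # 7.0 (Manhattan)
# With an identity covariance matrix, Mahalanobis reduces to Euclidean:
print(mahalanobis_2d(a, b, [[1.0, 0.0], [0.0, 1.0]]))  # 5.0
print(kl_divergence((0.5, 0.5), (0.5, 0.5)))           # 0.0 (identical distributions)
print(bhattacharyya((0.5, 0.5), (0.25, 0.75)))         # > 0 for differing distributions
```

The identity-covariance call demonstrates the property stated above: Mahalanobis distance degenerates to Euclidean distance when S is the identity matrix.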
In data transmission error control coding, the Hamming distance is often used to measure the distance between two characters. It describes the number of positions at which two codes differ. The formula is defined as follows:

Hd(a, b) = Σ_{j=1}^{d} aj ⊕ bj   (8)

where ⊕ is the XOR operation, and a and b are both n-bit codes. For example, if a = 11100111 and b = 10011001, then the Hamming distance between a and b is Hd(a, b) = 6. The Hamming distance is mostly used in signal processing; it can be used to calculate the minimum number of operations required to transform one signal into another.
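Eq. (8) and the worked example above can be checked with a short sketch (the helper is our own; it counts differing positions directly, which is equivalent to XOR-and-count on bit strings):

```python
def hamming(a: str, b: str) -> int:
    """Eq. (8): number of positions where two equal-length codes differ."""
    assert len(a) == len(b), "Hamming distance requires codes of equal length"
    return sum(x != y for x, y in zip(a, b))

# The paper's example: a = 11100111 and b = 10011001 differ in 6 positions.
print(hamming("11100111", "10011001"))  # 6
```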
In addition to the above distance functions, there is also the cosine distance function. It is derived from the cosine of the included angle between two vectors, as shown in the following formula:

dC(a, b) = 1 − (a · b) / (‖a‖‖b‖) = 1 − (Σ_{j=1}^{d} aj bj) / ([Σ_{j=1}^{d} aj^2]^{1/2} [Σ_{j=1}^{d} bj^2]^{1/2})   (9)

Cosine distance is mostly used in machine learning algorithms to calculate the distance or similarity between two data points. Its value range is [0, 2], which satisfies the non-negativity of a distance function. Its disadvantage is that it only considers the directions of two samples, not the magnitudes of their values.

Ying et al. proposed a semi-supervised distance metric learning method [23]. Specifically, it first uses the structural information of the data to formulate a semi-supervised distance metric learning model. Then it transforms the proposed method into a problem of minimizing over symmetric positive definite matrices. Finally, it proposes an accelerated solution method that keeps the matrix symmetric and positive definite in each iteration. Wang et al. proposed a robust metric learning method [24]. This method is an improvement of nearest neighbor classification with large margin. Its main idea is to use a random distribution to estimate the posterior distribution of the transformation matrix. It can reduce the influence of noise in the data, and its noise robustness is verified in experiments. Jiao et al. proposed a KNN classification method based on a pairwise distance metric [25]. It uses the theory of confidence functions to decompose the classifier into pairwise distance functions, which are adaptively designed as pairs of KNN sub-classifiers. Finally, it performs multi-class classification by integrating these sub-classifiers. Song et al. proposed a high-dimensional KNN search algorithm based on the Bregman distance [26]. Specifically, it first partitions the total dimensions to obtain multiple subspaces.
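A minimal sketch of Eq. (9), illustrating the [0, 2] value range and the fact that the magnitudes of the vectors are ignored (function name is ours):

```python
import math

def cosine_distance(a, b):
    """Eq. (9): d_C(a, b) = 1 - (a . b) / (||a|| ||b||); range is [0, 2]."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb)

print(cosine_distance((1, 0), (2, 0)))   # 0.0  (same direction, magnitude ignored)
print(cosine_distance((1, 0), (0, 1)))   # 1.0  (orthogonal)
print(cosine_distance((1, 0), (-1, 0)))  # 2.0  (opposite direction)
```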
Distance function            Measuring affinity   Class information   With or without parameters   Reachable distance
Euclidean distance           ✓                    ×                   ×                            ×
Manhattan distance           ✓                    ×                   ×                            ×
Chebyshev distance           ✓                    ×                   ×                            ×
Mahalanobis distance         ✓                    ×                   ×                            ×
Bhattacharyya distance       ✓                    ×                   ×                            ×
Kullback-Leibler divergence  ✓                    ×                   ×                            ×
Hamming distance             ✓                    ×                   ×                            ×
Cosine distance              ✓                    ×                   ×                            ×
Gou et al. [19]              ✓                    ✓                   ✓                            ×
Poorheravi et al. [20]       ✓                    ✓                   ✓                            ×
Song et al. [21]             ✓                    ✓                   ×                            ×
Noh et al. [22]              ✓                    ✓                   ✓                            ×
Ying et al. [23]             ✓                    ✓                   ✓                            ×
Wang et al. [24]             ✓                    ✓                   ✓                            ×
Jiao et al. [25]             ✓                    ✓                   ✓                            ×
Song et al. [26]             ✓                    ✓                   ✓                            ×
Su et al. [27]               ✓                    ✓                   ✓                            ×
Faruk Ertugrul et al. [28]   ✓                    ×                   ✓                            ×
Then it uses a local hybrid spill tree to divide the data set into multiple subsets and calculates the class membership degree of each subset. In addition, it uses a global approximate hybrid spill tree to generate a tree from the training data, so as to consider the class membership degrees of all samples. In this way, the method considers not only the local structure information of the data, but also its global structure.
Distance functions can be used not only for KNN, but also for missing-value imputation, class-imbalance classification, multi-label learning, and clustering. They have a wide range of applications. For example, Seoane Santos et al. used KNN to perform missing-value imputation with different distance functions and verified the effects of the different functions [33]. Marchang et al. used KNN to propose a sparse population perception model [34]. It considers spatial correlation and temporal correlation separately in the algorithm; in addition, the correlation between time and space is also embedded in the proposed method. Experiments have also shown that KNN that considers the correlation between time and space performs better in the inference of missing data. Valverde et al. used KNN for text classification and investigated the influence of different distance functions on text classification [35]. Susan and Kumar proposed a combination of metric learning and KNN for class-imbalance data classification [36]. Specifically, it first performs a spatial transformation on the data. Then it divides the K test samples into two clusters according to the distance of the two extreme neighbors. Finally, the majority-vote rule is used to determine the class label of the test data. Although these researchers have proposed some new measurement functions, none of them really takes the natural distance in the data into account. Sun et al. proposed a metric learning method for multi-label classification [37]. It is modeled by the interaction between the sample space and the label space. Specifically, it first adopts a matrix-weighted representation based on components. Then it uses triplets to optimize the weights of the components. Finally, the effectiveness of the combined metric in multi-label classification is verified on 16 benchmark data sets. Gu et al.
proposed a new distance metric for clustering [38]. This method combines the advantages of Euclidean distance and cosine distance, and it can be applied to clustering to solve high-dimensional problems. Gong et al. used an indexable distance to perform nearest-neighbor queries [39]. It uses a kd-tree to further improve the search speed of the algorithm. Wang analyzed multimodal data and showed the importance of the distance function in deep multimodal methods [40]. Wang et al. proposed a dimensionality-reduction algorithm for multi-view data, called kernel multi-view subspace analysis [41]. It uses self-weighted learning to apply appropriate weights to different views. After reducing the dimension of multi-view data, this method can greatly increase the applicability of distance functions.
In addition to the above applications of improving KNN based on distance functions, KNN with machine-learned distances is also widely used in the field of time series classification [42]. In this field, general distance functions, such as Euclidean distance and Hamming distance, compute similarity from elements at corresponding positions, which makes them unsuitable for time series classification. The reason is that when the same phenomenon is observed many times, we cannot expect it to always occur at the same time and position, and the duration of the event may differ slightly [43]. However, there are distance functions that address this problem.
For example, Sakoe and Chiba proposed dynamic time warping (DTW) to calculate the distance between spoken-word recognition time series [44]. Specifically, it first obtains a general principle of time normalization by using a time-warping function. Then it derives the symmetric and asymmetric forms of the distance function from this principle. Finally, it uses a slope constraint to limit the slope of the warping function, so as to improve the ability to distinguish between different classes. Nearest neighbor classifiers based on DTW and its variants have achieved great success in time series classification because they consider the unique temporal features of time series data.
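As a hedged illustration of why DTW suits time series where element-wise distances fail, here is the textbook dynamic-programming recurrence (a minimal sketch without the Sakoe-Chiba slope constraint described above):

```python
def dtw(s, t):
    """Dynamic time warping between two 1-D series: cumulative cost of the
    cheapest monotone alignment, filled in by dynamic programming."""
    INF = float("inf")
    n, m = len(s), len(t)
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            # extend the best of: insertion, deletion, or match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# The same shape shifted in time: DTW aligns it at zero cost,
# while element-wise (position-by-position) comparison does not.
a = [0, 0, 1, 2, 1, 0]
b = [0, 1, 2, 1, 0, 0]
print(dtw(a, b))                              # 0.0
print(sum(abs(x - y) for x, y in zip(a, b)))  # 4 (element-wise mismatch)
```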
TABLE 2
The symbols used in this paper.

Notations              Descriptions
X ∈ R^(n×d)            Training set with n samples and d features.
Xtest ∈ R^(m×d)        Test set with m samples and d features.
Xlabel ∈ R^(1×n)       Class labels of the training data.
YS                     Class labels of the test data.
a                      Sample a.
b                      Sample b.
aj                     The j-th element in vector a.
ca                     The center point of the class in which sample point a is located.
cb                     The center point of the class in which sample point b is located.
ce                     The center point of the class in which sample point e is located.
c1, c2, c3, . . ., cc  The center points of the first to c-th classes.
d(a, b)                Euclidean distance between a and b.
mean()                 Mean function.
Gj                     The j-th class data in the training set.
N(Z0) and N(Z)         K-nearest neighbor sets under the Z0 distance and the Z distance.
µ and K                Adjustable parameters.
collectors take data as detailed as possible, so as to support many more data mining applications. In this way, some natural separation information can be merged into databases; see Case I in the introduction section.
In extant data mining applications, both data collectors and data miners are unaware that there may be an unbridgeable gap between two data points, i.e., that the Euclidean distance between two data points is not the reachable distance between them. This inevitably degrades performance.
Different from current distance functions, this research proposes a reachable distance function, aiming to ensure that intraclass data points are always closer than interclass data points in training datasets. The reachable distance function offers a clue toward developing more suitable distance functions.
3 APPROACH
In this article, we use lowercase letters, lowercase bold letters, and uppercase bold letters to represent scalars, vectors, and matrices, respectively. Assume a given sample data set X ∈ R^(n×d), where n and d represent the number of samples and the number of features, respectively. aj represents the j-th element in vector a. Let ca be the center point of the class in which sample point a is located, and cb the center point of the class in which sample point b is located. The symbols used in this paper are shown in Table 2.
where d(t, c1) represents the Euclidean distance from t to c1, t represents the test data point, and c1, c2, c3, . . ., cc represent the center points of the first to c-th classes. From Eq. (10), in the process of classification we do not need to set any parameters, unlike K-nearest neighbor classification (such as the selection of the K value). In practical applications, it is only necessary to compute the distance from the test data to the center point of each class; the class whose center point is closest is then predicted as the class label of the test data.
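The classification rule described above (the Eq. (10) idea) can be sketched in a few lines of pure Python; the helper names are our own:

```python
import math
from collections import defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def class_centers(X, y):
    """Mean point of each class in the training data."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {c: tuple(sum(col) / len(pts) for col in zip(*pts))
            for c, pts in groups.items()}

def ncp_predict(t, centers):
    """Predict the class whose center point is closest to test point t."""
    return min(centers, key=lambda c: euclidean(t, centers[c]))

X = [(0.0, 0.0), (1.0, 1.0), (9.0, 9.0), (10.0, 10.0)]
y = [0, 0, 1, 1]
centers = class_centers(X, y)            # {0: (0.5, 0.5), 1: (9.5, 9.5)}
print(ncp_predict((2.0, 2.0), centers))  # 0
```

Note that, exactly as stated above, no K parameter is involved; only one distance per class center is computed per test point.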
Although the above-described distance function (i.e., Eq. (10)) takes the characteristics of the class into account, it is still based on the Euclidean distance to some extent. In addition, the calculated distances are poorly separable: different class centers may be close to each other, or different class centers may be equidistant from the test data point, in which case the classification effect of the algorithm will not be good. Therefore, in this paper, we propose a reachable distance function for KNN classification as follows.
Definition 1. Let a and b be two sample points, ca the center point of the class in which sample point a is located, and cb the center point of the class in which sample point b is located. The Z0 distance between a and b is defined as

Z0(a, b) = d(a, ca) + d(b, cb) + µ ∗ d(ca, cb)   (11)

where d() is the Euclidean distance between two points.
It can be seen from Eq. (11) that if a and b do not belong to the same class, then their distance becomes larger through the Z0 function. In other words, the calculation in Eq. (11) makes the distance between points of the same class smaller than the distance between sample points of different classes. In this way, the separability of the classes is greatly increased. It is undeniable that this distance function changes our previous perception of distance. It should be noted that this method makes the classes more separable, but it also increases the distance between sample points of the same class compared to the traditional method, because it introduces a class center point: as illustrated in Fig. 3, it turns the original straight-line distance into a polyline distance. In response to this small defect, we can improve the Z0 distance in Eq. (11) as follows.
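A minimal sketch of the Z0 distance of Eq. (11); the value µ = 2 here is an arbitrary illustrative choice:

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z0_distance(a, b, c_a, c_b, mu=2.0):
    """Eq. (11): Z0(a, b) = d(a, c_a) + d(b, c_b) + mu * d(c_a, c_b),
    where c_a and c_b are the class centers of a and b."""
    return euclidean(a, c_a) + euclidean(b, c_b) + mu * euclidean(c_a, c_b)

# Two points of the same class share a center, so the mu-term vanishes,
# but the path still bends through the class center (a polyline, not a line).
c0, c1 = (0.0, 0.0), (10.0, 0.0)
same = z0_distance((1.0, 0.0), (2.0, 0.0), c0, c0)  # 1 + 2 + 0 = 3.0
diff = z0_distance((1.0, 0.0), (9.0, 0.0), c0, c1)  # 1 + 1 + 2*10 = 22.0
print(same, diff)  # intraclass 3.0 < interclass 22.0
```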
Definition 2. The Z distance (reachable distance) function is defined as

Z(a, b) = d(a, b) + µ ∗ d(ca, cb)   (12)

where d() is the Euclidean distance; when a and b belong to the same class, ca = cb and the Z distance reduces to the Euclidean distance d(a, b).
class center closest to the overall data center, all data points in this class are not necessarily close to the overall data center, i.e., when calculating the K nearest neighbors of the test data according to the distance function, there are still some nearest neighbors in other classes.

The recoverable steps of the accompanying algorithm are:

if T == Z0-KNN then
    N(Z0) = min_K ( Z0(Xtest^i, X1), . . ., Z0(Xtest^i, Xn) );
end
Calculate dmin = min_j d(Xtest^i, cj);
We can get the class label YS of the test data according to the class label corresponding to dmin;
end

In order to better understand the proposed reachable distance, we describe it with Figs. 3-5. We focus on the characteristics of the two new distance functions according to Figs. 3 and 4. Fig. 3 shows the distance between data points in Eq. (11). From Fig. 3, we can find that the distance between two data points of the same class is not a straight-line distance; it needs to pass through the class center point. The shape of the distance between two data points in different classes is like a “Z”. Fig. 4 shows the distance between data points in Eq. (12). From Fig. 4, we can find that the distance between two data points of the same class is the same as the traditional Euclidean distance. In different classes, the distance between two data points includes not only the Euclidean distance between them but also the distance between the centers of the classes. Fig. 5 shows a situation in practice. For example, in the case of Eq. (11), d1 + d2 + d3 is likely to be smaller than d1 + d4. In the case of Eq. (12), d1 + d2 is likely to be smaller than d3. The distance between data points of different classes may thus be smaller than the distance between data points of the same class. To avoid this, we introduce the parameter µ. The size of the µ value will affect the measurement of the
natural distance between two data points. Usually, when µ > 1, the intraclass distance is less than the interclass distance. In addition, we can see that the distance between data points of the same class in Eq. (11) is greater than the distance between data points of the same class in Eq. (12). This is the main difference between Eq. (11) and Eq. (12).

Fig. 5. The schematic diagram of the problem with the new distance function.

Properties of Z distance
The Z (reachable) distance function has three basic properties as follows.

Property 1. Nonnegativity: z(a, b) ≥ 0.
Property 2. Symmetry: z(a, b) = z(b, a).
Property 3. Directness: z(a, e) ≤ z(a, b) + z(b, e).

Now let us prove these three properties. In component form, the Z0 distance of Eq. (11) can be written as

Z0(a, b) = [Σ_{j=1}^{d} (aj − caj)^2]^{1/2} + [Σ_{j=1}^{d} (bj − cbj)^2]^{1/2} + µ ∗ [Σ_{j=1}^{d} (caj − cbj)^2]^{1/2}   (13)

Proof. For Property 3, the proposed Z distance satisfies directness in the following two cases.

(1) When data points a, b and e belong to the same class. According to Eq. (11), we can get the following formula:

z(a, b) + z(b, e) − z(a, e)
= d(a, ca) + d(b, ca) + d(b, ca) + d(e, ca) − d(a, ca) − d(e, ca)
= 2 ∗ d(b, ca) = 2 ∗ [Σ_{j=1}^{d} (bj − caj)^2]^{1/2} ≥ 0   (15)

According to Eq. (12) and the triangle inequality, we can get the following formula:

z(a, b) + z(b, e) − z(a, e) = d(a, b) + d(b, e) − d(a, e) ≥ 0   (16)

From Eqs. (15) and (16), we can get that when a, b and e belong to the same class, the proposed Z distances (Eqs. (11) and (12)) have the property of directness.

(2) When data points a, b and e belong to different classes. According to Eq. (11) and the triangle inequality, we can get the following formula:

z(a, b) + z(b, e) − z(a, e)
= d(a, ca) + d(b, cb) + µ ∗ d(ca, cb) + d(b, cb) + d(e, ce) + µ ∗ d(cb, ce) − d(a, ca) − d(e, ce) − µ ∗ d(ca, ce)
= 2 ∗ [Σ_{j=1}^{d} (bj − cbj)^2]^{1/2} + µ ∗ [Σ_{j=1}^{d} (caj − cbj)^2]^{1/2} + µ ∗ [Σ_{j=1}^{d} (cbj − cej)^2]^{1/2} − µ ∗ [Σ_{j=1}^{d} (caj − cej)^2]^{1/2} ≥ 0   (17)

According to Eq. (12), we can get the following formula:

z(a, b) + z(b, e) − z(a, e)
= d(a, b) + µ ∗ d(ca, cb) + d(b, e) + µ ∗ d(cb, ce) − d(a, e) − µ ∗ d(ca, ce)
= [Σ_{j=1}^{d} (aj − bj)^2]^{1/2} + µ ∗ [Σ_{j=1}^{d} (caj − cbj)^2]^{1/2} + [Σ_{j=1}^{d} (bj − ej)^2]^{1/2} + µ ∗ [Σ_{j=1}^{d} (cbj − cej)^2]^{1/2} − [Σ_{j=1}^{d} (aj − ej)^2]^{1/2} − µ ∗ [Σ_{j=1}^{d} (caj − cej)^2]^{1/2} ≥ 0   (18)

According to Eqs. (17) and (18), when a, b and e belong to different classes, the Z distances (Eqs. (11) and (12)) have the property of directness. In conclusion, the proposed Z distance satisfies the property of directness.

Corollary 1. Intraclass distance is less than interclass distance.

Proof. For Eq. (11), if data points a and b belong to the same class, then the Z distance between them is the following formula:

z(a, b) = d(a, ca) + d(b, ca)   (19)

where ca is the class center of data points a and b. If data points a and e belong to different classes, the Z distance between them is the following formula:

z(a, e) = d(a, ca) + d(e, ce) + µ ∗ d(ca, ce)   (20)

From Eqs. (19) and (20), we only need to prove that the following inequality holds:

z(a, b) < z(a, e)   (21)

Obviously, the interclass distance contains one more natural distance term than the intraclass distance, i.e., µ ∗ d(ca, ce). If the value of parameter µ is infinite, then Eq. (21) is sure to hold; when µ takes a very small value, Eq. (21) may not hold. Therefore, if the value of parameter µ is large, then Eq. (11) satisfies the characteristic that intraclass distance is less than interclass distance. Similarly, for Eq. (12), we only need to prove that the corresponding inequality holds; as in Eq. (11), if the value of parameter µ is large, Eq. (12) satisfies the characteristic that intraclass distance is less than interclass distance.

TABLE 3
The basic properties of Euclidean distance and Z distance (✓ = yes, × = no).

Distance function    Nonnegativity   Symmetry   Directness   Intraclass distance is less than interclass distance
Euclidean distance   ✓               ✓          ✓            ×
Z distance           ✓               ✓          ✓            ✓

TABLE 4
Function comparison between Euclidean distance and Z distance.

The Z distance is based on Euclidean distance and can be regarded as an improvement of Euclidean distance. Compared with Euclidean distance, the Z distance not only considers the natural distance, but also makes the distance between data points of different classes greater than the distance between data points of the same class. The properties and functions of Euclidean distance and Z distance are listed in Tables 3 and 4, respectively.

In addition, for the traditional KNN algorithm based on Euclidean distance, each test data point needs to calculate its distance to all training data, so its time complexity is O(n ∗ d), where n represents the number of training samples and d represents the dimension of the data. If there are m test data, its time complexity is O(m ∗ n ∗ d). Similarly, the difference between the Z-KNN and KNN methods lies only in the distance function (i.e., Z distance versus Euclidean distance). The Z distance is based on Euclidean distance, and the calculation counts of Z-KNN and KNN in finding the k nearest neighbors are the same. Therefore, the time complexity of Z-KNN is still O(m ∗ n ∗ d).

In Fig. 6, we show the comparison between the proposed Z distances (i.e., Eqs. (11) and (12)) and Euclidean distance. From Fig. 6, we can see that using Euclidean distance on the original data does not make the classes separable, i.e., the intraclass distance may be larger than the interclass distance. The proposed Z distance (Eq. (11)) can increase the interclass distance, which leads to higher separability of different classes. However, it also increases the intraclass distance, making similar samples more dispersed. The Z distance (Eq. (12)) not only increases the interclass distance, but also keeps the intraclass distance constant (the same as the Euclidean distance). Therefore, the Z distance (Eq. (12)) has the best class separability.
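The directness property (Property 3) can be spot-checked numerically. This is a sketch under the assumption that Eq. (12) reads Z(a, b) = d(a, b) + µ ∗ d(ca, cb), consistent with the expansion in Eq. (18); the centers and µ = 2 are illustrative:

```python
import math
import random

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_distance(a, b, c_a, c_b, mu=2.0):
    """Assumed form of Eq. (12): Z(a, b) = d(a, b) + mu * d(c_a, c_b).
    For points of the same class (c_a == c_b) it reduces to Euclidean distance."""
    return euclidean(a, b) + mu * euclidean(c_a, c_b)

# Spot-check directness on random triples drawn from two classes.
random.seed(0)
c = {0: (0.0, 0.0), 1: (5.0, 5.0)}
for _ in range(1000):
    pts = [(random.uniform(-3, 8), random.uniform(-3, 8)) for _ in range(3)]
    cls = [random.randint(0, 1) for _ in range(3)]
    a, b, e = pts
    zab = z_distance(a, b, c[cls[0]], c[cls[1]])
    zbe = z_distance(b, e, c[cls[1]], c[cls[2]])
    zae = z_distance(a, e, c[cls[0]], c[cls[2]])
    assert zae <= zab + zbe + 1e-9  # directness holds on every sampled triple
print("directness verified on 1000 random triples")
```

The assertion never fires because both the Euclidean term and the class-center term individually satisfy the triangle inequality, which mirrors the algebra of Eq. (18).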
TABLE 5
The information of the data sets.
Datasets     Number of samples   Dimensions   Classes
Banknote 1372 4 2
Cnae 1080 856 9
Drift 1244 129 5
Secom 1567 590 2
Ionosphere 351 34 2
Usps 9298 256 10
Yeast 1484 1470 10
Letter 20000 16 26
Movements 360 90 15
Multiple 2000 649 10
Statlog 6435 36 6
German 1000 20 2
4 EXPERIMENTS
In order to verify the validity of the new distance functions, we compare the KNN classification accuracy of the new distance functions and the 6 other comparison algorithms on 12 data sets (as shown in Table 5).
4.1 Experiment settings
We download the data sets for our experiments from the dataset websites; they include 4 binary datasets and 8 multi-class datasets. We divide each dataset into a training set and a test set by ten-fold cross-validation (i.e., we divide the data set into 10 parts, 9 of which are used as training sets and the remaining one as a test set, cycling in turn until all data have been tested). The comparison algorithms used in the experiments are introduced as follows:
KNN [45]: It’s the traditional KNN algorithm, and we don’t have to do anything during the training phase. In the test phase, for
each test data point, we find its K neighbors in the training data according to the Euclidean distance. Then, the class label with the
highest frequency of class in the K neighbors is selected as the final class label of the test data.
Nearest class center point for KNN (NCP-KNN): This method is the most basic algorithm after introducing the class center point. In the training phase, a class center point is obtained for the training data of each class. In the test phase, we calculate the distance between each test data point and the class center points obtained in training; the class of the nearest class center point is the class label of the test data.
Coarse to fine K nearest neighbor classifier (CFKNN) [46]: In this method, a new metric function is proposed in which test data are expressed linearly. Specifically, it first uses training data to represent each test data point through a least-squares loss. Then it obtains the relational metric matrix by solving the least-squares loss. Finally, it uses the new metric matrix to construct a new distance function, and classifies the test data according to the new distance function and the majority rule.
Local mean representation-based k-nearest neighbor classifier (LMRKNN) [47]: It is an improved KNN method based on local mean vector representation. Specifically, it first finds K neighbors in each class and constructs local mean vectors. Then it uses these local mean vectors to represent each test data point and obtains a relationship measurement matrix. Finally, it uses the matrix to construct a new distance function for KNN.
Graph regularized k-local hyperplane distance nearest neighbor algorithm (GHKNN) [48]: This method is a local hyperplane nearest neighbor algorithm based on multi-kernel learning. Specifically, it first constructs six sequence-based feature descriptors and then learns the weights of the features. Finally, graph-regularized k-local hyperplane KNN is used to classify the subcellular localization of noncoding RNA.
Minkowski distance based fuzzy k nearest neighbor algorithm (MDFKNN) [49]: This method uses the Minkowski distance to replace the Euclidean distance, which avoids the ineffectiveness of Euclidean distance on high-dimensional data. Specifically, it first uses the Minkowski distance to find the nearest neighbors of each test data point. Then it applies fuzzy weights to the nearest neighbor data. Finally, it uses a weighted average to achieve a more accurate prediction.
A Weighted Mutual k-Nearest Neighbour (WKNN) [50]: This method can eliminate the influence of noise and pseudo neighbors in the data. Specifically, it uses mutual neighborhoods and distance-weighted voting to weaken the influence of distant neighbors. In addition, after removing outliers, the dataset is refined, which makes the algorithm more inclined to consider the nearest neighbors.
Z0-KNN: It is the traditional KNN method based on the Z0 distance function (i.e., Eq. (11)). During the training process, we
calculate the center point of each class in the training data. In the test process, we find K neighbors from training data according to
Eq. (11). And then, we use the majority rule to predict the class label of test data.
Z-KNN: It is the traditional KNN method based on the Z distance function (i.e., Eq. (12)). Its training and testing processes are basically the same as those of Z0-KNN; the only difference is that it is based on Eq. (12).
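The Z-KNN procedure described above can be sketched as follows. How the class-center term of Eq. (12) is formed for an unlabeled test point is not fully specified in this excerpt, so as a stated assumption we pair each training candidate's class center with the center nearest to the test point; the helper names and µ = 2 are illustrative:

```python
import math
from collections import Counter, defaultdict

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_knn_predict(t, X, y, centers, K=3, mu=2.0):
    """Rank training points by an assumed Eq. (12) score and majority-vote
    among the K nearest. The test point's class is unknown, so its center
    term uses the center nearest to t (an illustrative simplification)."""
    c_t = min(centers.values(), key=lambda c: euclidean(t, c))
    scored = sorted(
        range(len(X)),
        key=lambda i: euclidean(t, X[i]) + mu * euclidean(c_t, centers[y[i]]),
    )
    votes = Counter(y[i] for i in scored[:K])
    return votes.most_common(1)[0][0]

X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (9.0, 9.0), (10.0, 9.0), (9.0, 10.0)]
y = [0, 0, 0, 1, 1, 1]
centers = {0: (1 / 3, 1 / 3), 1: (28 / 3, 28 / 3)}  # per-class mean points
print(z_knn_predict((0.5, 0.5), X, y, centers))  # 0
```

The µ-term pushes interclass candidates down the ranking, which is the behavior Corollary 1 aims for.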
For the above algorithms, we conducted a series of experiments. Specifically, for each dataset, we test all the algorithms with different K values (i.e., 1-10); the NCP-KNN algorithm has no K parameter, so we performed 10 experiments for it, which makes it convenient to put all the algorithms in one subgraph. We measure performance by classification accuracy. In addition, in the case of K = 5, we performed 10 experiments for all algorithms and report the average classification accuracy and standard deviation. Finally, for the binary classification datasets, we calculated not only classification accuracy but also Sensitivity (Sen) and Specificity (Spe).
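The ten-fold cross-validation protocol described in the settings above can be sketched as (the helper name is ours; folds are formed by a simple stride over the indices):

```python
def ten_fold_splits(n):
    """Split n sample indices into 10 folds; each fold is the test set
    once while the other 9 folds serve as the training set."""
    folds = [list(range(i, n, 10)) for i in range(10)]
    for k in range(10):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test

splits = list(ten_fold_splits(100))
print(len(splits))        # 10 train/test pairs
print(len(splits[0][1]))  # each test fold holds 10 of the 100 samples
```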
The accuracy (Acc) and standard deviation (std) are calculated by the following equations, respectively:

Acc = Xcorrect / Xtotal   (23)

std = [ (1/n) Σ_{i=1}^{n} (Acci − Ācc)^2 ]^{1/2}   (24)

where Xcorrect represents the number of test data that are correctly classified, Xtotal represents the total number of test data, n represents the number of experiments, Acci represents the classification accuracy of the i-th experiment, and Ācc represents the average classification accuracy of the experiments. The smaller the std, the more stable the algorithm is.

mainly considers the information lost during data collection (such as Case II), which yields a reachable distance. An actual dataset may not lose the original information during collection; in that case, the advantages of Z-KNN cannot be shown.

Table 7 shows the average classification accuracy and standard deviation of the algorithms on the multi-class datasets. From Table 7, we can see that the Z-KNN algorithm achieves the best performance on the multi-class datasets except the Yeast dataset. The worst performer is the NCP-KNN algorithm. In addition, we can also see that the std of all algorithms is relatively small, i.e., their stability is very good.
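Eqs. (23) and (24) can be computed directly; the run values below are illustrative only:

```python
import math

def accuracy(n_correct, n_total):
    """Eq. (23): Acc = X_correct / X_total."""
    return n_correct / n_total

def std_of_accuracies(accs):
    """Eq. (24): std = sqrt((1/n) * sum_i (Acc_i - mean)^2) over repeated runs."""
    mean = sum(accs) / len(accs)
    return math.sqrt(sum((a - mean) ** 2 for a in accs) / len(accs))

runs = [0.90, 0.92, 0.88, 0.90, 0.90]  # hypothetical accuracies of 5 repetitions
print(accuracy(90, 100))  # 0.9
print(std_of_accuracies(runs))
```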
4.4 Parameter sensitivity
In Eqs. (11) and (12), there is a parameter µ, which determines the weight of the natural distance between class centers.
4.2 Binary classification
Table 6 shows the classification performance of all algorithms on the binary datasets. We can get the some result, i.e., the Z- KNN
algorithm achieves the best results, and Ncp-KNN performs the worst. Specifically, on the German dataset, the classification accuracy
of the Z-KNN algorithm is 7.14% higher than the traditional KNN algorithm. Because the Euclidean distance used in the traditional
KNN does not take into account the natural distance, and the separability of classes is not high. CFKNN uses the relationship matrix
between test data and training data to construct a new measurement function, which still does not take into account the natural distance
between the data. LMRKNN uses the local mean vector in each class to construct a new distance function. Although it takes into
account the local structure of the data, it does not take into account the separability between classes. GHKNN improves KNN through
the graph regularization term of multi-kernel learning. It considers the global structure information of data, and it is more suitable for
noncoding RNA to locate cells. MDFKNN not only uses Minkowski distance to improve KNN, but also uses the method of applying
weight to k-nearest neighbors to reduce the importance of distant neighbors like WKNN. To sum up, the above comparison algorithms
do not use reachable distance and ignore the class characteristics of data. The proposed Z-KNN not only considers the natural distance
in the data, but also makes the intraclass distance larger, which improves the separability of the data and achieves better results.
Multiple classification
Figure 7 shows the classification accuracy of all algorithms on 12 data sets as K value. Specifically, we can see that the performance of
the Z-KNN algorithm is best in some cases from Figure 7. The NCP-KNN algorithm has the worst effect, and the overall effect of
the Z0-KNN algorithm is not satisfactory, but it achieves the best effect on the Usps dataset, which shows that after we introduce the
class feature information, it has a certain effect. The effect of Z-KNN is sufficient to prove that we are looking for a distance
function with “high cohesion, low coupling” is very necessary for classification. For the traditional KNN algorithm, its effect is better
than the NCP-KNN algorithm, which shows that only considering class information is unreliable. In addition, we also see that Z-KNN
does not perform best on some data sets. There are two main reasons for this phenomenon: 1. Different K values will affect the
performance of the algorithm. 2. Z-KNN
the size of the natural distance. Different µ value will affect
the distance calculation between training data and test data, thus affecting the selection of nearest neighbors. If µ takes a small value, it
may not play the role of measuring natural distance at all, if µ takes a large value, which may greatly weaken the unnatural distance
between samples. Therefore, we set up experiments with different K values and different µ values. As shown in Figs 8 and 9, we can
see that in most cases, the value of µ has an impact on the performance of classification. Specifically, on the Drift, Cnae, and
Movements data sets, the accuracy rate varies greatly under different µ values. This shows that one has to adjust the value of
parameter µ carefully. In addition, on some data sets, such as Banknote and Yeast data sets, parameters K and µ have little impact
on the performance of the algorithm. This shows that on the one hand, the selection of K value does not have a great impact on these
data sets. On the other hand, this shows that there may be no insurmountable natural distance in these data sets, i.e., there is no missing
information in the data at the time of data collection. Therefore, K value and µ value have no significant effect on these data sets.
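As a minimal sketch, the per-run accuracy and the std of Eq. (24) used for Tables 6 and 7 can be computed as follows (the run accuracies below are made up for illustration):

```python
import math

def acc(x_correct, x_total):
    # Acc = X_correct / X_total
    return x_correct / x_total

def mean_std(accs):
    # Eq. (24): population std over the n repeated experiments
    n = len(accs)
    mean = sum(accs) / n
    std = math.sqrt(sum((a - mean) ** 2 for a in accs) / n)
    return mean, std

# Hypothetical accuracies from 10 runs at K = 5
runs = [0.81, 0.79, 0.80, 0.82, 0.80, 0.78, 0.81, 0.80, 0.79, 0.80]
m, s = mean_std(runs)  # a small std indicates a stable algorithm
```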
5 CONCLUSION
This paper has proposed a new distance function, the reachable distance, or Z distance. Specifically, it takes the class attribute into account in the distance function, and uses the distance between class center points to measure the natural distance in the data. In addition, it is a reachable distance, and it ensures that the interclass distance is always greater than the intraclass distance. In the experiments, KNN based on the Z distance (i.e., Z-KNN) exceeds the state-of-the-art comparison algorithms in terms of classification accuracy.
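As a rough illustration only (the paper's actual distance is defined by Eqs. (11) and (12); the routing below through class centers, and the role of µ, are our simplified reading, not the exact formula), a Z-style KNN could be sketched as:

```python
import math

def euclid(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def z_knn_predict(test_x, train_X, train_y, k=5, mu=1.0):
    """Hypothetical Z-style KNN sketch: the distance to a training point is
    routed through that point's class center, so candidates from a far-away
    class always look farther than intraclass candidates."""
    # Class centers (mean vectors) of the training data
    centers = {}
    for c in set(train_y):
        pts = [x for x, y in zip(train_X, train_y) if y == c]
        centers[c] = [sum(col) / len(pts) for col in zip(*pts)]
    # Sketch: mu * (test -> class center) + (class center -> training point)
    dists = [(mu * euclid(test_x, centers[y]) + euclid(centers[y], x), y)
             for x, y in zip(train_X, train_y)]
    dists.sort(key=lambda t: t[0])
    votes = [y for _, y in dists[:k]]
    return max(set(votes), key=votes.count)
```

Under this sketch, training points whose class center is far from the test point are penalized, mimicking the intended "intraclass closer than interclass" behavior.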
In future work, we plan to proceed from the following three points.
1. Finding one or more better distance functions to make the K-nearest-neighbor classification algorithm achieve better performance.
2. Applying this idea to other classification algorithms, to find distance functions that suit them.
3. Finding a new distance function that applies to clustering, which is very challenging and interesting.
ACKNOWLEDGMENT
This work is partially supported by the Key Program of the National Natural Science Foundation of China (Grant No: 61836016).
[Figure 7 panels (a)-(l): classification accuracy (Acc, %) versus K (1-10) on the Banknote, Cnae, Drift, German, Ionosphere, Letter, Movements, Multiple, Secom, Stalog, Usps, and Yeast datasets, comparing KNN, NCP-KNN, CFKNN, LMRKNN, GHKNN, MDFKNN, WKNN, Z0-KNN, and Z-KNN.]
Fig. 7. The classification accuracy on the 12 datasets with different K values.
REFERENCES
[1] S. Wang, Z. Tian, K. Dong, and Q. Xie, “Inconsistency of neighborhood based on voronoi tessellation and euclidean distance,” Journal of Alloys and
Compounds, vol. 854, p. 156983, 2021.
[2] X. Zhu, S. Zhang, Y. Zhu, P. Zhu, and Y. Gao, “Unsupervised spectral feature selection with dynamic hyper-graph learning,” IEEE Transactions on
Knowledge and Data Engineering, 2020.
[3] X. Zhu, S. Zhang, Y. Li, J. Zhang, L. Yang, and Y. Fang, “Low- rank sparse subspace for spectral clustering,” IEEE Transactions on Knowledge and
Data Engineering, vol. 31, no. 8, pp. 1532–1543, 2018.
[4] Y. Guo, Z. Cheng, J. Jing, Y. Lin, L. Nie, and M. Wang, “Enhancing factorization machines with generalized metric learning,” IEEE Transactions on Knowledge and Data Engineering, 2020.
[5] Z. Tang, L. Chen, X. Zhang, and S. Zhang, “Robust image hashing with tensor decomposition,” IEEE Transactions on Knowledge and Data
Engineering, vol. 31, no. 3, pp. 549–560, 2018.
[6] C. Zhu, L. Cao, Q. Liu, J. Yin, and V. Kumar, “Heterogeneous metric learning of categorical data with hierarchical couplings,” IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 7, pp. 1254–1267, 2018.
[Figure 8 panels (a)-(l): classification accuracy (Acc, %) as a function of K (1-10) and µ on the Banknote, Cnae, Drift, German, Ionosphere, Letter, Movements, Multiple, Secom, Stalog, Usps, and Yeast datasets.]
Fig. 8. The classification accuracy of different K and µ parameter (in Eq. (11)) values on the dataset.
[7] X.-S. Wei, H.-J. Ye, X. Mu, J. Wu, C. Shen, and Z.-H. Zhou, “Multiple instance learning with emerging novel class,” IEEE Transactions on Knowledge and
Data Engineering, 2019.
[8] X. Zhu, S. Zhang, W. He, R. Hu, C. Lei, and P. Zhu, “One-step multi- view spectral clustering,” IEEE Transactions on Knowledge and Data Engineering,
vol. 31, no. 10, pp. 2022–2034, 2018.
[9] S. P. Patel and S. Upadhyay, “Euclidean distance based feature ranking and subset selection for bearing fault diagnosis,” Expert Systems with Applications,
vol. 154, p. 113400, 2020.
[10] X. Gao and G. Li, “A knn model based on manhattan distance to identify the snare proteins,” IEEE Access, vol. 8, pp. 112922–112931, 2020.
[11] I. B. K. D. S. Negara and I. P. P. Wardana, “Identifikasi kecocokan motif tenun songket khas jembrana dengan metode manhattan distance,” Jurnal
Teknologi Informasi dan Komputer, vol. 7, no. 2, 2021.
[12] M. Majhi, A. K. Pal, S. H. Islam, and M. Khurram Khan, “Secure content-based image retrieval using modified euclidean distance for encrypted features,” Transactions on Emerging Telecommunications Technologies, vol. 32, no. 2, p. e4013, 2021.
[13] H. Ji, “Statistics mahalanobis distance for incipient sensor fault detection and diagnosis,” Chemical Engineering Science, vol. 230, p. 116233, 2021.
[Figure 9 panels (a)-(l): classification accuracy (Acc, %) as a function of K (1-10) and µ on the 12 datasets.]
Fig. 9. The classification accuracy of different K and µ parameter (in Eq. (12)) values on the dataset.
[14] L. Aggoun and Y. Chetouani, “Fault detection strategy combining narmax model and bhattacharyya distance for process monitoring,” Journal of the
Franklin Institute, vol. 358, no. 3, pp. 2212–2228, 2021.
[15] S. Ji, Z. Zhang, S. Ying, L. Wang, X. Zhao, and Y. Gao, “Kullback-leibler divergence metric learning,” IEEE Transactions on Cybernetics, 2020.
[16] Q. Zhang, X. Guan, H. Wang, and P. M. Pardalos, “Maximum shortest path interdiction problem by upgrading edges on trees under hamming distance,”
Optimization Letters, pp. 1–20, 2021.
[17] P. Ganesan, B. Sathish, L. L. Joseph, K. Subramanian, and R. Murugesan,
“The impact of distance measures in k-means clustering algorithm for natural color images,” in Advances in Artificial Intelligence and Data Engineering.
Springer, 2021, pp. 947–963.
[18] S. Zhang and J. Li, “Knn classification with one-step computation,” IEEE Transactions on Knowledge and Data Engineering, pp. 1–1, 2021.
[19] J. Gou, H. Ma, W. Ou, S. Zeng, Y. Rao, and H. Yang, “A generalized mean distance-based k-nearest neighbor classifier,” Expert Systems with
Applications, vol. 115, pp. 356–372, 2019.
[20] P. A. Poorheravi, B. Ghojogh, V. Gaudet, F. Karray, and M. Crowley, “Acceleration of large margin metric learning for nearest neighbor classification using triplet mining and stratified sampling,” arXiv preprint arXiv:2009.14244, 2020.
TABLE 6
Experimental results for binary data sets.
TABLE 7
Accuracy (mean ± standard deviation) statistical results on multi-class datasets.
[21] K. Song, F. Nie, J. Han, and X. Li, “Parameter free large margin nearest neighbor for distance metric learning,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 31, no. 1, 2017.
[22] Y.-K. Noh, B.-T. Zhang, and D. D. Lee, “Generative local metric learning for nearest neighbor classification,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no. 1, pp. 106–118, 2017.
[23] S. Ying, Z. Wen, J. Shi, Y. Peng, J. Peng, and H. Qiao, “Manifold preserving: An intrinsic approach for semisupervised distance metric learning,” IEEE
transactions on neural networks and learning systems, vol. 29, no. 7, pp. 2731–2742, 2017.
[24] D. Wang and X. Tan, “Robust distance metric learning via bayesian inference,” IEEE Transactions on Image Processing, vol. 27, no. 3, pp. 1542–1553,
2017.
[25] L. Jiao, X. Geng, and Q. Pan, “BPkNN: k-nearest neighbor classifier with pairwise distance metrics and belief function theory,” IEEE Access, vol. 7, pp. 48935–48947, 2019.
[26] Y. Song, Y. Gu, R. Zhang, and G. Yu, “Brepartition: Optimized high- dimensional knn search with bregman distances,” IEEE Transactions on Knowledge
and Data Engineering, 2020.
[27] B. Su and Y. Wu, “Learning meta-distance for sequences by learning a ground metric via virtual sequence regression,” IEEE Transactions on Pattern
Analysis and Machine Intelligence, 2020.
[28] Ö. F. Ertuğrul, “A novel distance metric based on differential evolution,” Arabian Journal for Science and Engineering, vol. 44, no. 11, pp. 9641–9651, 2019.
[29] Z. Geler, V. Kurbalija, M. Ivanović, and M. Radovanović, “Weighted knn and constrained elastic distances for time-series classification,” Expert Systems with Applications, vol. 162, p. 113829, 2020.
[30] M. Feng, M. Li, and S. Xu, “Project 2: Knn with different distance metrics.”
[31] E. Ö. Kıyak, D. Birant, and K. U. Birant, “An improved version of multi-view k-nearest neighbors (mvknn) for multiple view learning,” Turkish Journal of Electrical Engineering & Computer Sciences, vol. 29, no. 3, pp. 1401–1428, 2021.
[32] J. Maillo, S. García, J. Luengo, F. Herrera, and I. Triguero, “Fast and scalable approaches to accelerate the fuzzy k-nearest neighbors classifier for big data,” IEEE Transactions on Fuzzy Systems, vol. 28, no. 5, pp. 874–886, 2019.
[33] M. S. Santos, P. H. Abreu, S. Wilk, and J. Santos, “How distance metrics influence missing data imputation with k-nearest neighbours,” Pattern
Recognition Letters, vol. 136, pp. 111–119, 2020.
[34] N. Marchang and R. Tripathi, “Knn-st: Exploiting spatio-temporal correlation for missing data inference in environmental crowd sensing,” IEEE Sensors Journal, vol. 21, no. 3, pp. 3429–3436, 2020.
[35] L. A. C. Valverde and J. A. M. Arias, “Evaluación de distintas técnicas de representación de texto y medidas de distancia de texto usando knn para clasificación de documentos,” Tecnología en Marcha, vol. 33, no. 1, pp. 64–79, 2020.
[36] S. Susan and A. Kumar, “Dst-ml-eknn: data space transformation with metric learning and elite k-nearest neighbor cluster formation for classification of imbalanced datasets,” in Advances in Artificial Intelligence and Data Engineering. Springer, 2021, pp. 319–328.
[37] Y.-P. Sun and M.-L. Zhang, “Compositional metric learning for multi- label classification,” Frontiers of Computer Science, vol. 15, no. 5, pp. 1–12, 2021.
[38] X. Gu, P. P. Angelov, D. Kangin, and J. C. Principe, “A new type of distance metric and its use for clustering,” Evolving Systems, vol. 8, no. 3, pp. 167–
177, 2017.
[39] L. Gong, H. Wang, M. Ogihara, and J. Xu, “idec: indexable distance estimating codes for approximate nearest neighbor search,” Proceedings of the VLDB
Endowment, vol. 13, no. 9, pp. 1483–1497, 2020.
[40] Y. Wang, “Survey on deep multi-modal data analytics: collaboration, rivalry, and fusion,” ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), vol. 17, no. 1s, pp. 1–25, 2021.
[41] H. Wang, Y. Wang, Z. Zhang, X. Fu, L. Zhuo, M. Xu, and M. Wang, “Kernelized multiview subspace analysis by self-weighted learning,” IEEE
Transactions on Multimedia, 2020.
[42] K. Buza, A. Nanopoulos, and L. Schmidt-Thieme, “Fusion of similarity measures for time series classification,” in International Conference on Hybrid
Artificial Intelligence Systems. Springer, 2011, pp. 253–261.
[43] N. Tomašev, K. Buza, K. Marussy, and P. B. Kis, “Hubness-aware classification, instance selection and feature construction: Survey and extensions to time-series,” in Feature Selection for Data and Pattern Recognition. Springer, 2015, pp. 231–262.
[44] H. Sakoe and S. Chiba, “Dynamic programming algorithm optimization
for spoken word recognition,” IEEE transactions on acoustics, speech,
and signal processing, vol. 26, no. 1, pp. 43–49, 1978.
[45] U. Lall and A. Sharma, “A nearest neighbor bootstrap for resampling
hydrologic time series,” Water Resources Research, vol. 32, no. 3, pp.
679–693, 1996.
[46] Y. Xu, Q. Zhu, Z. Fan, M. Qiu, Y. Chen, and H. Liu, “Coarse to fine k
nearest neighbor classifier,” Pattern recognition letters, vol. 34, no. 9,
pp. 980–986, 2013.
[47] J. Gou, W. Qiu, Z. Yi, Y. Xu, Q. Mao, and Y. Zhan, “A local mean
representation-based k-nearest neighbor classifier,” ACM Transactions
on Intelligent Systems and Technology (TIST), vol. 10, no. 3, pp. 1–25,
2019.
[48] H. Zhou, H. Wang, J. Tang, Y. Ding, and F. Guo, “Identify ncRNA subcellular localization via graph regularized k-local hyperplane distance nearest neighbor model on multi-kernel learning,” IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2021.
[49] M. Mailagaha Kumbure and P. Luukka, “A generalized fuzzy k-nearest
neighbor regression model based on minkowski distance,” Granular
Computing, pp. 1–15, 2021.
[50] J. Dhar, A. Shukla, M. Kumar, and P. Gupta, “A weighted mutual k-nearest neighbour for classification mining,” arXiv preprint arXiv:2005.08640, 2020.