Fuzzy DBScan

This document discusses extensions of the DBSCAN clustering algorithm to generate fuzzy clusters. It proposes three approaches that allow either fuzzy density characteristics within clusters or fuzzy overlapping boundaries between clusters. The classic DBSCAN algorithm is also summarized, which uses parameters for minimum points and distance to assign points to dense clusters or label them as noise. Previous related work on a fuzzy extension of DBSCAN called FN-DBSCAN is mentioned.


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/287527085

Fuzzy Core DBScan Clustering Algorithm

Conference Paper in Communications in Computer and Information Science · July 2014
DOI: 10.1007/978-3-319-08852-5_11

Citations: 15 · Reads: 1,245

2 authors:

Gloria Bordogna, Italian National Research Council (217 publications, 3,838 citations)
Dino Ienco, French National Institute for Agriculture, Food, and Environment (INRAE) (172 publications, 2,154 citations)


All content following this page was uploaded by Dino Ienco on 08 June 2016.



Fuzzy extensions of the DBScan Clustering Algorithm

Gloria Bordogna¹ and Dino Ienco²

¹ CNR IREA, Via Bassini 15, Milano (Italy)
[email protected]
² Irstea, UMR TETIS, Montpellier, France
LIRMM, Montpellier, France
[email protected]

Abstract. We propose several extensions of the DBSCAN algorithm that generate fuzzy clusters with distinct properties: either clusters with fuzzy density characteristics or clusters with fuzzy overlapping boundaries. These proposals are compared with the classic DBSCAN and with the baseline fuzzy extension of DBSCAN, the FN-DBSCAN algorithm. Some results on synthetic data are discussed.

1 Introduction
Density-based clustering algorithms have wide applicability in spatial data mining. They apply a local criterion to group objects: clusters are regarded as regions of the data space where the objects are dense, separated by regions of low object density (noise). Among density-based clustering algorithms, DBSCAN is very popular due both to its low complexity and to its ability to detect clusters of any shape, a desirable characteristic when one has no knowledge of the possible shapes of the clusters, or when the objects are distributed heterogeneously, such as along the paths of a graph or a road network.
Nevertheless, to drive the process this algorithm needs two numeric input parameters, minPts and ε, which together define the desired density characteristics of the generated clusters. Specifically, minPts is a positive integer specifying the minimum number of objects that must exist within a maximum distance ε of a given object in order for that object to be a core point of a cluster.
Since DBSCAN is very sensitive to the setting of these input parameters, they must be chosen with great care, which generally requires a trial-and-error exploration phase to find the right values.
Unfortunately, these input parameters must be set by considering both the scale of the dataset and the closeness of the objects, since their values strongly affect both the speed of the algorithm ?? and the effectiveness of the results.
In fact, a common drawback of all crisp flat clustering algorithms, when used to group objects whose distribution has a faint and smooth density profile, is that they draw crisp boundaries to separate clusters, and these boundaries are often somewhat arbitrary.
There are also applications in which the positions of the objects are ill-known, such as databases of moving objects whose locations are recorded at fixed timestamps, or objects appearing in remote sensing images with a coarse spatial resolution, where a pixel is much larger than the object, so that there is uncertainty about the exact position of the object within the area of the pixel.
In this contribution we investigate several extensions of the DBSCAN algorithm, defined within the framework of fuzzy set theory, whose aim is to detect fuzzy clusters with desired density characteristics.
The extensions have several objectives. The first is to ease parameter setting: we propose distinct fuzzy extensions of the DBSCAN algorithm that do not require precise values for both input parameters, but allow approximate values to be specified by means of soft constraints, defined by fuzzy sets on the basic domains of the input parameters themselves. The algorithm uses this approximate input to generate fuzzy clusters, i.e., clusters whose elements are associated with a numeric membership degree in [0,1]. Since a membership degree is associated with each ⟨object, cluster⟩ pair, it is then possible to perform a sensitivity analysis and obtain distinct crisp partitions by specifying distinct minimum thresholds on the membership degrees. This allows the spatial distribution of the objects to be explored without running the clustering several times. The second objective is to improve the effectiveness of grouping objects characterized by a fuzzy distribution. We present three possible fuzzy extensions and discuss the properties of the generated fuzzy clusters.
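To make the thresholding idea concrete, the following minimal Python sketch (not taken from the paper) turns fuzzy memberships into crisp clusters by keeping only the ⟨object, cluster⟩ pairs whose degree reaches a threshold α, i.e., an α-cut. The dictionary layout and the function name `alpha_cut` are hypothetical illustrations.

```python
from typing import Dict, Hashable, Set, Tuple

# Hypothetical layout: membership[(obj, cluster)] = degree in [0, 1]
Membership = Dict[Tuple[Hashable, Hashable], float]


def alpha_cut(membership: Membership, alpha: float) -> Dict[Hashable, Set[Hashable]]:
    """Return crisp clusters containing only the objects with degree >= alpha."""
    clusters: Dict[Hashable, Set[Hashable]] = {}
    for (obj, cluster), degree in membership.items():
        if degree >= alpha:
            clusters.setdefault(cluster, set()).add(obj)
    return clusters


membership = {("a", "c1"): 1.0, ("b", "c1"): 0.6, ("b", "c2"): 0.4, ("d", "c2"): 0.2}
# Sweeping the threshold explores several crisp groupings from a single clustering run
for alpha in (0.5, 0.3):
    print(alpha, alpha_cut(membership, alpha))
```

Raising α shrinks each cluster toward its most certain members; at low α an object lying on an overlapping boundary (here "b") can still appear in more than one cluster.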

2 Classic DBScan Algorithm

For the sake of clarity, in the following we consider a set of objects represented by distinct points defined in a two-dimensional spatial domain. These objects can be either actual entities located in the real spatial domain, such as cars, taxi cabs or airplanes, or virtual entities, such as web pages or tweets represented in the virtual space of the terms they contain. DBSCAN can be applied to group these objects based on their local densities in that space, e.g., to identify traffic jams of cars on the roads, or web pages and tweets that deal with close topics. DBSCAN assigns points of a spatial domain defined on ℝ×ℝ to particular clusters, or designates them as statistical noise if they are not sufficiently close to other points. DBSCAN determines cluster assignments by assessing the local density at each point using two parameters: a distance (ε) and a minimum number of points (minPts). A single point that meets the minimum density criterion, namely that there are minPts points located within distance ε of it, is designated a core point.

Formally, given a set $P$ of $N$ points $p_i = (x_i, y_i)$ with $x_i, y_i$ defined on the spatial domain, $p \in P$ is a core point if there exist at least $minPts$ points $p_1, \dots, p_{minPts} \in P$ such that $\|p_j - p\| < \varepsilon$ for every $j$. Two core points $p_i$ and $p_j$ with $i \neq j$ and $\|p_i - p_j\| < \varepsilon$ define a cluster $c$: $p_i, p_j \in c$ and they are core points of $c$, i.e., $p_i, p_j \in core(c)$. All non-core points within the maximum distance ε of a core point are considered non-core members of a cluster, i.e., boundary points: $p \notin core(c)$ is a boundary point of $c$ if $\exists p_i \in core(c)$ with $\|p - p_i\| < \varepsilon$. Finally, points that are not part of any cluster are considered noise: $p$ is noise if $\forall c,\ \nexists p_i \in core(c)$ with $\|p - p_i\| < \varepsilon$. In the following, the classic DBSCAN algorithm is described:

Algorithm 1 DBSCAN(P, ε, MinPts)

Require: P: dataset of points
Require: ε: the maximum distance around a point defining the point neighbourhood
Require: MinPts: density, in points, around a point to be considered a core point
1: C = 0
2: Clusters = ∅
3: for all p ∈ P s.t. p is unvisited do
4:   mark p as visited
5:   neighborsPts = regionQuery(p, ε)
6:   if sizeof(neighborsPts) < MinPts then
7:     mark p as NOISE
8:   else
9:     C = next cluster
10:    Clusters = Clusters ∪ expandCluster(p, neighborsPts, C, ε, MinPts)
11:  end if
12: end for
13: return Clusters
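The following Python sketch mirrors Algorithm 1 together with its expandCluster subroutine (Algorithm 2) for 2-D points. It assumes Euclidean distance, the strict `< eps` test used in the definitions above, and a `regionQuery` that includes the query point itself; the function names are illustrative and not from the paper.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]
NOISE = -1


def region_query(points: Sequence[Point], i: int, eps: float) -> List[int]:
    """Indices of the points strictly closer than eps to points[i], itself included."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) < eps]


def dbscan(points: Sequence[Point], eps: float, min_pts: int) -> List[int]:
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels: List[Optional[int]] = [None] * len(points)  # None: not in any cluster yet
    visited = [False] * len(points)
    cluster = -1
    for i in range(len(points)):
        if visited[i]:
            continue
        visited[i] = True
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later become a border point of some cluster
            continue
        cluster += 1  # "C = next cluster"
        labels[i] = cluster
        # expandCluster: the seed list grows while it is being scanned
        queued = set(neighbors)
        k = 0
        while k < len(neighbors):
            j = neighbors[k]
            if not visited[j]:
                visited[j] = True
                j_neighbors = region_query(points, j, eps)
                if len(j_neighbors) >= min_pts:  # j is a core point: absorb its neighbourhood
                    for n in j_neighbors:
                        if n not in queued:
                            queued.add(n)
                            neighbors.append(n)
            if labels[j] is None or labels[j] == NOISE:  # not yet member of any cluster
                labels[j] = cluster
            k += 1
    return [NOISE if lab is None else lab for lab in labels]


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
print(dbscan(pts, eps=1.5, min_pts=3))
```

Keeping a set next to the seed list implements the union neighborsPts ∪ neighborsPts' without duplicates while preserving the scan order; a point first marked as NOISE can still be relabelled as a border point when a later cluster reaches it.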

3 Related Work

In the literature there have been a few extensions of the DBSCAN algorithm aimed at detecting fuzzy clusters. In [1] the authors propose a fuzzy extension of DBSCAN, named FN-DBSCAN (fuzzy neighborhood DBSCAN), whose main characteristic is the use of a fuzzy neighborhood relation, whereas DBSCAN uses a crisp neighborhood relation. This approach addresses the user's difficulty in setting the parameter values when the number of points to cluster is unknown and when their distances are on different scales. The authors first normalize the distances between pairs of points in [0,1], and then allow distinct membership functions to be specified on the distance to delimit the neighborhood of points, i.e., the decay of the membership degree as a function of the distance from the point. Then, only the points having a minimum membership degree are selected as belonging to the fuzzy neighbourhood of a point. This extension of DBSCAN uses a level-based neighborhood set instead of a distance-based one, and it uses the concept of fuzzy cardinality instead of classical cardinality for identifying core points. This last choice causes the creation, within the same run of the algorithm, of both fuzzy clusters whose cores have many sparse points and fuzzy clusters whose cores have only a few close points. Thus the density characteristics of the generated clusters are heterogeneous. A scalable implementation of FN-DBSCAN, named SFN-DBSCAN, has been proposed in ??, where an improvement of the efficiency of FN-DBSCAN is described.

Algorithm 2 expandCluster(p, neighborsPts, C, ε, MinPts)
Require: p: the point just marked as visited
Require: neighborsPts: the neighbourhood of p
Require: C: the current cluster
Require: ε: the distance around a point used to compute its density
Require: MinPts: density, in points, around a point to be considered a core point
1: add p to cluster C
2: for all p' ∈ neighborsPts do
3:   if p' is not visited then
4:     mark p' as visited
5:     neighborsPts' = regionQuery(p', ε)
6:     if sizeof(neighborsPts') ≥ MinPts then
7:       neighborsPts = neighborsPts ∪ neighborsPts'
8:     end if
9:   end if
10:  if p' is not yet member of any cluster then
11:    add p' to cluster C
12:  end if
13: end for
14: return C
The utility of a fuzzy DBSCAN has been pointed out in ??, where the authors use FN-DBSCAN in conjunction with the computation of the convex hull of the generated fuzzy clusters to derive connected footprints of entities with arbitrary shape. Having fuzzy clusters allows footprints to be generated as isolines.
A second paper is somewhat related to the motivations of our proposal, since it tackles the problem of clustering a huge number of objects strongly affected by noise ?? when the scales of the object distributions are heterogeneous. Their solution does not generate fuzzy clusters, but we report it since it can be the basis for our fuzzy extension. To remove noise, they first compute the distance of each point from its K nearest neighbours and rank the distance values in decreasing order; then they determine the threshold θ on the distance that corresponds to the first minimum of the ordered values. All points in the first ranked positions, having a distance above the threshold θ, are noise points and are removed, while the remaining points will belong to a cluster. These latter points are clustered with the classic DBSCAN by providing as input parameters minPts = K and ε = θ. As stated by the authors, the main problem of this approach is that θ is chosen somewhat arbitrarily within a range of possible values.
Another motivation for defining a fuzzy DBSCAN is clustering objects whose position is ill-known, as in ??, where the authors propose a fuzzy distance measure to define the probability that an object is directly density-reachable from another object.

4 Fuzzification

4.1 Generating clusters with approximately dense fuzzy core

A first extension of the classic DBSCAN algorithm can be obtained by keeping the distance crisp, as in the classic approach, and by introducing an approximate value of the desired density minPts. This can be done by substituting the numeric value minPts with a soft constraint defined by a non-decreasing membership function on the domain of positive integers. This soft constraint specifies the approximate number of points required for generating a fuzzy core of a cluster. Let us define the piecewise linear membership function as follows:

1,
 if x ≥ M ptsM ax
µminP (x) M ptsx−M ptsM in
M ax −M ptsM in
, if M ptsM in < x < M ptsM ax

0, if x ≤ M ptsM in

This membership function gives the value 1 when the number $x$ of elements in the neighbourhood of a point is at least $Mpts_{Max}$, the value 0 when $x$ is at most $Mpts_{Min}$, and intermediate values when $x$ lies strictly between $Mpts_{Min}$ and $Mpts_{Max}$.
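A direct Python transcription of $\mu_{minP}$ follows, assuming $Mpts_{Min} < Mpts_{Max}$; the function name `mu_min_p` is ours, and the example bounds 7 and 12 are the ones reported for the soft constraint in the caption of Fig. 1.

```python
def mu_min_p(x: float, mpts_min: float, mpts_max: float) -> float:
    """Piecewise linear soft constraint on the neighbourhood size x (assumes mpts_min < mpts_max)."""
    if x >= mpts_max:
        return 1.0
    if x <= mpts_min:
        return 0.0
    return (x - mpts_min) / (mpts_max - mpts_min)


# A point with 10 neighbours under the soft constraint (7, 12)
print(mu_min_p(10, 7, 12))  # 0.6
```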
Let us now define the fuzzy core. Given a set $P$ of $N$ points $p_i = (x_i, y_i)$ with $x_i, y_i$ defined on the spatial domain, and given a point $p \in P$: if there exist $x$ points $p_i$ in the neighbourhood of $p$, i.e., with $\|p_i - p\| < \varepsilon$, such that $\mu_{minP}(x) > 0$, then $p$ is a fuzzy core point with membership degree to the fuzzy core given by $Fuzzycore(p) = \mu_{minP}(x)$. If there exist two fuzzy core points $p_i, p_j$ (i.e., $Fuzzycore(p_i) > 0$ and $Fuzzycore(p_j) > 0$) with $i \neq j$ such that $\|p_i - p_j\| < \varepsilon$, then they define a cluster $c$, $p_i, p_j \in c$, and they are fuzzy core points of $c$, i.e., $p_i, p_j \in fuzzycore(c)$, with membership degrees $Fuzzycore_c(p_i)$ and $Fuzzycore_c(p_j)$.
A point $p$ of a cluster that is not a fuzzy core point, but a boundary point, is defined as follows: given $p$, if $\exists p_i \in fuzzycore(c)$, i.e., with membership degree $Fuzzycore_c(p_i) > 0$, such that $\|p - p_i\| < \varepsilon$, then $p$ gets a membership degree to $c$ defined as $\mu_c(p) = Fuzzycore_c(p_i)$.
This definition allows generating fuzzy clusters with a fuzzy core, where the membership degree reflects the approximate number of points around the core points, i.e., the cluster density. Notice that the points belonging to a cluster c all get the same membership value to the cluster. However, distinct clusters may have distinct membership degrees, indicating their distinct density properties.
Moreover, a boundary point $p$ can partially belong to more than one cluster at the same time, with distinct membership values $\mu_{c_i}(p)$, since boundary points of given clusters can be considered as candidate boundary points of other clusters. This allows generating fuzzy clusters with overlapping boundaries.
Finally, points $p$ that are not part of any cluster are considered noise: if $\forall c\ \nexists p_i \in fuzzycore(c)$ such that $\|p_i - p\| < \varepsilon$, then $p$ is noise.

Algorithm 3 Approx Fuzzy Core DBSCAN(P, ε, MinPts_Min, MinPts_Max)

Require: P: dataset of points
Require: ε: the maximum distance around a point defining the point neighbourhood
Require: MinPts_Min, MinPts_Max: soft constraint interval for the density around a point to be considered a core point
1: C = 0
2: Clusters = ∅
3: for all p ∈ P s.t. p is unvisited do
4:   mark p as visited
5:   neighborsPts = regionQuery(p, ε)
6:   if sizeof(neighborsPts) ≤ MinPts_Min then
7:     mark p as NOISE
8:   else
9:     C = next cluster
10:    Clusters = Clusters ∪ expandClusterFuzzyCore(p, neighborsPts, C, ε, MinPts_Min, MinPts_Max)
11:  end if
12: end for
13: return Clusters
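The following Python sketch approximates Algorithms 3 and 4 under stated assumptions: Euclidean distance, neighbourhoods that include the query point, and clusters formed as connected groups of fuzzy core points (Fuzzycore > 0) lying within ε of each other, which replaces the explicit expansion loop. For boundary points it follows the text of this section and gives a point a degree for every cluster it touches; when several fuzzy core points of the same cluster lie within ε, taking the maximum of their degrees is our assumption. All names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def mu_min_p(x: float, lo: float, hi: float) -> float:
    """Soft constraint on the neighbourhood size (assumes lo < hi)."""
    if x >= hi:
        return 1.0
    if x <= lo:
        return 0.0
    return (x - lo) / (hi - lo)


def approx_fuzzy_core_dbscan(points: Sequence[Point], eps: float,
                             mpts_min: float, mpts_max: float) -> Dict[Tuple[int, int], float]:
    """Map (point index, cluster id) -> membership degree; points absent from the map are noise."""
    n = len(points)
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) < eps] for i in range(n)]
    core = [mu_min_p(len(neigh[i]), mpts_min, mpts_max) for i in range(n)]

    # Group fuzzy core points that are within eps of each other (depth-first search)
    cluster_of = [-1] * n
    n_clusters = 0
    for i in range(n):
        if core[i] == 0.0 or cluster_of[i] != -1:
            continue
        cluster_of[i] = n_clusters
        stack = [i]
        while stack:
            k = stack.pop()
            for j in neigh[k]:
                if core[j] > 0.0 and cluster_of[j] == -1:
                    cluster_of[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    membership: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        if core[i] > 0.0:  # fuzzy core point: Fuzzycore(p) = mu_minP(|neighbourhood|)
            membership[(i, cluster_of[i])] = core[i]
            continue
        for j in neigh[i]:  # boundary point: degree of a nearby fuzzy core point
            if core[j] > 0.0:
                key = (i, cluster_of[j])
                membership[key] = max(membership.get(key, 0.0), core[j])
    return membership


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (2.2, 0), (50, 50)]
print(approx_fuzzy_core_dbscan(pts, eps=1.5, mpts_min=2, mpts_max=6))
```

In this example the point (1, 0) has one more neighbour than the other square corners and therefore a higher fuzzy core degree, and the boundary point (2.2, 0) inherits that degree.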

4.2 Generating clusters with approximately reachable fuzzy boundaries

The second proposal is to extend DBSCAN by allowing the specification of an approximate maximum distance, instead of a single numeric distance parameter ε. This can be done by defining a membership function on the numeric distance domain, which the algorithm interprets as a soft constraint admitting degrees of satisfaction in [0,1]. This allows computing a gradual membership of the boundary points to the clusters.
Unlike the proposal of [1], we allow the membership function on the distance to be specified as a soft constraint with a piecewise linear shape, defined by two values $\varepsilon_{Min}$ and $\varepsilon_{Max}$: when the distance is at most $\varepsilon_{Min}$ the membership degree is maximal (1), when it is at least $\varepsilon_{Max}$ the membership is null (0), and it decreases linearly in between:

1,
 if kp − pi || ≤ M in
M ax −kp−pi ||
µdist (pi , pj )  − , if M in < kp − pi || < M ax
 M ax M in

0, if kp − pi || > M ax
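A minimal Python rendering of $\mu_{dist}$, assuming Euclidean distance and $\varepsilon_{Min} < \varepsilon_{Max}$; `mu_dist` is a hypothetical name.

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def mu_dist(p: Point, q: Point, eps_min: float, eps_max: float) -> float:
    """Degree to which q is close enough to p under the soft distance constraint."""
    d = math.dist(p, q)
    if d <= eps_min:
        return 1.0
    if d >= eps_max:
        return 0.0
    return (eps_max - d) / (eps_max - eps_min)  # linear decay between eps_min and eps_max


print(mu_dist((0, 0), (0, 3), eps_min=2, eps_max=4))  # 0.5
```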
We can now redefine a core point of a fuzzy cluster: given a point $p$, if there exist at least $minPts$ points $\{p_1, \dots, p_{minPts}\} \subseteq P$ such that $\mu_{dist}(p, p_i) = 1$ for every $p_i$, then $p$ is a core point.
Algorithm 4 expandClusterFuzzyCore(p, neighborsPts, C, ε, MinPts_Min, MinPts_Max)
Require: p: the point just marked as visited
Require: neighborsPts: the neighbourhood of p
Require: C: the current cluster
Require: ε: the distance around a point used to compute its density
Require: MinPts_Min, MinPts_Max: soft constraint interval for the density around a point to be considered a core point
1: add p to C with membership Fuzzycore(p) = µminP(|neighborsPts|)
2: for all p' ∈ neighborsPts do
3:   if p' is not visited then
4:     mark p' as visited
5:     neighborsPts' = regionQuery(p', ε)
6:     if sizeof(neighborsPts') > MinPts_Min then
7:       neighborsPts = neighborsPts ∪ neighborsPts'
8:       add p' to C with membership Fuzzycore(p') = µminP(|neighborsPts'|)
9:     end if
10:  end if
11:  if p' is not yet member of any cluster then
12:    add p' to C as border point
13:  end if
14: end for
15: return C

If there exist two core points $p_i, p_j$ with $i \neq j$ and $\mu_{dist}(p_i, p_j) = 1$, then $p_i, p_j$ belong to $c$, i.e., they define a fuzzy cluster $c$ and are core points of $c$, i.e., $p_i, p_j \in core(c)$, and they get a membership degree to the cluster given by $\mu_c(p) = 1$.
A point $p$ of a fuzzy cluster that is not a core point can be a boundary point if it satisfies the following: $p \notin core(c)$ is a boundary point of a cluster $c$ if $\exists p_i \in core(c)$ such that $\mu_{dist}(p_i, p) > 0$; $p$ then gets a membership degree to cluster $c$ defined as:

$$\mu_c(p) = \max_{p_i \in core(c)} \mu_{dist}(p_i, p)$$

This definition of the membership degree of a boundary point of a cluster $c$ takes into account the closest point $p_i$ in $core(c)$.
This definition allows generating fuzzy clusters with approximate reachability properties, i.e., with faint boundaries of variable extent, where the membership degree reflects the distance of each boundary point from the closest core point of the cluster. In this approach all clusters have the same density properties for their cores, while they may have boundary points with distinct reachability from the core.
Moreover, a boundary point $p$ can partially belong to more than one cluster at the same time with distinct membership values. This allows generating fuzzy clusters with overlapping boundaries. This is guaranteed by the condition for selecting the points to be evaluated as boundary points of clusters: it is sufficient for them not to be core points of any cluster.
Finally, a point $p$ is considered noise if $\forall c\ \nexists p_i \in core(c)$ such that $\mu_{dist}(p_i, p) > 0$.
Notice that the core is still crisp, and not fuzzy as in [1]. Further, unlike in [1], minPts is still a precise numeric value that defines the minimum number of points of a core, as in the classic DBSCAN, and not the minimum cardinality of the fuzzy cluster. This allows generating fuzzy clusters with a crisp core. More precisely, in our proposal many points at a large distance that satisfy the membership function are not equivalent to a few highly dense points.
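Under the same assumptions as the earlier sketches (Euclidean distance, a point counted in its own neighbourhood), the crisp-core, fuzzy-boundary variant of this section can be sketched in Python as follows. Core points need at least minPts points with $\mu_{dist} = 1$ (distance at most $\varepsilon_{Min}$), clusters are connected groups of core points linked by $\mu_{dist} = 1$, and every non-core point receives, for each cluster, the maximum $\mu_{dist}$ to that cluster's core points. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def mu_dist(p: Point, q: Point, eps_min: float, eps_max: float) -> float:
    d = math.dist(p, q)
    if d <= eps_min:
        return 1.0
    if d >= eps_max:
        return 0.0
    return (eps_max - d) / (eps_max - eps_min)


def approx_fuzzy_border_dbscan(points: Sequence[Point], eps_min: float, eps_max: float,
                               min_pts: int) -> Dict[Tuple[int, int], float]:
    """Map (point index, cluster id) -> membership; points absent from the map are noise."""
    n = len(points)
    mu = [[mu_dist(points[i], points[j], eps_min, eps_max) for j in range(n)] for i in range(n)]
    # Crisp core: at least min_pts points with mu_dist == 1 (the point itself included)
    is_core = [sum(1 for j in range(n) if mu[i][j] == 1.0) >= min_pts for i in range(n)]

    cluster_of = [-1] * n
    n_clusters = 0
    for i in range(n):
        if not is_core[i] or cluster_of[i] != -1:
            continue
        cluster_of[i] = n_clusters
        stack = [i]
        while stack:  # link core points with mu_dist == 1
            k = stack.pop()
            for j in range(n):
                if is_core[j] and cluster_of[j] == -1 and mu[k][j] == 1.0:
                    cluster_of[j] = n_clusters
                    stack.append(j)
        n_clusters += 1

    membership: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        if is_core[i]:
            membership[(i, cluster_of[i])] = 1.0  # core points fully belong to their cluster
            continue
        for j in range(n):  # boundary: max mu_dist over the core points of each cluster
            if is_core[j] and mu[i][j] > 0.0:
                key = (i, cluster_of[j])
                membership[key] = max(membership.get(key, 0.0), mu[i][j])
    return membership


pts = [(0, 0), (0, 1), (1, 0), (1, 1), (3, 0), (50, 50)]
print(approx_fuzzy_border_dbscan(pts, eps_min=1, eps_max=3, min_pts=3))
```

Placing a non-core point midway between two such groups gives it a partial degree in both clusters, which is the overlapping-boundary behaviour described above.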

4.3 Generating clusters with both approximately dense Fuzzy Cores and approximately reachable Fuzzy Boundaries
In this section we introduce an extension that combines the previous two, so as to model fuzziness over both cores and boundaries. Using soft constraints to specify both MinPts and ε approximately yields a much more flexible fuzzy version of the DBSCAN algorithm.
This approach can deal with both input parameters specified by approximate values:
– MinPts is replaced by two values ($Mpts_{Min}$ and $Mpts_{Max}$) that define the soft constraint $\mu_{minP}$ specifying the desired approximate density of the fuzzy core;
– ε is replaced by two values ($\varepsilon_{Min}$ and $\varepsilon_{Max}$) that define the soft constraint $\mu_{dist}$ specifying the approximate reachability property of the fuzzy boundary.
We define $neigh(p, \varepsilon_{Max})$ as the set of points whose distance from $p$ is less than $\varepsilon_{Max}$.
A fuzzy core point is defined by first computing the fuzzy cardinality $dens(p)$ of $neigh(p, \varepsilon_{Max})$ as follows:

$$dens(p) = \sum_{p_i \in neigh(p, \varepsilon_{Max})} \mu_{dist}(p, p_i)$$

If $\mu_{minP}(dens(p)) > 0$, the point $p$ belongs to the fuzzy core of some cluster with membership degree $\mu_{minP}(dens(p))$. If $\mu_{minP}(dens(p)) = 0$, then $p$ is a border or a noise point.
Now we define the fuzzy core of a cluster $c$, $fuzzycore(c)$: given $p \in fuzzycore$ with membership degree $\mu_{minP}(dens(p))$, if $\exists p_i \in fuzzycore$ such that $\mu_{dist}(p, p_i) > 0$, then $p \in fuzzycore(c)$. Its membership degree $\mu_c(p)$ to cluster $c$ is computed as follows:

$$\mu_c(p) = \min\Big(\mu_{minP}(dens(p)),\ \max_{p_j \in fuzzycore(c),\ \mu_{dist}(p, p_j) > 0} \mu_{dist}(p, p_j)\Big)$$
A point $p \notin fuzzycore(c)$, i.e., that does not belong to the fuzzy core of a cluster $c$, is a fuzzy boundary point of $c$ if it satisfies the following condition: $\exists p_i \in fuzzycore(c)$ such that $\mu_{dist}(p, p_i) > 0$. Its membership to the fuzzy cluster $c$ is defined as follows:

$$\mu_c(p) = \min\Big(\mu_{minP}(dens(p)),\ \max_{p_j \in fuzzycore(c),\ \mu_{dist}(p, p_j) > 0} \mu_{dist}(p, p_j)\Big)$$

If a point is neither a fuzzy core point nor a fuzzy boundary point, it is a noise point.
Notice that with this definition a fuzzy cluster $c$ is composed of two disjoint fuzzy sets, the fuzzy core $fuzzycore(c)$ and the fuzzy boundary $fuzzyboundary(c)$, such that $fuzzycore(c) \cap fuzzyboundary(c) = \emptyset$. Each point $p$ belonging to either $fuzzycore(c)$ or $fuzzyboundary(c)$ gets a distinct membership degree to $c$, denoted by $\mu_c(p)$.
Further, while fuzzy core points can belong to the fuzzy core of only one cluster, points that are not fuzzy core points can belong to the fuzzy boundary of several clusters with distinct membership values.
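The fuzzy-core part of this combined variant can be sketched in Python as below: it computes $dens(p)$, the fuzzy core degree $\mu_{minP}(dens(p))$ and the core membership $\mu_c(p)$. Assumptions, none of which the text fixes: Euclidean distance; $p$ counts in its own neighbourhood $neigh(p, \varepsilon_{Max})$; the maximum in $\mu_c(p)$ ranges over the other fuzzy core points $p_j \neq p$ of the cluster; and an isolated fuzzy core point, with no other fuzzy core point within $\varepsilon_{Max}$, is left without a cluster. Boundary memberships are not computed here, and all names are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]


def mu_dist(p: Point, q: Point, eps_min: float, eps_max: float) -> float:
    d = math.dist(p, q)
    if d <= eps_min:
        return 1.0
    if d >= eps_max:
        return 0.0
    return (eps_max - d) / (eps_max - eps_min)


def mu_min_p(x: float, lo: float, hi: float) -> float:
    if x >= hi:
        return 1.0
    if x <= lo:
        return 0.0
    return (x - lo) / (hi - lo)


def fuzzy_core_memberships(points: Sequence[Point], eps_min: float, eps_max: float,
                           mpts_min: float, mpts_max: float) -> Dict[Tuple[int, int], float]:
    """Map (point index, cluster id) -> mu_c(p) for the fuzzy core points only."""
    n = len(points)
    # neigh(p, eps_max): points strictly closer than eps_max, p itself included
    neigh = [[j for j in range(n) if math.dist(points[i], points[j]) < eps_max] for i in range(n)]
    # dens(p): fuzzy cardinality of neigh(p, eps_max)
    dens = [sum(mu_dist(points[i], points[j], eps_min, eps_max) for j in neigh[i]) for i in range(n)]
    core = [mu_min_p(d, mpts_min, mpts_max) for d in dens]

    cluster_of = [-1] * n  # -1: unassigned, -2: isolated fuzzy core point
    n_clusters = 0
    for i in range(n):
        if core[i] == 0.0 or cluster_of[i] != -1:
            continue
        cluster_of[i] = n_clusters
        component, stack = [i], [i]
        while stack:  # fuzzy core points linked by mu_dist > 0
            k = stack.pop()
            for j in neigh[k]:
                if core[j] > 0.0 and cluster_of[j] == -1:
                    cluster_of[j] = n_clusters
                    component.append(j)
                    stack.append(j)
        if len(component) == 1:
            cluster_of[i] = -2
        else:
            n_clusters += 1

    memberships: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        c = cluster_of[i]
        if c < 0:
            continue
        best = max(mu_dist(points[i], points[j], eps_min, eps_max)
                   for j in neigh[i] if j != i and cluster_of[j] == c)
        memberships[(i, c)] = min(core[i], best)
    return memberships


pts = [(0, 0), (1, 0), (2, 0), (3, 0), (50, 0)]
print(fuzzy_core_memberships(pts, eps_min=1, eps_max=3, mpts_min=2, mpts_max=4))
```

On this line of points, the two inner points have a larger fuzzy cardinality (3.5 against 2.5), so they receive a higher core membership than the two end points, while the distant point (50, 0) has $dens = 1$ and stays outside every fuzzy core.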

5 Experiments

6 Conclusion

The paper reviewed some models for evaluating soft aggregations of selection
conditions with unequal importance weights in flexible queries to databases.
We outlined the drawbacks of these approaches, specifically the fact that they
model only the relative importance of the conditions. Further, we proposed a
generalization of the p-norm model [3] to allow other semantics of importance
weights: besides the relative importance semantics it can model the ideal (desired
or undesired) and the minimum (crisp or broad) acceptance levels of satisfaction
degrees of the conditions.

References
1. Nasibov, E.N., Ulutagay, G.: Robustness of density-based clustering methods with various neighborhood relations. Fuzzy Sets and Systems 160(24), 3601–3615 (2009)
2. Ester, M., Kriegel, H.P., Sander, J., Xu, X.: A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, pp. 226–231 (1996)
3. Parker, J.K., Downs, J.A.: Footprint generation using fuzzy-neighborhood clustering. GeoInformatica 17, 285–299 (2013)
[Figure 1: scatter plots of the clustered points, X axis from 50 to 500, Y axis from 250 to 600; panel (a) shows clusters cluster0 to cluster8, panel (b) shows clusters cluster0 to cluster6.]

Fig. 1. Results of (a) DBSCAN and (b) Approx Fuzzy Core DBSCAN. For DBSCAN we set Mpts = 9 and ε = 12, while for Approx Fuzzy Core DBSCAN the soft constraint over the minimum number of points ranges from 7 to 12 and ε is always equal to 12.
[Figure 2: scatter plot of a single cluster, X axis from 120 to 300, Y axis from 510 to 580; legend: fuzzycore(c) = 1.0; 1.0 > fuzzycore(c) ≥ 0.5; 0.5 > fuzzycore(c) > 0.]

Fig. 2. Inspection of a cluster generated with the Approx Fuzzy Core DBSCAN approach (Mpts = (9, 12), ε = 12). For the light blue cluster (Cluster2) shown in Figure 1b, we visualize the fuzzy core points grouped into three categories: fuzzy core points with membership equal to 1 (red cross), fuzzy core points with membership less than 1 and greater than or equal to 0.5 (blue X), and fuzzy core points with membership less than 0.5 and greater than 0 (green star).
