Fuzzy Clustering and Data Analysis Toolbox
For Use with Matlab
multidimensional scaling method described by Sammon. The original
method is computationally expensive when a new data point has to
be mapped, so a modified method described by Abonyi was included
in this toolbox.
Contents

1 Theoretical introduction                    3
    1.3 Validation                           13
    1.4 Visualization                        15
2 Reference                                  19
    Function Arguments                       21
    Kmeans                                   24
    Kmedoid                                  28
    FCMclust                                 31
    GKclust                                  35
    GGclust                                  39
    clusteval                                43
    validity                                 45
    PCA                                      48
    Sammon                                   50
    FuzSam                                   51
    projeval                                 53
    samstr                                   54
3 Case Studies                               55
    3.3.4 Conclusions                        71
1 Theoretical introduction
The aim of this chapter is to introduce the theory of fuzzy clustering so
that one can understand the subsequent chapters of this thesis at the necessary
level. Fuzzy clustering can be used as a tool to obtain a partitioning of data.
Section 1.1 gives the basic notions about the data, clusters and different types
of partitioning. Section 1.2 presents the description of the algorithms used in
this toolbox. The validation of these algorithms is described in Section 1.3.
Each discussed algorithm is demonstrated with an example in Chapter 2, in the
description of the corresponding Matlab function.
For a more detailed treatment of this subject see the classical monograph by
Bezdek [1] or Hoppner's book [2]. The notations and the descriptions of the
algorithms closely follow the structure used by [3].
1.1 Cluster analysis
$$X = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{N1} & x_{N2} & \cdots & x_{Nn} \end{bmatrix} \qquad (1.1)$$
means of a distance norm. Distance can be measured among the data vectors
themselves, or as a distance from a data vector to some prototypical object of
the cluster. The prototypes are usually not known beforehand, and are sought
by the clustering algorithms simultaneously with the partitioning of the data.
The prototypes may be vectors of the same dimension as the data objects, but
they can also be defined as ”higher-level” geometrical objects, such as linear or
nonlinear subspaces or functions.
Data can reveal clusters of different geometrical shapes, sizes and densities as
demonstrated in Fig. 1.1. Clusters can be spherical (a), elongated or ”linear”
(b), and also hollow (c) and (d). Clusters (b) to (d) can be characterized as
linear and nonlinear subspaces of the data space (R2 in this case). Algorithms
that can detect subspaces of the data space are of particular interest for identi-
fication. The performance of most clustering algorithms is influenced not only
by the geometrical shapes and densities of the individual clusters but also by
the spatial relations and distances among the clusters. Clusters can be well-
separated, continuously connected to each other, or overlapping each other.
Since clusters can formally be seen as subsets of the data set, one possible
classification of clustering methods can be according to whether the subsets
are fuzzy or crisp (hard). Hard clustering methods are based on classical set
theory, and require that an object either does or does not belong to a cluster.
Hard clustering in a data set X means partitioning the data into a specified
number of mutually exclusive subsets.
Hard partition
The objective of clustering is to partition the data set X into c clusters. For
the time being, assume that c is known, based on prior knowledge, for instance,
or that it is a trial value, the resulting partitions of which must be validated [1].
Using classical sets, a hard partition can be defined as a family of subsets
{Ai | 1 ≤ i ≤ c} ⊂ P(X), with the following properties:
$$\bigcup_{i=1}^{c} A_i = X, \qquad (1.2)$$
$$A_i \cap A_j = \emptyset, \quad 1 \le i \ne j \le c, \qquad (1.3)$$
$$\emptyset \subset A_i \subset X, \quad 1 \le i \le c. \qquad (1.4)$$
These conditions mean that the subsets Ai contain all the data in X, they must
be disjoint, and none of them is empty nor contains all the data in X. Expressed
in terms of membership functions:
$$\bigvee_{i=1}^{c} \mu_{A_i} = 1, \qquad (1.5)$$
$$\mu_{A_i} \wedge \mu_{A_j} = 0, \quad 1 \le i \ne j \le c, \qquad (1.6)$$
$$0 < \mu_{A_i} < 1, \quad 1 \le i \le c. \qquad (1.7)$$
Here µAi is the characteristic function of the subset Ai and its value can be
zero or one. To simplify the notation we use µi instead of µAi, and, denoting
µi(xk) by µik, partitions can be represented in matrix form.
An N × c matrix U = [µik] represents a hard partition if and only if its elements
satisfy:
$$\mu_{ik} \in \{0, 1\}, \quad 1 \le i \le N, \; 1 \le k \le c, \qquad (1.8)$$
$$\sum_{k=1}^{c} \mu_{ik} = 1, \quad 1 \le i \le N, \qquad (1.9)$$
$$0 < \sum_{i=1}^{N} \mu_{ik} < N, \quad 1 \le k \le c. \qquad (1.10)$$
Fuzzy partition
The i-th column of U contains the values of the membership function of the i-th
fuzzy subset of X. Equation (1.13) constrains the sum of each row to 1, and thus the
total membership of each xk in X equals one. The distribution of memberships
among the c fuzzy subsets is not otherwise constrained.
The hard partitioning methods are simple and popular, though their results are not
always reliable and these algorithms have numerical problems as well. From an
N × n dimensional data set the K-means and K-medoid algorithms allocate each
data point to one of c clusters to minimize the within-cluster sum of squares:
$$\sum_{i=1}^{c} \sum_{k \in A_i} \| x_k - v_i \|^2 \qquad (1.16)$$
where Ai is the set of objects (data points) in the i-th cluster and vi is the
mean of those points over cluster i. The term in (1.16) is a squared distance norm. In
K-means clustering vi is called the cluster prototype, i.e. the cluster center:
$$v_i = \frac{\sum_{k=1}^{N_i} x_k}{N_i}, \quad x_k \in A_i, \qquad (1.17)$$
where Ni is the number of objects in Ai .
In K-medoid clustering the cluster centers are the data points nearest to the mean
of the data in each cluster, V = {vi ∈ X | 1 ≤ i ≤ c}. This is useful, for example, when
each data point denotes a state of a system and there is no continuity in the
data space: in such cases the mean of the points in a set is not a valid data point. The
concrete algorithms are described on page 27 and 29 in Chapter 2.
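As an illustration of the alternating assignment and mean-update steps behind (1.16)-(1.17), the following fragment is a minimal K-means sketch; the data, the number of clusters and the iteration limit are arbitrary, and this is not the toolbox's Kmeans function itself:

    % Minimal K-means sketch for an N x n data matrix X and c clusters.
    X = rand(200, 2);                          % illustrative synthetic data
    c = 3;
    v = X(randperm(size(X,1), c), :);          % centers initialized from data points
    for iter = 1:100
        D = zeros(size(X,1), c);
        for i = 1:c                            % squared Euclidean distances, cf. (1.16)
            D(:,i) = sum((X - repmat(v(i,:), size(X,1), 1)).^2, 2);
        end
        [~, label] = min(D, [], 2);            % assign each point to its closest center
        vold = v;
        for i = 1:c                            % recompute the cluster means, cf. (1.17)
            if any(label == i)
                v(i,:) = mean(X(label == i, :), 1);
            end
        end
        if max(abs(v(:) - vold(:))) == 0, break; end   % stop when the centers no longer move
    end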
The stationary points of the objective function (1.18) can be found by adjoining
the constraint (1.13) to J by means of Lagrange multipliers:
$$J(X; U, V, \lambda) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA}^2 + \sum_{k=1}^{N} \lambda_k \left( \sum_{i=1}^{c} \mu_{ik} - 1 \right), \qquad (1.21)$$
and
$$v_i = \frac{\sum_{k=1}^{N} \mu_{ik}^m x_k}{\sum_{k=1}^{N} \mu_{ik}^m}, \quad 1 \le i \le c. \qquad (1.23)$$
This solution also satisfies the remaining constraints (1.12) and (1.14). Note
that equation (1.23) gives vi as the weighted mean of the data items that
belong to a cluster, where the weights are the membership degrees. That is
why the algorithm is called ”c-means”. One can see that the FCM algorithm is
a simple iteration through (1.22) and (1.23).
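A compact sketch of this alternating iteration between the membership update and the weighted-mean update (1.23) is given below; the data, the parameter values and the variable names are only illustrative, and the toolbox's FCMclust function is the reference implementation:

    % One alternating-optimization loop of fuzzy c-means (Euclidean norm, A = I).
    X = rand(200, 2);  N = size(X,1);                   % illustrative data
    c = 3;  m = 2;  e = 1e-6;                           % clusters, fuzziness exponent, tolerance
    U = rand(c, N);  U = U ./ repmat(sum(U,1), c, 1);   % random fuzzy partition, columns sum to 1
    for iter = 1:100
        Um = U.^m;
        v = (Um * X) ./ repmat(sum(Um,2), 1, size(X,2));   % cluster centers, eq. (1.23)
        D = zeros(c, N);
        for i = 1:c                                        % squared distances D_ikA
            D(i,:) = sum((X - repmat(v(i,:), N, 1)).^2, 2)';
        end
        D = max(D, 1e-10);                                 % avoid the singularity D = 0
        % membership update, eq. (1.22); D holds squared distances, hence exponent 1/(m-1)
        Unew = 1 ./ ( (D.^(1/(m-1))) .* repmat(sum(D.^(-1/(m-1)), 1), c, 1) );
        if max(abs(Unew(:) - U(:))) < e, U = Unew; break; end
        U = Unew;
    end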
The FCM algorithm uses the standard Euclidean distance norm, which
induces hyperspherical clusters. Hence it can only detect clusters with the same
shape and orientation, because the common choice of the norm-inducing matrix
is A = I, or it can be chosen as an n × n diagonal matrix that accounts for
different variances in the directions of the coordinate axes of
X:
$$A_D = \begin{bmatrix} (1/\sigma_1)^2 & 0 & \cdots & 0 \\ 0 & (1/\sigma_2)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1/\sigma_n)^2 \end{bmatrix}, \qquad (1.24)$$
Gustafson and Kessel extended the standard fuzzy c-means algorithm by employ-
ing an adaptive distance norm, in order to detect clusters of different geometrical
shapes in one data set [4]. Each cluster has its own norm-inducing matrix Ai ,
which yields the following inner-product norm:
$$D_{ikA_i}^2 = (x_k - v_i)^T A_i (x_k - v_i), \quad 1 \le i \le c, \; 1 \le k \le N. \qquad (1.26)$$
The matrices Ai are used as optimization variables in the c-means functional,
thus allowing each cluster to adapt the distance norm to the local topological
structure of the data. Let A denote a c-tuple of the norm-inducing matrices:
A = (A1 , A2 , ..., Ac ). The objective functional of the GK algorithm is defined
by:
$$J(X; U, V, A) = \sum_{i=1}^{c} \sum_{k=1}^{N} (\mu_{ik})^m D_{ikA_i}^2. \qquad (1.27)$$
For a fixed A, conditions (1.12), (1.13) and (1.14) can be directly applied.
However, the objective function (1.27) cannot be directly minimized with respect
to Ai , since it is linear in Ai . This means that J can be made as small as desired
by simply making Ai less positive definite. To obtain a feasible solution, Ai must
be constrained in some way. The usual way of accomplishing this is to constrain
the determinant of Ai . Allowing the matrix Ai to vary with its determinant
fixed corresponds to optimizing the cluster’s shape while its volume remains
constant:
$$|A_i| = \rho_i, \quad \rho_i > 0, \qquad (1.28)$$
where ρi is fixed for each cluster. Using the Lagrange multiplier method, the
following expression for Ai is obtained:
$$A_i = \left[ \rho_i \det(F_i) \right]^{1/n} F_i^{-1}, \qquad (1.29)$$
where F i is the fuzzy covariance matrix of the i-th cluster defined by:
$$F_i = \frac{\sum_{k=1}^{N} (\mu_{ik})^m (x_k - v_i)(x_k - v_i)^T}{\sum_{k=1}^{N} (\mu_{ik})^m}. \qquad (1.30)$$
Note that the substitution of (1.29) and (1.30) into (1.26) gives a generalized
squared Mahalanobis distance norm between xk and the cluster mean vi , where
the covariance is weighted by the membership degrees in U. The concrete
algorithm is described on page 37 in Chapter 2.
The numerically robust GK algorithm described by R. Babuška, P.J. van der
Veen, and U. Kaymak [5] is used in this toolbox.
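The cluster-specific norm of (1.26)-(1.30) can be illustrated with a short fragment that, for given memberships, computes the fuzzy covariance F_i, the norm-inducing matrix A_i and the GK distances for each cluster. This is only a sketch with ρ_i = 1 and without the numerical safeguards of the robust version in [5]; the data and partition are arbitrary:

    % Gustafson-Kessel distance computation (sketch, rho_i = 1).
    X = rand(100, 2);  N = size(X,1);  n = size(X,2);
    c = 2;  m = 2;  rho = ones(c,1);
    U = rand(c, N);  U = U ./ repmat(sum(U,1), c, 1);      % fuzzy partition (columns sum to 1)
    v = (U.^m * X) ./ repmat(sum(U.^m, 2), 1, n);          % cluster centers
    D2 = zeros(c, N);
    for i = 1:c
        Xc = X - repmat(v(i,:), N, 1);                     % centered data
        w  = (U(i,:).^m)';                                 % membership weights
        Fi = (Xc' * (Xc .* repmat(w, 1, n))) / sum(w);     % fuzzy covariance, eq. (1.30)
        Ai = (rho(i) * det(Fi))^(1/n) * inv(Fi);           % norm-inducing matrix, eq. (1.29)
        D2(i,:) = sum((Xc * Ai) .* Xc, 2)';                % squared GK distances, eq. (1.26)
    end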
1.3 Validation
Cluster validity refers to the problem whether a given fuzzy partition fits the
data at all. The clustering algorithm always tries to find the best fit for a fixed
number of clusters and the parameterized cluster shapes. However, this does not
mean that even the best fit is meaningful at all. Either the number of clusters
might be wrong or the cluster shapes might not correspond to the groups in
the data, if the data can be grouped in a meaningful way at all. Two main
approaches to determining the appropriate number of clusters in data can be
distinguished:
Different scalar validity measures have been proposed in the literature; none
of them is perfect by itself, therefore several indexes are used in this Toolbox,
which are described below:
3. Partition Index (SC): is the ratio of the sum of compactness and sep-
aration of the clusters. It is a sum of individual cluster validity measures
normalized through division by the fuzzy cardinality of each cluster[8].
$$SC(c) = \sum_{i=1}^{c} \frac{\sum_{j=1}^{N} (\mu_{ij})^m \| x_j - v_i \|^2}{N_i \sum_{k=1}^{c} \| v_k - v_i \|^2} \qquad (1.36)$$
SC is useful when comparing different partitions having equal number of
clusters. A lower value of SC indicates a better partition.
4. Separation Index (S): in contrast to the partition index (SC), the sepa-
ration index uses a minimum-distance separation for partition validity[8].
$$S(c) = \frac{\sum_{i=1}^{c} \sum_{j=1}^{N} (\mu_{ij})^2 \| x_j - v_i \|^2}{N \min_{i,k} \| v_k - v_i \|^2} \qquad (1.37)$$
5. Xie and Beni's Index (XB): it aims to quantify the ratio of the total
variation within clusters and the separation of clusters[9].
$$XB(c) = \frac{\sum_{i=1}^{c} \sum_{j=1}^{N} (\mu_{ij})^m \| x_j - v_i \|^2}{N \min_{i,j} \| x_j - v_i \|^2} \qquad (1.38)$$
The optimal number of clusters should minimize the value of the index.
6. Dunn’s Index (DI): this index is originally proposed to use at the iden-
tification of ”compact and well separated clusters”. So the result of the
clustering has to be recalculated as it was a hard partition algorithm.
function between two clusters(minx∈Ci ,y∈Cj d(x, y)) is rated in value from
beneath by the triangle-nonequality:
Note that the only difference between SC, S and XB is the way the separation
of the clusters is taken into account. In the case of overlapping clusters the values of DI and ADI are
not really reliable, because the results are re-partitioned with a hard partition
method.
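For illustration, the following fragment evaluates the separation index S of (1.37) for a given partition; U, v and X are assumed to be the outputs of one of the fuzzy clustering runs, and the function name is only illustrative (the toolbox's validity function is the reference implementation):

    % Separation index S, eq. (1.37): partition compactness over the minimal
    % squared distance between cluster centers.
    % U is c x N (fuzzy partition), v is c x n (centers), X is N x n (data).
    function s = sepindex(X, U, v)
        [c, N] = size(U);
        num = 0;
        for i = 1:c
            d2  = sum((X - repmat(v(i,:), N, 1)).^2, 2);   % ||x_j - v_i||^2
            num = num + sum((U(i,:)'.^2) .* d2);           % (mu_ij)^2 weighting
        end
        dv = inf;                                          % min over i ~= k of ||v_k - v_i||^2
        for i = 1:c
            for k = 1:c
                if i ~= k
                    dv = min(dv, sum((v(i,:) - v(k,:)).^2));
                end
            end
        end
        s = num / (N * dv);
    end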
1.4 Visualization
Clustering-based data mining tools are becoming popular, since they are able to
"learn" the mapping of functions and systems or to explore structures and classes
in the data.
The Principal Component Analysis maps the data points into a lower dimen-
sional space, which is useful in the analysis and visualization of the correlated
high-dimensional data.
We focused on the Sammon mapping method for the visualization of the clus-
tering results, which preserves interpattern distances. This kind of distance-preserving
mapping is much closer to the purpose of clustering than simply preserving
the variances. Two problems with the application of Sammon mapping are:
• The prototypes of the clusters are usually not known a priori, and they are
calculated along with the partitioning of the data. These prototypes can
be vectors dimensionally equal to the examined data points, but they can also
be defined as geometrical objects, i.e. linear or nonlinear subspaces or
functions. Sammon mapping is a projection method based on
the preservation of the Euclidean interpoint distance norm, so it can
only be used with clustering algorithms that calculate with this type of distance
norm [10], [11].
• The Sammon mapping algorithm has to find, for N points in the high n-dimensional
space, N corresponding points in a lower q-dimensional subspace such that the inter-
point distances correspond to the distances measured in the n-dimensional
space. This results in a computationally expensive algorithm, since every
iteration step requires the computation of N (N − 1)/2 distances.
where $\lambda = \sum_{i<j} d_{ij} = \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} d_{ij}$, but there is no need to maintain λ for a
successful solution of the optimization problem, since, being a constant, it does not
change the optimization result.
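For reference, the Sammon stress and the pairwise-distance bookkeeping that makes the original method expensive can be written as in the sketch below; the function name is only illustrative, and the loop makes the N(N-1)/2 distance evaluations explicit:

    % Sammon stress of a projection: weighted squared error of interpoint distances.
    % X is the N x n original data, Y is the N x q projected data.
    function E = sammon_stress(X, Y)
        N = size(X, 1);
        E = 0;  lambda = 0;
        for i = 1:N-1
            for j = i+1:N                      % N(N-1)/2 distance pairs per evaluation
                dij  = norm(X(i,:) - X(j,:));  % distance in the original space
                dsij = norm(Y(i,:) - Y(j,:));  % distance in the projected space
                if dij > 0
                    E = E + (dij - dsij)^2 / dij;
                end
                lambda = lambda + dij;         % normalizing constant of the data
            end
        end
        E = E / lambda;
    end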
where d(xk, vi) represents the distance between the data point xk and the
cluster center vi measured in the original n-dimensional space, while d*ki =
d*(yk, zi) represents the Euclidean distance between the projected cluster cen-
ter zi and the projected data point yk.
This means that in the projected two-dimensional space every cluster is represented
by a single point, independently of the form of the original cluster prototype
vi.
The resulting algorithm is similar to the original Sammon mapping, but in this
case, in every iteration, after the adaptation of the projected data points the
projected cluster centers are recalculated based on the weighted mean formula
of the fuzzy clustering algorithms, described in Chapter 2 on Page 51.
The distances between the projected data and the projected cluster centers are
based on the ordinary Euclidean distance measure. The membership values of
the projected data can also be plotted based on the classical formula for the
calculation of the membership values:
$$\mu_{ki}^{*} = \frac{1}{\sum_{j=1}^{c} \left( d_{ki}^{*} / d_{kj}^{*} \right)^{2/(m-1)}} \qquad (1.47)$$
$$P = \| U - U^{*} \| \qquad (1.48)$$
2 Reference
The Reference chapter provides the descriptions of the functions included in
this toolbox. For each description one can find the syntax and the algorithm, and
two uncomplicated examples are considered: a generated disjoint data set in R2
and a real motorcycle data set, in which the head acceleration of a human "post mortem
test object" is plotted against time
(it can be found on the web: http://www.ece.pdx.edu/∼mcnames/DataSets/).
The figures are discussed, and several notes are mentioned as well. The lines of
the contour maps are the level curves of equal membership
degree (for details see the description of clusteval on Page 43).
The input and output arguments used by the functions are shown in Tab. 2.3 and Tab. 2.4
in the "Function Arguments" section, so one can more easily find out which
function uses which parameter, and what the output matrix structures are. In
the following Tab. 2.1 the implemented functions are listed and grouped by
purpose.
Table 2.1: Reference

Hard and Fuzzy clustering algorithms                                    Page
  Kmeans            Crisp clustering method with standard Euclidean      24
                    distance norm
  Kmedoid           Crisp clustering method with standard Euclidean      28
                    distance norm and centers chosen from the data
  FCMclust          Fuzzy C-means clustering method with standard        31
                    Euclidean distance norm
  GKclust           Fuzzy Gustafson–Kessel clustering method with        35
                    squared Mahalanobis distance norm
  GGclust           Fuzzy Gath–Geva clustering method with a distance    39
                    norm based on the fuzzy maximum likelihood estimates

Evaluation of clustering results                                        Page
  clusteval         Calculates the membership values of the fuzzy        43
                    clustering algorithms (C-means, Gustafson-Kessel,
                    Gath-Geva) for "unseen" data

Validation of the clustering results                                    Page
  validity          Validity measure by calculating different types      43
                    of validation indexes

Data normalization                                                      Page
  clustnormalize    Data normalization with two possible methods         46
  clustdenormalize  Denormalization of data with the method used         46
                    during the normalization

Visualization                                                           Page
  PCA               Principal Component Analysis of n-dimensional data   48
  Sammon            Sammon Mapping for visualization                     50
  FuzSam            Modified Sammon Mapping for visualization of         51
                    n-dimensional data
The data structures created during normalization are shown in Table 2.2
The input parameters and output matrices are defined in Table 2.3 and
Table 2.4.
Purpose
Hard partition of the given data matrix, described in Section 1.2.1.
Syntax
[result]=Kmeans(data,param)
Description
The objective function Kmeans algorithm is to partition the data
set X into c clusters. Calling the algorithm with the syntax above,
first the param.c and data.X must be given, and the output ma-
trices of the function are saved in result.data.f , result.data.d and
result.cluster.v. The number of iteration and the cost function
are also saved in the result structure. The calculated cluster center
vi (i ∈ {1, 2, . . . , c}) is the mean of the data points in cluster i.
The difference from the latter discussed algorithms are that it gener-
ates random cluster centers, not partition matrix for initialization,
so param.c can be given as an initializing matrix containing the
cluster centers too.
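A minimal calling sketch based on the conventions above follows; the synthetic data and the value of param.c are arbitrary, and it assumes the toolbox is on the Matlab path:

    % Hypothetical usage sketch of the toolbox's Kmeans function.
    data.X  = rand(100, 2);          % N x n data matrix (arbitrary synthetic data)
    param.c = 3;                     % number of clusters (or a c x n matrix of initial centers)
    result  = Kmeans(data, param);   % run the crisp clustering
    f = result.data.f;               % hard partition matrix
    d = result.data.d;               % distances from the cluster centers
    v = result.cluster.v;            % the calculated cluster centers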
Example
The examples were generated with the Kmeanscall function located
in ..\Demos\clusteringexamples\synthetic\ and
..\Demos\clusteringexamples\motorcycle\ directories, where the nDex-
ample function and motorcycle.txt are situated.
[Figure 2.1 and Figure 2.2: the results of the K-means clustering algorithm on the synthetic data set with two different initializations (plots omitted).]
Figure 2.3: The results of the K-means clustering algorithm by the motor-
cycle data set.
Discussion
In Fig. 2.1 and Fig. 2.2 one can see two results on the synthetic
data set with different initializations (in the latter figure Kmeans
was called with v = [0.6 0.5; 0.6 0.5; 0.6 0.5]). The difference
is obvious. Fig. 2.3 shows the K-means clustered motorcycle data.
The cluster centers are marked with 'o'.
Since this is a hard partition, the clustered data points can easily be sepa-
rated by using different markers. If param.c is greater than 3,
the algorithm draws the "border" of the clusters using the Matlab
voronoi function.
The main problem of the K-means algorithm is the random ini-
tialization of the centers: the calculation can run into wrong
results if some centers "have no data points".
Notes
1. It is recommended to run Kmeans several times to achieve the
correct results.
2. To avoid the problem described above, the cluster centers are
initialized with randomly chosen data points.
3. If Dik becomes zero for some xk, singularity occurs in the
algorithm, so the initializing centers are not exactly the randomly chosen
data points, they are just near them (at a distance of 10^-10 in
each dimension).
4. If the initialization problem still occurs for some reason (e.g. the
user supplies a wrong initialization to the function), the "lonely" centers
are redefined as data points.
Repeat for l = 1, 2, . . .
Step 1 Compute the distances
$$D_{ik}^2 = (x_k - v_i)^T (x_k - v_i), \quad 1 \le i \le c, \; 1 \le k \le N. \qquad (2.3)$$
Step 2 Assign the data points to the clusters: each point belongs to the cluster
with the minimal distance.
Step 3 Calculate the cluster centers
$$v_i^{(l)} = \frac{\sum_{j=1}^{N_i} x_j}{N_i} \qquad (2.4)$$
until $\max_i \| v_i^{(l)} - v_i^{(l-1)} \| = 0$.
See Also
Kmedoid, clusteval, validity
Purpose
Hard partition of the given data matrix, where the cluster centers must
be data points (described in Section 1.2.1).
Syntax
[result]=Kmedoid(data,param)
Description
The objective function Kmedoid algorithm is to partition the data
set X into c clusters. The input and output arguments are the
same what K-means uses. The main difference between Kmeans
and Kmedoid stands in calculating the cluster centers: the new
cluster center is the nearest data point to the mean of the cluster
points.
The function generates random cluster centers, not partition matrix
for initialization, so param.c can be given as an initializing matrix
containing the cluster centers too, not only a scalar (the number of
clusters)
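The medoid-style center selection described above (the new center is the data point closest to the cluster mean) can be sketched as follows for one cluster; Xi is an arbitrary example of the points currently assigned to cluster i:

    % Selecting the center of one cluster as a data point (sketch).
    Xi = rand(30, 2);                                  % points assigned to cluster i (illustrative)
    vmean = mean(Xi, 1);                               % "fake" center: the cluster mean
    d2 = sum((Xi - repmat(vmean, size(Xi,1), 1)).^2, 2);
    [~, idx] = min(d2);                                % nearest data point to the mean
    vi = Xi(idx, :);                                   % the new cluster center is a data point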
Example
The examples were generated with the Kmedoidcall function lo-
cated in ..\Demos\clusteringexamples\synthetic\ and
..\Demos\clusteringexamples\motorcycle\ directories, where the nDex-
ample function and motorcycle.txt are situated.
Figure 2.4: The results of the K-medoid clustering algorithm by the motor-
cycle data set.
Discussion
Different initializations of the algorithm can also produce very different
results, as Fig. 2.1 and Fig. 2.2 show. Fig. 2.4 shows the K-medoid
clustered motorcycle data. The cluster centers are selected from the
data set. Otherwise K-medoid behaves just like the K-means algorithm.
Notes
See the description of K-means algorithm on Page 27.
Algorithm
[K-medoid algorithm] For corresponding theory see Section 1.2.1.
Given the data set X, choose the number of clusters 1 < c <
N. Initialize with random cluster centers chosen from the data set
X.
Repeat for l = 1, 2, . . .
Step 1 Compute the distances
$$D_{ik}^2 = (x_k - v_i)^T (x_k - v_i), \quad 1 \le i \le c, \; 1 \le k \le N. \qquad (2.5)$$
Step 2 Assign the data points to the clusters: each point belongs to the cluster
with the minimal distance.
Step 3 Calculate the fake cluster centers (as in the original K-means)
$$v_i^{(l)*} = \frac{\sum_{j=1}^{N_i} x_j}{N_i}, \qquad (2.6)$$
$$D_{ik}^{2*} = (x_k - v_i^{*})^T (x_k - v_i^{*}), \qquad (2.7)$$
and
$$x_i^{*} = \arg\min_k \left( D_{ik}^{2*} \right), \qquad v_i^{(l)} = x_i^{*}. \qquad (2.8)$$
until $\max_i \| v_i^{(l)} - v_i^{(l-1)} \| = 0$.
See Also
Kmeans, clusteval, validity
Purpose
Fuzzy C-means clustering of a given data set (described in Sec-
tion 1.2.2).
Syntax
[result]=FCMclust(data,param)
Description
The Fuzzy C-means clustering algorithm uses the minimization of
the fuzzy C-means functional (1.18). There are three input pa-
rameter needed to run this function: param.c, as the number of
clusters or initializing partition matrix; param.m, as the fuzziness
weighting exponent; and param.e, as the maximum termination
tolerance. The two latter parameter have their default value, if
they are not given by the user.
The function calculates with the standard Euclidean distance norm,
the norm inducing matrix is an n × n identity matrix. The result of
the partition is collected in structure arrays. One can get the par-
tition matrix cluster centers, the square distances, the number of
iteration and the values of the C-means functional at each iteration
step.
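A minimal calling sketch following the parameter description above; the synthetic data and the concrete parameter values are arbitrary:

    % Hypothetical usage sketch of FCMclust.
    data.X  = rand(100, 2);            % N x n data matrix
    param.c = 3;                       % number of clusters (or an initial partition matrix)
    param.m = 2;                       % fuzziness weighting exponent (arbitrary value)
    param.e = 1e-4;                    % termination tolerance (arbitrary value)
    result  = FCMclust(data, param);   % fuzzy c-means clustering
    U = result.data.f;                 % fuzzy partition matrix
    v = result.cluster.v;              % cluster centers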
[Figure 2.5: the results of the Fuzzy C-means clustering algorithm on the synthetic data set (plot omitted).]
Figure 2.6: The results of the Fuzzy C-means clustering algorithm by the
motorcycle data set.
Notes
1. If DikA becomes zero for some xk , singularity occurs in the
algorithm: the membership degree cannot be computed.
2. The correct choice of the weighting exponent (m) is important:
as m approaches one from above, the partition becomes hard; as it
approaches infinity, the partition becomes maximally fuzzy, i.e.
µik = 1/c.
3. There is no possibility to use a different norm-inducing matrix A for each cluster,
although in most cases it would be needed.
Algorithm
[FCM algorithm] For corresponding theory see Section 1.2.2.
Given the data set X, choose the number of clusters 1 < c < N,
the weighting exponent m > 1, the termination tolerance ε > 0
and the norm-inducing matrix A. Initialize the partition matrix
randomly, such that U(0) ∈ Mfc.
Repeat for l = 1, 2, . . .
Step 1 Compute the cluster prototypes (means):
$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m x_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m}, \quad 1 \le i \le c. \qquad (2.9)$$
$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA} / D_{jkA} \right)^{2/(m-1)}}. \qquad (2.11)$$
See Also
Kmeans, GKclust, GGclust
References
J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function
Algorithms, Plenum Press, 1981
Purpose
Gustafson-Kessel clustering algorithm extends the Fuzzy C-means
algorithm by employing an adaptive distance norm, in order to de-
tect clusters with different geometrical shapes in the data set.
Syntax
[result]=GKclust(data,param)
Description
The GKclust function forces, that each cluster has its own norm
inducing matrix Ai , so they are allowed to adapt the distance norm
to the local topological structure of the data points. The algorithm
uses the Mahalanobis distance norm. The parameters are extended
with param.ro, it is set to one for each cluster by default value.
There are two numerical problems with GK algorithm, which are
described in Notes.
[Figure 2.7 and Figure 2.8: the results of the Gustafson-Kessel clustering algorithm on the synthetic and the motorcycle data sets (plots omitted).]
Notes
1. If there is no prior knowledge, ρi is 1 for each cluster, so the GK
algorithm can find only clusters with approximately equal volumes.
2. A numerical drawback of GK: when an eigenvalue is zero, or
when the ratio between the maximal and the minimal eigenvalue
(i.e. the condition number of the covariance matrix) is very large, the
matrix is nearly singular. In this case the normalization to a fixed volume
also fails, as the determinant becomes zero. It is then useful to
constrain the ratio between the maximal and minimal eigenvalue:
this ratio should be smaller than some predefined threshold, which is
given in the β parameter.
3. In the case of clusters extremely extended in the direction of the
largest eigenvalues, the computed covariance matrix cannot estimate
the underlying data distribution, so a scaled identity matrix can
be added to the covariance matrix by changing the value of γ from
zero (the default) to a scalar 0 ≤ γ ≤ 1.
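Notes 2 and 3 can be illustrated with a small fragment that conditions a fuzzy covariance matrix before inversion; the threshold beta, the scaling gamma and the concrete numbers are only illustrative, and this is a simplified sketch of the ideas behind the method of Babuška et al. [5], not the toolbox implementation:

    % Conditioning a fuzzy covariance matrix Fi (sketch of the ideas in Notes 2-3).
    Fi    = [1.0 0.99; 0.99 1.0];             % an almost singular covariance (illustrative)
    n     = size(Fi, 1);
    gamma = 0.1;                              % 0 <= gamma <= 1, weight of the added identity
    beta  = 1e15;                             % maximal allowed eigenvalue ratio
    Fi = (1 - gamma) * Fi + gamma * mean(diag(Fi)) * eye(n);   % blend with a scaled identity (Note 3)
    [V, L] = eig(Fi);
    lam = diag(L);
    lam = max(lam, max(lam) / beta);          % limit the condition number (Note 2)
    Fi = V * diag(lam) * V';                  % reconstruct the conditioned covariance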
Algorithm
[Modified Gustafson-Kessel algorithm] For corresponding theory see
Section 1.2.3.
Given the data set X, choose the number of clusters 1 < c < N,
the weighting exponent m > 1, the termination tolerance ε > 0
and the norm-inducing matrix A. Initialize the partition matrix
randomly, such that U(0) ∈ Mfc.
Repeat for l = 1, 2, . . .
Step 1 Calculate the cluster centers.
$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m x_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m}, \quad 1 \le i \le c. \qquad (2.12)$$
Reconstruct F i by:
$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ikA_i}(x_k, v_i) / D_{jkA_j}(x_k, v_j) \right)^{2/(m-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N. \qquad (2.18)$$
until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$.
See Also
Kmeans, FCMclust, GGclust
References
R. Babuška, P.J. van der Veen, and U. Kaymak. Improved co-
variance estimation for Gustafson–Kessel clustering. In Proceedings
of the 2002 IEEE International Conference on Fuzzy Systems, pages
1081–1085, Honolulu, Hawaii, May 2002.
Purpose
Gath-Geva clustering algorithm uses a distance norm based on the
fuzzy maximum likelihood estimates.
Syntax
[result]=GGclust(data,param)
Description
The Gath-Geva clustering algorithm has the same outputs defined
at the description of Kmeans and GKclust, but it has less input
parameters (only the weighting exponenet and the termination tol-
erance), because the used distance norm involving the exponential
term cannot run into numerical problems.
Note that the original fuzzy maximum likelihood estimates does not
involve the value of param.m, it is constant 1.
The parameter param.c can be given as an initializing partition
matrix or as the number of clusters. For other attributes of the
function see Notes.
[Figure 2.9: the results of the Gath-Geva clustering algorithm on the synthetic data set (plot omitted).]
Figure 2.10: The results of the Gath-Geva clustering algorithm by the mo-
torcycle data set.
Notes
1. The GG algorithm can overflow at large c values (which imply
small distances) because of inversion problems in (1.31).
2. The fuzzy maximum likelihood estimates clustering algorithm is
able to detect clusters of varying shapes, sizes and densities.
3. The cluster covariance matrix is used in conjunction with an
"exponential" distance, and the clusters are not constrained in vol-
ume.
4. This algorithm is less robust in the sense that it needs a good
initialization, since, due to the exponential distance norm, it con-
verges to a nearby local optimum. It is therefore recommended to use the
resulting partition matrix of e.g. FCM to initialize this algorithm.
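Following Note 4, a typical calling pattern is to run FCMclust first and pass its partition matrix to GGclust as the initialization; this is a hypothetical usage sketch, and the data and parameter values are arbitrary:

    % Hypothetical usage sketch: initializing Gath-Geva clustering with an FCM result.
    data.X  = rand(100, 2);              % N x n data matrix
    param.c = 3;  param.m = 2;  param.e = 1e-4;
    pre     = FCMclust(data, param);     % robust initialization with fuzzy c-means
    param.c = pre.data.f;                % pass the resulting partition matrix as initialization
    result  = GGclust(data, param);      % Gath-Geva clustering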
Algorithm
[Gath-Geva algorithm] For corresponding theory see Section 1.2.4.
Given a set of data X, specify c, choose a weighting exponent
m > 1 and a termination tolerance ε > 0. Initialize the partition
matrix with a more robust method.
Repeat for l = 1, 2, . . .
Step 1 Calculate the cluster centers:
$$v_i^{(l)} = \frac{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m x_k}{\sum_{k=1}^{N} \left( \mu_{ik}^{(l-1)} \right)^m}, \quad 1 \le i \le c,$$
$$\alpha_i = \frac{1}{N} \sum_{k=1}^{N} \mu_{ik},$$
$$\mu_{ik}^{(l)} = \frac{1}{\sum_{j=1}^{c} \left( D_{ik}(x_k, v_i) / D_{jk}(x_k, v_j) \right)^{2/(m-1)}}, \quad 1 \le i \le c, \; 1 \le k \le N. \qquad (2.21)$$
until $\| U^{(l)} - U^{(l-1)} \| < \varepsilon$.
See Also
Kmeans, FCMclust, GKclust
References
I. Gath and A.B. Geva, Unsupervised Optimal Fuzzy Clustering,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
7:773–781, 1989
Purpose
The purpose of the function is to evaluate "unseen" data with
the cluster centers calculated by the selected clustering
method.
Syntax
[eval] = clusteval(new,result,param)
Description
The clusteval function uses the results and the parameters of the FCM-
clust, GKclust and GGclust clustering functions: it recognizes the
clustering algorithm that was used and evaluates the unseen data set on the grounds
of the clustering results.
The new data points to be evaluated must be given in the N′ × n
array new.X, i.e. only the dimension must be equal for
both data sets. The results are collected in eval.f and eval.d, the
partition matrix and the distances from the cluster prototypes for
this data set.
In the 2-dimensional case the function generates pair-coordinate points in the data
space, calculates the partition matrix, and draws a contour map by
selecting the points with the same partitioning values and drawing
a default number of colored lines in the data field (the density, i.e. the
number of lines, can be changed with the parameter of the contour
function). If the user wants to see this contour map for a simple
clustering, new.X = data.X should be used. The lines of the
contour map denote the positions of points with equal membership
degrees.
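A minimal usage sketch following the description above; the clustering call and the new data points are arbitrary examples:

    % Hypothetical usage sketch of clusteval.
    data.X  = rand(100, 2);  param.c = 3;
    result  = GKclust(data, param);        % any of FCMclust, GKclust, GGclust
    new.X   = data.X;                      % evaluating the training data plots the contour map
    eval1   = clusteval(new, result, param);
    new.X   = [0.5 0.5];                   % evaluate a single unseen point
    eval2   = clusteval(new, result, param);
    mu      = eval2.f;                     % membership degrees of the new point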
Algorithm
The clusteval function uses the algorithms of the clustering func-
tions, i.e. it can be regarded as a clustering method with only
one iteration step.
[Figure 2.11: contour map of the evaluated GK clustering results on the data2 data set (plot omitted).]
Discussion
To present the evaluation, a 2-D data set (data2.txt) was
selected and GK clustering was executed. During the
first run new.X was equal to data.X, so the contour map
could be plotted (see Fig. 2.11). After that a new data point
was chosen to be evaluated: new.X = [0.5 0.5], and the
result is in eval.f.
As Fig. 2.11 also shows, the new data point rather belongs to
the clusters in the "upper left" corner of the normalized data
space.
See Also
FCMclust, GKclust, GGclust
Purpose
Calculation of validity measure indexes to estimate the goodness of
an algorithm or to help find the optimal number of clusters for
a data set.
Syntax
[result] = validity(result,data,param)
Description
Depending on the value of param.val the validity function calcu-
lates different cluster validity measures.
If this parameter is not given by the user or it has another value, the
program calculates only PC and CE, as default validity measures.
For validation of hard partitioning methods it is recommended to
calculate DI and ADI.
The calculation of these indexes is described in Section 1.3.
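A minimal usage sketch follows; the value given to param.val is only illustrative, since the admissible values are not reproduced here (if param.val is omitted, only PC and CE are calculated):

    % Hypothetical usage sketch of the validity function.
    data.X    = rand(100, 2);  param.c = 3;
    result    = FCMclust(data, param);      % cluster first
    param.val = 2;                          % which group of indexes to compute (illustrative value)
    result    = validity(result, data, param);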
Example
Examples of the validation indexes are shown in the experimental
chapter in Section 3.1 and Section 3.2.
References
J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function
Algorithms, Plenum Press, 1981, NY.
A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, M.L. Silbiger,
J.A. Arrington, R.F. Murtagh, Validity-guided (re)clustering with
applications to image segmentation, IEEE Transactions on Fuzzy
Systems, 4:112–123, 1996.
X.L. Xie and G.A. Beni, Validity measure for fuzzy clustering, IEEE
Trans. PAMI, 3(8):841–846, 1991.
Purpose
Normalization and denormalization of the selected data set.
Syntax
[data] = clustnormalize(data,method)
Syntax
[data] = clustdenormalize(data,method)
Description
clustnormalize uses two method to data normalization. If method
is:
range - Values are normalized between [0,1] (linear operation).
var - Variance is normalized to one (linear operation).
The original data is saved in data.Xold, and the function also saves:
1. in case of ’range’ the row vectors containing the minimum
and the maximum elements of each column from the original data
(data.min and data.max)
2. in case of ’var’ the row vectors containing the mean and standard
deviation of the data (data.mean and data.std)
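A minimal usage sketch of the normalization pair; the data is arbitrary:

    % Hypothetical usage sketch of clustnormalize / clustdenormalize.
    data.X = 10 * rand(100, 2) + 5;              % arbitrary, not yet normalized data
    data   = clustnormalize(data, 'range');      % scale every column into [0,1]
    % ... clustering and evaluation on the normalized data ...
    data   = clustdenormalize(data, 'range');    % restore the original scale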
Example
The example was generated with the normexample function located
in ..\Demos\normexample\ directory, where data3.txt is situated.
Figure 2.12: Original data, normalized data with ’range’ method, normalized
data with ’var’ method.
Algorithm
Normalization
’range’
$$X = \frac{X - X_{min}}{X_{max} - X_{min}} \qquad (2.23)$$
'var'
$$X = \frac{X_{old} - \bar{X}}{\sigma_X} \qquad (2.24)$$
See Also
FCMclust, GKclust, GGclust
Purpose
Principal Component Analysis. Projection of the n-dimensional
data into a lower, q-dimensional space (Section 1.4.1).
Syntax
[result] = PCA(result,data,param)
Description
The inputs of the PCA function are the multidimensional data.X
and the param.q projection dimension parameter (with a common
value of 2). The function calculates the autocorrelation matrix of
the data and its eigenvectors, sort them according to eigenvalues,
and they are normalized. Only the first eigenvector of q-dimension is
selected, it will be the direction of the plane to which the data points
are projected. The output matrices are saved in result.P CAproj
structure, one can get the projected data (P), the projected clus-
ter centers(vp), the eigenvectors (V) and the eigenvalues (D). The
results are evaluated with projeval function.
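A minimal usage sketch of the projection; the clustering call, the data and the projection dimension are arbitrary examples following the description above:

    % Hypothetical usage sketch of the toolbox's PCA function.
    data.X  = rand(100, 3);  param.c = 3;
    result  = FCMclust(data, param);       % cluster the 3-D data first
    param.q = 2;                           % projection dimension
    result  = PCA(result, data, param);    % principal component projection
    P  = result.PCAproj.P;                 % projected data points
    vp = result.PCAproj.vp;                % projected cluster centers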
Example
The example was generated with the PCAexample function located
in ..\Demos\PCAexample\ directory, where the nDexample func-
tion is situated, which generated the random 3-D data.
[Figure 2.13: the synthetic 3-D data set. Figure 2.14: the PCA projection of the clustered data (plots omitted).]
Discussion
A simple example is presented: randomly generated 3-D data is clus-
tered with the Fuzzy C-means algorithm, and the results are plotted
through the PCA projection. The original and the projected data are shown
in Fig. 2.13 and Fig. 2.14. The error (defined in the description of
the projeval function on page 53) is P = 0.0039.
Notes
The PCA Matlab function is taken from the SOM Toolbox, which
is obtainable on the web:
http://www.cis.hut.fi/projects/somtoolbox/.
See Also
projeval, samstr
References
Juha Vesanto, Neural Network Tool for Data Mining: SOM Tool-
box,Proceedings of Symposium on Tool Environments and Develop-
ment Methods for Intelligent Systems (TOOLMET2000), 184-196,
2000.
Purpose
Sammon's Mapping. Projection of the n-dimensional data into a
lower, q-dimensional space (described in Section 1.4.2).
Syntax
[result] = Sammon(result,data,param)
Description
The Sammon function calculates the original Sammon's Mapping.
It uses result.data.d of the clustering as input and two parame-
ters: the maximum iteration number (param.max, default value
500) and the step size of the gradient method (param.alpha,
default value 0.4). proj.P can be given either as an initializing
projected data matrix or as the projection dimension; in the latter case the
function starts from a randomly initialized projected data matrix,
hence it needs normalized clustering results. During the calculation the
Sammon function uses online drawing, where the projected data
points are marked with 'o' and the projected cluster centers with
'*'. The online plotting can be disengaged by editing the code if
faster calculation is wanted. The results are evaluated with the projeval
function.
Example
The examples would be like the one shown in Fig. 2.14 on page 49.
Notes
The Sammon Matlab function is taken from the SOM Toolbox,
which is obtainable on the web:
http://www.cis.hut.fi/projects/somtoolbox/.
See Also
projeval, samstr
References
Juha Vesanto, Neural Network Tool for Data Mining: SOM Tool-
box,Proceedings of Symposium on Tool Environments and Develop-
ment Methods for Intelligent Systems (TOOLMET2000), 184-196,
2000.
Purpose
Fuzzy Sammon Mapping. Projection of the n-dimensional data into
a lower, q-dimensional space (described in Section 1.4.3).
Syntax
[result] = FuzSam(proj,result,param)
Description
The FuzSam function modifies the Sammon Mapping, so it becomes
computationally cheaper. It uses result.data.f and result.data.d
of the clustering as input.It has two parameters, the maximum it-
eration number (param.max, default value is 500.) and the step
size of the gradient method (param.alpha, default value is 0.4.).
The proj.P can be given either an initializing projected data matrix
or the projection dimension. In latter case the function calculates
with random initialized projected data matrix, hence it needs nor-
malized clustering results. During calculation FuzSam uses online
drawing, where the projected data points are marked with ’o’, and
the projected cluster centers with ’*’. The online plotting can be
disengaged by editing the code. The results are evaluated with
projeval function.
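A minimal usage sketch following the description above; the data is arbitrary, and the parameter values are the stated defaults:

    % Hypothetical usage sketch of FuzSam.
    data.X      = rand(150, 4);  param.c = 3;
    data        = clustnormalize(data, 'range');   % FuzSam needs normalized clustering results
    result      = FCMclust(data, param);           % fuzzy clustering of the 4-D data
    proj.P      = 2;                               % projection dimension (or an initial projection)
    param.max   = 500;                             % maximum number of iterations (default)
    param.alpha = 0.4;                             % gradient step size (default)
    result      = FuzSam(proj, result, param);     % modified Sammon mapping
    perf        = projeval(result, param);         % evaluate the projection, eq. (2.26)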
Example
The examples would be like the one shown in Fig. 2.14 on page 49.
Algorithm
• [Input] : Desired dimension of the projection, usually q = 2,
the original dataset, X; and the results of fuzzy clustering:
cluster prototypes, vi , membership values, U = [µki ], and the
distances D = [dki = d(xk , vi )]N ×c .
• [Initialize] the projected data points yk by the PCA-based projec-
tion of xk, and compute the projected cluster centers by
$$z_i = \frac{\sum_{k=1}^{N} (\mu_{ki})^m y_k}{\sum_{k=1}^{N} (\mu_{ki})^m} \qquad (2.25)$$
See Also
projeval, samstr
References
A. Kovács - J. Abonyi, Visualization of Fuzzy Clustering Results
by Modified Sammon Mapping, Proceedings of the 3rd Interna-
tional Symposium of Hungarian Researchers on Computational In-
telligence, 177-188, 2002.
Purpose
Evaluation for projected data.
Syntax
[perf ] = projeval(result,param)
Description
The projeval function uses the results and the parameters of the
clustering and the visualization functions. It is analogous to the clus-
teval function, but it evaluates the projected data. The distances
between the projected data and the projected cluster centers are based on
the Euclidean norm, so the function calculates only with a 2-by-2
identity matrix; it generates the pair-coordinate points, calculates the
new partition matrix, and draws a contour map by selecting the
points with the same partitioning values.
A subfunction of projeval calculates relation indexes defined on
the grounds of (1.48):
$$P = \| U - U^{*} \|, \qquad \sum_{k=1}^{N} \mu_k^2, \qquad \sum_{k=1}^{N} \mu_k^{2*}. \qquad (2.26)$$
See Also
clusteval, PCA, Sammon, FuzSam, samstr
Purpose
Calculation of Sammon’s stress for projection methods.
Syntax
[result] = samstr(data,result)
Description
The simple function calculates Sammon’s stress defined in (1.43).
It uses data.X and result.proj.P as input, and the only output is
the result.proj.e containing the value of this validation constant.
See Also
PCA,Sammon, FuzSam
3 Case Studies
The aim of this chapter is to present the differences, the usefulness and the effective-
ness of the partitioning clustering algorithms by partitioning different data sets.
In Section 3.1 the five presented algorithms are compared based on numerical
results (validity measures). Section 3.2 deals with the problem of finding the
optimal number of clusters, because this information is rarely known a priori. A
real partitioning problem is presented in Section 3.3 with three real data sets:
different types of wine, iris flowers and breast cancer symptoms are partitioned.
3.1 Comparing the clustering methods
Using the validity measures mentioned in Section 1.3 the partitioning methods
can easily be compared. To solve this problem, the synthetic data set
shown in Fig. 3.1-3.5 was used, so that the index values are clearly distinguishable for each type of
clustering. These validity measures are collected in Tab. 3.1.
First of all it must be mentioned that all these algorithms use random initial-
ization, so different runs yield different partition results, i.e. different values of
the validation measures. On the other hand the results strongly depend on
the structure of the data, and no validity index is perfect by itself for a clus-
tering problem. Several experiments and evaluations would be needed, which are not the
purpose of this work. The presented figures were generated with the Kmeans-
call, Kmedoidcall, FCMcall, GKcall and GGcall functions, which are located in the
..\Demos\comparing\ directory. Each of them calls the modvalidity function, which
calculates all the validity measures and is also located in the directory above.
[Figure 3.1: the K-means result on the synthetic data set. Figure 3.2: the K-medoid result on the synthetic data set (plots omitted).]
Fig. 3.1 shows that hard clustering methods can also find a good solution for
the clustering problem when compared with the figures of the fuzzy clustering
algorithms. By contrast, Fig. 3.2 shows a typical example of the
initialization problem of hard clustering. This caused the differences between
the validity index values in Tab. 3.1, e.g. the Xie and Beni's index is infinity (in
the "normal case" K-medoid returns almost the same results as K-means).
The only difference between Fig. 3.3 and Fig. 3.4 lies in the shape of the
clusters: the Gustafson-Kessel algorithm can find the elongated clusters
better (the description can be found in Section 1.2.3 and the concrete algorithm
on page 37). Fig. 3.5 shows that the Gath–Geva algorithm returned a
result of three subspaces.
As one can see in Tab. 3.1, PC and CE are useless for K-means and K-medoid,
since they are hard clustering methods. But that is also the reason for their best
results in S, DI (and ADI), which are useful for validating crisp and well separated
clusters.
Based on the values of the two most popular indexes for
fuzzy clustering (the Partition Coefficient and the Xie and Beni's Index), the Gath-Geva
clustering gives the best results for this data set.
[Figure 3.3: the Fuzzy C-means result, Figure 3.4: the Gustafson-Kessel result, and Figure 3.5: the Gath-Geva result on the synthetic data set (plots omitted).]
In every partitioning problem the number of subsets (called the
clusters) must be given by the user before the calculation, but it is rarely known
a priori; in this case it must also be found using validity measures. In
this section only a simple example is presented: the motorcycle data set is
used to find the optimal number of clusters. The validity measures described in Section 1.3
were used with the Gustafson-Kessel algorithm to validate the
partitioning of the motorcycle data with the current number of clusters. The
results are discussed below. The presented figures were generated with the optnumber
function, which is located in the ..\Demos\optnumber\ directory; it also calls
the modvalidity function, like the calling functions in Section 3.1.
[Figure 3.6: values of the Partition Coefficient and the Classification Entropy as a function of c (plot omitted).]
Figure 3.7: Values of Partition Index and Separation Index and Xie and
Beni’s Index
We must mention again that no validation index is reliable by itself; that
is why all the programmed indexes are shown, and the optimum can only be
detected by comparing all the results. We consider that partitions with
fewer clusters are better when the differences between the values of a validation
index are minor.
The main drawback of PC is its monotonic decrease with c and the lack of a
direct connection to the data. CE has the same problems: a monotonic increase
with c and a hardly detectable connection to the data structure. Based on
Fig. 3.6, the number of clusters can only be estimated to be 3.
In Fig. 3.7 more informative diagrams are shown: SC and S decrease sharply
around c = 5. The XB index reaches a local minimum at
c = 4. Considering that SC and S are more useful when comparing different
clustering methods with the same c, we chose the optimal number of clusters
to be 4, which is confirmed by Dunn's index too in Fig. 3.8. (The Alternative
Dunn Index is not tested enough to know how reliable its results are.)
Figure 3.8: Values of Dunn’s Index and the Alternative Dunn Index
• Iris data
• Wine data
• Wisconsin breast cancer data
These data sets come from the UCI Repository of Machine Learning Databases,
and they are downloadable from
ftp://ftp.ics.uci.edu/pub/machine-learning-databases/ , but they also exist in
the ..\Demos\projection\ directory. Because of the large number of data points it
is not practical to show the partition matrices in tables, so the results of the n-
dimensional clustering were projected into 2 dimensions and the 2-D results were
plotted. Considering that the projected figures are only approximations of the real
partitioning results, the difference between the original and the projected par-
tition matrix is also reported; on the other hand, one can also observe the
differences between PCA, Sammon's mapping and the Modified Sammon
Mapping when these values are compared. These relation indexes are
defined in (2.26).
Since real data sets are used, the classes are known, so the misclassified objects
can be plotted. These items are marked with an additional 'o' marker. The
goodness of the classification can also be quantified by the percentage error of
the misclassified data points (a data point is misclassified when, on the grounds
of its largest membership degree, it is not in the cluster to which it belongs):
$$Err = \frac{N_{misscl}}{N} \cdot 100. \qquad (3.1)$$
Note that Err indicates only the goodness of the classification, not of
the clustering, hence its results must be treated with caution. This error value is
independent of the visualization method, so the values are stated below in Tab. 3.3.
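The error of (3.1) can be computed from the fuzzy partition and the known class labels as in the following sketch; U is c x N, labels is an N x 1 vector of true classes, and the mapping between clusters and classes is assumed to be already resolved (the function name is only illustrative):

    % Percentage of misclassified data points, eq. (3.1).
    % Cluster i is assumed to correspond to class i.
    function err = misclass_rate(U, labels)
        [~, assigned] = max(U, [], 1);            % hardening: largest membership degree
        Nmisscl = sum(assigned(:) ~= labels(:));  % number of misclassified points
        err = Nmisscl / numel(labels) * 100;
    end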
For the comparison, five independent runs were evaluated with each algorithm.
Fuzzy C-means, Gustafson-Kessel and Gath-Geva always returned the same
minimum, while the results of K-means and K-medoid strongly depend on the
initialization, as shown in Tab. 3.3, so for these algorithms the mini-
mum, mean and maximum are also reported.

Table 3.3: Percentage of misclassified objects with the different clustering methods.

The difference between the minimal and maximal values shows the sensitivity of
the algorithm to the initialization (as one can see, K-medoid is the most sensitive
algorithm). The minimal value indicates the flexibility of the model, and the mean
is an estimate of the expected error. One can see that the Fuzzy C-means clustering
gives the most stable results for these data sets, so its resulting
figures are shown in the following subsections; they were generated with the
visualcall function (situated in the same directory as the data sets).
The data set contains 3 classes of 50 instances each, where each class refers
to a type of iris plant. One class is linearly separable from the other 2; the
latter are not linearly separable from each other. Predicted attribute: class of
iris plant.
The attributes are as follows: x1 - sepal length in cm, x2 - sepal width in cm,
x3 - petal length in cm, x4 - petal width in cm. Attribute x5 contains the three
classes: Iris Setosa (marked with '.'), Iris Versicolour (marked with 'x') and Iris
Virginica (marked with '+'). The projected cluster centers are marked with '*'.
[Figure 3.9: Result of PCA projection by the Iris data set (plot omitted).]
Table 3.4: Comparison of the projection methods on the Iris data set.

            P        sum µk^2   sum µk^2*   E
  PCA       0.0203   0.7420     0.7850      0.0117
  Sammon    0.0132   0.7420     0.7662      0.0071
  FuzSam    0.0025   0.7420     0.7426      0.0131
Figure 3.10: Result of Sammon’s mapping projection by the Iris data set.
Figure 3.11: Result of Fuzzy Sammon Mapping projection by the Iris data
set.
The Wine data contains the chemical analysis of 178 wines grown in the same
region in Italy but derived from three different cultivars (marked with ’.’,’x’
and ’+’). The problem is to distinguish the three different types based on 13
continuous attributes derived from chemical analysis.
[Figure 3.12: Result of PCA projection by the Wine data set (plot omitted).]
Figure 3.13: Result of Sammon’s mapping projection by the Wine data set.
Figure 3.14: Result of Fuzzy Sammon Mapping projection by the Wine data
set.
Table 3.5: Comparison of the projection methods on the Wine data set.

            P        sum µk^2   sum µk^2*   E
  PCA       0.1295   0.5033     0.7424      0.1301
  Sammon    0.0874   0.5033     0.6574      0.0576
  FuzSam    0.0365   0.5033     0.5170      0.0991
The Wisconsin breast cancer data is widely used to test the effectiveness of
classification and rule extraction algorithms. The aim of the classification is to
distinguish between benign and malignant cancers based on the available nine
measurements: x1 clump thickness, x2 uniformity of cell size, x3 uniformity of
cell shape, x4 marginal adhesion, x5 single epithelial cell size, x6 bare nuclei, x7
bland chromatin, x8 normal nuclei, and x9 mitosis. The attributes have integer
value in the range [1,10]. The original database contains 699 instances; however,
16 of these are omitted because they are incomplete, which is common practice in
other studies. The class distribution is 65.5% benign and 34.5% malignant.
Figure 3.15: Result of PCA projection by the Wisconsin Breast Cancer data
set.
Table 3.6: Comparison of the projection methods on the Wisconsin Breast Cancer data set.

            P        sum µk^2   sum µk^2*   E
  PCA       0.0456   0.8409     0.9096      0.0882
  Sammon    0.0233   0.8409     0.8690      0.0260
  FuzSam    0.0050   0.8409     0.8438      0.0497
[Figure 3.16: Result of Sammon's mapping projection and Figure 3.17: Result of Fuzzy Sammon Mapping projection by the Wisconsin Breast Cancer data set (plots omitted).]
As Tab. 3.4, Tab. 3.5 and Tab. 3.6 show, Fuzzy Sammon Mapping gives much
better projection results in terms of P than Principal Component Analy-
sis, and it is computationally cheaper than the original Sammon Mapping. Therefore,
during the evaluation of the partitions the figures created with this projection
method were considered. We calculated the original Sammon's stress for all
three techniques to be able to compare them.
Tab. 3.3 shows that the "advanced" algorithms do not always give the best classification results.
Figure 3.18: Result of Fuzzy Sammon projection by the wine data set with
Gustafson-Kessel algorithm.
To meet the growing demand for systematizing the nascent data, a flexible, pow-
erful tool is needed. The Fuzzy Clustering and Data Analysis Toolbox provides
several approaches to cluster, classify and evaluate both industrial and exper-
imental data sets. The software for these operations has been developed in
Matlab, which is very powerful for matrix-based calculations. The Toolbox
provides five different types of clustering algorithms, which can be validated by
seven validity measures. High-dimensional data sets can also be visualized with
a 2-dimensional projection; hence the toolbox contains three different methods for
visualization.
With all these useful tools the work is not finished yet: many more clustering
algorithms, validation indexes and visualization methods exist or will come up in the
future. They could find a place in this work too; hence the author knows "it is
not the end. . . it is the beginning. . . "
Bibliography
[4] D.E. Gustafson and W.C. Kessel. Fuzzy clustering with fuzzy covariance
matrix. In Proceedings of the IEEE CDC, San Diego, pages 761–766. 1979.
[6] J.C. Bezdek and J.C. Dunn. Optimal fuzzy partitions: A heuristic for
estimating the parameters in a mixture of normal distributions. IEEE
Transactions on Computers, pages 835–838, 1975.
[7] I. Gath and A.B. Geva. Unsupervised optimal fuzzy clustering. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 7:773–781, 1989.
[8] A.M. Bensaid, L.O. Hall, J.C. Bezdek, L.P. Clarke, M.L. Silbiger, J.A.
Arrington, and R.F. Murtagh. Validity-guided (Re)Clustering with applica-
tions to image segmentation. IEEE Transactions on Fuzzy Systems, 4:112–
123, 1996.
[9] X. L. Xie and G. A. Beni. Validity measure for fuzzy clustering. IEEE
Trans. PAMI, 3(8):841–846, 1991.
[10] Sammon JW Jr. A nonlinear mapping for data structure analysis. IEEE
Transactions on Computers, 18:401–409, 1969.
[12] Juha Vesanto. Neural network tool for data mining: SOM Toolbox. In Pro-
ceedings of Symposium on Tool Environments and Development Methods
for Intelligent Systems (TOOLMET2000), pages 184–196, 2000.