Comparative Analysis of Community Detection Algorithms
Abstract—Rapid growth in data has caused a sudden surge of interest among researchers in studying network structure for community detection. In this paper, we provide a comparative analysis of community detection algorithms for complex networks. In contrast to earlier work on community detection, our work presents the analysis on real network data instead of synthetic data. The analysis is performed in two phases. The first phase uses small and medium networks (nodes: 10k, edges: 20k) and is a selection phase to extract the best performing algorithms. The second phase identifies the best community detection algorithm through evaluation on larger complex networks (nodes: 100k, edges: 1000k).

Index Terms—complex networks, community detection, evaluation, analysis

I. INTRODUCTION

Real-life systems are represented as networks with nodes as elementary parts and edges as communication between them [7]. Networks with scale-free properties and enormous size are known as complex networks. Complex networks are networks with millions of vertices and edges. Studying complex networks is of great utility for understanding complex real systems. The enormous size of complex networks makes it very difficult to comprehend them quickly. Community identification may simplify the process of understanding complex networks by identifying coherent substructures. Researchers across disciplines have proposed numerous community detection techniques [1] [2] [3] [4] [6] [9] [10] [12] [13]. These techniques aid in understanding network structures with less effort. However, one recent study has revealed that characteristics at the community level are quite different from those at the network level [9]. Despite the contrarian views, it is certain that network structure plays a vital role in understanding the properties of a network. A community in a social network groups a set of nodes which are similar to each other. For instance, in a WWW network, partitions, similar to communities, are defined on the basis of the content presented in web pages, which divides the network into groups that have similar content within each group while being different from other groups.

Community detection algorithms resemble graph partitioning methods. The graph partitioning problem deals with dividing a graph into c clusters of approximately equal size, whereas community detection does not require any a priori information regarding the number of communities and their sizes. Moreover, community detection combines similar nodes in the same group, whereas graph partitioning only minimizes the cut section [3]. In a community, there are more intra-community links than inter-community links. The community structure in a network can also be expressed in terms of probability. In a network, if P_in is the probability with which nodes are connected to other nodes in the same community and P_out is the probability of linking nodes with nodes in different communities, the condition P_in > P_out implies the existence of communities; otherwise, the partitioned network will not be better than a random graph with no significant community structure [8]. Modularity (Q) [10] is used to assess the quality of detected communities. Modularity of a partitioned network typically ranges from 0 to 1 (negative values are possible for partitions worse than random). A higher modularity value indicates better community structure, while a modularity value of 0 indicates that the detected communities are no better than those in a random network. A modularity value greater than 0.3 signifies the existence of close-knit communities in the network [4]. In this paper, we consider modularity and execution time as evaluation parameters for community detection algorithms.

Community detection techniques can broadly be divided into agglomerative methods, divisive methods and optimization methods [6]. Agglomerative methods merge nodes on the basis of their similarity to each other. In contrast, divisive approaches remove links between communities recursively. Optimization methods try to maximize or minimize an objective function while finding the community pattern in a network.

The rest of the paper is organized as follows. Section II provides an introduction to the community finding algorithms chosen for analysis. In Section III, we provide a detailed description of the modularity parameter. In Section IV, we present a brief summary of the datasets used for analysis and also provide details of the experimental setup. In Section V, we analyze the results obtained in this study. Finally, Section VI presents the conclusion of the comparative study undertaken in this paper.

II. COMMUNITY DETECTION ALGORITHMS

Numerous community detection methods have been proposed in the past few years. These methods allow researchers to reveal communities in networks, which can be used in a wide
978-1-5386-1866-0/17/$31.00 ©2017 IEEE
2017 Conference on Information and Communication Technology (CICT’17)
range of applications, e.g., recommendation systems, innovation diffusion and viral marketing. S. Emmons et al. [21] provided a comparison of the Louvain, Infomap, Label Propagation and Smart Local Moving algorithms using modularity and information recovery metrics. Our analysis does not consider information recovery metrics because ground-truth communities are not available for some of the datasets used in this paper. In this paper, we have used Python's igraph [20] library to compare community detection algorithms. This library provides the most commonly used community detection algorithms, i.e., Newman2006, Infomap, Louvain, Fast Greedy, Label Propagation, Spin-glass and Random Walktrap.

The Louvain algorithm was proposed by Blondel et al. [6]. It works in multiple passes and uses modularity as the stopping criterion: the process stops when there is no change in the modularity value. In the first phase, a local maximum of modularity is discovered. Each node i is initially considered as belonging to its own community. Adjacent nodes whose merging results in a higher modularity gain are combined into the same group. Once a local maximum is reached, the next phase starts. In this phase, communities are treated as nodes, and the total weight of the inter-community edges is taken as the weight assigned to the edges among the new nodes. The same process is then repeated on this newly formed network. Results have shown significant improvement in computational speed compared to other methods.

D. Fast Greedy community detection

This algorithm was proposed by Clauset et al. [9]. It mainly focuses on networks which have a sparse adjacency matrix. The method utilizes efficient data structures to speed up community detection. It starts by considering each node as a community and maintains ΔQ (change in modularity) for each pair of communities. A max-heap is maintained which stores the largest ΔQ together with the corresponding community pairs. In every step, the pair of communities whose merging yields the highest modularity gain is combined. The process stops when a single community remains. Fast greedy community detection executes in O(md log n) time, where m is the number of edges, n is the number of nodes and d is the depth of the dendrogram.

III. EVALUATION METRICS

In this paper, we have used modularity [10] and execution time as the evaluation factors for community detection algorithms. Modularity measures the goodness of the partitions of a network by capturing the difference between the partitions produced by community detection algorithms and the partitions of a random network.

The adjacency matrix A stores elements in 0 or 1 form. If A_ij is 1, there is an edge between node i and node j:

$$A_{ij} = \begin{cases} 1 & \text{if node } i \text{ and node } j \text{ are connected,} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
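As an illustration of equation (1), the following is a minimal sketch of how the adjacency matrix of an undirected, unweighted network could be built from an edge list. The function name `adjacency_matrix` and the four-node example graph are hypothetical and are not taken from the datasets used in this paper.

```python
def adjacency_matrix(num_nodes, edges):
    """Build the symmetric 0/1 matrix A of equation (1) for an undirected graph."""
    A = [[0] * num_nodes for _ in range(num_nodes)]
    for i, j in edges:
        # An undirected edge (i, j) sets both A[i][j] and A[j][i] to 1.
        A[i][j] = 1
        A[j][i] = 1
    return A


# Hypothetical example: a triangle 0-1-2 with a pendant edge 2-3.
EXAMPLE_EDGES = [(0, 1), (0, 2), (1, 2), (2, 3)]
A_EXAMPLE = adjacency_matrix(4, EXAMPLE_EDGES)
```

A dense list-of-lists representation is chosen here only for readability; for sparse networks of the size considered in this paper, an adjacency list or sparse matrix would be the more practical choice.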
TABLE I
COMPARATIVE ANALYSIS OF COMMUNITY DETECTION ALGORITHMS
(Q: modularity, T: execution time)

Dataset     Metric  newman2006  infomap  louvain  fast greedy  spin glass  random walk  label prop
Karate      Q       0.393       0.402    0.419    0.380        0.419       0.353        0.402
            T       0.004       0.007    0.0001   0.0001       0.467       0.0002       6.389
Dolphin     Q       0.491       0.528    0.518    0.495        0.528       0.489        0.486
            T       0.009       0.011    0.0002   0.0002       0.624       0.0004       0.0001
Polbooks    Q       0.467       0.523    0.520    0.502        0.526       0.507        0.495
            T       0.012       0.027    0.0004   0.0006       1.67        0.0011       0.0002
Netscience  Q       0.952       0.929    0.959    0.955        –           0.956        0.908
            T       0.15        0.394    0.0069   0.007        –           0.0233       0.0064
Facebook    Q       0.799       0.809    0.834    0.774        0.833       0.811        0.814
            T       1.79        5.02     0.1      1.53         563.5       1.96         0.0814
Powergrid   Q       0.825       0.815    0.936    0.933        0.920       0.831        0.804
            T       4.256       7.63     0.051    0.0168       147.71      0.215        0.375
HiEnCo      Q       0.756       0.768    0.848    0.812        –           0.755        0.771
            T       5.666       6.42     0.0486   0.216        –           0.977        0.855
Cond-2003   Q       0.343       0.674    0.760    0.678        –           0.646        0.659
            T       2.95        202.87   0.3228   30.15        –           42.19        23.35
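The selection step described in the abstract can be illustrated on the modularity rows of Table I. The sketch below copies the Q values from the table, with missing spin glass entries encoded as `None`, and lists, for each dataset, the algorithm or algorithms reaching the highest modularity. The helper name `best_by_modularity` is hypothetical, and ranking by modularity alone is an assumption made for illustration; the paper also considers execution time.

```python
ALGORITHMS = ["newman2006", "infomap", "louvain", "fast greedy",
              "spin glass", "random walk", "label prop"]

# Modularity (Q) rows of Table I; None marks entries reported as "–".
Q_TABLE = {
    "Karate":     [0.393, 0.402, 0.419, 0.380, 0.419, 0.353, 0.402],
    "Dolphin":    [0.491, 0.528, 0.518, 0.495, 0.528, 0.489, 0.486],
    "Polbooks":   [0.467, 0.523, 0.520, 0.502, 0.526, 0.507, 0.495],
    "Netscience": [0.952, 0.929, 0.959, 0.955, None, 0.956, 0.908],
    "Facebook":   [0.799, 0.809, 0.834, 0.774, 0.833, 0.811, 0.814],
    "Powergrid":  [0.825, 0.815, 0.936, 0.933, 0.920, 0.831, 0.804],
    "HiEnCo":     [0.756, 0.768, 0.848, 0.812, None, 0.755, 0.771],
    "Cond-2003":  [0.343, 0.674, 0.760, 0.678, None, 0.646, 0.659],
}


def best_by_modularity(q_table):
    """Return, per dataset, every algorithm that attains the maximum Q (ties kept)."""
    best = {}
    for dataset, scores in q_table.items():
        top = max(q for q in scores if q is not None)
        best[dataset] = [name for name, q in zip(ALGORITHMS, scores) if q == top]
    return best


BEST = best_by_modularity(Q_TABLE)
LOUVAIN_WINS = sum("louvain" in winners for winners in BEST.values())
```

On these values, Louvain attains the highest modularity (alone or tied) on six of the eight datasets.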
Modularity presents a goodness score for the partitions of a network. This score is calculated by finding the difference between the fraction of edges inside a community and the same fraction in a random network. The fraction of edges inside a community is computed as follows:

$$\frac{\sum_{u,v} A_{u,v}\,\delta(c_u, c_v)}{\sum_{u,v} A_{u,v}} \tag{2}$$

The function δ(c_u, c_v) considers only edges whose two endpoints are grouped in the same community. Here, c_u represents the community of node u and c_v represents the community of node v. The value of δ(c_u, c_v) is 1 if c_u equals c_v and 0 otherwise. The denominator of equation (2) counts each edge twice, and hence the total number of edges is given by

$$m = \frac{1}{2} \sum_{u,v} A_{u,v} \tag{3}$$

So, equation (2) can be rewritten as

$$\frac{\sum_{u,v} A_{u,v}\,\delta(c_u, c_v)}{\sum_{u,v} A_{u,v}} = \frac{1}{2m} \sum_{u,v} A_{u,v}\,\delta(c_u, c_v) \tag{4}$$

IV. DATA SETS & EXPERIMENTAL SETUP

In this paper we have used two kinds of data sets, medium and large. Medium data sets (Table II) are used for comparing the performance of the community finding methods discussed in this paper and to select the best performing methods (Table III). The medium datasets are Karate [15], Dolphin [18], Polbooks, Netscience [9], Facebook [5], Powergrid [17], HiEnCo [16] and Cond-2003 [16]. The large datasets are complex networks from the Stanford datasets [5].

TABLE II
DATA SETS FOR ANALYSIS

Dataset     nodes   edges
Karate      34      78
Dolphin     62      159
Facebook    4039    88234
Powergrid   4941    6594
Polbooks    105     441
HiEnCo      8361    15751
Cond-2003   31163   120029
Netscience  1589    2742
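The two quantities measured in this study can be sketched for a given partition as follows. The code is a minimal illustration, not the implementation used for the experiments: it evaluates the edge count of equation (3) and the intra-community edge fraction of equation (4), and records wall-clock time with Python's standard `time.perf_counter`. The random-network term that modularity subtracts from this fraction is not included. The function names, the four-node graph and its partition are hypothetical.

```python
import time


def count_edges(A):
    """Equation (3): m = (1/2) * sum over u, v of A[u][v]."""
    return sum(sum(row) for row in A) / 2


def intra_community_fraction(A, community):
    """Equation (4): (1 / 2m) * sum over u, v of A[u][v] * delta(c_u, c_v)."""
    m = count_edges(A)
    n = len(A)
    total = 0
    for u in range(n):
        for v in range(n):
            # delta(c_u, c_v) is 1 only when u and v share a community.
            if community[u] == community[v]:
                total += A[u][v]
    return total / (2 * m)


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


# Hypothetical graph: triangle 0-1-2 plus pendant edge 2-3 (m = 4 edges).
A = [
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
]
# Hypothetical partition: nodes 0, 1, 2 in community 0 and node 3 in community 1.
COMMUNITY = [0, 0, 0, 1]

FRACTION, ELAPSED = timed(intra_community_fraction, A, COMMUNITY)
```

For this example, three of the four edges lie inside a community, so the fraction evaluates to 0.75. The double loop costs O(n^2) time and is meant only to mirror the summation in equation (4); iterating over an edge list would be preferable for networks of the size listed in Table II.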
TABLE III
COMPLEX NETWORK DATA SETS FOR ANALYSIS
[15] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research 33, 1977, pp. 452-473.
[16] M. E. J. Newman, Proc. Natl. Acad. Sci. USA 98, 2001, pp. 404-409.
[17] D. J. Watts and S. H. Strogatz, Nature 393, 1998, pp. 440-442.
[18] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson, Behavioral Ecology and Sociobiology 54, 2003, pp. 396-405.
[19] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee, "Measurement and Analysis of Online Social Networks," Proceedings of the 5th ACM/Usenix Internet Measurement Conference, 2007.
[20] G. Csardi and T. Nepusz, "The igraph software package for complex network research," InterJournal, Complex Systems 1695, 2006. http://igraph.org
[21] S. Emmons, S. Kobourov, M. Gallant, and K. Börner, "Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale," PLoS ONE 11(7), 2016, e0159161.