
Comparative Analysis of Community Detection Algorithms

This paper presents a comparative analysis of community detection algorithms applied to complex networks, focusing on real network data rather than synthetic data. The study evaluates various algorithms based on modularity and execution time across small, medium, and large datasets, identifying the best-performing methods. The results indicate significant differences in community detection effectiveness, emphasizing the importance of network structure in understanding complex systems.


2017 Conference on Information and Communication Technology (CICT’17)

Comparative Analysis of Community Detection Algorithms
Pankaj Chejara, W. Wilfred Godfrey
Department of ICT
ABV-Indian Institute of Information Technology and Management
Gwalior, India
[email protected], [email protected]

Abstract—Rapid growth in data has caused a sudden surge of interest among researchers in studying network structure for community detection. In this paper, we provide a comparative analysis of community detection algorithms for complex networks. In contrast to earlier work on community detection, our work presents the analysis on real network data instead of synthetic data. The analysis has been performed in two phases. The first phase uses small and medium networks (nodes: 10k, edges: 20k) and is a selection phase to extract the best-performing algorithms. The second phase identifies the best community detection algorithm through evaluation on larger complex networks (nodes: 100k, edges: 1000k).

Index Terms—complex networks, community detection, evaluation, analysis

I. INTRODUCTION

Real-life systems are represented as networks with nodes as elementary parts and edges as the communication between them [7]. Networks with scale-free properties and enormous size are known as complex networks; they can have millions of vertices and edges. Studying complex networks is of great utility for understanding complex real systems, but their enormous size makes them very difficult to comprehend quickly. Community identification may simplify the process of understanding complex networks by identifying coherent substructures. Researchers across disciplines have proposed numerous community detection techniques [1] [2] [3] [4] [6] [9] [10] [12] [13]. These techniques aid in understanding network structures with less effort. One recent study, however, has revealed that characteristics at the community level are quite different from those at the network level [9]. Despite such contrarian views, it is certain that network structure plays a vital role in understanding the properties of a network. A community in a social network groups a set of nodes which are similar to each other. For instance, in a WWW network, partitions similar to communities are defined on the basis of the content presented in web pages, which divides the network into groups whose members have similar content while being different from other groups.

Community detection algorithms resemble graph partitioning methods. The graph partitioning problem deals with dividing a graph into c clusters of approximately equal size, whereas community detection does not require any a priori information regarding the number of communities or their sizes. Moreover, community detection combines similar nodes in the same group, whereas graph partitioning only minimizes the cut size [3]. In a community, there are more intra-community links than inter-community links. The community structure of a network can also be expressed in terms of probability. If $P_{in}$ is the probability with which nodes are connected to other nodes in the same community and $P_{out}$ is the probability of linking to nodes in different communities, the condition $P_{in} > P_{out}$ implies the existence of communities; otherwise the partitioned network is no better than a random graph with no significant community structure [8]. Modularity (Q) [10] is used to assess the quality of detected communities. The modularity of a partitioned network is bounded above by 1; it can also take negative values, but partitions of interest typically score between 0 and 1. A higher modularity value indicates better community structure, while a modularity value of 0 indicates that the detected communities are no better than those of a random network. A modularity value greater than 0.3 signifies the existence of close-knit communities in the network [4]. In this paper, we have considered modularity and execution time as evaluation parameters for community detection algorithms.

Community detection techniques can broadly be divided into agglomerative methods, divisive methods and optimization methods [6]. Agglomerative methods merge nodes on the basis of their similarity to each other. In contrast, divisive approaches recursively remove links between communities. Optimization methods try to maximize or minimize an objective function while finding the community pattern in a network.

The rest of the paper is organized as follows. Section II provides an introduction to the community finding algorithms chosen for analysis. In Section III, we provide a detailed description of the modularity parameter. In Section IV, we present a brief summary of the datasets used for analysis and provide details of the experimental setup. In Section V, we analyze the results of this study. Finally, Section VI presents the conclusion of the comparative study undertaken in this paper.

II. COMMUNITY DETECTION ALGORITHMS

Numerous community detection methods have been proposed in the past few years. These methods allow researchers to reveal communities in networks, which can be used in a wide

978-1-5386-1866-0/17/$31.00 © 2017 IEEE
Authorized licensed use limited to: SHIV NADAR UNIVERSITY. Downloaded on February 10,2025 at 18:03:38 UTC from IEEE Xplore. Restrictions apply.

range of applications, e.g., recommendation systems, innovation diffusion and viral marketing. S. Emmons et al. [21] provided a comparison of the Louvain, Infomap, Label Propagation and Smart Local Moving algorithms using modularity and information recovery metrics. Our analysis has not considered information recovery metrics because ground-truth communities are not available for some of the datasets used in this paper. In this paper, we have used Python's igraph [20] library to compare community detection algorithms. This library provides the most widely used community detection algorithms, namely the Newman2006, Infomap, Louvain, Fast greedy, Label propagation, Spin-glass and Random-walk (Walktrap) algorithms.

A. Newman2006

Newman utilized the concept of spectral partitioning [1]. The leading eigenvector of the modularity matrix is calculated and the network is then partitioned into two sub-networks which maximize modularity. In further subdivisions, the modularity contribution is calculated at each step, and the process is stopped when this contribution becomes negative.

B. Infomap community detection

This algorithm was proposed by Martin Rosvall et al. [14]. It uses the map equation to find community structure in a network. The map equation represents the description length of a random walker in a network. Partitions with good modular structure tend to have a smaller description length, so the description length is used as an objective function to find better partitions of a network. If a random walker stays longer in a region, then its description length can be compressed. Hence, partitions with better community structure will have minimum description length. This method is based on agglomeration of nodes. It begins by considering each node as a separate module. Then, randomly selected nodes are combined so as to give the largest decrease in the map equation. In subsequent steps, modules formed in previous steps are considered as nodes and the same process is repeated. The process stops when there is no further decrease in the map equation.

C. Louvain community detection

This algorithm was proposed by Blondel et al. [6]. It works in multiple passes and uses modularity as the stopping criterion: the process stops when there is no change in the modularity value. In the first phase, a local maximum of modularity is found. Each node i is initially considered as belonging to its own community. Adjacent nodes whose merging results in a higher modularity gain are combined in the same group. Once a local maximum is achieved, the next phase starts. In this phase, communities are treated as nodes, and the total weight of the inter-community edges is taken as the weight assigned to the edges among the new nodes. The same process is then repeated on this newly formed network. Results have shown significant improvement in terms of computational speed as compared to other methods.

D. Fast Greedy community detection

This algorithm was proposed by Clauset et al. [9]. It mainly focuses on networks which have a sparse adjacency matrix. The method utilizes efficient data structures to speed up community detection. It starts by considering each node as a community and maintains ΔQ (the change in modularity) for each pair of communities. A max-heap is maintained which stores the largest ΔQ together with the corresponding community pair. In every step, the pair of communities whose merger results in the highest modularity gain is combined. The process stops when a single community remains. Fast greedy community detection executes in $O(md \log n)$ time, where m is the number of edges, n is the number of nodes and d is the depth of the dendrogram.

E. Label propagation community detection

Raghavan et al. [3] presented a near-linear-time algorithm to detect communities in complex networks. This method is of great interest to researchers due to its near-linear time complexity. Each node is assigned a unique label in the beginning, and then in every step each node takes the label that is held by most of its neighbors. In case of ties, a randomly selected label among the tied neighbor labels is used. The strong community measure [11] is used as the stop condition.

F. Spin-glass community detection

This algorithm was proposed by J. Reichardt [13]. It considers the spin states of nodes as communities and tries to minimize the spin energy. The method works on the concept that nodes with the same spin state should be connected and nodes with different spin states should be disconnected. The major objective of this method is to find the ground state of the spin-glass model.

G. Random-walk community detection

In [12], the authors used the random-walk concept to find communities in a network. This method is based on the principle that random walks tend to be confined to denser regions of a network (i.e., communities). The random walker starts from a non-clustered area and calculates the distance between adjacent nodes. Two adjacent communities are chosen and merged into one. Then, the distances between communities are updated. This process is repeated (N-1) times.

III. EVALUATION METRICS

In this paper, we have used modularity [10] and execution time as the evaluation factors for community detection algorithms. Modularity measures the goodness of the partitions of a network by capturing the difference between the partitions produced by community detection algorithms and the partitions of a random network.

The adjacency matrix A stores elements in 0 or 1 form. If $A_{ij}$ is 1, there is an edge between node i and node j.

\[
A_{ij} =
\begin{cases}
1 & \text{if node } i \text{ and node } j \text{ are connected,} \\
0 & \text{otherwise}
\end{cases}
\tag{1}
\]
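To make the two ingredients of this section concrete, the following sketch builds the adjacency matrix of equation (1) for a toy graph and scores a partition with the standard modularity of [10], $Q = \frac{1}{2m}\sum_{u,v}\left(A_{u,v} - \frac{k_u k_v}{2m}\right)\delta(c_u, c_v)$. The toy edge list, the function names `adjacency_matrix` and `modularity`, and the dense list-of-lists representation are illustrative assumptions; this is not the igraph implementation used in the experiments.

```python
def adjacency_matrix(n, edges):
    """Return the n x n 0/1 adjacency matrix of an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = 1
        A[j][i] = 1  # undirected graph: the matrix is symmetric
    return A


def modularity(A, community):
    """Modularity Q of a partition; community[u] is the label of node u."""
    n = len(A)
    degree = [sum(row) for row in A]   # k_u
    two_m = sum(degree)                # sum of all A[u][v] equals 2m
    q = 0.0
    for u in range(n):
        for v in range(n):
            if community[u] == community[v]:  # delta(c_u, c_v) = 1
                q += A[u][v] - degree[u] * degree[v] / two_m
    return q / two_m


# Hypothetical toy graph: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = adjacency_matrix(6, edges)

q_split = modularity(A, [0, 0, 0, 1, 1, 1])  # one community per triangle
q_single = modularity(A, [0] * 6)            # whole graph as one community
print(round(q_split, 4), round(q_single, 4))
```

On this toy graph the two-triangle partition scores 5/14 (about 0.357), while placing every node in a single community scores 0. The double loop is quadratic in the number of nodes, so it is only suitable for small illustrative graphs.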


TABLE I
COMPARATIVE ANALYSIS OF COMMUNITY DETECTION ALGORITHMS
(Q: modularity, T: execution time)

Dataset     Metric  newman2006  infomap  louvain  fast greedy  spin glass  random walk  label prop
Karate      Q       0.393       0.402    0.419    0.380        0.419       0.353        0.402
            T       0.004       0.007    0.0001   0.0001       0.467       0.0002       6.389
Dolphin     Q       0.491       0.528    0.518    0.495        0.528       0.489        0.486
            T       0.009       0.011    0.0002   0.0002       0.624       0.0004       0.0001
Polbooks    Q       0.467       0.523    0.520    0.502        0.526       0.507        0.495
            T       0.012       0.027    0.0004   0.0006       1.67        0.0011       0.0002
Netscience  Q       0.952       0.929    0.959    0.955        -           0.956        0.908
            T       0.15        0.394    0.0069   0.007        -           0.0233       0.0064
Facebook    Q       0.799       0.809    0.834    0.774        0.833       0.811        0.814
            T       1.79        5.02     0.1      1.53         563.5       1.96         0.0814
Powergrid   Q       0.825       0.815    0.936    0.933        0.920       0.831        0.804
            T       4.256       7.63     0.051    0.0168       147.71      0.215        0.375
HiEnCo      Q       0.756       0.768    0.848    0.812        -           0.755        0.771
            T       5.666       6.42     0.0486   0.216        -           0.977        0.855
Cond-2003   Q       0.343       0.674    0.760    0.678        -           0.646        0.659
            T       2.95        202.87   0.3228   30.15        -           42.19        23.35
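The T rows of Table I report execution time. The paper does not describe how these times were taken; one common approach, shown here only as an assumed sketch, is to wrap the call in a wall-clock timer and keep the best of several runs. The helper `timed`, the stand-in workload `degree_sequence` and the toy edge list are hypothetical names introduced for illustration. In an actual experiment the timed callable would be a community detection routine.

```python
import time


def timed(func, *args, repeats=5, **kwargs):
    """Call func several times; return (last result, best elapsed seconds)."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


def degree_sequence(n, edges):
    """Stand-in workload: degree of every node in an undirected graph."""
    degree = [0] * n
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return degree


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
result, seconds = timed(degree_sequence, 6, edges)
print(result, f"{seconds:.6f} s")
```

Taking the minimum over repeated runs reduces the influence of background load, which matters for the sub-millisecond timings that appear for the smallest datasets.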

Modularity gives a goodness score for the partitions of a network. This score is calculated by finding the difference between the fraction of edges inside communities and the same fraction in a random network. The fraction of edges inside communities is computed as follows:

\[
\frac{\sum_{u,v} A_{u,v}\,\delta(c_u, c_v)}{\sum_{u,v} A_{u,v}}
\tag{2}
\]

The function $\delta(c_u, c_v)$ selects only the edges whose two vertices are grouped in the same community. Here, $c_u$ represents the community of node u and $c_v$ represents the community of node v. The value of $\delta(c_u, c_v)$ is 1 if $c_u$ equals $c_v$ and 0 otherwise. The denominator of equation (2) counts each edge twice, and hence the total number of edges is given by

\[
m = \frac{1}{2}\sum_{u,v} A_{u,v}
\tag{3}
\]

So, equation (2) can be rewritten as

\[
\frac{1}{2m}\sum_{u,v} A_{u,v}\,\delta(c_u, c_v)
\tag{4}
\]

In the trivial case in which the entire network is considered as a single community, equation (4) achieves its highest value of 1. Therefore, in order to avoid trivial cases, the expected fraction of edges is subtracted. If $k_u$ is the degree of node u and $k_v$ is the degree of node v, then the expected number of edges between nodes u and v is

\[
\frac{k_u k_v}{2m}
\tag{5}
\]

Now, modularity can be written as

\[
Q = \frac{1}{2m}\sum_{u,v}\left(A_{u,v} - \frac{k_u k_v}{2m}\right)\delta(c_u, c_v)
\tag{6}
\]

If a community structure is no better than that of a random network, then the modularity value is 0. Networks with modularity greater than 0.3 are expected to have a strong community structure. The algorithms are also compared for their efficiency with respect to experimental running times.

IV. DATA SETS & EXPERIMENTAL SETUP

In this paper we have used two kinds of data sets, medium and large. The medium data sets (Table II) are used for comparing the performance of the community finding methods discussed in this paper and for selecting the best-performing methods. The medium datasets are Karate [15], Dolphin [18], Polbooks, Netscience [9], Facebook [5], Powergrid [17], HiEnCo [16] and Cond-2003 [16]. The large datasets (Table III) are complex networks from the Stanford datasets [5].

TABLE II
DATA SETS FOR ANALYSIS

Dataset     nodes   edges
Karate      34      78
Dolphin     62      159
Facebook    4039    88234
Powergrid   4941    6594
Polbooks    105     441
HiEnCo      8361    15751
Cond-2003   31163   120029
Netscience  1589    2742

A. Datasets

The Karate data set is a friendship network of 34 members of a karate club at a US university. The club split into two groups as a consequence of a dispute between the group's leaders. The Dolphin data set is an association network among 62 dolphins living in a community. Polbooks contains books on politics published during 2004; an edge between two books represents a co-purchase made by buyers on Amazon. The Netscience dataset provides a co-authorship network of scientists working in the area of network theory. The Facebook dataset is a social network obtained from the Facebook social site through a survey. All data have been replaced by anonymized data to protect the identity of users. The Powergrid network data is a topological representation of the power grid of the western states of the United States. HiEnCo is a weighted co-authorship network of scientists posting on the High-Energy Theory archive. The Cond-2003 data set is an updated co-authorship network of scientists posting on the Condensed Matter E-Print Archive between 1995 and 2003.
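Node and edge counts such as those listed for each dataset can be derived directly from an edge list. The sketch below assumes a whitespace-separated edge list with optional `#` comment lines, which is an assumed input format and not necessarily the one used for this study; the function name `graph_summary` and the sample lines are hypothetical. It also reports the number of connected components, a basic structural property that is cheap to check before running a community detection algorithm.

```python
from collections import defaultdict, deque


def graph_summary(edge_lines):
    """Count nodes, undirected edges and connected components of an edge list."""
    adj = defaultdict(set)
    for line in edge_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        u, v = line.split()[:2]
        if u == v:
            continue  # self-loops are ignored in this sketch
        adj[u].add(v)
        adj[v].add(u)  # sets collapse duplicate and reversed edges

    n_edges = sum(len(neighbors) for neighbors in adj.values()) // 2

    # Breadth-first search from every unvisited node counts the components.
    seen = set()
    components = 0
    for source in adj:
        if source in seen:
            continue
        components += 1
        seen.add(source)
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)

    # Nodes that never appear in an edge cannot be seen from an edge list.
    return {"nodes": len(adj), "edges": n_edges, "components": components}


sample = ["# toy edge list", "a b", "b c", "a c", "d e", "e d"]
summary = graph_summary(sample)
print(summary)
```

For the sample lines the summary is 5 nodes, 4 edges and 2 components, since the reversed pair `e d` duplicates `d e`.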

TABLE III
COMPLEX NETWORK DATA SETS FOR ANALYSIS

Dataset  nodes      edges      communities
Youtube  1,134,890  2,987,624  8,385
DBLP     317,080    1,049,866  13,477
Amazon   334,863    925,872    75,149

The Youtube dataset contains user groups formed by users on Youtube (a video sharing website). This data set has been provided by [19]. The Amazon dataset was collected from the Amazon website. Products which are bought together are connected by a link in this dataset. For each product category, the links connecting products of that category are considered as a single community. DBLP is a co-authorship network. Nodes in this network represent authors, and a link between two nodes indicates that the corresponding authors have coauthored a paper.

B. Experimental Setup

We have used the igraph Python library for the community detection algorithms, and these algorithms were executed on a system with a Core i7 processor and 4 GB RAM.

V. RESULTS

In our analysis, we have found that in the first phase the louvain, newman2006, label propagation and fast greedy algorithms performed better than the others in terms of modularity and execution time. Although the label propagation method offered an efficient solution to community detection, it performed poorly on the Cond-2003 dataset in terms of execution time. The spin-glass community detection method performed similarly to these algorithms in terms of modularity, but its running time is much higher. The spin-glass method failed to run on the Netscience, HiEnCo and Cond-2003 datasets because it requires a connected graph in order to work.

In the case of complex networks, louvain outperformed the newman2006 and fast greedy algorithms. The newman2006 community detection method produced community structures with very low modularity values. The fast greedy algorithm found communities with higher modularity than newman2006, but its execution time was significantly higher.

TABLE IV
COMPARATIVE ANALYSIS OF COMMUNITY DETECTION ALGORITHMS ON COMPLEX NETWORKS
(Q: modularity, T: execution time)

Dataset  Metric  newman2006  louvain  fast greedy
Youtube  Q       0.0         0.685    -
         T       37.33       14.14    -
Amazon   Q       0.0         0.925    0.876
         T       13.35       4.68     736.02
DBLP     Q       0.0241      0.809    0.735
         T       10.01       4.38     2596.2

Fig. 1. Comparison of modularity for various algorithms

VI. CONCLUSION

The analysis presented in this paper has shown that the louvain community detection algorithm outperformed the other community finding methods considered. This method took the least time on complex networks. However, the analysis has also revealed that some community detection methods, namely spin glass, fast greedy and infomap, worked approximately as well as louvain but are computationally expensive. The first phase of the analysis allowed us to select computationally efficient methods that produce highly modular structure (newman2006, louvain, fast greedy) for further analysis. The presented results have shown that the louvain community detection method performed best among all community detection algorithms in both phases.

REFERENCES

[1] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E 74.3, 036104, 2006.
[2] M. Rosvall and C. T. Bergstrom. Multilevel compression of random walks on networks reveals hierarchical organization in large integrated systems. PLoS ONE 6.4, e18209, 2011.
[3] U. N. Raghavan, R. Albert, and S. Kumara. Near linear time algorithm to detect community structures in large-scale networks. Physical Review E 76.3, 036106, 2007.
[4] M. E. J. Newman. Fast algorithm for detecting community structure in networks. Physical Review E 69.6, 066133, 2004.
[5] J. Leskovec, A. Krevl. SNAP datasets: Stanford large network dataset collection, https://snap.stanford.edu/data, June 2014.
[6] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, E. Lefebvre. Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment 2008.10, P10008, 2008.
[7] A. Lancichinetti, S. Fortunato. Community detection algorithms: a comparative analysis. Physical Review E 80, 056117, 2009.
[8] A. Condon, R. M. Karp. Random Struct. Algor. 18, 116, 2001.
[9] A. Clauset, M. E. J. Newman and C. Moore. "Finding community structure in very large networks." Phys. Rev. E 70, 066111, 2004.
[10] M. E. J. Newman and M. Girvan. Finding and evaluating community structure in networks. Phys. Rev. E 69, 026113, 2004.
[11] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto and D. Parisi. Proceedings of the National Academy of Sciences 101, 2658, 2004.
[12] P. Pons, M. Latapy. "Computing Communities in Large Networks Using Random Walks." In: Yolum P., Güngör T., Gürgen F., Özturan C. (eds) Computer and Information Sciences - ISCIS 2005. Lecture Notes in Computer Science, vol 3733. Springer, Berlin, Heidelberg.
[13] J. Reichardt and S. Bornholdt. "Statistical mechanics of community detection." Phys. Rev. E 74, 016110, 2006.
[14] M. Rosvall and C. T. Bergstrom. "Maps of information flow reveal community structure in complex networks." PNAS 105, 1118, 2008.


[15] W. W. Zachary. "An information flow model for conflict and fission in small groups." Journal of Anthropological Research 33, 1977, pp. 452-473.
[16] M. E. J. Newman. Proc. Natl. Acad. Sci. USA 98, 2001, pp. 404-409.
[17] D. J. Watts and S. H. Strogatz. Nature 393, 1998, pp. 440-442.
[18] D. Lusseau, K. Schneider, O. J. Boisseau, P. Haase, E. Slooten, and S. M. Dawson. Behavioral Ecology and Sociobiology 54, 2003, pp. 396-405.
[19] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel and B. Bhattacharjee. "Measurement and Analysis of Online Social Networks." Proceedings of the 5th ACM/Usenix Internet Measurement Conference, 2007.
[20] G. Csardi, T. Nepusz. The igraph software package for complex network research. InterJournal, Complex Systems 1695, 2006. http://igraph.org
[21] S. Emmons, S. Kobourov, M. Gallant, and K. Börner. "Analysis of Network Clustering Algorithms and Cluster Quality Metrics at Scale." Ed. Constantine Dovrolis. PLoS ONE 11.7 (2016): e0159161. PMC. Web. 27 Aug. 2017.
