Fast Unfolding of Communities in Large Networks
Online at stacks.iop.org/JSTAT/2008/P10008
doi:10.1088/1742-5468/2008/10/P10008
© 2008 IOP Publishing Ltd and SISSA 1742-5468/08/P10008+12$30.00
Contents
1. Introduction
2. Method
3. Application to large networks
4. Conclusion and discussion
Acknowledgments
1. Introduction
Social, technological and information systems can often be described in terms of complex
networks that have a topology of interconnected nodes combining organization and
randomness [1, 2]. The typical size of large networks such as social network services,
mobile phone networks or the web is now counted in millions, if not billions, of nodes
and these scales demand new methods to retrieve comprehensive information from their
structure. A promising approach consists in decomposing the networks into sub-units
or communities, which are sets of highly interconnected nodes [3]. The identification of
these communities is of crucial importance as they may help to uncover a priori unknown
functional modules such as topics in information networks or cyber-communities in social
networks. Moreover, the resulting meta-network, whose nodes are the communities, may
then be used to visualize the original network structure.
The problem of community detection requires the partition of a network into
communities of densely connected nodes, with the nodes belonging to different
communities being only sparsely connected. Precise formulations of this optimization
problem are known to be computationally intractable. Several algorithms have therefore
been proposed to find reasonably good partitions in a reasonably fast way. This search
for fast algorithms has attracted much interest in recent years due to the increasing
availability of large network datasets and the impact of networks on everyday life. One
can distinguish several types of community detection algorithms: divisive algorithms
detect inter-community links and remove them from the network [4]–[6], agglomerative
algorithms merge similar nodes/communities recursively [7] and optimization methods
are based on the maximization of an objective function [8]–[10]. The quality of the
partitions resulting from these methods is often measured by the so-called modularity
of the partition. The modularity of a partition is a scalar value between −1 and 1
that measures the density of links inside communities as compared to links between
communities [5, 11]. In the case of weighted networks (weighted networks are networks
that have weights on their links, such as the number of communications between two
mobile phone users), it is defined as [12]
$$Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j), \qquad (1)$$
where $A_{ij}$ represents the weight of the edge between $i$ and $j$, $k_i = \sum_j A_{ij}$ is the sum of
the weights of the edges attached to vertex $i$, $c_i$ is the community to which vertex $i$ is
assigned, the $\delta$ function $\delta(u, v)$ is 1 if $u = v$ and 0 otherwise, and $m = \frac{1}{2} \sum_{i,j} A_{ij}$.
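As a minimal sketch of equation (1), the following Python function evaluates the modularity of a given partition. It assumes an undirected weighted graph without self-loops, supplied as a list of (i, j, weight) triples with each edge listed once; the function name, the input format and the helper quantities `inside` and `total` are illustrative choices, not taken from the paper. The sum over node pairs is regrouped by community, which gives the same value as equation (1).

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity Q of equation (1), regrouped by community.

    edges: list of (i, j, weight), each undirected edge listed once,
           no self-loops (assumption of this sketch).
    community: dict mapping every node to its community label c_i.
    """
    degree = defaultdict(float)   # k_i = sum_j A_ij
    inside = defaultdict(float)   # sum of A_ij over ordered pairs (i, j) inside a community
    m = 0.0                       # m = (1/2) sum_ij A_ij = total edge weight
    for i, j, w in edges:
        m += w
        degree[i] += w
        degree[j] += w
        if community[i] == community[j]:
            inside[community[i]] += 2.0 * w   # A_ij and A_ji both contribute
    total = defaultdict(float)    # sum of k_i over the nodes of a community
    for node, k in degree.items():
        total[community[node]] += k
    # Q = sum_c [ inside_c / 2m - (total_c / 2m)^2 ]
    return sum(inside[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)

# Two triangles joined by a single link, split into the two triangles.
edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),
         (2, 3, 1.0)]
two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
print(modularity(edges, two_triangles))  # 5/14, approximately 0.357
```

Placing all six nodes in a single community gives Q = 0 with the same function, which illustrates that modularity rewards partitions whose internal link density exceeds what the node degrees alone would suggest.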
Modularity has been used to compare the quality of the partitions obtained by
different methods, but also as an objective function to optimize [13]. Unfortunately,
exact modularity optimization is a problem that is computationally hard [14] and so
approximation algorithms are necessary when dealing with large networks. The fastest
approximation algorithm for optimizing modularity on large networks was proposed by
Clauset et al [8]. That method repeatedly merges the communities whose merger yields
the largest gain in modularity. Unfortunately, this greedy algorithm may produce values
2. Method
We now introduce our algorithm that finds high modularity partitions of large networks
in a short time and that unfolds a complete hierarchical community structure for the
network, thereby giving access to different resolutions of community detection. Contrary
to all the other community detection algorithms, the network size limits that we are facing
with our algorithm are due to limited storage capacity rather than limited computation
time: identifying communities in a network of 118 million nodes took only 152 min⁴.

⁴ All methods described here have been compiled and tested on the same machine: a bi-opteron 2.2k with 24 GB
of memory. The code is freely available for download on the web-page http://findcommunities.googlepages.com.

Our algorithm is divided into two phases that are repeated iteratively. Assume that
we start with a weighted network of N nodes. First, we assign a different community to
each node of the network, so that in this initial partition there are as many communities as
there are nodes. Then, for each node i we consider the neighbours j of i and we evaluate
the gain of modularity that would result from removing i from its community and
placing it in the community of j. The node i is then placed in the community for which
this gain is maximum (in the case of a tie we use a breaking rule), but only if this gain
is positive. If no positive gain is possible, i stays in its original community. This process
is applied repeatedly and sequentially to all nodes until no further improvement can be
achieved, at which point the first phase is complete. Note that a node may be,
and often is, considered several times. The first phase stops when a local maximum of the
modularity is attained, i.e. when no individual move can improve the modularity⁵. One
should also note that the output of the algorithm depends on the order in which the nodes
are considered. Preliminary results on several test cases seem to indicate that the ordering
of the nodes does not have a significant influence on the modularity that is obtained.
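The first phase described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the graph is an undirected weighted adjacency dictionary without self-loops and with at least one edge, the function name `local_moving` is hypothetical, and ties are broken by keeping the first best community encountered (the text only says that a breaking rule is used). The gain is obtained directly from equation (1): inserting an isolated node i into a community C changes Q by k_{i,C}/m - Σ_C k_i/(2m²), where k_{i,C} is the weight of the links from i to C and Σ_C is the total degree of C. Since only comparisons matter, the code uses this quantity multiplied by m.

```python
from collections import defaultdict

def local_moving(adj, order=None):
    """First phase only: move single nodes until no move increases modularity.

    adj: dict node -> dict neighbour -> weight (symmetric, no self-loops).
    order: optional sequence giving the order in which nodes are visited.
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}
    two_m = sum(degree.values())            # 2m
    community = {i: i for i in adj}         # one community per node
    total = dict(degree)                    # total degree of each community
    nodes = list(order) if order is not None else list(adj)

    moved = True
    while moved:                            # repeat sweeps until nothing moves
        moved = False
        for i in nodes:
            old = community[i]
            weight_to = defaultdict(float)  # link weight from i to each community
            for j, w in adj[i].items():
                weight_to[community[j]] += w
            total[old] -= degree[i]         # take i out of its community

            def score(c):
                # m times the modularity gain of inserting isolated i into c
                return weight_to[c] - total[c] * degree[i] / two_m

            best, best_score = old, score(old)
            for c in list(weight_to):
                s = score(c)
                if s > best_score + 1e-12:  # strictly positive net gain only
                    best, best_score = c, s
            total[best] += degree[i]
            if best != old:
                community[i] = best
                moved = True
    return community

# Two triangles joined by a single link.
adj = defaultdict(dict)
for u, v in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[u][v] = 1.0
    adj[v][u] = 1.0
partition = local_moving(adj)
print(partition)  # the two triangles end up in two separate communities
```

Passing a different `order` is a simple way to experiment with the dependence on node ordering mentioned in the text.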
This simple algorithm has several advantages. First, its steps are intuitive and easy to
implement, and the outcome is unsupervised. Moreover, the algorithm is extremely fast:
computer simulations on large ad hoc modular networks suggest that its complexity
is linear on typical and sparse data. This is due to the fact that the possible gains
in modularity are easy to compute with the above formula and that the number of
communities decreases drastically after just a few passes so that most of the running
time is concentrated on the first iterations. The so-called resolution limit problem of
modularity also seems to be circumvented thanks to the intrinsic multi-level nature of our
algorithm. Indeed, it is well known [22] that modularity optimization fails to identify
communities smaller than a certain scale, thereby inducing a resolution limit on the
communities detected by a pure modularity optimization approach. This observation is
only partially relevant in our case because the first phase of our algorithm involves
the displacement of single nodes from one community to another. Consequently, the
probability that two distinct communities can be merged by moving nodes one by one is
very low. These communities may possibly be merged in the later passes, after blocks
of nodes have been aggregated. However, our algorithm provides a decomposition of the
network into communities for different levels of organization. In order to illustrate this
feature, let us focus on the ring of 30 cliques discussed in [22], where the cliques are
composed of 5 nodes and are interconnected through single links. The first pass of the
algorithm finds the natural partition of the network, where each community corresponds
to one clique. The second pass finds the global maximum of modularity where cliques
are combined into groups of two. Consequently, even if the cliques are indeed merged in the final
partition due to the resolution limit, they are distinct after the first pass. This result
suggests that the intermediate solutions found by our algorithm may also be meaningful
and that the uncovered hierarchical structure may allow the end-user to zoom into the
network and to observe its structure with the desired resolution.

Table 1. Summary of numerical results. This table gives the performance of the
algorithm of Clauset et al [8], of Pons and Latapy [7], of Wakita and Tsurumi [16]
and of our algorithm for community detection in networks of various sizes. For
each method/network, the table displays the modularity that is achieved and the
computation time. Empty cells correspond to a computation time over 24 h. Our
method clearly performs better in terms of computer time and modularity. It is
also interesting to note the small value of Q found by WT for the mobile phone
network. This bad modularity result may originate from their heuristic which
creates balanced communities, while our approach gives unbalanced communities
in this specific network.
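The ring-of-cliques example can be checked numerically. The sketch below builds a ring of 30 cliques of 5 nodes joined by single links and compares the modularity of the one-community-per-clique partition with that of the partition grouping adjacent cliques in pairs. The node numbering, the choice of which clique nodes carry the connecting links, and the function names are assumptions made for the illustration; unit weights are used throughout.

```python
from collections import defaultdict

def modularity(edges, community):
    """Modularity of equation (1) for unit-weight undirected edges (i, j)."""
    degree = defaultdict(float)
    inside = defaultdict(float)
    m = float(len(edges))                    # each edge has weight 1
    for i, j in edges:
        degree[i] += 1.0
        degree[j] += 1.0
        if community[i] == community[j]:
            inside[community[i]] += 2.0      # A_ij and A_ji
    total = defaultdict(float)
    for node, k in degree.items():
        total[community[node]] += k
    return sum(inside[c] / (2 * m) - (total[c] / (2 * m)) ** 2 for c in total)

def ring_of_cliques(n_cliques=30, clique_size=5):
    """Cliques on consecutive node ids; last node of each clique links to
    the first node of the next clique (an arbitrary but fixed choice)."""
    edges = []
    for c in range(n_cliques):
        base = c * clique_size
        for a in range(clique_size):
            for b in range(a + 1, clique_size):
                edges.append((base + a, base + b))
        next_base = ((c + 1) % n_cliques) * clique_size
        edges.append((base + clique_size - 1, next_base))
    return edges

ring = ring_of_cliques()
n_nodes = 30 * 5
one_per_clique = {v: v // 5 for v in range(n_nodes)}    # 30 communities
pairs_of_cliques = {v: v // 10 for v in range(n_nodes)}  # 15 communities

q_cliques = modularity(ring, one_per_clique)
q_pairs = modularity(ring, pairs_of_cliques)
print(round(q_cliques, 4), round(q_pairs, 4))  # 0.8758 0.8879
```

The pair partition has the higher modularity even though single cliques are the natural communities, which is the resolution-limit effect discussed in [22]; the clique partition is the one reported above for the first pass.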
In order to verify the validity of our algorithm, we have applied it to a number of test-
case networks that are commonly used for efficiency comparison and we have compared
it with three other community detection algorithms (see table 1). The networks that
we consider include a small social network [23], a network of 9000 scientific papers and
their citations [24], a sub-network of the internet [25] and a web-page network of a few
hundred thousand web-pages (the nd.edu domain, see [26]). In all cases, one can observe
both the speed of the method and the large values of the modularity that are obtained. Our method
outperforms all the other methods to which it is compared. We have also applied our
method to two web networks of unprecedented sizes: a sub-network of the .uk domain
of 39 million nodes and 783 million links [27] and a network of 118 million nodes and 1
billion links obtained by the Stanford WebBase crawler [27, 28]. Even for these very large
networks, the computation time is small (12 min and 152 min, respectively) and makes
networks of still larger size, perhaps a billion nodes, accessible to computational analysis.
It is also interesting to note that the number of passes is usually very small. In the case
of the Karate Club [23], for instance, there are only 3 passes: during the first one, the 34
nodes of the network are partitioned into 6 communities; after the second one, only four
communities remain; during the third one, nothing happens and the algorithm therefore
stops. In the above examples, the number of passes is always smaller than 5.
We have also tested the sensitivity of our algorithm by applying it to ad hoc networks
that have a known community structure. To do so, we have used networks composed of
128 nodes which are split into 4 communities of 32 nodes each [29]. Pairs of nodes
belonging to the same community are linked with probability $p_{\mathrm{in}}$ while pairs belonging
to different communities are linked with probability $p_{\mathrm{out}}$. The accuracy of the method
is evaluated by measuring the fraction of correctly identified nodes and the normalized
mutual information. In the benchmark proposed in [29], the fraction of correctly identified
nodes is 0.67 for $z_{\mathrm{out}} = 8$, 0.92 for $z_{\mathrm{out}} = 7$ and 0.98 for $z_{\mathrm{out}} = 6$, i.e. an accuracy similar
to that of the algorithm of Pons and Latapy [7] and of the algorithm of Reichardt and
Bornholdt [30]. To our knowledge, only two algorithms have a better accuracy than ours,
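The benchmark construction described above (128 nodes, 4 communities of 32 nodes, link probabilities p_in and p_out) can be sketched as follows. The function names, the seeded generator and the edge-list output are assumptions made for this illustration. The sketch also assumes that z_out denotes the average number of links from a node to nodes of other communities, in which case its expected value is 96 p_out, since each node has 128 - 32 = 96 potential neighbours outside its own community.

```python
import random

def planted_partition(p_in, p_out, n_groups=4, group_size=32, seed=0):
    """Random graph with a known community structure.

    Returns (edges, group): a list of undirected edges (u, v) with u < v
    and a dict mapping each node to its planted community.
    """
    rng = random.Random(seed)               # seeded for reproducibility
    n = n_groups * group_size
    group = {v: v // group_size for v in range(n)}
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            p = p_in if group[u] == group[v] else p_out
            if rng.random() < p:            # link the pair with probability p
                edges.append((u, v))
    return edges, group

def mean_external_degree(edges, group):
    """Average number of links per node that leave the node's community."""
    external = sum(1 for u, v in edges if group[u] != group[v])
    return 2.0 * external / len(group)

# Example: a setting whose expected external degree is 96 * p_out = 6.
edges, group = planted_partition(p_in=10.0 / 31.0, p_out=6.0 / 96.0)
print(len(group), mean_external_degree(edges, group))
```

The measured external degree fluctuates around its expected value from one random realization to the next, so results on this benchmark are normally averaged over several generated graphs.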
$\sigma_T = 3.2$ s but remain reasonably small as this is only a 7% variation. The smallest
and largest values of $T$ among the 100 runs are 39 and 55 s. This interval suggests that a
good choice for the ordering of the nodes may substantially accelerate the dynamics. We
have therefore checked whether an order related to the community structure would reduce
the computation time. To do so, we have ordered the nodes by their postcodes, but this
choice did not lead to any improvement as compared to a random ordering, as $Q_{\mathrm{zip}} = 0.76$
and $T_{\mathrm{zip}} = 44$ s.
local optimization involved at each step. These are, however, very qualitative arguments
and the multi-resolution of our algorithm will only be confirmed after looking in detail at
the hierarchies found in ad hoc networks with known hierarchical structure [19] or without
community structure (e.g. Erdős–Rényi random graphs), or after comparing with other
methods incorporating a tunable resolution [32, 37, 38].
Acknowledgments
This research was supported by the Communauté Française de Belgique through a grant
ARC and by the Belgian Network DYSCO, funded by the Interuniversity Attraction Poles
References
[1] Albert R and Barabási A-L, 2002 Rev. Mod. Phys. 74 47
[2] Newman M E J, Barabási A-L and Watts D J, 2006 The Structure and Dynamics of Networks (Princeton,
NJ: Princeton University Press)
[3] Fortunato S and Castellano C, 2007 arXiv:0712.2716
[4] Girvan M and Newman M E J, 2002 Proc. Nat. Acad. Sci. 99 7821
[5] Newman M E J and Girvan M, 2004 Phys. Rev. E 69 026113
[6] Radicchi F, Castellano C, Cecconi F, Loreto V and Parisi D, 2004 Proc. Nat. Acad. Sci. 101 2658
[7] Pons P and Latapy M, 2006 J. Graph Algorithms Appl. 10 191
[8] Clauset A, Newman M E J and Moore C, 2004 Phys. Rev. E 70 066111
[9] Wu F and Huberman B A, 2004 Eur. Phys. J. B 38 331
[10] Newman M E J, 2006 Phys. Rev. E 74 036104
[11] Newman M E J, 2006 Proc. Nat. Acad. Sci. 103 8577
[12] Newman M E J, 2004 Phys. Rev. E 70 056131
[13] Newman M E J, 2004 Phys. Rev. E 69 066133
[14] Brandes U, Delling D, Gaertler M, Goerke R, Hoefer M, Nikoloski Z and Wagner D, 2006
arXiv:physics/0608255
[15] Guimera R, Sales M and Amaral L A N, 2004 Phys. Rev. E 70 025101
[16] Wakita K and Tsurumi T, 2007 Proc. IADIS Int. Conf. on WWW/Internet 2007 p 153
[17] Palla G, Derényi I, Farkas I and Vicsek T, 2005 Nature 435 814
[18] Raghavan U N, Albert R and Kumara S, 2007 Phys. Rev. E 76 036106
[19] Sales-Pardo M, Guimera R, Moreira A A and Amaral L A N, 2007 Proc. Nat. Acad. Sci. 104 15224
[20] Arenas A, Duch J, Fernández A and Gómez S, 2007 New J. Phys. 9 176
[21] Song C, Havlin S and Makse H A, 2005 Nature 433 392
[22] Fortunato S and Barthélemy M, 2007 Proc. Nat. Acad. Sci. 104 36
[23] Zachary W W, 1977 J. Anthropol. Res. 33 452
[24] http://www.cs.cornell.edu/projects/kddcup/ (Cornell KDD Cup)
[25] Hoerdt M and Magoni D, 2003 Proc. 11th Int. Conf. on Software, Telecommunications and Computer
Networks p 257
[26] Albert R, Jeong H and Barabási A-L, 1999 Nature 401 130
[27] http://law.dsi.unimi.it/ (Laboratory for Web Algorithmics)
[28] http://dbpubs.stanford.edu:8091/~testbed/doc2/WebBase/ (Stanford WebBase Project)
[29] Danon L, Díaz-Guilera A, Duch J and Arenas A, 2005 J. Stat. Mech. P09008
[30] Reichardt J and Bornholdt S, 2004 Phys. Rev. Lett. 93 218701
[31] Duch J and Arenas A, 2005 Phys. Rev. E 72 027104
[32] Lancichinetti A, Fortunato S and Kertesz J, 2008 arXiv:0802.1218
[33] Lambiotte R, Blondel V D, de Kerchove C, Huens E, Prieur C, Smoreda Z and Van Dooren P, 2008
Physica A 387 5317
[34] Palla G, Barabási A-L and Vicsek T, 2007 Nature 446 664
[35] Onnela J-P, Saramäki J, Hyvönen J, Szabó G, Lazer D, Kaski K, Kertész J and Barabási A-L, 2007 Proc.
Nat. Acad. Sci. 104 7332