SNA-Community Detection
Community Detection
Homophily in the Society
❑Approaches to the problem differ depending on how community structure is defined in the network
Community Detection in Networks: Applications
✓Performance enhancement of the similarity-based link prediction algorithms
✓Improving recommendation quality in Recommender systems by separating like-minded people
✓Controlling information diffusion within a network by identifying community memberships
✓Designing better marketing strategy by identifying position of the target group within the network
✓Restricting epidemic propagation by suitably isolating and immunizing the vulnerable population
✓Better anomaly detection
✓Studying evolution of communities
✓Applications in criminology and detecting terrorist groups
Types of Communities: Disjoint Communities
❑Also referred to as flat communities
❑Considered as communities
❑Issue:
❑A node that is not in the K-clique can lie on a shortest path between two of its members
Node-centric Community Detection:
K-clan
❑A stricter version of K-clique
❑Only the nodes present in the set under inspection are used to create the subgraph in which the
distance between any two nodes should be less than or equal to K
❑Challenges:
❑These algorithms are still computationally expensive for large K
❑Deciding appropriate K is difficult
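The difference between the two distance conditions can be illustrated with a minimal Python sketch. It assumes an undirected graph stored as a dictionary of neighbor sets; the helper names `bfs_distances` and `pairwise_within_k` and the example graph are hypothetical, and the check covers only the distance condition, not maximality.

```python
from collections import deque

def bfs_distances(adj, source, allowed=None):
    """Hop counts from source; if `allowed` is given, paths may use only those nodes."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if allowed is not None and w not in allowed:
                continue
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def pairwise_within_k(adj, nodes, k, induced):
    """True if every pair in `nodes` is at distance <= k.

    induced=False measures distances in the whole graph (K-clique condition);
    induced=True measures them inside the subgraph induced by `nodes` (K-clan condition).
    """
    nodes = set(nodes)
    allowed = nodes if induced else None
    for u in nodes:
        dist = bfs_distances(adj, u, allowed)
        if any(dist.get(w, float("inf")) > k for w in nodes):
            return False
    return True

# Hypothetical example: a and d are two hops apart only through the outside node x.
adj = {
    "a": {"b", "x"},
    "b": {"a", "c"},
    "c": {"b", "d"},
    "d": {"c", "x"},
    "x": {"a", "d"},
}
S = {"a", "b", "c", "d"}
print(pairwise_within_k(adj, S, 2, induced=False))  # True
print(pairwise_within_k(adj, S, 2, induced=True))   # False
```

In this example the set S satisfies the distance bound only because the shortest path between a and d passes through x, which is exactly the issue the K-clan restriction removes.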
Node-centric Community Detection:
K-plex - Based on the degree of the nodes
❑A subset of vertices 𝑆 in a graph is a 𝐾-plex if every vertex in the subgraph induced by 𝑆 has degree at least |𝑆| − 𝐾
❑ The (K+1)-core can be obtained from the current K-core by recursively removing nodes of degree at most K.
❑ This removal is repeated until no node of degree at most K remains in the current subgraph.
❑ Issues:
❑ Checking whether a given subgraph is a K-core or a K-plex is computationally easy
❑ Finding a maximum K-plex is NP-hard; the K-core, in contrast, can be found in polynomial time by the recursive removal described above
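The degree-based checks above can be sketched in a few lines of Python. This is a minimal illustration, assuming an undirected graph stored as a dictionary of neighbor sets; the function names `is_k_plex` and `k_core` and the example graph are hypothetical. The peeling loop removes nodes of degree below k, which is the same procedure as removing nodes of degree at most k − 1.

```python
def is_k_plex(adj, nodes, k):
    """True if every vertex has at least |S| - k neighbors inside S."""
    nodes = set(nodes)
    return all(len(adj[v] & nodes) >= len(nodes) - k for v in nodes)

def k_core(adj, k):
    """Return the node set of the k-core by repeatedly removing nodes of degree < k."""
    alive = set(adj)
    degree = {v: len(adj[v]) for v in adj}
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed through an earlier stack entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)
    return alive

# Hypothetical example: a triangle a-b-c with a pendant node d attached to c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
print(k_core(adj, 2) == {"a", "b", "c"})     # True
print(is_k_plex(adj, {"a", "b", "c"}, 1))    # True: every vertex has degree 2 = 3 - 1
```

Each node is removed at most once and each edge is inspected a bounded number of times, which is why computing a K-core is cheap; no comparable shortcut is shown here for finding a largest K-plex.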
Community Detection: Modularity
❑Modularity comes from the word ‘module’
❑Networks with high modularity have dense connections between the nodes within modules but
sparse connections between nodes in different modules.
❑An assignment that maximizes the modularity of the overall network often recovers the communities in the network
❑Fast Greedy Algorithm
❑Louvain Method
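To make the quantity being maximized concrete, here is a minimal Python sketch that scores a given community assignment. It assumes the standard Newman-Girvan definition, Q = Σ_c [ L_c / m − (d_c / 2m)² ], where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes; the slides do not spell this formula out, and the function name `modularity` and the example graph are hypothetical.

```python
def modularity(edges, community_of):
    """Modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    internal = {}    # L_c: edges with both endpoints in community c
    degree_sum = {}  # d_c: sum of node degrees in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical example: two triangles joined by a single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
two_groups = {0: "x", 1: "x", 2: "x", 3: "y", 4: "y", 5: "y"}
one_group = {node: "x" for node in range(6)}
print(modularity(edges, two_groups))  # 5/14, about 0.357
print(modularity(edges, one_group))   # 0.0
```

Splitting the graph into its two triangles scores higher than keeping everything in one community, which is the behavior that greedy methods such as those listed above exploit when they search for a high-modularity assignment.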
Community Detection: Fast Greedy Algorithm
Community Detection: Louvain Method
Community Detection through Modularity Maximization: Limitations
1) Resolution limit
✓ Well-connected smaller communities tend to get merged into larger ones even if the resulting communities are not that dense
✓ Modularity maximization can fail to detect communities that are well separated, with densely connected intra-community nodes and only a single inter-community edge to the rest of the network
2) Degeneracy of solutions
✓ The case in which an exponential number of community structures share the same (maximum) modularity value
Permanence and Community Detection
❑Modularity is a network-centric global metric
▪ Considers the entire network structure during maximization process
▪ Not suitable for large and evolving networks
❑Requires a method that looks at the local neighborhood while detecting communities
❑Hypothesis 2:
▪ In a community, all the vertices should be highly inter-connected to each other
Assignment A (figure: example network with vertices labeled A to F under community assignment A)
Vertex   deg   I   E_max   c_in   Perm
C        3     2   1       1      0.67
Assignment B (figure: example network under community assignment B; table columns: Vertex, deg, I, E_max, c_in, Perm)
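The table columns can be tied together with a short Python sketch. It assumes the usual permanence formula, Perm(v) = (I(v) / E_max(v)) × (1 / deg(v)) − (1 − c_in(v)), which reproduces the value 0.67 listed for vertex C; the formula itself is not written out in the surviving slide text. The function name `permanence`, the conventions for vertices with no external neighbors or fewer than two internal neighbors, and the example graph are all assumptions made for illustration.

```python
def permanence(adj, community_of, v):
    """Permanence of vertex v (assumes v has at least one neighbor)."""
    neighbors = adj[v]
    deg = len(neighbors)
    own = community_of[v]
    internal = {w for w in neighbors if community_of[w] == own}

    # E_max: largest number of connections to any single external community.
    external_counts = {}
    for w in neighbors - internal:
        c = community_of[w]
        external_counts[c] = external_counts.get(c, 0) + 1
    e_max = max(external_counts.values(), default=1)  # assumed convention: 1 if none

    # c_in: clustering coefficient among the internal neighbors of v.
    k = len(internal)
    if k < 2:
        c_in = 0.0  # assumed convention
    else:
        links = sum(1 for a in internal for b in internal
                    if a < b and b in adj[a])
        c_in = links / (k * (k - 1) / 2)

    return (k / e_max) * (1 / deg) - (1 - c_in)

# Hypothetical graph chosen so that C has deg = 3, I = 2, E_max = 1, c_in = 1.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "E": {"C", "F"},
    "F": {"E"},
}
community_of = {"A": 0, "B": 0, "C": 0, "E": 1, "F": 1}
print(round(permanence(adj, community_of, "C"), 2))  # 0.67
```

Only the neighborhood of v is inspected, which is what makes the measure local rather than network-wide.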
Permanence Maximization for Community Detection: MaxPerm
❑Uses a greedy approach to produce high-permanence partitions of the network
❑Join small communities if and only if the permanence of the network increases
❑If a vertex is connected to more than one neighboring community and those communities overlap with each other, the permanence maximization method fails to handle the resolution limit
❑ The merging process stops when no cliques remain to merge
Community Detection: Label Propagation
1. Initialization: Assign a unique community label to every node in the network
2. Inner Iteration: Update the labels of all the nodes in the network
a. Give each node the label that occurs most frequently among its neighbors' current labels
b. Break ties at random
3. Outer Iteration: Stop if no label has changed since the previous iteration; otherwise, continue
Labels are updated at the end of the inner iteration
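The three steps can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: it assumes an undirected graph stored as a dictionary of neighbor sets with mutually comparable node names, applies all label updates together at the end of each inner iteration, and adds a `max_iter` cap and a fixed random `seed`, neither of which appears in the steps above. The function name `label_propagation` is hypothetical.

```python
import random

def label_propagation(adj, max_iter=1000, seed=0):
    """Synchronous label propagation; returns a dict node -> label."""
    rng = random.Random(seed)
    # Step 1: every node starts with its own unique label.
    labels = {v: v for v in adj}
    for _ in range(max_iter):
        new_labels = {}
        # Step 2: compute the new label of every node from the current labels.
        for v in adj:
            counts = {}
            for w in adj[v]:
                counts[labels[w]] = counts.get(labels[w], 0) + 1
            if not counts:
                new_labels[v] = labels[v]  # isolated node keeps its label
                continue
            best = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == best)
            new_labels[v] = rng.choice(candidates)  # ties broken at random
        # Step 3: stop when nothing changed in this iteration.
        if new_labels == labels:
            break
        labels = new_labels  # labels are replaced only after the full pass
    return labels

# Hypothetical example: two disconnected triangles.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1},
    3: {4, 5}, 4: {3, 5}, 5: {3, 4},
}
labels = label_propagation(adj)
print(labels[0] == labels[1] == labels[2])  # True
print(labels[0] != labels[3])               # True
```

Because ties are broken at random, different seeds can end with different label values, and on graphs with persistent ties the loop may run until the iteration cap is reached.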
Label Propagation: Limitations
❑The number of outer iterations needed to reach the correct answer cannot be determined in advance for large networks
❑Generalizing connectedness: a measure of how strongly a vertex is connected with its internal neighbors in the community
$$\left(1 - c_{in}^{c}(v)\right) \times \frac{\sum_{e \in \Gamma_v^c} \frac{1}{x_e}}{I(v)}$$
where $\Gamma_v^c$: set of edges of vertex $v$ that are incident on nodes in community $c$
$x_e$: number of communities that contain edge $e$
$c_{in}^{c}(v)$: local clustering coefficient of vertex $v$, considering only the subgraph induced by community $c$
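A small Python sketch can make the shared-edge weighting concrete. It evaluates the sum of 1/x_e over the edges of v inside community c and then the connectedness term (1 − c_in^c(v)) × (that sum) / I(v). Several points are assumptions made for illustration: I(v) is taken to be the number of edges of v that lie inside at least one community containing v, c_in is set to 0 when v has fewer than two neighbors in c, and the function names and the example network are hypothetical.

```python
def shared_edge_pull(adj, communities, v, c):
    """Sum of 1 / x_e over edges (v, w) inside community c (v must belong to c)."""
    total = 0.0
    for w in adj[v]:
        if w in communities[c]:
            # x_e: number of communities that contain both endpoints of the edge
            x_e = sum(1 for nodes in communities.values()
                      if v in nodes and w in nodes)
            total += 1.0 / x_e
    return total

def internal_clustering(adj, communities, v, c):
    """Clustering coefficient of v restricted to its neighbors in community c."""
    inside = [w for w in adj[v] if w in communities[c]]
    k = len(inside)
    if k < 2:
        return 0.0  # assumed convention
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if inside[j] in adj[inside[i]])
    return links / (k * (k - 1) / 2)

def connectedness_term(adj, communities, v, c):
    # Assumed reading of I(v): edges of v lying inside some community containing v.
    total_internal = sum(
        1 for w in adj[v]
        if any(v in nodes and w in nodes for nodes in communities.values())
    )
    if total_internal == 0:
        return 0.0
    pull = shared_edge_pull(adj, communities, v, c)
    return (1 - internal_clustering(adj, communities, v, c)) * pull / total_internal

# Hypothetical example: v belongs to C1 and C3, and the edge v-b is shared by both.
adj = {"v": {"a", "b", "c"}, "a": {"v"}, "b": {"v", "c"}, "c": {"v", "b"}}
communities = {"C1": {"v", "a", "b"}, "C3": {"v", "b", "c"}}
print(shared_edge_pull(adj, communities, "v", "C1"))    # 1.5
print(connectedness_term(adj, communities, "v", "C1"))  # 0.5
```

The shared edge v-b contributes only one half to each community, so an edge that belongs to several communities is not counted in full for any one of them.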
Overlapping Community Detection: Generalized Permanence: Illustration
❑In the network shown, v is the central node and belongs to communities C1 and C3; C2 is a neighboring community
3. Convergence: Stop when no vertex shows any improvement, or when the maximum number of iterations is reached
Local Community Detection: Subgraph Modularity
❑Subgraph modularity is defined in terms of the degree of each vertex
❑Define the adjacency matrix for subgraph $C$ and its neighbors $U$ as:
$$S_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected and } (i \in C \lor j \in C) \\ 0 & \text{otherwise} \end{cases}$$
❑The in-degree of subgraph $C$ is the total number of edges that lie completely within $C$
$$In(C) = \sum_{i,j} S_{ij}\, \delta(i, j)$$
❑The out-degree of subgraph $C$ is the total number of edges between $C$ and the remaining part of the network $G$
$$Out(C) = \sum_{i,j} S_{ij}\, \lambda(i, j)$$
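The two counts can be computed directly from an edge list, as in the following minimal Python sketch. It follows the verbal definitions and counts each undirected edge once, which is an assumption about how the double sums over i and j are meant to be normalized; the function name `in_out_degree` and the example graph are hypothetical.

```python
def in_out_degree(edges, community):
    """Return (In(C), Out(C)) for an undirected edge list and a node set C."""
    community = set(community)
    # In(C): edges with both endpoints inside C.
    inside = sum(1 for u, v in edges if u in community and v in community)
    # Out(C): edges with exactly one endpoint inside C.
    boundary = sum(1 for u, v in edges if (u in community) != (v in community))
    return inside, boundary

# Hypothetical example: a triangle a-b-c attached to a path c-d-e.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"), ("d", "e")]
print(in_out_degree(edges, {"a", "b", "c"}))  # (3, 1)
```

Edges that do not touch C at all, such as d-e here, contribute to neither count, which matches the role of the matrix S that keeps only edges with at least one endpoint in C.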