SNA-Community Detection

Uploaded by

jvl jsp
Copyright
© © All Rights Reserved

Social Network Analysis

Community Detection
Homophily in the Society

❑Tendency of individuals to associate and bond with similar others
❑Similar nodes tend to attract each other, and dissimilar nodes tend to move away from each other
❑Causes formation of a community structure in a social network
❑Homophily occurs along a number of categories:
▪ Age
▪ Gender
▪ Education, occupation
▪ Religion
▪ Interests
▪ Organizational role, etc.
Communities in a Network
❑Identifying communities gives an insight about the inherent network structure

❑Community detection is not a well-defined problem

❑what we mean by a ‘community’ is often not concrete
❑it is often hard to reliably define a ground-truth annotation for communities
❑there is no standard measure to assess performance

❑Diverse approaches to the problem depending on how we define a community structure in the
network
Community Detection in
Networks: Applications
✓Performance enhancement of the similarity-based link prediction algorithms
✓Improving recommendation quality in Recommender systems by separating like-minded people
✓Controlling information diffusion within a network by identifying community memberships
✓Designing better marketing strategy by identifying position of the target group within the network
✓Restricting epidemic propagation by suitably isolating and immunizing the vulnerable population
✓Better anomaly detection
✓Studying evolution of communities
✓Applications in criminology and detecting terrorist groups
Types of Communities: Disjoint
Communities
❑Also referred to as flat communities

❑Each node in the network can belong to at most one community

❑Differs from disconnected components:


❑nodes in two different communities can still have connecting edges
❑referred to as bridges

❑Example: Full-time employees of an organization


Types of Communities:
Overlapping Communities
❑Members can belong to more than one community at a time

❑Communities can even share edges

❑Realistic and generic community structure

❑Harder to find than flat communities

❑Example: Various groups in social networks


Types of Communities:
Hierarchical Communities
❑Outcome of merging two or more flat or overlapping communities in a network

❑Can be linked to other hierarchical, overlapping, or flat communities

❑Example: various city-level communities merged to form a state-level community


Types of Communities:
Local Communities
❑Shows a community structure from local perspective without focusing on global structure

❑Example: citation network formed by research groups inside a university


Node-centric Community Detection
❑Use the property of the nodes to find community structure in the network

❑Exploits node-centric features in a number of ways:


❖Complete Mutuality
▪ Cliques
❖Reachability of Members
▪ K-cliques
▪ K-clan
▪ K-club
❖Node Degree
▪ K-plex
▪ K-core
Node-centric Community Detection:
Finding Cliques
❑A subgraph of a graph is a clique if every pair of vertices in the subgraph is adjacent

❑Cliques are considered as communities

❑A couple of problems with this approach

▪ Finding a maximum clique in a network is NP-hard (the decision version is NP-complete)
▪ Large cliques are usually not present in social networks
Node-centric Community Detection:
K-Cliques
❑A K-clique is a maximal subset of vertices of the network such that, for any two nodes belonging to this subset, the shortest distance between them is less than or equal to K

❑A 1-clique is an ordinary clique

❑2-cliques are known as friend-of-a-friend groups in social network analysis

❑Issue:
❑A node that is not in the K-clique can lie on a shortest path between two of its members
Node-centric Community Detection:
K-clan
❑A stricter version of K-clique

❑Only the nodes present in the set under inspection are used to create the subgraph in which the
distance between any two nodes should be less than or equal to K

❑Maximality condition of K-clique also persists in K-clan


Node-centric Community Detection:
K-club
❑K-club is a K-clan minus the maximality condition

❑Every K-clan is a K-club as well as a K-clique

❑Challenges:
❑These algorithms are still computationally expensive for large K
❑Deciding appropriate K is difficult
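The difference between the K-clique distance condition (paths may leave the set) and the K-clan/K-club condition (paths must stay inside the set) can be sketched in a few lines. This is a minimal illustration, not a community finder: it only checks the distance condition for a given node set and does not test maximality. The function names, the `induced` flag, and the toy graph are assumptions made for this example.

```python
from collections import deque
from itertools import combinations


def bfs_distances(adj, source, allowed=None):
    """Shortest-path lengths from source; if allowed is given, paths stay inside it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist and (allowed is None or w in allowed):
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def within_distance_k(adj, nodes, k, induced):
    """True if every pair in nodes is at distance <= k.

    induced=False: distances measured in the whole graph (K-clique condition).
    induced=True:  distances measured inside the node set only (K-clan/K-club condition).
    """
    nodes = set(nodes)
    allowed = nodes if induced else None
    for u, w in combinations(nodes, 2):
        d = bfs_distances(adj, u, allowed).get(w)
        if d is None or d > k:
            return False
    return True


# Hypothetical toy graph: a and b are linked only through x.
adj = {"a": {"x"}, "b": {"x"}, "x": {"a", "b"}}

outside_paths = within_distance_k(adj, {"a", "b"}, 2, induced=False)
inside_paths = within_distance_k(adj, {"a", "b"}, 2, induced=True)
with_x = within_distance_k(adj, {"a", "b", "x"}, 2, induced=True)
```

In the toy graph, the set {a, b} satisfies the distance condition only because the path runs through x, which is outside the set; once paths are restricted to the set itself, the condition fails until x is included.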
Node-centric Community Detection:
K-plex - Based on the degree of the nodes
❑A subset of vertices 𝑆 in a graph is a 𝐾-plex if every vertex of 𝑆 has degree at least |𝑆| − 𝐾 in the subgraph induced by 𝑆

❑A measure based on the degree of the nodes
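As a small sketch of the K-plex condition, the check below counts, for each vertex of a candidate set, how many of its neighbours fall inside the set. The function name and the 4-cycle example are assumptions for illustration; the adjacency structure is taken to be a dict mapping each node to a set of neighbours.

```python
def is_k_plex(adj, nodes, k):
    """True if every vertex of nodes has at least |nodes| - k neighbours inside nodes."""
    nodes = set(nodes)
    threshold = len(nodes) - k
    return all(len(adj[v] & nodes) >= threshold for v in nodes)


# Hypothetical example: a 4-cycle 0-1-2-3-0.
cycle = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}

cycle_is_2_plex = is_k_plex(cycle, {0, 1, 2, 3}, 2)  # each vertex needs degree >= 2
cycle_is_1_plex = is_k_plex(cycle, {0, 1, 2, 3}, 1)  # each vertex needs degree >= 3
```

With K = 1 the condition requires every vertex to be adjacent to all others in the set, so a 1-plex is an ordinary clique.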


Node-centric Community Detection:
K-core-Another degree-centric measure
❑ A K-core is a maximal subgraph 𝐺′ of a graph 𝐺 in which each node has degree greater than or equal to 𝐾, with degree measured inside 𝐺′

❑ The (K+1)-core can be created from the current K-core by recursively removing nodes of degree K or less.

❑ This removal should be repeated until no node of degree K or less remains in the current subgraph.

❑ Issues:
❑ Checking whether a given subgraph is a K-core or a K-plex is computationally easy
❑ Finding a maximum K-plex is NP-hard; the K-core itself can be obtained efficiently by the pruning above
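The recursive pruning behind the K-core can be sketched as follows. This is a minimal version under the assumption that the graph is an undirected dict-of-sets; the function name and the example graph are hypothetical.

```python
def k_core(adj, k):
    """Return the node set of the k-core by repeatedly removing nodes of degree < k."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    alive = set(adj)
    stack = [v for v in alive if degree[v] < k]
    while stack:
        v = stack.pop()
        if v not in alive:
            continue  # already removed through an earlier stack entry
        alive.remove(v)
        for w in adj[v]:
            if w in alive:
                degree[w] -= 1
                if degree[w] < k:
                    stack.append(w)
    return alive


# Hypothetical example: a triangle 0-1-2 with a tail 2-3-4.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}

core_1 = k_core(graph, 1)
core_2 = k_core(graph, 2)
core_3 = k_core(graph, 3)
```

Removing node 4 lowers the degree of node 3, which is then removed in turn; this cascading effect is why the removal has to be repeated rather than applied once.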
Community Detection: Modularity
❑Modularity comes from the word ‘module’

❑Network-centric metric to determine the quality of a community structure


Community Detection: Modularity
Formulation of modularity:

$$Q = \sum_{n=1}^{|Comm|} \left[ \frac{m_n}{|E|} - \left( \frac{k_n}{2 \cdot |E|} \right)^2 \right]$$

▪$m_n$ denotes the number of edges in the community $n$

▪$|Comm|$ is the total number of communities

▪$k_n = \sum_{i \in Comm(n)} \deg(i)$
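To make the formula concrete, here is a minimal sketch that evaluates Q for a given assignment of nodes to communities. It assumes an undirected, unweighted graph given as a list of edges, with |E| taken to be the number of edges; the function name and the example graph (two triangles joined by one edge) are assumptions for illustration.

```python
def modularity(edges, community_of):
    """Q = sum over communities of m_n/|E| - (k_n / (2|E|))^2."""
    total_edges = len(edges)
    internal = {}    # m_n: edges with both endpoints in community n
    degree_sum = {}  # k_n: sum of degrees of the nodes in community n
    for u, w in edges:
        cu, cw = community_of[u], community_of[w]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cw] = degree_sum.get(cw, 0) + 1
        if cu == cw:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / total_edges - (k / (2 * total_edges)) ** 2
        for c, k in degree_sum.items()
    )


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_block = {v: "all" for v in range(6)}

q_split = modularity(edges, two_triangles)
q_single = modularity(edges, one_block)
```

For the split assignment each community has m_n = 3 and k_n = 7 with |E| = 7, so Q = 2 × (3/7 − 1/4) = 5/14 ≈ 0.357; placing every node in one community gives Q = 0.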


Community Detection: Modularity
Maximization
❑Modularity can be positive, negative, and zero
▪ Positive modularity shows presence of strong community structure

❑Networks with high modularity have dense connections between the nodes within modules but
sparse connections between nodes in different modules.

❑Different community assignments can lead to different values of modularity

❑An assignment that maximizes the modularity of the overall network often finds the communities in the network
❑Two heuristics for modularity maximization:
▪ Fast Greedy Algorithm
▪ Louvain Method
Community Detection: Fast
Greedy Algorithm
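The agglomerative idea usually associated with the fast greedy method can be sketched naively: start with every node in its own community and repeatedly apply the merge that raises modularity the most, stopping when no merge helps. The version below recomputes Q from scratch for every candidate merge, so it is far slower than an optimized implementation and is meant only to show the logic. All names and the example graph are assumptions.

```python
from itertools import combinations


def modularity(edges, community_of):
    """Q = sum over communities of m_n/|E| - (k_n / (2|E|))^2."""
    total_edges = len(edges)
    internal, degree_sum = {}, {}
    for u, w in edges:
        cu, cw = community_of[u], community_of[w]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cw] = degree_sum.get(cw, 0) + 1
        if cu == cw:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / total_edges - (k / (2 * total_edges)) ** 2
        for c, k in degree_sum.items()
    )


def greedy_modularity(nodes, edges):
    """Merge the pair of communities with the largest modularity gain until none helps."""
    community_of = {v: v for v in nodes}  # every node starts alone
    best_q = modularity(edges, community_of)
    while True:
        best_merge = None
        labels = sorted(set(community_of.values()))
        for a, b in combinations(labels, 2):
            # Trial assignment in which community b is absorbed into community a.
            trial = {v: (a if c == b else c) for v, c in community_of.items()}
            q = modularity(edges, trial)
            if q > best_q + 1e-12:
                best_q, best_merge = q, trial
        if best_merge is None:
            return community_of, best_q
        community_of = best_merge


# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
assignment, q_final = greedy_modularity(range(6), edges)
```

On this graph the merging stops with the two triangles as separate communities, because joining them would lower Q.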
Community Detection: Louvain
Method
Community Detection through
Modularity Maximization: Limitations
1) Resolution limit
✓ well-connected smaller communities tend to get merged with larger communities even if the resultant
communities are not that dense
✓ fails to detect those communities which are well-separated with densely connected intra-community
nodes but only a single inter-community edge with the rest of the network

2) Degeneracy of solutions
✓ the case when there is an exponential number of community structures with same (maximum)
modularity value
Permanence and
Community Detection
❑Modularity is a network-centric global metric
▪ Considers the entire network structure during maximization process
▪ Not suitable for large and evolving networks

❑This calls for a method that looks at the local neighborhood while detecting communities

❑Permanence: a local metric for community detection


❑A vertex-centric metric
❑Two communities 𝐴 and 𝐵 are neighbouring communities if ∃𝑢 ∈ 𝐴, 𝑣 ∈ 𝐵, and there is an
edge between 𝑢 and 𝑣
Permanence and
Community Detection
❑Hypothesis 1:
▪ The number of internal connections of node 𝑣 should be greater than the number of external connections of
node 𝑣 with any external community

❑Hypothesis 2:
▪ In a community, all the vertices should be highly inter-connected to each other

❑Expression for Permanence for a vertex $v$ is:

$$Perm(v) = \left[ \frac{I(v)}{E_{max}(v)} \times \frac{1}{\deg(v)} \right] - \left[ 1 - c_{in}(v) \right]$$

▪$I(v)$: Number of internal neighbours of $v$ within its own community
▪$E_{max}(v)$: maximum number of connections of $v$ to neighbors in an external community
▪$c_{in}(v)$: internal clustering coefficient of $v$
Permanence and
Community Detection
❑Permanence of the entire network:

$$Perm(G) = \frac{\sum_{v \in V} Perm(v)}{|V|}$$

❑the permanence value ranges from -1 to 1
❑when vertex 𝑣 is a part of a clique, Permanence is 1
❑when there is no appropriate community structure of a network (like a grid network), Permanence is 0
❑when 𝐼 𝑣 ≪ 𝑑𝑒𝑔 𝑣 and 𝑐𝑖𝑛 𝑣 ≈ 0, Permanence tends to -1
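A minimal sketch of the vertex-level and network-level permanence follows, for a disjoint community assignment. Three conventions are assumptions of this sketch rather than statements from the slides: E_max is taken as 1 when a vertex has no external neighbours, c_in is taken as 0 when a vertex has fewer than two internal neighbours, and an isolated vertex gets permanence 0. The example graph is hypothetical and chosen so that vertex C has deg = 3, I = 2, E_max = 1 and c_in = 1.

```python
from itertools import combinations


def permanence(adj, community_of, v):
    """Perm(v) = I(v) / E_max(v) * 1 / deg(v) - (1 - c_in(v))."""
    own = community_of[v]
    degree = len(adj[v])
    if degree == 0:
        return 0.0  # assumption: isolated vertices score 0
    internal = [w for w in adj[v] if community_of[w] == own]
    external_counts = {}
    for w in adj[v]:
        c = community_of[w]
        if c != own:
            external_counts[c] = external_counts.get(c, 0) + 1
    # Assumption: E_max = 1 when there is no external neighbour.
    e_max = max(external_counts.values(), default=1)
    pairs = list(combinations(internal, 2))
    if pairs:
        # Fraction of pairs of internal neighbours that are themselves connected.
        c_in = sum(1 for a, b in pairs if b in adj[a]) / len(pairs)
    else:
        c_in = 0.0  # assumption: fewer than two internal neighbours
    return (len(internal) / e_max) * (1 / degree) - (1 - c_in)


def network_permanence(adj, community_of):
    """Average of Perm(v) over all vertices."""
    return sum(permanence(adj, community_of, v) for v in adj) / len(adj)


# Hypothetical graph: triangles A-B-C and D-E-F joined by the edge C-D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
community_of = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}

perm_c = permanence(adj, community_of, "C")
perm_a = permanence(adj, community_of, "A")
perm_g = network_permanence(adj, community_of)
```

Here Perm(C) = (2/1) × (1/3) − (1 − 1) ≈ 0.67, while A sits in a clique with no external neighbours and scores 1, in line with the clique case listed above.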
Permanence and
Community Detection: Illustration
Illustration: how community membership alters the permanence scores of vertices 𝐶 and 𝐸

[Figure: a network on vertices A, B, C, D, E, F, drawn once under community Assignment A and once under community Assignment B]

Assignment | Vertex | deg(·) | 𝐼(·) | 𝐸𝑚𝑎𝑥(·) | 𝑐𝑖𝑛(·) | Perm(·)
A          | C      | 3      | 2    | 1       | 1      | 0.67
A          | E      |        |      |         |        |
B          | C      |        |      |         |        |
B          | E      |        |      |         |        |
Permanence Maximization for
Community Detection: MaxPerm
❑Uses greedy approach for producing high permanence partitions in the network

❑Join the small communities if and only if the permanence value of the network increases

❑The basic steps of the algorithm are the same as those of the Louvain method

❑Two basic stages of the algorithm


❑First stage (Permanence maximization):
❑Merging of small communities greedily
❑Merging stops when the maximum permanence gain is attained
❑Second stage (Node aggregation)
❑Build the super-network
❑Final nodes of super-network generated are the final communities
Permanence Maximization for
Community Detection: Limitations
❑Permanence maximization reduces the problem of resolution limit and degeneracy of solutions

❑If a vertex is connected to more than one neighboring community and those communities overlap with each other, then the Permanence maximization method fails to handle the resolution limit

❑For real-world networks, permanence maximization tends to produce small communities


Overlapping Community Detection:
Clique Percolation
❑ Based on iteratively finding and merging adjacent cliques of size k (often referred to as k-cliques; here k counts nodes, unlike the distance-based K-clique defined earlier) into larger communities

❑ Two k-cliques can be merged if they have (k-1) nodes in common

❑ The merging process stops when no more cliques are there to merge

❑ In the resulting communities of the example figure, node 3 is common to both
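The merging rule can be sketched for small graphs as below. The sketch takes each community to be the union of k-cliques that are chained together through shared (k−1)-node overlaps, which is how clique percolation is usually stated; that reading, the brute-force clique enumeration, and all names are assumptions of this example, and the approach is only practical for tiny inputs.

```python
from itertools import combinations


def clique_percolation(adj, k):
    """Communities as unions of k-cliques linked by overlaps of k - 1 nodes."""
    nodes = sorted(adj)
    # Brute-force enumeration of all k-node cliques (fine for toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(nodes, k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())


# Hypothetical example: triangles {1,2,3} and {3,4,5} that share only node 3.
bowtie = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities = clique_percolation(bowtie, 3)
```

The two triangles share one node rather than two, so they are not merged for k = 3, and node 3 ends up in both communities, which is the overlapping behaviour the method is designed to capture.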


Clique Percolation Method:
Limitations
❑There is no fixed value of K, and it is not easy to find a correct value of K

❑Finding a clique in a network is computationally expensive.

❑Method is more like pattern matching applied to the network


Community Detection: Label
Propagation (Dynamic networks)
❑Dynamic networks continuously evolve over time
❑They require algorithms that can be applied directly to the updated part only
❑Label propagation is one such algorithm

❑The basic algorithm:

1. Initialization: Assign a unique community label to every node in the network

2. Inner Iteration: Update the labels of all the nodes in the network
a. Update each node with the label that occurs most frequently among its neighbors’ current labels
b. Break ties at random

3. Outer Iteration: Stop if no label has changed compared to the previous iteration. Else, continue
Community Detection: Label
Propagation
Labels are updated at the end of the inner iteration
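A minimal sketch of the basic algorithm follows, with labels committed only at the end of each inner iteration as stated above. The cap on outer iterations, the seeded random generator, and the function name are assumptions added so that the sketch always terminates and is repeatable; node identifiers are assumed to be sortable.

```python
import random
from collections import Counter


def label_propagation(adj, max_outer=100, seed=0):
    """Synchronous label propagation with random tie-breaking."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}  # initialization: one unique label per node
    for _ in range(max_outer):
        new_labels = {}
        for v in adj:
            if not adj[v]:
                new_labels[v] = labels[v]  # isolated node keeps its label
                continue
            counts = Counter(labels[w] for w in adj[v])
            top = max(counts.values())
            candidates = sorted(l for l, c in counts.items() if c == top)
            new_labels[v] = rng.choice(candidates)  # random tie-break
        if new_labels == labels:
            break  # no label changed in this outer iteration
        labels = new_labels  # labels are committed after the full pass
    return labels


# Hypothetical example: two disconnected triangles.
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
result = label_propagation(graph)
```

Because labels travel only along edges, the labels found in one triangle can never appear in the other; which label wins inside each triangle depends on the random tie-breaks, so different seeds may give different outcomes.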
Label Propagation: Limitations
❑The number of outer iterations required to reach the correct answer cannot be determined in advance for large networks

❑Method is not stable as it involves a random process to break the tie

❑Produces no unique solution but a sequence of many solutions


Overlapping Community Detection:
Generalized Permanence
❑An edge (𝑢, 𝑣) is a shared edge if it lies in more than one community

❑An edge (𝑢, 𝑣) is a non-shared edge if it lies completely in a single community


Overlapping Community Detection:
Generalized Permanence
❑Generalizing Pull: the internal pull of a vertex from its community

$$I^c(v) = \sum_{e \in \Gamma_v^c} \frac{1}{x_e}$$

where $\Gamma_v^c$: set of edges of vertex $v$ that are incident on nodes in community $c$
and $x_e$: number of communities that contain edge $e$
❑Generalizing connectedness: a measure of how strongly a vertex is connected with the internal neighbors of the community

$$\left( 1 - c_{in}^c(v) \right) \times \frac{\sum_{e \in \Gamma_v^c} \frac{1}{x_e}}{I(v)}$$

where $c_{in}^c(v)$: local clustering coefficient of vertex $v$, considering only the subgraph induced by community $c$
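The generalized internal pull can be sketched as below. The sketch assumes that an edge lies in a community when both of its endpoints belong to that community, so x_e is the number of communities the two endpoints share; this reading, the function name, and the toy memberships are assumptions for illustration.

```python
def internal_pull(adj, memberships, v, c):
    """I^c(v): sum of 1 / x_e over edges of v whose other endpoint is also in c."""
    pull = 0.0
    for w in adj[v]:
        if c in memberships[v] and c in memberships[w]:
            # x_e: number of communities containing both endpoints of the edge.
            shared = memberships[v] & memberships[w]
            pull += 1 / len(shared)
    return pull


# Hypothetical example: v belongs to C1 and C3; e is a neighbour in C2 only.
adj = {"v": {"a", "b", "d", "e"}, "a": {"v"}, "b": {"v"}, "d": {"v"}, "e": {"v"}}
memberships = {
    "v": {"C1", "C3"},
    "a": {"C1"},
    "b": {"C1", "C3"},
    "d": {"C3"},
    "e": {"C2"},
}

pull_c1 = internal_pull(adj, memberships, "v", "C1")
pull_c3 = internal_pull(adj, memberships, "v", "C3")
```

The edge v-b is shared by C1 and C3, so it contributes 1/2 to each; summed over both communities the pull equals 3, the number of internal edges of v, which shows how a shared edge splits its weight instead of being counted twice.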
Overlapping Community Detection:
Generalized Permanence: Illustration
❑In the network shown, v is the center node and a part of communities C1 and C3, and C2 is a neighboring community

❑Using the notions above, the Generalized Permanence measure is defined as:

$$P_g^c(v) = \left[ \frac{I^c(v)}{E_{max}(v)} \times \frac{1}{\deg(v)} \right] - \left[ \left( 1 - c_{in}^c(v) \right) \times \frac{\sum_{e \in \Gamma_v^c} \frac{1}{x_e}}{I(v)} \right]$$

▪$I(v)$: Number of internal neighbours of $v$ within its own community

▪$E_{max}(v)$: maximum number of connections of $v$ to neighbors in an external community

▪$c_{in}^c(v)$: internal clustering coefficient of $v$ in community $c$

▪$\Gamma_v^c$: set of edges of vertex $v$ that are incident on nodes in community $c$

▪$x_e$: number of communities that contain edge $e$


Overlapping Community Detection:
Generalized Permanence Maximization
❑Works in a manner similar to MaxPerm, except that
❑the metric being maximized is GenPerm
❑the seed entities are edges

❑Basic Steps of the Algorithm:

1. Initialization: initialize each edge of the network as a single community

2. Update: Follow these steps


a. Calculate the GenPerm value of vertex $v$ with respect to each of the communities it belongs to
b. If $P_g^c(v) > 0$, assign $v$ to that community
c. Calculate the total GenPerm value with respect to the new set of communities
d. Update the neighboring community set of vertex $v$, if the value is improved

3. Convergence: Stop when there is no improvement for any vertex, or when the maximum number of iterations is reached
Local Community Detection:
Subgraph Modularity
❑Subgraph modularity is proposed based on the degree of each vertex
❑Define the adjacency matrix for subgraph $C$ and its neighbors $U$ as:

$$S_{ij} = \begin{cases} 1 & \text{if } i \text{ and } j \text{ are connected, and } i \in C \lor j \in C \\ 0 & \text{otherwise} \end{cases}$$

❑The in-degree of subgraph $C$ is the total number of edges that lie completely in subgraph $C$

$$In(C) = \sum_{i,j} S_{ij} \, \delta(i, j)$$

•$\delta(i, j)$: 1 if both $i$ and $j$ lie in subgraph $C$, and 0 otherwise

❑The out-degree of subgraph $C$ is the total number of edges between $C$ and the remaining part of the network $G$

$$Out(C) = \sum_{i,j} S_{ij} \, \lambda(i, j)$$

•$\lambda(i, j)$: 1 if exactly one of $i$ or $j$ lies in subgraph $C$, and 0 otherwise


Local Community Detection:
Subgraph Modularity Maximization
❑The subgraph modularity $SM$ is defined for the subgraph $C$ of network $G$ as the ratio of the in-degree of subgraph $C$ to the out-degree of the subgraph $C$

$$SM = \frac{In(C)}{Out(C)}$$
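A minimal sketch of the ratio follows, assuming an undirected graph given as an edge list in which every edge appears once, so that In(C) and Out(C) are plain edge counts. Returning infinity when no edge leaves C is a convention chosen for this sketch, and the function name and example graph are hypothetical.

```python
def subgraph_modularity(edges, members):
    """SM = (edges fully inside C) / (edges with exactly one endpoint in C)."""
    members = set(members)
    inside = sum(1 for u, w in edges if u in members and w in members)
    boundary = sum(1 for u, w in edges if (u in members) != (w in members))
    if boundary == 0:
        return float("inf")  # assumption: C has no links to the rest of the network
    return inside / boundary


# Hypothetical example: triangle 0-1-2 plus the edges 2-3, 0-4 and 3-4.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (0, 4), (3, 4)]

sm_triangle = subgraph_modularity(edges, {0, 1, 2})
sm_with_3 = subgraph_modularity(edges, {0, 1, 2, 3})
```

For the triangle, In(C) = 3 and Out(C) = 2, so SM = 1.5; adding node 3 gives In(C) = 4 and Out(C) = 2, so SM rises to 2.0. A local method that maximizes SM would therefore accept that addition.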
