Clustering Social Network Graphs

The document discusses social network clustering and algorithms for identifying communities within networks. It introduces the Girvan-Newman algorithm which uses betweenness centrality to identify edges that connect communities. The algorithm calculates the betweenness of each edge by finding the number of shortest paths between all nodes that pass through that edge. Edges with the highest betweenness are removed iteratively until individual communities are identified. However, the algorithm has disadvantages in that nodes cannot belong to multiple communities and certain boundary nodes may be incorrectly removed from their own community.
CLUSTERING SOCIAL NETWORK GRAPHS
Introduction

• A social network is a nonrandom collection of entities in a network, with at least one relationship between them
• Social networks contain communities of entities that are connected by many edges
• E.g., groups of friends at school, researchers interested in the same topic, etc.
• Communities can be identified by clustering
Disadvantages of Standard Clustering Algorithms
• Absence of a proper distance measure
• Sub-communities will not be identified
• Possibility of nodes from different clusters getting combined
• Possibility of wrong clustering in both k-means and hierarchical clustering
Solving the problem: "Betweenness"
• Betweenness of an edge (a, b) is the number of pairs of nodes x and y such that the edge (a, b) lies on the shortest path between x and y; when x and y have several shortest paths, the edge is credited with the fraction of those paths that pass through it
• Finding the edges that are least likely to be inside a community
• Large betweenness shows that the edge runs between two different communities
The Girvan-Newman Algorithm
• Used for calculating the number of shortest paths going through each edge
• Visits each node X once and computes, for each edge, the number of shortest paths from X to each of the other nodes that go through that edge
• STEPS
1. Perform a breadth-first search (BFS) of the graph, starting at the node X
2. Label each node with the number of shortest paths that reach it from the root: the root X gets 1, and every other node Y gets the sum of the labels of its parents
3. Calculate for each edge e the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e
4. Repeat for all nodes
5. Complete the credit calculation
The Girvan-Newman Algorithm contd...
Step 1 - Perform a breadth-first search (BFS) of the graph, starting at the root node X
The Girvan-Newman Algorithm contd...
Step 2 - Label each node with the number of shortest paths that reach it from the root: the root gets 1, and every other node Y gets the sum of the labels of its parents in the level above
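Steps 1 and 2 can be sketched in Python as follows. This is a minimal sketch rather than the slides' own code: the names `build_graph` and `bfs_path_counts`, the adjacency-dict representation, and the seven-node example graph are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical example graph (not taken from the slides): two tightly knit
# groups {A, B, C} and {D, E, F, G} joined by the single edge B-D.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Build an undirected adjacency dict {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def bfs_path_counts(graph, root):
    """Return (level, label) for every node reachable from root.

    level[n] is the BFS depth of n; label[n] is the number of shortest
    paths from root to n (Step 2).
    """
    level = {root: 0}
    label = {root: 1}          # the root is labelled 1
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in level:             # first visit fixes the level
                level[nbr] = level[node] + 1
                label[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                # node is a parent of nbr, so nbr inherits node's label
                label[nbr] += label[node]
    return level, label


if __name__ == "__main__":
    level, label = bfs_path_counts(build_graph(EDGES), "E")
    print(level)
    print(label)
```

With E as the root of this example graph, G receives the label 2 because it is reached by two shortest paths, one through D and one through F.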
The Girvan-Newman Algorithm contd...
Step 3 - Calculate for each edge e the sum over all nodes Y of the fraction of shortest paths from the root X to Y that go through e
The rules for the calculation are as follows:
1. Each leaf in the DAG (a leaf is a node with no DAG edges to nodes at levels below) gets a credit of 1.
2. Each node that is not a leaf gets a credit equal to 1 plus the sum of the credits of the DAG edges from that node to the level below.
3. A DAG edge e entering node Z from the level above is given a share of the credit of Z proportional to the fraction of shortest paths from the root to Z that go through e.
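The three credit rules can be sketched in Python for a single root. This is an illustrative sketch under stated assumptions: the function name `edge_credits_from_root`, the adjacency-dict representation, and the seven-node example graph are hypothetical and do not come from the slides.

```python
from collections import deque

# Hypothetical example graph (not taken from the slides).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Build an undirected adjacency dict {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_credits_from_root(graph, root):
    """Return {frozenset({u, v}): credit} for the DAG edges seen from root."""
    # Steps 1 and 2: BFS levels and shortest-path labels.
    level, label, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                label[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                label[nbr] += label[node]

    # Step 3: work bottom-up, so every child is handled before its parents.
    node_credit = {node: 1.0 for node in order}   # rules 1 and 2: start at 1
    edge_credit = {}
    for z in reversed(order):
        for parent in graph[z]:
            if level[parent] == level[z] - 1:     # DAG edge parent -> z
                # Rule 3: share of z's credit in proportion to the fraction
                # of shortest paths to z that arrive through this parent.
                share = node_credit[z] * label[parent] / label[z]
                edge_credit[frozenset((parent, z))] = share
                node_credit[parent] += share      # rule 2: add edge credits
    return edge_credit


if __name__ == "__main__":
    for edge, credit in edge_credits_from_root(build_graph(EDGES), "E").items():
        print(sorted(edge), credit)
```

Edges between two nodes on the same BFS level (A-C when E is the root of this example graph) are not DAG edges, so they receive no credit from that root.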
The Girvan-Newman Algorithm contd...
Steps 4 & 5 - Repeat for all nodes and complete the credit calculation
After every node has served as the root, sum the credits that each edge received. Since each shortest path will have been discovered twice (once when each of its endpoints is the root), we must divide the credit for each edge by 2.
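Putting the steps together, the betweenness of every edge can be sketched as below. The function names, the adjacency-dict representation, and the seven-node example graph are assumptions for illustration only and are not part of the slides.

```python
from collections import deque

# Hypothetical example graph (not taken from the slides).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Build an undirected adjacency dict {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_credits_from_root(graph, root):
    """Steps 1-3 for one root: BFS, path labels, then bottom-up credits."""
    level, label, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nbr in graph[node]:
            if nbr not in level:
                level[nbr] = level[node] + 1
                label[nbr] = 0
                queue.append(nbr)
            if level[nbr] == level[node] + 1:
                label[nbr] += label[node]
    node_credit = {node: 1.0 for node in order}
    edge_credit = {}
    for z in reversed(order):
        for parent in graph[z]:
            if level[parent] == level[z] - 1:
                share = node_credit[z] * label[parent] / label[z]
                edge_credit[frozenset((parent, z))] = share
                node_credit[parent] += share
    return edge_credit


def edge_betweenness(graph):
    """Steps 4 and 5: sum credits over all roots, then divide by 2."""
    totals = {}
    for root in graph:
        for edge, credit in edge_credits_from_root(graph, root).items():
            totals[edge] = totals.get(edge, 0.0) + credit
    # Each shortest path was counted once from each of its two endpoints.
    return {edge: total / 2 for edge, total in totals.items()}


if __name__ == "__main__":
    scores = edge_betweenness(build_graph(EDGES))
    for edge in sorted(scores, key=scores.get, reverse=True):
        print(sorted(edge), scores[edge])
```

On this example graph the edge B-D scores 12, since all 3 × 4 pairs with one node in {A, B, C} and the other in {D, E, F, G} have their shortest paths running through it.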
Girvan-Newman Algorithm & Betweenness
• Remove the edges with the highest credit (betweenness) value first
• Stop when the graph has broken apart and every individual is assigned to a cluster
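The removal loop can be sketched in Python as below. Several choices here are assumptions that the slides do not specify: betweenness is recomputed after every removal, the loop stops once a caller-supplied number of communities (`target_communities`, a hypothetical parameter) is reached, and the example graph and all function names are illustrative.

```python
from collections import deque

# Hypothetical example graph (not taken from the slides).
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]


def build_graph(edges):
    """Build an undirected adjacency dict {node: set of neighbours}."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def edge_betweenness(graph):
    """Credit every edge from every root, then halve the totals."""
    totals = {}
    for root in graph:
        level, label, order = {root: 0}, {root: 1}, []
        queue = deque([root])
        while queue:                      # BFS with shortest-path labels
            node = queue.popleft()
            order.append(node)
            for nbr in graph[node]:
                if nbr not in level:
                    level[nbr] = level[node] + 1
                    label[nbr] = 0
                    queue.append(nbr)
                if level[nbr] == level[node] + 1:
                    label[nbr] += label[node]
        credit = {node: 1.0 for node in order}
        for z in reversed(order):         # bottom-up credit sharing
            for parent in graph[z]:
                if level[parent] == level[z] - 1:
                    share = credit[z] * label[parent] / label[z]
                    edge = frozenset((parent, z))
                    totals[edge] = totals.get(edge, 0.0) + share
                    credit[parent] += share
    return {edge: total / 2 for edge, total in totals.items()}


def connected_components(graph):
    """Return the connected components as a list of node sets."""
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, stack = {start}, [start]
        while stack:
            node = stack.pop()
            for nbr in graph[node]:
                if nbr not in part:
                    part.add(nbr)
                    stack.append(nbr)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(graph, target_communities):
    """Remove highest-betweenness edges until enough components exist."""
    work = {node: set(nbrs) for node, nbrs in graph.items()}  # keep input intact
    parts = connected_components(work)
    while len(parts) < target_communities:
        scores = edge_betweenness(work)
        if not scores:
            break                         # no edges left to remove
        u, v = max(scores, key=scores.get)
        work[u].discard(v)
        work[v].discard(u)
        parts = connected_components(work)
    return parts


if __name__ == "__main__":
    for community in girvan_newman(build_graph(EDGES), 2):
        print(sorted(community))
```

Asking for two communities on this example graph removes only the edge B-D, which leaves {A, B, C} and {D, E, F, G}.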
The Girvan-Newman Algorithm Disadvantages
• A node cannot belong to two different communities at the same time
• Certain boundary nodes may be separated from their own community because of their association with another community
THANK YOU