Community Detection in Social Networks
Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
[Figure: network of query and advertiser nodes]
Edge betweenness in a real network
[Figure: two example edges with betweenness b=16 and b=7.5]
[Girvan-Newman 02]
Girvan-Newman Algorithm:
Undirected unweighted networks
[Figure: example network with edge betweenness values 12, 33, 49]
Need to re-compute betweenness at every step
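A minimal sketch of the removal loop, assuming the usual Girvan-Newman procedure (compute edge betweenness, remove the edge with the highest value, repeat). Betweenness is computed by brute force from its definition, the fraction of shortest paths between each pair of nodes that use the edge. The function names, the stopping rule `k`, and the 7-node example graph are illustrative assumptions, not taken from the slides.

```python
from collections import deque

def bfs_counts(adj, source):
    """Distance and number of shortest paths from source to every reachable node."""
    dist, paths = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], paths[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                paths[v] += paths[u]
    return dist, paths

def edge_betweenness(adj):
    """Brute-force betweenness: sum over node pairs of the fraction of
    shortest paths that pass over each edge."""
    info = {s: bfs_counts(adj, s) for s in adj}
    score = {}
    for u in adj:
        for v in adj[u]:
            if not u < v:
                continue                      # visit each undirected edge once
            total = 0.0
            for s in adj:
                ds, ps = info[s]
                for t in ds:                  # only targets reachable from s
                    if t == s:
                        continue
                    dt, pt = info[t]
                    for a, b in ((u, v), (v, u)):
                        if ds[a] + 1 + dt[b] == ds[t]:
                            total += ps[a] * pt[b] / ps[t]
            score[(u, v)] = total / 2         # ordered pairs counted twice
    return score

def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in group:
                group.add(u)
                stack.extend(adj[u] - group)
        seen |= group
        result.append(group)
    return result

def girvan_newman(adj, k):
    """Remove highest-betweenness edges until the graph has k components."""
    adj = {u: set(vs) for u, vs in adj.items()}   # work on a copy
    while len(components(adj)) < k:
        scores = edge_betweenness(adj)            # re-computed at every step
        if not scores:
            break
        u, v = max(scores, key=scores.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Hypothetical example: a triangle A-B-C joined to D-E-F-G by the edge B-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

communities = girvan_newman(graph, k=2)
```

Ties for the highest betweenness are broken arbitrarily here; removing all tied edges at once is an equally reasonable choice.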
[Figure: Girvan-Newman example, Step 1, Step 2, Step 3]
Want to compute betweenness of paths starting at a given node
The algorithm:
Add edge flows:
-- node flow = 1 + sum of child edge flows
-- split the flow up based on the parent value
Repeat the BFS procedure for each starting node
[Figure callouts: 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2; 1 path to K, split evenly]
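The flow procedure above can be sketched as follows. This is a minimal sketch under stated assumptions: the "parent value" is taken to be the number of shortest paths from the root to that parent, the per-root totals are halved at the end because every path is seen from both of its endpoints, and the graph is a hypothetical 7-node example rather than the one in the slide figure.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours.
    """
    betweenness = {}
    for root in adj:
        # BFS from the root, counting shortest paths to every node.
        dist = {root: 0}
        paths = {root: 1.0}
        parents = {root: []}
        order = []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    paths[v] = 0.0
                    parents[v] = []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    parents[v].append(u)
        # Walk back up from the deepest nodes:
        # node flow = 1 + sum of child edge flows, split among the
        # parents in proportion to their shortest-path counts.
        flow = {u: 1.0 for u in order}
        for v in reversed(order):
            for p in parents[v]:
                share = flow[v] * paths[p] / paths[v]
                edge = frozenset((p, v))
                betweenness[edge] = betweenness.get(edge, 0.0) + share
                flow[p] += share
    # Repeating the BFS for each starting node counts every path twice.
    return {edge: total / 2 for edge, total in betweenness.items()}

# Hypothetical example: a triangle A-B-C joined to D-E-F-G by the edge B-D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("D", "E"),
         ("D", "F"), ("D", "G"), ("E", "F"), ("F", "G")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

bw = edge_betweenness(graph)
```

In this example the bridge B-D carries all 3 x 4 = 12 shortest paths between the two sides, so it has the highest betweenness.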
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
Communities: sets of
tightly connected nodes
Define: Modularity
Note:
Aij = 1 if i→j (there is an edge between i and j), 0 else
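The modularity formula itself did not survive on these slides, so the sketch below assumes the standard Newman-Girvan definition, Q = (1/2m) Σ over groups s and pairs i, j in s of (Aij − ki·kj / 2m), where ki is the degree of node i and m is the number of edges. The function name and the two-triangle example graph are illustrative assumptions.

```python
def modularity(adj, communities):
    """Modularity of a partition of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours;
    communities is a list of disjoint node sets.
    """
    degree = {u: len(vs) for u, vs in adj.items()}
    two_m = sum(degree.values())          # sum of degrees = 2m
    q = 0.0
    for group in communities:
        for i in group:
            for j in group:
                a_ij = 1.0 if j in adj[i] else 0.0
                q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m

# Hypothetical example: two triangles joined by the single edge 3-4.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

q_split = modularity(graph, [{1, 2, 3}, {4, 5, 6}])
q_single = modularity(graph, [{1, 2, 3, 4, 5, 6}])
```

Splitting the graph at the bridge scores higher than keeping everything in one group, which always scores 0 under this definition.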
Bi-partitioning task: divide the vertices into two disjoint groups A and B
Questions:
[Figure: example graph split into groups A and B, with cut(A,B) = 2]
Criterion: Minimum-cut
Minimize weight of connections between groups
Problem:
Only considers external cluster connections
Does not consider internal cluster connectivity
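A small sketch of the cut criterion and of the problem noted above. The 6-node edge list is an assumed example chosen so that the balanced split has cut(A,B) = 2; the pendant node 7 is a hypothetical addition that shows how minimum cut can prefer a degenerate partition.

```python
def cut_size(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    return sum(1 for u, v in edges if (u in group_a) != (v in group_a))

# Assumed example graph on nodes 1..6.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
balanced = cut_size(edges, {1, 2, 3})

# Hypothetical extra node 7 hanging off node 6.
edges_with_pendant = edges + [(6, 7)]
degenerate = cut_size(edges_with_pendant, {7})
still_balanced = cut_size(edges_with_pendant, {1, 2, 3})
```

Cutting off the single node 7 costs less than the balanced split, even though it says nothing useful about the structure inside the two groups.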
[Shi-Malik]
Why
jth coordinate of A·x: $\sum_i A_{ji} x_i$, the sum of the x-values of the neighbors of j
Details!
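A bit of intuition:
[Figure: graph with groups A and B] $\lambda_n = \lambda_{n-1}$ and $\lambda_n - \lambda_{n-1} \approx 0$
More intuition:
[Figure] $\lambda_n = \lambda_{n-1}$ and $\lambda_n - \lambda_{n-1} \approx 0$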
Important properties:
Symmetric matrix
Eigenvectors are real and orthogonal
[Figure: Laplacian matrix with -1 off-diagonal entries]
$L = D - A$
Important properties:
Details!
(a) all eigenvalues of $L$ are $\geq 0$; (b) $x^T L x \geq 0$ for every $x$; (c) $L = N^T N$ for some matrix $N$
Proof:
(c) ⇒ (b): $x^T L x = x^T N^T N x = (Nx)^T (Nx) \geq 0$
(b) ⇒ (a): Let $\lambda$ be an eigenvalue of $L$ with eigenvector $x$. Then by (b) $x^T L x \geq 0$, so $x^T L x = x^T \lambda x = \lambda x^T x \geq 0 \Rightarrow \lambda \geq 0$
(a) ⇒ (c): is also easy! Do it yourself.
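$\lambda_2 = \min_x \frac{x^T M x}{x^T x}$ (minimum over x orthogonal to the eigenvector of the smallest eigenvalue)
$x^T L x = \sum_{i,j} L_{ij} x_i x_j = \sum_{i,j} (D_{ij} - A_{ij}) x_i x_j = \sum_i D_{ii} x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j = \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$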
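The identity x^T L x = Σ over edges (i,j) of (x_i − x_j)^2 can be checked numerically. The sketch below builds L = D − A for an assumed 6-node edge list and an arbitrary integer labeling x; both the graph and the labeling are illustrative choices.

```python
def laplacian(n, edges):
    """Laplacian L = D - A of an undirected graph on nodes 0..n-1."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1          # degree on the diagonal
        L[j][j] += 1
        L[i][j] -= 1          # -1 for every edge
        L[j][i] -= 1
    return L

def quadratic_form(L, x):
    """x^T L x written out as a double sum."""
    n = len(x)
    return sum(L[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def edge_sum(edges, x):
    """Sum of squared label differences across the edges."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

# Assumed example graph and an arbitrary labeling of its nodes.
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
x = [3, -1, 4, 1, -5, 9]

lhs = quadratic_form(laplacian(6, edges), x)
rhs = edge_sum(edges, x)
```

Integer labels keep the comparison exact, so `lhs == rhs` holds without any floating-point tolerance.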
Details!
Write x in the basis of eigenvectors $w_1, \dots, w_n$ of M: $x = \sum_i \alpha_i w_i$. Then $x^T M x = \sum_i \lambda_i \alpha_i^2$
To minimize this over all unit vectors x orthogonal to $w_1$:
min over choices of $(\alpha_1, \dots, \alpha_n)$ so that:
$\sum_i \alpha_i^2 = 1$ (unit length), $\alpha_1 = 0$ (orthogonal to $w_1$)
To minimize this, set $\alpha_2 = 1$ and so $\sum_i \lambda_i \alpha_i^2 = \lambda_2$
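Remember: $\lambda_2 = \min \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$ over all labelings of nodes i so that $\sum_i x_i = 0$
[Figure: nodes placed on a line at values $x_i$ and $x_j$] Balance to minimize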
[Figure: partition labels $y_i = +1$ and $y_j = -1$ on either side of 0; relaxed node values $x_i$ and $x_j$]
Details!
Proof:
$\lambda_2 \leq \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} \leq 2\alpha$ ($\lambda_2$ is only smaller)
(while also $\sum_i x_i = 0$)
Details!
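Proof (continued):
1) Let's set: $x_i = -\frac{1}{a}$ if $i \in A$, $x_i = +\frac{1}{b}$ if $i \in B$, where $a = |A|$ and $b = |B|$
Then $\sum_i x_i = a \left(-\frac{1}{a}\right) + b \left(\frac{1}{b}\right) = 0$
2) Then, with $e$ the number of edges between A and B:
$\frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} = \frac{e \left(\frac{1}{a} + \frac{1}{b}\right)^2}{a \left(\frac{1}{a}\right)^2 + b \left(\frac{1}{b}\right)^2} = e \left(\frac{1}{a} + \frac{1}{b}\right) \leq e \left(\frac{1}{a} + \frac{1}{a}\right) = \frac{2e}{a} = 2\alpha$
(using $a \leq b$ and $\alpha = e/a$)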
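A worked instance of this step, as a sketch under stated assumptions: the symbols a = |A|, b = |B|, e = number of edges between A and B, and α = e/a follow the usual statement of the bound, and the 6-node graph and its split are an illustrative example. Exact fractions are used so the equalities can be checked without rounding.

```python
from fractions import Fraction

def quotient_for_partition(edges, group_a, group_b):
    """Value of sum (x_i - x_j)^2 / sum x_i^2 for x_i = -1/a on A, +1/b on B."""
    a, b = len(group_a), len(group_b)
    x = {i: Fraction(-1, a) for i in group_a}
    x.update({i: Fraction(1, b) for i in group_b})
    assert sum(x.values()) == 0           # the labeling is balanced
    numerator = sum((x[i] - x[j]) ** 2 for i, j in edges)
    denominator = sum(v * v for v in x.values())
    return numerator / denominator

# Assumed example graph and partition.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
A, B = {1, 2, 3}, {4, 5, 6}

e = sum(1 for i, j in edges if (i in A) != (j in A))   # edges between A and B
alpha = Fraction(e, len(A))
quotient = quotient_for_partition(edges, A, B)
closed_form = e * (Fraction(1, len(A)) + Fraction(1, len(B)))
```

Here the quotient is 4/3, which equals e(1/a + 1/b) and does not exceed 2α = 4/3, so any value of λ2 for this graph must be at most 4/3.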
Details!
$2\alpha \geq \lambda_2 \geq \frac{\alpha^2}{2 k_{max}}$
where $k_{max}$ is the maximum node degree in the graph
Spectral Clustering
2) Decomposition
Compute eigenvalues and eigenvectors of the matrix
Map each point to a lower-dimensional representation
based on one or more eigenvectors
3) Grouping
Assign points to two or more clusters, based on the new
representation
1) Pre-processing:
Build Laplacian matrix L of the graph
[Figure: Laplacian matrix of the example graph]
2) Decomposition:
Find eigenvalues λ and eigenvectors x of the matrix L
[Figure: eigenvalues 0.0, 1.0, 3.0, 3.0, 4.0, 5.0 and the matrix X of eigenvectors]
Map vertices to corresponding components of x2, the eigenvector of λ2: 0.3, 0.6, 0.3, -0.3, -0.3, -0.6
How do we now find the clusters?
3) Grouping:
Sort components of reduced 1-dimensional vector
Identify clusters by splitting the sorted vector in two
[Figure: the components of x2 (0.3, 0.6, 0.3, -0.3, -0.3, -0.6) split into a positive group and a negative group]
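The three stages can be sketched end to end in plain Python. Several things here are assumptions: the 6-node edge list is a hypothetical graph whose second eigenvector is proportional to the x2 values quoted in the slides, the split point is simply 0, and the eigenvector is found by shifted power iteration (with the all-ones direction projected out) rather than by a full eigendecomposition.

```python
import math

def fiedler_vector(adj, iterations=500):
    """Approximate x2, the eigenvector of the second-smallest Laplacian eigenvalue.

    Power iteration on (shift*I - L), restricted to vectors that sum to 0.
    """
    nodes = sorted(adj)
    n = len(nodes)
    index = {u: i for i, u in enumerate(nodes)}
    shift = 2 * max(len(adj[u]) for u in nodes)   # upper bound on eigenvalues of L
    x = [float(i) for i in range(n)]              # arbitrary starting vector
    for _ in range(iterations):
        mean = sum(x) / n
        x = [v - mean for v in x]                 # stay orthogonal to (1,...,1)
        y = []
        for u in nodes:
            i = index[u]
            # (L x)_i = degree(i) * x_i - sum of x over the neighbours of i
            lx = len(adj[u]) * x[i] - sum(x[index[v]] for v in adj[u])
            y.append(shift * x[i] - lx)
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return dict(zip(nodes, x))

# 1) Pre-processing: assumed example graph, stored as adjacency sets.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

# 2) Decomposition: map every vertex to its component of x2.
x2 = fiedler_vector(graph)

# 3) Grouping: sort the components and split the sorted vector in two at 0.
ranked = sorted(x2, key=x2.get)
cluster_low = {u for u in ranked if x2[u] < 0}
cluster_high = {u for u in ranked if x2[u] >= 0}
```

The overall sign of an eigenvector is arbitrary, so which of the two groups ends up negative carries no meaning; only the split itself does.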
[Figure: value of x2 plotted against rank in x2]
[Figure: components of x2; value of x2 plotted against rank in x2]
[Figure: components of x1 and components of x3]
Well-separated space
K3,4: |X| = s = 3, |Y| = t = 4, fully connected (every node in X links to every node in Y)
[Agrawal-Srikant 99]
Goal:
Find all subsets T s.t. T ⊆ Si for at least f sets Si
(items in T were bought together at least f times)
How?
View each node i as a set Si of nodes i points to
Ks,t = a set Y of size t that occurs in s sets Si
Looking for Ks,t: set the frequency threshold to s and look at layer t, i.e. all frequent sets of size t
[Figure: node i with Si={a,b,c,d}; nodes i, j, k pointing into the set Y]
s … minimum support (|X|=s)
t … itemset size (|Y|=t)
[Figure: nodes x, y, z and the set Y containing a, b, c; Si={a,b,c,d}]
We found Ks,t!
Ks,t = a set Y of size t that occurs in s sets Si
Itemsets:
a = {b,c,d}
b = {d}
c = {b,d,e,f}
d = {e,f}
e = {b,d}
f = {}
[Figure: example graph on nodes a to f]
{b,d}: support 3
{e,f}: support 2
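The example above can be reproduced with a short sketch. It enumerates every candidate set Y of size t by brute force and keeps those contained in at least s of the sets Si; a real trawling run would use a frequent-itemset algorithm such as A-Priori instead. The function name `find_kst` and the choice s = 2, t = 2 are illustrative assumptions.

```python
from itertools import combinations

def find_kst(out_links, s, t):
    """Map each t-set Y to the nodes X whose sets Si contain Y, when |X| >= s."""
    targets = sorted({v for links in out_links.values() for v in links})
    result = {}
    for y in combinations(targets, t):
        x = sorted(i for i, links in out_links.items() if set(y) <= links)
        if len(x) >= s:                   # Y occurs in at least s sets Si
            result[y] = x
    return result

# Each node i viewed as the set Si of nodes it points to (from the example).
out_links = {
    "a": {"b", "c", "d"},
    "b": {"d"},
    "c": {"b", "d", "e", "f"},
    "d": {"e", "f"},
    "e": {"b", "d"},
    "f": set(),
}

found = find_kst(out_links, s=2, t=2)
```

Each entry of `found` is one complete bipartite subgraph: the value lists the nodes on the left (X) and the key is the set Y on the right.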
[Kumar, Raghavan, Rajagopalan, Tomkins: Trawling the Web for emerging cyber-communities 1999]