Community Detection in Social Networks

1. The document summarizes key concepts from the book "Mining of Massive Datasets" by Jure Leskovec, Anand Rajaraman, and Jeff Ullman. 2. It discusses algorithms for finding communities or clusters in networks, including the Girvan-Newman algorithm which removes edges with highest betweenness to partition a network hierarchically. 3. It also covers the concept of modularity, a measure of how well a network is partitioned into communities, and how it can be used to select the optimal number of clusters.

Uploaded by

Arun Manick

Note to other teachers and users of these slides: We would be delighted if you found our
material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify
them to fit your own needs. If you make use of a significant portion of these slides in your own
lecture, please include this message, or a link to our web site: http://www.mmds.org

Mining of Massive Datasets

Jure Leskovec, Anand Rajaraman, Jeff Ullman
Stanford University

http://www.mmds.org

We often think of networks being organized into modules, clusters, communities:

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org

Find micro-markets by partitioning the query-to-advertiser graph:

[Figure: query-to-advertiser graph, labeled "query" and "advertiser"]

[Andersen, Lang: Communities from seed sets, 2006]

Clusters in Movies-to-Actors graph:

[Andersen, Lang: Communities from seed sets, 2006]


Discovering social circles, circles of trust:

[McAuley, Leskovec: Discovering social circles in ego networks, 2012]


How to find communities?

We will work with undirected (unweighted) networks


Edge betweenness: Number of shortest paths passing over the edge
Intuition:

[Figure: example edges with betweenness b=16 and b=7.5]
[Figure: Edge strengths (call volume) in a real network]
[Figure: Edge betweenness in a real network]

[Girvan-Newman 02]

Divisive hierarchical clustering based on the notion of edge betweenness:
Number of shortest paths passing through the edge

Girvan-Newman Algorithm:
Undirected unweighted networks
Repeat until no edges are left:
Calculate betweenness of edges
Remove edges with highest betweenness
Connected components are communities
Gives a hierarchical decomposition of the network
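The loop above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names (`edge_betweenness`, `components`, `girvan_newman`) and the small test graph (two triangles joined by one bridge edge) are assumptions made for the example. Betweenness is recomputed after every removal, and ties for the highest value are removed together.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {node: set(neighbors)}."""
    score = {}
    for root in adj:
        # BFS from root: visit order, number of shortest paths, BFS parents.
        dist, paths, parents, order = {root: 0}, {root: 1}, {root: []}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], paths[v], parents[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    paths[v] += paths[u]
                    parents[v].append(u)
        # Work back up the tree, splitting each node's flow among its parents.
        flow = {u: 1.0 for u in order}
        for v in reversed(order):
            for u in parents[v]:
                share = flow[v] * paths[u] / paths[v]
                edge = frozenset((u, v))
                score[edge] = score.get(edge, 0.0) + share
                flow[u] += share
    # Every shortest path was counted once from each of its two endpoints.
    return {edge: total / 2 for edge, total in score.items()}

def components(adj):
    """Connected components as a list of node sets."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman(adj):
    """Return the successive partitions produced while edges are removed."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    levels = [components(adj)]
    while any(adj.values()):
        bt = edge_betweenness(adj)               # recomputed at every step
        top = max(bt.values())
        for edge in [e for e, b in bt.items() if abs(b - top) < 1e-9]:
            u, v = tuple(edge)
            adj[u].discard(v)
            adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):
            levels.append(comps)
    return levels

# Hypothetical test graph: triangles {1,2,3} and {4,5,6} joined by the edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
levels = girvan_newman(graph)
for level in levels:
    print(sorted(sorted(c) for c in level))
```

On this graph the bridge edge carries the most shortest paths, so it is removed first and the two triangles appear as the first split; the remaining edges then tie and are removed together.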

Need to re-compute betweenness at every step

[Figure: example network; labels 12 and 33]

[Figure: Steps 1, 2 and 3 of the algorithm, and the resulting hierarchical network decomposition]

Communities in physics collaborations


Zachary's Karate club:

Hierarchical decomposition

1. How to compute betweenness?
2. How to select the number of clusters?

Want to compute betweenness of paths starting at node $A$

Breadth first search starting from $A$:

Count the number of shortest paths from $A$ to all other nodes of the network:

Compute betweenness by working up the tree: If there are multiple paths count them fractionally

The algorithm:
Add edge flows:
-- node flow = 1 + ∑ child edges
-- split the flow up based on the parent value
Repeat the BFS procedure for each starting node

[Figure annotations:]
1+1 paths to H. Split evenly
1+0.5 paths to J. Split 1:2
1 path to K. Split evenly
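A single pass of this procedure (one starting node) might look like the sketch below. The function name `edge_flows_from` and the seven-node graph are assumptions chosen for illustration; it is not the graph drawn on the slide. The parent value used for splitting is the number of shortest paths reaching each parent.

```python
from collections import deque

def edge_flows_from(adj, root):
    """Edge flows contributed by shortest paths that start at `root`."""
    level, paths, order = {root: 0}, {root: 1}, []
    queue = deque([root])
    while queue:                               # BFS, counting shortest paths
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in level:
                level[v], paths[v] = level[u] + 1, 0
                queue.append(v)
            if level[v] == level[u] + 1:
                paths[v] += paths[u]
    node_flow = {u: 1.0 for u in order}        # node flow starts at 1
    edge_flow = {}
    for v in reversed(order):                  # work up the tree, deepest first
        for u in adj[v]:
            if level[u] == level[v] - 1:       # u is a parent of v
                share = node_flow[v] * paths[u] / paths[v]
                edge_flow[frozenset((u, v))] = share
                node_flow[u] += share          # 1 + sum of child edges
    return edge_flow

# Hypothetical example graph (an assumption, not the slide's figure).
edge_list = ["AB", "AC", "BC", "BD", "DE", "DF", "DG", "EF", "FG"]
adj = {}
for u, v in edge_list:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

flows = edge_flows_from(adj, "E")
for edge, value in sorted((sorted(e), f) for e, f in flows.items()):
    print("-".join(edge), value)
```

Node G is reached by two shortest paths, one through D and one through F, so its flow of 1 is split evenly between the two edges above it. Summing these flows over every starting node and halving the total (each path is seen from both ends) gives the edge betweenness.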

1. How to compute betweenness?
2. How to select the number of clusters?

Communities: sets of tightly connected nodes
Define: Modularity $Q$
A measure of how well a network is partitioned into communities
Given a partitioning of the network into groups $s \in S$:
$Q \propto \sum_{s \in S} [\,(\#\text{ edges within group } s) - (\text{expected } \#\text{ edges within group } s)\,]$
Need a null model!

Given real $G$ on $n$ nodes and $m$ edges, construct rewired network $G'$
Same degree distribution but random connections
Consider $G'$ as a multigraph
The expected number of edges between nodes $i$ and $j$ of degrees $k_i$ and $k_j$ equals: $k_i \cdot \frac{k_j}{2m} = \frac{k_i k_j}{2m}$
The expected number of edges in (multigraph) $G'$:
$\frac{1}{2} \sum_{i \in N} \sum_{j \in N} \frac{k_i k_j}{2m} = \frac{1}{2} \cdot \frac{1}{2m} \sum_{i \in N} k_i \left( \sum_{j \in N} k_j \right) = \frac{1}{4m} \cdot 2m \cdot 2m = m$
Note: $\sum_{u \in N} k_u = 2m$

Modularity of partitioning $S$ of graph $G$:
$Q \propto \sum_{s \in S} [\,(\#\text{ edges within group } s) - (\text{expected } \#\text{ edges within group } s)\,]$
$Q(G, S) = \frac{1}{2m} \sum_{s \in S} \sum_{i \in s} \sum_{j \in s} \left( A_{ij} - \frac{k_i k_j}{2m} \right)$
Normalizing const. $\frac{1}{2m}$: $-1 < Q < 1$
$A_{ij} = 1$ if $i \to j$ (an edge joins $i$ and $j$), 0 else

Modularity values take range $[-1, 1]$
It is positive if the number of edges within groups exceeds the expected number
$Q$ greater than 0.3-0.7 means significant community structure
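As a rough check of the modularity formula $Q = \frac{1}{2m} \sum_{s} \sum_{i,j \in s} (A_{ij} - \frac{k_i k_j}{2m})$, the sketch below evaluates it for a given partition. The function name `modularity` and the six-node edge list are assumptions made for illustration.

```python
def modularity(edges, communities):
    """Modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    group_of = {node: idx for idx, group in enumerate(communities) for node in group}
    # Sum of A_ij over ordered pairs inside a group: each internal edge counts twice.
    internal = [0] * len(communities)
    for u, v in edges:
        if group_of[u] == group_of[v]:
            internal[group_of[u]] += 2
    total = 0.0
    for idx, group in enumerate(communities):
        k_sum = sum(degree.get(node, 0) for node in group)
        # Sum of k_i * k_j over ordered pairs in the group equals k_sum squared.
        total += internal[idx] - k_sum ** 2 / (2 * m)
    return total / (2 * m)

# Hypothetical 6-node graph with two visibly denser halves.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
q_split = modularity(edges, [{1, 2, 3}, {4, 5, 6}])
q_single = modularity(edges, [{1, 2, 3, 4, 5, 6}])
print(q_split, q_single)
```

Putting every node in one group gives exactly the expected number of internal edges, so its modularity is 0, while the two-group split scores higher. Comparing such values across the levels of a hierarchical decomposition is one way to pick the number of clusters.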

Modularity is useful for selecting the number of clusters:

Next time: Why not optimize Modularity directly?

Undirected graph $G(V, E)$:

Bi-partitioning task:
Divide vertices into two disjoint groups $A$, $B$

Questions:
How can we define a "good" partition of $G$?
How can we efficiently identify such a partition?

What makes a good partition?
Maximize the number of within-group connections
Minimize the number of between-group connections

Express partitioning objectives as a function of the "edge cut" of the partition
Cut: Set of edges with only one vertex in a group:
$cut(A, B) = \sum_{i \in A,\, j \in B} w_{ij}$

[Figure: example graph split into groups A and B, with cut(A,B) = 2]

Criterion: Minimum-cut
Minimize weight of connections between groups
$\arg\min_{A,B} cut(A, B)$

Degenerate case:
[Figure: "Optimal cut" versus "Minimum cut"]

Problem:
Only considers external cluster connections
Does not consider internal cluster connectivity

Criterion: Normalized-cut [Shi-Malik, 97]
Connectivity between groups relative to the density of each group
$ncut(A, B) = \frac{cut(A, B)}{vol(A)} + \frac{cut(A, B)}{vol(B)}$
$vol(A)$: total weight of the edges with at least one endpoint in $A$: $vol(A) = \sum_{i \in A} k_i$

Why use this criterion?
Produces more balanced partitions

How do we efficiently find a good partition?
Problem: Computing optimal cut is NP-hard
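The difference between the two criteria can be seen on a small example. The sketch below assumes unit edge weights, takes $vol(A)$ to be the sum of the degrees in $A$, and uses a hypothetical six-node graph; the helper names are illustrative only.

```python
def cut_size(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    a = set(group_a)
    return sum(1 for u, v in edges if (u in a) != (v in a))

def volume(edges, group):
    """Sum of the degrees of the nodes in `group`."""
    g = set(group)
    return sum((u in g) + (v in g) for u, v in edges)

def normalized_cut(edges, group_a, group_b):
    c = cut_size(edges, group_a)
    return c / volume(edges, group_a) + c / volume(edges, group_b)

# Hypothetical 6-node graph.
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

balanced = ({1, 2, 3}, {4, 5, 6})
lopsided = ({6}, {1, 2, 3, 4, 5})

for a, b in (balanced, lopsided):
    print(sorted(a), cut_size(edges, a), round(normalized_cut(edges, a, b), 3))
```

Both partitions cut two edges, so minimum-cut cannot tell them apart, but the normalized cut of the split that isolates a single node is more than twice as large because that side has a small volume.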

$A$: adjacency matrix of undirected $G$
$A_{ij} = 1$ if $(i, j)$ is an edge, else 0
$x$ is a vector in $\mathbb{R}^n$ with components $(x_1, \ldots, x_n)$
Think of it as a label/value of each node of $G$
What is the meaning of $A \cdot x$?
$y_i = \sum_{j=1}^{n} A_{ij} x_j = \sum_{(i,j) \in E} x_j$
Entry $y_i$ is a sum of labels $x_j$ of neighbors of $i$

$j$th coordinate of $A \cdot x$:
Sum of the $x$-values of neighbors of $j$
Make this a new value at node $j$
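To make the meaning of $A \cdot x$ concrete, here is a small sketch that computes the product directly from neighbor lists, without building the matrix. The function name `neighbor_sum` and the three-node path graph are assumptions for illustration.

```python
def neighbor_sum(adj, x):
    """Compute y = A x: each node's new value is the sum of its neighbors' labels."""
    return {j: sum(x[i] for i in adj[j]) for j in adj}

# Hypothetical path graph a - b - c with distinct labels on the nodes.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
x = {"a": 1, "b": 10, "c": 100}

y = neighbor_sum(adj, x)
print(y)
```

The middle node receives the labels of both ends, while each end node only receives the label of the middle node.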

$A \cdot x = \lambda \cdot x$

Spectral Graph Theory:
Analyze the "spectrum" of matrix representing $G$
Spectrum: Eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$:
$\Lambda = \{\lambda_1, \lambda_2, \ldots, \lambda_n\}$, $\lambda_1 \le \lambda_2 \le \ldots \le \lambda_n$

Suppose all nodes in $G$ have degree $d$ and $G$ is connected
What are some eigenvalues/vectors of $G$?
$A \cdot x = \lambda \cdot x$. What is $\lambda$? What $x$?
Let's try: $x = (1, 1, \ldots, 1)$
Then: $A \cdot x = (d, d, \ldots, d) = \lambda \cdot x$. So: $\lambda = d$
We found eigenpair of $G$: $x = (1, 1, \ldots, 1)$, $\lambda = d$

Remember the meaning of $y = A \cdot x$: $y_j = \sum_{(j,i) \in E} x_i$

Details!

$G$ is $d$-regular connected, $A$ is its adjacency matrix
Claim:
$d$ is largest eigenvalue of $A$,
$d$ has multiplicity of 1 (there is only 1 eigenvector associated with eigenvalue $d$)

Proof: Why no eigenvalue $d' > d$?
To obtain $d$ we needed $x_i = x_j$ for every $i, j$
This means $x = c \cdot (1, 1, \ldots, 1)$ for some const. $c$
Define: $S$ = nodes $i$ with maximum possible value of $x_i$
Then consider some vector $y$ which is not a multiple of vector $(1, \ldots, 1)$. So not all nodes $i$ (with labels $y_i$) are in $S$
Consider some node $j \in S$ and a neighbor $i \notin S$; then node $j$ gets a value strictly less than $d$
So $y$ is not eigenvector! And so $d$ is the largest eigenvalue!

What if $G$ is not connected?
$G$ has 2 components, each $d$-regular
What are some eigenvectors?
$x$ = Put all 1s on $A$ and 0s on $B$ or vice versa
$x' = (1, \ldots, 1, 0, \ldots, 0)$ then $A \cdot x' = (d, \ldots, d, 0, \ldots, 0)$ (first $|A|$ entries, then $|B|$ entries)
$x'' = (0, \ldots, 0, 1, \ldots, 1)$ then $A \cdot x'' = (0, \ldots, 0, d, \ldots, d)$
And so in both cases the corresponding $\lambda = d$

A bit of intuition:
[Figure: two disconnected groups A and B: $\lambda_n = \lambda_{n-1}$]
[Figure: groups A and B joined by a few edges: $\lambda_n - \lambda_{n-1} \approx 0$; 2nd largest eigval. $\lambda_{n-1}$ now has value very close to $\lambda_n$]
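A quick numerical check of this claim, assuming NumPy is available: the sketch compares a connected 2-regular graph (a 6-cycle) with a disconnected one (two separate triangles). Both example graphs and the helper name `adjacency` are assumptions chosen for illustration.

```python
import numpy as np

def adjacency(n, edges):
    """Dense adjacency matrix of an undirected graph on nodes 0..n-1."""
    a = np.zeros((n, n))
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    return a

# Connected 2-regular graph: a cycle on 6 nodes.
cycle = adjacency(6, [(i, (i + 1) % 6) for i in range(6)])
# Disconnected 2-regular graph: two separate triangles.
two_triangles = adjacency(6, [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)])

eig_cycle = np.linalg.eigvalsh(cycle)          # eigenvalues in ascending order
eig_split = np.linalg.eigvalsh(two_triangles)

print("cycle:        ", np.round(eig_cycle, 3))
print("two triangles:", np.round(eig_split, 3))
```

For the cycle the largest eigenvalue $d = 2$ appears once, whereas for the two triangles it appears twice, once per component.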

More intuition:
[Figure: disconnected groups ($\lambda_n = \lambda_{n-1}$) and weakly connected groups ($\lambda_n - \lambda_{n-1} \approx 0$): 2nd largest eigval. $\lambda_{n-1}$ now has value very close to $\lambda_n$]

If the graph is connected (right example) then we already know that $x_n = (1, \ldots, 1)$ is an eigenvector
Since eigenvectors are orthogonal then the components of $x_{n-1}$ sum to 0.
Why? Because $x_n \cdot x_{n-1} = \sum_i x_n[i] \cdot x_{n-1}[i] = \sum_i x_{n-1}[i] = 0$
So we can look at the eigenvector of the 2nd largest eigenvalue and declare nodes with positive label in A and negative label in B.
But there is still lots to sort out.

Adjacency matrix (A):
$n \times n$ matrix
$A = [a_{ij}]$, $a_{ij} = 1$ if edge between node $i$ and $j$

[Figure: example graph and its adjacency matrix]

Important properties:
Symmetric matrix
Eigenvectors are real and orthogonal

Degree matrix (D):
$n \times n$ diagonal matrix
$D = [d_{ii}]$, $d_{ii}$ = degree of node $i$

[Figure: example graph and its degree matrix]

Laplacian matrix (L):
$n \times n$ symmetric matrix
$L = D - A$

[Figure: example graph and its Laplacian matrix]

What is trivial eigenpair?
$x = (1, \ldots, 1)$ then $L \cdot x = 0$ and so $\lambda = \lambda_1 = 0$

Important properties:
Eigenvalues are non-negative real numbers
Eigenvectors are real and orthogonal
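The three matrices can be built in a few lines with NumPy. The edge list below is an assumed six-node example graph (the figure itself did not survive extraction), and the variable names are illustrative.

```python
import numpy as np

# Assumed example graph on nodes 1..6 (hypothetical edge list).
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
n = 6

A = np.zeros((n, n))
for i, j in edges:
    A[i - 1, j - 1] = A[j - 1, i - 1] = 1.0   # adjacency matrix (symmetric)

D = np.diag(A.sum(axis=1))                     # degree matrix (diagonal)
L = D - A                                      # Laplacian matrix

ones = np.ones(n)
eigenvalues = np.linalg.eigvalsh(L)            # ascending order

print(L.astype(int))
print("L @ ones    =", L @ ones)
print("eigenvalues =", np.round(eigenvalues, 3))
```

Every row of $L$ sums to zero, which is why the all-ones vector is an eigenvector with eigenvalue 0, and no eigenvalue is negative.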

Details!

(a) All eigenvalues are $\ge 0$
(b) $x^T L x = \sum_{ij} L_{ij} x_i x_j \ge 0$ for every $x$
(c) $L = N^T \cdot N$
That is, $L$ is positive semi-definite

Proof:
(c) ⇒ (b): $x^T L x = x^T N^T N x = (Nx)^T (Nx) \ge 0$
As it is just the square of length of $Nx$
(b) ⇒ (a): Let $\lambda$ be an eigenvalue of $L$. Then by (b) $x^T L x \ge 0$ so $x^T L x = x^T \lambda x = \lambda x^T x$ ⇒ $\lambda \ge 0$
(a) ⇒ (c): is also easy! Do it yourself.

Fact: For symmetric matrix $M$:
$\lambda_2 = \min_x \frac{x^T M x}{x^T x}$ (minimum over $x$ orthogonal to the eigenvector of $\lambda_1$)

What is the meaning of $\min x^T L x$ on $G$?
$x^T L x = \sum_{i,j=1}^{n} L_{ij} x_i x_j = \sum_{i,j=1}^{n} (D_{ij} - A_{ij}) x_i x_j$
$= \sum_i D_{ii} x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j$
$= \sum_{(i,j) \in E} (x_i^2 + x_j^2 - 2 x_i x_j) = \sum_{(i,j) \in E} (x_i - x_j)^2$

Node $i$ has degree $d_i$. So, value $x_i^2$ needs to be summed up $d_i$ times.
But each edge $(i, j)$ has two endpoints so we need $x_i^2 + x_j^2$
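The identity $x^T L x = \sum_{(i,j) \in E} (x_i - x_j)^2$ is easy to confirm numerically. The sketch below uses plain Python with integer labels so that both sides can be compared exactly; the graph and the label vector are arbitrary assumptions.

```python
def laplacian_quadratic_form(n, edges, x):
    """Evaluate sum_ij L_ij x_i x_j with L = D - A built explicitly."""
    L = [[0] * n for _ in range(n)]
    for i, j in edges:
        L[i][i] += 1      # degree contributions on the diagonal
        L[j][j] += 1
        L[i][j] -= 1      # minus the adjacency entries
        L[j][i] -= 1
    return sum(L[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def edge_difference_sum(edges, x):
    """Evaluate the sum over edges of (x_i - x_j)^2."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

# Hypothetical graph on nodes 0..5 and an arbitrary integer labeling.
edges = [(0, 1), (0, 2), (0, 4), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
x = [3, -1, 4, 1, -5, 9]

lhs = laplacian_quadratic_form(6, edges, x)
rhs = edge_difference_sum(edges, x)
print(lhs, rhs)
```

Since the right-hand side is a sum of squares, this also shows directly why $x^T L x$ can never be negative.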

Details!

$\lambda_2 = \min_x \frac{x^T M x}{x^T x}$

Write $x$ in axes of eigenvectors $w_1, w_2, \ldots, w_n$ of $M$. So, $x = \sum_i^n \alpha_i w_i$
Then we get: $M x = \sum_i \alpha_i M w_i = \sum_i \alpha_i \lambda_i w_i$
So, what is $x^T M x$?
$x^T M x = \left( \sum_i \alpha_i w_i \right) \left( \sum_i \alpha_i \lambda_i w_i \right) = \sum_{ij} \alpha_i \lambda_j \alpha_j w_i w_j = \sum_i \alpha_i \lambda_i \alpha_i w_i w_i = \sum_i \lambda_i \alpha_i^2$
since $w_i w_j = 0$ if $i \ne j$, 1 otherwise
To minimize this over all unit vectors $x$ orthogonal to $w_1$: min over choices of $(\alpha_1, \ldots, \alpha_n)$ so that:
$\sum_i \alpha_i^2 = 1$ (unit length), $\alpha_1 = 0$ (orthogonal to $w_1$)
To minimize this, set $\alpha_2 = 1$ and so $\sum_i \lambda_i \alpha_i^2 = \lambda_2$

What else do we know about $x$?
$x$ is unit vector: $\sum_i x_i^2 = 1$
$x$ is orthogonal to 1st eigenvector $(1, \ldots, 1)$ thus: $\sum_i x_i \cdot 1 = \sum_i x_i = 0$

Remember:
$\lambda_2 = \min \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$, over all labelings of nodes $i$ so that $\sum_i x_i = 0$

We want to assign values $x_i$ to nodes $i$ such that few edges cross 0.
(we want $x_i$ and $x_j$ to subtract each other)

[Figure: node values $x_i$ placed on a line around 0: "Balance to minimize"]

Back to finding the optimal cut
Express partition (A,B) as a vector
$y_i = +1$ if $i \in A$, $y_i = -1$ if $i \in B$
We can minimize the cut of the partition by finding a non-trivial vector $y$ that minimizes:
$\arg\min_{y \in \{-1, +1\}^n} f(y) = \sum_{(i,j) \in E} (y_i - y_j)^2$
Can't solve exactly. Let's relax $y$ and allow it to take any real value.

[Figure: number line with $y_i = -1$, 0 and $y_j = +1$]

$\lambda_2 = \min_y f(y)$: The minimum value of $f(y)$ is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian matrix $L$
$x = \arg\min_y f(y)$: The optimal solution for $y$ is given by the corresponding eigenvector $x$, referred to as the Fiedler vector

Details!

Suppose there is a partition of $G$ into $A$ and $B$ where $|A| \le |B|$, s.t. $\alpha = \frac{\#\text{ edges from } A \text{ to } B}{|A|}$, then $2\alpha \ge \lambda_2$
This is the approximation guarantee of the spectral clustering. It says the cut spectral finds is at most a factor of 2 away from the optimal one of score $\alpha$.

Proof:
Let: $a = |A|$, $b = |B|$ and $e$ = # edges from $A$ to $B$
Enough to choose some $x_i$ based on $A$ and $B$ such that: $\lambda_2 \le \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} \le 2\alpha$ (while also $\sum_i x_i = 0$)
$\lambda_2$ is only smaller

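Details!

Proof (continued):
1) Let's set: $x_i = -\frac{1}{a}$ if $i \in A$, $x_i = +\frac{1}{b}$ if $i \in B$
Let's quickly verify that $\sum_i x_i = 0$: $a \left( -\frac{1}{a} \right) + b \left( \frac{1}{b} \right) = 0$
2) Then: $\frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2} = \frac{\sum_{(i,j) \in E,\, i \in A,\, j \in B} \left( \frac{1}{b} + \frac{1}{a} \right)^2}{a \left( -\frac{1}{a} \right)^2 + b \left( \frac{1}{b} \right)^2} = \frac{e \left( \frac{1}{a} + \frac{1}{b} \right)^2}{\frac{1}{a} + \frac{1}{b}} = e \left( \frac{1}{a} + \frac{1}{b} \right) \le e \left( \frac{1}{a} + \frac{1}{a} \right) = e \frac{2}{a} = 2\alpha$
($e$: number of edges between $A$ and $B$)
Which proves that the cost achieved by spectral is better than twice the OPT cost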

Details!

Putting it all together: $2\alpha \ge \lambda_2 \ge \frac{\alpha^2}{2 k_{max}}$
where $k_{max}$ is the maximum node degree in the graph
Note we only provide the 1st part: $2\alpha \ge \lambda_2$
We did not prove $\lambda_2 \ge \frac{\alpha^2}{2 k_{max}}$
Overall this certifies that $\lambda_2$ always gives a useful bound

How to define a good partition of a graph?

Minimize a given graph cut criterion

How to efficiently identify such a partition?

Approximate using information provided by the
eigenvalues and eigenvectors of a graph

Spectral Clustering


Three basic stages:

1) Pre-processing
Construct a matrix representation of the graph

2) Decomposition
Compute eigenvalues and eigenvectors of the matrix
Map each point to a lower-dimensional representation
based on one or more eigenvectors

3) Grouping
Assign points to two or more clusters, based on the new
representation


1) Pre-processing:
Build Laplacian matrix L of the graph

2) Decomposition:
Find eigenvalues $\lambda$ and eigenvectors $x$ of the matrix L
Map vertices to corresponding components of $\lambda_2$

[Example: Laplacian of a 6-node graph with eigenvalues $\Lambda$ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0); the eigenvector of $\lambda_2$ has components $x_2$ = (0.3, 0.6, 0.3, -0.3, -0.3, -0.6)]

How do we now find the clusters?

3) Grouping:
Sort components of reduced 1-dimensional vector
Identify clusters by splitting the sorted vector in two

How to choose a splitting point?
Naïve approaches:
Split at 0 or median value
More expensive approaches:
Attempt to minimize normalized cut in 1-dimension (sweep over ordering of nodes induced by the eigenvector)

Split at 0:
Cluster A: Positive points (0.3, 0.6, 0.3)
Cluster B: Negative points (-0.3, -0.3, -0.6)
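The three stages can be put together in a short NumPy sketch. The function name `spectral_bipartition` and the six-node edge list are assumptions for illustration, and the grouping step uses the naive split at 0 rather than a sweep over normalized cut.

```python
import numpy as np

def spectral_bipartition(nodes, edges):
    """Split a graph in two using the eigenvector of the 2nd smallest Laplacian eigenvalue."""
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)

    # 1) Pre-processing: build the Laplacian L = D - A.
    A = np.zeros((n, n))
    for u, v in edges:
        A[index[u], index[v]] = A[index[v], index[u]] = 1.0
    L = np.diag(A.sum(axis=1)) - A

    # 2) Decomposition: eigh returns eigenvalues in ascending order,
    #    with the matching eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    x2 = eigenvectors[:, 1]

    # 3) Grouping: naive split of the 1-dimensional embedding at 0.
    cluster_a = {node for node in nodes if x2[index[node]] > 0}
    cluster_b = set(nodes) - cluster_a
    return eigenvalues[1], x2, cluster_a, cluster_b

# Assumed example graph on nodes 1..6 (hypothetical edge list).
nodes = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

lambda2, x2, cluster_a, cluster_b = spectral_bipartition(nodes, edges)
print("lambda_2 =", round(float(lambda2), 3))
print("x_2      =", np.round(x2, 1))
print("clusters =", sorted(cluster_a), sorted(cluster_b))
```

The overall sign of an eigenvector is arbitrary, so which side ends up labeled A or B can differ between runs or libraries; only the grouping itself is meaningful.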

[Figures: value of x2 versus rank in x2 for example graphs; components of x1, x2 and x3]

How do we partition a graph into k clusters?

Two basic approaches:

Recursive bi-partitioning [Hagen et al., 92]
Recursively apply bi-partitioning algorithm in a hierarchical divisive manner
Disadvantages: Inefficient, unstable

Cluster multiple eigenvectors [Shi-Malik, 00]
Build a reduced space from multiple eigenvectors
Commonly used in recent papers
A preferable approach

Approximates the optimal cut [Shi-Malik, 00]
Can be used to approximate optimal k-way normalized cut

Emphasizes cohesive clusters
Increases the unevenness in the distribution of the data
Associations between similar points are amplified, associations between dissimilar points are attenuated
The data begins to "approximate a clustering"

Well-separated space
Transforms data to a new "embedded space", consisting of k orthogonal basis vectors

Multiple eigenvectors prevent instability due to information loss

[Kumar et al. 99]

Searching for small communities in the Web graph
What is the signature of a community / discussion in a Web graph?

[Figure: Dense 2-layer graph]

Use this to define "topics":
What the same people on the left talk about on the right
Remember HITS!

Intuition: Many people all talking about the same things

A more well-defined problem:
Enumerate complete bipartite subgraphs $K_{s,t}$
Where $K_{s,t}$: $s$ nodes on the "left" where each links to the same $t$ other nodes on the "right"

[Figure: $K_{3,4}$, fully connected, with $|X| = s = 3$ and $|Y| = t = 4$]

[Agrawal-Srikant 99]

Market basket analysis. Setting:
Market: Universe $U$ of $n$ items
Baskets: $m$ subsets of $U$: $S_1, S_2, \ldots, S_m \subseteq U$ ($S_i$ is a set of items one person bought)
Support: Frequency threshold $f$

Goal:
Find all subsets $T$ s.t. $T \subseteq S_i$ of at least $f$ sets $S_i$ (items in $T$ were bought together at least $f$ times)

What's the connection between the itemsets and complete bipartite graphs?

[Kumar et al. 99]

Frequent itemsets = complete bipartite graphs!
How?
View each node $i$ as a set $S_i$ of nodes $i$ points to
$K_{s,t}$ = a set $Y$ of size $t$ that occurs in $s$ sets $S_i$
Looking for $K_{s,t}$: set the frequency threshold to $s$ and look at layer $t$ (all frequent sets of size $t$)

[Figure: node i points to a, b, c, d, so S_i = {a,b,c,d}; nodes i, j, k in X all point to a, b, c, d in Y]

s: minimum support (|X| = s)
t: itemset size (|Y| = t)

[Kumar et al. 99]

View each node $i$ as a set $S_i$ of nodes $i$ points to
[Figure: node i points to a, b, c, d, so S_i = {a,b,c,d}]

Find frequent itemsets:
s: minimum support
t: itemset size

Say we find a frequent itemset Y = {a,b,c} of supp s
So, there are s nodes that link to all of {a,b,c}:
[Figure: nodes x, y and z in X each link to a, b and c in Y]

We found $K_{s,t}$!
$K_{s,t}$ = a set $Y$ of size $t$ that occurs in $s$ sets $S_i$

Support threshold s=2

Itemsets:
a = {b,c,d}
b = {d}
c = {b,d,e,f}
d = {e,f}
e = {b,d}
f = {}

Frequent itemsets:
{b,d}: support 3
{e,f}: support 2

And we just found 2 bipartite subgraphs:
[Figure: a, c and e each link to b and d; c and d each link to e and f]
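The example above can be reproduced with a brute-force sketch that enumerates every itemset of size t directly. This is a simplification, assumed here for brevity; it is not the A-priori style search a real frequent-itemset miner would use, and the function name `complete_bipartite_subgraphs` is hypothetical.

```python
from itertools import combinations

def complete_bipartite_subgraphs(out_links, s, t):
    """Map each target set Y of size t to the nodes X linking to all of Y, keeping |X| >= s."""
    support = {}
    for node, targets in out_links.items():
        # Every size-t subset of a node's out-links is an itemset this node supports.
        for itemset in combinations(sorted(targets), t):
            support.setdefault(itemset, set()).add(node)
    return {y: x for y, x in support.items() if len(x) >= s}

# Each node viewed as the set of nodes it points to (the example's itemsets).
out_links = {
    "a": {"b", "c", "d"},
    "b": {"d"},
    "c": {"b", "d", "e", "f"},
    "d": {"e", "f"},
    "e": {"b", "d"},
    "f": set(),
}

found = complete_bipartite_subgraphs(out_links, s=2, t=2)
for right, left in sorted(found.items()):
    print(sorted(left), "->", list(right))
```

Each returned entry is a complete bipartite subgraph: the key is the right-hand side Y (the frequent itemset) and the value is the left-hand side X (the nodes that support it).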

Example of a community from a web graph

Nodes on the right

Nodes on the left

[Kumar, Raghavan, Rajagopalan, Tomkins: Trawling the Web for emerging cyber-communities 1999]