0% found this document useful (0 votes)
3 views

Stochastic Blockmodel.ipynb - Colab

Uploaded by

akhil.s18
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

Stochastic Blockmodel.ipynb - Colab

Uploaded by

akhil.s18
Copyright
© © All Rights Reserved
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 11

12/4/24, 8:08 AM Stochastic Blockmodel.

ipynb - Colab

keyboard_arrow_down The Stochastic Blockmodel


import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import sympy

keyboard_arrow_down The simplest community: a clique


G_clique = nx.from_edgelist([(i,j) for i in range(10) for j in range(10) if i!=j])
nx.draw(G_clique, pos=nx.circular_layout(G_clique))

# The adjacency matrix is (almost) all ones


A_clique = nx.adjacency_matrix(G_clique).todense()
sympy.Matrix(A_clique)

# Visualize the adjacency matrix


plt.matshow(A_clique)
plt.colorbar()

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 1/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
<matplotlib.colorbar.Colorbar at 0x167fb20bc20>

keyboard_arrow_down Two communities: the Caveman graph


G_caveman = nx.from_edgelist([(i,j) for i in range(20) for j in range(20) if i!=j and (i-10)*(j-10)>0])
A_caveman = nx.adjacency_matrix(G_caveman).todense()
plt.matshow(A_caveman)

<matplotlib.image.AxesImage at 0x167fca4fec0>

keyboard_arrow_down Suppose you didn't know who lived in which cave


In other words, the nodes were in some random order

np.random.seed(42)
order = np.random.permutation(len(A_caveman))
A_caveman2 = A_caveman[order,:][:,order]
plt.matshow(A_caveman2)

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 2/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
<matplotlib.image.AxesImage at 0x167fcad7c80>

keyboard_arrow_down How can we figure out which nodes are in the same cave?
Let's look at a few rows of the adjacency matrix.

fig, axes = plt.subplots(nrows=3, ncols=2, figsize=(12,2))


for i, r in enumerate([0, 2, 4, 5, 15, 12]):
col = i % 2
row = int(i/2)
axes[row, col].matshow(A_caveman2[r:r+1])
axes[row, col].set_title(f"Row {r}")
axes[row, col].axes.xaxis.set_ticks([])
axes[row, col].axes.yaxis.set_ticks([])

Idea: Run K-Nearest Neighbors clustering with these rows as the feature vectors
It would group rows 0/4/15 into one cluster, and 2/5/12 into another
Clusters = Communities

(We'll improve upon this idea later)

keyboard_arrow_down How would we check if the communities were good?


SUPPOSE someone told us here are the communities.

Maybe by doing K-Nearest Neighbors.


How would we check?

We would reorder the nodes by grouping people from the same club together
Then, we would look at the new adjacency matrix

np.where(order>=10)[0]

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 3/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
array([ 2, 5, 7, 8, 9, 12, 14, 16, 17], dtype=int64)

someone_says_community1 = [0, 1, 3, 4, 6, 10, 11, 13, 15, 18]


someone_says_community2 = [2, 5, 7, 8, 9, 12, 14, 16, 17]

# Reordering the nodes


ordering = np.concatenate([someone_says_community1, someone_says_community2])
ordering

array([ 0, 1, 3, 4, 6, 10, 11, 13, 15, 18, 2, 5, 7, 8, 9, 12, 14,


16, 17])

# Adjacency matrix with new ordering


A_caveman2_ordered = A_caveman2[ordering][:, ordering]

# What does the reordered adjacency matrix look like?


plt.matshow(A_caveman2_ordered)

<matplotlib.image.AxesImage at 0x167fcb2acf0>

SUPPOSE someone told us here are the communities. How would we check?
IF the memberships are correct, the reordered adjacency matrix is block-structured.

Note: Whether the big block is first or the small block doesn't matter.

keyboard_arrow_down Stochastic Blockmodel: Generalizing the caveman graph


First, we will restate what we did in the caveman graph using new terminology.

n = 10 # number of nodes

# Each node can belong to one of two clubs


clubs = np.random.choice(2, size=n)
clubs

array([1, 1, 1, 0, 0, 1, 1, 1, 0, 1])

# interests = club memberships matrix


interests = np.zeros((n, 2))
interests[np.arange(n), clubs] = 1
interests

array([[0., 1.],
[0., 1.],
[0., 1.],
[1., 0.],
[1., 0.],
[0., 1.],
https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 4/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
[0., 1.],
[0., 1.],
[1., 0.],
[0., 1.]])

Each row represents one person


The first number is the person's interest in club #1.
The second number is the interest in club #2.

keyboard_arrow_down From interests to network


Fans of club #1 become friends
Fans of club #2 become friends

club1_fans = interests[:,0] # Everyone's interest in club #1


club1_fans

array([0., 0., 0., 1., 1., 0., 0., 0., 1., 0.])

# A1[i,j] = club1_fans[i] * club1_fans[j]


A1 = np.outer(club1_fans, club1_fans)

plt.matshow(A1)

<matplotlib.image.AxesImage at 0x167fde95e50>

club2_fans = interests[:,1] # fans of club #2


club2_fans

array([1., 1., 1., 0., 0., 1., 1., 1., 0., 1.])

A2 = np.outer(club2_fans, club2_fans)
plt.matshow(A2)

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 5/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
<matplotlib.image.AxesImage at 0x167fdf3c410>

# All friendships together gives the adjacency matrix A


A = A1 + A2
plt.matshow(A)

<matplotlib.image.AxesImage at 0x167fca65460>

keyboard_arrow_down From interests to network (in one step)


# Same thing, without all the intermediate steps
A = interests @ interests.T # Matrix multiplication
plt.matshow(A)

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 6/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
<matplotlib.image.AxesImage at 0x167fdf3f9b0>

keyboard_arrow_down From network to interests


You see the network. How can you figure out the club memberships?

⇒ To find the right memberships, we need to find the ordering that makes the matrix block-structured.

keyboard_arrow_down Method #1: Communities via modularity


G = nx.from_numpy_array(A)

# Communities via modularity


communities = nx.community.louvain_communities(G)
communities

[{0, 1, 2, 5, 6, 7, 9}, {3, 4, 8}]

# Check if the communities give a block-structured matrix


ordering = np.concatenate([list(x) for x in communities])
plt.matshow(A[ordering][:, ordering])

<matplotlib.image.AxesImage at 0x167fe47d100>

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 7/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab

keyboard_arrow_down Method #2: Communities via spectral decomposition


eigenvalues, eigenvectors = np.linalg.eigh(A)
eigenvalues

array([-4.88399708e-16, -4.17365745e-16, -4.44899761e-17, -5.41731251e-34,


3.81435172e-19, 3.71371796e-18, 2.61415001e-16, 4.46874861e-16,
3.00000000e+00, 7.00000000e+00])

The eigenvalues are returned in ascending order.


The two largest ones are 7 and 3; the rest are pretty much 0

Let's look at the last two eigenvectors

fig, ax = plt.subplots(figsize=(10,5))
ax.matshow(eigenvectors[:,-2:])

<matplotlib.image.AxesImage at 0x167fca66300>

Each row corresponds to one node.


Clearly, the rows are of two types

$\Rightarrow$ K-Nearest Neighbors clustering of the rows of the eigenvector matrix


Previously for the caveman: we clustered the rows of the adjacency matrix

from sklearn.cluster import KMeans

model = KMeans(n_clusters=2)
model.fit(eigenvectors[:,-2:]) # K-Means on the eigenvector rows
predicted_clubs = model.labels_ # Clusters found by K-Means
predicted_clubs

C:\Users\deepay\Miniconda\Lib\site-packages\sklearn\cluster\_kmeans.py:1446: UserWarning: KMeans is known to have a memory l


warnings.warn(
array([0, 0, 0, 1, 1, 0, 0, 0, 1, 0])

predicted_club1_members = np.where(predicted_clubs==0)[0]
predicted_club2_members = np.where(predicted_clubs==1)[0]
print('Predicted clubs', predicted_club1_members, 'and', predicted_club2_members)

Predicted clubs [0 1 2 5 6 7 9] and [3 4 8]

ordering = np.concatenate([predicted_club1_members, predicted_club2_members])


plt.matshow(A[ordering][:, ordering])

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 8/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
<matplotlib.image.AxesImage at 0x16780407fb0>

keyboard_arrow_down Generalization
Story so far:

all fans of the same club become friends


fans of different clubs do not become friends.

Generalization:

Fans of the same club become friends with probability 0.8 (say)
Fans of different clubs become friends with probability 0.10 (say)

# B = cluster-connection matrix
B = np.array([[0.8, 0.1], [0.1, 0.8]])
sympy.Matrix(B)

Previously: A = interests @ interests.T # Adjacency matrix


This gave us the caveman graph

Now: We create a probability matrix, from which we sample the adjacency matrix

P = interests @ B @ interests.T # Probability matrix depends on the cluster-connection matrix B


P

array([[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8],
[0.1, 0.1, 0.1, 0.8, 0.8, 0.1, 0.1, 0.1, 0.8, 0.1],
[0.8, 0.8, 0.8, 0.1, 0.1, 0.8, 0.8, 0.8, 0.1, 0.8]])

A = np.random.binomial(1, P) # Friendships are random


A

array([[1, 0, 1, 0, 0, 1, 0, 1, 0, 1],
[1, 1, 0, 0, 0, 1, 1, 1, 0, 1],
[0, 1, 1, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 1, 0, 0],
[0, 0, 1, 1, 1, 0, 0, 0, 1, 0],
[1, 1, 1, 0, 0, 1, 0, 1, 1, 0],
[1, 1, 1, 0, 0, 1, 1, 1, 0, 0],

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 9/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
[0, 1, 1, 0, 0, 1, 1, 1, 0, 0],
[0, 0, 0, 1, 1, 0, 0, 1, 1, 0],
[0, 1, 1, 0, 0, 1, 1, 0, 0, 1]])

plt.matshow(A)

<matplotlib.image.AxesImage at 0x167808be510>

keyboard_arrow_down Does the block-structure still apply?


ordering = np.concatenate([np.where(clubs==0)[0], np.where(clubs==1)[0]]) # Actual communities
plt.matshow(A[ordering][:, ordering])

<matplotlib.image.AxesImage at 0x167808e0950>

Still roughly block structured.


In the caveman graph, it was exactly block-structured.

keyboard_arrow_down Finding the Communities


Same ideas as before.

keyboard_arrow_down Method #1: Modularity

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 10/11
12/4/24, 8:08 AM Stochastic Blockmodel.ipynb - Colab
G = nx.from_numpy_array(A)
communities = nx.community.louvain_communities(G)
communities

[{2}, {3, 4, 8}, {7}, {0, 1, 5, 6, 9}]


Got split into too many communities

# Play with the resolution to get the desired number of communities


communities = nx.community.louvain_communities(G, resolution=0.5)
communities

[{3, 4, 8}, {0, 1, 2, 5, 6, 7, 9}]

ordering = np.concatenate([list(x) for x in communities])


plt.matshow(A[ordering][:, ordering])

<matplotlib.image.AxesImage at 0x167809c6510>

https://colab.research.google.com/drive/1J5VPLqq75tEOfmWKGVpSm0fCEP5MP6LC 11/11

You might also like