GCN is an extremely common graph neural network model, and it is implemented in all the major open-source GNN libraries, such as DGL and PyG. However, those library implementations are essentially spatial graph convolutions, which means the adjacency matrix $A$ in DGL and PyG has to be hard-defined. "Hard" here means $A_{ij}\in\{0,1\}$: every entry is either 0 or 1, so we must know deterministically whether an edge exists between node $i$ and node $j$. If we want to run a soft adjacency matrix, $A_{ij}\in[0,1]$, whose entries are continuous real values, then DGL and PyG are no longer suitable.
However, according to the original definition of GCN, the adjacency matrix does not have to be hard, so I decided to go back to the original GCN and run the matrix operations directly.
The GCN propagation rule is:

$$H^{(l+1)} = \sigma\!\left(\hat D^{-0.5}\,\hat A\,\hat D^{-0.5}\,H^{(l)}\,W^{(l)}\right),\qquad \hat A = A + I_N,\qquad \hat D_{ii} = \sum_j \hat A_{ij}$$
where $A$ is the adjacency matrix and $I_N$ is the identity matrix. Note that $\hat D^{-0.5}\hat A\hat D^{-0.5}$, the symmetrically normalized adjacency, comes from a first-order approximation of spectral convolutions built on the normalized graph Laplacian.
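Since the propagation rule is nothing but matrix products, nothing in it forces $A$ to be binary. As a minimal sketch (plain PyTorch, all names here are mine), one propagation step with a dense, soft adjacency matrix can be written as:

```python
import torch

def gcn_propagate(A, X, W):
    """One GCN step: relu(D^-0.5 (A + I) D^-0.5 X W), with a dense (possibly soft) A."""
    A = (A + A.t()) / 2                          # symmetrize, since GCN assumes an undirected graph
    A_hat = A + torch.eye(A.shape[0])            # add self-loops
    deg = A_hat.sum(dim=1)                       # (weighted) degrees
    d_inv_sqrt = deg.pow(-0.5)
    d_inv_sqrt[torch.isinf(d_inv_sqrt)] = 0.     # guard against isolated nodes
    D_inv_sqrt = torch.diag(d_inv_sqrt)
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt     # symmetric normalization
    return torch.relu(A_norm @ X @ W)

# soft adjacency: continuous entries in [0, 1], e.g. produced by attention or a learned metric
A = torch.rand(5, 5)
X = torch.randn(5, 16)
W = torch.randn(16, 8)
H = gcn_propagate(A, X, W)                       # shape (5, 8)
```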
The original GCN implementation can be found at: https://github.com/tkipf/pygcn
Here is an excerpt of its implementation:
```python
import math
import torch
from torch.nn.parameter import Parameter
from torch.nn.modules.module import Module


class GraphConvolution(Module):
    """
    Simple GCN layer, similar to https://arxiv.org/abs/1609.02907
    """

    def __init__(self, in_features, out_features, bias=True):
        super(GraphConvolution, self).__init__()
        self.in_features = in_features
        self.out_features = out_features
        self.weight = Parameter(torch.FloatTensor(in_features, out_features))
        if bias:
            self.bias = Parameter(torch.FloatTensor(out_features))
        else:
            self.register_parameter('bias', None)
        self.reset_parameters()

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)

    def forward(self, input, adj):
        support = torch.mm(input, self.weight)   # X W
        output = torch.spmm(adj, support)        # A' (X W), where A' is the normalized adjacency
        if self.bias is not None:
            return output + self.bias
        else:
            return output

    def __repr__(self):
        return self.__class__.__name__ + ' (' \
               + str(self.in_features) + ' -> ' \
               + str(self.out_features) + ')'
```
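For reference, here is a minimal usage sketch with a soft adjacency matrix (the symmetric normalization below is my own preprocessing; the layer itself only computes $A'XW$, and the `.to_sparse()` call is only there because `torch.spmm` expects a sparse first argument):

```python
import torch

N, F_in, F_out = 5, 16, 8
X = torch.randn(N, F_in)
A = torch.rand(N, N)                      # soft adjacency, entries in [0, 1]
A = (A + A.t()) / 2                       # symmetrize

A_hat = A + torch.eye(N)                  # add self-loops
d_inv_sqrt = A_hat.sum(1).pow(-0.5)
A_norm = d_inv_sqrt.unsqueeze(1) * A_hat * d_inv_sqrt.unsqueeze(0)   # D^-0.5 A_hat D^-0.5

layer = GraphConvolution(F_in, F_out)
out = layer(X, A_norm.to_sparse())        # torch.spmm needs a sparse adjacency
print(out.shape)                          # torch.Size([5, 8])
```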
The preprocessing of the adjacency matrix is in https://github.com/tkipf/pygcn/blob/master/pygcn/utils.py:
```python
import numpy as np
import scipy.sparse as sp


def normalize(mx):
    """Row-normalize sparse matrix"""
    rowsum = np.array(mx.sum(1))          # degree of each row
    r_inv = np.power(rowsum, -1).flatten()
    r_inv[np.isinf(r_inv)] = 0.           # guard against zero-degree rows
    r_mat_inv = sp.diags(r_inv)
    mx = r_mat_inv.dot(mx)                # D^-1 * mx
    return mx
```
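A tiny worked example (my own) shows that this is exactly row normalization, i.e. $\hat D^{-1}\hat A$ once self-loops are added (pygcn's `load_data` normalizes `adj + sp.eye(adj.shape[0])` this way):

```python
import numpy as np
import scipy.sparse as sp

A = sp.csr_matrix(np.array([[0., 1., 1.],
                            [1., 0., 0.],
                            [1., 0., 0.]]))
A_hat = A + sp.eye(3)                 # add self-loops
print(normalize(A_hat).toarray())
# every row sums to 1 (each row divided by its degree), i.e. D^-1 * A_hat:
# [[0.333 0.333 0.333]
#  [0.5   0.5   0.   ]
#  [0.5   0.    0.5  ]]
```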
Notice that what the source code actually implements uses $\hat D^{-1}\hat A$ as the approximation of the normalized Laplacian term.
A similar treatment also appears in Graph_Transformer_Networks: https://github.com/jmhIcoding/Graph_Transformer_Networks/blob/master/model.py
```python
import torch
import numpy as np
import torch.nn as nn
import torch.nn.functional as F
import math
from matplotlib import pyplot as plt
import pdb


class GTN(nn.Module):
    def __init__(self, num_edge, num_channels, w_in, w_out, num_class, num_layers, norm):
        super(GTN, self).__init__()
        self.num_edge = num_edge
        self.num_channels = num_channels
        self.w_in = w_in
        self.w_out = w_out
        self.num_class = num_class
        self.num_layers = num_layers
        self.is_norm = norm
        layers = []
        for i in range(num_layers):
            if i == 0:
                layers.append(GTLayer(num_edge, num_channels, first=True))
            else:
                layers.append(GTLayer(num_edge, num_channels, first=False))
        self.layers = nn.ModuleList(layers)
        self.weight = nn.Parameter(torch.Tensor(w_in, w_out))
        self.bias = nn.Parameter(torch.Tensor(w_out))
        self.loss = nn.CrossEntropyLoss()
        self.linear1 = nn.Linear(self.w_out * self.num_channels, self.w_out)
        self.linear2 = nn.Linear(self.w_out, self.num_class)
        self.reset_parameters()

    def reset_parameters(self):
        nn.init.xavier_uniform_(self.weight)
        nn.init.zeros_(self.bias)

    def gcn_conv(self, X, H):
        X = torch.mm(X, self.weight)                # X W
        H = self.norm(H, add=True)                  # D^-1-style normalization with self-loops
        return torch.mm(H.t(), X)                   # propagate: (normalized adjacency) X W

    def normalization(self, H):
        for i in range(self.num_channels):
            if i == 0:
                H_ = self.norm(H[i, :, :]).unsqueeze(0)
            else:
                H_ = torch.cat((H_, self.norm(H[i, :, :]).unsqueeze(0)), dim=0)
        return H_

    def norm(self, H, add=False):
        H = H.t()
        if add == False:
            H = H * ((torch.eye(H.shape[0]) == 0).type(torch.FloatTensor))
        else:
            # zero the original diagonal, then add self-loops
            H = H * ((torch.eye(H.shape[0]) == 0).type(torch.FloatTensor)) + torch.eye(H.shape[0]).type(torch.FloatTensor)
        deg = torch.sum(H, dim=1)
        deg_inv = deg.pow(-1)
        deg_inv[deg_inv == float('inf')] = 0
        deg_inv = deg_inv * torch.eye(H.shape[0]).type(torch.FloatTensor)
        H = torch.mm(deg_inv, H)                    # D^-1 H: row-normalize, not symmetric
        H = H.t()
        return H

    def forward(self, A, X, target_x, target):
        A = A.unsqueeze(0).permute(0, 3, 1, 2)
        Ws = []
        for i in range(self.num_layers):
            if i == 0:
                H, W = self.layers[i](A)
            else:
                H = self.normalization(H)
                H, W = self.layers[i](A, H)
            Ws.append(W)

        #H,W1 = self.layer1(A)
        #H = self.normalization(H)
        #H,W2 = self.layer2(A, H)
        #H = self.normalization(H)
        #H,W3 = self.layer3(A, H)
        for i in range(self.num_channels):
            if i == 0:
                X_ = F.relu(self.gcn_conv(X, H[i]))
            else:
                X_tmp = F.relu(self.gcn_conv(X, H[i]))
                X_ = torch.cat((X_, X_tmp), dim=1)
        X_ = self.linear1(X_)
        X_ = F.relu(X_)
        y = self.linear2(X_[target_x])
        loss = self.loss(y, target)
        return loss, y, Ws
```
Note the `norm` method inside: it, too, performs a $\hat D^{-1}$-style row normalization (with self-loops added when `add=True`), not the symmetric $\hat D^{-0.5}\hat A\hat D^{-0.5}$.
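A quick standalone check (my own dense re-implementation of `norm(H, add=True)`, outside the class) shows what it produces: the old diagonal is zeroed, self-loops are added, and the row normalization is done on the transposed matrix, so the columns of the returned matrix sum to 1:

```python
import torch

def gtn_norm(H):
    """Dense re-implementation of GTN's norm(H, add=True)."""
    H = H.t()
    mask = (torch.eye(H.shape[0]) == 0).float()
    H = H * mask + torch.eye(H.shape[0])    # drop the old diagonal, add self-loops
    deg_inv = H.sum(dim=1).pow(-1)
    deg_inv[deg_inv == float('inf')] = 0
    H = torch.diag(deg_inv) @ H             # D^-1 H: row-normalize
    return H.t()

A = torch.rand(4, 4)                        # a soft adjacency works here as well
print(gtn_norm(A).sum(dim=0))               # columns sum to 1 after the final transpose
```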
Question: what is the difference between $\hat D^{-0.5}\hat A\hat D^{-0.5}$ and $\hat D^{-1}\hat A$?
Answer: $\hat D^{-1}\hat A$ is the random-walk (row-stochastic) matrix; it updates each node with the arithmetic mean of its neighbors' features (including its own, once self-loops are added).
$\hat D^{-0.5}\hat A\hat D^{-0.5}$ is the symmetrically normalized adjacency behind the Laplacian-smoothing view of GCN; it updates node $i$ with a degree-weighted average in which neighbor $j$ contributes with weight $1/\sqrt{\hat d_i \hat d_j}$, smoothing out the feature differences between a node and its neighbors while down-weighting high-degree neighbors.
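A small numerical comparison on a star-shaped toy graph (my own snippet) makes the difference concrete:

```python
import numpy as np

# node 0 is a hub connected to nodes 1-3
A = np.array([[0., 1., 1., 1.],
              [1., 0., 0., 0.],
              [1., 0., 0., 0.],
              [1., 0., 0., 0.]])
A_hat = A + np.eye(4)
deg = A_hat.sum(1)                                          # [4, 2, 2, 2]

rw  = np.diag(1.0 / deg) @ A_hat                            # D^-1 A_hat: rows sum to 1
sym = np.diag(deg ** -0.5) @ A_hat @ np.diag(deg ** -0.5)   # D^-0.5 A_hat D^-0.5: symmetric

print(rw[1])    # [0.5   0.5 0. 0.]  node 1 averages itself and the hub equally
print(sym[1])   # [0.354 0.5 0. 0.]  the high-degree hub is down-weighted
```

The row-stochastic version keeps every row summing to 1, whereas the symmetric version yields a symmetric matrix; in practice both work, and they differ mainly in how much influence high-degree neighbors get.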