CS224W Homework 3
November 2, 2023
1 GraphRNN [20 points]
In class, we covered GraphRNN, a generative model for graph structures. Here we
assume that the graph has no node types/features, and no edge types/features.
1.1 Edge-level RNN [12 points]
Remember that GraphRNN uses random BFS ordering to generate graphs by iteratively adding a node and predicting its connectivity to the nodes already in the graph. Suppose that the GraphRNN model is generating a grid graph:
Figure 1: GraphRNN grid graph.
If we wanted a GraphRNN to generate this graph, what predictions would each
cell in the edge-level RNN need to make? Recall that a GraphRNN needs to predict,
for each new node, which existing nodes it needs to wire an edge with. It outputs
1 when there should be an edge, and 0 when there should not. Nodes are added
in BFS ordering starting from Node A. Assume that the neighbors of a node are
explored in alphabetical order (i.e. ties are broken using alphabetical ordering).
Sample answer format (at a particular step of the node-level RNN):
Decodes node <node Id> (edge-level RNN predicts <edge value(s)> for connectivity to <node Id(s)>)
*Fill in <node Id>, <edge value(s)>, and <node Id(s)>.
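To make the bookkeeping concrete, here is a minimal Python sketch that derives these edge-level targets from a BFS ordering with alphabetical tie-breaking. The adjacency list assumes a 2x3 grid labeled A-F row by row, which may differ from the exact labeling in Figure 1; it is illustrative only.

```python
from collections import deque

# Hypothetical adjacency list for a 2x3 grid labeled
#   A - B - C
#   |   |   |
#   D - E - F
# (the actual labeling in Figure 1 may differ).
adj = {
    "A": ["B", "D"],
    "B": ["A", "C", "E"],
    "C": ["B", "F"],
    "D": ["A", "E"],
    "E": ["B", "D", "F"],
    "F": ["C", "E"],
}

def bfs_order(adj, start):
    """BFS ordering with alphabetical tie-breaking."""
    order, seen, queue = [], {start}, deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u]):          # alphabetical tie-break
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

order = bfs_order(adj, "A")
for step, node in enumerate(order[1:], start=1):
    existing = order[:step]
    # Edge-level RNN targets: 1 if an edge to the earlier node exists, else 0
    preds = [int(node in adj[prev]) for prev in existing]
    print(f"Decodes node {node}: predicts {preds} for connectivity to {existing}")
```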
⋆ Solution ⋆
1.2 Advantages of BFS ordering [8 points]
Explain two advantages of generating graphs with a random BFS ordering of the nodes, as opposed to generating with a fully random ordering of the nodes in the graph.
⋆ Solution ⋆
2 LightGCN [25 points]
We learned in class about LightGCN, a GNN model for recommender systems.
Given a bipartite user-item graph G = (V, E), let A ∈ R^{|V|×|V|} be its unnormalized adjacency matrix, D ∈ R^{|V|×|V|} be its degree matrix, and E^(k) ∈ R^{|V|×d} be its node embedding matrix at layer k, where d is the embedding dimension. Let Ã = D^{−1/2} A D^{−1/2} be the normalized adjacency matrix.
The original GCN updates node embeddings across layers according to E^(k+1) = ReLU(ÃE^(k) W^(k)), while LightGCN removes the non-linearity and the learnable weight matrices, using the following update for each layer k ∈ {0, 1, ..., K − 1}:

E^(k+1) = ÃE^(k)    (1)
Moreover, LightGCN adopts multi-scale diffusion to compute the final node embeddings for link prediction, averaging across layers:

E = Σ_{i=0}^{K} α_i E^(i),    (2)

where we have uniform coefficients α_i = 1/(K + 1).
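As a sanity check on Equations (1) and (2), here is a minimal numpy sketch of LightGCN propagation on a toy bipartite graph. The interaction matrix, dimensions, and random initial embeddings below are illustrative assumptions, not part of the problem.

```python
import numpy as np

# Toy user-item interaction matrix R (|U| x |I|); illustrative only.
num_users, num_items, d, K = 2, 3, 4, 3
R = np.array([[1, 0, 1],
              [0, 1, 1]], dtype=float)

# Full (|V| x |V|) adjacency of the bipartite graph
A = np.block([[np.zeros((num_users, num_users)), R],
              [R.T, np.zeros((num_items, num_items))]])

deg = A.sum(axis=1)
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(deg, 1e-12)))
A_tilde = D_inv_sqrt @ A @ D_inv_sqrt          # normalized adjacency Ã

E = np.random.randn(num_users + num_items, d)  # E^(0): learnable in practice
layers = [E]
for _ in range(K):
    layers.append(A_tilde @ layers[-1])        # Eq. (1): E^(k+1) = Ã E^(k)

alpha = 1.0 / (K + 1)
E_final = alpha * sum(layers)                  # Eq. (2): uniform layer average
```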
2.1 Advantages of Average Embeddings [4 points]
Why does LightGCN average over the layer embeddings? What benefits does this bring in a recommender systems setting?
What to submit? 1-3 sentences of explanation on the reasons and benefits of
averaging across layers.
⋆ Solution ⋆
2.2 Self-connection [4 points]
We denote the embedding of an item i at layer k by e_i^(k) and that of a user u by e_u^(k). The graph convolution operation (a.k.a. the propagation rule) in LightGCN is defined as:

e_u^(k+1) = Σ_{i ∈ N_u} 1/(√|N_u| √|N_i|) · e_i^(k)

e_i^(k+1) = Σ_{u ∈ N_i} 1/(√|N_i| √|N_u|) · e_u^(k)
The symmetric normalization term 1/(√|N_u| √|N_i|) follows the design of standard GCN, which can avoid the scale of the embeddings increasing with graph convolution operations.
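For concreteness, here is the same rule in per-node form: a short sketch (the toy graph and random embeddings are assumptions) that computes e_v^(k+1) by summing the symmetrically normalized embeddings of v's neighbors.

```python
import numpy as np

# Per-node LightGCN propagation on a small illustrative bipartite graph.
users, items = ["u1", "u2"], ["i1", "i2", "i3"]
edges = [("u1", "i1"), ("u1", "i3"), ("u2", "i2"), ("u2", "i3")]

# Neighbor sets N_v
N = {v: set() for v in users + items}
for u, i in edges:
    N[u].add(i)
    N[i].add(u)

d = 2
emb = {v: np.random.randn(d) for v in users + items}   # e_v^(k)

def propagate(v):
    # e_v^(k+1) = sum_{w in N_v} e_w^(k) / (sqrt(|N_v|) * sqrt(|N_w|))
    return sum(emb[w] / (np.sqrt(len(N[v])) * np.sqrt(len(N[w]))) for w in N[v])

next_emb = {v: propagate(v) for v in users + items}     # e_v^(k+1)
```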
However, from the equations above, we can see that in LightGCN we only aggregate the connected neighbors and do not integrate the target node itself (i.e., there is no self-connection). This is different from most existing graph convolution operations, which typically aggregate extended neighbors and specifically handle the self-connection.
Does LightGCN contain implicit self-connection? If your answer is yes, which op-
eration captures the same effect as self-connection? If no, what do you think is the
reason why LightGCN doesn’t need self-connection or similar effects?
What to submit? Yes or no and 1-2 sentences of justification.
⋆ Solution ⋆
2.3 Relation with APPNP [5 points]
There is a work that connects GCN with Personalized PageRank, in which the authors propose a GCN variant named APPNP that can propagate over long ranges without the risk of oversmoothing. Inspired by the teleport design in Personalized PageRank, APPNP complements each propagation layer with the starting features (i.e., the 0-th layer embeddings), which balances the need to preserve locality (i.e., staying close to the root node to alleviate oversmoothing) against the need to leverage information from a large neighborhood. The propagation layer in APPNP is defined as:

E^(k+1) = βE^(0) + (1 − β)ÃE^(k)

where β is called the "teleport probability", controlling how much of the starting features is retained during propagation, and Ã denotes the normalized adjacency matrix.
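For reference, the recursion above can be written directly in code. This is just a restatement of the APPNP update (the array arguments are assumed inputs), not the closed form asked for below.

```python
import numpy as np

def appnp_propagate(E0: np.ndarray, A_tilde: np.ndarray, K: int, beta: float) -> np.ndarray:
    """Iterate E^(k+1) = beta * E^(0) + (1 - beta) * Ã E^(k) for K layers."""
    E = E0
    for _ in range(K):
        E = beta * E0 + (1 - beta) * (A_tilde @ E)
    return E  # E^(K)
```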
Aligning with Equation (2), we can see that by setting the α_k accordingly, LightGCN can fully recover the prediction embedding used by APPNP. As such, LightGCN shares the strength of APPNP in combating oversmoothing: by setting the α_k properly, LightGCN can use a large K for long-range modeling with controllable oversmoothing.
Express the layer-K embeddings E^(K) of APPNP as a function of the initial embeddings E^(0) and the normalized adjacency matrix Ã. Show all work.
What to submit? A multi-line mathematical derivation of the relationship between E^(K) and E^(0).
⋆ Solution ⋆
2.4 Recommendation Task [12 points]
We are given the following bipartite graph. Solid edges represent already-observed
user-item interactions, while dotted edges denote the set of positive interactions in
the future.
One-dimensional embeddings for the nodes are listed in the following table:

Node        a      b      c      d      e      f      g      h
Embedding   0.3    0.96   0.7    0.6    0.4    0.8    0.7    0.64
We are given a recommendation model that uses L2 distance (lower distance means higher recommendation) between user and item embeddings to calculate the scores. Compute the Recall@2 score for users a, b, c. For each user, explicitly write out the positive items P_u and the recommended items R_u as sets of the form {X, Y, Z}. Please exclude already-interacted items from R_u, as we are only interested in recommending not-yet-interacted items.
Fill in your answers in the following table, where the first five columns represent user-item distances. What is the final Recall@2?

User    d      e      f      g      h      P_u     R_u     Recall@2
a
b
c
What to submit? The P_u, R_u, and Recall@2 values for each of the users, as well as the overall final Recall@2.
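A minimal sketch of the Recall@2 computation with one-dimensional embeddings and L2 (absolute-difference) scoring. The observed and future interaction sets below are hypothetical placeholders, since the actual edges are given only in the figure; substitute the real edge sets before reading off any answers.

```python
# Embeddings from the table above; interaction sets are PLACEHOLDERS.
emb = {"a": 0.3, "b": 0.96, "c": 0.7, "d": 0.6,
       "e": 0.4, "f": 0.8, "g": 0.7, "h": 0.64}
items = ["d", "e", "f", "g", "h"]

observed = {"a": {"d"}, "b": {"e"}, "c": {"f"}}   # placeholder solid edges
future   = {"a": {"e"}, "b": {"f"}, "c": {"g"}}   # placeholder dotted edges

def recall_at_k(user, k=2):
    # Rank not-yet-interacted items by L2 distance (smaller = better).
    candidates = [i for i in items if i not in observed[user]]
    ranked = sorted(candidates, key=lambda i: abs(emb[user] - emb[i]))
    R_u = set(ranked[:k])                 # recommended items
    P_u = future[user]                    # positive (future) items
    return len(R_u & P_u) / len(P_u), P_u, R_u

for u in ["a", "b", "c"]:
    r, P_u, R_u = recall_at_k(u)
    print(f"user {u}: P_u={P_u}, R_u={R_u}, Recall@2={r:.2f}")
```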
⋆ Solution ⋆
3 Honor Code [0 points]
(X) I have read and understood the Stanford Honor Code before I submitted my work.
Collaboration: Write down the names & SUNetIDs of students you collaborated with on this homework (None if you didn't).
Note: Read our website on our policy about collaboration!