
Nodes, Edges and Weights

▶ A graph is a triplet $G = (V, E, W)$, which includes vertices $V$, edges $E$, and weights $W$

⇒ Vertices or nodes are a set of $n$ labels. Typical labels are $V = \{1, \ldots, n\}$

⇒ Edges are ordered pairs of labels $(i, j)$. We interpret $(i, j) \in E$ as "$i$ can be influenced by $j$."

⇒ Weights $w_{ij} \in \mathbb{R}$ are numbers associated to edges $(i, j)$. "Strength of the influence of $j$ on $i$." A small code sketch of this triplet follows the figure below.

[Figure: an example 8-node graph with weighted edges $w_{ij}$.]
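One minimal way to represent such a triplet in code is a node set plus a dictionary of edge weights keyed by ordered pairs. This is only an illustrative sketch; the node labels and weight values below are made up.

```python
# A small directed, weighted graph G = (V, E, W) in plain Python containers.
V = {1, 2, 3, 4}          # vertex labels
W = {                     # weights keyed by ordered pairs (i, j)
    (1, 2): 0.5,          # "node 1 is influenced by node 2"
    (2, 1): 0.5,
    (1, 3): 1.2,
    (2, 4): 0.7,
}
E = set(W.keys())         # the edge set: the pairs that carry a weight
```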
Directed Graphs

▶ Edge $(i, j)$ is represented by an arrow pointing from $j$ into $i$. Influence of node $j$ on node $i$

⇒ This is the opposite of the standard notation used in graph theory

▶ Edge $(i, j)$ is different from edge $(j, i)$ ⇒ It is possible to have $(i, j) \in E$ and $(j, i) \notin E$

▶ If both edges are in the edge set, the weights can be different ⇒ It is possible to have $w_{ij} \neq w_{ji}$

[Figure: the same 8-node graph, with each edge $(i, j)$ drawn as an arrow pointing from $j$ into $i$.]
Symmetric Graphs

▶ A graph is symmetric or undirected if both the edge set and the weights are symmetric

⇒ Edges come in pairs ⇒ We have $(i, j) \in E$ if and only if $(j, i) \in E$

⇒ Weights are symmetric ⇒ We must have $w_{ij} = w_{ji}$ for all $(i, j) \in E$

[Figure: an 8-node symmetric graph; each edge carries a single weight $w_{ij} = w_{ji}$.]
Unweighted Graphs

▶ A graph is unweighted if it doesn't have weights

⇒ Equivalently, we can say that all weights are units ⇒ $w_{ij} = 1$ for all $(i, j) \in E$

▶ Unweighted graphs can be directed or symmetric

[Figure: an unweighted 8-node graph.]
Weighted Symmetric Graphs

▶ Graphs can be directed or symmetric. Separately, they can be weighted or unweighted.

▶ Most of the graphs we encounter in practical situations are symmetric and weighted

[Figure: a weighted symmetric 8-node graph.]
Graph Shift Operators

▶ Graphs have matrix representations, which in this course we call graph shift operators (GSOs)
Adjacency Matrices

▶ The adjacency matrix of the graph $G = (V, E, W)$ is the sparse matrix $A$ with nonzero entries

$$A_{ij} = w_{ij}, \quad \text{for all } (i, j) \in E$$

▶ If the graph is symmetric, the adjacency matrix is symmetric ⇒ $A = A^T$, as in the example below; a short code sketch follows it

[Figure: a 5-node symmetric graph with edges (1,2), (1,3), (2,3), (2,4), (3,5), (4,5).]

$$A = \begin{bmatrix} 0 & w_{12} & w_{13} & 0 & 0 \\ w_{21} & 0 & w_{23} & w_{24} & 0 \\ w_{31} & w_{32} & 0 & 0 & w_{35} \\ 0 & w_{42} & 0 & 0 & w_{45} \\ 0 & 0 & w_{53} & w_{54} & 0 \end{bmatrix}$$
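As a concrete illustration, a dense adjacency matrix can be assembled from a dictionary of edge weights with NumPy. This is a sketch only; the weight values are arbitrary placeholders.

```python
import numpy as np

n = 5
# Example weights for the 5-node symmetric graph above (values are made up).
weights = {(1, 2): 1.0, (1, 3): 2.0, (2, 3): 0.5, (2, 4): 1.5, (3, 5): 0.8, (4, 5): 1.1}

A = np.zeros((n, n))
for (i, j), w in weights.items():
    A[i - 1, j - 1] = w        # nodes are labeled 1..n, array indices are 0..n-1
    A[j - 1, i - 1] = w        # symmetric graph: also store the reversed edge

assert np.allclose(A, A.T)     # symmetric graph => symmetric adjacency matrix
```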
Adjacency Matrices for Unweighted Graphs

▶ For the particular case in which the graph is unweighted, the weights are interpreted as units

$$A_{ij} = 1, \quad \text{for all } (i, j) \in E$$

[Figure: the unweighted version of the 5-node graph above, with all edge weights equal to 1.]

$$A = \begin{bmatrix} 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 1 & 0 \\ 1 & 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 & 1 \\ 0 & 0 & 1 & 1 & 0 \end{bmatrix}$$
Neighborhoods and Degrees

▶ The neighborhood of node $i$ is the set of nodes that influence $i$ ⇒ $n(i) := \{j : (i, j) \in E\}$

▶ The degree $d_i$ of node $i$ is the sum of the weights of its incident edges ⇒ $d_i = \sum_{j \in n(i)} w_{ij} = \sum_{j : (i, j) \in E} w_{ij}$

[Figure: the 5-node weighted symmetric example graph.]

▶ Node 1 neighborhood ⇒ $n(1) = \{2, 3\}$

▶ Node 1 degree ⇒ $d_1 = w_{12} + w_{13}$
Degree Matrix

▶ The degree matrix is a diagonal matrix $D$ with the degrees as diagonal entries ⇒ $D_{ii} = d_i$

▶ Written in terms of the adjacency matrix as $D = \operatorname{diag}(A\mathbf{1})$, because $(A\mathbf{1})_i = \sum_j w_{ij} = d_i$ (see the code sketch below)

[Figure: the unweighted 5-node example graph.]

$$D = \begin{bmatrix} 2 & 0 & 0 & 0 & 0 \\ 0 & 3 & 0 & 0 & 0 \\ 0 & 0 & 3 & 0 & 0 \\ 0 & 0 & 0 & 2 & 0 \\ 0 & 0 & 0 & 0 & 2 \end{bmatrix}$$
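A two-line NumPy sketch of this construction, reusing the adjacency matrix A built in the earlier snippet:

```python
import numpy as np

d = A.sum(axis=1)      # degrees: d_i = (A 1)_i = sum_j w_ij
D = np.diag(d)         # degree matrix D = diag(A 1)
```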
Laplacian Matrix

▶ The Laplacian matrix of a graph with adjacency matrix $A$ is ⇒ $L = D - A = \operatorname{diag}(A\mathbf{1}) - A$

▶ It can also be written explicitly in terms of the graph weights $A_{ij} = w_{ij}$

⇒ Off-diagonal entries ⇒ $L_{ij} = -A_{ij} = -w_{ij}$

⇒ Diagonal entries ⇒ $L_{ii} = d_i = \sum_{j \in n(i)} w_{ij}$

$$L = \begin{bmatrix} 2 & -1 & -1 & 0 & 0 \\ -1 & 3 & -1 & -1 & 0 \\ -1 & -1 & 3 & 0 & -1 \\ 0 & -1 & 0 & 2 & -1 \\ 0 & 0 & -1 & -1 & 2 \end{bmatrix}$$

[Figure: the unweighted 5-node example graph.]
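Continuing the same NumPy sketch, the Laplacian follows directly from the A and D computed above:

```python
import numpy as np

L = D - A                                 # Laplacian L = diag(A 1) - A
assert np.allclose(L.sum(axis=1), 0.0)    # each row of L sums to zero by construction
```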
Normalized Matrix Representations: Adjacencies

▶ Normalized adjacency and Laplacian matrices express weights relative to the nodes' degrees

▶ Normalized adjacency matrix ⇒ $\bar{A} := D^{-1/2} A D^{-1/2}$ ⇒ Results in entries $(\bar{A})_{ij} = \dfrac{w_{ij}}{\sqrt{d_i d_j}}$

▶ The normalized adjacency is symmetric if the graph is symmetric ⇒ $\bar{A}^T = \bar{A}$
Normalized Matrix Representations: Laplacians

▶ Normalized Laplacian matrix ⇒ $\bar{L} := D^{-1/2} L D^{-1/2}$. The same normalization used for the adjacency matrix

▶ Given these definitions, the normalized representations satisfy ⇒ $\bar{L} = D^{-1/2} \big( D - A \big) D^{-1/2} = I - \bar{A}$

⇒ The normalized Laplacian and adjacency are essentially the same linear transformation.

▶ Normalized operators are more homogeneous. The entries of the vector $\bar{A}\mathbf{1}$ tend to be similar. (A short code sketch follows.)
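A sketch of both normalized operators in NumPy, reusing the adjacency matrix A from above and assuming every node has nonzero degree:

```python
import numpy as np

D_inv_sqrt = np.diag(1.0 / np.sqrt(A.sum(axis=1)))   # D^{-1/2}; assumes all degrees > 0

A_bar = D_inv_sqrt @ A @ D_inv_sqrt                   # normalized adjacency
L_bar = np.eye(A.shape[0]) - A_bar                    # normalized Laplacian = I - A_bar
```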
Graph Shift Operator

▶ The Graph Shift Operator $S$ is a stand-in for any of the matrix representations of the graph

$$S = A \ \text{(adjacency)} \qquad S = L \ \text{(Laplacian)} \qquad S = \bar{A} \ \text{(normalized adjacency)} \qquad S = \bar{L} \ \text{(normalized Laplacian)}$$

▶ If the graph is symmetric, the shift operator $S$ is symmetric ⇒ $S = S^T$

▶ The specific choice matters in practice, but most of the results and analyses hold for any choice of $S$
Graph Signals

▶ Graph signals are supported on a graph. They are the objects we process in Graph Signal Processing
Graph Signal

▶ Consider a given graph $G$ with $n$ nodes and shift operator $S$

▶ A graph signal is a vector $x \in \mathbb{R}^n$ in which component $x_i$ is associated with node $i$

▶ To emphasize that the graph is intrinsic to the signal we may write the signal as a pair ⇒ $(S, x)$

[Figure: the 8-node weighted graph with a signal value $x_i$ attached to each node $i$.]

▶ The graph encodes an expectation of proximity or similarity between components of the signal $x$
Graph Signal Diffusion

▶ Multiplication by the graph shift operator implements diffusion of the signal over the graph

▶ Define the diffused signal $y = Sx$ ⇒ Its components are $y_i = \sum_{j \in n(i)} w_{ij} x_j = \sum_j w_{ij} x_j$

⇒ Stronger weights contribute more to the diffusion output

⇒ This codifies a local operation where components are mixed with components of neighboring nodes.

[Figure: the 8-node weighted graph; the diffusion output $y_i$ at a node mixes the signal values $x_j$ of its neighbors.]
The Diffusion Sequence

▶ Compose the diffusion operator to produce the diffusion sequence ⇒ defined recursively as

$$x^{(k+1)} = S x^{(k)}, \quad \text{with } x^{(0)} = x$$

▶ We can unroll the recursion and write the diffusion sequence as the power sequence ⇒ $x^{(k)} = S^k x$

$$x^{(0)} = x = S^0 x \qquad x^{(1)} = S x^{(0)} = S^1 x \qquad x^{(2)} = S x^{(1)} = S^2 x \qquad x^{(3)} = S x^{(2)} = S^3 x$$
Some Observations about the Diffusion Sequence

▶ The kth element of the diffusion sequence $x^{(k)}$ diffuses information to k-hop neighborhoods

⇒ This is one reason why we use the diffusion sequence to define graph convolutions

▶ We have two definitions: one recursive, the other using powers of $S$

⇒ Always implement the recursive version; the power version is good for analysis (see the sketch below)

$$x^{(0)} = x = S^0 x \qquad x^{(1)} = S x^{(0)} = S^1 x \qquad x^{(2)} = S x^{(1)} = S^2 x \qquad x^{(3)} = S x^{(2)} = S^3 x$$
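A minimal NumPy sketch of the recursive implementation (the function name and signature are illustrative, not from the lecture):

```python
import numpy as np

def diffusion_sequence(S: np.ndarray, x: np.ndarray, K: int) -> list:
    """Return [x^(0), ..., x^(K-1)] using the recursion x^(k+1) = S x^(k)."""
    seq = [x]
    for _ in range(K - 1):
        seq.append(S @ seq[-1])    # one additional hop of diffusion per step
    return seq
```

The recursion never forms the dense powers $S^k$ explicitly, which is why it is the preferred implementation.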
Graph Convolutional Filters

▶ Graph convolutional filters are the tool of choice for the linear processing of graph signals
Graph Filters

▶ Given a graph shift operator $S$ and coefficients $h_k$, a graph filter is a polynomial (series) on $S$

$$H(S) = \sum_{k=0}^{\infty} h_k S^k$$

▶ The result of applying the filter $H(S)$ to the signal $x$ is the signal

$$y = H(S)\, x = \sum_{k=0}^{\infty} h_k S^k x$$

▶ We say that $y = h \star_S x$ is the graph convolution of the filter $h = \{h_k\}_{k=0}^{\infty}$ with the signal $x$ (see the code sketch below)
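In practice the series is truncated to a finite order K and accumulated with the recursive diffusion sequence. A sketch with illustrative names:

```python
import numpy as np

def graph_filter(S: np.ndarray, x: np.ndarray, h) -> np.ndarray:
    """Compute y = sum_k h[k] S^k x without forming the powers S^k."""
    y = np.zeros_like(x, dtype=float)
    xk = x.astype(float)            # x^(0) = S^0 x = x
    for hk in h:
        y += hk * xk                # accumulate h_k S^k x
        xk = S @ xk                 # advance the diffusion: S^(k+1) x
    return y
```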
From Local to Global Information

▶ Graph convolutions aggregate information growing from local to global neighborhoods

▶ Consider a signal $x$ supported on a graph with shift operator $S$, along with a filter $h = \{h_k\}_{k=0}^{K-1}$

[Figure: a 12-node example graph with signal values $x_1, \ldots, x_{12}$ attached to the nodes.]

▶ Graph convolution output ⇒ $y = h \star_S x = h_0 S^0 x + h_1 S^1 x + h_2 S^2 x + h_3 S^3 x + \ldots = \sum_{k=0}^{K-1} h_k S^k x$
Transferability of Filters Across Different Graphs

▶ The same filter $h = \{h_k\}_{k=0}^{\infty}$ can be executed on multiple graphs ⇒ We can transfer the filter (see the sketch below)

[Figure: two panels, "Graph Filter on a Graph" and "Same Graph Filter on Another Graph", showing a 12-node graph and the 8-node weighted graph processing signals with the same filter.]

▶ Graph convolution output ⇒ $y = h \star_S x = h_0 S^0 x + h_1 S^1 x + h_2 S^2 x + h_3 S^3 x + \ldots = \sum_{k=0}^{\infty} h_k S^k x$

▶ The output depends on the filter coefficients $h$, the graph shift operator $S$, and the signal $x$
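Transferring a filter amounts to swapping the shift operator while keeping the coefficients fixed. A sketch reusing the graph_filter function above; S_one, S_other, x_one, and x_other are assumed to be given:

```python
h = [1.0, 0.5, 0.25]                           # example filter taps (placeholder values)

y_one = graph_filter(S_one, x_one, h)          # filter executed on one graph
y_other = graph_filter(S_other, x_other, h)    # the same coefficients on another graph
```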
Learning with Graph Signals

▶ We are almost ready to introduce GNNs. We begin with a short discussion of learning with graph signals
Empirical Risk Minimization

▶ In this course, machine learning (ML) on graphs ≡ empirical risk minimization (ERM) on graphs.

▶ In ERM we are given:

⇒ A training set $\mathcal{T}$ containing observation pairs $(x, y) \in \mathcal{T}$. Assume equal lengths $x, y \in \mathbb{R}^n$.

⇒ A loss function $\ell(y, \hat{y})$ to evaluate the similarity between $y$ and an estimate $\hat{y}$

⇒ A function class $\mathcal{C}$

▶ Learning means finding the function $\Phi^* \in \mathcal{C}$ that minimizes the loss $\ell\big(y, \Phi(x)\big)$ averaged over the training set

$$\Phi^* = \operatorname*{argmin}_{\Phi \in \mathcal{C}} \sum_{(x,y) \in \mathcal{T}} \ell\big(y, \Phi(x)\big)$$

▶ We use $\Phi^*(x)$ to estimate outputs $\hat{y} = \Phi^*(x)$ when inputs $x$ are observed but outputs $y$ are unknown
Empirical Risk Minimization with Graph Signals

▶ In ERM, the function class $\mathcal{C}$ is the degree of freedom available to the system's designer

$$\Phi^* = \operatorname*{argmin}_{\Phi \in \mathcal{C}} \sum_{(x,y) \in \mathcal{T}} \ell\big(y, \Phi(x)\big)$$

▶ Designing a machine learning model ≡ finding the right function class $\mathcal{C}$

▶ Since we are interested in graph signals, graph convolutional filters are a good starting point
Learning with a Graph Convolutional Filter

▶ Input / output signals $x$ / $y$ are graph signals supported on a common graph with shift operator $S$

▶ Function class ⇒ graph filters of order $K$ supported on $S$ ⇒ $\Phi(x) = \sum_{k=0}^{K-1} h_k S^k x = \Phi(x; S, h)$

[Block diagram: the input $x$ feeds a graph filter block $z = \sum_{k=0}^{K-1} h_k S^k x$, whose output is $z = \Phi(x; S, h)$.]

▶ Learn the ERM solution restricted to the graph filter class ⇒ $h^* = \operatorname*{argmin}_{h} \sum_{(x,y) \in \mathcal{T}} \ell\big(y, \Phi(x; S, h)\big)$

⇒ The optimization is over the filter coefficients $h$, with the graph shift operator $S$ given (a training sketch follows below)
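A hedged PyTorch sketch of this ERM problem, assuming a fixed dense shift operator S, single-signal inputs, and a squared-error loss; the class name, filter order, and data tensors are illustrative, not from the lecture.

```python
import torch

class GraphFilter(torch.nn.Module):
    """Trainable graph filter Phi(x; S, h) = sum_{k=0}^{K-1} h_k S^k x."""
    def __init__(self, S: torch.Tensor, K: int):
        super().__init__()
        self.S = S                                    # shift operator: fixed prior information
        self.h = torch.nn.Parameter(torch.randn(K))   # trainable filter coefficients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y, xk = torch.zeros_like(x), x
        for hk in self.h:
            y = y + hk * xk                           # accumulate h_k S^k x
            xk = self.S @ xk                          # recursive diffusion step
        return y

# Hypothetical training loop over pairs (x, y) from the training set T.
# model = GraphFilter(S, K=5)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
# for x, y in training_set:
#     optimizer.zero_grad()
#     loss = torch.mean((model(x) - y) ** 2)
#     loss.backward()
#     optimizer.step()
```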
When the Output is Not a Graph Signal: Readout

▶ Outputs $y \in \mathbb{R}^m$ are not graph signals ⇒ Add a readout layer at the filter's output to match dimensions

▶ A readout matrix $A \in \mathbb{R}^{m \times n}$ yields the parametrization ⇒ $A \times \Phi(x; S, h) = A \times \sum_{k=0}^{K-1} h_k S^k x$

[Block diagram: the graph filter output $z = \Phi(x; S, h)$ is multiplied by the readout matrix $A$ to produce $A \times \Phi(x; S, h)$.]

▶ Making $A$ trainable is inadvisable. Learn the filter only. ⇒ $h^* = \operatorname*{argmin}_{h} \sum_{(x,y) \in \mathcal{T}} \ell\big(y, A \times \Phi(x; S, h)\big)$

▶ Readouts are simple. Read out node $i$ ⇒ $A = e_i^T$. Read out the signal average ⇒ $A = \mathbf{1}^T$. (See the sketch below.)
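Two simple readout matrices in NumPy, following the choices named above; the division by n for a literal average is an added convention, not from the slide:

```python
import numpy as np

n, i = 8, 3
A_node = np.eye(n)[i - 1][np.newaxis, :]   # A = e_i^T: read out the value at node i
A_sum = np.ones((1, n))                    # A = 1^T: read out the sum of the signal
A_avg = A_sum / n                          # scaled version giving the literal average
```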
Graph Neural Networks (GNNs)

Pointwise Nonlinearities

▶ A pointwise nonlinearity is a nonlinear function $\sigma$ applied componentwise, without mixing entries

▶ The result of applying $\sigma$ pointwise to a vector $x$ is ⇒ $\sigma\big[x\big] = \sigma\!\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} \sigma(x_1) \\ \sigma(x_2) \\ \vdots \\ \sigma(x_n) \end{bmatrix}$

▶ A pointwise nonlinearity is the simplest nonlinear function we can apply to a vector

▶ ReLU: $\sigma(x) = \max(0, x)$. Hyperbolic tangent: $\sigma(x) = (e^{2x} - 1)/(e^{2x} + 1)$. Absolute value: $\sigma(x) = |x|$.

▶ Pointwise nonlinearities decrease variability ⇒ They function as demodulators.
Learning with a Graph Perceptron

▶ Graph filters have limited expressive power because they can only learn linear maps

▶ A first approach to nonlinear maps is the graph perceptron ⇒ $\Phi(x) = \sigma\Big[\sum_{k=0}^{K-1} h_k S^k x\Big] = \Phi(x; S, h)$

[Block diagram: the graph filter $z = \sum_{k=0}^{K-1} h_k S^k x$ followed by a pointwise nonlinearity yields the perceptron output $\sigma\big[z\big]$.]

▶ Optimal regressor restricted to the perceptron class ⇒ $h^* = \operatorname*{argmin}_{h} \sum_{(x,y) \in \mathcal{T}} \ell\big(y, \Phi(x; S, h)\big)$

⇒ The perceptron allows learning of nonlinear maps ⇒ More expressive. Larger representable class (see the sketch below)
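A one-line sketch on top of the earlier graph_filter function; the choice of tanh as the nonlinearity is illustrative:

```python
import numpy as np

def graph_perceptron(S, x, h, sigma=np.tanh):
    """Graph perceptron: pointwise nonlinearity applied to a graph filter output."""
    return sigma(graph_filter(S, x, h))
```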
Graph Neural Networks (GNNs)

▶ To define a GNN we compose several graph perceptrons ⇒ We layer graph perceptrons

▶ Layer 1 processes the input signal $x$ with the perceptron $h_1 = [h_{10}, \ldots, h_{1,K-1}]$ to produce the output $x_1$

$$x_1 = \sigma\big[z_1\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{1k} S^k x\Big]$$

▶ The output $x_1$ of Layer 1 becomes the input to Layer 2. It is still a graph signal, but with a different interpretation

▶ Layer 2 processes its input signal $x_1$ with the perceptron $h_2 = [h_{20}, \ldots, h_{2,K-1}]$ to produce the output $x_2$

$$x_2 = \sigma\big[z_2\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{2k} S^k x_1\Big]$$

▶ Repeat analogous operations for $L$ times ($L$ is the GNN's depth) ⇒ Yields the GNN predicted output $x_L$
The GNN Layer Recursion

▶ A generic layer of the GNN, Layer $\ell$, takes as input the output $x_{\ell-1}$ of the previous layer $(\ell - 1)$

▶ Layer $\ell$ processes its input signal $x_{\ell-1}$ with the perceptron $h_\ell = [h_{\ell 0}, \ldots, h_{\ell, K-1}]$ to produce the output $x_\ell$

$$x_\ell = \sigma\big[z_\ell\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{\ell k} S^k x_{\ell-1}\Big]$$

▶ With the convention that the Layer 1 input is $x_0 = x$, this provides a recursive definition of a GNN (sketched below)

▶ If it has $L$ layers, the GNN output ⇒ $x_L = \Phi\big(x; S, h_1, \ldots, h_L\big) = \Phi\big(x; S, \mathcal{H}\big)$

▶ The filter tensor $\mathcal{H} = [h_1, \ldots, h_L]$ is the trainable parameter. The graph shift operator is prior information
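The recursion translates directly into a short NumPy sketch, reusing the graph_filter function and a pointwise nonlinearity from above (names are illustrative):

```python
import numpy as np

def gnn_forward(S, x, H, sigma=np.tanh):
    """GNN output Phi(x; S, H): layered perceptrons x_l = sigma[ sum_k H[l][k] S^k x_{l-1} ]."""
    x_l = x                                      # convention: x_0 = x
    for h_l in H:                                # H = [h_1, ..., h_L], one tap vector per layer
        x_l = sigma(graph_filter(S, x_l, h_l))   # one graph perceptron per layer
    return x_l

# Example: a 3-layer GNN with (made-up) order-3 filter taps per layer.
# H = [np.random.randn(3) for _ in range(3)]
# x_L = gnn_forward(S, x, H)
```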
GNN Block Diagram

▶ We illustrate the definition with a GNN with 3 layers

▶ Feed the input signal $x = x_0$ into Layer 1, feed the Layer 1 output $x_1$ into Layer 2, and feed the Layer 2 output $x_2$ into Layer 3

$$x_1 = \sigma\big[z_1\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{1k} S^k x_0\Big] \qquad x_2 = \sigma\big[z_2\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{2k} S^k x_1\Big] \qquad x_3 = \sigma\big[z_3\big] = \sigma\Big[\sum_{k=0}^{K-1} h_{3k} S^k x_2\Big]$$

▶ The last layer output is the GNN output ⇒ $\Phi(x; S, \mathcal{H})$

⇒ Parametrized by the filter tensor $\mathcal{H} = [h_1, h_2, h_3]$

[Block diagram: $x_0 = x$ enters Layer 1, which computes $z_1 = \sum_k h_{1k} S^k x_0$ and $x_1 = \sigma[z_1]$; Layer 2 computes $z_2 = \sum_k h_{2k} S^k x_1$ and $x_2 = \sigma[z_2]$; Layer 3 computes $z_3 = \sum_k h_{3k} S^k x_2$ and outputs $x_3 = \Phi(x; S, \mathcal{H})$.]
