
Commit 24eae1f

[GNNFlux] Fix Temporal graph classification tutorial (#575)
* Fix temporal graph classification literate
* [GNNFlux] Translate `Traffic prediction` Pluto notebook to Literate (#572)
* Add traffic prediction
* Fixes
* Fix temporal graph classification literate
* Back to Vector
* Fixes
* Add info about 250 graphs

1 parent 9a665ee commit 24eae1f

File tree

3 files changed: +197 -12 lines changed


GraphNeuralNetworks/docs/make.jl

Lines changed: 1 addition & 1 deletion
@@ -66,7 +66,7 @@ makedocs(;
         ],
         "Temporal graph neural networks" =>[
             "Node autoregression" => "tutorials/traffic_prediction.md",
-            "Temporal graph classification" => "tutorials/temporal_graph_classification_pluto.md"
+            "Temporal graph classification" => "tutorials/temporal_graph_classification.md"
         ],
     ],

Lines changed: 183 additions & 0 deletions
@@ -0,0 +1,183 @@
# Temporal Graph Classification with GraphNeuralNetworks.jl

In this tutorial, we will learn how to extend the graph classification task to the case of temporal graphs, i.e., graphs whose topology and features are time-varying.

We will design and train a simple temporal graph neural network architecture to classify subjects' gender (female or male) using the temporal graphs extracted from their brain fMRI scan signals. Given the large amount of data, we will implement the training so that it can also run on the GPU.

## Import

We start by importing the necessary libraries. We use `GraphNeuralNetworks.jl`, `Flux.jl` and `MLDatasets.jl`, among others.

````julia
using Flux
using GraphNeuralNetworks
using Statistics, Random
using LinearAlgebra
using MLDatasets: TemporalBrains
using CUDA # comment out if you don't have a CUDA GPU

ENV["DATADEPS_ALWAYS_ACCEPT"] = "true" # don't ask for dataset download confirmation
Random.seed!(17); # for reproducibility
````

## Dataset: TemporalBrains

The TemporalBrains dataset contains a collection of functional brain connectivity networks from 1000 subjects, obtained from resting-state functional MRI data from the [Human Connectome Project (HCP)](https://www.humanconnectome.org/study/hcp-young-adult/document/extensively-processed-fmri-data-documentation).
Functional connectivity is defined as the temporal dependence of neuronal activation patterns of anatomically separated brain regions.

The graph nodes represent brain regions; their number is fixed at 102 for each of the 27 snapshots, while the edges, which represent functional connectivity, change over time.
For each snapshot, the feature of a node represents the average activation of the node during that snapshot.
Each temporal graph has a label representing gender ('M' for male and 'F' for female) and age group (22-25, 26-30, 31-35, and 36+).
The network's edge weights are binarized with a threshold of 0.6 by default.

````julia
brain_dataset = TemporalBrains()
````

````
dataset TemporalBrains:
  graphs => 1000-element Vector{MLDatasets.TemporalSnapshotsGraph}
````
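
Before converting anything, we can peek at a single subject's temporal graph. This is only an exploratory sketch; it relies on the `snapshots` and `graph_data` fields that `data_loader` also uses below.

````julia
g1 = brain_dataset.graphs[1]   # one subject's temporal graph
length(g1.snapshots)           # 27 snapshots of the same 102 brain regions
g1.graph_data.g                # gender label, "F" or "M"
````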

After loading the dataset from the MLDatasets.jl package, we see that there are 1000 graphs, which we need to convert to the `TemporalSnapshotsGNNGraph` format.
So we create a function called `data_loader` that performs this conversion and splits the dataset into a training set, used to train the model, and a test set, used to evaluate its performance. Due to computational costs, we use only 250 of the original 1000 graphs: 200 for training and 50 for testing.

````julia
function data_loader(brain_dataset)
    graphs = brain_dataset.graphs
    dataset = Vector{TemporalSnapshotsGNNGraph}(undef, length(graphs))
    for i in 1:length(graphs)
        graph = graphs[i]
        dataset[i] = TemporalSnapshotsGNNGraph(GNNGraphs.mlgraph2gnngraph.(graph.snapshots))
        # Add graph and node features
        for t in 1:27
            s = dataset[i].snapshots[t]
            s.ndata.x = [I(102); s.ndata.x']
        end
        dataset[i].tgdata.g = Float32.(Flux.onehot(graph.graph_data.g, ["F", "M"]))
    end
    # Split the dataset into an 80% training set and a 20% test set
    train_loader = dataset[1:200]
    test_loader = dataset[201:250]
    return train_loader, test_loader
end
````

````
data_loader (generic function with 1 method)
````

The first part of the `data_loader` function calls `mlgraph2gnngraph` on each snapshot, converting it to a `GNNGraph`; the resulting vector of `GNNGraph`s is then wrapped in a `TemporalSnapshotsGNNGraph`.

The second part adds the node and graph features. For the node features, it stacks a one-hot encoding of the node identity (the 102×102 identity matrix) on top of the mean activation of each node during the snapshot (contained in the vector `dataset[i].snapshots[t].ndata.x`, where `i` indexes the subject and `t` the snapshot); a small sketch of this construction follows below. For the graph feature, it adds the one-hot encoding of the gender label.

The last part splits the dataset.
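
Here is a small, self-contained sketch of the node feature construction (`I` comes from LinearAlgebra, already imported above; the activation values are made up):

````julia
x = rand(Float32, 102)   # hypothetical mean activations, one per node
X = [I(102); x']         # one-hot node identity stacked over the activation row
size(X)                  # (103, 102), matching the `nfeatures = 103` default of the model below
````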

## Model

We now implement a simple model that takes a `TemporalSnapshotsGNNGraph` as input.
It consists of a `GINConv` applied independently to each snapshot, a `GlobalPool` to get an embedding for each snapshot, a pooling over the time dimension to get an embedding for the whole temporal graph, and finally a `Dense` layer.

We start by adapting `GlobalPool` to the `TemporalSnapshotsGNNGraph`.

````julia
function (l::GlobalPool)(g::TemporalSnapshotsGNNGraph, x::AbstractVector)
    h = [reduce_nodes(l.aggr, g[i], x[i]) for i in 1:(g.num_snapshots)]
    return mean(h)
end
````
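
To see what this pooling computes, here is a shape-level sketch with made-up values (`mean` comes from Statistics): `reduce_nodes` yields one embedding per snapshot, and `mean` averages them elementwise over time.

````julia
h = [rand(Float32, 128, 1) for _ in 1:27]   # 27 per-snapshot embeddings, 128 hidden features each
pooled = mean(h)                            # elementwise mean over the 27 arrays
size(pooled)                                # (128, 1): one embedding for the whole temporal graph
````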

Then we implement the constructor of the model, which we call `GenderPredictionModel`, and the forward pass.

````julia
struct GenderPredictionModel
    gin::GINConv
    mlp::Chain
    globalpool::GlobalPool
    dense::Dense
end

Flux.@layer GenderPredictionModel

function GenderPredictionModel(; nfeatures = 103, nhidden = 128, σ = relu)
    mlp = Chain(Dense(nfeatures => nhidden, σ), Dense(nhidden => nhidden, σ))
    gin = GINConv(mlp, 0.5)
    globalpool = GlobalPool(mean)
    dense = Dense(nhidden => 2)
    return GenderPredictionModel(gin, mlp, globalpool, dense)
end

function (m::GenderPredictionModel)(g::TemporalSnapshotsGNNGraph)
    h = m.gin(g, g.ndata.x)
    h = m.globalpool(g, h)
    return m.dense(h)
end
````
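
As a quick sanity check (hypothetical, and using the same `data_loader` call as in `train` below), we can run the untrained model on one temporal graph and verify the output shape:

````julia
train_graphs, test_graphs = data_loader(brain_dataset)
model = GenderPredictionModel()
size(model(train_graphs[1]))   # expected (2, 1): one raw logit per gender class
````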

## Training

We train the model for 100 epochs, using the Adam optimizer with a learning rate of 0.001. We use `logitbinarycrossentropy` as the loss function, which is commonly used for two-class classification with labels in one-hot format.
The accuracy expresses the percentage of correct classifications.
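
Before the full training loop, here is a tiny illustration (with hypothetical logits) of how `logitbinarycrossentropy` scores a one-hot label:

````julia
y = Float32.(Flux.onehot("F", ["F", "M"]))   # one-hot target [1, 0]
ŷ = [2.3f0, -1.1f0]                          # raw (pre-sigmoid) model outputs
Flux.logitbinarycrossentropy(ŷ, y)           # mean binary cross-entropy over the two entries
````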

````julia
lossfunction(ŷ, y) = Flux.logitbinarycrossentropy(ŷ, y);

function eval_loss_accuracy(model, data_loader)
    error = mean([lossfunction(model(g), g.tgdata.g) for g in data_loader])
    acc = mean([round(100 * mean(Flux.onecold(model(g)) .== Flux.onecold(g.tgdata.g)); digits = 2) for g in data_loader])
    return (loss = error, acc = acc)
end

function train(dataset)
    device = gpu_device()

    function report(epoch)
        train_loss, train_acc = eval_loss_accuracy(model, train_loader)
        test_loss, test_acc = eval_loss_accuracy(model, test_loader)
        println("Epoch: $epoch $((; train_loss, train_acc)) $((; test_loss, test_acc))")
        return (train_loss, train_acc, test_loss, test_acc)
    end

    model = GenderPredictionModel() |> device

    opt = Flux.setup(Adam(1.0f-3), model)

    train_loader, test_loader = data_loader(dataset)
    train_loader = train_loader |> device
    test_loader = test_loader |> device

    report(0)
    for epoch in 1:100
        for g in train_loader
            grads = Flux.gradient(model) do model
                ŷ = model(g)
                lossfunction(vec(ŷ), g.tgdata.g)
            end
            Flux.update!(opt, model, grads[1])
        end
        if epoch % 20 == 0
            report(epoch)
        end
    end
    return model
end

train(brain_dataset);
````

````
Epoch: 0 (train_loss = 0.80321693f0, train_acc = 50.5) (test_loss = 0.79863846f0, test_acc = 60.0)
Epoch: 20 (train_loss = 0.5073769f0, train_acc = 74.5) (test_loss = 0.64655066f0, test_acc = 60.0)
Epoch: 40 (train_loss = 0.13417317f0, train_acc = 96.5) (test_loss = 0.5689327f0, test_acc = 74.0)
Epoch: 60 (train_loss = 0.01875147f0, train_acc = 100.0) (test_loss = 0.45651233f0, test_acc = 82.0)
Epoch: 80 (train_loss = 0.12695672f0, train_acc = 95.0) (test_loss = 0.65159386f0, test_acc = 82.0)
Epoch: 100 (train_loss = 0.036399372f0, train_acc = 99.0) (test_loss = 0.6491585f0, test_acc = 86.0)
````

## Conclusions

In this tutorial, we implemented a very simple architecture to classify temporal graphs in the context of gender classification using brain data. We then trained the model on the GPU for 100 epochs on the TemporalBrains dataset. The accuracy of the model is approximately 85%, but can be improved by fine-tuning the parameters and training on more data.

---

*This page was generated using [Literate.jl](https://github.com/fredrikekre/Literate.jl).*

GraphNeuralNetworks/docs/src_tutorials/introductory_tutorials/temporal_graph_classification.jl

Lines changed: 13 additions & 11 deletions
@@ -16,6 +16,10 @@ using LinearAlgebra
 using MLDatasets: TemporalBrains
 using CUDA # comment out if you don't have a CUDA GPU

+ENV["DATADEPS_ALWAYS_ACCEPT"] = "true" # don't ask for dataset download confirmation
+Random.seed!(17); # for reproducibility
+
 # ## Dataset: TemporalBrains
 # The TemporalBrains dataset contains a collection of functional brain connectivity networks from 1000 subjects obtained from resting-state functional MRI data from the [Human Connectome Project (HCP)](https://www.humanconnectome.org/study/hcp-young-adult/document/extensively-processed-fmri-data-documentation).
 # Functional connectivity is defined as the temporal dependence of neuronal activation patterns of anatomically separated brain regions.
@@ -28,23 +32,23 @@ using CUDA # comment out if you don't have a CUDA GPU
 brain_dataset = TemporalBrains()

 # After loading the dataset from the MLDatasets.jl package, we see that there are 1000 graphs and we need to convert them to the `TemporalSnapshotsGNNGraph` format.
-# So we create a function called `data_loader` that implements the latter and splits the dataset into the training set that will be used to train the model and the test set that will be used to test the performance of the model.
+# So we create a function called `data_loader` that implements the latter and splits the dataset into the training set that will be used to train the model and the test set that will be used to test the performance of the model. Due to computational costs, we use only 250 out of the original 1000 graphs, 200 for training and 50 for testing.

 function data_loader(brain_dataset)
     graphs = brain_dataset.graphs
     dataset = Vector{TemporalSnapshotsGNNGraph}(undef, length(graphs))
     for i in 1:length(graphs)
         graph = graphs[i]
-        dataset[i] = TemporalSnapshotsGNNGraph(GraphNeuralNetworks.mlgraph2gnngraph.(graph.snapshots))
-        # Add graph and node features
+        dataset[i] = TemporalSnapshotsGNNGraph(GNNGraphs.mlgraph2gnngraph.(graph.snapshots))
+        ## Add graph and node features
         for t in 1:27
             s = dataset[i].snapshots[t]
             s.ndata.x = [I(102); s.ndata.x']
         end
         dataset[i].tgdata.g = Float32.(Flux.onehot(graph.graph_data.g, ["F", "M"]))
     end
-    # Split the dataset into a 80% training set and a 20% test set
+    ## Split the dataset into a 80% training set and a 20% test set
     train_loader = dataset[1:200]
     test_loader = dataset[201:250]
     return train_loader, test_loader
@@ -65,8 +69,7 @@ end

 function (l::GlobalPool)(g::TemporalSnapshotsGNNGraph, x::AbstractVector)
     h = [reduce_nodes(l.aggr, g[i], x[i]) for i in 1:(g.num_snapshots)]
-    sze = size(h[1])
-    reshape(reduce(hcat, h), sze[1], length(h))
+    return mean(h)
 end

 # Then we implement the constructor of the model, which we call `GenderPredictionModel`, and the foward pass.
@@ -91,7 +94,6 @@ end
 function (m::GenderPredictionModel)(g::TemporalSnapshotsGNNGraph)
     h = m.gin(g, g.ndata.x)
     h = m.globalpool(g, h)
-    h = mean(h, dims=2)
     return m.dense(h)
 end

@@ -135,16 +137,16 @@ function train(dataset)
             end
             Flux.update!(opt, model, grads[1])
         end
-        if epoch % 10 == 0
+        if epoch % 20 == 0
             report(epoch)
         end
     end
     return model
 end

+train(brain_dataset);

-train(brain_dataset)

-## Conclusions
 #
-# In this tutorial, we implemented a very simple architecture to classify temporal graphs in the context of gender classification using brain data. We then trained the model on the GPU for 100 epochs on the TemporalBrains dataset. The accuracy of the model is approximately 75-80%, but can be improved by fine-tuning the parameters and training on more data.
+# # Conclusions
+# In this tutorial, we implemented a very simple architecture to classify temporal graphs in the context of gender classification using brain data. We then trained the model on the GPU for 100 epochs on the TemporalBrains dataset. The accuracy of the model is approximately 85%, but can be improved by fine-tuning the parameters and training on more data.
