A question regarding aggregation; edge_index and adj_t #4728

jaes77 · 2022-05-27T02:12:52Z


Thanks for providing such a great development environment.

When using GCNConv (gcn_conv) in torch_geometric, I noticed there are two options for describing the aggregation stage: adj_t or edge_index. Both give exactly the same results, whichever I use.

However, there seems to be a performance gap between them: the sparse matmul with adj_t is at least two times faster than the edge_index path (with cached=True).

In general, as far as I know, in a plain PyTorch model the aggregation is a sparse matmul of the adjacency matrix with the input, e.g., "torch.spmm(adj, input)" (the sparse matrix has to be the first argument).

Given that, I'm curious why edge_index is used in most PyTorch Geometric model implementations.

Even in your own benchmark (pytorch_geometric/benchmark/kernel/main_performance.py), the default method uses edge_index with gather/scatter for aggregation.
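To make the comparison concrete, here is a minimal sketch of both aggregation styles in plain PyTorch on a hypothetical 4-node graph. It is not PyG's actual implementation; it only mirrors the kind of ops that show up in the profiles below (a sparse matmul versus aten::index_select followed by aten::scatter_add_). The main difference is that the edge_index path first builds an intermediate tensor with one row per edge before reducing it.

```python
import torch

# Toy directed graph with 4 nodes (illustrative data, not from the profiles).
# edge_index[0] holds source nodes, edge_index[1] holds target nodes.
edge_index = torch.tensor([[0, 1, 1, 2, 3],
                           [1, 0, 2, 3, 2]])
num_nodes = 4
x = torch.arange(num_nodes * 2, dtype=torch.float).view(num_nodes, 2)
src, dst = edge_index

# 1) Sparse matmul ("torch only" style): adj_t[i, j] = 1 for every edge j -> i,
#    so row i of adj_t @ x is the sum of the features of i's in-neighbors.
adj_t = torch.sparse_coo_tensor(torch.stack([dst, src]),
                                torch.ones(edge_index.size(1)),
                                (num_nodes, num_nodes))
out_spmm = torch.sparse.mm(adj_t, x)  # the sparse operand goes first

# 2) Gather/scatter on edge_index: first materialize one message per edge
#    (an E x F tensor, cf. aten::index_select in the profile) ...
messages = x.index_select(0, src)
# ... then sum the messages into their target nodes (cf. aten::scatter_add_).
index = dst.unsqueeze(-1).expand_as(messages)
out_scatter = torch.zeros(num_nodes, x.size(1)).scatter_add_(0, index, messages)

print(out_spmm)
print(torch.equal(out_spmm, out_scatter))
```

Both paths produce the same node features; only the amount of intermediate memory and the access pattern differ.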

Are there any advantages of edge_index compared with adj_t?

Profile results

with adj_t

Name                     Self CUDA   Self CUDA %   CUDA total   CUDA time avg   # of Calls
total                    310.000us         2.06%     14.980ms        14.980ms            1
propagate                 16.000us         0.11%     11.910ms        11.910ms            1
message_and_aggregate     11.000us         0.07%     11.861ms        11.861ms            1
torch_sparse::spmm_sum    11.844ms        78.71%     11.846ms        11.846ms            1
aten::scatter_add_         1.279ms         8.50%      1.292ms       646.000us            2
aten::linear              44.000us         0.29%    900.000us       300.000us            3
dense_matmul              42.000us         0.28%    865.000us       865.000us            1
aten::matmul              19.000us         0.13%    723.000us       723.000us            1

with edge_index

Name                     Self CUDA   Self CUDA %   CUDA total   CUDA time avg   # of Calls
total                    395.000us         0.36%    111.016ms       111.016ms            1
propagate                 23.000us         0.02%    107.470ms       107.470ms            1
collect                   16.000us         0.01%     60.927ms        60.927ms            1
aten::index_select        60.884ms        54.79%     60.888ms        60.888ms            1
aten::scatter_add_        26.154ms        23.54%     26.168ms         8.723ms            3
aggregate                 17.000us         0.02%     24.854ms        24.854ms            1
message                   10.000us         0.01%     21.628ms        21.628ms            1

(I'm running the model on a GPU.)
"total" is the total model running time.
"propagate" is the "def propagate" call.

As you can see, in "CUDA total" the adj_t version runs more than 7 times faster than the edge_index version (14.980 ms vs. 111.016 ms).
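The speedup can be read directly off the "CUDA total" column; this small calculation only reuses the numbers from the two profiles above.

```python
# "CUDA total" values in milliseconds, copied from the two profiles above.
adj_t_ms = {"total": 14.980, "propagate": 11.910}
edge_index_ms = {"total": 111.016, "propagate": 107.470}

# Speedup of adj_t over edge_index = edge_index time / adj_t time.
speedup = {name: edge_index_ms[name] / adj_t_ms[name] for name in adj_t_ms}
for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x")
```

This prints 7.41x for total and 9.02x for propagate. Note that aten::index_select alone (60.884 ms) takes about four times as long as the entire adj_t forward pass.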

rusty1s · 2022-05-27T06:40:55Z

Maintainer

Yes, adj_t allows us to make use of more memory-efficient aggregation operators, which can also benefit runtime performance (this depends on the density of the graph, though). I tried to describe the idea behind this here. The performance gain comes from adj_t using a CSR layout, while edge_index uses a COO layout. As such, edge_index is far more flexible (you can simply add edges/nodes without re-sorting, etc.), but is somewhat slower during message passing. In addition, not all GNN operators can leverage adj_t to the full extent; e.g., most GNN operators that incorporate edge features or central node features will not see major performance gains when leveraging adj_t.

Due to this and the simplicity of the edge_index representation, we still use it as our default format for representing graphs.
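To illustrate the COO vs. CSR trade-off, here is a minimal sketch in plain PyTorch. The helper names coo_to_csr and csr_sum_aggregate are hypothetical; torch_sparse's SparseTensor keeps a comparable CSR representation but uses optimized kernels instead of a Python loop. The sketch shows that building CSR requires sorting edges by target node, that aggregation then works on contiguous per-row slices, and that adding an edge means rebuilding the CSR arrays.

```python
import torch

def coo_to_csr(edge_index: torch.Tensor, num_nodes: int):
    """Hypothetical helper: CSR arrays (rowptr, col) of adj_t, one row per target node."""
    src, dst = edge_index
    # Group edges by target node; this sort is the "re-sorting" that COO avoids.
    _, perm = torch.sort(dst, stable=True)
    col = src[perm]  # source node of each edge, grouped by target row
    # rowptr[i]:rowptr[i + 1] is the slice of `col` that belongs to row i.
    rowptr = torch.zeros(num_nodes + 1, dtype=torch.long)
    rowptr[1:] = torch.cumsum(torch.bincount(dst, minlength=num_nodes), dim=0)
    return rowptr, col

def csr_sum_aggregate(rowptr: torch.Tensor, col: torch.Tensor, x: torch.Tensor):
    """Row-by-row sum over neighbors; real kernels parallelize this loop."""
    out = torch.zeros(rowptr.numel() - 1, x.size(1), dtype=x.dtype)
    for i in range(out.size(0)):
        start, end = int(rowptr[i]), int(rowptr[i + 1])
        # Only this row's neighbors are gathered; no full E x F tensor is built.
        out[i] = x[col[start:end]].sum(dim=0)
    return out

edge_index = torch.tensor([[0, 1, 1, 2, 3],
                           [1, 0, 2, 3, 2]])
num_nodes = 4
x = torch.arange(num_nodes * 2, dtype=torch.float).view(num_nodes, 2)

rowptr, col = coo_to_csr(edge_index, num_nodes)
out = csr_sum_aggregate(rowptr, col, x)

# Adding an edge 0 -> 3 is a cheap concatenation in COO ...
edge_index_new = torch.cat([edge_index, torch.tensor([[0], [3]])], dim=1)
# ... but the CSR arrays have to be rebuilt (re-sorted) from scratch.
rowptr_new, col_new = coo_to_csr(edge_index_new, num_nodes)
```

The sort in coo_to_csr is also why converting COO to CSR on every forward pass can be expensive unless the result is cached.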


condy0919 Nov 2, 2022

To be specific: when using GraphUNet, how hard would it be to make edge_index CSR to squeeze more performance out of GCN? GCNConv is ready for CSR, but GraphUNet isn't.

a) Converting COO to CSR before GCN propagation leads to worse performance (running time of examples/graph_unet.py: 38 s vs. 23 s); it seems the conversion has a huge overhead.
b) Make edge_index CSR everywhere, which includes:

- removing all uses of edge_weight in models/graph_unet.py, since it is now stored in edge_index
- the augment_adj and related functions, e.g., remove_self_loops, add_self_loops, ...
- the filter_adj function in topk_pool.py

Is there anything missing here?

It looks like plan b needs a lot of work.
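For context on why pooling code like filter_adj is convenient in COO, here is a minimal sketch of that kind of edge filtering. The helper filter_edges_coo is hypothetical and only written in the spirit of filter_adj, not copied from PyG: after pooling keeps a subset of nodes (perm), edges are relabeled and masked, which in COO needs no re-sorting.

```python
import torch

def filter_edges_coo(edge_index: torch.Tensor, perm: torch.Tensor, num_nodes: int):
    """Hypothetical helper: keep only edges whose endpoints both survive pooling,
    and relabel them to 0..len(perm)-1."""
    # new_id[old] = new index of a kept node, or -1 if the node was dropped
    new_id = torch.full((num_nodes,), -1, dtype=torch.long)
    new_id[perm] = torch.arange(perm.numel())
    row, col = new_id[edge_index[0]], new_id[edge_index[1]]
    keep = (row >= 0) & (col >= 0)
    # In COO this is just a boolean mask; a CSR layout would need re-sorting,
    # since the relabeled rows are not guaranteed to stay in order.
    return torch.stack([row[keep], col[keep]], dim=0)

edge_index = torch.tensor([[0, 1, 2, 3],
                           [1, 2, 3, 0]])
perm = torch.tensor([1, 2, 3])  # nodes kept by the pooling step (node 0 dropped)
pooled = filter_edges_coo(edge_index, perm, num_nodes=4)
print(pooled)
```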

------ cut off ------

What I did in plan a was comment out the original self.propagate(...) call, construct a SparseTensor, and call self.propagate on it instead:

https://github.com/pyg-team/pytorch_geometric/blob/master/torch_geometric/nn/conv/gcn_conv.py#L196

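# original (COO): edge_index[0] = source nodes, edge_index[1] = target nodes
out = self.propagate(edge_index, x=x, edge_weight=edge_weight, size=None)

# COO to CSR (needs: from torch_sparse import SparseTensor)
# propagate() expects the transposed adjacency adj_t (rows = target nodes)
adj_t = SparseTensor(row=edge_index[1], col=edge_index[0], value=edge_weight,
                     sparse_sizes=(x.size(0), x.size(0)))
out = self.propagate(adj_t, x=x)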

rusty1s Nov 3, 2022
Maintainer

Yes, plan b needs some changes to support both formats, but it shouldn't be too hard to add if we just assume SparseTensor by default (and do the conversion if edge_index is a Tensor rather than a SparseTensor). For example, remove_self_loops and add_self_loops can easily be replaced by SparseTensor.set_diag(...), etc.
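The remove_self_loops/add_self_loops pair amounts to "replace the diagonal". A minimal COO sketch is below; the helper name replace_self_loops is hypothetical and it ignores edge weights for simplicity. On a SparseTensor, the suggestion above is to get this effect with set_diag(...) instead, so the graph never has to be converted back to edge_index.

```python
import torch

def replace_self_loops(edge_index: torch.Tensor, num_nodes: int) -> torch.Tensor:
    """Hypothetical helper: remove_self_loops followed by add_self_loops on COO."""
    # remove_self_loops: drop every edge i -> i
    keep = edge_index[0] != edge_index[1]
    # add_self_loops: append exactly one edge i -> i per node
    loops = torch.arange(num_nodes).unsqueeze(0).repeat(2, 1)
    return torch.cat([edge_index[:, keep], loops], dim=1)

edge_index = torch.tensor([[0, 1, 1, 2],
                           [0, 0, 2, 2]])
result = replace_self_loops(edge_index, num_nodes=3)
print(result)
```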

condy0919 Nov 3, 2022

> Yes, plan b needs some changes to support both formats, but shouldn't be too hard to add if we just assume SparseTensor by default (and do the conversion if edge_index is a Tensor rather than a SparseTensor. For example, remove_self_loops and add_self_loops can be easily replaced by SparseTensor.set_diag(...), etc.

That's it. I will play with GraphUNet later :-)

condy0919 Nov 9, 2022

I have made it CSR almost everywhere, except that TopKPooling requires COO input, so the CSR has to be converted to COO and then back again. As mentioned above,

> In addition, not all GNN operators can leverage adj_t to full extent, e.g., most GNN operators that incorporate edge features or central node features will not see major performance gains when leveraging adj_t.

the running time shows no major difference after applying this patch.
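A minimal sketch of why edge features limit the gain: a scalar edge weight (as in GCN) can live in the values of adj_t, so aggregation stays a single sparse matmul. A message that combines neighbor features with a per-edge feature vector through a nonlinearity (similar in spirit to GINEConv's relu(x_j + e_ji)) has to be computed per edge first, so the E x F message tensor is built regardless of the graph layout. The graph, edge_attr values, and exact message function are assumptions for illustration.

```python
import torch

# Toy graph and features (illustrative only).
edge_index = torch.tensor([[0, 1, 1, 2, 3],
                           [1, 0, 2, 3, 2]])
num_nodes = 4
x = torch.arange(num_nodes * 2, dtype=torch.float).view(num_nodes, 2)
# Hypothetical edge features: one 2-dim vector per edge.
edge_attr = torch.tensor([[-1., 0.], [0., -5.], [1., 1.], [-10., 0.], [0., 0.]])
src, dst = edge_index

# The relu mixes x_j with each edge's own features, so this cannot be folded
# into one adjacency matmul; the per-edge (E x F) messages are materialized.
messages = torch.relu(x[src] + edge_attr)
out = torch.zeros(num_nodes, x.size(1)).index_add_(0, dst, messages)
print(out)
```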

rusty1s Nov 10, 2022
Maintainer

Thanks, feel free to send a PR :)

jaes77 · 2022-05-27T08:28:39Z

Author

Thanks for the quick and concise explanation.
What kind of operation is not compatible with the adjacency matrix?
Since I'm not familiar with a wide range of GNN models, I can't think of an exception.
Isn't the connectivity information (adj_t, edge_index) used only in aggregation?
