gather + MLP (without scatter) vs. sparse-matrix multiplication? (for performance) #8822
Unanswered
neumannjan
asked this question in Q&A
Replies: 1 comment 2 replies
-
If you are operating on fixed neighborhood sizes (and thus can use reshape + …
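The "fixed neighborhood sizes + reshape" idea the reply alludes to can be sketched roughly as follows (my own illustration in PyTorch; the names and sizes are assumptions, not from the thread):

```python
import torch

# Illustrative sketch: with a fixed neighborhood size K, the gathered
# neighbor features can be reshaped to (N, K, F) and reduced densely,
# so no scatter_reduce is needed.
N, K, F = 4, 3, 8                         # nodes, neighbors per node, feature dim
x = torch.randn(N, F)                     # node features
idx = torch.randint(0, N, (N, K))         # fixed-size neighbor lists (made up)

gathered = x[idx.reshape(-1)]             # gather: shape (N * K, F)
out = gathered.view(N, K, F).mean(dim=1)  # dense mean over the K neighbors
print(out.shape)                          # torch.Size([4, 8])
```

Everything after the gather is a plain dense op, which is exactly the simplification the question below asks about.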
-
Hi,

in Memory-Efficient Aggregations you mention that gather + MLP + scatter is less performant than sparse-matrix multiplication.

However, let me simplify the gather use case. Suppose we don't need to gather source nodes, only target nodes, followed by a reshape, a dense MLP, and a dense aggregation (e.g. `tensor.mean(dim=-1)`). In other words, we've removed the need for `scatter_reduce` by replacing it with a reshape followed by a mean along the last dimension (thanks to some assumptions we can make in our case), so the only remaining "sparse" operation is the gather; all other operations are dense and no different from those in traditional feed-forward NNs.

Would it still be better in this case to use sparse-dense matrix multiplication instead of gather + MLP? Could you please help me understand why? I have read that sparse-dense matrix multiplication leverages the CSR layout, which is more performant than COO. Do I understand correctly that `gather` is essentially equivalent to sparse-dense multiplication in COO layout, and is therefore less performant because COO is less performant than CSR?

Thanks!
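For concreteness, here is a small sketch (my own, with illustrative sizes) of the two formulations being compared: the gather + reshape + dense mean on one side, and the equivalent sparse-dense matmul with a CSR matrix on the other. Each CSR row holds 1/K at the K neighbor columns, so `A @ x` reproduces the neighborhood mean:

```python
import torch

N, K, F = 4, 3, 8                              # nodes, neighbors per node, feature dim
x = torch.randn(N, F)                          # node features
idx = torch.sort(torch.randint(0, N, (N, K)), dim=1).values  # fixed-size neighbor lists

# Formulation 1: gather + reshape + dense mean (no scatter)
out_gather = x[idx.reshape(-1)].view(N, K, F).mean(dim=1)

# Formulation 2: the same aggregation as a CSR sparse-dense matmul
crow = torch.arange(0, N * K + 1, K)           # row pointers: exactly K entries per row
col = idx.reshape(-1)                          # column indices per row
val = torch.full((N * K,), 1.0 / K)            # 1/K weights -> matmul computes the mean
A = torch.sparse_csr_tensor(crow, col, val, size=(N, N))
out_spmm = A @ x

assert torch.allclose(out_gather, out_spmm, atol=1e-5)
```

Both paths compute the same result; the performance question in the thread is about which memory-access pattern (per-row gather vs. CSR traversal inside the spmm kernel) the hardware handles better.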