fix/reduce cuda stream syncs in bipartite_subgraph() #10532

cathalobrien · 2025-11-19T12:53:06Z

Hello,

This PR reduces the number of CUDA stream syncs in bipartite_subgraph() (torch_geometric/utils/_subgraph.py) on GPU from 4 to 1.
It does this by replacing calls to nonzero() with nonzero_static(), and by replacing an indexing operation in index_to_mask() (torch_geometric/utils/mask.py) with scatter().

There is still one more sync. I think it comes from the boolean-mask indexing below, which calls nonzero() internally:

    # torch_geometric/utils/_subgraph.py, bipartite_subgraph()
    edge_index = edge_index[:, edge_mask]
    edge_attr = edge_attr[edge_mask] if edge_attr is not None else None

The unit tests in test/utils/test_mask.py and test/utils/test_subgraph.py pass.
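The unit tests check correctness but not the number of syncs. One way to check a sync count without taking a trace is PyTorch's sync debug mode: with torch.cuda.set_sync_debug_mode("warn"), PyTorch emits a warning for each synchronizing CUDA operation it routes through its own wrappers. The helper below (count_cuda_syncs, a hypothetical name) is a sketch that counts those warnings. It may not match a profiler trace exactly, and it returns None on machines without CUDA.

```python
import warnings

import torch


def count_cuda_syncs(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and count the synchronizing CUDA ops it triggers.

    Returns None when CUDA is unavailable.
    """
    if not torch.cuda.is_available():
        return None
    torch.cuda.synchronize()  # drain pending work so it is not counted
    previous = torch.cuda.get_sync_debug_mode()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, not just the first
        torch.cuda.set_sync_debug_mode("warn")
        try:
            fn(*args, **kwargs)
        finally:
            torch.cuda.set_sync_debug_mode(previous)  # restore the caller's mode
    # The debug-mode warning text mentions a "synchronizing" CUDA operation.
    return sum("synchroniz" in str(w.message) for w in caught)
```

With CUDA tensors, something like count_cuda_syncs(bipartite_subgraph, (src_subset, dst_subset), edge_index) could then be compared before and after the change.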

See the screenshots below of PyTorch profiler traces, viewed in Perfetto, before and after the change. The reduction in syncs is accompanied by a speedup in my use case.


reduce syncs

0ad2690

cathalobrien requested review from akihironitta, rusty1s and wsad1 as code owners November 19, 2025 12:53

[pre-commit.ci] auto fixes from pre-commit.com hooks

ea45d16

for more information, see https://pre-commit.ci
