Replicate Neighbor Sampler behaviour without torch_sparse or pyg_lib. #9218
Unanswered
atpugludrim
asked this question in Q&A
Replies: 0 comments
Hi, I am a researcher working on a GNN-based project, and I want to scale training to a large dataset like Reddit. I am working on a shared server where I do not have `sudo` access; this will be important. I can easily scale training to large datasets using `NeighborLoader` or `NeighborSampler`. The only problem is that they require `pyg-lib` or `torch-sparse`, but those get disabled due to missing dependencies (`GLIBC-2.29` and `GLIBCXX-3.4.29` not found, and since I do not have `sudo`, I cannot install or update them).
, I can not install or update them).I'm thinking of a solution where I replicate the behavior of$i$ th node in
NeighborSampler
in puretorch.utils.data.DataLoader
. Please see the attached piece of code. What I am unable to do is perform parallel BFS to extract k-hop neighborhoods of nodes and renumber nodes in them to create proper batchededge_index
-es, and simultaneously map the target nodes in the right order (B
should be the node mapped toi
inedge_index
, for the convolutions to work properly). The attached code is obviously buggy, and incomplete. Please help me do this parallel BFS + numbering in as efficient a way as possible without using the torch cpp extension thattorch-sparse
uses.I know I can loop over everything and generate their ego graphs and then do another loop to re-number, but is that the most efficient way to do this?
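For reference, here is a minimal pure-Python sketch of the two steps in question: fanout-limited BFS sampling from a set of seed nodes, followed by global-to-local renumbering. All function names (`build_adjacency`, `sample_k_hop`, `relabel`) are hypothetical, not PyG API; the key trick is to put the seeds first in `n_id`, so seed `i` automatically becomes local node `i`, matching what `NeighborSampler` guarantees for the target nodes.

```python
from collections import defaultdict
import random

def build_adjacency(edges):
    """edges: iterable of (src, dst) pairs. Returns dst -> [src] adjacency
    (in-neighbors, matching PyG's default flow='source_to_target')."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[dst].append(src)
    return adj

def sample_k_hop(adj, seeds, num_neighbors, rng=random):
    """Sample up to num_neighbors[k] in-neighbors per node at hop k,
    starting from `seeds` (use -1 for "all neighbors"). Returns
    (n_id, sampled_edges): n_id lists global node ids with the seeds
    first, sampled_edges are (global_src, global_dst) pairs."""
    n_id = list(seeds)
    seen = set(seeds)
    frontier = list(seeds)
    sampled_edges = []
    for fanout in num_neighbors:
        next_frontier = []
        for dst in frontier:
            nbrs = adj.get(dst, [])
            if 0 <= fanout < len(nbrs):
                nbrs = rng.sample(nbrs, fanout)
            for src in nbrs:
                sampled_edges.append((src, dst))
                if src not in seen:       # BFS: visit each node once
                    seen.add(src)
                    n_id.append(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return n_id, sampled_edges

def relabel(n_id, sampled_edges):
    """Renumber global ids to local ids consistent with n_id; since the
    seeds come first in n_id, seed i is local node i by construction."""
    local = {g: i for i, g in enumerate(n_id)}
    return [(local[s], local[d]) for s, d in sampled_edges]
```

To vectorize the renumbering step with torch instead of a Python dict, one common approach is to sort `n_id` and map edge endpoints through `torch.searchsorted` (or `torch.bucketize`). Also worth checking: `torch_geometric.utils.k_hop_subgraph` with `relabel_nodes=True` is, to my knowledge, implemented in plain PyTorch and may work even when `pyg-lib`/`torch-sparse` are disabled, though it extracts full (unsampled) neighborhoods rather than fanout-limited ones.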
Thanks in advance.
Replicate Neighbor Loader.ipynb.gz