Replicate Neighbor Sampler behaviour without torch_sparse or pyg_lib. #9218
Unanswered
atpugludrim
asked this question in Q&A
Replies: 0 comments
Hi, I am a researcher working on a GNN-based project, and I want to scale training to a large dataset like Reddit. I am working on a shared server where I do not have `sudo` access; this will be important. I can easily scale training to large datasets using `NeighborLoader` or `NeighborSampler`. The only problem is that they require `pyg-lib` or `torch-sparse`, but those get disabled due to missing dependencies (`GLIBC-2.29` and `GLIBCXX-3.4.29` not found, and since I do not have `sudo`, I cannot install or update them).
, I can not install or update them).I'm thinking of a solution where I replicate the behavior of$i$ th node in
NeighborSampler
in puretorch.utils.data.DataLoader
. Please see the attached piece of code. What I am unable to do is perform parallel BFS to extract k-hop neighborhoods of nodes and renumber nodes in them to create proper batchededge_index
-es, and simultaneously map the target nodes in the right order (B
should be the node mapped toi
inedge_index
, for the convolutions to work properly). The attached code is obviously buggy, and incomplete. Please help me do this parallel BFS + numbering in as efficient a way as possible without using the torch cpp extension thattorch-sparse
uses.I know I can loop over everything and generate their ego graphs and then do another loop to re-number, but is that the most efficient way to do this?
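For reference, here is a minimal pure-Python sketch of the two steps in question: fanout-limited BFS sampling from a set of seed nodes, followed by global-to-local renumbering. All function names (`build_adjacency`, `sample_k_hop`, `relabel`) are hypothetical, not PyG API; the key trick is to put the seeds first in `n_id`, so seed `i` automatically becomes local node `i`, matching what `NeighborSampler` guarantees for the target nodes.

```python
from collections import defaultdict
import random

def build_adjacency(edges):
    """edges: iterable of (src, dst) pairs. Returns dst -> [src] adjacency
    (in-neighbors, matching PyG's default flow='source_to_target')."""
    adj = defaultdict(list)
    for src, dst in edges:
        adj[dst].append(src)
    return adj

def sample_k_hop(adj, seeds, num_neighbors, rng=random):
    """Sample up to num_neighbors[k] in-neighbors per node at hop k,
    starting from `seeds` (use -1 for "all neighbors"). Returns
    (n_id, sampled_edges): n_id lists global node ids with the seeds
    first, sampled_edges are (global_src, global_dst) pairs."""
    n_id = list(seeds)
    seen = set(seeds)
    frontier = list(seeds)
    sampled_edges = []
    for fanout in num_neighbors:
        next_frontier = []
        for dst in frontier:
            nbrs = adj.get(dst, [])
            if 0 <= fanout < len(nbrs):
                nbrs = rng.sample(nbrs, fanout)
            for src in nbrs:
                sampled_edges.append((src, dst))
                if src not in seen:       # BFS: visit each node once
                    seen.add(src)
                    n_id.append(src)
                    next_frontier.append(src)
        frontier = next_frontier
    return n_id, sampled_edges

def relabel(n_id, sampled_edges):
    """Renumber global ids to local ids consistent with n_id; since the
    seeds come first in n_id, seed i is local node i by construction."""
    local = {g: i for i, g in enumerate(n_id)}
    return [(local[s], local[d]) for s, d in sampled_edges]
```

To vectorize the renumbering step with torch instead of a Python dict, one common approach is to sort `n_id` and map edge endpoints through `torch.searchsorted` (or `torch.bucketize`). Also worth checking: `torch_geometric.utils.k_hop_subgraph` with `relabel_nodes=True` is, to my knowledge, implemented in plain PyTorch and may work even when `pyg-lib`/`torch-sparse` are disabled, though it extracts full (unsampled) neighborhoods rather than fanout-limited ones.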
Thanks in advance.
Replicate Neighbor Loader.ipynb.gz