Training with a huge dataset class #9227
Charles-Ca asked this question in Q&A (unanswered · 1 comment · 2 replies)
Hello,
Thank you for the amazing framework; I really love it. I am currently trying to scale a GNN up to a huge dataset: a single heterogeneous graph with two node types, approximately 700 million edges, and a very large number of nodes. I am trying to train a GraphSAGE model on this data, roughly along the lines of the sketch below.
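For reference, a minimal sketch of the kind of model I mean, built as a homogeneous GraphSAGE and converted with `to_hetero()`; the node/edge types and hidden size are illustrative:

```python
import torch
from torch_geometric.nn import SAGEConv, to_hetero

class GraphSAGE(torch.nn.Module):
    def __init__(self, hidden_channels):
        super().__init__()
        # Lazy (-1, -1) input sizes, so the same module works for
        # both node types once converted with to_hetero().
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), hidden_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Illustrative metadata for an AMiner-like graph (authors write papers);
# the reverse edge type lets messages flow back to the 'author' nodes.
metadata = (
    ['author', 'paper'],
    [('author', 'writes', 'paper'), ('paper', 'rev_writes', 'author')],
)
model = to_hetero(GraphSAGE(hidden_channels=64), metadata)
```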
My graph can't fit into memory, so I wrote my own dataset class, which works great. The dataset holds approximately 150 subgraphs, each with roughly 5 million edges. The data is similar to the AMiner dataset (authors write papers), except that my nodes have features, which is why I am trying to use GraphSAGE for a link prediction task.
I assume it is okay to "break" my graph apart at some point, which is why I went with the dataset class, but maybe I am also wrong here?
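To give an idea, a simplified sketch of what my dataset class looks like; the file layout (one `.pt` file per subgraph) is illustrative:

```python
import os

import torch
from torch_geometric.data import Dataset


class Dataset_Large(Dataset):
    """Serves ~150 pre-built subgraphs from disk, one file each."""

    @property
    def processed_file_names(self):
        # One saved HeteroData object per subgraph (illustrative layout).
        return [f'subgraph_{i}.pt' for i in range(150)]

    def len(self):
        return len(self.processed_file_names)

    def get(self, idx):
        # Only one ~5M-edge subgraph is held in memory at a time.
        return torch.load(os.path.join(self.processed_dir,
                                       f'subgraph_{idx}.pt'))
```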
I'm performing the following operations:
```python
import os

from torch_geometric.loader import DataLoader

data = Dataset_Large(root=os.path.join(path, 'data'))
# batch_size=1 iterates over one subgraph at a time.
graph_loader = DataLoader(data, batch_size=1, shuffle=True)

for epoch in range(1, 15):
    total_loss = total_examples = 0
    for batch_graph in graph_loader:
        ...  # train on the current subgraph (see below)
```
In effect, I am performing a kind of "batch of batches": the outer loop draws one subgraph at a time, and an inner loop then draws mini-batches from that subgraph, roughly as in the sketch below. I am wondering if this is correct, or if it may be memory- or computationally inefficient.
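For concreteness, a minimal sketch of the inner mini-batching with LinkNeighborLoader; the supervision edge type, fan-outs, batch size, and learning rate are placeholders, and `model` and `graph_loader` are taken from the snippets above:

```python
import torch
import torch.nn.functional as F
from torch_geometric.loader import LinkNeighborLoader

# Lazy layers need one forward pass before the optimizer is created:
with torch.no_grad():
    init_batch = next(iter(graph_loader))
    model(init_batch.x_dict, init_batch.edge_index_dict)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

edge_type = ('author', 'writes', 'paper')  # placeholder supervision edges

for epoch in range(1, 15):
    total_loss = total_examples = 0
    for batch_graph in graph_loader:
        # Sample mini-batches of supervision edges, their sampled
        # neighborhoods, and negative edges from the current subgraph.
        link_loader = LinkNeighborLoader(
            batch_graph,
            num_neighbors=[10, 10],  # 2-hop fan-out for every edge type
            edge_label_index=(edge_type, batch_graph[edge_type].edge_index),
            neg_sampling_ratio=1.0,
            batch_size=2048,
            shuffle=True,
        )
        for batch in link_loader:
            optimizer.zero_grad()
            out = model(batch.x_dict, batch.edge_index_dict)
            # Score candidate links with a dot product of the endpoints.
            src, dst = batch[edge_type].edge_label_index
            pred = (out['author'][src] * out['paper'][dst]).sum(dim=-1)
            loss = F.binary_cross_entropy_with_logits(
                pred, batch[edge_type].edge_label.float())
            loss.backward()
            optimizer.step()
            total_loss += float(loss) * pred.numel()
            total_examples += pred.numel()
```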
Thank you for your advice.
Reply:
In general this is a valid approach. Two things: