Hello, all,

I want to do graph-level classification using GCNConv, but the problem is that each graph in the dataset is too large to fit on one GPU. The average number of nodes per graph is several million, and there may be hundreds of graphs. A single GPU does not have enough memory for the dataset, so I want to use multiple GPUs to train the model.
I have read the related tutorials, such as Distributed Training, and the example code under example/multi_gpu/. I found that, besides model parallelism, there are two ways to do multi-GPU training with data parallelism:
1. For node- or edge-level tasks, using NeighborLoader or LinkNeighborLoader to create mini-batches within one large graph;
2. For graph-level tasks, splitting each batch of graphs into mini-batches, since the individual graphs are small (roughly as in the sketch below).
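For completeness, this is a minimal sketch of how I understand option 2 (`MyGNN` and `dataset` are placeholder names I made up; it assumes each rank runs this in its own process with `rank`/`world_size` set up as usual for DDP):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F
from torch.nn.parallel import DistributedDataParallel
from torch_geometric.loader import DataLoader

dist.init_process_group('nccl', rank=rank, world_size=world_size)
model = DistributedDataParallel(MyGNN().to(rank), device_ids=[rank])
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# Each rank trains on a disjoint shard of the (small) graphs.
shard = dataset[torch.arange(rank, len(dataset), world_size)]
loader = DataLoader(shard, batch_size=32, shuffle=True)

for batch in loader:
    batch = batch.to(rank)
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index, batch.batch)  # one row per graph
    loss = F.cross_entropy(out, batch.y)
    loss.backward()  # DDP averages the gradients across ranks here
    optimizer.step()
```

This only works because a whole mini-batch of graphs fits on a single GPU, which is exactly what does not hold in my case.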
But neither of those methods suits my problem. My idea is to use NeighborLoader to process every graph in the dataset separately and to replace GCNConv with SAGEConv:
```python
import torch
from torch_geometric.loader import NeighborLoader

for graph in dataset:
    # Split the seed nodes of this graph evenly across the ranks.
    node_idx = torch.arange(graph.num_nodes)
    rank_idx = node_idx.tensor_split(world_size)[rank]
    single_graph_loader = NeighborLoader(
        graph,
        num_neighbors=[-1] * 3,  # sample all neighbors, 3 hops deep
        input_nodes=rank_idx,
        batch_size=128,
    )
    for batch in single_graph_loader:
        ...
```
I am not sure whether this code is feasible, so please correct me if I am wrong.
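On the model side, replacing GCNConv with SAGEConv is something I picture roughly like this (only a sketch; the layer sizes are placeholders, and only the seed-node rows are returned because NeighborLoader puts the seed nodes first in each mini-batch):

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import SAGEConv

class SAGE(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, hidden_dim)
        self.conv3 = SAGEConv(hidden_dim, hidden_dim)

    def forward(self, x, edge_index, batch_size):
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = self.conv3(x, edge_index)
        # NeighborLoader places the seed nodes at the front of the batch,
        # so only the first `batch_size` rows are the embeddings we want.
        return x[:batch_size]
```

Each mini-batch from the loader above would then be consumed as `model(batch.x, batch.edge_index, batch.batch_size)`, i.e. one embedding per seed node on that rank.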
Another problem I am encountering is how to calculate the loss. Traditional DDP computes the loss on each mini-batch and only synchronizes the gradients. In my case, the loss can only be computed on the whole graph, not on a mini-batch: each rank can only produce a "subgraph" feature by applying a READOUT over the node features it holds. If I reduce all of these "subgraph" features into the graph-level feature on rank 0, will that GPU run out of memory?
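To make the readout step concrete, this is roughly what I mean (only a sketch with a sum readout; `model`, `classifier`, `hidden_dim`, and `y` are placeholder names for illustration):

```python
import torch
import torch.distributed as dist
import torch.nn.functional as F

# Per-rank partial readout: sum the seed-node embeddings this rank produced
# for the current graph, accumulated over its NeighborLoader mini-batches.
partial = torch.zeros(hidden_dim, device=f'cuda:{rank}')
for batch in single_graph_loader:
    batch = batch.to(rank)
    emb = model(batch.x, batch.edge_index, batch.batch_size)
    partial = partial + emb.sum(dim=0)

# Combine the per-rank partial sums into one graph-level feature vector;
# only `hidden_dim` floats per graph are communicated here.
# NOTE: plain dist.all_reduce is not autograd-aware, so for gradients to flow
# back through the readout an autograd-enabled collective (e.g.
# torch.distributed.nn.functional.all_reduce) would likely be needed instead.
dist.all_reduce(partial, op=dist.ReduceOp.SUM)

logits = classifier(partial.unsqueeze(0))  # graph-level prediction
loss = F.cross_entropy(logits, y.view(1))
```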
The new Distributed Training in PyG tutorial looks like a cure, but it cannot run on GPUs for now. Does anyone have any suggestions?