Replies: 5 comments 7 replies
-
If your model is too large to fit on a single GPU, then FSDP (reference1, reference2) should help.
Are you not using subgraph sampling because you want to make graph-level predictions?
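For reference, by FSDP I mean the plain torch.distributed.fsdp wrapper around a PyG model. A minimal sketch, assuming one process per GPU and a made-up two-layer GCN (the layer sizes are just placeholders):

```python
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch_geometric.nn import GCNConv


class GCN(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, out_channels):
        super().__init__()
        self.conv1 = GCNConv(in_channels, hidden_channels)
        self.conv2 = GCNConv(hidden_channels, out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)


def setup_fsdp_model(rank: int) -> FSDP:
    # Assumes dist.init_process_group('nccl', ...) has already been called,
    # one process per GPU. FSDP then shards parameters, gradients and
    # optimizer state across ranks instead of replicating them as DDP does.
    torch.cuda.set_device(rank)
    model = GCN(in_channels=128, hidden_channels=256, out_channels=10).to(rank)
    return FSDP(model)
```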
-
Yes, I think so. There is basically all-to-all message passing. But that means I should be able to use FSDP with PyTorch Geometric, or are there known support issues or anything? I was not able to find any examples. Thank you very much.
-
From https://discuss.pytorch.org/t/fsdp-issue-with-pytorch-geometric/174628: when trying to adapt the distributed_batching example (https://github.com/pyg-team/pytorch_geometric/blob/master/examples/multi_gpu/distributed_batching.py) to FSDP, any idea what is going wrong?
Source code:
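For readers without the snippet: as an illustration only, and assuming the usual structure of that multi-GPU example (mp.spawn, a DistributedSampler, a DDP-wrapped model), the adaptation amounts to swapping DistributedDataParallel for FSDP inside the spawned worker. The model, dataset and hyperparameters below are placeholders, not the code from the post:

```python
import os

import torch
import torch.distributed as dist
import torch.multiprocessing as mp
import torch.nn.functional as F
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP
from torch.utils.data.distributed import DistributedSampler
from torch_geometric.datasets import TUDataset
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GraphConv, global_mean_pool


class Net(torch.nn.Module):
    def __init__(self, in_channels, hidden_channels, num_classes):
        super().__init__()
        self.conv1 = GraphConv(in_channels, hidden_channels)
        self.conv2 = GraphConv(hidden_channels, hidden_channels)
        self.lin = torch.nn.Linear(hidden_channels, num_classes)

    def forward(self, x, edge_index, batch):
        x = self.conv1(x, edge_index).relu()
        x = self.conv2(x, edge_index).relu()
        x = global_mean_pool(x, batch)  # graph-level prediction
        return self.lin(x)


def run(rank, world_size, dataset):
    os.environ.setdefault('MASTER_ADDR', 'localhost')
    os.environ.setdefault('MASTER_PORT', '12355')
    dist.init_process_group('nccl', rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # Each rank sees a different shard of the graphs.
    sampler = DistributedSampler(dataset, num_replicas=world_size, rank=rank)
    loader = DataLoader(dataset, batch_size=32, sampler=sampler)

    model = Net(dataset.num_features, 64, dataset.num_classes).to(rank)
    model = FSDP(model)  # previously: DistributedDataParallel(model, device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

    for epoch in range(1, 11):
        sampler.set_epoch(epoch)
        for data in loader:
            data = data.to(rank)
            optimizer.zero_grad()
            out = model(data.x, data.edge_index, data.batch)
            loss = F.cross_entropy(out, data.y)
            loss.backward()
            optimizer.step()

    dist.destroy_process_group()


if __name__ == '__main__':
    dataset = TUDataset(root='data/TUDataset', name='PROTEINS')
    world_size = torch.cuda.device_count()
    mp.spawn(run, args=(world_size, dataset), nprocs=world_size, join=True)
```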
-
pytorch/pytorch#95791 (comment) seems similar.
-
OK, now with PyTorch 2.0 and PyG 2.3: with one GPU, FSDP switches sharding off and then it works (distributed/fsdp/_init_utils.py:295: UserWarning: FSDP is switching to use ...). But with GPU count > 1 it fails: Traceback (most recent call last): ... Process 0 terminated with the following error: ... The failing routine has nothing to do with pytorch_geometric, though?
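If it helps narrow things down: that warning is FSDP falling back to no sharding because the world size is 1. One way to check whether sharding itself is the problem on multiple GPUs is to force the same non-sharded behaviour explicitly via the sharding_strategy argument. A minimal sketch, assuming the process group is already initialized and using a placeholder module:

```python
import torch
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Placeholder module; substitute your GNN. Assumes dist.init_process_group()
# has already been called in this process.
model = torch.nn.Linear(16, 4).cuda()

# NO_SHARD keeps a full copy of the parameters on every rank (DDP-like),
# which is the behaviour FSDP falls back to when the world size is 1.
# If multi-GPU training works with NO_SHARD but not with the default
# FULL_SHARD, the problem is in the sharding path rather than in PyG.
model = FSDP(model, sharding_strategy=ShardingStrategy.NO_SHARD)
# model = FSDP(model, sharding_strategy=ShardingStrategy.FULL_SHARD)  # default
```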
-
Hi, I am new to PyTorch Geometric, so sorry if this is a stupid question: I need to reduce the memory footprint as much as possible and was wondering whether FSDP or ddp_sharded is an option for distributed training. I always have the full graph in a sample, i.e. no subgraph sampling. Or is only ddp_spawn supported? Can I use ddp-spawn-sharded? Thank you!
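For context on the strategy names: with PyTorch Lightning these options are selected through the Trainer's strategy argument. A rough sketch (the exact strategy strings depend on the Lightning version; ddp_sharded was the older FairScale-based option, and "fsdp" is the native one in Lightning 2.x):

```python
import pytorch_lightning as pl

# Example Trainer configuration; strategy strings vary across Lightning
# versions, so treat these as illustrations rather than a support statement.
trainer = pl.Trainer(
    accelerator='gpu',
    devices=2,
    strategy='ddp_spawn',   # plain data parallelism, one spawned process per GPU
    # strategy='fsdp',      # shards params/grads/optimizer state (Lightning >= 2.0)
)
```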