Support for custom layer sharding #9882
Unanswered
DavidPeleg6 asked this question in DDP / multi-GPU / multi-node
Replies: 1 comment
-
Dear @DavidPeleg6, Yes, in theory it should be possible. The TorchSharded API is still experimental, so expect some changes. Here is the MoE implementation from DeepSpeed: https://github.com/microsoft/DeepSpeed/blob/9f5939d2a7bcdd2953d52a0baf09ede485221a81/deepspeed/moe/layer.py#L18. It is a good reference for how they actually implement MoE :) Best,
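For a quick intuition of what that file does, here is a toy single-process sketch of top-1 expert routing. This is illustrative only, not the DeepSpeed API; the class name, sizes, and expert module are placeholders:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyTop1MoE(nn.Module):
    """Single-process toy of top-1 expert routing, for illustration only."""

    def __init__(self, hidden_size: int, num_experts: int):
        super().__init__()
        # The gate scores each token against every expert.
        self.gate = nn.Linear(hidden_size, num_experts)
        self.experts = nn.ModuleList(
            [
                nn.Sequential(
                    nn.Linear(hidden_size, 4 * hidden_size),
                    nn.ReLU(),
                    nn.Linear(4 * hidden_size, hidden_size),
                )
                for _ in range(num_experts)
            ]
        )

    def forward(self, x):  # x: (num_tokens, hidden_size)
        probs = F.softmax(self.gate(x), dim=-1)   # (num_tokens, num_experts)
        top1 = probs.argmax(dim=-1)               # chosen expert per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = top1 == i
            if mask.any():
                # Dispatch only the selected tokens to expert i and weight the
                # result by the gate probability so the gate receives gradients.
                out[mask] = expert(x[mask]) * probs[mask, i].unsqueeze(-1)
        return out

moe = ToyTop1MoE(hidden_size=32, num_experts=4)
print(moe(torch.randn(10, 32)).shape)  # torch.Size([10, 32])
```

The linked DeepSpeed implementation additionally distributes the experts across ranks (expert parallelism), which is the part most relevant to placing different layers on different GPU subsets.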
-
Hello,
I recently noticed the inclusion of the simplified sharding API mentioned in #9375, and I wonder whether it would be possible to use this API to perform a variable number of splits on any layer.
For example, if I have a two-layer MLP and a node with 5 GPUs, would it be possible to explicitly split the first layer across gpus[0,1] and the second layer across gpus[2,3,4]?
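To make the example concrete, here is a rough plain-PyTorch sketch of the placement I have in mind. It uses hand-written device placement rather than the sharding API, with arbitrary sizes, and assumes a node with at least 5 GPUs:

```python
import torch
import torch.nn as nn

class ColumnShardedLinear(nn.Module):
    """Split a Linear's output features across an explicit list of GPUs."""

    def __init__(self, in_features: int, out_features: int, devices):
        super().__init__()
        assert out_features % len(devices) == 0
        shard = out_features // len(devices)
        self.devices = devices
        self.shards = nn.ModuleList(
            [nn.Linear(in_features, shard).to(d) for d in devices]
        )

    def forward(self, x):
        # Run each shard on its own GPU, gather the partial outputs on the
        # first listed device, and concatenate along the feature dimension.
        outs = [s(x.to(d)) for s, d in zip(self.shards, self.devices)]
        return torch.cat([o.to(self.devices[0]) for o in outs], dim=-1)


class TwoLayerMLP(nn.Module):
    def __init__(self):
        super().__init__()
        # First layer on gpus[0, 1], second layer on gpus[2, 3, 4].
        self.fc1 = ColumnShardedLinear(128, 256, ["cuda:0", "cuda:1"])
        self.fc2 = ColumnShardedLinear(256, 96, ["cuda:2", "cuda:3", "cuda:4"])

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = TwoLayerMLP()
y = model(torch.randn(8, 128, device="cuda:0"))
print(y.shape, y.device)  # torch.Size([8, 96]) cuda:2
```

The question is whether the new API can express this kind of uneven, per-layer device assignment without writing the placement by hand as above.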