Hello! Is there a recommended way to have meshes that can change logical shape dynamically? Or, put another way, can a mesh have multiple axis-name mappings? To make it a little more concrete, a user might want to have a mesh of shape … ; then, in sharding annotations etc., code would be able to reference either one.

One possible application of this would be DeepSpeed-TED style expert parallelism, where in expert regions DP is "traded" for EP. Megatron expands on this even further and lets users do arbitrary reshapes of the (DP, EP, CP, TP) submesh for the expert region (i.e. also reduce CP or TP in favor of EP). Another possible use case would be "hybrid sharded data parallel" as in MiCS, where we want to shard parameters only along a subset of DP.
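A minimal sketch of the idea, assuming that two separate `Mesh` objects built over the same device array, in the same device order, are an acceptable way to express two axis-name mappings. The axis names (`dp`, `ep`, `tp`, `dp_rep`, `dp_shard`) and all array sizes are hypothetical. Each region uses its own `Mesh`, and arrays move between regions with `jax.device_put`. The sketch fakes 8 CPU devices via `XLA_FLAGS` so it runs without accelerators, and it deliberately avoids mixing arrays from both meshes inside one `jit`, which may not be supported.

```python
import os

# Assumption for this sketch: fake 8 CPU devices so it runs anywhere.
# The flag must be set before jax is imported.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "") + " --xla_force_host_platform_device_count=8"
)

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

devices = np.array(jax.devices()[:8])

# Two logical views of the SAME 8 devices, in the same order.
# Dense region: 4-way DP x 2-way TP.
dense_mesh = Mesh(devices.reshape(4, 2), ("dp", "tp"))
# Expert region: DP is split into (dp, ep), trading 2x DP for 2x EP.
expert_mesh = Mesh(devices.reshape(2, 2, 2), ("dp", "ep", "tp"))

# Activations in the dense region: batch over dp, hidden over tp.
x = jax.device_put(jnp.ones((8, 16)), NamedSharding(dense_mesh, P("dp", "tp")))

# Expert (dp, ep) flattens row-major onto dense dp, so this spec gives
# every device the same slice of x as P("dp", "tp") on the dense mesh.
x_expert_sharding = NamedSharding(expert_mesh, P(("dp", "ep"), "tp"))
x_e = jax.device_put(x, x_expert_sharding)

# Hypothetical expert weights: 4 experts split over ep, FFN dim over tp,
# replicated over the expert-region dp axis.
w = jax.device_put(
    jnp.zeros((4, 16, 32)), NamedSharding(expert_mesh, P("ep", None, "tp"))
)

# MiCS-style hybrid sharded DP (hypothetical axis names): parameters are
# sharded only over dp_shard and replicated over dp_rep.
hsdp_mesh = Mesh(devices.reshape(2, 2, 2), ("dp_rep", "dp_shard", "tp"))
params = jax.device_put(
    jnp.zeros((16, 32)), NamedSharding(hsdp_mesh, P("dp_shard", "tp"))
)
```

Because the expert mesh is a row-major reshape of the dense mesh, `P(("dp", "ep"), "tp")` on the expert mesh describes the same per-device placement as `P("dp", "tp")` on the dense mesh, so code in either region can annotate shardings with its own axis names.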
Yes, I think there is a way to do this. Let me write an example to show you (in a little bit).
Actually, we need to do some more work to make this possible. But this is roughly how you can do it (note that this code currently raises an error, which I'll fix):