- `--fast_moe`: trains MoE models in parallel with [Scatter MoE kernels](https://github.com/foundation-model-stack/fms-acceleration/tree/main/plugins/accelerated-moe#fms-acceleration-for-mixture-of-experts), increasing throughput and decreasing memory usage.

Notes:
* `quantized_lora_config` requires that it be used along with the LoRA tuning technique. See the [LoRA tuning section](https://github.com/foundation-model-stack/fms-hf-tuning/tree/main?tab=readme-ov-file#lora-tuning-example) for the LoRA parameters to pass (a hedged sketch combining the two appears after these notes).
* Notes on Multipack
  - works only for *multi-gpu*.
  - currently only includes the version of *multipack* optimized for linear attention implementations like *flash-attn*.
* Notes on Fast MoE
  - `--fast_moe` takes either an integer or a boolean value (see the example launch command after these notes).
    - When an integer `n` is passed, expert parallel sharding is enabled with an expert parallel degree of `n`, along with the Scatter MoE kernels.
    - When a boolean is passed, the expert parallel degree defaults to 1 and the behaviour is as follows:
      - if `True`, the Scatter MoE kernels are used with experts sharded according to the top-level sharding protocol (e.g. FSDP).
      - if `False`, the Scatter MoE kernels are used with complete replication of experts across ranks.
  - `world_size` must be divisible by the `ep_degree`.
  - The `number of experts` in the MoE module must be divisible by the `ep_degree`.
  - Running fast moe modifies the state dict of the model, so checkpoints must be post-processed. This happens automatically, and the converted checkpoint can be found in the `hf_converted_checkpoint` folder within every saved checkpoint directory. Alternatively, the same conversion can be performed manually through the [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) script (a hedged sketch follows below).
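
To make the constraints above concrete, here is a minimal launch sketch, assuming the `accelerate launch` entry point into `tuning/sft_trainer.py` used elsewhere in this README; the model name, data path, output directory, and accelerate config file are placeholders, and any other flags your setup requires are omitted.

```bash
# Illustrative sketch only: 4 GPUs (world_size = 4) with expert parallel degree 2.
# This satisfies the constraints above: world_size (4) is divisible by ep_degree (2),
# and a Mixtral-style model with 8 experts is also divisible by ep_degree (2).
accelerate launch \
    --num_processes=4 \
    --config_file fixtures/accelerate_fsdp_defaults.yaml \
    tuning/sft_trainer.py \
    --model_name_or_path mistralai/Mixtral-8x7B-v0.1 \
    --training_data_path $TRAIN_DATA_PATH \
    --output_dir $OUTPUT_DIR \
    --torch_dtype bfloat16 \
    --fast_moe 2

# Passing a boolean instead keeps ep_degree = 1:
#   --fast_moe True    # Scatter MoE kernels, experts sharded by the top-level protocol (e.g. FSDP)
#   --fast_moe False   # Scatter MoE kernels, experts fully replicated across ranks
```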
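For the checkpoint note above, a hedged sketch of locating the automatically converted weights and of invoking the manual conversion; the module-style invocation and its positional arguments (checkpoint directory, output directory, original model) are assumptions, so verify them against the checkpoint utils script itself.

```bash
# The automatically post-processed (Hugging Face compatible) weights, written
# alongside every saved checkpoint ("checkpoint-100" is a placeholder step):
ls $OUTPUT_DIR/checkpoint-100/hf_converted_checkpoint

# Hypothetical manual conversion via the checkpoint utils script; the argument
# order (checkpoint dir, output dir, original model) is assumed, not confirmed.
python -m fms_acceleration_moe.utils.checkpoint_utils \
    $OUTPUT_DIR/checkpoint-100 \
    $OUTPUT_DIR/checkpoint-100/hf_converted_checkpoint \
    mistralai/Mixtral-8x7B-v0.1
```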
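For the `quantized_lora_config` note, a minimal sketch combining LoRA parameters with a GPTQ-quantized base model; the `--auto_gptq triton_v2` value and the accompanying dtype flags are assumptions based on the acceleration section of this README, so confirm the exact flags against the LoRA tuning section linked above.

```bash
# Illustrative sketch: quantized base model + LoRA adapters on a single GPU.
# The base model must already be GPTQ-quantized; paths and values are placeholders.
python tuning/sft_trainer.py \
    --model_name_or_path $GPTQ_QUANTIZED_MODEL_PATH \
    --training_data_path $TRAIN_DATA_PATH \
    --output_dir $OUTPUT_DIR \
    --peft_method lora \
    --r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --target_modules q_proj v_proj \
    --torch_dtype float16 \
    --fp16 \
    --auto_gptq triton_v2   # assumed quantized_lora_config flag; verify before use
```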