feat: Enable LoRA saving only for non-MoE linear layers when training with kernels. (foundation-model-stack#530)
* save peft
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* post process hf converted dir
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: convert hf converted checkpoint
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* lora config
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* save adapter config
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: add input linear and output linear to target modules
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: extend instead of append
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: if hasattr peft config
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: remove unneeded target modules
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* test: lora for scattermoe
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* explicitly don't support router layer
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* docs: update documentation
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: simplify accelerate launch post processing
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* tests: more target modules + ep_degree
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: only restrict all-linear, raise warning for other modules
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: augmentation test
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: raise error
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* turn off requires grad if using scattermoe with lora
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: freeze scattermoe params
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
* fix: safer freezing
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
---------
Signed-off-by: Will Johnson <mwjohnson728@gmail.com>
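The last few commits above freeze the ScatterMoE parameters so that only the LoRA adapters receive gradients. A rough sketch of the idea, using assumed module-name patterns (the actual fms-hf-tuning logic may differ):

```python
# Sketch only: freeze ScatterMoE expert/router weights so LoRA training cannot
# update them. The name patterns below are assumptions for illustration, not the
# patterns used by the plugin itself.
def freeze_scattermoe_params(model, patterns=("scattermoe", "router", "experts")):
    for name, param in model.named_parameters():
        if any(p in name.lower() for p in patterns):
            param.requires_grad = False
```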
README.md: 3 additions & 0 deletions
@@ -855,6 +855,9 @@ Notes:
 - When a boolean is passed, the expert parallel degree defaults to 1 and the behaviour is as follows:
   - if True, Scatter MoE Kernels are used with experts sharded based on the top-level sharding protocol (e.g. FSDP).
   - if False, Scatter MoE Kernels are used with complete replication of experts across ranks.
+- FSDP must be used when LoRA tuning with `--fast_moe`.
+- LoRA tuning with ScatterMoE is supported, but because of inference restrictions in vLLM/vanilla PEFT, the expert layers and the router linear layer should not be trained as `target_modules` for models tuned with ScatterMoE. Users have control over which `target_modules` they wish to train:
+  - At this time, only attention layers are trainable when using LoRA with ScatterMoE. Until support for the router linear layer is added, target modules must be specified explicitly (e.g. `target_modules: ["q_proj", "v_proj", "o_proj", "k_proj"]`) instead of passing `target_modules: ["all-linear"]`.
 - `world_size` must be divisible by the `ep_degree`.
 - The `number of experts` in the MoE module must be divisible by the `ep_degree`.
 - Running fast moe modifies the state dict of the model, so checkpoints must be post-processed. This happens automatically, and the converted checkpoint can be found in the `hf_converted_checkpoint` folder within every saved checkpoint directory. Alternatively, the same conversion can be performed manually with the [checkpoint utils](https://github.com/foundation-model-stack/fms-acceleration/blob/main/plugins/accelerated-moe/src/fms_acceleration_moe/utils/checkpoint_utils.py) script.
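To make the `target_modules` restriction above concrete, here is a minimal, illustrative PEFT `LoraConfig` that targets only the attention projections; the rank, alpha, and dropout values are placeholders, not recommendations from this PR.

```python
from peft import LoraConfig

# Minimal sketch: LoRA restricted to the attention projections, as required when
# tuning with ScatterMoE (--fast_moe). Do not pass "all-linear" and do not target
# the router linear layer or the expert layers.
lora_config = LoraConfig(
    r=8,              # example rank (placeholder)
    lora_alpha=16,    # example scaling factor (placeholder)
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
```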
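And a sketch of loading the post-processed adapter from the `hf_converted_checkpoint` folder with vanilla PEFT; the base model name and checkpoint path below are hypothetical placeholders.

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Hypothetical paths for illustration only: the post-processed adapter lives in
# the hf_converted_checkpoint folder inside each saved checkpoint directory.
base = AutoModelForCausalLM.from_pretrained("path/to/base-moe-model")
model = PeftModel.from_pretrained(base, "output/checkpoint-100/hf_converted_checkpoint")
```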