You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Copy file name to clipboardExpand all lines: README.md
+6-6Lines changed: 6 additions & 6 deletions
Display the source diff
Display the rich diff
Original file line number
Diff line number
Diff line change
@@ -75,15 +75,15 @@ You can contact us and communicate with us by adding our group:
75
75
76
76
## 🎉 News
77
77
- 🎁 2025.04.15: SWIFT paper has been accepted by AAAI 2025, you can find the paper [here](https://ojs.aaai.org/index.php/AAAI/article/view/35383).
78
-
- 🎁 2025.03.23: SWIFT supports multi round GRPO, this is used to construct multi turn conversations(use cases like agent tool calling), check script [here](examples/train/grpo/train_multi_round.sh).
78
+
- 🎁 2025.03.23: SWIFT supports multi round GRPO, this is used to construct multi turn conversations(use cases like agent tool calling), check script [here](examples/train/grpo/internal/train_multi_round.sh).
79
79
- 🎁 2025.03.16: SWIFT supports training with Megatron's parallel technology. Please refer to the [Megatron-SWIFT Training Documentation](https://swift.readthedocs.io/en/latest/Instruction/Megatron-SWIFT-Training.html).
80
80
- 🎁 2025.03.15: SWIFT support the fine-tuning of gme(multi-modal) embedding models,please check the [training script](examples/train/embedding/train_gme.sh)。
81
-
- 🎁 2025.03.13: We provide a script of GRPO to train a 72B model with only 4 GPUs(4*80G), please check [here](examples/train/grpo/train_72b_4gpu.sh)
82
-
- 🎁 2025.03.05: We support the hybrid mode of GRPO(rollout and actor on the same GPU, rollout sleep when actor training), meanwhile tensor parallel for GRPO, check [training script here](examples/train/grpo/multi_gpu_mp_colocate.sh)
83
-
- 🎁 2025.02.21: We test the speed performance of GRPO,and with some tricks to [speed up to 300%](examples/train/grpo/full_lmdeploy.sh). WanDB charts can be found [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
81
+
- 🎁 2025.03.13: We provide a script of GRPO to train a 72B model with only 4 GPUs(4*80G), please check [here](examples/train/grpo/internal/train_72b_4gpu.sh)
82
+
- 🎁 2025.03.05: We support the hybrid mode of GRPO(rollout and actor on the same GPU, rollout sleep when actor training), meanwhile tensor parallel for GRPO, check [training script here](examples/train/grpo/internal/multi_gpu_mp_colocate.sh)
83
+
- 🎁 2025.02.21: We test the speed performance of GRPO,and with some tricks to [speed up to 300%](examples/train/grpo/internal/full_lmdeploy.sh). WanDB charts can be found [here](https://wandb.ai/tastelikefeet/grpo_perf_test?nw=nwuseryuzezyz)
84
84
- 🎁 2025.02.21: Support distill from LLM API,Please check [this example](examples/sampler/distill/distill.sh)
85
85
- 🎁 2025.02.17: Support SwanLab, just add [a few of arguments](docs/source_en/Instruction/Command-line-parameters.md#swanlab) you can use swanlab to analysis your training results
86
-
- 🎁 2025.02.16: Support LMDeploy in GRPO, use `--use_lmdeploy true`. Please check [this script](examples/train/grpo/full_lmdeploy.sh)
86
+
- 🎁 2025.02.16: Support LMDeploy in GRPO, use `--use_lmdeploy true`. Please check [this script](examples/train/grpo/internal/full_lmdeploy.sh)
87
87
- 🔥 2025.02.12: Support for GRPO(Group Relative Policy Optimization) algorithm for llm and mllm, document can be found in [here](docs/source_en/Instruction/GRPO.md)
88
88
- 🎁 2025.02.10: SWIFT support the fine-tuning of embedding models,please check the [training script](examples/train/embedding/train_gte.sh)。
89
89
- 🎁 2025.01.23: SWIFT support the `sample` command, this is a very important feature for complex CoT and RFT. Meanwhile, we support an [Reinforced Fine-tuning script](docs/source_en/Instruction/Reinforced_Fine_tuning.md).
| KTO Training | ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/rlhf/kto.sh)| ✅ |[✅](https://github.com/modelscope/ms-swift/blob/main/examples/train/multimodal/rlhf/kto.sh)|
0 commit comments