
Commit 9fc0a7c

Support SimPO Algorithm (#1037)
1 parent 49d59be commit 9fc0a7c

File tree: 21 files changed (+749 lines, -12 lines)

README.md

Lines changed: 6 additions & 4 deletions
@@ -47,7 +47,8 @@ SWIFT has rich documentation for users, please check [here](https://github.com/
 SWIFT web-ui is available both on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary), please feel free to try!

 ## 🎉 News
-- 2024.06.01: Support for deploying large multimodal models, please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.
+- 🔥2024.06.01: Supports **SimPO** training! See the [document](https://github.com/modelscope/swift/blob/main/docs/source_en/LLM/SimPO.md) to start training!
+- 🔥2024.06.01: Support for deploying large multimodal models, please refer to the [Multimodal Deployment Documentation](docs/source_en/Multi-Modal/mutlimodal-deployment.md) for more information.
 - 2024.05.31: Supports the Mini-Internvl model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
 - 2024.05.24: Supports the Phi3-vision model. Use model_type `phi3-vision-128k-instruct` to train.
 - 2024.05.22: Supports DeepSeek-V2-Lite series models, model_type are `deepseek-v2-lite` and `deepseek-v2-lite-chat`
@@ -235,7 +236,7 @@ You can refer to the following scripts to customize your own training script.
 |------------------|-------------------------------------------------------------------------------|
 | Pretraining | Text Generation |
 | Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech|
-| Human Alignment | DPO<br>ORPO |
+| Human Alignment | DPO<br>ORPO<br>SimPO |
 | Text-to-Image | DreamBooth, etc. |
 | Text-to-Video | - |

@@ -615,8 +616,6 @@ make docs
 | [LLM Evaluation](docs/source_en/LLM/LLM-eval.md) |
 | [LLM Quantization](docs/source_en/LLM/LLM-quantization.md) |
 | [LLM Deployment](docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md) |
-| [DPO Human Alignment Training](docs/source_en/LLM/DPO.md) |
-| [ORPO Human Alignment Training](docs/source_en/LLM/ORPO.md) |
 | [AnimateDiff Training](docs/source_en/AIGC/AnimateDiff-train-infer.md) |

 ### Reference Documentation
@@ -637,6 +636,9 @@ make docs
 | [Qwen1.5 Best Practice](docs/source_en/LLM/Qwen1.5-best-practice.md) |
 | [Multi-Modal Model Training Best Practice](docs/source_en/Multi-Modal/index.md) |
 | [NPU Best Practice](docs/source_en/LLM/NPU-best-practice.md) |
+| [DPO Human Alignment Training](docs/source_en/LLM/DPO.md) |
+| [ORPO Human Alignment Training](docs/source_en/LLM/ORPO.md) |
+| [SimPO Human Alignment Training](docs/source_en/LLM/SimPO.md) |


 ### Deep Learning Tutorials

README_CN.md

Lines changed: 7 additions & 4 deletions
@@ -48,7 +48,8 @@ SWIFT has rich documentation for users; if you have any questions, please check [here](https:
 You can try SWIFT's web-ui features on [Huggingface space](https://huggingface.co/spaces/tastelikefeet/swift) and [ModelScope studio](https://www.modelscope.cn/studios/iic/Scalable-lightWeight-Infrastructure-for-Fine-Tuning/summary).

 ## 🎉 News
-- 2024.06.01: Supports deployment of large multimodal models; see the [Multimodal Deployment Documentation](docs/source/Multi-Modal/MLLM部署文档.md).
+- 🔥2024.06.01: Supports **SimPO** training. Use `swift simpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/SimPO算法最佳实践.md).
+- 🔥2024.06.01: Supports deployment of large multimodal models; see the [Multimodal Deployment Documentation](docs/source/Multi-Modal/MLLM部署文档.md).
 - 2024.05.31: Supports the Mini-Internvl multimodal model. Use model_type `mini-internvl-chat-2b-v1_5` and `mini-internvl-chat-4b-v1_5` to train.
 - 2024.05.24: Supports the Phi3 multimodal model. Use model_type `phi3-vision-128k-instruct` to train.
 - 2024.05.22: Supports the DeepSeek-V2-Lite series models; model_type is `deepseek-v2-lite` and `deepseek-v2-lite-chat`
@@ -61,7 +62,7 @@ SWIFT has rich documentation for users; if you have any questions, please check [here](https:
 - 2024.05.11: Supports qlora training and quantized inference with [hqq](https://github.com/mobiusml/hqq) and [eetq](https://github.com/NetEase-FuXi/EETQ); see the [LLM Quantization Documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/LLM量化文档.md).
 - 2024.05.10: Supports sequence parallelism. Install with `pip install .[seq_parallel]`, then add `--sequence_parallel_size n` in a DDP environment to use it!
 - 2024.05.08: Supports the DeepSeek-V2-Chat model; for training refer to [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/deepseek-v2-chat/lora_ddp_ds3/sft.sh). Supports the InternVL-Chat-V1.5-Int8 model; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/internvl最佳实践.md).
-- 🔥2024.05.07: Supports **ORPO** training. Use `swift orpo` to get started; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/ORPO算法最佳实践.md).
+- 🔥2024.05.07: Supports **ORPO** training. Use `swift orpo` to start training; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/ORPO算法最佳实践.md).
 - 2024.05.07: Supports the Llava-Llama3 model from xtuner; model_type is `llava-llama-3-8b-v1_1`.
 - 2024.04.29: Supports inference and fine-tuning of InternVL-Chat-V1.5; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/internvl最佳实践.md).
 - 🔥2024.04.26: Supports **LISA** and **unsloth** training! Specify `--lisa_activated_layers=2` to enable LISA (reduces memory usage to 30% of full-parameter training), or specify `--tuner_backend unsloth` to train a very large model with less memory (30% or less) and faster speed (5x)!
@@ -235,7 +236,7 @@ swift web-ui
 | -------- |------------------------------------|
 | Pretraining | Text Generation |
 | Fine-tuning | Single-turn/Multi-turn<br>Agent Training/Self-cognition<br>Multi-modal Vision/Multi-modal Speech |
-| Human Alignment | DPO<br>ORPO |
+| Human Alignment | DPO<br>ORPO<br>SimPO |
 | Text-to-Image | DreamBooth, etc. |
 | Text-to-Video | - |

@@ -613,7 +614,6 @@ make docs
 | [LLM Evaluation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md) |
 | [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md) |
 | [LLM Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md) |
-| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
 | [AnimateDiff Training](https://github.com/modelscope/swift/blob/main/docs/source/AIGC/AnimateDiff%E5%BE%AE%E8%B0%83%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md) |

@@ -635,6 +635,9 @@ make docs
 | [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
 | [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
 | [NPU Inference and Fine-tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/NPU%E6%8E%A8%E7%90%86%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
+| [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/DPO%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
+| [ORPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/ORPO%E7%AE%97%E6%B3%95%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
+| [SimPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/SimPO%E7%AE%97%E6%B3%95%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |


 ### Deep Learning Tutorials

docs/resources/simpo1.png (510 KB)

docs/resources/simpo2.png (540 KB)

docs/resources/simpo3.png (694 KB)

docs/resources/simpo4.png (167 KB)

docs/source/LLM/ORPO算法最佳实践.md

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 # Best Practice for the ORPO Algorithm
 [ORPO](https://arxiv.org/abs/2403.07691) training requires the same data format as DPO: on top of the SFT data [query, response], it additionally requires a `rejected_response` representing the answer the model should not generate.

-The ORPO algorithm adds an odds ratio (OR) negative log-likelihood loss term to the SFT training loss to reduce the probability of generating the rejected response.
+The ORPO algorithm adds an odds ratio (OR) negative log-likelihood loss term to the SFT training loss to reduce the probability of generating the rejected response. Compared with DPO, it requires no reference model and therefore uses less training memory.
+
 The hyperparameter `beta` is the coefficient of the OR loss term; the larger beta is, the stronger the penalty on `rejected_response`. The default is 0.1.

 This best practice trains the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the ORPO algorithm so that it answers in Chinese.
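For reference, the odds ratio loss described in the hunk above can be written as follows. This is a sketch following the ORPO paper; the weighting coefficient is the `beta` hyperparameter mentioned above (the paper denotes it λ), and the notation is ours rather than taken from the swift implementation.

```math
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}, \qquad
\mathcal{L}_{\mathrm{OR}} = -\log \sigma\!\left(\log \frac{\mathrm{odds}_\theta(y_w \mid x)}{\mathrm{odds}_\theta(y_l \mid x)}\right), \qquad
\mathcal{L}_{\mathrm{ORPO}} = \mathcal{L}_{\mathrm{SFT}} + \beta \, \mathcal{L}_{\mathrm{OR}}
```

Here `y_w` is the preferred response and `y_l` the `rejected_response`.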
docs/source/LLM/SimPO算法最佳实践.md

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
# Best Practice for the SimPO Algorithm
[SimPO](https://arxiv.org/abs/2405.14734) training requires the same data format as DPO: on top of the SFT data [query, response], it additionally requires a `rejected_response` representing the answer the model should not generate.

SimPO length-normalizes the reward for each response, which replaces the reference-model log-probability term used in DPO, and adds a reward margin term to the preference objective to widen the reward gap between the two responses.

Compared with DPO, SimPO needs no reference model and therefore uses less training memory.

The hyperparameter `beta` acts as the reward coefficient, as in DPO; it is usually set to 2.0-2.5 (default 2.0). `gamma` is the reward margin, usually 0.5-1.5 (default 1.0).
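As a reference for these two hyperparameters, the SimPO objective can be sketched from the paper as follows (our notation, not taken from the swift implementation):

```math
r(x, y) = \frac{\beta}{|y|} \log \pi_\theta(y \mid x), \qquad
\mathcal{L}_{\mathrm{SimPO}} = -\log \sigma\bigl(r(x, y_w) - r(x, y_l) - \gamma\bigr)
```

Here `y_w` is the chosen `response`, `y_l` the `rejected_response`, and `|y|` the response length in tokens; the length normalization and the margin `gamma` are the two changes relative to DPO described above.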
This best practice trains the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model with the SimPO algorithm so that it answers in Chinese.

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Dataset](#dataset)
- [Training](#training)
- [Inference](#inference)

## Environment Preparation
```bash
# Set a global pip mirror (speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
# Install ms-swift
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'

# Align the environment (usually unnecessary; run the commands below only if you hit errors, since the repo is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```


## Dataset
The [Llama3-DPO-zh-en-emoji](https://modelscope.cn/datasets/shareAI/shareAI-Llama3-DPO-zh-en-emoji/summary) dataset is used; it contains Chinese queries with both Chinese and English answers.

Data sample:
```json
{"question": "一个完整的工程项目管理流程包括什么?",
"answer_zh": "🤔 Ahahah, 项目管理流程, 这个问题太常见了! 😂\n\n一个完整的工程项目管理流程包括以下几个阶段:\n\n1️⃣ **项目启动** 🚀:项目的开始,定义项目的目标、范围、时间表和预算。\n\n2️⃣ **项目计划** 📅:制定项目计划,包括任务分解、资源分配、进度安排和风险管理。\n\n3️⃣ **项目执行** 💪:按照计划执行项目,包括任务执行、进度控制和质量管理。\n\n4️⃣ **项目监控** 👀:对项目的进度、成本和质量进行监控和控制,确保项目不偏离轨道。\n\n5️⃣ **项目收尾** 🎉:项目完成后,进行项目总结、经验分享和文档编制。\n\n6️⃣ **项目评价** 🤔:对项目的成果和过程进行评价,总结经验和教训。\n\n这些阶段是一个完整的工程项目管理流程,但是,具体的项目可能会根据需要添加或删除某些阶段。 🤝\n\n还有一点,项目管理流程中需要注意的几个关键点是:\n\n* **沟通** 💬:项目团队之间、项目团队和客户之间的沟通是非常重要的。\n* **风险管理** ⚠️:项目中总会出现一些风险和问题,需要提前预测和准备。\n* **变更管理** 🔄:项目中可能会出现一些变更,需要及时处理和更新。\n\n总之,项目管理流程是一个复杂的过程,需要项目经理和团队成员的共同努力和合作。 💪",
"answer_en": "My fellow wizarding friend! 🧙‍♂️\n\nA complete project management process, you ask? Well, I shall conjure up the essential steps for thee! 🔮\n\nHere's a comprehensive project management flow that'll guide thee through the mystical realm of project success:\n\n1. **Initiation**: Define project scope, goals, and deliverables. Identify stakeholders, and determine the project's feasibility. ⚡️\n2. **Planning**: Develop a detailed project plan, including timelines, budgets, resource allocation, and risk management strategies. 📅\n3. **Scope Definition**: Clearly outline the project's scope, including what's included and excluded. This will help prevent scope creep, my friend! 🔒\n4. **Work Breakdown Structure (WBS)**: Decompose the project into smaller, manageable tasks. This will aid in task assignment, scheduling, and tracking progress. 🗂️\n5. **Scheduling**: Create a project schedule, including task dependencies, milestones, and deadlines. This will help thee stay on track and avoid procrastination! ⏰\n6. **Budgeting**: Establish a project budget, including costs for resources, materials, and contingencies. This will ensure thou dost not overspend, my wise friend! 💸\n7. **Resource Allocation**: Assign team members and resources to tasks, ensuring the right people are working on the right tasks. This will optimize productivity and efficiency! 👥\n8. **Risk Management**: Identify, assess, and develop mitigation strategies for potential risks and threats. This will safeguard thy project from unforeseen dangers! ⚠️\n9. **Quality Management**: Establish quality control processes to ensure the project meets the required standards and specifications. This will guarantee a top-notch outcome, my friend! ✨\n10. **Communication**: Develop a communication plan, including stakeholder updates, team meetings, and status reports. This will keep everyone informed and aligned! 📢\n11. **Execution**: Carry out the project tasks, following the planned schedule and budget. This is where the magic happens, my friend! ✨\n12. **Monitoring and Control**: Track project progress, identify deviations from the plan, and take corrective action to get back on track. This will ensure thy project stays on course! 📊\n13. **Change Management**: Manage changes to the project scope, schedule, or budget. This will minimize disruptions and ensure the project remains viable! 🔀\n14. **Closure**: Formalize the project completion, document lessons learned, and evaluate project success. This will bring a sense of accomplishment and closure, my friend! 🎉\n\nThere thou hast it, my fellow wizard! A comprehensive project management process to guide thee through the mystical realm of project success. May thy projects be prosperous and thy stakeholders be delighted! 😊"}
```

swift has built-in preprocessing that maps `answer_zh` to `response` and `answer_en` to `rejected_response`; simply pass `--dataset shareai-llama3-dpo-zh-en-emoji` as the training argument.
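To make that mapping concrete, here is a minimal sketch of the preprocessing step described above (illustrative code only; the function name and output keys are assumptions, not the actual swift preprocessor):

```python
# Illustrative sketch: map one raw dataset row to the DPO/SimPO-style triple,
# mirroring what `--dataset shareai-llama3-dpo-zh-en-emoji` is described to do above.
def to_preference_sample(row: dict) -> dict:
    return {
        "query": row["question"],                # user query
        "response": row["answer_zh"],            # preferred (Chinese) answer
        "rejected_response": row["answer_en"],   # rejected (English) answer
    }

if __name__ == "__main__":
    raw = {"question": "你好", "answer_zh": "你好!有什么可以帮你的吗?", "answer_en": "Hello! How can I help you?"}
    print(to_preference_sample(raw))
```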
## Training
```shell
# Experimental environment: A100
# DDP + MP
# Memory usage: 4*56G
CUDA_VISIBLE_DEVICES=0,1,2,3 \
NPROC_PER_NODE=2 \
swift simpo \
    --model_type llama3-8b-instruct \
    --sft_type full \
    --dataset shareai-llama3-dpo-zh-en-emoji \
    --gradient_checkpointing true \
    --learning_rate 2e-6
```

**Tips**:
- Testing showed that SimPO + LoRA performs poorly, so full-parameter fine-tuning is recommended.
- If you train a base model with data that contains history, you need to specify a template that supports multi-turn dialogue (base models often do not). For this case we default to the `chatml` template; you can also set `--model_type` to choose the template of the model being trained.
- We set `--gradient_checkpointing true` by default during training to **save GPU memory**; this slightly slows down training.
- If you are using an older GPU such as a **V100**, you need to set `--dtype AUTO` or `--dtype fp16`, because it does not support bf16.
- If your machine has high-end GPUs such as A100 and you are using a qwen-series model, we recommend installing [**flash-attn**](https://github.com/Dao-AILab/flash-attention), which speeds up training and inference and reduces memory usage (A10, 3090, V100, etc. do not support training with flash-attn). Models that support flash-attn are listed in [Supported Models and Datasets](支持的模型和数据集.md#模型).
- If you need to train offline, use `--model_id_or_path <model_dir>` and set `--check_model_is_latest false`. See [Command-line Arguments](命令行参数.md) for details.
- If you want to push the weights to the ModelScope Hub during training, set `--push_to_hub true`.
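For readers who want to see the objective in code, below is a minimal, self-contained sketch of the SimPO pairwise loss computed from per-sequence log-probabilities. It is an illustration based on the paper, with assumed tensor shapes; it is not the code path that `swift simpo` actually runs.

```python
import torch
import torch.nn.functional as F

def simpo_loss(
    chosen_logps: torch.Tensor,      # summed log-probs of chosen response tokens, shape (batch,)
    rejected_logps: torch.Tensor,    # summed log-probs of rejected response tokens, shape (batch,)
    chosen_lengths: torch.Tensor,    # token counts of chosen responses, shape (batch,)
    rejected_lengths: torch.Tensor,  # token counts of rejected responses, shape (batch,)
    beta: float = 2.0,               # reward coefficient described above
    gamma: float = 1.0,              # reward margin described above
) -> torch.Tensor:
    # Length-normalized implicit rewards; no reference model is involved.
    chosen_reward = beta * chosen_logps / chosen_lengths
    rejected_reward = beta * rejected_logps / rejected_lengths
    # Pairwise logistic loss with a margin of gamma.
    return -F.logsigmoid(chosen_reward - rejected_reward - gamma).mean()

# Example with a batch of two preference pairs:
loss = simpo_loss(
    chosen_logps=torch.tensor([-20.0, -15.0]),
    rejected_logps=torch.tensor([-40.0, -30.0]),
    chosen_lengths=torch.tensor([32.0, 24.0]),
    rejected_lengths=torch.tensor([48.0, 36.0]),
)
print(loss)
```

The log-probabilities come from a single forward pass of the policy model, which is where the memory saving relative to DPO comes from.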
## Inference
The inference below uses the `swift web-ui` command.

### Inference before training
> 你是谁 (Who are you?)

![orpo1](../../resources/orpo1.png)

> 西湖醋鱼怎么做 (How do you make West Lake vinegar fish?)

![orpo2](../../resources/orpo2.png)
![orpo3](../../resources/orpo3.png)
![orpo4](../../resources/orpo4.png)
![orpo5](../../resources/orpo5.png)


### Inference after training
> 你是谁 (Who are you?)

![simpo1](../../resources/simpo1.png)

> 西湖醋鱼怎么做 (How do you make West Lake vinegar fish?)

![simpo2](../../resources/simpo2.png)
![simpo3](../../resources/simpo3.png)
![simpo4](../../resources/simpo4.png)

docs/source/LLM/index.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@
 7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
 8. [LLM Experiment Documentation](LLM实验文档.md)
 9. [ORPO Best Practice](ORPO算法最佳实践.md)
+10. [SimPO Best Practice](SimPO算法最佳实践.md)

 ### ⭐️ Best Practice Series

docs/source_en/LLM/ORPO.md

Lines changed: 6 additions & 1 deletion
@@ -1,8 +1,13 @@
 # Best Practices for ORPO Algorithm
 The ORPO algorithm requires the same data format as DPO. Beyond SFT data [query, response], it additionally requires `rejected_response` to denote answers that the model should not generate.
+
 The ORPO algorithm incorporates an odds ratio (OR) negative log-likelihood loss term into the loss function used during SFT training, to reduce the probability of generating rejected responses.
+
+Compared to DPO, the ORPO algorithm does not require a reference model and therefore uses less training memory.
+
 Here, the hyperparameter beta represents the coefficient for the OR loss term. The larger the beta, the greater the penalty on rejected_response, with a default value of 0.1.
-This best practice session will train the llama3-8b-instruct model using the ORPO algorithm, enabling it to respond in Chinese.
+
+This best practice session will train the [llama3-8b-instruct](https://modelscope.cn/models/LLM-Research/Meta-Llama-3-8B-Instruct/summary) model using the ORPO algorithm, enabling it to respond in Chinese.

 Table of Contents
 - [Environment Preparation](#environment-preparation)
