
Commit 1dc2d65

support minicpm and mixtral-moe model (#692)
1 parent 8727cd3 commit 1dc2d65

File tree

9 files changed: +164 additions, -10 deletions


README.md

Lines changed: 4 additions & 2 deletions
@@ -39,6 +39,8 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.13: Support fine-tuning and inference of the Mixtral-8x22B-v0.1 model. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
+- 2024.04.13: Support the newly launched **MiniCPM** series: MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B, and MiniCPM-1B. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh) to start training!
 - 🔥2024.04.11: Support model evaluation on the MMLU/ARC/CEval datasets (as well as user-defined eval datasets) with one command! Check [this documentation](docs/source_en/LLM/LLM-eval.md) for details. Meanwhile, we support a handy trick for managing multiple ablation experiments; check [this documentation](docs/source_en/LLM/LLM-exp.md) to use it.
 - 🔥2024.04.11: Support the **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai-command-r-plus/lora_mp/sft.sh) to train.
 - 2024.04.10: Use SWIFT to fine-tune the qwen-7b-chat model to enhance its function call capabilities, and combine it with [Modelscope-Agent](https://github.com/modelscope/modelscope-agent) for best practices, which can be found [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Agent-best-practice.md#Usage-with-Modelscope_Agent).
@@ -383,13 +385,13 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
 | XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
-| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B | base model<br>instruct model<br>MoE model |
+| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-22B | base model<br>instruct model<br>MoE model |
 | YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
 | InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-67B | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
 | Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
-| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model |
+| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
 | OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-67B | base model<br>chat model |
 | Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
 | BlueLM | [VIVO BlueLM large model](https://github.com/vivo-ai-lab/BlueLM) | Chinese<br>English | 7B | base model<br>chat model |
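
For readers who want to try the new Mixtral support right away, a minimal launch sketch follows. Only the script path comes from the news entry above; the checkout-and-install steps are our assumption about a standard SWIFT setup, not part of this commit:

```bash
# Minimal sketch, assuming a fresh checkout of modelscope/swift.
# Only the script path below is taken from this commit; the setup steps are illustrative.
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'  # assumed install step; see the repository README for the canonical instructions
bash examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh
```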

README_CN.md

Lines changed: 4 additions & 2 deletions
@@ -40,6 +40,8 @@ SWIFT supports the training, inference, … of nearly **200 LLMs and MLLMs** (multimodal large models)
 In addition, we are also expanding capabilities for other modalities; currently we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.13: Support inference and fine-tuning of the Mixtral-8x22B-v0.1 model. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/mixtral_moe_8x22b_v1/lora_ddp_ds/sft.sh) to start training!
+- 2024.04.13: Support the newly launched **MiniCPM** series: MiniCPM-V-2.0, MiniCPM-2B-128k, MiniCPM-MoE-8x2B, and MiniCPM-1B. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh) to start training!
 - 🔥2024.04.11: Support one-command model evaluation! The first batch of datasets includes MMLU, CEval, and ARC, and user-defined datasets are also supported; see [this document](docs/source/LLM/LLM评测文档.md) for details. We also support a handy trick for managing multiple ablation experiments; see [this document](docs/source/LLM/LLM实验文档.md) to use it.
 - 🔥2024.04.11: Support the **c4ai-command-r** series: c4ai-command-r-plus, c4ai-command-r-v01. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/c4ai-command-r-plus/lora_mp/sft.sh) to start training!
 - 2024.04.10: Use SWIFT to fine-tune the qwen-7b-chat model to strengthen its function call capability, combined with [Modelscope-Agent](https://github.com/modelscope/modelscope-agent); the best practice can be found [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Agent微调最佳实践.md#搭配Modelscope-Agent使用).
@@ -382,13 +384,13 @@ CUDA_VISIBLE_DEVICES=0 swift deploy \
 | Yuan2 | [Langchao Yuan series models](https://github.com/IEIT-Yuan) | Chinese<br>English | 2B-102B | instruct model |
 | XVerse | [XVerse series models](https://github.com/xverse-ai) | Chinese<br>English | 7B-65B | base model<br>chat model<br>long text model<br>MoE model |
 | LLaMA2 | [LLaMA2 series models](https://github.com/facebookresearch/llama) | English | 7B-70B<br>including quantized versions | base model<br>chat model |
-| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B | base model<br>instruct model<br>MoE model |
+| Mistral<br>Mixtral | [Mistral series models](https://github.com/mistralai/mistral-src) | English | 7B-8x22B | base model<br>instruct model<br>MoE model |
 | YI | [01AI's YI series models](https://github.com/01-ai) | Chinese<br>English | 6B-34B | base model<br>chat model<br>long text model |
 | InternLM<br>InternLM2<br>InternLM2-Math | [Pujiang AI Lab InternLM series models](https://github.com/InternLM/InternLM) | Chinese<br>English | 1.8B-20B | base model<br>chat model<br>math model |
 | DeepSeek<br>DeepSeek-MoE<br>DeepSeek-Coder<br>DeepSeek-Math | [DeepSeek series models](https://github.com/deepseek-ai) | Chinese<br>English | 1.3B-67B | base model<br>chat model<br>MoE model<br>code model<br>math model |
 | MAMBA | [MAMBA temporal convolution model](https://github.com/state-spaces/mamba) | English | 130M-2.8B | base model |
 | Gemma | [Google Gemma series models](https://github.com/google/gemma_pytorch) | English | 2B-7B | base model<br>instruct model |
-| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model |
+| MiniCPM | [OpenBmB MiniCPM series models](https://github.com/OpenBMB/MiniCPM) | Chinese<br>English | 2B-3B | chat model<br>MoE model |
 | OpenBuddy | [OpenBuddy series models](https://github.com/OpenBuddy/OpenBuddy) | Chinese<br>English | 7B-67B | base model<br>chat model |
 | Orion | [OrionStar AI series models](https://github.com/OrionStarAI) | Chinese<br>English | 14B | base model<br>chat model |
 | BlueLM | [VIVO BlueLM large model](https://github.com/vivo-ai-lab/BlueLM) | Chinese<br>English | 7B | base model<br>chat model |

docs/source/LLM/支持的模型和数据集.md

Lines changed: 7 additions & 2 deletions
@@ -136,9 +136,13 @@
 |gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.38|-|
 |gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|
 |gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|
-|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2718;||-|
-|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2718;||-|
+|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|
+|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|
+|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|
+|minicpm-2b-128k|[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k/summary)|q_proj, k_proj, v_proj|chatml|&#x2714;|&#x2714;|transformers>=4.36.0|-|
+|minicpm-moe-8x2b|[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|
 |minicpm-v-3b-chat|[OpenBMB/MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;||-|
+|minicpm-v-v2|[OpenBMB/MiniCPM-V-2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2.0/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;||-|
 |openbuddy-llama2-13b-chat|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
 |openbuddy-llama-65b-chat|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
 |openbuddy-llama2-70b-chat|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
@@ -153,6 +157,7 @@
 |mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|-|
 |mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.36|-|
 |mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2718;|transformers>=4.38, aqlm, torch>=2.2.0|-|
+|mixtral-moe-8x22b-v1|[AI-ModelScope/Mixtral-8x22B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x22B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|-|
 |baichuan-7b|[baichuan-inc/baichuan-7B](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|
 |baichuan-13b|[baichuan-inc/Baichuan-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|
 |baichuan-13b-chat|[baichuan-inc/Baichuan-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;|transformers<4.34|-|
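
Note that besides adding new rows, the hunk above flips the vllm column for the existing minicpm-2b-sft-chat and minicpm-2b-chat entries from &#x2718; to &#x2714;. A hedged sketch of exercising one of the newly added model types: the model_type value and the transformers>=4.36.0 requirement come from the table, while the exact CLI invocation is our assumption based on SWIFT's `swift infer` interface.

```bash
# Hedged sketch: interactive inference with a model type added in this commit.
# minicpm-2b-128k and transformers>=4.36.0 come from the table above; the invocation is assumed.
pip install 'transformers>=4.36.0'
CUDA_VISIBLE_DEVICES=0 swift infer --model_type minicpm-2b-128k
```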

docs/source_en/LLM/Supported-models-datasets.md

Lines changed: 7 additions & 2 deletions
@@ -136,9 +136,13 @@ The table below introduces all models supported by SWIFT:
 |gemma-7b|[AI-ModelScope/gemma-7b](https://modelscope.cn/models/AI-ModelScope/gemma-7b/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.38|-|
 |gemma-2b-instruct|[AI-ModelScope/gemma-2b-it](https://modelscope.cn/models/AI-ModelScope/gemma-2b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|
 |gemma-7b-instruct|[AI-ModelScope/gemma-7b-it](https://modelscope.cn/models/AI-ModelScope/gemma-7b-it/summary)|q_proj, k_proj, v_proj|gemma|&#x2714;|&#x2714;|transformers>=4.38|-|
-|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2718;||-|
-|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2718;||-|
+|minicpm-1b-sft-chat|[OpenBMB/MiniCPM-1B-sft-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|
+|minicpm-2b-sft-chat|[OpenBMB/MiniCPM-2B-sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|
+|minicpm-2b-chat|[OpenBMB/MiniCPM-2B-dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;||-|
+|minicpm-2b-128k|[OpenBMB/MiniCPM-2B-128k](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-128k/summary)|q_proj, k_proj, v_proj|chatml|&#x2714;|&#x2714;|transformers>=4.36.0|-|
+|minicpm-moe-8x2b|[OpenBMB/MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B/summary)|q_proj, k_proj, v_proj|minicpm|&#x2714;|&#x2714;|transformers>=4.36.0|-|
 |minicpm-v-3b-chat|[OpenBMB/MiniCPM-V](https://modelscope.cn/models/OpenBMB/MiniCPM-V/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;||-|
+|minicpm-v-v2|[OpenBMB/MiniCPM-V-2.0](https://modelscope.cn/models/OpenBMB/MiniCPM-V-2.0/summary)|q_proj, k_proj, v_proj|minicpm-v|&#x2714;|&#x2718;||-|
 |openbuddy-llama2-13b-chat|[OpenBuddy/openbuddy-llama2-13b-v8.1-fp16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-13b-v8.1-fp16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
 |openbuddy-llama-65b-chat|[OpenBuddy/openbuddy-llama-65b-v8-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama-65b-v8-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
 |openbuddy-llama2-70b-chat|[OpenBuddy/openbuddy-llama2-70b-v10.1-bf16](https://modelscope.cn/models/OpenBuddy/openbuddy-llama2-70b-v10.1-bf16/summary)|q_proj, k_proj, v_proj|openbuddy|&#x2714;|&#x2714;||-|
@@ -153,6 +157,7 @@ The table below introduces all models supported by SWIFT:
 |mixtral-moe-7b|[AI-ModelScope/Mixtral-8x7B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|-|
 |mixtral-moe-7b-instruct|[AI-ModelScope/Mixtral-8x7B-Instruct-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7B-Instruct-v0.1/summary)|q_proj, k_proj, v_proj|llama|&#x2714;|&#x2714;|transformers>=4.36|-|
 |mixtral-moe-7b-aqlm-2bit-1x16|[AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x7b-AQLM-2Bit-1x16-hf/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2718;|transformers>=4.38, aqlm, torch>=2.2.0|-|
+|mixtral-moe-8x22b-v1|[AI-ModelScope/Mixtral-8x22B-v0.1](https://modelscope.cn/models/AI-ModelScope/Mixtral-8x22B-v0.1/summary)|q_proj, k_proj, v_proj|default-generation-bos|&#x2714;|&#x2714;|transformers>=4.36|-|
 |baichuan-7b|[baichuan-inc/baichuan-7B](https://modelscope.cn/models/baichuan-inc/baichuan-7B/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|
 |baichuan-13b|[baichuan-inc/Baichuan-13B-Base](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Base/summary)|W_pack|default-generation|&#x2718;|&#x2714;|transformers<4.34|-|
 |baichuan-13b-chat|[baichuan-inc/Baichuan-13B-Chat](https://modelscope.cn/models/baichuan-inc/Baichuan-13B-Chat/summary)|W_pack|baichuan|&#x2718;|&#x2714;|transformers<4.34|-|
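
Since this file's hunk context shows `CUDA_VISIBLE_DEVICES=0 swift deploy \`, and the vllm column for the minicpm-2b chat models turns &#x2714; in this commit, a vLLM-backed deployment of those models becomes possible. A hedged sketch; the exact flags are our assumption, not part of the commit:

```bash
# Hedged sketch: deploy a minicpm-2b model with the vLLM backend newly enabled in the table above.
# `swift deploy` appears in this README's own examples; the flags below are assumed.
pip install vllm 'transformers>=4.36.0'
CUDA_VISIBLE_DEVICES=0 swift deploy \
    --model_type minicpm-2b-sft-chat \
    --infer_backend vllm
```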
New script (filename not shown in this view; given the checkpoint path below, it is presumably the inference companion of the MiniCPM-MoE training recipe, likely examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/infer.sh)

Lines changed: 13 additions & 0 deletions

@@ -0,0 +1,13 @@
+# Experimental environment: A100
+# 30GB GPU memory
+
+CUDA_VISIBLE_DEVICES=0 \
+swift infer \
+    --ckpt_dir "output/minicpm-moe-8x2b/vx-xxx/checkpoint-xxx" \
+    --load_dataset_config true \
+    --max_new_tokens 2048 \
+    --temperature 0.1 \
+    --top_p 0.7 \
+    --repetition_penalty 1. \
+    --do_sample true \
+    --merge_lora false \
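
A usage note on this script: the `vx-xxx/checkpoint-xxx` segment of `--ckpt_dir` is a placeholder for the run directory and checkpoint that `swift sft` writes under `output/minicpm-moe-8x2b/`. A hedged sketch of locating the real path (the directory names below are illustrative, not from the commit):

```bash
# Hedged sketch; run and checkpoint names are illustrative.
ls output/minicpm-moe-8x2b/                      # e.g. v0-20240413-123456
ls output/minicpm-moe-8x2b/v0-20240413-123456/   # e.g. checkpoint-87
# Substitute the real path into --ckpt_dir before running the inference script above.
```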
New script (filename not shown in this view; this is evidently the MiniCPM-MoE training recipe linked from the README above, examples/pytorch/llm/scripts/minicpm_moe_8x2b/lora_ddp/sft.sh)

Lines changed: 34 additions & 0 deletions

@@ -0,0 +1,34 @@
+# Experimental environment: 4 * A100
+# 4 * 41GB GPU memory
+nproc_per_node=4
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
+NPROC_PER_NODE=$nproc_per_node \
+MASTER_PORT=29500 \
+swift sft \
+    --model_id_or_path OpenBMB/MiniCPM-MoE-8x2B \
+    --model_revision master \
+    --sft_type lora \
+    --tuner_backend peft \
+    --dtype bf16 \
+    --output_dir output \
+    --dataset blossom-math-zh \
+    --train_dataset_sample -1 \
+    --num_train_epochs 1 \
+    --max_length 2048 \
+    --check_dataset_strategy warning \
+    --lora_rank 8 \
+    --lora_alpha 32 \
+    --lora_dropout_p 0.05 \
+    --lora_target_modules DEFAULT \
+    --gradient_checkpointing false \
+    --batch_size 1 \
+    --weight_decay 0.1 \
+    --learning_rate 1e-4 \
+    --gradient_accumulation_steps $(expr 16 / $nproc_per_node) \
+    --max_grad_norm 0.5 \
+    --warmup_ratio 0.03 \
+    --eval_steps 100 \
+    --save_steps 100 \
+    --save_total_limit 2 \
+    --logging_steps 10 \
+    --use_flash_attn false \
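
One detail worth spelling out: `--gradient_accumulation_steps $(expr 16 / $nproc_per_node)` pins the effective global batch size to 16 regardless of GPU count. Worked through with this script's values:

```bash
# Worked example with this script's values (the arithmetic is ours, not part of the commit):
# accumulation steps = 16 / nproc_per_node = 16 / 4 = 4
# global batch size  = nproc_per_node * batch_size * accumulation = 4 * 1 * 4 = 16
echo $(expr 16 / 4)   # prints 4
```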
