Commit df7a194

support npu & deepspeed (#743)
1 parent 410d952 commit df7a194

12 files changed: 283 additions and 25 deletions

README.md

Lines changed: 12 additions & 8 deletions
@@ -39,6 +39,7 @@ To facilitate use by users unfamiliar with deep learning, we provide a Gradio we
 Additionally, we are expanding capabilities for other modalities. Currently, we support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.19: Support single-card, DDP, ZeRO2, and ZeRO3 training and inference on NPU; see [NPU Inference and Fine-tuning Best Practices](docs/source_en/LLM/NPU-best-practice.md).
 - 2024.04.19: Support for inference, fine-tuning, and deployment of **Llama3** series models. This includes: Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, and Llama-3-70B-Instruct. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama3_8b_instruct/lora/sft.sh) to train.
 - 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Supported `--deepspeed zero3-offload`, with a default zero3-offload configuration file for ZeRO3 with CPU offload.
 - 2024.04.18: Supported compatibility with the HuggingFace ecosystem through the environment variable `USE_HF`, which switches to models and datasets from HF. Please refer to the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Compat-HF.md).
@@ -60,6 +61,8 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 - 🔥2024.03.29: Support the fine-tuning and inference of **Grok-1** 300B MoE, please view details [here](https://github.com/modelscope/swift/tree/main/docs/source_en/LLM/Grok-1-best-practice.md).
 - 🔥2024.03.25: Supports inference and fine-tuning of TeleChat-7b and TeleChat-12b model, use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Supports inference and fine-tuning for the **llava** series. For best practice, you can refer to [here](https://github.com/modelscope/swift/tree/main/docs/source_en/Multi-Modal/llava-best-practice.md).
+<details><summary>More</summary>
+
 - 🔥2024.03.12: Support inference and fine-tuning for **deepseek-vl** series. Best practices can be found [here](docs/source_en/Multi-Modal/deepseek-vl-best-practice.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507) for effectively reducing memory usage to 1/2 of the original in full-parameter training.
 - 🔥2024.03.10: [End-to-end best practices](docs/source_en/LLM/Qwen1.5-best-practice.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -69,8 +72,6 @@ Additionally, we are expanding capabilities for other modalities. Currently, we
 - 🔥2024.02.29: Support [LLaMA PRO](https://arxiv.org/pdf/2401.02415.pdf), simply use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/llamapro/sft.sh) to start training.
 - 🔥2024.02.29: Support [LoRA+](https://arxiv.org/pdf/2402.12354.pdf), simply use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/lorap/sft.sh) to start training.
 - 2024.02.25: Support `swift export` to quantize models using **AWQ/GPTQ** and push to ModelScope Hub. See documentation: [LLM Quantization](docs/source_en/LLM/LLM-quantization.md).
-<details><summary>More</summary>
-
 - 2024.02.22: Support gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
 - 2024.02.16: Support deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
 - 🔥2024.02.05: Support **Qwen1.5** series models, see [model list](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B) for all supported Qwen1.5 models. Provide fine-tuning scripts for [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat), [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8).
@@ -519,8 +520,9 @@ make docs
 | ------------------------------------------------------------ |
 | [Using Web-UI](docs/source_en/GetStarted/Web-ui.md) |
 | [Using Tuners](docs/source_en/GetStarted/Tuners.md) |
-| [LLM Fine-tuning](docs/source_en/LLM/LLM-fine-tuning.md) |
 | [LLM Inference](docs/source_en/LLM/LLM-inference.md) |
+| [LLM Fine-tuning](docs/source_en/LLM/LLM-fine-tuning.md) |
+| [LLM Evaluation](docs/source_en/LLM/LLM-eval.md) |
 | [LLM Quantization](docs/source_en/LLM/LLM-quantization.md) |
 | [LLM Deployment](docs/source_en/LLM/VLLM-inference-acceleration-and-deployment.md) |
 | [DPO Human Alignment Training](docs/source_en/LLM/RLHF.md) |
@@ -532,17 +534,19 @@ make docs
 | [Command Line Arguments](docs/source_en/LLM/Command-line-parameters.md) |
 | [Customizing New Models and Datasets](docs/source_en/LLM/Customization.md) |
 | [Supported Models and Datasets List](docs/source_en/LLM/Supported-models-datasets.md) |
-| [Runtime Speed and Memory Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) |
+| [Runtime Speed and Memory Benchmark](docs/source_en/LLM/Benchmark.md) |
 
 
 ### Best Practices
 
 | Best Practices Name |
 | ------------------------------------------------------------ |
-| [Agent Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Agent%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
-| [Self-Cognition Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E6%88%91%E8%AE%A4%E7%9F%A5%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
-| [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
-| [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
+| [Agent Fine-Tuning Best Practice](docs/source_en/LLM/Agent-best-practice.md) |
+| [Self-Cognition Fine-Tuning Best Practice](docs/source_en/LLM/Self-cognition-best-practice.md) |
+| [Qwen1.5 Best Practice](docs/source_en/LLM/Qwen1.5-best-practice.md) |
+| [Multi-Modal Model Training Best Practice](docs/source_en/Multi-Modal/index.md) |
+| [NPU Best Practice](docs/source_en/LLM/NPU-best-practice.md) |
+
 
 ### Deep Learning Tutorials

README_CN.md

Lines changed: 8 additions & 3 deletions
@@ -40,6 +40,7 @@ SWIFT支持近**200种LLM和MLLM**(多模态大模型)的训练、推理、
 In addition, we are expanding to other modalities; we currently support full-parameter training and LoRA training for AnimateDiff.
 
 ## 🎉 News
+- 2024.04.19: Support single-card, DDP, ZeRO2, and ZeRO3 training and inference on NPU; see [NPU Inference and Fine-tuning Best Practices](docs/source/LLM/NPU推理与微调最佳实践.md).
 - 2024.04.19: Support inference, fine-tuning, and deployment of the **Llama3** series models, including Llama-3-8B, Llama-3-8B-Instruct, Llama-3-70B, and Llama-3-70B-Instruct. Use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/llama3_8b_instruct/lora/sft.sh) to start training!
 - 2024.04.18: Supported models: wizardlm2-7b-awq, wizardlm2-8x22b, yi-6b-chat-awq, yi-6b-chat-int8, yi-34b-chat-awq, yi-34b-chat-int8. Support `--deepspeed zero3-offload`, with a default zero3-offload configuration file for using ZeRO3 with CPU offload.
 - 2024.04.18: Support compatibility with the HuggingFace ecosystem through the environment variable `USE_HF`, which switches to models and datasets from HF; see the [HuggingFace ecosystem compatibility documentation](https://github.com/modelscope/swift/tree/main/docs/source/LLM/HuggingFace生态兼容.md).
@@ -61,6 +62,8 @@ SWIFT支持近**200种LLM和MLLM**(多模态大模型)的训练、推理、
 - 🔥2024.03.29: Support inference and fine-tuning of the **Grok-1** 300B MoE model; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/LLM/Grok训练和推理.md).
 - 🔥2024.03.25: Support training and inference of the TeleChat-7b and TeleChat-12b models; use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/telechat_12b/lora/sft.sh) to start training!
 - 🔥2024.03.20: Support inference and fine-tuning of the **llava** series; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/llava最佳实践.md).
+<details><summary>More</summary>
+
 - 🔥2024.03.12: Support inference and fine-tuning of the **deepseek-vl** series; see the best practice [here](https://github.com/modelscope/swift/tree/main/docs/source/Multi-Modal/deepseek-vl最佳实践.md).
 - 🔥2024.03.11: Support [GaLore](https://arxiv.org/abs/2403.03507), which effectively reduces memory usage in full-parameter training to 1/2 of the original.
 - 🔥2024.03.10: [End-to-end best practices](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) from fine-tuning to deployment for Qwen1.5-7B-Chat and Qwen1.5-72B-Chat.
@@ -70,8 +73,6 @@ SWIFT支持近**200种LLM和MLLM**(多模态大模型)的训练、推理、
 - 🔥2024.02.29: Support [LLaMA PRO](https://arxiv.org/pdf/2401.02415.pdf); use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/llamapro/sft.sh) to start training.
 - 🔥2024.02.29: Support [LoRA+](https://arxiv.org/pdf/2402.12354.pdf); use [this script](https://github.com/modelscope/swift/blob/main/examples/pytorch/llm/scripts/yi_6b_chat/lorap/sft.sh) to start training.
 - 2024.02.25: Support `swift export` to quantize and export models with **AWQ/GPTQ** and push them to ModelScope Hub. See the [LLM quantization documentation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md).
-<details><summary>More</summary>
-
 - 2024.02.22: Support the gemma series: gemma-2b, [gemma-2b-instruct](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/gemma_2b_instruct), gemma-7b, gemma-7b-instruct.
 - 2024.02.16: Support the deepseek-math series: deepseek-math-7b, deepseek-math-7b-instruct, deepseek-math-7b-chat.
 - 🔥2024.02.05: Support the **Qwen1.5** series models; see the [model list](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md#%E6%A8%A1%E5%9E%8B) for all supported Qwen1.5 models. Fine-tuning scripts are provided for [qwen1half-7b-chat](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat) and [qwen1half-7b-chat-int8](https://github.com/modelscope/swift/tree/main/examples/pytorch/llm/scripts/qwen1half_7b_chat_int8).
@@ -518,8 +519,9 @@ make docs
 | ------------------------------------------------------------ |
 | [Using Web-UI](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md) |
 | [Using Tuners](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E4%BD%BF%E7%94%A8tuners.md) |
-| [LLM Fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md) |
 | [LLM Inference](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E6%8E%A8%E7%90%86%E6%96%87%E6%A1%A3.md) |
+| [LLM Fine-tuning](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E5%BE%AE%E8%B0%83%E6%96%87%E6%A1%A3.md) |
+| [LLM Evaluation](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E8%AF%84%E6%B5%8B%E6%96%87%E6%A1%A3.md) |
 | [LLM Quantization](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E9%87%8F%E5%8C%96%E6%96%87%E6%A1%A3.md) |
 | [LLM Deployment](https://github.com/modelscope/swift/blob/main/docs/source/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.md) |
 | [DPO Human Alignment Training](https://github.com/modelscope/swift/blob/main/docs/source/LLM/LLM%E4%BA%BA%E7%B1%BB%E5%AF%B9%E9%BD%90%E8%AE%AD%E7%BB%83%E6%96%87%E6%A1%A3.md) |
@@ -533,6 +535,7 @@ make docs
 | [Customizing New Models and Datasets](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E5%AE%9A%E4%B9%89%E4%B8%8E%E6%8B%93%E5%B1%95.md) |
 | [Supported Models and Datasets List](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E6%94%AF%E6%8C%81%E7%9A%84%E6%A8%A1%E5%9E%8B%E5%92%8C%E6%95%B0%E6%8D%AE%E9%9B%86.md) |
 | [Runtime Speed and Memory Benchmark](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Benchmark.md) |
+| [HuggingFace Ecosystem Compatibility](https://github.com/modelscope/swift/blob/main/docs/source/LLM/HuggingFace%E7%94%9F%E6%80%81%E5%85%BC%E5%AE%B9.md) |
 
 
 ### Best Practices
@@ -542,6 +545,8 @@ make docs
 | [Self-Cognition Fine-Tuning Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/%E8%87%AA%E6%88%91%E8%AE%A4%E7%9F%A5%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
 | [Qwen1.5 Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/LLM/Qwen1.5%E5%85%A8%E6%B5%81%E7%A8%8B%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
 | [Multi-Modal Model Training Best Practice](https://github.com/modelscope/swift/blob/main/docs/source/Multi-Modal/index.md) |
+| [NPU Inference and Fine-tuning Best Practices](https://github.com/modelscope/swift/blob/main/docs/source/LLM/NPU%E6%8E%A8%E7%90%86%E4%B8%8E%E5%BE%AE%E8%B0%83%E6%9C%80%E4%BD%B3%E5%AE%9E%E8%B7%B5.md) |
+
 
 ### Deep Learning Tutorials

docs/source/LLM/NPU推理与微调最佳实践.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# NPU Training Best Practice

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Fine-tuning](#fine-tuning)
- [Inference](#inference)


## Environment Preparation

Experimental environment: 8 * Ascend 910B3

```shell
pip install ms-swift -U
pip install torch-npu
```

Check that the environment is installed correctly:
```python
from transformers.utils import is_torch_npu_available
import torch

print(is_torch_npu_available())  # True
print(torch.npu.device_count())  # 8
```

## Fine-tuning
The following covers LoRA fine-tuning. For full-parameter fine-tuning, set `--sft_type full`.


### Single-Card Training

Start single-card fine-tuning with the following command:

```shell
# Experimental environment: Ascend 910B3
# NPU memory requirement: 25GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### Data-Parallel Training

```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 30GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```


### DeepSpeed Training

ZeRO2:
```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 28GB
# Runtime: 3 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2
```

ZeRO3:
```shell
# Experimental environment: 4 * Ascend 910B3
# NPU memory requirement: 4 * 25GB
# Runtime: 8 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3
```


## Inference

Original model:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```

After LoRA fine-tuning:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
```

docs/source/LLM/index.md

Lines changed: 7 additions & 2 deletions
@@ -5,6 +5,8 @@
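 1. [Self-Cognition Fine-Tuning Best Practice](自我认知微调最佳实践.md)
 2. [Agent Training with General Data Mixing Best Practice](Agent微调最佳实践.md)
 3. [Qwen1.5 End-to-End Best Practice](Qwen1.5全流程最佳实践.md)
+4. [NPU Inference and Fine-tuning Best Practices](NPU推理与微调最佳实践.md)
+5. [Grok-1 Training and Inference Best Practice](Grok训练和推理.md)
 
 
 ### 🍀Multi-Modal Best Practices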
@@ -17,8 +19,11 @@
 2. [LLM Fine-tuning Documentation](LLM微调文档.md)
 3. [DPO Training Documentation](LLM人类对齐训练文档.md)
 4. [Web-UI Training and Inference](https://github.com/modelscope/swift/blob/main/docs/source/GetStarted/%E7%95%8C%E9%9D%A2%E8%AE%AD%E7%BB%83%E6%8E%A8%E7%90%86.md)
-5. [LLM Quantization Documentation](LLM量化文档.md)
-6. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
+5. [LLM Evaluation Documentation](LLM评测文档.md)
+6. [LLM Quantization Documentation](LLM量化文档.md)
+7. [VLLM Inference Acceleration and Deployment](VLLM推理加速与部署.md)
+8. [LLM Experiment Documentation](LLM实验文档.md)
+
 
 ### 🐔Reference Documentation
 1. [Custom Models and Datasets](自定义与拓展.md)

docs/source_en/LLM/NPU-best-practice.md

Lines changed: 110 additions & 0 deletions
@@ -0,0 +1,110 @@
# NPU Best Practice

## Table of Contents
- [Environment Preparation](#environment-preparation)
- [Fine-tuning](#fine-tuning)
- [Inference](#inference)

## Environment Preparation

Experimental environment: 8 * Ascend 910B3

```shell
pip install ms-swift -U
pip install torch-npu
```

Verify that the environment is installed correctly:
```python
from transformers.utils import is_torch_npu_available
import torch

print(is_torch_npu_available())  # True
print(torch.npu.device_count())  # 8
```
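
The check above assumes `torch_npu` is already installed; on a machine without it, the second `print` will most likely fail rather than report zero devices. A more defensive variant could look like the sketch below. The helper name `npu_status` is hypothetical (it is not part of SWIFT or `torch_npu`), and the sketch assumes that importing `torch_npu` is what makes the `torch.npu` namespace available.

```python
import importlib.util


def npu_status() -> dict:
    """Summarize whether torch / torch_npu are importable and how many NPUs are visible."""
    status = {"torch": False, "torch_npu": False, "device_count": 0}
    # find_spec only looks the package up; it does not import it.
    if importlib.util.find_spec("torch") is None:
        return status
    status["torch"] = True
    if importlib.util.find_spec("torch_npu") is None:
        return status
    status["torch_npu"] = True
    try:
        import torch
        import torch_npu  # noqa: F401  (assumed to register torch.npu on import)

        if torch.npu.is_available():
            status["device_count"] = torch.npu.device_count()
    except Exception as exc:  # driver or runtime problems surface here
        status["error"] = repr(exc)
    return status


print(npu_status())
```

On the 8-card machine described above, `device_count` would be expected to match the `8` printed by the original check.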

## Fine-tuning
The following covers LoRA fine-tuning. For full-parameter fine-tuning, set `--sft_type full`.


### Single Card Training

Start single-card fine-tuning with the following command:

```shell
# Experimental Environment: Ascend 910B3
# NPU Memory Requirement: 25GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```
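
The single-card command is an environment variable plus a list of `--flag value` pairs, so it can also be assembled from a script. The following is a minimal sketch, not part of SWIFT: `build_sft_command` is a hypothetical helper, and it only uses the flags that appear in the example above.

```python
import os
import shlex


def build_sft_command(options: dict) -> list:
    """Turn {"model_type": "x"} into ["swift", "sft", "--model_type", "x", ...]."""
    cmd = ["swift", "sft"]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


# Options copied from the single-card example above.
options = {
    "model_type": "qwen1half-7b-chat",
    "dataset": "blossom-math-zh",
    "num_train_epochs": 5,
    "sft_type": "lora",
    "output_dir": "output",
}
cmd = build_sft_command(options)

# Restrict the job to NPU 0, as the shell example does.
env = dict(os.environ, ASCEND_RT_VISIBLE_DEVICES="0")

# Dry run: show the equivalent shell command instead of starting training.
print("ASCEND_RT_VISIBLE_DEVICES=0", shlex.join(cmd))

# To actually launch training, uncomment the next two lines:
# import subprocess
# subprocess.run(cmd, env=env, check=True)
```

Passing the command as a list avoids shell line continuations, so a stray trailing backslash cannot change what is executed.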


### Training with DDP

```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 30GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output
```
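
In the DDP example, `NPROC_PER_NODE=4` lines up with the four device IDs in `ASCEND_RT_VISIBLE_DEVICES`. The source does not state this as a rule, so the sketch below assumes that one worker process per visible device is intended and checks for a mismatch before launching. `check_launch_env` is a hypothetical helper, not a SWIFT function.

```python
def check_launch_env(env: dict) -> int:
    """Return NPROC_PER_NODE after checking it against ASCEND_RT_VISIBLE_DEVICES."""
    nproc = int(env.get("NPROC_PER_NODE", "1"))
    if nproc < 1:
        raise ValueError("NPROC_PER_NODE must be at least 1")
    # An unset or empty device list is treated as "no restriction".
    visible = [d.strip() for d in env.get("ASCEND_RT_VISIBLE_DEVICES", "").split(",") if d.strip()]
    if visible and nproc > len(visible):
        raise ValueError(
            f"NPROC_PER_NODE={nproc} but only {len(visible)} device(s) are visible: {visible}"
        )
    return nproc


# Values from the DDP example above.
print(check_launch_env({"NPROC_PER_NODE": "4", "ASCEND_RT_VISIBLE_DEVICES": "0,1,2,3"}))  # prints 4
```

In a launcher script the same function could be called with `os.environ` before invoking `swift sft`.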


### Training with DeepSpeed

ZeRO2:
```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 28GB
# Runtime: 3 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero2
```

ZeRO3:
```shell
# Experimental Environment: 4 * Ascend 910B3
# NPU Memory Requirement: 4 * 25GB
# Runtime: 8 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
    --model_type qwen1half-7b-chat \
    --dataset blossom-math-zh \
    --num_train_epochs 5 \
    --sft_type lora \
    --output_dir output \
    --deepspeed default-zero3
```
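
The memory and runtime figures quoted in the command comments are easier to compare once they are put on a common footing. The sketch below only does arithmetic on those quoted numbers (cards, memory per card, hours); it is not a benchmark, and the figures apply to this one setup (qwen1half-7b-chat, blossom-math-zh, 5 epochs, LoRA).

```python
# (name, number of cards, memory per card in GB, runtime in hours), as quoted in the comments above
runs = [
    ("single card", 1, 25, 8),
    ("DDP", 4, 30, 2),
    ("DeepSpeed ZeRO2", 4, 28, 3),
    ("DeepSpeed ZeRO3", 4, 25, 8),
]

summary = {
    name: {"total_memory_gb": cards * mem_gb, "card_hours": cards * hours}
    for name, cards, mem_gb, hours in runs
}

for name, row in summary.items():
    print(f"{name:<16} total memory {row['total_memory_gb']:>3} GB, {row['card_hours']:>2} card-hours")
```

On these figures, DDP uses the same 8 card-hours as the single-card run while finishing in a quarter of the wall-clock time. ZeRO2 costs 12 card-hours and ZeRO3 costs 32 card-hours, with ZeRO3 needing the least memory per card of the multi-card runs (25GB).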


## Inference

Original Model:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --model_type qwen1half-7b-chat
```

After LoRA Fine-tuning:
```shell
ASCEND_RT_VISIBLE_DEVICES=0 swift infer --ckpt_dir xxx/checkpoint-xxx --load_dataset_config true
```
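
The `xxx/checkpoint-xxx` path above is a placeholder that has to be filled in by hand. As a convenience, the latest checkpoint can be located with a few lines of Python. This is a sketch under an assumption: that checkpoints are saved in directories named `checkpoint-<step>` somewhere below the `--output_dir` used for training. The exact directory layout is not described in the source, and `latest_checkpoint` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional


def latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the highest step under output_dir, or None."""
    best_step, best_path = -1, None
    # Search recursively, since the nesting below output_dir is not known here.
    for path in Path(output_dir).rglob("checkpoint-*"):
        match = re.fullmatch(r"checkpoint-(\d+)", path.name)
        if match and path.is_dir() and int(match.group(1)) > best_step:
            best_step, best_path = int(match.group(1)), path
    return best_path
```

The returned path can then be passed to `swift infer` as the value of `--ckpt_dir`.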

docs/source_en/LLM/index.md

Lines changed: 7 additions & 2 deletions
@@ -5,6 +5,8 @@
 1. [Self Cognition Best Practice](Self-cognition-best-practice.md)
 2. [Agent Training and Inference Best Practice](Agent-best-practice.md)
 3. [Qwen1.5 Best Practice](Qwen1.5-best-practice.md)
+4. [NPU Best Practice](NPU-best-practice.md)
+5. [Grok-1 Training and Inference Best Practice](Grok-1-best-practice.md)
 
 
 ### 🍀Multi-Modal Best Practices!
@@ -18,8 +20,11 @@ Please check: [Multi-Modal Best Practices](../Multi-Modal/index.md)
 2. [LLM Finetuning](LLM-fine-tuning.md)
 3. [DPO Training](RLHF.md)
 4. [Web-ui Training and Inference](../GetStarted/Web-ui.md)
-5. [LLM quantization](LLM-quantization.md)
-6. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
+5. [LLM Evaluation](LLM-eval.md)
+6. [LLM Quantization](LLM-quantization.md)
+7. [VLLM Inference and Deployment](VLLM-inference-acceleration-and-deployment.md)
+8. [LLM Experimental](LLM-exp.md)
+
 
 ### 🐔References!
 1. [Customization for models and datasets](Customization.md)
