
Commit dfff141

Update NPU推理与微调最佳实践.md ("NPU inference and fine-tuning best practices") (#754)
1 parent fb1f693 commit dfff141

1 file changed, +105 -11 lines changed


docs/source/LLM/NPU推理与微调最佳实践.md

@@ -8,33 +8,127 @@

## Environment Setup

Experimental environment: 8 * Ascend 910B3 (64 GB of memory each)

```shell
# Create a new conda virtual environment (optional)
conda create -n npu python=3.10.12 -y
conda activate npu
# Set a global pip mirror (optional, speeds up downloads)
pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/

# Install ms-swift (installing from source is currently recommended; after the next release you can install it directly with pip)
git clone https://github.com/modelscope/swift.git
cd swift
pip install -e '.[llm]'
# Install torch-npu
pip install torch-npu
# If you want to use DeepSpeed (reduces memory usage, but training becomes somewhat slower)
pip install deepspeed -U
# datasets==2.19.0 is not backward compatible, so pin version 2.18.0
pip install datasets==2.18.0
# Install a missing dependency
pip install decorator

# Align the environment (optional and usually unnecessary; if you run into errors, run the commands below, since the repository is tested against the latest environment)
pip install -r requirements/framework.txt -U
pip install -r requirements/llm.txt -U
```
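After installation, you can optionally confirm from Python that the packages installed above are present and that `datasets` is pinned to 2.18.0. This is a minimal sketch using the standard-library `importlib.metadata`; it assumes the distribution names `ms-swift`, `torch-npu`, `datasets` and `decorator`, and the helper names `installed_version` and `check_environment` are hypothetical. DeepSpeed is left out because it is optional.

```python
from importlib.metadata import PackageNotFoundError, version

# Packages installed in the steps above (distribution names are assumptions).
# A version string means "must match exactly"; None means "only check it is installed".
EXPECTED = {
    "ms-swift": None,
    "torch-npu": None,
    "datasets": "2.18.0",  # 2.19.0 is reported as not backward compatible
    "decorator": None,
}


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_environment(expected):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for name, pinned in expected.items():
        found = installed_version(name)
        if found is None:
            problems.append(f"{name}: not installed")
        elif pinned is not None and found != pinned:
            problems.append(f"{name}: found {found}, expected {pinned}")
    return problems


if __name__ == "__main__":
    for line in check_environment(EXPECTED) or ["all checks passed"]:
        print(line)
```

Run it inside the `npu` environment; each printed problem points back to the install step that should be repeated.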

Check that the environment is installed correctly and that the NPUs can be loaded:
```python
from transformers.utils import is_torch_npu_available
import torch
import torch_npu  # registers the 'npu' device type with PyTorch

# Create a small tensor on NPU 0 to confirm the device can actually be used
torch.randn((10,), device='npu:0')
# Make NPU 0 the current device
torch.npu.set_device(0)

print(is_torch_npu_available())  # True
print(torch.npu.device_count())  # 8
```

Check the peer-to-peer (P2P) connectivity between the NPUs. Here, each NPU is connected to the other NPUs through 7 HCCS links:
```shell
(valle) root@valle:~/src# npu-smi info -t topo
        NPU0    NPU1    NPU2    NPU3    NPU4    NPU5    NPU6    NPU7    CPU Affinity
NPU0    X       HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    144-167
NPU1    HCCS    X       HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    144-167
NPU2    HCCS    HCCS    X       HCCS    HCCS    HCCS    HCCS    HCCS    96-119
NPU3    HCCS    HCCS    HCCS    X       HCCS    HCCS    HCCS    HCCS    96-119
NPU4    HCCS    HCCS    HCCS    HCCS    X       HCCS    HCCS    HCCS    0-23
NPU5    HCCS    HCCS    HCCS    HCCS    HCCS    X       HCCS    HCCS    0-23
NPU6    HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    X       HCCS    48-71
NPU7    HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    HCCS    X       48-71

Legend:

X = Self
SYS = Path traversing PCIe and NUMA nodes. Nodes are connected through SMP, such as QPI, UPI.
PHB = Path traversing PCIe and the PCIe host bridge of a CPU.
PIX = Path traversing a single PCIe switch
PXB = Path traversing multipul PCIe switches
HCCS = Connection traversing HCCS.
NA = Unknown relationship.

```
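To check the full HCCS mesh programmatically (for example on a new machine), the topology matrix can be parsed from the command output. This is a minimal sketch, assuming whitespace-separated columns as shown above and a header row that ends in `CPU Affinity`; the function names are hypothetical and not part of any Ascend tool.

```python
from collections import defaultdict


def parse_topo(text):
    """Parse the matrix printed by `npu-smi info -t topo`.

    Assumes a header row of NPU names ending in "CPU Affinity", followed by one
    row per NPU. Parsing stops at the first row not starting with "NPU"
    (e.g. the "Legend:" line). Returns (npus, links, affinity), where
    links[(a, b)] is the connection type between NPU a and NPU b.
    """
    rows = [line.split() for line in text.strip().splitlines() if line.strip()]
    header_idx = next((i for i, r in enumerate(rows) if "Affinity" in r), None)
    if header_idx is None:
        raise ValueError("no 'CPU Affinity' header row found")
    npus = [name for name in rows[header_idx] if name.startswith("NPU")]
    links, affinity = {}, {}
    for row in rows[header_idx + 1:]:
        if not row[0].startswith("NPU"):
            break  # reached the legend
        name, cells = row[0], row[1:]
        for other, cell in zip(npus, cells):
            links[(name, other)] = cell
        affinity[name] = cells[len(npus)] if len(cells) > len(npus) else None
    return npus, links, affinity


def is_full_mesh(npus, links, kind="HCCS"):
    """True if every pair of distinct NPUs is connected by `kind`."""
    return all(links.get((a, b)) == kind for a in npus for b in npus if a != b)


def group_by_affinity(affinity):
    """Map each CPU-core range to the NPUs that share it."""
    groups = defaultdict(list)
    for npu, cores in affinity.items():
        groups[cores].append(npu)
    return dict(groups)
```

Fed the output above, `is_full_mesh` should return `True`, and `group_by_affinity` should yield four groups of two NPUs each (`144-167`, `96-119`, `0-23`, `48-71`).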

Check the NPU status (see the [npu-smi command reference](https://support.huawei.com/enterprise/zh/doc/EDOC1100079287/10dcd668)):
```shell
(valle) root@valle:~/src# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.rc1.b030            Version: 24.1.rc1.b030                                        |
+---------------------------+---------------+----------------------------------------------------+
| NPU   Name                | Health        | Power(W)    Temp(C)           Hugepages-Usage(page)|
| Chip                      | Bus-Id        | AICore(%)   Memory-Usage(MB)  HBM-Usage(MB)        |
+===========================+===============+====================================================+
| 0     910B3               | OK            | 101.8       43                0 / 0                |
| 0                         | 0000:C1:00.0  | 0           0 / 0             3318 / 65536         |
+===========================+===============+====================================================+
| 1     910B3               | OK            | 92.0        39                0 / 0                |
| 0                         | 0000:C2:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 2     910B3               | OK            | 102.0       40                0 / 0                |
| 0                         | 0000:81:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 3     910B3               | OK            | 99.8        40                0 / 0                |
| 0                         | 0000:82:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 4     910B3               | OK            | 98.6        45                0 / 0                |
| 0                         | 0000:01:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 5     910B3               | OK            | 99.7        44                0 / 0                |
| 0                         | 0000:02:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 6     910B3               | OK            | 103.8       45                0 / 0                |
| 0                         | 0000:41:00.0  | 0           0 / 0             3314 / 65536         |
+===========================+===============+====================================================+
| 7     910B3               | OK            | 98.2        44                0 / 0                |
| 0                         | 0000:42:00.0  | 0           0 / 0             3315 / 65536         |
+===========================+===============+====================================================+
```
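Before launching a job it can help to confirm that every NPU is healthy and mostly idle. The sketch below extracts per-NPU health and HBM usage from `npu-smi info` output; it assumes the two-rows-per-NPU table layout shown above, with the last `used / total` pair on the chip row being HBM usage in MB. The function and pattern names are hypothetical.

```python
import re

# Row with NPU id, chip name and health, e.g. "| 0  910B3 | OK | ..."
NPU_ROW = re.compile(r"^\|\s*(\d+)\s+(\w+)\s*\|\s*(\w+)\s*\|")
# Row with chip id and PCIe bus id, e.g. "| 0 | 0000:C1:00.0 | ... |"
CHIP_ROW = re.compile(
    r"^\|\s*(\d+)\s*\|\s*([0-9A-Fa-f]{4}:[0-9A-Fa-f]{2}:[0-9A-Fa-f]{2}\.\d)\s*\|(.*)\|\s*$"
)
# A "used / total" pair such as "3318 / 65536"
USAGE = re.compile(r"(\d+)\s*/\s*(\d+)")


def parse_npu_smi(text):
    """Return one dict per NPU with its health, bus id and HBM usage in MB."""
    devices, current = [], None
    for line in text.splitlines():
        line = line.strip()
        m = NPU_ROW.match(line)
        if m:
            current = {"npu": int(m.group(1)), "name": m.group(2), "health": m.group(3)}
            continue
        m = CHIP_ROW.match(line)
        if m and current is not None:
            pairs = USAGE.findall(m.group(3))
            if not pairs:
                continue  # unexpected layout; skip this row
            used, total = (int(x) for x in pairs[-1])
            current.update(bus_id=m.group(2), hbm_used_mb=used, hbm_total_mb=total)
            devices.append(current)
            current = None
    return devices
```

Applied to the output above, all eight NPUs report `OK` with 3314-3318 MB of 65536 MB HBM in use, so the cards are essentially idle before training.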

## Fine-tuning

The following describes LoRA fine-tuning. For full-parameter fine-tuning, simply set `--sft_type full`.

| Model size | Number of NPUs | DeepSpeed type | Peak memory usage (NPUs * GB per NPU) |
|------|-------|-------------|-----------|
| 7B | 1 | None | 1 * 28 GB |
| 7B | 4 | None | 4 * 22 GB |
| 7B | 4 | zero2 | 4 * 28 GB |
| 7B | 4 | zero3 | 4 * 22 GB |
| 7B | 8 | None | 8 * 22 GB |
| 14B | 1 | None | 1 * 45 GB |
| 14B | 8 | None | 8 * 51 GB |
| 14B | 8 | zero2 | 8 * 49 GB |
| 14B | 8 | zero3 | 8 * 31 GB |
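
Each entry in the last column reads as the number of NPUs times the peak memory on each NPU, so `4 * 22 GB` means 22 GB on each of 4 NPUs (88 GB in total). As a hypothetical helper (not part of ms-swift), the measured rows can be encoded to look up which configurations fit a given per-NPU memory budget:

```python
# (model size, number of NPUs, DeepSpeed type, peak memory per NPU in GB),
# copied from the table above.
MEASURED = [
    ("7B", 1, None, 28),
    ("7B", 4, None, 22),
    ("7B", 4, "zero2", 28),
    ("7B", 4, "zero3", 22),
    ("7B", 8, None, 22),
    ("14B", 1, None, 45),
    ("14B", 8, None, 51),
    ("14B", 8, "zero2", 49),
    ("14B", 8, "zero3", 31),
]


def configs_that_fit(model_size, budget_gb, table=MEASURED):
    """Measured rows for `model_size` whose per-NPU peak fits within `budget_gb`,
    sorted by NPU count, then by per-NPU memory (ties keep table order)."""
    rows = [r for r in table if r[0] == model_size and r[3] <= budget_gb]
    return sorted(rows, key=lambda r: (r[1], r[3]))


def total_memory_gb(row):
    """Aggregate peak memory across all NPUs, e.g. 4 NPUs * 22 GB = 88 GB."""
    _, npus, _, per_npu = row
    return npus * per_npu
```

With the 64 GB per NPU of the 910B3, every measured row fits; with a tighter 40 GB budget, only the 8-NPU ZeRO3 configuration remains for the 14B model.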

### Single-NPU Training

Start single-NPU fine-tuning with the following command:

```shell
# Environment: Ascend 910B3
# Memory requirement: 28 GB
# Runtime: 8 hours
ASCEND_RT_VISIBLE_DEVICES=0 \
swift sft \
@@ -46,11 +140,11 @@ swift sft \
```
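The launch commands in this document differ mainly in their environment variables: `ASCEND_RT_VISIBLE_DEVICES` selects the NPUs, and the multi-NPU runs add `NPROC_PER_NODE` equal to the number of selected NPUs. If you script experiments from Python, a hypothetical helper like the one below can build those variables. The remaining `swift sft` arguments are not shown here, so they are left to the caller as `extra_args`; the example only uses `--sft_type full`, which is mentioned earlier.

```python
import shlex


def launch_spec(devices, extra_args=()):
    """Build environment overrides and argv for a `swift sft` launch.

    Mirrors the commands in this document: ASCEND_RT_VISIBLE_DEVICES lists the
    NPUs, and NPROC_PER_NODE (set to the number of NPUs) is added only for
    multi-NPU runs. `extra_args` stands in for the model/dataset arguments.
    """
    devices = list(devices)
    if not devices or len(set(devices)) != len(devices):
        raise ValueError("devices must be a non-empty list of distinct NPU ids")
    env = {}
    if len(devices) > 1:
        env["NPROC_PER_NODE"] = str(len(devices))
    env["ASCEND_RT_VISIBLE_DEVICES"] = ",".join(str(d) for d in devices)
    return env, ["swift", "sft", *extra_args]


def as_shell(devices, extra_args=()):
    """Render the launch as a single shell command line."""
    env, argv = launch_spec(devices, extra_args)
    assignments = " ".join(f"{key}={value}" for key, value in env.items())
    return f"{assignments} {shlex.join(argv)}"
```

For example, `as_shell([0, 1, 2, 3])` yields `NPROC_PER_NODE=4 ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 swift sft`. To actually start a run, the pair could be passed to `subprocess.run(argv, env={**os.environ, **env}, check=True)` once the model and dataset arguments are added.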

### Data-Parallel Training (4-NPU DDP, qwen1.5-7B-Chat)

```shell
# Environment: 4 * Ascend 910B3
# Memory requirement: 4 * 22 GB
# Runtime: 2 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
@@ -69,7 +163,7 @@ ZeRO2:
```shell
# Environment: 4 * Ascend 910B3
# Memory requirement: 4 * 28 GB
# Runtime: 3.5 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
@@ -84,8 +178,8 @@ swift sft \
ZeRO3:
```shell
# Environment: 4 * Ascend 910B3
# Memory requirement: 4 * 22 GB
# Runtime: 8.5 hours
NPROC_PER_NODE=4 \
ASCEND_RT_VISIBLE_DEVICES=0,1,2,3 \
swift sft \
