
Commit b89e931

addsubmuldiv authored and Jintao-Huang committed

update npu document

1 parent: 76a7d69

File tree: 2 files changed (+55, -25 lines)


docs/source/BestPractices/NPU-support.md

Lines changed: 27 additions & 11 deletions
@@ -22,16 +22,24 @@
 ## Environment Preparation
 
 Experiment environment: 8 * Ascend 910B3 64G
-
+### Environment Installation
 ```shell
 # Create a new conda virtual environment (optional)
 conda create -n swift-npu python=3.10 -y
 conda activate swift-npu
 
+# Note: source the CANN environment before running any of the following steps
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+
 # Set the global pip mirror (optional, speeds up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 pip install ms-swift -U
 
+# Install from source
+git clone https://github.com/modelscope/ms-swift.git
+cd ms-swift
+pip install -e .
+
 # Install torch-npu
 pip install torch-npu decorator
 # If you want to use deepspeed (reduces memory usage; training speed will drop somewhat)
@@ -43,8 +51,20 @@ pip install evalscope[opencompass]
 # If you need vllm-ascend for inference, install the following packages
 pip install vllm==0.11.0
 pip install vllm-ascend==0.11.0rc3
+```
+
+Check that the environment is installed correctly and that the NPU can be loaded:
+```python
+from transformers.utils import is_torch_npu_available
+import torch
 
-# If you need to use MindSpeed (Megatron-LM), follow the guide below to install the necessary dependencies
+print(is_torch_npu_available())  # True
+print(torch.npu.device_count())  # 8
+print(torch.randn(10, device='npu:0'))
+```
+
+**If you need to use MindSpeed (Megatron-LM), follow the guide below to install the necessary dependencies**
+```shell
 # 1. Clone Megatron-LM and switch to the core_v0.12.1 branch
 git clone https://github.com/NVIDIA/Megatron-LM.git
 cd Megatron-LM
@@ -63,17 +83,13 @@ export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
 export MEGATRON_LM_PATH=<your_local_megatron_lm_path>
 ```
 
-Check that the environment is installed correctly and that the NPU can be loaded:
-
-```python
-from transformers.utils import is_torch_npu_available
-import torch
-
-print(is_torch_npu_available())  # True
-print(torch.npu.device_count())  # 8
-print(torch.randn(10, device='npu:0'))
+Run the following command to verify that MindSpeed (Megatron-LM) is configured correctly:
+```shell
+python -c "import mindspeed.megatron_adaptor; from swift.megatron.init import init_megatron_env; init_megatron_env(); print('✓ Megatron-SWIFT configuration in the NPU environment verified successfully!')"
 ```
 
+### Checking the Environment
+
 Check the NPU P2P connections; here each NPU is interconnected with the other NPUs via 7 HCCS links
 
 ```shell
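The verification snippet added above assumes torch-npu is installed and NPUs are present; on a machine without them it raises. As a hedged sketch, not part of ms-swift or transformers (the helper name here is illustrative), device selection can instead fall back gracefully:

```python
def resolve_device() -> str:
    """Return 'npu:0', 'cuda:0', or 'cpu' depending on what is available."""
    try:
        import torch
    except ImportError:
        # torch not installed at all: nothing but the CPU is usable
        return "cpu"
    # torch-npu patches torch with a `torch.npu` namespace when installed
    npu = getattr(torch, "npu", None)
    if npu is not None and npu.is_available():
        return "npu:0"
    if torch.cuda.is_available():
        return "cuda:0"
    return "cpu"

print(resolve_device())
```

The same string can then be passed wherever the document uses `device='npu:0'`, so scripts run unchanged on NPU, GPU, or CPU hosts.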

docs/source_en/BestPractices/NPU-support.md

Lines changed: 28 additions & 14 deletions
@@ -20,16 +20,24 @@ For detailed environment setup, please refer to the [Ascend PyTorch installation
 ## Environment Preparation
 
 Experiment Environment: 8 * Ascend 910B3 64G
-
+### Environment Installation
 ```shell
 # Create a new conda virtual environment (optional)
 conda create -n swift-npu python=3.10 -y
 conda activate swift-npu
 
+# Note: source the CANN environment before running any of the following steps
+source /usr/local/Ascend/ascend-toolkit/set_env.sh
+
 # Set pip global mirror (optional, to speed up downloads)
 pip config set global.index-url https://mirrors.aliyun.com/pypi/simple/
 pip install ms-swift -U
 
+# Install from source
+git clone https://github.com/modelscope/ms-swift.git
+cd ms-swift
+pip install -e .
+
 # Install torch-npu
 pip install torch-npu decorator
 # If you want to use deepspeed (reduces memory usage; training speed may decrease)
@@ -41,8 +49,20 @@ pip install evalscope[opencompass]
 # If you need to use vllm-ascend for inference, install the following packages
 pip install vllm==0.11.0
 pip install vllm-ascend==0.11.0rc3
+```
+
+Check that the environment is installed correctly and that the NPU can be loaded:
+```python
+from transformers.utils import is_torch_npu_available
+import torch
 
-# If you need to use MindSpeed (Megatron-LM), please install the following packages
+print(is_torch_npu_available())  # True
+print(torch.npu.device_count())  # 8
+print(torch.randn(10, device='npu:0'))
+```
+
+**If you need to use MindSpeed (Megatron-LM), follow the guide below to install the necessary dependencies**
+```shell
 # 1. Clone Megatron-LM and switch to core_v0.12.1
 git clone https://github.com/NVIDIA/Megatron-LM.git
 cd Megatron-LM
@@ -60,17 +80,11 @@ cd ..
 export PYTHONPATH=$PYTHONPATH:<your_local_megatron_lm_path>
 export MEGATRON_LM_PATH=<your_local_megatron_lm_path>
 ```
-
-Check that the environment is installed correctly and that the NPU can be loaded:
-```python
-from transformers.utils import is_torch_npu_available
-import torch
-
-print(is_torch_npu_available())  # True
-print(torch.npu.device_count())  # 8
-print(torch.randn(10, device='npu:0'))
+Run the following command to verify that MindSpeed (Megatron-LM) is configured correctly:
+```shell
+python -c "import mindspeed.megatron_adaptor; from swift.megatron.init import init_megatron_env; init_megatron_env(); print('✓ Megatron-SWIFT configuration in the NPU environment verified successfully!')"
 ```
-
+### Checking the Environment
 Check the P2P connections of the NPU; here each NPU is interconnected with the other NPUs through 7 HCCS links.
 ```shell
 (valle) root@valle:~/src# npu-smi info -t topo
@@ -95,7 +109,7 @@ Legend:
 NA = Unknown relationship.
 ```
 
-Check the status of the NPU. Detailed information about the `npu-smi` command can be found in the [official documentation](https://support.huawei.com/enterprise/zh/doc/EDOC1100079287/10dcd668).
+Check the status of the NPU. For detailed information about the `npu-smi` command, please refer to the [official documentation](https://support.huawei.com/enterprise/en/doc/EDOC1100079287/10dcd668).
 ```shell
 (valle) root@valle:~/src# npu-smi info
 +------------------------------------------------------------------------------------------------+
@@ -345,6 +359,6 @@ ASCEND_RT_VISIBLE_DEVICES=0 swift deploy \
 | Using sglang as inference engine |
 
 
-## NPU Wechat Group
+## NPU WeChat Group
 
 <img src="https://raw.githubusercontent.com/modelscope/ms-swift/main/docs/resources/wechat/npu.png" width="250">
