
starmountain1997 commented Aug 14, 2025

Environment Setup

Basic Environment Setup

Starting the Container

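Installing from the Docker image is recommended, although you can also install on bare metal.

First, pull the image that matches your system architecture: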

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-x86_64-gcc84 # x86_64

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann80RC2-ubuntu20-npu-base-aarch64-gcc84 # ARM (aarch64)
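If you script the setup, the architecture choice above can be automated. The following is a minimal Python sketch, assuming platform.machine() reports x86_64 or aarch64; mapping amd64 and arm64 onto those names is an additional assumption, and image_for_arch is a hypothetical helper, not part of any Paddle tooling.

```python
import platform
from typing import Optional

# Registry and tags copied from the docker pull commands above.
REGISTRY = "ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu"
TAGS = {
    "x86_64": "cann80RC2-ubuntu20-npu-base-x86_64-gcc84",
    "aarch64": "cann80RC2-ubuntu20-npu-base-aarch64-gcc84",
}
# Assumption: other names some systems report for the same architectures.
ALIASES = {"amd64": "x86_64", "arm64": "aarch64"}


def image_for_arch(machine: Optional[str] = None) -> str:
    """Return the full image reference for the given (or current) CPU architecture."""
    name = (machine or platform.machine()).lower()
    arch = ALIASES.get(name, name)
    if arch not in TAGS:
        raise ValueError(f"unsupported architecture: {name!r}")
    return f"{REGISTRY}:{TAGS[arch]}"
```

For example, print(f"docker pull {image_for_arch()}") prints the pull command for the current machine.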

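Start the container. Replace ${NAME} with a container name of your choice, /home/guozr with your own working directory, and e6acd904bbcf with the ID of the image you pulled: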

docker run -it --name ${NAME} -v /home/guozr:/home/guozr \
    --privileged --shm-size=128G -w=/home/guozr \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
    -v /usr/local/dcmi:/usr/local/dcmi \
    --net host \
    -e ASCEND_RT_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" \
    e6acd904bbcf /bin/bash

Installing a Newer CANN

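The CANN suite in the image is too old, so CANN Toolkit, CANN Kernels, and NNAL must be reinstalled at version 8.1.RC1 or later. The three packages must have matching versions; 8.2.RC1 is recommended. Choose the packages for the correct CPU architecture, and note that the CANN Kernels package is also specific to the NPU hardware model. After downloading, install them in the following order (toolkit.run, kernels.run, and nnal.run stand for the downloaded package files):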

chmod +x toolkit.run kernels.run nnal.run
yes | ./toolkit.run --install
yes | ./kernels.run --install
yes | ./nnal.run --install
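Since the three packages must share one version of at least 8.1.RC1, it can help to check the downloaded versions before installing. The sketch below is hypothetical: check_cann_set and parse_cann_version are illustrative names, it only understands the X.Y.RCn form used in this guide, and it takes the versions as plain strings (for example, read off the package file names) rather than querying an installed CANN.

```python
import re
from typing import Dict, Tuple

# Assumption: only the X.Y.RCn form (e.g. "8.2.RC1") is handled; other forms are rejected.
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.RC(\d+)$")

MIN_VERSION = "8.1.RC1"  # lower bound stated in this guide


def parse_cann_version(text: str) -> Tuple[int, int, int]:
    """Turn "8.2.RC1" into (8, 2, 1) so versions compare as tuples."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized CANN version: {text!r}")
    major, minor, rc = (int(group) for group in match.groups())
    return major, minor, rc


def check_cann_set(versions: Dict[str, str]) -> str:
    """Require toolkit, kernels and NNAL to share one version >= MIN_VERSION.

    Returns the common version, or raises ValueError describing the problem.
    """
    distinct = {v.strip() for v in versions.values()}
    if len(distinct) != 1:
        raise ValueError(f"mismatched CANN packages: {versions}")
    (version,) = distinct
    if parse_cann_version(version) < parse_cann_version(MIN_VERSION):
        raise ValueError(f"CANN {version} is older than {MIN_VERSION}")
    return version
```

For example, check_cann_set({"toolkit": "8.2.RC1", "kernels": "8.2.RC1", "nnal": "8.2.RC1"}) returns "8.2.RC1", while mixing 8.1.RC1 and 8.2.RC1 raises an error.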

Configuring Environment Variables

Before running, configure the following environment variables:

source /usr/local/Ascend/ascend-toolkit/set_env.sh
source /usr/local/Ascend/atb/set_env.sh
source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=1
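A missing script here usually means one of the packages above was not installed. The following hypothetical Python helper (plan_set_env is not part of CANN) checks which of the three scripts exist and builds the matching source lines; it only inspects the file system and cannot change the environment of the calling shell.

```python
import os
import shlex
from typing import List, Tuple

# The three commands from this guide; anything after the script path is passed to `source`.
SET_ENV_COMMANDS = [
    "/usr/local/Ascend/ascend-toolkit/set_env.sh",
    "/usr/local/Ascend/atb/set_env.sh",
    "/usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=1",
]


def plan_set_env(root: str = "/") -> Tuple[List[str], List[str]]:
    """Return (source lines for scripts that exist, paths of scripts that are missing).

    `root` lets the check run against another filesystem root, e.g. for testing.
    """
    present, missing = [], []
    for command in SET_ENV_COMMANDS:
        script, *args = shlex.split(command)
        path = os.path.join(root, script.lstrip("/"))
        if os.path.isfile(path):
            present.append(" ".join(shlex.quote(part) for part in ["source", path, *args]))
        else:
            missing.append(path)
    return present, missing
```

For example, calling present, missing = plan_set_env() and printing both lists shows what can be sourced and which package still needs installing.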

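By default, Paddle uses the naive_best_fit memory allocation strategy. You can optionally switch to auto_growth, which allocates host/device memory on demand as the actual data requires, at the cost of possible memory fragmentation. See the Paddle documentation for details.

Currently, for reasons not yet understood, device memory runs out unless the allocation strategy is set to auto_growth, so also set the following environment variable: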

export FLAGS_allocator_strategy=auto_growth
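When launching from a Python script instead of a shell, the same flag can be set through os.environ. This is a minimal sketch, assuming (as is usual for Paddle FLAGS_ variables) that the value is read when paddle is first imported, so it must run before import paddle; use_auto_growth is a hypothetical helper.

```python
import os
import sys
import warnings


def use_auto_growth() -> str:
    """Set FLAGS_allocator_strategy=auto_growth for this process and its children."""
    if "paddle" in sys.modules:
        # Assumption: the flag is read at import time, so setting it now may have no effect.
        warnings.warn("paddle is already imported; FLAGS_allocator_strategy may be ignored")
    os.environ["FLAGS_allocator_strategy"] = "auto_growth"
    return os.environ["FLAGS_allocator_strategy"]
```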

Python Environment Setup

Installing Paddle

Install with the following commands (newer versions of paddlepaddle conflict with paddleformers, so version 3.1 is recommended here):

# First install the PaddlePaddle CPU package
pip install paddlepaddle==3.1
# Then install the PaddlePaddle NPU plugin package
pip install paddle-custom-npu -i https://www.paddlepaddle.org.cn/packages/stable/npu

For details, see the Ascend NPU installation guide.

Installing Third-Party Libraries

Before compiling PaddleCustomDevice, install the third-party libraries spdlog and json:

# Install spdlog
git clone https://github.com/gabime/spdlog.git
cd spdlog
mkdir build && cd build
cmake ..
make -j$(nproc)
make install
# Return to the starting directory before cloning the next library
cd ../..

# Install json
git clone https://github.com/nlohmann/json.git
cd json
mkdir build && cd build
cmake ..
make -j$(nproc)
make install
cd ../..

Installing PaddleCustomDevice

git clone https://github.com/PaddlePaddle/PaddleCustomDevice.git
cd PaddleCustomDevice/backends/npu
bash tools/compile.sh

After compilation finishes, install the wheel with:

pip install build/dist/paddle_custom_npu-*.whl --force-reinstall

Manually Installing This PR

# Build and install the NPU backend from the PR branch
git clone https://github.com/llliiilil/PaddleCustomDevicetmp.git miPaddleCustomDevice
cd miPaddleCustomDevice/backends/npu
bash tools/compile.sh
pip install build/dist/paddle_custom_npu-*.whl --force-reinstall

# Build and install the custom Ascend C operators
source tools/set_env.sh
cd opp/ascendc_custom_ops/build/
bash build_ops.sh
cd custom_project/build_out/
./custom_opp*.run
# Make the installed custom operator library visible to the dynamic loader
export LD_LIBRARY_PATH=/usr/local/Ascend/ascend-toolkit/latest/opp/vendors/aie_ascendc/op_api/lib/:${LD_LIBRARY_PATH}

If you later get the error please make sure you registered your op first and try again, go back after the manual installation and force-reinstall the wheel built from the mainline PaddlePaddle/PaddleCustomDevice repository.

Installing PaddleNLP

Clone from source:

git clone https://github.com/PaddlePaddle/PaddleNLP.git

csrc/npu 目录下按照 README.md 安装:

python setup.py build bdist_wheel
pip install dist/paddlenlp_ops*.whl

Building FastDeploy

bash build.sh

You may see the following error at runtime:

ModuleNotFoundError: No module named 'distutils.dir_util'

To fix it, change from distutils.dir_util import copy_tree on line 22 of /usr/local/lib/python3.10/dist-packages/paddleformers/utils/pdc_sdk.py to:

from shutil import copytree as copy_tree
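The two functions are not identical: distutils' copy_tree copies into a destination directory that may already exist, while shutil.copytree raises FileExistsError in that case unless dirs_exist_ok=True is passed (Python 3.8+). If pdc_sdk.py copies into an existing directory, a fallback like the following sketch may be safer. It keeps distutils when it is importable and otherwise uses shutil with dirs_exist_ok=True; whether pdc_sdk.py relies on the return value is not checked here.

```python
import shutil
from functools import partial

try:
    # Original import. distutils was removed from the standard library in Python 3.12,
    # and some Debian/Ubuntu Pythons ship it as a separate package.
    from distutils.dir_util import copy_tree
except ImportError:
    # Merge into an existing destination like distutils does. Note the return values
    # differ: distutils returns the list of copied files, shutil the destination path.
    copy_tree = partial(shutil.copytree, dirs_exist_ok=True)
```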

Before running, add your FastDeploy directory to PYTHONPATH (adjust /work/FastDeploy to the location of your checkout):

export PYTHONPATH="/work/FastDeploy":${PYTHONPATH}
export LD_LIBRARY_PATH=/usr/local/Ascend/npt/lib:$LD_LIBRARY_PATH
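Both export lines prepend a directory to a colon-separated search path. If PYTHONPATH or LD_LIBRARY_PATH is unset, the pattern dir:${VAR} leaves a trailing empty entry, which the dynamic loader treats as the current directory. A small Python sketch for launcher scripts that avoids this and skips duplicates follows; prepend_env_path is a hypothetical helper, and changes to os.environ only take effect for child processes started afterwards, since the running interpreter does not re-read these variables.

```python
import os


def prepend_env_path(var: str, directory: str) -> str:
    """Put directory first in a path-list variable, dropping empty entries and duplicates."""
    entries = [p for p in os.environ.get(var, "").split(os.pathsep) if p]
    entries = [directory] + [p for p in entries if p != directory]
    os.environ[var] = os.pathsep.join(entries)
    return os.environ[var]
```

For example, prepend_env_path("PYTHONPATH", "/work/FastDeploy") before starting FastDeploy with subprocess.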

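If you encounter the error libgomp cannot allocate memory in static TLS block, preload the libgomp bundled with scikit-learn. Its file name contains a string that depends on your installation, so replace <suffix> below with the value from your actual file name:

export LD_PRELOAD=$LD_PRELOAD:/usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-<suffix>.so.1.0.0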
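Since the suffix varies between installations, it can be looked up rather than typed by hand. This is a minimal Python sketch, assuming the scikit_learn.libs directory shown above; find_libgomp and ld_preload_with are hypothetical helpers. It prints an export line for the shell, because LD_PRELOAD must be set before the process starts.

```python
import glob
import os
import re
from typing import Optional

# Directory taken from the export line above; adjust it to your Python installation.
SKLEARN_LIBS = "/usr/local/lib/python3.10/dist-packages/scikit_learn.libs"


def find_libgomp(libs_dir: str = SKLEARN_LIBS) -> Optional[str]:
    """Return the path of the bundled libgomp-<suffix>.so.1.0.0, or None if not found."""
    matches = sorted(glob.glob(os.path.join(libs_dir, "libgomp-*.so.1.0.0")))
    return matches[0] if matches else None


def ld_preload_with(library: str, current: str = "") -> str:
    """Append library to an LD_PRELOAD value (colon- or space-separated) without empty entries."""
    entries = [e for e in re.split(r"[:\s]+", current) if e]
    if library not in entries:
        entries.append(library)
    return ":".join(entries)


if __name__ == "__main__":
    lib = find_libgomp()
    if lib is None:
        print(f"no libgomp found in {SKLEARN_LIBS}")
    else:
        print(f'export LD_PRELOAD="{ld_preload_with(lib, os.environ.get("LD_PRELOAD", ""))}"')
```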

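If you run into circular-import errors and are not running multimodal models, you can temporarily uninstall opencv. Also note that numpy 2.0 is currently poorly supported, so as the final step, install numpy 1.26.4: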

pip uninstall opencv-python
pip install numpy==1.26.4
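Because other packages installed along the way may replace pinned versions, it can be worth confirming at the end that the pins from this guide still hold. The following is a hypothetical Python check built on importlib.metadata; matches_pin is a simple component-wise comparison that treats 3.1 and 3.1.0 as equal, not full PEP 440 matching.

```python
from importlib import metadata
from typing import Dict, List, Optional

# Pins from this guide: paddlepaddle 3.1 (newer versions conflict with
# paddleformers) and numpy 1.26.4 (numpy 2.x is poorly supported).
PINS = {"paddlepaddle": "3.1", "numpy": "1.26.4"}


def matches_pin(installed: str, pin: str) -> bool:
    """Compare dotted versions component-wise, padding the shorter one with zeros."""
    a, b = installed.split("."), pin.split(".")
    width = max(len(a), len(b))
    return a + ["0"] * (width - len(a)) == b + ["0"] * (width - len(b))


def check_pins(pins: Optional[Dict[str, str]] = None) -> List[str]:
    """Return human-readable problems; an empty list means every pin holds."""
    problems = []
    for dist, pin in (PINS if pins is None else pins).items():
        try:
            installed = metadata.version(dist)
        except metadata.PackageNotFoundError:
            problems.append(f"{dist} is not installed (expected {pin})")
            continue
        if not matches_pin(installed, pin):
            problems.append(f"{dist} {installed} is installed, expected {pin}")
    return problems


if __name__ == "__main__":
    for problem in check_pins():
        print(problem)
```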

If you encounter:

  File "/home/guozr/CODE/FastDeploy/fastdeploy/utils.py", line 443, in get_host_ip
    ip = socket.gethostbyname(socket.gethostname())
socket.gaierror: [Errno -2] Name or service not known

First, look up the hostname:

hostname

Then add the hostname returned above to /etc/hosts, for example (replace hostname-mbqbc.foreman.pxe with your own hostname):

127.0.0.1   hostname-mbqbc.foreman.pxe localhost
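The failing call in the traceback is socket.gethostbyname(socket.gethostname()). The following minimal Python sketch reproduces that lookup and builds a hosts line in the format of the example above; hostname_resolves and hosts_entry are hypothetical helpers, and editing /etc/hosts itself is left to you, since it requires root.

```python
import socket


def hostname_resolves(hostname: str) -> bool:
    """Return True if the same lookup FastDeploy's get_host_ip performs would succeed."""
    try:
        socket.gethostbyname(hostname)
        return True
    except socket.gaierror:
        return False


def hosts_entry(hostname: str, ip: str = "127.0.0.1") -> str:
    """Build an /etc/hosts line mapping hostname (plus localhost) to ip."""
    return f"{ip}   {hostname} localhost"
```

For example, with name = socket.gethostname(), print hosts_entry(name) when hostname_resolves(name) is False.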

starmountain1997 changed the title from 【wip】npu support to [NPU] ERNIE 4.5 support on Aug 14, 2025

starmountain1997 marked this pull request as ready for review August 21, 2025 11:22
starmountain1997 force-pushed the npu branch 3 times, most recently from a48c978 to 59ad44e, August 22, 2025 03:43
starmountain1997 force-pushed the npu branch 2 times, most recently from 9c1eb8e to 9d5d11a, August 26, 2025 02:30