Commit 59a0b85

[bugfix] fix blackwell deepep installation (#22255)
1 parent 469b3ff commit 59a0b85

2 files changed: +12 -6 lines changed

tools/ep_kernels/README.md

Lines changed: 5 additions & 5 deletions
@@ -13,16 +13,16 @@ All scripts accept a positional argument as workspace path for staging the build
 
 ## Usage
 
-### Single-node
-
 ```bash
-bash install_python_libraries.sh
+# for hopper
+TORCH_CUDA_ARCH_LIST="9.0" bash install_python_libraries.sh
+# for blackwell
+TORCH_CUDA_ARCH_LIST="10.0" bash install_python_libraries.sh
 ```
 
-### Multi-node
+Additional step for multi-node deployment:
 
 ```bash
-bash install_python_libraries.sh
 sudo bash configure_system_drivers.sh
 sudo reboot # Reboot is required to load the new driver
 ```
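
The README change above leaves the choice of architecture to the caller. A minimal sketch of a wrapper that derives the value from a GPU family name follows; `arch_for_gpu` is a hypothetical helper, not part of the repository, and it covers only the two values named in the README.

```bash
#!/bin/sh
# Hypothetical helper (not in the repository): map a GPU family name to the
# TORCH_CUDA_ARCH_LIST value shown in the README diff.
arch_for_gpu() {
    case "$1" in
        hopper)    echo "9.0" ;;
        blackwell) echo "10.0" ;;
        *)
            # Any family the README does not mention is rejected.
            echo "unknown GPU family: $1" >&2
            return 1
            ;;
    esac
}

# Usage, assuming install_python_libraries.sh is in the current directory:
#   TORCH_CUDA_ARCH_LIST="$(arch_for_gpu blackwell)" bash install_python_libraries.sh
```

Rejecting unknown names rather than falling back to a default mirrors the commit's approach of failing early when the architecture is not specified.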

tools/ep_kernels/install_python_libraries.sh

Lines changed: 7 additions & 1 deletion
@@ -29,6 +29,12 @@ if [ -z "$CUDA_HOME" ]; then
     exit 1
 fi
 
+# assume TORCH_CUDA_ARCH_LIST is set correctly
+if [ -z "$TORCH_CUDA_ARCH_LIST" ]; then
+    echo "TORCH_CUDA_ARCH_LIST is not set, please set it to your desired architecture."
+    exit 1
+fi
+
 # disable all features except IBGDA
 export NVSHMEM_IBGDA_SUPPORT=1
 
@@ -95,7 +101,7 @@ clone_repo "https://github.com/ppl-ai/pplx-kernels" "pplx-kernels" "setup.py"
 cd pplx-kernels
 # see https://github.com/pypa/pip/issues/9955#issuecomment-838065925
 # PIP_NO_BUILD_ISOLATION=0 disables build isolation
-PIP_NO_BUILD_ISOLATION=0 TORCH_CUDA_ARCH_LIST=9.0a+PTX pip install -vvv -e .
+PIP_NO_BUILD_ISOLATION=0 pip install -vvv -e .
 popd
 
 # build and install deepep, require pytorch installed
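
The guard added in this commit only checks that `TORCH_CUDA_ARCH_LIST` is non-empty. A minimal sketch of a stricter check follows. It assumes entries are numeric versions such as `9.0`, optionally suffixed with `a` and/or `+PTX` (the form seen in the removed `9.0a+PTX` line) and separated by spaces or semicolons. `check_arch_list` is a hypothetical helper, not part of the script, and this sketch would reject named architectures.

```bash
#!/bin/sh
# Hypothetical helper (not in the script): validate a TORCH_CUDA_ARCH_LIST value.
# Assumption: each entry looks like MAJOR.MINOR with an optional "a" and/or "+PTX".
check_arch_list() {
    _value="${1:-}"
    if [ -z "$_value" ]; then
        echo "TORCH_CUDA_ARCH_LIST is not set" >&2
        return 1
    fi
    # Treat semicolons like spaces so both separators split into entries.
    for _entry in $(printf '%s' "$_value" | tr ';' ' '); do
        _base="${_entry%+PTX}"   # drop an optional +PTX suffix
        _base="${_base%a}"       # drop an optional trailing "a"
        case "$_base" in
            ''|*[!0-9.]*|*.*.*|.*|*.)
                echo "unrecognized architecture entry: $_entry" >&2
                return 1
                ;;
            *.*) ;;              # exactly one dot with digits on both sides
            *)
                echo "unrecognized architecture entry: $_entry" >&2
                return 1
                ;;
        esac
    done
}

# Usage: check_arch_list "${TORCH_CUDA_ARCH_LIST:-}" || exit 1
```

Passing `"${TORCH_CUDA_ARCH_LIST:-}"` keeps the call safe if the calling script enables `set -u`.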
