Commit 01d56b9

Make GPU architecture detection optional for evoformer
1 parent 150465b commit 01d56b9

2 files changed (+17, -9 lines)

docs/_tutorials/ds4sci_evoformerattention.md

Lines changed: 2 additions & 1 deletion
@@ -27,7 +27,8 @@ export CUTLASS_PATH=/path/to/cutlass
 The kernels will be compiled when `DS4Sci_EvoformerAttention` is called for the first time.
 
 `DS4Sci_EvoformerAttention` requires GPUs with compute capability 7.0 or higher (NVIDIA V100 or later GPUs) and the minimum CUDA version is 11.3. It is recommended to use CUDA 11.7 or later for better performance. Besides, the performance of the backward kernel on V100 is not as good as on A100 for now.
-The extension checks both requirements and fails if any is not met. To disable the check, for example for cross-compiling in a system without GPUs, you can set the environment variable ```DS_IGNORE_CUDA_DETECTION=TRUE```.
+The extension checks both requirements and fails if any is not met. To disable the check, for example for cross-compiling in a system without GPUs, you can set the environment variable ```DS_IGNORE_CUDA_DETECTION=TRUE```
+and the environment variable ```DS_EVOFORMER_GPU_ARCH={70|75|80}```, which controls the target GPU (80 being the latest supported value and meaning NVIDIA Ampere and later).
 
 ### 3.2 Unit test and benchmark

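For a cross-compilation job, both variables described in the documentation change have to be present in the build environment. The sketch below only assembles such an environment in Python and does not invoke any DeepSpeed build step. `cross_compile_env` and `SUPPORTED_ARCHS` are hypothetical names, and the "Turing" label for `75` (compute capability 7.5) is our own annotation; the doc itself names only V100 and Ampere.

```python
import os
from typing import Dict, Mapping

# Values documented for DS_EVOFORMER_GPU_ARCH. The "Turing" label for 75
# (compute capability 7.5) is our own annotation; the doc names only V100 and Ampere.
SUPPORTED_ARCHS: Dict[str, str] = {
    "70": "Volta (NVIDIA V100)",
    "75": "Turing",
    "80": "Ampere and later",
}


def cross_compile_env(arch: str, base: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Hypothetical helper: copy `base` and add the two variables for a GPU-less build.

    It only assembles the environment; it does not start a build.
    """
    if arch not in SUPPORTED_ARCHS:
        raise ValueError(
            f"DS_EVOFORMER_GPU_ARCH should be one of {sorted(SUPPORTED_ARCHS)}, got {arch!r}"
        )
    env = dict(base)  # never mutate the caller's mapping (or os.environ)
    env["DS_IGNORE_CUDA_DETECTION"] = "TRUE"  # skip the compute-capability/CUDA checks
    env["DS_EVOFORMER_GPU_ARCH"] = arch  # forwarded to nvcc as -DGPU_ARCH=<arch>
    return env


if __name__ == "__main__":
    env = cross_compile_env("80", base={})
    print(env)  # {'DS_IGNORE_CUDA_DETECTION': 'TRUE', 'DS_EVOFORMER_GPU_ARCH': '80'}
```

The returned dict can be handed to `subprocess.run(..., env=env)` for whatever build command you use; the sketch deliberately leaves that command unspecified.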
op_builder/evoformer_attn.py

Lines changed: 15 additions & 8 deletions
@@ -16,6 +16,10 @@ def __init__(self, name=None):
         name = self.NAME if name is None else name
         super().__init__(name=name)
         self.cutlass_path = os.environ.get('CUTLASS_PATH')
+        # Target GPU architecture
+        # Current useful values: >70, >75, >80, see gemm_kernel_utils.h
+        # For modern GPUs, >80 is obviously the right value
+        self.gpu_arch = os.environ.get('DS_EVOFORMER_GPU_ARCH')
 
     def absolute_name(self):
         return f'deepspeed.ops.{self.NAME}_op'
@@ -32,14 +36,17 @@ def sources(self):
 
     def nvcc_args(self):
         args = super().nvcc_args()
-        try:
-            import torch
-        except ImportError:
-            self.warning("Please install torch if trying to pre-compile kernels")
-            return args
-        major = torch.cuda.get_device_properties(0).major #ignore-cuda
-        minor = torch.cuda.get_device_properties(0).minor #ignore-cuda
-        args.append(f"-DGPU_ARCH={major}{minor}")
+        if not self.gpu_arch:
+            try:
+                import torch
+            except ImportError:
+                self.warning("Please install torch if trying to pre-compile kernels")
+                return args
+            major = torch.cuda.get_device_properties(0).major #ignore-cuda
+            minor = torch.cuda.get_device_properties(0).minor #ignore-cuda
+            args.append(f"-DGPU_ARCH={major}{minor}")
+        else:
+            args.append(f"-DGPU_ARCH={self.gpu_arch}")
         return args
 
     def is_compatible(self, verbose=False):
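The selection logic added to `nvcc_args()` can be exercised without a GPU or PyTorch by pulling it into a standalone function. This is a hypothetical sketch, not DeepSpeed code: `gpu_arch_flags` and its `detect_capability` callback (a stand-in for `torch.cuda.get_device_properties(0)`) are illustrative names, and a missing detector plays the role of the `ImportError` branch.

```python
from typing import Callable, List, Mapping, Optional, Tuple


def gpu_arch_flags(
    env: Mapping[str, str],
    detect_capability: Optional[Callable[[], Tuple[int, int]]] = None,
) -> List[str]:
    """Return the -DGPU_ARCH flag the way the patched nvcc_args() chooses it.

    An explicit DS_EVOFORMER_GPU_ARCH wins; otherwise the (major, minor)
    compute capability from `detect_capability` is concatenated. With no
    detector, no flag is added, like the ImportError branch of the diff.
    """
    arch = env.get("DS_EVOFORMER_GPU_ARCH")
    if arch:  # empty string counts as unset, as with `if not self.gpu_arch`
        return [f"-DGPU_ARCH={arch}"]
    if detect_capability is None:
        return []
    major, minor = detect_capability()
    return [f"-DGPU_ARCH={major}{minor}"]


if __name__ == "__main__":
    print(gpu_arch_flags({"DS_EVOFORMER_GPU_ARCH": "80"}))  # ['-DGPU_ARCH=80']
    print(gpu_arch_flags({}, detect_capability=lambda: (8, 6)))  # ['-DGPU_ARCH=86']
    print(gpu_arch_flags({}))  # []
```

Two consequences of the patch are visible here: the explicit value is passed to nvcc unchecked, and autodetection yields the concatenated capability (for example `86` for compute capability 8.6) rather than one of the documented 70/75/80 values.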
