Commit 8f6b089

Jokeren authored and liuyunqi20 committed
[PROTON] Emit an error for the roctracer backend if HIP_VISIBLE_DEVICES is set (#4986)
Based on the feedback from AMD, the device mapping problem has to be addressed by the ROCm team, so we emit an error for now.
1 parent d0dfe8e commit 8f6b089

2 files changed, +17 -0 lines changed

third_party/proton/README.md

Lines changed: 4 additions & 0 deletions
@@ -209,3 +209,7 @@ If you encounter permission related problems when using instruction sampling, yo
 
 The overhead of instruction sampling on NVIDIA GPUs is about 20x using Proton because we haven't enabled continuous sampling yet.
 Continuous sampling can allow for more runtime optimizations, but it makes it more challenging to attribute performance data back to the GPU kernels because: (1) it enables profiling of concurrent kernels, (2) it doesn't allow profiling of time and instruction samples simultaneously, and (3) it works best if we have a separate thread dedicated to attributing instruction samples to the GPU kernels
+
+- Visible devices on AMD GPUs
+
+Environment variables such as `HIP_VISIBLE_DEVICES`, and `CUDA_VISIBLE_DEVICES` are not supported on AMD GPUs. Once it's set, we cannot find a valid mapping between the device ID returned by RocTracer and the physical device ID. Instead, `ROCR_VISIBLE_DEVICES` is recommended to be used.
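
The README note above recommends selecting devices through `ROCR_VISIBLE_DEVICES`. As a minimal sketch of what that could look like when launching a profiled script, the helper below builds a child-process environment that drops the two unsupported variables and sets `ROCR_VISIBLE_DEVICES` instead. The name `rocr_env` and the script name in the usage comment are hypothetical and not part of Proton or this commit.

```python
import os
from typing import Dict, Mapping, Optional

# Variables the README note lists as unsupported with Proton on AMD GPUs.
UNSUPPORTED_VARS = ("HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES")


def rocr_env(devices: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) that selects GPUs
    only through ROCR_VISIBLE_DEVICES.

    Hypothetical helper for illustration; `devices` is a comma-separated
    device list such as "0,1".
    """
    env = dict(os.environ if base is None else base)
    for var in UNSUPPORTED_VARS:
        env.pop(var, None)  # remove the variable if present, ignore otherwise
    env["ROCR_VISIBLE_DEVICES"] = devices
    return env


# Possible usage (assumes a script named profile_me.py exists):
#   import subprocess, sys
#   subprocess.run([sys.executable, "profile_me.py"], env=rocr_env("0,1"))
```

The helper works on a copy, so the parent process environment is left untouched.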

third_party/proton/proton/profile.py

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,6 @@
 import functools
 import triton
+import os
 
 from triton._C.libproton import proton as libproton
 from .hook import register_triton_hook, unregister_triton_hook
@@ -19,6 +20,16 @@ def _select_backend() -> str:
         raise ValueError("No backend is available for the current target.")
 
 
+def _check_env(backend: str) -> None:
+    if backend == "roctracer":
+        hip_device_envs = ["HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES"]
+        for env in hip_device_envs:
+            if os.getenv(env, None) is not None:
+                raise ValueError(
+                    f"Proton does not work when the environment variable {env} is set on AMD GPUs. Please unset it and use `ROCR_VISIBLE_DEVICES` instead"
+                )
+
+
 def start(
     name: Optional[str] = None,
     *,
@@ -66,6 +77,8 @@ def start(
     if backend is None:
         backend = _select_backend()
 
+    _check_env(backend)
+
     set_profiling_on()
     if hook and hook == "triton":
         register_triton_hook()
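
To see the behavior of the new check without a GPU or a Triton install, here is a standalone sketch that mirrors the logic of `_check_env`. It is not the code from this commit: the function name `check_env` and the `environ` parameter are assumptions added so that the check can be exercised against a plain dictionary.

```python
import os
from typing import Mapping, Optional


def check_env(backend: str, environ: Optional[Mapping[str, str]] = None) -> None:
    """Raise ValueError if an unsupported visibility variable is set while
    the roctracer backend is selected.

    `environ` is a hypothetical injection point for testing; it defaults
    to the real process environment.
    """
    environ = os.environ if environ is None else environ
    if backend != "roctracer":
        return  # other backends are not affected by this check
    for var in ("HIP_VISIBLE_DEVICES", "CUDA_VISIBLE_DEVICES"):
        # Matches `os.getenv(var, None) is not None`: any value counts as
        # "set", including the empty string.
        if var in environ:
            raise ValueError(
                f"Proton does not work when the environment variable {var} "
                "is set on AMD GPUs. Please unset it and use "
                "`ROCR_VISIBLE_DEVICES` instead"
            )


# Passes silently: a non-roctracer backend, or only ROCR_VISIBLE_DEVICES set.
check_env("cupti", {"CUDA_VISIBLE_DEVICES": "0"})
check_env("roctracer", {"ROCR_VISIBLE_DEVICES": "0"})
```

Because the check only tests whether the variable is present, exporting it with an empty value still triggers the error; the variable has to be unset.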
