
Commit cede64c

[AMD] Enable ASan tests on gfx942 (#8819)
Enables ASan (AddressSanitizer) and the ASan tests on AMD gfx942.
1 parent 1d8e147 commit cede64c

2 files changed: 13 additions, 3 deletions

.github/workflows/integration-tests-amd.yml

Lines changed: 8 additions & 3 deletions
@@ -23,7 +23,7 @@ jobs:
 options: >-
 --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --user root
 --volume /home/runner/.triton:/github/home/.triton
-- image: rocm/pytorch:rocm7.0_ubuntu22.04_py3.10_pytorch_release_2.8.0
+- image: rocm/pytorch-private:rocm7.0_ubuntu22.04_py3.10_pytorch_2.8.0_asan
 runner: ["amd-gfx942"]
 # We add --env-file to pull in HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES definition for GPU isolation.
 options: >-
@@ -47,6 +47,7 @@ jobs:
 PROTON_SKIP_PC_SAMPLING_TEST: 1
 PYTHON: "python3"
 CCACHE_COMPRESS: "true"
+PIP_BREAK_SYSTEM_PACKAGES: 1
 container:
 image: ${{ matrix.image }}
 options: ${{ matrix.options }}
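`PIP_BREAK_SYSTEM_PACKAGES: 1` is the environment-variable form of pip's `--break-system-packages` option (pip reads long options from `PIP_`-prefixed variables). That option lets pip install into an interpreter that PEP 668 marks as externally managed. The commit does not say why the variable was added; a plausible reading, which is an assumption here, is that the Python in the new ASan image carries that marker. The minimal sketch below checks for the marker as PEP 668 defines it: a file named `EXTERNALLY-MANAGED` in the interpreter's stdlib directory. `is_externally_managed` is a hypothetical helper, not part of pip.

```python
import sysconfig
from pathlib import Path
from typing import Optional


def is_externally_managed(stdlib_dir: Optional[str] = None) -> bool:
    """Return True if the PEP 668 marker file exists (hypothetical helper).

    PEP 668 places the marker, a file named EXTERNALLY-MANAGED, in the
    interpreter's stdlib directory (sysconfig's "stdlib" path).
    Passing stdlib_dir overrides the lookup, which keeps the check testable.
    """
    if stdlib_dir is None:
        stdlib_dir = sysconfig.get_path("stdlib")
    return (Path(stdlib_dir) / "EXTERNALLY-MANAGED").is_file()


if __name__ == "__main__":
    # When this prints True, recent pip versions refuse to install into this
    # interpreter outside a virtual environment unless --break-system-packages
    # (or PIP_BREAK_SYSTEM_PACKAGES) is set.
    print(is_externally_managed())
```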
@@ -167,18 +168,22 @@ jobs:
 run: |
 make test-distributed
 - name: Run asan tests on AMD
-if: false
+if: ${{ matrix.runner[0] == 'amd-gfx942' }}
 run: |
 cd third_party/amd/python/test/
 ulimit -s 1024
 export PATH=$(find ~/.triton/llvm -name llvm-symbolizer -printf '%h\n'):$PATH
+TORCH_PATH=$(find /opt -name libcaffe2_nvrtc.so -printf '%h\n')
+export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$TORCH_PATH
+mv $TORCH_PATH/libamdhip64.so $TORCH_PATH/libamdhip64_bck.so
 export LD_LIBRARY_PATH=$(find /opt -name libclang_rt.asan-x86_64.so -printf '%h\n'):$LD_LIBRARY_PATH
 export LD_LIBRARY_PATH=$(find /opt -type d -wholename *lib/llvm/lib/asan):$LD_LIBRARY_PATH
-export LD_LIBRARY_PATH=$(find /usr -name libcaffe2_nvrtc.so -printf '%h\n'):$LD_LIBRARY_PATH
+export LD_LIBRARY_PATH=$(find /opt -wholename *lib/asan/libamdhip64.so -printf '%h\n'):$LD_LIBRARY_PATH
 export CLANG_ASAN_LIB=$(find /opt -name libclang_rt.asan-x86_64.so)
 export HIP_ASAN_LIB=$(find /opt -wholename *lib/asan/libamdhip64.so)
 ASAN_OPTIONS=detect_leaks=0,alloc_dealloc_mismatch=0 \
 LD_PRELOAD=$CLANG_ASAN_LIB:$HIP_ASAN_LIB python3 -m pytest -s test_address_sanitizer.py
+mv $TORCH_PATH/libamdhip64_bck.so $TORCH_PATH/libamdhip64.so
 - name: Run regression tests
 run: |
 make test-regression
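The ASan step moves PyTorch's bundled `libamdhip64.so` aside before running pytest and moves it back afterwards. The commit does not state why; the effect is that the copy in `$TORCH_PATH` cannot be found under its usual name during the test run. If the step runs under GitHub Actions' default `bash -e` shell, a failing pytest ends the script before the final `mv`, so the library stays renamed. The minimal Python sketch below shows the same rename-and-restore pattern with `try`/`finally`, so the restore also happens when the wrapped code raises. `shadowed_library` is a hypothetical helper, not part of the workflow or of Triton.

```python
import contextlib
import tempfile
from pathlib import Path


@contextlib.contextmanager
def shadowed_library(lib_dir: Path, name: str = "libamdhip64.so"):
    """Temporarily rename lib_dir/name to <stem>_bck<suffix> (hypothetical helper).

    Mirrors the workflow's `mv libamdhip64.so libamdhip64_bck.so` and the
    later `mv` back, but restores the file in a finally block so an exception
    in the body (for example, a failing test run) cannot leave it renamed.
    """
    original = lib_dir / name
    stem, suffix = Path(name).stem, Path(name).suffix
    backup = lib_dir / f"{stem}_bck{suffix}"
    original.rename(backup)
    try:
        yield backup
    finally:
        backup.rename(original)


if __name__ == "__main__":
    # Demo on a throwaway directory standing in for $TORCH_PATH.
    with tempfile.TemporaryDirectory() as tmp:
        lib_dir = Path(tmp)
        (lib_dir / "libamdhip64.so").write_bytes(b"")
        try:
            with shadowed_library(lib_dir):
                print("hidden:", not (lib_dir / "libamdhip64.so").exists())
                raise RuntimeError("simulated pytest failure")
        except RuntimeError:
            pass
        print("restored:", (lib_dir / "libamdhip64.so").exists())
```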

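The step builds `LD_LIBRARY_PATH` in two directions: `$TORCH_PATH` is appended, while the three ASan-related directories are each prepended. The dynamic loader searches `LD_LIBRARY_PATH` entries from left to right, so the last prepend is searched first. The sketch below replays that sequence to make the resulting order explicit. `build_search_path` is a hypothetical helper, and the directory strings are placeholders for whatever the `find` commands return, not real paths.

```python
from typing import List, Tuple


def build_search_path(initial: str, steps: List[Tuple[str, str]]) -> List[str]:
    """Replay 'prepend'/'append' edits to a colon-separated LD_LIBRARY_PATH
    and return the directories in left-to-right search order.

    Empty entries in `initial` are dropped for readability (hypothetical helper).
    """
    entries = [e for e in initial.split(":") if e]
    for op, directory in steps:
        if op == "prepend":
            entries.insert(0, directory)
        elif op == "append":
            entries.append(directory)
        else:
            raise ValueError(f"unknown operation: {op!r}")
    return entries


if __name__ == "__main__":
    # Placeholders, in the order the workflow step applies them.
    steps = [
        ("append", "TORCH_PATH"),          # $LD_LIBRARY_PATH:$TORCH_PATH
        ("prepend", "CLANG_ASAN_RT_DIR"),  # dir of libclang_rt.asan-x86_64.so
        ("prepend", "LLVM_ASAN_DIR"),      # *lib/llvm/lib/asan
        ("prepend", "HIP_ASAN_DIR"),       # dir of *lib/asan/libamdhip64.so
    ]
    # Prints: HIP_ASAN_DIR -> LLVM_ASAN_DIR -> CLANG_ASAN_RT_DIR -> TORCH_PATH
    print(" -> ".join(build_search_path("", steps)))
```

`LD_PRELOAD` still loads `$CLANG_ASAN_LIB` and `$HIP_ASAN_LIB` before any of this searching happens. The search order matters for the libraries that are resolved by name afterwards.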
third_party/amd/backend/compiler.py

Lines changed: 5 additions & 0 deletions
@@ -451,6 +451,11 @@ def make_amdgcn(src, metadata, options):
 dump_file_id)
 amdgcn = llvm.translate_to_asm(src, amd.TARGET_TRIPLE, options.arch, features, flags, options.enable_fp_fusion,
 False)
+# TODO: Remove the following workaround once LLVM is bumped to include: https://github.com/llvm/llvm-project/pull/169851
+# Workaround for LLVM ERROR: cannot evaluate equated symbol 'amdgcn.device.init.num_named_barrier'
+if knobs.compilation.enable_asan and 'gfx1250' not in options.arch:
+amdgcn = amdgcn.replace('.amdgpu_metadata',
+'\t.set\tamdgcn.device.init.num_named_barrier, 0\n.amdgpu_metadata')
 if knobs.amd.dump_amdgcn:
 print("// -----// AMDGCN Dump //----- //")
 print(amdgcn)
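The compiler change is a text-level patch. When ASan is enabled and the target is not gfx1250, it inserts a `.set` directive that defines `amdgcn.device.init.num_named_barrier` as 0 in front of the `.amdgpu_metadata` directive. This is intended to avoid the "cannot evaluate equated symbol" error quoted in the comment. The sketch below lifts that logic into a standalone function for illustration. `apply_asan_workaround` is a hypothetical name, `enable_asan` stands in for `knobs.compilation.enable_asan`, and the toy assembly string is made up, not real compiler output.

```python
def apply_asan_workaround(amdgcn: str, arch: str, enable_asan: bool) -> str:
    """Standalone mirror of the make_amdgcn workaround (hypothetical name).

    enable_asan stands in for knobs.compilation.enable_asan; arch for options.arch.
    """
    if not enable_asan or "gfx1250" in arch:
        return amdgcn
    # str.replace patches every occurrence. ".end_amdgpu_metadata" is not
    # matched, because "amdgpu_metadata" there follows "_" rather than ".".
    return amdgcn.replace(
        ".amdgpu_metadata",
        "\t.set\tamdgcn.device.init.num_named_barrier, 0\n.amdgpu_metadata",
    )


if __name__ == "__main__":
    # Made-up assembly skeleton, only to show where the .set line lands.
    toy_asm = (
        "\t.text\n"
        "kernel:\n"
        "\ts_endpgm\n"
        ".amdgpu_metadata\n"
        "---\n"
        "...\n"
        ".end_amdgpu_metadata\n"
    )
    print(apply_asan_workaround(toy_asm, "gfx942", enable_asan=True))
```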
