Description
Note: If you see gfx1152 or gfx1150 missing in any component, please add support for both in the same PR.
- Add initial gfx1153 support #2215
- TheRock: Reduce ctest parallelism depending on gfx target (-j1 or -j2 on gfx1153, maybe gfx1152). Should fix the failing rocThrust tests in TEST_TYPE=full
- [ci] Halve rocThrust test parallelism on gfx1153 #2658
- [ci] Halve rocThrust test parallelism on gfx1152 #2828
- Verify that these tests are passing reliably in CI after jobs for these targets are re-enabled
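The per-target ctest parallelism described above could be sketched roughly as follows. This is a minimal illustration, not TheRock's actual implementation: the mapping `CTEST_JOBS_BY_TARGET`, the default of 8 jobs, and both helper functions are hypothetical names chosen for this example. Only the `-j1`/`-j2` values come from the item above.

```python
# Hypothetical per-target limits, following the "-j1 or -j2" suggestion above.
CTEST_JOBS_BY_TARGET = {
    "gfx1153": 1,
    "gfx1152": 2,
}

# Assumed default parallelism for targets without a specific limit.
DEFAULT_CTEST_JOBS = 8


def ctest_parallelism(gfx_target: str, default: int = DEFAULT_CTEST_JOBS) -> int:
    """Return the number of parallel ctest jobs to use for a gfx target."""
    limit = CTEST_JOBS_BY_TARGET.get(gfx_target, default)
    # Never exceed the default, even if a per-target limit is set higher.
    return min(default, limit)


def ctest_command(gfx_target: str, test_dir: str) -> list[str]:
    """Build a ctest invocation with target-dependent parallelism."""
    jobs = ctest_parallelism(gfx_target)
    return [
        "ctest",
        "--test-dir", test_dir,
        "--parallel", str(jobs),
        "--output-on-failure",
    ]
```

Keeping the limits in one table means a target can be returned to full parallelism by deleting a single entry once its tests pass reliably.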
Hip libraries
- [hipBLAS] Add initial gfx1152 support rocm-libraries#2656 (both gfx1152 and gfx1153)
- [hipBLASLt] Add initial gfx1152 and gfx1153 support rocm-libraries#2655 (both gfx1152 and gfx1153)
- [hipCUB] Add gfx1150, gfx1152, gfx1153 rocm-libraries#3451 (depends on rocPrim)
- hipDNN (nothing to update on repo - inherits automatically) - test passing
- [hipFFT] Enable gfx1152 & gfx1153 rocm-libraries#3157 (both gfx1152 and gfx1153)
- Superseded by [rocFFT | hipFFT] Enable gfx1152 & gfx1153 rocm-libraries#3155 (combines hipFFT with rocFFT PR)
- [hipRAND] Enable gfx1152 and gfx1153 rocm-libraries#3077 (both gfx1152 and gfx1153)
- hipSolver (nothing to update on repo - inherits automatically)
- hipSparse (nothing to update on repo - inherits automatically)
- hipSparseLt does not support gfx115{0,1,2,3} since it requires sparse MFMA matrix cores.
- hipTensor: does not support gfx115{0,1,2,3} since it requires MFMA matrix cores. Edit: supports it through WMMA
- origami (rocm-libraries/shared/origami/include/origami/hardware.hpp) (also missing gfx1150)
Roc libraries
- [rocBLAS][Tensile] Add initial gfx1152/gfx1153 support rocm-libraries#2653 (both gfx1152 and gfx1153)
- [rocFFT | hipFFT] Enable gfx1152 & gfx1153 rocm-libraries#3155 (both gfx1152 and gfx1153)
- [rocPRIM] Add gfx1152 and gfx1153 rocm-libraries#3269 (n.b. [rocPRIM] Config modernization rocm-libraries#2955)
- [rocRAND] Enable gfx1152 and gfx1153 rocm-libraries#2799
- [rocSOLVER] Add support for gfx1150, gfx1152 and gfx1153 architectures rocm-libraries#3485 (depends on rocPrim (optionally on rocSparse - cmake BUILD_WITH_SPARSE))
- [rocSparse] Enable gfx1152 and gfx1153 rocm-libraries#3106 (both gfx1152 and gfx1153)
- [rocSparse] Enable gfx1150 rocm-libraries#3379 (gfx1150)
- [rocThrust] Add gfx1152 and gfx1153 rocm-libraries#3268 (both gfx1152 and gfx1153) (rocThrust depends on rocPRIM, but the PRs are independent)
- [rocWMMA] Add gfx1152 and gfx1153 rocm-libraries#2850
- rocRoller (not necessary for functionality, no good enough use case to justify the investment)
- Adding counters support for GFX1152 and GFX1153 rocm-systems#2055
- rccl (needs gfx1103, gfx115X)
- gfx1151: Support Strix Halo gfx1151 rccl#2075
- others: could be done using the gfx1151 PR as an example, but we would need a multi-GPU system of each type for testing
Composable kernels
AOTriton (Ahead of Time Triton math library - mostly for flash attention)
- Add gfx1152 and gfx1153 iGPU support aotriton#142
- Bump https://github.com/pytorch/pytorch/blame/main/cmake/External/aotriton.cmake#L27 to a commit that supports gfx1152 and gfx1153 (@roberteg16)
- Port gfx115{2,3} enablement to release/0.11 branch aotriton#143
- Wait for new aotriton release before bump (Laitio, Mika is the contact person)
- Revert https://github.com/ROCm/TheRock/pull/2709/changes
- Revert https://github.com/ROCm/TheRock/pull/2810/changes
MIOpen
vLLM
- vLLM (like in [ROCm][Build] Add support for AMD Ryzen AI MAX / AI 300 Series vllm-project/vllm#25908 and Update gpu.rocm.inc.md to add support for AMD Ryzen AI MAX / AI 300 Series (gfx1151, gfx1150) vllm-project/vllm#28308)
- [AMD] flash-attention: Support all gfx10/11/12 targets triton#913 - flash attention work happens in AITER repo
- Enable all gfx11 targets flash-attention#165
- Add gfx11XX targets aiter#1498
llama.cpp
- Support for build + benchmarking (Nothing to update on repo - verified as working on target GPUs)
Pytorch
- Requires a ROCm version that supports gfx1152 and gfx1153 (compilation and packaging).
- Right now only gfx1150 and gfx1151 are being packaged in ROCm.
- `pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm7.0`
- `ls .venv/lib/python3.12/site-packages/torch/lib/rocblas/library/ | grep -E "gfx115(0|1)"` shows object code for gfx115{0,1}
- `ls .venv/lib/python3.12/site-packages/torch/lib/rocblas/library/ | grep -E "gfx115(2|3)"` does not show object code for gfx115{2,3}
- Enable rocWMMA for gfx115{2,3}
- [ROCm] add gfx1152 & gfx1153 to supported gemm lists pytorch/pytorch#170307
- Perhaps open first in fork? Pytorch versions are set in stone.
- [Pytorch] gfx1153 - Pytorch build failures in all versions "error: use of undeclared identifier 'CK_BUFFER_RESOURCE_3RD_DWORD'" #2723
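The `ls | grep` check on the rocBLAS library directory in the list above could also be scripted. The sketch below assumes that packaged object files carry the gfx target in their file name, which is what the grep check relies on. The function names are hypothetical and not part of PyTorch or rocBLAS.

```python
import re
from pathlib import Path

# Matches architecture names such as gfx1150 or gfx90a inside a file name.
GFX_PATTERN = re.compile(r"gfx[0-9a-f]+")


def packaged_gfx_targets(library_dir: str) -> set[str]:
    """Collect every gfx target mentioned in file names under library_dir."""
    targets: set[str] = set()
    for entry in Path(library_dir).iterdir():
        targets.update(GFX_PATTERN.findall(entry.name))
    return targets


def missing_targets(library_dir: str, wanted=("gfx1152", "gfx1153")) -> list[str]:
    """Return the wanted targets that have no object code in library_dir."""
    found = packaged_gfx_targets(library_dir)
    return sorted(t for t in wanted if t not in found)
```

For example, `missing_targets(".venv/lib/python3.12/site-packages/torch/lib/rocblas/library/")` should return an empty list once wheels that package gfx1152 and gfx1153 are installed.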
CI
- Get boards connected (in progress)
- Enable nightly CI (like https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/amdgpu_family_matrix.py#L67)
- Add runner
- amdgpu_family_matrix: Remove expect_failure for gfx1152/gfx1153 #2647
- Get the gfx1153 board init failure fixed: https://ontrack-internal.amd.com/browse/SWDEV-576414
- Get the gfx1153 board hang fixed: https://ontrack-internal.amd.com/browse/SWDEV-576577
- [ci] Re-enable gfx1153 Linux test machines after error is resolved #2682
- amdgpu_family_matrix: Remove sanity_check_only_for_family on gfx1153 on Linux #2648
- Revert [ci] Disabling Linux `gfx1151`, `gfx103X` and `gfx110X` test machines and `gfx950` rocroller tests #2742
- Get more runners (need 5 more; 8 total)
- Ensure that the nightly test job passes: https://github.com/ROCm/TheRock/actions/workflows/ci_nightly.yml
- Ensure that nightly packages/python wheels are built and uploaded (Release portable Linux packages fails to trigger release_portable_linux_pytorch_wheels for gfx1152/1153 #2646)
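To track which CI restrictions still need to be lifted, a small helper could scan the family matrix for the `expect_failure` and `sanity_check_only_for_family` flags named in the items above. The dictionary layout below is an assumption made for illustration, and the real `amdgpu_family_matrix.py` may be structured differently. `EXAMPLE_MATRIX` and `remaining_restrictions` are hypothetical names.

```python
# Hypothetical matrix shape; the real amdgpu_family_matrix.py may differ.
EXAMPLE_MATRIX = {
    "gfx1152": {
        "linux": {"family": "gfx1152", "expect_failure": True},
    },
    "gfx1153": {
        "linux": {
            "family": "gfx1153",
            "expect_failure": True,
            "sanity_check_only_for_family": True,
        },
    },
    "gfx1151": {
        "linux": {"family": "gfx1151"},
    },
}

# Flags taken from the PR titles in the list above.
RESTRICTION_FLAGS = ("expect_failure", "sanity_check_only_for_family")


def remaining_restrictions(matrix: dict, flags=RESTRICTION_FLAGS) -> dict:
    """Map (target, platform) to the restriction flags that are still set."""
    result = {}
    for target, platforms in matrix.items():
        for platform, entry in platforms.items():
            active = [flag for flag in flags if entry.get(flag)]
            if active:
                result[(target, platform)] = active
    return result
```

An empty result for gfx1152 and gfx1153 would indicate that both targets run the full test suite and are expected to pass.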
Collateral
- Update https://github.com/ROCm/TheRock/blob/main/ROADMAP.md with support status
Status
TODO