gfx1152 & gfx1153 bring up

Note: If you see gfx1152 or gfx1150 missing in any component, please add support for both in the same PR.

- [x] https://github.com/ROCm/TheRock/pull/2215
- [ ] TheRock: Reduce ctest parallelism depending on the gfx target (-j1 or -j2 on gfx1153, maybe also on gfx1152). This should fix the rocThrust tests that fail with TEST_TYPE=full.
  - [x] https://github.com/ROCm/TheRock/pull/2658
  - [x] https://github.com/ROCm/TheRock/pull/2828
  - [ ] Verify that these tests are passing reliably in CI after jobs for these targets are re-enabled
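A minimal sketch of the per-target ctest parallelism idea above, assuming a simple lookup table. The names `CTEST_JOB_CAPS`, `ctest_parallelism`, and `ctest_command` are hypothetical and the cap values only mirror the "-j1 or -j2" suggestion; this is not what the linked PRs implement.

```python
import os
from typing import List, Optional

# Hypothetical per-target caps (assumption): gfx1153 runs serially,
# gfx1152 is limited to two jobs, every other target is left uncapped.
CTEST_JOB_CAPS = {"gfx1152": 2, "gfx1153": 1}


def ctest_parallelism(gfx_target: str, default_jobs: Optional[int] = None) -> int:
    """Return the number of parallel ctest jobs to use for a gfx target."""
    if default_jobs is None:
        default_jobs = os.cpu_count() or 1
    cap = CTEST_JOB_CAPS.get(gfx_target)
    if cap is None:
        return max(1, default_jobs)
    return max(1, min(default_jobs, cap))


def ctest_command(gfx_target: str, default_jobs: Optional[int] = None) -> List[str]:
    """Build a ctest argument list with the job count chosen for the target."""
    jobs = ctest_parallelism(gfx_target, default_jobs)
    return ["ctest", "--output-on-failure", "--parallel", str(jobs)]
```

For example, `ctest_command("gfx1153", 8)` ends with `--parallel 1`, while a target without an entry in the table keeps the default job count.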

# Hip libraries
- [x] https://github.com/ROCm/rocm-libraries/pull/2656 (both gfx1152 and gfx1153)
- [x] https://github.com/ROCm/rocm-libraries/pull/2655 (both gfx1152 and gfx1153)
- [x] https://github.com/ROCm/rocm-libraries/pull/3451 (depends on rocPrim)
- [x] hipDNN (nothing to update on repo - inherits automatically) - test passing
- [x] https://github.com/ROCm/rocm-libraries/pull/3157 (both gfx1152 and gfx1153)
  - Superseded by https://github.com/ROCm/rocm-libraries/pull/3155 (combines hipFFT with rocFFT PR)
- [x] https://github.com/ROCm/rocm-libraries/pull/3077 (both gfx1152 and gfx1153)
- [x] hipSolver (nothing to update on repo - inherits automatically)
  - [ ] Issue opened: https://github.com/ROCm/rocm-libraries/issues/3380
- [x] hipSparse (nothing to update on repo - inherits automatically)
  - [ ] Issue opened: https://github.com/ROCm/rocm-libraries/issues/3117
- [x] hipSparseLt does not support gfx115{0,1,2,3} since it requires **sparse MFMA matrix cores**.
- [x] hipTensor ~does not support gfx115{0,1,2,3} since it requires **MFMA matrix cores**.~ Edit: supports it through WMMA.
  - [x]  https://github.com/ROCm/rocm-libraries/pull/3496
- [x] origami (rocm-libraries/shared/origami/include/origami/hardware.hpp) (also missing gfx1150)
  - [x] https://github.com/ROCm/rocm-libraries/pull/3809
  - [x] https://github.com/ROCm/rocm-libraries/pull/3876

# Roc libraries
- [x] https://github.com/ROCm/rocm-libraries/pull/2653 (both gfx1152 and gfx1153)
- [x] https://github.com/ROCm/rocm-libraries/pull/3155 (both gfx1152 and gfx1153)
- [x] https://github.com/ROCm/rocm-libraries/pull/3269 (n.b. https://github.com/ROCm/rocm-libraries/pull/2955)
- [x] https://github.com/ROCm/rocm-libraries/pull/2799
- [x] https://github.com/ROCm/rocm-libraries/pull/3485 (depends on rocPrim (optionally on rocSparse - cmake BUILD_WITH_SPARSE))
  - [ ] https://github.com/ROCm/rocm-libraries/issues/3169
  - [ ] https://github.com/ROCm/rocm-libraries/issues/3171
- [x] https://github.com/ROCm/rocm-libraries/pull/3106 (both gfx1152 and gfx1153)
- [x] https://github.com/ROCm/rocm-libraries/pull/3379 (gfx1150)
- [x] https://github.com/ROCm/rocm-libraries/pull/3268 (both gfx1152 and gfx1153) (rocThrust depends on rocPRIM, but the PRs are independent)
- [x] https://github.com/ROCm/rocm-libraries/pull/2850
    - [ ] https://github.com/ROCm/TheRock/pull/2650
- [x] rocRoller (not necessary for functionality, no good enough use case to justify the investment)
- [ ] https://github.com/ROCm/rocm-systems/pull/2055
- [ ] rccl (needs gfx1103, gfx115X)
  - [ ] gfx1151: https://github.com/ROCm/rccl/pull/2075
  - [ ] others: could be done using the gfx1151 PR as an example, but we would need a multi-GPU system of each type for testing

# Composable kernels
- [x] https://github.com/ROCm/composable_kernel/pull/3306
    - [x] https://github.com/ROCm/composable_kernel/pull/3496
    - [x] https://github.com/ROCm/rocm-libraries/pull/3061
    - [x] bump into TheRock
    - [ ] https://github.com/ROCm/TheRock/pull/2809

# AOTriton (Ahead of Time Triton math library - mostly for flash attention)
- [x] https://github.com/ROCm/aotriton/pull/142
- [ ] Bump https://github.com/pytorch/pytorch/blame/main/cmake/External/aotriton.cmake#L27 to [commit that supports gfx1152 and gfx1153](https://github.com/ROCm/aotriton/pull/142) (@roberteg16)
  - [x] https://github.com/ROCm/aotriton/pull/143
  - [ ] Wait for the new aotriton release before bumping (Laitio, Mika is the contact person)
  - [ ] Revert https://github.com/ROCm/TheRock/pull/2709/changes
  - [ ] Revert https://github.com/ROCm/TheRock/pull/2810/changes

# MIOpen
- [x] https://github.com/ROCm/rocm-libraries/pull/3296

# vLLM
- [ ] vLLM (like in https://github.com/vllm-project/vllm/pull/25908 and https://github.com/vllm-project/vllm/pull/28308)
- [x] https://github.com/ROCm/triton/pull/913 - flash attention work happens in AITER repo
- [ ] https://github.com/ROCm/flash-attention/pull/165 
- [x] https://github.com/ROCm/aiter/pull/1498

# llama.cpp
- [x] Support for build + benchmarking (Nothing to update on repo - verified as working on target GPUs)

# PyTorch
- [ ] Requires a ROCm version that supports compilation and packaging for gfx1152 and gfx1153.
  -  Right now only gfx1150 and gfx1151 are packaged in ROCm.
    -  `pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/rocm7.0`
    -  `ls .venv/lib/python3.12/site-packages/torch/lib/rocblas/library/ | grep -E "gfx115(0|1)"` shows object code for gfx115{0,1}
    -  `ls .venv/lib/python3.12/site-packages/torch/lib/rocblas/library/ | grep -E "gfx115(2|3)"` does not show object code for gfx115{2,3}
  - [ ] Enable rocWMMA for gfx115{2,3}
- [ ] https://github.com/pytorch/pytorch/pull/170307
  -  Perhaps open it in a fork first? PyTorch versions are set in stone.
- [ ] https://github.com/ROCm/TheRock/issues/2723
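The `ls ... | grep` check on the rocBLAS library directory above can also be scripted. This is a sketch only: the helper names `packaged_gfx_targets` and `missing_targets` are hypothetical, and it assumes target names appear in the file names as `gfx` followed by hex digits, as the grep patterns suggest.

```python
import re
from pathlib import Path

# Assumption: gfx targets show up in file names as "gfx" + hex digits,
# e.g. gfx1150 or gfx90a.
GFX_PATTERN = re.compile(r"gfx[0-9a-f]+")


def packaged_gfx_targets(library_dir):
    """Collect the gfx target names found in file names under library_dir."""
    targets = set()
    for entry in Path(library_dir).iterdir():
        targets.update(GFX_PATTERN.findall(entry.name))
    return targets


def missing_targets(library_dir, wanted=("gfx1152", "gfx1153")):
    """Return the wanted targets that have no matching file in library_dir."""
    found = packaged_gfx_targets(library_dir)
    return sorted(t for t in wanted if t not in found)
```

Pointing `missing_targets` at `.venv/lib/python3.12/site-packages/torch/lib/rocblas/library/` from the commands above should return an empty list once wheels package gfx1152 and gfx1153.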

# CI
- [ ] Get boards connected (in progress)
    - [x] Enable nightly CI (like https://github.com/ROCm/TheRock/blob/main/build_tools/github_actions/amdgpu_family_matrix.py#L67)
    - [x] [Add runner](https://github.com/ROCm/TheRock/pull/2542)
    - [x] https://github.com/ROCm/TheRock/pull/2647
    - [x] Get the gfx1153 board init failure fixed: https://ontrack-internal.amd.com/browse/SWDEV-576414
    - [ ] Get the gfx1153 board hang fixed: https://ontrack-internal.amd.com/browse/SWDEV-576577
    - [ ] https://github.com/ROCm/TheRock/issues/2682
    - [ ] https://github.com/ROCm/TheRock/pull/2648
    - [ ] Revert https://github.com/ROCm/TheRock/pull/2742
    - [ ] Get more runners (need 5 more; 8 total)
    - [ ] Ensure that the nightly test job passes: https://github.com/ROCm/TheRock/actions/workflows/ci_nightly.yml
    - [ ] Ensure that nightly packages/python wheels are built and uploaded (https://github.com/ROCm/TheRock/issues/2646)

# Collateral
- [ ] Update https://github.com/ROCm/TheRock/blob/main/ROADMAP.md with support status
