Changes from all commits
974 commits
1ceb3b9
Merge remote-tracking branch 'origin/main' into aiter-mla-integration
vllmellm Apr 3, 2025
20a3f07
fix mypy error on Iterable typing error
vllmellm Apr 3, 2025
7153046
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Apr 3, 2025
e3f03b7
Disable fp8_out_scale on V1
gshtras Apr 3, 2025
eaecf03
Merge remote-tracking branch 'embedded/aiter-mla-integration' into up…
gshtras Apr 3, 2025
c045f59
Merge pull request #499 from ROCm/upstream_merge_2025_04_02
gshtras Apr 3, 2025
b101125
Bump aiter version (#500)
gshtras Apr 3, 2025
6d258fa
Adding 2stage MoE support separately until it is added upstream (#501)
gshtras Apr 3, 2025
732455b
Fused FP8 conversion in attention for v1 (#502)
gshtras Apr 7, 2025
f657987
Merge remote-tracking branch 'upstream/main'
gshtras Apr 7, 2025
2b6e9c9
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Apr 7, 2025
d17d4df
Merge pull request #503 from ROCm/upstream_merge_2025_04_07
gshtras Apr 7, 2025
8826599
Fix fused moe (#506)
gshtras Apr 7, 2025
97b78bf
Update moe_tune_script.sh (#507)
divakar-amd Apr 8, 2025
f68829f
doubled size to wa issue and preserve CAR perf (#510)
maleksan85 Apr 10, 2025
b8498bc
re-enable custom paged attention for V0 (#511)
charlifu Apr 10, 2025
f4b308f
Add gfx950 to the attention archs
jpvillam-amd Apr 3, 2025
e201e58
Linter
jpvillam-amd Apr 10, 2025
c43debd
Updated README.md with April 10 results (#512)
Mcirino1 Apr 14, 2025
9025082
Update README.md (#514)
faisalgulfam32 Apr 16, 2025
1c0a1ae
update base image (#515)
charlifu Apr 17, 2025
44c9580
Merge remote-tracking branch 'upstream/main'
gshtras Apr 21, 2025
40f2157
Update test-template.j2 to enable building (#517)
Alexei-V-Ivanov-AMD Apr 21, 2025
60cd57b
Update test-template.j2 to fix new location of run-amd-test.sh (#518)
Alexei-V-Ivanov-AMD Apr 21, 2025
e26141f
Rocm 6.4 docker (#519)
gshtras Apr 22, 2025
8ad1c44
Update README.md (#521)
t-parry Apr 22, 2025
49e4719
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Apr 23, 2025
a9af7a9
Remove leftovers from 2stage
gshtras Apr 23, 2025
105e655
Re-add 2stage moe
gshtras Apr 23, 2025
ae144d6
custom all-reduce, gfx950
seungrokj Apr 24, 2025
cfda5b3
Merge remote-tracking branch 'origin/main' into upstream_merge_2025_0…
gshtras Apr 24, 2025
c5b41dc
Missing parameter for sdpa
gshtras Apr 24, 2025
c383e6c
Update README.md (#523)
t-parry Apr 24, 2025
cfc530a
Merge branch 'main' into upstream_merge_2025_04_21
gshtras Apr 24, 2025
c3f61dd
Merge pull request #522 from ROCm/upstream_merge_2025_04_21
gshtras Apr 24, 2025
8c211e5
Merge remote-tracking branch 'upstream/main'
gshtras Apr 25, 2025
a9e7a00
Fix API typo and remove FP8 on V1 restriction
gshtras Apr 25, 2025
28007b0
Upstream merge 2025 04 25 (#524)
gshtras Apr 25, 2025
8bd7ee1
Bump hiblaslt (#528)
gshtras Apr 28, 2025
328b04d
Merge branch 'main' into jpvillam/fa_gfx950
jpvillam-amd Apr 28, 2025
550b072
Update rocm.py
jpvillam-amd Apr 28, 2025
1fbb019
Restrict setuptools version (#529)
gshtras Apr 28, 2025
ad806ba
Linter
jpvillam-amd Apr 28, 2025
dc6c46b
lint
gshtras Apr 28, 2025
1f4e00c
Revert aiter commit (#530)
gshtras Apr 29, 2025
e8766c6
Merge remote-tracking branch 'upstream/main'
gshtras Apr 29, 2025
7a9f58a
Update README.md (#531)
t-parry Apr 30, 2025
41b85b6
Restrict ray version due to https://github.com/ray-project/ray/issues…
gshtras Apr 30, 2025
8e45f88
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Apr 30, 2025
285ac51
Merge remote-tracking branch 'upstream/main' into jpvillam/fa_gfx950
gshtras Apr 30, 2025
0bc1d7c
No vllm.vllm_flash_attn.layers.rotary on ROCm
gshtras Apr 30, 2025
134d285
Merge remote-tracking branch 'origin/rocm_fix' into upstream_merge_20…
gshtras Apr 30, 2025
2921150
Merge remote-tracking branch 'origin/jpvillam/fa_gfx950' into upstrea…
gshtras Apr 30, 2025
f3a5bf0
Restore the function that is used elsewhere
gshtras Apr 30, 2025
8334e54
Merge remote-tracking branch 'origin/jpvillam/fa_gfx950' into upstrea…
gshtras Apr 30, 2025
c1cb05e
Fix Quark API use
gshtras May 1, 2025
2c68ff9
Merge branch 'main' into upstream_merge_2025_04_29
gshtras May 2, 2025
29241ca
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras May 2, 2025
0b8eaec
Re-fix Quark API
gshtras May 2, 2025
f3f620a
Using the right torch API
gshtras May 2, 2025
2fea69f
Merge pull request #536 from ROCm/upstream_merge_2025_04_29
gshtras May 2, 2025
d283632
Merge remote-tracking branch 'upstream/main'
gshtras May 6, 2025
8e62073
Fix for the condition to accept empty encoder inputs for mllama
gshtras May 6, 2025
a0b4ef2
Cherry-pick skinny gemm fix
gshtras May 6, 2025
166d0ef
Merge pull request #538 from ROCm/upstream_merge_2025_05_06
gshtras May 6, 2025
d483fc2
integrate aiter
fsx950223 May 8, 2025
4f85566
add env variable
fsx950223 May 8, 2025
ae85e79
rename function
fsx950223 May 9, 2025
87ea0ba
optimize kernels with small query lens
fsx950223 May 9, 2025
db4bc55
change condition
fsx950223 May 9, 2025
b526478
Aiter mla cherrypick (#543)
gshtras May 9, 2025
c791a85
Cherry pick skinny gemms (#544)
gshtras May 9, 2025
efe59bd
Merge remote-tracking branch 'upstream/main' into fa_upstream
fsx950223 May 12, 2025
40654e4
add rocm aiter backend
fsx950223 May 12, 2025
59f1b15
add gfx950 support for skinny gemms
charlifu May 12, 2025
5b1895e
Merge branch 'main' into amd/gfx950_skinny_gemm
charlifu May 13, 2025
6f5df79
Merge remote-tracking branch 'upstream/main'
gshtras May 13, 2025
6b08324
Merge remote-tracking branch 'origin/main'
gshtras May 13, 2025
d9da93f
fix on_mi3xx
charlifu May 14, 2025
bb1f213
Merge remote-tracking branch 'upstream/main'
gshtras May 15, 2025
0c6ce45
Merge remote-tracking branch 'upstream/main'
gshtras May 15, 2025
222fa01
Remove gradlib
gshtras May 15, 2025
34483a3
Fix P3L Arg parser
gshtras May 15, 2025
c13eddf
pre-commit
gshtras May 15, 2025
8dd236d
new fa impl
fsx950223 May 16, 2025
1466c79
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras May 16, 2025
ccd96e8
Toggle for v1 attention
gshtras May 16, 2025
262ed1e
Merge pull request #547 from ROCm/upstream_merge_2025_05_15
gshtras May 16, 2025
d1d3ff9
Remove gradlib mention from pyproject (#549)
gshtras May 16, 2025
db892e7
Fix input layer norm mismatch for Eagle Speculative Decoding compatib…
mmkamani7 May 16, 2025
16d2b92
Updated README.md (#546)
Mcirino1 May 19, 2025
662127a
Merge remote-tracking branch 'upstream/main'
gshtras May 19, 2025
9b131ae
Caching the env variable in the __init__
gshtras May 19, 2025
e34fd18
Restrict FP8 attention output to non unified backend until the accura…
gshtras May 19, 2025
e94c760
Merge pull request #550 from ROCm/upstream_merge_2025_05_19
gshtras May 19, 2025
8a67a53
Reduce diff from upstream (#551)
gshtras May 20, 2025
e950b15
Fixing a bug from transformers==4.52. config.head_dim is now explicit…
gshtras May 20, 2025
258d2d3
Remove the option to compile cython during the docker build. It hasn'…
gshtras May 20, 2025
a31e5d8
Fixing pre-commit in github. Not sure why this issue does not affect …
gshtras May 20, 2025
16af49c
Merge remote-tracking branch 'upstream/main'
gshtras May 21, 2025
91a5600
Fused FP8 attention output is now only possible for both flash and pa…
gshtras May 21, 2025
7c1213e
Remove incorrect env value
gshtras May 21, 2025
e995fdc
update api
fsx950223 May 21, 2025
a501ff0
optimize performance
fsx950223 May 26, 2025
3ad9e3a
merge
fsx950223 May 26, 2025
05e460f
remove try catch
fsx950223 May 15, 2025
1c450a5
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras May 27, 2025
d5e35a9
Merge remote-tracking branch 'origin/main' into upstream_merge_2025_0…
gshtras May 27, 2025
1900335
Upstream merge 2025 05 27 (#557)
gshtras May 27, 2025
307d8bc
Removing redundant parameters from the MIs side and fixing Navi build…
gshtras May 27, 2025
12447b9
Merge branch 'main' into amd/gfx950_skinny_gemm
charlifu May 28, 2025
630ed84
cache get_lds_size()
charlifu May 28, 2025
0a337a6
clean code
fsx950223 May 29, 2025
f4a992c
Removing RPD in favor of torch profiler for V1 (#558)
gshtras May 29, 2025
bee14ca
Merge remote-tracking branch 'upstream/main'
gshtras May 29, 2025
7bb0618
Added benchmark results and commit hash (#556)
Mcirino1 May 29, 2025
0286875
Merge branch 'main' into amd/gfx950_skinny_gemm
charlifu May 29, 2025
7bf92f9
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras May 29, 2025
421c498
Merge leftover
gshtras May 29, 2025
628db8d
Merge remote-tracking branch 'origin/amd/gfx950_skinny_gemm' into ups…
gshtras May 29, 2025
d92c04b
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Jun 2, 2025
9c22cdd
Remove redundant configs
gshtras Jun 2, 2025
9d4c238
Merge branch 'main' into upstream_merge_2025_06_02
gshtras Jun 2, 2025
3712649
Merge pull request #565 from ROCm/upstream_merge_2025_06_02
gshtras Jun 2, 2025
ab92741
Merge remote-tracking branch 'upstream/main'
gshtras Jun 3, 2025
aee731f
cleanup
gshtras Jun 3, 2025
8cde510
Merge pull request #566 from ROCm/upstream_merge_2025_06_03
gshtras Jun 3, 2025
06efc40
add split kv version of unified triton kernel
jvlunteren Jun 3, 2025
8377189
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Jun 3, 2025
5070c4b
remove type cast
fsx950223 Jun 4, 2025
4d69fde
formatting
jvlunteren Jun 4, 2025
29bef2c
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Jun 4, 2025
c233c3d
address suggestions by gemini-code-assist
jvlunteren Jun 5, 2025
71cbfe5
Fix attention fp8 output fusion for split attention path in v1 (#569)
gshtras Jun 5, 2025
ccfa3b8
Merge remote-tracking branch 'origin/main' into upstream_merge_2025_0…
gshtras Jun 5, 2025
bcbb7a6
Merge remote-tracking branch 'upstream/main' into upstream_merge_2025…
gshtras Jun 5, 2025
1a254d8
Merge pull request #570 from ROCm/upstream_merge_2025_06_05
gshtras Jun 5, 2025
cdfe72b
Rocm 6.4.1 as base (#571)
gshtras Jun 5, 2025
a9abba3
Merge remote-tracking branch 'origin/fa_upstream3' into 0610_rc2
gshtras Jun 5, 2025
149943e
new aiter commit introduced new enum
tjtanaa May 31, 2025
0721687
Fix the use of a deprecated function
gshtras Jun 5, 2025
8f2462f
Merge remote-tracking branch 'jvlunteren/jvl-splitkv-triton-unif-attn…
gshtras Jun 5, 2025
ccf25ca
[Bugfix] Add padding for block-scale fused-moe weights for AITER lib
qli88 Jun 5, 2025
45604cc
[Bugfix] Add None check for optional list
qli88 Jun 5, 2025
1c2adb5
Make sure block quant is used before doing possible padding
qli88 Jun 5, 2025
bda243e
Replace block_quant with raw check to stop CI complain
qli88 Jun 5, 2025
57e5540
yapf
qli88 Jun 5, 2025
f0c789c
Cherry pick https://github.com/vllm-project/vllm/pull/19234
gshtras Jun 5, 2025
71faa18
Cherry-pick https://github.com/vllm-project/vllm/pull/19158
gshtras Jun 5, 2025
d4b681f
[ROCm][Build] Clean up the ROCm build (#19040) (#567)
gshtras Jun 3, 2025
959a21c
[AMD] Update compatible packaging version (#19309) (#573)
gshtras Jun 9, 2025
e4587a0
[ROCm][Build] Clean up the ROCm build (#19040) (#567)
gshtras Jun 3, 2025
b9836a3
MI350 enablement for fp16 and fp8 models V0/V1 (#576)
maleksan85 Jun 16, 2025
9c6587b
Gathering missed fp4 changes together into this stream (#582)
maleksan85 Jun 25, 2025
710c8c9
Merge remote-tracking branch 'origin/amd/out_of_place_layernorm' into…
gshtras Jun 30, 2025
191d264
Merge remote-tracking branch 'origin/attention_fusion_v1' into 0715_rc1
gshtras Jun 30, 2025
b2a3219
Merge remote-tracking branch 'origin/qiang_qwen3_moe_padding' into 07…
gshtras Jun 30, 2025
c0f6542
Fetching upstream to pull the tags for the correct version
gshtras Jun 20, 2025
2eab60f
Default config values to enable torch compile passes
gshtras Jun 30, 2025
b381e56
Merge remote-tracking branch 'origin/amd/out_of_place_layernorm' into…
gshtras Jul 1, 2025
646a0f4
warpSize is being made non constexpr in ROCm 7.0 (#588)
gshtras Jul 1, 2025
d78b542
Conditional defaults on V1. Adding custom ops
gshtras Jul 1, 2025
b432b7a
Disable explicit commandr compilation to work around GPU crashes
gshtras Jul 1, 2025
f80b3c8
V1 for fp4 (#584)
maleksan85 Jul 8, 2025
97c32fc
Correct the logic for PYTORCH_ROCM_ARCH (#593)
pramenku Jul 11, 2025
dfbc6e2
Merge remote-tracking branch 'origin/0715_rc1' into ROCm-7.0
gshtras Jul 15, 2025
5ccdbd6
wip
gshtras Jul 15, 2025
8bbd5d4
cleanup
gshtras Jul 15, 2025
9d97adb
Add contiguous as a partial workaround
gshtras Jul 16, 2025
9fcf6d5
Using the shuffle fix from Ali
gshtras Jul 16, 2025
b9b2d82
Fixes in rope and aiter0.1.4 adjustment
gshtras Jul 18, 2025
7a1858d
Using aiter branch sith compiler fix
gshtras Jul 18, 2025
8b8f1c3
fix torch.compile issue
charlifu Jul 28, 2025
ed105af
Building triton on top of ROCm7.0 RC1; Apply fix from SWDEV-546201
gshtras Jul 29, 2025
f8dc6d4
Using triton from the triton-gfx950-launch branch
gshtras Jul 29, 2025
8d2ebe3
Using the workaround for the AITER API dtype. Using the workaround fr…
gshtras Jul 30, 2025
f15735b
Base image
gshtras Jul 30, 2025
2ee43db
update triton
kiran-thumma Aug 1, 2025
60f3f4b
update triton new commit
kiran-thumma Aug 1, 2025
891e379
Merge remote-tracking branch 'rocm/amd/out_of_place_layernorm'
gshtras Aug 5, 2025
8487cda
Merge remote-tracking branch 'origin/attention_fusion_v1' into 0715_rc1
gshtras Jun 30, 2025
396d6c4
Merge leftovers and build.fix
gshtras Aug 5, 2025
51b1469
Revert "[Front-end] microbatch tokenization (#19334)"
gshtras Aug 5, 2025
016c25b
RC specific changes - commandr compilation; CAR size increase; tags f…
gshtras Aug 5, 2025
e085291
Merge remote-tracking branch 'origin/0812_rc1' into 355_0805_rc1
gshtras Aug 5, 2025
76ff141
Rope merge leftover
gshtras Aug 5, 2025
c5aaf8c
clean up unneeded proxy methods
Aug 7, 2025
2a9a2af
Missing import
gshtras Aug 7, 2025
a852d7b
RC specific changes - commandr compilation; CAR size increase; tags f…
gshtras Aug 5, 2025
e760056
Revert "[Front-end] microbatch tokenization (#19334)"
gshtras Aug 8, 2025
340ea86
Update dockerfile. Use AITER commit with DS fix
gshtras Aug 8, 2025
4b08ad2
fp8 kv cache support for fp4 llama 3.1 405B
Aug 12, 2025
b82e6cb
merge branch 355_0805_rc1 into 355_wip
Aug 12, 2025
aa8021a
merge branch 0812_rc2 into 355_wip
Aug 12, 2025
f9748d3
restoring fp4 gemm asm aiter kernel correctness as per PR 630
Aug 13, 2025
f22b077
switching to well tested rocm base 24_ubuntu22.04_py3.10_pytorch_lw_r…
Aug 14, 2025
7518839
Use torch/triton versions for OSS support
gshtras Aug 14, 2025
2170b0d
Merge remote-tracking branch 'upstream/main' into 355_wip_oss_base_build
gshtras Aug 14, 2025
9eea278
Config changes refactor adjustment
gshtras Aug 14, 2025
b71976e
Merge fixes
gshtras Aug 14, 2025
38e3f1f
Building triton_kernels as part of building triton. Porting over mxfp…
gshtras Aug 14, 2025
67c53ad
restoring rupport for QuarkW8A8Fp8, QuarkW8A8Int8 after QuarkW4A4MXFP…
Aug 15, 2025
bc91ec3
Update triton_attn.py to remove output_scale
hongxiayang Aug 15, 2025
ee90cd2
Update triton_attn.py: put back output_scale (for broader context)
hongxiayang Aug 15, 2025
25be3d7
Move aiter to Dockerfile.rocm
Aug 17, 2025
3dbb8de
Move commit to aiter branch (355_wip)
Aug 18, 2025
602562b
Merge remote-tracking branch 'origin/355_0805_rc1_upstream' into 355_wip
Aug 20, 2025
68f2e92
[FP4] Padding FP4 gemm output dim0 to align 32
zejunchen-zejun Aug 20, 2025
2b43479
Update to ROCm 7.0 RC4 base image
Aug 20, 2025
a07ff61
Fix env for chunked prefill
fsx950223 Aug 20, 2025
1b1e7cd
add sink arg
fsx950223 Aug 20, 2025
4f43dae
Fix ARCH arguments
Aug 21, 2025
02a90d0
Update to new RC4 build with updated HIPBLASLT
Aug 21, 2025
ffaeba9
Allow Flash attention to build
Aug 21, 2025
95fa2bd
Remove copy of pytorch and vision modules
Aug 21, 2025
8ae5dee
Manually build pytorch/rocm7.1_internal_testing branch of triton
Aug 22, 2025
8eb058e
Add new ROCm image with triton_kernels installed
dllehr-amd Aug 23, 2025
0f04865
Add Day 0 vllm changes to 355_wip branch
Aug 25, 2025
a45886a
Update rocm_base to rocm/vllm-private:355_wip_base_image_0823x
dllehr-amd Aug 25, 2025
4d63faf
add AITER Triton RoPE as a registered ops with VLLM_USE_AITER_TRITON_…
k50112113 Aug 26, 2025
bcc4e69
Add triton gemm calls for unquantized gemms
Aug 26, 2025
25e4e87
remove duplicate aiter_GEMM_check function
Aug 26, 2025
93ee1c5
Add handling for non-contiguous x
rebklee Aug 27, 2025
1f732c1
Updated README.md for June 10 release (#574)
Mcirino1 Jun 11, 2025
ad2771e
Cleanup
gshtras Jun 12, 2025
a82a6f1
New typos checker
gshtras Jun 12, 2025
f0a71e6
Merge leftover
gshtras Jun 17, 2025
9ea0c6e
Cherry pick https://github.com/vllm-project/vllm/pull/19158
gshtras Jun 17, 2025
37897a6
Remove unused vars
gshtras Jun 20, 2025
26a85c7
Updated README.md for June 24 Docker release (#589)
Mcirino1 Jul 7, 2025
d61a61c
Minor changes to command line examples (#594)
Mcirino1 Jul 16, 2025
c15dc3d
Update test-pipeline.yaml (#599)
Alexei-V-Ivanov-AMD Jul 18, 2025
ec540fa
cleanup
gshtras Jul 29, 2025
f4a4bdb
support ck-tile blockquant gemm in vllm
Jul 31, 2025
5800181
Rebase the ck_tile_gemm branch to rocm/355_wip
eliotwang Sep 2, 2025
09ec68f
add triton fp8 gemm support
k50112113 Sep 2, 2025
48dc133
add fused_kv_cache support for llama fp8
k50112113 Sep 3, 2025
fc0dbad
Merge remote-tracking branch 'rocm/0909_rc2' into 355_wip_0909_rc2
Sep 4, 2025
25f843d
Merge branch '355_wip' into ck_tile_gemm
eliotwang Sep 4, 2025
f77bfba
sync with 0909_rc2 changes
Sep 4, 2025
867c2b3
sync with 0909_rc2 changes
Sep 4, 2025
caea443
Integrate mxfp4 MoE native kernels
mawong-amd Aug 15, 2025
176244a
Merge pull request #642 from eliotwang/ck_tile_gemm
charyang-ai Sep 5, 2025
f83d4df
Fix Qwen accuracy fix by not sending quant_config to MOE self.gate RLU
Sep 5, 2025
676b200
clean up
Sep 5, 2025
fb3d439
Merge branch '355_wip' into 355_wip_0909_rc2
Sep 5, 2025
0f826a6
merge artefact correction
Sep 8, 2025
b193a40
updated logic for attn selection with default split attn
Sep 9, 2025
78aa33e
updated logic for attn selection with default split attn and increase…
Sep 9, 2025
7a7123f
[BugFix][AMD][Deepseek] fix a dtype mismatch error for deepseek runni…
KingsleyZhang123 Aug 29, 2025
8 changes: 4 additions & 4 deletions .buildkite/scripts/hardware_ci/run-amd-test.sh
@@ -10,7 +10,7 @@ export PYTHONPATH=".."
echo "--- Confirming Clean Initial State"
while true; do
sleep 3
if grep -q clean /opt/amdgpu/etc/gpu_state; then
if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
echo "GPUs state is \"clean\""
break
fi
@@ -49,18 +49,18 @@ cleanup_docker

echo "--- Resetting GPUs"

echo "reset" > /opt/amdgpu/etc/gpu_state
echo "reset" > ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}

while true; do
sleep 3
if grep -q clean /opt/amdgpu/etc/gpu_state; then
if grep -q clean ${BUILDKITE_AGENT_META_DATA_RESET_TARGET}; then
echo "GPUs state is \"clean\""
break
fi
done

echo "--- Pulling container"
image_name="rocm/vllm-ci:${BUILDKITE_COMMIT}"
image_name="rocm/vllm-ci-private:${BUILDKITE_COMMIT}"
container_name="rocm_${BUILDKITE_COMMIT}_$(tr -dc A-Za-z0-9 < /dev/urandom | head -c 10; echo)"
docker pull "${image_name}"

47 changes: 47 additions & 0 deletions .buildkite/test-template.j2
@@ -0,0 +1,47 @@
{% set docker_image = "public.ecr.aws/q9t5s3a7/vllm-ci-test-repo:$BUILDKITE_COMMIT" %}
{% set docker_image_amd = "rocm/vllm-ci-private:$BUILDKITE_COMMIT" %}
{% set default_working_dir = "vllm/tests" %}
{% set hf_home = "/root/.cache/huggingface" %}

steps:
- label: ":docker: build image"
depends_on: ~
commands:
- "docker build --build-arg max_jobs=16 --tag {{ docker_image_amd }} -f docker/Dockerfile.rocm --build-arg ARG_PYTORCH_ROCM_ARCH='gfx90a;gfx942' --target test --progress plain ."
- "docker push {{ docker_image_amd }}"
key: "amd-build"
env:
DOCKER_BUILDKIT: "1"
retry:
automatic:
- exit_status: -1 # Agent was lost
limit: 5
- exit_status: -10 # Agent was lost
limit: 5
agents:
queue: amd-cpu
soft_fail: false

{% for step in steps %}
{% if step.mirror_hardwares and "amd" in step.mirror_hardwares %}
- label: "AMD: {{ step.label }}"
depends_on:
- "amd-build"
agents:
{% if step.amd_gpus and step.amd_gpus==8%}
queue: amd_gpu
{% elif step.amd_gpus and step.amd_gpus==4%}
queue: amd_gpu
{% elif step.amd_gpus and step.amd_gpus==2%}
queue: amd_gpu
{% else%}
queue: amd_gpu
{% endif%}
commands:
- bash .buildkite/scripts/hardware_ci/run-amd-test.sh "cd {{ (step.working_dir or default_working_dir) | safe }} ; {{ step.command or (step.commands | join(" && ")) | safe }}"
env:
DOCKER_BUILDKIT: "1"
priority: 100
soft_fail: false
{% endif %}
{% endfor %}
54 changes: 0 additions & 54 deletions .github/workflows/reminder_comment.yml

This file was deleted.

14 changes: 6 additions & 8 deletions .github/workflows/scripts/build.sh
@@ -1,22 +1,20 @@
#!/bin/bash
set -eux

python_executable=python$1
cuda_home=/usr/local/cuda-$2
python_executable=python3

# Update paths
PATH=${cuda_home}/bin:$PATH
LD_LIBRARY_PATH=${cuda_home}/lib64:$LD_LIBRARY_PATH

# Install requirements
$python_executable -m pip install -r requirements/build.txt -r requirements/cuda.txt
$python_executable -m pip install -r requirements/rocm.txt

# Limit the number of parallel jobs to avoid OOM
export MAX_JOBS=1
# Make sure release wheels are built for the following architectures
export TORCH_CUDA_ARCH_LIST="7.0 7.5 8.0 8.6 8.9 9.0+PTX"
export PYTORCH_ROCM_ARCH="gfx90a;gfx942"

rm -f "$(which sccache)"

bash tools/check_repo.sh
export MAX_JOBS=32

# Build
$python_executable setup.py bdist_wheel --dist-dir=dist
19 changes: 19 additions & 0 deletions ROCm_performance.md
@@ -0,0 +1,19 @@
# Overview of the optional performance features unique to https://github.com/ROCm/vllm

## Triton attention
The default attention backend on ROCm is the Triton attention kernel. To fall back to the https://github.com/ROCm/flash-attention implementation, set the following environment variable:
`VLLM_USE_TRITON_FLASH_ATTN=0`
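A minimal usage sketch, assuming a standard `vllm serve` launch (the model name is illustrative):

```bash
# Default: Triton attention kernel.
vllm serve meta-llama/Llama-3.1-8B-Instruct

# Fall back to the ROCm flash-attention implementation for this run only.
VLLM_USE_TRITON_FLASH_ATTN=0 vllm serve meta-llama/Llama-3.1-8B-Instruct
```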

## Tunable ops
PyTorch tunable ops are supported.
Set `PYTORCH_TUNABLEOP_ENABLED=1` to enable both runtime tuning and the subsequent use of tuned results. To use only the existing tuned results without tuning newly encountered shapes, also set `PYTORCH_TUNABLEOP_TUNING=0`.
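A possible two-phase workflow, assuming the vLLM latency benchmark as the workload (any model or workload would do):

```bash
# Phase 1: tune GEMM shapes at runtime and record the tuned results.
PYTORCH_TUNABLEOP_ENABLED=1 \
    python benchmarks/benchmark_latency.py --model <model>

# Phase 2: reuse the recorded results; do not tune newly encountered shapes.
PYTORCH_TUNABLEOP_ENABLED=1 PYTORCH_TUNABLEOP_TUNING=0 \
    python benchmarks/benchmark_latency.py --model <model>
```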

## Custom PagedAttention

On ROCm, a custom paged attention kernel is available for better performance; it is controlled by the environment variable `VLLM_USE_ROCM_CUSTOM_PAGED_ATTN=1`.
This variable is currently enabled by default. To fall back to the PagedAttention v2 kernel, set it to 0.
The custom PagedAttention kernel is used for dtypes bf16 and fp16, block-size=16, head-size=128, max context length <= 16k, and a GQA ratio (num_heads//num_kv_heads) between 1 and 16. In all other cases, vLLM falls back to the PagedAttention v2 kernel.
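For example (a sketch; the serving command and model placeholder are assumptions), forcing the PagedAttention v2 path:

```bash
# The custom kernel is on by default; set to 0 to force PagedAttention v2,
# e.g. when a model falls outside the supported dtypes, head size, or GQA ratio.
VLLM_USE_ROCM_CUSTOM_PAGED_ATTN=0 vllm serve <model>
```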

## NCCL Performance environment variable

For MI300X, setting the environment variable `NCCL_MIN_NCHANNELS=112` is expected to improve performance.
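A hedged example of a multi-GPU launch with this setting (the model and tensor-parallel size are illustrative):

```bash
# Raise the minimum number of channels used by NCCL/RCCL collectives on MI300X.
NCCL_MIN_NCHANNELS=112 vllm serve <model> --tensor-parallel-size 8
```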