Description
Name and Version
./llama-cli --version
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 ROCm devices:
Device 0: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
version: 6969 (aa37417)
built with AOMP_STANDALONE_22.0_roc7-1 clang version 22.0.0_AOMP_STANDALONE_22.0_roc7-1 (https://github.com/ROCm/llvm-project 5e5ac6bb724fe52fe05a96cd4fee1aea0142d40c) for x86_64-unknown-linux-gnu
Operating systems
Linux
GGML backends
HIP
Hardware
EPYC 7B13 + 3x Radeon Pro VII
Models
Qwen3-VL-30B-A3B-Instruct Q8_0
Qwen3-VL-32B-Instruct Q8_0
Qwen3-VL-235B-A22B-Instruct Q8_0
GLM-4.6 Q5_K_M
Problem description & steps to reproduce
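Token generation speed drops sharply with b6969, while VRAM usage increases. In the Qwen3-VL-32B benchmarks below, prompt processing (pp512) is essentially unchanged, but tg128 falls from 15.32 to 8.36 t/s with the default split mode and from 22.19 to 7.44 t/s with -sm row.
Every model listed above is affected, to varying degrees.
Toggling flash attention (-fa on/off) does not change the result.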
b6968 VRAM usage (ROCm SMI, concise info)
| Device | Node | IDs (DID, GUID) | Temp (Edge) | Power (Socket) | Partitions (Mem, Compute, ID) | SCLK | MCLK | Fan | Perf | PwrCap | VRAM% | GPU% |
| -----: | ---: | --------------- | ----------: | -------------: | ----------------------------- | ---: | ---: | --: | ---- | -----: | ----: | ---: |
| 0 | 3 | 0x66a1, 28047 | 66.0°C | 128.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 80% | 99% |
| 1 | 1 | 0x66a1, 40382 | 54.0°C | 132.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 75% | 99% |
| 2 | 2 | 0x66a1, 52861 | 67.0°C | 133.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 77% | 99% |
b6969 VRAM usage (ROCm SMI, concise info)
| Device | Node | IDs (DID, GUID) | Temp (Edge) | Power (Socket) | Partitions (Mem, Compute, ID) | SCLK | MCLK | Fan | Perf | PwrCap | VRAM% | GPU% |
| -----: | ---: | --------------- | ----------: | -------------: | ----------------------------- | ---: | ---: | --: | ---- | -----: | ----: | ---: |
| 0 | 3 | 0x66a1, 28047 | 63.0°C | 139.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 91% | 100% |
| 1 | 1 | 0x66a1, 40382 | 52.0°C | 121.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 86% | 100% |
| 2 | 2 | 0x66a1, 52861 | 62.0°C | 129.0W | N/A, N/A, 0 | 1654Mhz | 1000Mhz | 30.59% | manual | 140.0W | 88% | 100% |
First Bad Commit
b6969
Relevant log output
b6968 llama-bench
llama-bench -m /home/user/text-generation-webui/models/Qwen3-vl-32b/Qwen3-VL-32B-Instruct-UD-Q8_K_XL.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 ROCm devices:
Device 0: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | pp512 | 110.06 ± 0.04 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | tg128 | 15.32 ± 0.01 |
build: 5b180c3d6 (6968)
llama-bench -m /home/user/text-generation-webui/models/Qwen3-vl-32b/Qwen3-VL-32B-Instruct-UD-Q8_K_XL.gguf -sm row
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 ROCm devices:
Device 0: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | sm | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | --------------: | -------------------: |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | row | pp512 | 186.17 ± 0.58 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | row | tg128 | 22.19 ± 0.02 |
build: 5b180c3d6 (6968)
b6969 llama-bench
llama-bench -m /home/user/text-generation-webui/models/Qwen3-vl-32b/Qwen3-VL-32B-Instruct-UD-Q8_K_XL.gguf
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 ROCm devices:
Device 0: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | --------------: | -------------------: |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | pp512 | 110.57 ± 0.05 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | tg128 | 8.36 ± 0.01 |
build: aa374175c (6969)
llama-bench -m /home/user/text-generation-webui/models/Qwen3-vl-32b/Qwen3-VL-32B-Instruct-UD-Q8_K_XL.gguf -sm row
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 3 ROCm devices:
Device 0: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 1: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
Device 2: AMD Radeon Pro VII, gfx906:sramecc+:xnack- (0x906), VMM: no, Wave Size: 64
| model | size | params | backend | ngl | sm | test | t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ----: | --------------: | -------------------: |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | row | pp512 | 184.50 ± 1.41 |
| qwen3vl 32B Q8_0 | 36.76 GiB | 32.76 B | ROCm | 99 | row | tg128 | 7.44 ± 0.00 |
build: aa374175c (6969)
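To put the regression in relative terms, here is a minimal sketch that computes the percentage change between the b6968 and b6969 runs from the mean t/s values in the llama-bench tables above (the ± spread is ignored). It assumes only POSIX sh and awk; the function name pct_change is illustrative and not part of llama.cpp.

```sh
#!/bin/sh
# Relative throughput change between two llama-bench runs.
# Arguments: label, mean t/s on the old build, mean t/s on the new build.
# The "± stddev" part of the llama-bench table is ignored.
pct_change() {
  awk -v label="$1" -v before="$2" -v after="$3" 'BEGIN {
    printf "%s: %.2f -> %.2f t/s (%+.1f%%)\n", label, before, after, (after - before) / before * 100
  }'
}

# Mean t/s values copied from the b6968 (old) and b6969 (new) tables
pct_change "pp512" 110.06 110.57
pct_change "tg128" 15.32 8.36
pct_change "pp512 -sm row" 186.17 184.50
pct_change "tg128 -sm row" 22.19 7.44
```

With these numbers, pp512 moves by less than 1% in either split mode, while tg128 drops by about 45% with the default split mode and about 66% with -sm row, so in these runs the slowdown is specific to token generation.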