1.8.0 working with ROCm 7.0.1 on Strix Halo AMD Ryzen AI Max+ 395 #3459

On Ubuntu 25.04 with a manually installed ROCm 7.0.1, whisper.cpp 1.8.0 builds and runs on a Strix Halo Ryzen AI Max+ 395 APU (`gfx1151`). From the repository root:

```bash
$ mkdir build ; cd build
$ cmake .. \
  -DGPU_TARGETS="gfx1151" \
  -DGGML_HIP=ON \
  -DCMAKE_C_COMPILER=/opt/rocm/bin/amdclang \
  -DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ \
  -DCMAKE_PREFIX_PATH="/opt/rocm"
$ cmake --build . --config Release -j
```
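`GPU_TARGETS` has to match the ISA name the GPU reports. For other hardware, a small sketch like the one below can read it from `rocminfo` (shipped with ROCm under `/opt/rocm/bin`). The helper name `detect_gfx_target` is hypothetical, and it assumes GPU agents show up as `Name: gfx…` lines, as in typical `rocminfo` output; with several GPUs it returns only the first one.

```bash
# Hypothetical helper: print the first gfx ISA name found in `rocminfo` output on stdin.
# Assumes GPU agents are listed as lines like "  Name:                    gfx1151".
# Lines such as "Name: amdgcn-amd-amdhsa--gfx1151" do not match, because "gfx"
# must follow "Name:" directly (after whitespace).
detect_gfx_target() {
  awk '/Name:[[:space:]]*gfx/ { print $2; exit }'
}
```

It can then feed the configure step, e.g. `-DGPU_TARGETS="$(rocminfo | detect_gfx_target)"`; on this machine it should print `gfx1151`, the same target the `ggml_cuda_init` line in the log reports.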

Then run it on a long audio file:

```bash
$ time bin/whisper-cli -m ../models/ggml-base.en.bin -f really-long-audio-file.mp3
whisper_init_from_file_with_params_no_state: loading model from '../models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu    = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw        = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ:    no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
  Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
whisper_init_with_params_no_state: devices    = 2
whisper_init_with_params_no_state: backends   = 2
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head  = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 512
whisper_model_load: n_text_head   = 8
whisper_model_load: n_text_layer  = 6
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs       = 99
whisper_model_load:        ROCm0 total size =   147.37 MB
whisper_model_load: model size    =  147.37 MB
whisper_backend_init_gpu: using ROCm0 backend
whisper_init_state: kv self size  =    6.29 MB
whisper_init_state: kv cross size =   18.87 MB
whisper_init_state: kv pad  size  =    3.15 MB
whisper_init_state: compute buffer (conv)   =   17.24 MB
whisper_init_state: compute buffer (encode) =   23.09 MB
whisper_init_state: compute buffer (cross)  =    4.66 MB
whisper_init_state: compute buffer (decode) =   97.29 MB

system_info: n_threads = 4 / 32 | WHISPER : COREML = 0 | OPENVINO = 0 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | OPENMP = 1 | REPACK = 1 |

main: processing '/path/to/really-long-audio-file.mp3' (32702798 samples, 2043.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...

... transcription here ...

whisper_print_timings:     load time =   143.40 ms
whisper_print_timings:     fallbacks =   1 p /   0 h
whisper_print_timings:      mel time =   608.37 ms
whisper_print_timings:   sample time =  8784.25 ms / 33747 runs (     0.26 ms per run)
whisper_print_timings:   encode time =  2269.30 ms /    95 runs (    23.89 ms per run)
whisper_print_timings:   decode time =  1165.29 ms /   639 runs (     1.82 ms per run)
whisper_print_timings:   batchd time = 19240.58 ms / 32629 runs (     0.59 ms per run)
whisper_print_timings:   prompt time =   900.82 ms / 19604 runs (     0.05 ms per run)
whisper_print_timings:    total time = 35210.69 ms

real    0m35.324s
user    0m52.344s
sys     0m16.786s
```
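The `whisper_print_timings` summary is easier to read as shares of the total. This is a minimal sketch, assuming the `<stage> time = <value> ms` line format shown in the log above; the helper name `timing_breakdown` is hypothetical. With this run's numbers, `batchd` alone accounts for about 55% of the 35.2 s total.

```bash
# Hypothetical helper: read a whisper-cli log on stdin and print each
# whisper_print_timings stage with its share of "total time".
# Assumes lines shaped like "whisper_print_timings:   encode time =  2269.30 ms / ...".
timing_breakdown() {
  awk '
    /whisper_print_timings:/ && / time = / {
      split($0, parts, "=")          # parts[1]: prefix + stage name, parts[2]: value text
      split(parts[1], lhs, " ")      # lhs[2] is the stage name, e.g. "encode"
      ms = parts[2] + 0              # awk converts the leading number, e.g. 2269.30
      if (lhs[2] == "total") total = ms
      else { name[++n] = lhs[2]; val[n] = ms }
    }
    END {
      if (total <= 0) exit 1         # no "total time" line was found
      for (i = 1; i <= n; i++)
        printf "%-7s %9.2f ms  %5.1f%%\n", name[i], val[i], 100 * val[i] / total
    }
  '
}
```

Since whisper.cpp writes these log lines to stderr, the usage would be something like `bin/whisper-cli -m ../models/ggml-base.en.bin -f really-long-audio-file.mp3 2>&1 | timing_breakdown`.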

Timed with `time`, the run took 35.3 s of wall-clock time on the GPU for 2043.9 s (about 34 minutes) of audio, i.e. roughly 1 second per minute of audio, or about 58x real time.
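The rate can be recomputed from the log: whisper.cpp works on 16 kHz audio, so the `32702798 samples` in the `main: processing` line correspond to 2043.9 s. A small sketch of that arithmetic follows; the helper name `rtf_report` is hypothetical, and it takes the sample count plus the `real` seconds reported by `time`.

```bash
# Hypothetical helper: rtf_report SAMPLES WALL_SECONDS
# Assumes 16 kHz input, which is the sample rate whisper.cpp processes audio at.
rtf_report() {
  awk -v n="$1" -v t="$2" 'BEGIN {
    audio = n / 16000                                   # audio length in seconds
    printf "audio: %.1f min\n", audio / 60
    printf "processing: %.2f s per audio minute\n", t / (audio / 60)
    printf "speed: %.1fx real time\n", audio / t
  }'
}

rtf_report 32702798 35.324
# audio: 34.1 min
# processing: 1.04 s per audio minute
# speed: 57.9x real time
```

Using `real` rather than whisper's own `total time` (35.2 s) also counts process start-up, but here the difference is only about 0.1 s.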

`amdgpu_top` screenshot while processing:

<img width="1247" height="1127" alt="Image" src="https://github.com/user-attachments/assets/7f7e4262-d514-4d4b-a70c-05fadd04d74a" />

(Side note: the VRAM usage above 13 GB in the screenshot is because I had an LLM loaded in memory at the same time.)
