Ubuntu 25.04, a manually installed ROCm 7.0.1, and whisper.cpp 1.8.0 work on a Strix Halo Ryzen AI Max+ 395 APU:
$ mkdir build ; cd build
$ cmake .. \
-DGPU_TARGETS="gfx1151" \
-DGGML_HIP=ON \
-DCMAKE_C_COMPILER=/opt/rocm/bin/amdclang \
-DCMAKE_CXX_COMPILER=/opt/rocm/bin/amdclang++ \
-DCMAKE_PREFIX_PATH="/opt/rocm" -DGGML_ROCM=1
$ cmake --build . --config Release -j
then:
$ time bin/whisper-cli -m ../models/ggml-base.en.bin -f really-long-audio-file.mp3
whisper_init_from_file_with_params_no_state: loading model from '../models/ggml-base.en.bin'
whisper_init_with_params_no_state: use gpu = 1
whisper_init_with_params_no_state: flash attn = 1
whisper_init_with_params_no_state: gpu_device = 0
whisper_init_with_params_no_state: dtw = 0
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 1 ROCm devices:
Device 0: AMD Radeon Graphics, gfx1151 (0x1151), VMM: no, Wave Size: 32
whisper_init_with_params_no_state: devices = 2
whisper_init_with_params_no_state: backends = 2
whisper_model_load: loading model
whisper_model_load: n_vocab = 51864
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 512
whisper_model_load: n_audio_head = 8
whisper_model_load: n_audio_layer = 6
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 512
whisper_model_load: n_text_head = 8
whisper_model_load: n_text_layer = 6
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 2 (base)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: n_langs = 99
whisper_model_load: ROCm0 total size = 147.37 MB
whisper_model_load: model size = 147.37 MB
whisper_backend_init_gpu: using ROCm0 backend
whisper_init_state: kv self size = 6.29 MB
whisper_init_state: kv cross size = 18.87 MB
whisper_init_state: kv pad size = 3.15 MB
whisper_init_state: compute buffer (conv) = 17.24 MB
whisper_init_state: compute buffer (encode) = 23.09 MB
whisper_init_state: compute buffer (cross) = 4.66 MB
whisper_init_state: compute buffer (decode) = 97.29 MB
system_info: n_threads = 4 / 32 | WHISPER : COREML = 0 | OPENVINO = 0 | ROCm : NO_VMM = 1 | PEER_MAX_BATCH_SIZE = 128 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | BMI2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | OPENMP = 1 | REPACK = 1 |
main: processing '/path/to/really-long-audio-file.mp3' (32702798 samples, 2043.9 sec), 4 threads, 1 processors, 5 beams + best of 5, lang = en, task = transcribe, timestamps = 1 ...
... transcription here ...
whisper_print_timings: load time = 143.40 ms
whisper_print_timings: fallbacks = 1 p / 0 h
whisper_print_timings: mel time = 608.37 ms
whisper_print_timings: sample time = 8784.25 ms / 33747 runs ( 0.26 ms per run)
whisper_print_timings: encode time = 2269.30 ms / 95 runs ( 23.89 ms per run)
whisper_print_timings: decode time = 1165.29 ms / 639 runs ( 1.82 ms per run)
whisper_print_timings: batchd time = 19240.58 ms / 32629 runs ( 0.59 ms per run)
whisper_print_timings: prompt time = 900.82 ms / 19604 runs ( 0.05 ms per run)
whisper_print_timings: total time = 35210.69 ms
real 0m35.324s
user 0m52.344s
sys 0m16.786s
Using the time command, I got it to run on my GPU at about 1 second per minute of audio (35.3 s of wall-clock time for 2043.9 s of audio).
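For anyone who wants to check that figure, here is a minimal sketch of the arithmetic. The two input numbers are copied from the log above (the audio duration from the "main: processing" line and the "real" value reported by time); the variable names are just illustrative.

```python
# Values copied from the log output above.
audio_seconds = 2043.9   # audio duration from the "main: processing" line
wall_seconds = 35.324    # "real" time reported by the time command

# Wall-clock seconds spent per minute of audio.
sec_per_audio_minute = wall_seconds / (audio_seconds / 60.0)

# How many times faster than real time the transcription ran.
speedup = audio_seconds / wall_seconds

print(f"{sec_per_audio_minute:.2f} s per minute of audio")  # 1.04 s per minute of audio
print(f"{speedup:.1f}x faster than real time")              # 57.9x faster than real time
```

This is for the base.en model with the default 5 beams shown in the log; other models or settings would give different numbers.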
amdgpu_top screenshot while processing:

(Side note: the VRAM usage of over 13 GB in the screenshot is because I have an LLM loaded in memory at the same time.)