
Conversation

@KitaitiMakoto (Collaborator) commented Nov 11, 2025

Hi,

I found that the VAD feature can be used separately from ASR, and I've added an API for that to the Ruby bindings.

Thank you for such a useful feature and API.
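
For context, the Ruby API in this PR wraps whisper.cpp's standalone VAD functions. Below is a minimal sketch of that flow against the C API in whisper.h, not the Ruby binding itself; the model path and the timestamp unit are assumptions:

```cpp
#include "whisper.h"

#include <cstdio>
#include <vector>

int main() {
    // Load a VAD model (path is an example; Silero VAD models ship as GGML files).
    struct whisper_vad_context_params cparams = whisper_vad_default_context_params();
    struct whisper_vad_context * vctx =
        whisper_vad_init_from_file_with_params("models/ggml-silero-v5.1.2.bin", cparams);
    if (vctx == nullptr) {
        return 1;
    }

    // 16 kHz mono float PCM; one second of silence stands in for real audio here.
    std::vector<float> pcmf32(16000, 0.0f);

    // Detect speech and convert frame probabilities into segments in one call.
    struct whisper_vad_params vparams = whisper_vad_default_params();
    struct whisper_vad_segments * segs =
        whisper_vad_segments_from_samples(vctx, vparams, pcmf32.data(), (int) pcmf32.size());

    for (int i = 0; i < whisper_vad_segments_n_segments(segs); ++i) {
        // timestamp unit as reported by the API (assumed seconds here)
        printf("speech segment %d: %.2f -> %.2f\n", i,
               whisper_vad_segments_get_segment_t0(segs, i),
               whisper_vad_segments_get_segment_t1(segs, i));
    }

    whisper_vad_free_segments(segs);
    whisper_vad_free(vctx);
    return 0;
}
```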

@jwijffels (Contributor)

A question. Did you manage to run the VAD on a GPU or is the VAD CPU-only?

@KitaitiMakoto (Collaborator, Author)

Thank you for the approval!

@KitaitiMakoto merged commit d9b7613 into ggml-org:master on Nov 13, 2025 (64 of 66 checks passed).
@KitaitiMakoto deleted the ruby-vad branch on November 13, 2025 at 01:15.
@KitaitiMakoto (Collaborator, Author)

> Did you manage to run the VAD on a GPU or is the VAD CPU-only?

I hadn't paid attention to that. It seems the CPU is used:

whisper.cpp/src/whisper.cpp

Lines 4658 to 4660 in a1867e0

// TODO: GPU VAD is forced disabled until the performance is improved
//whisper_context_params.use_gpu = vctx->params.use_gpu;
whisper_context_params.use_gpu = false;
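
In other words, even a caller that opts in via the VAD context params currently gets the CPU path, since the flag is overridden at init time. A small sketch, assuming the `whisper_vad_context_params` fields from whisper.h:

```cpp
struct whisper_vad_context_params cparams = whisper_vad_default_context_params();
cparams.use_gpu = true; // requested, but per the snippet above whisper.cpp forces use_gpu = false
struct whisper_vad_context * vctx =
    whisper_vad_init_from_file_with_params("models/ggml-silero-v5.1.2.bin", cparams);
// the VAD model's internal whisper context is created with use_gpu = false regardless
```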

@jwijffels (Contributor)

> I hadn't paid attention to that. It seems the CPU is used: […]

Thanks for the answer and the link to the code. Indeed CPU-only.

bygreencn added a commit to bygreencn/whisper.cpp that referenced this pull request Nov 15, 2025
# By Georgi Gerganov (80) and others
# Via GitHub
* ggerganov/master: (441 commits)
  ruby : VAD separately from ASR (ggml-org#3518)
  sync : llama.cpp
  sync : ggml
  vulkan: iGPU memory reporting fix (llama/17110)
  vulkan: fix mmq out of bounds reads (llama/17108)
  vulkan: fuse mul_mat_id + mul (llama/17095)
  metal : retain src and dst buffers during async ops (llama/17101)
  vulkan: Use spec constants for conv2d s/d/p and kernel W/H (llama/16978)
  Revert "CUDA: add expert reduce kernel (ggml/16857)" (llama/17100)
  CUDA: skip fusion for repeating adds in bias (llama/17080)
  vulkan: Increase BK to 32; use BK/4 for non-CM mul_mm.comp (llama/16636)
  ggml: disable vxe for cross-compilation by default (llama/16966)
  vulkan: fuse rms_norm + mul + rope (+ view + set_rows) (llama/16977)
  vulkan: Fix test-thread-safety crashes (llama/17024)
  CUDA: fix MMQ stream-k fixup ne1 indices (llama/17089)
  ggml webgpu: faster matrix multiplication/matrix-vector multiplication (llama/17031)
  CUDA: properly handle nb00=nb02 case for cpy (llama/17081)
  vulkan : refactor buffer handling in vk_op_f32 (llama/16840)
  CUDA: fix should_use_mmvf for ne11 == 1 (llama/17085)
  Revert "ggml-cpu: detect correct cpu flags for arm64 (llama/16229) (#16239)" (llama/17084)
  ...

# Conflicts:
#	examples/CMakeLists.txt