Hello, I encountered a performance inconsistency while testing the [large-v3](https://huggingface.co/openai/whisper-large-v3) model for speech transcription. Specifically, Whisper.cpp in CPU mode is noticeably slower than the Python implementation on CPU, which is unexpected given that C++ implementations are typically more efficient. I would like to clarify whether this is due to technical limitations, implementation specifics, or a configuration issue.
### Test Setup & Conditions

- Test audio: multiple `.mp3` files (10 rounds per test)
- Model: `large-v3` (same model used across implementations)
- Language: Chinese (`-l zh`)
- Model loading time excluded from runtime measurement
- Each mode logs audio duration, processing time, RTF, and memory usage (see the RTF sketch after this list)
- Whisper.cpp was invoked from Python via `subprocess.Popen` to programmatically measure execution time and capture stdout/stderr
- Same machine and environment were used for all tests (no container/OS-level changes)
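For clarity, RTF here is the real-time factor: processing time divided by audio duration (RTF < 1.0 means faster than real time). Below is a minimal sketch of how it can be computed, assuming `ffprobe` is available to read the audio duration; the helper names are illustrative, not the exact harness used:

```python
import subprocess

def audio_duration_seconds(path: str) -> float:
    """Read the audio duration with ffprobe (illustrative helper)."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-show_entries", "format=duration",
         "-of", "default=noprint_wrappers=1:nokey=1", path],
        capture_output=True, text=True, check=True,
    )
    return float(out.stdout.strip())

def real_time_factor(processing_seconds: float, audio_path: str) -> float:
    """RTF = processing time / audio duration; < 1.0 is faster than real time."""
    return processing_seconds / audio_duration_seconds(audio_path)
```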
### Summary of Results (CPU mode)

| Mode | Avg RTF (excluding load time) | Notes |
|---|---|---|
| Whisper.cpp (CPU) | Significantly higher | with `--no-gpu` |
| Whisper (Python) | Noticeably faster | using `torch` + `whisper` |
Invoking the whisper.cpp CLI from Python:

```python
whisper_cli_path = "/app/whisper.cpp/build/bin/whisper-cli"

# whisper-cli arguments: input file, large-v3 model, Chinese, CPU only
# (audio_file is the path to the input .mp3)
cmd = [
    whisper_cli_path,
    "-f", audio_file,
    "-m", "/app/whisper.cpp/models/ggml-large-v3.bin",
    "-l", "zh",
    "--no-gpu",
]
```
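For completeness, a minimal sketch of how such a command can be timed via `subprocess.Popen` with captured stdout/stderr, matching the setup description above; variable names are illustrative:

```python
import subprocess
import time

# Run whisper-cli and measure wall-clock time, capturing its output.
# Note: this span includes whisper.cpp's own model-loading phase;
# excluding it requires parsing the timings that whisper-cli prints.
start = time.perf_counter()
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
stdout, stderr = proc.communicate()
processing_seconds = time.perf_counter() - start
print(f"whisper-cli exited with {proc.returncode} after {processing_seconds:.2f}s")
```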
Using the Python `whisper` package:

```python
import whisper

# Model is loaded once, outside the timed region
model = whisper.load_model("large-v3", device="cpu")
result = model.transcribe(str(audio_path))
```
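And a corresponding timing sketch for the Python path, with model loading excluded as stated in the setup (assuming `audio_path` is defined):

```python
import time

import whisper

model = whisper.load_model("large-v3", device="cpu")  # load time not measured

start = time.perf_counter()
result = model.transcribe(str(audio_path))
processing_seconds = time.perf_counter() - start
print(f"transcribed in {processing_seconds:.2f}s")
```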
### Suggested Labels
- performance
- question
- help wanted