## Bug Description

### Environment

- OS: Ubuntu 24.04 on WSL2 (Windows 11)
- GPU: NVIDIA RTX 4080
- Hermes: installed via Pinokio
- ctranslate2: 4.7.1
- faster-whisper: local provider, model: `base`
### Bug

When using `stt.provider: local`, sending a voice message crashes with:

```
RuntimeError: Library libcublas.so.12 is not found or cannot be loaded
```

The full traceback points to `transcription_tools.py` line 283:

```python
_local_model = WhisperModel(model_name, device="auto", compute_type="auto")
```
### Root Cause

`device="auto"` tells ctranslate2 to probe for CUDA at model-init time. On systems where CUDA libraries exist inside Python wheels (e.g. torch's bundled `nvidia-cublas-cu12`) but are **not** on `LD_LIBRARY_PATH`, ctranslate2 crashes hard instead of falling back gracefully to CPU.

This affects WSL2 setups where the GPU works for other tools (Jan.ai, Wan2GP via bundled torch CUDA) but the CUDA libraries are not system-visible.
### Workaround

Manually patching line 283 of `transcription_tools.py`:

```python
_local_model = WhisperModel(model_name, device="cpu", compute_type="int8")
```

confirms the fix works, but the patch is overwritten on every Hermes update.
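An alternative workaround that survives updates is to point `LD_LIBRARY_PATH` at the CUDA libraries already bundled in pip wheels before Python starts (the dynamic loader only reads `LD_LIBRARY_PATH` at process startup, so exporting it in the shell or in Pinokio's launch script is what works, not setting `os.environ` at runtime). A hypothetical stdlib-only helper to locate those wheel directories:

```python
# Hypothetical helper (not Hermes code): find CUDA libraries shipped inside
# pip wheels such as nvidia-cublas-cu12 / nvidia-cudnn-cu12, so the shell can
# export LD_LIBRARY_PATH before launching Hermes.
import importlib.util

def wheel_cuda_lib_dirs() -> list:
    dirs = []
    for mod in ("nvidia.cublas.lib", "nvidia.cudnn.lib"):
        try:
            spec = importlib.util.find_spec(mod)
        except ImportError:
            continue  # wheel not installed at all
        if spec is not None and spec.submodule_search_locations:
            dirs.extend(spec.submodule_search_locations)
    return dirs

if __name__ == "__main__":
    found = wheel_cuda_lib_dirs()
    if found:
        # Emit an export line suitable for eval in the launch script.
        print('export LD_LIBRARY_PATH="' + ":".join(found) + ':$LD_LIBRARY_PATH"')
    else:
        print("# no wheel-bundled CUDA libs found; CPU is the only option")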
### Proposed Fix

Two complementary changes:

1. **Safer default:** change `device="auto"` to `device="cpu"` as the default. GPU Whisper requires explicit system CUDA setup that cannot be assumed, especially on WSL2 or venv-isolated environments. For short voice messages (typical Telegram/Discord use), CPU Whisper with int8 quantization is fast enough that GPU offers no meaningful UX improvement.
2. **Expose `device` in `config.yaml`:** add `device` and `compute_type` keys under `stt.local` so users who do have proper system CUDA can opt into GPU:

   ```yaml
   stt:
     provider: local
     local:
       model: base
       device: cpu        # or cuda
       compute_type: int8 # or float16 for GPU
   ```
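The two changes above could be combined with a graceful fallback. A sketch of how `transcription_tools.py` might consume the proposed keys; the config names follow the proposal and the model factory is injected so the logic is testable without CUDA (this is an illustration, not Hermes' actual API):

```python
# Sketch: read stt.local.{model,device,compute_type} with safe defaults and
# retry on CPU if CUDA init fails. `factory` is expected to be
# faster_whisper.WhisperModel in real use (injected here for testability).
from typing import Any, Callable

def init_stt_model(cfg: dict, factory: Callable[..., Any]) -> Any:
    local = cfg.get("stt", {}).get("local", {})
    model = local.get("model", "base")
    device = local.get("device", "cpu")        # proposed safe default: CPU
    compute = local.get("compute_type", "int8")
    try:
        return factory(model, device=device, compute_type=compute)
    except (RuntimeError, OSError):
        if device == "cpu":
            raise  # nothing left to fall back to
        # CUDA init failed (e.g. libcublas.so.12 missing): retry on CPU.
        return factory(model, device="cpu", compute_type="int8")
```

With this shape, a user who opts into `device: cuda` on a box without visible CUDA libraries still gets a working CPU transcription instead of a hard crash.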
## Steps to Reproduce

1. Install Hermes Agent via Pinokio on WSL2 (Ubuntu 24.04).
2. Configure `stt.provider: local` in `config.yaml`.
3. Send a voice message via Telegram.
## Expected Behavior

The voice message is transcribed. If CUDA is unavailable, Hermes falls back to CPU silently.
## Actual Behavior

```
RuntimeError: Library libcublas.so.12 is not found or cannot be loaded
```

STT fails completely, with no fallback.
## Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp), Other

## Messaging Platform (if gateway-related)

Telegram

## Operating System

Ubuntu 24.04 on WSL2 (Windows 11)

## Python Version

3.11.15

## Hermes Version

v0.4.0 (2026.3.23)
## Relevant Logs / Traceback

## Root Cause Analysis (optional)

No response

## Proposed Fix (optional)

No response
## Are you willing to submit a PR for this?

- [x] I'd like to fix this myself and submit a PR