[Bug]: Local STT crashes with RuntimeError on systems without system-wide CUDA 12 libs (WSL2, venv-only CUDA) #2885

@hiro-volforto

Description

Bug Description

Environment

  • OS: Ubuntu 24.04 on WSL2 (Windows 11)
  • GPU: NVIDIA RTX 4080
  • Hermes: installed via Pinokio
  • ctranslate2: 4.7.1
  • faster-whisper: local provider, model: base

Bug

When using stt.provider: local, sending a voice message crashes with:

RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

Full traceback points to transcription_tools.py line 283:

_local_model = WhisperModel(model_name, device="auto", compute_type="auto")

Root Cause

device="auto" tells ctranslate2 to probe for CUDA at model init time.
On systems where CUDA libs exist inside Python wheels (e.g. torch's bundled
nvidia-cublas-cu12) but are NOT on LD_LIBRARY_PATH, ctranslate2 crashes hard
instead of falling back gracefully to CPU.

This affects WSL2 setups where GPU works for other tools (Jan.ai, Wan2GP via
bundled torch CUDA) but CUDA libs are not system-visible.
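One way to get the graceful CPU fallback described above is to wrap model init in a try/except. This is a sketch, not current Hermes code: `load_with_fallback` and its `load_fn` parameter are hypothetical names (`load_fn` stands in for `faster_whisper.WhisperModel`, whose `device`/`compute_type` keywords are real), so the logic is testable without CUDA:

```python
# Hypothetical fallback loader (not current Hermes code).
# `load_fn` stands in for faster_whisper.WhisperModel so the
# fallback logic can be exercised without a GPU.

def load_with_fallback(load_fn, model_name):
    """Try GPU init first; degrade to CPU int8 if CUDA libs are missing."""
    try:
        return load_fn(model_name, device="cuda", compute_type="float16")
    except (RuntimeError, OSError):
        # e.g. "Library libcublas.so.12 is not found or cannot be loaded"
        return load_fn(model_name, device="cpu", compute_type="int8")
```

With faster-whisper this would be called as `load_with_fallback(WhisperModel, model_name)`.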

Workaround

Manually patching line 283 of transcription_tools.py:

_local_model = WhisperModel(model_name, device="cpu", compute_type="int8")

This confirms the fix works, but the manual patch is overwritten on every Hermes update.
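An update-proof variant of this workaround, assuming torch's bundled nvidia-cublas-cu12 wheel is installed in the same venv, is to preload the wheel's libcublas before ctranslate2 tries to dlopen it. This is a sketch using the standard nvidia wheel layout; the helper name is hypothetical:

```python
import ctypes
import os
import sysconfig

def preload_bundled_cublas():
    """Preload libcublas.so.12 from the venv's nvidia wheel, if present."""
    lib = os.path.join(
        sysconfig.get_paths()["purelib"],  # this venv's site-packages
        "nvidia", "cublas", "lib", "libcublas.so.12",
    )
    if os.path.exists(lib):
        # RTLD_GLOBAL makes the symbols visible to ctranslate2's own dlopen.
        ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
    return lib
```

Running this before the WhisperModel init would let device="auto" find CUDA without system-wide libs, though it still does not fix the missing fallback.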

Proposed Fix

Two complementary changes:

1. Safer default — change device="auto" to device="cpu" as the default.
GPU Whisper requires explicit CUDA system setup that cannot be assumed,
especially on WSL2 or venv-isolated environments. For short voice messages
(typical Telegram/Discord use), CPU Whisper with int8 quantization is fast
enough that GPU offers no meaningful UX improvement.

2. Expose device in config.yaml — add device and compute_type keys
under stt.local so users who do have proper system CUDA can opt into GPU:

stt:
  provider: local
  local:
    model: base
    device: cpu        # or cuda
    compute_type: int8  # or float16 for GPU
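On the implementation side, the new keys could be resolved with CPU-safe defaults. A minimal sketch (function name is an assumption; key names mirror the YAML above):

```python
# Hypothetical config plumbing for stt.local (names assumed, not
# Hermes's actual loader). Defaults stay CPU-safe, never "auto".

def stt_local_options(config):
    """Resolve stt.local options from a parsed config dict."""
    local = (config.get("stt") or {}).get("local") or {}
    return {
        "model": local.get("model", "base"),
        "device": local.get("device", "cpu"),
        "compute_type": local.get("compute_type", "int8"),
    }
```

The hard-coded call in transcription_tools.py would then become WhisperModel(opts["model"], device=opts["device"], compute_type=opts["compute_type"]).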

Steps to Reproduce

  1. Install Hermes Agent via Pinokio on WSL2 (Ubuntu 24.04)
  2. Configure stt.provider: local in config.yaml
  3. Send a voice message via Telegram

Expected Behavior

The voice message is transcribed. If CUDA is unavailable, transcription falls back to CPU silently.

Actual Behavior

RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

STT fails completely; there is no CPU fallback.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp), Other

Messaging Platform (if gateway-related)

Telegram

Operating System

Ubuntu 24.04 on WSL2 (Windows 11)

Python Version

3.11.15

Hermes Version

v0.4.0 (2026.3.23)

Relevant Logs / Traceback

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Labels

bug (Something isn't working)
