[Bug]: Local STT crashes with RuntimeError on systems without system-wide CUDA 12 libs (WSL2, venv-only CUDA) #2885

@hiro-volforto

Description

Bug Description

Environment

  • OS: Ubuntu 24.04 on WSL2 (Windows 11)
  • GPU: NVIDIA RTX 4080
  • Hermes: installed via Pinokio
  • ctranslate2: 4.7.1
  • faster-whisper: local provider, model: base

Bug

When using stt.provider: local, sending a voice message crashes with:

RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

Full traceback points to transcription_tools.py line 283:

_local_model = WhisperModel(model_name, device="auto", compute_type="auto")

Root Cause

device="auto" tells ctranslate2 to probe for CUDA at model init time.
On systems where CUDA libs exist inside Python wheels (e.g. torch's bundled
nvidia-cublas-cu12) but are NOT on LD_LIBRARY_PATH, ctranslate2 crashes hard
instead of falling back gracefully to CPU.

This affects WSL2 setups where GPU works for other tools (Jan.ai, Wan2GP via
bundled torch CUDA) but CUDA libs are not system-visible.
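One way to get the graceful CPU fallback described above is to wrap model init in a try/except. This is a sketch, not current Hermes code: `load_with_fallback` and its `load_fn` parameter are hypothetical names (`load_fn` stands in for `faster_whisper.WhisperModel`, whose `device`/`compute_type` keywords are real), so the logic is testable without CUDA:

```python
# Hypothetical fallback loader (not current Hermes code).
# `load_fn` stands in for faster_whisper.WhisperModel so the
# fallback logic can be exercised without a GPU.

def load_with_fallback(load_fn, model_name):
    """Try GPU init first; degrade to CPU int8 if CUDA libs are missing."""
    try:
        return load_fn(model_name, device="cuda", compute_type="float16")
    except (RuntimeError, OSError):
        # e.g. "Library libcublas.so.12 is not found or cannot be loaded"
        return load_fn(model_name, device="cpu", compute_type="int8")
```

With faster-whisper this would be called as `load_with_fallback(WhisperModel, model_name)`.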

Workaround

Manually patching line 283 of transcription_tools.py:

_local_model = WhisperModel(model_name, device="cpu", compute_type="int8")

This confirms the fix works, but the manual patch is overwritten on every Hermes update.
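An update-proof variant of this workaround, assuming torch's bundled nvidia-cublas-cu12 wheel is installed in the same venv, is to preload the wheel's libcublas before ctranslate2 tries to dlopen it. This is a sketch using the standard nvidia wheel layout; the helper name is hypothetical:

```python
import ctypes
import os
import sysconfig

def preload_bundled_cublas():
    """Preload libcublas.so.12 from the venv's nvidia wheel, if present."""
    lib = os.path.join(
        sysconfig.get_paths()["purelib"],  # this venv's site-packages
        "nvidia", "cublas", "lib", "libcublas.so.12",
    )
    if os.path.exists(lib):
        # RTLD_GLOBAL makes the symbols visible to ctranslate2's own dlopen.
        ctypes.CDLL(lib, mode=ctypes.RTLD_GLOBAL)
    return lib
```

Running this before the WhisperModel init would let device="auto" find CUDA without system-wide libs, though it still does not fix the missing fallback.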

Proposed Fix

Two complementary changes:

1. Safer default — change device="auto" to device="cpu" as the default.
GPU Whisper requires explicit CUDA system setup that cannot be assumed,
especially on WSL2 or venv-isolated environments. For short voice messages
(typical Telegram/Discord use), CPU Whisper with int8 quantization is fast
enough that GPU offers no meaningful UX improvement.

2. Expose device in config.yaml — add device and compute_type keys
under stt.local so users who do have proper system CUDA can opt into GPU:

stt:
  provider: local
  local:
    model: base
    device: cpu        # or cuda
    compute_type: int8  # or float16 for GPU
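On the implementation side, the new keys could be resolved with CPU-safe defaults. A minimal sketch (function name is an assumption; key names mirror the YAML above):

```python
# Hypothetical config plumbing for stt.local (names assumed, not
# Hermes's actual loader). Defaults stay CPU-safe, never "auto".

def stt_local_options(config):
    """Resolve stt.local options from a parsed config dict."""
    local = (config.get("stt") or {}).get("local") or {}
    return {
        "model": local.get("model", "base"),
        "device": local.get("device", "cpu"),
        "compute_type": local.get("compute_type", "int8"),
    }
```

The hard-coded call in transcription_tools.py would then become WhisperModel(opts["model"], device=opts["device"], compute_type=opts["compute_type"]).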

Steps to Reproduce

  1. Install Hermes Agent via Pinokio on WSL2 (Ubuntu 24.04)
  2. Configure stt.provider: local in config.yaml
  3. Send a voice message via Telegram

Expected Behavior

The voice message is transcribed. If CUDA is unavailable, transcription falls back to CPU silently.

Actual Behavior

RuntimeError: Library libcublas.so.12 is not found or cannot be loaded

STT fails completely; there is no CPU fallback.

Affected Component

Gateway (Telegram/Discord/Slack/WhatsApp), Other

Messaging Platform (if gateway-related)

Telegram

Operating System

Ubuntu 24.04 on WSL2 (Windows 11)

Python Version

3.11.15

Hermes Version

v0.4.0 (2026.3.23)

Relevant Logs / Traceback

Root Cause Analysis (optional)

No response

Proposed Fix (optional)

No response

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR

Labels

bug (Something isn't working)
