feat(stt): add base_url and language support for local whisper servers (#616)
Add base_url and language fields to SttConfig, allowing the Whisper STT
provider to target OpenAI-compatible local servers (e.g. whisper.cpp)
without requiring an OpenAI API key. Pass language parameter in
transcription requests for accurate non-English speech recognition.
Preserve voice message attachments through drain_channel buffering and
add configurable language support for the candle-whisper backend.
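
As a rough illustration of what this change enables, an `[llm.stt]` section pointing at a local server might look like the fragment below. The `provider`, `model`, `base_url`, and `language` keys come from this commit; the URL itself (host, port, and whether a `/v1` suffix belongs in it) is an assumption and depends on how the local server is started.

```toml
[llm.stt]
provider = "whisper"
model = "whisper-1"
# Assumed address of a local OpenAI-compatible server (e.g. whisper.cpp); adjust to your setup.
base_url = "http://localhost:8080/v1"
# ISO-639-1 code such as "ru", "en", or "de", or "auto" for automatic detection.
language = "ru"
```

The same two values can presumably be supplied through the `ZEPH_STT_BASE_URL` and `ZEPH_STT_LANGUAGE` environment variable overrides listed in the changelog entry instead of the config file.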
CHANGELOG.md: 5 additions, 0 deletions
@@ -7,6 +7,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 ## [Unreleased]
 
 ### Added
+- `base_url` and `language` fields in `[llm.stt]` config for OpenAI-compatible local whisper servers (e.g. whisper.cpp)
+- `ZEPH_STT_BASE_URL` and `ZEPH_STT_LANGUAGE` environment variable overrides
+- Whisper API provider now passes `language` parameter for accurate non-English transcription
+- Documentation for whisper.cpp server setup with Metal acceleration on macOS
 - Per-sub-provider `base_url` and `embedding_model` overrides in orchestrator config
 - Full orchestrator example with cloud + local + STT in default.toml
 - All previously undocumented config keys in default.toml (`agent.auto_update_check`, `llm.stt`, `llm.vision_model`, `skills.disambiguation_threshold`, `tools.filters.*`, `tools.permissions`, `a2a.auth_token`, `mcp.servers.env`)
@@ -17,6 +21,7 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 - Vault age backend now falls back to default directory for key/path when `--vault-key`/`--vault-path` are not provided, matching `zeph vault init` behavior (#613)
 
 ### Changed
+- Whisper STT provider no longer requires OpenAI API key when `base_url` points to a local server
 - Orchestrator sub-providers now resolve `base_url` and `embedding_model` via fallback chain: per-provider, parent section, global default
docs/src/advanced/multimodal.md: 42 additions, 1 deletion
@@ -20,15 +20,56 @@ provider = "whisper"
 model = "whisper-1"
 ```
 
-The Whisper provider inherits the OpenAI API key from `[llm.openai]` or `ZEPH_OPENAI_API_KEY`. Environment variable overrides: `ZEPH_STT_PROVIDER`, `ZEPH_STT_MODEL`.
+When `base_url` is omitted, the provider uses the OpenAI API key from `[llm.openai]` or `ZEPH_OPENAI_API_KEY`. Set `base_url` to point at any OpenAI-compatible server (no API key required for local servers). The `language` field accepts an [ISO-639-1](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes) code (e.g. `ru`, `en`, `de`) or `auto` for automatic detection.
 | OpenAI Whisper API | `whisper` | `stt` | Cloud-based transcription |
+| OpenAI-compatible server | `whisper` | `stt` | Any local server with `/v1/audio/transcriptions` |
 | Local Whisper | `candle-whisper` | `candle` | Fully offline via candle |
 
+### Local Whisper Server (whisper.cpp)
+
+The recommended setup for local speech-to-text. Uses Metal acceleration on Apple Silicon and handles all audio formats (including Telegram OGG/Opus) server-side.