# FAQ — ovos-stt-http-server

## General

**Q: What is ovos-stt-http-server?**
A: A FastAPI-based HTTP server that wraps any OVOS STT plugin and exposes it via a REST API. It also provides five vendor-compatible routers for drop-in use with OpenAI Whisper, Deepgram, Google Cloud STT, AssemblyAI, and Speechmatics clients.

**Q: What is the default port?**
A: The CLI defaults to `8080`. Override with `--port <number>`.

**Q: How do I start the server?**
A: `ovos-stt-server --engine <plugin-name> --port 8080`. The `--engine` flag is required.

**Q: Do I need API keys or credentials?**
A: No. All vendor-compatible routers accept auth headers and API-key query parameters but silently ignore them. No credentials are validated.

**Q: What Python version is required?**
A: Python 3.9 or later (pyproject.toml `requires-python = ">=3.9"`).

**Q: What audio format does `/stt` expect?**
A: Raw PCM bytes: 16 kHz, mono, 16-bit signed integer (int16). Pass `sample_rate` and `sample_width` query params if your audio differs from the defaults.
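
A minimal stdlib sketch of such an upload, assuming the server is running on `localhost:8080`; the generated sine tone stands in for real microphone audio, and the query params shown merely restate the defaults:

```python
import math
import struct
import urllib.request

# One second of 16 kHz mono int16 PCM (a 440 Hz tone) as stand-in audio.
SAMPLE_RATE = 16000
samples = [
    int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE))
    for n in range(SAMPLE_RATE)
]
pcm = struct.pack("<%dh" % len(samples), *samples)

req = urllib.request.Request(
    "http://localhost:8080/stt?lang=en&sample_rate=16000&sample_width=2",
    data=pcm,
    headers={"Content-Type": "application/octet-stream"},
)
# The send needs a live server, so it is left commented here:
# text = urllib.request.urlopen(req).read().decode("utf-8")
```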

**Q: How do I configure CORS?**
A: CORS is unconditionally set to `allow_origins=["*"]`. There is no env-var override. All origins are permitted. See `create_app` — `ovos_stt_http_server/__init__.py:109`.

**Q: How do I enable automatic language detection?**
A: Pass `lang=auto` as a query parameter to `/stt`, or use the `/lang_detect` endpoint directly. A `lang_plugin` must be provided at startup (`--lang-engine`).

**Q: What is `--multi` mode?**
A: `--multi` loads one `MultiModelContainer` (`__init__.py:57`) that instantiates a separate plugin instance per language code on first use. Useful for multilingual deployments with language-specific models.

**Q: How do I specify the STT plugin?**
A: Pass `--engine <plugin-name>` to the CLI. The plugin must be installed and discoverable via `ovos-plugin-manager`.

**Q: What plugins are supported?**
A: Any plugin registered under the `opm.plugin.stt` entry point group. Install the plugin package and reference it by its entry point name.

**Q: What does `/status` return?**
A: `{"status": "ok", "plugin": "<engine-name>", "lang_plugin": "<lang-engine-name-or-null>"}` — see the `stats` handler in `__init__.py:142`.

**Q: Is Gradio UI supported?**
A: No. Gradio support was removed. The server is a pure REST API.

---

## OpenAI Whisper Compatible Clients

**Q: Which OpenAI Whisper clients work with this server?**
A: Any client that POSTs to `/v1/audio/transcriptions` or `/v1/audio/translations` with multipart form data works. This includes the official `openai` Python SDK, `whisper-client`, and raw `curl` commands.

**Q: How do I use the OpenAI Python SDK against this server?**
A: Set `base_url="http://localhost:8080/openai"` when constructing the `OpenAI` client. The `api_key` parameter is accepted but ignored.

**Q: What `response_format` values are supported?**
A: `json` (default), `text`, `srt`, `vtt`, and `verbose_json`. See [docs/response-formats.md](docs/response-formats.md).

**Q: Does `verbose_json` return real word-level segments?**
A: No. The `segments` field is always an empty list. `task`, `language`, `duration`, and `text` are populated.

**Q: Does the translations endpoint really translate audio?**
A: No — it calls the same STT engine as transcriptions but forces `language=en`. Translation between languages is not performed; the engine transcribes with English as the target hint.

---

## Deepgram Compatible Clients

**Q: Which Deepgram clients work with this server?**
A: Any client that POSTs raw audio bytes to `/v1/listen`. The official `deepgram-sdk` Python package works when its base URL is overridden.

**Q: How is audio parsed for the Deepgram endpoint?**
A: The raw request body is wrapped in `AudioData(body, 16000, 2)` — no format detection. Send WAV or raw PCM at 16 kHz 16-bit mono for best results.
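
For example, using only the standard library (0.1 s of in-memory silence stands in for a real clip; the send is commented out because it needs a live server):

```python
import io
import struct
import urllib.request
import wave

# Wrap 0.1 s of silence as a 16 kHz mono 16-bit WAV, entirely in memory.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(struct.pack("<1600h", *([0] * 1600)))

req = urllib.request.Request(
    "http://localhost:8080/v1/listen?language=en",
    data=buf.getvalue(),
    headers={"Content-Type": "audio/wav"},
)
# body = urllib.request.urlopen(req).read()
```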

**Q: Does `punctuate=true` add punctuation?**
A: No. The `punctuate` query parameter is accepted and ignored. Punctuation depends on the underlying STT plugin.

**Q: What does the Deepgram `words` array contain?**
A: An empty list. Word-level timing is not implemented.

---

## Google Speech-to-Text Compatible Clients

**Q: Which Google STT clients work?**
A: Any client that POSTs to `/v1/speech:recognize` with a JSON body containing `config` and `audio.content` (base64-encoded audio).
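
A stdlib sketch of such a request; the placeholder bytes stand in for real WAV data, `en-US` is just an example language code, and the send is commented out because it needs a live server:

```python
import base64
import json
import urllib.request

# Placeholder bytes; in practice, read a real WAV file from disk.
audio_bytes = b"RIFF\x00\x00\x00\x00WAVE"

body = json.dumps({
    "config": {"languageCode": "en-US"},
    "audio": {"content": base64.b64encode(audio_bytes).decode("ascii")},
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/v1/speech:recognize",
    data=body,
    headers={"Content-Type": "application/json"},
)
# response = json.loads(urllib.request.urlopen(req).read())
```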

**Q: Are GCS URIs (`gs://...`) supported?**
A: No. The server returns HTTP 501 if `audio.uri` is set. Use `audio.content` with base64-encoded audio.

**Q: Does the `encoding` field matter?**
A: No — the server attempts to parse uploaded bytes as WAV regardless of the `encoding` field value, then falls back to raw PCM.

---

## AssemblyAI Stub Behavior

**Q: Why does the AssemblyAI GET transcript endpoint always return `status: error`?**
A: This server is synchronous. Transcription completes in the POST response. No job store persists between requests, so GET by ID cannot retrieve prior results.

**Q: Do I need to poll for results like the real AssemblyAI API?**
A: No — the POST response already contains `status: completed` and the `text` field. Read the result directly from the POST response.
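
For example (the `/v2/transcript` path mirrors the real AssemblyAI API and is an assumption here; the placeholder bytes stand in for real WAV data, and the send is commented out):

```python
import base64
import json
import urllib.request

# Base64-encode the audio into the `audio` field; this server does not
# fetch `audio_url`.
payload = json.dumps({
    "audio": base64.b64encode(b"RIFF\x00\x00\x00\x00WAVE").decode("ascii"),
    "language_code": "en",
}).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:8080/v2/transcript",  # assumed mount path
    data=payload,
    headers={"Content-Type": "application/json"},
)
# result = json.loads(urllib.request.urlopen(req).read())
# result["status"] is "completed"; result["text"] holds the transcript.
```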

**Q: What happens if I send `audio_url` instead of `audio`?**
A: The server returns `status: error` with a message explaining that `audio_url` fetching is not supported. Encode your audio as base64 and put it in the `audio` field.

**Q: Is the `id` in the POST response reusable?**
A: No. The ID is a UUID generated per request. The GET endpoint ignores it and always returns an error stub.

---

## Speechmatics Behavior

**Q: How does the Speechmatics job model work on this server?**
A: Job creation (POST `/v1/jobs`) transcribes immediately and stores the result in an in-memory dict keyed by job ID. GET retrieves the result from that dict.

**Q: What happens if I GET a job that doesn't exist?**
A: HTTP 404 is returned: `{"detail": "Job '<id>' not found."}`.

**Q: Are job results preserved across server restarts?**
A: No. The `_jobs` dict (`speechmatics.py:13`) is in-memory only.

**Q: What `format` parameter does GET `/transcript` accept?**
A: The `format` query param is accepted and ignored. The response is always Speechmatics JSON v2.9 format.

---

## Audio Format

**Q: What audio formats are supported?**
A: WAV is supported natively via the stdlib. MP3, OGG, FLAC, M4A, and WebM require `pydub` (`pip install pydub`). See [docs/audio-formats.md](docs/audio-formats.md).

**Q: What happens if I upload a non-WAV file without pydub installed?**
A: HTTP 501 is returned with a message indicating that the format requires pydub.

**Q: What sample rate and bit depth should I use?**
A: 16 kHz, mono, 16-bit (int16). The server resamples non-WAV files via pydub to match these parameters.

---

## Language

**Q: How do I specify the transcription language?**
A: Each compat router has its own mechanism: the `language` form field (Whisper), the `?language=` query param (Deepgram), the `config.languageCode` JSON field (Google), the `language_code` JSON field (AssemblyAI), and `transcription_config.language` in the job config JSON (Speechmatics).

**Q: What happens if no language is specified?**
A: Defaults vary per router: Deepgram, AssemblyAI, and Speechmatics default to `en`; the Whisper router passes `None`, which the engine receives as `"auto"`.

**Q: Does language auto-detection work with compat routers?**
A: Not directly. Use the native `/lang_detect` endpoint, or start the server with `--lang-engine` to enable automatic language detection in the underlying engine.