Real-time speech-to-text and translation web application. Speak into a microphone, see transcription appear instantly, and get live translations into two target languages simultaneously.
Built with FastAPI, powered by three interchangeable STT engines, and designed to run anywhere -- locally, in Docker, or behind a reverse proxy.
- Features
- Architecture
- Quick Start
- Configuration
- API Reference
- Testing
- Deployment
- Security
- Roadmap
- Contributing
- Support
- License
- Three STT engines -- switchable in the UI at any time:

  | Engine | Runs on | API Key | Notes |
  |---|---|---|---|
  | Web Speech API | Browser | None | Chrome/Edge recommended; no server cost |
  | Deepgram Nova-3 | Server | Required | High accuracy, low latency |
  | ElevenLabs Scribe v2 | Server or Browser | Required | Server-side proxy or direct browser connection |

- Real-time translation into two configurable target languages (powered by googletrans)
- Interim + final results -- partial transcriptions shown live before the utterance is committed
- Interim throttling -- server-side message versioning skips stale translations to prevent queue buildup
- Password-protected -- cookie-based auth with HMAC-signed tokens (can be disabled for VPN/proxy setups)
- Rate-limited login -- 10 attempts per 60 seconds per IP
- Engine access control -- enable/disable engines per deployment via `ENABLED_ENGINES`
- Security headers -- CSP, X-Content-Type-Options, X-Frame-Options, CSRF mitigation
- Dark mode -- automatic (`prefers-color-scheme`) or manual toggle (light/dark/system)
- Responsive UI -- works on desktop, tablet, and mobile
- Adjustable font size -- slider for transcript readability (12--64 px, persisted)
- Health check -- `/health` endpoint for Docker `HEALTHCHECK` and load balancers

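
The interim throttling idea from the feature list can be sketched as a per-connection version counter: each incoming message bumps the version, and a translation result is dropped if a newer message arrived while it was in flight. This is a minimal sketch under assumptions, not the project's actual code; `LatestOnly`, `fake_translate`, and the delays are hypothetical names and values chosen for illustration.

```python
import asyncio


class LatestOnly:
    """Tracks the newest message version for one connection (hypothetical helper)."""

    def __init__(self) -> None:
        self.version = 0

    def next_version(self) -> int:
        self.version += 1
        return self.version

    def is_stale(self, version: int) -> bool:
        # A result is stale once any newer message has been registered.
        return version != self.version


async def fake_translate(text: str, delay: float) -> str:
    # Stand-in for a slow translation call; it only upper-cases the text.
    await asyncio.sleep(delay)
    return text.upper()


async def handle_interim(tracker: LatestOnly, text: str, delay: float, sent: list[str]) -> None:
    version = tracker.next_version()
    result = await fake_translate(text, delay)
    if tracker.is_stale(version):
        return  # a newer interim arrived meanwhile, so skip this result
    sent.append(result)


async def demo() -> list[str]:
    tracker = LatestOnly()
    sent: list[str] = []
    # The first (older) interim is slower than the second (newer) one.
    await asyncio.gather(
        handle_interim(tracker, "hel", 0.05, sent),
        handle_interim(tracker, "hello", 0.01, sent),
    )
    return sent
```

In `demo()`, only the result for the newer message is kept, so slow translations of outdated partial text never reach the client.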
┌─────────────┐       ┌────────────────────────────────────┐
│   Browser   │◄─────►│           FastAPI Server           │
│             │  WS   │                                    │
│ Web Speech  ├──────►│ /ws            (text → translate)  │
│ Deepgram    ├──────►│ /ws/deepgram   (audio → STT → tr.) │
│ ElevenLabs  ├──────►│ /ws/elevenlabs (audio → STT → tr.) │
│             │       │                                    │
│ ElevenLabs  ├──────►│ ElevenLabs WS (browser mode)       │
│ (browser)   │  WS   │  ↕ /ws for translation only        │
└─────────────┘       └──────────┬─────────────────────────┘
                                 │
                    ┌────────────┼────────────┐
                    ▼            ▼            ▼
             Deepgram API  ElevenLabs API  googletrans
             (Nova-3 STT)   (Scribe v2)   (translation)
Web Speech -- The browser's built-in SpeechRecognition API handles STT locally; recognized text is sent to /ws for translation only.
Deepgram -- Raw PCM audio streams from the browser to /ws/deepgram. The server proxies it to the Deepgram SDK for transcription, then translates via googletrans.
ElevenLabs (server mode) -- Same pattern as Deepgram but using the ElevenLabs Scribe v2 Realtime WebSocket API at /ws/elevenlabs.
ElevenLabs (browser mode) -- The browser fetches a single-use token via POST /api/elevenlabs/token, connects directly to the ElevenLabs WS, and sends recognized text to /ws for translation (same flow as Web Speech).
- Python 3.10+ (uses `X | None` union syntax)
- A microphone-capable browser (Chrome or Edge recommended for Web Speech)
- API keys for Deepgram and/or ElevenLabs (optional -- Web Speech works without any)
# Clone the repository
git clone https://github.com/Rhiz3K/realtime-stt-translator.git
cd realtime-stt-translator
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # Linux/macOS
# .venv\Scripts\activate # Windows
# Install dependencies
pip install -r requirements.txt
# Configure environment
cp .env.example .env
# Edit .env — at minimum set APP_PASSWORD
# Start the server
uvicorn app.main:app --host 0.0.0.0 --port 8000

Open http://localhost:8000 and enter your password.
docker build -t live-translator .
docker run -p 8000:8000 --env-file .env live-translator

Or run with inline environment variables:
docker run -p 8000:8000 \
-e APP_PASSWORD=your-secret \
-e ENABLED_ENGINES=webspeech,deepgram \
-e DEEPGRAM_API_KEY=your-key \
live-translator

Copy .env.example and edit to taste. All variables have sensible defaults except APP_PASSWORD.
| Variable | Required | Default | Description |
|---|---|---|---|
| `APP_PASSWORD` | Yes* | -- | Login password. Required when AUTH_ENABLED=true. |
| `AUTH_ENABLED` | No | `true` | Set false to skip login (useful behind VPN or reverse proxy auth). |
| `AUTH_SECRET` | No | `APP_PASSWORD` | Separate HMAC signing secret for auth tokens. Recommended for production. |
| `AUTH_COOKIE_NAME` | No | `srlt_auth` | Name of the auth cookie. |
| `AUTH_TOKEN_TTL_SECONDS` | No | `43200` (12 h) | Auth token time-to-live. |
| `AUTH_COOKIE_SECURE` | No | Auto-detect | Force Secure flag on cookies. Set true behind HTTPS reverse proxy. |

| Variable | Required | Default | Description |
|---|---|---|---|
| `ALLOWED_ORIGINS` | No | -- | Comma-separated allowed WebSocket origins. If empty, origin host must match request Host header. |

| Variable | Required | Default | Description |
|---|---|---|---|
| `ENABLED_ENGINES` | No | `webspeech` | Comma-separated list: webspeech, deepgram, elevenlabs. Disabled engines appear grayed out in the UI. |
| `DEEPGRAM_API_KEY` | For Deepgram | -- | API key from console.deepgram.com |
| `DEEPGRAM_RESULT_QUEUE_SIZE` | No | `100` | Internal queue size for Deepgram transcription results. |
| `ELEVENLABS_API_KEY` | For ElevenLabs | -- | API key from elevenlabs.io |

| Variable | Required | Default | Description |
|---|---|---|---|
| `MAX_TEXT_LENGTH` | No | `5000` | Maximum accepted input text length per WebSocket message. |
| `TRANSLATE_TIMEOUT_SECONDS` | No | `10` | Timeout for a single googletrans call (seconds). |

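
As a rough illustration of how `MAX_TEXT_LENGTH` and `TRANSLATE_TIMEOUT_SECONDS` might be applied, the sketch below wraps an arbitrary async translate callable with a length check and `asyncio.wait_for`. The function names, the `text_too_long` error code, and the mapping of a timeout to `translation_failed` are assumptions; only the two setting names and their defaults come from the table above.

```python
import asyncio

MAX_TEXT_LENGTH = 5000          # documented default
TRANSLATE_TIMEOUT_SECONDS = 10  # documented default


async def translate_with_limits(text, translate,
                                max_length=MAX_TEXT_LENGTH,
                                timeout=TRANSLATE_TIMEOUT_SECONDS):
    """Reject oversized input, then bound the translation call with a timeout."""
    if len(text) > max_length:
        return {"error": "text_too_long"}  # hypothetical error code
    try:
        translated = await asyncio.wait_for(translate(text), timeout=timeout)
    except asyncio.TimeoutError:
        return {"error": "translation_failed"}  # assumed mapping for timeouts
    return {"translated": translated}


async def reverse_translate(text):
    # Stand-in translator for the demo: returns the reversed text immediately.
    return text[::-1]


async def slow_translate(text):
    # Stand-in translator that takes longer than a short timeout.
    await asyncio.sleep(1)
    return text
```

Calling `asyncio.run(translate_with_limits("abc", slow_translate, timeout=0.01))` returns the error dictionary instead of blocking the connection for the full second.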
Engines are enabled via the ENABLED_ENGINES environment variable:
# Web Speech only (default — no API keys needed)
ENABLED_ENGINES=webspeech
# All engines
ENABLED_ENGINES=webspeech,deepgram,elevenlabs
# Deepgram + ElevenLabs (no Web Speech)
ENABLED_ENGINES=deepgram,elevenlabs

Disabled engines appear in the UI dropdown but are grayed out and cannot be selected.
| Method | Path | Auth | Description |
|---|---|---|---|
| `GET` | `/` | Yes* | Main UI. Renders login form if not authenticated. |
| `GET` | `/health` | No | Health check. Returns {"status": "ok"}. |
| `GET` | `/deepgram` | -- | Legacy redirect to /. |
| `POST` | `/login` | No | Form login (password, next). Sets auth cookie. Rate-limited. |
| `GET` | `/api/translate/languages` | Yes* | Lists available translation languages. |
| `POST` | `/api/elevenlabs/token` | Yes* | Creates single-use ElevenLabs Scribe token. Accepts optional {"api_key": "..."} body. |

*Auth is required only when AUTH_ENABLED=true (default).

| Path | Input | Description |
|---|---|---|
| `/ws` | JSON text messages | Translates text (Web Speech + ElevenLabs browser mode). |
| `/ws/deepgram` | Binary PCM audio | Streams audio to Deepgram for STT + translation. |
| `/ws/elevenlabs` | Binary PCM audio | Streams audio to ElevenLabs for STT + translation. |

All WebSocket endpoints require a valid auth cookie and matching origin header (when AUTH_ENABLED=true).

Client -> Server (/ws):
Server -> Client:
// Translation result
{
"type": "final",
"original": "Ahoj svete",
"dests": ["en", "ru"],
"translations": {"en": "Hello world", "ru": "Privet mir"}
}
// Error
{"error": "translation_failed"}
// Keepalive response
{"type": "pong"}

Client -> Server (/ws/deepgram, /ws/elevenlabs):
The first message can optionally be a JSON config:
{
"type": "config",
"deepgram": {"language": "cs", "interim_results": true, "punctuate": true},
"translate": {"src": "cs", "dests": ["en", "ru"]},
"translate_interim": false
}

All subsequent messages are raw binary PCM audio (16-bit, 16 kHz, mono).
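
The message shapes documented above can be exercised without a live connection. The sketch below builds the optional config message for `/ws/deepgram`, converts float samples to 16-bit PCM, and classifies server replies. The helper names are hypothetical, little-endian byte order is an assumption (the text only says 16-bit, 16 kHz, mono), and the config layout simply mirrors the documented example.

```python
import json
import struct


def build_config_message(src: str, dests: list[str], translate_interim: bool = False) -> str:
    """Serialize the optional first message, mirroring the documented example."""
    return json.dumps({
        "type": "config",
        "deepgram": {"language": src, "interim_results": True, "punctuate": True},
        "translate": {"src": src, "dests": dests},
        "translate_interim": translate_interim,
    })


def floats_to_pcm16(samples: list[float]) -> bytes:
    """Convert samples in [-1.0, 1.0] to signed 16-bit PCM (little-endian assumed)."""
    clipped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clipped)}h", *(int(s * 32767) for s in clipped))


def parse_server_message(raw: str) -> tuple[str, dict]:
    """Classify a server message as 'error', 'pong', or a translation result."""
    msg = json.loads(raw)
    if "error" in msg:
        return "error", msg
    if msg.get("type") == "pong":
        return "pong", msg
    return "translation", msg
```

At 16 kHz mono, each sample occupies two bytes, so 10 ms of audio (160 samples) becomes a 320-byte binary frame.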
# Install dev dependencies
pip install -r requirements-dev.txt
# Run all tests
pytest
# Verbose output
pytest -vv
# Run with coverage report
pytest --cov=app --cov-report=term-missing
# Run a specific test
pytest tests/test_main.py::test_ws_translates_text
# Run tests matching a pattern
pytest -k translate
# Quick syntax check (no execution)
python -m compileall app tests

See CONTRIBUTING.md for the full development workflow and known test issues.
services:
  live-translator:
    build: .
    ports:
      - "8000:8000"
    environment:
      APP_PASSWORD: ${APP_PASSWORD}
      AUTH_SECRET: ${AUTH_SECRET}
      ENABLED_ENGINES: webspeech,deepgram,elevenlabs
      DEEPGRAM_API_KEY: ${DEEPGRAM_API_KEY}
      ELEVENLABS_API_KEY: ${ELEVENLABS_API_KEY}
    restart: unless-stopped

- Create a new service pointing to the GitHub repository.
- Set environment variables in the Coolify dashboard (see Configuration).
- Deploy. The `Dockerfile` includes a `HEALTHCHECK` that Coolify uses automatically.

When running behind nginx, Caddy, or similar:
- Set `AUTH_COOKIE_SECURE=true` if the proxy terminates TLS.
- Set `ALLOWED_ORIGINS=https://your-domain.com` to restrict WebSocket origins.
- Ensure the proxy forwards `Host`, `Origin`, and `X-Forwarded-For` headers.
- Enable WebSocket proxying for `/ws`, `/ws/deepgram`, and `/ws/elevenlabs`.

Example nginx location block:
location / {
    proxy_pass http://127.0.0.1:8000;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
}
location ~ ^/ws {
    proxy_pass http://127.0.0.1:8000;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_read_timeout 86400;
}

- Authentication -- HMAC-SHA256 signed tokens in `httpOnly` cookies with configurable TTL.
- Login rate limiting -- 10 attempts per 60 seconds per IP (in-memory).
- CSRF protection -- Origin/Referer validation on login form submissions.
- Content Security Policy -- Restricts script sources, frame ancestors, and connect targets.
- WebSocket origin check -- Validates `Origin` header against `Host` or `ALLOWED_ORIGINS`.
- Safe redirects -- `sanitize_next_path` prevents open redirects after login.
- No secrets in logs -- Passwords and API keys are never logged.

For vulnerability reporting, please see SECURITY.md.
The following improvements are planned or under consideration. Contributions welcome!
- Add CI pipeline -- GitHub Actions workflow for linting, testing, and Docker build
- Internationalize the UI -- currently Czech labels are hardcoded in templates
- Session recording/export -- save transcriptions and translations to a downloadable file
These items would improve the project but are not blocking. They make great first contributions:
- Pin dependency versions in `requirements.txt` -- currently unpinned, which can cause breakage on fresh installs when upstream packages release breaking changes
- Expand test coverage -- add tests for:
  - ElevenLabs WebSocket happy-path
  - `/api/translate/languages` endpoint
  - `sanitize_next_path` edge cases
  - WebSocket config message handling
  - `MAX_TEXT_LENGTH` enforcement
  - Auth token expiry
- Extract duplicated AudioWorklet PCM processor code into a shared JavaScript constant -- the same processor is currently inlined in three places (Deepgram, ElevenLabs server mode, ElevenLabs browser mode)
- Consider `google-cloud-translate` or `deepl` for production translation -- the current `googletrans` library uses an unofficial API that can be slow (1--3 s per call) and occasionally breaks; a paid translation API would be more reliable for production deployments

Contributions are welcome! Please read CONTRIBUTING.md before submitting a pull request.
This project follows the Contributor Covenant Code of Conduct.
See SUPPORT.md.
This project is licensed under the MIT License.
Made with care by Rhiz3K
