A production-ready AI voice translator with a Flask backend and a Next.js 14 App Router frontend. It combines Deepgram STT, OpenRouter (GPT-4o-mini) translation, ElevenLabs TTS with a browser SpeechSynthesis fallback, waveform recording, history, theming, and PWA polish. Built with performance, accessibility, and DX in mind.
Real-time, voice-enabled AI translation for web and mobile.
An end-to-end Next.js 14 + Flask solution that turns speech into translated speech in seconds. Built for developers who want a ready-to-ship stack, travelers who need instant comprehension, and businesses that require polished, themeable, PWA-friendly interfaces.
| Capability | Details |
|---|---|
| Voice in/out | Record via MediaRecorder, visualize waveform, TTS playback (ElevenLabs or browser SpeechSynthesis) |
| Languages | 50+ language options with flag picker and animated swap |
| AI models | OpenRouter (GPT-4o-mini) for translation, session-aware history |
| Real-time flow | Speech → Text → LLM → Translation → TTS with live status indicators |
| Themes | Dark/light with glassmorphism and neon accents |
| Responsive | Three-panel desktop, collapsible tablet, single-column mobile |
| Offline-friendly | PWA manifest, graceful fallbacks, cached history (Zustand persist) |
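
The real-time flow in the table (Speech → Text → LLM → Translation → TTS) can be pictured as a small orchestration function. The following is a minimal sketch, not the project's actual code: `run_pipeline`, `PipelineResult`, and the three injected callables are hypothetical names, and the real stages would call Deepgram, OpenRouter, and ElevenLabs respectively. Returning `audio=None` stands in for "let the browser's SpeechSynthesis speak the text".

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PipelineResult:
    transcript: str
    translation: str
    audio: Optional[bytes]  # None means: fall back to browser SpeechSynthesis
    statuses: List[str] = field(default_factory=list)


def run_pipeline(
    audio_in: bytes,
    source_lang: str,
    target_lang: str,
    transcribe: Callable[[bytes, str], str],        # hypothetical STT stage
    translate: Callable[[str, str, str], str],      # hypothetical LLM stage
    synthesize: Optional[Callable[[str, str], bytes]] = None,  # hypothetical TTS stage
) -> PipelineResult:
    """Run STT -> translation -> optional TTS, recording a status per stage."""
    statuses = ["transcribing"]
    transcript = transcribe(audio_in, source_lang)

    statuses.append("translating")
    translation = translate(transcript, source_lang, target_lang)

    audio = None
    if synthesize is not None:
        statuses.append("synthesizing")
        try:
            audio = synthesize(translation, target_lang)
        except Exception:
            # Server-side TTS failed; leave audio as None so the client
            # can speak the translation with the browser instead.
            audio = None

    statuses.append("done")
    return PipelineResult(transcript, translation, audio, statuses)
```

Passing the stages in as callables keeps the sketch testable without network access; the status list mirrors the kind of live indicators the UI shows while a request is in flight.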

    # 1) Clone repository
    git clone https://github.com/yocho1/babel-fish-assistant.git
    cd babel-fish-assistant
    # 2) Backend setup
    python -m venv .venv
    . .venv/Scripts/activate  # Windows Git Bash; PowerShell: .\.venv\Scripts\activate; macOS/Linux: source .venv/bin/activate
    pip install -r requirements.txt
    # 3) Configure environment
    cp .env.example .env
    # Fill: DEEPGRAM_API_KEY, OPENROUTER_API_KEY, ELEVENLABS_API_KEY (optional if you rely on the browser TTS fallback)
    # 4) Run services
    python app.py  # backend on 5000
    cd frontend && npm install && npm run dev  # frontend on 3000 (or 3001)
    # 5) Access app
    # http://localhost:3000 (calls backend http://localhost:5000)

- Frontend: Next.js 14 (App Router), React 18, TypeScript, Tailwind, Framer Motion, Zustand, Wavesurfer.js
- Backend: Flask 3, flask-cors, python-dotenv, requests
- APIs: Deepgram STT, OpenRouter LLM, ElevenLabs TTS (or browser SpeechSynthesis fallback)
- State: Session cookie `session_id` + persisted history in Zustand
- CORS restricted to http://localhost:3000 and http://localhost:3001 with credentials
- Env keys in .env (not committed): DEEPGRAM_API_KEY, OPENROUTER_API_KEY, ELEVENLABS_API_KEY
- Manifest at frontend/public/manifest.json (replace placeholder icons as needed)
- Choose source/target languages, press record or type text, then Translate
- Watch the pipeline statuses and waveform while processing
- History panel stores recent translations; auto-play toggle controls TTS
- Backend: run behind HTTPS with a production WSGI server (gunicorn/uwsgi)
- Frontend: `npm run build`, then deploy (Vercel/Netlify/your infra)
- Secrets: inject env vars on your platform; never commit keys
- Strict TypeScript, explicit CORS, environment-driven config
- WCAG-minded: keyboardable controls, focus styles, reduced-motion friendly
- Rotate API keys; monitor provider quotas; add OpenRouter retry/backoff if the network is flaky
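
To illustrate the environment-driven config and explicit CORS allowlist mentioned in the list above, here is a minimal sketch using only the standard library. It is an assumption about how such a loader could look, not the contents of `app.py`: `Settings`, `load_settings`, `is_origin_allowed`, and the `ALLOWED_ORIGINS` variable are hypothetical, while the three API key names and the two localhost origins come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional, Tuple

# Keys the README lists as required; ELEVENLABS_API_KEY is optional.
REQUIRED_KEYS = ("DEEPGRAM_API_KEY", "OPENROUTER_API_KEY")
DEFAULT_ORIGINS = ("http://localhost:3000", "http://localhost:3001")


@dataclass(frozen=True)
class Settings:
    deepgram_api_key: str
    openrouter_api_key: str
    elevenlabs_api_key: Optional[str]
    allowed_origins: Tuple[str, ...]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build settings from the environment, failing fast on missing keys."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    # ALLOWED_ORIGINS is a hypothetical comma-separated override.
    raw = env.get("ALLOWED_ORIGINS", "")
    origins = tuple(o.strip() for o in raw.split(",") if o.strip())
    return Settings(
        deepgram_api_key=env["DEEPGRAM_API_KEY"],
        openrouter_api_key=env["OPENROUTER_API_KEY"],
        elevenlabs_api_key=env.get("ELEVENLABS_API_KEY") or None,
        allowed_origins=origins or DEFAULT_ORIGINS,
    )


def is_origin_allowed(origin: str, settings: Settings) -> bool:
    """Exact-match check against the allowlist (no wildcards)."""
    return origin in settings.allowed_origins
```

With flask-cors, the resulting tuple could be handed to `CORS(app, origins=list(settings.allowed_origins), supports_credentials=True)`; failing at startup on a missing key tends to be easier to debug than a provider error on the first request.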
MIT
- Rotate API keys regularly; keep `.env` out of version control.
- Monitor OpenRouter/Deepgram/ElevenLabs quotas and latencies.
- Replace placeholder icons with branded assets (192/512 PNGs) in `frontend/public`.
- Consider adding retries/backoff for OpenRouter in `app.py` if your network is flaky.
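
As one possible shape for the retry/backoff suggestion, the sketch below wraps any zero-argument callable in exponential backoff with a small random jitter. `retry_with_backoff` and its parameters are hypothetical and not part of `app.py`; the defaults (3 retries, 0.5 s base delay, 8 s cap) are illustrative assumptions rather than tuned values.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call()`, retrying on `retry_on` with exponential backoff.

    Makes at most `retries + 1` attempts; the last failure is re-raised.
    """
    attempt = 0
    while True:
        try:
            return call()
        except retry_on:
            if attempt >= retries:
                raise
            # Delay doubles each attempt (0.5s, 1s, 2s, ...), capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Up to 10% jitter so concurrent clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
            attempt += 1
```

When the wrapped call uses `requests`, pass `retry_on=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, since those classes do not derive from the built-in `ConnectionError` and `TimeoutError`. Only transient failures should be retried; an authentication or quota error is unlikely to succeed on a second attempt.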