AI-powered video dubbing pipeline with per-step Gemini multimodal optimization.
- Full dubbing pipeline: transcription → translation → TTS → render
- Per-step Gemini review (transcription, translation, speaker analysis)
- 24 atomic operations (timing, translation, emotion, TTS style, volume, subtitles, etc.)
- Multi-round final optimization with score-regression rollback
- ElevenLabs + OpenAI TTS + edge-tts voice synthesis
- Speaker identification & voice assignment
- Auto-rephrase for extreme speed ratios
- Background volume control
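
The "score-regression rollback" in the feature list can be pictured with a minimal sketch. This is not the project's actual loop: `optimize_with_rollback`, `score`, `improve`, and `max_rounds` are hypothetical names, and whether the real pipeline keeps trying after a regression or stops is an assumption made here for illustration.

```python
from __future__ import annotations

from typing import Callable, TypeVar

State = TypeVar("State")


def optimize_with_rollback(
    state: State,
    score: Callable[[State], float],
    improve: Callable[[State], State],
    max_rounds: int = 3,
) -> tuple[State, float]:
    """Run up to max_rounds rounds, discarding any round whose score regresses."""
    best_state, best_score = state, score(state)
    for _ in range(max_rounds):
        # Hypothetical: one optimization round applied to the best state so far.
        candidate = improve(best_state)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            # Regression: roll back by keeping the previous best state.
            continue
        best_state, best_score = candidate, candidate_score
    return best_state, best_score
```

Each round starts from the best state seen so far, so a round that lowers the score never affects the final result.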
# Install
pip install -e ".[dev]"
# Configure (create storage/settings.json)
cp storage/settings.example.json storage/settings.json
# Edit with your API keys
# Run pipeline
dubber pipeline video.mp4 --target zh-CN

Create storage/settings.json with the following structure:
{
  "api_keys": {
    "openai_api_key": "<OpenRouter key>",
    "openai_base_url": "https://openrouter.ai/api/v1",
    "elevenlabs_api_key": "<ElevenLabs key>"
  },
  "models": {
    "tts_provider": "elevenlabs",
    "tts_model": "eleven_v3",
    "asr_provider": "elevenlabs-scribe",
    "translation_model": "google/gemini-2.5-flash"
  },
  "gemini_api_key": "<same as openai_api_key>",
  "gemini_base_url": "https://openrouter.ai/api/v1",
  "gemini_model": "google/gemini-2.5-flash"
}

| Field | Description |
|---|---|
| api_keys.openai_api_key | OpenRouter API key (used for translation + rephrase) |
| api_keys.openai_base_url | OpenRouter base URL |
| api_keys.elevenlabs_api_key | ElevenLabs API key for TTS |
| models.tts_provider | TTS engine: elevenlabs, openai-tts, or edge-tts |
| models.tts_model | ElevenLabs model: eleven_v3, eleven_turbo_v2_5, etc. |
| models.asr_provider | ASR provider: elevenlabs-scribe or faster-whisper |
| models.translation_model | LLM for translation (OpenRouter model name) |
| gemini_api_key | Gemini/OpenRouter key for multimodal analysis |
| gemini_base_url | Gemini API base URL (can reuse OpenRouter) |
| gemini_model | Gemini model name |

- elevenlabs (default): High quality, requires ElevenLabs API key
- openai-tts: OpenAI TTS via OpenRouter
- edge-tts: Free Microsoft Edge TTS, no API key required
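
The provider list above implies a simple rule for which key each engine needs. A minimal sketch of checking that before a run follows; `check_tts_settings`, `load_settings`, and `REQUIRED_KEY` are hypothetical helpers, not part of dubber, and mapping openai-tts to `openai_api_key` is an assumption based on "OpenAI TTS via OpenRouter".

```python
import json
from pathlib import Path

# Which api_keys entry each provider needs; None means no key is required.
# The openai-tts entry is an assumption (it is described as going via OpenRouter).
REQUIRED_KEY = {
    "elevenlabs": "elevenlabs_api_key",
    "openai-tts": "openai_api_key",
    "edge-tts": None,
}


def check_tts_settings(settings: dict) -> str:
    """Return the configured TTS provider, or raise if its API key is missing."""
    # elevenlabs is documented as the default provider.
    provider = settings.get("models", {}).get("tts_provider", "elevenlabs")
    if provider not in REQUIRED_KEY:
        raise ValueError(f"Unknown tts_provider: {provider!r}")
    key_name = REQUIRED_KEY[provider]
    if key_name and not settings.get("api_keys", {}).get(key_name):
        raise ValueError(f"tts_provider {provider!r} needs api_keys.{key_name}")
    return provider


def load_settings(path: str = "storage/settings.json") -> dict:
    """Read the settings file described above."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Calling `check_tts_settings(load_settings())` before starting a long pipeline run would surface a missing key immediately rather than at the TTS step.
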
dubber pipeline <video> # Full pipeline
dubber transcribe <video> # Transcribe only
dubber translate <project_id> # Translate only
dubber tts <project_id> # TTS only
dubber render <project_id> # Render only
dubber project list # List projects
dubber project show <id> # Show project details

import asyncio

from dubber.adapters.openclaw_skill import DubberSkill

skill = DubberSkill({
    "storage_dir": "/path/to/storage",
    "gemini_api_key": "<OpenRouter key>",
    "gemini_base_url": "https://openrouter.ai/api/v1",
    "elevenlabs_api_key": "<ElevenLabs key>",
    "tts_provider": "elevenlabs",
    "tts_model": "eleven_v3",
})

async def main():
    # execute() is awaited, so it must be called from inside an async function
    return await skill.execute("dub_video", {
        "video_path": "video.mp4",
        "target_language": "zh-CN",
    })

result = asyncio.run(main())

dubber/
├── agent/
│ ├── orchestrator.py # Pipeline orchestrator
│ ├── pipeline_helpers.py # Extracted helper functions
│ ├── per_step_optimizer.py # Gemini per-step optimization
│ ├── optimization_loop.py # Multi-round optimization loop
│ └── gemini_analyzer.py # Gemini multimodal analysis
├── adapters/
│ └── openclaw_skill.py # OpenClaw skill wrapper
├── ops/ # 24 atomic operations
├── services/ # Core pipeline services
├── state/ # JSON-file state management
└── config_reader.py # Settings format helpers
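
The tree describes state/ as JSON-file state management but does not show how it works. As a rough sketch of one common approach (not the project's actual code), state can be written to a temporary file and then moved into place, so an interrupted run never leaves a half-written file. `save_state` and `load_state` are hypothetical names.

```python
import json
import os
import tempfile
from pathlib import Path


def save_state(path: Path, state: dict) -> None:
    """Write state as JSON via a temp file, then atomically replace the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if writing or replacing failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_state(path: Path) -> dict:
    """Return the saved state, or an empty dict if nothing has been saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```

`ensure_ascii=False` keeps non-ASCII text, such as zh-CN translations, readable in the stored file.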