DataArcTech/ai-video-dubber

AI Video Dubber

AI-powered video dubbing pipeline with per-step Gemini multimodal optimization.

Features

  • Full dubbing pipeline: transcription → translation → TTS → render
  • Per-step Gemini review (transcription, translation, speaker analysis)
  • 24 atomic operations (timing, translation, emotion, TTS style, volume, subtitles, etc.)
  • Multi-round final optimization with score-regression rollback
  • Voice synthesis with ElevenLabs, OpenAI TTS, or edge-tts
  • Speaker identification & voice assignment
  • Auto-rephrasing of translations that would need an extreme speed-up or slow-down to fit the original timing
  • Background volume control
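The multi-round optimization with score-regression rollback listed above can be illustrated with a small, generic loop. This is a minimal sketch under stated assumptions, not the repository's optimization_loop.py. `optimize_with_rollback`, `improve`, and `score` are hypothetical names: `improve` stands in for one optimization round and `score` for a quality review (for example a Gemini review score). This version keeps a round only if its score does not drop, and stops at the first regression.

```python
from typing import Callable, TypeVar

State = TypeVar("State")


def optimize_with_rollback(
    state: State,
    improve: Callable[[State], State],
    score: Callable[[State], float],
    max_rounds: int = 3,
) -> tuple[State, float]:
    """Run up to max_rounds of improvement, keeping only non-regressing rounds."""
    best_state, best_score = state, score(state)
    for _ in range(max_rounds):
        candidate = improve(best_state)
        candidate_score = score(candidate)
        if candidate_score < best_score:
            # Score regressed: roll back by discarding the candidate, then stop.
            break
        best_state, best_score = candidate, candidate_score
    return best_state, best_score


# Toy run: each round adds 2 and the score peaks at 5, so the third
# round (7) regresses and is rolled back.
final, final_score = optimize_with_rollback(1, lambda s: s + 2, lambda s: -abs(s - 5))
print(final, final_score)  # prints: 5 0
```

Stopping at the first regression is one design choice; a variant could skip a bad round and keep trying until the round budget runs out.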

Quick Start

# Install
pip install -e ".[dev]"

# Configure (create storage/settings.json)
cp storage/settings.example.json storage/settings.json
# Edit it to add your API keys

# Run pipeline
dubber pipeline video.mp4 --target zh-CN

Configuration

Create storage/settings.json with the following structure:

{
  "api_keys": {
    "openai_api_key": "<OpenRouter key>",
    "openai_base_url": "https://openrouter.ai/api/v1",
    "elevenlabs_api_key": "<ElevenLabs key>"
  },
  "models": {
    "tts_provider": "elevenlabs",
    "tts_model": "eleven_v3",
    "asr_provider": "elevenlabs-scribe",
    "translation_model": "google/gemini-2.5-flash"
  },
  "gemini_api_key": "<same as openai_api_key>",
  "gemini_base_url": "https://openrouter.ai/api/v1",
  "gemini_model": "google/gemini-2.5-flash"
}
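If you prefer to generate the settings file from a script instead of editing the example by hand, a minimal sketch follows. It assumes only the structure shown above. `build_settings` and `write_settings` are hypothetical helpers, and reusing the OpenRouter key and URL for the Gemini fields follows the "<same as openai_api_key>" placeholder. Using one model for both translation and Gemini analysis is just this sketch's default.

```python
import json
from pathlib import Path

OPENROUTER_URL = "https://openrouter.ai/api/v1"


def build_settings(openrouter_key: str, elevenlabs_key: str,
                   model: str = "google/gemini-2.5-flash") -> dict:
    """Build a settings dict matching the storage/settings.json structure."""
    return {
        "api_keys": {
            "openai_api_key": openrouter_key,
            "openai_base_url": OPENROUTER_URL,
            "elevenlabs_api_key": elevenlabs_key,
        },
        "models": {
            "tts_provider": "elevenlabs",
            "tts_model": "eleven_v3",
            "asr_provider": "elevenlabs-scribe",
            "translation_model": model,
        },
        # The Gemini fields reuse the OpenRouter key and base URL.
        "gemini_api_key": openrouter_key,
        "gemini_base_url": OPENROUTER_URL,
        "gemini_model": model,
    }


def write_settings(settings: dict, storage_dir: str = "storage") -> Path:
    """Write settings to <storage_dir>/settings.json and return the path."""
    path = Path(storage_dir) / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(settings, indent=2), encoding="utf-8")
    return path
```

For example, `write_settings(build_settings("<OpenRouter key>", "<ElevenLabs key>"))` writes storage/settings.json relative to the current directory.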

Key Fields

Field                        Description
api_keys.openai_api_key      OpenRouter API key (used for translation and rephrasing)
api_keys.openai_base_url     OpenRouter base URL
api_keys.elevenlabs_api_key  ElevenLabs API key for TTS
models.tts_provider          TTS engine: elevenlabs, openai-tts, or edge-tts
models.tts_model             ElevenLabs model: eleven_v3, eleven_turbo_v2_5, etc.
models.asr_provider          ASR provider: elevenlabs-scribe or faster-whisper
models.translation_model     LLM used for translation (OpenRouter model name)
gemini_api_key               Gemini/OpenRouter key for multimodal analysis
gemini_base_url              Gemini API base URL (can reuse the OpenRouter URL)
gemini_model                 Gemini model name
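The dotted field names in the table map directly onto the nested JSON. A small pre-flight check can resolve each path and compare provider values against the options listed in the table. This is a sketch, not the tool's own validation: `get_field` and `validate_settings` are hypothetical, and treating every field as required is an assumption that is likely stricter than the tool (edge-tts, for instance, needs no ElevenLabs key).

```python
from typing import Any

# Every field from the Key Fields table; this sketch treats all of them as required.
FIELDS = [
    "api_keys.openai_api_key",
    "api_keys.openai_base_url",
    "api_keys.elevenlabs_api_key",
    "models.tts_provider",
    "models.tts_model",
    "models.asr_provider",
    "models.translation_model",
    "gemini_api_key",
    "gemini_base_url",
    "gemini_model",
]

# Allowed values taken from the table's descriptions.
ALLOWED = {
    "models.tts_provider": {"elevenlabs", "openai-tts", "edge-tts"},
    "models.asr_provider": {"elevenlabs-scribe", "faster-whisper"},
}


def get_field(settings: dict, dotted: str) -> Any:
    """Resolve a dotted path such as "models.tts_provider" in nested dicts."""
    node: Any = settings
    for part in dotted.split("."):
        if not isinstance(node, dict) or part not in node:
            raise KeyError(dotted)
        node = node[part]
    return node


def validate_settings(settings: dict) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    for field in FIELDS:
        try:
            value = get_field(settings, field)
        except KeyError:
            problems.append(f"missing: {field}")
            continue
        allowed = ALLOWED.get(field)
        if allowed is not None and value not in allowed:
            problems.append(f"invalid {field}: {value!r}")
    return problems
```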

TTS Providers

  • elevenlabs (default): high quality; requires an ElevenLabs API key
  • openai-tts: OpenAI TTS via OpenRouter
  • edge-tts: free Microsoft Edge TTS; no API key required
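Because edge-tts needs no key, it makes a natural fallback when the preferred provider's credentials are missing. The sketch below shows one way to choose a provider from the api_keys section. The fallback policy, `pick_tts_provider`, and `PROVIDER_KEY` are hypothetical and not documented behavior of the tool; mapping openai-tts to openai_api_key is an assumption based on it going through OpenRouter.

```python
# Which api_keys entry each provider is assumed to need (None = no key needed).
PROVIDER_KEY = {
    "elevenlabs": "elevenlabs_api_key",
    "openai-tts": "openai_api_key",  # assumption: uses the OpenRouter key
    "edge-tts": None,
}


def pick_tts_provider(preferred: str, api_keys: dict) -> str:
    """Return `preferred` if its key is configured, otherwise fall back to edge-tts."""
    if preferred not in PROVIDER_KEY:
        raise ValueError(f"unknown TTS provider: {preferred!r}")
    key_name = PROVIDER_KEY[preferred]
    if key_name is None or api_keys.get(key_name):
        return preferred
    # Hypothetical fallback policy: edge-tts works without an API key.
    return "edge-tts"
```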

CLI Commands

dubber pipeline <video>          # Full pipeline
dubber transcribe <video>        # Transcribe only
dubber translate <project_id>    # Translate only
dubber tts <project_id>          # TTS only
dubber render <project_id>       # Render only
dubber project list              # List projects
dubber project show <id>         # Show project details
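The single-step commands can be chained to run the stages one at a time, for example to inspect intermediate results. The following script sketch assumes only the commands listed above. `downstream_commands` and `run_steps` are hypothetical helpers, and "<project_id>" is a placeholder for the ID you get from `dubber project list` after transcribing. Running the steps separately may not reproduce everything `dubber pipeline` does (for example, the per-step Gemini review).

```python
import shlex
import subprocess


def downstream_commands(project_id: str) -> list[list[str]]:
    """CLI steps that follow `dubber transcribe`, in pipeline order."""
    return [
        ["dubber", "translate", project_id],
        ["dubber", "tts", project_id],
        ["dubber", "render", project_id],
    ]


def run_steps(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(argv, check=True)


# Step 1 creates the project; find its ID with `dubber project list`.
run_steps([["dubber", "transcribe", "video.mp4"]])
# "<project_id>" is a placeholder for the ID reported by the CLI.
run_steps(downstream_commands("<project_id>"))
```

The default `dry_run=True` only prints the commands; pass `dry_run=False` to run them.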

OpenClaw Skill

import asyncio

from dubber.adapters.openclaw_skill import DubberSkill

skill = DubberSkill({
    "storage_dir": "/path/to/storage",
    "gemini_api_key": "<OpenRouter key>",
    "gemini_base_url": "https://openrouter.ai/api/v1",
    "elevenlabs_api_key": "<ElevenLabs key>",
    "tts_provider": "elevenlabs",
    "tts_model": "eleven_v3",
})


async def main():
    # skill.execute() is a coroutine, so it must be awaited inside an event loop
    result = await skill.execute("dub_video", {
        "video_path": "video.mp4",
        "target_language": "zh-CN",
    })
    print(result)


asyncio.run(main())
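Since `execute` is async, several videos can be dubbed from one event loop. This is a sketch, assuming only the `execute("dub_video", {...})` call shown above. `dub_many` is a hypothetical helper, and the concurrency cap of 2 is an arbitrary choice to stay gentle on TTS and LLM rate limits. Whether the skill is safe to run concurrently is not stated in this README.

```python
import asyncio
from typing import Any


async def dub_many(skill: Any, videos: list[str], target_language: str,
                   max_concurrent: int = 2) -> list[Any]:
    """Dub several videos, running at most max_concurrent jobs at once."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def dub_one(path: str) -> Any:
        async with semaphore:
            return await skill.execute("dub_video", {
                "video_path": path,
                "target_language": target_language,
            })

    # gather() returns results in the same order as `videos`.
    return await asyncio.gather(*(dub_one(v) for v in videos))
```

With a configured `skill`, a call might look like `asyncio.run(dub_many(skill, ["a.mp4", "b.mp4"], "zh-CN"))`.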

Architecture

dubber/
├── agent/
│   ├── orchestrator.py        # Pipeline orchestrator
│   ├── pipeline_helpers.py    # Extracted helper functions
│   ├── per_step_optimizer.py  # Gemini per-step optimization
│   ├── optimization_loop.py   # Multi-round optimization loop
│   └── gemini_analyzer.py     # Gemini multimodal analysis
├── adapters/
│   └── openclaw_skill.py      # OpenClaw skill wrapper
├── ops/                        # 24 atomic operations
├── services/                   # Core pipeline services
├── state/                      # JSON-file state management
└── config_reader.py            # Settings format helpers
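The state/ package is described only as JSON-file state management. One common way to keep such files consistent if the process dies mid-write is to write to a temporary file in the same directory and atomically swap it into place with `os.replace`. The sketch below shows that technique. `save_state` and `load_state` are hypothetical helpers, not functions from the repository.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Optional


def save_state(path: Path, state: dict) -> None:
    """Atomically write `state` as JSON: temp file first, then os.replace."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(state, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave partial temp files behind
        raise


def load_state(path: Path, default: Optional[dict] = None) -> dict:
    """Read a JSON state file, or return a copy of `default` if it does not exist."""
    path = Path(path)
    if not path.exists():
        return dict(default or {})
    return json.loads(path.read_text(encoding="utf-8"))
```

Readers never see a half-written file: they get either the old version or the new one.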

About

AI Video Dubbing Agent: automatically dubs videos from any source language into any target language, with speaker cloning, Gemini multimodal optimization, and ElevenLabs TTS.
