🎧 Choose your mode, dub in any language, and enjoy crystal-clear vocals.
Bluez-Dubbing is a modular, production-ready pipeline for automatic video dubbing and subtitle generation. It integrates state-of-the-art models for ASR (Automatic Speech Recognition), translation, and TTS (Text-to-Speech), supporting features like:
- audio source separation
- VAD-based duration alignment
- sophisticated dubbing strategies
- customizable subtitle styles
- End-to-End Dubbing: From video/audio input to fully dubbed output with burned-in subtitles.
- Multiple Modes: Video dubbing (with or without subtitles), audio translation, or subtitling only.
- REST API & CLI: FastAPI endpoints and command-line tools for automation.
- Independent Web UI: A dedicated app offering an intuitive experience and live progress tracking. See Web UI for details.
- Modular Services: Easily plug, swap, or extend ASR, translation, and TTS models.
- Flexible Translation: Segment-wise or full-text translation with smart synchronization.
- Advanced Audio Synchronization: Multiple algorithms for seamless and natural voice replacement.
- Subtitle Generation: Netflix-style, bold-desktop, or mobile-optimized SRT/VTT/ASS output.
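To illustrate the idea behind VAD-based duration alignment, here is a minimal sketch (function name and clamp bounds are hypothetical, not taken from the Bluez-Dubbing codebase): each synthesized segment is time-stretched so it fits the original speech slot, within limits that keep the voice natural.

```python
# Illustrative only: compute how much to time-stretch a TTS clip so it
# fills the duration of the original speech segment it replaces.

def stretch_ratio(original_dur: float, tts_dur: float,
                  min_r: float = 0.8, max_r: float = 1.25) -> float:
    """Ratio to apply to the TTS clip, clamped to stay natural-sounding."""
    if tts_dur <= 0:
        return 1.0
    return max(min_r, min(max_r, original_dur / tts_dur))

print(stretch_ratio(2.0, 2.5))  # TTS too long -> stretch clamped to 0.8
```

The clamp matters in practice: unbounded stretching would fit any segment perfectly but at the cost of obviously sped-up or slowed-down speech.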
```
bluez-dubbing/
├── apps/
│   ├── backend/
│   │   ├── cache/               # Cached audio/background/intermediate data
│   │   ├── libs/
│   │   │   └── common-schemas/  # Shared Pydantic models & utilities
│   │   ├── models_cache/        # Downloaded model weights/configs
│   │   ├── outs/                # Output workspaces per job
│   │   ├── services/
│   │   │   ├── asr/             # ASR (WhisperX, etc.)
│   │   │   ├── orchestrator/    # Main API & pipeline logic
│   │   │   ├── translation/     # Translation service
│   │   │   └── tts/             # TTS service
│   │   └── uploads/             # Uploaded media from the UI
│   └── frontend/
│       ├── assets/              # UI icons and branding
│       ├── scripts/             # JS modules for the Web UI
│       ├── styles/              # Stylesheets
│       └── index.html           # Web application entry
├── Makefile
└── README.md
```
```bash
git clone https://github.com/your-org/bluez-dubbing.git
cd bluez-dubbing
```

Ensure ffmpeg and uv are installed. Linux example:

```bash
sudo apt update && sudo apt install ffmpeg -y
sudo apt install uv
```

Note: Some tokenizers (e.g. `mecab-python3` for Japanese) require a JVM to be installed.
To install dependencies for any service:

```bash
cd apps/backend/services/<serviceName>
uv sync
```

Or for all at once:

```bash
make install-dep
```

This sets up `.venv` environments for each service (ASR, translation, TTS, orchestrator).
Dependency notes:

- If `onnx` and `ml_dtypes` conflict, run:

  ```bash
  uv lock --upgrade-package ml_dtypes==0.5.3 && uv sync
  ```

- Chatterbox pins `torch==2.6.0`/`torchaudio==2.6.0`. If your hardware needs newer versions (e.g., RTX 5080 GPUs require ≥ 2.8.0):

  ```bash
  uv pip uninstall torch torchaudio
  uv pip install torch==2.8.0 torchaudio==2.8.0
  ```

  For CUDA wheels (Windows or manual install):

  ```bash
  uv pip install torch==2.8.0 torchaudio==2.8.0 \
    --index-url https://download.pytorch.org/whl/cu12x
  ```

  ⚠️ Don’t re-run `uv sync` afterwards, as it will downgrade again.
- Copy `.env.example` → `.env`
- Set required variables (`HF_TOKEN`, `ORCHESTRATOR_ALLOWED_ORIGINS`, etc.)
- Place model weights in `models_cache/`
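A minimal `.env` sketch, using only the two variables named above (values are placeholders; see `.env.example` for the full list):

```
HF_TOKEN=hf_your_token_here
ORCHESTRATOR_ALLOWED_ORIGINS=http://localhost:5173
```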
```bash
make start-api   # Launch orchestrator only
make stack-up    # Launch ASR, translation, TTS, orchestrator
make stop        # Stop all services
make restart     # Restart everything
```

To start the Web UI:

```bash
make start-ui
```

Default URL: http://localhost:5173
The UI connects to the backend at http://localhost:8000/api. To change it:

```js
localStorage.setItem("bluez-backend-base", "https://your-host/api");
```

Restart or stop with:

```bash
make restart-ui
make stop
```

See CONTRIBUTING for a full explanation of parameters and tuning guidance. Defaults work for most cases, and the models automatically adjust when needed.
After serving the frontend:
- Upload a file or paste a video link (YouTube, Instagram, TikTok…)
- Adjust model and dubbing parameters (or use auto-selection), then hit Run Dubbing Pipeline
- Watch live logs (ASR → Translation → TTS → Merge)
- Preview or download results
- Choose Lazy Mode (fully automatic) or Involve Mode (manual fine-tuning)
- Toggle “Keep Intermediate Artefacts” to retain separated tracks or transcripts
```bash
curl -X POST -G 'http://localhost:8000/v1/dub' \
  --data-urlencode 'video_url=/path/to/video.mp4' \
  --data-urlencode 'target_work=dub' \
  --data-urlencode 'target_langs=fr' \
  --data-urlencode 'asr_model=whisperx' \
  --data-urlencode 'tr_model=deep_translator' \
  --data-urlencode 'tts_model=edge_tts' \
  --data-urlencode 'perform_vad_trimming=true' \
  --data-urlencode 'dubbing_strategy=full_replacement' \
  --data-urlencode 'sophisticated_dub_timing=true' \
  --data-urlencode 'subtitle_style=netflix_mobile' \
  --data-urlencode 'persist_intermediate=false'
```

Outputs are saved to `apps/backend/outs/<workspace_id>/`.
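The same request can be scripted from Python. The sketch below only builds the URL-encoded query string for the `/v1/dub` call; the `build_dub_url` helper is hypothetical, but the parameter names are taken verbatim from the curl example above:

```python
from urllib.parse import urlencode

BASE = "http://localhost:8000/v1/dub"

# Defaults mirror the curl example above; override any of them per call.
DEFAULTS = {
    "target_work": "dub",
    "target_langs": "fr",
    "asr_model": "whisperx",
    "tr_model": "deep_translator",
    "tts_model": "edge_tts",
    "perform_vad_trimming": "true",
    "dubbing_strategy": "full_replacement",
    "sophisticated_dub_timing": "true",
    "subtitle_style": "netflix_mobile",
    "persist_intermediate": "false",
}

def build_dub_url(video_url: str, **overrides) -> str:
    """Return the full /v1/dub URL with percent-encoded query parameters."""
    params = {"video_url": video_url, **DEFAULTS, **overrides}
    return f"{BASE}?{urlencode(params)}"

print(build_dub_url("/path/to/video.mp4", target_langs="de"))
```

With the orchestrator running, the resulting URL can be POSTed with any HTTP client (e.g. `requests.post(url)`).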
Each microservice has its own CLI for debugging or running isolated stages:

```bash
# ASR
uv run python -m services.asr.cli /path/to/audio.wav --output-json asr.json

# Translation
uv run python -m services.translation.cli asr.json --target-lang fr --output-json translation.json

# TTS
uv run python -m services.tts.cli translation.json --workspace ./tts_out --output-json tts.json
```

Run `--help` on any CLI for available flags.
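The three stage CLIs can be chained from a small script. This is a sketch assuming the exact commands shown above, run from `apps/backend` with each service's environment already synced; the `run_pipeline` helper is hypothetical:

```python
import subprocess

# Commands copied from the CLI examples above; each stage reads the
# JSON written by the previous one.
STAGES = [
    ["uv", "run", "python", "-m", "services.asr.cli", "input.wav",
     "--output-json", "asr.json"],
    ["uv", "run", "python", "-m", "services.translation.cli", "asr.json",
     "--target-lang", "fr", "--output-json", "translation.json"],
    ["uv", "run", "python", "-m", "services.tts.cli", "translation.json",
     "--workspace", "./tts_out", "--output-json", "tts.json"],
]

def run_pipeline(dry_run: bool = True) -> list:
    """Run (or, in dry-run mode, just print) each stage in order."""
    rendered = [" ".join(cmd) for cmd in STAGES]
    for cmd, line in zip(STAGES, rendered):
        if dry_run:
            print(line)
        else:
            subprocess.run(cmd, check=True)
    return rendered

run_pipeline()
```

Running stages one at a time like this is useful for inspecting intermediate JSON before committing to a full TTS pass.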
Run tests via:

```bash
make test
```

Includes:
- unit tests for service CLIs
- registry validation (ensures all registered models run properly)
- end-to-end integration test for the orchestrator pipeline
GitHub Actions workflow (.github/workflows/ci.yml) automatically:
- sets up Python 3.11 + `uv`
- runs `make test`
- validates model registries and pipeline integration
Ensure your PRs keep all tests green.
- ASR: WhisperX
- Translation: deep-translator, M2M100, etc.
- TTS: Edge TTS, Chatterbox, and more
- ASR: WhisperX out of the box; extend via `services/asr/app/registry.py`.
- Translation: `deep_translator`, M2M100, and pluggable custom translators.
- TTS: Edge TTS, Chatterbox, plus any custom registry entry.
See libs/common-schemas/config/ for model configs and supported languages.
Add new models via each service’s `registry.py` and model folder; see CONTRIBUTING.md for more details.
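As an illustration, a model registry often boils down to a name-to-factory mapping plus a registration decorator. The sketch below is hypothetical (the real interface lives in each service's `registry.py`; check CONTRIBUTING.md before relying on any of these names):

```python
from typing import Callable, Dict

REGISTRY: Dict[str, Callable] = {}

def register(name: str) -> Callable:
    """Decorator: map a model name to its class/factory."""
    def deco(factory: Callable) -> Callable:
        REGISTRY[name] = factory
        return factory
    return deco

@register("my_tts")
class MyTTS:
    def synthesize(self, text: str, lang: str) -> bytes:
        raise NotImplementedError  # plug your model's synthesis in here

print(sorted(REGISTRY))  # names a pipeline could now look up
```

The benefit of this pattern is that the orchestrator can select models purely by name from request parameters (e.g. `tts_model=edge_tts`) without importing every backend up front.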
Contributions are welcome! Please read CONTRIBUTING.md before submitting PRs or issues.
Licensed under the Apache License 2.0.
Thanks to these open-source projects:
Contact: 📧 contactglobluez@gmail.com