Merging subtitles using only the nearest timestamp often leads to incorrect pairings: lines may end up out of sync, duplicated, or mismatched.
This Python tool uses semantic similarity (via Sentence Transformers) to align subtitle lines based on meaning instead of timestamps, making it possible to pair subtitles across different languages.
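To give a sense of what "aligning by meaning" looks like in practice, here is a minimal, illustrative sketch (not the tool's internal code): it embeds a few made-up subtitle lines with a Sentence Transformer model and compares them by cosine similarity, so that translations of the same line score highest.

```python
# Illustrative sketch only, not duosubs internals.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

primary = ["I have to leave now.", "The storm is getting closer."]
secondary = ["Ich muss jetzt gehen.", "Der Sturm kommt näher."]  # German translations

# Embeddings live in a shared multilingual space, so lines with the same
# meaning end up close together even if their timestamps never overlap.
emb_p = model.encode(primary, normalize_embeddings=True)
emb_s = model.encode(secondary, normalize_embeddings=True)

similarity = emb_p @ emb_s.T  # cosine similarity matrix, shape (2, 2)
print(similarity)             # diagonal entries should dominate their rows
```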
- Aligns subtitle lines based on meaning, not timing
- Multilingual support, depending on the selected Sentence Transformer model
- Flexible format support: works with SRT, VTT, MPL2, TTML, ASS, and SSA files
- Easy-to-use Python API for integration
- Command-line interface with customizable options
- Web UI: run locally or in the cloud via Google Colab or Hugging Face Spaces
You can launch the Web UI instantly without installing anything locally by running it in the cloud.
Note
- Google Colab has a limited runtime allocation, especially when using the free instance.
- On Hugging Face Spaces, only a few models are preloaded, and inference can be slower because it runs on CPU.
- Install the correct version of PyTorch for your system by following the official instructions: https://pytorch.org/get-started/locally
- Install this repo via pip:

  ```bash
  pip install duosubs
  ```
You can launch the Web UI locally:

- via command line

  ```bash
  duosubs launch-webui
  ```

- via Python API

  ```python
  from duosubs import create_duosubs_gr_blocks

  # Build the Web UI layout (Gradio Blocks)
  webui = create_duosubs_gr_blocks()

  # These commands work just like launching a regular Gradio app
  webui.queue(default_concurrency_limit=None)  # Allow unlimited concurrent requests
  webui.launch(inbrowser=True)                 # Start the Web UI and open it in a browser tab
  ```
This starts the server, prints its URL (e.g. http://127.0.0.1:7860), and then opens the Web UI in a new browser tab.
If you want to launch it on a different host (e.g. 0.0.0.0) and port (e.g. 8000), you can run:
- via command line

  ```bash
  duosubs launch-webui --host 0.0.0.0 --port 8000
  ```

- via Python API

  ```python
  from duosubs import create_duosubs_gr_blocks

  webui = create_duosubs_gr_blocks()
  webui.queue(default_concurrency_limit=None)
  webui.launch(
      server_name="0.0.0.0",  # use a different address
      server_port=8000,       # use a different port number
      inbrowser=True
  )
  ```
Warning
- The Web UI caches files during processing and clears files older than 2 hours once every hour. Cached data may remain if the server stops unexpectedly.
- Sometimes, a previously loaded model may fail to be released from memory after switching models or closing sessions. If you run out of RAM or VRAM, simply restart the script.
To learn more about the launching options, please see the Launch Web UI Command and Web UI Launching sections of the documentation.
With the demo files provided, here is the simplest way to merge the subtitles:
- via command line

  ```bash
  duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt
  ```

- via Python API

  ```python
  from duosubs import MergeArgs, run_merge_pipeline

  # Store all arguments
  args = MergeArgs(
      primary="demo/primary_sub.srt",
      secondary="demo/secondary_sub.srt"
  )

  # Load, merge, and save subtitles.
  run_merge_pipeline(args, print)
  ```
These commands will produce primary_sub.zip, with the following structure:

```
primary_sub.zip
├── primary_sub_combined.ass   # Merged subtitles
├── primary_sub_primary.ass    # Original primary subtitles
└── primary_sub_secondary.ass  # Time-shifted secondary subtitles
```
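If you want to work with the merged files programmatically, a small sketch like the following could unpack the archive; it assumes primary_sub.zip was written to the current working directory.

```python
# Illustrative sketch: inspect and extract the generated archive.
from zipfile import ZipFile

with ZipFile("primary_sub.zip") as archive:
    print(archive.namelist())          # the three .ass files listed above
    archive.extractall("merged_subs")  # unpack them into a folder
```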
By default, the Sentence Transformer model used is LaBSE.
If you want to experiment with different models, pick one from Hugging Face or check the leaderboard for top-performing models.
For example, if the model chosen is Qwen/Qwen3-Embedding-0.6B, you can run:
- via command line

  ```bash
  duosubs merge -p demo/primary_sub.srt -s demo/secondary_sub.srt --model Qwen/Qwen3-Embedding-0.6B
  ```

- via Python API

  ```python
  from duosubs import MergeArgs, run_merge_pipeline

  # Store all arguments
  args = MergeArgs(
      primary="demo/primary_sub.srt",
      secondary="demo/secondary_sub.srt",
      model="Qwen/Qwen3-Embedding-0.6B"
  )

  # Load, merge, and save subtitles.
  run_merge_pipeline(args, print)
  ```
- via Web UI

  In Configurations → Model & Device → Sentence Transformer Model, replace `sentence-transformers/LaBSE` with `Qwen/Qwen3-Embedding-0.6B`.
Warning
- Some models may require significant RAM or GPU (VRAM) to run, and might not be compatible with all devices, especially larger models.
- Also, please ensure the selected model supports your desired language for reliable results.
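One quick way to gauge language support is to check whether a candidate model places a line and its translation close together. The sketch below is only an illustration; the model name reuses the example above, and the sentences are made up.

```python
# Illustrative sanity check for a candidate model's cross-lingual quality.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Qwen/Qwen3-Embedding-0.6B")

line_en = "Where are you going?"
line_ja = "どこへ行くの？"        # the same line in Japanese
line_other = "The soup is cold."  # an unrelated line

emb = model.encode([line_en, line_ja, line_other], normalize_embeddings=True)
print("translation pair:", float(emb[0] @ emb[1]))
print("unrelated pair:  ", float(emb[0] @ emb[2]))
# If the translation pair does not score clearly higher, consider another model.
```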
This tool also has three merging modes: `synced`, `mixed`, and `cuts`.
Here are some simple guidelines for choosing the appropriate mode:
- If both subtitle files are timestamp-synced, use `synced` for the cleanest result.
- If timestamps drift or only partially overlap, use `mixed`.
- If the subtitles come from different editions of the video, with the primary subtitles being the extended or longer version, use `cuts`.
To merge with a specific mode (e.g. `cuts`), run:
- via command line

  ```bash
  duosubs merge -p primary_sub.srt -s secondary_sub.srt --mode cuts
  ```

- via Python API

  ```python
  from duosubs import MergeArgs, MergingMode, run_merge_pipeline

  # Store all arguments
  args = MergeArgs(
      primary="primary_sub.srt",
      secondary="secondary_sub.srt",
      merging_mode=MergingMode.CUTS  # Modes available: MergingMode.SYNCED, MergingMode.MIXED, MergingMode.CUTS
  )

  # Load, merge, and save subtitles.
  run_merge_pipeline(args, print)
  ```
- via Web UI

  In Configurations → Alignment Behavior → Merging Mode, choose `Cuts`.
Tip
For `mixed` and `cuts` modes, try to use subtitle files without scene annotations if possible, as they may reduce alignment quality.
To learn more about the merging options, please see the Merge Command and Core Subtitle Merging sections of the documentation.
- Parse subtitles and detect language.
- Tokenize subtitle lines.
- Extract and filter non-overlapping subtitles (`synced` mode only).
- Estimate tokenized subtitle pairings using DTW (see the sketch after this list).
- Refine the alignment using a sliding-window approach with a window size of 3.
- Extract and filter extended subtitles from the primary track (`cuts` mode only).
- Refine the alignment using a sliding-window approach with a window size of 2.
- Combine the aligned subtitles with the non-overlapping or extended subtitles.
- Eliminate unnecessary newlines within subtitle lines.
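To make the DTW step more concrete, here is a minimal sketch, not the exact duosubs implementation: fastdtw (one of the project's dependencies) finds a monotonic pairing between two embedded token sequences using cosine distance. The example sentences are made up.

```python
# Illustrative sketch of pairing two tokenized subtitle tracks with DTW.
import numpy as np
from fastdtw import fastdtw
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")

primary_tokens = ["I can't stay.", "They found the ship.", "Run!"]
secondary_tokens = ["Ich kann nicht bleiben.", "Sie haben das Schiff gefunden.", "Lauf!"]

emb_p = model.encode(primary_tokens, normalize_embeddings=True)
emb_s = model.encode(secondary_tokens, normalize_embeddings=True)

# Cosine distance between two normalized embeddings.
def cosine_dist(a, b):
    return 1.0 - float(np.dot(a, b))

_, path = fastdtw(emb_p, emb_s, dist=cosine_dist)
print(path)  # monotonic list of (primary_index, secondary_index) pairings
```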
- The accuracy of the merging process varies with the model selected.
- Some models may produce unreliable results for unsupported or low-resource languages.
- Some sentence fragments from the secondary subtitles may be misaligned with the primary subtitle lines due to the tokenization algorithm used.
- Secondary subtitles might contain extra whitespace as a result of token-level merging (a simple cleanup sketch follows this list).
- In `mixed` and `cuts` modes, the algorithm may not work reliably, since matching lines have no timestamp overlap and either subtitle could contain extra or missing lines.
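If the extra whitespace bothers you, a small post-processing pass with pysubs2 (already a dependency of this project) can collapse it. This is only a hedged sketch, not part of duosubs, and the file name is an example.

```python
# Illustrative cleanup sketch: collapse runs of spaces/tabs in a merged file.
import re
import pysubs2

subs = pysubs2.load("primary_sub_combined.ass")
for event in subs:
    event.text = re.sub(r"[ \t]{2,}", " ", event.text).strip()
subs.save("primary_sub_combined_clean.ass")
```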
This project wouldn't be possible without the incredible work of the open-source community. Special thanks to:
- sentence-transformers, for the semantic embedding backbone
- Hugging Face, for hosting models and making them easy to use
- PyTorch, for providing the deep learning framework
- fastdtw, for aligning the subtitles
- hmmlearn, for denoising sequences
- lingua-py, for detecting the subtitles' language codes
- pysubs2, for subtitle file I/O utilities
- charset_normalizer, for identifying the file encoding
- typer, for the CLI application
- tqdm, for displaying the progress bar
- gradio, for creating the Web UI application
- Tears of Steel, whose subtitles are used for demo, testing, and development purposes. Created by the Blender Foundation, licensed under CC BY 3.0.
Contributions are welcome! If you'd like to submit a pull request, please check out the contributing guidelines.
Apache-2.0 license - see the LICENSE file for details.