Bump version to 1.1.0, update README for 3-way ensemble, fix license format
- Update README: 3-way default, anti-hallucination flags, distil-large-v3,
A/B/C adjudication, Distil-Whisper acknowledgment
- Bump version to 1.1.0 in pyproject.toml and __init__.py (were inconsistent)
- Fix license field to SPDX string format (was deprecated table format)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
README.md: 16 additions & 12 deletions
@@ -12,8 +12,8 @@ The approach applies principles from [textual criticism](https://en.wikipedia.or
 
 - **Critical text merging**: Combines 2–3+ transcript sources into the most accurate version using blind, anonymous presentation to an LLM — no source receives preferential treatment
 - **wdiff-based alignment**: Uses longest common subsequence alignment (via `wdiff`) to keep chunks properly aligned across sources of different lengths, replacing naive proportional slicing
-- **Multi-model Whisper ensembling**: Runs multiple Whisper models (e.g., small + medium) and resolves disagreements via LLM
-- **Hallucination detection**: Automatically detects and collapses Whisper repetition loops (e.g., a phrase repeated 60+ times) in both raw outputs and merged transcripts
+- **Multi-model Whisper ensembling**: Runs multiple Whisper models (default: small + medium + distil-large-v3) and resolves disagreements via LLM with anonymous A/B/C labels
+- **Anti-hallucination**: Whisper runs use `condition_on_previous_text=False` and other flags to prevent cascading hallucination; residual repetition loops are automatically detected and collapsed
 - **External transcript support**: Merges in human-edited transcripts (e.g., from publisher websites) as an additional source
 - **Structured transcript preservation**: When external transcripts have speaker labels and timestamps, the merged output preserves that structure
 - **Slide extraction and analysis**: Automatic scene detection for presentation slides, with optional vision API descriptions
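Note on the Anti-hallucination bullet in this hunk: the README text says residual repetition loops are detected and collapsed, but the diff does not show how. The following is a minimal sketch of one way this could work, not the project's implementation. The function name `collapse_repetitions` and both thresholds are hypothetical.

```python
def collapse_repetitions(text: str, max_phrase_words: int = 8, min_repeats: int = 3) -> str:
    """Collapse a phrase repeated back-to-back into a single occurrence.

    ASSUMPTION: a "repetition loop" is any phrase of up to `max_phrase_words`
    words that occurs `min_repeats` or more times in a row. Both thresholds
    are illustrative. Whitespace is normalized to single spaces.
    """
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        collapsed = False
        # Try the shortest candidate phrase first, then longer ones.
        for n in range(1, max_phrase_words + 1):
            phrase = words[i:i + n]
            if len(phrase) < n:
                break  # not enough words left for a phrase of this length
            repeats = 1
            while words[i + repeats * n:i + (repeats + 1) * n] == phrase:
                repeats += 1
            if repeats >= min_repeats:
                out.extend(phrase)   # keep one copy of the phrase
                i += repeats * n     # skip every repeated copy
                collapsed = True
                break
        if not collapsed:
            out.append(words[i])
            i += 1
    return " ".join(out)


looped = "thank you " * 60 + "goodbye"
print(collapse_repetitions(looped))  # thank you goodbye
```

A phrase repeated only twice is left alone under these thresholds, so ordinary emphasis such as "very very" survives.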
@@ -247,13 +249,14 @@ Each source alone gets some things right and others wrong. Whisper hallucinates
 
 ### Multi-Model Whisper Merging
 
-When using multiple Whisper models (default: `small,medium`):
+When using multiple Whisper models (default: `small,medium,distil-large-v3`):
 
-1. Runs each model independently
-2. Uses `wdiff` to identify specific word-level differences (normalized: no caps, no punctuation)
-3. Clusters nearby differences and presents each cluster to an LLM with anonymous labels ("A" / "B") and surrounding context
-4. The LLM picks A or B for each disagreement — constrained to choose between actual transcriptions, preventing hallucinated text
-5. Chosen readings are surgically applied to the base transcript, leaving uncontested regions untouched
+1. Runs each model independently with anti-hallucination flags
+2. Uses `wdiff` to identify specific word-level differences between each non-base model and the base (largest model)
+3. For 3+ models, merges pairwise diffs at the same positions into unified diffs with per-model readings
+4. Clusters nearby differences and presents each cluster to an LLM with anonymous labels (A/B or A/B/C) and surrounding context — model names are never revealed
+5. The LLM picks a letter for each disagreement — constrained to choose between actual transcriptions, preventing hallucinated text
+6. Chosen readings are surgically applied to the base transcript, leaving uncontested regions untouched
 
 This targeted diff resolution avoids the problems of full-text rewriting (chunk-boundary duplication, errors in uncontested regions, wasted tokens). The implementation runs Whisper-vs-Whisper adjudication first to produce a single merged Whisper witness (`whisper_merged.txt`), which then enters the multi-source merge alongside captions and external transcripts.
 
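Note on the new steps 2, 3, 5 and 6 in this hunk: the sketch below illustrates pairwise diffing against a base, merging diffs that share a position, anonymous labelling, and applying the chosen readings. It is an illustration under stated assumptions, not the project's code. It uses Python's `difflib.SequenceMatcher` in place of `wdiff`, all function names are hypothetical, and the clustering step and the LLM call are omitted (a plain `choices` dict stands in for the LLM's answers). It only merges diffs whose base spans are identical; overlapping spans of different extent are not handled.

```python
import difflib
import string


def normalize(word: str) -> str:
    """Lowercase and strip punctuation so 'Hello,' and 'hello' compare equal."""
    return word.lower().strip(string.punctuation)


def pairwise_diffs(base: list[str], other: list[str]) -> dict[tuple[int, int], list[str]]:
    """Map each differing base span (start, end) to the other model's words."""
    matcher = difflib.SequenceMatcher(
        a=[normalize(w) for w in base],
        b=[normalize(w) for w in other],
        autojunk=False,
    )
    return {
        (i1, i2): other[j1:j2]
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    }


def unified_diffs(base: list[str], others: list[list[str]]) -> dict[tuple[int, int], list[list[str]]]:
    """Merge pairwise diffs at the same base span into one list of readings.

    The base reading comes first and duplicate readings are dropped.
    ASSUMPTION: a real pipeline would likely shuffle the order so that the
    letter "A" does not always identify the base model.
    """
    merged: dict[tuple[int, int], list[list[str]]] = {}
    for other in others:
        for span, reading in pairwise_diffs(base, other).items():
            readings = merged.setdefault(span, [base[span[0]:span[1]]])
            if reading not in readings:
                readings.append(reading)
    return merged


def label_readings(readings: list[list[str]]) -> dict[str, list[str]]:
    """Attach anonymous letters; model names never appear."""
    return dict(zip("ABC", readings))


def apply_choices(base, merged, choices):
    """Splice chosen readings into the base, right to left so indices stay valid."""
    result = list(base)
    for span in sorted(merged, reverse=True):
        result[span[0]:span[1]] = label_readings(merged[span])[choices[span]]
    return result


base = "we met at the bank".split()      # stands in for the largest model
model_1 = "we met at the bang".split()
model_2 = "we met at the band".split()

merged = unified_diffs(base, [model_1, model_2])
print(label_readings(merged[(4, 5)]))    # {'A': ['bank'], 'B': ['bang'], 'C': ['band']}
print(" ".join(apply_choices(base, merged, {(4, 5): "C"})))  # we met at the band
```

Because the chooser can only return a letter, the output is always one of the transcribed readings, which is the constraint step 5 describes.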
@@ -285,7 +288,7 @@ Every stage checks `is_up_to_date(output, *inputs)` — if the output file is ne
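Note on the hunk header above: the context line names `is_up_to_date(output, *inputs)` but is truncated, and the hunk body is not shown here. The sketch below shows one plausible reading, assuming the check compares file modification times. The body is an assumption; only the call signature comes from the diff.

```python
import os
import tempfile
from pathlib import Path


def is_up_to_date(output, *inputs) -> bool:
    """Return True if `output` exists and is at least as new as every input.

    ASSUMPTION: freshness is judged by st_mtime; the real function may use
    different rules (for example, for missing inputs).
    """
    output = Path(output)
    if not output.exists():
        return False
    out_mtime = output.stat().st_mtime
    return all(Path(p).stat().st_mtime <= out_mtime for p in inputs)


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "audio.wav"       # hypothetical stage input
    result = Path(tmp) / "transcript.txt"  # hypothetical stage output
    source.write_text("input")
    print(is_up_to_date(result, source))   # False: output does not exist yet
    result.write_text("output")
    os.utime(source, (1000, 1000))         # pin mtimes so the order is explicit
    os.utime(result, (2000, 2000))
    print(is_up_to_date(result, source))   # True: output is newer than input
```

A stage would skip its work when this returns True and rerun otherwise.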