@louisjoecodes louisjoecodes commented Oct 27, 2025

  • Enables speaker diarization by default for ElevenLabs Scribe to handle multi-speaker audio correctly
  • Extracts dominant speaker (most words spoken) from multi-speaker transcriptions
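The dominant-speaker step can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function name `dominant_speaker_text` is hypothetical, and it assumes each diarized word arrives as a dict with `text` and `speaker_id` keys (plus an optional `type` key distinguishing words from spacing or audio events).

```python
from collections import Counter


def dominant_speaker_text(words):
    """Return the text spoken by whichever speaker has the most words.

    `words` is assumed to be a list of dicts with "text" and "speaker_id"
    keys; entries whose optional "type" is not "word" are ignored.
    """
    spoken = [w for w in words if w.get("type", "word") == "word"]
    counts = Counter(w["speaker_id"] for w in spoken)
    if not counts:
        return ""
    # most_common(1) yields [(speaker_id, word_count)] for the top speaker.
    dominant = counts.most_common(1)[0][0]
    return " ".join(w["text"] for w in spoken if w["speaker_id"] == dominant)


example = [
    {"text": "hello", "speaker_id": "speaker_0"},
    {"text": "hi", "speaker_id": "speaker_1"},
    {"text": "there", "speaker_id": "speaker_0"},
]
print(dominant_speaker_text(example))  # hello there
```

Counting words rather than speaking time keeps the heuristic independent of timestamp accuracy, at the cost of favoring fast talkers.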

Results

AMI dataset (200 samples):

  • Before: 14.43% WER
  • After: 10.13% WER
  • Improvement: about 30% relative WER reduction (4.30 percentage points absolute)
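For reference, the relative figure follows directly from the two WER values above. A small sketch of the arithmetic (the helper name `relative_wer_reduction` is illustrative only):

```python
def relative_wer_reduction(before, after):
    """Fraction by which WER dropped, relative to the starting WER."""
    return (before - after) / before


reduction = relative_wer_reduction(14.43, 10.13)
print(f"{reduction:.1%}")  # 29.8%
```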

ElevenLabs Scribe was transcribing all speakers in multi-speaker audio (e.g., AMI meeting recordings), while benchmark ground truth contains only single-speaker utterances. This caused artificially high WER despite accurate transcription.
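To see why extra speakers inflate WER even when every word is transcribed correctly, here is a toy sketch using a plain word-level edit distance. The sentences are made up for illustration and are not taken from AMI, and the `wer` helper is a simplified stand-in for whatever scorer the benchmark actually uses (no text normalization is applied).

```python
def wer(reference, hypothesis):
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


reference = "we should start the meeting"
all_speakers = "we should start the meeting okay sure"  # second speaker included
dominant_only = "we should start the meeting"

print(wer(reference, all_speakers))   # 0.4
print(wer(reference, dominant_only))  # 0.0
```

The second speaker's two words count as insertions against a five-word reference, so a perfectly accurate transcript still scores 40% WER in this toy case.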

…asets

ElevenLabs Scribe was transcribing all speakers in multi-speaker audio (e.g., AMI meetings), while benchmarks expected only the dominant speaker. This caused artificially high WER (14.43% on AMI).

Changes:
- Enable diarization by default for ElevenLabs Scribe
- Extract dominant speaker (most words spoken) from transcription
- Fix language_code parameter (en vs eng)

Results on AMI dataset (200 samples):
- Before: 14.43% WER
- After: 10.13% WER (30% relative improvement)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>