
As an admin, I can configure the diarization clustering threshold so that speaker merging is reduced #14

@julien731

Description


User Story

As an administrator, I can configure the diarization clustering threshold via an environment variable so that the system can be tuned to favor splitting speakers over merging them.

Spec: docs/specs/transcription-quality-improvements.md — US-5

Size: L

Acceptance Criteria

  • DIARIZATION_THRESHOLD env var controls the clustering sensitivity
  • Default value (0.715) preserves current diarization behavior
  • Lower values reduce speaker merging (at the cost of potential over-splitting)
  • Given the threshold is set too low, then speakers may be over-split — this is documented with a recommended range (0.4–0.8)
  • Requires replacing WhisperX's built-in diarization wrapper with direct pyannote Pipeline usage
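The environment-variable criteria above could be handled by a small config helper. The following is a minimal sketch, assuming the value is read once at startup. The function name `get_diarization_threshold` and the choice to fall back to the default (with a logged warning) on invalid input are hypothetical; the default of 0.715 and the 0.4–0.8 recommended range come from this issue.

```python
import logging
import math
import os

logger = logging.getLogger(__name__)

# Values from the issue: default 0.715, recommended range 0.4-0.8.
DEFAULT_DIARIZATION_THRESHOLD = 0.715
RECOMMENDED_RANGE = (0.4, 0.8)


def get_diarization_threshold(env=None):
    """Return the clustering threshold from DIARIZATION_THRESHOLD.

    Hypothetical helper. Falls back to the default when the variable is
    unset, blank, not a number, or not finite. Logs a warning (but still
    returns the value) when it lies outside the recommended range.
    """
    env = os.environ if env is None else env
    raw = env.get("DIARIZATION_THRESHOLD", "").strip()
    if not raw:
        return DEFAULT_DIARIZATION_THRESHOLD
    try:
        value = float(raw)
    except ValueError:
        value = float("nan")
    if not math.isfinite(value):
        logger.warning(
            "Invalid DIARIZATION_THRESHOLD %r; using default %s",
            raw, DEFAULT_DIARIZATION_THRESHOLD,
        )
        return DEFAULT_DIARIZATION_THRESHOLD
    low, high = RECOMMENDED_RANGE
    if not low <= value <= high:
        # Too low risks over-splitting; too high risks merging speakers.
        logger.warning(
            "DIARIZATION_THRESHOLD=%s is outside the recommended range %s-%s",
            value, low, high,
        )
    return value
```

Falling back instead of refusing to start is a judgment call: it keeps a typo in the environment from taking transcription offline, at the cost of silently ignoring the admin's intent beyond a log line. Failing fast at startup would be an equally reasonable choice.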

Truths

  • T1: This is an admin-only setting (env var), not exposed in the upload form
  • T2: This story involves the largest refactor — replacing the WhisperX diarization wrapper with direct pyannote pipeline calls
  • T3: All existing diarization functionality must continue to work identically at the default threshold
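The refactor in T2 amounts to changing one hyperparameter on a pyannote pipeline and then reproducing the segment format the WhisperX wrapper used to emit, so that T3 holds. The sketch below makes several assumptions: the pipeline exposes pyannote's `parameters(instantiated=True)` and `instantiate(params)` methods with a nested `clustering` / `threshold` entry; calling the pipeline returns an `Annotation` (as in pyannote.audio 3.x); and downstream code needs only start, end, and speaker per turn. Both helper names are hypothetical, and the functions are duck-typed so they can be exercised without downloading a model.

```python
import copy
from typing import Any, Dict, List


def apply_clustering_threshold(pipeline: Any, threshold: float) -> Any:
    """Re-instantiate a diarization pipeline with a new clustering threshold.

    Hypothetical helper. Assumes a pyannote-style pipeline exposing
    parameters(instantiated=True), instantiate(params), and a nested
    "clustering" -> "threshold" hyperparameter.
    """
    # Copy the currently instantiated hyperparameters so every other value
    # (segmentation settings, min_cluster_size, ...) stays exactly as loaded.
    params = copy.deepcopy(pipeline.parameters(instantiated=True))
    clustering = params.get("clustering")
    if not isinstance(clustering, dict) or "threshold" not in clustering:
        raise KeyError("pipeline has no 'clustering.threshold' parameter")
    # Only the clustering threshold changes; lower values merge fewer clusters.
    clustering["threshold"] = threshold
    return pipeline.instantiate(params)


def annotation_to_segments(annotation: Any) -> List[Dict[str, Any]]:
    """Flatten diarization output into start/end/speaker rows.

    Hypothetical helper. Assumes an Annotation-like object whose
    itertracks(yield_label=True) yields (segment, track, label) triples.
    """
    rows: List[Dict[str, Any]] = []
    for segment, _track, speaker in annotation.itertracks(yield_label=True):
        rows.append({"start": segment.start, "end": segment.end, "speaker": speaker})
    return rows
```

In use, the pipeline would be loaded once with `Pipeline.from_pretrained(...)` from `pyannote.audio` and passed through `apply_clustering_threshold` with the value read from DIARIZATION_THRESHOLD. The rows could then be wrapped in a pandas DataFrame for WhisperX's `assign_word_speakers`, which currently consumes the wrapper's DataFrame output; the exact columns it reads should be checked against the installed WhisperX version. Because different pyannote pipeline versions may ship different tuned default thresholds, it is worth confirming that 0.715 matches the loaded pipeline's own value before relying on T3.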

Business Rules

None directly. Addresses Goal #2 (reduce speaker merging errors).

Edge Cases

  • DIARIZATION_THRESHOLD set too low → over-splitting (one speaker appears as multiple). Documented with recommended range (0.4–0.8).
  • Default threshold (0.715) → behavior identical to current

Dependencies

None strictly, but recommended to implement last due to refactor scope (replaces WhisperX diarization wrapper).

Metadata


Assignees

No one assigned

Labels

feature (New feature or enhancement)

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
