As a user, I can have noisy audio automatically cleaned before transcription so that accuracy improves #13

@julien731

Description

User Story

As a user uploading a meeting recorded in a large room, I can have the system automatically clean up the audio before transcription so that distant speakers and background noise don't degrade accuracy.

Spec: docs/specs/transcription-quality-improvements.md — US-1

Size: M

Acceptance Criteria

  • Uploaded audio undergoes preprocessing before transcription: high-pass filter, noise reduction, and loudness normalization
  • Preprocessing is enabled by default
  • The upload form includes a toggle/checkbox to disable preprocessing per upload
  • Given a clean, close-mic recording, when preprocessing is enabled, then audio quality is not degraded (noise reduction stays conservative)
  • The preprocess_audio field is persisted in MeetingMetadata
  • New dependencies: noisereduce (>=3.0.0) and pyloudnorm (>=0.1.1)
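
The three preprocessing stages listed above could be chained roughly as follows. This is a minimal sketch, not the implementation the spec mandates: the function names, the 80 Hz cutoff, the filter order, and the -23 LUFS target are all assumptions chosen for illustration. Only prop_decrease=0.75 comes from this issue. Audio is assumed to be a mono NumPy float array, and the third-party imports are deferred into each stage so the pipeline itself can be exercised with stub stages.

```python
from typing import Callable, Sequence

# A stage takes (audio, sample_rate) and returns processed audio.
Stage = Callable[[object, int], object]


def high_pass(audio, sample_rate, cutoff_hz=80.0):
    """Remove low-frequency rumble. The 80 Hz cutoff is an assumed value."""
    from scipy.signal import butter, sosfiltfilt

    sos = butter(4, cutoff_hz, btype="highpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)


def reduce_noise(audio, sample_rate, prop_decrease=0.75):
    """Conservative noise reduction, using the prop_decrease from BR-3."""
    import noisereduce as nr

    return nr.reduce_noise(y=audio, sr=sample_rate, prop_decrease=prop_decrease)


def normalize_loudness(audio, sample_rate, target_lufs=-23.0):
    """Normalize integrated loudness. The -23 LUFS target is an assumed value.

    pyloudnorm needs more than one 0.4 s measurement block of audio, and
    silent input measures as -inf, so callers may want to guard those cases.
    """
    import pyloudnorm as pyln

    meter = pyln.Meter(sample_rate)
    loudness = meter.integrated_loudness(audio)
    return pyln.normalize.loudness(audio, loudness, target_lufs)


# Order follows the acceptance criterion: filter, denoise, then normalize.
DEFAULT_STAGES: Sequence[Stage] = (high_pass, reduce_noise, normalize_loudness)


def preprocess(audio, sample_rate, stages: Sequence[Stage] = DEFAULT_STAGES):
    """Run each stage in order, feeding the output of one into the next."""
    for stage in stages:
        audio = stage(audio, sample_rate)
    return audio
```

Keeping the stages as plain callables makes it straightforward to tune or drop a single stage later without touching the others.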

Truths

  • T1: Noise reduction uses conservative settings (prop_decrease=0.75) — Whisper was trained on noisy data and aggressive removal hurts accuracy (BR-3)
  • T2: Preprocessing is applied to the audio fed to both transcription and diarization (simplest approach, pending OQ#1)
  • T3: The original uploaded audio file is preserved; preprocessing creates a working copy
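
One way to satisfy T3 is to copy the upload before any processing touches it. The sketch below uses only the standard library. The helper name make_working_copy, the work_dir argument, and the ".preprocessed" naming scheme are hypothetical and not taken from the codebase.

```python
import shutil
from pathlib import Path


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Copy the uploaded file into work_dir and return the copy's path.

    All preprocessing should write to the returned path, so the original
    upload is never modified.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / f"{original.stem}.preprocessed{original.suffix}"
    shutil.copy2(original, working)
    return working
```

With this shape, transcription and diarization (T2) would both read from the returned working copy, while the original stays available for reprocessing with different settings.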

Business Rules

  • BR-3: Audio preprocessing uses conservative noise reduction (prop_decrease=0.75). Aggressive noise removal can strip speech harmonics and hurt accuracy.

Edge Cases

  • Audio is already clean (close-mic) → conservative preprocessing should not degrade quality; user can disable per upload
  • Preprocessing disabled → audio goes directly to transcription pipeline as today
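
The per-upload toggle and the bypass path could look roughly like this. The MeetingMetadata class here is a simplified stand-in: only the preprocess_audio field name comes from this issue, while the title field, the JSON persistence, and the audio_for_transcription helper are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable


@dataclass
class MeetingMetadata:
    """Stand-in for the real model; only preprocess_audio is from the issue."""

    title: str
    preprocess_audio: bool = True  # enabled by default per the acceptance criteria


def to_json(meta: MeetingMetadata) -> str:
    """Serialize the metadata so the flag is persisted with the meeting."""
    return json.dumps(asdict(meta))


def from_json(raw: str) -> MeetingMetadata:
    """Rebuild metadata from its persisted JSON form."""
    return MeetingMetadata(**json.loads(raw))


def audio_for_transcription(
    meta: MeetingMetadata,
    original_path: str,
    preprocess: Callable[[str], str],
) -> str:
    """Return the path the transcription pipeline should read.

    When preprocessing is disabled, the original audio goes straight
    through, matching the current behaviour.
    """
    if not meta.preprocess_audio:
        return original_path
    return preprocess(original_path)
```

One design point worth checking in the upload form: browsers omit unchecked checkboxes from the submitted data, so the handler has to distinguish "unchecked" from "field absent" to keep the default enabled.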

Dependencies

None

Labels

feature (New feature or enhancement)
