As a user, I can have noisy audio automatically cleaned before transcription so that accuracy improves #13
Open
Labels
feature: New feature or enhancement
Description
User Story
As a user uploading a meeting recorded in a large room, I can have the system automatically clean up the audio before transcription so that distant speakers and background noise don't degrade accuracy.
Spec: docs/specs/transcription-quality-improvements.md — US-1
Size: M
Acceptance Criteria
- Uploaded audio undergoes preprocessing before transcription: high-pass filter, noise reduction, and loudness normalization
- Preprocessing is enabled by default
- The upload form includes a toggle/checkbox to disable preprocessing per upload
- Given a clean, close-mic recording with preprocessing enabled, then audio quality is not degraded (conservative noise reduction)
- The `preprocess_audio` field is persisted in `MeetingMetadata`
- New dependencies: `noisereduce` (>=3.0.0) and `pyloudnorm` (>=0.1.1)
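The three preprocessing steps named in the criteria above could be chained roughly as follows. This is a minimal sketch, not the project's implementation: the function names, the 80 Hz cutoff, and the -23 LUFS target are assumptions chosen for illustration, and only `prop_decrease=0.75` comes from this issue. It assumes mono float audio as a NumPy array and uses SciPy for the high-pass filter.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

PROP_DECREASE = 0.75         # conservative setting required by BR-3
HIGH_PASS_CUTOFF_HZ = 80.0   # assumption: not specified in this issue
TARGET_LUFS = -23.0          # assumption: not specified in this issue


def high_pass(audio: np.ndarray, sr: int,
              cutoff_hz: float = HIGH_PASS_CUTOFF_HZ) -> np.ndarray:
    """Remove low-frequency rumble with a zero-phase 4th-order Butterworth filter."""
    sos = butter(4, cutoff_hz, btype="highpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)


def reduce_noise(audio: np.ndarray, sr: int) -> np.ndarray:
    """Apply spectral-gating noise reduction at the conservative strength."""
    import noisereduce as nr  # imported here so the filter step works without it
    return nr.reduce_noise(y=audio, sr=sr, prop_decrease=PROP_DECREASE)


def normalize_loudness(audio: np.ndarray, sr: int,
                       target_lufs: float = TARGET_LUFS) -> np.ndarray:
    """Scale the signal to a target integrated loudness."""
    import pyloudnorm as pyln
    meter = pyln.Meter(sr)
    # Note: integrated_loudness raises ValueError for clips shorter than
    # the meter's block size (0.4 s by default).
    loudness = meter.integrated_loudness(audio)
    if not np.isfinite(loudness):
        return audio  # silent input: nothing sensible to scale
    return pyln.normalize.loudness(audio, loudness, target_lufs)


def preprocess(audio: np.ndarray, sr: int) -> np.ndarray:
    """High-pass filter, then noise reduction, then loudness normalization."""
    filtered = high_pass(audio, sr)
    denoised = reduce_noise(filtered, sr)
    return normalize_loudness(denoised, sr)
```

Running the high-pass filter first keeps rumble out of the noise estimate, and normalizing last means the loudness target applies to the signal that is actually transcribed. The order of the three steps is itself an assumption; the issue only lists them.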
Truths
- T1: Noise reduction uses conservative settings (`prop_decrease=0.75`) because Whisper was trained on noisy data and aggressive removal hurts accuracy (BR-3)
- T2: Preprocessing is applied to the audio fed to both transcription and diarization (simplest approach, pending OQ#1)
- T3: The original uploaded audio file is preserved; preprocessing creates a working copy
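T3 can be illustrated with a small helper that copies the upload before anything touches it. This is a sketch under assumptions: `make_working_copy`, the `work_dir` argument, and the `.preprocessed` naming scheme are hypothetical and not taken from the codebase.

```python
import shutil
import tempfile
from pathlib import Path


def make_working_copy(original: Path, work_dir: Path) -> Path:
    """Copy the upload into work_dir and return the copy's path.

    Preprocessing writes only to the returned path, so the original
    upload stays byte-for-byte intact (T3).
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working = work_dir / f"{original.stem}.preprocessed{original.suffix}"
    shutil.copy2(original, working)
    return working


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        upload = Path(tmp) / "meeting.wav"
        upload.write_bytes(b"placeholder audio bytes")
        copy = make_working_copy(upload, Path(tmp) / "work")
        print(copy.name)
```

Under T2, the same working copy would be handed to both transcription and diarization, so the two stages see identical audio.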
Business Rules
- BR-3: Audio preprocessing uses conservative noise reduction (`prop_decrease=0.75`). Aggressive noise removal can strip speech harmonics and hurt accuracy.
Edge Cases
- Audio is already clean (close-mic) → conservative preprocessing should not degrade quality; user can disable per upload
- Preprocessing disabled → audio goes directly to transcription pipeline as today
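The per-upload toggle and the disabled path could look something like the sketch below. The shape of `MeetingMetadata` is an assumption: only the `preprocess_audio` field is named in this issue, while the `title` field, the JSON persistence, and the `audio_for_pipeline` helper are hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable


@dataclass
class MeetingMetadata:
    title: str                     # hypothetical field, for illustration only
    preprocess_audio: bool = True  # preprocessing is enabled by default

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "MeetingMetadata":
        data = json.loads(raw)
        # Records saved before this field existed fall back to the default.
        return cls(title=data["title"],
                   preprocess_audio=data.get("preprocess_audio", True))


def audio_for_pipeline(original: Path, metadata: MeetingMetadata,
                       preprocess: Callable[[Path], Path]) -> Path:
    """Return the path that transcription and diarization should read."""
    if not metadata.preprocess_audio:
        # Disabled: the upload goes straight to the pipeline, as today.
        return original
    # Enabled: preprocess is expected to return a cleaned working copy.
    return preprocess(original)
```

Passing the preprocessing step in as a callable keeps the routing decision testable without loading any audio libraries.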
Dependencies
None