[#13] Add audio preprocessing for uploaded recordings #33

Merged
julien731 merged 11 commits into develop from feature/13-audio-preprocessing
Mar 17, 2026

@julien731
Member

Story: #13

Summary

Add an audio preprocessing pipeline that automatically cleans uploaded audio before transcription. The pipeline applies three stages: 80Hz high-pass filter (removes rumble), conservative noise reduction via noisereduce (prop_decrease=0.75), and loudness normalization to -23 LUFS via pyloudnorm. Preprocessing is enabled by default and can be toggled off per upload via a new checkbox in the upload form.
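
For readers who want a concrete picture of the three stages, here is a minimal sketch. It is not the code in backend/services/audio_preprocessor.py. The function names, the filter order, and the choice of zero-phase filtering are assumptions made for illustration; the cutoff, prop_decrease, stationary=False, the -23 LUFS target, and the audio_preprocessed.wav file name come from this PR.

```python
from pathlib import Path

HIGHPASS_CUTOFF_HZ = 80.0    # from the PR summary
NOISE_PROP_DECREASE = 0.75   # from the PR summary (T1)
TARGET_LUFS = -23.0          # from the PR summary
HIGHPASS_ORDER = 4           # assumption: the PR does not state the filter order


def preprocessed_path(original: Path) -> Path:
    """Return the working-copy path next to the original upload (T3)."""
    return original.with_name("audio_preprocessed.wav")


def preprocess(original: Path) -> Path:
    """Hypothetical three-stage pipeline: high-pass, denoise, normalize."""
    # Heavy dependencies are imported lazily so this module can be imported
    # in environments that do not install them.
    import noisereduce as nr
    import pyloudnorm as pyln
    import soundfile as sf
    from scipy.signal import butter, sosfiltfilt

    audio, rate = sf.read(str(original), dtype="float64")
    if audio.ndim == 2:
        audio = audio.mean(axis=1)  # downmix stereo to mono

    # Stage 1: Butterworth high-pass at 80 Hz to remove rumble.
    sos = butter(HIGHPASS_ORDER, HIGHPASS_CUTOFF_HZ, btype="highpass",
                 fs=rate, output="sos")
    audio = sosfiltfilt(sos, audio)  # assumption: zero-phase filtering

    # Stage 2: conservative, non-stationary noise reduction.
    audio = nr.reduce_noise(y=audio, sr=rate,
                            prop_decrease=NOISE_PROP_DECREASE,
                            stationary=False)

    # Stage 3: measure integrated loudness, then normalize to the target.
    meter = pyln.Meter(rate)
    loudness = meter.integrated_loudness(audio)
    audio = pyln.normalize.loudness(audio, loudness, TARGET_LUFS)

    out = preprocessed_path(original)
    sf.write(str(out), audio, rate)
    return out
```

One caveat with this sketch: pyloudnorm measures loudness over fixed-length blocks, so very short clips may need a guard before stage 3.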

Key changes:

  • New backend/services/audio_preprocessor.py service with the three-stage pipeline
  • preprocess_audio boolean field added to MeetingMetadata (defaults to True)
  • Upload endpoint accepts preprocess_audio form parameter
  • Transcription pipeline calls preprocessor before WhisperX, using a working copy (original preserved)
  • Preprocessed file is cleaned up in a finally block after transcription
  • Upload form includes an "Audio preprocessing" checkbox with description
  • New dependencies: noisereduce>=3.0.0, pyloudnorm>=0.1.1

Approach

Preprocessing is implemented as a separate service module to keep the transcriber focused on transcription. The preprocessed audio is saved as a WAV working copy alongside the original file (preserving the original per T3). Conservative noise reduction settings are used because Whisper was trained on noisy data (BR-3/T1). The preprocessed copy is fed to both transcription and diarization (T2).
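
The working-copy flow described above can be sketched as follows. This is an illustration, not the transcriber code: run_pipeline, fake_preprocess, and the steps argument are hypothetical names, and the downstream steps stand in for the WhisperX transcribe, align, and diarize calls.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional, Sequence


def run_pipeline(
    original: Path,
    preprocess: Callable[[Path], Path],
    steps: Sequence[Callable[[Path], None]],
    preprocess_audio: bool = True,
) -> None:
    """Run every downstream step on the same (possibly preprocessed) path."""
    audio_path = original
    working_copy: Optional[Path] = None
    try:
        if preprocess_audio:
            working_copy = preprocess(original)
            audio_path = working_copy  # reassigned before any downstream call
        for step in steps:  # e.g. transcribe, align, diarize
            step(audio_path)
    finally:
        # Remove only the working copy; the original upload is never touched.
        if working_copy is not None:
            working_copy.unlink(missing_ok=True)


def fake_preprocess(original: Path) -> Path:
    """Stand-in preprocessor: copies the file instead of filtering it."""
    out = original.with_name("audio_preprocessed.wav")
    shutil.copyfile(original, out)
    return out
```

Because the cleanup sits in the finally block, the working copy is removed whether the downstream steps succeed or raise, and the original file is left in place either way.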

Verification

  • 119 tests passing (pytest -x -q)
  • Unit tests cover: preprocessor output path, original preservation, valid WAV output, stereo-to-mono conversion, spec constants
  • Integration tests cover: preprocess_audio defaults to true, can be disabled via form field
  • Ruff check and format pass
  • Architect review: no critical or major issues

@julien731
Member Author

QA Confidence Verdict

What Was Verified

Truths:

| Truth | Status | Method |
|---|---|---|
| T1: Noise reduction uses prop_decrease=0.75 | PASS | Code inspection: NOISE_PROP_DECREASE = 0.75 constant used in nr.reduce_noise(). Unit test test_constants_match_spec asserts this. |
| T2: Preprocessing applied to both transcription and diarization | PASS | Code inspection: audio_path is reassigned to preprocessed path before any WhisperX calls. All downstream steps (transcribe, align, diarize) use the same variable. |
| T3: Original uploaded file is preserved; preprocessing creates a working copy | PASS | Code inspection: preprocessed file saved as audio_preprocessed.wav alongside original. Unit test test_preserves_original_file verifies original is untouched. Preprocessed copy is cleaned up in finally block. |

Acceptance Criteria:

| AC | Status | Method |
|---|---|---|
| High-pass filter, noise reduction, loudness normalization applied | PASS | Code inspection: all three stages implemented in audio_preprocessor.py (80Hz Butterworth, noisereduce, pyloudnorm to -23 LUFS) |
| Preprocessing enabled by default | PASS | preprocess_audio: bool = True in schema. Form default is "true". Integration test confirms. |
| Upload form includes toggle to disable preprocessing | PASS | Code inspection: checkbox with id preprocess-checkbox, checked by default, wired through api.js to backend. |
| Conservative noise reduction does not degrade clean audio | PASS | Enforced by prop_decrease=0.75 and stationary=False settings. |
| preprocess_audio persisted in MeetingMetadata | PASS | Pydantic field added. Integration tests verify persistence for both true and false values. |
| Dependencies: noisereduce>=3.0.0, pyloudnorm>=0.1.1 | PASS | Both added to requirements.txt with correct version constraints. |

Tests: All 30 tests pass (5 unit tests for preprocessor, 2 integration tests for preprocess_audio persistence).

What Needs Human Eyes

  • Visual appearance of the preprocessing checkbox in the upload form (alignment, spacing, dark/light theme)
  • Preprocessing hint text readability

Risk Areas

  • No test exercises the full transcription pipeline with preprocessing enabled (transcriber tests set preprocess_audio=False). The integration between preprocessor and transcriber is verified by code inspection only.
  • No test verifies that the preprocessed file is cleaned up after transcription completes.

Suggested QA Focus

  • Quick glance: Upload form checkbox appearance in both themes
  • Functional test: Upload a file with preprocessing enabled and disabled, verify both complete successfully
  • Estimated effort: 10 minutes manual testing

@julien731 julien731 self-assigned this Mar 14, 2026
@julien731 julien731 added the feature New feature or enhancement label Mar 14, 2026
CI does not install ML dependencies. Use pytest.importorskip to
gracefully skip when numpy/soundfile/noisereduce are unavailable.
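
As an illustration of the skip pattern this commit describes, a test can request its optional dependencies through pytest.importorskip so that a missing package produces a skip instead of a failure. The test name and body below are hypothetical and do not come from this PR's test suite; the downmix line is a placeholder for a call into the real preprocessor.

```python
import pytest


def test_stereo_fixture_downmixes_to_mono(tmp_path):
    # Skip (rather than fail) when the optional audio/ML stack is missing.
    np = pytest.importorskip("numpy")
    sf = pytest.importorskip("soundfile")
    pytest.importorskip("noisereduce")

    # Write one second of silent stereo audio as a fixture.
    rate = 16000
    stereo = np.zeros((rate, 2), dtype="float32")
    path = tmp_path / "stereo.wav"
    sf.write(str(path), stereo, rate)

    audio, read_rate = sf.read(str(path))
    mono = audio.mean(axis=1)  # placeholder for the real preprocessor call

    assert read_rate == rate
    assert mono.ndim == 1
```

Calling importorskip inside the test function (or at the top of the test module) keeps collection working on CI runners that install only the lightweight dependencies.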
@julien731 julien731 changed the title from "#13 Add audio preprocessing for uploaded recordings" to "[#13] Add audio preprocessing for uploaded recordings" Mar 16, 2026
soundfile (libsndfile) cannot read m4a/AAC files, causing preprocessing
to fail on common upload formats. Convert unsupported formats to WAV
using ffmpeg before applying filters.
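
A minimal sketch of the conversion step this commit describes, assuming ffmpeg is available on PATH. The helper names, the set of suffixes treated as directly readable, and the _converted.wav naming are assumptions for illustration, not taken from the commit.

```python
import subprocess
from pathlib import Path

# Assumption: suffixes treated as readable by soundfile without conversion.
DIRECTLY_READABLE = {".wav", ".flac", ".ogg"}


def needs_conversion(path: Path) -> bool:
    """Return True when the file should go through ffmpeg first."""
    return path.suffix.lower() not in DIRECTLY_READABLE


def ffmpeg_command(src: Path, dst: Path) -> list:
    """Build the ffmpeg call: -y overwrites dst, -i names the input file."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def ensure_readable(src: Path) -> Path:
    """Return a path soundfile can open, converting to WAV when needed."""
    if not needs_conversion(src):
        return src
    dst = src.with_name(src.stem + "_converted.wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst
```

Any intermediate file created this way would need the same cleanup treatment as the preprocessed working copy.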
@julien731 julien731 merged commit c547000 into develop Mar 17, 2026
2 checks passed
@julien731 julien731 deleted the feature/13-audio-preprocessing branch March 17, 2026 06:51
