GSoC 2026 Candidate Submission: End-to-End Narrative Audio Pipeline #39
Open
meganho456 wants to merge 5 commits into humanai-foundation:master
This PR contains my GSoC 2026 test submission for a complete narrative-audio workflow, including all required tasks and a bonus storytelling analysis component.
What’s included
Task 1: Audio Processing Pipeline
Loads .wav recordings, normalizes audio, segments clips when needed, and extracts ML-ready features.
Features include MFCCs, pitch, spectral centroid, RMS energy, and duration.
Produces a structured feature dataset and normalized audio outputs.
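As a rough illustration of the Task 1 steps, the sketch below normalizes a clip, splits it into segments, and computes two of the listed features (duration and RMS energy) using only the standard library. It is not the code from this PR: the function names, the 0.95 peak target, and the 0.4-second segment length are assumptions made for the example. MFCCs, pitch, and spectral centroid are left out because they would normally come from a signal-processing library.

```python
import math

def peak_normalize(samples, target_peak=0.95):
    """Scale samples so the largest absolute value equals target_peak (assumed value)."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # silent clip: nothing to scale
    gain = target_peak / peak
    return [s * gain for s in samples]

def segment(samples, sample_rate, max_seconds):
    """Split a clip into consecutive chunks of at most max_seconds each."""
    step = int(sample_rate * max_seconds)
    if step <= 0:
        raise ValueError("max_seconds is too short for this sample rate")
    return [samples[i:i + step] for i in range(0, len(samples), step)]

def rms_energy(samples):
    """Root-mean-square amplitude of a list of float samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def basic_features(samples, sample_rate):
    """One feature row per segment; a real pipeline would add more columns."""
    return {
        "duration_s": len(samples) / sample_rate,
        "rms": rms_energy(samples),
    }

# Synthetic one-second 440 Hz tone standing in for a loaded .wav recording.
sample_rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

normalized = peak_normalize(tone)
rows = [basic_features(chunk, sample_rate) for chunk in segment(normalized, sample_rate, 0.4)]
for row in rows:
    print(row)
```

Each dictionary in `rows` corresponds to one row of a feature table, so the list could be written to CSV directly.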
Task 2: Narrative Tone Classification
Trains a neural-network classifier using labeled emotional-tone data.
Uses train/test split and reports evaluation metrics (accuracy, weighted F1, per-class report).
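To make the reported metrics concrete, here is a minimal sketch of how accuracy, per-class F1, and weighted F1 can be computed from test-set predictions. The tone labels (`calm`, `tense`, `joyful`) and the predictions are invented for illustration; the PR does not list its label set here, and its own evaluation code may rely on a library instead.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def per_class_f1(y_true, y_pred):
    """F1 score for each label that appears in either list."""
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores

def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights proportional to each class's support."""
    support = Counter(y_true)
    scores = per_class_f1(y_true, y_pred)
    return sum(scores[label] * count for label, count in support.items()) / len(y_true)

# Hypothetical held-out labels and model predictions.
y_true = ["calm", "calm", "tense", "tense", "joyful", "joyful"]
y_pred = ["calm", "calm", "tense", "calm", "joyful", "tense"]

print("accuracy:", round(accuracy(y_true, y_pred), 3))
print("per-class F1:", {k: round(v, 3) for k, v in per_class_f1(y_true, y_pred).items()})
print("weighted F1:", round(weighted_f1(y_true, y_pred), 3))
```

Weighted F1 is useful alongside accuracy when the tone classes are imbalanced, since each class contributes in proportion to how often it occurs in the test set.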
Task 3: AI-Based Transcription
Implements batch transcription with Whisper.
Exports transcripts to text format.
Measures transcription quality on a subset using WER.
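For reference, word error rate (WER) is the word-level edit distance between a reference transcript and the model output, divided by the number of reference words. The sketch below is a minimal standalone version; lowercasing and whitespace tokenization are simplifying assumptions, and the PR's own WER code may normalize text differently or use a dedicated package.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Levenshtein distance over words, keeping only the previous row in memory.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of an extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)

reference = "the quick brown fox"
print(word_error_rate(reference, "the quick brown box"))       # one substitution: 0.25
print(word_error_rate(reference, "the brown fox"))             # one deletion: 0.25
print(word_error_rate(reference, "the very quick brown fox"))  # one insertion: 0.25
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.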
Task 4: Narrative Audio Retrieval
Implements a retrieval prototype for narrative-style queries (e.g., calm narration, high-energy speech, dramatic dialogue).
Combines structured filtering and semantic ranking to return relevant recordings.
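The combination of structured filtering and semantic ranking can be sketched as a two-stage function: first drop recordings that fail metadata constraints, then order the survivors by text similarity to the query. Everything below is illustrative: the catalog fields (`tone`, `rms`, `text`), the clip IDs, and the use of bag-of-words cosine similarity as a stand-in for a learned embedding model are assumptions, not the PR's implementation.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Very rough text vector: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(catalog, query_text, tone=None, min_rms=None, top_k=3):
    """Stage 1: metadata filter. Stage 2: rank remaining clips by similarity."""
    candidates = [
        rec for rec in catalog
        if (tone is None or rec["tone"] == tone)
        and (min_rms is None or rec["rms"] >= min_rms)
    ]
    query_vec = bag_of_words(query_text)
    ranked = sorted(
        candidates,
        key=lambda rec: cosine(query_vec, bag_of_words(rec["text"])),
        reverse=True,
    )
    return ranked[:top_k]

# Hypothetical catalog built from extracted features and transcripts.
catalog = [
    {"id": "clip_a", "tone": "calm", "rms": 0.05,
     "text": "a calm narration about a quiet morning by the lake"},
    {"id": "clip_b", "tone": "excited", "rms": 0.30,
     "text": "high energy speech from a sports commentator"},
    {"id": "clip_c", "tone": "calm", "rms": 0.07,
     "text": "soft spoken bedtime story narration"},
    {"id": "clip_d", "tone": "dramatic", "rms": 0.22,
     "text": "dramatic dialogue between two rivals"},
]

print([r["id"] for r in retrieve(catalog, "calm narration", tone="calm")])
print([r["id"] for r in retrieve(catalog, "high energy speech", min_rms=0.2, top_k=1)])
print([r["id"] for r in retrieve(catalog, "dramatic dialogue", top_k=1)])
```

Filtering before ranking keeps the similarity step cheap and guarantees that hard constraints such as a minimum energy level are never overridden by a high text-similarity score.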
Bonus: Storytelling Audio Analysis
Analyzes storytelling-oriented cues: pacing/pauses, pitch variation, energy dynamics, and sentence-length characteristics.
Adds a heuristic storytelling score and ranks clips by storytelling-like expressiveness.
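One way such a heuristic score could be assembled is shown below: each cue is mapped to a value between 0 and 1 and the cues are combined with fixed weights. The weights, the silence threshold, the saturation points, and the convention that a pitch of 0 marks an unvoiced frame are all assumptions chosen for this example; the PR's actual scoring formula is not described here.

```python
from statistics import mean, pstdev

def coeff_var(values):
    """Standard deviation relative to the mean; 0.0 for empty or zero-mean input."""
    if not values:
        return 0.0
    m = mean(values)
    return pstdev(values) / m if m else 0.0

def pause_ratio(frame_rms, silence_threshold=0.02):
    """Share of frames whose energy falls below an assumed silence threshold."""
    if not frame_rms:
        return 0.0
    return sum(e < silence_threshold for e in frame_rms) / len(frame_rms)

def clamp01(x):
    return max(0.0, min(1.0, x))

def storytelling_score(frame_rms, pitch_hz, sentence_lengths,
                       weights=(0.3, 0.3, 0.2, 0.2)):
    """Weighted mix of pacing, pitch variation, energy dynamics, sentence variety."""
    voiced = [p for p in pitch_hz if p > 0]  # assumption: 0 marks unvoiced frames
    components = (
        clamp01(pause_ratio(frame_rms) / 0.3),  # saturates at 30% pauses
        clamp01(coeff_var(voiced) / 0.3),       # saturates at 30% pitch variation
        clamp01(coeff_var(frame_rms)),
        clamp01(coeff_var(sentence_lengths)),
    )
    return sum(w * c for w, c in zip(weights, components))

# Two made-up clips: a flat, monotone reading and a more expressive one.
clips = {
    "flat": storytelling_score([0.1] * 10, [120] * 10, [8, 8, 8]),
    "expressive": storytelling_score(
        [0.01, 0.2, 0.05, 0.3, 0.01, 0.25, 0.1, 0.01, 0.35, 0.15],
        [110, 180, 0, 220, 0, 140, 200, 0, 240, 130],
        [4, 15, 7, 22],
    ),
}

# Rank clips from most to least storytelling-like.
for name, score in sorted(clips.items(), key=lambda kv: kv[1], reverse=True):
    print(name, round(score, 3))
```

Clamping each component keeps any single cue from dominating the total, which makes the ranking easier to interpret when tuning the weights.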
Deliverables in this submission
Full source code for Tasks 1–4 and the bonus task, plus run_pipeline, which chains all the tasks together
Technical report PDF
README with setup and run instructions
Example output artifacts (feature CSVs, transcripts, analysis outputs)