Add a wav loader #14923

larryliu0820 · 2025-10-09T00:22:13Z

This pull request adds support for loading and processing .wav audio files in the multimodal runner, alongside existing .bin file support. It introduces a dedicated WAV loader utility, updates the runner to dispatch audio file processing based on file type, and adds comprehensive tests for WAV file parsing and normalization. These changes improve flexibility and robustness when handling audio inputs.

**WAV file support and audio processing:**

* Added a new utility `wav_loader.h` that provides functions to parse WAV file headers and load normalized PCM audio data from `.wav` files, supporting 16-bit and 32-bit PCM formats.
* Updated `multimodal.cpp` to support loading audio from both `.bin` and `.wav` files, including input validation and error handling for unsupported formats. The runner now uses the processor for both file types and enforces processor requirements for `.wav` files.
* Added a new command-line flag `data_path` and passed it to the multimodal runner to facilitate data file handling.
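For readers unfamiliar with the format, the header parsing and normalization described above can be sketched roughly as follows. This is a minimal illustration, not the code in `wav_loader.h`: the names `WavInfo`, `parse_wav_header`, and `load_normalized_pcm` are hypothetical, the input is an in-memory byte buffer rather than a file path, and scaling by 2^15 / 2^31 is an assumed normalization convention. It requires C++17 for `std::optional`.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical, simplified view of a WAV header (not the struct in wav_loader.h).
struct WavInfo {
  uint16_t audio_format;     // 1 = integer PCM
  uint16_t num_channels;
  uint32_t sample_rate;
  uint16_t bits_per_sample;
  std::size_t data_offset;   // byte offset of the first sample
  uint32_t data_size;        // payload length in bytes
};

// Reads a little-endian unsigned integer of type T starting at p.
template <typename T>
T read_le(const uint8_t* p) {
  T v = 0;
  for (std::size_t i = 0; i < sizeof(T); ++i) {
    v |= static_cast<T>(p[i]) << (8 * i);
  }
  return v;
}

// Walks the RIFF chunks until "fmt " and "data" have both been seen.
std::optional<WavInfo> parse_wav_header(const std::vector<uint8_t>& buf) {
  if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
      std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
    return std::nullopt;
  }
  WavInfo info{};
  bool have_fmt = false;
  std::size_t pos = 12;
  while (pos + 8 <= buf.size()) {
    const uint8_t* chunk = buf.data() + pos;
    const uint32_t chunk_size = read_le<uint32_t>(chunk + 4);
    const std::size_t body = pos + 8;
    if (std::memcmp(chunk, "fmt ", 4) == 0) {
      if (chunk_size < 16 || body + 16 > buf.size()) return std::nullopt;
      info.audio_format = read_le<uint16_t>(chunk + 8);
      info.num_channels = read_le<uint16_t>(chunk + 10);
      info.sample_rate = read_le<uint32_t>(chunk + 12);
      info.bits_per_sample = read_le<uint16_t>(chunk + 22);
      have_fmt = true;
    } else if (std::memcmp(chunk, "data", 4) == 0) {
      if (!have_fmt) return std::nullopt;
      info.data_offset = body;
      // Clamp to the bytes actually present in the buffer.
      info.data_size = static_cast<uint32_t>(
          std::min<std::size_t>(chunk_size, buf.size() - body));
      return info;
    }
    pos = body + chunk_size + (chunk_size & 1);  // chunks are 2-byte aligned
  }
  return std::nullopt;
}

// Returns samples scaled to [-1, 1); empty if the format is unsupported.
std::vector<float> load_normalized_pcm(const std::vector<uint8_t>& buf) {
  const auto info = parse_wav_header(buf);
  if (!info || info->audio_format != 1) return {};
  const uint8_t* p = buf.data() + info->data_offset;
  std::vector<float> out;
  if (info->bits_per_sample == 16) {
    for (std::size_t i = 0; i + 2 <= info->data_size; i += 2) {
      const auto s = static_cast<int16_t>(read_le<uint16_t>(p + i));
      out.push_back(static_cast<float>(s) / 32768.0f);
    }
  } else if (info->bits_per_sample == 32) {
    for (std::size_t i = 0; i + 4 <= info->data_size; i += 4) {
      const auto s = static_cast<int32_t>(read_le<uint32_t>(p + i));
      out.push_back(static_cast<float>(s / 2147483648.0));
    }
  }
  return out;
}
```

Walking the chunk list instead of assuming a fixed 44-byte header lets the sketch tolerate optional chunks (for example `LIST`) placed before `data`; whether the real loader does the same is not stated in this PR description.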

**Testing and build integration:**

* Introduced `test_wav_loader.cpp`, which provides unit tests for WAV header parsing, sample normalization, error handling, and unsupported format detection.
* Registered the new utility and tests in build configuration files, ensuring proper header exports and test coverage.
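Tests of this kind typically need small synthetic WAV inputs. As a rough sketch of such a fixture (the helper `make_pcm16_wav` is hypothetical and is not taken from `test_wav_loader.cpp`), the following builds a canonical 44-byte-header, mono, 16-bit PCM file in memory:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical test fixture: builds a minimal mono 16-bit PCM WAV in memory.
std::vector<uint8_t> make_pcm16_wav(const std::vector<int16_t>& samples,
                                    uint32_t sample_rate) {
  std::vector<uint8_t> out;
  // Appends a 4-character chunk tag such as "RIFF".
  auto put_tag = [&out](const char* tag) {
    for (int i = 0; i < 4; ++i) out.push_back(static_cast<uint8_t>(tag[i]));
  };
  // Appends integers in little-endian byte order.
  auto put_u16 = [&out](uint16_t v) {
    out.push_back(static_cast<uint8_t>(v & 0xFF));
    out.push_back(static_cast<uint8_t>(v >> 8));
  };
  auto put_u32 = [&out](uint32_t v) {
    for (int i = 0; i < 4; ++i) {
      out.push_back(static_cast<uint8_t>((v >> (8 * i)) & 0xFF));
    }
  };

  const uint32_t data_bytes = static_cast<uint32_t>(samples.size() * 2);
  put_tag("RIFF");
  put_u32(36 + data_bytes);   // RIFF size: everything after this field
  put_tag("WAVE");
  put_tag("fmt ");
  put_u32(16);                // size of the PCM fmt chunk body
  put_u16(1);                 // audio format: 1 = integer PCM
  put_u16(1);                 // channels: mono
  put_u32(sample_rate);
  put_u32(sample_rate * 2);   // byte rate = rate * channels * bytes per sample
  put_u16(2);                 // block align = channels * bytes per sample
  put_u16(16);                // bits per sample
  put_tag("data");
  put_u32(data_bytes);
  for (int16_t s : samples) put_u16(static_cast<uint16_t>(s));
  return out;
}
```

Changing the format field to a value other than 1, or truncating the buffer, would give inputs for the error-handling and unsupported-format cases mentioned above.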

pytorch-bot · 2025-10-09T00:22:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14923

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 16 Pending

As of commit a56734d with merge base fc512fa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

meta-codesync · 2025-10-09T00:35:42Z

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this in D84214903.

examples/models/voxtral/multimodal.cpp

mergennachin

Update the README.md please

https://github.com/pytorch/executorch/blob/main/examples/models/voxtral/README.md

extension/llm/runner/wav_loader.h

Summary:
This pull request adds support for loading and processing `.wav` audio files in the multimodal runner, alongside existing `.bin` file support. It introduces a dedicated WAV loader utility, updates the runner to dispatch audio file processing based on file type, and adds comprehensive tests for WAV file parsing and normalization. These changes improve flexibility and robustness when handling audio inputs.

**WAV file support and audio processing:**

* Added a new utility `wav_loader.h` that provides functions to parse WAV file headers and load normalized PCM audio data from `.wav` files, supporting 16-bit and 32-bit PCM formats.
* Updated `multimodal.cpp` to support loading audio from both `.bin` and `.wav` files, including input validation and error handling for unsupported formats. The runner now uses the processor for both file types and enforces processor requirements for `.wav` files.
* Added a new command-line flag `data_path` and passed it to the multimodal runner to facilitate data file handling.

**Testing and build integration:**

* Introduced `test_wav_loader.cpp`, which provides unit tests for WAV header parsing, sample normalization, error handling, and unsupported format detection.
* Registered the new utility and tests in build configuration files, ensuring proper header exports and test coverage.

Reviewed By: mergennachin

Differential Revision: D84214903

Pulled By: larryliu0820

meta-codesync · 2025-10-09T20:20:44Z

@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84214903.


meta-cla bot added the CLA Signed label Oct 9, 2025

larryliu0820 marked this pull request as ready for review October 9, 2025 00:22

larryliu0820 requested review from jackzhxng, kirklandsign, lucylq, mergennachin and swolchok as code owners October 9, 2025 00:22

shoumikhin approved these changes Oct 9, 2025


jackzhxng reviewed Oct 9, 2025


examples/models/voxtral/multimodal.cpp (resolved)

mergennachin approved these changes Oct 9, 2025


extension/llm/runner/wav_loader.h (outdated, resolved)

facebook-github-bot force-pushed the process_wav branch from 0cff5e8 to d18960f Compare October 9, 2025 20:20

meta-codesync bot added fb-exported meta-exported labels Oct 9, 2025

facebook-github-bot force-pushed the process_wav branch from d18960f to cd552dc Compare October 9, 2025 21:39

larryliu0820 added the release notes: multimodal label Oct 9, 2025

facebook-github-bot force-pushed the process_wav branch from cd552dc to a56734d Compare October 9, 2025 22:09

larryliu0820 merged commit 66c3dea into main Oct 9, 2025
137 checks passed

larryliu0820 deleted the process_wav branch October 9, 2025 22:48


5 participants
