larryliu0820
Contributor

This pull request adds support for loading and processing .wav audio files in the multimodal runner, alongside existing .bin file support. It introduces a dedicated WAV loader utility, updates the runner to dispatch audio file processing based on file type, and adds comprehensive tests for WAV file parsing and normalization. These changes improve flexibility and robustness when handling audio inputs.
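As a rough illustration of dispatching on file type, the sketch below classifies an audio path by its extension. It is not the code in `multimodal.cpp`: the names `AudioFileKind` and `classify_audio_path` are hypothetical, and the case-insensitive match is an assumption made for the example.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical names; the actual runner may structure this differently.
enum class AudioFileKind { kBin, kWav, kUnsupported };

// Classify a path by its extension (assumed case-insensitive here).
AudioFileKind classify_audio_path(const std::string& path) {
  const std::size_t dot = path.find_last_of('.');
  if (dot == std::string::npos) {
    return AudioFileKind::kUnsupported;  // no extension at all
  }
  std::string ext = path.substr(dot);
  std::transform(ext.begin(), ext.end(), ext.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  if (ext == ".bin") {
    return AudioFileKind::kBin;
  }
  if (ext == ".wav") {
    return AudioFileKind::kWav;
  }
  return AudioFileKind::kUnsupported;  // caller reports an error
}
```

A caller would switch on the returned kind and reject `kUnsupported` with an error instead of guessing a format.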

WAV file support and audio processing:

  • Added a new utility wav_loader.h that provides functions to parse WAV file headers and load normalized PCM audio data from .wav files, supporting 16-bit and 32-bit PCM formats.
  • Updated multimodal.cpp to support loading audio from both .bin and .wav files, including input validation and error handling for unsupported formats. The runner now uses the processor for both file types and enforces processor requirements for .wav files.
  • Added a new command-line flag data_path and passed it to the multimodal runner to facilitate data file handling.
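
For readers unfamiliar with the format, here is a minimal sketch of what parsing a WAV header and normalizing PCM samples can involve. It is not the API of `wav_loader.h`: `WavAudio`, `read_le`, and `decode_wav` are hypothetical names, the input is assumed to be already in memory, and the scale factors (32768 for 16-bit, 2147483648 for 32-bit) are an assumption about what "normalized" means. It requires C++17 for `std::optional`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical names throughout; this is not the API of wav_loader.h.
struct WavAudio {
  uint16_t num_channels = 0;
  uint32_t sample_rate = 0;
  uint16_t bits_per_sample = 0;
  std::vector<float> samples;  // interleaved, scaled to [-1, 1)
};

// Read a little-endian unsigned integer of n bytes (n <= 4).
static uint32_t read_le(const uint8_t* p, std::size_t n) {
  uint32_t v = 0;
  for (std::size_t i = 0; i < n; ++i) {
    v |= static_cast<uint32_t>(p[i]) << (8 * i);
  }
  return v;
}

// Returns std::nullopt for malformed input or unsupported formats.
std::optional<WavAudio> decode_wav(const std::vector<uint8_t>& buf) {
  if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
      std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
    return std::nullopt;
  }
  WavAudio out;
  bool have_fmt = false;
  std::size_t pos = 12;
  // Walk the RIFF chunks until the "data" chunk is found.
  while (pos + 8 <= buf.size()) {
    const uint8_t* chunk = buf.data() + pos;
    const uint32_t size = read_le(chunk + 4, 4);
    const std::size_t body = pos + 8;
    if (size > buf.size() - body) {
      return std::nullopt;  // chunk claims more bytes than are present
    }
    if (std::memcmp(chunk, "fmt ", 4) == 0) {
      if (size < 16) {
        return std::nullopt;
      }
      const uint8_t* f = buf.data() + body;
      if (read_le(f, 2) != 1) {
        return std::nullopt;  // format tag 1 = integer PCM only
      }
      out.num_channels = static_cast<uint16_t>(read_le(f + 2, 2));
      out.sample_rate = read_le(f + 4, 4);
      out.bits_per_sample = static_cast<uint16_t>(read_le(f + 14, 2));
      if (out.bits_per_sample != 16 && out.bits_per_sample != 32) {
        return std::nullopt;  // unsupported bit depth
      }
      have_fmt = true;
    } else if (std::memcmp(chunk, "data", 4) == 0) {
      if (!have_fmt) {
        return std::nullopt;  // sample layout unknown without "fmt "
      }
      const std::size_t bytes = out.bits_per_sample / 8;
      const float scale =
          out.bits_per_sample == 16 ? 32768.0f : 2147483648.0f;
      const std::size_t count = size / bytes;
      out.samples.reserve(count);
      for (std::size_t i = 0; i < count; ++i) {
        const uint32_t raw = read_le(buf.data() + body + i * bytes, bytes);
        // Reinterpret the raw bits as a signed sample of the right width.
        const int32_t s =
            out.bits_per_sample == 16
                ? static_cast<int16_t>(static_cast<uint16_t>(raw))
                : static_cast<int32_t>(raw);
        out.samples.push_back(static_cast<float>(s) / scale);
      }
      return out;
    }
    pos = body + size + (size & 1);  // chunks are padded to even sizes
  }
  return std::nullopt;  // no "data" chunk found
}
```

Scanning chunks instead of assuming a fixed 44-byte header lets the sketch tolerate extra chunks (such as `LIST`) between `fmt ` and `data`; whether the actual loader does the same is not stated in this PR description.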

Testing and build integration:

  • Introduced test_wav_loader.cpp, which provides unit tests for WAV header parsing, sample normalization, error handling, and unsupported format detection.
  • Registered the new utility and tests in build configuration files, ensuring proper header exports and test coverage.

@pytorch-bot

pytorch-bot bot commented Oct 9, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14923

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 16 Pending

As of commit a56734d with merge base fc512fa:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Oct 9, 2025
@larryliu0820 larryliu0820 marked this pull request as ready for review October 9, 2025 00:22
@meta-codesync

meta-codesync bot commented Oct 9, 2025

@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this in D84214903.

Contributor

@mergennachin mergennachin left a comment

facebook-github-bot pushed a commit that referenced this pull request Oct 9, 2025
Reviewed By: mergennachin

Differential Revision: D84214903

Pulled By: larryliu0820
@meta-codesync

meta-codesync bot commented Oct 9, 2025

@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84214903.

facebook-github-bot pushed a commit that referenced this pull request Oct 9, 2025
@larryliu0820 larryliu0820 added the release notes: multimodal label Oct 9, 2025
@larryliu0820 larryliu0820 merged commit 66c3dea into main Oct 9, 2025
137 checks passed
@larryliu0820 larryliu0820 deleted the process_wav branch October 9, 2025 22:48
Labels

CLA Signed, fb-exported, meta-exported, release notes: multimodal

5 participants