Add a wav loader #14923
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14923
Note: Links to docs will display an error until the docs builds have been completed.
⏳ No Failures, 16 Pending
As of commit a56734d with merge base fc512fa.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@larryliu0820 has imported this pull request. If you are a Meta employee, you can view this in D84214903.
Update the README.md please
https://github.com/pytorch/executorch/blob/main/examples/models/voxtral/README.md
Summary:
This pull request adds support for loading and processing `.wav` audio files in the multimodal runner, alongside existing `.bin` file support. It introduces a dedicated WAV loader utility, updates the runner to dispatch audio file processing based on file type, and adds comprehensive tests for WAV file parsing and normalization. These changes improve flexibility and robustness when handling audio inputs.

**WAV file support and audio processing:**

* Added a new utility `wav_loader.h` that provides functions to parse WAV file headers and load normalized PCM audio data from `.wav` files, supporting 16-bit and 32-bit PCM formats.
* Updated `multimodal.cpp` to support loading audio from both `.bin` and `.wav` files, including input validation and error handling for unsupported formats. The runner now uses the processor for both file types and enforces processor requirements for `.wav` files. [[1]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843L138-R149) [[2]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843L166-R191) [[3]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843R247-L255)
* Added a new command-line flag `data_path` and passed it to the multimodal runner to facilitate data file handling. [[1]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843R38) [[2]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843R294) [[3]](diffhunk://#diff-0ac16dbe4eaefa08e21fbda582fe2cd2b482f43aaedfc1bf2f31becf5e7bb843L297-R322)

**Testing and build integration:**

* Introduced `test_wav_loader.cpp`, which provides unit tests for WAV header parsing, sample normalization, error handling, and unsupported format detection.
* Registered the new utility and tests in build configuration files, ensuring proper header exports and test coverage. [[1]](diffhunk://#diff-8a73187dfda9c5479db6911bee649164ff4434d36e8f4eb881cc1f049c4e3271R108) [[2]](diffhunk://#diff-24b61cfeb7f1fc9a646df385ece0c31ea2ab18b3c7e34fc62117c62538e111ffL22-R22) [[3]](diffhunk://#diff-c8ef93f128805fc48fe2d7c1dadb9ff5d2f4dc5ee7c00b638fd193d3dfb1f06cR47-R56) [[4]](diffhunk://#diff-d755455ed59da7a902bb5a5c1e540a1924f63e8f70a9dc78b455f2c569a19db6R17)

Reviewed By: mergennachin
Differential Revision: D84214903
Pulled By: larryliu0820
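For readers who want a concrete picture of what "parse WAV file headers and load normalized PCM audio data" can involve, here is a minimal sketch. It is not the code in `wav_loader.h`: the names `WavInfo`, `parse_wav_header`, and `load_normalized_pcm` are hypothetical, the input is assumed to be an in-memory byte buffer, and scaling samples into [-1, 1) by dividing by 2^15 or 2^31 is an assumption about what "normalized" means here.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical types and names for illustration; not the PR's actual API.
struct WavInfo {
  uint16_t audio_format = 0;     // 1 means integer PCM
  uint16_t num_channels = 0;
  uint32_t sample_rate = 0;
  uint16_t bits_per_sample = 0;
  size_t data_offset = 0;        // byte offset of the first sample
  size_t data_size = 0;          // payload size of the "data" chunk in bytes
};

// WAV fields are little-endian; read byte by byte to stay host-independent.
inline uint16_t read_u16(const uint8_t* p) {
  return static_cast<uint16_t>(p[0] | (p[1] << 8));
}
inline uint32_t read_u32(const uint8_t* p) {
  return static_cast<uint32_t>(p[0]) | (static_cast<uint32_t>(p[1]) << 8) |
         (static_cast<uint32_t>(p[2]) << 16) | (static_cast<uint32_t>(p[3]) << 24);
}

// Walk the RIFF chunks until both "fmt " and "data" have been seen.
inline WavInfo parse_wav_header(const std::vector<uint8_t>& buf) {
  if (buf.size() < 12 || std::memcmp(buf.data(), "RIFF", 4) != 0 ||
      std::memcmp(buf.data() + 8, "WAVE", 4) != 0) {
    throw std::runtime_error("not a RIFF/WAVE file");
  }
  WavInfo info;
  bool have_fmt = false;
  size_t pos = 12;
  while (pos + 8 <= buf.size()) {
    const uint8_t* chunk = buf.data() + pos;
    const size_t size = read_u32(chunk + 4);
    const size_t body = pos + 8;
    if (std::memcmp(chunk, "fmt ", 4) == 0) {
      if (size < 16 || body + 16 > buf.size()) {
        throw std::runtime_error("truncated fmt chunk");
      }
      info.audio_format = read_u16(chunk + 8);
      info.num_channels = read_u16(chunk + 10);
      info.sample_rate = read_u32(chunk + 12);
      info.bits_per_sample = read_u16(chunk + 22);
      have_fmt = true;
    } else if (std::memcmp(chunk, "data", 4) == 0) {
      if (!have_fmt) {
        throw std::runtime_error("data chunk before fmt chunk");
      }
      info.data_offset = body;
      // Tolerate a declared size that runs past the end of the buffer.
      info.data_size = std::min(size, buf.size() - body);
      return info;
    }
    pos = body + size + (size & 1);  // chunks are padded to an even length
  }
  throw std::runtime_error("missing fmt or data chunk");
}

// Convert 16-bit or 32-bit integer PCM to floats in [-1, 1).
inline std::vector<float> load_normalized_pcm(const std::vector<uint8_t>& buf) {
  const WavInfo info = parse_wav_header(buf);
  if (info.audio_format != 1 ||
      (info.bits_per_sample != 16 && info.bits_per_sample != 32)) {
    throw std::runtime_error("unsupported format: need 16- or 32-bit PCM");
  }
  const uint8_t* p = buf.data() + info.data_offset;
  const size_t bytes = info.bits_per_sample / 8;
  const size_t n = info.data_size / bytes;
  std::vector<float> out(n);
  for (size_t i = 0; i < n; ++i) {
    if (bytes == 2) {
      out[i] = static_cast<int16_t>(read_u16(p + 2 * i)) / 32768.0f;
    } else {
      out[i] = static_cast<float>(
          static_cast<int32_t>(read_u32(p + 4 * i)) / 2147483648.0);
    }
  }
  return out;
}
```

The sketch returns interleaved samples when the file has more than one channel and reports every failure as `std::runtime_error`; the actual loader may handle channels and errors differently.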
0cff5e8 to d18960f
@larryliu0820 has exported this pull request. If you are a Meta employee, you can view the originating Diff in D84214903.
d18960f to cd552dc
cd552dc to a56734d