[multimodal] Allow float32 image input #14490

pytorchbot · 2025-09-22T21:47:12Z

Lets the `Image` class support both `uint8_t` and `float` data types, and changes the `MultimodalPrefiller` class to support text, image, and audio modalities with added error checking and a more modular structure.

**Image Data Handling and Type Safety:**

* Refactored the `Image` class in `image.h` from a simple struct to a class that uses a `std::variant` to support both `uint8_t` and `float` image data, providing type-safe accessors and a `toTensor` method for conversion to tensors.
* Updated `load_image` in Llava `main.cpp` to construct `Image` objects using the new class interface and move semantics, ensuring correct data layout and encapsulation.
* Added a runtime check in `LlavaImagePrefiller` to ensure only `uint8_t` images are processed, using the new type-checking methods.
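A minimal sketch of the variant-backed image idea, for illustration only. The class name mirrors the PR, but the constructors, the accessor names (`is_uint8`, `is_float`, `uint8_data`, `float_data`), the size check, and the helper `is_supported_by_llava` are assumptions, not the actual `image.h` interface. `toTensor` is omitted because it depends on ExecuTorch's tensor types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical sketch: an image whose pixels are either uint8_t or float.
// Accessor names are illustrative, not the real ExecuTorch image.h API.
class Image {
 public:
  // Takes ownership of 8-bit pixel data (e.g. a decoded JPEG/PNG buffer).
  Image(std::vector<std::uint8_t>&& data, std::int32_t width,
        std::int32_t height, std::int32_t channels)
      : data_(std::move(data)), width_(width), height_(height),
        channels_(channels) {
    validate(std::get<std::vector<std::uint8_t>>(data_).size());
  }

  // Takes ownership of float pixel data (e.g. already-normalized values).
  Image(std::vector<float>&& data, std::int32_t width, std::int32_t height,
        std::int32_t channels)
      : data_(std::move(data)), width_(width), height_(height),
        channels_(channels) {
    validate(std::get<std::vector<float>>(data_).size());
  }

  bool is_uint8() const {
    return std::holds_alternative<std::vector<std::uint8_t>>(data_);
  }
  bool is_float() const {
    return std::holds_alternative<std::vector<float>>(data_);
  }

  // Type-safe accessors: std::get throws std::bad_variant_access if the
  // requested element type does not match the stored one.
  const std::vector<std::uint8_t>& uint8_data() const {
    return std::get<std::vector<std::uint8_t>>(data_);
  }
  const std::vector<float>& float_data() const {
    return std::get<std::vector<float>>(data_);
  }

  std::int32_t width() const { return width_; }
  std::int32_t height() const { return height_; }
  std::int32_t channels() const { return channels_; }

 private:
  // Rejects non-positive dimensions and buffers whose length is not
  // width * height * channels.
  void validate(std::size_t num_values) const {
    if (width_ <= 0 || height_ <= 0 || channels_ <= 0) {
      throw std::invalid_argument("image dimensions must be positive");
    }
    const std::size_t expected = static_cast<std::size_t>(width_) *
                                 static_cast<std::size_t>(height_) *
                                 static_cast<std::size_t>(channels_);
    if (num_values != expected) {
      throw std::invalid_argument("pixel count != width * height * channels");
    }
  }

  std::variant<std::vector<std::uint8_t>, std::vector<float>> data_;
  std::int32_t width_;
  std::int32_t height_;
  std::int32_t channels_;
};

// Hypothetical guard in the spirit of the LlavaImagePrefiller check:
// only 8-bit images are accepted on that path.
bool is_supported_by_llava(const Image& image) { return image.is_uint8(); }
```

Taking the pixel buffer by rvalue reference lets a loader in the style of `load_image` hand over its decoded buffer without a copy, and `std::get` turns a dtype mismatch into a `std::bad_variant_access` exception instead of a silent reinterpretation of the bytes.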

**Multimodal Prefill Logic and Flexibility:**

* Updated the `MultimodalPrefiller` class in `multimodal_prefiller.h` to dynamically check input types, validate tensor types against model expectations, and handle encoder/decoder execution with improved error handling and modularity.
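The dispatch-and-validate step can be sketched as follows. This is a hypothetical illustration, not the actual `MultimodalPrefiller` code: `TextInput`, `ImageInput`, `AudioInput`, `DType`, `Route`, and `route_input` are invented names, and the expected image dtype, which a real runner would presumably read from the model's method metadata, is simply passed in as a parameter.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

// Hypothetical stand-ins; these are not ExecuTorch's MultimodalInput or
// ScalarType types.
enum class DType { UInt8, Float32 };

struct TextInput {
  std::string text;
};
struct ImageInput {
  std::variant<std::vector<std::uint8_t>, std::vector<float>> pixels;
};
struct AudioInput {
  std::vector<float> samples;
};
using MultimodalInput = std::variant<TextInput, ImageInput, AudioInput>;

// Where each input should go during prefill, or why it was rejected.
enum class Route { TextDecoder, ImageEncoder, AudioEncoder, DTypeMismatch };

// Dispatches on the runtime type of the input. `expected_image_dtype` stands
// in for the dtype the image encoder's input declares (assumed to come from
// model metadata in a real runner).
Route route_input(const MultimodalInput& input, DType expected_image_dtype) {
  if (std::holds_alternative<TextInput>(input)) {
    // In this sketch, text skips the encoders and goes to the decoder path.
    return Route::TextDecoder;
  }
  if (const auto* image = std::get_if<ImageInput>(&input)) {
    const DType actual =
        std::holds_alternative<std::vector<std::uint8_t>>(image->pixels)
            ? DType::UInt8
            : DType::Float32;
    // Reject instead of feeding the encoder the wrong element type.
    return actual == expected_image_dtype ? Route::ImageEncoder
                                          : Route::DTypeMismatch;
  }
  // The only remaining alternative is AudioInput.
  return Route::AudioEncoder;
}
```

Returning a distinct `DTypeMismatch` value, rather than converting `uint8_t` pixels to `float` on the fly, reflects the description's emphasis on validating tensor types against what the model expects; a real implementation would more likely surface this as an error code.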

(cherry picked from commit bc18834)

pytorch-bot · 2025-09-22T21:47:17Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14490

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit f66fd96 with merge base e0dda90:

NEW FAILURE - The following job has failed:

pull / test-llava-runner-linux / linux-job (gh)
RuntimeError: Command docker exec -t 800725572e79e67953a3458cc143ecb9dbb24c2c7d0b79a78b4a4c61153f72df /exec failed with exit code 139

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorchbot requested review from jackzhxng, kirklandsign, larryliu0820, lucylq, mergennachin and swolchok as code owners September 22, 2025 21:47

This was referenced Sep 22, 2025

[v1.0.0] Release Tracker #14288 (Closed)

[multimodal] Allow float32 image input #14359 (Merged)

meta-cla bot added the CLA Signed label Sep 22, 2025

larryliu0820 added 3 commits September 22, 2025 15:07

patch · 61ebe1e

Patch llava · 3daa2ba

Patch llava · f66fd96

larryliu0820 approved these changes Sep 22, 2025


larryliu0820 merged commit b0294e2 into release/1.0 Sep 23, 2025
121 of 122 checks passed

larryliu0820 deleted the cherry-pick-14359-by-pytorch_bot_bot_ branch September 23, 2025 00:47
