@pytorchbot
Collaborator

This PR lets the Image class hold both uint8_t and float data, and changes the MultimodalPrefiller class to support text, image, and audio modalities with added error checking and better modularity.

Image Data Handling and Type Safety:

  • Refactored the Image class in image.h from a simple struct to a class that uses a std::variant to support both uint8_t and float image data, providing type-safe accessors and a toTensor method for conversion to tensors.
  • Updated load_image in Llava main.cpp to construct Image objects using the new class interface and move semantics, ensuring correct data layout and encapsulation.
  • Added a runtime check in LlavaImagePrefiller to ensure only uint8_t images are processed, using the new type-checking methods.
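
The variant-backed design described in the bullets above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual contents of image.h: the member names (is_uint8, get_uint8_data, and so on) and the helper can_prefill_llava are hypothetical, and the toTensor conversion is left out because its signature is not given here.

```cpp
// Hypothetical sketch of a variant-backed Image; names are assumptions,
// not the real ExecuTorch image.h interface.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <variant>
#include <vector>

class Image {
 public:
  // Takes ownership of uint8_t pixel data via move semantics.
  Image(std::vector<uint8_t>&& data, int32_t width, int32_t height,
        int32_t channels)
      : data_(std::move(data)),
        width_(width),
        height_(height),
        channels_(channels) {
    // Debug-only sanity check that the buffer matches the stated shape.
    assert(std::get<std::vector<uint8_t>>(data_).size() == expected_size());
  }

  // Takes ownership of float pixel data via move semantics.
  Image(std::vector<float>&& data, int32_t width, int32_t height,
        int32_t channels)
      : data_(std::move(data)),
        width_(width),
        height_(height),
        channels_(channels) {
    assert(std::get<std::vector<float>>(data_).size() == expected_size());
  }

  // Type checks: report which alternative the variant currently holds.
  bool is_uint8() const {
    return std::holds_alternative<std::vector<uint8_t>>(data_);
  }
  bool is_float() const {
    return std::holds_alternative<std::vector<float>>(data_);
  }

  // Type-safe accessors: std::get throws std::bad_variant_access if the
  // image holds the other element type.
  const std::vector<uint8_t>& get_uint8_data() const {
    return std::get<std::vector<uint8_t>>(data_);
  }
  const std::vector<float>& get_float_data() const {
    return std::get<std::vector<float>>(data_);
  }

  int32_t width() const { return width_; }
  int32_t height() const { return height_; }
  int32_t channels() const { return channels_; }

 private:
  std::size_t expected_size() const {
    return static_cast<std::size_t>(width_) * height_ * channels_;
  }

  std::variant<std::vector<uint8_t>, std::vector<float>> data_;
  int32_t width_;
  int32_t height_;
  int32_t channels_;
};

// Hypothetical stand-in for the runtime check in LlavaImagePrefiller:
// only uint8_t images are accepted.
inline bool can_prefill_llava(const Image& image) {
  return image.is_uint8();
}
```

A caller would check is_uint8() or is_float() before choosing an accessor, so a mismatched element type is rejected up front rather than surfacing as an exception.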

Multimodal Prefill Logic and Flexibility:

  • Updated the MultimodalPrefiller class in multimodal_prefiller.h to dynamically check input types, validate tensor types against model expectations, and handle encoder/decoder execution with improved error handling and modularity.
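
To make the "check input type, then validate against model expectations" flow concrete, here is a small sketch. Everything in it is an assumption for illustration: the DType, Status, EncoderSpec, and validate_input names are hypothetical and do not come from multimodal_prefiller.h, and the real class works with tensors and model metadata rather than these simplified structs.

```cpp
// Hypothetical sketch of per-modality input validation before prefill.
// None of these types are the real ExecuTorch API.
#include <cassert>
#include <string>
#include <variant>

enum class DType { UInt8, Float32 };
enum class Status { Ok, InvalidArgument, NotSupported };

struct TextInput {
  std::string text;
};
struct ImageInput {
  DType dtype;  // element type of the image tensor
};
struct AudioInput {
  DType dtype;  // element type of the audio tensor
};

// One input is exactly one of the three modalities.
using MultimodalInput = std::variant<TextInput, ImageInput, AudioInput>;

// What the (hypothetical) model expects from its encoders.
struct EncoderSpec {
  DType expected_image_dtype;
  DType expected_audio_dtype;
  bool has_audio_encoder;
};

// Dispatches on the modality and checks the dtype against the model's
// expectation, returning an error code instead of running the encoder
// on a mismatched input.
inline Status validate_input(const MultimodalInput& input,
                             const EncoderSpec& spec) {
  if (std::holds_alternative<TextInput>(input)) {
    // Text is tokenized and needs no encoder dtype check.
    return Status::Ok;
  }
  if (const auto* image = std::get_if<ImageInput>(&input)) {
    return image->dtype == spec.expected_image_dtype
        ? Status::Ok
        : Status::InvalidArgument;
  }
  // Only AudioInput remains at this point.
  const auto& audio = std::get<AudioInput>(input);
  if (!spec.has_audio_encoder) {
    return Status::NotSupported;
  }
  return audio.dtype == spec.expected_audio_dtype ? Status::Ok
                                                  : Status::InvalidArgument;
}
```

Returning a status code keeps the validation step separate from encoder and decoder execution, which is one plausible reading of the modularity the description mentions.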

(cherry picked from commit bc18834)
@pytorch-bot

pytorch-bot bot commented Sep 22, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14490

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit f66fd96 with merge base e0dda90:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@meta-cla meta-cla bot added the CLA Signed label Sep 22, 2025
@larryliu0820 larryliu0820 merged commit b0294e2 into release/1.0 Sep 23, 2025
121 of 122 checks passed
@larryliu0820 larryliu0820 deleted the cherry-pick-14359-by-pytorch_bot_bot_ branch September 23, 2025 00:47