-
-
Notifications
You must be signed in to change notification settings - Fork 10.6k
[Model] Define merge_by_field_config MM interface (R-T) #26260
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Model] Define merge_by_field_config MM interface (R-T) #26260
Conversation
This pull request has merge conflicts that must be resolved before it can be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces a new merge_by_field_config
interface for multimodal models to optimize input batching. While the refactoring is a good step, it introduces a critical bug in several models (step3_vl.py
, tarsier.py
). The changes incorrectly assume that multimodal data like pixel_values
will always be a single stacked tensor. However, for batches with heterogeneous input sizes (e.g., images of different dimensions), the data will be a list of tensors. The previous flatten_bn
function, which was removed, handled this by concatenating the list. This concatenation logic is now missing, which will lead to AttributeError
and crashes. I've added suggestions to restore the concatenation logic in the affected models.
f115476
to
05760d6
Compare
05760d6
to
ffa00c6
Compare
Have you tested each model using the example script? |
ffa00c6
to
f6f7839
Compare
Signed-off-by: Ayush Satyam <[email protected]>
f6f7839
to
6e3d1d8
Compare
@DarkLight1337 I have done the required changes. But, I am sorry since I couldn't run the example scripts due to GPU constraints |
Signed-off-by: DarkLight1337 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for helping out! However I found that both Step3-VL and Terratorch fail to be run locally. I'll try to fix them later.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
Signed-off-by: DarkLight1337 <[email protected]>
…#26260) Signed-off-by: Ayush Satyam <[email protected]> Signed-off-by: DarkLight1337 <[email protected]> Co-authored-by: DarkLight1337 <[email protected]>
Part of #26149