Skip to content

Conversation

ayushsatyam146
Copy link
Contributor

@ayushsatyam146 ayushsatyam146 commented Oct 5, 2025

Part of #26149

Copy link

mergify bot commented Oct 5, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ayushsatyam146.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Oct 5, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new merge_by_field_config interface for multimodal models to optimize input batching. While the refactoring is a good step, it introduces a critical bug in several models (step3_vl.py, tarsier.py). The changes incorrectly assume that multimodal data like pixel_values will always be a single stacked tensor. However, for batches with heterogeneous input sizes (e.g., images of different dimensions), the data will be a list of tensors. The previous flatten_bn function, which was removed, handled this by concatenating the list. This concatenation logic is now missing, which will lead to AttributeError and crashes. I've added suggestions to restore the concatenation logic in the affected models.

@ayushsatyam146 ayushsatyam146 force-pushed the merge_by_field_config-R branch from f115476 to 05760d6 Compare October 5, 2025 17:37
@mergify mergify bot removed the needs-rebase label Oct 5, 2025
@ayushsatyam146 ayushsatyam146 force-pushed the merge_by_field_config-R branch from 05760d6 to ffa00c6 Compare October 5, 2025 17:56
@DarkLight1337
Copy link
Member

Have you tested each model using the example script?

@ayushsatyam146 ayushsatyam146 force-pushed the merge_by_field_config-R branch from ffa00c6 to f6f7839 Compare October 6, 2025 06:03
@ayushsatyam146 ayushsatyam146 force-pushed the merge_by_field_config-R branch from f6f7839 to 6e3d1d8 Compare October 6, 2025 18:19
@ayushsatyam146
Copy link
Contributor Author

@DarkLight1337 I have done the required changes. But, I am sorry since I couldn't run the example scripts due to GPU constraints

Signed-off-by: DarkLight1337 <[email protected]>
Copy link
Member

@DarkLight1337 DarkLight1337 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping out! However I found that both Step3-VL and Terratorch fail to be run locally. I'll try to fix them later.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants