
Conversation

@zhangbilang
Collaborator

Accuracy Alignment


@gemini-code-assist bot left a comment


Summary of Changes

Hello @SmudgedWings, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily focuses on enhancing the compatibility and performance alignment of FastV, FasterVLM, and VisionZip token reduction methods with the LLaVA 1.5 model. The changes involve updating configuration parameters, refactoring the core logic of these methods to support different LLaVA model variants, and ensuring proper handling of model-specific settings like use_cache.

Highlights

  • Configuration Updates: I've updated the sparsification rates for FastV and FasterVLM in their respective YAML configuration files to 0.778. Additionally, the VisionZip configuration has been restructured to explicitly group vision-related parameters under a new vision key.
  • LLaVA 1.5 Model Compatibility: I've adapted the FastV, FasterVLM, and VisionZip token reduction implementations to ensure compatibility with different LLaVA model architectures (specifically LlavaHf and Llava). This involved introducing conditional logic for hook registrations and adjusting internal parameter handling (pruning_pars renamed to pruning_paras) based on the detected model class.
  • VisionZip Refactoring: The VisionZip integration has undergone significant refactoring. The custom CLIPVisionTower_VisionZip class has been removed, and its core logic is now applied by directly patching the CLIPEncoderLayer's forward method for Llava models. This allows for a more direct and integrated approach to token reduction within the vision tower.
  • Model Cache Management: I've enforced use_cache to be True for LLaVA model configurations, as it appears to be a necessary setting for these sparsification methods. Correspondingly, an explicit use_cache = False setting within the vlm_model.forward wrapper has been removed to prevent conflicts (see the sketch after this list).
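As a rough illustration of the cache highlight above, here is a minimal sketch of forcing use_cache on. The helper name and the vlm_model.config attribute path are assumptions for illustration, not the PR's actual code.

def enforce_use_cache(model_wrapper):
    # Hypothetical helper: the sparsification methods appear to need the KV
    # cache, so make sure the underlying Hugging Face config has it enabled.
    config = model_wrapper.vlm_model.config  # assumed attribute path
    if not getattr(config, 'use_cache', False):
        config.use_cache = True
    return config.use_cache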
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist bot left a comment


Code Review

This pull request introduces support for FastV, FasterVLM, and VisionZip token reduction methods for LLaVA 1.5 models. The changes primarily involve adding conditional logic to handle different LLaVA model architectures and implementing the necessary hooks and model modifications for these compression techniques. The review identified critical issues related to global monkey-patching and potential runtime errors, along with suggestions for code robustness and maintainability.

Comment on lines +53 to +54
else:
    pruning_paras['image_attentions'] = pruning_paras['image_attentions_list']


critical

If len(pruning_paras['image_attentions_list']) is greater than 1, pruning_paras['image_attentions'] will be a list of tensors, but the pruning_hook at line 59 expects a single tensor, so this will raise an AttributeError. Stack and average the attentions here.

Suggested change
-else:
-    pruning_paras['image_attentions'] = pruning_paras['image_attentions_list']
+else:
+    # If multiple attention tensors are collected, stack and average them.
+    pruning_paras['image_attentions'] = torch.stack(pruning_paras['image_attentions_list']).mean(dim=0)
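For illustration, a small self-contained check of the stack-and-average behaviour; the tensor shapes below are invented for the example, not taken from the PR.

import torch

# Three fake per-image attention maps with shape (batch, heads, num_patches).
image_attentions_list = [torch.rand(1, 16, 576) for _ in range(3)]

# torch.stack adds a new leading dim -> (3, 1, 16, 576); mean(dim=0) collapses
# it back to a single (1, 16, 576) tensor, so a downstream hook that expects
# one tensor (rather than a list) keeps working.
image_attentions = torch.stack(image_attentions_list).mean(dim=0)
assert image_attentions.shape == (1, 16, 576)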

-self.model.vlm_model.__class__.forward = visionzip_forward
-self.model.vlm_model.vision_tower.register_forward_pre_hook(
+if self.model.__class__.__name__ == 'LlavaHf':
+    self.model.vlm_model.__class__.forward = visionzip_forward


critical

Modifying __class__.forward monkey-patches the method for all instances of this class. Modify the forward method on the instance instead using types.MethodType.

from types import MethodType

self.model.vlm_model.forward = MethodType(visionzip_forward, self.model.vlm_model)
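To make the difference concrete, a toy sketch (dummy class, not the project's code) contrasting instance-level and class-level patching:

from types import MethodType

class Dummy:
    def forward(self, x):
        return x

def patched_forward(self, x):
    return x * 2  # stand-in for visionzip_forward

a, b = Dummy(), Dummy()

# Instance-level patch: only `a` is affected.
a.forward = MethodType(patched_forward, a)
print(a.forward(3), b.forward(3))  # 6 3

# Class-level patch (what the review warns about): every instance changes.
Dummy.forward = patched_forward
print(b.forward(3))  # 6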

    self.model.vlm_model.__class__.forward = visionzip_forward
elif self.model.__class__.__name__ == 'Llava':
    from transformers.models.clip.modeling_clip import CLIPEncoderLayer
    CLIPEncoderLayer.forward = CLIP_EncoderLayer_forward


critical

Globally monkey-patching CLIPEncoderLayer.forward is extremely risky. Modify the method only for the specific instances you are working with, or if you must patch the class, restore the original method afterward to limit the scope of the change.
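If patching the class really is unavoidable, one way to limit the blast radius is a context manager that restores the original method on exit. This is a sketch only; CLIPEncoderLayer and CLIP_EncoderLayer_forward are the names from the diff, while the helper and the usage lines are hypothetical.

from contextlib import contextmanager

@contextmanager
def temporarily_patched(cls, attr, replacement):
    # Swap cls.<attr> for `replacement`, restoring the original afterwards.
    original = getattr(cls, attr)
    setattr(cls, attr, replacement)
    try:
        yield
    finally:
        setattr(cls, attr, original)

# Usage sketch (not verified against the PR):
# with temporarily_patched(CLIPEncoderLayer, 'forward', CLIP_EncoderLayer_forward):
#     outputs = llava_model.generate(**inputs)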

elif pruning_paras['select_feature'] in ('full', 'cls_patch'):
    image_attention = image_attentions
else:
    raise ValueError(f'Unexpected select feature: {self.select_feature}')


high

The variable self.select_feature is not defined in this class context. Use pruning_paras['select_feature'] instead.

Suggested change
-    raise ValueError(f'Unexpected select feature: {self.select_feature}')
+    raise ValueError(f"Unexpected select feature: {pruning_paras['select_feature']}")

Comment on lines +145 to +151
hook_fn = make_hook_prepare_inputs_labels_for_multimodal(self.pruning_paras)
self.model.vlm_model._original_prepare_inputs_labels_for_multimodal = (
    self.model.vlm_model.prepare_inputs_labels_for_multimodal
)
self.model.vlm_model.prepare_inputs_labels_for_multimodal = MethodType(
    hook_fn, self.model.vlm_model
)


high

Monkey-patching methods at runtime can make the code difficult to understand and maintain. Consider a more explicit mechanism, like subclassing the model or using a wrapper class. If monkey-patching is the only viable option, add a comment explaining why it's necessary.
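One possible shape for the more explicit mechanism the comment asks for, sketched with a pass-through wrapper. The helper name and the delegation logic are assumptions, not the PR's implementation.

from types import MethodType

def install_prepare_inputs_wrapper(vlm_model, pruning_paras):
    # Monkey-patching is used because the upstream LLaVA class is instantiated
    # by library code and is awkward to subclass here; keeping the original
    # method on the instance documents the override and allows restoration.
    original = vlm_model.prepare_inputs_labels_for_multimodal

    def wrapped(self, *args, **kwargs):
        # Illustrative only: pruning_paras would be consulted here before
        # delegating to the original implementation.
        return original(*args, **kwargs)

    vlm_model._original_prepare_inputs_labels_for_multimodal = original
    vlm_model.prepare_inputs_labels_for_multimodal = MethodType(wrapped, vlm_model)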

Comment on lines +139 to +186
if self.model.__class__.__name__ == 'LlavaHf':
    self.model.vision_model.register_forward_pre_hook(
        update_output_attentions_hook,
        with_kwargs=True
    )

    self.model.vision_model.register_forward_hook(
        functools.partial(store_attention_hook, pruning_paras=self.pruning_paras),
    )
elif self.model.__class__.__name__ == 'Llava':
    self.model.vision_model.register_forward_pre_hook(
        functools.partial(clear_attentions_hook, pruning_paras=self.pruning_paras),
    )

    self.model.vision_model.register_forward_hook(
        functools.partial(update_attentions_hook, pruning_paras=self.pruning_paras),
    )

    self.model.vision_model.vision_tower.register_forward_pre_hook(
        update_output_attentions_hook,
        with_kwargs=True
    )

    self.model.vision_model.vision_tower.register_forward_hook(
        functools.partial(store_attention_hook, pruning_paras=self.pruning_paras),
    )

self.model.vision_projector.register_forward_pre_hook(
-   functools.partial(pruning_hook, pruning_pars=self.model.model.parameters),
+   functools.partial(pruning_hook, pruning_paras=self.pruning_paras),
    with_kwargs=True
)

-self.model.vlm_model.register_forward_pre_hook(
-    functools.partial(get_image_mask_hook, pruning_pars=self.model.model.parameters),
-    with_kwargs=True
-)
+if self.model.__class__.__name__ == 'LlavaHf':
+    self.model.vlm_model.register_forward_pre_hook(
+        functools.partial(get_image_mask_hook, pruning_paras=self.pruning_paras),
+        with_kwargs=True
+    )

-self.model.model.register_forward_pre_hook(
-    functools.partial(
-        prepare_inputs_for_llm_hook, pruning_pars=self.model.model.parameters
-    ),
-    with_kwargs=True
-)
+    self.model.model.register_forward_pre_hook(
+        functools.partial(
+            prepare_inputs_for_llm_hook, pruning_paras=self.pruning_paras
+        ),
+        with_kwargs=True
+    )
+elif self.model.__class__.__name__ == 'Llava':
+    self.model.vision_projector.register_forward_hook(
+        functools.partial(prepare_inputs_hook, pruning_paras=self.pruning_paras),
+    )


medium

Using self.model.__class__.__name__ for conditional logic is brittle. Use isinstance with the specific model classes (e.g., isinstance(self.model, LlavaHf)) for a more robust approach.
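A toy sketch of the suggested isinstance dispatch; the classes below are stand-ins for the project's LlavaHf and Llava wrappers.

class LlavaHf:
    pass

class Llava:
    pass

def hook_plan(model):
    # isinstance is robust to subclassing and re-exported class paths, unlike
    # comparing model.__class__.__name__ against a string literal.
    if isinstance(model, LlavaHf):
        return 'register hooks on vision_model'
    elif isinstance(model, Llava):
        return 'register hooks on vision_model.vision_tower'
    raise NotImplementedError(f'Unsupported model type: {type(model).__name__}')

print(hook_plan(LlavaHf()))  # register hooks on vision_model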

    torch.bmm(assign_one_hot.transpose(1, 2), hidden_to_merge) / counts
)
target_hidden = hidden_states_filtered[:, target_indices, :]
r = self.self_attn.k_proj._info['r'].pop(0)


medium

Using pop(0) on a list is an O(n) operation. Consider using collections.deque, which provides an O(1) popleft() operation.
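A quick self-contained comparison; the schedule values are made up for the example.

from collections import deque

# Stand-in for the per-layer reduction schedule kept in k_proj._info['r'].
r_schedule = deque([16, 16, 8, 8])

r = r_schedule.popleft()    # O(1); list.pop(0) would shift every remaining element
print(r, list(r_schedule))  # 16 [16, 8, 8]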

if self.model.__class__.__name__ == 'LlavaHf':
    self.model.vlm_model.__class__.forward = visionzip_forward
elif self.model.__class__.__name__ == 'Llava':
    from transformers.models.clip.modeling_clip import CLIPEncoderLayer


medium

Imports should generally be at the top of the file, as per PEP 8 guidelines. Move this import to the top of the file.

@zhangbilang merged commit ff18600 into ModelTC:main on Jun 22, 2025
2 checks passed