Conversation

@zhangbilang (Collaborator)

Support Qwen2.5-VL (FastV, DART, VisionZip); set token_start_idx, token_length, and vision_token_idx in the model.

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @SmudgedWings, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly extends the system's capabilities by integrating comprehensive support for the Qwen2.5-VL multimodal model. It adapts existing token reduction techniques—DART, FastV, and VisionZip—to work seamlessly with Qwen2.5-VL's architecture, particularly focusing on accurate vision token identification and efficient pruning. The changes involve introducing a new model definition, refining token handling logic across the sparsification methods, and adding visualization tools to aid in analysis.

Highlights

  • Qwen2.5-VL Model Integration: Added a new Qwen2_5VL model class and registered it, enabling the use of this model within the system. This includes defining its specific architecture and layer subsets for pruning.
  • Token Reduction Method Adaptation: Modified DART, FastV, and VisionZip to correctly identify and process vision tokens (vision_token_index, vision_token_length, vision_token_start_index) for Qwen2.5-VL and other VLM models, ensuring compatibility and proper pruning (a brief sketch of how these indices fit together follows this list).
  • Qwen2.5-VL Specific VisionZip Logic: Implemented a dedicated Qwen2_5_VLVisionAttention_forward function and associated hooks within VisionZip to handle the unique attention mechanism and token merging for Qwen2.5-VL's visual tokens.
  • Generalized Token Configuration: Updated the base TokenReductionModule to retrieve vision token configuration parameters more robustly using .get() methods, improving flexibility.
  • Visualization Utilities: Introduced a new visualizer.py module with functions to visualize kept image patches and attention maps, which can be valuable for debugging and understanding the token reduction process.
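
As a rough illustration of how the three index values in the second highlight fit together, here is a minimal sketch. The helper name and the 576-token length are hypothetical; only the start index 15 is taken from the PR's configuration snippet quoted later in this review.

# Hypothetical sketch, not the PR's actual implementation: recover the
# vision-token span from the configuration keys named above.
def vision_token_slice(special_config):
    start = special_config.get('vision_token_start_index', 0)
    length = special_config.get('vision_token_length', 0)
    return slice(start, start + length)

cfg = {'vision_token_start_index': 15, 'vision_token_length': 576}  # 576 is an assumed example
print(vision_token_slice(cfg))  # slice(15, 591, None)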

@gemini-code-assist (bot) left a comment

Code Review

This pull request adds support for the qwen2.5vl model and refactors several token reduction methods (DART, FastV, VisionZip) to be more generic and support the new model architecture. The changes are extensive and introduce new model definitions, evaluation wrappers, and specific pruning logic for qwen2.5vl.

My review has identified a critical bug in the new pruning logic that would likely cause it to fail. I've also pointed out several medium-severity issues related to code correctness, robustness, and adherence to best practices. Addressing these points will improve the quality and stability of the new features.

Comment on lines +491 to +492
dominant_num = int(self.dominant * attn_logits.size(0))
contextual_num = max(int(self.contextual * attn_logits.size(0)), 1)

critical

The calculation for dominant_num and contextual_num appears to be incorrect. self.dominant and self.contextual are loaded from the configuration as absolute token counts. However, they are being multiplied by attn_logits.size(0) (the sequence length), which will result in values that are far too large and will likely cause an error in the subsequent torch.topk call.

The multiplication should be removed, and the values from the configuration should be used directly.

Suggested change
- dominant_num = int(self.dominant * attn_logits.size(0))
- contextual_num = max(int(self.contextual * attn_logits.size(0)), 1)
+ dominant_num = self.dominant
+ contextual_num = max(self.contextual, 1)
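
To make the failure mode concrete, here is a small standalone sketch; the sequence length of 1024 is illustrative, and the counts 191 and 30 mirror the defaults used elsewhere in this review.

import torch

attn_logits = torch.rand(1024)   # illustrative per-token scores for a 1024-token sequence
dominant, contextual = 191, 30   # absolute token counts, as the config intends

# Buggy scaling: k = 191 * 1024 far exceeds the tensor size, so topk errors out.
try:
    torch.topk(attn_logits, int(dominant * attn_logits.size(0)))
except RuntimeError as err:
    print('topk failed:', err)

# Using the configured counts directly keeps k within bounds.
dominant_idx = torch.topk(attn_logits, dominant).indices
contextual_idx = torch.topk(attn_logits, max(contextual, 1)).indices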

Comment on lines +290 to +291
self.dominant = special_config['dominant']
self.contextual = special_config['contextual']

medium

Using direct dictionary access special_config['...'] is less safe than using special_config.get('...', default_value). This change makes the code less robust, as it will raise a KeyError if the keys are not present in the configuration. It's better to retain the safe access pattern with default values.

Suggested change
- self.dominant = special_config['dominant']
- self.contextual = special_config['contextual']
+ self.dominant = special_config.get('dominant', 191)
+ self.contextual = special_config.get('contextual', 30)
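
For reference, a minimal illustration of the difference; the empty config is hypothetical.

special_config = {}  # hypothetical config that omits both keys

# special_config['dominant'] would raise KeyError: 'dominant'
dominant = special_config.get('dominant', 191)    # falls back to the default instead
contextual = special_config.get('contextual', 30)
print(dominant, contextual)  # 191 30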

first, last = st_idx[0].item(), st_idx[-1].item()
img_mask[first: last + 1] = ~select_mask
img_mask = ~img_mask
contexual_input_idx = false_pos[target_indices] + first

medium

There is a typo in the variable name contexual_input_idx. It should be contextual_input_idx. This typo is also present on line 565 where the variable is used.

Suggested change
- contexual_input_idx = false_pos[target_indices] + first
+ contextual_input_idx = false_pos[target_indices] + first

inputs_embeds[:, contexual_input_idx] = contextual_tokens
kwargs['inputs_embeds'] = inputs_embeds[:, img_mask]
del contextual_tokens, hidden_states_filtered, hidden_to_merge, aggregated_hidden
torch.cuda.empty_cache()

medium

The use of torch.cuda.empty_cache() can negatively impact performance as it is a blocking, synchronizing call. While it can help in releasing memory, it's often a sign of high memory pressure that might be better addressed by optimizing tensor lifetimes. Please consider if this call is strictly necessary, especially in library code.
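
One way to act on this is sketched below, using the variable names from the diff (with the typo fix from the earlier comment applied); the function boundary itself is hypothetical.

def prune_vision_tokens(inputs_embeds, contextual_tokens, contextual_input_idx, img_mask, kwargs):
    # Write the merged contextual tokens back, then keep only the unpruned positions.
    inputs_embeds[:, contextual_input_idx] = contextual_tokens
    kwargs['inputs_embeds'] = inputs_embeds[:, img_mask]
    # Intermediates are freed once they go out of scope; an explicit
    # torch.cuda.empty_cache() here would synchronize the device on every forward
    # pass, so it is better reserved for coarse boundaries (e.g. between evaluation
    # batches) if fragmentation actually becomes a problem.
    return kwargs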

'vision_token_start_index': 15
}

# todo: check

medium

A todo: check comment is present in the code. Such comments should be addressed and removed before merging to ensure the code is complete and has been verified.

logger.warning(
'Can not import lmms_eval. '
'If you need it, please upgrade transformers.'
)
(no newline at end of file)

medium

This file does not end with a newline character. It is a standard convention to end files with a newline to avoid issues with various command-line tools and file concatenation.

from PIL import Image, ImageDraw
import matplotlib.pyplot as plt

def save_image(imgae_tensor,mean,std,save_path):

medium

There is a typo in the function parameter imgae_tensor. It should be image_tensor for clarity and correctness.

Suggested change
- def save_image(imgae_tensor,mean,std,save_path):
+ def save_image(image_tensor,mean,std,save_path):
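
For context, a hypothetical sketch of what such a helper might look like once the parameter is renamed; the denormalization details are assumptions, not the PR's actual code.

import torch
from PIL import Image

def save_image(image_tensor, mean, std, save_path):
    # Assumes image_tensor is (C, H, W) and was normalized with per-channel mean/std.
    mean = torch.tensor(mean).view(-1, 1, 1)
    std = torch.tensor(std).view(-1, 1, 1)
    img = (image_tensor.detach().cpu() * std + mean).clamp(0, 1)
    Image.fromarray((img * 255).byte().permute(1, 2, 0).numpy()).save(save_path)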

@llmc-reviewer merged commit 5f2895f into ModelTC:main on Jul 23, 2025
2 checks passed