support qwen2.5vl(fastv,dart,visionzip) #423
Conversation
Summary of Changes
Hello @SmudgedWings, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly extends the system's capabilities by integrating comprehensive support for the Qwen2.5-VL multimodal model. It adapts existing token reduction techniques—DART, FastV, and VisionZip—to work seamlessly with Qwen2.5-VL's architecture, particularly focusing on accurate vision token identification and efficient pruning. The changes involve introducing a new model definition, refining token handling logic across the sparsification methods, and adding visualization tools to aid in analysis.
Highlights
- Qwen2.5-VL Model Integration: Added a new `Qwen2_5VL` model class and registered it, enabling the use of this model within the system. This includes defining its specific architecture and layer subsets for pruning.
- Token Reduction Method Adaptation: Modified DART, FastV, and VisionZip to correctly identify and process vision tokens (`vision_token_index`, `vision_token_length`, `vision_token_start_index`) for Qwen2.5-VL and other VLM models, ensuring compatibility and proper pruning (a minimal, illustrative sketch follows this list).
- Qwen2.5-VL Specific VisionZip Logic: Implemented a dedicated `Qwen2_5_VLVisionAttention_forward` function and associated hooks within VisionZip to handle the unique attention mechanism and token merging for Qwen2.5-VL's visual tokens.
- Generalized Token Configuration: Updated the base `TokenReductionModule` to retrieve vision token configuration parameters more robustly using `.get()` methods, improving flexibility.
- Visualization Utilities: Introduced a new `visualizer.py` module with functions to visualize kept image patches and attention maps, which can be valuable for debugging and understanding the token reduction process.
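For a concrete picture of what "vision token identification" means here, below is a minimal, illustrative sketch of locating the contiguous span of image placeholder tokens in an input sequence. The helper name, the placeholder id, and the sample token ids are assumptions for the example, not code from this pull request.

```python
import torch


def find_vision_token_span(input_ids: torch.Tensor, image_token_id: int):
    """Return (start_index, length) of the contiguous vision-token span.

    Assumes a single contiguous block of image placeholder tokens, as is
    typical for Qwen2.5-VL-style inputs. The helper name and the placeholder
    id used below are hypothetical.
    """
    positions = (input_ids == image_token_id).nonzero(as_tuple=True)[0]
    if positions.numel() == 0:
        return None, 0  # no image in this sample
    return positions[0].item(), positions.numel()


# Made-up token ids: three image placeholder tokens embedded in text tokens.
input_ids = torch.tensor([1, 15, 151655, 151655, 151655, 42, 2])
start, length = find_vision_token_span(input_ids, image_token_id=151655)
print(start, length)  # 2 3
```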
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request adds support for the qwen2.5vl model and refactors several token reduction methods (DART, FastV, VisionZip) to be more generic and support the new model architecture. The changes are extensive and introduce new model definitions, evaluation wrappers, and specific pruning logic for qwen2.5vl.
My review has identified a critical bug in the new pruning logic that would likely cause it to fail. I've also pointed out several medium-severity issues related to code correctness, robustness, and adherence to best practices. Addressing these points will improve the quality and stability of the new features.
```python
dominant_num = int(self.dominant * attn_logits.size(0))
contextual_num = max(int(self.contextual * attn_logits.size(0)), 1)
```
The calculation for `dominant_num` and `contextual_num` appears to be incorrect. `self.dominant` and `self.contextual` are loaded from the configuration as absolute token counts. However, they are being multiplied by `attn_logits.size(0)` (the sequence length), which produces values that are far too large and will likely cause an error in the subsequent `torch.topk` call.
The multiplication should be removed, and the values from the configuration should be used directly.
```diff
-dominant_num = int(self.dominant * attn_logits.size(0))
-contextual_num = max(int(self.contextual * attn_logits.size(0)), 1)
+dominant_num = self.dominant
+contextual_num = max(self.contextual, 1)
```
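To make the failure mode concrete (the tensor size and configured count below are made up): if `self.dominant` is already an absolute token count, multiplying it by the sequence length inflates `k` far beyond the number of available scores, and `torch.topk` rejects any `k` larger than the dimension it selects over.

```python
import torch

attn_logits = torch.randn(600)   # e.g. 600 vision-token scores (made-up size)
dominant = 191                   # configured as an absolute token count

k_right = dominant
k_wrong = int(dominant * attn_logits.size(0))  # 191 * 600 = 114600

torch.topk(attn_logits, k_right)      # fine: keeps the 191 highest-scoring tokens
try:
    torch.topk(attn_logits, k_wrong)  # fails: k exceeds the number of elements
except RuntimeError as err:
    print(f'topk failed as expected: {err}')
```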
```python
self.dominant = special_config['dominant']
self.contextual = special_config['contextual']
```
Using direct dictionary access `special_config['...']` is less safe than `special_config.get('...', default_value)`. This change makes the code less robust, as it will raise a `KeyError` if the keys are not present in the configuration. It's better to retain the safe access pattern with default values.
```diff
-self.dominant = special_config['dominant']
-self.contextual = special_config['contextual']
+self.dominant = special_config.get('dominant', 191)
+self.contextual = special_config.get('contextual', 30)
```
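A quick illustration of the behavioral difference, using made-up config contents:

```python
special_config = {'dominant': 128}  # 'contextual' intentionally missing

# Direct indexing raises KeyError for the missing key.
try:
    contextual = special_config['contextual']
except KeyError:
    print("KeyError: 'contextual'")

# .get() falls back to a default instead.
contextual = special_config.get('contextual', 30)
print(contextual)  # 30
```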
```python
first, last = st_idx[0].item(), st_idx[-1].item()
img_mask[first: last + 1] = ~select_mask
img_mask = ~img_mask
contexual_input_idx = false_pos[target_indices] + first
```
```python
inputs_embeds[:, contexual_input_idx] = contextual_tokens
kwargs['inputs_embeds'] = inputs_embeds[:, img_mask]
del contextual_tokens, hidden_states_filtered, hidden_to_merge, aggregated_hidden
torch.cuda.empty_cache()
```
The use of `torch.cuda.empty_cache()` can negatively impact performance, as it is a blocking, synchronizing call. While it can help release memory, it is often a sign of high memory pressure that might be better addressed by optimizing tensor lifetimes. Please consider whether this call is strictly necessary, especially in library code.
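For illustration, one way to address the underlying memory pressure is to keep the large temporaries local to a helper so they are released when it returns, letting the caching allocator reuse that memory without an explicit `empty_cache()`. The helper name, signature, and shapes below are assumptions, not code from this pull request.

```python
import torch


def prune_inputs(inputs_embeds, img_mask, contexual_input_idx, contextual_tokens):
    """Hypothetical helper mirroring the quoted code; names and signature are assumptions.

    Large temporaries created while merging tokens stay local to this function,
    so their memory returns to PyTorch's caching allocator when it exits and
    can be reused by later allocations without torch.cuda.empty_cache().
    """
    inputs_embeds[:, contexual_input_idx] = contextual_tokens
    return inputs_embeds[:, img_mask]


# Tiny CPU-only usage example with made-up shapes.
embeds = torch.zeros(1, 6, 4)
mask = torch.tensor([True, True, False, True, True, True])
idx = torch.tensor([3])
ctx = torch.ones(1, 1, 4)
print(prune_inputs(embeds, mask, idx, ctx).shape)  # torch.Size([1, 5, 4])
```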
```python
'vision_token_start_index': 15
}


# todo: check
```
llmc/models/qwen2_5vl.py (Outdated)
```python
logger.warning(
    'Can not import lmms_eval. '
    'If you need it, please upgrade transformers.'
)
```
llmc/utils/visualizer.py (Outdated)
```python
from PIL import Image, ImageDraw
import matplotlib.pyplot as plt


def save_image(imgae_tensor,mean,std,save_path):
```
support qwen2.5vl (fastv, dart, visionzip); set token_start_idx, token_length, vision_token_idx in the model