update vispruner #425

zhangbilang · 2025-07-26T06:09:47Z

No description provided.

gemini-code-assist

Summary of Changes

Hello @SmudgedWings, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new visual token reduction method named VisPruner designed for vision-language models, aiming to improve efficiency by intelligently pruning redundant visual tokens. Alongside VisPruner, a comprehensive Visualizer tool is added to provide insights into the token reduction process through various visualizations. The changes also include necessary model adaptations and enhancements to core visualization utilities to support these new functionalities.

Highlights

New Visual Token Pruning Method: I've introduced VisPruner, a novel token reduction strategy designed for vision-language models. This method intelligently prunes redundant visual tokens by selecting important ones based on attention scores and diverse ones through an iterative pruning process based on feature similarity. This aims to improve model efficiency without significant performance degradation.
Integrated Visualization Tool: To aid in debugging and understanding the token reduction process, I've added a Visualizer class. This tool can capture and display various aspects of the model's internal state, including attention maps (per head and grid-to-grid) and the specific image patches that are retained after pruning.
Configuration and Integration: A new YAML configuration file has been added for VisPruner, allowing for easy setup and parameter tuning (e.g., prune_ratio, important_ratio). Both VisPruner and Visualizer have been integrated into the token_reduction module, making them discoverable and usable within the framework.
Model Compatibility Updates: I've made necessary adjustments to the Llava model's configuration loading, simplifying the process by removing explicit LlavaConfig usage. Additionally, the way pruning parameters (select_layer, select_feature) are accessed has been updated to reference the vision_model directly. Minor cleanups and flag additions were also made to the Qwen2.5VL model.
Enhanced Visualization Utilities: The core llmc/utils/visualizer.py module has been significantly enhanced. I've added a new helper function (to_pil_image) for consistent image conversion and modified existing plotting functions (grid_show, visualize_heads, visualize_grid_to_grid) to save generated visualizations to disk instead of displaying them directly. This improves automation and reproducibility of visualization outputs.
Minor Fixes and Improvements: I've addressed a typo (contexual_input_idx to contextual_input_idx) and introduced a check for non-contiguous visual tokens within the visionzip.py module, improving the robustness of token handling.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in issue comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces the VisPruner method and a Visualizer utility. The changes are a good addition but include several critical issues like leftover debugging code (pdb, hardcoded arguments), bugs that will cause runtime errors, and some maintainability concerns like magic numbers. These issues need to be addressed before this can be merged.

gemini-code-assist · 2025-07-26T06:12:27Z

llmc/compression/token_reduction/visionzip.py

+                    raise ValueError('Visual tokens are not contiguous in input_ids!')
+                    segment_starts = [st_idx[0].item()] + [st_idx[i + 1].item() for i in discontinuities.tolist()]  # noqa
+                    segment_ends = [st_idx[i].item() for i in discontinuities.tolist()] + [st_idx[-1].item()]  # noqa
+                    offset = 0
+                    for first, last in zip(segment_starts, segment_ends):
+                        length = last - first + 1
+                        # [15 1502] [1505 3289]
+                        img_mask[first: last + 1] = ~select_mask[offset: offset + length]


The raise ValueError statement on line 544 makes the subsequent lines (545-551) unreachable. This dead code should be removed to improve clarity.

If the intention is to support discontinuous visual tokens, the raise statement should be removed, and the logic to handle multiple segments needs to be fully implemented, as the current implementation seems incomplete (e.g., the calculation of contextual_input_idx and slicing of inputs_embeds assume a single contiguous block).

raise ValueError('Visual tokens are not contiguous in input_ids!')

gemini-code-assist · 2025-07-26T06:12:27Z

llmc/compression/token_reduction/vispruner.py

+                image_attentions = image_attentions[:, :, 0, 1:]
+            elif pruning_paras['select_feature'] == 'cls_patch':
+                image_attentions = image_attentions
+                raise ValueError(f'Unexpected select feature: {self.select_feature}')


The attribute self.select_feature is used here but it's not defined on the VisPruner instance. This will cause an AttributeError at runtime.

You probably meant to use pruning_paras['select_feature'], which is available in the hook's scope.

raise ValueError(f"Unexpected select feature: {pruning_paras['select_feature']}")

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+            attention_maps = pruning_paras['attentions'][0]
+            visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]
+            image = pruning_paras['images'][0]
+            import pdb;pdb.set_trace()


A pdb.set_trace() call is present. This is a debugger breakpoint and must be removed from the code before merging.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+
+            visualize_kept_patches(
+                pruning_paras['images'][0],
+                pruning_paras['visual_keep_indexs'],


There is a typo in the dictionary key: visual_keep_indexs. This will raise a KeyError at runtime. It should be visual_keep_indices.

Suggested change

pruning_paras['visual_keep_indexs'],

pruning_paras['visual_keep_indices'],

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/eval/eval_vqa.py

+        import argparse
+        cli_args = argparse.Namespace(
+            process_with_media=True,
+        )


This block hardcodes cli_args, which overrides any arguments passed to the function. This appears to be a debugging remnant and should be removed.

Additionally, import argparse should be moved to the top of the file per PEP 8 guidelines.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+            visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]
+            image = pruning_paras['images'][0]
+            import pdb;pdb.set_trace()
+            print(__file__)


This print() statement seems to be for debugging purposes and should be removed.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+                save_path=''
+            )
+            visualize_grid_to_grid(
+                visual_attention_maps[0,4,:,:],                
+                300, 
+                image,
+                grid_size=24,
+                save_path='')
+
+            visualize_kept_patches(
+                pruning_paras['images'][0],
+                pruning_paras['visual_keep_indexs'],
+                patch_size=14,
+                save_path='',


The save_path argument is hardcoded to an empty string ('') for visualize_heads, visualize_grid_to_grid, and visualize_kept_patches. This will prevent the visualizations from being saved. These paths should be configurable, likely passed in through the pruning_paras.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/vispruner.py

+
+            B, N, C = image_features.shape
+            device = image_features.device
+            index_masks = torch.ones(B, N, dtype=torch.bool, device=device)


The index_masks variable is initialized here but it is never used before being redefined on line 79. This line is dead code and can be removed.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+        @prefill_wrapper
+        def visualizer_hook(module, inps, layer_outs, pruning_paras):
+            attention_maps = pruning_paras['attentions'][0]
+            visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]


The values 35 and 576 are magic numbers. They make the code harder to understand and maintain. Please define them as named constants or fetch them from the configuration, similar to how it's done in other modules.

gemini-code-assist · 2025-07-26T06:12:28Z

llmc/compression/token_reduction/visualizer.py

+        )
+
+        for idx, blk in enumerate(self.blocks):
+            if idx == 5:


The value 5 is a magic number. It's not clear what this layer index represents. Please replace it with a named constant or a configuration parameter to improve code clarity and maintainability.

Suggested change

if idx == 5:

VISUALIZATION_LAYER_IDX = 5

if idx == VISUALIZATION_LAYER_IDX:

gemini-code-assist bot reviewed Jul 26, 2025

View reviewed changes

llmc-reviewer approved these changes Jul 26, 2025

View reviewed changes

gemini-code-assist bot reviewed Jul 26, 2025

View reviewed changes

update vispruner

35e895a

zhangbilang force-pushed the main branch from f74e2bb to 35e895a Compare July 26, 2025 06:25

llmc-reviewer merged commit c038265 into ModelTC:main Jul 26, 2025
2 checks passed

	pruning_paras['visual_keep_indexs'],
	pruning_paras['visual_keep_indices'],

	if idx == 5:
	VISUALIZATION_LAYER_IDX = 5
	if idx == VISUALIZATION_LAYER_IDX:

update vispruner #425

update vispruner #425

Uh oh!

Conversation

zhangbilang commented Jul 26, 2025

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist bot Jul 26, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants