Skip to content

Conversation

@zhangbilang
Copy link
Collaborator

No description provided.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @SmudgedWings, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new visual token reduction method named VisPruner designed for vision-language models, aiming to improve efficiency by intelligently pruning redundant visual tokens. Alongside VisPruner, a comprehensive Visualizer tool is added to provide insights into the token reduction process through various visualizations. The changes also include necessary model adaptations and enhancements to core visualization utilities to support these new functionalities.

Highlights

  • New Visual Token Pruning Method: I've introduced VisPruner, a novel token reduction strategy designed for vision-language models. This method intelligently prunes redundant visual tokens by selecting important ones based on attention scores and diverse ones through an iterative pruning process based on feature similarity. This aims to improve model efficiency without significant performance degradation.
  • Integrated Visualization Tool: To aid in debugging and understanding the token reduction process, I've added a Visualizer class. This tool can capture and display various aspects of the model's internal state, including attention maps (per head and grid-to-grid) and the specific image patches that are retained after pruning.
  • Configuration and Integration: A new YAML configuration file has been added for VisPruner, allowing for easy setup and parameter tuning (e.g., prune_ratio, important_ratio). Both VisPruner and Visualizer have been integrated into the token_reduction module, making them discoverable and usable within the framework.
  • Model Compatibility Updates: I've made necessary adjustments to the Llava model's configuration loading, simplifying the process by removing explicit LlavaConfig usage. Additionally, the way pruning parameters (select_layer, select_feature) are accessed has been updated to reference the vision_model directly. Minor cleanups and flag additions were also made to the Qwen2.5VL model.
  • Enhanced Visualization Utilities: The core llmc/utils/visualizer.py module has been significantly enhanced. I've added a new helper function (to_pil_image) for consistent image conversion and modified existing plotting functions (grid_show, visualize_heads, visualize_grid_to_grid) to save generated visualizations to disk instead of displaying them directly. This improves automation and reproducibility of visualization outputs.
  • Minor Fixes and Improvements: I've addressed a typo (contexual_input_idx to contextual_input_idx) and introduced a check for non-contiguous visual tokens within the visionzip.py module, improving the robustness of token handling.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the VisPruner method and a Visualizer utility. The changes are a good addition but include several critical issues like leftover debugging code (pdb, hardcoded arguments), bugs that will cause runtime errors, and some maintainability concerns like magic numbers. These issues need to be addressed before this can be merged.

Comment on lines +544 to +551
raise ValueError('Visual tokens are not contiguous in input_ids!')
segment_starts = [st_idx[0].item()] + [st_idx[i + 1].item() for i in discontinuities.tolist()] # noqa
segment_ends = [st_idx[i].item() for i in discontinuities.tolist()] + [st_idx[-1].item()] # noqa
offset = 0
for first, last in zip(segment_starts, segment_ends):
length = last - first + 1
# [15 1502] [1505 3289]
img_mask[first: last + 1] = ~select_mask[offset: offset + length]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The raise ValueError statement on line 544 makes the subsequent lines (545-551) unreachable. This dead code should be removed to improve clarity.

If the intention is to support discontinuous visual tokens, the raise statement should be removed, and the logic to handle multiple segments needs to be fully implemented, as the current implementation seems incomplete (e.g., the calculation of contextual_input_idx and slicing of inputs_embeds assume a single contiguous block).

raise ValueError('Visual tokens are not contiguous in input_ids!')

image_attentions = image_attentions[:, :, 0, 1:]
elif pruning_paras['select_feature'] == 'cls_patch':
image_attentions = image_attentions
raise ValueError(f'Unexpected select feature: {self.select_feature}')

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The attribute self.select_feature is used here but it's not defined on the VisPruner instance. This will cause an AttributeError at runtime.

You probably meant to use pruning_paras['select_feature'], which is available in the hook's scope.

raise ValueError(f"Unexpected select feature: {pruning_paras['select_feature']}")

attention_maps = pruning_paras['attentions'][0]
visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]
image = pruning_paras['images'][0]
import pdb;pdb.set_trace()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

A pdb.set_trace() call is present. This is a debugger breakpoint and must be removed from the code before merging.


visualize_kept_patches(
pruning_paras['images'][0],
pruning_paras['visual_keep_indexs'],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

There is a typo in the dictionary key: visual_keep_indexs. This will raise a KeyError at runtime. It should be visual_keep_indices.

Suggested change
pruning_paras['visual_keep_indexs'],
pruning_paras['visual_keep_indices'],

Comment on lines +92 to +95
import argparse
cli_args = argparse.Namespace(
process_with_media=True,
)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

This block hardcodes cli_args, which overrides any arguments passed to the function. This appears to be a debugging remnant and should be removed.

Additionally, import argparse should be moved to the top of the file per PEP 8 guidelines.

visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]
image = pruning_paras['images'][0]
import pdb;pdb.set_trace()
print(__file__)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

This print() statement seems to be for debugging purposes and should be removed.

Comment on lines 47 to 60
save_path=''
)
visualize_grid_to_grid(
visual_attention_maps[0,4,:,:],
300,
image,
grid_size=24,
save_path='')

visualize_kept_patches(
pruning_paras['images'][0],
pruning_paras['visual_keep_indexs'],
patch_size=14,
save_path='',

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The save_path argument is hardcoded to an empty string ('') for visualize_heads, visualize_grid_to_grid, and visualize_kept_patches. This will prevent the visualizations from being saved. These paths should be configurable, likely passed in through the pruning_paras.


B, N, C = image_features.shape
device = image_features.device
index_masks = torch.ones(B, N, dtype=torch.bool, device=device)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The index_masks variable is initialized here but it is never used before being redefined on line 79. This line is dead code and can be removed.

@prefill_wrapper
def visualizer_hook(module, inps, layer_outs, pruning_paras):
attention_maps = pruning_paras['attentions'][0]
visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The values 35 and 576 are magic numbers. They make the code harder to understand and maintain. Please define them as named constants or fetch them from the configuration, similar to how it's done in other modules.

)

for idx, blk in enumerate(self.blocks):
if idx == 5:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The value 5 is a magic number. It's not clear what this layer index represents. Please replace it with a named constant or a configuration parameter to improve code clarity and maintainability.

Suggested change
if idx == 5:
VISUALIZATION_LAYER_IDX = 5
if idx == VISUALIZATION_LAYER_IDX:

@llmc-reviewer llmc-reviewer merged commit c038265 into ModelTC:main Jul 26, 2025
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants