-
Notifications
You must be signed in to change notification settings - Fork 66
update vispruner #425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
update vispruner #425
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @SmudgedWings, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a new visual token reduction method named VisPruner designed for vision-language models, aiming to improve efficiency by intelligently pruning redundant visual tokens. Alongside VisPruner, a comprehensive Visualizer tool is added to provide insights into the token reduction process through various visualizations. The changes also include necessary model adaptations and enhancements to core visualization utilities to support these new functionalities.
Highlights
- New Visual Token Pruning Method: I've introduced
VisPruner, a novel token reduction strategy designed for vision-language models. This method intelligently prunes redundant visual tokens by selecting important ones based on attention scores and diverse ones through an iterative pruning process based on feature similarity. This aims to improve model efficiency without significant performance degradation. - Integrated Visualization Tool: To aid in debugging and understanding the token reduction process, I've added a
Visualizerclass. This tool can capture and display various aspects of the model's internal state, including attention maps (per head and grid-to-grid) and the specific image patches that are retained after pruning. - Configuration and Integration: A new YAML configuration file has been added for
VisPruner, allowing for easy setup and parameter tuning (e.g.,prune_ratio,important_ratio). BothVisPrunerandVisualizerhave been integrated into thetoken_reductionmodule, making them discoverable and usable within the framework. - Model Compatibility Updates: I've made necessary adjustments to the
Llavamodel's configuration loading, simplifying the process by removing explicitLlavaConfigusage. Additionally, the way pruning parameters (select_layer,select_feature) are accessed has been updated to reference thevision_modeldirectly. Minor cleanups and flag additions were also made to theQwen2.5VLmodel. - Enhanced Visualization Utilities: The core
llmc/utils/visualizer.pymodule has been significantly enhanced. I've added a new helper function (to_pil_image) for consistent image conversion and modified existing plotting functions (grid_show,visualize_heads,visualize_grid_to_grid) to save generated visualizations to disk instead of displaying them directly. This improves automation and reproducibility of visualization outputs. - Minor Fixes and Improvements: I've addressed a typo (
contexual_input_idxtocontextual_input_idx) and introduced a check for non-contiguous visual tokens within thevisionzip.pymodule, improving the robustness of token handling.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments or fill out our survey to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the VisPruner method and a Visualizer utility. The changes are a good addition but include several critical issues like leftover debugging code (pdb, hardcoded arguments), bugs that will cause runtime errors, and some maintainability concerns like magic numbers. These issues need to be addressed before this can be merged.
| raise ValueError('Visual tokens are not contiguous in input_ids!') | ||
| segment_starts = [st_idx[0].item()] + [st_idx[i + 1].item() for i in discontinuities.tolist()] # noqa | ||
| segment_ends = [st_idx[i].item() for i in discontinuities.tolist()] + [st_idx[-1].item()] # noqa | ||
| offset = 0 | ||
| for first, last in zip(segment_starts, segment_ends): | ||
| length = last - first + 1 | ||
| # [15 1502] [1505 3289] | ||
| img_mask[first: last + 1] = ~select_mask[offset: offset + length] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The raise ValueError statement on line 544 makes the subsequent lines (545-551) unreachable. This dead code should be removed to improve clarity.
If the intention is to support discontinuous visual tokens, the raise statement should be removed, and the logic to handle multiple segments needs to be fully implemented, as the current implementation seems incomplete (e.g., the calculation of contextual_input_idx and slicing of inputs_embeds assume a single contiguous block).
raise ValueError('Visual tokens are not contiguous in input_ids!')| image_attentions = image_attentions[:, :, 0, 1:] | ||
| elif pruning_paras['select_feature'] == 'cls_patch': | ||
| image_attentions = image_attentions | ||
| raise ValueError(f'Unexpected select feature: {self.select_feature}') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The attribute self.select_feature is used here but it's not defined on the VisPruner instance. This will cause an AttributeError at runtime.
You probably meant to use pruning_paras['select_feature'], which is available in the hook's scope.
raise ValueError(f"Unexpected select feature: {pruning_paras['select_feature']}")| attention_maps = pruning_paras['attentions'][0] | ||
| visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576] | ||
| image = pruning_paras['images'][0] | ||
| import pdb;pdb.set_trace() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| visualize_kept_patches( | ||
| pruning_paras['images'][0], | ||
| pruning_paras['visual_keep_indexs'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| import argparse | ||
| cli_args = argparse.Namespace( | ||
| process_with_media=True, | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576] | ||
| image = pruning_paras['images'][0] | ||
| import pdb;pdb.set_trace() | ||
| print(__file__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| save_path='' | ||
| ) | ||
| visualize_grid_to_grid( | ||
| visual_attention_maps[0,4,:,:], | ||
| 300, | ||
| image, | ||
| grid_size=24, | ||
| save_path='') | ||
|
|
||
| visualize_kept_patches( | ||
| pruning_paras['images'][0], | ||
| pruning_paras['visual_keep_indexs'], | ||
| patch_size=14, | ||
| save_path='', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
|
||
| B, N, C = image_features.shape | ||
| device = image_features.device | ||
| index_masks = torch.ones(B, N, dtype=torch.bool, device=device) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| @prefill_wrapper | ||
| def visualizer_hook(module, inps, layer_outs, pruning_paras): | ||
| attention_maps = pruning_paras['attentions'][0] | ||
| visual_attention_maps = attention_maps[:,:,35:35+576,35:35+576] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| ) | ||
|
|
||
| for idx, blk in enumerate(self.blocks): | ||
| if idx == 5: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.