-
Notifications
You must be signed in to change notification settings - Fork 66
dycoke #388
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dycoke #388
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @chengtao-lv, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello! Gemini here, providing a summary of this pull request. This PR introduces a new token reduction method called 'DyCoke' specifically designed for vision models, integrated into the Llava OneVision architecture. The core idea appears to be reducing the number of visual tokens by merging similar ones across frames based on cosine similarity. This involves adding the DyCoke implementation, registering it within the token reduction framework, and updating the Llava OneVision model to correctly identify its vision blocks for applying this method. A new configuration file is also added to define the parameters for the DyCoke method.
Highlights
- New Token Reduction Method: Adds the implementation for a new token reduction method named 'DyCoke', aimed at reducing visual tokens in models like Llava OneVision.
- Vision Model Integration: Modifies the Llava OneVision model to correctly identify its vision encoder layers, enabling the application of token reduction methods to the visual modality.
- Configuration: Includes a new YAML configuration file (
dycoke.yml) to specify parameters for the DyCoke token reduction method. - Token Merging Logic: Implements a function (
dycole_ttm) that calculates cosine similarity between tokens in adjacent frames and merges tokens based on a defined merging ratio.
Changelog
Click here to see the changelog
- configs/sparsification/methods/DyCoke/dycoke.yml
- Added a new configuration file for the DyCoke token reduction method.
- Defines parameters such as
dycoke_layer_idx,num_tokens_per_frame,merging_ratio, anddycoke_radio.
- llmc/compression/token_reduction/init.py
- Added an import statement to include the new
DyCokemodule, making it available in thetoken_reductionpackage.
- Added an import statement to include the new
- llmc/compression/token_reduction/dycoke.py
- Added the core implementation of the
DyCoketoken reduction method. - Includes the
dycole_ttmfunction for token merging based on similarity. - Registers
DyCokewith theTOKEN_REDUCTION_REGISTRY. - Adds a hook (
add_dycole_ttm_to_get_2dPool) to integrate the merging logic into the model's processing pipeline, specifically forLlavaMetaForCausalLM.
- Added the core implementation of the
- llmc/models/llava_onevision.py
- Added the
find_blocksmethod to theLLaVA_OVclass. - This method now correctly identifies the vision encoder layers (
self.vision_model.vision_tower.vision_model.encoder.layers) when the modality is 'vision', in addition to handling the 'language' modality.
- Added the
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request introduces the DyCoke token reduction method. The changes include a new configuration file, the core implementation of the DyCoke algorithm, and necessary integrations into the existing model and token reduction framework.
Overall, the PR lays a good foundation for the DyCoke method. However, there are several points that need attention, including a potential bug in handling an odd number of frames, typos in configuration and code, and areas where clarity could be improved. Addressing these will enhance the robustness and maintainability of the new feature.
Summary of Findings
- Potential Bug: Data Loss with Odd Number of Frames: In
dycoke.py, thedycole_ttmfunction's first loop does not correctly handle cases wherenum_framesis odd and greater than 1, potentially leading to the last frame being lost. This is a high-severity issue. - Configuration Clarity and Typos: The
dycoke.ymlconfiguration file contains placeholder paths that need user attention. There's also a parameterdycoke_radiowhich appears to be a typo fordycoke_ratioormerging_ratioand is currently unused by the Python code. - Naming Inconsistencies: The core function
dycole_ttmand its registration helperadd_dycole_ttm_to_get_2dPoolindycoke.pyseem to have a typo ('dycole' instead of 'dycoke'). Consistent naming aligning with 'DyCoke' is recommended. - Code Clarity: The
dycole_ttmfunction indycoke.pyis complex. A docstring explaining its logic, inputs, and outputs would greatly improve its understandability. Additionally, a comment regarding the second processing loop could be more descriptive. - Unused Imports: The file
llmc/compression/token_reduction/dycoke.pyimportsfunctools,List, Optional, Tuple, Unionfromtyping,Cache, DynamicCachefromtransformers.cache_utils, andprefill_wrapperfrom.utils, but these imports are not used in the new code. (Severity: low, not commented inline due to review settings) - Redundant Tensor Conversion: In
llmc/compression/token_reduction/dycoke.py(lines 40-42 and 76-78),torch.tensor(similarity)is used wheresimilarityis already a tensor. This is redundant.torch.stack(similarities)directly should suffice. (Severity: low, not commented inline due to review settings)
Merge Readiness
This pull request introduces promising functionality with the DyCoke method. However, due to the identified high-severity issue regarding data loss with an odd number of frames, and several medium-severity issues related to configuration, naming, and clarity, I recommend that these changes be addressed before merging. Once these points are resolved, the PR will be in a much stronger state. As a reviewer, I am not authorized to approve pull requests, so please ensure further review and approval from authorized maintainers after addressing the feedback.
| for i in range(0, num_frames - 1, 2): | ||
| # Get tokens for adjacent frames | ||
| frame1_tokens = image_feature[ | ||
| i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame | ||
| ] | ||
| frame2_tokens = image_feature[ | ||
| (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame | ||
| ] | ||
|
|
||
| # Calculate cosine similarity between normalized tokens | ||
| frame1_norm = torch.nn.functional.normalize(frame1_tokens, p=2, dim=1) | ||
| frame2_norm = torch.nn.functional.normalize(frame2_tokens, p=2, dim=1) | ||
| similarity = torch.nn.functional.cosine_similarity( | ||
| frame1_norm, frame2_norm, dim=1 | ||
| ) | ||
| similarities.append(similarity) | ||
|
|
||
| similarities = torch.stack( | ||
| [torch.tensor(similarity) for similarity in similarities] | ||
| ) | ||
|
|
||
| # Process even frames | ||
| modified_image_feature = [] | ||
| for i in range(0, num_frames - 1, 2): | ||
| frame1_tokens = image_feature[ | ||
| i * num_tokens_per_frame: (i + 1) * num_tokens_per_frame | ||
| ] | ||
| frame2_tokens = image_feature[ | ||
| (i + 1) * num_tokens_per_frame: (i + 2) * num_tokens_per_frame | ||
| ] | ||
|
|
||
| avg_similarity = similarities[i // 2] | ||
| num_tokens_to_keep = int(merging_ratio * num_tokens_per_frame) | ||
| tokens_to_keep = avg_similarity.topk(num_tokens_to_keep, largest=False).indices | ||
|
|
||
| modified_image_feature.append(frame1_tokens) | ||
| modified_image_feature.append(frame2_tokens[tokens_to_keep]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The first loop for i in range(0, num_frames - 1, 2) processes pairs of frames. If num_frames is odd and greater than 1 (e.g., 3, 5), the last frame will be omitted from modified_image_feature. For example, if num_frames = 3, the loop runs for i=0, processing frame0 and frame1. frame2 is never added to modified_image_feature, leading to data loss.
Could you review the logic for handling an odd number of frames? One common approach is to append the last frame unprocessed if num_frames is odd.
A docstring explaining the overall algorithm, inputs, and outputs of this function would also be very beneficial for future understanding and maintenance.
| seed: &seed 42 | ||
| model: | ||
| type: Llava OneVision | ||
| path: model path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| type: vqa | ||
| name: [mme] | ||
| download: False | ||
| path: MME dataset path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| dycoke_layer_idx: 3 | ||
| num_tokens_per_frame: 196 | ||
| merging_ratio: 0.7 | ||
| dycoke_radio: 0.7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a parameter dycoke_radio: 0.7. Is 'radio' a typo for 'ratio'?
Additionally, the Python code (llmc/compression/token_reduction/dycoke.py) uses pruning_paras['merging_ratio'] (defined on line 21) but does not seem to use dycoke_radio.
Could you clarify:
- If
dycoke_radioshould bedycoke_ratioor perhapsmerging_ratio? - If it's a distinct parameter, how is it intended to be used?
dycoke_ratio: 0.7 # Or perhaps remove if redundant with merging_ratio
| save: | ||
| save_trans: False | ||
| save_fake: False | ||
| save_path: /path/to/save/ No newline at end of file |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| from .utils import prefill_wrapper | ||
|
|
||
|
|
||
| def dycole_ttm(image_feature, pruning_paras): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There seems to be a typo in the function name. Should dycole_ttm be dycoke_ttm to align with the module name 'DyCoke' and the class name? This typo appears in its definition and usages (lines 100, 126). Consistent naming would improve readability and maintainability.
| def dycole_ttm(image_feature, pruning_paras): | |
| def dycoke_ttm(image_feature, pruning_paras): |
| modified_image_feature.append(frame1_tokens) | ||
| modified_image_feature.append(frame2_tokens[tokens_to_keep]) | ||
|
|
||
| # Process odd frames |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment # Process odd frames for the loop starting on line 63 might be misleading. This loop iterates with i taking values 0, 4, 8, ... and processes frame_i and frame_{i+2}. These are originally even-indexed frames.
Perhaps a more descriptive comment like # Further prune even-indexed frames based on similarity to preceding even frames or similar would better reflect the logic?
| return combined_tokens | ||
|
|
||
|
|
||
| def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Typo in function name: add_dycole_ttm_to_get_2dPool. Should this be add_dycoke_ttm_to_get_2dPool for consistency with the intended 'DyCoke' naming?
| def add_dycole_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras): | |
| def add_dycoke_ttm_to_get_2dPool(model, post_hook_fn, pruning_paras): |
| add_dycole_ttm_to_get_2dPool( | ||
| self.model.model, dycole_ttm, self.model.model.pruning_paras |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No description provided.