fix bugs #383
Conversation
Hello @chengtao-lv, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, titled "fix bugs", primarily focuses on refactoring and improving the handling of vision tokens and hook execution within the token reduction modules, specifically for FastV, PyramidDrop, and SparseVLM. The main changes involve moving common configuration logic to the base class and introducing decorators to manage hook behavior during the prefill and decoding stages of model inference.
Highlights
- **Refactor Sparse Config**: Common logic for determining the vision token index and length, and whether the model is a video model, has been moved from the individual token reduction classes (FastV, PyramidDrop) into a new `set_sparse_config` method in the base `TokenReductionModule`. This promotes code reuse and consistency.
- **Introduce Prefill Wrappers**: Two new decorator functions, `prefill_wrapper` and `prefill_wrapper_model`, have been added to `utils.py`. These decorators wrap hook functions and prevent them from executing during the decoding stage (when the input sequence length is 1), which is often necessary for performance or correctness during generation. A sketch of what such decorators can look like follows this list.
- **Apply Prefill Wrappers to Hooks**: Various forward hooks (`input_hook`, `pruning_hook`, `read_parameter_hook`, `decoder_attn_hook`, `register_module_pars`) in the FastV, PyramidDrop, and SparseVLM implementations have been decorated with the new `@prefill_wrapper` or `@prefill_wrapper_model` to ensure they only run during the prefill stage.
- **Dynamic Image Token Index Detection**: In FastV, the `image_token_start_index` is no longer a fixed value in the config but is now dynamically determined by an `input_hook` registered on the embedding layer, which finds the index of the vision token in the input sequence during prefill.
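Since the wrappers are the core of this change, here is a minimal sketch of what such prefill-gating decorators could look like, based only on the summary above. This is not the PR's actual implementation: the hook signatures, the tuple unpacking, and the `inputs_embeds` kwarg handling are assumptions.

```python
from functools import wraps

import torch


def prefill_wrapper(fn):
    """Run a hook only during prefill; skip it once decoding starts and the
    per-step sequence length drops to 1."""
    @wraps(fn)
    def wrapper(module, args, *rest, **kwargs):
        # Assumption: the hook's second positional argument is the module
        # input, either a tensor or a tuple whose first element is one.
        hidden = args[0] if isinstance(args, tuple) else args
        if (isinstance(hidden, torch.Tensor) and hidden.dim() > 1
                and hidden.shape[1] == 1):
            return None  # decoding step: leave inputs/outputs untouched
        return fn(module, args, *rest, **kwargs)
    return wrapper


def prefill_wrapper_model(fn):
    """Variant for model-level hooks, where the sequence length is read
    from an `inputs_embeds` keyword argument instead."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        inputs_embeds = kwargs.get('inputs_embeds')
        if inputs_embeds is not None and inputs_embeds.shape[1] == 1:
            return None
        return fn(*args, **kwargs)
    return wrapper
```

The idea in both cases is the same: during autoregressive decoding each forward pass processes a single new token, so any hook keyed to the full prompt can simply return early.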
Changelog
- `llmc/compression/token_reduction/fastv.py`
  - Imported `prefill_wrapper` from `.utils`.
  - Modified `add_sparse_config` to use `self.special_config` and removed the setting of `image_token_start_index`.
  - Added an `input_hook` decorated with `@prefill_wrapper` to dynamically find `image_token_start_index` (a sketch of this idea follows the changelog).
  - Decorated `fastv_pruning_hook` and `read_parameter_hook` with `@prefill_wrapper`.
  - Registered the new `input_hook` on `self.model.embed_tokens`.
- `llmc/compression/token_reduction/pyramiddrop.py`
  - Imported `prefill_wrapper` from `.utils`.
  - Modified `add_sparse_config` to use `self.special_config` and removed the logic for `vision_token_index` and `vision_token_length` (moved to the base class).
  - Decorated `pruning_hook`, `input_hook`, and `read_parameter_hook` with `@prefill_wrapper`.
  - Removed explicit `shape[1] == 1` checks from `input_hook` and `read_parameter_hook`.
- `llmc/compression/token_reduction/sparsevlm.py`
  - Imported `prefill_wrapper` and `prefill_wrapper_model` from `.utils`.
  - Decorated `input_hook`, `decoder_attn_hook`, and `read_parameter_hook` with `@prefill_wrapper`.
  - Decorated `register_module_pars` with `@prefill_wrapper_model`.
- `llmc/compression/token_reduction/token_reduction_module.py`
  - Added a `set_sparse_config` method to initialize `self.special_config` and determine `vision_token_index` and `vision_token_length`.
  - Called `set_sparse_config` in the `__init__` method.
- `llmc/compression/token_reduction/utils.py`
  - Imported `wraps` from `functools`.
  - Added the `prefill_wrapper` decorator to skip hooks during decoding based on the input tensor shape.
  - Added the `prefill_wrapper_model` decorator to skip hooks during decoding based on the `inputs_embeds` shape in kwargs.
  - Added imports for `torch` and `torch.nn`.
- `llmc/models/llava.py`
  - Removed the hardcoded `image_token_start_index: 5` from `self.pruning_config`.
  - Added `is_video_model: False` to `self.pruning_config`.
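As a rough illustration of the dynamic `image_token_start_index` detection listed under `fastv.py`, an embedding-layer forward pre-hook can scan the prompt for the vision token at prefill time. The sketch below reuses the hypothetical `prefill_wrapper` from earlier; `special_config` and the token id are placeholders, not the PR's actual values.

```python
import torch

# Hypothetical stand-in for the module's shared pruning config; the key
# names follow the changelog, the token id value is a placeholder.
special_config = {'vision_token_index': 32000}


@prefill_wrapper
def input_hook(module, args):
    """Forward pre-hook on the embedding layer: record where the vision
    tokens start in the prompt during prefill."""
    input_ids = args[0]  # shape: (batch, seq_len)
    matches = (input_ids[0] == special_config['vision_token_index']).nonzero()
    if matches.numel() > 0:
        special_config['image_token_start_index'] = matches[0].item()
    return args


# Usage with a toy embedding layer standing in for model.embed_tokens:
embed = torch.nn.Embedding(32064, 16)
embed.register_forward_pre_hook(input_hook)
embed(torch.tensor([[1, 15, 32000, 7, 9]]))       # prefill step
print(special_config['image_token_start_index'])  # -> 2
```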
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A hook attached, a function wrapped,
Decoding stage, the logic trapped.
With wrappers new, the code is clean,
Prefill runs, then silent scene.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
The pull request introduces token reduction techniques (FastV, PyramidDrop, SparseVLM) to the llava model. It includes utility functions and modifications to the model's configuration and hook registration. The changes aim to improve efficiency by reducing the number of tokens processed. Overall, the changes seem well-structured, but there are some areas that could be improved for clarity and maintainability.
Summary of Findings
- **Direct Access to `special_config`**: Directly accessing `self.special_config` can be error-prone and reduces code readability. Consider using getter methods or more descriptive variable names (a hypothetical sketch follows this list).
- **Missing Docstrings**: Several hook functions lack docstrings, which reduces code readability and maintainability. Adding docstrings to these functions would improve the code's clarity.
- **Code Duplication**: The logic for setting `special_config` is duplicated in multiple files. Consider moving this logic to a shared utility function or a base class method to avoid code duplication.
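To make the first finding concrete, here is one hypothetical shape such an accessor could take; the method and property names are invented for illustration and do not come from the PR.

```python
class TokenReductionModule:
    """Hypothetical excerpt illustrating the reviewer's suggestion."""

    def __init__(self, special_config=None):
        self.special_config = special_config or {}

    def get_special_config(self, key, default=None):
        # One point of access: easier to validate keys or log lookups later.
        return self.special_config.get(key, default)

    @property
    def vision_token_length(self):
        # Descriptive name at call sites instead of raw dict indexing.
        return self.get_special_config('vision_token_length')
```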
Merge Readiness
The pull request introduces important token reduction techniques. However, addressing the identified issues related to code clarity, error handling, and code duplication would significantly improve the code's quality and maintainability. I recommend addressing these issues before merging. I am unable to approve this pull request, and other reviewers should review and approve this code before merging.