
Conversation

@chengtao-lv (Collaborator)

No description provided.


@gemini-code-assist (bot) left a comment


Hello @chengtao-lv, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, gemini-code-assist here to provide a summary of this pull request. This PR, titled "fix bugs", primarily focuses on refactoring and improving the handling of vision tokens and hook execution within the token reduction modules, specifically for FastV, PyramidDrop, and SparseVLM. The main changes involve moving common configuration logic to the base class and introducing decorators to manage hook behavior during the prefill and decoding stages of model inference.

Highlights

  • Refactor Sparse Config: Common logic for determining vision token index and length, and whether the model is a video model, has been moved from individual token reduction classes (FastV, PyramidDrop) into a new set_sparse_config method in the base TokenReductionModule. This promotes code reuse and consistency.
  • Introduce Prefill Wrappers: Two new decorator functions, @prefill_wrapper and @prefill_wrapper_model, have been added to utils.py. These decorators wrap hook functions and prevent them from executing during the decoding stage (when the input sequence length is 1), which is often necessary for performance or correctness during generation (a sketch of these wrappers follows this list).
  • Apply Prefill Wrappers to Hooks: Various forward hooks (input_hook, pruning_hook, read_parameter_hook, decoder_attn_hook, register_module_pars) in the FastV, PyramidDrop, and SparseVLM implementations have been decorated with the new @prefill_wrapper or @prefill_wrapper_model to ensure they only run during the prefill stage.
  • Dynamic Image Token Index Detection: In FastV, the image_token_start_index is no longer a fixed value in the config but is now dynamically determined by an input_hook registered on the embedding layer, which finds the index of the vision token in the input sequence during prefill.
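
The decorator bodies are not reproduced in this summary. A minimal sketch of the idea, assuming standard PyTorch hook signatures and a (batch, seq_len, ...) input layout (the actual utils.py implementation may differ), could look like this:

```python
from functools import wraps

import torch


def prefill_wrapper(fn):
    """Skip a hook during decoding, when the current input holds a single token."""
    @wraps(fn)
    def wrapper(module, args, *rest):
        inp = args[0]
        # During generation with a KV cache the sequence length is 1.
        if isinstance(inp, torch.Tensor) and inp.shape[1] == 1:
            return None  # returning None leaves the module's inputs/outputs untouched
        return fn(module, args, *rest)
    return wrapper


def prefill_wrapper_model(fn):
    """Same idea for hooks that receive kwargs carrying `inputs_embeds`."""
    @wraps(fn)
    def wrapper(module, args, kwargs, *rest):
        inputs_embeds = kwargs.get('inputs_embeds')
        if inputs_embeds is not None and inputs_embeds.shape[1] == 1:
            return None
        return fn(module, args, kwargs, *rest)
    return wrapper
```

Any hook decorated this way runs only on the prefill pass and becomes a no-op on every subsequent single-token decode step.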

Changelog

  • llmc/compression/token_reduction/fastv.py
    • Imported prefill_wrapper from .utils.
    • Modified add_sparse_config to use self.special_config and removed setting image_token_start_index.
    • Added input_hook decorated with @prefill_wrapper to dynamically find image_token_start_index (see the sketch after this changelog).
    • Decorated fastv_pruning_hook and read_parameter_hook with @prefill_wrapper.
    • Registered the new input_hook on self.model.embed_tokens.
  • llmc/compression/token_reduction/pyramiddrop.py
    • Imported prefill_wrapper from .utils.
    • Modified add_sparse_config to use self.special_config and removed logic for vision_token_index and vision_token_length (moved to base class).
    • Decorated pruning_hook, input_hook, and read_parameter_hook with @prefill_wrapper.
    • Removed explicit shape[1] == 1 checks from input_hook and read_parameter_hook.
  • llmc/compression/token_reduction/sparsevlm.py
    • Imported prefill_wrapper and prefill_wrapper_model from .utils.
    • Decorated input_hook, decoder_attn_hook, and read_parameter_hook with @prefill_wrapper.
    • Decorated register_module_pars with @prefill_wrapper_model.
  • llmc/compression/token_reduction/token_reduction_module.py
    • Added set_sparse_config method to initialize self.special_config and determine vision_token_index and vision_token_length.
    • Called set_sparse_config in the __init__ method.
  • llmc/compression/token_reduction/utils.py
    • Imported wraps from functools.
    • Added prefill_wrapper decorator to skip hooks during decoding based on input tensor shape.
    • Added prefill_wrapper_model decorator to skip hooks during decoding based on inputs_embeds shape in kwargs.
    • Added imports for torch and torch.nn.
  • llmc/models/llava.py
    • Removed hardcoded image_token_start_index: 5 from self.pruning_config.
    • Added is_video_model: False to self.pruning_config.
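
As a rough illustration of the fastv.py changes above (the token id, the storage dict, and the helper names are assumptions for this sketch, not the exact llmc code), the dynamically registered hook might look like:

```python
import torch

from .utils import prefill_wrapper  # decorator added to utils.py in this PR

# Stand-in for the state FastV keeps in self.special_config.
special_config = {'vision_token_index': 32000}  # assumed image-token id; model-dependent


@prefill_wrapper
def input_hook(module, args):
    """Forward pre-hook on the embedding layer; runs only during prefill."""
    input_ids = args[0]  # (batch, seq_len)
    positions = (input_ids[0] == special_config['vision_token_index']).nonzero(as_tuple=True)[0]
    if positions.numel() > 0:
        # Replaces the image_token_start_index that was previously hardcoded in llava.py.
        special_config['image_token_start_index'] = positions[0].item()


# Registration on the embedding layer, roughly as described in the changelog:
# model.embed_tokens.register_forward_pre_hook(input_hook)
```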
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature | Command | Description
Code Review | /gemini review | Performs a code review for the current pull request in its current state.
Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state.
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help | /gemini help | Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.


A hook attached, a function wrapped,
Decoding stage, the logic trapped.
With wrappers new, the code is clean,
Prefill runs, then silent scene.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


@gemini-code-assist (bot) left a comment


Code Review

The pull request introduces token reduction techniques (FastV, PyramidDrop, SparseVLM) to the llava model. It includes utility functions and modifications to the model's configuration and hook registration. The changes aim to improve efficiency by reducing the number of tokens processed. Overall, the changes seem well-structured, but there are some areas that could be improved for clarity and maintainability.

Summary of Findings

  • Direct Access to special_config: Directly accessing self.special_config can be error-prone and reduces code readability. Consider using getter methods or more descriptive variable names (see the sketch after this list).
  • Missing Docstrings: Several hook functions lack docstrings, which reduces code readability and maintainability. Adding docstrings to these functions would improve the code's clarity.
  • Code Duplication: The logic for setting special_config is duplicated in multiple files. Consider moving this logic to a shared utility function or a base class method to avoid code duplication.
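
As an illustration of the first and third findings (this is not code from the PR, just one way the suggestion could be realized), the shared configuration could be exposed through a small accessor on the base class:

```python
class TokenReductionModule:
    """Base-class sketch; only the config-access pattern suggested above is shown."""

    def __init__(self, special_config=None):
        self.special_config = dict(special_config or {})

    def get_sparse_option(self, key, default=None):
        """Named accessor instead of repeating self.special_config[...] lookups in subclasses."""
        return self.special_config.get(key, default)


# In FastV / PyramidDrop / SparseVLM subclasses:
# vision_len = self.get_sparse_option('vision_token_length', 0)
```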

Merge Readiness

The pull request introduces important token reduction techniques. However, addressing the identified issues related to code clarity, error handling, and code duplication would significantly improve the code's quality and maintainability. I recommend addressing these issues before merging. I am unable to approve this pull request, and other reviewers should review and approve this code before merging.

@helloyongyang helloyongyang merged commit 1c37518 into main May 26, 2025
2 checks passed
@helloyongyang helloyongyang deleted the vlm branch May 26, 2025 05:29