support llava-lht(test:quantize-awq, MME-pretrain) #379
Conversation
Hello @SmudgedWings, I'm Gemini Code Assist[^1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello, Gemini here! I've reviewed this pull request from @SmudgedWings. The primary goal of this PR, based on the title, is to introduce support for the Llava-LHT model within the framework, with a focus on enabling testing for quantization methods like AWQ and potentially MME-pretrain evaluations. The changes involve adding a new model implementation specifically tailored for Llava-LHT and integrating it into the existing model handling and evaluation infrastructure. Additionally, some runtime dependencies have been updated.
Highlights
- Add Llava-LHT Model Support: Introduces a new model class `LlavaLHT` and its evaluation counterpart `LlavaLHTEval` to integrate the Llava-LHT vision-language model into the framework.
- Handle Llava-LHT Specifics: Implements custom methods in `LlavaLHT` to correctly load the model, identify its vision and language components, and adapt its forward and generate methods to work around specific behaviors (e.g., handling `cache_position`, `input_ids`/`inputs`, and `attention_mask`). This ensures compatibility with the framework's processing and evaluation pipelines. A hypothetical skeleton of such a class follows this list.
- Update Dependencies: Increments the minimum required PyTorch version to 2.2.0 and adjusts the listing of the `lmms-eval` dependency in the runtime requirements file.
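To make the highlights concrete, here is a minimal, hypothetical skeleton of what such a model class could look like. Everything below is an assumption for illustration: the base-class relationship, the `vlm_model`/`model` attribute layout, and the use of `AutoModelForCausalLM` stand in for whatever the PR actually does in `llmc/models/llava_lht.py`.

```python
# Hypothetical sketch only; the attribute paths (vlm_model.model.layers,
# etc.) and loader calls are assumptions, not the PR's code.
from loguru import logger
from transformers import AutoModelForCausalLM, AutoTokenizer


class LlavaLHT:  # in the PR this inherits from llmc's Llama class
    def __init__(self, model_path, torch_dtype="auto"):
        self.model_path = model_path
        self.torch_dtype = torch_dtype
        self.build_model()

    def build_model(self):
        # Load the full vision-language model; the tokenizer is created
        # here as well, which is why build_tokenizer below stays empty.
        self.tokenizer = AutoTokenizer.from_pretrained(self.model_path)
        self.vlm_model = AutoModelForCausalLM.from_pretrained(
            self.model_path, torch_dtype=self.torch_dtype
        )
        # Quantization passes only touch the language tower.
        self.model = self.vlm_model.model
        logger.info(self.vlm_model)

    def build_tokenizer(self):
        pass  # already handled in build_model

    def find_blocks(self):
        # The transformer blocks the per-block quantization loop visits.
        self.blocks = self.model.layers

    def get_layernorms_in_block(self, block):
        # Norms that schemes like AWQ pair with the block's linear layers.
        return {
            "input_layernorm": block.input_layernorm,
            "post_attention_layernorm": block.post_attention_layernorm,
        }
```

The `get_subsets_in_block` and `get_extra_rot_module_besides_embed_layers` hooks mentioned in the changelog below would follow the same pattern.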
Changelog
- `llmc/models/__init__.py`
  - Added an import for the new `LlavaLHT` model class (line 13).
- `llmc/models/llava_lht.py`
  - Added a new file containing the `LlavaLHT` and `LlavaLHTEval` classes.
  - Implemented `LlavaLHT`, inheriting from `Llama`, with custom methods (`build_model`, `get_extra_rot_module_besides_embed_layers`, `find_blocks`, `get_layernorms_in_block`, `get_subsets_in_block`) to integrate the Llava-LHT architecture.
  - Added workarounds for `vlm_model.forward`, `vlm_model.generate`, and `vlm_model.prepare_inputs_for_generation` to handle specific model behaviors (a sketch of this pattern follows the changelog).
  - Implemented `LlavaLHTEval`, inheriting from `lmms_eval.models.llava.Llava`, to handle evaluation specifics for Llava-LHT.
- `requirements/runtime.txt`
  - Updated the minimum required PyTorch version from 2.1.0 to 2.2.0 (line 1).
  - Added `lmms-eval` to the main dependency list (line 7).
  - Removed a duplicate or misplaced entry for `lmms-eval` (line 34).
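The workaround bullets above describe monkey patches applied to the loaded model. Here is a hedged sketch of that pattern, under the assumption (taken from the review summary, not the diff) that llava-lht's `forward` rejects `cache_position` and its `generate` expects `inputs` rather than `input_ids`; the function name `patch_vlm_model` is invented for illustration.

```python
# Hypothetical monkey-patch sketch; signature quirks are assumptions
# based on the review summary, not llava-lht's actual API.
import functools

import torch


def patch_vlm_model(vlm_model):
    original_forward = vlm_model.forward
    original_generate = vlm_model.generate

    @functools.wraps(original_forward)
    def patched_forward(*args, **kwargs):
        # Strip an argument the model's forward does not accept before
        # delegating to the original implementation.
        kwargs.pop("cache_position", None)
        return original_forward(*args, **kwargs)

    @functools.wraps(original_generate)
    def patched_generate(*args, **kwargs):
        # Translate the framework's `input_ids` into the model's expected
        # `inputs`, and synthesize a default attention mask if missing.
        if "input_ids" in kwargs and "inputs" not in kwargs:
            kwargs["inputs"] = kwargs.pop("input_ids")
        if kwargs.get("attention_mask") is None and "inputs" in kwargs:
            kwargs["attention_mask"] = torch.ones_like(kwargs["inputs"])
        return original_generate(*args, **kwargs)

    vlm_model.forward = patched_forward
    vlm_model.generate = patched_generate
    return vlm_model
```

As the review below notes, reassigning methods like this is fragile; a wrapper-based alternative is sketched in the findings section.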
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e., a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP 8 for Python) by adding files to a .gemini/ folder at the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its suggestions are incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Image meets the text,
Llava sees and understands,
New model arrives.
Footnotes

[^1]: Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces support for the llava-lht model, including its evaluation counterpart. The changes involve adding a new model class `LlavaLHT` and an evaluation class `LlavaLHTEval`, along with updating the model registry and runtime requirements. The implementation includes custom handling for the llava-lht model's specific behaviors, such as parameter naming and generation arguments. Overall, the code adds the necessary components for integrating this new model.
Summary of Findings
- Monkey Patching: The implementation uses monkey patching to modify core model methods (`forward`, `generate`, `prepare_inputs_for_generation`). This approach can be fragile and less maintainable than using hooks or wrapper classes; a wrapper-based alternative is sketched after this list.
- Device Mapping Logic: The device assignment logic in `LlavaLHTEval.__init__` appears complex and could be simplified for clarity and robustness.
- Disallowing `**kwargs`: The `LlavaLHTEval` constructor strictly disallows any extra keyword arguments, which might limit future extensibility.
- DeepSpeed Configuration Requirement: A specific requirement for DeepSpeed (ZeRO stage 0) is mentioned in a comment, but the reason is not explained.
- Missing Docstrings: The newly added classes and methods lack docstrings, which reduces code maintainability and understanding.
- Redundant Logging: There are two identical `logger.info` calls logging `self.vlm_model` in `LlavaLHT.build_model`. (Low severity; no specific comment was added per review settings.)
- Empty `build_tokenizer`: The `LlavaLHT.build_tokenizer` method is empty (`pass`). While the tokenizer is built in `build_model`, an empty method may be confusing without a comment explaining why. (Low severity; no specific comment was added per review settings.)
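As the first finding suggests, a thin wrapper class could replace the monkey patches. Below is a minimal sketch of that alternative; the adapter class name and the call-convention details (`cache_position`, `inputs` vs. `input_ids`) are assumptions carried over from the review summary, not code from the PR.

```python
# Hypothetical wrapper-based alternative to monkey patching: delegate to
# the loaded model from a small adapter instead of reassigning its methods.
import torch
from torch import nn


class LlavaLHTAdapter(nn.Module):
    """Adapts llava-lht's assumed call conventions to the framework's."""

    def __init__(self, vlm_model: nn.Module):
        super().__init__()
        self.vlm_model = vlm_model

    def forward(self, *args, **kwargs):
        kwargs.pop("cache_position", None)  # assumed unsupported argument
        return self.vlm_model(*args, **kwargs)

    @torch.no_grad()
    def generate(self, input_ids, attention_mask=None, **kwargs):
        if attention_mask is None:
            attention_mask = torch.ones_like(input_ids)
        # Assumed rename: llava-lht's generate takes `inputs`.
        return self.vlm_model.generate(
            inputs=input_ids, attention_mask=attention_mask, **kwargs
        )
```

Because the adapter delegates rather than reassigns, the original model's methods stay intact and cannot be silently shadowed by later upstream changes, which addresses the fragility concern.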
Merge Readiness
The pull request introduces support for a new model and its evaluation. The core logic seems to handle the specific requirements of the llava-lht model. However, there are several medium severity issues related to maintainability (monkey patching, complex device logic, missing docstrings) and potential extensibility (**kwargs assertion). I recommend addressing these issues before merging to ensure the code is more robust, easier to understand, and maintainable in the long term. I am unable to approve this pull request; please have other reviewers review and approve this code before merging.