[add] add skip image cache and disable_prompt_cache para #1061
hiworldwzj merged 9 commits into main from
Conversation
Summary of Changes
Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances control over caching mechanisms within the system by introducing new parameters for both text prompt and multimodal image processing. It adds disable_prompt_cache to SamplingParams and skip_image_cache to MultimodalParams, providing users with the flexibility to explicitly bypass caching for specific requests. These changes are integrated into the core inference and visual server components, alongside a minor refinement to shared memory size calculation for multimodal inputs.
Highlights
- **New parameter `disable_prompt_cache`:** Introduced a new boolean parameter, `disable_prompt_cache`, within `SamplingParams` to allow explicit control over whether prompt caching should be bypassed for a given request. This parameter is integrated into the inference batch-processing logic.
- **New parameter `skip_image_cache`:** Added a new boolean parameter, `skip_image_cache`, to `MultimodalParams`. This enables requests to bypass the image embedding cache, ensuring that image embeddings are re-processed rather than retrieved from cache.
- **Prompt cache management:** Modified the inference batch processing to respect the `disable_prompt_cache` parameter, preventing prompt-cache matching and memory freeing for requests where caching is explicitly disabled.
- **Image cache management:** Updated the visual server manager to check the `skip_image_cache` parameter for multimodal requests, allowing it to bypass the image embedding cache lookup when set to true.
- **Shared memory size calculation:** Adjusted the shared-memory size estimation utility by adding `image_patch_max_num` to fake image items, which helps to more accurately calculate the required shared memory for multimodal image tokens.
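The two flags described above are per-request switches. As a minimal sketch (the surrounding request schema is assumed here, not taken from this PR; only the two field names come from the diff), a caller that wants fully uncached processing might set both:

```python
# Hypothetical request payload illustrating the two new flags.
# Only "disable_prompt_cache" and "skip_image_cache" are from the PR;
# the rest of the shape is an illustrative assumption.
request = {
    "inputs": "Describe this image.",
    "sampling_params": {
        "max_new_tokens": 64,
        "disable_prompt_cache": True,  # bypass prompt-cache matching for this request
    },
    "multimodal_params": {
        "images": [{"type": "url", "data": "http://example.com/cat.png"}],
        "skip_image_cache": True,  # recompute image embeddings instead of cache lookup
    },
}

print(request["sampling_params"]["disable_prompt_cache"])
```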
Code Review
This pull request introduces two new parameters, skip_image_cache and disable_prompt_cache, to provide more control over caching mechanisms. The implementation correctly integrates these parameters into the multimodal parameter handling and the inference batch processing logic. The changes are logical and well-contained. I've provided a few minor suggestions to improve code quality, including correcting a comment, removing dead code, and refactoring a magic number into a constant for better maintainability.
```python
        ctypes.c_bool,
    ),  # whether to add spaces between special tokens when decoding
    ("print_eos_token", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
    ("disable_prompt_cache", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
```
There was a problem hiding this comment.
The comment for disable_prompt_cache appears to be a copy-paste from the line above and is incorrect. It should be updated to accurately describe the purpose of this parameter.
```diff
-    ("disable_prompt_cache", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
+    ("disable_prompt_cache", ctypes.c_bool),  # whether to disable prompt cache
```
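For context, the fields above live in a `ctypes.Structure` field list, where each `(name, type)` tuple carries its own explanatory comment. A self-contained sketch of that pattern (the class name here is a stand-in, and only the two flag fields are shown):

```python
import ctypes

# Minimal sketch of a ctypes Structure holding per-request boolean flags,
# mirroring the fields in the diff above (other fields omitted;
# "SamplingParamsStruct" is an illustrative name).
class SamplingParamsStruct(ctypes.Structure):
    _fields_ = [
        ("print_eos_token", ctypes.c_bool),       # eos_id is ignored unless this is True
        ("disable_prompt_cache", ctypes.c_bool),  # whether to disable prompt cache
    ]

params = SamplingParamsStruct()   # c_bool fields default to False
params.disable_prompt_cache = True
print(params.disable_prompt_cache)  # -> True
```

Because the struct is shared across processes via ctypes, a plain boolean field like this is enough to carry the flag from the request layer into the inference batch logic.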
```python
    )
    fake_image_item.image_w = fake_image_item._data[0]
    fake_image_item.image_h = fake_image_item._data[1]
    fake_image_item.extra_params["image_patch_max_num"] = 12
```
The value 12 is a magic number. It should be defined as a named constant at the module level (e.g., DEFAULT_IMAGE_PATCH_MAX_NUM = 12) for better readability and maintainability.
```diff
-    fake_image_item.extra_params["image_patch_max_num"] = 12
+    fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
```
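The refactor the reviewer suggests would look roughly like the following (a sketch, not the actual module: `FakeImageItem` here is a stand-in for the real image item class, and the constant name follows the reviewer's example):

```python
# Module-level constant replacing the magic number, per the review suggestion.
DEFAULT_IMAGE_PATCH_MAX_NUM = 12

class FakeImageItem:
    """Stand-in for the real fake image item used in shared-memory sizing."""
    def __init__(self, width, height):
        self.image_w = width
        self.image_h = height
        self.extra_params = {}

fake_image_item = FakeImageItem(1024, 1024)
fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
```

Naming the constant makes the shared-memory estimate's assumption about the maximum patch count visible and changeable in one place.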
Force-pushed from 6ea57ef to c612b29.
No description provided.