
[add] add skip image cache and disable_prompt_cache para #1061

Merged: hiworldwzj merged 9 commits into main from skip_image_cache on Sep 24, 2025

Conversation

@SangChengC
Contributor

No description provided.

@gemini-code-assist (bot) left a comment

Summary of Changes

Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances control over caching mechanisms within the system by introducing new parameters for both text prompt and multimodal image processing. It adds disable_prompt_cache to SamplingParams and skip_image_cache to MultimodalParams, providing users with the flexibility to explicitly bypass caching for specific requests. These changes are integrated into the core inference and visual server components, alongside a minor refinement to shared memory size calculation for multimodal inputs.

Highlights

  • New Parameter: disable_prompt_cache: Introduced a new boolean parameter, disable_prompt_cache, within SamplingParams to allow explicit control over whether prompt caching should be bypassed for a given request. This parameter is integrated into the inference batch processing logic.
  • New Parameter: skip_image_cache: Added a new boolean parameter, skip_image_cache, to MultimodalParams. This enables requests to bypass the image embedding cache, ensuring that image embeddings are re-processed rather than retrieved from the cache. A usage sketch covering both new parameters follows this list.
  • Prompt Cache Management: Modified the inference batch processing to respect the disable_prompt_cache parameter, preventing prompt cache matching and memory freeing for requests where caching is explicitly disabled.
  • Image Cache Management: Updated the visual server manager to check the skip_image_cache parameter for multimodal requests, allowing it to bypass the image embedding cache lookup when set to true.
  • Shared Memory Size Calculation: Adjusted the shared memory size estimation utility by adding image_patch_max_num to fake image items, which helps in more accurately calculating the required shared memory for multimodal image tokens.
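
A minimal sketch of how a request might opt out of both caches. Only the field names disable_prompt_cache and skip_image_cache come from this PR; the dataclass stand-ins, their other fields, and the defaults below are simplified assumptions rather than the project's real parameter classes:

```python
# Illustrative sketch only: the flag names come from this PR, but these
# dataclasses are simplified stand-ins, not the project's actual classes.
from dataclasses import dataclass, field
from typing import List


@dataclass
class SamplingParams:
    max_new_tokens: int = 128
    disable_prompt_cache: bool = False  # True -> skip prompt-cache matching for this request


@dataclass
class MultimodalParams:
    images: List[str] = field(default_factory=list)
    skip_image_cache: bool = False  # True -> bypass the image-embedding cache lookup


# A request that wants everything recomputed from scratch would set both flags:
sampling_params = SamplingParams(max_new_tokens=64, disable_prompt_cache=True)
multimodal_params = MultimodalParams(images=["example.jpg"], skip_image_cache=True)
print(sampling_params.disable_prompt_cache, multimodal_params.skip_image_cache)
```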

@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces two new parameters, skip_image_cache and disable_prompt_cache, to provide more control over caching mechanisms. The implementation correctly integrates these parameters into the multimodal parameter handling and the inference batch processing logic. The changes are logical and well-contained. I've provided a few minor suggestions to improve code quality, including correcting a comment, removing dead code, and refactoring a magic number into a constant for better maintainability.

ctypes.c_bool,
), # whether to add spaces between special tokens when decoding
("print_eos_token", ctypes.c_bool), # eos_id will be always ignored except the value is set to True
("disable_prompt_cache", ctypes.c_bool), # eos_id will be always ignored except the value is set to True
@gemini-code-assist (bot), severity: medium

The comment for disable_prompt_cache appears to be a copy-paste from the line above and is incorrect. It should be updated to accurately describe the purpose of this parameter.

Suggested change:
- ("disable_prompt_cache", ctypes.c_bool), # eos_id will be always ignored except the value is set to True
+ ("disable_prompt_cache", ctypes.c_bool), # whether to disable prompt cache

)
fake_image_item.image_w = fake_image_item._data[0]
fake_image_item.image_h = fake_image_item._data[1]
fake_image_item.extra_params["image_patch_max_num"] = 12
@gemini-code-assist (bot), severity: medium

The value 12 is a magic number. It should be defined as a named constant at the module level (e.g., DEFAULT_IMAGE_PATCH_MAX_NUM = 12) for better readability and maintainability.

Suggested change:
- fake_image_item.extra_params["image_patch_max_num"] = 12
+ fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
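
A minimal sketch of the suggested refactor, assuming a simplified stand-in for the fake image item used in the shared-memory size estimate (the real construction in the repository differs):

```python
# Illustrative sketch of the named-constant refactor suggested above.
# FakeImageItem is a simplified stand-in, not the project's actual class.
DEFAULT_IMAGE_PATCH_MAX_NUM = 12  # module-level constant instead of a bare 12


class FakeImageItem:
    """Minimal stand-in for the fake image item built for shm size estimation."""

    def __init__(self, image_w: int, image_h: int):
        self.image_w = image_w
        self.image_h = image_h
        self.extra_params: dict = {}


fake_image_item = FakeImageItem(image_w=1024, image_h=1024)
fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
print(fake_image_item.extra_params)
```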

@hiworldwzj merged commit a5f188f into main on Sep 24, 2025
1 check passed
@hiworldwzj deleted the skip_image_cache branch on September 24, 2025 at 11:15