[add] add skip image cache and disable_prompt_cache para #1061
hiworldwzj merged 9 commits into main from
Conversation
Summary of Changes
Hello @SangChengC, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request enhances control over caching mechanisms within the system by introducing new parameters for both text prompt and multimodal image processing. It adds disable_prompt_cache to SamplingParams and skip_image_cache to MultimodalParams, providing users with the flexibility to explicitly bypass caching for specific requests. These changes are integrated into the core inference and visual server components, alongside a minor refinement to shared memory size calculation for multimodal inputs.
Highlights
- **New parameter `disable_prompt_cache`:** Introduced a new boolean parameter, `disable_prompt_cache`, within `SamplingParams` to allow explicit control over whether prompt caching should be bypassed for a given request. This parameter is integrated into the inference batch-processing logic.
- **New parameter `skip_image_cache`:** Added a new boolean parameter, `skip_image_cache`, to `MultimodalParams`. This enables requests to bypass the image embedding cache, ensuring that image embeddings are re-processed rather than retrieved from cache.
- **Prompt cache management:** Modified the inference batch processing to respect the `disable_prompt_cache` parameter, preventing prompt-cache matching and memory freeing for requests where caching is explicitly disabled.
- **Image cache management:** Updated the visual server manager to check the `skip_image_cache` parameter for multimodal requests, allowing it to bypass the image embedding cache lookup when set to true.
- **Shared memory size calculation:** Adjusted the shared-memory size estimation utility by adding `image_patch_max_num` to fake image items, which helps to more accurately calculate the required shared memory for multimodal image tokens.
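The two flags described above are per-request switches. As a minimal sketch (the surrounding request schema is assumed here, not taken from this PR; only the two field names come from the diff), a caller that wants fully uncached processing might set both:

```python
# Hypothetical request payload illustrating the two new flags.
# Only "disable_prompt_cache" and "skip_image_cache" are from the PR;
# the rest of the shape is an illustrative assumption.
request = {
    "inputs": "Describe this image.",
    "sampling_params": {
        "max_new_tokens": 64,
        "disable_prompt_cache": True,  # bypass prompt-cache matching for this request
    },
    "multimodal_params": {
        "images": [{"type": "url", "data": "http://example.com/cat.png"}],
        "skip_image_cache": True,  # recompute image embeddings instead of cache lookup
    },
}

print(request["sampling_params"]["disable_prompt_cache"])
```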
Code Review
This pull request introduces two new parameters, skip_image_cache and disable_prompt_cache, to provide more control over caching mechanisms. The implementation correctly integrates these parameters into the multimodal parameter handling and the inference batch processing logic. The changes are logical and well-contained. I've provided a few minor suggestions to improve code quality, including correcting a comment, removing dead code, and refactoring a magic number into a constant for better maintainability.
```python
        ctypes.c_bool,
    ),  # whether to add spaces between special tokens when decoding
    ("print_eos_token", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
    ("disable_prompt_cache", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
```
There was a problem hiding this comment.
The comment for disable_prompt_cache appears to be a copy-paste from the line above and is incorrect. It should be updated to accurately describe the purpose of this parameter.
```diff
-    ("disable_prompt_cache", ctypes.c_bool),  # eos_id will be always ignored except the value is set to True
+    ("disable_prompt_cache", ctypes.c_bool),  # whether to disable prompt cache
```
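For context, the fields above live in a `ctypes.Structure` field list, where each `(name, type)` tuple carries its own explanatory comment. A self-contained sketch of that pattern (the class name here is a stand-in, and only the two flag fields are shown):

```python
import ctypes

# Minimal sketch of a ctypes Structure holding per-request boolean flags,
# mirroring the fields in the diff above (other fields omitted;
# "SamplingParamsStruct" is an illustrative name).
class SamplingParamsStruct(ctypes.Structure):
    _fields_ = [
        ("print_eos_token", ctypes.c_bool),       # eos_id is ignored unless this is True
        ("disable_prompt_cache", ctypes.c_bool),  # whether to disable prompt cache
    ]

params = SamplingParamsStruct()   # c_bool fields default to False
params.disable_prompt_cache = True
print(params.disable_prompt_cache)  # -> True
```

Because the struct is shared across processes via ctypes, a plain boolean field like this is enough to carry the flag from the request layer into the inference batch logic.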
```python
    )
    fake_image_item.image_w = fake_image_item._data[0]
    fake_image_item.image_h = fake_image_item._data[1]
    fake_image_item.extra_params["image_patch_max_num"] = 12
```
The value 12 is a magic number. It should be defined as a named constant at the module level (e.g., DEFAULT_IMAGE_PATCH_MAX_NUM = 12) for better readability and maintainability.
```diff
-    fake_image_item.extra_params["image_patch_max_num"] = 12
+    fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
```
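The refactor the reviewer suggests would look roughly like the following (a sketch, not the actual module: `FakeImageItem` here is a stand-in for the real image item class, and the constant name follows the reviewer's example):

```python
# Module-level constant replacing the magic number, per the review suggestion.
DEFAULT_IMAGE_PATCH_MAX_NUM = 12

class FakeImageItem:
    """Stand-in for the real fake image item used in shared-memory sizing."""
    def __init__(self, width, height):
        self.image_w = width
        self.image_h = height
        self.extra_params = {}

fake_image_item = FakeImageItem(1024, 1024)
fake_image_item.extra_params["image_patch_max_num"] = DEFAULT_IMAGE_PATCH_MAX_NUM
```

Naming the constant makes the shared-memory estimate's assumption about the maximum patch count visible and changeable in one place.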
Force-pushed from 6ea57ef to c612b29.
No description provided.