
Conversation

DarkLight1337 (Member) commented Nov 21, 2025

Purpose

Update various documentation pages to better match the current status of V1. Moved various sections into the V1 page so people can more easily see the differences between V0 and V1.

cc @tdoublep can you update the status of prefix caching support for hybrid models? Feel free to update this PR directly if it hasn't been merged yet.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting a results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.


mergify bot commented Nov 21, 2025

Documentation preview: https://vllm--29188.org.readthedocs.build/en/29188/

mergify bot added the `documentation` (Improvements or additions to documentation) label Nov 21, 2025

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request provides a good set of updates to the documentation for vLLM V1. The changes refactor and consolidate information, making the documentation clearer and more up-to-date. I've identified a couple of minor but important typos in the markdown files that affect rendering and link functionality. Addressing these will ensure the documentation is presented correctly to users.

Signed-off-by: DarkLight1337 <[email protected]>

DarkLight1337 (Member, Author) commented

/gemini review

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request updates various documentation files to reflect the current status of vLLM V1, consolidating information into a central v1_guide.md. The changes are mostly documentation updates and look good. However, I found a significant contradiction in v1_guide.md regarding the status of 'Prompt Logprobs with Prefix Caching', which could confuse users. Please see the specific comment for details.

Comment on lines +59 to +61:

+   #### Prompt Logprobs with Prefix Caching
+
+   Logprobs are not cached. For a request requiring prompt logprobs, the engine will ignore the prefix cache and recompute the prefill of the full prompt to generate the logprobs.

gemini-code-assist bot (Contributor) commented (severity: high):

There is a contradiction in the documentation regarding 'Prompt Logprobs with Prefix Caching'. This section states that for requests with prompt logprobs, 'the engine will ignore the prefix cache'. However, the feature table on line 150 indicates that 'Prompt Logprobs with Prefix Caching' is '🟢 Functional'. These two statements are conflicting. Please clarify the correct behavior and update the documentation to be consistent.

DarkLight1337 (Member, Author) replied:

cc @njhill

A collaborator replied:

The statement here is correct. I think it's OK to leave it as functional (not optimized). You can link the Prompt Logprobs with Prefix Caching to this section if you want.
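
For readers following along, the behavior described above only applies when a request actually asks for prompt logprobs. Below is a minimal sketch using the offline `LLM` API (the model name and prompts are placeholders, not taken from this PR): the first request can reuse the prefix cache, while the second one, which sets `prompt_logprobs`, triggers a full prompt prefill as the doc text quoted above describes.

```python
# Sketch only: model and prompts are placeholders.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)
shared_prefix = "You are a helpful assistant. Answer concisely.\n"

# Regular request: eligible for prefix-cache reuse.
plain = SamplingParams(temperature=0.0, max_tokens=16)
llm.generate([shared_prefix + "What is vLLM?"], plain)

# Request with prompt logprobs: per the doc text quoted above, logprobs are not
# cached, so the engine recomputes the prefill of the whole prompt even though
# the prefix was seen before.
with_logprobs = SamplingParams(temperature=0.0, max_tokens=16, prompt_logprobs=1)
outputs = llm.generate([shared_prefix + "Summarize prefix caching."], with_logprobs)
print(outputs[0].prompt_logprobs[:4])  # per-token logprob entries for the prompt
```

This is consistent with marking the feature as "🟢 Functional (not optimized)" in the table: such requests work correctly, they just don't benefit from the cache.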

On the reproducibility note in the docs, the diff reads:

    vLLM does not guarantee the reproducibility of the results by default, for the sake of performance. To achieve
-   reproducible results, you need to turn off multiprocessing to make the scheduling deterministic by setting `VLLM_ENABLE_V1_MULTIPROCESSING=0`.
+   reproducible results, consider enabling [batch invariance](../features/batch_invariance.md) as the scheduling
+   cannot be made deterministic without using offline mode and setting `VLLM_ENABLE_V1_MULTIPROCESSING=0`.

A collaborator commented:

Can you make it clearer? IMO:

  • for online serving, you need batch invariance
  • for offline serving, you need either batch invariance or `VLLM_ENABLE_V1_MULTIPROCESSING=0` (a minimal sketch of this option follows below)
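
To make the offline-serving option concrete, here is a minimal sketch under the assumptions above (model and prompt are placeholders): the environment variable has to be set before the engine is created so scheduling stays in-process, and a fixed seed plus greedy sampling removes the remaining randomness. For online serving, the batch-invariance feature linked in the diff would be the route instead.

```python
# Sketch of the offline-serving option only; model and prompt are placeholders.
import os

# Must be set before the engine is created so scheduling stays in-process
# and deterministic.
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m", seed=0)             # fixed seed
params = SamplingParams(temperature=0.0, max_tokens=32)  # greedy decoding

out1 = llm.generate(["Explain prefix caching in one sentence."], params)
out2 = llm.generate(["Explain prefix caching in one sentence."], params)
# Under these settings the two generations are expected to match exactly.
assert out1[0].outputs[0].text == out2[0].outputs[0].text
```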


Labels

documentation Improvements or additions to documentation

2 participants