[Doc] Update more docs with respect to V1 #29188
base: main
Conversation
Signed-off-by: DarkLight1337 <[email protected]>
Documentation preview: https://vllm--29188.org.readthedocs.build/en/29188/
Code Review
This pull request provides a good set of updates to the documentation for vLLM V1. The changes refactor and consolidate information, making the documentation clearer and more up-to-date. I've identified a couple of minor but important typos in the markdown files that affect rendering and link functionality. Addressing these will ensure the documentation is presented correctly to users.
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
Signed-off-by: DarkLight1337 <[email protected]>
/gemini review
Code Review
This pull request updates various documentation files to reflect the current status of vLLM V1, consolidating information into a central v1_guide.md. The changes are mostly documentation updates and look good. However, I found a significant contradiction in v1_guide.md regarding the status of 'Prompt Logprobs with Prefix Caching', which could confuse users. Please see the specific comment for details.
> #### Prompt Logprobs with Prefix Caching
>
> For each item, our progress towards V1 support falls into one of the following states:
>
> Logprobs are not cached. For a request requiring prompt logprobs, the engine will ignore the prefix cache and recompute the prefill of the full prompt to generate the logprobs.
There is a contradiction in the documentation regarding 'Prompt Logprobs with Prefix Caching'. This section states that for requests with prompt logprobs, 'the engine will ignore the prefix cache'. However, the feature table on line 150 indicates that 'Prompt Logprobs with Prefix Caching' is '🟢 Functional'. These two statements are conflicting. Please clarify the correct behavior and update the documentation to be consistent.
cc @njhill
The statement here is correct. I think it's OK to leave it as functional (not optimized). You can link the Prompt Logprobs with Prefix Caching to this section if you want.
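To make the behavior being discussed concrete, here is a minimal sketch of a request that asks for prompt logprobs while prefix caching is enabled (the model name and prompt are illustrative assumptions, not taken from this PR):

```python
from vllm import LLM, SamplingParams

# Illustrative model choice; any model works for this sketch.
llm = LLM(model="facebook/opt-125m", enable_prefix_caching=True)

# Asking for prompt logprobs: per the discussion above, the engine
# recomputes the prefill of the full prompt instead of reusing the
# prefix cache, so the request is functional but not optimized.
params = SamplingParams(max_tokens=8, prompt_logprobs=1)

outputs = llm.generate(["The capital of France is"], params)
print(outputs[0].prompt_logprobs)  # per-token logprobs for the prompt
```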
```diff
  vLLM does not guarantee the reproducibility of the results by default, for the sake of performance. To achieve
- reproducible results, you need to turn off multiprocessing to make the scheduling deterministic by setting `VLLM_ENABLE_V1_MULTIPROCESSING=0`.
+ reproducible results, consider enabling [batch invariance](../features/batch_invariance.md) as the scheduling
+ cannot be made deterministic without using offline mode and setting `VLLM_ENABLE_V1_MULTIPROCESSING=0`.
```
can you make it clearer? IMO:
- for online serving, you need batch invariance
- for offline serving, you need either batch invariance or `VLLM_ENABLE_V1_MULTIPROCESSING=0` (see the sketch below)
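As a sketch of the offline route in the second bullet, assuming the offline `LLM` API; the model name and seed here are illustrative, not from this PR:

```python
import os

# Disable V1 engine multiprocessing before importing vLLM so that
# scheduling is deterministic (offline-serving route discussed above).
os.environ["VLLM_ENABLE_V1_MULTIPROCESSING"] = "0"

from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, seed=42, max_tokens=16)

print(llm.generate(["Hello, my name is"], params)[0].outputs[0].text)
```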
Purpose
Update various documentation pages to better match the current status of V1. Various sections have been moved into the V1 page so people can more easily see the differences between V0 and V1.
cc @tdoublep can you update the status of prefix caching support for hybrid models? Feel free to update this PR directly if it hasn't been merged yet.
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.