docs: clarify remaining v0 references #26311
base: main
Conversation
Code Review
This pull request provides a number of documentation updates to remove references to the legacy v0 engine and clarify concepts for the current v1 engine. The changes are well-executed across multiple files, improving the clarity and relevance of the documentation for users. The updates are consistent with the stated goals of the PR, and I have no further suggestions.
We have started the process of deprecating V0. Please read [RFC #18571](gh-issue:18571) for more details.

V1 is now enabled by default for all supported use cases, and we will gradually enable it for every use case we plan to support. Please share any feedback on [GitHub](https://github.com/vllm-project/vllm) or in the [vLLM Slack](https://inviter.co/vllm-slack).
Also update this paragraph?
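As background for the paragraph quoted above: during the V0-to-V1 transition the engine could be selected explicitly. The sketch below is illustrative only and assumes the `VLLM_USE_V1` environment variable used during that transition (it is not part of this PR); on current releases V1 is the default and no override should be needed.

```python
import os

# Illustrative only: during the V0 -> V1 transition, the VLLM_USE_V1
# environment variable selected the engine (1 = V1, 0 = legacy V0).
# It must be set before vLLM is imported. On current releases V1 is
# the default, so this override is normally unnecessary.
os.environ["VLLM_USE_V1"] = "1"

from vllm import LLM

# The model name is an illustrative placeholder, not a recommendation.
llm = LLM(model="facebook/opt-125m")
print(llm.generate(["Hello"])[0].outputs[0].text)
```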
| **Mamba Models** | <nobr>🟢 (Mamba-2), 🟢 (Mamba-1)</nobr> |
| **Multimodal Models** | <nobr>🟢 Functional</nobr> |

vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
We should remove the V1 column from the Supported Models page and delete all models that don't support V1
Chunked prefill allows vLLM to process large prefills in smaller chunks and batch them together with decode requests. This feature helps improve both throughput and latency by better balancing compute-bound (prefill) and memory-bound (decode) operations.

Before: In vLLM V1, **chunked prefill is always enabled by default**. This is different from vLLM V0, where it was conditionally enabled based on model characteristics.
After: In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
Suggested change:
Before: In vLLM V1, **chunked prefill is always enabled by default** so that behavior is consistent across supported models.
After: In vLLM V1, **chunked prefill is always enabled by default**.
There are probably some mistakes here. @markmc PTAL
@njhill I guess this page could use a full cleanup
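As a concrete illustration of the chunked-prefill paragraph quoted above: in V1 the main tuning knob is the per-step token budget. This is a minimal sketch assuming the `max_num_batched_tokens` engine argument; the model name and the 2048 value are placeholders, not recommendations.

```python
from vllm import LLM, SamplingParams

# Chunked prefill is always on in V1; max_num_batched_tokens caps how many
# tokens (prefill chunks plus decode tokens) are scheduled in a single step,
# which is the main way to trade prefill throughput against decode latency.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_num_batched_tokens=2048,
)

outputs = llm.generate(
    ["Explain chunked prefill in one sentence."],
    SamplingParams(max_tokens=64),
)
print(outputs[0].outputs[0].text)
```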
Speculative decoding with a draft model requires the V1 engine.
Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
Suggested change:
Before: Speculative decoding with a draft model requires the V1 engine. Older releases that predate V1 (such as the 0.10.x series) raise a `NotImplementedError`.
After: Speculative decoding with a draft model is not supported in vLLM V1. You can use an older version (before the 0.10.x series) to continue to leverage it.
@DarkLight1337 PTAL
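For context on the suggestion above: draft-model speculative decoding pairs the target model with a smaller draft model that proposes several tokens per step. The sketch below shows the older, pre-V1 style of invocation and assumes the `speculative_model` / `num_speculative_tokens` engine arguments as they existed in earlier releases; newer releases group these under a `speculative_config` dict, so check the docs for the version you run. Model names are illustrative placeholders.

```python
from vllm import LLM, SamplingParams

# Sketch of the older (pre-V1) draft-model speculative decoding setup.
# Argument names changed across releases; model names are illustrative.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",     # target model
    speculative_model="meta-llama/Llama-3.2-1B",  # smaller draft model
    num_speculative_tokens=5,                     # draft tokens proposed per step
)

outputs = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```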
We should remove the V1 column from the Supported Models page and delete all models that don't support V1
LGTM after doing this
We can probably gradually remove these docs
Summary
Testing
https://chatgpt.com/codex/tasks/task_e_68e3f11c47408329bf2324ac7b1ad7bf