Conversation

@Josephasafg (Contributor) commented Aug 26, 2025

Purpose

This PR enables V1 by default for Mamba models so they won't fall back to V0. It requires that this PR and this PR (which enables full CUDA graph support by default) be merged first, so users won't have to specify VLLM_ATTENTION_BACKEND=FLASHINFER when they start vLLM.

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing the test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

@Josephasafg Josephasafg requested a review from hmellor as a code owner August 26, 2025 11:35
@mergify mergify bot added the "documentation" (Improvements or additions to documentation) label Aug 26, 2025
@Josephasafg (Contributor, Author) commented Aug 26, 2025

@tdoublep @heheda12345 Once we get this PR and this PR merged, we should probably enable V1 by default for Mamba models.

@gemini-code-assist gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request aims to enable the V1 engine by default for Mamba models by removing the check that was previously falling back to the V0 engine. While the change is correct in principle, it introduces a critical issue where prefix caching is enabled by default for these models, which they do not support, leading to a crash. A fix is required to adjust the default V1 arguments for Mamba-like models.
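
A minimal sketch of the kind of adjustment the bot is asking for, reusing the verify_and_update_config hook and MambaModelConfig name that appear in the diff later in this thread; the cache_config field and the logging are assumptions, and this is not the actual code from the PR:

```python
# Minimal sketch only, not the PR's actual change. Assumes the
# verify_and_update_config hook / MambaModelConfig from the diff later in
# this thread and a cache_config.enable_prefix_caching flag on the config.
import logging

logger = logging.getLogger(__name__)


class MambaModelConfig:

    @classmethod
    def verify_and_update_config(cls, vllm_config) -> None:
        # Mamba-style layers keep a recurrent state instead of a paged KV
        # cache, so block-level prefix caching does not apply; turn it off
        # so the default V1 arguments do not crash the engine.
        cache_config = vllm_config.cache_config
        if cache_config is not None and cache_config.enable_prefix_caching:
            logger.info("Disabling prefix caching: not supported for "
                        "Mamba-like models.")
            cache_config.enable_prefix_caching = False
```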

@tdoublep tdoublep self-requested a review August 26, 2025 18:07
@tdoublep (Member) left a comment

This change will enable V1 by default for all models that use mamba1, mamba2, minimax linear attention, and short conv layers.

For mamba2, we definitely don't want to do this until we first make cudagraph_mode=FULL_AND_PIECEWISE the default, because otherwise the performance drop from V0 to V1 is very large.
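
As an aside, a hedged sketch of how one might opt into full CUDA graphs for a Mamba2-style model until that becomes the default; "FULL_AND_PIECEWISE" is taken from the comment above, while the compilation_config argument and the model name are assumptions that may differ across vLLM versions:

```python
# Hedged sketch: opting into full CUDA graphs for a Mamba2-style model until
# it becomes the default. "FULL_AND_PIECEWISE" comes from the review comment
# above; the compilation_config keyword and the model id are assumptions and
# may vary across vLLM versions.
from vllm import LLM

llm = LLM(
    model="ibm-ai-platform/Bamba-9B",  # example Mamba2 hybrid; use your model
    compilation_config={"cudagraph_mode": "FULL_AND_PIECEWISE"},
)

outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```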

@tdoublep (Member)

Let's also merge this tiny one first (otherwise users will get a crash with the default vllm serve).

@heheda12345 (Collaborator)

@tdoublep What about merging the two PRs in parallel? There's no merge conflict between them.

@tdoublep (Member) left a comment

LGTM

@tdoublep tdoublep enabled auto-merge (squash) August 27, 2025 06:55
@github-actions github-actions bot added the "ready" (ONLY add when PR is ready to merge/full CI is needed) label Aug 27, 2025
@tdoublep (Member) commented Aug 27, 2025

@heheda12345 We can. If this one goes first, there will be a short window in between when vLLM crashes by default for these models. But it's OK - both PRs can be merged in the next few hours.

@Josephasafg (Contributor, Author)
@tdoublep I need to set V0 in test_hybrid.py and remove the V1 env var setting. Fixing it now.
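
For readers following along, a rough sketch of what pinning a test back to the V0 engine can look like; only the VLLM_USE_V1 environment variable is taken from vLLM, the test itself is hypothetical:

```python
# Hypothetical sketch of pinning a hybrid-model test to the V0 engine now
# that V1 is the default for Mamba models; only the VLLM_USE_V1 env var is
# real, the test name, model id, and body are illustrative.
import pytest


@pytest.mark.parametrize("model", ["state-spaces/mamba-130m-hf"])
def test_v0_only_path(monkeypatch: pytest.MonkeyPatch, model: str) -> None:
    # Force the V0 engine for scenarios V1 does not cover yet.
    monkeypatch.setenv("VLLM_USE_V1", "0")
    ...  # run the V0-only scenario against `model` here
```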

auto-merge was automatically disabled August 27, 2025 07:34

Head branch was pushed to by a user without write access

Signed-off-by: asafg <[email protected]>
Signed-off-by: asafg <[email protected]>
@tdoublep tdoublep enabled auto-merge (squash) August 27, 2025 08:10
@Josephasafg (Contributor, Author)

I'll wait for this PR to be merged first, as my tests somewhat depend on it.

auto-merge was automatically disabled August 27, 2025 14:27

Head branch was pushed to by a user without write access

@tdoublep tdoublep enabled auto-merge (squash) August 27, 2025 14:43
@@ -417,4 +417,5 @@ def verify_and_update_config(cls, vllm_config: "VllmConfig") -> None:
     "GptOssForCausalLM": GptOssForCausalLMConfig,
     "MambaForCausalLM": MambaModelConfig,
     "Mamba2ForCausalLM": MambaModelConfig,
+    "FalconMambaForCausalLM": MambaModelConfig,
@heheda12345 (Collaborator) commented on the diff above

@tdoublep Fine for this PR, but I think this line makes vLLM not that pluggable for new models.
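
To illustrate that concern, a hypothetical sketch of how an architecture-to-config mapping like the one in the diff tends to be consumed: a new model family gets none of these defaults unless it is registered by hand. Only the architecture names and MambaModelConfig come from the diff; the rest is invented for illustration:

```python
# Hypothetical sketch illustrating the pluggability concern above; only the
# architecture names and MambaModelConfig come from the diff, the helper and
# map names here are invented.
class MambaModelConfig:

    @classmethod
    def verify_and_update_config(cls, vllm_config) -> None:
        # Family-specific default adjustments would go here.
        pass


MODEL_CONFIG_MAP = {
    "MambaForCausalLM": MambaModelConfig,
    "Mamba2ForCausalLM": MambaModelConfig,
    "FalconMambaForCausalLM": MambaModelConfig,
}


def apply_model_defaults(architecture: str, vllm_config) -> None:
    # A brand-new Mamba-like model is silently skipped unless its
    # architecture string is added to the map above.
    config_cls = MODEL_CONFIG_MAP.get(architecture)
    if config_cls is not None:
        config_cls.verify_and_update_config(vllm_config)
```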

@heheda12345 (Collaborator)

@tdoublep Can you create a new PR to update the docs with the recent status and a simple guideline for contributing new Mamba models?

Signed-off-by: asafg <[email protected]>
auto-merge was automatically disabled August 27, 2025 17:55

Head branch was pushed to by a user without write access

@Josephasafg (Contributor, Author) commented Aug 27, 2025

@heheda12345 @tdoublep I fixed some tests. Some tests are, I believe, only suited for V0, such as:

  • test_chunked_prefill
  • test_models_preemption_recompute

@tdoublep (Member) commented Aug 27, 2025

@Josephasafg Thanks - the tests are passing now. When we remove the V0 code, we can either adapt those two tests to V1 or drop them if they no longer make sense.

@tdoublep tdoublep enabled auto-merge (squash) August 27, 2025 19:19
@tdoublep (Member)

> @tdoublep Can you create a new PR to update the docs with the recent status and a simple guideline for contributing new Mamba models?

@heheda12345 Sure, I will do that. Are there any examples of guidelines for contributing other models that I could use as a reference?

@Josephasafg (Contributor, Author)

> @Josephasafg Thanks - the tests are passing now. When we remove the V0 code, we can either adapt those two tests to V1 or drop them if they no longer make sense.

Sounds good.

@tdoublep tdoublep merged commit 853c371 into vllm-project:main Aug 27, 2025
41 checks passed
xiao-llm pushed a commit to xiao-llm/vllm that referenced this pull request Aug 28, 2025
zhewenl pushed a commit to zhewenl/vllm that referenced this pull request Aug 28, 2025
dumb0002 pushed a commit to dumb0002/vllm that referenced this pull request Aug 28, 2025
2015aroras pushed a commit to 2015aroras/vllm that referenced this pull request Aug 29, 2025
nopperl pushed a commit to pfnet/vllm that referenced this pull request Sep 3, 2025