[V1][Mamba] - Enable V1 by default for Mamba Models #23650
Conversation
Signed-off-by: asafg <[email protected]>
@tdoublep @heheda12345 Once we get that PR and this PR merged, we should probably enable V1 by default for Mamba models.
Code Review
This pull request aims to enable the V1 engine by default for Mamba models by removing the check that was previously falling back to the V0 engine. While the change is correct in principle, it introduces a critical issue where prefix caching is enabled by default for these models, which they do not support, leading to a crash. A fix is required to adjust the default V1 arguments for Mamba-like models.
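For context, here is a minimal sketch of the kind of adjustment the review is asking for, assuming the fix goes through the per-model `verify_and_update_config` hook that appears in the diff further down; the exact attribute names are assumptions, not the actual patch:

```python
# Hypothetical sketch (not the actual fix): disable prefix caching for
# Mamba-like models inside the config-update hook, since these models have
# no KV cache that prefix caching could reuse.
class MambaModelConfig:
    @classmethod
    def verify_and_update_config(cls, vllm_config: "VllmConfig") -> None:
        cache_config = vllm_config.cache_config
        if cache_config.enable_prefix_caching:
            # Prefix caching is unsupported for pure SSM layers; turn it off.
            cache_config.enable_prefix_caching = False
```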
Signed-off-by: asafg <[email protected]>
…m into default_mamba_v1_support
This change will enable V1 by default for all models that use mamba1, mamba2, minimax linear attention and short conv layers.
For mamba2, we definitely don't want to do this until we first enable cudagraph_mode=FULL_AND_PIECEWISE as the default, because otherwise the performance drop from V0 to V1 is very large.
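Until that default changes, a rough sketch of how a user could opt into the full CUDA-graph mode explicitly is shown below; the `cudagraph_mode` key is taken from the comment above, and both the config spelling and the example checkpoint are assumptions:

```python
from vllm import LLM

# Explicitly request FULL_AND_PIECEWISE CUDA graphs for a Mamba-2 style model
# instead of relying on the current piecewise default. The checkpoint name is
# only an illustrative placeholder.
llm = LLM(
    model="ibm-ai-platform/Bamba-9B",
    compilation_config={"cudagraph_mode": "FULL_AND_PIECEWISE"},
)
print(llm.generate("State-space models differ from Transformers because")[0].outputs[0].text)
```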
Let's also merge this tiny one first (otherwise users will get a crash using default
@tdoublep what about merging the two PRs in parallel? There's no merge conflict between them.
LGTM
@heheda12345 We can. If this one goes first, there will be a short time in between when vLLM crashes by default for these models. But it's OK - both PRs can be merged in the next few hours.
@tdoublep I need to set
Signed-off-by: asafg <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: asafg <[email protected]>
I'll wait for this PR to be merged first, as my tests kind of depend on it.
Head branch was pushed to by a user without write access
Signed-off-by: asafg <[email protected]>
@@ -417,4 +417,5 @@ def verify_and_update_config(cls, vllm_config: "VllmConfig") -> None:
     "GptOssForCausalLM": GptOssForCausalLMConfig,
     "MambaForCausalLM": MambaModelConfig,
     "Mamba2ForCausalLM": MambaModelConfig,
+    "FalconMambaForCausalLM": MambaModelConfig,
@tdoublep Fine for this PR, but I think this line makes vLLM not that pluggable for new models.
@tdoublep Can you create a new PR to update the docs with the current status and a simple guideline for contributing new Mamba models?
Signed-off-by: asafg <[email protected]>
Head branch was pushed to by a user without write access
Signed-off-by: asafg <[email protected]>
@heheda12345 @tdoublep I fixed some tests. Some tests are, I believe, only suited for V0, like
@Josephasafg Thanks - the tests are passing now. When we remove the V0 code, we can either adapt those 2 tests to V1 or drop them if they no longer make sense.
@heheda12345 Sure, I will do that. Are there any example guidelines for contributing other models that I could use as a reference?
Sounds good.
Signed-off-by: asafg <[email protected]>
Purpose
This PR enables V1 by default for Mamba models so they won't fall back to V0. It needs this PR and this PR (which enables full CUDA graph support by default) to be merged first, so users won't have to specify VLLM_ATTENTION_BACKEND=FLASHINFER when they start vLLM.
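As a concrete illustration of the intended user-facing effect (the checkpoint name is only an example and the exact behavior is an assumption, not something verified in this PR):

```python
from vllm import LLM

# With this PR (and the prerequisite full-CUDA-graph PR) merged, a Mamba model
# is expected to come up on the V1 engine without exporting
# VLLM_ATTENTION_BACKEND=FLASHINFER first. "state-spaces/mamba-130m-hf" is
# just an example Mamba checkpoint.
llm = LLM(model="state-spaces/mamba-130m-hf")
print(llm.generate("Selective state-space models")[0].outputs[0].text)
```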
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.