[MODEL] Add support for Zamba2 models #13185
Conversation
Seems like the current failures in checks are due to cv2 imports in transformers v4.49.0. This is a known issue: #13905. Other than that, things work.
@tlrmchlsmth other than the external issue with the latest released transformers (the cv2 import in 4.49.0, which I see is fixed in their dev branch), do you have other suggestions for this PR?
tlrmchlsmth
left a comment
The PR looks great to me, thanks for the contribution. I'll accept once the transformers 4.49 issue is resolved.
Signed-off-by: Yury Tokpanov <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: Quentin Anthony <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Yury Tokpanov <[email protected]>
Thanks! Can you also update the list of supported models in the docs with this model?
Yep, all done. By the way, I see a bunch of tests failed, but upon further inspection they seem to be unrelated to this PR.
Indeed, they are unrelated. Merging.
Signed-off-by: Yury Tokpanov <[email protected]> Signed-off-by: Quentin Anthony <[email protected]> Co-authored-by: Quentin Anthony <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
This PR adds support for Zamba2 models (#9382), a series of mamba2-transformer hybrid models that use shared attention blocks and LoRAs applied to the shared MLP and attention blocks, depending on the model. The 1.2B and 7B models use RoPE in their attention blocks.
This PR is fully compatible with the Zamba2 integration in the HuggingFace transformers library, which was recently merged into its main branch.
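As a usage sketch once this PR lands, a Zamba2 checkpoint should work like any other supported architecture via vLLM's standard entry points. The checkpoint name below is an assumption based on the Zyphra Hugging Face org, not something stated in this PR; downloading and serving it also requires a GPU with enough memory:

```shell
# Hedged sketch: serve an assumed Zamba2 checkpoint with vLLM's
# OpenAI-compatible server. Adjust the model name and context length
# to the checkpoint you actually use.
vllm serve Zyphra/Zamba2-7B-Instruct --max-model-len 4096
```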
Unit tests pass now.
We would like to acknowledge the authors of the Bamba PR and the Mamba2 PR (@fabianlim and @tlrmchlsmth, respectively) for adding mamba2 support to vLLM and for the productive discussions!
cc: @Quentin-Anthony @BerenMillidge @pglorio