[MODEL] Add support for Zamba2 models #13185
Conversation
Seems like the current failures in checks are due to cv2 imports in transformers v4.49.0. This is a known issue: #13905. Other than that, things work.
@tlrmchlsmth other than the external issue with the latest released transformers (the cv2 import in 4.49.0, which I see is fixed in their dev branch), do you have other suggestions for this PR?
tlrmchlsmth
left a comment
The PR looks great to me, thanks for the contribution. I'll accept once the transformers 4.49 issue is resolved.
Signed-off-by: Yury Tokpanov <[email protected]>
Co-authored-by: Tyler Michael Smith <[email protected]> Signed-off-by: Quentin Anthony <[email protected]>
Co-authored-by: Cyrus Leung <[email protected]> Signed-off-by: Yury Tokpanov <[email protected]>
Thanks! Can you also update the list of supported models in the docs with this model?
Yep, all done. By the way, I see a bunch of tests failed, but upon further inspection they seem to be unrelated to this PR.
Indeed, they are unrelated. Merging.
Signed-off-by: Yury Tokpanov <[email protected]> Signed-off-by: Quentin Anthony <[email protected]> Co-authored-by: Quentin Anthony <[email protected]> Co-authored-by: Tyler Michael Smith <[email protected]> Co-authored-by: Cyrus Leung <[email protected]>
This PR adds support for Zamba2 models (#9382), a series of mamba2-transformer hybrid models that use shared attention blocks and LoRAs applied to the shared MLP and attention blocks, depending on the model. The 1.2B and 7B models use RoPE in their attention blocks.
This PR is fully compatible with the Zamba2 integration in the HuggingFace transformers library, which was recently merged into its main branch.
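As a usage sketch once this PR lands, a Zamba2 checkpoint should work like any other supported architecture via vLLM's standard entry points. The checkpoint name below is an assumption based on the Zyphra Hugging Face org, not something stated in this PR; downloading and serving it also requires a GPU with enough memory:

```shell
# Hedged sketch: serve an assumed Zamba2 checkpoint with vLLM's
# OpenAI-compatible server. Adjust the model name and context length
# to the checkpoint you actually use.
vllm serve Zyphra/Zamba2-7B-Instruct --max-model-len 4096
```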
Unit tests pass now.
We would like to acknowledge the authors of the Bamba PR and the Mamba2 PR (@fabianlim and @tlrmchlsmth, respectively) for adding mamba2 support to vLLM and for the productive discussions!
cc: @Quentin-Anthony @BerenMillidge @pglorio