feat: Adds a vllm backend #122
jakelorocco left a comment:
Looks mostly good to me; I left a few comments, and it looks like some pre-commit checks are failing as well.
jakelorocco left a comment:
I think this looks good. There will be conflicts and changes required based on my async PR (#137). I'm happy to make those commits, though, once the async stuff gets merged and this branch gets rebased.
Async support is already implemented.
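The asynchronous call support mentioned here and in the commit history is not shown in this thread. As background only, one common pattern for exposing a blocking generation call through an async interface (a sketch with a stubbed `blocking_generate`, not the actual backend code) is to offload the call to a worker thread:

```python
import asyncio


def blocking_generate(prompt: str) -> str:
    # Stand-in for a synchronous generate() call (hypothetical stub).
    return f"completion for: {prompt}"


async def agenerate(prompt: str) -> str:
    # Offload the blocking call to a worker thread so the event loop
    # remains free to serve other coroutines.
    return await asyncio.to_thread(blocking_generate, prompt)


if __name__ == "__main__":
    print(asyncio.run(agenerate("hello")))  # completion for: hello
```

Whether the PR uses this pattern or a native async engine API is not visible from the conversation.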
Signed-off-by: Masataro Asai <[email protected]>
I just merged main. If tests do not predictably pass in the CI/CD pipeline, we should mark them as non-cicd tests (ask @avinash2692 if you need a pointer on how to do this). Please run the full test suite locally after the main merge and ensure all tests pass. Other than that, LGTM.
* feat: added smaller qwen models for debugging
* feat(vllm): copied from huggingface
* fix(vllm): remove alora and cache
* fix(vllm): remove tool calls
* fix(vllm): finished the implementation with limited functionality: free-form and constrained generation
* fix(vllm): passing mypy and linter
* fix(vllm): added vllm optional dep in pyproject.toml
* feat(vllm test): copied from huggingface
* fix(vllm test): implemented the test
* test: require V0 in vllm test
* refactor: ctx to chat conversion function
* refactor: use_alora function
* refactor: moved _extract_model_tool_requests to mellea.backends.utils
* feat(vllm): added tool calls
* test(tools): run test with mistral
* fix(vllm): rename model_options -> engine_args
* fix(vllm): use FancyLogger
* fix(vllm): ignore type checking for vllm and msgspec
* fix(vllm): fixed the backend name in the log
* feat(vllm): asynchronous call support
* test(vllm): asynchronous call support
* fix(vllm): avoid unnecessary incremental processing in non-streaming mode
* fix(vllm): fix for the new return format
* fix(vllm): fixed vllm test for the new contexts
* fix(vllm): addressed minor comments
* fix(vllm): uv lock
* fix(vllm): mark V0 api test qualitative; will be removed in a future PR that migrates to V1

Signed-off-by: Masataro Asai <[email protected]>
Co-authored-by: MASATARO ASAI <[email protected]>
Co-authored-by: Nathan Fulton <[email protected]>
A basic vLLM backend, without tool and aLoRA support. Requires VLLM_V1=0 to be set when running.
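The V0 requirement can be satisfied by exporting the variable in the shell, or from Python before vLLM is imported. A minimal sketch (the variable name is taken from this PR's description; verify it against the backend's own docs):

```python
import os

# Force vLLM's legacy V0 engine before the vllm package is imported.
# The variable name here comes from this PR's description.
os.environ["VLLM_V1"] = "0"
```

Setting it in the process environment before any vLLM import is the safe ordering, since engine selection typically happens at import or engine-construction time.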