Use vLLM extra to generate GPU requirements files #36420
Conversation
@damccorm I used your changes plus fixed the OutOfMemory issue, but there is still an error when starting the vLLM server.
This PR is unrelated to that flakiness, so I wouldn't expect it to make a difference immediately. In the logs, I now see a bunch of failures. I'm not 100% sure what would cause that error, but it may be a consequence of trying to run Gemma on a T4 - could you try updating to an L4 to see if that solves the problem?
It helped; now I have a green run. I still need to investigate the reasons and perform more tests.
This reverts commit 1a5d907.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@             Coverage Diff              @@
##           master   #36420       +/-   ##
=============================================
+ Coverage   36.24%   55.10%   +18.85%
  Complexity   1666     1666
=============================================
  Files        1060     1059        -1
  Lines      165704   165727       +23
  Branches     1195     1195
=============================================
+ Hits        60063    91320    +31257
+ Misses     103465    72231    -31234
  Partials     2176     2176
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.
assign set of reviewers
Assigning reviewers: R: @tvalentyn for label python. Note: If you would like to opt out of this review, comment `assign to next reviewer`.

Available commands:

The PR bot will only process comments in the main thread (not review comments).
```bash
INDEX_URL_OPTION="--extra-index-url https://download.pytorch.org/whl/cpu"
if [[ $EXTRAS == *"vllm"* ]]; then
  # Explicitly install torch to avoid https://github.com/facebookresearch/xformers/issues/740
  # This should be overwritten later since the vllm extra is installed alongside torch
```
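For context, a hedged sketch of how this branch might continue; the `pip` invocation and the clearing of `INDEX_URL_OPTION` below are assumptions for illustration, not the PR's verbatim code:

```bash
  # Assumption: install torch from the default index (GPU wheels); the
  # version installed here is superseded when the vllm extra pulls in
  # torch again during the main install.
  pip install torch
  # Assumption: drop the CPU extra-index so the vllm extra resolves GPU
  # wheels rather than +cpu builds.
  INDEX_URL_OPTION=""
fi
```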
> This should be overwritten later
I don't quite follow the comment. Could you please clarify what is being overwritten?
Thanks - I agree this was unclear. I was trying to say that the torch version we install here doesn't matter, since torch will get installed anyway later as part of the vllm install (and that will determine the actual version that gets installed). Hopefully the new comment is clearer.
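For illustration, a possible clearer wording for that inline comment (a sketch, not necessarily the exact text that was merged):

```bash
# The torch version pinned here is temporary: installing the vllm extra
# below pulls in torch again, and that install determines the final
# torch version captured in the generated requirements.
```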
```bash
source "$ENV_PATH"/bin/activate
pip install --upgrade pip setuptools wheel

# For non-vllm (non-gpu) requirement files, force downloading torch from CPU wheels
```
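A hedged sketch of how this CPU-wheel option might feed into generating the requirements file; `REQUIREMENTS_FILE` and the `pip install -e` target are hypothetical names for illustration:

```bash
# Assumption: install the SDK with the requested extras, passing the CPU
# extra-index (left unquoted so it expands into two arguments), then
# freeze the resolved environment into the generated requirements file.
pip install $INDEX_URL_OPTION -e ".[$EXTRAS]"
pip freeze > "$REQUIREMENTS_FILE"
```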
Does the outcome of the generated requirements file depend on which wheel is installed (cpu vs non-cpu)? If so, is it because the dependencies of torch change?
Yes - specifically

```
torch==2.8.0+cpu
```

has a `+cpu` suffix, and some GPU requirements (which are not ASF-license compliant) are omitted.
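To make the difference concrete, an illustrative contrast between CPU and GPU freeze output (the package names and versions below are examples, not lines taken from the generated files):

```bash
# CPU-wheel environment (what the non-vllm requirements file captures):
#   torch==2.8.0+cpu
# GPU-wheel environment (what a vllm/GPU requirements file would capture):
#   torch==2.8.0
#   nvidia-cublas-cu12==...   # example GPU-only dependency
#   nvidia-cudnn-cu12==...    # example; such deps are not ASF-license compliant
```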
Thanks!
While we can't actually package this in a Dockerfile because of licensing issues, this provides a reproducible set of requirements that we can build/test against, and which external users can use to generate their own containers.
We can also eventually point to third-party providers who publish these containers (e.g. I plan to publish this in a Google-maintained container registry).
Currently, this scopes out Python 3.9 and 3.13.
With these changes, I was able to auto-generate #36650 by running the update Python dependencies action.
Part of #35487
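As a hedged illustration of how an external user might consume the generated requirements (the file name below is hypothetical, not a path from this PR):

```bash
# Assumption: install the generated, pinned GPU requirements into a fresh
# virtualenv so builds/tests run against a reproducible dependency set.
python -m venv gpu-env
source gpu-env/bin/activate
pip install -r base_image_requirements_gpu.txt
```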
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.