Use vLLM extra to generate GPU requirements files #36420
Conversation
@damccorm I used your changes plus fixed the OutOfMemory issue, but there is still an error when starting the vLLM server.
This PR is unrelated to that flakiness, so I wouldn't expect it to make a difference immediately. In the logs, I now see a bunch of failures. I'm not 100% sure what would cause that error, but it may be a consequence of trying to run Gemma on a T4 - could you try updating to an L4 to see if that solves the problem?
It helped; now I have a green run. I still need to investigate the reasons and perform more tests.
This reverts commit 1a5d907.
Codecov Report

✅ All modified and coverable lines are covered by tests.

Additional details and impacted files:

```
@@             Coverage Diff              @@
##           master   #36420       +/-   ##
=============================================
+ Coverage   36.24%   55.10%   +18.85%
  Complexity   1666     1666
=============================================
  Files        1060     1059        -1
  Lines      165704   165727       +23
  Branches     1195     1195
=============================================
+ Hits        60063    91320    +31257
+ Misses     103465    72231    -31234
  Partials     2176     2176
```

Flags with carried forward coverage won't be shown. ☔ View full report in Codecov by Sentry.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment `assign set of reviewers`.
assign set of reviewers
Assigning reviewers: R: @tvalentyn for label python. Note: If you would like to opt out of this review, comment `assign to next reviewer`.

Available commands:

The PR bot will only process comments in the main thread (not review comments).
```bash
INDEX_URL_OPTION="--extra-index-url https://download.pytorch.org/whl/cpu"
if [[ $EXTRAS == *"vllm"* ]]; then
  # Explicitly install torch to avoid https://github.com/facebookresearch/xformers/issues/740
  # This should be overwritten later since the vllm extra is installed alongside torch
```
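For context, a hedged sketch of how this branch might continue; the `pip` invocation and the clearing of `INDEX_URL_OPTION` below are assumptions for illustration, not the PR's verbatim code:

```bash
  # Assumption: install torch from the default index (GPU wheels); the
  # version installed here is superseded when the vllm extra pulls in
  # torch again during the main install.
  pip install torch
  # Assumption: drop the CPU extra-index so the vllm extra resolves GPU
  # wheels rather than +cpu builds.
  INDEX_URL_OPTION=""
fi
```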
> This should be overwritten later
I don't quite follow the comment. Could you please clarify what is being overwritten?
Thanks - I agree this was unclear. I was trying to say that the torch version we install here doesn't matter, since torch will get installed anyway later as part of the vllm install (and that will determine the actual version that gets installed). Hopefully the new comment is clearer.
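For illustration, a possible clearer wording for that inline comment (a sketch, not necessarily the exact text that was merged):

```bash
# The torch version pinned here is temporary: installing the vllm extra
# below pulls in torch again, and that install determines the final
# torch version captured in the generated requirements.
```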
```bash
source "$ENV_PATH"/bin/activate
pip install --upgrade pip setuptools wheel

# For non-vllm (non-gpu) requirement files, force downloading torch from CPU wheels
```
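A hedged sketch of how this CPU-wheel option might feed into generating the requirements file; `REQUIREMENTS_FILE` and the `pip install -e` target are hypothetical names for illustration:

```bash
# Assumption: install the SDK with the requested extras, passing the CPU
# extra-index (left unquoted so it expands into two arguments), then
# freeze the resolved environment into the generated requirements file.
pip install $INDEX_URL_OPTION -e ".[$EXTRAS]"
pip freeze > "$REQUIREMENTS_FILE"
```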
Does the outcome of the generated requirements file depend on which wheel is installed (cpu vs non-cpu)? If so, is it because the dependencies of torch change?
Yes - specifically

```
torch==2.8.0+cpu
```

has a `+cpu` suffix, and some GPU requirements (which are not ASF-license compliant) are omitted.
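To make the difference concrete, an illustrative contrast between CPU and GPU freeze output (the package names and versions below are examples, not lines taken from the generated files):

```bash
# CPU-wheel environment (what the non-vllm requirements file captures):
#   torch==2.8.0+cpu
# GPU-wheel environment (what a vllm/GPU requirements file would capture):
#   torch==2.8.0
#   nvidia-cublas-cu12==...   # example GPU-only dependency
#   nvidia-cudnn-cu12==...    # example; such deps are not ASF-license compliant
```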
Thanks!
While we can't actually package this in a Dockerfile because of licensing issues, this provides a reproducible set of requirements that we can build/test against, and which external users can use to generate their own containers.
We can also eventually point to third-party providers who publish these containers (e.g. I plan to publish this in a Google-maintained container registry).
Currently, this scopes out Python 3.9 and 3.13.
With these changes, I was able to auto-generate #36650 by running the update Python dependencies action.
Part of #35487
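As a hedged illustration of how an external user might consume the generated requirements (the file name below is hypothetical, not a path from this PR):

```bash
# Assumption: install the generated, pinned GPU requirements into a fresh
# virtualenv so builds/tests run against a reproducible dependency set.
python -m venv gpu-env
source gpu-env/bin/activate
pip install -r base_image_requirements_gpu.txt
```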
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Mention the appropriate issue in your description (for example: `addresses #123`), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment `fixes #<ISSUE NUMBER>` instead.
- Update `CHANGES.md` with noteworthy changes.

See the Contributor Guide for more tips on how to make review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.