Fix Ray placement group allocation not respecting env VLLM_RAY_PER_WORKER_GPUS (fractional GPU) #22577
Conversation
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.
Code Review
This pull request aims to respect the `VLLM_RAY_PER_WORKER_GPUS` environment variable when creating Ray placement groups, which is crucial for fractional GPU allocation. The changes correctly replace the hardcoded GPU request of `1.0` with the value from the environment variable. My review focuses on a potential side effect for users with multi-GPU setups. While the change is valid for your use case, it could lead to multiple workers being co-located on the same GPU in a multi-GPU scenario, which is generally unsupported and may cause failures. I've suggested adding a warning to alert users to this possibility.
```diff
@@ -338,6 +339,7 @@ def initialize_ray_cluster(
     else:
         logger.info("No current placement group found. "
                     "Creating a new placement group.")
+        device_resource_request = envs.VLLM_RAY_PER_WORKER_GPUS
```
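To make the effect of this one-line change concrete, here is a minimal standalone sketch (not vLLM's actual executor code) of how the placement group bundles would be built from the environment variable instead of a hardcoded `1.0`; the `world_size` value, the bundle shape, and the `num_gpus=1` cluster setting are illustrative assumptions.

```python
# Illustrative sketch only -- not vLLM's real implementation. It shows the idea
# behind the diff: each worker bundle requests the (possibly fractional) GPU
# amount from VLLM_RAY_PER_WORKER_GPUS rather than a hardcoded 1.0.
import os

import ray
from ray.util.placement_group import placement_group

# Advertise one logical GPU so the sketch runs even without physical hardware.
ray.init(num_gpus=1)

world_size = 1  # assumed number of workers, for illustration
per_worker_gpus = float(os.getenv("VLLM_RAY_PER_WORKER_GPUS", "1.0"))

# Previously every bundle was effectively {"GPU": 1.0}; with this change the
# request follows the environment variable, e.g. {"GPU": 0.9}.
bundles = [{"GPU": per_worker_gpus} for _ in range(world_size)]

pg = placement_group(bundles, strategy="PACK")
ray.get(pg.ready())  # blocks until Ray can satisfy every bundle request
print("placement group ready with bundles:", bundles)
```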
Using a fractional `VLLM_RAY_PER_WORKER_GPUS` when `world_size > 1` can lead to multiple workers from the same tensor-parallel group being scheduled on the same GPU. This is generally not supported and can cause failures.

While you've noted this is for a specific use case, this change could unintentionally affect users with multi-GPU setups. The previous implementation requested a full GPU (`1.0`) for each worker in the placement group, which prevented this co-location scenario. This PR changes that behavior.

To mitigate this risk for other users, please add a warning when `world_size > 1` and a fractional GPU value is used.
```python
device_resource_request = envs.VLLM_RAY_PER_WORKER_GPUS
if parallel_config.world_size > 1 and device_resource_request < 1.0:
    logger.warning(
        "VLLM_RAY_PER_WORKER_GPUS is set to %f, which is less than 1.0. "
        "When using multi-GPU inference (world_size > 1), this can "
        "cause multiple workers to be placed on the same GPU, which "
        "is not supported and may lead to unexpected behavior or "
        "failures. Please ensure that each worker is placed on a "
        "separate GPU.", device_resource_request)
```
Signed-off-by: eric-higgins-ai <[email protected]>
Purpose
This is pretty much the same change as #14521, but with the `STRICT_SPREAD` change removed. Just to add some context about our use case and address the comments raised on that PR (a sketch of the setup follows this list):

- We need to either create the actor with `num_gpus>0` or create it in a placement group with GPUs, and we don't want to do the latter for a reason that's not important here. Since vLLM requires 1 worker on the same node as the engine, we need to set e.g. `num_gpus=0.1` and `VLLM_RAY_PER_WORKER_GPUS=0.9`, but since this placement group validation doesn't respect `VLLM_RAY_PER_WORKER_GPUS`, this is impossible right now.
- One of the comments raised allowing a `placement_group` to be passed into `LLMEngine.from_engine_args`.
- If `VLLM_RAY_PER_WORKER_GPUS <= 0.5`, then it's already possible that multiple workers could share a GPU.
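As a sketch of the setup described above (all concrete values here, including the model name, `num_gpus=0.1`, and `VLLM_RAY_PER_WORKER_GPUS=0.9`, are illustrative rather than prescriptive): the engine lives inside a Ray actor that holds a small slice of a GPU, and the vLLM Ray worker is expected to request the remaining fraction, which only works once the placement group validation honours the environment variable.

```python
# Hedged sketch of the use case, not a supported recipe: run the vLLM engine
# inside a Ray actor that reserves a small GPU fraction, while the Ray worker
# spawned by vLLM requests the rest via VLLM_RAY_PER_WORKER_GPUS.
import os

import ray


@ray.remote(num_gpus=0.1)  # the engine actor itself holds a sliver of the GPU
class EngineActor:
    def __init__(self):
        # Must be set before vLLM initializes its Ray executor so the
        # placement group request uses the fractional value.
        os.environ["VLLM_RAY_PER_WORKER_GPUS"] = "0.9"
        from vllm import LLM

        self.llm = LLM(
            model="facebook/opt-125m",  # placeholder model
            tensor_parallel_size=1,
            distributed_executor_backend="ray",
        )

    def generate(self, prompt: str):
        return [out.outputs[0].text for out in self.llm.generate([prompt])]


ray.init()
engine = EngineActor.remote()
print(ray.get(engine.generate.remote("Hello, my name is")))
```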
Test Plan
We applied the patch internally and ran some jobs, and checked that they all started up successfully.