
Conversation

@eric-higgins-ai commented Jul 17, 2025

Purpose

The engine core is run in a spawned subprocess, which Ray interprets as a new job with its own runtime environment. Because we don't pass the original job's runtime env through to that subprocess, anything provided via the Ray runtime environment (such as vllm itself as a pip dependency, or env vars) can't be pulled in by the workers it creates.
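
To illustrate the failure mode, here is a simplified stand-in (not vLLM's actual engine-core code): the spawned process calls ray.init() without a runtime_env, so Ray registers it as a fresh job that does not inherit the parent job's pip packages or env vars.

    # Simplified stand-in for the engine-core subprocess, not vLLM's real code.
    import multiprocessing

    def engine_core_main():
        import ray
        # No runtime_env is passed here, so Ray treats this process as a brand
        # new job; workers it spawns won't see the parent job's pip deps or env vars.
        ray.init()
        ray.shutdown()

    if __name__ == "__main__":
        proc = multiprocessing.Process(target=engine_core_main)
        proc.start()
        proc.join()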

This issue was reported here.

Test Plan

Ran a Ray job with the following code:

    import ray
    from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

    vision_processor_config = vLLMEngineProcessorConfig(
        model="Qwen/Qwen2.5-VL-32B-Instruct",
        engine_kwargs=dict(
            tensor_parallel_size=1,
            pipeline_parallel_size=NUMBER_OF_GPUS,  # placeholder defined elsewhere in the job script
            max_model_len=4096,
            enable_chunked_prefill=True,
            max_num_batched_tokens=2048,
            distributed_executor_backend="ray",
            device="cuda",
        ),
        # Override Ray's runtime env to include the Hugging Face token.
        # Ray Data uses Ray under the hood to orchestrate the inference pipeline.
        runtime_env=dict(
            env_vars=dict(
                HF_TOKEN="<token>",
                VLLM_USE_V1="1",
            ),
        ),
        batch_size=1,
        concurrency=1,
        has_image=False,
    )

    # Build the processor.
    processor = build_llm_processor(
        vision_processor_config,
        preprocess=lambda row: dict(
            messages=[
                {"role": "system", "content": "You are a bot that responds with haikus."},
                {"role": "user", "content": row["item"]},
            ],
            sampling_params=dict(
                temperature=0.3,
                max_tokens=250,
            ),
        ),
        postprocess=lambda row: dict(
            answer=row["generated_text"],
            **row,  # This returns all the original columns in the dataset.
        ),
    )

    # Create the dataset and run the pipeline.
    ds = ray.data.from_items(["Start of the haiku is: Complete this for me..."])
    ds = processor(ds)
    ds.show(limit=1)

Test Result

I checked in the Ray dashboard that the launched job has the runtime env provided in the engine_kwargs.
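
As an aside, one way to inspect which runtime_env a Ray worker actually sees from code (an illustrative sketch using public Ray APIs, not part of the test plan above):

    import ray

    @ray.remote
    def report_runtime_env():
        # Returns the runtime_env dict visible to the worker running this task.
        return dict(ray.get_runtime_context().runtime_env)

    if __name__ == "__main__":
        ray.init()
        print(ray.get(report_runtime_env.remote()))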


👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, covering a small and essential subset of tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment


Code Review

This pull request enables the propagation of a Ray runtime environment to vLLM's distributed workers. This is a useful feature when vLLM is used as a component within a larger Ray application that defines a specific runtime environment.

The changes are well-targeted:

  1. The ParallelConfig is extended to hold an optional runtime_env.
  2. When creating the engine configuration inside a Ray actor, the current runtime_env is fetched from the Ray context and stored in the ParallelConfig.
  3. When the Ray executor initializes the Ray cluster, it now passes this runtime_env to ray.init(), ensuring that subsequently created workers inherit the correct environment.

I've reviewed the implementation, and the logic appears sound and correctly handles the cases where Ray is already initialized versus when vLLM needs to initialize it. The changes are constrained to the Ray execution path and should not affect other backends. Overall, this is a good addition to improve vLLM's integration with the Ray ecosystem.
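
To make the three steps above concrete, here is a rough sketch of that flow. The field and helper names are assumptions for illustration, not the exact vLLM diff:

    # Rough sketch of the propagation flow described above; names are illustrative.
    from dataclasses import dataclass, field
    from typing import Optional

    import ray

    @dataclass
    class ParallelConfig:
        # ...other parallelism settings elided...
        ray_runtime_env: Optional[dict] = field(default=None)  # assumed field name

    def capture_runtime_env(parallel_config: ParallelConfig) -> None:
        # Step 2: inside a Ray actor, remember the current job's runtime_env.
        if ray.is_initialized():
            parallel_config.ray_runtime_env = dict(
                ray.get_runtime_context().runtime_env
            )

    def initialize_ray_cluster(
        parallel_config: ParallelConfig, ray_address: Optional[str] = None
    ) -> None:
        # Step 3: if the executor has to initialize Ray itself, forward the
        # captured runtime_env so newly created workers inherit it.
        if not ray.is_initialized():
            ray.init(address=ray_address, runtime_env=parallel_config.ray_runtime_env)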

@ruisearch42 (Collaborator) left a comment


Not sure if I fully understand the problem. @eric-higgins-ai Would you mind clarifying a bit more?
cc @lk-chen @kouroshHakha are you aware of this Ray Data LLM issue?

# This call initializes Ray automatically if it is not initialized,
# but we should not do this here.
placement_group = ray.util.get_current_placement_group()
runtime_env = ray.get_runtime_context().runtime_env
Collaborator (commenting on the diff above): What is passed in the runtime env?

@eric-higgins-ai (Author) commented Jul 20, 2025

@ruisearch42 Sorry, I probably should have explained it more thoroughly in the description. We're pulling vllm in via the Ray runtime environment, so we launch the job with something like ray job submit --runtime-env-json '{"pip": ["vllm==0.9.2"]}' -- python3 script.py. vllm spawns a subprocess for the engine core and then runs ray.init in that subprocess; since this process is independent from the main one, Ray interprets it as a new job with its own runtime environment. The current code doesn't pass any runtime environment into that ray.init call, so any tasks spawned by this new process (like the workers here) won't have vllm installed.

This issue was also reported a few days ago here (I intended to link it in the description but seem to have forgotten).

@ruisearch42 (Collaborator) commented Jul 21, 2025

@eric-higgins-ai thanks for the context. The PR generally looks good to me.
Some quick questions: is the main goal passing pip dependencies in runtime_env? Is using an image with vLLM not an option?

I checked in the Ray dashboard that the launched job has the runtime env provided in the engine_kwargs.

Can you clarify what runtime_env you saw in the Ray dashboard? Did you pass in pip dependencies?

@ruisearch42 (Collaborator) commented

@eric-higgins-ai , could you add a unit test? It will be useful to verify it works and prevent future regressions.
After that I will approve and merge the PR.
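
For reference, a rough idea of what such a regression test could assert using only public Ray APIs (an illustrative sketch, not necessarily the test that was eventually added):

    import os

    import ray

    def test_runtime_env_visible_to_workers():
        # If the runtime_env passed to ray.init() propagates, a task should
        # observe the env var we set here.
        ray.init(runtime_env={"env_vars": {"RUNTIME_ENV_MARKER": "1"}})
        try:
            @ray.remote
            def read_marker():
                return os.environ.get("RUNTIME_ENV_MARKER")

            assert ray.get(read_marker.remote()) == "1"
        finally:
            ray.shutdown()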

@eric-higgins-ai (Author) commented Jul 23, 2025

@ruisearch42 to answer your questions:

  1. The main goal is indeed to pass pip dependencies in runtime_env. We could build a Docker image with vllm, but (a) it doesn't integrate that well with our infra, and (b) it seems to me like this should be supported by vllm anyway, and it doesn't seem that hard to add support for it (considering that this PR is quite small).
  2. I passed in {"pip": ["vllm==0.9.2"]} and saw the same thing in the Ray dashboard. We're also passing some env vars and I saw those too.
  3. I'm a little busy with other things right now, but I'll try to add a unit test in the next few days.

@ruisearch42 (Collaborator) commented

Thanks @eric-higgins-ai.
I'm fixing the issue in this PR: #22040, similar to what's done here and with an added unit test.
Could you let me know your email? I will add you as co-author.

@eric-higgins-ai (Author) commented

Thanks for the fix @ruisearch42! I would rather not post my email here for fear of receiving spam 😅 I'm OK with not being credited as co-author.

@hmellor (Member) commented Aug 8, 2025

FYI @eric-higgins-ai: if you go to https://github.com/settings/emails, you can use a GitHub-provided noreply email for signing off commits.
