Replies: 1 comment
-
This is a textbook distributed connection/healthcheck failure, exactly the sort of cross-process bug that keeps popping up in vLLM + Ray setups. (In our issue map it is classified under "distributed infra: stale connection pool / cluster node health desync".) Often the Ray cluster passes the built-in healthcheck but still fails when vLLM tries to schedule or allocate resources. Quick things to check: socket state between the pods, firewall rules, and subtle config drift between nodes.
If you want the step-by-step diagnosis checklist or a full breakdown of these connection issues, let me know and I'll share the reference.
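To make that concrete, here is a minimal sketch of what such checks could look like from the vLLM engine pod, assuming a KubeRay-style head service; the service name, ports, and the `ray list nodes` state-API call are illustrative assumptions, not details from the original setup.

```bash
# Minimal diagnostic sketch; run from the vLLM engine pod.
# The service name and ports below are placeholders, not taken from this setup.
RAY_HEAD=raycluster-head-svc.default.svc.cluster.local

# Per-node health: a node can answer the GCS healthcheck yet be marked DEAD or
# report zero resources, which is the scheduling/healthcheck desync described above.
ray list nodes --address="$RAY_HEAD:6379"

# Firewall / socket reachability for ports beyond the GCS port, e.g. the
# dashboard (8265) and Ray client server (10001), which a basic healthcheck ignores.
for port in 8265 10001; do
  (echo > "/dev/tcp/$RAY_HEAD/$port") >/dev/null 2>&1 \
    && echo "port $port reachable" || echo "port $port blocked"
done
```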
-
I've been attempting to connect a vLLM engine (as part of KubeAI) to a Ray cluster (deployed by KubeRay) and have not had much success. For some reason it is unable to generate the file node_ip_address.json. I can confirm that if I run `ray status` in the vLLM engine pod, I see exactly the same output as in the Ray cluster head pod, so vLLM is able to communicate with Ray. These are the logs from vLLM. Executing a health check from the vLLM engine pod returns an exit code of 0, which means the Ray cluster's health is allegedly OK.
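For reference, the checks described above look roughly like this; the head address is a placeholder, and the exact health-check invocation is an assumption on my part (Ray ships a `ray health-check` CLI command that KubeRay-style probes commonly use).

```bash
# Placeholder head address; substitute the KubeRay head service and GCS port actually used.
RAY_HEAD=raycluster-head-svc.default.svc.cluster.local:6379

# Should print the same node/resource summary as running `ray status` on the head pod.
ray status --address="$RAY_HEAD"

# An exit code of 0 here only confirms the GCS endpoint is reachable; it says nothing
# about whether vLLM can create its local session files (e.g. node_ip_address.json).
ray health-check --address="$RAY_HEAD"
echo "health check exit code: $?"
```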
Has anyone seen the same behaviour before but successfully connected vLLM to an external Ray cluster?
Engine Config:
Versions:
Platform:
Stack Trace: