
feat: upgrade ray to v2.53.0 and vllm to v0.11.2 for static node clusters#274

Open
Levi080513 wants to merge 5 commits into main from hw/bump-ray-vllm

Conversation

Collaborator

@Levi080513 Levi080513 commented Feb 12, 2026

Issues

Upgrade Ray from v2.44.1 to v2.53.0 and vLLM from v0.8.5 to v0.11.2 for static node clusters (serving version > v1.0.0).

Notes:

  1. Currently only NVIDIA GPU static node clusters are upgraded. AMD GPU cluster image has not been adapted or tested yet, pending resources for full validation.
  2. Upgrading from v1.0.0 to v1.0.1 involves breaking changes: Endpoints need to be updated to work with v1.0.1 clusters, as v1.0.1 may no longer support vLLM v0.8.5.

Changes

  • Upgrade Ray base image to 2.53.0 and vLLM to v0.11.2, adapt app.py for new Ray RequestRouter/RequestRouterConfig API and vLLM V1 AsyncLLM engine
  • Add NeutreeRayStatLogger to export vLLM metrics via Ray gauge, replacing the removed RayPrometheusStatLogger
  • Adapt custom schedulers (chwbl_scheduler.py, static_hash_scheduler.py) for Ray 2.53.0 RequestRouter API
  • Filter deprecated --dashboard-grpc-port and --dashboard-agent-grpc-port flags based on cluster version (> v1.0.0) in Go reconciler, with safety net filtering in start.py
  • Update vmagent relabel regex to handle both ray_vllm: (old) and ray_vllm_ (new) metric prefixes for OpenTelemetry compatibility
  • Switch from RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper to RAY_process_group_cleanup_enabled for clusters > v1.0.0, removing VLLM_SKIP_P2P_CHECK workaround for new clusters
  • Use > v1.0.0 threshold for all version checks to correctly handle pre-release versions (e.g., v1.0.1-alpha.1)
  • Add ray_version and accelerators inputs to release-serve workflow
  • Sync test_chwbl_cache_key.py with actual chwbl_scheduler.py implementation
  • Reduce Ray object store memory from default 30% to 10% to free memory for inference engines
  • Disable OTEL metrics backend (RAY_enable_open_telemetry=false) due to metrics loss issue, fall back to OpenCensus. Set in both Dockerfile (for cross-version/cross-mode compatibility) and SSH Docker run options
  • Remove redundant enable_reasoning flag, rely solely on reasoning_parser to control reasoning mode (aligned with vLLM v0.9.0+ deprecation)
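The version-threshold point above can be sketched in Python (the real check lives in the Go reconciler; this simplified comparator is illustrative only, and compares pre-release identifiers lexically rather than with full semver precedence):

```python
def parse(version: str):
    """Simplified semver parse: ((major, minor, patch), pre-release marker)."""
    core, _, pre = version.lstrip("v").partition("-")
    release = tuple(int(part) for part in core.split("."))
    # A pre-release sorts below its corresponding final release, so encode
    # "no pre-release" as the higher marker.
    return (release, (1,) if not pre else (0, pre))

def is_upgraded_cluster(version: str) -> bool:
    # "> v1.0.0" deliberately includes pre-releases of later versions,
    # e.g. v1.0.1-alpha.1, which a ">= v1.0.1" check would miss.
    return parse(version) > parse("v1.0.0")

print(is_upgraded_cluster("v1.0.1-alpha.1"))  # True
print(is_upgraded_cluster("v1.0.0"))          # False
```

This is why the PR uses `> v1.0.0` rather than `>= v1.0.1`: a pre-release such as `v1.0.1-alpha.1` sorts below `v1.0.1` but above `v1.0.0`, so the stricter threshold would misclassify it as an old cluster.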

Test

  • Manual E2E testing on NVIDIA GPU static node cluster ✅
  • Old version (v1.0.0) static node cluster backward compatibility testing ✅

@Levi080513 Levi080513 force-pushed the hw/bump-ray-vllm branch 2 times, most recently from 6ee6236 to 8b3088f Compare February 12, 2026 13:59

codecov bot commented Feb 12, 2026

Codecov Report

❌ Patch coverage is 79.41176% with 7 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
internal/cluster/ray_ssh_operation.go 75.00% 3 Missing and 4 partials ⚠️


Notes:
1. Currently only NVIDIA GPU static clusters are upgraded. AMD GPU
   clusters are pending full testing when resources become available.
2. Upgrading from v1.0.0 to v1.0.1 involves breaking changes: Endpoints
   need to be updated to work with v1.0.1 clusters, as v1.0.1 no longer
   supports vLLM v0.8.5.

Changes:
- Filter deprecated --dashboard-grpc-port and --dashboard-agent-grpc-port
  flags based on cluster version (> v1.0.0) in Go reconciler, with
  safety net filtering in start.py
- Update vmagent relabel regex to handle both ray_vllm: (old) and
  ray_vllm_ (new) metric prefixes for OpenTelemetry compatibility
- Switch from RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper
  to RAY_process_group_cleanup_enabled for clusters > v1.0.0, which
  doesn't cause parent processes to lose child exit codes
- Remove VLLM_SKIP_P2P_CHECK for new clusters since
  RAY_process_group_cleanup_enabled doesn't break vLLM's P2P check
- Use > v1.0.0 threshold for all version checks to correctly handle
  pre-release versions (e.g., v1.0.1-alpha.1)
- Sync test_chwbl_cache_key.py with actual chwbl_scheduler.py implementation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
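The dual-prefix relabeling mentioned above can be sanity-checked with a short sketch (the pattern below is illustrative, not the exact vmagent relabel config):

```python
import re

# Old OpenCensus-style metric names use the "ray_vllm:" prefix; the new
# OpenTelemetry-compatible names use "ray_vllm_". One pattern must capture both.
VLLM_METRIC = re.compile(r"^ray_vllm[:_](?P<name>.+)$")

for metric in ("ray_vllm:num_requests_running", "ray_vllm_num_requests_running"):
    m = VLLM_METRIC.match(metric)
    print(m.group("name"))  # num_requests_running in both cases
```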
@Levi080513 Levi080513 marked this pull request as ready for review February 14, 2026 05:51
@Levi080513 Levi080513 changed the title feat: upgrade ray to v2.53.0 and vllm to v0.11.2 feat: upgrade ray to v2.53.0 and vllm to v0.11.2 for static node clusters Feb 14, 2026
@Levi080513 Levi080513 requested a review from Yuyz0112 February 14, 2026 05:56
… API

- Move CHWBL custom params to initialize_state() to match Ray 2.53.0
  RequestRouter API (request_router_kwargs passes to initialize_state,
  not __init__)
- Remove redundant curr_replicas property from both schedulers (already
  provided by base RequestRouter class)
- Remove unnecessary threading.Lock (Ray Serve runs in single-threaded
  asyncio event loop)
- Fix CHWBL to use own load balancing when initial replica is not a
  candidate instead of falling back to Ray's default scheduling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
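The CHWBL behavior in the last point can be sketched without Ray: a minimal consistent-hashing-with-bounded-load pick that keeps walking the ring when the hashed replica is overloaded, instead of handing the request back to a default scheduler. All names here are hypothetical; the real router subclasses Ray's `RequestRouter`:

```python
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

def choose_replica(session_key, candidates, loads, bound=1.25):
    """Pick a replica via consistent hashing with bounded load.

    candidates: replica ids placed on the hash ring
    loads: current in-flight request count per replica id
    """
    ring = sorted(candidates, key=_hash)
    avg = (sum(loads.get(r, 0) for r in ring) / len(ring)) or 1.0
    start = _hash(session_key) % len(ring)
    # Walk the ring from the hashed position; always settle on some
    # candidate rather than deferring to an external default scheduler.
    for i in range(len(ring)):
        replica = ring[(start + i) % len(ring)]
        if loads.get(replica, 0) <= bound * avg:
            return replica
    return ring[start]  # defensive: keep hash affinity if all are overloaded
```

The same session key always hashes to the same starting position, so cache affinity is preserved until a replica's load exceeds `bound * avg`, at which point the walk spills over to the next candidate on the ring.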
…assertion error

Ray 2.53.0 defaults RAY_enable_open_telemetry to true, but its reporter_agent
has an assertion that fails when different vLLM endpoints register the same
histogram metric with different bucket boundaries (due to different max_model_len).
This crashes the entire OTLP Export RPC, dropping all metrics in that batch.
Fall back to OpenCensus to avoid this issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
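The failure mode can be reproduced in miniature: a registry that, like the reporter_agent, asserts one set of bucket boundaries per metric name trips as soon as two endpoints derive buckets from different `max_model_len` values (the helper names below are hypothetical):

```python
def buckets_for(max_model_len: int):
    # Each endpoint derives its histogram buckets from its own max_model_len.
    return [max_model_len // 8, max_model_len // 4, max_model_len // 2, max_model_len]

registry = {}

def register_histogram(name: str, buckets):
    # Mimics the reporter_agent assertion: identical boundaries per metric name.
    if name in registry:
        assert registry[name] == buckets, f"bucket mismatch for {name}"
    registry[name] = buckets

register_histogram("vllm:request_prompt_tokens", buckets_for(4096))
try:
    register_histogram("vllm:request_prompt_tokens", buckets_for(32768))
except AssertionError as e:
    print("export batch dropped:", e)
```

In Ray the assertion aborts the whole OTLP Export RPC, so every metric in that batch is lost, not just the conflicting histogram; falling back to OpenCensus sidesteps the shared registration path.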
@Yuyz0112
Contributor

Upgrading from v1.0.0 to v1.0.1 involves breaking changes: Endpoints need to be updated to work with v1.0.1 clusters, as v1.0.1 may no longer support vLLM v0.8.5.

What will happen if users:

  1. Upgrade to v1.0.1
  2. Suspend a v0.8.5 vllm endpoint
  3. Resume the endpoint again

Can the new ray serve actor spin up?

@Levi080513
Collaborator Author

@Yuyz0112

The new Ray Serve actor will keep failing, because the static node cluster does not currently support multiple versions of the inference engine.

This issue will be resolved once support for multiple inference engine versions is available.

Levi080513 and others added 2 commits February 25, 2026 11:46
Set the env var in the Dockerfile so the fix applies regardless of
control plane version or deployment mode (K8s/SSH).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since vLLM v0.9.0, --enable-reasoning is deprecated. The reasoning_parser
parameter alone controls whether reasoning is enabled - passing it directly
to both engine and serving layers is sufficient.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>