feat: upgrade ray to v2.53.0 and vllm to v0.11.2 for static node clusters#274
Open
Levi080513 wants to merge 5 commits into main
Conversation
Notes:

1. Currently only NVIDIA GPU static clusters are upgraded. AMD GPU clusters are pending full testing when resources become available.
2. Upgrading from v1.0.0 to v1.0.1 involves breaking changes: endpoints need to be updated to work with v1.0.1 clusters, as v1.0.1 no longer supports vLLM v0.8.5.

Changes:

- Filter deprecated `--dashboard-grpc-port` and `--dashboard-agent-grpc-port` flags based on cluster version (> v1.0.0) in the Go reconciler, with safety-net filtering in `start.py`
- Update the vmagent relabel regex to handle both `ray_vllm:` (old) and `ray_vllm_` (new) metric prefixes for OpenTelemetry compatibility
- Switch from `RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper` to `RAY_process_group_cleanup_enabled` for clusters > v1.0.0, which doesn't cause parent processes to lose child exit codes
- Remove `VLLM_SKIP_P2P_CHECK` for new clusters, since `RAY_process_group_cleanup_enabled` doesn't break vLLM's P2P check
- Use a `> v1.0.0` threshold for all version checks to correctly handle pre-release versions (e.g., `v1.0.1-alpha.1`)
- Sync `test_chwbl_cache_key.py` with the actual `chwbl_scheduler.py` implementation

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
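The `> v1.0.0` threshold matters because a pre-release such as `v1.0.1-alpha.1` must count as newer than `v1.0.0` while still sorting before `v1.0.1`, so a `>= v1.0.1` check would wrongly exclude it. A minimal Python sketch of that ordering (a hypothetical helper for illustration, not the reconciler's actual Go code; the lexical pre-release comparison is a simplification of full semver):

```python
def parse(version: str):
    """Split 'v1.0.1-alpha.1' into ((1, 0, 1), 0, 'alpha.1').

    The middle field makes a pre-release sort before its final release,
    as in semver: v1.0.1-alpha.1 < v1.0.1. Comparing the pre-release tag
    lexically is a simplification (semver compares dotted identifiers).
    """
    base, _, pre = version.lstrip("v").partition("-")
    nums = tuple(int(p) for p in base.split("."))
    return (nums, 0 if pre else 1, pre)


def newer_than(a: str, b: str) -> bool:
    """True if version string a is strictly newer than b."""
    return parse(a) > parse(b)


# A "> v1.0.0" check includes pre-releases of v1.0.1 ...
assert newer_than("v1.0.1-alpha.1", "v1.0.0")
# ... which a ">= v1.0.1" check would exclude, since the
# pre-release still sorts before the v1.0.1 release itself.
assert not newer_than("v1.0.1-alpha.1", "v1.0.1")
```

This is why every feature gate in the PR compares against v1.0.0 rather than v1.0.1.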
a6578e1 to
58c3059
Compare
… API

- Move CHWBL custom params to `initialize_state()` to match the Ray 2.53.0 `RequestRouter` API (`request_router_kwargs` is passed to `initialize_state`, not `__init__`)
- Remove the redundant `curr_replicas` property from both schedulers (already provided by the base `RequestRouter` class)
- Remove the unnecessary `threading.Lock` (Ray Serve runs in a single-threaded asyncio event loop)
- Fix CHWBL to use its own load balancing when the initial replica is not a candidate, instead of falling back to Ray's default scheduling

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
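The CHWBL fallback fix above can be illustrated with a self-contained sketch of consistent hashing with bounded loads (illustrative only: the class, vnode count, and load-bound formula are assumptions, not the repo's `chwbl_scheduler.py`):

```python
import hashlib
import math
from bisect import bisect


class CHWBL:
    """Consistent hashing with bounded loads: walk the ring clockwise from
    the key's position until a candidate replica under the load bound is found."""

    def __init__(self, replicas, load_factor=1.25, vnodes=64):
        self.load_factor = load_factor
        self.loads = {r: 0 for r in replicas}
        # Each replica owns several virtual nodes for smoother distribution.
        self.ring = sorted(
            (self._hash(f"{r}#{i}"), r) for r in replicas for i in range(vnodes)
        )

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def pick(self, key: str, candidates=None) -> str:
        candidates = set(candidates if candidates is not None else self.loads)
        total = sum(self.loads.values())
        bound = math.ceil(self.load_factor * (total + 1) / len(candidates))
        start = bisect(self.ring, (self._hash(key),))
        for i in range(len(self.ring)):
            _, replica = self.ring[(start + i) % len(self.ring)]
            if replica in candidates and self.loads[replica] + 1 <= bound:
                self.loads[replica] += 1
                return replica
        # The key's "home" replica was not a candidate or was overloaded:
        # keep balancing ourselves (least-loaded candidate) rather than
        # deferring to the framework's default scheduling.
        replica = min(candidates, key=self.loads.__getitem__)
        self.loads[replica] += 1
        return replica
```

The final branch is the behavior the commit describes: when the hashed-to replica is unavailable, the router still applies its own balancing instead of handing the request back to Ray's default policy.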
…assertion error

Ray 2.53.0 defaults `RAY_enable_open_telemetry` to true, but its `reporter_agent` has an assertion that fails when different vLLM endpoints register the same histogram metric with different bucket boundaries (due to different `max_model_len`). This crashes the entire OTLP export RPC, dropping all metrics in that batch. Fall back to OpenCensus to avoid this issue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
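The failure mode can be reproduced in miniature: if an exporter assumes one bucket layout per histogram name, two endpoints deriving buckets from their own `max_model_len` collide on registration. This is a hypothetical sketch of that assumption, not Ray's actual `reporter_agent` code; the metric name and bucket formula are made up:

```python
def token_buckets(max_model_len: int, n: int = 4):
    """Bucket boundaries scaled to the model's context length (illustrative)."""
    return [max_model_len * (i + 1) // n for i in range(n)]


registry = {}


def register(name: str, buckets) -> None:
    # Mimics an exporter that asserts a metric name always carries the
    # same bucket boundaries; one mismatch aborts the whole export batch.
    if name in registry:
        assert registry[name] == buckets, f"bucket mismatch for {name}"
    registry[name] = buckets


register("vllm:request_prompt_tokens", token_buckets(8192))       # endpoint A
try:
    register("vllm:request_prompt_tokens", token_buckets(32768))  # endpoint B
except AssertionError as exc:
    print("export batch dropped:", exc)
```

Because the assertion fires inside the shared export path, endpoint B's mismatch takes endpoint A's metrics down with it, which is why the PR disables OpenTelemetry rather than trying to align bucket boundaries across endpoints.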
Contributor
What will happen if users:
Can the new Ray Serve actor spin up?
Collaborator
Author
The new Ray Serve actor will keep failing, because the static-node cluster does not currently support multiple versions of the inference engine. This will be resolved once support for multiple inference engine versions is available.
Set the env var in the Dockerfile so the fix applies regardless of control plane version or deployment mode (K8s/SSH). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Since vLLM v0.9.0, --enable-reasoning is deprecated. The reasoning_parser parameter alone controls whether reasoning is enabled - passing it directly to both engine and serving layers is sufficient. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
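Version-gating the flag can be sketched as follows (a hypothetical helper for illustration; the CLI flag names match vLLM's, but the function itself is not the repo's code):

```python
def reasoning_flags(reasoning_parser, vllm_version):
    """Build reasoning-related vLLM CLI flags for a given vLLM version.

    From v0.9.0 on, passing --reasoning-parser alone enables reasoning;
    the separate --enable-reasoning switch is deprecated.
    """
    if not reasoning_parser:
        return []
    flags = ["--reasoning-parser", reasoning_parser]
    major, minor, *_ = (int(p) for p in vllm_version.lstrip("v").split("."))
    if (major, minor) < (0, 9):
        # Pre-v0.9.0 vLLM still required the explicit enable switch.
        flags.append("--enable-reasoning")
    return flags
```

With vLLM pinned to v0.11.2, the deprecated switch is simply never emitted, which is the behavior this commit adopts.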
Upgrade Ray from v2.44.1 to v2.53.0 and vLLM from v0.8.5 to v0.11.2 for static node clusters (serving version > v1.0.0).

Notes:

1. Currently only NVIDIA GPU static clusters are upgraded; AMD GPU clusters are pending full testing when resources become available.
2. Upgrading from v1.0.0 to v1.0.1 involves breaking changes: endpoints need to be updated to work with v1.0.1 clusters, as v1.0.1 no longer supports vLLM v0.8.5.

Changes:

- Migrate to the `RequestRouter`/`RequestRouterConfig` API and the vLLM V1 `AsyncLLM` engine
- Add `NeutreeRayStatLogger` to export vLLM metrics via Ray gauges, replacing the removed `RayPrometheusStatLogger`
- Update the schedulers (`chwbl_scheduler.py`, `static_hash_scheduler.py`) for the Ray 2.53.0 `RequestRouter` API
- Filter deprecated `--dashboard-grpc-port` and `--dashboard-agent-grpc-port` flags based on cluster version (> v1.0.0) in the Go reconciler, with safety-net filtering in `start.py`
- Update the vmagent relabel regex to handle both `ray_vllm:` (old) and `ray_vllm_` (new) metric prefixes for OpenTelemetry compatibility
- Switch from `RAY_kill_child_processes_on_worker_exit_with_raylet_subreaper` to `RAY_process_group_cleanup_enabled` for clusters > v1.0.0, removing the `VLLM_SKIP_P2P_CHECK` workaround for new clusters
- Use a `> v1.0.0` threshold for all version checks to correctly handle pre-release versions (e.g., `v1.0.1-alpha.1`)
- Add `ray_version` and `accelerators` inputs to the release-serve workflow
- Sync `test_chwbl_cache_key.py` with the actual `chwbl_scheduler.py` implementation
- Disable OpenTelemetry (`RAY_enable_open_telemetry=false`) due to the metrics loss issue and fall back to OpenCensus; set in both the Dockerfile (for cross-version/cross-mode compatibility) and the SSH Docker run options
- Remove the `enable_reasoning` flag and rely solely on `reasoning_parser` to control reasoning mode (aligned with the vLLM v0.9.0+ deprecation)

Test
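The dual-prefix relabeling can be sanity-checked with a small pattern. This is an assumption: the PR's actual vmagent regex isn't shown here, and vmagent uses RE2 rather than Python's `re`, but the matching behavior for these prefixes is the same:

```python
import re

# Assumed relabel pattern: match both the old colon-delimited prefix
# (ray_vllm:, OpenCensus-era) and the new underscore prefix
# (ray_vllm_, OpenTelemetry-era) with one character class.
PREFIX = re.compile(r"^ray_vllm[:_]")

assert PREFIX.match("ray_vllm:num_requests_running")      # old-style name
assert PREFIX.match("ray_vllm_num_requests_running")      # new-style name
assert PREFIX.match("ray_gcs_placement_group") is None    # unrelated metric
```

A single rule covering both spellings means dashboards keep working across mixed-version clusters during the upgrade window.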