# KubeRay APIServer v2 Migration Plan

## Target Customers

- Advanced users and early adopters who currently run APIServer v1 in production
- Infrastructure and platform engineers who have customized or extended the KubeRay APIServer for internal tooling or environments

## Overview

KubeRay APIServer v2 introduces a more maintainable, Kubernetes-native, and flexible interface for managing Ray clusters.

In v1, exposing new fields required modifying protobuf definitions, regenerating HTTP/gRPC clients, and updating tests, a time-consuming and error-prone process that often delayed support for new features. This manual synchronization between CRDs and protobuf definitions added significant maintenance overhead and slowed developer velocity.
| 16 | + |
| 17 | +With v2, we eliminate these bottlenecks by directly reusing the OpenAPI schema defined in the Kubernetes CRDs. |
| 18 | +Instead of manually defining fields or generating new clients, APIServer v2 acts as a transparent HTTP proxy to the |
| 19 | +Kubernetes API server. All CRD fields are exposed by default, and advanced behaviors such as compute template |
| 20 | +injection, default values, and mutations can be implemented using user-defined middleware functions (UDFs). |
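As a minimal sketch of the middleware idea, a UDF can mutate the request body before the proxy forwards it to the Kubernetes API server. This is illustrative Python, not the actual KubeRay implementation (which is written in Go); the `inject_defaults` UDF and the handler wiring are hypothetical, though the `headGroupSpec`/`rayStartParams` fields do come from the RayCluster CRD:

```python
import json


def inject_defaults(body: dict) -> dict:
    """Hypothetical UDF: fill in default values before proxying."""
    spec = body.setdefault("spec", {})
    head = spec.setdefault("headGroupSpec", {})
    # Default the Ray start parameters if the caller omitted them.
    head.setdefault("rayStartParams", {"dashboard-host": "0.0.0.0"})
    return body


def with_mutation(handler, mutate):
    """Wrap a proxy handler so the request body is mutated first."""
    def wrapped(raw_body: str) -> str:
        body = mutate(json.loads(raw_body))
        return handler(json.dumps(body))
    return wrapped


def echo_handler(raw: str) -> str:
    # Stand-in for the real proxy handler, which would forward the
    # request to the Kubernetes API server.
    return raw


handler = with_mutation(echo_handler, inject_defaults)
out = json.loads(handler(json.dumps({"spec": {}})))
print(out["spec"]["headGroupSpec"]["rayStartParams"])
# {'dashboard-host': '0.0.0.0'}
```

Because mutation is layered around a generic handler, new behaviors (validation, template injection) compose without touching the proxy core.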
| 21 | + |
| 22 | +To simplify the system further, gRPC support has been removed in favor of HTTP-only APIs. This approach aligns |
| 23 | +better with native Kubernetes tooling and improves extensibility, maintainability, and onboarding for |
| 24 | +infrastructure engineers. This document outlines the major changes from APIServer v1 to v2. |
| 25 | +existing workflows. |

## What’s Changed: v1 vs v2
| 28 | + |
| 29 | +| Category | v1 (Legacy) | v2 (New) | |
| 30 | +|-----------------------------|------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------| |
| 31 | +| API protocol | HTTP + gRPC | HTTP only (gRPC removed) | |
| 32 | +| Field exposure model | Manually defined in protobuf + requires codegen | Auto-reflected from Kubernetes CRD OpenAPI schema | |
| 33 | +| Client code generation | Required (for HTTP and gRPC clients) | Not required (simple HTTP proxy with path rewrite) | |
| 34 | +| Adding new fields | Requires protobuf updates, codegen, and tests | No protobuf needed; all fields exposed automatically via OpenAPI + optional UDF | |
| 35 | +| Compute template support | Requires hardcoded logic in API server | Handled via HTTP middleware/UDF; official support starts in Stage 2 | |
| 36 | +| Pagination support | Manual implementation required | Natively supported using Kubernetes API pagination | |
| 37 | +| Documentation policy | Partial or undocumented | Official documentation includes only v2 | |
| 38 | +| Maintenance and extensibility | High overhead: PR needed per new field, manual sync with operator | Lightweight: CRD fields auto-reflected, extensible via middleware, 50% less maintenance | |
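The "simple HTTP proxy with path rewrite" row can be illustrated with a small sketch. The v1-style route shape below is an assumption for illustration, not the actual v2 route table; the rewritten target, however, is the standard Kubernetes API path for the RayCluster CRD (group `ray.io`, version `v1`):

```python
import re


def rewrite_path(path: str) -> str:
    """Hypothetical rewrite: map an APIServer-style cluster path onto
    the Kubernetes API path for the RayCluster CRD."""
    m = re.fullmatch(r"/apis/v1/namespaces/([^/]+)/clusters(?:/([^/]+))?", path)
    if m is None:
        return path  # unrecognized paths pass through unchanged
    namespace, name = m.groups()
    base = f"/apis/ray.io/v1/namespaces/{namespace}/rayclusters"
    return f"{base}/{name}" if name else base


print(rewrite_path("/apis/v1/namespaces/default/clusters/demo"))
# /apis/ray.io/v1/namespaces/default/rayclusters/demo
```

Since the rewritten paths are ordinary Kubernetes API paths, list endpoints inherit the API server's native pagination (`limit` and `continue` query parameters) for free.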

## Migration Plan

The migration from v1 to v2 will occur in three stages:

- Stage 1 – v2 released and validated
  - v2 becomes publicly available and feature-complete
  - All major features from v1 are supported in v2
  - gRPC support is not included in v2, but remains available in v1 during the deprecation period
  - Integration tests are reused where applicable
  - Middleware-based compute template support is deferred to Stage 2
- Stage 2 – v1 deprecated (v1 and v2 coexist)
  - v2 becomes the primary supported interface
  - v1 remains for compatibility but receives no new features
  - Warning messages may be surfaced for v1 usage
  - A proposal is under consideration to refactor the API server as a reusable SDK (e.g., apiserversdk)
- Stage 3 – v1 fully removed
  - All v1-related code and tests are deleted
  - v2 becomes the only supported API server implementation

## Dev Progress

- Not all v2 features will ship in the KubeRay v1.4 release
- Essential v1 features, such as creating KubeRay custom resources (CRs), are supported
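As a sketch of what creating a KubeRay CR through v2 looks like: because v2 proxies the Kubernetes API, the request targets the CRD path directly. The base URL and port, and the minimal cluster spec, are assumptions for illustration only:

```python
import json
import urllib.request


def build_create_request(base_url: str, namespace: str,
                         cluster: dict) -> urllib.request.Request:
    """Build the HTTP request that creates a RayCluster CR through the
    v2 proxy; the path mirrors the CRD's API group and version."""
    url = f"{base_url}/apis/ray.io/v1/namespaces/{namespace}/rayclusters"
    return urllib.request.Request(
        url,
        data=json.dumps(cluster).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_create_request(
    "http://localhost:31888",  # assumed APIServer v2 address
    "default",
    {
        "apiVersion": "ray.io/v1",
        "kind": "RayCluster",
        "metadata": {"name": "demo-cluster"},
        "spec": {"headGroupSpec": {"rayStartParams": {}}},
    },
)
# Sending the request (urllib.request.urlopen(req)) requires a running
# APIServer v2 instance, so it is not executed here.
```

The same request shape works for any KubeRay CRD (RayJob, RayService) by substituting the resource name in the path.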

## Issue Report

- When filing GitHub issues, please tag them with `apiserver-v1` or `v2`
- If no version tag is added, or the issue content does not mention a version, we’ll assume it refers to v2