Commit e796f37

Implement the progressive rollout for raw deployment (kserve#4623)
Signed-off-by: Vincent Hou <shou73@bloomberg.net>
1 parent b72c993 commit e796f37

21 files changed: 1251 additions and 31 deletions

config/configmap/inferenceservice.yaml

Lines changed: 32 additions & 16 deletions
@@ -537,22 +537,38 @@ data:
537537
"imagePullSecrets": ["docker-secret"]
538538
}
539539
540-
# ====================================== DEPLOYMENT CONFIGURATION ======================================
541-
# Example
542-
deploy: |-
543-
{
544-
"defaultDeploymentMode": "Serverless"
545-
}
546-
deploy: |-
547-
{
548-
# defaultDeploymentMode specifies the default deployment mode of the kserve. The supported values are
549-
# Serverless, RawDeployment and ModelMesh. Users can override the deployment mode at service level
550-
# by adding the annotation serving.kserve.io/deploymentMode.For more info on deployment mode visit
551-
# Serverless https://kserve.github.io/website/master/admin/serverless/serverless/
552-
# RawDeployment https://kserve.github.io/website/master/admin/kubernetes_deployment/
553-
# ModelMesh https://kserve.github.io/website/master/admin/modelmesh/
554-
"defaultDeploymentMode": "Serverless"
555-
}
540+
# ====================================== DEPLOYMENT CONFIGURATION ======================================
541+
# Example
542+
deploy: |-
543+
{
544+
"defaultDeploymentMode": "Serverless",
545+
"deploymentRolloutStrategy": {
546+
"defaultRollout": {
547+
"maxSurge": "1",
548+
"maxUnavailable": "1"
549+
}
550+
}
551+
}
552+
553+
deploy: |-
554+
{
555+
# defaultDeploymentMode specifies the default deployment mode of the kserve. The supported values are
556+
# Standard and Knative. Users can override the deployment mode at service level
557+
# by adding the annotation serving.kserve.io/deploymentMode.
558+
# "defaultDeploymentMode": "Standard",
559+
# deploymentRolloutStrategy specifies the default rollout strategy for the Standard deployment mode
560+
# "deploymentRolloutStrategy": {
561+
# defaultRollout specifies the default rollout configuration using Kubernetes deployment strategy
562+
# "defaultRollout": {
563+
# maxSurge specifies the maximum number of pods that can be created above the desired replica count
564+
# Can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%)
565+
# "maxSurge": "1",
566+
# maxUnavailable specifies the maximum number of pods that can be unavailable during the update
567+
# Can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%)
568+
# "maxUnavailable": "1"
569+
# }
570+
# }
571+
}
556572
557573
# ====================================== SERVICE CONFIGURATION ======================================
558574
# Example
Lines changed: 195 additions & 0 deletions
@@ -0,0 +1,195 @@
1+
# Rollout Strategy API Reference
2+
3+
## Overview
4+
5+
This document describes the API fields for rollout strategy configuration in KServe v1beta1. Rollout strategies can be configured through ConfigMap defaults or directly using Kubernetes `DeploymentStrategy`.
6+
7+
## ComponentExtensionSpec
8+
9+
The `ComponentExtensionSpec` exposes a single field for per-component rollout configuration. When the field is unset, the ConfigMap default is used instead:
10+
11+
### Fields
12+
13+
| Field | Type | Description | Required |
14+
|-------|------|-------------|----------|
15+
| `deploymentStrategy` | `appsv1.DeploymentStrategy` | Direct Kubernetes deployment strategy (takes priority over the ConfigMap default) | No |
16+
17+
### Configuration Priority
18+
19+
1. **deploymentStrategy** - User-defined Kubernetes deployment strategy (takes priority over the ConfigMap default; the multinode override listed under Priority Order still wins)
20+
2. **ConfigMap rollout strategy** - Fallback when `defaultDeploymentMode` is `"Standard"`
21+
22+
## RolloutSpec (ConfigMap Configuration)
23+
24+
Defines the rollout strategy configuration for ConfigMap defaults. Users can configure different rollout modes by setting appropriate `maxSurge` and `maxUnavailable` values:
25+
26+
**Availability Mode (Zero Downtime)**:
27+
- Set `maxUnavailable: "0"` and `maxSurge` to desired value/percentage
28+
- New pods are created before old pods are terminated
29+
30+
**ResourceAware Mode (Resource Efficient)**:
31+
- Set `maxSurge: "0"` and `maxUnavailable` to desired value/percentage
32+
- Old pods are terminated before new pods are created
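
To make the two modes concrete, the sketch below converts `maxSurge` and `maxUnavailable` strings into pod counts for a given replica count. It follows the Kubernetes rolling-update convention that percentage values round up for `maxSurge` and round down for `maxUnavailable`. The function name `resolvePods` is hypothetical and is not part of KServe; it uses only the Go standard library.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// resolvePods is a hypothetical helper (not a KServe function). It converts a
// maxSurge/maxUnavailable string into a pod count. Percentages are taken of
// the desired replica count; roundUp selects ceiling (Kubernetes uses this
// for maxSurge) or floor (used for maxUnavailable).
func resolvePods(value string, replicas int, roundUp bool) (int, error) {
	if strings.HasSuffix(value, "%") {
		pct, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
		if err != nil || pct < 0 {
			return 0, fmt.Errorf("invalid percentage %q", value)
		}
		raw := float64(pct) * float64(replicas) / 100
		if roundUp {
			return int(math.Ceil(raw)), nil
		}
		return int(math.Floor(raw)), nil
	}
	n, err := strconv.Atoi(value)
	if err != nil || n < 0 {
		return 0, fmt.Errorf("invalid count %q", value)
	}
	return n, nil
}

func main() {
	replicas := 4
	// Availability mode: no pod may be unavailable, up to 25% extra pods.
	surge, _ := resolvePods("25%", replicas, true)
	unavailable, _ := resolvePods("0", replicas, false)
	fmt.Printf("replicas=%d surge=%d unavailable=%d\n", replicas, surge, unavailable)
}
```

With 10 replicas and `"25%"`, the same helper yields 3 pods for `maxSurge` (rounded up) and 2 pods for `maxUnavailable` (rounded down).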
33+
34+
### Fields
35+
36+
| Field | Type | Description | Required | Default |
37+
|-------|------|-------------|----------|---------|
38+
| `maxSurge` | `string` | Maximum number of pods that can be created above desired replica count (e.g., `"1"`, `"25%"`) | Yes | - |
39+
| `maxUnavailable` | `string` | Maximum number of pods that can be unavailable during update (e.g., `"1"`, `"25%"`) | Yes | - |
40+
41+
42+
43+
## DeployConfig
44+
45+
The `DeployConfig` includes configuration for default rollout strategies.
46+
47+
### Fields
48+
49+
| Field | Type | Description | Required |
50+
|-------|------|-------------|----------|
51+
| `deploymentRolloutStrategy` | `DeploymentRolloutStrategy` | Default rollout strategy for deployments | No |
52+
53+
## DeploymentRolloutStrategy
54+
55+
Defines the default rollout strategy configuration for deployments.
56+
57+
### Fields
58+
59+
| Field | Type | Description | Required |
60+
|-------|------|-------------|----------|
61+
| `defaultRollout` | `RolloutSpec` | Default rollout configuration | No |
62+
63+
## Example ConfigMap
64+
65+
```yaml
66+
apiVersion: v1
67+
kind: ConfigMap
68+
metadata:
69+
name: inferenceservice-config
70+
namespace: kserve
71+
data:
72+
deploy: |-
73+
{
74+
"defaultDeploymentMode": "Standard",
75+
"deploymentRolloutStrategy": {
76+
"defaultRollout": {
77+
"maxSurge": "1",
78+
"maxUnavailable": "1"
79+
}
80+
}
81+
}
82+
```
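
The `deploy` value is a JSON document, so it can be decoded into the `DeployConfig` types that this commit adds to `pkg/apis/serving/v1beta1/configmap.go`. The sketch below copies those struct definitions and assumes the value is decoded with Go's `encoding/json`; the helper name `parseDeployConfig` is hypothetical. Note that `encoding/json` rejects `#` comments, so any annotations must stay outside the JSON payload.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Struct definitions mirror the ones added in configmap.go by this commit.
type RolloutSpec struct {
	MaxSurge       string `json:"maxSurge"`
	MaxUnavailable string `json:"maxUnavailable"`
}

type DeploymentRolloutStrategy struct {
	DefaultRollout *RolloutSpec `json:"defaultRollout,omitempty"`
}

type DeployConfig struct {
	DefaultDeploymentMode     string                     `json:"defaultDeploymentMode,omitempty"`
	DeploymentRolloutStrategy *DeploymentRolloutStrategy `json:"deploymentRolloutStrategy,omitempty"`
}

// parseDeployConfig is a hypothetical helper that decodes the raw "deploy"
// string from the ConfigMap into a DeployConfig.
func parseDeployConfig(raw string) (*DeployConfig, error) {
	cfg := &DeployConfig{}
	if err := json.Unmarshal([]byte(raw), cfg); err != nil {
		return nil, fmt.Errorf("parsing deploy config: %w", err)
	}
	return cfg, nil
}

func main() {
	raw := `{
	  "defaultDeploymentMode": "Standard",
	  "deploymentRolloutStrategy": {
	    "defaultRollout": {"maxSurge": "1", "maxUnavailable": "1"}
	  }
	}`
	cfg, err := parseDeployConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	rollout := cfg.DeploymentRolloutStrategy.DefaultRollout
	fmt.Println(cfg.DefaultDeploymentMode, rollout.MaxSurge, rollout.MaxUnavailable)
}
```

Because both nested fields are pointers tagged `omitempty`, a `deploy` value that omits `deploymentRolloutStrategy` decodes to a nil pointer, which callers need to check before dereferencing.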
83+
84+
## Example InferenceService (Direct DeploymentStrategy)
85+
86+
### Availability Mode Example:
87+
```yaml
88+
apiVersion: serving.kserve.io/v1beta1
89+
kind: InferenceService
90+
metadata:
91+
name: availability-mode-example
92+
annotations:
93+
serving.kserve.io/deploymentMode: "Standard"
94+
spec:
95+
predictor:
96+
model:
97+
modelFormat:
98+
name: sklearn
99+
storageUri: "s3://my-bucket/model"
100+
# Availability mode: maxUnavailable = 0, maxSurge = desired value
101+
deploymentStrategy:
102+
type: RollingUpdate
103+
rollingUpdate:
104+
maxUnavailable: "0" # Zero downtime
105+
maxSurge: "1" # Allow one extra pod
106+
```
107+
108+
### ResourceAware Mode Example:
109+
```yaml
110+
apiVersion: serving.kserve.io/v1beta1
111+
kind: InferenceService
112+
metadata:
113+
name: resource-aware-example
114+
annotations:
115+
serving.kserve.io/deploymentMode: "Standard"
116+
spec:
117+
predictor:
118+
model:
119+
modelFormat:
120+
name: sklearn
121+
storageUri: "s3://my-bucket/model"
122+
# ResourceAware mode: maxSurge = 0, maxUnavailable = desired value
123+
deploymentStrategy:
124+
type: RollingUpdate
125+
rollingUpdate:
126+
maxSurge: "0" # Resource efficient
127+
maxUnavailable: "1" # Allow one pod unavailable
128+
```
129+
130+
## Example InferenceService (Using ConfigMap Defaults)
131+
132+
```yaml
133+
apiVersion: serving.kserve.io/v1beta1
134+
kind: InferenceService
135+
metadata:
136+
name: example-configmap-defaults
137+
annotations:
138+
serving.kserve.io/deploymentMode: "Standard"
139+
spec:
140+
predictor:
141+
model:
142+
modelFormat:
143+
name: sklearn
144+
storageUri: "s3://my-bucket/model"
145+
# No deploymentStrategy specified - uses ConfigMap defaults
146+
```
147+
148+
## Validation Rules
149+
150+
### For ConfigMap Configuration:
151+
1. **maxSurge Validation**: Must be a valid number or percentage string
152+
- Valid percentages: `"25%"`, `"50%"`, `"100%"`
153+
- Valid numbers: `"1"`, `"2"`, `"5"`
154+
2. **maxUnavailable Validation**: Same format as maxSurge
155+
156+
### For Direct DeploymentStrategy:
157+
1. **type**: Must be `"RollingUpdate"`
158+
2. **rollingUpdate.maxSurge**: Same validation as ConfigMap maxSurge
159+
3. **rollingUpdate.maxUnavailable**: Same validation as ConfigMap maxUnavailable
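
The format rules above can be sketched as a small validator. This is an illustration only, not the validation code shipped in this commit: the names `validateRolloutValue` and `validateRollout` are hypothetical. The final check reflects a rule that Kubernetes itself enforces on Deployments (a rolling update may not set both values to zero) and is not one of the rules listed in this document.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// rolloutValue matches a non-negative integer ("1") or a percentage ("25%").
var rolloutValue = regexp.MustCompile(`^\d+%?$`)

// validateRolloutValue is a hypothetical helper that checks the string format.
func validateRolloutValue(field, value string) error {
	if !rolloutValue.MatchString(value) {
		return fmt.Errorf("%s: %q must be a number or a percentage", field, value)
	}
	return nil
}

// isZero reports whether a value that already passed the format check is
// "0" or "0%".
func isZero(value string) bool {
	n, err := strconv.Atoi(strings.TrimSuffix(value, "%"))
	return err == nil && n == 0
}

// validateRollout is a hypothetical helper that validates both fields.
func validateRollout(maxSurge, maxUnavailable string) error {
	if err := validateRolloutValue("maxSurge", maxSurge); err != nil {
		return err
	}
	if err := validateRolloutValue("maxUnavailable", maxUnavailable); err != nil {
		return err
	}
	// Kubernetes rejects a rolling update in which both values are zero,
	// because the rollout could then never make progress.
	if isZero(maxSurge) && isZero(maxUnavailable) {
		return fmt.Errorf("maxSurge and maxUnavailable must not both be zero")
	}
	return nil
}

func main() {
	fmt.Println(validateRollout("25%", "0"))
	fmt.Println(validateRollout("one", "1"))
	fmt.Println(validateRollout("0", "0%"))
}
```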
160+
161+
## Priority Order
162+
163+
When configuring rollout strategies, the following priority order applies:
164+
165+
1. **Multinode deployment override** (HIGHEST priority) - applied automatically to Ray workloads that set the `RAY_NODE_COUNT` environment variable
166+
2. **User-defined deploymentStrategy** (high priority) - specified in component extension spec
167+
3. **ConfigMap rollout strategy** (fallback) - only applies when `defaultDeploymentMode` is `"Standard"`
168+
4. **KServe default values** (if no configuration is provided)
169+
170+
**Important**: The ConfigMap rollout strategy only applies when:
171+
- No user-defined `deploymentStrategy` is specified in the component spec
172+
- The `defaultDeploymentMode` in the ConfigMap is set to `"Standard"`
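
The priority order can be read as a simple decision chain. The sketch below is an illustration under stated assumptions rather than the reconciler code from this commit: the `Rollout` type and `resolveRollout` function are hypothetical, and the multinode and fallback values are the ones documented in the Default Values section.

```go
package main

import "fmt"

// Rollout is a hypothetical stand-in for the resolved rolling-update values.
type Rollout struct {
	MaxSurge       string
	MaxUnavailable string
}

// resolveRollout is a hypothetical helper that applies the documented
// priority order and returns the chosen values with the source that won.
func resolveRollout(multinode bool, user, configMap *Rollout, defaultMode string) (Rollout, string) {
	switch {
	case multinode:
		// 1. Multinode (Ray) workloads override everything else.
		return Rollout{MaxSurge: "100%", MaxUnavailable: "0%"}, "multinode override"
	case user != nil:
		// 2. A deploymentStrategy set on the component spec.
		return *user, "user-defined deploymentStrategy"
	case configMap != nil && defaultMode == "Standard":
		// 3. ConfigMap default, only when defaultDeploymentMode is "Standard".
		return *configMap, "ConfigMap rollout strategy"
	default:
		// 4. Documented KServe defaults.
		return Rollout{MaxSurge: "25%", MaxUnavailable: "25%"}, "KServe defaults"
	}
}

func main() {
	user := &Rollout{MaxSurge: "1", MaxUnavailable: "0"}
	cm := &Rollout{MaxSurge: "0", MaxUnavailable: "1"}

	r, src := resolveRollout(false, user, cm, "Standard")
	fmt.Println(src, r.MaxSurge, r.MaxUnavailable)

	r, src = resolveRollout(true, user, cm, "Standard")
	fmt.Println(src, r.MaxSurge, r.MaxUnavailable)
}
```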
173+
174+
## Default Values
175+
176+
### KServe Defaults
177+
When no rollout strategy is specified anywhere, KServe applies these defaults:
178+
- **maxUnavailable**: `25%`
179+
- **maxSurge**: `25%`
180+
181+
### Multinode Deployment Override
182+
For multinode deployments (Ray workloads), KServe automatically overrides ALL rollout strategy configurations with:
183+
- **maxUnavailable**: `0%`
184+
- **maxSurge**: `100%`
185+
186+
This override takes precedence over all other configurations, including user-defined `deploymentStrategy`.
187+
188+
### Default Values Summary
189+
190+
| Configuration | maxUnavailable | maxSurge | Notes |
191+
|---------------|----------------|----------|-------|
192+
| **No rollout strategy specified** | `25%` | `25%` | KServe defaults |
193+
| **Multinode deployment** | `0%` | `100%` | Overrides ALL other configurations |
194+
| **Availability mode** | `0` | `<ratio>` | From rollout spec |
195+
| **ResourceAware mode** | `<ratio>` | `0` | From rollout spec |
Lines changed: 89 additions & 0 deletions
@@ -0,0 +1,89 @@
1+
apiVersion: serving.kserve.io/v1beta1
2+
kind: InferenceService
3+
metadata:
4+
name: rollout-strategy-example
5+
namespace: default
6+
annotations:
7+
serving.kserve.io/deploymentMode: "Standard"
8+
spec:
9+
predictor:
10+
model:
11+
modelFormat:
12+
name: sklearn
13+
storageUri: "s3://my-bucket/model"
14+
# Example 1: Availability Mode - Direct deployment strategy for high availability
15+
# Configuration: maxUnavailable = 0, maxSurge = desired value
16+
# Behavior: New pods are created first, then old pods are terminated (zero downtime)
17+
deploymentStrategy:
18+
type: RollingUpdate
19+
rollingUpdate:
20+
maxUnavailable: "0" # No pods unavailable during rollout
21+
maxSurge: "50%" # Can create 50% more pods during rollout
22+
23+
transformer:
24+
custom:
25+
container:
26+
image: my-transformer:latest
27+
env:
28+
- name: MODEL_NAME
29+
value: "my-model"
30+
# Example 2: ResourceAware Mode - Resource-efficient deployment strategy
31+
# Configuration: maxSurge = 0, maxUnavailable = desired value
32+
# Behavior: Old pods are terminated first, then new pods are created (resource efficient)
33+
deploymentStrategy:
34+
type: RollingUpdate
35+
rollingUpdate:
36+
maxSurge: "0" # No extra pods during rollout
37+
maxUnavailable: "25%" # Up to 25% of pods can be unavailable
38+
39+
---
40+
# Example 3: Using ConfigMap defaults (no deploymentStrategy specified)
41+
apiVersion: serving.kserve.io/v1beta1
42+
kind: InferenceService
43+
metadata:
44+
name: configmap-defaults-example
45+
namespace: default
46+
annotations:
47+
serving.kserve.io/deploymentMode: "Standard"
48+
spec:
49+
predictor:
50+
model:
51+
modelFormat:
52+
name: sklearn
53+
storageUri: "s3://my-bucket/model"
54+
# No deploymentStrategy specified - will use ConfigMap global defaults
55+
# when defaultDeploymentMode is "Standard"
56+
# Allows administrators to set organization-wide rollout policies
57+
58+
---
59+
# Example 4: Multinode deployment (Ray workload)
60+
# Note: KServe will automatically override ANY rollout strategy to:
61+
# maxUnavailable: "0%", maxSurge: "100%" for multinode deployments
62+
apiVersion: serving.kserve.io/v1beta1
63+
kind: InferenceService
64+
metadata:
65+
name: multinode-example
66+
namespace: default
67+
annotations:
68+
serving.kserve.io/deploymentMode: "Standard"
69+
spec:
70+
predictor:
71+
model:
72+
modelFormat:
73+
name: huggingface
74+
storageUri: "s3://my-bucket/llm-model"
75+
containers:
76+
- name: kserve-container
77+
image: my-ray-model-server:latest
78+
env:
79+
- name: RAY_NODE_COUNT # This triggers multinode deployment
80+
value: "4" # 1 head + 3 worker nodes
81+
- name: REQUEST_GPU_COUNT
82+
value: "8"
83+
# Even if you specify a different rollout strategy, KServe will override it
84+
# for multinode deployments to ensure Ray cluster stability
85+
deploymentStrategy:
86+
type: RollingUpdate
87+
rollingUpdate:
88+
maxUnavailable: "50%" # This will be overridden to "0%"
89+
maxSurge: "25%" # This will be overridden to "100%"

pkg/apis/serving/v1beta1/configmap.go

Lines changed: 19 additions & 1 deletion
@@ -129,7 +129,25 @@ type IngressConfig struct {
129129

130130
// +kubebuilder:object:generate=false
131131
type DeployConfig struct {
132-
DefaultDeploymentMode string `json:"defaultDeploymentMode,omitempty"`
132+
DefaultDeploymentMode string `json:"defaultDeploymentMode,omitempty"`
133+
DeploymentRolloutStrategy *DeploymentRolloutStrategy `json:"deploymentRolloutStrategy,omitempty"`
134+
}
135+
136+
// DeploymentRolloutStrategy defines the rollout strategy configuration for deployments
137+
type DeploymentRolloutStrategy struct {
138+
// DefaultRollout specifies the default rollout configuration
139+
// +optional
140+
DefaultRollout *RolloutSpec `json:"defaultRollout,omitempty"`
141+
}
142+
143+
// RolloutSpec defines the rollout strategy configuration using Kubernetes deployment strategy
144+
type RolloutSpec struct {
145+
// MaxSurge specifies the maximum number of pods that can be created above the desired replica count.
146+
// Can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
147+
MaxSurge string `json:"maxSurge"`
148+
// MaxUnavailable specifies the maximum number of pods that can be unavailable during the update.
149+
// Can be an absolute number (ex: 5) or a percentage of desired pods (ex: 10%).
150+
MaxUnavailable string `json:"maxUnavailable"`
133151
}
134152

135153
// +kubebuilder:object:generate=false
