Auto-MNNVL: validate annotation values and sync design doc (#386)
Update the design document to match the implementation details.
Add strict validation for grove.io/auto-mnnvl annotation values.
Changes:
- Update design doc with correct config paths and annotation values
- Reject invalid annotation values (must be "enabled" or "disabled")
docs/designs/mnnvl-design.md
Lines changed: 26 additions & 22 deletions
@@ -94,25 +94,26 @@ This document covers **Phase 1** of MNNVL support in Grove. See GREP-270 for the
 Enabling and disabling the feature will be done by the cluster admin by setting a flag in the Grove OperatorConfiguration.

 ```go
-// MNNVLConfiguration defines the configuration for MNNVL (Multi-Node NVLink) support.
-type MNNVLConfiguration struct {
-	// Enabled indicates whether MNNVL support is enabled.
-	// When true, the operator validates that the ComputeDomain CRD is installed at startup.
+// NetworkAcceleration defines the configuration for network acceleration features.
+type NetworkAcceleration struct {
+	// AutoMNNVLEnabled indicates whether automatic MNNVL (Multi-Node NVLink) support is enabled.
+	// When true, the operator validates that the ComputeDomain CRD is installed at startup
+	// and automatically creates ComputeDomain resources for GPU workloads.
 	// When MNNVL support is enabled, cluster admin should ensure that the ComputeDomain CRD has been installed.
 	// If this prerequisite fails then Grove will exit with a non-zero exit code.
 	// Default: false
-	Enabled bool `json:"enabled"`
+	AutoMNNVLEnabled bool `json:"autoMNNVLEnabled"`
 }
 ```

-The default value of `Enabled` is `false`, meaning MNNVL support is disabled unless explicitly enabled by the cluster administrator.
+The default value of `AutoMNNVLEnabled` is `false`, meaning MNNVL support is disabled unless explicitly enabled by the cluster administrator.

 The value could be set from a Helm chart under the config attribute

 ```yaml
 config:
-  mnnvl:
-    enabled: false
+  network:
+    autoMNNVLEnabled: false
 ```

 > **Note:** Using the `OperatorConfiguration` for feature enablement is chosen for simplicity in Phase 1. However, a plugin-based approach would provide better decoupling between the MNNVL feature and Grove core, and should be considered for future phases.
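To illustrate the renamed field, here is a minimal sketch of decoding the operator configuration, assuming it arrives as JSON. The `NetworkAcceleration` struct and its `autoMNNVLEnabled` tag come from the design above; the wrapper type `operatorConfig`, its `network` key (mirroring the Helm path `config.network.autoMNNVLEnabled`), and `decodeConfig` are hypothetical. Because an omitted boolean keeps Go's zero value, a configuration without the key leaves the feature disabled, and a leftover `mnnvl.enabled` entry from the old layout is silently ignored by `encoding/json`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// NetworkAcceleration mirrors the struct from the design doc.
type NetworkAcceleration struct {
	// AutoMNNVLEnabled indicates whether automatic MNNVL support is enabled.
	// Default: false (the zero value of bool).
	AutoMNNVLEnabled bool `json:"autoMNNVLEnabled"`
}

// operatorConfig is a hypothetical wrapper used only for this sketch; the
// "network" key mirrors the Helm values path config.network.autoMNNVLEnabled.
type operatorConfig struct {
	Network NetworkAcceleration `json:"network"`
}

// decodeConfig parses a JSON-encoded configuration. Missing fields keep their
// zero values, so an absent autoMNNVLEnabled key means the feature is off.
// Unknown keys (such as the old mnnvl.enabled) are ignored by encoding/json.
func decodeConfig(raw []byte) (operatorConfig, error) {
	var cfg operatorConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return operatorConfig{}, err
	}
	return cfg, nil
}

func main() {
	for _, raw := range []string{
		`{}`,
		`{"network": {"autoMNNVLEnabled": true}}`,
	} {
		cfg, err := decodeConfig([]byte(raw))
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> autoMNNVLEnabled=%v\n", raw, cfg.Network.AutoMNNVLEnabled)
	}
}
```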
@@ -129,7 +130,7 @@ When a PodCliqueSet is created, webhooks determine and enforce the MNNVL enablem

 #### Mutating Webhook (on Create)

-The mutating webhook adds the `grove.io/auto-mnnvl: "true"` annotation **only** when all conditions are met:
+The mutating webhook adds the `grove.io/auto-mnnvl: "enabled"` annotation **only** when all conditions are met:

 1. Annotation does not already exist
 2. MNNVL feature is enabled in `OperatorConfiguration`
@@ -162,7 +163,7 @@ kind: PodCliqueSet
 metadata:
   name: my-pcs
   annotations:
-    grove.io/auto-mnnvl: "true" # Added by webhook
+    grove.io/auto-mnnvl: "enabled" # Added by webhook
 spec:
   # ... same spec
 ```
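The defaulting rule can be sketched as a small pure function. This is an illustration under assumptions, not Grove's webhook code: the name `defaultAutoMNNVL` is hypothetical, GPU detection is passed in as a boolean, and only the conditions visible in this diff (annotation absent, feature enabled) plus the GPU requirement mentioned in the controller section are modeled, since the full condition list is cut off in the hunk above.

```go
package main

import "fmt"

const (
	autoMNNVLAnnotation = "grove.io/auto-mnnvl"
	autoMNNVLEnabled    = "enabled"
)

// defaultAutoMNNVL adds grove.io/auto-mnnvl: "enabled" only when the annotation
// is absent, the feature is enabled globally, and the PCS requests GPUs.
// requiresGPU is a hypothetical input; deriving it from pod specs is out of scope.
// It returns the (possibly mutated) annotations and whether a change was made.
func defaultAutoMNNVL(annotations map[string]string, featureEnabled, requiresGPU bool) (map[string]string, bool) {
	if _, exists := annotations[autoMNNVLAnnotation]; exists {
		// Respect an explicit user choice, including a "disabled" opt-out.
		return annotations, false
	}
	if !featureEnabled || !requiresGPU {
		return annotations, false
	}
	if annotations == nil {
		annotations = map[string]string{}
	}
	annotations[autoMNNVLAnnotation] = autoMNNVLEnabled
	return annotations, true
}

func main() {
	ann, changed := defaultAutoMNNVL(nil, true, true)
	fmt.Println(ann, changed)
}
```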
@@ -171,25 +172,26 @@ spec:

 A validating webhook runs **on PCS creation only** to reject invalid MNNVL configurations:

-- **Reject** if `grove.io/auto-mnnvl: "true"` is set but MNNVL feature is **disabled** globally
+- **Reject** if annotation value is not `"enabled"` or `"disabled"` (e.g., `"true"`, `"false"`, empty string)
+- **Reject** if `grove.io/auto-mnnvl: "enabled"` is set but MNNVL feature is **disabled** globally

-This prevents users from explicitly requesting MNNVL when the cluster doesn't support it.
+This prevents users from explicitly requesting MNNVL when the cluster doesn't support it, and ensures only valid annotation values are accepted.

 #### Validating Webhook (on Update)

 A validating webhook ensures the `grove.io/auto-mnnvl` annotation is **immutable** after PCS creation. Any attempt to add, modify, or remove the annotation on an existing PCS is rejected.

 #### Opt-out Behavior

-Users can opt-out of MNNVL for a specific PCS by explicitly setting `grove.io/auto-mnnvl: "false"` **before creation**. When the mutating webhook sees the annotation already exists, it does not override it.
+Users can opt-out of MNNVL for a specific PCS by explicitly setting `grove.io/auto-mnnvl: "disabled"` **before creation**. When the mutating webhook sees the annotation already exists, it does not override it.

 ```yaml
 apiVersion: grove.io/v1alpha1
 kind: PodCliqueSet
 metadata:
   name: my-pcs
   annotations:
-    grove.io/auto-mnnvl: "false" # Explicit opt-out
+    grove.io/auto-mnnvl: "disabled" # Explicit opt-out
 spec:
   # ... GPU workload that won't use MNNVL
 ```
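The create-time rules and the update-time immutability rule can be expressed as two checks over the annotation maps. In this sketch, `validateOnCreate` and `validateOnUpdate` are hypothetical names and the error messages are illustrative; a real admission webhook would wrap these results in an admission response. An absent annotation passes creation-time validation, since the controller treats a missing annotation the same as `"disabled"`.

```go
package main

import (
	"errors"
	"fmt"
)

const autoMNNVLAnnotation = "grove.io/auto-mnnvl"

// validateOnCreate applies the two create-time rules: the value must be
// exactly "enabled" or "disabled", and "enabled" is rejected when the feature
// is disabled globally. An absent annotation is accepted.
func validateOnCreate(annotations map[string]string, featureEnabled bool) error {
	value, ok := annotations[autoMNNVLAnnotation]
	if !ok {
		return nil
	}
	switch value {
	case "enabled":
		if !featureEnabled {
			return errors.New(`grove.io/auto-mnnvl is "enabled" but MNNVL is disabled in the operator configuration`)
		}
		return nil
	case "disabled":
		return nil
	default:
		// Covers "true", "false", the empty string, and any other value.
		return fmt.Errorf(`invalid value %q for %s: must be "enabled" or "disabled"`, value, autoMNNVLAnnotation)
	}
}

// validateOnUpdate rejects adding, modifying, or removing the annotation.
func validateOnUpdate(oldAnnotations, newAnnotations map[string]string) error {
	oldValue, oldOK := oldAnnotations[autoMNNVLAnnotation]
	newValue, newOK := newAnnotations[autoMNNVLAnnotation]
	if oldOK != newOK || oldValue != newValue {
		return fmt.Errorf("%s is immutable after creation", autoMNNVLAnnotation)
	}
	return nil
}

func main() {
	fmt.Println(validateOnCreate(map[string]string{autoMNNVLAnnotation: "true"}, true))
	fmt.Println(validateOnUpdate(map[string]string{autoMNNVLAnnotation: "enabled"}, nil))
}
```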
@@ -204,8 +206,8 @@ The PCS controller has a reconciliation flow for managing resources in a specifi

 Before creating the `CD`, the controller checks the `grove.io/auto-mnnvl` annotation on the PCS:

-- If `grove.io/auto-mnnvl: "true"` → Create ComputeDomains for each replica
-- If `grove.io/auto-mnnvl: "false"` or annotation is absent → Skip ComputeDomain creation
+- If `grove.io/auto-mnnvl: "enabled"` → Create ComputeDomains for each replica
+- If `grove.io/auto-mnnvl: "disabled"` or annotation is absent → Skip ComputeDomain creation

 Since the annotation is set by the mutating webhook at PCS creation time (based on feature enablement and GPU requirements), the controller logic is simplified to a single annotation check.
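The single annotation check, together with one ComputeDomain per PCS replica, might look like the following. The `<pcs-name>-<replica-index>` naming is an assumption inferred from the `my-pcs-0` example in this document, and `expectedComputeDomains` is a hypothetical helper. Only the exact value `"enabled"` triggers creation; `"disabled"` and a missing annotation both skip it.

```go
package main

import (
	"fmt"
	"strconv"
)

const autoMNNVLAnnotation = "grove.io/auto-mnnvl"

// expectedComputeDomains returns the ComputeDomain names the controller should
// ensure for a PCS. Only the exact value "enabled" triggers creation; "disabled"
// or an absent annotation yields none. The "<pcs>-<replicaIndex>" naming is an
// assumption based on the my-pcs-0 example in the design doc.
func expectedComputeDomains(pcsName string, replicas int, annotations map[string]string) []string {
	if annotations[autoMNNVLAnnotation] != "enabled" || replicas <= 0 {
		return nil
	}
	names := make([]string, 0, replicas)
	for i := 0; i < replicas; i++ {
		names = append(names, pcsName+"-"+strconv.Itoa(i))
	}
	return names
}

func main() {
	fmt.Println(expectedComputeDomains("my-pcs", 2, map[string]string{autoMNNVLAnnotation: "enabled"}))
}
```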
@@ -253,7 +255,7 @@ ComputeDomain creation follows the same observability pattern as other Grove-man

 Deleting a ComputeDomain while pods are actively using its RCT causes significant workload disruption. To prevent accidental deletion, Grove adds a **finalizer** to each ComputeDomain it creates.

-**Finalizer:** `grove.io/protect-computedomain`
+**Finalizer:** `grove.io/computedomain-finalizer`

 With this finalizer, a ComputeDomain cannot be deleted until Grove explicitly removes the finalizer. This provides stronger protection than a watch-and-recreate approach, which would leave a gap where the workload is in a broken state.
@@ -263,7 +265,7 @@ kind: ComputeDomain
 metadata:
   name: my-pcs-0
   finalizers:
-    - grove.io/protect-computedomain
+    - grove.io/computedomain-finalizer
   labels:
     app.kubernetes.io/managed-by: grove
     app.kubernetes.io/part-of: my-pcs
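Adding the finalizer idempotently is a small list operation. The sketch below models finalizers as a plain `[]string` and uses the hypothetical helper `ensureFinalizer`; it reports whether the list changed so a caller can skip a no-op update. In an operator built on controller-runtime, `controllerutil.AddFinalizer` and `controllerutil.RemoveFinalizer` provide equivalent behavior on a `client.Object`; this document does not say whether Grove uses them.

```go
package main

import "fmt"

const computeDomainFinalizer = "grove.io/computedomain-finalizer"

// ensureFinalizer returns finalizers with computeDomainFinalizer appended if it
// is not already present, and reports whether the list changed so the caller
// only issues an update when needed.
func ensureFinalizer(finalizers []string) ([]string, bool) {
	for _, f := range finalizers {
		if f == computeDomainFinalizer {
			return finalizers, false
		}
	}
	return append(finalizers, computeDomainFinalizer), true
}

func main() {
	f, changed := ensureFinalizer(nil)
	fmt.Println(f, changed)
	f, changed = ensureFinalizer(f)
	fmt.Println(f, changed)
}
```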
@@ -283,6 +285,8 @@ spec:

 If a user attempts to delete a CD manually, it will remain in `Terminating` state until the PCS is deleted or scaled down.

+> **Note:** The finalizer name `grove.io/computedomain-finalizer` follows the pattern `grove.io/{resource}-finalizer` for clarity and consistency.
+
 ### Scale-Out and Scale-In

 When scaling out (replicas increased), the subsequent reconciliation process will identify the ComputeDomains missing for the new replica indices and create them using the identical logic as the initial creation.
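Scale-out and scale-in handling amounts to a set difference between the ComputeDomains that should exist and the ones found by listing. The helper `diffComputeDomains` below is hypothetical and works on names only; in a real reconciler the existing set would come from a label-selector list, and deleting a surplus ComputeDomain would also require removing the Grove finalizer first, since the finalizer otherwise keeps it in `Terminating`.

```go
package main

import (
	"fmt"
	"sort"
)

// diffComputeDomains compares the ComputeDomains that should exist (one per
// replica index) with those found by listing. It returns the names to create
// (scale-out) and the names to delete (scale-in), sorted for stable output.
func diffComputeDomains(expected, existing []string) (toCreate, toDelete []string) {
	want := make(map[string]bool, len(expected))
	for _, name := range expected {
		want[name] = true
	}
	have := make(map[string]bool, len(existing))
	for _, name := range existing {
		if have[name] {
			continue // ignore duplicate list entries
		}
		have[name] = true
		if !want[name] {
			toDelete = append(toDelete, name) // replica index no longer exists
		}
	}
	for _, name := range expected {
		if !have[name] {
			toCreate = append(toCreate, name) // missing replica index
			have[name] = true
		}
	}
	sort.Strings(toCreate)
	sort.Strings(toDelete)
	return toCreate, toDelete
}

func main() {
	create, del := diffComputeDomains(
		[]string{"my-pcs-0", "my-pcs-1", "my-pcs-2"},
		[]string{"my-pcs-0", "my-pcs-1"},
	)
	fmt.Println("create:", create, "delete:", del)
}
```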
@@ -293,7 +297,7 @@ The controller lists existing `ComputeDomains` by label selector, computes expec

 ### PCS Deletion

-When a PodCliqueSet is deleted, the PCS controller's finalizer logic removes the `grove.io/protect-computedomain` finalizer from all owned ComputeDomains. Once the finalizer is removed, Kubernetes garbage-collects the CDs through the owner reference mechanism.
+When a PodCliqueSet is deleted, the PCS controller's finalizer logic removes the `grove.io/computedomain-finalizer` finalizer from all owned ComputeDomains. Once the finalizer is removed, Kubernetes garbage-collects the CDs through the owner reference mechanism.

 ## PCLQ Creation and RCT Injection

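Releasing the ComputeDomains on PCS deletion can be sketched as stripping one finalizer from each owned object while preserving any others. The `computeDomain` type is a hypothetical stand-in that models only a name and a finalizer list, and `releaseComputeDomains` returns the names that would need an update call; after those updates, garbage collection through owner references can remove the objects.

```go
package main

import "fmt"

const computeDomainFinalizer = "grove.io/computedomain-finalizer"

// computeDomain is a hypothetical stand-in for the ComputeDomain object;
// only the fields needed here are modeled.
type computeDomain struct {
	Name       string
	Finalizers []string
}

// removeFinalizer returns finalizers without computeDomainFinalizer and reports
// whether anything was removed. Finalizers owned by other controllers are kept.
func removeFinalizer(finalizers []string) ([]string, bool) {
	out := make([]string, 0, len(finalizers))
	removed := false
	for _, f := range finalizers {
		if f == computeDomainFinalizer {
			removed = true
			continue
		}
		out = append(out, f)
	}
	return out, removed
}

// releaseComputeDomains strips the Grove finalizer from every owned CD in place
// and returns the names whose objects would need an update call.
func releaseComputeDomains(owned []computeDomain) []string {
	var updated []string
	for i := range owned {
		f, removed := removeFinalizer(owned[i].Finalizers)
		if removed {
			owned[i].Finalizers = f
			updated = append(updated, owned[i].Name)
		}
	}
	return updated
}

func main() {
	cds := []computeDomain{
		{Name: "my-pcs-0", Finalizers: []string{computeDomainFinalizer}},
		{Name: "my-pcs-1", Finalizers: []string{"example.com/other", computeDomainFinalizer}},
	}
	fmt.Println(releaseComputeDomains(cds), cds)
}
```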
@@ -302,7 +306,7 @@ The `resourceClaims` reference is injected into the PCLQ's pod spec template at
 ### PCS Creating PCLQ

 When the PCS controller creates a PCLQ, it checks:
-1. Does the PCS have `grove.io/auto-mnnvl: "true"`?
+1. Does the PCS have `grove.io/auto-mnnvl: "enabled"`?
 2. Does the PCLQ's pod spec require GPU (`nvidia.com/gpu`)?

 If both conditions are true, the controller injects `resourceClaims` into the PCLQ's pod spec template before creation:
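The two checks can be combined into one predicate. In this sketch the `container` type is a simplified, hypothetical stand-in for a Kubernetes container (quantities are plain integers rather than `resource.Quantity`), both limits and requests are inspected for `nvidia.com/gpu`, and init containers are ignored. Because the PCSG controller applies the same logic, a shared helper along these lines could serve both controllers by taking the parent object's annotations as input.

```go
package main

import "fmt"

const (
	autoMNNVLAnnotation = "grove.io/auto-mnnvl"
	gpuResource         = "nvidia.com/gpu"
)

// container is a hypothetical, simplified stand-in for a Kubernetes container;
// resource quantities are plain integers here instead of resource.Quantity.
type container struct {
	Name     string
	Limits   map[string]int64
	Requests map[string]int64
}

// requiresGPU reports whether any container asks for nvidia.com/gpu in its
// limits or requests. Reading a nil map safely returns 0.
func requiresGPU(containers []container) bool {
	for _, c := range containers {
		if c.Limits[gpuResource] > 0 || c.Requests[gpuResource] > 0 {
			return true
		}
	}
	return false
}

// shouldInjectResourceClaims applies both checks before a PCLQ is created: the
// parent (PCS or PCSG) carries grove.io/auto-mnnvl: "enabled" and the pod spec
// requests GPUs.
func shouldInjectResourceClaims(parentAnnotations map[string]string, containers []container) bool {
	return parentAnnotations[autoMNNVLAnnotation] == "enabled" && requiresGPU(containers)
}

func main() {
	workers := []container{{Name: "worker", Limits: map[string]int64{gpuResource: 8}}}
	fmt.Println(shouldInjectResourceClaims(map[string]string{autoMNNVLAnnotation: "enabled"}, workers))
}
```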
@@ -340,7 +344,7 @@ kind: PodCliqueScalingGroup
 metadata:
   name: my-pcs-0-scaling
   annotations:
-    grove.io/auto-mnnvl: "true" # Propagated from PCS
+    grove.io/auto-mnnvl: "enabled" # Propagated from PCS
   labels:
     app.kubernetes.io/part-of: my-pcs
     grove.io/podcliqueset-replica-index: "0"
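Propagation from PCS to PCSG can be as simple as copying the annotation value when the PCSG is built. `propagateAutoMNNVL` is a hypothetical helper; it copies whatever value the PCS carries (the example above only shows `"enabled"`, so copying `"disabled"` as well is an assumption) and leaves the PCSG's other annotations untouched, so the PCSG controller can later check its own annotation.

```go
package main

import "fmt"

const autoMNNVLAnnotation = "grove.io/auto-mnnvl"

// propagateAutoMNNVL copies the grove.io/auto-mnnvl annotation from a PCS onto
// the annotations of a PCSG being created. Other PCSG annotations are kept, and
// nothing is added when the PCS has no such annotation.
func propagateAutoMNNVL(pcsAnnotations, pcsgAnnotations map[string]string) map[string]string {
	value, ok := pcsAnnotations[autoMNNVLAnnotation]
	if !ok {
		return pcsgAnnotations
	}
	if pcsgAnnotations == nil {
		pcsgAnnotations = make(map[string]string, 1)
	}
	pcsgAnnotations[autoMNNVLAnnotation] = value
	return pcsgAnnotations
}

func main() {
	pcsg := propagateAutoMNNVL(
		map[string]string{autoMNNVLAnnotation: "enabled"},
		map[string]string{"example.com/note": "kept"},
	)
	fmt.Println(pcsg)
}
```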
@@ -351,7 +355,7 @@ spec:
 ### PCSG Creating PCLQ

 When the PCSG controller creates a PCLQ, it uses the **same injection logic** as the PCS controller:
-1. Check if PCSG has `grove.io/auto-mnnvl: "true"` annotation
+1. Check if PCSG has `grove.io/auto-mnnvl: "enabled"` annotation
 2. Check if the PCLQ's pod spec requires GPU

 If both are true, inject `resourceClaims` into the PCLQ's pod spec template.