Commit 7bd6f86

KEP-5073: update declarative validation KEP to explain DV-Only, related roadmap, and associated graduation criteria
Co-authored-by: Lalit Chauhan <[email protected]>
1 parent ae9096a commit 7bd6f86

keps/sig-api-machinery/5073-declarative-validation-with-validation-gen/README.md

Lines changed: 113 additions & 0 deletions
@@ -18,6 +18,13 @@
1818
- [<code>DeclarativeValidation</code> &amp; <code>DeclarativeValidationTakeover</code> Will Target Beta From The Beginning](#declarativevalidation--declarativevalidationtakeover-will-target-beta-from-the-beginning)
1919
- [Linter](#linter)
2020
- [Documentation Generation](#documentation-generation)
21+
- [DV-Only Graduation Plan](#dv-only-graduation-plan)
22+
- [Requirements for DV-Only Usage](#requirements-for-dv-only-usage)
23+
- [Graduation Criteria for DV Tags and Features](#graduation-criteria-for-dv-tags-and-features)
24+
- [Tag Stability Levels](#tag-stability-levels)
25+
- [DV-Only Implementation Strategy for v1.35](#dv-only-implementation-strategy-for-v135)
26+
- [DV-Only Implementation Details](#dv-only-implementation-details)
27+
- [DV-Only Rollout Timeline](#dv-only-rollout-timeline)
2128
- [Analysis of existing validation rules](#analysis-of-existing-validation-rules)
2229
- [User Stories (Optional)](#user-stories-optional)
2330
- [Kubernetes developer wishes to add a field to an existing API version](#kubernetes-developer-wishes-to-add-a-field-to-an-existing-api-version)
@@ -38,6 +45,12 @@
3845
- [Risk: Added latency to API request handling.](#risk-added-latency-to-api-request-handling)
3946
- [Mitigation: Resolve Known &quot;Low Hanging Fruit&quot; of Performance Improvements In Current Validation Code](#mitigation-resolve-known-low-hanging-fruit-of-performance-improvements-in-current-validation-code)
4047
- [Mitigation: Avoid Conversion to Internal Type](#mitigation-avoid-conversion-to-internal-type)
48+
- [Risk: Committing to DV Only Makes Future Reversals More Costly](#risk-committing-to-dv-only-makes-future-reversals-more-costly)
49+
- [Mitigation: Incremental Adoption and Calculated Risk](#mitigation-incremental-adoption-and-calculated-risk)
50+
- [Risk: Altered Feature Gate Semantics for Mixed Validation Types](#risk-altered-feature-gate-semantics-for-mixed-validation-types)
51+
- [Mitigation: Controlled Scope and Communication](#mitigation-controlled-scope-and-communication)
52+
- [Risk: Panics in Mixed Validation Scenarios Cause Validation to &quot;Fail-Closed&quot;](#risk-panics-in-mixed-validation-scenarios-cause-validation-to-fail-closed)
53+
- [Mitigation: Controlled Scope, Initial Code Review, and Comprehensive Testing](#mitigation-controlled-scope-initial-code-review-and-comprehensive-testing)
4154
- [Design Details](#design-details)
4255
- [Summary of Declarative Validation Components](#summary-of-declarative-validation-components)
4356
- [<code>validation-gen</code> Implementation Plan](#validation-gen-implementation-plan)
@@ -235,6 +248,7 @@ Please feel free to try out the [prototype](https://github.com/jpbetz/kubernetes
235248
* Retain native (or nearly native) performance.
236249
* Improve testing rigor by being vastly easier to test.
237250
* Allow for client-side validation experiments.
251+
* Establish a low-risk, data-driven path for new APIs to adopt declarative validation natively ("DV-Only") without requiring a handwritten fallback, simplifying API development and review.
238252

239253
### Non-Goals
240254

@@ -340,6 +354,76 @@ By having all validators, associated IDL tags, their descriptions, etc. defined
340354
* Publishing documentation on all tags including how they work, their intended usage, examples, etc.
341355
* Building a system to auto-gen docs from this
342356

357+
## DV-Only Graduation Plan
358+
359+
### Requirements for DV-Only Usage
360+
For any validation tag or feature to be used in a DV-Only context, it must meet the following requirements:
361+
* Guaranteed GA Semantics: All horizontal features/semantics used by DV-Only tags must be GA and cannot be disabled. This ensures consistent behavior across clusters and versions. All tags in DV-Only usage must be GA.
362+
* Proven Stability: The tag/feature must have been proven stable through on-by-default usage for at least one release cycle with no metric failures observed in production.
363+
* No Backwards-Incompatible Changes: The validation semantics must not change in backwards-incompatible ways between versions.
364+
365+
### Graduation Criteria for DV Tags and Features
366+
Horizontal features must be GA before any DV-Only usage. Examples of such features include ratcheting, subresource support, and update correlation. Individual validation tags must also graduate to GA/stable before any DV-Only usage. A feature or tag is considered GA/stable once it meets the criteria below:
367+
* One full release cycle (~3-4 months) of production usage with no declarative_validation_mismatch_total or declarative_validation_panic_total metric failures
368+
* Declarative Validation Workgroup confirmation that the feature is considered GA/stable
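
The two criteria above can be sketched as a small check. This is an illustration only: `ReleaseMetrics` and `MeetsGraduationCriteria` are hypothetical names, and the reading that a single clean on-by-default release is sufficient is an assumption, not something the KEP spells out.

```go
package main

import "fmt"

// ReleaseMetrics is a hypothetical per-release summary of the two metrics
// named in the criteria, for one tag or horizontal feature.
type ReleaseMetrics struct {
	Release       string
	OnByDefault   bool
	MismatchTotal int // declarative_validation_mismatch_total
	PanicTotal    int // declarative_validation_panic_total
}

// MeetsGraduationCriteria reports whether the workgroup has confirmed the
// tag/feature and at least one on-by-default release shows zero mismatches
// and zero panics. Treating one clean release as enough is an assumption.
func MeetsGraduationCriteria(history []ReleaseMetrics, workgroupConfirmed bool) bool {
	if !workgroupConfirmed {
		return false
	}
	for _, m := range history {
		if m.OnByDefault && m.MismatchTotal == 0 && m.PanicTotal == 0 {
			return true
		}
	}
	return false
}

func main() {
	history := []ReleaseMetrics{
		{Release: "v1.34", OnByDefault: true},
	}
	fmt.Println(MeetsGraduationCriteria(history, true))
	fmt.Println(MeetsGraduationCriteria(history, false))
}
```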
369+
370+
### Tag Stability Levels
371+
Each validation tag will be assigned a stability level:
372+
373+
| Stability Level | Definition | DV-Only Eligible |
374+
| --------------- | ---------- | ---------------- |
375+
| GA/Stable | Proven stable “on-by-default” usage | Yes |
376+
| Alpha | Experimental or newly introduced | No - requires handwritten fallback |
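
As a rough sketch of how the table could be consulted by tooling such as a linter, the snippet below maps tags to levels and lists the tags that still require a handwritten fallback. The type, the function, and especially the level assigned to each tag are hypothetical and chosen only for illustration; the KEP does not state which tags are GA.

```go
package main

import "fmt"

// StabilityLevel mirrors the two levels in the table above.
type StabilityLevel int

const (
	Alpha  StabilityLevel = iota // experimental or newly introduced
	Stable                       // GA/Stable
)

// DVOnlyEligible reports whether a tag at this level may be used without a
// handwritten fallback.
func (s StabilityLevel) DVOnlyEligible() bool { return s == Stable }

// tagLevels is a hypothetical registry. The levels here are placeholders,
// not the actual stability of these tags.
var tagLevels = map[string]StabilityLevel{
	"+k8s:minimum":           Stable,
	"+k8s:zeroOrOneOfMember": Alpha,
}

// TagsNeedingFallback returns the tags that are not DV-Only eligible.
// Unknown tags are treated as Alpha.
func TagsNeedingFallback(tags []string) []string {
	var blocked []string
	for _, tag := range tags {
		level, known := tagLevels[tag]
		if !known || !level.DVOnlyEligible() {
			blocked = append(blocked, tag)
		}
	}
	return blocked
}

func main() {
	fmt.Println(TagsNeedingFallback([]string{"+k8s:minimum", "+k8s:zeroOrOneOfMember"}))
}
```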
377+
378+
### DV-Only Implementation Strategy for v1.35
379+
No DV-Only usage will be permitted in v1.35. Instead, the v1.35 release will focus on:
380+
381+
* Data Collection: Use v1.33, v1.34 and v1.35 to gather stability metrics for:
382+
* Ratcheting behavior
383+
* Declarative Validation tags used and “on-by-default” in these releases
384+
* Dual Implementation Requirement: New API fields must implement both declarative validation tags AND handwritten validation code.
385+
* Simplified Migration Path: To ease the dual implementation requirement and prepare for DV-Only in v1.36, we plan to provide a library of validation methods corresponding to DV tags. This lets users onboard onto declarative tags more easily now and makes a later full migration to declarative validation simpler.
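
A minimal sketch of what the dual implementation could look like for one field, assuming a library helper that mirrors the `+k8s:minimum` tag. `ReplicaSpec`, `Minimum`, and `ValidateReplicaSpec` are hypothetical names, and errors are plain strings here rather than the real field error types.

```go
package main

import "fmt"

// ReplicaSpec is a hypothetical API type. The tag comment is the declarative
// half of the dual implementation.
type ReplicaSpec struct {
	// +k8s:minimum=0
	Replicas int32
}

// Minimum is a hypothetical library helper with the same semantics as the
// +k8s:minimum tag: the value must be greater than or equal to min.
func Minimum(fieldPath string, value, min int32) []string {
	if value < min {
		return []string{fmt.Sprintf("%s: Invalid value: %d: must be greater than or equal to %d", fieldPath, value, min)}
	}
	return nil
}

// ValidateReplicaSpec is the handwritten half. It delegates to the library
// helper so both halves are expected to agree.
func ValidateReplicaSpec(spec ReplicaSpec) []string {
	var errs []string
	errs = append(errs, Minimum("spec.replicas", spec.Replicas, 0)...)
	return errs
}

func main() {
	fmt.Println(ValidateReplicaSpec(ReplicaSpec{Replicas: -1}))
	fmt.Println(len(ValidateReplicaSpec(ReplicaSpec{Replicas: 3})))
}
```

Because the handwritten function is a thin wrapper over a helper named after the tag, dropping it later in favor of the tag alone should be a mechanical change.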
386+
387+
### DV-Only Implementation Details
388+
389+
The implementation of DV-Only support will enable new API fields to be validated using only Declarative Validation (DV) tags without requiring parallel handwritten Go code.
390+
391+
Execution of Declarative Validations: For any API type that includes at least one "DV-Only" validation rule, the generated declarative validation code will always be executed. This ensures that the DV-Only rules are always enforced. As noted in the KEP risks, "The declarative validation code path for these types will always run, regardless of the feature gate's setting, to ensure the authoritative 'DV-Only' rules are enforced".
392+
393+
Error Differentiation: A mechanism will be implemented within the validation runtime to distinguish between errors arising from "DV-Only" rules and those from "Migrated" DV rules (which are still under dual implementation with handwritten code).
394+
395+
"DV-Only" Errors Always Enforced: Validation errors identified as originating from "DV-Only" rules will always be included in the final set of errors returned to the user. Their enforcement is not controlled by the feature gates.
396+
397+
Feature Gate Scope Limited to Migrated Rules: The behavior of the DeclarativeValidation and DeclarativeValidationTakeover feature gates will be limited to the "Migrated" portions of the validation logic. These gates will control whether the handwritten or declarative version of a migrated rule is authoritative and if comparisons are done, but they will not affect the enforcement of "DV-Only" rules.
398+
399+
Panic Handling ("Fail-Closed"): In API types that combine DV-Only and migrated rules, any panic occurring within the declarative validation execution path will cause the entire validation to fail, and an error will be returned. This "fail-closed" behavior is necessary because it's not possible to isolate the source of the panic to a specific rule type, and DV-Only rules must always be enforced to ensure the integrity of new API fields. Rigorous testing, as outlined in the "Mitigation" section of the KEP, will be crucial to prevent panics.
400+
401+
This implementation strategy allows new fields to natively adopt Declarative Validation, streamlining development, while coexisting with the ongoing migration of existing handwritten validations.
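
The error differentiation described above might look roughly like the following. This is a simplified sketch, not the actual runtime: all names are hypothetical, errors are plain strings, the handwritten slice is assumed to contain only errors from rules that also have a migrated declarative counterpart, and the mismatch comparison is omitted.

```go
package main

import "fmt"

// Origin records which kind of rule produced a declarative error.
type Origin int

const (
	DVOnly   Origin = iota // rule has no handwritten equivalent
	Migrated               // rule also exists as handwritten code
)

// DeclarativeError is a hypothetical error value tagged with its origin.
type DeclarativeError struct {
	Origin  Origin
	Message string
}

// FinalErrors assembles the errors returned to the user:
//   - DV-Only errors are always included, regardless of feature gates.
//   - For migrated rules, the declarative errors are authoritative when
//     takeover is true, otherwise the handwritten errors are.
func FinalErrors(handwritten []string, declarative []DeclarativeError, takeover bool) []string {
	var out []string
	for _, e := range declarative {
		if e.Origin == DVOnly || takeover {
			out = append(out, e.Message)
		}
	}
	if !takeover {
		out = append(out, handwritten...)
	}
	return out
}

func main() {
	handwritten := []string{"spec.replicas: must be >= 0 (handwritten)"}
	declarative := []DeclarativeError{
		{Origin: DVOnly, Message: "spec.newField: required (DV-Only)"},
		{Origin: Migrated, Message: "spec.replicas: must be >= 0 (declarative)"},
	}
	for _, takeover := range []bool{false, true} {
		fmt.Println(takeover, FinalErrors(handwritten, declarative, takeover))
	}
}
```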
402+
### DV-Only Rollout Timeline
403+
v1.33 - v1.34 (completed):
404+
405+
* ReplicationController migration with +k8s:minimum, +k8s:optional and default ratcheting
406+
* CSR migration with +k8s:item, +k8s:zeroOrOneOfMember, +k8s:listType=map, +k8s:listMapKey, and list ratcheting
407+
* Begin collecting stability metrics
408+
409+
v1.35 (current plan):
410+
411+
* No DV-Only usage permitted
412+
* Continue migrations and net new API field validation logic with dual implementation (DV + hand-written) requirement
413+
* Expand tag coverage for data collection
414+
* Implement validation library for simplified dual implementation
415+
* CSR migration adds: +k8s:item, +k8s:zeroOrOneOfMember, list ratcheting
416+
417+
v1.36 (target):
418+
419+
* Enable DV-Only for GA-graduated tags (pending v1.34/v1.35 metrics validation)
420+
* Initial set limited to “low-risk” tags with proven stability
421+
* Maintain dual implementation for non-GA tags
422+
* Decision point: Review metrics and determine final GA tag set for v1.36
423+
424+
v1.37+:
425+
* Enable DV-Only for expanding set of GA-graduated tags and features (pending v1.36+ metrics validation)
426+
343427
## Analysis of existing validation rules
344428

345429
At the time of writing this document, there are ~1181 validation rules written in about 15k lines of Go code in [kubernetes/kubernetes/pkg/apis](https://github.com/kubernetes/kubernetes/commit/0c62b122c02bff9131b6db960042150a3638d3f3).
@@ -468,6 +552,33 @@ From analyzing the validation code there is "SO MUCH low-hanging fruit" - @thock
468552

469553
Requests are received as the versioned type, so it should be feasible to avoid extra conversions for resources that have no need of handwritten validations. This is likely not necessary given the known "low hanging fruit" of performance improvements but mentioned for completeness.
470554

555+
#### Risk: Committing to DV Only Makes Future Reversals More Costly
556+
By allowing new APIs to be developed with "DV-Only" rules (with no handwritten fallback), we are establishing DV as an authoritative component for those APIs. If a future decision were made to back out of the Declarative Validation initiative entirely, it would become significantly more work. We would need to perform a reverse migration to generate handwritten validation code from the DV tags for these new APIs before removing the DV tooling.
557+
##### Mitigation: Incremental Adoption and Calculated Risk
558+
This is a calculated risk that reflects growing confidence in the Declarative Validation project. The "DV-Only" approach is limited to net-new validations on new fields, which provides a clear and contained path for adoption. All new validations will be on new fields, which are always feature gated. It does not affect the rollback strategy for existing types that are being migrated. This incremental step allows us to prove the value of DV for new development while the broader migration of legacy code continues under the safety of the existing feature gate mechanism.
559+
#### Risk: Altered Feature Gate Semantics for Mixed Validation Types
560+
"DV-Only" rules change the initial behavior of the DeclarativeValidation feature gate for any API type that adopts them. For the "migrated validation only" cases (with no "DV-Only" rules), setting DeclarativeValidation=false acts as a complete off-switch, preventing the execution of any generated declarative validation code.
561+
562+
For new API types that mix "DV-Only" and migrated validations, this is no longer the case. The declarative validation code path for these types will always run, regardless of the feature gate's setting, to ensure the authoritative "DV-Only" rules are enforced. The DeclarativeValidation gate's role is reduced to only controlling whether the system performs a comparison against handwritten rules for the migrated portion of the validation (w/ DeclarativeValidation controlling nothing in these cases and DeclarativeValidationTakeover controlling if handwritten or declarative validation is the authoritative validator). This creates a dual-behavior system for the feature gates, which could be confusing for operators and violates the expectation that a feature gate can fully disable a feature's code path.
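
The change in off-switch semantics can be reduced to one condition. The function below is a hypothetical illustration of that condition, not code from the implementation.

```go
package main

import "fmt"

// RunsDeclarativeCode reports whether the generated declarative validation
// code executes for a type. For migrated-only types the DeclarativeValidation
// gate is a full off-switch; for types with any DV-Only rule the code always
// runs.
func RunsDeclarativeCode(declarativeValidationGate, hasDVOnlyRules bool) bool {
	return declarativeValidationGate || hasDVOnlyRules
}

func main() {
	for _, hasDVOnly := range []bool{false, true} {
		for _, gate := range []bool{false, true} {
			fmt.Printf("hasDVOnly=%t gate=%t runs=%t\n",
				hasDVOnly, gate, RunsDeclarativeCode(gate, hasDVOnly))
		}
	}
}
```

Only the combination of a migrated-only type and a disabled gate skips the declarative code path.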
563+
##### Mitigation: Controlled Scope and Communication
564+
This is a calculated trade-off to enable progress and native adoption of Declarative Validation for new APIs. The mitigation strategy relies on clear distinctions of the implementation patterns and communication:
565+
566+
* **Controlled Scope and Low-Risk Adoption:** The initial scope for "DV-Only" is strictly limited. We will manage risk by targeting:
567+
* **Low-Risk Validations:** We will not use the "DV-Only" approach for new, highly complex validation rules. The focus is on clear, straightforward rules.
568+
* **Low-Risk Fields:** "DV-Only" validations will only be added to net-new fields, which are independently controlled by their own feature gates. This prevents any impact on the stability of existing, stable API fields.
569+
570+
* **Documentation and Communication:** Documentation will clearly describe this dual-mode behavior, explaining when and why the declarative validation code always runs for certain types. This ensures cluster administrators understand the behavior of the declarative validation feature gates.
571+
#### Risk: Panics in Mixed Validation Scenarios Cause Validation to "Fail-Closed"
572+
For API types that mix "DV-Only" rules with migrated DV rules, the behavior in the event of a panic changes significantly. In the existing migration-only case (e.g., ReplicationController and CSR), if a panic occurs in the declarative validation code while DeclarativeValidationTakeover is false, the panic is recovered and ignored. The system "fails open" by falling back to the trusted handwritten validation result.
573+
574+
However, in a mixed validation scenario, the system cannot distinguish whether a panic originated from a "DV-Only" rule or a feature-gated migrated rule. To ensure new APIs are not left with unenforceable validation, any panic in the declarative validation path will cause the entire validation to fail, returning an error to the user. This "fail-closed" behavior is safer for new APIs but means a bug in a migrated rule—which would have previously been safely ignored—could now block the creation or update of new API types that are adopting DV natively.
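
The difference between the two behaviors can be sketched with Go's `recover`. All names are hypothetical. The case of a migration-only type with DeclarativeValidationTakeover enabled is not described in the text above; treating it as fail-closed here is an assumption.

```go
package main

import "fmt"

// runRecovered calls validate and converts a panic into a non-nil error.
func runRecovered(validate func() []string) (errs []string, err error) {
	defer func() {
		if r := recover(); r != nil {
			errs, err = nil, fmt.Errorf("declarative validation panicked: %v", r)
		}
	}()
	return validate(), nil
}

// RunDeclarative applies the panic policy:
//   - no panic: the declarative errors are returned for the caller to merge.
//   - panic on a type with DV-Only rules (or with takeover on, by assumption):
//     fail closed, a fatal error is returned and the request must be rejected.
//   - panic on a migration-only type without takeover: fail open, nothing is
//     returned and the caller relies on the handwritten result alone.
func RunDeclarative(validate func() []string, hasDVOnly, takeover bool) (declErrs []string, fatal error) {
	declErrs, err := runRecovered(validate)
	if err == nil {
		return declErrs, nil
	}
	if hasDVOnly || takeover {
		return nil, err
	}
	return nil, nil
}

func main() {
	buggy := func() []string { panic("nil pointer in a migrated rule") }

	_, fatal := RunDeclarative(buggy, false, false)
	fmt.Println("migration-only:", fatal)

	_, fatal = RunDeclarative(buggy, true, false)
	fmt.Println("mixed with DV-Only:", fatal)
}
```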
575+
##### Mitigation: Controlled Scope, Initial Code Review, and Comprehensive Testing
576+
* **Controlled Scope and Low-Risk Adoption:** The initial scope for "DV-Only" is strictly limited. We will manage risk by targeting:
577+
* **Low-Risk Validations:** We will not use the "DV-Only" approach for “risky” or complex validation rules. The focus is on clear, straightforward rules.
578+
* **Low-Risk Fields:** "DV-Only" validations will only be added to net-new fields (& their validation), which are independently controlled by their own feature gates. This prevents any impact on the stability of existing, stable API fields.
579+
* **Code Review:** The generated declarative validation code is checked into the repository. This makes the code fully reviewable as we start DV-Only, allowing reviewers to catch potential issues before they are merged. Once DV-Only is established, the generated code can be skimmed and assumed correct, similar to other generated code in Kubernetes.
580+
* **Comprehensive Unit and Fuzz Testing:** The generated validation logic for these new types will undergo unit and fuzz testing. The primary goal of this testing is to ensure the code is error-proof and, most importantly, panic-proof, directly addressing the "fail-closed" concern.
581+
471582
## Design Details
472583

473584
### Summary of Declarative Validation Components
@@ -1844,6 +1955,8 @@ If the API server is failing to meet SLOs (latency, validation error-rate, etc.)
18441955
18451956
## Implementation History
18461957
1958+
v1.35: Dual implementation (DV + hand-written) requirement enforced, no DV-Only usage, tag/feature stability data collection and stability codified, validation library for dual implementation
1959+
18471960
## Drawbacks
18481961
18491962
## Alternatives
