diff --git a/keps/prod-readiness/sig-scheduling/5491.yaml b/keps/prod-readiness/sig-scheduling/5491.yaml new file mode 100644 index 00000000000..b77e51ec930 --- /dev/null +++ b/keps/prod-readiness/sig-scheduling/5491.yaml @@ -0,0 +1,6 @@ +# The KEP must have an approver from the +# "prod-readiness-approvers" group +# of http://git.k8s.io/enhancements/OWNERS_ALIASES +kep-number: 5491 +alpha: + approver: "@johnbelamaric" diff --git a/keps/sig-scheduling/5491-dra-list-types-for-attributes/README.md b/keps/sig-scheduling/5491-dra-list-types-for-attributes/README.md new file mode 100644 index 00000000000..9fe55477133 --- /dev/null +++ b/keps/sig-scheduling/5491-dra-list-types-for-attributes/README.md @@ -0,0 +1,1438 @@ + +# KEP-5491: DRA: List Types for Attributes + + + + + + +- [Release Signoff Checklist](#release-signoff-checklist) +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Proposal](#proposal) + - [API Changes](#api-changes) + - [Introduce typed-list in DeviceAttribute](#introduce-typed-list-in--deviceattribute) + - [Introduce matchSemantics in DeviceConstraint](#introduce-matchsemantics-in-deviceconstraint) + - [Introduce distinctSemantics in DeviceConstraint](#introduce-distinctsemantics-in-deviceconstraint) + - [User Stories (Optional)](#user-stories-optional) + - [Story 1: Hardware Topological Aligned CPUs & GPUs & NICs](#story-1-hardware-topological-aligned-cpus--gpus--nics) + - [Story 2](#story-2) + - [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional) + - [Risks and Mitigations](#risks-and-mitigations) +- [Design Details](#design-details) + - [Go Type Definitions](#go-type-definitions) + - [DeviceAttribute](#deviceattribute) + - [DeviceConstraint](#deviceconstraint) + - [Implementation (for evaluating constraints)](#implementation-for-evaluating-constraints) + - [Test Plan](#test-plan) + - [Prerequisite testing updates](#prerequisite-testing-updates) + - [Unit 
tests](#unit-tests) + - [Integration tests](#integration-tests) + - [e2e tests](#e2e-tests) + - [Graduation Criteria](#graduation-criteria) + - [Alpha](#alpha) + - [Beta](#beta) + - [GA](#ga) + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) + - [Version Skew Strategy](#version-skew-strategy) +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) + - [Monitoring Requirements](#monitoring-requirements) + - [Dependencies](#dependencies) + - [Scalability](#scalability) + - [Troubleshooting](#troubleshooting) +- [Implementation History](#implementation-history) +- [Drawbacks](#drawbacks) +- [Alternatives](#alternatives) + - [Just support formatted string list instead of introducing list type](#just-support-formatted-string-list-instead-of-introducing-list-type) + - [Just change matchAttribute / distinctSemantics with compatibility](#just-change-matchattribute--distinctsemantics-with-compatibility) + - [Unified semantics field instead of matchSemantics/distinctSemantics](#unified-semantics-field-instead-of-matchsemanticsdistinctsemantics) +- [Infrastructure Needed (Optional)](#infrastructure-needed-optional) + + +## Release Signoff Checklist + + + +Items marked with (R) are required *prior to targeting to a milestone / release*. 
+ +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) +- [ ] (R) KEP approvers have approved the KEP status as `implementable` +- [ ] (R) Design details are appropriately documented +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) + - [ ] e2e Tests for all Beta API Operations (endpoints) + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free +- [ ] (R) Graduation criteria is in place + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) within one minor version of promotion to GA +- [ ] (R) Production readiness review completed +- [ ] (R) Production readiness review approved +- [ ] "Implementation History" section is up-to-date for milestone +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] +- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes + + + +[kubernetes.io]: https://kubernetes.io/ +[kubernetes/enhancements]: https://git.k8s.io/enhancements +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes +[kubernetes/website]: https://git.k8s.io/website + +## Summary + + + +The Dynamic Resource Allocation (DRA) API currently allows only scalar attribute values to describe device characteristics. However, many real-world device topologies require representing sets of relationships (e.g., multiple PCIe roots, NUMA nodes).
This KEP introduces support for list-typed attributes in `ResourceSlice` and extends `ResourceClaim`'s `constraints` with new declarative fields, `matchSemantics` and `distinctSemantics`, that configure _how_ attribute values are evaluated. + +## Motivation + + + +The `ResourceSlice` API allows users to attach scalar attributes to devices. These can be used to allocate devices that share common topology within the node. For certain types of topological relationships, scalar values are insufficient. For example, a CPU may have adjacency to multiple PCIe roots. This enhancement proposes allowing attributes to be lists. The semantics of the `MatchAttribute` and `DistinctAttribute` constraints must adapt to the possibility of lists. For example, rather than defining an attribute "match" as equality, it would be defined as a non-empty intersection, treating scalars as single-element lists. Conversely, "distinct" attributes for lists would be defined as an empty intersection. + +### Goals + + + +- Support typed lists as device attribute values. +- Define an extensible API in `ResourceClaim`'s `constraints` field which enables flexible matching semantics against a single device attribute value. +- The initially supported semantics are: + - `NonEmptyIntersection` and `Identical` for `matchAttribute` + - `EmptyIntersection`, `PairwiseDisjoint`, and `AllDistinct` for `distinctAttribute` +- Maintain backward compatibility and interoperability for scalar-only attributes. +- Provide an extensible API pattern for future matching semantics. + +### Non-Goals + + + +- Redesigning the entire DRA matching model. + - Currently `Allocator`'s algorithm assumes [_monotonic_ constraints](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L274-L276) only. Monotonic means that once a constraint returns false, adding more devices will never cause it to return true.
This bounds the computational complexity of searching for device combinations that satisfy the specified constraints. This KEP does NOT intend to change this design. Thus, this KEP proposes monotonic constraints only. +- Introducing generic or complex boolean logic in constraints ([KEP-5254: DRA: Constraints with CEL](https://github.com/kubernetes/enhancements/issues/5254)). +- Forcing all drivers to use list attributes immediately. (See ) + +## Proposal + + + + +The proposal has two main parts: + +- Add list types to `DeviceAttribute` so that DRA drivers can expose attribute values as typed lists (`int`, `string`, `bool`, `version`). +- Add `MatchSemantics`/`DistinctSemantics` fields to `DeviceConstraint` so that users can configure flexible semantics over the device attribute value specified in `DeviceConstraint.MatchAttribute`/`DistinctAttribute`: + - The initially supported semantics are: + - For `MatchSemantics`: + - `Identical (∀i,j, v_i = v_j)`: the attribute values among candidate devices are identical, supporting both list-order-sensitive and set-equivalence comparisons via `listMode`. + - `NonEmptyIntersection (∩ v_k != ∅)`: the intersection (as a set) of all the list values among candidate devices is non-empty. The required intersection size is configurable via `minSize`. + - For `DistinctSemantics`: + - `AllDistinct (∀i,j, s.t. i != j, v_i != v_j)`: all the attribute values among candidate devices are distinct, supporting both list-order-sensitive and set-equivalence comparisons via `listMode`. + - `EmptyIntersection (∩ v_k = ∅)`: the intersection (as a set) of all the list values among candidate devices is empty. + - `PairwiseDisjoint (∀i,j, s.t. i != j, v_i ∩ v_j = ∅)`: every pair of the list values (as sets) of candidate devices is disjoint (i.e., no overlap at all). + - Note: for scalar attribute values, `AllDistinct` and `PairwiseDisjoint` with `coerceScalarToList=true` are equivalent.
+ - For backward compatibility (for old DRA drivers that expose scalar values for attributes expected to be lists), `coerceScalarToList` provides implicit type conversion from a scalar to a single-element list. + +### API Changes + +#### Introduce typed-`list` in `DeviceAttribute` + +```yaml +kind: ResourceSlice +spec: + devices: + - name: typed-list-attributes + attributes: + list-of-string: + list: + string: ["pci0000:00", "pci0000:01"] + list-of-int: + list: + int: [0, 1, 2] + list-of-bool: + list: + bool: [true, false, true] + list-of-version: + list: + version: ["1.0.0", "1.0.1"] +``` + +#### Introduce `matchSemantics` in `DeviceConstraint` + +```yaml +kind: ResourceClaim +spec: + constraints: + - requests: [ "device1", "device2", "device3" ] + matchAttribute: "resource.kubernetes.io/pcieRoot" + + # [NEW] + # An optional field that defines customized "match" semantics over attribute values. + # This field must not be set when "distinctAttribute" is set + matchSemantics: + # mode specifies the "match" semantics + # Identical (∀i,j, v_i = v_j): + # All the attribute values among candidate devices are identical, + # supporting both list-order-sensitive and set-equivalence comparisons via `listMode`. + # NonEmptyIntersection (|∩ v_i| >= k (>=1)): + # The intersection (as a set) of list values among candidate devices is non-empty. + # The required intersection size is configurable via `minSize`. + # For future possible cases: + # - CommonPrefix/Suffix with customizable length + # - Identical for aggregated values of the list items (min/max/sum/length) + mode: Identical | NonEmptyIntersection + + options: + nonEmptyIntersection: + # if true, implicit cast from scalar to list will be performed. The default is false. + coerceScalarToList: true | false + # minSize specifies the minimum size of the intersection to evaluate as true. + # Default is 1. The value must be a positive integer.
+ minSize: 1 + identical: + coerceScalarToList: true | false # common option + # listMode specifies whether equality is evaluated as a set (order/duplicates are ignored) or as a list (order significant). Default is List + listMode: List | Set +``` +Examples of match semantics mode: + +| attribute values | `Identical` | `NonEmptyIntersection`
(`coerceScalarToList=true`) | +|:--:|:--:|:--:| +| `d1="a"`, `d2="b"` | `false` | `false` | +| `d1=["a", "b"]` , `d2=["b", "a"]` | `false`(`listMode: List`)
`true`(`listMode: Set`) | `true`
(`d1 ∩ d2 = {"a", "b"}`) | +| `d1=["a", "b"]` , `d2=["a", "c"]`| `false` | `true`
(`d1 ∩ d2 = {"a"}`) | +| `d1=["a", "b"]` , `d2=["c", "d"]` | `false` | `false`
(`d1 ∩ d2 = ∅`) | + +#### Introduce `distinctSemantics` in `DeviceConstraint` + +```yaml +kind: ResourceClaim +spec: + constraints: + - requests: [ "device1", "device2", "device3" ] + distinctAttribute: "resource.kubernetes.io/numaNode" # note: this is an imaginary attribute. + + # [NEW] + # An optional field that defines customized "distinct" semantics over attribute values. + # This field must not be set when "matchAttribute" is set + distinctSemantics: + # mode specifies the "distinct" semantics + # `AllDistinct`: + # All the values are distinct, supporting both list-order-sensitive and set-equivalence comparisons via `listMode`. + # (i.e. ∀i,j s.t. i ≠ j, v_i != v_j), + # `EmptyIntersection`: + # The intersection (as a set) of all the list values among candidate devices is empty. (i.e. ∩ v_k = ∅ ) + # `PairwiseDisjoint`: + # Every pair of the list values (as a set) of candidate devices is disjoint (i.e. no overlap at all). + # (i.e. ∀i,j s.t. i ≠ j, v_i ∩ v_j = ∅), + # For future possible cases: + # - NoCommonPrefix/Suffix, PairwiseDisjointPrefix/Suffix with customizable length + # - AllDistinct for aggregated values of the list items (min/max/sum/length) + mode: AllDistinct | EmptyIntersection | PairwiseDisjoint + + options: + allDistinct: + coerceScalarToList: true | false # common option + # listMode specifies whether equality is evaluated as a set (order/duplicates are ignored) or as a list (order significant). Default is List + listMode: List | Set + emptyIntersection: + coerceScalarToList: true | false # common option + pairwiseDisjoint: + coerceScalarToList: true | false # common option +``` + +Examples of distinct semantics mode: + +| attribute values | `AllDistinct` | `PairwiseDisjoint`
(`coerceScalarToList=true`) | `EmptyIntersection`
(`coerceScalarToList=true`) | +|:--:|:--:|:--:|:--:| +| `d1="a"`, `d2="b"` | `true` | `true` | `true` | +| `d1=["a", "b"]` , `d2=["b", "a"]` | `true`(`listMode: List`)
`false`(`listMode: Set`) | `false`
(`d1 ∩ d2={"a","b"}`) | `false`
(`∩dk={"a","b"}`) | +| `d1=["a", "b"]` , `d2=["a", "c"]`, `d3=["a", "d"]` | `true` | `false`
(`di ∩ dj = {"a"} ≠ ∅`) | `false`
(`∩ dk = {"a"} ≠ ∅`) | +| `d1=["a", "b"]` , `d2=["b", "c"]`, `d3=["c", "a"]` | `true` | `false`
(`di ∩ dj ≠ ∅`) | `true`
(`∩ dk = ∅`) | +| `d1=["a", "b"]` , `d2=["c", "d"]`, `d3=["e", "f"]` | `true` | `true`
(`di ∩ dj = ∅`) | `true`
(`∩ dk = ∅`) | + +### User Stories (Optional) + + + +#### Story 1: Hardware Topological Aligned CPUs & GPUs & NICs + +Assume several DRA drivers expose the device attribute `resource.kubernetes.io/pcieRoot`: + +```yaml +apiVersion: resource.k8s.io/v1 +kind: ResourceSlice +metadata: + name: cpu +spec: + driver: "cpu.example.com" + pool: + name: "cpu" + resourceSliceCount: 1 + nodeName: node-1 + devices: + - name: "cpu-0" + attributes: + resource.kubernetes.io/pcieRoot: + list: + string: + - pci0000:01 + - pci0000:02 + - name: "cpu-1" + attributes: + resource.kubernetes.io/pcieRoot: + list: + string: + - pci0000:01 + - pci0000:02 +--- +apiVersion: resource.k8s.io/v1 +kind: ResourceSlice +metadata: + name: gpu +spec: + driver: "gpu.example.com" + pool: + name: "gpu" + resourceSliceCount: 1 + nodeName: node-1 + devices: + - name: "gpu-0" + attributes: + # Assume this is an older driver that still exposes a scalar string for the attribute + resource.kubernetes.io/pcieRoot: + string: pci0000:01 +--- +apiVersion: resource.k8s.io/v1 +kind: ResourceSlice +metadata: + name: nic +spec: + driver: "nic.example.com" + pool: + name: "nic" + resourceSliceCount: 1 + nodeName: node-1 + devices: + - name: "nic-0" + attributes: + # Assume this is an older driver that still exposes a scalar string for the attribute + resource.kubernetes.io/pcieRoot: + string: pci0000:02 +``` + +Then, a user can create a `ResourceClaim` that requests a PCIe-topology-aligned CPU, GPU, and NIC, as below: + +```yaml +apiVersion: resource.k8s.io/v1 +kind: ResourceClaim +spec: + requests: + - name: "gpu" + exactly: + deviceClassName: gpu.example.com + count: 1 + - name: "nic" + exactly: + deviceClassName: nic.example.com + count: 1 + - name: "cpu" + exactly: + deviceClassName: cpu.example.com + count: 2 + constraints: + - requests: ["gpu", "nic", "cpu"] + matchAttribute: resource.kubernetes.io/pcieRoot + matchSemantics: + mode: NonEmptyIntersection + options: + nonEmptyIntersection: + coerceScalarToList: true +``` + +#### Story
2 + +T.B.D. + +### Notes/Constraints/Caveats (Optional) + + + +### Risks and Mitigations + + +- Risk 1: Driver adoption lag + - Mitigation: Keep scalar compatibility; allow opt-in `coerceScalarToList` +- Risk 2: Scheduler performance overhead + - Mitigation: Bound the length of list-typed attribute values (at most 64 items per list) + +## Design Details + + + +### Go Type Definitions + +#### `DeviceAttribute` + +```go +type DeviceAttribute struct { + ... + // ListValue is a typed list. + // + // +optional + // +k8s:optional + // +k8s:unionMember + ListValue *ListAttribute `json:"list,omitempty"` +} + +// ListAttribute defines a typed-list value for device attributes. +type ListAttribute struct { + // IntValue is a list of numbers. + // + // +optional + // +k8s:optional + // +k8s:unionMember + // +k8s:listType=atomic + // +k8s:maxItems=64 + IntValue []int64 `json:"int,omitempty"` + + // BoolValue is a list of true/false values. + // + // +optional + // +k8s:optional + // +k8s:unionMember + // +k8s:listType=atomic + // +k8s:maxItems=64 + BoolValue []bool `json:"bool,omitempty"` + + // StringValue is a list of strings. + // Each string must not be longer than 64 characters. + // + // +optional + // +k8s:optional + // +k8s:unionMember + // +k8s:listType=atomic + // +k8s:maxItems=64 + // +k8s:eachVal=+k8s:maxLength=64 + StringValue []string `json:"string,omitempty"` + + // VersionValue is a list of semantic versions according to semver.org spec 2.0.0. + // Each version string must not be longer than 64 characters. + // + // +optional + // +k8s:optional + // +k8s:unionMember + // +k8s:listType=atomic + // +k8s:maxItems=64 + // +k8s:eachVal=+k8s:maxLength=64 + VersionValue []string `json:"version,omitempty"` +} +``` + +#### `DeviceConstraint` + +```go +// DeviceConstraint must have exactly one field set besides Requests. +type DeviceConstraint struct { + ... + // MatchAttribute specifies the name of the device attribute whose + // values across those devices must be "matched".
+ // + // The semantics of "match" can be configured by the MatchSemantics field. + // When MatchSemantics is not specified, it requires that all devices in + // question have this attribute and that its type and value are the same + // across those devices. + // + // For example, if you specified "dra.example.com/numa" (a hypothetical example!), + // then only devices in the same NUMA node will be chosen. A device which + // does not have that attribute will not be chosen. All devices should + // use a value of the same type for this attribute because that is part of + // its specification, but if one device doesn't, then it also will not be + // chosen. + // + // Must include the domain qualifier. + // + // +optional + // +oneOf=ConstraintType + // +k8s:optional + // +k8s:format=k8s-resource-fully-qualified-name + MatchAttribute *FullyQualifiedName `json:"matchAttribute,omitempty"` + + // DistinctAttribute specifies the name of the device attribute whose + // values across the devices must be "distinct". + // + // The semantics of "distinct" can be configured by the DistinctSemantics field. + // + // This constraint is used to avoid allocating multiple requests to the same device + // by ensuring attribute-level differentiation. + // + // This is useful for scenarios where resource requests must be fulfilled by separate physical devices. + // For example, a container requests two network interfaces that must be allocated from two different physical NICs. + // + // +optional + // +oneOf=ConstraintType + // +featureGate=DRAConsumableCapacity + DistinctAttribute *FullyQualifiedName `json:"distinctAttribute,omitempty"` + + // MatchSemantics specifies the semantics under which the device attribute values are evaluated as "matched". + // This must not be set when DistinctAttribute is set.
+ // + // +optional + // +featureGate=DRAListTypeAttributes + MatchSemantics *MatchSemantics `json:"matchSemantics,omitempty"` + + // DistinctSemantics specifies the semantics under which the device attribute values are evaluated as "distinct". + // This must not be set when MatchAttribute is set. + // + // +optional + // +featureGate=DRAListTypeAttributes + DistinctSemantics *DistinctSemantics `json:"distinctSemantics,omitempty"` +} + +// +// MatchSemantics +// + +// MatchMode is a semantic mode for evaluating "MatchAttribute" +// +k8s:enum +type MatchMode string +const ( + // MatchModeNonEmptyIntersection evaluates to true if there is a non-empty set intersection among devices. + MatchModeNonEmptyIntersection MatchMode = "NonEmptyIntersection" + // MatchModeIdentical evaluates to true if attribute values are exactly equal (set or list semantics). + MatchModeIdentical MatchMode = "Identical" +) + +// MatchSemantics defines how attribute values across devices +// are evaluated as "matched". +type MatchSemantics struct { + // Mode specifies the semantic mode for matching device attributes. + // +required + Mode MatchMode `json:"mode"` + + // Options configures the behavior of the specified match semantic mode. + // +optional + Options MatchOptions `json:"options"` +} + +// MatchOptions holds the configuration for each mode type. Only one option struct +// should be specified at a time, matching the selected MatchMode. +type MatchOptions struct { + // NonEmptyIntersection specifies the options for NonEmptyIntersection mode + // + // +k8s:zeroOrOneOfMember + // +optional + NonEmptyIntersection *NonEmptyIntersectionOptions `json:"nonEmptyIntersection,omitempty"` + + // Identical specifies the options for Identical mode + // + // +k8s:zeroOrOneOfMember + // +optional + Identical *IdenticalOptions `json:"identical,omitempty"` +} + +// SemanticsModeCommonOptions defines shared behavior for attribute evaluation.
+type SemanticsModeCommonOptions struct { + // CoerceScalarToList, when true, treats scalar attributes as single-element lists during evaluation. + // This exists mainly for interoperability with old DRA drivers that advertise scalar values for the same + // device attribute. Default is false. + CoerceScalarToList bool `json:"coerceScalarToList"` +} + +// NonEmptyIntersectionOptions configures options for NonEmptyIntersection mode. +type NonEmptyIntersectionOptions struct { + SemanticsModeCommonOptions `json:",inline"` + + // MinSize defines the minimum required size of intersection among candidate devices. + // Default: 1. + // + // +k8s:minimum=1 + MinSize int `json:"minSize"` +} + +// AttributeCompareListMode defines the equality evaluation strategy for Identical and AllDistinct modes when the attribute value type is list. +// +k8s:enum +type AttributeCompareListMode string +const ( + // AttributeCompareListModeSet compares attributes as sets, ignoring duplicates and order. + AttributeCompareListModeSet AttributeCompareListMode = "Set" + // AttributeCompareListModeList compares attributes as ordered lists, considering sequence and duplicates. + AttributeCompareListModeList AttributeCompareListMode = "List" +) + +// IdenticalOptions configures options for Identical mode. +type IdenticalOptions struct { + SemanticsModeCommonOptions `json:",inline"` + + // ListMode specifies whether equality is evaluated as a set (duplicates/order ignored) + // or as an ordered list. + // Default: List. + ListMode AttributeCompareListMode `json:"listMode"` +} + +// +// DistinctSemantics +// + +// DistinctMode is a semantic mode for evaluating "DistinctAttribute" +// +k8s:enum +type DistinctMode string +const ( + // DistinctAllDistinct evaluates to true if all the values are distinct (i.e. ∀i,j s.t. i ≠ j, v_i ≠ v_j) + DistinctAllDistinct DistinctMode = "AllDistinct" + // DistinctModeEmptyIntersection evaluates to true if the intersection of all the values among devices is empty. (i.e.
∩v_k = ∅) + DistinctModeEmptyIntersection DistinctMode = "EmptyIntersection" + // DistinctPairwiseDisjoint evaluates to true if every pair of attribute values of devices is disjoint. (i.e. ∀i,j s.t. i ≠ j, v_i∩v_j = ∅) + DistinctPairwiseDisjoint DistinctMode = "PairwiseDisjoint" +) + +// DistinctSemantics defines how attribute values across devices +// are evaluated as "distinct". +type DistinctSemantics struct { + // Mode specifies the semantic mode for evaluating attribute values as "distinct". + // +required + Mode DistinctMode `json:"mode"` + + // Options configures the behavior of the specified distinct semantic mode. + // +optional + Options DistinctOptions `json:"options"` +} + +// DistinctOptions holds the configuration for each mode type. Only one option struct +// should be specified at a time, matching the selected DistinctMode. +type DistinctOptions struct { + // AllDistinct specifies the options for AllDistinct mode + // + // +k8s:zeroOrOneOfMember + // +optional + AllDistinct *IdenticalOptions `json:"allDistinct,omitempty"` + + // EmptyIntersection specifies the options for EmptyIntersection mode + // + // +k8s:zeroOrOneOfMember + // +optional + EmptyIntersection *SemanticsModeCommonOptions `json:"emptyIntersection,omitempty"` + + // PairwiseDisjoint specifies the options for PairwiseDisjoint mode + // + // +k8s:zeroOrOneOfMember + // +optional + PairwiseDisjoint *SemanticsModeCommonOptions `json:"pairwiseDisjoint,omitempty"` +} +``` + +### Implementation (for evaluating constraints) + +Because all the proposed constraints are _monotonic_, we do not need to update the [`Allocator.Allocate()` algorithm](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L135) and can keep using the [`constraint`
interface](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L703-L712). We will just extend the current [`matchAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/allocator_experimental.go#L721C6-L728) and [`distinctAttributeConstraint`](https://github.com/kubernetes/kubernetes/blob/v1.34.2/staging/src/k8s.io/dynamic-resource-allocation/structured/internal/experimental/constraint.go#L34-L41) instances. Or, we could introduce `constraint` instances for proposed modes (e.g., `nonEmptyIntersectionMatchAttributeConstraint`, etc.). + +### Test Plan + + + +[x] I/we understand the owners of the involved components may require updates to +existing tests to make this code solid enough prior to committing the changes necessary +to implement this enhancement. + +##### Prerequisite testing updates + + + +##### Unit tests + + + + + +- ``: `` - `` + +##### Integration tests + + + + + +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/integration/...): [integration master](https://testgrid.k8s.io/sig-release-master-blocking#integration-master?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +##### e2e tests + + + +- [test name](https://github.com/kubernetes/kubernetes/blob/2334b8469e1983c525c0c6382125710093a25883/test/e2e/...): [SIG ...](https://testgrid.k8s.io/sig-...?include-filter-by-regex=MyCoolFeature), [triage search](https://storage.googleapis.com/k8s-triage/index.html?test=MyCoolFeature) + +### Graduation Criteria + + + +#### Alpha + +- Feature implemented behind a feature flag (`DRAListTypeAttributes`). The Feature gate is disabled by default. 
+- Documentation provided +- Initial unit, integration and e2e tests completed and enabled. + +#### Beta + +- Feature gate is enabled by default. +- No major outstanding bugs. +- 1 example of a real-world use case. +- Feedback collected from the community (developers and users) with adjustments provided, implemented and tested. + +#### GA + +- 2 examples of real-world use cases. +- Allowing time for feedback from developers and users. + +### Upgrade / Downgrade Strategy + + + +### Version Skew Strategy + + + +For upgrade, existing `ResourceClaim`/`ResourceSlice` objects will still work as expected, because they do not contain the new fields. + +For downgrade, caution is needed when a `ResourceClaim` with a `matchSemantics`/`distinctSemantics` field or a `ResourceSlice` with `list`-type attribute values exists. Already-allocated claims are not affected, but on re-allocation `matchSemantics`/`distinctSemantics` will be ignored, and if the attribute specified in `matchAttribute`/`distinctAttribute` is of `list` type, allocation will fail. + +## Production Readiness Review Questionnaire + + + +### Feature Enablement and Rollback + + + +###### How can this feature be enabled / disabled in a live cluster? + + + +- [x] Feature gate (also fill in values in `kep.yaml`) + - Feature gate name: `DRAListTypeAttributes` + - Components depending on the feature gate: kube-apiserver, kube-scheduler +- [ ] Other + - Describe the mechanism: + - Will enabling / disabling the feature require downtime of the control + plane? + - Will enabling / disabling the feature require downtime or reprovisioning + of a node? + +###### Does enabling the feature change any default behavior? + + + +No. It only introduces new API fields in `ResourceClaim` and `ResourceSlice`, which does NOT change the default behavior. + +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? + + + +Yes. Until Beta, it can be disabled after being enabled.
When disabled, you cannot create a `ResourceClaim` with `matchSemantics`/`distinctSemantics` or a `DeviceAttribute` with `list`-type values. Existing `list`-type attribute values and existing `matchSemantics`/`distinctSemantics` fields are ignored. However, if the attribute specified in `matchAttribute`/`distinctAttribute` is of `list` type, allocation will fail. + +###### What happens if we reenable the feature if it was previously rolled back? + +`list`-type attribute values in `DeviceAttribute` and `matchSemantics`/`distinctSemantics` in `ResourceClaim` will be available again. + +###### Are there any tests for feature enablement/disablement? + + + +Yes, it will be covered by [Unit tests](#unit-tests). + +### Rollout, Upgrade and Rollback Planning + + + +###### How can a rollout or rollback fail? Can it impact already running workloads? + + + +###### What specific metrics should inform a rollback? + + + +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? + + + +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? + + + +### Monitoring Requirements + + + +###### How can an operator determine if the feature is in use by workloads? + + + +###### How can someone using this feature know that it is working for their instance? + + + +- [ ] Events + - Event Reason: +- [ ] API .status + - Condition name: + - Other field: +- [ ] Other (treat as last resort) + - Details: + +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? + + + +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? + + + +- [ ] Metrics + - Metric name: + - [Optional] Aggregation method: + - Components exposing the metric: +- [ ] Other (treat as last resort) + - Details: + +###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+ + + +### Dependencies + + + +###### Does this feature depend on any specific services running in the cluster? + + + +### Scalability + + + +###### Will enabling / using this feature result in any new API calls? + + + +No + +###### Will enabling / using this feature result in introducing new API types? + + + +No + +###### Will enabling / using this feature result in any new calls to the cloud provider? + + + +No + +###### Will enabling / using this feature result in increasing size or count of the existing API objects? + + +Yes and no. It adds new fields, which increase the worst-case size of `ResourceSlice` and `ResourceClaim` objects. However, the size increase is bounded in most cases: +- `ResourceClaim`: linear in the number of constraints specified in the resource. +- `ResourceSlice`: linear in the number of devices defined in the resource. The number of list items is also bounded. + +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? + + + +Not expected. All the constraints proposed in this KEP are _monotonic_. Thus, the worst-case computational complexity of the device search is unchanged. + +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? + + + +No. + +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? + + + +No. + +### Troubleshooting + + + +###### How does this feature react if the API server and/or etcd is unavailable? + +###### What are other known failure modes? + + + +###### What steps should be taken if SLOs are not being met to determine the problem? + +## Implementation History + + + +## Drawbacks + + + +## Alternatives + + + +### Just support formatted string list instead of introducing `list` type + +We could add pseudo-list support for string-type attributes only (e.g., a comma-separated string).
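To make the trade-off of this alternative concrete, here is a minimal sketch of the parsing every consumer of such an attribute would need. The helper name `parseStringList`, the comma separator, and the rule that empty items are rejected are all assumptions for illustration; none of them is part of the DRA API or of this KEP.

```go
package main

import (
	"fmt"
	"strings"
)

// parseStringList is a hypothetical helper (not part of any DRA API).
// It splits a comma-separated attribute value into items, trims
// surrounding whitespace, and rejects empty items.
func parseStringList(raw string) ([]string, error) {
	parts := strings.Split(raw, ",")
	items := make([]string, 0, len(parts))
	for _, p := range parts {
		item := strings.TrimSpace(p)
		if item == "" {
			return nil, fmt.Errorf("empty item in %q", raw)
		}
		items = append(items, item)
	}
	return items, nil
}

func main() {
	// The second and third inputs show how easily a formatted string
	// can be malformed (doubled or trailing separator).
	inputs := []string{
		"pci0000:01,pci0000:02",
		"pci0000:01,,pci0000:02",
		"pci0000:01, ",
	}
	for _, raw := range inputs {
		items, err := parseStringList(raw)
		fmt.Println(items, err)
	}
}
```

With a typed `list`, this validation would move to the API server instead of being repeated (and possibly diverging) in each consumer.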
+ +- Pros: + - Simple, no change in `DeviceAttribute` +- Cons: + - String lists only (can't support lists of int/version) + - Prone to misformatted strings + - Extra parsing computation + +### Just change `matchAttribute` / `distinctSemantics` with compatibility + +We could have the `DRAListTypeAttributes` feature gate simply change the semantics in a backward-compatible manner. That is: + +- Scalar values are implicitly cast to sets, and +- for `matchAttribute`, the constraint is satisfied when a non-empty intersection exists +- for `distinctAttribute`, the constraint is satisfied when the elements are pairwise disjoint + +This would introduce constraints for list-type attributes while maintaining backward compatibility. + +- Pros: + - Minimal API change + - Satisfies the currently expected use case (align GPU & NIC & CPU with `resource.kubernetes.io/pcieRoot`) +- Cons: + - Not extensible + - Not flexible + - Hard to evolve for new constraint requirements in the future + +### Unified `semantics` field instead of `matchSemantics`/`distinctSemantics` + +We could consider a unified `semantics` field for both `matchAttribute` and `distinctAttribute`, like below: + +```yaml +semantics: + mode: NonEmptyIntersection | EmptyIntersection | Identical | AllDistinct | PairwiseDisjoint +``` + +- Pros: + - Simple +- Cons: + - Confusing as to which modes are valid for `matchAttribute` or `distinctAttribute` + - Extra validation logic + +## Infrastructure Needed (Optional) + + diff --git a/keps/sig-scheduling/5491-dra-list-types-for-attributes/kep.yaml b/keps/sig-scheduling/5491-dra-list-types-for-attributes/kep.yaml new file mode 100755 index 00000000000..061476b9d92 --- /dev/null +++ b/keps/sig-scheduling/5491-dra-list-types-for-attributes/kep.yaml @@ -0,0 +1,58 @@ +title: "DRA: List Types for Attributes" +kep-number: 5491 +authors: + - "@everpeace" +owning-sig: sig-scheduling +participating-sigs: [] +# status: provisional|implementable|implemented|deferred|rejected|withdrawn|replaced +status: implementable
+creation-date: 2025-11-14 + +reviewers: + - "@johnbelamaric" + - "@klueska" + - "@pohly" + - "@pravk03" +approvers: + - "@johnbelamaric" # WG-Device-Management + - "@klueska" # WG-Device-Management + - "@pohly" # WG-Device-Management + - "@dom4ha" # SIG-Scheduling + - "@liggitt" # API Review + +see-also: + - "/keps/sig-node/4381-dra-structured-parameters" + - "/keps/sig-scheduling/5075-dra-consumable-capacity" + +replaces: [] + +# The target maturity stage in the current dev cycle for this KEP. +# If the purpose of this KEP is to deprecate a user-visible feature +# and a Deprecated feature gates are added, they should be deprecated|disabled|removed. +# stage: alpha|beta|stable +stage: alpha + +# The most recent milestone for which work toward delivery of this KEP has been +# done. This can be the current (upcoming) milestone, if it is being actively +# worked on. +latest-milestone: "v1.36" + +# The milestone at which this feature was, or is targeted to be, at each stage. +milestone: + alpha: "v1.36" + beta: "" + stable: "" + +# The following PRR answers are required at alpha release +# List the feature gate name and the components for which it must be enabled +feature-gates: + - name: DRAListTypeAttributes + components: + - kube-apiserver + - kube-scheduler + +disable-supported: true + +# The following PRR answers are required at beta release +metrics: [] +# - my_feature_metric