
Commit a1382ec

DRA: promotion to beta

This updates the README to reflect what has been done and fills in sections that were left out earlier. The next milestone is 1.29.

1 parent e950f76

File tree

3 files changed, +99 -50 lines changed


keps/prod-readiness/sig-node/3063.yaml

Lines changed: 2 additions & 0 deletions
@@ -4,3 +4,5 @@
 kep-number: 3063
 alpha:
   approver: "@johnbelamaric"
+beta:
+  approver: "@johnbelamaric"

keps/sig-node/3063-dynamic-resource-allocation/README.md

Lines changed: 96 additions & 49 deletions
@@ -721,8 +721,8 @@ For a resource driver the following components are needed:
 - *Resource kubelet plugin*: a component which cooperates with kubelet to prepare
   the usage of the resource on a node.
 
-An utility library for resource drivers will be developed outside of Kubernetes
-and does not have to be used by drivers, therefore it is not described further
+A [utility library](https://github.com/kubernetes/kubernetes/tree/master/staging/src/k8s.io/dynamic-resource-allocation) for resource drivers was developed.
+It does not have to be used by drivers, so it is not described further
 in this KEP.
 
 ### State and communication
@@ -962,14 +962,6 @@ arbitrarily. Some combinations are more useful than others:
 
 ### Coordinating resource allocation through the scheduler
 
-<<[UNRESOLVED pohly]>>
-The entire scheduling section is tentative. Key opens:
-- Support arbitrary combinations of user- vs. Kubernetes-managed ResourceClaims
-  and immediate vs. late allocation?
-  https://github.com/kubernetes/enhancements/pull/3064#discussion_r901948474
-<<[/UNRESOLVED]>>
-
-
 For immediate allocation, scheduling Pods is simple because the
 resource is already allocated and determines the nodes on which the
 Pod may run. The downside is that pod scheduling is less flexible.
@@ -1399,16 +1391,6 @@ type AllocationResult struct {
 	// than one consumer at a time.
 	// +optional
 	Shareable bool
-
-	<<[UNRESOLVED pohly]>>
-	We will have to discuss use cases and real resource drivers that
-	support sharing before deciding on a) which limit is useful and
-	b) whether we need a different API that supports an unlimited
-	number of users.
-
-	Any solution that handles reservations differently will have to
-	be very careful about race conditions.
-	<<[/UNRESOLVED]>>
 }
 
 // AllocationResultResourceHandlesMaxSize represents the maximum number of
@@ -2426,13 +2408,23 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
-The existing integration tests for kube-scheduler and kubelet will get extended
-to cover scenarios involving dynamic resources. A new integration test will get
-added for the dynamic resource controller.
+The existing [integration tests for kube-scheduler which measure
+performance](https://github.com/kubernetes/kubernetes/tree/master/test/integration/scheduler_perf#readme)
+were extended to also [cover
+DRA](https://github.com/kubernetes/kubernetes/blob/294bde0079a0d56099cf8b8cf558e3ae7230de12/test/integration/scheduler_perf/config/performance-config.yaml#L717-L779)
+and to run as [correctness
+tests](https://github.com/kubernetes/kubernetes/commit/cecebe8ea2feee856bc7a62f4c16711ee8a5f5d9)
+as part of the normal Kubernetes "integration" jobs. That also covers [the
+dynamic resource
+controller](https://github.com/kubernetes/kubernetes/blob/294bde0079a0d56099cf8b8cf558e3ae7230de12/test/integration/scheduler_perf/util.go#L135-L139).
+
+The integration tests for kubelet were extended to cover scenarios involving dynamic resources.
 
 For beta:
 
-- <test>: <link to test coverage>
+- kube-scheduler, kube-controller-manager: http://perf-dash.k8s.io/#/, [`k8s.io/kubernetes/test/integration/scheduler_perf.scheduler_perf`](https://testgrid.k8s.io/sig-release-master-blocking#integration-master)
+- kubelet: ...
+
 
 ##### e2e tests
 
@@ -2447,12 +2439,12 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
 -->
 
 End-to-end testing depends on a working resource driver and a container runtime
-with CDI support. A mock driver will be developed in parallel to developing the
-code in Kubernetes, but as it will depend on the new APIs, we have to get those
-merged first.
+with CDI support. A [test driver](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/dra/test-driver)
+was developed in parallel with the
+code in Kubernetes.
 
-Such a mock driver could be as simple as taking parameters from ResourceClass
-and ResourceClaim and turning them into environment variables that then get
+That test driver simply takes parameters from ResourceClass
+and ResourceClaim and turns them into environment variables that then get
 checked inside containers. Tests for different behavior of a driver in various
 scenarios can be simulated by running the control-plane part of it in the E2E
 test itself. For interaction with kubelet, proxying of the gRPC interface can
@@ -2465,14 +2457,11 @@ All tests that don't involve actually running a Pod can become part of
 conformance testing. Those tests that run Pods cannot be because CDI support in
 runtimes is not required.
 
-Once we have end-to-end tests, at least two Prow jobs will be defined:
-- A pre-merge job that will be required and run only for the in-tree code of
-  this KEP (`optional: false`, `run_if_changed` set, `always_run: false`).
-- A periodic job that runs the same tests to determine stability and detect
-  unexpected regressions.
-
 For beta:
-- <test>: <link to test coverage>
+- pre-merge with kind (optional, triggered for code which has an impact on DRA): https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#pull-kind-dra
+- periodic with kind: https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#ci-kind-dra
+- pre-merge with CRI-O: https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#pull-node-dra
+- periodic with CRI-O: https://testgrid.k8s.io/sig-node-dynamic-resource-allocation#ci-node-e2e-crio-dra
 
 ### Graduation Criteria

@@ -2602,7 +2591,7 @@ There will be pods which have a non-empty PodSpec.ResourceClaims field and Resou
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 For kube-controller-manager, metrics similar to the generic ephemeral volume
-controller will be added:
+controller [were added](https://github.com/kubernetes/kubernetes/blob/163553bbe0a6746e7719380e187085cf5441dfde/pkg/controller/resourceclaim/metrics/metrics.go#L32-L47):
 
 - [X] Metrics
   - Metric name: `resource_controller_create_total`
@@ -2729,7 +2718,65 @@ already received all the relevant updates (Pod, ResourceClaim, etc.).
 
 ###### What are other known failure modes?
 
-To be added for beta.
+- DRA driver does not or cannot allocate a resource claim.
+
+  - Detection: The primary mechanism is vendor-provided monitoring for
+    their driver. That monitoring needs to include the health of the driver,
+    the availability of the underlying resource, etc. The common helper code
+    for DRA drivers posts events for a ResourceClaim when an allocation
+    attempt fails.
+
+    When pods fail to get scheduled, kube-scheduler reports that through events
+    and pod status. For DRA, that includes "waiting for resource driver to
+    provide information" (node not selected yet) and "waiting for resource
+    driver to allocate resource" (node has been selected). The
+    ["unschedulable_pods"](https://github.com/kubernetes/kubernetes/blob/9fca4ec44afad4775c877971036b436eef1a1759/pkg/scheduler/metrics/metrics.go#L200-L206)
+    metric will have pods counted under the "dynamicresources" plugin label.
+
+    To troubleshoot, "kubectl describe" can be used on (in this order) Pod,
+    ResourceClaim, and PodSchedulingContext.
+
+  - Mitigations: This depends on the vendor of the DRA driver.
+
+  - Diagnostics: In kube-scheduler, -v=4 enables simple progress reporting
+    in the "dynamicresources" plugin. -v=5 provides more information about
+    each plugin method. The special status results mentioned above also get
+    logged.
+
+  - Testing: E2E testing covers various scenarios that involve waiting
+    for a DRA driver. This also simulates partial allocation of node-local
+    resources in one driver and then failing to allocate the remaining
+    resources in another driver (the "need to deallocate" fallback).
+
+- A Pod gets scheduled without allocating resources.
+
+  - Detection: The Pod either fails to start (when kubelet has DRA
+    enabled) or gets started without the resources (when kubelet doesn't
+    have DRA enabled), which then will fail in an application-specific
+    way.
+
+  - Mitigations: DRA must get enabled properly in kubelet and kube-controller-manager.
+    Then kube-controller-manager will try to allocate and reserve resources for
+    already scheduled pods. To prevent this from happening for new pods, DRA
+    must get enabled in kube-scheduler.
+
+  - Diagnostics: kubelet will log pods without allocated resources as errors
+    and emit events for them.
+
+  - Testing: An E2E test covers the expected behavior of kubelet and
+    kube-controller-manager by creating a pod with `spec.nodeName` already set.
+
+- A DRA driver kubelet plugin fails to prepare resources.
+
+  - Detection: The Pod fails to start after being scheduled.
+
+  - Mitigations: This depends on the specific DRA driver and has to be documented
+    by vendors.
+
+  - Diagnostics: kubelet will log pods with such errors and emit events for them.
+
+  - Testing: An E2E test covers the expected retry mechanism in kubelet when
+    `NodePrepareResources` fails intermittently.
 
 <!--
 For each of them, fill in the following information by copying the below template:
@@ -2746,20 +2793,20 @@ For each of them, fill in the following information by copying the below templat
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
-To be added for beta.
+Performance depends to a large extent on how individual DRA drivers are
+implemented. Vendors will have to provide their own SLOs and troubleshooting
+instructions.
 
 ## Implementation History
 
-<!--
-Major milestones in the lifecycle of a KEP should be tracked in this section.
-Major milestones might include:
-- the `Summary` and `Motivation` sections being merged, signaling SIG acceptance
-- the `Proposal` section being merged, signaling agreement on a proposed design
-- the date implementation started
-- the first Kubernetes release where an initial version of the KEP was available
-- the version of Kubernetes where the KEP graduated to general availability
-- when the KEP was retired or superseded
--->
+- Kubernetes 1.25: KEP accepted as "implementable".
+- Kubernetes 1.26: Code merged as "alpha".
+- Kubernetes 1.27: API breaks (batching of NodePrepareResource in the kubelet API,
+  AllocationResult in ResourceClaim status can provide results for multiple
+  drivers).
+- Kubernetes 1.28: API break (ResourceClaim names for claims created from
+  a template are generated instead of deterministic), scheduler performance
+  enhancements (no more backoff delays).
 
 ## Drawbacks

keps/sig-node/3063-dynamic-resource-allocation/kep.yaml

Lines changed: 1 addition & 1 deletion
@@ -24,7 +24,7 @@ stage: alpha
 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.28"
+latest-milestone: "v1.29"
 
 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
