Commit c1f4e66

addressed review feedback
1 parent c37374e commit c1f4e66

File tree

1 file changed

+15
-14
lines changed
  • keps/sig-scheduling/5004-dra-extended-resource

keps/sig-scheduling/5004-dra-extended-resource/README.md

Lines changed: 15 additions & 14 deletions
@@ -51,19 +51,19 @@
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
-- [ ] e2e Tests for all Beta API Operations (endpoints)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
 - [ ] (R) Graduation criteria is in place
 - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
-- [ ] (R) Production readiness review completed
-- [ ] (R) Production readiness review approved
-- [ ] "Implementation History" section is up-to-date for milestone
-- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
+- [x] (R) Production readiness review completed
+- [x] (R) Production readiness review approved
+- [x] "Implementation History" section is up-to-date for milestone
+- [x] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
 - [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
 
 <!--
@@ -815,8 +815,8 @@ ensure `ExtendedResourceName`s are handled by the scheduler as described in this
 
 #### Beta
 
-- Reevaluate where to create the special resource claim, in scheduler or some
-other controller, based on feedback from Alpha and the nomination concept.
+- The basic scoring in NodeResourcesFit has to be implemented and that the queueing hints have to work efficiently.
+- Keep the Alpha behavior to create the special resource claim in scheduler.
 - Gather feedback from developers and surveys
 - 3 examples of vendors making use of the extensions proposed in this KEP
 - Scalability tests that mirror real-world usage as determined by user feedback
@@ -996,7 +996,7 @@ Recall that end users cannot usually observe component logs or access metrics.
 - Details:
 -->
 - [x] API .status
-- Other field: `.status.extendedResourceClaimStatus` will have a list of resource claims that are created for
+- Other field: Pod's `.status.extendedResourceClaimStatus` will have a list of resource claims that are created for
 DRA extended resources.
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
@@ -1067,7 +1067,8 @@ Pick one more of these and delete the rest.
 - Type: Counter
 - Labels: `status` ("failure", "success")
 - SLI Usage: Calculate success rate to monitor the reliability of automatic resource claim creation. High failure rates indicate potential issues with extended resource configuration.
-- Because the resource claim is created in the scheduler, we need a different metric from `resourceclaim_controller_creates_total`.
+- Because the resource claim is created in the scheduler PreBind phase by making k8s API call, we need a different metric from `resourceclaim_controller_creates_total`.
+- The metric is incremented accordingly based on the API call outcome, either success or failure.
 
 ###### Are there any missing metrics that would be useful to have to improve observability of this feature?
 
@@ -1156,7 +1157,7 @@ still applies.
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
 The Kubernetes control plane will be down, so no new Pods get scheduled. kubelet may
-still be able to start or or restart containers if it already received all the relevant
+still be able to start or restart containers if it already received all the relevant
 updates (Pod, ResourceClaim, etc.).
 
 ###### What are other known failure modes?
