keps/sig-node/2403-pod-resources-allocatable-resources/README.md
16 additions & 1 deletion
@@ -43,7 +43,7 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
 - [X] (R) Graduation criteria is in place
 - [X] (R) Production readiness review completed
-- [X] Production readiness review approved
+- [X] (R) Production readiness review approved
 - [X] "Implementation History" section is up-to-date for milestone
 - ~~ [] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] ~~
 - [X] Supporting documentation e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
@@ -102,6 +102,10 @@ The GRPC Service will expose an additional endpoint:
 - `GetAllocatableResources`, which returns a single AllocatableResourcesResponse, enabling monitor applications to query for the allocatable set of resources available on the node.
 This endpoint will return an error if the corresponding feature gate is disabled.
 
+NOTE:
+
+- `GetAllocatableResources` should only be used to evaluate [allocatable](https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable) resources on a node. To evaluate free/unallocated resources, use it in conjunction with the `List()` endpoint. The result returned by `GetAllocatableResources` remains the same unless the underlying resources exposed to the kubelet change. This happens rarely, but when it does (e.g. CPUs onlined/offlined, devices added/removed), the client is expected to call the `GetAllocatableResources` endpoint again.
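As a rough illustration of the note above, the Go sketch below derives the free CPU set by subtracting the CPU IDs that `List()` reports per container from the allocatable set. The `containerResources` and `podResources` types and the `freeCPUs` helper are hypothetical, simplified stand-ins introduced for this example. They are not the generated podresources API types, and only the `cpu_ids` field is modeled.

```go
package main

import (
	"fmt"
	"sort"
)

// containerResources is a hypothetical, simplified stand-in for the
// ContainerResources message; only the cpu_ids field is modeled.
type containerResources struct {
	Name   string
	CPUIDs []int64
}

// podResources is a hypothetical, simplified stand-in for one pod entry
// in a List() response.
type podResources struct {
	Name       string
	Containers []containerResources
}

// freeCPUs returns the allocatable CPU IDs that are not assigned to any
// container, sorted in ascending order.
func freeCPUs(allocatable []int64, pods []podResources) []int64 {
	// Collect every CPU ID that is reported as assigned to a container.
	used := make(map[int64]struct{})
	for _, p := range pods {
		for _, c := range p.Containers {
			for _, id := range c.CPUIDs {
				used[id] = struct{}{}
			}
		}
	}

	// Keep only the allocatable IDs that are not in the used set.
	free := make([]int64, 0, len(allocatable))
	for _, id := range allocatable {
		if _, ok := used[id]; !ok {
			free = append(free, id)
		}
	}
	sort.Slice(free, func(i, j int) bool { return free[i] < free[j] })
	return free
}

func main() {
	// Illustrative data only: in a real client these values would come from
	// GetAllocatableResources and List() respectively.
	allocatable := []int64{0, 1, 2, 3, 4, 5, 6, 7}
	pods := []podResources{
		{Name: "pod-a", Containers: []containerResources{{Name: "c0", CPUIDs: []int64{2, 3}}}},
		{Name: "pod-b", Containers: []containerResources{{Name: "c0", CPUIDs: []int64{6}}}},
	}
	fmt.Println(freeCPUs(allocatable, pods))
}
```

Because the allocatable set rarely changes, a client following this pattern could cache it and refresh only the `List()` side on each evaluation. A similar subtraction could be applied to device IDs.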
+
 The extended interface is shown in proto below:
```protobuf
// PodResources is a service provided by the kubelet that provides information about the
@@ -139,6 +143,7 @@ message ContainerResources {
     string name = 1;
     repeated ContainerDevices devices = 2;
     repeated int64 cpu_ids = 3;
+    repeated ContainerMemory memory = 4;
 }
 
 // Topology describes hardware topology of the resource
@@ -165,6 +170,8 @@ message ContainerDevices {
 The implementation PR adds a suite of E2E tests which cover both the existing `List` endpoint already implemented in the podresources API and
 the new proposed `GetAllocatableResources` API.
 
+Add tests to prove that unhealthy devices are skipped by `GetAllocatableResources` and that an empty NUMA topology is not returned.
+
 ### Graduation Criteria
 
 #### Alpha
@@ -174,6 +181,13 @@ the new proposed `GetAllocatableResources` API.
 #### Alpha to Beta Graduation
 - [X] The new API is consumed by other public software components (e.g. NFD).
 - [X] No major bugs reported in the previous cycle.
+- [X] Ensure that empty NUMA topology is handled properly.
+- [X] Ensure that unhealthy devices are skipped in `GetAllocatableResources`.
+- [X] External clients are using this capability in their solutions.
+Topology-aware scheduling is one of the primary use cases of the `GetAllocatableResources` podresources endpoint. As part of this initiative, an exporter populates per-node CRs that expose the resources available per NUMA node. The Pod Resources API `List` and `GetAllocatableResources` endpoints are used to obtain the resource allocation of running pods along with the underlying hardware topology (NUMA) information. Users can create custom exporters or use existing ones to expose the NodeResourceTopology information as CRs. The [Topology-aware Scheduler](https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/noderesourcetopology) then uses this information to make NUMA-aware placement decisions, which reduces the occurrence of the Topology Affinity Errors highlighted in the issue [here](https://github.com/kubernetes/kubernetes/issues/84869).
+Examples of two such exporters are:
+- [Node Feature Discovery](https://github.com/kubernetes-sigs/node-feature-discovery) for exposing resource topology information as part of the initiative here: [Introducing NFD Topology Updater exposing Resource hardware Topology info through CRs](https://github.com/kubernetes-sigs/node-feature-discovery/pull/525).