 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [X] (R) KEP approvers have approved the KEP status as `implementable`
+- [X] (R) Design details are appropriately documented
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
@@ -94,7 +95,10 @@ Exposing unhealthy devices in Pod Status will provide a generic way to understan
 
 As part of the InPlacePodVerticalScaling KEP, the two new fields were introduced in Pod Status to reflect the currently allocated resources for the Pod:
 
-```
+```go
+type ContainerStatus struct {
+...
+
 // AllocatedResources represents the compute resources allocated for this container by the
 // node. Kubelet sets this value to Container.Resources.Requests upon successful pod admission
 // and after successfully admitting desired pod resize.
@@ -107,6 +111,9 @@ As part of the InPlacePodVerticalScaling KEP, the two new fields were introduced
 // ResourceID is calculated based on the source of this resource health information.
 // For DevicePlugin:
-// deviceplugin:DeviceID, where DeviceID is from the Device structure of DevicePlugin's ListAndWatchResponse type: https://github.com/kubernetes/kubernetes/blob/eda1c780543a27c078450e2f17d674471e00f494/staging/src/k8s.io/kubelet/pkg/apis/deviceplugin/v1alpha/api.proto#L61-L73
+//
+// DeviceID, where DeviceID is from the Device structure of DevicePlugin's ListAndWatchResponse type: https://github.com/kubernetes/kubernetes/blob/eda1c780543a27c078450e2f17d674471e00f494/staging/src/k8s.io/kubelet/pkg/apis/deviceplugin/v1alpha/api.proto#L61-L73
+//
 // DevicePlugin ID is usually a constant for the lifetime of a Node and typically can be used to uniquely identify the device on the node.
 // For DRA:
-// dra:<driver name>/<pool name>/<device name>: such a device can be looked up in the information published by that DRA driver to learn more about it. It is designed to be globally unique in a cluster.
+//
+// <driver name>/<pool name>/<device name>: such a device can be looked up in the information published by that DRA driver to learn more about it. It is designed to be globally unique in a cluster.
 type ResourceID string
 
+// ResourceHealth represents the health of a resource. It has the latest device health information.
+// This is a part of KEP https://kep.k8s.io/4680 and historical health changes are planned to be added in future iterations of a KEP.
 type ResourceHealth struct {
-// List of conditions with the transition times
-Conditions []ResourceHealthCondition
+// ResourceID is the unique identifier of the resource. See the ResourceID type for more information.
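Reviewer note on the hunk above: the revised `ResourceID` comments drop the `deviceplugin:` and `dra:` prefixes, so the identifier is either the bare DeviceID or the slash-joined `<driver name>/<pool name>/<device name>` triple. The following is a minimal sketch of how such identifiers could be composed. The helper names `devicePluginResourceID` and `draResourceID`, and the sample driver, pool, and device values, are hypothetical and not part of the KEP.

```go
package main

import (
	"fmt"
	"strings"
)

// ResourceID mirrors the type declared in the KEP text.
type ResourceID string

// devicePluginResourceID is a hypothetical helper: for Device Plugin
// resources the revised comment describes the ID as the DeviceID itself.
func devicePluginResourceID(deviceID string) ResourceID {
	return ResourceID(deviceID)
}

// draResourceID is a hypothetical helper: for DRA the revised comment
// describes the ID as <driver name>/<pool name>/<device name>.
func draResourceID(driver, pool, device string) ResourceID {
	return ResourceID(strings.Join([]string{driver, pool, device}, "/"))
}

func main() {
	// The values below are made up for illustration.
	fmt.Println(devicePluginResourceID("GPU-1234"))
	fmt.Println(draResourceID("gpu.example.com", "node-1", "gpu-0"))
}
```

Because the prefix no longer names the source, a consumer of these IDs would presumably need another signal to tell a Device Plugin ID from a DRA ID; the visible part of the diff does not say which.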
@@ -233,7 +286,6 @@ We should consider introducing another field to the Status that will be a free f
 ### DRA implementation details
 
 Today DRA does not return the health of the device back to kubelet. The proposal is to extend the
-
 type `NamedResourcesInstance` (from [pkg/apis/resource/namedresources.go](https://github.com/kubernetes/kubernetes/blob/790dfdbe386e4a115f41d38058c127d2dd0e6f44/pkg/apis/resource/namedresources.go#L29-L37)) to include the Health field the same way it is done in
 the Device Plugin as well as a device ID.
 
@@ -245,7 +297,41 @@ The API will be limited to "prepared" devices and include the claim `name/namesp
 
 Kubelet will react on this field the same way as we propose to do it for the Device Plugin.
 
-Specific implementation details will be added for the beta.
+The new method will be added to the same gRPC server that serves the [Node service](https://github.com/kubernetes/kubernetes/blob/04bba3c222bb2c5b1b1565713de4bf334ee7fbe4/staging/src/k8s.io/kubelet/pkg/apis/dra/v1alpha4/api.proto#L34) interface (Node service exposes
+`NodePrepareResources` and `NodeUnprepareResources`). The new interface will have a `Device` structure similar to Node Service's device, with the added `health` field:
+
+```proto
+service NodeHealth {
+...
+
+// WatchDevicesStatus returns a stream of List of Devices
+// Whenever a Device state change or a Device disappears, WatchDevicesStatus
+// returns the new list.
+// This method is optional and may not be implemented.
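Reviewer note on the hunk above: `WatchDevicesStatus` is described as streaming the full device list again whenever a device changes state or disappears. The following is a minimal sketch of how a consumer could reconcile such full-list snapshots against a cache. It uses a plain channel in place of the gRPC stream, and the `Device` struct, its field names, and `applySnapshot` are all hypothetical stand-ins rather than the generated API.

```go
package main

import (
	"fmt"
	"sort"
)

// Device is a hypothetical stand-in for the message carried by
// WatchDevicesStatus; the field names are assumptions, not the KEP's API.
type Device struct {
	ID      string // for example "<driver name>/<pool name>/<device name>"
	Healthy bool
}

// applySnapshot replaces the cached state with the latest full list and
// reports which device IDs are new or changed health, and which disappeared.
func applySnapshot(cache map[string]bool, snapshot []Device) (next map[string]bool, changed, removed []string) {
	next = make(map[string]bool, len(snapshot))
	for _, d := range snapshot {
		next[d.ID] = d.Healthy
		if old, ok := cache[d.ID]; !ok || old != d.Healthy {
			changed = append(changed, d.ID)
		}
	}
	for id := range cache {
		if _, ok := next[id]; !ok {
			removed = append(removed, id)
		}
	}
	// Sort for deterministic output; map iteration order is random.
	sort.Strings(changed)
	sort.Strings(removed)
	return next, changed, removed
}

func main() {
	// A buffered channel stands in for the server stream.
	stream := make(chan []Device, 2)
	stream <- []Device{{ID: "drv/pool/gpu-0", Healthy: true}, {ID: "drv/pool/gpu-1", Healthy: true}}
	stream <- []Device{{ID: "drv/pool/gpu-0", Healthy: false}}
	close(stream)

	cache := map[string]bool{}
	for snapshot := range stream {
		var changed, removed []string
		cache, changed, removed = applySnapshot(cache, snapshot)
		fmt.Println("changed:", changed, "removed:", removed)
	}
}
```

Treating every message as a complete snapshot keeps the consumer stateless with respect to the stream: a device absent from the latest list is reported as removed, and what the kubelet then does with it is left to the KEP. Since the method is optional, a real client would also need to handle a driver that does not implement it.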