Skip to content

Commit bc030a8

Browse files
committed
Fixes based on review
* Set data slice to single RawExtension * Remove NetworkAddress struct and set Addresses as slice of string * Add unit tests for registry strategy for dropping field * Add e2e tests * Add rollout / rollback details * Add resourceclaim_status_devices_update_attempts_total and resourceclaim_status_devices_update_failures_total metrics * Set status as implementable * Set latest milestone * Set author
1 parent 2b93fbd commit bc030a8

File tree

2 files changed

+50
-40
lines changed

2 files changed

+50
-40
lines changed

keps/sig-node/4817-resource-claim-device-status/README.md

Lines changed: 42 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -130,12 +130,12 @@ A device, identified by `<driver name>/<pool name>/<device name>` can be
130130
represented only once in the `Devices` slice and will also mention which
131131
request caused the device to be allocated. The state and characteristics of the
132132
device are reported in the `Conditions`, representing the operational state of
133-
the device and in the `Data`, an arbitrary data slice representing device
133+
the device and in the `Data`, an arbitrary data field representing device
134134
specific characteristics. Additionally, for networking devices, a field
135135
`NetworkData` can be used to report the IPs, the MAC address and the
136136
interface name.
137137

138-
`Data` being a slice of arbitrary data allows the DRA Driver to store
138+
`Data` being an arbitrary data field allows the DRA Driver to store
139139
device specific data in different formats. For example, a Network Device being
140140
configured by via a CNI plugin could get its `Data` field filled with the CNI
141141
result for troubleshooting purpose and with a Network result in a modern
@@ -161,7 +161,7 @@ type ResourceClaimStatus struct {
161161
// +listMapKey=driver
162162
// +listMapKey=device
163163
// +listMapKey=pool
164-
// +featureGate=DRAResourceClaimDeviceStatus
164+
// +featureGate=DRAResourceClaimDeviceStatus
165165
Devices []AllocatedDeviceStatus `json:"devices,omitempty" protobuf:"bytes,4,opt,name=devices"`
166166
}
167167

@@ -204,8 +204,7 @@ type AllocatedDeviceStatus struct {
204204
// Data contains arbitrary driver-specific data.
205205
//
206206
// +optional
207-
// +listType=atomic
208-
Data []runtime.RawExtension `json:"data,omitempty" protobuf:"bytes,5,opt,name=data"`
207+
Data runtime.RawExtension `json:"data,omitempty" protobuf:"bytes,5,opt,name=data"`
209208

210209
// NetworkData contains network-related information specific to the device.
211210
//
@@ -226,26 +225,19 @@ type NetworkDeviceData struct {
226225

227226
// Addresses lists the network addresses assigned to the device's network interface.
228227
// This can include both IPv4 and IPv6 addresses.
228+
// The addresses are in the CIDR notation, which includes both the address and the
229+
// associated subnet mask.
230+
// e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6.
229231
//
230232
// +optional
231233
// +listType=atomic
232-
Addresses []NetworkAddress `json:"addresses,omitempty" protobuf:"bytes,2,opt,name=addresses"`
234+
Addresses []string `json:"addresses,omitempty" protobuf:"bytes,2,opt,name=addresses"`
233235

234236
// HWAddress represents the hardware address (e.g. MAC Address) of the device's network interface.
235237
//
236238
// +optional
237239
HWAddress string `json:"hwAddress,omitempty" protobuf:"bytes,3,opt,name=hwAddress"`
238240
}
239-
240-
// NetworkAddress provides a network address related details such as IP and Mask.
241-
type NetworkAddress struct {
242-
// CIDR contains the network address in CIDR notation, which includes
243-
// both the address and the associated subnet mask.
244-
// e.g.: "192.0.2.0/24" for IPv4 and "2001:db8::/64" for IPv6.
245-
//
246-
// +required
247-
CIDR string `json:"cidr,omitempty" protobuf:"bytes,1,rep,name=cidr"`
248-
}
249241
```
250242

251243
### User Stories (Optional)
@@ -288,7 +280,7 @@ the device they manage. An access control must be set in place to restrict the
288280
write access to the appropriate driver (A device status can only be updated by
289281
the entities that have a direct control over the device(s) being reported).
290282

291-
Adding `Data` as an arbitrary data slice may introduce extra processing
283+
Adding `Data` as an arbitrary data field may introduce extra processing
292284
and storage overhead which might impact performance in a cluster with many
293285
devices and frequent status updates. In large-scale clusters where many devices
294286
are allocated, this impact must be considered.
@@ -404,9 +396,13 @@ to implement this enhancement.
404396
- A device can only be reported once in the `ResourceClaim`.
405397
- The reported device is allocated in the `ResourceClaim`.
406398
- Properties set in `AllocatedDeviceStatus` are in the correct format.
399+
- `ResourceClaim` registry strategy:
400+
- With the feature gate enabled, the field is not dropped.
401+
- With the feature gate disabled, the field is dropped.
407402

408403
Coverage:
409-
- `k8s.io/kubernetes/pkg/apis/resource/validation`: `9/30/2024` - `77.1`
404+
- `k8s.io/kubernetes/pkg/apis/resource/validation`: `9/28/2024` - `77.1`
405+
- `k8s.io/kubernetes/pkg/registry/resource/resourceclaim`: `9/28/2024` - `75.5`
410406

411407
##### Integration tests
412408

@@ -419,7 +415,13 @@ Coverage:
419415

420416
##### e2e tests
421417

422-
TBD
418+
The [DRA test driver](https://github.com/kubernetes/kubernetes/tree/master/test/e2e/dra/test-driver)
419+
will be extended to support the new `ResourceClaim.Status.Devices` field and to
420+
populate it with data related to the device configured in the pod.
421+
422+
A new network DRA Driver will be implemented (or extended from the existing DRA
423+
test driver) to support networking type of devices and report their
424+
network status.
423425

424426
### Graduation Criteria
425427

@@ -479,24 +481,28 @@ No
479481

480482
###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
481483

482-
Yes, with no side effects except for missing the new field in
483-
`ResourceClaim.Status`. Re-enabling this feature will not guarantee to keep
484-
the values written before the feature has been disabled.
484+
Yes, with no side effects except for missing the new field `Devices` in
485+
`ResourceClaim.Status`.
485486

486487
###### What happens if we reenable the feature if it was previously rolled back?
487488

489+
The field will be available again for read and write, but this feature will
490+
not guarantee to keep the values written before the feature has been disabled.
491+
DRA-Drivers will then have to write again the `ResourceClaim.Status.Devices`.
492+
488493
###### Are there any tests for feature enablement/disablement?
489494

490495
Enablement/disablement of this feature is tested as part of the integration
491496
tests.
492497

493498
### Rollout, Upgrade and Rollback Planning
494499

495-
No
496-
497500
###### How can a rollout or rollback fail? Can it impact already running workloads?
498501

499-
No
502+
Enabling the feature will enable the field to be written and therefore invoke
503+
validation of the field. Since the feature does nothing more than allow storage
504+
of this data, the only way it can really fail is to crash in validation. It
505+
will not affect running workloads.
500506

501507
###### What specific metrics should inform a rollback?
502508

@@ -532,27 +538,24 @@ N/A
532538

533539
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
534540

535-
- [ ] Metrics
536-
- Metric name:
541+
- [x] Metrics
542+
- Metric name: `resourceclaim_status_devices_update_attempts_total` ; `resourceclaim_status_devices_update_failures_total`
537543
- [Optional] Aggregation method:
538-
- Components exposing the metric:
544+
- Components exposing the metric: kube-apiserver
539545
- [ ] Other (treat as last resort)
540546
- Details:
541547

542548
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
543549

544-
<!--
545-
Describe the metrics themselves and the reasons why they weren't added (e.g., cost,
546-
implementation difficulties, etc.).
547-
-->
550+
No
548551

549552
### Dependencies
550553

551554
[KEP-4381 - DRA Structured Parameters](https://github.com/kubernetes/enhancements/issues/4381)
552555

553556
###### Does this feature depend on any specific services running in the cluster?
554557

555-
No
558+
No, the field won't be populated unless a DRA driver utilizes it.
556559

557560
### Scalability
558561

@@ -674,11 +677,12 @@ Here is below the proposed API:
674677
// state of a system, especially if the node that hosts the pod cannot contact the control
675678
// plane.
676679
type PodStatus struct {
677-
[...]
678-
// Networks is a list of PodNetworks that are attached to the Pod.
679-
//
680-
// +optional
681-
Networks []NetworkStatus `json:"networks,omitempty"`
680+
...
681+
682+
// Networks is a list of PodNetworks that are attached to the Pod.
683+
//
684+
// +optional
685+
Networks []NetworkStatus `json:"networks,omitempty"`
682686
}
683687

684688
// NetworkStatus provides the status of specific PodNetwork in a Pod.

keps/sig-node/4817-resource-claim-device-status/kep.yaml

Lines changed: 8 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
title: Resource Claim Status With Possible Standardized Network Interface Data
22
kep-number: 4817
33
authors:
4-
- "@jane.doe"
4+
- "@LionelJouin"
55
owning-sig: sig-node
66
participating-sigs:
77
- sig-node
88
- sig-network
9-
status: provisional
9+
status: implementable
1010
creation-date: 2024-08-30
1111
reviewers:
1212
- "@johnbelamaric"
@@ -29,6 +29,8 @@ see-also:
2929
# The target maturity stage in the current dev cycle for this KEP.
3030
stage: alpha #beta|stable
3131

32+
latest-milestone: "v1.32"
33+
3234
# The milestone at which this feature was, or is targeted to be, at each stage.
3335
milestone:
3436
alpha: "v1.32"
@@ -40,3 +42,7 @@ feature-gates:
4042
components:
4143
- kube-apiserver
4244
disable-supported: true
45+
46+
metrics:
47+
- resourceclaim_status_devices_update_attempts_total
48+
- resourceclaim_status_devices_update_failures_total

0 commit comments

Comments
 (0)