|
| 1 | +# KEP-4009: Add CDI devices to device plugin API |
| 2 | + |
| 3 | +<!-- toc --> |
| 4 | +- [Release Signoff Checklist](#release-signoff-checklist) |
| 5 | +- [Summary](#summary) |
| 6 | +- [Motivation](#motivation) |
| 7 | + - [Goals](#goals) |
| 8 | +- [Design Details](#design-details) |
| 9 | + - [Test Plan](#test-plan) |
| 10 | + - [Prerequisite testing updates](#prerequisite-testing-updates) |
| 11 | + - [Unit tests](#unit-tests) |
| 12 | + - [Integration tests](#integration-tests) |
| 13 | + - [e2e tests](#e2e-tests) |
| 14 | + - [Graduation Criteria](#graduation-criteria) |
| 15 | + - [Alpha](#alpha) |
| 16 | + - [Alpha to Beta Graduation](#alpha-to-beta-graduation) |
| 17 | + - [Beta to G.A Graduation](#beta-to-ga-graduation) |
| 18 | + - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy) |
| 19 | + - [Version Skew Strategy](#version-skew-strategy) |
| 20 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 21 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 22 | + - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning) |
| 23 | + - [Monitoring Requirements](#monitoring-requirements) |
| 24 | + - [Dependencies](#dependencies) |
| 25 | + - [Scalability](#scalability) |
| 26 | + - [Troubleshooting](#troubleshooting) |
| 27 | +- [Implementation History](#implementation-history) |
| 28 | +- [Drawbacks](#drawbacks) |
| 29 | +- [Alternatives](#alternatives) |
| 30 | +<!-- /toc --> |
| 31 | + |
| 32 | +## Release Signoff Checklist |
| 33 | + |
| 34 | +Items marked with (R) are required *prior to targeting to a milestone / release*. |
| 35 | + |
| 36 | +- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR) |
| 37 | +- [ ] (R) KEP approvers have approved the KEP status as `implementable` |
| 38 | +- [ ] (R) Design details are appropriately documented |
| 39 | +- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) |
| 40 | + - [ ] e2e Tests for all Beta API Operations (endpoints) |
| 41 | + - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 42 | + - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free |
| 43 | +- [ ] (R) Graduation criteria is in place |
| 44 | + - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) |
| 45 | +- [ ] (R) Production readiness review completed |
| 46 | +- [ ] (R) Production readiness review approved |
| 47 | +- [ ] "Implementation History" section is up-to-date for milestone |
| 48 | +- [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io] |
| 49 | +- [ ] Supporting documentation—e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes |
| 50 | + |
| 51 | +[kubernetes.io]: https://kubernetes.io/ |
| 52 | +[kubernetes/enhancements]: https://git.k8s.io/enhancements |
| 53 | +[kubernetes/kubernetes]: https://git.k8s.io/kubernetes |
| 54 | +[kubernetes/website]: https://git.k8s.io/website |
| 55 | + |
| 56 | +## Summary |
| 57 | + |
| 58 | +This KEP proposes extending the Device Plugin API, adding a field to specify |
| 59 | +Container Device Interface (CDI) device IDs in the `AllocateResponse`. This |
| 60 | +supplements the existing fields such as annotations and allows device plugin |
| 61 | +implementations to uniquely specify devices using their fully-qualified CDI |
| 62 | +device names. |
| 63 | + |
| 64 | +The recent addition of CDI device IDs to the CRI structures in [#3731](https://github.com/kubernetes/enhancements/pull/3731) allows these IDs to be forwarded to the CRI runtimes in a secure manner. Although |
| 65 | +these changes were motivated by [KEP-3063](https://github.com/kubernetes/enhancements/issues/3063), adding support for these fields to the |
| 66 | +existing device plugin API allows this mechanism to also be used for devices |
| 67 | +supported by these plugins. |
| 68 | + |
| 69 | +## Motivation |
| 70 | + |
| 71 | +The Container Device Interface (CDI) provides a standard mechanism for device |
| 72 | +vendors to describe what is required to provide access to a specific resource |
| 73 | +such as a GPU. These resources can be uniquely identified using a |
| 74 | +fully-qualified CDI device name. |
| 75 | + |
| 76 | +The changes proposed in [#3731](https://github.com/kubernetes/enhancements/pull/3731) extend the CRI to provide a well-defined mechanism for forwarding such |
| 77 | +requests to CRI runtimes such as containerd and CRI-O. These have already |
| 78 | +been extended to accept CDI device requests, and to use the associated CDI |
| 79 | +specifications to ensure that the required |
| 80 | +modifications are made to the OCI runtime specification for a container being |
| 81 | +launched. |
| 82 | + |
| 83 | +The addition of an explicit field for specifying CDI device names to the Device |
| 84 | +Plugin API allows this CRI field to be used to indicate which devices should be |
| 85 | +injected. This removes the need to use workarounds such as container annotations |
| 86 | +to pass this information to the runtimes and allows Device Plugin authors to |
| 87 | +adopt CDI to inject devices without requiring that users move to a Dynamic |
| 88 | +Resource Allocation (DRA) based implementation. |
| 89 | + |
| 90 | +### Goals |
| 91 | + |
| 92 | +* Allow Device Plugin authors to forward device requests to CRI runtimes as a CRI field. |
| 93 | +* Allow Device Plugin authors to use CDI to define the modifications required for containerised environments. |
| 94 | + |
| 95 | +## Design Details |
| 96 | + |
| 97 | +This adds a repeated `CDIDevice` field to the existing `ContainerAllocateResponse` returned as part of the |
| 98 | +`AllocateResponse` in the Device Plugin API. This matches the modifications made to the Dynamic Resource Allocation API in [#3731](https://github.com/kubernetes/enhancements/pull/3731). |
| 99 | + |
| 100 | +The values contained in this field are then used to populate the corresponding field in the CRI |
| 101 | +which is passed to the container runtimes. In addition, annotations with a `cdi.k8s.io` prefix will be |
| 102 | +added to the CRI to allow for consumption in container runtimes that do not yet support the |
| 103 | +CRI field directly, but do support device requests through annotations. |
| 104 | + |
| 105 | +```protobuf |
| 106 | +// CDIDevice specifies CDI device information. |
| 107 | +message CDIDevice { |
| 108 | + // Fully qualified CDI device name |
| 109 | + // for example: vendor.com/gpu=gpudevice1 |
| 110 | + // see more details in the CDI specification: |
| 111 | + // https://github.com/container-orchestrated-devices/container-device-interface/blob/main/SPEC.md |
| 112 | + string name = 1; |
| 113 | +} |
| 114 | + |
| 115 | +message ContainerAllocateResponse { |
| 116 | + // List of environment variables to be set in the container to access one or more devices. |
| 117 | + map<string, string> envs = 1; |
| 118 | + // Mounts for the container. |
| 119 | + repeated Mount mounts = 2; |
| 120 | + // Devices for the container. |
| 121 | + repeated DeviceSpec devices = 3; |
| 122 | + // Container annotations to pass to the container runtime |
| 123 | + map<string, string> annotations = 4; |
| 124 | + // CDI devices for the container. |
| 125 | + repeated CDIDevice cdi_devices = 5; |
| 126 | +} |
| 127 | +``` |
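
As an illustration of how a device plugin might populate the new field, the following is a minimal Go sketch. It assumes local stand-in types that mirror only the relevant parts of the messages above; they are not the generated API code. The helpers `isQualifiedName` and `newContainerResponse` are hypothetical names, and the name check is a deliberate simplification of the `vendor/class=device` shape rather than the full validation defined by the CDI specification.

```go
package main

import (
	"fmt"
	"strings"
)

// CDIDevice is a local stand-in for the Go type generated from the
// CDIDevice message; it is not the generated code itself.
type CDIDevice struct {
	Name string
}

// ContainerAllocateResponse is a local stand-in carrying only the
// field used in this sketch.
type ContainerAllocateResponse struct {
	CDIDevices []*CDIDevice
}

// isQualifiedName is a simplified, hypothetical check for the
// "vendor/class=device" shape. It does not implement the full
// validation rules of the CDI specification.
func isQualifiedName(name string) bool {
	kind, device, ok := strings.Cut(name, "=")
	if !ok || device == "" {
		return false
	}
	vendor, class, ok := strings.Cut(kind, "/")
	return ok && vendor != "" && class != ""
}

// newContainerResponse (hypothetical helper) builds a per-container
// response that lists the allocated devices by CDI name.
func newContainerResponse(names []string) (*ContainerAllocateResponse, error) {
	resp := &ContainerAllocateResponse{}
	for _, n := range names {
		if !isQualifiedName(n) {
			return nil, fmt.Errorf("not a fully-qualified CDI device name: %q", n)
		}
		resp.CDIDevices = append(resp.CDIDevices, &CDIDevice{Name: n})
	}
	return resp, nil
}

func main() {
	resp, err := newContainerResponse([]string{"vendor.com/gpu=gpudevice1"})
	if err != nil {
		panic(err)
	}
	for _, d := range resp.CDIDevices {
		fmt.Println(d.Name)
	}
}
```

Rejecting malformed names in the plugin is a design choice of this sketch only; the KEP does not specify where such validation happens.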
| 128 | + |
| 129 | +### Test Plan |
| 130 | + |
| 131 | +[x] I/we understand the owners of the involved components may require updates to |
| 132 | +existing tests to make this code solid enough prior to committing the changes necessary |
| 133 | +to implement this enhancement. |
| 134 | + |
| 135 | +##### Prerequisite testing updates |
| 136 | + |
| 137 | +##### Unit tests |
| 138 | + |
| 139 | +- `devicemanager`: `2023-06-15` - `85.1%` |
| 140 | + |
| 141 | +##### Integration tests |
| 142 | + |
| 143 | +There are currently no integration tests for device plugins. |
| 144 | +We do not plan to add any for this feature. |
| 145 | + |
| 146 | +However, these cases will be added in the existing integration tests: |
| 147 | + - Feature gate enable/disable tests |
| 148 | + |
| 149 | +##### e2e tests |
| 150 | + |
| 151 | +These cases will be added in the existing `e2e_node` tests: |
| 152 | + - Device Plugin works with CDI devices |
| 153 | + |
| 154 | +### Graduation Criteria |
| 155 | + |
| 156 | +#### Alpha |
| 157 | +- [X] Add the CDIDevices field to the device plugin API |
| 158 | +- [X] Implement the logic to pass the CDIDevices into the CRI |
| 159 | +- [X] Add proper `e2e_node` tests |
| 160 | + |
| 161 | +#### Alpha to Beta Graduation |
| 162 | +- [X] No major bugs reported in the previous cycle |
| 163 | + |
| 164 | +#### Beta to G.A Graduation |
| 165 | +- [X] Gather feedback from at least 2 device plugin vendors that CDI support works for them |
| 166 | + |
| 167 | +### Upgrade / Downgrade Strategy |
| 168 | + |
| 169 | +We expect no impact on upgrades. |
| 170 | +On downgrades, we expect no impact to Kubernetes and minimal impact to device |
| 171 | +plugin developers. |
| 172 | + |
| 173 | +We are not bumping the device plugin API version, but simply adding a field to |
| 174 | +its protobuf. On upgrades this means that older device plugins will simply |
| 175 | +continue to work as they always have, since they will need to opt in to using |
| 176 | +this new field. |
| 177 | + |
| 178 | +For downgrades, if a plugin has not opted in to the new field, there will be |
| 179 | +no impact since a downgraded kubelet won't support it anyway. If a device |
| 180 | +plugin has opted in to the new field, a downgraded kubelet will |
| 181 | +silently ignore it. This has no impact on Kubernetes itself, but plugin |
| 182 | +developers need to be aware of it, since their CDI support will stop |
| 183 | +working without any error being reported. |
| 184 | + |
| 185 | +### Version Skew Strategy |
| 186 | + |
| 187 | +The kubelet will always be backwards compatible, so going forward existing |
| 188 | +plugins are not expected to break. |
| 189 | + |
| 190 | +## Production Readiness Review Questionnaire |
| 191 | + |
| 192 | +### Feature Enablement and Rollback |
| 193 | + |
| 194 | +###### How can this feature be enabled / disabled in a live cluster? |
| 195 | + |
| 196 | +- [x] Feature gate (also fill in values in `kep.yaml`) |
| 197 | + - Feature gate names: |
| 198 | + - `DevicePluginCDIDevices` |
| 199 | + - Components depending on the feature gate: kubelet |
| 200 | +- [x] Pass CDI devices to the kubelet over the new field in the device plugin API |
| 201 | + - Will enabling / disabling the feature require downtime of the control |
| 202 | + plane? |
| 203 | + No. |
| 204 | + - Will enabling / disabling the feature require downtime or reprovisioning |
| 205 | + of a node? |
| 206 | + No. |
| 207 | + |
| 208 | + |
| 209 | +###### Does enabling the feature change any default behavior? |
| 210 | + |
| 211 | +No. Device Plugins need to be updated to make use of the new field. |
| 212 | + |
| 213 | +###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)? |
| 214 | + |
| 215 | +- Yes, disabling the `DevicePluginCDIDevices` feature gate shuts down the feature completely. |
| 216 | +- Yes, by not sending CDI devices over the device plugin API (and falling back to the old way of passing device info). |
| 217 | + |
| 218 | +###### What happens if we reenable the feature if it was previously rolled back? |
| 219 | + |
| 220 | +Nothing bad will happen. New containers will simply be able to start with |
| 221 | +CDI devices again. |
| 222 | + |
| 223 | +###### Are there any tests for feature enablement/disablement? |
| 224 | + |
| 225 | +There will be e2e tests demonstrating that CDI devices are attached as expected |
| 226 | +when the feature is enabled, and silently ignored if the feature is disabled. |
| 227 | + |
| 228 | +### Rollout, Upgrade and Rollback Planning |
| 229 | + |
| 230 | +###### How can a rollout or rollback fail? Can it impact already running workloads? |
| 231 | + |
| 232 | +The failure of the kubelet would mean that fields from new device allocations |
| 233 | +will not be processed. |
| 234 | + |
| 235 | +However, CDI devices themselves are only interpreted at container start. |
| 236 | +Existing containers that were started with support for CDI devices will not be |
| 237 | +impacted if the feature gate is enabled or disabled during the lifetime of a |
| 238 | +running container. Only new containers will be impacted by the presence or |
| 239 | +absence of the feature gate. |
| 240 | + |
| 241 | +###### What specific metrics should inform a rollback? |
| 242 | + |
| 243 | +N/A |
| 244 | + |
| 245 | +###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested? |
| 246 | + |
| 247 | +N/A |
| 248 | + |
| 249 | +###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? |
| 250 | + |
| 251 | +No |
| 252 | + |
| 253 | +### Monitoring Requirements |
| 254 | + |
| 255 | +###### How can an operator determine if the feature is in use by workloads? |
| 256 | + |
| 257 | +This depends on Device Plugin vendor implementations making use of the required |
| 258 | +field and cannot be directly determined. |
| 259 | + |
| 260 | +###### How can someone using this feature know that it is working for their instance? |
| 261 | + |
| 262 | +End-users are not aware that this feature exists. Device plugin developers can |
| 263 | +ensure that this feature is working by passing CDI devices to workloads |
| 264 | +requesting them, and ensuring that the workloads come up successfully with |
| 265 | +access to the devices they asked for. |
| 266 | + |
| 267 | +###### What are the reasonable SLOs (Service Level Objectives) for the enhancement? |
| 268 | + |
| 269 | +N/A |
| 270 | + |
| 271 | +###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service? |
| 272 | + |
| 273 | +N/A |
| 274 | + |
| 275 | +###### Are there any missing metrics that would be useful to have to improve observability of this feature? |
| 276 | + |
| 277 | +N/A |
| 278 | + |
| 279 | +### Dependencies |
| 280 | + |
| 281 | +###### Does this feature depend on any specific services running in the cluster? |
| 282 | + |
| 283 | +- The container runtime (e.g. containerd, CRI-O, etc.) must support CDI. |
| 284 | +- A Device Plugin must be implemented to use the field. |
| 285 | + |
| 286 | +### Scalability |
| 287 | + |
| 288 | +###### Will enabling / using this feature result in any new API calls? |
| 289 | + |
| 290 | +No |
| 291 | + |
| 292 | +###### Will enabling / using this feature result in introducing new API types? |
| 293 | + |
| 294 | +No |
| 295 | + |
| 296 | +###### Will enabling / using this feature result in any new calls to the cloud provider? |
| 297 | + |
| 298 | +No |
| 299 | + |
| 300 | +###### Will enabling / using this feature result in increasing size or count of the existing API objects? |
| 301 | + |
| 302 | +No |
| 303 | + |
| 304 | +###### Will enabling / using this feature result in increasing time taken by any operations covered by existing SLIs/SLOs? |
| 305 | + |
| 306 | +No |
| 307 | + |
| 308 | +###### Will enabling / using this feature result in non-negligible increase of resource usage (CPU, RAM, disk, IO, ...) in any components? |
| 309 | + |
| 310 | +No. The additional field will replace existing usages where used. |
| 311 | + |
| 312 | +###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)? |
| 313 | + |
| 314 | +No |
| 315 | + |
| 316 | +### Troubleshooting |
| 317 | + |
| 318 | +N/A |
| 319 | + |
| 320 | +###### What are other known failure modes? |
| 321 | + |
| 322 | +TBD |
| 323 | + |
| 324 | +###### What steps should be taken if SLOs are not being met to determine the problem? |
| 325 | + |
| 326 | +N/A |
| 327 | + |
| 328 | +## Implementation History |
| 329 | + |
| 330 | +- 2023-05-15: KEP created |
| 331 | + |
| 332 | +## Drawbacks |
| 333 | + |
| 334 | +There is no reason this KEP should not be implemented. CDI is the new standard |
| 335 | +for device support in containerized environments, and this enhancement now |
| 336 | +makes this possible through a simple addition to the device plugin API. |
| 337 | + |
| 338 | +## Alternatives |
| 339 | + |
| 340 | +None |