@@ -40,9 +40,9 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
  - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
  - [X] (R) KEP approvers have approved the KEP status as `implementable`
  - [X] (R) Design details are appropriately documented
- - [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+ - [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
  - [X] (R) Graduation criteria is in place
- - [ ] (R) Production readiness review completed
+ - [X] (R) Production readiness review completed
  - [ ] Production readiness review approved
  - [ ] "Implementation History" section is up-to-date for milestone
  - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -376,46 +376,51 @@ _This section must be completed when targeting beta graduation to a release._

  * **How can a rollout fail? Can it impact already running workloads?**

- TBD for beta.
+ A rollout can fail if the feature is enabled but the exec plugin does not correctly fetch and return credentials to the kubelet.
+ The kubelet then cannot authenticate to the affected registries and cannot pull images from them. Containers that are already running are not affected, but pods that need to pull an image from those registries fail to start.

  * **What specific metrics should inform a rollback?**

- TBD for beta.
+ This feature does not have metrics. Image pull failures (for example, pods reporting `ErrImagePull` or `ImagePullBackOff`) for registries handled by a credential provider plugin should inform a rollback.

  * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

- TBD for beta.
+ No, the upgrade->downgrade->upgrade path was not tested. Manual validation will be done before this feature is promoted to beta in v1.21.

  * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
  fields of API types, flags, etc.?**

- TBD for beta.
+ Yes. This feature was added so that the in-tree kubelet credential providers for AWS, Azure, and GCP can be removed.

  ### Monitoring Requirements

  _This section must be completed when targeting beta graduation to a release._

  * **How can an operator determine if the feature is in use by workloads?**

- TBD for beta.
+ Operators can check whether the kubelet is started with the `--image-credential-provider-config` flag pointing at a config file.
+ Each provider in that config has a `matchImages` field that lists the images the plugin is invoked for.

  * **What are the SLIs (Service Level Indicators) an operator can use to determine
  the health of the service?**
  - [ ] Metrics
    - Metric name:
    - [Optional] Aggregation method:
    - Components exposing the metric:
- - [ ] Other (treat as last resort)
-   - Details:
+ - [X] Other (treat as last resort)
+   - Details: The kubelet emits error-level logs when an exec plugin times out or returns a non-zero exit code.

  * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

- TBD for beta.
+ If the kubelet fails to fetch credentials from an exec plugin, it invokes the plugin again on its next attempt to pull an image
+ from a matching registry (image pulls are retried with back-off). Until an invocation succeeds, the kubelet cannot authenticate to
+ the registry or pull images from it. The SLO for successfully invoking exec plugins should therefore be based on the SLO for
+ successfully pulling images from the container registry in question.

  * **Are there any missing metrics that would be useful to have to improve observability
  of this feature?**

- TBD for beta.
+ Possibly. A metric counting failed exec plugin invocations could be added.


  ### Dependencies
@@ -424,7 +429,8 @@ _This section must be completed when targeting beta graduation to a release._

  * **Does this feature depend on any specific services running in the cluster?**

- TBD for beta.
+ This feature depends on a credential provider plugin binary being present on the host and on a configuration file
+ for the plugin that is read by the kubelet.

  ### Scalability

@@ -480,19 +486,29 @@ _This section must be completed when targeting beta graduation to a release._
  No.

  * **What are other known failure modes?**
- For each of them, fill in the following information by copying the below template:
- - [Failure mode brief description]
-   - Detection: How can it be detected via metrics? Stated another way:
-     how can an operator troubleshoot without logging into a master or worker node?
-   - Mitigations: What can be done to stop the bleeding, especially for already
-     running user workloads?
-   - Diagnostics: What are the useful log messages and their required logging
-     levels that could help debug the issue?
-     Not required until feature graduated to beta.
-   - Testing: Are there any tests for failure mode? If not, describe why.
+
+ - The kubelet invokes an exec plugin that does not work, so the kubelet cannot pull images handled by that plugin.
+   - Detection: Images fail to pull.
+   - Mitigations: Use `imagePullSecrets` as a workaround.
+   - Diagnostics: Check the kubelet logs for errors.
+   - Testing: No. Images are expected to fail to pull if an exec plugin is faulty.
+ - A credential provider plugin invoked by the kubelet returns credentials, but they are not valid and the kubelet cannot
+   use them to authenticate to the container registry.
+   - Detection: Images fail to pull.
+   - Mitigations: Use `imagePullSecrets` as a workaround.
+   - Diagnostics: Check the kubelet logs for errors.
+   - Testing: No. Images are expected to fail to pull if an exec plugin returns invalid credentials.
+ - The kubelet invokes an exec plugin, but the plugin takes longer than the default 1m timeout.
+   - Detection: Images fail to pull.
+   - Mitigations: Check cloud provider quotas; the plugin might be slow because of API quota limits.
+   - Diagnostics: Check the kubelet logs for errors.
+   - Testing: No. Images are expected to fail to pull if an exec plugin takes longer than 1m.

  * **What steps should be taken if SLOs are not being met to determine the problem?**

+ - Check the kubelet logs.
+ - Check the availability of the container registries used by the cluster.
+
  [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
  [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos