
Commit 2e8fb5a

KEP-2133: add beta milestone and prod readiness review
Signed-off-by: Andrew Sy Kim <[email protected]>
1 parent e9b84b1 commit 2e8fb5a

3 files changed, 43 additions(+), 24 deletions(-)

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 2053
+beta:
+  approver: "@deads2k"

keps/sig-node/2133-kubelet-credential-providers/README.md

Lines changed: 38 additions & 22 deletions
@@ -40,9 +40,9 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [X] (R) KEP approvers have approved the KEP status as `implementable`
 - [X] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
 - [X] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
+- [X] (R) Production readiness review completed
 - [ ] Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -376,46 +376,51 @@ _This section must be completed when targeting beta graduation to a release._

 * **How can a rollout fail? Can it impact already running workloads?**

-TBD for beta.
+The feature is enabled, but an exec plugin does not correctly fetch and return credentials to the kubelet.
+The impact is that the kubelet cannot authenticate to the registries the plugin serves and cannot pull images from them.

 * **What specific metrics should inform a rollback?**

-TBD for beta.
+This feature does not have metrics.

 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

-TBD for beta.
+No, the upgrade->downgrade->upgrade path was not tested. Manual validation will be done before promoting this feature to beta in v1.21.

 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 fields of API types, flags, etc.?**

-TBD for beta.
+Yes. This feature was added so that the in-tree kubelet credential providers for AWS, Azure and GCP can be removed.

 ### Monitoring Requirements

 _This section must be completed when targeting beta graduation to a release._

 * **How can an operator determine if the feature is in use by workloads?**

-TBD for beta.
+Operators can check whether a config file is passed to the kubelet via the `--image-credential-provider-config` flag.
+Each provider in that config has a `matchImages` field listing the images for which its plugin will be invoked.

 * **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
 - [ ] Metrics
 - Metric name:
 - [Optional] Aggregation method:
 - Components exposing the metric:
-- [ ] Other (treat as last resort)
-- Details:
+- [X] Other (treat as last resort)
+- Details: the kubelet emits error-level logs when an exec plugin times out or returns a non-zero exit code.

 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

-TBD for beta.
+If the kubelet fails to fetch credentials from an exec plugin, it invokes the plugin again after some period,
+on its next attempt to pull the image. Until a later invocation succeeds, the kubelet cannot authenticate to
+the registry and pull images. The SLO for successfully invoking exec plugins should therefore be based on the SLO for
+successfully pulling images from the container registry in question.

 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**

-TBD for beta.
+Possibly. We could add a metric for failed calls to exec plugins.


 ### Dependencies
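The monitoring answer in this hunk points operators at the config file passed to the kubelet with `--image-credential-provider-config`; each provider in it lists image patterns (the `matchImages` field of the `CredentialProviderConfig` API) that decide when its plugin is invoked. To check by hand which images a pattern covers, here is a minimal Go sketch that approximates the matching rules documented for that API: a port in the pattern must equal the image's port, the pattern's path must be a prefix of the image's path, and both hosts must have the same number of dot-separated parts, each matched by a glob. `splitImage` and `matchImage` are hypothetical helpers, not the kubelet's code, and may differ from it in edge cases.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// splitImage parses a schemeless reference such as
// "registry.example.com:5000/team/app:v1" into host, port and path.
// A scheme is prepended only so that net/url can parse it.
func splitImage(ref string) (host, port, p string, err error) {
	u, err := url.Parse("https://" + ref)
	if err != nil {
		return "", "", "", err
	}
	return u.Hostname(), u.Port(), u.Path, nil
}

// matchImage approximates the documented matchImages rules:
//   - a port in the pattern must equal the image's port,
//   - the pattern's path must be a prefix of the image's path,
//   - both hosts must have the same number of dot-separated parts,
//   - each pattern part is a glob matched against exactly one image part.
func matchImage(pattern, image string) (bool, error) {
	pHost, pPort, pPath, err := splitImage(pattern)
	if err != nil {
		return false, err
	}
	iHost, iPort, iPath, err := splitImage(image)
	if err != nil {
		return false, err
	}
	if pPort != "" && pPort != iPort {
		return false, nil
	}
	if !strings.HasPrefix(iPath, pPath) {
		return false, nil
	}
	pParts := strings.Split(pHost, ".")
	iParts := strings.Split(iHost, ".")
	if len(pParts) != len(iParts) {
		return false, nil
	}
	for i := range pParts {
		// Each glob is matched against one dot-separated part,
		// so '*' can never span several domain parts.
		ok, err := path.Match(pParts[i], iParts[i])
		if err != nil || !ok {
			return false, err
		}
	}
	return true, nil
}

func main() {
	// Example pattern in the style used for ECR; the images are illustrative.
	pattern := "*.dkr.ecr.*.amazonaws.com"
	for _, image := range []string{
		"123456789012.dkr.ecr.us-east-1.amazonaws.com/app:v1",
		"registry.example.com/app:v1",
	} {
		ok, err := matchImage(pattern, image)
		fmt.Printf("%s matches %q: %v (err: %v)\n", image, pattern, ok, err)
	}
}
```

Because each glob covers exactly one domain part, `*.io` does not match `registry.k8s.io/pause` under this approximation, while `*.dkr.ecr.*.amazonaws.com` does match an image hosted at `123456789012.dkr.ecr.us-east-1.amazonaws.com`.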
@@ -424,7 +429,8 @@ _This section must be completed when targeting beta graduation to a release._

 * **Does this feature depend on any specific services running in the cluster?**

-TBD for beta.
+This feature depends on a credential provider plugin binary being present on the host and on a configuration file
+for the plugin that the kubelet reads.

 ### Scalability

@@ -480,19 +486,29 @@ _This section must be completed when targeting beta graduation to a release._
 No.

 * **What are other known failure modes?**
-For each of them, fill in the following information by copying the below template:
-- [Failure mode brief description]
-- Detection: How can it be detected via metrics? Stated another way:
-how can an operator troubleshoot without logging into a master or worker node?
-- Mitigations: What can be done to stop the bleeding, especially for already
-running user workloads?
-- Diagnostics: What are the useful log messages and their required logging
-levels that could help debug the issue?
-Not required until feature graduated to beta.
-- Testing: Are there any tests for failure mode? If not, describe why.
+
+- The kubelet invokes an exec plugin that does not work, so the kubelet cannot pull images handled by the plugin.
+- Detection: Images fail to pull.
+- Mitigations: Use `imagePullSecrets` as a workaround.
+- Diagnostics: Check the kubelet logs for errors.
+- Testing: No. Images are expected to fail to pull if an exec plugin is faulty.
+- A credential provider plugin invoked by the kubelet returns credentials, but they are not valid and the kubelet cannot
+use them to authenticate to the container registry.
+- Detection: Images fail to pull.
+- Mitigations: Use `imagePullSecrets` as a workaround.
+- Diagnostics: Check the kubelet logs for errors.
+- Testing: No. Images are expected to fail to pull if an exec plugin returns invalid credentials.
+- The kubelet invokes an exec plugin, but the plugin takes longer than the default 1m timeout.
+- Detection: Images fail to pull.
+- Mitigations: Check cloud provider quotas; the plugin might be slow because it is hitting API quota limits.
+- Diagnostics: Check the kubelet logs for errors.
+- Testing: No. Images are expected to fail to pull if an exec plugin takes longer than 1m.

 * **What steps should be taken if SLOs are not being met to determine the problem?**

+- Check the kubelet logs.
+- Check the availability of the container registries used by the cluster.
+
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

keps/sig-node/2133-kubelet-credential-providers/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -22,12 +22,12 @@ replaces:
 - "/keps/sig-cloud-provider/20191004-out-of-tree-credential-providers.md"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.20"
+latest-milestone: "v1.21"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
