Commit 155acc0

Merge pull request kubernetes#2457 from andrewsykim/kep-2133
KEP-2133: add beta milestone and prod readiness review
2 parents 98c940c + 8066aa5 commit 155acc0

3 files changed: +58 -29 lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 2053
+beta:
+  approver: "@deads2k"

keps/sig-node/2133-kubelet-credential-providers/README.md

Lines changed: 53 additions & 27 deletions
@@ -13,6 +13,7 @@
 - [Credential Provider Request API](#credential-provider-request-api)
 - [Credential Provider Response API](#credential-provider-response-api)
 - [Caching Credentials](#caching-credentials)
+- [Metrics](#metrics)
 - [Test Plan](#test-plan)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
@@ -40,9 +41,9 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
 - [X] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
 - [X] (R) KEP approvers have approved the KEP status as `implementable`
 - [X] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
+- [X] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input
 - [X] (R) Graduation criteria is in place
-- [ ] (R) Production readiness review completed
+- [X] (R) Production readiness review completed
 - [ ] Production readiness review approved
 - [ ] "Implementation History" section is up-to-date for milestone
 - [ ] User-facing documentation has been created in [kubernetes/website], for publication to [kubernetes.io]
@@ -312,12 +313,22 @@ The plugin can signal to the kubelet how it should cache a given response. There
 2. Registry: the kubelet should cache and use this response only for future images with the same registry hostname (and port if included).
 3. Image: the kubelet should cache and use this response only for future images that match the image exactly.

+### Metrics
+
+Two kubelet metrics will be added:
+* `kubelet_credential_provider_plugin_errors`: this will track the number of errors that occurred from invoking an exec plugin.
+* `kubelet_credential_provider_plugin_duration`: this will track the duration of execution by plugins.
+
 ### Test Plan

 Alpha:
 * unit tests for the exec plugin provider
 * unit tests for API validation

+Beta:
+* integration or e2e tests with at least one working plugin implementation
+* unit tests for new concurrency/caching improvements.
+
 ### Graduation Criteria

 ### Alpha
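The two metrics added in the Metrics section of the hunk above can be pictured with a small Python sketch. This is only an illustration of "count errors, record durations" around a plugin call; the names `plugin_errors`, `plugin_durations`, `observe_plugin` and `call_plugin` are hypothetical and are not the kubelet's implementation, which is written in Go and exposes real Prometheus metrics.

```python
import time
from collections import Counter
from contextlib import contextmanager

# Hypothetical in-memory stand-ins for the two proposed kubelet metrics.
plugin_errors = Counter()   # plugin name -> number of failed invocations
plugin_durations = {}       # plugin name -> list of call durations (seconds)


@contextmanager
def observe_plugin(name):
    """Record the duration of every call and count the calls that raise."""
    start = time.monotonic()
    try:
        yield
    except Exception:
        plugin_errors[name] += 1
        raise
    finally:
        elapsed = time.monotonic() - start
        plugin_durations.setdefault(name, []).append(elapsed)


def call_plugin(name, fn):
    """Invoke fn under observation; return None if the plugin call failed."""
    try:
        with observe_plugin(name):
            return fn()
    except Exception:
        return None
```

A rollback decision as described in the rollout answers would then look at the error count and at the recorded durations per plugin.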
@@ -330,6 +341,7 @@ can be achieved using the exec plugin.

 * integration or e2e tests.
 * at least one working plugin implementation.
+* kubelet metrics for failed calls to exec plugins.
 * improvements to concurrency and caching:
   - use `singleflight.Group` to ensure only a single call per image. Today the kubelet holds a single lock for every call to `Provide`.
     See [this](https://github.com/kubernetes/kubernetes/pull/94196#discussion_r517805701) and [this](https://github.com/kubernetes/kubernetes/pull/94196#discussion_r518487386) discussion.
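The `singleflight.Group` item in the hunk above refers to a Go package. To illustrate the idea only, here is a minimal Python sketch of the same pattern: concurrent callers that ask for the same key share one in-flight call instead of each invoking the plugin. The `SingleFlight` and `_Call` classes are hypothetical and written for this illustration; they are not the Go API and not the kubelet code.

```python
import threading


class _Call:
    """State shared by every caller waiting on one in-flight key."""

    def __init__(self):
        self.done = threading.Event()
        self.result = None
        self.error = None
        self.dups = 0  # number of callers that joined the in-flight call


class SingleFlight:
    """Ensure only one execution per key is in flight at any time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._calls = {}  # key -> _Call currently in flight

    def do(self, key, fn):
        with self._lock:
            call = self._calls.get(key)
            if call is not None:
                call.dups += 1
                leader = False
            else:
                call = self._calls[key] = _Call()
                leader = True

        if leader:
            # The first caller runs fn and publishes the outcome.
            try:
                call.result = fn()
            except Exception as exc:
                call.error = exc
            finally:
                with self._lock:
                    del self._calls[key]
                call.done.set()
        else:
            # Later callers only wait for the leader's outcome.
            call.done.wait()

        if call.error is not None:
            raise call.error
        return call.result
```

Keying the call by image, as the bullet suggests, means pulls of different images do not block each other, unlike a single lock held around every call.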
@@ -376,55 +388,59 @@ _This section must be completed when targeting beta graduation to a release._

 * **How can a rollout fail? Can it impact already running workloads?**

-TBD for beta.
+The feature is enabled but the exec plugin does not properly fetch and return credentials to the kubelet.
+The impact is that the kubelet cannot authenticate to, and pull images from, the registries handled by that plugin.

 * **What specific metrics should inform a rollback?**

-TBD for beta.
+High error rates from `kubelet_credential_provider_plugin_errors` and long durations from `kubelet_credential_provider_plugin_duration`.

 * **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**

-TBD for beta.
+No, the upgrade->downgrade->upgrade path was not tested. Manual validation will be done prior to promoting this feature to beta in v1.21.

 * **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
 fields of API types, flags, etc.?**

-TBD for beta.
+Yes, this feature was added so that the in-tree kubelet credential providers for AWS, Azure and GCP can be removed.

 ### Monitoring Requirements

 _This section must be completed when targeting beta graduation to a release._

 * **How can an operator determine if the feature is in use by workloads?**

-TBD for beta.
+Operators can check whether a config file is passed to the kubelet via the `--image-credential-provider-config` flag.
+The config has a field called `matchImages` which indicates the images a plugin will be invoked for.

 * **What are the SLIs (Service Level Indicators) an operator can use to determine
 the health of the service?**
-  - [ ] Metrics
-    - Metric name:
-    - [Optional] Aggregation method:
-    - Components exposing the metric:
-  - [ ] Other (treat as last resort)
-    - Details:
+  - [X] Metrics
+    - Metric name: `kubelet_credential_provider_plugin_errors`, `kubelet_credential_provider_plugin_duration`
+    - Components exposing the metric: kubelet
+  - [X] Other (treat as last resort)
+    - Details: the kubelet has several error-level logs for when exec plugins time out or return a non-zero exit code.

 * **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**

-TBD for beta.
+On failure to fetch credentials from an exec plugin, the kubelet will retry after some period and invoke the plugin again.
+The kubelet will retry whenever it attempts to pull an image, but until then, the kubelet will not be able to authenticate to
+the registry and pull images. The SLO for successfully invoking exec plugins should be based on the SLO for successfully
+pulling images for the container registry in question.

 * **Are there any missing metrics that would be useful to have to improve observability
 of this feature?**

-TBD for beta.
-
+No.

 ### Dependencies

 _This section must be completed when targeting beta graduation to a release._

 * **Does this feature depend on any specific services running in the cluster?**

-TBD for beta.
+This feature depends on the existence of a credential provider plugin binary on the host and a configuration file
+for the plugin to be read by the kubelet.

 ### Scalability

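The monitoring answer in the hunk above says the plugin config lists the images a plugin will be invoked for. As a rough illustration of how an operator could reason about which plugin applies to an image, here is a simplified Python sketch. The matching rules are an assumption made for this example (a glob matches a single dot-separated host segment, and the path is compared as a prefix), and the `patterns` key, `image_matches` and `plugins_for` are hypothetical names; the kubelet's actual matching logic may differ.

```python
from fnmatch import fnmatchcase


def split_image(image):
    """Split 'host[:port]/path' into (host, '/path'); path may be empty."""
    host, _, path = image.partition("/")
    return host, "/" + path if path else ""


def host_matches(pattern_host, image_host):
    """Assumed rule: each glob matches exactly one dot-separated segment."""
    p_parts = pattern_host.split(".")
    i_parts = image_host.split(".")
    if len(p_parts) != len(i_parts):
        return False
    return all(fnmatchcase(i, p) for p, i in zip(p_parts, i_parts))


def image_matches(pattern, image):
    """Assumed rule: host must match and the pattern path must be a prefix."""
    p_host, p_path = split_image(pattern)
    i_host, i_path = split_image(image)
    return host_matches(p_host, i_host) and i_path.startswith(p_path)


def plugins_for(image, providers):
    """Return the names of providers whose patterns match the image."""
    return [
        p["name"]
        for p in providers
        if any(image_matches(m, image) for m in p["patterns"])
    ]
```

For example, with one hypothetical provider whose patterns are `["*.dkr.ecr.*.amazonaws.com"]`, `plugins_for` would select it for an image hosted on an ECR registry and return an empty list for an unrelated registry.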
@@ -480,19 +496,29 @@ _This section must be completed when targeting beta graduation to a release._
 No.

 * **What are other known failure modes?**
-For each of them, fill in the following information by copying the below template:
-  - [Failure mode brief description]
-    - Detection: How can it be detected via metrics? Stated another way:
-      how can an operator troubleshoot without logging into a master or worker node?
-    - Mitigations: What can be done to stop the bleeding, especially for already
-      running user workloads?
-    - Diagnostics: What are the useful log messages and their required logging
-      levels that could help debug the issue?
-      Not required until feature graduated to beta.
-    - Testing: Are there any tests for failure mode? If not, describe why.
+
+  - kubelet is invoking an exec plugin that does not work, so the kubelet cannot pull images handled by the plugin
+    - Detection: Images fail to pull
+    - Mitigations: Use imagePullSecrets as a workaround
+    - Diagnostics: Check kubelet logs for errors.
+    - Testing: No, it is expected that images will fail to pull if an exec plugin is faulty.
+  - a credential provider plugin invoked by the kubelet returns credentials but they are not valid and kubelet cannot
+    use them to authenticate to the container registry
+    - Detection: Images fail to pull
+    - Mitigations: Use imagePullSecrets as a workaround
+    - Diagnostics: Check kubelet logs for errors.
+    - Testing: No, it is expected that images will fail to pull if an exec plugin is faulty.
+  - kubelet is invoking an exec plugin but the exec plugin takes longer than the default 1m timeout.
+    - Detection: Images fail to pull
+    - Mitigations: Check cloud provider quotas. The plugin might be taking a long time due to API quota limits.
+    - Diagnostics: Check kubelet logs for errors.
+    - Testing: No, it is expected that images will fail to pull if an exec plugin takes longer than 1m.

 * **What steps should be taken if SLOs are not being met to determine the problem?**

+- check logs of kubelet
+- check service availability of container registries used by the cluster
+
 [supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
 [existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos

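The failure modes in the hunk above (non-zero exit, unusable output, exceeding the 1m timeout) can be reproduced outside the kubelet with a small harness. The Python sketch below is a hedged illustration: `exec_plugin` is a hypothetical helper, the JSON request passed on stdin is a placeholder rather than the real request schema, and only the 60 second default is taken from the text.

```python
import json
import subprocess


def exec_plugin(argv, request, timeout_seconds=60.0):
    """Run a plugin binary, returning (response, error_message).

    The request dict is sent as JSON on stdin (placeholder schema).
    Exactly one of the two returned values is None.
    """
    try:
        proc = subprocess.run(
            argv,
            input=json.dumps(request),
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        return None, f"plugin timed out after {timeout_seconds}s"
    except OSError as exc:
        return None, f"plugin could not be started: {exc}"

    if proc.returncode != 0:
        stderr = proc.stderr.strip()
        return None, f"plugin exited with code {proc.returncode}: {stderr}"

    try:
        return json.loads(proc.stdout), None
    except json.JSONDecodeError as exc:
        return None, f"plugin returned invalid JSON: {exc}"
```

Running a candidate plugin through a harness like this before rollout gives a quick signal for the first and third failure modes; invalid credentials (the second) only show up when the registry rejects them.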
keps/sig-node/2133-kubelet-credential-providers/kep.yaml

Lines changed: 2 additions & 2 deletions
@@ -22,12 +22,12 @@ replaces:
 - "/keps/sig-cloud-provider/20191004-out-of-tree-credential-providers.md"

 # The target maturity stage in the current dev cycle for this KEP.
-stage: alpha
+stage: beta

 # The most recent milestone for which work toward delivery of this KEP has been
 # done. This can be the current (upcoming) milestone, if it is being actively
 # worked on.
-latest-milestone: "v1.20"
+latest-milestone: "v1.21"

 # The milestone at which this feature was, or is targeted to be, at each stage.
 milestone:
