Commit 4c79463

KEP-4412: promote wi for image pulls to beta in v1.34
Signed-off-by: Anish Ramasekar <[email protected]>
1 parent d8f6b48 commit 4c79463

File tree

3 files changed

+67
-21
lines changed
  • keps
    • prod-readiness/sig-auth
    • sig-auth/4412-projected-service-account-tokens-for-kubelet-image-credential-providers

Lines changed: 2 additions & 0 deletions
@@ -1,3 +1,5 @@
 kep-number: 4412
 alpha:
   approver: "deads2k"
+beta:
+  approver: "deads2k"

keps/sig-auth/4412-projected-service-account-tokens-for-kubelet-image-credential-providers/README.md

Lines changed: 62 additions & 19 deletions
@@ -104,7 +104,6 @@ tags, and then generate with `hack/update-toc.sh`.
 - [e2e tests](#e2e-tests)
 - [Graduation Criteria](#graduation-criteria)
 - [Alpha](#alpha)
-- [Post Alpha](#post-alpha)
 - [Beta](#beta)
 - [GA](#ga)
 - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
@@ -141,10 +140,10 @@ checklist items _must_ be updated for the enhancement to be released.
 
 Items marked with (R) are required *prior to targeting to a milestone / release*.
 
-- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
-- [ ] (R) KEP approvers have approved the KEP status as `implementable`
-- [ ] (R) Design details are appropriately documented
-- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [x] (R) KEP approvers have approved the KEP status as `implementable`
+- [x] (R) Design details are appropriately documented
+- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
 - [ ] e2e Tests for all Beta API Operations (endpoints)
 - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
 - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
@@ -687,6 +686,8 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
 https://storage.googleapis.com/k8s-triage/index.html
 -->
 
+This feature is fully tested with unit and e2e tests.
+
 ##### e2e tests
 
 <!--
@@ -699,6 +700,18 @@ https://storage.googleapis.com/k8s-triage/index.html
 We expect no non-infra related flakes in the last month as a GA graduation criteria.
 -->
 
+There is an existing e2e test for kubelet credential providers that uses the gcp credential provider.
+
+- test/e2e_node/image_credential_provider.go: https://testgrid.k8s.io/sig-node-kubelet#kubelet-credential-provider
+
+As part of the alpha implementation, the [e2e test has been updated](https://github.com/kubernetes/kubernetes/commit/2090a01e0a495301432276216bbf9af102fc431c) to cover the new credential provider configuration and the new behavior of the kubelet when the `TokenAttributes` field is set.
+
+We created a symlink to the existing gcp credential provider executable under a different name to test service account tokens for credential providers. The credential provider has been updated to validate the following when the plugin runs in service account token mode:
+
+1. Check that the required annotations are sent in the `CredentialProviderRequest.ServiceAccountAnnotations` field.
+2. Check that the service account token is sent in the `CredentialProviderRequest.ServiceAccountToken` field.
+3. Extract the claims from the service account token and validate that the audience claim matches the `ServiceAccountTokenAudience` field in the kubelet's credential provider configuration.
+
 ### Graduation Criteria
 
 <!--
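Reviewer note on the third e2e check in the hunk above: the audience validation can be sketched roughly as follows. This is a minimal illustration, not the code in `test/e2e_node`. It decodes the JWT payload without verifying the signature, and the function names (`audiences`, `hasAudience`, `unsignedToken`) and the sample audience value are hypothetical.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// audiences decodes the payload segment of a JWT and returns its "aud" claim.
// It does NOT verify the signature; it only inspects the claims.
func audiences(token string) ([]string, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return nil, errors.New("token is not a three-segment JWT")
	}
	// JWT segments use unpadded base64url encoding.
	payload, err := base64.RawURLEncoding.DecodeString(parts[1])
	if err != nil {
		return nil, fmt.Errorf("decode payload: %w", err)
	}
	var claims struct {
		Aud json.RawMessage `json:"aud"`
	}
	if err := json.Unmarshal(payload, &claims); err != nil {
		return nil, fmt.Errorf("parse claims: %w", err)
	}
	// RFC 7519 allows "aud" to be a single string or an array of strings.
	var many []string
	if err := json.Unmarshal(claims.Aud, &many); err == nil {
		return many, nil
	}
	var one string
	if err := json.Unmarshal(claims.Aud, &one); err != nil {
		return nil, fmt.Errorf("unexpected aud claim: %w", err)
	}
	return []string{one}, nil
}

// hasAudience reports whether the token's "aud" claim contains want.
func hasAudience(token, want string) (bool, error) {
	auds, err := audiences(token)
	if err != nil {
		return false, err
	}
	for _, a := range auds {
		if a == want {
			return true, nil
		}
	}
	return false, nil
}

// unsignedToken builds a demo JWT with a placeholder signature (hypothetical helper).
func unsignedToken(payloadJSON string) string {
	enc := base64.RawURLEncoding.EncodeToString
	return enc([]byte(`{"alg":"none"}`)) + "." + enc([]byte(payloadJSON)) + ".sig"
}

func main() {
	tok := unsignedToken(`{"aud":["registry.example.com"]}`)
	ok, err := hasAudience(tok, "registry.example.com")
	fmt.Println(ok, err)
}
```

A real plugin would typically leave signature verification to the registry or identity provider that consumes the token; the check here only mirrors the test's comparison against the configured audience.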
@@ -773,15 +786,13 @@ in back-to-back releases.
 - `ServiceAccountNodeAudienceRestriction` feature gate implemented in KAS as a beta feature
 - Audience validation is enabled by default for service account tokens requested by the kubelet
 
-#### Post Alpha
-
-- Make sure the feature is compatible with the [Ensure secret pull images KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images).
-
 #### Beta
 
-- The implementation works well with the Ensure secret pull images KEP and supports pod image pull policy set to any value.
+- Make the feature compatible with the [Ensure secret pull images KEP](https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/2535-ensure-secret-pulled-images).
 - `ServiceAccountNodeAudienceRestriction` feature gate is beta in KAS and enabled by default. This feature needs to be beta/enabled by default at least one release before this KEP goes to beta. This is critical to support downgrade use cases.
-- Add metrics
+- Cache KSA tokens per pod-sa to avoid regenerating tokens in a hot loop or for pods with multiple containers that pull images.
+- Provide some indication of whether the credentials are SA-scoped or SA+pod-scoped
+  - whether that's indicated in the config or in the plugin-returned content, and what the default is if unspecified (defaulting to pod is less performant, defaulting to SA risks incorrect cross-pod caching)
 
 #### GA
 
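Reviewer note on the Beta caching criterion in the hunk above: one possible shape for a per pod-sa token cache is sketched below. This is an assumption-laden illustration, not the kubelet's implementation. All type and function names (`cacheKey`, `tokenCache`, `fetchFunc`, `fakeClock`) are hypothetical, and the fetch callback stands in for whatever call actually requests a token.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// cacheKey identifies a cached token. Leaving PodUID empty models an
// SA-scoped entry shared by every pod that uses the service account.
type cacheKey struct {
	Namespace      string
	ServiceAccount string
	PodUID         string
}

// fetchFunc requests a fresh token and reports its lifetime in seconds.
type fetchFunc func() (token string, expirationSeconds int64, err error)

type cachedToken struct {
	value   string
	expires time.Time
}

// tokenCache returns a cached token until it expires, then refetches.
type tokenCache struct {
	mu      sync.Mutex
	now     func() time.Time
	entries map[cacheKey]cachedToken
}

func newTokenCache(now func() time.Time) *tokenCache {
	return &tokenCache{now: now, entries: make(map[cacheKey]cachedToken)}
}

// get holds the lock while fetching, which keeps the sketch simple but
// serializes token requests; a production cache would likely refine this.
func (c *tokenCache) get(key cacheKey, fetch fetchFunc) (string, error) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if e, ok := c.entries[key]; ok && c.now().Before(e.expires) {
		return e.value, nil
	}
	tok, ttl, err := fetch()
	if err != nil {
		return "", err
	}
	c.entries[key] = cachedToken{
		value:   tok,
		expires: c.now().Add(time.Duration(ttl) * time.Second),
	}
	return tok, nil
}

// fakeClock is a controllable clock for the demonstration.
type fakeClock struct{ t time.Time }

func (f *fakeClock) Now() time.Time         { return f.t }
func (f *fakeClock) AdvanceSeconds(s int64) { f.t = f.t.Add(time.Duration(s) * time.Second) }

func main() {
	clock := &fakeClock{t: time.Unix(0, 0)}
	cache := newTokenCache(clock.Now)
	calls := 0
	fetch := func() (string, int64, error) {
		calls++
		return fmt.Sprintf("token-%d", calls), 600, nil
	}
	key := cacheKey{Namespace: "default", ServiceAccount: "puller", PodUID: "pod-a"}
	_, _ = cache.get(key, fetch) // first container: fetches a token
	_, _ = cache.get(key, fetch) // second container in the same pod: cache hit
	fmt.Println("fetches:", calls)
}
```

Including `PodUID` in the key reflects the trade-off named in the criterion: a pod-scoped key costs more token requests, while an SA-scoped key (empty `PodUID`) shares tokens across pods.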

@@ -886,6 +897,14 @@ FeatureSpec{
 }
 ```
 
+```go
+FeatureSpec{
+  Default: true,
+  LockToDefault: false,
+  PreRelease: featuregate.Beta,
+}
+```
+
 - [x] Feature gate (also fill in values in `kep.yaml`)
   - Feature gate name: `ServiceAccountNodeAudienceRestriction`
   - Components depending on the feature gate: kube-apiserver
@@ -933,7 +952,7 @@ Steps to disable the feature:
 3. Restart the kubelet.
 
 These steps need to be performed on all nodes in the cluster.
-After restarting the kubelet on all nodes, remove the audiences used by kubelet from the KAS `--allowed-kubelet-audiences` flag.
+After restarting the kubelet on all nodes, revoke the audiences for which the kubelet may request service account tokens for image pulls by deleting the `ClusterRole` or `Role` in KAS that grants the `request-serviceaccounts-token-audience` verb.
 
 ###### What happens if we reenable the feature if it was previously rolled back?
 
@@ -974,13 +993,18 @@ rollout. Similarly, consider large clusters and how enablement/disablement
 will rollout across nodes.
 -->
 
+The feature is enabled but the exec plugin does not properly fetch and return credentials to the kubelet.
+The impact is that the kubelet cannot authenticate to the affected registries and pull images from them.
+
 ###### What specific metrics should inform a rollback?
 
 <!--
 What signals should users be paying attention to when the feature is young
 that might indicate a serious problem?
 -->
 
+High error rates from `kubelet_credential_provider_plugin_error` and long durations from `kubelet_credential_provider_plugin_duration`.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
 
 <!--
@@ -989,12 +1013,16 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
 are missing a bunch of machinery and tooling and can't do that now.
 -->
 
+No, the upgrade->downgrade->upgrade path was not tested. Manual validation will be done prior to promoting this feature to beta in v1.34.
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
 
 <!--
 Even if applying deprecation policies, they may still surprise some users.
 -->
 
+No.
+
 ### Monitoring Requirements
 
 <!--
@@ -1012,6 +1040,9 @@ checking if there are objects with field X set) may be a last resort. Avoid
 logs or events for this purpose.
 -->
 
+Operators can check for a kubelet config file passed via the `--image-credential-provider-config` flag.
+The config has a field called `matchImages`, which lists the image patterns a plugin will be invoked for.
+
 ###### How can someone using this feature know that it is working for their instance?
 
 <!--
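Reviewer note on the image-matching field mentioned in the hunk above (named `matchImages` in the kubelet's `CredentialProviderConfig`): the sketch below shows the general idea of deciding whether a plugin applies to an image. It is a deliberate simplification, not the kubelet's matching code. It compares registry hostnames only, one dot-separated segment at a time, and ignores ports-as-separate-fields, paths in patterns, and Docker Hub name normalization. The function names are hypothetical.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// registryHost returns everything before the first "/" of an image reference.
// Simplification: bare names such as "busybox" are not normalized.
func registryHost(image string) string {
	host, _, _ := strings.Cut(image, "/")
	return host
}

// hostMatches compares a host-only glob pattern with a hostname segment by
// segment, so a "*" can stand for exactly one dot-separated label.
func hostMatches(pattern, host string) bool {
	pp := strings.Split(pattern, ".")
	hp := strings.Split(host, ".")
	if len(pp) != len(hp) {
		return false
	}
	for i := range pp {
		ok, err := path.Match(pp[i], hp[i])
		if err != nil || !ok {
			return false
		}
	}
	return true
}

// pluginInvokedFor reports whether any pattern matches the image's registry host.
func pluginInvokedFor(patterns []string, image string) bool {
	host := registryHost(image)
	for _, p := range patterns {
		if hostMatches(p, host) {
			return true
		}
	}
	return false
}

func main() {
	patterns := []string{"*.registry.io"}
	fmt.Println(pluginInvokedFor(patterns, "foo.registry.io/team/app:v1"))
	fmt.Println(pluginInvokedFor(patterns, "a.b.registry.io/team/app:v1"))
}
```

An operator checking whether the feature is in use could apply the same reasoning by hand: compare the registry host of a workload's image against the patterns listed for each provider in the config file.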
@@ -1023,13 +1054,10 @@ and operation of this feature.
 Recall that end users cannot usually observe component logs or access metrics.
 -->
 
-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+Users can observe events for successful image pulls that use the service account token for image pull.
+
+- [x] Events
+  - Event Reason: " Successfully pulled image "xxx" in 11.877s (11.877s including waiting). Image size: xxx bytes."
 
 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
 
@@ -1048,6 +1076,11 @@ These goals will help you determine what you need to measure (SLIs) in the next
 question.
 -->
 
+On failure to fetch credentials from an exec plugin, the kubelet will retry after some period and invoke the plugin again.
+The kubelet will retry whenever it attempts to pull an image, but until then, kubelet will not be able to authenticate to
+the registry and pull images. The SLO for successfully invoking exec plugins should be based on the SLO for successfully
+pulling images for the container registry in question.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
 
 <!--
@@ -1093,6 +1126,8 @@ and creating new ones, as well as about cluster-level services (e.g. DNS):
 - Impact of its degraded performance or high-error rates on the feature:
 -->
 
+This feature depends on the existence of a credential provider plugin binary on the host and a configuration file for the plugin to be read by the kubelet.
+
 ### Scalability
 
 <!--
@@ -1222,6 +1257,8 @@ details). For now, we leave it here.
 
 ###### How does this feature react if the API server and/or etcd is unavailable?
 
+If the API server is unavailable, kubelet will not be able to fetch service account tokens for image pull. The kubelet will retry fetching the token after some period, but until then, kubelet will not be able to authenticate to the registry and pull images that rely on the credential provider plugin using service account tokens for image pull.
+
 ###### What are other known failure modes?
 
 <!--
@@ -1239,6 +1276,9 @@ For each of them, fill in the following information by copying the below templat
 
 ###### What steps should be taken if SLOs are not being met to determine the problem?
 
+- check logs of kubelet
+- check service availability of container registries used by the cluster
+
 ## Implementation History
 
 <!--
@@ -1252,6 +1292,9 @@ Major milestones might include:
 - when the KEP was retired or superseded
 -->
 
+1.33: Alpha release
+1.34: Beta release
+
 ## Drawbacks
 
 <!--

keps/sig-auth/4412-projected-service-account-tokens-for-kubelet-image-credential-providers/kep.yaml

Lines changed: 3 additions & 2 deletions
@@ -16,10 +16,11 @@ see-also:
   - "/keps/sig-node/2535-ensure-secret-pulled-images"
 creation-date: "2024-09-09"
 status: implementable
-stage: alpha
-latest-milestone: "v1.33"
+stage: beta
+latest-milestone: "v1.34"
 milestone:
   alpha: "v1.33"
+  beta: "v1.34"
 feature-gates:
   - name: ServiceAccountTokenForKubeletCredentialProviders
     components:
