diff --git a/keps/prod-readiness/sig-storage/5538.yaml b/keps/prod-readiness/sig-storage/5538.yaml
new file mode 100644
index 00000000000..3a9da308d1e
--- /dev/null
+++ b/keps/prod-readiness/sig-storage/5538.yaml
@@ -0,0 +1,3 @@
+kep-number: 5538
+beta:
+  approver: "@jpbetz"
diff --git a/keps/sig-storage/5538-csi-sa-tokens-secrets-field/README.md b/keps/sig-storage/5538-csi-sa-tokens-secrets-field/README.md
new file mode 100644
index 00000000000..c93abd2a692
--- /dev/null
+++ b/keps/sig-storage/5538-csi-sa-tokens-secrets-field/README.md
@@ -0,0 +1,384 @@
+# KEP-5538: CSI driver opt-in for service account tokens via secrets field
+
+
+- [Release Signoff Checklist](#release-signoff-checklist)
+- [Summary](#summary)
+- [Motivation](#motivation)
+  - [Goals](#goals)
+  - [Non-Goals](#non-goals)
+- [Proposal](#proposal)
+  - [User Stories](#user-stories)
+    - [Story 1](#story-1)
+    - [Story 2](#story-2)
+  - [Notes/Constraints/Caveats](#notesconstraintscaveats)
+  - [Risks and Mitigations](#risks-and-mitigations)
+- [Design Details](#design-details)
+  - [Current Implementation](#current-implementation)
+  - [Proposed Implementation](#proposed-implementation)
+  - [Driver Migration Example](#driver-migration-example)
+  - [Test Plan](#test-plan)
+    - [Prerequisite testing updates](#prerequisite-testing-updates)
+    - [Unit tests](#unit-tests)
+    - [e2e tests](#e2e-tests)
+  - [Graduation Criteria](#graduation-criteria)
+    - [Beta](#beta)
+    - [GA](#ga)
+  - [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
+  - [Version Skew Strategy](#version-skew-strategy)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
+- [Implementation
History](#implementation-history)
+- [Drawbacks](#drawbacks)
+- [Alternatives](#alternatives)
+  - [Alternative 1: Force migration approach](#alternative-1-force-migration-approach)
+  - [Alternative 2: Enhance protosanitizer](#alternative-2-enhance-protosanitizer)
+  - [Alternative 3: New dedicated token field in CSI spec](#alternative-3-new-dedicated-token-field-in-csi-spec)
+  - [Alternative 4: Automatic detection and dual placement](#alternative-4-automatic-detection-and-dual-placement)
+  - [Alternative 5: Use CSI capability instead of CSIDriver field](#alternative-5-use-csi-capability-instead-of-csidriver-field)
+- [Infrastructure Needed](#infrastructure-needed)
+
+
+## Release Signoff Checklist
+
+Items marked with (R) are required *prior to targeting to a milestone / release*.
+
+- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
+- [ ] (R) KEP approvers have approved the KEP status as `implementable`
+- [ ] (R) Design details are appropriately documented
+- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
+  - [ ] e2e Tests for all Beta API Operations (endpoints)
+  - [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+  - [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
+- [ ] (R) Graduation criteria is in place
+  - [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
+- [ ] (R) Production readiness review completed
+- [ ] (R) Production readiness review approved
+- [ ] "Implementation History" section is up-to-date for milestone
+- [ ] User-facing documentation has been created in [kubernetes/website],
for publication to [kubernetes.io]
+- [ ] Supporting documentation, e.g., additional design documents, links to mailing list discussions/SIG meetings, relevant PRs/issues, release notes
+
+[kubernetes.io]: https://kubernetes.io/
+[kubernetes/enhancements]: https://git.k8s.io/enhancements
+[kubernetes/kubernetes]: https://git.k8s.io/kubernetes
+[kubernetes/website]: https://git.k8s.io/website
+
+## Summary
+
+This KEP proposes adding an opt-in mechanism for CSI drivers to receive service account tokens through the dedicated `secrets` field in `NodePublishVolumeRequest` instead of the `volume_context` field. A new field in the `CSIDriver` spec will allow drivers to explicitly opt into using the secrets field for improved security and proper handling of sensitive information. The default behavior will remain unchanged (tokens in volume context) to maintain backward compatibility.
+
+## Motivation
+
+Currently, when the `TokenRequests` field is set in the `CSIDriver` spec, service account tokens are generated by the kubelet and passed to CSI drivers as part of the volume attributes map with the key `csi.storage.k8s.io/serviceAccount.tokens`. These tokens can be used for workload identity with cloud providers; for example, the Secrets Store CSI Driver uses them to access external secret stores.
+
+However, including sensitive service account tokens in the volume context map alongside non-sensitive information (such as pod name, namespace, service account name) is inappropriate. The volume context is not designed for sensitive data, and this approach has led to security issues:
+
+1. Security vulnerabilities: This approach led to CVE-2023-2878 in the Secrets Store CSI Driver and CVE-2024-3744 in the Azure File CSI Driver, where service account tokens were logged as part of gRPC requests.
+2. Insufficient protection: The protosanitizer tool used by CSI drivers doesn't consider volume context as "secret", so tokens were logged without sanitization.
+3.
Driver-specific workarounds: Each driver must implement custom logic to handle token sanitization, creating inconsistency and potential security gaps.
+
+The CSI specification provides a dedicated `secrets` field in `NodePublishVolumeRequest` that is more appropriate for sensitive information like service account tokens.
+
+### Goals
+
+- Add an opt-in mechanism for CSI drivers to receive service account tokens through the secrets field in `NodePublishVolumeRequest`
+- Improve security by providing a proper way to handle sensitive token information for drivers that opt in
+- Maintain complete backward compatibility by keeping volume context as the default behavior
+- Reduce the need for driver-specific security workarounds for drivers that choose to opt in
+
+### Non-Goals
+
+- Changing the `TokenRequests` API in `CSIDriver` spec
+- Modifying how service account tokens are generated
+- Breaking backward compatibility or changing default behavior
+- Forcing migration of existing CSI drivers
+
+## Proposal
+
+### User Stories
+
+#### Story 1
+
+As a CSI driver developer, I want the option to configure my driver to receive service account tokens through the proper secrets field so that my driver can handle them securely without requiring custom sanitization logic, while maintaining the choice to use the existing volume context approach if needed.
+
+#### Story 2
+
+As a cluster administrator, I want to ensure that CSI drivers that have opted into secure token handling will not accidentally expose service account tokens in logs or debug output, while not breaking existing drivers that haven't migrated yet.
+
+### Notes/Constraints/Caveats
+
+- CSI drivers must explicitly opt in to receive tokens via the secrets field
+- Default behavior remains unchanged (tokens in volume context) for backward compatibility
+- This creates a permanent bifurcation - tokens will never be removed from volume context
+- Each CSI driver can choose the approach that works best for its implementation
+
+### Risks and Mitigations
+
+One risk is the permanent maintenance of two token delivery mechanisms. However, this is an acceptable trade-off for maintaining backward compatibility and allowing gradual adoption.
+
+Another potential issue is confusion about which mechanism to use. We'll address this by providing clear documentation and best practice guidance recommending the secrets field for new drivers.
+
+## Design Details
+
+### Current Implementation
+
+Currently, when `TokenRequests` is specified in a `CSIDriver` spec, the kubelet:
+
+1. Generates service account tokens based on the audience and expiration specified in `TokenRequests`
+2. Adds the tokens to the volume context map with key `csi.storage.k8s.io/serviceAccount.tokens`
+3. Passes the volume context to the CSI driver via `NodePublishVolumeRequest.VolumeContext`
+
+### Proposed Implementation
+
+The proposed implementation will add a new field to the `CSIDriver` spec to allow drivers to opt in to receiving tokens via the secrets field:
+
+```yaml
+apiVersion: storage.k8s.io/v1
+kind: CSIDriver
+metadata:
+  name: example-csi-driver
+spec:
+  # ... existing fields ...
+  tokenRequests:
+    - audience: "example.com"
+      expirationSeconds: 3600
+  # New field for opting into secrets delivery
+  serviceAccountTokenInSecrets: true # defaults to false
+```
+
+The behavior depends on the `serviceAccountTokenInSecrets` field:
+
+- `false` (default): Tokens are placed in `VolumeContext` with key `csi.storage.k8s.io/serviceAccount.tokens` (existing behavior)
+- `true`: Tokens are placed only in the `Secrets` field with key `csi.storage.k8s.io/serviceAccount.tokens` and not in volume context
+
+The `serviceAccountTokenInSecrets` field has the following validation rules:
+
+1. Can only be set when `tokenRequests` is configured. The API server will reject CSIDriver specs that have `serviceAccountTokenInSecrets` set without any `tokenRequests`.
+2. Can only be set when the `CSIServiceAccountTokenSecrets` feature gate is enabled. The API server will reject CSIDriver specs that set this field when the feature gate is disabled.
+
+These validations prevent misconfiguration and ensure proper downgrade behavior.
+
+When drivers use the default behavior (`serviceAccountTokenInSecrets: false` or not specified), kubelet will log a warning message indicating that this approach is not recommended for security reasons and suggesting that the driver opt in to the secrets field.
+
+The token key in the secrets field will be `csi.storage.k8s.io/serviceAccount.tokens`, the same key used in volume context. This makes migration easier for drivers since they only need to change where they read the tokens from, not what key to look for.
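
The placement rule described above can be sketched as a small pure function. This is a minimal illustration, not kubelet code: `placeTokens`, `copyMap`, the `gateEnabled` parameter, and the placeholder token string are all hypothetical names introduced here, and the sketch assumes that a disabled feature gate behaves like an unset field.

```go
package main

import "fmt"

// tokenKey is the key the text names for both delivery locations.
const tokenKey = "csi.storage.k8s.io/serviceAccount.tokens"

// placeTokens is a hypothetical helper (not kubelet code). It returns copies of
// the volume context and secrets maps with the token string added to exactly
// one of them. optIn mirrors serviceAccountTokenInSecrets (nil means unset).
func placeTokens(gateEnabled bool, optIn *bool, tokens string, volumeContext, secrets map[string]string) (map[string]string, map[string]string) {
	vc := copyMap(volumeContext)
	sec := copyMap(secrets)
	if gateEnabled && optIn != nil && *optIn {
		// Opted in: tokens go only to secrets, never to volume context.
		sec[tokenKey] = tokens
	} else {
		// Default and gate-disabled case: existing volume context behavior.
		vc[tokenKey] = tokens
	}
	return vc, sec
}

// copyMap returns a shallow copy so the caller's maps are never mutated.
func copyMap(in map[string]string) map[string]string {
	out := make(map[string]string, len(in)+1)
	for k, v := range in {
		out[k] = v
	}
	return out
}

func main() {
	optIn := true
	podInfo := map[string]string{"csi.storage.k8s.io/pod.name": "demo"}
	vc, sec := placeTokens(true, &optIn, "placeholder-token", podInfo, nil)
	_, inVolumeContext := vc[tokenKey]
	_, inSecrets := sec[tokenKey]
	fmt.Println(inVolumeContext, inSecrets) // prints "false true"
}
```

Returning copies rather than mutating the inputs keeps the sketch easy to test; the real kubelet code path may be structured differently.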
+
+### Driver Migration Example
+
+For CSI drivers that want to support both old and new kubelets during migration, here is example fallback code:
+
+```go
+import (
+	"fmt"
+
+	"github.com/container-storage-interface/spec/lib/go/csi"
+)
+
+const serviceAccountTokenKey = "csi.storage.k8s.io/serviceAccount.tokens"
+
+// getServiceAccountTokens returns the serialized service account tokens from
+// the request, preferring the secrets field over the volume context.
+func getServiceAccountTokens(req *csi.NodePublishVolumeRequest) (string, error) {
+	// Check secrets field first (new behavior when driver opts in)
+	if tokens, ok := req.GetSecrets()[serviceAccountTokenKey]; ok {
+		return tokens, nil
+	}
+
+	// Fall back to volume context (existing behavior)
+	if tokens, ok := req.GetVolumeContext()[serviceAccountTokenKey]; ok {
+		return tokens, nil
+	}
+
+	return "", fmt.Errorf("service account tokens not found")
+}
+```
+
+This approach allows drivers to work with both token delivery mechanisms during the transition period. Once a driver has opted in via `serviceAccountTokenInSecrets: true` and all clusters have been upgraded, the volume context fallback becomes unnecessary but remains harmless.
+
+### Test Plan
+
+#### Prerequisite testing updates
+
+- Unit tests for kubelet volume manager to verify token placement in secrets field
+- Integration tests to ensure CSI drivers receive tokens correctly
+
+#### Unit tests
+
+- `k8s.io/kubernetes/pkg/volume/csi`: `2025-10-03` - `73.6%`
+- `k8s.io/kubernetes/pkg/apis/storage/validation`: `2025-10-03` - `96.2%`
+
+New tests will be added for:
+- Token placement in secrets field when opt-in is enabled
+- Token placement in volume context when opt-in is disabled (default)
+- API validation that rejects `serviceAccountTokenInSecrets` being set without `tokenRequests`
+- API validation that rejects `serviceAccountTokenInSecrets` being set when feature gate is disabled
+- Feature gate behavior (enabled/disabled)
+
+#### e2e tests
+
+- Enhance existing e2e tests to cover both token delivery mechanisms.
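
The two API validation cases listed under the unit tests could be exercised with a table-driven check along the following lines. This is only a sketch under stated assumptions: `csiDriverSpec`, `validateSpec`, and `boolPtr` are hypothetical stand-ins rather than the real types in `pkg/apis/storage`, and "set" is assumed to mean any non-nil value, including an explicit `false`.

```go
package main

import (
	"errors"
	"fmt"
)

// csiDriverSpec is a trimmed, hypothetical stand-in for the CSIDriver spec.
type csiDriverSpec struct {
	TokenRequestAudiences        []string // stands in for tokenRequests
	ServiceAccountTokenInSecrets *bool    // nil means the field is unset
}

// validateSpec applies the two rules from the text: the field may only be set
// when the feature gate is enabled and when tokenRequests is configured.
func validateSpec(spec csiDriverSpec, gateEnabled bool) error {
	if spec.ServiceAccountTokenInSecrets == nil {
		return nil // nothing to validate when the field is unset
	}
	if !gateEnabled {
		return errors.New("serviceAccountTokenInSecrets requires the CSIServiceAccountTokenSecrets feature gate")
	}
	if len(spec.TokenRequestAudiences) == 0 {
		return errors.New("serviceAccountTokenInSecrets requires tokenRequests")
	}
	return nil
}

func boolPtr(b bool) *bool { return &b }

func main() {
	cases := []struct {
		name        string
		spec        csiDriverSpec
		gateEnabled bool
	}{
		{"unset field, gate off", csiDriverSpec{}, false},
		{"set with tokenRequests, gate on", csiDriverSpec{[]string{"example.com"}, boolPtr(true)}, true},
		{"set without tokenRequests", csiDriverSpec{nil, boolPtr(true)}, true},
		{"set with gate off", csiDriverSpec{[]string{"example.com"}, boolPtr(true)}, false},
	}
	for _, c := range cases {
		fmt.Printf("%s: rejected=%t\n", c.name, validateSpec(c.spec, c.gateEnabled) != nil)
	}
}
```

The first two cases are expected to pass validation and the last two to be rejected, matching the rules in the Proposed Implementation section.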
+
+### Graduation Criteria
+
+#### Beta
+
+- Feature gate `CSIServiceAccountTokenSecrets` enabled by default
+- New `serviceAccountTokenInSecrets` field added to `CSIDriver` spec
+- API validation ensures the field can only be set when feature gate is enabled and `tokenRequests` is configured
+- Comprehensive test coverage (unit, e2e)
+- Documentation available
+
+**Rationale for starting at Beta:** Since the feature requires explicit opt-in (`serviceAccountTokenInSecrets` defaults to `false`), enabling the feature gate by default poses no risk to existing deployments.
+
+#### GA
+
+- Feature gate locked to true (always enabled)
+- At least one CSI driver updated to use the secrets field
+- Clear best practice documentation and migration guides
+
+### Upgrade / Downgrade Strategy
+
+During upgrades, the new field defaults to `false`, so existing CSI drivers continue to work exactly as before. Drivers can opt in to the new behavior at their own pace.
+
+During downgrades where the feature gate is disabled, the `serviceAccountTokenInSecrets` field must be unset from all CSIDriver specs before the downgrade (API validation enforces this). Once downgraded, all tokens are placed in volume context (existing behavior).
+
+### Version Skew Strategy
+
+The feature will handle version skew gracefully:
+
+- New kubelet with old CSI driver: Existing behavior preserved (tokens in volume context)
+- Old kubelet with new CSI driver spec: New field ignored, existing behavior preserved
+- New kubelet with new CSI driver: Driver can opt in to secrets field behavior
+
+## Production Readiness Review Questionnaire
+
+### Feature Enablement and Rollback
+
+###### How can this feature be enabled / disabled in a live cluster?
+
+- [x] Feature gate (also fill in values in `kep.yaml`)
+  - Feature gate name: CSIServiceAccountTokenSecrets
+  - Components depending on the feature gate: kubelet, kube-apiserver
+
+###### Does enabling the feature change any default behavior?
+
+No, enabling the feature gate only adds the ability for CSI drivers to opt in to receiving tokens via the secrets field. The default behavior (tokens in volume context) remains unchanged for all existing drivers.
+
+###### Can the feature be disabled once it has been enabled (i.e. can we roll back the enablement)?
+
+Yes, but the `serviceAccountTokenInSecrets` field must be unset from all CSIDriver specs before disabling the feature gate. API validation prevents the field from being set when the feature gate is disabled. Once the feature gate is disabled, all tokens are placed in volume context (existing behavior).
+
+###### What happens if we re-enable the feature if it was previously rolled back?
+
+CSI drivers can set the `serviceAccountTokenInSecrets` field again in their CSIDriver specs. Once set, those drivers will receive tokens via the secrets field.
+
+###### Are there any tests for feature enablement/disablement?
+
+Yes, unit and e2e tests will verify behavior with the feature gate both enabled and disabled, and with the opt-in field set to both true and false.
+
+### Rollout, Upgrade and Rollback Planning
+
+###### How can a rollout or rollback fail? Can it impact already running workloads?
+
+The risk of rollback failure is minimal since the feature maintains backward compatibility. CSI drivers that haven't opted in continue to work unchanged. Only drivers that have explicitly opted in would be affected if the feature is disabled.
+
+Running workloads are not impacted since this only affects new volume mount operations.
+
+###### What specific metrics will inform us that the feature is working as intended?
+
+- `storage_operation_duration_seconds` for volume mount operations should not show increased error rates when the feature is enabled.
+
+### Monitoring Requirements
+
+###### How can an operator determine if the feature is in use by workloads?
+
+Run `kubectl get csidriver -o yaml` and check whether `spec.serviceAccountTokenInSecrets` is set to `true` for any driver.
+
+###### How can someone using this feature know that it is working?
+
+- CSI drivers that have opted in successfully receive tokens through the secrets field
+- CSI drivers that haven't opted in continue to receive tokens via volume context
+- Appropriate warning messages appear in logs for drivers using volume context
+- Improved security posture for drivers that have migrated to secrets field
+
+### Dependencies
+
+###### Does this feature depend on any specific services running in the cluster?
+
+No additional services required. The feature only modifies how kubelet communicates with CSI drivers.
+
+### Scalability
+
+###### Will enabling / using this feature result in any new API calls?
+
+No new API calls. This only changes the structure of existing CSI driver communication.
+
+###### Will enabling / using this feature result in introducing new API types?
+
+No new API types. This uses existing CSI specification fields.
+
+###### Will enabling / using this feature result in any new calls to the cloud provider?
+
+No new calls to cloud providers.
+
+### Troubleshooting
+
+###### How does this feature react if the cloud provider is unavailable?
+
+This feature is independent of cloud provider availability. It only affects how kubelet passes tokens to CSI drivers.
+
+###### What are the reasonable SLOs for this feature?
+
+The feature should not add measurable latency to volume mount operations. Token placement should succeed 100% of the time when the feature is enabled.
+
+###### What are the SLIs for this feature?
+
+- Success rate of token placement in secrets field
+- Latency impact on volume mount operations (should be negligible)
+
+###### Are there any missing metrics that would be useful to have to improve observability of this feature?
+
+No.
+
+## Implementation History
+
+## Drawbacks
+
+This creates a permanent bifurcation where tokens can be delivered via two different mechanisms, which increases maintenance complexity over time.
+
+There's no guarantee that existing drivers will migrate to the new approach, so we may end up supporting both mechanisms indefinitely.
+
+We'll need to maintain documentation for both approaches, which adds complexity for users trying to understand the correct approach for their use case.
+
+## Alternatives
+
+### Alternative 1: Force migration approach
+
+We could gradually migrate all drivers from volume context to secrets field over multiple releases.
+
+This would eventually remove technical debt and provide a cleaner long-term design, but carries the risk of breaking existing drivers and requires a more complex migration path.
+
+### Alternative 2: Enhance protosanitizer
+
+Instead of providing an opt-in mechanism, we could enhance the protosanitizer library to sanitize specific keys from volume context.
+
+This requires no API changes and is backward compatible, but treats volume context as potentially sensitive and doesn't address the fundamental design issue.
+
+### Alternative 3: New dedicated token field in CSI spec
+
+We could create a new dedicated field specifically for service account tokens in the CSI spec.
+
+This would be very explicit and clear in purpose, but requires CSI spec changes when the secrets field already serves this purpose.
+
+### Alternative 4: Automatic detection and dual placement
+
+We could automatically place tokens in both volume context and secrets field for all drivers.
+
+This needs no driver changes and allows gradual adoption, but always doubles token placement overhead and the security issue remains in volume context.
+
+### Alternative 5: Use CSI capability instead of CSIDriver field
+
+We could add a new CSI capability that kubelet queries to determine whether a driver supports receiving tokens via the secrets field, instead of using a field in the `CSIDriver` API.
+
+This would be cleaner from a CSI specification perspective and avoid permanently adding a field to the Kubernetes `CSIDriver` API for backward compatibility. Drivers could opt in via CSI capabilities during the `GetPluginCapabilities` call.
+
+However, service account tokens are not a first-class concept in the CSI specification - they're currently passed as Kubernetes-specific annotations in volume context. Adding a CSI capability for a Kubernetes-specific feature would be mixing concerns. Additionally, this would require changes to the CSI specification itself, which has a separate governance process and would slow down implementation.
+
+## Infrastructure Needed
+
+No additional infrastructure is required. The implementation only requires changes to kubelet and CSI driver implementations.
diff --git a/keps/sig-storage/5538-csi-sa-tokens-secrets-field/kep.yaml b/keps/sig-storage/5538-csi-sa-tokens-secrets-field/kep.yaml
new file mode 100644
index 00000000000..11374b4f36d
--- /dev/null
+++ b/keps/sig-storage/5538-csi-sa-tokens-secrets-field/kep.yaml
@@ -0,0 +1,44 @@
+title: CSI driver opt-in for service account tokens via secrets field
+kep-number: 5538
+authors:
+  - "@aramase"
+owning-sig: sig-storage
+participating-sigs:
+  - sig-auth
+  - sig-storage
+status: implementable
+creation-date: 2025-09-16
+reviewers:
+  - "@msau42"
+  - "@xing-yang"
+approvers:
+  - "@msau42"
+  - "@saad-ali"
+
+see-also:
+  - "https://github.com/kubernetes/kubernetes/issues/118377"
+
+# The target maturity stage in the current dev cycle for this KEP.
+# If the purpose of this KEP is to deprecate a user-visible feature
+# and Deprecated feature gates are added, they should be deprecated|disabled|removed.
+stage: beta
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.35"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: ""
+  beta: "v1.35"
+  stable: ""
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: CSIServiceAccountTokenSecrets
+    components:
+      - kube-apiserver
+      - kubelet
+disable-supported: true