|
| 1 | +# Service Account Token for CSI Driver |
| 2 | + |
| 3 | +## Table of Contents |
| 4 | + |
| 5 | +<!-- toc --> |
| 6 | + |
| 7 | +- [Summary](#summary) |
| 8 | +- [Motivation](#motivation) |
| 9 | + - [User stories](#user-stories) |
| 10 | +- [Proposal](#proposal) |
| 11 | + - [Goals](#goals) |
| 12 | + - [Non-Goals](#non-goals) |
| 13 | + - [API Changes](#api-changes) |
| 14 | + - [Example Workflow](#example-workflow) |
| 15 | + - [Notes/Constraints/Caveats](#notesconstraintscaveats) |
| 16 | + - [Test Plan](#test-plan) |
| 17 | + - [Graduation Criteria](#graduation-criteria) |
| 18 | + - [Alpha->Beta](#alpha-beta) |
| 19 | + - [Beta->GA](#beta-ga) |
| 20 | +- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire) |
| 21 | + - [Feature Enablement and Rollback](#feature-enablement-and-rollback) |
| 22 | + - [Scalability](#scalability) |
| 23 | +- [Alternatives](#alternatives) |
| 24 | +- [Implementation History](#implementation-history) |
| 25 | +<!-- /toc --> |
| 26 | + |
| 27 | +## Summary |
| 28 | + |
| 29 | +This KEP proposes a way for CSI drivers to obtain service account tokens for |
| 30 | +the pods they are mounting volumes for. Since these tokens are valid only for a |
| 31 | +limited period, this KEP also gives CSI drivers an option to re-execute |
| 32 | +`NodePublishVolume` to mount volumes. |
| 33 | + |
| 34 | +## Motivation |
| 35 | + |
| 36 | +Currently, the only way for CSI drivers to acquire service account tokens is to |
| 37 | +read the token directly from the file system. However, this approach has |
| 38 | +several drawbacks: |
| 39 | + |
| 40 | +1. It does not work for CSI drivers that run as a different non-root user than |
| 41 | + the pods. See the |
| 42 | + [file permission section for service account token](https://github.com/kubernetes/enhancements/blob/master/keps/sig-storage/20180515-svcacct-token-volumes.md#file-permission). |
| 43 | +2. The CSI driver will have access to the secrets of pods that do not use it, |
| 44 | + because the CSI driver must have a `hostPath` volume for the `pods` |
| 45 | + subdirectory to read the token. |
| 46 | +3. The audience of the token defaults to kube-apiserver. |
| 47 | +4. The token is not guaranteed to be available (e.g. |
| 48 | + `automountServiceAccountToken=false`). |
| 49 | + |
| 50 | +### User stories |
| 51 | + |
| 52 | +- HashiCorp Vault provider for secret store CSI driver requires the service |
| 53 | + account tokens of the pods it is mounting secrets for in order to authenticate |
| 54 | + to Vault. The provisioned secrets also have a given TTL in Vault, so it is |
| 55 | + necessary to get tokens after the initial mount. |
| 56 | +- Cert manager CSI driver will create CertificateRequests on behalf of the |
| 57 | + pods. |
| 58 | +- Amazon EFS CSI driver wants the service account tokens of pods to exchange |
| 59 | + for AWS credentials. |
| 60 | + |
| 61 | +## Proposal |
| 62 | + |
| 63 | +### Goals |
| 64 | + |
| 65 | +- Allow CSI drivers to request audience-bound service account tokens of pods |
| 66 | + from kubelet for use in `NodePublishVolume`. |
| 67 | +- Provide an option to re-execute `NodePublishVolume` in a best-effort manner. |
| 68 | + |
| 69 | +### Non-Goals |
| 70 | + |
| 71 | +- Other CSI calls, e.g. `NodeStageVolume`, will not receive pods' service |
| 72 | + account tokens via this feature. |
| 73 | +- Failed re-execution of `NodePublishVolume` will not unmount volumes. |
| 74 | + |
| 75 | +### API Changes |
| 76 | + |
| 77 | +```go |
| 78 | +// CSIDriverSpec is the specification of a CSIDriver. |
| 79 | +type CSIDriverSpec struct { |
| 80 | + // ... existing fields |
| 81 | + |
| 82 | + RequiresRemount *bool |
| 83 | + ServiceAccountTokens []ServiceAccountToken |
| 84 | +} |
| 85 | + |
| 86 | +// ServiceAccountToken contains parameters of a token. |
| 87 | +type ServiceAccountToken struct { |
| 88 | + Audience *string |
| 89 | + ExpirationSeconds *int64 |
| 90 | +} |
| 91 | +``` |
| 92 | + |
| 93 | +These three fields are all optional: |
| 94 | + |
| 95 | +- **`ServiceAccountToken.Audience`**: will be set in `TokenRequestSpec`. This |
| 96 | + will default to the `APIAudiences` of kube-apiserver if it is empty. The storage |
| 97 | + provider of the CSI driver is supposed to send a `TokenReview` with at least |
| 98 | + one of the audiences specified. |
| 99 | + |
| 100 | +- **`ServiceAccountToken.ExpirationSeconds`**: will be set in |
| 101 | + `TokenRequestSpec`. The issued token may have a different duration, so the |
| 102 | + `ExpirationTimestamp` in `TokenRequestStatus` will be passed to the CSI driver. |
| 103 | + |
| 104 | +- **`RequiresRemount`**: should only be set when the volumes mounted by the |
| 105 | + CSI driver have a TTL and require re-validation of the token. |
| 106 | + |
| 107 | + - **Note**: Remount means re-execution of `NodePublishVolume` in the scope of |
| 108 | + CSI, with no intervening unmounts. If this option is used, |
| 109 | + `NodePublishVolume` should only change the contents rather than the |
| 110 | + mount, because the container will not be restarted to reflect the mount |
| 111 | + change. The period between remounts is 0.1s, which is hardcoded as |
| 112 | + `reconcilerLoopSleepPeriod` in the volume manager. However, the rate of |
| 113 | + `TokenRequest` calls is not 10/s because the token is cached until expiration. |
| 114 | + |
| 115 | +The token will be bound to the pod that the CSI driver is mounting volumes for |
| 116 | +and will be set in `VolumeContext`: |
| 117 | + |
| 118 | +```go |
| 119 | +"csi.storage.k8s.io/serviceAccount.tokens": { |
| 120 | + 'audience': { |
| 121 | + 'token': token, |
| 122 | + 'expiry': expiry, |
| 123 | + }, |
| 124 | + ... |
| 125 | +} |
| 126 | +``` |
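On the driver side, the entry above could be decoded along these lines. This is a sketch under stated assumptions: that the value is a JSON-encoded string (the CSI volume context is a string-to-string map), that the field names are `token` and `expiry` as shown above, and that `expiry` is an RFC 3339 timestamp. The helper `parseTokens` and the type `audienceToken` are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// tokensKey is the volume context key named in this proposal.
const tokensKey = "csi.storage.k8s.io/serviceAccount.tokens"

// audienceToken mirrors one per-audience entry. The JSON field names and
// the RFC 3339 expiry format are assumptions based on the snippet above.
type audienceToken struct {
	Token  string    `json:"token"`
	Expiry time.Time `json:"expiry"`
}

// parseTokens decodes the audience -> token map from the volume context.
func parseTokens(volumeContext map[string]string) (map[string]audienceToken, error) {
	raw, ok := volumeContext[tokensKey]
	if !ok {
		return nil, fmt.Errorf("volume context has no %q entry", tokensKey)
	}
	tokens := map[string]audienceToken{}
	if err := json.Unmarshal([]byte(raw), &tokens); err != nil {
		return nil, fmt.Errorf("decoding %q: %w", tokensKey, err)
	}
	return tokens, nil
}

func main() {
	// Example volume context as a driver might receive it in NodePublishVolume.
	volumeContext := map[string]string{
		tokensKey: `{"vault": {"token": "example-token", "expiry": "2030-01-01T00:00:00Z"}}`,
	}
	tokens, err := parseTokens(volumeContext)
	if err != nil {
		panic(err)
	}
	fmt.Println(tokens["vault"].Token, tokens["vault"].Expiry.Year()) // prints "example-token 2030"
}
```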
| 127 | + |
| 128 | +### Example Workflow |
| 129 | + |
| 130 | +Take the Vault provider for secret store CSI driver as an example: |
| 131 | + |
| 132 | +1. Create a `CSIDriver` object with `ServiceAccountTokens[0].Audience='vault'` |
| 133 | + and `RequiresRemount=true`. |
| 134 | +2. When the volume manager of kubelet sees a new volume, the pod object in |
| 135 | + `mountedPods` will have `requiresRemount=true` after `MarkRemountRequired` |
| 136 | + is called. `MarkRemountRequired` will call into `RequiresRemount` of the |
| 137 | + in-tree CSI plugin to fetch the `CSIDriver` object. |
| 138 | +3. Before the `NodePublishVolume` call, kubelet will request a token from the |
| 139 | + `TokenRequest` API with `audiences=['vault']`. |
| 140 | +4. The token will be passed in the `VolumeContext` of the `NodePublishVolume` call. |
| 141 | +5. Every 0.1 seconds, the reconciler component of the volume manager will remount |
| 142 | + the volume in case the Vault secrets expire and re-login is required. |
| 143 | + |
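Step 1 of the workflow can be sketched in Go using local copies of the types proposed under "API Changes", trimmed to the new fields. The helper names `ptr` and `vaultDriverSpec` are hypothetical, and the 600-second expiration is an illustrative value that does not come from this proposal.

```go
package main

import "fmt"

// ServiceAccountToken is a local copy of the proposed type.
type ServiceAccountToken struct {
	Audience          *string
	ExpirationSeconds *int64
}

// CSIDriverSpec is a local copy holding only the newly proposed fields.
type CSIDriverSpec struct {
	RequiresRemount      *bool
	ServiceAccountTokens []ServiceAccountToken
}

// ptr returns a pointer to v; handy for optional pointer-typed fields.
func ptr[T any](v T) *T { return &v }

// vaultDriverSpec builds the spec described in step 1 of the workflow.
func vaultDriverSpec() CSIDriverSpec {
	return CSIDriverSpec{
		RequiresRemount: ptr(true),
		ServiceAccountTokens: []ServiceAccountToken{
			// 600 seconds is an assumed value for illustration only.
			{Audience: ptr("vault"), ExpirationSeconds: ptr(int64(600))},
		},
	}
}

func main() {
	spec := vaultDriverSpec()
	fmt.Println(*spec.RequiresRemount, *spec.ServiceAccountTokens[0].Audience) // prints "true vault"
}
```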
| 144 | +### Notes/Constraints/Caveats |
| 145 | + |
| 146 | +`RequiresRemount` is useful when the mounted volumes can expire and the |
| 147 | +availability and validity of volumes are continuously required. Those volumes |
| 148 | +most likely hold credentials that are rotated as a security best practice. There |
| 149 | +are two options when a remount fails: |
| 150 | + |
| 151 | +1. Keep the container/pod running and use the old credentials. |
| 152 | + - The next `NodePublishVolume` may succeed if the failure was |
| 153 | + transient. |
| 154 | + - Since stale credentials may be used for some multiple of 0.1 seconds, it is |
| 155 | + critical for the credential provisioners to guarantee that the validity |
| 156 | + is revoked after expiry. In general, it is much harder to eliminate the |
| 157 | + sinks than the source. |
| 158 | + - The container/pod will also have better observability into usage of the |
| 159 | + stale credentials. |
| 160 | +2. Kill the container/pod and hope that the new container/pod has the refreshed |
| 161 | + credentials. |
| 162 | + - This will reduce the stale volume exposure by one sink. |
| 163 | + - This is more likely to overcome fatal errors. |
| 164 | + - Container start-up cost is high. |
| 165 | + |
| 166 | +Option 1 is adopted. See discussion |
| 167 | +[here](https://github.com/kubernetes/enhancements/pull/1855#discussion_r443040359). |
| 168 | + |
| 169 | +### Test Plan |
| 170 | + |
| 171 | +- Unit tests around all the added logic in kubelet. |
| 172 | +- E2E tests around remount and token passing. |
| 173 | + |
| 174 | +### Graduation Criteria |
| 175 | + |
| 176 | +#### Alpha->Beta |
| 177 | + |
| 178 | +- Implemented the feature. |
| 179 | +- Wrote all the unit and E2E tests. |
| 180 | + |
| 181 | +#### Beta->GA |
| 182 | + |
| 183 | +- Deployed the feature in production and went through at least one minor |
| 184 | + Kubernetes version. |
| 185 | + |
| 186 | +## Production Readiness Review Questionnaire |
| 187 | + |
| 188 | +### Feature Enablement and Rollback |
| 189 | + |
| 190 | +- **How can this feature be enabled / disabled in a live cluster?** |
| 191 | + |
| 192 | + - Feature gate name: CSIDriverServiceAccountToken |
| 193 | + - Components depending on the feature gate: kubelet, kube-apiserver |
| 194 | + - Will enabling / disabling the feature require downtime of the control |
| 195 | + plane? no. |
| 196 | + - Will enabling / disabling the feature require downtime or reprovisioning |
| 197 | + of a node? yes. |
| 198 | + |
| 199 | +- **Does enabling the feature change any default behavior?** no. |
| 200 | + |
| 201 | +- **Can the feature be disabled once it has been enabled (i.e. can we roll |
| 202 | + back the enablement)?** yes, as long as the new fields in `CSIDriverSpec` are |
| 203 | + not used. |
| 204 | + |
| 205 | +- **What happens if we reenable the feature if it was previously rolled |
| 206 | + back?** nothing, as long as the new fields in `CSIDriverSpec` are not used. |
| 207 | + |
| 208 | +- **Are there any tests for feature enablement/disablement?** yes, unit tests |
| 209 | + will cover this. |
| 210 | + |
| 211 | +### Scalability |
| 212 | + |
| 213 | +- **Will enabling / using this feature result in any new API calls?** |
| 214 | + |
| 215 | + - API call type: `TokenRequest` |
| 216 | + - estimated throughput: 1 (`RequiresRemount=false`) or |
| 217 | + 1/ExpirationSeconds per second (`RequiresRemount=true`) for each CSI driver |
| 218 | + using this feature. |
| 219 | + - originating component: kubelet |
| 220 | + - components listing and/or watching resources they didn't before: n/a. |
| 221 | + - API calls that may be triggered by changes of some Kubernetes resources: |
| 222 | + n/a. |
| 223 | + - periodic API calls to reconcile state (e.g. periodic fetching state, |
| 224 | + heartbeats, leader election, etc.): n/a. |
| 225 | + |
| 226 | +- **Will enabling / using this feature result in introducing new API types?** |
| 227 | + no. |
| 228 | + |
| 229 | +- **Will enabling / using this feature result in any new calls to the cloud |
| 230 | + provider?** no. |
| 231 | + |
| 232 | +- **Will enabling / using this feature result in increasing size or count of |
| 233 | + the existing API objects?** no. |
| 234 | + |
| 235 | +- **Will enabling / using this feature result in increasing time taken by any |
| 236 | + operations covered by [existing SLIs/SLOs]?** no. |
| 237 | + |
| 238 | +- **Will enabling / using this feature result in non-negligible increase of |
| 239 | + resource usage (CPU, RAM, disk, IO, ...) in any components?** no. |
| 240 | + |
| 241 | +## Alternatives |
| 242 | + |
| 243 | +1. Instead of fetching tokens in kubelet, CSI drivers would be granted |
| 244 | + permission to use the `TokenRequest` API. This would require a non-trivial |
| 245 | + admission plugin to do the necessary validation, and every CSI driver would |
| 246 | + need to reimplement the same functionality. |
| 247 | + |
| 248 | +## Implementation History |