Commit 5739c63

Merge pull request kubernetes#1912 from zshihang/bound
add a section for service account admission controller migration
2 parents 21642d5 + b7901bc commit 5739c63

File tree

2 files changed

+211
-59
lines changed

keps/sig-auth/20190806-serviceaccount-tokens.md renamed to keps/sig-auth/1205-bound-service-account-tokens/README.md

Lines changed: 175 additions & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -1,21 +1,9 @@
1-
---
2-
title: Bound Service Account Tokens
3-
authors:
4-
- "@mikedanese"
5-
owning-sig: sig-auth
6-
approvers:
7-
- "@liggitt"
8-
- TBD
9-
creation-date: 2019-08-06
10-
last-updated: 2020-03-25
11-
status: implemented
12-
---
13-
141
# Bound Service Account Tokens
152

163
## Table Of Contents
174

185
<!-- toc -->
6+
197
- [Summary](#summary)
208
- [Background](#background)
219
- [Motivation](#motivation)
@@ -30,16 +18,21 @@ status: implemented
3018
- [Example Flow](#example-flow)
3119
- [Service Account Authenticator Modification](#service-account-authenticator-modification)
3220
- [ACLs for TokenRequest](#acls-for-tokenrequest)
33-
- [Safe Adoption](#safe-adoption)
21+
- [ServiceAccount Admission Controller Migration](#serviceaccount-admission-controller-migration)
22+
- [Prerequisites](#prerequisites)
3423
- [Safe rollout of time-bound token](#safe-rollout-of-time-bound-token)
3524
- [Graduation Criteria](#graduation-criteria)
25+
- [Alpha-&gt;Beta](#alpha-beta)
3626
- [Beta -&gt; GA Graduation](#beta---ga-graduation)
37-
<!-- /toc -->
27+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
28+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
29+
- [Scalability](#scalability)
30+
<!-- /toc -->
3831

3932
## Summary
4033

41-
This KEP describes an API that would allow workloads running on Kubernetes
42-
to request JSON Web Tokens that are audience, time and eventually key bound.
34+
This KEP describes an API that would allow workloads running on Kubernetes to
35+
request JSON Web Tokens that are audience, time and eventually key bound.
4336

4437
## Background
4538

@@ -89,9 +82,9 @@ TokenReview API will support this validation.
8982
Tokens issued from this API will be time bound. Time validity of these tokens
9083
will be claimed in the following fields:
9184

92-
* `exp`: expiration time
93-
* `nbf`: not before
94-
* `iat`: issued at
85+
- `exp`: expiration time
86+
- `nbf`: not before
87+
- `iat`: issued at
9588

9689
A recipient of a token should verify that the token is valid at the time that
9790
the token is presented, and should otherwise reject the token. The TokenReview
@@ -112,17 +105,17 @@ object will only be valid for as long as that object exists.
112105
Only a subset of object kinds will support object binding. Initially the only
113106
kinds that will be supported are:
114107

115-
* v1/Pod
116-
* v1/Secret
108+
- v1/Pod
109+
- v1/Secret
117110

118111
The TokenRequest API will validate this binding.
119112

120113
### API Changes
121114

122115
#### Add `tokenrequests.authentication.k8s.io`
123116

124-
We will add an imperative API (a la TokenReview) to the
125-
`authentication.k8s.io` API group:
117+
We will add an imperative API (a la TokenReview) to the `authentication.k8s.io`
118+
API group:
126119

127120
```golang
128121
type TokenRequest struct {
@@ -265,53 +258,176 @@ service account token on behalf of pods running on that node. The
265258
NodeRestriction admission controller will require that these tokens are pod
266259
bound.
267260

268-
### Safe Adoption
261+
### ServiceAccount Admission Controller Migration
262+
263+
#### Prerequisites
264+
265+
Before migration to a version with `BoundServiceAccountTokenVolume=true`, cluster
266+
operators should:
267+
268+
1. Set feature gate `TokenRequest=true` (defaults to `true` since 1.12).
269+
270+
- This feature requires the following flags to the API server:
271+
- `--service-account-issuer`
272+
- `--service-account-signing-key-file`
273+
- `--service-account-key-file`
274+
- `--api-audiences` (defaults to `--service-account-issuer`)
275+
276+
2. Set feature gate `TokenRequestProjection=true` (defaults to `true` since
277+
1.12).
278+
279+
3. Update all workloads to a newer version of the officially supported Kubernetes
280+
client libraries, which reload the token:
281+
282+
- Go: >= v0.15.7
283+
- Python: >= v12.0.0
284+
- Java: >= v9.0.0
285+
- Javascript: >= v0.10.3
286+
- Ruby: master branch
287+
- Haskell: v0.3.0.0
288+
289+
For community-maintained client libraries, feel free to contribute to them
290+
if the reloading logic is missing.
291+
292+
**Note**: If it is difficult to find every place that uses in-cluster config,
293+
cluster operators can pass the flag
294+
`--service-account-extend-token-expiration` to kube-apiserver to temporarily
295+
allow tokens a longer expiration during the migration. Any usage of a
296+
legacy token will be recorded in both metrics and audit logs. After fixing
297+
all the potentially broken workloads, remember to remove the flag so
298+
that the original expiration settings are honored.
299+
300+
- Metrics: `serviceaccount_stale_tokens_total`
301+
- Audit: look for the `authentication.k8s.io/stale-token` annotation
302+
303+
See the next section for details on how to discover the workloads that will
304+
suffer from expired tokens.
305+
306+
If anything goes wrong, please file a bug and CC @kubernetes/sig-auth-bugs. More
307+
contact information is available
308+
[here](https://github.com/kubernetes/community/tree/master/sig-auth#contact).
269309

270310
#### Safe rollout of time-bound token
271311

272-
Legacy service account tokens distributed via secrets are not time-bound.
273-
Many client libraries have come to depend on this behavior. After time-bound service
274-
account token being used, if in-cluster clients do not periodically reload token
275-
from projected volume, requests would be rejected once the initial token got expired.
312+
Legacy service account tokens distributed via secrets are not time-bound. Many
313+
client libraries have come to depend on this behavior. Once time-bound service
314+
account tokens are in use, in-cluster clients that do not periodically reload the
315+
token from the projected volume will have their requests rejected once the
316+
initial token expires.
276317

277318
In order to allow gradual adoption of time-bound tokens, we would:
278-
1. Pick a constant period D between one and two hours. The value of D would be static
279-
across Kubernetes deployments, while avoiding collision with common duration.
280-
1. Modify service account admission control to inject token valid for D when the
281-
BoundServiceAccountTokenVolume feature is enabled.
282-
1. Modify kube apiserver TokenRequest API. When it receives TokenRequest with requested
283-
valid period D, extend the token lifetime to one year. At the same time, save
284-
the original requested D to `kubernetes.io/warnafter` field in minted token.
285-
1. In the TokenRequest status, tell clients that the token would be valid only for D,
286-
encouraging clients to reload token as if the token was valid for D.
287-
319+
320+
1. Pick a constant period D between one and two hours. The value of D would be
321+
static across Kubernetes deployments, while avoiding collision with common
322+
durations.
323+
1. Modify service account admission control to inject a token valid for D when
324+
the BoundServiceAccountTokenVolume feature is enabled.
325+
1. Modify the kube-apiserver TokenRequest API. When it receives a TokenRequest with
326+
requested validity period D, extend the token lifetime to one year. At the same
327+
time, save the originally requested D to the `kubernetes.io/warnafter` field in
328+
the minted token.
329+
1. In the TokenRequest status, tell clients that the token would be valid only
330+
for D, encouraging clients to reload the token as if it were valid for D.
331+
288332
This modification could be optionally enabled by providing a command line flag
289-
to kube apiserver.
333+
to kube apiserver.
334+
335+
These extended tokens would not expire and would continue to be accepted for one
336+
year. At the same time, the authentication side could monitor whether clients
337+
are properly reloading tokens by:
290338

291-
These extended tokens would not expire and continue to be accepted within one year.
292-
At the same time, the authentication side could monitor whether clients are properly
293-
reloading tokens by:
339+
1. Comparing the `kubernetes.io/warnafter` field with the current time. If the
340+
current time is after the `kubernetes.io/warnafter` field, the calling client
341+
is not reloading its token regularly.
342+
1. Exposing metrics to monitor the number of legacy and stale tokens used.
343+
1. Adding an annotation to audit events for legacy and stale tokens, including
344+
the information necessary to locate the problematic client.
294345

295-
1. Compare the `kubernetes.io/warnafter` field with current time. If current time
296-
is after `kubernetes.io/warnafter` field, it implies calling client is not
297-
reloading token regularly.
298-
1. Expose metrics to monitor number of legacy and stale token used.
299-
1. Add annotation to audit events for legacy and stale tokens including necessary
300-
information to locate problematic client.
301-
302-
This functionality can be implemented entirely in the kube-apiserver and does not
303-
require cooperation by the kubelet beyond the standard functionality implemented
304-
as part of [Service Account Token Volumes](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/storage/svcacct-token-volume-source.md).
346+
### Graduation Criteria
305347

306-
We will direct cluster administrators to adopt bound service account tokens and
307-
turn on the command line flag to enable extended token. Fix clients if monitoring or
308-
audit report that stale or legacy token are in use. The operator can turn off
309-
the command line flag after no stale token is being used.
348+
#### Alpha->Beta
310349

350+
Estimated version: v1.20
311351

352+
All known migration frictions have been fixed:
312353

313-
### Graduation Criteria
354+
- PodSecurityPolicies that allow secrets but not projected volumes will
355+
prevent the use of token volumes.
356+
- Fixed in https://github.com/kubernetes/kubernetes/pull/92006
357+
- In-cluster clients that don’t reload service account tokens will start
358+
failing an hour after deployment.
359+
- Mitigation added in https://github.com/kubernetes/kubernetes/issues/68164
360+
- Pods running as non root may not access the service account token.
361+
- Fixed in https://github.com/kubernetes/kubernetes/pull/89193
362+
363+
An upgrade test is passing periodically:
364+
365+
1. Create pod A with the feature disabled; pod A is working and a secret volume
366+
is mounted.
367+
2. Enable the feature; pod A continues working.
368+
3. Create pod B; it is working and projected volumes are mounted.
314369

315370
#### Beta -> GA Graduation
316371

317-
- TBD
372+
Estimated version: v1.21+
373+
374+
The new `ServiceAccount` admission controller works as intended (WAI) in Beta for >= 1 minor release without
375+
significant issues.
376+
377+
## Production Readiness Review Questionnaire
378+
379+
### Feature Enablement and Rollback
380+
381+
- **How can this feature be enabled / disabled in a live cluster?**
382+
383+
- Feature gate name: `BoundServiceAccountTokenVolume`
384+
- Components depending on the feature gate: kube-apiserver and
385+
kube-controller-manager
386+
- Will enabling / disabling the feature require downtime of the control
387+
plane? yes, need to restart kube-apiserver and kube-controller-manager.
388+
- Will enabling / disabling the feature require downtime or reprovisioning
389+
of a node? no.
390+
391+
- **Does enabling the feature change any default behavior?** yes, pods'
392+
service account tokens will not be long-lived and are not stored as Secrets
393+
any more.
394+
395+
- **Can the feature be disabled once it has been enabled (i.e. can we roll
396+
back the enablement)?** yes. pods created while the feature was enabled will
397+
reference a configmap that can grow stale with the feature disabled.
398+
399+
- **What happens if we reenable the feature if it was previously rolled
400+
back?** the same as the first enablement.
401+
402+
- **Are there any tests for feature enablement/disablement?**
403+
- unit test: plugin/pkg/admission/serviceaccount/admission_test.go
404+
- upgrade test: test/e2e/upgrades/serviceaccount_admission_controller_migration.go
405+
406+
### Scalability
407+
408+
- **Will enabling / using this feature result in any new API calls?**
409+
410+
- API call type: `TokenRequest`
411+
- estimated throughput: 1/pod every ~48 minutes.
412+
- originating component: kubelet
413+
- components listing and/or watching resources they didn't before: N/A.
414+
- API calls that may be triggered by changes of some Kubernetes resources:
415+
N/A.
416+
- periodic API calls to reconcile state (e.g. periodic fetching state,
417+
heartbeats, leader election, etc.): 1 call per pod every ~48 minutes.
418+
419+
- **Will enabling / using this feature result in introducing new API types?**
420+
no.
421+
422+
- **Will enabling / using this feature result in any new calls to the cloud
423+
provider?** no.
424+
425+
- **Will enabling / using this feature result in increasing size or count of
426+
the existing API objects?** no.
427+
428+
- **Will enabling / using this feature result in increasing time taken by any
429+
operations covered by [existing SLIs/SLOs]?** no.
430+
431+
- **Will enabling / using this feature result in non-negligible increase of
432+
resource usage (CPU, RAM, disk, IO, ...) in any components?** it adds a
433+
token minting operation in the API server every ~48 minutes for every pod.
Lines changed: 36 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,36 @@
1+
---
2+
title: Bound Service Account Tokens
3+
authors:
4+
- "@mikedanese"
5+
- "@zshihang"
6+
owning-sig: sig-auth
7+
participating-sigs:
8+
- sig-auth
9+
reviewers:
10+
- "@liggitt"
11+
approvers:
12+
- "@liggitt"
13+
creation-date: 2019-08-06
14+
last-updated: 2020-09-23
15+
status: implementable
16+
stage: beta
17+
latest-milestone: "v1.20"
18+
milestone:
19+
alpha: "v1.13"
20+
beta: "v1.20"
21+
feature-gates:
22+
- name: TokenRequest
23+
components:
24+
- kube-apiserver
25+
- kube-controller-manager
26+
- name: TokenRequestProjection
27+
components:
28+
- kube-apiserver
29+
- kubelet
30+
- name: BoundServiceAccountTokenVolume
31+
components:
32+
- kube-apiserver
33+
- kube-controller-manager
34+
metrics:
35+
- serviceaccount_stale_tokens_total
36+
---
