Skip to content

Commit e95fd98

Browse files
authored
Merge pull request #4470 from munnerz/4193-update-milestone
KEP-4193: Update milestone for v1.30 & clarify latest state
2 parents d26a46e + 1ded674 commit e95fd98

File tree

3 files changed

+115
-13
lines changed

3 files changed

+115
-13
lines changed
Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,3 +1,5 @@
11
kep-number: 4193
22
alpha:
33
approver: "deads2k"
4+
beta:
5+
approver: "@deads2k"

keps/sig-auth/4193-bound-service-account-token-improvements/README.md

Lines changed: 112 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -45,14 +45,14 @@
4545

4646
Items marked with (R) are required *prior to targeting to a milestone / release*.
4747

48-
- [ ] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
49-
- [ ] (R) KEP approvers have approved the KEP status as `implementable`
50-
- [ ] (R) Design details are appropriately documented
51-
- [ ] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
48+
- [x] (R) Enhancement issue in release milestone, which links to KEP dir in [kubernetes/enhancements] (not the initial KEP PR)
49+
- [x] (R) KEP approvers have approved the KEP status as `implementable`
50+
- [x] (R) Design details are appropriately documented
51+
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
5252
- [ ] e2e Tests for all Beta API Operations (endpoints)
5353
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
5454
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
55-
- [ ] (R) Graduation criteria is in place
55+
- [x] (R) Graduation criteria is in place
5656
- [ ] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
5757
- [ ] (R) Production readiness review completed
5858
- [ ] (R) Production readiness review approved
@@ -148,7 +148,10 @@ When a TokenRequest is being issued/fulfilled, we will modify the issuing code t
148148
can be later used to trace the requests that a specific issued token has made to the apiserver via the audit log.
149149

150150
This will require changing the JWT issuing code to actually generate this UUID, as well as extending the code around the
151-
audit log to have it record this information into audit entries.
151+
audit log to have it record this information into audit entries when a token is issued (via the `authentication.k8s.io/issued-credential-id` audit annotation).
152+
153+
As this UUID will be embedded as part of a user's ExtraInfo, it'll automatically be persisted into audit events for all
154+
requests made using a token that embeds a credential identifier (as `authentication.k8s.io/credential-id`).
152155

153156
### User Stories (Optional)
154157

@@ -164,10 +167,10 @@ The node assertion can be checked to ensure the host identity matches the node a
164167
Bob is an administrator of a cluster and has noticed some strange request patterns from an unknown service account.
165168

166169
Bob would like to understand who initially issued/authorised this token to be issued. To do so, Bob looks up the JTI
167-
of the token making the suspicious requests by looking inside the audit log entries for these suspect requests.
170+
of the token making the suspicious requests by looking inside the audit log entries at user's ExtraInfo for these suspect requests.
168171

169172
This JTI is then used for a further audit log lookup - namely, looking for the TokenRequest `create` call which contains
170-
the audit annotation with key `authentication.kubernetes.io/token-identifier` and the value set to that of the suspect token.
173+
the audit annotation with key `authentication.kubernetes.io/issued-credential-id` and the value set to that of the suspect token.
171174

172175
This allows Bob to determine precisely who made the original request for this token, and (depending on the 'chain'
173176
above this token), allows Bob to recursively perform this lookup to find all involved parties that led to this token
@@ -216,7 +219,7 @@ when drafting this test plan.
216219
[testing-guidelines]: https://git.k8s.io/community/contributors/devel/sig-testing/testing.md
217220
-->
218221

219-
[ ] I/we understand the owners of the involved components may require updates to
222+
[x] I/we understand the owners of the involved components may require updates to
220223
existing tests to make this code solid enough prior to committing the changes necessary
221224
to implement this enhancement.
222225

@@ -229,17 +232,34 @@ implementing this enhancement to ensure the enhancements have also solid foundat
229232

230233
##### Unit tests
231234

235+
`pkg/registry/core/serviceaccount/storage`:
236+
* Coverage before (`release-1.28`): `k8s.io/kubernetes/pkg/registry/core/serviceaccount/storage 8.354s coverage: 10.7% of statements`
237+
* Coverage after: `k8s.io/kubernetes/pkg/registry/core/serviceaccount/storage 8.394s coverage: 8.7% of statements`
238+
* Test ensuring audit annotations are added to audit events for the `serviceaccounts/<name>/token` subresource.
239+
* Tests verifying it's possible to bind a token to a Node object.
240+
* Tests ensuring tokens bound to pod objects also embed associated node metadata.
241+
* NOTE: the majority of this file is untested with *unit tests* (instead, using integration tests). [#121515](https://github.com/kubernetes/kubernetes/issues/121515).
242+
232243
`staging/src/k8s.io/apiserver/pkg/authentication/serviceaccount`:
244+
* Coverage before (`release-1.28`): `k8s.io/apiserver/pkg/authentication/serviceaccount 0.567s coverage: 60.8% of statements`
245+
* Coverage after: `k8s.io/apiserver/pkg/authentication/serviceaccount 0.569s coverage: 70.1% of statements`
233246
* Test ensuring that service account info (JTI, node name and UID) is correctly extracted from a presented JWT.
234247
* Tests to ensure the information is NOT extracted when the feature gate is disabled.
235248

236249
`pkg/serviceaccount`:
250+
* Coverage before (`release-1.28`): `k8s.io/kubernetes/pkg/serviceaccount 0.755s coverage: 72.4% of statements`
251+
* Coverage after: `k8s.io/kubernetes/pkg/serviceaccount 0.786s coverage: 72.7% of statements`
237252
* Extending tests to ensure Node info is embedded into extended claims (name and uid)
238253
* Tests to ensure `ID`/`JTI` field is always set to a random UUID.
239254
* Tests to ensure the info embedded on a JWT is extracted from the token and into the ServiceAccountInfo when
240255
a token is validated.
241256
* Tests to ensure the information is NOT embedded or extracted when the feature gate is disabled.
242257

258+
`staging/src/k8s.io/kubectl/pkg/cmd/create`:
259+
* Coverage before (`release-1.28`): `k8s.io/kubectl/pkg/cmd/create 0.995s coverage: 55.1% of statements`
260+
* Coverage after: `k8s.io/kubectl/pkg/cmd/create 0.949s coverage: 55.2% of statements`
261+
* Add tests ensuring it's possible to request a token that is bound to a Node object (gated by environment variable during alpha)
262+
243263
<!--
244264
In principle every added code should have complete unit test coverage, so providing
245265
the exact set of tests will not bring additional value.
@@ -283,7 +303,12 @@ For Beta and GA, add links to added tests together with links to k8s-triage for
283303
https://storage.googleapis.com/k8s-triage/index.html
284304
-->
285305

286-
- <test>: <link to test coverage>
306+
`k8s.io/test/integration/sig-auth/svcacct_test.go`
307+
- [TestServiceAccountTokenCreate_bound to a service account and pod](https://github.com/kubernetes/kubernetes/blob/release-1.29/test/integration/auth/svcaccttoken_test.go#L247)
308+
- [TestServiceAccountTokenCreate_bound to service account and a pod with an assigned nodeName that does not exist](https://github.com/kubernetes/kubernetes/blob/release-1.29/test/integration/auth/svcaccttoken_test.go#L415)
309+
- [TestServiceAccountTokenCreate_bound to service account and a pod with an assigned nodeName](https://github.com/kubernetes/kubernetes/blob/release-1.29/test/integration/auth/svcaccttoken_test.go#L416)
310+
- [TestServiceAccountTokenCreate_fails to bind to a Node if the feature gate is disabled](https://github.com/kubernetes/kubernetes/blob/release-1.29/test/integration/auth/svcaccttoken_test.go#L418)
311+
- [TestServiceAccountTokenCreate_bound to service account and node](https://github.com/kubernetes/kubernetes/blob/release-1.29/test/integration/auth/svcaccttoken_test.go#L448)
287312

288313
##### e2e tests
289314

@@ -304,7 +329,11 @@ https://storage.googleapis.com/k8s-triage/index.html
304329
#### Beta
305330

306331
- Decide what the default of the new flag should be
332+
- Decision: this flag was not added during alpha, and MAY be added post-beta, but will definitely default to **off**.
333+
- This does not need to block promotion of ServiceAccountTokenPodNodeInfo feature as a result.
307334
- Decide if using an audit annotation is the correct approach
335+
- Decision: audit annotation is the correct approach as this is only for `serviceaccounts/<name>/token` requests, not all
336+
- Renaming audit annotation to `authentication.kubernetes.io/issued-credential-id` to disambiguate from `authentication.kubernetes.io/credential-id` in user's ExtraInfo
308337
- Docs around the SA JWT schema (this does not exist today)
309338

310339
#### GA
@@ -360,9 +389,17 @@ you need any help or guidance.
360389

361390
* `ServiceAccountTokenJTI` feature flag will toggle including JTI information in tokens, as well as recording JTIs in the audit log / the SA user info.
362391
* `ServiceAccountTokenPodNodeInfo` feature flag will toggle including node info associated with pods in tokens.
392+
* `ServiceAccountTokenNodeBindingValidation` feature flag will toggle the apiserver validating Node claims in node bound service account tokens.
363393
* `ServiceAccountTokenNodeBinding` feature flag will toggle allowing service account tokens to be bound to Node objects.
364394

365-
Both of these feature flags can be disabled without any unexpected adverse affects or coordination required.
395+
The `ServiceAccountTokenNodeBindingValidation` feature will graduate to beta in version v1.30, a release earlier than `ServiceAccountTokenNodeBinding`
396+
to ensure a safe rollback from version v1.31 to v1.30 (more info below in rollback considerations section).
397+
398+
The `ServiceAccountTokenNodeBinding` feature gate must only be enabled once the `ServiceAccountTokenNodeBindingValidation` feature has been enabled.
399+
Disabling the `ServiceAccountTokenNodeBindingValidation` feature whilst keeping `ServiceAccountTokenNodeBinding` would allow tokens that are expected to
400+
be bound to the lifetime of a particular Node to validate even if that Node no longer exists.
401+
402+
All other feature flags can be disabled without any unexpected adverse affects or coordination required.
366403

367404
###### How can this feature be enabled / disabled in a live cluster?
368405

@@ -413,7 +450,8 @@ The `ServiceAccountTokenNodeBindingValidation` feature gate should be enabled an
413450
any server.
414451

415452
The `ServiceAccountTokenNodeBindingValidation` will be defaulted to on one release **before** `ServiceAccountTokenNodeBinding`
416-
to account for this.
453+
to account for this. Concretely, `ServiceAccountTokenNodeBindingValidation` will be enabled by default in v1.30 and
454+
`ServiceAccountTokenNodeBinding` will be enabled by default in v1.31.
417455

418456
This should not have any issues/affect during upgrades.
419457
Rollback is done by removing/disabling the feature gate(s).
@@ -431,6 +469,10 @@ To help avoid this, the feature will be graduated in two phases:
431469

432470
This allows for a safe rollback in which the same security expectations are enforced once a token has been issued.
433471

472+
If a user explicitly *disables* `ServiceAccountTokenNodeBindingValidation` but keeps `ServiceAccountTokenNodeBinding` enabled,
473+
the node claims in the issued tokens will not be properly validated. This configuration will be explicitly denied by the
474+
kube-apiserver and will cause it to exit on startup.
475+
434476
###### What specific metrics should inform a rollback?
435477

436478
* `authentication_attempts`
@@ -447,6 +489,52 @@ New metrics that can be used to identify if the feature is in use:
447489

448490
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
449491

492+
**For `ServiceAccountTokenJTI` feature (alpha v1.29, beta v1.30):**
493+
494+
*Without* the feature gate enabled, issued service account tokens *will not* have their `jti` field set to a random UUID,
495+
and the audit log will not persist the issued credential identifier when issuing a token.
496+
497+
*With* the feature gate enabled, issued service accounts will set the `jti` field to a random UUID.
498+
Additionally, the audit event recorded when issuing a new token will have a new annotation added (`authentication.k8s.io/issued-credential-id`).
499+
As a service account's JTI field is used to infer the credential identifier, which forms part of a users `ExtraInfo`,
500+
audit events generated using this newly issued token will also include this JTI (persisted as `authentication.k8s.io/credential-id`).
501+
502+
If the feature is *disabled* and a token is presented that includes a credential identifier, **it will still be persisted into the audit log**
503+
as part of the UserInfo in the audit event.
504+
505+
As none of these fields are actually used for validating/verifying a token is valid, enabling & disabling the feature
506+
does not cause any adverse side effects.
507+
508+
**For `ServiceAccountTokenNodeBinding` (alpha v1.29, beta v1.31) and `ServiceAccountTokenNodeBindingValidation` (alpha v1.29, beta v1.30) feature:**
509+
510+
*Without* the feature gate enabled, service account tokens that have been bound to Node objects will not have their
511+
node reference claims validated (to ensure the referenced node exists).
512+
513+
*With* the feature gate enabled, if a token has a `node` claim contained within it, it'll be validated to ensure the
514+
corresponding Node object actually exists.
515+
516+
Disabling this feature will therefore *relax* the security posture of the cluster in an unexpected way, as tokens that
517+
may have been previously invalid (because their corresponding Node does not exist) may become valid again.
518+
519+
Node bound tokens may only be issued if the `ServiceAccountTokenNodeBinding` feature is enabled, and it is not possible
520+
to enable `ServiceAccountTokenNodeBinding` without `ServiceAccountTokenNodeBindingValidation` being enabled too.
521+
522+
This is further mitigated by graduating the `ServiceAccountTokenNodeBindingValidation` feature one release **earlier**
523+
than `ServiceAccountTokenNodeBinding`.
524+
525+
Tokens that are bound to objects other than Nodes are unaffected.
526+
527+
**For `ServiceAccountTokenPodNodeInfo` feature (alpha v1.29, beta v1.30):**
528+
529+
*Without* the feature gate enabled, tokens that are bound to Pod objects will not include information about the Node
530+
that the pod is scheduled/assigned to.
531+
532+
*With* the feature enabled, newly minted tokens that are bound to Pod objects will include metadata about the Node, namely
533+
the Node's name and UID.
534+
535+
These fields are **not validated** and therefore disabling the feature after enabling it will not cause any adverse side-effects.
536+
537+
``
450538
<!--
451539
Describe manual testing that was done and the outcomes.
452540
Longer term, we may want to require automated upgrade/rollback tests, but we
@@ -592,8 +680,20 @@ For each of them, fill in the following information by copying the below templat
592680

593681
###### What steps should be taken if SLOs are not being met to determine the problem?
594682

683+
After observing an issue (e.g. uptick in denied authentication requests or a significant shift in any metrics added for this KEP),
684+
kube-apiserver logs from the authenticator may be used to debug.
685+
686+
Additionally, manually attempting to exercise the affected codepaths would surface information that'd aid debugging.
687+
For example, attempting to issue a node bound token, or attempting to authenticate to the apiserver using a node bound token.
688+
595689
## Implementation History
596690

691+
* KEP marked implementable and merged for the v1.29 release
692+
* KEP implemented in an alpha state for v1.29
693+
* Renamed audit annotation used for the `serviceaccounts/<name>/token` endpoint to be clearer: https://github.com/kubernetes/kubernetes/pull/123098
694+
* Added restrictions to disallow enabling `ServiceAccountTokenNodeBinding` without `ServiceAccountTokenNodeBindingValidation`: https://github.com/kubernetes/kubernetes/pull/123135
695+
* `ServiceAccountTokenJTI`, `ServiceAccountTokenNodeBindingValidation` and `ServiceAccountTokenPodNodeInfo` promoted to beta for v1.30 release
696+
597697
<!--
598698
Major milestones in the lifecycle of a KEP should be tracked in this section.
599699
Major milestones might include:

keps/sig-auth/4193-bound-service-account-token-improvements/kep.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -23,7 +23,7 @@ stage: alpha
2323
# The most recent milestone for which work toward delivery of this KEP has been
2424
# done. This can be the current (upcoming) milestone, if it is being actively
2525
# worked on.
26-
latest-milestone: "v1.29"
26+
latest-milestone: "v1.30"
2727

2828
# The milestone at which this feature was, or is targeted to be, at each stage.
2929
milestone:

0 commit comments

Comments
 (0)