Commit 0f23d65

Merge pull request kubernetes#2363 from mtaufen/said-ga
Add PRR survey to OIDC Discovery KEP
2 parents dc706f6 + 13d4ab8 commit 0f23d65

3 files changed: +232 -1 lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+kep-number: 1393
+stable:
+  approver: "@johnbelamaric"

keps/sig-auth/1393-oidc-discovery/README.md

Lines changed: 205 additions & 1 deletion
@@ -13,6 +13,13 @@
 - [Design Details](#design-details)
   - [Test Plan](#test-plan)
   - [Graduation Criteria](#graduation-criteria)
+- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
+  - [Feature Enablement and Rollback](#feature-enablement-and-rollback)
+  - [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
+  - [Monitoring Requirements](#monitoring-requirements)
+  - [Dependencies](#dependencies)
+  - [Scalability](#scalability)
+  - [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 <!-- /toc -->

@@ -296,11 +303,208 @@ of versioning. However, we can still treat graduation in terms of
 [maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
 [deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/

+## Production Readiness Review Questionnaire
+
+<!--
+
+Production readiness reviews are intended to ensure that features merging into
+Kubernetes are observable, scalable and supportable; can be safely operated in
+production environments, and can be disabled or rolled back in the event they
+cause increased failures in production. See more in the PRR KEP at
+https://git.k8s.io/enhancements/keps/sig-architecture/1194-prod-readiness.
+
+The production readiness review questionnaire must be completed and approved
+for the KEP to move to `implementable` status and be included in the release.
+
+In some cases, the questions below should also have answers in `kep.yaml`. This
+is to enable automation to verify the presence of the review, and to reduce review
+burden and latency.
+
+The KEP must have an approver from the
+[`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
+team. Please reach out on the
+[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
+you need any help or guidance.
+
+-->
+
+### Feature Enablement and Rollback
+
+_This section must be completed when targeting alpha to a release._
+
+* **How can this feature be enabled / disabled in a live cluster?**
+
+  It cannot be toggled post-GA; this feature is always enabled.
+  Pre-GA, it was possible to disable it with the feature gate.
+
+  - [x] Feature gate (also fill in values in `kep.yaml`)
+    - Feature gate name: ServiceAccountIssuerDiscovery
+    - Components depending on the feature gate: kube-apiserver
+    - Note: This feature is targeted to GA in 1.21, at which point feature gates
+      lock to enabled. This means it will not be possible to disable after the
+      current dev cycle.
+
+* **Does enabling the feature change any default behavior?**
+  No. It adds an entirely new non-resource URL that can be used to discover
+  metadata related to the cluster's service account issuer.
+
+* **Can the feature be disabled once it has been enabled (i.e. can we roll back
+  the enablement)?**
+
+  No. This feature is always enabled post-GA.
+  The only way to roll back is to return to an older K8s version.
+
+  **Describe the consequences on existing workloads (e.g., if this is a runtime
+  feature, can it break the existing applications?).**
+
+  Existing applications would have to take a dependency on this feature to
+  be broken by it. Thus, enabling the feature for the first time is not a risk
+  to existing applications, but disabling it later could be.
+
+* **What happens if we reenable the feature if it was previously rolled back?**
+
+  The feature should continue to work just fine.
+
+* **Are there any tests for feature enablement/disablement?**
+
+  No.
+
+### Rollout, Upgrade and Rollback Planning
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can a rollout fail? Can it impact already running workloads?**
+  Enablement shouldn't affect any existing workloads. If we broke the feature in
+  the future, we would _possibly_ see failures of workloads to authenticate to
+  Relying Parties _outside_ the cluster, but in-cluster workload to
+  kube-apiserver authentication would still work, since it doesn't rely
+  on this path.
+
+* **What specific metrics should inform a rollback?**
+  N/A
+
+* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
+  The standard upgrade tests would have covered this between alpha and beta,
+  when the feature was enabled by default.
+
+* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
+  fields of API types, flags, etc.?**
+
+  No.
+
+### Monitoring Requirements
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How can an operator determine if the feature is in use by workloads?**
+  Ideally, there would just be usage metrics for all API server endpoints.
+  Since we don't currently have that, the next best option would be to examine
+  API server logs.
+
+* **What are the SLIs (Service Level Indicators) an operator can use to determine
+  the health of the service?**
+  - [x] Other (treat as last resort)
+    - Details: API server logs, or ability of workloads to authenticate to
+      Relying Parties.
+
+* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
+  We expect the endpoints to maintain high reliability, with reliability
+  matching that of kube-apiserver.
+
+* **Are there any missing metrics that would be useful to have to improve observability
+  of this feature?**
+  It would be nice to have usage metrics for this endpoint. We haven't added
+  them so far because non-resource URLs don't have them by default. This could
+  be worth solving in general but a general solution is out of scope for this
+  KEP.
+
+### Dependencies
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **Does this feature depend on any specific services running in the cluster?**
+  It only depends on kube-apiserver being up. If, for example, the issuer is
+  configured as https://kubernetes.default.svc, then the corresponding Service
+  needs to exist in the cluster as well.
+
+
+### Scalability
+
+_For alpha, this section is encouraged: reviewers should consider these questions
+and attempt to answer them._
+
+_For beta, this section is required: reviewers must answer these questions._
+
+_For GA, this section is required: approvers should be able to confirm the
+previous answers based on experience in the field._
+
+* **Will enabling / using this feature result in any new API calls?**
+  Yes.
+  - GET `${API_SERVER}/.well-known/openid-configuration`
+  - GET `${API_SERVER}/openid/v1/jwks`
+  - Note each endpoint serves a response that is pre-rendered when
+    kube-apiserver starts up.
+  - Originating components: Could be arbitrary. For example:
+    - A cluster installer reads these once when configuring identity federation
+      with a cloud provider (Low throughput).
+    - In-cluster components use this to perform an OIDC discovery flow to
+      validate tokens (Medium to High throughput). Note TokenReview is the
+      preferred approach in this case.
+    - A cluster admin adds additional RBAC to make these endpoints public, and
+      points Relying Parties directly at these endpoints (High throughput,
+      though RPs _should_ do some caching instead of making calls on every
+      token validation).
+
+* **Will enabling / using this feature result in introducing new API types?**
+  No new types, just two new non-resource URLs that implement this KEP, as
+  described above. There is no new state stored in etcd.
+
+* **Will enabling / using this feature result in any new calls to the cloud
+  provider?**
+  No.
+
+* **Will enabling / using this feature result in increasing size or count of
+  the existing API objects?**
+  No.
+
+* **Will enabling / using this feature result in increasing time taken by any
+  operations covered by [existing SLIs/SLOs]?**
+  No.
+
+* **Will enabling / using this feature result in non-negligible increase of
+  resource usage (CPU, RAM, disk, IO, ...) in any components?**
+  This isn't expected, given it's just copying a pre-rendered string into
+  the response.
+
+### Troubleshooting
+
+The Troubleshooting section currently serves the `Playbook` role. We may consider
+splitting it into a dedicated `Playbook` document (potentially with some monitoring
+details). For now, we leave it here.
+
+_This section must be completed when targeting beta graduation to a release._
+
+* **How does this feature react if the API server and/or etcd is unavailable?**
+  If kube-apiserver is unavailable, this feature is also unavailable. This
+  feature is not affected by etcd availability.
+
+* **What are other known failure modes?**
+  N/A
+
+* **What steps should be taken if SLOs are not being met to determine the problem?**
+  - Examine the responses from the above endpoints.
+  - Examine kube-apiserver logs.
+  - Examine kube-apiserver configuration related to this KEP.
+
+[supported limits]: https://git.k8s.io/community//sig-scalability/configs-and-limits/thresholds.md
+[existing SLIs/SLOs]: https://git.k8s.io/community/sig-scalability/slos/slos.md#kubernetes-slisslos
+
 ## Implementation History

 - 2018-06-26: Proposed in https://github.com/kubernetes/community/pull/2314
 - 2018, 2019: Various comments on pull request
 - 2019-07-30: Moved to a KEP (with no edits from the original proposal)
 - 2019-08-05: Updated KEP with more details.
 - 2019-10-18: Updated KEP with more RBAC details.
-- 2020-1-25: Updated KEP and marked as implementable.
+- 2020-01-25: Updated KEP and marked as implementable.
+- 2021-01-28: Added PRR questionnaire.
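
The Scalability answers in the README diff name two non-resource URLs and note that Relying Parties should cache responses rather than call the endpoints on every token validation. The following is a minimal sketch of that client-side pattern, not the KEP's implementation. The helper names (`discovery_urls`, `jwks_uri_from_discovery`, `CachedFetcher`), the 300-second TTL, and the injected `fetch` callable are all assumptions made for illustration; only the two URL paths and the `issuer` / `jwks_uri` fields of an OIDC discovery document come from the source and the OIDC Discovery format.

```python
import json
import time
from typing import Callable, Dict, Tuple

# Paths taken from the README diff; everything else here is illustrative.
DISCOVERY_PATH = "/.well-known/openid-configuration"
JWKS_PATH = "/openid/v1/jwks"


def discovery_urls(api_server: str) -> Dict[str, str]:
    """Build the two endpoint URLs from an API server base URL."""
    base = api_server.rstrip("/")
    return {"discovery": base + DISCOVERY_PATH, "jwks": base + JWKS_PATH}


def jwks_uri_from_discovery(body: str, expected_issuer: str) -> str:
    """Parse a discovery document and return its jwks_uri.

    Raises ValueError if the issuer differs from the one the caller
    expects, or if the document carries no jwks_uri.
    """
    doc = json.loads(body)
    if doc.get("issuer") != expected_issuer:
        raise ValueError(f"unexpected issuer: {doc.get('issuer')!r}")
    uri = doc.get("jwks_uri")
    if not uri:
        raise ValueError("discovery document has no jwks_uri")
    return uri


class CachedFetcher:
    """Cache fetched bodies per URL for ttl_seconds (hypothetical helper).

    `fetch` is any callable that returns the response body for a URL;
    it is injected so the sketch stays independent of an HTTP client.
    """

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float = 300.0,  # assumed TTL, not from the KEP
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        now = self._clock()
        hit = self._cache.get(url)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh: no call to the API server
        body = self._fetch(url)
        self._cache[url] = (now, body)
        return body
```

A Relying Party would wrap its real HTTP call in `CachedFetcher`, fetch the discovery URL, check the issuer, and then fetch the returned `jwks_uri` through the same cache. Since the README notes that each endpoint serves a response pre-rendered at kube-apiserver startup, a fixed TTL seems a reasonable starting point, though the right value depends on how often the cluster's signing keys change.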

keps/sig-auth/1393-oidc-discovery/kep.yaml

Lines changed: 24 additions & 0 deletions
@@ -17,9 +17,33 @@ approvers:
   - "@enj"
   - "@micahhausler"
   - "@ericchiang"
+prr-approvers:
+  - "@johnbelamaric"
 editor: TBD
 creation-date: 2018-06-26
 last-updated: 2020-01-25
 status: implementable
 replaces:
   - "https://github.com/kubernetes/community/pull/2314/"
+
+# The target maturity stage in the current dev cycle for this KEP.
+stage: stable
+
+# The most recent milestone for which work toward delivery of this KEP has been
+# done. This can be the current (upcoming) milestone, if it is being actively
+# worked on.
+latest-milestone: "v1.21"
+
+# The milestone at which this feature was, or is targeted to be, at each stage.
+milestone:
+  alpha: "v1.18"
+  beta: "v1.20"
+  stable: "v1.21"
+
+# The following PRR answers are required at alpha release
+# List the feature gate name and the components for which it must be enabled
+feature-gates:
+  - name: ServiceAccountIssuerDiscovery
+    components:
+      - kube-apiserver
+disable-supported: false
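
The README comment says some answers are mirrored in `kep.yaml` so that automation can verify the review. As a rough illustration of what such a check might look like, here is a sketch that inspects the fields added in this diff. It is not the enhancements repository's actual validation tooling: the function name `check_kep_metadata` and the specific rules (for example, that `latest-milestone` should equal the milestone of the current `stage`) are assumptions chosen because they hold for the values above. The metadata is passed as an already-parsed dict so the sketch needs no YAML library.

```python
from typing import Any, Dict, List

STAGES = ("alpha", "beta", "stable")


def check_kep_metadata(meta: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in parsed kep.yaml metadata.

    All rules here are hypothetical consistency checks for illustration.
    """
    problems: List[str] = []

    stage = meta.get("stage")
    milestones = meta.get("milestone") or {}
    if stage not in STAGES:
        problems.append(f"unknown stage: {stage!r}")
    elif stage not in milestones:
        problems.append(f"no milestone recorded for stage {stage!r}")
    elif meta.get("latest-milestone") != milestones[stage]:
        # Assumed rule: the current stage targets the latest milestone.
        problems.append("latest-milestone does not match milestone of current stage")

    for gate in meta.get("feature-gates") or []:
        if not gate.get("components"):
            problems.append(f"feature gate {gate.get('name')!r} lists no components")

    if not meta.get("prr-approvers"):
        problems.append("no prr-approvers listed")

    return problems
```

Called with the values from the diff above (stage `stable`, `latest-milestone` `v1.21`, one feature gate with `kube-apiserver` as its component, and one PRR approver), the function returns an empty list.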
