Commit 2ed09de

Merge pull request #4472 from Jefftree/agg-discovery-ga
KEP-3352: Aggregated Discovery to GA
2 parents a21b94b + b70f3cf commit 2ed09de

3 files changed: 78 additions, 18 deletions

keps/prod-readiness/sig-api-machinery/3352.yaml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,3 +3,5 @@ alpha:
33
approver: "@deads2k"
44
beta:
55
approver: "@deads2k"
6+
stable:
7+
approver: "@jpbetz"

keps/sig-api-machinery/3352-aggregated-discovery/README.md

Lines changed: 72 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -126,26 +126,26 @@ released. -->
126126
Items marked with (R) are required *prior to targeting to a milestone
127127
/ release*.
128128

129-
- [ ] (R) Enhancement issue in release milestone, which links to KEP
129+
- [x] (R) Enhancement issue in release milestone, which links to KEP
130130
dir in [kubernetes/enhancements] (not the initial KEP PR)
131-
- [ ] (R) KEP approvers have approved the KEP status as
131+
- [x] (R) KEP approvers have approved the KEP status as
132132
`implementable`
133-
- [ ] (R) Design details are appropriately documented
134-
- [ ] (R) Test plan is in place, giving consideration to SIG
133+
- [x] (R) Design details are appropriately documented
134+
- [x] (R) Test plan is in place, giving consideration to SIG
135135
Architecture and SIG Testing input (including test refactors)
136136
- [ ] e2e Tests for all Beta API Operations (endpoints)
137137
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance
138138
Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
139139
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake
140140
free
141-
- [ ] (R) Graduation criteria is in place
141+
- [x] (R) Graduation criteria is in place
142142
- [ ] (R) [all GA
143143
Endpoints](https://github.com/kubernetes/community/pull/1806)
144144
must be hit by [Conformance
145145
Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
146-
- [ ] (R) Production readiness review completed
147-
- [ ] (R) Production readiness review approved
148-
- [ ] "Implementation History" section is up-to-date for milestone
146+
- [x] (R) Production readiness review completed
147+
- [x] (R) Production readiness review approved
148+
- [x] "Implementation History" section is up-to-date for milestone
149149
- [ ] User-facing documentation has been created in
150150
[kubernetes/website], for publication to [kubernetes.io]
151151
- [ ] Supporting documentation—e.g., additional design documents,
@@ -180,7 +180,6 @@ length, to make it easier for reviewers to cite specific portions, and
180180
to minimize diff churn on updates.
181181
-->
182182

183-
184183
The operations that a Kubernetes API server supports are reported
185184
through a collection of small documents partitioned by group-version.
186185
All clients of Kubernetes APIs must send a request to every
@@ -341,6 +340,9 @@ will request for the aggregated discovery v2 type, aggregated
341340
discovery v2beta1 type, and unaggregated v1 type in that order. The
342341
server will return the first option that is supported.
343342

343+
Refer to the Version Skew Strategy section for more information on how backwards compatibility
344+
is maintained by both the client and server when the types are promoted from v2beta1 to v2.
345+
344346
### API
345347

346348
The contents of this endpoint will be an `APIGroupDiscoveryList`,
@@ -354,7 +356,7 @@ current API will be representable in the new API.
354356
The endpoint will also publish an ETag calculated based on a hash of
355357
the data for clients.
356358

357-
These types will live in the `apidiscovery/v2beta1` group version.
359+
These types will live in the `apidiscovery/v2` group version.
358360

359361
This is what the new API will look like.
360362

@@ -541,7 +543,10 @@ This can inform certain test coverage improvements that we want to do
541543
before extending the production code to implement this enhancement.
542544
-->
543545

544-
This will be implemented in a new package in kube-aggregator.
546+
- k8s.io/apiserver/pkg/endpoints/discovery/aggregated: 77.4
547+
- Note that the `fake.go` file has no unit test coverage as it is a utility designed to be used by integration tests. The rest of the files in the package have 90+ coverage.
548+
- k8s.io/kube-aggregator/pkg/apiserver/handler_discovery.go: 82.2
549+
- k8s.io/client-go/discovery/aggregated_discovery.go: 96.8
545550

546551
##### Integration tests
547552

@@ -553,8 +558,8 @@ For Beta and GA, add links to added tests together with links to
553558
k8s-triage for those tests:
554559
https://storage.googleapis.com/k8s-triage/index.html -->
555560

556-
For alpha, integration tests will be added to exercise the new
557-
aggregated discovery code path.
561+
Integration tests
562+
- [test/integration/apiserver/discovery/discovery_test.go](https://testgrid.k8s.io/sig-release-master-blocking#integration-master&width=5&include-filter-by-regex=discovery)
558563

559564
##### e2e tests
560565

@@ -569,8 +574,8 @@ https://storage.googleapis.com/k8s-triage/index.html
569574
We expect no non-infra related flakes in the last month as a GA
570575
graduation criteria. -->
571576

572-
For alpha, tests will be added to exercise the new aggregated
573-
discovery code path for kubectl, both on the server and client side.
577+
e2e tests
578+
- [test/e2e/apimachinery/aggregated_discovery.go](https://testgrid.k8s.io/sig-release-master-blocking#gce-cos-master-default&width=5&include-filter-by-regex=discovery)
574579

575580
### Graduation Criteria
576581

@@ -620,7 +625,11 @@ main focus will be on kubectl and golang clients.
620625

621626
#### GA
622627

623-
- TBD
628+
- Existing bugs are fixed:
629+
- AggregatedDiscovery controller does not purge old APIServices from cache ([Issue](https://github.com/kubernetes/kubernetes/issues/115301))
630+
- Aggregated Discovery doesn't show aggregated apiservices as Stale before initial health check ([Issue](https://github.com/kubernetes/kubernetes/issues/115303))
631+
- New API type `apidiscovery.k8s.io/v2` is introduced
632+
- e2e and conformance tests
624633

625634
**Note:** Generally we also wait at least two releases between beta
626635
and GA/stable, because there's no opportunity for user feedback, or
@@ -633,6 +642,7 @@ include [conformance tests].**
633642

634643
#### Deprecation
635644

645+
Once Aggregated Discovery v2 types are GA, v2beta1 types will be deprecated and removed after 3 releases.
636646

637647
### Upgrade / Downgrade Strategy
638648

@@ -641,6 +651,31 @@ feature and upgrade/downgrade is not a problem.
641651

642652
### Version Skew Strategy
643653

654+
When moving from beta to GA, we will introduce a new API group version `apidiscovery.k8s.io/v2`.
655+
656+
All clients v1.26 to v1.29 will only request for the beta API group version `apidiscovery.k8s.io/v2beta1`.
657+
658+
To accommodate skew between the client and server (older client and newer server), the server will serve both v2 and v2beta1 versions based on the client request headers. The API server will continue to support v2beta1 until its removal in Kubernetes v1.33.
659+
660+
To accommodate skew between an older server and newer client, starting in v1.30,
661+
client-go will request for both v2 and v2beta1 by sending a list of group versions
662+
requested (in order from v2, v2beta1, unaggregated) and the server will return the
663+
first group version that matches. Concretely, this is done using `Accept` headers with a single request.
664+
665+
```
666+
Accept: application/json;as=APIGroupDiscoveryList;v=v2;g=apidiscovery.k8s.io,application/json;as=APIGroupDiscoveryList;v=v2beta1;g=apidiscovery.k8s.io,application/json
667+
```
668+
669+
In the case of older servers, the server will only
670+
be able to match v2beta1. The client will support both v2 and v2beta1. This allows a
671+
newer client to communicate with an older server that supports only the beta version.
672+
Other clients should follow the same convention to support version skew, though a client
673+
that is only capable of processing v2 is sufficient if it only communicates with v1.30+ servers.
674+
Otherwise, the client will need to be ready to tolerate a 406 Not Acceptable response and handle
675+
the error appropriately.
676+
677+
If there is no skew and both server and client are v1.30+, clients will still request for v2 and v2beta1, and the server will match the first group version and return v2.
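
The negotiation described in this section can be sketched in Go as follows. This is a minimal illustration, not the client-go or kube-apiserver implementation: `acceptFor`, `buildAccept`, and `negotiate` are hypothetical helper names, and the `supported` map stands in for whatever set of aggregated discovery versions a given server release can serve. It only shows how an ordered `Accept` list lets a v1.30+ client fall back to v2beta1 or unaggregated discovery, and how a v2-only client ends up in the 406 case against an older server.

```go
package main

import (
	"fmt"
	"strings"
)

// acceptFor builds one Accept clause for an aggregated discovery version.
func acceptFor(version string) string {
	return "application/json;as=APIGroupDiscoveryList;v=" + version + ";g=apidiscovery.k8s.io"
}

// buildAccept lists preferences in order: v2, v2beta1, then unaggregated JSON.
func buildAccept() string {
	return strings.Join([]string{acceptFor("v2"), acceptFor("v2beta1"), "application/json"}, ",")
}

// negotiate (hypothetical) walks the Accept clauses in order and returns the
// first one the server can satisfy. A bare application/json clause selects
// unaggregated discovery. ok is false when nothing matches, which corresponds
// to a 406 Not Acceptable response.
func negotiate(accept string, supported map[string]bool) (version string, ok bool) {
	for _, clause := range strings.Split(accept, ",") {
		parts := strings.Split(strings.TrimSpace(clause), ";")
		if parts[0] != "application/json" {
			continue
		}
		params := map[string]string{}
		for _, p := range parts[1:] {
			if k, v, found := strings.Cut(strings.TrimSpace(p), "="); found {
				params[k] = v
			}
		}
		if len(params) == 0 {
			return "unaggregated", true
		}
		if params["as"] == "APIGroupDiscoveryList" &&
			params["g"] == "apidiscovery.k8s.io" && supported[params["v"]] {
			return params["v"], true
		}
	}
	return "", false
}

func main() {
	newServer := map[string]bool{"v2": true, "v2beta1": true}
	oldServer := map[string]bool{"v2beta1": true}

	fmt.Println(negotiate(buildAccept(), newServer))
	fmt.Println(negotiate(buildAccept(), oldServer))
	// A client that only understands v2 gets no match from an older server.
	fmt.Println(negotiate(acceptFor("v2"), oldServer))
}
```

Because the preference order lives entirely in the header, the fallback costs no extra round trip; only a client that omits the lower-priority clauses has to handle the unmatched case itself.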
678+
644679
## Production Readiness Review Questionnaire
645680

646681
<!--
@@ -839,6 +874,10 @@ high level (needs more precise definitions) those may be things like:
839874
These goals will help you determine what you need to measure (SLIs) in
840875
the next question. -->
841876

877+
Aggregated Discovery falls under a non-streaming read-only API call which is defined under the Kubernetes API call latency
878+
[SLI/SLO](https://github.com/kubernetes/community/blob/master/sig-scalability/slos/api_call_latency.md).
879+
The numbers in the SLO are a good bound for Aggregated Discovery (p99 < 1s).
880+
842881
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
843882

844883
<!-- Pick one more of these and delete the rest. -->
@@ -970,6 +1009,10 @@ large cases, again with respect to the [supported limits].
9701009

9711010
No.
9721011

1012+
###### Can enabling / using this feature result in resource exhaustion of some node resources (PIDs, sockets, inodes, etc.)?
1013+
1014+
No.
1015+
9731016
### Troubleshooting
9741017

9751018
<!-- This section must be completed when targeting beta to a release.
@@ -1002,6 +1045,15 @@ the below template:
10021045
- Testing: Are there any tests for failure mode? If not, describe
10031046
why. -->
10041047

1048+
- Aggregated API Server is unavailable:
1049+
- Detection: An Aggregated API Server that is unavailable will return Stale as the DiscoveryFreshness.
1050+
A prolonged period of staleness could indicate that the aggregated apiserver is unavailable.
1051+
- Mitigations: If the aggregated apiserver is not reachable, it will not be part of the resources available.
1052+
Restarting the pod or checking for any misconfigurations could be a valid next step.
1053+
- Diagnostics: Missing the (v3) log line: `DiscoveryManager: successfully downloaded discovery/legacy discovery for <apiservice>`
1054+
- Testing: We test for unreachable aggregated apiservers returning Stale, but an aggregated apiserver could
1055+
be unavailable for a wide variety of reasons that would require further diagnosis.
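
The detection step above could be automated along these lines. This is a rough sketch, assuming the aggregated discovery document exposes a per-version `freshness` field with the values `Current` and `Stale`; the structs below mirror only the fields needed here rather than importing the real API types, `staleGroupVersions` is a hypothetical helper, and the sample document is made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Minimal local mirrors of the discovery document (assumed field names).
type versionDiscovery struct {
	Version   string `json:"version"`
	Freshness string `json:"freshness"`
}

type groupDiscovery struct {
	Metadata struct {
		Name string `json:"name"`
	} `json:"metadata"`
	Versions []versionDiscovery `json:"versions"`
}

type groupDiscoveryList struct {
	Items []groupDiscovery `json:"items"`
}

// staleGroupVersions (hypothetical) returns "group/version" for every entry
// whose freshness is reported as Stale.
func staleGroupVersions(doc []byte) ([]string, error) {
	var list groupDiscoveryList
	if err := json.Unmarshal(doc, &list); err != nil {
		return nil, err
	}
	var stale []string
	for _, g := range list.Items {
		for _, v := range g.Versions {
			if v.Freshness == "Stale" {
				stale = append(stale, g.Metadata.Name+"/"+v.Version)
			}
		}
	}
	return stale, nil
}

func main() {
	// Made-up sample: one stale aggregated API, one current built-in group.
	doc := []byte(`{"items":[
		{"metadata":{"name":"metrics.k8s.io"},"versions":[{"version":"v1beta1","freshness":"Stale"}]},
		{"metadata":{"name":"apps"},"versions":[{"version":"v1","freshness":"Current"}]}]}`)
	stale, err := staleGroupVersions(doc)
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(stale)
}
```

A single stale reading is not conclusive on its own; as noted above, it is the prolonged period of staleness that points to an unavailable aggregated apiserver, so a check like this would need to be repeated over time.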
1056+
10051057
###### What steps should be taken if SLOs are not being met to determine the problem?
10061058

10071059
The feature can be rolled back by setting the AggregatedDiscoveryEndpoint feature flag to false.
@@ -1021,6 +1073,10 @@ this section. Major milestones might include:
10211073
availability
10221074
- when the KEP was retired or superseded -->
10231075

1076+
- v1.26: Aggregated Discovery KEP is merged and moves to alpha
1077+
- v1.27: Aggregated Discovery moves to beta
1078+
- v1.30: Aggregated Discovery moves to stable
1079+
10241080
## Drawbacks
10251081

10261082
With aggregation, the size of the aggregated discovery document could

keps/sig-api-machinery/3352-aggregated-discovery/kep.yaml

Lines changed: 4 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -16,19 +16,21 @@ reviewers:
1616
approvers:
1717
- "@deads2k"
1818
- "@lavalamp"
19+
- "@jpbetz"
1920

2021
# The target maturity stage in the current dev cycle for this KEP.
21-
stage: beta
22+
stage: stable
2223

2324
# The most recent milestone for which work toward delivery of this KEP has been
2425
# done. This can be the current (upcoming) milestone, if it is being actively
2526
# worked on.
26-
latest-milestone: "v1.27"
27+
latest-milestone: "v1.30"
2728

2829
# The milestone at which this feature was, or is targeted to be, at each stage.
2930
milestone:
3031
alpha: "v1.26"
3132
beta: "v1.27"
33+
stable: "v1.30"
3234

3335
# The following PRR answers are required at alpha release
3436
# List the feature gate name and the components for which it must be enabled
