Commit 359cd6b

OpenAPI Beta
1 parent 6ec5481 commit 359cd6b

3 files changed

+142
-36
lines changed

Lines changed: 2 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -1,3 +1,5 @@
11
kep-number: 2896
22
alpha:
33
approver: "@deads2k"
4+
beta:
5+
approver: "@deads2k"

keps/sig-api-machinery/2896-openapi-v3/README.md

Lines changed: 136 additions & 33 deletions
@@ -72,6 +72,7 @@ Architecture for cross-cutting KEPs). -->
7272
- [Aggregator](#aggregator)
7373
- [OpenAPI](#openapi)
7474
- [Version Skew](#version-skew)
75+
- [OpenAPI V3 Proto](#openapi-v3-proto)
7576
- [Test Plan](#test-plan)
7677
- [Graduation Criteria](#graduation-criteria)
7778
- [Alpha](#alpha)
@@ -226,31 +227,56 @@ proposal will be implemented, this is the place to discuss them. -->
226227

227228
### Paths
228229

229-
The overarching `/openapi/v3` endpoint will contain the list of paths (groups)
230+
The root `/openapi/v3` endpoint will contain the list of paths (groups)
230231
available and serve as a discovery endpoint. Clients can then choose the
231232
group(s) to fetch and send additional requests.
232233

233234
/openapi/v3
234235

235236
```json
236237
{
237-
"Paths" : [
238-
"api",
239-
"api/v1",
240-
"apis",
241-
"apis/admissionregistration.k8s.io",
242-
"apis/apiextensions.k8s.io",
243-
"apis/apps",
244-
"apis/authentication.k8s.io",
238+
"Paths" : {
239+
"api": "/openapi/v3/api?etag=tag",
240+
"api/v1": "/openapi/v3/api/v1?etag=tag",
241+
"apis": "/openapi/v3/apis?etag=tag",
242+
"apis/admissionregistration.k8s.io/v1": "/openapi/v3/apis/admissionregistration.k8s.io/v1?etag=tag",
243+
"apis/apiextensions.k8s.io/v1": "/openapi/v3/apis/apiextensions.k8s.io/v1?etag=tag",
244+
"apis/apps/v1": "/openapi/v3/apis/apps/v1?etag=tag",
245245
...
246-
]
246+
}
247247
}
248248
```
249249

250250
Based on the provided group, clients can then request `/openapi/v3/apis/apps/v1`,
251251
`/openapi/v3/apis/networking.k8s.io/v1`, and so on. These leaf node specs are
252252
self-contained OpenAPI v3 specs and include all referenced types.
253253

254+
The discovery document has the format of a map with the key being the
255+
group-version and value representing the URL of the OpenAPI for the
256+
particular group version. Note that the URL can be constructed by the
257+
client by prepending the `/openapi/v3/` prefix to the group-version
258+
name. The URL listed here provides a special etag query parameter to
259+
denote the latest etag for the OpenAPI spec for the particular
260+
group-version. The concept of using and changing the query parameter
261+
when a new version of the spec is available is a pattern used
262+
frequently in browser caching known as cache busting. All OpenAPI spec
263+
requests with the `?etag` query parameter will return a response with
264+
`Cache-Control: immutable`. This allows clients to cache the OpenAPI
265+
spec when the etag is not changed. The `max-age` will also be set to a
266+
large value that is equivalent to publishing a spec that never
267+
expires.
268+
269+
Support for caching is built into the httpcache library used in
270+
client-go, and no change is needed on the client side to support this
271+
mechanism other than passing the additional query parameter. Passing
272+
the etag as the query parameter allows clients to check the etag in
273+
the root discovery document. Clients can avoid sending an additional
274+
request to fetch an ETag from specific group-versions, and the root
275+
document itself can provide information on group-version changes and
276+
updates. If there is a race and the client passes in an outdated etag
277+
value, the server will send a 301 to redirect the client to the URL
278+
with the latest etag value.
279+
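The client-side half of the cache-busting scheme described above can be sketched in Go. This is a minimal illustration, not the client-go implementation: the `Discovery` type and the `SpecURL` and `Stale` helpers are hypothetical names, and the `Paths` key is taken from the example document in this section.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/url"
	"sort"
)

// Discovery mirrors the root document shown above (hypothetical type):
// group-version path -> URL of that group-version's spec.
type Discovery struct {
	Paths map[string]string `json:"Paths"`
}

// SpecURL builds the cache-busting URL for a group-version path such as
// "apis/apps/v1" by prepending /openapi/v3/ and appending the etag.
func SpecURL(groupVersion, etag string) string {
	u := url.URL{
		Path:     "/openapi/v3/" + groupVersion,
		RawQuery: url.Values{"etag": {etag}}.Encode(),
	}
	return u.String()
}

// Stale returns, in sorted order, the group-versions whose etag in the
// root document differs from the etag the client has cached.
func Stale(rootJSON []byte, cached map[string]string) ([]string, error) {
	var doc Discovery
	if err := json.Unmarshal(rootJSON, &doc); err != nil {
		return nil, err
	}
	var stale []string
	for gv, raw := range doc.Paths {
		u, err := url.Parse(raw)
		if err != nil {
			return nil, err
		}
		if u.Query().Get("etag") != cached[gv] {
			stale = append(stale, gv)
		}
	}
	sort.Strings(stale)
	return stale, nil
}

func main() {
	root := []byte(`{"Paths": {
		"api/v1": "/openapi/v3/api/v1?etag=aaa",
		"apis/apps/v1": "/openapi/v3/apis/apps/v1?etag=bbb"
	}}`)
	// Etags the client saw on its previous fetch (made-up values).
	cached := map[string]string{"api/v1": "aaa", "apis/apps/v1": "old"}

	stale, err := Stale(root, cached)
	if err != nil {
		panic(err)
	}
	fmt.Println("group-versions to refetch:", stale)
	fmt.Println(SpecURL("apis/apps/v1", "bbb"))
}
```

Only the root document has to be fetched to learn which group-versions changed; unchanged specs can be served from the local cache because their URLs, etag included, are identical.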
254280
### Controllers
255281

256282
#### OpenAPI Builder
@@ -271,11 +297,12 @@ the endpoint for their specific group.
271297

272298
The aggregator has a mapping of all the APIServices and refreshes the aggregated
273299
spec on an interval. APIServices already publish by group-version so their
274-
behavior is unchanged. Instead of aggregating in the aggregator, we will simply
275-
copy the spec to be published at the corresponding aggregator endpoint. For
276-
CRDs, instead of downloading the entire spec for CRDs, they will be downloaded
277-
per group-version, increasing the number of requests sent internally when a CRD
278-
with multiple groups is registered.
300+
behavior is unchanged. Because OpenAPI V3 is published by
301+
group-version, the fully aggregated spec is not needed and thus
302+
aggregation can be skipped. No spec downloading will be done by the
303+
aggregator. The aggregator in OpenAPI V3 will act as a proxy
304+
rather than an aggregator, proxying group-version requests to downstream
305+
API servers.
279306
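The proxy role described above can be sketched with the standard library's reverse proxy. This is a rough illustration under stated assumptions, not the kube-aggregator code: the `Backends` map, the `GroupVersion` helper, and the backend URL are all hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httputil"
	"net/url"
	"strings"
)

const openAPIV3Prefix = "/openapi/v3/"

// GroupVersion extracts the group-version path (for example
// "apis/apps/v1") from a request path under /openapi/v3/.
func GroupVersion(path string) (string, bool) {
	if !strings.HasPrefix(path, openAPIV3Prefix) {
		return "", false
	}
	gv := strings.TrimPrefix(path, openAPIV3Prefix)
	return gv, gv != ""
}

// Backends maps a group-version path to the base URL of the downstream
// API server that publishes its spec (hypothetical routing table).
type Backends map[string]*url.URL

// ServeHTTP forwards the request unchanged to the owning API server.
// No spec is downloaded or merged by the proxy itself.
func (b Backends) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	gv, ok := GroupVersion(r.URL.Path)
	if !ok {
		http.NotFound(w, r)
		return
	}
	target, ok := b[gv]
	if !ok {
		http.NotFound(w, r)
		return
	}
	httputil.NewSingleHostReverseProxy(target).ServeHTTP(w, r)
}

func main() {
	// Placeholder address for an aggregated API server.
	backend, err := url.Parse("https://metrics.example.internal")
	if err != nil {
		panic(err)
	}
	var handler http.Handler = Backends{"apis/metrics.k8s.io/v1beta1": backend}
	_ = handler // would be mounted on the serving mux

	gv, ok := GroupVersion("/openapi/v3/apis/metrics.k8s.io/v1beta1")
	fmt.Println(gv, ok)
}
```

Because each request is routed by group-version alone, an APIService appearing or disappearing only changes one routing entry and triggers no re-aggregation.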

280307
### OpenAPI
281308

@@ -324,6 +351,39 @@ support for v2. The drawback is that v2 is lossy and converting it to
324351
v3 will provide a lossy v3 schema. This problem will be fixed when aggregated
325352
apiservers upgrade to publishing v3.
326353

354+
### OpenAPI V3 Proto
355+
356+
Kubernetes relies on a "bug" (a relaxed constraint) in the gnostic library for OpenAPI v2 that allows a `$ref` and a `description` to coexist in the same object. See the [issue](https://github.com/kubernetes/kubernetes/issues/106387) for more details. This is disallowed by JSON Schema Draft 4, the schema version that OpenAPI v2 follows. The coexistence of `$ref` and `description` is important to Kubernetes because `kubectl explain` uses it to provide documentation for reference fields.
357+
358+
For instance, a PodSpec object could have the properties:
359+
360+
```
361+
"affinity": {
362+
"$ref": "#/definitions/io.k8s.api.core.v1.Affinity",
363+
"description": "If specified, the pod's scheduling constraints"
364+
},
365+
```
366+
367+
kubectl explain uses the description to provide documentation for struct fields that are represented as references in the OpenAPI schema. The description here describes a field/struct's role in the PodSpec object rather than the struct itself. The gnostic library for OpenAPI 3.0 currently disallows the relaxed constraint and removes the description field when the OpenAPI spec is passed through proto. We will work with the gnostic team to update the library to support the same constraint relaxation as in OpenAPI 2.0 to fix the OpenAPI 3.0 protobuf.
368+
369+
Another workaround to this problem could be to wrap reference structures with `allOf`.
370+
371+
For example:
372+
373+
```
374+
"affinity": {
375+
"allOf": [{
376+
"$ref": "#/definitions/io.k8s.api.core.v1.Affinity"
377+
}],
378+
"description": "If specified, the pod's scheduling constraints"
379+
},
380+
```
381+
382+
This solves the immediate protobuf issue but adds complexity to the OpenAPI schema.
383+
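The `allOf` workaround amounts to a mechanical rewrite of each property that carries a `$ref` next to sibling fields. A minimal Go sketch follows; `WrapRef` is a hypothetical helper that operates on a decoded JSON object and is not part of gnostic or kube-openapi.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// WrapRef moves "$ref" into a single-element "allOf" when the property
// also has sibling fields such as "description". Properties without
// "$ref", or with "$ref" only, are returned unchanged. The sketch
// assumes the property has no "allOf" of its own.
func WrapRef(prop map[string]interface{}) map[string]interface{} {
	ref, ok := prop["$ref"]
	if !ok || len(prop) == 1 {
		return prop
	}
	out := map[string]interface{}{
		"allOf": []interface{}{map[string]interface{}{"$ref": ref}},
	}
	for k, v := range prop {
		if k != "$ref" {
			out[k] = v
		}
	}
	return out
}

func main() {
	affinity := map[string]interface{}{
		"$ref":        "#/definitions/io.k8s.api.core.v1.Affinity",
		"description": "If specified, the pod's scheduling constraints",
	}
	b, err := json.MarshalIndent(WrapRef(affinity), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Every consumer of the schema would then need to look through `allOf` to find the referenced type, which is the added complexity mentioned above.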
384+
The final alternative is to upgrade to OpenAPI 3.1, whose underlying JSON Schema version supports fields alongside a `$ref`. However, OpenAPI does not follow semver: 3.1 is a major upgrade over 3.0 and introduces various backwards-incompatible changes. Furthermore, library support is currently lacking; gnostic does not fully support OpenAPI 3.1. One important backwards-incompatible change is the removal of the `nullable` field, which is replaced by changing the `type` field from a single string to an array of strings.
385+
386+
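To make the `nullable` incompatibility concrete, the Go sketch below converts a 3.0-style schema fragment into the 3.1 form. `ToTypeArray` is a hypothetical helper for illustration only, and it handles just the simple case of a single string `type`.

```go
package main

import "fmt"

// ToTypeArray rewrites an OpenAPI 3.0 fragment such as
// {"type": "string", "nullable": true} into the OpenAPI 3.1 form
// {"type": ["string", "null"]}. The "nullable" key is always dropped;
// "type" becomes an array only when nullable was true.
func ToTypeArray(schema map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(schema))
	for k, v := range schema {
		if k != "nullable" {
			out[k] = v
		}
	}
	nullable, _ := schema["nullable"].(bool)
	if t, ok := schema["type"].(string); ok && nullable {
		out["type"] = []string{t, "null"}
	}
	return out
}

func main() {
	v30 := map[string]interface{}{"type": "string", "nullable": true}
	fmt.Println(ToTypeArray(v30))
}
```

A 3.0 consumer that expects `type` to be a string would fail on the converted output, which is why the upgrade cannot be treated as a minor version bump.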
327387
### Test Plan
328388

329389
<!-- **Note:** *Not required until targeted at a release.*
@@ -357,13 +417,15 @@ generated is valid OpenAPI v3.
357417
#### Beta
358418

359419
- Native types are updated to capture capabilities introduced with v3
420+
- Incorrect OpenAPI polymorphic types (IntOrString, Quantity) are updated to use `anyOf` in OpenAPI V3
360421
- Definition names of native resources are updated to omit their package paths
361422
- Parameters are reused as components
362423
- `kubectl explain` to support using the OpenAPI v3 Schema
363424
- Aggregated API servers are queried for their v2 endpoint and converted to
364425
publish v3 if they do not directly publish v3
365426
- Heuristics are used for the OpenAPI v2 to v3 conversion to maximize
366427
correctness of published spec
428+
- Aggregation for OpenAPI v3 will serve as a proxy to downstream OpenAPI paths
367429

368430
### Upgrade / Downgrade Strategy
369431

@@ -437,7 +499,7 @@ you need any help or guidance. -->
437499
<!-- Pick one of these and delete the rest. -->
438500

439501
- [x] Feature gate (also fill in values in `kep.yaml`)
440-
- Feature gate name: OpenAPIv3Enabled
502+
- Feature gate name: OpenAPIV3
441503
- Components depending on the feature gate: kube-apiserver
442504
- [ ] Other
443505
- Describe the mechanism:
@@ -489,31 +551,37 @@ will be enabled on some API servers and not others during the rollout.
489551
Similarly, consider large clusters and how enablement/disablement will rollout
490552
across nodes. -->
491553

554+
Version skew during a rolling control plane upgrade is discussed in the Version Skew section above.
555+
492556
###### What specific metrics should inform a rollback?
493557

494558
<!-- What signals should users be paying attention to when the feature is young
495559
that might indicate a serious problem? -->
496560

561+
Non-200 responses from the `/openapi/v3` endpoint could indicate a problem. A long response time from the apiserver for OpenAPI requests could also be an indicator of a problem.
562+
497563
###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?
498564

499565
<!-- Describe manual testing that was done and the outcomes. Longer term, we may
500566
want to require automated upgrade/rollback tests, but we are missing a bunch of
501567
machinery and tooling and can't do that now. -->
502568

569+
n/a
570+
503571
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?
504572

505573
<!-- Even if applying deprecation policies, they may still surprise some users.
506574
-->
507575

576+
No.
577+
508578
### Monitoring Requirements
509579

510580
<!-- This section must be completed when targeting beta to a release. -->
511581

512582
###### How can an operator determine if the feature is in use by workloads?
513583

514-
<!-- Ideally, this should be a metric. Operations against the Kubernetes API
515-
(e.g., checking if there are objects with field X set) may be a last resort.
516-
Avoid logs or events for this purpose. -->
584+
The OpenAPI path `/openapi/v3` is populated. On the metrics side, an OpenAPI V3 specific metric is `crd_openapi_v3_aggregation_duration_seconds`, and should emit data if the feature is enabled.
517585

518586
###### How can someone using this feature know that it is working for their instance?
519587

@@ -524,13 +592,7 @@ users below with sufficient detail so that they can verify correct enablement
524592
and operation of this feature. Recall that end users cannot usually observe
525593
component logs or access metrics. -->
526594

527-
- [ ] Events
528-
- Event Reason:
529-
- [ ] API .status
530-
- Condition name:
531-
- Other field:
532-
- [ ] Other (treat as last resort)
533-
- Details:
595+
The `openapi/v3` endpoint will be populated with the list of groups if OpenAPI V3 is enabled.
534596

535597
###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?
536598

@@ -547,28 +609,45 @@ It's impossible to provide comprehensive guidance, but at the very high level
547609
These goals will help you determine what you need to measure (SLIs) in the next
548610
question. -->
549611

612+
This feature should not affect the SLO of any components.
613+
614+
OpenAPI v3 aggregation runs on top of v2 aggregation, so there is additional load. However, v3 aggregation is cheap:
615+
616+
- OpenAPI v3 aggregation is much cheaper as every group-version is aggregated independently.
617+
- APIServices (aggregated apiservers) coming and going (due to availability changes) do not lead to aggregation because kube-apiserver only acts as a proxy
618+
- CRD aggregation is structurally trivial (no unification of definition names, which is quadratic in schema sizes) and hence linear in the number of CRDs per group-version.
619+
- Unlike aggregated APIs, CRDs do not "come and go": aggregation runs once per CRD schema change and once at API server startup.
620+
550621
###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?
551622

552623
<!-- Pick one more of these and delete the rest. -->
553624

554-
- [ ] Metrics
555-
- Metric name:
556-
- [Optional] Aggregation method:
557-
- Components exposing the metric:
558-
- [ ] Other (treat as last resort)
559-
- Details:
625+
A new metric will be added in the CRD controller to measure the time taken to aggregate CRD OpenAPI specs.
626+
627+
- [X] Metrics
628+
- Metric name: `crd_openapi_v3_aggregation_duration_seconds`
629+
- Components exposing the metric: kube-apiserver
560630

561631
###### Are there any missing metrics that would be useful to have to improve observability of this feature?
562632

563633
<!-- Describe the metrics themselves and the reasons why they weren't added
564634
(e.g., cost, implementation difficulties, etc.). -->
565635

636+
Not at the moment.
637+
566638
### Dependencies
567639

568640
<!-- This section must be completed when targeting beta to a release. -->
569641

570642
###### Does this feature depend on any specific services running in the cluster?
571643

644+
OpenAPI V3 aggregates from apiservers provided by APIService.
645+
646+
- APIService
647+
- OpenAPI V3 fetches the `openapi/v3` endpoint and specs from aggregated API
648+
- Impact of its outage on the feature: If an APIService is unavailable, the OpenAPI spec of the corresponding APIService will be unavailable but other OpenAPI specs will be unaffected. The resource usage will be better than with OpenAPI V2 because no aggregation is needed when APIServices become unavailable and available.
649+
- Impact of its degraded performance or high-error rates on the feature: Same as above.
650+
572651
<!-- Think about both cluster-level services (e.g. metrics-server) as well as
573652
node-level agents (e.g. specific version of CRI). Focus on external or optional
574653
services that are needed. For example, if this feature depends on a cloud
@@ -608,7 +687,7 @@ mostly on:
608687
heartbeats, leader election, etc.) -->
609688

610689
Yes. Get on the `/openapi/v3` endpoint as well as
611-
`/openapi/v3/{group}/{version}` for each API group provided by Kubernetes
690+
`/openapi/v3/{group}/{version}` for each API group provided by Kubernetes.
612691

613692
###### Will enabling / using this feature result in introducing new API types?
614693

@@ -673,6 +752,8 @@ some monitoring details). For now, we leave it here. -->
673752

674753
###### How does this feature react if the API server and/or etcd is unavailable?
675754

755+
The feature is part of the API server and will not function if it is unavailable. It does not depend on the availability of etcd.
756+
676757
###### What are other known failure modes?
677758

678759
<!-- For each of them, fill in the following information by copying the below
@@ -687,6 +768,28 @@ template:
687768
graduated to beta.
688769
- Testing: Are there any tests for failure mode? If not, describe why. -->
689770

771+
- Failure in endpoint
772+
- Detection: How can it be detected via metrics? Stated another way: how can
773+
an operator troubleshoot without logging into a master or worker node?
774+
775+
Lack of 200 status in the OpenAPI endpoint. High latency in apiserver responses
776+
777+
- Mitigations: What can be done to stop the bleeding, especially for already
778+
running user workloads?
779+
780+
OpenAPI V3 can be rolled back and disabled via the `OpenAPIV3` feature gate.
781+
782+
- Diagnostics: What are the useful log messages and their required logging
783+
levels that could help debug the issue? Not required until feature
784+
graduated to beta.
785+
786+
Warning and Error logging messages having openapi/swagger/spec keywords.
787+
788+
- Testing: Are there any tests for failure mode? If not, describe why.
789+
790+
Tests will be added for failure conditions.
791+
792+
690793
###### What steps should be taken if SLOs are not being met to determine the problem?
691794

692795
## Implementation History

keps/sig-api-machinery/2896-openapi-v3/kep.yaml

Lines changed: 4 additions & 3 deletions
@@ -16,21 +16,22 @@ prr-approvers:
1616
- "@deads2k"
1717

1818
# The target maturity stage in the current dev cycle for this KEP.
19-
stage: alpha
19+
stage: beta
2020

2121
# The most recent milestone for which work toward delivery of this KEP has been
2222
# done. This can be the current (upcoming) milestone, if it is being actively
2323
# worked on.
24-
latest-milestone: "v1.23"
24+
latest-milestone: "v1.24"
2525

2626
# The milestone at which this feature was, or is targeted to be, at each stage.
2727
milestone:
2828
alpha: "v1.23"
29+
beta: "v1.24"
2930

3031
# The following PRR answers are required at alpha release
3132
# List the feature gate name and the components for which it must be enabled
3233
feature-gates:
33-
- name: OpenAPIv3Enabled
34+
- name: OpenAPIV3
3435
components:
3536
- kube-apiserver
3637
disable-supported: true
