 Based on the provided group, clients can then request `/openapi/v3/apis/apps/v1`,
 `/openapi/v3/apis/networking.k8s.io/v1`, and so on. These leaf node specs are
 self-contained OpenAPI v3 specs and include all referenced types.

+The discovery document is a map in which each key is a group-version
+and each value is the URL of the OpenAPI spec for that group-version.
+Note that the client can also construct the URL by prepending the
+`openapi/v3` prefix to the group-version name. The URL listed in the
+discovery document carries a special `etag` query parameter that
+denotes the latest etag of the OpenAPI spec for that group-version.
+Changing a query parameter whenever a new version of a resource is
+available is a pattern used frequently in browser caching, known as
+cache busting. All OpenAPI spec requests with the `?etag` query
+parameter return a response with `Cache-Control: immutable`. This
+allows clients to cache the OpenAPI spec for as long as the etag is
+unchanged. The `max-age` is also set to a large value, which is
+equivalent to publishing a spec that never expires.
+
+Support for caching is built into the httpcache library used in
+client-go, so no client-side change is needed to support this
+mechanism other than passing the additional query parameter. Because
+the etag is part of each URL in the root discovery document, clients
+can read the current etag from that document and avoid sending an
+additional request to fetch an ETag for a specific group-version. The
+root document itself thus provides information on group-version changes
+and updates. If there is a race and the client passes an outdated etag
+value, the server sends a 301 to redirect the client to the URL with
+the latest etag value.
+
 ### Controllers

 #### OpenAPI Builder
@@ -271,11 +297,12 @@ the endpoint for their specific group.

 The aggregator has a mapping of all the APIServices and refreshes the aggregated
 spec on an interval. APIServices already publish by group-version so their
-behavior is unchanged. Instead of aggregating in the aggregator, we will simply
-copy the spec to be published at the corresponding aggregator endpoint. For
-CRDs, instead of downloading the entire spec for CRDs, they will be downloaded
-per group-version, increasing the number of requests sent internally when a CRD
-with multiple groups is registered.
+behavior is unchanged. Because OpenAPI V3 is published by
+group-version, the fully aggregated spec is not needed and
+aggregation can be skipped. The aggregator does not download any
+specs. In OpenAPI V3 the aggregator acts as a proxy rather than
+an aggregator, proxying group-version requests to downstream
+API servers.

 ### OpenAPI

@@ -324,6 +351,39 @@ support for v2. The drawback is that v2 is lossy and converting it to
 v3 will provide a lossy v3 schema. This problem will be fixed when aggregated
 apiservers upgrade to publishing v3.

+### OpenAPI V3 Proto
+
+Kubernetes relies on a "bug" (relaxed constraint) in the gnostic library for OpenAPI v2 where a `$ref` and `description` can coexist in the same object. See [issue #106387](https://github.com/kubernetes/kubernetes/issues/106387) for more details. This is disallowed per JSON Schema Draft 4, which is the schema version OpenAPI v2 follows. The coexistence of `$ref` and `description` is important to Kubernetes because `kubectl explain` uses it to provide documentation for reference fields.
+
+For instance, a PodSpec object could have the properties:
+
+```
+  "description": "If specified, the pod's scheduling constraints"
+},
+```
+
+`kubectl explain` uses the description to provide documentation for struct fields that are represented as references in the OpenAPI schema. The description here describes a field's or struct's role in the PodSpec object rather than the struct itself. The gnostic library for OpenAPI 3.0 currently does not apply the relaxed constraint and removes the description field when the OpenAPI spec is passed through proto. We will work with the gnostic team to update the library to support the same constraint relaxation as in OpenAPI 2.0 to fix the OpenAPI 3.0 protobuf.
+
+Another workaround to this problem could be to wrap reference structures with `allOf`.
+
+```
+  "description": "If specified, the pod's scheduling constraints"
+},
+```
+
+This solves the immediate protobuf issue but adds complexity to the OpenAPI schema.
+
+The final alternative is to upgrade to OpenAPI 3.1, whose newer JSON Schema version supports fields alongside a `$ref`. However, OpenAPI does not follow semver: 3.1 is a major upgrade over 3.0 and introduces various backwards-incompatible changes. Furthermore, library support is currently lacking; gnostic does not fully support OpenAPI 3.1. One important backwards-incompatible change is the removal of the `nullable` field, which is replaced by changing the `type` field from a single string to an array of strings.
+
+
 ### Test Plan

 <!-- **Note:** *Not required until targeted at a release.*
@@ -357,13 +417,15 @@ generated is valid OpenAPI v3.
 #### Beta

 - Native types are updated to capture capabilities introduced with v3
+- Incorrect OpenAPI polymorphic types (IntOrString, Quantity) are updated to use `anyOf` in OpenAPI V3
 - Definition names of native resources are updated to omit their package paths
 - Parameters are reused as components
 - `kubectl explain` to support using the OpenAPI v3 Schema
 - Aggregated API servers are queried for their v2 endpoint and converted to
   publish v3 if they do not directly publish v3
 - Heuristics are used for the OpenAPI v2 to v3 conversion to maximize
   correctness of published spec
+- Aggregation for OpenAPI v3 will serve as a proxy to downstream OpenAPI paths

 ### Upgrade / Downgrade Strategy

@@ -437,7 +499,7 @@ you need any help or guidance. -->
 <!-- Pick one of these and delete the rest. -->

 - [x] Feature gate (also fill in values in `kep.yaml`)
-  - Feature gate name: OpenAPIv3Enabled
+  - Feature gate name: OpenAPIV3
   - Components depending on the feature gate: kube-apiserver
 - [ ] Other
   - Describe the mechanism:
@@ -489,31 +551,37 @@ will be enabled on some API servers and not others during the rollout.
 Similarly, consider large clusters and how enablement/disablement will rollout
 across nodes. -->

+Version skew during a rolling control plane upgrade is discussed in a section above.
+
 ###### What specific metrics should inform a rollback?

 <!-- What signals should users be paying attention to when the feature is young
 that might indicate a serious problem? -->

+Non-200 responses from the `openapi/v3` endpoint could indicate a problem. A long response time from the apiserver for OpenAPI requests could also indicate a problem.
+
 ###### Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?

 <!-- Describe manual testing that was done and the outcomes. Longer term, we may
 want to require automated upgrade/rollback tests, but we are missing a bunch of
 machinery and tooling and can't do that now. -->

+n/a
+
 ###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

 <!-- Even if applying deprecation policies, they may still surprise some users.
 -->

+No.
+
 ### Monitoring Requirements

 <!-- This section must be completed when targeting beta to a release. -->

 ###### How can an operator determine if the feature is in use by workloads?

-<!-- Ideally, this should be a metric. Operations against the Kubernetes API
-(e.g., checking if there are objects with field X set) may be a last resort.
-Avoid logs or events for this purpose. -->
+The OpenAPI path `/openapi/v3` is populated. On the metrics side, the OpenAPI V3 specific metric `crd_openapi_v3_aggregation_duration_seconds` should emit data if the feature is enabled.

 ###### How can someone using this feature know that it is working for their instance?

@@ -524,13 +592,7 @@ users below with sufficient detail so that they can verify correct enablement
 and operation of this feature. Recall that end users cannot usually observe
 component logs or access metrics. -->

-- [ ] Events
-  - Event Reason:
-- [ ] API .status
-  - Condition name:
-  - Other field:
-- [ ] Other (treat as last resort)
-  - Details:
+The `openapi/v3` endpoint will be populated with the list of groups if OpenAPI V3 is enabled.

 ###### What are the reasonable SLOs (Service Level Objectives) for the enhancement?

@@ -547,28 +609,45 @@ It's impossible to provide comprehensive guidance, but at the very high level
 These goals will help you determine what you need to measure (SLIs) in the next
 question. -->

+This feature should not affect the SLO of any components.
+
+OpenAPI v3 aggregation comes on top of v2 aggregation, so there is additional load. However, v3 aggregation is cheap:
+
+- OpenAPI v3 aggregation is much cheaper because every group-version is aggregated independently.
+- APIServices (aggregated apiservers) coming and going (due to availability changes) do not lead to aggregation because kube-apiserver only acts as a proxy.
+- CRD aggregation is structurally trivial (no unification of definition names, which is quadratic in schema sizes) and hence linear in the number of CRDs per group-version.
+- CRDs do not "come and go" the way aggregated APIs do; CRD aggregation is a one-time operation per CRD schema change and at API server startup.
+
 ###### What are the SLIs (Service Level Indicators) an operator can use to determine the health of the service?

 <!-- Pick one more of these and delete the rest. -->

-- [ ] Metrics
-  - Metric name:
-  - [Optional] Aggregation method:
-  - Components exposing the metric:
-- [ ] Other (treat as last resort)
-  - Details:
+A new metric will be added in the CRD controller to measure the time taken to aggregate CRD OpenAPI specs.
 <!-- This section must be completed when targeting beta to a release. -->

 ###### Does this feature depend on any specific services running in the cluster?

+OpenAPI V3 aggregates from API servers provided by APIServices.
+
+- APIService
+  - OpenAPI V3 fetches the `openapi/v3` endpoint and specs from aggregated API
+  - Impact of its outage on the feature: If an APIService is unavailable, the OpenAPI spec of the corresponding APIService will be unavailable but other OpenAPI specs will be unaffected. The resource usage will be better than with OpenAPI V2 because no aggregation is needed when APIServices become unavailable or available again.
+  - Impact of its degraded performance or high-error rates on the feature: Same as above.
+
 <!-- Think about both cluster-level services (e.g. metrics-server) as well as
 node-level agents (e.g. specific version of CRI). Focus on external or optional
 services that are needed. For example, if this feature depends on a cloud
@@ -608,7 +687,7 @@ mostly on:
 heartbeats, leader election, etc.) -->

 Yes. GET on the `/openapi/v3` endpoint as well as
-`/openapi/v3/{group}/{version}` for each API group provided by Kubernetes
+`/openapi/v3/{group}/{version}` for each API group provided by Kubernetes.

 ###### Will enabling / using this feature result in introducing new API types?

@@ -673,6 +752,8 @@ some monitoring details). For now, we leave it here. -->

 ###### How does this feature react if the API server and/or etcd is unavailable?

+The feature is part of the API server and will not function if the API server is unavailable. It does not depend on the availability of etcd.
+
 ###### What are other known failure modes?

 <!-- For each of them, fill in the following information by copying the below
@@ -687,6 +768,28 @@ template:
 graduated to beta.
 - Testing: Are there any tests for failure mode? If not, describe why. -->

+- Failure in endpoint
+  - Detection: How can it be detected via metrics? Stated another way: how can
+    an operator troubleshoot without logging into a master or worker node?
+
+    Lack of a 200 status from the OpenAPI endpoint; high latency in apiserver responses.
+
+  - Mitigations: What can be done to stop the bleeding, especially for already
+    running user workloads?
+
+    OpenAPI V3 can be rolled back and disabled via the feature gate.
+
+  - Diagnostics: What are the useful log messages and their required logging
+    levels that could help debug the issue? Not required until feature
+    graduated to beta.
+
+    Warning and Error log messages containing the `openapi`, `swagger`, or `spec` keywords.
+
+  - Testing: Are there any tests for failure mode? If not, describe why.
+
+    Tests will be added for failure conditions.
+
+
 ###### What steps should be taken if SLOs are not being met to determine the problem?