Commit 98c940c

Merge pull request kubernetes#2405 from Jefftree/ssa
Update Server Side Apply KEP for 1.21 GA
2 parents 57651fb + 1e7521a commit 98c940c

3 files changed

+205
-3
lines changed

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
1+
kep-number: 555
2+
stable:
3+
approver: "@deads2k"

keps/sig-api-machinery/555-server-side-apply/README.md

Lines changed: 185 additions & 2 deletions
@@ -19,6 +19,13 @@
1919
- [Proposed Change](#proposed-change)
2020
- [Alternatives](#alternatives)
2121
- [Implementation History](#implementation-history)
22+
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
23+
- [Feature Enablement and Rollback](#feature-enablement-and-rollback)
24+
- [Rollout, Upgrade and Rollback Planning](#rollout-upgrade-and-rollback-planning)
25+
- [Monitoring Requirements](#monitoring-requirements)
26+
- [Dependencies](#dependencies)
27+
- [Scalability](#scalability)
28+
- [Troubleshooting](#troubleshooting)
2229
- [Risks and Mitigations](#risks-and-mitigations)
2330
- [Testing Plan](#testing-plan)
2431
- [Graduation Criteria](#graduation-criteria)
@@ -279,6 +286,176 @@ The conversion between the two and creating the diff was complex and would have
279286

280287
- 12/2019 [#86083](https://github.com/kubernetes/kubernetes/pull/86083) implementing a poc for the described approach
281288

289+
## Production Readiness Review Questionnaire
290+
291+
<!--
292+
293+
Production readiness reviews are intended to ensure that features merging into
294+
Kubernetes are observable, scalable and supportable; can be safely operated in
295+
production environments, and can be disabled or rolled back in the event they
296+
cause increased failures in production. See more in the PRR KEP at
297+
https://git.k8s.io/enhancements/keps/sig-architecture/1194-prod-readiness.
298+
299+
The production readiness review questionnaire must be completed and approved
300+
for the KEP to move to `implementable` status and be included in the release.
301+
302+
In some cases, the questions below should also have answers in `kep.yaml`. This
303+
is to enable automation to verify the presence of the review, and to reduce review
304+
burden and latency.
305+
306+
The KEP must have an approver from the
307+
[`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
308+
team. Please reach out on the
309+
[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
310+
you need any help or guidance.
311+
312+
-->
313+
314+
### Feature Enablement and Rollback
315+
316+
_This section must be completed when targeting alpha to a release._
317+
318+
* **How can this feature be enabled / disabled in a live cluster?**
319+
- [x] Feature gate (also fill in values in `kep.yaml`)
320+
- Feature gate name: [ServerSideApply](https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/apiserver/pkg/features/kube_features.go#L100)
321+
- Components depending on the feature gate: kube-apiserver
322+
323+
* **Does enabling the feature change any default behavior?**
324+
325+
While this changes how objects are modified and then stored in the database, all the changes should be strictly backward compatible and shouldn’t break existing automation or users. The increase in object size can have adverse, surprising consequences, including increased memory usage for controllers, increased bandwidth usage when fetching objects, and bigger objects when displayed to users (`kubectl get -o yaml`). We’re trying to mitigate all of these with the addition of a new header.
326+
327+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
328+
the enablement)?**
329+
Also set `disable-supported` to `true` or `false` in `kep.yaml`.
330+
Describe the consequences on existing workloads (e.g., if this is a runtime
331+
feature, can it break the existing applications?).
332+
333+
Yes. The consequence is that managed fields will be reset for server-side applied objects (requiring a read/write cycle on the impacted resources).
334+
335+
* **What happens if we reenable the feature if it was previously rolled back?**
336+
337+
The feature will be restored. Server-side applied objects will have lost their managed field “set”, which may cause some surprising behavior (fields might not be removed as expected).
338+
339+
* **Are there any tests for feature enablement/disablement?**
340+
The e2e framework does not currently support enabling or disabling feature
341+
gates. However, unit tests in each component dealing with managing data, created
342+
with and without the feature, are necessary. At the very least, think about
343+
conversion tests if API types are being modified.
344+
345+
Tests are in place for upgrading from client side to server side apply and vice versa.
346+
347+
### Rollout, Upgrade and Rollback Planning
348+
349+
_This section must be completed when targeting beta graduation to a release._
350+
351+
* **How can a rollout fail? Can it impact already running workloads?**
352+
Try to be as paranoid as possible - e.g., what if some components will restart
353+
mid-rollout?
354+
There is no specific way that the rollout can fail. The rollout can't impact existing workloads.
355+
* **What specific metrics should inform a rollback?**
356+
357+
The feature shouldn't affect any existing behavior. A surprisingly high number of modification rejections could be a sign that something is not working properly.
358+
359+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
360+
361+
Because the feature doesn't affect existing behavior, rollback and upgrades haven't been specifically tested.
362+
The feature is being used by the cluster role aggregator though. Upgrading/downgrading/upgrading, which
363+
could result in the managedFields being removed, wouldn't cause any problems since the `Rules` field
364+
filled by the controller is `atomic`, and thus doesn't depend on the current state of the managedFields.
365+
366+
The new `managedFields` field is cleared when it is incorrect. That protects us from having invalid data inserted by a potential bad upgrade.
367+
368+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
369+
fields of API types, flags, etc.?** No
371+
### Monitoring Requirements
372+
373+
_This section must be completed when targeting beta graduation to a release._
374+
375+
* **How can an operator determine if the feature is in use by workloads?**
376+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
377+
checking if there are objects with field X set) may be a last resort. Avoid
378+
logs or events for this purpose.
379+
380+
Any existing metric split by request verb will record the [APPLY](https://github.com/kubernetes/kubernetes/blob/8f6ffb24df989608b87451f89b8ac9fc338ed71c/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L507-L509) verb if the feature is in use.
381+
382+
Additionally, the OpenAPI spec exposes the available media types for each individual endpoint. The presence of the `apply` type for the PATCH verb of an endpoint indicates whether the feature is enabled for that specific resource, e.g.
383+
```json
384+
...
385+
"patch": {
386+
"consumes": [
387+
"application/json-patch+json",
388+
"application/merge-patch+json",
389+
"application/strategic-merge-patch+json",
390+
"application/apply-patch+yaml"
391+
],
392+
...
393+
}
394+
...
395+
```
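The check described above can be automated. The following is a minimal sketch, assuming the OpenAPI v2 document has already been fetched from the API server and parsed into a dictionary. The function name and the trimmed sample spec are hypothetical and exist only for illustration.

```python
APPLY_MEDIA_TYPE = "application/apply-patch+yaml"


def paths_supporting_apply(openapi_spec):
    """Return the sorted API paths whose PATCH operation accepts the apply media type."""
    supported = []
    for path, operations in openapi_spec.get("paths", {}).items():
        patch = operations.get("patch")
        if patch and APPLY_MEDIA_TYPE in patch.get("consumes", []):
            supported.append(path)
    return sorted(supported)


# Hypothetical, heavily trimmed spec; a real document lists many more paths.
sample_spec = {
    "paths": {
        "/api/v1/namespaces/{namespace}/configmaps/{name}": {
            "patch": {
                "consumes": [
                    "application/json-patch+json",
                    "application/merge-patch+json",
                    "application/strategic-merge-patch+json",
                    "application/apply-patch+yaml",
                ]
            }
        },
        "/example/without-apply": {
            "patch": {"consumes": ["application/json-patch+json"]}
        },
    }
}

print(paths_supporting_apply(sample_spec))
```

Only the first path in the sample advertises the apply media type, so only that path is reported.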
396+
397+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
398+
the health of the service?**
399+
400+
There is no specific metric attached to server side apply. All PATCH requests that utilize SSA will use the verb APPLY when logging metrics. API Server metrics that are split by verb automatically include this. They include `apiserver_request_total`, `apiserver_longrunning_gauge`, `apiserver_response_sizes`, `apiserver_request_terminations_total`, and `apiserver_selfrequest_total`.
401+
- Components exposing the metric: kube-apiserver
402+
403+
Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLIs as other types of requests.
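As an illustration of reading the APPLY verb out of these metrics, here is a minimal sketch that sums `apiserver_request_total` samples labeled `verb="APPLY"` from a Prometheus text-format scrape. The function name and the sample scrape are hypothetical, and the parser is deliberately simplified: it assumes label values contain no `}` or escaped quotes.

```python
import re

# Matches one apiserver_request_total sample line: name{labels} value
SAMPLE_RE = re.compile(r'^apiserver_request_total\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def apply_request_count(metrics_text):
    """Sum all apiserver_request_total samples whose verb label is APPLY."""
    total = 0.0
    for line in metrics_text.splitlines():
        match = SAMPLE_RE.match(line)
        if not match:
            continue  # comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("verb") == "APPLY":
            total += float(match.group("value"))
    return total


# Hypothetical scrape output, trimmed to the labels that matter here.
sample_metrics = """\
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",resource="configmaps",verb="APPLY"} 7
apiserver_request_total{code="409",resource="configmaps",verb="APPLY"} 2
apiserver_request_total{code="200",resource="configmaps",verb="PATCH"} 11
"""

print(apply_request_count(sample_metrics))
```

A non-zero result suggests that server side apply is in use. Grouping the same samples by the `code` label would surface an unusually high share of rejected apply requests.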
404+
405+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?**
406+
Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLOs as other types of requests.
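For readers unfamiliar with what such a request looks like on the wire, the sketch below builds (but does not send) an apply request with Python's standard library. The server address, resource path, manifest, and field manager name are hypothetical. The `fieldManager` query parameter identifies the applier.

```python
import urllib.parse
import urllib.request


def build_apply_request(server, path, manifest_yaml, field_manager):
    """Build a PATCH request carrying a server side apply payload."""
    query = urllib.parse.urlencode({"fieldManager": field_manager})
    return urllib.request.Request(
        url=f"{server}{path}?{query}",
        data=manifest_yaml.encode("utf-8"),
        method="PATCH",
        headers={"Content-Type": "application/apply-patch+yaml"},
    )


# Hypothetical manifest and endpoint, for illustration only.
manifest = """\
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo
data:
  key: value
"""

request = build_apply_request(
    "https://kube-apiserver.example:6443",
    "/api/v1/namespaces/default/configmaps/demo",
    manifest,
    "example-manager",
)
print(request.get_method(), request.full_url)
```

Apart from the media type and the field manager, this is an ordinary PATCH request, which is consistent with it sharing the SLIs and SLOs of other request types.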
407+
* **Are there any missing metrics that would be useful to have to improve observability
408+
of this feature?** n/a
409+
410+
### Dependencies
411+
412+
* **Does this feature depend on any specific services running in the cluster?** No
413+
414+
### Scalability
415+
416+
* **Will enabling / using this feature result in any new API calls?** No
417+
418+
* **Will enabling / using this feature result in introducing new API types?**
419+
No
420+
421+
* **Will enabling / using this feature result in any new calls to the cloud
422+
provider?** No
423+
424+
* **Will enabling / using this feature result in increasing size or count of
425+
the existing API objects?** Objects applied using server side apply will have their managed fields metadata populated. `managedFields` metadata fields can represent up to 60% of the total size of an object, increasing the size of objects.
426+
427+
* **Will enabling / using this feature result in increasing time taken by any
428+
operations covered by [existing SLIs/SLOs]?** No
429+
430+
* **Will enabling / using this feature result in non-negligible increase of
431+
resource usage (CPU, RAM, disk, IO, ...) in any components?** Since objects are larger with the new `managedFields`, cache sizes and network bandwidth requirements will increase.
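To get a feel for the size overhead on a given object, the share taken by `managedFields` can be estimated from its JSON form. This is a minimal sketch: the helper name and the sample object are hypothetical, and compact JSON length is only an approximation of the stored size.

```python
import json


def managed_fields_fraction(obj):
    """Estimate the fraction of an object's compact JSON size used by metadata.managedFields."""
    full_size = len(json.dumps(obj, separators=(",", ":")))
    if "metadata" in obj:
        metadata = {k: v for k, v in obj["metadata"].items() if k != "managedFields"}
        stripped = {**obj, "metadata": metadata}
    else:
        stripped = obj
    stripped_size = len(json.dumps(stripped, separators=(",", ":")))
    return (full_size - stripped_size) / full_size


# Hypothetical small object with a single managedFields entry.
sample_object = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "demo",
        "managedFields": [
            {
                "manager": "example-manager",
                "operation": "Apply",
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {"f:data": {"f:key": {}}},
            }
        ],
    },
    "data": {"key": "value"},
}

print(f"{managed_fields_fraction(sample_object):.0%} of the object is managedFields")
```

Small objects with several field managers tend to show the largest ratios, since the bookkeeping for each manager is added on top of a small payload.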
432+
433+
### Troubleshooting
434+
435+
The Troubleshooting section currently serves the `Playbook` role. We may consider
436+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
437+
details). For now, we leave it here.
438+
439+
_This section must be completed when targeting beta graduation to a release._
440+
441+
* **How does this feature react if the API server and/or etcd is unavailable?**
442+
443+
The feature is part of the API server and will not function without it.
444+
445+
* **What are other known failure modes?**
446+
For each of them, fill in the following information by copying the below template:
447+
- [Failure mode brief description]
448+
- Detection: How can it be detected via metrics? Stated another way:
449+
how can an operator troubleshoot without logging into a master or worker node? Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLIs as other types of requests.
450+
- Mitigations: What can be done to stop the bleeding, especially for already
451+
running user workloads? This shouldn't affect running workloads, and this feature shouldn't alter the behavior of previously existing mechanisms like PATCH and PUT.
452+
- Diagnostics: What are the useful log messages and their required logging
453+
levels that could help debug the issue? The feature uses very little logging, and errors should be returned directly to the user.
454+
Not required until feature graduated to beta.
455+
- Testing: Are there any tests for failure mode? Failure modes are tested exhaustively both as unit-tests and as integration tests.
456+
457+
* **What steps should be taken if SLOs are not being met to determine the problem?** n/a
458+
282459
### Risks and Mitigations
283460

284461
We used a feature branch to ensure that no partial state of this feature would
@@ -341,6 +518,8 @@ Integration tests for:
341518
- [x] Apply works with custom resources. [link](https://github.com/kubernetes/kubernetes/blob/b55417f429353e1109df8b3bfa2afc8dbd9f240b/staging/src/k8s.io/apiextensions-apiserver/test/integration/apply_test.go#L34-L117)
342519
- [x] Run kubectl apply tests with server-side flag enabled. [link](https://github.com/kubernetes/kubernetes/blob/81e6407393aa46f2695e71a015f93819f1df424c/test/cmd/apply.sh#L246-L314)
343520

521+
E2E and Conformance tests will be added for GA.
522+
344523
## Graduation Criteria
345524

346525
An alpha version of this is targeted for 1.14.
@@ -349,8 +528,11 @@ This can be promoted to beta when it is a drop-in replacement for the existing
349528
kubectl apply, and has no regressions (which aren't bug fixes). This KEP will be
350529
updated when we know the concrete things changing for beta.
351530

352-
This will be promoted to GA once it's gone a sufficient amount of time as beta
353-
with no changes. A KEP update will precede this.
531+
A GA version of this is targeted for 1.21.
532+
533+
- E2E tests are created and graduate to conformance
534+
- [Apply for client-go's typed client](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2144-clientgo-apply) is implemented and at least one kube-controller-manager uses that client
535+
- Outstanding bugs around status wiping and scale subresource are fixed
354536

355537
### Upgrade / Downgrade Strategy
356538

@@ -423,6 +605,7 @@ annotation is preserved and up-to-date as described in the downgrade above.
423605
* Early 2018: @lavalamp begins thinking about apply and writing design docs
424606
* 2018Q3: Design shift from merge + diff to tracking field managers.
425607
* 2019Q1: Alpha.
608+
* 2019Q3: Beta.
426609

427610
(For more details, one can view the apply-wg recordings, or join the mailing list
428611
and view the meeting notes. TODO: links)

keps/sig-api-machinery/555-server-side-apply/kep.yaml

Lines changed: 17 additions & 1 deletion
@@ -11,13 +11,29 @@ reviewers:
1111
- "@erictune"
1212
approvers:
1313
- "@bgrant0607"
14+
prr-approvers:
15+
- "@deads2k"
1416
editor: TBD
1517
creation-date: 2018-03-28
16-
last-updated: 2018-03-28
18+
last-updated: 2021-02-21
1719
status: implementable
1820
see-also:
1921
- n/a
2022
replaces:
2123
- n/a
2224
superseded-by:
2325
- n/a
26+
27+
stage: stable
28+
latest-milestone: "v1.21"
29+
30+
milestone:
31+
alpha: "v1.14"
32+
beta: "v1.16"
33+
stable: "v1.21"
34+
35+
feature-gates:
36+
- name: ServerSideApply
37+
components:
38+
- kube-apiserver
39+
disable-supported: true
