- Components depending on the feature gate: kube-apiserver
322
+
323
+
* **Does enabling the feature change any default behavior?**
324
+
325
+
Enabling the feature changes how objects are modified and then stored in the database, but all the changes should be strictly backward compatible and should not break existing automation or users. The resulting increase in object size can have surprising adverse consequences, including higher memory usage in controllers, higher bandwidth usage when fetching objects, and larger output when objects are displayed to users (`kubectl get -o yaml`). We are trying to mitigate all of these with the addition of a new header.
326
+
327
+
* **Can the feature be disabled once it has been enabled (i.e. can we roll back
328
+
the enablement)?**
329
+
Also set `disable-supported` to `true` or `false` in `kep.yaml`.
330
+
Describe the consequences on existing workloads (e.g., if this is a runtime
331
+
feature, can it break the existing applications?).
332
+
333
+
Yes. Managed fields will be reset for server-side applied objects.
334
+
335
+
* **What happens if we reenable the feature if it was previously rolled back?**
336
+
337
+
The feature will be restored. Server-side applied objects will have lost their “set”, which may cause surprising behavior (for example, fields might not be removed as expected).
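
Why lost ownership information can leave fields behind can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the API server's merge logic: `toy_apply` and its flat key/value objects are hypothetical, and real managed fields track nested field paths per manager.

```python
def toy_apply(live, owned, desired):
    """Toy model of one manager applying `desired` onto `live`.

    `owned` is the set of keys this manager is recorded as owning
    (a stand-in for managed fields). Keys the manager owned but no
    longer lists are removed; everything in `desired` is set.
    Returns the new object and the manager's new owned set.
    """
    result = dict(live)
    # Only fields known to be owned by this manager can be pruned.
    for key in owned - set(desired):
        result.pop(key, None)
    result.update(desired)
    return result, set(desired)


live = {"a": 1, "b": 2}
desired = {"a": 1}  # the manager has dropped "b" from its configuration

# Ownership intact: "b" is recognized as owned and gets removed.
with_tracking, _ = toy_apply(live, {"a", "b"}, desired)

# Ownership reset (as after a rollback): "b" is not known to be owned,
# so it stays on the object.
after_reset, _ = toy_apply(live, set(), desired)
```

In this model the second call leaves `"b"` in place, which mirrors the "fields might not be removed as expected" behavior described above.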
338
+
339
+
* **Are there any tests for feature enablement/disablement?**
340
+
The e2e framework does not currently support enabling or disabling feature
341
+
gates. However, unit tests in each component dealing with managing data, created
342
+
with and without the feature, are necessary. At the very least, think about
343
+
conversion tests if API types are being modified.
344
+
345
+
Tests are in place for upgrading from client-side apply to server-side apply and vice versa.
346
+
347
+
### Rollout, Upgrade and Rollback Planning
348
+
349
+
_This section must be completed when targeting beta graduation to a release._
350
+
351
+
* **How can a rollout fail? Can it impact already running workloads?**
352
+
Try to be as paranoid as possible - e.g., what if some components will restart
353
+
mid-rollout?
354
+
355
+
* **What specific metrics should inform a rollback?**
356
+
357
+
358
+
359
+
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
360
+
Describe manual testing that was done and the outcomes.
361
+
Longer term, we may want to require automated upgrade/rollback tests, but we
362
+
are missing a bunch of machinery and tooling and can't do that now.
363
+
364
+
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
365
+
fields of API types, flags, etc.?** No
366
+
367
+
### Monitoring Requirements
368
+
369
+
_This section must be completed when targeting beta graduation to a release._
370
+
371
+
* **How can an operator determine if the feature is in use by workloads?**
372
+
Ideally, this should be a metric. Operations against the Kubernetes API (e.g.,
373
+
checking if there are objects with field X set) may be a last resort. Avoid
374
+
logs or events for this purpose.
375
+
376
+
Any existing metric split by request verb will record the [APPLY](https://github.com/kubernetes/kubernetes/blob/8f6ffb24df989608b87451f89b8ac9fc338ed71c/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L507-L509) verb if the feature is in use.
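
As a rough illustration of checking for the APPLY verb, the sketch below sums `apiserver_request_total` samples by verb from Prometheus text-format output. The sample text and its label values are made up for illustration, `count_requests_by_verb` is a hypothetical helper, and the parser is deliberately simplified (it assumes label values contain no `}` characters).

```python
import re

# Hypothetical excerpt of kube-apiserver /metrics output; the values
# and label sets are illustrative only.
SAMPLE_METRICS = """\
# HELP apiserver_request_total Counter of apiserver requests.
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",resource="deployments",verb="APPLY"} 7
apiserver_request_total{code="201",resource="configmaps",verb="APPLY"} 3
apiserver_request_total{code="200",resource="pods",verb="GET"} 120
"""

# Simplified matcher for `name{labels} value` sample lines.
_SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
_LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def count_requests_by_verb(metrics_text, verb, metric="apiserver_request_total"):
    """Sum all samples of `metric` whose `verb` label equals `verb`."""
    total = 0.0
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE.match(line)
        if not match or match.group("name") != metric:
            continue
        labels = dict(_LABEL.findall(match.group("labels")))
        if labels.get("verb") == verb:
            total += float(match.group("value"))
    return total


apply_requests = count_requests_by_verb(SAMPLE_METRICS, "APPLY")
feature_in_use = apply_requests > 0
```

A nonzero APPLY count suggests that some client is using server-side apply. In practice the same check would more likely be a query against whatever monitoring system scrapes the API server.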
377
+
378
+
* **What are the SLIs (Service Level Indicators) an operator can use to determine
379
+
the health of the service?**
380
+
381
+
There is no metric specific to server-side apply. All PATCH requests that use SSA are recorded with the verb APPLY in metrics, so API server metrics that are split by verb include them automatically. These include `apiserver_request_total`, `apiserver_longrunning_gauge`, `apiserver_response_sizes`, `apiserver_request_terminations_total`, and `apiserver_selfrequest_total`.
382
+
- Components exposing the metric: kube-apiserver
383
+
384
+
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** n/a
385
+
386
+
* **Are there any missing metrics that would be useful to have to improve observability
387
+
of this feature?** n/a
388
+
389
+
### Dependencies
390
+
391
+
* **Does this feature depend on any specific services running in the cluster?** No
392
+
393
+
### Scalability
394
+
395
+
* **Will enabling / using this feature result in any new API calls?** No
396
+
397
+
* **Will enabling / using this feature result in introducing new API types?**
398
+
Describe them, providing: No
399
+
400
+
* **Will enabling / using this feature result in any new calls to the cloud
401
+
provider?** No
402
+
403
+
* **Will enabling / using this feature result in increasing size or count of
404
+
the existing API objects?** Objects applied using server side apply will have their managed fields metadata populated.
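
To get a feel for the size impact, the sketch below compares the serialized size of an object with and without `metadata.managedFields`. The ConfigMap and its single managed-fields entry are a made-up example, and the helper names are hypothetical. Real overhead depends on how many managers and fields an object has.

```python
import copy
import json

# Illustrative object only: a small ConfigMap carrying one
# managed-fields entry of the kind recorded for an apply request.
APPLIED_OBJECT = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "example",
        "namespace": "default",
        "managedFields": [
            {
                "manager": "kubectl",
                "operation": "Apply",
                "apiVersion": "v1",
                "time": "2021-01-01T00:00:00Z",
                "fieldsType": "FieldsV1",
                "fieldsV1": {"f:data": {"f:key": {}}},
            }
        ],
    },
    "data": {"key": "value"},
}


def serialized_size(obj):
    """Size in bytes of the compact JSON encoding of `obj`."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def without_managed_fields(obj):
    """Return a deep copy of `obj` with metadata.managedFields removed."""
    stripped = copy.deepcopy(obj)
    stripped.get("metadata", {}).pop("managedFields", None)
    return stripped


def managed_fields_overhead(obj):
    """Bytes added to the JSON encoding by metadata.managedFields."""
    return serialized_size(obj) - serialized_size(without_managed_fields(obj))


overhead_bytes = managed_fields_overhead(APPLIED_OBJECT)
```

For small objects such as this one, the managed-fields entry can be comparable in size to the rest of the object.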
405
+
406
+
* **Will enabling / using this feature result in increasing time taken by any
407
+
operations covered by [existing SLIs/SLOs]?** No
408
+
409
+
* **Will enabling / using this feature result in non-negligible increase of
410
+
resource usage (CPU, RAM, disk, IO, ...) in any components?** No
411
+
412
+
### Troubleshooting
413
+
414
+
The Troubleshooting section currently serves the `Playbook` role. We may consider
415
+
splitting it into a dedicated `Playbook` document (potentially with some monitoring
416
+
details). For now, we leave it here.
417
+
418
+
_This section must be completed when targeting beta graduation to a release._
419
+
420
+
* **How does this feature react if the API server and/or etcd is unavailable?**
421
+
422
+
The feature is part of the API server and will not function without it.
423
+
424
+
* **What are other known failure modes?**
425
+
For each of them, fill in the following information by copying the below template:
426
+
- [Failure mode brief description]
427
+
- Detection: How can it be detected via metrics? Stated another way:
428
+
how can an operator troubleshoot without logging into a master or worker node?
429
+
- Mitigations: What can be done to stop the bleeding, especially for already
430
+
running user workloads?
431
+
- Diagnostics: What are the useful log messages and their required logging
432
+
levels that could help debug the issue?
433
+
Not required until feature graduated to beta.
434
+
- Testing: Are there any tests for failure mode? If not, describe why.
435
+
436
+
* **What steps should be taken if SLOs are not being met to determine the problem?** n/a
We used a feature branch to ensure that no partial state of this feature would
@@ -341,6 +500,8 @@ Integration tests for:
341
500
- [x] Apply works with custom resources. [link](https://github.com/kubernetes/kubernetes/blob/b55417f429353e1109df8b3bfa2afc8dbd9f240b/staging/src/k8s.io/apiextensions-apiserver/test/integration/apply_test.go#L34-L117)
342
501
- [x] Run kubectl apply tests with server-side flag enabled. [link](https://github.com/kubernetes/kubernetes/blob/81e6407393aa46f2695e71a015f93819f1df424c/test/cmd/apply.sh#L246-L314)
343
502
503
+
E2E and Conformance tests will be added for GA.
504
+
344
505
## Graduation Criteria
345
506
346
507
An alpha version of this is targeted for 1.14.
@@ -349,8 +510,11 @@ This can be promoted to beta when it is a drop-in replacement for the existing
349
510
kubectl apply, and has no regressions (which aren't bug fixes). This KEP will be
350
511
updated when we know the concrete things changing for beta.
351
512
352
-
This will be promoted to GA once it's gone a sufficient amount of time as beta
353
-
with no changes. A KEP update will precede this.
513
+
A GA version of this is targeted for 1.21.
514
+
515
+
- E2E tests are created and graduate to conformance
516
+
- [Apply for client-go's typed client](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/2144-clientgo-apply) is implemented and has an in-tree controller using it
517
+
- Outstanding bugs around status wiping and scale subresource are fixed
354
518
355
519
### Upgrade / Downgrade Strategy
356
520
@@ -423,6 +587,7 @@ annotation is preserved and up-to-date as described in the downgrade above.
423
587
* Early 2018: @lavalamp begins thinking about apply and writing design docs
424
588
* 2018Q3: Design shift from merge + diff to tracking field managers.
425
589
* 2019Q1: Alpha.
590
+
* 2019Q3: Beta.
426
591
427
592
(For more details, one can view the apply-wg recordings, or join the mailing list