keps/sig-api-machinery/555-server-side-apply/README.md
29 additions & 13 deletions
@@ -351,19 +351,19 @@ _This section must be completed when targeting beta graduation to a release._
351
351
* **How can a rollout fail? Can it impact already running workloads?**
352
352
Try to be as paranoid as possible - e.g., what if some components will restart
353
353
mid-rollout?
354
-
354
+
There is no specific way that the rollout can fail. The rollout can't impact existing workloads.
355
355
* **What specific metrics should inform a rollback?**
356
356
357
-
357
+
The feature shouldn't affect any existing behavior. A surprisingly high number of modification rejections could be a sign that something is not working properly.
358
358
359
359
* **Were upgrade and rollback tested? Was the upgrade->downgrade->upgrade path tested?**
360
-
Describe manual testing that was done and the outcomes.
361
-
Longer term, we may want to require automated upgrade/rollback tests, but we
362
-
are missing a bunch of machinery and tooling and can't do that now.
360
+
Because the feature doesn't affect existing behavior, rollbacks and upgrades haven't been specifically tested.
361
+
362
+
The new `managedFields` field is cleared when it is incorrect. That protects us from having invalid data inserted by a potential bad upgrade.
363
363
364
364
* **Is the rollout accompanied by any deprecations and/or removals of features, APIs,
365
365
fields of API types, flags, etc.?** No
366
-
366
+
No.
367
367
### Monitoring Requirements
368
368
369
369
_This section must be completed when targeting beta graduation to a release._
@@ -375,14 +375,30 @@ _This section must be completed when targeting beta graduation to a release._
375
375
376
376
Any existing metric split by request verb will record the [APPLY](https://github.com/kubernetes/kubernetes/blob/8f6ffb24df989608b87451f89b8ac9fc338ed71c/staging/src/k8s.io/apiserver/pkg/endpoints/metrics/metrics.go#L507-L509) verb if the feature is in use.
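
As a rough illustration of reading a verb-split metric, the sketch below sums `apiserver_request_total` samples labelled `verb="APPLY"`, grouped by response code. The sample text and its values are invented, the `apply_requests_by_code` helper is hypothetical, and the regular expressions are a simplified parser that assumes label values contain no escaped quotes or braces.

```python
import re
from collections import Counter

# Invented sample in the Prometheus text exposition format (values are made up).
SAMPLE_METRICS = """\
# TYPE apiserver_request_total counter
apiserver_request_total{code="200",resource="configmaps",verb="APPLY"} 12
apiserver_request_total{code="409",resource="configmaps",verb="APPLY"} 3
apiserver_request_total{code="200",resource="configmaps",verb="PATCH"} 40
apiserver_request_total{code="200",resource="pods",verb="GET"} 100
"""

# Simplified patterns: one sample per line, no escaped quotes in label values.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def apply_requests_by_code(text, metric="apiserver_request_total"):
    """Sum the samples of `metric` whose verb label is APPLY, keyed by code."""
    counts = Counter()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None or match.group("name") != metric:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if labels.get("verb") == "APPLY":
            counts[labels.get("code", "")] += float(match.group("value"))
    return counts


counts = apply_requests_by_code(SAMPLE_METRICS)
print(dict(counts))
```

A sudden growth of the non-2xx buckets relative to the total would be one way to notice the "surprisingly high number of modification rejections" mentioned earlier.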
377
377
378
+
Additionally, the OpenAPI spec exposes the available media types for each individual endpoint. The presence of the `apply` type for the PATCH verb of an endpoint indicates whether the feature is enabled for that specific resource, e.g.
379
+
```json
380
+
...
381
+
"patch": {
382
+
"consumes": [
383
+
"application/json-patch+json",
384
+
"application/merge-patch+json",
385
+
"application/strategic-merge-patch+json",
386
+
"application/apply-patch+yaml"
387
+
],
388
+
...
389
+
}
390
+
...
391
+
```
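
The check described above can be sketched as follows. This is a minimal example that assumes the spec has already been loaded into a dictionary with the OpenAPI v2 layout (`paths` → path → `patch` → `consumes`); the `paths_supporting_apply` helper and the trimmed `SAMPLE_SPEC` document, including its second and third paths, are hypothetical.

```python
APPLY_MEDIA_TYPE = "application/apply-patch+yaml"


def paths_supporting_apply(spec):
    """Return the paths whose PATCH operation lists the apply media type."""
    supported = []
    for path, operations in spec.get("paths", {}).items():
        patch = operations.get("patch") or {}
        if APPLY_MEDIA_TYPE in patch.get("consumes", []):
            supported.append(path)
    return sorted(supported)


# Hypothetical, heavily trimmed spec used only for illustration.
SAMPLE_SPEC = {
    "paths": {
        "/api/v1/namespaces/{namespace}/configmaps/{name}": {
            "patch": {
                "consumes": [
                    "application/json-patch+json",
                    "application/merge-patch+json",
                    "application/strategic-merge-patch+json",
                    "application/apply-patch+yaml",
                ]
            }
        },
        "/example/without-apply/{name}": {
            "patch": {
                "consumes": [
                    "application/json-patch+json",
                    "application/merge-patch+json",
                ]
            }
        },
        "/version/": {"get": {"consumes": ["application/json"]}},
    }
}

print(paths_supporting_apply(SAMPLE_SPEC))
```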
378
392
* **What are the SLIs (Service Level Indicators) an operator can use to determine
379
393
the health of the service?**
380
394
381
395
There is no specific metric attached to server side apply. All PATCH requests that utilize SSA will use the verb APPLY when logging metrics. API Server metrics that are split by verb automatically include this. They include `apiserver_request_total`, `apiserver_longrunning_gauge`, `apiserver_response_sizes`, `apiserver_request_terminations_total`, and `apiserver_selfrequest_total`.
382
396
- Components exposing the metric: kube-apiserver
397
+
398
+
Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLIs as other types of requests.
383
399
384
400
* **What are the reasonable SLOs (Service Level Objectives) for the above SLIs?** n/a
385
-
401
+
Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLOs as other types of requests.
386
402
* **Are there any missing metrics that would be useful to have to improve observability
387
403
of this feature?** n/a
388
404
@@ -401,13 +417,13 @@ of this feature?** n/a
401
417
provider?** No
402
418
403
419
* **Will enabling / using this feature result in increasing size or count of
404
-
the existing API objects?** Objects applied using server side apply will have their managed fields metadata populated.
420
+
the existing API objects?** Objects applied using server side apply will have their managed fields metadata populated. The `managedFields` metadata can account for up to 60% of an object's total size, so affected objects become larger.
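
To get a feel for this overhead on a given object, one could compare its serialized size with and without `metadata.managedFields`. The sketch below is only an approximation: it measures compact JSON, which may differ from the encoding actually used for storage or on the wire, and both the `managed_fields_share` helper and the sample ConfigMap are hypothetical.

```python
import json


def _json_size(obj):
    """Size in bytes of the compact JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


def managed_fields_share(obj):
    """Fraction of the object's JSON size taken up by metadata.managedFields."""
    metadata = obj.get("metadata", {})
    if "managedFields" not in metadata:
        return 0.0
    full = _json_size(obj)
    # Re-encode a shallow copy with managedFields removed and compare sizes.
    trimmed_metadata = {k: v for k, v in metadata.items() if k != "managedFields"}
    trimmed = dict(obj, metadata=trimmed_metadata)
    return (full - _json_size(trimmed)) / full


# Hypothetical object used only for illustration.
SAMPLE_OBJECT = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {
        "name": "example",
        "namespace": "default",
        "managedFields": [
            {
                "manager": "example-manager",
                "operation": "Apply",
                "apiVersion": "v1",
                "fieldsType": "FieldsV1",
                "fieldsV1": {"f:data": {"f:key": {}}},
            }
        ],
    },
    "data": {"key": "value"},
}

share = managed_fields_share(SAMPLE_OBJECT)
print(f"managedFields share of JSON size: {share:.0%}")
```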
405
421
406
422
* **Will enabling / using this feature result in increasing time taken by any
407
423
operations covered by [existing SLIs/SLOs]?** No
408
424
409
425
* **Will enabling / using this feature result in non-negligible increase of
410
-
resource usage (CPU, RAM, disk, IO, ...) in any components?**No
426
+
resource usage (CPU, RAM, disk, IO, ...) in any components?** Since objects are larger with the new `managedFields`, cache sizes as well as network bandwidth requirements will increase.
411
427
412
428
### Troubleshooting
413
429
@@ -425,13 +441,13 @@ The feature is part of the API server and will not function without it
425
441
For each of them, fill in the following information by copying the below template:
426
442
- [Failure mode brief description]
427
443
- Detection: How can it be detected via metrics? Stated another way:
428
-
how can an operator troubleshoot without logging into a master or worker node?
444
+
how can an operator troubleshoot without logging into a master or worker node? Apply requests (`PATCH` with `application/apply-patch+yaml` mime type) have the same level of SLIs as other types of requests.
429
445
- Mitigations: What can be done to stop the bleeding, especially for already
430
-
running user workloads?
446
+
running user workloads? This shouldn't affect running workloads, and this feature shouldn't alter the behavior of previously existing mechanisms like PATCH and PUT.
431
447
- Diagnostics: What are the useful log messages and their required logging
432
-
levels that could help debug the issue?
448
+
levels that could help debug the issue? The feature uses very little logging, and errors should be returned directly to the user.
433
449
Not required until feature graduated to beta.
434
-
- Testing: Are there any tests for failure mode? If not, describe why.
450
+
- Testing: Are there any tests for failure mode? Failure modes are tested exhaustively as both unit tests and integration tests.
435
451
436
452
* **What steps should be taken if SLOs are not being met to determine the problem?** n/a