Commit 6c31cac

Merge pull request #37110 from lavalamp/patch-2
Add a blog post about advanced serverside apply
2 parents 7515fc6 + c5d3999 commit 6c31cac

1 file changed: 208 additions & 0 deletions

---
layout: blog
title: "Server Side Apply Is Great And You Should Be Using It"
date: 2022-10-20
slug: advanced-server-side-apply
---

**Author:** Daniel Smith (Google)

Server-side apply (SSA) has now been [GA for a few
releases](https://kubernetes.io/blog/2021/08/06/server-side-apply-ga/), and I
have found myself in a number of conversations recommending that people and
teams in various situations use it. So I’d like to write down some of those
reasons.

## Obvious (and not-so-obvious) benefits of SSA {#benefits}

A list of the improvements and niceties you get by switching to server-side
apply from various other approaches!

* Versus client-side apply (that is, plain `kubectl apply`):
  * The system gives you conflicts when you accidentally fight with another
    actor over the value of a field!
  * When combined with `--dry-run`, there’s no chance of accidentally running a
    client-side dry run instead of a server-side dry run.
* Versus hand-rolling patches:
  * The SSA patch format is extremely natural to write, with no weird syntax.
    It’s just a regular object, but you can (and should) omit any field you
    don’t care about.
  * The old patch format (“strategic merge patch”) was ad hoc and still has some
    bugs; JSON Patch and JSON Merge Patch fail to handle some cases that are
    common in the Kubernetes API, namely lists with items that should be
    recursively merged based on a “name” or other identifying field.
  * There’s also now great [Go-language library support](https://kubernetes.io/blog/2021/08/06/server-side-apply-ga/#using-server-side-apply-in-a-controller)
    for building apply calls programmatically!
  * You can use SSA to explicitly delete fields you don’t “own” by setting them
    to `null`, which makes it a feature-complete replacement for all of the old
    patch formats.
* Versus shelling out to kubectl:
  * You can use the apply API call from any language without shelling out to
    kubectl!
  * As stated above, the [Go library has dedicated mechanisms](https://kubernetes.io/blog/2021/08/06/server-side-apply-ga/#server-side-apply-support-in-client-go)
    to make this easy now.
* Versus GET-modify-PUT:
  * (This one is more complicated and you can skip it if you’ve never written a
    controller!)
  * To use GET-modify-PUT correctly, you have to handle and retry a write
    failure in the case that someone else has modified the object in any way
    between your GET and PUT. This is an “optimistic concurrency failure” when
    it happens.
  * SSA offloads this task to the server: you only have to retry if there’s a
    conflict, and the conflicts you can get are all meaningful, like when you’re
    actually trying to take a field away from another actor in the system.
  * To put it another way, if 10 actors do a GET-modify-PUT cycle at the same
    time, 9 will get an optimistic concurrency failure and have to retry, then
    8, etc., for roughly 50 total GET-PUT attempts in the worst case (that’s
    about .5N^2 GET and PUT calls for N actors making simultaneous changes). If
    the actors are using SSA instead, and the changes don’t actually conflict
    over specific fields, then all the changes can be applied in any order.
    Additionally, SSA changes can often be done without a GET call at all.
    That’s only N **apply** requests for N actors, which is a drastic
    improvement!
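
To make the arithmetic in the last bullet concrete, here is a small Python
simulation (a sketch, not a benchmark). It assumes the worst case described
above: in every round all remaining actors GET the same version, exactly one
PUT succeeds, and the rest fail and retry. Under that assumption the exact
count is N(N+1)/2 attempts, which is 55 for N = 10 and grows as roughly .5N^2.
The function names are made up for illustration.

```python
def get_modify_put_attempts(n_actors):
    """Count GET-modify-PUT attempts when all actors always race each other."""
    resource_version = 0
    pending = n_actors
    attempts = 0
    while pending:
        observed = resource_version  # every pending actor GETs the same version
        winners = 0
        for _ in range(pending):
            attempts += 1  # one GET followed by one PUT
            if observed == resource_version:
                resource_version += 1  # PUT accepted; the version moves on
                winners += 1
            # otherwise the PUT is rejected as stale and the actor must retry
        pending -= winners
    return attempts


def apply_attempts(n_actors):
    """With SSA and no field-level conflicts, each actor sends one apply."""
    return n_actors


if __name__ == "__main__":
    for n in (2, 10, 100):
        print(n, get_modify_put_attempts(n), apply_attempts(n))
```

The gap widens quickly: the first column of counts grows quadratically, the
second only linearly.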

## How can I use SSA?

### Users

Use `kubectl apply --server-side`! Soon we (SIG API Machinery) hope to make this
the default and remove the “client-side” apply completely!

### Controller authors

There are two main categories here, but for both of them, **you should probably
_force conflicts_ when using SSA**. This is because your controller probably
doesn’t know what to do when some other entity in the system has a different
desire than your controller about a particular field. (See the [CI/CD
section](#ci-cd-systems), though!)
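
For a rough idea of what a forced **apply** looks like on the wire, here is a
Python sketch that only builds the request and does not send it. An apply is a
PATCH with the `application/apply-patch+yaml` content type, a required
`fieldManager` query parameter, and `force=true` when you want to force
conflicts. The helper name, the manager name `example-controller`, and the
Deployment are all hypothetical; the body is serialized as JSON, which is also
valid YAML.

```python
import json
from urllib.parse import urlencode


def build_apply_request(api_path, obj, field_manager, force=True):
    """Describe a server-side apply call as a plain dict (nothing is sent)."""
    query = {"fieldManager": field_manager}
    if force:
        query["force"] = "true"  # take ownership of conflicting fields
    return {
        "method": "PATCH",
        "url": f"{api_path}?{urlencode(query)}",
        "headers": {"Content-Type": "application/apply-patch+yaml"},
        "body": json.dumps(obj),
    }


# Only the fields this (hypothetical) controller cares about are included.
desired = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "web", "namespace": "default"},
    "spec": {"replicas": 3},
}

request = build_apply_request(
    "/apis/apps/v1/namespaces/default/deployments/web",
    desired,
    field_manager="example-controller",
)

if __name__ == "__main__":
    print(request["method"], request["url"])
```

A CI/CD system would call the same helper with `force=False`, so that conflicts
surface as errors instead of being overridden.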

#### Controllers that use either a GET-modify-PUT sequence or a PATCH {#get-modify-put-patch-controllers}

This kind of controller GETs an object (possibly from a
[**watch**](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes)),
modifies it, and then PUTs it back to write its changes. Sometimes it constructs
a custom PATCH, but the semantics are the same. Most existing controllers
(especially those in-tree) work like this.

If your controller is perfect, great! You don’t need to change it. But if you do
want to change it, you can take advantage of the new client library’s _extract_
workflow: that is, **get** the existing object, extract your existing desires,
make modifications, and re-**apply**. For many controllers that were computing
the smallest API changes possible, this will be a minor update to the existing
implementation.

This workflow avoids the failure mode of accidentally trying to own every field
in the object, which is what happens if you just GET the object, make changes,
and then **apply**. (Note that the server will notice you did this and reject
your change!)
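
Here is a deliberately simplified Python sketch of the idea behind _extract_:
use the `fieldsV1` entry recorded for your manager in `metadata.managedFields`
to pull out only the fields you previously applied. It is not the client
library’s implementation. It follows only plain `f:<name>` field entries and
ignores the list-item (`k:`, `v:`, `i:`) and `.` entries, and the manager name
and object are hypothetical.

```python
def owned_subset(obj, fields):
    """Copy the parts of obj named by a fieldsV1 tree ('f:' keys only)."""
    out = {}
    for key, children in fields.items():
        if not key.startswith("f:"):
            continue  # '.', 'k:', 'v:' and 'i:' entries are skipped in this sketch
        name = key[2:]
        if name not in obj:
            continue
        value = obj[name]
        if children and isinstance(value, dict):
            out[name] = owned_subset(value, children)
        else:
            out[name] = value  # leaf field (or a list, copied whole)
    return out


def extract_for_manager(obj, manager):
    """Rebuild the apply configuration that `manager` last applied."""
    meta = obj["metadata"]
    identity = {k: meta[k] for k in ("name", "namespace") if k in meta}
    config = {"apiVersion": obj["apiVersion"], "kind": obj["kind"],
              "metadata": identity}
    for entry in meta.get("managedFields", []):
        if entry.get("manager") != manager or entry.get("operation") != "Apply":
            continue
        owned = owned_subset(obj, entry.get("fieldsV1", {}))
        config["metadata"].update(owned.pop("metadata", {}))
        config.update(owned)
    return config


live = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {
        "name": "web",
        "namespace": "default",
        "labels": {"app": "web", "team": "payments"},
        "managedFields": [
            {"manager": "example-controller", "operation": "Apply",
             "fieldsV1": {"f:metadata": {"f:labels": {"f:app": {}}},
                          "f:spec": {"f:replicas": {}}}},
            {"manager": "someone-else", "operation": "Update",
             "fieldsV1": {"f:metadata": {"f:labels": {"f:team": {}}},
                          "f:spec": {"f:paused": {}}}},
        ],
    },
    "spec": {"replicas": 3, "paused": False},
}

mine = extract_for_manager(live, "example-controller")
mine["spec"]["replicas"] = 5  # modify only what we own, then re-apply `mine`
```

Because `mine` never contains `spec.paused` or the `team` label, re-applying it
does not try to claim fields that belong to other managers.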

#### Reconstructive controllers

This kind of controller wasn’t really possible prior to SSA. The idea here is to
(whenever something changes, etc.) reconstruct from scratch the fields of the
object as the controller wishes them to be, and then **apply** the change to the
server, letting it figure out the result. I now recommend that new controllers
start out this way: it's less fiddly to say what you want an object to look like
than it is to say how you want it to change.

The client library supports this method of operation by default.

The only downside is that you may end up sending unneeded **apply** requests to
the API server, even if the object already matches your controller’s desires.
This doesn’t matter if it happens once in a while, but for extremely
high-throughput controllers, it might cause a performance problem for the
cluster, specifically the API server. No-op writes are not written to storage
(etcd) or broadcast to any watchers, so it’s not really that big of a deal. If
you’re worried about this anyway, today you could use the method explained in
the previous section, or you could still do it this way for now, and wait for an
additional client-side mechanism to suppress zero-change applies.

To get around this downside, why not GET the object and only send your **apply**
if the object needs it? Surprisingly, it doesn’t help much: a no-op **apply** is
not very much more work for the API server than an extra GET, and an **apply**
that changes things is cheaper than that same **apply** with a preceding GET.
Worse, since it is a distributed system, something could change between your GET
and **apply**, invalidating your computation. Instead, we can use this
optimization on an object retrieved from a cache; then it legitimately will
reduce load on the system (at the cost of a delay when a change is needed and
the cache is a bit behind).
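
One way the cache-based check could look, as a minimal Python sketch: treat the
reconstructed object as a set of desired fields and skip the **apply** when the
cached copy already contains all of them. The function names are hypothetical,
lists are compared as whole values, and field ownership is not considered, so a
real controller may want something stricter.

```python
def is_satisfied(desired, actual):
    """True if every field in `desired` is present in `actual` with the same value."""
    if isinstance(desired, dict):
        return isinstance(actual, dict) and all(
            key in actual and is_satisfied(value, actual[key])
            for key, value in desired.items()
        )
    return desired == actual  # scalars and lists are compared whole


def needs_apply(cached, desired):
    """Decide from a (possibly stale) cached object whether to send an apply."""
    return cached is None or not is_satisfied(desired, cached)


desired = {"metadata": {"labels": {"app": "web"}}, "spec": {"replicas": 3}}
cached = {
    "metadata": {"name": "web", "labels": {"app": "web", "team": "payments"}},
    "spec": {"replicas": 3, "paused": False},
    "status": {"readyReplicas": 3},
}

if __name__ == "__main__":
    print(needs_apply(cached, desired))  # extra fields in the cache are fine
```

If the cache is behind, the worst outcome is an unnecessary or slightly delayed
apply, which matches the trade-off described above.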

#### CI/CD systems {#ci-cd-systems}

Continuous integration (CI) and/or continuous deployment (CD) systems are a
special kind of controller that does something like reading manifests from
source control (such as a Git repo) and automatically pushing them into the
cluster. Perhaps the CI/CD process first generates manifests from a template,
then runs some tests, and then deploys a change. Typically, users are the
entities pushing changes into source control, although that’s not necessarily
always the case.

Some systems like this continuously reconcile with the cluster; others may only
operate when a change is pushed to the source control system. The following
considerations are important for both, but more so for the continuously
reconciling kind.

CI/CD systems are literally controllers, but for the purpose of **apply**, they
are more like users, and unlike other controllers, they need to pay attention to
conflicts. Reasoning:
* Abstractly, CI/CD systems can change anything, which means they could conflict
  with **any** controller out there. The recommendation that controllers force
  conflicts assumes that controllers change a limited number of things and
  you can be reasonably sure that they won’t fight with other controllers about
  those things; that’s clearly not the case for CI/CD controllers.
* Concrete example: imagine the CI/CD system wants `.spec.replicas` for some
  Deployment to be 3, because that is the value that is checked into source
  control; however, there is also a HorizontalPodAutoscaler (HPA) that targets
  the same Deployment. The HPA computes a target scale and decides that there
  should be 10 replicas. Which should win? I just said that most controllers,
  including the HPA, should ignore conflicts. The HPA has no idea if it has been
  enabled incorrectly, and the HPA has no convenient way of informing users of
  errors.
* The other common cause of a CI/CD system getting a conflict is probably when
  it is trying to overwrite a hot-fix (hand-rolled patch) placed there by a
  system admin / SRE / dev-on-call. You almost certainly don’t want to override
  that automatically.
* Of course, sometimes an SRE makes an accidental change, or a dev makes an
  unauthorized change; those you do want to notice and overwrite. However, the
  CI/CD system can’t tell the difference between these last two cases.

Hopefully this convinces you that CI/CD systems need error paths: a way to
back-propagate these conflict errors to humans. In fact, they should have this
already; certainly continuous integration systems need some way to report that
tests are failing. But maybe I can also say something about how _humans_ can
deal with errors:
* Reject the hotfix: the (human) administrator of the CI/CD system observes the
  error, and manually force-applies the manifest in question. Then the CI/CD
  system will be able to apply the manifest successfully and become a co-owner.

  Optional: then the administrator applies a blank manifest (just the object
  type / namespace / name) to relinquish any fields they became a manager for.
  If this step is omitted, there's some chance the administrator will end up
  owning fields and causing an unwanted future conflict.

  Note: why an administrator? We're assuming that developers who ordinarily
  push to the CI/CD system and/or its source control system may not have
  permissions to push directly to the cluster.
* Accept the hotfix: the author of the change in question sees the conflict, and
  edits their change to accept the value running in production.
* Accept then reject: as in the accept option, but after that manifest is
  applied, and the CI/CD queue owns everything again (so no conflicts), re-apply
  the original manifest.
* I can also imagine the CI/CD system permitting you to mark a manifest as
  “force conflicts” somehow; if there’s demand for this we could consider making
  a more standardized way to do this. A rigorous version of this which lets you
  declare exactly which conflicts you intend to force would require support from
  the API server; in lieu of that, you can make a second manifest with only that
  subset of fields.
* Future work: we could imagine an especially advanced CI/CD system that could
  parse `metadata.managedFields` data to see who or what they are conflicting
  with, over what fields, and decide whether or not to ignore the conflict. In
  fact, this information is also presented in any conflict errors, though
  perhaps not in an easily machine-parseable format. We (SIG API Machinery)
  mostly didn't expect that people would want to take this approach, so we
  would love to know if in fact people want/need the features implied by this
  approach, such as the ability, when **apply**ing, to request an override of
  certain conflicts but not others.

If this sounds like an approach you'd want to take for your own controller,
come talk to SIG API Machinery!
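
As a rough illustration of the “future work” idea, the Python sketch below
walks `metadata.managedFields` to list which managers currently own a given
field. It is a simplification that follows only plain `f:<name>` entries of
`fieldsV1` (no list items), and the manager names `example-cicd` and
`example-hpa` are invented for the example.

```python
def managers_of(obj, path):
    """Return the managers whose fieldsV1 tree contains the field at `path`."""
    owners = []
    for entry in obj.get("metadata", {}).get("managedFields", []):
        node = entry.get("fieldsV1", {})
        for name in path:
            node = node.get("f:" + name)
            if node is None:
                break  # this manager does not own the field
        else:
            owners.append(entry["manager"])
    return owners


deployment = {
    "metadata": {
        "name": "web",
        "managedFields": [
            {"manager": "example-cicd", "operation": "Apply",
             "fieldsV1": {"f:spec": {"f:template": {}, "f:selector": {}}}},
            {"manager": "example-hpa", "operation": "Update",
             "fieldsV1": {"f:spec": {"f:replicas": {}}}},
        ],
    },
    "spec": {"replicas": 10},
}

if __name__ == "__main__":
    print(managers_of(deployment, ["spec", "replicas"]))
```

A CI/CD system could use a lookup like this to decide, for example, to drop
`.spec.replicas` from its manifest when an autoscaler owns it, and to report
every other conflict to a human.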

Happy **apply**ing!