Commit 0c4a1d5

committed
Update with more design info
1 parent 58e1fa3 commit 0c4a1d5

File tree

1 file changed

+144
-168
lines changed

keps/sig-cli/3805-ssa-default/README.md

Lines changed: 144 additions & 168 deletions
Original file line numberDiff line numberDiff line change
@@ -10,7 +10,7 @@
1010
- [User Stories (Optional)](#user-stories-optional)
1111
- [Story 1](#story-1)
1212
- [Story 2](#story-2)
13-
- [Notes/Constraints/Caveats (Optional)](#notesconstraintscaveats-optional)
13+
- [Story 3](#story-3)
1414
- [Risks and Mitigations](#risks-and-mitigations)
1515
- [Design Details](#design-details)
1616
- [Test Plan](#test-plan)
@@ -19,6 +19,10 @@
1919
- [Integration tests](#integration-tests)
2020
- [e2e tests](#e2e-tests)
2121
- [Graduation Criteria](#graduation-criteria)
22+
- [Alpha](#alpha)
23+
- [Beta](#beta)
24+
- [GA](#ga)
25+
- [Deprecation](#deprecation)
2226
- [Upgrade / Downgrade Strategy](#upgrade--downgrade-strategy)
2327
- [Version Skew Strategy](#version-skew-strategy)
2428
- [Production Readiness Review Questionnaire](#production-readiness-review-questionnaire)
@@ -64,105 +68,144 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
6468

6569
## Summary
6670

67-
<!--
68-
This section is incredibly important for producing high-quality, user-focused
69-
documentation such as release notes or a development roadmap. It should be
70-
possible to collect this information before implementation begins, in order to
71-
avoid requiring implementors to split their attention between writing release
72-
notes and implementing the feature itself. KEP editors and SIG Docs
73-
should help to ensure that the tone and content of the `Summary` section is
74-
useful for a wide audience.
75-
76-
A good summary is probably at least a paragraph in length.
77-
78-
Both in this section and below, follow the guidelines of the [documentation
79-
style guide]. In particular, wrap lines to a reasonable length, to make it
80-
easier for reviewers to cite specific portions, and to minimize diff churn on
81-
updates.
82-
83-
[documentation style guide]: https://github.com/kubernetes/community/blob/master/contributors/guide/style-guide.md
84-
-->
71+
Server-Side Apply has been a major new feature of Kubernetes, and has
72+
landed as GA in Kubernetes v1.22. Unfortunately, while the feature is
73+
accessible through `kubectl apply --server-side=true`, new objects,
74+
existing objects, and even previously server-side applied objects are
75+
still client-side applied by default unless the proper flag value is
76+
specified. The main reason is that Server-Side Apply is not entirely
77+
backwards compatible with client-side, and breaking users of `kubectl
78+
apply` now would come with a cost.
8579

86-
## Motivation
80+
We're proposing a way forward toward enabling the `--server-side` flag
81+
by default.
8782

88-
<!--
89-
This section is for explicitly listing the motivation, goals, and non-goals of
90-
this KEP. Describe why the change is important and the benefits to users. The
91-
motivation section can optionally provide links to [experience reports] to
92-
demonstrate the interest in a KEP within the wider Kubernetes community.
9383

94-
[experience reports]: https://github.com/golang/go/wiki/ExperienceReports
95-
-->
84+
## Motivation
85+
86+
Changing from client-side to server-side apply is difficult because it
87+
will break some people's expectations and might break some scripts too,
88+
but the change is nonetheless needed for its benefits. For users:
89+
- Users are missing out on the Server-Side Apply feature because they
90+
don't know they can use it
91+
- Having both server-side and client-side features is confusing for users
92+
who've tried the feature, and changing from one to the other can cause
93+
odd behaviors
94+
- Some kubectl flags are meant for the historical client-side apply
95+
(CSA) while others are meant for server-side apply (SSA), adding confusion
96+
- strategic-merge-patch (SMP) is unmaintained and broken, a
97+
frustrating situation for users
98+
99+
For maintainers:
100+
- Many workflows need to be considered for both paradigms, making
101+
maintenance of kubectl more difficult
102+
- kubectl has a lot of code to maintain SMP and other client-side apply
103+
mechanisms. Removing the feature can greatly reduce the complexity of
104+
kubectl
96105

97106
### Goals
98107

99-
<!--
100-
List the specific goals of the KEP. What is it trying to achieve? How will we
101-
know that this has succeeded?
102-
-->
108+
The goal is to increase usage of Server-Side Apply so that users can
109+
benefit from the feature, mostly by turning the feature on by default.
103110

104111
### Non-Goals
105112

106-
<!--
107-
What is out of scope for this KEP? Listing non-goals helps to focus discussion
108-
and make progress.
109-
-->
113+
Good question.
110114

111115
## Proposal
112116

113-
<!--
114-
This is where we get down to the specifics of what the proposal actually is.
115-
This should have enough detail that reviewers can understand exactly what
116-
you're proposing, but should not include things like API designs or
117-
implementation. What is the desired outcome and how do we measure success?.
118-
The "Design Details" section below is for the real
119-
nitty-gritty.
120-
-->
117+
The feature consists of adding a new `auto` value to the `--server-side`
118+
flag for `kubectl apply` (and the corresponding `kubectl diff`) and making
119+
this the default value. All the other values for the flag would continue
120+
to work as expected (for reference, `false` continues to client-side
121+
apply, and `true` continues to server-side apply).
122+
123+
The meaning of `auto` goes as follows:
124+
- Resources continue to be fetched (GET) beforehand
125+
- If the resource has a kubectl `last-applied` annotation, we infer that
126+
the resource is client-side applied, and we continue to client-side
127+
apply that resource
128+
- If the resource is new (GET returns 404), the resource is server-side applied
129+
- If the resource already exists but doesn't have the `last-applied`
130+
annotation, the resource is server-side applied
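The rules for `auto` listed above can be sketched as a small decision function. This is only an illustration, not the kubectl implementation: `ApplyMode`, `decideApplyMode`, and the `exists` parameter are hypothetical names introduced here, and the annotation key is assumed to be the one `kubectl apply` writes for client-side apply today.

```go
package main

import "fmt"

// lastAppliedAnnotation is the key client-side apply stores its last-applied
// configuration under (assumed here to be what `auto` would look for).
const lastAppliedAnnotation = "kubectl.kubernetes.io/last-applied-configuration"

// ApplyMode is a hypothetical type naming the two apply strategies.
type ApplyMode string

const (
	ClientSide ApplyMode = "client-side"
	ServerSide ApplyMode = "server-side"
)

// decideApplyMode is a hypothetical helper mirroring the `auto` rules.
// exists is false when the preliminary GET returned 404; annotations are
// the annotations found on the live object (may be nil).
func decideApplyMode(exists bool, annotations map[string]string) ApplyMode {
	if !exists {
		// New resource: server-side apply.
		return ServerSide
	}
	if _, ok := annotations[lastAppliedAnnotation]; ok {
		// Previously client-side applied: keep client-side applying it.
		return ClientSide
	}
	// Existing resource without the annotation: server-side apply.
	return ServerSide
}

func main() {
	fmt.Println("new resource:", decideApplyMode(false, nil))
	fmt.Println("has last-applied:", decideApplyMode(true, map[string]string{lastAppliedAnnotation: "{}"}))
	fmt.Println("no last-applied:", decideApplyMode(true, map[string]string{"team": "a"}))
}
```

Note that only the presence of the annotation matters in this sketch; its content is never inspected.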
121131

122132
### User Stories (Optional)
123133

124-
<!--
125-
Detail the things that people will be able to do if this KEP is implemented.
126-
Include as much detail as possible so that people can understand the "how" of
127-
the system. The goal here is to make this feel real for users without getting
128-
bogged down.
129-
-->
130-
131134
#### Story 1
132135

136+
User 1 starts using `kubectl` for the first time or on a new project.
137+
Everything will always be server-side applied, so they get conflict
138+
detection and all the other benefits of server-side apply.
139+
133140
#### Story 2
134141

135-
### Notes/Constraints/Caveats (Optional)
142+
User 2 uses server-side apply all the time. They may have
143+
`--server-side` inserted in some scripts or configuration, or they must
144+
remember to always include the flag. Thanks to this feature, forgetting
145+
the flag now does the right thing by server-side applying all their
146+
resources as expected, causing less confusion and less risk of mistakes.
136147

137-
<!--
138-
What are the caveats to the proposal?
139-
What are some important details that didn't come across above?
140-
Go in to as much detail as necessary here.
141-
This might be a good place to talk about core concepts and how they relate.
142-
-->
148+
#### Story 3
143149

144-
### Risks and Mitigations
150+
User 3 has never used server-side apply and always used the default
151+
value of `--server-side`. Applying their existing resources will
152+
continue to work as expected. New resources, on the other hand, will at
153+
first apply the same way but may eventually run into a conflict. They
154+
may have to update scripts to continue to work with
155+
`--server-side=false` if they want to, or they can update their
156+
scripts/workflow to address the conflicts properly.
145157

146-
<!--
147-
What are the risks of this proposal, and how do we mitigate? Think broadly.
148-
For example, consider both security and how this will impact the larger
149-
Kubernetes ecosystem.
150-
151-
How will security be reviewed, and by whom?
152-
153-
How will UX be reviewed, and by whom?
158+
### Risks and Mitigations
154159

155-
Consider including folks who also work outside the SIG or subproject.
156-
-->
160+
This design has no impact on security, but some consequences on UX, since the
161+
default UX of `kubectl apply` will change:
162+
- Currently, it can never fail because of a conflict, since conflicts
163+
are overridden by default. The new default for Server-Side Apply is to
164+
fail on conflicts, unless the `--force-conflicts` flag is used. While
165+
people can re-run the command with the flag, this might impact CI/CD
166+
scripts that use `kubectl apply` directly, since they may not have a
167+
break-glass way to address that.
168+
- CSA injects a `last-applied` annotation into the objects that it
169+
applies, but it doesn't make sense in the context of SSA. An API or
170+
tool that would use this annotation to detect which resources have
171+
been applied would fail to find it for server-side applied objects.
172+
Note that relying on the annotation this way is strongly discouraged.
173+
- CSA users expect all excluded fields to be removed from the applied
174+
object on the server. SSA has more complicated semantics for removing
175+
fields (e.g. if another manager also owns those fields)
176+
177+
<!-- Missing mitigation here -->
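To illustrate the last risk above, here is a deliberately simplified model of how omitting a field behaves under SSA: the applier gives up ownership of the field, and the field is only removed when no other manager still owns it. The types and the `dropOmitted` helper are hypothetical and ignore conflicts, nested fields, and list semantics; this is a sketch of the idea, not the actual managed-fields implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// fieldOwners maps a field path to the set of field managers owning it.
// This flat representation is an assumption made for the example.
type fieldOwners map[string]map[string]bool

// dropOmitted releases manager's ownership of every owned field that is
// absent from applied, and returns the fields left without any owner,
// i.e. the fields that would actually be removed from the object.
func dropOmitted(owners fieldOwners, manager string, applied map[string]bool) []string {
	var removed []string
	for field, managers := range owners {
		if !managers[manager] || applied[field] {
			continue // not ours, or still present in the applied config
		}
		delete(managers, manager)
		if len(managers) == 0 {
			delete(owners, field)
			removed = append(removed, field)
		}
	}
	sort.Strings(removed)
	return removed
}

func main() {
	owners := fieldOwners{
		"spec.template":        {"kubectl": true},
		"spec.replicas":        {"kubectl": true, "other-manager": true},
		"metadata.labels.tier": {"kubectl": true},
	}
	// The new configuration only mentions spec.template.
	removed := dropOmitted(owners, "kubectl", map[string]bool{"spec.template": true})
	fmt.Println("removed:", removed)
	fmt.Println("spec.replicas still owned by:", owners["spec.replicas"])
}
```

In this model `spec.replicas` stays on the object because `other-manager` still owns it, which is the behavior a CSA user may not expect.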
157178

158179
## Design Details
159180

160-
<!--
161-
This section should contain enough information that the specifics of your
162-
change are understandable. This may include API specs (though not always
163-
required) or even code snippets. If there's any ambiguity about HOW your
164-
proposal will be implemented, this is the place to discuss them.
165-
-->
181+
As mentioned in the proposal, the new value of `auto` will detect if a
182+
resource has been client-side applied before and will continue to
183+
client-side apply these resources, while all other resources will be
184+
server-side applied. This means that new resources will be server-side
185+
applied by default.
186+
187+
The `last-applied` annotation is used to detect previously client-side
188+
applied objects; hence it is not inserted for server-side applied objects.
189+
190+
The interaction with other `kubectl apply` flags can be described as
191+
follows:
192+
- `--force-conflicts` only applies to existing resources that are
193+
server-side applied
194+
- `--field-manager` applies to all resources
195+
- The behavior of the `--prune` family of flags is still to be determined
196+
- `--overwrite` only applies to client-side applied resources
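The flag interactions listed above can be written down as a small lookup, which may help when reasoning about mixed client-side and server-side resources in a single `kubectl apply` run. The `Applicability` type and `flagApplicability` function are hypothetical names for this sketch; the field manager flag is spelled `--field-manager`, as kubectl spells it, and anything the list does not settle (including the `--prune` family) is reported as undetermined rather than guessed.

```go
package main

import "fmt"

// Applicability says whether a flag affects a given resource, or that the
// design text leaves the question open.
type Applicability string

const (
	Applies      Applicability = "applies"
	Ignored      Applicability = "ignored"
	Undetermined Applicability = "undetermined"
)

// flagApplicability is a hypothetical helper encoding the list above.
// serverSide reports whether the resource is handled via server-side apply;
// exists reports whether the resource is already present on the server.
func flagApplicability(flag string, serverSide, exists bool) Applicability {
	switch flag {
	case "--force-conflicts":
		if serverSide && exists {
			return Applies
		}
		return Ignored
	case "--field-manager":
		return Applies
	case "--overwrite":
		if !serverSide {
			return Applies
		}
		return Ignored
	default:
		// Includes the --prune family, whose behavior is not specified.
		return Undetermined
	}
}

func main() {
	for _, f := range []string{"--force-conflicts", "--field-manager", "--overwrite", "--prune"} {
		fmt.Println(f, "on a new server-side applied resource:", flagApplicability(f, true, false))
	}
}
```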
197+
198+
Some other commands might be impacted. `kubectl create` (and family)
199+
notably have a `--save-config` flag that creates the last-applied
200+
annotation. While I don't know how many people actually use the flag,
201+
the idea of saving this as a config is confusing, since people don't
202+
actually have the file, so the situation doesn't fit the `apply`
203+
workflow well. We suggest adding a warning when this flag is used, as
204+
well as updating its documentation to suggest not using it.
205+
206+
Because `kubectl diff` is supposed to mirror the behavior of `kubectl
207+
apply` as closely as possible, the change will also be done for that
208+
command.
166209

167210
### Test Plan
168211

@@ -228,121 +271,54 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
228271

229272
### Graduation Criteria
230273

231-
<!--
232-
**Note:** *Not required until targeted at a release.*
233-
234-
Define graduation milestones.
235-
236-
These may be defined in terms of API maturity, [feature gate] graduations, or as
237-
something else. The KEP should keep this high-level with a focus on what
238-
signals will be looked at to determine graduation.
239-
240-
Consider the following in developing the graduation criteria for this enhancement:
241-
- [Maturity levels (`alpha`, `beta`, `stable`)][maturity-levels]
242-
- [Feature gate][feature gate] lifecycle
243-
- [Deprecation policy][deprecation-policy]
244-
245-
Clearly define what graduation means by either linking to the [API doc
246-
definition](https://kubernetes.io/docs/concepts/overview/kubernetes-api/#api-versioning)
247-
or by redefining what graduation means.
248-
249-
In general we try to use the same stages (alpha, beta, GA), regardless of how the
250-
functionality is accessed.
251-
252-
[feature gate]: https://git.k8s.io/community/contributors/devel/sig-architecture/feature-gates.md
253-
[maturity-levels]: https://git.k8s.io/community/contributors/devel/sig-architecture/api_changes.md#alpha-beta-and-stable-versions
254-
[deprecation-policy]: https://kubernetes.io/docs/reference/using-api/deprecation-policy/
255-
256-
Below are some examples to consider, in addition to the aforementioned [maturity levels][maturity-levels].
257-
258274
#### Alpha
259275

260-
- Feature implemented behind a feature flag
261-
- Initial e2e tests completed and enabled
276+
Alpha is the current level of the feature since server-side apply is
277+
currently enabled by default in Kubernetes, but only enabled on demand by
278+
kubectl users.
279+
280+
The feature already has a fair amount of usage though since tools
281+
(sometimes outside of kubectl) have used it both as "clients" and in
282+
controllers.
262283

263284
#### Beta
264285

265-
- Gather feedback from developers and surveys
266-
- Complete features A, B, C
267-
- Additional tests are in Testgrid and linked in KEP
286+
Server-Side Apply has a very limited set of bugs or feature requests at
287+
this point and is definitely mature. Enabling server-side apply by default
288+
will allow increased usage and reduce the burden on kubectl of maintaining
289+
both mechanisms.
268290

269291
#### GA
270292

271-
- N examples of real-world usage
272-
- N installs
273-
- More rigorous forms of testing—e.g., downgrade tests and scalability tests
274-
- Allowing time for feedback
293+
Kubectl doesn't have real-time metrics for usage. The decision to move
294+
to server-side entirely by default (if ever enabled) will be driven by
295+
bug reports and complaints from customers, as well as by the ability to
296+
migrate existing client-side usage to server-side.
275297

276-
**Note:** Generally we also wait at least two releases between beta and
277-
GA/stable, because there's no opportunity for user feedback, or even bug reports,
278-
in back-to-back releases.
279-
280-
**For non-optional features moving to GA, the graduation criteria must include
281-
[conformance tests].**
298+
#### Deprecation
282299

283-
[conformance tests]: https://git.k8s.io/community/contributors/devel/sig-architecture/conformance-tests.md
300+
We do not intend to deprecate the `--server-side` flag for now, but we
301+
might remove it in the long term.
284302

285-
#### Deprecation
303+
The same applies to `--save-config` and other client-side related
304+
flags in kubectl, which we might remove.
286305

287-
- Announce deprecation and support policy of the existing flag
288-
- Two versions passed since introducing the functionality that deprecates the flag (to address version skew)
289-
- Address feedback on usage/changed behavior, provided on GitHub issues
290-
- Deprecate the flag
291-
-->
306+
No deprecation is planned at this time, though.
292307

293308
### Upgrade / Downgrade Strategy
294309

295-
<!--
296-
If applicable, how will the component be upgraded and downgraded? Make sure
297-
this is in the test plan.
298-
299-
Consider the following in developing an upgrade/downgrade strategy for this
300-
enhancement:
301-
- What changes (in invocations, configurations, API use, etc.) is an existing
302-
cluster required to make on upgrade, in order to maintain previous behavior?
303-
- What changes (in invocations, configurations, API use, etc.) is an existing
304-
cluster required to make on upgrade, in order to make use of the enhancement?
305-
-->
310+
While upgrade / downgrade doesn't really apply to a kubectl feature, we
311+
currently have an upgrade (and, to some extent, downgrade) feature in kubectl to
312+
go from client-side to server-side apply. The upgrade and downgrade
313+
work well in the nominal cases but fail in special cases. Enabling
314+
server-side by default is also intended to address that problem.
306315

307316
### Version Skew Strategy
308317

309-
<!--
310-
If applicable, how will the component handle version skew with other
311-
components? What are the guarantees? Make sure this is in the test plan.
312-
313-
Consider the following in developing a version skew strategy for this
314-
enhancement:
315-
- Does this enhancement involve coordinating behavior in the control plane and
316-
in the kubelet? How does an n-2 kubelet without this feature available behave
317-
when this feature is used?
318-
- Will any other components on the node change? For example, changes to CSI,
319-
CRI or CNI may require updating that component before the kubelet.
320-
-->
318+
N/A.
321319

322320
## Production Readiness Review Questionnaire
323321

324-
<!--
325-
326-
Production readiness reviews are intended to ensure that features merging into
327-
Kubernetes are observable, scalable and supportable; can be safely operated in
328-
production environments, and can be disabled or rolled back in the event they
329-
cause increased failures in production. See more in the PRR KEP at
330-
https://git.k8s.io/enhancements/keps/sig-architecture/1194-prod-readiness.
331-
332-
The production readiness review questionnaire must be completed and approved
333-
for the KEP to move to `implementable` status and be included in the release.
334-
335-
In some cases, the questions below should also have answers in `kep.yaml`. This
336-
is to enable automation to verify the presence of the review, and to reduce review
337-
burden and latency.
338-
339-
The KEP must have a approver from the
340-
[`prod-readiness-approvers`](http://git.k8s.io/enhancements/OWNERS_ALIASES)
341-
team. Please reach out on the
342-
[#prod-readiness](https://kubernetes.slack.com/archives/CPNHUMN74) channel if
343-
you need any help or guidance.
344-
-->
345-
346322
### Feature Enablement and Rollback
347323

348324
###### How can this feature be enabled / disabled in a live cluster?
