Commit 2d6d22d

Merge pull request #31440 from MikeSpreitzer/note-apf-autoupdate
Catch APF description up with recent developments
2 parents d7e1bca + 2bfc318 commit 2d6d22d

File tree

2 files changed (+156 −82 lines)


content/en/docs/concepts/cluster-administration/flow-control.md

Lines changed: 155 additions & 81 deletions
````diff
@@ -42,21 +42,21 @@ Fairness feature enabled.
 ## Enabling/Disabling API Priority and Fairness
 
 The API Priority and Fairness feature is controlled by a feature gate
-and is enabled by default. See
-[Feature Gates](/docs/reference/command-line-tools-reference/feature-gates/)
+and is enabled by default. See [Feature
+Gates](/docs/reference/command-line-tools-reference/feature-gates/)
 for a general explanation of feature gates and how to enable and
 disable them. The name of the feature gate for APF is
 "APIPriorityAndFairness". This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) a `v1beta1`
-version, enabled by default. You can disable the feature
-gate and API group v1beta1 version by adding the following
+`v1alpha1` version, disabled by default, and (b) `v1beta1` and
+`v1beta2` versions, enabled by default. You can disable the feature
+gate and API group beta versions by adding the following
 command-line flags to your `kube-apiserver` invocation:
 
 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
 # …and other flags as usual
 ```
````
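The two flags above can be composed as a small shell helper; this is only an illustrative sketch of the flag values shown in the diff (verify them against your kube-apiserver version), not part of the commit.

```shell
# Sketch: emit the kube-apiserver flags from the diff above that turn off
# APF and both beta versions of its API group (flag names taken from the
# diff; the helper itself is hypothetical).
apf_disable_flags() {
  printf -- '--feature-gates=APIPriorityAndFairness=false '
  printf -- '--runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false'
}

apf_disable_flags   # prints both flags on one line
```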

```diff
@@ -127,86 +127,13 @@ any of the limitations imposed by this feature. These exemptions prevent an
 improperly-configured flow control configuration from totally disabling an API
 server.
 
-## Defaults
-
-The Priority and Fairness feature ships with a suggested configuration that
-should suffice for experimentation; if your cluster is likely to
-experience heavy load then you should consider what configuration will work
-best. The suggested configuration groups requests into five priority
-classes:
-
-* The `system` priority level is for requests from the `system:nodes` group,
-i.e. Kubelets, which must be able to contact the API server in order for
-workloads to be able to schedule on them.
-
-* The `leader-election` priority level is for leader election requests from
-built-in controllers (in particular, requests for `endpoints`, `configmaps`,
-or `leases` coming from the `system:kube-controller-manager` or
-`system:kube-scheduler` users and service accounts in the `kube-system`
-namespace). These are important to isolate from other traffic because failures
-in leader election cause their controllers to fail and restart, which in turn
-causes more expensive traffic as the new controllers sync their informers.
-
-* The `workload-high` priority level is for other requests from built-in
-controllers.
-
-* The `workload-low` priority level is for requests from any other service
-account, which will typically include all requests from controllers running in
-Pods.
-
-* The `global-default` priority level handles all other traffic, e.g.
-interactive `kubectl` commands run by nonprivileged users.
-
-Additionally, there are two PriorityLevelConfigurations and two FlowSchemas that
-are built in and may not be overwritten:
-
-* The special `exempt` priority level is used for requests that are not subject
-to flow control at all: they will always be dispatched immediately. The
-special `exempt` FlowSchema classifies all requests from the `system:masters`
-group into this priority level. You may define other FlowSchemas that direct
-other requests to this priority level, if appropriate.
-
-* The special `catch-all` priority level is used in combination with the special
-`catch-all` FlowSchema to make sure that every request gets some kind of
-classification. Typically you should not rely on this catch-all configuration,
-and should create your own catch-all FlowSchema and PriorityLevelConfiguration
-(or use the `global-default` configuration that is installed by default) as
-appropriate. To help catch configuration errors that miss classifying some
-requests, the mandatory `catch-all` priority level only allows one concurrency
-share and does not queue requests, making it relatively likely that traffic
-that only matches the `catch-all` FlowSchema will be rejected with an HTTP 429
-error.
-
-## Health check concurrency exemption
-
-The suggested configuration gives no special treatment to the health
-check requests on kube-apiservers from their local kubelets --- which
-tend to use the secured port but supply no credentials. With the
-suggested config, these requests get assigned to the `global-default`
-FlowSchema and the corresponding `global-default` priority level,
-where other traffic can crowd them out.
-
-If you add the following additional FlowSchema, this exempts those
-requests from rate limiting.
-
-{{< caution >}}
-Making this change also allows any hostile party to then send
-health-check requests that match this FlowSchema, at any volume they
-like. If you have a web traffic filter or similar external security
-mechanism to protect your cluster's API server from general internet
-traffic, you can configure rules to block any health check requests
-that originate from outside your cluster.
-{{< /caution >}}
-
-{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
-
 ## Resources
 
 The flow control API involves two kinds of resources.
-[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta1-flowcontrol-apiserver-k8s-io)
+[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta2-flowcontrol-apiserver-k8s-io)
 define the available isolation classes, the share of the available concurrency
 budget that each can handle, and allow for fine-tuning queuing behavior.
-[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta1-flowcontrol-apiserver-k8s-io)
+[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta2-flowcontrol-apiserver-k8s-io)
 are used to classify individual inbound requests, matching each to a
 single PriorityLevelConfiguration. There is also a `v1alpha1` version
 of the same API group, and it has the same Kinds with the same syntax and
```
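To make the two Kinds concrete, here is a minimal FlowSchema at the `v1beta2` version the hunk switches to. The object is purely illustrative (the name and service account are hypothetical, not part of this commit); field names follow the flowcontrol API but should be checked against the generated API reference before use.

```yaml
# Illustrative only: route requests from one (hypothetical) service account
# to the suggested workload-low priority level.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta2
kind: FlowSchema
metadata:
  name: example-flow-schema        # hypothetical name
spec:
  priorityLevelConfiguration:
    name: workload-low             # must name an existing PriorityLevelConfiguration
  matchingPrecedence: 1000         # lower numbers are matched first
  distinguisherMethod:
    type: ByUser                   # flows are distinguished per requesting user
  rules:
    - subjects:
        - kind: ServiceAccount
          serviceAccount:
            name: example-sa       # hypothetical
            namespace: default
      resourceRules:
        - verbs: ["*"]
          apiGroups: ["*"]
          resources: ["*"]
          namespaces: ["*"]
```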
```diff
@@ -329,6 +256,153 @@ omitted entirely), in which case all requests matched by this FlowSchema will be
 considered part of a single flow. The correct choice for a given FlowSchema
 depends on the resource and your particular environment.
 
+## Defaults
+
+Each kube-apiserver maintains two sorts of APF configuration objects:
+mandatory and suggested.
+
+### Mandatory Configuration Objects
+
+The four mandatory configuration objects reflect fixed built-in
+guardrail behavior. This is behavior that the servers have before
+those objects exist, and when those objects exist their specs reflect
+this behavior. The four mandatory objects are as follows.
+
+* The mandatory `exempt` priority level is used for requests that are
+not subject to flow control at all: they will always be dispatched
+immediately. The mandatory `exempt` FlowSchema classifies all
+requests from the `system:masters` group into this priority
+level. You may define other FlowSchemas that direct other requests
+to this priority level, if appropriate.
+
+* The mandatory `catch-all` priority level is used in combination with
+the mandatory `catch-all` FlowSchema to make sure that every request
+gets some kind of classification. Typically you should not rely on
+this catch-all configuration, and should create your own catch-all
+FlowSchema and PriorityLevelConfiguration (or use the suggested
+`global-default` priority level that is installed by default) as
+appropriate. Because it is not expected to be used normally, the
+mandatory `catch-all` priority level has a very small concurrency
+share and does not queue requests.
```
```diff
+
+### Suggested Configuration Objects
+
+The suggested FlowSchemas and PriorityLevelConfigurations constitute a
+reasonable default configuration. You can modify these and/or create
+additional configuration objects if you want. If your cluster is
+likely to experience heavy load then you should consider what
+configuration will work best.
+
+The suggested configuration groups requests into six priority levels:
+
+* The `node-high` priority level is for health updates from nodes.
+
+* The `system` priority level is for non-health requests from the
+`system:nodes` group, i.e. Kubelets, which must be able to contact
+the API server in order for workloads to be able to schedule on
+them.
+
+* The `leader-election` priority level is for leader election requests from
+built-in controllers (in particular, requests for `endpoints`, `configmaps`,
+or `leases` coming from the `system:kube-controller-manager` or
+`system:kube-scheduler` users and service accounts in the `kube-system`
+namespace). These are important to isolate from other traffic because failures
+in leader election cause their controllers to fail and restart, which in turn
+causes more expensive traffic as the new controllers sync their informers.
+
+* The `workload-high` priority level is for other requests from built-in
+controllers.
+
+* The `workload-low` priority level is for requests from any other service
+account, which will typically include all requests from controllers running in
+Pods.
+
+* The `global-default` priority level handles all other traffic, e.g.
+interactive `kubectl` commands run by nonprivileged users.
+
+The suggested FlowSchemas serve to steer requests into the above
+priority levels, and are not enumerated here.
```
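The six suggested levels in the hunk above can be summarized as a lookup. This is a deliberately coarse sketch (the request-source categories are assumptions standing in for the real FlowSchema matching rules), meant only to make the mapping scannable.

```shell
# Simplified sketch: map a coarse request-source category to the suggested
# priority level it lands in (categories are hypothetical simplifications).
suggested_priority_level() {
  case "$1" in
    node-health)        echo node-high ;;       # health updates from nodes
    kubelet)            echo system ;;          # other system:nodes traffic
    leader-election)    echo leader-election ;; # kube-controller-manager / kube-scheduler
    builtin-controller) echo workload-high ;;
    service-account)    echo workload-low ;;    # controllers running in Pods
    *)                  echo global-default ;;  # e.g. interactive kubectl
  esac
}

suggested_priority_level kubelet   # prints "system"
```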
```diff
+
+### Maintenance of the Mandatory and Suggested Configuration Objects
+
+Each `kube-apiserver` independently maintains the mandatory and
+suggested configuration objects, using initial and periodic behavior.
+Thus, in a situation with a mixture of servers of different versions
+there may be thrashing as long as different servers have different
+opinions of the proper content of these objects.
+
+Each `kube-apiserver` makes an initial maintenance pass over the
+mandatory and suggested configuration objects, and after that does
+periodic maintenance (once per minute) of those objects.
+
+For the mandatory configuration objects, maintenance consists of
+ensuring that the object exists and, if it does, has the proper spec.
+The server refuses to allow a creation or update with a spec that is
+inconsistent with the server's guardrail behavior.
+
+Maintenance of suggested configuration objects is designed to allow
+their specs to be overridden. Deletion, on the other hand, is not
+respected: maintenance will restore the object. If you do not want a
+suggested configuration object then you need to keep it around but set
+its spec to have minimal consequences. Maintenance of suggested
+objects is also designed to support automatic migration when a new
+version of the `kube-apiserver` is rolled out, albeit potentially with
+thrashing while there is a mixed population of servers.
+
+Maintenance of a suggested configuration object consists of creating
+it --- with the server's suggested spec --- if the object does not
+exist. On the other hand, if the object already exists, maintenance
+behavior depends on whether the `kube-apiservers` or the users control
+the object. In the former case, the server ensures that the object's
+spec is what the server suggests; in the latter case, the spec is left
+alone.
+
+The question of who controls the object is answered by first looking
+for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
+there is such an annotation and its value is `true` then the
+kube-apiservers control the object. If there is such an annotation
+and its value is `false` then the users control the object. If
+neither of those conditions holds then the `metadata.generation` of the
+object is consulted. If that is 1 then the kube-apiservers control
+the object. Otherwise the users control the object. These rules were
+introduced in release 1.22 and their consideration of
+`metadata.generation` is for the sake of migration from the simpler
+earlier behavior. Users who wish to control a suggested configuration
+object should set its `apf.kubernetes.io/autoupdate-spec` annotation
+to `false`.
```
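The control-decision rules the hunk above documents can be sketched as a small function. This is a paraphrase of the documented logic for illustration only, not the apiserver's actual code.

```shell
# Sketch of the documented rules: who controls a suggested configuration
# object, given the value of its apf.kubernetes.io/autoupdate-spec
# annotation ("" when absent) and its metadata.generation.
who_controls() {
  local annotation="$1" generation="$2"
  if [ "$annotation" = "true" ]; then
    echo kube-apiservers
  elif [ "$annotation" = "false" ]; then
    echo users
  elif [ "$generation" = "1" ]; then
    echo kube-apiservers   # never modified since creation
  else
    echo users             # someone has edited the spec
  fi
}

who_controls "" 3   # prints "users"
```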
```diff
+
+Maintenance of a mandatory or suggested configuration object also
+includes ensuring that it has an `apf.kubernetes.io/autoupdate-spec`
+annotation that accurately reflects whether the kube-apiservers
+control the object.
+
+Maintenance also includes deleting objects that are neither mandatory
+nor suggested but are annotated
+`apf.kubernetes.io/autoupdate-spec=true`.
```
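In practice, taking control of a suggested object amounts to setting the annotation described above. A metadata fragment might look like this (the object name is just an example of a suggested object; the annotation key and value are from the text above):

```yaml
# Illustrative fragment: mark a suggested object as user-controlled so the
# kube-apiservers leave its spec alone.
apiVersion: flowcontrol.apiserver.k8s.io/v1beta2
kind: FlowSchema
metadata:
  name: global-default
  annotations:
    apf.kubernetes.io/autoupdate-spec: "false"
```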
```diff
+
+## Health check concurrency exemption
+
+The suggested configuration gives no special treatment to the health
+check requests on kube-apiservers from their local kubelets --- which
+tend to use the secured port but supply no credentials. With the
+suggested config, these requests get assigned to the `global-default`
+FlowSchema and the corresponding `global-default` priority level,
+where other traffic can crowd them out.
+
+If you add the following additional FlowSchema, this exempts those
+requests from rate limiting.
+
+{{< caution >}}
+Making this change also allows any hostile party to then send
+health-check requests that match this FlowSchema, at any volume they
+like. If you have a web traffic filter or similar external security
+mechanism to protect your cluster's API server from general internet
+traffic, you can configure rules to block any health check requests
+that originate from outside your cluster.
+{{< /caution >}}
+
+{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
+
 ## Diagnostics
 
 Every HTTP response from an API server with the priority and fairness feature
```

content/en/examples/priority-and-fairness/health-for-strangers.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,4 +1,4 @@
-apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
+apiVersion: flowcontrol.apiserver.k8s.io/v1beta2
 kind: FlowSchema
 metadata:
   name: health-for-strangers
```
