Commit 2bfc318 (parent a3a6d46)

Catch APF description up with recent developments

Primarily the change, in release 1.22, of how configuration objects are maintained. Also describe the new priority level. Also move the sections on defaults and health check configuration to follow the description of configuration objects.

File tree

2 files changed: +156, -82 lines changed

content/en/docs/concepts/cluster-administration/flow-control.md

Lines changed: 155 additions & 81 deletions
@@ -42,21 +42,21 @@ Fairness feature enabled.
 ## Enabling/Disabling API Priority and Fairness
 
 The API Priority and Fairness feature is controlled by a feature gate
-and is enabled by default. See
-[Feature Gates](/docs/reference/command-line-tools-reference/feature-gates/)
+and is enabled by default. See [Feature
+Gates](/docs/reference/command-line-tools-reference/feature-gates/)
 for a general explanation of feature gates and how to enable and
 disable them. The name of the feature gate for APF is
 "APIPriorityAndFairness". This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) a `v1beta1`
-version, enabled by default. You can disable the feature
-gate and API group v1beta1 version by adding the following
+`v1alpha1` version, disabled by default, and (b) `v1beta1` and
+`v1beta2` versions, enabled by default. You can disable the feature
+gate and API group beta versions by adding the following
 command-line flags to your `kube-apiserver` invocation:
 
 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
 # …and other flags as usual
 ```

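To confirm which versions of the flowcontrol API group a running cluster actually serves (a sketch, assuming `kubectl` is configured against that cluster):

```shell
# List the enabled versions of the flowcontrol API group;
# with the flags above, the beta versions should not appear.
kubectl api-versions | grep flowcontrol.apiserver.k8s.io
```
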
@@ -127,86 +127,13 @@ any of the limitations imposed by this feature. These exemptions prevent an
 improperly-configured flow control configuration from totally disabling an API
 server.
 
-## Defaults
-
-The Priority and Fairness feature ships with a suggested configuration that
-should suffice for experimentation; if your cluster is likely to
-experience heavy load then you should consider what configuration will work
-best. The suggested configuration groups requests into five priority
-classes:
-
-* The `system` priority level is for requests from the `system:nodes` group,
-i.e. Kubelets, which must be able to contact the API server in order for
-workloads to be able to schedule on them.
-
-* The `leader-election` priority level is for leader election requests from
-built-in controllers (in particular, requests for `endpoints`, `configmaps`,
-or `leases` coming from the `system:kube-controller-manager` or
-`system:kube-scheduler` users and service accounts in the `kube-system`
-namespace). These are important to isolate from other traffic because failures
-in leader election cause their controllers to fail and restart, which in turn
-causes more expensive traffic as the new controllers sync their informers.
-
-* The `workload-high` priority level is for other requests from built-in
-controllers.
-
-* The `workload-low` priority level is for requests from any other service
-account, which will typically include all requests from controllers running in
-Pods.
-
-* The `global-default` priority level handles all other traffic, e.g.
-interactive `kubectl` commands run by nonprivileged users.
-
-Additionally, there are two PriorityLevelConfigurations and two FlowSchemas that
-are built in and may not be overwritten:
-
-* The special `exempt` priority level is used for requests that are not subject
-to flow control at all: they will always be dispatched immediately. The
-special `exempt` FlowSchema classifies all requests from the `system:masters`
-group into this priority level. You may define other FlowSchemas that direct
-other requests to this priority level, if appropriate.
-
-* The special `catch-all` priority level is used in combination with the special
-`catch-all` FlowSchema to make sure that every request gets some kind of
-classification. Typically you should not rely on this catch-all configuration,
-and should create your own catch-all FlowSchema and PriorityLevelConfiguration
-(or use the `global-default` configuration that is installed by default) as
-appropriate. To help catch configuration errors that miss classifying some
-requests, the mandatory `catch-all` priority level only allows one concurrency
-share and does not queue requests, making it relatively likely that traffic
-that only matches the `catch-all` FlowSchema will be rejected with an HTTP 429
-error.
-
-## Health check concurrency exemption
-
-The suggested configuration gives no special treatment to the health
-check requests on kube-apiservers from their local kubelets --- which
-tend to use the secured port but supply no credentials. With the
-suggested config, these requests get assigned to the `global-default`
-FlowSchema and the corresponding `global-default` priority level,
-where other traffic can crowd them out.
-
-If you add the following additional FlowSchema, this exempts those
-requests from rate limiting.
-
-{{< caution >}}
-Making this change also allows any hostile party to then send
-health-check requests that match this FlowSchema, at any volume they
-like. If you have a web traffic filter or similar external security
-mechanism to protect your cluster's API server from general internet
-traffic, you can configure rules to block any health check requests
-that originate from outside your cluster.
-{{< /caution >}}
-
-{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
-
 ## Resources
 
 The flow control API involves two kinds of resources.
-[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta1-flowcontrol-apiserver-k8s-io)
+[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta2-flowcontrol-apiserver-k8s-io)
 define the available isolation classes, the share of the available concurrency
 budget that each can handle, and allow for fine-tuning queuing behavior.
-[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta1-flowcontrol-apiserver-k8s-io)
+[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta2-flowcontrol-apiserver-k8s-io)
 are used to classify individual inbound requests, matching each to a
 single PriorityLevelConfiguration. There is also a `v1alpha1` version
 of the same API group, and it has the same Kinds with the same syntax and
@@ -329,6 +256,153 @@ omitted entirely), in which case all requests matched by this FlowSchema will be
 considered part of a single flow. The correct choice for a given FlowSchema
 depends on the resource and your particular environment.
 
+## Defaults
+
+Each kube-apiserver maintains two sorts of APF configuration objects:
+mandatory and suggested.
+
+### Mandatory Configuration Objects
+
+The four mandatory configuration objects reflect fixed built-in
+guardrail behavior. This is behavior that the servers have before
+those objects exist, and when those objects exist their specs reflect
+this behavior. The four mandatory objects are as follows.
+
+* The mandatory `exempt` priority level is used for requests that are
+not subject to flow control at all: they will always be dispatched
+immediately. The mandatory `exempt` FlowSchema classifies all
+requests from the `system:masters` group into this priority
+level. You may define other FlowSchemas that direct other requests
+to this priority level, if appropriate.
+
+* The mandatory `catch-all` priority level is used in combination with
+the mandatory `catch-all` FlowSchema to make sure that every request
+gets some kind of classification. Typically you should not rely on
+this catch-all configuration, and should create your own catch-all
+FlowSchema and PriorityLevelConfiguration (or use the suggested
+`global-default` priority level that is installed by default) as
+appropriate. Because it is not expected to be used normally, the
+mandatory `catch-all` priority level has a very small concurrency
+share and does not queue requests.
+
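As a sketch of what these guardrail objects look like in practice (assuming a cluster with APF enabled and a configured `kubectl`), you can inspect the mandatory objects directly:

```shell
# List the two mandatory FlowSchemas
kubectl get flowschemas exempt catch-all

# Inspect the guardrail spec of the mandatory catch-all priority level
kubectl get prioritylevelconfiguration catch-all -o yaml
```
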
+### Suggested Configuration Objects
+
+The suggested FlowSchemas and PriorityLevelConfigurations constitute a
+reasonable default configuration. You can modify these and/or create
+additional configuration objects if you want. If your cluster is
+likely to experience heavy load then you should consider what
+configuration will work best.
+
+The suggested configuration groups requests into six priority levels:
+
+* The `node-high` priority level is for health updates from nodes.
+
+* The `system` priority level is for non-health requests from the
+`system:nodes` group, i.e. Kubelets, which must be able to contact
+the API server in order for workloads to be able to schedule on
+them.
+
+* The `leader-election` priority level is for leader election requests from
+built-in controllers (in particular, requests for `endpoints`, `configmaps`,
+or `leases` coming from the `system:kube-controller-manager` or
+`system:kube-scheduler` users and service accounts in the `kube-system`
+namespace). These are important to isolate from other traffic because failures
+in leader election cause their controllers to fail and restart, which in turn
+causes more expensive traffic as the new controllers sync their informers.
+
+* The `workload-high` priority level is for other requests from built-in
+controllers.
+
+* The `workload-low` priority level is for requests from any other service
+account, which will typically include all requests from controllers running in
+Pods.
+
+* The `global-default` priority level handles all other traffic, e.g.
+interactive `kubectl` commands run by nonprivileged users.
+
+The suggested FlowSchemas serve to steer requests into the above
+priority levels, and are not enumerated here.
+
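A quick way to see the full set of installed objects on a live cluster (a sketch; the exact output columns vary by release):

```shell
# Show all priority levels, including the six suggested ones above
kubectl get prioritylevelconfigurations

# Show the FlowSchemas and which priority level each one feeds
kubectl get flowschemas
```
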
+### Maintenance of the Mandatory and Suggested Configuration Objects
+
+Each `kube-apiserver` independently maintains the mandatory and
+suggested configuration objects, using initial and periodic behavior.
+Thus, in a situation with a mixture of servers of different versions
+there may be thrashing as long as different servers have different
+opinions of the proper content of these objects.
+
+Each `kube-apiserver` makes an initial maintenance pass over the
+mandatory and suggested configuration objects, and after that does
+periodic maintenance (once per minute) of those objects.
+
+For the mandatory configuration objects, maintenance consists of
+ensuring that the object exists and, if it does, has the proper spec.
+The server refuses to allow a creation or update with a spec that is
+inconsistent with the server's guardrail behavior.
+
+Maintenance of suggested configuration objects is designed to allow
+their specs to be overridden. Deletion, on the other hand, is not
+respected: maintenance will restore the object. If you do not want a
+suggested configuration object then you need to keep it around but set
+its spec to have minimal consequences. Maintenance of suggested
+objects is also designed to support automatic migration when a new
+version of the `kube-apiserver` is rolled out, albeit potentially with
+thrashing while there is a mixed population of servers.
+
+Maintenance of a suggested configuration object consists of creating
+it --- with the server's suggested spec --- if the object does not
+exist. On the other hand, if the object already exists, maintenance
+behavior depends on whether the `kube-apiservers` or the users control
+the object. In the former case, the server ensures that the object's
+spec is what the server suggests; in the latter case, the spec is left
+alone.
+
+The question of who controls the object is answered by first looking
+for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
+there is such an annotation and its value is `true` then the
+kube-apiservers control the object. If there is such an annotation
+and its value is `false` then the users control the object. If
+neither of those conditions holds then the `metadata.generation` of
+the object is consulted. If that is 1 then the kube-apiservers
+control the object. Otherwise the users control the object. These
+rules were introduced in release 1.22 and their consideration of
+`metadata.generation` is for the sake of migration from the simpler
+earlier behavior. Users who wish to control a suggested configuration
+object should set its `apf.kubernetes.io/autoupdate-spec` annotation
+to `false`.
+
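Taking or relinquishing control of a suggested object comes down to setting that annotation. A sketch, using the suggested `global-default` PriorityLevelConfiguration purely as an example object:

```shell
# Take control: the servers will stop rewriting this object's spec
kubectl annotate prioritylevelconfiguration global-default \
  apf.kubernetes.io/autoupdate-spec=false --overwrite

# Hand control back: the servers will restore their suggested spec
kubectl annotate prioritylevelconfiguration global-default \
  apf.kubernetes.io/autoupdate-spec=true --overwrite
```
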
+Maintenance of a mandatory or suggested configuration object also
+includes ensuring that it has an `apf.kubernetes.io/autoupdate-spec`
+annotation that accurately reflects whether the kube-apiservers
+control the object.
+
+Maintenance also includes deleting objects that are neither mandatory
+nor suggested but are annotated
+`apf.kubernetes.io/autoupdate-spec=true`.
+
+## Health check concurrency exemption
+
+The suggested configuration gives no special treatment to the health
+check requests on kube-apiservers from their local kubelets --- which
+tend to use the secured port but supply no credentials. With the
+suggested config, these requests get assigned to the `global-default`
+FlowSchema and the corresponding `global-default` priority level,
+where other traffic can crowd them out.
+
+If you add the following additional FlowSchema, this exempts those
+requests from rate limiting.
+
+{{< caution >}}
+Making this change also allows any hostile party to then send
+health-check requests that match this FlowSchema, at any volume they
+like. If you have a web traffic filter or similar external security
+mechanism to protect your cluster's API server from general internet
+traffic, you can configure rules to block any health check requests
+that originate from outside your cluster.
+{{< /caution >}}
+
+{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
+
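Applying the exemption is a single command (the path below is illustrative; point it at wherever you keep a copy of the `health-for-strangers.yaml` manifest from this commit):

```shell
# Create the FlowSchema that exempts anonymous health checks
kubectl apply -f priority-and-fairness/health-for-strangers.yaml
```
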
 ## Diagnostics
 
 Every HTTP response from an API server with the priority and fairness feature
content/en/examples/priority-and-fairness/health-for-strangers.yaml

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-apiVersion: flowcontrol.apiserver.k8s.io/v1beta1
+apiVersion: flowcontrol.apiserver.k8s.io/v1beta2
 kind: FlowSchema
 metadata:
   name: health-for-strangers