@@ -42,21 +42,21 @@ Fairness feature enabled.
 ## Enabling/Disabling API Priority and Fairness
 
 The API Priority and Fairness feature is controlled by a feature gate
-and is enabled by default. See
-[Feature Gates](/docs/reference/command-line-tools-reference/feature-gates/)
+and is enabled by default. See [Feature
+Gates](/docs/reference/command-line-tools-reference/feature-gates/)
 for a general explanation of feature gates and how to enable and
 disable them. The name of the feature gate for APF is
 "APIPriorityAndFairness". This feature also involves an {{<
 glossary_tooltip term_id="api-group" text="API Group" >}} with: (a) a
-`v1alpha1` version, disabled by default, and (b) a `v1beta1`
-version, enabled by default. You can disable the feature
-gate and API group v1beta1 version by adding the following
+`v1alpha1` version, disabled by default, and (b) `v1beta1` and
+`v1beta2` versions, enabled by default. You can disable the feature
+gate and API group beta versions by adding the following
 command-line flags to your `kube-apiserver` invocation:
 
 ```shell
 kube-apiserver \
 --feature-gates=APIPriorityAndFairness=false \
---runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false \
+--runtime-config=flowcontrol.apiserver.k8s.io/v1beta1=false,flowcontrol.apiserver.k8s.io/v1beta2=false \
  # …and other flags as usual
 ```
 
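To confirm which versions of the `flowcontrol.apiserver.k8s.io` group the server still serves after restarting it with these flags, one option is to filter the discovery list printed by `kubectl api-versions`. The helper name `flowcontrol_served_versions` is hypothetical; this is a sketch, not part of the documented procedure.

```shell
# Hypothetical helper: print the flowcontrol.apiserver.k8s.io versions that
# the API server currently serves, according to kubectl's discovery output.
flowcontrol_served_versions() {
  kubectl api-versions | grep '^flowcontrol\.apiserver\.k8s\.io/'
}
```

With both beta versions disabled and `v1alpha1` left at its default (disabled), the helper should print nothing, and `grep` exits with status 1 because no line matches.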
@@ -127,86 +127,13 @@ any of the limitations imposed by this feature. These exemptions prevent an
 improperly-configured flow control configuration from totally disabling an API
 server.
 
-## Defaults
-
-The Priority and Fairness feature ships with a suggested configuration that
-should suffice for experimentation; if your cluster is likely to
-experience heavy load then you should consider what configuration will work
-best. The suggested configuration groups requests into five priority
-classes:
-
-* The `system` priority level is for requests from the `system:nodes` group,
-i.e. Kubelets, which must be able to contact the API server in order for
-workloads to be able to schedule on them.
-
-* The `leader-election` priority level is for leader election requests from
-built-in controllers (in particular, requests for `endpoints`, `configmaps`,
-or `leases` coming from the `system:kube-controller-manager` or
-`system:kube-scheduler` users and service accounts in the `kube-system`
-namespace). These are important to isolate from other traffic because failures
-in leader election cause their controllers to fail and restart, which in turn
-causes more expensive traffic as the new controllers sync their informers.
-
-* The `workload-high` priority level is for other requests from built-in
-controllers.
-
-* The `workload-low` priority level is for requests from any other service
-account, which will typically include all requests from controllers running in
-Pods.
-
-* The `global-default` priority level handles all other traffic, e.g.
-interactive `kubectl` commands run by nonprivileged users.
-
-Additionally, there are two PriorityLevelConfigurations and two FlowSchemas that
-are built in and may not be overwritten:
-
-* The special `exempt` priority level is used for requests that are not subject
-to flow control at all: they will always be dispatched immediately. The
-special `exempt` FlowSchema classifies all requests from the `system:masters`
-group into this priority level. You may define other FlowSchemas that direct
-other requests to this priority level, if appropriate.
-
-* The special `catch-all` priority level is used in combination with the special
-`catch-all` FlowSchema to make sure that every request gets some kind of
-classification. Typically you should not rely on this catch-all configuration,
-and should create your own catch-all FlowSchema and PriorityLevelConfiguration
-(or use the `global-default` configuration that is installed by default) as
-appropriate. To help catch configuration errors that miss classifying some
-requests, the mandatory `catch-all` priority level only allows one concurrency
-share and does not queue requests, making it relatively likely that traffic
-that only matches the `catch-all` FlowSchema will be rejected with an HTTP 429
-error.
-
-## Health check concurrency exemption
-
-The suggested configuration gives no special treatment to the health
-check requests on kube-apiservers from their local kubelets --- which
-tend to use the secured port but supply no credentials. With the
-suggested config, these requests get assigned to the `global-default`
-FlowSchema and the corresponding `global-default` priority level,
-where other traffic can crowd them out.
-
-If you add the following additional FlowSchema, this exempts those
-requests from rate limiting.
-
-{{< caution >}}
-Making this change also allows any hostile party to then send
-health-check requests that match this FlowSchema, at any volume they
-like. If you have a web traffic filter or similar external security
-mechanism to protect your cluster's API server from general internet
-traffic, you can configure rules to block any health check requests
-that originate from outside your cluster.
-{{< /caution >}}
-
-{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
-
 ## Resources
 
 The flow control API involves two kinds of resources.
-[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta1-flowcontrol-apiserver-k8s-io)
+[PriorityLevelConfigurations](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#prioritylevelconfiguration-v1beta2-flowcontrol-apiserver-k8s-io)
 define the available isolation classes, the share of the available concurrency
 budget that each can handle, and allow for fine-tuning queuing behavior.
-[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta1-flowcontrol-apiserver-k8s-io)
+[FlowSchemas](/docs/reference/generated/kubernetes-api/{{< param "version" >}}/#flowschema-v1beta2-flowcontrol-apiserver-k8s-io)
 are used to classify individual inbound requests, matching each to a
 single PriorityLevelConfiguration. There is also a `v1alpha1` version
 of the same API group, and it has the same Kinds with the same syntax and
@@ -329,6 +256,153 @@ omitted entirely), in which case all requests matched by this FlowSchema will be
 considered part of a single flow. The correct choice for a given FlowSchema
 depends on the resource and your particular environment.
 
+## Defaults
+
+Each kube-apiserver maintains two sorts of APF configuration objects:
+mandatory and suggested.
+
+### Mandatory Configuration Objects
+
+The four mandatory configuration objects reflect fixed built-in
+guardrail behavior. This is behavior that the servers have before
+those objects exist, and when those objects exist their specs reflect
+this behavior. The four mandatory objects are as follows.
+
+* The mandatory `exempt` priority level is used for requests that are
+not subject to flow control at all: they will always be dispatched
+immediately. The mandatory `exempt` FlowSchema classifies all
+requests from the `system:masters` group into this priority
+level. You may define other FlowSchemas that direct other requests
+to this priority level, if appropriate.
+
+* The mandatory `catch-all` priority level is used in combination with
+the mandatory `catch-all` FlowSchema to make sure that every request
+gets some kind of classification. Typically you should not rely on
+this catch-all configuration, and should create your own catch-all
+FlowSchema and PriorityLevelConfiguration (or use the suggested
+`global-default` priority level that is installed by default) as
+appropriate. Because it is not expected to be used normally, the
+mandatory `catch-all` priority level has a very small concurrency
+share and does not queue requests.
+
+### Suggested Configuration Objects
+
+The suggested FlowSchemas and PriorityLevelConfigurations constitute a
+reasonable default configuration. You can modify these and/or create
+additional configuration objects if you want. If your cluster is
+likely to experience heavy load then you should consider what
+configuration will work best.
+
+The suggested configuration groups requests into six priority levels:
+
+* The `node-high` priority level is for health updates from nodes.
+
+* The `system` priority level is for non-health requests from the
+`system:nodes` group, i.e. Kubelets, which must be able to contact
+the API server in order for workloads to be able to schedule on
+them.
+
+* The `leader-election` priority level is for leader election requests from
+built-in controllers (in particular, requests for `endpoints`, `configmaps`,
+or `leases` coming from the `system:kube-controller-manager` or
+`system:kube-scheduler` users and service accounts in the `kube-system`
+namespace). These are important to isolate from other traffic because failures
+in leader election cause their controllers to fail and restart, which in turn
+causes more expensive traffic as the new controllers sync their informers.
+
+* The `workload-high` priority level is for other requests from built-in
+controllers.
+
+* The `workload-low` priority level is for requests from any other service
+account, which will typically include all requests from controllers running in
+Pods.
+
+* The `global-default` priority level handles all other traffic, e.g.
+interactive `kubectl` commands run by nonprivileged users.
+
+The suggested FlowSchemas serve to steer requests into the above
+priority levels, and are not enumerated here.
+
+### Maintenance of the Mandatory and Suggested Configuration Objects
+
+Each `kube-apiserver` independently maintains the mandatory and
+suggested configuration objects, using initial and periodic behavior.
+Thus, in a situation with a mixture of servers of different versions
+there may be thrashing as long as different servers have different
+opinions of the proper content of these objects.
+
+Each `kube-apiserver` makes an initial maintenance pass over the
+mandatory and suggested configuration objects, and after that does
+periodic maintenance (once per minute) of those objects.
+
+For the mandatory configuration objects, maintenance consists of
+ensuring that the object exists and, if it does, has the proper spec.
+The server refuses to allow a creation or update with a spec that is
+inconsistent with the server's guardrail behavior.
+
+Maintenance of suggested configuration objects is designed to allow
+their specs to be overridden. Deletion, on the other hand, is not
+respected: maintenance will restore the object. If you do not want a
+suggested configuration object then you need to keep it around but set
+its spec to have minimal consequences. Maintenance of suggested
+objects is also designed to support automatic migration when a new
+version of the `kube-apiserver` is rolled out, albeit potentially with
+thrashing while there is a mixed population of servers.
+
+Maintenance of a suggested configuration object consists of creating
+it --- with the server's suggested spec --- if the object does not
+exist. If the object already exists, maintenance behavior
+depends on whether the `kube-apiservers` or the users control the
+object. In the former case, the server ensures that the object's spec
+is what the server suggests; in the latter case, the spec is left
+alone.
+
+The question of who controls the object is answered by first looking
+for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
+there is such an annotation and its value is `true` then the
+kube-apiservers control the object. If there is such an annotation
+and its value is `false` then the users control the object. If
+neither of those conditions holds then the `metadata.generation` of the
+object is consulted. If that is 1 then the kube-apiservers control
+the object. Otherwise the users control the object. These rules were
+introduced in release 1.22 and their consideration of
+`metadata.generation` is for the sake of migration from the simpler
+earlier behavior. Users who wish to control a suggested configuration
+object should set its `apf.kubernetes.io/autoupdate-spec` annotation
+to `false`.
+
+Maintenance of a mandatory or suggested configuration object also
+includes ensuring that it has an `apf.kubernetes.io/autoupdate-spec`
+annotation that accurately reflects whether the kube-apiservers
+control the object.
+
+Maintenance also includes deleting objects that are neither mandatory
+nor suggested but are annotated
+`apf.kubernetes.io/autoupdate-spec=true`.
+
+## Health check concurrency exemption
+
+The suggested configuration gives no special treatment to the health
+check requests on kube-apiservers from their local kubelets --- which
+tend to use the secured port but supply no credentials. With the
+suggested config, these requests get assigned to the `global-default`
+FlowSchema and the corresponding `global-default` priority level,
+where other traffic can crowd them out.
+
+If you add the following additional FlowSchema, this exempts those
+requests from rate limiting.
+
+{{< caution >}}
+Making this change also allows any hostile party to then send
+health-check requests that match this FlowSchema, at any volume they
+like. If you have a web traffic filter or similar external security
+mechanism to protect your cluster's API server from general internet
+traffic, you can configure rules to block any health check requests
+that originate from outside your cluster.
+{{< /caution >}}
+
+{{< codenew file="priority-and-fairness/health-for-strangers.yaml" >}}
+
 ## Diagnostics
 
 Every HTTP response from an API server with the priority and fairness feature
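The rule in the added maintenance text that decides whether the kube-apiservers or the users control a suggested configuration object can be restated as a small shell sketch. `apf_spec_owner` and `apf_take_control` are hypothetical helper names used only for illustration. The first just re-expresses the documented decision (annotation value first, then `metadata.generation`) and is not how kube-apiserver implements it. The second wraps a standard `kubectl annotate --overwrite` call that sets `apf.kubernetes.io/autoupdate-spec` to `false`, assuming the caller passes a resource type such as `flowschema` or `prioritylevelconfiguration`.

```shell
# Hypothetical helper (illustration only, not kube-apiserver code):
# decide who controls a suggested APF configuration object.
#   $1 = value of the apf.kubernetes.io/autoupdate-spec annotation ("" if absent)
#   $2 = the object's metadata.generation
apf_spec_owner() {
  case "$1" in
    true)  echo "kube-apiservers" ;;  # annotation hands control to the servers
    false) echo "users" ;;            # annotation hands control to users
    *)
      # Annotation absent or neither true nor false: fall back to
      # metadata.generation (kept for migration from pre-1.22 behavior).
      if [ "$2" = "1" ]; then
        echo "kube-apiservers"
      else
        echo "users"
      fi
      ;;
  esac
}

# Hypothetical helper: take control of a suggested object by setting the
# annotation to "false".
#   $1 = resource type (e.g. flowschema), $2 = object name
apf_take_control() {
  kubectl annotate --overwrite "$1" "$2" apf.kubernetes.io/autoupdate-spec=false
}
```

For example, `apf_spec_owner "" 1` prints `kube-apiservers`, and `apf_take_control flowschema global-default` marks the suggested `global-default` FlowSchema as user-controlled, so later maintenance passes leave its spec alone.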