@@ -22,7 +22,7 @@ The API Priority and Fairness feature (APF) is an alternative that improves upon
 aforementioned max-inflight limitations. APF classifies
 and isolates requests in a more fine-grained way. It also introduces
 a limited amount of queuing, so that no requests are rejected in cases
-of very brief bursts.  Requests are dispatched from queues using a
+of very brief bursts. Requests are dispatched from queues using a
 fair queuing technique so that, for example, a poorly-behaved
 {{< glossary_tooltip text="controller" term_id="controller" >}} need not
 starve others (even at the same priority level).
@@ -46,15 +46,15 @@ are not subject to the `--max-requests-inflight` limit.
 ## Enabling/Disabling API Priority and Fairness
 
 The API Priority and Fairness feature is controlled by a command-line flag
-and is enabled by default.  See
+and is enabled by default. See
 [Options](/docs/reference/command-line-tools-reference/kube-apiserver/#options)
 for a general explanation of the available kube-apiserver command-line
-options and how to enable and disable them.  The name of the
-command-line option for APF is "--enable-priority-and-fairness".  This feature
+options and how to enable and disable them. The name of the
+command-line option for APF is "--enable-priority-and-fairness". This feature
 also involves an {{< glossary_tooltip term_id="api-group" text="API Group" >}}
 with: (a) a stable `v1` version, introduced in 1.29, and
 enabled by default (b) a `v1beta3` version, enabled by default, and
-deprecated in v1.29.  You can
+deprecated in v1.29. You can
 disable the API group beta version `v1beta3` by adding the
 following command-line flags to your `kube-apiserver` invocation:
 
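For illustration, here is a minimal sketch of how those flags might be wired into a kube-apiserver static Pod manifest; the file path, image tag, and surrounding fields are assumptions, not content from this page:

```yaml
# Sketch: fragment of a static Pod manifest for kube-apiserver
# (a path such as /etc/kubernetes/manifests/kube-apiserver.yaml is an assumption).
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
    - name: kube-apiserver
      image: registry.k8s.io/kube-apiserver:v1.29.0  # assumed image tag
      command:
        - kube-apiserver
        # APF is on by default; the flag is shown explicitly for clarity.
        - --enable-priority-and-fairness=true
        # Disable the deprecated v1beta3 version of the API group.
        - --runtime-config=flowcontrol.apiserver.k8s.io/v1beta3=false
```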
@@ -96,7 +96,7 @@ from succeeding.
 
 The concurrency limits of the priority levels are periodically
 adjusted, allowing under-utilized priority levels to temporarily lend
-concurrency to heavily-utilized levels.  These limits are based on
+concurrency to heavily-utilized levels. These limits are based on
 nominal limits and bounds on how much concurrency a priority level may
 lend and how much it may borrow, all derived from the configuration
 objects mentioned below.
@@ -111,29 +111,29 @@ word "seat" is used to mean one unit of concurrency, inspired by the
 way each passenger on a train or aircraft takes up one of the fixed
 supply of seats.
 
-But some requests take up more than one seat.  Some of these are **list**
+But some requests take up more than one seat. Some of these are **list**
 requests that the server estimates will return a large number of
-objects.  These have been found to put an exceptionally heavy burden
-on the server.  For this reason, the server estimates the number of objects
+objects. These have been found to put an exceptionally heavy burden
+on the server. For this reason, the server estimates the number of objects
 that will be returned and considers the request to take a number of seats
 that is proportional to that estimated number.
 
 ### Execution time tweaks for watch requests
 
 API Priority and Fairness manages **watch** requests, but this involves a
-couple more excursions from the baseline behavior.  The first concerns
-how long a **watch** request is considered to occupy its seat.  Depending
-on request parameters, the response to a **watch** request may or may not
-begin with **create** notifications for all the relevant pre-existing
-objects.  API Priority and Fairness considers a **watch** request to be
+couple more excursions from the baseline behavior. The first concerns
+how long a **watch** request is considered to occupy its seat. Depending
+on request parameters, the response to a **watch** request may or may not
+begin with **create** notifications for all the relevant pre-existing
+objects. API Priority and Fairness considers a **watch** request to be
 done with its seat once that initial burst of notifications, if any,
 is over.
 
 The normal notifications are sent in a concurrent burst to all
-relevant **watch** response streams whenever the server is notified of an
-object create/update/delete.  To account for this work, API Priority
+relevant **watch** response streams whenever the server is notified of an
+object create/update/delete. To account for this work, API Priority
 and Fairness considers every write request to spend some additional
-time occupying seats after the actual writing is done.  The server
+time occupying seats after the actual writing is done. The server
 estimates the number of notifications to be sent and adjusts the write
 request's number of seats and seat occupancy time to include this
 extra work.
@@ -155,7 +155,7 @@ To enable distinct handling of distinct instances, controllers that have
 many instances should authenticate with distinct usernames
 
 After classifying a request into a flow, the API Priority and Fairness
-feature then may assign the request to a queue.  This assignment uses
+feature then may assign the request to a queue. This assignment uses
 a technique known as {{< glossary_tooltip term_id="shuffle-sharding"
 text="shuffle sharding" >}}, which makes relatively efficient use of
 queues to insulate low-intensity flows from high-intensity flows.
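As a concrete illustration of flows, here is a sketch of a FlowSchema that makes each username its own flow; the object name and the matched subjects are assumptions, while `workload-low` is one of the suggested priority levels:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: example-controllers   # hypothetical name
spec:
  matchingPrecedence: 8000    # lower values are consulted first
  priorityLevelConfiguration:
    name: workload-low        # a suggested priority level
  distinguisherMethod:
    type: ByUser              # each authenticated username becomes its own flow
  rules:
    - subjects:
        - kind: Group
          group:
            name: system:serviceaccounts
      resourceRules:
        - verbs: ["list", "watch"]
          apiGroups: ["*"]
          resources: ["*"]
          clusterScope: true
          namespaces: ["*"]
```

With `ByUser`, shuffle sharding assigns each such flow to its own small hand of queues, so one noisy controller cannot occupy every queue at this priority level.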
@@ -203,19 +203,19 @@ go up (or down) by the same fraction.
 {{< caution >}}
 In the versions before `v1beta3` the relevant
 PriorityLevelConfiguration field is named "assured concurrency shares"
-rather than "nominal concurrency shares".  Also, in Kubernetes release
+rather than "nominal concurrency shares". Also, in Kubernetes release
 1.25 and earlier there were no periodic adjustments: the
 nominal/assured limits were always applied without adjustment.
 {{< /caution >}}
 
 The bounds on how much concurrency a priority level may lend and how
 much it may borrow are expressed in the PriorityLevelConfiguration as
-percentages of the level's nominal limit.  These are resolved to
+percentages of the level's nominal limit. These are resolved to
 absolute numbers of seats by multiplying with the nominal limit /
-100.0 and rounding.  The dynamically adjusted concurrency limit of a
+100.0 and rounding. The dynamically adjusted concurrency limit of a
 priority level is constrained to lie between (a) a lower bound of its
 nominal limit minus its lendable seats and (b) an upper bound of its
-nominal limit plus the seats it may borrow.  At each adjustment the
+nominal limit plus the seats it may borrow. At each adjustment the
 dynamic limits are derived by each priority level reclaiming any lent
 seats for which demand recently appeared and then jointly fairly
 responding to the recent seat demand on the priority levels, within
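To make that arithmetic concrete, here is a sketch of a PriorityLevelConfiguration; the name and numbers are illustrative assumptions, and the comments work the example through on the assumption that this level's nominal limit comes out to 20 seats:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: PriorityLevelConfiguration
metadata:
  name: example-level          # hypothetical name
spec:
  type: Limited
  limited:
    # Shares, not seats: the nominal limit is this level's proportional
    # slice of the server's total concurrency. Assume it resolves to 20 seats.
    nominalConcurrencyShares: 30
    # May lend up to round(20 * 25 / 100.0) = 5 seats, so the dynamic
    # limit is bounded below by 20 - 5 = 15 seats.
    lendablePercent: 25
    # May borrow up to round(20 * 50 / 100.0) = 10 seats, so the dynamic
    # limit is bounded above by 20 + 10 = 30 seats.
    borrowingLimitPercent: 50
    limitResponse:
      type: Queue
      queuing:
        queues: 64
        handSize: 6
        queueLengthLimit: 50
```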
@@ -328,9 +328,9 @@ mandatory and suggested.
 ### Mandatory Configuration Objects
 
 The four mandatory configuration objects reflect fixed built-in
-guardrail behavior.  This is behavior that the servers have before
+guardrail behavior. This is behavior that the servers have before
 those objects exist, and when those objects exist their specs reflect
-this behavior.  The four mandatory objects are as follows.
+this behavior. The four mandatory objects are as follows.
 
 * The mandatory `exempt` priority level is used for requests that are
   not subject to flow control at all: they will always be dispatched
@@ -352,8 +352,8 @@ this behavior. The four mandatory objects are as follows.
 ### Suggested Configuration Objects
 
 The suggested FlowSchemas and PriorityLevelConfigurations constitute a
-reasonable default configuration.  You can modify these and/or create
-additional configuration objects if you want.  If your cluster is
+reasonable default configuration. You can modify these and/or create
+additional configuration objects if you want. If your cluster is
 likely to experience heavy load then you should consider what
 configuration will work best.
 
@@ -405,33 +405,33 @@ The server refuses to allow a creation or update with a spec that is
 inconsistent with the server's guardrail behavior.
 
 Maintenance of suggested configuration objects is designed to allow
-their specs to be overridden.  Deletion, on the other hand, is not
-respected: maintenance will restore the object.  If you do not want a
+their specs to be overridden. Deletion, on the other hand, is not
+respected: maintenance will restore the object. If you do not want a
 suggested configuration object then you need to keep it around but set
-its spec to have minimal consequences.  Maintenance of suggested
+its spec to have minimal consequences. Maintenance of suggested
 objects is also designed to support automatic migration when a new
 version of the `kube-apiserver` is rolled out, albeit potentially with
 thrashing while there is a mixed population of servers.
 
 Maintenance of a suggested configuration object consists of creating
 it --- with the server's suggested spec --- if the object does not
-exist.  OTOH, if the object already exists, maintenance behavior
+exist. OTOH, if the object already exists, maintenance behavior
 depends on whether the `kube-apiservers` or the users control the
-object.  In the former case, the server ensures that the object's spec
+object. In the former case, the server ensures that the object's spec
 is what the server suggests; in the latter case, the spec is left
 alone.
 
 The question of who controls the object is answered by first looking
-for an annotation with key `apf.kubernetes.io/autoupdate-spec`.  If
+for an annotation with key `apf.kubernetes.io/autoupdate-spec`. If
 there is such an annotation and its value is `true` then the
-kube-apiservers control the object.  If there is such an annotation
-and its value is `false` then the users control the object.  If
+kube-apiservers control the object. If there is such an annotation
+and its value is `false` then the users control the object. If
 neither of those conditions holds then the `metadata.generation` of the
-object is consulted.  If that is 1 then the kube-apiservers control
-the object.  Otherwise the users control the object.  These rules were
+object is consulted. If that is 1 then the kube-apiservers control
+the object. Otherwise the users control the object. These rules were
 introduced in release 1.22 and their consideration of
 `metadata.generation` is for the sake of migration from the simpler
-earlier behavior.  Users who wish to control a suggested configuration
+earlier behavior. Users who wish to control a suggested configuration
 object should set its `apf.kubernetes.io/autoupdate-spec` annotation
 to `false`.
 
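For example, a user taking control of a suggested object could apply a merge patch like the sketch below (hypothetical usage, e.g. with `kubectl patch flowschema <name> --type merge --patch-file <file>`); only the annotation matters here:

```yaml
# Sketch: mark a suggested configuration object as user-controlled,
# so kube-apiserver maintenance leaves its spec alone.
metadata:
  annotations:
    apf.kubernetes.io/autoupdate-spec: "false"
```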
@@ -448,7 +448,7 @@ nor suggested but are annotated
 
 The suggested configuration gives no special treatment to the health
 check requests on kube-apiservers from their local kubelets --- which
-tend to use the secured port but supply no credentials.  With the
+tend to use the secured port but supply no credentials. With the
 suggested config, these requests get assigned to the `global-default`
 FlowSchema and the corresponding `global-default` priority level,
 where other traffic can crowd them out.
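If you choose to exempt those health checks instead, one possible shape for the FlowSchema is sketched below; the object name is an assumption, while the `exempt` priority level, the health-check paths, and the unauthenticated kubelet requests are as described on this page:

```yaml
apiVersion: flowcontrol.apiserver.k8s.io/v1
kind: FlowSchema
metadata:
  name: health-for-strangers   # hypothetical name
spec:
  matchingPrecedence: 1000
  priorityLevelConfiguration:
    name: exempt               # exempt requests are never queued or rejected
  rules:
    - nonResourceRules:
        - nonResourceURLs:
            - "/healthz"
            - "/livez"
            - "/readyz"
          verbs:
            - "*"
      subjects:
        - kind: Group
          group:
            name: "system:unauthenticated"
```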
@@ -459,7 +459,7 @@ requests from rate limiting.
 {{< caution >}}
 Making this change also allows any hostile party to then send
 health-check requests that match this FlowSchema, at any volume they
-like.  If you have a web traffic filter or similar external security
+like. If you have a web traffic filter or similar external security
 mechanism to protect your cluster's API server from general internet
 traffic, you can configure rules to block any health check requests
 that originate from outside your cluster.
@@ -489,7 +489,7 @@ poorly-behaved workloads that may be harming system health.
   (cumulative since server start) of requests that were rejected,
   broken down by the labels `flow_schema` (indicating the one that
   matched the request), `priority_level` (indicating the one to which
-  the request was assigned), and `reason`.  The `reason` label will be
+  the request was assigned), and `reason`. The `reason` label will be
   one of the following values:
 
   * `queue-full`, indicating that too many requests were already
@@ -541,7 +541,7 @@ poorly-behaved workloads that may be harming system health.
   high water marks of the number of queued requests, grouped by a
   label named `request_kind` whose value is `mutating` or `readOnly`.
   These high water marks describe the largest number seen in the one
-  second window most recently completed.  These complement the older
+  second window most recently completed. These complement the older
   `apiserver_current_inflight_requests` gauge vector that holds the
   last window's high water mark of number of requests actively being
   served.
@@ -555,7 +555,7 @@ poorly-behaved workloads that may be harming system health.
   nanosecond, of the number of requests broken down by the labels
   `phase` (which takes on the values `waiting` and `executing`) and
   `request_kind` (which takes on the values `mutating` and
-  `readOnly`).  Each observed value is a ratio, between 0 and 1, of
+  `readOnly`). Each observed value is a ratio, between 0 and 1, of
   the number of requests divided by the corresponding limit on the
   number of requests (queue volume limit for waiting and concurrency
   limit for executing).
@@ -568,21 +568,21 @@ poorly-behaved workloads that may be harming system health.
   histogram vector of observations, made at the end of each
   nanosecond, of the number of requests broken down by the labels
   `phase` (which takes on the values `waiting` and `executing`) and
-  `priority_level`.  Each observed value is a ratio, between 0 and 1,
+  `priority_level`. Each observed value is a ratio, between 0 and 1,
   of a number of requests divided by the corresponding limit on the
   number of requests (queue volume limit for waiting and concurrency
   limit for executing).
 
 * `apiserver_flowcontrol_priority_level_seat_utilization` is a
   histogram vector of observations, made at the end of each
   nanosecond, of the utilization of a priority level's concurrency
-  limit, broken down by `priority_level`.  This utilization is the
-  fraction (number of seats occupied) / (concurrency limit).  This
+  limit, broken down by `priority_level`. This utilization is the
+  fraction (number of seats occupied) / (concurrency limit). This
   metric considers all stages of execution (both normal and the extra
   delay at the end of a write to cover for the corresponding
   notification work) of all requests except WATCHes; for those it
   considers only the initial stage that delivers notifications of
-  pre-existing objects.  Each histogram in the vector is also labeled
+  pre-existing objects. Each histogram in the vector is also labeled
   with `phase: executing` (there is no seat limit for the waiting
   phase).
 
@@ -603,9 +603,9 @@ poorly-behaved workloads that may be harming system health.
 {{< /note >}}
 
 * `apiserver_flowcontrol_request_concurrency_limit` is the same as
-  `apiserver_flowcontrol_nominal_limit_seats`.  Before the
-  introduction of concurrency borrowing between priority levels, this
-  was always equal to `apiserver_flowcontrol_current_limit_seats`
+  `apiserver_flowcontrol_nominal_limit_seats`. Before the
+  introduction of concurrency borrowing between priority levels,
+  this was always equal to `apiserver_flowcontrol_current_limit_seats`
   (which did not exist as a distinct metric).
 
 
* ` apiserver_flowcontrol_lower_limit_seats ` is a gauge vector holding
@@ -616,8 +616,8 @@ poorly-behaved workloads that may be harming system health.
616
616
617
617
* ` apiserver_flowcontrol_demand_seats ` is a histogram vector counting
618
618
observations, at the end of every nanosecond, of each priority
619
- level's ratio of (seat demand) / (nominal concurrency limit). A
620
- priority level's seat demand is the sum, over both queued requests
619
+ level's ratio of (seat demand) / (nominal concurrency limit).
620
+ A priority level's seat demand is the sum, over both queued requests
621
621
and those in the initial phase of execution, of the maximum of the
622
622
number of seats occupied in the request's initial and final
623
623
execution phases.
@@ -791,6 +791,6 @@ Example FlowSchema object to isolate list event requests:
 
 - You can visit flow control [reference doc](/docs/reference/debug-cluster/flow-control/) to learn more about troubleshooting.
 - For background information on design details for API priority and fairness, see
-  the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
+  the [enhancement proposal](https://github.com/kubernetes/enhancements/tree/master/keps/sig-api-machinery/1040-priority-and-fairness).
 - You can make suggestions and feature requests via [SIG API Machinery](https://github.com/kubernetes/community/tree/master/sig-api-machinery)
-  or the feature's [slack channel](https://kubernetes.slack.com/messages/api-priority-and-fairness).
+  or the feature's [slack channel](https://kubernetes.slack.com/messages/api-priority-and-fairness).