@@ -31,10 +31,12 @@ use informers and react to failures of API requests with exponential
back-off, and other clients that also work this way.
{{< caution >}}
- Requests classified as "long-running" — primarily watches — are not
- subject to the API Priority and Fairness filter. This is also true for
- the `--max-requests-inflight` flag without the API Priority and
- Fairness feature enabled.
+ Some requests classified as "long-running" — such as remote command
+ execution or log tailing — are not subject to the API Priority and
+ Fairness filter. This is also true for the `--max-requests-inflight`
+ flag without the API Priority and Fairness feature enabled. WATCH
+ requests are considered long-running if API Priority and Fairness is
+ disabled, NOT long-running if it is enabled.
{{< /caution >}}
<!-- body -->
@@ -93,6 +95,40 @@ Pods. This means that an ill-behaved Pod that floods the API server with
requests cannot prevent leader election or actions by the built-in controllers
from succeeding.
+ ### Request Width
+
+ The above description of concurrency management is the baseline story.
+ In it, all requests have equal "width": each takes up one "seat", one
+ unit of concurrency.
+
+ But some requests take up more than one seat. Some of these are LIST
+ requests that the server estimates will return a large number of
+ objects. These have been found to put an exceptionally heavy burden
+ on the server, among requests that take a similar amount of time to
+ run. For this reason, the server estimates the number of objects that
+ will be returned and considers the request to take a number of seats
+ that is proportional to that estimated number.
+
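The seat estimate described above can be sketched in Go. This is only an illustration under stated assumptions: the helper `seatsForList` and the constant `objectsPerSeat` (with its value of 100) are hypothetical names chosen here, not taken from the text, and the server's real estimator may differ.

```go
package main

import (
	"fmt"
	"math"
)

// objectsPerSeat is a hypothetical proportionality constant chosen
// only for illustration; the text does not give a value.
const objectsPerSeat = 100.0

// seatsForList returns a seat count proportional to the estimated
// number of objects a LIST request will return. Every request takes
// at least one seat, matching the baseline behavior.
func seatsForList(estimatedObjects int) int {
	if estimatedObjects <= 0 {
		return 1
	}
	return int(math.Ceil(float64(estimatedObjects) / objectsPerSeat))
}

func main() {
	for _, n := range []int{0, 50, 250, 5000} {
		fmt.Printf("estimated objects=%d -> seats=%d\n", n, seatsForList(n))
	}
}
```

Under these assumed values, a LIST estimated to return 250 objects would occupy three seats, while a small LIST still occupies one.
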
+ ### Execution Time Tweaks for WATCH
+
+ API Priority and Fairness manages WATCH requests, but this involves a
+ couple more excursions from the baseline behavior. The first concerns
+ how long a WATCH request is considered to occupy its seat. Depending
+ on request parameters, the response to a WATCH request may or may not
+ begin with CREATE notifications for all the relevant pre-existing
+ objects. API Priority and Fairness considers a WATCH request to be
+ done with its seat once that initial burst of notifications, if any,
+ is over.
+
+ The normal notifications are sent in a concurrent burst to all
+ relevant WATCH response streams whenever the server is notified of an
+ object create/update/delete. To account for this work, API Priority
+ and Fairness considers every write request to spend some additional
+ time occupying seats after the actual writing is done. The server
+ estimates the number of notifications to be sent and adjusts the write
+ request's number of seats and seat occupancy time to include this
+ extra work.
+
### Queuing
Even within a priority level there may be a large number of distinct sources of