@@ -31,10 +31,13 @@ use informers and react to failures of API requests with exponential
 back-off, and other clients that also work this way.
 
 {{< caution >}}
-Requests classified as "long-running" — primarily watches — are not
-subject to the API Priority and Fairness filter. This is also true for
-the `--max-requests-inflight` flag without the API Priority and
-Fairness feature enabled.
+Some requests classified as "long-running"&mdash;such as remote
+command execution or log tailing&mdash;are not subject to the API
+Priority and Fairness filter. This is also true for the
+`--max-requests-inflight` flag without the API Priority and Fairness
+feature enabled. API Priority and Fairness _does_ apply to **watch**
+requests. When API Priority and Fairness is disabled, **watch** requests
+are not subject to the `--max-requests-inflight` limit.
 {{< /caution >}}
 
 <!-- body -->
@@ -93,6 +96,44 @@ Pods. This means that an ill-behaved Pod that floods the API server with
 requests cannot prevent leader election or actions by the built-in controllers
 from succeeding.
 
+### Seats Occupied by a Request
+
+The above description of concurrency management is the baseline story.
+In it, requests have different durations but are counted equally at
+any given moment when comparing against a priority level's concurrency
+limit. In the baseline story, each request occupies one unit of
+concurrency. The word "seat" is used to mean one unit of concurrency,
+inspired by the way each passenger on a train or aircraft takes up one
+of the fixed supply of seats.
+
+But some requests take up more than one seat. Some of these are **list**
+requests that the server estimates will return a large number of
+objects. These have been found to put an exceptionally heavy burden
+on the server, among requests that take a similar amount of time to
+run. For this reason, the server estimates the number of objects that
+will be returned and considers the request to take a number of seats
+that is proportional to that estimated number.
+
+### Execution time tweaks for watch requests
+
+API Priority and Fairness manages **watch** requests, but this involves a
+couple more excursions from the baseline behavior. The first concerns
+how long a **watch** request is considered to occupy its seat. Depending
+on request parameters, the response to a **watch** request may or may not
+begin with **create** notifications for all the relevant pre-existing
+objects. API Priority and Fairness considers a **watch** request to be
+done with its seat once that initial burst of notifications, if any,
+is over.
+
+The normal notifications are sent in a concurrent burst to all
+relevant **watch** response streams whenever the server is notified of an
+object create/update/delete. To account for this work, API Priority
+and Fairness considers every write request to spend some additional
+time occupying seats after the actual writing is done. The server
+estimates the number of notifications to be sent and adjusts the write
+request's number of seats and seat occupancy time to include this
+extra work.
+
 ### Queuing
 
 Even within a priority level there may be a large number of distinct sources of