@@ -735,8 +735,7 @@ next section.
This is based on fair queuing but is modified to deal with serving
requests in an apiserver instead of transmitting packets in a router.
You can find the original fair queuing paper at
- [ACM](https://dl.acm.org/citation.cfm?doid=75247.75248) or
- [MIT](http://people.csail.mit.edu/imcgraw/links/research/pubs/networks/WFQ.pdf),
+ [ACM](https://dl.acm.org/citation.cfm?doid=75247.75248),
and an
[implementation outline at Wikipedia](https://en.wikipedia.org/wiki/Fair_queuing).
Our problem differs from the normal fair queuing problem in three
@@ -1133,7 +1132,7 @@ arrive to that queue and crowd out other queues for an arbitrarily
long time. To mitigate this problem, the implementation has a special
step that effectively prevents `t_dispatch_virtual` of the next
request to dispatch from dropping below the current time. But that
- solves only half of the problem. Other queueus may accumulate a
+ solves only half of the problem. Other queues may accumulate a
corresponding deficit (inappropriately large values for
`t_dispatch_virtual` and `t_finish_virtual`). Such a queue can have
an arbitrarily long burst of inappropriate lossage to other queues.
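To make the clamping step concrete, here is a minimal sketch in Go; the names are hypothetical and this is not the actual k8s.io/apiserver implementation:

```go
package main

import "fmt"

// clampDispatchVirtual sketches the special step described above: the virtual
// dispatch time of the next request in a queue is not allowed to drop below
// the current (virtual) time, so a queue that sat idle cannot build up credit
// and later crowd out other queues. Hypothetical helper, not the real code.
func clampDispatchVirtual(tDispatchVirtual, now float64) float64 {
	if tDispatchVirtual < now {
		return now
	}
	return tDispatchVirtual
}

func main() {
	// A queue that was idle until now=25 gets its dispatch time pulled up to 25.
	fmt.Println(clampDispatchVirtual(10, 25)) // 25
	// A busy queue keeps its (later) dispatch time unchanged.
	fmt.Println(clampDispatchVirtual(30, 25)) // 30
}
```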
@@ -1201,7 +1200,7 @@ of available resources), a single request should consume no more than A
concurrency units. Fortunately that all comes together because the
`processing latency` of the LIST request is actually proportional to the
number of processed objects, so the cost of the request (defined above as
- `<width> x <processing latency>` really is proportaional to the number of
+ `<width> x <processing latency>`) really is proportional to the number of
processed objects as expected.

For RAM the situation is actually different. In order to process a LIST
@@ -1220,7 +1219,7 @@ where N is the number of items a given LIST request is processing.

The question is how to combine them into a single number. While the main goal
is to stay on the safe side and protect from overload, we also want to
- maxiumize the utilization of the available concurrency units.
+ maximize the utilization of the available concurrency units.
Fortunately, when we normalize CPU and RAM to percentage of available capacity,
it appears that almost all requests are much more CPU-intensive. Assuming
a 4GB:1CPU ratio and a 10kB average object and the fact that processing larger
@@ -1234,7 +1233,7 @@ independently, which translates to the following function:
```
We're going to better tune the function based on experiments, but based on the
above back-of-envelope calculations showing that memory should almost never be
- a limiting factor we will apprximate the width simply with:
+ a limiting factor we will approximate the width simply with:
```
width_approx(n) = min(A, ceil(N / E)), where E = 1 / B
```
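As a sanity check of the formula above, a small Go sketch; A (the per-request cap on concurrency units mentioned earlier) and B are placeholder tuning constants here, not values taken from this document:

```go
package main

import (
	"fmt"
	"math"
)

// widthApprox transcribes the formula above:
//   width_approx(n) = min(A, ceil(N / E)), where E = 1 / B
// The concrete values of A and B are tuning constants; the ones used below
// are placeholders for illustration only.
func widthApprox(n int, a int, b float64) int {
	e := 1.0 / b
	w := int(math.Ceil(float64(n) / e))
	if w > a {
		return a
	}
	return w
}

func main() {
	// Example: a LIST processing 5000 objects, with hypothetical A=10, B=1/1000:
	// min(10, ceil(5000/1000)) = 5 concurrency units.
	fmt.Println(widthApprox(5000, 10, 1.0/1000.0)) // 5
}
```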
@@ -1267,7 +1266,7 @@ the virtual world for `additional latency`.
Adjusting the virtual time of a queue to do that is trivial. The other thing
to tweak is to ensure that the concurrency units will not become available
for other requests for that time (because currently all actions are
- triggerred by starting or finishing some request). We will maintain that
+ triggered by starting or finishing some request). We will maintain that
possibility by wrapping the handler into another one that will be sleeping
for `additional latency` after the request is processed.

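A minimal sketch of such a wrapping handler (Go; hypothetical code, not the actual kube-apiserver filter):

```go
package sketch

import (
	"net/http"
	"time"
)

// withAdditionalLatency wraps an inner handler so that, after the request is
// processed, the wrapper keeps sleeping for the request's `additional latency`.
// Since seats are only released when the (wrapped) handler returns, the
// concurrency units stay occupied for that extra time, as described above.
func withAdditionalLatency(inner http.Handler, additionalLatency time.Duration) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		inner.ServeHTTP(w, r)
		time.Sleep(additionalLatency)
	})
}
```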
@@ -1284,7 +1283,7 @@ requests. Now in order to start processing a request, it has to accumulate
The important requirement to recast now is fairness. As soon as a single
request can consume more units of concurrency, the fairness is
no longer about the number of requests from a given queue, but rather
- about number of consumed concurrency units. This justifes the above
+ about the number of consumed concurrency units. This justifies the above
definition of adjusting the cost of the request to now be equal to
`<width> x <processing latency>` (instead of just `<processing latency>`).

@@ -1310,7 +1309,7 @@ modification to the current dispatching algorithm:
semantics of virtual time tracked by the queues to correspond to work,
instead of just wall time. That means when we estimate a request's
virtual duration, we will use `estimated width x estimated latency` instead
- of just estimated latecy. And when a request finishes, we will update
+ of just estimated latency. And when a request finishes, we will update
the virtual time for it with `seats x actual latency` (note that seats
will always equal the estimated width, since we have no way to figure out
if a request used less concurrency than we granted it).
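For illustration, the virtual-time bookkeeping described above can be sketched as follows (hypothetical helpers, not the actual implementation):

```go
package sketch

import "time"

// Virtual duration now measures work (seats x time) rather than wall time alone.

// estimatedVirtualDuration is what a queue is charged when the request is
// dispatched: estimated width x estimated latency.
func estimatedVirtualDuration(estimatedWidth int, estimatedLatency time.Duration) time.Duration {
	return time.Duration(estimatedWidth) * estimatedLatency
}

// finalVirtualDuration replaces the estimate once the request finishes:
// seats x actual latency. Seats always equals the estimated width, since we
// cannot observe whether the request used less concurrency than it was granted.
func finalVirtualDuration(seats int, actualLatency time.Duration) time.Duration {
	return time.Duration(seats) * actualLatency
}
```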
@@ -1348,7 +1347,7 @@ We will solve this problem by also handling watch requests by our priority
and fairness kube-apiserver filter. The queueing and admitting of watch
requests will be happening exactly the same as for all non-longrunning
requests. However, as soon as the watch is initialized, it will be sending
- an articial `finished` signal to the APF dispatcher - after receiving this
+ an artificial `finished` signal to the APF dispatcher - after receiving this
signal the dispatcher will be treating the request as already finished (i.e.
the concurrency units it was occupying will be released and new requests
may potentially be immediately admitted), even though the request itself
@@ -1362,7 +1361,7 @@ The first question to answer is how we will know that watch initialization
has actually been done. However, the answer to this question is different
depending on whether the watchcache is on or off.

- In watchcache, the initialization phase is clearly separated - we explicily
+ In watchcache, the initialization phase is clearly separated - we explicitly
compute `init events` and process them. What we don't control at this level
is the process of serialization and sending out the events.
In the initial version we will ignore this and simply send the `initialization
@@ -1395,7 +1394,7 @@ LIST requests) adjust the `width` of the request. However, in the initial
version, we will just use `width=1` for all watch requests. In the future,
we are going to evolve it towards a function that will better estimate
the actual cost (potentially somewhat similarly to how LIST requests are done),
- but we first need to a machanism to allow us experiment and tune it better.
+ but we first need a mechanism to allow us to experiment and tune it better.

#### Keeping the watch up-to-date

@@ -1425,7 +1424,7 @@ Let's start with an assumption that sending every watch event is equally
expensive. We will discuss how to generalize it below.

With the above assumption, the cost of a mutating request associated with
- sending watch events triggerred by it is proportional to the number of
+ sending watch events triggered by it is proportional to the number of
watchers that have to process that event. So let's describe how we can
estimate this number.

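Under that equal-cost-per-event assumption, the attribution can be sketched simply as (hypothetical helper; `perEventCost` is an assumed tuning constant, not defined in this document):

```go
package sketch

// watchCost sketches the proportionality described above: with every watch
// event equally expensive, the watch-related cost of a mutating request is
// the number of watchers that must process the resulting event, scaled by an
// assumed per-event cost constant.
func watchCost(matchingWatchers int, perEventCost float64) float64 {
	return float64(matchingWatchers) * perEventCost
}
```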
@@ -1468,7 +1467,7 @@ structure to avoid loosing too much information.
If we had a hashing function that combines only similar buckets
(e.g. it won't combine the "all Endpoints" bucket with "pods from node X"), then
we can simply keep the maximum of all entries that are hashed to the same value.
- This means that some costs may be overestimated, but if we resaonably hash
+ This means that some costs may be overestimated, but if we reasonably hash
requests originating from system components, that seems acceptable.
The above can be achieved by hashing each resource type to a separate set of
buckets, and within a resource type hashing (namespace, name) as simply as:
@@ -1489,7 +1488,7 @@ as whenever something quickly grows we report it, but we don't immediately
downscale, which is a way to somehow incorporate history.

However, we will treat the above as a feasibility proof. We will just start
- with the simplest apprach of treating each kube-apiserver independently.
+ with the simplest approach of treating each kube-apiserver independently.
We will implement the above (i.e. knowledge sharing between kube-apiservers)
if the independence assumption does not work well enough.
The above description shows that it will result in almost no wasted work