keps/sig-api-machinery/3157-watch-list/README.md
Lines changed: 55 additions & 22 deletions
Original file line number
Diff line number
Diff line change
@@ -85,7 +85,7 @@ tags, and then generate with `hack/update-toc.sh`.
85
85
-[Proposal](#proposal)
86
86
-[Risks and Mitigations](#risks-and-mitigations)
87
87
-[Design Details](#design-details)
88
-
-[Required changes for a WATCH request with the RV="" and the ResourceVersionMatch=MostRecent](#required-changes-for-a-watch-request-with-the-rv-and-the-resourceversionmatchmostrecent)
88
+
-[Required changes for a WATCH request with the SendInitialEvents=true](#required-changes-for-a-watch-request-with-the-sendinitialeventstrue)
-[Manual testing without the changes in place](#manual-testing-without-the-changes-in-place)
@@ -179,7 +179,7 @@ The kube-apiserver is vulnerable to memory explosion.
179
179
The issue is apparent in larger clusters, where only a few LIST requests might cause serious disruption.
180
180
Uncontrolled and unbounded memory consumption of the servers does not only affect clusters that operate in an
181
181
HA mode but also other programs that share the same machine.
182
-
In this KEP we propose a potential solution to this issue.
182
+
In this KEP we propose a solution to this issue.
183
183
184
184
## Motivation
185
185
@@ -257,7 +257,7 @@ The "Design Details" section below is for the real
257
257
nitty-gritty.
258
258
-->
259
259
260
-
In order to lower memory consumption while getting a list of data and make it more predictable, we propose to use consistent streaming from the watch-cache instead of paging from etcd.
260
+
In order to lower memory consumption while getting a list of data and make it more predictable, we propose to use streaming from the watch-cache instead of paging from etcd.
261
261
Initially, the proposed changes will be applied to informers as they are usually the heaviest users of LIST requests (see [Appendix](#appendix) section for more details on how informers operate today).
262
262
The primary idea is to use standard WATCH request mechanics for getting a stream of individual objects, but to use it for LISTs.
263
263
This would allow us to keep memory allocations constant.
@@ -266,17 +266,17 @@ plus a few additional allocations, that will be explained later in this document
266
266
The rough idea/plan is as follows:
267
267
268
268
- step 1: change the informers to establish a WATCH request with a new query parameter instead of a LIST request.
269
-
- step 2: upon receiving the request from an informer, contact etcd to get the latest RV. It will be used to make sure the watch cache has seen objects up to the received RV. This step is necessary and ensures we will serve consistent data, even from the cache.
270
-
- step 2a: send all objects currently stored in memory for the given resource.
269
+
- step 2: upon receiving the request from an informer, compute the RV at which the result should be returned (possibly contacting etcd if a consistent read was requested). It will be used to make sure the watch cache has seen objects up to the received RV. This step is necessary and ensures we will meet the consistency requirements of the request.
270
+
- step 2a: send all objects currently stored in memory for the given resource type.
271
271
- step 2b: propagate any updates that might have happened meanwhile until the watch cache catches up to the latest RV received in step 2.
272
272
- step 2c: send a bookmark event to the informer with the given RV.
273
273
- step 3: listen for further events using the request from step 1.
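
The server-side part of the plan above (steps 2a to 2c, followed by step 3) can be sketched roughly as follows. This is a minimal illustration, not the kube-apiserver implementation: the `Object` and `Event` types, the `streamWatchList` function, and the in-memory slices standing in for the watch cache are all hypothetical.

```go
package main

import "fmt"

// Object is a hypothetical stand-in for an item held in the watch cache.
type Object struct {
	Key string
	RV  uint64
}

// Event is a hypothetical, simplified watch event.
type Event struct {
	Type string // "ADDED", "MODIFIED", "DELETED" or "BOOKMARK"
	Key  string // empty for bookmark events
	RV   uint64
}

// streamWatchList returns the events in the order a client would see them:
// step 2a: synthetic ADDED events for everything already in memory,
// step 2b: updates observed until the cache reaches targetRV,
// step 2c: a BOOKMARK carrying targetRV,
// step 3:  all later updates, delivered as an ordinary watch.
// updates is assumed to be sorted by RV.
func streamWatchList(snapshot []Object, updates []Event, targetRV uint64) []Event {
	out := make([]Event, 0, len(snapshot)+len(updates)+1)
	for _, o := range snapshot {
		out = append(out, Event{Type: "ADDED", Key: o.Key, RV: o.RV})
	}
	i := 0
	for ; i < len(updates) && updates[i].RV <= targetRV; i++ {
		out = append(out, updates[i])
	}
	out = append(out, Event{Type: "BOOKMARK", RV: targetRV})
	return append(out, updates[i:]...)
}

func main() {
	snapshot := []Object{{"pod-a", 5}, {"pod-b", 7}}
	updates := []Event{{"MODIFIED", "pod-a", 9}, {"ADDED", "pod-c", 12}}
	for _, e := range streamWatchList(snapshot, updates, 10) {
		fmt.Printf("%s %q rv=%d\n", e.Type, e.Key, e.RV)
	}
}
```

Each object is emitted individually, so the sketch never builds a full list response in memory, which is the property the proposal relies on.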
274
274
275
275
Note: kube-apiserver already follows the proposed watch-list semantics (without the bookmark event and without the consistency guarantee) for RV="0" watches.
276
276
The mode is not used in informers today but is supported by every kube-apiserver for legacy, compatibility reasons.
277
-
A watch started with RV="0" may return stale. It is possible for the watch to start at a much older resource version that the client has previously observed, particularly in high availability configurations, due to partitions or stale caches
277
+
A watch started with RV="0" may return stale data. It is possible for the watch to start at a much older resource version than the client has previously observed, particularly in high availability configurations, due to partitions or stale caches.
278
278
279
-
Note 2: informers need consistent lists to avoid time-travel when switching to another HA instance of kube-apiserver with outdated/lagging watch cache.
279
+
Note 2: informers need consistent lists when initializing after a restart, to avoid time travel in case of switching to another HA instance of kube-apiserver with an outdated/lagging watch cache.
280
280
See the following [issue](https://github.com/kubernetes/kubernetes/issues/59848) for more details.
281
281
282
282
@@ -310,7 +310,7 @@ required) or even code snippets. If there's any ambiguity about HOW your
310
310
proposal will be implemented, this is the place to discuss them.
311
311
-->
312
312
313
-
### Required changes for a WATCH request with the RV="" and the ResourceVersionMatch=MostRecent
313
+
### Required changes for a WATCH request with the SendInitialEvents=true
314
314
315
315
The following sequence diagram depicts steps that are needed to complete the proposed feature.
316
316
A high-level overview of each step is provided in the table that immediately follows the diagram.
@@ -328,11 +328,11 @@ Whereas further down in this section we provided a detailed description of each
328
328
</tr>
329
329
<tr>
330
330
<th>2.</th>
331
-
<th>The watch cache contacts etcd for the most up-to-date ResourceVersion.</th>
331
+
<th>If needed, the watch cache contacts etcd for the most up-to-date ResourceVersion.</th>
332
332
</tr>
333
333
<tr>
334
334
<th>2a.</th>
335
-
<th>The watch cache starts streaming initial data. The data it already has in memory.</th>
335
+
<th>The watch cache starts streaming initial data it already has in memory.</th>
336
336
</tr>
337
337
<tr>
338
338
<th>2b.</th>
@@ -352,14 +352,14 @@ Whereas further down in this section we provided a detailed description of each
352
352
</tr>
353
353
</table>
354
354
355
-
Step 1: On initialization the reflector gets a snapshot of data from the server by passing RV=”” (= unset value) and setting resourceVersionMatch=MostRecent (= ensure freshness).
355
+
Step 1: On initialization the reflector gets a snapshot of data from the server by passing RV="" (= unset value) to ensure freshness and setting resourceVersionMatch=NotOlderThan and sendInitialEvents=true.
356
356
We do that only during the initial ListAndWatch call.
357
357
Each event (ADD, UPDATE, DELETE) except the BOOKMARK event received from the server is collected.
358
-
Passing resourceVersionMatch=MostRecent tells the cacher it has to guarantee that the cache is at least up to date as a LIST executed at the same time.
358
+
Passing resourceVersion="" tells the cacher it has to guarantee that the cache is at least as up to date as a LIST executed at the same time.
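
As an illustration of the request described in Step 1, the query parameters could be assembled as below. This is only a sketch using the Go standard library: the `watchListQuery` helper is hypothetical, the `/api/v1/pods` path is just an example resource, and `allowWatchBookmarks=true` is an assumption that is not stated in this section (it is included because the flow depends on receiving a bookmark event).

```go
package main

import (
	"fmt"
	"net/url"
)

// watchListQuery builds the query string for the initial watch-list request.
// It is a hypothetical helper, not part of client-go.
func watchListQuery() url.Values {
	q := url.Values{}
	q.Set("watch", "true")
	q.Set("resourceVersion", "")                  // unset: ask for fresh data
	q.Set("resourceVersionMatch", "NotOlderThan") // as described in Step 1
	q.Set("sendInitialEvents", "true")            // stream the initial state as events
	q.Set("allowWatchBookmarks", "true")          // assumption: needed to receive the bookmark
	return q
}

func main() {
	// Example resource path; any watchable resource would work the same way.
	fmt.Println("/api/v1/pods?" + watchListQuery().Encode())
}
```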
359
359
360
360
Note: This ensures that returned data is consistent, served from etcd via a quorum read and prevents "going back in time".
361
361
362
-
Note 2: Unfortunately as of today, the watch cache is vulnerable to stale reads, see https://github.com/kubernetes/kubernetes/issues/59848 for more details.
362
+
Note 2: The watch cache currently doesn't support resourceVersion="" and is thus vulnerable to stale reads; see https://github.com/kubernetes/kubernetes/issues/59848 for more details.
363
363
364
364
Step 2: Right after receiving a request from the reflector, the cacher gets the current resourceVersion (aka bookmarkAfterResourceVersion) directly from etcd.
365
365
It is used to make sure the cacher is up to date (has seen data stored in etcd) and to let the reflector know it has seen all initial data.
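
On the client side, the bookmark is what tells the reflector that the initial data is complete. A rough sketch of that collection loop, assuming a hypothetical `Event` type and a `collectInitial` function (neither is client-go API), might look like this:

```go
package main

import "fmt"

// Event is a hypothetical, simplified watch event.
type Event struct {
	Type string // "ADDED", "MODIFIED", "DELETED" or "BOOKMARK"
	Key  string
	RV   uint64
}

// collectInitial consumes events until the first BOOKMARK arrives.
// It returns the collected state (key -> resource version), the RV carried
// by the bookmark, and false if the stream ended before any bookmark.
func collectInitial(events <-chan Event) (map[string]uint64, uint64, bool) {
	store := make(map[string]uint64)
	for e := range events {
		switch e.Type {
		case "BOOKMARK":
			// All initial data has been seen; the caller can sync its store.
			return store, e.RV, true
		case "DELETED":
			delete(store, e.Key)
		default: // ADDED or MODIFIED
			store[e.Key] = e.RV
		}
	}
	return store, 0, false
}

func main() {
	ch := make(chan Event, 4)
	ch <- Event{"ADDED", "pod-a", 5}
	ch <- Event{"ADDED", "pod-b", 7}
	ch <- Event{"DELETED", "pod-a", 9}
	ch <- Event{"BOOKMARK", "", 10}
	close(ch)

	store, rv, ok := collectInitial(ch)
	fmt.Println(store, rv, ok)
}
```

Returning a separate `ok` flag lets the caller distinguish a completed initial sync from a stream that was closed early and must be retried.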
@@ -447,19 +447,52 @@ It replaces its internal store with the collected items (syncWith) and reuses th
447
447
448
448
#### API changes
449
449
450
-
Extend the optional `ResourceVersionMatch` query parameter of `ListOptions` with the following enumeration value:
450
+
Extend the `ListOptions` struct with the following field:
451
451
452
452
```
453
-
const (
454
-
// ResourceVersionMatchMostRecent matches data at the most recent ResourceVersion.
455
-
// The returned data is consistent, that is, served from etcd via a quorum read.
456
-
// For watch calls, it begins with synthetic "Added" events of all resources up to the most recent ResourceVersion.
457
-
// It ends with a synthetic "Bookmark" event containing the most recent ResourceVersion.
458
-
// For list calls, it has the same semantics as leaving ResourceVersion and ResourceVersionMatch unset.