Commit 5234cc7

Apply feedback
1 parent 539e805 commit 5234cc7

1 file changed (+19 -9 lines)

keps/sig-api-machinery/20191210-consistent-reads-from-cache.md

@@ -14,7 +14,7 @@ approvers:
 editor: TBD
 creation-date: 2019-12-10
 last-updated: 2020-01-21
-status: implementable
+status: provisional
 see-also:
 replaces:
 superseded-by:
@@ -40,7 +40,7 @@ read from etcd.
 - [Motivation](#motivation)
 - [Goals](#goals)
 - [Proposal](#proposal)
-- [Leveraging the Progress Notify Mechanism](#leveraging-the-progress-notify-mechanism)
+- [Consistent reads from cache](#consistent-reads-from-cache-1)
 - [Use WithProgressNotify to enable automatic watch updates](#use-withprogressnotify-to-enable-automatic-watch-updates)
 - [Determining if etcd is sending progress notify events](#determining-if-etcd-is-sending-progress-notify-events)
 - [Risks and Mitigations](#risks-and-mitigations)
@@ -51,6 +51,7 @@ read from etcd.
 - [Option: Serve 1st page of paginated requests from the watch cache](#option-serve-1st-page-of-paginated-requests-from-the-watch-cache)
 - [Option: Enable pagination in the watch cache](#option-enable-pagination-in-the-watch-cache)
 - [Rejected Option: Return unpaginated responses to paginated list requests](#rejected-option-return-unpaginated-responses-to-paginated-list-requests)
+- [What if the watch cache is stale?](#what-if-the-watch-cache-is-stale)
 - [Test Plan](#test-plan)
 - [Rollout Plan](#rollout-plan)
 - [Serving consistent reads from cache](#serving-consistent-reads-from-cache)
@@ -126,7 +127,7 @@ serves the resourceVersion="0" list requests from reflectors today.

 ## Proposal

-### Leveraging the Progress Notify Mechanism
+### Consistent reads from cache

 Guard this by a `WatchCacheConsistentReads` feature gate.

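The routing rule implied by the proposal (a `WatchCacheConsistentReads` gate, consistent LIST requests served from cache, consistent GET requests still served from etcd, as stated later in the KEP) can be sketched in a few lines of Go. This is a minimal illustration under assumptions, not kube-apiserver code: `Backend`, `FromEtcd`, `FromWatchCache`, `routeConsistentRead`, and its boolean parameters are hypothetical names, and only consistent reads (`resourceVersion=""`) are modeled.

```go
package main

// Backend names where a read is served (hypothetical type for illustration).
type Backend string

const (
	FromEtcd       Backend = "etcd"
	FromWatchCache Backend = "watch cache"
)

// routeConsistentRead sketches where a consistent read (resourceVersion="")
// would be served under this proposal. All names here are hypothetical.
func routeConsistentRead(verb string, gateEnabled, watchCacheEnabled bool) Backend {
	// Only consistent LISTs are candidates for the cache, and only when the
	// WatchCacheConsistentReads gate is on and the watch cache is enabled.
	if verb == "list" && gateEnabled && watchCacheEnabled {
		return FromWatchCache
	}
	// Consistent GETs, and every consistent read when the gate or the
	// watch cache is off, keep going directly to etcd.
	return FromEtcd
}
```

Keeping GETs on etcd limits the change to the expensive LIST path, which is where serving from cache pays off.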
@@ -143,7 +144,7 @@ When a consistent LIST request is received and the watch cache is enabled:

 - Get the current revision from etcd for the resource type being served. The returned revision is strongly consistent (guaranteed to be the latest revision via a quorum read).
 - Use the existing `waitUntilFreshAndBlock` function in the watch cache to wait briefly for the watch to catch up to the current revision.
-- If the block times out, skip the cache and serve the request directly from storage (etcd).
+- If the block times out, fail the request (see the "What if the watch cache is stale?" section for details).

 To get the revision we have some options:

@@ -153,11 +154,6 @@ To get the revision we have some options:
 Consistent GET requests will continue to be served directly from etcd. We will
 only serve consistent LIST requests from cache.

-We will explore how long the kube-apiserver should wait for a watch cache to
-catch up to the needed revision. That is, for our chosen progress event interval,
-should the kube-apiserver also wait a similar interval for the required revision
-to become available, or should it fall back to making the request to etcd?
-
 Optional: For some (but not all) of the etcd progress watch events, also create a
 kubernetes "bookmark" watch event and send it to kube-apiserver clients so that
 reflectors and shared informers are kept up-to-date. The benefit of this is that
@@ -300,6 +296,20 @@ LIST resourceVersion="".

 We are not planning to pursue this option.

+### What if the watch cache is stale?
+
+This design requires waiting for a watch cache to catch up to the needed revision
+for consistent reads. If the cache doesn't catch up within some time limit, we
+either fail the request or have a fallback.
+
+If the fallback is to forward consistent reads to etcd, a cascading failure
+is likely to occur if caches become stale and a large number of read requests
+are forwarded to etcd.
+
+Since falling back to etcd won't work, we should fail the requests and rely on
+rate limiting to prevent cascading failure, e.g. the `Retry-After` HTTP header (for
+well-behaved clients) and [Priority and Fairness](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
+
 ### Test Plan

 Correctness:
