keps/sig-api-machinery/2340-Consistent-reads-from-cache/README.md
1 addition & 9 deletions

@@ -245,16 +245,9 @@ This design requires waiting for a watch cache to catch up to the needed revision.
There might be situations where the watch is not providing updates, causing the watch cache to be permanently stale.
For example, if the watch stream is clogged, or when using etcd v3.3 or older, which do not provide progress notifications.

- If the watch cache doesn't catch up within some time limit, we either fail the request or have a fallback.
-
- For Beta we have implemented a fallback mechanism that reverts to normal behavior before we reach the timeout.
+ If the watch cache doesn't catch up within some time limit, we fall back to etcd.
To monitor the fallback rate we introduced `apiserver_watch_cache_consistent_read_total` with `fallback` and `success` labels.

- With qualification results showing that fallback is not needed, we can go back to the original design.
- We should fail the requests and rely on rate limiting to prevent cascading failure, i.e. the `Retry-After` HTTP header (for
- well-behaved clients) and [Apiserver Priority and Fairness](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
- The main reason for that is the added complexity and incorrect handling in APF, which assumes that request cost doesn't change.
-
### How to debug cache issue?

Let's present how the system currently works
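
A minimal Go sketch of the wait-then-fallback behavior described in the hunk above, assuming hypothetical `watchCache` and `consistentList` helpers rather than the real apiserver types:

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// watchCache stands in for the apiserver watch cache; it only tracks the
// latest revision observed from the watch stream.
type watchCache struct{ currentRev int64 }

// waitUntilFresh blocks until the cache has observed at least rev, or the
// context expires. Real code is event driven; polling keeps the sketch short.
func (c *watchCache) waitUntilFresh(ctx context.Context, rev int64) error {
	for c.currentRev < rev {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-time.After(10 * time.Millisecond):
			c.currentRev++ // pretend watch events keep arriving
		}
	}
	return nil
}

// consistentList serves a consistent read from the cache, falling back to a
// direct (quorum) etcd read when the cache cannot catch up within the time
// limit. The returned outcome mirrors the `success`/`fallback` labels of
// apiserver_watch_cache_consistent_read_total.
func consistentList(ctx context.Context, c *watchCache, requiredRev int64,
	timeout time.Duration, listFromEtcd func(context.Context) error) (string, error) {

	waitCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	err := c.waitUntilFresh(waitCtx, requiredRev)
	if err == nil {
		return "success", nil // cache is fresh enough, serve from it
	}
	if !errors.Is(err, context.DeadlineExceeded) {
		return "", err // request cancelled or some other failure
	}
	// The cache did not become fresh before the timeout: fall back to etcd.
	return "fallback", listFromEtcd(ctx)
}

func main() {
	cache := &watchCache{currentRev: 100}
	outcome, err := consistentList(context.Background(), cache, 105, 3*time.Second,
		func(ctx context.Context) error { return nil }) // stand-in for an etcd list
	fmt.Println(outcome, err)
}
```

With instrumentation along these lines, the fallback rate is simply the share of `fallback` outcomes among all consistent reads.
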
@@ -341,7 +334,6 @@ After almost a year in Beta, running enabled by default, we have collected the following
* In 99.9% of cases the watch cache became fresh enough within 110ms.
* Only 0.001% of waits for a fresh cache took more than 250ms.
* Consistent reads reached five nines of availability, meaning the cache was able to become fresh before the timeout (3s) in 99.999% of cases.
- * The main cause of fallback was a rolling update of etcd, which forces a watch cache reinitialization.
* We have identified and addressed one issue: https://github.com/kubernetes/kubernetes/issues/129931

The above results show that consistent reads from cache are stable and reliable. We are confident in promoting this feature to Stable.
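
As a usage note, the fallback rate mentioned earlier can be estimated from the metric via the Prometheus HTTP API. The Prometheus address and the `fallback="true"` label value below are assumptions for illustration, not confirmed details of the metric:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"github.com/prometheus/client_golang/api"
	v1 "github.com/prometheus/client_golang/api/prometheus/v1"
)

func main() {
	// Hypothetical Prometheus endpoint scraping the kube-apiserver.
	client, err := api.NewClient(api.Config{Address: "http://prometheus.example:9090"})
	if err != nil {
		panic(err)
	}
	promAPI := v1.NewAPI(client)

	// Share of consistent reads that fell back to etcd over the last hour.
	query := `sum(rate(apiserver_watch_cache_consistent_read_total{fallback="true"}[1h]))
	 / sum(rate(apiserver_watch_cache_consistent_read_total[1h]))`

	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()
	result, warnings, err := promAPI.Query(ctx, query, time.Now())
	if err != nil {
		panic(err)
	}
	if len(warnings) > 0 {
		fmt.Println("warnings:", warnings)
	}
	fmt.Println("fallback rate:", result)
}
```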