Commit 2cd9e56

Merge pull request #5345 from serathius/kep-2340-feedback

Revert decision to return retry-after based on feedback

2 parents 3a4c03f + 6c7f649

File tree

1 file changed (+1, −9 lines):
  • keps/sig-api-machinery/2340-Consistent-reads-from-cache

keps/sig-api-machinery/2340-Consistent-reads-from-cache/README.md

Lines changed: 1 addition & 9 deletions
```diff
@@ -245,16 +245,9 @@ This design requires wait for a watch cache to catch up to the needed revision.
 There might be situation where watch is not providing updates causing watch cache to be permanently stale.
 For example if watch stream is clogged, or when using etcd version v3.3 and older that don't provide progress notifications.
 
-If the watch cache doesn't catch up in within some time limit we either fail the request or have a fallback.
-
-For Beta we have implemented a fallback mechanism that will revert to normal behavior before we reach timeout.
+If the watch cache doesn't catch up in within some time limit we fallback to etcd.
 To monitor fallback rate we introduced `apiserver_watch_cache_consistent_read_total` with `fallback` and `success` labels.
 
-With qualification results showing that fallback is needed and we can go back to the original design.
-We should fail the requests and rely on rate limiting to prevent cascading failure. I.e. `Retry-After` HTTP header (for
-well-behaved clients) and [Apiserver Priority and Fairness](https://github.com/kubernetes/enhancements/blob/master/keps/sig-api-machinery/20190228-priority-and-fairness.md).
-The main reason for that is the added complexity and incorrect handling in APF that assumes that request cost doesn't change.
-
 ### How to debug cache issue?
 
 Let's present how the system currently works
@@ -341,7 +334,6 @@ After almost a year in Beta, running enabled default, we have collected the foll
 * In 99.9% of cases watch cache became fresh enough within 110ms.
 * Only 0.001% of waits for fresh cache took more than 250ms.
 * Consistent reads reached 5 nine's of availability, meaning cache was able to become fresh before timeout (3s) in 99.999% of cases.
-* Main cause of fallback was rolling update of etcd that forces a watch cache reinitialization.
 * We have identified and addressed one issue https://github.com/kubernetes/kubernetes/issues/129931
 
 The above results show that consistent reads from cache are stable and reliable. We are confident in promoting this feature to Stable.
```
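The change above settles the design on the Beta behavior: when the watch cache cannot catch up to the required revision before the time limit, the apiserver serves the consistent read from etcd instead of rejecting the request with `Retry-After`, and the outcome is counted in `apiserver_watch_cache_consistent_read_total` under the `fallback` and `success` labels. Below is a minimal, hedged Go sketch of that wait-then-fallback pattern; `waitUntilFresh`, `readFromCache`, `readFromEtcd`, and `recordConsistentRead` are hypothetical placeholders, not the real apiserver code, and the label name used in the sketch is an assumption.

```go
// Hedged sketch of the KEP's fallback pattern: wait for the watch cache to
// observe the required revision within a time limit, otherwise serve the
// consistent read from etcd. Not the actual apiserver implementation.
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// consistentRead tries to serve a consistent read from the watch cache and
// falls back to etcd if the cache does not become fresh before the timeout
// (the KEP mentions a 3s timeout; a shorter one is used here for the demo).
func consistentRead(ctx context.Context, requiredRevision int64, timeout time.Duration) (string, error) {
	waitCtx, cancel := context.WithTimeout(ctx, timeout)
	defer cancel()

	if err := waitUntilFresh(waitCtx, requiredRevision); err != nil {
		// Cache is still stale past the deadline: fall back to etcd rather
		// than failing the request, which is the behavior this commit keeps.
		recordConsistentRead("fallback")
		return readFromEtcd(ctx)
	}
	recordConsistentRead("success")
	return readFromCache(requiredRevision)
}

// --- placeholders below exist only to make the sketch self-contained ---

// waitUntilFresh pretends the cache never catches up, to demonstrate the
// fallback path.
func waitUntilFresh(ctx context.Context, rev int64) error {
	<-ctx.Done()
	return errors.New("watch cache did not reach revision before deadline")
}

func readFromCache(rev int64) (string, error)          { return "data-from-cache", nil }
func readFromEtcd(ctx context.Context) (string, error) { return "data-from-etcd", nil }

// recordConsistentRead stands in for incrementing a counter like
// apiserver_watch_cache_consistent_read_total; the label name is illustrative.
func recordConsistentRead(result string) {
	fmt.Printf("apiserver_watch_cache_consistent_read_total{result=%q}++\n", result)
}

func main() {
	data, err := consistentRead(context.Background(), 12345, 100*time.Millisecond)
	fmt.Println(data, err)
}
```

The point of the pattern is that a stale cache degrades to the pre-cache behavior (a read served directly from etcd) rather than turning into request failures, which is why the KEP can drop the `Retry-After` based rejection path.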
