Commit 9afa3e0

Update debugging strategy and beta criteria
1 parent 41381be commit 9afa3e0

2 files changed (+52, -28 lines changed)

keps/sig-api-machinery/2340-Consistent-reads-from-cache/README.md

Lines changed: 49 additions & 25 deletions
@@ -49,6 +49,7 @@ read from etcd.
 - [Troubleshooting](#troubleshooting)
 - [Implementation History](#implementation-history)
 - [Alternatives](#alternatives)
+- [Per-request override](#per-request-override)
 <!-- /toc -->
 
 ## Summary
@@ -306,36 +307,41 @@ The table for watch requests look like the following
 [validation]: https://github.com/kubernetes/kubernetes/blob/release-1.30/staging/src/k8s.io/apimachinery/pkg/apis/meta/internalversion/validation/validation.go#L28
 [etcd resolution]: https://github.com/kubernetes/kubernetes/blob/release-1.30/staging/src/k8s.io/apiserver/pkg/storage/etcd3/store.go#L589-L627
 
-For such situations we will provide users with following tools:
+As presented in the above tables, the semantics of a given request served from
+etcd and from watchcache differ slightly. This is a consequence of the fact that:
+* etcd design supports only `Exact` semantics - it allows for a consistent list
+from a given resource version (either a specific value or "now").
+The semantics of `NotOlderThan` is implemented as getting a consistent list from
+"now" and checking if it satisfies the condition.
+* watchcache design supports only `NotOlderThan` semantics - it always waits
+until its resource version is at least as fresh as the requested resource version
+and then returns the result from its current state.
+
+For the above reason, sending the same request to etcd and watchcache, especially
+when cluster state is changing, may legitimately return different results.
+
+In order to allow debugging results returned from watchcache in a running cluster,
+the only reasonable procedure is:
+* send a request that is served from watchcache
+* send a request setting `ResourceVersionMatch=Exact` and `ResourceVersion` to the value
+returned by the request in the previous step
+* compare the two results
+
+The existing API already allows us to achieve this.
+
+To further allow debugging and improve confidence we will provide users with the
+following tools:
 * a dedicated `apiserver_watch_cache_read_wait` metric to detect a problem with
 watch cache.
-* a per-request override to disable watch cache to allow debugging.
+* an `inconsistency detector` that, for requests served from watchcache, will be able
+to send a request to etcd (as described above) and compare the results
 
 Metric `apiserver_watch_cache_read_wait` will measure wait time experienced by
 reads for watch cache to become fresh. If user notices a latency request in
 they can use this metric to confirm that the issue is caused by watch cache.
 
-Per request override should allow user to compare request results without
-impacting other requests or requiring to redeploy whole cluster. The exact
-details of override API will be clarified during API review. In healthy
-situation, using this override should not cause any impact on the response,
-however it might increase resource usage. In our tests cpu load could increase
-tenfold. To prevent abuse access to it should be limited to users with
-`cluster-admin` role, rejecting the request otherwise.
-
-In case of issues with watch cache users can use the `ConsistentListFromCache`
-feature flag to disable the feature or the existing `--watch-cache` flag to
-disable the whole watch cache.
-
-We prefer to provide users an explicit flag and per-request override over an
-automatic fallback. It gives users full control and visibility into how request
-are handled and ensures accurate APF cost estimates. We expect watch being
-starved to happen very rarely, meaning its logic needs to be very simple to
-ensure it works properly. A simple fallback will not bring much benefit over
-what user can do manually. It will just make the harder to understand and
-predict behavior. APF estimates cost just based on request parameters,
-before it is passed to storage. If fallback was based on state of watch cache,
-cost of request would change after the APF decision increasing the risk of overload.
+The `inconsistency detector` will be enabled in our CI to detect issues with
+the introduced mechanism.
 
 ## Design Details
 
@@ -432,7 +438,7 @@ Comparing resource usage and latency with and without consistent list from watch
 
 - Feature is enabled by default.
 - Metric `apiserver_watch_cache_read_wait` is implemented.
-- Per-request watch cache opt-out is implemented.
+- Inconsistency detector is implemented and enabled in CI.
 - Deprecate support of etcd v3.3.X, v3.4.24 and v3.5.7
 
 #### GA
#### GA
@@ -582,7 +588,7 @@ Use per-request override to compare latency when reading from watch cache vs etc
 ## Implementation History
 
 * 1.28 - Alpha
-* 1.30 - Beta
+* 1.31 - Beta
 
 ## Alternatives
 
@@ -601,3 +607,21 @@ Do a dynamic fallback based on watch cache wait time.
 
 - We expect watch being starved to happen very rarely, meaning its logic needs to be very simple to ensure it works properly.
 - Simple fallback will rather not do a better job then just a manual fallback.
+
+### Per-request override
+
+To enable debugging, we considered introducing a per-request override to disable
+watchcache and force the request to be served from etcd. This would allow us
+to compare request results without impacting other requests or requiring a
+redeploy of the whole cluster. However, as described in the KEP itself, the same
+request served from watchcache and from etcd may legitimately return
+different results. As a result, we decided that the proposed debugging mechanism
+better serves its purpose.
+
+We also considered an automatic fallback. However, we expect watch being
+starved to happen very rarely, meaning its logic needs to be very simple to
+ensure it works properly. A simple fallback will not bring much benefit over
+what a user can do manually. It will just make the behavior harder to understand
+and predict. APF estimates cost based only on request parameters, before the
+request is passed to storage. If the fallback were based on the state of the watch cache,
+the cost of a request would change after the APF decision, increasing the risk of overload.

keps/sig-api-machinery/2340-Consistent-reads-from-cache/kep.yaml

Lines changed: 3 additions & 3 deletions
@@ -16,17 +16,17 @@ approvers:
   - "@wojtek-t"
 editor: TBD
 creation-date: 2019-12-10
-last-updated: 2023-06-15
+last-updated: 2024-05-09
 status: implementable
 see-also:
   - "/keps/sig-api-machinery/3157-watch-list"
 replaces:
 superseded-by:
 stage: beta
-latest-milestone: "v1.30"
+latest-milestone: "v1.31"
 milestone:
   alpha: "v1.28"
-  beta: "v1.30"
+  beta: "v1.31"
 feature-gates:
   - name: ConsistentListFromCache
     components:
