-
Notifications
You must be signed in to change notification settings - Fork 291
chore: Use watch APIs to list k8s resources #716
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
chore: Use watch APIs to list k8s resources #716
Conversation
Signed-off-by: Andrii Korotkov <[email protected]>
Signed-off-by: Andrii Korotkov <[email protected]>
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #716 +/- ##
==========================================
- Coverage 54.26% 53.69% -0.57%
==========================================
Files 64 64
Lines 6164 6488 +324
==========================================
+ Hits 3345 3484 +139
- Misses 2549 2727 +178
- Partials 270 277 +7 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Signed-off-by: Andrii Korotkov <[email protected]>
|
Is there some way this can be benchmarked? Like what are the expected performance improvements with this method (I imagine they are large, but ballpark numbers)? |
I'll try to think of something. |
if opts.ResourceVersion == "" { | ||
opts.ResourceVersion = "0" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Required by the list options construction function's code, otherwise it would return false as success.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have any concerns with large clusters and setting ResourceVersion=0
along with a limit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The kube-apiserver's caching mechanism (watchcache) efficiently serves requests for the latest observed state. However, LIST requests for previous states (e.g., via pagination or by specifying a resourceVersion) often bypass this cache and are served directly from etcd. This direct etcd access significantly increases performance costs and can lead to stability issues, particularly with large resources, due to memory pressure from transferring large data blobs.
This KEP proposes an enhancement to the kube-apiserver's watch cache to generate B-tree snapshots, allowing it to serve LIST requests for previous states directly from the cache. This change aims to improve API server performance and stability. To support this snapshotting mechanism, this proposal also details changes to the watch cache's compaction behavior to maintain Kubernetes Conformance and introduces an automatic cache inconsistency detection mechanism.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
listResourcesUsingWatchAPI atomic.Int32 | ||
listResourcesUsingRegularAPI atomic.Int32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we shouldn't do this, because listResources gets called a lot over the lifetime of a controller run (i.e. every time the cache expires or there's a retry due to an error). I don't think it's worth it just for the test. And maybe we can find a different way to test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree that it's not ideal, I'll think how to test better.
t.Run(tc.name, func(t *testing.T) { | ||
if tc.listUsingWatchAPIs { | ||
// Enable WatchListClient in particular. Setting via env variable here is too late. | ||
clientfeatures.ReplaceFeatureGates(AlwaysEnabledGates{}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this enables every feature gate that can be enabled. Can we instead enable just the watch-list gate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more complicated, and it's just for this test, so I'd rather keep this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although shouldn't be too hard to construct gates more precisely.
@andrii-korotkov and @crenshaw-dev are ready to merge this? So I can ask my team to do a load-test with this patch |
Use watch APIs to list resources instead of paginated regular APIs.