add rate limit for zk read rate in gc. #4645
PTAL, thanks. @StevenLuMT @hezhangjian @hangc0276 @nodece @zymap
StevenLuMT
left a comment
I think adding a gcRateLimiter doesn't fundamentally solve the problem, nor is it practically feasible to implement or limit the number of calls.
I think it could be more practical to break down the rate limits into several categories: rate limits for GC calls to ZooKeeper, rate limits for GC file reads, rate limits for GC file writes, or rate limits for other GC resources.
Why? I believe adding a rate limiter and acquiring it prior to each zk operation would suffice.
We haven't encountered any issues with GC file reads or writes. The only thing I need to limit is the rate of zk reads.
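To make the "acquire before each zk operation" idea in the reply above concrete, here is a minimal sketch. It uses a small hand-rolled pacing limiter so that it runs without dependencies; the PR overview states that the actual change uses Guava's RateLimiter. `PacingLimiter`, `MetadataReader`, and `scan` are hypothetical names for illustration and are not BookKeeper code.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.LockSupport;

public class GcReadThrottleSketch {

    /** Hypothetical pacing limiter: hands out one permit every 1/rate seconds. */
    static final class PacingLimiter {
        private final long intervalNanos;
        private long nextFreeNanos = System.nanoTime();

        PacingLimiter(double permitsPerSecond) {
            if (permitsPerSecond <= 0) {
                throw new IllegalArgumentException("rate must be positive");
            }
            this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        }

        /** Blocks until the caller's permit slot is reached. */
        void acquire() {
            long waitUntil;
            synchronized (this) {
                long now = System.nanoTime();
                // Reserve the next free slot; an idle limiter grants immediately.
                waitUntil = Math.max(nextFreeNanos, now);
                nextFreeNanos = waitUntil + intervalNanos;
            }
            long remaining;
            while ((remaining = waitUntil - System.nanoTime()) > 0) {
                LockSupport.parkNanos(remaining);
            }
        }
    }

    /** Hypothetical stand-in for a metadata (ZooKeeper) read. */
    interface MetadataReader {
        String read(long ledgerId);
    }

    /** Acquires one permit before every read, so reads never exceed the limiter's rate. */
    static int scan(long[] ledgerIds, MetadataReader reader, PacingLimiter limiter) {
        int reads = 0;
        for (long ledgerId : ledgerIds) {
            limiter.acquire();
            reader.read(ledgerId);
            reads++;
        }
        return reads;
    }

    public static void main(String[] args) {
        PacingLimiter limiter = new PacingLimiter(50.0); // illustrative rate only
        long[] ledgerIds = {1L, 2L, 3L, 4L, 5L};
        int reads = scan(ledgerIds, id -> "metadata-for-" + id, limiter);
        System.out.println("rate-limited reads issued: " + reads);
    }
}
```

The point of the sketch is that only the read path pays the cost: nothing else in the GC loop has to know about the limiter.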
nodece
left a comment
Overall, LGTM.
LGTM: The previous definition was a bit broad. Now you have sunk to MetadataOpRateLimit, which I understand is still useful and configurable.
By the way, is it possible to add a testcase? If it is possible, please add a testcase to verify the correctness of the function. Thank you @thetumbled
rerun failure checks
Pull Request Overview
This PR adds rate limiting for ZooKeeper read operations during garbage collection to prevent ZooKeeper latency spikes that threaten cluster stability. The unlimited ZooKeeper operations during GC were causing read latencies to reach tens of seconds.
- Adds a new configuration parameter `gcZkOpRateLimit` with a default value of 1000 operations per second
- Implements rate limiting using Google Guava's RateLimiter in the ScanAndCompareGarbageCollector
- Applies rate limiting to ZooKeeper read operations during ledger metadata scanning and overreplicated ledger processing
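One consequence of a fixed rate is a lower bound on how long a metadata scan takes. The sketch below works through that arithmetic for the 1000 ops/s default quoted above; the ledger count is a made-up figure for illustration, and `minScanSeconds` is a hypothetical helper, not part of the PR.

```java
public class GcScanTimeEstimate {

    /** Lower bound, in seconds, for a scan that issues `reads` rate-limited reads. */
    static double minScanSeconds(long reads, double opsPerSecond) {
        if (opsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        return reads / opsPerSecond;
    }

    public static void main(String[] args) {
        double defaultRate = 1000.0;          // default quoted in the overview
        long hypotheticalLedgers = 500_000L;  // illustrative number, not from the PR
        System.out.printf("%d reads at %.0f ops/s need at least %.0f s%n",
                hypotheticalLedgers, defaultRate,
                minScanSeconds(hypotheticalLedgers, defaultRate));
    }
}
```

Operators with very large ledger counts may want to weigh this floor against their GC interval when picking a value.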
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| ServerConfiguration.java | Adds configuration parameter and getter/setter methods for GC ZooKeeper operation rate limiting |
| ScanAndCompareGarbageCollector.java | Implements rate limiting using RateLimiter for ZooKeeper operations during garbage collection |
Comments suppressed due to low confidence (1)
bookkeeper-server/src/main/java/org/apache/bookkeeper/conf/ServerConfiguration.java:495
- The parameter name `gcRateLimit` in the javadoc doesn't match the actual parameter name `gcRateLimit` used in the method signature. Consider using a more descriptive name like `gcZkOpRateLimit` to match the configuration constant and method name.
* @param gcRateLimit
A test case has been added to verify the correctness of the rate limit feature, PTAL, thanks.
rerun failure checks
StevenLuMT
left a comment
LGTM
lhotari
left a comment
- Please check the configuration guidelines at: https://bookkeeper.apache.org/community/coding-guide/#configuration
- This configuration should also be added to the default `conf/bk_server.conf` and `site3/website/docs/reference/config.md`.
- Names should be thought through from the point of view of the person using the config.
- The default values should be thought as best value for people who runs the program without tuning parameters.
- All configuration settings should be added to default configuration file and documented.
- The new configuration should be cross-referenced with the existing "compaction" rate limiting settings since it will also impact the rate.
eolivelli
left a comment
LGTM
lhotari
left a comment
LGTM
* add rate limit for gc.
* fix checkstyle.
* fix conf name and acqurie.
* rename gcZkOpRateLimit to gcMetadataOpRateLimit.
* rename gcZkOpRateLimit to gcMetadataOpRateLimit.
* add return label.
* add test code.
* document conf.
---------
Co-authored-by: fengwenzhi <fengwenzhi.max@bigo.sg>
(cherry picked from commit 417ff16)
Motivation
Each time bookie GC is triggered, ZooKeeper read latency soars to tens of seconds, threatening the stability of the cluster.
The unthrottled ZooKeeper read operations in GC are responsible for the issue.
Changes
Add config `gcMetadataOpRateLimit` to limit the read operation rate in gc.
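As a rough illustration of how such a setting could be picked up from a properties-style file like `conf/bk_server.conf`, the sketch below parses the key with plain `java.util.Properties`. This is not BookKeeper's `ServerConfiguration` API, and the fallback of 1000 is an assumption: it is the default the overview quotes for the earlier name `gcZkOpRateLimit`, which may or may not have carried over after the rename. `readLimit` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class GcRateLimitConfigSketch {

    static final String KEY = "gcMetadataOpRateLimit";
    // Assumption: the default quoted for the pre-rename setting.
    static final int ASSUMED_DEFAULT = 1000;

    /** Reads the limit from properties-style text, falling back to the assumed default. */
    static int readLimit(String confText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(confText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = props.getProperty(KEY, String.valueOf(ASSUMED_DEFAULT));
        return Integer.parseInt(raw.trim());
    }

    public static void main(String[] args) {
        String conf = "# hypothetical override\n" + KEY + "=200\n";
        System.out.println(KEY + " = " + readLimit(conf));
    }
}
```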