-
Notifications
You must be signed in to change notification settings - Fork 754
storelimit: refresh store limit once updating #10131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: okjiang <819421878@qq.com>
server/cluster/cluster.go
Outdated
| // refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter. | ||
| func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) { | ||
| // Only v1 uses StoreRateLimit for AddPeer/RemovePeer. | ||
| if c.opt.GetStoreLimitVersion() != storelimit.VersionV1 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If V2 is enabled, we should not allow modifying the V1 store limit.
|
/retest |
49c317f to
d137e35
Compare
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #10131 +/- ##
=======================================
Coverage 78.58% 78.59%
=======================================
Files 520 520
Lines 69650 69664 +14
=======================================
+ Hits 54737 54750 +13
+ Misses 10975 10966 -9
- Partials 3938 3948 +10
Flags with carried forward coverage won't be shown. Click here to find out more. 🚀 New features to boost your workflow:
|
server/cluster/cluster.go
Outdated
| log.Error("persist store limit meet error", errs.ZapError(err)) | ||
| return err | ||
| } | ||
| for storeID := range c.opt.GetAllStoresLimit() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GetAllStoresLimit always contain all store?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In GetStoreLimit, if use default-add-peer or default-remove-peer, StoreLimit will not be set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you. I changed to use GetStoreIDs
Signed-off-by: okjiang <819421878@qq.com>
|
/retest |
| } | ||
|
|
||
| // refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter. | ||
| func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, do we need to refresh store limit in SetStoreLimitTTL?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems necessary, but I have a question, is SetStoreLimitTTL used somewhere? Is this a overdesign? Or can we delete it directly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know where it is used, either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rleungx How about removing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM if no one uses it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed it #10135 cc
|
/retest |
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
📝 WalkthroughWalkthroughAdds an in-memory synchronization step for store rate limits: a new RaftCluster.refreshStoreRateLimit method reads persisted store-limit config, converts per-minute rates to per-second, and updates the in-memory store limiter after SetStoreLimit and SetAllStoresLimit operations. Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~12 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
server/cluster/cluster.go (2)
2445-2447: Consider performance for large clustersThe loop refreshes limiters for all stores synchronously. While
refreshStoreRateLimitis lightweight, in clusters with thousands of stores this could introduce latency. Consider whether this needs optimization or if the current approach is acceptable given thatSetAllStoresLimitis not a hot path.
2452-2468: Extract magic number to package-level constantThe value
60forstoreBalanceBaseTimerepresents the conversion factor from per-minute to per-second rates. Extract this to a package-level constant with a descriptive comment explaining the conversion.♻️ Suggested refactoring
At the package level (near line 96 with other constants):
+// storeBalanceBaseTime is the time unit base for store limit rate conversion. +// Store limits are configured in operations per minute but the in-memory limiter +// operates in operations per second, requiring division by 60. +const storeBalanceBaseTime = 60.0Then in the method:
func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) { store := c.GetStore(storeID) if store == nil { return } limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit) if !ok { return } - // Schedule config stores the unit in rate-per-minute, but limiter uses rate-per-second. - const storeBalanceBaseTime = float64(60) ratePerSec := c.opt.GetStoreLimitByType(storeID, limitType) / storeBalanceBaseTime if limit.Rate(limitType) != ratePerSec { c.ResetStoreLimit(storeID, limitType, ratePerSec) } }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
server/cluster/cluster.goserver/cluster/cluster_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
server/cluster/cluster.go (3)
pkg/core/storelimit/limit.go (1)
Type(22-22)pkg/core/storelimit/store_limit.go (4)
StoreRateLimit(68-70)StoreRateLimit(86-86)StoreRateLimit(89-91)StoreRateLimit(94-94)pkg/core/store_option.go (1)
ResetStoreLimit(285-295)
🔇 Additional comments (3)
server/cluster/cluster.go (1)
2426-2426: LGTM: Proper synchronization after persistThe call to
refreshStoreRateLimitafter persisting ensures the in-memory limiter reflects the updated configuration, directly addressing issue #10108.server/cluster/cluster_test.go (2)
2908-2915: LGTM: Proper throttling simulationThe change from
0to0.0001correctly simulates a throttled store, sinceStoreRateLimittreats zero as unlimited. The additional token consumption logic properly validates throttling behavior.
2932-2964: LGTM: Comprehensive test for limiter refreshThis test effectively validates the fix for issue #10108 by:
- Simulating a throttled store with exhausted tokens
- Increasing the store limit via the API
- Verifying the in-memory limiter is refreshed and the store becomes available
The test structure is clear and directly addresses the reported bug scenario.
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lhy1024, rleungx The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/retest |
Signed-off-by: okjiang <819421878@qq.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
server/cluster/cluster.go (2)
2465-2465: Consider epsilon-based comparison for floating-point values.Using
!=for direct float comparison can be unreliable due to floating-point precision. While this may work in practice for the current use case, consider using an epsilon-based comparison for robustness:const epsilon = 1e-9 if math.Abs(limit.Rate(limitType) - ratePerSec) > epsilon { c.ResetStoreLimit(storeID, limitType, ratePerSec) }
2458-2461: Silent no-op on type mismatch may hide configuration issues.When the store's limiter is not a
*storelimit.StoreRateLimit, the method silently returns without updating or logging. If this is intentional (e.g., for stores using different limiter types like SlidingWindows), consider adding a debug log to aid troubleshooting.📝 Optional: Add debug logging
func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) { store := c.GetStore(storeID) if store == nil { return } limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit) if !ok { + log.Debug("store limiter is not StoreRateLimit type, skipping refresh", + zap.Uint64("store-id", storeID), + zap.String("limit-type", limitType.String())) return }
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
server/cluster/cluster.go
🧰 Additional context used
🧬 Code graph analysis (1)
server/cluster/cluster.go (2)
pkg/core/storelimit/store_limit.go (4)
StoreRateLimit(68-70)StoreRateLimit(86-86)StoreRateLimit(89-91)StoreRateLimit(94-94)pkg/core/store_option.go (1)
ResetStoreLimit(285-295)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
- GitHub Check: tso-function-test
- GitHub Check: chunks (7, Client Integration Test)
- GitHub Check: chunks (4, Tests(1))
- GitHub Check: chunks (8, TSO Integration Test)
- GitHub Check: chunks (6, Tools Test)
- GitHub Check: chunks (9, Microservice Integration(!TSO))
- GitHub Check: chunks (10, Microservice Integration(TSO))
- GitHub Check: chunks
- GitHub Check: chunks (5, Tests(2))
- GitHub Check: chunks (2, Unit Test(2))
- GitHub Check: chunks (3, Unit Test(3))
- GitHub Check: chunks (1, Unit Test(1))
- GitHub Check: statics
🔇 Additional comments (3)
server/cluster/cluster.go (3)
2426-2426: LGTM: Refresh call correctly placed after persist.The call to
refreshStoreRateLimitafter successful persistence ensures the in-memory limiter stays synchronized with the persisted configuration, addressing the core issue in #10108.
2445-2447: No issues identified. TheGetStoreIDs()method is correctly inherited through the embedded*core.BasicClusterfield, which itself embeds*StoresInfo. The method returns a slice of all store IDs currently tracked in the cluster and is properly used in the loop to refresh rate limits for all stores.
2466-2466: No action required —ResetStoreLimitmethod is correctly accessible through embedded delegation.The
RaftClusterembeds*core.BasicCluster, which in turn embeds*StoresInfo. TheStoresInfotype definesResetStoreLimit(storeID uint64, limitType storelimit.Type, ratePerSec ...float64), and this method is automatically promoted and accessible onRaftClusterthrough Go's embedded field method forwarding. The call at line 2466 is valid.
|
/retest |
1 similar comment
|
/retest |
close tikv#10108 Signed-off-by: okjiang <819421878@qq.com>
close tikv#10108 Signed-off-by: okjiang <819421878@qq.com>

What problem does this PR solve?
Issue Number: Close #10108
What is changed and how does it work?
Check List
Tests
Code changes
Side effects
Related changes
pingcap/docs/pingcap/docs-cn:pingcap/tiup:Release note
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.