Conversation

@okJiang
Member

@okJiang okJiang commented Jan 5, 2026

What problem does this PR solve?

Issue Number: Close #10108

What is changed and how does it work?

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Code changes

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

Release note

None.

Summary by CodeRabbit

  • Bug Fixes

    • In-memory store rate limiters now refresh when persisted limits change, including converting per-minute configs to per-second rates and applying updates for all stores so throttling behavior stays consistent.
  • Tests

    • Added tests validating throttling behavior and that limiter state updates when store limits are changed, including recovery from a throttled state.
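
The per-minute to per-second conversion mentioned in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: the names `secondsPerMinute` and `perMinuteToPerSecond` are hypothetical.

```go
package main

import "fmt"

// secondsPerMinute is the conversion factor between the configured
// per-minute limit and the limiter's per-second rate (hypothetical name).
const secondsPerMinute = 60.0

// perMinuteToPerSecond converts a configured per-minute store limit
// into the per-second rate an in-memory limiter would use.
func perMinuteToPerSecond(perMinute float64) float64 {
	return perMinute / secondsPerMinute
}

func main() {
	fmt.Println(perMinuteToPerSecond(15)) // 0.25
}
```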

Signed-off-by: okjiang <819421878@qq.com>
@ti-chi-bot ti-chi-bot bot added release-note-none Denotes a PR that doesn't merit a release note. do-not-merge/needs-triage-completed dco-signoff: yes Indicates the PR's author has signed the dco. size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed do-not-merge/needs-triage-completed labels Jan 5, 2026
// refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter.
func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
	// Only v1 uses StoreRateLimit for AddPeer/RemovePeer.
	if c.opt.GetStoreLimitVersion() != storelimit.VersionV1 {
Member

Is it necessary?

Member Author

I think we can keep it.

Member

If V2 is enabled, we should not allow modifying the V1 store limit.

@okJiang
Member Author

okJiang commented Jan 5, 2026

/retest

Signed-off-by: okjiang <819421878@qq.com>
@okJiang okJiang force-pushed the update-storelimit branch from 49c317f to d137e35 Compare January 6, 2026 03:47
@codecov

codecov bot commented Jan 6, 2026

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.59%. Comparing base (88ddf8a) to head (447a623).
⚠️ Report is 1 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #10131   +/-   ##
=======================================
  Coverage   78.58%   78.59%           
=======================================
  Files         520      520           
  Lines       69650    69664   +14     
=======================================
+ Hits        54737    54750   +13     
+ Misses      10975    10966    -9     
- Partials     3938     3948   +10     
Flag Coverage Δ
unittests 78.59% <85.71%> (+<0.01%) ⬆️


@okJiang
Member Author

okJiang commented Jan 6, 2026

/cc @rleungx @lhy1024

@ti-chi-bot ti-chi-bot bot requested review from lhy1024 and rleungx January 6, 2026 04:01
		log.Error("persist store limit meet error", errs.ZapError(err))
		return err
	}
	for storeID := range c.opt.GetAllStoresLimit() {
Contributor

Does GetAllStoresLimit always contain all stores?

Member Author

image

Almost

Contributor

In GetStoreLimit, when default-add-peer or default-remove-peer is used, StoreLimit will not be set.

Member Author

Thank you. I changed it to use GetStoreIDs.
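
The concern in this thread can be illustrated with a small sketch. It assumes a config that keeps explicit per-store overrides and falls back to a default for every other store; `limitsConfig`, `addPeerLimit`, and `storesMissingFromOverrides` are hypothetical names, not PD's API. Iterating over the overrides alone would skip the stores that rely on the default, which is why iterating over the full list of store IDs is safer.

```go
package main

import "fmt"

// limitsConfig is a hypothetical config: explicit per-store overrides
// plus a default that applies to every store without an override.
type limitsConfig struct {
	defaultAddPeer float64
	overrides      map[uint64]float64
}

// addPeerLimit returns the override if present, else the default.
// Falling back to the default does not create an override entry.
func (c limitsConfig) addPeerLimit(storeID uint64) float64 {
	if v, ok := c.overrides[storeID]; ok {
		return v
	}
	return c.defaultAddPeer
}

// storesMissingFromOverrides lists the stores that a loop over the
// overrides map alone would never visit.
func storesMissingFromOverrides(cfg limitsConfig, storeIDs []uint64) []uint64 {
	var missing []uint64
	for _, id := range storeIDs {
		if _, ok := cfg.overrides[id]; !ok {
			missing = append(missing, id)
		}
	}
	return missing
}

func main() {
	cfg := limitsConfig{defaultAddPeer: 15, overrides: map[uint64]float64{1: 30}}
	fmt.Println(storesMissingFromOverrides(cfg, []uint64{1, 2, 3})) // [2 3]
}
```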

Signed-off-by: okjiang <819421878@qq.com>
@okJiang
Member Author

okJiang commented Jan 6, 2026

/retest

}

// refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter.
func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
Contributor

BTW, do we need to refresh store limit in SetStoreLimitTTL?

Member Author

It seems necessary, but I have a question: is SetStoreLimitTTL used anywhere? Is this overdesign? Or can we delete it directly?

Contributor

I don't know where it is used, either.

Member Author

@rleungx How about removing it?

Member

@rleungx rleungx Jan 7, 2026

LGTM if no one uses it.

Member Author

Removed it in #10135. cc

@okJiang
Member Author

okJiang commented Jan 8, 2026

/retest

@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 8, 2026
@rleungx
Member

rleungx commented Jan 9, 2026

@coderabbitai full review

@coderabbitai

coderabbitai bot commented Jan 9, 2026

✅ Actions performed

Full review triggered.

@coderabbitai

coderabbitai bot commented Jan 9, 2026

Walkthrough

Adds an in-memory synchronization step for store rate limits: a new RaftCluster.refreshStoreRateLimit method reads persisted store-limit config, converts per-minute rates to per-second, and updates the in-memory store limiter after SetStoreLimit and SetAllStoresLimit operations.
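
The flow described in the walkthrough can be sketched roughly as below. This is a simplified model under stated assumptions, not the code in server/cluster/cluster.go: `cluster`, `limiter`, `setStoreLimit`, and `refresh` are hypothetical stand-ins, and persistence is reduced to a map write.

```go
package main

import "fmt"

// limiter is a hypothetical stand-in for the in-memory store limiter.
type limiter struct {
	ratePerSec float64
}

// cluster is a hypothetical stand-in for RaftCluster. persisted holds
// per-minute limits from the config; limiters holds in-memory state.
type cluster struct {
	persisted map[uint64]float64
	limiters  map[uint64]*limiter
}

// newCluster creates a cluster with an empty limiter for each store ID.
func newCluster(storeIDs ...uint64) *cluster {
	c := &cluster{
		persisted: make(map[uint64]float64),
		limiters:  make(map[uint64]*limiter),
	}
	for _, id := range storeIDs {
		c.limiters[id] = &limiter{}
	}
	return c
}

// setStoreLimit persists the per-minute limit, then refreshes the limiter.
func (c *cluster) setStoreLimit(storeID uint64, perMinute float64) {
	c.persisted[storeID] = perMinute
	c.refresh(storeID)
}

// refresh converts the persisted per-minute limit to per-second and
// updates the in-memory limiter only when the value changed.
func (c *cluster) refresh(storeID uint64) {
	l, ok := c.limiters[storeID]
	if !ok {
		return // unknown store: nothing to refresh
	}
	ratePerSec := c.persisted[storeID] / 60
	if l.ratePerSec != ratePerSec {
		l.ratePerSec = ratePerSec
	}
}

func main() {
	c := newCluster(1)
	c.setStoreLimit(1, 120)
	fmt.Println(c.limiters[1].ratePerSec) // 2
}
```

Without the `refresh` call in `setStoreLimit`, the persisted value and the limiter would drift apart, which is the symptom reported in #10108.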

Changes

Cohort / File(s) | Summary
Store Rate Limit Synchronization (server/cluster/cluster.go) | Adds refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) and invokes it after SetStoreLimit and SetAllStoresLimit. Converts persisted per-minute rates to per-second and resets the in-memory limiter when values change; no-op for non-StoreRateLimit types.
Tests / Behavior Verification (server/cluster/cluster_test.go) | Updates TestCheckCache to use a small positive AddPeer limit (0.0001) and adds runtime checks for token consumption/throttling. Adds TestStoreLimitChangeRefreshLimiter to verify throttling at very low limit and that increasing the persisted limit refreshes the in-memory limiter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

Suggested labels

release-note, size/L

Suggested reviewers

  • rleungx
  • lhy1024
  • niubell
  • bufferflies

Poem

🐰 A tiny token hops in line,

Persisted rates now meet the mind.
From disk to memory the numbers flow,
Throttles loosen, regions grow.
Hooray — refreshed, the cluster's fine! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
Check name | Status | Explanation | Resolution
Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Title check | ✅ Passed | The title 'storelimit: refresh store limit once updating' clearly relates to the main change: adding in-memory synchronization to refresh store limits when they are updated.
Description check | ✅ Passed | The PR description includes the required issue number (Close #10108), references the linked issue, and indicates unit tests were added. Template sections are appropriately filled or marked as not applicable.
Linked Issues check | ✅ Passed | The code changes directly address issue #10108 by implementing the refreshStoreRateLimit mechanism to ensure updated store limits take effect immediately in the in-memory limiter, preventing stores from being incorrectly filtered after limit updates.
Out of Scope Changes check | ✅ Passed | All changes are narrowly scoped to store limit synchronization: a new refreshStoreRateLimit method, updates to SetStoreLimit/SetAllStoresLimit to call it, and tests validating the refresh behavior. No unrelated modifications detected.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
server/cluster/cluster.go (2)

2445-2447: Consider performance for large clusters

The loop refreshes limiters for all stores synchronously. While refreshStoreRateLimit is lightweight, in clusters with thousands of stores this could introduce latency. Consider whether this needs optimization or if the current approach is acceptable given that SetAllStoresLimit is not a hot path.


2452-2468: Extract magic number to package-level constant

The value 60 for storeBalanceBaseTime represents the conversion factor from per-minute to per-second rates. Extract this to a package-level constant with a descriptive comment explaining the conversion.

♻️ Suggested refactoring

At the package level (near line 96 with other constants):

+// storeBalanceBaseTime is the time unit base for store limit rate conversion.
+// Store limits are configured in operations per minute but the in-memory limiter
+// operates in operations per second, requiring division by 60.
+const storeBalanceBaseTime = 60.0

Then in the method:

 func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
 	store := c.GetStore(storeID)
 	if store == nil {
 		return
 	}
 	limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit)
 	if !ok {
 		return
 	}
-	// Schedule config stores the unit in rate-per-minute, but limiter uses rate-per-second.
-	const storeBalanceBaseTime = float64(60)
 	ratePerSec := c.opt.GetStoreLimitByType(storeID, limitType) / storeBalanceBaseTime
 	if limit.Rate(limitType) != ratePerSec {
 		c.ResetStoreLimit(storeID, limitType, ratePerSec)
 	}
 }
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6173d50 and 59720d0.

📒 Files selected for processing (2)
  • server/cluster/cluster.go
  • server/cluster/cluster_test.go
🧰 Additional context used
🧬 Code graph analysis (1)
server/cluster/cluster.go (3)
pkg/core/storelimit/limit.go (1)
  • Type (22-22)
pkg/core/storelimit/store_limit.go (4)
  • StoreRateLimit (68-70)
  • StoreRateLimit (86-86)
  • StoreRateLimit (89-91)
  • StoreRateLimit (94-94)
pkg/core/store_option.go (1)
  • ResetStoreLimit (285-295)
🔇 Additional comments (3)
server/cluster/cluster.go (1)

2426-2426: LGTM: Proper synchronization after persist

The call to refreshStoreRateLimit after persisting ensures the in-memory limiter reflects the updated configuration, directly addressing issue #10108.

server/cluster/cluster_test.go (2)

2908-2915: LGTM: Proper throttling simulation

The change from 0 to 0.0001 correctly simulates a throttled store, since StoreRateLimit treats zero as unlimited. The additional token consumption logic properly validates throttling behavior.
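
To illustrate why the test needs a small positive limit rather than zero, here is a toy limiter that follows the convention stated in the comment above (a zero rate means unlimited). `toyLimiter` is hypothetical and is not PD's StoreRateLimit.

```go
package main

import "fmt"

// toyLimiter is a hypothetical limiter, not PD's StoreRateLimit.
type toyLimiter struct {
	ratePerSec float64
	tokens     float64
}

// available reports whether cost tokens could be taken right now.
// A zero rate is treated as "unlimited", so it never throttles.
func (l *toyLimiter) available(cost float64) bool {
	if l.ratePerSec == 0 {
		return true
	}
	return l.tokens >= cost
}

func main() {
	unlimited := &toyLimiter{ratePerSec: 0}
	throttled := &toyLimiter{ratePerSec: 0.0001 / 60, tokens: 0}

	fmt.Println(unlimited.available(1)) // true
	fmt.Println(throttled.available(1)) // false
}
```

Under this convention a limit of 0 cannot be used to simulate a throttled store, while a tiny positive limit with no tokens left can.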


2932-2964: LGTM: Comprehensive test for limiter refresh

This test effectively validates the fix for issue #10108 by:

  1. Simulating a throttled store with exhausted tokens
  2. Increasing the store limit via the API
  3. Verifying the in-memory limiter is refreshed and the store becomes available

The test structure is clear and directly addresses the reported bug scenario.
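
The three steps above can be sketched with a toy token bucket. This is not the test added in the PR: `bucket`, `take`, and `reset` are hypothetical, and refilling the bucket on reset is an assumption of this sketch.

```go
package main

import "fmt"

// bucket is a hypothetical token bucket without time-based refill,
// which keeps the example deterministic.
type bucket struct {
	capacity float64
	tokens   float64
}

// take consumes n tokens if available and reports success.
func (b *bucket) take(n float64) bool {
	if b.tokens < n {
		return false
	}
	b.tokens -= n
	return true
}

// reset installs a new capacity and refills the bucket
// (the refill is an assumption of this sketch).
func (b *bucket) reset(capacity float64) {
	b.capacity = capacity
	b.tokens = capacity
}

func main() {
	b := &bucket{capacity: 1, tokens: 1}

	fmt.Println(b.take(1)) // true: the only token is consumed
	fmt.Println(b.take(1)) // false: the store is now throttled

	b.reset(10) // the limit is raised and the limiter is refreshed

	fmt.Println(b.take(1)) // true: the store is available again
}
```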

@ti-chi-bot ti-chi-bot bot added the lgtm label Jan 9, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jan 9, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lhy1024, rleungx


@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 9, 2026
@ti-chi-bot
Contributor

ti-chi-bot bot commented Jan 9, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-01-08 09:25:37.312694832 +0000 UTC m=+867093.131003254: ☑️ agreed by lhy1024.
  • 2026-01-09 06:18:29.461459671 +0000 UTC m=+942265.279768103: ☑️ agreed by rleungx.

@rleungx
Member

rleungx commented Jan 9, 2026

/retest

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (2)
server/cluster/cluster.go (2)

2465-2465: Consider epsilon-based comparison for floating-point values.

Using != for direct float comparison can be unreliable due to floating-point precision. While this may work in practice for the current use case, consider using an epsilon-based comparison for robustness:

const epsilon = 1e-9
if math.Abs(limit.Rate(limitType) - ratePerSec) > epsilon {
    c.ResetStoreLimit(storeID, limitType, ratePerSec)
}
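
A self-contained version of the epsilon-based comparison suggested above might look like this. The helper name `ratesDiffer` and the tolerance of 1e-9 are illustrative choices, not values taken from the PR.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the tolerance below which two rates count as equal
// (an illustrative value).
const epsilon = 1e-9

// ratesDiffer reports whether two rates differ by more than epsilon.
func ratesDiffer(a, b float64) bool {
	return math.Abs(a-b) > epsilon
}

func main() {
	// Use variables so the sum is computed in float64 at run time.
	x, y := 0.1, 0.2
	sum := x + y

	fmt.Println(sum != 0.3)            // exact comparison
	fmt.Println(ratesDiffer(sum, 0.3)) // tolerance-based comparison
	fmt.Println(ratesDiffer(1.0, 1.5))
}
```

A tolerance avoids an unnecessary limiter reset when two rates differ only by rounding error, at the cost of ignoring genuinely tiny changes below epsilon.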

2458-2461: Silent no-op on type mismatch may hide configuration issues.

When the store's limiter is not a *storelimit.StoreRateLimit, the method silently returns without updating or logging. If this is intentional (e.g., for stores using different limiter types like SlidingWindows), consider adding a debug log to aid troubleshooting.

📝 Optional: Add debug logging
 func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
 	store := c.GetStore(storeID)
 	if store == nil {
 		return
 	}
 	limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit)
 	if !ok {
+		log.Debug("store limiter is not StoreRateLimit type, skipping refresh",
+			zap.Uint64("store-id", storeID),
+			zap.String("limit-type", limitType.String()))
 		return
 	}
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59720d0 and 447a623.

📒 Files selected for processing (1)
  • server/cluster/cluster.go
🧰 Additional context used
🧬 Code graph analysis (1)
server/cluster/cluster.go (2)
pkg/core/storelimit/store_limit.go (4)
  • StoreRateLimit (68-70)
  • StoreRateLimit (86-86)
  • StoreRateLimit (89-91)
  • StoreRateLimit (94-94)
pkg/core/store_option.go (1)
  • ResetStoreLimit (285-295)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)
  • GitHub Check: tso-function-test
  • GitHub Check: chunks (7, Client Integration Test)
  • GitHub Check: chunks (4, Tests(1))
  • GitHub Check: chunks (8, TSO Integration Test)
  • GitHub Check: chunks (6, Tools Test)
  • GitHub Check: chunks (9, Microservice Integration(!TSO))
  • GitHub Check: chunks (10, Microservice Integration(TSO))
  • GitHub Check: chunks
  • GitHub Check: chunks (5, Tests(2))
  • GitHub Check: chunks (2, Unit Test(2))
  • GitHub Check: chunks (3, Unit Test(3))
  • GitHub Check: chunks (1, Unit Test(1))
  • GitHub Check: statics
🔇 Additional comments (3)
server/cluster/cluster.go (3)

2426-2426: LGTM: Refresh call correctly placed after persist.

The call to refreshStoreRateLimit after successful persistence ensures the in-memory limiter stays synchronized with the persisted configuration, addressing the core issue in #10108.


2445-2447: No issues identified. The GetStoreIDs() method is correctly inherited through the embedded *core.BasicCluster field, which itself embeds *StoresInfo. The method returns a slice of all store IDs currently tracked in the cluster and is properly used in the loop to refresh rate limits for all stores.


2466-2466: No action required — ResetStoreLimit method is correctly accessible through embedded delegation.

The RaftCluster embeds *core.BasicCluster, which in turn embeds *StoresInfo. The StoresInfo type defines ResetStoreLimit(storeID uint64, limitType storelimit.Type, ratePerSec ...float64), and this method is automatically promoted and accessible on RaftCluster through Go's embedded field method forwarding. The call at line 2466 is valid.
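
The method promotion described above is standard Go behavior for embedded fields. A minimal sketch with hypothetical lowercase types that mirror the described nesting (these are not PD's actual definitions):

```go
package main

import "fmt"

// storesInfo is the innermost type and defines the method.
type storesInfo struct {
	rates map[uint64]float64
}

// resetStoreLimit records a new per-second rate for a store.
func (s *storesInfo) resetStoreLimit(storeID uint64, ratePerSec float64) {
	s.rates[storeID] = ratePerSec
}

// basicCluster embeds *storesInfo, so its methods are promoted.
type basicCluster struct {
	*storesInfo
}

// raftCluster embeds *basicCluster, so the promotion carries through.
type raftCluster struct {
	*basicCluster
}

// newRaftCluster wires up the nested embedded values.
func newRaftCluster() *raftCluster {
	return &raftCluster{
		basicCluster: &basicCluster{
			storesInfo: &storesInfo{rates: make(map[uint64]float64)},
		},
	}
}

func main() {
	rc := newRaftCluster()
	// resetStoreLimit is defined on *storesInfo but callable on rc.
	rc.resetStoreLimit(1, 2.5)
	fmt.Println(rc.rates[1]) // 2.5
}
```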

@okJiang
Member Author

okJiang commented Jan 9, 2026

/retest

1 similar comment
@okJiang
Member Author

okJiang commented Jan 9, 2026

/retest

@ti-chi-bot ti-chi-bot bot merged commit 31f6530 into tikv:master Jan 9, 2026
32 checks passed
@okJiang okJiang deleted the update-storelimit branch January 9, 2026 13:56
bufferflies pushed a commit to bufferflies/pd that referenced this pull request Jan 20, 2026
close tikv#10108

Signed-off-by: okjiang <819421878@qq.com>
bufferflies pushed a commit to bufferflies/pd that referenced this pull request Jan 21, 2026
close tikv#10108

Signed-off-by: okjiang <819421878@qq.com>
Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm release-note-none Denotes a PR that doesn't merit a release note. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Development

Successfully merging this pull request may close these issues.

Store limit setting does not take effect

3 participants