storelimit: refresh store limit once updating #10131

okJiang · 2026-01-05T08:41:05Z

What problem does this PR solve?

Issue Number: Close #10108

What is changed and how does it work?

Check List

Tests

Unit test
Integration test
Manual test (add detailed scripts or steps below)
No code

Code changes

Has the configuration change
Has HTTP API interfaces changed (Don't forget to add the declarative for the new API)
Has persistent data change

Side effects

Possible performance regression
Increased code complexity
Breaking backward compatibility

Related changes

PR to update pingcap/docs/pingcap/docs-cn:
PR to update pingcap/tiup:
Need to cherry-pick to the release branch

Release note

None.

Summary by CodeRabbit

Bug Fixes
- In-memory store rate limiters now refresh when persisted limits change, including converting per-minute configs to per-second rates and applying updates for all stores so throttling behavior stays consistent.
Tests
- Added tests validating throttling behavior and that limiter state updates when store limits are changed, including recovery from a throttled state.
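The per-minute to per-second conversion mentioned in the summary can be sketched as follows. This is a minimal illustration, not the PR's code: the helper name `perMinuteToPerSecond` is hypothetical, and the constant name is borrowed from the review discussion further down.

```go
package main

import "fmt"

// storeBalanceBaseTime is the number of seconds in the configured unit (one minute).
// The name mirrors the constant discussed in the review; its use here is illustrative.
const storeBalanceBaseTime = 60.0

// perMinuteToPerSecond is a hypothetical helper that converts a configured
// per-minute store limit into the per-second rate a limiter would consume.
func perMinuteToPerSecond(perMinute float64) float64 {
	return perMinute / storeBalanceBaseTime
}

func main() {
	fmt.Println(perMinuteToPerSecond(15)) // prints 0.25
}
```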

Signed-off-by: okjiang <819421878@qq.com>

rleungx · 2026-01-05T08:52:21Z

server/cluster/cluster.go

+// refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter.
+func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
+	// Only v1 uses StoreRateLimit for AddPeer/RemovePeer.
+	if c.opt.GetStoreLimitVersion() != storelimit.VersionV1 {


Is it necessary?

I think we can keep it.

If V2 is enabled, we should not allow modifying the V1 store limit.

okJiang · 2026-01-05T09:24:36Z

/retest

Signed-off-by: okjiang <819421878@qq.com>

codecov · 2026-01-06T03:56:13Z

Codecov Report

❌ Patch coverage is 85.71429% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.59%. Comparing base (88ddf8a) to head (447a623).
⚠️ Report is 1 commit behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #10131   +/-   ##
=======================================
  Coverage   78.58%   78.59%           
=======================================
  Files         520      520           
  Lines       69650    69664   +14     
=======================================
+ Hits        54737    54750   +13     
+ Misses      10975    10966    -9     
- Partials     3938     3948   +10

Flag	Coverage Δ
unittests	`78.59% <85.71%> (+<0.01%)`	⬆️

okJiang · 2026-01-06T04:01:30Z

/cc @rleungx @lhy1024

lhy1024 · 2026-01-06T05:49:24Z

server/cluster/cluster.go

 		log.Error("persist store limit meet error", errs.ZapError(err))
 		return err
 	}
+	for storeID := range c.opt.GetAllStoresLimit() {


Does GetAllStoresLimit always contain all stores?

In GetStoreLimit, if a store uses default-add-peer or default-remove-peer, its StoreLimit entry will not be set.

Thank you. I changed it to use GetStoreIDs.
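The concern raised in this thread can be illustrated with a toy model, assuming that explicit per-store limits live in a map while other stores fall back to a default. All type and method names below are hypothetical and do not reflect PD's actual structures.

```go
package main

import (
	"fmt"
	"sort"
)

// toyCluster is a hypothetical stand-in: storeIDs lists every known store,
// while explicitLimits only holds stores with an explicitly configured limit.
type toyCluster struct {
	storeIDs       []uint64
	explicitLimits map[uint64]float64
	defaultLimit   float64
}

// limitFor returns the explicit limit if present, otherwise the default.
func (c *toyCluster) limitFor(id uint64) float64 {
	if v, ok := c.explicitLimits[id]; ok {
		return v
	}
	return c.defaultLimit
}

// idsFromExplicitLimits returns only the stores that have an explicit entry.
func (c *toyCluster) idsFromExplicitLimits() []uint64 {
	ids := make([]uint64, 0, len(c.explicitLimits))
	for id := range c.explicitLimits {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	return ids
}

func main() {
	c := &toyCluster{
		storeIDs:       []uint64{1, 2, 3},
		explicitLimits: map[uint64]float64{2: 30},
		defaultLimit:   15,
	}
	fmt.Println(c.idsFromExplicitLimits()) // prints [2]: stores 1 and 3 would be skipped
	fmt.Println(c.storeIDs)                // prints [1 2 3]: every store is visited
}
```

Iterating over the full ID list means stores that rely on the default limit are refreshed as well.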

Signed-off-by: okjiang <819421878@qq.com>

okJiang · 2026-01-06T10:42:33Z

/retest

lhy1024 · 2026-01-06T11:09:38Z

server/cluster/cluster.go

 }

+// refreshStoreRateLimit applies the schedule config's store limit to the in-memory store limiter.
+func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {


BTW, do we need to refresh store limit in SetStoreLimitTTL?

It seems necessary, but I have a question: is SetStoreLimitTTL used anywhere? Is it over-engineered? Or can we delete it directly?

I don't know where it is used, either.

@rleungx How about removing it?

LGTM if no one uses it.

Removed it in #10135.

okJiang · 2026-01-08T09:01:49Z

/retest

rleungx · 2026-01-09T06:06:46Z

@coderabbitai full review

coderabbitai · 2026-01-09T06:06:52Z

✅ Actions performed

Full review triggered.

coderabbitai · 2026-01-09T06:06:59Z

📝 Walkthrough

Walkthrough

Adds an in-memory synchronization step for store rate limits: a new RaftCluster.refreshStoreRateLimit method reads persisted store-limit config, converts per-minute rates to per-second, and updates the in-memory store limiter after SetStoreLimit and SetAllStoresLimit operations.

Changes

Cohort / File(s)	Summary
Store Rate Limit Synchronization `server/cluster/cluster.go`	Adds `refreshStoreRateLimit(storeID uint64, limitType storelimit.Type)` and invokes it after `SetStoreLimit` and `SetAllStoresLimit`. Converts persisted per-minute rates to per-second and resets the in-memory limiter when values change; no-op for non-StoreRateLimit types.
Tests / Behavior Verification `server/cluster/cluster_test.go`	Updates `TestCheckCache` to use a small positive AddPeer limit (0.0001) and adds runtime checks for token consumption/throttling. Adds `TestStoreLimitChangeRefreshLimiter` to verify throttling at very low limit and that increasing the persisted limit refreshes the in-memory limiter.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Possibly related PRs

storelimit: remove store limit ttlSecond param #10135 — Modifies store-limit handling in server/cluster/cluster.go; related changes to store-limit behavior and APIs.

Suggested labels

release-note, size/L

Suggested reviewers

rleungx
lhy1024
niubell
bufferflies

Poem

🐰 A tiny token hops in line,

Persisted rates now meet the mind.
From disk to memory the numbers flow,
Throttles loosen, regions grow.
Hooray — refreshed, the cluster's fine! 🎉

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'storelimit: refresh store limit once updating' clearly relates to the main change: adding in-memory synchronization to refresh store limits when they are updated.
Description check	✅ Passed	The PR description includes the required issue number (Close #10108), references the linked issue, and indicates unit tests were added. Template sections are appropriately filled or marked as not applicable.
Linked Issues check	✅ Passed	The code changes directly address issue #10108 by implementing the refreshStoreRateLimit mechanism to ensure updated store limits take effect immediately in the in-memory limiter, preventing stores from being incorrectly filtered after limit updates.
Out of Scope Changes check	✅ Passed	All changes are narrowly scoped to store limit synchronization: a new refreshStoreRateLimit method, updates to SetStoreLimit/SetAllStoresLimit to call it, and tests validating the refresh behavior. No unrelated modifications detected.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

server/cluster/cluster.go (2)
2445-2447: Consider performance for large clusters

The loop refreshes limiters for all stores synchronously. While refreshStoreRateLimit is lightweight, in clusters with thousands of stores this could introduce latency. Consider whether this needs optimization or if the current approach is acceptable given that SetAllStoresLimit is not a hot path.

2452-2468: Extract magic number to package-level constant

The value 60 for storeBalanceBaseTime represents the conversion factor from per-minute to per-second rates. Extract this to a package-level constant with a descriptive comment explaining the conversion.
♻️ Suggested refactoring

At the package level (near line 96 with other constants):
+// storeBalanceBaseTime is the time unit base for store limit rate conversion.
+// Store limits are configured in operations per minute but the in-memory limiter
+// operates in operations per second, requiring division by 60.
+const storeBalanceBaseTime = 60.0
Then in the method:
 func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
 	store := c.GetStore(storeID)
 	if store == nil {
 		return
 	}
 	limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit)
 	if !ok {
 		return
 	}
-	// Schedule config stores the unit in rate-per-minute, but limiter uses rate-per-second.
-	const storeBalanceBaseTime = float64(60)
 	ratePerSec := c.opt.GetStoreLimitByType(storeID, limitType) / storeBalanceBaseTime
 	if limit.Rate(limitType) != ratePerSec {
 		c.ResetStoreLimit(storeID, limitType, ratePerSec)
 	}
 }

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6173d50 and 59720d0.

📒 Files selected for processing (2)

server/cluster/cluster.go
server/cluster/cluster_test.go

🧰 Additional context used

🧬 Code graph analysis (1)

server/cluster/cluster.go (3)

pkg/core/storelimit/limit.go (1)

Type (22-22)

pkg/core/storelimit/store_limit.go (4)

StoreRateLimit (68-70)

StoreRateLimit (86-86)

StoreRateLimit (89-91)

StoreRateLimit (94-94)

pkg/core/store_option.go (1)

ResetStoreLimit (285-295)

🔇 Additional comments (3)

server/cluster/cluster.go (1)

2426-2426: LGTM: Proper synchronization after persist

The call to refreshStoreRateLimit after persisting ensures the in-memory limiter reflects the updated configuration, directly addressing issue #10108.

server/cluster/cluster_test.go (2)

2908-2915: LGTM: Proper throttling simulation

The change from 0 to 0.0001 correctly simulates a throttled store, since StoreRateLimit treats zero as unlimited. The additional token consumption logic properly validates throttling behavior.

2932-2964: LGTM: Comprehensive test for limiter refresh

This test effectively validates the fix for issue #10108 by:

- Simulating a throttled store with exhausted tokens
- Increasing the store limit via the API
- Verifying the in-memory limiter is refreshed and the store becomes available

The test structure is clear and directly addresses the reported bug scenario.
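The scenario this test covers can be modelled in a few lines. The sketch below is a simplified stand-in under stated assumptions: `toyStore`, `toyLimiter`, the fixed burst size, and the behaviour of replacing the limiter with a full bucket are all hypothetical and only illustrate why a refresh step is needed after persisting a new limit.

```go
package main

import "fmt"

// toyBurst is an assumed bucket capacity for the illustration.
const toyBurst = 10.0

// toyLimiter is a hypothetical time-free token bucket.
type toyLimiter struct {
	ratePerSec float64
	tokens     float64
}

func newToyLimiter(ratePerSec float64) *toyLimiter {
	return &toyLimiter{ratePerSec: ratePerSec, tokens: toyBurst}
}

func (l *toyLimiter) available(cost float64) bool { return l.tokens >= cost }
func (l *toyLimiter) take(cost float64)           { l.tokens -= cost }

// toyStore keeps a persisted per-minute limit next to an in-memory limiter.
type toyStore struct {
	persistedPerMinute float64
	limiter            *toyLimiter
}

// setLimit persists the new limit and, when refresh is true, rebuilds the
// in-memory limiter if its per-second rate no longer matches.
func (s *toyStore) setLimit(perMinute float64, refresh bool) {
	s.persistedPerMinute = perMinute
	if !refresh {
		return
	}
	ratePerSec := perMinute / 60
	if s.limiter.ratePerSec != ratePerSec {
		s.limiter = newToyLimiter(ratePerSec)
	}
}

func main() {
	s := &toyStore{persistedPerMinute: 0.0001, limiter: newToyLimiter(0.0001 / 60)}
	s.limiter.take(20)                  // exhaust the bucket
	fmt.Println(s.limiter.available(1)) // prints false: throttled

	s.setLimit(600, false)              // persist only, no refresh
	fmt.Println(s.limiter.available(1)) // prints false: the stale limiter still throttles

	s.setLimit(600, true)               // persist and refresh
	fmt.Println(s.limiter.available(1)) // prints true: the limiter was rebuilt
}
```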

ti-chi-bot · 2026-01-09T06:18:28Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lhy1024, rleungx

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [lhy1024,rleungx]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

ti-chi-bot · 2026-01-09T06:18:30Z

[LGTM Timeline notifier]

Timeline:

2026-01-08 09:25:37.312694832 +0000 UTC m=+867093.131003254: ☑️ agreed by lhy1024.
2026-01-09 06:18:29.461459671 +0000 UTC m=+942265.279768103: ☑️ agreed by rleungx.

rleungx · 2026-01-09T06:26:54Z

/retest

Signed-off-by: okjiang <819421878@qq.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

server/cluster/cluster.go (2)
2465-2465: Consider epsilon-based comparison for floating-point values.

Using != for direct float comparison can be unreliable due to floating-point precision. While this may work in practice for the current use case, consider using an epsilon-based comparison for robustness:
const epsilon = 1e-9
if math.Abs(limit.Rate(limitType) - ratePerSec) > epsilon {
    c.ResetStoreLimit(storeID, limitType, ratePerSec)
}
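A self-contained version of the epsilon comparison suggested above might look like this. The helper name `rateChanged` is hypothetical, and the tolerance is the value proposed in the comment rather than something the PR adopted.

```go
package main

import (
	"fmt"
	"math"
)

// epsilon is the tolerance proposed in the review comment.
const epsilon = 1e-9

// rateChanged is a hypothetical helper: it reports whether two rates differ
// by more than the tolerance, instead of comparing them with !=.
func rateChanged(current, desired float64) bool {
	return math.Abs(current-desired) > epsilon
}

func main() {
	// Variables are used on purpose: Go evaluates constant expressions such as
	// 0.1 + 0.2 exactly at compile time, which would hide the rounding error.
	x, y := 0.1, 0.2
	sum := x + y

	fmt.Println(sum != 0.3)             // prints true: exact comparison sees a difference
	fmt.Println(rateChanged(sum, 0.3))  // prints false: within tolerance
	fmt.Println(rateChanged(0.25, 0.5)) // prints true: a real change
}
```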
2458-2461: Silent no-op on type mismatch may hide configuration issues.

When the store's limiter is not a *storelimit.StoreRateLimit, the method silently returns without updating or logging. If this is intentional (e.g., for stores using different limiter types like SlidingWindows), consider adding a debug log to aid troubleshooting.
📝 Optional: Add debug logging
 func (c *RaftCluster) refreshStoreRateLimit(storeID uint64, limitType storelimit.Type) {
 	store := c.GetStore(storeID)
 	if store == nil {
 		return
 	}
 	limit, ok := store.GetStoreLimit().(*storelimit.StoreRateLimit)
 	if !ok {
+		log.Debug("store limiter is not StoreRateLimit type, skipping refresh",
+			zap.Uint64("store-id", storeID),
+			zap.String("limit-type", limitType.String()))
 		return
 	}

📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 59720d0 and 447a623.

📒 Files selected for processing (1)

server/cluster/cluster.go

🧰 Additional context used

🧬 Code graph analysis (1)

server/cluster/cluster.go (2)

pkg/core/storelimit/store_limit.go (4)

StoreRateLimit (68-70)

StoreRateLimit (86-86)

StoreRateLimit (89-91)

StoreRateLimit (94-94)

pkg/core/store_option.go (1)

ResetStoreLimit (285-295)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (13)

GitHub Check: tso-function-test
GitHub Check: chunks (7, Client Integration Test)
GitHub Check: chunks (4, Tests(1))
GitHub Check: chunks (8, TSO Integration Test)
GitHub Check: chunks (6, Tools Test)
GitHub Check: chunks (9, Microservice Integration(!TSO))
GitHub Check: chunks (10, Microservice Integration(TSO))
GitHub Check: chunks
GitHub Check: chunks (5, Tests(2))
GitHub Check: chunks (2, Unit Test(2))
GitHub Check: chunks (3, Unit Test(3))
GitHub Check: chunks (1, Unit Test(1))
GitHub Check: statics

🔇 Additional comments (3)

server/cluster/cluster.go (3)

2426-2426: LGTM: Refresh call correctly placed after persist.

The call to refreshStoreRateLimit after successful persistence ensures the in-memory limiter stays synchronized with the persisted configuration, addressing the core issue in #10108.

2445-2447: No issues identified. The GetStoreIDs() method is correctly inherited through the embedded *core.BasicCluster field, which itself embeds *StoresInfo. The method returns a slice of all store IDs currently tracked in the cluster and is properly used in the loop to refresh rate limits for all stores.

2466-2466: No action required — ResetStoreLimit method is correctly accessible through embedded delegation.

The RaftCluster embeds *core.BasicCluster, which in turn embeds *StoresInfo. The StoresInfo type defines ResetStoreLimit(storeID uint64, limitType storelimit.Type, ratePerSec ...float64), and this method is automatically promoted and accessible on RaftCluster through Go's embedded field method forwarding. The call at line 2466 is valid.
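The method promotion described here is standard Go behaviour for embedded fields. A minimal sketch with hypothetical lower-case stand-ins for the three types shows a method defined on the innermost type being callable on the outermost one:

```go
package main

import "fmt"

// storesInfo, basicCluster, and raftCluster are hypothetical stand-ins that
// mirror only the embedding chain described above, not the real PD types.
type storesInfo struct {
	ids []uint64
}

// GetStoreIDs is defined on the innermost type only.
func (s *storesInfo) GetStoreIDs() []uint64 { return s.ids }

type basicCluster struct {
	*storesInfo // embedded: its methods are promoted to basicCluster
}

type raftCluster struct {
	*basicCluster // embedded: promoted methods are promoted again
}

// newToyCluster builds the nested structure for the given store IDs.
func newToyCluster(ids ...uint64) *raftCluster {
	return &raftCluster{
		basicCluster: &basicCluster{
			storesInfo: &storesInfo{ids: ids},
		},
	}
}

func main() {
	c := newToyCluster(1, 2, 3)
	// No explicit forwarding method exists on raftCluster or basicCluster.
	fmt.Println(c.GetStoreIDs()) // prints [1 2 3]
}
```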

okJiang · 2026-01-09T08:31:53Z

/retest

okJiang · 2026-01-09T09:16:26Z

/retest

close tikv#10108 Signed-off-by: okjiang <819421878@qq.com>

update store limit

dbcc245

Signed-off-by: okjiang <819421878@qq.com>

rleungx reviewed Jan 5, 2026

fix ut

d137e35

Signed-off-by: okjiang <819421878@qq.com>

okJiang force-pushed the update-storelimit branch from 49c317f to d137e35 Compare January 6, 2026 03:47

ti-chi-bot bot requested review from lhy1024 and rleungx January 6, 2026 04:01

lhy1024 reviewed Jan 6, 2026

fix comment

59720d0

Signed-off-by: okjiang <819421878@qq.com>

lhy1024 reviewed Jan 6, 2026

lhy1024 approved these changes Jan 8, 2026

ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Jan 8, 2026

coderabbitai bot reviewed Jan 9, 2026

rleungx approved these changes Jan 9, 2026

ti-chi-bot bot added the lgtm label Jan 9, 2026

ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Jan 9, 2026

Merge branch 'master' of github.com:tikv/pd into update-storelimit

447a623

Signed-off-by: okjiang <819421878@qq.com>

coderabbitai bot reviewed Jan 9, 2026

ti-chi-bot bot merged commit 31f6530 into tikv:master Jan 9, 2026
32 checks passed

okJiang deleted the update-storelimit branch January 9, 2026 13:56

bufferflies pushed a commit to bufferflies/pd that referenced this pull request Jan 20, 2026

storelimit: refresh store limit once updating (tikv#10131)

33924fc

close tikv#10108 Signed-off-by: okjiang <819421878@qq.com>

bufferflies pushed a commit to bufferflies/pd that referenced this pull request Jan 21, 2026

storelimit: refresh store limit once updating (tikv#10131)

18d766e

close tikv#10108 Signed-off-by: okjiang <819421878@qq.com>
