
Commit 34be067

scheduler: network slow store scheduler enhancement (#22269)
1 parent 8a6fe4a commit 34be067

File tree

3 files changed: +49 −2 lines


best-practices/pd-scheduling-best-practices.md

Lines changed: 3 additions & 1 deletion
@@ -297,7 +297,9 @@ If a TiKV node fails, PD defaults to setting the corresponding node to the **dow
  Practically, if a node failure is considered unrecoverable, you can immediately take it offline. This makes PD replenish replicas soon in another node and reduces the risk of data loss. In contrast, if a node is considered recoverable, but the recovery cannot be done in 30 minutes, you can temporarily adjust `max-store-down-time` to a larger value to avoid unnecessary replenishment of the replicas and waste of resources after the timeout.

- In TiDB v5.2.0, TiKV introduces the mechanism of slow TiKV node detection. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to detect and schedule slow nodes. If only one TiKV is detected as slow, and the slow score reaches the limit (80 by default), the Leader in this node will be evicted (similar to the effect of `evict-leader-scheduler`).
+ Starting from TiDB v5.2.0, TiKV introduces a mechanism to detect slow-disk nodes. By sampling the requests in TiKV, this mechanism works out a score ranging from 1 to 100. A TiKV node with a score higher than or equal to 80 is marked as slow. You can add [`evict-slow-store-scheduler`](/pd-control.md#scheduler-show--add--remove--pause--resume--config--describe) to schedule slow nodes. If only one TiKV node is detected as slow, and its slow score reaches the limit (80 by default), the Leaders on that node will be evicted (similar to the effect of `evict-leader-scheduler`).
+
+ Starting from v8.5.5 and v9.0.0, TiKV introduces a mechanism to detect slow-network nodes. Similar to slow-disk node detection, this mechanism identifies slow nodes by probing network latency between TiKV nodes and calculating a score. You can enable this mechanism using [`enable-network-slow-store`](/pd-control.md#scheduler-config-evict-slow-store-scheduler).

  > **Note:**
  >
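The eviction rule that this hunk documents (evict Leaders only when there is one and only one store whose slow score reaches the threshold) can be sketched as follows. This is an illustrative model of the rule as stated in the text, not PD's actual implementation; the function name and score inputs are hypothetical.

```shell
# Minimal sketch of the slow-store eviction rule (illustrative only):
# a store whose slow score reaches the threshold (80 by default) counts
# as slow, and Leaders are evicted only when exactly one store is slow.
threshold=80

should_evict() {
  # Arguments: the slow scores of all stores, in store order.
  # Prints the 1-based position of the store to evict, or "none".
  slow_count=0
  slow_idx=none
  i=0
  for score in "$@"; do
    i=$((i + 1))
    if [ "$score" -ge "$threshold" ]; then
      slow_count=$((slow_count + 1))
      slow_idx=$i
    fi
  done
  if [ "$slow_count" -eq 1 ]; then echo "$slow_idx"; else echo none; fi
}

should_evict 12 95 30   # prints "2": store 2 is the only slow store
should_evict 90 95 30   # prints "none": two slow stores, so no eviction
```

Note the second case: when several stores look slow at once, eviction would only shift load onto other degraded nodes, which is why the rule requires exactly one slow store.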

pd-control.md

Lines changed: 39 additions & 1 deletion
@@ -941,7 +941,7 @@ Usage:
  >> scheduler config evict-leader-scheduler // Display the stores in which the scheduler is located since v4.0.0
  >> scheduler config evict-leader-scheduler add-store 2 // Add leader eviction scheduling for store 2
  >> scheduler config evict-leader-scheduler delete-store 2 // Remove leader eviction scheduling for store 2
- >> scheduler add evict-slow-store-scheduler // When there is one and only one slow store, evict all Region leaders of that store
+ >> scheduler add evict-slow-store-scheduler // Automatically detect slow-disk or slow-network nodes and evict all Region leaders from those nodes when specific conditions are met
  >> scheduler remove grant-leader-scheduler-1 // Remove the corresponding scheduler, and `-1` corresponds to the store ID
  >> scheduler pause balance-region-scheduler 10 // Pause the balance-region scheduler for 10 seconds
  >> scheduler pause all 10 // Pause all schedulers for 10 seconds
@@ -965,6 +965,44 @@ The state of the scheduler can be one of the following:
  - `pending`: the scheduler cannot generate scheduling operators. For a scheduler in the `pending` state, brief diagnostic information is returned. The brief information describes the state of stores and explains why these stores cannot be selected for scheduling.
  - `normal`: there is no need to generate scheduling operators.

+ ### `scheduler config evict-slow-store-scheduler`
+
+ The `evict-slow-store-scheduler` limits PD from scheduling Leaders to abnormal TiKV nodes and actively evicts Leaders when necessary, thereby reducing the impact of slow nodes on the cluster when TiKV nodes experience disk I/O or network jitter.
+
+ #### Slow-disk nodes
+
+ Starting from v6.2.0, TiKV reports a `SlowScore` in store heartbeats to PD. This score is calculated based on disk I/O conditions and ranges from 1 to 100. A higher value indicates a higher possibility of disk performance anomalies on that node.
+
+ For slow-disk nodes, the detection on TiKV and the scheduling via `evict-slow-store-scheduler` on PD are enabled by default, which means no additional configuration is required.
+
+ #### Slow-network nodes
+
+ Starting from v8.5.5 and v9.0.0, TiKV supports reporting a `NetworkSlowScore` in store heartbeats to PD. It is calculated based on network detection results and helps identify slow nodes experiencing network jitter. The score ranges from 1 to 100, where a higher value indicates a higher possibility of network anomalies.
+
+ For compatibility and resource consumption considerations, the detection and scheduling of slow-network nodes are disabled by default. To enable them, configure both of the following:
+
+ 1. Enable the PD scheduler to handle slow-network nodes:
+
+     ```bash
+     scheduler config evict-slow-store-scheduler set enable-network-slow-store true
+     ```
+
+ 2. On TiKV, set the [`raftstore.inspect-network-interval`](/tikv-configuration-file.md#inspect-network-interval-new-in-v855-and-v900) configuration item to a value greater than `0` to enable network detection.
+
+ #### Recovery time control
+
+ You can specify how long a slow node must remain stable before it is considered recovered by using the `recovery-duration` parameter.
+
+ Example:
+
+     ```bash
+     >> scheduler config evict-slow-store-scheduler
+     {
+       "recovery-duration": "1800" // 30 minutes
+     }
+     >> scheduler config evict-slow-store-scheduler set recovery-duration 600
+     ```
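The `recovery-duration` semantics can be illustrated with a small sketch. This is an assumed model of the rule as described in the added text (eviction is lifted only after the store has stayed healthy for the whole window), not PD's actual code; the function name is hypothetical.

```shell
# Illustrative sketch of the recovery-duration rule (assumed semantics):
# a previously slow store counts as recovered only after it has stayed
# below the slow threshold continuously for recovery-duration seconds.
recovery_duration=600   # seconds, as set by: set recovery-duration 600

is_recovered() {
  # $1: number of seconds the store has continuously stayed healthy
  if [ "$1" -ge "$recovery_duration" ]; then echo yes; else echo no; fi
}

is_recovered 300   # prints "no": healthy for only 5 minutes
is_recovered 900   # prints "yes": healthy for 15 minutes
```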
  ### `scheduler config balance-leader-scheduler`

  Use this command to view and control the `balance-leader-scheduler` policy.

tikv-configuration-file.md

Lines changed: 7 additions & 0 deletions
@@ -289,6 +289,13 @@ This document only describes the parameters that are not included in command-lin
  + Sets the size of the connection pool for service and forwarding requests to the server. Setting it to too small a value affects the request latency and load balancing.
  + Default value: `4`

+ ### `inspect-network-interval` <span class="version-mark">New in v8.5.5 and v9.0.0</span>
+
+ + Controls the interval at which the TiKV HealthChecker actively performs network detection to PD and other TiKV nodes. TiKV calculates a `NetworkSlowScore` based on the network detection results and reports the network status of slow nodes to PD.
+ + Setting this value to `0` disables network detection. Setting it to a smaller value increases the detection frequency, which helps detect network jitter more quickly but also consumes more network bandwidth and CPU resources.
+ + Default value: `100ms`
+ + Value range: `0` or `[10ms, +∞)`
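A minimal TiKV configuration file fragment setting this item might look as follows. The `[raftstore]` section placement follows the `raftstore.inspect-network-interval` key path given in the diff; the value shown is illustrative.

```toml
# TiKV configuration file fragment (illustrative value).
[raftstore]
# Probe PD and other TiKV nodes every 100ms to compute NetworkSlowScore;
# set to "0" to disable network detection. Values below "10ms" are not
# within the accepted range.
inspect-network-interval = "100ms"
```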
  ## readpool.unified

  Configuration items related to the single thread pool serving read requests. This thread pool supersedes the original storage thread pool and coprocessor thread pool since the 4.0 version.
