-
Notifications
You must be signed in to change notification settings - Fork 69
improve the robust of balance region scheduler #85
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 3 commits
38b6fe4
e951aeb
6da63b8
509e480
7d1a4db
47488d5
eef4f8c
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,65 @@ | ||
| # Improve the robust of balance scheduler | ||
|
|
||
| - RFC PR: [https://github.com/tikv/rfcs/pull/85](https://github.com/tikv/rfcs/pull/83) | ||
| - Tracking Issue: [https://github.com/tikv/pd/issues/](https://github.com/tikv/pd/issues/4428) | ||
|
|
||
| ## Summary | ||
|
|
||
| Make scheduler more robust for dynamic region size. | ||
|
|
||
| ## Motivation | ||
|
|
||
| We have observed many different situations when the region size is different. The major drawback comes from this aspects: | ||
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| 1. Balance region scheduler pick source store in order of store's score, the second store will be picked after the first store has not met some filter or retry times exceed fixed value, this problem is also exist in target pick strategy. | ||
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| 2. Operator has an import effect on region leader, and the leader is responsible in the operator life cycle. But the region leader will not be limited by any filter. | ||
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| 3. There are some factor that influence execution time of operator such as region size, IO limit, cpu load. PD needs to be more flexible to manage operator's life not fixed config. | ||
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| 4. PD should know some global config about TiKV like `region-max-size`, `region-report-interval`. This config should synchronize with PD. | ||
|
|
||
| ## Detailed design | ||
|
|
||
| ### Store pick strategy | ||
|
|
||
| It can arrange all the store based on label, like TiKV and TiFlash and allow low score group has more chance to scheduler. But the first score region should has highest priority to be selected. | ||
|
||
|
|
||
| #### Consider Influence to leader | ||
|
|
||
| Normally, one operator is made of region, source store and target store, the key works finished by region leader such as snapshot generate, snapshot send. It is not friendly to the leader if majority operator is add follow. | ||
|
|
||
|
||
| It will add new store limit as new limit type to decrease leader loads of every store. | ||
|
||
|
|
||
| ### Operator control | ||
|
|
||
| #### Store limit cost | ||
|
|
||
| Different size region occupy tokens should be different. Maybe can use this formula: | ||
|
|
||
|  | ||
|
|
||
| Cost equals 200 if operator influence is 1Mb or equals 600 if operator influence is 1gb. | ||
|
||
|
|
||
| #### Operator life cycle | ||
|
|
||
| The operator life cycle can divide into some stages: create, executing(started), complete. PD will check operator stage by region heart beats and cancel operator if one operator‘s running time exceed the fixed value(10m). | ||
|
||
|
|
||
| It will be better if we can calculate every step expecting execute duration by major factor includes region size, IO limit and operator concurrency like this: | ||
|
||
|
|
||
|  | ||
|
|
||
| The snapshot generator duration can ignore because it doesn't need to scan. The apply snapshot duration will finish in minute level if it needs to load hot buckets. | ||
|
|
||
| ### Sync global config | ||
|
|
||
| There are some global config that all components need to synchronize like `region-max-size`, `io-limit`. Using ETCD api to implement global config may be a good idea like [this](<[https://github.com/pingcap/tidb/pull/31010/files](https://github.com/pingcap/tidb/pull/31010/files)>) | ||
|
||
|
|
||
| ## Drawbacks | ||
|
|
||
| ## Alternatives | ||
|
|
||
| Removing peer may not influence the cluster performance, it can be replace by leader store limit. | ||
bufferflies marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
|
|
||
| Canceling operator can depends on TiKV not by PD, but TiKV should notify PD after canceled one operator. | ||
|
|
||
| ## Questions | ||
|
||
|
|
||
| ## Unresolved questions | ||
Uh oh!
There was an error while loading. Please reload this page.