Commit d535f2b
[enhancement](recycle bin) optimize the recycle bin to reduce the potential of FE hang (#55753)
### What problem does this PR solve?
When the recycle bin holds a large amount of garbage (about 90,000
partitions), the DynamicPartitionScheduler thread can hold an FE table
lock for a long time. The thread stacks look like this:
```
"recycle bin" #28 daemon prio=5 os_prio=0 cpu=73880509.81ms elapsed=96569.50s allocated=9212M defined_classes=9 tid=0x00007f0b545c1800 nid=0x2f4540 runnable [0x00007f0b251fd000]
java.lang.Thread.State: RUNNABLE
at org.apache.doris.catalog.CatalogRecycleBin.getSameNamePartitionIdListToErase(CatalogRecycleBin.java:539)
- locked <0x000000020d6d6130> (a org.apache.doris.catalog.CatalogRecycleBin)
at org.apache.doris.catalog.CatalogRecycleBin.erasePartitionWithSameName(CatalogRecycleBin.java:556)
- locked <0x000000020d6d6130> (a org.apache.doris.catalog.CatalogRecycleBin)
at org.apache.doris.catalog.CatalogRecycleBin.erasePartition(CatalogRecycleBin.java:510)
- locked <0x000000020d6d6130> (a org.apache.doris.catalog.CatalogRecycleBin)
at org.apache.doris.catalog.CatalogRecycleBin.runAfterCatalogReady(CatalogRecycleBin.java:1012)
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
at org.apache.doris.common.util.Daemon.run(Daemon.java:119)
Locked ownable synchronizers:
- None
"DynamicPartitionScheduler" #41 daemon prio=5 os_prio=0 cpu=115405.50ms elapsed=87942.53s allocated=16637M defined_classes=96 tid=0x00007f0b545cc800 nid=0x2f4545 waiting for monitor entry [0x00007f0b247fe000]
java.lang.Thread.State: BLOCKED (on object monitor)
at org.apache.doris.catalog.CatalogRecycleBin.recyclePartition(CatalogRecycleBin.java:187)
- waiting to lock <0x000000020d6d6130> (a org.apache.doris.catalog.CatalogRecycleBin)
at org.apache.doris.catalog.OlapTable.dropPartition(OlapTable.java:1164)
at org.apache.doris.catalog.OlapTable.dropPartition(OlapTable.java:1207)
at org.apache.doris.datasource.InternalCatalog.dropPartitionWithoutCheck(InternalCatalog.java:1895)
at org.apache.doris.datasource.InternalCatalog.dropPartition(InternalCatalog.java:1884)
at org.apache.doris.catalog.Env.dropPartition(Env.java:3212)
at org.apache.doris.clone.DynamicPartitionScheduler.executeDynamicPartition(DynamicPartitionScheduler.java:605)
at org.apache.doris.clone.DynamicPartitionScheduler.runAfterCatalogReady(DynamicPartitionScheduler.java:729)
at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
at org.apache.doris.clone.DynamicPartitionScheduler.run(DynamicPartitionScheduler.java:688)
```
The DynamicPartitionScheduler thread is waiting for the CatalogRecycleBin
monitor, which the recycle bin thread holds, while it still holds the
table write lock itself.
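The lock chain in the two stacks can be reproduced in isolation. The following is a minimal sketch, not Doris code: `LockChainSketch`, `recycleBinMonitor`, and `tableLock` are hypothetical stand-ins for the `CatalogRecycleBin` monitor and a table's read/write lock, and the latches exist only to make the ordering deterministic.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical stand-in for the lock chain seen in the stacks; not Doris code. */
final class LockChainSketch {

    /** Returns true if a third thread can read-lock the table within waitMillis. */
    static boolean readerGetsTableLock(long waitMillis) {
        final Object recycleBinMonitor = new Object();           // plays the CatalogRecycleBin monitor
        final ReentrantReadWriteLock tableLock = new ReentrantReadWriteLock();
        final CountDownLatch monitorHeld = new CountDownLatch(1);
        final CountDownLatch tableLockHeld = new CountDownLatch(1);
        final CountDownLatch finishErase = new CountDownLatch(1);

        // Long-running erase: keeps the monitor until finishErase is released.
        Thread recycleBin = new Thread(() -> {
            synchronized (recycleBinMonitor) {
                monitorHeld.countDown();
                try {
                    finishErase.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "recycle bin");

        // Drop-partition path: takes the table write lock, then blocks on the monitor.
        Thread scheduler = new Thread(() -> {
            tableLock.writeLock().lock();
            try {
                tableLockHeld.countDown();
                synchronized (recycleBinMonitor) {
                    // the recyclePartition step would run here
                }
            } finally {
                tableLock.writeLock().unlock();
            }
        }, "DynamicPartitionScheduler");

        try {
            recycleBin.start();
            monitorHeld.await();
            scheduler.start();
            tableLockHeld.await();

            // Any other thread that needs the table is now stuck behind the write lock.
            boolean acquired = tableLock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                tableLock.readLock().unlock();
            }

            finishErase.countDown();
            recycleBin.join();
            scheduler.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("reader acquired table lock: " + readerGetsTableLock(200));
    }
}
```

In this sketch the reader cannot get the table lock until the long erase finishes, which is the same shape as the hang described here: every thread that needs the table queues behind a writer that is itself queued behind the recycle bin.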
In the FE log, the recycle bin thread is doing heavy work that costs
about 5 to 10 minutes per run (375 s to 498 s in the samples below):
```
fe.log.20250907-2:2025-09-07 04:15:50,740 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 375503ms
fe.log.20250907-2:2025-09-07 04:23:14,109 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 413369ms
fe.log.20250907-2:2025-09-07 04:30:01,187 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 377077ms
fe.log.20250907-2:2025-09-07 04:38:22,769 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 471581ms
fe.log.20250907-2:2025-09-07 04:45:42,552 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 409782ms
fe.log.20250907-2:2025-09-07 04:54:30,825 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 498272ms
fe.log.20250907-2:2025-09-07 05:01:36,311 INFO (recycle bin|28) [CatalogRecycleBin.erasePartition():516] erasePartition eraseNum: 0 cost: 395485ms
```
The most expensive task in the CatalogRecycleBin thread is erasing
partitions that share the same name:
```
2025-09-07 04:16:20,884 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[62638463] name: p_20190511160000_20190511170000 from table[32976073] from db[682022]
2025-09-07 04:16:20,994 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[62640651] name: p_20190430160000_20190430170000 from table[32976073] from db[682022]
2025-09-07 04:16:21,438 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[60264769] name: p_20190517210000_20190517220000 from table[32976073] from db[682022]
2025-09-07 04:16:21,787 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[62651922] name: p_20190510150000_20190510160000 from table[32976073] from db[682022]
2025-09-07 04:16:21,893 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[59222503] name: p_20190527080000_20190527090000 from table[32976073] from db[682022]
2025-09-07 04:16:22,204 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[62656398] name: p_20190511090000_20190511100000 from table[32976073] from db[682022]
2025-09-07 04:16:22,430 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[59228497] name: p_20190518120000_20190518130000 from table[32976073] from db[682022]
2025-09-07 04:16:22,493 INFO (recycle bin|28) [CatalogRecycleBin.erasePartitionWithSameName():569] erase partition[62658335] name: p_20190512170000_20190512180000 from table[32976073] from db[682022]
...
```
This can hang the whole FE, because the table lock is shared by many
threads.
<img width="1230" height="438" alt="Clipboard_Screenshot_1757283600"
src="https://github.com/user-attachments/assets/59ec8707-82f8-4daf-8dae-b9ebea2b2959"
/>
This commit mainly optimizes the logic for recycling metadata with the
same name, adding caches to reduce the time complexity.
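To illustrate the kind of cache that helps here, the sketch below keeps a per-name index so that finding same-name entries no longer requires scanning every recycled partition. It is a hypothetical illustration, not the code in this commit: `SameNameIndexSketch`, `NameKey`, `recycle`, `eraseSurplus`, and the `maxKept` parameter are all invented names, and it assumes that same-name erasure means keeping only the newest few entries per (db, table, name).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.Objects;

/** Hypothetical per-name index; not the actual CatalogRecycleBin implementation. */
final class SameNameIndexSketch {

    /** Identifies "the same name" within one table of one database. */
    static final class NameKey {
        final long dbId;
        final long tableId;
        final String name;

        NameKey(long dbId, long tableId, String name) {
            this.dbId = dbId;
            this.tableId = tableId;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof NameKey)) {
                return false;
            }
            NameKey k = (NameKey) o;
            return dbId == k.dbId && tableId == k.tableId && name.equals(k.name);
        }

        @Override
        public int hashCode() {
            return Objects.hash(dbId, tableId, name);
        }
    }

    // Name key -> partition ids in recycle order (oldest first).
    private final Map<NameKey, ArrayDeque<Long>> idsByName = new HashMap<>();
    // Partition id -> name key, so the total count stays cheap to query.
    private final Map<Long, NameKey> keyById = new HashMap<>();

    /** Records a recycled partition; constant expected time. */
    synchronized void recycle(long dbId, long tableId, String name, long partitionId) {
        NameKey key = new NameKey(dbId, tableId, name);
        keyById.put(partitionId, key);
        idsByName.computeIfAbsent(key, k -> new ArrayDeque<>()).addLast(partitionId);
    }

    /**
     * Removes and returns the oldest ids beyond maxKept for each name.
     * Work is proportional to the number of distinct names plus the ids erased,
     * not to (names x all recycled partitions).
     */
    synchronized List<Long> eraseSurplus(int maxKept) {
        if (maxKept < 0) {
            throw new IllegalArgumentException("maxKept must be >= 0");
        }
        List<Long> erased = new ArrayList<>();
        Iterator<Map.Entry<NameKey, ArrayDeque<Long>>> it = idsByName.entrySet().iterator();
        while (it.hasNext()) {
            ArrayDeque<Long> ids = it.next().getValue();
            while (ids.size() > maxKept) {
                Long id = ids.pollFirst();
                keyById.remove(id);
                erased.add(id);
            }
            if (ids.isEmpty()) {
                it.remove();
            }
        }
        return erased;
    }

    synchronized int size() {
        return keyById.size();
    }

    public static void main(String[] args) {
        SameNameIndexSketch index = new SameNameIndexSketch();
        index.recycle(1L, 2L, "p1", 10L);
        index.recycle(1L, 2L, "p1", 11L);
        index.recycle(1L, 2L, "p1", 12L);
        index.recycle(1L, 2L, "p2", 20L);
        System.out.println("erased: " + index.eraseSurplus(2));
    }
}
```

The design trade-off is extra bookkeeping on every recycle and erase in exchange for a much shorter time spent inside the synchronized section, which is what matters when other threads are waiting on that monitor while holding table locks.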
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [ ] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [ ] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
    - [ ] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
apache/doris-website#1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->

1 parent 1b81208 commit d535f2b
2 files changed, +1041 -139 lines changed:
- fe/fe-core/src/main/java/org/apache/doris/catalog
- fe/fe-core/src/test/java/org/apache/doris/catalog