Threadpool merge executor does not block aborted merges #129613

albertzaharovits · 2025-06-18T08:51:33Z

This PR addresses a bug where aborted merges are blocked if there's insufficient disk space.

Previously, the merge disk space estimation did not consider whether the merge had been aborted while it was enqueued for execution. Consequently, aborted merges (e.g. when closing a shard) were blocked if their estimated disk space exceeded the available disk space threshold. In this case, the shard close operation would itself block.

This fix estimates a disk space budget of 0 for aborted merges, and it periodically checks if any enqueued merge tasks have been aborted (more generally, it checks if the budget estimate for any merge tasks has changed, and reorders the queue if so). This way aborted merges are prioritized and are never blocked.
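
As a rough illustration of the idea described above, here is a minimal sketch, not the actual `ThreadPoolMergeExecutorService` code. All names in it (`AbortedMergeBudgetSketch`, `MergeTask`, `refreshBudgets`, `pollIfWithinBudget`) are hypothetical. It assumes a queue ordered by a cached budget, where an aborted task is estimated at `0` and a periodic refresh rebuilds the ordering when any estimate has changed.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.ToLongFunction;

final class AbortedMergeBudgetSketch {
    /** Hypothetical stand-in for a merge task; not the Elasticsearch class. */
    static final class MergeTask {
        final String name;
        final long estimatedBytes;
        volatile boolean aborted;

        MergeTask(String name, long estimatedBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
        }
    }

    /** Queue entry caching the budget computed when it was (re)inserted. */
    static final class Entry {
        final MergeTask task;
        final long budget;

        Entry(MergeTask task, long budget) {
            this.task = task;
            this.budget = budget;
        }
    }

    /** Assumed budget function: an aborted merge needs no disk space. */
    static final ToLongFunction<MergeTask> BUDGET = t -> t.aborted ? 0L : t.estimatedBytes;

    private final PriorityQueue<Entry> queue = new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.budget));

    void enqueue(MergeTask task) {
        queue.add(new Entry(task, BUDGET.applyAsLong(task)));
    }

    /** Re-evaluates every enqueued budget; rebuilds the heap if any estimate changed. */
    boolean refreshBudgets() {
        List<Entry> refreshed = new ArrayList<>(queue.size());
        boolean changed = false;
        for (Entry e : queue) {
            long budget = BUDGET.applyAsLong(e.task);
            changed |= budget != e.budget;
            refreshed.add(new Entry(e.task, budget));
        }
        if (changed) {
            queue.clear();
            queue.addAll(refreshed);
        }
        return changed;
    }

    /** Returns the head only if its budget fits; otherwise the caller stays blocked (null here). */
    MergeTask pollIfWithinBudget(long availableBytes) {
        Entry head = queue.peek();
        if (head == null || head.budget > availableBytes) {
            return null;
        }
        return queue.poll().task;
    }

    public static void main(String[] args) {
        AbortedMergeBudgetSketch q = new AbortedMergeBudgetSketch();
        MergeTask big = new MergeTask("big", 500L);
        q.enqueue(big);
        System.out.println("before abort: " + q.pollIfWithinBudget(10L));
        big.aborted = true;
        q.refreshBudgets();
        System.out.println("after abort: " + q.pollIfWithinBudget(10L).name);
    }
}
```

The heap is rebuilt rather than mutated in place because `java.util.PriorityQueue` does not notice when the sort key of an element already in the queue changes.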

Closes #129335

elasticsearchmachine · 2025-06-18T11:19:54Z

Hi @albertzaharovits, I've created a changelog YAML for you.

albertzaharovits · 2025-06-18T11:43:30Z

server/src/main/java/org/elasticsearch/index/engine/ThreadPoolMergeExecutorService.java

+                // updates the budget of enqueued elements (and possibly reorders the priority queue)
+                updateBudgetOfEnqueuedElementsAndReorderQueue();
+                // update the budget of dequeued, but still in-use elements (these are the elements that are consuming budget)
                unreleasedBudgetPerElement.replaceAll((e, v) -> budgetFunction.applyAsLong(e.element()));


This change will also adjust the budget of running merges that have been aborted to 0. That's a bit optimistic, but I find the alternative implementation convoluted, and it's probably counter-intuitive to estimate 0 for to-be-run merges but not for already-running ones.
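
To make the effect on already-running merges concrete, here is a small sketch under stated assumptions. It is not the real queue implementation: the `Merge` class, `markInUse`, and the way the available budget is derived from a total are hypothetical. Only the `replaceAll` re-evaluation mirrors the line in the diff.

```java
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

final class UnreleasedBudgetSketch {
    /** Hypothetical running merge; not the Elasticsearch class. */
    static final class Merge {
        final long estimatedBytes;
        volatile boolean aborted;

        Merge(long estimatedBytes) {
            this.estimatedBytes = estimatedBytes;
        }
    }

    /** Assumed budget function: aborted merges are estimated at 0. */
    static final ToLongFunction<Merge> BUDGET = m -> m.aborted ? 0L : m.estimatedBytes;

    /** Budget held by dequeued, still in-use elements. */
    private final Map<Merge, Long> unreleasedBudgetPerElement = new IdentityHashMap<>();

    void markInUse(Merge merge) {
        unreleasedBudgetPerElement.put(merge, BUDGET.applyAsLong(merge));
    }

    /** Re-evaluates the budget held by in-use elements and returns what is left of the total. */
    long updateBudget(long totalBytes) {
        unreleasedBudgetPerElement.replaceAll((merge, oldBudget) -> BUDGET.applyAsLong(merge));
        long inUse = unreleasedBudgetPerElement.values().stream().mapToLong(Long::longValue).sum();
        return Math.max(0L, totalBytes - inUse);
    }

    public static void main(String[] args) {
        UnreleasedBudgetSketch sketch = new UnreleasedBudgetSketch();
        Merge running = new Merge(400L);
        sketch.markInUse(running);
        System.out.println("available while running: " + sketch.updateBudget(1_000L));
        running.aborted = true;
        System.out.println("available after abort: " + sketch.updateBudget(1_000L));
    }
}
```

This is the "optimistic" part: an aborted merge that is still winding down may briefly occupy disk space that the sketch already counts as free.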

henningandersen

LGTM.

We should preferably add specific testing, either before or after merging.

henningandersen · 2025-06-19T05:16:32Z

server/src/internalClusterTest/java/org/elasticsearch/index/engine/MergeWithLowDiskSpaceIT.java

+            .build();
+    }
+
+    public void testShardCloseWhenDiskSpaceInsufficient() {


It is not clear to me what this verifies? AFAICS, there is no merge at the end of the test and thus it may not verify anything?

albertzaharovits · 2025-06-19

Yeah, the test was not ready when you looked at it; it was still WIP. Sorry for not being clear.

It is now ready and it tests that we can close a shard (an index) with enqueued merges that are blocked due to insufficient disk space. The merges will be aborted, which should unblock and prioritize them in the queue.
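
The scenario can be pictured with the following sketch. It is not the integration test itself, and every name in it (`BlockedMergeCloseSketch`, `Merge`, `take`, `updateBudget`) is hypothetical. A consumer blocks while the head of the queue is over budget, and is woken once the merge is aborted and the budgets are re-checked.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

final class BlockedMergeCloseSketch {
    /** Hypothetical merge task; an aborted merge reports a budget of 0. */
    static final class Merge {
        final String name;
        final long estimatedBytes;
        volatile boolean aborted;

        Merge(String name, long estimatedBytes) {
            this.name = name;
            this.estimatedBytes = estimatedBytes;
        }

        long budget() {
            return aborted ? 0L : estimatedBytes;
        }
    }

    private final ReentrantLock lock = new ReentrantLock();
    private final Condition elementAvailable = lock.newCondition();
    private final List<Merge> enqueued = new ArrayList<>();
    private long availableBytes;

    void enqueue(Merge merge) {
        lock.lock();
        try {
            enqueued.add(merge);
            enqueued.sort(Comparator.comparingLong(Merge::budget));
            elementAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Periodic re-check: stores the new budget, reorders by current estimates, wakes waiters. */
    void updateBudget(long newAvailableBytes) {
        lock.lock();
        try {
            availableBytes = newAvailableBytes;
            enqueued.sort(Comparator.comparingLong(Merge::budget));
            elementAvailable.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /** Blocks until the cheapest enqueued merge fits the available budget. */
    Merge take() throws InterruptedException {
        lock.lock();
        try {
            while (enqueued.isEmpty() || enqueued.get(0).budget() > availableBytes) {
                elementAvailable.await();
            }
            return enqueued.remove(0);
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        BlockedMergeCloseSketch queue = new BlockedMergeCloseSketch();
        Merge merge = new Merge("segment-merge", 1_000L);
        queue.enqueue(merge);
        queue.updateBudget(10L); // far too little disk space for the merge
        CompletableFuture<Merge> taken = CompletableFuture.supplyAsync(() -> {
            try {
                return queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        });
        merge.aborted = true;    // e.g. the shard is being closed
        queue.updateBudget(10L); // the re-check now sees a budget of 0
        System.out.println("dequeued: " + taken.orTimeout(10, TimeUnit.SECONDS).join().name);
    }
}
```

Without the re-check in `updateBudget`, the consumer in this sketch would keep waiting on the stale estimate, which is the analogue of the blocked shard close.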

elasticsearchmachine · 2025-06-19T13:20:44Z

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

albertzaharovits · 2025-06-19T13:28:15Z

> We should preferably add specific testing, either before or after merging.

In the end, I've added 2 tests here:

MergeWithLowDiskSpaceIT.testShardCloseWhenDiskSpaceInsufficient
ThreadPoolMergeExecutorServiceDiskSpaceTests.testEnqueuedMergeTasksAreUnblockedWhenEstimatedMergeSizeChanges

There's decent coverage, I think.

albertzaharovits · 2025-06-19T13:31:03Z

(labeling as a >non-issue rather than a >bug because the problem was caught before it was released)

elasticsearchmachine · 2025-06-19T14:52:30Z

💔 Backport failed

Status	Branch	Result
❌	8.19	Commit could not be cherrypicked due to conflicts
❌	9.0	Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running `backport --upstream elastic/elasticsearch --pr 129613`

update budget of enqueued elements

838fae3

albertzaharovits requested a review from henningandersen June 18, 2025 08:51

albertzaharovits self-assigned this Jun 18, 2025

albertzaharovits added the :Distributed Indexing/Engine label (Anything around managing Lucene and the Translog in an open shard.) Jun 18, 2025

elasticsearchmachine added the v9.1.0 label Jun 18, 2025

albertzaharovits and others added 5 commits June 18, 2025 13:42

Test scaffold

e222ab9

Unmute test

4baa30c

Restore setting default

c7530d2

Merge branch 'main' into consider-aborting-merges-while-enqueued

1363bbb

[CI] Auto commit changes from spotless

8d7b18d

albertzaharovits added the >bug label Jun 18, 2025

Update docs/changelog/129613.yaml

51b5d62

albertzaharovits added v8.19.0 v9.0.3 labels Jun 18, 2025

albertzaharovits changed the title from "Update budget estimates for enqueued merge tasks" to "Threadpool merge executor does not block aborted merges" Jun 18, 2025

ES|QL: No plain strings in Literal (elastic#129399)

73abc28

albertzaharovits commented Jun 18, 2025

albertzaharovits added 3 commits June 18, 2025 15:26

More test scaffolding

ea996c1

Merge branch 'main' into consider-aborting-merges-while-enqueued

31fa7ae

boolean test for queue head over the available budget

d870bfb

henningandersen approved these changes Jun 19, 2025

elasticsearchmachine added v9.0.4 and removed v9.0.3 labels Jun 19, 2025

albertzaharovits and others added 6 commits June 19, 2025 10:52

Even more test scaffolding

96b6cb9

Merge branch 'main' into consider-aborting-merges-while-enqueued

9297945

[CI] Auto commit changes from spotless

8b0507c

IT done

b6aed72

Fix testShardCloseWhenDiskSpaceInsufficient

1fc3b64

[CI] Auto commit changes from spotless

12119c5

albertzaharovits added 3 commits June 19, 2025 14:15

do not force merge when indexing

328ac82

testEnqueuedMergeTasksAreUnblockedWhenEstimatedMergeSizeChanges

f47a419

Merge branch 'main' into consider-aborting-merges-while-enqueued

f2afd19

albertzaharovits added >non-issue and removed >bug labels Jun 19, 2025

Delete docs/changelog/129613.yaml

2527c37

albertzaharovits marked this pull request as ready for review June 19, 2025 13:20

elasticsearchmachine added the Team:Distributed Indexing label (Meta label for Distributed Indexing team) Jun 19, 2025

albertzaharovits added the auto-merge-without-approval (Automatically merge pull request when CI checks pass; NB doesn't wait for reviews!) and auto-backport (Automatically create backport pull requests when merged) labels Jun 19, 2025

elasticsearchmachine merged commit 083326e into elastic:main Jun 19, 2025
27 checks passed

albertzaharovits deleted the consider-aborting-merges-while-enqueued branch June 19, 2025 14:51

elasticsearchmachine added the backport pending label Jun 19, 2025

This was referenced Jun 19, 2025

[9.0] Threadpool merge executor does not block aborted merges (#129613) #129727

Merged

[8.19] Threadpool merge executor does not block aborted merges (#129613) #129728

Merged

albertzaharovits removed the backport pending label Jun 20, 2025

kingherc mentioned this pull request Jul 14, 2025

Fix concurrent list in merge test #131186

Merged
