@albertzaharovits
Contributor

@albertzaharovits albertzaharovits commented Mar 31, 2025

We don't know how many semaphore merge permits we need to release, or how many are already released.

Fixes #125744

@albertzaharovits albertzaharovits added >test Issues or PRs that are addressing/adding tests :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Mar 31, 2025
@albertzaharovits albertzaharovits self-assigned this Mar 31, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0 labels Mar 31, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

void allowAllMerging() {
// even when indexing is done, queued and backlogged merges can themselves trigger further merging
// don't let this test be bothered by that, and simply let all merging run unhindered
runMergeSemaphore.release(Integer.MAX_VALUE - initialRunMergesCount);
Contributor Author

The semaphore is also released when excess merges are enqueued, so the number of available permits changes dynamically. I preferred not to fudge an upper-limit value, or use something like Integer.MAX_VALUE / 2 ...
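To illustrate why a fixed release(Integer.MAX_VALUE - initialRunMergesCount) is fragile, here is a minimal sketch using only java.util.concurrent.Semaphore. It assumes, hypothetically, that the semaphore starts with initialRunMergesCount permits and that extra permits may already have been released elsewhere; the class and method names are illustrative, not the test's. If any extra permits were released, the total would exceed Integer.MAX_VALUE, and Semaphore.release(int) throws an Error rather than wrapping around. Whether this is exactly the failure reported in #125744 is an assumption.

```java
import java.util.concurrent.Semaphore;

public class PermitOverflowDemo {
    // Hypothetical value; the real test derives its own initialRunMergesCount.
    static final int INITIAL_RUN_MERGES_COUNT = 2;

    // Returns true if the original "release everything" call overflows the permit count.
    static boolean overflows(int extraPermitsReleased) {
        // Assumption: the semaphore starts with INITIAL_RUN_MERGES_COUNT permits.
        Semaphore runMergeSemaphore = new Semaphore(INITIAL_RUN_MERGES_COUNT);
        // Permits released elsewhere, e.g. when excess merges were enqueued.
        if (extraPermitsReleased > 0) {
            runMergeSemaphore.release(extraPermitsReleased);
        }
        try {
            // Same call as the original allowAllMerging().
            runMergeSemaphore.release(Integer.MAX_VALUE - INITIAL_RUN_MERGES_COUNT);
            return false;
        } catch (Error e) {
            // Semaphore refuses a release that would push the count past Integer.MAX_VALUE.
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("no extra permits -> overflow: " + overflows(0));
        System.out.println("3 extra permits  -> overflow: " + overflows(3));
    }
}
```

With no extra permits the count lands exactly on Integer.MAX_VALUE; with even one extra permit the release is rejected, which is why the fix avoids computing a "remaining" amount.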

// await all merging to catch up
assertBusy(() -> {
// unblock merge threads
testEnginePlugin.allowAllMerging();
Contributor Author

Dynamically release merge permits until everything's done.
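A minimal, self-contained sketch of the "top up permits on every retry" idea, assuming a toy model in which each merge needs one permit and each completed merge can enqueue further merges. The names (DynamicMergeRelease, submitMerge, awaitAllMerging) and the hand-rolled retry loop standing in for assertBusy are hypothetical; only the 10_000 top-up mirrors the code under review. The waiting loop never needs to know how many merges will eventually run.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class DynamicMergeRelease {
    // Hypothetical stand-ins for the test's merge throttle and merge threads.
    static final Semaphore runMergeSemaphore = new Semaphore(0);
    static final AtomicInteger pendingMerges = new AtomicInteger();
    static final ExecutorService mergeThreads = Executors.newFixedThreadPool(4, r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // don't keep the JVM alive once main returns
        return t;
    });

    // Each merge blocks on a permit; once it runs it may trigger further merging,
    // so the total number of merges is not known upfront.
    static void submitMerge(int cascade) {
        pendingMerges.incrementAndGet();
        mergeThreads.execute(() -> {
            try {
                runMergeSemaphore.acquire();
                if (cascade > 0) {
                    // Children are counted before this merge is marked done.
                    submitMerge(cascade - 1);
                    submitMerge(cascade - 1);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                pendingMerges.decrementAndGet();
            }
        });
    }

    // Top up permits instead of releasing one huge, precomputed amount.
    static void allowAllMerging() {
        if (runMergeSemaphore.availablePermits() < 10_000) {
            runMergeSemaphore.release(10_000);
        }
    }

    // assertBusy-like loop: unblock merge threads on every attempt until nothing is pending.
    static void awaitAllMerging(long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (pendingMerges.get() > 0) {
            allowAllMerging();
            if (System.nanoTime() - deadline > 0) {
                throw new AssertionError("merging did not catch up");
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new AssertionError("interrupted while waiting for merges", e);
            }
        }
    }

    public static void main(String[] args) {
        submitMerge(5); // 63 merges in total, but the waiting code never uses that number
        awaitAllMerging(10_000);
        System.out.println("pending merges: " + pendingMerges.get());
    }
}
```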

runMergeSemaphore.release(Integer.MAX_VALUE - initialRunMergesCount);
if (runMergeSemaphore.availablePermits() < 10_000) {
runMergeSemaphore.release(10_000);
}
Contributor

I am not quite sure what we are doing here. Isn't the maximum number of permits we can release equal to initialRunMergesCount - runMergeSemaphore.availablePermits()?

Contributor Author

We need to release as many permits as there are merges, but we don't know how many merges will be generated (that depends on many parameters). The computation is further complicated by some "background" merging that keeps running while merging is otherwise mostly stopped.

Contributor

@henningandersen henningandersen left a comment

LGTM.

// even when indexing is done, queued and backlogged merges can themselves trigger further merging
// don't let this test be bothered by that, and simply let all merging run unhindered
runMergeSemaphore.release(Integer.MAX_VALUE - initialRunMergesCount);
if (runMergeSemaphore.availablePermits() < 10_000) {
Contributor

How can it have more than 10K available already? Should we instead assert that it does not?

Contributor Author

It can't have more than 10K, or Integer.MAX_VALUE / 2, or 99999, etc.
It just feels odd to fudge any non-obvious value, but the shenanigans required to avoid that aren't worth it. I'll fudge a value and add a comment.
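The 10_000 threshold only has to be far from Integer.MAX_VALUE. A short sketch, with hypothetical class and method names, showing that repeated top-ups from a single caller settle instead of accumulating: the first call adds 10_000 and later calls are no-ops while at least 10_000 permits remain. Note that availablePermits() followed by release() is a check-then-act, so concurrent callers could each add 10_000; that still stays nowhere near overflow.

```java
import java.util.concurrent.Semaphore;

public class TopUpPermits {
    // Mirrors the fudged value from the PR; any value far below Integer.MAX_VALUE / 2 works.
    static final int TOP_UP = 10_000;

    static void allowAllMerging(Semaphore runMergeSemaphore) {
        if (runMergeSemaphore.availablePermits() < TOP_UP) {
            runMergeSemaphore.release(TOP_UP);
        }
    }

    // Hypothetical helper: start with some permits already released elsewhere,
    // then call allowAllMerging() once per retry and report what is left.
    static int simulate(int preReleased, int retries) {
        Semaphore s = new Semaphore(preReleased);
        for (int i = 0; i < retries; i++) {
            allowAllMerging(s);
        }
        return s.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println(simulate(5, 1_000)); // prints 10005
    }
}
```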

Contributor Author

Pushed 03df3d1

@albertzaharovits albertzaharovits merged commit edc5379 into elastic:main Apr 2, 2025
17 checks passed
@albertzaharovits albertzaharovits deleted the fix-125744 branch April 2, 2025 13:07
andreidan pushed a commit to andreidan/elasticsearch that referenced this pull request Apr 9, 2025
…nCatchesUp (elastic#125956)
albertzaharovits added a commit to albertzaharovits/elasticsearch that referenced this pull request Jun 9, 2025
…nCatchesUp (elastic#125956)

Labels

:Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Indexing Meta label for Distributed Indexing team >test Issues or PRs that are addressing/adding tests v9.1.0

Development

Successfully merging this pull request may close these issues.

[CI] ThreadPoolMergeSchedulerStressTestIT testMergingFallsBehindAndThenCatchesUp failing

4 participants