Skip to content

Conversation

@albertzaharovits
Copy link
Contributor

Fixes #125842
Relates #120869

@albertzaharovits albertzaharovits added >test Issues or PRs that are addressing/adding tests :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. labels Apr 1, 2025
@albertzaharovits albertzaharovits self-assigned this Apr 1, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Indexing Meta label for Distributed Indexing team label Apr 1, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@albertzaharovits
Copy link
Contributor Author

The only test failure (https://gradle-enterprise.elastic.co/s/f7fv2rsrf2mwy) was when some merge thread was interrupted...

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICS from the test run, the order of things is opposite. The assertBusy below seems to time-out, which then causes us to interrupt the threads.

I think we've not found the root cause here?

@albertzaharovits
Copy link
Contributor Author

I think we've not found the root cause here?

I haven't figured it out 100% what's happening here, but I've strengthen the assertions that the submitted task was indeed running or aborting before asserting the disk IO rate update on the set of all currently running merge tasks (this should rule out any interference between the loops in the test).

@albertzaharovits
Copy link
Contributor Author

@henningandersen Can you take another look here, please?

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

threadPoolMergeExecutorService.submitMergeTask(mergeTask);
long newIORate = threadPoolMergeExecutorService.getTargetIORateBytesPerSec();
// all currently running merge tasks must be IO throttled
boolean mergeTaskSubmitted = threadPoolMergeExecutorService.submitMergeTask(mergeTask);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we assert that the runSemaphore has no permits here? I think that is an assumption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pushed 5db6465

@albertzaharovits albertzaharovits added v9.0.3 v8.19.0 auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) auto-backport Automatically create backport pull requests when merged labels Jun 13, 2025
@elasticsearchmachine elasticsearchmachine merged commit d98327c into elastic:main Jun 16, 2025
24 checks passed
@albertzaharovits albertzaharovits deleted the fix-125842 branch June 16, 2025 18:00
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.19 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 126058

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) :Distributed Indexing/Engine Anything around managing Lucene and the Translog in an open shard. Team:Distributed Indexing Meta label for Distributed Indexing team >test Issues or PRs that are addressing/adding tests v8.19.0 v9.0.3 v9.1.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] ThreadPoolMergeExecutorServiceTests testIORateIsAdjustedForRunningMergeTasks failing

3 participants