Skip to content

Conversation

ywangd
Copy link
Member

@ywangd ywangd commented Sep 8, 2025

The assertion iterates through the list of responses. It needs to wait for the last element to be added to the list otherwise it may encounter ConcurrentModificationException

Resolves: #134277

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: elastic#134277
@ywangd ywangd added >test Issues or PRs that are addressing/adding tests auto-backport Automatically create backport pull requests when merged :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. v9.2.0 v9.1.4 v9.0.7 v8.19.4 labels Sep 8, 2025
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Sep 8, 2025
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR fixes a concurrency issue in the TransportNodesActionTests.testConcurrentlyCompletionAndCancellation test method that was causing ConcurrentModificationException errors. The test was failing because it was iterating through a list of responses while another thread was still adding elements to it.

  • Added synchronization using CountDownLatch to ensure all responses are gathered before assertion
  • Removed the test from the muted tests list as the concurrency issue is now resolved

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
TransportNodesActionTests.java Added CountDownLatch synchronization to wait for response completion before asserting
muted-tests.yml Removed the previously muted test since the concurrency issue is fixed

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I think this excludes some interleavings that matter. It should be that all the responses apart from the last one are released before onCancelledLatch is triggered, regardless of whether we've received & processed all the responses, whereas now we're delaying that check until we've definitely received all the responses.

Could we use nodeResponses just for the first N-1 responses, check that they're all released on cancel, and then handle the racy Nth response separately?

@DaveCTurner
Copy link
Contributor

Also hmm if we do this do we need onCancelledLatch? We know the task is cancelled by this point, because we've received the exception response. Releasing all the in-flight node responses should happen-before responding to the outer request. So really we just need to wait for the decRef in the final completeOneRequest call.

@ywangd
Copy link
Member Author

ywangd commented Sep 9, 2025

Yeah you are right on both points. I pushed 87704ae for them. Thanks a lot!

@ywangd ywangd requested a review from DaveCTurner September 9, 2025 02:59
Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, tho (nit) a CountDownLatch plus a separate AtomicReference is effectively a PlainActionFuture on which you could call ESTestCase#safeGet.

@ywangd
Copy link
Member Author

ywangd commented Sep 10, 2025

Ah yes. I changed to use PlainActionFuture see fdf51b8

@ywangd ywangd added the auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Sep 10, 2025
@elasticsearchmachine elasticsearchmachine merged commit b03e1d4 into elastic:main Sep 10, 2025
34 checks passed
@ywangd ywangd deleted the ES-134277-fix branch September 10, 2025 03:25
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
9.1 Commit could not be cherrypicked due to conflicts
9.0 Commit could not be cherrypicked due to conflicts
8.19 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 134279

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Sep 10, 2025
The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: elastic#134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
@ywangd
Copy link
Member Author

ywangd commented Sep 10, 2025

💚 All backports created successfully

Status Branch Result
9.1
9.0
8.19

Questions ?

Please refer to the Backport tool documentation

ywangd added a commit to ywangd/elasticsearch that referenced this pull request Sep 10, 2025
The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: elastic#134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Sep 10, 2025
…34424)

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: #134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Sep 10, 2025
…34423)

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: #134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 11, 2025
…) (elastic#134424)

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: elastic#134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
elasticsearchmachine pushed a commit that referenced this pull request Sep 15, 2025
…34422)

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: #134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml

Co-authored-by: Elastic Machine <[email protected]>
sarog pushed a commit to portsbuild/elasticsearch that referenced this pull request Sep 19, 2025
…) (elastic#134424)

The assertion iterates through the list of responses. It needs to wait
for the last element to be added to the list otherwise it may encounter
ConcurrentModificationException

Resolves: elastic#134277
(cherry picked from commit b03e1d4)

# Conflicts:
#	muted-tests.yml
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

auto-backport Automatically create backport pull requests when merged auto-merge-without-approval Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) backport pending :Distributed Coordination/Distributed A catch all label for anything in the Distributed Coordination area. Please avoid if you can. Team:Distributed Coordination Meta label for Distributed Coordination team >test Issues or PRs that are addressing/adding tests v8.19.4 v9.0.7 v9.1.4 v9.2.0

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[CI] TransportNodesActionTests testConcurrentlyCompletionAndCancellation failing

3 participants