-
Notifications
You must be signed in to change notification settings - Fork 25.5k
[Test] Wait for cancellation before assertions #134532
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Test] Wait for cancellation before assertions #134532
Conversation
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
// Wait for the latch, the listener for releasing node responses is called before it. | ||
// We need to wait for the latch because the cancellation may be detected in CancellableFanOut#onCompletion with | ||
// the responseHandled flag being true. The flag is set by the cancellation listener which is still in process of | ||
// draining existing responses. | ||
safeAwait(onCancelledLatch); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out we still need the latch. It was actually part of the original issue, i.e. the cancellation can comes in after all node responses are collected and right before the final response is sent. In this case, the final response is short circuited to be an exception while the cancellation listener is still doing work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm ok I think I see, so you mean the final response can be sent in between these two lines:
elasticsearch/server/src/main/java/org/elasticsearch/action/support/CancellableFanOut.java
Lines 78 to 80 in a59c182
semaphore.release(); | |
// finally, release refs to all the per-item listeners (without calling onItemFailure, so this is also fast) | |
cancellableTask.notifyIfCancelled(itemCancellationListener); |
On reflection this seems kinda surprising if not an outright bug: we generally prefer to delay sending the response until everything is released.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not the above lines. The final response (cancellation) can be sent at this line
elasticsearch/server/src/main/java/org/elasticsearch/action/support/nodes/TransportNodesAction.java
Line 179 in 6884dbb
cancellableTask.notifyIfCancelled(l); |
before the node responses are released by this line
elasticsearch/server/src/main/java/org/elasticsearch/action/support/nodes/TransportNodesAction.java
Line 120 in 6884dbb
Releasables.wrap(Iterators.map(drainedResponses.iterator(), r -> r::decRef)).close(); |
A potential alternative fix is to add a similar synchronized block to drain node responses right before the first line linked above. What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah sorry this is all quite a tangle. I don't think it'll work to drain responses in onCompletion
- we could already have drained them in the listener within addReleaseOnCancellationListener
but still not quite released them yet.
On reflection it looks like there's other ways we can complete the final listener before releasing all the node-level responses, e.g. here:
elasticsearch/server/src/main/java/org/elasticsearch/action/support/nodes/TransportNodesAction.java
Lines 171 to 173 in 74fd66c
try (var ignored = Releasables.wrap(Iterators.map(responses.iterator(), r -> r::decRef))) { | |
newResponseAsync(task, request, actionContext, responses, exceptions, l); | |
} |
Let's leave this alone then; I appreciate the comment in the test calling out that this is slightly odd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
@elasticmachine update branch |
@elasticmachine update branch |
There are no new commits on the base branch. |
💔 Backport failed
You can use sqren/backport to manually backport by running |
Resolves: elastic#134277 (cherry picked from commit 2ea81d0) # Conflicts: # muted-tests.yml
💚 All backports created successfully
Questions ?Please refer to the Backport tool documentation |
Resolves: elastic#134277 (cherry picked from commit 2ea81d0) # Conflicts: # muted-tests.yml
Resolves: #134277