-
Notifications
You must be signed in to change notification settings - Fork 25.6k
Fix RestCancellableNodeClientTests.testConcurrentExecuteAndClose() #129294
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121
|
Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination) |
| // if the channel is already closed, the listener gets notified immediately, from the same thread. | ||
| if (open.get() == false) { | ||
| listener.onResponse(null); | ||
| // Handle scenario where awaitClose() was called before any calls to addCloseListener(), this ensures closeLatch is pulled. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah good point, but then could we achieve the same thing by dropping the whole if (open.get() == false) and always calling closeListener.onResponse(ActionListener.assertOnce(listener));? I guess that doesn't guarantee to complete listener on the calling thread in that case, not sure if this is important.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes testChannelAlreadyClosed() can fail on the unexpected RestCancellableNodeClient.getNumChannels() value if a listener is completed async. I can update this test and refactor TestHttpChannel to support waiting for the closeLatch to get pulled after previously calling close(). Right now you can't just call awaitClose() since it tries to first call close() which will fail since the atomic has already been set.
I'll try to simplify TestHttpChannel.addCloseListener() to address this comment and the other comment below, adjusting the existing tests and TestHttpChannel as needed.
| if (open.get() == false) { | ||
| listener.onResponse(null); | ||
| // Handle scenario where awaitClose() was called before any calls to addCloseListener(), this ensures closeLatch is pulled. | ||
| if (closeListener.isDone() == false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we assertFalse(closeListener.isDone()) on this branch too? Seems like it'd be a test bug to want to add two of them?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When the channel is closed we can get more calls here since the channel is removed from httpChannels in RestCancelllableNodeClient.CloseListener.onResponse(), so a new RestCancelllableNodeClient.CloseListener is created in RestCancelllableNodeClient.doExecute(), which leads to maybeRegisterChannel() and then httpChannel.addCloseListener().
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to avoid multiple addCloseListener() calls it looks like we'd want to check if the httpChannel is closed at the beginning of RestCancellableNodeClient.doExecute() and complete the listener with a failure before returning. But this would be a change in existing behavior and expected cancellations and we'd need to alter the tests for it. To keep the change minimal in the PR I removed the isDone() check and added an additional comment to clarify what is expected when the channel is closed. WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
| if (open.get() == false) { | ||
| listener.onResponse(null); | ||
| // Ensure closeLatch is pulled by completing the closeListener with a noop that is ignored if it is already completed. | ||
| // Note that when the channel is closed we may see multiple addCloseListener() calls, so we do not assert on isDone() here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... nor can we rely on the listener with which closeListener is completed itself being completed which is why we pass in a noop() after completing listener directly ourselves. Would you add words to that effect to this comment for the next reader?
This all seems kinda ugly but I don't have a better suggestion.
…lastic#129294) Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121
…lastic#129294) Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121
…lastic#129294) Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121
…lastic#129294) Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121
…lastic#129294) Fixes a race condition in testConcurrentExecuteAndClose() where awaitClose() is called before addCloseListener() has been called, resulting in a situation where closeListener is never completed and the closeLatch is never pulled. Closes elastic#129121 (cherry picked from commit 9a8e503)
Fixes a race condition in
testConcurrentExecuteAndClose()whereawaitClose()is called beforeaddCloseListener()has been called, resulting in a situation wherecloseListeneris never completed and thecloseLatchis never pulled.Closes #129121