Skip to content

Conversation

javanna
Copy link
Member

@javanna javanna commented Oct 2, 2025

#127112 introduced BytesTransportResponse to be used in search batched execution, so that NodeQueryResponse could be written as BytesTransportResponse as opposed to materializing the response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating node is on a different version than the proxy node, the response will fail to deserialize in the coord node because it was written with the version of the proxy node as opposed to that of the coord (target) node. This is because DirectResponseChannel skips the step of reading and writing back such response, which would lead to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to write the response, and conditionally converting it in the ProxyRequestHandler.

There may be better ways to fix this problem, let's use this draft to align on the problem and agree on possible solutions. I am happy to iterate on it, as this currently blocks merging #135549 due to the serialization change it introduces.

Note that search batched executed is still under a feature flag, and disabled by default, so this bug does not currently affect our users.

execution, to a BytesTransportResponse as opposed to materializing the
response object on heap.

When a proxy node acts as a proxy to query its local data, and the coordinating
node is on a different version than the proxy node, the response will fail
to deserialize in the coord node because it was written with the version of the
proxy node as opposed to that of the coord (target) node. This is because
DirectResponseChannel does not read and write such response, which would lead
to it being converted to the right format.

This commit attempts to fix this problem by tracking the version used to
write the response, and conditionally converting it in the ProxyRequestHandler.
@javanna javanna added >bug :Distributed Coordination/Network Http and internode communication implementations :Search Foundations/Search Catch all for Search Foundations v9.3.0 labels Oct 2, 2025
@elasticsearchmachine
Copy link
Collaborator

Hi @javanna, I've created a changelog YAML for you.

@javanna javanna removed the :Search Foundations/Search Catch all for Search Foundations label Oct 2, 2025
@javanna javanna marked this pull request as ready for review October 3, 2025 10:09
@elasticsearchmachine elasticsearchmachine added the Team:Distributed Coordination Meta label for Distributed Coordination team label Oct 3, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This deserves some tests in TransportActionProxyTests showing that we do/don't re-serialize the response in these cases.

try {
channel.sendResponse(convertedResponse);
} finally {
convertedResponse.decRef();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd initially forgotten this, can you double check that this is what I need to be doing please?

convertedResponse.decRef();
}
} catch (IOException e) {
throw new UncheckedIOException(e);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should I throw or send the exception as a response? I am assuming that exceptions thrown are already caught and sent as responses, but good to double check.

public void writeTo(StreamOutput out) throws IOException {
out.writeString(targetNode);
if (out.getTransportVersion().supports(transportVersion1)) {
out.writeBoolean(true);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reproduces the problem in the tests I added, which is fixed with the conversion this change introduces.

"internal:test",
cancellable,
// For a proxy node proxying to itself, the response is sent directly, without it being read by the proxy layer
r -> { throw new AssertionError(); },
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this diff is misleading! I only changed this line in testSendLocalRequest

@javanna
Copy link
Member Author

javanna commented Oct 3, 2025

Thanks for the suggestion, I added tests @DaveCTurner.

Copy link
Contributor

@DaveCTurner DaveCTurner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@javanna javanna merged commit cebc1a8 into elastic:main Oct 6, 2025
34 checks passed
@javanna javanna deleted the fix/proxy_direct_response_convert branch October 6, 2025 12:01
@javanna javanna added v8.19.6 and removed v8.19.6 labels Oct 6, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Coordination/Network Http and internode communication implementations Team:Distributed Coordination Meta label for Distributed Coordination team v9.3.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants