@original-brownbear
Contributor

We only retained the response for the sake of a test-only callback in this lambda. Also, in OutboundMessage we only retain the message for its class name until the very end, on the off chance we need it to print a slowness warning. This should do away with all retention in the transport layer code. There are probably a lot of spots to fix for this upstream from there, but IMO it's a good start and saves heap for all the spots that don't need fixing right away.

PS: I'm aware that OutboundMessage is a weird class now, even weirder than before. My suggestion would be to go for this change as a "shortest possible fix" kind of thing and refactor away OutboundMessage in the next step (and do so quickly).
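
To make the retention issue concrete, here is a minimal Java sketch. It is not the Elasticsearch code: `BigResponse`, `retaining` and `nonRetaining` are hypothetical names, and the only assumption is that the slowness warning needs nothing more than the class name. A lambda that references the response keeps it reachable for as long as the lambda lives, whereas one that captures only the `Class` object does not.

```java
import java.util.function.Supplier;

final class SlowSendDescription {

    /** Hypothetical stand-in for a potentially huge transport response. */
    static final class BigResponse {
        final byte[] payload = new byte[1024];
    }

    /** The lambda captures {@code response}, so it stays reachable as long as the supplier does. */
    static Supplier<String> retaining(BigResponse response) {
        return () -> "slow send of [" + response.getClass().getName() + "]";
    }

    /** Only the Class object is captured; the supplier holds no reference to the response. */
    static Supplier<String> nonRetaining(BigResponse response) {
        final Class<?> type = response.getClass();
        return () -> "slow send of [" + type.getName() + "]";
    }

    public static void main(String[] args) {
        BigResponse response = new BigResponse();
        Supplier<String> description = nonRetaining(response);
        response = null; // nothing held by `description` keeps the payload alive now
        System.out.println(description.get());
    }
}
```

Both suppliers produce the same text; the difference is only in what stays reachable while the send is in flight.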

We only retained the response for the sake of a test-only callback in
this lambda. With this fixed, all that's seemingly left is removing the
response reference from `OutboundMessage` and we should be (unless
there are more callsite-specific issues upstream) able to GC these
potentially huge instances a lot quicker under network pressure.
@elasticsearchmachine added the Team:Distributed Coordination label Mar 18, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@DaveCTurner DaveCTurner left a comment

Makes sense to me. Could you open a PR with the more finished state too (with OutboundMessage gone) so we can see more clearly where this is heading? It might be that the next step is also not so large and we can do it in one.

+ "}{"
+ isHandshake()
+ "}{"
+ message.getClass()
Contributor

I'd rather we kept some way to identify the response type, although I must say I never much liked sharing its class here. Could we plumb in the action from the TcpTransportChannel down to here first?

@original-brownbear
Contributor Author

@DaveCTurner thanks for taking a look. So roughly https://github.com/elastic/elasticsearch/compare/main...original-brownbear:drop-outbound-msg?expand=1 is where I'd take this. It's essentially just a bunch of test plumbing and maybe some other simplifications to serialization but nothing complicated either.

Note that I made that version in a way that preserves the slowness warning serialization exactly as it is today ... not necessarily the approach I'd recommend long term but also not a big deal memory wise for now.

DaveCTurner added a commit to DaveCTurner/elasticsearch that referenced this pull request Mar 20, 2025
Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates elastic#125163
@DaveCTurner
Contributor

Ok yeah that looks nicer indeed, I think I'd rather do the whole thing in one step tbh rather than having this intermediate state. Or rather I'd like to do #125326 separately first because it's kinda unrelated to the rest of it, then just https://github.com/elastic/elasticsearch/compare/main...original-brownbear:drop-outbound-msg?expand=1 straight away.

@original-brownbear
Contributor Author

Sounds good, I'll wait for that PR to go in, then I'll update this one to cover all the changes when resolving the conflicts anyway :)

@DaveCTurner
Contributor

👍 I think my only comment so far is that I'd rather not use requestAction == null to say whether we're sending a request or a response. Let's have a boolean parameter instead (and validate it against requestAction == null).
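
A rough sketch of what that suggestion could look like, in plain Java. The class and method names are hypothetical, the action string is only an example value, and an explicit exception stands in for whatever validation the real code would use (an `assert` would be another option):

```java
final class SendValidation {

    /**
     * Hypothetical helper: callers say explicitly whether they send a request, and the
     * flag is cross-checked against the nullable action name (responses carry none here).
     */
    static String describe(boolean isRequest, String requestAction) {
        if (isRequest != (requestAction != null)) {
            throw new IllegalArgumentException(
                "isRequest=" + isRequest + " is inconsistent with requestAction=" + requestAction
            );
        }
        return isRequest ? "request [" + requestAction + "]" : "response";
    }

    public static void main(String[] args) {
        System.out.println(describe(true, "indices:data/read/search"));
        System.out.println(describe(false, null));
    }
}
```

The explicit flag keeps the call sites readable, while the cross-check catches a caller that passes an action for a response or forgets it for a request.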

elasticsearchmachine pushed a commit that referenced this pull request Mar 20, 2025
Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates #125163
DaveCTurner added a commit that referenced this pull request Mar 20, 2025
Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates #125163
channel,
message,
networkMessage,
requestAction == null
Contributor Author

As I said above, I'm not 100% sure about this one, but then again, either we want this information or we don't, and this still seems cheaper than redundantly building the string.

Contributor

Yep I think we want this info. Looks easy enough to plumb in the action on the response path too these days, but we can do that in a followup.


// public for tests
public static BytesReference serialize(
@Nullable String requestAction,
Contributor Author

I get your point @DaveCTurner, setting request or response based on a null is not great, but then again, this method already has an absurd number of parameters? :)

Contributor

Maybe it needs a parameters object? I think I'd call it... OutboundMessage 🤣

Contributor

I'll follow up with something like this:

original-brownbear/elasticsearch@stop-retaining-transport-response-past-sending...DaveCTurner:elasticsearch:2025/03/21/OutboundHandler-MessageDirection

drops the isError boolean that only makes sense on responses, replacing it with a three-state enum, and makes the action available on all paths.
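
As an illustration of why a three-state enum can be tidier than two booleans, here is a small self-contained sketch. The enum name echoes the branch name above, but the constant names and the `fromFlags` helper are assumptions made for this example, not the follow-up's actual code:

```java
final class Directions {

    /** Constant names are hypothetical; only the idea of three states comes from the comment above. */
    enum MessageDirection {
        REQUEST,
        RESPONSE,
        RESPONSE_ERROR;

        boolean isResponse() {
            return this != REQUEST;
        }
    }

    /** Maps the old pair of booleans onto the enum, rejecting the combination that never made sense. */
    static MessageDirection fromFlags(boolean isRequest, boolean isError) {
        if (isRequest) {
            if (isError) {
                throw new IllegalArgumentException("a request cannot be an error response");
            }
            return MessageDirection.REQUEST;
        }
        return isError ? MessageDirection.RESPONSE_ERROR : MessageDirection.RESPONSE;
    }

    public static void main(String[] args) {
        System.out.println(fromFlags(false, true));
    }
}
```

Two booleans admit four combinations, one of which is meaningless; the enum makes that state unrepresentable.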

final BytesReference totalBytes = message.serialize(body, os);
final BytesReference totalBytes;
if (isRequest) {
totalBytes = OutboundHandler.serialize(
Contributor Author

I guess we could add some overloads to shorten this thing, but not sure it's worth it, it's not that many spots and it's test only code.

Contributor

Yeah no big deal IMO

@original-brownbear
Contributor Author

Alright, thanks @DaveCTurner, I merged my other branch into this one. Fine by me adding another flag here to mark a request, but other than that wdyt? :) Certainly could be done nicer here and there, but IMO it's an improvement over the status quo beyond the GC improvements because it also makes it a little easier to add additional zero-copy serialization paths, I think.

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM (few nits but nothing blocking)


*/
void sendBytes(TcpChannel channel, BytesReference bytes, ActionListener<Void> listener) {
internalSend(channel, bytes, null, listener);
internalSend(channel, bytes, () -> "", listener);
Contributor

Suggested change
internalSend(channel, bytes, () -> "", listener);
internalSend(channel, bytes, () -> "raw bytes", listener);

if (compressionScheme != null) {
status = TransportStatus.setCompress(status);
}
byteStreamOutput.seek(0);
Contributor

We should at least assert that byteStreamOutput.position() == 0 at the top of the method (the existing code also makes this assumption)
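
The pattern behind this remark (reserve room for the header, write the body, then seek back to offset 0 to fill the header in) can be sketched with a plain `java.nio.ByteBuffer`. This is only an illustration: the 4-byte length header and all names are assumptions, and an explicit exception is used instead of an `assert` so that the check runs even without `-ea`:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class HeaderThenBody {

    /** Illustrative header: just the body length as a 4-byte int. */
    static final int HEADER_SIZE = Integer.BYTES;

    static ByteBuffer serialize(ByteBuffer out, String body) {
        // Seeking back to 0 later is only correct if the message started at 0.
        if (out.position() != 0) {
            throw new IllegalStateException("expected a fresh buffer but position was " + out.position());
        }
        out.position(HEADER_SIZE);                       // skip the header for now
        out.put(body.getBytes(StandardCharsets.UTF_8));  // write the body
        int end = out.position();
        out.position(0);                                 // the equivalent of seek(0)
        out.putInt(end - HEADER_SIZE);                   // fill in the header
        out.position(end);
        out.flip();                                      // ready for reading: [0, end)
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer framed = serialize(ByteBuffer.allocate(64), "hello");
        System.out.println("body length = " + framed.getInt());
    }
}
```

Without the position check, a caller passing a partially written buffer would silently have its earlier bytes overwritten by the header.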


import java.io.IOException;

abstract class OutboundMessage extends NetworkMessage {
Contributor

There are no usages of NetworkMessage left now either?

Contributor Author

Right deleted it :)


final int variableHeaderLength = Math.toIntExact(byteStreamOutput.position() - TcpHeader.HEADER_SIZE);
BytesReference message = serializeMessageBody(writeable, compressionScheme, version, byteStreamOutput);
byte status = requestAction != null ? 0 : TransportStatus.setResponse((byte) 0);
Contributor

nit: would prefer to start with 0 and use if (...) over ... ? ... : ... for harmony with the following lines
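
For readers unfamiliar with the style being asked for, a minimal sketch of building a status byte from zero with one `if` per flag follows. The bit assignments and names are illustrative only and are not taken from the Elasticsearch `TransportStatus` class:

```java
final class StatusFlags {

    // Illustrative bit assignments, chosen for this example only.
    static final byte RESPONSE = 1 << 0;
    static final byte COMPRESS = 1 << 1;
    static final byte HANDSHAKE = 1 << 2;

    /** Starts from 0 and sets each flag in its own if-statement, so every flag reads the same way. */
    static byte build(boolean isResponse, boolean compressed, boolean handshake) {
        byte status = 0;
        if (isResponse) {
            status |= RESPONSE;
        }
        if (compressed) {
            status |= COMPRESS;
        }
        if (handshake) {
            status |= HANDSHAKE;
        }
        return status;
    }

    public static void main(String[] args) {
        System.out.println(build(true, true, false));
    }
}
```

Compared with seeding the variable through a ternary, every flag is handled by the same shape of statement, which is the "harmony" the nit refers to.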

@original-brownbear added the auto-backport label Mar 21, 2025
@original-brownbear
Contributor Author

Thanks David!

@original-brownbear original-brownbear merged commit 9c8750b into elastic:main Mar 21, 2025
17 checks passed
@original-brownbear original-brownbear deleted the stop-retaining-transport-response-past-sending branch March 21, 2025 12:08
@elasticsearchmachine
Collaborator

💔 Backport failed

Status Branch Result
8.x Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 125163

afoucret pushed a commit to afoucret/elasticsearch that referenced this pull request Mar 21, 2025
…125326)

Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates elastic#125163
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
…125326)

Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates elastic#125163
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
Remove the `OutboundMessage` class that needlessly holds on to the response instances after they are no longer needed. Inlining the logic should save considerable heap under pressure and enable further optimisations.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…125326)

Nobody uses this parameter (except some tests that simply verify the
otherwise-unused plumbing is connected). This commit removes it.

Relates elastic#125163
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
Remove the `OutboundMessage` class that needlessly holds on to the response instances after they are no longer needed. Inlining the logic should save considerable heap under pressure and enable further optimisations.
original-brownbear added a commit to original-brownbear/elasticsearch that referenced this pull request Apr 1, 2025
Remove the `OutboundMessage` class that needlessly holds on to the response instances after they are no longer needed. Inlining the logic should save considerable heap under pressure and enable further optimisations.
original-brownbear added a commit that referenced this pull request Apr 2, 2025
)

Remove the `OutboundMessage` class that needlessly holds on to the response instances after they are no longer needed. Inlining the logic should save considerable heap under pressure and enable further optimisations.

backport of #125163
@original-brownbear
Contributor Author

Backported in #126078 now, sorry for the delay @DaveCTurner!

@DaveCTurner
Contributor

Thanks Armin

Labels

auto-backport, :Distributed Coordination/Network, >non-issue, Team:Distributed Coordination, v8.19.0, v9.1.0


3 participants