Make DelayableWriteable compress its contents #126988
Conversation
Pinging @elastic/es-analytical-engine (Team:Analytics)
rjernst left a comment
The concepts here seem good (using fixed-size sizes and compressing). I have one question.
    out.writeInt(serialized.length());
    serialized.writeTo(out);
} else {
    out.writeBytesReference(serialized);
It seems the above comment is no longer accurate with the new branch since we will no longer copy the raw bytes, but instead reencode?
We still copy the raw bytes, it's just that the prefix is a full int now. writeBytesReference is nothing more than:

    out.writeVInt(serialized.length());
    serialized.writeTo(out);

All that happens here is going from a vint to an int?
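For reference, a minimal standalone sketch of the two framings, using plain java.io rather than Elasticsearch's StreamOutput (the class name and the local writeVInt helper here are invented for illustration). Both variants copy the same payload bytes; they differ only in how many bytes the length prefix occupies.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    // Both framings copy the same payload bytes; the only difference is whether the
    // length prefix is variable-width (vint) or a fixed 4-byte int.
    public class LengthPrefixDemo {

        // Variable-width int: 7 bits per byte, high bit marks a continuation byte.
        static void writeVInt(DataOutputStream out, int value) throws IOException {
            while ((value & ~0x7F) != 0) {
                out.writeByte((value & 0x7F) | 0x80);
                value >>>= 7;
            }
            out.writeByte(value);
        }

        static byte[] frameWithVInt(byte[] payload) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            writeVInt(out, payload.length);   // 1-5 bytes depending on the value
            out.write(payload);
            return bytes.toByteArray();
        }

        static byte[] frameWithInt(byte[] payload) throws IOException {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(payload.length);     // always exactly 4 bytes
            out.write(payload);
            return bytes.toByteArray();
        }

        public static void main(String[] args) throws IOException {
            byte[] payload = new byte[300];
            System.out.println("vint framing: " + frameWithVInt(payload).length + " bytes"); // 302
            System.out.println("int framing:  " + frameWithInt(payload).length + " bytes");  // 304
        }
    }

The fixed prefix costs at most three extra bytes per delayed payload, but its width is known before the payload length is.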
Ah ok, I understand now.
rjernst left a comment
LGTM
Thanks Ryan!
In light of data from recent escalations and the introduction of batched execution, we can make two improvements to this logic.
For one, we should prefix with a fixed-length length field so that we don't need to do any copying during serialization to account for the vint. This outright halves the memory bandwidth required relative to the previous implementation.
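As a rough sketch of why a fixed-size prefix can remove that copy (a hypothetical helper in plain java.io, not the actual DelayableWriteable code): a 4-byte slot can be reserved up front and backfilled once the payload length is known, whereas a vint's width depends on its value, so the payload would have to be staged in a separate buffer and then copied behind the finished prefix.

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;

    // Hypothetical helper illustrating the reserve-and-backfill technique.
    final class FixedPrefixFraming {

        // Stands in for "something that can serialize itself to a stream".
        interface PayloadWriter {
            void writeTo(ByteArrayOutputStream out) throws IOException;
        }

        static byte[] frame(PayloadWriter payload) throws IOException {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            out.write(new byte[4]);                 // reserve the 4-byte length slot
            payload.writeTo(out);                   // serialize straight after the slot
            byte[] framed = out.toByteArray();
            ByteBuffer.wrap(framed, 0, 4).putInt(framed.length - 4); // backfill the length
            return framed;
        }

        public static void main(String[] args) throws IOException {
            byte[] fakePayload = "some serialized aggregation".getBytes();
            byte[] framed = frame(o -> o.writeBytes(fakePayload));
            System.out.println(framed.length);      // 4-byte prefix + payload
        }
    }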
Perhaps more importantly, we should compress these bytes. The wire format for aggregations is rather inefficient for nested bucket aggregations since the type strings are repeated over and over. These strings don't contribute to peak heap requirements because they are translated into Java types, but they blow up the message size considerably (among other things). In practice, we often see compression ratios of around 10x for aggregations.
Given that we generally see more memory issues than CPU issues during the reduce step, it seems like an easy tradeoff to spend a little CPU on compression in exchange for serious heap savings here.
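To make the combined idea concrete, here is a self-contained round-trip sketch: compress the delayed payload and frame it with a fixed 4-byte length. Plain java.util.zip DEFLATE and java.io streams are used purely for illustration; this is not necessarily the compressor or stream API the actual change uses.

    import java.io.ByteArrayInputStream;
    import java.io.ByteArrayOutputStream;
    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;
    import java.util.zip.InflaterInputStream;

    // Conceptual sketch: compress the delayed payload, then frame it with a
    // fixed 4-byte length prefix instead of a vint.
    public class CompressedDelayedBytes {

        static byte[] writeDelayed(byte[] serializedAggs) throws IOException {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            try (DeflaterOutputStream deflate = new DeflaterOutputStream(compressed)) {
                deflate.write(serializedAggs);    // repeated type strings compress very well
            }
            ByteArrayOutputStream framed = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(framed);
            out.writeInt(compressed.size());      // fixed-size prefix instead of a vint
            compressed.writeTo(framed);
            return framed.toByteArray();
        }

        static byte[] readDelayed(byte[] framed) throws IOException {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(framed));
            byte[] compressed = new byte[in.readInt()];
            in.readFully(compressed);
            try (InflaterInputStream inflate =
                     new InflaterInputStream(new ByteArrayInputStream(compressed))) {
                return inflate.readAllBytes();
            }
        }

        public static void main(String[] args) throws IOException {
            // Highly repetitive input stands in for the aggregation wire format.
            byte[] fake = "org.elasticsearch.search.aggregations.bucket.terms".repeat(1000).getBytes();
            byte[] framed = writeDelayed(fake);
            System.out.println("raw: " + fake.length + " bytes, framed+compressed: " + framed.length);
            System.out.println("round-trips: " + java.util.Arrays.equals(fake, readDelayed(framed)));
        }
    }

On input as repetitive as the fake type strings above, DEFLATE shrinks the frame to a small fraction of the raw size, which is consistent with the ~10x ratios mentioned for real aggregations.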