Conversation

Tim-Brooks
Contributor

@Tim-Brooks Tim-Brooks commented Sep 17, 2024

This commit adds a mechanism for splitting bulk requests when a
coordinating node is under memory pressure. To do that, it adds a new
mechanism for http stream handling, along with the framework to use it
with bulk requests.
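As a rough illustration of the idea (a minimal sketch with hypothetical names, not the actual Elasticsearch API): a consumer receives the bulk body chunk by chunk and flushes the accumulated operations for processing once a byte threshold is crossed, instead of buffering the entire request on the coordinating node.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split an incrementally received bulk request into
// partial submissions once a configurable byte threshold is crossed.
class ChunkedBulkSketch {
    static final int FLUSH_THRESHOLD_BYTES = 1024;

    private final List<byte[]> pending = new ArrayList<>();
    private int pendingBytes = 0;
    int flushes = 0; // exposed for illustration only

    // Called once per received HTTP content chunk.
    void handleChunk(byte[] chunk, boolean isLast) {
        pending.add(chunk);
        pendingBytes += chunk.length;
        if (pendingBytes >= FLUSH_THRESHOLD_BYTES || isLast) {
            flush();
        }
    }

    private void flush() {
        // The real code would submit the pending operations for processing;
        // the sketch only counts how many partial submissions the split produced.
        flushes++;
        pending.clear();
        pendingBytes = 0;
    }
}
```

Five 512-byte chunks against a 1024-byte threshold would produce three partial submissions rather than one large one.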

@Tim-Brooks Tim-Brooks added >non-issue :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. v8.16.0 labels Sep 17, 2024
@Tim-Brooks Tim-Brooks requested a review from a team as a code owner September 17, 2024 18:06
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed (Team:Distributed)

@elasticsearchmachine elasticsearchmachine added Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. v9.0.0 labels Sep 17, 2024
@Tim-Brooks Tim-Brooks force-pushed the partial-rest-requests-rebase branch 2 times, most recently from ff66351 to a85a02a Compare September 18, 2024 00:18
Contributor

@nielsbauman nielsbauman left a comment


The changes in BulkOperation LGTM. I do think it would be worth adding one test that ensures that short-circuited shard requests also end up in the failure store - maybe in IncrementalBulkRestIT.java or IncrementalBulkIT.java perhaps.

Contributor

@henningandersen henningandersen left a comment


LGTM.


if (request.isStreamedContent()) {
assert action instanceof RequestBodyChunkConsumer;
var chunkConsumer = (RequestBodyChunkConsumer) action;
Contributor


I think this is where we'd fail if there is a discrepancy between the handlers and the decider filter in Netty4HttpAggregator. This is fine for now (for the merge), and possibly also for our initial roll-out plans.

I do wonder, though, if we could have a simple aggregating handler here that protects us against a discrepancy, in particular before we enable this for regular releases (or perhaps even before that)? Would it be feasible to always do the aggregation here (when the mode is enabled), assuming larger requests are handled by proper handlers?

Contributor

@mhl-b mhl-b Sep 18, 2024


It's a temporary measure to make a minimal change in the http/rest layer to allow partial content. The current intention is to remove HttpAggregator and do the aggregation in (Base)RestHandler. I drafted it before in #112120 and plan to revisit it once we merge to main. There are several options for how to do that; I will prepare a design doc.
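The fallback discussed above could look roughly like this (a hedged sketch with hypothetical names, not the planned implementation): when the dispatch target is not a streaming handler, aggregate the chunks locally and hand over the full body exactly once, so a decider/handler discrepancy cannot hit the assertion.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.Consumer;

// Hypothetical sketch: aggregate streamed chunks for a handler that can
// only consume a complete request body.
class AggregatingFallback {
    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private final Consumer<byte[]> wholeBodyHandler;

    AggregatingFallback(Consumer<byte[]> wholeBodyHandler) {
        this.wholeBodyHandler = wholeBodyHandler;
    }

    void onChunk(byte[] chunk, boolean isLast) {
        buffer.writeBytes(chunk); // in-memory aggregation of the partial content
        if (isLast) {
            wholeBodyHandler.accept(buffer.toByteArray()); // dispatch exactly once
        }
    }
}
```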

public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
assert msg instanceof HttpObject;
if (msg instanceof HttpRequest request) {
var preReq = HttpHeadersAuthenticatorUtils.asHttpPreRequest(request);
Contributor


I find this somewhat unnecessary; could the decider not work on the uri directly instead, or even on the HttpRequest (the decider is quite intimate anyway)? Happy to leave as is for now, of course.
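The suggestion above might amount to something like the following (an illustrative sketch only; the path matching and names are hypothetical, not the actual decider): a predicate on the raw URI rather than on a wrapped pre-request.

```java
import java.util.function.Predicate;

// Hypothetical sketch: decide whether a request should be streamed by
// inspecting the request URI directly.
class StreamDecider {
    static final Predicate<String> IS_STREAMED = uri -> {
        String path = uri.split("\\?", 2)[0]; // drop any query string
        return path.equals("/_bulk") || path.endsWith("/_bulk");
    };
}
```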

Contributor Author


Yes. IMO we definitely have room for improvement here. There is a ticket tracking improved handling in this area.

@henningandersen
Contributor

I do think it would be worth adding one test that ensures that short-circuited shard requests also end up in the failure store - maybe in IncrementalBulkRestIT.java or IncrementalBulkIT.java perhaps.

Thanks, that makes sense. We should tackle that as a follow-up though, given that the incremental bulk handling is off by default and I'd prefer to get this in and start some early testing on it.

Member

@jbaiera jbaiera left a comment


BulkOperation changes look like they make sense to me! I second the ask for a test, but it's fine if that comes as a follow-up.

mhl-b and others added 13 commits September 18, 2024 13:38

Allow a single bulk request to be passed to Elasticsearch in multiple
parts. Once a certain memory threshold has been passed or a number of
operations has been received, the request can be split and submitted
for processing.

This commit splits bulks once memory usage for indexing pressure has
passed a configurable threshold.

Integrate the incremental bulks into RestBulkAction

Currently the rest.incremental_bulk setting is read in two different
places. This means it could take effect in two separate steps,
introducing unpredictable behavior. This commit ensures that it is only
read in a single place.

The header validator is very aggressive about adjusting autoread, on
the belief that it is the only place where autoread is tweaked. However,
with stream backpressure, it should only change autoread when starting
or finishing header validation.
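The rule in that commit message can be sketched roughly as follows (a hedged illustration with hypothetical names, not the Netty pipeline code): the validator remembers whether it was the one that paused reads, and only resumes what it suspended itself, so it cannot override a suspension applied by stream backpressure.

```java
// Hypothetical sketch: the header validator only toggles autoread when
// starting or finishing validation, and only resumes reads it paused itself.
class AutoReadGuard {
    private boolean autoRead = true;
    private boolean suspendedByValidator = false;

    void suspendForBackpressure() {
        autoRead = false; // some other component paused reads
    }

    void startHeaderValidation() {
        suspendedByValidator = autoRead; // remember whether we did the pausing
        autoRead = false;
    }

    void finishHeaderValidation() {
        if (suspendedByValidator) { // resume only if we were the one to pause
            autoRead = true;
            suspendedByValidator = false;
        }
    }

    boolean isAutoRead() {
        return autoRead;
    }
}
```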

Currently, unless a rest handler specifies that it handles "unsafe"
buffers, we must copy the http buffers in releaseAndCopy. Unfortunately,
the original content was slipping through in the initial stream PR. This
led to memory corruption on index and update requests, which depend on
buffers being copied.
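The releaseAndCopy behavior described in that commit message amounts to something like this (a simplified sketch with hypothetical names, not the actual Elasticsearch method): unless the handler declared it can deal with "unsafe" buffers, copy the bytes before the network buffer is released, so later reuse of the original buffer cannot corrupt the request content.

```java
import java.util.Arrays;

// Hypothetical sketch: defensively copy a network buffer unless the
// handler opted into "unsafe" (zero-copy) buffers.
class ReleaseAndCopySketch {
    static byte[] releaseAndCopy(byte[] networkBuffer, boolean handlerAllowsUnsafe) {
        if (handlerAllowsUnsafe) {
            return networkBuffer; // handler promises not to retain it
        }
        return Arrays.copyOf(networkBuffer, networkBuffer.length);
    }
}
```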

Currently, the entire close pipeline is not hooked up in case of a
channel close while a request is being buffered or executed. This commit
resolves the issue by adding a connection to a stream closure.

This commit ensures we properly throw exceptions when an empty bulk
request is received with the incremental handling enabled.
@Tim-Brooks Tim-Brooks force-pushed the partial-rest-requests-rebase branch from a326505 to 50eb7f9 Compare September 18, 2024 19:55
Spotless broke during a rebase. Fixing in this commit.
@Tim-Brooks Tim-Brooks force-pushed the partial-rest-requests-rebase branch from 50eb7f9 to 529d349 Compare September 18, 2024 19:59
Contributor

@mhl-b mhl-b left a comment


LGTM

@Tim-Brooks Tim-Brooks merged commit 3e6acdd into main Sep 18, 2024
16 checks passed
@Tim-Brooks Tim-Brooks deleted the partial-rest-requests-rebase branch September 18, 2024 22:28
Tim-Brooks added a commit that referenced this pull request Sep 20, 2024