Conversation

@mhl-b
Contributor

@mhl-b mhl-b commented Jan 15, 2025

Besides aggregating message parts, Netty's HttpObjectAggregator also handles Expect: 100-Continue and oversized chunked requests. HTTP stream handling does not use the aggregator, and we plan to remove it completely, so we need to handle these cases ourselves.

This PR introduces Netty4HttpContentSizeHandler, which handles expect-continue and oversized requests in the same way as HttpObjectAggregator. Some parts are copied from Netty's code and simplified for our usage. Follow-up on #117787, split into smaller pieces.

Once we completely switch to HTTP streams this handler will replace HttpObjectAggregator; for now there is conditional logic choosing between streaming and aggregation.

There is an interesting interaction between HttpRequestDecoder and HttpObjectAggregator. When the aggregator responds with 413 Request Entity Too Large to an expect-100-continue request, it resets the HttpRequestDecoder through a user event in the pipeline: the aggregator sends the event and the decoder resets its state. This reset is required to avoid treating the subsequent request as content of the rejected one. But that mechanism is private code; the public interface is HttpRequestDecoder#reset(), so Netty4HttpContentSizeHandler requires explicit access to the decoder in order to reset it.
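
For context, a rough sketch of the pattern this handler follows for expect-continue (hypothetical and simplified, not the actual Netty4HttpContentSizeHandler implementation):

    import io.netty.buffer.Unpooled;
    import io.netty.channel.ChannelFutureListener;
    import io.netty.channel.ChannelHandlerContext;
    import io.netty.channel.ChannelInboundHandlerAdapter;
    import io.netty.handler.codec.http.DefaultFullHttpResponse;
    import io.netty.handler.codec.http.HttpRequest;
    import io.netty.handler.codec.http.HttpRequestDecoder;
    import io.netty.handler.codec.http.HttpResponseStatus;
    import io.netty.handler.codec.http.HttpUtil;
    import io.netty.handler.codec.http.HttpVersion;

    // Hypothetical, simplified sketch of expect-continue and content-size handling.
    class ContentSizeHandlerSketch extends ChannelInboundHandlerAdapter {

        private final HttpRequestDecoder decoder; // explicit access so the handler can reset it
        private final int maxContentLength;

        ContentSizeHandlerSketch(HttpRequestDecoder decoder, int maxContentLength) {
            this.decoder = decoder;
            this.maxContentLength = maxContentLength;
        }

        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) throws Exception {
            if (msg instanceof HttpRequest request && HttpUtil.is100ContinueExpected(request)) {
                if (HttpUtil.getContentLength(request, -1L) > maxContentLength) {
                    // Reject: reset the decoder so the next request is not parsed as content
                    // of the rejected one, then answer 413.
                    decoder.reset();
                    ctx.writeAndFlush(new DefaultFullHttpResponse(
                        HttpVersion.HTTP_1_1, HttpResponseStatus.REQUEST_ENTITY_TOO_LARGE, Unpooled.EMPTY_BUFFER
                    )).addListener(ChannelFutureListener.CLOSE_ON_FAILURE);
                    return;
                }
                // Accept: tell the client to go ahead and send the body.
                ctx.writeAndFlush(new DefaultFullHttpResponse(
                    HttpVersion.HTTP_1_1, HttpResponseStatus.CONTINUE, Unpooled.EMPTY_BUFFER
                ));
            }
            ctx.fireChannelRead(msg);
        }
    }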

Comment on lines -336 to +341
// ensures that oversized chunked encoded request has no limits at http layer
// rest handler is responsible for oversized requests
public void testOversizedChunkedEncodingNoLimits() throws Exception {
// ensures that oversized chunked encoded request has maxContentLength limit and returns 413
public void testOversizedChunkedEncoding() throws Exception {
Contributor Author

With the stream introduction we dropped limits for chunked content. This brings them back.

@mhl-b mhl-b marked this pull request as ready for review January 16, 2025 06:11
@mhl-b mhl-b requested a review from ywangd January 16, 2025 06:11
@elasticsearchmachine elasticsearchmachine added the needs:triage (Requires assignment of a team area label) label Jan 16, 2025
@mhl-b mhl-b added the >non-issue, :Distributed Coordination/Network (Http and internode communication implementations), and Team:Distributed Coordination (Meta label for Distributed Coordination team) labels and removed the needs:triage (Requires assignment of a team area label) label Jan 16, 2025
@elasticsearchmachine
Collaborator

Pinging @elastic/es-distributed-coordination (Team:Distributed Coordination)

Contributor

@DaveCTurner DaveCTurner left a comment

Looks good, just a couple of small points around testing.

Comment on lines 124 to 127
} else {
decoder.reset();
ctx.writeAndFlush(EXPECTATION_FAILED.retainedDuplicate()).addListener(ChannelFutureListener.CLOSE_ON_FAILURE);
return;
Contributor

I don't think we cover this branch in the unit tests - could you add a test? Also just resetting the decoder seems quite a weak response, I would expect that in this situation we just don't know how to proceed with subsequent data so should give up and close the connection.
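
For illustration, that could look roughly like this (sketch only, reusing the names from the snippet above):

    // Sketch: respond 417 and then close the connection, rather than relying on the reset alone.
    decoder.reset();
    ctx.writeAndFlush(EXPECTATION_FAILED.retainedDuplicate()).addListener(ChannelFutureListener.CLOSE);
    return;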

Contributor Author

Will add a test. Subsequent data will result in a decoding failure, trip the HttpPipeliningHandler, and very likely close the connection. If there is no further data the connection can be reused. I kind of agree that closing makes sense, since the 'Expect' header supports only "100-Continue", while other values seem suspicious.

@ESIntegTestCase.ClusterScope(numDataNodes = 1)
public class Netty4IncrementalRequestHandlingIT extends ESNetty4IntegTestCase {

private static final int maxContentLength = 50 * 1024 * 1024;
Contributor

nit: suggest

Suggested change
private static final int maxContentLength = 50 * 1024 * 1024;
private static final int TEST_MAX_CONTENT_LENGTH_BYTES = ByteSizeUnit.MB.toIntBytes(50);

SHOUTY_SNAKE_CASE for a constant, plus avoiding * 1024 * 1024.

channel.writeInbound(encode(part1));
((HttpContent) channel.readInbound()).release();

var part2 = httpContent(MAX_CONTENT_LENGTH);
Contributor

@DaveCTurner DaveCTurner Jan 16, 2025

nit: this should fail at 1 byte, no need to send double the limit here. But also, could we send a wider variety of chunk sizes rather than just one max-sized chunk? Something like this?

        int contentBytesSent = 0;
        do {
            var thisPartSize = between(1, MAX_CONTENT_LENGTH * 2);
            channel.writeInbound(encode(httpContent(thisPartSize)));
            contentBytesSent += thisPartSize;

            if (contentBytesSent <= MAX_CONTENT_LENGTH) {
                ((HttpContent) channel.readInbound()).release();
            } else {
                break;
            }
        } while (true);

Contributor Author

The test covers both invariants of content tracking, before reaching the limit and after, and also at the precise MAX_CONTENT_LENGTH mark to cover the off-by-one case. The chunking itself has no meaning here. I'm fine to change it though.

Contributor

@DaveCTurner DaveCTurner left a comment

LGTM

*/
public void testExpectationFailed() {
var sendRequest = httpRequest();
sendRequest.headers().set(HttpHeaderNames.EXPECT, "Potato");
Contributor

:) maybe randomValueOtherThan("100-Continue", ESTestCase::randomIdentifier); would be a more descriptive way to say what we're trying to do here
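
Spelled out against the line above, that would read roughly as (sketch):

    // Sketch: pick an arbitrary non-continue expectation value instead of the literal "Potato".
    sendRequest.headers().set(HttpHeaderNames.EXPECT, randomValueOtherThan("100-Continue", ESTestCase::randomIdentifier));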

Contributor Author

fancy

@mhl-b mhl-b added the v8.18.0 label Jan 16, 2025
@mhl-b
Contributor Author

mhl-b commented Jan 16, 2025

@DaveCTurner, CI caught an interesting exception in testEntityTooLargeWithContent.
In this test the client sends an oversized request with "expect" plus content, without waiting for the continue response.
So I reset the decoder and expect the following content to fail decoding. But in some cases the content can look like a new request and confuse the decoder.

I'm wondering: should we always close the connection when the request is oversized, including expect-continue?
I would say yes; the client should not poke ES with different content sizes, it's not a proxy. Also I wouldn't rely on the client behaving and waiting for a 100-Continue reply.

@DaveCTurner
Contributor

In this test the client sends an oversized request with "expect" plus content, without waiting for the continue response.

That sounds like invalid client behaviour to me. The whole point of Expect: 100-Continue is that the client wants to be told to continue before it starts to send the body, or else to move on to the next request without sending any body on an error. If it's just going to send the body regardless of the response then why send Expect: 100-Continue?

I think the test should instead simulate correct client behaviour: send the headers but no body, check it gets a 413 back, and then ideally try and send another (valid) request to check the decoder gets left in the right state to handle it.
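
Roughly this shape, as a sketch only (helper names like httpRequest(), encode() and channel are the ones visible in this diff; the 413 assertion and the rest are assumed):

    public void testEntityTooLargeWithExpectContinueSketch() {
        // well-behaved client: oversized Content-Length plus Expect: 100-continue, no body yet
        var oversized = httpRequest();
        oversized.headers().set(HttpHeaderNames.EXPECT, HttpHeaderValues.CONTINUE);
        HttpUtil.setContentLength(oversized, MAX_CONTENT_LENGTH + 1);
        channel.writeInbound(encode(oversized));

        // server rejects up front with 413 and keeps the connection usable
        var response = (HttpResponse) channel.readOutbound();
        assertEquals(HttpResponseStatus.REQUEST_ENTITY_TOO_LARGE, response.status());

        // a subsequent valid request must still be decoded normally
        var followUp = httpRequest();
        HttpUtil.setContentLength(followUp, 0);
        channel.writeInbound(encode(followUp));
        assertNotNull(channel.readInbound());
    }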

@mhl-b
Contributor Author

mhl-b commented Jan 16, 2025

I think it's correct behaviour based on the RFC. The Expect header is not very strict.

https://www.rfc-editor.org/rfc/rfc9110.html#section-10.1.1-11.3

A client that sends a 100-continue expectation is not required to wait for any specific length of time; such a client MAY proceed to send the content even if it has not yet received a response. Furthermore, since 100 (Continue) responses cannot be sent through an HTTP/1.0 intermediary, such a client SHOULD NOT wait for an indefinite period before sending the content.

A server MAY omit sending a 100 (Continue) response if it has already received some or all of the content for the corresponding request, or if the framing indicates that there is no content.

@mhl-b
Contributor Author

mhl-b commented Jan 16, 2025

I think the test should instead simulate correct client behaviour: send the headers but no body, check it gets a 413 back, and then ideally try and send another (valid) request to check the decoder gets left in the right state to handle it.

I will add this one.

Contributor

@nicktindall nicktindall left a comment

LGTM

@mhl-b
Contributor Author

mhl-b commented Jan 17, 2025

Added a mixed content test: 6080e50. It assumes a well-behaving client that patiently awaits the Expect response and does not send content if it receives a rejection.

@DaveCTurner
Contributor

I think it's correct behaviour based on the RFC.

Interesting, I missed that. In which case I think we should reinstate the test behaviour that sends the content straight away too (maybe in a follow-up). The spec doesn't say how to proceed after sending a response other than 100 Continue, though: is the client supposed to skip the body or not? If so, how does the server detect the end of the body and thus the start of the next request? It seems the only safe thing to do in this case is indeed to close the connection.

@mhl-b
Contributor Author

mhl-b commented Jan 17, 2025

If so how does the server detect the end of the body and thus the start of the next request?

I don't like the spec much. The implementation seems non-trivial and the decoder has to be very smart about it. I think keeping the current Netty behaviour (reset the decoder and don't close the connection) should be OK, and we shouldn't test this use case.

@mhl-b mhl-b added the auto-backport (Automatically create backport pull requests when merged) label Jan 17, 2025
@mhl-b mhl-b merged commit 43e3e24 into elastic:main Jan 17, 2025
16 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

Branch: 8.x
Result: Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 120246

@DaveCTurner
Contributor

I don't like the spec much. The implementation seems non-trivial and the decoder has to be very smart about it. I think keeping the current Netty behaviour (reset the decoder and don't close the connection) should be OK, and we shouldn't test this use case.

I'm ok with that too.

ldematte added a commit to ldematte/elasticsearch that referenced this pull request Jan 27, 2025
Tim-Brooks pushed a commit that referenced this pull request Jan 27, 2025
Tim-Brooks pushed a commit to Tim-Brooks/elasticsearch that referenced this pull request Jan 27, 2025