
[Bug] Stream Load via FE may fail with broken pipe on master while Doris 3.0 works #63325

@wenzhenghu

Description

Version

  • Affected versions: Doris 3.1.3 and later
  • Verified unaffected baseline in this reproduction: Doris 3.0.8
  • Reproduced on current master: doris-0.0.0-b06684a15d5

What's Wrong?

When a Python low-level HTTP client performs Stream Load through FE, the client may fail with connection errors such as:

  • BrokenPipeError(32, 'Broken pipe')
  • ConnectionResetError(54, 'Connection reset by peer')

The problem occurs on the FE redirect path of Stream Load. The client sends the request to FE and FE answers with 307 Temporary Redirect, but at that moment the client may still be sending the request body. On affected Doris versions, FE may close or reset the connection before the client finishes writing, so the client's next write fails.

This is a compatibility regression relative to Doris 3.0: the same client behavior against a Doris 3.0 FE still receives a normal 307 Temporary Redirect without triggering the connection error.

This problem mainly affects clients that:

  • send Stream Load to FE instead of BE
  • use HTTP/1.1
  • start sending the request body before fully processing the FE redirect
  • use chunked transfer encoding or generator-based streaming
  • run in higher-RTT environments

Typical affected clients include:

  • Python http.client
  • Python requests in some streaming modes
  • Logstash HTTP-based output plugins

What You Expected?

  • Stream Load requests sent to FE should remain compatible with common HTTP/1.1 streaming clients
  • FE should return a normal 307 Temporary Redirect without causing a client-side broken pipe or connection reset by peer
  • Behavior should remain compatible with Doris 3.0 for the same reproduction pattern

How to Reproduce?

Target Instances

Two Doris instances on the same host were compared. The host IP is anonymized below as <REDACTED_HOST>.

Doris 3.0

  • MySQL: <REDACTED_HOST>:9030
  • FE HTTP: <REDACTED_HOST>:8030
  • BE HTTP: <REDACTED_HOST>:8040
  • Version: doris-3.0.8-rc01-53c80683e85

Doris master

  • MySQL: <REDACTED_HOST>:9034
  • FE HTTP: <REDACTED_HOST>:8034
  • BE HTTP: <REDACTED_HOST>:8044
  • Version: doris-0.0.0-b06684a15d5

Reproduction Script

A sanitized reproduction script is provided here:

The script uses Python http.client with chunked upload to simulate a client that keeps sending the request body while FE responds with a redirect.
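The sanitized script itself is not reproduced in this text. As a rough illustration of the client pattern it simulates, here is a minimal sketch, assuming a PUT request with HTTP Basic auth and a filler CSV payload. iter_chunks and send_chunked_stream_load are hypothetical names, not functions from the actual script.

```python
import base64
import http.client
import time
from typing import Iterator


def iter_chunks(payload_bytes: int, chunk_bytes: int, sleep_s: float = 0.0) -> Iterator[bytes]:
    """Yield payload_bytes of filler data in pieces of at most chunk_bytes.

    Optionally sleeps after each piece to pace the upload (similar in spirit
    to --sleep-ms). The row format is a hypothetical two-column CSV.
    """
    row = b"1,a\n"
    block = (row * (chunk_bytes // len(row) + 1))[:chunk_bytes]
    remaining = payload_bytes
    while remaining > 0:
        n = min(chunk_bytes, remaining)
        yield block[:n]
        remaining -= n
        if sleep_s:
            time.sleep(sleep_s)


def send_chunked_stream_load(host: str, port: int, db: str, table: str,
                             payload_bytes: int, chunk_bytes: int,
                             sleep_s: float = 0.0,
                             user: str = "root", password: str = "") -> dict:
    """PUT a generator body to the FE Stream Load endpoint (sketch)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {"Authorization": f"Basic {token}", "column_separator": ","}
    conn = http.client.HTTPConnection(host, port, timeout=60)
    try:
        # No Content-Length is given and the body is an iterator, so
        # http.client adds "Transfer-Encoding: chunked" and frames each piece.
        # The whole body is written before the response is read.
        conn.request("PUT", f"/api/{db}/{table}/_stream_load",
                     body=iter_chunks(payload_bytes, chunk_bytes, sleep_s),
                     headers=headers)
        resp = conn.getresponse()
        return {"status_code": resp.status,
                "headers": dict(resp.getheaders()),
                "body": resp.read().decode(errors="replace")}
    except (BrokenPipeError, ConnectionResetError) as exc:
        # The failure mode in this issue: the peer closed or reset mid-upload.
        return {"exception_type": type(exc).__name__, "exception": repr(exc)}
    finally:
        conn.close()
```

Because http.client writes the entire body before it looks at the response, a 307 that arrives early is only seen after the upload finishes, or not at all if the connection is reset first.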

Reproduction Commands

Doris 3.0 FE: receives a normal 307 redirect

python stream_load_redirect_repro.py \
  --client httpclient \
  --host <REDACTED_HOST> \
  --mysql-port 9030 \
  --fe-http-port 8030 \
  --be-http-port 8040 \
  --db wzh \
  --table stream_load_redirect_repro \
  --target fe \
  --payload-mb 1 \
  --chunk-kb 1 \
  --sleep-ms 0 \
  --truncate-before \
  --show-row-count

Expected result: FE returns 307 Temporary Redirect.

Doris master FE: reproduces connection reset / broken pipe

python stream_load_redirect_repro.py \
  --client httpclient \
  --host <REDACTED_HOST> \
  --mysql-port 9034 \
  --fe-http-port 8034 \
  --be-http-port 8044 \
  --db wzhtest \
  --table stream_load_redirect_repro \
  --target fe \
  --payload-mb 1 \
  --chunk-kb 1 \
  --sleep-ms 0 \
  --truncate-before \
  --show-row-count

Typical result on master: ConnectionResetError.

The following variant reproduces BrokenPipeError more aggressively by widening the write window (larger payload, larger chunks, and --sleep-ms 10 pacing between chunks):

python stream_load_redirect_repro.py \
  --client httpclient \
  --host <REDACTED_HOST> \
  --mysql-port 9034 \
  --fe-http-port 8034 \
  --be-http-port 8044 \
  --db wzhtest \
  --table stream_load_redirect_repro \
  --target fe \
  --payload-mb 8 \
  --chunk-kb 16 \
  --sleep-ms 10 \
  --truncate-before \
  --show-row-count
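The two master commands differ mainly in how long the client keeps writing. A rough estimate is sketched below, assuming --payload-mb and --chunk-kb use binary units (1 MB = 1024 KB) and that --sleep-ms is applied once per chunk; both are assumptions about the script, which is not shown here, and pacing_estimate is a hypothetical helper.

```python
import math
from typing import Tuple


def pacing_estimate(payload_mb: float, chunk_kb: float, sleep_ms: float) -> Tuple[int, float]:
    """Return (chunk count, seconds spent sleeping between chunks).

    Assumes 1 MB = 1024 KB and exactly one sleep per chunk. The sleep total
    is a lower bound on write time and ignores network transfer time.
    """
    chunks = math.ceil(payload_mb * 1024 / chunk_kb)
    return chunks, chunks * sleep_ms / 1000.0


for args in [(1, 1, 0), (8, 16, 10)]:
    n, secs = pacing_estimate(*args)
    print(f"payload={args[0]} MB chunk={args[1]} KB sleep={args[2]} ms "
          f"-> {n} chunks, >= {secs:.2f} s of pacing")
```

Under these assumptions the first command sends 1024 chunks with no deliberate pauses, so its write time depends only on socket and network speed, while the aggressive variant sends 512 chunks and spends at least about 5.12 s sleeping, which keeps the client mid-upload for longer.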

Reproduction Result on Doris 3.0

Request to FE:

http://<REDACTED_HOST>:8030/api/wzh/stream_load_redirect_repro/_stream_load

Result:

{
  "target": "fe",
  "url": "http://<REDACTED_HOST>:8030/api/wzh/stream_load_redirect_repro/_stream_load",
  "client": "httpclient",
  "status_code": 307,
  "elapsed_seconds": 18.205,
  "headers": {
    "Location": "http://root:@<REDACTED_HOST>:8040/api/wzh/stream_load_redirect_repro/_stream_load?",
    "Connection": "close"
  },
  "body": ""
}
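In the Doris 3.0 run, the 307 carries the BE address in Location, with credentials embedded as URL userinfo (root with an empty password). http.client neither follows redirects nor reads credentials from a URL, so a client that follows this redirect has to open a new connection to the BE itself and resend the entire body, since FE does not consume what was sent to it. A minimal sketch of splitting such a Location header, where redirect_target and RedirectTarget are hypothetical names:

```python
import base64
from typing import NamedTuple, Optional
from urllib.parse import urlsplit


class RedirectTarget(NamedTuple):
    host: str
    port: int
    path: str               # path plus query string, ready for http.client
    username: Optional[str]
    password: Optional[str]

    def auth_header(self) -> Optional[str]:
        """Basic auth value built from the Location userinfo, if present."""
        if self.username is None:
            return None
        raw = f"{self.username}:{self.password or ''}".encode()
        return "Basic " + base64.b64encode(raw).decode()


def redirect_target(location: str) -> RedirectTarget:
    """Split a 307 Location header into pieces usable with http.client."""
    parts = urlsplit(location)
    path = parts.path or "/"
    if parts.query:  # an empty trailing "?" is dropped
        path += "?" + parts.query
    return RedirectTarget(
        host=parts.hostname or "",
        port=parts.port or (443 if parts.scheme == "https" else 80),
        path=path,
        username=parts.username,
        password=parts.password,
    )
```

The returned host, port and path go into a new HTTPConnection and request, with auth_header() supplied as the Authorization header.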

Reproduction Result on Doris master

Request to FE:

http://<REDACTED_HOST>:8034/api/wzhtest/stream_load_redirect_repro/_stream_load

Result:

{
  "target": "fe",
  "url": "http://<REDACTED_HOST>:8034/api/wzhtest/stream_load_redirect_repro/_stream_load",
  "client": "httpclient",
  "elapsed_seconds": 0.605,
  "exception_type": "ConnectionResetError",
  "exception": "ConnectionResetError(54, 'Connection reset by peer')"
}

In another run with a larger payload and paced chunk sending, Doris master also reproduced:

{
  "exception_type": "BrokenPipeError",
  "exception": "BrokenPipeError(32, 'Broken pipe')"
}

Anything Else?

Comparison Between Doris 3.0 and Later Versions

Using the same host, same network, same Python client style, and same reproduction approach:

  • Doris 3.0 FE returns normal 307 Temporary Redirect
  • Doris 3.1.3 and later are affected by this compatibility problem
  • Current master FE closes or resets the connection early

Root Cause Analysis

The current FE Stream Load redirect path works as follows:

  1. validate request
  2. select target BE
  3. immediately return 307 Temporary Redirect
  4. do not consume the request body
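The four steps above can be modeled with a toy Python server; this is a sketch of the described flow, not Doris FE code, and RedirectOnlyHandler, start_toy_fe and the backend list are hypothetical. The point of interest is step 4: the handler returns while request body bytes may still be arriving. On common TCP stacks, Linux among them, closing a socket whose receive buffer still holds unread data sends a RST instead of a normal FIN, which a client that is still writing reports as ECONNRESET or EPIPE.

```python
import http.server
import re
import threading

# Matches /api/{db}/{table}/_stream_load (query string stripped first).
STREAM_LOAD_PATH = re.compile(r"^/api/[^/]+/[^/]+/_stream_load$")


class RedirectOnlyHandler(http.server.BaseHTTPRequestHandler):
    """Toy model of the FE redirect path: 307 without reading the body."""

    protocol_version = "HTTP/1.1"
    backends = [("127.0.0.1", 8040)]  # hypothetical BE list

    def do_PUT(self) -> None:
        # 1. validate request
        if not STREAM_LOAD_PATH.match(self.path.split("?", 1)[0]):
            self.send_error(404)
            return
        # 2. select target BE (toy policy: always the first one)
        be_host, be_port = self.backends[0]
        # 3. immediately return 307 Temporary Redirect
        self.send_response(307)
        self.send_header("Location", f"http://{be_host}:{be_port}{self.path}")
        self.send_header("Content-Length", "0")
        self.send_header("Connection", "close")
        self.end_headers()
        # 4. do not consume the request body: unread bytes are left in the
        #    socket when the server closes it after this method returns.
        self.close_connection = True

    def log_message(self, format: str, *args) -> None:
        pass  # keep the sketch quiet


def start_toy_fe(port: int = 0) -> http.server.HTTPServer:
    """Serve RedirectOnlyHandler on 127.0.0.1 in a daemon thread."""
    server = http.server.HTTPServer(("127.0.0.1", port), RedirectOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

With an empty body the client reliably gets the 307; with a large streamed body, whether it sees the 307 or a reset depends on timing and the operating system.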

In addition, FE now runs on a newer web stack, including:

  • Spring Boot 3
  • Spring Framework 6
  • Jetty 12

Jetty 12 is more sensitive to the HTTP/1.1 case in which the application returns a response before consuming the request body. If the client is still sending body data when FE has already redirected or closed the connection, the client may observe:

  • broken pipe
  • connection reset by peer

The following Jetty issues are closely related to this compatibility problem:

Proposed Fix

Two service-side improvements are proposed:

  1. Expose Jetty's maxUnconsumedRequestContentReads as an FE configuration, for example jetty_server_max_unconsumed_request_content_reads, and apply it to the FE HttpConfiguration.
  2. Add bounded drain compatibility logic on the Stream Load redirect path. After writing the 307 response, FE should read and discard a bounded amount of the remaining request body, capped by an FE config such as stream_load_redirect_bounded_drain_max_bytes.
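To illustrate the second proposal, here is a minimal Python sketch of a bounded drain, assuming the request body is exposed as a blocking binary stream. bounded_drain and its buf_size parameter are hypothetical; max_bytes plays the role of the proposed stream_load_redirect_bounded_drain_max_bytes, and this is not the FE's Java implementation.

```python
import io
from typing import BinaryIO


def bounded_drain(body: BinaryIO, max_bytes: int, buf_size: int = 64 * 1024) -> int:
    """Read and discard at most max_bytes from body; return bytes discarded.

    Stops early at EOF. Reading (rather than ignoring) the remaining body
    lets the server close the socket without unread data when the body
    fits under the cap.
    """
    drained = 0
    while drained < max_bytes:
        chunk = body.read(min(buf_size, max_bytes - drained))
        if not chunk:  # EOF: the client has finished sending
            break
        drained += len(chunk)
    return drained


body = io.BytesIO(b"x" * 100)
print(bounded_drain(body, 30, buf_size=7), body.tell())  # 30 30
```

The byte cap keeps a client with a very large body from tying up an FE thread indefinitely. A byte cap alone does not bound wall-clock time for a slow sender, so a real implementation would probably pair it with a read timeout.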

Environment

  • macOS client
  • Python 3.14
  • PyMySQL + Python http.client
  • same host and same network path for both Doris instances
