Title: Envoy closing upstream connection as idle even before connection established
Description:
Recently we have been seeing the error upstream_reset_before_response_started{local_connection_failure} in our Envoy proxy (version 1.31.3).

To support a few legacy backends with a low keep-alive of 5s, we have set the HTTP connection idle_timeout to 3 seconds. We use the default connect_timeout of 5s.

We use DFP (dynamic forward proxy) sub-clusters for most non-mesh endpoints, which is where we see these errors, but I believe this is a generic issue affecting any upstream type.

I checked the code in source/common/http/codec_client.cc and see that the idle_timeout timer is started before the connection is established, resulting in local connection failures whenever Envoy is unable to connect within 3s. (The default connect_timeout of 5s is effectively never reached if idle_timeout fires first.)

I think the idle_timeout timer should start only after the connection is established; while the connection is being established, only connect_timeout should apply.
I think the relevant code is in envoy/source/common/http/codec_client.cc, lines 23 to 45 at commit 1913192:
```cpp
CodecClient::CodecClient(CodecType type, Network::ClientConnectionPtr&& connection,
                         Upstream::HostDescriptionConstSharedPtr host,
                         Event::Dispatcher& dispatcher)
    : type_(type), host_(host), connection_(std::move(connection)),
      idle_timeout_(host_->cluster().idleTimeout()) {
  if (type_ != CodecType::HTTP3) {
    // Make sure upstream connections process data and then the FIN, rather than processing
    // TCP disconnects immediately. (see https://github.com/envoyproxy/envoy/issues/1679 for
    // details)
    connection_->detectEarlyCloseWhenReadDisabled(false);
  }
  connection_->addConnectionCallbacks(*this);
  connection_->addReadFilter(Network::ReadFilterSharedPtr{new CodecReadFilter(*this)});
  if (idle_timeout_) {
    idle_timer_ = dispatcher.createTimer([this]() -> void { onIdleTimeout(); });
    enableIdleTimer();
  }
  // We just universally set no delay on connections. Theoretically we might at some point want
  // to make this configurable.
  connection_->noDelay(true);
}
```
Repro steps:
I don't have deterministic steps, but I believe this can be reproduced by configuring a low idle_timeout (1s) and a high connect_timeout (15s) and connecting to an unresponsive endpoint.
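A minimal, hypothetical cluster config sketch for such a repro (cluster name, port, and the non-routable address 10.255.255.1 are placeholders; the address just needs to blackhole the TCP SYN so the handshake never completes):

```yaml
clusters:
- name: slow_upstream
  connect_timeout: 15s          # high connect timeout
  type: STATIC
  load_assignment:
    cluster_name: slow_upstream
    endpoints:
    - lb_endpoints:
      - endpoint:
          address:
            socket_address:
              address: 10.255.255.1   # non-routable: handshake never completes
              port_value: 443
  typed_extension_protocol_options:
    envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
      "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
      common_http_protocol_options:
        idle_timeout: 1s        # low idle timeout, fires while still connecting
      explicit_http_config:
        http_protocol_options: {}
```

If the analysis above is right, requests to this cluster should fail after roughly 1s with local_connection_failure instead of after 15s with a connect timeout.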
Admin and Stats Output:
Config:
Logs:
The upstream connection in question is "ConnectionId":"31708".
2025-12-21T08:54:38.914487Z debug envoy pool external/envoy/source/common/http/conn_pool_base.cc:78 queueing stream due to no available connections (ready=0 busy=0 connecting=0) thread=82
2025-12-21T08:54:38.914493Z debug envoy pool external/envoy/source/common/conn_pool/conn_pool_base.cc:291 trying to create new connection thread=82
2025-12-21T08:54:38.914507Z trace envoy pool external/envoy/source/common/conn_pool/conn_pool_base.cc:292 ConnPoolImplBase 0x2688f81ac1e0, ready_clients_.size(): 0, busy_clients_.size(): 0, connecting_clients_.size(): 0, connecting_stream_capacity_: 0, num_active_streams_: 0, pending_streams_.size(): 1 per upstream preconnect ratio: 1 thread=82
2025-12-21T08:54:38.914511Z debug envoy pool external/envoy/source/common/conn_pool/conn_pool_base.cc:145 creating a new connection (connecting=0) thread=82
2025-12-21T08:54:38.914582Z debug envoy connection external/envoy/source/common/network/connection_impl.h:98 [Tags: "ConnectionId":"31708"] current connecting state: true thread=82
2025-12-21T08:54:38.914595Z debug envoy client external/envoy/source/common/http/codec_client.cc:57 [Tags: "ConnectionId":"31708"] connecting thread=82
2025-12-21T08:54:38.914601Z debug envoy connection external/envoy/source/common/network/connection_impl.cc:1017 [Tags: "ConnectionId":"31708"] connecting to 10.73.125.107:443 thread=82
2025-12-21T08:54:38.914690Z debug envoy connection external/envoy/source/common/network/connection_impl.cc:1036 [Tags: "ConnectionId":"31708"] connection in progress thread=82
2025-12-21T08:54:38.914702Z trace envoy pool external/envoy/source/common/conn_pool/conn_pool_base.cc:131 not creating a new connection, shouldCreateNewConnection returned false. thread=82
2025-12-21T08:54:38.914711Z trace envoy http external/envoy/source/common/http/filter_manager.cc:601 [Tags: "ConnectionId":"28698","StreamId":"16729853266424420263"] decode headers called: filter=envoy.filters.http.upstream_codec status=4 thread=82
2025-12-21T08:54:38.914719Z trace envoy http external/envoy/source/common/http/filter_manager.cc:601 [Tags: "ConnectionId":"28698","StreamId":"16729853266424420263"] decode headers called: filter=envoy.filters.http.router status=1 thread=82
2025-12-21T08:54:38.914725Z debug envoy http external/envoy/contrib/golang/filters/http/source/golang_filter.cc:427 after done handle golang status, status: 2, state: decoder:Done, done: true, seen trailers: false thread=82
2025-12-21T08:54:38.954835Z debug envoy connection external/envoy/source/common/network/connection_impl.cc:150 [Tags: "ConnectionId":"31677"] closing data_to_write=0 type=1 thread=82
...
2025-12-21T08:54:41.915612Z debug envoy connection external/envoy/source/common/network/connection_impl.cc:150 [Tags: "ConnectionId":"31708"] closing data_to_write=0 type=1 thread=82
2025-12-21T08:54:41.915638Z debug envoy connection external/envoy/source/common/network/connection_impl.cc:276 [Tags: "ConnectionId":"31708"] closing socket: 1 thread=82
2025-12-21T08:54:41.915704Z trace envoy connection external/envoy/source/common/network/connection_impl.cc:469 [Tags: "ConnectionId":"31708"] raising connection event 1 thread=82
2025-12-21T08:54:41.915714Z debug envoy client external/envoy/source/common/http/codec_client.cc:107 [Tags: "ConnectionId":"31708"] disconnect. resetting 0 pending requests thread=82
2025-12-21T08:54:41.915726Z debug envoy pool external/envoy/source/common/conn_pool/conn_pool_base.cc:495 [Tags: "ConnectionId":"31708"] client disconnected, failure reason: thread=82
2025-12-21T08:54:41.915746Z debug envoy router external/envoy/source/common/router/router.cc:1377 [Tags: "ConnectionId":"28698","StreamId":"16729853266424420263"] upstream reset: reset reason: local connection failure, transport failure reason: thread=82
2025-12-21T08:54:41.915793Z trace envoy main external/envoy/source/common/event/dispatcher_impl.cc:228 item added to deferred deletion list (size=1) thread=82
Call Stack:
N/A: Envoy is not crashing in this issue.