@preslavle preslavle commented Jan 17, 2026

Does your PR solve an issue?

This PR adds an optional, configurable ping timeout. Without a ping timeout, some connections can get stuck indefinitely in rare circumstances where a TCP connection is blackholed. I don't have a reliable repro, but this is clearly a problem that can happen, especially in cloud environments with live VM migrations and similar events. It can affect the connection pool in two ways:

  1. Ping performed during return_to_pool. Without a timeout, the ping can take an indefinite amount of time while holding the connection permit. We need a timeout here, similar to the existing close timeout.
  2. Ping on connection acquire if test_before_acquire is set to true. Unlike return_to_pool, the duration of this ping is bounded by the acquire timeout. However, if the acquire timeout is what fires, the caller observes an error. With a tighter ping timeout, the stuck ping is abandoned early and the caller can still get a new connection established within the acquire timeout.

To address both cases, this PR adds a configurable, optional ping timeout.
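The intended semantics can be illustrated with a small standalone sketch. This is not the code in this PR: `ping_with_timeout` and `PingOutcome` are hypothetical names, and the sketch uses a plain thread and channel from the standard library instead of the pool's async runtime, purely so it compiles without extra crates. It assumes a ping is a closure returning `true` when the connection is healthy, that `None` means "wait indefinitely", and that `Duration::ZERO` is treated as an immediate timeout.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical result of a health check; not a type from this PR.
#[derive(Debug, PartialEq)]
pub enum PingOutcome {
    Healthy,
    Failed,
    TimedOut,
}

/// Runs `ping` on a helper thread and waits for its result.
/// `None` waits indefinitely (the opt-in default described above);
/// `Some(limit)` gives up after `limit`.
pub fn ping_with_timeout<F>(ping: F, timeout: Option<Duration>) -> PingOutcome
where
    F: FnOnce() -> bool + Send + 'static,
{
    // Assumption: a zero timeout always times out, so tests are deterministic.
    if timeout == Some(Duration::ZERO) {
        return PingOutcome::TimedOut;
    }

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone after a timeout; ignore that error.
        let _ = tx.send(ping());
    });

    let result = match timeout {
        // No timeout configured: block until the ping finishes.
        None => rx.recv().ok(),
        Some(limit) => match rx.recv_timeout(limit) {
            Ok(healthy) => Some(healthy),
            Err(mpsc::RecvTimeoutError::Timeout) => return PingOutcome::TimedOut,
            // The ping thread died without reporting a result.
            Err(mpsc::RecvTimeoutError::Disconnected) => None,
        },
    };

    match result {
        Some(true) => PingOutcome::Healthy,
        _ => PingOutcome::Failed,
    }
}
```

In a pool, a `TimedOut` or `Failed` outcome would mean the connection is discarded: on return_to_pool the permit is released instead of being held by a stuck ping, and on acquire the remaining acquire budget can be spent opening a fresh connection.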

Is this a breaking change?

No. The default ping timeout is None, which means no timeout. I would personally set this to tens of seconds or less in a production environment, but I kept the default at None so the change is opt-in.

Testing

Added tests that exercise the success and error (timeout) paths on both acquire and return_to_pool. Duration::ZERO always results in a timeout, which keeps the tests deterministic and non-flaky.
