Conversation

prestonvasquez
Member

@prestonvasquez prestonvasquez commented Oct 14, 2024

This PR implements the design for connection pooling improvements described in DRIVERS-2884, based on the CSOT (Client-Side Operation Timeout) spec. It addresses connection churn caused by network timeouts during operations, especially in environments with low client-side timeouts and high latency.

When a connection is checked out after a network timeout, the driver now attempts to resume and complete reading any pending server response (instead of closing and discarding the connection). This may require multiple checkouts.
Each pending response read is subject to a cumulative 3-second static timeout. The timeout is refreshed after each successful read, acknowledging that progress is being made. If no data is read and the timeout is exceeded, the connection is closed.

To reduce unnecessary latency, if the timeout has expired while the connection was idle in the pool, a non-blocking single-byte read is performed; if no data is available, the connection is closed immediately.
This update introduces new CMAP events and logging messages (PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed) to improve observability of this path.
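
For readers skimming the description, the window logic above can be sketched roughly as follows. This is only an illustration, not the Go Driver implementation: `PendingState`, `drain_once`, and `read_some` are hypothetical names, the clock is passed in explicitly, and the operation's own timeout is ignored so that only the 3-second window is modeled.

```python
from dataclasses import dataclass
from typing import Callable

PENDING_RESPONSE_WINDOW_S = 3.0  # static window from the PR description


@dataclass
class PendingState:
    """Hypothetical bookkeeping for a connection with an unread response."""
    bytes_remaining: int   # bytes of the server response still unread
    last_progress: float   # time of the network timeout or of the last successful read


def drain_once(state: PendingState, read_some: Callable[[float], int], now: float) -> str:
    """One checkout's attempt to finish the pending read.

    read_some(timeout_s) stands in for a socket read and returns the number
    of bytes that arrived within timeout_s seconds (0 means none).
    A timeout of 0.0 models the non-blocking probe.
    """
    remaining_window = PENDING_RESPONSE_WINDOW_S - (now - state.last_progress)
    # If the window expired while the connection sat idle in the pool,
    # do not block: probe once with a zero timeout.
    timeout_s = max(remaining_window, 0.0)
    received = read_some(timeout_s)
    if received == 0:
        return "close"  # no progress inside the window: close the connection
    state.bytes_remaining -= min(received, state.bytes_remaining)
    state.last_progress = now  # progress refreshes the window
    return "done" if state.bytes_remaining == 0 else "pending"
```

A result of `"pending"` corresponds to the "may require multiple checkouts" case: the connection stays in the pool and a later checkout calls `drain_once` again.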

Please complete the following before merging:

@prestonvasquez prestonvasquez marked this pull request as ready for review April 25, 2025 21:36
@prestonvasquez prestonvasquez requested review from a team as code owners April 25, 2025 21:36
@prestonvasquez prestonvasquez requested review from alcaeus, stIncMale, baileympearson and ShaneHarvey and removed request for a team April 25, 2025 21:36

codeowners-service-app bot commented Apr 25, 2025

Assigned qingyang-hu for team dbx-spec-owners-csot because ShaneHarvey is out of office.
Assigned tom-selander for team dbx-spec-owners-sdam because ShaneHarvey is out of office.
Assigned esha-bhargava for team dbx-spec-owners-sdam because ShaneHarvey is out of office.

Member

@alcaeus alcaeus left a comment

I'd like to wait until #1792 is merged to review the schema changes. From what I can see in the UTF specification, those changes look good.


#### Connection Aliveness Check Fails

1. Initialize a mock TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to
Contributor

Thoughts on adding a mock server to drivers-evergreen-tools for these tests? I could go either way - there are only two, so the burden on drivers isn't too great but it might be nice if drivers didn't need to worry about the mock server logic themselves.

Member Author

I’m concerned that this solution will require drivers to spin up a server when trying to test locally. I’ve suggested DRIVERS-3183 to support raw-TCP connection test entities which will allow us to convert these prose tests to a unified spec test in the future.

Member Author

CC: @baileympearson, could you POC what it would take to create a TCP listener to perform a round trip that holds 1 byte on the write side?

connectionId: int64;

/**
* The time it took to complete the pending read.
Contributor

So long as data is still coming back from the socket in intervals of <3s, it is possible for the same connection to require multiple checkout requests to fully exhaust. So, is this duration the total time it took to read all of the data off of the socket (now() - time of timeout), or the amount of time that the checkout request waited on the final pending read wait?

Contributor

(same comment for logging events)

Member Author

I would anticipate this duration to be within the context of ConnectionPendingResponseStarted, i.e. 1 call to await_pending_response.

Contributor

Agreed. Can we clarify that in the description of duration? We can take inspiration from the definitions of duration for checkout failed and checkout succeeded events. Ex:

  /**
   * The time it took to establish the connection.
   * In accordance with the definition of establishment of a connection
   * specified by `ConnectionPoolOptions.maxConnecting`,
   * it is the time elapsed between emitting a `ConnectionCreatedEvent`
   * and emitting this event as part of the same checking out.
   *
   * Naturally, when establishing a connection is part of checking out,
   * this duration is not greater than
   * `ConnectionCheckedOutEvent`/`ConnectionCheckOutFailedEvent.duration`.
   *
   * A driver MAY choose the type idiomatic to the driver.
   * If the type chosen does not convey units, e.g., `int64`,
   * then the driver MAY include units in the name, e.g., `durationMS`.
   */
  duration: Duration;

So, maybe something like:

  /**
   * The time it took to complete the pending read.
   * This duration is defined as the time elapsed between emitting a `PendingResponseStarted` event
   * and emitting this event as part of the same checking out.
   *
   * A driver MAY choose the type idiomatic to the driver.
   * If the type chosen does not convey units, e.g., `int64`,
   * then the driver MAY include units in the name, e.g., `durationMS`.
   */
  duration: Duration;
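
To make the proposed wording concrete, here is a minimal sketch of how that duration could be measured for a single call to `await_pending_response`. The Python event classes, `await_with_events`, `drain`, and `emit` are illustrative stand-ins rather than spec definitions, and only the success path is shown (the failed event would be timed the same way).

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PendingResponseStarted:
    connection_id: int


@dataclass
class PendingResponseSucceeded:
    connection_id: int
    duration_ms: float  # units in the name because a float does not convey them


def await_with_events(
    connection_id: int,
    drain: Callable[[], None],
    emit: Callable[[Any], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Emit started/succeeded events around one pending-response read."""
    emit(PendingResponseStarted(connection_id))
    started_at = clock()
    drain()  # the read performed as part of this one checking out
    elapsed_ms = (clock() - started_at) * 1000.0
    emit(PendingResponseSucceeded(connection_id, elapsed_ms))
```

Under this definition a connection that needs several checkouts to drain reports one duration per checkout, not the total time since the original network timeout.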

Contributor

(same comment for other definitions of duration in this PR).


- description: "force a pending response read, fail first try, succeed second try"
operations:
- name: createEntities
Contributor

If possible, can we add a test that demonstrates that when the pending read checkout has no timeoutMS set, we use socket_timeout_ms (if it is <3s)?

Member Author

@prestonvasquez prestonvasquez May 2, 2025

Great catch! The Go Driver doesn’t support socket timeouts, which are technically a deprecated option. Perhaps @ShaneHarvey can opine. If we decide to add this test, would you mind implementing it, since the Go Driver has no way of verifying it?

Member

@alcaeus alcaeus left a comment

Changes to the unified test format LGTM.

@prestonvasquez as per our conversation around where to add the missing event names in #1782, this schema version would be an ideal candidate as it already adds new events to the list.

@prestonvasquez prestonvasquez removed the request for review from qingyang-hu May 6, 2025 23:28
@prestonvasquez prestonvasquez removed the request for review from JamesKovacs July 14, 2025 19:53
Contributor

@isabelatkinson isabelatkinson left a comment

retryability spec changes LGTM. Rust does not have CSOT, so I'm not able to POC/test

@jyemin jyemin removed the request for review from tom-selander August 1, 2025 20:02
@prestonvasquez prestonvasquez force-pushed the DRIVERS-2884 branch 8 times, most recently from 2761fb1 to bb4e2db Compare August 29, 2025 19:28
expectError:
isTimeoutError: true
# Execute a subsequent operation to complete the read.
- name: findOne
Member

@sanych-sun sanych-sun Sep 2, 2025

The findOne operation is optional and not all drivers implement it (we do not support it in the CSharp Driver, for example). Can we replace its usage with find?


# Execute a subsequent operation which should time out during the
# pending response read attempt.
- name: findOne
Member

Please use find here instead of findOne.

serverPort: { $$type: [int, long] }
driverConnectionId: { $$type: [int, long] }
requestId: { $$type: [int, long] }
reason: "timeout"
Member

@sanych-sun sanych-sun Sep 10, 2025

Shouldn't we expect a Connection checkout failed event here, to signal that the previous checkout attempt failed?

Member Author

@prestonvasquez prestonvasquez Sep 11, 2025

@ShaneHarvey Can you opine on this? In the original review it was decided to mute ConnectionCheckOutFailed when draining a connection during the check out process. However, this wasn't mentioned in the scope or in the documentation updates for CMAP on this PR. Should the ConnectionCheckOutFailed event be propagated when we return an error to the operation layer after attempting to drain a pending response?

CC: @stIncMale

Member

@sanych-sun sanych-sun Sep 11, 2025

From a log reader's perspective: if one needs to investigate why some operations take longer than usual, the logical path is to go from the operation level to more detailed levels IN BULK. The first thing to check would be operation logs, then server selection (did it take longer than usual, were there any errors), then connection checkout (the duration of the checkout, were there any errors)... and if we DO NOT raise the checkout failed event, then checkouts that failed because of a pending read failure might be missed until one decides to investigate a particular operation step by step and sees that a checkout was started but never completed (as per the logs). Such behavior would be really confusing.
Moreover, if we decide to "reduce noise in the logs", then why do we report success twice: Pending response succeeded and then Connection checked out?

From my understanding we should either report 2 sets of "started" and "succeeded"/"failed" events, or report only the "started" event for pending reads and imply that, depending on the pending read result, we will report "Connection checked out" or "Connection checkout failed".

Member

Yeah a CheckoutFailed is required here since a CheckoutStarted always needs a corresponding CheckoutFailed or CheckoutSucceeded event.
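
The pairing rule is easy to check mechanically on a captured event list. A small sketch, assuming events are recorded as plain name strings for a single checkout at a time (`unmatched_checkouts` is a hypothetical helper, not part of any test runner):

```python
from typing import Iterable

# Events that conclude a checkout, per the CMAP event names.
TERMINAL_EVENTS = {"ConnectionCheckedOutEvent", "ConnectionCheckOutFailedEvent"}


def unmatched_checkouts(event_names: Iterable[str]) -> int:
    """Count checkout-started events that never received a terminal event."""
    open_checkouts = 0
    for name in event_names:
        if name == "ConnectionCheckOutStartedEvent":
            open_checkouts += 1
        elif name in TERMINAL_EVENTS:
            if open_checkouts == 0:
                raise ValueError(f"{name} without a preceding checkout start")
            open_checkouts -= 1
        # Any other event (e.g. the pending-response events) is ignored here.
    return open_checkouts
```

With the checkout-failed event muted, a sequence such as started, PendingResponseStarted, PendingResponseFailed leaves one checkout unmatched, which is the gap described in this thread.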

Member Author

Sounds good, will try to get an update in ASAP. Thanks @ShaneHarvey

Member

@stIncMale stIncMale left a comment

I started reading the PR to prepare for triaging https://jira.mongodb.org/browse/DRIVERS-3276, and left some comments. These comments do not represent a result of a full review, nor do I know if I will do such a review.

*/
interface PendingResponseStarted {
/**
* The ServerAddress of the Endpoint the pool is attempting to connect to.
Member

@stIncMale stIncMale Sep 15, 2025

Looks like this description was copied from another event, but it can't be the correct description of what PendingResponseStarted.address is: when a pool is attempting to read a pending server response, it does so using a connection that has already been established.

The same applies to PendingResponseSucceeded, PendingResponseFailed.

/**
* The driver-generated request ID associated with the network timeout.
*/
requestId: int64;
Member

@stIncMale stIncMale Sep 15, 2025

We already have CommandStartedEvent/CommandSucceededEvent/CommandFailedEvent.requestId specified in https://github.com/mongodb/specifications/blob/master/source/command-logging-and-monitoring/command-logging-and-monitoring.md#events-api.

  1. I am guessing that PendingResponseStarted/PendingResponseSucceeded/PendingResponseFailed.requestId represent the same thing - the ID of a command. If this is the case, the specification should explicitly point that out, instead of expecting that a reader will somehow guess this (to guess this correctly it is required but not sufficient to know and remember that there is CommandStartedEvent/CommandSucceededEvent/CommandFailedEvent.requestId).
  2. Assuming that the above guess is correct, which command must PendingResponseStarted/PendingResponseSucceeded/PendingResponseFailed.requestId correspond to: the one that failed with a network timeout and left a response not completely read from the connection, or the one for which the connection is being checked out? In a way, they are both "associated" with that timeout, and the specification should make it clear which one it talks about.

}

/**
* Emitted when the connection successfully read the pending read and is ready
Member

I suspect, "response" was meant to be used here:

Suggested change
- * Emitted when the connection successfully read the pending read and is ready
+ * Emitted when the connection successfully read the pending response and is ready

But if not, then what does it mean to "read the pending read"?

1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
consumed while explicitly awaiting the pending response as part of checking out the connection.
2. **Aliveness check**: If the undrained connection remains idle (i.e. no data is read or received) for more than 3
Member

@stIncMale stIncMale Sep 16, 2025

Is "undrained connection" different from a connection in the "pending response" state? If yes, then the specification should explain what "undrained connection" is. If there is no difference, then the specification should use only one term consistently.

*
* - "available": The Connection has been established and is waiting in the pool to be checked
* out. Contributes to both totalConnectionCount and availableConnectionCount.
* - "pending response": The Connection is attempting to discard a response for an operation where the socket timed
Member

@stIncMale stIncMale Sep 16, 2025

Here, the state is described as "Connection is attempting to discard a response". Not "read" (which is what I would have expected to be mentioned), not "drain", not "consume", not "receive", not "execute" but only "discard". In other places, however, all of those words are used, and even combinations of them:

  • read/drain operation
  • drained and discarded vs drained and successfully discarded vs read and discard
  • "pending response" drain
  • bytes are successfully read, received, or consumed
  • reading from the socket (or draining buffered data)
  • data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by
    consuming buffered data.
    - The specification does not rely on this distinction, nor can I actually see a meaningful distinction, as any practical TCP implementation has to buffer inbound data one way or another. If a single term, like "reading", is deemed unclear by maintainers of different drivers, let's specify in one place that for the purpose of the specification the term "[pick one term]" is going to be used to denote [explain the meaning taking into account all the implementations quirks that are deemed necessary], and then use the single picked term consistently.
  • draining buffered data vs consuming buffered data
  • execute_pending_response
  • source/connection-monitoring-and-pooling/tests/README.md also uses different words in different places, and it is unclear whether they mean the same or not
    • drain the rest of the response
    • discard bytes from the TCP stream

Are all these terms meaningfully different? If yes, they should be defined clearly, and used strictly according to the definition. If not, then the specification should use a single word to refer to the same thing.


1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
consumed while explicitly awaiting the pending response as part of checking out the connection.
Member

What is the difference between "awaiting a pending response" used previously and "explicitly awaiting the pending response" used here?

availableConnectionCount MUST be decremented.

```text
##### Awaiting Pending Read (drivers that support CSOT)
Member

The specification introduces a new "pending response" connection state. But then there is this "Awaiting Pending Read" section, which uses the "pending read" term exactly once, and mostly uses "pending response" instead. The "Events" section, on the other hand, uses "pending read".

What is the difference between the meaning of "pending response" and "pending read"?

else:
decrement availableConnectionCount

error = await_pending_response(pool, connection)
Member

await_pending_response accepts timeout and conn, based on its pseudocode. But here pool is passed instead of timeout. I don't think this is correct, especially given that timeout cannot be extracted from pool.

* Emitted when the connection being checked out is attempting to read and
* discard a pending server response.
*/
interface PendingResponseStarted {
Member

We have pseudocode showing when most other pool events should be emitted. Furthermore, we have pseudocode showing how duration for ConnectionReadyEvent, ConnectionCheckOutFailedEvent, ConnectionCheckedOutEvent should be computed.

Let's do the same for the new PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed events.

reuse.

```mermaid
sequenceDiagram
Member

The diagram suggests that a connection in the pending response state can be checked out from a pool, and that reading a response which wasn't read in full can be done after the connection having been checked out. Both of these pieces of behavior contradict the pseudocode and the design.

else:
decrement availableConnectionCount

error = await_pending_response(pool, connection)
Member

@stIncMale stIncMale Sep 17, 2025

There seem to be issues with this pseudocode and the way the "pending response" state is defined:

  1. await_pending_response must be called if and only if connection is "pending response", which is not expressed in the pseudocode.
  2. The only state the connection can be at this point is "available", so await_pending_response can never be called.
  3. The specification instructs to maintain availableConnectionCount, pendingConnectionCount, totalConnectionCount with the invariant being totalConnectionCount = pendingConnectionCount + availableConnectionCount + <in use connection count, which is not explicitly maintained> (search for "pending" + "available" + "in use"). While I don't know why availableConnectionCount is maintained (maybe we should figure this out), the introduction of the new "pending response" state affects the aforementioned, and has to be properly dealt with.
  4. I haven't checked everything else, but given that currently it looks like the new state was slapped on top of the spec without much regard to the rest of the spec, I would not be surprised if there were more places that have to be adjusted to take into account the new "pending response" state. Update: I realized that there is at least one more place (please do look for more). "Checking In a Connection" currently either closes a connection or moves it to the "available" state. It should move the connection in either "available" or "pending response" state depending on what transpired before it being checked in. So this section needs a change.

I suspect that handling of connections that are in the "pending response" state should be done at the same point where perished connections are handled. Note that being perished is not considered to be a state of a connection by the spec, but merely a value of the perishable property of a connection (both "value" and "property" are not used here in the same sense they are used in programming; that's my best interpretation of the spec as it is now).

We also may consider making "pending response" not a new state of a connection, but rather a value of another property, similarly to how it is done with the "perishable" property and its "perished"/"non-perished" values.

availableConnectionCount MUST be decremented.

```text
##### Awaiting Pending Read (drivers that support CSOT)
Member

@stIncMale stIncMale Sep 17, 2025

Something is wrong with the structure of the spec here:

  1. Instead of updating the "Checking Out a Connection" section to integrate the new logic into the checking out logic, the new "Awaiting Pending Read (drivers that support CSOT)" subsection was added without any integration. One can understand what this subsection means and how it is supposed to be integrated in the checking out logic only after looking at the pseudocode. But pseudocode is supposed to supplement the prose of the spec, not be a replacement for it. It seems to me that the new logic should be properly described as part of the checking out logic.
  2. The "Awaiting Pending Read (drivers that support CSOT)" subsection also specifies how an "in use" connection transitions into the "pending response" state. This part has no relation to "Checking Out a Connection", and should not be inside that section.

See also #1675 (comment).

response data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by
consuming buffered data.

1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
Member

@stIncMale stIncMale Sep 17, 2025

This enumerated list does not seem to be introduced in any way, or related to anything. It just exists out there, specifying six instructions. This concern is essentially part of the concern expressed above, and should be addressed as part of that one.

seconds since the start of the "pending response" state or since the last successful read/receive, the driver MUST
attempt to verify the connection’s health by either performing a non-blocking read or using the minimal possible
timeout to check if at least one byte can be read/received. If at least one byte can be read the connection should
be returned to the pool for reuse and a retryable error should be propagated to the operation layer. If no bytes
Member

This item and also item 6 below cannot say "returned to the pool" when talking about a connection that is being checked out, because such a connection is still in the pool, and the ConnectionCheckedOutEvent hasn't been emitted for it.

5. **Error or over-age**: If reading from the socket (or draining buffered data) results in an error that is not a
timeout, or if the connection exceeds the 3 second pending-response window, the driver MUST close the connection.
6. **Clear pending state on success**: If the pending response is fully drained and successfully discarded, and the
connection remains healthy, the pending state may be cleared and the connection MAY be returned to the pool for
Member

@stIncMale stIncMale Sep 17, 2025

According to the pseudocode, if await_pending_response completes without an error, the checking out succeeds for the connection: the connection transitions into the "in use" state, the ConnectionCheckedOutEvent is emitted for it. This happens always. I fail to see how "MAY be returned to the pool" makes sense both because of what I have just described, and because of #1675 (comment).

close_connection(conn)

if error is not None:
raise error
Member

The pseudocode that existed before the current PR used throw, for example, throw PoolClosedError. The new pseudocode should continue using that, instead of coming up with new syntax.


error = await_pending_response(pool, connection)
if error:
return error
Member

@stIncMale stIncMale Sep 17, 2025

The pseudocode that existed before the current PR used throw, for example, throw PoolClosedError. The new pseudocode should continue using that, instead of coming up with new syntax.

What is worse is that the new pseudocode uses return here, but raise in two other places. That is, it is not even consistent within itself.


error = await_pending_response(pool, connection)
if error:
return error
Member

We should fix the pseudocode by emitting the ConnectionCheckOutFailedEvent before throwing this error. This issue was identified in #1675 (comment), I am just pointing out to at least one place where the change has to be done.
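
For illustration, the ordering being asked for could look like the sketch below. It is written in Python rather than in the spec's pseudocode dialect; `check_out`, `await_pending_response`, and `emit` are placeholders, and the rest of the checkout logic (wait queue, connection creation, counters) is deliberately omitted.

```python
from typing import Callable


def check_out(
    await_pending_response: Callable[[], None],
    emit: Callable[[str], None],
) -> None:
    """Emit the terminal checkout event on both the success and failure paths."""
    emit("ConnectionCheckOutStartedEvent")
    try:
        # Assumed to raise if the pending response could not be fully read.
        await_pending_response()
    except Exception:
        # Emit the failure event first, then propagate the original error.
        emit("ConnectionCheckOutFailedEvent")
        raise
    emit("ConnectionCheckedOutEvent")
```

Either way, the error still reaches the operation layer; the only change is that the checkout-failed event is emitted before it does.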

@stIncMale
Member

@baileympearson and I triaged https://jira.mongodb.org/browse/DRIVERS-3276 today. We disagree on what to do with that ticket. I think, the work on that ticket should be done as part of the work done in this PR. I expressed more of my thoughts in this Jira comment.

Comment on lines 609 to 611
4. **Default timeout**: If no user-provided timeout is specified, the driver MUST use the minimum of (a) the remaining 3
second "pending response" window and (b) the `socketTimeoutMS` (if supported by the driver) as the effective
timeout for the read/drain operation.
Member

@stIncMale stIncMale Sep 18, 2025

@sanych-sun, thank you for pointing out that the CMAP spec has ConnectionPoolOptions.waitQueueTimeoutMS. If we don't take it into account here (and in the pseudocode), then we are changing its meaning, which may result in surprising behavior from the perspective of users: currently, checking out for a non-CSOT operation is expected to potentially go over waitQueueTimeoutMS only if(*) a new connection is created and established as part of checking out; otherwise, the duration of checking out is expected to be within waitQueueTimeoutMS (see the documentation for ConnectionCheckedOutEvent.duration and waitQueueTimeoutMS).

Thus, the timeout for draining should not exceed what is left of waitQueueTimeoutMS, and should not exceed the "remaining 3 second "pending response" window". I am unsure if socketTimeoutMS needs to be involved at all. I remember proposing socketTimeoutMS when I was reviewing the design, because waitQueueTimeoutMS did not even come to my mind at that time, but I am not certain about my recollection.


(*) A driver not providing hard real-time guarantees is irrelevant for the purpose of the current comment, which is why I said "only if".
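
As a sketch of the bound proposed in this comment (not settled spec text): the drain timeout is capped both by what remains of the 3-second window and by what remains of `waitQueueTimeoutMS`. The function name and parameters are hypothetical, and `None` is used here to mean that no wait-queue limit applies.

```python
from typing import Optional

PENDING_RESPONSE_WINDOW_MS = 3000.0


def drain_timeout_ms(
    elapsed_since_progress_ms: float,
    wait_queue_remaining_ms: Optional[float],
) -> float:
    """Effective timeout for one pending-response read on a non-CSOT checkout.

    elapsed_since_progress_ms: time since the network timeout or the last
        successful read on this connection.
    wait_queue_remaining_ms: what is left of waitQueueTimeoutMS for this
        checkout, or None if no wait-queue limit applies (an assumption of
        this sketch).
    """
    window_remaining = max(PENDING_RESPONSE_WINDOW_MS - elapsed_since_progress_ms, 0.0)
    if wait_queue_remaining_ms is None:
        return window_remaining
    return max(min(window_remaining, wait_queue_remaining_ms), 0.0)
```

Whether `socketTimeoutMS` should also enter the `min` is left open, as in the comment above.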

@sanych-sun
Member

This PR was closed by mistake. I'll open another PR for the changes, as I can no longer push to the branch, and I will double-check that all still-open comments are resolved.
Sorry for the inconvenience.

7 participants