DRIVERS-2884 Avoid connection churn when operations timeout #1675
I'd like to wait until #1792 is merged to review the schema changes. From what I can see in the UTF specification, those changes look good.
source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md
#### Connection Aliveness Check Fails

1. Initialize a mock TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to
Thoughts on adding a mock server to drivers-evergreen-tools for these tests? I could go either way - there are only two, so the burden on drivers isn't too great but it might be nice if drivers didn't need to worry about the mock server logic themselves.
I’m concerned that this solution will require drivers to spin up a server when trying to test locally. I’ve suggested DRIVERS-3183 to support raw-TCP connection test entities which will allow us to convert these prose tests to a unified spec test in the future.
CC: @baileympearson, could you POC what it would take to create a TCP listener that performs a round trip while holding back 1 byte on the write side?
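As a starting point for that POC, here is a minimal sketch of such a listener using only the Python standard library. The function name `start_partial_listener`, the `hold_back` parameter, and the use of an event to release the withheld bytes are all assumptions made for illustration; this is not the helper proposed for drivers-evergreen-tools, and it does not speak the MongoDB wire protocol.

```python
import socket
import threading


def start_partial_listener(payload: bytes, hold_back: int = 1):
    """Start a one-shot TCP listener that withholds the tail of its reply.

    Hypothetical helper: after receiving any request bytes it sends all of
    `payload` except the last `hold_back` bytes (hold_back must be >= 1),
    waits until the returned event is set, then sends the remainder.
    Returns (port, release_event, thread).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]
    release = threading.Event()

    def serve():
        conn, _ = server.accept()
        with conn:
            conn.recv(1024)  # wait for the client's request
            conn.sendall(payload[:-hold_back])  # partial response
            release.wait(timeout=5)  # simulate the stalled tail
            conn.sendall(payload[-hold_back:])
        server.close()

    thread = threading.Thread(target=serve, daemon=True)
    thread.start()
    return port, release, thread
```

A test could read the partial response, let its own socket timeout fire, and then set the event so that the next checkout finds the remaining byte waiting.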
source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md
connectionId: int64;

/**
 * The time it took to complete the pending read.
So long as data is still coming back from the socket in intervals of <3s, it is possible for the same connection to require multiple checkout requests to fully exhaust. So, is this duration the total time it took to read all of the data off of the socket (now() - time of timeout), or the amount of time that the checkout request waited on the final pending read wait?
(same comment for logging events)
I would anticipate this duration to be within the context of ConnectionPendingResponseStarted, i.e. 1 call to await_pending_response.
Agreed. Can we clarify that in the description of duration? We can take inspiration from the definitions of duration for the checkout failed and checkout succeeded events. Ex:
/**
* The time it took to establish the connection.
* In accordance with the definition of establishment of a connection
* specified by `ConnectionPoolOptions.maxConnecting`,
* it is the time elapsed between emitting a `ConnectionCreatedEvent`
* and emitting this event as part of the same checking out.
*
* Naturally, when establishing a connection is part of checking out,
* this duration is not greater than
* `ConnectionCheckedOutEvent`/`ConnectionCheckOutFailedEvent.duration`.
*
* A driver MAY choose the type idiomatic to the driver.
* If the type chosen does not convey units, e.g., `int64`,
* then the driver MAY include units in the name, e.g., `durationMS`.
*/
duration: Duration;
So, maybe something like:
/**
* The time it took to complete the pending read.
* This duration is defined as the time elapsed between emitting a `PendingResponseStarted` event
* and emitting this event as part of the same checking out.
*
* A driver MAY choose the type idiomatic to the driver.
* If the type chosen does not convey units, e.g., `int64`,
* then the driver MAY include units in the name, e.g., `durationMS`.
*/
duration: Duration;
(same comment for other definitions of duration in this PR).
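To make the suggested definition concrete, here is a small sketch of measuring the duration per pending-read attempt rather than across all attempts. The dataclasses, the `emit` callback, and the injectable `clock` are hypothetical names chosen for this example and are not part of the spec.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PendingResponseStarted:
    connection_id: int


@dataclass
class PendingResponseSucceeded:
    connection_id: int
    duration: float  # seconds since the matching PendingResponseStarted


def timed_pending_read(
    connection_id: int,
    read_pending: Callable[[], None],
    emit: Callable[[object], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Run one pending-read attempt and report only that attempt's duration."""
    started_at = clock()
    emit(PendingResponseStarted(connection_id))
    read_pending()  # assumed to raise on failure; failure handling is omitted
    emit(PendingResponseSucceeded(connection_id, clock() - started_at))
```

Under this reading, a connection that needs several checkouts to finish reading would produce several started/succeeded-or-failed pairs, each with its own duration.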
source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md
- description: "force a pending response read, fail first try, succeed second try"
  operations:
    - name: createEntities
If possible, can we add a test that demonstrates that when the pending read checkout has no timeoutMS set, we use socket_timeout_ms (if it is <3s)?
Great catch! The Go Driver doesn’t support socket timeouts, which are technically a deprecated option. Perhaps @ShaneHarvey can opine. If we decide to add this test, would you mind implementing it, since the Go Driver has no way of verifying it?
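For reference, the behaviour such a test would exercise can be sketched as below. The function name and the decision to treat `None` or `0` as "no socket timeout configured" are assumptions for this example; the case where the user does provide a timeout is deliberately left out.

```python
from typing import Optional

PENDING_RESPONSE_WINDOW_MS = 3000  # the static 3-second window discussed in this PR


def default_pending_read_timeout_ms(
    socket_timeout_ms: Optional[int],
    window_remaining_ms: int = PENDING_RESPONSE_WINDOW_MS,
) -> int:
    """Timeout for a pending read when the checkout has no timeoutMS.

    Assumption: None or 0 means the driver has no socket timeout configured,
    in which case only the remaining pending-response window applies.
    """
    if socket_timeout_ms:
        return min(socket_timeout_ms, window_remaining_ms)
    return window_remaining_ms
```

With a 500 ms socket timeout the pending read would be bounded by 500 ms; with a 5000 ms socket timeout it would be bounded by whatever remains of the 3-second window.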
Changes to the unified test format LGTM.
@prestonvasquez as per our conversation around where to add the missing event names in #1782, this schema version would be an ideal candidate as it already adds new events to the list.
retryability spec changes LGTM. Rust does not have CSOT, so I'm not able to POC/test
2761fb1 to bb4e2db
expectError:
  isTimeoutError: true
# Execute a subsequent operation to complete the read.
- name: findOne
The `findOne` operation is optional and not all drivers implement it (we do not support it in CSharp Driver for example). Can we replace its usage with `find`?
# Execute a subsequent operation which should time out during the
# pending response read attempt.
- name: findOne
Please use `find` here instead of `findOne`.
serverPort: { $$type: [int, long] }
driverConnectionId: { $$type: [int, long] }
requestId: { $$type: [int, long] }
reason: "timeout"
Shouldn't we expect a `Connection checkout failed` event here, to signal that the previous checkout attempt failed?
@ShaneHarvey Can you opine on this? In the original review it was decided to mute `ConnectionCheckOutFailed` when draining a connection during the check out process. However, this wasn't mentioned in the scope or in the documentation updates for CMAP on this PR. Should the `ConnectionCheckOutFailed` event be propagated when we return an error to the operation layer after attempting to drain a pending response?
CC: @stIncMale
From the log reader's perspective: if one needs to investigate why some operations take longer than usual, the logical path would be to go from the operation level to more detailed levels in bulk. So the first thing to check would be the operation logs, then server selection (if it takes longer than usual, were there any errors), then connection checkout (the duration of the checkout, were there any errors)... and if we DO NOT raise the checkout failed event, then failed checkouts caused by pending read failures might be missed until one decides to investigate a particular operation step by step and sees that the checkout was started but never completed (as per the logs). Such behavior would be really confusing.
Moreover, if we decide to "reduce noise in the logs", then why do we report success twice: `Pending response succeeded` and then `Connection checked out`?
From my understanding we should either report 2 sets of "started" and "succeeded"/"failed" events, or report only the "started" event for pending reads and imply that, depending on the pending read result, we will report "Connection checked out" or "Connection checkout failed".
Yeah a CheckoutFailed is required here since a CheckoutStarted always needs a corresponding CheckoutFailed or CheckoutSucceeded event.
Sounds good, will try to get an update in ASAP. Thanks @ShaneHarvey
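The pairing rule agreed on in this thread lends itself to a simple check over a recorded event sequence. The sketch below is only illustrative: it treats events as plain name strings and ignores connection IDs and concurrency, both of which a real test harness would need to handle.

```python
from typing import Iterable

# Events that close a checkout attempt, per the CMAP event names.
TERMINAL_EVENTS = {"ConnectionCheckedOutEvent", "ConnectionCheckOutFailedEvent"}


def unmatched_checkouts(events: Iterable[str]) -> int:
    """Count ConnectionCheckOutStartedEvent entries with no terminal event."""
    open_checkouts = 0
    for name in events:
        if name == "ConnectionCheckOutStartedEvent":
            open_checkouts += 1
        elif name in TERMINAL_EVENTS and open_checkouts > 0:
            open_checkouts -= 1
    return open_checkouts
```

A sequence that ends with `PendingResponseFailed` and no `ConnectionCheckOutFailedEvent` leaves one checkout unmatched, which is the situation the reviewers want to rule out.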
I started reading the PR to prepare for triaging https://jira.mongodb.org/browse/DRIVERS-3276, and left some comments. These comments do not represent a result of a full review, nor do I know if I will do such a review.
*/
interface PendingResponseStarted {
  /**
   * The ServerAddress of the Endpoint the pool is attempting to connect to.
Looks like this description was copied from another event, but it can't be the correct description of what `PendingResponseStarted.address` is: when a pool is attempting to read a pending server response, it does so using a connection that has already been established.
The same applies to `PendingResponseSucceeded`, `PendingResponseFailed`.
/**
 * The driver-generated request ID associated with the network timeout.
 */
requestId: int64;
We already have `CommandStartedEvent`/`CommandSucceededEvent`/`CommandFailedEvent.requestId` specified in https://github.com/mongodb/specifications/blob/master/source/command-logging-and-monitoring/command-logging-and-monitoring.md#events-api.
- I am guessing that `PendingResponseStarted`/`PendingResponseSucceeded`/`PendingResponseFailed.requestId` represent the same thing - the ID of a command. If this is the case, the specification should explicitly point that out, instead of expecting that a reader will somehow guess this (to guess this correctly it is required but not sufficient to know and remember that there is `CommandStartedEvent`/`CommandSucceededEvent`/`CommandFailedEvent.requestId`).
- Assuming that the above guess is correct, which command must `PendingResponseStarted`/`PendingResponseSucceeded`/`PendingResponseFailed.requestId` correspond to: the one that failed with a network timeout and left a response not completely read from the connection, or the one for which the connection is being checked out? In a way, they are both "associated" with that timeout, and the specification should make it clear which one it talks about.
}

/**
 * Emitted when the connection successfully read the pending read and is ready
I suspect, "response" was meant to be used here:
 * Emitted when the connection successfully read the pending read and is ready
 * Emitted when the connection successfully read the pending response and is ready
But if not, then what does it mean to "read the pending read"?
1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
   timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
   consumed while explicitly awaiting the pending response as part of checking out the connection.
2. **Aliveness check**: If the undrained connection remains idle (i.e. no data is read or received) for more than 3
Is "undrained connection" different from a connection in the "pending response" state? If yes, then the specification should explain what "undrained connection" is. If there is no difference, then the specification should use only one term consistently.
*
* - "available": The Connection has been established and is waiting in the pool to be checked
*   out. Contributes to both totalConnectionCount and availableConnectionCount.
* - "pending response": The Connection is attempting to discard a response for an operation where the socket timed
Here, the state is described as "Connection is attempting to discard a response". Not "read" (which is what I would have expected to be mentioned), not "drain", not "consume", not "receive", not "execute" but only "discard". In other places, however, all of those words are used, and even combinations of them:
- read/drain operation
- drained and discarded vs drained and successfully discarded vs read and discard
- "pending response" drain
- bytes are successfully read, received, or consumed
- reading from the socket (or draining buffered data)
- data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by consuming buffered data.
  - The specification does not rely on this distinction, nor can I actually see a meaningful distinction, as any practical TCP implementation has to buffer inbound data one way or another. If a single term, like "reading", is deemed unclear by maintainers of different drivers, let's specify in one place that for the purpose of the specification the term "[pick one term]" is going to be used to denote [explain the meaning taking into account all the implementations quirks that are deemed necessary], and then use the single picked term consistently.
- draining buffered data vs consuming buffered data
- execute_pending_response

`source/connection-monitoring-and-pooling/tests/README.md` also uses different words in different places, and it is unclear whether they mean the same or not:
- drain the rest of the response
- discard bytes from the TCP stream

Are all these terms meaningfully different? If yes, they should be defined clearly, and used strictly according to the definition. If not, then the specification should use a single word to refer to the same thing.
1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
   timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
   consumed while explicitly awaiting the pending response as part of checking out the connection.
What is the difference between "awaiting a pending response" used previously and "explicitly awaiting the pending response" used here?
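As an aid to reading the quoted item, the timestamp bookkeeping it describes could look roughly like the sketch below. The class and method names are hypothetical, the clock is injectable only to make the example testable, and nothing here settles the terminology questions raised in this thread.

```python
import time
from typing import Callable

PENDING_RESPONSE_WINDOW_S = 3.0  # the 3-second window from the quoted spec text


class PendingResponseTracker:
    """Tracks the last time a pending response made progress (hypothetical)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        # Recorded immediately after the original socket timeout.
        self._last_progress = clock()

    def record_bytes_read(self, count: int) -> None:
        # Any successful read refreshes the window.
        if count > 0:
            self._last_progress = self._clock()

    def remaining_window(self) -> float:
        elapsed = self._clock() - self._last_progress
        return max(0.0, PENDING_RESPONSE_WINDOW_S - elapsed)

    def needs_aliveness_check(self) -> bool:
        # True once the connection has been idle for more than the window.
        return self._clock() - self._last_progress > PENDING_RESPONSE_WINDOW_S
```

The tracker only answers "how much of the window is left" and "has the window lapsed"; what the driver does in each case is the subject of the remaining items in the list.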
availableConnectionCount MUST be decremented.

```text | ||
##### Awaiting Pending Read (drivers that support CSOT)
The specification introduces a new "pending response" connection state. But then there is this "Awaiting Pending Read" section, which uses the "pending read" term exactly once, and mostly uses "pending response" instead. The "Events" section, on the other hand, uses "pending read".
What is the difference between the meaning of "pending response" and "pending read"?
else:
    decrement availableConnectionCount

error = await_pending_response(pool, connection)
`await_pending_response` accepts `timeout` and `conn`, based on its pseudocode. But here `pool` is passed instead of `timeout`. I don't think this is correct, especially given that `timeout` cannot be extracted from `pool`.
 * Emitted when the connection being checked out is attempting to read and
 * discard a pending server response.
 */
interface PendingResponseStarted {
We have pseudocode showing when most other pool events should be emitted. Furthermore, we have pseudocode showing how `duration` for `ConnectionReadyEvent`, `ConnectionCheckOutFailedEvent`, `ConnectionCheckedOutEvent` should be computed.
Let's do the same for the new `PendingResponseStarted`, `PendingResponseSucceeded`, `PendingResponseFailed` events.
reuse.

```mermaid | ||
sequenceDiagram
The diagram suggests that a connection in the pending response state can be checked out from a pool, and that reading a response which wasn't read in full can be done after the connection having been checked out. Both of these pieces of behavior contradict the pseudocode and the design.
else:
    decrement availableConnectionCount

error = await_pending_response(pool, connection)
There seem to be issues with this pseudocode and the way the "pending response" state is defined:
- `await_pending_response` must be called if and only if `connection is "pending response"`, which is not expressed in the pseudocode.
- The only state the `connection` can be at this point is "available", so `await_pending_response` can never be called.
- The specification instructs to maintain `availableConnectionCount`, `pendingConnectionCount`, `totalConnectionCount` with the invariant being `totalConnectionCount = pendingConnectionCount + availableConnectionCount + <in use connection count, which is not explicitly maintained>` (search for `"pending" + "available" + "in use"`). While I don't know why `availableConnectionCount` is maintained (maybe we should figure this out), the introduction of the new "pending response" state affects the aforementioned, and has to be properly dealt with.
- I haven't checked everything else, but given that currently it looks like the new state was slapped on top of the spec without much regard to the rest of the spec, I would not be surprised if there were more places that have to be adjusted to take into account the new "pending response" state. Update: I realized that there is at least one more place (please do look for more). "Checking In a Connection" currently either closes a connection or moves it to the "available" state. It should move the connection in either "available" or "pending response" state depending on what transpired before it being checked in. So this section needs a change.

I suspect that handling of connections that are in the "pending response" state should be done at the same point where perished connections are handled. Note that being perished is not considered to be a state of a connection by the spec, but merely a value of the perishable property of a connection (both "value" and "property" are not used here in the same sense they are used in programming; that's my best interpretation of the spec as it is now).
We also may consider making "pending response" not a new state of a connection, but rather a value of another property, similarly to how it is done with the "perishable" property and its "perished"/"non-perished" values.
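The counting concern above can be illustrated with a toy model. In the sketch below, connection states are plain strings, and the `inUseConnectionCount` and `pendingResponseConnectionCount` keys are hypothetical (the spec does not maintain them explicitly); the point is only that a connection in a new "pending response" state is not covered by the existing three-term sum.

```python
from collections import Counter
from typing import Dict, List


def pool_counts(states: List[str]) -> Dict[str, int]:
    """Tally connections by state (toy model, not the spec's bookkeeping)."""
    tally = Counter(states)
    return {
        "totalConnectionCount": len(states),
        "pendingConnectionCount": tally["pending"],
        "availableConnectionCount": tally["available"],
        "inUseConnectionCount": tally["in use"],  # hypothetical counter
        "pendingResponseConnectionCount": tally["pending response"],  # hypothetical
    }


def invariant_holds(counts: Dict[str, int]) -> bool:
    """Check total = pending + available + in use, as quoted in the review."""
    return counts["totalConnectionCount"] == (
        counts["pendingConnectionCount"]
        + counts["availableConnectionCount"]
        + counts["inUseConnectionCount"]
    )
```

Whether the fix is a fourth term in the sum or modelling "pending response" as a property of an otherwise available connection is exactly the design question left open here.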
availableConnectionCount MUST be decremented.

```text | ||
##### Awaiting Pending Read (drivers that support CSOT)
Something is wrong with the structure of the spec here:
- Instead of updating the "Checking Out a Connection" section to integrate the new logic into the checking out logic, the new "Awaiting Pending Read (drivers that support CSOT)" subsection was added without any integration. One can understand what this subsection means and how it is supposed to be integrated in the checking out logic only after looking at the pseudocode. But pseudocode is supposed to supplement the prose of the spec, not be a replacement for it. It seems to me that the new logic should be properly described as part of the checking out logic.
- The "Awaiting Pending Read (drivers that support CSOT)" subsection also specifies how an "in use" connection transitions into the "pending response" state. This part has no relation to "Checking Out a Connection", and should not be inside that section.
See also #1675 (comment).
response data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by
consuming buffered data.

1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
This enumerated list does not seem to be introduced in any way, or related to anything. It just exists out there, specifying six instructions. This concern is essentially part of the concern expressed above, and should be addressed as part of that one.
seconds since the start of the "pending response" state or since the last successful read/receive, the driver MUST
attempt to verify the connection’s health by either performing a non-blocking read or using the minimal possible
timeout to check if at least one byte can be read/received. If at least one byte can be read the connection should
be returned to the pool for reuse and a retryable error should be propagated to the operation layer. If no bytes
This item and also item 6 below cannot say "returned to the pool" when talking about a connection that is being checked out, because such a connection is still in the pool, and the `ConnectionCheckedOutEvent` hasn't been emitted for it.
5. **Error or over-age**: If reading from the socket (or draining buffered data) results in an error that is not a
   timeout, or if the connection exceeds the 3 second pending-response window, the driver MUST close the connection.
6. **Clear pending state on success**: If the pending response is fully drained and successfully discarded, and the
   connection remains healthy, the pending state may be cleared and the connection MAY be returned to the pool for
According to the pseudocode, if `await_pending_response` completes without an error, the checking out succeeds for the connection: the connection transitions into the "in use" state, the `ConnectionCheckedOutEvent` is emitted for it. This happens always. I fail to see how "MAY be returned to the pool" makes sense both because of what I have just described, and because of #1675 (comment).
close_connection(conn)

if error is not None:
    raise error
The pseudocode that existed before the current PR used `throw`, for example, `throw PoolClosedError`. The new pseudocode should continue using that, instead of coming up with new syntax.
error = await_pending_response(pool, connection)
if error:
    return error
The pseudocode that existed before the current PR used `throw`, for example, `throw PoolClosedError`. The new pseudocode should continue using that, instead of coming up with new syntax.
What is worse, is that the new pseudocode uses `return` here, but `raise` in two other places. That is, it is not even consistent within itself.
error = await_pending_response(pool, connection)
if error:
    return error
We should fix the pseudocode by emitting the `ConnectionCheckOutFailedEvent` before throwing this `error`. This issue was identified in #1675 (comment), I am just pointing out to at least one place where the change has to be done.
@baileympearson and I triaged https://jira.mongodb.org/browse/DRIVERS-3276 today. We disagree on what to do with that ticket. I think, the work on that ticket should be done as part of the work done in this PR. I expressed more of my thoughts in this Jira comment.
4. **Default timeout**: If no user-provided timeout is specified, the driver MUST use the minimum of (a) the remaining 3
   second "pending response" window and (b) the `socketTimeoutMS` (if supported by the driver) as the effective
   timeout for the read/drain operation.
@sanych-sun, thank you for pointing out that the CMAP spec has `ConnectionPoolOptions.waitQueueTimeoutMS`. If we don't take it into account here (and in the pseudocode), then we are changing its meaning, which may result in surprising behavior from the perspective of users: currently, checking out for a non-CSOT operation is expected to potentially go over `waitQueueTimeoutMS` only if(*) a new connection is created and established as part of checking out; otherwise, the duration of checking out is expected to be within `waitQueueTimeoutMS` (see the documentation for `ConnectionCheckedOutEvent.duration` and `waitQueueTimeoutMS`).
Thus, the timeout for draining should not exceed what is left of `waitQueueTimeoutMS`, and should not exceed the "remaining 3 second "pending response" window". I am unsure if `socketTimeoutMS` needs to be involved at all. I remember proposing `socketTimeoutMS` when I was reviewing the design, because `waitQueueTimeoutMS` did not even come to my mind at that time, but I am not certain about my recollection.
(*) A driver not providing hard real-time guarantees is irrelevant for the purpose of the current comment, which is why I said "only if".
4d0330a to cca5d6c
cca5d6c to 4244306
This PR was closed by mistake. I'll open another PR for the changes, as I can no longer push to this branch, and I will double-check that all still-open comments are resolved.
This PR implements the design for connection pooling improvements described in DRIVERS-2884, based on the CSOT (Client-Side Operation Timeout) spec. It addresses connection churn caused by network timeouts during operations, especially in environments with low client-side timeouts and high latency.
When a connection is checked out after a network timeout, the driver now attempts to resume and complete reading any pending server response (instead of closing and discarding the connection). This may require multiple checkouts.
Each pending response read is subject to a cumulative 3-second static timeout. The timeout is refreshed after each successful read, acknowledging that progress is being made. If no data is read and the timeout is exceeded, the connection is closed.
To reduce unnecessary latency, if the timeout has expired while the connection was idle in the pool, a non-blocking single-byte read is performed; if no data is available, the connection is closed immediately.
This update introduces new CMAP events and logging messages (PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed) to improve observability of this path.