DRIVERS-2884 Avoid connection churn when operations timeout #1675

prestonvasquez · 2024-10-14T21:06:45Z

This PR implements the design for connection pooling improvements described in DRIVERS-2884, based on the CSOT (Client-Side Operation Timeout) spec. It addresses connection churn caused by network timeouts during operations, especially in environments with low client-side timeouts and high latency.

When a connection is checked out after a network timeout, the driver now attempts to resume and complete reading any pending server response (instead of closing and discarding the connection). This may require multiple checkouts.
Each pending response read is subject to a cumulative 3-second static timeout. The timeout is refreshed after each successful read, acknowledging that progress is being made. If no data is read and the timeout is exceeded, the connection is closed.
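
The refreshed window described above can be sketched as follows. This is a minimal illustration, not the spec pseudocode or any driver's implementation: `read_some`, `drain_pending_response`, and the injectable `clock` are hypothetical names, and the sketch ignores the operation's own timeout, which would further bound each attempt.

```python
import time
from typing import Callable

PENDING_RESPONSE_WINDOW_S = 3.0  # the static 3-second window from the description


def drain_pending_response(
    read_some: Callable[[float], bytes],
    remaining_bytes: int,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Read and discard `remaining_bytes`; return True if fully drained.

    `read_some(timeout)` is a hypothetical stand-in for a socket read: it
    returns the bytes received (empty on EOF) or raises TimeoutError when
    nothing arrives within `timeout` seconds.
    """
    deadline = clock() + PENDING_RESPONSE_WINDOW_S
    while remaining_bytes > 0:
        budget = deadline - clock()
        if budget <= 0:
            return False  # window exceeded without progress; caller closes
        try:
            chunk = read_some(budget)
        except TimeoutError:
            return False
        if not chunk:
            return False  # peer closed the connection
        remaining_bytes -= len(chunk)
        # Progress was made, so the window is refreshed.
        deadline = clock() + PENDING_RESPONSE_WINDOW_S
    return True
```

A `False` result here corresponds to the "connection is closed" outcome; a real implementation would also need to distinguish an operation-level timeout, after which the connection stays pending for a later checkout.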

To reduce unnecessary latency, if the timeout has expired while the connection was idle in the pool, a non-blocking single-byte read is performed; if no data is available, the connection is closed immediately.
This update introduces new CMAP events and logging messages (PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed) to improve observability of this path.
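
As a rough illustration of the non-blocking single-byte read mentioned above, here is a sketch for a plain TCP socket. The helper name `has_pending_data` is hypothetical, TLS sockets are not handled (they signal "no data" differently), and the byte that is read is consumed, which is acceptable only because the pending response is being discarded anyway.

```python
import socket


def has_pending_data(sock: socket.socket) -> bool:
    """Return True if at least one byte could be read without blocking.

    Note: the byte is consumed, so the caller must count it toward the
    pending response it is discarding.
    """
    previous_timeout = sock.gettimeout()
    sock.setblocking(False)
    try:
        # recv returns b"" on EOF, which also counts as "no data".
        return len(sock.recv(1)) == 1
    except OSError:
        # Includes BlockingIOError (nothing buffered) and connection errors.
        return False
    finally:
        sock.settimeout(previous_timeout)
```

If this returns `False` after the window has already expired, the description above says the connection is closed immediately instead of waiting on a blocking read.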

Please complete the following before merging:

Update changelog.
Make sure there are generated JSON files from the YAML test files.
Test changes in at least one language driver. Go: GODRIVER-3173 Complete pending reads on conn checkout mongo-go-driver#1977
Test these changes against all server versions and topologies (including standalone, replica set, sharded clusters, and serverless).

source/client-side-operations-timeout/tests/connection-churn.yml

codeowners-service-app · 2025-04-25T21:49:22Z

Assigned qingyang-hu for team dbx-spec-owners-csot because ShaneHarvey is out of office.
Assigned tom-selander for team dbx-spec-owners-sdam because ShaneHarvey is out of office.
Assigned esha-bhargava for team dbx-spec-owners-sdam because ShaneHarvey is out of office.

alcaeus

I'd like to wait until #1792 is merged to review the schema changes. From what I can see in the UTF specification, those changes look good.

source/unified-test-format/unified-test-format.md

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

baileympearson · 2025-05-01T20:16:09Z

source/connection-monitoring-and-pooling/tests/README.md

+
+#### Connection Aliveness Check Fails
+
+1. Initialize a mock TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to


Thoughts on adding a mock server to drivers-evergreen-tools for these tests? I could go either way - there are only two, so the burden on drivers isn't too great but it might be nice if drivers didn't need to worry about the mock server logic themselves.

I’m concerned that this solution will require drivers to spin up a server when trying to test locally. I’ve suggested DRIVERS-3183 to support raw-TCP connection test entities which will allow us to convert these prose tests to a unified spec test in the future.

CC: @baileympearson could you POC what it would take to create a TCP listener that performs a round trip and holds 1 byte on the write side?

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

baileympearson · 2025-05-02T15:58:46Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+  connectionId: int64;
+
+  /**
+   *  The time it took to complete the pending read.


So long as data is still coming back from the socket in intervals of <3s, it is possible for the same connection to require multiple checkout requests to fully exhaust. So, is this duration the total time it took to read all of the data off of the socket (now() - time of timeout), or the amount of time that the checkout request waited on the final pending read?

(same comment for logging events)

I would anticipate this duration to be within the context of ConnectionPendingResponseStarted, i.e. 1 call to await_pending_response.

Agreed. Can we clarify that in the description of duration? We can take inspiration from the definitions of duration for checkout failed and checkout succeeded events. Ex:

/**
 * The time it took to establish the connection.
 * In accordance with the definition of establishment of a connection
 * specified by `ConnectionPoolOptions.maxConnecting`,
 * it is the time elapsed between emitting a `ConnectionCreatedEvent`
 * and emitting this event as part of the same checking out.
 *
 * Naturally, when establishing a connection is part of checking out,
 * this duration is not greater than
 * `ConnectionCheckedOutEvent`/`ConnectionCheckOutFailedEvent.duration`.
 *
 * A driver MAY choose the type idiomatic to the driver.
 * If the type chosen does not convey units, e.g., `int64`,
 * then the driver MAY include units in the name, e.g., `durationMS`.
 */
duration: Duration;

So, maybe something like:

/**
 * The time it took to complete the pending read.
 * This duration is defined as the time elapsed between emitting a `PendingResponseStarted` event
 * and emitting this event as part of the same checking out.
 *
 * A driver MAY choose the type idiomatic to the driver.
 * If the type chosen does not convey units, e.g., `int64`,
 * then the driver MAY include units in the name, e.g., `durationMS`.
 */
duration: Duration;

(same comment for other definitions of duration in this PR).
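
To make the proposed definition concrete, here is a small sketch in which the duration covers exactly one drain attempt within one checkout. The event classes mirror the names used in this PR, but their fields, the `await_with_events` helper, and the `drain`/`emit`/`clock` parameters are assumptions for illustration; the failure path is omitted.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class PendingResponseStarted:
    connection_id: int


@dataclass
class PendingResponseSucceeded:
    connection_id: int
    duration_ms: float  # elapsed since the matching Started event


def await_with_events(
    connection_id: int,
    drain: Callable[[], None],
    emit: Callable[[object], None],
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Run one drain attempt and report its duration for this checkout only."""
    started_at = clock()
    emit(PendingResponseStarted(connection_id))
    drain()  # hypothetical: reads and discards the pending response
    elapsed_ms = (clock() - started_at) * 1000.0
    emit(PendingResponseSucceeded(connection_id, elapsed_ms))
```

Under this reading, a connection that needs several checkouts to drain would produce several started/finished pairs, each with its own duration, and none of them would measure the time since the original network timeout.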

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

source/connection-monitoring-and-pooling/tests/README.md

baileympearson · 2025-05-02T16:12:51Z

source/connection-monitoring-and-pooling/tests/logging/connection-logging.yml

+
+  - description: "force a pending response read, fail first try, succeed second try"
+    operations:
+      - name: createEntities


If possible, can we add a test that demonstrates that when the pending read checkout has no timeoutMS set, we use socket_timeout_ms (if it is <3s)?

Great catch! The Go Driver doesn’t support socket timeouts, which are technically a deprecated option. Perhaps @ShaneHarvey can opine. If we decide to add this test, would you mind implementing it, since the Go Driver has no way of verifying it?

alcaeus

Changes to the unified test format LGTM.

@prestonvasquez as per our conversation around where to add the missing event names in #1782, this schema version would be an ideal candidate as it already adds new events to the list.

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

baileympearson · 2025-05-05T20:43:27Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+  connectionId: int64;
+
+  /**
+   *  The time it took to complete the pending read.


Agreed. Can we clarify that in the description of duration? We can take inspiration from the definitions of duration for checkout failed and checkout succeeded events. Ex:

/**
 * The time it took to establish the connection.
 * In accordance with the definition of establishment of a connection
 * specified by `ConnectionPoolOptions.maxConnecting`,
 * it is the time elapsed between emitting a `ConnectionCreatedEvent`
 * and emitting this event as part of the same checking out.
 *
 * Naturally, when establishing a connection is part of checking out,
 * this duration is not greater than
 * `ConnectionCheckedOutEvent`/`ConnectionCheckOutFailedEvent.duration`.
 *
 * A driver MAY choose the type idiomatic to the driver.
 * If the type chosen does not convey units, e.g., `int64`,
 * then the driver MAY include units in the name, e.g., `durationMS`.
 */
duration: Duration;

So, maybe something like:

/**
 * The time it took to complete the pending read.
 * This duration is defined as the time elapsed between emitting a `PendingResponseStarted` event
 * and emitting this event as part of the same checking out.
 *
 * A driver MAY choose the type idiomatic to the driver.
 * If the type chosen does not convey units, e.g., `int64`,
 * then the driver MAY include units in the name, e.g., `durationMS`.
 */
duration: Duration;

baileympearson · 2025-05-05T20:43:38Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+  connectionId: int64;
+
+  /**
+   *  The time it took to complete the pending read.


(same comment for other definitions of duration in this PR).

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

source/connection-monitoring-and-pooling/tests/README.md

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

source/client-side-operations-timeout/tests/pending-response.yml

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

source/retryable-reads/retryable-reads.md

isabelatkinson

retryability spec changes LGTM. Rust does not have CSOT, so I'm not able to POC/test

sanych-sun · 2025-09-02T19:46:27Z

source/client-side-operations-timeout/tests/pending-response.yml

+        expectError:
+          isTimeoutError: true
+      # Execute a subsequent operation to complete the read.
+      - name: findOne


The findOne operation is optional and not all drivers implement it (we do not support it in the CSharp Driver, for example). Can we replace its usage with find?

sanych-sun · 2025-09-10T20:47:42Z

source/connection-monitoring-and-pooling/tests/logging/connection-logging-csot.yml

+
+      # Execute a subsequent operation which should time out during the
+      # pending response read attempt.
+      - name: findOne


Please use find here instead of findOne.

sanych-sun · 2025-09-10T21:26:39Z

source/connection-monitoring-and-pooling/tests/logging/connection-logging-csot.yml

+              serverPort: { $$type: [int, long] }
+              driverConnectionId: { $$type: [int, long] }
+              requestId: { $$type: [int, long] }
+              reason: "timeout"


Shouldn't we expect a Connection checkout failed event here, to signal that the previous checkout attempt failed?

@ShaneHarvey Can you opine on this? In the original review it was decided to mute ConnectionCheckOutFailed when draining a connection during the check out process. However, this wasn't mentioned in the scope or in the documentation updates for CMAP on this PR. Should the ConnectionCheckOutFailed event be propagated when we return an error to the operation layer after attempting to drain a pending response?

CC: @stIncMale

From the log reader's perspective: if one needs to investigate why some operations take longer than usual, the logical path is to go from the operation level down to more detailed levels IN BULK. So the first thing to check would be the operation logs, then server selection (did it take longer than usual, were there any errors), then connection checkout (the duration of the checkout, were there any errors)... and if we DO NOT raise a checkout failed event, then the fact that checkouts failed because of pending read failures might be missed until one decides to investigate a particular operation step by step and sees that a checkout was started but never completed (as per the logs). Such behavior would be really confusing.
Moreover: if we decide to "reduce noise in the logs", then why are we reporting success twice: Pending response succeeded and then Connection checked out?

From my understanding we should either report 2 sets of "started" and "succeeded"/"failed" events, or report only the "started" event for pending reads and imply that, depending on the pending read result, we will report "Connection checked out" or "Connection checkout failed".

Yeah a CheckoutFailed is required here since a CheckoutStarted always needs a corresponding CheckoutFailed or CheckoutSucceeded event.
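
The pairing requirement stated above can be checked mechanically on a captured event log. The sketch below is an assumption-laden illustration: `unmatched_checkouts` is a hypothetical helper, events are reduced to their names, and the log is assumed to come from a single thread so that checkouts do not interleave.

```python
from typing import Iterable

# Events that conclude a checkout attempt in CMAP.
TERMINAL_EVENTS = {"ConnectionCheckedOutEvent", "ConnectionCheckOutFailedEvent"}


def unmatched_checkouts(event_names: Iterable[str]) -> int:
    """Count checkout-started events that never received a terminal event."""
    open_checkouts = 0
    for name in event_names:
        if name == "ConnectionCheckOutStartedEvent":
            open_checkouts += 1
        elif name in TERMINAL_EVENTS and open_checkouts > 0:
            open_checkouts -= 1
        # Other events (e.g. PendingResponseStarted) do not affect pairing.
    return open_checkouts
```

With the muted behavior questioned in this thread, a log of `ConnectionCheckOutStartedEvent`, `PendingResponseStarted`, `PendingResponseFailed` leaves one checkout unmatched; appending `ConnectionCheckOutFailedEvent` brings the count back to zero.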

Sounds good, will try to get an update in ASAP. Thanks @ShaneHarvey

stIncMale

I started reading the PR to prepare for triaging https://jira.mongodb.org/browse/DRIVERS-3276, and left some comments. These comments do not represent a result of a full review, nor do I know if I will do such a review.

stIncMale · 2025-09-15T21:59:39Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+ */
+interface PendingResponseStarted {
+  /**
+   *  The ServerAddress of the Endpoint the pool is attempting to connect to.


Looks like this description was copied from another event, but it can't be the correct description of what PendingResponseStarted.address is: when a pool is attempting to read a pending server response, it does so using a connection that has already been established.

The same applies to PendingResponseSucceeded, PendingResponseFailed.

stIncMale · 2025-09-15T22:20:08Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+  /**
+   *  The driver-generated request ID associated with the network timeout.
+   */
+  requestId: int64;


We already have CommandStartedEvent/CommandSucceededEvent/CommandFailedEvent.requestId specified in https://github.com/mongodb/specifications/blob/master/source/command-logging-and-monitoring/command-logging-and-monitoring.md#events-api.

I am guessing that PendingResponseStarted/PendingResponseSucceeded/PendingResponseFailed.requestId represent the same thing - the ID of a command. If this is the case, the specification should explicitly point that out, instead of expecting that a reader will somehow guess this (to guess this correctly it is required but not sufficient to know and remember that there is CommandStartedEvent/CommandSucceededEvent/CommandFailedEvent.requestId).

Assuming that the above guess is correct, which command must PendingResponseStarted/PendingResponseSucceeded/PendingResponseFailed.requestId correspond to: the one that failed with a network timeout and left a response not completely read from the connection, or the one for which the connection is being checked out? In a way, they are both "associated" with that timeout, and the specification should make it clear which one it talks about.

stIncMale · 2025-09-15T22:21:07Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+}
+
+/**
+ *  Emitted when the connection successfully read the pending read and is ready


I suspect, "response" was meant to be used here:

Suggested change

- * Emitted when the connection successfully read the pending read and is ready
+ * Emitted when the connection successfully read the pending response and is ready

But if not, then what does it mean to "read the pending read"?

stIncMale · 2025-09-16T00:37:27Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
+    timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
+    consumed while explicitly awaiting the pending response as part of checking out the connection.
+2. **Aliveness check**: If the undrained connection remains idle (i.e. no data is read or received) for more than 3


Is "undrained connection" different from a connection in the "pending response" state? If yes, then the specification should explain what "undrained connection" is. If there is no difference, then the specification should use only one term consistently.

stIncMale · 2025-09-16T01:27:41Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

   *
-   *   - "available":     The Connection has been established and is waiting in the pool to be checked
-   *                      out. Contributes to both totalConnectionCount and availableConnectionCount.
+   *   - "pending response":  The Connection is attempting to discard a response for an operation where the socket timed


Here, the state is described as "Connection is attempting to discard a response". Not "read" (which is what I would have expected to be mentioned), not "drain", not "consume", not "receive", not "execute" but only "discard". In other places, however, all of those words are used, and even combinations of them:

read/drain operation

drained and discarded vs drained and successfully discarded vs read and discard

"pending response" drain

bytes are successfully read, received, or consumed

reading from the socket (or draining buffered data)

data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by
consuming buffered data. - The specification does not rely on this distinction, nor can I actually see a meaningful distinction, as any practical TCP implementation has to buffer inbound data one way or another. If a single term, like "reading", is deemed unclear by maintainers of different drivers, let's specify in one place that for the purpose of the specification the term "[pick one term]" is going to be used to denote [explain the meaning taking into account all the implementations quirks that are deemed necessary], and then use the single picked term consistently.

draining buffered data vs consuming buffered data

execute_pending_response

source/connection-monitoring-and-pooling/tests/README.md also uses different words in different places, and it is unclear whether they mean the same or not

drain the rest of the response

discard bytes from the TCP stream

Are all these terms meaningfully different? If yes, they should be defined clearly, and used strictly according to the definition. If not, then the specification should use a single word to refer to the same thing.

stIncMale · 2025-09-16T01:32:21Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+
+1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket
+    timeout. This timestamp MUST be updated to the current time whenever any bytes are successfully read, received, or
+    consumed while explicitly awaiting the pending response as part of checking out the connection.


What is the difference between "awaiting a pending response" used previously and "explicitly awaiting the pending response" used here?

stIncMale · 2025-09-16T01:46:48Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

 availableConnectionCount MUST be decremented.

-```text
+##### Awaiting Pending Read (drivers that support CSOT)


The specification introduces a new "pending response" connection state. But then there is this "Awaiting Pending Read" section, which uses the "pending read" term exactly once, and mostly uses "pending response" instead. The "Events" section, on the other hand, uses "pending read".

What is the difference between the meaning of "pending response" and "pending read"?

stIncMale · 2025-09-16T22:38:41Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

 else:
    decrement availableConnectionCount
+
+error = await_pending_response(pool, connection)


await_pending_response accepts timeout and conn, based on its pseudocode. But here pool is passed instead of timeout. I don't think this is correct, especially given that timeout cannot be extracted from pool.

stIncMale · 2025-09-16T22:51:43Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+ *  Emitted when the connection being checked out is attempting to read and
+ *  discard a pending server response.
+ */
+interface PendingResponseStarted {


We have pseudocode showing when most other pool events should be emitted. Furthermore, we have pseudocode showing how duration for ConnectionReadyEvent, ConnectionCheckOutFailedEvent, ConnectionCheckedOutEvent should be computed.

Let's do the same for the new PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed events.

stIncMale · 2025-09-17T02:18:09Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+    reuse.
+
+```mermaid
+sequenceDiagram


The diagram suggests that a connection in the pending response state can be checked out from a pool, and that reading a response which wasn't read in full can be done after the connection having been checked out. Both of these pieces of behavior contradict the pseudocode and the design.

stIncMale · 2025-09-17T04:06:47Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

 else:
    decrement availableConnectionCount
+
+error = await_pending_response(pool, connection)


There seem to be issues with this pseudocode and the way the "pending response" state is defined:

await_pending_response must be called if and only if connection is "pending response", which is not expressed in the pseudocode.

The only state the connection can be at this point is "available", so await_pending_response can never be called.

The specification instructs to maintain availableConnectionCount, pendingConnectionCount, totalConnectionCount with the invariant being totalConnectionCount = pendingConnectionCount + availableConnectionCount + <in use connection count, which is not explicitly maintained> (search for "pending" + "available" + "in use"). While I don't know why availableConnectionCount is maintained (maybe we should figure this out), the introduction of the new "pending response" state affects the aforementioned, and has to be properly dealt with.

I haven't checked everything else, but given that currently it looks like the new state was slapped on top of the spec without much regard to the rest of the spec, I would not be surprised if there were more places that have to be adjusted to take into account the new "pending response" state. Update: I realized that there is at least one more place (please do look for more). "Checking In a Connection" currently either closes a connection or moves it to the "available" state. It should move the connection to either the "available" or the "pending response" state, depending on what transpired before it was checked in. So this section needs a change.

I suspect that handling of connections that are in the "pending response" state should be done at the same point where perished connections are handled. Note that being perished is not considered to be a state of a connection by the spec, but merely a value of the perishable property of a connection (both "value" and "property" are not used here in the same sense they are used in programming; that's my best interpretation of the spec as it is now).

We also may consider making "pending response" not a new state of a connection, but rather a value of another property, similarly to how it is done with the "perishable" property and its "perished"/"non-perished" values.

stIncMale · 2025-09-17T04:29:16Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

 availableConnectionCount MUST be decremented.

-```text
+##### Awaiting Pending Read (drivers that support CSOT)


Something is wrong with the structure of the spec here:

Instead of updating the "Checking Out a Connection" section to integrate the new logic into the checking out logic, the new "Awaiting Pending Read (drivers that support CSOT)" subsection was added without any integration. One can understand what this subsection means and how it is supposed to be integrated in the checking out logic only after looking at the pseudocode. But pseudocode is supposed to supplement the prose of the spec, not be a replacement for it. It seems to me that the new logic should be properly described as part of the checking out logic.

The "Awaiting Pending Read (drivers that support CSOT)" subsection also specifies how an "in use" connection transitions into the "pending response" state. This part has no relation to "Checking Out a Connection", and should not be inside that section.

See also #1675 (comment).

stIncMale · 2025-09-17T04:34:30Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+response data is drained and discarded either by explicit reads or, in push-based I/O implementations (e.g. Node.JS), by
+consuming buffered data.
+
+1. **Persist and update timestamp**: The connection must record the current time immediately after the original socket


This enumerated list does not seem to be introduced in any way, or related to anything. It just exists out there, specifying six instructions. This concern is essentially part of the concern expressed above, and should be addressed as part of that one.

stIncMale · 2025-09-17T04:43:05Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+    seconds since the start of the "pending response" state or since the last successful read/receive, the driver MUST
+    attempt to verify the connection’s health by either performing a non-blocking read or using the minimal possible
+    timeout to check if at least one byte can be read/received. If at least one byte can be read the connection should
+    be returned to the pool for reuse and a retryable error should be propagated to the operation layer. If no bytes


This item and also item 6 below cannot say "returned to the pool" when talking about a connection that is being checked out, because such a connection is still in the pool, and the ConnectionCheckedOutEvent hasn't been emitted for it.

stIncMale · 2025-09-17T04:47:29Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+5. **Error or over-age**: If reading from the socket (or draining buffered data) results in an error that is not a
+    timeout, or if the connection exceeds the 3 second pending-response window, the driver MUST close the connection.
+6. **Clear pending state on success**: If the pending response is fully drained and successfully discarded, and the
+    connection remains healthy, the pending state may be cleared and the connection MAY be returned to the pool for


According to the pseudocode, if await_pending_response completes without an error, the checking out succeeds for the connection: the connection transitions into the "in use" state, the ConnectionCheckedOutEvent is emitted for it. This happens always. I fail to see how "MAY be returned to the pool" makes sense both because of what I have just described, and because of #1675 (comment).

stIncMale · 2025-09-17T05:03:50Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+        close_connection(conn)
+
+    if error is not None:
+        raise error


The pseudocode that existed before the current PR used throw, for example, throw PoolClosedError. The new pseudocode should continue using that, instead of coming up with new syntax.

stIncMale · 2025-09-17T05:05:41Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+
+error = await_pending_response(pool, connection)
+if error:
+  return error


The pseudocode that existed before the current PR used throw, for example, throw PoolClosedError. The new pseudocode should continue using that, instead of coming up with new syntax.

What is worse is that the new pseudocode uses return here, but raise in two other places. That is, it is not even consistent within itself.

stIncMale · 2025-09-17T05:07:43Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+
+error = await_pending_response(pool, connection)
+if error:
+  return error


We should fix the pseudocode by emitting the ConnectionCheckOutFailedEvent before throwing this error. This issue was identified in #1675 (comment); I am just pointing out at least one place where the change has to be done.

stIncMale · 2025-09-17T05:27:55Z

@baileympearson and I triaged https://jira.mongodb.org/browse/DRIVERS-3276 today. We disagree on what to do with that ticket. I think, the work on that ticket should be done as part of the work done in this PR. I expressed more of my thoughts in this Jira comment.

stIncMale · 2025-09-18T20:56:13Z

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md

+4. **Default timeout**: If no user-provided timeout is specified, the driver MUST use the minimum of (a) the remaining 3
+    second "pending response" window and (b) the `socketTimeoutMS` (if supported by the driver) as the effective
+    timeout for the read/drain operation.


@sanych-sun, thank you for pointing out that the CMAP spec has ConnectionPoolOptions.waitQueueTimeoutMS. If we don't take it into account here (and in the pseudocode), then we are changing its meaning, which may result in surprising behavior from the perspective of users: currently, checking out for a non-CSOT operation is expected to potentially go over waitQueueTimeoutMS only if^(*) a new connection is created and established as part of checking out; otherwise, the duration of checking out is expected to be within waitQueueTimeoutMS (see the documentation for ConnectionCheckedOutEvent.duration and waitQueueTimeoutMS).

Thus, the timeout for draining should not exceed what is left of waitQueueTimeoutMS, and should not exceed the "remaining 3 second "pending response" window". I am unsure if socketTimeoutMS needs to be involved at all. I remember proposing socketTimeoutMS when I was reviewing the design, because waitQueueTimeoutMS did not even come to my mind at that time, but I am not certain about my recollection.

^(*) A driver not providing hard real-time guarantees is irrelevant for the purpose of the current comment, which is why I said "only if".
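
The bound proposed in this comment can be written down as a small calculation. This is only a sketch of the reviewer's suggestion, not agreed spec behavior: `drain_timeout_ms` and its parameters are hypothetical, and `socketTimeoutMS` is deliberately left out because the comment is unsure whether it should be involved.

```python
from typing import Optional

PENDING_RESPONSE_WINDOW_MS = 3000  # the 3 second "pending response" window


def drain_timeout_ms(
    window_elapsed_ms: float,
    wait_queue_remaining_ms: Optional[float] = None,
) -> float:
    """Upper bound for one drain attempt when no operation timeout is set.

    `wait_queue_remaining_ms` is what is left of waitQueueTimeoutMS for the
    current checkout, or None if that option is not configured.
    """
    remaining_window = max(0.0, PENDING_RESPONSE_WINDOW_MS - window_elapsed_ms)
    if wait_queue_remaining_ms is None:
        return remaining_window
    return min(remaining_window, max(0.0, wait_queue_remaining_ms))
```

For example, 1000 ms into the window with 500 ms of `waitQueueTimeoutMS` left, the drain would be capped at 500 ms, so the checkout would still finish within the wait queue timeout that users already expect.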

sanych-sun · 2025-09-23T18:55:57Z

This PR was closed by mistake. I'll open another PR for the changes, as I can no longer push to the branch, and I will make sure to double-check that all still-open comments are resolved.
Sorry for the inconvenience.

prestonvasquez requested a review from ShaneHarvey October 14, 2024 21:13

ShaneHarvey reviewed Oct 15, 2024

source/client-side-operations-timeout/tests/connection-churn.yml (outdated)

source/client-side-operations-timeout/tests/connection-churn.yml (outdated)

prestonvasquez requested a review from ShaneHarvey October 21, 2024 18:15

ShaneHarvey requested changes Apr 17, 2025

source/client-side-operations-timeout/tests/connection-churn.yml (outdated)

source/client-side-operations-timeout/tests/connection-churn.yml (outdated)

prestonvasquez marked this pull request as ready for review April 25, 2025 21:36

prestonvasquez requested review from a team as code owners April 25, 2025 21:36

prestonvasquez requested review from ShaneHarvey, alcaeus, baileympearson and stIncMale and removed request for a team April 25, 2025 21:36

codeowners-service-app bot requested a review from qingyang-hu April 25, 2025 21:49

prestonvasquez removed the request for review from qingyang-hu April 25, 2025 22:08

codeowners-service-app bot requested a review from qingyang-hu April 25, 2025 22:15

prestonvasquez removed request for qingyang-hu and stIncMale April 29, 2025 18:42

alcaeus reviewed Apr 30, 2025

source/unified-test-format/unified-test-format.md (outdated)

prestonvasquez mentioned this pull request Apr 30, 2025

GODRIVER-3173 Complete pending reads on conn checkout mongodb/mongo-go-driver#1977

Open

prestonvasquez requested a review from alcaeus April 30, 2025 23:02

codeowners-service-app bot requested a review from qingyang-hu May 2, 2025 07:53

baileympearson requested changes May 2, 2025

alcaeus approved these changes May 5, 2025

prestonvasquez requested a review from baileympearson May 5, 2025 19:34

baileympearson requested changes May 6, 2025

ShaneHarvey reviewed May 6, 2025

source/client-side-operations-timeout/tests/pending-response.yml (outdated)

source/connection-monitoring-and-pooling/connection-monitoring-and-pooling.md (outdated)

prestonvasquez removed the request for review from qingyang-hu May 6, 2025 23:28

prestonvasquez removed the request for review from JamesKovacs July 14, 2025 19:53

codeowners-service-app bot requested a review from tom-selander July 14, 2025 19:56

isabelatkinson reviewed Jul 21, 2025

source/retryable-reads/retryable-reads.md (outdated, resolved)

prestonvasquez requested a review from isabelatkinson July 23, 2025 21:20

isabelatkinson approved these changes Jul 23, 2025

jyemin removed the request for review from tom-selander August 1, 2025 20:02

codeowners-service-app bot requested a review from esha-bhargava August 7, 2025 19:18

prestonvasquez force-pushed the DRIVERS-2884 branch 8 times, most recently from 2761fb1 to bb4e2db Compare August 29, 2025 19:28

sanych-sun requested changes Sep 2, 2025

prestonvasquez requested a review from sanych-sun September 3, 2025 17:58

sanych-sun requested changes Sep 10, 2025

sanych-sun reviewed Sep 10, 2025

stIncMale requested changes Sep 16, 2025

stIncMale requested changes Sep 17, 2025

stIncMale reviewed Sep 18, 2025

sanych-sun force-pushed the DRIVERS-2884 branch from 4d0330a to cca5d6c Compare September 23, 2025 18:28

sanych-sun closed this Sep 23, 2025

sanych-sun force-pushed the DRIVERS-2884 branch from cca5d6c to 4244306 Compare September 23, 2025 18:31

sanych-sun mentioned this pull request Sep 29, 2025

DRIVERS-2884: CSOT avoid connection churn when operations timeout #1845

Open

3 tasks


		#### Connection Aliveness Check Fails

		1. Initialize a mock TCP listener to simulate the server-side behavior. The listener should write at least 5 bytes to

	* Emitted when the connection successfully read the pending read and is ready
	* Emitted when the connection successfully read the pending response and is ready

prestonvasquez commented Oct 14, 2024 • edited

codeowners-service-app bot commented Apr 25, 2025 • edited

prestonvasquez May 2, 2025 • edited

sanych-sun Sep 2, 2025 • edited

sanych-sun Sep 10, 2025 • edited

prestonvasquez Sep 11, 2025 • edited

sanych-sun Sep 11, 2025 • edited

stIncMale Sep 15, 2025 • edited

stIncMale Sep 15, 2025 • edited

stIncMale Sep 16, 2025 • edited

stIncMale Sep 16, 2025 • edited

stIncMale Sep 17, 2025 • edited

stIncMale Sep 17, 2025 • edited