DRIVERS-2884: CSOT avoid connection churn when operations timeout #1845
Conversation
I reviewed the unified test schema changes alone and those LGTM. I'll defer to CSOT spec folks for the other files.
@sanych-sun Have we implemented these changes in any driver? Which driver is supposed to be the second implementer? iirc it was originally Python
    close connection
    connection = Null
    if connection is in "pending response" state:
        drain the pending response
Do we have pseudocode for "drain the pending response" somewhere? I don't see it in this PR.
Since the current pseudocode already covers event emission, I'm not sure we need pseudocode for "drain the pending response": it basically means "consume the bytes from the underlying stream/socket and ignore them".
    if connection is perished:
        close connection
        connection = Null
    if connection is in "pending response" state:
Connection pools are FIFO. Considering the case where we are successfully reading chunks of data off of the socket but we're timing out before we finish the read (the final else block in your pseudocode), won't checking the connection back into the pool and looping just cause this request to immediately check out the same connection again (the behavior https://jira.mongodb.org/browse/DRIVERS-3276 is intending to avoid)?
And regarding https://jira.mongodb.org/browse/DRIVERS-3276: are we convinced that it's better to potentially block a number of operations on the slow connection than to block just one?
We check the connection back in only if the current checkout has timed out (either because of CSOT or because waitQueueTimeout has elapsed) while the "pending response sliding window" has not yet expired. But the "current" operation has timed out anyway, so it should raise an error rather than attempt another connection.
If there is a "slow to drain" connection in the pool, it will cause the same kind of problem regardless of whether we retry at the operation level or attempt to drain several connections inside the checkout, because such a connection returns to the pool and affects the next "unlucky" operation either way.
    if connection is in "pending response" state:
        drain the pending response
        if error:
            if non-timeout error:
Pseudocode needs to include new event and logging emissions.
Done
    /**
     * The driver-generated request ID of the operation that caused the pending response state.
     */
    requestId: int64;
Seems odd that the events emitted for an operation include a `requestId` that corresponds to the request that put the connection into the pending response state. What is the value in this datapoint?
If we decide to keep it, do you think a more precise name would be beneficial? `requestId` is ambiguous: users could easily assume it refers to the request that is reading the pending response, not the operation that made the connection pending.
The idea of reporting the original `requestId` is to track how long and how many draining attempts were made before success or failure. Honestly, I would prefer to have BOTH the current requestId and the "original timed-out requestId", but that would only make sense if the other check-out events had a "current requestId" field, which they don't =(
This might depend on driver internals but in Python the new request ID will not be available at this point because the command has not been serialized (we need to checkout the connection first).
As for including the old requestId, I'm not sure how useful it is but it seems harmless to add. The driver needs to validate requestId == responseTo field on the server reply using the old requestId anyway to make sure the wire protocol isn't violated so that value will be available.
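For illustration, that responseTo check could look like the sketch below. The helper name and error type are assumptions; the 16-byte standard message header layout (messageLength, requestID, responseTo, opCode, each a little-endian int32) is from the wire protocol.

```python
import struct

def validate_response_to(header: bytes, pending_request_id: int) -> None:
    """Verify a drained reply answers the timed-out request.

    `header` is the first 16 bytes of the server reply; `pending_request_id`
    is the requestId the driver recorded when the operation timed out.
    """
    # Standard header: messageLength, requestID, responseTo, opCode.
    _, _, response_to, _ = struct.unpack("<iiii", header[:16])
    if response_to != pending_request_id:
        raise RuntimeError(
            f"wire protocol violation: expected responseTo "
            f"{pending_request_id}, got {response_to}"
        )
```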
Thank you Shane! I suppose we want to keep the field. @baileympearson is this OK with you?
    ## Changelog

    - 2025-01-22: Clarify durationMS in logs may be Int32/Int64/Double.
Please update the changelog.
Done.
source/client-side-operations-timeout/tests/pending-response.yml
    - name: insertOne
      object: *collection
      arguments:
        timeoutMS: 50
        document: {_id: 3, x: 1}
Same comment as on the test above.
- `sendBytes`: We have 3 possible states here:
  1. Message size was partially read: random value between 1 and 3 inclusive
  2. Message size was read, body was not read at all: use 4
  3. Message size was read, body read partially: random value between 5 and 100 inclusive
The `insert` is the first command on the connection, so where are the values for `sendBytes` coming from?
This is a parameter for the proxy, which specifies how many bytes should be streamed before sleeping.
The value of `sendBytes` is determined based on the state of the connection. But this is the first operation on the connection, so the connection will never have anything in the buffer, and it isn't clear what the value of `sendBytes` should be.
Unless I misunderstand what sub-bullets 1, 2, and 3 mean?
> This test verifies that if only part of a response was read before the timeout, the driver can drain the rest of the response and reuse the connection for the next operation.
This doesn't match the actual contents of the test, so either the description or the test needs to be updated.
But if it's the description that is inaccurate: with `sendAll: true` and the events (only one pending read started + finished pair), the full response will be read in the first operation. Don't we have coverage for this scenario from our unified tests?
We are using a proxy server here to control when and how the server response arrives on the client side. So in step 2, in addition to the regular insert command payload, we have to add an additional `proxyTest` property, which instructs the proxy to emulate a timeout. It works as follows:
- it sends the request to the server
- based on the `sendBytes` parameter, it instantly streams the requested number of bytes
- it sleeps for `delayMS`
- it streams the rest of the response

I think it makes sense to add this explanation to the test summary.
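A minimal sketch of that proxy behaviour, under the assumption that the proxy already holds the full server response in memory (the function name and `send` callback are illustrative; only the `sendBytes`/`delayMS` semantics come from the discussion above):

```python
import time

def relay_with_delay(send, response: bytes, send_bytes: int, delay_ms: int) -> None:
    """Forward a server response, pausing mid-stream to provoke a client timeout."""
    # Stream the first sendBytes bytes immediately...
    send(response[:send_bytes])
    # ...sleep for delayMS so the client times out mid-read...
    time.sleep(delay_ms / 1000.0)
    # ...then stream the remainder of the response.
    send(response[send_bytes:])
```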
Okay, maybe my confusion stems from the interaction between `sendBytes` and `sendAll`. Does `sendAll` being enabled not mean that the proxy will forward the full response back to the client?
…-and-pooling.md Co-authored-by: Bailey Pearson <[email protected]>
    tConnectionDrainingStarted = current instant (use a monotonic clock if possible)
    emit PendingResponseStartedEvent and equivalent log message
    drain the pending response
    if error:
The mix of exception/error paradigms in this pseudocode block is a bit confusing. We should rewrite this new pseudocode to use try/catch like the existing code.
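Something along these lines, perhaps. This is a Python-flavoured sketch of the try/catch shape, not the spec's final pseudocode: the helper name, the `emit` callback, and the abbreviated event names are assumptions.

```python
def drain_checked_out_connection(connection, emit) -> None:
    """Attempt to drain a pending response, emitting CMAP-style events."""
    emit("PendingResponseStarted")
    try:
        connection.drain_pending_response()
    except TimeoutError:
        # Timed out again: leave the connection pending for a later attempt.
        emit("PendingResponseFailed")
        raise
    except Exception:
        # Any non-timeout error perishes the connection.
        emit("PendingResponseFailed")
        connection.close()
        raise
    else:
        emit("PendingResponseSucceeded")
```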
This PR is based on Preston's work in another PR that was closed by mistake: #1675
This PR implements the design for connection pooling improvements described in DRIVERS-2884, based on the CSOT (Client-Side Operation Timeout) spec. It addresses connection churn caused by network timeouts during operations, especially in environments with low client-side timeouts and high latency.
When a connection is checked out after a network timeout, the driver now attempts to resume and complete reading any pending server response (instead of closing and discarding the connection). This may require multiple connections to be attempted during connection checkout from the pool.
Each pending-response drain is subject to a cumulative 3-second static timeout. The timeout is refreshed after each successful read, acknowledging that progress is being made. If no data is read and the timeout is exceeded, the connection is closed.
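The sliding-window behaviour could be sketched as below. Names are illustrative assumptions; a real driver would also cap each socket read with the remaining window rather than rely on `read_fn` returning promptly.

```python
import time

DRAIN_WINDOW_SECONDS = 3.0  # the cumulative static timeout described above

def drain_with_sliding_window(read_fn, remaining: int, clock=time.monotonic) -> None:
    """Drain `remaining` bytes, refreshing the window after each successful read."""
    deadline = clock() + DRAIN_WINDOW_SECONDS
    while remaining > 0:
        if clock() > deadline:
            # No progress within the window: caller should close the connection.
            raise TimeoutError("pending response drain exceeded window")
        chunk = read_fn(min(remaining, 4096))
        if not chunk:
            raise ConnectionError("connection closed while draining")
        remaining -= len(chunk)
        # Progress was made: refresh the window.
        deadline = clock() + DRAIN_WINDOW_SECONDS
```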
This update introduces new CMAP events and logging messages (PendingResponseStarted, PendingResponseSucceeded, PendingResponseFailed) to improve observability of this path.