-
Notifications
You must be signed in to change notification settings - Fork 247
DRIVERS-3391: Clarify withTransaction CSOT timeout error with backoff+jitter #1890
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from 3 commits
47c94b1
14870f2
03f12b3
5ef3afd
87b3cf8
160d1fb
8aad52a
634a14d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -565,6 +565,11 @@ the ClientSession `defaultTimeoutMS` option, and once more with the timeout spec | |
|
|
||
| Tests in this section MUST only run against replica sets and sharded clusters with server versions 4.4 or higher. | ||
|
|
||
| It is recommended that drivers run these tests with | ||
| [jitter](../../transactions-convenient-api/transactions-convenient-api.md#clientsessionwithtransaction) disabled (set to | ||
| 0\) to reduce the likelihood of flakiness due to varying | ||
| [backoff times](../../transactions-convenient-api/transactions-convenient-api.md#backoff-benefits). | ||
|
|
||
| #### timeoutMS is refreshed for abortTransaction if the callback fails | ||
|
|
||
| 1. Using `internalClient`, drop the `db.coll` collection. | ||
|
|
@@ -602,6 +607,50 @@ Tests in this section MUST only run against replica sets and sharded clusters wi | |
| 1. `command_started` and `command_failed` events for an `insert` command. | ||
| 2. `command_started` and `command_failed` events for an `abortTransaction` command. | ||
|
|
||
| #### withTransaction applies a single timeout across retries due to transient errors | ||
|
||
|
|
||
| 1. Using `internalClient`, drop the collection `db.coll`. | ||
|
|
||
| 2. Using `internalClient`, set the following fail point: | ||
|
|
||
| ```javascript | ||
| { | ||
| configureFailPoint: "failCommand", | ||
| mode: "alwaysOn", | ||
| data: { | ||
| failCommands: ["insert"], | ||
| blockConnection: true, | ||
| blockTimeMS: 25, | ||
| errorCode: 24, | ||
| errorLabels: ["TransientTransactionError"] | ||
| } | ||
| } | ||
| ``` | ||
|
|
||
| 3. Create a new MongoClient `client` with default settings. | ||
|
|
||
| 4. Using `client`, create a new explicit `session` with `defaultTimeout=200ms`. | ||
|
|
||
| 5. Initialize an `attempt` counter to `0`. | ||
|
|
||
| 6. Using `session`, execute `withTransaction` with a callback that:- | ||
|
|
||
| - increment the `attempt` counter | ||
| - Inserts the document `{ x: 1 }` into `db.coll` | ||
|
|
||
| 7. Expect this to fail with a CSOT timeout error (this depends on the driver's error/exception type) | ||
| `MongoOperationTimeoutException` for Java). | ||
|
|
||
| 8. Verify that there has been at least 2 attempts to execute the `insert` command as part of the `withTransaction` call. | ||
| (`attempt > 1`) | ||
|
|
||
| **Rationale:** This test verifies that when `withTransaction` encounters transient transaction errors (such as lock | ||
| acquisition failures with error code 24), the retry attempts share the same timeout budget rather than resetting it. | ||
| Each retry consumes time from the original 200ms timeout, and the cumulative delay from retries exceeds the available | ||
| time budget, resulting in a timeout error. This test also ensures that the driver does not throw the lock acquisition | ||
| error directly to the user, but instead surfaces a timeout error after exhausting the retry attempts within the | ||
| specified timeout. The timeout error thrown contains as a cause the last transient error encountered. | ||
|
|
||
| ### 11. Multi-batch bulkWrites | ||
|
|
||
| This test MUST only run against server versions 8.0+. This test must be skipped on Atlas Serverless. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -220,9 +220,12 @@ withTransaction(callback, options) { | |
| this.abortTransaction(); | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It is out of diff: but we also need to raise to clarify the error handling behavior on line 206.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Also out of diff, but can you also update the prose description accordingly? |
||
| } | ||
|
|
||
| if (error.hasErrorLabel("TransientTransactionError") && | ||
| Date.now() - startTime < timeout) { | ||
| continue retryTransaction; | ||
| if (error.hasErrorLabel("TransientTransactionError")) { | ||
| if (Date.now() - startTime < timeout) { | ||
| continue retryTransaction; | ||
| } else { | ||
| throw getCSOTTimeoutIfSet() != null ? createCSOTMongoTimeoutException(error) : createLegacyMongoTimeoutException(e); | ||
| } | ||
| } | ||
|
|
||
| throw error; | ||
|
|
@@ -247,15 +250,17 @@ withTransaction(callback, options) { | |
| * {ok:0, code: 50, codeName: "MaxTimeMSExpired"} | ||
| * {ok:1, writeConcernError: {code: 50, codeName: "MaxTimeMSExpired"}} | ||
| */ | ||
| lastError = error; | ||
| if (Date.now() - startTime >= timeout) { | ||
| throw getCSOTTimeoutIfSet() != null ? createCSOTMongoTimeoutException(error) : createLegacyMongoTimeoutException(e); | ||
| } | ||
|
|
||
nhachicha marked this conversation as resolved.
Outdated
Show resolved
Hide resolved
|
||
| if (!isMaxTimeMSExpiredError(error) && | ||
| error.hasErrorLabel("UnknownTransactionCommitResult") && | ||
| Date.now() - startTime < timeout) { | ||
| error.hasErrorLabel("UnknownTransactionCommitResult")) { | ||
| continue retryCommit; | ||
| } | ||
|
|
||
| if (error.hasErrorLabel("TransientTransactionError") && | ||
| Date.now() - startTime < timeout) { | ||
| lastError = error; | ||
| if (error.hasErrorLabel("TransientTransactionError")) { | ||
| continue retryTransaction; | ||
| } | ||
|
|
||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you seeing flakiness with these tests? Node hasn't had issues and we've had these changes merged for ~2 months, and I don't know that Python has either (@sleepyStick ?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think python has noticed any issues with this
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed this section 👍