The YAML and JSON files in this directory are platform-independent tests meant to exercise a driver's implementation of retryable writes. These tests utilize the Unified Test Format.
Several prose tests, which are not easily expressed in YAML, are also presented in this file. Those tests will need to be manually implemented by each driver.
Tests will require a MongoClient created with options defined in the tests. Integration tests will require a running
MongoDB cluster with server versions 3.6.0 or later. The {setFeatureCompatibilityVersion: 3.6} admin command will also
need to have been executed to enable support for retryable writes on the cluster. Some tests may have more stringent
version requirements depending on the fail points used.
Integration tests are expressed in YAML and can be run against a replica set or sharded cluster as denoted by the
top-level runOn field. Tests that rely on the onPrimaryTransactionalWrite fail point cannot be run against a sharded
cluster because the fail point is not supported by mongos.
The tests exercise the following scenarios:
- Single-statement write operations
  - Each test expecting a write result will encounter at-most one network error for the write command. Retry attempts should return without error and allow the operation to succeed. Observation of the collection state will assert that the write occurred at-most once.
  - Each test expecting an error will encounter successive network errors for the write command. Observation of the collection state will assert that the write was never committed on the server.
- Multi-statement write operations
  - Each test expecting a write result will encounter at-most one network error for some write command(s) in the batch. Retry attempts should return without error and allow the batch to ultimately succeed. Observation of the collection state will assert that each write occurred at-most once.
  - Each test expecting an error will encounter successive network errors for some write command in the batch. The batch will ultimately fail with an error, but observation of the collection state will assert that the failing write was never committed on the server. We may observe that earlier writes in the batch occurred at-most once.
We cannot test a scenario where the first and second attempts both encounter network errors but the write does actually commit during one of those attempts. This is because (1) the fail point only triggers when a write would be committed and (2) the skip and times options are mutually exclusive. That said, such a test would mainly assert the server's correctness for at-most once semantics and is not essential to assert driver correctness.
The YAML tests specify bulk write operations that are split by command type (e.g. sequence of insert, update, and delete
commands). Multi-statement write operations may also be split due to maxWriteBatchSize, maxBsonObjectSize, or
maxMessageSizeBytes.
For instance, an insertMany operation with five 10 MiB documents executed using OP_MSG payload type 0 (i.e. entire
command in one document) would be split into five insert commands in order to respect the 16 MiB maxBsonObjectSize
limit. The same insertMany operation executed using OP_MSG payload type 1 (i.e. command arguments pulled out into a
separate payload vector) would be split into two insert commands in order to respect the 48 MB maxMessageSizeBytes
limit.
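As a rough illustration of the arithmetic above, the following sketch greedily packs documents into commands under a single size limit. It ignores per-command BSON overhead and is not any driver's actual batching logic; the constants are the limits named in the text.

```python
# Illustrative sketch (not a real driver implementation) of counting how
# many insert commands a batch splits into under a given size limit.

MAX_BSON_OBJECT_SIZE = 16 * 1024 * 1024  # 16 MiB limit on a single BSON document
MAX_MESSAGE_SIZE_BYTES = 48_000_000      # 48 MB limit on a single wire message

def count_commands(doc_sizes, size_limit):
    """Greedily pack documents into commands, starting a new command
    whenever adding the next document would exceed size_limit."""
    commands = 0
    current = 0
    for size in doc_sizes:
        if current and current + size > size_limit:
            commands += 1
            current = 0
        current += size
    return commands + (1 if current else 0)

docs = [10 * 1024 * 1024] * 5  # five 10 MiB documents

# OP_MSG payload type 0: the entire command is one BSON document, so each
# batch must fit within maxBsonObjectSize -> five single-document commands.
print(count_commands(docs, MAX_BSON_OBJECT_SIZE))    # 5

# OP_MSG payload type 1: documents travel in a separate payload bounded by
# maxMessageSizeBytes -> two commands (four documents, then one).
print(count_commands(docs, MAX_MESSAGE_SIZE_BYTES))  # 2
```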
Noting when a driver might split operations, the onPrimaryTransactionalWrite fail point's skip option may be used to
control when the fail point first triggers. Once triggered, the fail point will transition to the alwaysOn state until
disabled. Driver authors should also note that the server attempts to process all documents in a single insert command
within a single commit (i.e. one insert command with five documents may only trigger the fail point once). This behavior
is unique to insert commands (each statement in an update and delete command is processed independently).
If testing an insert that is split into two commands, a skip of one will allow the fail point to trigger on the second
insert command (because all documents in the first command will be processed in the same commit). When testing an update
or delete that is split into two commands, the skip should be set to the number of statements in the first command to
allow the fail point to trigger on the second command.
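The skip rule above can be expressed as a small helper. This is a hypothetical illustration (the function name and parameters are not part of any driver API); it only encodes the insert-versus-update/delete distinction described in the text.

```python
def on_primary_txn_write_skip(command_type, first_command_statement_count):
    """Return the onPrimaryTransactionalWrite 'skip' value needed so the
    fail point first triggers on the SECOND of two split commands.

    Inserts commit all documents in one command as a single commit, so the
    entire first insert command is one trigger opportunity. Each statement
    in an update or delete command is processed independently, so every
    statement in the first command counts toward the skip."""
    if command_type == "insert":
        return 1
    return first_command_statement_count

print(on_primary_txn_write_skip("insert", 4))  # 1
print(on_primary_txn_write_skip("update", 3))  # 3
```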
The command construction prose tests have been removed in favor of command event assertions in the unified format tests.
The following tests ensure that retryable writes work properly with replica sets and sharded clusters.
For this test, execute a write operation, such as insertOne, which should generate an exception. Assert that the error message is the replacement error message:

```
This MongoDB deployment does not support retryable writes. Please add
retryWrites=false to your connection string.
```

and the error code is 20.
Note: Drivers that rely on `serverStatus` to determine the storage engine in use MAY skip this test for sharded clusters, since mongos does not report this information in its `serverStatus` response.
This test MUST be implemented by any driver that implements the CMAP specification.
This test requires MongoDB 4.3.4+ for both the errorLabels and blockConnection fail point options.
1. Create a client with `maxPoolSize=1` and `retryWrites=true`. If testing against a sharded deployment, be sure to connect to only a single mongos.

2. Enable the following fail point:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorCode: 91, blockConnection: true, blockTimeMS: 1000, errorLabels: ["RetryableWriteError"] } }
    ```

3. Start two threads and attempt to perform an `insertOne` simultaneously on both.

4. Verify that both `insertOne` attempts succeed.

5. Via CMAP monitoring, assert that the first check out succeeds.

6. Via CMAP monitoring, assert that a PoolClearedEvent is then emitted.

7. Via CMAP monitoring, assert that the second check out then fails due to a connection error.

8. Via command monitoring, assert that exactly three `insert` CommandStartedEvents were observed in total.

9. Disable the fail point.
3. Test that drivers return the original error after encountering a WriteConcernError with a RetryableWriteError label.
This test MUST:

- be implemented by any driver that implements the Command Monitoring specification,
- only run against replica sets, as mongos does not propagate the NoWritesPerformed label to drivers, and
- be run against server versions 6.0 and above.

Additionally, this test requires drivers to set a fail point after an insertOne operation but before the subsequent retry. Drivers that are unable to set a failCommand after the CommandSucceededEvent SHOULD use mocking or write a unit test to cover the same sequence of events.
1. Create a client with `retryWrites=true`.

2. Configure a fail point with error code 91 (ShutdownInProgress):

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorLabels: ["RetryableWriteError"], writeConcernError: { code: 91 } } }
    ```

3. Via the command monitoring CommandSucceededEvent, configure a fail point with error code 10107 (NotWritablePrimary) and a NoWritesPerformed label:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorCode: 10107, errorLabels: ["RetryableWriteError", "NoWritesPerformed"] } }
    ```

    Drivers SHOULD only configure the 10107 fail point command if the succeeded event is for the 91 error configured in step 2.

4. Attempt an `insertOne` operation on any record for any database and collection. For the resulting error, assert that the associated error code is 91.

5. Disable the fail point:

    ```
    { configureFailPoint: "failCommand", mode: "off" }
    ```
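The rule this test exercises — surface the original error when the retry attempt's error carries the NoWritesPerformed label — can be sketched as pure logic. The helper and its dictionary-shaped errors are hypothetical, not a real driver API:

```python
def error_to_return(original_error, retry_error):
    """Pick which error a driver surfaces after a failed retry attempt.

    If the retry's error is labeled NoWritesPerformed, the retry performed
    no writes and tells us nothing new, so the original error is returned;
    otherwise the retry's error wins."""
    if "NoWritesPerformed" in retry_error.get("errorLabels", []):
        return original_error
    return retry_error

# First attempt: writeConcernError 91 (ShutdownInProgress).
# Retry: 10107 (NotWritablePrimary) with NoWritesPerformed.
first = {"code": 91, "errorLabels": ["RetryableWriteError"]}
retry = {"code": 10107, "errorLabels": ["RetryableWriteError", "NoWritesPerformed"]}
print(error_to_return(first, retry)["code"])  # 91, as the test asserts
```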
This test MUST be executed against a sharded cluster that has at least two mongos instances, supports
retryWrites=true, has enabled the configureFailPoint command, and supports the errorLabels field (MongoDB 4.3.1+).
Note: This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior intended to be tested) from "retry on a different mongos due to normal SDAM randomized suitable server selection". Verify that the relevant code paths are correctly executed by the tests using external means such as logging, a debugger, or a code coverage tool.
1. Create two clients `s0` and `s1` that each connect to a single mongos from the sharded cluster. They must not connect to the same mongos.

2. Configure the following fail point for both `s0` and `s1`:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorCode: 6, errorLabels: ["RetryableWriteError"] } }
    ```

3. Create a client `client` with `retryWrites=true` that connects to the cluster using the same two mongoses as `s0` and `s1`.

4. Enable failed command event monitoring for `client`.

5. Execute an `insert` command with `client`. Assert that the command failed.

6. Assert that two failed command events occurred. Assert that the failed command events occurred on different mongoses.

7. Disable the fail points on both `s0` and `s1`.
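The mongos deprioritization this test targets can be sketched as a selection step. This is a hypothetical helper, not how any particular driver structures its server selection; real drivers apply the rule inside the server selection algorithm:

```python
import random

def select_mongos(suitable, deprioritized):
    """Prefer suitable mongoses that have not been deprioritized by a
    prior failure; if every suitable mongos is deprioritized, fall back
    to the full list (the situation prose test 5 exercises)."""
    preferred = [s for s in suitable if s not in deprioritized]
    return random.choice(preferred or suitable)

# Retrying after a failure on "mongos-a": with another mongos available,
# selection avoids the deprioritized one.
print(select_mongos(["mongos-a", "mongos-b"], {"mongos-a"}))  # mongos-b
```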
5. Test that in a sharded cluster writes are retried on the same mongos when no others are available.
This test MUST be executed against a sharded cluster that supports retryWrites=true, has enabled the
configureFailPoint command, and supports the errorLabels field (MongoDB 4.3.1+).
Note: This test cannot reliably distinguish "retry on a different mongos due to server deprioritization" (the behavior intended to be tested) from "retry on a different mongos due to normal SDAM behavior of randomized suitable server selection". Verify that the relevant code paths are correctly executed by the tests using external means such as logging, a debugger, or a code coverage tool.
1. Create a client `s0` that connects to a single mongos from the cluster.

2. Configure the following fail point for `s0`:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorCode: 6, errorLabels: ["RetryableWriteError"], closeConnection: true } }
    ```

3. Create a client `client` with `directConnection=false` (when not set by default) and `retryWrites=true` that connects to the cluster using the same single mongos as `s0`.

4. Enable succeeded and failed command event monitoring for `client`.

5. Execute an `insert` command with `client`. Assert that the command succeeded.

6. Assert that exactly one failed command event and one succeeded command event occurred. Assert that both events occurred on the same mongos.

7. Disable the fail point on `s0`.
These tests MUST:

- be implemented by any driver that implements the Command Monitoring specification,
- be implemented by any driver that has implemented the Client Backpressure specification,
- only run against replica sets, as mongos does not propagate the NoWritesPerformed label to drivers, and
- be run against server versions 6.0 and above.

Additionally, these tests require drivers to set a fail point after an insertOne operation but before the subsequent retry. Drivers that are unable to set a failCommand after the CommandFailedEvent SHOULD use mocking or write a unit test to cover the same sequence of events.
Case 1: Test that drivers return the correct error when receiving only errors without NoWritesPerformed
1. Create a client with `retryWrites=true`.

2. Configure a fail point with error code 91 (ShutdownInProgress) and the RetryableError and SystemOverloadedError error labels:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorLabels: ["RetryableError", "SystemOverloadedError"], errorCode: 91 } }
    ```

3. Via the command monitoring CommandFailedEvent, configure a fail point with error code 10107 (NotWritablePrimary):

    ```
    { configureFailPoint: "failCommand", mode: "alwaysOn", data: { failCommands: ["insert"], errorCode: 10107, errorLabels: ["RetryableError", "SystemOverloadedError"] } }
    ```

    Configure the 10107 fail point command only if the failed event is for the 91 error configured in step 2.

4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a server error. Assert that the error code of the server error is 10107.

5. Disable the fail point:

    ```
    { configureFailPoint: "failCommand", mode: "off" }
    ```
Case 2: Test that drivers return the correct error when receiving only errors with NoWritesPerformed
1. Create a client with `retryWrites=true`.

2. Configure a fail point with error code 91 (ShutdownInProgress) and the RetryableError, SystemOverloadedError, and NoWritesPerformed error labels:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"], errorCode: 91 } }
    ```

3. Via the command monitoring CommandFailedEvent, configure a fail point with error code 10107 (NotWritablePrimary) and a NoWritesPerformed label:

    ```
    { configureFailPoint: "failCommand", mode: "alwaysOn", data: { failCommands: ["insert"], errorCode: 10107, errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"] } }
    ```

    Configure the 10107 fail point command only if the failed event is for the 91 error configured in step 2.

4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a server error. Assert that the error code of the server error is 91.

5. Disable the fail point:

    ```
    { configureFailPoint: "failCommand", mode: "off" }
    ```
Case 3: Test that drivers return the correct error when receiving some errors with NoWritesPerformed and some without NoWritesPerformed
1. Create a client with `retryWrites=true` and `monitorCommands=true`.

2. Configure the client to listen to CommandFailedEvents. In the attached listener, configure a fail point with error code 91 (ShutdownInProgress) and the NoWritesPerformed, RetryableError, and SystemOverloadedError labels:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"], errorCode: 91 } }
    ```

3. Configure a fail point with error code 91 (ShutdownInProgress) and the RetryableError and SystemOverloadedError error labels but without the NoWritesPerformed error label:

    ```
    { configureFailPoint: "failCommand", mode: { times: 1 }, data: { failCommands: ["insert"], errorLabels: ["RetryableError", "SystemOverloadedError"], errorCode: 91 } }
    ```

4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a server error. Assert that the error code of the server error is 91. Assert that the error does not contain the error label NoWritesPerformed.

5. Disable the fail point:

    ```
    { configureFailPoint: "failCommand", mode: "off" }
    ```
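All three cases follow from one folding rule: an error produced by a later attempt replaces the current error only if it lacks the NoWritesPerformed label. A sketch with hypothetical dictionary-shaped errors (not a real driver API):

```python
def final_error(errors):
    """Fold the propagation rule over a sequence of per-attempt errors:
    a later error replaces the current one only if it does NOT carry the
    NoWritesPerformed label."""
    result = errors[0]
    for err in errors[1:]:
        if "NoWritesPerformed" not in err.get("errorLabels", []):
            result = err
    return result

plain = ["RetryableError", "SystemOverloadedError"]
nwp = plain + ["NoWritesPerformed"]

# Case 1: no NoWritesPerformed anywhere -> the retry's 10107 wins.
print(final_error([{"code": 91, "errorLabels": plain},
                   {"code": 10107, "errorLabels": plain}])["code"])  # 10107

# Case 2: the retry's 10107 has NoWritesPerformed -> the original 91 is kept.
print(final_error([{"code": 91, "errorLabels": nwp},
                   {"code": 10107, "errorLabels": nwp}])["code"])    # 91

# Case 3: only the retry's error has NoWritesPerformed -> the first 91
# (without the label) is returned, as the test asserts.
print(final_error([{"code": 91, "errorLabels": plain},
                   {"code": 91, "errorLabels": nwp}])["code"])       # 91
```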
- 2026-02-03: Add tests for error propagation behavior when multiple errors are encountered.
- 2024-10-29: Convert command construction tests to unified format.
- 2024-05-30: Migrated from reStructuredText to Markdown.
- 2024-02-27: Convert legacy retryable writes tests to unified format.
- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing execution of deprioritization code paths.
- 2024-01-05: Fix typo in prose test title.
- 2024-01-03: Note server version requirements for fail point options and revise tests to specify the `errorLabels` option at the top level instead of within `writeConcernError`.
- 2023-08-26: Add prose tests for retrying in a sharded cluster.
- 2022-08-30: Add prose test verifying correct error handling for errors with the NoWritesPerformed label, which is to return the original error.
- 2022-04-22: Clarifications to `serverless` and `useMultipleMongoses`.
- 2021-08-27: Add `serverless` to `runOn`. Clarify behavior of `useMultipleMongoses` for `LoadBalanced` topologies.
- 2021-04-23: Add `load-balanced` to test topology requirements.
- 2021-03-24: Add prose test verifying `PoolClearedError`s are retried.
- 2019-10-21: Add `errorLabelsContain` and `errorLabelsOmit` fields to `result`.
- 2019-08-07: Add Prose Tests section.
- 2019-06-07: Mention `$merge` stage for `aggregate` alongside `$out`.
- 2019-03-01: Add top-level `runOn` field to denote server version and/or topology requirements for the test file. Remove the `minServerVersion` and `maxServerVersion` top-level fields, which are now expressed within `runOn` elements. Add test-level `useMultipleMongoses` field.