Commit 434f5d6

DRIVERS-3326: clarify retry behavior when errors with NoWritesPerformed are encountered (mongodb#1878)
1 parent 9e2acc6 commit 434f5d6

3 files changed: 179 additions and 7 deletions


source/retryable-reads/retryable-reads.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -548,7 +548,7 @@ any customers experiencing degraded performance can simply disable `retryableRea
548548
549549
## Changelog
550550
551-
- 2026-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
551+
- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
552552
553553
- 2024-04-30: Migrated from reStructuredText to Markdown.
554554

source/retryable-writes/retryable-writes.md

Lines changed: 19 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -325,10 +325,22 @@ retrying is not possible and drivers MUST raise the retryable error from the pre
325325
is able to infer that an attempt was made.
326326

327327
If a retry attempt also fails, drivers MUST update their topology according to the SDAM spec (see:
328-
[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)). If an error
329-
would not allow the caller to infer that an attempt was made (e.g. connection pool exception originating from the
330-
driver) or the error is labeled "NoWritesPerformed", the error from the previous attempt should be raised. If all server
331-
errors are labeled "NoWritesPerformed", then the first error should be raised.
328+
[Error Handling](../server-discovery-and-monitoring/server-discovery-and-monitoring.md#error-handling)).
329+
330+
If the driver is unable to retry an operation, an error MUST be returned to the user. Some errors that a driver
331+
encounters indicate that no writes were attempted (i.e., the operation is a no-op). These errors include any client-side
332+
error that occurs before a command is sent (e.g., a server selection or connection checkout error) or any server error
333+
with the `NoWritesPerformed` error label. When the driver encounters multiple errors, the driver MUST ensure that if an
334+
error has been encountered which indicates that a write was attempted, this error is returned. This behavior is
335+
summarized by the following rules:
336+
337+
- If the driver has encountered only errors that indicate write attempts were made, the most recently encountered error
338+
must be returned.
339+
- If all errors indicate no attempt was made (e.g., all errors contain the `NoWritesPerformed` error label or are
340+
client-side errors before a command is sent), the first error encountered must be returned.
341+
- If the driver has encountered some errors which indicate a write attempt was made and some which indicate no write
342+
attempt was made (e.g., a retryable server error followed by a checkout error), the most recently encountered error
343+
which indicates a write attempt occurred must be returned.
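
The three rules above can be sketched as a small selection function. This is a minimal illustration rather than the specification's reference implementation: the `selectErrorToRaise` name and the `{ code, labels, sentToServer }` error shape are assumptions made for this example, and a real driver would more likely track this state inside its retry loop.

```javascript
// Hypothetical error shape, for illustration only:
//   { code: number | string, labels: string[], sentToServer: boolean }
// `sentToServer` is false for client-side errors raised before a command is
// sent (e.g. server selection or connection checkout failures).

// Returns true when the error leaves open the possibility that a write was attempted.
function indicatesWriteAttempt(error) {
  if (!error.sentToServer) return false;
  return !error.labels.includes("NoWritesPerformed");
}

// `errors` lists every error seen for one operation, in the order encountered.
function selectErrorToRaise(errors) {
  if (errors.length === 0) {
    throw new Error("selectErrorToRaise requires at least one error");
  }
  // Rules 1 and 3: prefer the most recent error that indicates a write attempt.
  for (let i = errors.length - 1; i >= 0; i--) {
    if (indicatesWriteAttempt(errors[i])) return errors[i];
  }
  // Rule 2: no error indicates a write attempt, so return the first one.
  return errors[0];
}

// A retryable server error followed by a checkout error (the example in rule 3).
const retryable = { code: 91, labels: ["RetryableError"], sentToServer: true };
const checkout = { code: "CheckoutFailed", labels: [], sentToServer: false };
console.log(selectErrorToRaise([retryable, checkout]).code); // prints 91
```

Scanning from the most recent error backwards covers the first and third rules with one loop; the fallback to `errors[0]` only runs when every error indicates that no write was attempted.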
332344

333345
If a driver associates server information (e.g. the server address or description) with an error, the driver MUST ensure
334346
that the reported server information corresponds to the server that originated the error.
@@ -681,7 +693,9 @@ retryWrites is not true would be inconsistent with the server and potentially co
681693

682694
## Changelog
683695

684-
- 2026-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
696+
- 2026-01-14: Clarify which error to return when more than one error with the `NoWritesPerformed` label is encountered.
697+
698+
- 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
685699

686700
- 2024-05-08: Add guidance for client-level `bulkWrite()` retryability.
687701

source/retryable-writes/tests/README.md

Lines changed: 159 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -262,15 +262,173 @@ debugger, code coverage tool, etc.
262262

263263
7. Disable the fail point on `s0`.
264264

265+
### 6. Test error propagation after encountering multiple errors.
266+
267+
These tests MUST:
268+
269+
- be implemented by any driver that implements the Command Monitoring specification.
270+
- only run against replica sets, as mongos does not propagate the `NoWritesPerformed` label to drivers.
271+
- be run against server versions 6.0 and above.
272+
- be implemented by any driver that has implemented the Client Backpressure specification.
273+
274+
Additionally, these tests require drivers to set a fail point after an `insertOne` operation but before the subsequent
275+
retry. Drivers that are unable to set a failCommand after the CommandFailedEvent SHOULD use mocking or write a unit test
276+
to cover the same sequence of events.
277+
278+
#### Case 1: Test that drivers return the correct error when receiving only errors without `NoWritesPerformed`
279+
280+
1. Create a client with `retryWrites=true`.
281+
282+
2. Configure a fail point with error code `91` (ShutdownInProgress) with the `RetryableError` and
283+
`SystemOverloadedError` error labels:
284+
285+
```javascript
286+
{
287+
configureFailPoint: "failCommand",
288+
mode: {times: 1},
289+
data: {
290+
failCommands: ["insert"],
291+
errorLabels: ["RetryableError", "SystemOverloadedError"],
292+
errorCode: 91
293+
}
294+
}
295+
```
296+
297+
3. Via the command monitoring CommandFailedEvent, configure a fail point with error code `10107` (NotWritablePrimary):
298+
299+
```javascript
300+
{
301+
configureFailPoint: "failCommand",
302+
mode: "alwaysOn",
303+
data: {
304+
failCommands: ["insert"],
305+
errorCode: 10107,
306+
errorLabels: ["RetryableError", "SystemOverloadedError"]
307+
}
308+
}
309+
```
310+
311+
Configure the `10107` fail point command only if the failed event is for the `91` error configured in step 2.
312+
313+
4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a
314+
server error. Assert that the error code of the server error is `10107`.
315+
316+
5. Disable the fail point:
317+
318+
```javascript
319+
{
320+
configureFailPoint: "failCommand",
321+
mode: "off"
322+
}
323+
```
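
Step 3 of this case can be sketched as a listener that arms the second fail point at most once, and only in response to the `91` failure from step 2. This is a sketch under assumptions, not any driver's API: `makeFailPointListener` and `runAdminCommand` are hypothetical names, and the event fields `commandName` and `failure.code` are assumed to be how the driver under test exposes the failed command.

```javascript
// Fail point from step 3 of Case 1, copied from the test description.
const secondFailPoint = {
  configureFailPoint: "failCommand",
  mode: "alwaysOn",
  data: {
    failCommands: ["insert"],
    errorCode: 10107,
    errorLabels: ["RetryableError", "SystemOverloadedError"],
  },
};

// Returns a CommandFailedEvent handler. `runAdminCommand` is a hypothetical
// callback that sends a command document to the admin database.
function makeFailPointListener(runAdminCommand) {
  let armed = false;
  return function onCommandFailed(event) {
    const isFirstFailure =
      event.commandName === "insert" &&
      Boolean(event.failure) &&
      event.failure.code === 91;
    // Ignore unrelated failures and failures produced by the 10107 fail point itself.
    if (!isFirstFailure || armed) return;
    armed = true;
    runAdminCommand(secondFailPoint);
  };
}

// Usage with a stand-in for the admin command runner:
const sent = [];
const listener = makeFailPointListener((cmd) => sent.push(cmd));
listener({ commandName: "insert", failure: { code: 91 } });
listener({ commandName: "insert", failure: { code: 10107 } });
console.log(sent.length); // prints 1
```

In a real test the fail point has to be in place before the retry is sent, so a driver whose listeners cannot wait on the command may need the mocking or unit-test approach described above.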
324+
325+
#### Case 2: Test that drivers return the correct error when receiving only errors with `NoWritesPerformed`
326+
327+
1. Create a client with `retryWrites=true`.
328+
329+
2. Configure a fail point with error code `91` (ShutdownInProgress) with the `RetryableError` and
330+
`SystemOverloadedError` error labels:
331+
332+
```javascript
333+
{
334+
configureFailPoint: "failCommand",
335+
mode: {times: 1},
336+
data: {
337+
failCommands: ["insert"],
338+
errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"],
339+
errorCode: 91
340+
}
341+
}
342+
```
343+
344+
3. Via the command monitoring CommandFailedEvent, configure a fail point with error code `10107` (NotWritablePrimary)
345+
and the `NoWritesPerformed` error label:
346+
347+
```javascript
348+
{
349+
configureFailPoint: "failCommand",
350+
mode: "alwaysOn",
351+
data: {
352+
failCommands: ["insert"],
353+
errorCode: 10107,
354+
errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"]
355+
}
356+
}
357+
```
358+
359+
Configure the `10107` fail point command only if the failed event is for the `91` error configured in step 2.
360+
361+
4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a
362+
server error. Assert that the error code of the server error is `91`.
363+
364+
5. Disable the fail point:
365+
366+
```javascript
367+
{
368+
configureFailPoint: "failCommand",
369+
mode: "off"
370+
}
371+
```
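
The fail point documents in Cases 1 and 2 differ only in `mode`, `errorCode`, and `errorLabels`, so a test suite may prefer to build them from one helper. The following is a minimal sketch; `buildFailCommand` and `disableFailCommand` are hypothetical helper names, not part of any driver or of the specification.

```javascript
// Builds a failCommand fail point document targeting the `insert` command.
// The two base labels are the ones every fail point in these cases carries.
function buildFailCommand({ mode, errorCode, noWritesPerformed = false }) {
  const errorLabels = ["RetryableError", "SystemOverloadedError"];
  if (noWritesPerformed) errorLabels.push("NoWritesPerformed");
  return {
    configureFailPoint: "failCommand",
    mode,
    data: { failCommands: ["insert"], errorLabels, errorCode },
  };
}

// Document used in the final step of each case to turn the fail point off.
function disableFailCommand() {
  return { configureFailPoint: "failCommand", mode: "off" };
}

// Case 2, step 2: fail the first insert with 91 and NoWritesPerformed.
const first = buildFailCommand({ mode: { times: 1 }, errorCode: 91, noWritesPerformed: true });
// Case 2, step 3: fail every later insert with 10107 and NoWritesPerformed.
const second = buildFailCommand({ mode: "alwaysOn", errorCode: 10107, noWritesPerformed: true });
console.log(JSON.stringify(first.data.errorLabels));
console.log(second.mode);
```

Calling the helper without `noWritesPerformed` yields the Case 1 documents, which keeps the label difference between the two cases visible at the call site.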
372+
373+
#### Case 3: Test that drivers return the correct error when receiving some errors with `NoWritesPerformed` and some without `NoWritesPerformed`
374+
375+
1. Create a client with `retryWrites=true` and `monitorCommands=true`.
376+
377+
2. Configure the client to listen to CommandFailedEvents. In the attached listener, configure a fail point with error
378+
code `91` (ShutdownInProgress) and the `NoWritesPerformed`, `RetryableError` and `SystemOverloadedError` labels:
379+
380+
```javascript
381+
{
382+
configureFailPoint: "failCommand",
383+
mode: {times: 1},
384+
data: {
385+
failCommands: ["insert"],
386+
errorLabels: ["RetryableError", "SystemOverloadedError", "NoWritesPerformed"],
387+
errorCode: 91
388+
}
389+
}
390+
```
391+
392+
3. Configure a fail point with error code `91` (ShutdownInProgress) with the `RetryableError` and
393+
`SystemOverloadedError` error labels but without the `NoWritesPerformed` error label:
394+
395+
```javascript
396+
{
397+
configureFailPoint: "failCommand",
398+
mode: {times: 1},
399+
data: {
400+
failCommands: ["insert"],
401+
errorLabels: ["RetryableError", "SystemOverloadedError"],
402+
errorCode: 91
403+
}
404+
}
405+
```
406+
407+
4. Attempt an `insertOne` operation on any record for any database and collection. Expect the `insertOne` to fail with a
408+
server error. Assert that the error code of the server error is `91`. Assert that the error does not contain the
409+
error label `NoWritesPerformed`.
410+
411+
5. Disable the fail point:
412+
413+
```javascript
414+
{
415+
configureFailPoint: "failCommand",
416+
mode: "off"
417+
}
418+
```
419+
265420
## Changelog
266421

422+
- 2026-02-03: Add tests for error propagation behavior when multiple errors are encountered.
423+
267424
- 2024-10-29: Convert command construction tests to unified format.
268425

269426
- 2024-05-30: Migrated from reStructuredText to Markdown.
270427

271428
- 2024-02-27: Convert legacy retryable writes tests to unified format.
272429

273-
- 2024-02-21: Update prose test 4 and 5 to workaround SDAM behavior preventing execution of deprioritization code paths.
430+
- 2024-02-21: Update prose tests 4 and 5 to work around SDAM behavior preventing execution of deprioritization code
431+
paths.
274432

275433
- 2024-01-05: Fix typo in prose test title.
276434
