Skip to content
Merged
Show file tree
Hide file tree
Changes from 10 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
Expand Up @@ -226,7 +226,8 @@ Fields:
field in the server's hello or legacy hello response, in the case that the server reports an address different from
the address the client uses.

- (=) `error`: information about the last error related to this server. Default null.
- (=) `error`: information about the last error related to this server. Default null. MUST contain or be able to produce
a string describing the error.

- `roundTripTime`: the duration of the hello or legacy hello call. Default null.

Expand Down Expand Up @@ -485,7 +486,12 @@ removed once the primary is checked.
#### error

If the client experiences any error when checking a server, it stores error information in the ServerDescription's error
field.
field. The message contained in this field MUST contain the substrings detailed in the table below when the
ServerDescription is changed to Unknown in the circumstances outlined.

| circumstance | error substring |
| ---------------------------------------------------------- | -------------------------------------------------------------- |
| RSPrimary with a stale electionId/setVersion is discovered | `'primary marked stale due to electionId/setVersion mismatch'` |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this comment #1729 (comment) I had suggested adding new and old election tuples to the error message. Did you decide not to do that? Or could you add an example of the suggested error message here?

Copy link
Contributor Author

@W-A-James W-A-James Jan 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, just missed that. Here's what the message looks like in the Node driver currently (implemented before we finalized this):

MongoStalePrimaryError: primary marked stale due to electionId/setVersion mismatch: server setVersion: 1, server electionId: 000000000000000000000001, topology setVersion: 1, topology electionId: 000000000000000000000002

After implementing this it could look like

MongoStalePrimaryError: primary marked stale due to electionId/setVersion mismatch: stale electionId/setVersion: (000000000000000000000001,1), new electionId/setVersion: (000000000000000000000002, 1)


#### roundTripTime

Expand Down Expand Up @@ -863,38 +869,46 @@ else if topologyDescription.setName != serverDescription.setName:

if serverDescription.maxWireVersion >= 17: # MongoDB 6.0+
# Null values for both electionId and setVersion are always considered less than
if serverDescription.electionId > topologyDescription.maxElectionId or (
serverDescription.electionId == topologyDescription.maxElectionId
and serverDescription.setVersion >= topologyDescription.maxSetVersion
):
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion
else:
# Stale primary.
# replace serverDescription with a default ServerDescription of type "Unknown"
if serverDescription.electionId < topologyDescription.maxElectionId:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see we the pseudocode is a bit more verbose now that the logic needs to distinguish between the electionId vs setVersion stale cases. What do you think about one error message that include the two tuples:

"primary marked stale due to electionId/setVersion mismatch, <stale tuple> is stale compared to <max tuple>"

That way we don't need to refactor this code and more importantly drivers don't need to refactor their comparison logic.

# Stale primary due to electionId mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched electionId"
checkIfHasPrimary()
return
else:
# Maintain old comparison rules, namely setVersion is checked before electionId
if serverDescription.setVersion is not null and serverDescription.electionId is not null:
if (
topologyDescription.maxSetVersion is not null
and topologyDescription.maxElectionId is not null
and (
topologyDescription.maxSetVersion > serverDescription.setVersion
or (
topologyDescription.maxSetVersion == serverDescription.setVersion
and topologyDescription.maxElectionId > serverDescription.electionId
)
)
):
# Stale primary.
# replace serverDescription with a default ServerDescription of type "Unknown"
elif serverDescription.electionId == topologyDescription.maxElectionId :
if serverDescription.setVersion < topology.maxSetVersion:
# Stale primary due to setVersion mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched setVersion"
checkIfHasPrimary()
return

else:
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion
else:
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion


else:
# Maintain old comparison rules, namely setVersion is checked before electionId
if serverDescription.setVersion is not null and serverDescription.electionId is not null
and topologyDescription.maxSetVersion is not null and topologyDescription.maxElectionId is not
null:
if topologyDescription.maxSetVersion > serverDescription.setVersion:
# replace serverDescription with a default ServerDescription of type "Unknown" and an
# error field with a message containing the substring "primary marked stale due to mismatched setVersion"
checkIfHasPrimary()
return
elif topologyDescription.maxSetVersion == serverDescription.setVersion &&
topologyDescription.maxElectionId > serverDescription.electionId:
# Stale primary due to electionId mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched electionId"
checkIfHasPrimary()
return
else:
topologyDescription.maxElectionId = serverDescription.electionId
if serverDescription.setVersion is not null and (
topologyDescription.maxSetVersion is null
or serverDescription.setVersion > topologyDescription.maxSetVersion
Expand All @@ -905,9 +919,9 @@ else:
for each server in topologyDescription.servers:
if server.address != serverDescription.address:
if server.type is RSPrimary:
# See note below about invalidating an old primary.
replace the server with a default ServerDescription of type "Unknown"

# See note below about invalidating an old primary
# Replace the server with a default ServerDescription of type "Unknown" and an error field
# with a message containing the substring "primary marked stale due to discovery of newer primary"
for each address in serverDescription's "hosts", "passives", and "arbiters":
if address is not in topologyDescription.servers:
add new default ServerDescription of type "Unknown"
Expand All @@ -921,8 +935,11 @@ checkIfHasPrimary()
```

A note on invalidating the old primary: when a new primary is discovered, the client finds the previous primary (there
should be none or one) and replaces its description with a default ServerDescription of type "Unknown." A multi-threaded
client MUST [request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as
should be none or one) and replaces its description with a default ServerDescription of type "Unknown". Additionally,
the `error` field of the new `ServerDescription` object MUST include a descriptive error explaining that it was
invalidated because the primary was determined to be stale. Drivers MAY additionally specify whether this was due to an
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This paragraph is "a note on invalidating the old primary" so I don't think it applies to all the "electionId/setVersion mismatch" case and this sentence should be moved:

Drivers MAY additionally specify whether this was due to an electionId or setVersion mismatch as described in the ServerDescripion.error section.

It should be moved to the paragraph below that starts with "If the server is primary with an obsolete electionId or setVersion,"

electionId or setVersion mismatch as described in the [ServerDescripion.error section](#error). A multi-threaded client
MUST [request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as
possible.

If the old primary server version is 4.0 or earlier, the client MUST clear its connection pool for the old primary, too:
Expand Down
2 changes: 2 additions & 0 deletions source/server-discovery-and-monitoring/tests/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,8 @@ following keys:

- type: A ServerType name, like "RSSecondary". See [ServerType](../server-discovery-and-monitoring.md#servertype) for
details pertaining to async and multi-threaded drivers.
- error: An optional object with a with a string field containing a string that must be a substring of the message on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optional object with a with a string field containing a string that must be...
->
An optional string that must be...

the `ServerDescription.error` object
- setName: A string with the expected replica set name, or null.
- setVersion: absent or an integer.
- electionId: absent, null, or an ObjectId.
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId: ,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down Expand Up @@ -100,7 +101,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -63,7 +63,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down Expand Up @@ -100,7 +101,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -61,7 +61,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down Expand Up @@ -99,7 +100,8 @@ phases: [
"a:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
},
"b:27017": {
type: "RSPrimary",
Expand Down

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Original file line number Diff line number Diff line change
Expand Up @@ -68,7 +68,8 @@ phases: [
"b:27017": {
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
}
},
topologyType: "ReplicaSetWithPrimary",
Expand Down Expand Up @@ -106,7 +107,8 @@ phases: [
"b:27017":{
type: "Unknown",
setName: ,
electionId:
electionId:,
error: "primary marked stale due to electionId/setVersion mismatch"
}
},
topologyType: "ReplicaSetWithPrimary",
Expand Down
Loading