Merged
Changes from 6 commits
@@ -226,7 +226,9 @@ Fields:
field in the server's hello or legacy hello response, in the case that the server reports an address different from
the address the client uses.

- (=) `error`: information about the last error related to this server. Default null.
- (=) `error`: information about the last error related to this server. Default null. MUST contain or be able to produce
a string describing the error. The name of the field containing the string describing the error SHOULD be what is
most idiomatic for each driver.
Member

What was the motivation for adding "The name of the field containing the string describing the error SHOULD be what is most idiomatic for each driver."?

Contributor Author

I was thinking that the error field would be an object representing an error. Since different languages have different ways of doing that, I think it's better to defer to driver engineers on the best way to communicate that to users. In Node we'd likely do that with a new MongoError subclass (which itself subclasses the native Node Error type) that uses a message field to hold human-readable information about the error. We currently have it typed as a generic MongoError.

Member

I see. I think we can remove that sentence, since it is generally true of any spec-prescribed API.
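The requirement that `error` "MUST contain or be able to produce a string describing the error" can be illustrated with a minimal Python sketch. The names `ServerCheckError` and `describe_error` are hypothetical and do not come from the spec or from any driver; the sketch only assumes that the stored value is an exception-like object or null.

```python
from typing import Optional


class ServerCheckError(Exception):
    """Hypothetical error type a driver might store in ServerDescription.error."""

    def __init__(self, message: str) -> None:
        super().__init__(message)
        # The field name is a per-driver choice; "message" is only an example.
        self.message = message


def describe_error(error: Optional[Exception]) -> Optional[str]:
    """Return the descriptive string for a stored error, or None for the default null."""
    return None if error is None else str(error)
```

Whether the string lives in a field or is produced by a conversion such as `str()` is left to each driver; the sketch shows both side by side.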


- `roundTripTime`: the duration of the hello or legacy hello call. Default null.

@@ -485,7 +487,14 @@ removed once the primary is checked.
#### error

If the client experiences any error when checking a server, it stores error information in the ServerDescription's error
field.
field. The message contained in this field MUST contain the substrings detailed in the table below when the
ServerDescription is changed to Unknown in the circumstances outlined.

| circumstance | error substring |
| -------------------------------------------------------------------- | -------------------------------------------------------- |
| RSPrimary with a stale electionId is discovered | `'primary marked stale due to electionId mismatch'` |
| RSPrimary with a stale setVersion is discovered | `'primary marked stale due to setVersion mismatch'` |
Member

Can we combine these two cases and include both the stale and max election tuples?

Contributor Author

Yeah, that's fair

| A more current RSPrimary is discovered alongside an existing primary | `'primary marked stale due to discovery of new primary'` |
Member

Should we combine this case as well? IIUC the case is the same as the above but just from the other server's point of view.
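A combined message that carries both tuples, as the reviewer suggests, might be built along these lines. This is only a sketch: the function name `stale_primary_message` is hypothetical, the tuple order (electionId, setVersion) and the use of a hex string for electionId are assumptions, and the wording follows the single-message proposal made later in this review rather than any merged spec text.

```python
from typing import Optional, Tuple

# Assumed representation: (electionId as a hex string, setVersion as an int).
ElectionTuple = Tuple[Optional[str], Optional[int]]


def stale_primary_message(stale: ElectionTuple, maximum: ElectionTuple) -> str:
    """Format one message covering both the electionId and setVersion cases."""
    return (
        "primary marked stale due to electionId/setVersion mismatch, "
        f"{stale} is stale compared to {maximum}"
    )
```

Because the fixed prefix is the same in every case, a test could match on the prefix alone and leave the tuple rendering to each driver.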


#### roundTripTime

@@ -863,38 +872,46 @@ else if topologyDescription.setName != serverDescription.setName:

if serverDescription.maxWireVersion >= 17: # MongoDB 6.0+
# Null values for both electionId and setVersion are always considered less than
if serverDescription.electionId > topologyDescription.maxElectionId or (
serverDescription.electionId == topologyDescription.maxElectionId
and serverDescription.setVersion >= topologyDescription.maxSetVersion
):
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion
else:
# Stale primary.
# replace serverDescription with a default ServerDescription of type "Unknown"
if serverDescription.electionId < topologyDescription.maxElectionId:
Member

I see the pseudocode is a bit more verbose now that the logic needs to distinguish between the electionId and setVersion stale cases. What do you think about one error message that includes the two tuples:

"primary marked stale due to electionId/setVersion mismatch, <stale tuple> is stale compared to <max tuple>"

That way we don't need to refactor this code and more importantly drivers don't need to refactor their comparison logic.

# Stale primary due to electionId mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched electionId"
checkIfHasPrimary()
return
else:
# Maintain old comparison rules, namely setVersion is checked before electionId
if serverDescription.setVersion is not null and serverDescription.electionId is not null:
if (
topologyDescription.maxSetVersion is not null
and topologyDescription.maxElectionId is not null
and (
topologyDescription.maxSetVersion > serverDescription.setVersion
or (
topologyDescription.maxSetVersion == serverDescription.setVersion
and topologyDescription.maxElectionId > serverDescription.electionId
)
)
):
# Stale primary.
# replace serverDescription with a default ServerDescription of type "Unknown"
elif serverDescription.electionId == topologyDescription.maxElectionId:
if serverDescription.setVersion < topologyDescription.maxSetVersion:
# Stale primary due to setVersion mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched setVersion"
checkIfHasPrimary()
return

else:
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion
else:
topologyDescription.maxElectionId = serverDescription.electionId
topologyDescription.maxSetVersion = serverDescription.setVersion


else:
# Maintain old comparison rules, namely setVersion is checked before electionId
if serverDescription.setVersion is not null and serverDescription.electionId is not null
and topologyDescription.maxSetVersion is not null and topologyDescription.maxElectionId is not
null:
if topologyDescription.maxSetVersion > serverDescription.setVersion:
# replace serverDescription with a default ServerDescription of type "Unknown" and an
# error field with a message containing the substring "primary marked stale due to mismatched setVersion"
checkIfHasPrimary()
return
elif topologyDescription.maxSetVersion == serverDescription.setVersion and
topologyDescription.maxElectionId > serverDescription.electionId:
# Stale primary due to electionId mismatch
# replace serverDescription with a default ServerDescription of type "Unknown" and an error
# field with a message containing the substring "primary marked stale due to mismatched electionId"
checkIfHasPrimary()
return
else:
topologyDescription.maxElectionId = serverDescription.electionId
if serverDescription.setVersion is not null and (
topologyDescription.maxSetVersion is null
or serverDescription.setVersion > topologyDescription.maxSetVersion
@@ -905,9 +922,9 @@ else:
for each server in topologyDescription.servers:
if server.address != serverDescription.address:
if server.type is RSPrimary:
# See note below about invalidating an old primary.
replace the server with a default ServerDescription of type "Unknown"

# See note below about invalidating an old primary
# Replace the server with a default ServerDescription of type "Unknown" and an error field
# with a message containing the substring "primary marked stale due to discovery of newer primary"
for each address in serverDescription's "hosts", "passives", and "arbiters":
if address is not in topologyDescription.servers:
add new default ServerDescription of type "Unknown"
@@ -921,9 +938,11 @@ checkIfHasPrimary()
```
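The MongoDB 6.0+ branch of the revised pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the spec's or any driver's implementation: `Topology`, `check_primary`, `_lt`, and the two constants are hypothetical names, electionId is modeled as `bytes` (an assumption standing in for an ObjectId), and the messages use the substrings from the `error` table rather than the wording in the pseudocode comments.

```python
from dataclasses import dataclass
from typing import Optional

ELECTION_ID_STALE = "primary marked stale due to electionId mismatch"
SET_VERSION_STALE = "primary marked stale due to setVersion mismatch"


def _lt(a, b) -> bool:
    """Less-than where None (null) sorts below every non-null value."""
    if a is None:
        return b is not None
    if b is None:
        return False
    return a < b


@dataclass
class Topology:
    max_election_id: Optional[bytes] = None
    max_set_version: Optional[int] = None


def check_primary(
    topology: Topology,
    election_id: Optional[bytes],
    set_version: Optional[int],
) -> Optional[str]:
    """Return a stale-primary message, or None after updating the maximums."""
    if _lt(election_id, topology.max_election_id):
        return ELECTION_ID_STALE
    if election_id == topology.max_election_id and _lt(
        set_version, topology.max_set_version
    ):
        return SET_VERSION_STALE
    # The reported tuple is at least as new as the recorded maximum: accept it.
    topology.max_election_id = election_id
    topology.max_set_version = set_version
    return None
```

A caller would replace the ServerDescription with a default one of type "Unknown" carrying the returned message whenever the result is not `None`. Under the single-message proposal from the review, the two branches would return the same string and the split would no longer be needed.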

A note on invalidating the old primary: when a new primary is discovered, the client finds the previous primary (there
should be none or one) and replaces its description with a default ServerDescription of type "Unknown." A multi-threaded
client MUST [request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as
possible.
should be none or one) and replaces its description with a default ServerDescription of type "Unknown". Additionally,
the `error` field of the new `ServerDescription` object MUST include a descriptive error explaining that it was
invalidated because the primary was determined to be stale. Drivers MAY additionally specify whether this was due to an
Member

This paragraph is "a note on invalidating the old primary", so I don't think it applies to all the "electionId/setVersion mismatch" cases, and this sentence should be moved:

Drivers MAY additionally specify whether this was due to an electionId or setVersion mismatch as described in the ServerDescription.error section.

It should be moved to the paragraph below that starts with "If the server is primary with an obsolete electionId or setVersion,"

electionId or setVersion mismatch. A multi-threaded client MUST
Member

Let's get more specific about what the error should look like so that we can add tests.

Contributor Author

I purposely left this a little more open-ended to give drivers leeway to use the Error/Exception API most native to their language. I can say that more explicitly in the ServerDescription.error section.

Contributor Author

Updated to cross-reference the ServerDescription.error section

[request an immediate check](server-monitoring.md#requesting-an-immediate-check) for that server as soon as possible.
Member

Should we expand the scope of this ticket to include more info in other cases where we reset a server to "unknown" besides stale primary?

Contributor Author

Yeah, I think that makes sense to do here. It seems the only other place we do that is in the handleError function, so I'll take a look at that.

Contributor Author

So there are two places we do this in our handleError logic, but in both cases, the handleError function takes in an error as a parameter. Do we still want to be testing against the errors here?


If the old primary server version is 4.0 or earlier, the client MUST clear its connection pool for the old primary, too:
the connections are all bad because the old primary has closed its sockets. If the old primary server version is 4.2 or
2 changes: 2 additions & 0 deletions source/server-discovery-and-monitoring/tests/README.md
@@ -71,6 +71,8 @@ following keys:

- type: A ServerType name, like "RSSecondary". See [ServerType](../server-discovery-and-monitoring.md#servertype) for
details pertaining to async and multi-threaded drivers.
- error: An optional object with a with a string field containing a string that must be a substring of the message on
Member

An optional object with a with a string field containing a string that must be...
->
An optional string that must be...

the `ServerDescription.error` object
- setName: A string with the expected replica set name, or null.
- setVersion: absent or an integer.
- electionId: absent, null, or an ObjectId.
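A test runner might apply the `error` key roughly as sketched below. The sketch assumes the simplified form suggested in the review, where `error` is an optional string rather than an object, and `error_matches` is a hypothetical helper name, not part of any existing runner.

```python
from typing import Optional


def error_matches(expected: Optional[str], actual_message: Optional[str]) -> bool:
    """Check an outcome's optional `error` substring against the driver's message."""
    if expected is None:
        # Key absent from the test outcome: nothing to assert.
        return True
    return actual_message is not None and expected in actual_message
```

Matching on a substring rather than the full message leaves drivers free to add their own context around the required text.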
10 changes: 10 additions & 0 deletions source/tests_that_need_to_change
@@ -0,0 +1,10 @@
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/new_primary.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/primary_disconnect_setversion.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/set_version_can_rollback.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/setversion_equal_max_without_electionid.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/setversion_greaterthan_max_without_electionid.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/setversion_without_electionid-pre-6.0.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/setversion_without_electionid.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/stepdown_change_set_name.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/use_setversion_without_electionid-pre-6.0.yml
https://github.com/mongodb/specifications/blob/master/source/server-discovery-and-monitoring/tests/rs/use_setversion_without_electionid.yml