Add support for server selection's deprioritized servers to all topologies.#1860
Add support for server selection's deprioritized servers to all topologies.#1860vbabanin wants to merge 9 commits intomongodb:mainfrom
Conversation
|
|
||
| protected void updateDescription(final ClusterDescription newDescription) { | ||
| @VisibleForTesting(otherwise = PROTECTED) | ||
| public void updateDescription(final ClusterDescription newDescription) { |
There was a problem hiding this comment.
It seems we can avoid making this method public by overriding it in the subclass. We won't even need to make it public in the subclass, because the test code using the method will be in the same package as the subclass. However, this requires the subclass to not be anonymous (if we could use var, the proposed approach would have worked even for an anonymous subclass).
| } catch (MongoTimeoutException mongoTimeoutException) { | ||
| List<ServerDescription> inLatencyWindowServers = buildServerDescriptions(definition.getArray("in_latency_window")); | ||
| assertTrue("Expected emtpy but was " + inLatencyWindowServers.size() + " " + definition.toJson( | ||
| JsonWriterSettings.builder() | ||
| .indent(true).build()), inLatencyWindowServers.isEmpty()); | ||
| return; |
There was a problem hiding this comment.
Since we now perform the full server-selection path (via BaseCluster), the behavior of "no servers selected: is observed differently than before.
Previously, these tests invoked the Selector directly, got an empty list, and asserted on that result. With BaseCluster, server selection runs the normal selection loop: it will retry until either a server becomes selectable or the selection timeout elapses.
In this setup, a server selection timeout is the expected signal that no servers are available/eligible for selection. The timeout is set to 200ms to keep the test fast, while giving enough headroom to avoid any flakiness.
| List<ServerDescription> latencyBasedSelectedServers = latencyBasedServerSelector.select(clusterDescription); | ||
| assertServers(latencyBasedSelectedServers, inLatencyWindowServers); | ||
| assertNotNull(serverTuple); | ||
| assertTrue(inLatencyWindowServers.stream().anyMatch(s -> s.getAddress().equals(serverTuple.getServerDescription().getAddress()))); |
There was a problem hiding this comment.
There’s an ambiguity in the server selection spec docs about what drivers must assert for in_latency_window:
tests/README.md says to verify the set of servers in in_latency_window.
"Drivers implementing server selection MUST test that their implementation correctly returns the set of servers in in_latency_window."
server-selection-tests.mdsays to verify the selection returns one of the servers in in_latency_window.
"Drivers implementing server selection MUST test that their implementations correctly return one of the servers in in_latency_window."
This test follows server-selection-tests.md, so asserting that the selected server is within the expected in_latency_window set is consistent with the requirements in the spec.
P.S: Both files were created in this single commit with contradicting requirements:
Feb 6, 2015 - Commit 6b63123a - "Add Server Selection Spec"
File 1: server-selection-tests.rst
Drivers implementing server selection MUST test that their implementations
correctly return one of the servers in ``in_latency_window``.
File 2: tests/README.rst
Drivers implementing server selection MUST test that their implementations
correctly return the set of servers in ``in_latency_window``.
# Conflicts: # driver-core/src/test/resources/specifications # driver-core/src/test/unit/com/mongodb/connection/ServerSelectionSelectionTest.java
stIncMale
left a comment
There was a problem hiding this comment.
The last reviewed commit is f20b482.
The files I haven't yet reviewed:
ServerDeprioritizationTestServerSelectionSelectionTest
@vbabanin I am proposing to resolve all the outstanding comments before proceeding with the changes required by DRIVERS-3404 (for details, see the description of this PR).
| new ReadConcernAwareNoOpSessionContext(ReadConcern.DEFAULT), | ||
| new TimeoutContext(TIMEOUT_SETTINGS), | ||
| getServerApi()); | ||
| public static OperationContext getOperationContext() { |
There was a problem hiding this comment.
Thank you for noticing the issue with OperationContext being mutable (it was mutable before this PR), and fixing it.
The methods returning OperationContext in this class are a mess:
getOperationContext()createOperationContext(TimeoutSettings timeoutSettings)createNewOperationContext(TimeoutSettings timeoutSettings)getOperationContext(ReadPreference readPreference)
- Let's name the methods consistently. I think, all of the aforementioned methods should use the "create" prefix.
1.1. Let's do this automatically via IDE in a separate commit, and express in the commit message that the commit was done via automatic refactoring, so that reviewers know not to review it. - Let's remove the weirdly named and trivial
createNewOperationContextmethod, and inline it where it is used (fortunately, it is used only in two places inClusterFixture, and nowhere else).
| def source = getBinding().getReadConnectionSource(OPERATION_CONTEXT) | ||
| def connection = source.getConnection(OPERATION_CONTEXT) | ||
| def source = getBinding().getReadConnectionSource(getOperationContext()) | ||
| def connection = source.getConnection(getOperationContext()) |
There was a problem hiding this comment.
I think, the OperationContext used here should be the same. There may be many more changed places in this PR where this is the case, and given how many changes there are, it may not be too easy to identify them all.
For brevity, I'll be marking such places with just the "same context" comment. For now I am not sure if it is even practically achievable to identify them all.
The problem is exacerbated by ClusterFixture creating a new OperationContext each time it needs one, which means that the ClusterFixture.getBinding never uses the same context as the one used by the test calling ClusterFixture.getBinding.
| def source = getBinding().getReadConnectionSource(OPERATION_CONTEXT) | ||
| def connection = source.getConnection(OPERATION_CONTEXT) | ||
| def source = getBinding().getReadConnectionSource(getOperationContext()) | ||
| def connection = source.getConnection(getOperationContext()) |
| @@ -98,7 +98,7 @@ protected void applyApplicationError(final BsonDocument applicationError) { | |||
| switch (type) { | |||
| case "command": | |||
| exception = getCommandFailureException(applicationError.getDocument("response"), serverAddress, | |||
| OPERATION_CONTEXT.getTimeoutContext()); | |||
| getOperationContext().getTimeoutContext()); | |||
| Timeout serverSelectionTimeout = getOperationContext().getTimeoutContext().computeServerSelectionTimeout(); | ||
| DefaultServer server = (DefaultServer) getCluster() | ||
| .getServersSnapshot(serverSelectionTimeout, OPERATION_CONTEXT.getTimeoutContext()) | ||
| .getServersSnapshot(serverSelectionTimeout, getOperationContext().getTimeoutContext()) |
| final ClusterableServerFactory serverFactory, | ||
| final ClientMetadata clientMetadata) { | ||
| @VisibleForTesting(otherwise = PRIVATE) | ||
| protected BaseCluster(final ClusterId clusterId, |
There was a problem hiding this comment.
The value of otherwise is incorrect here.
| getRaceConditionPreFilteringSelector(serversSnapshot), | ||
| serverSelector, | ||
| serverDeprioritization.getServerSelector(), | ||
| serverDeprioritization.applyDeprioritization(serverSelector), |
There was a problem hiding this comment.
Let's rename the method to apply. The name of the type (ServerDeprioritization) of the object whose instance method we call tells what is being applied. There is no need to duplicate that in the names of methods.
Let's do this automatically via IDE in a separate commit, and express in the commit message that the commit was done via automatic refactoring, so that reviewers know not to review it.
| if (serverDescriptions.size() == 1 || deprioritized.isEmpty()) { | ||
| return wrappedSelector.select(clusterDescription); | ||
| } | ||
|
|
||
| List<ServerDescription> nonDeprioritizedServerDescriptions = serverDescriptions | ||
| .stream() | ||
| .filter(serverDescription -> !deprioritized.contains(serverDescription.getAddress())) | ||
| .collect(toList()); |
There was a problem hiding this comment.
Let's leave a TODO-JAVA-XXXX comment linking this code (the if optimization as well as the use of Stream) to the existing performance ticket about server selection. We should also mention in the description of that ticket that there are TODO-JAVA-XXXX comments that needs to be addressed.
The open questions here are:
- whether the
ifoptimization is worth itl - whether we should use a loop instead of using
Stream.
| new ClusterDescription(clusterDescription.getConnectionMode(), clusterDescription.getType(), | ||
| nonDeprioritizedServerDescriptions, | ||
| clusterDescription.getClusterSettings(), | ||
| clusterDescription.getServerSettings())); |
There was a problem hiding this comment.
We should use the
public ClusterDescription(final ClusterConnectionMode connectionMode, final ClusterType type,
@Nullable final MongoException srvResolutionException,
final List<ServerDescription> serverDescriptions,
@Nullable final ClusterSettings clusterSettings,
@Nullable final ServerSettings serverSettings) {constructor.
| * The returned {@link ServerSelector} wraps the provided selector and attempts server selection in two passes: | ||
| * <ol> | ||
| * <li>First pass: calls the wrapped selector with only non-deprioritized {@link ServerDescription}s</li> | ||
| * <li>Second pass: if the first pass returns no servers, calls the wrapped selector again with all servers (including deprioritized ones)</li> | ||
| * </ol> |
There was a problem hiding this comment.
[optional]
| * The returned {@link ServerSelector} wraps the provided selector and attempts server selection in two passes: | |
| * <ol> | |
| * <li>First pass: calls the wrapped selector with only non-deprioritized {@link ServerDescription}s</li> | |
| * <li>Second pass: if the first pass returns no servers, calls the wrapped selector again with all servers (including deprioritized ones)</li> | |
| * </ol> | |
| * The returned {@link ServerSelector} wraps the provided selector and attempts | |
| * {@linkplain ServerSelector#select(ClusterDescription) server selection} in two passes: | |
| * <ol> | |
| * <li>First pass: selects using the wrapped selector with only non-deprioritized {@link ServerDescription}s.</li> | |
| * <li>Second pass: if the first pass selects no {@link ServerDescription}s, | |
| * selects using the wrapped selector again with all {@link ServerDescription}s, including deprioritized ones.</li> | |
| * </ol> |
Relevant specification changes:
JAVA-6021,
JAVA-6074,
JAVA-6105,
JAVA-6114