CASSANDRA-20476 & CASSANDRA-20736 Handle CMS member addresses changing concurrently #4613
CASSANDRA-20476 & CASSANDRA-20736 Handle CMS member addresses changing concurrently #4613beobal wants to merge 23 commits intoapache:trunkfrom
Conversation
…ed before allowing startup to proceed
…og processing is suspended
|
|
||
| public ImmutableSet<NodeId> joiningMembers() | ||
| { | ||
| return ImmutableSet.copyOf(joiningMembers); |
There was a problem hiding this comment.
nit; BTreeSet is already immutable, we probably don't need to copy it
|
|
||
| /** | ||
| * Used to derive a CMSMembership when deserializing a ClusterMetadata instance written with a metadata version | ||
| * prior to V7. At that time, CMS membership was always inferred from the data placements of the distributed |
| } | ||
|
|
||
| private final Map<NodeId, Pair<InetAddressAndPort, InetAddressAndPort>> overrides; | ||
| private final BiMap<InetAddressAndPort, InetAddressAndPort> addressMap; |
There was a problem hiding this comment.
addressMap is only used in the toString method
| return new InitialBuilder(metadata); | ||
| } | ||
|
|
||
| private final Map<NodeId, Pair<InetAddressAndPort, InetAddressAndPort>> overrides; |
There was a problem hiding this comment.
this should probably be an ImmutableMap for clarity?
and if we make InitialBuilder and rebuild below build immutablemaps we can avoid the copying
| return state == State.ACTIVE; | ||
| } | ||
|
|
||
| public InetAddressAndPort getAddressOverride(NodeId id) |
| else | ||
| { | ||
| // This cluster did not previously upgrade from a gossip based version (i.e. pre-6.0) but did at some point | ||
| // run a version prior to MetadataVersion.V7 where we started to encode CMS membership directly. This |
| // so we can derive the CMSMembership using the data placement and directory. | ||
| DataPlacement placement = placements.get(metadataKs.params.replication); | ||
| cmsMembership = CMSMembership.reconstruct(placement, dir); | ||
| placements = placements.unbuild().without(metadataKs.params.replication).build(); |
There was a problem hiding this comment.
I think this is unnecessary - we do the same thing directly after the if stmt
| int currentRound = 0; | ||
| long roundTimeNanos = Math.min(TimeUnit.SECONDS.toNanos(4), | ||
| DatabaseDescriptor.getDiscoveryTimeout(TimeUnit.NANOSECONDS) / maxRounds); | ||
| // TODO a non-CMS node only needs to be able to contact a single CMS member to commit its STARTUP |
There was a problem hiding this comment.
should we fix this? It feels like we'll most often discover the full CMS if its up
and if it is not yet up, it might be better to wait here before trying to commit Startup?
|
|
||
| int maxRounds = 5; | ||
| int currentRound = 0; | ||
| long roundTimeNanos = Math.min(TimeUnit.SECONDS.toNanos(4), |
There was a problem hiding this comment.
is 4s enough here? Should we add another "discover survey" config setting?
| Map<NodeId, InetAddressAndPort> confirmedCMS = new HashMap<>(); | ||
|
|
||
| Set<InetAddressAndPort> candidates = new HashSet<>(previousCMS.values()); | ||
| candidates.add(newAddress); |
There was a problem hiding this comment.
any reason we don't add the seeds to candidates here? Feels like it could save us a discovery round
Changing broadcast address has always been supported, but it requires the node to inform the CMS of the change at startup. If a majority of the CMS members attempt to do this concurrently, they have no way to establish the quorum required to make those metadata changes, leading to a deadlocked startup.
This is addressed by the combination of 2 patchsets: