Skip to content

Commit c9e31f2

Browse files
committed
Introduce "hardRemoved" concept for marking nodes permanently offline, to permit recovery from availability loss
Also fix: - Paxos should not update minHlc if AccordService is not setup - Remove EndpointMapper methods that require non-null, to ensure call-sites handle null case explicitly - AccordRepair should specify the Node Ids that must participate, but these should not affect the durability requirements for GC - Avoid erroneously marking UniversalOrInvalidated when only known to quorum - AccordSyncPropagator should not consult FailureDetector as it may throw UnknownEndpointException - Refuse ReplaceSameAddress if accord enabled - Reset Accord topologies on restart if no local command stores to ensure we can catch up (as intervening epochs may have otherwise been GC'd) - MaybeRecover should not abort if found durable if either: 1) prevProgressToken already witnessed durable; or 2) Home state has requested we recover anyway - Invariant in MaybeRecover is too strong now we permit recovering when known to be stable - ReadData must respond with InsufficientAndWaiting if not Stable to ensure Durability is inferred correctly by ExecuteTxn - Avoid StackOverflowException in NotifyWaitingOn - PreAccept should honour the REJECTED flag of any member of the latest topology if the vote would be relied upon for any prior epoch's quorum - Resume Accord bootstrap on restart - Fix burn test non-determinism - Reset Accord topologies on restart if no local command stores to ensure we can catch up (as intervening epochs may have otherwise been GC'd) - Exclude stale or removed ids from sync points - Terminate a failed DurabilityRequest if a required participant is now removed/hardRemoved Also improve: - Introduce accord_debug_keyspace.{listeners_txn, listeners_local} - Filter and record faulty in AbstractCoordination - If we are the home shard, and our home progress status is done, we should not wait for the home shard to advance the waiting progress state - Introduce CommandStore.tryExecuteListening to try to execute all waiting transactions and their dependencies - ExecuteTxn on recovery should not wait, to avoid consuming progress log concurrency slots for transactions that cannot execute - HomeState should update its phase based on the command status to handle operator request to retry all states - Separate home/wait queue in DefaultProgressLog - Additional tracing - Do not call trySendMore from prerecordFailure - Force full recovery for sync points if we have lost quorum patch by Benedict; reviewed by Alex Petrov for CASSANDRA-20921
1 parent 0db9d4d commit c9e31f2

File tree

79 files changed

+2793
-1073
lines changed

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

79 files changed

+2793
-1073
lines changed

modules/accord

Submodule accord updated 107 files

src/java/org/apache/cassandra/config/AccordSpec.java

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -143,11 +143,12 @@ public enum QueueSubmissionModel
143143
public String slow_syncpoint_preaccept = "10s";
144144
public String slow_txn_preaccept = "30ms <= p50*2 <= 100ms";
145145
public String slow_read = "30ms <= p50*2 <= 100ms";
146-
public StringRetryStrategy retry_syncpoint = new StringRetryStrategy("10s*attempts <= 600s");
147-
public StringRetryStrategy retry_durability = new StringRetryStrategy("10s*attempts <= 600s");
148-
public StringRetryStrategy retry_bootstrap = new StringRetryStrategy("10s*attempts <= 600s");
149-
public StringRetryStrategy retry_fetch_min_epoch = new StringRetryStrategy("200ms...1s*attempts <= 1s,retries=3");
150-
public StringRetryStrategy retry_fetch_topology = new StringRetryStrategy("200ms...1s*attempts <= 1s,retries=100");
146+
public StringRetryStrategy retry_syncpoint = new StringRetryStrategy("10s*attempt <= 600s");
147+
public StringRetryStrategy retry_durability = new StringRetryStrategy("10s*attempt <= 600s");
148+
public StringRetryStrategy retry_bootstrap = new StringRetryStrategy("10s*attempt <= 600s");
149+
public StringRetryStrategy retry_join_bootstrap = new StringRetryStrategy("30s*attempt,attempts=5");
150+
public StringRetryStrategy retry_fetch_min_epoch = new StringRetryStrategy("200ms...1s*attempt <= 1s,retries=3");
151+
public StringRetryStrategy retry_fetch_topology = new StringRetryStrategy("200ms...1s*attempt <= 1s,retries=100");
151152
public StringRetryStrategy retry_journal_index_ready = new StringRetryStrategy("100ms");
152153

153154
public volatile DurationSpec.IntSecondsBound fast_path_update_delay = null;

src/java/org/apache/cassandra/db/virtual/AccordDebugKeyspace.java

Lines changed: 476 additions & 45 deletions
Large diffs are not rendered by default.

src/java/org/apache/cassandra/db/virtual/AccordDebugRemoteKeyspace.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -26,6 +26,6 @@ public class AccordDebugRemoteKeyspace extends RemoteToLocalVirtualKeyspace
2626

2727
public AccordDebugRemoteKeyspace(String name, VirtualKeyspace wrap)
2828
{
29-
super(name, wrap);
29+
super(name, wrap, vt -> !vt.name().equals(AccordDebugKeyspace.NODE_OPS));
3030
}
3131
}

src/java/org/apache/cassandra/db/virtual/VirtualKeyspaceRegistry.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -43,7 +43,7 @@ public void register(VirtualKeyspace keyspace)
4343
VirtualKeyspace previous = virtualKeyspaces.put(keyspace.name(), keyspace);
4444
// some tests choose to replace the keyspace, if so make sure to cleanup tables as well
4545
if (previous != null)
46-
previous.tables().forEach(t -> virtualTables.remove(t));
46+
previous.tables().forEach(t -> virtualTables.remove(t.metadata().id));
4747
keyspace.tables().forEach(t -> virtualTables.put(t.metadata().id, t));
4848
}
4949

src/java/org/apache/cassandra/locator/InetAddressAndPort.java

Lines changed: 7 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -63,7 +63,7 @@
6363
* need to sometimes return a port and sometimes not.
6464
*
6565
*/
66-
public final class InetAddressAndPort extends InetSocketAddress implements Comparable<InetAddressAndPort>, Serializable
66+
public final class InetAddressAndPort extends InetSocketAddress implements Comparable<InetAddressAndPort>, Serializable, Endpoint
6767
{
6868
private static final long serialVersionUID = 0;
6969
private static final Logger logger = LoggerFactory.getLogger(InetAddressAndPort.class);
@@ -382,6 +382,12 @@ static int getDefaultPort()
382382
return defaultPort;
383383
}
384384

385+
@Override
386+
public InetAddressAndPort endpoint()
387+
{
388+
return this;
389+
}
390+
385391
public static final class MetadataSerializer implements org.apache.cassandra.tcm.serialization.MetadataSerializer<InetAddressAndPort>
386392
{
387393
public static final MetadataSerializer serializer = new MetadataSerializer();

src/java/org/apache/cassandra/repair/RepairJob.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -192,7 +192,7 @@ protected void runRepair()
192192
requireAllEndpoints = true;
193193
}
194194
logger.info("{} {}.{} starting accord repair, require all endpoints {}", session.previewKind.logPrefix(session.getId()), desc.keyspace, desc.columnFamily, requireAllEndpoints);
195-
AccordRepair repair = new AccordRepair(ctx, cfs, desc.sessionId, desc.keyspace, desc.ranges, requireAllEndpoints);
195+
AccordRepair repair = new AccordRepair(ctx, cfs, desc.sessionId, desc.keyspace, desc.ranges, requireAllEndpoints, allEndpoints);
196196
return repair.repair(taskExecutor).flatMap(accordRepairResult -> {
197197
// Propagate the HLC discovered during Accord repair to Paxos so Paxos doesn't use ballots < Accord has already used
198198
if (accordRepairResult.maxHlc != IAccordService.NO_HLC)

src/java/org/apache/cassandra/service/RetryStrategy.java

Lines changed: 7 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -276,6 +276,13 @@ public static RetryStrategy parse(String spec, LatencySourceFactory latencies, W
276276
default: throw new IllegalArgumentException("Invalid modifier specification: unrecognised property '" + key + '\'');
277277
case "retries":
278278
retries = Integer.parseInt(value);
279+
if (retries < 0)
280+
throw new IllegalArgumentException("retries must be non-negative (retries=" + value + " supplied)");
281+
break;
282+
case "attempts":
283+
retries = Integer.parseInt(value);
284+
if (retries < 0)
285+
throw new IllegalArgumentException("Must permit at least one attempt (attempts=" + value + " supplied)");
279286
break;
280287
case "rnd":
281288
if (randomizer != null)

src/java/org/apache/cassandra/service/TimeoutStrategy.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -85,7 +85,7 @@ public class TimeoutStrategy implements WaitStrategy
8585
static final Pattern WAIT = Pattern.compile(
8686
"\\s*(?<const>0|[0-9]+[mu]?s)" +
8787
"|\\s*((p(?<perc>[0-9]+)(\\((?<rw>r|w|rw|wr)\\))?)?|(?<constbase>0|[0-9]+[mu]?s))" +
88-
"\\s*(([*]\\s*(?<mod>[0-9.]+))?\\s*(?<modkind>[*^]\\s*attempts)?)?\\s*");
88+
"\\s*(([*]\\s*(?<mod>[0-9.]+))?\\s*(?<modkind>[*^]\\s*attempts?)?)?\\s*");
8989
static final Pattern TIME = Pattern.compile(
9090
"0|[0-9]+[mu]?s");
9191

src/java/org/apache/cassandra/service/accord/AccordCommandStore.java

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -474,7 +474,7 @@ public AccordCommandStoreReplayer replayer()
474474
@Override
475475
protected void ensureDurable(Ranges ranges, RedundantBefore onCommandStoreDurable)
476476
{
477-
if (!CommandsForKey.reportLinearizabilityViolations())
477+
if (node().isReplaying())
478478
return;
479479

480480
long reportId = nextDurabilityLoggingId.incrementAndGet();

0 commit comments

Comments
 (0)