Skip to content

Commit e5d78c3

Browse files
committed
kvserver: update splitPreApply comments
Epic: none Release note: none
1 parent 5798d47 commit e5d78c3

File tree

1 file changed

+21
-15
lines changed

1 file changed

+21
-15
lines changed

pkg/kv/kvserver/store_split.go

Lines changed: 21 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -46,12 +46,14 @@ func splitPreApply(
4646
r.StoreID(), split)
4747
}
4848

49-
// Check on the RHS, we need to ensure that it exists and has a minReplicaID
50-
// less than or equal to the replica we're about to initialize.
49+
// Obtain the RHS replica. In the common case, it exists and its ReplicaID
50+
// matches the one in the split trigger. It is the uninitialized replica that
51+
// has just been created or obtained in Replica.acquireSplitLock, and its
52+
// raftMu is locked.
5153
//
52-
// The right hand side of the split was already created (and its raftMu
53-
// acquired) in Replica.acquireSplitLock. It must be present here if it hasn't
54-
// been removed in the meantime (handled below).
54+
// In the less common case, the ReplicaID is already removed from this Store,
55+
// and rightRepl is either nil or an uninitialized replica with a higher
56+
// ReplicaID. Its raftMu is not locked.
5557
rightRepl := r.store.GetReplicaIfExists(split.RightDesc.RangeID)
5658
// Check to see if we know that the RHS has already been removed from this
5759
// store at the replica ID implied by the split.
@@ -76,6 +78,8 @@ func splitPreApply(
7678
// it's always present).
7779
var hs raftpb.HardState
7880
if rightRepl != nil {
81+
// TODO(pav-kv): rightRepl could have been destroyed by the time we get to
82+
// lock it here. The HardState read-then-write appears risky in this case.
7983
rightRepl.raftMu.Lock()
8084
defer rightRepl.raftMu.Unlock()
8185
// Assert that the rightRepl is not initialized. We're about to clear out
@@ -90,6 +94,10 @@ func splitPreApply(
9094
log.Fatalf(ctx, "failed to load hard state for removed rhs: %v", err)
9195
}
9296
}
97+
// TODO(#152199): the rightRepl == nil condition is flaky. There can be a
98+
// racing replica creation for a higher ReplicaID, and it can subsequently
99+
// update its HardState. Here, we can accidentally clear the HardState of
100+
// that new replica.
93101
if err := kvstorage.ClearRangeData(ctx, split.RightDesc.RangeID, readWriter, readWriter, kvstorage.ClearRangeDataOptions{
94102
// We know there isn't anything in these two replicated spans below in the
95103
// right-hand side (before the current batch), so setting these options
@@ -99,19 +107,17 @@ func splitPreApply(
99107
ClearReplicatedByRangeID: true,
100108
// See the HardState write-back dance above and below.
101109
//
102-
// TODO(tbg): we don't actually want to touch the raft state of the right
103-
// hand side replica since it's absent or a more recent replica than the
104-
// split. Now that we have a boolean targeting the unreplicated
105-
// RangeID-based keyspace, we can set this to false and remove the
106-
// HardState+ReplicaID write-back. (The WriteBatch does not contain
107-
// any writes to the unreplicated RangeID keyspace for the RHS, see
108-
// splitTriggerHelper[^1]).
110+
// TODO(tbg): we don't actually want to touch the raft state of the RHS
111+
// replica since it's absent or a more recent one than in the split. Now
112+
// that we have a bool targeting unreplicated RangeID-local keys, we can
113+
// set it to false and remove the HardState+ReplicaID write-back. However,
114+
// there can be historical split proposals with the RaftTruncatedState key
115+
// set in splitTriggerHelper[^1]. We must first make sure that such
116+
// proposals no longer exist, e.g. with a below-raft migration.
109117
//
110118
// [^1]: https://github.com/cockroachdb/cockroach/blob/f263a765d750e41f2701da0a923a6e92d09159fa/pkg/kv/kvserver/batcheval/cmd_end_transaction.go#L1109-L1149
111119
//
112-
// See also:
113-
//
114-
// https://github.com/cockroachdb/cockroach/issues/94933
120+
// See also: https://github.com/cockroachdb/cockroach/issues/94933
115121
ClearUnreplicatedByRangeID: true,
116122
}); err != nil {
117123
log.Fatalf(ctx, "failed to clear range data for removed rhs: %v", err)

0 commit comments

Comments
 (0)