-
Notifications
You must be signed in to change notification settings - Fork 4k
mma: eliminate pendingChangeNoRollback #158024
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, no major concerns! Thanks for making this simplification, it's nice to see this streamlined further. Basically an LGTM, but I'm a bit fried today so @wenyihu6 should take a look. I also see that you had a TODO that you probably wanted to answer to yourself first.
@tbg reviewed 5 of 5 files at r1, all commit messages.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica line 93 at r1 (raw file):
store-id=1 range-id=1 load=[80,80,80] raft-cpu=20 config=(num_replicas=3 constraints={'+region=us-west-1:1'} voter_constraints={'+region=us-west-1:1'}) store-id=1 replica-id=1 type=VOTER_FULL leaseholder=true
oops.
pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_enacted_gc line 255 at r1 (raw file):
# TODO: why local-store-id=1?
What is the question here?
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 280 at r1 (raw file):
// NB: the pointers in change.pendingReplicaChanges are not the same as the // pointers in the internal clusterState. We look the latter up using the // changeID. The changes slice contains the internal pointers.
I was getting a little confused by this comment and I'm not sure what it's trying to explain to me. clusterState.pendingChanges is a keyed map. var changes is a slice. What is the confusion that could arise without the comment?
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 223 at r1 (raw file):
// // The prev field is mutable after creation, to ensure that an undo restores // the state to the latest source of truth from the leaseholder.
❤️
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 471 at r1 (raw file):
// *pendingReplicaChange object, which is mutable. To prevent race conditions // with exported functions on PendingRangeChange called from outside the // package, the *pendingReplicaChange objects returned outside the package are
This seems a little iffy. Have you thought about storing the pendingReplicaChange in just one place and having the other locations just store the ChangeID? That way there's only ever one master copy to update, and we can do it "by value" without worrying about this memory held anywhere else.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1602 at r1 (raw file):
} // We did not undo the load change above, or remove it from the various // pendingChanges data-structures. We do those things now.
Is this the only place in which this is done? There's enough going on here to not want to have this duplicated.
a600a0b to
f8cd9a8
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/allocator_state.go line 280 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I was getting a little confused by this comment and I'm not sure what it's trying to explain to me.
clusterState.pendingChangesis a keyed map.var changesis a slice. What is the confusion that could arise without the comment?
The comment is referencing the method parameter change.pendingReplicaChanges and just trying to ensure that the reader realizes that the pointers we successfully lookup in the clusterState.pendingChanges are not expected to be identical, in case someone later things that a pointer equality assertion is a good idea.
I've reworded this comment and moved it down:
// NB: the ch and c pointers are not identical even though they have the
// same changeID. We create two copies in
// clusterState.addPendingRangeChange, since the internal copy is mutable.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 471 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
This seems a little iffy. Have you thought about storing the pendingReplicaChange in just one place and having the other locations just store the ChangeID? That way there's only ever one master copy to update, and we can do it "by value" without worrying about this memory held anywhere else.
I considered it, but the PendingRangeChange we return to the outside world is used for actually retrieving the explicit changes for the caller to perform (and logging of course). For example, PendingRangeChange.ReplicationChanges() called by MMAStoreRebalancer. We cannot rely on the forgetful storage in MMA to retrieve those changes (due to some random slowness, those could be GC'd), and it is really better for it to be self contained.
What one could do is return a different type than PendingRangeChange -- but at that point we have created two types that are similarish (since the external change needs the changeID and some subset of ReplicaChange), and seemed like software over-engineering.
pkg/kv/kvserver/allocator/mmaprototype/cluster_state.go line 1602 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Is this the only place in which this is done? There's enough going on here to not want to have this duplicated.
Yes, this is the only place. The other places are decisive about undoing things (GC or AdjustPendingChangeDisposition). This is the one path where we have latest source of truth and first apply it to the membership state and then make a decision on what to do with the pending changes. Which is why the preceding if-block also has the so we pass !applyLoadChange comment.
pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_enacted_gc line 255 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
What is the question here?
oops, I forgot to investigate. It was to figure out why we have top-k-ranges for local-store-id=1. It was just how the test was setup. I've tweaked the test to show both the stale and fresh view of top-k, with some commentary.
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heads up I didn’t get to this today and will take a pass tomorrow. The main thing I wanted to look into is when prev becomes mutable after creation and may no longer be compatible with next as a pair, will that break any of the helper-function (like
| for _, c := range prc.pendingReplicaChanges { |
| if c.prev.IsLeaseholder && !c.next.IsLeaseholder { |
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg)
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main thing I wanted to look into is when
prevbecomes mutable after creation and may no longer be compatible withnextas a pair, will that break any of the helper-function
That is a good point. I'll make an attempt to clean this up more and ping when it is ready.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg)
Instead, the ReplicaChange.prev field is updated to reflect the latest state reported by the leaseholder. In addition to simplifyin the code, it fixes an existing issue where an undo could rollback to a state preceding the latest leaseholder state. Informs cockroachdb#157049 Epic: CRDB-55052 Release note: None
f8cd9a8 to
2ed96a3
Compare
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)
pkg/kv/kvserver/allocator/mmaprototype/testdata/cluster_state/rebalance_replica_local_stores_fail_lease_transfer line 73 at r3 (raw file):
# The replica from store s2 is removed because of GC. This is not correct. # # TODO: fix after cleanup PR is merged and this rebases on top.
I've added this test which indicates a problem.
This is related to the hazard raised by @wenyihu6
wenyihu6
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like me to hold off on the review, or should I continue reviewing with the understanding that this will be addressed in a follow-up PR?
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg)
Potential Bug(s) DetectedThe three-stage Claude Code analysis has identified potential bug(s) in this PR that may warrant investigation. Next Steps: Note: When viewing the workflow output, scroll to the bottom to find the Final Analysis Summary. After you review the findings, please tag the issue as follows:
|
sumeerbhola
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you like me to hold off on the review, or should I continue reviewing with the understanding that this will be addressed in a follow-up PR?
I've just created #158153 which will need to merge first, and this will be rebased on top of it.
Reviewable status:
complete! 0 of 0 LGTMs obtained (waiting on @tbg and @wenyihu6)
Instead, the ReplicaChange.prev field is updated to reflect the latest state reported by the leaseholder.
In addition to simplifyin the code, it fixes an existing issue where an undo could rollback to a state preceding the latest leaseholder state.
Informs #157049
Epic: CRDB-55052
Release note: None