Skip to content

Commit aa46307

Browse files
craig[bot]wenyihu6
andcommitted
Merge #154499
154499: allocatorimpl: consult with mma for replicate queue rebalancing r=wenyihu6 a=wenyihu6 Part of: #153520 Epic: CRDB-55052 Release note: none --- **allocatorimpl: plumb allocatorSync to Allocator** This commit adds allocatorSync to the Allocator struct. In the future, Allocator will call mma directly to check if a change conflicts with mma's goals and could cause thrashing, skipping candidates that do. --- **allocatorimpl: consult with mma for replicate queue rebalancing** Previously, LBRebalancingMultiMetricAndCount allowed both MMA- and count-based rebalancing, which could lead to conflicting goals and thrashing. This commit updates the replicate queue to consult MMA on the final rebalancing target selected. Critical rebalances that repair constraints, disk fullness, or diversity bypass the MMA check; all other rebalances consult MMA before proceeding. Some asim tests showed even more thrashing, possibly due to two factors: 1. The replicate queue may have a stale view of in-progress changes. 2. The mean calculation of replica count now includes overloaded stores that might not be viable targets, leading to suboptimal decisions. For example, the queue may rebalance from a slightly overfull store toward an around-the-mean store, but the mean can be deflated by overloaded stores with very low range counts. As a result, slightly overfull store might appear to deserve a rebalance even though it won't be able to pick a very good candidate in the end. We are okay with this for now and will revisit if needed. The lease queue does not yet apply this check; future commits will extend it. I plan to add more data-driven tests to ensure that constraints are still repaired even when MMA rejects certain actions. These will be included in follow-up PRs. --- **mmaintegration: add TestingKnobs to AllocatorSync** This commit adds a TestingKnobs field to AllocatorSync and plumbs it through, enabling future commits to add tests that override decisions in InConflictWithMMA. --- **allocatorimpl: add TestAllocatorRebalanceMMAConflict** This commit adds TestAllocatorRebalanceMMAConflict, which verifies that allocator.RebalanceTarget allows critical rebalances to proceed even when they conflict with MMA's goals, and ensures it consults MMA correctly for non-critical rebalances. --- **mmaprototype: use meansLoad in MMARebalanceAdvisor** This commit switches to using meansLoad and passes the stores slice directly to computeMeansForStoreSet. Once the other PR is merged and this one is rebased, we should fix-up this commit to allocatorimpl: consult with mma for replicate queue rebalancing. --- **allocatorimpl: remove leftover log lines** This commit removes leftover fmt.Println debug logs. --- **mmaprototype: return a pointer for BuildMMARebalanceAdvisor** This commit updates BuildMMARebalanceAdvisor to return a pointer, since future changes will require passing it around and modifying its fields. --- **allocatorimpl: populate advisor under rebalanceOptions** Previously, `bestRebalanceTarget` took a map and populated it with (option index->advisor) during the loop. This commit changes it to populate the advisor directly within the `rebalanceOptions` struct as a field. --- **mmaprototype: lazily initialize existingStoreSLS** Previously, `MMARebalanceAdvisor` built `existingStoreSLS` during `BuildMMARebalanceAdvisor`. This commit changes it to store only `existingStoreID` at `BuildMMARebalanceAdvisor`, and construct the actual `existingStoreSLS` during the first call to `IsInConflictWithMMA`. This has two advantages: (1) `existingStoreID` is now part of the advisor struct, making it available for logging in `IsInConflictWithMMA`; and (2) only `IsInConflictWithMMA` needs a context for logging, which makes sense for it to have one. --- **mmaprototype: pass context properly** This commit ensures `IsInConflictWithMMA` receives a proper context so that its logs appear in traces. --- **mmaprototype: improve logs with IsInConflictWithMMA** This commit enhances `IsInConflictWithMMA` logs so they appear in traces and under vmodule, improving visibility into MMA rejections. --- **allocatorimpl: improve comments for recent changes** This commit updates comments to reflect changes made in previous commits. --- **allocatorimpl: fixes TestBestRebalanceTarget** This commit fixes a test that previously panic because `nil` was passed for `allocatorSync` in `bestRebalanceTarget`. --- **mmaprototype: move MMARebalanceAdvisor to rebalance_advisor.go** This commit moves `MMARebalanceAdvisor` related code from `allocator_state.go` to `rebalance_advisor.go`. --- **allocatorimpl: remove CountBasedRebalancingOnlyEnabledByMMA** This commit removes `CountBasedRebalancingOnlyEnabledByMMA`, which was left unused. --- **mmaprototype: improve comments and logs** This commit resolves misc comments from PR review. Co-authored-by: wenyihu6 <[email protected]>
2 parents 1e387fd + 32abeb0 commit aa46307

23 files changed

+763
-91
lines changed

pkg/kv/kvserver/allocator/allocatorimpl/BUILD.bazel

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -15,12 +15,14 @@ go_library(
1515
"//pkg/kv/kvpb",
1616
"//pkg/kv/kvserver/allocator",
1717
"//pkg/kv/kvserver/allocator/load",
18+
"//pkg/kv/kvserver/allocator/mmaprototype",
1819
"//pkg/kv/kvserver/allocator/storepool",
1920
"//pkg/kv/kvserver/constraint",
2021
"//pkg/kv/kvserver/kvflowcontrol/rac2",
2122
"//pkg/kv/kvserver/kvserverbase",
2223
"//pkg/kv/kvserver/liveness",
2324
"//pkg/kv/kvserver/liveness/livenesspb",
25+
"//pkg/kv/kvserver/mmaintegration",
2426
"//pkg/kv/kvserver/raftutil",
2527
"//pkg/raft",
2628
"//pkg/raft/raftpb",
@@ -53,11 +55,13 @@ go_test(
5355
"//pkg/kv/kvpb",
5456
"//pkg/kv/kvserver/allocator",
5557
"//pkg/kv/kvserver/allocator/load",
58+
"//pkg/kv/kvserver/allocator/mmaprototype",
5659
"//pkg/kv/kvserver/allocator/storepool",
5760
"//pkg/kv/kvserver/constraint",
5861
"//pkg/kv/kvserver/kvflowcontrol/rac2",
5962
"//pkg/kv/kvserver/liveness",
6063
"//pkg/kv/kvserver/liveness/livenesspb",
64+
"//pkg/kv/kvserver/mmaintegration",
6165
"//pkg/kv/kvserver/replicastats",
6266
"//pkg/raft",
6367
"//pkg/raft/raftpb",
@@ -72,6 +76,7 @@ go_test(
7276
"//pkg/util/log",
7377
"//pkg/util/metric",
7478
"//pkg/util/stop",
79+
"//pkg/util/timeutil",
7580
"//pkg/util/tracing",
7681
"@com_github_cockroachdb_errors//:errors",
7782
"@com_github_kr_pretty//:pretty",

pkg/kv/kvserver/allocator/allocatorimpl/allocator.go

Lines changed: 70 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,6 +21,7 @@ import (
2121
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/constraint"
2222
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvflowcontrol/rac2"
2323
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/kvserverbase"
24+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/mmaintegration"
2425
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/raftutil"
2526
"github.com/cockroachdb/cockroach/pkg/raft"
2627
"github.com/cockroachdb/cockroach/pkg/raft/raftpb"
@@ -619,6 +620,7 @@ type AllocatorMetrics struct {
619620
// in the cluster.
620621
type Allocator struct {
621622
st *cluster.Settings
623+
as *mmaintegration.AllocatorSync
622624
deterministic bool
623625
nodeLatencyFn func(nodeID roachpb.NodeID) (time.Duration, bool)
624626
// TODO(aayush): Let's replace this with a *rand.Rand that has a rand.Source
@@ -658,6 +660,7 @@ func makeAllocatorMetrics() AllocatorMetrics {
658660
// close coupling with the StorePool.
659661
func MakeAllocator(
660662
st *cluster.Settings,
663+
as *mmaintegration.AllocatorSync,
661664
deterministic bool,
662665
nodeLatencyFn func(nodeID roachpb.NodeID) (time.Duration, bool),
663666
knobs *allocator.TestingKnobs,
@@ -670,6 +673,7 @@ func MakeAllocator(
670673
}
671674
allocator := Allocator{
672675
st: st,
676+
as: as,
673677
deterministic: deterministic,
674678
nodeLatencyFn: nodeLatencyFn,
675679
randGen: makeAllocatorRand(randSource),
@@ -1868,11 +1872,76 @@ func (a Allocator) RebalanceTarget(
18681872
// would, we don't want to actually rebalance to that target.
18691873
var target, existingCandidate *candidate
18701874
var removeReplica roachpb.ReplicationTarget
1875+
1876+
// The loop below can iterate multiple times. This is because the
1877+
// (source,target) pair chosen by bestRebalanceTarget may be rejected either
1878+
// by the multi-metric allocator or by the check that a moved replica wouldn't
1879+
// immediately be removable. bestRebalanceTarget mutates the candidate slice
1880+
// for the given option (=source) to exclude each considered candidate. For
1881+
// example, the loop may proceed as follows:
1882+
// - initially, we may consider rebalancing from s1 to either of s2, s3, or
1883+
// s5, or rebalancing from s6 to either of s2 or s4: options = [s1 ->
1884+
// [s2,s3,s5], s6 -> [s2,s4]]. Each option here is considered as an
1885+
// equivalence class.
1886+
// - bestRebalanceTarget might pick s6->s4, and removes this choice from
1887+
// `options`. options now becomes [s1->[s2,s3,s5], s6 -> [s2]].
1888+
// - mma might reject s6->s4, so we loop around.
1889+
// - next, s1->s3 might be chosen, but fail the removable replica check.
1890+
// - so we'll begin a third loop with: options is now [s1->[s2,s5], s6 ->
1891+
// [s2]].
1892+
// - s6->s2 might be chosen and might succeed, terminating the loop and
1893+
// proceeding to make the change.
1894+
//
1895+
// Note that in general (and in the example) a source store can be considered
1896+
// multiple times (s6 is considered twice), so we cache the corresponding MMA
1897+
// advisor to avoid potentially expensive O(store) recomputations. The
1898+
// corresponding advisor is constructed only once and cached in
1899+
// results[bestIdx].advisor when the the source store is selected as the best
1900+
// rebalance target for the first time. After that, bestRebalanceTarget is
1901+
// free to mutate the cands set of the option. However, MMARebalancerAdvisor
1902+
// should use the original candidate set union the existing store to compute
1903+
// the load summary when calling IsInConflictWithMMA. It does so by using the
1904+
// computed meansLoad summary cached when this option was selected as the best
1905+
// rebalance target for the first time.
1906+
var bestIdx int
1907+
1908+
// NB: bestRebalanceTarget may modify the candidate set (cands) within each
1909+
// option in results. However, for each option, the associated source store,
1910+
// MMARebalanceAdvisor, and their index in results must remain unchanged
1911+
// throughout the process. This ensures that any cached MMARebalanceAdvisor
1912+
// continues to correspond to the original candidate set and source store,
1913+
// even as candidates are removed.
18711914
for {
1872-
target, existingCandidate = bestRebalanceTarget(a.randGen, results)
1915+
target, existingCandidate, bestIdx = bestRebalanceTarget(a.randGen, results, a.as)
18731916
if target == nil {
18741917
return zero, zero, "", false
18751918
}
1919+
if bestIdx == -1 {
1920+
log.KvDistribution.Fatalf(ctx, "programmer error: bestIdx is -1 when target is not nil")
1921+
}
1922+
1923+
// Skip mma conflict checks for critical rebalances, which repairs a bad
1924+
// state such as constraint violation, disk-fullness, and diversity
1925+
// improvements.
1926+
if !existingCandidate.isCriticalRebalance(target) {
1927+
// If the rebalance is not critical, we check if it conflicts with mma's
1928+
// goal. advisor for bestIdx should always be cached by
1929+
// bestRebalanceTarget. If mma rejects the rebalance, we will continue to
1930+
// the next target. Note that bestRebalanceTarget would delete this target
1931+
// from the candidates set when being selected, so this target will not be
1932+
// selected again.
1933+
if advisor := results[bestIdx].advisor; advisor != nil {
1934+
if a.as.IsInConflictWithMMA(ctx, target.store.StoreID, advisor, false) {
1935+
continue
1936+
}
1937+
} else {
1938+
if buildutil.CrdbTestBuild {
1939+
log.KvDistribution.Fatalf(ctx, "expected to find MMA handle for idx %d", bestIdx)
1940+
} else {
1941+
log.KvDistribution.Errorf(ctx, "expected to find MMA handle for idx %d", bestIdx)
1942+
}
1943+
}
1944+
}
18761945

18771946
// Add a fake new replica to our copy of the replica descriptor so that we can
18781947
// simulate the removal logic. If we decide not to go with this target, note

pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer.go

Lines changed: 56 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,10 @@ import (
1414
"time"
1515

1616
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/load"
17+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/mmaprototype"
1718
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool"
1819
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/constraint"
20+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/mmaintegration"
1921
"github.com/cockroachdb/cockroach/pkg/roachpb"
2022
"github.com/cockroachdb/cockroach/pkg/settings"
2123
"github.com/cockroachdb/cockroach/pkg/util/admission/admissionpb"
@@ -853,6 +855,35 @@ func (c candidate) less(o candidate) bool {
853855
return c.compare(o) < 0
854856
}
855857

858+
// isCriticalRebalance returns true if the rebalance from source to target is
859+
// considered as a critical rebalance to repair constraints, disk fullness, or
860+
// diversity improvements.
861+
func (source candidate) isCriticalRebalance(target *candidate) bool {
862+
// valid is better.
863+
if !source.valid && target.valid {
864+
return true
865+
}
866+
// !fullDisk is better.
867+
if source.fullDisk && !target.fullDisk {
868+
return true
869+
}
870+
// necessary is better.
871+
if !source.necessary && target.necessary {
872+
return true
873+
}
874+
// voterNecessary is better.
875+
if !source.voterNecessary && target.voterNecessary {
876+
return true
877+
}
878+
// higher diversityScore is better.
879+
if !scoresAlmostEqual(source.diversityScore, target.diversityScore) {
880+
if target.diversityScore > source.diversityScore {
881+
return true
882+
}
883+
}
884+
return false
885+
}
886+
856887
// compare is analogous to strcmp in C or string::compare in C++ -- it returns
857888
// a positive result if c is a better fit for the range than o, 0 if they're
858889
// equivalent, or a negative result if o is a better fit than c. The magnitude
@@ -1389,6 +1420,11 @@ func candidateListForRemoval(
13891420
type rebalanceOptions struct {
13901421
existing candidate
13911422
candidates candidateList
1423+
// advisor is lazily initialized by bestRebalanceTarget when this option is
1424+
// selected as best rebalance target. It is used to determine if a candidate
1425+
// is in conflict with mma's goals when LBRebalancingMultiMetricAndCount mode
1426+
// is enabled.
1427+
advisor *mmaprototype.MMARebalanceAdvisor
13921428
}
13931429

13941430
// equivalenceClass captures the set of "equivalent" replacement candidates
@@ -1892,18 +1928,22 @@ func rankedCandidateListForRebalancing(
18921928
// bestRebalanceTarget returns the best target to try to rebalance to out of
18931929
// the provided options, and removes it from the relevant candidate list.
18941930
// Also returns the existing replicas that the chosen candidate was compared to.
1931+
// Also returns the index of the best target in the options slice.
18951932
// Returns nil if there are no more targets worth rebalancing to.
1933+
//
1934+
// Contract: responsible for making sure that the returned bestIdx has the
1935+
// corresponding MMA advisor in advisors.
18961936
func bestRebalanceTarget(
1897-
randGen allocatorRand, options []rebalanceOptions,
1898-
) (target, existingCandidate *candidate) {
1899-
bestIdx := -1
1937+
randGen allocatorRand, options []rebalanceOptions, as *mmaintegration.AllocatorSync,
1938+
) (target, existingCandidate *candidate, bestIdx int) {
1939+
bestIdx = -1
19001940
var bestTarget *candidate
19011941
var replaces candidate
19021942
for i, option := range options {
19031943
if len(option.candidates) == 0 {
19041944
continue
19051945
}
1906-
target := option.candidates.selectBest(randGen)
1946+
target = option.candidates.selectBest(randGen)
19071947
if target == nil {
19081948
continue
19091949
}
@@ -1915,14 +1955,24 @@ func bestRebalanceTarget(
19151955
}
19161956
}
19171957
if bestIdx == -1 {
1918-
return nil, nil
1958+
return nil, nil, -1 /*bestIdx*/
1959+
}
1960+
// For the first time an option in options is selected, build and cache the
1961+
// corresponding MMA advisor in advisors[bestIdx].
1962+
if options[bestIdx].advisor == nil {
1963+
stores := make([]roachpb.StoreID, 0, len(options[bestIdx].candidates))
1964+
for _, cand := range options[bestIdx].candidates {
1965+
stores = append(stores, cand.store.StoreID)
1966+
}
1967+
options[bestIdx].advisor = as.BuildMMARebalanceAdvisor(options[bestIdx].existing.store.StoreID, stores)
19191968
}
1969+
19201970
// Copy the selected target out of the candidates slice before modifying
19211971
// the slice. Without this, the returned pointer likely will be pointing
19221972
// to a different candidate than intended due to movement within the slice.
19231973
copiedTarget := *bestTarget
19241974
options[bestIdx].candidates = options[bestIdx].candidates.removeCandidate(copiedTarget)
1925-
return &copiedTarget, &options[bestIdx].existing
1975+
return &copiedTarget, &options[bestIdx].existing, bestIdx
19261976
}
19271977

19281978
// betterRebalanceTarget returns whichever of target1 or target2 is a larger

pkg/kv/kvserver/allocator/allocatorimpl/allocator_scorer_test.go

Lines changed: 10 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -18,6 +18,7 @@ import (
1818
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator"
1919
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/allocator/storepool"
2020
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/constraint"
21+
"github.com/cockroachdb/cockroach/pkg/kv/kvserver/mmaintegration"
2122
"github.com/cockroachdb/cockroach/pkg/roachpb"
2223
"github.com/cockroachdb/cockroach/pkg/util/leaktest"
2324
"github.com/cockroachdb/cockroach/pkg/util/log"
@@ -436,10 +437,18 @@ func TestBestRebalanceTarget(t *testing.T) {
436437
expectedTargets := []roachpb.StoreID{13, 13, 11, 12}
437438
expectedExistingRepls := []roachpb.StoreID{3, 2, 1, 1}
438439
allocRand := makeAllocatorRand(rand.NewSource(0))
440+
ctx := context.Background()
441+
stopper, _, _, a, _ := CreateTestAllocatorWithKnobs(ctx, 10, false, /* deterministic */
442+
nil /* allocator.TestingKnobs */, &mmaintegration.TestingKnobs{
443+
OverrideIsInConflictWithMMA: func(cand roachpb.StoreID) bool {
444+
return false
445+
},
446+
} /* mmaintegration.TestingKnobs */)
447+
defer stopper.Stop(ctx)
439448
var i int
440449
for {
441450
i++
442-
target, existing := bestRebalanceTarget(allocRand, candidates)
451+
target, existing, _ := bestRebalanceTarget(allocRand, candidates, a.as)
443452
if len(expectedTargets) == 0 {
444453
if target == nil {
445454
break

0 commit comments

Comments
 (0)