mmaprototype: extract analyzeFunc #158175

wenyihu6 · 2025-11-21T05:20:52Z

Epic: CRDB-55052
Part of: #158163
Release note: none

mmaprototype: extract analyzeFunc

This commit moves analyzeFunc out of finishInit into its own helper improve
readability.

mmaprototype: extract doneFunc

This commit moves doneFunc out of analyzeFunc into its own helper function to
improve readability.

mmaprototype: add more comments for finishInit

This commit adds more comments for rangeAnalyzedConstraints.finishInit.

mmaprototype: extract diversity score computation

This commit moves diversity score computation out of analyzeFunc into its own
helper function to improve readability.

This commit moves analyzeFunc out of finishInit into its own helper improve readability.

This commit moves doneFunc out of analyzeFunc into its own helper function to improve readability.

This commit adds more comments for rangeAnalyzedConstraints.finishInit.

This commit moves diversity score computation out of analyzeFunc into its own helper function to improve readability.

blathers-crl · 2025-11-21T05:20:59Z

It looks like your PR touches production code but doesn't add or edit any test code. Did you consider adding tests to your PR?

_{🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.}

cockroach-teamcity · 2025-11-21T05:21:17Z

This change is

tbg

This LGTM in that it doesn't seem to break anything and the comments are 💯x better. But I'd need to spend a few more hours on the constraints code that I haven't spent yet to really say that everything in the comments is right.
Better let @sumeerbhola have a pass too, if only to settle your remaining questions, which I can't help with right now.

@tbg reviewed 1 of 1 files at r1, 1 of 1 files at r2, 2 of 2 files at r3, 1 of 1 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @sumeerbhola)

pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1040 at r4 (raw file):
I feel like this would be easier to grok if this were framed as "we needed a function with certain properties. Here is a function". As is, from reading the comments, I have no idea what it does, and would probably need to look at tests.

But fundamentally, we are comparing slices and want to measure how "diverse" they are. This smells like a thing mathematicians have thought about before.

I asked Cursor about this and it said:

// diversityScore computes a normalized prefix-based distance metric between two
// locality tier sequences. This is mathematically equivalent to a normalized
// Longest Common Prefix (LCP) distance, which measures how far two hierarchical
// paths diverge in a tree structure.
//
// The metric is conceptually similar to:
// - Taxonomic distance in biology (measuring divergence in classification trees)
// - Tree path distance (normalized distance between two paths from root to leaf)
// - Normalized edit distance (simplified to only consider prefix differences)
//
// Given two sorted sequences of strings representing hierarchical locality tiers
// (e.g., [region, zone, rack]), the score is computed as:
// - If they differ at position i: (length - i) / length
// - If identical: 0 (minimum diversity, maximum similarity)
// - If no common prefix (i=0): 1.0 (maximum diversity, minimum similarity)
//
// The choice of 1.0 for maximum diversity (when sequences share no common prefix)
// is a practical normalization to avoid dealing with infinities while maintaining
// a bounded metric in [0, 1].

I don't know if this is exactly correct - we do a lot of summing things up and normalizing - but somehow mentioning that the base concept here is "length-mismatchPos/length" should give readers a pretty good grasp on what this "does".

pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 705 at r3 (raw file):

	// since we may populate satisfiedByReplica[nonVoterIndex][i] for voter
	// constraints as well. This just means that the store can satisfy
	// constraints[i] regardless of the replica type.

You kind of lost me here but in a good way, making me realize I don't actually know how this works. I've tried a few times to write my best understanding here but it always comes up garbled. I think the root of my confusion is: we have constraints and we have voter constraints. Yet, analyzedConstraints has no idea which it is (I think?). It knows which stores have voters and which ones have non-voters. It just computes which stores match which constraint and divvies this up by voter/non-voter status.
Let's say each constraint in the conjunction has a different set of voters satisfying it, and the numbers match up with the num_replicas in each constraint. Surely then the constraint, interpreted as a voter constraint, would be satisfied?
Some examples of what is and is not true in general would really help me grok this.

pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 855 at r3 (raw file):

// analyzeFunc analyzes the current replica set and determines which constraints
// replicas satisfy, populating ac.satisfiedByReplica and
// ac.satisfiedNoConstraintReplica (both empty at the beginning). They are later

I think I saw that this method is invoked multiple times. Is the space just going to be re-used? Or am I misremembering the multiple calls thing?

pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 873 at r5 (raw file):

// phase 3. A constraint may remain under-satisfied. If a store is
// under-satisfied in phase 2, it cannot be corrected in phase 3 and will
// continue to be under-satisfied. (TODO: Is this right during review)

I wish I could tell you. This is currently all out of my league.

pkg/kv/kvserver/allocator/mmaprototype/constraint.go line 1002 at r4 (raw file):

}

// diversityFunc computes the diversity score between two sets of stores.

Does this have any meaningful testing? Might be a good one for @angeladietz's queue otherwise.

wenyihu6 added 4 commits November 21, 2025 00:20

mmaprototype: extract analyzeFunc

01e7511

This commit moves analyzeFunc out of finishInit into its own helper improve readability.

mmaprototype: extract doneFunc

cef090f

This commit moves doneFunc out of analyzeFunc into its own helper function to improve readability.

mmaprototype: add more comments for finishInit

b9fb467

This commit adds more comments for rangeAnalyzedConstraints.finishInit.

mmaprototype: extract diversity score computation

93c9a08

This commit moves diversity score computation out of analyzeFunc into its own helper function to improve readability.

wenyihu6 requested review from a team as code owners November 21, 2025 05:20

wenyihu6 requested review from sumeerbhola and tbg November 21, 2025 05:21

fixup! mmaprototype: add more comments for finishInit

83131a1

tbg approved these changes Nov 21, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

mmaprototype: extract analyzeFunc #158175

mmaprototype: extract analyzeFunc #158175

wenyihu6 commented Nov 21, 2025 •

edited

Loading

Uh oh!

blathers-crl bot commented Nov 21, 2025

Uh oh!

cockroach-teamcity commented Nov 21, 2025

Uh oh!

tbg left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mmaprototype: extract analyzeFunc #158175

Are you sure you want to change the base?

mmaprototype: extract analyzeFunc #158175

Conversation

wenyihu6 commented Nov 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

blathers-crl bot commented Nov 21, 2025

Uh oh!

cockroach-teamcity commented Nov 21, 2025

Uh oh!

tbg left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wenyihu6 commented Nov 21, 2025 •

edited

Loading