-
Notifications
You must be signed in to change notification settings - Fork 4.1k
release-26.1: mmaintegration: add StoreStatus to mma status translation plumbing #160844
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
release-26.1: mmaintegration: add StoreStatus to mma status translation plumbing #160844
Conversation
This commit adds to document identified gaps from pr discussion cockroachdb#158473.
Previously, storepool's StoreStatus was not exported and only used internally by storepool. This commit exports StoreStatus so that mma integration can consume the same store-level signals that SMA relies and translate them into mma's (Health, Disposition) model. An alternative approach would be to have store pool handle the translation of the store status to mma's (Health, Disposition) model. But this would couple storepool to mma and blur ownership; storepool should remain as the source of truth for store status. MMA owns the interpretation.
This commit removes `sp *storepool.StorePool` field in `mmaStoreRebalancer` given it being unused. Part of cockroachdb#156776 Epic: CRDB-55052
This commit adds the translation layer from StoreStatus to mma's (Health, Disposition) model. sma currently relies on StorePool methods (GetStoreList, LiveAndDeadReplicas) which internally compute StoreStatus using: NodeLivenessFunc (membership + health) combined with other signals (throttling, suspect state) to determine StoreStatus. To preserve sma's behavior, mma reuses StorePool's status() method and translates it to its own (Health, Disposition) model rather than re-deriving health independently. Alternatives considered (and rejected): 1. Query NodeLiveness directly in mma: NodeLiveness operates at the node level, while StorePool tracks per-store state so store status on the same node can diverge based on gossip timing and store specific signals. In addition, NodeLiveness does not include other store signals such as throttling (snapshot backpressure) and suspect status (recently unavailable) which are currently used by sma to filter candidates when making lease/replica placement decisions. 2. Periodically poll storepool from mma Statuses are plumbed before ComputeChanges() instead of periodically in another goroutine.It is more complex, may be stale and less efficient. mma currently only needs updated health statuses for ComputeChanges. Note that the translation goes through allocator sync, not directly in mmaprototype, to avoid importing storepool there and keep layering clean.
This commit updates asis's mma store rebalancer to refresh store status before calling ComputeChanges(), matching production behavior.
This commit changes Decommissioning LeaseDispositionShedding to refusing and Unknown LeaseDispositionRefusing to shedding. A decommissioning store is healthy and leaving - it shouldn't accept new leases, but existing leases can remain and relies on replica replacement triggering lease transfer. An unknown store may be unreachable, and since the leaseholder is important for availability and lease transfers are cheap, proactively shedding leases; this matches SMA behavior. Part of: cockroachdb#156776 Epic: CRDB-55052
|
Thanks for opening a backport. Before merging, please confirm that the change does not break backwards compatibility and otherwise complies with the backport policy. Include a brief release justification in the PR description explaining why the backport is appropriate. All backports must be reviewed by the TL for the owning area. While the stricter LTS policy does not yet apply, please exercise judgment and consider gating non-critical changes behind a disabled-by-default feature flag when appropriate. |
Backport:
Please see individual PRs for details.
/cc @cockroachdb/release
Release justification: low risk mmaprototype work