-
Notifications
You must be signed in to change notification settings - Fork 4.1k
mmaintegration: add StoreStatus to mma status translation plumbing #160462
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Previously, storepool's StoreStatus was not exported and only used internally by storepool. This commit exports StoreStatus so that mma integration can consume the same store-level signals that SMA relies and translate them into mma's (Health, Disposition) model. An alternative approach would be to have store pool handle the translation of the store status to mma's (Health, Disposition) model. But this would couple storepool to mma and blur ownership; storepool should remain as the source of truth for store status. MMA owns the interpretation.
|
@tbg still needs unit test but putting this up to check on the high-level design |
This commit adds the translation layer from StoreStatus to mma's (Health, Disposition) model. sma currently relies on StorePool methods (GetStoreList, LiveAndDeadReplicas) which internally compute StoreStatus using: NodeLivenessFunc (membership + health) combined with other signals (throttling, suspect state) to determine StoreStatus. To preserve sma's behavior, mma reuses StorePool's status() method and translates it to its own (Health, Disposition) model rather than re-deriving health independently. Alternatives considered (and rejected): 1. Query NodeLiveness directly in mma: NodeLiveness operates at the node level, while StorePool tracks per-store state so store status on the same node can diverge based on gossip timing and store specific signals. In addition, NodeLiveness does not include other store signals such as throttling (snapshot backpressure) and suspect status (recently unavailable) which are currently used by sma to filter candidates when making lease/replica placement decisions. 2. Periodically poll storepool from mma Statuses are plumbed before ComputeChanges() instead of periodically in another goroutine.It is more complex, may be stale and less efficient. mma currently only needs updated health statuses for ComputeChanges. Note that the translation goes through allocator sync, not directly in mmaprototype, to avoid importing storepool there and keep layering clean. The translation follows this mapping: | StorePool Status | MMA Health | Lease Disposition | Replica Disposition | Rationale | |------------------|-----------------|-------------------|---------------------|-----------------------------------------------------------------------| | Dead | HealthDead | Shedding | Shedding | Store is gone: shed everything | | Unknown | HealthUnknown | Refusing | Refusing | State is unknown: don't add but don't remove either | | Decommissioning | HealthOK | Shedding | Shedding | Store is leaving cluster: shed everything | | Draining | HealthOK | Shedding | Refusing | Store is draining: shed leases, accept replicas | | Throttled | HealthOK | OK | Refusing | Healthy but overlpaded: accept leases but not replicas | | Suspect | HealthUnhealthy | Shedding | Refusing | Recently unavailable: shed leases for safety and don't accept replicas| | Available | HealthOK | OK | OK | Healthy store: accept all |
This commit updates asis's mma store rebalancer to refresh store status before calling ComputeChanges(), matching production behavior.
tbg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
@tbg reviewed 12 files and all commit messages, and made 7 comments.
Reviewable status:complete! 0 of 0 LGTMs obtained (waiting on @wenyihu6).
-- commits line 52 at r2:
The rationale is inconsistent with the replica disposition. I'm guessing the Disposition is correct? When a store is draining, it shouldn't accept replicas. Worth thinking through the case in which a store that is the only remaining target for upreplication is draining: it still doesn't make sense to place a replica there I suppose.
-- commits line 53 at r2:
Worth double checking, but my memory is that "throttled" is when a store refuses a snapshot, right? Or is this triggered for other reasons as well? The "Healthy but overloaded" description doesn't really capture this. Let's call out specifically what "throttled" means here and also in the actual code where the translation happens.
-- commits line 54 at r2:
Isn't shedding too aggressive? Remind me how a store gets into "suspect" state according to the store pool?
pkg/kv/kvserver/mmaintegration/store_status.go line 14 at r2 (raw file):
// translateStorePoolStatusToMMA translates a StorePool status to MMA's (health, // disposition) model.
Reminder to update this table with the one in the PR description (I'm okay with removing the one from the PR and just keeping this one as the source of truth)
pkg/kv/kvserver/mmaintegration/store_status.go line 70 at r2 (raw file):
) default: // Unknown status - treat as unavailable.
Panic, at least in crdb test build?
pkg/kv/kvserver/allocator/storepool/store_pool.go line 899 at r2 (raw file):
} // GetStoreStatus returns the store status for the given store ID.
Is this used?
|
Going to split this pr up - first one here #160555. |
|
TFTR! I will reply these in the new PR #160623. |
|
Closing in favor of #160623. |
Part of #156776
Epic: CRDB-55052
storepool: export GetStoreStatuses for mma health signal plumbing
Previously, storepool's StoreStatus was not exported and only used internally by
storepool. This commit exports StoreStatus so that mma integration can consume
the same store-level signals that SMA relies and translate them into mma's
(Health, Disposition) model.
An alternative approach would be to have store pool handle the translation of
the store status to mma's (Health, Disposition) model. But this would couple
storepool to mma and blur ownership; storepool should remain as the source of
truth for store status. MMA owns the interpretation.
mmaintegration: add StoreStatus to mma status translation plumbing
This commit adds the translation layer from StoreStatus to mma's (Health,
Disposition) model.
sma currently relies on StorePool methods (GetStoreList, LiveAndDeadReplicas)
which internally compute StoreStatus using: NodeLivenessFunc (membership +
health) combined with other signals (throttling, suspect state) to determine
StoreStatus.
To preserve sma's behavior, mma reuses StorePool's status() method and
translates it to its own (Health, Disposition) model rather than re-deriving
health independently.
Alternatives considered (and rejected):
Query NodeLiveness directly in mma: NodeLiveness operates at the node level,
while StorePool tracks per-store state so store status on the same node can
diverge based on gossip timing and store specific signals. In addition,
NodeLiveness does not include other store signals such as throttling
(snapshot backpressure) and suspect status (recently unavailable) which are
currently used by sma to filter candidates when making lease/replica
placement decisions.
Periodically poll storepool from mma Statuses are plumbed before
ComputeChanges() instead of periodically in another goroutine.It is more
complex, may be stale and less efficient. mma currently only needs updated
health statuses for ComputeChanges.
Note that the translation goes through allocator sync, not directly in
mmaprototype, to avoid importing storepool there and keep layering clean.
The translation follows this mapping:
asim: make store rebalancer refresh store status
This commit updates asis's mma store rebalancer to refresh store status before
calling ComputeChanges(), matching production behavior.