Skip to content

Conversation

@birdayz
Copy link
Contributor

@birdayz birdayz commented Nov 19, 2024

avoid setting decommissionBrokerID for an already-decommissioned-broker by also checking in redpanda if that broker exists.

lots of context can be found in: #314

@chrisseto
Copy link
Contributor

Thus far, I like the approach here the most but it still feels like we're papering over the larger issue here: A lack of idempotency and crash safety.

Stale reads are a symptom of the above but we could easily hit the same problem if the operator crashes (or gets evicted) at an inopportune time.

Couldn't we largely eliminate the problem if the handleScaling function pulled state information from the StatefulSet and adminAPI and pushed that into .Status instead of relying on .Status? (Similar to what Rafal has mentioned in Slack).

It doesn't seem like that large of a change to me and I suspect a lot of logic in this code path could be simplified.

For example, handleDecommissionInProgress could first fetch the brokers list and brokers uuid list from the admin API. If there's a broker that's in the process of decommissioning, you know there's a decommission in progress. With additional information from the StatefulSets/Pod list you should be able to deduce if the decommission hasn't yet started or if it's finished.

avoid setting decommissionBrokerID for an already-decommissioned-broker
by also checking in redpanda if that broker exists.
@birdayz birdayz force-pushed the jb/do-not-decom-missing-broker branch from ad769be to 0bc1b40 Compare November 19, 2024 17:15
@birdayz
Copy link
Contributor Author

birdayz commented Nov 19, 2024

Regarding crash safety, the same check could be added in the cluster_controller.go, and clear the status decommissionBrokerID if required.

@RafalKorepta RafalKorepta merged commit 942191b into main Nov 19, 2024
@chrisseto chrisseto deleted the jb/do-not-decom-missing-broker branch November 19, 2024 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants