Hey! Good topic. We've run into this in prod too. Beyond BMH and BMC secret, you'll typically also hit:
For Ironic DB cleanup: you don't need to delete the pod; you can clean up the node via
A scriptable pattern we use:
Would be great to see BMO add a
Some time ago, I proposed adding forced decommissioning to Ironic: https://bugs.launchpad.net/ironic/+bug/2133499. The feedback was mixed, but maybe input from this group can give it a boost?
I want to discuss procedures and experiences with hardware failures and how to clean up after them.
Scenario:
A host fails catastrophically and cannot be fixed. The host was registered as a BareMetalHost and provisioned when it happened. There is now no way to deprovision the BMH, so we are stuck with a BMH object that cannot be deleted without resorting to things like manually deleting the finalizers.
Additionally, if the BMH was consumed by a Metal3Machine as part of a CAPI Cluster, the Machine object will also be stuck waiting for the Metal3Machine, which in turn is waiting on the BMH to be deprovisioned. (This can be a problem even for temporary failures that can be fixed. A user may not want to wait a week for a spare part while the cluster sits at 2/3 healthy control-plane nodes.)
What I am after in this discussion is to figure out how such scenarios "should" be handled. I also want to probe the interest for new features to handle this in a more automatic way. To the best of my knowledge there isn't really any documentation or automation around this currently in Metal3. The user would have to figure out which resources are stuck and manually remove them.
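To make "figure out which resources are stuck" a bit more concrete, here is a minimal Python sketch. It assumes you have already dumped the relevant objects as JSON (for example with `kubectl get ... -o json`) and loaded the items into a list of dicts. The function name `find_stuck`, the sample object names, and the finalizer string are made up for illustration; they are not taken from Metal3.

```python
from typing import Any


def find_stuck(objects: list[dict[str, Any]]) -> list[tuple[str, str, list[str]]]:
    """Return (kind, name, finalizers) for objects that are marked for
    deletion (deletionTimestamp set) but still hold at least one finalizer."""
    stuck = []
    for obj in objects:
        meta = obj.get("metadata", {})
        finalizers = meta.get("finalizers") or []
        if meta.get("deletionTimestamp") and finalizers:
            stuck.append((obj.get("kind", "?"), meta.get("name", "?"), list(finalizers)))
    return stuck


# Illustrative input only: names and the finalizer value are placeholders.
sample = [
    {
        "kind": "BareMetalHost",
        "metadata": {
            "name": "worker-0",
            "deletionTimestamp": "2024-01-01T00:00:00Z",
            "finalizers": ["example.io/finalizer"],
        },
    },
    # Not being deleted, so not reported even though it has a finalizer.
    {
        "kind": "BareMetalHost",
        "metadata": {"name": "worker-1", "finalizers": ["example.io/finalizer"]},
    },
    # Being deleted but with no finalizers left; the API server will remove it.
    {
        "kind": "Secret",
        "metadata": {"name": "worker-2-bmc", "deletionTimestamp": "2024-01-01T00:00:00Z"},
    },
]

for kind, name, finalizers in find_stuck(sample):
    print(f"{kind}/{name} is stuck on finalizers: {', '.join(finalizers)}")
```

This only reports candidates. Whether a given object is safe to force-remove still needs a human decision.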
As a first step, let's figure out what needs cleaning up. I know about these two: the BareMetalHost (BMH) and its BMC Secret.
Both of these have finalizers and will get stuck if BMO cannot deprovision. Do you know of more? How about HardwareData?
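For reference, the manual workaround mentioned in the scenario (dropping the finalizers) could be scripted roughly like this. This is a sketch, not a recommended procedure: it only builds and prints a `kubectl patch` command using a JSON merge patch that sets `metadata.finalizers` to null, and it does not run anything. The helper name `finalizer_removal_cmd`, the host name `worker-0`, and the namespace `metal3` are assumptions for illustration.

```python
import json
import shlex


def finalizer_removal_cmd(resource: str, name: str, namespace: str) -> list[str]:
    """Build (but do not run) a kubectl command that clears all finalizers
    on one object via a JSON merge patch."""
    # In a merge patch, a null value removes the field.
    patch = json.dumps({"metadata": {"finalizers": None}})
    return [
        "kubectl", "patch", resource, name,
        "-n", namespace,
        "--type=merge",
        "-p", patch,
    ]


# Hypothetical object and namespace; adjust to your environment.
cmd = finalizer_removal_cmd("baremetalhost", "worker-0", "metal3")

# Print the command so it can be reviewed before anyone runs it by hand.
print(shlex.join(cmd))
```

Keep in mind that removing a finalizer skips whatever cleanup the controller would have done, so anything BMO normally tidies up (such as the node record in Ironic) may be left behind and need separate handling.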
For CAPM3/IPAM objects, I believe that once the BMH is gone, they will automatically be cleaned up, but I am not completely sure about this.
My first attempt at documentation for this: