Implement retry logic of the drain operation itself#1170
Implement retry logic of the drain operation itself#1170D3luxee wants to merge 1 commit intokubereboot:mainfrom
Conversation
Signed-off-by: Georg Doser <georg.doser@nitrado.net>
|
This is far too impactful in the long run of the project to not have tests. We'll need some tests to guarantee the behaviour. The initial issue is easy to test and guarantee a good behaviour. I don't think we need more than that for v1. I am however on the fence on the implementation: retrying drains on the node is enough, do we need the complexity around the timeouts? What's the reason you came to the conclusion that it was the best? |
|
I agree on the testing part, i just pushed my changes i made as we are currently running kured with these patches in our production environment.
Which complexity do you mean? Do you refer to the globalDrainTimeout and why there are two behaviors? I wanted to preserve the old behavior of drainTimeout and added globalDrainTimeout which will alter this behavior then. Generally the behavior i wanted is that kured retries the drain call to update the pod list without it uncordning the node which is what happens if you just set drainTimeout. I could also add the retry logic to the drain() function so that its generally applied without changing anything else. I avoided altering the general logic too much to get this merged more easily, but i can implement it differently as well if you want. |
|
This PR was automatically considered stale due to lack of activity. Please refresh it and/or join our slack channels to highlight it, before it automatically closes (in 7 days). |
This PR is aimed to resolve #1169 by implementing a retry logic of the drain function call with the drainer having a timeout by itself.
The globalDrainTimeout works together with the drainTimeout where drainTimeout will be the timeout for the drainer and globalDrainTimeout is the timeout for trying to drain the node after which it will abort draining and release the lock.
This should keep the same logic for all existing flags and only if globalDrainTimeout gets set it will change the behavior of kured.