-
Notifications
You must be signed in to change notification settings - Fork 0
Aggregates: Use GenerationChangedPredicate and join errors #161
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
notandy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fine with the predicate. But do we really need to aggregate the errors? Does it give us any benefit to apply some more aggregates, but still fail ultimately for the cost of a less informative condition message?
Do you have an example where this behavior is benefitial?
Sure, as background, I stumbled over that pattern in the gardener project (i.e. here), but you can find that pattern also in kubernetes proper:. The current behavior is first try to add all missing, then remove all superfluous aggregates, but abort on the first error. So it will not execute the other changes, despite that being likely possible or maybe even necessary. Trying them all will solve possible inter-dependencies eventually. Assuming there are multiple underlying errors that require human intervention, the (human) operator will only see one error at a time, and will only be able to fix that one. Admittedly, the inter-dependencies of host-aggregates are rather simple, so that is a rather theoretical scenario and more an explanation why I found that pattern appealing. Concretely with host-aggregates we have the problem that one cannot add a host to two aggregates with different availability-zones. I suspect, you will say then let's simply remove the host from aggregates first, but then we have a problem with aggregates with tenant-isolation, as we would have then a period of time, where the tenant-isolation would not be active. So, in summary, I think it follows a common practice, and solves this particular issue. |
|
I have added a comment to explain that scenario. |
724a295 to
266cdc8
Compare
266cdc8 to
b05358f
Compare
| if err := ac.trackError(ctx, hv, "failed updating aggregates", errs...); err != nil { | ||
| return ctrl.Result{}, err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took me a while to understand what's happening here. Could you move the errs != nil check here so it's easier to follow the flow of thoughts.
Another thing is, now we have a status update (with errors), a error log with backtrack and a bubbling up error? I mean we can just omit the bubbling up error in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took me a while to understand what's happening here. Could you move the errs != nil check here so it's easier to follow the flow of thoughts.
Sure, changed that.
Another thing is, now we have a status update (with errors), a error log with backtrack and a bubbling up error? I mean we can just omit the bubbling up error in this case.
We need to bubble up the error to trigger the retry and the rate-limiter for that particular reconciliation.
The error practically won't get logged anymore, unless we do so explicitly. If the stacktrace is shown depends on the logging settings, and can be configured. With the current debug logging, it will be shown on errors.
Prior to this change, we would have to exit the reconcile function on each status update, and continue there. Now with the filter, we only retry on either an error or RequeueAfter. Joining the errors allows us to complete at least some of the changes, and not always abort on the first. Return kubernetes errors directly, and return a RequeueAfter on an error from Openstack for now, until the logging situation has been cleared.
b05358f to
c706846
Compare
Merging this branch will decrease overall coverage
Coverage by fileChanged files (no unit tests)
Please note that the "Total", "Covered", and "Missed" counts above refer to code statements instead of lines of code. The value in brackets refers to the test coverage of that file in the old version of the code. |
This PR modifies the Code of #161 to improve following points: 1. no need for extra error-log since instead of dropping Reconcile Errors, we format them nicely with the Encoder. 2. Function (like rewritten `setErrorCondition`) should not return the errors the've been invoked with - but only return errors if they fail. Also, it's an uneeded roundtrip to return the same error that has been passed by the caller. 3. Introduce `utils.LifecycleEnabledPredicate`, a predicate that will filter event's for hypervisors with LifecycleEnabled == True.
This PR modifies the Code of #161 to improve following points: 1. no need for extra error-log since instead of dropping Reconcile Errors, we format them nicely with the Encoder. 2. Function (like rewritten `setErrorCondition`) should not return the errors the've been invoked with - but only return errors if they fail. Also, it's an uneeded roundtrip to return the same error that has been passed by the caller. 3. Introduce `utils.LifecycleEnabledPredicate`, a predicate that will filter event's for hypervisors with LifecycleEnabled == True.
notandy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#166 is a proposal that's based on this PR, but does some things "cleaner".
Especially I am having a problem with trackError, that if it takes errors, unconditionally returns error. So I moved some stuff out into the caller to make it clear what's intended and why.
This PR modifies the Code of #161 to improve following points: 1. no need for extra error-log since instead of dropping Reconcile Errors, we format them nicely with the Encoder. 2. Function (like rewritten `setErrorCondition`) should not return the errors the've been invoked with - but only return errors if they fail. Also, it's an uneeded roundtrip to return the same error that has been passed by the caller. 3. Introduce `utils.LifecycleEnabledPredicate`, a predicate that will filter event's for hypervisors with LifecycleEnabled == True.
This PR modifies the Code of #161 to improve following points: 1. no need for extra error-log since instead of dropping Reconcile Errors, we format them nicely with the Encoder. 2. Function (like rewritten `setErrorCondition`) should not return the errors the've been invoked with - but only return errors if they fail. Also, it's an uneeded roundtrip to return the same error that has been passed by the caller. 3. Introduce `utils.LifecycleEnabledPredicate`, a predicate that will filter event's for hypervisors with LifecycleEnabled == True.
Prior to this change, we would have to exit the reconcile function on each status update, and continue there. Now with the filter, we only retry on either an error or RequeueAfter.
Joining the errors allows us to complete at least some of the changes, and not always abort on the first.
Return kubernetes errors directly, and return a RequeueAfter on an error from Openstack for now, until the logging situation has been cleared.