-
Notifications
You must be signed in to change notification settings - Fork 291
fix: pod with a restart policy of Never or OnFailure stuck at 'Progressing' (#15317) #709
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #709 +/- ##
==========================================
- Coverage 54.26% 47.34% -6.93%
==========================================
Files 64 64
Lines 6164 6537 +373
==========================================
- Hits 3345 3095 -250
- Misses 2549 3187 +638
+ Partials 270 255 -15 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
This looks like a good approach to the problem. |
The pod manifest needs the following to pass the tests:
|
pkg/health/testdata/pod-running-restart-never-with-ignore-annotation.yaml
Fixed
Show fixed
Hide fixed
pkg/health/testdata/pod-running-restart-never-with-ignore-annotation.yaml
Fixed
Show fixed
Hide fixed
@drewhemm I have made the changes you suggested to get a Quality Gate pass |
Cool, looks like the last blocking issue is the commit sign off. |
444a326
to
84a1039
Compare
A non-blocking issue has been flagged by SonarQube, probably best to resolve it as follows:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. left nits.
f6e7045
to
61ddfd1
Compare
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This needs to be merged to solve lots of subsequent bugs that have been raised. |
@christianh814 would you be able to give this PR a review? |
@crenshaw-dev would you be able to give this PR a review? Much appreciated! |
pkg/health/health_pod.go
Outdated
@@ -12,6 +12,10 @@ import ( | |||
"github.com/argoproj/gitops-engine/pkg/utils/kube" | |||
) | |||
|
|||
const ( | |||
AnnotationIgnoreRestartPolicy = "argocd.argoproj.io/ignore-restart-policy" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest putting it in a separate file -
gitops-engine/pkg/sync/common/types.go
Line 14 in 093aef0
AnnotationSyncWave = "argocd.argoproj.io/sync-wave" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nitishfy Thanks for your comment. I have moved this to types.go as you suggested. Indeed better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Docs required!
3de2f0b
to
ffeef76
Compare
Signed-off-by: Roelof Kuijpers <[email protected]>
Signed-off-by: Roelof Kuijpers <[email protected]>
…sses the checks of the Quality Gate Signed-off-by: Roelof Kuijpers <[email protected]>
Signed-off-by: Roelof Kuijpers <[email protected]>
improve code readability Co-authored-by: sivchari <[email protected]> Signed-off-by: Roelof Kuijpers <[email protected]>
Signed-off-by: Roelof Kuijpers <[email protected]>
Signed-off-by: Roelof Kuijpers <[email protected]>
Signed-off-by: Roelof Kuijpers <[email protected]>
|
Have added info to the documentation now! |
A workaround is to use the annotation: argocd.argoproj.io/ignore-restart-policy: "true". | ||
|
||
When this annotation is set on the Pod resource, the controller will ignore the `restartPolicy` and consider the Pod *Running* as a valid healthy state. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This behavior, combined with hooks, would be problematic since hook would be Healthy before their completions.
It is good to explain the current behavior. But not should be part of the health check documentation, and not the hooks.
policy := pod.Spec.RestartPolicy | ||
// If the pod has the AnnotationIgnoreRestartPolicy annotation or its restart policy is Always, | ||
// then treat it as a long-running pod and check its health status. | ||
if _, ok := pod.Annotations[common.AnnotationIgnoreRestartPolicy]; ok || policy == corev1.RestartPolicyAlways { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this feature is necessary with the usage of argocd.argoproj.io/ignore-healthcheck: 'true'
. The pod will show running, because that is the behavior that is configured in the Pod and we should have a consistent behaviour with Kubernetes.
If a controller is creating resource on which we cannot assume application healthiness reliably, then we should exclude those.
This implementation extends the health condition check for pods.
Previously the assumption was that Pods with restart policy of Never or OnFailure are hooks with a finite life, these were considered as Progressing instead of Healthy. However, this logic does not apply when the pod is managed by an operator (e.g., Flink operator) and therefore has a restart policy of Never.
We introduce a new annotation which existence is checked when the pod is Running, that allows for skipping this logic on restart policy.