Description
Checklist:
- I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
- I've included steps to reproduce the bug.
- I've pasted the output of argocd version.
Describe the bug
In Argo CD v2.14.20, the built-in health check for rabbitmq.com/RabbitmqCluster reports the resource as Degraded when either the ClusterAvailable or the AllReplicasReady condition has status Unknown.
Related file: https://github.com/argoproj/argo-cd/blob/v2.14.20/resource_customizations/rabbitmq.com/RabbitmqCluster/health.lua#L24
During normal reconciliation, the RabbitMQ Cluster Operator initially sets these conditions to Unknown and flips them to True once the pods and endpoints are ready. This short-lived Unknown state is expected, yet Argo CD maps it to Degraded. Application health therefore briefly flips to Degraded, and argocd app wait --health exits with a non-zero code even though the system is working and transitions to Healthy shortly afterwards.
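To make the reported behavior concrete, here is a minimal Python sketch (Python is used only for illustration; the real health check is written in Lua). It restates the mapping described above and pairs it with a simplified wait loop that stops as soon as health becomes Degraded. The function names, the condition timeline, and the Progressing fallback for other condition combinations are hypothetical assumptions, not Argo CD code.

```python
# Hypothetical restatement of the mapping reported in this issue.
# This is NOT the health.lua shipped with Argo CD; it only models
# "Unknown on either condition -> Degraded".

WATCHED = ("ClusterAvailable", "AllReplicasReady")


def current_health(conditions: dict[str, str]) -> str:
    """Map condition statuses ("True"/"False"/"Unknown") to a health status."""
    for name in WATCHED:
        if conditions.get(name) == "Unknown":
            return "Degraded"  # the behavior reported in this issue
    if all(conditions.get(name) == "True" for name in WATCHED):
        return "Healthy"
    return "Progressing"  # assumption: any other combination


def wait_for_health(states: list[str]) -> bool:
    """Simplified stand-in for a health wait: succeed on Healthy,
    fail as soon as the status becomes Degraded."""
    for state in states:
        if state == "Degraded":
            return False
        if state == "Healthy":
            return True
    return False


# Hypothetical condition timeline during a normal rollout.
timeline = [
    {},  # just created, no conditions reported yet
    {"ClusterAvailable": "Unknown", "AllReplicasReady": "Unknown"},
    {"ClusterAvailable": "True", "AllReplicasReady": "True"},
]

health = [current_health(c) for c in timeline]
print(health)                   # ['Progressing', 'Degraded', 'Healthy']
print(wait_for_health(health))  # False
```

Even though the cluster ends up Healthy, the single transient Degraded step is enough to make the simplified wait fail, which mirrors the error shown in the logs.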
Note: I tested v3.1.7 and the issue still occurs, since the mapping is unchanged.
Expected behavior
I'm looking for clarification. Is mapping an Unknown ClusterAvailable or AllReplicasReady condition to Degraded intended? Should an Unknown condition be mapped to Progressing instead while the operator is reconciling? That would reflect the operator's normal, transient state while the cluster is forming and would prevent false failures in argocd app wait --health.
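A minimal Python sketch of the proposed mapping follows, under stated assumptions: missing or Unknown conditions are treated as Progressing, Healthy is reported only when both conditions are True, and (an assumption not covered by this issue) an explicit False is still mapped to Degraded. The function name and the timeline are hypothetical; an actual change would be made in the Lua health check.

```python
# Hypothetical sketch of the proposed mapping (illustration only;
# the real health check is Lua, not Python).

WATCHED = ("ClusterAvailable", "AllReplicasReady")


def proposed_health(conditions: dict[str, str]) -> str:
    """Map condition statuses to a health status, treating Unknown as transient."""
    statuses = [conditions.get(name) for name in WATCHED]
    if all(s == "True" for s in statuses):
        return "Healthy"
    if any(s == "False" for s in statuses):
        return "Degraded"  # assumption: an explicit False is still a failure
    # Missing or Unknown: the operator is still reconciling.
    return "Progressing"


# Hypothetical condition timeline during a normal rollout.
timeline = [
    {},
    {"ClusterAvailable": "Unknown", "AllReplicasReady": "Unknown"},
    {"ClusterAvailable": "True", "AllReplicasReady": "Unknown"},
    {"ClusterAvailable": "True", "AllReplicasReady": "True"},
]

print([proposed_health(c) for c in timeline])
# ['Progressing', 'Progressing', 'Progressing', 'Healthy']
```

With this mapping, the transient Unknown phase never surfaces as Degraded, so a health wait would only observe Progressing followed by Healthy.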
Version
argocd: v2.14.20+879895a
BuildDate: 2025-09-30T16:00:55Z
GitCommit: 879895af786513ae25ba22f36f861cc1afe3b435
GitTreeState: clean
GoVersion: go1.24.6
Compiler: gc
Platform: linux/arm64
Logs
From the argocd controller:
"Updated health status: Progressing -> Degraded" application=rabbitmq-cluster dest-namespace=test dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
...
"Updated health status: Degraded -> Progressing" application=rabbitmq-cluster dest-namespace=test dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
From the argocd app wait command:
The following error occurred while executing a command:
- Command failed with exit code 20: argocd app wait -l component=rabbit-service, --health --suspended
- Command output:
TIMESTAMP GROUP KIND NAMESPACE NAME STATUS HEALTH HOOK MESSAGE
2025-12-01T15:06:47+00:00 rabbitmq.com RabbitmqCluster test rabbitmq Synced Progressing rabbitmqcluster.rabbitmq.com/rabbitmq created
2025-12-01T15:06:47+00:00 networking.k8s.io Ingress test rabbitmq-management Synced Progressing ingress.networking.k8s.io/rabbitmq-management created
2025-12-01T15:06:52+00:00 rabbitmq.com RabbitmqCluster test rabbitmq OutOfSync Progressing rabbitmqcluster.rabbitmq.com/rabbitmq created
...
- Error details:
time="2025-12-01T15:06:52Z" level=fatal msg="application 'test/rabbitmq-cluster' health state has transitioned from Progressing to Degraded"