Argocd app wait command will fail sync of RabbitMQ Cluster whenever it is created #25570

@pedrotiag0

Checklist:

  • I've searched in the docs and FAQ for my answer: https://bit.ly/argocd-faq.
  • I've included steps to reproduce the bug.
  • I've pasted the output of argocd version.

Describe the bug

In Argo CD v2.14.20, the built-in health customization for rabbitmq.com/RabbitmqCluster marks the resource as Degraded whenever either the ClusterAvailable or the AllReplicasReady condition is reported as Unknown.

Related file: https://github.com/argoproj/argo-cd/blob/v2.14.20/resource_customizations/rabbitmq.com/RabbitmqCluster/health.lua#L24

During normal reconciliation, the RabbitMQ Cluster Operator initially sets those conditions to Unknown, then flips them to True once pods and endpoints are ready. This short-lived Unknown state is expected, yet Argo CD maps it to Degraded. Application health therefore briefly flips to Degraded, and argocd app wait --health fails with a non-zero exit code even though the system is working and soon transitions to Healthy.

Note: I ran the same test on v3.1.7 and the issue still occurs, since the mapping is the same.

Expected behavior

I'm looking for some clarification. Is mapping an Unknown status for ClusterAvailable / AllReplicasReady to Degraded intended? Should these conditions map to Progressing during reconciliation instead? That would reflect the operator's normal, transient state while the cluster is forming and would prevent false failures in argocd app wait --health.
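To make the question concrete, here is a minimal sketch of the two mappings, modeled in Python purely for illustration (the actual health check is the Lua script linked above, and this is not a copy of it). The function name `health`, the `unknown_as` parameter, and the sample timeline are hypothetical. The handling of missing or False conditions is an assumption, since this report only concerns the Unknown case.

```python
from typing import Dict, List

# Condition types named in this issue.
WATCHED = ("ClusterAvailable", "AllReplicasReady")


def health(conditions: List[Dict[str, str]], unknown_as: str) -> str:
    """Map RabbitmqCluster status conditions to a health status.

    unknown_as is the health reported when a watched condition is "Unknown":
    "Degraded" models the behavior described in this issue,
    "Progressing" models the proposed change.
    """
    statuses = {c["type"]: c["status"] for c in conditions}
    watched = [statuses.get(t) for t in WATCHED]
    if any(s == "Unknown" for s in watched):
        return unknown_as
    if all(s == "True" for s in watched):
        return "Healthy"
    # Assumption: missing or "False" conditions are treated as Progressing here;
    # the real script may handle them differently.
    return "Progressing"


def conds(status: str) -> List[Dict[str, str]]:
    """Build both watched conditions with the same status."""
    return [{"type": t, "status": status} for t in WATCHED]


# Hypothetical timeline: resource just created, operator sets Unknown, then True.
timeline = [[], conds("Unknown"), conds("True")]

for mode in ("Degraded", "Progressing"):
    print(mode, [health(c, mode) for c in timeline])
```

With `unknown_as="Degraded"` the sequence passes through Degraded before reaching Healthy, which is the transition that makes argocd app wait --health exit with an error. With `unknown_as="Progressing"` the sequence stays in Progressing until the conditions become True.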

Version

argocd: v2.14.20+879895a
  BuildDate: 2025-09-30T16:00:55Z
  GitCommit: 879895af786513ae25ba22f36f861cc1afe3b435
  GitTreeState: clean
  GoVersion: go1.24.6
  Compiler: gc
  Platform: linux/arm64

Logs

From the Argo CD controller:

"Updated health status: Progressing -> Degraded" application=rabbitmq-cluster dest-namespace=test dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal
...
"Updated health status: Degraded -> Progressing" application=rabbitmq-cluster dest-namespace=test dest-server="https://kubernetes.default.svc" reason=ResourceUpdated type=Normal

From the argocd app wait command:

The following error occurred while executing a command:
- Command failed with exit code 20: argocd app wait -l component=rabbit-service, --health --suspended
- Command output:
TIMESTAMP                  GROUP              KIND             NAMESPACE  NAME                 STATUS     HEALTH       HOOK  MESSAGE
2025-12-01T15:06:47+00:00  rabbitmq.com       RabbitmqCluster  test       rabbitmq             Synced     Progressing        rabbitmqcluster.rabbitmq.com/rabbitmq created
2025-12-01T15:06:47+00:00  networking.k8s.io  Ingress          test       rabbitmq-management  Synced     Progressing        ingress.networking.k8s.io/rabbitmq-management created
2025-12-01T15:06:52+00:00  rabbitmq.com       RabbitmqCluster  test       rabbitmq             OutOfSync  Progressing        rabbitmqcluster.rabbitmq.com/rabbitmq created

...

- Error details:
time="2025-12-01T15:06:52Z" level=fatal msg="application 'test/rabbitmq-cluster' health state has transitioned from Progressing to Degraded"

Labels

bug: Something isn't working
