Open
Description
During compression workflow execution, compression workers intermittently fail to connect to RabbitMQ at startup because they begin before RabbitMQ is fully operational. While workers eventually retry and connect successfully, the initial connection failures generate error logs and indicate missing depends_on health check dependencies in the Docker Compose configuration.
Error Logs
[2025-10-28 19:54:36,496: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 2.00 seconds... (1/100)
[2025-10-28 19:54:38,499: ERROR/MainProcess] consumer: Cannot connect to amqp://clp-user:**@queue:5672//: [Errno 111] Connection refused.
Trying again in 4.00 seconds... (2/100)
Root Cause
The compression worker service does not declare an explicit dependency on the RabbitMQ (queue) service with a health check condition. As a result, Docker Compose starts the worker before verifying that RabbitMQ is healthy and ready to accept connections.
Scope
This issue tracks:
- Adding depends_on with condition: service_healthy for compression worker → RabbitMQ
- Reviewing and adding any other missing service dependencies across all Docker Compose services (compression-scheduler, query-scheduler, query-worker, reducer, webui, garbage-collector, etc.)
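For illustration, the first item might look roughly like the Compose fragment below. This is a minimal sketch, not the project's actual configuration: the service name compression-worker, the healthcheck command, and all timing values are assumptions. Only the queue hostname comes from the error logs above. The fragment is meant to be merged into the existing service definitions, so keys such as image are omitted.

```yaml
services:
  queue:
    # Assumed healthcheck: rabbitmq-diagnostics ships with RabbitMQ, and
    # check_port_connectivity succeeds once the listeners accept connections.
    healthcheck:
      test: ["CMD", "rabbitmq-diagnostics", "-q", "check_port_connectivity"]
      interval: 5s       # illustrative timing values, tune as needed
      timeout: 5s
      retries: 10
      start_period: 10s

  # Hypothetical service name for the compression worker.
  compression-worker:
    depends_on:
      queue:
        # Start the worker only after the queue healthcheck passes.
        condition: service_healthy
```

Note that condition: service_healthy only has an effect when the target service defines a healthcheck (in the Compose file or in its image), so both halves are needed. The same pattern would apply to any other service found to be missing a dependency during the review.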
Related
- PR: feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments. #1178
- Comment: feat(deployment)!: Migrate package orchestration to Docker Compose (resolves #1177); Temporarily remove support for multi-node deployments. #1178 (comment)
Reported by: @sitaowang1998