🎨 Services using RabbitMQ and Redis have their health checks tied to the respective clients#8977
🎨 Services using RabbitMQ and Redis have their health checks tied to the respective clients#8977GitHK wants to merge 25 commits intoITISFoundation:masterfrom
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #8977 +/- ##
==========================================
- Coverage 87.48% 85.08% -2.40%
==========================================
Files 2068 1186 -882
Lines 81581 50420 -31161
Branches 1434 176 -1258
==========================================
- Hits 71368 42900 -28468
+ Misses 9809 7464 -2345
+ Partials 404 56 -348
Continue to review full report in Codecov by Sentry.
🚀 New features to boost your workflow:
|
|
There was a problem hiding this comment.
Pull request overview
This PR aligns multiple services’ /health endpoints (and the webserver healthcheck signal) with the actual RabbitMQ/Redis client connectivity state, so orchestration can detect backend outages more reliably.
Changes:
- Add Redis/RabbitMQ client health assertions to various service health endpoints and webserver healthcheck hooks.
- Consolidate Redis usage in
storagearoundRedisClientsManagerand wire Celery/task-store + DSM cleaner to specific Redis DB clients. - Extend
RedisClientsManagerwith a.healthyaggregate and propagate Redis client health-check interval into the underlying redis-py client.
Reviewed changes
Copilot reviewed 30 out of 30 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| services/web/server/src/simcore_service_webserver/redis.py | Registers Redis manager health with aiohttp healthcheck signal. |
| services/storage/src/simcore_service_storage/modules/redis.py | Switches storage Redis wiring to RedisClientsManager for multiple DBs. |
| services/storage/src/simcore_service_storage/modules/celery/init.py | Uses Redis client from the shared manager for Celery task store. |
| services/storage/src/simcore_service_storage/dsm_cleaner.py | Uses the Redis LOCKS client from the manager for exclusive periodic locking. |
| services/storage/src/simcore_service_storage/core/settings.py | Cleans up settings formatting and removes unused Rabbit settings import/field. |
| services/storage/src/simcore_service_storage/core/application.py | Ensures Redis is set up before Celery so dependencies are available. |
| services/storage/src/simcore_service_storage/api/rest/dependencies/redis.py | Adds FastAPI dependency to retrieve the Redis clients manager. |
| services/storage/src/simcore_service_storage/api/rest/_health.py | Adds Redis manager health check to storage health endpoint. |
| services/resource-usage-tracker/src/simcore_service_resource_usage_tracker/services/modules/redis.py | Adds request-based dependency accessor for Redis lock client. |
| services/resource-usage-tracker/src/simcore_service_resource_usage_tracker/api/rest/_health.py | Enforces Redis client health in resource-usage-tracker health endpoint. |
| services/payments/tests/unit/api/test_rest_meta.py | Refactors tests to cover both RabbitMQ and RPC client health combinations. |
| services/payments/src/simcore_service_payments/services/rabbitmq.py | Adds request-based dependency for RPC client and tightens typing. |
| services/payments/src/simcore_service_payments/api/rest/_health.py | Makes payments health depend on both RabbitMQ and RPC client health. |
| services/notifications/tests/unit/test_api_rest__health.py | Reworks health tests to override dependencies and validate Redis/Rabbit/Postgres checks. |
| services/notifications/src/simcore_service_notifications/core/events.py | Adds Redis lifespan to the app lifespan chain. |
| services/notifications/src/simcore_service_notifications/clients/redis.py | Introduces a dedicated Redis lifespan/client getter for celery tasks backend. |
| services/notifications/src/simcore_service_notifications/clients/celery.py | Reuses Redis client from the new Redis lifespan rather than creating its own. |
| services/notifications/src/simcore_service_notifications/api/rest/dependencies.py | Adds dependency to inject Redis client into request handlers. |
| services/notifications/src/simcore_service_notifications/api/rest/_health.py | Adds Redis health check to notifications health endpoint. |
| services/efs-guardian/tests/unit/test_api_health.py | Updates health mocks to include Redis lock client. |
| services/efs-guardian/src/simcore_service_efs_guardian/api/rest/health.py | Adds Redis lock client health check alongside RabbitMQ checks. |
| services/director-v2/src/simcore_service_director_v2/api/routes/health.py | Adds Redis manager health check to director-v2 health route. |
| services/director-v2/src/simcore_service_director_v2/api/dependencies/rabbitmq.py | Adds request-based dependency to retrieve Redis clients manager. |
| services/clusters-keeper/src/simcore_service_clusters_keeper/api/health.py | Adds Redis client health check to clusters-keeper health endpoint. |
| services/autoscaling/src/simcore_service_autoscaling/api/health.py | Adds RabbitMQ + Redis client health checks to autoscaling health endpoint. |
| services/agent/src/simcore_service_agent/api/rest/_health.py | Switches agent health check to use the RPC RabbitMQ client dependency. |
| services/agent/src/simcore_service_agent/api/rest/_dependencies.py | Renames dependency to get_rabbitmq_rpc_client to reflect the actual client. |
| packages/service-library/src/servicelib/redis/_clients_manager.py | Adds aggregate .healthy property and improves context manager typing. |
| packages/service-library/src/servicelib/redis/_client.py | Passes configured health-check interval into redis-py client creation. |
| packages/celery-library/src/celery_library/app.py | Adjusts Celery broker retry/heartbeat settings and Redis SSL backend config. |
| if celery_settings.CELERY_REDIS_RESULT_BACKEND.REDIS_SECURE: | ||
| base_config["redis_backend_use_ssl"] = {"ssl_cert_reqs": ssl.CERT_NONE} | ||
| base_config["redis_backend_use_ssl"] = {"ssl_cert_reqs": TypeAdapter(bool).validate_python(ssl.CERT_NONE)} | ||
| return base_config |
There was a problem hiding this comment.
ssl.CERT_NONE is an int constant (0), but this code validates it as bool, which converts it to False. Even though False == 0 in Python, this loses the intent/semantics of the TLS setting and can be misinterpreted by Celery/kombu config parsing. Prefer passing ssl.CERT_NONE directly (or explicitly casting to int if needed) instead of TypeAdapter(bool) here.
| @pytest.mark.parametrize( | ||
| "rabbit_healthy, postgres_healthy, redis_healthy, expecteded_msg", | ||
| [ | ||
| (True, True, False, REDIS_CLIENT_UNHEALTHY_MSG), | ||
| (True, False, True, POSRGRES_DATABASE_UNHEALTHY_MSG), | ||
| (False, True, True, RABBITMQ_CLIENT_UNHEALTHY_MSG), | ||
| ], | ||
| ) | ||
| def test_unhealthy_services(mock_services_health: None, test_client: TestClient, expecteded_msg: str): | ||
| with pytest.raises(HealthCheckError) as exc: | ||
| test_client.get("/") | ||
| assert RABBITMQ_CLIENT_UNHEALTHY_MSG in f"{exc.value}" | ||
| assert expecteded_msg in f"{exc.value}" |
There was a problem hiding this comment.
Typo in parameter name expecteded_msg (double 'ed') reduces readability and makes the test API harder to understand. Rename it consistently in both the parametrization and the test function signature/usages.
| def get_redis_client_from_request(app: Annotated[FastAPI, Depends(get_application)]) -> RedisClientSDK: | ||
| return get_redis_client(app) |
There was a problem hiding this comment.
get_redis_client_from_request does not accept a Request (it takes an app dependency instead), so the name is misleading. Consider renaming to something like get_redis_client / get_redis_client_from_app (and update the callers) to match the actual signature and reduce confusion with other dependency helpers that truly take a request.
| def get_redis_client_from_request(app: Annotated[FastAPI, Depends(get_application)]) -> RedisClientSDK: | |
| return get_redis_client(app) | |
| def get_redis_client_from_app( | |
| app: Annotated[FastAPI, Depends(get_application)], | |
| ) -> RedisClientSDK: | |
| return get_redis_client(app) | |
| def get_redis_client_from_request(request: Request) -> RedisClientSDK: | |
| app = get_application(request) | |
| return get_redis_client_from_app(app) |
| async def check_service_health( | ||
| rabbitmq_client: Annotated[RabbitMQClient, Depends(get_rabbitmq_client)], | ||
| rabbitmq_rpc_client: Annotated[RabbitMQClient, Depends(get_rabbitmq_rpc_client)], | ||
| ): | ||
| if not rabbitmq_client.healthy: | ||
| if not rabbitmq_rpc_client.healthy: |
There was a problem hiding this comment.
The dependency get_rabbitmq_rpc_client returns a RabbitMQRPCClient, but the endpoint parameter is annotated as RabbitMQClient. This type mismatch can confuse readers and static type checkers; update the annotation/import to RabbitMQRPCClient (or to the common base type if that's the intent).
| @router.get("/", include_in_schema=True, response_class=PlainTextResponse) | ||
| async def health_check(): | ||
| async def health_check(app: Annotated[FastAPI, Depends(get_app)]): | ||
| # NOTE: sync url in docker/healthcheck.py with this entrypoint! | ||
|
|
||
| if not get_rabbitmq_client(app).healthy: | ||
| raise HealthCheckError(RABBITMQ_CLIENT_UNHEALTHY_MSG) | ||
|
|
||
| if not get_redis_client(app).is_healthy: | ||
| raise HealthCheckError(REDIS_CLIENT_UNHEALTHY_MSG) | ||
|
|
||
| return f"{__name__}.health_check@{datetime.datetime.now(datetime.UTC).isoformat()}" |
There was a problem hiding this comment.
This handler signature no longer declares a return type. Since this module already uses return type annotations elsewhere (e.g. /status), consider adding -> str here for consistency and better static checking.



What do these changes do?
Related issue/s
How to test
Dev-ops