Healthchecks aware of State Stores #139
Description
Not related to any particular environment
Is there any way to have a health check that reports unhealthy unless the application is actually processing messages, i.e. unhealthy during rebalances and while the state stores are being rebuilt from their changelog topics?
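The condition asked about here ("processing messages, not rebalancing, not restoring") can be sketched as a small tracker that is ready only when the Kafka Streams client reports `RUNNING` and no state store is still being restored from its changelog. Depending on the Kafka Streams version, the client may already report `RUNNING` while stores are still restoring, which is why the sketch tracks restorations separately. The class and method names are hypothetical, not part of Azkarra; the nested enum only mirrors the constants of `KafkaStreams.State` so the sketch compiles without the Kafka dependency. In a real application the methods would be driven by listeners registered with `KafkaStreams#setStateListener` and `KafkaStreams#setGlobalStateRestoreListener` before `start()`.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical readiness tracker (not an Azkarra API): "ready" means the
 * streams client is RUNNING and no changelog restoration is in progress.
 */
class ReadinessTracker {

    /** Mirrors the constants of KafkaStreams.State (assumption: recent Kafka versions). */
    enum StreamsState {
        CREATED, REBALANCING, RUNNING, PENDING_SHUTDOWN, NOT_RUNNING, PENDING_ERROR, ERROR
    }

    private volatile StreamsState state = StreamsState.CREATED;

    // Restorations still in progress, keyed by "storeName/topicPartition".
    private final Set<String> restoring = ConcurrentHashMap.newKeySet();

    // Intended to be called from KafkaStreams.StateListener#onChange(newState, oldState).
    void onStateChange(StreamsState newState) {
        state = newState;
    }

    // Intended to be called from StateRestoreListener#onRestoreStart(...).
    void onRestoreStart(String storeName, String topicPartition) {
        restoring.add(storeName + "/" + topicPartition);
    }

    // Intended to be called from StateRestoreListener#onRestoreEnd(...).
    void onRestoreEnd(String storeName, String topicPartition) {
        restoring.remove(storeName + "/" + topicPartition);
    }

    // Assumption: newer Kafka versions also report restorations suspended by a
    // rebalance (onRestoreSuspended); such entries must be dropped as well.
    void onRestoreSuspended(String storeName, String topicPartition) {
        restoring.remove(storeName + "/" + topicPartition);
    }

    /** Ready only when RUNNING and no state store is being rebuilt. */
    boolean isReady() {
        return state == StreamsState.RUNNING && restoring.isEmpty();
    }
}
```

Wiring could look roughly like `streams.setStateListener((newState, oldState) -> tracker.onStateChange(ReadinessTracker.StreamsState.valueOf(newState.name())))`; how to reach the underlying `KafkaStreams` instance from Azkarra is not shown here.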
So far we have a liveness check that passes while Azkarra is running and a readiness check that passes once all the applications have been instantiated. Ideally, however, the readiness check should succeed only when the application is actually processing messages.
This causes a huge problem during scale-ups and scale-downs:
- An instance is added or removed, and the partitions start being redistributed without downtime thanks to the standby replicas.
- While the rebalance and the state-store rebuild are still in progress, another instance is added or removed.
In this scenario, the entire consumer group performs a full rebalance and discards all of its state stores, which causes up to 30 minutes of downtime.
Even if every instance is a standby replica for every other instance (which does not scale, but we tested it as a workaround), this only solves the scale-down problem. We need to expose to the orchestrator when the application is in a state where it must not be scaled.
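One way to expose such a state to an orchestrator is an HTTP endpoint that a readiness probe polls. The sketch below uses only the JDK's built-in `com.sun.net.httpserver.HttpServer` and returns 200 when ready and 503 otherwise; the class name, the `/ready` path, and the `BooleanSupplier` hook (which could be fed by a check like "client RUNNING and no restoration in progress") are hypothetical and not part of Azkarra. How the orchestrator acts on a not-ready pod (for example, a rolling update waiting until the pod reports ready) is outside this sketch.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

/** Hypothetical readiness endpoint for an orchestrator probe (not an Azkarra API). */
class ReadinessEndpoint {

    /** Maps readiness to an HTTP status: 200 = ready, 503 = not ready. */
    static int statusFor(boolean ready) {
        return ready ? 200 : 503;
    }

    /** Starts a server on the given port (0 picks a free port) serving GET /ready. */
    static HttpServer start(int port, BooleanSupplier isReady) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            boolean ready = isReady.getAsBoolean();
            byte[] body = (ready ? "READY" : "NOT_READY").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(statusFor(ready), body.length);
            // Closing the response body completes the exchange.
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

Returning 503 rather than closing the connection lets the probe distinguish "alive but not ready" from a crashed process, which keeps the liveness and readiness checks independent.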
I could not find anything related to this in the docs.