Healthchecks aware of State Stores

Not related to any particular environment

Is there any way to have a healthcheck that reports unhealthy unless the application is actually processing messages? In other words, it should be unhealthy during rebalances and while the state stores are being rebuilt from the changelog.

So far we have managed to set up a liveness check for when Azkarra is running and a readiness check for when all the applications have been instantiated, but ideally the readiness check should succeed only when the application is processing messages.
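To make the desired behaviour concrete, here is a minimal sketch of a readiness flag driven by the Kafka Streams instance state. `ReadinessTracker` is a hypothetical helper, not something Azkarra or Kafka Streams provides. It assumes the tracker is fed from `KafkaStreams#setStateListener`, and that an instance stays in `REBALANCING` while its state stores are restoring and only reaches `RUNNING` afterwards, which is worth verifying against the Kafka Streams version in use.

```java
// Hypothetical helper (an assumption for illustration, not an Azkarra API).
// It keeps the last reported Kafka Streams state as a plain string so the
// sketch compiles without Kafka on the classpath.
class ReadinessTracker {
    // Kafka Streams instances start in CREATED before start() is called.
    private volatile String state = "CREATED";

    // Intended to be wired to the real listener, for example:
    //   streams.setStateListener((newState, oldState) ->
    //       tracker.onStateChange(newState.name()));
    void onStateChange(String newStateName) {
        state = newStateName;
    }

    // Ready only while the instance reports RUNNING; every other state
    // (CREATED, REBALANCING, PENDING_SHUTDOWN, NOT_RUNNING, ERROR, ...)
    // counts as not ready.
    boolean isReady() {
        return "RUNNING".equals(state);
    }
}
```

With this wiring, a readiness check would fail from the start of a rebalance until the instance reports `RUNNING` again.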

The lack of such a check causes a serious problem during scale-ups and scale-downs:
1. An instance is added or removed, and the partitions start being redistributed without downtime thanks to the standby replicas.
2. While the rebalance and state-store rebuild are still in progress, another instance is added or removed.

In this scenario, the entire consumer group performs a full rebalance and discards all of its state stores, which causes up to 30 minutes of downtime.

Even making every instance a standby replica of every other instance (which is not scalable, but we tested it as a workaround) only solves the scale-down problem. We need to expose to the orchestrator when the application is in a state where it can't be scaled.
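As a rough illustration of what "exposing it to the orchestrator" could look like, the sketch below serves a readiness endpoint using the JDK's built-in `com.sun.net.httpserver.HttpServer`. The class name `ReadinessEndpoint`, the `/ready` path, and the `BooleanSupplier` that decides readiness are all assumptions made for this example; none of them come from Azkarra. The supplier would need to return `false` during rebalances and state-store restoration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical endpoint (an assumption for illustration, not an Azkarra API).
class ReadinessEndpoint {
    // Starts an HTTP server answering GET /ready with 200 when the supplier
    // reports ready and 503 otherwise. Passing port 0 picks a free port.
    static HttpServer start(int port, BooleanSupplier ready) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/ready", exchange -> {
            boolean ok = ready.getAsBoolean();
            byte[] body = (ok ? "READY" : "NOT_READY").getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(ok ? 200 : 503, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }
}
```

An orchestrator probe could then poll `/ready`. Whether a failing probe is enough to hold back a scaling operation depends on the orchestrator and is not something this sketch addresses.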

I could not find anything related to this in the docs.


Healthchecks aware of State Stores #139
