Healthchecks aware of State Stores #139

@Fryuni

Not related to any particular environment

Is there any way to have a healthcheck that reports unhealthy unless the application is actually processing messages? That is, unhealthy during rebalances and while the state stores are being rebuilt from the changelog?

So far we have a liveness check that passes while Azkarra is running and a readiness check that passes once all the applications have been instantiated. Ideally, the readiness check would succeed only while the application is processing messages.
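To make the desired readiness behaviour concrete, here is a minimal sketch of what I have in mind. `ReadinessTracker` is a hypothetical helper, not part of Azkarra or Kafka Streams, and the state names are plain strings that mirror `KafkaStreams.State` so the sketch has no dependencies. It assumes that a Kafka Streams instance stays in `REBALANCING` while its state stores are being restored, which is my understanding but worth verifying for the version in use.

```java
import java.util.concurrent.atomic.AtomicReference;

/**
 * Hypothetical helper (not an Azkarra or Kafka Streams class): remembers the
 * last reported streams state and answers "ready" only while it is RUNNING.
 */
final class ReadinessTracker {
    // State names mirror KafkaStreams.State; strings keep the sketch dependency-free.
    private final AtomicReference<String> current = new AtomicReference<>("CREATED");

    /** Call from a state listener whenever the streams instance changes state. */
    void onChange(String newState) {
        current.set(newState);
    }

    /** Ready only while processing; any other state (e.g. REBALANCING) is not ready. */
    boolean isReady() {
        return "RUNNING".equals(current.get());
    }

    /** Status code a readiness endpoint could return to the orchestrator. */
    int httpStatus() {
        return isReady() ? 200 : 503;
    }
}
```

With Kafka Streams on the classpath, this could presumably be wired up before `start()` with `streams.setStateListener((newState, oldState) -> tracker.onChange(newState.name()))`. How to expose the result through Azkarra's own health endpoints is the part I could not find.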

This causes a huge problem during scale-ups/downs:

  1. An instance is added/removed and starts redistributing the partitions without downtime due to the standby replicas
  2. While the rebalance and state-store rebuild is happening another instance is added/removed

In this scenario, the entire consumer group performs a full rebalance and discards all of its state stores, which causes up to 30 minutes of downtime.

Even if every instance is a standby replica for every other instance (which is not scalable, but we tested it as a workaround), that only solves the scale-down problem. We need to expose to the orchestrator when the application is in a state where it can't be scaled.

I could not find anything in the docs related to this.

Labels: question
