Add metrics about store liveness which helps in diagnosing failover and replica selection #65459

@MyonKeminta

Enhancement

client-go has a liveness check mechanism built on gRPC's built-in HealthService. It concludes that a TiKV server cannot serve requests in one of two ways: either it fails to connect to the TiKV server (unreachable), or TiKV reports an unknown state, indicating that it is unhealthy. The result affects client-go's replica selection and failover behavior.

However, the result of the liveness check is inconvenient to observe. It would be better to expose it as metrics and add it to the Grafana dashboard.

Ref: https://github.com/tidbcloud/cloud-storage-engine/issues/4061

Labels: type/enhancement (The issue or PR belongs to an enhancement.)
