want CockroachDB cluster/replication status in metrics, inventory? #6404

@davepacheco

CockroachDB exposes a bunch of metrics that are important for understanding the cluster's status, particularly when it's running at reduced fault tolerance. Using our original testing notes as a reference, these look like the useful ones:

  • Each type of "Bad ranges" -- either broken out by reason or else separate metrics?
    • raft leaders that are not also leaseholders
    • unavailable
    • under-replicated
  • Count of nodes
  • Replicas per node
  • Leaseholders per node
  • Range operations (adds, removes, splits, merges)
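As a rough sketch of what pulling these out could look like: CockroachDB serves its metrics in Prometheus text format at `/_status/vars` on each node's HTTP port, so a collector could scrape that and keep only the names of interest. The metric names in `WANTED` are my best mapping from the list above to CockroachDB's Prometheus names and should be checked against a real scrape. The function name `extract_cluster_metrics` and the choice to sum each metric across stores are assumptions for illustration, not an existing API in our code.

```rust
use std::collections::BTreeMap;

/// Prometheus-style metric names assumed to correspond to the list above.
/// These should be verified against an actual `/_status/vars` scrape.
const WANTED: &[&str] = &[
    "ranges_unavailable",
    "ranges_underreplicated",
    "replicas_leaders_not_leaseholders",
    "liveness_livenodes",
    "replicas",
    "replicas_leaseholders",
    "range_adds",
    "range_removes",
    "range_splits",
    "range_merges",
];

/// Parses Prometheus text exposition output from a single node and returns
/// each wanted metric, summed across its label sets (e.g. one per store).
/// Hypothetical helper: not part of any existing crate.
fn extract_cluster_metrics(body: &str) -> BTreeMap<String, f64> {
    let mut out = BTreeMap::new();
    for line in body.lines() {
        let line = line.trim();
        // Skip blank lines and `# HELP` / `# TYPE` comments.
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        // Sample lines look like `name{labels} value` or `name value`.
        let name_end = line
            .find(|c: char| c == '{' || c == ' ')
            .unwrap_or(line.len());
        let name = &line[..name_end];
        if !WANTED.contains(&name) {
            continue;
        }
        // The value follows the closing brace of the label set, if any.
        let rest = match line.rfind('}') {
            Some(i) => &line[i + 1..],
            None => &line[name_end..],
        };
        // The first token is the value; an optional timestamp may follow.
        if let Some(Ok(v)) = rest.split_whitespace().next().map(|t| t.parse::<f64>()) {
            *out.entry(name.to_string()).or_insert(0.0) += v;
        }
    }
    out
}
```

Summing across stores gives per-node totals for things like replicas and leaseholders. That would not be the right aggregation across nodes for a metric like `liveness_livenodes`, where each node reports its own view of the whole cluster.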

I'm not sure if the basic query metrics there came from CockroachDB as well, but if they did, those would be useful to pull in, too.

Some of this is critical if we ever do graceful removal, but that's not currently planned. Still, this seems very valuable to have in ClickHouse if we're ever debugging an issue related to CockroachDB availability.

I'm not sure if some of this should also go into inventory but we could defer that until the point comes (if ever) that we want to act on these metrics programmatically.
