This repository was archived by the owner on May 22, 2023. It is now read-only.

Monitor third party node downtime #802

@pdyraga

The Keep ECDSA client offers plenty of metrics and diagnostics for monitoring the health of the operator's own node. However, there is no obvious way to monitor the health of third-party nodes. This matters especially when the node is a member of an n-of-n threshold keep and another member is offline, since every member has to take part in signing. An easy way to determine which nodes are offline, and what the impact is, would let operators alert each other before a signature is requested from a keep.

One option is to log a warning when a node sees a peer drop from its peer list for more than N minutes while that peer still has an active stake or active keeps. We could also limit the warnings to peers with which the operated node shares active keeps.

Another option, which requires no change to the client, is a remote telemetry service. The node exposes diagnostics with the list of connected peers; combined with the graph, that list can be used to identify offline operators that still have active keeps. This option could be extended further by modeling the network topology for operators who opt in to the mechanism and submit their diagnostics periodically.
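The core of such a telemetry service could look roughly like the sketch below. It assumes two inputs that the issue does not specify in detail: a map from each active keep to its member operators (standing in for the graph) and a map from each reporting operator to the peers listed in its diagnostics. `OfflineImpact` and all identifiers are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// OfflineImpact maps every operator that no reporting node is connected to
// onto the active keeps it is a member of.
//
// keepMembers: keep ID -> member operators (assumed to come from the graph).
// reports:     reporting operator -> peers it currently lists as connected.
func OfflineImpact(keepMembers, reports map[string][]string) map[string][]string {
	online := make(map[string]bool)
	for reporter, peers := range reports {
		online[reporter] = true // a node submitting diagnostics is itself online
		for _, p := range peers {
			online[p] = true
		}
	}

	impact := make(map[string][]string)
	for keep, members := range keepMembers {
		for _, op := range members {
			if !online[op] {
				impact[op] = append(impact[op], keep)
			}
		}
	}
	// Map iteration order is random in Go; sort for stable output.
	for op := range impact {
		sort.Strings(impact[op])
	}
	return impact
}

func main() {
	keepMembers := map[string][]string{
		"keep1": {"opA", "opB", "opC"},
		"keep2": {"opA", "opD", "opC"},
	}
	reports := map[string][]string{
		"opA": {"opB"},
		"opB": {"opA", "opD"},
	}
	for op, keeps := range OfflineImpact(keepMembers, reports) {
		fmt.Printf("operator %s appears offline; affected keeps: %v\n", op, keeps)
	}
	// prints: operator opC appears offline; affected keeps: [keep1 keep2]
}
```

An operator is treated as online if it submits a report or appears in anyone's peer list, so the result gets more reliable as more operators opt in. With few reporters, an operator may be flagged simply because none of the reporters happens to be connected to it.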
