WIS2 IMS: Alert Management Mechanism with Prometheus and Alertmanager #23

@kurt-hectic

In WIS2 operations, issues with the system are detected by Global Brokers, Global Caches or Global Monitoring. A detected issue may lead to the creation of an incident in the WIS2 Incident Management System (IMS). Incidents are mapped to a country, classified, and then assigned either to the GISC responsible for that country or directly to the country. Incidents can be created automatically via an API or manually.
The WIS2 IMS is operated by the WMO Secretariat and integrated with the WMO Experts database. IMS users are created for designated WIS2 Global Services operator contacts, Expert Team for WIS Operations members, and National Focal Points for WIS, with group membership determining their role in the IMS.

Because WIS2 infrastructure is redundant, the same issue can be reported more than once, for example if two Global Monitoring centers or two subscribed Global Brokers detect the same issue in a country.
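
One possible way to handle such duplicates is to derive a stable key from the fields that describe the underlying issue and to leave out the service that detected it. The sketch below illustrates the idea only; the field names (`country`, `category`, `centre_id`, `detected_by`) are hypothetical and are not taken from the WIS2 IMS or any WIS2 specification.

```python
import hashlib


def incident_key(country: str, category: str, centre_id: str) -> str:
    """Build a stable key from the fields that identify the underlying issue.

    The detecting service is deliberately not part of the key, so reports of
    the same issue from different detectors map to the same key.
    All field names here are hypothetical.
    """
    parts = (country, category, centre_id)
    normalized = "|".join(part.strip().lower() for part in parts)
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def group_reports(reports):
    """Group reports by incident key and collect who detected each issue."""
    grouped = {}
    for report in reports:
        key = incident_key(
            report["country"], report["category"], report["centre_id"]
        )
        grouped.setdefault(key, []).append(report["detected_by"])
    return grouped


reports = [
    {"country": "MAR", "category": "no-data", "centre_id": "ma-example",
     "detected_by": "global-monitor-a"},
    {"country": "mar ", "category": "No-Data", "centre_id": "ma-example",
     "detected_by": "global-monitor-b"},
]
grouped = group_reports(reports)
```

With a key like this, a second report of the same issue could be attached to the existing incident as a comment instead of opening a new one. Whether that is the desired behaviour depends on the workflow still under discussion.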

The exact workflow will be tracked in the discussion of this issue: which issues lead to the creation of an incident in the IMS and by whom, whether new issues are validated, and to whom they are assigned.

Morocco has validated the automatic creation of incidents using a Grafana, Prometheus and Alertmanager architecture: once a metric reaches a custom alerting threshold, a custom integration script creates an incident in the WIS2 IMS via the Atlassian JIRA API.
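
To make the integration step more concrete, here is a minimal sketch of how an Alertmanager webhook notification might be turned into a request body for the Jira "create issue" endpoint (`POST /rest/api/2/issue`). This is not Morocco's script. The project key `WIS2`, the issue type `Incident`, and the `country` alert label are assumptions for illustration; the actual IMS project configuration may differ.

```python
def firing_alerts(webhook_body: dict) -> list:
    """Return only the alerts that are currently firing.

    Alertmanager webhook notifications carry an "alerts" list in which each
    entry has a "status" of "firing" or "resolved".
    """
    return [
        alert
        for alert in webhook_body.get("alerts", [])
        if alert.get("status") == "firing"
    ]


def alert_to_jira_issue(alert: dict,
                        project_key: str = "WIS2",      # hypothetical key
                        issue_type: str = "Incident"):  # hypothetical type
    """Map one Alertmanager alert to a Jira v2-style create-issue body."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    summary = annotations.get("summary") or labels.get("alertname", "WIS2 alert")
    description = "\n".join([
        annotations.get("description", "No description provided."),
        "Country: " + labels.get("country", "unknown"),  # hypothetical label
        "Started at: " + alert.get("startsAt", "unknown"),
        "Source: " + alert.get("generatorURL", "unknown"),
    ])
    return {
        "fields": {
            "project": {"key": project_key},
            "issuetype": {"name": issue_type},
            "summary": summary,
            "description": description,
        }
    }


example_webhook = {
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "NoDataReceived", "country": "MAR"},
            "annotations": {"summary": "No data received from centre"},
            "startsAt": "2024-01-01T00:00:00Z",
            "generatorURL": "http://prometheus.example/graph",
        },
        {"status": "resolved", "labels": {"alertname": "Old"}},
    ],
}
issues = [alert_to_jira_issue(a) for a in firing_alerts(example_webhook)]
```

Each resulting body would then be sent to the Jira API with the authentication configured for the IMS. The sketch leaves the HTTP call out because the endpoint URL and credentials are deployment-specific.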
