A command-line tool for SRE on-call automation. Given a Jira ticket number, it fetches the ticket details, identifies the alarm type from the title, extracts the target node from the description, runs the appropriate Ansible playbook, then updates the ticket with a mitigation comment, assigns it to the on-call engineer, and transitions it to In Progress — all in one command.
python cli.py --ticket OPS-1234
ticket number
│
▼
Jira API (fetch ticket)
│
├── summary → alarm type (keyword match)
└── description → node hostname (regex parse)
│
▼
alarm_handler.py (switch/case)
│
▼
ansible-playbook <playbook> -i <node>,
│
▼
Jira: post comment + assign + transition → In Progress
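The final step of the flow (comment, assign, transition) maps onto three Jira REST calls. A minimal sketch of the request payloads, assuming the Jira Cloud v3 API that the `curl` example in this README targets. The function name `build_jira_updates` and all argument values are hypothetical, and `jira_updater.py` may structure this differently.

```python
def build_jira_updates(issue_key, comment_text, account_id, transition_id):
    """Return (method, path, json_body) tuples for the three update calls.

    Hypothetical helper: it only builds the requests, it does not send them.
    """
    base = f"/rest/api/3/issue/{issue_key}"
    # The v3 comment endpoint expects the body in Atlassian Document Format.
    comment_body = {
        "body": {
            "type": "doc",
            "version": 1,
            "content": [
                {
                    "type": "paragraph",
                    "content": [{"type": "text", "text": comment_text}],
                }
            ],
        }
    }
    return [
        ("POST", f"{base}/comment", comment_body),
        # Jira Cloud identifies users by accountId (placeholder value here).
        ("PUT", f"{base}/assignee", {"accountId": account_id}),
        ("POST", f"{base}/transitions", {"transition": {"id": transition_id}}),
    ]
```

Keeping payload construction separate from the HTTP calls makes it easy to log the planned requests in a dry run without touching the ticket.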
| Alarm Type | Trigger keywords (in ticket title) | Playbook |
|---|---|---|
| HIGH_CPU | high cpu, cpu spike, cpu utilization | remediate_high_cpu.yml |
| DISK_FULL | disk full, disk usage, low disk, no space left | remediate_disk_full.yml |
| SERVICE_DOWN | service down, service unavailable, process not running | remediate_service_down.yml |
| OOM_KILL | oom kill, out of memory, memory killed | remediate_oom_kill.yml |
| NTP_DRIFT | ntp drift, clock skew, time drift | remediate_ntp_drift.yml |
| HIGH_MEMORY | high memory, memory usage, memory pressure | remediate_high_memory.yml |
| NETWORK_LATENCY | network latency, packet loss, high latency | remediate_network_latency.yml |
| SSL_EXPIRY | ssl expiry, certificate expir, tls expir | remediate_ssl_expiry.yml |
| LOAD_AVERAGE | load average, high load, load avg | remediate_load_average.yml |
| SWAP_USAGE | swap usage, swap full, high swap | remediate_swap_usage.yml |
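The keyword matching behind this table can be sketched as a case-insensitive substring search over the ticket summary. The name `ALARM_KEYWORDS` appears elsewhere in this README; the function name `resolve_alarm_type` and the first-match-wins rule are assumptions for illustration, not necessarily what `alarm_handler.py` does.

```python
from typing import Optional

# Subset of the table; dict order decides which alarm wins if keywords overlap.
ALARM_KEYWORDS = {
    "HIGH_CPU": ["high cpu", "cpu spike", "cpu utilization"],
    "DISK_FULL": ["disk full", "disk usage", "low disk", "no space left"],
    "SSL_EXPIRY": ["ssl expiry", "certificate expir", "tls expir"],
}


def resolve_alarm_type(summary: str) -> Optional[str]:
    """Return the first alarm type with a keyword in the summary, else None."""
    lowered = summary.lower()
    for alarm_type, keywords in ALARM_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return alarm_type
    return None
```

Truncated keywords such as `certificate expir` work because the match is a substring test, so "certificate expiry", "certificate expiring", and "certificate expired" all resolve to `SSL_EXPIRY`.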
The tool parses two fields from the Jira ticket:
Summary (title) — must contain a recognisable alarm keyword:
[ALERT] High CPU utilization on prod-web-07 — threshold 90%
Description — must contain the target node on its own line:
Node: prod-web-07.internal
Runbook: https://wiki.example.com/runbooks/high-cpu
Accepted prefixes: Node:, Host:, Server:, Target:, Affected Node:, Affected Host:
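One way to parse the node line is a multi-line regular expression anchored at the start of a line. This is a sketch under assumptions: the names `NODE_PATTERN` and `extract_node` are hypothetical, and case-insensitive prefix matching is a guess rather than documented behaviour.

```python
import re
from typing import Optional

# Prefixes taken from the list above. The pattern is anchored at line start,
# so "Host:" cannot accidentally match inside a longer word.
NODE_PATTERN = re.compile(
    r"^\s*(?:Affected Node|Affected Host|Node|Host|Server|Target)\s*:\s*(\S+)\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def extract_node(description: str) -> Optional[str]:
    """Return the hostname from the first matching line, or None."""
    match = NODE_PATTERN.search(description)
    return match.group(1) if match else None
```

Requiring the hostname to be the only token after the prefix is what enforces the "on its own line" rule: a line like `Node: prod-web-07 is down` would not match.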
- Python 3.10+
- Ansible installed and in $PATH
- SSH access from your machine to target nodes
git clone https://github.com/yourusername/sre-alarm-cli.git
cd sre-alarm-cli
pip install -r requirements.txt
cp env.example .env
Edit .env with your Jira credentials and Ansible settings.
# Run full automation
python cli.py --ticket OPS-1234
# Dry run — fetch, parse and log what would happen, but don't execute anything
python cli.py --ticket OPS-1234 --dry-run
2024-01-15 14:23:01 INFO sre-alarm-cli — Starting automation for ticket: OPS-1234
2024-01-15 14:23:01 INFO sre-alarm-cli — Fetched ticket — Summary: [ALERT] High CPU utilization on prod-web-07
2024-01-15 14:23:01 INFO sre-alarm-cli — Alarm type : HIGH_CPU
2024-01-15 14:23:01 INFO sre-alarm-cli — Target node: prod-web-07.internal
2024-01-15 14:23:01 INFO sre-alarm-cli — Playbook : playbooks/remediate_high_cpu.yml
2024-01-15 14:23:01 INFO ansible_runner — Running: ansible-playbook playbooks/remediate_high_cpu.yml ...
2024-01-15 14:23:08 INFO ansible_runner — Playbook completed successfully.
2024-01-15 14:23:08 INFO jira_updater — Comment posted to OPS-1234
2024-01-15 14:23:09 INFO jira_updater — Ticket OPS-1234 assigned to jane.doe
2024-01-15 14:23:09 INFO jira_updater — Ticket OPS-1234 transitioned to In Progress.
2024-01-15 14:23:09 INFO sre-alarm-cli — Done. Ticket OPS-1234 updated and set to In Progress.
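The `ansible-playbook <playbook> -i <node>,` invocation and the `--dry-run` behaviour could be wrapped roughly as follows. This is an illustrative sketch, not the contents of `ansible_runner.py`: the names `build_command` and `run_playbook` are hypothetical, and any extra Ansible options the real tool passes are not shown.

```python
import subprocess
from pathlib import Path

PLAYBOOK_DIR = Path("playbooks")


def build_command(playbook: str, node: str) -> list[str]:
    """Build the argument list for a single-host playbook run."""
    # The trailing comma makes Ansible treat the value as an inline host
    # list instead of a path to an inventory file.
    return [
        "ansible-playbook",
        (PLAYBOOK_DIR / playbook).as_posix(),
        "-i",
        f"{node},",
    ]


def run_playbook(playbook: str, node: str, dry_run: bool = False) -> int:
    """Run the playbook and return its exit code; in a dry run, only print."""
    command = build_command(playbook, node)
    if dry_run:
        print("Dry run, would execute:", " ".join(command))
        return 0
    # An argument list (no shell=True) avoids shell injection through
    # hostnames parsed from ticket text.
    completed = subprocess.run(command, check=False)
    return completed.returncode
```

Because the node name comes from free text in a ticket, passing it as a list element rather than interpolating it into a shell string is the safer choice.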
sre-alarm-cli/
├── cli.py # Entry point and orchestration
├── jira_client.py # Jira REST API auth and ticket fetching
├── alarm_handler.py # Alarm type resolution and playbook mapping
├── ansible_runner.py # subprocess wrapper for ansible-playbook
├── jira_updater.py # Comment, assign, and transition via Jira API
├── config.py # Environment variable loading
├── playbooks/
│ ├── remediate_high_cpu.yml
│ ├── remediate_disk_full.yml
│ ├── remediate_service_down.yml
│ ├── remediate_oom_kill.yml
│ ├── remediate_ntp_drift.yml
│ ├── remediate_high_memory.yml
│ ├── remediate_network_latency.yml
│ ├── remediate_ssl_expiry.yml
│ ├── remediate_load_average.yml
│ └── remediate_swap_usage.yml
├── env.example
├── requirements.txt
└── README.md
- Add a new entry to ALARM_KEYWORDS in alarm_handler.py:
  "MY_ALARM": ["keyword one", "keyword two"],
- Add it to ALARM_PLAYBOOK_MAP:
  "MY_ALARM": "remediate_my_alarm.yml",
- Create playbooks/remediate_my_alarm.yml.
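Since adding an alarm touches three places, a small consistency check can catch a forgotten step. The sketch below is not part of the repository as far as this README shows; `find_registration_gaps` is a hypothetical helper that takes the two dictionaries and the playbook directory as arguments.

```python
from pathlib import Path


def find_registration_gaps(alarm_keywords, alarm_playbook_map, playbook_dir):
    """Return a list of problems; an empty list means all three steps were done."""
    problems = []
    for alarm_type in alarm_keywords:
        playbook = alarm_playbook_map.get(alarm_type)
        if playbook is None:
            problems.append(f"{alarm_type}: no entry in ALARM_PLAYBOOK_MAP")
        elif not (Path(playbook_dir) / playbook).is_file():
            problems.append(f"{alarm_type}: missing file {playbook}")
    # Also flag playbook mappings that no keyword can ever trigger.
    for alarm_type in alarm_playbook_map:
        if alarm_type not in alarm_keywords:
            problems.append(f"{alarm_type}: no entry in ALARM_KEYWORDS")
    return problems
```

Run against the real `ALARM_KEYWORDS` and `ALARM_PLAYBOOK_MAP`, a check like this could serve as a unit test or a startup assertion.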
curl -u your@email.com:your_api_token \
https://your-org.atlassian.net/rest/api/3/issue/OPS-1/transitions \
| python3 -m json.tool | grep -A2 '"name"'
Set the id for "In Progress" as JIRA_TRANSITION_IN_PROGRESS in your .env.
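As an alternative to grepping the JSON by eye, the transition id can be picked out programmatically. A minimal sketch, assuming the response has the usual `{"transitions": [{"id": ..., "name": ...}]}` shape. The helper name `find_transition_id` is hypothetical and the ids in the sample are made up, since real ids depend on your workflow.

```python
import json
from typing import Optional


def find_transition_id(payload: dict, name: str = "In Progress") -> Optional[str]:
    """Return the id of the transition with the given name, or None."""
    for transition in payload.get("transitions", []):
        if transition.get("name", "").lower() == name.lower():
            return transition.get("id")
    return None


# Trimmed, illustrative response body; a real one carries more fields.
sample = json.loads(
    '{"transitions": ['
    '{"id": "11", "name": "To Do"}, '
    '{"id": "21", "name": "In Progress"}]}'
)
print(find_transition_id(sample))
```

The endpoint lists only the transitions available from the issue's current status, so query an issue that can actually move to In Progress.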
MIT