Commit f669d33

initial draft
1 parent 65a391e commit f669d33

1 file changed

+69
-0
lines changed

@@ -0,0 +1,69 @@
1+
---
2+
title: Incident management in Azure SRE Agent (preview)
3+
description: Learn how the Azure SRE Agent incident management capabilities help reduce manual intervention and accelerate resolution times for your Azure resources.
4+
author: craigshoemaker
5+
ms.topic: conceptual
6+
ms.date: 07/21/2025
7+
ms.author: cshoe
8+
ms.service: azure
9+
---
10+
11+
# Incident management in Azure SRE Agent (preview)
12+
13+
Azure SRE Agent streamlines incident management by automatically collecting, analyzing, and responding to alerts from various monitoring platforms. This article explains how the agent processes incidents, evaluates their severity, and takes appropriate actions based on your configuration.
14+
15+
Azure SRE Agent receives incident alerts from incident management platforms such as:
16+
17+
* [Azure Monitor](/azure/azure-monitor/fundamentals/overview)
18+
* [Incident control management (ICM)](/compliance/assurance/assurance-incident-management)
19+
* [PagerDuty](https://www.pagerduty.com/)
20+
21+
Alerts are triggered by predefined conditions configured in these external systems.
22+
23+
When SRE Agent receives an alert from an incident management platform, it brings the incident into its context, analyzes the situation, and determines the next steps. This process mimics how a human site reliability engineer (SRE) would acknowledge and investigate an incident.
24+
25+
The agent reviews logs, health probes, and other telemetry to assess the incident. During the assessment step, the agent summarizes findings, determines if the alert is a false positive, and decides whether action is needed.
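
The assessment step described above can be pictured with a minimal Python sketch. This is illustrative only and isn't the agent's implementation: the `Alert` and `Assessment` types, the `assess` function, and the rule "no failing probes and no error logs means false positive" are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    """Hypothetical alert shape; field names are assumptions for this sketch."""
    title: str
    health_probes: dict[str, bool] = field(default_factory=dict)  # probe name -> healthy?
    error_log_count: int = 0


@dataclass
class Assessment:
    summary: str
    false_positive: bool
    action_needed: bool


def assess(alert: Alert) -> Assessment:
    """Summarize telemetry, flag likely false positives, and decide whether to act."""
    failing = sorted(name for name, healthy in alert.health_probes.items() if not healthy)
    # Assumed rule: nothing failing and no error logs means the alert is a false positive.
    false_positive = not failing and alert.error_log_count == 0
    summary = (
        f"{alert.title}: {len(failing)} failing probe(s), "
        f"{alert.error_log_count} error log entries"
    )
    return Assessment(summary, false_positive, action_needed=not false_positive)
```

In this sketch, an alert with healthy probes and no error logs is reported as a false positive that needs no action, while any failing probe marks the incident as actionable.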
26+
27+
## How agents respond
28+
29+
SRE Agent responds to incidents based on its configuration and operational mode.
30+
31+
* **Reader**: In reader mode, the agent provides recommendations and requires human intervention for resolution.
32+
33+
* **Autonomous**: In autonomous mode, the agent can automatically close incidents or take corrective actions. The agent can also update or close incidents in the managing platforms to keep incident status synchronized across platforms.
34+
35+
You define the rules for how incidents of different priorities are handled. By customizing the rules in the managing platforms, you decide which incidents the agent should acknowledge, resolve, or escalate. These rules can be set via prompts or configuration options.
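
As a rough illustration of how the operational mode and priority rules interact, the following Python sketch maps an incident priority to an action and then gates that action by mode. The `Mode` enum, the `RULES` mapping, the priority labels, and the `plan_response` function are hypothetical names chosen for this example. They don't represent the agent's configuration format.

```python
from enum import Enum


class Mode(Enum):
    READER = "reader"
    AUTONOMOUS = "autonomous"


# Hypothetical rules: incident priority -> action the agent should take.
RULES = {"P1": "escalate", "P2": "acknowledge", "P3": "resolve"}


def plan_response(priority: str, mode: Mode, rules: dict[str, str] = RULES) -> str:
    """Return what the agent would do for an incident of the given priority."""
    # Assumption: priorities without a rule are escalated to a human.
    action = rules.get(priority, "escalate")
    if mode is Mode.READER:
        # Reader mode only recommends; a human carries out the action.
        return f"recommend: {action}"
    # Autonomous mode carries out the action itself.
    return f"execute: {action}"
```

With these assumed rules, `plan_response("P3", Mode.READER)` returns `"recommend: resolve"`, and the same call with `Mode.AUTONOMOUS` returns `"execute: resolve"`.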
36+
37+
## Platform integration
38+
39+
Azure Monitor is the default integration and requires minimal setup. Non-Microsoft systems like PagerDuty require extra setup to define incident handling preferences.
40+
41+
To access the incident management settings, open your agent in the Azure portal. Select the **Settings** tab, and then select **Incident platform**.
42+
43+
### Azure Monitor
44+
45+
By default, Azure Monitor is configured as the agent's incident management platform. As the agent encounters incidents, it sends incident data to any instances of Azure Monitor in the resource groups that SRE Agent manages.
46+
47+
To use a different management platform, first disconnect Azure Monitor as the incident management platform for the SRE Agent.
48+
49+
### PagerDuty
50+
51+
Set up PagerDuty using the following settings:
52+
53+
| Setting | Value |
54+
|---|---|
55+
| Incident platform | Select **PagerDuty**. |
56+
| REST API access key | Enter your PagerDuty REST API access key. |
57+
| Quickstart handler | Keep the quickstart handler checkbox selected. |
58+
59+
Select **Save** to save your changes.
60+
61+
After you save the changes, PagerDuty manages incidents for the agent.
62+
63+
#### Tools
64+
65+
You can enable a series of tools that provide granular control over how PagerDuty manages incidents. To further refine the incident management process, you can also add free-form text instructions (in the form of an LLM prompt) to customize how PagerDuty responds to incidents.
66+
67+
## Related content
68+
69+
* [Security contexts](./security-context.md)
