articles/sre-agent/incident-management.md
Lines changed: 11 additions & 11 deletions
@@ -10,17 +10,17 @@ ms.service: azure

 # Incident management in Azure SRE Agent (preview)

-Azure SRE Agent streamlines incident management by automatically collecting, analyzing, and responding to alerts from various monitoring platforms. This article explains how the agent processes incidents, evaluates their severity, and takes appropriate actions based on your configuration. L
+Azure SRE Agent streamlines incident management by automatically collecting, analyzing, and responding to alerts from various management platforms. This article explains how the agent processes incidents, evaluates their severity, and takes appropriate actions based on your configuration.

-Azure SRE Agent receives incident alerts from incident management platforms such as:
+Azure SRE Agent receives alerts from incident management platforms such as:
 * [Incident control management (ICM)](/compliance/assurance/assurance-incident-management)
 * [PagerDuty](https://www.pagerduty.com/)

-Alerts are triggered by predefined conditions configured in these external systems.
+Alerts are triggered by predefined conditions configured in these systems external to SRE Agent.

-When SRE Agent receives an alert from the management platform, SRE Agent brings the incident into its context, analyzes the situation, and determines the next steps. This process mimics how a human SRE would acknowledge and investigate an incident.
+When SRE Agent receives an alert from the management platform, the agent brings the incident into its context, analyzes the situation, and determines the next steps. This process mimics how a human SRE would acknowledge and investigate an incident.

 The agent reviews logs, health probes, and other telemetry to assess the incident. During the assessment step, the agent summarizes findings, determines if the alert is a false positive, and decides whether action is needed.

@@ -30,25 +30,25 @@ SRE Agent responds to incidents based on its configuration and operational mode.

 * **Reader**: In reader mode, the agent provides recommendations and requires human intervention for resolution.

-* **Autonomous**: In autonomous mode, the agent could automatically close incidents or take corrective actions. The agent can also update or close incidents in managing platforms to maintain synchronization across platforms.
+* **Autonomous**: In autonomous mode, the agent could automatically close incidents or take corrective actions, depending on your configuration settings. The agent can also update or close incidents in management platforms to maintain synchronization across platforms.

-You define the rules for how incidents of different priorities are handled. By customizing the rules in the managing platforms, you decide which incidents the agent should acknowledge, resolve, or escalate. These rules can be set via prompts or configuration options.
+You define the rules for how incidents of different priorities are handled. By customizing the rules in the management platforms, you decide which incidents the agent should acknowledge, resolve, or escalate. These rules can be set via prompts or configuration options.

 ## Platform integration

 Minimal setup is required for Azure Monitor (default integration), while non-Microsoft systems like PagerDuty require extra setup for incident handling preferences.

-To access the incident management settings, open your agent in the Azure portal. Select *Settings tab* and select **Incident platform**.
+To access the incident management settings, open your agent in the Azure portal. Select **Settings** and select **Incident platform**.

 ### Azure Monitor

-By default, Azure Monitor is configured as the agent's incident management platform. As the agent encounters incidents any instances of Azure Monitor in any resource groups managed by SRE Agent are contacted with incident data.
+By default, Azure Monitor is configured as the agent's incident management platform. As the agent encounters incidents any instances of Azure Monitor in any resource groups managed by SRE Agent process incident data.

 To use a different management platform, first disconnect Azure Monitor as the incident management platform for the SRE Agent.

 ### PagerDuty

-Set up PagerDuty using the following settings:
+To set up PagerDuty, open the agent in the Azure portal, select **Settings** then select **Incident platform**, and enter the following settings:

 | Setting | Value |
 |---|---|
@@ -58,11 +58,11 @@ Set up PagerDuty using the following settings:

 Select **Save** to save your changes.

-Once the changes are saved, now PagerDuty is responsible to manage incidents for the agent.
+Once the changes are saved, PagerDuty is now responsible to manage incidents for the agent.

 #### Tools

-You can choose to enable a series of tools which provide granular control over how PagerDuty manages incidents. To further refine the incident management process, you can also add free-form text instructions (in the form of an LLM prompt) to customize how PagerDuty responds to incidents.
+You can choose to enable a series of tools that provide granular control over how PagerDuty manages incidents. To further refine the incident management process, you can also add free-form text instructions (in the form of an LLM prompt) to customize how PagerDuty responds to incidents.
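
The response modes and per-priority rules described in this file can be pictured with a short sketch. This is only an illustration, not how SRE Agent is implemented: the names `Mode`, `Incident`, `RULES`, and `decide`, the numeric priority scale, and the action strings are all assumptions introduced here. It shows a false positive being closed, a priority rule selecting acknowledge, resolve, or escalate, and reader mode turning the chosen action into a recommendation for a human.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    """The two operational modes named in the article."""
    READER = "reader"
    AUTONOMOUS = "autonomous"


@dataclass(frozen=True)
class Incident:
    incident_id: str
    priority: int          # hypothetical scale: 1 is the most urgent
    false_positive: bool   # hypothetical result of the assessment step


# Hypothetical per-priority rules. In the product, rules are set through
# prompts or configuration options, not through a Python dictionary.
RULES = {1: "escalate", 2: "acknowledge", 3: "resolve"}


def decide(incident: Incident, mode: Mode, rules: dict[int, str] = RULES) -> str:
    """Return the action for an incident under the given mode."""
    if incident.false_positive:
        action = "close"
    else:
        # Unknown priorities fall back to escalation (an assumption).
        action = rules.get(incident.priority, "escalate")
    if mode is Mode.READER:
        # Reader mode only recommends; a human carries out the action.
        return f"recommend:{action}"
    return action
```

For example, `decide(Incident("INC-1", 3, False), Mode.READER)` returns `"recommend:resolve"`, while the same incident in autonomous mode returns `"resolve"`.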