Commit cc166b8

Merge pull request #299945 from craigshoemaker/sre/overview-branding-faq
[SRE Agent] Update: Overview -> branding and FAQ section
2 parents a2d068f + 7d5bfab commit cc166b8

3 files changed

+155
-14
lines changed

articles/app-service/sre-agent-overview.md

Lines changed: 71 additions & 12 deletions
Original file line number | Diff line number | Diff line change
@@ -1,17 +1,15 @@
11
---
2-
title: SRE Agent overview (preview)
2+
title: Azure SRE Agent overview (preview)
33
description: Learn how AI-enabled agents help solve problems and support resilient and self-healing systems on your behalf.
4-
services: container-apps
54
author: craigshoemaker
6-
ms.service: azure-container-apps
75
ms.topic: conceptual
8-
ms.date: 05/08/2025
6+
ms.date: 05/16/2025
97
ms.author: cshoe
108
---
119

12-
# SRE Agent overview (preview)
10+
# Azure SRE Agent overview (preview)
1311

14-
Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval.
12+
Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. Azure SRE Agent brings these principles to your Azure-hosted applications by providing AI-powered monitoring, troubleshooting, and remediation capabilities. The agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Through a chat interface, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval.
1513

1614
Agents have access to every resource inside the resource groups associated to the agent. Therefore, agents:
1715

@@ -26,27 +24,35 @@ An SRE Agent also integrates with [PagerDuty](https://www.pagerduty.com/) to sup
2624
> [!NOTE]
2725
> The SRE Agent feature is in limited preview. To sign up for access, fill out the [SRE Agent application](https://go.microsoft.com/fwlink/?linkid=2319540).
2826
27+
By using an SRE Agent, you consent to the product-specific [Supplemental Terms of Use for Microsoft Azure Previews](https://azure.microsoft.com/support/legal/preview-supplemental-terms/).
28+
2929
## Key features
3030

3131
The SRE Agent offers several key features that enhance the reliability and performance of your Azure resources:
3232

33-
- **Proactive monitoring**: Continuous resource monitoring with real-time alerts for potential issues and daily resource reports.
33+
- **Proactive monitoring**: Continuous 24x7 resource monitoring with real-time alerts for potential issues and daily resource reports.
3434

3535
- **Automated mitigation:** Automatic detection and mitigation of common issues, reducing downtime and improving resource health. While agents attempt to work on your behalf, all automation requires your approval.
3636

37+
- **Infrastructure best practices:** Identify and remediate resources that don't follow security best practices, and help with updates.
38+
39+
- **Automates incident response:** Automatically respond to Azure Monitor alerts or PagerDuty incidents with initial analysis.
40+
41+
- **Accelerates root cause analysis:** Diagnose root causes of app issues by analyzing metrics and logs and suggest mitigations.
42+
3743
- **Resource visualization**: Comprehensive views of your resource dependencies and health status.
3844

3945
:::image type="content" source="media/sre-agent/sre-agent-knowldege-graph.png" alt-text="Screenshot of an SRE Agent knowledge graph.":::
4046

41-
An SRE Agent works to proactively monitor and maintain your Azure services. Each day your agent creates daily resource reports which provide insights into the health and status of your applications. Reports include:
47+
An SRE Agent works to proactively monitor and maintain your Azure services. Each day, your agent creates a resource report that provides insights into the health and status of your applications.
4248

43-
- **Actionable steps**: Measures you can take each day to reduce errors and harden security practices.
49+
Reports include:
4450

45-
- **Key insights**: Summaries of important details relevant to the health and maintenance of your Azure resources.
51+
- **Incident summary:** Information about incidents that the SRE Agent raised on the previous day, categorized as active, mitigated, or resolved.
4652

47-
- **Reasoning**: Summaries of analysis done by your agent helping surface possible problem areas in your apps.
53+
- **Application group performance and health:** Key metrics for each application group to assess system stability and performance. Metrics include: availability, CPU usage, and memory usage.
4854

49-
- **Actions already taken by the agent**: A list of tasks the agent did on your behalf for the day.
55+
- **Action summary:** Summaries of important details and insights relevant to the health and maintenance of your Azure resources.
5056

5157
## Scenarios
5258

@@ -67,3 +73,56 @@ An agent can provide detailed information about different aspects of your apps a
6773
## Preview access
6874

6975
Access to an SRE Agent is only available as a limited preview. To sign up for access, fill out the [SRE Agent application](https://go.microsoft.com/fwlink/?linkid=2319540).
76+
77+
## Frequently asked questions
78+
79+
### What is Azure SRE Agent, what can it do, and what are its intended uses?
80+
81+
The Azure SRE Agent is a system designed to assist Site Reliability Engineers (SREs) in managing their Azure resources. Agents perform tasks such as monitoring, diagnosing, and mitigating issues.
82+
83+
The system takes your input on the resources it manages, the health or status of specific resources, and any issues with those resources.
84+
85+
Outputs from the agent include:
86+
87+
- Information about resources
88+
- Mitigations and solutions to issues
89+
- Recommendations on best practices
90+
- Actions to resolve issues or implement best practices with user approval
91+
92+
The Azure SRE Agent offers functionalities to assist SREs in managing Azure resources. Agents monitor system metrics and logs to detect issues early, diagnose the root causes of problems, and implement fixes and preventive measures to avoid future incidents. Examples of tasks the agent can handle include:
93+
94+
- Diagnosing and troubleshooting "application down" scenarios
95+
- Getting resource availability information
96+
- Ensuring Azure resources are following best practices
97+
98+
The agent performs actions on behalf of the user with the right approvals and permissions, ensuring that humans remain in control.
99+
100+
The intended use of the SRE Agent is to help you monitor, diagnose issues, and maintain your Azure resources. The agent is designed to improve the reliability and efficiency of software systems by handling tasks like:
101+
102+
- Reading customer resource metrics and logs
103+
- Providing mitigations or recommendations
104+
- Performing actions on the customer's behalf with the right approvals and permissions
105+
106+
The agent aims to reduce the toil of SREs by automating routine tasks and providing insights and recommendations to enhance system reliability.
107+
108+
### How was SRE Agent evaluated? What metrics are used to measure performance?
109+
110+
The SRE Agent was evaluated through various assessment activities, including user validation, measurement, and mitigations. Metrics used to measure performance include the accuracy of diagnostics, the effectiveness of mitigations, and user feedback on the agent's recommendations.
111+
112+
The evaluation process involved testing the agent's capabilities across different scenarios, such as app availability and incident response, to ensure its reliability and effectiveness. Results are generalizable across use cases that weren't part of the initial evaluation. The agent's design allows it to adapt to different situations and provide consistent performance.
113+
114+
### What are the limitations of SRE Agent? How can the impact of SRE Agent’s limitations be minimized?
115+
116+
The known limitations of the SRE Agent include its reliance on user approval for performing actions, which can slow down the response time in critical situations. Additionally, the agent might not be able to solve all problems or could produce inaccurate recommendations due to limitations in its knowledge base.
117+
118+
You can minimize the impact of these limitations by providing detailed and accurate inputs, regularly updating the agent's configuration, and closely monitoring its actions. Ensuring a human SRE reviews and validates the agent's recommendations also helps mitigate potential errors.
119+
120+
### What operational factors and settings allow for effective and responsible use of SRE Agent?
121+
122+
Effective and responsible use of the SRE Agent requires configuring the system to manage the appropriate resources and setting up permissions and approvals for actions. Ensuring that the agent operates within defined parameters and regularly reviewing its actions can help maintain reliability and safety.
123+
124+
### How do I provide feedback?
125+
126+
The current feedback system for the Azure SRE Agent includes thumbs-up and thumbs-down buttons for you to rate the quality of the agent's responses.
127+
128+
When you select either the thumbs-up or thumbs-down button, a small pop-up appears in the same view containing a text box for free-form text feedback. You can enter comments here to help the development team identify areas for improvement.
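
The overview above repeats that any agent action taken on your behalf requires your approval. As a rough illustration of that human-in-the-loop pattern, here is a minimal Python sketch. It is not how Azure SRE Agent is implemented: `ProposedAction`, `execute_with_approval`, and the `approve` callback are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ProposedAction:
    """A hypothetical action an agent wants to take (illustration only)."""
    description: str
    run: Callable[[], str]  # performs the action and returns a result message


def execute_with_approval(
    action: ProposedAction,
    approve: Callable[[ProposedAction], bool],
) -> str:
    """Run the action only if the approval callback returns True."""
    if not approve(action):
        return f"skipped: {action.description}"
    return action.run()


# Hypothetical usage: the approver here is a stub that rejects everything.
restart = ProposedAction("restart app", lambda: "restarted")
print(execute_with_approval(restart, lambda a: False))
print(execute_with_approval(restart, lambda a: True))
```

In a real workflow the `approve` callback would be replaced by whatever mechanism collects the user's decision. The point of the sketch is only that nothing runs before that decision is made.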
Lines changed: 78 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -0,0 +1,78 @@
1+
---
2+
title: Create and use an Azure SRE Agent
3+
description: Learn to use an automated agent to resolve problems and keep your apps running in Azure.
4+
author: craigshoemaker
5+
ms.topic: how-to
6+
ms.date: 05/16/2025
7+
ms.author: cshoe
8+
---
9+
10+
# Create and use an Azure SRE Agent
11+
12+
An Azure SRE Agent helps you maintain the health and performance of your Azure resources through AI-powered monitoring and assistance. Agents continuously watch your resources for issues, provide troubleshooting help, and suggest remediation steps, all through a natural language chat interface. To ensure accuracy and control, any agent action taken on your behalf requires your approval.
13+
14+
This article demonstrates how to create an SRE Agent and connect it to your resources to maintain optimal application performance.
15+
16+
## Create an agent
17+
18+
Create an agent by pointing it to the resource groups you want to monitor.
19+
20+
### Prerequisites
21+
22+
You need to grant your agent permissions so that it can access the Azure resources.
23+
24+
Before you can create a new agent, make sure your user account has the correct permissions. Your account needs `Microsoft.Authorization/roleAssignments/write` permissions using either [Role Based Access Control Administrator](/azure/role-based-access-control/built-in-roles) or [User Access Administrator](/azure/role-based-access-control/built-in-roles).
25+
26+
### Create
27+
28+
To create an SRE Agent, follow these steps:
29+
30+
1. Go to the Azure portal and search for and select **SRE Agent**.
31+
32+
1. Select **Create**.
33+
34+
1. Enter the following values in the *Create agent* window:
35+
36+
| Property | Value |
37+
|---|---|
38+
| Subscription | Select your Azure subscription. |
39+
| Resource group | Select an existing resource group or, to create a new one, enter a name. |
40+
| Name | Enter a name for your agent. |
41+
| Region | Select **Sweden Central**.<br><br>During preview, SRE Agents are only available in the *Sweden Central* region, but they can monitor resources in any Azure region.|
42+
| Choose role | Select **Contributor role**. |
43+
44+
1. Select the **Select resource groups** button.
45+
46+
1. In the *Select resource groups to monitor* window, search for the resource group you want to monitor.
47+
48+
Avoid selecting the resource group name link. To select a resource group, check the checkbox next to the resource group.
49+
50+
1. Scroll to the bottom of the dialog window and select **Save**.
51+
52+
1. Select **Create**.
53+
54+
## Chat with your agent
55+
56+
Your agent has access to any resource inside the resource group associated with the agent. Use the chat feature to help you inquire about and resolve issues related to your resources.
57+
58+
1. Go to the Azure portal, search for and select **Azure SRE Agent**.
59+
60+
1. Locate your agent in the list and select the agent name.
61+
62+
Once the chat window loads, you can begin asking your agent questions. Here's a series of questions that can help you get started:
63+
64+
- What can you help me with?
65+
- What subscriptions/resource groups/resources are you managing?
66+
- What alerts should I set up for `<RESOURCE_NAME>`?
67+
- Show me a visualization of `2xx` requests vs HTTP errors for my web apps across all subscriptions
68+
69+
If you have a specific problem in mind, you could ask questions like:
70+
71+
- Why is `<RESOURCE_NAME>` slow?
72+
- Why is `<RESOURCE_NAME>` not working?
73+
- Can you investigate `<RESOURCE_NAME>`?
74+
- Can you get me the `<METRIC>` of `<RESOURCE_NAME>`?
75+
76+
## Related content
77+
78+
- [Azure SRE Agent overview](./sre-agent-overview.md)
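
The prerequisite in the how-to article above asks for an account with `Microsoft.Authorization/roleAssignments/write`. Azure role definitions list allowed operations under `Actions` and exclusions under `NotActions`, and both can contain `*` wildcards. The following Python sketch shows one way to reason about whether a role definition covers the required operation. It assumes case-insensitive comparison, and the three sample role definitions are hypothetical data for illustration, not copies of Azure built-in roles.

```python
from fnmatch import fnmatchcase

REQUIRED_ACTION = "Microsoft.Authorization/roleAssignments/write"


def matches(pattern: str, action: str) -> bool:
    """Case-insensitive match where `*` in the pattern acts as a wildcard."""
    return fnmatchcase(action.lower(), pattern.lower())


def grants(actions: list[str], not_actions: list[str],
           required: str = REQUIRED_ACTION) -> bool:
    """True if an Actions pattern allows `required` and no NotActions pattern excludes it."""
    allowed = any(matches(p, required) for p in actions)
    excluded = any(matches(p, required) for p in not_actions)
    return allowed and not excluded


# Hypothetical role definitions, made up for this example.
samples = {
    "sample-access-admin": (["Microsoft.Authorization/*", "*/read"], []),
    "sample-reader": (["*/read"], []),
    "sample-broad-but-excluded": (["*"], ["Microsoft.Authorization/*/Write"]),
}

for name, (actions, not_actions) in samples.items():
    print(name, grants(actions, not_actions))
```

The third sample shows why a role with a broad `*` action is not enough on its own: an exclusion in `NotActions` removes the write permission on role assignments.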

articles/app-service/toc.yml

Lines changed: 6 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -219,8 +219,12 @@ items:
219219
href: /azure/templates/microsoft.web/allversions
220220
- name: Logs and monitoring
221221
items:
222-
- name: SRE Agent overview
223-
href: sre-agent-overview.md
222+
- name: Troubleshoot and resolve issues with an agent
223+
items:
224+
- name: Overview
225+
href: sre-agent-overview.md
226+
- name: Use an SRE agent
227+
href: sre-agent-usage.md
224228
- name: Monitor App Service
225229
href: monitor-app-service.md
226230
- name: Monitoring data reference
