|
1 | 1 | ---
|
2 |
| -title: SRE Agent overview (preview) |
| 2 | +title: Azure SRE Agent overview (preview) |
3 | 3 | description: Learn how AI-enabled agents help solve problems and support resilient and self-healing systems on your behalf.
|
4 | 4 | services: container-apps
|
5 | 5 | author: craigshoemaker
|
6 | 6 | ms.service: azure-container-apps
|
7 | 7 | ms.topic: conceptual
|
8 |
| -ms.date: 05/08/2025 |
| 8 | +ms.date: 05/16/2025 |
9 | 9 | ms.author: cshoe
|
10 | 10 | ---
|
11 | 11 |
|
12 |
| -# SRE Agent overview (preview) |
| 12 | +# Azure SRE Agent overview (preview) |
13 | 13 |
|
14 |
| -Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval. |
| 14 | +Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An Azure SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval. |
15 | 15 |
|
16 | 16 | Agents have access to every resource inside the resource groups associated to the agent. Therefore, agents:
|
17 | 17 |
|
@@ -67,3 +67,52 @@ An agent can provide detailed information about different aspects of your apps a
|
67 | 67 | ## Preview access
|
68 | 68 |
|
69 | 69 | Access to an SRE Agent is only available as a limited preview. To sign up for access, fill out the [SRE Agent application](https://go.microsoft.com/fwlink/?linkid=2319540).
|
| 70 | + |
| 71 | +## Frequently asked questions |
| 72 | + |
| 73 | +### What is Azure SRE Agent, what can it do, and what are its intended uses? |
| 74 | + |
| 75 | +The Azure SRE Agent is a system designed to assist Site Reliability Engineers (SREs) in managing their Azure resources. Agents perform tasks such as monitoring, diagnosing, and mitigating issues. |
| 76 | + |
| 77 | +The system takes your input on the resources it manages, the health or status of specific resources, and any issues to those resources. |
| 78 | + |
| 79 | +Outputs from the agent include: |
| 80 | + |
| 81 | +- Information about resources |
| 82 | +- Mitigations and solutions to issues |
| 83 | +- Recommendations on best practices |
| 84 | +- Actions to resolve issues or implement best practices with user approval |
| 85 | + |
| 86 | +The Azure SRE Agent offers functionalities to assist SREs in managing Azure resources. Agents monitor system metrics and logs to detect issues early, diagnose the root causes of problems, and implement fixes and preventive measures to avoid future incidents. Examples of tasks the agent can handle include: |
| 87 | + |
| 88 | +- Diagnosing and troubleshooting "application down" scenarios |
| 89 | +- Getting resource availability information |
| 90 | +- Ensuring Azure resources are following best practices. |
| 91 | + |
| 92 | +The agent performs actions on behalf of the user with the right approvals and permissions, ensuring that humans remain in control. |
| 93 | + |
| 94 | +The intended use of the SRE Agent is to help you monitor, diagnose issues, and maintain your Azure resources. The agent is designed to improve the reliability and efficiency of software systems by handling tasks like: |
| 95 | + |
| 96 | +- Reading customer resource metrics and logs |
| 97 | +- Provide mitigations or recommendations |
| 98 | +- Perform actions on the customer's behalf with the right approvals and permissions |
| 99 | + |
| 100 | +The agent aims to reduce the toil of SREs by automating routine tasks and providing insights and recommendations to enhance system reliability. |
| 101 | + |
| 102 | +### How was SRE Agent evaluated? What metrics are used to measure performance? |
| 103 | + |
| 104 | +The SRE Agent was evaluated through various assessment activities, including user validation, measurement, and mitigations. Metrics used to measure performance include the accuracy of diagnostics, the effectiveness of mitigations, and user feedback on the agent's recommendations. The evaluation process involved testing the agent's capabilities across different scenarios, such as app availability and incident response, to ensure its reliability and effectiveness. Results are generalizable across use cases that weren't part of the initial evaluation. The agent's design allows it to adapt to different situations and provide consistent performance. |
| 105 | + |
| 106 | +### What are the limitations of SRE Agent? How can impact of SRE Agent’s limitations be minimized? |
| 107 | + |
| 108 | +The known limitations of the SRE Agent include its reliance on user approval for performing actions, which can slow down the response time in critical situations. Additionally, the agent might not be able to solve all problems or could produce inaccurate recommendations due to limitations in its knowledge base. You can minimize the impact of these limitations by providing detailed and accurate inputs, regularly updating the agent's configuration, and closely monitoring its actions. Ensuring a human SRE reviews and validates the agent's recommendations also helps mitigate potential errors. |
| 109 | + |
| 110 | +### What operational factors and settings allow for effective and responsible use of SRE Agent? |
| 111 | + |
| 112 | +Effective and responsible use of the SRE Agent requires configuring the system to manage the appropriate resources and setting up permissions and approvals for actions. Ensuring that the agent operates within defined parameters and regularly reviewing its actions can help maintain reliability and safety. |
| 113 | + |
| 114 | +### How do I provide feedback? |
| 115 | + |
| 116 | +The current feedback system for the Azure SRE Agent includes thumbs-up and thumbs-down buttons for you to rate the quality of the agent's responses. |
| 117 | + |
| 118 | +When you select either the thumbs-up or thumbs-down button, a small pop-up appears in the same view containing a text box for free-form text feedback. You can enter submit comments here to help the development identify areas for improvement. |
0 commit comments