articles/app-service/sre-agent-overview.md
Lines changed: 13 additions & 22 deletions
@@ -5,13 +5,13 @@ services: container-apps
 author: craigshoemaker
 ms.service: azure-container-apps
 ms.topic: conceptual
-ms.date: 05/05/2025
+ms.date: 05/06/2025
 ms.author: cshoe
 ---
 
 # SRE Agent overview (preview)
 
-Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services.
+Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval.
 
 Agents have access to every resource inside the resource groups associated to the agent. Therefore, agents:
 
@@ -32,9 +32,11 @@ The SRE Agent offers several key features that enhance the reliability and perfo
 
 -**Proactive monitoring**: Continuous resource monitoring with real-time alerts for potential issues and daily resource reports.
 
--**Automated mitigation:** Automatic detection and mitigation of common issues, reducing downtime and improving resource health.
+-**Automated mitigation:** Automatic detection and mitigation of common issues, reducing downtime and improving resource health. While agents attempt to work on your behalf, all automation requires your approval.
 
--**Resource visualization**: Comprehensive views of your resource dependencies and health status
+-**Resource visualization**: Comprehensive views of your resource dependencies and health status.
+
+:::image type="content" source="media/sre-agent/sre-agent-knowldege-graph.png" alt-text="Screenshot of an SRE Agent knowledge graph.":::
 
 An SRE Agent works to proactively monitor and maintain your Azure services. Each day your agent creates daily resource reports which provide insights into the health and status of your applications. Reports include:
 
@@ -50,28 +52,17 @@ An SRE Agent works to proactively monitor and maintain your Azure services. Each
 
 | Scenario | Possible cause | Agent mitigation |
 |---|---|---|
-| Application down | ▪ **Application code issues**: Bugs or errors in the application code can lead to crashes or unresponsiveness.<br><br>▪ **Bad deployment**: Incorrect configurations or failed deployments can cause the application to go down.<br><br>▪ **High CPU/memory/thread issues**: Resource exhaustion due to high CPU, memory, or thread usage can affect application performance. | The SRE Agent can detect these issues and provide actionable insights or automated fixes. For example, it can identify high CPU usage and recommend scaling up the resources or suggest code optimizations. |
-| Virtual machine RDP issues | ▪ **Network configuration problems**: Incorrect network settings can prevent Remote Desktop Protocol (RDP) access to virtual machines.<br><br> ▪ **Firewall rules**: Misconfigured firewall rules can block RDP access.<br><br> ▪ **Resource health**: Virtual machine health issues can affect RDP connectivity. | The SRE Agent can monitor virtual machine health and network configurations, providing alerts and recommendations to resolve RDP issues. Agents can also automate the application of correct firewall rules to restore access. |
+| Application down | ▪ **Application code issues**: Bugs or errors in the application code can lead to crashes or unresponsiveness.<br><br>▪ **Bad deployment**: Incorrect configurations or failed deployments can cause the application to go down.<br><br>▪ **High CPU/memory/thread issues**: Resource exhaustion due to high CPU, memory, or thread usage can affect application performance. | The SRE Agent can detect these issues and provide actionable insights or fixes. For example, it can identify a decrease in web app availability that coincides with a recent slot swap and recommend swapping back slots as first step of mitigation. |
+| Virtual machine RDP issues | ▪ **NSG rules**: Misconfigured NSG rules on the NIC or Subnet can block RDP access. | The SRE Agent can monitor virtual machine health and network configurations, providing alerts and recommendations to resolve RDP issues. Agents can also automate the application of correct firewall rules to restore access. |
 | Container image pull failures | ▪ **Registry authentication issues**: Problems with authentication to the container registry can prevent image pulls.<br><br> ▪ **Network connectivity**: Network issues can disrupt the connection to the container registry.<br><br>▪ **Image availability**: The requested image might not be available or could be missing. | The SRE Agent can detect container image pull failures and provide detailed diagnostics. It can recommend solutions such as verifying registry credentials, checking network connectivity, or ensuring the image is available. |
 
 An agent can provide detailed information about different aspects of your apps and resources. The following examples demonstrate the types of questions you could pose to your agent:
 
-- Which resource group is my app part of?
-- Which regions do I have apps deployed in?
-- Show me all web apps using .NET 6 runtime.
-- Which apps have diagnostic logging turned on?
-- What plan am I running, and who else shares it?
-- Are there any staging slots configured for this app?
-- What services or resources is my web app connected to?
-- Do any apps in my subscription have ARR affinity enabled?
-- Which apps have health checks enabled and what are their probe paths?
-- Are any of my web apps still running on deprecated or unsupported runtime versions?
-
-## Security context
-
-An SRE Agent works on your behalf to evaluate and make changes to your Azure resources. Before you can create an agent, you need to make sure you're using an account that has the appropriate security context.
-
-The user account used to create an agent needs `Microsoft.Authorization/roleAssignments/write` permissions using either [Role Based Access Control Administrator](/azure/role-based-access-control/built-in-roles) or [User Access Administrator](/azure/role-based-access-control/built-in-roles).
+- What can you assist me with?
+- Why isn't my application working?
+- What services is my resource connected to?
+- Can you provide best practices for my resource?
+- What's the CPU and memory utilization of my app?