Commit 5020f22

Merge pull request #299393 from craigshoemaker/sre/overview-updates
[SRE Agent] Overview updates
2 parents 20a2b5a + 320b1b4 commit 5020f22

2 files changed, +13 -22 lines changed


articles/app-service/sre-agent-overview.md

Lines changed: 13 additions & 22 deletions
@@ -5,13 +5,13 @@ services: container-apps
 author: craigshoemaker
 ms.service: azure-container-apps
 ms.topic: conceptual
-ms.date: 05/05/2025
+ms.date: 05/06/2025
 ms.author: cshoe
 ---
 
 # SRE Agent overview (preview)
 
-Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services.
+Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. An SRE Agent automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. Available as a chatbot, you can ask questions and give natural language commands to maintain your applications and services. To ensure accuracy and control, any agent action taken on your behalf requires your approval.
 
 Agents have access to every resource inside the resource groups associated to the agent. Therefore, agents:
 
@@ -32,9 +32,11 @@ The SRE Agent offers several key features that enhance the reliability and perfo
 
 - **Proactive monitoring**: Continuous resource monitoring with real-time alerts for potential issues and daily resource reports.
 
-- **Automated mitigation:** Automatic detection and mitigation of common issues, reducing downtime and improving resource health.
+- **Automated mitigation:** Automatic detection and mitigation of common issues, reducing downtime and improving resource health. While agents attempt to work on your behalf, all automation requires your approval.
 
-- **Resource visualization**: Comprehensive views of your resource dependencies and health status
+- **Resource visualization**: Comprehensive views of your resource dependencies and health status.
+
+    :::image type="content" source="media/sre-agent/sre-agent-knowldege-graph.png" alt-text="Screenshot of an SRE Agent knowledge graph.":::
 
 An SRE Agent works to proactively monitor and maintain your Azure services. Each day your agent creates daily resource reports which provide insights into the health and status of your applications. Reports include:
 
@@ -50,28 +52,17 @@ An SRE Agent works to proactively monitor and maintain your Azure services. Each
 
 | Scenario | Possible cause | Agent mitigation |
 |---|---|---|
-| Application down |**Application code issues**: Bugs or errors in the application code can lead to crashes or unresponsiveness.<br><br>▪ **Bad deployment**: Incorrect configurations or failed deployments can cause the application to go down.<br><br>▪ **High CPU/memory/thread issues**: Resource exhaustion due to high CPU, memory, or thread usage can affect application performance. | The SRE Agent can detect these issues and provide actionable insights or automated fixes. For example, it can identify high CPU usage and recommend scaling up the resources or suggest code optimizations. |
-| Virtual machine RDP issues |**Network configuration problems**: Incorrect network settings can prevent Remote Desktop Protocol (RDP) access to virtual machines.<br><br> ▪ **Firewall rules**: Misconfigured firewall rules can block RDP access.<br><br> ▪ **Resource health**: Virtual machine health issues can affect RDP connectivity. | The SRE Agent can monitor virtual machine health and network configurations, providing alerts and recommendations to resolve RDP issues. Agents can also automate the application of correct firewall rules to restore access. |
+| Application down |**Application code issues**: Bugs or errors in the application code can lead to crashes or unresponsiveness.<br><br>▪ **Bad deployment**: Incorrect configurations or failed deployments can cause the application to go down.<br><br>▪ **High CPU/memory/thread issues**: Resource exhaustion due to high CPU, memory, or thread usage can affect application performance. | The SRE Agent can detect these issues and provide actionable insights or fixes. For example, it can identify a decrease in web app availability that coincides with a recent slot swap and recommend swapping back slots as first step of mitigation. |
+| Virtual machine RDP issues |**NSG rules**: Misconfigured NSG rules on the NIC or Subnet can block RDP access. | The SRE Agent can monitor virtual machine health and network configurations, providing alerts and recommendations to resolve RDP issues. Agents can also automate the application of correct firewall rules to restore access. |
 | Container image pull failures |**Registry authentication issues**: Problems with authentication to the container registry can prevent image pulls.<br><br> ▪ **Network connectivity**: Network issues can disrupt the connection to the container registry.<br><br>▪ **Image availability**: The requested image might not be available or could be missing. | The SRE Agent can detect container image pull failures and provide detailed diagnostics. It can recommend solutions such as verifying registry credentials, checking network connectivity, or ensuring the image is available. |
 
 An agent can provide detailed information about different aspects of your apps and resources. The following examples demonstrate the types of questions you could pose to your agent:
 
-- Which resource group is my app part of?
-- Which regions do I have apps deployed in?
-- Show me all web apps using .NET 6 runtime.
-- Which apps have diagnostic logging turned on?
-- What plan am I running, and who else shares it?
-- Are there any staging slots configured for this app?
-- What services or resources is my web app connected to?
-- Do any apps in my subscription have ARR affinity enabled?
-- Which apps have health checks enabled and what are their probe paths?
-- Are any of my web apps still running on deprecated or unsupported runtime versions?
-
-## Security context
-
-An SRE Agent works on your behalf to evaluate and make changes to your Azure resources. Before you can create an agent, you need to make sure you're using an account that has the appropriate security context.
-
-The user account used to create an agent needs `Microsoft.Authorization/roleAssignments/write` permissions using either [Role Based Access Control Administrator](/azure/role-based-access-control/built-in-roles) or [User Access Administrator](/azure/role-based-access-control/built-in-roles).
+- What can you assist me with?
+- Why isn't my application working?
+- What services is my resource connected to?
+- Can you provide best practices for my resource?
+- What's the CPU and memory utilization of my app?
 
 ## Preview access
 