
Commit aa9f17d

updates
1 parent 55cb59f commit aa9f17d


1 file changed: 46 additions, 37 deletions


articles/app-service/tutorial-sre-agent.md

@@ -4,11 +4,14 @@ description: Learn how to use SRE Agent and Azure App Service to identify and fi
 author: msangapu-msft
 ms.author: msangapu
 ms.topic: tutorial
-ms.custom: devx-track-azurecli
-ms.date: 04/22/2025
+ms.date: 05/15/2025
 ---

-# Tutorial: Troubleshoot an App Service app using SRE Agent
+# Troubleshoot an App Service app using SRE Agent (preview)
+
+> [!NOTE]
+> Site Reliability Engineering (SRE) Agent is in preview.
+>

 The Azure SRE (Site Reliability Engineering) Agent helps you manage and monitor Azure resources by using AI-enabled capabilities. Agents guide you in solving problems and help you build resilient, self-healing systems on your behalf. The sample app includes code meant to exhaust memory and cause HTTP 500 errors, so you can diagnose and fix the problem using SRE Agent.

@@ -18,7 +21,7 @@ In this tutorial, you:
 > * Create an App Service app using the Azure portal
 > * Deploy a sample App Service app using the Azure portal
 > * Enable App Service logs
-> * Create an Azure SRE Agent to monitor the app
+> * Create an Azure SRE Agent (preview) to monitor the app
 > * Cause the app to produce an HTTP 500 error
 > * Use AI-driven prompts to troubleshoot and fix errors

@@ -121,7 +124,7 @@ This step configures application logs required by the SRE Agent to diagnose and

 1. Select **Save**.

-### Verify the sample app
+## 3. Verify the sample app

 1. Select **Overview** in the left menu.

@@ -131,11 +134,11 @@ This step configures application logs required by the SRE Agent to diagnose and

 ![Click `Tools` and select `Convert to PNG`](./media/tutorial-azure-monitor/sample-monitor-app-tools-menu.png)

-1. Select the first three images and click `convert`. This converts successfully.
+1. Select the first 3 images and click `convert`. This converts successfully.

 ![Select the first two images](./media/tutorial-azure-monitor/sample-monitor-app-convert-two-images.png)

-## 3. Create a deployment slot
+## 4. Create a deployment slot

 1. In the left menu, find the *Deployment* section and select **Deployment slots**.

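The tutorial checks the app by converting images in the browser. The conversion succeeds at this point, produces HTTP 500 errors after you break the app, and should stop producing them after the fix. If you prefer to script that check, here is a minimal Python sketch that uses only the standard library and counts how many requests come back with a 5xx status. It assumes the action you want to test is reachable with a plain GET request at a URL you supply. `count_server_errors` is a hypothetical helper, and the sample app's convert action might use a different HTTP method or path.

```python
import urllib.error
import urllib.request


def count_server_errors(url: str, attempts: int = 5, timeout: float = 10.0) -> int:
    """Send `attempts` GET requests to `url` and return how many got a 5xx status.

    Hypothetical helper; assumes a plain GET exercises the code path you care about.
    """
    errors = 0
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                status = response.status
        except urllib.error.HTTPError as err:
            # urlopen raises HTTPError for 4xx/5xx responses; the status code is on the exception.
            status = err.code
        if 500 <= status < 600:
            errors += 1
    return errors
```

For example, `count_server_errors(app_url, attempts=5)`, where `app_url` is your app's URL, should return 0 while the app is healthy and a nonzero count while the broken build serves production traffic. Connection failures, as opposed to HTTP error statuses, raise `urllib.error.URLError` and are not counted.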
@@ -166,7 +169,7 @@ This step configures application logs required by the SRE Agent to diagnose and

 1. Select **Save**.

-## 4. Create an agent
+## 5. Create an SRE agent

 Next, create an agent to monitor the *my-app-service-group* resource group.

@@ -194,7 +197,7 @@ Next, create an agent to monitor the *my-app-service-group* resource group.

 1. Select **Create**.

-## 5. Chat with your agent
+## 6. Chat with your agent

 Your agent has access to any resource inside the resource groups associated with the agent. Use the chat feature to help you inquire about and resolve issues related to your resources.

@@ -214,7 +217,7 @@ Your agent has access to any resource inside the resource groups associated with

 Now that you have an agent that sees your App Service app, you can create an opportunity for the agent to make a fix on your behalf.

-## 6. Break the app
+## 7. Break the app

 1. In your App Service app page, find the *Deployment* section in the left menu and select **Deployment slots**.

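The agent's mitigation in this tutorial is to swap the deployment slots back ("swapping the slots back to recover the application to a healthy state"). As a rough mental model only, the following hypothetical Python sketch (not the App Service API) treats a swap as exchanging what the production and staging slots serve, which is why a second swap restores the earlier production build. It assumes the "Break the app" step puts a broken build into production by swapping, and it ignores real-world details such as slot-specific settings and warm-up.

```python
from dataclasses import dataclass


@dataclass
class SlotPair:
    """Hypothetical model of two deployment slots and the build each one serves."""

    production: str
    staging: str

    def swap(self) -> None:
        # Exchange what the two slots serve.
        self.production, self.staging = self.staging, self.production


# Illustrative scenario (build names are made up):
slots = SlotPair(production="working-build", staging="broken-build")
slots.swap()  # Breaking the app: the broken build now serves production traffic.
print(slots.production)  # broken-build
slots.swap()  # Mitigation: swapping back restores the previous production build.
print(slots.production)  # working-build
```

Because a swap is its own inverse in this model, rolling back needs no knowledge of what went wrong, only that the previous production build was healthy.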
@@ -239,6 +242,8 @@ Now that you have an agent that sees your App Service app, you can create an opp

 ![The conversion results in an HTTP 500 error](./media/tutorial-azure-monitor/sample-monitor-app-http-500.png)

+1. Repeat the conversion several more times to generate more HTTP 500 errors in the logs.
+
 ## 8. Fix the app

 1. Go to the Azure portal, search for and select **Azure SRE Agent**.
@@ -253,51 +258,55 @@ Now that you have an agent that sees your App Service app, you can create an opp
 What's wrong with my-app-service-app?
 ```

-From here, the agent should recognize that the tag name is invalid.
+From here, the agent investigates potential issues with the app. You see a series of messages about availability, CPU utilization, memory usage, and the slot swap.

 While every chat session is different, you should see a response that resembles the following:

-> The container app my-container-app appears to be in a healthy state ("Succeeded"), but I did notice a potential issue:
->
-> - The container image specified is mcr.microsoft.com/k8se/quickstart:latest1. The ":latest1" tag is unusual and might be a typo (the standard tag is usually ":latest"). If this is not intentional, it could cause image pull failures or unexpected container behavior.
+> I will now perform mitigation for my-sre-app by swapping the slots back to recover the application to a healthy state. Please note that swapping slots back may not always immediately restore health. I will keep you updated on the progress.
 >
-> Other than that, there are no critical errors or warning states reported for my-container-app.
->
-> Would you like me to investigate further for hidden issues such as image pull errors, app crashes, or logs? If yes, please specify what symptoms you're experiencing (e.g., app not responding, errors in logs, etc.).

-1. In the chat box, give your agent the following command and wait for a response.
+After enough time passes for SRE Agent to detect the errors, the agent prompts you with the following message:

-```text
-Rollback to the last working version.
-```
+> Performing Slot Swap rollback to Restore Application Availability for my-sre-app
+> [Approve] [Deny]

-As your agent processes this command, it runs diagnostics and reports back status information. After the agent inspects the previous revisions, the agent's response will include with something that resembles the following message:

-> You have requested to roll back your container app to the last known working image. This action will cause a brief restart of your app. Proceeding with the rollback now.
+1. Select **Approve** to roll back the swap.

-1. As the agent concludes the rollback analysis, it asks you for approval to execute the rollback operation.
+After the swap completes, the agent's response includes a message that resembles the following:

-To approve the action, reply with the following prompt:
+> The slot swap for my-sre-app has been completed successfully \<timestamp>. The production slot has been restored. I will now continue with post-mitigation steps:
+>> I will ask you for the correct GitHub repo URL to raise an issue for the swap-related downtime.
+>> I will monitor the app and provide an availability update in 5 minutes.
+> Please provide the GitHub repository URL where you want the issue to be raised.

-```text
-approved
-```
+## 9. Verify the fix

-After the rollback is successful, you should see a response similar to:
+1. To verify your App Service app is working properly, open the app's URL in a browser.

-> Rollback complete! Your container app has been reverted to the last known working image: mcr.microsoft.com/k8se/quickstart:latest. Please monitor your app to ensure it starts successfully.
+1. To convert images, click `Tools` and select `Convert to PNG`.

-## 8. Verify repair
+![Click `Tools` and select `Convert to PNG`](./media/tutorial-azure-monitor/sample-monitor-app-tools-menu.png)

-Now you can prompt your agent to return your app's fully qualified domain name (FQDN) so you can verify a successful deployment.
+1. Select the first 5 images and click `convert`. Converting images should no longer produce HTTP 500 errors.

-1. In the chat box, enter the following prompt.
+![Select the first five images](./media/tutorial-azure-monitor/sample-monitor-app-working.png)

-```text
-What is the FQDN for my-container-app?
-```
+## Clean up resources
+
+If you're not going to continue to use this application, you can delete the App Service app and all the associated services by removing the resource groups created in this article.
+
+Complete the following steps for both the *my-app-service-group* and *my-sre-agent-group* resource groups.
+
+1. Go to the resource group in the Azure portal.
+
+1. From the *Overview* section, select **Delete resource group**.
+
+1. Enter the resource group name in the confirmation dialog.
+
+1. Select **Delete**.

-1. To verify your container app is working properly, open the FQDN in a web browser.
+The process to delete the resource group can take a few minutes to complete.

 ## Next steps
