articles/app-service/tutorial-sre-agent.md
Lines changed: 46 additions & 37 deletions
@@ -4,11 +4,14 @@ description: Learn how to use SRE Agent and Azure App Service to identify and fi
4
4
author: msangapu-msft
5
5
ms.author: msangapu
6
6
ms.topic: tutorial
7
-
ms.custom: devx-track-azurecli
8
-
ms.date: 04/22/2025
7
+
ms.date: 05/15/2025
9
8
---
10
9
11
-
# Tutorial: Troubleshoot an App Service app using SRE Agent
10
+
# Troubleshoot an App Service app using SRE Agent (preview)
11
+
12
+
> [!NOTE]
13
+
> Site Reliability Engineering (SRE) Agent is in preview.
14
+
>
12
15
13
16
The Azure SRE (Site Reliability Engineering) Agent helps you manage and monitor Azure resources by using AI-enabled capabilities. Agents guide you in solving problems and help build resilient, self-healing systems on your behalf. The sample app includes code meant to exhaust memory and cause HTTP 500 errors, so you can diagnose and fix the problem using SRE Agent.
14
17
@@ -18,7 +21,7 @@ In this tutorial, you:
18
21
> * Create an App Service app using the Azure portal
19
22
> * Deploy a sample App Service app using the Azure portal
20
23
> * Enable App Service logs
21
-
> * Create an Azure SRE Agent to monitor the app
24
+
> * Create an Azure SRE Agent (preview) to monitor the app
22
25
> * Cause the app to produce an HTTP 500 error
23
26
> * Use AI-driven prompts to troubleshoot and fix errors
24
27
@@ -121,7 +124,7 @@ This step configures application logs required by the SRE Agent to diagnose and
121
124
122
125
1. Select **Save**.
123
126
124
-
### Verify the sample app
127
+
## 3. Verify the sample app
125
128
126
129
1. Select **Overview** in the left menu.
127
130
@@ -131,11 +134,11 @@ This step configures application logs required by the SRE Agent to diagnose and
131
134
132
135

133
136
134
-
1. Select the first three images and click `convert`. This converts successfully.
137
+
1. Select the first 3 images and click `convert`. This converts successfully.
135
138
136
139

137
140
138
-
## 3. Create a deployment slot
141
+
## 4. Create a deployment slot
139
142
140
143
1. In the left menu, find the *Deployment* section and select **Deployment slots**.
141
144
@@ -166,7 +169,7 @@ This step configures application logs required by the SRE Agent to diagnose and
166
169
167
170
1. Select **Save**.
168
171
169
-
## 4. Create an agent
172
+
## 5. Create an SRE agent
170
173
171
174
Next, create an agent to monitor the *my-aca-app-group* resource group.
172
175
@@ -194,7 +197,7 @@ Next, create an agent to monitor the *my-aca-app-group* resource group.
194
197
195
198
1. Select **Create**.
196
199
197
-
## 5. Chat with your agent
200
+
## 6. Chat with your agent
198
201
199
202
Your agent has access to any resource inside the resource groups associated with the agent. Use the chat feature to help you inquire about and resolve issues related to your resources.
200
203
@@ -214,7 +217,7 @@ Your agent has access to any resource inside the resource groups associated with
214
217
215
218
Now that you have an agent that sees your App Service app, you can create an opportunity for the agent to make a fix on your behalf.
216
219
217
-
## 6. Break the app
220
+
## 7. Break the app
218
221
219
222
1. In your App Service app page, find the *Deployment* section in the left menu and select **Deployment slots**.
220
223
@@ -239,6 +242,8 @@ Now that you have an agent that sees your App Service app, you can create an opp
239
242
240
243

241
244
245
+
1. Repeat the convert step several more times to log more HTTP 500s.
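
The repeat step above can also be scripted. The following is a minimal Python sketch, not part of the tutorial's sample app: it assumes the convert action can be triggered by a plain GET request to some URL, which the tutorial doesn't state. The `fetch_status` and `count_statuses` names and the placeholder URL are hypothetical.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable


def fetch_status(url: str, timeout: float = 30.0) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses; the code is on the exception.
        return err.code


def count_statuses(
    url: str, attempts: int, fetch: Callable[[str], int] = fetch_status
) -> Counter:
    """Request url `attempts` times and tally the status codes seen."""
    return Counter(fetch(url) for _ in range(attempts))


# Example (hypothetical URL; replace with an endpoint your app actually exposes):
# print(count_statuses("https://<your-app>.azurewebsites.net/<convert-path>", 10))
```

A tally with several `500` entries suggests the errors are being logged for the agent to find. Connection failures (for example, DNS errors) aren't caught here and raise `urllib.error.URLError`.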
246
+
242
247
## 7. Fix the app
243
248
244
249
1. Go to the Azure portal, search for and select **Azure SRE Agent**.
@@ -253,51 +258,55 @@ Now that you have an agent that sees your App Service app, you can create an opp
253
258
What's wrong with my-app-service-app?
254
259
```
255
260
256
-
From here, the agent should recognize that the tag name is invalid.
261
+
From here, the agent investigates any potential issues with the app. You will see a series of messages regarding the availability, CPU utilization, memory usage, and slot swap.
257
262
258
263
While every chat session is different, you should see a response that resembles the following response.
259
264
260
-
> The container app my-container-app appears to be in a healthy state ("Succeeded"), but I did notice a potential issue:
261
-
>
262
-
> - The container image specified is mcr.microsoft.com/k8se/quickstart:latest1. The ":latest1" tag is unusual and might be a typo (the standard tag is usually ":latest"). If this is not intentional, it could cause image pull failures or unexpected container behavior.
265
+
>I will now perform mitigation for my-sre-app by swapping the slots back to recover the application to a healthy state. Please note that swapping slots back may not always immediately restore health. I will keep you updated on the progress.
263
266
>
264
-
> Other than that, there are no critical errors or warning states reported for my-container-app.
265
-
>
266
-
> Would you like me to investigate further for hidden issues such as image pull errors, app crashes, or logs? If yes, please specify what symptoms you're experiencing (e.g., app not responding, errors in logs, etc.).
267
267
268
-
1. In the chat box, give your agent the following command and wait for a response.
268
+
After enough time has passed for the SRE Agent to detect the errors, it prompts you with the following message:
269
269
270
-
```text
271
-
Rollback to the last working version.
272
-
```
270
+
>Performing Slot Swap rollback to Restore Application Availability for my-sre-app
271
+
>[Approve] [Deny]
273
272
274
-
As your agent processes this command, it runs diagnostics and reports back status information. After the agent inspects the previous revisions, the agent's response will include with something that resembles the following message:
275
273
276
-
> You have requested to roll back your container app to the last known working image. This action will cause a brief restart of your app. Proceeding with the rollback now.
274
+
1. Select the **Approve** button to roll back the swap.
277
275
278
-
1. As the agent concludes the rollback analysis, it asks you for approval to execute the rollback operation.
276
+
After the agent completes the swap, its response should resemble the following message:
279
277
280
-
To approve the action, reply with the following prompt:
278
+
> The slot swap for my-sre-app has been completed successfully <timestamp>. The production slot has been restored. I will now continue with post-mitigation steps:
279
+
>>I will ask you for the correct GitHub repo URL to raise an issue for the swap-related downtime.
280
+
>>I will monitor the app and provide an availability update in 5 minutes.
281
+
> Please provide the GitHub repository URL where you want the issue to be raised.
281
282
282
-
```text
283
-
approved
284
-
```
283
+
## 8. Verify the fix
285
284
286
-
After the rollback is successful, you should see a response similar to:
285
+
1. To verify your App Service app is working properly, open the app's URL in a browser.
287
286
288
-
> Rollback complete! Your container app has been reverted to the last known working image: mcr.microsoft.com/k8se/quickstart:latest. Please monitor your app to ensure it starts successfully.
287
+
1. To convert images, click `Tools` and select `Convert to PNG`.
289
288
290
-
## 8. Verify repair
289
+

291
290
292
-
Now you can prompt your agent to return your app's fully qualified domain name (FQDN) so you can verify a successful deployment.
291
+
1. Select the first 5 images and click `convert`. Converting images should no longer produce HTTP 500 errors.
293
292
294
-
1. In the chat box, enter the following prompt.
293
+

295
294
296
-
```text
297
-
What is the FQDN for my-container-app?
298
-
```
295
+
## Clean up resources
296
+
297
+
If you're not going to continue to use this application, you can delete the App Service app and all the associated services by removing the resource groups created in this article.
298
+
299
+
Execute the following steps for both the *my-app-service-group* and *my-sre-agent-group* resource groups.
300
+
301
+
1. Go to the resource group in the Azure portal.
302
+
303
+
1. From the *Overview* section, select **Delete resource group**.
304
+
305
+
1. Enter the resource group name in the confirmation dialog.
306
+
307
+
1. Select **Delete**.
299
308
300
-
1. To verify your container app is working properly, open the FQDN in a web browser.
309
+
The process to delete the resource group can take a few minutes to complete.
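
As an alternative to the portal steps above, the same cleanup can be sketched in Python by shelling out to the Azure CLI. This sketch assumes the `az` CLI is installed and signed in. The `build_delete_command` and `delete_groups` helpers are hypothetical names introduced here for illustration. The resource group names are the ones used in this article.

```python
import subprocess
from typing import Callable, Iterable, List, Optional


def build_delete_command(group: str) -> List[str]:
    """Build the `az group delete` command for one resource group."""
    # --yes skips the confirmation prompt; --no-wait returns without
    # waiting for the deletion to finish.
    return ["az", "group", "delete", "--name", group, "--yes", "--no-wait"]


def delete_groups(
    groups: Iterable[str],
    run: Optional[Callable[[List[str]], object]] = None,
) -> List[List[str]]:
    """Run the delete command for each group and return the commands issued."""
    if run is None:
        # Default runner: invoke the Azure CLI and raise if it exits non-zero.
        run = lambda cmd: subprocess.run(cmd, check=True)
    commands = [build_delete_command(group) for group in groups]
    for command in commands:
        run(command)
    return commands


# Example (this deletes real resources, so run it deliberately):
# delete_groups(["my-app-service-group", "my-sre-agent-group"])
```

Passing a custom `run` callable makes it possible to preview the commands without executing them, for example `delete_groups(groups, run=print)`.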