|
| 1 | +--- |
| 2 | +title: 'Tutorial: Troubleshoot an App using Azure SRE Agent (preview) in Azure App Service' |
| 3 | +description: Learn how to use Azure SRE Agent and Azure App Service to identify and fix app issues with AI-assisted troubleshooting. |
| 4 | +author: msangapu-msft |
| 5 | +ms.author: msangapu |
| 6 | +ms.topic: tutorial |
| 7 | +ms.date: 05/18/2025 |
| 8 | +--- |
| 9 | + |
| 10 | +# Troubleshoot an App Service app using Azure SRE Agent (preview) |
| 11 | + |
| 12 | +> [!NOTE] |
| 13 | +> Azure SRE Agent is in preview. By using SRE Agent, you consent to the product-specific [Preview Terms of Use](https://azure.microsoft.com/support/legal/preview-supplemental-terms/). |
| 14 | +
|
| 15 | +Site Reliability Engineering (SRE) focuses on creating reliable, scalable systems through automation and proactive management. An SRE Agent brings these principles to your cloud environment by providing AI-powered monitoring, troubleshooting, and remediation capabilities. It automates routine operational tasks and provides reasoned insights to help you maintain application reliability while reducing manual intervention. The agent is available as a chatbot, so you can ask questions and give natural-language commands to maintain your applications and services. To ensure accuracy and control, any action the agent takes on your behalf requires your approval. |
| 16 | + |
| 17 | +The sample app in this tutorial demonstrates error detection by simulating HTTP 500 failures in a controlled way. You can safely test these scenarios using Azure App Service **deployment slots**, which let you run different app configurations side by side. |
| 18 | + |
| 19 | +You enable error simulation by setting the `INJECT_ERROR` app setting to `1`. When enabled, the app throws an HTTP 500 error after several button clicks, allowing you to see how the SRE Agent responds to application failures. |
| 20 | + |
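The behavior described above can be sketched in a few lines. This is a minimal illustration in Python, not the sample's actual .NET code: the `Counter` class and the threshold of six clicks are assumptions (the tutorial only says the error appears after several clicks and later has you select the button six times). It relies on the fact that App Service exposes app settings to the app as environment variables.

```python
import os

# Hypothetical threshold, chosen to match the six clicks used later in the tutorial.
FAILURE_THRESHOLD = 6


class Counter:
    """Tiny stand-in for the sample's counter page (not the sample's real code)."""

    def __init__(self, env=None):
        # App settings reach the app as environment variables.
        env = os.environ if env is None else env
        self.inject_error = env.get("INJECT_ERROR") == "1"
        self.count = 0

    def increment(self):
        """Return an (HTTP status, body) pair for one click on the button."""
        self.count += 1
        if self.inject_error and self.count >= FAILURE_THRESHOLD:
            return 500, "Simulated failure"
        return 200, str(self.count)
```

With `INJECT_ERROR` unset, every call returns status 200. With it set to `1`, the sixth call in this sketch returns 500.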
| 21 | +In this tutorial, you will: |
| 22 | + |
| 23 | +> [!div class="checklist"] |
| 24 | +> * Create an App Service app using the Azure portal. |
| 25 | +> * Deploy a sample app from GitHub. |
| 26 | +> * Configure the app with a startup command and enable logging. |
| 27 | +> * Create a deployment slot to simulate failure. |
| 28 | +> * Set up an Azure SRE Agent to monitor the app. |
| 29 | +> * Trigger a failure by swapping to the broken slot. |
| 30 | +> * Use AI-driven chat to diagnose and resolve the issue by rolling back the swap. |
| 31 | +
|
| 32 | +[!INCLUDE [quickstarts-free-trial-note](~/reusable-content/ce-skilling/azure/includes/quickstarts-free-trial-note.md)] |
| 33 | + |
| 34 | +## Prerequisites |
| 35 | + |
| 36 | +To complete this tutorial, you need: |
| 37 | +- An [Azure subscription](https://azure.microsoft.com/free/). |
| 38 | +- `Microsoft.Authorization/roleAssignments/write` permissions to create role assignments (Role Based Access Control Administrator or User Access Administrator) for SRE Agent setup. |
| 39 | + |
| 40 | +## 1. Create an App Service app |
| 41 | + |
| 42 | +Start by creating a web app that the SRE Agent can monitor. |
| 43 | + |
| 44 | +1. Sign in to the [Azure portal](https://portal.azure.com). |
| 45 | + |
| 46 | +1. In the top search bar, search for **App Services**, then select it from the results. |
| 47 | + |
| 48 | +1. Select **+ Create** and choose **Web App**. |
| 49 | + |
| 50 | +### Configure the Basics tab |
| 51 | + |
| 52 | +In the *Basics* tab, provide the following details: |
| 53 | + |
| 54 | +**Project details** |
| 55 | + |
| 56 | +| Setting | Value | |
| 57 | +|-----------------|--------------------------------| |
| 58 | +| Subscription | Your Azure subscription | |
| 59 | +| Resource group | **Create new** → `my-app-service-group` | |
| 60 | + |
| 61 | +**Instance details** |
| 62 | + |
| 63 | +| Setting | Value | |
| 64 | +|-----------------|--------------------------------| |
| 65 | +| Name | `my-sre-app` | |
| 66 | +| Publish | **Code** | |
| 67 | +| Runtime stack | **.NET 9 (STS)** | |
| 68 | +| Operating System| **Windows** | |
| 69 | +| Region | A region near you | |
| 70 | + |
| 71 | + |
| 72 | +1. Select the **Deployment** tab. |
| 73 | + |
| 74 | +1. Under *Authentication settings*, enable **Basic authentication**. |
| 75 | + |
| 76 | + > [!NOTE] |
| 77 | + > Basic authentication is used later for a one-time deployment from GitHub. [Disable Basic Auth](configure-basic-auth-disable.md?tabs=portal) in production. |
| 78 | + > |
| 79 | +
|
| 80 | +1. Select **Review and create**, then **Create** when validation passes. |
| 81 | + |
| 82 | +1. Once deployment completes, you see *Your deployment is complete*. |
| 83 | + |
| 84 | +## 2. Deploy the sample app |
| 85 | + |
| 86 | +Now that your App Service app is created, deploy the sample application from GitHub. |
| 87 | + |
| 88 | +1. In the Azure portal, navigate to your newly created App Service by selecting **Go to resource**. |
| 89 | + |
| 90 | +1. In the left-hand menu, under the *Deployment* section, select **Deployment Center**. |
| 91 | + |
| 92 | +1. In the *Settings* tab, configure: |
| 93 | + |
| 94 | +| Property | Value | |
| 95 | +|------------|--------------------------------------------------------------| |
| 96 | +| Source | **External Git** | |
| 97 | +| Repository | `https://github.com/Azure-Samples/app-service-dotnet-agent-tutorial`| |
| 98 | +| Branch | `main` | |
| 99 | + |
| 100 | +1. Select **Save** to apply the deployment settings. |
| 101 | + |
| 102 | +## 3. Verify the sample app |
| 103 | + |
| 104 | +After deployment, confirm that the sample app is running as expected. |
| 105 | + |
| 106 | +1. In the left menu of your App Service, select **Overview**. |
| 107 | + |
| 108 | +1. Select **Browse** to open the app in a new browser tab. (It might take a minute to load.) |
| 109 | + |
| 110 | +1. The app displays a large counter and two buttons: |
| 111 | + |
| 112 | + :::image type="content" source="media/tutorial-sre-agent/verify-sample-primary-slot.png" alt-text="Screenshot of the .NET sample in the primary slot." border="false"::: |
| 113 | + |
| 114 | +1. Select the **Increment** button several times to observe the counter increase. |
| 115 | + |
| 116 | +## 4. Set up a deployment slot for failure simulation |
| 117 | + |
| 118 | +To simulate an app failure scenario, add a secondary deployment slot. |
| 119 | + |
| 120 | +1. In the left menu of your App Service, under the *Deployment* section, select **Deployment slots**. |
| 121 | + |
| 122 | +1. Select **Add slot**. |
| 123 | + |
| 124 | +1. Enter the following values: |
| 125 | + |
| 126 | + | Property | Value | Remarks | |
| 127 | + |---------------------|--------------|------------------------------------------------------------------------------------------| |
| 128 | + | Name | `broken` | The error scenario is triggered in this slot. | |
| 129 | + | Clone settings from | `my-sre-app` | Copies configuration from the main app. | |
| 130 | + |
| 131 | +1. Scroll to the bottom of the dialog window and select **Add**. Slot creation might take a minute to complete. |
| 132 | + |
| 133 | +### Deploy the sample app to the slot |
| 134 | + |
| 135 | +1. Once the slot is created, select the **broken** slot from the list. |
| 136 | + |
| 137 | +1. In the left menu, under the *Deployment* section, select **Deployment Center**. |
| 138 | + |
| 139 | +1. In the *Settings* tab, configure: |
| 140 | + |
| 141 | + | Property | Value | |
| 142 | + |------------|---------------------------------------------------------------| |
| 143 | + | Source | **External Git** | |
| 144 | + | Repository | `https://github.com/Azure-Samples/app-service-dotnet-agent-tutorial` | |
| 145 | + | Branch | `main` | |
| 146 | + |
| 147 | +1. Select **Save** to apply the deployment settings. |
| 148 | + |
| 149 | +### Add an app setting to enable error simulation |
| 150 | + |
| 151 | +To control error simulation, configure an app setting that the app checks at runtime. |
| 152 | + |
| 153 | +1. In the left menu of your App Service, select **Environment variables** under the *Settings* section. |
| 154 | + |
| 155 | +1. At the top, make sure you have the correct slot selected (for example, **broken**). |
| 156 | + |
| 157 | +1. Under the **App settings** tab, select **+ Add**. |
| 158 | + |
| 159 | +1. Enter the following values: |
| 160 | + |
| 161 | + | Property | Value | Remarks | |
| 162 | + |------------|---------------|--------------------------------------------------------------| |
| 163 | + | Name | `INJECT_ERROR`| Must be exactly `INJECT_ERROR` (all caps, no spaces). | |
| 164 | + | Value | `1` | Enables error simulation in the app. | |
| 165 | + |
| 166 | +1. Make sure the **Deployment slot setting** box is **not** checked. |
| 167 | + |
| 168 | +1. Select **Apply** to add the setting. |
| 169 | + |
| 170 | +1. At the bottom of the *Environment variables* page, select **Apply** to apply the changes. |
| 171 | + |
| 172 | +1. When prompted, select **Confirm** to confirm and restart the app in the selected slot. |
| 173 | + |
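Leaving **Deployment slot setting** unchecked matters because only unchecked settings travel with the app during a swap, while checked (slot-specific) settings stay with their slot. The following Python sketch models that rule. It is a simplified illustration, not how App Service is implemented, and the `Slot` class, `swap`, and `effective_settings` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    content: str
    # Settings that travel with the app on a swap (box unchecked).
    swappable: dict = field(default_factory=dict)
    # Settings that stay with the slot ("Deployment slot setting" checked).
    sticky: dict = field(default_factory=dict)


def swap(source: Slot, target: Slot) -> None:
    """Exchange content and swappable settings; sticky settings stay put."""
    source.content, target.content = target.content, source.content
    source.swappable, target.swappable = target.swappable, source.swappable


def effective_settings(slot: Slot) -> dict:
    """Settings the app in this slot sees at runtime."""
    return {**slot.swappable, **slot.sticky}


production = Slot("production", "sample app")
broken = Slot("broken", "sample app", swappable={"INJECT_ERROR": "1"})
swap(broken, production)
print(effective_settings(production))  # {'INJECT_ERROR': '1'}
```

In this model, had `INJECT_ERROR` been stored under `sticky` instead, the production slot would see no such setting after the swap, and the failure would never reach production.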
| 174 | +## 5. Create an Azure SRE Agent |
| 175 | + |
| 176 | +Now, create an Azure SRE Agent to monitor your App Service app. |
| 177 | + |
| 178 | +1. In the Azure portal, search for and select **Azure SRE Agent**. |
| 179 | + |
| 180 | +1. Select **+ Create**. |
| 181 | + |
| 182 | +1. In the *Create agent* window, enter the following values: |
| 183 | + |
| 184 | + | Property | Value | Remarks | |
| 185 | + |------------------|---------------------------|-------------------------------------------------------------------------| |
| 186 | + | Subscription | Your Azure subscription | | |
| 187 | + | Resource group | `my-sre-agent-group` | New group for the Azure SRE Agent | |
| 188 | + | Name | `my-sre-agent`| | |
| 189 | + | Region | **Sweden Central** | Required during preview; can monitor resources in any Azure region | |
| 190 | + | Choose role | **Contributor** | Grants the agent permission to take action on your behalf | |
| 191 | + |
| 192 | +1. Select **Select resource groups**. |
| 193 | + |
| 194 | +1. In the *Selected resource groups to monitor* window, search for and select `my-app-service-group`. |
| 195 | + |
| 196 | +1. Select **Save**. |
| 197 | + |
| 198 | +1. Back in the *Create agent* window, select **Create**. The agent creation process takes a few minutes to complete. |
| 199 | + |
| 200 | +## 6. Chat with your agent |
| 201 | + |
| 202 | +Once your SRE Agent is deployed and connected to your resource group, you can interact with it using natural language to monitor and troubleshoot your app. |
| 203 | + |
| 204 | +1. In the Azure portal, search for and select **Azure SRE Agent**. |
| 205 | + |
| 206 | +1. From the list of agents, select **my-sre-agent**. |
| 207 | + |
| 208 | +1. Select **Chat with agent**. |
| 209 | + |
| 210 | +1. In the chat box, enter the following command: |
| 211 | + |
| 212 | + ```text |
| 213 | + List my App Service apps |
| 214 | + ``` |
| 215 | +
|
| 216 | +1. The agent responds with a list of App Service apps deployed in the `my-app-service-group` resource group. |
| 217 | +
|
| 218 | +Now that the agent can see your app, you’re ready to simulate a failure and let the agent help you resolve it. |
| 219 | +
|
| 220 | +## 7. Break the app |
| 221 | +
|
| 222 | +Now simulate a failure scenario by swapping to the broken deployment slot. |
| 223 | +
|
| 224 | +1. In your App Service, go to the *Deployment* section in the left-hand menu and select **Deployment slots**. |
| 225 | +
|
| 226 | +1. Select **Swap**. |
| 227 | +
|
| 228 | +1. In the *Swap* dialog, configure: |
| 229 | + |
| 230 | + | Property | Value | Remarks | |
| 231 | + |----------|---------------------|----------------------------------| |
| 232 | + | Source | `my-sre-app-broken` | The slot with the faulty version | |
| 233 | + | Target | `my-sre-app` | The production slot | |
| 234 | +
|
| 235 | +1. Scroll to the bottom and select **Start Swap**. The swap operation might take a minute to complete. |
| 236 | +
|
| 237 | +1. Once the swap is complete, browse to the app’s URL. |
| 238 | +
|
| 239 | + :::image type="content" source="media/tutorial-sre-agent/verify-sample-broken-slot.png" alt-text="Screenshot of the .NET sample in the broken slot." border="false"::: |
| 240 | +
|
| 241 | +1. Select the **Increment** button six times. |
| 242 | +
|
| 243 | +1. You should see the app fail and return an HTTP 500 error. |
| 244 | +
|
| 245 | +1. Refresh the page (by pressing Command-R or F5) several times to generate additional HTTP 500 errors, which help the SRE Agent detect and diagnose the issue. |
| 246 | +
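If you prefer to script the repeated refreshes, the following Python sketch counts how many requests return a server error. It assumes that the failing page returns HTTP 500 on a plain GET request, as the refresh step suggests. `fetch_status` and `count_server_errors` are hypothetical helper names, and you must supply your own app URL.

```python
import urllib.error
import urllib.request


def fetch_status(url: str) -> int:
    """Return the HTTP status code for a GET request to url."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx and 5xx responses;
        # the status code is available on the exception.
        return err.code


def count_server_errors(url: str, attempts: int, fetch=fetch_status) -> int:
    """Request url `attempts` times and count the 5xx responses."""
    return sum(1 for _ in range(attempts) if 500 <= fetch(url) <= 599)
```

For example, calling `count_server_errors(app_url, 10)` with your app's URL sends ten requests and returns the number that failed with a 5xx status. The `fetch` parameter exists so that the counting logic can be exercised without network access.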
|
| 247 | +## 8. Fix the app |
| 248 | +
|
| 249 | +Now that the app is experiencing failures, use the SRE Agent to diagnose and resolve the issue. |
| 250 | +
|
| 251 | +1. In the Azure portal, search for and select **Azure SRE Agent**. |
| 252 | +
|
| 253 | +1. From the list of agents, select **my-sre-agent**.
| 254 | +
|
| 255 | +1. Select **Chat with agent**. |
| 256 | +
|
| 257 | +1. In the chat box, enter the following command: |
| 258 | +
|
| 259 | + ```text |
| 260 | + What's wrong with my-sre-app? |
| 261 | + ``` |
| 262 | +
|
| 263 | +1. The agent begins to analyze the app’s health. You should see diagnostic messages related to availability, CPU and memory usage, and the recent slot swap. |
| 264 | +
|
| 265 | + > Each session may vary, but you should see a message similar to: |
| 266 | + > |
| 267 | + > *“I will now perform mitigation for my-sre-app by swapping the slots back to recover the application to a healthy state. Please note that swapping slots back may not always immediately restore health. I will keep you updated on the progress.”* |
| 268 | +
|
| 269 | +1. After a pause, the agent prompts you to approve the rollback: |
| 270 | +
|
| 271 | + > *Performing Slot Swap rollback to Restore Application Availability for my-sre-app* |
| 272 | + > |
| 273 | + > **[Approve]** **[Deny]** |
| 274 | +
|
| 275 | +1. Select **Approve** to initiate the rollback. |
| 276 | +
|
| 277 | +1. Once the rollback is complete, the agent confirms: |
| 278 | +
|
| 279 | + > *The slot swap for my-sre-app has been completed successfully (timestamp). The production slot has been restored. I will now continue with post-mitigation steps:* |
| 280 | + > |
| 281 | + > *I will ask you for the correct GitHub repo URL to raise an issue for the swap-related downtime.* |
| 282 | + > *I will monitor the app and provide an availability update in 5 minutes.* |
| 283 | + > |
| 284 | + > *Please provide the GitHub repository URL where you want the issue to be raised.* |
| 285 | +
|
| 286 | +## 9. Verify the fix |
| 287 | +
|
| 288 | +After the SRE Agent rolls back the slot swap, confirm that your app is functioning correctly. |
| 289 | +
|
| 290 | +1. Open your App Service app in a browser by selecting **Browse** from the **Overview** page. |
| 291 | +
|
| 292 | +1. Notice that the text "ERROR INJECTION ENABLED" no longer appears, confirming the app has reverted to its original state. |
| 293 | +
|
| 294 | +1. Select the **Increment** button six times to ensure no errors take place. |
| 295 | +
|
| 296 | +## Clean up resources |
| 297 | +
|
| 298 | +If you no longer need the app and agent created in this tutorial, you can delete the associated resource groups to avoid incurring charges. |
| 299 | +
|
| 300 | +Repeat the following steps for both of these resource groups: |
| 301 | +
|
| 302 | +- `my-app-service-group` (App Service resource group) |
| 303 | +- `my-sre-agent-group` (Azure SRE Agent resource group) |
| 304 | +
|
| 305 | +1. In the Azure portal, navigate to **Resource groups**. |
| 306 | +
|
| 307 | +1. Select the resource group you want to delete. |
| 308 | +
|
| 309 | +1. From the *Overview* tab, select **Delete resource group**. |
| 310 | +
|
| 311 | +1. In the confirmation dialog, enter the name of the resource group. |
| 312 | +
|
| 313 | +1. Select **Delete**. Deletion takes a few minutes to complete. |
| 314 | +
|
| 315 | +## Next steps |
| 316 | +
|
| 317 | +* [Overview of Azure App Service](overview.md) |
| 318 | +* [Use Azure Developer CLI for modern app development](/azure/developer/azure-developer-cli/overview) |