| 1 | +# MSAL & MSI IMDS Error Handling and Retry Strategy Specification |
| 2 | + |
| 3 | +## Overview |
| 4 | +This document defines the error handling and retry strategy for MSAL when interacting with the IMDS (Instance Metadata Service) endpoint for Managed Identity (MSI) token acquisition. |
| 5 | + |
| 6 | +--- |
| 7 | + |
| 8 | +## 1️⃣ HTTP Status Codes & Recommended Actions |
| 9 | + |
| 10 | +| **HTTP Status Code** | **Error Reason** | **Recommended Action** | **Retry Delay Strategy** | |
| 11 | +|----------------------|-----------------------------------------------|---------------------------------------------|-----------------------------------------| |
| 12 | +| **400** | Bad Request (Invalid Parameters) | **Do not retry**, fix request | **No retry** | |
| 13 | +| **401** | Unauthorized | **Do not retry**, check authentication setup | **No retry** | |
| 14 | +| **403** | Forbidden | **Do not retry**, verify permissions | **No retry** | |
| 15 | +| **404** | IMDS endpoint is updating / Identity Not Found | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** | |
| 16 | +| **408** | Request Timeout | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** | |
| 17 | +| **410** | IMDS is undergoing updates | Retry every 10 seconds (max 70s / 7 attempts). Log each retry. | **10s → 10s → … (up to 7 attempts)** | |
| 18 | +| **429** | IMDS Throttle limit reached | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** | |
| 19 | +| **504** | Gateway Timeout | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** | |
| 20 | +| **5xx** (other) | Other transient service error | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |
| 21 | + |
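The table above can be read as a lookup from status code to retry policy. The following is a minimal Python sketch of that lookup, not MSAL's actual implementation: the names `RetryPolicy`, `NO_RETRY`, `EXPONENTIAL`, `LINEAR_410`, and `policy_for_status` are hypothetical, and treating status codes not listed in the table as non-retryable is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetryPolicy:
    """Delays (in seconds) to wait before each retry; empty means no retry."""
    delays: Tuple[float, ...]


# Hypothetical policy constants mirroring the table above.
NO_RETRY = RetryPolicy(delays=())
EXPONENTIAL = RetryPolicy(delays=(1.0, 2.0, 4.0))   # max 3 retries, capped at 4s
LINEAR_410 = RetryPolicy(delays=(10.0,) * 7)        # 7 attempts, 70s in total


def policy_for_status(status_code: int) -> RetryPolicy:
    """Map an IMDS HTTP status code to the retry policy from the table."""
    if status_code == 410:
        return LINEAR_410
    if status_code in (404, 408, 429) or 500 <= status_code <= 599:
        return EXPONENTIAL
    # 400, 401, 403 per the table; any unlisted code is ASSUMED non-retryable.
    return NO_RETRY
```

Keeping the delays as data rather than computing them inline makes the two patterns (exponential and fixed 10-second) share one retry loop.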
| 22 | +--- |
| 23 | + |
| 24 | +## 2️⃣ Identity Propagation & Special Handling for "Identity Not Found" Errors |
| 25 | +- **Scenario:** When an identity is newly assigned to a VM, IMDS may take some time to recognize it, so early token requests can fail even though the assignment is valid.
| 26 | +- **Exception Handling:** |
| 27 | + - If the **IMDS response contains "Identity Not Found"**, retry the request using **exponential backoff**. |
| 28 | + - **HTTP Status Code:** **404 (Identity Not Found)**
| 29 | + - Recommended retry sequence: **1s → 2s → 4s** (max 3 retries)
| 30 | + - If the request still fails after the third retry, log an error and return the failure to the caller.
| 31 | + |
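The "Identity Not Found" handling described above could look roughly like the Python sketch below. It assumes the caller supplies a hypothetical `send` callable that performs one IMDS request and returns `(status, body)`. Detecting the condition by a case-insensitive substring match on the body is also an assumption, since this document does not specify the exact response format.

```python
import logging
import time
from typing import Callable, Tuple

logger = logging.getLogger("imds_identity_retry")

Response = Tuple[int, str]  # (HTTP status code, response body)


def is_identity_not_found(status: int, body: str) -> bool:
    """ASSUMPTION: the condition is a 404 whose body mentions the phrase."""
    return status == 404 and "identity not found" in body.lower()


def acquire_with_identity_retry(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
    delays: Tuple[float, ...] = (1.0, 2.0, 4.0),
) -> Response:
    """Retry 'Identity Not Found' with the 1s, 2s, 4s sequence (max 3 retries)."""
    status, body = send()
    for delay in delays:
        if not is_identity_not_found(status, body):
            break
        sleep(delay)
        status, body = send()
    if is_identity_not_found(status, body):
        # Retries exhausted: log an error and return the failure to the caller.
        logger.error("Identity still not found after %d retries", len(delays))
    return status, body
```

Passing `sleep` as a parameter lets tests substitute a recording function instead of actually waiting.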
| 32 | +--- |
| 33 | + |
| 34 | +## 3️⃣ Summary of the Updated Retry Strategy
| 35 | +The table below summarizes the retry pattern for each scenario:
| 36 | + |
| 37 | +| **Scenario** | **Attempts** | **Delay Pattern** | |
| 38 | +|-----------------------------------------------------------|-----------------|-------------------------------------| |
| 39 | +| **404 (Identity Not Found), 408/504 (Timeout), 429, 5xx** | Up to **3** | **Exponential Backoff**: 1s → 2s → 4s | |
| 40 | +| **410 (IMDS Updates)** | Up to **7** | **Every 10 seconds** (max 70s total) | |
| 41 | + |
| 42 | +### Key Points |
| 43 | +- **Exponential Backoff** applies to: |
| 44 | + - 404 (*Identity Not Found*) |
| 45 | + - 408/504 (*Timeouts*) |
| 46 | + - 429 (*Throttling*) |
| 47 | + - 5xx errors |
| 48 | + - For each of the cases above, retry **up to 3 times** with delays of **1s → 2s → 4s**.
| 49 | +- **410 (IMDS Updates)** |
| 50 | + - Retry **every 10 seconds** for up to **7 attempts** (70s total). |
| 51 | +- **Log a statement on each retry** (for both exponential backoff and 410) indicating the attempt number, the reason for retry, and the total time waited so far. |
| 52 | + |
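The logging requirement in the last key point can be illustrated with a small retry loop. This is a sketch under stated assumptions rather than the MSAL implementation: `run_with_retries`, `send`, and `should_retry` are hypothetical names, `send` is assumed to return only the HTTP status code, and the log message format is illustrative.

```python
import logging
import time
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("imds_retry")


def run_with_retries(
    send: Callable[[], int],
    delays: Sequence[float],
    should_retry: Callable[[int], bool],
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, float]:
    """Call send(), retrying per `delays`; return (last status, total seconds waited)."""
    status = send()
    waited = 0.0
    for attempt, delay in enumerate(delays, start=1):
        if not should_retry(status):
            break
        # One log statement per retry: attempt number, reason, time waited so far.
        logger.info(
            "IMDS retry %d/%d: HTTP %d; waited %.0fs so far, next delay %.0fs",
            attempt, len(delays), status, waited, delay,
        )
        sleep(delay)
        waited += delay
        status = send()
    return status, waited
```

For the 410 case this would be called with `delays=(10.0,) * 7` and a `should_retry` that returns `True` only for 410. The exponential cases would use `delays=(1.0, 2.0, 4.0)`.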
| 53 | +--- |
| 54 | + |
| 55 | +```mermaid |
| 56 | +graph TD; |
| 57 | + |
| 58 | + A[IMDS Request] -->|Success| B[✅ Token Issued] |
| 59 | + A -->|Other 4xx Error?| C{404 Identity Not Found?}
| 60 | + C -- Yes --> D[🔄 Retry: 1s → 2s → 4s]
| 61 | + C -- No --> E[❌ Do Not Retry]
| 62 | + A -->|408 or 5xx Error?| F[🔄 Retry: 1s → 2s → 4s]
| 63 | + A -->|410 IMDS Updating?| H[🔄 Retry: 10s up to 7 attempts] |
| 64 | + A -->|429 Throttling?| G[🔄 Retry: 1s → 2s → 4s] |
| 65 | +``` |
| 66 | +--- |
| 67 | + |
| 68 | +**References:** |
| 69 | + |
| 70 | +1. https://learn.microsoft.com/en-gb/entra/identity/managed-identities-azure-resources/how-to-use-vm-token#error-handling |
| 71 | +2. https://eng.ms/docs/cloud-ai-platform/azure-core/core-compute-and-host/general-purpose-host-arunki/azure-instance-metadata-service/compute-azlinux-metadataserver/troubleshooting/unable-to-reach-imds#mitigate-http-status-code-410 |
| 72 | + |