Commit 5fcc057

IMDS Retry specification (#5176)
* Create imds_retry_based_on_errors.md
* Fix formatting in references section
* Update retry strategies and error handling in docs
* Reorder and update HTTP status codes table
* Update retry strategy for IMDS errors
* Update retry logic for 429 Throttling
* Update retry strategy for 410 IMDS errors
* Update retry strategy for 410 error
* Update retry strategy for 410 errors
* Update retry strategy documentation with summary table
* Remove implementation notes from IMDS retry documentation
* Fix retry pattern order and formatting in docs
1 parent f112da9 commit 5fcc057

1 file changed

+72
-0
lines changed

docs/imds_retry_based_on_errors.md

Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
# MSAL & MSI IMDS Error Handling and Retry Strategy Specification

## Overview
This document defines the error handling and retry strategy for MSAL when interacting with the IMDS (Instance Metadata Service) endpoint for Managed Identity (MSI) token acquisition.

---

## 1️⃣ HTTP Status Codes & Recommended Actions

| **HTTP Status Code** | **Error Reason** | **Recommended Action** | **Retry Delay Strategy** |
|----------------------|------------------|------------------------|--------------------------|
| **400** | Bad Request (Invalid Parameters) | **Do not retry**, fix request | **No retry** |
| **401** | Unauthorized | **Do not retry**, check authentication setup | **No retry** |
| **403** | Forbidden | **Do not retry**, verify permissions | **No retry** |
| **404** | IMDS endpoint is updating / Identity Not Found | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |
| **408** | Request Timeout | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |
| **410** | IMDS is undergoing updates | Retry every 10 seconds (max 70s / 7 attempts). Log each retry. | **10s → 10s → … (up to 7 attempts)** |
| **429** | IMDS Throttle limit reached | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |
| **504** | Gateway Timeout | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |
| **5xx** | Transient service error | Retry with Exponential Backoff (max 3 retries) | **1s → 2s → 4s (max 4s)** |

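The table can be read as a lookup from status code to a delay schedule. The Python sketch below shows one way to express that mapping. `RetryPolicy`, `policy_for_status`, and the constant names are hypothetical and not part of MSAL, and treating every status code that the table does not list as non-retryable is an assumption.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class RetryPolicy:
    """Delays in seconds to wait before each retry; empty means do not retry."""
    delays: Tuple[float, ...]


NO_RETRY = RetryPolicy(delays=())
EXPONENTIAL = RetryPolicy(delays=(1.0, 2.0, 4.0))  # max 3 retries
EVERY_10S = RetryPolicy(delays=(10.0,) * 7)        # 7 attempts, 70s total


def policy_for_status(status_code: int) -> RetryPolicy:
    """Map an IMDS HTTP status code to the schedule given in the table."""
    if status_code == 410:
        return EVERY_10S
    if status_code in (404, 408, 429) or 500 <= status_code <= 599:
        return EXPONENTIAL
    # 400, 401, 403 per the table; other unlisted codes by assumption.
    return NO_RETRY
```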
---

## 2️⃣ Identity Propagation & Special Handling for "Identity Not Found" Errors
- **Scenario:** When an identity is newly assigned to a VM, IMDS may take some time to recognize it.
- **Exception Handling:**
  - If the **IMDS response contains "Identity Not Found"**, retry the request using **exponential backoff**.
  - **Error Code:** **404 (Identity Not Found)**
  - Recommended retry sequence: **1s → 2s → 4s** (max 3 retries)
  - If the request still fails after the last retry, log an error and return the failure.

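Recognizing this case could look like the minimal Python sketch below. The helper name `is_identity_not_found` is hypothetical, and the case-insensitive substring match on the raw response body is an assumption, since this specification does not define the exact body format. The sample body is illustrative only.

```python
def is_identity_not_found(status_code: int, body: str) -> bool:
    """Return True for a 404 whose body mentions "Identity Not Found"."""
    # Assumption: a case-insensitive substring match is sufficient.
    return status_code == 404 and "identity not found" in body.lower()


# Illustrative response body, not a guaranteed IMDS format.
sample = '{"error": "invalid_request", "error_description": "Identity not found"}'
print(is_identity_not_found(404, sample))  # True
print(is_identity_not_found(400, sample))  # False
```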
---

## 3️⃣ Summary of the updated retry strategy
Below is a summary table showing the retry patterns for each scenario:

| **Scenario** | **Attempts** | **Delay Pattern** |
|--------------|--------------|-------------------|
| **404 (Identity Not Found), 408/504 (Timeout), 429, 5xx** | Up to **3** | **Exponential Backoff**: 1s → 2s → 4s |
| **410 (IMDS Updates)** | Up to **7** | **Every 10 seconds** (max 70s total) |

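Both delay patterns, and the total wait each one implies, can be generated rather than hard-coded. The sketch below is a hedged illustration: the function names and parameters are hypothetical, and reading "(max 4s)" as a cap on each individual delay is an assumption.

```python
from itertools import accumulate
from typing import List


def exponential_delays(retries: int = 3, base: float = 1.0, cap: float = 4.0) -> List[float]:
    """Double the delay on each retry, never exceeding the cap."""
    return [min(base * 2 ** i, cap) for i in range(retries)]


def fixed_delays(attempts: int = 7, interval: float = 10.0) -> List[float]:
    """Wait the same interval before every attempt (the 410 pattern)."""
    return [interval] * attempts


print(exponential_delays())                    # [1.0, 2.0, 4.0]
print(list(accumulate(exponential_delays())))  # [1.0, 3.0, 7.0]
print(list(accumulate(fixed_delays()))[-1])    # 70.0
```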
### Key Points
- **Exponential Backoff** applies to:
  - 404 (*Identity Not Found*)
  - 408/504 (*Timeouts*)
  - 429 (*Throttling*)
  - 5xx errors
  - Retries occur **up to 3 times** with delays of **1s → 2s → 4s**.
- **410 (IMDS Updates)**
  - Retry **every 10 seconds** for up to **7 attempts** (70s total).
- **Log a statement on each retry** (for both exponential backoff and 410) indicating the attempt number, the reason for retry, and the total time waited so far.

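The key points can be sketched as a single retry loop that logs the attempt number, the reason, and the total time waited. This is an illustration under stated assumptions, not the MSAL implementation: `send_with_retry`, its parameters, and the logger name are hypothetical, HTTP 200 is assumed to be the only success status, and the retry counter is shared across status codes because the specification does not say whether it resets when the status changes. The demo uses a fake sender and a no-op sleep so it runs instantly.

```python
import logging
import time
from typing import Callable, Sequence, Tuple

logger = logging.getLogger("imds_retry")  # hypothetical logger name

Response = Tuple[int, str]  # (HTTP status code, response body)


def send_with_retry(
    send: Callable[[], Response],
    delays_for: Callable[[int], Sequence[float]],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry using the delay schedule for each failing status."""
    status, body = send()
    retries = 0
    waited = 0.0
    while status != 200:  # assumption: 200 is the only success status
        delays = delays_for(status)
        if retries >= len(delays):
            # Not retryable, or the schedule is exhausted: log and give up.
            logger.error("IMDS request failed with HTTP %d after %d retries", status, retries)
            break
        delay = delays[retries]
        sleep(delay)
        retries += 1
        waited += delay
        logger.info(
            "IMDS retry %d: reason HTTP %d, waited %.0fs in total",
            retries, status, waited,
        )
        status, body = send()
    return status, body


# Demo with a fake sender: two 410 responses, then success. No real sleeping.
fake_responses = iter([(410, ""), (410, ""), (200, "token")])
result = send_with_retry(
    send=lambda: next(fake_responses),
    delays_for=lambda status: (10.0,) * 7 if status == 410 else (),
    sleep=lambda seconds: None,
)
print(result)  # (200, 'token')
```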
---

```mermaid
graph TD;

A[IMDS Request] -->|Success| B[✅ Token Issued]
A -->|Other 4xx Error?| C{Identity Not Found?}
C -- Yes --> D[🔄 Retry: 1s → 2s → 4s]
C -- No --> E[❌ Do Not Retry]
A -->|5xx Error?| F[🔄 Retry: 1s → 2s → 4s]
A -->|410 IMDS Updating?| H[🔄 Retry: 10s up to 7 attempts]
A -->|408 Timeout / 429 Throttling?| G[🔄 Retry: 1s → 2s → 4s]
```
---

**References:**

1. https://learn.microsoft.com/en-gb/entra/identity/managed-identities-azure-resources/how-to-use-vm-token#error-handling
2. https://eng.ms/docs/cloud-ai-platform/azure-core/core-compute-and-host/general-purpose-host-arunki/azure-instance-metadata-service/compute-azlinux-metadataserver/troubleshooting/unable-to-reach-imds#mitigate-http-status-code-410
