
Commit 833ce41

Update msiv1_token_revocation.md and explain hash logic in detail (#5428)
1 parent 600be3b commit 833ce41

docs/msiv1_token_revocation.md: 1 file changed, 76 additions, 0 deletions
@@ -125,6 +125,82 @@ The `xms_cc` parameter can hold **multiple** client capabilities, formatted as:
> [!NOTE]
> RPs and MITS should not bypass the cache unless MSAL passes a bad token (as `token_sha256_to_refresh`).

#### Cluster-wide cache-bypass optimization (hash-based)
130+
```mermaid
sequenceDiagram
    %% cluster-wide cache-bypass (hash-based)
    autonumber
    participant NodeA as Node A (stale token T0)
    participant NodeB as Node B (same stale token T0)
    participant MI as Managed-Identity endpoint / cache
    participant AAD as Azure AD

    Note over NodeA,AAD: Conditional-Access change → claims="{…}"

    NodeA->>MI: GET /token with claims + hash(T0)
    MI->>MI: hash(cached T0) == hash(T0) → revoked, bypass cache
    MI->>AAD: request fresh token T1
    AAD-->>MI: token T1
    MI->>MI: cache T1<br/>(replaces T0)
    MI-->>NodeA: token T1 (fresh)

    Note over NodeA,NodeB: shortly after …

    NodeB->>MI: GET /token with claims + hash(T0)
    MI->>MI: hash(cached T1) ≠ hash(T0) → already refreshed
    MI-->>NodeB: token T1 (from cache)

    Note over NodeA,NodeB: Only one AAD call for this revoked token
```

#### Step-by-step flow

1. **Claims challenge arrives** (e.g., CAE / Conditional Access).
   `AcquireToken*` receives `claims="{...}"`.

---

##### **Node A** (first node that still holds the stale token)

| Step | Action |
|------|--------|
| A-1 | Finds **`access_token_A`** in its local MSAL cache. |
| A-2 | Computes **`hash_A = sha256(access_token_A)`**. |
| A-3 | Calls the Managed-Identity (MI) endpoint with<br/>`token_sha256_to_refresh = hash_A`. |
| A-4 | MI sees that its cached token also hashes to `hash_A`, so that token is the revoked one: MI bypasses its cache and requests a new token **`access_token_B`** from AAD. |
| A-5 | MI replaces `access_token_A` with **`access_token_B`** in its cache (`hash_B = sha256(access_token_B)`) and returns `access_token_B` to Node A. |
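
A minimal Python sketch of the client side (steps A-1 to A-3). It assumes the hash is the lowercase hex SHA-256 of the UTF-8 token string (the wire encoding is not specified here). `build_mi_query` and the `resource` parameter are hypothetical illustrations, not MSAL's actual request builder; only `token_sha256_to_refresh` comes from this document.

```python
import hashlib
from typing import Optional


def sha256_hex(access_token: str) -> str:
    # ASSUMPTION: lowercase hex digest of the UTF-8 token; the real wire format may differ.
    return hashlib.sha256(access_token.encode("utf-8")).hexdigest()


def build_mi_query(resource: str, claims: Optional[str],
                   cached_token: Optional[str]) -> dict:
    """Hypothetical helper: query parameters for an MI token request."""
    params = {"resource": resource}  # illustrative only
    # Only a claims challenge plus a cached (now rejected) token justifies
    # asking MI to refresh; otherwise MI should keep serving its cache.
    if claims and cached_token:
        params["token_sha256_to_refresh"] = sha256_hex(cached_token)  # A-2 / A-3
    return params


if __name__ == "__main__":
    resource = "https://example.invalid/resource"  # hypothetical resource
    print(build_mi_query(resource, '{"access_token":{}}', "access_token_A"))
    print(build_mi_query(resource, None, "access_token_A"))
```

Sending the hash only when a claims challenge arrives is what keeps the RP from bypassing its cache on ordinary requests.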

---

##### **Node B** (arrives moments later)

| Step | Action |
|------|--------|
| B-1 | Still holds the stale **`access_token_A`** (it received the same cached token from MI as Node A). |
| B-2 | Computes **`hash_A = sha256(access_token_A)`**. |
| B-3 | Sends `token_sha256_to_refresh = hash_A`. |
| B-4 | MI's cached token is now `access_token_B`, and `hash_B ≠ hash_A`, so the cached token is already newer than the revoked one. |
| B-5 | MI returns **HTTP 200** + **`access_token_B`** from its cache, with _no extra AAD round-trip_. |
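
The MI-side decision in A-4/A-5 and B-4/B-5 can be sketched as a compare-and-refresh cache: a caller hash that matches the cached token marks that token as revoked, and any other hash is answered from cache. This is a hypothetical, single-resource, single-threaded model; `MiTokenCache` and `fetch_from_aad` are illustrative names, and a real MI endpoint would key its cache per identity and resource.

```python
import hashlib
from typing import Callable, Optional


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class MiTokenCache:
    """Hypothetical MI-side cache holding one token for one resource."""

    def __init__(self, fetch_from_aad: Callable[[], str]) -> None:
        self._fetch = fetch_from_aad
        self._token: Optional[str] = None

    def get_token(self, token_sha256_to_refresh: Optional[str] = None) -> str:
        if self._token is None:
            self._token = self._fetch()
        elif token_sha256_to_refresh == sha256_hex(self._token):
            # A-4 / A-5: the caller holds exactly our cached token, so it is revoked.
            self._token = self._fetch()
        # B-4 / B-5: no hash, or a hash of an older token -> serve the cache.
        return self._token


if __name__ == "__main__":
    issued = iter(["access_token_A", "access_token_B"])
    aad_calls = []

    def fake_aad() -> str:
        token = next(issued)
        aad_calls.append(token)
        return token

    mi = MiTokenCache(fake_aad)
    stale = mi.get_token()                    # every node was served access_token_A
    print(mi.get_token(sha256_hex(stale)))    # Node A -> access_token_B (AAD call)
    print(mi.get_token(sha256_hex(stale)))    # Node B -> access_token_B (cache)
    print(len(aad_calls))                     # 2: initial issue + one refresh
```

Both nodes send the same `hash_A`; only the first request matches the cached token, so the second is served without contacting AAD.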

---

##### Cluster settles

* **Only one** outbound call to AAD per unique revoked token, no matter how many nodes receive the claims challenge.
* Load on the MI proxy and on ESTS drops from one refresh per node to one refresh per revoked token, which matters most in large Service Fabric (or AKS) deployments.
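
"No matter how many nodes" only holds if MI performs the compare-and-refresh step atomically; otherwise two concurrent requests carrying `hash_A` could both see the stale token and both call AAD. A hypothetical sketch using a single `threading.Lock`; `ConcurrentMiCache` and `_fetch_from_aad` are stand-ins, not part of any real MI implementation.

```python
import hashlib
import threading
from concurrent.futures import ThreadPoolExecutor


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


class ConcurrentMiCache:
    """Hypothetical MI cache whose compare-and-refresh step is atomic."""

    def __init__(self, initial_token: str) -> None:
        self._token = initial_token
        self._lock = threading.Lock()
        self.aad_calls = 0

    def _fetch_from_aad(self) -> str:
        # Stand-in for the real AAD/ESTS request; only ever called under the lock.
        self.aad_calls += 1
        return f"token_v{self.aad_calls}"

    def get_token(self, token_sha256_to_refresh: str) -> str:
        # Comparing and refreshing under one lock means only the first caller
        # that presents the stale hash triggers a refresh.
        with self._lock:
            if token_sha256_to_refresh == sha256_hex(self._token):
                self._token = self._fetch_from_aad()
            return self._token


if __name__ == "__main__":
    stale = "token_v0"
    mi = ConcurrentMiCache(stale)
    with ThreadPoolExecutor(max_workers=16) as pool:
        results = list(pool.map(lambda _: mi.get_token(sha256_hex(stale)), range(200)))
    print(mi.aad_calls, set(results))  # 1 {'token_v1'}
```

Holding one lock across the AAD call serializes every request for that token; a production proxy would more plausibly use per-resource locks or coalesce in-flight refreshes.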

---

##### Why a simple `bypass_cache=true` flag isn’t enough

* `bypass_cache=true` forces **every** node to refresh, so AAD traffic scales **O(N)** with cluster size.
  Large clusters could issue thousands of token requests within seconds, triggering throttling (`429`) or high latency.

* The **hash check** reduces this to **O(1)** AAD calls per revoked token:
  the first node refreshes, and because every node sends the hash of the same stale token, the hash acts as an idempotency key; all other nodes immediately reuse the fresh token already in the MI cache.
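
A back-of-the-envelope Python model of the two strategies, assuming every node holds the same stale token and requests reach MI one after another; `aad_calls_for_cluster` is a hypothetical helper that only counts refreshes.

```python
import hashlib


def sha256_hex(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def aad_calls_for_cluster(num_nodes: int, use_hash: bool) -> int:
    """Hypothetical model: count AAD calls when all nodes hit the same claims challenge."""
    stale_hash = sha256_hex("stale_token")  # the hash every node sends
    cached = "stale_token"                  # MI's current cached token
    calls = 0
    for _ in range(num_nodes):
        if use_hash:
            # Refresh only while MI still caches the token the nodes reported.
            must_refresh = stale_hash == sha256_hex(cached)
        else:
            # bypass_cache=true: every request skips the cache.
            must_refresh = True
        if must_refresh:
            calls += 1
            cached = f"fresh_token_{calls}"
    return calls


for n in (10, 1_000):
    print(n, aad_calls_for_cluster(n, use_hash=False), aad_calls_for_cluster(n, use_hash=True))
# prints "10 10 1" and then "1000 1000 1"
```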
#### Motivation

The *internal protocol* between the client and the RP (such as the call to the MITS endpoint in Service Fabric) is a simplified version of CAE. CAE is claims-driven and involves JSON operations such as merging JSON documents, but the RP doesn't need the actual claims to perform revocation; it only needs a signal to bypass the cache. For this reason, the full claims value is not used internally.
