fix(ui): stop polling on expired/invalid key auth error in PriceDataReload #24350

Open
xykong wants to merge 1 commit into BerriAI:main from xykong:fix/price-data-reload-stop-polling-on-auth-error

Conversation


@xykong xykong commented Mar 22, 2026

Summary

Fixes #24349

The PriceDataReload component polls two admin endpoints every 30 seconds. When the user's virtual key expires (while the JWT cookie remains valid), every poll triggers ProxyException: Expired Key (HTTP 400), spamming the server with repeated auth errors indefinitely.

Changes:

  • isAuthError() module-level helper detects HTTP 400/401 responses
  • fetchReloadStatus and fetchSourceInfo wrapped with useCallback (satisfies React hooks lint rules)
  • On auth error, stopPolling() clears the interval and sets pollingDisabled = true
  • <Alert> banner shown when polling is paused, prompting the user to re-login
  • useEffect depends on pollingDisabled so polling restarts correctly after fresh login (new accessToken)
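
The stop-on-auth-error behaviour in the list above can be modelled without React. What follows is a minimal sketch under stated assumptions, not the component's code: StatusPoller, handleError, and isRunning are hypothetical names, and leaving polling running on non-auth errors is an assumption. isAuthError repeats the regex check from the PR.

```typescript
// Hypothetical, framework-free model of the stop-on-auth-error logic.
// The real component keeps this state in React hooks and refs.

// Same check as the PR's helper: assumes the networking layer puts
// "HTTP 400" or "HTTP 401" in the error message.
const isAuthError = (error: unknown): boolean =>
  error instanceof Error && /HTTP (400|401)/.test(error.message);

type Fetcher = () => Promise<void>;

class StatusPoller {
  pollingDisabled = false;
  private timer: ReturnType<typeof setInterval> | null = null;
  private fetchers: Fetcher[];
  private intervalMs: number;

  constructor(fetchers: Fetcher[], intervalMs: number = 30000) {
    this.fetchers = fetchers;
    this.intervalMs = intervalMs;
  }

  // Mirrors the effect body: fetch once, then every intervalMs.
  start(): void {
    if (this.pollingDisabled || this.timer !== null) return;
    this.pollOnce();
    this.timer = setInterval(() => this.pollOnce(), this.intervalMs);
  }

  // Mirrors stopPolling(): clear the interval and latch the flag.
  stop(): void {
    if (this.timer !== null) {
      clearInterval(this.timer);
      this.timer = null;
    }
    this.pollingDisabled = true;
  }

  // Only auth errors stop polling; other errors are ignored here (assumption).
  handleError(error: unknown): void {
    if (isAuthError(error)) this.stop();
  }

  isRunning(): boolean {
    return this.timer !== null;
  }

  // Fetchers run independently, each reporting its own failure.
  private pollOnce(): void {
    for (const fetcher of this.fetchers) {
      fetcher().catch((error: unknown) => this.handleError(error));
    }
  }
}
```

Once stop() has run, start() is a no-op until something resets pollingDisabled, which is the behaviour the banner reports to the user.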

Tests

  • Added tests for isAuthError() edge cases (non-Error throws, non-matching status codes)
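
The PR's test file is not shown on this page, so the following is only a sketch of what such edge-case checks might look like. The case table and the failures variable are hypothetical; isAuthError is copied from the helper under review.

```typescript
// Helper as written in the PR.
const isAuthError = (error: unknown): boolean => {
  if (error instanceof Error) {
    return /HTTP (400|401)/.test(error.message);
  }
  return false;
};

// Hypothetical case table covering the edge cases named above.
const cases: { label: string; thrown: unknown; expected: boolean }[] = [
  { label: "expired key", thrown: new Error("HTTP 400: Expired Key"), expected: true },
  { label: "unauthorized", thrown: new Error("Request failed with HTTP 401"), expected: true },
  { label: "forbidden", thrown: new Error("HTTP 403: Forbidden"), expected: false },
  { label: "server error", thrown: new Error("HTTP 500: Internal Server Error"), expected: false },
  // No "HTTP " prefix, so the regex does not match.
  { label: "other format", thrown: new Error("Request failed with status code 400"), expected: false },
  // Non-Error throws are never treated as auth errors.
  { label: "string throw", thrown: "HTTP 400", expected: false },
  { label: "plain object", thrown: { message: "HTTP 401" }, expected: false },
  { label: "null", thrown: null, expected: false },
  { label: "undefined", thrown: undefined, expected: false },
];

// Labels of any cases where the helper disagrees with the expectation.
const failures: string[] = cases
  .filter((c) => isAuthError(c.thrown) !== c.expected)
  .map((c) => c.label);
```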

Manual verification: Deployed to production Kubernetes cluster. Zero ProxyException auth errors from polling after fix — confirmed via pod logs over 30+ minutes.

Checklist

  • Touched file: ui/litellm-dashboard/src/components/price_data_reload.tsx only (no backend changes)
  • No new dependencies added
  • Existing useEffect cleanup (clearInterval on unmount) preserved

…eload

When a user's virtual key expires, the JWT cookie may still be valid so
they remain logged in. The PriceDataReload component polls
/schedule/model_cost_map_reload/status and /model/cost_map/source every
30 seconds, triggering repeated ProxyException (Expired Key, HTTP 400)
errors that spam the server logs.

Changes:
- Extract isAuthError() helper (module-level) to detect HTTP 400/401 responses
- Wrap fetchReloadStatus and fetchSourceInfo with useCallback
- On auth error, call stopPolling() which clears the interval and sets
  pollingDisabled=true to prevent further polling
- Show an Alert banner when polling is paused, prompting the user to re-login
- useEffect now depends on pollingDisabled so it restarts correctly after
  a fresh login (accessToken change resets pollingDisabled to false)

Fixes: repeated ProxyException auth errors from expired virtual keys

vercel bot commented Mar 22, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project | Deployment | Actions | Updated (UTC)
litellm | Ready | Preview, Comment | Mar 22, 2026 9:33am



codspeed-hq bot commented Mar 22, 2026

Merging this PR will not alter performance

✅ 16 untouched benchmarks


Comparing xykong:fix/price-data-reload-stop-polling-on-auth-error (85820df) with main (c89496f)



greptile-apps bot commented Mar 22, 2026

Greptile Summary

This PR fixes a polling spam issue in PriceDataReload where an expired virtual key would cause HTTP 400 errors on every 30-second poll cycle. The approach — detecting auth errors and stopping the interval — is sound, but there is a meaningful logic bug in the re-login recovery path, plus a fragile error-detection mechanism.

Key changes:

  • Adds isAuthError() helper using a regex on the error message to detect HTTP 400/401 responses
  • Wraps fetchReloadStatus and fetchSourceInfo in useCallback and calls stopPolling() on auth errors
  • stopPolling() clears intervalRef.current and sets pollingDisabled = true via state
  • Renders an Ant Design <Alert> warning when polling is paused

Issues found:

  • Polling never resumes after re-login (P1): pollingDisabled is never reset to false when accessToken changes. The useEffect early-return guard fires even with a brand new valid token, so the claim in the PR description that polling "restarts correctly after fresh login" is incorrect without a component remount.
  • Fragile auth-error detection (P2): isAuthError matches against /HTTP (400|401)/ in the error message string. If the networking layer changes its error format this will silently stop working.
  • Stale warning banner after successful manual action (P2): The "session expired" alert remains visible even after a successful hard-refresh via the button, which is misleading since the token is clearly still valid at that point.

Confidence Score: 3/5

  • Safe to merge as a partial improvement, but the re-login recovery path is broken — polling will not resume without a page navigation or component remount.
  • The core bug (polling spam on expired key) is fixed correctly. However, the stated re-login recovery behaviour does not work as described: pollingDisabled is never reset on accessToken change, leaving users with a permanently paused dashboard until they navigate away and back. The error-detection regex is also fragile. These issues lower confidence despite the overall approach being reasonable.
  • ui/litellm-dashboard/src/components/price_data_reload.tsx — specifically the missing pollingDisabled reset on accessToken change (lines 119–136) and the isAuthError regex (lines 15–20).

Important Files Changed

Filename: ui/litellm-dashboard/src/components/price_data_reload.tsx
Overview: Adds auth-error detection and stops polling on expired keys so the server is no longer spammed. The core mechanism is correct but has a logic bug: pollingDisabled is never reset when accessToken changes, so polling won't restart after re-login as claimed. The isAuthError regex is also fragile, and the "expired" alert persists after a successful manual hard-refresh.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A([Component mounts / accessToken changes]) --> B{accessToken present\nAND pollingDisabled === false?}
    B -- No --> C([useEffect early return — no polling])
    B -- Yes --> D[Initial fetch: fetchReloadStatus + fetchSourceInfo]
    D --> E[Start setInterval 30s]
    E --> F{Interval fires}
    F --> G[fetchReloadStatus]
    F --> H[fetchSourceInfo]
    G --> I{Auth error\nHTTP 400/401?}
    H --> J{Auth error\nHTTP 400/401?}
    I -- Yes --> K[stopPolling: clearInterval + setPollingDisabled true]
    J -- Yes --> K
    I -- No --> L[Update reloadStatus state]
    J -- No --> M[Update sourceInfo state]
    K --> N[Show Alert warning banner]
    N --> O{User re-logs in\nnew accessToken prop?}
    O -- Yes --> P["useEffect re-runs BUT pollingDisabled still true ⚠️"]
    P --> C
    O -- No --> Q([Polling stays stopped])

Reviews (1): Last reviewed commit: "fix(ui): stop polling on expired/invalid..."

Comment on lines +119 to +136
useEffect(() => {
  if (!accessToken || pollingDisabled) return;

  fetchReloadStatus();
  fetchSourceInfo();

  intervalRef.current = setInterval(() => {
    fetchReloadStatus();
    fetchSourceInfo();
  }, 30000);

  return () => {
    if (intervalRef.current !== null) {
      clearInterval(intervalRef.current);
      intervalRef.current = null;
    }
  };
}, [accessToken, pollingDisabled, fetchReloadStatus, fetchSourceInfo]);

P1 Polling never resumes after re-login

The PR description states that "polling restarts correctly after fresh login (new accessToken)", but this is not the case with the current implementation.

When an auth error stops polling, pollingDisabled is set to true. If the user then re-logs in (causing the parent to pass a new accessToken), the useEffect re-runs (because accessToken and the memoized callbacks changed), but pollingDisabled is still true — so the early-return guard fires and the interval is never restarted.

Polling will only resume if the component is fully unmounted and remounted (e.g., navigating away and back). A simple fix is to reset pollingDisabled to false whenever accessToken changes:

// Reset polling-disabled guard when a new token arrives (e.g. after re-login)
useEffect(() => {
  if (accessToken) {
    setPollingDisabled(false);
  }
}, [accessToken]);

Place this before the polling useEffect. This ensures the next run of the polling effect sees pollingDisabled === false and starts the interval again.
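
The difference between the two re-login paths can be traced with a small state model. This is a sketch only: PollState, shouldStartPolling, and the transition functions are hypothetical names that stand in for the component's React state, and the token strings are placeholders.

```typescript
// Hypothetical model of the two pieces of state the polling effect reads.
interface PollState {
  accessToken: string | null;
  pollingDisabled: boolean;
}

// Mirrors the early-return guard: `if (!accessToken || pollingDisabled) return;`
const shouldStartPolling = (state: PollState): boolean =>
  Boolean(state.accessToken) && !state.pollingDisabled;

// stopPolling() latches the flag after an auth error.
const onAuthError = (state: PollState): PollState => ({
  ...state,
  pollingDisabled: true,
});

// Re-login as the PR currently behaves: only the token changes.
const onReloginWithoutReset = (state: PollState, token: string): PollState => ({
  ...state,
  accessToken: token,
});

// Re-login with the suggested reset effect applied.
const onReloginWithReset = (state: PollState, token: string): PollState => ({
  accessToken: token,
  pollingDisabled: false,
});

const expired = onAuthError({ accessToken: "old-token", pollingDisabled: false });

// Guard still blocks polling: the flag was never cleared.
const stalled = shouldStartPolling(onReloginWithoutReset(expired, "new-token")); // false

// Guard passes: the reset cleared the flag.
const recovered = shouldStartPolling(onReloginWithReset(expired, "new-token")); // true
```

In the real component the reset happens through setPollingDisabled inside an effect, so the polling effect would likely run once with the stale flag and then run again on the following render, because pollingDisabled is in its dependency array.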

Comment on lines +15 to +20
const isAuthError = (error: unknown): boolean => {
  if (error instanceof Error) {
    return /HTTP (400|401)/.test(error.message);
  }
  return false;
};

P2 isAuthError relies on a fragile error-message substring match

The regex /HTTP (400|401)/ works only if the networking layer formats its errors exactly with that substring (e.g., "... HTTP 400 ..."). If networking.ts ever changes its error serialisation (e.g., Axios wraps it as "Request failed with status code 400", or a future refactor drops the "HTTP" prefix), isAuthError will silently return false, polling will continue indefinitely, and the fix regresses without any obvious failure.

Consider enriching the custom error objects thrown by the networking helpers with a typed status field, and checking that instead:

// In networking.ts (example)
class HttpError extends Error {
  constructor(public status: number, message: string) {
    super(message);
  }
}

// Then here:
const isAuthError = (error: unknown): boolean =>
  error instanceof HttpError && (error.status === 400 || error.status === 401);

If modifying the networking layer is out of scope for this PR, at minimum document the expected format in a comment so future maintainers don't break the assumption silently.
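
To make the typed-status suggestion concrete, here is a hedged sketch of how a networking helper might raise such an error. ResponseLike and ensureOk are hypothetical and do not exist in networking.ts as far as this page shows; the message format is kept as "HTTP <status>" only so that a regex-based check would still match during a migration.

```typescript
// Hypothetical typed error carrying the HTTP status.
// Assumes an ES2015+ compile target; when compiled to ES5, subclasses of
// Error need extra prototype handling for `instanceof` to work.
class HttpError extends Error {
  readonly status: number;

  constructor(status: number, message: string) {
    super(message);
    this.name = "HttpError";
    this.status = status;
  }
}

// Minimal structural stand-in for a fetch Response (assumption).
interface ResponseLike {
  ok: boolean;
  status: number;
  statusText: string;
}

// Hypothetical helper: throws a typed error for non-2xx responses.
function ensureOk(response: ResponseLike): void {
  if (!response.ok) {
    throw new HttpError(
      response.status,
      `HTTP ${response.status}: ${response.statusText}`,
    );
  }
}

// Status-based check: independent of how the message is worded.
const isAuthError = (error: unknown): boolean =>
  error instanceof HttpError && (error.status === 400 || error.status === 401);
```

With this shape, rewording the error message cannot change what isAuthError returns; only the status code matters.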

Comment on lines +241 to +248
{pollingDisabled && (
  <Alert
    type="warning"
    message="Auto-refresh paused — your session key has expired. Please re-login to restore live status."
    showIcon
    style={{ marginBottom: 16 }}
  />
)}

P2 Alert shown even after a successful manual hard-refresh

When pollingDisabled is true, the warning banner persists even if the user successfully performs a "Hard Refresh" via the button (which calls fetchReloadStatus / fetchSourceInfo directly). A successful manual call implies the token is now valid again — leaving the "session key has expired" alert up at that point is misleading UX.

Consider clearing pollingDisabled (and restarting polling) on a successful hard-refresh response, or at least hiding the alert if a direct call succeeds:

const handleHardRefresh = async () => {
  // ...
  try {
    const response = await reloadModelCostMap(accessToken);
    if (response.status === "success") {
      // ...
      setPollingDisabled(false); // Clear the auth-error gate on success
    }
  } catch (error) {
    // ... (existing error handling)
  }
};

Development

Successfully merging this pull request may close these issues.

bug(ui): PriceDataReload polls admin APIs every 30s with expired virtual key, spamming server with ProxyException auth errors
