Fix "could not refresh token" error resulting from concurrent CLI instances #8645
Idle Codex CLI instances can get stuck after another concurrently-running instance refreshes and rotates the shared ChatGPT refresh token: the idle process wakes up, gets a 401, and its in-memory refresh token is no longer valid, so refresh fails permanently.
This change makes 401 recovery resilient to concurrent token rotation by first syncing ChatGPT tokens from the configured credential store (file/keyring/auto) and retrying the request, then performing a network refresh only if needed (using the refresh token loaded from storage). It also prevents accidental cross-account/workspace switching by only adopting/refreshing when chatgpt_account_id matches the request’s auth snapshot, and adds bounded retries on transient auth.json parse errors to handle concurrent truncate+write. Added unit tests for the storage-sync outcomes.
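To make the recovery order in the description easier to follow, here is a minimal sketch of the storage-sync decision. It is not the code from this PR: `StoredTokens`, `SyncOutcome`, and `sync_from_storage` are hypothetical names, and the sketch assumes credentials can be compared field by field.

```rust
/// Hypothetical, simplified view of the ChatGPT credentials held in memory
/// or in the credential store. Field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct StoredTokens {
    pub account_id: Option<String>,
    pub access_token: String,
    pub refresh_token: String,
}

/// Possible results of comparing in-memory tokens with the credential store.
#[derive(Debug, PartialEq)]
pub enum SyncOutcome {
    /// Storage holds different tokens for the same account:
    /// adopt them and retry the request without a network refresh.
    AdoptedFromStorage(StoredTokens),
    /// Storage matches memory, so only a network refresh can help.
    NeedsNetworkRefresh,
    /// Storage is empty or belongs to another account: change nothing.
    Skip,
}

/// Decide what to do after a 401, given the tokens the failed request used
/// and whatever is currently in the credential store.
pub fn sync_from_storage(in_memory: &StoredTokens, stored: Option<StoredTokens>) -> SyncOutcome {
    let Some(stored) = stored else {
        return SyncOutcome::Skip;
    };
    // Never adopt credentials for a different (or unknown) account.
    if stored.account_id.is_none() || stored.account_id != in_memory.account_id {
        return SyncOutcome::Skip;
    }
    if stored.access_token != in_memory.access_token
        || stored.refresh_token != in_memory.refresh_token
    {
        // Another process probably rotated the tokens already.
        SyncOutcome::AdoptedFromStorage(stored)
    } else {
        SyncOutcome::NeedsNetworkRefresh
    }
}
```

In this sketch the caller would retry the request on `AdoptedFromStorage`, and fall back to a network refresh only on `NeedsNetworkRefresh` or if the retry still returns a 401.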
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Why is this required?
    .await
    .map_err(RefreshTokenError::Transient)?
else {
    return Ok(None);
Should a method be extracted here that returns an Option, so you can use ? to short-circuit all these checks instead of repeating return Ok(None);?
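A rough sketch of what that extraction could look like, not taken from the PR: the precondition checks move into a helper that returns `Option`, so each failed check short-circuits with `?`, and the outer function only deals with the `Result`. The `Auth` type, its fields, `matching_refresh_token`, and the `String` error type are all assumptions made for illustration.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum AuthMode {
    ApiKey,
    ChatGPT,
}

/// Hypothetical stand-in for the real auth type; fields are assumptions.
pub struct Auth {
    pub mode: AuthMode,
    pub account_id: Option<String>,
    pub refresh_token: Option<String>,
}

/// All precondition checks live here. Any failed check yields None via `?`,
/// replacing a chain of explicit `return Ok(None);` statements.
fn matching_refresh_token(expected: &Auth, stored: Option<&Auth>) -> Option<String> {
    // `bool::then_some(())` turns a failed condition into None.
    (expected.mode == AuthMode::ChatGPT).then_some(())?;
    let stored = stored?;
    let expected_id = expected.account_id.as_deref()?;
    let stored_id = stored.account_id.as_deref()?;
    (expected_id == stored_id).then_some(())?;
    stored.refresh_token.clone()
}

/// The outer function only propagates the storage error and wraps the Option.
pub fn refresh_token_from_storage(
    expected: &Auth,
    stored: Result<Option<Auth>, String>,
) -> Result<Option<String>, String> {
    let stored = stored?;
    Ok(matching_refresh_token(expected, stored.as_ref()))
}
```

The trade-off is that the helper cannot report why it returned `None`; if callers need to distinguish the cases, an enum result would fit better than `Option`.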
    expected: &CodexAuth,
) -> Result<Option<String>, RefreshTokenError> {
    if expected.mode != AuthMode::ChatGPT {
        return Ok(None);
This method needs to be simpler. There are way too many Ok(None) returns.
    auth: &Option<crate::auth::CodexAuth>,
) -> Result<()> {
    if *refreshed {
    if recovery.refreshed_token {
Can we keep the refresh logic fully inside AuthManager so no external checking is needed? We can use some status endpoint to check whether the token is alive.
That would avoid every client having to maintain a complex recovery loop.
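One possible shape for this, as a sketch only: the manager owns the token and a recovery step, and callers hand it a closure that performs the request. The `AuthManager` below is a simplified, synchronous stand-in and not the real type; `with_auth`, `RequestError`, and the `recover` callback are hypothetical, and the status-endpoint check mentioned above is left out.

```rust
#[derive(Debug, PartialEq)]
pub enum RequestError {
    /// The server answered 401.
    Unauthorized,
    /// Any other failure; passed through untouched.
    Other(String),
}

/// Simplified stand-in for an auth manager that hides 401 recovery.
/// `recover` is a hypothetical hook that would reload tokens from storage
/// or refresh them over the network, returning None if neither works.
pub struct AuthManager<R>
where
    R: FnMut() -> Option<String>,
{
    token: String,
    recover: R,
}

impl<R> AuthManager<R>
where
    R: FnMut() -> Option<String>,
{
    pub fn new(token: String, recover: R) -> Self {
        Self { token, recover }
    }

    /// Run `request` with the current token. On a 401, try to recover once
    /// and retry; the caller never sees the recovery loop.
    pub fn with_auth<T>(
        &mut self,
        mut request: impl FnMut(&str) -> Result<T, RequestError>,
    ) -> Result<T, RequestError> {
        match request(&self.token) {
            Err(RequestError::Unauthorized) => {
                let Some(fresh) = (self.recover)() else {
                    return Err(RequestError::Unauthorized);
                };
                self.token = fresh;
                request(&self.token)
            }
            other => other,
        }
    }
}
```

With this shape a client would write something like `manager.with_auth(|token| send(token))`, where `send` is whatever performs the HTTP call, and would not need to track whether a refresh already happened.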
pakrym-oai left a comment
Is there a way we can make both the refresh logic and the consumption logic simpler?
This addresses #6498, which several users have reported.