@etraut-openai
Collaborator

Idle Codex CLI instances can get stuck after another concurrently-running instance refreshes and rotates the shared ChatGPT refresh token: the idle process wakes up, gets a 401, and its in-memory refresh token is no longer valid, so refresh fails permanently.

This change makes 401 recovery resilient to concurrent token rotation by first syncing ChatGPT tokens from the configured credential store (file/keyring/auto) and retrying the request, then performing a network refresh only if needed (using the refresh token loaded from storage). It also prevents accidental cross-account/workspace switching by only adopting/refreshing when chatgpt_account_id matches the request’s auth snapshot, and adds bounded retries on transient auth.json parse errors to handle concurrent truncate+write. Added unit tests for the storage-sync outcomes.

This addresses #6498, which several users have reported.
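
The storage-sync step described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `Tokens`, `SyncOutcome`, `sync_from_storage`, and `load_with_retries` are hypothetical names, and the real change reads from the configured credential store rather than taking the stored tokens as an argument.

```rust
// Hypothetical, simplified types for illustration; not the PR's actual types.
#[derive(Clone, Debug, PartialEq)]
struct Tokens {
    account_id: String,
    access_token: String,
    refresh_token: String,
}

#[derive(Debug, PartialEq)]
enum SyncOutcome {
    /// Storage holds different tokens for the same account: adopt them and retry the request.
    Adopted(Tokens),
    /// Storage matches what is already in memory: a network refresh is still needed.
    Unchanged,
    /// Storage is empty or belongs to another account: leave it alone.
    Skipped,
}

/// Decide what to do after a 401, given the tokens the request was sent with
/// and whatever is currently in the credential store.
fn sync_from_storage(in_memory: &Tokens, stored: Option<&Tokens>) -> SyncOutcome {
    match stored {
        None => SyncOutcome::Skipped,
        // Guard against cross-account switching: only adopt tokens for the same account.
        Some(s) if s.account_id != in_memory.account_id => SyncOutcome::Skipped,
        Some(s) if s == in_memory => SyncOutcome::Unchanged,
        Some(s) => SyncOutcome::Adopted(s.clone()),
    }
}

/// Retry a read a bounded number of times, e.g. when another process is in the
/// middle of a truncate+write and the file briefly fails to parse.
fn load_with_retries<T, E>(
    mut read: impl FnMut() -> Result<T, E>,
    max_attempts: usize,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match read() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => attempt += 1,
        }
    }
}
```

In this sketch the caller would retry the request on `Adopted`, fall back to a network refresh on `Unchanged`, and do neither on `Skipped`. A real implementation would probably also pause briefly between read attempts.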

@etraut-openai
Collaborator Author

@codex review

Contributor

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

@pakrym-oai
Collaborator

It also prevents accidental cross-account/workspace switching by only adopting/refreshing when chatgpt_account_id matches the request’s auth snapshot

Why is this required?

.await
.map_err(RefreshTokenError::Transient)?
else {
return Ok(None);
Collaborator

Should a method be extracted here that returns an Option, so you can use ? to short-circuit all these checks instead of repeating return Ok(None);?
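
One way to read this suggestion, as a hedged sketch only: move the precondition checks into a helper that returns `Option`, so that each failed check short-circuits with `?` or a plain `None`. `StoredAuth`, `usable_refresh_token`, and the `ApiKey` variant are hypothetical names for illustration, not identifiers confirmed by the diff.

```rust
// Hypothetical stand-ins for the PR's types; only AuthMode::ChatGPT appears in the diff.
#[derive(Clone, Copy, Debug, PartialEq)]
enum AuthMode {
    ApiKey, // assumed variant, for illustration
    ChatGPT,
}

#[derive(Debug)]
struct StoredAuth {
    mode: AuthMode,
    account_id: Option<String>,
    refresh_token: Option<String>,
}

/// Returns the stored refresh token only if every precondition holds.
/// Each `?` or `return None` replaces one `return Ok(None);` in the caller.
fn usable_refresh_token(stored: Option<&StoredAuth>, expected_account: &str) -> Option<String> {
    let stored = stored?; // nothing in storage
    if stored.mode != AuthMode::ChatGPT {
        return None; // not a ChatGPT login
    }
    let account = stored.account_id.as_deref()?; // no account id recorded
    if account != expected_account {
        return None; // a different account or workspace
    }
    stored.refresh_token.clone() // None if no refresh token is stored
}
```

The fallible caller could then end with a single `Ok(usable_refresh_token(stored, expected_account))`, keeping the error handling (`map_err`, `?` on `Result`) separate from the precondition checks.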

expected: &CodexAuth,
) -> Result<Option<String>, RefreshTokenError> {
if expected.mode != AuthMode::ChatGPT {
return Ok(None);
Collaborator

This method needs to be simpler. There are way too many Ok(None) returns.

auth: &Option<crate::auth::CodexAuth>,
) -> Result<()> {
if *refreshed {
if recovery.refreshed_token {
Collaborator

Can we keep the refresh logic fully inside AuthManager so that no external checking is needed? We could use some status endpoint to check whether the token is still alive.

That would avoid every client having to maintain a complex recovery loop.

Collaborator

@pakrym-oai pakrym-oai left a comment

Is there a way we can make both the refresh logic and the consumption logic simpler?

3 participants