Skip to content

Commit c4ebe4b

Browse files
Improved token refresh handling to address "Re-connecting" behavior (openai#6231)
Currently, when the access token expires, we attempt to use the refresh token to acquire a new access token. This works most of the time. However, there are situations where the refresh token is expired, exhausted (already used to perform a refresh), or revoked. In those cases, the current logic treats the error as transient and attempts to retry it repeatedly. This PR changes the token refresh logic to differentiate between permanent and transient errors. It also changes callers to treat the permanent errors as fatal rather than retrying them. And it provides better error messages to users so they understand how to address the problem. These error messages should also help us further understand why we're seeing examples of refresh token exhaustion. Here is the error message in the CLI. The same text appears within the extension. <img width="863" height="38" alt="image" src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f" /> I also correct the spelling of "Re-connecting", which shouldn't have a hyphen in it. Testing: I manually tested these code paths by adding temporary code to programmatically cause my refresh token to be exhausted (by calling the token refresh endpoint in a tight loop more than 50 times). I then simulated an access token expiration, which caused the token refresh logic to be invoked. I confirmed that the updated logic properly handled the error condition. Note: We earlier discussed the idea of forcefully logging out the user at the point where token refresh failed. I made several attempts to do this, and all of them resulted in a bad UX. It's important to surface this error to users in a way that explains the problem and tells them that they need to log in again. We also previously discussed deleting the auth.json file when this condition is detected. That also creates problems because it effectively changes the auth status from logged in to logged out, and this causes odd failures and inconsistent UX. I think it's therefore better not to delete auth.json in this case. If the user closes the CLI or VSCE and starts it again, we properly detect that the access token is expired and the refresh token is "dead", and we force the user to go through the login flow at that time. This should address aspects of openai#6191, openai#5679, and openai#5505
1 parent 1a89f70 commit c4ebe4b

File tree

6 files changed

+458
-32
lines changed

6 files changed

+458
-32
lines changed

codex-rs/core/src/auth.rs

Lines changed: 145 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,14 @@
11
mod storage;
22

33
use chrono::Utc;
4+
use reqwest::StatusCode;
45
use serde::Deserialize;
56
use serde::Serialize;
67
#[cfg(test)]
78
use serial_test::serial;
89
use std::env;
910
use std::fmt::Debug;
11+
use std::io::ErrorKind;
1012
use std::path::Path;
1113
use std::path::PathBuf;
1214
use std::sync::Arc;
@@ -22,10 +24,14 @@ use crate::auth::storage::AuthStorageBackend;
2224
use crate::auth::storage::create_auth_storage;
2325
use crate::config::Config;
2426
use crate::default_client::CodexHttpClient;
27+
use crate::error::RefreshTokenFailedError;
28+
use crate::error::RefreshTokenFailedReason;
2529
use crate::token_data::PlanType;
2630
use crate::token_data::TokenData;
2731
use crate::token_data::parse_id_token;
2832
use crate::util::try_parse_error_message;
33+
use serde_json::Value;
34+
use thiserror::Error;
2935

3036
#[derive(Debug, Clone)]
3137
pub struct CodexAuth {
@@ -46,26 +52,63 @@ impl PartialEq for CodexAuth {
4652
// TODO(pakrym): use token exp field to check for expiration instead
4753
const TOKEN_REFRESH_INTERVAL: i64 = 8;
4854

55+
const REFRESH_TOKEN_EXPIRED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token has expired. Please log out and sign in again.";
56+
const REFRESH_TOKEN_REUSED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again.";
57+
const REFRESH_TOKEN_INVALIDATED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was revoked. Please log out and sign in again.";
58+
const REFRESH_TOKEN_UNKNOWN_MESSAGE: &str =
59+
"Your access token could not be refreshed. Please log out and sign in again.";
60+
const REFRESH_TOKEN_URL: &str = "https://auth.openai.com/oauth/token";
61+
pub const REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR: &str = "CODEX_REFRESH_TOKEN_URL_OVERRIDE";
62+
63+
#[derive(Debug, Error)]
64+
pub enum RefreshTokenError {
65+
#[error("{0}")]
66+
Permanent(#[from] RefreshTokenFailedError),
67+
#[error(transparent)]
68+
Transient(#[from] std::io::Error),
69+
}
70+
71+
impl RefreshTokenError {
72+
pub fn failed_reason(&self) -> Option<RefreshTokenFailedReason> {
73+
match self {
74+
Self::Permanent(error) => Some(error.reason),
75+
Self::Transient(_) => None,
76+
}
77+
}
78+
79+
fn other_with_message(message: impl Into<String>) -> Self {
80+
Self::Transient(std::io::Error::other(message.into()))
81+
}
82+
}
83+
84+
impl From<RefreshTokenError> for std::io::Error {
85+
fn from(err: RefreshTokenError) -> Self {
86+
match err {
87+
RefreshTokenError::Permanent(failed) => std::io::Error::other(failed),
88+
RefreshTokenError::Transient(inner) => inner,
89+
}
90+
}
91+
}
92+
4993
impl CodexAuth {
50-
pub async fn refresh_token(&self) -> Result<String, std::io::Error> {
94+
pub async fn refresh_token(&self) -> Result<String, RefreshTokenError> {
5195
tracing::info!("Refreshing token");
5296

53-
let token_data = self
54-
.get_current_token_data()
55-
.ok_or(std::io::Error::other("Token data is not available."))?;
97+
let token_data = self.get_current_token_data().ok_or_else(|| {
98+
RefreshTokenError::Transient(std::io::Error::other("Token data is not available."))
99+
})?;
56100
let token = token_data.refresh_token;
57101

58-
let refresh_response = try_refresh_token(token, &self.client)
59-
.await
60-
.map_err(std::io::Error::other)?;
102+
let refresh_response = try_refresh_token(token, &self.client).await?;
61103

62104
let updated = update_tokens(
63105
&self.storage,
64106
refresh_response.id_token,
65107
refresh_response.access_token,
66108
refresh_response.refresh_token,
67109
)
68-
.await?;
110+
.await
111+
.map_err(RefreshTokenError::from)?;
69112

70113
if let Ok(mut auth_lock) = self.auth_dot_json.lock() {
71114
*auth_lock = Some(updated.clone());
@@ -74,7 +117,7 @@ impl CodexAuth {
74117
let access = match updated.tokens {
75118
Some(t) => t.access_token,
76119
None => {
77-
return Err(std::io::Error::other(
120+
return Err(RefreshTokenError::other_with_message(
78121
"Token data is not available after refresh.",
79122
));
80123
}
@@ -99,15 +142,21 @@ impl CodexAuth {
99142
..
100143
}) => {
101144
if last_refresh < Utc::now() - chrono::Duration::days(TOKEN_REFRESH_INTERVAL) {
102-
let refresh_response = tokio::time::timeout(
145+
let refresh_result = tokio::time::timeout(
103146
Duration::from_secs(60),
104147
try_refresh_token(tokens.refresh_token.clone(), &self.client),
105148
)
106-
.await
107-
.map_err(|_| {
108-
std::io::Error::other("timed out while refreshing OpenAI API key")
109-
})?
110-
.map_err(std::io::Error::other)?;
149+
.await;
150+
let refresh_response = match refresh_result {
151+
Ok(Ok(response)) => response,
152+
Ok(Err(err)) => return Err(err.into()),
153+
Err(_) => {
154+
return Err(std::io::Error::new(
155+
ErrorKind::TimedOut,
156+
"timed out while refreshing OpenAI API key",
157+
));
158+
}
159+
};
111160

112161
let updated_auth_dot_json = update_tokens(
113162
&self.storage,
@@ -425,38 +474,101 @@ async fn update_tokens(
425474
async fn try_refresh_token(
426475
refresh_token: String,
427476
client: &CodexHttpClient,
428-
) -> std::io::Result<RefreshResponse> {
477+
) -> Result<RefreshResponse, RefreshTokenError> {
429478
let refresh_request = RefreshRequest {
430479
client_id: CLIENT_ID,
431480
grant_type: "refresh_token",
432481
refresh_token,
433482
scope: "openid profile email",
434483
};
435484

485+
let endpoint = refresh_token_endpoint();
486+
436487
// Use shared client factory to include standard headers
437488
let response = client
438-
.post("https://auth.openai.com/oauth/token")
489+
.post(endpoint.as_str())
439490
.header("Content-Type", "application/json")
440491
.json(&refresh_request)
441492
.send()
442493
.await
443-
.map_err(std::io::Error::other)?;
494+
.map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
444495

445-
if response.status().is_success() {
496+
let status = response.status();
497+
if status.is_success() {
446498
let refresh_response = response
447499
.json::<RefreshResponse>()
448500
.await
449-
.map_err(std::io::Error::other)?;
501+
.map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
450502
Ok(refresh_response)
451503
} else {
452-
Err(std::io::Error::other(format!(
453-
"Failed to refresh token: {}: {}",
454-
response.status(),
455-
try_parse_error_message(&response.text().await.unwrap_or_default()),
456-
)))
504+
let body = response.text().await.unwrap_or_default();
505+
if status == StatusCode::UNAUTHORIZED {
506+
let failed = classify_refresh_token_failure(&body);
507+
Err(RefreshTokenError::Permanent(failed))
508+
} else {
509+
let message = try_parse_error_message(&body);
510+
Err(RefreshTokenError::Transient(std::io::Error::other(
511+
format!("Failed to refresh token: {status}: {message}"),
512+
)))
513+
}
457514
}
458515
}
459516

517+
fn classify_refresh_token_failure(body: &str) -> RefreshTokenFailedError {
518+
let code = extract_refresh_token_error_code(body);
519+
520+
let normalized_code = code.as_deref().map(str::to_ascii_lowercase);
521+
let reason = match normalized_code.as_deref() {
522+
Some("refresh_token_expired") => RefreshTokenFailedReason::Expired,
523+
Some("refresh_token_reused") => RefreshTokenFailedReason::Exhausted,
524+
Some("refresh_token_invalidated") => RefreshTokenFailedReason::Revoked,
525+
_ => RefreshTokenFailedReason::Other,
526+
};
527+
528+
if reason == RefreshTokenFailedReason::Other {
529+
tracing::warn!(
530+
backend_code = normalized_code.as_deref(),
531+
backend_body = body,
532+
"Encountered unknown 401 response while refreshing token"
533+
);
534+
}
535+
536+
let message = match reason {
537+
RefreshTokenFailedReason::Expired => REFRESH_TOKEN_EXPIRED_MESSAGE.to_string(),
538+
RefreshTokenFailedReason::Exhausted => REFRESH_TOKEN_REUSED_MESSAGE.to_string(),
539+
RefreshTokenFailedReason::Revoked => REFRESH_TOKEN_INVALIDATED_MESSAGE.to_string(),
540+
RefreshTokenFailedReason::Other => REFRESH_TOKEN_UNKNOWN_MESSAGE.to_string(),
541+
};
542+
543+
RefreshTokenFailedError::new(reason, message)
544+
}
545+
546+
fn extract_refresh_token_error_code(body: &str) -> Option<String> {
547+
if body.trim().is_empty() {
548+
return None;
549+
}
550+
551+
let Value::Object(map) = serde_json::from_str::<Value>(body).ok()? else {
552+
return None;
553+
};
554+
555+
if let Some(error_value) = map.get("error") {
556+
match error_value {
557+
Value::Object(obj) => {
558+
if let Some(code) = obj.get("code").and_then(Value::as_str) {
559+
return Some(code.to_string());
560+
}
561+
}
562+
Value::String(code) => {
563+
return Some(code.to_string());
564+
}
565+
_ => {}
566+
}
567+
}
568+
569+
map.get("code").and_then(Value::as_str).map(str::to_string)
570+
}
571+
460572
#[derive(Serialize)]
461573
struct RefreshRequest {
462574
client_id: &'static str,
@@ -475,6 +587,11 @@ struct RefreshResponse {
475587
// Shared constant for token refresh (client id used for oauth token refresh flow)
476588
pub const CLIENT_ID: &str = "app_EMoamEEZ73f0CkXaXp7hrann";
477589

590+
fn refresh_token_endpoint() -> String {
591+
std::env::var(REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR)
592+
.unwrap_or_else(|_| REFRESH_TOKEN_URL.to_string())
593+
}
594+
478595
use std::sync::RwLock;
479596

480597
/// Internal cached auth state.
@@ -965,7 +1082,9 @@ impl AuthManager {
9651082

9661083
/// Attempt to refresh the current auth token (if any). On success, reload
9671084
/// the auth state from disk so other components observe refreshed token.
968-
pub async fn refresh_token(&self) -> std::io::Result<Option<String>> {
1085+
/// If the token refresh fails in a permanent (non‑transient) way, logs out
1086+
/// to clear invalid auth state.
1087+
pub async fn refresh_token(&self) -> Result<Option<String>, RefreshTokenError> {
9691088
let auth = match self.auth() {
9701089
Some(a) => a,
9711090
None => return Ok(None),

codex-rs/core/src/client.rs

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,7 @@ use tracing::warn;
3131

3232
use crate::AuthManager;
3333
use crate::auth::CodexAuth;
34+
use crate::auth::RefreshTokenError;
3435
use crate::chat_completions::AggregateStreamExt;
3536
use crate::chat_completions::stream_chat_completions;
3637
use crate::client_common::Prompt;
@@ -389,12 +390,17 @@ impl ModelClient {
389390
&& let Some(manager) = auth_manager.as_ref()
390391
&& let Some(auth) = auth.as_ref()
391392
&& auth.mode == AuthMode::ChatGPT
393+
&& let Err(err) = manager.refresh_token().await
392394
{
393-
manager.refresh_token().await.map_err(|err| {
394-
StreamAttemptError::Fatal(CodexErr::Fatal(format!(
395-
"Failed to refresh ChatGPT credentials: {err}"
396-
)))
397-
})?;
395+
let stream_error = match err {
396+
RefreshTokenError::Permanent(failed) => {
397+
StreamAttemptError::Fatal(CodexErr::RefreshTokenFailed(failed))
398+
}
399+
RefreshTokenError::Transient(other) => {
400+
StreamAttemptError::RetryableTransportError(CodexErr::Io(other))
401+
}
402+
};
403+
return Err(stream_error);
398404
}
399405

400406
// The OpenAI Responses endpoint returns structured JSON bodies even for 4xx/5xx

codex-rs/core/src/codex.rs

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1928,6 +1928,7 @@ async fn run_turn(
19281928
return Err(CodexErr::UsageLimitReached(e));
19291929
}
19301930
Err(CodexErr::UsageNotIncluded) => return Err(CodexErr::UsageNotIncluded),
1931+
Err(e @ CodexErr::RefreshTokenFailed(_)) => return Err(e),
19311932
Err(e) => {
19321933
// Use the configured provider-specific stream retry budget.
19331934
let max_retries = turn_context.client.get_provider().stream_max_retries();
@@ -1946,7 +1947,7 @@ async fn run_turn(
19461947
// at a seemingly frozen screen.
19471948
sess.notify_stream_error(
19481949
&turn_context,
1949-
format!("Re-connecting... {retries}/{max_retries}"),
1950+
format!("Reconnecting... {retries}/{max_retries}"),
19501951
)
19511952
.await;
19521953

codex-rs/core/src/error.rs

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -135,6 +135,9 @@ pub enum CodexErr {
135135
#[error("unsupported operation: {0}")]
136136
UnsupportedOperation(String),
137137

138+
#[error("{0}")]
139+
RefreshTokenFailed(RefreshTokenFailedError),
140+
138141
#[error("Fatal error: {0}")]
139142
Fatal(String),
140143

@@ -201,6 +204,30 @@ impl std::fmt::Display for ResponseStreamFailed {
201204
}
202205
}
203206

207+
#[derive(Debug, Clone, PartialEq, Eq, Error)]
208+
#[error("{message}")]
209+
pub struct RefreshTokenFailedError {
210+
pub reason: RefreshTokenFailedReason,
211+
pub message: String,
212+
}
213+
214+
impl RefreshTokenFailedError {
215+
pub fn new(reason: RefreshTokenFailedReason, message: impl Into<String>) -> Self {
216+
Self {
217+
reason,
218+
message: message.into(),
219+
}
220+
}
221+
}
222+
223+
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
224+
pub enum RefreshTokenFailedReason {
225+
Expired,
226+
Exhausted,
227+
Revoked,
228+
Other,
229+
}
230+
204231
#[derive(Debug)]
205232
pub struct UnexpectedResponseError {
206233
pub status: StatusCode,

0 commit comments

Comments
 (0)