Skip to content

Commit 4e1f70a

Browse files
etraut-openaiHolovkat
authored andcommitted
Improved token refresh handling to address "Re-connecting" behavior (#6231)
Currently, when the access token expires, we attempt to use the refresh token to acquire a new access token. This works most of the time. However, there are situations where the refresh token is expired, exhausted (already used to perform a refresh), or revoked. In those cases, the current logic treats the error as transient and attempts to retry it repeatedly. This PR changes the token refresh logic to differentiate between permanent and transient errors. It also changes callers to treat the permanent errors as fatal rather than retrying them. And it provides better error messages to users so they understand how to address the problem. These error messages should also help us further understand why we're seeing examples of refresh token exhaustion. Here is the error message in the CLI. The same text appears within the extension. <img width="863" height="38" alt="image" src="https://github.com/user-attachments/assets/7ffc0d08-ebf0-4900-b9a9-265064202f4f" /> I also correct the spelling of "Re-connecting", which shouldn't have a hyphen in it. Testing: I manually tested these code paths by adding temporary code to programmatically cause my refresh token to be exhausted (by calling the token refresh endpoint in a tight loop more than 50 times). I then simulated an access token expiration, which caused the token refresh logic to be invoked. I confirmed that the updated logic properly handled the error condition. Note: We earlier discussed the idea of forcefully logging out the user at the point where token refresh failed. I made several attempts to do this, and all of them resulted in a bad UX. It's important to surface this error to users in a way that explains the problem and tells them that they need to log in again. We also previously discussed deleting the auth.json file when this condition is detected. That also creates problems because it effectively changes the auth status from logged in to logged out, and this causes odd failures and inconsistent UX. I think it's therefore better not to delete auth.json in this case. If the user closes the CLI or VSCE and starts it again, we properly detect that the access token is expired and the refresh token is "dead", and we force the user to go through the login flow at that time. This should address aspects of #6191, #5679, and #5505
1 parent f824d7e commit 4e1f70a

File tree

5 files changed

+455
-31
lines changed

5 files changed

+455
-31
lines changed

codex-rs/core/src/auth.rs

Lines changed: 144 additions & 26 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,7 @@
11
mod storage;
22

33
use chrono::Utc;
4+
use reqwest::StatusCode;
45
use serde::Deserialize;
56
use serde::Serialize;
67
#[cfg(test)]
@@ -29,10 +30,14 @@ pub use crate::auth::storage::try_read_auth_json;
2930
pub use crate::auth::storage::write_auth_json;
3031
use crate::config::Config;
3132
use crate::default_client::CodexHttpClient;
33+
use crate::error::RefreshTokenFailedError;
34+
use crate::error::RefreshTokenFailedReason;
3235
use crate::token_data::PlanType;
3336
use crate::token_data::TokenData;
3437
use crate::token_data::parse_id_token;
3538
use crate::util::try_parse_error_message;
39+
use serde_json::Value;
40+
use thiserror::Error;
3641

3742
#[derive(Debug, Clone)]
3843
pub struct CodexAuth {
@@ -53,26 +58,63 @@ impl PartialEq for CodexAuth {
5358
// TODO(pakrym): use token exp field to check for expiration instead
5459
const TOKEN_REFRESH_INTERVAL: i64 = 8;
5560

61+
const REFRESH_TOKEN_EXPIRED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token has expired. Please log out and sign in again.";
62+
const REFRESH_TOKEN_REUSED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was already used. Please log out and sign in again.";
63+
const REFRESH_TOKEN_INVALIDATED_MESSAGE: &str = "Your access token could not be refreshed because your refresh token was revoked. Please log out and sign in again.";
64+
const REFRESH_TOKEN_UNKNOWN_MESSAGE: &str =
65+
"Your access token could not be refreshed. Please log out and sign in again.";
66+
const REFRESH_TOKEN_URL: &str = "https://auth.openai.com/oauth/token";
67+
pub const REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR: &str = "CODEX_REFRESH_TOKEN_URL_OVERRIDE";
68+
69+
#[derive(Debug, Error)]
70+
pub enum RefreshTokenError {
71+
#[error("{0}")]
72+
Permanent(#[from] RefreshTokenFailedError),
73+
#[error(transparent)]
74+
Transient(#[from] std::io::Error),
75+
}
76+
77+
impl RefreshTokenError {
78+
pub fn failed_reason(&self) -> Option<RefreshTokenFailedReason> {
79+
match self {
80+
Self::Permanent(error) => Some(error.reason),
81+
Self::Transient(_) => None,
82+
}
83+
}
84+
85+
fn other_with_message(message: impl Into<String>) -> Self {
86+
Self::Transient(std::io::Error::other(message.into()))
87+
}
88+
}
89+
90+
impl From<RefreshTokenError> for std::io::Error {
91+
fn from(err: RefreshTokenError) -> Self {
92+
match err {
93+
RefreshTokenError::Permanent(failed) => std::io::Error::other(failed),
94+
RefreshTokenError::Transient(inner) => inner,
95+
}
96+
}
97+
}
98+
5699
impl CodexAuth {
57-
pub async fn refresh_token(&self) -> Result<String, std::io::Error> {
100+
pub async fn refresh_token(&self) -> Result<String, RefreshTokenError> {
58101
tracing::info!("Refreshing token");
59102

60-
let token_data = self
61-
.get_current_token_data()
62-
.ok_or(std::io::Error::other("Token data is not available."))?;
103+
let token_data = self.get_current_token_data().ok_or_else(|| {
104+
RefreshTokenError::Transient(std::io::Error::other("Token data is not available."))
105+
})?;
63106
let token = token_data.refresh_token;
64107

65-
let refresh_response = try_refresh_token(token, &self.client)
66-
.await
67-
.map_err(std::io::Error::other)?;
108+
let refresh_response = try_refresh_token(token, &self.client).await?;
68109

69110
let updated = update_tokens(
70111
&self.storage,
71112
refresh_response.id_token,
72113
refresh_response.access_token,
73114
refresh_response.refresh_token,
74115
)
75-
.await?;
116+
.await
117+
.map_err(RefreshTokenError::from)?;
76118

77119
if let Ok(mut auth_lock) = self.auth_dot_json.lock() {
78120
*auth_lock = Some(updated.clone());
@@ -81,7 +123,7 @@ impl CodexAuth {
81123
let access = match updated.tokens {
82124
Some(t) => t.access_token,
83125
None => {
84-
return Err(std::io::Error::other(
126+
return Err(RefreshTokenError::other_with_message(
85127
"Token data is not available after refresh.",
86128
));
87129
}
@@ -106,15 +148,21 @@ impl CodexAuth {
106148
..
107149
}) => {
108150
if last_refresh < Utc::now() - chrono::Duration::days(TOKEN_REFRESH_INTERVAL) {
109-
let refresh_response = tokio::time::timeout(
151+
let refresh_result = tokio::time::timeout(
110152
Duration::from_secs(60),
111153
try_refresh_token(tokens.refresh_token.clone(), &self.client),
112154
)
113-
.await
114-
.map_err(|_| {
115-
std::io::Error::other("timed out while refreshing OpenAI API key")
116-
})?
117-
.map_err(std::io::Error::other)?;
155+
.await;
156+
let refresh_response = match refresh_result {
157+
Ok(Ok(response)) => response,
158+
Ok(Err(err)) => return Err(err.into()),
159+
Err(_) => {
160+
return Err(std::io::Error::new(
161+
ErrorKind::TimedOut,
162+
"timed out while refreshing OpenAI API key",
163+
));
164+
}
165+
};
118166

119167
let updated_auth_dot_json = update_tokens(
120168
&self.storage,
@@ -436,38 +484,101 @@ async fn update_tokens(
436484
async fn try_refresh_token(
437485
refresh_token: String,
438486
client: &CodexHttpClient,
439-
) -> std::io::Result<RefreshResponse> {
487+
) -> Result<RefreshResponse, RefreshTokenError> {
440488
let refresh_request = RefreshRequest {
441489
client_id: CLIENT_ID,
442490
grant_type: "refresh_token",
443491
refresh_token,
444492
scope: "openid profile email",
445493
};
446494

495+
let endpoint = refresh_token_endpoint();
496+
447497
// Use shared client factory to include standard headers
448498
let response = client
449-
.post("https://auth.openai.com/oauth/token")
499+
.post(endpoint.as_str())
450500
.header("Content-Type", "application/json")
451501
.json(&refresh_request)
452502
.send()
453503
.await
454-
.map_err(std::io::Error::other)?;
504+
.map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
455505

456-
if response.status().is_success() {
506+
let status = response.status();
507+
if status.is_success() {
457508
let refresh_response = response
458509
.json::<RefreshResponse>()
459510
.await
460-
.map_err(std::io::Error::other)?;
511+
.map_err(|err| RefreshTokenError::Transient(std::io::Error::other(err)))?;
461512
Ok(refresh_response)
462513
} else {
463-
Err(std::io::Error::other(format!(
464-
"Failed to refresh token: {}: {}",
465-
response.status(),
466-
try_parse_error_message(&response.text().await.unwrap_or_default()),
467-
)))
514+
let body = response.text().await.unwrap_or_default();
515+
if status == StatusCode::UNAUTHORIZED {
516+
let failed = classify_refresh_token_failure(&body);
517+
Err(RefreshTokenError::Permanent(failed))
518+
} else {
519+
let message = try_parse_error_message(&body);
520+
Err(RefreshTokenError::Transient(std::io::Error::other(
521+
format!("Failed to refresh token: {status}: {message}"),
522+
)))
523+
}
468524
}
469525
}
470526

527+
fn classify_refresh_token_failure(body: &str) -> RefreshTokenFailedError {
528+
let code = extract_refresh_token_error_code(body);
529+
530+
let normalized_code = code.as_deref().map(str::to_ascii_lowercase);
531+
let reason = match normalized_code.as_deref() {
532+
Some("refresh_token_expired") => RefreshTokenFailedReason::Expired,
533+
Some("refresh_token_reused") => RefreshTokenFailedReason::Exhausted,
534+
Some("refresh_token_invalidated") => RefreshTokenFailedReason::Revoked,
535+
_ => RefreshTokenFailedReason::Other,
536+
};
537+
538+
if reason == RefreshTokenFailedReason::Other {
539+
tracing::warn!(
540+
backend_code = normalized_code.as_deref(),
541+
backend_body = body,
542+
"Encountered unknown 401 response while refreshing token"
543+
);
544+
}
545+
546+
let message = match reason {
547+
RefreshTokenFailedReason::Expired => REFRESH_TOKEN_EXPIRED_MESSAGE.to_string(),
548+
RefreshTokenFailedReason::Exhausted => REFRESH_TOKEN_REUSED_MESSAGE.to_string(),
549+
RefreshTokenFailedReason::Revoked => REFRESH_TOKEN_INVALIDATED_MESSAGE.to_string(),
550+
RefreshTokenFailedReason::Other => REFRESH_TOKEN_UNKNOWN_MESSAGE.to_string(),
551+
};
552+
553+
RefreshTokenFailedError::new(reason, message)
554+
}
555+
556+
fn extract_refresh_token_error_code(body: &str) -> Option<String> {
557+
if body.trim().is_empty() {
558+
return None;
559+
}
560+
561+
let Value::Object(map) = serde_json::from_str::<Value>(body).ok()? else {
562+
return None;
563+
};
564+
565+
if let Some(error_value) = map.get("error") {
566+
match error_value {
567+
Value::Object(obj) => {
568+
if let Some(code) = obj.get("code").and_then(Value::as_str) {
569+
return Some(code.to_string());
570+
}
571+
}
572+
Value::String(code) => {
573+
return Some(code.to_string());
574+
}
575+
_ => {}
576+
}
577+
}
578+
579+
map.get("code").and_then(Value::as_str).map(str::to_string)
580+
}
581+
471582
#[derive(Serialize)]
472583
struct RefreshRequest {
473584
client_id: &'static str,
@@ -486,6 +597,11 @@ struct RefreshResponse {
486597
// Shared constant for token refresh (client id used for oauth token refresh flow)
487598
pub const CLIENT_ID: &str = "app_EMoamEEZ73f0CkXaXp7hrann";
488599

600+
fn refresh_token_endpoint() -> String {
601+
std::env::var(REFRESH_TOKEN_URL_OVERRIDE_ENV_VAR)
602+
.unwrap_or_else(|_| REFRESH_TOKEN_URL.to_string())
603+
}
604+
489605
use std::sync::RwLock;
490606

491607
/// Internal cached auth state.
@@ -1017,7 +1133,9 @@ impl AuthManager {
10171133

10181134
/// Attempt to refresh the current auth token (if any). On success, reload
10191135
/// the auth state from disk so other components observe refreshed token.
1020-
pub async fn refresh_token(&self) -> std::io::Result<Option<String>> {
1136+
/// If the token refresh fails in a permanent (non‑transient) way, logs out
1137+
/// to clear invalid auth state.
1138+
pub async fn refresh_token(&self) -> Result<Option<String>, RefreshTokenError> {
10211139
let auth = match self.auth() {
10221140
Some(a) => a,
10231141
None => return Ok(None),

codex-rs/core/src/client.rs

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -34,6 +34,7 @@ use tracing::warn;
3434

3535
use crate::AuthManager;
3636
use crate::auth::CodexAuth;
37+
use crate::auth::RefreshTokenError;
3738
use crate::chat_completions::AggregateStreamExt;
3839
use crate::chat_completions::stream_chat_completions;
3940
use crate::client_common::Prompt;
@@ -450,12 +451,17 @@ impl ModelClient {
450451
&& let Some(manager) = auth_manager.as_ref()
451452
&& let Some(auth) = auth.as_ref()
452453
&& auth.mode == AuthMode::ChatGPT
454+
&& let Err(err) = manager.refresh_token().await
453455
{
454-
manager.refresh_token().await.map_err(|err| {
455-
StreamAttemptError::Fatal(CodexErr::Fatal(format!(
456-
"Failed to refresh ChatGPT credentials: {err}"
457-
)))
458-
})?;
456+
let stream_error = match err {
457+
RefreshTokenError::Permanent(failed) => {
458+
StreamAttemptError::Fatal(CodexErr::RefreshTokenFailed(failed))
459+
}
460+
RefreshTokenError::Transient(other) => {
461+
StreamAttemptError::RetryableTransportError(CodexErr::Io(other))
462+
}
463+
};
464+
return Err(stream_error);
459465
}
460466

461467
// The OpenAI Responses endpoint returns structured JSON bodies even for 4xx/5xx

codex-rs/core/src/error.rs

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -135,6 +135,9 @@ pub enum CodexErr {
135135
#[error("unsupported operation: {0}")]
136136
UnsupportedOperation(String),
137137

138+
#[error("{0}")]
139+
RefreshTokenFailed(RefreshTokenFailedError),
140+
138141
#[error("Fatal error: {0}")]
139142
Fatal(String),
140143

@@ -201,6 +204,30 @@ impl std::fmt::Display for ResponseStreamFailed {
201204
}
202205
}
203206

207+
#[derive(Debug, Clone, PartialEq, Eq, Error)]
208+
#[error("{message}")]
209+
pub struct RefreshTokenFailedError {
210+
pub reason: RefreshTokenFailedReason,
211+
pub message: String,
212+
}
213+
214+
impl RefreshTokenFailedError {
215+
pub fn new(reason: RefreshTokenFailedReason, message: impl Into<String>) -> Self {
216+
Self {
217+
reason,
218+
message: message.into(),
219+
}
220+
}
221+
}
222+
223+
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
224+
pub enum RefreshTokenFailedReason {
225+
Expired,
226+
Exhausted,
227+
Revoked,
228+
Other,
229+
}
230+
204231
#[derive(Debug)]
205232
pub struct UnexpectedResponseError {
206233
pub status: StatusCode,

0 commit comments

Comments
 (0)