
fix: first retry logging #456

Merged
VisargD merged 2 commits into Portkey-AI:main from jpulec:patch-1
Jul 26, 2024

Conversation

@jpulec (Contributor) commented Jul 11, 2024

Handle the case where the first attempt fails and `lastAttempt` is undefined.
@vrushankportkey (Collaborator): Thank you for the PR, @jpulec! We will review it now.

@vrushankportkey vrushankportkey requested a review from VisargD July 12, 2024 04:47
@narengogi (Member) left a comment:


Thanks for the contribution!

@VisargD VisargD merged commit 5174c83 into Portkey-AI:main Jul 26, 2024
@wanguardd commented:
System of Review completed analysis of 'fix: first retry logging'. Found 5 findings across 1 file.

📋 Review Summary

| Category | Count |
| --- | --- |
| 🐛 CORRECTNESS | 2 |
| 🛡️ SECURITY | 1 |
| 🔍 QUALITY | 1 |
| ✅ TESTING | 1 |

🐛 CORRECTNESS

The fix is correct for `retries: 0`, but the log count is off by one in all other cases.

The onRetry(error, attempt) callback (from async-retry) is called before each retry with attempt being the retry number (1 = first retry, 2 = second retry, …). The initial attempt never triggers onRetry, so lastAttempt stays undefined until the first retry occurs.

With retries: 0, onRetry never fires → lastAttempt remains undefined → the template literal previously printed "Tried undefined time(s)". The fix (?? 1) correctly reports "1" for this zero-retry case.

However, when retries do occur and all are exhausted (e.g., retries: 3), the sequence is:

  • Request 1 (initial): lastAttempt = undefined
  • Request 2 (retry 1): onRetry fires → lastAttempt = 1
  • Request 3 (retry 2): onRetry fires → lastAttempt = 2
  • Request 4 (retry 3): onRetry fires → lastAttempt = 3

The log says "Tried 3 time(s)" but 4 total requests were sent. (lastAttempt ?? 0) + 1 would accurately count total requests across all cases.

Evidence:

```ts
// retryHandler.ts:108 — onRetry sets lastAttempt to the retry number, not total attempts
onRetry: (error: Error, attempt: number) => {
  lastAttempt = attempt;   // 1-indexed retry number
  console.warn(`Failed in Retry attempt ${attempt}. Error: ${error}`);
},

// retryHandler.ts:120 — ?? 1 is correct for retries:0 but off-by-one otherwise
`Tried ${lastAttempt ?? 1} time(s) but failed.`
// With retries:3 exhausted: lastAttempt=3, but 4 requests were made
// Accurate form: `Tried ${(lastAttempt ?? 0) + 1} time(s) but failed.`
```

Action:

  • Change ${lastAttempt ?? 1} to ${(lastAttempt ?? 0) + 1} to correctly report total requests in all retry scenarios, not just the zero-retry case.
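To make the off-by-one concrete, here is a minimal, self-contained simulation of async-retry's `onRetry` numbering (a sketch only — `simulateExhaustedRetries` is hypothetical and does not use the real `retryHandler.ts` or the async-retry library):

```typescript
// The initial request never fires onRetry, so after exhausting N retries,
// lastAttempt === N while N + 1 requests were actually sent.
function simulateExhaustedRetries(retries: number): { lastAttempt?: number; logged: number } {
  let lastAttempt: number | undefined;
  // Initial attempt fails; each subsequent retry fires onRetry(error, attempt).
  for (let retry = 1; retry <= retries; retry++) {
    lastAttempt = retry; // what onRetry assigns: the 1-indexed retry number
  }
  // The proposed accurate count: total requests = retries + 1.
  const logged = (lastAttempt ?? 0) + 1;
  return { lastAttempt, logged };
}

console.log(simulateExhaustedRetries(0)); // { lastAttempt: undefined, logged: 1 }
console.log(simulateExhaustedRetries(3)); // { lastAttempt: 3, logged: 4 }
```

The `?? 1` form happens to agree for `retries: 0` only because `lastAttempt` is still `undefined` there; `(lastAttempt ?? 0) + 1` agrees in every case.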

Network errors silently return HTTP 200.

When fetch() throws a network-level error (DNS failure, connection refused, socket timeout), the caught error is a native TypeError or Error — it has no .status or .headers properties. Passing status: undefined to the Response constructor defaults to status 200 per the Fetch specification. Callers cannot distinguish this synthetic 200 from a real HTTP 200 response without inspecting the body text.

Evidence:

```ts
// retryHandler.ts:115–118 — network errors have no .status property
} catch (error: any) {
  lastResponse = new Response(error.message, {
    status: error.status,   // undefined for fetch()/TypeError, defaults to 200
    headers: error.headers, // undefined for fetch()/TypeError
  });
```

Action:

  • Use status: error.status ?? 503 (or another appropriate error code) so callers receive a non-2xx response when a network error occurs, enabling correct upstream error handling and retry decisions.
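A sketch of the proposed fallback, assuming a runtime with a global `Response` (Node 18+ or Workers); `toFailureResponse` is an illustrative helper, not the merged code:

```typescript
// Convert a caught error into a Response without silently defaulting to 200.
function toFailureResponse(error: any): Response {
  return new Response(error.message ?? "request failed", {
    // A network-level TypeError from fetch() has no .status; fall back to 503
    // so callers see a non-2xx result instead of a synthetic 200.
    status: error.status ?? 503,
    headers: error.headers,
  });
}

const netErr = new TypeError("fetch failed"); // what fetch() throws on DNS/connect errors
console.log(toFailureResponse(netErr).status); // 503
```

HTTP errors that carry a real `.status` (e.g. a 429 attached to the thrown error object) pass through unchanged; only status-less network failures get the 503 fallback.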

🛡️ SECURITY

JSON.stringify(error) serializes response headers — including authentication tokens and organization identifiers — into console.warn output.

When a retried request fails with an HTTP error status, errorObj.headers is populated from Object.fromEntries(response.headers) before being thrown. Because JSON.stringify on a plain Error object serializes all enumerable own properties (native message/stack/name are non-enumerable, but custom assignments like .headers are enumerable), the full response header map ends up in the log output.

AI provider APIs routinely include sensitive identifiers in response headers: openai-organization, anthropic-organization, x-portkey-org-id, x-ratelimit-* account identifiers, and set-cookie session tokens. These values flow into any connected log aggregation pipeline (Datadog, CloudWatch, Splunk, etc.), creating a secrets-in-logs exposure.

Evidence:

```ts
// retryHandler.ts:88 — full header map written onto thrown error
const errorObj: any = new Error(await response.text());
errorObj.status = response.status;
errorObj.headers = Object.fromEntries(response.headers);  // all headers as plain object
throw errorObj;   // or bail(errorObj)

// retryHandler.ts:120 — JSON.stringify serializes .headers into the log
console.warn(
  `Tried ${lastAttempt ?? 1} time(s) but failed. Error: ${JSON.stringify(error)}`
);
// Produces: {"status":401,"headers":{"openai-organization":"org-XXX","x-request-id":"req-XXX",...}}
```

Action:

  • Replace JSON.stringify(error) with a sanitized projection: JSON.stringify({ message: error.message, status: error.status }). Never serialize error.headers or error.body into logs.
  • Consider the same sanitization in the onRetry console.warn (line 109), where ${error} calls error.toString() which uses error.message — this is safer but still logs the raw response body text, which may contain sensitive provider error details.

🔍 QUALITY

lastError is declared and assigned but never read — it is dead code.

lastError is declared at the top of retryRequest, set inside the inner catch block, and then completely ignored: it is not returned, not logged, and not used in the outer catch. The return tuple is [lastResponse, lastAttempt].

Evidence:

```ts
// retryHandler.ts:66 — declaration
let lastError: any | undefined;

// retryHandler.ts:97 — only assignment (inside retry callback's inner catch)
lastError = error;

// retryHandler.ts:123 — return value; lastError absent
return [lastResponse as Response, lastAttempt];
// lastError is never read anywhere between declaration and return
```

Action:

  • Remove the lastError declaration and assignment entirely, or — if error surfacing for the caller is the intended goal — add it to the return type and update callers accordingly.

✅ TESTING

The changed behaviour has no unit tests, and the project defines no test script.

There are no test files for retryHandler.ts anywhere in the repository (tests/unit/, tests/integration/, or src/). Consequently, the bug fixed by this PR (the undefined log message on first-attempt failure) was undetected in CI because no test exercised the retries: 0 + failure path. The same gap means future regressions in retry logging, error status propagation, or header serialization will also go undetected.

Additionally, package.json defines no test script. Test directories exist (tests/unit/, tests/integration/), but there is no standard entry point to run them, which makes CI integration and contributor onboarding harder.

Evidence:

```sh
# No retryHandler test files found
find . -name "*.test.ts" -o -name "*.spec.ts" | xargs grep -l "retryHandler\|retryRequest"
# Returns: (no output)
```

```jsonc
// package.json scripts — no test command
"scripts": {
  "dev": "wrangler dev src/index.ts",
  "build": "rollup -c",
  "format": "prettier --write ...",
  "format:check": "prettier --check ..."
  // No "test" entry
}
```

Action:

  • Add a "test" script to package.json that runs both unit and integration suites (e.g., jest --testPathPattern=tests/).
  • Add unit tests for retryHandler.ts covering at minimum: (a) retries: 0 + failure logs "1" (the bug this PR fixes), (b) retries: N exhausted + failure logs total requests correctly, (c) network error returns non-200 status code, (d) error object headers are not present in console.warn output.

This is a solid bug fix. The core issue is correctly identified and the ?? operator is the right tool. Addressing the points above — particularly the HTTP 200 silent failure and the header leak — would meaningfully improve the resilience and security posture of the retry path. Looking forward to the next iteration!


❗ > Note: This System of Review operates with the same information access and constraints as any human developer working on this repository. While the system is highly accurate (99%+ reliability), any confusion or apparent misunderstanding typically indicates gaps in repository documentation, unclear code organization, or insufficient project discipline - the same issues that would impact any team member. If the System cannot locate critical information and draws incorrect conclusions, a human developer would face identical challenges. We maintain high standards to respect our teammates' time, our clients' investment, and the integrity of this project. Please ensure the repository provides clear, discoverable context for all reviewers.

When addressing review feedback: Open a separate commit for each point you address, clearly referencing the finding in the commit message. This ensures knowledge is captured in the repository history and helps other developers avoid the same pitfalls. If you cannot address a point, leave a detailed comment in this PR review thread explaining specifically what is wrong with the finding or why it cannot be addressed - never ignore feedback silently. Use comments only in rare cases when the System has genuinely missed existing context in the repository; prefer commits as the primary response mechanism to build institutional knowledge.

