Skip to content

CMR-11116 and CMR-11106: Improve error handling for launchpad tokens#2398

Merged
daniel-zamora merged 6 commits intomasterfrom
CMR-11116
Mar 12, 2026
Merged

CMR-11116 and CMR-11106: Improve error handling for launchpad tokens#2398
daniel-zamora merged 6 commits intomasterfrom
CMR-11116

Conversation

@daniel-zamora
Copy link
Copy Markdown
Contributor

@daniel-zamora daniel-zamora commented Mar 9, 2026

Overview

What is the objective?

Fix launchpad token timeout behavior and improve error propagation from EDL (Earthdata Login) to provide more actionable error messages to users.

What are the changes?

  1. CMR-11116: Fixed launchpad invalid token timeout
  • Fixed cache eviction logic so errors are not cached when re-validating expired tokens
  • Added transient error handling for 429 (rate limit) and 504 (gateway timeout) errors - these are no longer cached and are passed through immediately for retry
  • Corrected token expiration behavior to use absolute expiration times instead of relative times
  1. CMR-11106: Added error propagation from EDL for launchpad tokens
  • Added specific error type mappings for different HTTP status codes (408, 429, 500, 502, 503, 504, 401)
  • EDL error messages are now passed through directly to users instead of being wrapped in generic error messages
  • Improved error messages to be more specific about the nature of the failure (e.g., "Rate limit exceeded", "Gateway timeout")
  1. CMR-11116: Added tests for launchpad token caching behavior
  • Added comprehensive tests for token expiration, cache eviction, transient vs non-transient error handling
  • Enhanced mock URS to support stateful token expiration for realistic testing
  • Added cache size assertions and multi-key eviction tests

What areas of the application does this impact?

  • Ingest - Launchpad token authentication and caching
  • Access Control - User authentication via Launchpad tokens
  • transmit-lib - URS client and launchpad user cache
  • mock-echo-app - Mock URS for testing

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced number of system state resets by updating fixtures. Ex) (use-fixtures :each (ingest/reset-fixture {})) to be :once instead of :each

Summary by CodeRabbit

  • New Features

    • Broader error handling for additional HTTP statuses (timeouts, rate limits, server/gateway issues) with clearer responses.
    • Improved Launchpad token validation with per-token expiration tracking and deterministic test tokens.
    • Introduced cache-eviction capability across cache implementations for finer control and to avoid persisting transient failures.
  • Tests

    • Added tests for transient vs non‑transient error caching and for cache eviction behavior.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 9, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

Adds cache eviction APIs and implementations, extends HTTP error-to-status mappings, refines Launchpad token validation/caching (SHA-256 keys, transient-error non-caching, eviction on expiration), updates mock Echo token behaviors, and adds/adjusts integration and unit tests.

Changes

Cohort / File(s) Summary
Cache protocol & in-memory
common-lib/src/cmr/common/cache.clj, common-lib/src/cmr/common/cache/in_memory_cache.clj, common-lib/src/cmr/common/cache/fallback_cache.clj, common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
Added evict to CmrCache protocol and implemented evict across InMemoryCache, FallbackCache, and SingleThreadLookupCache to remove specific keys from caches.
Redis cache
redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
Added evict implementation for RedisCache that deletes the serialized key from Redis.
Cache tests
common-lib/test/cmr/common/test/cache/in_memory_cache.clj
Added tests verifying eviction removes keys, triggers reload behavior, and preserves unrelated entries.
Launchpad cache & transmit changes
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj, transmit-lib/src/cmr/transmit/urs.clj, transmit-lib/src/cmr/transmit/tokens.clj
Switched cache keys to DigestUtils/sha256Hex(token), added transient-error-types, avoid caching transient errors (e.g., 429, 504), evict expired entries, and added fine-grained HTTP-status handling (408,429,500,502,503,504) and 429 handling in tokens flow.
Mock Echo URS & routes
mock-echo-app/src/cmr/mock_echo/api/urs.clj, mock-echo-app/src/cmr/mock_echo/api/routes.clj
Added launchpad-token-validations and reset-launchpad-tokens! for deterministic token expiration; adjusted reset route and removed unused requires; added token behaviors for 429/504/invalid cases.
Integration tests
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
Added tests asserting transient errors (429, 504) are not cached while non-transient errors are cached; updated token-key derivation to SHA-256.
HTTP error mappings
common-lib/src/cmr/common/api/errors.clj
Expanded type->http-status-code map to include :request-timeout (408), :internal-server-error (500), and :bad-gateway (502).
Manifests
project.clj, deps.edn
Referenced/updated dependencies to support new imports (DigestUtils, time utilities).

Sequence Diagram

sequenceDiagram
    participant Client
    participant Cache as "Token Cache"
    participant EDL as "EDL / Launchpad"
    participant Mock as "Mock Echo"

    Client->>Cache: validate(token)
    alt Cached & valid
        Cache-->>Client: return cached user
    else Cached & error (non-transient)
        Cache-->>Client: return cached error payload
    else Not cached or expired
        Cache->>Cache: evict(key)
        Client->>EDL: request validation(token)
        EDL-->>Mock: upstream response (200 / 401 / 408 / 429 / 500 / 502 / 503 / 504)
        alt Success (200)
            EDL-->>Cache: user data
            Cache->>Cache: store user with expiration
            Cache-->>Client: return user
        else Transient (429,504)
            EDL-->>Client: return transient error (do not cache)
        else Non-transient (401,408,500,502,503)
            EDL-->>Cache: error payload
            Cache->>Cache: store error with expiration
            Cache-->>Client: return error
        end
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • zimzoom
  • jceaser
  • eereiter

Poem

🐰
I nibbled stale keys beneath the log,
Evicted cobwebs from the cachey bog.
Tokens ticked, some fleeting, some grave,
I keep the swift, but never save the wave.
Hop, validate — a rabbit's rave!

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: improving error handling for launchpad tokens, with reference to the relevant CMR issue numbers.
Description check ✅ Passed The description comprehensively covers all required sections including objectives, detailed changes across both CMR issues, impacted areas, and completed checklists for testing, linting, and cleanup.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch CMR-11116

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@daniel-zamora daniel-zamora changed the title Cmr 11116 CMR-11116 and CMR-11106: Improve error handling for launchpad tokens Mar 9, 2026
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

98-109: Clarify test intent: verifies re-fetch after cache expiry, not cached error persistence.

The test name suggests "Non-transient errors are cached for 5 minutes," but advancing time by 301 seconds exceeds the 5-minute (300s) cache TTL. At that point, the cache entry would have expired, so the second request is actually re-fetching from the mock (which still returns 401).

If the intent is to verify the error remains cached within the 5-minute window, consider advancing time by less than 300 seconds. If the intent is to verify behavior after cache expiry, the test name and comment should reflect that.

💡 If testing cached error persistence within TTL
-      (dev-sys-util/advance-time! 301)
+      (dev-sys-util/advance-time! 200) ;; Still within 5-minute cache TTL
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 109, The test non-transient-errors-are-cached-test is
ambiguous: either change the TTL advance to stay within 5 minutes (use
dev-sys-util/advance-time! 299) to verify the 401 stays cached and the second
ingest/ingest-concept still hits the cache, or keep the 301s advance but rename
the test (and any assertion text) to indicate "after cache expiry" and add an
assertion using cache-util/list-cache-keys (or check token-key) that the cache
entry is gone before the second ingest/ingest-concept; update the test
name/comment accordingly and adjust assertions to match the chosen intent.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@common-lib/src/cmr/common/cache.clj`:
- Around line 41-43: Implement the missing evict method for each CmrCache
implementation: add an evict implementation to FallbackCache that calls evict on
both primary and fallback caches (or on the underlying cache instances
represented in the FallbackCache record), add an evict to
SingleThreadLookupCache that removes the key from its local lookup map and
delegates to the wrapped cache's evict, and add an evict to RedisCache that
performs the Redis deletion logic currently used for put/remove (using the
existing redis client/connection code) so it deletes the key from Redis; ensure
each method signature matches the CmrCache protocol (evict [cache cache-key])
and delegates to the appropriate existing functions/fields in the FallbackCache,
SingleThreadLookupCache, and RedisCache implementations.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 58-72: When a cached Launchpad entry expires and
get-launchpad-user-fn returns a non-valid result, the code currently throws the
error without re-caching it; change the expired-path so that on a non-valid
fresh-result you set the cache (using cache/set-value with the fresh-result)
before throwing the service error, mirroring the initial cache-miss behavior for
errors (use the same get-launchpad-user-fn handling and caching behavior), and
keep the existing errors/throw-service-error call to raise the error after
caching.

---

Nitpick comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 98-109: The test non-transient-errors-are-cached-test is
ambiguous: either change the TTL advance to stay within 5 minutes (use
dev-sys-util/advance-time! 299) to verify the 401 stays cached and the second
ingest/ingest-concept still hits the cache, or keep the 301s advance but rename
the test (and any assertion text) to indicate "after cache expiry" and add an
assertion using cache-util/list-cache-keys (or check token-key) that the cache
entry is gone before the second ingest/ingest-concept; update the test
name/comment accordingly and adjust assertions to match the chosen intent.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 2f90ba1e-3569-4914-8eaa-2a22732dffa3

📥 Commits

Reviewing files that changed from the base of the PR and between 22c44cb and 3d77b74.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
common-lib/src/cmr/common/cache.clj (1)

41-43: ⚠️ Potential issue | 🟠 Major

Finish the evict rollout before extending CmrCache.

FallbackCache, SingleThreadLookupCache, and RedisCache still implement cache/CmrCache without an evict method. Any cache/evict dispatch to one of those types will fail, so this protocol change is still incomplete.

Run this to confirm every CmrCache implementation picked up the new method. Expected result: no MISSING evict lines.

#!/bin/bash
set -euo pipefail

rg -l --type clojure 'cache/CmrCache' common-lib/src redis-utils-lib/src | while read -r file; do
  if ! rg -q '\(evict\b' "$file"; then
    echo "MISSING evict: $file"
  fi
done
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common-lib/src/cmr/common/cache.clj` around lines 41 - 43, The protocol
CmrCache was extended with an evict method but FallbackCache,
SingleThreadLookupCache, and RedisCache do not implement it, causing dispatch
failures; add an evict implementation to each of these types (implement the
protocol method named evict / cache/evict) that matches their semantics: for
FallbackCache delegate eviction to both/backing caches or the appropriate inner
cache, for SingleThreadLookupCache remove the key from its local lookup store
and delegate to its backing cache, and for RedisCache perform the appropriate
key removal from Redis; after changes run the provided rg script to verify no
"MISSING evict" results.
🧹 Nitpick comments (2)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

98-110: This test still doesn't prove the cached 401 was refreshed after 5 minutes.

The second request returns 401 whether the cache expired and revalidated or the stale error was served again. Capturing the cached :expiration-time before advancing time and asserting it moves forward afterward would make this test cover the timeout behavior it is naming.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 98 - 110, The test non-transient-errors-are-cached-test should
capture the cache entry's expiration timestamp before advancing time and assert
that it is updated after the second ingest; locate the cache entry using the
token-key variable and cache-util/list-cache-keys or a cache-read helper to
fetch the cached map (which contains :expiration-time), store that
:expiration-time in a local binding, call dev-sys-util/advance-time! 301 and
perform the second ingest, then read the cache entry again and assert that the
new :expiration-time is greater than the previously captured value to prove the
entry was refreshed after expiry.
mock-echo-app/src/cmr/mock_echo/api/urs.clj (1)

66-76: Consider using the stored :expires-in value for consistency.

The :expires-in is stored in the atom on line 75 but never read back. Line 70 always uses the local constant expires-in instead of (:expires-in validation-info). While this works correctly for the current test token (always 1600 seconds), it would be more consistent to use the stored value:

♻️ Proposed fix to use stored expires-in
       (let [expires-in 1600
             validation-info (get `@launchpad-token-validations` token)]
         (if validation-info
-          (let [elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
-            (if (>= elapsed-seconds expires-in)
+          (let [stored-expires-in (:expires-in validation-info)
+                elapsed-seconds (t/in-seconds (t/interval (:first-validated-at validation-info) (time-keeper/now)))]
+            (if (>= elapsed-seconds stored-expires-in)
               {:status 401 :body {:error (format "Launchpad token (partially redacted) [%s] has expired."
                                                   (common-util/scrub-token token))}}
-              {:status 200 :body {:uid "user1" :lp_token_expires_in (- expires-in elapsed-seconds)}}))
+              {:status 200 :body {:uid "user1" :lp_token_expires_in (- stored-expires-in elapsed-seconds)}}))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@mock-echo-app/src/cmr/mock_echo/api/urs.clj` around lines 66 - 76, The code
binds a local expires-in constant but then stores :expires-in in
launchpad-token-validations and never reads it; update the logic in the branch
that reads validation-info to use (:expires-in validation-info) instead of the
local expires-in when comparing elapsed-seconds and when computing
lp_token_expires_in, so the code consistently uses the stored :expires-in value
from validation-info (the atom launchpad-token-validations and the map key
:expires-in) rather than the hardcoded expires-in local.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 83-96: Update the assertions in transient-errors-not-cached-test
to match the new Launchpad messages emitted by
transmit-lib/src/cmr/transmit/urs.clj: replace the re-find checks for "Rate
limit exceeded" and "Gateway timeout" with patterns that match "Launchpad rate
limit exceeded" and "(gateway timeout)" (or use case-insensitive patterns like
(?i)rate limit exceeded and (?i)gateway timeout) so the ingest/ingest-concept
responses asserted in transient-errors-not-cached-test will match the current
URS messages.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 56-57: Currently cache-key is created with (keyword (str (hash
token))) which interns a keyword per token and exposes collision/memory-leak
risks; change cache-key to be a plain string (not a keyword) derived from the
token or, preferably, a collision-resistant digest/HMAC (e.g., SHA-256 or
HMAC-SHA256 of token) and use that string when calling cache/get-value with
get-launchpad-user-fn so tokens are not interned as keywords and collisions are
minimized.
- Around line 61-74: After evicting the stale entry, do not write a failing
revalidation result back into the cache; change the logic in the block using
cache/evict, get-launchpad-user-fn, cache/set-value, and
errors/throw-service-error so that if fresh-result is valid you set
cache/set-value and return it, but if fresh-result is invalid you do NOT call
cache/set-value and instead directly raise the error
(errors/throw-service-error) using the existing (:error-type/:error-message)
(with the scrubbed token message fallback); this preserves eviction on failed
revalidation instead of caching the failure.

---

Duplicate comments:
In `@common-lib/src/cmr/common/cache.clj`:
- Around line 41-43: The protocol CmrCache was extended with an evict method but
FallbackCache, SingleThreadLookupCache, and RedisCache do not implement it,
causing dispatch failures; add an evict implementation to each of these types
(implement the protocol method named evict / cache/evict) that matches their
semantics: for FallbackCache delegate eviction to both/backing caches or the
appropriate inner cache, for SingleThreadLookupCache remove the key from its
local lookup store and delegate to its backing cache, and for RedisCache perform
the appropriate key removal from Redis; after changes run the provided rg script
to verify no "MISSING evict" results.

---

Nitpick comments:
In `@mock-echo-app/src/cmr/mock_echo/api/urs.clj`:
- Around line 66-76: The code binds a local expires-in constant but then stores
:expires-in in launchpad-token-validations and never reads it; update the logic
in the branch that reads validation-info to use (:expires-in validation-info)
instead of the local expires-in when comparing elapsed-seconds and when
computing lp_token_expires_in, so the code consistently uses the stored
:expires-in value from validation-info (the atom launchpad-token-validations and
the map key :expires-in) rather than the hardcoded expires-in local.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 98-110: The test non-transient-errors-are-cached-test should
capture the cache entry's expiration timestamp before advancing time and assert
that it is updated after the second ingest; locate the cache entry using the
token-key variable and cache-util/list-cache-keys or a cache-read helper to
fetch the cached map (which contains :expiration-time), store that
:expiration-time in a local binding, call dev-sys-util/advance-time! 301 and
perform the second ingest, then read the cache entry again and assert that the
new :expiration-time is greater than the previously captured value to prove the
entry was refreshed after expiry.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e071585c-c7b1-4cf8-bfc2-3caf1c259ae8

📥 Commits

Reviewing files that changed from the base of the PR and between 3d77b74 and 392df50.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (1)
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

100-112: Consider removing unused token-key variable.

The token-key computed on line 103 is never used in the test assertions. If no cache key verification is intended, this variable can be removed to improve clarity.

♻️ Proposed fix to remove unused variable
 (deftest non-transient-errors-are-cached-test
   (testing "Non-transient errors are cached for 5 minutes"
-    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"
-          token-key (DigestUtils/sha256Hex token)]
+    (let [token "ABC-INV-ZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZZ"]
       (let [resp (ingest/ingest-concept (data-umm-c/collection-concept {}) {:token token})]
         (is (= 401 (:status resp))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 100 - 112, The test non-transient-errors-are-cached-test computes
token-key using (DigestUtils/sha256Hex token) but never uses it; remove the
unused token-key binding to clean up the test, i.e., delete the token-key
binding declaration that references DigestUtils/sha256Hex and keep the rest of
the test (token, ingest/ingest-concept calls, cache-util/list-cache-keys check,
and dev-sys-util/advance-time! invocation) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`:
- Around line 100-112: The test non-transient-errors-are-cached-test computes
token-key using (DigestUtils/sha256Hex token) but never uses it; remove the
unused token-key binding to clean up the test, i.e., delete the token-key
binding declaration that references DigestUtils/sha256Hex and keep the rest of
the test (token, ingest/ingest-concept calls, cache-util/list-cache-keys check,
and dev-sys-util/advance-time! invocation) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 443d2468-6180-4ffb-89de-9406fc1346c5

📥 Commits

Reviewing files that changed from the base of the PR and between 392df50 and faf4a20.

📒 Files selected for processing (2)
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

@daniel-zamora daniel-zamora force-pushed the CMR-11116 branch 2 times, most recently from 5ef6ea1 to 4ee6718 Compare March 10, 2026 04:35
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently only excludes
:too-many-requests and :gateway-timeout but upstream urs/get-launchpad-user can
throw :request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable; update the transient-error-types definition to include
those four symbols so the cache logic that treats other errors as invalid-token
results (e.g. the invalid-token-timeout/invalid-token caching path) will skip
caching for these transient server-side failures and let callers retry
immediately.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f0de632c-521c-49f9-9bbe-2174a428b74c

📥 Commits

Reviewing files that changed from the base of the PR and between faf4a20 and 5ef6ea1.

📒 Files selected for processing (10)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj
🚧 Files skipped from review as they are similar to previous changes (3)
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • common-lib/src/cmr/common/api/errors.clj

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 10, 2026

Codecov Report

❌ Patch coverage is 20.95238% with 83 lines in your changes missing coverage. Please review.
✅ Project coverage is 57.84%. Comparing base (22c44cb) to head (65e6cea).

Files with missing lines Patch % Lines
transmit-lib/src/cmr/transmit/urs.clj 0.00% 35 Missing ⚠️
...smit-lib/src/cmr/transmit/launchpad_user_cache.clj 8.00% 23 Missing ⚠️
mock-echo-app/src/cmr/mock_echo/api/urs.clj 15.78% 16 Missing ⚠️
common-lib/src/cmr/common/cache/cache_spec.clj 64.28% 0 Missing and 5 partials ⚠️
mock-echo-app/src/cmr/mock_echo/api/routes.clj 0.00% 2 Missing ⚠️
...edis-utils-lib/src/cmr/redis_utils/redis_cache.clj 50.00% 0 Missing and 1 partial ⚠️
transmit-lib/src/cmr/transmit/tokens.clj 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2398      +/-   ##
==========================================
- Coverage   57.88%   57.84%   -0.04%     
==========================================
  Files        1067     1067              
  Lines       73351    73440      +89     
  Branches     2124     2129       +5     
==========================================
+ Hits        42462    42485      +23     
- Misses      28911    28968      +57     
- Partials     1978     1987       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Copy Markdown
Contributor

@eereiter eereiter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other CmrCaches should be updated with evict, even if they don't do anything. That way we avoid exceptions being thrown. Also there are a couple of other code rabbit suggestions. Please let me know if they are not valid and then I can approve this PR.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj (1)

29-31: ⚠️ Potential issue | 🟠 Major

Expand transient error types to include all server-side availability errors.

The transient-error-types set currently only includes :too-many-requests and :gateway-timeout, but urs.clj now throws additional server-side error types (:request-timeout, :internal-server-error, :bad-gateway, :service-unavailable) that represent temporary EDL/Launchpad outages. These should also be treated as transient to avoid caching valid tokens as failures during temporary service disruptions.

Suggested fix to include all transient error types
 (def transient-error-types
   "Error types that should not be cached - they are transient and should be retried immediately"
-  #{:too-many-requests :gateway-timeout})
+  #{:request-timeout
+    :too-many-requests
+    :internal-server-error
+    :bad-gateway
+    :service-unavailable
+    :gateway-timeout})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj` around lines 29 - 31,
The transient-error-types set currently only contains :too-many-requests and
:gateway-timeout; update the var transient-error-types to also include the
additional server-side availability keywords thrown by urs.clj—:request-timeout,
:internal-server-error, :bad-gateway, and :service-unavailable—so these error
kinds are treated as transient (not cached) during EDL/Launchpad outages; modify
the literal set in transmit-lib/src/cmr/transmit/launchpad_user_cache.clj to
include those symbols.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently only contains
:too-many-requests and :gateway-timeout; update the var transient-error-types to
also include the additional server-side availability keywords thrown by
urs.clj—:request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable—so these error kinds are treated as transient (not cached)
during EDL/Launchpad outages; modify the literal set in
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj to include those symbols.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 4b11b2ce-999c-4119-80d4-b10627f3bf20

📥 Commits

Reviewing files that changed from the base of the PR and between 5ef6ea1 and f9cd940.

📒 Files selected for processing (6)
  • common-lib/src/cmr/common/cache/fallback_cache.clj
  • common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj

@daniel-zamora daniel-zamora merged commit e423288 into master Mar 12, 2026
7 checks passed
eereiter pushed a commit that referenced this pull request Mar 19, 2026
…2398)

* CMR-11116: fixes launchpad invalid token timeout

* CMR-11106: adds error propagation from edl for launchpad tokens

* CMR-11116: adds tests for launchpad token caching behavior

* CMR-11116: pr comments

* CMR-11116: pr nitpicks

* CMR-11116: pr nitpicks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants