
DRAFT Cmr 10838 erich #2397

Closed
eereiter wants to merge 7 commits into master from CMR-10838-Erich

Conversation

@eereiter
Contributor

@eereiter eereiter commented Mar 9, 2026

Overview

What is the objective?

Please summarize the reason for the changes made in this PR.

What are the changes?

Summarize what you have changed.

What areas of the application does this impact?

List impacted applications

Required Checklist

  • New and existing unit and int tests pass locally and remotely
  • clj-kondo has been run locally and all errors in changed files are corrected
  • I have commented my code, particularly in hard-to-understand areas
  • I have made changes to the documentation (if necessary)
  • My changes generate no new warnings

Additional Checklist

  • I have removed unnecessary/dead code and imports in files I have changed
  • I have cleaned up integration tests by doing one or more of the following:
    • migrated any are2 tests to are3 in files I have changed
    • de-duped, consolidated, removed dead int tests
    • transformed applicable int tests into unit tests
    • reduced the number of system state resets by updating fixtures, e.g., changing (use-fixtures :each (ingest/reset-fixture {})) to use :once instead of :each

Summary by CodeRabbit

  • Refactor

    • Granule counts are now served from a caching layer for faster, more consistent responses; provider filtering is applied after retrieval.
  • Bug Fixes / Security

    • Zip extraction now rejects entries that would escape the target directory (prevents path‑traversal).
    • Upstream/auth error types now map to appropriate HTTP status codes for clearer error responses.
  • Chores

    • Background refresh for the has-granules map standardized as a scheduled job and simplified.
    • Cache implementations gained single-key eviction support for targeted invalidation.

@coderabbitai

coderabbitai bot commented Mar 9, 2026

📝 Walkthrough

Walkthrough

Removed direct Elasticsearch collection-granule-count retrieval and switched callers to use the granule-counts-cache API; added cache eviction across multiple cache implementations; hardened Launchpad token handling and error mappings; removed split-cluster logging/migration paths; added zip-entry directory validation for shapefile extraction and corresponding tests.

Changes

Cohort / File(s) Summary
Granule counts refactor
search-app/src/cmr/search/data/elastic_search_index.clj, search-app/src/cmr/search/data/granule_counts_cache.clj, search-app/src/cmr/search/services/query_execution/has_granules_or_cwic_results_feature.clj, search-app/src/cmr/search/services/query_execution/has_granules_results_feature.clj, search-app/src/cmr/search/services/query_service.clj, search-app/test/cmr/search/test/unit/data/granule_counts_cache_test.clj
Deleted ES-side get-collection-granule-counts; callers now use granule-counts-cache/get-granule-counts and client-side filtering; updated function arities, simplified cache refresh logging, and adjusted tests to match new callback arity.
Cache protocol + implementations + tests
common-lib/src/cmr/common/cache.clj, common-lib/src/cmr/common/cache/cache_spec.clj, common-lib/src/cmr/common/cache/fallback_cache.clj, common-lib/src/cmr/common/cache/in_memory_cache.clj, common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj, common-lib/test/cmr/common/test/cache/in_memory_cache.clj, redis-utils-lib/src/cmr/redis_utils/redis_cache.clj, redis-utils-lib/test/cmr/redis_utils/test/test_redis_cache.clj, transmit-lib/src/cmr/transmit/cache/consistent_cache.clj
Added evict to CmrCache protocol; implemented evict in multiple cache records (in-memory, fallback, single-thread wrapper, Redis, consistent memory); added/updated unit tests validating eviction behavior.
Launchpad / EDL token handling & caching
transmit-lib/src/cmr/transmit/launchpad_user_cache.clj, transmit-lib/src/cmr/transmit/tokens.clj, transmit-lib/src/cmr/transmit/urs.clj, mock-echo-app/src/cmr/mock_echo/api/urs.clj, mock-echo-app/src/cmr/mock_echo/api/routes.clj, system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
Changed cache key hashing to SHA-256, introduced transient-error-types, added eviction-on-expiration and non-caching for transient errors, expanded EDL/URS error-type→HTTP mappings including 429 handling, added token validation simulation state and reset helper in mock-echo, and updated integration tests for new behaviors.
Split-cluster/logging and index-set cleanup
elastic-utils-lib/src/cmr/elastic_utils/config.clj, elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj, elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj, indexer-app/src/cmr/indexer/data/elasticsearch.clj, indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj
Removed split-cluster-log-toggle config and conditional logging; removed legacy get-old-index-set migration path and related transitional branches; minor local rebinding in send-with-retry.
Shapefile unzip safety
search-app/src/cmr/search/services/parameters/converters/shapefile.clj, search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj
Added validate-entry-dir to assert zip entries cannot escape target directory; integrated into unzip-file and added tests covering malicious and valid entries.
API error mapping
common-lib/src/cmr/common/api/errors.clj
Extended type->http-status-code to include :request-timeout (408), :internal-server-error (500), and :bad-gateway (502).
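As a rough illustration of the error-mapping change above, the new entries can be sketched as a plain Java map (a hypothetical analogue; the actual code is a Clojure map named type->http-status-code in cmr.common.api.errors, and only the entries named above are shown):

```java
import java.util.Map;

public class ErrorStatus {
    // Hypothetical Java analogue of the extended Clojure mapping.
    static final Map<String, Integer> TYPE_TO_STATUS = Map.of(
        "request-timeout", 408,       // upstream took too long to respond
        "internal-server-error", 500, // unexpected upstream failure
        "bad-gateway", 502);          // invalid response from upstream

    public static void main(String[] args) {
        System.out.println(TYPE_TO_STATUS.get("bad-gateway")); // prints 502
    }
}
```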

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

hacktoberfest-accepted

Suggested reviewers

  • DuJuan
  • jmaeng72
  • ygliuvt

Poem

🐇 I hopped through caches, hashes, and logs,
Evicted the crumbs, fixed zips and the frogs,
Tokens now age with a sigh and a thump,
Counts come from one place — no more double-jump,
Hooray for small fixes — time for a carrot chomp! 🥕

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

  • Description check (⚠️ Warning): The description is entirely a template with no concrete information filled in; no objectives, changes, impacted areas, or checklist items are addressed with specific details. Resolution: complete all required sections with specific details: explain the objective (why these changes?), summarize the concrete changes made, list which applications are impacted, and address each checklist item with evidence.
  • Title check (❓ Inconclusive): The title is vague and non-descriptive, using a draft prefix and a generic branch identifier rather than summarizing the actual changes made in the pull request. Resolution: replace the title with a clear, concise description of the main changes; for example, if the primary objective is removing legacy code or refactoring, state that explicitly rather than using draft/branch references.

✅ Passed checks (1 passed)

  • Docstring Coverage (✅ Passed): No functions found in the changed files to evaluate docstring coverage; skipping the docstring coverage check.


@eereiter eereiter changed the title from "Cmr 10838 erich" to "DRAFT Cmr 10838 erich" on Mar 9, 2026

@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (2)
search-app/src/cmr/search/data/granule_counts_cache.clj (2)

83-91: Consider using a set for provider-id lookups.

The current some #(= ... %) provider-ids is O(n) per collection entry. Converting provider-ids to a set would make lookups O(1), which could matter if the counts map is large.

♻️ Optional optimization
   ([context provider-ids get-collection-granule-counts-fn]
    (let [cache (cache/context->cache context granule-counts-cache-key)
          counts (cache/get-value cache
                                  granule-counts-cache-key
                                  #(get-collection-granule-counts-fn context))]
      (if (seq provider-ids)
-       (into {} (filter (fn [[k _]]
-                          (some #(= (concepts/concept-id->provider-id k) %) provider-ids))
-                        counts))
+       (let [provider-id-set (set provider-ids)]
+         (into {} (filter (fn [[k _]]
+                            (provider-id-set (concepts/concept-id->provider-id k)))
+                          counts)))
        counts))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-app/src/cmr/search/data/granule_counts_cache.clj` around lines 83 -
91, The provider-ids list is used for repeated membership checks inside the
filter, causing O(n) work per entry; convert provider-ids to a set once (e.g.,
provider-ids-set via set) and replace the some #(=
(concepts/concept-id->provider-id k) %) call with a constant-time membership
test like (contains? provider-ids-set (concepts/concept-id->provider-id k)) in
the filter that produces the returned map (referencing the local symbols
provider-ids, counts, and concepts/concept-id->provider-id).

45-62: Move the extra-collection-count check to execute once per provider, not per collection.

The do block checks extra-collection-count > 0 on every coll-bucket iteration, but the value is constant within a provider. This repeats the check N times (once per collection) instead of once per provider. Move the check into the :let binding immediately after computing extra-collection-count.

♻️ Proposed refactor
     (into {} (for [provider-bucket (get-in results [:aggregations :by-provider :buckets])
-                   :let [extra-collection-count (get-in provider-bucket [:by-collection-id :sum_other_doc_count])]
+                   :let [extra-collection-count (get-in provider-bucket [:by-collection-id :sum_other_doc_count])
+                         provider-id (:key provider-bucket)
+                         _ (when (> extra-collection-count 0)
+                             (e/internal-error!
+                              (format "Provider %s has more collections ([%s]) with granules than we support"
+                                      provider-id extra-collection-count)))]
                    coll-bucket (get-in provider-bucket [:by-collection-id :buckets])
-                   :let [provider-id (:key provider-bucket)
-                         coll-seq-id (:key coll-bucket)
+                   :let [coll-seq-id (:key coll-bucket)
                          num-granules (:doc_count coll-bucket)]]
-               (do
-                 ;; It's possible that there are more collections in the provider than we expected.
-                 ;; :sum_other_doc_count will be greater than 0 in that case.
-                 (when (> extra-collection-count 0)
-                   (e/internal-error!
-                    (format "Provider %s has more collections ([%s]) with granules than we support"
-                            provider-id extra-collection-count)))
-
-                 [(concepts/build-concept-id {:sequence-number coll-seq-id
-                                              :provider-id provider-id
-                                              :concept-type :collection})
-                  num-granules])))))
+               [(concepts/build-concept-id {:sequence-number coll-seq-id
+                                            :provider-id provider-id
+                                            :concept-type :collection})
+                num-granules]))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-app/src/cmr/search/data/granule_counts_cache.clj` around lines 45 -
62, The extra-collection-count check is performed inside the per-collection do
block, causing it to run for every coll-bucket; move that check to run once per
provider by performing it immediately after computing extra-collection-count in
the :let for provider-bucket (instead of inside the inner do/when). Concretely:
remove the when/e/internal-error! from inside the coll-bucket body and add a
single when/e/internal-error! guarded by extra-collection-count in the
provider-level :let (the same scope where extra-collection-count and provider-id
are computed) so the error is raised once per provider; leave the
concepts/build-concept-id and num-granules mapping for each coll-bucket
unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e445e653-ea45-4a64-902a-b722b1f08ac2

📥 Commits

Reviewing files that changed from the base of the PR and between 8c8ca5f and 4b727e1.

📒 Files selected for processing (2)
  • search-app/src/cmr/search/data/granule_counts_cache.clj
  • search-app/test/cmr/search/test/unit/data/granule_counts_cache_test.clj

@codecov-commenter

codecov-commenter commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 4.58015% with 125 lines in your changes missing coverage. Please review.
✅ Project coverage is 24.12%. Comparing base (22c44cb) to head (024ff64).
⚠️ Report is 3 commits behind head on master.

Files with missing lines Patch % Lines
transmit-lib/src/cmr/transmit/urs.clj 0.00% 35 Missing ⚠️
...h-app/src/cmr/search/data/granule_counts_cache.clj 0.00% 30 Missing ⚠️
...smit-lib/src/cmr/transmit/launchpad_user_cache.clj 8.00% 23 Missing ⚠️
mock-echo-app/src/cmr/mock_echo/api/urs.clj 15.78% 16 Missing ⚠️
...tils-lib/src/cmr/elastic_utils/search/es_index.clj 0.00% 4 Missing ⚠️
...earch/services/parameters/converters/shapefile.clj 20.00% 4 Missing ⚠️
...s/query_execution/has_granules_results_feature.clj 0.00% 3 Missing ⚠️
mock-echo-app/src/cmr/mock_echo/api/routes.clj 0.00% 2 Missing ⚠️
...edis-utils-lib/src/cmr/redis_utils/redis_cache.clj 0.00% 2 Missing ⚠️
...it-lib/src/cmr/transmit/cache/consistent_cache.clj 0.00% 2 Missing ⚠️
... and 4 more

❗ There is a different number of reports uploaded between BASE (22c44cb) and HEAD (024ff64).

HEAD has 23 fewer uploads than BASE:
  • BASE (22c44cb): 24 uploads
  • HEAD (024ff64): 1 upload
Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2397       +/-   ##
===========================================
- Coverage   57.88%   24.12%   -33.77%     
===========================================
  Files        1067     1007       -60     
  Lines       73351    69938     -3413     
  Branches     2124      978     -1146     
===========================================
- Hits        42462    16873    -25589     
- Misses      28911    52087    +23176     
+ Partials     1978      978     -1000     


daniel-zamora and others added 4 commits March 19, 2026 08:34
…2398)

* CMR-11116: fixes launchpad invalid token timeout

* CMR-11106: adds error propagation from edl for launchpad tokens

* CMR-11116: adds tests for launchpad token caching behavior

* CMR-11116: pr comments

* CMR-11116: pr nitpicks

* CMR-11116: pr nitpicks
* logic

* add new line

* update test

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
search-app/src/cmr/search/services/parameters/converters/shapefile.clj (1)

65-77: ⚠️ Potential issue | 🟠 Major

Recursively delete the temporary directory when extraction fails.

The File.delete() method (line 76) silently fails on non-empty directories—it returns false rather than throwing an exception. If extraction partially succeeds before an exception is raised, already-extracted files remain behind in the temporary directory, causing a resource leak.

The fix uses file-seq to recursively walk the directory tree and reverse to delete files before their parent directories:

Suggested fix
       (catch Exception e
-        (.delete (.toFile target-dir))
+        (run! io/delete-file (reverse (file-seq (.toFile target-dir))))
         (errors/throw-service-error :bad-request (str "Error while uncompressing zip file: " (.getMessage e))))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-app/src/cmr/search/services/parameters/converters/shapefile.clj`
around lines 65 - 77, The catch block currently calls (.delete (.toFile
target-dir)) which silently fails on non-empty directories; change it to
recursively delete the temporary extraction directory by walking the tree from
(.toFile target-dir) using file-seq, reversing the sequence so files are removed
before their parent directories, and deleting each File (so partially-extracted
files are removed on error). Keep the rest of the try/catch and still throw the
same service error via errors/throw-service-error; reference the existing
symbols target-dir and (.toFile target-dir) and the surrounding extraction logic
(validate-entry-dir, io/copy) when making the change.
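The recursive-delete approach this comment describes (walk the tree, delete deepest paths first so each directory is empty when its turn comes) can be sketched in plain Java; the class and method names here are illustrative, not from the PR:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

public class RecursiveDelete {
    // Deletes dir and everything under it. Sorting the walked paths in
    // reverse order guarantees files are deleted before their parent
    // directories, so File.delete never hits a non-empty directory.
    static void deleteRecursively(Path dir) throws IOException {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            walk.sorted(Comparator.reverseOrder())
                .forEach(p -> p.toFile().delete());
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulate a partially extracted temp directory, then clean it up.
        Path dir = Files.createTempDirectory("unzip-demo");
        Files.createDirectories(dir.resolve("nested"));
        Files.writeString(dir.resolve("nested/file.txt"), "partial extraction");
        deleteRecursively(dir);
        System.out.println(Files.exists(dir)); // prints false
    }
}
```

This mirrors what the suggested Clojure fix achieves with (reverse (file-seq ...)): file-seq yields parents before children, so reversing it produces the same children-first deletion order.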
🧹 Nitpick comments (2)
redis-utils-lib/src/cmr/redis_utils/redis_cache.clj (1)

88-91: Pass s-key to wcar* here too.

wcar* uses its first argument for per-key logging, and the other write paths in this record pass the serialized Redis key. Using the raw EDN key just for evict makes invalidation logs/metrics inconsistent with set-value and reset.

♻️ Suggested change
   (evict
     [_this key]
     (let [s-key (serialize key)]
-      (wcar* key false primary-connection (carmine/del s-key))))
+      (wcar* s-key false primary-connection (carmine/del s-key))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@redis-utils-lib/src/cmr/redis_utils/redis_cache.clj` around lines 88 - 91,
The evict implementation is calling (wcar* key false primary-connection
(carmine/del s-key)) which passes the raw EDN key for per-key logging; change
the first argument to the serialized key so logging/metrics match other write
paths. In the evict function (method name evict, helper serialize and wcar*),
pass s-key as the first parameter to wcar* while keeping (carmine/del s-key) as
the command so the call becomes wcar* s-key false primary-connection
(carmine/del s-key).
system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj (1)

119-120: Derive the expiry jump from config.

Hard-coding 301 bakes the default five-minute invalid-token timeout into the test. Using transmit-config/invalid-token-timeout keeps this assertion aligned if the cache window is tuned later.

Suggested change
-      (dev-sys-util/advance-time! 301)
+      (dev-sys-util/advance-time!
+       (inc (* 60 (transmit-config/invalid-token-timeout))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj`
around lines 119 - 120, Replace the hard-coded 301 advance in the test by
deriving the jump from the configured timeout: use
transmit-config/invalid-token-timeout to compute the advance amount (e.g.
timeout + 1) and call dev-sys-util/advance-time! with that value so the test
remains correct if invalid-token-timeout changes; update the reference in the
test where dev-sys-util/advance-time! is invoked to use the computed value
instead of 301.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:

In `@elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj`:
- Around line 179-182: Remove the four debug global defs that leak state (the
lines that do (def context context), (def index-info index-info), (def query
query), (def max-retries max-retries)); instead rely on the function parameters
or create local bindings with let if you need renamed locals, and ensure the
recursive retry call (the function that calls itself on retry) uses the
original/local bindings passed into the function rather than any global
vars—delete the debug def statements to avoid race conditions and namespace
pollution and, if necessary, wrap any temporary names in a let around the
existing logic so retries use the correct per-call values.

In `@search-app/src/cmr/search/services/parameters/converters/shapefile.clj`:
- Around line 55-59: The current validate-entry-dir uses string-based startsWith
on canonical paths which permits sibling escapes; change it to perform a
path-aware containment check using java.nio.file.Path APIs: obtain the canonical
Path for the target directory and the entry (use Path.of / .toPath on the
canonical File values), then verify that entryPath.startsWith(targetPath) (or
use targetPath.relativize and ensure it does not traverse up) before allowing
extraction; update the validate-entry-dir function and the extraction logic that
calls it (the canonical path computation around (.getCanonicalPath (File.
(.toString target-dir) (str entry))) and the check against (.getCanonicalPath
(.toFile target-dir))) to use Path-based containment instead of
String.startsWith.

In
`@search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj`:
- Around line 122-143: The test is asserting the wrong error for absolute paths
and misses a sibling-prefix traversal case; update the second test in
shapefile.clj that calls shapefile/unzip-file to expect the validation error
message "Given zip content is not allowed" (instead of the filesystem "No such
file or directory") and add a new unit test asserting that validate-entry-dir
throws the same validation error for a sibling-prefix traversal entry like
"../<target-dir-name>evil/..." (ensure you call validate-entry-dir directly and
assert thrown-with-msg? Exception #"Given zip content is not allowed") so the
raw .startsWith vulnerability is covered and prevented.

In `@transmit-lib/src/cmr/transmit/cache/consistent_cache.clj`:
- Around line 122-125: The evict implementation should remove the shared hash
entry before the local memory entry to minimize inconsistency window: in the
evict function (consistent_cache.clj) call (c/evict hash-cache
(key->hash-cache-key key)) before calling (c/evict memory-cache key) so the
shared hash is invalidated first; update the evict body in the function named
evict accordingly, preserving existing semantics and error handling.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj`:
- Around line 29-31: The transient-error-types set currently only lists
:too-many-requests and :gateway-timeout but must also treat URS upstream
failures as transient; update the definition of transient-error-types (and the
similar set used at lines 47-56) to include :request-timeout,
:internal-server-error, :bad-gateway, and :service-unavailable so that
cmr.transmit.urs/get-launchpad-user and related caching logic will not store a
five-minute invalid-token entry for temporary Launchpad/EDL outages.
- Around line 63-74: After evicting the stale entry, the code currently throws
on a non-transient invalid fresh-result without storing it back into the cache,
causing repeated EDL calls for the same bad token; update the branch in the
block around get-launchpad-user-fn so that when fresh-result is invalid (i.e.,
(:valid fresh-result) is false) you still call cache/set-value with the same
cache and cache-key to record the failure result (fresh-result) before invoking
errors/throw-service-error; reference symbols to update: cache/evict,
get-launchpad-user-fn, fresh-result, cache/set-value,
errors/throw-service-error, common-util/scrub-token, cache-key and token.
- Around line 58-59: The code currently interns SHA-256 hashes by creating
keywords for cache keys (cache-key = (keyword (DigestUtils/sha256Hex token))),
which leaks memory; change cache-key to use the hex string directly (e.g., call
DigestUtils/sha256Hex on token and pass that string into cache/get-value with
get-launchpad-user-fn) so you stop creating keywords for each token and avoid
filling the keyword intern table.
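The Path-based containment check recommended for validate-entry-dir above can be sketched in plain Java (the helper name is hypothetical; the PR's actual implementation is Clojure calling these same java.nio.file APIs). Component-wise Path.startsWith avoids the sibling-prefix escape that raw String.startsWith permits, e.g. /tmp/extractevil passing a check against /tmp/extract:

```java
import java.nio.file.Path;

public class ZipEntryGuard {
    // Returns true only when the zip entry, resolved against the target
    // directory and lexically normalized (".." collapsed), remains inside
    // that directory. Path.startsWith compares whole path components, so
    // a sibling directory sharing a name prefix does not slip through.
    static boolean entryWithinTarget(Path targetDir, String entryName) {
        Path target = targetDir.toAbsolutePath().normalize();
        Path resolved = target.resolve(entryName).normalize();
        return resolved.startsWith(target);
    }

    public static void main(String[] args) {
        Path target = Path.of("/tmp/extract");
        System.out.println(entryWithinTarget(target, "shapes/poly.shp"));  // prints true
        System.out.println(entryWithinTarget(target, "../../etc/passwd")); // prints false
        System.out.println(entryWithinTarget(target, "../extractevil/x")); // prints false
    }
}
```

The last case is the sibling-prefix traversal the test-file comment asks to cover: a string comparison of "/tmp/extractevil/x" against "/tmp/extract" would wrongly succeed, while the component-wise check rejects it.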



ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1267cade-6f57-4aee-a535-00ef0e2a8a3d

📥 Commits

Reviewing files that changed from the base of the PR and between 4b727e1 and 024ff64.

📒 Files selected for processing (23)
  • common-lib/src/cmr/common/api/errors.clj
  • common-lib/src/cmr/common/cache.clj
  • common-lib/src/cmr/common/cache/cache_spec.clj
  • common-lib/src/cmr/common/cache/fallback_cache.clj
  • common-lib/src/cmr/common/cache/in_memory_cache.clj
  • common-lib/src/cmr/common/cache/single_thread_lookup_cache.clj
  • common-lib/test/cmr/common/test/cache/in_memory_cache.clj
  • elastic-utils-lib/src/cmr/elastic_utils/config.clj
  • elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj
  • elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj
  • indexer-app/src/cmr/indexer/data/elasticsearch.clj
  • indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj
  • mock-echo-app/src/cmr/mock_echo/api/routes.clj
  • mock-echo-app/src/cmr/mock_echo/api/urs.clj
  • redis-utils-lib/src/cmr/redis_utils/redis_cache.clj
  • redis-utils-lib/test/cmr/redis_utils/test/test_redis_cache.clj
  • search-app/src/cmr/search/services/parameters/converters/shapefile.clj
  • search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj
  • system-int-test/test/cmr/system_int_test/misc/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/cache/consistent_cache.clj
  • transmit-lib/src/cmr/transmit/launchpad_user_cache.clj
  • transmit-lib/src/cmr/transmit/tokens.clj
  • transmit-lib/src/cmr/transmit/urs.clj
💤 Files with no reviewable changes (3)
  • elastic-utils-lib/src/cmr/elastic_utils/es_helper.clj
  • elastic-utils-lib/src/cmr/elastic_utils/config.clj
  • indexer-app/src/cmr/indexer/data/index_set_elasticsearch.clj
✅ Files skipped from review due to trivial changes (1)
  • redis-utils-lib/test/cmr/redis_utils/test/test_redis_cache.clj

Comment on lines +80 to +83
(evict
[_this key]
(cache/evict primary-cache key)
(cache/evict backup-cache key))

⚠️ Potential issue | 🟠 Major

Evict the backup before the primary.

get-value repopulates primary-cache from backup-cache on a miss. With the current order, a concurrent read between the two calls can miss in primary-cache, read the stale value from backup-cache, and write it back into primary-cache before backup eviction runs. That leaves the entry present again after evict returns.

🐛 Suggested change
   (evict
     [_this key]
-    (cache/evict primary-cache key)
-    (cache/evict backup-cache key))
+    (cache/evict backup-cache key)
+    (cache/evict primary-cache key))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@common-lib/src/cmr/common/cache/fallback_cache.clj` around lines 80 - 83, The
current evict implementation in the evict function evicts primary-cache before
backup-cache, which allows a race with get-value that can repopulate primary
from backup; change the eviction order to evict backup-cache first and then
primary-cache (i.e., call cache/evict on backup-cache before calling cache/evict
on primary-cache) to ensure the backup is removed prior to any possible
repopulation into primary.
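The repopulation race is easier to see in runnable form. Below is a minimal Python analogy of the fallback cache (not the CMR Clojure code; `FallbackCache`, `primary`, and `backup` are illustrative names) showing the backup-first eviction order that closes the window:

```python
class FallbackCache:
    """Two-tier cache sketch: a fast primary tier backed by a durable backup tier."""

    def __init__(self):
        self.primary = {}  # fast tier, repopulated on a miss
        self.backup = {}   # durable tier

    def get_value(self, key, lookup_fn=None):
        # On a primary miss, fall back to backup and repopulate primary.
        if key in self.primary:
            return self.primary[key]
        if key in self.backup:
            self.primary[key] = self.backup[key]  # the repopulation path
            return self.primary[key]
        if lookup_fn is not None:
            value = lookup_fn()
            self.primary[key] = value
            self.backup[key] = value
            return value
        return None

    def evict(self, key):
        # Backup first: once it is gone, a concurrent get_value can no
        # longer copy a stale value back into primary mid-eviction.
        self.backup.pop(key, None)
        self.primary.pop(key, None)
```

With the reversed (primary-first) order, a reader running between the two pops could re-insert the stale backup value into `primary` and survive the evict.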

Comment on lines +179 to +182
(def context context)
(def index-info index-info)
(def query query)
(def max-retries max-retries)

⚠️ Potential issue | 🔴 Critical

Critical: Remove debug def statements - causes race conditions and global namespace pollution.

Using def inside a function does not create local bindings—it creates or mutates global vars in the namespace. This is a well-known Clojure anti-pattern that causes:

  1. Race conditions: Concurrent ES queries will overwrite these global vars, corrupting each other's context, index-info, query, and max-retries values.
  2. Namespace pollution: These vars persist globally after the function returns.
  3. Non-deterministic behavior: The recursive retry call at line 201 will use corrupted values if another thread modified the globals.

The PR commit message mentions "Sending up a debug change"—this debug code must be removed before merging.

🐛 Proposed fix: Remove debug statements
 (defn- do-send-with-retry
   "Sends a query to ES, either normal or using a scroll query."
   [context index-info query max-retries]
-  (def context context)
-  (def index-info index-info)
-  (def query query)
-  (def max-retries max-retries)
   (try
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(def context context)
(def index-info index-info)
(def query query)
(def max-retries max-retries)
(defn- do-send-with-retry
"Sends a query to ES, either normal or using a scroll query."
[context index-info query max-retries]
(try
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@elastic-utils-lib/src/cmr/elastic_utils/search/es_index.clj` around lines 179
- 182, Remove the four debug global defs that leak state (the lines that do (def
context context), (def index-info index-info), (def query query), (def
max-retries max-retries)); instead rely on the function parameters or create
local bindings with let if you need renamed locals, and ensure the recursive
retry call (the function that calls itself on retry) uses the original/local
bindings passed into the function rather than any global vars—delete the debug
def statements to avoid race conditions and namespace pollution and, if
necessary, wrap any temporary names in a let around the existing logic so
retries use the correct per-call values.

Comment on lines +55 to +59
(defn validate-entry-dir
  [target-dir entry]
  (let [canonical (.getCanonicalPath (File. (.toString target-dir) (str entry)))]
    (when-not (.startsWith canonical (.getCanonicalPath (.toFile target-dir)))
      (throw (Exception. "Given zip content is not allowed")))))

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

find . -path "*/search-app/src/cmr/search/services/parameters/converters/shapefile.clj" -type f

Repository: nasa/Common-Metadata-Repository

Length of output: 147


🏁 Script executed:

cat -n search-app/src/cmr/search/services/parameters/converters/shapefile.clj | sed -n '45,85p'

Repository: nasa/Common-Metadata-Repository

Length of output: 2103


🏁 Script executed:

# Test the specific vulnerability scenario with Java canonical paths
python3 << 'EOF'
import os
import tempfile
from pathlib import Path

# Simulate the scenario described in the review
# Temp dir: /tmp/Shapes123
# Entry: ../Shapes1234/escape.shp

# Create a test scenario
with tempfile.TemporaryDirectory(prefix="Shapes123_") as tmpdir:
    base_dir = tmpdir  # e.g., /tmp/Shapes123_xyz
    print(f"Base directory: {base_dir}")
    
    # The entry path that attempts to escape
    entry = "../Shapes1234/escape.shp"
    
    # Simulate what happens when constructing a File with base + entry
    # In Java: new File(base_dir, entry)
    test_path = os.path.join(base_dir, entry)
    
    # Normalize it (similar to .getCanonicalPath())
    canonical = os.path.normpath(test_path)
    print(f"Entry: {entry}")
    print(f"Constructed path: {test_path}")
    print(f"Canonical path: {canonical}")
    
    # Check if canonical path is outside base dir
    # This is the vulnerability: string prefix check
    canonical_base = os.path.normpath(base_dir)
    print(f"Canonical base: {canonical_base}")
    print(f"Does canonical start with base string? {canonical.startswith(canonical_base)}")
    
    # But what about the sibling directory issue?
    # If base_dir is /tmp/tmpXXX_Shapes123, and we escape to ../Shapes1234
    # We'd end up in /tmp/Shapes1234 which does NOT start with /tmp/tmpXXX_Shapes123
    
    # Let's test with a more realistic scenario matching the review
    # where the directory names have similar prefixes
    base_similar = "/tmp/Shapes123"
    escaped = os.path.join(base_similar, entry)
    canonical_escaped = os.path.normpath(escaped)
    print(f"\nRealistic scenario:")
    print(f"Base: {base_similar}")
    print(f"Escaped path: {escaped}")
    print(f"Canonical: {canonical_escaped}")
    print(f"Does canonical start with base? {canonical_escaped.startswith(base_similar)}")
    
    # Ah! If we go to /tmp/Shapes1234, it DOES start with /tmp/Shapes
    # But not with /tmp/Shapes123 (with trailing slash issue?)
    print(f"\nPrefix test details:")
    print(f"  {canonical_escaped}.startswith({base_similar}) = {canonical_escaped.startswith(base_similar)}")
    print(f"  {canonical_escaped}.startswith({base_similar + os.sep}) = {canonical_escaped.startswith(base_similar + os.sep)}")

EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 662


🏁 Script executed:

# Read the complete validate-entry-dir function and surrounding context
cat -n search-app/src/cmr/search/services/parameters/converters/shapefile.clj | sed -n '50,80p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1735


The zip-slip guard is bypassable through sibling path directories due to string-based prefix validation.

The current implementation at lines 57-58 uses String.startsWith() on canonical paths, which cannot distinguish between containment and string prefix matches. For example, if the temp directory is /tmp/Shapes123, an entry like ../Shapes1234/escape.shp canonicalizes to /tmp/Shapes1234/escape.shp, which passes the validation because it starts with the string /tmp/Shapes123 but is actually outside the intended extraction root.

This affects the extraction logic at lines 67-73 as well, where validate-entry-dir is called after the file path is already constructed.

Replace the string-based prefix check with path-aware comparison using the java.nio.file.Path API, which correctly handles path containment semantics and prevents sibling directory escapes.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@search-app/src/cmr/search/services/parameters/converters/shapefile.clj`
around lines 55 - 59, The current validate-entry-dir uses string-based
startsWith on canonical paths which permits sibling escapes; change it to
perform a path-aware containment check using java.nio.file.Path APIs: obtain the
canonical Path for the target directory and the entry (use Path.of / .toPath on
the canonical File values), then verify that entryPath.startsWith(targetPath)
(or use targetPath.relativize and ensure it does not traverse up) before
allowing extraction; update the validate-entry-dir function and the extraction
logic that calls it (the canonical path computation around (.getCanonicalPath
(File. (.toString target-dir) (str entry))) and the check against
(.getCanonicalPath (.toFile target-dir))) to use Path-based containment instead
of String.startsWith.
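For illustration, here is a minimal Python sketch of the two checks side by side (the repository code is Clojure on java.io/java.nio; `is_contained_string` and `is_contained_path` are hypothetical names). The raw string check admits the sibling-prefix escape, while the component-wise comparison rejects it:

```python
import os


def is_contained_string(target_dir, entry):
    # Vulnerable: raw string prefix check, mirrors String.startsWith
    # on canonical paths.
    candidate = os.path.normpath(os.path.join(target_dir, entry))
    return candidate.startswith(os.path.normpath(target_dir))


def is_contained_path(target_dir, entry):
    # Path-aware: compares whole path components, so /tmp/Shapes1234
    # is not treated as inside /tmp/Shapes123.
    target = os.path.normpath(target_dir)
    candidate = os.path.normpath(os.path.join(target_dir, entry))
    return os.path.commonpath([target, candidate]) == target
```

The java.nio equivalent is `entryPath.startsWith(targetPath)` on `Path` objects, which performs the same component-wise containment test rather than a character-prefix test.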

Comment on lines +122 to +143
(deftest unzip-file-test
(testing "invalid file path in the source throws error"
(let [invalid-zip (create-single-entry-zip "invalid.zip" "../somewhere_else.txt" "something")]
(try
(is (thrown-with-msg? Exception #"Given zip content is not allowed" (shapefile/unzip-file invalid-zip)))
(finally
(io/delete-file invalid-zip true)))))
(testing "invalid file path within the target dir throws error"
(let [invalid-zip (create-single-entry-zip "wrong-dir.zip" "/BLAH/somewhere_else.txt" "something")]
(try
(is (thrown-with-msg? Exception #"Error while uncompressing zip file.*No such file or directory" (shapefile/unzip-file invalid-zip)))
(finally
(io/delete-file invalid-zip true)))))
(testing "valid zip file returns file obj"
(let [valid-zip (create-single-entry-zip "valid.zip" "somewhere_else.txt" "something")
result (shapefile/unzip-file valid-zip)]
(try
(is (some? result))
(finally
(when result
(run! io/delete-file (reverse (file-seq result))))
(io/delete-file valid-zip true))))))

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

find . -name "shapefile.clj" -type f

Repository: nasa/Common-Metadata-Repository

Length of output: 374


🏁 Script executed:

find . -name "shapefile.clj" -type f | head -20

Repository: nasa/Common-Metadata-Repository

Length of output: 374


🏁 Script executed:

wc -l ./search-app/src/cmr/search/services/parameters/converters/shapefile.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 151


🏁 Script executed:

cat -n ./search-app/src/cmr/search/services/parameters/converters/shapefile.clj | head -70

Repository: nasa/Common-Metadata-Repository

Length of output: 3254


🏁 Script executed:

cat -n ./search-app/src/cmr/search/services/parameters/converters/shapefile.clj | sed -n '55,85p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1721


🏁 Script executed:

cat -n ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | sed -n '122,145p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1363


🏁 Script executed:

cat -n ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | head -40

Repository: nasa/Common-Metadata-Repository

Length of output: 2105


🏁 Script executed:

cat -n ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | sed -n '100,121p'

Repository: nasa/Common-Metadata-Repository

Length of output: 959


🏁 Script executed:

cat -n ./search-app/src/cmr/search/services/parameters/converters/shapefile.clj | sed -n '61,77p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1096


🏁 Script executed:

python3 << 'EOF'
# Simulate the startsWith vulnerability

# Example 1: Target directory
target_dir = "/tmp/ShapesABC"

# Example 2: Sibling-prefix traversal attempt
entry_path = "../ShapesABCevil/escape.txt"

# When File(target_dir, entry_path) is created:
# It creates: /tmp/ShapesABC/../ShapesABCevil/escape.txt
# Canonical form: /tmp/ShapesABCevil/escape.txt

canonical = "/tmp/ShapesABCevil/escape.txt"
canonical_target = "/tmp/ShapesABC"

# The startsWith check
vulnerable_result = canonical.startswith(canonical_target)
print(f"Canonical path: {canonical}")
print(f"Target dir canonical: {canonical_target}")
print(f"Does canonical.startswith(target)? {vulnerable_result}")
print(f"VULNERABLE: {vulnerable_result}")

# Example 3: Absolute path (what the current test uses)
entry_path2 = "/BLAH/somewhere_else.txt"
canonical2 = "/BLAH/somewhere_else.txt"
result2 = canonical2.startswith(canonical_target)
print(f"\nAbsolute path test: {canonical2}")
print(f"Does canonical.startswith(target)? {result2}")
EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 301


🏁 Script executed:

cat -n ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | sed -n '113,135p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1373


🏁 Script executed:

python3 << 'EOF'
# Verify the regex matching behavior
import re

# Current test regex
regex_pattern = r"Error while uncompressing zip file.*No such file or directory"

# What the validation error would be
validation_error = "Error while uncompressing zip file: Given zip content is not allowed"

# What a filesystem error would be
filesystem_error = "Error while uncompressing zip file: No such file or directory"

print(f"Current regex: {regex_pattern}")
print(f"Validation error message: {validation_error}")
print(f"Match validation error? {bool(re.search(regex_pattern, validation_error))}")
print()
print(f"Filesystem error message: {filesystem_error}")
print(f"Match filesystem error? {bool(re.search(regex_pattern, filesystem_error))}")
EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 394


🏁 Script executed:

# Check if there's any test that validates the sibling-prefix case
grep -n "sibling\|\.\./" ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 177


🏁 Script executed:

# Let me verify that the error from validation happens before filesystem operations
cat -n ./search-app/src/cmr/search/services/parameters/converters/shapefile.clj | sed -n '61,77p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1096


🏁 Script executed:

# Verify the exact error message behavior by testing with absolute path
python3 << 'EOF'
# When creating File("/tmp/ShapesABC", "/BLAH/somewhere_else.txt")
# The second argument is treated as an absolute path in Java
# File(parent, child) where child is absolute ignores parent
# So it becomes just "/BLAH/somewhere_else.txt"

# Canonical form of "/BLAH/somewhere_else.txt" is itself
canonical = "/BLAH/somewhere_else.txt"
target_dir = "/tmp/ShapesABC"

# startsWith check
result = canonical.startswith(target_dir)
print(f"Absolute path test:")
print(f"  Canonical: {canonical}")
print(f"  Target: {target_dir}")
print(f"  Passes validation? {result}")
print(f"  Should throw? {not result}")
EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 205


🏁 Script executed:

grep -A 20 "deftest.*unzip" ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | head -50

Repository: nasa/Common-Metadata-Repository

Length of output: 1162


🏁 Script executed:

# Let me also verify the complete imports to confirm ZipEntry is available
grep -i "zipentry" ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 176


🏁 Script executed:

python3 << 'EOF'
# Simulate what happens with absolute path /BLAH/somewhere_else.txt

# When File("/tmp/Shapes123", "/BLAH/somewhere_else.txt") is created:
# In Java, if second arg is absolute, it overrides parent
# So it becomes /BLAH/somewhere_else.txt

# getCanonicalPath() on /BLAH/somewhere_else.txt will return /BLAH/somewhere_else.txt
# (even if directory doesn't exist, canonical path just normalizes)

target_dir_canonical = "/tmp/Shapes123"
entry_path_canonical = "/BLAH/somewhere_else.txt"

# startsWith check
fails_validation = not entry_path_canonical.startswith(target_dir_canonical)
print(f"Validation result for /BLAH/... path:")
print(f"  Entry canonical: {entry_path_canonical}")
print(f"  Target canonical: {target_dir_canonical}")
print(f"  Entry.startswith(target)? {entry_path_canonical.startswith(target_dir_canonical)}")
print(f"  Fails validation? {fails_validation}")
print(f"  Expected error: 'Given zip content is not allowed'")
print(f"  Wrapped as: 'Error while uncompressing zip file: Given zip content is not allowed'")
print()

# Now check sibling-prefix case
print("Vulnerability check - sibling-prefix traversal:")
target_dir_canonical = "/tmp/Shapes123"
entry_path = "../Shapes123evil/escape.txt"
# When resolved: /tmp/Shapes123/../Shapes123evil/escape.txt
entry_path_canonical = "/tmp/Shapes123evil/escape.txt"

passes_validation = entry_path_canonical.startswith(target_dir_canonical)
print(f"  Entry canonical: {entry_path_canonical}")
print(f"  Target canonical: {target_dir_canonical}")
print(f"  Entry.startswith(target)? {passes_validation}")
print(f"  VULNERABLE? {passes_validation} (should be False!)")
EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 592


🏁 Script executed:

grep -A 5 "invalid file path within the target dir" ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 462


🏁 Script executed:

# Check imports in the test file to confirm ZipEntry is available
head -30 ./search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj | grep -i "import\|zipentry"

Repository: nasa/Common-Metadata-Repository

Length of output: 145


🏁 Script executed:

python3 << 'EOF'
# Trace through the logic of unzip-file with absolute path entry

# Line 71: (validate-entry-dir target-dir entry)
# This calls validate-entry-dir which:
# 1. Creates File("/tmp/Shapes123", "/BLAH/somewhere_else.txt")
# 2. In Java, absolute path as second arg overrides parent
# 3. getCanonicalPath() returns "/BLAH/somewhere_else.txt"
# 4. target-dir canonical is "/tmp/Shapes123"
# 5. "/BLAH/somewhere_else.txt".startsWith("/tmp/Shapes123") = false
# 6. Throws Exception("Given zip content is not allowed")

# This exception is caught at line 75 and wrapped:
# (str "Error while uncompressing zip file: " (.getMessage e))
# Result: "Error while uncompressing zip file: Given zip content is not allowed"

# The test expects: #"Error while uncompressing zip file.*No such file or directory"
# This regex would NOT match!

import re
regex = r"Error while uncompressing zip file.*No such file or directory"
actual_message = "Error while uncompressing zip file: Given zip content is not allowed"

print(f"Test expects regex: {regex}")
print(f"Actual error message: {actual_message}")
print(f"Regex matches actual? {bool(re.search(regex, actual_message))}")
print()
print("FINDING: The test at line 132 expects the WRONG error message!")
print("It will fail because validation catches the absolute path before filesystem operations.")
EOF

Repository: nasa/Common-Metadata-Repository

Length of output: 427


Test the validation contract, not the filesystem message.

The test at line 132 expects a regex matching "No such file or directory", but absolute-style paths like /BLAH/somewhere_else.txt actually fail validation (line 71) before any filesystem operations occur, throwing "Given zip content is not allowed" instead. The error message won't match the expected regex.

The test suite also misses the sibling-prefix escape vulnerability (../<target-dir-name>evil/...) that bypasses line 58's raw .startsWith check. This pattern would pass validation and allow directory traversal, but no test covers it.

Update the second test to expect the actual validation error message, and add a test for sibling-prefix traversal directly against validate-entry-dir to ensure the attack vector is blocked.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@search-app/test/cmr/search/test/unit/services/parameter_converters/shapefile.clj`
around lines 122 - 143, The test is asserting the wrong error for absolute paths
and misses a sibling-prefix traversal case; update the second test in
shapefile.clj that calls shapefile/unzip-file to expect the validation error
message "Given zip content is not allowed" (instead of the filesystem "No such
file or directory") and add a new unit test asserting that validate-entry-dir
throws the same validation error for a sibling-prefix traversal entry like
"../<target-dir-name>evil/..." (ensure you call validate-entry-dir directly and
assert thrown-with-msg? Exception #"Given zip content is not allowed") so the
raw .startsWith vulnerability is covered and prevented.

Comment on lines +122 to +125
(evict
  [_this key]
  (c/evict memory-cache key)
  (c/evict hash-cache (key->hash-cache-key key)))

⚠️ Potential issue | 🟠 Major

Evict the shared hash before the local entry.

If the hash-cache delete fails after Line 124 removes the local value, other nodes can keep serving the stale entry because the shared hash is still intact. Invalidating the shared hash first narrows that inconsistency window and better matches this cache's consistency-first contract.

Suggested change
   (evict
     [_this key]
-    (c/evict memory-cache key)
-    (c/evict hash-cache (key->hash-cache-key key)))
+    (c/evict hash-cache (key->hash-cache-key key))
+    (c/evict memory-cache key))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(evict
[_this key]
(c/evict memory-cache key)
(c/evict hash-cache (key->hash-cache-key key)))
(evict
[_this key]
(c/evict hash-cache (key->hash-cache-key key))
(c/evict memory-cache key))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@transmit-lib/src/cmr/transmit/cache/consistent_cache.clj` around lines 122 -
125, The evict implementation should remove the shared hash entry before the
local memory entry to minimize inconsistency window: in the evict function
(consistent_cache.clj) call (c/evict hash-cache (key->hash-cache-key key))
before calling (c/evict memory-cache key) so the shared hash is invalidated
first; update the evict body in the function named evict accordingly, preserving
existing semantics and error handling.

Comment on lines +29 to +31
(def transient-error-types
  "Error types that should not be cached - they are transient and should be retried immediately"
  #{:too-many-requests :gateway-timeout})

⚠️ Potential issue | 🟠 Major

Treat the other URS service failures as transient too.

cmr.transmit.urs/get-launchpad-user also raises :request-timeout, :internal-server-error, :bad-gateway, and :service-unavailable for upstream failures. Leaving them out here turns a temporary Launchpad/EDL outage into a five-minute invalid-token cache entry for an otherwise valid token.

Suggested change
 (def transient-error-types
   "Error types that should not be cached - they are transient and should be retried immediately"
-  #{:too-many-requests :gateway-timeout})
+  #{:request-timeout
+    :too-many-requests
+    :internal-server-error
+    :bad-gateway
+    :service-unavailable
+    :gateway-timeout})

Also applies to: 47-56

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj` around lines 29 - 31,
The transient-error-types set currently only lists :too-many-requests and
:gateway-timeout but must also treat URS upstream failures as transient; update
the definition of transient-error-types (and the similar set used at lines
47-56) to include :request-timeout, :internal-server-error, :bad-gateway, and
:service-unavailable so that cmr.transmit.urs/get-launchpad-user and related
caching logic will not store a five-minute invalid-token entry for temporary
Launchpad/EDL outages.
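The resulting caching policy can be sketched in Python (keyword-style strings stand in for the Clojure keywords; `should_cache_failure` is a hypothetical helper, and the set mirrors the review's suggested expansion):

```python
# Error types treated as transient: retry on the next request rather than
# caching the failure. Mirrors the suggested expanded set.
TRANSIENT_ERROR_TYPES = {
    "request-timeout",
    "too-many-requests",
    "internal-server-error",
    "bad-gateway",
    "service-unavailable",
    "gateway-timeout",
}


def should_cache_failure(error_type):
    # Cache only permanent failures (e.g. a genuinely invalid token);
    # transient upstream outages must not poison the cache.
    return error_type not in TRANSIENT_ERROR_TYPES
```

With the narrower original set, a `:bad-gateway` from a momentary EDL blip would be cached and served as "invalid token" until the entry expired.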

Comment on lines +58 to +59
(let [cache-key (keyword (DigestUtils/sha256Hex token))
      token-info (cache/get-value cache cache-key get-launchpad-user-fn)]

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check if the file exists and examine the relevant section
head -100 transmit-lib/src/cmr/transmit/launchpad_user_cache.clj | tail -60

Repository: nasa/Common-Metadata-Repository

Length of output: 3550


🏁 Script executed:

# Find the cache/get-value function definition to understand its key type expectations
rg "def.*get-value|fn.*get-value" --max-count 10 -A 5

Repository: nasa/Common-Metadata-Repository

Length of output: 4125


🏁 Script executed:

# Check the cache module to understand what key types are accepted
fd "cache" -e clj -e cljc | head -20

Repository: nasa/Common-Metadata-Repository

Length of output: 1363


🏁 Script executed:

# Look at the cache module to understand get-value signature
cat -n common-lib/src/cmr/common/cache.clj

Repository: nasa/Common-Metadata-Repository

Length of output: 3526


🏁 Script executed:

# Look at in-memory cache implementation
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | head -80

Repository: nasa/Common-Metadata-Repository

Length of output: 2726


🏁 Script executed:

# Continue reading the in-memory cache implementation
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | sed -n '80,130p'

Repository: nasa/Common-Metadata-Repository

Length of output: 1842


🏁 Script executed:

# Also check how hash-cache handles this, since there's mention of hash_cache
cat -n common-lib/src/cmr/common/hash_cache.clj | head -100

Repository: nasa/Common-Metadata-Repository

Length of output: 3469


🏁 Script executed:

# Check what type of cache is used for launchpad-user-cache-key
rg "launchpad-user-cache-key|launchpad_user_cache_key" -B 2 -A 2

Repository: nasa/Common-Metadata-Repository

Length of output: 3141


🏁 Script executed:

# Search for other usages of cache keys in similar contexts to see if strings are used elsewhere
rg "cache-key.*DigestUtils|cache-key.*sha256|cache-key.*hash" -B 2 -A 2

Repository: nasa/Common-Metadata-Repository

Length of output: 2396


🏁 Script executed:

# Verify if string keys work with the cache implementation by checking test cases
rg -A 10 "cache/set-value|cache/get-value" common-lib/src/cmr/common/cache/cache_spec.clj | head -50

Repository: nasa/Common-Metadata-Repository

Length of output: 1353


🏁 Script executed:

# Let's verify that get-keys behavior doesn't affect our fix
cat -n common-lib/src/cmr/common/cache/in_memory_cache.clj | sed -n '78,85p'

Repository: nasa/Common-Metadata-Repository

Length of output: 307


🏁 Script executed:

# Check if there are any constraints on key types in the protocol or implementations
rg "get-keys|key.*convert" common-lib/src/cmr/common/cache.clj -B 2 -A 3

Repository: nasa/Common-Metadata-Repository

Length of output: 303


Use the SHA-256 hex string directly as the cache key; avoid interning via keyword.

Line 58 creates a keyword from each SHA-256 token hash. In Clojure, this interns the key in the keyword table, which persists in memory even after cache entries are evicted. Under high-cardinality Launchpad traffic, this accumulates unbounded entries in the keyword intern table. The hex string itself is already a suitable cache key.

Suggested change
-      (let [cache-key (keyword (DigestUtils/sha256Hex token))
+      (let [cache-key (DigestUtils/sha256Hex token)
             token-info (cache/get-value cache cache-key get-launchpad-user-fn)]
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(let [cache-key (keyword (DigestUtils/sha256Hex token))
token-info (cache/get-value cache cache-key get-launchpad-user-fn)]
(let [cache-key (DigestUtils/sha256Hex token)
token-info (cache/get-value cache cache-key get-launchpad-user-fn)]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj` around lines 58 - 59,
The code currently interns SHA-256 hashes by creating keywords for cache keys
(cache-key = (keyword (DigestUtils/sha256Hex token))), which leaks memory;
change cache-key to use the hex string directly (e.g., call
DigestUtils/sha256Hex on token and pass that string into cache/get-value with
get-launchpad-user-fn) so you stop creating keywords for each token and avoid
filling the keyword intern table.
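A minimal Python analogy of the fix (`hashlib` stands in for commons-codec's `DigestUtils`; `token_cache_key` is a hypothetical name): the hex digest is an ordinary string, so nothing is interned for the life of the process:

```python
import hashlib


def token_cache_key(token):
    # The 64-char hex digest is already a stable, collision-resistant
    # cache key; no keyword/interning step is needed.
    return hashlib.sha256(token.encode("utf-8")).hexdigest()
```

Equal tokens always hash to the same key, so cache lookups behave identically to the keyword version without growing a process-lifetime intern table.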

Comment on lines 63 to +74
(do
  (error (format "Launchpad token (partially redacted) [%s] has expired."
                 (common-util/scrub-token token)))
  (errors/throw-service-error
   :unauthorized
   (format "Launchpad token (partially redacted) [%s] has expired."
           (common-util/scrub-token token))))
(cache/evict cache cache-key)
(let [fresh-result (get-launchpad-user-fn)]
  (if (:valid fresh-result)
    (do
      (cache/set-value cache cache-key fresh-result)
      fresh-result)
    (errors/throw-service-error
     (or (:error-type fresh-result) :unauthorized)
     (or (:error-message fresh-result)
         (format "Invalid Launchpad token (partially redacted) [%s]"
                 (common-util/scrub-token token))))))

⚠️ Potential issue | 🟠 Major

Re-cache repeated permanent failures after revalidation.

After Line 64 evicts the stale entry, a second non-transient failure is thrown without storing fresh-result back. That means the first five-minute backoff only happens once; every later request with the same bad or expired token will call EDL again.

Suggested change
           (do
             (cache/evict cache cache-key)
             (let [fresh-result (get-launchpad-user-fn)]
               (if (:valid fresh-result)
                 (do
                   (cache/set-value cache cache-key fresh-result)
                   fresh-result)
-                (errors/throw-service-error
-                 (or (:error-type fresh-result) :unauthorized)
-                 (or (:error-message fresh-result)
-                     (format "Invalid Launchpad token (partially redacted) [%s]"
-                             (common-util/scrub-token token)))))))
+                (do
+                  (cache/set-value cache cache-key fresh-result)
+                  (errors/throw-service-error
+                   (or (:error-type fresh-result) :unauthorized)
+                   (or (:error-message fresh-result)
+                       (format "Invalid Launchpad token (partially redacted) [%s]"
+                               (common-util/scrub-token token))))))))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
(do
(error (format "Launchpad token (partially redacted) [%s] has expired."
(common-util/scrub-token token)))
(errors/throw-service-error
:unauthorized
(format "Launchpad token (partially redacted) [%s] has expired."
(common-util/scrub-token token))))
(cache/evict cache cache-key)
(let [fresh-result (get-launchpad-user-fn)]
(if (:valid fresh-result)
(do
(cache/set-value cache cache-key fresh-result)
fresh-result)
(errors/throw-service-error
(or (:error-type fresh-result) :unauthorized)
(or (:error-message fresh-result)
(format "Invalid Launchpad token (partially redacted) [%s]"
(common-util/scrub-token token)))))))
(do
(cache/evict cache cache-key)
(let [fresh-result (get-launchpad-user-fn)]
(if (:valid fresh-result)
(do
(cache/set-value cache cache-key fresh-result)
fresh-result)
(do
(cache/set-value cache cache-key fresh-result)
(errors/throw-service-error
(or (:error-type fresh-result) :unauthorized)
(or (:error-message fresh-result)
(format "Invalid Launchpad token (partially redacted) [%s]"
(common-util/scrub-token token))))))))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@transmit-lib/src/cmr/transmit/launchpad_user_cache.clj` around lines 63 - 74,
After evicting the stale entry, the code currently throws on a non-transient
invalid fresh-result without storing it back into the cache, causing repeated
EDL calls for the same bad token; update the branch in the block around
get-launchpad-user-fn so that when fresh-result is invalid (i.e., (:valid
fresh-result) is false) you still call cache/set-value with the same cache and
cache-key to record the failure result (fresh-result) before invoking
errors/throw-service-error; reference symbols to update: cache/evict,
get-launchpad-user-fn, fresh-result, cache/set-value,
errors/throw-service-error, common-util/scrub-token, cache-key and token.
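The suggested negative-caching behavior can be sketched in Python (hypothetical names; a plain dict stands in for the CMR cache protocol and `fetch_fn` for the EDL lookup). The failure result is stored on the first fetch, so repeated lookups with the same bad token are served from the cache:

```python
class TokenValidationError(Exception):
    """Raised for a cached or freshly fetched invalid-token result."""


def get_token_info(cache, key, fetch_fn):
    result = cache.get(key)
    if result is None:
        result = fetch_fn()
        # Store successes AND permanent failures, so the next request
        # with the same token hits the cache instead of the upstream.
        cache[key] = result
    if not result.get("valid"):
        raise TokenValidationError(result.get("error-message", "invalid token"))
    return result
```

Without the `cache[key] = result` line on the failure path, every repeat of the same bad token would trigger another upstream call, which is the behavior the review flags.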

@eereiter eereiter closed this Mar 19, 2026