
feat(caching): add seed_cache_dir for cross-run cache reuse#777

Open
gchlebus wants to merge 4 commits into main from feature/cache-seed-dir

Conversation

@gchlebus
Contributor

@gchlebus gchlebus commented Feb 26, 2026

Problem

When the same evaluation is run on different clusters (e.g., CW-PDX → CW-DFW), each gets a separate output_dir and therefore a separate cache_dir. The cache keys are identical (SHA-256 of the request JSON body), but the caches are physically isolated on separate Lustre filesystems. If a run times out on one cluster and is restarted on another, all cached results are lost — even though the exact same requests would produce the exact same cache keys.
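Because both runs send byte-identical requests, the keys line up across clusters. A minimal sketch of such a content-addressed key (the exact serialization used by the interceptor is an assumption here):

```python
import hashlib
import json

def cache_key(request_body: dict) -> str:
    # Hash a canonical JSON encoding so logically identical requests map
    # to the same key on any cluster. (Illustrative sketch; the real
    # interceptor's serialization details may differ.)
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Since the key depends only on the request body, a cache copied to another filesystem keeps working unchanged.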

Solution

Add a seed_cache_dir parameter to CachingInterceptor that acts as a read-only fallback cache. On a cache miss in the primary cache, the interceptor checks the seed cache. Seed cache hits are automatically promoted into the primary cache, so the output cache is always self-contained after a run. The seed cache itself is never modified.
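The lookup order can be sketched with dict-backed stand-ins (names like `SeedFallbackCache` are illustrative, not the actual `CachingInterceptor` API):

```python
from typing import Optional

class SeedFallbackCache:
    """Dict-backed sketch of the fallback-and-promote lookup."""

    def __init__(self, primary: dict, seed: Optional[dict] = None):
        self.primary = primary       # read-write cache for this run
        self.seed = seed or {}       # read-only fallback, never written

    def get(self, key: str) -> Optional[str]:
        if key in self.primary:
            return self.primary[key]          # primary hit
        if key in self.seed:
            value = self.seed[key]
            # Promote the seed hit so the primary cache is
            # self-contained after the run; the seed stays untouched.
            self.primary[key] = value
            return value
        return None                           # miss in both caches

    def put(self, key: str, value: str) -> None:
        # Newly generated responses go to the primary cache only.
        self.primary[key] = value
```

After a full run, `primary` holds every entry that was looked up or written, so it can itself serve as the seed for a later run.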

Usage

Legacy config:

```yaml
adapter_config:
  seed_cache_dir: /path/to/previous/run/cache
```

Interceptor config:

```yaml
interceptors:
  - name: caching
    config:
      seed_cache_dir: /path/to/previous/run/cache
```

Workflow

  1. Run eval on Cluster A → cache populated at {output_dir}/cache/
  2. Copy cache directory to Cluster B: rsync -a clusterA:/path/cache/ clusterB:/path/seed-cache/
  3. Run eval on Cluster B with seed_cache_dir: /path/seed-cache
  4. Cluster B gets instant cache hits for all previously completed items
  5. Output cache on Cluster B is self-contained — includes both promoted seed entries and newly generated responses
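The end state of this workflow can be simulated with two local directories standing in for the clusters' filesystems (one file per key here; the real caches are directory-backed, and all paths are illustrative):

```python
import shutil
import tempfile
from pathlib import Path

# Stand-ins for the two clusters' Lustre paths (illustrative only).
root = Path(tempfile.mkdtemp())
cluster_a_cache = root / "clusterA" / "run1" / "cache"
seed_cache = root / "clusterB" / "seed-cache"      # rsync destination
cluster_b_cache = root / "clusterB" / "run2" / "cache"

# Step 1: Cluster A's run populates its cache.
cluster_a_cache.mkdir(parents=True)
(cluster_a_cache / "abc123").write_text("cached response")

# Step 2: copy it over (stand-in for `rsync -a`).
shutil.copytree(cluster_a_cache, seed_cache)

# Steps 3-5: Cluster B promotes every seed hit into its own cache and
# adds newly generated responses, so run2/cache ends up self-contained.
cluster_b_cache.mkdir(parents=True)
for entry in seed_cache.iterdir():
    shutil.copy2(entry, cluster_b_cache / entry.name)   # promotion
(cluster_b_cache / "def456").write_text("new response")
```

The seed directory is left exactly as copied, so it can be reused by further restarts or discarded.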

Changes

  • caching_interceptor.py: Added seed_cache_dir param to Params, seed Cache initialization with directory existence validation (warns when subdirs are missing), fallback logic in _get_from_cache, automatic promotion of seed hits into primary cache via direct cache writes (bypasses response counter to avoid inflating _cached_responses_count)
  • adapter_config.py: Added seed_cache_dir to LegacyAdapterConfig, wired through to caching interceptor in from_legacy_config()
  • test_seed_cache.py: 11 tests covering fallback, promotion to primary, primary precedence, both-miss, no-seed, nonexistent dir, partial seed dir, write isolation, seed immutability, and legacy config passthrough
  • docs/libraries/nemo-evaluator/interceptors/caching.md: Added "Seed Cache" section documenting both legacy and interceptor configuration formats, promotion behavior, and cross-cluster usage guide

Testing

Unit tests

  • 11 tests, all passing
  • Existing tests pass with zero regressions

Real cluster validation

1. End-to-end test (AA_math_test_500, CW-PDX → CW-DFW):

  • Copied cache (2,500 entries, 56MB) from CW-PDX to CW-DFW
  • Ran eval with pre_cmd installing this branch
  • 2,500 / 2,500 LLM responses served from seed cache — 100% hit rate
  • Zero GPU inference on DFW — all responses from cache

2. Production use (HLE benchmark, CW-DFW → CW-PDX):

  • DFW multi-node run timed out at 77.7% (1,677/2,158 items)
  • Copied 151MB cache from DFW to PDX
  • Ran HLE eval on PDX with seed_cache_dir pointing to copied cache
  • 1,677 cached items served in ~11 seconds, then generated remaining 481 items in ~2h
  • GPT-4o judging completed in ~30 min
  • Result: 23.17% judge_correct — complete end-to-end evaluation with cross-cluster cache reuse
  • Primary cache on PDX ended up self-contained (all 2,158 entries) thanks to seed promotion

Test details

  • Invocations: 84774da8d9154adc (AA_math test), d8f3b1db276dd6e2 (HLE production)
  • Clusters: CW-DFW, CW-PDX
  • Model: Qwen3.5-122B-A10B (vLLM, 8×GPU per node, 8 nodes)
  • Branch install: pip install "nemo-evaluator @ git+...@feature/cache-seed-dir" via pre_cmd

@gchlebus gchlebus requested review from a team as code owners February 26, 2026 23:14
@copy-pr-bot

copy-pr-bot bot commented Feb 26, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@gchlebus gchlebus force-pushed the feature/cache-seed-dir branch 2 times, most recently from 72a6da8 to 16c4c92 on February 28, 2026 00:13
@github-actions github-actions bot added the documentation Improvements or additions to documentation label Feb 28, 2026
@gchlebus gchlebus force-pushed the feature/cache-seed-dir branch from 4a5bc90 to 4a15ea8 on February 28, 2026 00:17
@gchlebus
Contributor Author

/ok to test 4a15ea8

gchlebus added 3 commits March 3, 2026 08:54
Add a seed_cache_dir parameter to CachingInterceptor that enables
reusing cached responses from a previous evaluation run. On cache
miss in the primary cache, the interceptor falls back to the seed
cache directory (read-only). New responses are always written to
the primary cache only.

This is useful when migrating evaluations between clusters with
separate filesystems (e.g., CW-PDX to CW-DFW). The cache keys
are identical (SHA-256 of request JSON body), but previously the
caches were physically isolated. Users can now copy the cache
directory from one run and point seed_cache_dir at it.

Usage (legacy config):
  adapter_config:
    seed_cache_dir: /path/to/previous/run/cache

Usage (interceptor config):
  interceptors:
    - name: caching
      config:
        seed_cache_dir: /path/to/previous/run/cache

Changes:
- caching_interceptor.py: Add seed_cache_dir param, initialize
  read-only seed Cache instances, fall back on primary miss
- adapter_config.py: Add seed_cache_dir to LegacyAdapterConfig,
  pass through to caching interceptor in from_legacy_config()
- test_seed_cache.py: 9 tests covering fallback, precedence,
  isolation, nonexistent dirs, and legacy config passthrough

Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
Seed cache entries are now automatically copied into the primary cache
on hit, making the output cache self-contained after a run. This means
future runs can use the primary cache directly without needing the
original seed cache.

Previously, seed hits were returned without writing to primary, leaving
the output cache incomplete (only containing newly generated responses).

- Updated _get_from_cache to call _save_to_cache on seed hits
- Updated field description to document promotion behavior
- Added test_seed_hit_promoted_to_primary test
- Updated existing tests to verify promotion side effects

Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
Document the seed cache feature including:
- Configuration via interceptor config and legacy adapter config
- Cache lookup and promotion behavior
- Cross-cluster usage guide with rsync + mount example
- Cache key portability explanation

Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
@gchlebus gchlebus force-pushed the feature/cache-seed-dir branch from 4a15ea8 to 01868ba on March 3, 2026 07:55
- Warn when seed_cache_dir is configured but subdirs are missing
- Avoid inflating response counter during seed-to-primary promotion
- Add test for partial seed directory (responses/ without headers/)
- Add interceptor config example to seed cache documentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Grzegorz Chlebus <gchlebus@nvidia.com>
@gchlebus
Contributor Author

gchlebus commented Mar 3, 2026

/ok to test 8ddbf16

Comment on lines +107 to +108
self.seed_responses_cache = Cache(directory=seed_responses_dir)
self.seed_headers_cache = Cache(directory=seed_headers_dir)
Contributor


Please also seed the requests cache; as it stands, it's confusing that we omit only that one.


Labels

documentation (Improvements or additions to documentation), nemo-evaluator, tests
