Conversation


@rahul-tuli rahul-tuli commented Oct 6, 2025

Implements a file-based caching mechanism to reduce lm-eval test times by avoiding redundant base model evaluations when multiple tests use the same base model
configuration.

Motivation

Our lm-eval tests are run individually as separate processes via run_tests.sh, with each config file spawning a new pytest process. This means:

  • Each test evaluates the base model independently for recovery-based testing
  • Tests with the same base model repeat identical evaluations
  • Total test time scales linearly with the number of configs

  bash tests/e2e/vLLM/run_tests.sh -c tests/lmeval/configs -t tests/lmeval/test_lmeval.py || SUCCESS=$?

For example, running 8 tests against the same base model (meta-llama/Meta-Llama-3-8B-Instruct) runs the identical base model evaluation 8 times, roughly doubling the total test time.

Solution

Implemented a file-based cache that persists base model evaluation results across separate pytest processes:

  • Cache key: Uniquely identifies evaluations by (model, task, num_fewshot, limit, batch_size, model_args)
  • Storage: JSON files in .lmeval_cache/ directory (configurable via LMEVAL_CACHE_DIR)
  • Persistence: Survives across separate pytest processes run by run_tests.sh
  • Control: Can be disabled via DISABLE_LMEVAL_CACHE=1 environment variable
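The cache key described above can be sketched as a frozen dataclass. The class name `LMEvalCacheKey` and the field set come from this PR, but the hashing and file-naming details below are illustrative assumptions, not the exact implementation in tests/testing_utils.py:

```python
# Sketch of a file-based cache key for lm-eval results. The field set
# matches the PR description; the sha256-based digest and file layout
# are assumptions for illustration.
import hashlib
import json
import os
from dataclasses import asdict, dataclass
from pathlib import Path

# Cache directory is configurable via LMEVAL_CACHE_DIR, defaulting to .lmeval_cache/
CACHE_DIR = Path(os.environ.get("LMEVAL_CACHE_DIR", ".lmeval_cache"))


@dataclass(frozen=True)
class LMEvalCacheKey:
    model: str
    task: str
    num_fewshot: int
    limit: int
    batch_size: int
    model_args: str

    def digest(self) -> str:
        # Stable hash over every field that uniquely identifies an evaluation;
        # sort_keys makes the serialization deterministic across runs.
        payload = json.dumps(asdict(self), sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()[:16]

    def path(self) -> Path:
        # One JSON file per unique evaluation configuration
        return CACHE_DIR / f"{self.digest()}.json"
```

Because the digest depends only on field values, two separate pytest processes constructing the same key resolve to the same file on disk.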

Implementation Details

Core components:

  1. LMEvalCacheKey: Frozen dataclass that handles cache key generation, file I/O, and cache lookups
  2. cached_lm_eval_run: Decorator that transparently adds caching to base model evaluation methods
  3. File-based storage: Required because each config runs in a separate process (in-memory cache wouldn't persist)

Design decisions:

  • File-based over in-memory: Each test config runs as a separate pytest process, so in-memory cache would be cleared between tests
  • Fail-safe: Cache failures never break tests; errors are logged and execution continues without the cache
  • Clean abstractions: Self-documenting code with minimal comments needed
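The fail-safe behavior can be sketched as a decorator that swallows cache errors on both the read and write paths. The decorator name `cached_lm_eval_run` and the `DISABLE_LMEVAL_CACHE` variable are from this PR; the signature taking a key-building function is an assumption, and the real code in tests/testing_utils.py may differ:

```python
# Illustrative sketch of a fail-safe caching decorator. Cache read/write
# errors are logged and the wrapped evaluation runs normally, so a broken
# cache can never fail a test.
import functools
import json
import logging
import os
from pathlib import Path

logger = logging.getLogger(__name__)


def cached_lm_eval_run(make_key):
    """Wrap an evaluation function; `make_key` maps its args to a cache file Path."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # Bypass caching entirely when disabled via the environment
            if os.environ.get("DISABLE_LMEVAL_CACHE") == "1":
                return fn(*args, **kwargs)
            cache_file = make_key(*args, **kwargs)
            try:
                if cache_file.exists():
                    return json.loads(cache_file.read_text())
            except Exception:
                logger.warning("lm-eval cache read failed; re-running", exc_info=True)
            result = fn(*args, **kwargs)
            try:
                cache_file.parent.mkdir(parents=True, exist_ok=True)
                cache_file.write_text(json.dumps(result))
            except Exception:
                logger.warning("lm-eval cache write failed; continuing", exc_info=True)
            return result
        return wrapper
    return decorator
```

A corrupt or unwritable cache file only costs a re-evaluation, which is the same price as having no cache at all.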

Performance Impact

Expected speedup: ~30% for test suites with multiple configs sharing the same base model.

Usage

Enable caching (default):

bash tests/e2e/vLLM/run_tests.sh -c tests/lmeval/configs -t tests/lmeval/test_lmeval.py

Disable caching:

  DISABLE_LMEVAL_CACHE=1 bash tests/e2e/vLLM/run_tests.sh -c tests/lmeval/configs -t tests/lmeval/test_lmeval.py

Custom cache location:

  LMEVAL_CACHE_DIR=/tmp/my_cache bash tests/e2e/vLLM/run_tests.sh -c tests/lmeval/configs -t tests/lmeval/test_lmeval.py

Clear cache:

  rm -rf .lmeval_cache/

Testing

  • Tested with multiple weight-only quantization schemes sharing the same base model, with and without the cache; the cached run was faster

Files Changed

  • tests/testing_utils.py: Added caching implementation (~50 lines)
  • tests/lmeval/test_lmeval.py: Applied @cached_lm_eval_run decorator to _eval_base_model()

Implements a file-based caching mechanism to reduce test times by avoiding
redundant base model evaluations when multiple tests use the same
configuration. The cache is persisted on disk in .lmeval_cache/ so it
survives across the separate pytest processes spawned by run_tests.sh.

Signed-off-by: Rahul Tuli <[email protected]>
@rahul-tuli rahul-tuli requested a review from Copilot October 6, 2025 12:46

@Copilot Copilot AI left a comment


Pull Request Overview

This PR implements a file-based caching mechanism for lm-eval tests to reduce test execution time by avoiding redundant base model evaluations when multiple test configurations use the same base model.

Key changes:

  • Replaces in-memory cache with file-based cache to persist across separate pytest processes
  • Introduces LMEvalCacheKey dataclass for managing cache keys and file operations
  • Refactors caching decorator to use the new file-based system


@rahul-tuli rahul-tuli changed the base branch from feat/lmeval-base-model-caching to main October 6, 2025 15:56