Add: File Based Caching for lm_eval
tests
#1900
Draft
+92
−3
Implements a file-based caching mechanism to reduce lm-eval test times by avoiding redundant base model evaluations when multiple tests use the same base model configuration.
Motivation
Our lm-eval tests run individually as separate processes via `run_tests.sh`, with each config file spawning a new pytest process. This means the base model is re-evaluated from scratch for every config. For example, running 8 tests with the same base model (`meta-llama/Meta-Llama-3-8B-Instruct`) results in 8 redundant base model evaluations, roughly doubling the total test time.
Solution
Implemented a file-based cache that persists base model evaluation results across separate pytest processes:

- Cache key: `(model, task, num_fewshot, limit, batch_size, model_args)`
- Results are stored in a `.lmeval_cache/` directory (configurable via `LMEVAL_CACHE_DIR`)
- Works transparently with the existing `run_tests.sh` flow
- Caching can be disabled with the `DISABLE_LMEVAL_CACHE=1` environment variable

Implementation Details
Core components:
- `LMEvalCacheKey`: Frozen dataclass that handles cache key generation, file I/O, and cache lookups
- `cached_lm_eval_run`: Decorator that transparently adds caching to base model evaluation methods

Design decisions:
Performance Impact
Expected speedup: ~30% for test suites with multiple configs sharing the same base model.
Usage
Enable caching (default):
Disable caching:
Custom cache location:
Clear cache:
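The settings above boil down to environment variables and one directory. A sketch of each (only `DISABLE_LMEVAL_CACHE`, `LMEVAL_CACHE_DIR`, and `.lmeval_cache/` come from this PR; the custom path is illustrative):

```shell
# Enable caching (default): nothing to set, results land in .lmeval_cache/

# Disable caching for a run:
export DISABLE_LMEVAL_CACHE=1

# Custom cache location (path is illustrative):
export LMEVAL_CACHE_DIR="$HOME/.cache/lmeval"

# Clear cache:
rm -rf .lmeval_cache/
```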
Testing
Files Changed
- `tests/testing_utils.py`: Added caching implementation (~50 lines)
- `tests/lmeval/test_lmeval.py`: Applied `@cached_lm_eval_run` decorator to `_eval_base_model()`