fix: issues on cache hits #3558
base: main
Conversation
Found a couple of issues where the cache is not hit:
- The condition `overwrite_strategy == "only-missing" and overwrite_strategy == OverwriteStrategy.ONLY_MISSING` can never be met, since a single value never equals both the plain string and the enum member (so we always reran all splits). A sketch of the fix follows this list.
- The remote cache path is currently not specified correctly (`remote` vs `remote/results`), which means the remote cache is never hit.
- Added a check for whether a merge is required. It is often the case that the new results don't need to be merged (because the existing results already include all the splits), yet we still reran them if the version did not match.
- Improved the error messages produced when a merge fails.
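For illustration, a minimal sketch of the first fix. The enum here is a hypothetical stand-in for mteb's OverwriteStrategy (names and values assumed, not copied from the codebase); the point is that normalising the input to the enum once makes a single comparison suffice:

import enum


class OverwriteStrategy(enum.Enum):  # hypothetical stand-in for mteb's enum
    ONLY_MISSING = "only-missing"
    ONLY_CACHE = "only-cache"


def is_only_missing(overwrite_strategy: str | OverwriteStrategy) -> bool:
    # Buggy check: `x == "only-missing" and x == OverwriteStrategy.ONLY_MISSING`
    # is always False, because a plain string never equals a (non-str) enum
    # member and vice versa, so the two equalities can't hold at once.
    # Coercing to the enum first removes the ambiguity.
    return OverwriteStrategy(overwrite_strategy) is OverwriteStrategy.ONLY_MISSING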
Samoed left a comment:
Should we add tests for this?
Yep probably

Added test for cache re-use. Did not add tests for remote cache.
cached_results = mteb.evaluate(
    model, task, cache=cache, overwrite_strategy="only-cache"
)
cached_result = cached_results[0]
assert cached_result.task_name == task.metadata.name, (
    "results should match the task"
)
assert cached_result.get_score() == expected_score, (
    "main score should match the expected value"
)
Can we add a check that task results that are not in the cache don't load?
Also, we can probably reuse tests/historic_results to test the cache.
I am unsure how this would look.
Oh, I forgot that we also have tests/mock_mteb_cache; the cache can be tested with it.
Right, but clearly it should never load results that are not in the cache, but I am unsure how I would test that (we kinda do it already by checking the scores)
To check that tasks use only the cache with only-cache:
import re

import mteb
import pytest


def test_cache_hit():
    model = mteb.get_model("baseline/random-encoder-baseline")
    task = mteb.get_task("MIRACLRetrieval")
    # pytest's `match` argument is a regex, so the braces and parentheses in
    # the message must be escaped; the exception class name is not part of
    # str(exception), so it is dropped from the pattern.
    expected = re.escape(
        "overwrite_strategy is set to 'only-cache' and the results file exists. "
        "However there are the following missing splits (and subsets): "
        "{'dev': ['default']}. To rerun these set overwrite_strategy to 'only-missing'."
    )
    with pytest.raises(ValueError, match=expected):
        mteb.evaluate(model, task, overwrite_strategy="only-cache")

Probably a test for only-missing can be done by mocking and checking call counts (see the sketch below).
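A rough sketch of what that mock-based only-missing test could look like. It assumes `cache` is the same fixture used in the cache re-use test above, and patching the task's evaluate method is an assumption about where reruns enter mteb's internals, not a confirmed detail:

from unittest.mock import patch

import mteb


def test_only_missing_reuses_cache(cache):
    model = mteb.get_model("baseline/random-encoder-baseline")
    task = mteb.get_task("MIRACLRetrieval")

    mteb.evaluate(model, task, cache=cache)  # first run populates the cache

    # Spy on the task's evaluate entry point (assumed rerun path): with every
    # split already cached, "only-missing" should not trigger a single rerun.
    with patch.object(task, "evaluate", wraps=task.evaluate) as spy:
        mteb.evaluate(model, task, cache=cache, overwrite_strategy="only-missing")
        spy.assert_not_called()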