Skip to content

Conversation

@KennethEnevoldsen
Copy link
Contributor

@KennethEnevoldsen KennethEnevoldsen commented Nov 14, 2025

Found a couple of issues where the cache is not hit

  • the condition overwrite_strategy == "only-missing" and overwrite_strategy == OverwriteStrategy.ONLY_MISSING would never be met as it is never both (so we always rerun all splits)
  • Currently, the remote cache path is not specified correctly (remote vs remote/results), which means that it is never hit.
  • Added a check if a merge is required. E.g., it is often the case that you don't need to merge the new results (because the results include all the splits), however, we still rerun them if the version does not match
  • Added better error messages to merge messages
  • Added test that cache re-used as intended

Found a couple of issue where the cache is not hit

- the condition `overwrite_strategy == "only-missing"
        and overwrite_strategy == OverwriteStrategy.ONLY_MISSING` would never be met as it is never both (so we always rerun all splits)
- Currently the remote cache path is not specified correctly (`remote` vs `remote/results`), which means that it is never hit.
- Added a check if a merge is required. E.g. it is often the case that you don't need to merge the new results (because the results include all the splits), however we still rerun them if the version does not match
-  Added better error messages to merge messages
Copy link
Member

@Samoed Samoed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add tests for this?

@KennethEnevoldsen
Copy link
Contributor Author

Yep probably

@KennethEnevoldsen
Copy link
Contributor Author

Added test for cache re-use. Did not add tests for remote cache.

Comment on lines +81 to +90
cached_results = mteb.evaluate(
model, task, cache=cache, overwrite_strategy="only-cache"
)
cached_result = cached_results[0]
assert cached_result.task_name == task.metadata.name, (
"results should match the task"
)
assert cached_result.get_score() == expected_score, (
"main score should match the expected value"
)
Copy link
Member

@Samoed Samoed Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add check that result that task result that not in cache don't load?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we can probably reuse tests/historic_results to test cache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am unsure how this would look?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I forgot that we also have tests/mock_mteb_cache. And with it, cache can be tested.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, but clearly it should never load results that are not in the cache, but I am unsure how I would test that (we kinda do it already by checking the scores)

Copy link
Member

@Samoed Samoed Nov 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For check that datasets using only-cache

import mteb
import pytest

def test_cache_hit():
    model = mteb.get_model("baseline/random-encoder-baseline")
    task = mteb.get_task("MIRACLRetrieval")
    with pytest.raises(ValueError, match="ValueError: overwrite_strategy is set to 'only-cache' and the results file exists. However there are the following missing splits (and subsets): {'dev': ['default']}. To rerun these set overwrite_strategy to 'only-missing'."):
        scores = mteb.evaluate(model, task, overwrite_strategy="only-cache")

Probably test for only-missing can be done by mocking and checking call times

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants