fix: issues on cache hits #3558
base: main
Conversation
Found a couple of issues where the cache is not hit:
- The condition `overwrite_strategy == "only-missing" and overwrite_strategy == OverwriteStrategy.ONLY_MISSING` can never be met, since a single value never equals both the plain string and the enum member (so we always reran all splits). A sketch of the fix follows this list.
- The remote cache path is currently not specified correctly (`remote` vs `remote/results`), which means the remote cache is never hit.
- Added a check for whether a merge is required. It is often the case that the new results don't need to be merged (because the existing results already include all the splits), yet we still reran them if the version did not match.
- Improved the error messages produced when a merge fails.
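For illustration, a minimal sketch of the first fix. The enum here is a hypothetical stand-in for mteb's OverwriteStrategy (names and values assumed, not copied from the codebase); the point is that normalising the input to the enum once makes a single comparison suffice:

import enum


class OverwriteStrategy(enum.Enum):  # hypothetical stand-in for mteb's enum
    ONLY_MISSING = "only-missing"
    ONLY_CACHE = "only-cache"


def is_only_missing(overwrite_strategy: str | OverwriteStrategy) -> bool:
    # Buggy check: `x == "only-missing" and x == OverwriteStrategy.ONLY_MISSING`
    # is always False, because a plain string never equals a (non-str) enum
    # member and vice versa, so the two equalities can't hold at once.
    # Coercing to the enum first removes the ambiguity.
    return OverwriteStrategy(overwrite_strategy) is OverwriteStrategy.ONLY_MISSING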
Samoed left a comment:
Should we add tests for this?
Yep probably

Added test for cache re-use. Did not add tests for remote cache.
cached_results = mteb.evaluate(
    model, task, cache=cache, overwrite_strategy="only-cache"
)
cached_result = cached_results[0]
assert cached_result.task_name == task.metadata.name, (
    "results should match the task"
)
assert cached_result.get_score() == expected_score, (
    "main score should match the expected value"
)
Can we add a check that task results that are not in the cache don't load?
Also, we can probably reuse tests/historic_results to test the cache.
I am unsure how this would look.
Oh, I forgot that we also have tests/mock_mteb_cache; the cache can be tested with it.
Right, but clearly it should never load results that are not in the cache, but I am unsure how I would test that (we kinda do it already by checking the scores)
To check that tasks use only the cache with only-cache:
import re

import mteb
import pytest


def test_cache_hit():
    model = mteb.get_model("baseline/random-encoder-baseline")
    task = mteb.get_task("MIRACLRetrieval")
    # pytest's `match` argument is a regex, so the braces and parentheses in
    # the message must be escaped; the exception class name is not part of
    # str(exception), so it is dropped from the pattern.
    expected = re.escape(
        "overwrite_strategy is set to 'only-cache' and the results file exists. "
        "However there are the following missing splits (and subsets): "
        "{'dev': ['default']}. To rerun these set overwrite_strategy to 'only-missing'."
    )
    with pytest.raises(ValueError, match=expected):
        mteb.evaluate(model, task, overwrite_strategy="only-cache")

Probably a test for only-missing can be done by mocking and checking call counts (see the sketch below).
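A rough sketch of what that mock-based only-missing test could look like. It assumes `cache` is the same fixture used in the cache re-use test above, and patching the task's evaluate method is an assumption about where reruns enter mteb's internals, not a confirmed detail:

from unittest.mock import patch

import mteb


def test_only_missing_reuses_cache(cache):
    model = mteb.get_model("baseline/random-encoder-baseline")
    task = mteb.get_task("MIRACLRetrieval")

    mteb.evaluate(model, task, cache=cache)  # first run populates the cache

    # Spy on the task's evaluate entry point (assumed rerun path): with every
    # split already cached, "only-missing" should not trigger a single rerun.
    with patch.object(task, "evaluate", wraps=task.evaluate) as spy:
        mteb.evaluate(model, task, cache=cache, overwrite_strategy="only-missing")
        spy.assert_not_called()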