Results Cache invalidation

We aim to achieve the following:

1. Add back the file writing to disk: `all_results.to_disk(results_cache_path)`
2. Add _load_results (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag --no-cache 

-----

## Quote

The reason the cache was removed was that the cache invalidation makes leaderboard development hard (but it has been a while so I am a bit uncertain - could also just be a merge)

Example workflow:
1) Someone reports a bug (model X does not appear)
2) I run the leaderboard locally to confirm (creating a cache file)
3) I fix the bug
4) I rerun the leaderboard, which loads from cache and then hides the bug fix

Seems like we have two changes here that are meaningful: 1) faster loading using a pre-computed file, and 2) faster download by reducing the file size.

To me it seems like this could be incorporated in the current `_load_results` as:

```py
# suggest that we rename to: _load_results_with_remote_cache
def _load_results(cache: ResultCache) -> BenchmarkResults:
    # check if ResultCache has a remote with a valid cached result
      # attempt to download the remote 
      # load and return
    # if not available load results and usual 
```

Problems I see from this are how we ensure that the loaded results are equivalent to what would be generated. An easy fix here would be to add `_load_results` (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag `--no-cache`.

I would ideally want these changes added to the `ResultCache` rather than to the leaderboard, as they are not leaderboard-specific and could benefit us elsewhere, but I am not entirely sure where to work them in. Maybe a general concern is that we could speed it up further down the chain without (reducing download size + loading time), and that would be a simpler solution that tackles the root problems (we e.g. have a PR open here: https://github.com/embeddings-benchmark/results/pull/364)

I am unsure how far we can get with these. I might be leaning towards adding the above-mentioned `_load_results_with_remote_cache` for now  - would love to hear your thoughts here as well.

_Originally posted by @KennethEnevoldsen in https://github.com/embeddings-benchmark/mteb/issues/3795#issuecomment-3691755031_
            

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Results Cache invalidation #3798

Quote

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Results Cache invalidation #3798

Description

Quote

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions