-
Notifications
You must be signed in to change notification settings - Fork 551
Description
We aim to achieve the following:
- Add back the file writing to disk:
all_results.to_disk(results_cache_path) - Add _load_results (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag --no-cache
Quote
The reason the cache was removed was that the cache invalidation makes leaderboard development hard (but it has been a while so I am a bit uncertain - could also just be a merge)
Example workflow:
- Someone reports a bug (model X does not appear)
- I run the leaderboard locally to confirm (creating a cache file)
- I fix the bug
- I rerun the leaderboard, which loads from cache and then hides the bug fix
Seems like we have two changes here that are meaningful: 1) faster loading using a pre-computed file, and 2) faster download by reducing the file size.
To me it seems like this could be incorporated in the current _load_results as:
# suggest that we rename to: _load_results_with_remote_cache
def _load_results(cache: ResultCache) -> BenchmarkResults:
# check if ResultCache has a remote with a valid cached result
# attempt to download the remote
# load and return
# if not available load results and usual Problems I see from this are how we ensure that the loaded results are equivalent to what would be generated. An easy fix here would be to add _load_results (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag --no-cache.
I would ideally want these changes added to the ResultCache rather than to the leaderboard, as they are not leaderboard-specific and could benefit us elsewhere, but I am not entirely sure where to work them in. Maybe a general concern is that we could speed it up further down the chain without (reducing download size + loading time), and that would be a simpler solution that tackles the root problems (we e.g. have a PR open here: embeddings-benchmark/results#364)
I am unsure how far we can get with these. I might be leaning towards adding the above-mentioned _load_results_with_remote_cache for now - would love to hear your thoughts here as well.
Originally posted by @KennethEnevoldsen in #3795 (comment)