Skip to content

Results Cache invalidation #3798

@isaac-chung

Description

@isaac-chung

We aim to achieve the following:

  1. Add back the file writing to disk: all_results.to_disk(results_cache_path)
  2. Add _load_results (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag --no-cache

Quote

The reason the cache was removed was that the cache invalidation makes leaderboard development hard (but it has been a while so I am a bit uncertain - could also just be a merge)

Example workflow:

  1. Someone reports a bug (model X does not appear)
  2. I run the leaderboard locally to confirm (creating a cache file)
  3. I fix the bug
  4. I rerun the leaderboard, which loads from cache and then hides the bug fix

Seems like we have two changes here that are meaningful: 1) faster loading using a pre-computed file, and 2) faster download by reducing the file size.

To me it seems like this could be incorporated in the current _load_results as:

# suggest that we rename to: _load_results_with_remote_cache
def _load_results(cache: ResultCache) -> BenchmarkResults:
    # check if ResultCache has a remote with a valid cached result
      # attempt to download the remote 
      # load and return
    # if not available load results and usual 

Problems I see from this are how we ensure that the loaded results are equivalent to what would be generated. An easy fix here would be to add _load_results (without cache), which just loads the results. This function is then used in the results repo to generate the cache as well. For debugging, we add the flag --no-cache.

I would ideally want these changes added to the ResultCache rather than to the leaderboard, as they are not leaderboard-specific and could benefit us elsewhere, but I am not entirely sure where to work them in. Maybe a general concern is that we could speed it up further down the chain without (reducing download size + loading time), and that would be a simpler solution that tackles the root problems (we e.g. have a PR open here: embeddings-benchmark/results#364)

I am unsure how far we can get with these. I might be leaning towards adding the above-mentioned _load_results_with_remote_cache for now - would love to hear your thoughts here as well.

Originally posted by @KennethEnevoldsen in #3795 (comment)

Metadata

Metadata

Assignees

Labels

enhancementNew feature or request

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions