Maybe we should add a blog post on downsampling a dataset (notably a retrieval dataset). I imagine this is a common use case.

_Originally posted by @KennethEnevoldsen in https://github.com/embeddings-benchmark/mteb/pull/3810#pullrequestreview-3617900338_