Make scripts/download_datasets.py download BEIR datasets #13

@Witiko

In the Dockerfile, we call scripts/download_datasets.py, which downloads all datasets to /var/tmp/pv211 so that they are shared by all students who use JupyterHub, saving time and disk space. For example, here we download the ARQMath datasets: 1->2. Here, the students load them: 3->4->5.

Since #3, we've supported BEIR datasets. However, the BEIR datasets are not downloaded in the Dockerfile. Instead, they are saved to and loaded from the ./datasets directory, which slows down the students and wastes disk space on duplicate copies.

Tasks

  • Download BEIR datasets to /var/tmp/pv211 in scripts/download_datasets.py.
  • Load BEIR datasets from /var/tmp/pv211 in pv211_utils.beir.loader.
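
The two tasks above could share one helper that resolves a dataset's location under /var/tmp/pv211 and downloads it only when it is missing. The following is a minimal sketch, not the project's actual code: the names `beir_dataset_dir` and `ensure_beir_dataset`, the `beir/` subdirectory layout, and the injected `download` callable are all hypothetical and chosen for illustration.

```python
from pathlib import Path
from typing import Callable, Union

# Shared location named in this issue; everything below it is an assumed layout.
SHARED_ROOT = Path("/var/tmp/pv211")


def beir_dataset_dir(name: str, root: Union[str, Path] = SHARED_ROOT) -> Path:
    """Return the (hypothetical) shared directory for one BEIR dataset."""
    return Path(root) / "beir" / name


def ensure_beir_dataset(
    name: str,
    download: Callable[[str, Path], None],
    root: Union[str, Path] = SHARED_ROOT,
) -> Path:
    """Download the dataset into the shared directory unless it already exists.

    `download(name, out_dir)` is a hypothetical callable that must create
    `out_dir / name`; the real script would plug in its own downloader here.
    """
    target = beir_dataset_dir(name, root)
    if not target.is_dir():
        # Create the parent once, then let the downloader fill in the dataset.
        target.parent.mkdir(parents=True, exist_ok=True)
        download(name, target.parent)
    return target
```

Under these assumptions, scripts/download_datasets.py would call `ensure_beir_dataset` once per dataset at image build time, and pv211_utils.beir.loader would call `beir_dataset_dir` to find the same path instead of ./datasets. The `download` callable could wrap the BEIR library's `beir.util.download_and_unzip(url, out_dir)`; passing it in as a parameter keeps the path logic testable without network access.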

Labels

enhancement (New feature or request)