
[Storage] "from_hub" improvements #296

@casenave

download_from_hub uses load_metadata_from_hub, which downloads into the cache folder, reads back from that cache, and then re-saves to the user-provided local-dir. The same holds for load_infos_from_hub and load_problem_definitions_from_hub, and for init_streaming_from_hub, which relies on these functions.
While this works, parallel processing re-downloads the same data several times, which is a problem for large constant parts of the tree. The download should happen in a single step, directly into the user-provided folder (with an overwrite option). We must either add a fingerprint mechanism that prevents parallel threads from re-downloading, or be very careful that Dataloaders never initialize with anything that triggers a download (e.g. never call init_streaming_from_hub).
Besides, in PR #294, load_problem_definitions_from_hub downloads memmaps to a temp_folder, which is dangerous for later accesses: the memmaps may still point into that folder after it has been cleaned up.
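
To make the fingerprint idea concrete, here is a minimal sketch of a "download once" guard. It is not the project's implementation: the names fingerprint, download_once and download_fn are hypothetical, the actual hub call is left to the injected download_fn callable, and the lock is a plain lock file created atomically with the standard library.

```python
import hashlib
import os
import shutil
import time
from pathlib import Path
from typing import Callable


def fingerprint(repo_id: str, revision: str) -> str:
    """Hypothetical fingerprint: a short hash of what is being downloaded."""
    return hashlib.sha256(f"{repo_id}@{revision}".encode()).hexdigest()[:16]


def download_once(
    local_dir: Path,
    fp: str,
    download_fn: Callable[[Path], None],
    overwrite: bool = False,
    poll: float = 0.05,
    timeout: float = 600.0,
) -> bool:
    """Run download_fn(local_dir) at most once per fingerprint.

    Returns True if this call performed the download, False if a completed
    download with the same fingerprint was already present.
    """
    local_dir = Path(local_dir)
    marker = local_dir / f".complete-{fp}"  # written only after success
    lock = local_dir.with_name(local_dir.name + ".lock")
    local_dir.parent.mkdir(parents=True, exist_ok=True)

    # Fast path: a finished download exists and no overwrite was requested.
    if marker.exists() and not overwrite:
        return False

    # Acquire the lock: O_CREAT | O_EXCL fails if the file already exists,
    # so only one thread or process gets past this loop at a time.
    deadline = time.monotonic() + timeout
    while True:
        try:
            os.close(os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError(f"could not acquire {lock}")
            time.sleep(poll)

    try:
        # Re-check under the lock: another worker may have finished meanwhile.
        if marker.exists() and not overwrite:
            return False
        if local_dir.exists():
            shutil.rmtree(local_dir)  # drop partial or outdated content
        local_dir.mkdir(parents=True)
        download_fn(local_dir)  # single step, straight into the user folder
        marker.touch()
        return True
    finally:
        os.unlink(lock)
```

With this guard, workers that call download_once concurrently for the same folder and fingerprint wait for the first one and then reuse its result. Two caveats of the sketch: overwrite=True forces a re-download in every caller that sets it, so it should be set by a single caller only, and a lock file left behind by a crashed process has to be removed by hand.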
