Description
download_from_hub uses load_metadata_from_hub, which downloads into the cache folder, then reads from this cache, then re-saves the result to the user-provided local directory. The same applies to load_infos_from_hub and load_problem_definitions_from_hub, and therefore also to init_streaming_from_hub, which relies on these functions.
While this works, in parallel processing it re-downloads the same files several times, which is a problem for large constant parts of the tree. The download must instead happen in one step, directly into the user-provided folder (with an overwrite option). We must then either add a fingerprint mechanism that prevents parallel threads from re-downloading, or make sure that Dataloaders never initialize with anything that triggers a download (e.g., never call init_streaming_from_hub).
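A minimal sketch of the one-step download guarded by a fingerprint, assuming a lock file created atomically with `O_CREAT | O_EXCL` and a marker file that records the fingerprint of the last completed download. The names `download_once`, `download_fn`, `.download.lock` and `.download_complete.json` are hypothetical and not part of the library; `download_fn` stands in for whatever actually fetches the files from the hub into the target folder.

```python
import json
import os
import time
from pathlib import Path
from typing import Callable

# Hypothetical file names used only by this sketch.
MARKER_NAME = ".download_complete.json"
LOCK_NAME = ".download.lock"


def _is_done(marker: Path, fingerprint: str) -> bool:
    """True if the marker exists and records the expected fingerprint."""
    try:
        return json.loads(marker.read_text()).get("fingerprint") == fingerprint
    except (FileNotFoundError, json.JSONDecodeError):
        return False


def download_once(
    local_dir: str,
    fingerprint: str,
    download_fn: Callable[[Path], None],
    overwrite: bool = False,
    poll_interval: float = 0.05,
) -> Path:
    """Download directly into local_dir, at most once per fingerprint."""
    target = Path(local_dir)
    target.mkdir(parents=True, exist_ok=True)
    marker = target / MARKER_NAME
    lock = target / LOCK_NAME

    while True:
        # Fast path: a previous call already finished with this fingerprint.
        if not overwrite and _is_done(marker, fingerprint):
            return target
        try:
            # O_CREAT | O_EXCL is atomic: only one thread/process creates the lock.
            fd = os.open(lock, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
        except FileExistsError:
            # Another worker is downloading: wait, then re-check the marker.
            time.sleep(poll_interval)
            continue
        os.close(fd)
        try:
            # Re-check under the lock: the previous holder may have just finished.
            if not overwrite and _is_done(marker, fingerprint):
                return target
            marker.unlink(missing_ok=True)
            # download_fn writes straight into the user-provided folder (no cache copy).
            download_fn(target)
            # Publish the marker atomically so readers never see a partial file.
            tmp = target / (MARKER_NAME + ".tmp")
            tmp.write_text(json.dumps({"fingerprint": fingerprint}))
            os.replace(tmp, marker)
            return target
        finally:
            lock.unlink()
```

With this design, parallel workers that call `download_once` with the same fingerprint wait for the first one and then reuse its files instead of re-downloading. A worker that crashes while holding the lock leaves a stale lock file behind, so a real implementation would need a timeout or a dedicated library such as `filelock`. `overwrite=True` should only be passed by a single caller, since every caller passing it triggers a new download.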
Besides, in PR #294, load_problem_definitions_from_hub downloads memmaps to a temp_folder, which is dangerous for later accesses, since the temporary folder may be cleaned up while the memmaps are still needed.
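A minimal sketch of the safer alternative, assuming the memmapped data is written under the same user-provided folder as the rest of the download rather than under a temporary directory. It uses the standard-library `mmap` module for illustration only; the helper name `save_and_map` and the file layout are hypothetical, and the actual memmap format used by load_problem_definitions_from_hub may differ.

```python
import mmap
from pathlib import Path


def save_and_map(local_dir: str, name: str, payload: bytes) -> mmap.mmap:
    """Write payload to a persistent file under local_dir and memory-map it read-only."""
    target = Path(local_dir) / name
    target.parent.mkdir(parents=True, exist_ok=True)
    # Persistent location: the file outlives this call and can be re-opened by path later.
    target.write_bytes(payload)  # payload must be non-empty: an empty file cannot be mapped
    with open(target, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
```

Because the file lives in the user-provided folder, later accesses (for instance a Dataloader worker re-opening it by path) do not depend on the lifetime of a temporary directory.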