Loading (any) HuggingFace dataset using MLCroissant #839
camelia-tfds asked this question in Q&A (Unanswered)
Replies: 1 comment
Have you tried using the croissant url as I believe the jsonld argument to the |
I noticed that, at least when loading datasets with "/workspaces/croissant/python/mlcroissant/mlcroissant/scripts/load.py", we point to the JSON metadata of datasets that live under one of the subdirectories of "/workspaces/croissant/datasets" (0.8, 1.0, or 1.1).
Those directories contain a number of datasets, but they do not seem to cover everything available on HuggingFace (https://huggingface.co/datasets?modality=modality:image&sort=trending).
How does one load a HuggingFace dataset locally if it is not referenced in the codebase?
Does every dataset loaded with mlcroissant need to have its metadata under "/workspaces/croissant/datasets", or is there another way to pass dataset metadata paths to the script?
Thanks in advance for any guidance!
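For anyone landing here with the same question, here is a minimal sketch of the approach hinted at in the reply, not a confirmed answer from the maintainers. It assumes the Hugging Face Hub serves Croissant JSON-LD at `/api/datasets/<id>/croissant` and that `mlcroissant.Dataset` accepts that URL through its `jsonld` argument. The helper names `croissant_url` and `load_records` are hypothetical and exist only for illustration.

```python
from itertools import islice
from urllib.parse import quote


def croissant_url(dataset_id: str) -> str:
    """Build the (assumed) Hub endpoint that returns Croissant JSON-LD."""
    # Keep "/" unescaped so namespaced IDs such as "org/name" stay intact.
    return f"https://huggingface.co/api/datasets/{quote(dataset_id, safe='/')}/croissant"


def load_records(dataset_id: str, record_set: str, limit: int = 3) -> list:
    """Return up to `limit` records from one record set (needs network access)."""
    # Imported lazily so croissant_url() works without mlcroissant installed.
    import mlcroissant as mlc

    # Pass the remote URL where a local metadata path would otherwise go.
    dataset = mlc.Dataset(jsonld=croissant_url(dataset_id))
    return list(islice(dataset.records(record_set=record_set), limit))
```

With network access and mlcroissant installed, something like `load_records("<dataset id>", "<record set name>")` would then stream a few records. Record set names vary per dataset, so inspect `dataset.metadata.record_sets` first rather than guessing one.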