Converting Huggingface dataset.datasets to DocBin #12392
Answered by thomashacker
emiltj asked this question in Help: Other Questions
How do I convert a HF dataset to DocBins? I tried:
But it is rather slow. I wonder if there's a more optimal way of doing it?
Answered by thomashacker on Mar 13, 2023
Answer selected by emiltj
Hey, thanks for your post!
Doc.from_json uses a specific JSON format and won't support whatever format load_dataset is returning. You'll need to convert the annotation to spaCy's format. If you'd like, we can provide some help with the conversion if you let us know how your annotation is structured.
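As a rough illustration of what such a conversion could look like, here is a minimal sketch. The thread does not say how the annotation is structured, so this assumes pre-tokenized rows with a `tokens` column and an integer `ner_tags` column that indexes into an IOB label list (a common layout for NER datasets). The `records` list, the `LABELS` list, and the `records_to_docbin` helper are all hypothetical stand-ins, not part of spaCy or of the original question. Building each `Doc` directly from the existing tokens avoids re-running a tokenizer or pipeline over the text, which may help with speed.

```python
import spacy
from spacy.tokens import Doc, DocBin

# Hypothetical rows standing in for a loaded Hugging Face dataset split.
# The column names and the label list are assumptions for this sketch.
LABELS = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
records = [
    {"tokens": ["Ada", "Lovelace", "lived", "in", "London"],
     "ner_tags": [1, 2, 0, 0, 3]},
    {"tokens": ["Hello", "world"], "ner_tags": [0, 0]},
]


def records_to_docbin(rows, labels, vocab):
    """Build a DocBin from pre-tokenized rows without running a tokenizer."""
    doc_bin = DocBin()
    for row in rows:
        # Map integer tag ids to IOB strings such as "B-PER" or "O".
        iob_tags = [labels[tag_id] for tag_id in row["ner_tags"]]
        # Doc accepts the words and token-level IOB tags directly.
        doc = Doc(vocab, words=row["tokens"], ents=iob_tags)
        doc_bin.add(doc)
    return doc_bin


nlp = spacy.blank("en")
doc_bin = records_to_docbin(records, LABELS, nlp.vocab)
```

The result can then be written out with `doc_bin.to_disk("train.spacy")`. If the dataset stores character offsets or a different tag scheme instead, the mapping step inside the loop would need to change accordingly.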