Cannot convert json data to spacy format #6342
-
How to reproduce the behaviour
Your Environment
Hi, I'm trying to convert some JSON training data into spaCy format using the convert command, so that I can try fine-tuning a pre-trained transformer model with the nightly version of spaCy. I have already used this data successfully to fine-tune a scispaCy NER model, so I know the format is valid for earlier versions of spaCy. When running the command above, I get the following result:
You can find the file dev.json attached.
-
The JSON converter is for a specific spaCy JSON training format, described here: https://spacy.io/api/annotation#json-input
See this answer for how to convert from the simple training format directly to DocBin (.spacy) for v3: https://stackoverflow.com/a/64677899/461847
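For reference, here is a minimal sketch of such a conversion in spaCy v3. It is not necessarily the code from the linked answer. It assumes the data is already in the simple training format, i.e. a list of (text, {"entities": [(start, end, label), ...]}) pairs with character offsets. The example sentence, the CHEMICAL label, the blank English pipeline, the helper name to_docbin and the output path train.spacy are all hypothetical, chosen for illustration. Doc.char_span returns None when the offsets do not line up with token boundaries, so such entities are reported and skipped instead of being silently mislabelled.

```python
import spacy
from spacy.tokens import DocBin

# Hypothetical example data in the simple training format:
# (text, {"entities": [(start_char, end_char, label), ...]})
TRAIN_DATA = [
    ("Aspirin reduces fever.", {"entities": [(0, 7, "CHEMICAL")]}),
]


def to_docbin(data, lang="en"):
    """Convert simple-format NER data into a DocBin (sketch for spaCy v3)."""
    nlp = spacy.blank(lang)  # tokenizer only; no trained components needed
    db = DocBin()
    for text, annotations in data:
        doc = nlp.make_doc(text)
        ents = []
        for start, end, label in annotations["entities"]:
            # char_span returns None if the offsets don't match token boundaries
            span = doc.char_span(start, end, label=label)
            if span is None:
                print(f"Skipping misaligned entity {(start, end, label)} in {text!r}")
            else:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


if __name__ == "__main__":
    db = to_docbin(TRAIN_DATA)
    db.to_disk("./train.spacy")  # hypothetical output path
```

The resulting .spacy file can then be used as a training or dev corpus path in a v3 training config.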
-
Thank you for your answer. I've just tried it on my data and got:
ValueError: [E973] Unexpected type for NER data
Any idea?
-
Make sure your data is in the simple training format, as shown in this example: https://github.com/explosion/spaCy/blob/45c9a688285081cd69faa0627d9bcaf1f5e799a1/examples/training/train_ner.py
I think you need tuples instead of lists for the entity spans so that the annotations are detected correctly by…
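Since JSON has no tuple type, entity spans that were saved to a .json file come back from json.load as lists. A minimal sketch of restoring them to tuples before conversion, assuming the file holds a list of [text, {"entities": [[start, end, label], ...]}] pairs; the function names load_simple_ner_json and normalize_entities are hypothetical:

```python
import json


def normalize_entities(raw):
    """Turn JSON-loaded [start, end, label] lists back into tuples."""
    data = []
    for text, annotations in raw:
        ents = [tuple(ent) for ent in annotations.get("entities", [])]
        # Keep any other annotation keys, replacing only "entities"
        data.append((text, {**annotations, "entities": ents}))
    return data


def load_simple_ner_json(path):
    """Load simple-format NER data from a JSON file with tuple spans restored."""
    with open(path, encoding="utf8") as f:
        raw = json.load(f)
    return normalize_entities(raw)
```

The normalized list can then be passed to whatever builds the training documents, so each entity is seen as a (start, end, label) tuple rather than a list.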
-
It works like a charm. Thank you very much.
-
test_logs_auth.txt