Converting OntoNotes (.gold_conll file format) to DocBins (.spacy file format) #12222
Hi! TLDR; How to convert .gold_conll (as seen in table below) to .spacy?
My plan is to work with the OntoNotes data in spaCy. I have retrieved the OntoNotes dataset, but I am struggling to convert it to the .spacy format. The format is similar, but not identical, to two of the sample data formats that can be converted with the python -m spacy convert CLI command (ner-token-per-line.iob and ner-token-per-line-conll2003.iob). I have tried --converter auto, conllu, and conll. Above is a table of what my data looks like, and this is where I have retrieved it from. I have been unable to find the entire dataset anywhere else, and none of the HuggingFace datasets offer 1) all the data and 2) the right format.
Hi @emiltj, have you tried using the -c option to specify the conll converter? See the docs here. E.g.:
spacy convert nameofyourfile.conll -c conll ./output/
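If the converter does not accept the file as-is, one possible workaround is to rewrite it into a token-per-line IOB layout first. The following is only a rough sketch, not a spaCy API: it assumes the .gold_conll files follow the CoNLL-2012 column layout, with the word in column 3 and bracketed named entities such as (PERSON* ... *) in column 10 (both zero-indexed). The function names, the column indices, and the sample rows are all made up for illustration, so check them against your own data.

```python
def bracket_to_bio(ner_cells):
    """Turn bracketed NER cells, e.g. ["(PERSON*", "*)", "*"], into BIO tags."""
    tags = []
    open_label = None  # label of the entity currently being continued, if any
    for cell in ner_cells:
        if cell.startswith("("):
            label = cell.strip("()*")
            tags.append("B-" + label)
            # A cell like "(GPE)" opens and closes the entity on one token.
            open_label = None if cell.endswith(")") else label
        elif open_label is not None:
            tags.append("I-" + open_label)
            if cell.endswith(")"):
                open_label = None
        else:
            tags.append("O")
    return tags


def gold_conll_to_iob(lines, word_col=3, ner_col=10):
    """Convert .gold_conll lines to 'token TAG' lines, one blank line per sentence.

    word_col and ner_col are assumptions about the column layout.
    """
    out, words, cells = [], [], []

    def flush():
        # Emit the buffered sentence, followed by a blank separator line.
        if words:
            for word, tag in zip(words, bracket_to_bio(cells)):
                out.append(f"{word} {tag}")
            out.append("")
            words.clear()
            cells.clear()

    for line in lines:
        line = line.strip()
        # Blank lines end a sentence; "#begin document" / "#end document" are skipped.
        if not line or line.startswith("#"):
            flush()
            continue
        cols = line.split()
        words.append(cols[word_col])
        cells.append(cols[ner_col])
    flush()
    return out


# Made-up sample rows in the assumed layout, not real OntoNotes data.
sample = [
    "#begin document (demo); part 000",
    "demo 0 0 John NNP * - - - - (PERSON* -",
    "demo 0 1 Smith NNP * - - - - *) -",
    "demo 0 2 visited VBD * - - - - * -",
    "demo 0 3 Oslo NNP * - - - - (GPE) -",
    "",
    "#end document",
]
iob_lines = gold_conll_to_iob(sample)
print("\n".join(iob_lines))
```

The resulting lines could then be written to a file and passed to spacy convert as in the command above. Whether the conll converter or the ner converter fits best is worth checking in the docs, since this sketch only covers the named entity column and drops the part-of-speech, parse, and coreference columns.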