IOB2 format for .spacy data conversion #7351
Hello, I want to convert my data to IOB2 format, like the penultimate data sample presented here. The converter's description says that sentences are separated by blank lines. Just to make sure, the file given to the converter should have a format like this:
"They" starts a new sentence. I am asking because I want to rule out possible sources of bugs.
Replies: 1 comment 5 replies
Yes, using blank lines to separate sentences is the format described as CoNLL 2003, or the `ner` option for the converter. The example file from the repo you link to is here.
There's slightly more detail about the optional `-DOCSTART-` tag, which can be used to separate documents, if you look at the source here: https://github.com/explosion/spaCy/blob/master/spacy/training/converters/conll_ner_to_docs.py#L20
I guess by the "penultimate" example you mean `ner-token-per-line.iob`? That doesn't have blank lines and so doesn't seem to distinguish sentences.
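To make the layout concrete, here is a minimal sketch of a token-per-line IOB2 file with blank lines between sentences and an optional `-DOCSTART-` line, plus a small stand-alone reader that groups the lines into sentences. The sample text and the `split_sentences` helper are made up for illustration; this is not spaCy's own parsing code, only a quick way to check that a file is segmented the way you expect before handing it to the converter.

```python
# Illustrative sample only: one token per line, IOB2 tag in the last column,
# a blank line between sentences, and an optional -DOCSTART- line per document.
SAMPLE = """\
-DOCSTART- -X- O O

I O
like O
London B-GPE
. O

They O
visited O
New B-GPE
York I-GPE
. O
"""


def split_sentences(text):
    """Group (token, tag) pairs into sentences (hypothetical helper).

    A blank line or a -DOCSTART- line closes the current sentence.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("-DOCSTART-"):
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        # First column is the token, last column is the IOB2 tag.
        current.append((columns[0], columns[-1]))
    if current:
        sentences.append(current)
    return sentences


sentences = split_sentences(SAMPLE)
for number, sentence in enumerate(sentences, start=1):
    print(number, [token for token, _ in sentence])
```

With this sample the reader finds two sentences, and "They" is the first token of the second one. A file without blank lines would come back as a single sentence from this helper.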