Yes, using blank lines to separate sentences is the format described as CoNLL 2003, which corresponds to the ner option for the converter. The example file from the repo you link to is here.

There's slightly more detail about the optional -DOCSTART- tag, which can be used to separate documents, if you look at the source here:

https://github.com/explosion/spaCy/blob/master/spacy/training/converters/conll_ner_to_docs.py#L20

I guess by the "penultimate" example you mean ner-token-per-line.iob? That doesn't have blank lines and so doesn't seem to distinguish sentences.
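To make the layout concrete, here is a rough sketch in plain Python of how blank lines and -DOCSTART- lines can be read as sentence and document boundaries. This is not spaCy's converter code: the function name read_conll_ner and the sample text are made up for illustration, and it assumes the token is in the first column and the NER tag in the last.

```python
# Illustrative sample in the CoNLL 2003 layout: one token per line,
# blank lines between sentences, -DOCSTART- lines between documents.
SAMPLE = """
-DOCSTART- -X- O O

EU NNP B-NP B-ORG
rejects VBZ B-VP O
German JJ B-NP B-MISC
call NN I-NP O

Peter NNP B-NP B-PER
Blackburn NNP I-NP I-PER

-DOCSTART- -X- O O

BRUSSELS NNP B-NP B-LOC
"""


def read_conll_ner(text):
    """Hypothetical helper: return documents -> sentences -> (token, tag)."""
    docs = [[]]      # each document is a list of sentences
    sentence = []    # tokens of the sentence currently being read
    for raw in text.splitlines():
        line = raw.strip()
        is_docstart = line.startswith("-DOCSTART-")
        if not line or is_docstart:
            # A blank line or a -DOCSTART- line closes the open sentence.
            if sentence:
                docs[-1].append(sentence)
                sentence = []
            # A -DOCSTART- line also opens a new document, unless the
            # current one is still empty.
            if is_docstart and docs[-1]:
                docs.append([])
            continue
        columns = line.split()
        # Assumption: token is the first column, NER tag is the last.
        sentence.append((columns[0], columns[-1]))
    if sentence:
        docs[-1].append(sentence)
    return [doc for doc in docs if doc]


docs = read_conll_ner(SAMPLE)
for doc_index, doc in enumerate(docs):
    print(f"document {doc_index}: {len(doc)} sentence(s)")
```

With this sketch, a file that has no blank lines at all (like ner-token-per-line.iob) would come out as a single sentence, which is why the blank lines matter if you want sentence boundaries.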

Answer selected by m-nlp-q