We don't have a built-in function to go from a DocBin (.spacy file) to the simple TRAINING_DATA format, but it's simple enough to do yourself in a few lines:

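import spacy
from spacy.tokens import DocBin

# A blank pipeline is enough: only its vocab is needed to rebuild the docs.
nlp = spacy.blank("en")
training_data = []

# Load the serialized docs and collect (text, entity offsets) pairs.
db = DocBin().from_disk("train.spacy")
for doc in db.get_docs(nlp.vocab):
    # Each entity becomes a (start_char, end_char, label) tuple.
    annotations = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
    training_data.append((doc.text, annotations))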

There's nothing special about the simple training format. The example uses it because you probably don't have .spacy files already, so you'll need to convert whatever other annotations you have; it's just an example of how to do that with relatively …
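For reference, the opposite direction (simple format to a .spacy file) can be sketched roughly as below. This is a minimal sketch, not the exact code from the example being discussed: the helper name `to_docbin`, the sample sentence, and the temporary file path are made up for illustration, and it assumes character offsets that line up with token boundaries.

```python
import os
import tempfile

import spacy
from spacy.tokens import DocBin


def to_docbin(training_data, nlp):
    """Hypothetical helper: build a DocBin from (text, [(start, end, label)]) pairs."""
    db = DocBin()
    for text, annotations in training_data:
        doc = nlp(text)
        ents = []
        for start, end, label in annotations:
            # char_span returns None when the offsets don't match token boundaries.
            span = doc.char_span(start, end, label=label)
            if span is not None:
                ents.append(span)
        doc.ents = ents
        db.add(doc)
    return db


nlp = spacy.blank("en")
# Illustrative data only, not taken from the original example.
training_data = [
    ("Apple is opening an office in Berlin.", [(0, 5, "ORG"), (30, 36, "GPE")]),
]

# Write to a temporary .spacy file, then read it back with the loop shown above.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "train.spacy")
    to_docbin(training_data, nlp).to_disk(path)

    restored = []
    for doc in DocBin().from_disk(path).get_docs(nlp.vocab):
        ents = [(ent.start_char, ent.end_char, ent.label_) for ent in doc.ents]
        restored.append((doc.text, ents))
```

Note that annotations whose offsets don't align with tokens are silently skipped in this sketch; in real data you would probably want to log or fix those rather than drop them.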

Answer selected by MagiCsito