Hi @RandomArnab ,

In spaCy v3, we use serialized .spacy files instead of JSONL. To prepare training data for span categorization, you need to assign your labelled spans to the doc.spans attribute under a key such as "sc" (the default spans_key for the spancat component). Here's an example for the sentence "Welcome to the Bank of China.":

import spacy
from spacy import displacy
from spacy.tokens import Span

text = "Welcome to the Bank of China."

nlp = spacy.blank("en")
doc = nlp(text)

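# Span(doc, start, end, label) takes token indices; the end index is exclusive.
# Unlike doc.ents, spans in doc.spans are allowed to overlap.
doc.spans["sc"] = [
    Span(doc, 3, 6, "ORG"),  # "Bank of China"
    Span(doc, 5, 6, "GPE"),  # "China"
]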
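If you want to double-check the token offsets before training, a quick sketch like the one below can help. It rebuilds the same doc, lists the text and label of each span, and also shows `Doc.char_span` as an alternative when your annotations are character offsets rather than token indices. The variable names `labelled`, `start` and `org` are only illustrative.

```python
import spacy
from spacy.tokens import Span

nlp = spacy.blank("en")
doc = nlp("Welcome to the Bank of China.")

# Same spans as in the example above (token indices, end-exclusive).
doc.spans["sc"] = [Span(doc, 3, 6, "ORG"), Span(doc, 5, 6, "GPE")]

# Inspect what each span actually covers.
labelled = [(span.text, span.label_) for span in doc.spans["sc"]]
print(labelled)

# Alternative: build a span from character offsets.
# char_span returns None if the offsets do not align with token boundaries.
start = doc.text.index("Bank of China")
org = doc.char_span(start, start + len("Bank of China"), label="ORG")
print(org.start, org.end)
```

If `char_span` returns `None`, the character offsets cut through a token, which usually means the annotation and the tokenizer disagree.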

To serialize the annotated docs into a .spacy file, collect the Doc objects in a DocBin and call its to_disk() method. Something like this:

from spacy.tokens import DocBin

doc_bin = DocBin(docs=my…
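The snippet above is cut off in this copy of the thread. As a rough sketch of the serialization step (not the original answer's code), something along these lines should work. The helper `make_doc`, the one-example dataset, and the file name `train.spacy` are all assumptions made up for illustration; it writes to a temporary directory so it can be run as-is.

```python
import tempfile
from pathlib import Path

import spacy
from spacy.tokens import DocBin, Span

nlp = spacy.blank("en")


def make_doc(text, spans):
    """Hypothetical helper: build a Doc and attach (start, end, label) token spans."""
    doc = nlp(text)
    doc.spans["sc"] = [Span(doc, start, end, label) for start, end, label in spans]
    return doc


# Illustrative one-example dataset; real training data would have many docs.
docs = [make_doc("Welcome to the Bank of China.", [(3, 6, "ORG"), (5, 6, "GPE")])]

# Collect the docs and write them to disk (the path is an assumption).
doc_bin = DocBin(docs=docs)
out_path = Path(tempfile.mkdtemp()) / "train.spacy"
doc_bin.to_disk(out_path)

# Load the file back to confirm the spans were stored with the docs.
reloaded = list(DocBin().from_disk(out_path).get_docs(nlp.vocab))
loaded_spans = [(span.text, span.label_) for span in reloaded[0].spans["sc"]]
print(loaded_spans)
```

Reading the file back like this is an easy sanity check that the span groups made it into the .spacy file before you point a training config at it.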

Answer selected by ljvmiranda921