I'd been looking for an answer for quite some time and finally figured out how to do this (it turned out to be very simple), so I'm posting my approach here in case other people want to share their insights:

import spacy

text = 'EU rejects German call to boycott British lamb.'
nlp = spacy.load('en_core_web_sm')
doc = nlp(text)
# (start_char, end_char, label) for each gold entity
offsets = [(0, 2, 'ORG'), (11, 17, 'MISC'), (34, 41, 'MISC')]
# char_span maps character offsets onto token boundaries
spans = [doc.char_span(x[0], x[1], label=x[2]) for x in offsets]
doc.spans['gold'] = spans
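
One caveat worth noting: if a character offset doesn't line up exactly with a token boundary, char_span returns None. A minimal sketch of a safer loop, assuming spaCy v3's alignment_mode argument and the doc/offsets from the snippet above:

spans = []
for start, end, label in offsets:
    # 'expand' snaps misaligned offsets outward to the nearest token boundaries;
    # the default 'strict' mode would return None instead
    span = doc.char_span(start, end, label=label, alignment_mode='expand')
    if span is not None:
        spans.append(span)
doc.spans['gold'] = spans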
  • Another approach I found was to apply only spaCy's Tokenizer:
from spacy.lang.en import English
from spacy.tokens import Doc

tokenizer = English().tokenizer      # blank English pipeline, tokenizer only
words = [token.text for token in tokenizer(text)]
doc = Doc(tokenizer.vocab, words=words)
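
If the goal is to turn these annotations into training data, one option (a sketch, assuming spaCy v3) is to let Example.from_dict do the offset-to-token alignment, reusing the same text and offsets as above:

import spacy
from spacy.training import Example

nlp = spacy.blank('en')
doc = nlp.make_doc(text)                  # tokenization only, no trained model needed
# 'entities' takes the same (start_char, end_char, label) tuples as above
example = Example.from_dict(doc, {'entities': offsets})
print(example.reference.ents)             # gold entities aligned to tokens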
