Ah, looking again at the debug data output, I think you've provided your entity labels in an incorrect format. You want the character spans to cover the whole entity, and you don't include the B-/I- prefix when you specify the entity span:

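# Illustrative only: these offsets do not line up with this shortened placeholder string
s = "blah blah blah ... Acropora Seriatopora blah blah blah"
# One (start_char, end_char, label) tuple covers the whole entity, with no B-/I- prefix
annotation = {"entities": [(38, 59, "LIVB")]}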

When you're converting from character offsets, you don't provide the IOB or BILUO tags; you just provide the top-level label for the whole span as one unit. With what you have, it's trying to learn I-LIVB as one entity type and B-LIVB as another entity type, which isn't what you want. That would explain why it's not handling the whitespace like I …
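If your existing annotations are per-token spans tagged with B-/I- prefixes, something like the following could collapse them into whole-entity spans before training. This is only a rough sketch in plain Python, not part of spaCy: `merge_iob_spans` is a hypothetical helper name, and it assumes each input item is a `(start_char, end_char, tag)` tuple where an `I-` tag continues the immediately preceding entity with the same label.

```python
def merge_iob_spans(spans):
    """Merge per-token (start, end, "B-X"/"I-X") spans into (start, end, "X") spans.

    Hypothetical helper for illustration; not a spaCy function.
    """
    merged = []
    for start, end, tag in sorted(spans):
        prefix, sep, rest = tag.partition("-")
        if sep and prefix in ("B", "I"):
            label = rest
        else:
            # No IOB prefix: treat the tag as a plain label that starts an entity.
            prefix, label = "B", tag
        if prefix == "I" and merged and merged[-1][2] == label:
            # Continuation token: stretch the previous entity to this token's end.
            merged[-1] = (merged[-1][0], end, label)
        else:
            merged.append((start, end, label))
    return merged


text = "blah blah blah ... Acropora Seriatopora blah blah blah"

# Compute token offsets from the text rather than hard-coding them.
first = text.index("Acropora")
second = text.index("Seriatopora")
token_spans = [
    (first, first + len("Acropora"), "B-LIVB"),
    (second, second + len("Seriatopora"), "I-LIVB"),
]

entities = merge_iob_spans(token_spans)
annotation = {"entities": entities}

# Each merged span should slice out the full entity text, whitespace included.
for start, end, label in entities:
    print(repr(text[start:end]), label)
```

The merged span runs from the start of the first token to the end of the last one, so the whitespace between tokens falls inside the entity and the label is just `LIVB`.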

Replies: 10 comments 1 reply

Answer selected by ines
Labels: usage (General spaCy usage), feat / ner (Feature: Named Entity Recognizer)
4 participants

This discussion was converted from issue #6349 on December 11, 2020 01:07.