ValueError: [E199] Unable to merge 0-length span at doc[23:23] -- Problem with Spacy Custom Spacy Components
#11324
How to reproduce the behaviour

I have written a custom spaCy NER function, shown below, which I add to my pipeline. It seems to work well in most instances, but while debugging a few cases in my code I came across this weird bug.

While debugging my own code, I instantiated a new spaCy model without adding my custom component:

`nlp_spacy = spacy.load('en_core_web_sm', disable = ['ner'])` (...)

Then I ran the 'problem' text through the model:

`doc = nlp_spacy(': the noted that in late january he began to have a return of bloating and looser stools, but not diarrhea')`

I then used this doc to debug my custom component. When I run my custom component once, I receive the above `ValueError: [E199] Unable to merge 0-length span` error. If I run the custom component again, everything seems to work fine: no error, etc.

`@Language.component("Multi_Word_NER")`
`def Multi_Word_NER(doc):`
What would be the cause of this? I don't understand how this could be an issue on my end. For now, as a workaround, it seems that if I wrap the retokenize section of the function in a try statement, the issue doesn't occur. Attached are two pictures showing the results of the first run and then the second.
Replies: 1 comment 4 replies
As the error states, you can't merge a span of zero length into a new token: it's unclear what that should do, other than nothing. Here's a shorter example that gives the same error:
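A minimal sketch of such a reproduction, assuming spaCy v3 is installed. The blank pipeline, the sample text, and the variable names are illustrative stand-ins, not the snippet originally posted in this reply:

```python
import spacy

# A blank English pipeline is enough here: only the tokenizer is needed.
nlp = spacy.blank("en")
doc = nlp("looser stools, but not diarrhea")

# start == end, so this span covers no tokens at all.
empty_span = doc[2:2]

error_message = None
try:
    with doc.retokenize() as retokenizer:
        # Merging a span with no tokens is rejected by the retokenizer.
        retokenizer.merge(empty_span)
except ValueError as err:
    error_message = str(err)

print(error_message)
```

The slice `doc[2:2]` is valid Python and valid spaCy, which is why the problem only shows up at merge time rather than when the span is created.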
I am not exactly sure how you are getting zero-length spans in your code, but you can check spans before merging to see if they have zero length and skip them.
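The check described above could look roughly like this. It is a sketch under assumptions: `candidate_spans` is a hypothetical list standing in for whatever spans the custom component collects, and the blank pipeline and sample text are only for illustration.

```python
import spacy

nlp = spacy.blank("en")
doc = nlp("looser stools, but not diarrhea")

# Hypothetical candidates; the middle one is deliberately empty.
candidate_spans = [doc[0:2], doc[3:3], doc[4:6]]

# len(span) is the number of tokens in the span, so this drops empty spans.
mergeable = [span for span in candidate_spans if len(span) > 0]

with doc.retokenize() as retokenizer:
    for span in mergeable:
        retokenizer.merge(span)

print([token.text for token in doc])
```

Compared with wrapping the retokenize section in a try statement, filtering first keeps other `ValueError`s (for example, overlapping spans) visible instead of silently swallowing them.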
Also, to make it easier for us to help you, if you'd like to share more code, please read the GitHub Markdown guide. Don't share screenshots of code or errors; paste them as text.