Unable to set entity information for token 2 which is included in more than one span in entities, blocked, missing or outside #9993
-
Hi. When I try to train an NER model, I get this error: "Unable to set entity information for token 2 which is included in more than one span in entities, blocked, missing or outside." How can I make training work when duplicate names exist?
Replies: 5 comments 10 replies
-
You can have duplicate names, that is, more than one instance of the same word in a document. But you can't have two labels on the same token: you can't say that "John" in "His name is John Smith" is both a PERSON and a LOCATION, or that "John" and "John Smith" are separate entities. If your data has conflicting information like that, it's up to you to decide how to resolve it. Usually you'll have to discard some labels using rules you decide. What is the actual code/data that gave you this error?
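To illustrate what "discard some labels using rules you decide" could look like, here is a minimal sketch in plain Python. It assumes annotations are `(start_char, end_char, label)` tuples and applies one possible rule: where annotations overlap, keep the longest one. The helper name `drop_overlapping` and the sample annotations are hypothetical, made up for this example.

```python
from typing import List, Tuple

Annotation = Tuple[int, int, str]  # (start_char, end_char, label)


def drop_overlapping(annotations: List[Annotation]) -> List[Annotation]:
    """Keep the longest annotation wherever two or more overlap.

    Ties are broken in favour of the annotation that starts first,
    then by input order. This is one possible rule, not the only one.
    """
    # Rank by length (longest first), then by start offset.
    ranked = sorted(annotations, key=lambda a: (-(a[1] - a[0]), a[0]))
    kept: List[Annotation] = []
    for start, end, label in ranked:
        # Half-open ranges are disjoint if one ends before the other starts.
        if all(end <= k_start or start >= k_end for k_start, k_end, _ in kept):
            kept.append((start, end, label))
    return sorted(kept)


# Hypothetical annotations for the sentence used above.
text = "His name is John Smith"
annotations = [
    (12, 16, "PERSON"),    # "John"
    (12, 22, "PERSON"),    # "John Smith", overlaps the span above
    (12, 16, "LOCATION"),  # "John" again, with a conflicting label
]
resolved = drop_overlapping(annotations)
print(resolved)
```

If you already have `Span` objects rather than raw offsets, spaCy's `spacy.util.filter_spans` applies a similar longest-span-first rule, so you may not need a custom helper at all. Whether "longest wins" is the right rule depends on your data.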
-
Hi, I get the error at the doc_to_bin step:

from spacy.tokens import DocBin

def get_doc_bin(training_data, nlp):
    # The DocBin will store the example documents
    db = DocBin()
    for text, annotations in training_data:
        doc = nlp(text.lower())  # Construct a Doc object
        ents = []
        for start, end, label in annotations:
            span = doc.char_span(start, end, label=label)
            # char_span returns None when the offsets do not align with token boundaries
            if span is not None:
                ents.append(span)
        doc.ents = ents  # This assignment fails when spans in ents overlap
        db.add(doc)
    return db

Entity list: Entity list after docbin: My config file. My raw data
-
Hmm, I'll try using rules later.
-
Thanks, @ljvmiranda921 and @polm, for your help. After creating an entity_ruler, everything works.
-
I am facing the same issue in a nested NER problem that I am working on. Is there any way to put multiple entity tags on a single token? E.g., something like this.