spaCy 3.0 merge_entities not merging entities when predicting from trained custom NER model (trained new model on CoNLL data using CLI) #7753
How to reproduce the behaviour

import pandas as pd
from tqdm import tqdm
import spacy
from spacy.tokens import DocBin
import json
from datetime import datetime

nlp = spacy.load("./output/model-best")  # ner model trained on conll data
print(nlp.pipe_names)
# ['tok2vec', 'ner']
nlp.add_pipe("merge_entities")  # added merge_entities pipeline
print(nlp.pipe_names)
# ['tok2vec', 'ner', 'merge_entities']
texts = ["I live in New York USA",
         "Steve Smith is a cricketer"]
results = []
index = 0
print("Starting Spacy Ner")
now = datetime.now()
end_count = 1
n = 0
while n < end_count:
    n += 1
    for doc in nlp.pipe(texts):
        index += 1
        for ent in doc.ents:
            temp = [index, ent.label_, ent.text, ent.start_char, ent.end_char]
            results.append(temp)
elapsed_time = datetime.now() - now
print("Ner done", elapsed_time)
results
I am expecting that New York and Steve Smith should each come out as one word, because we have used merge_entities in the pipeline.

Your Environment

Spacy 3.0.5
Replies: 1 comment 7 replies
Hi, it looks like your training data wasn't in the right format. The labels (ent.label_) should be LOC and PER, without the IOB part of the tag. Your model seems to have learned the individual IOB tags with tokens as the spans instead of learning the entities as whole spans, which explains why you don't see any merging: it thinks each token is a separate entity.
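
To make the difference concrete, here is a minimal plain-Python sketch of grouping token-level IOB tags into whole-entity spans with bare labels. It is only an illustration, not spaCy's own conversion code: `iob_to_spans` is a hypothetical helper, and the tags below are assumed for the two example sentences rather than taken from the actual training data.

```python
from typing import List, Optional, Tuple

# (start token index, end token index (exclusive), bare label)
Span = Tuple[int, int, str]


def iob_to_spans(tags: List[str]) -> List[Span]:
    """Group token-level IOB tags into whole-entity spans.

    Hypothetical helper for illustration; labels lose their B-/I- prefix.
    """
    spans: List[Span] = []
    start: Optional[int] = None
    label: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, name = tag.partition("-")
        # Close the open span when the entity ends or a different one begins.
        if start is not None and (prefix != "I" or name != label):
            spans.append((start, i, label))
            start, label = None, None
        # Open a new span on B-, or on a stray I- with no span open.
        if prefix in ("B", "I") and start is None:
            start, label = i, name
    if start is not None:
        spans.append((start, len(tags), label))
    return spans


# Assumed tags for the two example sentences from the question.
examples = [
    (["I", "live", "in", "New", "York", "USA"],
     ["O", "O", "O", "B-LOC", "I-LOC", "B-LOC"]),
    (["Steve", "Smith", "is", "a", "cricketer"],
     ["B-PER", "I-PER", "O", "O", "O"]),
]

for words, tags in examples:
    entities = [(" ".join(words[s:e]), lab) for s, e, lab in iob_to_spans(tags)]
    print(entities)
# [('New York', 'LOC'), ('USA', 'LOC')]
# [('Steve Smith', 'PER')]
```

If the training data had this shape, with each entity as one multi-token span labelled LOC or PER, there would be multi-token entities for merge_entities to merge. When every token is instead its own span labelled with the full IOB tag, there is nothing left to merge.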