spaCy 3.0: merge_entities not merging entities when predicting with a custom-trained NER model (new model trained on CoNLL data using the CLI) #7753

Ashufet · 2021-04-12T13:27:50Z

How to reproduce the behaviour

import spacy
from datetime import datetime

nlp = spacy.load("./output/model-best") # ner model trained on conll data
print(nlp.pipe_names)
# ['tok2vec', 'ner']

nlp.add_pipe("merge_entities") # added merge_entities pipeline

print(nlp.pipe_names)
# ['tok2vec', 'ner', 'merge_entities']

texts = ["I live in New York USA",
        "Steve Smith is a cricketer"]

results = []

print("Starting spaCy NER")
started_at = datetime.now()

# Collect one row per predicted entity:
# [text index (1-based), label, entity text, start char, end char]
for index, doc in enumerate(nlp.pipe(texts), start=1):
    for ent in doc.ents:
        results.append([index, ent.label_, ent.text, ent.start_char, ent.end_char])

elapsed_time = datetime.now() - started_at
print("NER done", elapsed_time)

results

[[1, 'I-LOC', 'New', 10, 13],
 [1, 'I-LOC', 'York', 14, 18],
 [1, 'I-LOC', 'USA', 19, 22],
 [2, 'I-PER', 'Steve', 0, 5],
 [2, 'I-PER', 'Smith', 6, 11]]

I expected New York and Steve Smith to each come out as one merged entity, because merge_entities was added to the pipeline.
Please let me know if I am doing anything wrong.

Your Environment

Spacy 3.0.5

Answered by adrianeboyd

adrianeboyd · 2021-04-12T14:05:35Z

Hi, it looks like your training data wasn't in the right format. The labels (ent.label_) should be LOC and PER, without the IOB part of the tag. Your model seems to have learned the individual IOB tags with tokens as the spans instead of learning the entities as whole spans, which explains why you don't see any merging: it thinks each token is a separate entity.
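A quick way to spot this symptom is to check whether the predicted labels still carry a tagging-scheme prefix. The sketch below is only a heuristic: the helper name `labels_with_tag_prefix` and the prefix list are assumptions made for illustration, not part of spaCy. In a real pipeline the input could be `[ent.label_ for ent in doc.ents]`.

```python
# Prefixes used by IOB/BILUO tagging schemes. This list is an assumption for
# the sketch; a genuine label that happens to start with "B-" would be flagged.
TAG_PREFIXES = ("B-", "I-", "L-", "U-")


def labels_with_tag_prefix(labels):
    """Return the distinct labels that still start with a tagging prefix."""
    return sorted({label for label in labels if label.startswith(TAG_PREFIXES)})


# Labels taken from the output posted in the question above.
predicted = ["I-LOC", "I-LOC", "I-LOC", "I-PER", "I-PER"]
print(labels_with_tag_prefix(predicted))       # ['I-LOC', 'I-PER']

# Labels as they should look after retraining on correctly formatted data.
print(labels_with_tag_prefix(["LOC", "PER"]))  # []
```

A non-empty result suggests the model was trained on per-token tags rather than whole entity spans.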

7 replies

Ashufet Apr 12, 2021
Author

Thanks, I will update the training data and train the model again.

But I have one question: if we train on each complete entity as one unit, what is the use of the merge_entities component? The model will already predict the complete entity as one, since that is how it learned it.

adrianeboyd Apr 12, 2021

The entity prediction (provided as spans in doc.ents) does not modify the tokenization of the doc (inspect [t.text for t in doc]), so John and Smith would still be two separate tokens even though there's an entity John Smith.

Using merge_entities will modify the tokens so that John Smith is one token in the doc instead.
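The difference can be seen without a trained model. The following is a minimal sketch that assumes spaCy 3.x is installed; it builds a Doc with a hand-set entity (instead of a model prediction) and applies the merge_entities component directly, so the sentence and the PER label are illustrative only.

```python
import spacy
from spacy.tokens import Doc

nlp = spacy.blank("en")
# add_pipe returns the component; merge_entities is a plain function on a Doc.
merge_entities = nlp.add_pipe("merge_entities")

# Hand-annotated entity standing in for what a trained NER model would predict.
words = ["John", "Smith", "is", "a", "cricketer"]
ents = ["B-PER", "I-PER", "O", "O", "O"]
doc = Doc(nlp.vocab, words=words, ents=ents)

# The entity spans two tokens; the tokenization itself is untouched.
tokens_before = [t.text for t in doc]
entities_before = [(e.text, e.label_) for e in doc.ents]

# merge_entities retokenizes the doc so each entity becomes a single token.
doc = merge_entities(doc)
tokens_after = [t.text for t in doc]

print(tokens_before)    # ['John', 'Smith', 'is', 'a', 'cricketer']
print(entities_before)  # [('John Smith', 'PER')]
print(tokens_after)     # ['John Smith', 'is', 'a', 'cricketer']
```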

Ashufet Apr 13, 2021
Author

Thanks for the info. It worked!!

Sumit5194 Jan 11, 2023

Hello, I am facing the same problem. My NER model's output looks like United B-geo, States I-geo, but I want the output to be United States geo only. I provided training data such as {"text": "There is United States", "entities": [(8, 14, "B-geo"), (16, 21, "I-geo")]} and so on. Is this the right way to provide it?

adrianeboyd Jan 16, 2023

There are two supported formats for entity tags.

One is text + character spans and the other is text + words + IOB tags. The character spans should be for the whole entity span and do not include B- or I-.

import spacy
from spacy.tokens import Doc
from spacy.training import Example

nlp = spacy.blank("en")

text = "There is United States"

ann1 = {
    "entities": [(9, 22, "geo")],
}
ex1 = Example.from_dict(nlp.make_doc(text), ann1)
print(ex1.reference.ents)

words2 = ["There", "is", "United", "States"]
ents2 = ["O", "O", "B-geo", "I-geo"]
doc2 = Doc(nlp.vocab, words=words2, ents=ents2)
ex2 = Example(nlp.make_doc(text), doc2)
print(ex2.reference.ents)

If you're still running into problems, please open a new thread with a full example showing what you're trying to do.
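To show how the two formats relate, here is a small plain-Python sketch that turns words plus IOB tags into whole-entity character spans. The helper `iob_to_char_spans` is hypothetical (not a spaCy function) and assumes the text is the words joined by single spaces; spaCy's own conversion utilities are the better choice for real data.

```python
def iob_to_char_spans(words, tags):
    """Join words with single spaces and return (text, [(start, end, label)])."""
    spans = []
    current = None  # [start, end, label] of the entity being built, if any
    offset = 0
    for word, tag in zip(words, tags):
        start, end = offset, offset + len(word)
        prefix, _, label = tag.partition("-")
        if prefix == "I" and current is not None and current[2] == label:
            # Continuation of the open entity: extend its end offset.
            current[1] = end
        else:
            # Anything else closes the open entity (if there is one) ...
            if current is not None:
                spans.append(tuple(current))
                current = None
            # ... and B- (or an I- with nothing to continue) opens a new one.
            if prefix in ("B", "I"):
                current = [start, end, label]
        offset = end + 1  # +1 for the single space between words
    if current is not None:
        spans.append(tuple(current))
    return " ".join(words), spans


text, spans = iob_to_char_spans(
    ["There", "is", "United", "States"],
    ["O", "O", "B-geo", "I-geo"],
)
print(text)   # There is United States
print(spans)  # [(9, 22, 'geo')]
```

The resulting span covers the whole entity and its label has no B- or I- prefix, which is the shape the character-span format expects.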
