How to combine two NER HuggingFace pipelines with spacy-huggingface-pipelines #12815

CarlosGMPZ · 2023-07-11T09:00:28Z

I am trying to combine two HuggingFace models fine-tuned for NER, using spacy-huggingface-pipelines. One detects proteins and the other detects terms related to cancer. If I put both in the same pipeline, the second one overwrites the entities set by the first, so my approach was to run them separately and combine the docs using Doc.from_docs(). The code so far looks like this:

    import spacy

    # First pipeline: pharmacological substances and proteins
    nlp_pharmaconer = spacy.blank("es")
    nlp_pharmaconer.add_pipe(
        "hf_token_pipe",
        config={
            "model": "PlanTL-GOB-ES/bsc-bio-ehr-es-pharmaconer",
            "annotate": "ents",
            "annotate_spans_key": "bsc-bio-ehr-es-pharmaconer",
        },
    )
    nlp_pharmaconer.initialize()

    # Second pipeline: cancer-related terms, sharing the first pipeline's vocab
    nlp_cantemist = spacy.blank("es", vocab=nlp_pharmaconer.vocab)
    nlp_cantemist.add_pipe(
        "hf_token_pipe",
        config={
            "model": "PlanTL-GOB-ES/bsc-bio-ehr-es-cantemist",
            "annotate": "ents",
            "annotate_spans_key": "bsc-bio-ehr-es-cantemist",
        },
    )
    nlp_cantemist.initialize()

Then, loading a sentence with both cancer terms and proteins and combining the results:

    text = "Neoplasia y tumoración. Los resultados fueron normales, con ANA, anti-Sm, anti-RNP, anti-SSA, anti-SSB, anti-Jo1 y anti-Scl70 negativos."
    doc_pharmaconer = nlp_pharmaconer(text)
    doc_cantemist = nlp_cantemist(text)
    doc_all = spacy.tokens.Doc.from_docs([doc_pharmaconer, doc_cantemist])
    spacy.displacy.render(doc_all, style="ent")

Now, doc_all.ents looks as it should: a list of all entities. But when running displacy, the text appears twice, once annotated with the nlp_pharmaconer terms and once with the nlp_cantemist ones. I was hoping to get a single text annotated with both, which could then be exported as an image. Is there a way to do this?

svlandeg · 2023-07-14T08:40:33Z

Hi @CarlosGMPZ,

I see what you're trying to do here, and the best approach will be to get your two components into one pipeline.

To get this to work, you just have to give them each a unique name. If you don't do that, they take the factory name "hf_token_pipe" by default and the second one will complain that the name already exists. You can set a different name with nlp.add_pipe(..., name=XXX).

Then we need to make a few more adjustments. If you set both of them to "annotate": "ents" then yes, the second one will overwrite the first, which is not what we want. Instead, we'll let both components store their results in doc.spans, each using a unique key, like so:

"annotate": "spans",
"annotate_spans_key": "bsc-bio-ehr-es-pharmaconer",

Then, when the pipeline has run, we'll want to put all spans together, filter them (doc.ents cannot hold overlapping entities) and store the result in doc.ents.
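To make the filtering step concrete, here is a minimal pure-Python sketch of the rule that spacy.util.filter_spans is documented to apply: when spans overlap, the longest one is kept. It works on plain (start, end, label) tuples rather than real Span objects. The function name filter_overlaps, the token offsets, and the labels are made up for illustration and are not output from the two models.

```python
from typing import List, Tuple

# (start token, end token exclusive, label); a stand-in for spaCy Span objects
SpanTuple = Tuple[int, int, str]


def filter_overlaps(spans: List[SpanTuple]) -> List[SpanTuple]:
    """Keep the longest span wherever spans overlap (hypothetical helper)."""
    # Longest spans first; among equally long spans, the earlier start comes first
    ordered = sorted(spans, key=lambda s: (s[1] - s[0], -s[0]), reverse=True)
    kept: List[SpanTuple] = []
    seen_tokens = set()
    for start, end, label in ordered:
        # Every candidate is no longer than the spans already kept, so checking
        # its first and last token is enough to detect an overlap
        if start not in seen_tokens and end - 1 not in seen_tokens:
            kept.append((start, end, label))
            seen_tokens.update(range(start, end))
    # Return the survivors in document order
    return sorted(kept, key=lambda s: s[0])


# Illustrative spans from two components; (6, 8) overlaps the longer (6, 9)
pharma_spans = [(4, 5, "PROTEINAS"), (6, 9, "PROTEINAS")]
cantemist_spans = [(0, 1, "MORFOLOGIA_NEOPLASIA"), (6, 8, "MORFOLOGIA_NEOPLASIA")]
merged = filter_overlaps(pharma_spans + cantemist_spans)
print(merged)
```

In this toy input the shorter cantemist span at tokens 6 to 8 is dropped because it overlaps the longer pharmaconer span. If you would rather keep overlapping annotations, leave them in doc.spans instead of moving them to doc.ents.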

Altogether, you'd get something like this:

    import spacy

    nlp = spacy.blank("es")
    nlp.add_pipe(
        "hf_token_pipe",
        name="pharma_pipe",
        config={
            "model": "PlanTL-GOB-ES/bsc-bio-ehr-es-pharmaconer",
            "annotate": "spans",
            "annotate_spans_key": "bsc-bio-ehr-es-pharmaconer",
        },
    )
    nlp.add_pipe(
        "hf_token_pipe",
        name="cantemist_pipe",
        config={
            "model": "PlanTL-GOB-ES/bsc-bio-ehr-es-cantemist",
            "annotate": "spans",
            "annotate_spans_key": "bsc-bio-ehr-es-cantemist",
        },
    )
    nlp.initialize()

    text = "Neoplasia y tumoración. Los resultados fueron normales, con ANA, anti-Sm, anti-RNP, anti-SSA, anti-SSB, anti-Jo1 y anti-Scl70 negativos."
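    doc = nlp(text)
    # Merge both span groups into a new list, leaving doc.spans untouched
    spans = list(doc.spans["bsc-bio-ehr-es-pharmaconer"])
    spans.extend(doc.spans["bsc-bio-ehr-es-cantemist"])
    # Drop overlapping spans (the longest one wins) before assigning to doc.ents
    doc.ents = spacy.util.filter_spans(spans)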

If you run this through displacy, you'll see all entities on one text displayed together 🎉
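For the export part of the question, the sketch below shows one way to save the displacy output to a file. To stay runnable without downloading either model, it uses a hand-built Doc with manually assigned entities in place of the real pipeline output. The words, the labels, and the file name entities.html are illustrative assumptions. Note that the "ent" style produces HTML markup, so this writes an HTML page rather than an image file.

```python
from pathlib import Path

import spacy
from spacy import displacy
from spacy.tokens import Doc, Span

# Stand-in for the combined pipeline: a hand-built Doc with explicit tokens
nlp = spacy.blank("es")
words = ["Neoplasia", "y", "tumoración", ".", "ANA", "negativo", "."]
spaces = [True, True, False, True, True, False, False]
doc = Doc(nlp.vocab, words=words, spaces=spaces)

# Illustrative entities; in the real setup these come from filter_spans
doc.ents = [
    Span(doc, 0, 1, label="MORFOLOGIA_NEOPLASIA"),
    Span(doc, 2, 3, label="MORFOLOGIA_NEOPLASIA"),
    Span(doc, 4, 5, label="PROTEINAS"),
]

# jupyter=False makes render() return the markup instead of displaying it;
# page=True wraps it in a complete HTML page
html = displacy.render(doc, style="ent", page=True, jupyter=False)

# Hypothetical output path; open the file in a browser to view the result
Path("entities.html").write_text(html, encoding="utf-8")
```

With the real pipeline, you would pass the doc from the combined nlp object to displacy.render in the same way.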
