SPACY3.1 Failing on simple sentences #8981

pratikchhapolika · 2021-08-17T11:27:44Z

How to reproduce the behaviour

Your Environment

Operating System: Jupyter Notebook, Tensorflow 2.3
Python Version Used: Python 3.6
spaCy Version Used: Spacy 3.1
Environment Information: Jupyter Notebook, Tensorflow 2.3

I am trying NER tagging with the latest spaCy.

Here is an example:

import spacy
from spacy import displacy

import en_core_web_trf
nlp = en_core_web_trf.load()

text = ("my name is richard and i will be taking over from here ")
doc = nlp(text)
displacy.render(doc, style='ent')

Output

/srv/conda/envs/notebook/lib/python3.7/site-packages/spacy/displacy/__init__.py:189: UserWarning: [W006] No entities to visualize found in Doc object. If this is surprising to you, make sure the Doc was processed using a model that supports named entity recognition, and check the `doc.ents` property manually if necessary.
  warnings.warn(Warnings.W006)
**my name is richard and i will be taking over from here**

Answered by adrianeboyd

adrianeboyd · 2021-08-17T16:16:07Z

Are you possibly using the v3.0.0 model rather than the v3.1.0 model (look at the version for en_core_web_trf in pip freeze and/or run spacy validate)? If so, I think you've just lucked into an example that doesn't work well for this model, which is primarily trained on newspaper-style text with standard capitalization. See more details about the statistical models in #3052.
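The version check can also be done from inside the notebook. This is a minimal sketch, not part of the original answer: it assumes Python 3.8+ (for importlib.metadata) and assumes the model is installed under the distribution name en-core-web-trf, which is how pip usually lists it. The helper names installed_version and major_minor are made up for illustration.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def major_minor(version):
    """Turn a version string such as '3.1.0' into (3, 1); pass None through."""
    if version is None:
        return None
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match `pip freeze`.
    for name in ("spacy", "en-core-web-trf"):
        print(name, installed_version(name))

    model = major_minor(installed_version("en-core-web-trf"))
    if model is not None and model < (3, 1):
        print("The installed model predates v3.1.0.")
```

On older Python versions, pip freeze or spacy validate remain the way to check.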

The en_core_web_trf v3.1.0 model has some lowercase augmentation that should improve the performance on texts without newspaper-style capitalization. With spacy v3.1.1 and en_core_web_trf v3.1.0, both versions of "Richard" are shown as PERSON for me:

import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_trf")

text = "my name is richard and i will be taking over from here My name is Richard and I will be taking over from here."
doc = nlp(text)
displacy.serve(doc, style='ent')

Some users have also reported worse performance on identifying standalone first names in the en_core models in spacy v3 vs. v2, but I suspect it's the capitalization that's the main factor here.
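To check the effect of capitalization without relying on the displacy output, the entities can be read directly from doc.ents, as the W006 warning suggests. The following is a rough sketch rather than part of the original answer: entity_pairs and compare_casings are hypothetical helper names, and it assumes en_core_web_trf is installed.

```python
def entity_pairs(doc):
    """Return (text, label) pairs for every entity span in a processed Doc."""
    return [(ent.text, ent.label_) for ent in doc.ents]


def compare_casings(nlp, text):
    """Run the pipeline on the text as given and on a lowercased copy."""
    return {
        "original": entity_pairs(nlp(text)),
        "lowercased": entity_pairs(nlp(text.lower())),
    }


if __name__ == "__main__":
    # Imported here so the helpers above can be reused without loading spaCy.
    import spacy

    nlp = spacy.load("en_core_web_trf")  # assumes the model is installed
    result = compare_casings(nlp, "My name is Richard and I will be taking over from here.")
    for casing, ents in result.items():
        print(casing, ents)
```

If the two lists differ, capitalization is the likely cause rather than the sentence itself.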
