Identifying entities in lowercase for spaCy's NER #11931
-
Hi! I am having issues detecting entities in lowercase text. For example:
-
The spaCy models are trained on newspaper-style text, which is for the most part properly capitalized and punctuated, so they haven't seen much lowercase text like this and won't do very well on it. I believe we do some augmentation to help with this, but lowercase input is still harder. They also won't do very well on isolated entity names ("HDFC Bank" rather than "I opened an account with HDFC Bank").
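If you end up training your own NER model, one simple form of augmentation is to add lowercased copies of some training sentences. Here's a rough sketch of that idea (not what spaCy does internally). The `(text, [(start_char, end_char, label), ...])` format and the `lowercase_augment` name are assumptions for illustration. Because lowercasing normally keeps string length, the entity character offsets can be reused; the sketch skips the rare cases where it doesn't (e.g. "İ" becomes two characters).

```python
import random
from typing import List, Tuple

# Assumed format: (text, [(start_char, end_char, label), ...])
TrainExample = Tuple[str, List[Tuple[int, int, str]]]


def lowercase_augment(
    examples: List[TrainExample], fraction: float = 0.5, seed: int = 0
) -> List[TrainExample]:
    """Return the original examples plus lowercased copies of a random subset."""
    rng = random.Random(seed)
    augmented = list(examples)
    for text, entities in examples:
        if rng.random() >= fraction:
            continue  # keep this example in its original casing only
        lowered = text.lower()
        if lowered == text:
            continue  # already lowercase, a copy would just be a duplicate
        if len(lowered) != len(text):
            # Some characters grow when lowercased (e.g. "İ"), so the
            # character offsets would no longer line up with the entities.
            continue
        augmented.append((lowered, list(entities)))
    return augmented


data = [("I opened an account with HDFC Bank", [(25, 34, "ORG")])]
for text, ents in lowercase_augment(data, fraction=1.0):
    print(text, [text[s:e] for s, e, _ in ents])
```

With spaCy v3 you could then turn each pair into a training example with `Example.from_dict(nlp.make_doc(text), {"entities": ents})` from `spacy.training`. Keeping `fraction` below 1 means the model still sees mostly cased text.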
If this is a common problem for you, it might make sense to train your own NER models, or to use a truecasing model, as mentioned in the issue you linked. (The particular truecaser linked there seems to be abandoned and somewhat old, so I would look for something more recent.)
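To give a feel for what a truecaser does, here's a very small baseline: it remembers the most frequent casing of each word in properly cased text and applies that to lowercase input before it goes to the NER model. This is a swapped-in illustration (a unigram "most frequent casing" truecaser), not the truecaser from the linked issue, and `UnigramTruecaser` is a hypothetical name. It ignores context, so a word like "bank" gets one casing everywhere; real truecasers use the surrounding words, and you'd want a large cased corpus from your own domain to train on.

```python
import re
from collections import Counter, defaultdict

# Words and single punctuation marks; whitespace is left untouched.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
SENTENCE_END = {".", "!", "?"}


class UnigramTruecaser:
    """Hypothetical helper: map each word to its most frequent cased form."""

    def __init__(self):
        # lowercased word -> Counter of the surface forms seen in training
        self.forms = defaultdict(Counter)

    def train(self, cased_texts):
        for text in cased_texts:
            tokens = TOKEN_RE.findall(text)
            for i, tok in enumerate(tokens):
                # Skip sentence-initial tokens: their capital letter says
                # nothing about how the word is normally written.
                if i == 0 or tokens[i - 1] in SENTENCE_END:
                    continue
                self.forms[tok.lower()][tok] += 1

    def truecase(self, text):
        def restore(match):
            word = match.group(0)
            counts = self.forms.get(word.lower())
            # Words never seen in training are left exactly as they are.
            return counts.most_common(1)[0][0] if counts else word

        restored = TOKEN_RE.sub(restore, text)
        # Capitalize the first character, assuming the input is one sentence.
        return restored[:1].upper() + restored[1:]


tc = UnigramTruecaser()
tc.train([
    "I opened an account with HDFC Bank last week.",
    "My salary goes into my HDFC Bank account.",
])
print(tc.truecase("i opened an account with hdfc bank"))
```

You would then run the NER on the restored text, e.g. `nlp(tc.truecase(text))`. Entity offsets stay valid for the original lowercase string as long as the casing changes don't alter its length.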