The spaCy models are trained on newspaper-style text, which is properly capitalized and punctuated for the most part, so they haven't seen lowercase text like this before and won't do very well on it. I believe we do some augmentation to help with this, but lowercase input does make the task harder. They also won't do very well on isolated entity names ("HDFC Bank" rather than "I opened an account with HDFC Bank"), because there is no surrounding sentence context to rely on.

If this is a common problem for you, it might make sense to train your own NER models, or to use a truecasing model, as mentioned in the issue you linked. (The particular truecaser linked there seems to be abandoned and somewhat old, so I would look for something more recent.)
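
To make the truecasing idea concrete, here is a minimal sketch. It is not a trained truecasing model, just a lexicon lookup that restores the casing of names you already know about before the text goes to NER. The function name `restore_case` and the name list are hypothetical, made up for illustration.

```python
import re


def restore_case(text, known_names):
    """Rewrite case-insensitive matches of known names with their canonical casing.

    Hypothetical helper: a lexicon-based stand-in for a real truecasing model.
    """
    # Try longer names first so "HDFC Bank" wins over "HDFC".
    names = sorted(known_names, key=len, reverse=True)
    if not names:
        return text
    canonical = {name.lower(): name for name in names}
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(name) for name in names) + r")\b",
        flags=re.IGNORECASE,
    )
    fixed = pattern.sub(lambda m: canonical[m.group(0).lower()], text)
    # Simplification: treat the input as a single sentence and capitalize its start.
    return fixed[:1].upper() + fixed[1:]


print(restore_case("i opened an account with hdfc bank", ["HDFC Bank"]))
```

The restored string could then be passed to a spaCy pipeline as usual, for example `nlp(restore_case(text, names))` with `nlp` loaded via `spacy.load`. This only helps for names that are in your list, so a proper truecaser or a model trained on lowercase data is still the better fix if lowercase input is common.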

Answer selected by n-srinidhi
Labels
perf / accuracy Performance: accuracy
2 participants