How can I extract the exact person's name from a resume using spaCy patterns? #10402
-
Paul (a core spaCy developer) has already answered this question here: https://stackoverflow.com/a/71303802
-
Hey, based on this question and your related questions about dates on Stack Overflow, it seems you are working on this problem for the first time, and your expectations may be somewhat miscalibrated.

One problem you mentioned is that PERSON entities include things that aren't people, like "Curriculum Vitae". The response to that should not be to use POS tags, which aren't specific to people's names but cover all kinds of words, but to work on improving the accuracy of your NER model. It's important to understand that NLP models aren't always predictable and aren't perfect; see #3052. Since you only gave a few examples, it's not clear whether you're getting a lot of incorrect annotations or whether even a few are a problem. You need to be prepared to have some errors no matter what solution you use.

As far as improving accuracy, the first thing you can do is use a larger spaCy model. You're currently using the small one (en_core_web_sm). The transformer model (en_core_web_trf) is usually the most accurate, but you need a decent GPU to use it. The large model (en_core_web_lg) will work fine without a GPU and should give comparable accuracy, so that's also worth trying.

A bigger problem is that resume text consists of short, non-sentence fragments. It's hard for a machine learning model to pick up details in that kind of text, because most of the training data for spaCy's models is things like newspaper articles, with complete sentences. Since your text is very different, you may need to train your own model so that it learns the kinds of text you're looking for specifically.

With dates, your problem seems to be finding the "date of birth" as opposed to other kinds of dates. That's not an NER or matching problem; that's an information extraction problem. I would recommend reading at least the first part of the chapter on Information Extraction in Jurafsky and Martin's Speech and Language Processing (and maybe skimming the NER chapter too).
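As a minimal illustration of treating date of birth as an information-extraction (slot-filling) problem rather than generic date recognition, the sketch below only accepts a date that immediately follows a label such as "Date of Birth" or "DOB", so other dates in the resume (employment periods, graduation years) are ignored. The label list, the DD/MM/YYYY ordering, and the function name `extract_date_of_birth` are assumptions for illustration, not spaCy functionality; real resumes would need more formats (e.g. month names), and a spaCy DATE entity could take the place of the numeric regex.

```python
import re
from datetime import date

# Hypothetical birth-date labels; real resumes use many more variants.
DOB_LABEL = re.compile(
    r"\b(?:date\s+of\s+birth|d\.?o\.?b\.?|born(?:\s+on)?)\s*[:\-]?\s*",
    re.IGNORECASE,
)
# Numeric dates like 12/03/1990, 12-03-1990 or 12.03.1990 (assumes DD/MM/YYYY order).
NUMERIC_DATE = re.compile(r"(\d{1,2})[/.\-](\d{1,2})[/.\-](\d{4})")


def extract_date_of_birth(text):
    """Return the first valid date directly following a birth-date label, or None."""
    for label in DOB_LABEL.finditer(text):
        # Only look at the text immediately after the label, not anywhere in the document.
        m = NUMERIC_DATE.match(text, label.end())
        if not m:
            continue
        day, month, year = (int(g) for g in m.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 31/02/1990 is not a real calendar date
    return None


resume = "Experience: 01/06/2015 - present\nDate of Birth: 12/03/1990\n"
print(extract_date_of_birth(resume))
```

The employment date appears first in the text but is skipped because no birth-date label precedes it; this context restriction is the part that plain NER or pattern matching on dates does not give you.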
-
I'm extracting people's names from resumes with the spaCy model en_core_web_sm and using spaCy patterns like this.
This works for some resumes and returns the person's exact name, but sometimes it returns wrong names such as "Curriculum Vitae", "from Resume Genius", "Sr", or "Electrical Engineer".
I'm getting names like this, but it has problems identifying the proper name. Please suggest a solution. Thanks.