How lemmatizer works in spaCy? #11277

teohsinyee · 2022-08-08T12:13:50Z


I lemmatized some words using
example["new"] = example["ori"].apply(lambda row: " ".join([w.lemma_ for w in nlp(row)]))
However, I can't understand the logic behind it. Please refer to the table below.

No.	ori	new
0	Co-Curricular Activities	Co - Curricular Activities
1	WORK EXPERIENCES	work experience
2	Seminar and extra curricular activities:	seminar and extra curricular activity :
3	ACADEMIC QUALIFICATIONS	ACADEMIC qualification
4	Seminars/Symposiums attended	seminar / symposium attend

Why is Activities not lemmatized in row 0, while EXPERIENCES is lemmatized in row 1?
Why was EXPERIENCES converted to lowercase?
In row 3, I expected the lemmatized result to be 'Academic qualification'. Why is ACADEMIC still in upper case, when the other all-upper-case words were converted to lower case (see row 1)?

I was trying to figure out the pattern of the lemmatizer but it seems inconsistent in my case.

Answered by thomashacker


thomashacker · 2022-08-11T08:51:50Z


Hello, the English models use a rule-based lemmatizer that depends on the predicted part-of-speech (POS) tag of each token. The POS tag can be incorrect, and the rules themselves might not be 100% correct in all cases. The accuracy also depends on whether you run the pipeline on short text fragments or on whole sentences, because the tagger has less context to work with on short fragments. The spaCy documentation on lemmatization explains in more detail how the lemmatizer works and how token.pos influences the results.
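To make the POS dependence concrete, here is a toy sketch in plain Python. It is not spaCy's implementation: the suffix rules, the `toy_lemma` helper, and the POS tags attached to each word are all hypothetical and hand-picked for illustration. The sketch assumes that a token tagged as a proper noun is returned unchanged, while other tokens are lowercased before suffix rules are applied. Under that assumption, the rows in the table would be explained if Activities and ACADEMIC were tagged PROPN and EXPERIENCES was tagged NOUN.

```python
# Toy sketch of POS-dependent, rule-based lemmatization.
# The rules and POS tags below are hypothetical and hand-picked for
# illustration; they are NOT spaCy's actual tables or tagger output.

# Suffix rewrite rules per coarse POS tag: (old suffix, new suffix).
TOY_RULES = {
    "NOUN": [("ies", "y"), ("s", "")],
    "VERB": [("ed", ""), ("ing", "")],
}


def toy_lemma(text, pos):
    """Return a lemma for `text` given an (assumed) POS tag."""
    if pos == "PROPN":
        # Assumption: proper nouns are returned unchanged, case included.
        return text
    lowered = text.lower()
    # Apply the first matching suffix rule for this POS, if any.
    for old, new in TOY_RULES.get(pos, []):
        if lowered.endswith(old) and len(lowered) > len(old):
            return lowered[: len(lowered) - len(old)] + new
    # No rule matched: fall back to the lowercased form.
    return lowered


# Hypothetical taggings of words from the table in the question.
examples = [
    ("Activities", "PROPN"),
    ("Activities", "NOUN"),
    ("EXPERIENCES", "NOUN"),
    ("ACADEMIC", "PROPN"),
    ("ACADEMIC", "ADJ"),
    ("attended", "VERB"),
]

for text, pos in examples:
    print(f"{text:12} {pos:6} -> {toy_lemma(text, pos)}")
```

The same surface form gives a different result depending on its tag, which is why the output can look inconsistent. To check which tags spaCy actually assigned in your case, print token.text, token.pos_, and token.lemma_ for each token in nlp(row).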
