Adding a lemma for a new word and the concept of normalization/lemmatization in spaCy #12990
Unanswered
igormorgado asked this question in Help: Coding & Implementations
Replies: 2 comments · 2 replies
-
Got this partial solution:
But IMHO, the …
-
Great question! The issue you are running into is that the rule-based lemmatizer processes the lowercase orthographic forms (the tokens as they appear in the text), not the normalized forms. You can resolve this issue by adding an exception to the tokenizer. See this earlier discussion for an example:
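For illustration (a sketch, not necessarily the example from the linked discussion): one way to attach a lemma to a particular surface form is the attribute_ruler component, which sits before the rule-based lemmatizer in the packaged English pipelines, so a lemma it assigns is kept as long as the lemmatizer's overwrite setting stays at its default of False.

```python
import spacy
from spacy.symbols import ORTH

nlp = spacy.load("en_core_web_sm")

# Tokenizer exception from the question: split "gimme" into "gim" + "me"
nlp.tokenizer.add_special_case("gimme", [{ORTH: "gim"}, {ORTH: "me"}])

# Map the surface form "gim" to the lemma "give" with the attribute_ruler,
# which runs before the lemmatizer in this pipeline
ruler = nlp.get_pipe("attribute_ruler")
ruler.add(patterns=[[{"ORTH": "gim"}]], attrs={"LEMMA": "give"})

doc = nlp("gimme that")
print([(t.text, t.lemma_) for t in doc])
# "gim" should now come out with the lemma "give"
```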
-
Following the examples from the documentation regarding tokenization, I have the following code:
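(A sketch of this setup, modeled on the "gimme" special-case example in the spaCy docs; the NORM value "give" is an assumption:)

```python
import spacy
from spacy.symbols import ORTH, NORM

nlp = spacy.load("en_core_web_sm")

# Special case: split "gimme" into "gim" + "me" and give "gim"
# the normalized form "give" (the NORM value here is assumed)
special_case = [{ORTH: "gim", NORM: "give"}, {ORTH: "me"}]
nlp.tokenizer.add_special_case("gimme", special_case)

doc = nlp("gimme that")
```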
Then I check the tokenization:
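(Continuing the sketch above:)

```python
print([t.text for t in doc])
# ['gim', 'me', 'that']
```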
But the lemmatizer does not return the same output:
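(Again continuing the sketch; the point is that "gim" keeps the lemma "gim" instead of "give":)

```python
print([t.lemma_ for t in doc])
# "gim" is lemmatized as "gim", not "give"
```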
On the other hand:
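(Presumably the check of the normalized forms, continuing the sketch:)

```python
print([t.norm_ for t in doc])
# ['give', 'me', 'that'], given the assumed NORM value above
```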
What am I doing wrong? How do I "fix" it? In spaCy, what is the difference between normalized tokens and lemmatized tokens? How can I "teach" the lemmatization of a single token (such as the `gim` token in the example)?