Wrong outputs from ru_core_news_sm model with hyphenated words #10605
Hi, I got strange outputs when lemmatizing Russian words with a hyphen.

Initialization:

    import spacy
    nlp = spacy.load('ru_core_news_sm')

Test 1:

    print([(x.lemma, x.lemma_) for x in nlp('по-любому')])

Output:

Test 2:

    print([(x.lemma, x.lemma_) for x in nlp('любому')])

Output:

Question:
Replies: 1 comment 1 reply
Hello,
the lemmatizer (pymorphy2) depends on the POS annotations from the tagger. I'd suggest checking the POS tags first; that may make it clearer why the lemmatization breaks in your first example.
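
One way to follow this suggestion is to print each token's text, POS tag, and lemma side by side. The snippet below is only a sketch: `token_table` and `inspect` are hypothetical helper names, not part of spaCy, and it assumes the `ru_core_news_sm` model is installed. It uses only the standard `Token` attributes `text`, `pos_`, and `lemma_`.

```python
def token_table(doc):
    """Return (text, pos_, lemma_) for every token in a spaCy Doc.

    Hypothetical helper: it works on any iterable of objects that
    expose these three attributes.
    """
    return [(tok.text, tok.pos_, tok.lemma_) for tok in doc]


def inspect(text, model="ru_core_news_sm"):
    """Run the pipeline on `text` and return the token table.

    Assumes the named model is installed; spaCy is imported lazily
    so that token_table can be used without it.
    """
    import spacy

    nlp = spacy.load(model)
    return token_table(nlp(text))
```

Calling `inspect('по-любому')` and `inspect('любому')` and comparing the POS column shows how the word was tokenized and tagged in each case. Note that `x.lemma` in the original tests is the integer hash of the lemma, while `x.lemma_` is the string form, so only the latter needs to be inspected.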