Closed
Labels
feat / lemmatizer (Feature: Rule-based and lookup lemmatization), lang / it (Italian language data and models)
Description
Hi,
I recently used spaCy 3.4.4 to classify Italian verbs, but ran into the following problem with the pretrained model it_core_news_lg:
```
sentence = "aprimi la porta"
--- output ---
| text | lemma | pos | tag |
|---|---|---|---|
| aprimi | aprimo | ADJ | A |
| la | il | DET | RD |
| porta | porta | NOUN | S |
```
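For reference, a table like the one above can be produced with a script along these lines. This is only a minimal sketch: it assumes spaCy and the it_core_news_lg pipeline are installed, and the helper names `token_rows`, `format_table` and `analyze` are illustrative, not part of spaCy.

```python
def token_rows(doc):
    """Return (text, lemma, pos, tag) for each token-like object in doc."""
    return [(t.text, t.lemma_, t.pos_, t.tag_) for t in doc]


def format_table(rows):
    """Render the rows as a Markdown table with the same columns as above."""
    lines = ["| text | lemma | pos | tag |", "|---|---|---|---|"]
    lines += ["| " + " | ".join(row) + " |" for row in rows]
    return "\n".join(lines)


def analyze(sentence, model="it_core_news_lg"):
    """Run the pretrained pipeline on one sentence and return the table."""
    # Imported here so the two helpers above work without spaCy installed.
    import spacy

    # Assumes the model was installed beforehand, e.g. with
    # `python -m spacy download it_core_news_lg`.
    nlp = spacy.load(model)
    return format_table(token_rows(nlp(sentence)))


# Usage (requires spaCy and the model):
# print(analyze("aprimi la porta"))
```

Printing `pos_` and `tag_` next to `lemma_` makes it easier to see whether a wrong lemma goes together with a wrong part-of-speech tag.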
Sadly, the verb "aprimi" is tagged as an adjective, and in other cases the lemmatizer fails to recover the right infinitive: for the sentence "leggimi un libro", spaCy reports that "leggimi" comes from the verb "leggimare".
In general, spaCy seems to have difficulty with verbs that carry attached (enclitic) pronouns. I also tried updating to spaCy 3.6.1, but the problem persists.
Is there a reason for this behavior?
Many thanks!
Your Environment
- Operating System: Windows 10 and Windows 11
- Python Version Used: 3.9.6 and 3.11.1
- spaCy Version Used: 3.4.4 and 3.6.1