Many decisions about lemma annotation (for example, how to lemmatize pronouns or punctuation) are task-specific or corpus-specific. Using the masculine plural for nouns sounds unusual to me, though. Is there a particular Spanish corpus or dictionary that does this?

In UD_Spanish-AnCora, which we're using to evaluate the rule-based Spanish lemmatizer in the es_* pipelines, the feminine forms of similar words appear to be lemmatized to the feminine singular.

If you need different lemmas, you could modify the rules and exceptions for the current rule-based lemmatizer, or you could potentially use the trainable lemmatizer with training data that uses the alternate forms. The data…
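To make the "rules plus exceptions" idea concrete, here is a minimal pure-Python sketch of how an exception table can override suffix rules. It is only an illustration: the tables, the `lemmatize` function, and the example words are all hypothetical and are not spaCy's actual data or API.

```python
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified tables (NOT spaCy's real Spanish data).
# Each rule is (suffix_to_strip, replacement); the first matching rule wins.
SUFFIX_RULES: Dict[str, List[Tuple[str, str]]] = {
    "NOUN": [("as", "a"), ("os", "o"), ("es", ""), ("s", "")],
}

# Exceptions map a full word form directly to its lemma, per part of speech.
EXCEPTIONS: Dict[str, Dict[str, str]] = {
    "NOUN": {},
}


def lemmatize(
    word: str,
    pos: str,
    rules: Dict[str, List[Tuple[str, str]]] = SUFFIX_RULES,
    exceptions: Dict[str, Dict[str, str]] = EXCEPTIONS,
) -> str:
    """Return a lemma: exceptions are checked first, then suffix rules."""
    form = word.lower()
    pos_exceptions = exceptions.get(pos, {})
    if form in pos_exceptions:
        return pos_exceptions[form]
    for old, new in rules.get(pos, []):
        # Require a non-empty stem so a bare suffix is never rewritten.
        if form.endswith(old) and len(form) > len(old):
            return form[: len(form) - len(old)] + new
    return form


# With only the suffix rules, a feminine plural maps to the feminine singular.
default_lemma = lemmatize("niñas", "NOUN")

# Adding exceptions switches selected words to an alternate lemma convention.
custom_exceptions = {"NOUN": {"niñas": "niño", "niña": "niño"}}
custom_lemma = lemmatize("niñas", "NOUN", exceptions=custom_exceptions)

print(default_lemma, custom_lemma)
```

Exceptions only cover the forms you list, so this approach suits a small set of words. For a systematic change of convention, retraining on data annotated with the alternate lemmas is probably the more maintainable route.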

Answer selected by emartinezctech
Labels: lang / es (Spanish language data and models), feat / lemmatizer (Feature: Rule-based and lookup lemmatization)