There are two issues here.

The general issue is that calling Irish() (or the equivalent class for any other language) gives you a blank pipeline that contains only a tokenizer. The lemmatizer is a separate component that you have to add yourself. You can add it like this:

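```python
from spacy.lang.ga import Irish

# Irish() creates a blank pipeline: a tokenizer and no other components.
nlp = Irish()
# Add the lemmatizer component to the pipeline.
lemmatizer = nlp.add_pipe("lemmatizer")
# Load the lookup tables (requires the spacy-lookups-data package).
lemmatizer.initialize()
print([token.lemma_ for token in nlp("Is mise an t-aon duine")])
```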

That covers the general case.
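To see the "blank pipeline" point directly, you can inspect the component names before and after adding the lemmatizer. This is a minimal sketch assuming spaCy v3; the variable names `before` and `after` are only for illustration, and `initialize()` is deliberately left out so the check does not need any lookup data.

```python
from spacy.lang.ga import Irish

nlp = Irish()

# A freshly created language object has no pipeline components yet.
before = list(nlp.pipe_names)

# Adding the lemmatizer registers it as a pipeline component.
nlp.add_pipe("lemmatizer")
after = list(nlp.pipe_names)

print(before)  # []
print(after)   # ['lemmatizer']
```

If `"lemmatizer"` is missing from `nlp.pipe_names`, nothing in the pipeline ever sets `token.lemma_`.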

The second issue is specific to the Irish lemmatizer: based on the implementation, it seems the lemmatizer doesn't do anything without part-of-speech information. We don't have a trained tagger, so you'd have to supply your own tagge…

Answer selected by polm