Components Required for Lemmatizer #9194
-
Based on the diagram in your documentation, I believe the tok2vec, tagger, and attribute_ruler components must all be enabled in the pipeline in order to use the built-in lemmatizer in the small English model. Is my understanding correct? I was a little confused by the description below the diagram that says "...requires token.pos annotation from either tagger+attribute_ruler or morphologizer", because it does not mention the tok2vec component as a dependency. From what I've seen, the lemmatizer does not produce lemmas without the tok2vec. The reason for my question is that after upgrading from the v2 to the v3 small English model and enabling the tok2vec, tagger, and attribute_ruler components before the lemmatizer component in my pipeline, I am noticing a big increase in processing time. I am looking for any way to trim the processing time down, so I am reviewing which components are actually needed.
-
Yes.
The lemmatizer doesn't depend on the tok2vec directly, but the tagger needs the tok2vec in order to work. If you had some other way to get POS tags without the tok2vec, the lemmatizer would happily use them.
What kind of documents are you working with (length/volume), and how much slower is it? We had some slowdown related to the Matcher, and I wouldn't be surprised if some other parts of the pipeline had slowed a little over time, but I don't think we've had reports of major slowdown for components like the tagger before.
-
Got it. Thank you both for the information!