How does spaCy handle the transformer sequence limit? #8166
-
I like spaCy and use it for document-level entity recognition on documents that are 2,000-3,000 words long. In spaCy v3 we can choose a transformer as the pretrained model, but BERT-like models have a sequence limit of 512 tokens. So how can I use spaCy with my documents for entity recognition? I want to understand how spaCy overcomes this limit, and what options I have here.
-
spaCy supports a number of strategies for splitting up documents to fit in the transformer window. The quickstart uses strided spans with some overlap. You can read more about that strategy and other options in the spaCy Transformers docs.
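
In the quickstart-generated config, this strategy is selected via the `spacy-transformers.strided_spans.v1` span getter with `window` and `stride` settings. To make the idea concrete, here is a minimal sketch in plain Python of what strided spans do: cover a long `Doc` with fixed-size, overlapping windows so every token lands in at least one window. This is illustrative only, not spacy-transformers' actual implementation; the `strided_spans` helper and the example text are made up for the demo.

```python
# Sketch of the strided-span idea (not spacy-transformers' own code):
# fixed-size windows that advance by `stride` tokens, so consecutive
# windows overlap by (window - stride) tokens.
import spacy

def strided_spans(doc, window=128, stride=96):
    """Return overlapping spans of up to `window` tokens, advancing by `stride`."""
    spans = []
    start = 0
    while start < len(doc):
        spans.append(doc[start : start + window])  # Doc slicing clamps at the end
        if start + window >= len(doc):
            break
        start += stride
    return spans

nlp = spacy.blank("en")
doc = nlp("some long text " * 200)  # ~600 tokens, over a 512-token limit
for span in strided_spans(doc):
    print(span.start, span.end)  # note the 32-token overlap between windows
```

With the values sketched here, each window overlaps the next by 32 tokens, so the model still sees context on both sides of a window boundary when the per-window predictions are combined back into a single document.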